This blog post is part of a series on XPath. The content comes from my ebook EverydayXPath, parts of which are being released to the public as blog posts. In this post, we explain what XPath does, dissect the components of an XPath expression, and show why the context is the key to forming the expression.
Operators
Operators are special characters or keywords that have a special meaning in an XPath expression.
- Child /: Selects the children of the current context. When used at the beginning of the expression, it starts from the root node.
- Recursive descent //: Searches the document, descending from the current context. When used at the beginning of the expression, it searches the entire document from the root node.
- Dot .: Selects the current context. By default it takes the root node as the current context.
- Double dot ..: Selects the parent of the current context.
- Wildcard *: Selects all nodes from the current context.
- Attribute @: Selects the node with the attribute and returns the attribute.
- Attribute wildcard @*: Selects nodes with any attributes and returns the attributes.
- Round brackets (): Expressions within round brackets take precedence in the evaluation order.
- Square brackets []: Used both as a subscript operator and to encapsulate a filter expression.
- Addition +: Performs addition.
- Subtraction -: Performs subtraction.
- Division div: Performs division.
- Multiplication *: Performs multiplication.
- Modulo mod: Returns the remainder of a division.
The following examples use the document below as the reference document.
<table class="active">
<tr>
<td>Alpha</td>
</tr>
<tr>
<td>Bravo</td>
</tr>
<tr>
<td>Charlie</td>
</tr>
<tr>
<td>Delta</td>
</tr>
<tr>
<td>
<table style="margin: 12px">
<tr>
<td>Echo</td>
</tr>
<tr>
<td>Foxtrot</td>
</tr>
</table>
</td>
</tr>
</table>
Child operator /
The child operator selects the children of the current context. When used at the beginning of the expression, it starts from the root node.
/table
This expression selects the table node starting from the root.
<table class="active">
<tr>
<td>Alpha</td>
</tr>
<tr>
<td>Bravo</td>
</tr>
<tr>
<td>Charlie</td>
</tr>
<tr>
<td>Delta</td>
</tr>
<tr>
<td>
<table style="margin: 12px">
<tr>
<td>Echo</td>
</tr>
<tr>
<td>Foxtrot</td>
</tr>
</table>
</td>
</tr>
</table>
/table/tr/td
This expression selects all the td nodes that are children of tr nodes that are children of the root table node. Notice that it does not select the td nodes in the inner table.
<td>Alpha</td> <1>
<td>Bravo</td> <2>
<td>Charlie</td> <3>
<td>Delta</td> <4>
<td> <5>
<table style="margin: 12px">
<tr>
<td>Echo</td>
</tr>
<tr>
<td>Foxtrot</td>
</tr>
</table>
</td>
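As a minimal sketch, the same selection can be tried in Python, assuming the standard-library xml.etree.ElementTree module. ElementTree supports only a subset of XPath and evaluates paths relative to an element, so /table/tr/td is written here as ./tr/td relative to the outer table element. The names DOC, root, outer_cells, and labels are illustrative only.

```python
import xml.etree.ElementTree as ET

# The reference document from this post, with whitespace removed.
DOC = (
    '<table class="active">'
    "<tr><td>Alpha</td></tr><tr><td>Bravo</td></tr>"
    "<tr><td>Charlie</td></tr><tr><td>Delta</td></tr>"
    '<tr><td><table style="margin: 12px">'
    "<tr><td>Echo</td></tr><tr><td>Foxtrot</td></tr>"
    "</table></td></tr></table>"
)

root = ET.fromstring(DOC)  # root is the outer <table> element

# ElementTree rejects absolute paths on an element, so "./tr/td" stands in
# for /table/tr/td when evaluated from the outer table.
outer_cells = root.findall("./tr/td")

# The fifth cell holds the inner table and has no text of its own.
labels = [td.text or "" for td in outer_cells]
print(len(outer_cells), labels)
```

Only the five first-level cells are returned; Echo and Foxtrot are not, because each step of the path moves exactly one level down.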
Recursive descent //
The recursive descent operator searches the document, descending from the current context. When used at the beginning of the expression, it searches the entire document from the root node.
//td
This expression selects all the td nodes. Notice that it selects all the td nodes regardless of their parents.
<td>Alpha</td> <1>
<td>Bravo</td> <2>
<td>Charlie</td> <3>
<td>Delta</td> <4>
<td> <5>
<table style="margin: 12px">
<tr>
<td>Echo</td>
</tr>
<tr>
<td>Foxtrot</td>
</tr>
</table>
</td>
<td>Echo</td> <6>
<td>Foxtrot</td> <7>
/table/tr/td//td
This expression selects all td nodes that descend from the td nodes of the root table. Notice that the first-level td nodes are not selected.
<td>Echo</td> <1>
<td>Foxtrot</td> <2>
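The two recursive descent expressions can be compared side by side in a short Python sketch. It assumes the standard-library xml.etree.ElementTree module, which implements a limited XPath subset and needs paths written relative to an element, so //td becomes .//td and /table/tr/td//td becomes ./tr/td//td. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# The reference document from this post, with whitespace removed.
DOC = (
    '<table class="active">'
    "<tr><td>Alpha</td></tr><tr><td>Bravo</td></tr>"
    "<tr><td>Charlie</td></tr><tr><td>Delta</td></tr>"
    '<tr><td><table style="margin: 12px">'
    "<tr><td>Echo</td></tr><tr><td>Foxtrot</td></tr>"
    "</table></td></tr></table>"
)

root = ET.fromstring(DOC)

# Equivalent of //td: every td at any depth below the outer table.
all_cells = root.findall(".//td")

# Equivalent of /table/tr/td//td: only td nodes nested inside a first-level td.
nested_cells = root.findall("./tr/td//td")
nested_labels = [td.text for td in nested_cells]

print(len(all_cells))   # 7
print(nested_labels)    # ['Echo', 'Foxtrot']
```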
Dot operator .
The dot operator selects the current context. By default it takes the root node as the current context.
//td[.="Bravo"]
This expression selects the td node whose value is Bravo.
<td>Bravo</td>
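A hedged Python sketch of the same filter follows, assuming xml.etree.ElementTree from the standard library on Python 3.7 or later (earlier versions do not accept the [.="text"] predicate). The leading dot in .//td is ElementTree's way of starting the search at the element the method is called on; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# The reference document from this post, with whitespace removed.
DOC = (
    '<table class="active">'
    "<tr><td>Alpha</td></tr><tr><td>Bravo</td></tr>"
    "<tr><td>Charlie</td></tr><tr><td>Delta</td></tr>"
    '<tr><td><table style="margin: 12px">'
    "<tr><td>Echo</td></tr><tr><td>Foxtrot</td></tr>"
    "</table></td></tr></table>"
)

root = ET.fromstring(DOC)

# Inside the predicate, "." refers to the td currently being tested,
# so this keeps only the td whose complete text equals "Bravo".
matches = root.findall('.//td[.="Bravo"]')
print([td.text for td in matches])  # ['Bravo']
```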
Double dot operator ..
The double dot operator selects the parent of the current context.
/table/tr/td/..
This expression selects the parent nodes of the first-level td nodes, which are the tr nodes of the root table.
<tr> <1>
<td>Alpha</td>
</tr>
<tr> <2>
<td>Bravo</td>
</tr>
<tr> <3>
<td>Charlie</td>
</tr>
<tr> <4>
<td>Delta</td>
</tr>
<tr> <5>
<td>
<table style="margin: 12px">
<tr>
<td>Echo</td>
</tr>
<tr>
<td>Foxtrot</td>
</tr>
</table>
</td>
</tr>
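The parent step can be sketched in Python as well, assuming the standard-library xml.etree.ElementTree module and its limited XPath support. The path is written relative to the outer table element, and the names rows and row_tags are illustrative.

```python
import xml.etree.ElementTree as ET

# The reference document from this post, with whitespace removed.
DOC = (
    '<table class="active">'
    "<tr><td>Alpha</td></tr><tr><td>Bravo</td></tr>"
    "<tr><td>Charlie</td></tr><tr><td>Delta</td></tr>"
    '<tr><td><table style="margin: 12px">'
    "<tr><td>Echo</td></tr><tr><td>Foxtrot</td></tr>"
    "</table></td></tr></table>"
)

root = ET.fromstring(DOC)

# Equivalent of /table/tr/td/..: step down to the first-level td nodes,
# then back up to their parents.
rows = root.findall("./tr/td/..")
row_tags = [row.tag for row in rows]
print(row_tags)  # ['tr', 'tr', 'tr', 'tr', 'tr']
```

The result is the same five rows that ./tr would return, which shows that .. simply reverses the last child step.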
Wildcard operator *
The wildcard operator selects all element nodes that are children of the current context, whatever their names.
/table/tr/td/table/*/td
This expression selects all td nodes from the inner table. The wildcard stands in for the tr nodes of the inner table.
<td>Echo</td> <1>
<td>Foxtrot</td> <2>
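Here is a small Python sketch of the wildcard step, assuming xml.etree.ElementTree from the standard library, which supports * as a child wildcard. The path is relative to the outer table element, and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# The reference document from this post, with whitespace removed.
DOC = (
    '<table class="active">'
    "<tr><td>Alpha</td></tr><tr><td>Bravo</td></tr>"
    "<tr><td>Charlie</td></tr><tr><td>Delta</td></tr>"
    '<tr><td><table style="margin: 12px">'
    "<tr><td>Echo</td></tr><tr><td>Foxtrot</td></tr>"
    "</table></td></tr></table>"
)

root = ET.fromstring(DOC)

# Equivalent of /table/tr/td/table/*/td: the * matches any child element
# of the inner table, which in this document means its two tr nodes.
inner_cells = root.findall("./tr/td/table/*/td")
inner_labels = [td.text for td in inner_cells]
print(inner_labels)  # ['Echo', 'Foxtrot']
```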
Attribute operator @
The attribute operator selects the node with the attribute and returns the attribute.
//table/@style
This expression selects the table with the style attribute and returns the style attribute.
style="margin: 12px"
Attribute wildcard operator @*
The attribute wildcard operator selects nodes with any attributes and returns the attributes.
//@*
This expression selects all the attributes found in any of the nodes.
class="active" <1>
style="margin: 12px" <2>
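Python's standard-library xml.etree.ElementTree cannot return attribute nodes from a path, so the sketch below only approximates the two attribute expressions. It assumes a [@style] filter plus Element.get() as a stand-in for //table/@style, and a walk over every element's attrib mapping as a stand-in for //@*. All variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# The reference document from this post, with whitespace removed.
DOC = (
    '<table class="active">'
    "<tr><td>Alpha</td></tr><tr><td>Bravo</td></tr>"
    "<tr><td>Charlie</td></tr><tr><td>Delta</td></tr>"
    '<tr><td><table style="margin: 12px">'
    "<tr><td>Echo</td></tr><tr><td>Foxtrot</td></tr>"
    "</table></td></tr></table>"
)

root = ET.fromstring(DOC)

# Approximation of //table/@style: find tables that carry a style attribute,
# then read the attribute value. Note that ".//table" searches below the
# outer table and does not test the outer table itself.
styled_tables = root.findall(".//table[@style]")
styles = [table.get("style") for table in styled_tables]

# Approximation of //@*: root.iter() visits every element, root included,
# in document order, and attrib holds each element's attributes.
all_attributes = [
    (name, value)
    for elem in root.iter()
    for name, value in elem.attrib.items()
]

print(styles)          # ['margin: 12px']
print(all_attributes)  # [('class', 'active'), ('style', 'margin: 12px')]
```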
Round brackets operator ()
Expressions within round brackets take precedence in the evaluation order.
(//@*)[1]
The expression within the round brackets is evaluated first, and the first result is returned by the subscript operator.
class="active"
Square brackets operator []
Square brackets are either used as a subscript operator or to encapsulate a filter expression.
(//td)[2]
This expression selects the second td node from the results of the expression encapsulated in the round brackets operator.
<td>Bravo</td>
Here is another example of its use as a subscript operator.
/table/tr[2]
This expression selects the second tr node which is a child of the table node.
<tr>
<td>Bravo</td>
</tr>
The following example shows the square brackets operator encapsulating a filter expression.
(//td)[last()]
This expression selects the last node from the results of the expression in the round brackets operator.
<td>Foxtrot</td>
Here is another example of encapsulating a filter expression.
//td[table]
This expression selects the td node which contains a table node.
<td>
<table style="margin: 12px">
<tr>
<td>Echo</td>
</tr>
<tr>
<td>Foxtrot</td>
</tr>
</table>
</td>
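Both uses of square brackets can be tried in a short Python sketch, assuming the standard-library xml.etree.ElementTree module. ElementTree accepts [2] and [table] predicates on a step, but it does not support round brackets, so (//td)[2] and (//td)[last()] are emulated here by indexing the Python list of results. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# The reference document from this post, with whitespace removed.
DOC = (
    '<table class="active">'
    "<tr><td>Alpha</td></tr><tr><td>Bravo</td></tr>"
    "<tr><td>Charlie</td></tr><tr><td>Delta</td></tr>"
    '<tr><td><table style="margin: 12px">'
    "<tr><td>Echo</td></tr><tr><td>Foxtrot</td></tr>"
    "</table></td></tr></table>"
)

root = ET.fromstring(DOC)

# Subscript: equivalent of /table/tr[2]. XPath positions start at 1.
second_row = root.findall("./tr[2]")[0]

# Filter: equivalent of //td[table], a td that has a table child.
cells_with_table = root.findall(".//td[table]")

# Emulation of (//td)[2] and (//td)[last()]: collect all td nodes first,
# then index the resulting list (Python indexes start at 0).
cells = root.findall(".//td")
second_cell = cells[1]
last_cell = cells[-1]

print(second_row.find("td").text)         # Bravo
print(len(cells_with_table))              # 1
print(second_cell.text, last_cell.text)   # Bravo Foxtrot
```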
Arithmetic operators + - * div mod
The arithmetic operators evaluate expressions to return numerical results.
count(//td) + count(//table)
The count function returns the number of nodes returned from evaluating the expression. There are 7 td nodes and 2 table nodes, which add up to 9.
9.0
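The arithmetic can be checked with a rough Python equivalent. Python's standard-library xml.etree.ElementTree has no count() function, so this sketch assumes len() over Element.iter() as a substitute; iter() is used because it includes the outer table itself, which a .//table search from that element would miss. XPath 1.0 numbers are floating point, so the sum is converted to a float to mirror the 9.0 shown above. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# The reference document from this post, with whitespace removed.
DOC = (
    '<table class="active">'
    "<tr><td>Alpha</td></tr><tr><td>Bravo</td></tr>"
    "<tr><td>Charlie</td></tr><tr><td>Delta</td></tr>"
    '<tr><td><table style="margin: 12px">'
    "<tr><td>Echo</td></tr><tr><td>Foxtrot</td></tr>"
    "</table></td></tr></table>"
)

root = ET.fromstring(DOC)

# Stand-ins for count(//td) and count(//table).
td_count = len(list(root.iter("td")))
table_count = len(list(root.iter("table")))

# Stand-in for count(//td) + count(//table).
total = float(td_count + table_count)
print(td_count, table_count, total)  # 7 2 9.0
```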