Wong Liang Zan

© 2020

Everyday XPath - Operators

This blog post is part of a series on XPath. The content comes from my ebook EverydayXPath, parts of which will be released to the public as blog posts. In this post, we explain what XPath does, dissect the components of an XPath expression, and show why the context is the key to forming the expression.

Operators

Operators are special characters that have a special meaning in an XPath expression.

  • Child /: Selects the children of the current context. When used at the beginning of the expression, it starts from the root node.
  • Recursive descent //: Searches the document, descending from the current context. When used at the beginning of the expression, it searches the entire document from the root node.
  • Dot .: Selects the current context. By default it takes the root node as the current context.
  • Double dot ..: Selects the parent of the current context.
  • Wildcard *: Selects all element nodes that are children of the current context.
  • Attribute @: Selects the named attribute of the current context node. Nodes without that attribute contribute nothing to the result.
  • Attribute wildcard @*: Selects all attributes of the current context node.
  • Round brackets (): Expressions within round brackets take precedence in the evaluation order.
  • Square brackets []: Used both as a subscript operator and to encapsulate a filter expression.
  • Addition +: Performs addition.
  • Subtraction -: Performs subtraction.
  • Division div: Performs division.
  • Multiplication *: Performs multiplication.
  • Modulo mod: Performs modulo, returning the remainder of a division.

The following examples use the document below as the reference document.

<table class="active">
  <tr>
    <td>Alpha</td>
  </tr>
  <tr>
    <td>Bravo</td>
  </tr>
  <tr>
    <td>Charlie</td>
  </tr>
  <tr>
    <td>Delta</td>
  </tr>
  <tr>
    <td>
      <table style="margin: 12px">
        <tr>
          <td>Echo</td>
        </tr>
        <tr>
          <td>Foxtrot</td>
        </tr>
      </table>
    </td>
  </tr>
</table>

Child operator /

The child operator selects the children of the current context. When used at the beginning of the expression, it starts from the root node.

/table

This expression selects the table node starting from the root.

<table class="active">
  <tr>
    <td>Alpha</td>
  </tr>
  <tr>
    <td>Bravo</td>
  </tr>
  <tr>
    <td>Charlie</td>
  </tr>
  <tr>
    <td>Delta</td>
  </tr>
  <tr>
    <td>
      <table style="margin: 12px">
        <tr>
          <td>Echo</td>
        </tr>
        <tr>
          <td>Foxtrot</td>
        </tr>
      </table>
    </td>
  </tr>
</table>

/table/tr/td

This expression selects all the td nodes that are children of tr nodes that are, in turn, children of the root table node. Notice that it does not select the td nodes in the inner table.

<td>Alpha</td> <1>
<td>Bravo</td> <2>
<td>Charlie</td> <3>
<td>Delta</td> <4>
<td> <5>
  <table style="margin: 12px">
    <tr>
      <td>Echo</td>
    </tr>
    <tr>
      <td>Foxtrot</td>
    </tr>
  </table>
</td>
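
The two child operator expressions above can be tried out in code. The following is a minimal sketch in Python using the standard library's xml.etree.ElementTree, which supports only a subset of XPath and does not accept absolute paths on an element. As a workaround, the sketch wraps the table in a hypothetical doc element that stands in for the root node, so /table/tr/td is written as ./table/tr/td. The wrapper, the compact layout of the reference document, and the variable names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Compact copy of the reference document.
DOC = """
<table class="active">
  <tr><td>Alpha</td></tr>
  <tr><td>Bravo</td></tr>
  <tr><td>Charlie</td></tr>
  <tr><td>Delta</td></tr>
  <tr><td><table style="margin: 12px">
    <tr><td>Echo</td></tr>
    <tr><td>Foxtrot</td></tr>
  </table></td></tr>
</table>
"""

# Hypothetical wrapper element standing in for the root node.
doc = ET.Element("doc")
doc.append(ET.fromstring(DOC))

tables = doc.findall("./table")       # mirrors /table
cells = doc.findall("./table/tr/td")  # mirrors /table/tr/td

print(len(tables))                    # 1
print(len(cells))                     # 5
print([c.text for c in cells[:4]])    # ['Alpha', 'Bravo', 'Charlie', 'Delta']
```

The fifth cell is the one that holds the inner table. The Echo and Foxtrot cells are not in the result because they are not children of a tr of the outer table.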

Recursive descent //

The recursive descent operator searches the document, descending from the current context. When used at the beginning of the expression, it searches the entire document from the root node.

//td

This expression selects all the td nodes. Notice that it selects all the td nodes regardless of their parents.

<td>Alpha</td> <1>
<td>Bravo</td> <2>
<td>Charlie</td> <3>
<td>Delta</td> <4>
<td> <5>
  <table style="margin: 12px">
    <tr>
      <td>Echo</td>
    </tr>
    <tr>
      <td>Foxtrot</td>
    </tr>
  </table>
</td>
<td>Echo</td> <6>
<td>Foxtrot</td> <7>

/table/tr/td//td

This expression selects all td nodes that are descendants of the td nodes of the root table. Notice that the first-level td nodes themselves are not selected.

<td>Echo</td> <1>
<td>Foxtrot</td> <2>
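
To see the difference between the two recursive descent expressions, here is a small sketch using Python's xml.etree.ElementTree. It assumes a hypothetical doc wrapper element in place of the root node, because ElementTree only evaluates relative paths, so //td is written as .//td. The compact document layout and the names are illustrative.

```python
import xml.etree.ElementTree as ET

# Compact copy of the reference document.
DOC = """
<table class="active">
  <tr><td>Alpha</td></tr>
  <tr><td>Bravo</td></tr>
  <tr><td>Charlie</td></tr>
  <tr><td>Delta</td></tr>
  <tr><td><table style="margin: 12px">
    <tr><td>Echo</td></tr>
    <tr><td>Foxtrot</td></tr>
  </table></td></tr>
</table>
"""

# Hypothetical wrapper element standing in for the root node.
doc = ET.Element("doc")
doc.append(ET.fromstring(DOC))

all_cells = doc.findall(".//td")                # mirrors //td
inner_cells = doc.findall("./table/tr/td//td")  # mirrors /table/tr/td//td

print(len(all_cells))                           # 7
print([c.text for c in inner_cells])            # ['Echo', 'Foxtrot']
```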

Dot operator .

The dot operator selects the current context. By default it takes the root node as the current context.

//td[.="Bravo"]

This expression selects the td node whose string value is Bravo. Inside the square brackets, the dot refers to the td node being tested.

<td>Bravo</td>
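
The dot predicate can be sketched with Python's xml.etree.ElementTree, which accepts the [.="text"] form from Python 3.7 onward. As an assumption for this sketch, a hypothetical doc wrapper element plays the role of the root node, so //td is written as .//td.

```python
import xml.etree.ElementTree as ET

# Compact copy of the reference document.
DOC = """
<table class="active">
  <tr><td>Alpha</td></tr>
  <tr><td>Bravo</td></tr>
  <tr><td>Charlie</td></tr>
  <tr><td>Delta</td></tr>
  <tr><td><table style="margin: 12px">
    <tr><td>Echo</td></tr>
    <tr><td>Foxtrot</td></tr>
  </table></td></tr>
</table>
"""

# Hypothetical wrapper element standing in for the root node.
doc = ET.Element("doc")
doc.append(ET.fromstring(DOC))

# Mirrors //td[.="Bravo"]: the dot is the td currently being tested.
matches = doc.findall('.//td[.="Bravo"]')

print(len(matches))      # 1
print(matches[0].text)   # Bravo
```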

Double dot operator ..

The double dot operator selects the parent of the current context.

/table/tr/td/..

This expression selects the parent nodes of the td nodes, which are the tr nodes of the outer table.

<tr> <1>
  <td>Alpha</td>
</tr>
<tr> <2>
  <td>Bravo</td>
</tr>
<tr> <3>
  <td>Charlie</td>
</tr>
<tr> <4>
  <td>Delta</td>
</tr>
<tr> <5>
  <td>
    <table style="margin: 12px">
      <tr>
        <td>Echo</td>
      </tr>
      <tr>
        <td>Foxtrot</td>
      </tr>
    </table>
  </td>
</tr>
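
As a rough check of the double dot operator, the sketch below uses Python's xml.etree.ElementTree, which supports .. in its XPath subset. It assumes a hypothetical doc wrapper element in place of the root node, so /table/tr/td/.. is written as ./table/tr/td/...

```python
import xml.etree.ElementTree as ET

# Compact copy of the reference document.
DOC = """
<table class="active">
  <tr><td>Alpha</td></tr>
  <tr><td>Bravo</td></tr>
  <tr><td>Charlie</td></tr>
  <tr><td>Delta</td></tr>
  <tr><td><table style="margin: 12px">
    <tr><td>Echo</td></tr>
    <tr><td>Foxtrot</td></tr>
  </table></td></tr>
</table>
"""

# Hypothetical wrapper element standing in for the root node.
doc = ET.Element("doc")
doc.append(ET.fromstring(DOC))

# Mirrors /table/tr/td/..: step down to each td, then back up to its parent.
parents = doc.findall("./table/tr/td/..")

print(len(parents))                  # 5
print({p.tag for p in parents})      # {'tr'}
```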

Wildcard operator *

The wildcard operator selects all element nodes that are children of the current context.

/table/tr/td/table/*/td

Here the wildcard matches the tr nodes of the inner table, so the expression selects all td nodes from the inner table.

<td>Echo</td> <1>
<td>Foxtrot</td> <2>
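
The wildcard step can be tried with Python's xml.etree.ElementTree, sketched below. A hypothetical doc wrapper element is assumed in place of the root node, so the expression is written with a leading dot.

```python
import xml.etree.ElementTree as ET

# Compact copy of the reference document.
DOC = """
<table class="active">
  <tr><td>Alpha</td></tr>
  <tr><td>Bravo</td></tr>
  <tr><td>Charlie</td></tr>
  <tr><td>Delta</td></tr>
  <tr><td><table style="margin: 12px">
    <tr><td>Echo</td></tr>
    <tr><td>Foxtrot</td></tr>
  </table></td></tr>
</table>
"""

# Hypothetical wrapper element standing in for the root node.
doc = ET.Element("doc")
doc.append(ET.fromstring(DOC))

# Mirrors /table/tr/td/table/*/td: the * matches any child element
# of the inner table, which here means its two tr elements.
cells = doc.findall("./table/tr/td/table/*/td")

print([c.text for c in cells])   # ['Echo', 'Foxtrot']
```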

Attribute operator @

The attribute operator selects the named attribute of the current context node. Nodes without that attribute contribute nothing to the result.

//table/@style

This expression looks at every table node and returns the style attribute of each one that has it. Only the inner table has a style attribute.

style="margin: 12px"

Attribute wildcard operator @*

The attribute wildcard operator selects all attributes of the current context node.

//@*

This expression selects all the attributes found in any of the nodes.

class="active" <1>
style="margin: 12px" <2>
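
Python's xml.etree.ElementTree cannot return attribute nodes from a path; it only supports attributes inside predicates such as [@style]. The sketch below therefore approximates //table/@style and //@* by selecting elements and reading their attributes in Python. This is a stand-in technique, not a direct evaluation of those expressions, and the doc wrapper element is a hypothetical stand-in for the root node.

```python
import xml.etree.ElementTree as ET

# Compact copy of the reference document.
DOC = """
<table class="active">
  <tr><td>Alpha</td></tr>
  <tr><td>Bravo</td></tr>
  <tr><td>Charlie</td></tr>
  <tr><td>Delta</td></tr>
  <tr><td><table style="margin: 12px">
    <tr><td>Echo</td></tr>
    <tr><td>Foxtrot</td></tr>
  </table></td></tr>
</table>
"""

# Hypothetical wrapper element standing in for the root node.
doc = ET.Element("doc")
doc.append(ET.fromstring(DOC))

# Approximates //table/@style: tables that have a style attribute,
# then the attribute value read with get().
styles = [t.get("style") for t in doc.findall(".//table[@style]")]

# Approximates //@*: every attribute of every element, in document order.
all_attrs = [(k, v) for e in doc.iter() for k, v in e.attrib.items()]

print(styles)      # ['margin: 12px']
print(all_attrs)   # [('class', 'active'), ('style', 'margin: 12px')]
```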

Round brackets operator ()

Expressions within round brackets take precedence in the evaluation order.

(//@*)[1]

The expression within the round brackets is evaluated first, and the first result is returned by the subscript operator.

class="active"

Square brackets operator []

Square brackets are either used as a subscript operator or to encapsulate a filter expression.

(//td)[2]

This expression selects the second td node from the results of the expression encapsulated in the round brackets operator.

<td>Bravo</td>

Here is another example of the subscript operator.

/table/tr[2]

This expression selects the second tr node which is a child of the table node.

<tr>
  <td>Bravo</td>
</tr>

The following example shows the square brackets operator encapsulating a filter expression.

(//td)[last()]

This expression selects the last node from the results of the expression in the round brackets operator.

<td>Foxtrot</td>

Here is another example of encapsulating a filter expression.

//td[table]

This expression selects the td node that has a table node as a child.

<td>
  <table style="margin: 12px">
    <tr>
      <td>Echo</td>
    </tr>
    <tr>
      <td>Foxtrot</td>
    </tr>
  </table>
</td>
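
The square bracket examples above can be sketched with Python's xml.etree.ElementTree. Its XPath subset supports [2] and [table] on a step, but not a subscript applied to a bracketed expression, so (//td)[2] and (//td)[last()] are approximated here with Python list indexing. Note that Python lists are 0-based while XPath positions are 1-based. The doc wrapper element is a hypothetical stand-in for the root node.

```python
import xml.etree.ElementTree as ET

# Compact copy of the reference document.
DOC = """
<table class="active">
  <tr><td>Alpha</td></tr>
  <tr><td>Bravo</td></tr>
  <tr><td>Charlie</td></tr>
  <tr><td>Delta</td></tr>
  <tr><td><table style="margin: 12px">
    <tr><td>Echo</td></tr>
    <tr><td>Foxtrot</td></tr>
  </table></td></tr>
</table>
"""

# Hypothetical wrapper element standing in for the root node.
doc = ET.Element("doc")
doc.append(ET.fromstring(DOC))

tds = doc.findall(".//td")
second = tds[1]                            # approximates (//td)[2]
last = tds[-1]                             # approximates (//td)[last()]
second_row = doc.findall("./table/tr[2]")  # mirrors /table/tr[2]
with_table = doc.findall(".//td[table]")   # mirrors //td[table]

print(second.text)                         # Bravo
print(last.text)                           # Foxtrot
print(second_row[0].find("td").text)       # Bravo
print(len(with_table))                     # 1
```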

Arithmetic operators + - * div mod

The arithmetic operators evaluate expressions to return numerical results.

count(//td) + count(//table)

The count function returns the number of nodes selected by the expression passed to it. There are 7 td nodes and 2 table nodes, which add up to 9. The result is shown as 9.0 because numbers in XPath 1.0 are floating point.

9.0
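
The same sum can be reproduced in Python as a sanity check. xml.etree.ElementTree has no count function, so this sketch counts the selected elements with len and does the addition in Python; the conversion to float only imitates the floating-point result shown above. The doc wrapper element is a hypothetical stand-in for the root node.

```python
import xml.etree.ElementTree as ET

# Compact copy of the reference document.
DOC = """
<table class="active">
  <tr><td>Alpha</td></tr>
  <tr><td>Bravo</td></tr>
  <tr><td>Charlie</td></tr>
  <tr><td>Delta</td></tr>
  <tr><td><table style="margin: 12px">
    <tr><td>Echo</td></tr>
    <tr><td>Foxtrot</td></tr>
  </table></td></tr>
</table>
"""

# Hypothetical wrapper element standing in for the root node.
doc = ET.Element("doc")
doc.append(ET.fromstring(DOC))

td_count = len(doc.findall(".//td"))        # stands in for count(//td)
table_count = len(doc.findall(".//table"))  # stands in for count(//table)
total = float(td_count + table_count)

print(td_count, table_count)                # 7 2
print(total)                                # 9.0
```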