XML and the XPath Language
Introduction
XPath stands for XML Path Language
XPath uses a "path expression" syntax to
identify and navigate nodes in an XML
document
Recent XPath versions define over 200 built-in
functions (XPath 1.0 defines 27)
XPath is a major element in the XSLT
standard
XPath is a W3C recommendation
Introduction
XPath is used in other XML technologies:
◦ XML Schemas (expression of uniqueness and key
constraints),
◦ XSLT transforms,
◦ XQuery
◦ XLink
◦ XPointer, etc.
Unlike XML Schema, XPath is not an XML
language (uses another syntax)
XPath Terminology
Nodes
◦ In XPath, there are seven kinds of nodes:
element, attribute, text,
namespace, processing-instruction,
comment, and root node.
◦ XML documents are treated as trees of nodes.
◦ The topmost element of the tree is called the root
element.
Atomic values
◦ Atomic values are simple values (such as a string or a
number) with no children or parent.
Items
◦ Items are atomic values or nodes.
Examples of Items
<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
<book>
<title lang="en">Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
</bookstore>
<bookstore> (root element node)
<author>J K. Rowling</author> (element node)
lang="en" (attribute node)
J K. Rowling and "en" are Atomic values
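The items above can be inspected with a short Python sketch. It assumes the standard-library module xml.etree.ElementTree, which does not model attribute or text nodes as separate objects: it exposes them as plain strings, which play the role of atomic values here.

```python
import xml.etree.ElementTree as ET

XML = """<bookstore>
<book>
<title lang="en">Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
</bookstore>"""

root = ET.fromstring(XML)            # <bookstore>, the root element node
author = root.find("book/author")    # an element node
title = root.find("book/title")      # the element that carries lang="en"

# Attribute and text content come back as Python strings (atomic values).
author_text = author.text
lang_value = title.get("lang")

print(root.tag, author.tag, author_text, lang_value)
```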
XPath "Path Expressions"
XPath uses path expressions to select
nodes or node-sets in an XML document.
These path expressions look very much
like the path expressions you use with
traditional computer file systems.
Path Expression
A path expression is a traversal of the document tree:
◦ from a starting node
◦ to a set of target nodes
◦ the targets constitute the value of the path
Node sequence:
◦ T1.T2. ... .Tn
Returns one or more nodes Tn, such that there are arcs:
◦ T1 → T2, ..., Tn-1 → Tn
[Figure: document tree rooted at db, with Book nodes whose children are Author and Title nodes; the example path db.Book.Author targets the Author nodes.]
Relationship of Nodes
Each element and attribute has one parent.
In the example below, the book element is the parent of the title,
author, year, and price elements.
<bookstore>
<book>
<title>Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
</bookstore>
Relationship of Nodes
Element nodes may have zero, one or more
children.
the title, author, year, and price elements are
all children of the book element
<bookstore>
<book>
<title>Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
</bookstore>
Relationship of Nodes
Nodes that have the same parent are
Siblings
The title, author, year, and price elements are
all siblings
<bookstore>
<book>
<title>Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
</bookstore>
Relationship of Nodes
A node's parent, its parent's parent, etc. are its
ancestors
The ancestors of the title element are the
book element and the bookstore element
<bookstore>
<book>
<title>Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
</bookstore>
Relationship of Nodes
A node's children, its children's children, etc. are its
descendants
Descendants of the bookstore element are
the book, title, author, year, and price
elements
<bookstore>
<book>
<title>Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
</bookstore>
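The five relationships (parent, children, siblings, ancestors, descendants) can be sketched in Python. This is only an illustration with the standard-library xml.etree.ElementTree module. Its elements keep no parent pointer, so the parent_of map and the ancestors helper are assumptions introduced for this example.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<bookstore><book>"
    "<title>Harry Potter</title><author>J K. Rowling</author>"
    "<year>2005</year><price>29.99</price>"
    "</book></bookstore>"
)

# Hypothetical helper map: child element -> parent element.
parent_of = {child: parent for parent in root.iter() for child in parent}

def ancestors(node):
    """Return the ancestors of node, nearest first."""
    found = []
    while node in parent_of:
        node = parent_of[node]
        found.append(node)
    return found

book = root.find("book")
title = book.find("title")

children = [c.tag for c in book]                                 # children of book
siblings = [c.tag for c in parent_of[title] if c is not title]   # siblings of title
title_ancestors = [a.tag for a in ancestors(title)]              # book, bookstore
descendants = [d.tag for d in root.iter() if d is not root]      # below bookstore

print(children, siblings, title_ancestors, descendants)
```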
Selecting Nodes
XPath uses path expressions to select nodes in an
XML document.
A node is selected by following a path, that is, a sequence of steps.
The most useful path expressions are listed below:
Expression  Description
nodename    Selects all child elements of the current node named "nodename"
/           Selects from the root node
//          Selects matching nodes from the current node downward,
            no matter where they are
.           Selects the current node
..          Selects the parent of the current node
@           Selects attributes
Some path expressions and the result
Path Expression   Result
bookstore         Selects all bookstore elements that are children of the
                  current node
/bookstore        Selects the root element bookstore
                  Note: a path that starts with a slash ( / ) is always an
                  absolute path, evaluated from the document root
bookstore/book    Selects all book elements that are children of bookstore
//book            Selects all book elements no matter where they are in the
                  document
bookstore//book   Selects all book elements that are descendants of the
                  bookstore element, no matter where they are under the
                  bookstore element
//@lang           Selects all attributes that are named lang
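A few of these expressions can be tried with Python's standard-library xml.etree.ElementTree, which supports only a subset of XPath. In this sketch the bookstore element is the context node, so absolute paths are written relative to it. The two-book sample is an assumption, and //@lang is emulated because ElementTree cannot return attribute nodes.

```python
import xml.etree.ElementTree as ET

# Assumed sample: the bookstore example extended with a second book.
root = ET.fromstring(
    "<bookstore>"
    "<book><title lang='en'>Harry Potter</title><price>29.99</price></book>"
    "<book><title lang='en'>Learning XML</title><price>39.95</price></book>"
    "</bookstore>"
)  # root is the bookstore element and acts as the context node

books = root.findall("book")                                # bookstore/book
all_books = root.findall(".//book")                         # //book
titles = [t.text for t in root.findall(".//book/title")]    # title children of any book

# //@lang emulated: collect the attribute values instead of attribute nodes.
langs = [e.get("lang") for e in root.iter() if "lang" in e.attrib]

print(len(books), len(all_books), titles, langs)
```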
Predicates
Predicates are used to find a specific node
or a node that contains a specific value.
Predicates are always embedded in square
brackets.
Some path expressions with predicates
and the result
Path Expression                 Result
/bookstore/book[1]              Selects the first book element that is a child
                                of the bookstore element.
                                Note: in IE 5 to 9 the first node is [0], but
                                according to W3C it is [1]. To solve this
                                problem in IE, set the SelectionLanguage to
                                XPath. In JavaScript:
                                xml.setProperty("SelectionLanguage","XPath");
/bookstore/book[last()]         Selects the last book element that is a child
                                of the bookstore element
/bookstore/book[last()-1]       Selects the last but one book element that is
                                a child of the bookstore element
/bookstore/book[position()<3]   Selects the first two book elements that are
                                children of the bookstore element
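The positional predicates can be checked with a small Python sketch using the standard-library xml.etree.ElementTree. Its XPath subset accepts [1], [last()] and [last()-1], but not position()<3, so that case is emulated with a list slice. The four-book sample and the titles helper are assumptions for the illustration.

```python
import xml.etree.ElementTree as ET

# Assumed sample with four books, bookstore being the context node.
root = ET.fromstring(
    "<bookstore>"
    "<book><title>Everyday Italian</title></book>"
    "<book><title>Harry Potter</title></book>"
    "<book><title>XQuery Kick Start</title></book>"
    "<book><title>Learning XML</title></book>"
    "</bookstore>"
)

def titles(path):
    """Hypothetical helper: titles of the books selected by path."""
    return [b.findtext("title") for b in root.findall(path)]

first = titles("book[1]")                  # /bookstore/book[1]
last = titles("book[last()]")              # /bookstore/book[last()]
second_to_last = titles("book[last()-1]")  # /bookstore/book[last()-1]

# position()<3 is outside ElementTree's subset; a slice keeps the same books.
first_two = [b.findtext("title") for b in root.findall("book")[:2]]

print(first, last, second_to_last, first_two)
```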
Some path expressions with predicates
and the result (2)
Path Expression                      Result
//title[@lang]                       Selects all the title elements that have an
                                     attribute named lang
//title[@lang='en']                  Selects all the title elements that have a "lang"
                                     attribute with a value of "en"
/bookstore/book[price>35.00]         Selects all the book elements of the bookstore
                                     element that have a price element with a value
                                     greater than 35.00
/bookstore/book[price>35.00]/title   Selects all the title elements of the book
                                     elements of the bookstore element that have a
                                     price element with a value greater than 35.00
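As a rough illustration, the attribute predicates run directly in Python's standard-library xml.etree.ElementTree, while the numeric comparison price>35.00 is outside its XPath subset and is emulated here with a Python comparison. The two-book sample is an assumption.

```python
import xml.etree.ElementTree as ET

# Assumed sample: two books taken from the bookstore example.
root = ET.fromstring(
    "<bookstore>"
    "<book><title lang='en'>Harry Potter</title><price>29.99</price></book>"
    "<book><title lang='en'>Learning XML</title><price>39.95</price></book>"
    "</bookstore>"
)

with_lang = [t.text for t in root.findall(".//title[@lang]")]      # //title[@lang]
english = [t.text for t in root.findall(".//title[@lang='en']")]   # //title[@lang='en']

# /bookstore/book[price>35.00], emulated: convert the price text to a number.
expensive = [b for b in root.findall("book")
             if float(b.findtext("price")) > 35.00]
# /bookstore/book[price>35.00]/title
expensive_titles = [b.findtext("title") for b in expensive]

print(with_lang, english, expensive_titles)
```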
Selecting Unknown Nodes
XPath wildcards can be used to select unknown XML nodes.
Wildcard   Description
*          Matches any element node
@*         Matches any attribute node
node()     Matches any node of any kind
Some path expressions and their result:
Path Expression   Result
/bookstore/*      Selects all the child element nodes of the bookstore
                  element
//*               Selects all elements in the document
//title[@*]       Selects all title elements which have at least one attribute
                  of any kind
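The wildcard expressions can be approximated in Python with the standard-library xml.etree.ElementTree. This is a sketch under assumptions: the one-book sample is illustrative, and //title[@*] is emulated by testing whether an element has any attribute.

```python
import xml.etree.ElementTree as ET

# Assumed sample document.
root = ET.fromstring(
    "<bookstore><book>"
    "<title lang='en'>Harry Potter</title><price>29.99</price>"
    "</book></bookstore>"
)

child_elements = [e.tag for e in root.findall("*")]   # /bookstore/*
all_elements = [e.tag for e in root.iter()]           # //* (includes bookstore itself)

# //title[@*], emulated: keep title elements with at least one attribute.
titles_with_attrs = [t.text for t in root.iter("title") if t.attrib]

print(child_elements, all_elements, titles_with_attrs)
```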
Selecting Several Paths
The | operator in an XPath expression
selects several paths at once (the union of their results).
Path Expression                     Result
//book/title | //book/price         Selects all the title AND price elements of all
                                    book elements
//title | //price                   Selects all the title AND price elements in the
                                    document
/bookstore/book/title | //price     Selects all the title elements of the book
                                    elements of the bookstore element AND all
                                    the price elements in the document
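Python's standard-library xml.etree.ElementTree has no | operator, so the following sketch emulates //title | //price by walking the tree once and keeping both element names. The result is in document order without duplicates, as a node-set union would be. The sample data is an assumption.

```python
import xml.etree.ElementTree as ET

# Assumed sample: two books taken from the bookstore example.
root = ET.fromstring(
    "<bookstore>"
    "<book><title>Harry Potter</title><price>29.99</price></book>"
    "<book><title>Learning XML</title><price>39.95</price></book>"
    "</bookstore>"
)

# //title | //price, emulated with a single traversal in document order.
wanted = {"title", "price"}
union = [e.text for e in root.iter() if e.tag in wanted]

print(union)
```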
Location Path Expression
A location path consists of one or more
steps, each separated by a slash
A location path can be absolute or relative.
◦ An absolute location path starts with a slash "/"
◦ A relative location path does not.
An absolute location path:
/step/step/...
A relative location path:
step/step/...
Step
Each step is evaluated against the nodes in
the current node-set.
A step consists of:
◦ an axis (defines the tree-relationship between the
selected nodes and the current node)
◦ a node-test (identifies a node within an axis)
◦ zero or more predicates (to further refine the
selected node-set)
The syntax for a location step is:
axisname::nodetest[predicate1]… [predicateN]
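To make the three parts of a step concrete, here is a deliberately simplified Python sketch that splits a step written in the unabbreviated syntax into axis, node test, and predicates. The parse_step function and its regular expression are hypothetical. They do not handle nested brackets, brackets inside string literals, or abbreviations such as @.

```python
import re

# Simplified grammar for one location step: [axis::]nodetest[pred]...[pred]
STEP = re.compile(r"""
    ^(?:(?P<axis>[a-z-]+)::)?       # optional axis name followed by ::
    (?P<test>[^\[\]]+)              # node test
    (?P<preds>(?:\[[^\]]*\])*)$     # zero or more predicates
""", re.VERBOSE)

def parse_step(step):
    """Split a step into (axis, node test, list of predicates)."""
    match = STEP.match(step)
    if match is None:
        raise ValueError(f"not a simple location step: {step!r}")
    axis = match.group("axis") or "child"   # child is the default axis
    predicates = re.findall(r"\[([^\]]*)\]", match.group("preds"))
    return axis, match.group("test"), predicates

print(parse_step("child::book[price>35][1]"))
print(parse_step("book"))
```

The second call shows the default: a step without an explicit axis is read as a step on the child axis.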
XPath Axes
An axis represents a relationship to the context (current) node.
It is used to locate nodes relative to the context node in the tree.
The axis is optional in a step (the default is child).
AxisName             Result
ancestor             Selects all ancestors (parent, grandparent, etc.) of the
                     current node
ancestor-or-self     Selects all ancestors (parent, grandparent, etc.) of the
                     current node and the current node itself
attribute            Selects all attributes of the current node
child                Selects all children of the current node
descendant           Selects all descendants (children, grandchildren, etc.) of the
                     current node
descendant-or-self   Selects all descendants (children, grandchildren, etc.) of the
                     current node and the current node itself
XPath Axes (2)
AxisName            Result
following           Selects everything in the document after the
                    closing tag of the current node
following-sibling   Selects all siblings after the current node
namespace           Selects all namespace nodes of the current node
parent              Selects the parent of the current node
preceding           Selects all nodes that appear before the current
                    node in the document, except ancestors,
                    attribute nodes and namespace nodes
preceding-sibling   Selects all siblings before the current node
self                Selects the current node
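Python's standard-library xml.etree.ElementTree does not implement axes, but a few of them can be emulated to show what they select. In this sketch the parent_of map and the two sibling functions are assumptions introduced for illustration. Results are returned in document order.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<bookstore><book>"
    "<title>Harry Potter</title><author>J K. Rowling</author>"
    "<year>2005</year><price>29.99</price>"
    "</book></bookstore>"
)

# Hypothetical helper map: child element -> parent element.
parent_of = {child: parent for parent in root.iter() for child in parent}

def following_sibling(node):
    """Emulate following-sibling::* for an element."""
    parent = parent_of.get(node)
    if parent is None:          # the root element has no siblings
        return []
    siblings = list(parent)
    return siblings[siblings.index(node) + 1:]

def preceding_sibling(node):
    """Emulate preceding-sibling::* for an element."""
    parent = parent_of.get(node)
    if parent is None:
        return []
    siblings = list(parent)
    return siblings[:siblings.index(node)]

author = root.find("book/author")              # the context node
after = [e.tag for e in following_sibling(author)]
before = [e.tag for e in preceding_sibling(author)]
# descendant-or-self::* of the book element
book_and_below = [e.tag for e in root.find("book").iter()]

print(after, before, book_and_below)
```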
Filters (node tests)
A filter selects the kind of node of interest on the
chosen axis (any node, any element, a
specific element, comments, etc.)
Mandatory: it describes the subset of
nodes kept from the selected axis
Filters
Two ways to filter the nodes of an axis:
◦ By their name
  For nodes that have a name (element, attribute,
  processing instruction)
  * : any name
◦ By their type
  text() : text nodes
  comment() : comment nodes
  processing-instruction() : processing-instruction nodes
  node() : all node types
Predicates
Are optional
Describe additional filtering conditions
(combined by logical operators) that the
nodes must satisfy
Select nodes among those retained by the
filter on the axis
Predicates
A predicate is a Boolean expression made of tests
connected by the logical operators "and" and
"or"
◦ Negation: by the not() function
Test: an elementary Boolean expression
◦ Comparison
◦ Boolean function call
◦ Path expression converted to Boolean
  (a node-set is false if it is empty, otherwise true)
XPath examples with axes
Example                  Result
child::book              Selects all book nodes that are children of the current
                         node
attribute::lang          Selects the lang attribute of the current node
child::*                 Selects all element children of the current node
attribute::*             Selects all attributes of the current node
child::text()            Selects all text node children of the current node
child::node()            Selects all children of the current node
descendant::book         Selects all book descendants of the current node
ancestor::book           Selects all book ancestors of the current node
ancestor-or-self::book   Selects all book ancestors of the current node, and
                         the current node as well if it is a book node
child::*/child::price    Selects all price grandchildren of the current node
XPath Standard Functions
Recent XPath versions include over 200 built-in functions
(XPath 1.0 defines 27).
There are functions for string values,
numeric values, booleans, date and time
comparison, node manipulation, sequence
manipulation, and much more.
Today XPath expressions can also be used in
JavaScript, Java, XML Schema, PHP, Python, C
and C++, and lots of other languages.
XPath Standard Functions
There are many functions; here are some of the most important:
For nodes
◦ count(expr): number of nodes in the set produced by the
expression (expr)
◦ name(): name of the context node
◦ local-name(), namespace-uri(): the components of a name that has a namespace
For strings
◦ concat(ch1, ch2, …): concatenation
◦ contains(ch1, ch2): checks if ch1 contains ch2
◦ substring(ch, pos, l): extracts from ch the substring of length l
starting at position pos (positions start at 1)
◦ string-length(ch): string length
XPath Standard Functions
For Booleans
◦ true(), false(): true/false values
◦ not(expr): negation of a logical expression
For numerics
◦ floor(n), ceiling(n), round(n): rounding functions,
applied to a number or to the value of a node
◦ sum(expr), avg(expr): sum, average of the numerical
values of the nodes of the set produced by the
expression (expr); avg() requires XPath 2.0 or later
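The behaviour of several of these functions can be mirrored in plain Python. This is a sketch, not an XPath engine: xpath_round and xpath_substring are hypothetical helpers written for integer and ordinary finite arguments, and the two-book sample is an assumption.

```python
import math
import xml.etree.ElementTree as ET

# Assumed sample: two books taken from the bookstore example.
root = ET.fromstring(
    "<bookstore>"
    "<book><title>Harry Potter</title><price>29.99</price></book>"
    "<book><title>Learning XML</title><price>39.95</price></book>"
    "</bookstore>"
)
prices = [float(p.text) for p in root.findall("book/price")]

count = len(prices)          # count(/bookstore/book/price)
total = sum(prices)          # sum(/bookstore/book/price)
average = total / count      # avg(/bookstore/book/price), XPath 2.0 or later

def xpath_round(n):
    """round(n): nearest integer, ties go toward positive infinity."""
    return math.floor(n + 0.5)

def xpath_substring(s, pos, length):
    """substring(s, pos, length) for integer arguments; positions start at 1."""
    start = max(pos, 1) - 1
    end = max(pos + length - 1, 0)
    return s[start:end]

title = root.findtext("book/title")
has_potter = "Potter" in title                 # contains(title, 'Potter')
title_length = len(title)                      # string-length(title)
surname = xpath_substring(title, 7, 6)         # substring(title, 7, 6)

print(count, round(total, 2), xpath_round(2.5), surname)
```

Note that Python's built-in round(2.5) gives 2, whereas XPath's round(2.5) gives 3, which is why a separate helper is used.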
XPath Standard Functions
Some functions take no parameters
but depend on the current context
position(): the number of the current node in
the list of considered nodes;
last(): the number of the last node in the list of
considered nodes, that is, the size of the list.
XPath Operators
An XPath expression returns either a node-set, a string, a
Boolean, or a number.
Operator  Description                    Example
|         Union of two node-sets         //book | //cd
+         Addition                       6 + 4
-         Subtraction                    6 - 4
*         Multiplication                 6 * 4
div       Division                       8 div 4
=         Equal                          price=9.80
!=        Not equal                      price!=9.80
<         Less than                      price<9.80
<=        Less than or equal to          price<=9.80
>         Greater than                   price>9.80
>=        Greater than or equal to       price>=9.80
or        Logical or                     price=9.80 or price=9.70
and       Logical and                    price>9.00 and price<9.90
mod       Modulus (division remainder)   5 mod 2
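Two of the arithmetic operators differ slightly from their usual Python counterparts, which the following sketch illustrates. The functions xpath_div and xpath_mod are hypothetical names. Division by zero, which yields an infinity or NaN in XPath 1.0, is not handled here.

```python
import math

def xpath_div(a, b):
    """div is floating-point division, even for whole numbers."""
    return a / b

def xpath_mod(a, b):
    """mod truncates, so the result takes the sign of the dividend."""
    return math.fmod(a, b)

print(xpath_div(8, 4))    # 8 div 4
print(xpath_mod(5, 2))    # 5 mod 2
print(xpath_mod(-5, 2))   # -5 mod 2, unlike Python's -5 % 2
```

Python's % operator follows the sign of the divisor, so -5 % 2 is 1, whereas -5 mod 2 in XPath is -1.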
Source
https://www.w3schools.com/
XPath Examples
<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
<book category="cooking">
<title lang="en">Everyday Italian</title>
<author>Giada De Laurentiis</author>
<year>2005</year>
<price>30.00</price>
</book>
<book category="children">
<title lang="en">Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
<book category="web">
<title lang="en">XQuery Kick Start</title>
<author>James McGovern</author>
<author>Per Bothner</author>
<author>Kurt Cagle</author>
<author>James Linn</author>
<author>Vaidyanathan Nagarajan</author>
<year>2003</year>
<price>49.99</price>
</book>
<book category="web">
<title lang="en">Learning XML</title>
<author>Erik T. Ray</author>
<year>2003</year>
<price>39.95</price>
</book>
</bookstore>
/bookstore/book/title selects all the title nodes:
<title lang="en">Everyday Italian</title>
<title lang="en">Harry Potter</title>
<title lang="en">XQuery Kick Start</title>
<title lang="en">Learning XML</title>
XPath Examples
/bookstore/book[1]/title <title lang="en">Everyday Italian</title>
/bookstore/book[2]/title <title lang="en">Harry Potter</title>
/bookstore/book[last()]/title <title lang="en">Learning XML</title>
XPath Examples
all the price nodes
/bookstore/book/price
the price nodes with a price higher than 35
/bookstore/book[price>35]/price
all the book nodes with a price higher than 35
/bookstore/book[price>35]
the title nodes of the books with a price higher than 35
/bookstore/book[price>35]/title
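These four queries can be checked against the bookstore document with a short Python sketch. It uses the standard-library xml.etree.ElementTree, whose XPath subset has no numeric comparison, so the hypothetical helper books_over emulates the predicate price>35. The document is abbreviated to titles and prices.

```python
import xml.etree.ElementTree as ET

# The bookstore document, abbreviated to the elements used by the queries.
BOOKSTORE = """<bookstore>
<book category="cooking"><title lang="en">Everyday Italian</title><price>30.00</price></book>
<book category="children"><title lang="en">Harry Potter</title><price>29.99</price></book>
<book category="web"><title lang="en">XQuery Kick Start</title><price>49.99</price></book>
<book category="web"><title lang="en">Learning XML</title><price>39.95</price></book>
</bookstore>"""

def books_over(root, limit):
    """Emulate /bookstore/book[price>limit] with a Python comparison."""
    return [b for b in root.findall("book")
            if float(b.findtext("price")) > limit]

root = ET.fromstring(BOOKSTORE)

all_prices = [p.text for p in root.findall("book/price")]     # /bookstore/book/price
expensive = books_over(root, 35)                              # /bookstore/book[price>35]
expensive_prices = [b.findtext("price") for b in expensive]   # .../price
expensive_titles = [b.findtext("title") for b in expensive]   # .../title

print(all_prices, expensive_prices, expensive_titles)
```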