
XML Documents - Xquery Xpath

XML documents contain elements and other markup in an orderly package. An XML document example provided shows a simple contact information document with name, company, and phone number elements. XML documents include a document prolog section with an XML declaration and optional document type declaration. The document also contains elements that divide the content into a hierarchical structure and can contain text and other elements.

Uploaded by

barwin raj

XML - Documents

An XML document is a basic unit of XML information composed of elements and
other markup in an orderly package. An XML document can contain a wide variety of
data, for example a database of numbers, numbers representing a molecular
structure, or a mathematical equation.
XML Document Example
A simple document is shown in the following example −
<?xml version = "1.0"?>
<contact-info>
<name>Tanmay Patil</name>
<company>TutorialsPoint</company>
<phone>(011) 123-4567</phone>
</contact-info>
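As a rough illustration of how a program sees this document, the sketch below parses it with Python's standard xml.etree.ElementTree module. The Python variable names (CONTACT_XML, fields) are illustrative assumptions and are not part of the original example.

```python
import xml.etree.ElementTree as ET

# The contact document from the example above, kept as bytes so that
# the parser itself handles the XML declaration.
CONTACT_XML = b"""<?xml version = "1.0"?>
<contact-info>
   <name>Tanmay Patil</name>
   <company>TutorialsPoint</company>
   <phone>(011) 123-4567</phone>
</contact-info>"""

# fromstring() returns the root element of the parsed document.
root = ET.fromstring(CONTACT_XML)

# Map each child element's tag to its text content.
fields = {child.tag: child.text for child in root}

print(root.tag)
for tag, text in fields.items():
    print(f"{tag}: {text}")
```

Iterating over an element yields its direct children in document order, which is why the three fields come out in the same order as in the source.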
The parts of an XML document are the document prolog and the document elements, described in the following sections.

Document Prolog Section


The document prolog comes at the top of the document, before the root element. This
section contains −
• XML declaration
• Document type declaration
Document Elements Section
Document Elements are the building blocks of XML. These divide the document into
a hierarchy of sections, each serving a specific purpose. You can separate a
document into multiple sections so that they can be rendered differently, or used by
a search engine. The elements can be containers, with a combination of text and
other elements.
XML – Declaration
XML declaration contains details that prepare an XML processor to parse the XML
document. It is optional, but when used, it must appear in the first line of the XML
document.
Syntax
Following syntax shows XML declaration −
<?xml
version = "version_number"
encoding = "encoding_declaration"
standalone = "standalone_status"
?>
Each parameter consists of a parameter name, an equals sign (=), and a parameter
value inside quotes. The following table shows the above syntax in detail −

Parameter − Parameter_value − Parameter_description

version − 1.0 − Specifies the version of the XML standard used.

encoding − UTF-8, UTF-16, ISO-10646-UCS-2, ISO-10646-UCS-4, ISO-8859-1 to ISO-8859-9, ISO-2022-JP, Shift_JIS, EUC-JP − Defines the character encoding used in the document. UTF-8 is the default encoding.

standalone − yes or no − Tells the parser whether the document relies on information from an external source, such as an external document type definition (DTD), for its content. The default value is no. Setting it to yes tells the processor that no external declarations are required for parsing the document.
Rules
An XML declaration should abide by the following rules −
• If the XML declaration is present, it must be placed on the first line of the
XML document.
• If the XML declaration is included, it must contain the version number attribute.
• The parameter names and values are case-sensitive.
• The names are always in lower case.
• The order of the parameters is important. The correct order is: version,
encoding and standalone.
• Either single or double quotes may be used.
• The XML declaration has no closing tag, i.e. there is no </?xml>.
XML Declaration Examples
Following are a few examples of XML declarations −
An XML declaration with no parameters is not well-formed, because the version number is mandatory −
<?xml ?>
XML declaration with version definition −
<?xml version = "1.0" ?>
XML declaration with all parameters defined −
<?xml version = "1.0" encoding = "UTF-8" standalone = "no" ?>
XML declaration with all parameters defined in single quotes −
<?xml version = '1.0' encoding = 'iso-8859-1' standalone = 'no' ?>
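The declaration rules can be checked with a parser. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module (a non-validating parser built on Expat); the helper name is_well_formed and the sample labels are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: bytes) -> bool:
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# A tiny root element appended to each declaration under test.
BODY = b"<contact-info/>"

samples = {
    "version only": b'<?xml version = "1.0" ?>' + BODY,
    "all parameters": b'<?xml version = "1.0" encoding = "UTF-8" standalone = "no" ?>' + BODY,
    "single quotes": b"<?xml version = '1.0' encoding = 'iso-8859-1' standalone = 'no' ?>" + BODY,
    "no declaration": BODY,
    "not on first line": b'\n<?xml version = "1.0" ?>' + BODY,
    "missing version": b'<?xml encoding = "UTF-8" ?>' + BODY,
    "wrong order": b'<?xml encoding = "UTF-8" version = "1.0" ?>' + BODY,
    "missing question mark": b'<?xml version = "1.0">' + BODY,
}

# Evaluate every sample once and keep the verdicts by label.
results = {label: is_well_formed(doc) for label, doc in samples.items()}

for label, ok in results.items():
    print(f"{label}: {'accepted' if ok else 'rejected'}")
```

The first four samples follow the rules (the declaration itself is optional) and the last four each break one rule, so a conforming parser is expected to reject them.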

Document Type Declaration


The XML document type declaration (DOCTYPE) associates a document with a document
type definition, commonly known as a DTD, which is a way to describe an XML language
precisely. DTDs check the vocabulary and the validity of the structure of XML
documents against the grammatical rules of the appropriate XML language.
An XML DTD can either be specified inside the document, or it can be kept in a
separate document and then linked separately.
Syntax
Basic syntax of a DTD is as follows −
<!DOCTYPE element DTD identifier
[
declaration1
declaration2
........
]>
In the above syntax,
• The DTD starts with the <!DOCTYPE delimiter.
• element tells the parser to parse the document from the specified root
element.
• DTD identifier is an identifier for the document type definition, which may be
the path to a file on the system or a URL to a file on the internet. If the DTD
points to an external path, it is called the External Subset.
• The square brackets [ ] enclose an optional list of declarations
called the Internal Subset.
Internal DTD
A DTD is referred to as an internal DTD if the elements are declared within the XML file.
Because an internal DTD works independently of any external source, the standalone
attribute in the XML declaration can be set to yes.
Syntax
Following is the syntax of internal DTD −
<!DOCTYPE root-element [element-declarations]>
where root-element is the name of root element and element-declarations is where
you declare the elements.
Example
Following is a simple example of internal DTD −
<?xml version = "1.0" encoding = "UTF-8" standalone = "yes" ?>
<!DOCTYPE address [
<!ELEMENT address (name,company,phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT company (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
]>
<address>
<name>Tanmay Patil</name>
<company>TutorialsPoint</company>
<phone>(011) 123-4567</phone>
</address>
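To see what a parser extracts from an internal DTD, the sketch below uses Python's standard xml.parsers.expat module. Expat is a non-validating parser, so it reports the declarations without checking the document against them. The function name inspect_dtd and the dictionary keys are hypothetical names introduced for this illustration.

```python
import xml.parsers.expat

# The internal-DTD example from above.
INTERNAL_DTD_DOC = b"""<?xml version = "1.0" encoding = "UTF-8" standalone = "yes" ?>
<!DOCTYPE address [
   <!ELEMENT address (name,company,phone)>
   <!ELEMENT name (#PCDATA)>
   <!ELEMENT company (#PCDATA)>
   <!ELEMENT phone (#PCDATA)>
]>
<address>
   <name>Tanmay Patil</name>
   <company>TutorialsPoint</company>
   <phone>(011) 123-4567</phone>
</address>"""


def inspect_dtd(document: bytes) -> dict:
    """Collect the DOCTYPE name and the declared element names."""
    info = {"root": None, "internal_subset": False, "declared": []}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Called once when the parser reaches <!DOCTYPE ...>.
        info["root"] = name
        info["internal_subset"] = bool(has_internal_subset)

    def on_element_decl(name, model):
        # Called once per <!ELEMENT ...> declaration, in document order.
        info["declared"].append(name)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.ElementDeclHandler = on_element_decl
    parser.Parse(document, True)
    return info


info = inspect_dtd(INTERNAL_DTD_DOC)
print(info)
```

Checking the document against these declarations requires a validating parser, which the Python standard library does not provide.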

Start Declaration − Begin the XML declaration with the following statement.
<?xml version = "1.0" encoding = "UTF-8" standalone = "yes" ?>
DTD − Immediately after the XML header, the document type declaration follows,
commonly referred to as the DOCTYPE −
<!DOCTYPE address [
The DOCTYPE declaration has an exclamation mark (!) at the start of the element
name. The DOCTYPE informs the parser that a DTD is associated with this XML
document.
DTD Body − The DOCTYPE declaration is followed by body of the DTD, where you
declare elements, attributes, entities, and notations.
<!ELEMENT address (name,company,phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT company (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
Several elements are declared here that make up the vocabulary of the <address>
document. <!ELEMENT name (#PCDATA)> defines the element name to be of type
"#PCDATA". Here #PCDATA means parsed character data, that is, text that the parser
examines for markup.
End Declaration − Finally, the declaration section of the DTD is closed using a closing
bracket and a closing angle bracket (]>). This effectively ends the definition, and
thereafter, the XML document follows immediately.
Rules
• The document type declaration must appear at the start of the document
(preceded only by the XML header) − it is not permitted anywhere else within
the document.
• Similar to the DOCTYPE declaration, the element declarations must start with
an exclamation mark.
• The Name in the document type declaration must match the element type of
the root element.
External DTD
In an external DTD, elements are declared outside the XML file. They are accessed by
specifying the system identifier, which may be either a legal .dtd file or a valid URL.
Because the declarations come from an external source, the standalone attribute in
the XML declaration should be set to no.
Syntax
Following is the syntax for external DTD −
<!DOCTYPE root-element SYSTEM "file-name">
where file-name is the file with .dtd extension.
Example
The following example shows external DTD usage −
<?xml version = "1.0" encoding = "UTF-8" standalone = "no" ?>
<!DOCTYPE address SYSTEM "address.dtd">
<address>
<name>Tanmay Patil</name>
<company>TutorialsPoint</company>
<phone>(011) 123-4567</phone>
</address>
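The sketch below, which assumes Python's standard xml.parsers.expat and xml.etree.ElementTree modules, reads the system identifier from the DOCTYPE of this example. Both parsers are non-validating and do not fetch address.dtd by default, so the document parses even though no such file exists here. The helper name doctype_of is hypothetical.

```python
import xml.etree.ElementTree as ET
import xml.parsers.expat

# The external-DTD example from above.
EXTERNAL_DTD_DOC = b"""<?xml version = "1.0" encoding = "UTF-8" standalone = "no" ?>
<!DOCTYPE address SYSTEM "address.dtd">
<address>
   <name>Tanmay Patil</name>
   <company>TutorialsPoint</company>
   <phone>(011) 123-4567</phone>
</address>"""


def doctype_of(document: bytes) -> dict:
    """Return the name and identifiers given in the DOCTYPE declaration."""
    found = {}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        found.update(
            root=name,
            system_id=system_id,
            public_id=public_id,
            internal_subset=bool(has_internal_subset),
        )

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(document, True)
    return found


doctype = doctype_of(EXTERNAL_DTD_DOC)

# ElementTree ignores the external subset and still builds the tree.
root = ET.fromstring(EXTERNAL_DTD_DOC)

print(doctype["root"], doctype["system_id"], root.tag)
```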

XML elements
XML elements can be defined as the building blocks of an XML document. Elements can
behave as containers to hold text, elements, attributes, media objects or all of these.
Each XML document contains one or more elements, the scope of which is delimited
either by start and end tags or, for empty elements, by an empty-element tag.
Syntax
Following is the syntax to write an XML element −
<element-name attribute1 attribute2>
....content
</element-name>
where,
• element-name is the name of the element. The name and its case must match in
the start and end tags.
• attribute1, attribute2 are attributes of the element separated by white space.
An attribute defines a property of the element. It associates a name with a
value, which is a string of characters. An attribute is written as −
name = "value"
name is followed by an = sign and a string value inside double (" ") or single (' ')
quotes.
Empty Element
An empty element (an element with no content) has the following syntax −
<name attribute1 attribute2.../>
Following is an example of an XML document using various XML elements −
<?xml version = "1.0"?>
<contact-info>
<address category = "residence">
<name>Tanmay Patil</name>
<company>TutorialsPoint</company>
<phone>(011) 123-4567</phone>
</address>
</contact-info>
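A short sketch of how nested elements and their attributes can be read, assuming Python's standard xml.etree.ElementTree module. The empty-element sample (a phone element with a type attribute) is a made-up illustration and does not come from the original example.

```python
import xml.etree.ElementTree as ET

# The nested contact document from the example above.
NESTED_XML = b"""<?xml version = "1.0"?>
<contact-info>
   <address category = "residence">
      <name>Tanmay Patil</name>
      <company>TutorialsPoint</company>
      <phone>(011) 123-4567</phone>
   </address>
</contact-info>"""

root = ET.fromstring(NESTED_XML)

# find() returns the first direct child with the given name.
address = root.find("address")

# get() reads an attribute value from the element's start tag.
category = address.get("category")

# The address element is a container: list its child elements.
children = [(child.tag, child.text) for child in address]

# Hypothetical empty element: it has attributes but no content.
empty = ET.fromstring('<phone type = "none"/>')

print(category, children, len(empty), empty.text)
```

An empty-element tag and a start tag immediately followed by an end tag are equivalent to the parser: in both cases the element has no children and no text.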
XML Elements Rules
The following rules must be followed for XML elements −
• An element name can contain any alphanumeric characters. The only
punctuation marks allowed in names are the hyphen (-), underscore (_) and
period (.).
• Names are case sensitive. For example, Address, address, and ADDRESS are
different names.
• The names in the start and end tags of an element must be identical.
• An element, which is a container, can contain text or elements as seen in the
above example.
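These naming rules can be tried out against a parser. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; the helper name parses and the sample element names are hypothetical.

```python
import xml.etree.ElementTree as ET


def parses(text: str) -> bool:
    """Return True if the text is accepted as a well-formed document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Start and end tag names match, including case.
matching = parses("<address></address>")

# Names are case sensitive, so Address and address do not match.
mismatched_case = parses("<Address></address>")

# Hyphen, underscore and period are allowed inside a name.
punctuation = parses("<contact-info><first.name_1/></contact-info>")

# Not stated in the rules above, but part of the XML specification:
# a name may not begin with a digit.
leading_digit = parses("<1name/>")

print(matching, mismatched_case, punctuation, leading_digit)
```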
XML - Attributes
Attributes are part of XML elements. An element can have multiple unique
attributes. Attributes give more information about XML elements. To be more
precise, they define properties of elements. An XML attribute is always a name-value
pair.

Syntax

An XML attribute has the following syntax −


<element-name attribute1 attribute2>
....content..
</element-name>
where attribute1 and attribute2 have the following form −
name = "value"
The value has to be in double (" ") or single (' ') quotes.
Here, attribute1 and attribute2 are unique attribute labels.
Attributes are used to add a unique label to an element, place the label in a category,
add a Boolean flag, or otherwise associate it with some string of data. Following
example demonstrates the use of attributes −
<?xml version = "1.0" encoding = "UTF-8"?>
<!DOCTYPE garden [
<!ELEMENT garden (plants)*>
<!ELEMENT plants (#PCDATA)>
<!ATTLIST plants category CDATA #REQUIRED>
]>

<garden>
<plants category = "flowers" />
<plants category = "shrubs">
</plants>
</garden>
Attributes are used to distinguish among elements of the same name, when you do
not want to create a new element for every situation. Hence, the use of an attribute
can add a little more detail in differentiating two or more similar elements.
In the above example, we have categorized the plants by including the attribute category
and assigning a different value to each of the elements. Hence, we have two
categories of plants, one flowers and the other shrubs. Thus, we have two plants
elements with different attribute values.
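As an illustration of telling same-named elements apart by attribute, the sketch below reads the category values from the garden document. It assumes Python's standard xml.etree.ElementTree module, which parses the internal DTD but does not validate against it; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# The garden document from the example above.
GARDEN_XML = b"""<?xml version = "1.0" encoding = "UTF-8"?>
<!DOCTYPE garden [
   <!ELEMENT garden (plants)*>
   <!ELEMENT plants (#PCDATA)>
   <!ATTLIST plants category CDATA #REQUIRED>
]>
<garden>
   <plants category = "flowers" />
   <plants category = "shrubs">
   </plants>
</garden>"""

garden = ET.fromstring(GARDEN_XML)

# Both elements are named "plants"; the attribute tells them apart.
categories = [plants.get("category") for plants in garden.findall("plants")]

# Select only the plants element whose category is "shrubs".
shrubs = garden.findall("plants[@category='shrubs']")

print(categories, len(shrubs))
```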

Attribute Types

The following table lists the types of attributes −

Attribute Type − Description

StringType − It takes any literal string as a value. CDATA is a StringType. CDATA is
character data. This means any string of non-markup characters is a
legal part of the attribute.

TokenizedType − This is a more constrained type. The validity constraints noted in the
grammar are applied after the attribute value is normalized. The
TokenizedType attributes are given as −
• ID − It is used to specify the element as unique.
• IDREF − It is used to reference an ID that has been named for
another element.
• IDREFS − It is used to reference a list of IDs of other elements.
• ENTITY − It indicates that the attribute will represent an
external entity in the document.
• ENTITIES − It indicates that the attribute will represent
external entities in the document.
• NMTOKEN − It is similar to CDATA with restrictions on what
data can be part of the attribute.
• NMTOKENS − It is a list of NMTOKEN values, each with the same
restrictions on what data can be part of the attribute.

EnumeratedType − This has a list of predefined values in its declaration, out of which it
must take one value. There are two types of enumerated attribute −
• NotationType − It declares that an element will be referenced
to a NOTATION declared somewhere else in the XML
document.
• Enumeration − Enumeration allows you to define a specific list
of values that the attribute value must match.

Element Attribute Rules

The following rules must be followed for attributes −
• An attribute name must not appear more than once in the same start-tag or
empty-element tag.
• For the document to be valid, an attribute must be declared in the Document
Type Definition (DTD) using an Attribute-List Declaration.
• Attribute values must not contain direct or indirect entity references to
external entities.
• The replacement text of any entity referred to directly or indirectly in an
attribute value must not contain a less-than sign (<).
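Two of these rules are well-formedness rules, so even a non-validating parser enforces them. The sketch below checks them with Python's standard xml.etree.ElementTree module; the helper name parses and the sample values are hypothetical. The rule about declaring attributes in a DTD is a validity rule and is not checked by this parser.

```python
import xml.etree.ElementTree as ET


def parses(text: str) -> bool:
    """Return True if the text is accepted as a well-formed document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# The same attribute name twice in one tag is rejected.
duplicate = parses('<plants category = "flowers" category = "shrubs"/>')

# A literal less-than sign inside an attribute value is rejected.
raw_less_than = parses('<plants category = "a<b"/>')

# The predefined entity &lt; is the accepted way to write it.
escaped = ET.fromstring('<plants category = "a&lt;b"/>')
escaped_value = escaped.get("category")

# An undeclared attribute is still well-formed without a DTD.
undeclared = parses('<plants color = "red"/>')

print(duplicate, raw_less_than, escaped_value, undeclared)
```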
XPath
XPath uses path expressions to select nodes or node-sets in an XML document. The node
is selected by following a path or steps.

<?xml version="1.0" encoding="UTF-8"?>

<bookstore>

<book>
  <title lang="en">Harry Potter</title>
  <price>29.99</price>
</book>

<book>
  <title lang="en">Learning XML</title>
  <price>39.95</price>
</book>

</bookstore>
Selecting Nodes
The most useful path expressions are listed below:

Expression − Description

nodename − Selects all nodes with the name "nodename"

/ − Selects from the root node

// − Selects nodes in the document from the current node that match the selection no matter where they are

. − Selects the current node

.. − Selects the parent of the current node

@ − Selects attributes
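The expressions above can be tried with Python's standard xml.etree.ElementTree module, which supports only a limited subset of XPath. This is a sketch under that assumption: paths are evaluated relative to an element, so the absolute forms / and // are written as . and .// here, and attribute nodes cannot be selected directly, so elements are filtered by attribute and the value is then read with get().

```python
import xml.etree.ElementTree as ET

# The small bookstore document shown at the start of the XPath section.
BOOKSTORE_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
<book>
  <title lang="en">Harry Potter</title>
  <price>29.99</price>
</book>
<book>
  <title lang="en">Learning XML</title>
  <price>39.95</price>
</book>
</bookstore>"""

bookstore = ET.fromstring(BOOKSTORE_XML)

# nodename: child elements named "book" of the current node.
books = bookstore.findall("book")

# //: title elements anywhere below the current node.
titles = [title.text for title in bookstore.findall(".//title")]

# . selects the current node itself.
current = bookstore.find(".")

# .. selects the parent, here the parents of all title elements.
parents = bookstore.findall(".//title/..")

# @: keep elements that carry a lang attribute, then read its value.
langs = [title.get("lang") for title in bookstore.findall(".//title[@lang]")]

print(len(books), titles, current.tag, [p.tag for p in parents], langs)
```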

XQuery
XQuery is a language for finding and extracting elements and attributes from XML
documents.
• XQuery is the language for querying XML data
• XQuery for XML is like SQL for databases
• XQuery is built on XPath expressions
• XQuery is supported by many major database systems
• XQuery is a W3C Recommendation

XQuery Example
for $x in doc("books.xml")/bookstore/book
where $x/price>30
order by $x/title
return $x/title
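To make the four clauses of this query concrete, the sketch below writes a rough equivalent in Python with the standard xml.etree.ElementTree module. It is not an XQuery engine. It assumes an abbreviated copy of the books.xml data used in this text (titles and prices only) embedded as a string instead of a file, and it returns the title text, whereas the query returns the title elements.

```python
import xml.etree.ElementTree as ET

# Abbreviated books.xml: only the title and price of each book.
BOOKS_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
<book category="COOKING">
  <title lang="en">Everyday Italian</title>
  <price>30.00</price>
</book>
<book category="CHILDREN">
  <title lang="en">Harry Potter</title>
  <price>29.99</price>
</book>
<book category="WEB">
  <title lang="en">XQuery Kick Start</title>
  <price>49.99</price>
</book>
<book category="WEB">
  <title lang="en">Learning XML</title>
  <price>39.95</price>
</book>
</bookstore>"""

bookstore = ET.fromstring(BOOKS_XML)

# for $x in doc("books.xml")/bookstore/book
books = bookstore.findall("book")

# where $x/price>30
selected = [book for book in books if float(book.findtext("price")) > 30]

# order by $x/title
selected.sort(key=lambda book: book.findtext("title"))

# return $x/title (here: the text of each title)
result = [book.findtext("title") for book in selected]

print(result)
```

Everyday Italian costs exactly 30.00, so the strict comparison leaves it out along with Harry Potter.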

The XML Example Document


We will use the following XML document in the examples below.
"books.xml":
<?xml version="1.0" encoding="UTF-8"?>

<bookstore>

<book category="COOKING">
  <title lang="en">Everyday Italian</title>
  <author>Giada De Laurentiis</author>
  <year>2005</year>
  <price>30.00</price>
</book>

<book category="CHILDREN">
  <title lang="en">Harry Potter</title>
  <author>J K. Rowling</author>
  <year>2005</year>
  <price>29.99</price>
</book>

<book category="WEB">
  <title lang="en">XQuery Kick Start</title>
  <author>James McGovern</author>
  <author>Per Bothner</author>
  <author>Kurt Cagle</author>
  <author>James Linn</author>
  <author>Vaidyanathan Nagarajan</author>
  <year>2003</year>
  <price>49.99</price>
</book>

<book category="WEB">
  <title lang="en">Learning XML</title>
  <author>Erik T. Ray</author>
  <year>2003</year>
  <price>39.95</price>
</book>

</bookstore>

How to Select Nodes From "books.xml"?

Functions

XQuery uses functions to extract data from XML documents.

The doc() function is used to open the "books.xml" file:

doc("books.xml")

Path Expressions
XQuery uses path expressions to navigate through elements in an XML document.

In the table below we have listed some path expressions and the result of the expressions:
Path Expression − Result

bookstore − Selects all nodes with the name "bookstore"

/bookstore − Selects the root element bookstore. Note: If the path starts with a slash ( / ) it always represents an absolute path to an element!

bookstore/book − Selects all book elements that are children of bookstore

//book − Selects all book elements no matter where they are in the document

bookstore//book − Selects all book elements that are descendants of the bookstore element, no matter where they are under the bookstore element

//@lang − Selects all attributes that are named lang

Predicates
Predicates are used to find a specific node or a node that contains a specific value.
Predicates are always embedded in square brackets.
In the table below we have listed some path expressions with predicates and the result of the
expressions:

Path Expression − Result

/bookstore/book[1] − Selects the first book element that is the child of the bookstore element

/bookstore/book[last()] − Selects the last book element that is the child of the bookstore element

/bookstore/book[last()-1] − Selects the last but one book element that is the child of the bookstore element

/bookstore/book[position()<3] − Selects the first two book elements that are children of the bookstore element

//title[@lang] − Selects all the title elements that have an attribute named lang

//title[@lang='en'] − Selects all the title elements that have a "lang" attribute with a value of "en"

/bookstore/book[price>35.00] − Selects all the book elements of the bookstore element that have a price element with a value greater than 35.00

/bookstore/book[price>35.00]/title − Selects all the title elements of the book elements of the bookstore element that have a price element with a value greater than 35.00
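Some of these predicates can be tried with Python's standard xml.etree.ElementTree module. This is a sketch under the assumption that only the module's XPath subset is available: positional and attribute predicates work as written (relative to the root element), while position()<3 and numeric comparisons such as price>35.00 are not supported there and are emulated in plain Python. The data is an abbreviated copy of books.xml (titles and prices only).

```python
import xml.etree.ElementTree as ET

# Abbreviated books.xml: only the title and price of each book.
BOOKS_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
<book><title lang="en">Everyday Italian</title><price>30.00</price></book>
<book><title lang="en">Harry Potter</title><price>29.99</price></book>
<book><title lang="en">XQuery Kick Start</title><price>49.99</price></book>
<book><title lang="en">Learning XML</title><price>39.95</price></book>
</bookstore>"""

bookstore = ET.fromstring(BOOKS_XML)

# book[1], book[last()] and book[last()-1] are supported directly.
first = bookstore.find("book[1]/title").text
last = bookstore.find("book[last()]/title").text
last_but_one = bookstore.find("book[last()-1]/title").text

# title[@lang='en'] is supported directly.
english_titles = bookstore.findall(".//title[@lang='en']")

# position()<3 is emulated with a slice of the child list.
first_two = [book.findtext("title") for book in bookstore.findall("book")[:2]]

# price>35.00 is emulated with a numeric comparison in Python.
expensive = [
    book.findtext("title")
    for book in bookstore.findall("book")
    if float(book.findtext("price")) > 35.00
]

print(first, last, last_but_one, len(english_titles), first_two, expensive)
```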

The following path expression is used to select all the title elements in the "books.xml" file:

doc("books.xml")/bookstore/book/title

(/bookstore selects the bookstore element, /book selects all the book elements under the
bookstore element, and /title selects all the title elements under each book element)
The XQuery above will extract the following:

<title lang="en">Everyday Italian</title>
<title lang="en">Harry Potter</title>
<title lang="en">XQuery Kick Start</title>
<title lang="en">Learning XML</title>

Predicates
XQuery uses predicates to limit the extracted data from XML documents.

The following predicate is used to select all the book elements under the bookstore element
that have a price element with a value that is less than 30:

doc("books.xml")/bookstore/book[price<30]

The XQuery above will extract the following:

<book category="CHILDREN">
  <title lang="en">Harry Potter</title>
  <author>J K. Rowling</author>
  <year>2005</year>
  <price>29.99</price>
</book>
