Lecture 09

The document discusses XML syntax, semantics, and its use as semistructured data. XML is presented as a flexible syntax for data that can be used for configuration files, document markup, and data exchange. Key XML concepts discussed include elements, attributes, and its tree-like structure.

Uploaded by

K N
Copyright
© All Rights Reserved
Lecture 9

XML/XPath/XQuery

Tuesday, May 26, 2009

XML Outline
•  XML
–  Syntax
–  Semistructured data
–  DTDs
•  XPath
•  XQuery

Additional Readings on XML
•  http://www.w3.org/XML/
–  Main source on XML, but hard to read

•  http://www.w3.org/TR/xquery/
–  Authority on XQuery

•  http://www.galaxquery.org/
–  An easy-to-use, complete XQuery implementation

Note: XML/XQuery is NOT covered in the textbook


XML
•  A flexible syntax for data
•  Used in:
–  Configuration files, e.g. Web.config
–  Replacement for binary formats (MS Word)
–  Document markup: e.g. XHTML
–  Data: data exchange, semistructured data
•  Roots: SGML, a very nasty language

We will study only XML as data


XML as Semistructured Data
•  Relational databases have a rigid schema
–  Schema evolution is costly
•  XML is flexible: semistructured data
–  Store data in XML
•  Warning: not in normal form! Not even 1NF

From HTML to XML

HTML describes the presentation


HTML
<h1> Bibliography </h1>
<p> <i> Foundations of Databases </i>
Abiteboul, Hull, Vianu
<br> Addison Wesley, 1995
<p> <i> Data on the Web </i>
Abiteboul, Buneman, Suciu
<br> Morgan Kaufmann, 1999

XML Syntax
<bibliography>
<book> <title> Foundations… </title>
<author> Abiteboul </author>
<author> Hull </author>
<author> Vianu </author>
<publisher> Addison Wesley </publisher>
<year> 1995 </year>
</book>

</bibliography>

XML describes the content
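To see the "content, not presentation" point from a program's side, here is a minimal sketch that parses the bibliography with Python's standard `xml.etree.ElementTree` module (our choice of tool, not something the lecture prescribes). The title is spelled out in full and the padding spaces inside tags are dropped so the text values come back clean.

```python
import xml.etree.ElementTree as ET

# The bibliography above, without the padding spaces inside the tags.
DOC = """<bibliography>
  <book>
    <title>Foundations of Databases</title>
    <author>Abiteboul</author>
    <author>Hull</author>
    <author>Vianu</author>
    <publisher>Addison Wesley</publisher>
    <year>1995</year>
  </book>
</bibliography>"""

root = ET.fromstring(DOC)        # parse; root is the <bibliography> element
book = root.find("book")         # first <book> child of the root
title = book.findtext("title")   # text content of <title>
authors = [a.text for a in book.findall("author")]  # repeated elements, in document order
year = int(book.findtext("year"))                  # all content is text; convert explicitly

print(title, authors, year)
```

The tags carry no types: the year is the string `"1995"` until the program converts it.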


XML Terminology
•  tags: book, title, author, …
•  start tag: <book>, end tag: </book>
•  elements: <book>…</book>, <author>…</author>
•  elements are nested
•  empty element: <red></red>, abbreviated <red/>
•  an XML document has a single root element

Well-formed XML document: its tags match and are properly nested

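A quick way to experiment with well-formedness is to hand candidate documents to a non-validating parser and see whether it accepts them. This sketch uses Python's `xml.etree.ElementTree`; `is_well_formed` is a hypothetical helper name, and the sample strings are illustrative.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """True if the parser accepts text as a single well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

nested_ok = is_well_formed("<book><title>DB</title></book>")  # tags match and nest
crossed = is_well_formed("<book><title>DB</book></title>")    # tags overlap
two_roots = is_well_formed("<red/><blue/>")                    # no single root element

# <red></red> and its abbreviation <red/> yield the same empty element.
same_empty = (ET.tostring(ET.fromstring("<red></red>"))
              == ET.tostring(ET.fromstring("<red/>")))
```

The parser also rejects the two-root case, so "single root element" is part of well-formedness, not just a convention.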
More XML: Attributes
<book price="55" currency="USD">
<title> Foundations of Databases </title>
<author> Abiteboul </author>
…
<year> 1995 </year>
</book>

Attributes vs. Elements
<book price="55" currency="USD">
<title> Foundations of DBs </title>
<author> Abiteboul </author>
…
<year> 1995 </year>
</book>

<book>
<title> Foundations of DBs </title>
<author> Abiteboul </author>
…
<year> 1995 </year>
<price> 55 </price>
<currency> USD </currency>
</book>

Attributes are an alternative way to represent data

Comparison

Elements          | Attributes
Ordered           | Unordered
May be repeated   | Must be unique
May be nested     | Must be atomic

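The comparison can be observed directly in a parser. In this sketch (Python's `xml.etree.ElementTree`, with illustrative data), attributes arrive as a name-to-value mapping on the element, a repeated attribute name is rejected outright, and repeated child elements are kept in document order.

```python
import xml.etree.ElementTree as ET

with_attrs = ET.fromstring(
    '<book price="55" currency="USD"><title>Foundations of DBs</title></book>')
with_elems = ET.fromstring(
    "<book><title>Foundations of DBs</title>"
    "<price>55</price><currency>USD</currency></book>")

price_a = with_attrs.get("price")       # attribute form: a lookup by name
price_e = with_elems.findtext("price")  # element form: text of a child element

# Attribute names must be unique on an element; a repeat is a well-formedness error.
try:
    ET.fromstring('<book price="55" price="60"/>')
    duplicate_ok = True
except ET.ParseError:
    duplicate_ok = False

# Elements may repeat, and their order is preserved.
authors = [a.text for a in ET.fromstring(
    "<book><author>Hull</author><author>Abiteboul</author></book>").findall("author")]
```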
XML vs. HTML
•  What are the differences between XML and HTML?

In class

That’s All!
•  That’s all you ever need to know about XML syntax
–  Optional type information can be given in the DTD or XSchema (later)
–  We’ll discuss some additional syntax in the next few slides, but that’s not essential
•  What is important for you to know: XML’s semantics
More Syntax: Oids and References
<person id="o555">
<name> Jane </name>
</person>

<person id="o456">
<name> Mary </name>
<mother idref="o555"/>
</person>

These are just keys/foreign keys, designed by someone who didn’t take 444.
Don’t use them: use your own foreign keys instead.

Oids and references in XML are just syntax

More Syntax: CDATA Section
•  Syntax: <![CDATA[ .....any text here...]]>

•  Example:

<example>
<![CDATA[ some text here </notAtag> <>]]>
</example>
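Parsing the example shows what CDATA buys you: everything inside the section comes back as plain character data, and the would-be tags are not turned into elements. A minimal check with Python's `xml.etree.ElementTree`:

```python
import xml.etree.ElementTree as ET

doc = "<example><![CDATA[ some text here </notAtag> <>]]></example>"
elem = ET.fromstring(doc)
text = elem.text          # CDATA content is returned as ordinary text
children = list(elem)     # </notAtag> was never parsed as markup, so there are no children
```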

More Syntax: Entity References
•  Syntax: &entityname;
•  Example:
<element> this is less than &lt; </element>
•  Some entities:
&lt;     <
&gt;     >
&amp;    &
&apos;   '
&quot;   "
&#38;    Unicode character given by its code point (38 is &)
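The parser replaces each entity or character reference with the character it denotes; writing text back out requires the reverse step. This sketch uses `xml.etree.ElementTree` for decoding and `xml.sax.saxutils.escape`, which rewrites `&`, `<` and `>`, for encoding.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Decoding: predefined entities and the character reference &#38; become characters.
parsed = ET.fromstring(
    "<element>this is less than &lt; &gt; &amp; &apos; &quot; &#38;</element>").text

# Encoding: escape() makes arbitrary text safe to place inside an element.
escaped = escape("a < b & c > d")
```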
More Syntax: Comments
•  Syntax: <!-- .... comment text ... -->

•  Yes, they are part of the data model!

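Whether a tool actually keeps comments in its tree is a separate question. Python's `xml.etree.ElementTree` drops them by default, but (since Python 3.8) a `TreeBuilder` created with `insert_comments=True` turns them into nodes. A small sketch with an illustrative document:

```python
import xml.etree.ElementTree as ET

doc = "<data><!-- generated by hand --><person/></data>"

# Default parsing: the comment is discarded.
default_root = ET.fromstring(doc)

# Keep comments as nodes of the tree.
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
root = ET.fromstring(doc, parser=parser)
comments = [node.text for node in root if node.tag is ET.Comment]
```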
XML Namespaces
•  name ::= [prefix:]localpart
•  the prefix is bound to a namespace, which is just a unique name

<book xmlns:bookStandard="www.isbn-org.org/def">
<bookStandard:title> … </bookStandard:title>
<bookStandard:publisher> . . .</bookStandard:publisher>

</book>

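A sketch of what a namespace-aware parser does with the example, using Python's `xml.etree.ElementTree`: the prefix disappears and each name becomes the pair (namespace, local part), written `{namespace}localpart`. The title text is illustrative; queries may bind any prefix they like to the same namespace name.

```python
import xml.etree.ElementTree as ET

NS = "www.isbn-org.org/def"   # namespace name from the slide
DOC = """<book xmlns:bookStandard="www.isbn-org.org/def">
<bookStandard:title>Foundations of Databases</bookStandard:title>
</book>"""

root = ET.fromstring(DOC)
child_tags = [child.tag for child in root]   # prefix replaced by {namespace}
title = root.findtext("isbn:title", namespaces={"isbn": NS})  # a different prefix works
```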
XML Semantics: a Tree!
<data>
<person id="o555">
<name> Mary </name>
<address>
<street>Maple</street>
<no> 345 </no>
<city> Seattle </city>
</address>
</person>
<person>
<name> John </name>
<address>Thailand</address>
<phone>23456</phone>
</person>
</data>

[Figure: the same document drawn as a tree, with element nodes (data, person, name, address, phone, street, no, city), an attribute node (id = o555), and text nodes as leaves (Mary, Maple, 345, Seattle, John, Thailand, 23456)]

Order matters!!!
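A pre-order walk lists the tree's nodes in document order, which is the order the slide insists matters. This is a minimal sketch with Python's `xml.etree.ElementTree`; `walk` is a hypothetical helper, and it skips ElementTree's `tail` text (text that follows a child element) to stay short.

```python
import xml.etree.ElementTree as ET

DOC = """<data>
<person id="o555"><name>Mary</name>
<address><street>Maple</street><no>345</no><city>Seattle</city></address>
</person>
<person><name>John</name><address>Thailand</address><phone>23456</phone></person>
</data>"""

def walk(elem, depth=0, out=None):
    """List element, attribute and leading-text nodes in pre-order (document order)."""
    if out is None:
        out = []
    pad = "  " * depth
    out.append(pad + elem.tag)                             # element node
    for name, value in elem.attrib.items():
        out.append(pad + "  @" + name + "=" + value)       # attribute node
    if elem.text and elem.text.strip():
        out.append(pad + "  " + repr(elem.text.strip()))   # text node
    for child in elem:                                     # children in document order
        walk(child, depth + 1, out)
    return out

lines = walk(ET.fromstring(DOC))
print("\n".join(lines))
```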
XML as Data
•  XML is self-describing
•  Schema elements become part of the data
–  Relational schema: persons(name,phone)
–  In XML, <persons>, <name>, <phone> are part of the data, and are repeated many times
•  Consequence: XML is much more flexible
•  XML = semistructured data

Mapping Relational Data to XML
The canonical mapping:

Persons
Name   Phone
John   3634
Sue    6343
Dick   6363

XML:
<persons>
<row> <name>John</name> <phone> 3634</phone></row>
<row> <name>Sue</name> <phone> 6343</phone></row>
<row> <name>Dick</name> <phone> 6363</phone></row>
</persons>

[Figure: the same XML as a tree: persons has three row children, each with a name and a phone child]
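The canonical mapping is mechanical enough to write as a short function. This sketch (Python's `xml.etree.ElementTree`; `table_to_xml` is a hypothetical name) emits one `<row>` per tuple and one child element per column.

```python
import xml.etree.ElementTree as ET

def table_to_xml(table_name, columns, rows):
    """Canonical mapping: a root named after the table, one <row> per tuple,
    one child element per column holding the value as text."""
    root = ET.Element(table_name)
    for tup in rows:
        row = ET.SubElement(root, "row")
        for col, value in zip(columns, tup):
            ET.SubElement(row, col).text = str(value)
    return root

persons = table_to_xml("persons", ["name", "phone"],
                       [("John", 3634), ("Sue", 6343), ("Dick", 6363)])
xml_text = ET.tostring(persons, encoding="unicode")
```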
Mapping Relational Data to XML
Natural mapping

Persons
Name   Phone
John   3634
Sue    6343

Orders
PersonName   Date   Product
John         2002   Gizmo
John         2004   Gadget
Sue          2002   Gadget

XML:
<persons>
<person>
<name> John </name>
<phone> 3634 </phone>
<order> <date> 2002 </date>
<product> Gizmo </product>
</order>
<order> <date> 2004 </date>
<product> Gadget </product>
</order>
</person>
<person>
<name> Sue </name>
<phone> 6343 </phone>
<order> <date> 2002 </date>
<product> Gadget </product>
</order>
</person>
</persons>
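The natural mapping nests each order under the person it references, i.e. it follows the foreign key PersonName. A sketch in Python's `xml.etree.ElementTree` using the two tables from the slide:

```python
import xml.etree.ElementTree as ET

persons = [("John", 3634), ("Sue", 6343)]
orders = [("John", 2002, "Gizmo"), ("John", 2004, "Gadget"), ("Sue", 2002, "Gadget")]

root = ET.Element("persons")
for name, phone in persons:
    person = ET.SubElement(root, "person")
    ET.SubElement(person, "name").text = name
    ET.SubElement(person, "phone").text = str(phone)
    # Nest every order whose PersonName matches this person (a join done by nesting).
    for person_name, date, product in orders:
        if person_name == name:
            order = ET.SubElement(person, "order")
            ET.SubElement(order, "date").text = str(date)
            ET.SubElement(order, "product").text = product
```

The nested result is no longer in 1NF: a person element holds a collection of orders.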
XML is Semi-structured Data
•  Missing attributes:
<person> <name> John</name>
<phone>1234</phone>
</person>
<person> <name>Joe</name>
</person>          ← no phone!

•  Could represent in a table with nulls:
name   phone
John   1234
Joe    -

XML is Semi-structured Data
•  Repeated attributes
<person> <name> Mary</name>
<phone>2345</phone>
<phone>3456</phone>
</person>          ← two phones!

•  Impossible in tables:
name   phone
Mary   2345   3456   ???

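Missing and repeated attributes are both natural when the data stays a tree, and both cause trouble once it is forced into a flat table. A sketch with Python's `xml.etree.ElementTree`; the `<people>` wrapper element is a hypothetical root added so the three persons form one document.

```python
import xml.etree.ElementTree as ET

people = ET.fromstring("""<people>
<person><name>John</name><phone>1234</phone></person>
<person><name>Joe</name></person>
<person><name>Mary</name><phone>2345</phone><phone>3456</phone></person>
</people>""")

# Semistructured view: each person keeps however many phones it actually has.
phones = {p.findtext("name"): [ph.text for ph in p.findall("phone")]
          for p in people.findall("person")}

# Flat (name, phone) table: Joe needs a NULL and Mary's second phone is lost.
flat = [(name, nums[0] if nums else None) for name, nums in phones.items()]
```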
XML is Semi-structured Data
•  Attributes with different types in different objects
<person> <name> <first> John </first>
<last> Smith </last>
</name>
<phone>1234</phone>
</person>          ← structured name!

•  Nested collections (no 1NF)

•  Heterogeneous collections:
–  <db> contains both <book>s and <publisher>s

Document Type Definitions (DTD)
•  part of the original XML specification
•  an XML document may have a DTD
•  XML document:
Well-formed = if tags are correctly closed
Valid = if it has a DTD and conforms to it
•  validation is useful in data exchange

DTD
Goals:
•  Define what tags and attributes are allowed
•  Define how they are nested
•  Define how they are ordered

Superseded by XML Schema
•  But XML Schema is very complex, so DTDs are still widely used
Very Simple DTD
<!DOCTYPE company [
<!ELEMENT company ((person|product)*)>
<!ELEMENT person (ssn, name, office, phone?)>
<!ELEMENT ssn (#PCDATA)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT office (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
<!ELEMENT product (pid, name, description?)>
<!ELEMENT pid (#PCDATA)>
<!ELEMENT description (#PCDATA)>
]>
Very Simple DTD
Example of valid XML document:
<company>
<person> <ssn> 123456789 </ssn>
<name> John </name>
<office> B432 </office>
<phone> 1234 </phone>
</person>
<person> <ssn> 987654321 </ssn>
<name> Jim </name>
<office> B123 </office>
</person>
<product> ... </product>
...
</company>
DTD: The Content Model
<!ELEMENT tag (CONTENT)>          ← CONTENT is the content model
•  Content model:
–  Complex = a regular expression over other elements
–  Text-only = #PCDATA
–  Empty = EMPTY
–  Any = ANY
–  Mixed content = (#PCDATA | A | B | C)*

DTD: Regular Expressions
DTD and a matching XML fragment:

sequence
<!ELEMENT name (firstName, lastName)>
<name>
<firstName> . . . . . </firstName>
<lastName> . . . . . </lastName>
</name>

optional
<!ELEMENT name (firstName?, lastName)>

Kleene star
<!ELEMENT person (name, phone*)>
<person>
<name> . . . . . </name>
<phone> . . . . . </phone>
<phone> . . . . . </phone>
<phone> . . . . . </phone>
......
</person>

alternation
<!ELEMENT person (name, (phone|email))>
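Because a content model is a regular expression over child tags, one way to check it is to spell the children out as a string and match. Python's standard-library parsers are non-validating, so this sketch hand-translates three of the models above into Python `re` patterns; the "tag followed by a comma" encoding and the `conforms` helper are hypothetical, and #PCDATA and attributes are ignored.

```python
import re
import xml.etree.ElementTree as ET

PERSON_STAR = re.compile(r"(name,)(phone,)*")       # <!ELEMENT person (name, phone*)>
PERSON_ALT = re.compile(r"(name,)(phone,|email,)")  # <!ELEMENT person (name, (phone|email))>
NAME_OPT = re.compile(r"(firstName,)?(lastName,)")  # <!ELEMENT name (firstName?, lastName)>

def conforms(elem, model):
    """True if the sequence of child element tags matches the content model."""
    children = "".join(child.tag + "," for child in elem)
    return model.fullmatch(children) is not None

many_phones = ET.fromstring("<person><name/><phone/><phone/><phone/></person>")
wrong_order = ET.fromstring("<person><phone/><name/></person>")
with_email = ET.fromstring("<person><name/><email/></person>")
last_only = ET.fromstring("<name><lastName/></name>")
```

The trailing comma keeps tags apart, so `phone,` cannot accidentally match a tag such as `phones`.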
SKIPPED MATERIAL:
XSchema
•  Generalizes DTDs

•  Uses XML syntax

•  Two parts: structure and datatypes

•  Very complex
–  criticized
–  alternative proposals: RELAX NG
DTD vs. XML Schema
DTD:
<!ELEMENT paper (title,author*,year, (journal|conference))>
XML Schema:
<xs:element name="paper" type="paperType"/>
<xs:complexType name="paperType">
<xs:sequence>
<xs:element name="title" type="xs:string"/>
<xs:element name="author" minOccurs="0" maxOccurs="unbounded"/>
<xs:element name="year"/>
<xs:choice>
<xs:element name="journal"/>
<xs:element name="conference"/>
</xs:choice>
</xs:sequence>
</xs:complexType>
Example

A valid XML Document:

<paper>
<title> The Essence of XML </title>
<author> Simeon</author>
<author> Wadler</author>
<year>2003</year>
<conference> POPL</conference>
</paper>

Elements vs. Types

<xs:element name="person">
<xs:complexType>
<xs:sequence>
<xs:element name="name" type="xs:string"/>
<xs:element name="address" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>

<xs:element name="person" type="ttt"/>
<xs:complexType name="ttt">
<xs:sequence>
<xs:element name="name" type="xs:string"/>
<xs:element name="address" type="xs:string"/>
</xs:sequence>
</xs:complexType>

Both say the same thing; in DTD:

<!ELEMENT person (name,address)>


•  Types:
–  Simple types (integers, strings, ...)
–  Complex types (regular expressions, like in DTDs)

•  Element-type Alternation:
–  An element has a type
–  A type is a regular expression of elements

Local vs. Global Types
•  Local type:
<xs:element name="person">
[define locally the person’s type]
</xs:element>
•  Global type:
<xs:element name="person" type="ttt"/>

<xs:complexType name="ttt">
[define here the type ttt]
</xs:complexType>

Global types: can be reused in other elements
Local vs. Global Elements
•  Local element:
<xs:complexType name="ttt">
<xs:sequence>
<xs:element name="address" type="..."/>...
</xs:sequence>
</xs:complexType>
•  Global element:
<xs:element name="address" type="..."/>

<xs:complexType name="ttt">
<xs:sequence>
<xs:element ref="address"/> ...
</xs:sequence>
</xs:complexType>

Global elements: like in DTDs

Regular Expressions
Recall the element-type-element alternation:
<xs:complexType name="....">
[regular expression on elements]
</xs:complexType>
Regular expressions:
•  <xs:sequence> A B C </...> = A B C
•  <xs:choice> A B C </...> = A | B | C
•  <xs:group> A B C </...> = (A B C)
•  <xs:... minOccurs="0" maxOccurs="unbounded"> ..</...> = (...)*
•  <xs:... minOccurs="0" maxOccurs="1"> ..</...> = (...)?

Local Names
<xs:element name="person">
<xs:complexType>
. . . . .
<xs:element name="name">
<xs:complexType>
<xs:sequence>
<xs:element name="firstname" type="xs:string"/>
<xs:element name="lastname" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
. . . .
</xs:complexType>
</xs:element>

<xs:element name="product">
<xs:complexType>
. . . . .
<xs:element name="name" type="xs:string"/>

</xs:complexType>
</xs:element>

name has different meanings in person and in product
Subtle Use of Local Names
<xs:element name="A" type="oneB"/>

<xs:complexType name="onlyAs">
<xs:choice>
<xs:sequence>
<xs:element name="A" type="onlyAs"/>
<xs:element name="A" type="onlyAs"/>
</xs:sequence>
<xs:element name="A" type="xs:string"/>
</xs:choice>
</xs:complexType>

<xs:complexType name="oneB">
<xs:choice>
<xs:element name="B" type="xs:string"/>
<xs:sequence>
<xs:element name="A" type="onlyAs"/>
<xs:element name="A" type="oneB"/>
</xs:sequence>
<xs:sequence>
<xs:element name="A" type="oneB"/>
<xs:element name="A" type="onlyAs"/>
</xs:sequence>
</xs:choice>
</xs:complexType>

Arbitrarily deep binary tree with A elements, and a single B element

Note: this example is not legal in XML Schema (why?)

Hence they cannot express all regular tree languages
Attributes in XML Schema
<xs:element name="paper" type="papertype"/>
<xs:complexType name="papertype">
<xs:sequence>
<xs:element name="title" type="xs:string"/>
......
</xs:sequence>
<xs:attribute name="language" type="xs:NMTOKEN" fixed="English"/>
</xs:complexType>

Attributes are associated with the type, not with the element

Only complex types can have attributes; adding attributes to a simple type takes more work.

“Mixed” Content, “Any” Type
<xs:complexType mixed="true">
. . . .
•  Better than in DTDs: can still enforce the type, but now may have text between any elements

<xs:element name="anything" type="xs:anyType"/>
....
•  Means anything is permitted there

“All” Group
<xs:complexType name="PurchaseOrderType">
<xs:all> <xs:element name="shipTo" type="USAddress"/>
<xs:element name="billTo" type="USAddress"/>
<xs:element ref="comment" minOccurs="0"/>
<xs:element name="items" type="Items"/>
</xs:all>
<xs:attribute name="orderDate" type="xs:date"/>
</xs:complexType>

•  A restricted form of & in SGML

•  Restrictions:
–  Only at top level
–  Has only elements
–  Each element occurs at most once
•  E.g. "comment" occurs 0 or 1 times

Derived Types by Extensions
<complexType name="Address">
<sequence> <element name="street" type="string"/>
<element name="city" type="string"/>
</sequence>
</complexType>

<complexType name="USAddress">
<complexContent>
<extension base="ipo:Address">
<sequence> <element name="state" type="ipo:USState"/>
<element name="zip" type="positiveInteger"/>
</sequence>
</extension>
</complexContent>
</complexType>

Corresponds to inheritance
Derived Types by Restriction

<complexContent>
<restriction base="ipo:Items">
… [rewrite the entire content, with restrictions]...
</restriction>
</complexContent>

•  May restrict cardinalities, e.g. (0, unbounded) to (1,1); may restrict choices; other restrictions…

Corresponds to set inclusion
Simple Types
•  string
•  token
•  byte
•  unsignedByte
•  integer
•  positiveInteger
•  int (32-bit; a restriction of integer, which is unbounded)
•  unsignedInt
•  long
•  short
•  date
•  time
•  dateTime
•  duration
•  ID
•  IDREF
•  IDREFS
•  ...

Facets of Simple Types
Facets = additional properties restricting a simple type
15 facets defined by XML Schema

Examples:
•  length
•  minLength
•  maxLength
•  pattern
•  enumeration
•  whiteSpace
•  maxInclusive
•  maxExclusive
•  minInclusive
•  minExclusive
•  totalDigits
•  fractionDigits

Facets of Simple Types
•  Can further restrict a simple type by
changing some facets
•  Restriction = subset

Not so Simple Types
•  List types:
<xs:simpleType name="listOfMyIntType">
<xs:list itemType="myInteger"/>
</xs:simpleType>

<listOfMyInt>20003 15037 95977 95945</listOfMyInt>

•  Union types
•  Restriction types
END OF SKIPPED MATERIAL
Discussion 1
What kinds of applications might use XML?
•  Data exchange
–  Take the data, don’t worry about schema
•  Property lists
–  Many attributes, most are NULL
•  Evolving schema
–  Quickly add a new attribute
Discussion 2
How is XML processed?
•  Via an API
–  Called DOM
–  Navigate, update the XML arbitrarily
–  BUT: memory bound
•  Via some query language:
–  XPath or XQuery
–  Stand-alone processor OR embedded in SQL
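A small taste of the DOM style of processing, using Python's `xml.dom.minidom` (one DOM implementation among many, chosen here for illustration): the whole document is loaded as node objects, which is exactly why DOM is memory bound, and then navigated and modified freely. The price value is illustrative.

```python
from xml.dom import minidom

# The whole document becomes an in-memory tree of node objects.
dom = minidom.parseString("<bib><book><title>Foundations of Databases</title>"
                          "<year>1995</year></book></bib>")

# Navigate: look elements up anywhere in the tree.
year = dom.getElementsByTagName("year")[0]
old = year.firstChild.data

# Update arbitrarily: change a text node and add a new element.
year.firstChild.data = "1996"
book = dom.getElementsByTagName("book")[0]
price = dom.createElement("price")
price.appendChild(dom.createTextNode("55"))
book.appendChild(price)

result = dom.documentElement.toxml()
```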
Querying XML Data
Will discuss next:

•  XPath = simple navigation on the tree

•  XQuery = “the SQL of XML”

Sample Data for Queries
<bib>
<book> <publisher> Addison-Wesley </publisher>
<author> Serge Abiteboul </author>
<author> <first-name> Rick </first-name>
<last-name> Hull </last-name>
</author>
<author> Victor Vianu </author>
<title> Foundations of Databases </title>
<year> 1995 </year>
</book>
<book price="55">
<publisher> Freeman </publisher>
<author> Jeffrey D. Ullman </author>
<title> Principles of Database and Knowledge Base Systems </title>
<year> 1998 </year>
</book>
</bib>
Data Model for XPath
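[Figure: the sample data as a tree. The root (document node) has a single child, the root element bib; bib has two book children; below them are publisher, author, … elements, with text leaves such as Addison-Wesley and Serge Abiteboul]
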
XPath: Simple Expressions
/bib/book/year
Result: <year> 1995 </year>
<year> 1998 </year>

/bib/paper/year
Result: empty (there were no papers)

/bib  vs.  /    What’s the difference?
XPath: Restricted Kleene Closure
//author
Result: <author> Serge Abiteboul </author>
<author> <first-name> Rick </first-name>
<last-name> Hull </last-name>
</author>
<author> Victor Vianu </author>
<author> Jeffrey D. Ullman </author>

/bib//first-name
Result: <first-name> Rick </first-name>

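These expressions can be tried out with Python's `xml.etree.ElementTree`, which implements a limited subset of XPath. One difference worth knowing: it does not accept absolute paths such as `/bib/book/year` on an element, so this sketch writes each path relative to the root element `bib` (the XPath original is in the comment).

```python
import xml.etree.ElementTree as ET

BIB = """<bib>
<book><publisher>Addison-Wesley</publisher>
<author>Serge Abiteboul</author>
<author><first-name>Rick</first-name><last-name>Hull</last-name></author>
<author>Victor Vianu</author>
<title>Foundations of Databases</title><year>1995</year></book>
<book price="55"><publisher>Freeman</publisher>
<author>Jeffrey D. Ullman</author>
<title>Principles of Database and Knowledge Base Systems</title>
<year>1998</year></book>
</bib>"""

bib = ET.fromstring(BIB)   # the root *element*; paths below start from it

years = [y.text for y in bib.findall("book/year")]           # /bib/book/year
papers = bib.findall("paper/year")                            # /bib/paper/year
authors = bib.findall(".//author")                            # //author
first_names = [f.text for f in bib.findall(".//first-name")]  # /bib//first-name
```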
XPath: Attribute Nodes
/bib/book/@price
Result: "55"

@price means that price has to be an attribute

XPath: Wildcard
//author/*
Result: <first-name> Rick </first-name>
<last-name> Hull </last-name>

* matches any element
@* matches any attribute

XPath: Text Nodes
/bib/book/author/text()
Result: Serge Abiteboul
Victor Vianu
Jeffrey D. Ullman

Rick Hull doesn’t appear because his name is split into first-name and last-name children

Functions in XPath:
–  text() = matches text nodes
–  node() = matches any node (= * or @* or text())
–  name() = returns the name of the current tag

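ElementTree has no `@price`, `text()` steps in its paths, so this sketch expresses those parts in Python: attributes are read with `get`, `*` is supported directly, and "has a text child" becomes a test on the element's leading text. The data is the sample bibliography, trimmed to the authors.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring("""<bib>
<book><author>Serge Abiteboul</author>
<author><first-name>Rick</first-name><last-name>Hull</last-name></author>
<author>Victor Vianu</author></book>
<book price="55"><author>Jeffrey D. Ullman</author></book>
</bib>""")

# /bib/book/@price : attribute values are read from the element.
prices = [b.get("price") for b in bib.findall("book") if b.get("price") is not None]

# //author/* : any child element of an author.
children = [c.tag for c in bib.findall(".//author/*")]

# /bib/book/author/text() : only authors with direct text qualify,
# so Rick Hull (name split into child elements) is skipped.
texts = [a.text for a in bib.findall("book/author") if a.text and a.text.strip()]
```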
XPath: Predicates
/bib/book/author[first-name]
Result: <author> <first-name> Rick </first-name>
<last-name> Hull </last-name>
</author>

XPath: More Predicates
/bib/book/author[first-name][address[.//zip][city]]/last-name

Explain how this is evaluated!

XPath: More Predicates
/bib/book/author[first-name][address[.//zip][city]]/last-name
Result: <last-name> … </last-name>
<last-name> … </last-name>

How do we read this?
First remove all qualifiers (predicates):
/bib/book/author/last-name

Then add them one by one:
/bib/book/author[first-name][address]/last-name
XPath: More Predicates

/bib/book[@price < 60]

/bib/book[author/@age < 25]

/bib/book[author/text()]

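ElementTree supports some predicates directly (`[tag]`, `[@attr]`, `[@attr='value']`) but not comparisons such as `<`, so this sketch evaluates those in Python. Like XPath here, it compares the price as a number. The data is the sample bibliography, trimmed.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring("""<bib>
<book><author>Serge Abiteboul</author>
<author><first-name>Rick</first-name><last-name>Hull</last-name></author>
<title>Foundations of Databases</title></book>
<book price="55"><author>Jeffrey D. Ullman</author>
<title>Principles of Database and Knowledge Base Systems</title></book>
</bib>""")

# /bib/book/author[first-name] : a [tag] predicate is supported as-is.
with_first = bib.findall("book/author[first-name]")

# /bib/book[@price < 60] : keep books that have a price, then compare numerically.
cheap = [b.findtext("title") for b in bib.findall("book[@price]")
         if float(b.get("price")) < 60]

# /bib/book[author/text()] : books with at least one author that has direct text.
with_text_author = [b.findtext("title") for b in bib.findall("book")
                    if any(a.text and a.text.strip() for a in b.findall("author"))]
```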
XPath: More Axes
.  means current node
/bib/book[.//review]
/bib/book[./review]      Same as /bib/book[review]
/bib/book/./author       Same as /bib/book/author

XPath: More Axes
..  means parent node
/bib/book/author/../author               Same as /bib/book/author
/bib/book[.//first-name/../last-name]    Same as /bib/book[.//*[first-name][last-name]]
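ElementTree also understands `.` and `..` in paths, so the first pair of equivalences can be checked mechanically. `same` is a hypothetical helper that compares the selected elements by identity; predicates containing paths (the second pair) are beyond ElementTree's subset, so only the `..` step inside it is shown.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring("""<bib>
<book><author>Serge Abiteboul</author>
<author><first-name>Rick</first-name><last-name>Hull</last-name></author></book>
<book price="55"><author>Jeffrey D. Ullman</author></book>
</bib>""")

def same(p, q):
    """True if both paths select the same elements, in the same order."""
    return bib.findall(p) == bib.findall(q)

parent_then_child = same("book/author/../author", "book/author")  # .. then back down
self_step = same("book/./author", "book/author")                  # . is a no-op step

# .//first-name/.. climbs from <first-name> back to its <author> parent.
hull = bib.findall(".//first-name/..")[0].findtext("last-name")
```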
XPath: Brief Summary
bib                                     matches a bib element
*                                       matches any element
/                                       matches the root
/bib                                    matches a bib element under root
bib/paper                               matches a paper in bib
bib//paper                              matches a paper in bib, at any depth
//paper                                 matches a paper at any depth
paper|book                              matches a paper or a book
@price                                  matches a price attribute
bib/book/@price                         matches price attribute in book, in bib
bib/book[@price<"55"]/author/last-name  matches…

XQuery
•  Based on Quilt, which is based on XML-QL

•  Uses XPath to express more complex queries

FLWR (“Flower”) Expressions

FOR ...
LET...
WHERE...
RETURN...

FOR-WHERE-RETURN
Find all book titles published after 1995:
for $x in document("bib.xml")/bib/book
where $x/year/text() > 1995
return $x/title

Result:
<title> abc </title>
<title> def </title>
<title> ghi </title>
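For readers who know list comprehensions, a FOR-WHERE-RETURN has the same shape. This sketch is an analogy in Python over a two-book sample, not how an XQuery engine runs; the explicit `int(...)` mirrors XQuery converting the untyped year text to a number when it is compared with 1995.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring("""<bib>
<book><title>Foundations of Databases</title><year>1995</year></book>
<book><title>Principles of Database and Knowledge Base Systems</title><year>1998</year></book>
</bib>""")

titles = [x.find("title")                     # RETURN $x/title
          for x in bib.findall("book")        # FOR $x IN .../bib/book
          if int(x.findtext("year")) > 1995]  # WHERE $x/year/text() > 1995
```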
FOR-WHERE-RETURN
Equivalently (perhaps more geekish)

for $x in document("bib.xml")/bib/book[year/text() > 1995]/title
return $x

And even shorter:

document("bib.xml")/bib/book[year/text() > 1995]/title

FOR-WHERE-RETURN
•  Find all book titles and the year when they were published:
for $x in document("bib.xml")/bib/book
return <answer>
<title> { $x/title/text() } </title>
<year> { $x/year/text() } </year>
</answer>

Result:
<answer> <title> abc </title> <year> 1995 </year> </answer>
<answer> <title> def </title> <year> 2002 </year> </answer>
<answer> <title> ghk </title> <year> 1980 </year> </answer>
FOR-WHERE-RETURN
•  Notice the use of "{" and "}"
•  What is the result without them?
for $x in document("bib.xml")/bib/book
return <answer>
<title> $x/title/text() </title>
<year> $x/year/text() </year>
</answer>

Result:
<answer> <title> $x/title/text() </title> <year> $x/year/text() </year> </answer>
<answer> <title> $x/title/text() </title> <year> $x/year/text() </year> </answer>
<answer> <title> $x/title/text() </title> <year> $x/year/text() </year> </answer>
Nesting
For each author of a book published in 1995, list all books she published:
for $b in document("bib.xml")/bib,
$a in $b/book[year/text()=1995]/author
return <result>
{ $a,
for $t in $b/book[author/text()=$a/text()]/title
return $t
}
</result>

In the RETURN clause, the comma concatenates XML fragments

Result
<result>
<author>Jones</author>
<title> abc </title>
<title> def </title>
</result>
<result>
<author> Smith </author>
<title> ghi </title>
</result>
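The nested query is a loop inside a loop: for each qualifying author, rescan all books. This Python sketch mirrors that structure; the three books are hypothetical data chosen so the output matches the result slide (Jones: abc, def; Smith: ghi). The `any(...)` reflects that `author/text()=$a/text()` is true if any of a book's authors matches.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring("""<bib>
<book><author>Jones</author><title>abc</title><year>1995</year></book>
<book><author>Jones</author><title>def</title><year>1998</year></book>
<book><author>Smith</author><title>ghi</title><year>1995</year></book>
</bib>""")

results = []
for a in bib.findall("book[year='1995']/author"):        # $a in $b/book[year/text()=1995]/author
    titles = [t.text
              for b in bib.findall("book")                # inner FOR over all books
              if any(x.text == a.text for x in b.findall("author"))
              for t in b.findall("title")]
    results.append((a.text, titles))                      # one <result> per $a
```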
Aggregates
Find all books with more than 3 authors:

for $x in document("bib.xml")/bib/book
where count($x/author)>3
return $x

count = a function that counts
avg = computes the average
sum = computes the sum
distinct-values = eliminates duplicates
Aggregates
Same thing:

for $x in document("bib.xml")/bib/book[count(author)>3]
return $x

Aggregates
Print all authors who published more than
3 books

for $b in document("bib.xml")/bib,
$a in distinct-values($b/book/author/text())
where count($b/book[author/text()=$a])>3
return <author> { $a } </author>
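Both aggregate queries are sketched below in Python, with the threshold as a parameter so the sample bibliography (which has no author with more than 3 books) still produces something to look at. The function names are hypothetical; the result of `distinct-values` has no guaranteed order, so the sketch sorts it.

```python
import xml.etree.ElementTree as ET
from collections import Counter

bib = ET.fromstring("""<bib>
<book><author>Serge Abiteboul</author>
<author><first-name>Rick</first-name><last-name>Hull</last-name></author>
<author>Victor Vianu</author><title>Foundations of Databases</title></book>
<book><author>Jeffrey D. Ullman</author>
<title>Principles of Database and Knowledge Base Systems</title></book>
</bib>""")

def books_with_more_authors_than(bib, n):
    """for $x in .../bib/book where count($x/author) > n return $x"""
    return [b for b in bib.findall("book") if len(b.findall("author")) > n]

def authors_with_more_books_than(bib, n):
    """Distinct author/text() values that appear in more than n books."""
    per_book = [{a.text for a in b.findall("author") if a.text and a.text.strip()}
                for b in bib.findall("book")]            # a set: count each book once
    counts = Counter(name for names in per_book for name in names)
    return sorted(name for name, c in counts.items() if c > n)
```

As in the XQuery version, Rick Hull is never counted: his author element has no direct text.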

Flattening
•  “Flatten” the authors, i.e. return a list of (author, title) pairs
for $b in document("bib.xml")/bib/book,
$x in $b/title/text(),
$y in $b/author/text()
return <answer>
<title> { $x } </title>
<author> { $y } </author>
</answer>

Result:
<answer>
<title> abc </title>
<author> efg </author>
</answer>
<answer>
<title> abc </title>
<author> hkj </author>
</answer>
Re-grouping
•  For each author, return all titles of her/his books
for $b in document("bib.xml")/bib
let $a := distinct-values($b/book/author/text())
for $x in $a
return
<answer>
<author> { $x } </author>
{ for $y in $b/book[author/text()=$x]/title
return $y }
</answer>

Result:
<answer>
<author> efg </author>
<title> abc </title>
<title> klm </title>
....
</answer>
Re-grouping
•  Same thing:
for $b in document("bib.xml")/bib,
$x in distinct-values($b/book/author/text())
return
<answer>
<author> { $x } </author>
{ for $y in $b/book[author/text()=$x]/title
return $y }
</answer>
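Flattening and re-grouping are inverse directions over the same data. The Python sketch below does both; the two books are hypothetical data echoing the placeholder names on the slides. `dict.fromkeys` is used as an order-preserving `distinct-values`.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring("""<bib>
<book><title>abc</title><author>efg</author><author>hkj</author></book>
<book><title>klm</title><author>efg</author></book>
</bib>""")

# Flattening: one (title, author) pair per combination within each book.
pairs = [(t.text, a.text) for b in bib.findall("book")
         for t in b.findall("title") for a in b.findall("author")]

# Re-grouping: for each distinct author, the titles of all of that author's books.
distinct_authors = list(dict.fromkeys(a.text for a in bib.findall("book/author")))
grouped = {x: [t.text for b in bib.findall("book")
               if any(a.text == x for a in b.findall("author"))
               for t in b.findall("title")]
           for x in distinct_authors}
```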

SQL and XQuery Side-by-side
Product(pid, name, maker, price)
Find all product names, prices, sort by price

SQL:
SELECT x.name, x.price
FROM Product x
ORDER BY x.price

XQuery:
for $x in document("db.xml")/db/product/row
order by number($x/price/text())
return <answer>
{ $x/name, $x/price }
</answer>

XQuery’s Answer
<answer>
<name> abc </name>
<price> 7 </price>
</answer>
<answer>
<name> def </name>
<price> 23 </price>
</answer>
....

SQL and XQuery Side-by-side
Product(pid, name, maker, price)
Company(cid, name, city, revenues)
Find all products made in Seattle

SQL:
SELECT x.name
FROM Product x, Company y
WHERE x.maker=y.cid
and y.city='Seattle'

XQuery:
for $r in document("db.xml")/db,
$x in $r/product/row,
$y in $r/company/row
where
$x/maker/text()=$y/cid/text()
and $y/city/text() = "Seattle"
return $x/name

Cool XQuery:
for $y in /db/company/row[city/text()="Seattle"],
$x in /db/product/row[maker/text()=$y/cid/text()]
return $x/name
<product>
<row> <pid> 123 </pid>
<name> abc </name>
<maker> efg </maker>
</row>
<row> …. </row>

</product>
<product>
...
</product>
....

SQL and XQuery Side-by-side
For each company with revenues < 1M, count how many
products with price > $100 they make
SQL:
SELECT y.name, count(*)
FROM Product x, Company y
WHERE x.price > 100 and x.maker=y.cid and y.revenues < 1000000
GROUP BY y.cid, y.name

XQuery:
for $r in document("db.xml")/db,
$y in $r/company/row[revenues/text()<1000000]
return
<proudcompany>
<companyname> { $y/name/text() } </companyname>
<numberofexpensiveproducts>
{ count($r/product/row[maker/text()=$y/cid/text()][price/text()>100]) }
</numberofexpensiveproducts>
</proudcompany>
SQL and XQuery Side-by-side
Find companies with more than 30 products, and their average price

SQL:
SELECT y.name, avg(x.price)
FROM Product x, Company y
WHERE x.maker=y.cid
GROUP BY y.cid, y.name
HAVING count(*) > 30

XQuery:
for $r in document("db.xml")/db,                           (: $r is an element :)
$y in $r/company/row
let $p := $r/product/row[maker/text()=$y/cid/text()]      (: $p is a collection :)
where count($p) > 30
return
<thecompany>
<companyname> { $y/name/text() } </companyname>
<avgprice> { avg($p/price/text()) } </avgprice>
</thecompany>
FOR vs. LET

FOR
•  Binds node variables → iteration

LET
•  Binds collection variables → one value

FOR vs. LET

for $x in /bib/book
return <result> { $x } </result>

Returns:
<result> <book>...</book></result>
<result> <book>...</book></result>
<result> <book>...</book></result>
...

let $x := /bib/book
return <result> { $x } </result>

Returns:
<result> <book>...</book>
<book>...</book>
<book>...</book>
...
</result>
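The difference comes down to how many times RETURN runs. In this Python analogy (illustrative three-book data; `result_of` is a hypothetical helper), FOR calls it once per book, while LET calls it once with the whole sequence. Nodes are deep-copied into each `<result>` because XQuery element constructors copy their content.

```python
import copy
import xml.etree.ElementTree as ET

bib = ET.fromstring("<bib><book>A</book><book>B</book><book>C</book></bib>")
books = bib.findall("book")

def result_of(nodes):
    """Build <result> holding copies of the given nodes."""
    r = ET.Element("result")
    r.extend([copy.deepcopy(n) for n in nodes])
    return r

# FOR: $x is bound to each book in turn, so RETURN runs once per book.
for_output = [ET.tostring(result_of([x]), encoding="unicode") for x in books]

# LET: $x is bound once, to the whole sequence, so RETURN runs exactly once.
let_output = [ET.tostring(result_of(books), encoding="unicode")]
```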
XQuery
Summary:
•  FOR-LET-WHERE-RETURN = FLWR

FOR/LET Clauses
↓  list of tuples
WHERE Clause
↓  list of tuples
RETURN Clause
↓  instance of the XQuery data model
XML in SQL Server 2005
•  Create tables with attributes of type XML

•  Use XQuery in SQL queries

•  Rest of the slides are from: Shankar Pal et al., Indexing XML data stored in a relational database, VLDB 2004

CREATE TABLE DOCS (
ID int primary key,
XDOC xml)

SELECT ID, XDOC.query('
for $s in /BOOK[@ISBN = "1-55860-438-3"]//SECTION
return <topic>{ data($s/TITLE) }</topic>')
FROM DOCS

XML Methods in SQL
•  query() = returns XML data type
•  value() = extracts scalar values
•  exist() = checks conditions on XML nodes
•  nodes() = returns a rowset of XML nodes that the XQuery expression evaluates to

Examples
•  From here:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnsql90/html/sql2k5xml.asp

XML Type

CREATE TABLE docs (
pk INT PRIMARY KEY,
xCol XML not null
)

Inserting an XML Value

INSERT INTO docs VALUES (2,
'<doc id="123">
<sections>
<section num="1"><title>XML Schema</title></section>
<section num="3"><title>Benefits</title></section>
<section num="4"><title>Features</title></section>
</sections>
</doc>')

query()

SELECT pk, xCol.query('/doc[@id = 123]//section')
FROM docs

exist()

SELECT xCol.query('/doc[@id = 123]//section')
FROM docs
WHERE xCol.exist('/doc[@id = 123]') = 1

value()

SELECT xCol.value(
'data((/doc//section[@num = 3]/title)[1])', 'nvarchar(max)')
FROM docs

nodes()

SELECT nref.value('first-name[1]', 'nvarchar(50)') AS FirstName,
nref.value('last-name[1]', 'nvarchar(50)') AS LastName
FROM @xVar.nodes('//author') AS R(nref)
WHERE nref.exist('.[first-name != "David"]') = 1

nodes()

SELECT nref.value('@genre', 'varchar(max)') AS Genre
FROM docs CROSS APPLY xCol.nodes('//book') AS R(nref)

Internal Storage
•  XML is “shredded” as a table
•  A few important ideas:
–  Dewey decimal numbering of nodes; store in a clustered B-tree index
–  Use only odd numbers to allow insertions
–  Reverse PATH-ID encoding, for efficient processing of postfix expressions like //a/b/c
–  Add more indexes, e.g. on data values

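Why odd numbers help: if the initial labels skip the even values, a new node can later be slipped between two siblings without relabeling anything. This is a simplified sketch in the spirit of the ORDPATH scheme from the cited paper, not its actual (bit-compressed) encoding; labels are tuples, document order is tuple order, ancestry is a prefix test, and an even component acts only as a "caret" that makes room for a new odd label. All function names are hypothetical.

```python
from typing import List, Tuple

Label = Tuple[int, ...]

def child_labels(parent: Label, n: int) -> List[Label]:
    """Give n children the odd ordinals 1, 3, 5, ... under the parent's label."""
    return [parent + (2 * i + 1,) for i in range(n)]

def insert_between(left: Label, right: Label) -> Label:
    """Label for a new sibling between two adjacent odd-labelled siblings.

    Simplified: both labels share the same parent prefix and end in an odd number."""
    parent, l, r = left[:-1], left[-1], right[-1]
    if r - l > 2:
        return parent + (l + 2,)   # an unused odd number is still free
    return parent + (l + 1, 1)     # even caret component, then a fresh odd one

def is_ancestor(a: Label, d: Label) -> bool:
    """a is a proper ancestor of d exactly when a is a proper prefix of d."""
    return len(a) < len(d) and d[:len(a)] == a

root = (1,)
kids = child_labels(root, 3)                 # (1, 1), (1, 3), (1, 5)
new = insert_between(kids[0], kids[1])       # squeezed between the first two
in_order = sorted(kids + [new])              # tuple order = document order
```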
<BOOK ISBN="1-55860-438-3">
<SECTION>
<TITLE>Bad Bugs</TITLE>
Nobody loves bad bugs.
<FIGURE CAPTION="Sample bug"/>
</SECTION>
<SECTION>
<TITLE>Tree Frogs</TITLE>
All right-thinking people
<BOLD> love </BOLD>
tree frogs.
</SECTION>
</BOOK>

Infoset Table
/BOOK[@ISBN = "1-55860-438-3"]/SECTION

SELECT SerializeXML (N2.ID, N2.ORDPATH)
FROM infosettab N1 JOIN infosettab N2 ON (N1.ID = N2.ID)
WHERE N1.PATH_ID = PATH_ID(/BOOK/@ISBN)
AND N1.VALUE = '1-55860-438-3'
AND N2.PATH_ID = PATH_ID(/BOOK/SECTION)
AND Parent (N1.ORDPATH) = Parent (N2.ORDPATH)
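The self-join can be reproduced end to end with Python's built-in `sqlite3`. This is a simplified, hypothetical schema, not the paper's: ORDPATHs are dotted odd-number strings, the full path is stored as text instead of a PATH_ID, the parent label is precomputed into a `PARENT` column in place of the `Parent()` function, and the book is trimmed to its two section titles.

```python
import sqlite3
import xml.etree.ElementTree as ET

DOC = ('<BOOK ISBN="1-55860-438-3">'
       '<SECTION><TITLE>Bad Bugs</TITLE></SECTION>'
       '<SECTION><TITLE>Tree Frogs</TITLE></SECTION></BOOK>')

def shred(doc_id, elem, label="1", path=""):
    """Yield one (ID, ORDPATH, PARENT, PATH, VALUE) row per element and attribute.

    Attributes are numbered before child elements; an element's direct text is its value."""
    path = path + "/" + elem.tag
    parent = label.rsplit(".", 1)[0] if "." in label else ""
    yield (doc_id, label, parent, path, (elem.text or "").strip())
    ordinal = 1
    for name, value in elem.attrib.items():
        yield (doc_id, f"{label}.{ordinal}", label, f"{path}/@{name}", value)
        ordinal += 2
    for child in elem:
        yield from shred(doc_id, child, f"{label}.{ordinal}", path)
        ordinal += 2

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE infosettab (ID INT, ORDPATH TEXT, PARENT TEXT, PATH TEXT, VALUE TEXT)")
db.executemany("INSERT INTO infosettab VALUES (?, ?, ?, ?, ?)", shred(1, ET.fromstring(DOC)))

# /BOOK[@ISBN = "1-55860-438-3"]/SECTION : the ISBN attribute and the SECTIONs share a parent.
sections = [row[0] for row in db.execute("""
    SELECT N2.ORDPATH
    FROM infosettab N1 JOIN infosettab N2 ON (N1.ID = N2.ID)
    WHERE N1.PATH = '/BOOK/@ISBN' AND N1.VALUE = '1-55860-438-3'
      AND N2.PATH = '/BOOK/SECTION'
      AND N1.PARENT = N2.PARENT
    ORDER BY N2.ORDPATH""")]
```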
