Lecture 09
XML/XPath/XQuery
1
XML Outline
• XML
– Syntax
– Semistructured data
– DTDs
• XPath
• XQuery
2
Additional Readings on XML
• http://www.w3.org/XML/
– Main source on XML, but hard to read
• http://www.w3.org/TR/xquery/
– Authority on XQuery
• http://www.galaxquery.org/
– An easy to use, complete XQuery implementation
5
From HTML to XML
10
Attributes v.s. Elements
Price and currency as attributes:
<book price="55" currency="USD">
  <title> Foundations of DBs </title>
  <author> Abiteboul </author>
  …
  <year> 1995 </year>
</book>
Price and currency as elements:
<book>
  <title> Foundations of DBs </title>
  <author> Abiteboul </author>
  …
  <year> 1995 </year>
  <price> 55 </price>
  <currency> USD </currency>
</book>
Elements: ordered
Attributes: unordered
12
XML v.s. HTML
• What are the differences between XML
and HTML ?
In class
13
That’s All !
• That’s all you ever need to know about
XML syntax
– Optional type information can be given in
the DTD or XSchema (later)
– We’ll discuss some additional syntax in the
next few slides, but that’s not essential
• What is important for you to know:
XML’s semantics
14
More Syntax: Oids and References
<person id="o555">
  <name> Jane </name>
</person>
<person id="o456">
  <name> Mary </name>
  <mother idref="o555"/>
</person>
Oids and references are just keys / foreign keys, designed by someone who didn’t take 444.
Don’t use them: use your own foreign keys instead.
• Example:
<example>
<![CDATA[ some text here </notAtag> <>]]>
</example>
16
More Syntax: Entity References
• Syntax: &entityname;
• Example:
<element> this is less than &lt; </element>
• Some entities:
&lt;    <
&gt;    >
&amp;   &
&apos;  '
&quot;  "
&#38;   Unicode char
17
More Syntax: Comments
• Syntax <!-- .... Comment text... -->
18
XML Namespaces
• name ::= [prefix:]localpart
<book xmlns:bookStandard="www.isbn-org.org/def">
  <bookStandard:title> … </bookStandard:title>
  <bookStandard:publisher> . . .</bookStandard:publisher>
</book>
The xmlns value (www.isbn-org.org/def) is just a unique name.
19
XML Semantics: a Tree !
<data>
  <person id="o555">
    <name> Mary </name>
    <address>
      <street>Maple</street>
      <no> 345 </no>
      <city> Seattle </city>
    </address>
  </person>
  <person>
    <name> John </name>
    <address>Thailand</address>
    <phone>23456</phone>
  </person>
</data>
[Figure: the document drawn as a tree. Element nodes: data, person, name, address, street, no, city, phone. Attribute node: id (o555). Text nodes: Mary, Maple, 345, Seattle, John, Thailand, 23456.]
Order matters !!!
20
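The tree view can be made concrete with a short traversal. The following is a minimal sketch using Python's standard xml.etree.ElementTree; the helper name describe and the shortened input document are assumptions made for illustration, not part of the lecture.

```python
import xml.etree.ElementTree as ET

def describe(node, depth=0):
    """List the element, attribute, and text nodes below `node`, in document order."""
    lines = ["  " * depth + "element " + node.tag]
    for name, value in node.attrib.items():
        lines.append("  " * (depth + 1) + "attribute " + name + "=" + value)
    if node.text and node.text.strip():
        lines.append("  " * (depth + 1) + "text " + node.text.strip())
    for child in node:  # children are visited in document order
        lines.extend(describe(child, depth + 1))
    return lines

# Shortened version of the slide's document (hypothetical excerpt).
data = ET.fromstring('<data><person id="o555"><name> Mary </name></person></data>')
tree_lines = describe(data)
print("\n".join(tree_lines))
```

ElementTree keeps child elements in document order, which matches the "order matters" remark for elements; attributes are stored in a dictionary and carry no meaningful order.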
XML as Data
• XML is self-describing
• Schema elements become part of the data
– Relational schema: persons(name,phone)
– In XML <persons>, <name>, <phone> are part of
the data, and are repeated many times
• Consequence: XML is much more flexible
• XML = semistructured data
21
Mapping Relational Data to XML
The canonical mapping:
Relation Persons:
name   phone
John   3634
Sue    6343
Dick   6363
[Figure: XML tree rooted at persons, with a name and a phone node for each tuple]
• Could represent in a table with nulls:
name   phone
John   1234
Joe    -
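A minimal sketch of the canonical mapping in Python, assuming one wrapper element per tuple (called row here, a hypothetical name) and one child element per attribute. A NULL is represented by simply omitting the element, which is the reverse direction of the table-with-nulls remark above.

```python
import xml.etree.ElementTree as ET

def rows_to_xml(table_name, columns, rows):
    """Map a relation to XML: one <row> per tuple, one child element per non-null attribute."""
    root = ET.Element(table_name)
    for row in rows:
        row_el = ET.SubElement(root, "row")  # "row" is an assumed wrapper name
        for column, value in zip(columns, row):
            if value is None:  # a NULL becomes a missing element
                continue
            ET.SubElement(row_el, column).text = str(value)
    return root

persons = rows_to_xml("persons", ["name", "phone"], [("John", 1234), ("Joe", None)])
xml_text = ET.tostring(persons, encoding="unicode")
print(xml_text)
```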
24
XML is Semi-structured Data
• Repeated attributes
<person> <name> Mary</name>
<phone>2345</phone>
<phone>3456</phone>
</person>
Two phones !
• Impossible in tables:
name phone
Mary 2345 3456 ???
25
XML is Semi-structured Data
• Attributes with different types in different objects
<person> <name> <first> John </first>
<last> Smith </last>
</name>
<phone>1234</phone>
</person>
Structured name !
26
Document Type Definitions
DTD
• part of the original XML specification
• an XML document may have a DTD
• XML document:
Well-formed = if tags are correctly closed
Valid = if it has a DTD and conforms to it
• validation is useful in data exchange
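Well-formedness can be checked with any XML parser. The sketch below uses Python's xml.etree.ElementTree, which rejects documents that are not well-formed; note that this module does not validate against a DTD, so checking validity would need a validating parser instead. The helper name is_well_formed is an assumption for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if `text` parses as XML, i.e. all tags are correctly nested and closed."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

good = is_well_formed("<person><name> Mary </name></person>")
bad = is_well_formed("<person><name> Mary </person>")  # <name> is never closed
print(good, bad)
```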
27
DTD
Goals:
• Define what tags and attributes are
allowed
• Define how they are nested
• Define how they are ordered
31
DTD: Regular Expressions
sequence
DTD: <!ELEMENT name (firstName, lastName)>
XML: <name>
       <firstName> . . . . . </firstName>
       <lastName> . . . . . </lastName>
     </name>
optional
DTD: <!ELEMENT name (firstName?, lastName)>
Kleene star
DTD: <!ELEMENT person (name, phone*)>
XML: <person>
       <name> . . . . . </name>
       <phone> . . . . . </phone>
       <phone> . . . . . </phone>
       <phone> . . . . . </phone>
       ......
     </person>
alternation
DTD: <!ELEMENT person (name, (phone|email))>
32
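Because a DTD content model is a regular expression over element names, it can be checked with an ordinary regex engine. The following is a rough sketch, not how real validators work: it translates a content model into a Python regular expression over the space-separated sequence of child tags. The function names are hypothetical, and #PCDATA, EMPTY and ANY are not handled.

```python
import re

def content_model_to_regex(model):
    """Translate a DTD content model such as "(name, phone*)" into a compiled regex."""
    compact = re.sub(r"\s+", "", model)
    # Each element name becomes a group that matches "name " (name plus a separator).
    body = re.sub(r"[A-Za-z_][\w.-]*",
                  lambda m: "(?:" + re.escape(m.group(0)) + " )",
                  compact)
    # Sequence (",") is plain concatenation in a regex; ?, *, + and | keep their meaning.
    return re.compile(body.replace(",", ""))

def children_match(model, child_tags):
    """Check whether a list of child tag names conforms to the content model."""
    sequence = "".join(tag + " " for tag in child_tags)
    return content_model_to_regex(model).fullmatch(sequence) is not None

print(children_match("(name, phone*)", ["name", "phone", "phone"]))
print(children_match("(name, (phone|email))", ["name", "phone", "email"]))
```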
SKIPPED MATERIAL:
XSchema
• Generalizes DTDs
• Very complex
– criticized
– alternative proposals: Relax NG
33
DTD v.s. XML Schemas
DTD:
<!ELEMENT paper (title,author*,year, (journal|conference))>
XML Schema:
<xs:element name="paper" type="paperType"/>
<xs:complexType name="paperType">
  <xs:sequence>
    <xs:element name="title" type="xs:string"/>
    <xs:element name="author" minOccurs="0" maxOccurs="unbounded"/>
    <xs:element name="year"/>
    <xs:choice>
      <xs:element name="journal"/>
      <xs:element name="conference"/>
    </xs:choice>
  </xs:sequence>
</xs:complexType>
34
Example
<paper>
<title> The Essence of XML </title>
<author> Simeon</author>
<author> Wadler</author>
<year>2003</year>
<conference> POPL</conference>
</paper>
35
Elements v.s. Types
• Element-type Alternation:
– An element has a type
– A type is a regular expression of elements
37
Local v.s. Global Types
• Local type:
<xs:element name="person">
  [define locally the person’s type]
</xs:element>
• Global type:
<xs:element name="person" type="ttt"/>
<xs:complexType name="ttt">
  [define here the type ttt]
</xs:complexType>
Global types: can be reused in other elements
38
Local v.s. Global Elements
• Local element:
<xs:complexType name="ttt">
  <xs:sequence>
    <xs:element name="address" type="..."/>...
  </xs:sequence>
</xs:complexType>
• Global element:
<xs:element name="address" type="..."/>
<xs:complexType name="ttt">
  <xs:sequence>
    <xs:element ref="address"/> ...
  </xs:sequence>
</xs:complexType>
40
Local Names
name has different meanings in person and in product
<xs:element name="person">
  <xs:complexType>
    . . . . .
    <xs:element name="name">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="firstname" type="xs:string"/>
          <xs:element name="lastname" type="xs:string"/>
        </xs:sequence>
      </xs:complexType>
    </xs:element>
    . . . .
  </xs:complexType>
</xs:element>
<xs:element name="product">
  <xs:complexType>
    . . . . .
    <xs:element name="name" type="xs:string"/>
  </xs:complexType>
</xs:element>
41
Subtle Use of Local Names
<xs:element name="A" type="oneB"/>
<xs:complexType name="onlyAs">
  <xs:choice>
    <xs:sequence>
      <xs:element name="A" type="onlyAs"/>
      <xs:element name="A" type="onlyAs"/>
    </xs:sequence>
    <xs:element name="A" type="xs:string"/>
  </xs:choice>
</xs:complexType>
<xs:complexType name="oneB">
  <xs:choice>
    <xs:element name="B" type="xs:string"/>
    <xs:sequence>
      <xs:element name="A" type="onlyAs"/>
      <xs:element name="A" type="oneB"/>
    </xs:sequence>
    <xs:sequence>
      <xs:element name="A" type="oneB"/>
      <xs:element name="A" type="onlyAs"/>
    </xs:sequence>
  </xs:choice>
</xs:complexType>
44
“All” Group
<xs:complexType name="PurchaseOrderType">
<xs:all> <xs:element name="shipTo" type="USAddress"/>
<xs:element name="billTo" type="USAddress"/>
<xs:element ref="comment" minOccurs="0"/>
<xs:element name="items" type="Items"/>
</xs:all>
<xs:attribute name="orderDate" type="xs:date"/>
</xs:complexType>
45
Derived Types by Extensions
<complexType name="Address">
<sequence> <element name="street" type="string"/>
<element name="city" type="string"/>
</sequence>
</complexType>
<complexType name="USAddress">
<complexContent>
<extension base="ipo:Address">
<sequence> <element name="state" type="ipo:USState"/>
<element name="zip" type="positiveInteger"/>
</sequence>
</extension>
</complexContent>
</complexType>
Corresponds to inheritance
46
Derived Types by Restrictions
<complexContent>
<restriction base="ipo:Items">
… [rewrite the entire content, with restrictions]...
</restriction>
</complexContent>
48
Facets of Simple Types
Facets = additional properties restricting a simple type
15 facets defined by XML Schema
Examples
• length
• minLength
• maxLength
• pattern
• enumeration
• whiteSpace
• maxInclusive
• maxExclusive
• minInclusive
• minExclusive
• totalDigits
• fractionDigits
49
Facets of Simple Types
• Can further restrict a simple type by
changing some facets
• Restriction = subset
50
Not so Simple Types
• List types:
<xs:simpleType name="listOfMyIntType">
<xs:list itemType="myInteger"/>
</xs:simpleType>
• Union types
• Restriction types
51
END OF SKIPPED MATERIAL
Discussion 1
What kinds of applications might use
XML ?
• Data exchange
– Take the data, don’t worry about schema
• Property lists
– Many attributes, most are NULL
• Evolving schema
– Quickly add a new attribute
53
Discussion 2
How is XML processed ?
• Via API
– Called DOM
– Navigate, update the XML arbitrarily
– BUT: memory bound
• Via some query language:
– XPath or XQuery
– Stand-alone processor OR embedded in SQL
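As an illustration of the API route, here is a minimal sketch using Python's built-in DOM implementation, xml.dom.minidom. It navigates to a node, then updates the tree in place; the whole document is held in memory, which is the "memory bound" caveat above. The sample document is made up for the example.

```python
from xml.dom import minidom

doc = minidom.parseString("<person><name>Mary</name><phone>2345</phone></person>")
person = doc.documentElement

# Navigate: read the text of the first <name> element.
name_text = person.getElementsByTagName("name")[0].firstChild.data

# Update: append a second <phone> element to the in-memory tree.
phone = doc.createElement("phone")
phone.appendChild(doc.createTextNode("3456"))
person.appendChild(phone)

phones = [p.firstChild.data for p in person.getElementsByTagName("phone")]
print(name_text, phones)
```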
55
Querying XML Data
Will discuss next:
56
Sample Data for Queries
<bib>
<book> <publisher> Addison-Wesley </publisher>
<author> Serge Abiteboul </author>
<author> <first-name> Rick </first-name>
<last-name> Hull </last-name>
</author>
<author> Victor Vianu </author>
<title> Foundations of Databases </title>
<year> 1995 </year>
</book>
<book price="55">
<publisher> Freeman </publisher>
<author> Jeffrey D. Ullman </author>
<title> Principles of Database and Knowledge Base Systems </title>
<year> 1998 </year>
</book>
</bib>
57
Data Model for XPath
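[Figure: the sample data drawn as a tree: the root at the top, the book nodes below it, and under each book its publisher, author, . . . . children]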
/bib/paper/year
/bib//first-name
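Simple path expressions like these can be tried out with Python's xml.etree.ElementTree, which supports a limited XPath subset. This is only a sketch: ET.fromstring returns the bib element rather than the document root, so each absolute path /bib/... is written relative to that element.

```python
import xml.etree.ElementTree as ET

BIB = """<bib>
  <book> <publisher> Addison-Wesley </publisher>
         <author> Serge Abiteboul </author>
         <author> <first-name> Rick </first-name>
                  <last-name> Hull </last-name>
         </author>
         <author> Victor Vianu </author>
         <title> Foundations of Databases </title>
         <year> 1995 </year>
  </book>
  <book price="55">
         <publisher> Freeman </publisher>
         <author> Jeffrey D. Ullman </author>
         <title> Principles of Database and Knowledge Base Systems </title>
         <year> 1998 </year>
  </book>
</bib>"""

bib = ET.fromstring(BIB)  # the <bib> element

years = [y.text.strip() for y in bib.findall("./book/year")]          # /bib/book/year
paper_years = bib.findall("./paper/year")                             # /bib/paper/year
first_names = [f.text.strip() for f in bib.findall(".//first-name")]  # /bib//first-name
print(years, paper_years, first_names)
```

The sample data contains no paper elements, so the second path selects nothing.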
60
XPath: Attribute Nodes
/bib/book/@price
Result: “55”
61
XPath: Wildcard
//author/*
Result: <first-name> Rick </first-name>
<last-name> Hull </last-name>
62
XPath: Text Nodes
/bib/book/author/text()
Result: Serge Abiteboul
Victor Vianu
Jeffrey D. Ullman
Functions in XPath:
– text() = matches the text value
– node() = matches any node (= * or @* or text())
– name() = returns the name of the current tag
63
XPath: Predicates
/bib/book/author[first-name]
Result: <author> <first-name> Rick </first-name>
<last-name> Hull </last-name>
</author>
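Predicates of this simple form are also available in the XPath subset of Python's xml.etree.ElementTree. A small sketch on a shortened, hypothetical version of the sample data; paths are relative to the bib element.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring(
    "<bib>"
    "<book><author>Serge Abiteboul</author>"
    "<author><first-name>Rick</first-name><last-name>Hull</last-name></author></book>"
    "<book price='55'><author>Jeffrey D. Ullman</author></book>"
    "</bib>"
)

# /bib/book/author[first-name]: authors that have a first-name child
structured_authors = bib.findall("./book/author[first-name]")
last_names = [a.findtext("last-name") for a in structured_authors]

# /bib/book[@price]: books that carry a price attribute
prices = [b.get("price") for b in bib.findall("./book[@price]")]
print(last_names, prices)
```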
64
XPath: More Predicates
/bib/book/author[first-name][address[.//zip][city]]/last-name
Result: <last-name> … </last-name>
<last-name> … </last-name>
/bib/book[author/text()]
67
XPath: More Axes
/bib/book/author/../author
Same as /bib/book/author
/bib/book[.//first-name/../last-name]
Same as /bib/book[.//*[first-name][last-name]]
69
XPath: Brief Summary
bib matches a bib element
* matches any element
/ matches the root element
/bib matches a bib element under root
bib/paper matches a paper in bib
bib//paper matches a paper in bib, at any depth
//paper matches a paper at any depth
paper|book matches a paper or a book
@price matches a price attribute
bib/book/@price matches price attribute in book, in bib
bib/book[@price<"55"]/author/last-name matches…
70
XQuery
• Based on Quilt, which is based on XML-QL
71
FLWR (“Flower”) Expressions
FOR ...
LET...
WHERE...
RETURN...
72
FOR-WHERE-RETURN
Find all book titles published after 1995:
for $x in document("bib.xml")/bib/book
where $x/year/text() > 1995
return $x/title
Result:
<title> abc </title>
<title> def </title>
<title> ghi </title>
73
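For readers more used to general-purpose languages, the FOR-WHERE-RETURN query corresponds closely to a filtered loop. A rough Python equivalent follows, using xml.etree.ElementTree on a small made-up bib document; it is an analogy, not an XQuery engine.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring(
    "<bib>"
    "<book><title>abc</title><year>1995</year></book>"
    "<book><title>def</title><year>2002</year></book>"
    "<book><title>ghi</title><year>1980</year></book>"
    "</bib>"
)

titles = [
    book.find("title")                     # return $x/title
    for book in bib.findall("./book")      # for $x in .../bib/book
    if int(book.findtext("year")) > 1995   # where $x/year/text() > 1995
]
result = [ET.tostring(t, encoding="unicode").strip() for t in titles]
print(result)
```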
FOR-WHERE-RETURN
Equivalently (perhaps more geekish)
Result:
<answer> <title> abc </title> <year> 1995 </year > </answer>
<answer> <title> def </title> <year> 2002 </year > </answer>
<answer> <title> ghk </title> <year> 1980 </year > </answer>
75
FOR-WHERE-RETURN
• Notice the use of “{“ and “}”
• What is the result without them ?
for $x in document("bib.xml")/bib/book
return <answer>
<title> $x/title/text() </title>
<year> $x/year/text() </year>
</answer>
76
for $x in document("bib.xml")/bib/book
where count($x/author)>3
return $x
for $x in document("bib.xml")/bib/book[count(author)>3]
return $x
81
Aggregates
Print all authors who published more than
3 books
for $b in document("bib.xml")/bib,
$a in distinct-values($b/book/author/text())
where count($b/book[author/text()=$a])>3
return <author> { $a } </author>
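The same aggregate can be sketched in Python with a counter over distinct author names. This assumes every author element holds plain text; the function name and the sample data are hypothetical, and the threshold is a parameter so the example stays small.

```python
from collections import Counter
import xml.etree.ElementTree as ET

def prolific_authors(bib, more_than):
    """Return the authors that appear in more than `more_than` books."""
    counts = Counter()
    for book in bib.findall("./book"):
        # A set, so that a book counts once per author (like distinct-values).
        names = {a.text.strip() for a in book.findall("author") if a.text and a.text.strip()}
        counts.update(names)
    return sorted(name for name, n in counts.items() if n > more_than)

bib = ET.fromstring(
    "<bib>"
    "<book><author>Ullman</author><author>Widom</author></book>"
    "<book><author>Ullman</author></book>"
    "<book><author>Vianu</author></book>"
    "</bib>"
)
authors = prolific_authors(bib, 1)
print(authors)
```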
82
Flattening
• “Flatten” the authors, i.e. return a list of
(author, title) pairs
for $b in document("bib.xml")/bib/book,
    $x in $b/title/text(),
    $y in $b/author/text()
return <answer>
         <title> { $x } </title>
         <author> { $y } </author>
       </answer>
Result:
<answer>
  <title> abc </title>
  <author> efg </author>
</answer>
<answer>
  <title> abc </title>
  <author> hkj </author>
</answer>
83
Re-grouping
• For each author, return all titles of her/his books
for $b in document("bib.xml")/bib
let $a:=distinct-values($b/book/author/text())
for $x in $a
return
  <answer>
    <author> { $x } </author>
    { for $y in $b/book[author/text()=$x]/title
      return $y }
  </answer>
Result:
<answer>
  <author> efg </author>
  <title> abc </title>
  <title> klm </title>
  ....
</answer>
84
Re-grouping
• Same thing:
for $b in document("bib.xml")/bib,
$x in distinct-values($b/book/author/text())
return
<answer>
<author> { $x } </author>
{ for $y in $b/book[author/text()=$x]/title
return $y }
</answer>
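Re-grouping amounts to building a map from each distinct author to the titles of that author's books. A minimal Python sketch, assuming plain-text author elements and using made-up sample data:

```python
import xml.etree.ElementTree as ET

def titles_by_author(bib):
    """Group book titles under each distinct author name."""
    groups = {}
    for book in bib.findall("./book"):
        title = book.findtext("title").strip()
        for author in book.findall("author"):
            groups.setdefault(author.text.strip(), []).append(title)
    return groups

bib = ET.fromstring(
    "<bib>"
    "<book><title>abc</title><author>efg</author><author>hkj</author></book>"
    "<book><title>klm</title><author>efg</author></book>"
    "</bib>"
)
groups = titles_by_author(bib)
print(groups)
```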
85
SQL and XQuery Side-by-side
Product(pid, name, maker, price)
Find all product names, prices, sort by price
SQL
XQuery
86
XQuery’s Answer
<answer>
<name> abc </name>
<price> 7 </price>
</answer>
<answer>
<name> def </name>
<price> 23 </price>
</answer>
....
87
SQL and XQuery Side-by-side
Product(pid, name, maker, price)
Company(cid, name, city, revenues)
Find all products made in Seattle
SQL:
SELECT x.name
FROM Product x, Company y
WHERE x.maker=y.cid
  and y.city='Seattle'
XQuery:
for $r in document("db.xml")/db,
    $x in $r/product/row,
    $y in $r/company/row
where $x/maker/text()=$y/cid/text()
  and $y/city/text() = "seattle"
return { $x/name }
Cool XQuery:
for $y in /db/company/row[city/text()="seattle"],
    $x in /db/product/row[maker/text()=$y/cid/text()]
return { $x/name }
88
<product>
<row> <pid> 123 </pid>
<name> abc </name>
<maker> efg </maker>
</row>
<row> …. </row>
…
</product>
<product>
...
</product>
....
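The join can be mimicked in Python over a document shaped like the one above. This is a sketch under assumptions: the company part of db.xml is not shown in the slides, so its layout here (a company element with row children holding cid, name, city) is hypothetical, as are the sample values.

```python
import xml.etree.ElementTree as ET

DB = """<db>
  <product>
    <row><pid>123</pid><name>abc</name><maker>c1</maker><price>150</price></row>
    <row><pid>124</pid><name>def</name><maker>c2</maker><price>20</price></row>
  </product>
  <company>
    <row><cid>c1</cid><name>efg</name><city>Seattle</city></row>
    <row><cid>c2</cid><name>xyz</name><city>Boston</city></row>
  </company>
</db>"""

db = ET.fromstring(DB)

# Companies located in Seattle (the selection on y.city).
seattle_cids = {
    c.findtext("cid") for c in db.findall("./company/row")
    if c.findtext("city") == "Seattle"
}
# Products whose maker is one of those companies (the join x.maker = y.cid).
names = [
    p.findtext("name") for p in db.findall("./product/row")
    if p.findtext("maker") in seattle_cids
]
print(names)
```

The string comparison is case-sensitive, just as in XQuery, so the city literal has to match the stored data exactly.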
89
SQL and XQuery Side-by-side
For each company with revenues < 1M, count how many
products with price > $100 they make
SELECT y.name, count(*)
FROM Product x, Company y
WHERE x.price > 100 and x.maker=y.cid and y.revenue < 1000000
GROUP BY y.cid, y.name
for $r in document("db.xml")/db,
    $y in $r/company/row[revenue/text()<1000000]
return
  <proudcompany>
    <companyname> { $y/name/text() } </companyname>
    <numberofexpensiveproducts>
      {count($r/product/row[maker/text()=$y/cid/text()][price/text()>100])}
    </numberofexpensiveproducts>
  </proudcompany>
90
SQL and XQuery Side-by-side
Find companies with more than 30 products, and their average price
SELECT y.name, avg(x.price)
FROM Product x, Company y
WHERE x.maker=y.cid
GROUP BY y.cid, y.name
HAVING count(*) > 30
for $r in document("db.xml")/db,
    $y in $r/company/row
let $p := $r/product/row[maker/text()=$y/cid/text()]
where count($p) > 30
return
  <thecompany>
    <companyname> { $y/name/text() } </companyname>
    <avgprice> { avg($p/price/text()) } </avgprice>
  </thecompany>
$r, $y = elements (bound by for); $p = collection (bound by let)
91
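The GROUP BY / HAVING pattern can be sketched in Python as a loop over companies that collects each company's products first and filters on the size of that collection afterwards. The document layout, the function name and the sample values are assumptions for illustration; the threshold is a parameter so the example stays small.

```python
import xml.etree.ElementTree as ET

def average_price_by_company(db, min_products):
    """For companies with more than `min_products` products, return their average price."""
    result = {}
    for company in db.findall("./company/row"):          # for $y in .../company/row
        cid = company.findtext("cid")
        prices = [                                       # let $p := matching products
            float(p.findtext("price"))
            for p in db.findall("./product/row")
            if p.findtext("maker") == cid
        ]
        if len(prices) > min_products:                   # where count($p) > ...
            result[company.findtext("name")] = sum(prices) / len(prices)
    return result

db = ET.fromstring(
    "<db>"
    "<product>"
    "<row><name>a</name><maker>c1</maker><price>10</price></row>"
    "<row><name>b</name><maker>c1</maker><price>30</price></row>"
    "<row><name>c</name><maker>c2</maker><price>5</price></row>"
    "</product>"
    "<company>"
    "<row><cid>c1</cid><name>efg</name></row>"
    "<row><cid>c2</cid><name>xyz</name></row>"
    "</company>"
    "</db>"
)
averages = average_price_by_company(db, 1)
print(averages)
```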
FOR v.s. LET
FOR
• Binds node variables → iteration
LET
• Binds collection variables → one value
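The difference can be illustrated with a loop versus a single assignment. The sketch below is a Python analogy on a made-up bib document, not XQuery itself: the FOR-style code builds one result per book, the LET-style code builds a single result holding the whole collection.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring(
    "<bib><book><title>abc</title></book><book><title>def</title></book></bib>"
)

# FOR: the variable is bound to each book in turn, giving one <result> per book.
for_results = []
for book in bib.findall("./book"):
    result = ET.Element("result")
    result.append(book)
    for_results.append(result)

# LET: the variable is bound once to the whole collection, giving a single <result>.
books = bib.findall("./book")
let_result = ET.Element("result")
let_result.extend(books)

print(len(for_results), len(let_result))
```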
92
FOR v.s. LET
Returns:
for $x in /bib/book
return <result> { $x } </result>
Returns:
<result> <book>...</book></result>
<result> <book>...</book></result>
<result> <book>...</book></result>
...
[Figure: FLWR evaluation pipeline: list of tuples → WHERE clause → list of tuples → RETURN clause → instance of XQuery data model]
94
XML in SQL Server 2005
• Create tables with attributes of type XML
95
CREATE TABLE DOCS (
ID int primary key,
XDOC xml)
96
XML Methods in SQL
• Query() = returns XML data type
• Value() = extracts scalar values
• Exist() = checks conditions on XML
nodes
• Nodes() = returns a rowset of XML nodes that the XQuery expression evaluates to
97
Examples
• From here:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnsql90/html/sql2k5xml.asp
98
XML Type
99
Inserting an XML Value
100
Query( )
101
Exist( )
102
Value( )
SELECT xCol.value(
'data((/doc//section[@num = 3]/title)[1])', 'nvarchar(max)')
FROM docs
103
Nodes( )
104
Nodes( )
105
Internal Storage
• XML is “shredded” as a table
• A few important ideas:
– Dewey decimal numbering of nodes; store in clustered B-tree index
– Use only odd numbers to allow insertions
– Reverse PATH-ID encoding, for efficient processing of
postfix expressions like //a/b/c
– Add more indexes, e.g. on data values
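A minimal sketch of Dewey-style numbering with odd ordinals, written in Python. It labels element nodes only (text and attribute nodes are ignored for brevity), and it illustrates the numbering idea rather than SQL Server's actual storage format; the function name and the shortened document are assumptions.

```python
import xml.etree.ElementTree as ET

def dewey_labels(root):
    """Return (label, tag) pairs in document order; siblings get ordinals 1, 3, 5, ..."""
    labels = []

    def visit(node, label):
        labels.append((".".join(map(str, label)), node.tag))
        for position, child in enumerate(node):
            # Only odd ordinals are used, so even ones stay free for later insertions.
            visit(child, label + [2 * position + 1])

    visit(root, [1])
    return labels

book = ET.fromstring(
    "<BOOK><SECTION><TITLE>Bad Bugs</TITLE><FIGURE/></SECTION>"
    "<SECTION><TITLE>Tree Frogs</TITLE><BOLD>love</BOLD></SECTION></BOOK>"
)
labels = dewey_labels(book)
for label, tag in labels:
    print(label, tag)
```

Because each label extends its parent's label, sorting by label components reproduces document order, and a prefix test answers ancestor/descendant questions.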
106
<BOOK ISBN="1-55860-438-3">
<SECTION>
<TITLE>Bad Bugs</TITLE>
Nobody loves bad bugs.
<FIGURE CAPTION="Sample bug"/>
</SECTION>
<SECTION>
<TITLE>Tree Frogs</TITLE>
All right-thinking people
<BOLD> love </BOLD>
tree frogs.
</SECTION>
</BOOK>
107
108
109
Infoset Table
/BOOK[@ISBN = "1-55860-438-3"]/SECTION
110