Web Application Engineering:

XML and the Web

Service Oriented Computing Group, CSE, UNSW

Week 4

References used for the Lecture:


XML in a nutshell, Chapters 9 and 10
http://www.ibm.com/developerworks/library/xml-schema/
Acknowledgement: Some examples in these notes originate from Dr. David Edmond from QUT, Brisbane

H. Paik, S.Venugopal (CSE, UNSW) COMP9321, 13s2 Week 4 1 / 59


eXtensible Markup Language (XML)

in HTML ...

<html>
<h1>Bibliography</h1>
<ol>
<li><i>Foundation of Databases</i>,
<b>Abiteboul, Hull</b>, 1995</li>
<li><i>Database Systems</i>
<b>Elmasri, Navathe</b>, 1994</li>
</ol>
</html>

in XML ...

<bibliography>
<book>
<title>Foundation of Databases</title>
<author>Abiteboul</author>
<author>Hull</author>
<year>1995</year>
</book>
<book> <!-- continues --> </book>
</bibliography>

A simple, very flexible and extensible text data format


“extensible” because the markup format is not fixed like HTML
It lets you design your own customised markup
XML is a language that describes data
It separates presentation issues from the actual data



XML: Tags, tags, tags
Consider the following snippet of information from a staff list:

LName    Title  FName  School               Campus  Room
Edgar    Miss   Pam    Optometry            KG      B501
Edmond   Dr     David  Information Systems  GP      S842
Edmonds  Dr     Ian    Physical Sciences    GP      M206

In XML ...

<Phonebook>
<Entry>
<LastName>Edgar</LastName>
<Title>Miss</Title>
<FirstName>Pam</FirstName>
<School>Optometry</School>
<Campus>KG</Campus>
<Room>B501</Room>
</Entry>
<Entry>
<LastName>Edmond</LastName>
<Title>Dr</Title>
<FirstName>David</FirstName>
<School>Information Systems</School>
<Campus>GP</Campus>
<Room>S842</Room>
</Entry>
<!-- Entry continues ... -->
</Phonebook>
Why XML? – Background

Early Web
Used to publish documents to be read by humans
HTML was designed for the purpose

Today’s Web
Many business activities are performed on the Web
Dynamic interactions:
Web app ⇔ people / Web app ⇔ Web app
Web becomes a platform for data exchange
XML provides a simple, cross-platform data format
Web contains vast amount of data published in HTML format
Many programs process or analyse such data
HTML changes ... (when data inside does not) → the program that
reads the HTML page must change too
XML provides a long-term, reliable data format for publishing



Why XML?

Benefits of using XML in document (data) exchange


Self-describing, modular and portable data
A common, widely accepted data representation language
Standard support available for creating/parsing XML docs
Standard support for checking validity of data
Efficient search of business information
standard support for querying XML docs
quick and simple search (XPath)
more comprehensive keyword + structure based search possible as well
(XQuery)
Extensible document descriptions
XML is flexible (cf. relational tables)!
reuse, adaptation of existing documents



Separating the Content from Presentation

XML
<?xml version="1.0" ?>
<?xml-stylesheet type="text/css" href="staffcard.css" ?>
<staff>
<name>Helen Paik</name>
<title>Lecturer, UNSW</title>
<email>hpaik@cse</email>
<extension>54095</extension>
<photo src="me.gif" />
</staff>

CSS
staff{background-color: #cccccc; ...}
name{display: block; font-size: 20pt; ... }
title{display: block; margin-left: 20pt;}
email{display: block; font-family: monospace;}
extension{display: block; margin-left: 20pt;}



XML Applications

Like any other good invention, XML is now used for things that are far
beyond its creators’ original imagination.
A set of ’tags’ that are developed for specific types of documents.
e.g., Chemical Markup Language (CML)

<atom id="caffeine_karne_a_1">
<float builtin="x3" units="A">-2.8709</float>
<float builtin="y3" units="A">-1.0499</float>
<float builtin="z3" units="A">0.1718</float>
<string builtin="elementType">C</string>
</atom>



XML Applications

Really Simple Syndication (RSS)


<rss version="0.91">
<channel>
<title>CNN.com</title>
<item>
<title>July ends with 76 ... killed</title>
<link>http://www.cnn.com/.../story.html</link>
<description>Three U.S. soldiers were ...</description>
</item>



XML is ...
Is a Language
there is a grammar, and it can be parsed by machines
Is a Markup Language
XML looks a bit like HTML (tags).
But it describes what things are, not what they are supposed to do
Is eXtensible
you can define more words and add to the language
XML is for structuring data.
XML is for describing data.
XML is text, but isn’t meant to be read.
XML is verbose by design.
XML is a family of technologies.
XML is license-free, platform-independent and well-supported.
XML is NOT a programming language.
it is not something you can ’compile’
The XML Family

XML: a language used to describe information.


DOM: a programming interface for accessing and updating
documents.
DTD/XML schema: a language for specifying the structure and
content of documents.
XSLT: a language for transforming documents.
XPath: a query language for navigating XML documents.
XPointer: for identifying fragments of a document.
XLink: generalises the concept of a hypertext link.
XInclude: for merging documents.
XQuery: a language for making queries across documents.
RDF: a language for describing resources.



Quick XML Syntax

An XML document is a tree ...

<office>
<phone>1235</phone>
<person>
<name>Alan</name>
<age>29</age>
<phone>2044</phone>
</person>
<person>
<name>Sue</name>
<age>45</age>
<phone>2043</phone>
</person>
</office>

The corresponding tree:

office
  phone: 1235
  person
    name: Alan
    age: 29
    phone: 2044
  person
    name: Sue
    age: 45
    phone: 2043



Mixed Content
XML can be used for more free-form documents (e.g., business reports,
magazine articles, essays, short stories, etc.)

Example
<biography>
<name><first_name>Alan</first_name> <last_name>Turing</last_name>
</name> was one of the first people to truly deserve the name
<emphasize>computer scientist</emphasize>. Although his contributions
to the field are too numerous to list, his best-known are the
eponymous <emphasize>Turing Test</emphasize> and
<emphasize>Turing Machine</emphasize>.

<definition>The <term>Turing Test</term> is to this day the standard
test for determining whether a computer is truly intelligent. This test
has yet to be passed.</definition>
</biography>

Mixed content: some elements may contain both character data and child
elements (e.g., <definition> and <biography>).



Attributes in XML tags
LName Title FName School Campus Room
Edgar Miss Pam Optometry KG B501
... ... ... ... ... ...

Phonebook with Attributes


<Phonebook>
<Entry entryNumber="001">
<Name Title="Miss">
<Last>Edgar</Last>
<First>Pam</First>
</Name>
<School Campus="KG">Optometry</School>
<Room Building="B" Level="5">01</Room>
</Entry>
</Phonebook>

Attribute order is not significant


Sometimes using attributes can make an XML document concise
Attributes in XML tags

LName Title FName School Campus Room


Edgar Miss Pam Optometry KG B501
... ... ... ... ... ...

Phonebook with many attributes ...


<Phonebook>
<Entry entryNumber="001">
<Name Title="Miss" LName="Edgar" FName="Pam"/>
<Location Campus="KG" School="Optometry" Building="B" Room="501"/>
</Entry>
</Phonebook>

Avoid using too many (loses structure, more parsing effort ...)



Entity References

The character data inside an element must not contain certain characters with special meanings (e.g., < means start of a tag)
You must escape the characters using entity references
XML predefines exactly five entity references:
&lt; - The less than sign (<)
&amp; - The ampersand (&)
&gt; - The greater than sign (>)
&quot; - The straight double quotation mark (")
&apos; - The apostrophe, single quote (')

<image source='koala.gif' width='122' height='66'
alt='Powered by O&apos;Reilly Books'
/>
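
The escaping rule above can be sketched in Java as follows. This is a minimal illustration, not part of the lecture: the class name EntityEscaper and the method escape are hypothetical, and real applications would normally let an XML library do the escaping when it serialises a document.

```java
// Hypothetical helper (not from the lecture): replaces the five special
// characters with the predefined XML entity references.
class EntityEscaper {
    static String escape(String text) {
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            switch (c) {
                case '<':  out.append("&lt;");   break;
                case '&':  out.append("&amp;");  break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The attribute value from the slide, before escaping.
        System.out.println(escape("Powered by O'Reilly Books"));
        System.out.println(escape("A & M Records <live>"));
    }
}
```

Escaping all five characters is always safe; strictly, only < and & must be escaped in element content, and a quote only needs escaping inside an attribute value delimited by the same kind of quote.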



CDATA section
Sometimes the character data of an element might contain too many
characters that need to be escaped (e.g., chunk of other XML parts or
HTML code).

CDATA section lets you enclose the character data as literal.

Example
<p>You can use a default <code>xmlns</code> attribute to avoid
having to add the svg prefix to all your elements:</p>
<![CDATA[
<svg xmlns="http://www.w3.org/2000/svg"
width="12cm" height="12cm">
<ellipse rx="110" ry="130"/>
<rect x="4cm" y="1cm" width="3cm" height="6cm"/>
</svg>
]]>

Everything between <![CDATA[ and ]]> is treated as raw characters, not markup.
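
To see that a parser really hands back CDATA content as plain characters, the following sketch parses a small document with the JAXP DOM parser that ships with the JDK. The class name CdataDemo, the method textOf and the <example> element are assumptions made for this illustration only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class: shows that markup-like text inside a CDATA
// section is delivered to the application as ordinary characters.
class CdataDemo {
    // Parses the given XML string and returns the text content of its root element.
    static String textOf(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<example><![CDATA[<rect x=\"4cm\" y=\"1cm\"/> & more]]></example>";
        // The <rect .../> tag and the bare & come back unchanged, as text.
        System.out.println(textOf(xml));
    }
}
```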
Quick XML syntax

All XML documents must have exactly one root element
All XML elements must have a closing tag
Empty-element tags end with />
XML tags are case sensitive (NAME vs. Name)
All XML elements must be properly nested (<p><q></p></q> is not allowed)
Element Naming
letters, numbers, and other characters
must not start with a number, ’. (period)’ or ’- (hyphen)’
must not start with ’xml’ (or XML or Xml ..)
cannot contain spaces
Attribute values must always be quoted (single or double)
Comments in XML: <!-- This is a comment -->



Defining the document structure

Phonebook.xml
<Phonebook>
<Entry>
<LastName Title="Miss">Edgar</LastName>
<FirstName>Pam</FirstName>
<School>Optometry</School>
<Campus>KG</Campus>
<Room>B501</Room>
<Extension>5695</Extension>
</Entry> <!-- and so on -->
</Phonebook>

How would we communicate the nature of this document? If we were to describe the document to someone over a phone line, we might say:
1 It’s a kind of internal (staff) phone book.
2 It’s made up of a number of individual entries.
3 Each entry contains the staff member’s last name, title, first name ...
4 A person’s title must be Miss or Mrs or Ms or Mr or Dr or Prof ...
DTD (Document Type Definitions)

Within XML: Internal DTD

<?xml version="1.0"?>
<!DOCTYPE Login [
<!ELEMENT Login (Username,Password)>
<!ELEMENT Username (#PCDATA)>
<!ELEMENT Password (#PCDATA)>
]>
<Login>
<Username>hpaik</Username>
<Password>IwillNeverTell</Password>
</Login>

Outside XML: External DTD

<?xml version="1.0"?>
<!DOCTYPE Login SYSTEM "login.dtd">
<Login>
<Username>hpaik</Username>
<Password>IwillNeverTell</Password>
</Login>



Phonebook.xml with Internal DTD

<?xml version="1.0"?>
<!DOCTYPE Phonebook [
<!ELEMENT Phonebook (Entry)+ >
<!ELEMENT Entry (LastName, FirstName, School, Campus, Room, Extension)>
<!ELEMENT LastName (#PCDATA)>
<!ELEMENT FirstName (#PCDATA)>
<!ELEMENT School (#PCDATA)>
<!ELEMENT Campus (#PCDATA)>
<!ELEMENT Room (#PCDATA)>
<!ELEMENT Extension (#PCDATA)>
<!ATTLIST LastName Title (Miss | Ms | Mrs | Mr | Dr | Prof) #REQUIRED>
]>
<Phonebook>
<Entry>
<LastName Title="Miss">Edgar</LastName>
<FirstName>Pam</FirstName>
<School>Optometry</School>
<Campus>GP</Campus>
<Room>B501</Room>
<Extension>5695</Extension>
</Entry> <!-- more entries not shown ... -->
</Phonebook>



Phonebook.xml with External DTD: Phonebook.dtd
Phonebook.xml
<?xml version="1.0"?>
<!DOCTYPE Phonebook SYSTEM "Phonebook.dtd">
<Phonebook>
<Entry>
<LastName Title="Miss">Edgar</LastName>
<FirstName>Pam</FirstName>
<!-- remaining child elements of this entry -->
</Entry>
<!-- rest of the entries -->
</Phonebook>

Phonebook.dtd
<!ELEMENT Phonebook (Entry+) >
<!ELEMENT Entry (LastName, FirstName, School,Campus, Room, Extension)>
<!ELEMENT LastName (#PCDATA)>
<!ELEMENT FirstName (#PCDATA)>
<!ELEMENT School (#PCDATA)>
<!ELEMENT Campus (#PCDATA)>
<!ELEMENT Room (#PCDATA)>
<!ELEMENT Extension (#PCDATA)>
<!ATTLIST LastName Title (Miss | Ms | Mrs | Mr | Dr | Prof) #REQUIRED>



Defining XML Content: Elements

A Book
<book>
<author>
<name>J.K. Rowling</name>
</author>
<detail>
<series>Seventh</series>
<title>Harry Potter and the Deathly Hallows</title>
</detail>
</book>

Creating Elements:
<!ELEMENT book (author, detail)>
<!ELEMENT author (name)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT detail (series, title)>
<!ELEMENT series (#PCDATA)>
<!ELEMENT title (#PCDATA)>



Defining XML Content: Modifiers

A Book
<book>
<author> <!-- more than one author? -->
<name>E. Harold</name> <name>S. Means</name>
</author>
<detail> <!-- not every book is in a series -->
<title>XML in a Nutshell</title>
</detail>
</book>

1 ? : optional element (at most once)
2 + : mandatory element (1 or more)
3 * : optional element (0 or more)

<!ELEMENT book (author, detail*)>
<!ELEMENT author (name+)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT detail (series?, title)>
<!ELEMENT series (#PCDATA)>
<!ELEMENT title (#PCDATA)>



Defining XML Content: Choices, Empty

Element Choices
<!ELEMENT newbooks (book+)>
<!ELEMENT book (author+, detail*)>
<!ELEMENT author (name | penname)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT penname (#PCDATA)>
<!ELEMENT detail ((series?, title) | (publisher, release*))>
<!ELEMENT series (#PCDATA)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT publisher (#PCDATA)>
<!ELEMENT release (#PCDATA)>

Empty Element Content


<!ELEMENT BR EMPTY>
<BR/> is called an empty element
Defining XML Content: Mixed content, Any

Mixed content: mixture of elements and text

<!ELEMENT message (#PCDATA | bold | italic)*>

<message>You <italic>really</italic> <bold>must</bold> try this delicious
<bold>new</bold> recipe for <italic>pudding</italic></message>

ANY: any element declared in the DTD, as well as text, may be included


<!ELEMENT book (author+, description, detail*)>
<!ELEMENT author (name+)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT description ANY>
<!ELEMENT detail (series?, title)>
<!ELEMENT series (#PCDATA)>
<!ELEMENT title (#PCDATA)>



Defining XML Content: Creating Attributes

<book>
<author period="classical" category="children">
<name type="normal">J.K. Rowling</name>
</author>
<title>Harry Potter and the Half-Blood Prince</title>
</book>

Creating Attributes:
<!ELEMENT book (author, title)>
<!ELEMENT author (name+)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT title (#PCDATA)>
<!ATTLIST name type (normal | penname) "normal">
<!ATTLIST author period CDATA #REQUIRED
category CDATA #IMPLIED>
Defining XML Content: Creating Attributes

Default values for attributes:

The default postcode in an address is to be 4001, state must be QLD.

<!ATTLIST Address Postcode CDATA "4001"
State CDATA #FIXED "QLD">

The above definition has the following effects on the source doc.

<Address /> → <Address Postcode="4001" State="QLD"/>
<Address Postcode="4010" State="QLD"/> → (no error)
<Address Postcode="4001" State="NSW"/> → (error)
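
The three cases above can be checked with a validating JAXP parser. The sketch below is illustrative only: the class name AttributeDefaultsDemo and the method parse are hypothetical, and the Address element is declared EMPTY here purely so that the DTD is complete.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical demo class: default and #FIXED attribute values from a DTD.
class AttributeDefaultsDemo {
    // Assumption: Address is an empty element; only the ATTLIST is from the slide.
    static final String DTD =
          "<!DOCTYPE Address [\n"
        + "<!ELEMENT Address EMPTY>\n"
        + "<!ATTLIST Address Postcode CDATA \"4001\"\n"
        + "                  State CDATA #FIXED \"QLD\">\n"
        + "]>\n";

    // Parses DTD + body with validation on. Returns the root element,
    // or null if the parser reports any error.
    static Element parse(String body) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { /* ignored */ }
                public void error(SAXParseException e) throws SAXParseException { throw e; }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            return builder.parse(new InputSource(new StringReader(DTD + body)))
                          .getDocumentElement();
        } catch (Exception e) {
            return null;
        }
    }

    public static void main(String[] args) {
        Element a = parse("<Address/>");
        // Both attributes are supplied by the parser from the DTD.
        System.out.println(a.getAttribute("Postcode") + " " + a.getAttribute("State"));
        // A State other than the fixed value is rejected.
        System.out.println(parse("<Address Postcode=\"4001\" State=\"NSW\"/>") == null);
    }
}
```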



XML Custom Entities
Sometimes it might be desirable to construct a document from several
(other) XML documents:
<!DOCTYPE sql [
<!ELEMENT sql (select, from)>
<!ELEMENT select (col+)>
<!ATTLIST select order CDATA #REQUIRED>
<!ELEMENT col (#PCDATA)>
<!ELEMENT from (table+)>
<!ELEMENT table (#PCDATA)>
<!ENTITY select SYSTEM "select.xml">
<!ENTITY from SYSTEM "from.xml">]>
<sql>&select;&from;</sql>
Where "select.xml" contains:
<select order="cost">
<col>CarNr</col>
<col>Make</col>
<col>Cost</col>
</select>
And "from.xml" contains:
<from>
<table>Cars</table>
</from>
Parameter Entities

It might be a good idea to fragment the DTD in the same way that the
document content is partitioned:
<!DOCTYPE sql [
<!ELEMENT sql (select, from)>
<!ENTITY % seldef SYSTEM "select.dtd">
%seldef;
<!ENTITY % fromdef SYSTEM "from.dtd">
%fromdef;
<!ENTITY select SYSTEM "select.xml">
<!ENTITY from SYSTEM "from.xml">]>
<sql>&select;&from;</sql>

Where "select.dtd" is defined as:
<!ELEMENT select (col+)>
<!ATTLIST select order CDATA #REQUIRED>
<!ELEMENT col (#PCDATA)>
and "from.dtd" is:
<!ELEMENT from (table+)>
<!ELEMENT table (#PCDATA)>



Well-formedness and Validity of XML
Well-formedness Rules:
Open and close all tags
Empty-element tags end with />
There is a unique root element
Elements may not overlap
Attribute values are quoted
< and & are only used to start tags and entity references, respectively
Only the five predefined entity references are used, unless other entities are declared in a DTD
Validity Rules:
Well-formed
Must have a Document Type Definition (DTD)
Must comply with the constraints specified in the DTD
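
The difference between the two sets of rules can be demonstrated with the JAXP parser in the JDK. In this sketch, which reuses the Login DTD from the earlier slide, the class name WellFormedVsValid and the method accepts are hypothetical names chosen for the illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical demo class: well-formedness versus validity.
class WellFormedVsValid {
    static final String DTD =
          "<!DOCTYPE Login [\n"
        + "<!ELEMENT Login (Username,Password)>\n"
        + "<!ELEMENT Username (#PCDATA)>\n"
        + "<!ELEMENT Password (#PCDATA)>\n"
        + "]>\n";

    // Returns true if the parser accepts the document. With validate=false only
    // well-formedness is checked; with validate=true the DTD is enforced as well.
    static boolean accepts(String xml, boolean validate) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(validate);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { /* ignored */ }
                public void error(SAXParseException e) throws SAXParseException { throw e; }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String missingPassword = DTD + "<Login><Username>hpaik</Username></Login>";
        System.out.println(accepts(missingPassword, false)); // well-formed?
        System.out.println(accepts(missingPassword, true));  // valid against the DTD?
        System.out.println(accepts("<Login><Username>hpaik</Login>", false)); // overlapping tags
    }
}
```

A document that fails the well-formedness check is rejected by every XML parser; a validity failure is only reported when validation has been switched on.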



Limitations of DTD

Limited support for constraining attribute values
No limits on character data (element content cannot be typed or restricted)
Limited support for namespaces
No self-documentation, does not have XML syntax



XML Schema Definition

XML Schema - W3C’s recommendation for replacing DTD with features such as:
Simple and complex data types
Type derivation and inheritance
Namespace-aware element and attributes
Limits on number of appearances by an element
Combining with regular expressions for finer control over document structure

Most importantly, XML Schemas are well-formed XML documents themselves. But first, what is a namespace?



XML Namespaces

XML elements can have any names.
What if a name could mean two different things (i.e., a name clash)?
Consider the following two XML documents that describe student information.

From University X:

<student>
<id>12345</id>
<name>Jeff Smith</name>
<language>C#</language>
<rating>9.5</rating>
</student>

From University Y:

<student>
<id>534-22-5252</id>
<name>Bob Citizen</name>
<language>Spanish</language>
<rating>3.2</rating>
</student>

How could a program distinguish the different elements?



Another example ...

Two separate vocabularies:

<Book>
  <Name>
  <ISBN>
  <Ed>

<Author>
  <Name>
    <First>
    <Last>
  <Email>

Combined in one document, <Name> appears with two different meanings:

<Books>
  <Book>
    <Name>
    <ISBN>
    <Ed>
    <Author>
      <Name>
        <First>
        <Last>
      <Email>



XML Namespaces

A namespace is a set of names in which all names are unique.

namespace:Book - ID, title, price, author, publisher
namespace:Project - ID, title, due-date, manager, budget, auditor
namespace:Employee - ID, title, firstname, lastname, salary

The name ’title’ can now be identified as: Book.title, Project.title, Employee.title ...
These names are called “qualified names”.
XML namespaces give elements and attributes a unique name across the Internet.
XML namespaces enable programmers to process the tags and attributes they care about and ignore those that don’t matter to them.



Previous examples can now be ...

The previous examples can now have qualified names:

namespace:UniversityX/Student - ID, name, language, rating
namespace:UniversityY/Student - ID, name, language, rating

Book Namespace:
<Books>
  <Book>
    <Name>
    <ISBN>
    <Ed>
    <Author>

Author Namespace:
<Name>
  <First>
  <Last>
<Email>



XML Namespace Syntax
xmlns:<prefix>=’namespace identifier’
e.g., <books xmlns:xdc="http://www.xml.com/books">
not a normal XML attribute (treated differently)
the URI must be unique, but need not point to an actual (’useful’) resource
the prefix is by convention or author’s choice
Consider the following XML document: painting.xml
<catalog>
<rdf:RDF xmlns:rdf="http://www.w3.org/TR/REC-rdf-syntax#">
<rdf:Description xmlns:dc="http://purl.org/dc/" about="painting.xml">
<dc:title>Impressionist Paintings</dc:title>
<dc:creator>Elliotte Rusty Harold</dc:creator>
<dc:description>impressionist paintings</dc:description>
<dc:date>2000-08-22</dc:date>
</rdf:Description>
</rdf:RDF>
<painting>
<title>Memory of the Garden at Etten</title>
<artist>Vincent Van Gogh</artist>
<date>1888</date>
<description>Two women look to the left.</description>
</painting> </catalog>
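
A namespace-aware parser lets a program pick out only the title elements it cares about. The sketch below uses a cut-down version of painting.xml; the class name NamespaceDemo and the method titles are hypothetical, and the dc prefix is declared on the root element only to keep the sample short.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class: telling dc:title apart from the painting's own title.
class NamespaceDemo {
    static final String XML =
          "<catalog xmlns:dc=\"http://purl.org/dc/\">"
        + "<dc:title>Impressionist Paintings</dc:title>"
        + "<painting><title>Memory of the Garden at Etten</title></painting>"
        + "</catalog>";

    // Returns the text of every <title> element in the given namespace.
    // Passing "*" matches title elements in any namespace, or in none.
    static List<String> titles(String namespaceUri) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // off by default in JAXP
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = doc.getElementsByTagNameNS(namespaceUri, "title");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(titles("http://purl.org/dc/")); // only the Dublin Core title
        System.out.println(titles("*"));                   // both titles, in document order
    }
}
```

The match is made on the namespace URI, not on the prefix, so the document author is free to choose any prefix.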



XML Schema Definition

The elements of a schema are drawn from the XML Schema namespace
(http://www.w3.org/2001/XMLSchema)
Every schema document should have an xs:schema (or schema) element as its root.

<schema xmlns="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://www.example.org"
xmlns:tns="http://www.example.org"
elementFormDefault="qualified">
All the names defined in a schema belong to its targetNamespace.
Element declarations can refer to names in other source namespaces
as well.
A form of inheritance.



XML Schema Definition

Types and Declarations


Simple Types - Basic data types such as strings, integers, boolean,
etc.
Complex Types - Composed of simple types. Consists of an
arrangement of elements and attributes.
Element declarations - Associates an element name to an instance of
a simple or complex type
Attribute declaration - Associates an attribute name with an instance
of a simple type.



Simple Types

Simple Type Declaration

Cannot contain elements or attributes
Can be pre-defined or user-defined.
Examples:

<element name="sname" type="string" />
<element name="age" type="integer" />
<element name="course" type="string"/>

Which defines:

<sname>John Doe</sname>
<age>24</age>
<course>COMP9321</course>



Attributes

Attributes are simple types as well (however, simple types themselves CANNOT contain attributes)

<attribute name="currency" type="string"/>

Attributes can have default or fixed values and can be optional or required

<attribute name="currency" type="string" default="EUR"/>
<attribute name="currency" type="string" fixed="AUD"/>
<attribute name="currency" type="string" use="required"/>



Type Restrictions

Restrictions are used to specify a range of acceptable values for XML elements or attributes.
Examples:
<simpleType name="nameString">
<restriction base="string">
<pattern value="([a-zA-Z])+"/>
</restriction>
</simpleType>
<element name="sname" type="tns:nameString" />

<simpleType name="ageNum">
<restriction base="integer">
<minInclusive value="1"/>
<maxInclusive value="80"/>
</restriction>
</simpleType>
<element name="age" type="tns:ageNum" />

<simpleType name="courseString">
<restriction base="string">
<enumeration value="COMP9321"/>
<enumeration value="COMP9322"/>
<enumeration value="COMP9323"/>
</restriction>
</simpleType>
<element name="course" type="tns:courseString"/>



Complex Types

Complex Types can be empty or composed of only elements, or only text, or a mix of both elements and text.
The order and number of child elements are controlled by the indicators below:
Order indicators:
All - all elements specified in the type can occur in any order but
must occur only once
Choice - exactly one of the listed elements must be present
Sequence - all elements must occur in the order specified. You can
have more than one element.
Occurrence indicators:
maxOccurs
minOccurs



Complex Types

<complexType name="student">
<sequence>
<element name="sname" type="tns:nameString" />
<element name="age" type="tns:ageNum" />
<element name="course" type="tns:courseString"/>
</sequence>
</complexType>
<element name="student" type="tns:student" />

which defines:

<student>
<sname>John Doe</sname>
<age>24</age>
<course>COMP9321</course>
</student>
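
Validating an instance document against such a schema can be sketched with the javax.xml.validation API in the JDK. The schema below is a shortened version of the slides' student type (only sname and age, with sname left as a plain string); the class name SchemaValidationDemo and the method isValid are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical demo class: checks a <student> document against a small schema.
class SchemaValidationDemo {
    static final String XSD =
          "<schema xmlns=\"http://www.w3.org/2001/XMLSchema\""
        + " targetNamespace=\"http://www.example.org\""
        + " xmlns:tns=\"http://www.example.org\""
        + " elementFormDefault=\"qualified\">"
        + "<simpleType name=\"ageNum\"><restriction base=\"integer\">"
        + "<minInclusive value=\"1\"/><maxInclusive value=\"80\"/>"
        + "</restriction></simpleType>"
        + "<complexType name=\"student\"><sequence>"
        + "<element name=\"sname\" type=\"string\"/>"
        + "<element name=\"age\" type=\"tns:ageNum\"/>"
        + "</sequence></complexType>"
        + "<element name=\"student\" type=\"tns:student\"/>"
        + "</schema>";

    // Returns true if the instance document conforms to the schema above.
    static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory
                    .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
            try {
                schema.newValidator().validate(new StreamSource(new StringReader(xml)));
                return true;
            } catch (SAXException invalid) {
                return false; // the instance violates the schema or is malformed
            }
        } catch (SAXException | IOException e) {
            throw new IllegalStateException(e); // the schema itself could not be loaded
        }
    }

    static String student(int age) {
        return "<student xmlns=\"http://www.example.org\">"
             + "<sname>John Doe</sname><age>" + age + "</age></student>";
    }

    public static void main(String[] args) {
        System.out.println(isValid(student(24))); // within 1..80
        System.out.println(isValid(student(95))); // outside the ageNum range
    }
}
```

Because elementFormDefault is "qualified", the instance document must place student and its children in the target namespace, which the sample does with a default xmlns declaration.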



Parsing XML documents with Java
Inside an XML file, there are ...
Markup:
Tags, Entity References, Comments, Processing Instructions, DTD
declarations, XML declaration, and CDATA Section Delimiters
and Character Data which includes everything else
Parsed Character Data (PCDATA): character data left after entity
references are replaced with their text
e.g. Given <PUBLISHER>A &amp; M Records</PUBLISHER>, the
parsed character data is A & M Records
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE SONG SYSTEM "song.dtd">
<SONG xmlns="http://www.cafeconleche.org/namespace/song">
<TITLE>New Slang</TITLE>
<ARTWORK ALT="Garden State" WIDTH="100" HEIGHT="200"/>
<ARTIST>The Shins</ARTIST>
<!-- The publisher is actually Polygram but I needed an example of a general entity reference. -->
<PUBLISHER>A &amp; M Records</PUBLISHER>
<LENGTH>3:29</LENGTH>
<YEAR>2004</YEAR>
</SONG>



Parsing XML documents with Java

What do you mean by ’parsing (or processing) XML docs’ ?

Parsing makes an interface available to your application that needs to make use of the document
Through the interface, you can retrieve and modify the document contents.

XML Data → XML Parser → Some Interface → Your Application

What if the interface provided by the parser is parser-specific? → your application will have to be ’parser-specific’.
Obviously ... we want “STANDARD”!

XML Data → XML Parser → Standard Interface → Your Application



SAX and DOM as the Standard Interfaces

SAX - the Simple API for XML


DOM - the Document Object Model
Why two standards? → trade-off between control and performance
DOM gives you a tree structure
you have a complete control over the structure
ie., traverse the tree, modify structure, etc.
the tree gets stored in memory all at once
SAX lays out the document in time, as a sequence of ’events’
events are associated with each tag (open/close), each tag body, etc.
you will write event handlers (ie., you can ignore certain events)
It requires much less memory
It becomes difficult to use if processing an element depends on
earlier/later elements
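
The event-driven style of SAX can be illustrated with the SAX parser bundled in the JDK. In this sketch the class name SaxCountBooks and the method countElements are hypothetical; the handler reacts only to start-of-element events and ignores everything else.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: counts elements with SAX, without building a tree.
class SaxCountBooks {
    static int countElements(String xml, String name) {
        final int[] count = {0};
        // DefaultHandler supplies empty implementations for all events,
        // so only the event of interest needs to be overridden.
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if (qName.equals(name)) {
                    count[0]++;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return count[0];
    }

    public static void main(String[] args) {
        String xml = "<bibliography><book/><book/><cd/></bibliography>";
        System.out.println(countElements(xml, "book"));
    }
}
```

Nothing is kept in memory beyond the counter, which is why SAX suits very large documents; the price is that any context (for example, which parent an element belongs to) has to be tracked by the handler itself.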



Document Object Model (DOM)

DOM is an API for HTML and XML documents; its specification is developed by W3C (http://www.w3.org/DOM/DOMTR)
It defines the logical structure of documents and the way a document
is accessed and manipulated.

<TABLE>
<TBODY>
<TR>
<TD>Assignment One</TD>
<TD>Submission Instructions</TD>
</TR>
<TR>
<TD>Assignment Two</TD>
<TD>Submission Instructions</TD>
</TR>
</TBODY>
</TABLE>

The corresponding DOM tree:

TABLE
  TBODY
    TR
      TD: Assignment One
      TD: Submission Instructions
    TR
      TD: Assignment Two
      TD: Submission Instructions

http://java.sun.com/j2se/1.5.0/docs/api/org/w3c/dom/package-summary.html
Dealing with Nodes in DOM

In DOM, XML Documents are treated as a tree of nodes


Types of Nodes: Twelve different kinds of node are defined by the W3C
DOM standard.
elements
attributes
text
CDATA
entity reference
entity
processing instruction
comment
Document
Document type
Document fragment
Notation



An example XML here ...
<?xml version="1.0"?>
<bibliography>
<book>
<author>North, Ken</author>
<title>Database magic with Ken North</title>
<address>Upper Saddle River, NJ</address>
<publisher>Prentice Hall</publisher>
<year>1999</year>
<isbn>0136471994</isbn>
</book>
<book>
<author> Elmasri, Ramez, and Shamkant B. Navathe</author>
<title>Fundamentals of database systems</title>
<address>Reading, Mass.</address>
<publisher>Addison-Wesley Pub Co</publisher>
<year>2000 </year>
<edition>3rd</edition>
<isbn>0201542633</isbn>
</book>
<book>
<author>Feiler, Jesse</author>
<title>Database-driven Web sites</title>
<address>San Francisco</address>
<publisher>Morgan Kaufmann</publisher>
<year>1999</year>
<isbn>0122513363</isbn>
</book>
<!-- more books -->
</bibliography>
Text of an element node is stored in a text node.
DOM for XML

The root node has no parent but every other node has exactly one
parent node
A node can have any number of children
Two nodes that have the same parent are called siblings
A node with no children is called a leaf node
Among siblings, the node that appears first in sequence is called the
first child and the node that appears last is the last child
Important: Text of an element node is stored in a separate text node.



Using a DOM Parser (eg., Apache Xerces)
import org.w3c.dom.*;
import org.apache.xerces.parsers.DOMParser;

public class DOMCountNames {
    public static void main(String[] args) {
        try {
            DOMParser parser = new DOMParser();
            parser.parse(args[0]);               // args[0]: the XML file to parse
            Document doc = parser.getDocument(); // the in-memory DOM tree
            // do something ..
        }
        catch (Exception e) {
            e.printStackTrace(System.err);
        }
    }
}

The DOMParser Class:

DOMParser class is derived from the XMLParser class
parse() method parses the input source given by a system identifier
Document getDocument() method returns the document itself
Document Interface Methods

Once you have the Document object, you can use:

Attr createAttribute(String name): Creates an attribute
Element createElement(String tagName): Creates an element
Text createTextNode(String data): Creates a text node
Element getDocumentElement(): Gets the root element of the document
Element getElementById(String elementId): Gets the element with the
given ID
NodeList getElementsByTagName(String tagname): Returns a NodeList of
all the elements with a given tag name

NodeList Interface Methods:

int getLength(): Gets the number of nodes in this list
Node item(int index): Gets the item at the specified index in the
collection
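
The following sketch shows how these Document and NodeList methods might fit together. It assumes the JDK's built-in JAXP parser rather than Xerces; the class name DocumentMethodsSketch and the lang attribute are hypothetical and only serve the illustration.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class DocumentMethodsSketch {
    // Build a small bibliography in memory using the Document factory methods.
    static Document build() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = doc.createElement("bibliography");
            doc.appendChild(root);

            String[] titles = {
                "Database-driven Web sites",
                "Fundamentals of database systems"
            };
            for (String t : titles) {
                Element book = doc.createElement("book");
                Element title = doc.createElement("title");
                // The title text lives in its own text node under <title>.
                title.appendChild(doc.createTextNode(t));
                book.appendChild(title);
                root.appendChild(book);
            }

            Attr lang = doc.createAttribute("lang"); // hypothetical attribute
            lang.setValue("en");
            root.setAttributeNode(lang);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = build();
        System.out.println("Root: " + doc.getDocumentElement().getTagName());
        NodeList titles = doc.getElementsByTagName("title");
        for (int i = 0; i < titles.getLength(); i++) {
            // item(i) is the <title> element; its first child is the text node.
            System.out.println(i + ": " + titles.item(i).getFirstChild().getNodeValue());
        }
    }
}
```

Note that getElementById() only finds elements whose attribute has been declared to be of type ID (for example in a DTD); an attribute that merely happens to be named "id" is not enough.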
Examples of Node Properties (XML), p.9.25

persons
    person
        first   Alan
        last    Wiles
    person
        first   Jun
        last    Li
    person
        first   Sue
        last    White

Expression                                                         Result
document.getElementsByTagName("person")[1]                         person
document.getElementsByTagName("person")[1].parentNode              persons
document.getElementsByTagName("person")[1].childNodes              first last
document.getElementsByTagName("person")[1].firstChild              first
document.getElementsByTagName("person")[1].lastChild               last
document.getElementsByTagName("person")[1].previousSibling         person
document.getElementsByTagName("person")[1].lastChild.firstChild    Li
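
The expressions above use JavaScript property syntax. In Java the same navigation is done with getter methods on Node. The sketch below is one way to reproduce the table, assuming the JDK's built-in JAXP parser and a persons document written without whitespace between tags (so no extra text nodes appear among the children). The class name NodePropertiesSketch is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NodePropertiesSketch {
    // No whitespace between tags, so each <person> has exactly two children.
    static final String XML =
          "<persons>"
        + "<person><first>Alan</first><last>Wiles</last></person>"
        + "<person><first>Jun</first><last>Li</last></person>"
        + "<person><first>Sue</first><last>White</last></person>"
        + "</persons>";

    // Returns the second <person> element (index 1), as in the table.
    static Node secondPerson() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            return doc.getElementsByTagName("person").item(1);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Node p = secondPerson();
        System.out.println(p.getNodeName());                       // person
        System.out.println(p.getParentNode().getNodeName());       // persons
        System.out.println(p.getChildNodes().getLength());         // 2
        System.out.println(p.getFirstChild().getNodeName());       // first
        System.out.println(p.getLastChild().getNodeName());        // last
        System.out.println(p.getPreviousSibling().getNodeName());  // person
        // The text "Li" is stored in a text node below <last>.
        System.out.println(p.getLastChild().getFirstChild().getNodeValue()); // Li
    }
}
```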

Count/Print the number of 'book' elements

import org.w3c.dom.*;
import org.apache.xerces.parsers.DOMParser;

public class DOMCountNames {
    public static void main(String[] args) {
        try {
            DOMParser parser = new DOMParser();
            parser.parse(args[0]);
            Document doc = parser.getDocument();
            NodeList nodelist = doc.getElementsByTagName("book");
            System.out.println(args[0] + " has " + nodelist.getLength()
                               + " <book> elements.");
        }
        catch (Exception e) {
            e.printStackTrace(System.err);
        }
    }
}

Compiling and Running


% javac -classpath ":xerces.jar" DOMCountNames.java
% java -classpath ":xerces.jar" DOMCountNames books.xml

Dealing with Nodes in DOM

There is a large range of methods that can be applied to the nodes:

Node Interface Methods


getNodeName()        getNodeValue()       setNodeValue(...)
getNodeType()        getParentNode()      getChildNodes()
getFirstChild()      getLastChild()       getPreviousSibling()
getNextSibling()     getAttributes()      getOwnerDocument()
insertBefore(...)    replaceChild(...)    removeChild(...)
appendChild(...)     hasChildNodes()      cloneNode(...)
normalize()          isSupported(...)     getNamespaceURI()
hasAttributes()      getLocalName()
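
As a rough illustration of the tree-editing methods in this list, the sketch below builds a book element and rearranges its children. It assumes the JDK's built-in JAXP classes for creating an empty Document; the class name NodeEditingSketch and the helper childNames are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class NodeEditingSketch {
    // Hypothetical helper: names of the children of parent, in order.
    static String childNames(Node parent) {
        StringBuilder sb = new StringBuilder();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (sb.length() > 0) sb.append(',');
            sb.append(n.getNodeName());
        }
        return sb.toString();
    }

    static Element build() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element book = doc.createElement("book");
            doc.appendChild(book);

            Element title = doc.createElement("title");
            Element year = doc.createElement("year");
            book.appendChild(title);              // children: title
            book.appendChild(year);               // children: title, year

            Element author = doc.createElement("author");
            book.insertBefore(author, title);     // children: author, title, year
            book.removeChild(year);               // children: author, title

            Element isbn = doc.createElement("isbn");
            book.replaceChild(isbn, title);       // children: author, isbn

            // cloneNode(true) makes a deep copy that has no parent yet.
            Node copy = author.cloneNode(true);
            book.appendChild(copy);               // children: author, isbn, author
            return book;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element book = build();
        System.out.println(book.hasChildNodes());  // true
        System.out.println(childNames(book));      // author,isbn,author
    }
}
```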

Dealing with Nodes

Consider the following program:


Document doc = parser.getDocument();
Element docRoot = doc.getDocumentElement();
String docRootName = docRoot.getTagName();
System.out.println("Doc root: "+docRootName);
int i = 0;
for (Node node = docRoot.getFirstChild();
     node != null;
     node = node.getNextSibling()) {
    if (node.getNodeType()==Node.ELEMENT_NODE) {
        System.out.println(i+": " + node.getNodeType()
                           + node.getNodeName());
    }
    else {
        System.out.println(i+": " + node.getNodeType());
    }
    i++;
}

The method getNodeType() returns a number in the range 1 to 12. Thus
we can tell which kind of node we are dealing with. What will the output
of this program be?
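
When experimenting with this question, it can help to print a readable name instead of the raw type number. The sketch below is a small aid under some assumptions: it uses the JDK's built-in JAXP parser, a made-up menu document (not the one used in the lecture), and a hypothetical class NodeTypeSketch. It names only the three node types that occur in the sample.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NodeTypeSketch {
    // Map a few of the 12 node type codes to readable names.
    static String typeName(short type) {
        switch (type) {
            case Node.ELEMENT_NODE: return "ELEMENT"; // code 1
            case Node.TEXT_NODE:    return "TEXT";    // code 3
            case Node.COMMENT_NODE: return "COMMENT"; // code 8
            default:                return "OTHER(" + type + ")";
        }
    }

    // List the node types of the root element's children, in order.
    static String childTypes(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            StringBuilder sb = new StringBuilder();
            for (Node n = doc.getDocumentElement().getFirstChild();
                 n != null;
                 n = n.getNextSibling()) {
                if (sb.length() > 0) sb.append(' ');
                sb.append(typeName(n.getNodeType()));
            }
            return sb.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Line breaks and indentation between tags become text nodes.
        String xml = "<menu>\n  <food/>\n  <!-- note -->\n  <food/>\n</menu>";
        System.out.println(childTypes(xml));
        // TEXT ELEMENT TEXT COMMENT TEXT ELEMENT TEXT
    }
}
```

The same document written on a single line with no whitespace between tags yields only the ELEMENT and COMMENT entries.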
The Element interface
This interface defines operations that are specific to elements:
getTagName(): Returns the name of the tag associated with the element.
getAttribute(name): Returns a string containing the value of the
attribute called name.
Amend the code in the previous example to print out the name and value
of the type attribute attached to the food element.

// Inside the ELEMENT_NODE branch, where the cast to Element is safe:
Element tmp = (Element)node;

if (tmp.getTagName().equals("food"))
{
    String typeVal = tmp.getAttribute("type");
    System.out.println("type: " + typeVal);
}
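
Put together with the sibling loop from the previous slide, the amended code might look like the sketch below. It assumes the JDK's built-in JAXP parser and a made-up menu document with food elements; the class name FoodTypeSketch and the sample XML are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class FoodTypeSketch {
    // Collect the value of the type attribute of every <food> child of the root.
    static List<String> foodTypes(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<String> types = new ArrayList<>();
            for (Node node = doc.getDocumentElement().getFirstChild();
                 node != null;
                 node = node.getNextSibling()) {
                // Skip text and comment nodes: only elements can be cast to Element.
                if (node.getNodeType() != Node.ELEMENT_NODE) {
                    continue;
                }
                Element tmp = (Element) node;
                if (tmp.getTagName().equals("food")) {
                    types.add(tmp.getAttribute("type"));
                }
            }
            return types;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<menu>\n"
                   + "  <food type=\"fruit\"/>\n"
                   + "  <drink/>\n"
                   + "  <food/>\n"
                   + "</menu>";
        for (String t : foodTypes(xml)) {
            System.out.println("type: " + t);
        }
    }
}
```

getAttribute() returns an empty string when the element has no such attribute, so the second food element in the sample prints "type: " with nothing after it.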

More with DOM ...

There are heaps of hands-on tutorials on the web ...

DOM and JavaScript: e.g.,
http://www.sitepoint.com/print/xml-javascript-mozilla
https://developer.mozilla.org/en/The_DOM_and_JavaScript

DOM, XML, JavaScript and Ajax: e.g.,
http://www.w3schools.com/Ajax/ajax_intro.asp
