xml
Week 4
In XML ...
<Phonebook>
  <Entry>
    <LastName>Edgar</LastName>
    <Title>Miss</Title>
    <FirstName>Pam</FirstName>
    <School>Optometry</School>
    <Campus>KG</Campus>
    <Room>B501</Room>
  </Entry>
  <Entry>
    <LastName>Edmond</LastName>
    <Title>Dr</Title>
    <FirstName>David</FirstName>
    <School>Information Systems</School>
    <Campus>GP</Campus>
    <Room>S842</Room>
  </Entry>
  <!-- Entry continues ... -->
</Phonebook>
H. Paik, S.Venugopal (CSE, UNSW) COMP9321, 13s2 Week 4 3 / 59
Why XML? – Background
Early Web
Used to publish documents to be read by humans
HTML was designed for the purpose
Today’s Web
Many business activities are performed on the Web
Dynamic interactions:
Web app ⇔ people / Web app ⇔ Web app
Web becomes a platform for data exchange
XML provides a simple, cross-platform data format
The Web contains a vast amount of data published in HTML format
Many programs process or analyse such data
When the HTML changes (even though the data inside does not), the program
that reads the HTML page must change too
XML provides a long-term, reliable data format for publishing
XML
<?xml version="1.0" ?>
<?xml-stylesheet type="text/css" href="staffcard.css" ?>
<staff>
  <name>Helen Paik</name>
  <title>Lecturer, UNSW</title>
  <email>hpaik@cse</email>
  <extension>54095</extension>
  <photo src="me.gif" />
</staff>
CSS
staff{background-color: #cccccc; ...}
name{display: block; font-size: 20pt; ... }
title{display: block; margin-left: 20pt;}
email{display: block; font-family: monospace;}
extension{display: block; margin-left: 20pt;}
Like any other good invention, XML is now used for things that are far
beyond its creators' original imagination.
A set of 'tags' that are developed for specific types of documents.
e.g., Chemical Markup Language (CML)
<atom id="caffeine_karne_a_1">
<float builtin="x3" units="A">-2.8709</float>
<float builtin="y3" units="A">-1.0499</float>
<float builtin="z3" units="A">0.1718</float>
<string builtin="elementType">C</string>
</atom>
<office>
  <phone>1235</phone>
  <person>
    <name>Alan</name>
    <age>29</age>
    <phone>2044</phone>
  </person>
  <person>
    <name>Sue</name>
    <age>45</age>
    <phone>2043</phone>
  </person>
</office>
[Tree diagram: root office with children phone (1235), person and person; each person has children name, age and phone, with leaf values Alan, 29, 2044 and Sue, 45, 2043.]
Example
<biography>
<name><first_name>Alan</first_name> <last_name>Turing</last_name>
</name> was one of the first people to truly deserve the name
<emphasize>computer scientist</emphasize>. Although his contributions
to the field are too numerous to list, his best-known are the
eponymous <emphasize>Turing Test</emphasize> and
<emphasize>Turing Machine</emphasize>.
</biography>
Mixed content: some elements may contain both character data and child
elements (e.g., <definition> and <biography>).
Avoid using too much mixed content (it loses structure, takes more parsing effort ...)
Example
<p>You can use a default <code>xmlns</code> attribute to avoid
having to add the svg prefix to all your elements:</p>
<![CDATA[
<svg xmlns="http://www.w3.org/2000/svg"
     width="12cm" height="12cm">
  <ellipse rx="110" ry="130"/>
  <rect x="4cm" y="1cm" width="3cm" height="6cm"/>
</svg>
]]>
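A minimal sketch of what a parser does with a CDATA section, using the JDK's built-in DOM parser. The class name CDataDemo, its helper methods and the one-line sample document are assumptions made for illustration; they are not part of the original slides. The point is that markup inside CDATA comes back as plain character data, not as elements.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class CDataDemo {
    // Parses an XML string into a DOM document (helper invented for this sketch).
    static Document parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Returns the character data of the root element, CDATA content included.
    static String textOf(String xml) {
        return parse(xml).getDocumentElement().getTextContent();
    }

    // Counts the elements with the given tag name anywhere in the document.
    static int countElements(String xml, String tag) {
        return parse(xml).getElementsByTagName(tag).getLength();
    }

    public static void main(String[] args) {
        // Hypothetical document: the <rect> markup sits inside a CDATA section.
        String xml = "<p>Example: <![CDATA[<rect x=\"4cm\" y=\"1cm\"/>]]></p>";
        System.out.println(textOf(xml));                // the <rect .../> text, verbatim
        System.out.println(countElements(xml, "rect")); // 0: no <rect> element was created
    }
}
```

Because the SVG fragment is only text to the parser, it is never checked for well-formedness.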
Phonebook.xml
<Phonebook>
  <Entry>
    <LastName Title="Miss">Edgar</LastName>
    <FirstName>Pam</FirstName>
    <School>Optometry</School>
    <Campus>KG</Campus>
    <Room>B501</Room>
    <Extension>5695</Extension>
  </Entry> <!-- and so on -->
<?xml version="1.0"?>
<!DOCTYPE Phonebook [
<!ELEMENT Phonebook (Entry)+ >
<!ELEMENT Entry (LastName, FirstName, School, Campus, Room, Extension)>
<!ELEMENT LastName (#PCDATA)>
<!ELEMENT FirstName (#PCDATA)>
<!ELEMENT School (#PCDATA)>
<!ELEMENT Campus (#PCDATA)>
<!ELEMENT Room (#PCDATA)>
<!ELEMENT Extension (#PCDATA)>
<!ATTLIST LastName Title (Miss | Ms | Mrs | Mr | Dr | Prof) #REQUIRED>
]>
<Phonebook>
<Entry>
<LastName Title="Miss">Edgar</LastName>
<FirstName>Pam</FirstName>
<School>Optometry</School>
<Campus>GP</Campus>
<Room>B501</Room>
<Extension>5695</Extension>
</Entry> <!-- more entries not shown ... -->
</Phonebook>
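A minimal sketch of checking a document against an internal DTD with the JDK's validating DOM parser. The DTD is a cut-down version of the Phonebook DTD (only LastName and FirstName) to keep the example short; the class name DtdValidationDemo and the validate helper are assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class DtdValidationDemo {
    // Cut-down Phonebook DTD (assumption: only two child elements per Entry).
    static final String DTD =
        "<!DOCTYPE Phonebook [\n"
      + "<!ELEMENT Phonebook (Entry)+>\n"
      + "<!ELEMENT Entry (LastName, FirstName)>\n"
      + "<!ELEMENT LastName (#PCDATA)>\n"
      + "<!ELEMENT FirstName (#PCDATA)>\n"
      + "<!ATTLIST LastName Title (Miss | Ms | Mrs | Mr | Dr | Prof) #REQUIRED>\n"
      + "]>\n";

    // Returns the errors reported while parsing; an empty list means the body is valid.
    static List<String> validate(String body) {
        List<String> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // turn on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) { errors.add(e.getMessage()); }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(DTD + body)));
        } catch (Exception e) {
            errors.add("not well-formed: " + e.getMessage());
        }
        return errors;
    }

    public static void main(String[] args) {
        String valid = "<Phonebook><Entry><LastName Title=\"Miss\">Edgar</LastName>"
                     + "<FirstName>Pam</FirstName></Entry></Phonebook>";
        // Lacks the required Title attribute and the FirstName element.
        String invalid = "<Phonebook><Entry><LastName>Edgar</LastName></Entry></Phonebook>";
        System.out.println("valid document is valid: " + validate(valid).isEmpty());
        System.out.println("invalid document is valid: " + validate(invalid).isEmpty());
    }
}
```

Validity errors are recoverable, so the handler collects them and lets parsing continue; only well-formedness problems abort the parse.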
Phonebook.dtd
<!ELEMENT Phonebook (Entry+) >
<!ELEMENT Entry (LastName, FirstName, School,Campus, Room, Extension)>
<!ELEMENT LastName (#PCDATA)>
<!ELEMENT FirstName (#PCDATA)>
<!ELEMENT School (#PCDATA)>
<!ELEMENT Campus (#PCDATA)>
<!ELEMENT Room (#PCDATA)>
<!ELEMENT Extension (#PCDATA)>
<!ATTLIST LastName Title (Miss | Ms | Mrs | Mr | Dr | Prof) #REQUIRED>
A Book
<book>
<author>
<name>J.K. Rowling</name>
</author>
<detail>
<series>Seventh</series>
<title>Harry Potter and the Deathly Hallows</title>
</detail>
</book>
Creating Elements:
<!ELEMENT book (author, detail)>
<!ELEMENT author (name)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT detail (series, title)>
<!ELEMENT series (#PCDATA)>
<!ELEMENT title (#PCDATA)>
A Book
<book>
  <author> <!-- more than one author? -->
    <name>E. Harold</name> <name>S. Means</name>
  </author>
  <detail> <!-- not every book is in a series -->
    <title>XML in a Nutshell</title>
  </detail>
</book>
Element Choices
<!ELEMENT newbooks (book+)>
<!ELEMENT book (author+, detail*)>
<!ELEMENT author (name | penname)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT penname (#PCDATA)>
<!ELEMENT detail ((series?, title) | (publisher, release*))>
<!ELEMENT series (#PCDATA)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT publisher (#PCDATA)>
<!ELEMENT release (#PCDATA)>
<book>
  <author period="classical" category="children">
    <name type="normal">J.K. Rowling</name>
  </author>
  <title>Harry Potter and the Half-Blood Prince</title>
</book>
Creating Attributes:
<!ELEMENT book (author, title)>
<!ELEMENT author (name+)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT title (#PCDATA)>
<!ATTLIST name type (normal | penname) "normal">
<!ATTLIST author period CDATA #REQUIRED
category CDATA #IMPLIED>
Defining XML Content: Creating Attributes
The above definition has the following effects on the source doc.
It might be a good idea to fragment the DTD in the same way that the
document content is partitioned:
<!DOCTYPE sql [
<!ELEMENT sql (select, from)>
<!ENTITY % seldef SYSTEM "select.dtd">
%seldef;
<!ENTITY % fromdef SYSTEM "from.dtd">
%fromdef;
<!ENTITY select SYSTEM "select.xml">
<!ENTITY from SYSTEM "from.xml">]>
<sql>&select;&from;</sql>
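[Figure: name clashes and namespaces. A Book vocabulary (<Book> with <Name>, <ISBN>, <Ed>) and an Author vocabulary (<Author> with <Name> containing <First> and <Last>, plus <Email>) are combined in one <Books> document, so <Name> appears with two different meanings. In the same way, namespace:UniversityX/Student and namespace:UniversityY/Student each define ID, rating, name and language. In the combined document, <Books>, <Book>, <Name>, <ISBN> and <Ed> belong to the Book namespace, while <Author> and its <Name>, <First>, <Last> and <Email> belong to the Author namespace.]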
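A minimal sketch of how namespaces keep the two Name elements apart when the document is read with a namespace-aware DOM parser. The namespace URIs, the prefixes b and a, and the class name NamespaceDemo are hypothetical and chosen only for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class NamespaceDemo {
    // Hypothetical namespace URIs, made up for this sketch.
    static final String BOOK_NS = "http://example.org/book";
    static final String AUTHOR_NS = "http://example.org/author";

    // A Books document in which Name occurs once in each namespace.
    static final String XML =
        "<b:Books xmlns:b='" + BOOK_NS + "' xmlns:a='" + AUTHOR_NS + "'>"
      + "<b:Book><b:Name>XML in a Nutshell</b:Name>"
      + "<a:Author><a:Name><a:First>E.</a:First><a:Last>Harold</a:Last></a:Name></a:Author>"
      + "</b:Book></b:Books>";

    // Counts elements with the given local name in the given namespace ("*" matches any).
    static int count(String xml, String namespace, String localName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // JAXP parsers ignore namespaces by default
            Document doc = factory.newDocumentBuilder()
                                  .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagNameNS(namespace, localName).getLength();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Book names:   " + count(XML, BOOK_NS, "Name"));   // 1
        System.out.println("Author names: " + count(XML, AUTHOR_NS, "Name")); // 1
        System.out.println("All names:    " + count(XML, "*", "Name"));       // 2
    }
}
```

An element is identified by its namespace URI plus its local name; the prefix is only shorthand for the URI.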
The elements of a schema are drawn from the XML Schema namespace
(http://www.w3.org/2001/XMLSchema)
Every schema document should start with an xs:schema (or schema)
element.
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        targetNamespace="http://www.example.org"
        xmlns:tns="http://www.example.org"
        elementFormDefault="qualified">
All the names defined in a schema belong to its targetNamespace.
Element declarations can refer to names in other source namespaces
as well.
A form of inheritance.
Which defines:
<sname>John Doe</sname>
<age>24</age>
<course>COMP9321</course>
Attributes can have default or fixed values and can be optional or required
<simpleType name="ageNum">
  <restriction base="integer">
    <minInclusive value="1"/>
    <maxInclusive value="80"/>
  </restriction>
</simpleType>
<element name="age" type="tns:ageNum" />
<simpleType name="courseString">
  <restriction base="string">
    <enumeration value="COMP9321"/>
    <enumeration value="COMP9322"/>
    <enumeration value="COMP9323"/>
  </restriction>
</simpleType>
<element name="course" type="tns:courseString"/>
<complexType name="student">
  <sequence>
    <element name="sname" type="tns:nameString" />
    <element name="age" type="tns:ageNum" />
    <element name="course" type="tns:courseString"/>
  </sequence>
</complexType>
which defines:
<student>
<sname>John Doe</sname>
<age>24</age>
<course>COMP9321</course>
</student>
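A minimal sketch of validating a student document against these types with the JDK's javax.xml.validation API. Two pieces are assumptions added so the schema is complete: the definition of nameString (not shown in the slides, here simply a non-empty string) and the global student element declaration. The class name SchemaValidationDemo is also invented for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class SchemaValidationDemo {
    static final String XSD =
        "<schema xmlns='http://www.w3.org/2001/XMLSchema'"
      + " targetNamespace='http://www.example.org'"
      + " xmlns:tns='http://www.example.org'"
      + " elementFormDefault='qualified'>"
      // Assumption: nameString is not defined in the slides; a non-empty string is used here.
      + "<simpleType name='nameString'><restriction base='string'>"
      + "<minLength value='1'/></restriction></simpleType>"
      + "<simpleType name='ageNum'><restriction base='integer'>"
      + "<minInclusive value='1'/><maxInclusive value='80'/></restriction></simpleType>"
      + "<simpleType name='courseString'><restriction base='string'>"
      + "<enumeration value='COMP9321'/><enumeration value='COMP9322'/>"
      + "<enumeration value='COMP9323'/></restriction></simpleType>"
      + "<complexType name='student'><sequence>"
      + "<element name='sname' type='tns:nameString'/>"
      + "<element name='age' type='tns:ageNum'/>"
      + "<element name='course' type='tns:courseString'/>"
      + "</sequence></complexType>"
      // Assumption: a global element declaration so that <student> can be the root.
      + "<element name='student' type='tns:student'/>"
      + "</schema>";

    // Builds a <student> instance in the target namespace.
    static String student(String name, String age, String course) {
        return "<student xmlns='http://www.example.org'><sname>" + name + "</sname><age>"
             + age + "</age><course>" + course + "</course></student>";
    }

    // Returns true when the document validates against XSD.
    static boolean isValid(String xml) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the validator throws on the first validation error
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(student("John Doe", "24", "COMP9321"))); // true
        System.out.println(isValid(student("John Doe", "95", "COMP9321"))); // false: age above 80
    }
}
```

Because elementFormDefault is "qualified", the child elements must also be in the target namespace, which the default xmlns declaration on student takes care of.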
[Figure: XML data → XML parser → some interface → your application, compared with XML data → XML parser → standard interface → your application.]
<TABLE>
  <TBODY>
    <TR>
      <TD>Assignment One</TD>
      <TD>Submission Instructions</TD>
    </TR>
    <TR>
      <TD>Assignment Two</TD>
      <TD>Submission Instructions</TD>
    </TR>
  </TBODY>
</TABLE>
[Tree diagram: TABLE → TBODY → two TR nodes; each TR has two TD children, holding the text "Assignment One" / "Submission Instructions" and "Assignment Two" / "Submission Instructions".]
http://java.sun.com/j2se/1.5.0/docs/api/org/w3c/dom/package-summary.html
Dealing with Nodes in DOM
The root node has no parent but every other node has exactly one
parent node
A node can have any number of children
Two nodes that have the same parent are called siblings
A node with no children is called a leaf node
Among siblings, the node that appears first in sequence is called the
first child and the node that appears last is the last child
Important: Text of an element node is stored in a separate text node.
[Figure: DOM tree with root element persons]
document.getElementsByTagName("person")[1]                        → person
document.getElementsByTagName("person")[1].parentNode             → persons
document.getElementsByTagName("person")[1].firstChild             → first
document.getElementsByTagName("person")[1].lastChild              → last
document.getElementsByTagName("person")[1].previousSibling        → person
document.getElementsByTagName("person")[1].lastChild.firstChild   → Li
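The same navigation can be sketched in Java with the org.w3c.dom API. The persons document below is hypothetical (only the value Li comes from the example above), and it is written without whitespace between tags so that no whitespace-only text nodes appear among the children. The class name DomNavigationDemo is also an assumption.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DomNavigationDemo {
    // Hypothetical persons document; no whitespace between tags on purpose.
    static final String XML = "<persons>"
        + "<person><first>Ann</first><last>Lee</last></person>"
        + "<person><first>Wei</first><last>Li</last></person>"
        + "</persons>";

    // Returns the second <person> element, like getElementsByTagName("person")[1].
    static Node secondPerson() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            return doc.getElementsByTagName("person").item(1);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Node person = secondPerson();
        System.out.println(person.getNodeName());                      // person
        System.out.println(person.getParentNode().getNodeName());      // persons
        System.out.println(person.getFirstChild().getNodeName());      // first
        System.out.println(person.getLastChild().getNodeName());       // last
        System.out.println(person.getPreviousSibling().getNodeName()); // person
        // The text "Li" lives in a separate text node below <last>.
        System.out.println(person.getLastChild().getFirstChild().getNodeValue()); // Li
    }
}
```

With pretty-printed input, firstChild and previousSibling would usually return whitespace text nodes instead, so real code often has to skip nodes whose type is not ELEMENT_NODE.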
import org.w3c.dom.*;
import org.apache.xerces.parsers.DOMParser;

// Counts the <book> elements in the XML file named by the first argument.
public class DOMCountNames {
    public static void main(String[] args) {
        try {
            // Parse the file into an in-memory DOM tree (Xerces parser)
            DOMParser parser = new DOMParser();
            parser.parse(args[0]);
            Document doc = parser.getDocument();
            // Collect every <book> element, at any depth
            NodeList nodelist = doc.getElementsByTagName("book");
            System.out.println(args[0] + " has " + nodelist.getLength()
                    + " <book> elements.");
        } catch (Exception e) {
            e.printStackTrace(System.err);
        }
    }
}