XML: Introduction To XML, Defining XML Tags, Their Attributes and Values, Document Type Definition, XML Schemas, Document Object Model, XHTML. Parsing XML Data - DOM and SAX Parsers in Java
What is XML?
•An XML document is human readable, and we can edit any XML document in a simple text
editor.
•The XML document is language neutral. That means a Java program can generate an
•Every XML document has a tree structure. Hence complex data can be arranged
<person gender="female">
<firstname>Anna</firstname>
<lastname>Smith</lastname>
</person>

<person>
<gender>female</gender>
<firstname>Anna</firstname>
<lastname>Smith</lastname>
</person>
Defining XML tags, their attributes
and values
XML Namespace
<table>
<tr>
<td>Apples</td>
<td>Bananas</td>
</tr>
</table>

<table>
<name>African Coffee Table</name>
<width>80</width>
<length>120</length>
</table>
If these XML fragments were added together, there would be a name conflict. Both contain a
<table> element, but the elements have different content and meaning.
A user or an XML application will not know how to handle these differences.
Name conflicts in XML can easily be avoided using a name prefix.
<h:table>
<h:tr>
<h:td>Apples</h:td>
<h:td>Bananas</h:td>
</h:tr>
</h:table>
<f:table>
<f:name>African Coffee Table</f:name>
<f:width>80</f:width>
<f:length>120</f:length>
</f:table>
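A prefix only takes effect once it is bound to a namespace URI with an xmlns:prefix attribute, which the fragments above leave out. The Java sketch below shows one way a program can tell the two <table> elements apart with the JDK's namespace-aware DOM parser. The URIs urn:example:html and urn:example:furniture and the wrapping <root> element are assumptions made for this illustration; they do not come from the slides.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class NamespaceDemo {
    // Hypothetical namespace URIs, chosen for this sketch only.
    static final String HTML_NS = "urn:example:html";
    static final String FURNITURE_NS = "urn:example:furniture";

    // Both tables live in one document; the <root> wrapper is an assumption.
    static final String XML =
        "<root xmlns:h='" + HTML_NS + "' xmlns:f='" + FURNITURE_NS + "'>"
      + "<h:table><h:tr><h:td>Apples</h:td><h:td>Bananas</h:td></h:tr></h:table>"
      + "<f:table><f:name>African Coffee Table</f:name>"
      + "<f:width>80</f:width><f:length>120</f:length></f:table>"
      + "</root>";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Without this flag the parser treats "h:table" as a plain name.
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Counts <table> elements that belong to the given namespace.
    static int countTables(Document doc, String namespaceUri) {
        return doc.getElementsByTagNameNS(namespaceUri, "table").getLength();
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        System.out.println("HTML tables: " + countTables(doc, HTML_NS));
        System.out.println("Furniture tables: " + countTables(doc, FURNITURE_NS));
        String name = doc.getElementsByTagNameNS(FURNITURE_NS, "name")
                         .item(0).getTextContent();
        System.out.println("Furniture name: " + name);
    }
}
```

The lookup uses the namespace URI rather than the prefix, so the same code keeps working if a document binds the URIs to different prefixes.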
Document Type Definition
•An XML document with correct syntax is called "Well Formed".
•An XML document validated against a DTD is both "Well Formed" and "Valid".
•A document type definition (DTD) defines the basic building blocks of an XML
document. Using a DTD we can specify the various element types, their attributes, and their
relationships with one another.
•A DTD specifies the set of rules for structuring data in an XML file.
•The building blocks of XML are:
•1. Elements: used for defining tags.
•2. Attributes: provide additional information about an element as name-value pairs.
•3. CDATA: character data that is not parsed by the parser.
•4. PCDATA: parsed character data (i.e. text that the parser examines for entities and markup).
Types of DTD
1. Internal DTD
2. External DTD (student.dtd)
<!ELEMENT student (name,address,std,marks)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT address (#PCDATA)>
<!ELEMENT std (#PCDATA)>
<!ELEMENT marks (#PCDATA)>
DTDDemo.xml
<?xml version="1.0"?>
<!DOCTYPE student SYSTEM "student.dtd">
<student>
<name>Anand</name>
<address>Hyderabad</address>
<std>Second</std>
<marks>70 percent</marks>
</student>
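To see what "Valid" means in practice, the following Java sketch validates a student document against the element declarations shown above. To stay self-contained it embeds the declarations as an internal DTD instead of reading student.dtd from disk; the class and method names are hypothetical, and the second document (with <marks> missing) is an assumed counterexample.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class DtdValidationDemo {
    // Internal DTD with the same declarations as student.dtd.
    static final String DTD =
        "<!DOCTYPE student [\n"
      + "<!ELEMENT student (name,address,std,marks)>\n"
      + "<!ELEMENT name (#PCDATA)>\n"
      + "<!ELEMENT address (#PCDATA)>\n"
      + "<!ELEMENT std (#PCDATA)>\n"
      + "<!ELEMENT marks (#PCDATA)>\n"
      + "]>\n";

    static final String VALID = "<?xml version=\"1.0\"?>\n" + DTD
      + "<student><name>Anand</name><address>Hyderabad</address>"
      + "<std>Second</std><marks>70 percent</marks></student>";

    // Well formed, but <marks> is missing, so it violates the DTD.
    static final String INVALID = "<?xml version=\"1.0\"?>\n" + DTD
      + "<student><name>Anand</name><address>Hyderabad</address>"
      + "<std>Second</std></student>";

    static boolean isValid(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // turn on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Validity errors are only reported to the handler, so rethrow them.
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) throws SAXParseException { throw e; }
                @Override public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // not well formed, or not valid against the DTD
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("complete document valid: " + isValid(VALID));
        System.out.println("document without marks valid: " + isValid(INVALID));
    }
}
```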
Merits of DTD
1. DTDs are used to define the structural components of XML document.
2. These are relatively simple and compact.
3. DTDs can be defined inline and hence can be embedded directly in the XML document.
Demerits of DTD
1. DTDs are very basic and are not expressive enough for complex documents.
2. The language that a DTD uses is not XML. Hence the various frameworks built for XML cannot be
used with DTDs.
3. A DTD cannot define the type of data contained within the XML document. Hence using a
DTD we cannot specify whether an element is numeric, a string, or of some other data type.
4. There are some XML processors which do not understand DTDs.
5. DTDs are not aware of the namespace concept.
XML Schemas
An XML Schema describes the structure of an XML document.
The XML Schema language is also referred to as XML Schema Definition (XSD). XML
Schema became a World Wide Web Consortium (W3C) recommendation in 2001.
The purpose of an XML Schema is to define the legal building blocks of an XML document:
• the elements and attributes that can appear in a document
• the number of (and order of) child elements
• data types for elements and attributes
• default and fixed values for elements and attributes
DTDDemo.xml
<?xml version="1.0"?>
<!DOCTYPE Student SYSTEM "student.dtd">
<Student>
<name>Anand</name>
<address>Hyderabad</address>
<std>Tenth</std>
<marks>90 percent</marks>
</Student>
StudentSchema.xsd
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="Student">
<xs:complexType>
<xs:sequence>
<xs:element
MySchema.xml
XML Schema has a lot of built-in data types. The most common types are:
xs:string
xs:decimal
xs:integer
xs:boolean
xs:date
xs:time
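The effect of these data types can be tried out with the javax.xml.validation API that ships with the JDK. The sketch below is only an illustration: the schema is a hypothetical one written for this example (a Student with a name, integer marks, and a joined date), not the StudentSchema.xsd from the slides, and the class and method names are assumptions.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class XsdValidationDemo {
    // Hypothetical schema for this sketch; element names and types are assumed.
    static final String XSD =
        "<?xml version=\"1.0\"?>"
      + "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
      + "<xs:element name=\"Student\">"
      + "<xs:complexType><xs:sequence>"
      + "<xs:element name=\"name\" type=\"xs:string\"/>"
      + "<xs:element name=\"marks\" type=\"xs:integer\"/>"
      + "<xs:element name=\"joined\" type=\"xs:date\"/>"
      + "</xs:sequence></xs:complexType>"
      + "</xs:element></xs:schema>";

    static boolean isValid(String xml) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            // With no error handler set, validate() throws on the first error.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the document (or the schema itself) was rejected
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String good = "<Student><name>Anand</name><marks>90</marks>"
                    + "<joined>2024-06-15</joined></Student>";
        String badType = "<Student><name>Anand</name><marks>90 percent</marks>"
                       + "<joined>2024-06-15</joined></Student>";
        System.out.println("integer marks accepted: " + isValid(good));
        System.out.println("text marks accepted: " + isValid(badType));
    }
}
```

A value such as "90 percent" passes a DTD that declares marks as #PCDATA, but it is rejected here because the assumed schema types marks as xs:integer.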
Advantages of Schema:
4. It is written in XML itself and has a large number of built in and derived types.
Disadvantages of Schema:
1. Both schemas and DTDs are useful for defining the structural components of XML. But DTDs are
basic and are not specific enough for complex documents, whereas schemas are more specific.
2. Schemas support defining the type of data; DTDs do not have this ability. Hence
content definition is possible using a schema.
4. XML Schema is written in XML itself and has a large number of built-in and derived types.
5. XML Schema is a W3C recommendation. Hence it is supported by various XML validators and XML
processors, whereas some XML processors do not support DTDs.
6. A large number of web applications can be built using XML Schema, whereas DTDs suit
relatively simple and compact applications.
Document Object Model (DOM)
The Document Object Model defines a standard for accessing and manipulating
XML.
It is a W3C recommendation for handling structured documents.
DOM provides a standard set of programming interfaces for working with XML and HTML.
The Document Object Model (DOM) is a platform-independent and language-neutral
application programming interface (API) that describes how to access and manipulate
the information stored in XML or HTML documents.
hierarchy. This hierarchy allows a developer to navigate through the tree looking for
specific information. Because it is based on a hierarchy of information, the DOM is said to be tree based.
The XML DOM, on the other hand, also provides an API that allows a developer to add,
edit, move, or remove nodes in the tree at any point in order to create an application.
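The add, edit, move, and remove operations mentioned above map onto a handful of org.w3c.dom calls. The Java sketch below builds a small tree in memory and applies each operation once; the element names and values are assumptions chosen for illustration.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DomEditDemo {
    static Document buildAndEdit() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }

        Element student = doc.createElement("Student");
        doc.appendChild(student);

        // Add: create two child nodes and attach them to the tree.
        Element name = doc.createElement("Name");
        name.setTextContent("Anand");
        student.appendChild(name);
        Element marks = doc.createElement("Marks");
        marks.setTextContent("100");
        student.appendChild(marks);

        // Edit: replace the text held by an existing node.
        marks.setTextContent("95");

        // Move: inserting an attached node relocates it (Marks now precedes Name).
        student.insertBefore(marks, name);

        // Remove: detach a node from its parent.
        student.removeChild(name);
        return doc;
    }

    public static void main(String[] args) {
        Element root = buildAndEdit().getDocumentElement();
        System.out.println("root: " + root.getNodeName());
        System.out.println("children: " + root.getChildNodes().getLength());
        System.out.println("first child: " + root.getFirstChild().getNodeName()
            + " = " + root.getFirstChild().getTextContent());
    }
}
```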
1. Loading an XML file
<html>
<!-- Simple DOM example for loading an XML file -->
<body>
<script type="text/javascript">
var xmlDocument;
try
{
// Legacy Internet Explorer: create the document through ActiveX.
xmlDocument=new ActiveXObject("Microsoft.XMLDOM");
}
catch(e)
{
try
{
// Other browsers: create an empty DOM document.
xmlDocument=document.implementation.createDocument("","",null);
}
catch(e)
{ alert(e.message)}
}
try
{
// load() is a legacy, non-standard method; load synchronously before reporting.
xmlDocument.async=false;
xmlDocument.load("student.xml");
document.write("XML Document student.xml is loaded");
}
catch(e){alert(e.message)}
</script>
</body>
</html>
student.xml
<?xml version="1.0"?>
<Student>
<Roll_No>10</Roll_No>
<Personal_Info>
<Name>Anand</Name>
<Address>Hyderabad</Address>
<Phone>8978903739</Phone>
</Personal_Info>
<Class>B.Tech</Class>
<Subject>WT</Subject>
<Marks>100</Marks>
</Student>
XHTML
• XHTML is HTML written as XML.
Document Structure
XHTML Elements
XHTML Attributes
Attribute names must be in lower case
Attribute values must be quoted
Attribute minimization is forbidden
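Two of these rules, quoted values and no minimization, are ordinary XML well-formedness constraints, so any XML parser enforces them. The Java sketch below checks a few fragments with the JDK parser. The fragments and the helper name isWellFormed are assumptions for illustration, and the lower-case rule is not covered, because XML itself accepts names in any case.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class XhtmlAttributeCheck {
    // Returns true when the fragment parses as well-formed XML.
    static boolean isWellFormed(String fragment) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(fragment)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Quoted value: accepted.
        System.out.println(isWellFormed("<table width=\"100%\"></table>"));
        // Unquoted value: rejected.
        System.out.println(isWellFormed("<table width=100%></table>"));
        // Full attribute form: accepted.
        System.out.println(isWellFormed("<input type=\"checkbox\" checked=\"checked\" />"));
        // Minimized attribute: rejected.
        System.out.println(isWellFormed("<input type=\"checkbox\" checked />"));
    }
}
```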
Example:
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Title of document</title>
</head>
<body>
some content
</body>
</html>
XHTML Elements Must Be Properly Nested
<p>This is a paragraph</p>
<p>This is another paragraph</p>
<body>
<p>This is a paragraph</p>
</body>
<table width="100%">
<table width="100%">
How to Convert from HTML to XHTML
The primary goal of any XML processor is to parse the given XML document. Java has a rich set
of built-in APIs for parsing XML documents.
DOM parses the XML document into a tree structure. We can access
the information in an XML document by interacting with the tree nodes.
Simple API for XML (SAX) lets us access the information in an XML document as a
sequence of events. Thus there are two methods of parsing an XML document:
• Parsing using DOM (tree based)
• Parsing using SAX(event based)
Parsing XML Data
DOM | SAX
DOM is a tree-based parsing method used to parse the given XML document. | SAX is an event-based parsing method used to parse the given XML document.
In this method, the entire XML document is stored in memory before actual processing. Hence it requires more memory. | In this method, parsing is done by generating a sequence of events, that is, by calling handler functions.
The DOM approach is useful for smaller applications because it is simpler to use, but it is not suited to larger XML documents because it then requires a large amount of memory. | Although SAX development is much more challenging, it is useful for parsing large XML documents: because the approach is event based, the XML is parsed node by node and does not require a large amount of memory.
Traversal can be done in any direction in the DOM approach. | Only top-to-bottom traversal is done in this approach.
Using DOM API
The Java JDK provides two built-in XML parsers, DOM and SAX; both have their pros and cons.
DOM is the easiest Java XML parser to use. It parses an entire XML document and loads it into
memory, modeling it as objects for easy node traversal. The DOM parser is slow and consumes a lot
of memory.
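A minimal sketch of DOM parsing with the JDK's javax.xml.parsers API is shown below. It parses a student document held in a string (an assumption made to keep the example self-contained; a file would normally be passed to builder.parse instead) and then reads nodes from the in-memory tree, including a step upward from a child to its parent. The class and helper names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomParseDemo {
    static final String STUDENT_XML =
        "<?xml version=\"1.0\"?>"
      + "<Student><Roll_No>10</Roll_No>"
      + "<Personal_Info><Name>Anand</Name><Address>Hyderabad</Address></Personal_Info>"
      + "<Marks>100</Marks></Student>";

    // Parses the whole document into an in-memory tree.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Text of the first element with the given tag name, or null if absent.
    static String textOf(Document doc, String tag) {
        NodeList nodes = doc.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent();
    }

    // Name of the parent of the first element with the given tag name.
    static String parentOf(Document doc, String tag) {
        Node node = doc.getElementsByTagName(tag).item(0);
        return node.getParentNode().getNodeName();
    }

    public static void main(String[] args) {
        Document doc = parse(STUDENT_XML);
        System.out.println("root: " + doc.getDocumentElement().getNodeName());
        System.out.println("name: " + textOf(doc, "Name"));
        System.out.println("marks: " + textOf(doc, "Marks"));
        // The tree can be walked upward as well as downward.
        System.out.println("parent of Name: " + parentOf(doc, "Name"));
    }
}
```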
The SAX parser works differently from the DOM parser: it does not load the XML
document into memory or create an object representation of the XML
document. Instead, the SAX parser uses callback functions
(org.xml.sax.helpers.DefaultHandler) to inform clients of the XML document
structure.
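The callback style can be sketched with the JDK's SAXParser and a small DefaultHandler subclass. The handler below is a hypothetical one written for this illustration; it records each event in a list so that the top-to-bottom order of the callbacks is visible. The input document is likewise an assumption.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxParseDemo {
    // Records every callback the parser makes, in document order.
    static class EventLogger extends DefaultHandler {
        final List<String> events = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes attributes) {
            events.add("start:" + qName);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            events.add("end:" + qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // A parser may deliver one text node in several chunks.
            String text = new String(ch, start, length).trim();
            if (!text.isEmpty()) {
                events.add("text:" + text);
            }
        }
    }

    static List<String> parse(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            EventLogger handler = new EventLogger();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.events;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Student><Name>Anand</Name><Marks>100</Marks></Student>";
        for (String event : parse(xml)) {
            System.out.println(event);
        }
    }
}
```

No tree is kept here: once a callback returns, the parser moves on, which is why memory use stays low and why the handler itself must store anything it needs later.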