WT Unit 2

XML is a markup language that defines rules for encoding documents in a format that is both human-readable and machine-readable. It allows users to define their own tags to structure information. The document discusses XML's advantages over HTML, how XML documents are structured using elements, attributes, and tags, and how they must adhere to syntax rules to be considered well-formed. It provides an example XML document defining student data to demonstrate how custom tags can be defined and used without an associated DTD or schema.


Advanced Java & Web Technologies UNIT – III : Working with XML

Working with XML: Document type Definition, XML schemas, Document object model, XSLT, DOM
and SAX.
XML stands for eXtensible Markup Language. The XML standard was developed by the W3C.
The primary purpose of the standard is to provide a way to store self-describing data easily; the
motivation for developing XML was the deficiencies of HTML.

The problem with HTML is that it was defined to describe the layout of information without
considering its meaning. To describe a particular kind of information, it is necessary to have tags
that indicate the meaning of each element's content.

XML is used to describe structured data or information. HTML documents describe how data
should appear on the browser’s screen. They carry no information about the data. XML documents on the
other hand, describe the meaning of data. XML documents may also refer to presentation information.
XML is used as a primary means to manipulate and transfer structured data over the web. The content
and structure of XML documents are accessed by a software module called an XML processor (parser). Like
HTML documents, data are marked up by tags in XML documents. HTML supports a predefined set of
tags whereas XML allows us to define new tags and use them in XML documents to satisfy application
requirement. As more new tags can be defined, XML is said to be extensible.

XML was not meant to be a replacement for HTML:


XML and HTML have different goals.
HTML is a markup language meant to describe the layout of general information and provide
some guidance on how the information should be displayed.
XML is a meta-markup language that provides a framework for defining specialized markup
languages.
HTML itself can be defined as an XML markup language.

XML is a universal data interchange language:


XML provides a simple and universal way of storing any textual data.
Data stored in XML documents can be electronically distributed and processed by any number of
different processing applications.
Strictly speaking, XML is not a markup language; it is a meta-markup language that specifies rules for creating
markup languages. As a result, XML includes no predefined tags. The designer has to define a collection of
tags when designing a markup language using XML.
A markup language designed with XML is called an XML application.
A browser cannot have default presentation styles for elements it has never seen. Therefore the
data in an XML document can be displayed by browsers only if the presentation styles are
provided by style sheets of some kind.

XML is
easy to understand;
non-proprietary plain-text:
o human readable,
o software independent,
o hardware independent;
(relatively) easy to write a parser for;
widespread: very well supported by both commercial and open source software.

B Sridevi, Dept., of CSE, Vishnu Institute of Technology 1


An example of an XML document that describes an e-mail message:
<note>
<to>John</to>
<from>Ani</from>
<heading>Reminder</heading>
<body>Return my book on Monday</body>
</note>
XML syntax:
An XML document consists of a prolog and a body.
The prolog may contain an XML declaration, optional processing instructions, comments, and a
Document Type Definition.
XML declaration : W3C recommends that an XML document start with a declaration.
Eg: <?xml version="1.0" ?>
This declaration tells the processing agent that the document is an XML document.
o The version attribute specifies the XML version used; it is mandatory.
o Encoding, an optional attribute, specifies the character encoding scheme.
o The optional standalone attribute specifies whether the document can be processed as a stand-alone
document or is dependent on other documents.
<?xml version="1.0" encoding="utf-8" standalone="no" ?>

Processing instructions start with <? and end with ?>. They contain special instructions that pass
parameters to the application. These parameters instruct the application about how to interpret the XML
document.
Eg: <?xml-stylesheet href="simple.xsl" type="text/xsl" ?>
This processing instruction states that the XML document should be transformed using the style sheet
simple.xsl.

Comments in XML start with <!-- and end with -->, as in HTML. Everything within these character
sequences is ignored by parsers and is not parsed. Any character sequence, including
markup, is allowed inside a comment, except "--".

A Document Type Definition is used to specify the logical structure of the XML document.

Body portion contains textual data marked up by tags. It must have one element called, document
element or root element. The root element must be top-level element in the document hierarchy and there
can be only one root element. The root element contains other elements which, in turn, contain other
elements and so on.
<?xml version="1.0" encoding="utf-8" ?>
<contact>
<person>
<name>B S Roy</name>
<number>9998765234</number>
</person>
<person>
<name>sairam</name>
<number>9998995212</number>
</person>
</contact>
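
To make the root/child hierarchy concrete, here is a minimal sketch of how a program might read the contact document. Python and its standard xml.etree.ElementTree module are our choice for illustration; the notes do not prescribe a language, and the helper name list_contacts is hypothetical.

```python
import xml.etree.ElementTree as ET

# The contact document from above, written with plain ASCII quotes.
CONTACT_XML = """<?xml version="1.0" encoding="utf-8"?>
<contact>
  <person>
    <name>B S Roy</name>
    <number>9998765234</number>
  </person>
  <person>
    <name>sairam</name>
    <number>9998995212</number>
  </person>
</contact>"""


def list_contacts(xml_text):
    """Return (name, number) pairs for every <person> directly under the root."""
    # Encode to bytes so the encoding named in the declaration is honoured.
    root = ET.fromstring(xml_text.encode("utf-8"))
    return [(p.findtext("name"), p.findtext("number"))
            for p in root.findall("person")]


root = ET.fromstring(CONTACT_XML.encode("utf-8"))
print(root.tag)                    # the single root (document) element
print(list_contacts(CONTACT_XML))  # the data held by its child elements
```

The root element is reached first, and every other element is found by descending from it, which mirrors the rule that a document has exactly one top-level element.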

XML Elements
XML elements consist of a starting tag, an ending tag, content, and attributes. The content may
be simple text or other elements. Tags are similar to HTML tags.
Example : <greeting>Hello, world!</greeting>

The following figure depicts the structure of an XML element.

Element names can contain letters, digits, and some other special characters (hyphens, underscores and
periods). They cannot start with a digit, hyphen or period, they must not start with the string xml (in any
combination of upper and lower case), which is reserved, and they cannot contain white space.

Attributes are used for describing and providing more information about elements. They appear in
the starting tag of the element. An element can have multiple attributes.
Syntax: <element-name attr-name="attr-value" …… > …………. </element-name>
Example: <employee gender="male"> ……. </employee>

Predefined entities:
Some characters are reserved by the XML syntax itself. Hence, they cannot be used directly. To use them,
the following replacement entities are used:

Character   Replacement entity   Description
<           &lt;                 less than
>           &gt;                 greater than
&           &amp;                ampersand
'           &apos;               apostrophe
"           &quot;               quotation mark

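As a small illustration of the replacement entities, the following Python sketch escapes a string before placing it in an element and then parses it back. It uses the standard-library functions xml.sax.saxutils.escape and xml.etree.ElementTree.fromstring; the sample text and the element name code are made up for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = 'if a < b && name == "O\'Neil"'

# escape() replaces &, < and > by default; the two quote characters
# are supplied as extra replacements.
escaped = escape(raw, {'"': "&quot;", "'": "&apos;"})
print(escaped)

# A parser turns the entity references back into the original characters.
parsed = ET.fromstring("<code>" + escaped + "</code>").text
print(parsed == raw)
```

Escaping the ampersand first matters: otherwise the & inside an already produced &lt; would be escaped a second time.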
Well-formed XML
An XML document is said to be well-formed if it adheres to the following syntax rules.
1. An XML declaration, if present, must be the first thing in the document. (The declaration is
recommended but optional in XML 1.0.)
2. XML comments must be enclosed between <!-- and -->. Comment text cannot contain two
adjacent dashes.
3. An XML element name must begin with a letter or an underscore and can include digits, hyphens,
periods.
4. XML names are case-sensitive. There is no length limitation for XML names.
5. Every XML document has exactly one root element, which encloses all other elements.
6. Every XML element must have a closing tag.
7. All tags must be properly nested.
8. Attributes must always be quoted.
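
Several of these rules can be observed by handing small documents to a parser. The sketch below is an illustration in Python, not part of the original notes. The helper is_well_formed is a hypothetical name; it relies on the standard xml.etree.ElementTree parser, which raises ParseError for documents that are not well-formed.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False if it reports a syntax error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed("<note><to>John</to></note>"))  # properly nested
print(is_well_formed("<note><to>John</note></to>"))  # tags not properly nested
print(is_well_formed("<note><to>John</to>"))         # missing closing tag
print(is_well_formed("<note id=1></note>"))          # attribute value not quoted
print(is_well_formed("<a/><b/>"))                    # more than one root element
print(is_well_formed("<Note></note>"))               # names are case-sensitive
```

Only the first document is accepted. A check like this covers well-formedness only; it says nothing about validity against a DTD or schema.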

Consider the following simple example:

Student.xml
<?xml version="1.0" encoding="utf-8"?>
<class>
<student regid='501'>
<name>Vamsika</name>
<contactno>9885409528</contactno>
<email>[email protected]</email>
<address>
<street>Bank Colony</street>
<city>Bhimavaram</city>
<state>AndhraPradesh</state>
<zip>534201</zip>
</address>
</student>
<student regid='502'>
<name>Srithan</name>
<contactno>9886756452</contactno>
<email>[email protected]</email>
<address>
<street>Kukatpally</street>
<city>Hyderabad</city>
<state>Telangana</state>
<zip>500021</zip>
</address>
</student>
</class>

This document effectively defines an XML tag set. This example shows that an XML-based markup
language can be defined without a DTD or XML schema, but the above XML document is an informal
definition with no structure rules.

XML documents
An XML document is said to be valid if it is well-formed and complies with the rules specified in its DTD or schema.

An XML document often uses two auxiliary files:


1. A DTD or schema that specifies its tag set and structural syntactic rules.
2. A style sheet that describes how the content of the document is to be printed/displayed.

Document Type Definitions


A Document Type Definition (DTD) is a set of structural rules called declarations, which specify a
set of elements that can appear in the document as well as how and where these elements may
appear.
DTDs also provide entity definitions.
DTDs are used when the same tag set definition is used by a collection of documents.
A document can be tested against DTD to determine whether it conforms to the rules the DTD
describes.

DTDs can be specified in two ways:


1. Internal DTD: A DTD can be embedded in the XML document whose syntax rules it describes.
Internal DTD must be introduced with

<!DOCTYPE rootname [
:
]>
Example :
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE class [
:
]>

2. External DTD: DTD is stored in an external file. To include external DTD, the syntax is
<!DOCTYPE rootname SYSTEM "external dtd file" >
Example : <!DOCTYPE class SYSTEM "student.dtd">

Syntactically, a DTD is a sequence of declarations. Each declaration has a form of a markup declaration.
<!keyword ……. >

Possible keywords used in the declaration are


1. ELEMENT : used to define tags.
2. ATTLIST : used to define tag attributes.
3. ENTITY : used to define entities.
4. NOTATION : used to define datatype notations. This keyword is infrequently used.

Declaring Elements
Each element declaration in a DTD specifies the structure of one category of elements. The
declaration provides name of the element along with specification of the structure of that element.
XML document is like a general tree. An element is a node in that tree. It can be either a leaf node
or an internal node.
If the element is a leaf node, its syntactic description is its character pattern.
If the element is an internal node, its syntactic description is a list of its child elements, each of
which can be a leaf node or an internal node.
The form of an internal node declaration i.e., an element declaration for elements that contain
element is
<!ELEMENT element-name (list of names of child elements)>
Example : <!ELEMENT memo (from, to, date, re, body)>
A modifier is added to the child element specification to specify the number of times that a child
element may appear. Child element specification modifiers are
o + – one or more occurrences
o * – zero or more occurrences
o ? – zero or one occurrence.
Example : <!ELEMENT person (parent+, age, spouse?, sibling*)>
Leaf nodes of a DTD specify the data types of the content of their parent nodes. These data types
can be
o PCDATA – stands for parsed character data. PCDATA is a string of any printable
characters except less-than (<) and ampersand (&), which must be written as entity references.
o CDATA – unparsed character data. In a DTD, CDATA is used as an attribute type rather than as
element content.
o EMPTY – used to specify that the element has no content.
o ANY – used to specify that the element may contain literally any content.
Example for leaf node : <!ELEMENT element-name (#PCDATA)>
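
The occurrence modifiers behave much like the same symbols in regular expressions. The following Python sketch is an illustration only, not a DTD validator: it translates a simple comma-separated child list such as the person example into a regular expression and tests a list of child element names against it. The function names are hypothetical, and nested groups or the | choice operator are deliberately not handled.

```python
import re


def content_model_to_regex(model):
    """Translate e.g. '(parent+, age, spouse?, sibling*)' into a compiled regex.

    Each child name is followed by a space in the encoded child list, so
    'parent+' becomes '(?:parent )+'.
    """
    parts = []
    for item in model.strip("() ").split(","):
        match = re.fullmatch(r"(\w+)([+*?]?)", item.strip())
        if match is None:
            raise ValueError("unsupported content model item: " + item)
        name, modifier = match.groups()
        parts.append("(?:" + re.escape(name) + " )" + modifier)
    return re.compile("".join(parts))


def children_match(model, child_names):
    """Check whether the sequence of child element names satisfies the model."""
    encoded = "".join(name + " " for name in child_names)
    return content_model_to_regex(model).fullmatch(encoded) is not None


PERSON = "(parent+, age, spouse?, sibling*)"
print(children_match(PERSON, ["parent", "parent", "age", "sibling"]))  # spouse omitted
print(children_match(PERSON, ["age"]))                                 # no parent
```

The first call succeeds because spouse is optional and sibling may repeat; the second fails because at least one parent is required.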

Declaring Attributes:
Attributes are declared separately from the element declaration. An attribute declaration must include
The name of the element to which the attribute belongs
The attribute's name
The attribute's type
A default value (optional)
The general form of attribute declaration is
<!ATTLIST element_name attribute_name attribute_type [default_value] >
The declarations can be combined, if there are more than one attribute.

The default value in an attribute declaration can specify either an actual value or a requirement for the
value of the attribute in the XML document. Possible default values for attribute are
A value – The quoted value, which is used if none is specified in an element.
#FIXED value – The quoted value, which every element will have and which cannot be changed.
#REQUIRED – No default value is given; every instance of the element must specify a value.
#IMPLIED – No default value is given; the value may or may not be specified in an element.

Example: <!ATTLIST airplane places CDATA "4">
<!ATTLIST airplane engine_type CDATA #REQUIRED>
<!ATTLIST airplane price CDATA #IMPLIED>
<!ATTLIST airplane manufacturer CDATA #FIXED "Cessna">

A valid XML element for this DTD is

<airplane places="10" engine_type="jet">……..</airplane>

Declaring Entities:
To reference entities in the XML document, they must first be defined; entities that can be referenced
in the document are called general entities. Entities that are referenced only in DTDs are called parameter entities.
The form of an entity declaration is <!ENTITY [%] entity_name "entity_value">
The optional % sign specifies that the entity is a parameter entity rather than a general entity.

Example : If a document includes a large number of references to the full name "Nara Chandra Babu
Naidu", then an entity can be defined to represent the complete name.
<!ENTITY ncbn "Nara Chandra Babu Naidu">
Then the reference &ncbn; specifies the complete name in the XML document.
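
A short Python sketch can show the entity reference being expanded. It assumes the entity is declared in an internal DTD and uses the standard xml.etree.ElementTree parser; the element name speech and the surrounding text are invented for the example.

```python
import xml.etree.ElementTree as ET

# The entity is declared in an internal DTD and referenced in the body.
DOC = """<!DOCTYPE speech [
  <!ENTITY ncbn "Nara Chandra Babu Naidu">
]>
<speech>Welcome, &ncbn;!</speech>"""

# The parser replaces the general entity reference with its declared value.
text = ET.fromstring(DOC).text
print(text)
```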
A Sample DTD for the student
Student.dtd
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT class (student+)>
<!ELEMENT student (name, contactno, email, address)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT contactno (#PCDATA)>
<!ELEMENT email (#PCDATA)>
<!ELEMENT address (street, city, state, zip)>
<!ELEMENT street (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT state (#PCDATA)>
<!ELEMENT zip (#PCDATA)>

<!ATTLIST student regid CDATA #REQUIRED>

An XML document is said to be valid if it conforms to the specified DTD. XML
parsers report errors if XML documents are not well-formed, that is, if they do not follow the
syntactic rules.

Namespaces
It is often convenient to construct XML documents that include tagsets that are defined for and used
by other documents. For this purpose W3C has developed a standard for XML namespaces.
An XML namespace is a collection of element and attribute names used in XML documents. The name
of a namespace usually has the form of Uniform Resource Identifier (URI). A namespace for the elements
and attributes of the hierarchy rooted at a particular element is declared as the value of the attribute xmlns.
The form of a namespace declaration for an element follows:
<element-name xmlns[:prefix]="URI">
The optional prefix is the name that must be attached to the names in the declared namespace. A
prefix is used for two reasons.
1. The URI is too long to be typed on every occurrence of every name from the namespace.
2. A URI includes characters that are illegal in XML names.

Note that the element for which a namespace is declared is usually the root of a document.
Eg: <html xmlns="http://www.w3.org/1999/xhtml">
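
To see how a processor treats namespace declarations, consider the following Python sketch. The document, both namespace URIs (http://example.org/books and http://example.org/publisher) and the prefixes are hypothetical and chosen only for illustration. ElementTree reports each element name qualified by its namespace URI in the form {URI}local-name.

```python
import xml.etree.ElementTree as ET

# A default namespace (no prefix) plus a second namespace bound to the prefix "pub".
DOC = """<catalog xmlns="http://example.org/books"
         xmlns:pub="http://example.org/publisher">
  <title>XML Basics</title>
  <pub:name>Example Press</pub:name>
</catalog>"""

root = ET.fromstring(DOC)

# Element names are qualified by the namespace URI, not by the prefix.
tags = [child.tag for child in root]
print(tags)

# When searching, the caller may bind any prefixes it likes to the same URIs.
ns = {"b": "http://example.org/books", "p": "http://example.org/publisher"}
title = root.findtext("b:title", namespaces=ns)
publisher = root.findtext("p:name", namespaces=ns)
print(title, publisher)
```

The prefix is only a local abbreviation: the search uses b and p while the document uses the default namespace and pub, yet the names match because the URIs are the same.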

XML Schemas
XML Schema, recommended by W3C is an XML based schema language. It is a language used to
create XML-based languages and data models. XML schema document defines elements and attribute
names for a class of XML documents.

An XML Schema Definition is a specific XML schema document written using XML Schema; its filename
extension is ".xsd".

XML Schema Instance : The XML documents that try to follow the rules specified by the XML schema
document are said to be instances of that schema. If they strictly conform to the schema, they are valid
instances.

Limitations of DTD : Although DTDs are well accepted and used very frequently, they have several
limitations, some of them are
There is no built-in data type in DTDs.
No new data type can be created in DTDs.
No support for namespaces.
DTDs provide very limited support for modularity and reuse.
Not possible to put restrictions on text content.
Little control over mixed content (text + elements)
DTDs are written in their own syntax, which is unrelated to XML, so they cannot be parsed or analyzed with an XML processor.

Strengths of Schema : Schemas are XML-based alternatives to DTDs in that they are used to create classes
of XML documents that conform to the schema.
XML Schema is a much more powerful language than DTD.
Supports large number of built-in data types.
XML schemas are namespace centric.
Extensible to future additions.

Supports the uniqueness and referential integrity constraints in a much better way.
It is easier to define data facets ( restrictions on data)
Main strength is that XSDs are written in XML.
An XML schema is an XML document, so it can be parsed with an XML parser.

XML Schema Structure


An XSD is an XML document that must follow all the syntax rules like any other XML document.
It must be well formed. Since every XSD is itself an XML document, it starts with an XML declaration with
version, optional encoding and standalone attributes. In addition, an XML schema definition has the
following components
Schema component
Element definitions
Attribute definitions
Annotations
Type definitions

XML Schema Element


The schema element is the container element in which the specifications of all elements, attributes and
data types are stored. An XML schema is composed of a top-level schema element, i.e., the root element. The
schema element must declare the following namespace: http://www.w3.org/2001/XMLSchema
A sample XML schema document is:
<?xml version='1.0' ?>
<xs:schema
xmlns:xs="http://www.w3.org/2001/XMLSchema">

</xs:schema>
The "xs" designation is called the namespace prefix, and it is used with the schema element and its child
elements. The schema element contains type definitions (simpleType and complexType elements) and attribute
and element declarations.

Element Declaration:
The primary building blocks of any XML documents are elements. In a schema, elements are
declared by the element tag. General example of element declaration is
<xs:element name='element-name' type='element-type' />
Elements must have a name, and the value of this attribute is the element name that will appear in
the XML document. For element types, XML schema supports a number of built-in data types, such as
string, integer, boolean and date. Users may also create their own custom types using the simpleType and
complexType tags.

The declaration method is different depending on whether the element has a child element or not.
When no child element is present, the element name is designated with the name attribute, and the data
type is designated using the type attribute.

Example : greeting.xsd
<?xml version='1.0' ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="greeting" type="xs:string" />
</xs:schema>
greeting.xml, which conforms to the schema in greeting.xsd:
<?xml version='1.0' ?>
<greeting>Hello World!</greeting>
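
Because an XSD is itself an XML document, an ordinary XML parser can read it. The Python sketch below illustrates only that point: it loads the greeting schema with the standard library, lists the elements it declares, and checks that the instance's root name is among them. This is not schema validation (the Python standard library has no XSD validator), and declared_elements is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

GREETING_XSD = """<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="greeting" type="xs:string"/>
</xs:schema>"""

GREETING_XML = """<?xml version="1.0"?>
<greeting>Hello World!</greeting>"""


def declared_elements(xsd_text):
    """Map each top-level element name declared in the schema to its type."""
    schema = ET.fromstring(xsd_text)
    return {e.get("name"): e.get("type")
            for e in schema.findall(f"{{{XS}}}element")}


declared = declared_elements(GREETING_XSD)
instance_root = ET.fromstring(GREETING_XML)

print(declared)                      # element names and types found in the schema
print(instance_root.tag in declared) # is the instance's root declared there?
```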

Main XML Schema Datatypes


General Data Types
Name                     Explanation
xs:integer               Integers (infinite precision)
xs:positiveInteger       Positive integers (infinite precision)
xs:negativeInteger       Negative integers (infinite precision)
xs:nonPositiveInteger    Negative integers including 0 (infinite precision)
xs:nonNegativeInteger    Positive integers including 0 (infinite precision)
xs:byte                  Signed integer represented by 8 bits
xs:unsignedByte          Unsigned integer represented by 8 bits
xs:short                 Signed integer represented by 16 bits
xs:unsignedShort         Unsigned integer represented by 16 bits
xs:int                   Signed integer represented by 32 bits
xs:unsignedInt           Unsigned integer represented by 32 bits
xs:long                  Signed integer represented by 64 bits
xs:unsignedLong          Unsigned integer represented by 64 bits
xs:decimal               Decimal number (infinite precision)
xs:float                 Single-precision floating-point number (32-bit)
xs:double                Double-precision floating-point number (64-bit)
xs:boolean               Boolean value
xs:string                Arbitrary text string

Types Representing Dates and Times
Name                     Explanation
xs:time                  Time of day
xs:dateTime              Date and time of day
xs:date                  Date
xs:gYear                 Year
xs:gYearMonth            Year and month
xs:gMonth                Month
xs:gMonthDay             Month and day
xs:gDay                  Day

An element is constrained by its type. Schema authors can define their own types or use the
built-in types. Depending on the content model, elements are categorized as simple type or complex type.

Declaring simple elements : simple type elements can contain only text and/or data. They cannot have
child elements or attributes, and cannot be empty. Simple elements are defined as follows
<xs:element name='element-name' type='someSimpleType' />

Attributes
default – specifies the default content to be used when no content is supplied.
<xs:element name="passed" type="xs:boolean" default="false" />
fixed – used to ensure that the element's content is always set to a particular value.
<xs:element name="institution" type="xs:string" fixed="VIT" />
minOccurs – specifies the minimum number of times an element can occur. Default value is 1.
<xs:element name="middlename" type="xs:string" minOccurs="0" />
maxOccurs – specifies the maximum number of times an element can occur. Default value is 1.
<xs:element name="option" type="xs:string" maxOccurs="6" />

Declaring Complex Elements: these elements can contain child elements, text or both and can also have
attributes. Complex types can be limited to having no content, meaning they are empty, but they may
have attributes. For complex elements, element-type is a complex type(user-defined).
A complex type is defined using the complexType schema element. The general form of complexType
definition is
<xs:complexType>
Skeleton of the complex type
</xs:complexType>
Example: <xs:complexType name='personType'>
<xs:sequence>
<xs:element name='firstName' type='xs:string' />
<xs:element name='lastName' type='xs:string' />
</xs:sequence>
</xs:complexType>
A declaration of an element of such a type will then look like this
<xs:element name='employee' type='personType' />

In the above example, sequence specifies a model group. A model group specifies the order and occurrence
of the child elements. Use the sequence element to require the child elements to occur in the order written,
and use the choice element to allow exactly one of the listed elements to occur.
Example for choice group:
<xs:choice>
<xs:element name='dob' type='xs:date' />
<xs:element name='age' type='xs:integer' />
</xs:choice>
Defining Attributes: the attribute element is used in XML Schema to define attributes. Attributes are
themselves declared with simple types, using the name and type attributes of the xs:attribute element.

Annotations
• XML Schema provides annotation elements for documentation purposes.
• XML Schema provides several tags for annotating a schema: documentation (intended for human
readers), appinfo (intended for applications) and annotation.
• documentation and appinfo usually appear as subelements of annotation
<xs:annotation>
<xs:appinfo>UKR XML tutorial</xs:appinfo>
<xs:documentation xml:lang="en">
here goes the documentation text for the schema
</xs:documentation>
</xs:annotation>

Referencing:
Once an element is declared with a unique name, it can be referenced elsewhere by using the ref attribute.
Eg: <xs:element ref='dob' maxOccurs='1' />
This declaration references an existing element, dob, which was declared elsewhere in the schema.

User-defined data types:


User-defined data types are defined by specifying restrictions on an existing data type, which is then
called the base type; such user-defined types are called derived types.
A simple user-defined data type is described in a simpleType element using facets. Facets must be
specified in the content of a restriction element, which gives the base type name.

Eg: declare a user-defined type firstName, for strings of fewer than 11 characters.
<xs:simpleType name="firstName">
<xs:restriction base="xs:string">
<xs:maxLength value="10" />
</xs:restriction>
</xs:simpleType>

Restrictions on numbers:
• minInclusive -- number must be ≥ the given value
• minExclusive -- number must be > the given value
• maxInclusive -- number must be ≤ the given value
• maxExclusive -- number must be < the given value
• totalDigits -- number must have at most value digits in total
• fractionDigits -- number must have at most value digits after the decimal point

Restrictions on strings:
• length -- the string must contain exactly value characters
• minLength -- the string must contain at least value characters
• maxLength -- the string must contain no more than value characters
• pattern -- the value is a regular expression that the string must match
• whiteSpace -- not really a "restriction"; tells what to do with whitespace
– whiteSpace="preserve" Keep all whitespace
– whiteSpace="replace" Change all whitespace characters to spaces
– whiteSpace="collapse" Remove leading and trailing whitespace, and
replace all sequences of whitespace with a single space

Examples on restrictions
<xs:simpleType name="temperatureType">
<xs:restriction base="xs:integer">
<xs:minInclusive value="-273"/>
</xs:restriction>
</xs:simpleType>
<xs:element name="temperature" type="temperatureType"/>

<xs:simpleType name="binary">
<xs:restriction base="xs:integer">
<xs:minInclusive value="0"/>
<xs:maxInclusive value="1"/>
</xs:restriction>
</xs:simpleType>
<xs:element name="bit" type="binary"/>

<xs:simpleType name="PinCode">
<xs:restriction base="xs:string">
<xs:length value="6"/>
<xs:pattern value="\d{6}"/>
</xs:restriction>
</xs:simpleType>
<xs:element name="pin" type="PinCode" />
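
What these three restrictions accept and reject can be spelled out by hand. The Python sketch below is an illustration only: the function names are hypothetical and each function hard-codes the facets of one type (temperatureType, binary, PinCode) instead of reading them from a schema.

```python
import re


def check_temperature(value):
    """temperatureType: an integer with minInclusive = -273."""
    return isinstance(value, int) and value >= -273


def check_bit(value):
    """binary: an integer with minInclusive = 0 and maxInclusive = 1."""
    return isinstance(value, int) and 0 <= value <= 1


def check_pin(text):
    """PinCode: a string with length = 6 whose pattern allows six digits."""
    # A schema pattern must match the whole value, hence fullmatch, not search.
    return len(text) == 6 and re.fullmatch(r"\d{6}", text) is not None


print(check_temperature(-273), check_temperature(-274))
print(check_bit(1), check_bit(2))
print(check_pin("534201"), check_pin("53420a"))
```

In the PinCode type the length facet is redundant, since the pattern already forces exactly six characters, but stating both does no harm.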

Example schema for the following employee tree

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" >

<xs:element name="Employee_Info" type="EmployeeInfoType" />


<xs:complexType name="EmployeeInfoType">
<xs:sequence>
<xs:element ref="Employee" minOccurs="0" maxOccurs="unbounded" />
</xs:sequence>
</xs:complexType>

<xs:element name="Employee" type="EmployeeType" />


<xs:complexType name="EmployeeType">
<xs:sequence >
<xs:element ref="Name" />
<xs:element ref="Department" />
<xs:element ref="Telephone" />
<xs:element ref="Email" />
</xs:sequence>
<xs:attribute name="Employee_Number" type="xs:int" use="required"/>
</xs:complexType>

<xs:element name="Name" type="xs:string" />


<xs:element name="Department" type="xs:string" />
<xs:element name="Telephone" type="xs:string" />
<xs:element name="Email" type="xs:string" />
</xs:schema>
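
To show what the EmployeeType definition demands of an instance, here is a rough Python sketch. It reads the sequence and the required attribute from the type definition and compares them with an Employee element. It is not a schema validator: it handles only this shape of type (a flat sequence of element references that each occur exactly once), and the function name and the sample employee values are invented for illustration.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# The EmployeeType definition from the schema above, made parseable on its own.
EMPLOYEE_TYPE = """<xs:complexType name="EmployeeType"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:sequence>
    <xs:element ref="Name"/>
    <xs:element ref="Department"/>
    <xs:element ref="Telephone"/>
    <xs:element ref="Email"/>
  </xs:sequence>
  <xs:attribute name="Employee_Number" type="xs:int" use="required"/>
</xs:complexType>"""


def follows_employee_type(type_xml, instance_xml):
    """Check child order and required attributes of one Employee element."""
    ctype = ET.fromstring(type_xml)
    expected = [e.get("ref") for e in ctype.find(XS + "sequence")]
    required = [a.get("name") for a in ctype.findall(XS + "attribute")
                if a.get("use") == "required"]
    employee = ET.fromstring(instance_xml)
    children_ok = [child.tag for child in employee] == expected
    attributes_ok = all(name in employee.attrib for name in required)
    return children_ok and attributes_ok


GOOD = ('<Employee Employee_Number="105"><Name>A</Name><Department>CSE</Department>'
        '<Telephone>1</Telephone><Email>a@example.org</Email></Employee>')
WRONG_ORDER = ('<Employee Employee_Number="105"><Department>CSE</Department><Name>A</Name>'
               '<Telephone>1</Telephone><Email>a@example.org</Email></Employee>')

print(follows_employee_type(EMPLOYEE_TYPE, GOOD))
print(follows_employee_type(EMPLOYEE_TYPE, WRONG_ORDER))
```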

Displaying XML Documents:
Style sheet information can be provided to the browser for an XML document in two ways.
1. A Cascading Style Sheet (CSS) file that has style information for the elements in the XML
document can be developed.
2. The XSLT style sheet technology, which was developed by W3C can be used. It provides more
power over the appearance of the document’s display.

CSS files: the form of a CSS style sheet for an XML document is simple. It is just a list of element names,
each followed by a brace-delimited set of property declarations. The following shows a CSS style sheet for the
students XML document.

Eg: /* student.css - a style sheet for the student.xml document */


student { display: block; margin-top: 15px; color:blue; }
name, contactno, email { color: red; font-size: 16pt; }
street, city, state, zip {display: block; margin-left:40px; font-size:14pt; }
[Figure: browser output of student.xml styled with student.css]

The connection of an XML document to a CSS style sheet is established with the processing
instruction xml-stylesheet, which specifies the type of the style sheet via its type attribute and the
name of the file that stores the style sheet via its href attribute.
Example: <?xml-stylesheet type="text/css" href="student.css" ?>

XSLT Style Sheets


The eXtensible Stylesheet Language (XSL) is a family of recommendations for defining XML
document transformations and presentations. It consists of three related standards.
o XSL Transformations (XSLT)
o XML Path Language (XPath)
o XSL Formatting Objects (XSL-FO)
Each of these has an importance and use of its own.
XSLT style sheets are used to transform XML documents into different forms or formats. One
common use for XSLT is to transform XML documents into HTML documents, primarily for display. In
the transformation of an XML document, element content can be moved and/or modified, stored, or
converted to attribute values, among other things.
XSLT files are themselves XML documents, so they must adhere to the well-formedness
constraints. Being an XML document, an XSLT file starts with an XML declaration as follows:
<?xml version="1.0" ?>

In XSLT, a style sheet is used to describe transformation rules in XML format. It is read by an
application called an "XSLT processor", which transforms a designated XML document. The transformation
results are output in XML, HTML or text format. The following figure shows document transformation via
an XSLT processor.

XSLT is a functional-style programming language. XSLT includes functions, parameters, names to which
values can be bound, selection constructs, and conditional expressions for multiple selections.

Every XSLT file must have the root element <xsl:stylesheet>.


The version attribute indicates the version of XSLT being used.
The namespace attribute xmlns:xsl must have the value "http://www.w3.org/1999/XSL/Transform". It
distinguishes XSLT elements from other elements. xsl is the conventional prefix.

To apply an XSLT document to an XML document, add a processing instruction to the XML document
that points to the XSLT file, and let the browser do the transformation. This link is placed
after the XML declaration.
Example: <?xml version="1.0" ?>
<?xml-stylesheet type="text/xsl" href="file.xsl" ?>
<root>
</root>
An XSLT document mainly consists of one or more templates. Each template is associated with a
section of XSLT code, which is executed when a match to the template is found in the XML
document. Each template therefore describes a function, which is executed whenever the XSLT
processor finds a match to the template's pattern.
An XSLT processor sequentially examines the input XML document, searching for parts that match
one of the templates in the XSLT document.

XSL transformations for presentation:


Elements in the stylesheet that are not XSLT elements, such as HTML elements, are copied by the
XSLT processor to the output document being generated.

<xsl:template> : A style sheet document must include at least one template element. A template defines a
reusable rule that generates the desired output for nodes of a particular type/context. Templates
can occur any number of times.
Attributes
Name     Description
name     Name of the template, by which it can be invoked with xsl:call-template.
match    Pattern which identifies the element(s) to which the template is to be applied.

The template element can have the following child elements: xsl:apply-imports, xsl:apply-templates, xsl:attribute,
xsl:call-template, xsl:choose, xsl:comment, xsl:copy, xsl:copy-of, xsl:element, xsl:fallback, xsl:for-each, xsl:if,
xsl:message, xsl:number, xsl:param, xsl:processing-instruction, xsl:text, xsl:value-of, xsl:variable, and output elements.

A template that matches the root node of the XML document : <xsl:template match="/" >
Stylesheets can also have templates for descendants of the root node, such as : <xsl:template match="year" >

<xsl:value-of> : This tag writes the value of the node selected by an XPath expression to the output
document being generated. Its select attribute specifies the node of the XML document whose content
is to be copied.
Example : <xsl:value-of select="author" />
The select attribute can specify any node of the XML document.

<xsl:for-each> : An XML document often includes a collection of similar elements. With the for-each
element, the same template content is applied repeatedly to each of them; its select attribute specifies
the elements in the XML data to loop over.

<xsl:sort> : This element specifies a simple way to sort the elements of the XML document before sending
them or their content to the output document. It is placed inside <xsl:for-each> or <xsl:apply-templates>.
The select attribute specifies the node that is used as the key of the sort.
The data-type attribute specifies whether the key is to be sorted as text or numerically.
The order attribute specifies the sorting order (ascending or descending).
Example : <xsl:sort select="year" data-type="number" />
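The effect of data-type="number" can be tried out with the XSLT processor that ships with the JDK (javax.xml.transform). The following is only a minimal sketch: the books document, its year values and the class name SortSketch are made up for illustration and are not part of these notes.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class SortSketch {
    // Hypothetical input: years chosen so that text order and numeric order differ.
    static final String XML =
        "<books><book><year>100</year></book>"
      + "<book><year>9</year></book>"
      + "<book><year>10</year></book></books>";

    // xsl:sort must be the first child of xsl:for-each.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      + "<xsl:for-each select='books/book'>"
      + "<xsl:sort select='year' data-type='number' order='ascending'/>"
      + "<xsl:value-of select='year'/><xsl:text> </xsl:text>"
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Runs the stylesheet and returns the years in sorted order, separated by spaces.
    static String sortedYears() throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(XML)), new StreamResult(out));
        return out.toString().trim();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sortedYears());
    }
}
```

With data-type="number" the keys are compared as numbers, so 9 comes before 10 and 100; with the default data-type="text" the keys would be compared as strings instead.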

<xsl:if> : This element specifies a conditional test on the content of nodes. Its test attribute
specifies the condition to test against the XML data.
Example : <xsl:if test="marks > 90">

<xsl:choose> : This tag specifies multiple conditional tests on the content of nodes, in conjunction
with the <xsl:when> and <xsl:otherwise> elements.
<xsl:choose>
<xsl:when test="marks > 90"> High </xsl:when>
<xsl:when test="marks > 85"> Medium </xsl:when>
<xsl:otherwise> Low </xsl:otherwise>
</xsl:choose>

<xsl:apply-templates> : This element applies the appropriate templates to the child nodes of the current
node, or to the nodes chosen by its optional select attribute.
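As a rough illustration of template matching, the Java sketch below runs a stylesheet in which the root template hands each student element over to a separate template through xsl:apply-templates. The sample data and the class name ApplyTemplatesSketch are assumptions made for this example; the processor used is the one bundled with the JDK.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ApplyTemplatesSketch {
    // Hypothetical input document with two student elements.
    static final String XML =
        "<class><student><name>Asha</name></student>"
      + "<student><name>Ravi</name></student></class>";

    // The root template selects the students; the second template handles each one.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      + "<xsl:apply-templates select='class/student'/>"
      + "</xsl:template>"
      + "<xsl:template match='student'>"
      + "<xsl:value-of select='name'/><xsl:text>;</xsl:text>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies XSLT to XML and returns the text output.
    static String run() throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(XML)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run());
    }
}
```

The template with match='student' is executed once per selected student, which is the "function called on a match" behaviour described for templates.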
Examples:
<?xml version="1.0" ?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="/">
<html>
<body>
<h2>Students</h2>
<xsl:for-each select="class/student">
Regd Id : <xsl:value-of select="@regid" /> <br />
Name : <xsl:value-of select="name"/> <br />
Contact No: <xsl:value-of select="contactno"/> <br />

Email-Id : <xsl:value-of select="email"/> <br />
</xsl:for-each>
</body>
</html>
</xsl:template>

</xsl:stylesheet>

<?xml version="1.0" ?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="/">
<html>
<body>
<h2>Students</h2>
<table border="1">
<tr bgcolor="#9acd32">
<th>rollno</th>
<th>Name</th>
<th>ContactNo</th>
<th>E-mail address</th>
</tr>
<xsl:for-each select="class/student">
<tr>
<td><xsl:value-of select="@regid" /></td>
<td><xsl:value-of select="name"/></td>
<td><xsl:value-of select="contactno"/></td>
<td><xsl:value-of select="email"/></td>
</tr>
</xsl:for-each>
</table>
</body>
</html>
</xsl:template>

</xsl:stylesheet>
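Besides linking a stylesheet for the browser, a stylesheet like the ones above can also be applied from a Java program. The sketch below uses the standard javax.xml.transform API; the student record, the shortened stylesheet and the class name StylesheetRunner are hypothetical stand-ins so that the example is self-contained.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class StylesheetRunner {
    // Hypothetical student data in the class/student shape used by the stylesheets above.
    static final String XML =
        "<class><student regid='101'><name>Asha</name>"
      + "<contactno>9000000000</contactno><email>asha@example.com</email></student></class>";

    // Shortened version of the first example stylesheet.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='html'/>"
      + "<xsl:template match='/'>"
      + "<html><body><h2>Students</h2>"
      + "<xsl:for-each select='class/student'>"
      + "Regd Id : <xsl:value-of select='@regid'/><br/>"
      + "Name : <xsl:value-of select='name'/><br/>"
      + "</xsl:for-each>"
      + "</body></html>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Compiles the stylesheet, applies it to the XML text and returns the result.
    static String transform(String xml, String xslt) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform(XML, XSLT));
    }
}
```

When the XML and XSLT are stored in files, StreamSource can be constructed from a java.io.File instead of a StringReader.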

XML Processors
The word parser comes from compilers. In a compiler, a parser is the module that reads and
interprets the programming language. In XML, a parser is a software component that sits between the
application and the XML files. It reads a text-formatted XML file or stream and converts it to a document
to be manipulated by the application. As we know, well-formed documents respect the syntactic rules,
whereas valid documents not only respect the syntactic rules but also conform to a structure described
in a DTD or an XML schema.

The purposes of an XML processor are:

1. The processor must check the basic syntax of the document for well-formedness.
2. The processor must replace all references to entities in an XML document by their definitions.
3. DTDs and XML schemas can specify default values for certain values in an XML document; the
processor must copy these into the XML document during processing.
4. When a DTD or an XML schema is specified and the processor includes a validating parser, the
structure of the XML document must be checked to ensure that it is legitimate.
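The four tasks above can be observed with the parser bundled with the JDK. The following sketch is an illustration only: the note document, its internal DTD, the entity name inst and the class name ProcessorSketch are all invented for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class ProcessorSketch {
    // Hypothetical internal DTD: one entity and one attribute with a default value.
    static final String DTD =
        "<!DOCTYPE note [\n"
      + "  <!ELEMENT note (#PCDATA)>\n"
      + "  <!ATTLIST note lang CDATA \"en\">\n"
      + "  <!ENTITY inst \"Vishnu Institute\">\n"
      + "]>\n";

    // Parses the text and returns the root element; throws if the document is rejected.
    static Element parse(String xml, boolean validating) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(validating); // true switches on DTD validation
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setErrorHandler(new DefaultHandler() {
            @Override
            public void error(SAXParseException e) throws SAXException {
                throw e; // validity errors are recoverable by default; treat them as failures
            }
        });
        return builder.parse(new InputSource(new StringReader(xml))).getDocumentElement();
    }

    // True if the parser accepts the document under the given mode.
    static boolean accepts(String xml, boolean validating) {
        try {
            parse(xml, validating);
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Task 1: a mismatched end tag is a well-formedness error.
        System.out.println(accepts("<note><to>x</note>", false));
        // Tasks 2 and 3: entity replacement and default attribute values.
        Element note = parse(DTD + "<note>&inst;</note>", false);
        System.out.println(note.getTextContent() + " / " + note.getAttribute("lang"));
        // Task 4: validation against the DTD (element "to" is not declared).
        System.out.println(accepts(DTD + "<note><to/></note>", true));
    }
}
```

The same document can pass the well-formedness check and still fail validation, which is the distinction between well-formed and valid documents made earlier.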

If the browser has an XML parser, one can check the well-formedness of an XML document simply by opening
it in the browser. Another way is to run an XML parser directly on the document. There are two approaches:
DOM (Document Object Model) approach - parsers that implement the DOM API
SAX (Simple API for XML) approach - parsers that implement the SAX API

DOM Parsers :
o A DOM document is an object containing all the information of an XML document.
o It is composed of a tree (the DOM tree) of nodes, and various nodes that are associated with
other nodes in the tree but are not themselves part of the DOM tree.
o A DOM parser is tree-based (or DOM object-based). The idea is to build a hierarchical syntactic
structure of the document. The nodes of the tree are represented as objects that can be accessed,
processed or modified by the application.
o There are 12 types of nodes in a DOM Document object, including:
Document node
Element node
Text node
Attribute node
Processing instruction node
<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="test.css"?>
<!-- It's an xml-stylesheet processing instruction. -->
<!DOCTYPE shapes SYSTEM "shapes.dtd">
<shapes>
……
<square color="BLUE">
<length> 20 </length>
</square>
……
</shapes>

o A DOM parser creates an internal structure in memory, which is a DOM Document object.
o When parsing is complete, the complete DOM representation of the document is in memory and
can be accessed in a number of different ways, including tree traversals of various kinds as well as
random accesses.
o Client applications get the information of the original XML document by invoking methods on this
Document object or on other objects it contains.
o From the data-flow point of view, the client application actively pulls the data.
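The node types mentioned above can be seen by walking a parsed document. The sketch below is a simplified, hypothetical variant of the shapes example: the DOCTYPE line is left out because shapes.dtd is not available here, and the class name NodeTypesSketch is made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NodeTypesSketch {
    static final String XML =
        "<?xml version=\"1.0\"?>"
      + "<?xml-stylesheet type=\"text/css\" href=\"test.css\"?>"
      + "<!-- a comment -->"
      + "<shapes><square color=\"BLUE\"><length>20</length></square></shapes>";

    // Maps a few of the DOM node type constants to readable names.
    static String typeName(Node n) {
        switch (n.getNodeType()) {
            case Node.DOCUMENT_NODE: return "Document";
            case Node.ELEMENT_NODE: return "Element";
            case Node.TEXT_NODE: return "Text";
            case Node.COMMENT_NODE: return "Comment";
            case Node.PROCESSING_INSTRUCTION_NODE: return "ProcessingInstruction";
            default: return "Other(" + n.getNodeType() + ")";
        }
    }

    static void indent(int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) out.append("  ");
    }

    // Depth-first traversal; attributes are reached through the element, not as children.
    static void walk(Node n, int depth, StringBuilder out) {
        indent(depth, out);
        out.append(typeName(n)).append(' ').append(n.getNodeName()).append('\n');
        NamedNodeMap attrs = n.getAttributes(); // null for everything except elements
        if (attrs != null) {
            for (int i = 0; i < attrs.getLength(); i++) {
                indent(depth + 1, out);
                out.append("Attribute ").append(attrs.item(i).getNodeName()).append('\n');
            }
        }
        NodeList children = n.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            walk(children.item(i), depth + 1, out);
        }
    }

    static String dump() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(XML)));
        StringBuilder out = new StringBuilder();
        walk(doc, 0, out);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(dump());
    }
}
```

Note that the color attribute never shows up in getChildNodes(); it is only reachable through getAttributes(), which matches the remark that some nodes are associated with the tree without being part of it.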

Advantages:
1. Access to random parts of the document is possible. This is useful when random access to widely
separated parts of a document is required.
2. It supports both read and write operations.
3. If the application must perform any rearrangement of the document, that can most easily be done
if the whole document is accessible at the same time.
4. Because the parser sees the whole document before any processing takes place, this approach
avoids any processing of a document that is later found to be invalid.
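The read-and-write advantage can be sketched as follows: the document is parsed into a DOM tree, one text value is changed, and the tree is written back out with an identity Transformer. The employee record, the method name relocate and the class name DomWriteSketch are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomWriteSketch {
    // Changes the location of the employee with the given id and returns the new XML text.
    static String relocate(String xml, String id, String newLocation) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));

        // Random access: find the matching employee anywhere in the tree and edit it in place.
        NodeList employees = doc.getElementsByTagName("employee");
        for (int i = 0; i < employees.getLength(); i++) {
            Element employee = (Element) employees.item(i);
            if (id.equals(employee.getAttribute("id"))) {
                employee.getElementsByTagName("location").item(0).setTextContent(newLocation);
            }
        }

        // A Transformer created without a stylesheet copies the tree to the output unchanged.
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        serializer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<employees><employee id=\"111\"><location>India</location></employee></employees>";
        System.out.println(relocate(xml, "111", "Japan"));
    }
}
```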

Disadvantages:
1. It is memory inefficient. The DOM structure is stored entirely in memory. For large documents this
requires a lot of memory. In fact, because there is no limit on the size of an XML document, there
may be some documents that cannot be parsed this way.
2. The API can seem complicated at first, although in practice it is not.

The following are the steps to use DOM parsers


-Import XML-related packages
-Create a DocumentBuilder
-Create a Document from a file or stream
-Validate Document structure
-Extract the root element
-Examine attributes
-Examine sub-elements
The steps above are the broad steps involved in using a DOM parser to parse any XML file in Java.
As an example, we parse the XML content below.
employees.xml
<employees>
<employee id="111">
<firstName>Lokesh</firstName>
<lastName>Gupta</lastName>
<location>India</location>
</employee>
<employee id="222">
<firstName>Alex</firstName>
<lastName>Gussin</lastName>
<location>Russia</location>
</employee>
<employee id="333">
<firstName>David</firstName>
<lastName>Feezor</lastName>
<location>USA</location>
</employee>
</employees>
DOM Parser in Action
Import XML-related packages : You first need to import the packages below in your application.
import org.w3c.dom.*;
import javax.xml.parsers.*;
import java.io.*;
Create a DocumentBuilder : The next step is to create the DocumentBuilder object.
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Create a Document from a file : Here file is the path of the XML file to parse.
Document document = builder.parse(new File(file));
Extract the root element : You can get the root element of the XML document using the code below.
Element root = document.getDocumentElement();
Examine attributes : You can examine an element's attributes using the methods below.
element.getAttribute("attributeName"); //returns the value of the named attribute
element.getAttributes(); //returns a NamedNodeMap of the element's attributes
Examine sub-elements : Child elements can be queried in the manner below.
element.getElementsByTagName("subElementName"); //returns a NodeList of all descendant elements with the specified name
node.getChildNodes(); //returns a NodeList of all child nodes
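The attribute and sub-element calls listed above can be combined as in the sketch below. It is an illustration under assumptions: the dept attribute and the class and method names (AttributeSketch, attributesOf, childElementNames) do not come from the notes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class AttributeSketch {
    // Copies a NamedNodeMap into an ordinary map (sorted by name, since DOM gives no order).
    static Map<String, String> attributesOf(Element element) {
        Map<String, String> result = new TreeMap<>();
        NamedNodeMap attrs = element.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node attr = attrs.item(i);
            result.put(attr.getNodeName(), attr.getNodeValue());
        }
        return result;
    }

    // getChildNodes() also returns text and comment nodes, so keep element nodes only.
    static List<String> childElementNames(Element element) {
        List<String> names = new ArrayList<>();
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
                names.add(children.item(i).getNodeName());
            }
        }
        return names;
    }

    // Parses XML text and returns its root element.
    static Element rootOf(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml))).getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        Element employee = rootOf(
            "<employee id=\"111\" dept=\"CSE\">\n"
          + "  <firstName>Lokesh</firstName>\n"
          + "  <lastName>Gupta</lastName>\n"
          + "</employee>");
        System.out.println(attributesOf(employee));
        System.out.println(childElementNames(employee));
    }
}
```

Filtering on Node.ELEMENT_NODE matters because the whitespace between tags is reported as text nodes among the children.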

Parsing known xml structure
The example code below assumes that the user already knows the structure of the employees.xml file
(its nodes and attributes), so the example directly fetches the information and prints it to the console.
import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class EmployeeDomParser
{
    public static void main(String[] args) throws Exception
    {
        //Get Document Builder
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();

        //Build Document (employees.xml must be in the working directory)
        Document document = builder.parse(new File("employees.xml"));

        //Normalize the XML structure: merges adjacent text nodes and removes empty ones
        document.getDocumentElement().normalize();

        //Here comes the root node
        Element root = document.getDocumentElement();
        System.out.println(root.getNodeName());

        //Get all employees
        NodeList nList = document.getElementsByTagName("employee");
        System.out.println("============================");

        for (int temp = 0; temp < nList.getLength(); temp++)
        {
            Node node = nList.item(temp);
            System.out.println(""); //Just a separator
            if (node.getNodeType() == Node.ELEMENT_NODE)
            {
                //Print each employee's detail
                Element eElement = (Element) node;
                System.out.println("Employee id : " + eElement.getAttribute("id"));
                System.out.println("First Name : " + eElement.getElementsByTagName("firstName").item(0).getTextContent());
                System.out.println("Last Name : " + eElement.getElementsByTagName("lastName").item(0).getTextContent());
                System.out.println("Location : " + eElement.getElementsByTagName("location").item(0).getTextContent());
            }
        }
    }
}
Output:
employees
============================

Employee id : 111
First Name : Lokesh
Last Name : Gupta
Location : India

Employee id : 222
First Name : Alex
Last Name : Gussin
Location : Russia

Employee id : 333
First Name : David
Last Name : Feezor
Location : USA
SAX parsers:
o A SAX parser does not first create any internal structure.
o The client does not call methods to fetch the data; instead, the parser calls methods supplied by the client.
o The client just overrides the methods of the API and places its own code inside them.

o The processor scans the XML document from beginning to end. Every time a syntactic structure of
the document is recognized, the processor signals an event to the application by calling an event
handler for the particular structure that was found. The syntactic structures of interest naturally
include opening tags, attributes, text and closing tags.
o When the parser encounters a start tag, an end tag, etc., it treats them as events.
o When such an event occurs, the parser automatically calls back a particular method
overridden by the client, and passes what it has seen to that method as arguments.
o A SAX parser is event-based; it works like an event handler in Java (e.g. MouseAdapter). For this
reason the SAX approach to processing is called event processing. The interfaces that describe the
event handlers form the SAX API.
o From the data-flow point of view, the client application passively receives the data.
Advantages:
1. It is simple
2. It is memory efficient
3. It works well in streaming applications
4. It is faster than the DOM approach
Disadvantages:
The data is broken into pieces and clients never have all the information as a whole unless they create
their own data structure
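The callback style described above might look like the following sketch, which uses the SAX classes that ship with the JDK. The handler collects each employee's id and first name in its own list, since SAX keeps nothing for the client; the class names SaxSketch and EmployeeHandler and the sample data are assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxSketch {
    // The client overrides only the callbacks it is interested in.
    static class EmployeeHandler extends DefaultHandler {
        final List<String> lines = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();
        private String currentId;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            text.setLength(0); // start collecting text for this element
            if ("employee".equals(qName)) {
                currentId = attrs.getValue("id");
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length); // may be called several times per element
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("firstName".equals(qName)) {
                lines.add(currentId + " " + text.toString().trim());
            }
        }
    }

    // Runs the parser over the text; the parser drives the handler through callbacks.
    static List<String> firstNames(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        EmployeeHandler handler = new EmployeeHandler();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.lines;
    }

    public static void main(String[] args) throws Exception {
        String xml =
            "<employees>"
          + "<employee id=\"111\"><firstName>Lokesh</firstName></employee>"
          + "<employee id=\"222\"><firstName>Alex</firstName></employee>"
          + "</employees>";
        System.out.println(firstNames(xml));
    }
}
```

The handler never sees the document as a whole; anything it needs later, such as the current employee id, has to be remembered in its own fields.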

Difference between DOM and SAX XML Parser


Here are a few high-level differences between the DOM parser and the SAX parser in Java:

1) A DOM parser loads the whole XML document into memory, while a SAX parser holds only a small part of
the XML file in memory at a time.
2) A SAX parser is generally faster for a single pass over a document because it builds no tree; once a
document is loaded, however, DOM gives faster repeated and random access because the whole document is in memory.
3) A SAX parser is better suited than a DOM parser to large XML files because it does not require
much memory.
4) A DOM parser works on the Document Object Model, while SAX is an event-based XML parser.

Important Questions
1. Design & Develop an XML schema for student information management. Include every feature
available with schema.
2. Write about Document Type Definition.
3. Differentiate between DTD & XML Schema with an example.
4. Write about SAX Parser in detail.
5. Write about DOM Parser in detail.
6. Design & Develop an XML DTD for Employee Database. Include every feature available with
DTD.
7. Write about XML Schema.
8. What is XML? Explain the differences between XML and HTML.
9. XML is not a replacement for HTML. Discuss.
10. Discuss how XML simplifies data sharing and data transport.
11. Discuss how XML separates data from HTML.
12. Write an example XML document and explain it.
13. Explain XML document object model.
14. Discuss about DOM and SAX.
