Xmlunit 2

Uploaded by

22r11a05t5
Copyright
© All Rights Reserved

UNIT-II

XML: introduction to XML, defining XML tags, their attributes and values, Document Type Definition, XML Schemas, Document Object Model, XHTML
Parsing XML Data: DOM and XML parsers in Java
Two problems with HTML:
– 1. Fixed set of tags and attributes
• User cannot define new tags or attributes
– 2. There are no restrictions on the arrangement or order of tag appearance in a document
• Why do we need XML?
• HTML is used to describe how data is displayed on the web.
<p><b>Mrs. Praveen</b>
<br>
LBNagar
<br>
Hyderabad</p>
The tags in this document tell a browser how to display this information; they don't tell the browser what the information is.
XML

• XML (eXtensible Markup Language) is a markup language.
• XML is designed to store and transport data.
• XML was released in the late 1990s. It was created to provide an easy way to store and use self-describing data.
• XML became a W3C Recommendation on February 10, 1998.
• XML is not a replacement for HTML.
• XML is designed to be self-descriptive.
• XML is designed to carry data, not to display data.
• XML tags are not predefined. You must define your own tags.
• XML is platform independent and language independent.
How Can XML be Used?
1. XML Separates Data from HTML
2. XML Simplifies Data Transport
3. XML Simplifies Platform Changes
4. XML Makes Your Data More Available.
5. XML can take data from a database such as MySQL or Microsoft SQL Server, convert it into XML, and share that XML with all sorts of applications and platforms.

Applications of xml
1. Cell phones: XML data is sent to some cell phones. That data is formatted according to the specifications of the cell phone software designer to display text or images, or even to play sounds.
2. File converters: Many applications have been written to convert existing documents into the XML standard. An example is a PDF to XML converter.
3. VoiceXML: Converts XML documents into an audio format so that you can listen to an XML document.
4. Microsoft Office also uses XML-based file formats.
Difference between HTML and XML:
XML syntax
• The following syntax rules are used to create a well-formed XML document.

<?xml version="1.0" encoding="UTF-8"?>
<class_list>
<student>
<name> Rohit</name>
<grade> A+ </grade>
</student>
<student>
<name>Sai</name>
<grade>A-</grade>
</student>
</class_list>
<?xml version="1.0" encoding="UTF-8"?>
• This is called the XML declaration. It states which XML version the document uses and which character encoding it is written in.
• encoding: the character set used in the document. Examples are ISO-8859-1, the character set used by Western European languages, and UTF-8 (UCS (Unicode) Transformation Format).
• All XML elements must have a closing tag.
• XML tags are case sensitive.
• XML elements must be properly nested.
In HTML: <b><i>This text is bold and italic</b></i>
In XML: <b><i>This text is bold and italic</i></b>
• XML documents must have a root element.
<root>
<child>
<subchild>.....</subchild>
</child>
</root>
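The class_list example above can be read by a program. The following is a minimal sketch that assumes Java and the JDK's built-in DOM parser (`javax.xml.parsers`). The class name `ClassListReader` and the "name:grade" output format are illustrative choices and are not part of the notes. The document is inlined as a string so the example needs no file.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ClassListReader {

    // The class_list document from the notes, inlined so the example needs no file.
    public static final String XML =
          "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
        + "<class_list>"
        + "<student><name> Rohit</name><grade> A+ </grade></student>"
        + "<student><name>Sai</name><grade>A-</grade></student>"
        + "</class_list>";

    // Parses the document into a DOM tree and returns one "name:grade" entry per student.
    public static List<String> readStudents(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            List<String> result = new ArrayList<>();
            NodeList students = doc.getElementsByTagName("student");
            for (int i = 0; i < students.getLength(); i++) {
                Element student = (Element) students.item(i);
                // getTextContent() keeps the spaces written in the source, so trim them.
                String name = student.getElementsByTagName("name").item(0).getTextContent().trim();
                String grade = student.getElementsByTagName("grade").item(0).getTextContent().trim();
                result.add(name + ":" + grade);
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse the XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readStudents(XML)); // [Rohit:A+, Sai:A-]
    }
}
```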
JSON | XML
====================================
It is JavaScript Object Notation. | It is eXtensible Markup Language.
It is based on the JavaScript language. | It is derived from SGML (Standard Generalized Markup Language).
It is a way of representing objects. | It is a markup language and uses a tag structure to represent data items.
It does not provide any support for namespaces. | It supports namespaces.
It supports arrays. | It doesn't support arrays.
Its files are very easy to read compared to XML. | Its documents are comparatively difficult to read and interpret.
It doesn't use end tags. | It has start and end tags.
It is less secure. | It is more secure than JSON.


Rules For Writing XML Documents
1. XML is a case-sensitive language.
Ex: <name> Praveen</name>
<Name> Praveen </Name>
These are treated as two different tags.
2. Each start tag must have a matching end tag.
Ex: <branch>CSE</branch>
3. The elements in XML must be properly nested.
Ex: <name><branch>Kumar/CSE</branch></name>
<address>
<name>
<title>Mr.</title>
<first-name> your_First_name</first-name>
<last-name> Last_Name</last-name>
</name>
<street> chintalkunta </street>
<city>Hyderabad</city>
<state>Telangana</state>
<postal-code> 500074</postal-code>
</address>
-> XML is a software- and hardware-independent tool for carrying information.
-> XML is one of the most common tools for data transmission between all sorts of applications.
DTD
An XML document consists of two parts:
1. Prolog 2. Body
• The prolog contains the XML declaration, optional processing instructions, comments and the document type declaration.
Ex: <?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<?xml-stylesheet href="example.xsl" type="text/xsl" ?>
<!-- Comment -->
• The DTD (Document Type Declaration) specifies the type of document used. It specifies the structure by imposing constraints on which tags can be used and where.
• The body contains textual data marked up by tags.
An XML document contains
1. Elements
2. Attributes
3. Entity references
4. PCDATA
5. CDATA
• PCDATA:
1. PCDATA means parsed character data. Think of character data as the text found between the start tag and the end tag of an XML element.
2. PCDATA is text that WILL be parsed by a parser. The parser examines the text for entities and markup.
3. Tags inside the text will be treated as markup and entities will be expanded.
4. However, parsed character data must not contain raw & or < characters. These characters, and by convention >, are represented by the &amp;, &lt; and &gt; entities, respectively.

CDATA:
1. CDATA means character data.
2. CDATA is text that will NOT be parsed by a parser. Tags inside the text will NOT be treated as markup and entities will not be expanded. Inside an element, such text is written in a CDATA section: <![CDATA[ ... ]]>.
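The difference between parsed text and a CDATA section shows up directly in the DOM tree. The following is a minimal sketch using Java's built-in DOM parser. The `<msg>` documents and the class name `CdataDemo` are hypothetical. The same characters produce a child element plus an expanded `&amp;` in one case, and plain text in the other.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class CdataDemo {

    // Ordinary content is parsed: <b> becomes a child element and &amp; becomes '&'.
    public static final String PARSED = "<msg><b>hi</b> &amp; bye</msg>";
    // The same characters inside a CDATA section are kept as plain text.
    public static final String CDATA = "<msg><![CDATA[<b>hi</b> &amp; bye]]></msg>";

    static Element root(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Number of child elements the parser created under the root.
    public static int childElements(String xml) {
        NodeList kids = root(xml).getChildNodes();
        int count = 0;
        for (int i = 0; i < kids.getLength(); i++) {
            if (kids.item(i).getNodeType() == Node.ELEMENT_NODE) {
                count++;
            }
        }
        return count;
    }

    // All character data of the root element, concatenated.
    public static String text(String xml) {
        return root(xml).getTextContent();
    }

    public static void main(String[] args) {
        System.out.println(childElements(PARSED) + " | " + text(PARSED)); // 1 | hi & bye
        System.out.println(childElements(CDATA) + " | " + text(CDATA));   // 0 | <b>hi</b> &amp; bye
    }
}
```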
1)XML Element: An XML element is everything from (including)
the element's start tag to (including) the element's end tag including
text data.
Example:
<employee>
<empno>16</empno>
<name>Goutham</name>
<salary>45000</salary>
</employee>
An element can contain:
• Child elements
• attributes
• Text-data
• or a mix of all of the above...
2) XML attributes:
• Attributes provide additional information about an element.
• XML attribute values must be quoted.
Ex: <city state="ap">
<employee empno="1319" name="Sam">
Rules to define attributes:
• Attribute values must be enclosed in either single or double quotes.
• Attribute names must be unique within one element; duplicate attributes are not allowed.
• The order of attributes in an element is not important.
XML elements vs attributes
gender is an element:
<person>
<gender>female</gender>
<firstname>Anna</firstname>
<lastname>Smith</lastname>
</person>
gender is an attribute:
<person gender="female">
<firstname>Anna</firstname>
<lastname>Smith</lastname>
</person>
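A program reads the two forms through different DOM calls. The following is a minimal sketch using Java's built-in DOM parser. The class name `GenderReader` and the "attribute first, then child element" lookup order are illustrative assumptions. The sketch accepts both versions of the person document above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class GenderReader {

    public static final String AS_ELEMENT =
        "<person><gender>female</gender><firstname>Anna</firstname><lastname>Smith</lastname></person>";
    public static final String AS_ATTRIBUTE =
        "<person gender=\"female\"><firstname>Anna</firstname><lastname>Smith</lastname></person>";

    static Element root(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Looks for gender as an attribute first, then as a child element.
    public static String gender(String xml) {
        Element person = root(xml);
        if (person.hasAttribute("gender")) {
            return person.getAttribute("gender");
        }
        NodeList children = person.getElementsByTagName("gender");
        return children.getLength() > 0 ? children.item(0).getTextContent() : null;
    }

    public static void main(String[] args) {
        System.out.println(gender(AS_ELEMENT));   // female
        System.out.println(gender(AS_ATTRIBUTE)); // female
    }
}
```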
Entity reference: Some characters have a special meaning in XML.
<person>
<name>Ramesh</name>
<age>age is <18</age>
</person>
• In the above example, the less-than symbol has a special meaning.
• "<" marks the start of a tag.
• With an entity reference, we write "&lt;" for the less-than symbol.
<person>
<name>Ramesh</name>
<age>age is &lt;18</age>
</person>
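The fix above can be automated when a program writes XML: escape the text before it goes between tags, and the parser turns the entity back into the character. The following is a minimal sketch in Java. The helper `escape` is hypothetical and is not a JDK API. It covers the five predefined entities `&amp;`, `&lt;`, `&gt;`, `&quot;` and `&apos;`.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

public class EntityEscaper {

    // Replaces the five characters that have predefined entities.
    // '&' goes first so the '&' inserted by later replacements is not escaped again.
    public static String escape(String text) {
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;")
                   .replace("\"", "&quot;")
                   .replace("'", "&apos;");
    }

    // Parses a one-element document; the parser turns entity references back into characters.
    public static String parseText(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<age>" + escape("age is <18") + "</age>";
        System.out.println(xml);            // <age>age is &lt;18</age>
        System.out.println(parseText(xml)); // age is <18
    }
}
```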
User-defined entity reference:
Syntax:
&entity-name;
XML tree:
• XML documents form a tree structure that starts at "the root"
and branches to "the leaves".
• An XML document contains exactly one top-level element.
• That single element is called the root element.
<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
<to> Rakesh</to>
<from> Jani </from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>

note - root element
to, from, heading, body - child elements
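The root-and-branches structure is exactly what a DOM parser builds. The following is a minimal sketch in Java. The class name `NoteTree` is hypothetical. The sketch parses the note document and walks the tree from the root, and it skips text nodes so that only the element branches are printed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class NoteTree {

    public static final String XML =
          "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
        + "<note><to> Rakesh</to><from> Jani </from><heading>Reminder</heading>"
        + "<body>Don't forget me this weekend!</body></note>";

    static Element root(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static String rootName(String xml) {
        return root(xml).getTagName();
    }

    // Names of the element children of the root, in document order.
    public static List<String> childNames(String xml) {
        List<String> names = new ArrayList<>();
        for (Node n = root(xml).getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                names.add(n.getNodeName());
            }
        }
        return names;
    }

    // Prints every element, indented two spaces per level below the root.
    static void print(Node node, String indent) {
        if (node.getNodeType() != Node.ELEMENT_NODE) {
            return; // skip text nodes; only elements form the branches here
        }
        System.out.println(indent + node.getNodeName());
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            print(c, indent + "  ");
        }
    }

    public static void main(String[] args) {
        print(root(XML), "");
    }
}
```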
Well Formed XML Documents:
A "well-formed" XML document has correct XML syntax.
• An XML document should begin with a prolog. The XML declaration is optional, but if it is present it must come first.
• An XML document must have exactly one root element.
• XML elements must have a closing tag for every opening tag.
• XML tags are case sensitive.
• XML elements must be properly nested.
• XML attribute values must be quoted.
• Special characters such as < and & must be written as entity references.
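A non-validating parser is enough to check the rules above, because it rejects any document that is not well-formed. The following is a minimal sketch in Java. The class name `WellFormedCheck` and the sample strings are illustrative. Passing `DefaultHandler` as the error handler makes fatal errors throw quietly instead of being printed to the console.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormedCheck {

    // True if the text is well-formed XML. No DTD validation is done here.
    public static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal (well-formedness) errors instead of printing them.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "<b><i>bold and italic</i></b>",   // properly nested
            "<b><i>bold and italic</b></i>",   // improper nesting
            "<name>Praveen</Name>",            // case mismatch
            "<city state=ap/>",                // unquoted attribute value
            "<a/><b/>",                        // two root elements
            "<age>age is <18</age>"            // raw '<' in text
        };
        for (String s : samples) {
            System.out.println(isWellFormed(s) + "  " + s);
        }
    }
}
```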
Document Type Definition (DTD):- DTD is an XML technique to define the structure of an XML document.
• XML elements, attributes and entity references are defined inside the DTD.
• An external DTD is a text-based document with a .dtd extension.
• DTD contains
1. Elements declaration
2. Attribute declaration
3. Entity references
4. PCDATA
5. CDATA
Why Use a DTD?
1. Each of our XML files can carry a description of its own format.
2. Independent groups of people can agree to use a standard DTD for interchanging data.
3. An application can use a standard DTD to verify that the data we receive from the outside world is valid.
4. We can also use a DTD to verify our own data.

DTD declaration syntax: every DTD component (element, attribute or entity declaration) is written inside the markup delimiters:
• <! ... >

<!ELEMENT element-name (content-model)>
Example:
• <!ELEMENT employee (empno, empname, sal)>
• <!ELEMENT empno (#PCDATA)>
• <!ELEMENT empname (#PCDATA)>
• <!ELEMENT sal (#PCDATA)>
The Building Blocks of XML Documents:- All XML documents (and HTML documents) are made up of the following building blocks:
1. Elements
2. Attributes
3. Entities
4. PCDATA
5. CDATA
1.DTD ELEMENTS: XML elements can be defined as building blocks
of an XML document. Elements can behave as a container to hold text,
elements, attributes, media objects or mix of all.
A DTD element is declared with an ELEMENT declaration. When an
XML file is validated by DTD, parser initially checks for the root
element and then the child elements are validated.
Syntax:- <!ELEMENT elementname (content)>
From the above syntax
• The ELEMENT keyword tells the parser that the user is about to define an element.
• elementname is the element name (also called the generic identifier) defined by the user.
• content defines what content (if any) can go within the element.
Element Content Types:- Content of elements declaration in a DTD
divided into following types
i. Empty content
ii. Element content/Elements with Parsed Character Data
iii. Mixed content
iv. Any content
i. Empty Content: In the empty content type of element declaration, the element cannot contain any content. These elements are declared with the keyword EMPTY.
Syntax:
<!ELEMENT element-name EMPTY>
Eg:
<?xml version="1.0"?>
<!DOCTYPE user [
<!ELEMENT user (name)>
<!ELEMENT name EMPTY>
]>
<user><name/></user>
ii. Element Content: In this declaration, the content is the list of allowed child elements within parentheses. The user can include more than one element, separated by commas.
Syntax:- <!ELEMENT elementname (child1, child2...)>
• ELEMENT is the element declaration tag.
• elementname is the name of the element.
• child1, child2.. are the elements, and each element must have its own definition within the DTD.
Eg:- <!DOCTYPE address [
<!ELEMENT address (name,company,phone_no)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT company (#PCDATA)>
<!ELEMENT phone_no (#PCDATA)> ]>
<address>
<name>Mahender</name>
<company>GCET</company>
<phone_no>99120XXXX</phone_no>
</address>
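A validating parser checks the address document against its internal DTD. The following is a minimal sketch that assumes Java's built-in DOM parser with `setValidating(true)`. The class name `AddressDtdCheck` is hypothetical. By default the parser only reports validity errors to an error handler and carries on, so the handler here rethrows them to make an invalid document fail.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class AddressDtdCheck {

    static final String DTD =
          "<!DOCTYPE address [\n"
        + "<!ELEMENT address (name,company,phone_no)>\n"
        + "<!ELEMENT name (#PCDATA)>\n"
        + "<!ELEMENT company (#PCDATA)>\n"
        + "<!ELEMENT phone_no (#PCDATA)>\n"
        + "]>\n";

    public static final String IN_ORDER = DTD
        + "<address><name>Mahender</name><company>GCET</company><phone_no>99120XXXX</phone_no></address>";
    public static final String WRONG_ORDER = DTD
        + "<address><company>GCET</company><name>Mahender</name><phone_no>99120XXXX</phone_no></address>";
    public static final String MISSING_CHILD = DTD
        + "<address><name>Mahender</name><company>GCET</company></address>";

    // Parses with DTD validation on; validity errors are rethrown so they fail the parse.
    public static boolean isValid(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) throws SAXException {
                    throw e;
                }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("in order:      " + isValid(IN_ORDER));      // true
        System.out.println("wrong order:   " + isValid(WRONG_ORDER));   // false
        System.out.println("missing child: " + isValid(MISSING_CHILD)); // false
    }
}
```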
Note:- From the above example, address is the parent element and
name, company and phone_no are its child elements.
List of Operators and Syntax Rules:- The following list shows the operators and syntax rules that can be applied when defining child elements.
Operator +
  Syntax: <!ELEMENT element-name (child1+)>
  Description: the child element can occur one or more times inside the parent element.
  Example: <!ELEMENT subjects (sub_name+)>; child element sub_name can occur one or more times inside the element subjects.
Operator *
  Syntax: <!ELEMENT element-name (child1*)>
  Description: the child element can occur zero or more times inside the parent element.
  Example: <!ELEMENT subjects (sub_name*)>; child element sub_name can occur zero or more times inside the element subjects.
Operator ?
  Syntax: <!ELEMENT element-name (child1?)>
  Description: the child element can occur zero or one time inside the parent element.
  Example: <!ELEMENT address (name?)>; child element name can occur zero or one time inside the element address.
Operator , (comma)
  Syntax: <!ELEMENT element-name (child1, child2)>
  Description: gives a sequence of child elements, separated by commas, which must be included in element-name.
  Example: <!ELEMENT address (name, company)>; the child elements name and company must occur in this order inside the element address.
Operator | (pipe)
  Syntax: <!ELEMENT element-name (child1 | child2)>
  Description: allows a choice between child elements.
  Example: <!ELEMENT address (name | company)>; either of the child elements, name or company, must occur inside the element address.
Rules:- These rules are used when an element has more than one child element in its content.
i) Sequences − the elements within DTD documents must appear in a distinct order. The user can define the content using a sequence. The declaration below indicates that the <address> element must have exactly three children, <name>, <company> and <phone>, and that they must appear in this order.
Eg:- <!ELEMENT address (name,company,phone)>

ii) Choices − Use a choice when one element or another is allowed, but not both. In such cases we use the pipe (|) character, which functions as an exclusive OR.
Eg:- <!ELEMENT address (mobile | land_no)>
<!ELEMENT book (book-name | title)>
<!ELEMENT book-name (#PCDATA)>
<!ELEMENT title (#PCDATA)>
=====================================
<book>
<title>WebTechnologies</title>
</book>
=====================================
<book>
<book-name>WT</book-name>
</book>
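The exclusive OR behaviour of `|` can be observed with a validating parser. The following is a minimal sketch that assumes Java's built-in DOM parser with DTD validation turned on. The class name `BookChoiceCheck` is hypothetical. Each of the two book documents above validates, while a book with both children or with neither child is rejected.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class BookChoiceCheck {

    static final String DTD =
          "<!DOCTYPE book [\n"
        + "<!ELEMENT book (book-name | title)>\n"
        + "<!ELEMENT book-name (#PCDATA)>\n"
        + "<!ELEMENT title (#PCDATA)>\n"
        + "]>\n";

    public static final String TITLE_ONLY = DTD + "<book><title>WebTechnologies</title></book>";
    public static final String NAME_ONLY = DTD + "<book><book-name>WT</book-name></book>";
    public static final String BOTH = DTD + "<book><book-name>WT</book-name><title>WebTechnologies</title></book>";
    public static final String NEITHER = DTD + "<book></book>";

    // Parses with DTD validation on; validity errors are rethrown so they fail the parse.
    public static boolean isValid(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) throws SAXException {
                    throw e;
                }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("title only: " + isValid(TITLE_ONLY)); // true
        System.out.println("name only:  " + isValid(NAME_ONLY));  // true
        System.out.println("both:       " + isValid(BOTH));       // false
        System.out.println("neither:    " + isValid(NEITHER));    // false
    }
}
```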
a. Declaring Only One Occurrence of an Element :-

<!ELEMENT element-name (child-name)>


Example:<!ELEMENT note (message)>

The example above declares that the child element "message" must occur once,
and only once inside the "note" element.

b. Declaring Minimum One Occurrence of an Element:-

<!ELEMENT element-name (child-name+)>

Example:<!ELEMENT note (message+)>

The + sign in the example above declares that the child element "message" must
occur one or more times inside the "note" element.
c. Declaring Zero or More Occurrences of an Element

<!ELEMENT element-name (child-name*)>

Example:<!ELEMENT note (message*)>

The * sign in the example above declares that the child element "message" can
occur zero or more times inside the "note" element.

d. Declaring Zero or One Occurrence of an Element

<!ELEMENT element-name (child-name?)>


Example:<!ELEMENT note (message?)>

The ? sign in the example above declares that the child element "message" can
occur zero or one time inside the "note" element.
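The four occurrence rules (a to d) can be compared side by side. The following is a minimal sketch that assumes Java's built-in validating DOM parser. The class name `OccurrenceCheck` and the helper `note(operator, count)` are hypothetical. The helper builds the note document with a given operator after `message` and a given number of `message` children.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class OccurrenceCheck {

    // Builds a note document whose DTD uses the given operator ("", "+", "*" or "?")
    // and which contains `count` message children.
    public static String note(String operator, int count) {
        StringBuilder sb = new StringBuilder()
            .append("<!DOCTYPE note [\n")
            .append("<!ELEMENT note (message").append(operator).append(")>\n")
            .append("<!ELEMENT message (#PCDATA)>\n")
            .append("]>\n<note>");
        for (int i = 0; i < count; i++) {
            sb.append("<message>m").append(i).append("</message>");
        }
        return sb.append("</note>").toString();
    }

    // Parses with DTD validation on; validity errors are rethrown so they fail the parse.
    public static boolean isValid(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) throws SAXException {
                    throw e;
                }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (String op : new String[] {"", "+", "*", "?"}) {
            for (int count = 0; count <= 2; count++) {
                System.out.println("(message" + op + ") with " + count + " message(s): "
                        + (isValid(note(op, count)) ? "valid" : "invalid"));
            }
        }
    }
}
```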
iii) Mixed Element Content:- Mixed content is a combination of (#PCDATA) and child elements. Within mixed content models, text can appear by itself or it can be interspersed between elements. The rules for mixed content models are similar to those for element content.
Syntax:- <!ELEMENT elementname (#PCDATA|child1|child2)*>
From the above syntax
• ELEMENT is the element declaration tag.
• elementname is the name of the element.
• PCDATA is the text that is not markup. #PCDATA must come first in the mixed content declaration.
• child1, child2.. are the elements, and each element must have its own definition within the DTD.
• The operator (*) must follow the mixed content declaration if child elements are included.
• The (#PCDATA) and child element declarations must be separated by the (|) operator.
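A mixed content model follows the rules above: #PCDATA comes first, children are separated by `|`, and the group ends with `*`. The following is a minimal sketch that checks such a model with Java's built-in validating DOM parser. The element names `para`, `b`, `i` and `u` and the class name `MixedContentCheck` are hypothetical. Text mixed with declared children validates, but an undeclared child does not.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class MixedContentCheck {

    // Mixed content: #PCDATA first, children separated by |, and the group ends with *.
    static final String DTD =
          "<!DOCTYPE para [\n"
        + "<!ELEMENT para (#PCDATA|b|i)*>\n"
        + "<!ELEMENT b (#PCDATA)>\n"
        + "<!ELEMENT i (#PCDATA)>\n"
        + "]>\n";

    public static final String MIXED = DTD + "<para>Some <b>bold</b> and <i>italic</i> text</para>";
    public static final String TEXT_ONLY = DTD + "<para>just text</para>";
    public static final String UNDECLARED = DTD + "<para>text <u>x</u></para>";

    // Parses with DTD validation on; validity errors are rethrown so they fail the parse.
    public static boolean isValid(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) throws SAXException {
                    throw e;
                }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("mixed:      " + isValid(MIXED));      // true
        System.out.println("text only:  " + isValid(TEXT_ONLY));  // true
        System.out.println("undeclared: " + isValid(UNDECLARED)); // false
    }
}
```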
iv. Elements with Any Content:- Elements declared with the category keyword ANY can contain any combination of parsable data:
Syntax:-
<!ELEMENT element-name ANY>
From the above syntax ANY keyword indicates that text (PCDATA) and/or
any elements declared within the DTD can be used within the content of the
<elementname> element. They can be used in any order any number of times.
Example:
<!DOCTYPE address [ <!ELEMENT address ANY> ]>
<address> Here's a bit of sample text </address>
2. DTD – Attributes: An attribute gives more information about an element; more precisely, it defines a property of an element. An XML attribute is always in the form of a name-value pair.
• An element can have any number of unique attributes.
• Attribute declarations are similar to element declarations in many ways except one: instead of declaring allowable content for an element, you declare a list of allowable attributes for each element.
• These lists are called ATTLIST declarations.
Syntax:
<!ATTLIST element-name attribute-name attribute-type attribute-value>

In the above syntax:
• DTD attribute declarations start with the <!ATTLIST keyword.
• element-name specifies the name of the element to which the attribute applies.
• attribute-name specifies the name of the attribute which is included with the element-name.
• attribute-type defines the type of the attribute.
• attribute-value is the default declaration: a default value, #REQUIRED, #IMPLIED or #FIXED "value".
Example:
<?xml version = "1.0"?>
<!DOCTYPE address [ <!ELEMENT address ( name )>
<!ELEMENT name ( #PCDATA )>
<!ATTLIST name id CDATA #REQUIRED> ]>
<address>
<name id = "1216">Ramesh</name>
</address>
• Rules of Attribute Declaration
• All attributes used in an XML document must be declared in the
Document Type Definition (DTD) using an Attribute-List Declaration
• Attributes may only appear in start or empty tags.
• The keyword ATTLIST must be in upper case
• No duplicate attribute names will be allowed within the attribute
list for a given element.
Attribute Value Declaration:- Within each attribute declaration,
user must specify how the value will appear in the document. You can
specify if an attribute −
• can have a default value
• can have a fixed value
• is required
• is implied
• Default Values: The declaration contains the default value. The value can be enclosed in single quotes (') or double quotes (").
• Syntax:
<!ATTLIST element-name attribute-name attribute-type "default-value">
• where default-value is the attribute value defined.
A Default Attribute Value Example:
DTD:
<!ELEMENT square EMPTY>
<!ATTLIST square width CDATA "0">

Valid XML: <square width="100" />

In the example above, the "square" element is defined as an empty element with a "width" attribute of type CDATA. If no width is specified, it has a default value of 0.
• The default-value can be one of the following:

Value Explanation
====================================
value The default value of the attribute
#REQUIRED The attribute is required
#IMPLIED The attribute is not required
#FIXED value The attribute value is fixed
#REQUIRED:-

Syntax:- <!ATTLIST element-name attribute-name attribute-type #REQUIRED>


Eg:-

DTD: <!ATTLIST person number CDATA #REQUIRED>


Valid XML: <person number="5677" />
Invalid XML: <person />

Use the #REQUIRED keyword if you don't have an option for a default value, but
still want to force the attribute to be present.
#IMPLIED

Syntax

<!ATTLIST element-name attribute-name attribute-type #IMPLIED>

Example
DTD: <!ATTLIST contact fax CDATA #IMPLIED>
Valid XML: <contact fax="555-667788" />
Valid XML: <contact />

Use the #IMPLIED keyword if you don't want to force the author to
include an attribute, and you don't have an option for a default value.
#FIXED

Syntax
<!ATTLIST element-name attribute-name attribute-type #FIXED "value">
Example

DTD: <!ATTLIST sender company CDATA #FIXED "Microsoft">


Valid XML: <sender company="Microsoft" />
Invalid XML:<sender company="W3Schools" />

Use the #FIXED keyword when you want an attribute to have a fixed
value without allowing the author to change it. If an author includes
another value, the XML parser will return an error.
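The four kinds of attribute value declaration (a default value, #REQUIRED, #IMPLIED and #FIXED) can all be observed from a program. The following is a minimal sketch that assumes Java's built-in validating DOM parser and reuses the square, person, contact and sender examples above. The class name `AttributeDefaults` and the convention "null means the document is invalid" are illustrative assumptions. The parser supplies default and fixed values to the program even when the author omits the attribute.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class AttributeDefaults {

    public static final String SQUARE =
        "<!DOCTYPE square [<!ELEMENT square EMPTY><!ATTLIST square width CDATA \"0\">]><square/>";
    public static final String PERSON_MISSING =
        "<!DOCTYPE person [<!ELEMENT person EMPTY><!ATTLIST person number CDATA #REQUIRED>]><person/>";
    public static final String CONTACT_NO_FAX =
        "<!DOCTYPE contact [<!ELEMENT contact EMPTY><!ATTLIST contact fax CDATA #IMPLIED>]><contact/>";
    static final String SENDER_DTD =
        "<!DOCTYPE sender [<!ELEMENT sender EMPTY><!ATTLIST sender company CDATA #FIXED \"Microsoft\">]>";
    public static final String SENDER_WRONG = SENDER_DTD + "<sender company=\"W3Schools\"/>";
    public static final String SENDER_OMITTED = SENDER_DTD + "<sender/>";

    // Validating parse; returns null when the document breaks its DTD.
    static Document parseValid(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) throws SAXException {
                    throw e; // make validity errors fail the parse
                }
            });
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            return null;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Attribute value as the program sees it ("" if absent), or null if the document is invalid.
    public static String attribute(String xml, String name) {
        Document doc = parseValid(xml);
        return doc == null ? null : doc.getDocumentElement().getAttribute(name);
    }

    public static void main(String[] args) {
        System.out.println(attribute(SQUARE, "width"));           // 0 (default supplied)
        System.out.println(attribute(PERSON_MISSING, "number"));  // null (#REQUIRED missing)
        System.out.println(attribute(CONTACT_NO_FAX, "fax"));     // empty line (#IMPLIED, absent)
        System.out.println(attribute(SENDER_WRONG, "company"));   // null (#FIXED violated)
        System.out.println(attribute(SENDER_OMITTED, "company")); // Microsoft
    }
}
```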
• Attribute Types
• When declaring attributes, you can specify how the processor
should handle the data that appears in the value.
• We can categorize attribute types in three main categories −
• String type
• Tokenized types
• Enumerated types
1. Enumerated Attribute Values:- These are used to specify a list of values. The attribute accepts any one value from the specified list.

Syntax:-

<!ATTLIST element-name attribute-name (en1|en2|..) default-value>


Example DTD:
<!ATTLIST payment type (check|cash) "cash">

XML example: <payment type="check" /> or


<payment type="cash" />

Use enumerated attribute values when you want the attribute value to be one of a
fixed set of legal values.
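The payment example can be checked the same way. The following is a minimal sketch that assumes Java's built-in validating DOM parser. The class name `PaymentType` and the value "card" are hypothetical. A listed value is accepted, an omitted attribute takes the declared default "cash", and a value outside the list makes the document invalid.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class PaymentType {

    static final String DTD =
        "<!DOCTYPE payment [<!ELEMENT payment EMPTY>"
      + "<!ATTLIST payment type (check|cash) \"cash\">]>";

    public static final String CHECK = DTD + "<payment type=\"check\"/>";
    public static final String OMITTED = DTD + "<payment/>";
    public static final String CARD = DTD + "<payment type=\"card\"/>";

    // Validating parse; returns null when the document breaks its DTD.
    static Document parseValid(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) throws SAXException {
                    throw e; // make validity errors fail the parse
                }
            });
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            return null;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // The payment type seen by the program, or null if the value is not in the list.
    public static String typeOf(String xml) {
        Document doc = parseValid(xml);
        return doc == null ? null : doc.getDocumentElement().getAttribute("type");
    }

    public static void main(String[] args) {
        System.out.println(typeOf(CHECK));   // check
        System.out.println(typeOf(OMITTED)); // cash
        System.out.println(typeOf(CARD));    // null
    }
}
```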
EG:-att1.xml
2. ID Attribute Values:- An ID value must be unique within the document. It must be a valid XML name, so it starts with _ or A-Z or a-z and must not start with a digit.

Eg:-att2.xml
3. IDREF Attribute Values:-
example-1
<!DOCTYPE bookstore [
<!ELEMENT bookstore (topic+)>
<!ELEMENT topic (name,book*)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT book (title,author)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT isbn (#PCDATA)>
<!ATTLIST book isbn CDATA "0">
]>
<?xml version="1.0"?>
<!DOCTYPE bookstore SYSTEM "http://webserver/bookstore.dtd">
<bookstore>
<topic>
<name>XML</name>
<book isbn="123-456-789">
<title>Mike's Guide To DTD's and XML Schemas</title>
<author>Mike Jervis</author>
</book>
</topic>
</bookstore>
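The ID and IDREF rules are easier to see with an example that actually uses both types. The following is a minimal sketch that assumes Java's built-in validating DOM parser. The `library`/`book`/`loan` vocabulary and the class name `IdRefCheck` are hypothetical and do not come from the notes. An IDREF must point at an existing ID, and an ID must be unique and must not start with a digit. After validation, `getElementById` can find an element by its DTD-declared ID.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class IdRefCheck {

    static final String DTD =
          "<!DOCTYPE library [\n"
        + "<!ELEMENT library (book*, loan*)>\n"
        + "<!ELEMENT book (#PCDATA)>\n"
        + "<!ATTLIST book id ID #REQUIRED>\n"
        + "<!ELEMENT loan EMPTY>\n"
        + "<!ATTLIST loan bookref IDREF #REQUIRED>\n"
        + "]>\n";

    public static final String GOOD = DTD
        + "<library><book id=\"b1\">XML</book><loan bookref=\"b1\"/></library>";
    public static final String DANGLING_REF = DTD
        + "<library><book id=\"b1\">XML</book><loan bookref=\"b2\"/></library>";
    public static final String DIGIT_ID = DTD + "<library><book id=\"1\">XML</book></library>";
    public static final String DUPLICATE_ID = DTD
        + "<library><book id=\"b1\">XML</book><book id=\"b1\">DTD</book></library>";

    // Validating parse; returns null when the document breaks its DTD.
    static Document parseValid(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) throws SAXException {
                    throw e; // make validity errors fail the parse
                }
            });
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            return null;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static boolean isValid(String xml) {
        return parseValid(xml) != null;
    }

    // Looks up a book through its ID attribute (declared as type ID in the DTD).
    public static String titleById(String xml, String id) {
        Document doc = parseValid(xml);
        if (doc == null) {
            return null;
        }
        Element book = doc.getElementById(id);
        return book == null ? null : book.getTextContent();
    }

    public static void main(String[] args) {
        System.out.println(isValid(GOOD));         // true
        System.out.println(isValid(DANGLING_REF)); // false
        System.out.println(isValid(DIGIT_ID));     // false
        System.out.println(isValid(DUPLICATE_ID)); // false
        System.out.println(titleById(GOOD, "b1")); // XML
    }
}
```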
TYPES OF DTD’S:
• 1.Internal DTD
• 2.External DTD
• Internal DTD:
• If the DTD is declared inside the XML file, it should be
wrapped in a DOCTYPE definition with the following
syntax:
<!DOCTYPE root-element [element-declarations]>
• An internal DTD is specific to one XML document.
• Internal DTDs are not reusable.
Example:
<!DOCTYPE employee [
<!ELEMENT employee (empno,empname,sal)>
<!ELEMENT empno (#PCDATA)>
<!ELEMENT empname (#PCDATA)>
<!ELEMENT sal (#PCDATA)>
]>
<employee>
<empno>1216</empno>
<empname>ram</empname>
<sal>34000</sal>
</employee>
External DTD Declaration: If the DTD is declared in an
external file, it should be wrapped in a DOCTYPE definition
with the following syntax:
<!DOCTYPE root-element SYSTEM/PUBLIC "url of dtd file">
• External DTD’s are two types:
a. Private DTD’s
b. Public DTD’s
a.Private DTD’s:
<!DOCTYPE root-element SYSTEM "url of dtd file">
SYSTEM – DTDs defined by individuals and organizations. It indicates private DTDs.
<!ELEMENT employee (empno,empname,sal)>
<!ELEMENT empno (#PCDATA)>
<!ELEMENT empname (#PCDATA)>
<!ELEMENT sal (#PCDATA)>
• Save the above file with a .dtd extension.
<!DOCTYPE employee SYSTEM "urlpath of DTD file">
<employee>
<empno>1216</empno>
<empname>ram</empname>
<sal>34000</sal>
</employee>
Save the above file with a .xml extension.
DTD - Document Type Definition
• What is DTD
• DTD stands for Document Type Definition. It defines the legal
building blocks of an XML document. It is used to define document
structure with a list of legal elements and attributes.
• Purpose of DTD
• Its main purpose is to define the structure of an XML document. It contains a list of legal elements and defines the structure with their help.
• Checking Validation
• Before proceeding with XML DTD, you must understand validation. An XML document is called "well-formed" if it has correct syntax.
• A well-formed and valid XML document is one that has also been validated against a DTD.
Valid and well-formed XML document with DTD
• Let's take an example of well-formed and valid XML document. It
follows all the rules of DTD.
• employee.xml:
<?xml version="1.0"?>
<!DOCTYPE employee SYSTEM "employee.dtd">
<employee>
<firstname>vimal</firstname>
<lastname>jaiswal</lastname>
<email>[email protected]</email>
</employee>
In the above example, the DOCTYPE declaration refers to an external DTD file.
employee.dtd:
<!ELEMENT employee (firstname,lastname,email)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT lastname (#PCDATA)>
<!ELEMENT email (#PCDATA)>
• Description of DTD:
• <!DOCTYPE employee : It defines that the root element of the document is
employee.
• <!ELEMENT employee: It defines that the employee element contains 3
elements "firstname, lastname and email".
• <!ELEMENT firstname: It defines that the firstname element is #PCDATA
typed. (parse-able data type).
• <!ELEMENT lastname: It defines that the lastname element is #PCDATA
typed. (parse-able data type).
• <!ELEMENT email: It defines that the email element is #PCDATA typed.
(parse-able data type).
DTD:
• The XML Document Type Declaration, commonly known as DTD, is a way to describe
XML language precisely. DTDs check vocabulary and validity of the structure of XML
documents against grammatical rules of appropriate XML language.
• An XML DTD can either be specified inside the document, or it can be kept in a separate document and then linked separately.
• Syntax
• Basic syntax of a DTD is as follows −
<!DOCTYPE element DTDidentifier
[ declaration1 declaration2 ........ ]>
In the above syntax,
• The DTD starts with <!DOCTYPE delimiter.
• An element tells the parser to parse the document from the specified root element.
• DTD identifier is an identifier for the document type definition, which may be the path
to a file on the system or URL to a file on the internet. If the DTD is pointing to
external path, it is called External Subset.
• The square brackets [ ] enclose an optional list of declarations (elements, attributes, entities, notations) called the Internal Subset.
• Internal DTD:
• A DTD is referred to as an internal DTD if elements are declared within the XML file. The standalone attribute in the XML declaration can be set to yes to indicate that the document does not depend on any external markup declarations.
• Syntax
• Following is the syntax of internal DTD −
<!DOCTYPE root-element [element-declarations]>
where root-element is the name of root element and element-declarations is where
you declare the elements.
• Example
• Following is a simple example of internal DTD −
Internal DTD Example:
<?xml version = "1.0" encoding = "UTF-8" standalone = "yes" ?>
<!DOCTYPE address [
<!ELEMENT address (name,company,phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT company (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
]>
<address> <name>Tanmay Patil</name>
<company>TutorialsPoint</company>
<phone>(011) 123-4567</phone>
</address>
• Let us go through the above code −
• Start Declaration − Begin the XML declaration with the following statement.
• <?xml version = "1.0" encoding = "UTF-8" standalone = "yes" ?>
• DTD − Immediately after the XML header,
the document type declaration follows, commonly referred to as the DOCTYPE
• <!DOCTYPE address [ The DOCTYPE declaration has an exclamation
mark (!) at the start of the element name.
• The DOCTYPE informs the parser that a DTD is associated with this XML
document.
• DTD Body − The DOCTYPE declaration is followed by body of the DTD,
where you declare elements, attributes, entities, and notations.
• End Declaration − Finally, the declaration section of the DTD is closed
using a closing bracket and a closing angle bracket (]>). This effectively
ends the definition, and thereafter, the XML document follows immediately.
• External DTD:
• In an external DTD, elements are declared outside the XML file. They are accessed by specifying a system identifier, which may be either a .dtd file or a valid URL.
• For a document that uses an external DTD, the standalone attribute in the XML declaration is set to no, meaning that the document includes information from an external source. If the attribute is omitted, no is assumed.
• Syntax
• Following is the syntax for external DTD −
• <!DOCTYPE root-element SYSTEM "file-name"> where file-name is the
file with .dtd extension.
• Example:
• The following example shows external DTD usage −
<?xml version = "1.0" encoding = "UTF-8" standalone = "no" ?>
<!DOCTYPE address SYSTEM "address.dtd">
<address>
<name>Tanmay Patil</name>
<company>TutorialsPoint</company>
<phone>(011) 123-4567</phone>
</address>
• The content of the DTD file address.dtd is as shown −
<!ELEMENT address (name,company,phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT company (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
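When a document uses `SYSTEM "address.dtd"`, the parser has to fetch that file. The following is a minimal sketch that assumes Java's built-in validating DOM parser. It uses an `EntityResolver` to serve the address.dtd text shown above from memory instead of from disk, so the example needs no file. The class name `ExternalDtdCheck` and the in-memory resolution are illustrative assumptions; a real application would normally place address.dtd next to the XML file.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class ExternalDtdCheck {

    // Stand-in for the contents of address.dtd (held in memory for this sketch).
    static final String ADDRESS_DTD =
          "<!ELEMENT address (name,company,phone)>\n"
        + "<!ELEMENT name (#PCDATA)>\n"
        + "<!ELEMENT company (#PCDATA)>\n"
        + "<!ELEMENT phone (#PCDATA)>\n";

    static final String PROLOG =
          "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n"
        + "<!DOCTYPE address SYSTEM \"address.dtd\">\n";

    public static final String COMPLETE = PROLOG
        + "<address><name>Tanmay Patil</name><company>TutorialsPoint</company>"
        + "<phone>(011) 123-4567</phone></address>";
    public static final String NO_PHONE = PROLOG
        + "<address><name>Tanmay Patil</name><company>TutorialsPoint</company></address>";

    public static boolean isValid(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Resolve the SYSTEM identifier "address.dtd" to the in-memory text; null means default lookup.
            builder.setEntityResolver((publicId, systemId) ->
                systemId != null && systemId.endsWith("address.dtd")
                    ? new InputSource(new StringReader(ADDRESS_DTD))
                    : null);
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) throws SAXException {
                    throw e; // make validity errors fail the parse
                }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("complete: " + isValid(COMPLETE)); // true
        System.out.println("no phone: " + isValid(NO_PHONE)); // false
    }
}
```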
EXTERNAL DTD
• System Identifiers
• A system identifier enables you to specify the location of an external file
containing DTD declarations. Syntax is as follows −
• <!DOCTYPE name SYSTEM "address.dtd" [...]> As you can see, it
contains keyword SYSTEM and a URI reference pointing to the location of
the document.
• Public Identifiers
• Public identifiers provide a mechanism to locate DTD resources and is
written as follows −
• <!DOCTYPE name PUBLIC "-//Beginning XML//DTD Address Example//EN" "address.dtd">
• As you can see, it begins with keyword PUBLIC, followed by a specialized
identifier. Public identifiers are used to identify an entry in a catalog. Public
identifiers can follow any format, however, a commonly used format is
called Formal Public Identifiers, or FPIs.
b. Public DTDs:- A DTD that is not specific to any particular project is called a "Public DTD".
• A DTD that is specific to a particular XML document is called an "Internal DTD".
• A DTD that is specific to a particular project is called a "Private DTD".
Example:
• Hibernate's DTDs are public.
• Spring Framework DTDs are public.
• Syntax:
<!DOCTYPE rootElement PUBLIC "-//vendorname//version//EN" "DTDfile">
• PUBLIC – DTDs defined by international organizations like W3C (standards).
3. DTD Entities: Entities are variables used to define shortcut names for text or special characters.
Syntax:
– General entity references start with & and end with ;
– The entity reference is replaced by its true value when parsed.
– The characters < > & " ' require entity references to avoid conflicts with the XML application (parser).
Types of Entity
1. Internal Entity: declared within DTD
syntax:-
<!ENTITY entity-name "entity-value">

Ex: <!ENTITY c "GCET">
Usage: <college> my college name is &c;</college>
• Note: An entity reference has three parts: an ampersand (&), an entity name, and a semicolon (;).
a. An Internal Entity Declaration
Syntax
• <!ENTITY entity-name "entity-value">
Eg:-
DTD Example:
• <!ENTITY writer "Donald Duck.">
• <!ENTITY copyright "Copyright W3Schools.">
• XML example:
• <author>&writer;&copyright;</author>
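A parser replaces each internal entity reference with the entity's value. The following is a minimal sketch that assumes Java's built-in DOM parser, which expands entity references by default (the call to `setExpandEntityReferences(true)` only makes this explicit). The class name `EntityExpansion` is hypothetical. The two documents are the writer/copyright example and the college example above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

public class EntityExpansion {

    public static final String AUTHOR =
          "<!DOCTYPE author [\n"
        + "<!ENTITY writer \"Donald Duck.\">\n"
        + "<!ENTITY copyright \"Copyright W3Schools.\">\n"
        + "]>\n"
        + "<author>&writer;&copyright;</author>";

    public static final String COLLEGE =
        "<!DOCTYPE college [<!ENTITY c \"GCET\">]><college>my college name is &c;</college>";

    // Returns the root element's text after the parser has expanded the entity references.
    public static String text(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setExpandEntityReferences(true); // already the default; shown explicitly
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(text(AUTHOR));  // Donald Duck.Copyright W3Schools.
        System.out.println(text(COLLEGE)); // my college name is GCET
    }
}
```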
2. External Entity: The entity text is kept in a different file and referred to in the XML file.
An External Entity Declaration
Syntax
<!ENTITY entity-name SYSTEM "URI/URL">
Example
DTD Example:
<!ENTITY writer SYSTEM "http://www.w3schools.com/entities.dtd">
or <!ENTITY writer SYSTEM "d:\test.txt">
<!ENTITY copyright SYSTEM "http://www.w3schools.com/entities.dtd">

XML example:
<author>&writer; &copyright;</author>
Disadvantages of DTD

1. DTD does not use XML syntax; it requires a separate syntax.
2. Namespaces are not supported.
3. There are no data types.
4. There is no modularity and no reuse of elements.
5. There is no inheritance for elements or attributes.
6. DTD is an older technique.
Namespace Declaration:- A namespace is a set of unique names. It is a mechanism by which element and attribute names can be assigned to a group. The namespace is identified by a URI (Uniform Resource Identifier).

XML Namespaces:- XML namespaces provide a method to avoid element name conflicts.
Name Conflicts:- In XML, element names are defined by the developer. This
often results in a conflict when trying to mix XML documents from different
XML applications.
This XML carries HTML table information:

This XML carries information about a table (a piece of furniture):

• If these XML fragments were added together, there would be a name
conflict. Both contain a <table> element, but the elements have different
content and meaning.
• A user or an XML application will not know how to handle these differences.

Namespace Declaration:- A namespace is declared using reserved attributes.
Such an attribute name must either be xmlns or begin with xmlns: (a plain
xmlns attribute declares the default namespace for unprefixed elements).
Syntax:- <element xmlns:name="URI">
The declaration starts with the keyword xmlns.
The word name is the namespace prefix.
The URI (Uniform Resource Identifier) is the namespace identifier, i.e. a
unique name for the namespace.
Uniform Resource Identifier (URI):- A Uniform Resource Identifier (URI) is a
string of characters which identifies an Internet Resource.
The most common URI is the Uniform Resource Locator (URL) which identifies an
Internet domain address. Another, not so common type of URI is the Uniform
Resource Name (URN).

Eg:- <Mahender xmlns:t="http://www.Mahenderwt/iiyear">


namespace.xml (xmldemo)
<?xml version="1.0" encoding="UTF-8"?>
<praveen xmlns:t="http://www.praveenwt/iiiyear"
xmlns:s="https://www.praveenwt/iiyear">

<t:wt>
<t:unit-1>html</t:unit-1>
<t:unit-2>CSS</t:unit-2>
</t:wt>

<s:wt>
<s:unit-1>introduction to internet</s:unit-1>
<s:unit-2>html</s:unit-2>
</s:wt>
</praveen>
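To see how the prefixes keep the two <wt> elements apart, here is a minimal Java sketch that parses the namespace.xml document with namespace awareness switched on and looks elements up by namespace URI plus local name. NamespaceDemo and unit1 are hypothetical names.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

class NamespaceDemo {
    // The namespace.xml document, compacted onto fewer lines
    static final String XML =
          "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        + "<praveen xmlns:t=\"http://www.praveenwt/iiiyear\"\n"
        + "         xmlns:s=\"https://www.praveenwt/iiyear\">\n"
        + "  <t:wt><t:unit-1>html</t:unit-1><t:unit-2>CSS</t:unit-2></t:wt>\n"
        + "  <s:wt><s:unit-1>introduction to internet</s:unit-1><s:unit-2>html</s:unit-2></s:wt>\n"
        + "</praveen>";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // off by default in DocumentBuilderFactory
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Text of the first <unit-1> element that belongs to the given namespace URI
    static String unit1(Document doc, String namespaceUri) {
        return doc.getElementsByTagNameNS(namespaceUri, "unit-1").item(0).getTextContent();
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        // "*" matches every namespace: both <wt> elements share the local name "wt"
        System.out.println(doc.getElementsByTagNameNS("*", "wt").getLength()); // 2
        System.out.println(unit1(doc, "http://www.praveenwt/iiiyear"));         // html
        System.out.println(unit1(doc, "https://www.praveenwt/iiyear"));         // introduction to internet
    }
}
```

The prefixes t and s are only local shorthands; the parser identifies each element by the URI they are bound to.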
XSD- XML Schema Definition:
1. XML Schema is an XML-based alternative to DTD.

2. An XML schema describes the structure of an XML document.

3. The XML Schema language is also referred to as XML Schema Definition (XSD).

XSDs are extensible, so they can accommodate future additions. XSD is richer and more
powerful than DTD.
What is an XML Schema?
The purpose of an XML Schema is to define the legal building
blocks of an XML document, just like a DTD.
An XML Schema:
1. defines elements that can appear in a document
2. defines attributes that can appear in a document
3. defines which elements are child elements
4. defines the order of child elements
5. defines the number of child elements
6. defines whether an element is empty or can include text
7. defines data types for elements and attributes
8. defines default and fixed values for elements and attributes
XSD Elements: In XSD there are two kinds of elements:
(i) Simple Element (ii) Complex Element
i. Simple Element: A simple element is an XML element that can
contain only text data. It cannot contain any other elements or
attributes.
Syntax:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="element-name" type="xs:datatype"/>
</xs:schema>
Example:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="EmpNo" type="xs:int"/>
<xs:element name="EmpName" type="xs:string"/>
</xs:schema>
Writing Simple XML Schema
Step 1: Write a simple schema file that defines the structure of the XML file
and save it with the .xsd extension.
Step 2: Write an XML document that follows the defined schema.
Step 3: Open the XML in a browser or XML editor.
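Step 3 relies on a browser or XML editor; as an alternative, the JDK's javax.xml.validation API can check a document against a schema from Java. This is a sketch under the assumption that both schema and document are held as strings; SimpleXsdCheck and isValid are hypothetical names, and the schema is the EmpNo/EmpName example above.

```java
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

class SimpleXsdCheck {
    // Step 1: the simple-element schema (EmpNo, EmpName)
    static final String XSD =
          "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xs:element name=\"EmpNo\" type=\"xs:int\"/>"
        + "<xs:element name=\"EmpName\" type=\"xs:string\"/>"
        + "</xs:schema>";

    // Compiles the schema, then validates one XML document against it
    static boolean isValid(String xsd, String xml) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
            try {
                schema.newValidator().validate(new StreamSource(new StringReader(xml)));
                return true;
            } catch (SAXException invalidDocument) {
                return false; // not well-formed, or breaks a rule in the schema
            }
        } catch (SAXException badSchema) {
            throw new IllegalArgumentException("The schema itself is invalid", badSchema);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Step 2: candidate documents; Step 3: validate them
        System.out.println(isValid(XSD, "<EmpNo>1216</EmpNo>"));    // true
        System.out.println(isValid(XSD, "<EmpNo>abc</EmpNo>"));     // false: not an xs:int
        System.out.println(isValid(XSD, "<EmpName>Sam</EmpName>")); // true
    }
}
```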
XSD - The <schema> Element: The <schema> element is the root
element of every XML Schema.
Syntax:- <xs:schema>
...
...
</xs:schema>
• The <schema> element may contain some attributes, such as the xmlns:xs namespace declaration and targetNamespace.
• Data Types in XSD:
• Primitive types (19): string, boolean, decimal, float, double,
duration, dateTime, time, date, gYearMonth, gYear, gMonthDay,
gDay, gMonth, hexBinary, base64Binary, anyURI, QName, NOTATION.
• Built-in derived data types: normalizedString, token, language,
NMTOKEN, NMTOKENS, Name, NCName, ID, IDREF, IDREFS, ENTITY,
ENTITIES, integer, nonPositiveInteger, negativeInteger, long, int,
short, byte, nonNegativeInteger, unsignedLong, unsignedInt,
unsignedShort, unsignedByte, positiveInteger.
Example XML:
<student>
<sname>sunny1</sname>
<rollno>5n5</rollno>
<marks>9.9</marks>
<mobileno>97045326</mobileno>
</student>
Matching XSD element declarations:
<xs:element type="xs:string" name="sname"/>
<xs:element type="xs:string" name="rollno"/>
<xs:element type="xs:float" name="marks"/>
<xs:element type="xs:int" name="mobileno"/>

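Default and Fixed Values for Simple Elements:- A default value is assigned automatically when no other value is specified; a fixed value is also assigned automatically, and no other value may be specified.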

<xs:element name="color" type="xs:string" default="red"/>

<xs:element name="color" type="xs:string" fixed="red"/>


Attribute declaration:-All attributes are declared as simple types.

Syntax
<xs:attribute name="xxx" type="yyy"/>

<lastname lang="EN">Smith</lastname>- XML


<xs:attribute name="lang" type="xs:string"/>-XSD

Default and fixed values for attributes


<xs:attribute name="lang" type="xs:string" default="EN"/>
<xs:attribute name="lang" type="xs:string" fixed="EN"/>
Restrictions on Content / Facets
• Restrictions are used to define acceptable values for XML
elements or attributes. Restrictions on XML elements are called
facets.
• When you define the data type of an element or attribute, you
restrict the content of that element or attribute.
• If an XML element is of type "xs:date" and contains a string like
"Hello World", the element will not validate.
<xs:element name="rollno">
<xs:simpleType>
<xs:restriction base="xs:integer">
<xs:minInclusive value="1"/>
<xs:maxInclusive value="70"/>
</xs:restriction>
</xs:simpleType>
</xs:element>

<xs:element name="car">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:enumeration value="Audi"/>
<xs:enumeration value="Golf"/>
<xs:enumeration value="BMW"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
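A hedged sketch of the two facet examples (rollno range and car enumeration) being enforced, using the JDK's javax.xml.validation API. FacetCheck and isValid are hypothetical names; the test values 25, 71 and Toyota are made up for illustration.

```java
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

class FacetCheck {
    // Both restricted simple types from the facet examples
    static final String XSD =
          "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xs:element name=\"rollno\"><xs:simpleType>"
        + "<xs:restriction base=\"xs:integer\">"
        + "<xs:minInclusive value=\"1\"/><xs:maxInclusive value=\"70\"/>"
        + "</xs:restriction></xs:simpleType></xs:element>"
        + "<xs:element name=\"car\"><xs:simpleType>"
        + "<xs:restriction base=\"xs:string\">"
        + "<xs:enumeration value=\"Audi\"/><xs:enumeration value=\"Golf\"/><xs:enumeration value=\"BMW\"/>"
        + "</xs:restriction></xs:simpleType></xs:element>"
        + "</xs:schema>";

    // Compiles the schema, then validates one XML document against it
    static boolean isValid(String xsd, String xml) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
            try {
                schema.newValidator().validate(new StreamSource(new StringReader(xml)));
                return true;
            } catch (SAXException invalidDocument) {
                return false; // a facet (or another schema rule) was violated
            }
        } catch (SAXException badSchema) {
            throw new IllegalArgumentException("The schema itself is invalid", badSchema);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(XSD, "<rollno>25</rollno>")); // true
        System.out.println(isValid(XSD, "<rollno>71</rollno>")); // false: above maxInclusive
        System.out.println(isValid(XSD, "<car>Audi</car>"));     // true
        System.out.println(isValid(XSD, "<car>Toyota</car>"));   // false: not in the enumeration
    }
}
```

Both bounds are inclusive, so 1 and 70 are accepted while 0 and 71 are rejected; enumeration values are compared case-sensitively.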
To create an XSD, follow these steps:-
1) Create a valid .xml file
eg:-
<cse-2D> - root element
<student> - child element
<sname>sunny</sname>
<rollno>5n4</rollno>
<marks>9</marks>
<mobileno>97045326</mobileno>
</student>
</cse-2D>
2) Create an XSD file with the .xsd extension
(i) The first line of the XSD is
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
......
</xs:schema>
The above fragment specifies that the elements and data types used in the
schema are defined in the http://www.w3.org/2001/XMLSchema namespace
and that these elements/data types should be prefixed with xs. This namespace
declaration is always required (the prefix name itself is a convention).
(ii) Create the root element, followed by its element type (simple/complex)
and content type (sequence/choice/all)
Eg:-<xs:element name="student" maxOccurs="unbounded" minOccurs="0">
<xs:complexType>
<xs:sequence>
---------------
--------------------------
--------------------
</xs:sequence>
</xs:complexType>
</xs:element>
EG:-

<xs:element name = 'class'>


<xs:complexType>
<xs:sequence>
<xs:element name = 'student' type = 'StudentType' minOccurs = '0'
maxOccurs = 'unbounded' />
</xs:sequence>
</xs:complexType>
</xs:element>
XSD Name Space concept:
• In Java, the package concept is used to group classes as a single
unit.
• When a class is in a package, we can refer to it by its fully qualified name.
• Fully qualified name: package name + class name.
• Using fully qualified names avoids ambiguity
problems.
• Ex:
class Test {
java.util.Date d1;
java.sql.Date d2;
}
• Similarly, XSD has a namespace concept.
• Namespaces are used to group all the elements as a single unit.
• When an element is used, its fully qualified name is used.
• Fully qualified name: namespace + element name
• In Java: package packagename;
• In XSD: the targetNamespace attribute
• The targetNamespace attribute is written within <schema>
• Ex:
<schema targetNamespace="http://preminfo.com">
(create elements here)
</schema>
• The elements created inside the schema belong to that namespace.
(elementFormDefault="qualified" places the child elements in the target
namespace as well.)
<schema xmlns="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://preminfo.com"
elementFormDefault="qualified">
<element name="employee">
<complexType>
<sequence>
<element name="empno" type="int"/>
<element name="empname" type="string"/>
<element name="salary" type="decimal"/>
</sequence>
</complexType>
</element>
</schema>

---------------------------------------------------------------------
Conceptually, every element would then carry its fully qualified name
(an illustration only, not legal XML syntax):
<http://preminfo.com:employee>
<http://preminfo.com:empno>1216</http://preminfo.com:empno>
<http://preminfo.com:empname>Ram</http://preminfo.com:empname>
<http://preminfo.com:salary>45000</http://preminfo.com:salary>
</http://preminfo.com:employee>
Writing the fully qualified name on every element is a burden; to overcome
this problem we use xmlns.
• <employee xmlns="http://preminfo.com">
• <empno>1216</empno>
• <empname>Ram</empname>
• <salary>45000</salary>
• </employee>
---------------------------------------------------------------
• It is also possible to create a prefix with xmlns:
• <e:employee xmlns:e="http://preminfo.com">
• <e:empno>1216</e:empno>
• <e:empname>Ram</e:empname>
• <e:salary>45000</e:salary>
• </e:employee>
• An XSD can declare only one targetNamespace.
• An XML document can contain any number of xmlns declarations.
• The XSD vocabulary itself (schema, complexType, sequence, etc.) belongs to
the namespace http://www.w3.org/2001/XMLSchema.
• xmlns declarations can also be used inside an XSD (for example xmlns:xs).
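The following Java sketch checks that elements are matched by namespace URI rather than by name alone. It assumes the employee schema written with the xs prefix and elementFormDefault="qualified" (so the child elements are in the target namespace too); TargetNamespaceCheck and isValid are hypothetical names.

```java
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

class TargetNamespaceCheck {
    static final String XSD =
          "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\""
        + " targetNamespace=\"http://preminfo.com\" elementFormDefault=\"qualified\">"
        + "<xs:element name=\"employee\"><xs:complexType><xs:sequence>"
        + "<xs:element name=\"empno\" type=\"xs:int\"/>"
        + "<xs:element name=\"empname\" type=\"xs:string\"/>"
        + "<xs:element name=\"salary\" type=\"xs:decimal\"/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    // Default namespace: unprefixed elements belong to http://preminfo.com
    static final String DEFAULT_NS = "<employee xmlns=\"http://preminfo.com\">"
        + "<empno>1216</empno><empname>Ram</empname><salary>45000</salary></employee>";
    // Prefixed form: e is bound to the same URI
    static final String PREFIXED = "<e:employee xmlns:e=\"http://preminfo.com\">"
        + "<e:empno>1216</e:empno><e:empname>Ram</e:empname><e:salary>45000</e:salary></e:employee>";
    // No namespace at all: same local names, different (empty) namespace
    static final String NO_NS =
        "<employee><empno>1216</empno><empname>Ram</empname><salary>45000</salary></employee>";

    static boolean isValid(String xsd, String xml) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
            try {
                schema.newValidator().validate(new StreamSource(new StringReader(xml)));
                return true;
            } catch (SAXException invalidDocument) {
                return false;
            }
        } catch (SAXException badSchema) {
            throw new IllegalArgumentException("The schema itself is invalid", badSchema);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(XSD, DEFAULT_NS)); // true
        System.out.println(isValid(XSD, PREFIXED));   // true
        System.out.println(isValid(XSD, NO_NS));      // false: employee is not in http://preminfo.com
    }
}
```

The default-namespace and prefixed documents are equivalent to the validator, because only the URI (not the prefix) forms part of the element's name.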

Complex Element: A complex element contains other elements and/or
attributes.
We can define a complex element in an XML Schema in two different ways:
1. Complex with child elements
2. The element can have a type attribute that refers to the name of the
complex type to use
1.Complex with child element:
Syntax:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="element-name">
<xs:complexType>
<xs:sequence>
<xs:element name="child1" type="xs:datatype"/>
<xs:element name="child2" type="xs:datatype"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
Example XSD:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="employee">
<xs:complexType>
<xs:sequence>
<xs:element name="EmpNo" type="xs:int"/>
<xs:element name="EmpName" type="xs:string"/>
<xs:element name="EmpSalary" type="xs:decimal"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
XML for above XSD:
<employee>
<EmpNo>1216</EmpNo>
<EmpName>Sam</EmpName>
<EmpSalary>35000</EmpSalary>
</employee>
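A minimal sketch validating the employee example with the JDK's javax.xml.validation API, showing that xs:sequence enforces element order and that EmpSalary must be a decimal. ComplexTypeCheck and isValid are hypothetical names; the WRONG_ORDER and BAD_SALARY documents are made up for illustration.

```java
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

class ComplexTypeCheck {
    static final String XSD =
          "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xs:element name=\"employee\"><xs:complexType><xs:sequence>"
        + "<xs:element name=\"EmpNo\" type=\"xs:int\"/>"
        + "<xs:element name=\"EmpName\" type=\"xs:string\"/>"
        + "<xs:element name=\"EmpSalary\" type=\"xs:decimal\"/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    static final String OK = "<employee><EmpNo>1216</EmpNo><EmpName>Sam</EmpName>"
        + "<EmpSalary>35000</EmpSalary></employee>";
    static final String WRONG_ORDER = "<employee><EmpName>Sam</EmpName><EmpNo>1216</EmpNo>"
        + "<EmpSalary>35000</EmpSalary></employee>";
    static final String BAD_SALARY = "<employee><EmpNo>1216</EmpNo><EmpName>Sam</EmpName>"
        + "<EmpSalary>high</EmpSalary></employee>";

    static boolean isValid(String xsd, String xml) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
            try {
                schema.newValidator().validate(new StreamSource(new StringReader(xml)));
                return true;
            } catch (SAXException invalidDocument) {
                return false;
            }
        } catch (SAXException badSchema) {
            throw new IllegalArgumentException("The schema itself is invalid", badSchema);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(XSD, OK));          // true
        System.out.println(isValid(XSD, WRONG_ORDER)); // false: sequence fixes the order
        System.out.println(isValid(XSD, BAD_SALARY));  // false: "high" is not an xs:decimal
    }
}
```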
2.element can have a type attribute that refers to the name of the
complex type to use:
• <xs:element name="employee" type="personinfo"/>
<xs:complexType name="personinfo">
<xs:sequence>
<xs:element name="firstname" type="xs:string"/>
<xs:element name="lastname" type="xs:string"/>
</xs:sequence>
</xs:complexType>
• If you use the method described above, several elements can refer
to the same complex type, like this:
• <xs:element name="employee" type="personinfo"/>
<xs:element name="student" type="personinfo"/>
<xs:element name="member" type="personinfo"/>
<xs:complexType name="personinfo">
<xs:sequence>
<xs:element name="firstname" type="xs:string"/>
<xs:element name="lastname" type="xs:string"/>
</xs:sequence>
</xs:complexType>
There are four kinds of complex elements:
• 1.empty elements
• 2.elements that contain only other elements
• 3.elements that contain only text
• 4.elements that contain both other elements and text
• Note: Each of these elements may contain attributes as
well!
1.empty elements:-
<product prodid="1345"/>

<xs:element name="product">
<xs:complexType>
<xs:attribute name="prodid" type="xs:positiveInteger"/>
</xs:complexType>
</xs:element>
2. Complex Types Containing Elements Only:-

XML
<person>
<firstname>John</firstname>
<lastname>Smith</lastname>
</person>

XML schema
<xs:element name="person">
<xs:complexType>
<xs:sequence>
<xs:element name="firstname" type="xs:string"/>
<xs:element name="lastname" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
Complex Text-Only Elements:- These contain simple content (text and attributes).
We must use the simpleContent element around the content, and
we must define an extension OR a restriction within the
simpleContent element, like this:

<xs:element name="somename">
<xs:complexType>
<xs:simpleContent>
<xs:extension base="basetype">
....
....
</xs:extension>
</xs:simpleContent>
</xs:complexType>
</xs:element>

OR

<xs:element name="somename">
<xs:complexType>
<xs:simpleContent>
<xs:restriction base="basetype">
....
....
</xs:restriction>
</xs:simpleContent>
</xs:complexType>
</xs:element>

Example- XML
<carcost Cname="swift">600000</carcost>

XML schema
<xs:element name="carcost">
<xs:complexType>
<xs:simpleContent>
<xs:extension base="xs:integer">
<xs:attribute name="Cname" type="xs:string"/>
</xs:extension>
</xs:simpleContent>
</xs:complexType>
</xs:element>
Complex Types with Mixed Content:- An XML element that
contains both text and other elements:

XML
<address>
To,<name>Sree Ram</name>
Flat-no-207<aptname>S.S.Heavens</aptname>
<city>hyderabad</city>
</address>

XML schema
<xs:element name="address">
<xs:complexType mixed="true">
<xs:sequence>
<xs:element name="name" type="xs:string"/>
<xs:element name="aptname" type="xs:string"/>
<xs:element name="city" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
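The sketch below validates the carcost (simple content plus an attribute) and address (mixed content) examples with the JDK's javax.xml.validation API. ContentModelCheck and isValid are hypothetical names; the "six lakh" value is made up to show a rejected integer.

```java
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

class ContentModelCheck {
    static final String XSD =
          "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
        // Text-only complex type: integer content plus a Cname attribute
        + "<xs:element name=\"carcost\"><xs:complexType><xs:simpleContent>"
        + "<xs:extension base=\"xs:integer\">"
        + "<xs:attribute name=\"Cname\" type=\"xs:string\"/>"
        + "</xs:extension></xs:simpleContent></xs:complexType></xs:element>"
        // Mixed content: text may appear between the child elements
        + "<xs:element name=\"address\"><xs:complexType mixed=\"true\"><xs:sequence>"
        + "<xs:element name=\"name\" type=\"xs:string\"/>"
        + "<xs:element name=\"aptname\" type=\"xs:string\"/>"
        + "<xs:element name=\"city\" type=\"xs:string\"/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    static final String ADDRESS =
          "<address>To,<name>Sree Ram</name>Flat-no-207<aptname>S.S.Heavens</aptname>"
        + "<city>hyderabad</city></address>";

    static boolean isValid(String xsd, String xml) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
            try {
                schema.newValidator().validate(new StreamSource(new StringReader(xml)));
                return true;
            } catch (SAXException invalidDocument) {
                return false;
            }
        } catch (SAXException badSchema) {
            throw new IllegalArgumentException("The schema itself is invalid", badSchema);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(XSD, "<carcost Cname=\"swift\">600000</carcost>"));   // true
        System.out.println(isValid(XSD, "<carcost Cname=\"swift\">six lakh</carcost>")); // false: not an integer
        System.out.println(isValid(XSD, ADDRESS));                                      // true
        // The same address fails once mixed="true" is switched off
        System.out.println(isValid(XSD.replace("mixed=\"true\"", "mixed=\"false\""), ADDRESS)); // false
    }
}
```

The last line shows why mixed="true" is needed: without it, the text "To," and "Flat-no-207" between the child elements is not allowed.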
XSD Indicators: Indicators control how elements are
organized in an XML document.
• There are seven types of indicators, falling into three broad
categories.
Order Indicators:
• all − Child elements can occur in any order.
• choice − Only one of the child elements can occur.
• sequence − Child elements must occur in the specified order.
Occurrence Indicators:
• maxOccurs − The maximum number of times a child element can occur.
• minOccurs − The minimum number of times a child element must occur.
Group Indicators:
• group − Defines a related set of elements.
• attributeGroup − Defines a related set of attributes.
Order Indicators: the <xs:all> indicator specifies that the child elements described
in the XSD schema can appear in the XML document in any order (each child may
appear at most once).
Occurrence indicator: Occurrence indicators are used to define how often an
element can occur.
• Note: For all "Order" and "Group" indicators (any, all, choice,
sequence, group name and group reference), the default for maxOccurs and
minOccurs is 1.
• To allow an unlimited number of occurrences of an element,
use maxOccurs="unbounded".
Group Indicators:- <group> is used to group a related set of elements.
Element groups and attribute groups can then be referenced in the
definition of complex types.
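A hypothetical schema that exercises the indicators described above: xs:group, xs:choice, maxOccurs="unbounded" and xs:all, validated with the JDK's javax.xml.validation API. All element and group names (person, point, nameGroup, phone, email, hobby) and the IndicatorCheck class are invented for illustration.

```java
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

class IndicatorCheck {
    static final String XSD =
          "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
        // group: a reusable, named set of elements
        + "<xs:group name=\"nameGroup\"><xs:sequence>"
        + "<xs:element name=\"firstname\" type=\"xs:string\"/>"
        + "<xs:element name=\"lastname\" type=\"xs:string\"/>"
        + "</xs:sequence></xs:group>"
        + "<xs:element name=\"person\"><xs:complexType><xs:sequence>"
        + "<xs:group ref=\"nameGroup\"/>"
        // choice: exactly one of phone or email (default min/maxOccurs are 1)
        + "<xs:choice><xs:element name=\"phone\" type=\"xs:string\"/>"
        + "<xs:element name=\"email\" type=\"xs:string\"/></xs:choice>"
        // occurrence: zero or more hobby elements
        + "<xs:element name=\"hobby\" type=\"xs:string\" minOccurs=\"0\" maxOccurs=\"unbounded\"/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        // all: x and y in any order, each at most once
        + "<xs:element name=\"point\"><xs:complexType><xs:all>"
        + "<xs:element name=\"x\" type=\"xs:int\"/><xs:element name=\"y\" type=\"xs:int\"/>"
        + "</xs:all></xs:complexType></xs:element>"
        + "</xs:schema>";

    static final String NAME = "<firstname>Sree</firstname><lastname>Ram</lastname>";

    static boolean isValid(String xsd, String xml) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
            try {
                schema.newValidator().validate(new StreamSource(new StringReader(xml)));
                return true;
            } catch (SAXException invalidDocument) {
                return false;
            }
        } catch (SAXException badSchema) {
            throw new IllegalArgumentException("The schema itself is invalid", badSchema);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(XSD, "<person>" + NAME
                + "<phone>123</phone><hobby>chess</hobby><hobby>music</hobby></person>")); // true
        System.out.println(isValid(XSD, "<person>" + NAME + "<email>a@b.c</email></person>")); // true
        System.out.println(isValid(XSD, "<person>" + NAME
                + "<phone>123</phone><email>a@b.c</email></person>")); // false: choice allows one
        System.out.println(isValid(XSD, "<point><y>2</y><x>1</x></point>")); // true: any order
    }
}
```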
XML DOM: The Document Object Model (DOM) is a W3C standard.
The DOM presents an XML document as a tree structure. It defines a
standard for accessing documents like HTML and XML.
Definition: The Document Object Model (DOM) is an application
programming interface (API) for HTML and XML documents.
• It defines the logical structure of documents and the way a
document is accessed and manipulated.
• DOM defines the objects and properties and methods (interface) to
access all XML elements.
• It is separated into 3 different parts / levels −
• Core DOM − standard model for any structured document
• HTML DOM − standard model for HTML documents
• XML DOM − standard model for XML documents
XML DOM is defined for:
1. Loading XML files
2. Accessing the elements of XML documents
3. Deleting the elements of XML documents
4. Changing the elements of XML documents
Loading an XML File
• Step 1: Create an empty xmlDocument object.
Syntax: xmlDocument = new ActiveXObject("Microsoft.XMLDOM");
(ActiveXObject is available only in Internet Explorer.)
• Step 2: Set xmlDocument.async = false; so that execution continues only
after the file has finished loading.
• Step 3: Specify the name of the XML file to load.
Syntax: xmlDocument.load("xmldocument");
• It is possible to write both internal and external functions to load an
XML file using the DOM object.
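In Java, the counterpart of these three steps is a DocumentBuilder from the JDK's javax.xml.parsers package; parse() returns only after the whole file has been read, so no async flag is needed. A minimal sketch; LoadXml, load and writeTemp are hypothetical names, and the GCET document is made up for the demo.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

class LoadXml {
    // Step 1: create an empty builder; Step 3: load (parse) the named file.
    // parse() is synchronous, which plays the role of async = false.
    static Document load(File file) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(file);
        } catch (Exception e) {
            throw new IllegalStateException("Could not load " + file, e);
        }
    }

    // Demo helper: write XML text to a temporary file
    static File writeTemp(String xml) {
        try {
            File f = File.createTempFile("demo", ".xml");
            f.deleteOnExit();
            Files.write(f.toPath(), xml.getBytes(StandardCharsets.UTF_8));
            return f;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Load the file named on the command line, or a temporary sample file
        File f = args.length > 0 ? new File(args[0])
                                 : writeTemp("<college><name>GCET</name></college>");
        Document doc = load(f);
        System.out.println(doc.getDocumentElement().getNodeName()); // college for the sample
    }
}
```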
Properties and Methods of XML DOM
• Properties are used to access the XML elements, and methods
are used to perform actions on the XML elements.
XML DOM Nodes: In the DOM, everything in an XML document is a
node.
• The entire document is a document node
• Every XML element is an element node
• The text in an XML element is a text node
• Every attribute is an attribute node
• Comments are comment nodes
XML DOM properties:-
1.nodeName ---> Get the name of the node
2.nodeValue ---> Get the value of the node
3.parentNode ---> Get the parent node
4.childNodes ---> Get the child nodes of a node
5.attributes ---> Get the attributes of a node
6.documentElement ---> Get the root element of the document
7.firstChild ---> Access the first child of a node
8.nextSibling ---> Access the next sibling of a node
9.nodeType ---> Get the type of the node:
1-Element, 2-Attribute, 3-Text, 8-Comment, 9-Document
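A minimal Java sketch reading these properties through the org.w3c.dom interfaces, where they appear as getter methods such as getNodeName() and getFirstChild(). The DomProperties class and its student document are hypothetical; the document has no whitespace between tags so that no extra text nodes appear.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

class DomProperties {
    static final String XML =
        "<student id=\"5\"><name>Sunny</name><!-- note --><marks>9</marks></student>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Walks the root's children with firstChild/nextSibling as "nodeName(nodeType)"
    static String children(Document doc) {
        Element root = doc.getDocumentElement();                  // documentElement
        StringBuilder out = new StringBuilder();
        for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (out.length() > 0) out.append(' ');
            out.append(n.getNodeName()).append('(').append(n.getNodeType()).append(')');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        Element root = doc.getDocumentElement();
        Node name = root.getFirstChild();                          // firstChild -> <name>
        Node text = name.getFirstChild();                          // text node inside <name>
        System.out.println(doc.getNodeType());                     // 9 (document)
        System.out.println(children(doc));                         // name(1) #comment(8) marks(1)
        System.out.println(text.getNodeName() + " = " + text.getNodeValue()); // #text = Sunny
        System.out.println(name.getParentNode().getNodeName());    // student
        Node id = root.getAttributes().getNamedItem("id");         // attributes
        System.out.println(id.getNodeType() + " " + id.getNodeValue()); // 2 5
        System.out.println(root.getChildNodes().getLength());      // 3 (childNodes)
    }
}
```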
XML DOM Methods:-
1.getElementsByTagName(name) ---> Get all elements with the specified tag
name
2.appendChild(node) ---> Insert a child node
3.createElement("newNodeName") ---> Create an element node
4.createTextNode("valuefornode") ---> Create a text node holding a value
5.replaceChild(newnode, oldnode) ---> Replace a child node
6.removeChild(node) ---> Remove a child node
7.replaceData(offset, length, replacement) ---> Replace part of a text node's data
8.getAttribute("attributename") ---> Get the value of an attribute
9.setAttribute("attribute", "value") ---> Set the value of an attribute
10.removeAttribute("attributename") ---> Remove an attribute
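The same methods exist on Java's org.w3c.dom interfaces. This hypothetical DomMethods class applies them one after another to a small made-up college document; the comments track the student list, written as name[roll], after each step.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.w3c.dom.Text;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

class DomMethods {
    static final String XML = "<college><student roll=\"1\"><name>Sunny</name></student></college>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Builds <student roll="..."><name>...</name></student>, not yet attached to the tree
    static Element createStudent(Document doc, String roll, String name) {
        Element student = doc.createElement("student");          // createElement
        student.setAttribute("roll", roll);                      // setAttribute
        Element nameEl = doc.createElement("name");
        nameEl.appendChild(doc.createTextNode(name));            // createTextNode + appendChild
        student.appendChild(nameEl);
        return student;
    }

    // Lists students as name[roll] using getElementsByTagName and getAttribute
    static String summary(Document doc) {
        NodeList students = doc.getElementsByTagName("student");
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < students.getLength(); i++) {
            Element s = (Element) students.item(i);
            if (i > 0) out.append(',');
            out.append(s.getElementsByTagName("name").item(0).getTextContent())
               .append('[').append(s.getAttribute("roll")).append(']'); // "" if no roll
        }
        return out.toString();
    }

    static String demo() {
        Document doc = parse(XML);
        Element root = doc.getDocumentElement();
        Element sunny = (Element) root.getElementsByTagName("student").item(0);

        Element ravi = createStudent(doc, "2", "Ravi");
        Element anil = createStudent(doc, "3", "Anil");
        root.appendChild(ravi);                       // Sunny[1],Ravi[2]
        root.appendChild(anil);                       // Sunny[1],Ravi[2],Anil[3]

        Text raviText = (Text) ravi.getElementsByTagName("name").item(0).getFirstChild();
        raviText.replaceData(0, 4, "Ravi Kumar");     // Sunny[1],Ravi Kumar[2],Anil[3]

        root.removeChild(sunny);                      // Ravi Kumar[2],Anil[3]
        Element kiran = createStudent(doc, "4", "Kiran");
        root.replaceChild(kiran, anil);               // Ravi Kumar[2],Kiran[4]
        kiran.removeAttribute("roll");                // Ravi Kumar[2],Kiran[]
        return summary(doc);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // Ravi Kumar[2],Kiran[]
    }
}
```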
XML Parsers :- To read and update, create and manipulate an XML
document, you will need an XML parser.
• An XML parser is a software library or package that provides
interfaces for client applications to work with an XML
document.
• The XML Parser is designed to read the XML and create a way
for programs to use XML.
• An XML parser checks that the document is well-formed and, if it is a
validating parser, validates it against a DTD or schema.
• XML processors are used to parse the given XML document.
There are two ways to parse an XML document:
1.Tree Based Parsing
2.Event Based Parsing
• There are two types of processors
1.DOM Parser (Document Object Model)
2.SAX Parser (Simple API for XML)
1.DOM Parser (Document Object Model):-A DOM document is an
object which contains all the information of an XML document. It is
composed like a tree structure. The DOM Parser implements a DOM
API. This API is very simple to use.
Features of DOM Parser:-
• A DOM parser creates an internal structure in memory, a DOM
document object, and client applications get information about
the original XML document by invoking methods on this
document object.
• A DOM parser has a tree-based structure.
Advantages
1) It supports both read and write operations and the API is very
simple to use.
2) It is preferred when random access to widely separated parts of a
document is required.
Disadvantages
1) It is memory inefficient: it consumes more memory because the
whole XML document needs to be loaded into memory.
2) It is comparatively slower than other parsers.
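For contrast with the DOM parser's memory cost, here is a minimal sketch of the event-based (SAX) approach using the JDK's javax.xml.parsers.SAXParser: the handler receives start-tag, text and end-tag events while the file is read and keeps only what it needs. SaxDemo and NameCollector are hypothetical names, and the second student is made-up data.

```java
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class SaxDemo {
    static final String XML =
          "<cse-2D><student><sname>sunny</sname><rollno>5n4</rollno></student>"
        + "<student><sname>ravi</sname><rollno>5n5</rollno></student></cse-2D>";

    // Collects every <sname> value and counts elements; no tree is built
    static class NameCollector extends DefaultHandler {
        final List<String> names = new ArrayList<>();
        int elementCount = 0;
        private boolean inName = false;
        private final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            elementCount++;
            if (qName.equals("sname")) { inName = true; text.setLength(0); }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (inName) text.append(ch, start, length); // text may arrive in several chunks
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (qName.equals("sname")) { names.add(text.toString()); inName = false; }
        }
    }

    static NameCollector parse(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            NameCollector handler = new NameCollector();
            parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
            return handler;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        NameCollector c = parse(XML);
        System.out.println(c.names);        // [sunny, ravi]
        System.out.println(c.elementCount); // 7
    }
}
```

The trade-off is the one the slides describe: SAX needs little memory, but the program cannot jump back to an earlier part of the document the way it can with a DOM tree.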
