Introduction To XML

This document provides an introduction to XML (eXtensible Markup Language). It describes XML as a markup language that uses tags to provide extra information about a document. The document discusses how XML is used to transfer data between places, its advantages over other formats, basic XML rules and syntax, and differences between XML and HTML. It provides examples of XML and HTML code.

Uploaded by

Swapnil Patel
Copyright
© All Rights Reserved

Introduction to XML

Extensible Markup Language


What is XML

• XML stands for eXtensible Markup Language.


• A markup language is used to provide
information about a document.
• Tags are added to the document to provide the
extra information.
• HTML tags tell a browser how to display the
document.
• XML tags give a reader some idea what some of
the data means.
What is XML Used For?
• XML documents are used to transfer data from one
place to another often over the Internet.
• XML subsets are designed for particular applications.
• One is RSS (Rich Site Summary or Really Simple
Syndication ). It is used to send breaking news bulletins
from one web site to another.
• A number of fields have their own subsets. These
include chemistry, mathematics, and book publishing.
• Many of these subsets are published as open standards, some by
the W3C (World Wide Web Consortium), and are available for anyone’s use.
Advantages of XML

• XML is text (Unicode) based.


– Takes up less space.
– Can be transmitted efficiently.
• One XML document can be displayed differently
in different media.
– HTML, video, CD, DVD.
– You only have to change the XML document in order
to change every presentation derived from it.
• XML documents can be modularized. Parts can
be reused.
Example of an HTML Document

<html>
<head><title>Example</title></head>
<body>
<h1>This is an example of a page.</h1>
<h2>Some information goes here.</h2>
</body>
</html>
Example of an XML Document

<?xml version="1.0"?>
<address>
<name>Alice Lee</name>
<email>[email protected]</email>
<phone>212-346-1234</phone>
<birthday>1985-03-22</birthday>
</address>
Difference Between HTML and XML

• HTML tags have a fixed meaning and
browsers know what it is.
• XML tags are different for different
applications, and users know what they
mean.
• HTML tags are used for display.
• XML tags are used to describe documents
and data.
XML Rules

• Tags are enclosed in angle brackets.


• Tags come in pairs with start-tags and
end-tags.
• Tags must be properly nested.
– <name><email>…</name></email> is not allowed.
– <name><email>…</email></name> is.
• Tags that do not have end-tags must be
terminated by a ‘/’.
– <br /> is an XHTML example.
More XML Rules
• Tags are case sensitive.
– <address> is not the same as <Address>
• Names beginning with "xml", in any combination of
cases, are reserved and may not be used as tag names.
• Tag names may not contain ‘<’ or ‘&’.
• Tag names follow rules similar to Java identifiers,
except that colons, hyphens, and periods are also
allowed. They must begin with a letter or underscore
and may not contain white space.
• Documents must have a single root element that
contains all other elements.
Encoding
• XML (like Java) uses Unicode to represent characters.
• Unicode has several encodings. The most common one
on the web is UTF-8.
• UTF-8 is a variable-length code. Each character is
encoded in 1 to 4 bytes.
• The first 128 characters in Unicode are ASCII, and UTF-8
encodes each of them in 1 byte.
• Code points 128 to 255 cover some of the more common
characters used in western Europe, such as ã, á, å, or ç.
UTF-8 encodes these in 2 bytes.
• Two-byte codes also cover other alphabets, such as Greek,
Cyrillic, Hebrew, and Arabic.
• Three-byte codes cover most Asian ideographs, and four-byte
codes handle the characters that are left.
• Those working mainly with non-western languages may want to
investigate other Unicode encodings, such as UTF-16.
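The variable-length behaviour of UTF-8 can be checked directly. The following is a minimal sketch; the sample characters and the helper name utf8_length are chosen here for illustration and are not part of the original slides.

```python
# Sample characters from four different Unicode ranges (illustrative choices).
samples = [
    "A",           # U+0041, ASCII
    "\u00e9",      # U+00E9, Latin small letter e with acute
    "\u4e2d",      # U+4E2D, a common CJK ideograph
    "\U0001F600",  # U+1F600, an emoji outside the Basic Multilingual Plane
]


def utf8_length(ch: str) -> int:
    """Return the number of bytes UTF-8 uses to encode one character."""
    return len(ch.encode("utf-8"))


for ch in samples:
    print(f"U+{ord(ch):04X} takes {utf8_length(ch)} byte(s) in UTF-8")
```

Each sample falls in a different length class, so the loop reports 1, 2, 3, and 4 bytes respectively.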
Well-Formed Documents
• An XML document is said to be well-formed if it follows all the rules.
• An XML parser is used to check that all the rules have been obeyed.
• Browsers have shipped with XML parsers since Internet Explorer 5
and Netscape 7.
• Parsers are also available for free download over the Internet.
• A well-formed XML document is a document that conforms to the
XML syntax rules, such as:
– an XML declaration is optional, but if present it must come first
– it must have one unique root element
– start-tags must have matching end-tags
– element names are case sensitive
– all elements must be closed
– all elements must be properly nested
– all attribute values must be quoted
– entities must be used for special characters
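A parser can be used to test these rules in practice. The sketch below assumes Python's standard xml.etree.ElementTree module as the parser; the function name is_well_formed and the sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


good = "<address><name>Alice Lee</name></address>"
bad_nesting = "<name><email>x</name></email>"    # tags overlap
bad_case = "<Address></address>"                 # names differ in case
two_roots = "<a/><b/>"                           # more than one root element
unquoted = "<student gender=female/>"            # attribute value not quoted

for sample in (good, bad_nesting, bad_case, two_roots, unquoted):
    print(is_well_formed(sample), sample)
```

Only the first sample is accepted. The parser rejects the others as soon as it reaches the first violation.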
XML Example Revisited
<?xml version="1.0"?>
<address>
<name>Alice Lee</name>
<email>[email protected]</email>
<phone>212-346-1234</phone>
<birthday>1985-03-22</birthday>
</address>
• Markup for the data aids understanding of its purpose.
• A flat text file is not nearly so clear.
Alice Lee
[email protected]
212-346-1234
1985-03-22
• The last line looks like a date, but what is it for?
Expanded Example
<?xml version="1.0"?>
<address>
<name>
<first>Alice</first>
<last>Lee</last>
</name>
<email>[email protected]</email>
<phone>123-45-6789</phone>
<birthday>
<year>1983</year>
<month>07</month>
<day>15</day>
</birthday>
</address>
XML Files are Trees

address
    name
        first
        last
    email
    phone
    birthday
        year
        month
        day

XML Trees

• An XML document has a single root node.


• The tree is a general ordered tree.
– A parent node may have any number of
children.
– Child nodes are ordered, and may have
siblings.
• Preorder traversals are usually used for
getting information out of the tree.
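A preorder traversal of the expanded address example can be sketched as follows. This assumes Python's standard xml.etree.ElementTree module, whose iter() method visits elements in document order; the email value is a placeholder invented for this sketch.

```python
import xml.etree.ElementTree as ET

# The expanded address example; the email value is a placeholder.
doc = """<address>
  <name><first>Alice</first><last>Lee</last></name>
  <email>alice@example.com</email>
  <phone>123-45-6789</phone>
  <birthday><year>1983</year><month>07</month><day>15</day></birthday>
</address>"""

root = ET.fromstring(doc)

# iter() yields the root first, then each subtree in document order,
# which is a preorder (parent before children) traversal.
preorder_tags = [element.tag for element in root.iter()]
print(preorder_tags)
# ['address', 'name', 'first', 'last', 'email', 'phone',
#  'birthday', 'year', 'month', 'day']
```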
XML Attributes
• Attribute values must always be quoted.
Either single or double quotes can be
used.
• For a student's gender, the <student>
element can be written as:
• <student gender="female">
XML Elements vs. Attributes
• <student gender="female">
    <firstname>Abc</firstname>
    <lastname>Pqr</lastname>
  </student>
• OR
  <student>
    <gender>female</gender>
    <firstname>Abc</firstname>
    <lastname>Pqr</lastname>
  </student>
• Both examples provide the same information.
• There are no rules about when to use attributes or
when to use elements in XML.
Some things to consider when using
attributes are:
• attributes cannot contain multiple values
(elements can)
• attributes cannot contain tree structures
(elements can)
• attributes are not easily expandable (for
future changes)
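From a program's point of view the two forms are read differently. The sketch below, which assumes Python's standard xml.etree.ElementTree module, reads the gender from both versions of the student example.

```python
import xml.etree.ElementTree as ET

as_attribute = ET.fromstring(
    '<student gender="female">'
    "<firstname>Abc</firstname><lastname>Pqr</lastname>"
    "</student>"
)
as_element = ET.fromstring(
    "<student>"
    "<gender>female</gender>"
    "<firstname>Abc</firstname><lastname>Pqr</lastname>"
    "</student>"
)

# An attribute is looked up by name on the element itself.
gender_from_attribute = as_attribute.get("gender")

# A child element is located in the tree, then its text is read.
gender_from_element = as_element.findtext("gender")

print(gender_from_attribute, gender_from_element)
```

Both lookups return the same string, but only the element form could later grow children of its own, such as a nested structure.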
XML Namespaces
• XML Namespaces provide a method to
avoid element name conflicts.
• In XML, element names are defined by the
developer. This often results in a conflict
when trying to mix XML documents from
different XML applications.
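As an illustration of such a conflict, the sketch below mixes two vocabularies that both define a table element and tells them apart by namespace. The namespace URIs, prefixes, and element content are hypothetical, and the parser is Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

# Two hypothetical vocabularies that both use the name "table".
doc = """<root xmlns:h="http://example.com/html-table"
               xmlns:f="http://example.com/furniture">
  <h:table><h:td>Apples</h:td></h:table>
  <f:table><f:name>Coffee table</f:name></f:table>
</root>"""

root = ET.fromstring(doc)

# Map prefixes to namespace URIs for use in searches.
ns = {
    "h": "http://example.com/html-table",
    "f": "http://example.com/furniture",
}

html_table = root.find("h:table", ns)
furniture_table = root.find("f:table", ns)

# The parser stores each name together with its namespace URI.
print(html_table.tag)       # {http://example.com/html-table}table
print(furniture_table.tag)  # {http://example.com/furniture}table
```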
Validity
• An XML validator can be used to syntax-check
XML. With XML, errors are not allowed: a parser must
stop at the first well-formedness error.
• An XML document with correct syntax is called
"Well Formed". A well-formed document has a
tree structure and obeys all the XML rules.
• A particular application may add more rules in
either a DTD (document type definition) or in a
schema.
• Many specialized DTDs and schemas have
been created to describe particular areas.
• These range from disseminating news bulletins
(RSS) to chemical formulas.
• DTDs were developed first, so they are not as
comprehensive as schemas.
Valid XML Documents
• A "well formed" XML document is not the same as a
"valid" XML document.
• A "valid" XML document must be well formed. In
addition, it must conform to a DTD or a schema.
• There are two kinds of document definitions that
can be used with XML:
– DTD - The original Document Type Definition
– XML Schema - An XML-based alternative to DTD
• A document definition defines the rules and the
legal elements and attributes for an XML document.
XML DTD- Document Type Definitions
• An XML document with correct syntax is called "Well
Formed".
• An XML document validated against a DTD is both
"Well Formed" and "Valid".
• A DTD describes the tree structure of a document
and something about its data.
• There are two data types, PCDATA and CDATA.
– PCDATA is parsed character data.
– CDATA is character data, not usually parsed.
• A DTD determines how many times a node may
appear, and how child nodes are ordered.
DTD for address Example
<!ELEMENT address (name, email, phone, birthday)>
<!ELEMENT name (first, last)>
<!ELEMENT first (#PCDATA)>
<!ELEMENT last (#PCDATA)>
<!ELEMENT email (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
<!ELEMENT birthday (year, month, day)>
<!ELEMENT year (#PCDATA)>
<!ELEMENT month (#PCDATA)>
<!ELEMENT day (#PCDATA)>
Schemas

• Schemas are themselves XML documents.


• They were standardized after DTDs and provide
more information about the document.
• They have a number of data types including
string, decimal, integer, boolean, date, and time.
• They divide elements into simple and complex
types.
• They also determine the tree structure and how
many children a node may have.
Schema for First address Example
<?xml version="1.0" encoding="ISO-8859-1" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="address">
<xs:complexType>
<xs:sequence>
<xs:element name="name" type="xs:string"/>
<xs:element name="email" type="xs:string"/>
<xs:element name="phone" type="xs:string"/>
<xs:element name="birthday" type="xs:date"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
Explanation of Example Schema
<?xml version="1.0" encoding="ISO-8859-1" ?>
• ISO-8859-1, Latin-1, is the same as UTF-8 in the first 128 characters.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
• http://www.w3.org/2001/XMLSchema is the namespace that identifies the
XML Schema vocabulary. The xs prefix is bound to it.
<xs:element name="address">
<xs:complexType>
• This states that address is a complex type element.
<xs:sequence>
• This states that the following elements form a sequence and must
come in the order shown.
<xs:element name="name" type="xs:string"/>
• This says that the element, name, must be a string.
<xs:element name="birthday" type="xs:date"/>
• This states that the element, birthday, is a date. Dates are always of
the form yyyy-mm-dd.
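The two constraints explained above, the fixed child order and the date format, can be approximated by hand. The sketch below is not schema validation: Python's standard library does not validate against XML Schema, so the checks are written out manually. The function name check_address and the sample values are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Order required by the xs:sequence in the example schema.
EXPECTED_ORDER = ["name", "email", "phone", "birthday"]


def check_address(text: str) -> list:
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    root = ET.fromstring(text)
    if root.tag != "address":
        problems.append("root element is not <address>")
    children = [child.tag for child in root]
    if children != EXPECTED_ORDER:
        problems.append(f"unexpected child order: {children}")
    try:
        # Rough stand-in for xs:date: accepts yyyy-mm-dd calendar dates.
        date.fromisoformat(root.findtext("birthday", default=""))
    except ValueError:
        problems.append("birthday is not a yyyy-mm-dd date")
    return problems


ok = ("<address><name>Alice Lee</name><email>alice@example.com</email>"
      "<phone>212-346-1234</phone><birthday>1985-03-22</birthday></address>")
bad_date = ok.replace("1985-03-22", "22/03/1985")

print(check_address(ok))
print(check_address(bad_date))
```

A real validator covers far more than this, including data types for every element and occurrence counts.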
XML Schemas are More Powerful than DTDs
• XML Schemas are written in XML
• XML Schemas are extensible to additions
• XML Schemas support data types
• XML Schemas support namespaces
Why Use an XML Schema?
• With XML Schema, your XML files can
carry a description of their own format.
• With XML Schema, independent groups of
people can agree on a standard for
interchanging data.
• With XML Schema, you can verify data.
XML Schemas Support Data Types
• One of the greatest strengths of XML
Schemas is the support for data types:
• It is easier to describe document content
• It is easier to define restrictions on data
• It is easier to validate the correctness of
data
• It is easier to convert data between
different data types
XML Schemas use XML Syntax
• Another great strength of XML
Schemas is that they are written in XML:
• You don't have to learn a new language
• You can use your XML editor to edit your
Schema files
• You can use your XML parser to parse
your Schema files
• You can manipulate your Schemas with
the XML DOM
• You can transform your Schemas with
XSLT
When to Use a DTD/Schema?
• With a DTD, independent groups of people
can agree to use a standard DTD for
interchanging data.
• With a DTD, you can verify that the data
you receive from the outside world is valid.
• You can also use a DTD to verify your own
data.
When NOT to Use a DTD/Schema?
• XML does not require a DTD/Schema.
• When you are experimenting with XML, or
when you are working with small XML
files, creating DTDs may be a waste of
time.
• If you develop applications, wait until the
specification is stable before you add a
document definition. Otherwise, your
software might stop working because of
validation errors.
Parsers

• There are two principal models for
parsers.
• SAX – Simple API for XML
– Uses a call-back method
– Similar to Java event listeners
• DOM – Document Object Model
– Creates a parse tree
– Requires a tree traversal
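The call-back style of SAX can be sketched with Python's standard xml.sax module. The handler class name TagCollector and the sample document are made up for this illustration. The parser calls the handler's methods as it reads each tag, and no tree is built.

```python
import xml.sax


class TagCollector(xml.sax.ContentHandler):
    """Record every start-tag and end-tag event the parser reports."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # Called by the parser each time a start-tag is read.
        self.events.append(("start", name))

    def endElement(self, name):
        # Called by the parser each time an end-tag is read.
        self.events.append(("end", name))


handler = TagCollector()
xml.sax.parseString(b"<address><name>Alice Lee</name></address>", handler)

print(handler.events)
# [('start', 'address'), ('start', 'name'), ('end', 'name'), ('end', 'address')]
```

Because nothing is kept in memory beyond what the handler stores, this model suits large documents. A DOM parser is more convenient when the program needs to revisit or change the tree.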
XML DOM
• The XML DOM makes a tree-structure
view for an XML document.
• We can access all elements through the
DOM tree.
• We can modify or delete their content and
also create new elements. The elements,
their content (text and attributes) are all
known as nodes.
• According to the XML DOM, everything in
an XML document is a node:
• The entire document is a document node
• Every XML element is an element node
• The text in an XML element is a text
node
• Every attribute is an attribute node
• Comments are comment nodes
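The node kinds listed above can be inspected with Python's standard xml.dom.minidom module, one DOM implementation among several. The sample document and variable names below are invented for this sketch, which also modifies existing content and creates a new element.

```python
from xml.dom import Node, minidom

source = ('<student gender="female"><!-- enrolled -->'
          "<firstname>Abc</firstname></student>")
dom = minidom.parseString(source)

student = dom.documentElement             # element node
comment, firstname = student.childNodes   # comment node, element node
gender = student.getAttributeNode("gender")  # attribute node
text = firstname.firstChild               # text node

print(dom.nodeType == Node.DOCUMENT_NODE)
print(student.nodeType == Node.ELEMENT_NODE)
print(comment.nodeType == Node.COMMENT_NODE)
print(gender.nodeType == Node.ATTRIBUTE_NODE)
print(text.nodeType == Node.TEXT_NODE)

# Modify existing content, then create and attach a new element.
text.data = "Xyz"
lastname = dom.createElement("lastname")
lastname.appendChild(dom.createTextNode("Pqr"))
student.appendChild(lastname)

print(student.toxml())
```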
