XML - Overview
XML tags identify data and are used to store and organize it, whereas HTML tags
specify how data is displayed. XML is not going to replace HTML in the near future,
but it introduces new possibilities by adopting many successful features of HTML.
There are three important characteristics of XML that make it useful in a variety of
systems and solutions −
XML is extensible − XML allows you to create your own self-descriptive tags,
or language, that suits your application.
XML carries the data, does not present it − XML allows you to store the data
irrespective of how it will be presented.
XML is a public standard − XML was developed by an organization called the
World Wide Web Consortium (W3C) and is available as an open standard.
XML Usage
XML can work behind the scenes to simplify the creation of HTML documents
for large web sites.
XML can be used to exchange the information between organizations and
systems.
XML can be used for offloading and reloading of databases.
XML can be used to store and arrange the data, which can customize your
data handling needs.
XML can easily be merged with style sheets to create almost any desired
output.
Virtually any type of data can be expressed as an XML document.
What is Markup?
XML is a markup language that defines a set of rules for encoding documents in a
format that is both human-readable and machine-readable. So what exactly is a
markup language? Markup is information added to a document that enhances its
meaning in certain ways, in that it identifies the parts and how they relate to each
other. More specifically, a markup language is a set of symbols that can be placed
in the text of a document to demarcate and label the parts of that document.
The following example shows how XML markup looks when embedded in a piece of
text −
<message>
<text>Hello, world!</text>
</message>
This snippet includes the markup symbols, or the tags such as
<message>...</message> and <text>... </text>. The tags <message> and
</message> mark the start and the end of the XML code fragment. The tags
<text> and </text> surround the text Hello, world!.
XML - Syntax
In this chapter, we will discuss the simple syntax rules to write an XML document.
Following is a complete XML document −
You can notice there are two kinds of information in the above example −
The following diagram depicts the syntax rules to write different types of markup
and text in an XML document.
Where version is the XML version and encoding specifies the character encoding
used in the document.
<element>
Element Syntax − Each XML-element needs to be closed, either with a matching end
tag or, for an element with no content, by writing it as a self-closing tag, as
shown below −
<element>....</element>
<element/>
Root Element − An XML document can have only one root element. For example, the
following is not a correct XML document, because both the x and y elements occur
at the top level without a root element −
<x>...</x>
<y>...</y>
The following is a correctly formed XML document, because x and y are wrapped in a
single root element −
<root>
<x>...</x>
<y>...</y>
</root>
Case Sensitivity − The names of XML-elements are case-sensitive. That means the
names in the start and the end tags need to be in exactly the same case.
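The root element and case sensitivity rules above can be checked with any XML parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the helper name is_well_formed is an assumption made for this illustration and is not part of the tutorial.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


two_roots = "<x>...</x><y>...</y>"               # x and y both at the top level
one_root = "<root><x>...</x><y>...</y></root>"   # wrapped in a single root
case_mismatch = "<element>....</Element>"        # end tag differs in case
self_closing = "<element/>"                      # empty element, short form

for sample in (two_roots, one_root, case_mismatch, self_closing):
    print(sample, "->", is_well_formed(sample))
```

Only the single-root and self-closing samples should be accepted; the other two break the rules described above.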
XML Attributes
An attribute specifies a single property for the element, using a name/value pair.
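An XML-element can have one or more attributes, and every attribute value must be
enclosed in quotation marks. For example, the following is incorrect −
<a b = x>....</a>
In the above syntax, the attribute value is not defined in quotation marks, so the
document is not well-formed. The correct form is <a b = "x">....</a>.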
XML References
References usually allow you to add or include additional text or markup in an
XML document. References always begin with the symbol "&" which is a reserved
character and end with the symbol ";". XML has two types of references −
Entity References − An entity reference contains a name between the start and
the end delimiters. For example, &amp; where amp is the name. The name refers to
a predefined string of text and/or markup.
Character References − These contain a hash mark ("#") followed by a number, such
as &#65;. The number always refers to the Unicode code point of a character. In
this case, 65 refers to the letter "A".
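As a small illustration of both reference types, the sketch below lets Python's standard xml.etree.ElementTree parser resolve an entity reference and a character reference (in decimal and hexadecimal form). The sample sentence is made up for this example.

```python
import xml.etree.ElementTree as ET

# &amp; and &lt;/&gt; are entity references; &#65; and &#x41; are character
# references to the same code point (decimal 65, hexadecimal 41).
source = "<text>Tom &amp; Jerry: &#65; &#x41; &lt;b&gt;</text>"

element = ET.fromstring(source)
resolved = element.text  # the parser has already replaced every reference
print(resolved)
```

The parser hands the application the resolved characters, so the application never sees the "&...;" notation itself.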
XML Text
The names of XML-elements and XML-attributes are case-sensitive, which means
the names in the start and end tags need to be written in the same case. To avoid
character encoding problems, all XML files should be saved as Unicode UTF-8 or
UTF-16 files.
Some characters are reserved by the XML syntax itself. Hence, they cannot be
used directly. To use them, some replacement-entities are used, which are listed
below −
XML - Documents
A simple document is shown in the following example −
XML declaration
Document type declaration
You can learn more about XML declaration in this chapter − XML Declaration
You can learn more about XML elements in this chapter − XML Elements
XML - Declaration
This chapter covers XML declaration in detail. XML declaration contains details
that prepare an XML processor to parse the XML document. It is optional, but
when used, it must appear in the first line of the XML document.
Syntax
Following syntax shows XML declaration −
<?xml
version = "version_number"
encoding = "encoding_declaration"
standalone = "standalone_status"
?>
Each parameter consists of a parameter name, an equals sign (=), and a parameter
value inside quotes. Following table shows the above syntax in detail −
Rules
An XML declaration should abide by the following rules −
If the XML declaration is present in the XML, it must be placed as the first
line in the XML document.
If the XML declaration is included, it must contain the version number attribute.
The Parameter names and values are case-sensitive.
The names are always in lower case.
The order of placing the parameters is important. The correct order
is: version, encoding and standalone.
Either single or double quotes may be used.
The XML declaration has no closing tag i.e. </?xml>
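Several of these rules can be observed directly with a parser. The sketch below is one possible check using Python's standard xml.etree.ElementTree module; the helper name parses and the single-element document <a/> are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def parses(data: bytes) -> bool:
    """Return True if the parser accepts the document."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


double_quotes = b'<?xml version = "1.0" encoding = "UTF-8" standalone = "no" ?><a/>'
single_quotes = b"<?xml version = '1.0' encoding = 'UTF-8' standalone = 'no' ?><a/>"
wrong_order = b'<?xml encoding = "UTF-8" version = "1.0" ?><a/>'   # encoding first
no_version = b'<?xml encoding = "UTF-8" ?><a/>'                    # version missing
not_first = b'\n<?xml version = "1.0" ?><a/>'                      # not on line one
upper_case = b'<?xml VERSION = "1.0" ?><a/>'                       # name not lower case

for doc in (double_quotes, single_quotes, wrong_order,
            no_version, not_first, upper_case):
    print(doc, "->", parses(doc))
```

Only the first two declarations should be accepted; each of the others violates one of the rules listed above.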
XML declaration with all parameters defined −
<?xml version = "1.0" encoding = "UTF-8" standalone = "no" ?>
XML declaration with all parameters defined in single quotes −
<?xml version = '1.0' encoding = 'UTF-8' standalone = 'no' ?>
Let us learn about one of the most important parts of XML, the XML tags. XML
tags form the foundation of XML. They define the scope of an element in XML.
They can also be used to insert comments, declare settings required for parsing
the environment, and to insert special instructions.
Start Tag
The beginning of every non-empty XML element is marked by a start-tag.
Following is an example of start-tag − <address>
End Tag
Every element that has a start tag must end with an end-tag. Following is an
example of end-tag − </address>
Note that end tags include a solidus ("/") before the name of the element.
Empty Tag
The text that appears between start-tag and end-tag is called content. An element
which has no content is termed as empty. An empty element can be represented
in two ways as follows −
<hr></hr>
<hr />
Empty-element tags may be used for any element which has no content.
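To a parser, the two notations are equivalent. The following sketch, which assumes Python's standard xml.etree.ElementTree module, parses both forms and compares the results.

```python
import xml.etree.ElementTree as ET

long_form = ET.fromstring("<hr></hr>")
short_form = ET.fromstring("<hr />")


def summary(element):
    """Tag name, text content and number of child elements."""
    return element.tag, element.text, len(element)


same = summary(long_form) == summary(short_form)
# When serializing, ElementTree writes an element without content in short form.
serialized = ET.tostring(long_form, encoding="unicode")
print(same, serialized)
```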
Rule 1 XML tags are case-sensitive. The following line of code is an example of wrong
syntax, because the end tag </Address> differs in case from the start tag <address>,
which is treated as erroneous syntax in XML.
<address>This is wrong syntax</Address>
Following code shows a correct way, where we use the same case to name the
start and the end tag.
<address>This is correct syntax</address>
Rule 2 XML tags must be closed in an appropriate order, i.e., an XML tag opened inside
another element must be closed before the outer element is closed. For example −
<outer_element>
<internal_element>
This tag is closed before the outer_element
</internal_element>
</outer_element>
XML - Elements
XML elements can be defined as the building blocks of an XML document. Elements can
behave as containers to hold text, elements, attributes, media objects or all of these.
Each XML document contains one or more elements, the scope of which is either
delimited by start and end tags, or for empty elements, by an empty-element tag.
where,
element-name is the name of the element. The name and its case in the start and
end tags must match.
attribute1, attribute2 are attributes of the element separated by white spaces.
An attribute defines a property of the element. It associates a name with a
value, which is a string of characters. An attribute is written as − name = "value"
name is followed by an = sign and a string value inside double(" ") or single(' ')
quotes.
XML - Attributes
This chapter describes the XML attributes. Attributes are part of XML elements.
An element can have multiple unique attributes. Attributes give more information
about XML elements. To be more precise, they define properties of elements. An
XML attribute is always a name-value pair.
Syntax
An XML attribute has the following syntax −
name = "value"
Attributes are used to add a unique label to an element, place the label in a
category, add a Boolean flag, or otherwise associate it with some string of data.
Following example demonstrates the use of attributes −
Attributes are used to distinguish among elements of the same name, when you
do not want to create a new element for every situation. Hence, the use of an
attribute can add a little more detail in differentiating two or more similar
elements.
You can also observe that we have declared this attribute at the beginning of XML.
Attribute Types
Following table lists the type of attributes −
EnumeratedType − This has a list of predefined values in its declaration, out of
which it must be assigned one value. There are two types of enumerated attribute −
NotationType − It declares that an element will be referenced to a
NOTATION declared somewhere else in the XML document.
Enumeration − Enumeration allows you to define a specific list of
values that the attribute value must match.
An attribute name must not appear more than once in the same start-tag or
empty-element tag.
An attribute must be declared in the Document Type Definition (DTD) using
an Attribute-List Declaration.
Attribute values must not contain direct or indirect entity references to
external entities.
The replacement text of any entity referred to directly or indirectly in an
attribute value must not contain a less than sign (<)
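The rules that concern well-formedness (unique names, quoted values, no literal less than sign) can be tried out with a non-validating parser. The sketch below assumes Python's standard xml.etree.ElementTree module and reuses the <a b = ...> notation from the syntax chapter; it does not check DTD declarations, which require a validating parser.

```python
import xml.etree.ElementTree as ET


def parses(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


quoted = '<a b = "x">....</a>'
unquoted = '<a b = x>....</a>'                # value without quotation marks
duplicate = '<a b = "x" b = "y">....</a>'     # same name twice in one start-tag
less_than = '<a b = "x < y">....</a>'         # literal < inside the value
escaped = '<a b = "x &lt; y">....</a>'        # < written as an entity reference

value = ET.fromstring(escaped).get("b")       # the application sees the real character
print(parses(quoted), parses(unquoted), parses(duplicate), parses(less_than), value)
```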
XML - Comments
This chapter explains how comments work in XML documents. XML comments are
similar to HTML comments. They are added as notes or lines for
understanding the purpose of the XML code.
Comments can be used to include related links, information, and terms. They are
visible only in the source code; they are not part of the document's character
data. Comments may appear anywhere in a document outside other markup.
Syntax
XML comment has the following syntax −
<!--Your comment-->
A comment starts with <!-- and ends with -->. You can add textual notes as
comments between the characters. You must not nest one comment inside the
other.
Example
Comments cannot appear before XML declaration.
Comments may appear anywhere in a document.
Comments must not appear within attribute values.
Comments cannot be nested inside the other comments.
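These comment rules can be observed with a parser. The following sketch assumes Python's standard xml.etree.ElementTree module, whose default tree builder skips comments; the <note> element and its content are hypothetical.

```python
import xml.etree.ElementTree as ET


def parses(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


with_comment = "<note><!--Your comment-->text</note>"
nested = "<note><!-- outer <!-- inner --> --></note>"
before_declaration = '<!--first--><?xml version = "1.0" ?><note/>'
inside_attribute = '<note id = "<!--c-->"/>'

# The comment is dropped by the default builder; only the text remains.
text_seen = ET.fromstring(with_comment).text
print(text_seen, parses(nested), parses(before_declaration), parses(inside_attribute))
```

The nested comment, the comment placed before the XML declaration, and the comment inside an attribute value should all be rejected.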
This chapter describes the XML Character Entities. Before we understand the
Character Entities, let us first understand what an XML entity is.
"The document entity serves as the root of the entity tree and a starting-point for
an XML processor".
This means, entities are the placeholders in XML. These can be declared in the
document prolog or in a DTD. There are different types of entities and in this
chapter we will discuss Character Entity.
Both, HTML and XML, have some symbols reserved for their use, which cannot be
used as content in XML code. For example, < and > signs are used for opening
and closing XML tags. To display these special characters, the character entities
are used.
There are few special characters or symbols which are not available to be typed
directly from the keyboard. Character Entities can also be used to display those
symbols/special characters.
They are introduced to avoid ambiguity while using some symbols. For
example, an ambiguity arises when the less than ( < ) or greater than ( > )
symbol is used in content, because the same symbols delimit tags (<>). Character
entities let you write such characters without the parser reading them as markup.
Following is a list of pre-defined character entities from the XML
specification. These can be used to express characters without ambiguity.
Ampersand − &amp;
Single quote − &apos;
Greater than − &gt;
Less than − &lt;
Double quote − &quot;
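When XML is generated by a program, these replacements are usually applied by a helper function rather than by hand. The sketch below uses escape from Python's standard xml.sax.saxutils module, which replaces &, < and > by default and accepts a dictionary for the two quote characters; the sample strings are made up.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "5 < 7 & 7 > 5"
escaped = escape(raw)

# The quote characters are only replaced when requested explicitly.
quotes = {'"': "&quot;", "'": "&apos;"}
escaped_quotes = escape("say \"hi\" & 'bye'", quotes)

# Parsing the escaped text gives the original characters back.
round_trip = ET.fromstring("<t>" + escaped + "</t>").text
print(escaped)
print(escaped_quotes)
print(round_trip == raw)
```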
Numeric Character Entities
The following table lists some predefined character entities with their numeric
values −
For example −
In this chapter, we will discuss the XML CDATA section. The term CDATA means
Character Data. A CDATA section is a block of text that the parser does not
interpret as markup, even if it contains characters that would otherwise be
recognized as markup.
The predefined entities such as &lt;, &gt;, and &amp; require typing and are
generally difficult to read in the markup. In such cases, a CDATA section can be
used. By using a CDATA section, you are telling the parser that the particular
section of the document contains no markup and should be treated as regular
text.
Syntax
Following is the syntax for CDATA section −
<![CDATA[
characters with markup
]]>
Example
The following markup code shows an example of CDATA. Here, each character
written inside the CDATA section is treated as character data and is not
interpreted as markup by the parser.
<script>
<![CDATA[
<message> Welcome to TutorialsPoint </message>
]]>
</script>
CDATA Rules
The given rules are required to be followed for XML CDATA −
A CDATA section cannot contain the string "]]>", because that string ends the section.
Nesting is not allowed in CDATA section.
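The effect of a CDATA section can be seen by parsing the example and inspecting the result. This sketch assumes Python's standard xml.etree.ElementTree module and writes the example on a single line.

```python
import xml.etree.ElementTree as ET

document = (
    "<script><![CDATA[<message> Welcome to TutorialsPoint </message>]]></script>"
)
script = ET.fromstring(document)

content = script.text        # the CDATA content, delivered as plain text
child_count = len(script)    # no <message> child element was created

# A space inside the closing delimiter means the section is never closed.
try:
    ET.fromstring("<script><![CDATA[text]] ></script>")
    unclosed_accepted = True
except ET.ParseError:
    unclosed_accepted = False

print(content, child_count, unclosed_accepted)
```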
XML - WhiteSpaces
<name>TanmayPatil</name>
and
<name>Tanmay Patil</name>
Insignificant Whitespace
Insignificant whitespace means the space where only element content is allowed.
For example −
<address.category = "residence">
or
<address....category = "..residence">
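The above examples are equivalent as far as the whitespace between address and
category is concerned. Here, the space is represented by dots (.). The space
between address and category is insignificant, whereas any space inside the quoted
attribute value is part of the value.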
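The difference between the two kinds of whitespace can be checked with a parser. The sketch below assumes Python's standard xml.etree.ElementTree module and uses real spaces where the examples above use dots.

```python
import xml.etree.ElementTree as ET

# Whitespace between the element name and the attribute is insignificant.
one_space = ET.fromstring('<address category = "residence"/>')
many_spaces = ET.fromstring('<address     category = "residence"/>')
same_attributes = one_space.attrib == many_spaces.attrib

# Whitespace inside element content is significant.
joined = ET.fromstring("<name>TanmayPatil</name>").text
spaced = ET.fromstring("<name>Tanmay Patil</name>").text

# Whitespace inside a quoted attribute value is kept as part of the value
# (no DTD is present here, so the value is treated as plain character data).
padded = ET.fromstring('<address category = "  residence"/>').get("category")

print(same_attributes, joined == spaced, repr(padded))
```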
Where,
The value default signals that the default whitespace processing modes of
an application are acceptable for this element.
The value preserve tells the application to preserve all the whitespace.
XML - Processing
This chapter describes the Processing Instructions (PIs). As defined by the XML 1.0
Recommendation,
Processing instructions (PIs) can be used to pass information to applications. PIs
can appear anywhere in the document outside the markup. They can appear in
the prolog, including the document type definition (DTD), in textual content, or
after the document.
Syntax
Following is the syntax of PI − <?target instructions?>
Where
A PI starts with a special tag <? and ends with ?>. Processing of the contents ends
immediately after the string ?> is encountered.
Example
PIs are rarely used. They are mostly used to link an XML document to a style sheet.
Following is an example −
In this case, a browser recognizes the target by indicating that the XML should be
transformed before being shown; the first attribute states that the type of the
transform is XSL and the second attribute points to its location.
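A parser passes the target and the remaining text of a PI to the application without interpreting them. The following sketch reads both parts with Python's standard xml.dom.minidom module; the file name style.xsl and the <root/> element are assumptions made for this illustration.

```python
from xml.dom import minidom

source = '<?xml-stylesheet type = "text/xsl" href = "style.xsl"?><root/>'
document = minidom.parseString(source)

# The PI appears in the prolog, so it is the first child of the document node.
pi = document.firstChild
is_pi = pi.nodeType == pi.PROCESSING_INSTRUCTION_NODE
target = pi.target   # the name right after "<?"
data = pi.data       # everything between the target and "?>"

print(is_pi, target, data)
```

Note that the parser does not split the data into attributes; interpreting it is left to the application that recognizes the target.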
A PI can contain any data except the combination ?>, which is interpreted as the
closing delimiter. Here are two examples of valid PIs −
<?welcome?>
XML - Encoding
Encoding is the process of converting Unicode characters into their equivalent
binary representation. When the XML processor reads an XML document, it
decodes the document according to its encoding. Hence, we need to
specify the type of encoding in the XML declaration.
Encoding Types
There are mainly two types of encoding −
UTF-8
UTF-16
UTF stands for UCS Transformation Format, and UCS itself means Universal
Character Set. The number 8 or 16 refers to the size in bits of the code unit
used to build up a character. A character takes 1 to 4 bytes in UTF-8 and 2 or 4
bytes in UTF-16. For documents without encoding information, UTF-8 is assumed
by default.
Syntax
Encoding type is included in the prolog section of the XML document. The syntax
for UTF-8 encoding is as follows −
Example
XML files encoded with UTF-8 tend to be smaller in size than those encoded
with UTF-16 when the text is mostly ASCII, as markup usually is.
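The byte counts behind this comparison can be reproduced in a few lines. The sketch below uses only Python's built-in str.encode; the sample characters are arbitrary choices that cover the 1-, 2-, 3- and 4-byte cases of UTF-8, and utf-16-le is used so that no byte order mark is counted.

```python
# Characters chosen to cover 1-, 2-, 3- and 4-byte UTF-8 sequences.
samples = {
    "A": "A",
    "e-acute": "\u00e9",
    "euro": "\u20ac",
    "emoji": "\U0001F600",
}


def byte_lengths(character):
    """Return (bytes in UTF-8, bytes in UTF-16) for one character."""
    return len(character.encode("utf-8")), len(character.encode("utf-16-le"))


sizes = {label: byte_lengths(ch) for label, ch in samples.items()}

# A mostly-ASCII fragment is twice as large in UTF-16 as in UTF-8.
fragment = "<name>Tanmay Patil</name>"
utf8_size = len(fragment.encode("utf-8"))
utf16_size = len(fragment.encode("utf-16-le"))

print(sizes, utf8_size, utf16_size)
```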
XML - Validation
Non DTD XML files must use the predefined character entities
for amp(&), apos(single quote), gt(>), lt(<), quot(double quote).
It must follow the ordering of the tag. i.e., the inner tag must be closed before
closing the outer tag.
Each of its opening tags must have a closing tag or it must be a self ending
tag (<title>....</title> or <title/>).
An attribute name must appear only once in a start tag, and its value needs to be
quoted.
amp(&), apos(single quote), gt(>), lt(<), quot(double quote) entities other than
these must be declared.
Example
<address>
<name>Tanmay Patil</name>
<company>TutorialsPoint</company>
<phone>(011) 123-4567</phone>
</address>
It defines the type of document. Here, the document type is element type.
It includes a root element named as address.
Each of the child elements among name, company and phone is enclosed in
its self explanatory tag.
Order of the tags is maintained.
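One way to confirm that the example is well-formed is to parse it, and then to break it on purpose to see where the parser objects. This is a sketch assuming Python's standard xml.etree.ElementTree module; removing the </name> end tag is an arbitrary way of violating the nesting rule.

```python
import xml.etree.ElementTree as ET

ADDRESS = """<address>
   <name>Tanmay Patil</name>
   <company>TutorialsPoint</company>
   <phone>(011) 123-4567</phone>
</address>"""

root = ET.fromstring(ADDRESS)
child_tags = [child.tag for child in root]   # children in document order

# Drop the </name> end tag so that <name> is still open when </address> arrives.
broken = ADDRESS.replace("</name>", "")
try:
    ET.fromstring(broken)
    error_position = None
except ET.ParseError as error:
    error_position = error.position          # (line, column) of the problem

print(root.tag, child_tags, error_position)
```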
XML - DTDs
An XML DTD can be either specified inside the document, or it can be kept in a
separate document and then linked separately.
Syntax
Basic syntax of a DTD is as follows −
Internal DTD
A DTD is referred to as an internal DTD if elements are declared within the XML
file. With an internal DTD, the standalone attribute in the XML declaration may be
set to yes, which means the declaration works independently of an external source.
Syntax
<address>
<name>Tanmay Patil</name>
<company>TutorialsPoint</company>
<phone>(011) 123-4567</phone>
</address>
Start Declaration − Begin the XML declaration with the following statement.
DTD − Immediately after the XML header, the document type declaration follows,
commonly referred to as the DOCTYPE −
<!DOCTYPE address [
The DOCTYPE declaration has an exclamation mark (!) at the start of the element
name. The DOCTYPE informs the parser that a DTD is associated with this XML
document.
DTD Body − The DOCTYPE declaration is followed by body of the DTD, where you
declare elements, attributes, entities, and notations.
Several elements are declared here that make up the vocabulary of the <name>
document. <!ELEMENT name (#PCDATA)> defines the element name to be of
type "#PCDATA". Here #PCDATA means parse-able text data.
End Declaration − Finally, the declaration section of the DTD is closed using a
closing bracket and a closing angle bracket (]>). This effectively ends the
definition, and thereafter, the XML document follows immediately.
Rules
The document type declaration must appear at the start of the document
(preceded only by the XML header) − it is not permitted anywhere else within
the document.
Similar to the DOCTYPE declaration, the element declarations must start with
an exclamation mark.
The Name in the document type declaration must match the element type of
the root element.
External DTD
In an external DTD, elements are declared outside the XML file. They are accessed by
specifying the system attributes, which may be either a legal .dtd file or a valid
URL. With an external DTD, the standalone attribute in the XML declaration is
set to no, which means the declaration includes information from the external
source.
Syntax
Example
Types
You can refer to an external DTD by using either system identifiers or public
identifiers.
System Identifiers
As you can see, it contains keyword SYSTEM and a URI reference pointing to the
location of the document.
Public Identifiers
XML - Schemas
Syntax
You need to declare a schema in your XML document as follows −
Example
The following example shows how to use schema −
The basic idea behind XML Schemas is that they describe the legitimate format
that an XML document can take.
Elements
As we saw in the XML - Elements chapter, elements are the building blocks of XML
document. An element can be defined within an XSD as follows −
Definition Types
You can define XML schema elements in the following ways −
Simple Type
Simple type element is used only in the context of the text. Some of the
predefined simple types are: xs:integer, xs:boolean, xs:string, xs:date. For
example −
Complex Type
A complex type is a container for other element definitions. This allows you to
specify which child elements an element can contain and to provide some
structure within your XML documents. For example −
Global Types
With the global type, you can define a single type in your document, which can be
used by all other references. For example, suppose you want to generalize
the person and company for different addresses of the company. In such case,
you can define a general type as follows −
<xs:complexType name = "AddressType">
   <xs:sequence>
      <xs:element name = "name" type = "xs:string" />
      <xs:element name = "company" type = "xs:string" />
   </xs:sequence>
</xs:complexType>
Instead of having to define the name and the company twice (once
for Address1 and once for Address2), we now have a single definition. This makes
maintenance simpler, i.e., if you decide to add "Postcode" elements to the
address, you need to add them at just one place.
Attributes
Attributes in XSD provide extra information within an element. Attributes
have name and type property as shown below −
The tree structure contains root (parent) elements, child elements and so on. By
using the tree structure, you can get to know all succeeding branches and
sub-branches starting from the root. The parsing starts at the root, then moves down
the first branch to an element, takes the first branch from there, and so on to the
leaf nodes.
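The traversal order described above (root first, then down each branch to its leaves) can be sketched as a short recursive walk. The example assumes Python's standard xml.etree.ElementTree module; the element names follow the company example discussed in this chapter, and the "..." values are placeholders rather than real data.

```python
import xml.etree.ElementTree as ET

COMPANY = """<company>
   <Employee>
      <FirstName>...</FirstName>
      <LastName>...</LastName>
      <ContactNo>...</ContactNo>
      <Email>...</Email>
      <Address>
         <City>...</City>
         <State>...</State>
         <Zip>...</Zip>
      </Address>
   </Employee>
</company>"""


def walk(element, depth=0):
    """Yield (depth, tag) pairs, visiting each branch down to its leaves."""
    yield depth, element.tag
    for child in element:
        yield from walk(child, depth + 1)


outline = list(walk(ET.fromstring(COMPANY)))
for depth, tag in outline:
    print("   " * depth + tag)
```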
Example
Following example demonstrates simple XML tree structure −
In the above diagram, there is a root element named as <company>. Inside that,
there is one more element <Employee>. Inside the employee element, there are
five branches named <FirstName>, <LastName>, <ContactNo>, <Email>, and
<Address>. Inside the <Address> element, there are three sub-branches, named
<City>, <State>, and <Zip>.
XML - DOM
The Document Object Model (DOM) is the foundation of XML. XML documents
have a hierarchy of informational units called nodes; DOM is a way of describing
those nodes and the relationships between them.
The hierarchy lets a developer navigate the tree to look
for specific information. Because it is based on a hierarchy of information, the
DOM is said to be tree based.
The XML DOM also provides an API that allows a developer to
add, edit, move, or remove nodes in the tree at any point in order to create an
application.
Example
The following example (sample.htm) parses an XML document ("address.xml")
into an XML DOM object and then extracts some information from it with
JavaScript −
<!DOCTYPE html>
<html>
<body>
<h1>TutorialsPoint DOM example </h1>
<div>
<b>Name:</b> <span id = "name"></span><br>
<b>Company:</b> <span id = "company"></span><br>
<b>Phone:</b> <span id = "phone"></span>
</div>
<script>
// Create the request object (ActiveXObject is only needed for IE6 and IE5).
var xmlhttp;
if (window.XMLHttpRequest) {
   xmlhttp = new XMLHttpRequest();
} else {
   xmlhttp = new ActiveXObject("Microsoft.XMLHTTP");
}
// Synchronous request, kept for simplicity; responseXML holds the parsed DOM.
xmlhttp.open("GET", "/xml/address.xml", false);
xmlhttp.send();
var xmlDoc = xmlhttp.responseXML;

// Copy the text of the first matching element into the span with the same id.
function show(tagName) {
   document.getElementById(tagName).innerHTML =
      xmlDoc.getElementsByTagName(tagName)[0].childNodes[0].nodeValue;
}
show("name");
show("company");
show("phone");
</script>
</body>
</html>
Now let us keep these two files sample.htm and address.xml in the same
directory /xml and execute the sample.htm file by opening it in any browser. This
should produce the following output.
Here, you can see how each of the child nodes is extracted to display their values.
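The same DOM calls are available outside the browser. The sketch below repeats the lookup with Python's standard xml.dom.minidom module and then uses the DOM API to add and remove a node; the address document is taken from the validation chapter, while the postcode element and its value are hypothetical.

```python
from xml.dom import minidom

ADDRESS = (
    "<address>"
    "<name>Tanmay Patil</name>"
    "<company>TutorialsPoint</company>"
    "<phone>(011) 123-4567</phone>"
    "</address>"
)
document = minidom.parseString(ADDRESS)


def text_of(tag_name):
    """Text of the first element with the given tag name."""
    return document.getElementsByTagName(tag_name)[0].childNodes[0].nodeValue


name = text_of("name")
phone = text_of("phone")

# Add a node: create the element, give it a text child, attach it to the root.
postcode = document.createElement("postcode")
postcode.appendChild(document.createTextNode("00000"))
document.documentElement.appendChild(postcode)

# Remove a node: detach the phone element from its parent.
phone_node = document.getElementsByTagName("phone")[0]
document.documentElement.removeChild(phone_node)

remaining = [node.tagName for node in document.documentElement.childNodes]
print(name, phone, remaining)
```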
XML - Namespaces
Namespace Declaration
A Namespace is declared using reserved attributes. Such an attribute name must
either be xmlns or begin with xmlns: shown as below −
Syntax
The Namespace starts with the keyword xmlns.
The word name is the Namespace prefix.
The URL is the Namespace identifier.
Example
Namespace affects only a limited area in the document. An element containing
the declaration and all of its descendants are in the scope of the Namespace.
Following is a simple example of XML Namespace −
Here, the Namespace prefix is cont, and the Namespace identifier (URI)
is www.tutorialspoint.com/profile. This means the element names and attribute
names with the cont prefix (including the contact element) all belong to
the www.tutorialspoint.com/profile namespace.
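A namespace-aware parser identifies elements by the namespace identifier, not by the prefix. The following sketch assumes Python's standard xml.etree.ElementTree module; the contact document is an assumed reconstruction based on the description above, and the prefix c used for the lookup is deliberately different from cont.

```python
import xml.etree.ElementTree as ET

CONTACT = """<cont:contact xmlns:cont = "www.tutorialspoint.com/profile">
   <cont:name>Tanmay Patil</cont:name>
   <cont:company>TutorialsPoint</cont:company>
   <cont:phone>(011) 123-4567</cont:phone>
</cont:contact>"""

root = ET.fromstring(CONTACT)

# ElementTree stores the namespace identifier in braces in front of the local name.
qualified_tag = root.tag

# Any prefix can be used for lookups as long as it maps to the same identifier.
namespaces = {"c": "www.tutorialspoint.com/profile"}
name = root.find("c:name", namespaces).text

print(qualified_tag, name)
```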
XML - Databases
An XML database is used to store large amounts of information in the XML format. As
the use of XML is increasing in every field, a secure place is required to
store XML documents. The data stored in the database can be queried
using XQuery, serialized, and exported into a desired format.
There are two major types of XML databases −
XML-enabled
Native XML (NXD)
Example
Following example demonstrates XML database −
<contact2>
<name>Manisha Patil</name>
<company>TutorialsPoint</company>
<phone>(011) 789-4567</phone>
</contact2>
</contact-info>
Here, a table of contacts is created that holds the records of contacts (contact1
and contact2), each of which consists of three elements − name,
company and phone.
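Querying such a document can be sketched without a database server. The example below uses the limited XPath support in Python's standard xml.etree.ElementTree module as a stand-in for XQuery, which the standard library does not provide; the contact1 record is an assumption that reuses the address data from earlier chapters.

```python
import xml.etree.ElementTree as ET

CONTACTS = """<contact-info>
   <contact1>
      <name>Tanmay Patil</name>
      <company>TutorialsPoint</company>
      <phone>(011) 123-4567</phone>
   </contact1>
   <contact2>
      <name>Manisha Patil</name>
      <company>TutorialsPoint</company>
      <phone>(011) 789-4567</phone>
   </contact2>
</contact-info>"""

root = ET.fromstring(CONTACTS)

# Build a name -> phone lookup from every contact record.
phones = {contact.findtext("name"): contact.findtext("phone") for contact in root}

# Select the record whose <name> child has the given text.
match = root.find("./*[name='Manisha Patil']")

print(phones)
print(match.tag, match.findtext("phone"))
```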