XML
Introduction
XML stands for Extensible Markup Language. It is a text-based
markup language derived from Standard Generalized Markup
Language (SGML).
XML tags identify the data and are used to store and organize
it, whereas HTML tags specify how the data is displayed. XML is
not going to replace HTML in the near future, but it introduces
new possibilities by adopting many successful features of HTML.
There are three important characteristics of XML that make it
useful in a variety of systems and solutions −
XML is extensible − XML allows you to create your own self-
descriptive tags, or language, that suits your application.
XML carries the data, does not present it − XML allows you to
store the data irrespective of how it will be presented.
XML is a public standard − XML was developed by an organization
called the World Wide Web Consortium (W3C) and is available as an
open standard.
XML Usage
XML can work behind the scenes to simplify the
creation of HTML documents for large web sites.
XML can be used to exchange information
between organizations and systems.
XML can be used to store and arrange data in a way
that suits your data handling needs.
XML can easily be merged with style sheets to create
almost any desired output.
Virtually any type of data can be expressed as an XML
document.
What is Markup?
XML is a markup language that defines a set of rules for
encoding documents in a format that is both human-readable
and machine-readable.
So what exactly is a markup language?
Markup is information added to a document that
enhances its meaning in certain ways, in that it
identifies the parts and how they relate to each other.
More specifically, a markup language is a set of
symbols that can be placed in the text of a document
to demarcate and label the parts of that document.
Example
<message>
<text>Hello, world!</text>
</message>
Is XML a Programming Language?
A programming language consists of grammar rules
and its own vocabulary which is used to create
computer programs. These programs instruct the
computer to perform specific tasks.
XML does not qualify as a programming language
because it does not perform any computation or express algorithms.
It is usually stored in a simple text file and is processed
by special software that is capable of interpreting
XML.
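As a small illustration of "special software that is capable of interpreting XML", the sketch below reads the message example with Python's standard xml.etree.ElementTree module. The choice of Python and the helper name read_message are assumptions made for illustration; any XML parser would serve equally well.

```python
import xml.etree.ElementTree as ET

# The markup example from this section, held as a plain string.
MESSAGE_XML = "<message><text>Hello, world!</text></message>"


def read_message(xml_text):
    """Parse the document and return (root tag, text of the <text> child)."""
    root = ET.fromstring(xml_text)  # the parser does the work, not the XML
    return root.tag, root.find("text").text


if __name__ == "__main__":
    print(read_message(MESSAGE_XML))
```

The XML only labels the data; everything that "happens" is done by the program that reads it.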
XML - Syntax
<?xml version = "1.0"?>
<contact-info>
<name>Tanmay Patil</name>
<company>TutorialsPoint</company>
<phone>(011) 123-4567</phone>
</contact-info>
Example 2
<?xml version = "1.0"?>
<contact-info>
<company>TutorialsPoint</company>
</contact-info>
Syntax Rules for Tags and Elements
Root Element − An XML document can have only
one root element. For example, the following is not a
correct XML document, because both
the x and y elements occur at the top level without
a root element −
<x>...</x>
<y>...</y>
The following example shows a correctly formed
XML document −
<root>
<x>...</x>
<y>...</y>
</root>
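The root element rule can be checked with any conforming parser. A minimal sketch, assuming Python's standard xml.etree.ElementTree module (the helper name parses_ok is hypothetical):

```python
import xml.etree.ElementTree as ET


def parses_ok(xml_text):
    """Return True if the parser accepts the text as one XML document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        # Raised for a second top-level element, among other syntax errors.
        return False


if __name__ == "__main__":
    print(parses_ok("<x>...</x>\n<y>...</y>"))               # two top-level elements
    print(parses_ok("<root><x>...</x><y>...</y></root>"))    # a single root element
```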
Case Sensitivity − The names of XML
elements are case-sensitive. That means the
names in the start tag and the end tag must
be in exactly the same case.
For example, <contact-info> is different
from <Contact-Info>.
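A short sketch of what case sensitivity means in practice, again assuming Python's xml.etree.ElementTree (the helper name tags_match is hypothetical): a start tag and an end tag that differ only in case do not pair up, so the parser rejects the document.

```python
import xml.etree.ElementTree as ET


def tags_match(xml_text):
    """Return True if every start tag is closed by an identically cased end tag."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print(tags_match("<contact-info></contact-info>"))   # same case on both tags
    print(tags_match("<contact-info></Contact-Info>"))   # case differs
```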
XML Attributes
An attribute specifies a single property for
the element, using a name/value pair.
An XML element can have one or more
attributes.
For example −
<a href = "http://www.tutorialspoint.com/">Tutorialspoint!</a>
Here href is the attribute name
and http://www.tutorialspoint.com/ is the
attribute value.
Syntax Rules for XML Attributes
• Attribute names in XML (unlike HTML) are
case sensitive.
• That is, HREF and href are considered two
different XML attributes.
• The same attribute cannot appear more than once
in a single start tag.
• Attribute names are written without quotation
marks, whereas attribute values must always
appear in quotation marks.
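The attribute rules above can be observed with a parser. This is a sketch assuming Python's xml.etree.ElementTree; the helper name attributes_of and the sample values are illustrative only.

```python
import xml.etree.ElementTree as ET


def attributes_of(xml_text):
    """Return the root element's attributes as a dict, or None if the text is rejected."""
    try:
        return dict(ET.fromstring(xml_text).attrib)
    except ET.ParseError:
        return None


if __name__ == "__main__":
    # HREF and href are two different attributes, so both are kept.
    print(attributes_of('<a HREF="one" href="two"/>'))
    # The same attribute name twice in one start tag is an error.
    print(attributes_of('<a href="one" href="two"/>'))
    # An unquoted attribute value is an error.
    print(attributes_of("<a href=one/>"))
```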
XML References
References usually allow you to add or include
additional text or markup in an XML document.
References always begin with the symbol "&" which is
a reserved character and end with the symbol ";".
XML has two types of references −
1. Entity References − An entity reference contains a
name between the start and the end delimiters. For
example, &amp; where amp is the name. The name refers
to a predefined string of text and/or markup.
2. Character References − These contain a hash mark (“#”)
followed by a number, as in &#65;. The number always
refers to the Unicode code point of a character. In this
case, 65 refers to the letter "A".
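To see both kinds of reference resolved, the sketch below parses a tiny document and reads back its text. It assumes Python's xml.etree.ElementTree; the element name t and the helper name text_of are made up for the example.

```python
import xml.etree.ElementTree as ET


def text_of(xml_text):
    """Parse the text and return the root element's character data."""
    return ET.fromstring(xml_text).text


if __name__ == "__main__":
    # &amp; is an entity reference, &#65; is a character reference.
    print(text_of("<t>Q&amp;&#65;</t>"))
```

The parser replaces each reference with the character it stands for before handing the text to the application.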
XML Text
The names of XML-elements and XML-attributes are
case-sensitive, which means the name of start and end
elements need to be written in the same case.
To avoid character encoding problems, all XML files
should be saved as Unicode UTF-8 or UTF-16 files.
Some characters are reserved by the XML syntax itself.
Hence, they cannot be used directly.
To use them, some replacement-entities are used,
which are listed below −
Not Allowed Character   Replacement Entity   Character Description
<                       &lt;                 less than
>                       &gt;                 greater than
&                       &amp;                ampersand
'                       &apos;               apostrophe
"                       &quot;               quotation mark
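Replacing reserved characters by hand is error-prone, so most toolkits provide a helper. A minimal sketch, assuming Python's standard xml.sax.saxutils.escape function; the wrapper name escape_text and the QUOTES mapping are introduced here for illustration.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

# escape() always handles &, < and >; quotes are passed as extra replacements.
QUOTES = {"'": "&apos;", '"': "&quot;"}


def escape_text(raw):
    """Replace the five reserved characters with their entity references."""
    return escape(raw, QUOTES)


if __name__ == "__main__":
    raw = "5 < 6 & 'x'"
    escaped = escape_text(raw)
    print(escaped)
    # Parsing the escaped text gives back the original characters.
    print(ET.fromstring("<t>" + escaped + "</t>").text == raw)
```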
XML - Declaration
<?xml
version = "version_number"
encoding = "encoding_declaration"
standalone = "standalone_status"
?>
Parameter − Parameter_value − Parameter_description
version − 1.0 − Specifies the version of the XML standard
used.
encoding − UTF-8, UTF-16, ISO-10646-UCS-2, ISO-10646-UCS-4,
ISO-8859-1 to ISO-8859-9, ISO-2022-JP, Shift_JIS, EUC-JP −
Defines the character encoding used in the document. UTF-8 is
the default encoding.
standalone − yes or no − Informs the parser whether the
document relies on information from an external source, such
as an external document type definition (DTD), for its
content. The default value is no. Setting it to yes tells the
processor that no external declarations are required for
parsing the document.
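Libraries usually write the declaration for you. The sketch below assumes Python 3.8 or later, where xml.etree.ElementTree.tostring accepts an xml_declaration argument; the helper name serialize_with_declaration is hypothetical. Note that this particular writer emits only the version and encoding parameters, not standalone.

```python
import xml.etree.ElementTree as ET


def serialize_with_declaration(root, encoding="UTF-8"):
    """Serialize an element to bytes, preceded by an XML declaration."""
    return ET.tostring(root, encoding=encoding, xml_declaration=True)


if __name__ == "__main__":
    root = ET.Element("contact-info")
    ET.SubElement(root, "company").text = "TutorialsPoint"
    print(serialize_with_declaration(root).decode("utf-8"))
```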
XML - Tags
XML tags form the foundation of XML.
They define the scope of an element in XML.
They can also be used to insert comments, declare
settings required for parsing the environment, and to
insert special instructions.
Start Tag
The beginning of every non-empty XML element is marked by a
start-tag. Following is an example of start-tag −
<address>
End Tag
Every element that has a start tag should end with an end-tag.
Following is an example of end-tag −
</address>
Note that the end tag includes a solidus ("/") before the
name of the element.
Empty Tag
The text that appears between start-tag and end-tag is called
content. An element which has no content is termed as empty.
An empty element can be represented in two ways as follows −
A start-tag immediately followed by an end-tag as shown below −
<hr></hr>
A complete empty-element tag is as shown below −
<hr />
Empty-element tags may be used for any element which has
no content.
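The two spellings of an empty element are equivalent to a parser. A quick sketch, assuming Python's xml.etree.ElementTree (the helper name is_empty is made up for the example):

```python
import xml.etree.ElementTree as ET


def is_empty(xml_text):
    """Return True if the root element has neither text nor child elements."""
    elem = ET.fromstring(xml_text)
    return len(elem) == 0 and elem.text is None


if __name__ == "__main__":
    print(is_empty("<hr></hr>"))    # start-tag followed by end-tag
    print(is_empty("<hr />"))       # empty-element tag
    print(is_empty("<hr>x</hr>"))   # has content, so not empty
```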
XML - Elements
XML elements can be defined as building blocks of an
XML document. Elements can behave as containers to hold text,
elements, attributes, media objects or all of these.
Each XML document contains one or more elements,
the scope of which is either delimited by start and
end tags or, for empty elements, by an empty-element
tag.
Syntax
Following is the syntax to write an XML element −
<element-name attribute1 attribute2> ....content </element-name>
XML - Attributes
Attributes are part of XML elements. An element can
have multiple unique attributes. Attributes give more
information about XML elements. To be more precise,
they define properties of elements. An XML attribute is
always a name-value pair.
Syntax
An XML attribute has the following syntax −
<element-name attribute1 attribute2>
....content..
</element-name>
where attribute1 and attribute2 have the following form −
name = "value"
The value has to be in double (" ") or single (' ') quotes.
Here, attribute1 and attribute2 are unique attribute
labels.
Example
<?xml version = "1.0" encoding = "UTF-8"?>
<!DOCTYPE garden [
<!ELEMENT garden (plants)*>
<!ELEMENT plants (#PCDATA)>
<!ATTLIST plants category CDATA #REQUIRED>
]>
<garden>
<plants category = "flowers" />
<plants category = "shrubs">
</plants>
</garden>
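Reading the attributes of the garden document might look like the sketch below. It assumes Python's xml.etree.ElementTree, which parses the internal DTD but does not validate against it; the helper name categories is hypothetical.

```python
import xml.etree.ElementTree as ET

# The garden example from this section, as bytes so the declared
# encoding is honoured by the parser.
GARDEN_XML = b"""<?xml version = "1.0" encoding = "UTF-8"?>
<!DOCTYPE garden [
   <!ELEMENT garden (plants)*>
   <!ELEMENT plants (#PCDATA)>
   <!ATTLIST plants category CDATA #REQUIRED>
]>
<garden>
   <plants category = "flowers" />
   <plants category = "shrubs">
   </plants>
</garden>
"""


def categories(xml_bytes):
    """Return the category attribute of every <plants> element, in order."""
    root = ET.fromstring(xml_bytes)
    return [plant.get("category") for plant in root.findall("plants")]


if __name__ == "__main__":
    print(categories(GARDEN_XML))
```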
XML - Comments
XML comments are similar to HTML comments. They
are added as notes or lines for understanding
the purpose of XML code.
Comments can be used to include related links,
information, and terms. They are visible only in the
source code; an XML processor does not treat them as part
of the document's content. Comments may appear almost
anywhere in XML code.
Syntax
XML comment has the following syntax −
<!-- Your comment -->
A comment starts with <!-- and ends with -->. You can
add textual notes as comments between these delimiters.
The string -- must not appear inside a comment, and you
must not nest one comment inside another.
XML Comments Rules
The following rules apply to XML
comments −
Comments cannot appear before the XML declaration.
Comments may appear anywhere else in a document, outside other markup.
Comments must not appear within attribute values.
Comments cannot be nested inside other comments.
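Some of these rules can be checked directly with a parser. The sketch assumes Python's xml.etree.ElementTree, which by default discards comments while parsing; the helper name text_or_none is hypothetical.

```python
import xml.etree.ElementTree as ET


def text_or_none(xml_text):
    """Return all character data of the document, or None if it is rejected."""
    try:
        return "".join(ET.fromstring(xml_text).itertext())
    except ET.ParseError:
        return None


if __name__ == "__main__":
    # A well-placed comment is skipped; only the text survives.
    print(text_or_none("<a><!-- a note -->text</a>"))
    # A comment before the XML declaration is rejected.
    print(text_or_none('<!-- note --><?xml version="1.0"?><a>text</a>'))
    # A double hyphen inside a comment is rejected.
    print(text_or_none("<a><!-- bad -- comment -->text</a>"))
```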
XML - Character Entities
There are a few special characters or symbols which are
not available to be typed directly from the keyboard.
Character entities can be used to display those
symbols/special characters, as well as the characters
reserved by XML syntax −
Ampersand − &amp;
Single quote − &apos;
Greater than − &gt;
Less than − &lt;
Double quote − &quot;
Numeric Character Entities
The numeric reference is used to refer to a
character entity.
Numeric reference can either be in decimal or
hexadecimal format.
As there are thousands of numeric references
available, these are a bit hard to remember.
Numeric reference refers to the character by its
number in the Unicode character set.
General syntax for decimal numeric reference is −
&# decimal number ;
General syntax for hexadecimal numeric reference is −
&#x Hexadecimal number ;
Entity name   Character   Decimal reference   Hexadecimal reference
quot          "           &#34;               &#x22;
amp           &           &#38;               &#x26;
apos          '           &#39;               &#x27;
lt            <           &#60;               &#x3C;
gt            >           &#62;               &#x3E;
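Since numeric references are hard to remember, they are easy to generate from a character's Unicode number. A small sketch in Python (the helper name numeric_references is made up for the example):

```python
import xml.etree.ElementTree as ET


def numeric_references(char):
    """Return the (decimal, hexadecimal) numeric references for one character."""
    code = ord(char)  # the character's number in the Unicode character set
    return f"&#{code};", f"&#x{code:X};"


if __name__ == "__main__":
    decimal_ref, hex_ref = numeric_references("<")
    print(decimal_ref, hex_ref)
    # Both forms resolve to the same character when parsed.
    print(ET.fromstring(f"<t>{decimal_ref}{hex_ref}</t>").text)
```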
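XML - CDATA Sections
<![CDATA[
characters with markup
]]>
The above syntax is composed of three sections −
CDATA Start section − CDATA begins with the nine-character
delimiter <![CDATA[
CDATA End section − CDATA ends with the delimiter ]]>
CData section − Characters between these two delimiters are
interpreted as characters, and not as markup.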
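The effect of a CDATA section is easiest to see by parsing one. The sketch assumes Python's xml.etree.ElementTree; the script element and its content are illustrative, not taken from a real document.

```python
import xml.etree.ElementTree as ET

# Markup-looking characters inside CDATA are kept as plain text.
CDATA_XML = "<script><![CDATA[<message> Welcome </message>]]></script>"


def cdata_text(xml_text):
    """Return (text of the root element, number of child elements)."""
    root = ET.fromstring(xml_text)
    return root.text, len(root)


if __name__ == "__main__":
    print(cdata_text(CDATA_XML))
```

The inner `<message>` is returned as text and no child element is created, which is the point of the CDATA delimiters.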
XML - Processing Instructions
Syntax
Following is the syntax of a processing instruction (PI) −
<?target instructions?>
Where
target − Identifies the application to which the instruction is
directed.
instructions − A character string that describes the information for
the application to process.
A PI starts with the delimiter <? and ends with ?>. Processing of the
contents ends immediately after the string ?> is encountered.
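A processing instruction can also be built programmatically. This sketch assumes Python's xml.etree.ElementTree.ProcessingInstruction factory; the xml-stylesheet target and the style.css file name are example values, and build_pi is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def build_pi(target, instructions):
    """Create a processing instruction and return it serialized as text."""
    pi = ET.ProcessingInstruction(target, instructions)
    return ET.tostring(pi, encoding="unicode")


if __name__ == "__main__":
    print(build_pi("xml-stylesheet", 'href="style.css" type="text/css"'))
```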
XML - Encoding
Encoding is the process of converting Unicode characters into
their equivalent binary representation.
When the XML processor reads an XML document, it decodes the
document according to the declared encoding.
Hence, we need to specify the type of encoding in the XML
declaration.
Encoding Types
There are mainly two types of encoding −
UTF-8
UTF-16
UTF stands for UCS Transformation Format, and UCS itself
means Universal Character Set.
The number 8 or 16 refers to the size in bits of the code unit
used to represent characters.
A character takes one to four bytes in UTF-8 and two or four
bytes in UTF-16. For documents
without encoding information, UTF-8 is assumed by default.
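The difference between the two encodings shows up in the number of bytes each character needs. A minimal sketch in Python, using the built-in str.encode method (the helper name byte_lengths is made up; utf-16-le is used so that no byte order mark is counted):

```python
def byte_lengths(char):
    """Return (bytes in UTF-8, bytes in UTF-16) for a single character."""
    return len(char.encode("utf-8")), len(char.encode("utf-16-le"))


if __name__ == "__main__":
    for char in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
        print(repr(char), byte_lengths(char))
```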
Syntax
Encoding type is included in the prolog section of the
XML document.
The syntax for UTF-8 encoding is as follows −
<?xml version = "1.0" encoding = "UTF-8"
standalone = "no" ?>
The syntax for UTF-16 encoding is as follows −
<?xml version = "1.0" encoding = "UTF-16"
standalone = "no" ?>
XML - Validation
Validation is the process of checking an XML document against
a set of rules.
An XML document is said to be valid if its contents
match the elements, attributes and structure declared in the
associated document type definition (DTD), and if the
document complies with the constraints expressed in
it.
The XML parser deals with correctness at two levels.
They are −
Well-formed XML document
Valid XML document
Well-formed XML Document
An XML document is said to be well-formed if it adheres to
the following rules −
XML files without a DTD must use only the predefined character
entities amp(&), apos(single quote), gt(>), lt(<), quot(double quote).
It must follow the ordering of the tags, i.e., the inner tag
must be closed before closing the outer tag.
Each of its opening tags must have a closing tag or it must
be a self-closing tag (<title>....</title> or <title/>).
Each attribute name may appear only once in a start tag, and
its value must be quoted.
Entities other than amp(&), apos(single quote), gt(>), lt(<) and
quot(double quote) must be declared.
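The well-formedness rules above are exactly what a non-validating parser enforces. The sketch below assumes Python's xml.etree.ElementTree; the helper name is_well_formed and the sample snippets are illustrative.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if a non-validating parser accepts the document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Each snippet exercises one of the rules listed above.
CASES = {
    "properly nested": "<a><b></b></a>",
    "inner tag closed after outer": "<a><b></a></b>",
    "missing end tag": "<title>",
    "unquoted attribute value": "<a x=1/>",
    "undeclared entity": "<a>&nbsp;</a>",
    "declared entity": '<!DOCTYPE a [<!ENTITY co "TutorialsPoint">]><a>&co;</a>',
}

if __name__ == "__main__":
    for label, snippet in CASES.items():
        print(label, is_well_formed(snippet))
```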
Example
<?xml version = "1.0" encoding = "UTF-8" standalone = "yes" ?>
<!DOCTYPE address
[
<!ELEMENT address (name,company,phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT company (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
]>
<address>
<name>Tanmay Patil</name>
<company>TutorialsPoint</company>
<phone>(011) 123-4567</phone>
</address>
Valid XML Document
If an XML document is well-formed and conforms to the rules
of its associated Document Type Definition (DTD), then it
is said to be a valid XML document.
XML - DTDs
The XML Document Type Definition, commonly known
as DTD, is a way to describe an XML language precisely.
DTDs check the vocabulary and the validity of the structure of
XML documents against the grammatical rules of the appropriate
XML language.
An XML DTD can be either specified inside the document,
or it can be kept in a separate document and then linked
separately.
Syntax
Basic syntax of a DTD is as follows −
<!DOCTYPE element DTD identifier
[
declaration1
declaration2
........
]>
In the above syntax,
The DTD starts with <!DOCTYPE delimiter.
An element tells the parser to parse the document
from the specified root element.
DTD identifier is an identifier for the document type
definition, which may be the path to a file on the
system or URL to a file on the internet. If the DTD is
pointing to external path, it is called External Subset.
The square brackets [ ] enclose an optional list of
entity declarations called Internal Subset.
Internal DTD
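A DTD is referred to as an internal DTD if elements are
declared within the XML file.
When a document relies only on an internal DTD, the standalone
attribute in the XML declaration can be set to yes. This means the
declaration works independently of any external source.
Syntax
Following is the syntax of internal DTD −
<!DOCTYPE root-element [element-declarations]>
where root-element is the name of the root element and
element-declarations is where you declare the elements.
Example
The body of a document described by an internal DTD looks like this −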
<address>
<name>Tanmay Patil</name>
<company>TutorialsPoint</company>
<phone>(011) 123-4567</phone>
</address>
External DTD
In an external DTD, elements are declared outside the XML
file.
They are accessed by specifying the system attribute,
which may be either a legal .dtd file or a valid URL.
When a document uses an external DTD, the standalone attribute
in the XML declaration should be set to no (the default).
This means the declaration includes information from the
external source.
Syntax
Following is the syntax for external DTD −
<!DOCTYPE root-element SYSTEM "file-name">
where file-name is the file with .dtd extension.
Example
The following example shows external DTD usage −
document.getElementById("name").innerHTML =
   xmlDoc.getElementsByTagName("name")[0].childNodes[0].nodeValue;
document.getElementById("company").innerHTML =
   xmlDoc.getElementsByTagName("company")[0].childNodes[0].nodeValue;
document.getElementById("phone").innerHTML =
   xmlDoc.getElementsByTagName("phone")[0].childNodes[0].nodeValue;
</script>
</body>
</html>
<?xml version = "1.0"?>
<contact-info>
<name>Tanmay Patil</name>
<company>TutorialsPoint</company>
<phone>(011) 123-4567</phone>
</contact-info>
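The same DOM calls used in the script fragment (getElementsByTagName, childNodes, nodeValue) are available outside the browser. The sketch below assumes Python's standard xml.dom.minidom module and reads the contact-info document; the helper name first_text is hypothetical.

```python
from xml.dom.minidom import parseString

CONTACT_XML = """<?xml version = "1.0"?>
<contact-info>
   <name>Tanmay Patil</name>
   <company>TutorialsPoint</company>
   <phone>(011) 123-4567</phone>
</contact-info>"""


def first_text(doc, tag_name):
    """Return the text of the first element with the given tag name."""
    return doc.getElementsByTagName(tag_name)[0].childNodes[0].nodeValue


if __name__ == "__main__":
    doc = parseString(CONTACT_XML)
    for tag in ("name", "company", "phone"):
        print(tag, "=", first_text(doc, tag))
```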
XML - Namespaces
A Namespace is a set of unique names. A namespace is a mechanism
by which element and attribute names can be assigned to a group. The
namespace is identified by a URI (Uniform Resource Identifier).
Namespace Declaration
A namespace is declared using reserved attributes. Such an attribute
name must either be xmlns or begin with xmlns:, as shown below −
<element xmlns:name = "URL">
Here name is the namespace prefix and URL is the namespace identifier.
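A sketch of how a declared namespace appears to a program, assuming Python's xml.etree.ElementTree. The cont prefix and the http://www.example.com/profile URI are placeholder values chosen for the example, and namespaced_name is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

NS_URI = "http://www.example.com/profile"  # placeholder namespace URI
NS_XML = (
    '<cont:contact xmlns:cont="' + NS_URI + '">'
    "<cont:name>Tanmay Patil</cont:name>"
    "</cont:contact>"
)


def namespaced_name(xml_text, prefix_map):
    """Return (expanded root tag, text of the prefixed <name> child)."""
    root = ET.fromstring(xml_text)
    # The prefix is only shorthand; the parser stores the full URI in the tag.
    return root.tag, root.find("cont:name", prefix_map).text


if __name__ == "__main__":
    print(namespaced_name(NS_XML, {"cont": NS_URI}))
```

The element name is grouped under the URI, not under the prefix, so two documents may use different prefixes for the same namespace.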
XML - Databases
XML-enabled
Native XML (NXD)
XML - Enabled Database
An XML-enabled database is a database that provides an extension
for converting XML documents. It is a relational
database, where data is stored in tables consisting of rows and
columns. The tables contain sets of records, which in turn
consist of fields.