Indian Institute of Technology Kharagpur
Extensible Markup Language
(XML)
Prof. Indranil Sen Gupta
Dept. of Computer Science & Engg.
I.I.T. Kharagpur, INDIA
Lecture 16: Extensible Markup
Language (XML)
On completion, the student will be able to:
1. Explain the structure of an XML document.
2. Explain the different types of document type
declarations.
3. Explain the basic concepts of simple and
extended links.
Introduction
• What is XML?
  - A markup language for creating documents containing structured information.
  - Markup language: a mechanism to identify structures in a document.
  - Structured information:
      Contains content (text, images, etc.).
      Contains an indication of what role the content plays (e.g., heading, footnote, address).
XML vs. HTML
• Both are markup languages, but
there are differences.
  - In HTML, both the tag set and the tag semantics are predefined and fixed.
  - XML specifies neither a tag set nor semantics.
      It provides a facility to define tags.
      Semantics are defined by the applications that process the documents (or by stylesheets).
      XML is thus a meta-language for describing markup languages.
XML Development Goals
• It should be easy to use XML over the
Internet.
• XML shall support a wide variety of
applications.
• It shall be easy to write programs that
process XML documents.
• The number of optional features in XML is
kept to a minimum (zero, ideally).
• Design of XML shall be formal and
concise.
• XML documents should be easy to create.
How is XML Defined?
• XML is defined by the following
specifications:
  - Extensible Markup Language (XML) 1.0
      Defines the syntax of XML.
  - XML Pointer Language (XPointer) and XML Linking Language (XLink)
      Define a standard way to represent links between resources.
  - Extensible Stylesheet Language (XSL)
      Defines the standard stylesheet language for XML.
An Example XML Document
<?xml version="1.0"?>
<quotation>
<isay> Hello, how are you </isay>
<yousay> I am not well </yousay>
<frown/>
</quotation>
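As a minimal sketch, the quotation document above can be loaded with Python's standard xml.etree.ElementTree module to inspect its element structure. The variable names are illustrative only, and the declaration is written with straight quotes, as XML requires.

```python
import xml.etree.ElementTree as ET

# The example document from the slide, with straight quotes in the declaration.
QUOTATION_XML = """<?xml version="1.0"?>
<quotation>
  <isay> Hello, how are you </isay>
  <yousay> I am not well </yousay>
  <frown/>
</quotation>"""

root = ET.fromstring(QUOTATION_XML)

# Child elements are returned in document order.
child_tags = [child.tag for child in root]

# Element text keeps its surrounding white space, so strip it for display.
isay_text = root.find("isay").text.strip()

# An empty element such as <frown/> carries no text at all.
frown_text = root.find("frown").text

print(root.tag, child_tags, isay_text, frown_text)
```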
Structure of an XML Document
• An XML document consists of:
  - Prolog
  - Elements
  - Attributes
  - Entity references
  - Comments
XML: Prolog
• The Prolog is the first structural
element that is present in the XML
document.
• Usually divided into an XML
declaration and an (optional) DTD.
• Examples of XML declarations:
<?xml version="1.0" encoding="UTF-8"?>
<?xml version="1.0"?>
XML: Elements
• Elements are the most common form of markup.
• A non-empty XML element must have a start tag and a matching end tag, the latter prefixed by a slash.
<city> Kharagpur </city>
• An empty element can be written as <city/> instead of a start tag and an end tag with no content between them.
• Remember: XML is case-sensitive.
• Element naming convention:
  - Names must begin with an underscore or a letter.
  - Names can contain letters, digits, underscores, hyphens, and periods.
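A short sketch of the two points above (case sensitivity and empty elements), using Python's standard xml.etree.ElementTree. The wrapper element name "places" is a hypothetical name chosen for the illustration.

```python
import xml.etree.ElementTree as ET

# <city> and <City> are different element names because XML is case-sensitive.
root = ET.fromstring(
    "<places><city> Kharagpur </city><City/><city></city></places>"
)

tags = [child.tag for child in root]

# <City/> and <city></city> are both empty elements: neither has any text.
empty_texts = [root[1].text, root[2].text]

print(tags, empty_texts)
```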
XML: Attributes
• XML attributes are attached to elements.
  - They are name-value pairs that occur inside start tags, after the element name.
  - Attribute names must begin with a letter or an underscore.
  - Attribute names must not contain any white space.
<faculty name="Indranil Sen Gupta">
[email protected] </faculty>
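One way to read attributes from a parsed element, sketched with Python's standard xml.etree.ElementTree. The element content and the "department" attribute are hypothetical placeholders, not taken from the slide.

```python
import xml.etree.ElementTree as ET

# The element content below is a hypothetical placeholder, not from the slide.
faculty = ET.fromstring(
    '<faculty name="Indranil Sen Gupta">contact details here</faculty>'
)

# Attributes are exposed as a dictionary of name-value pairs.
attributes = dict(faculty.attrib)
name = faculty.get("name")

# A missing attribute yields the caller-supplied default (None if omitted).
department = faculty.get("department", "unknown")

print(attributes, name, department)
```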
XML: Entity References
• They are used to reference data that is not directly in the structure.
  - Can be internal or external.
  - Built-in entity references are used to represent &, <, >, " and '.
  - The string
      Tom&Jerry("Don't write x<y")
    would be written as
      Tom&amp;Jerry(&quot;Don&apos;t write x&lt;y&quot;)
  - A special form of entity reference, called a character reference, can be used to insert arbitrary Unicode characters in the document.
      Decimal reference: &#8478;
      Hexadecimal reference: &#x211E;
      ==> Both refer to the Rx prescription symbol (℞).
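The escaping rule and the character references above can be checked with a small sketch using Python's standard library. The wrapper element name "s" is hypothetical; escape() only handles &, < and > by default, so the two quote characters are supplied through its optional mapping.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "Tom&Jerry(\"Don't write x<y\")"

# escape() handles &, < and > itself; quotes are added through the extra mapping.
escaped = escape(raw, {'"': "&quot;", "'": "&apos;"})

# Parsing the escaped text gives back the original string.
round_trip = ET.fromstring("<s>" + escaped + "</s>").text

# Decimal and hexadecimal character references name the same code point.
rx = ET.fromstring("<s>&#8478;&#x211E;</s>").text

print(escaped, round_trip, rx)
```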
XML: Comments
• Comments begin with <!-- and end with -->.
• They can contain any data except the literal string "--".
• All data between these two delimiters is ignored by the XML processor.
Processing Instructions
• Used to provide information to an application.
  - Like comments, they are not part of the document's character data.
  - The XML processor is required to pass them to the application.
• They have the form:
<?name pidata?>
  - PI names beginning with xml are reserved.
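A minimal sketch of how an application might receive a processing instruction, here using Python's standard xml.dom.minidom. The PI name "render" and its data are hypothetical; any name not beginning with xml would do.

```python
from xml.dom import minidom, Node

# "render" and its data are hypothetical; the name must not start with "xml".
doc = minidom.parseString('<?render mode="draft"?><report>text</report>')

# The processing instruction is the first child of the document node.
pi = doc.firstChild

is_pi = pi.nodeType == Node.PROCESSING_INSTRUCTION_NODE

# The target is the PI name; the data is everything after it, uninterpreted.
target, data = pi.target, pi.data

print(is_pi, target, data)
```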
CDATA Sections
• A CDATA section instructs the XML parser
to ignore most markup characters.
• An example:
<![CDATA[
temp = *p;
*p = *q;
*q = temp;
if (temp < 0) temp = -temp;
]]>
• All character data in between is passed to
the application without interpretation.
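The effect of a CDATA section can be sketched with Python's standard xml.etree.ElementTree: the application receives the same text whether the "<" was protected by CDATA or written as an entity reference. The element name "code" is a hypothetical wrapper.

```python
import xml.etree.ElementTree as ET

# Inside CDATA, the "<" in the C fragment needs no escaping.
with_cdata = ET.fromstring(
    "<code><![CDATA[if (temp < 0) temp = -temp;]]></code>"
)

# The same content written with an entity reference instead.
with_entity = ET.fromstring("<code>if (temp &lt; 0) temp = -temp;</code>")

same_text = with_cdata.text == with_entity.text

print(with_cdata.text, same_text)
```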
Document Type Declarations (DTD)
• XML allows us to create our own tag names.
• A DTD allows a document to send meta-information about its contents to the parser.
  - Sequence and ordering of tags, etc.
• Four kinds of declarations in XML:
  - Element type declarations
  - Attribute list declarations
  - Entity declarations
  - Notation declarations
Element Type Declaration
• They identify the names of the elements and the nature of their content.
  - Elements can contain simple, predefined data types.
  - They can refer to other elements.
  - They can be defined w.r.t. their cardinality.
• Example (written in XML Schema syntax; a comparable DTD declaration is <!ELEMENT faculty (#PCDATA)>):
<xsd:element name="faculty"
             type="xsd:string"
             maxOccurs="unbounded"/>
Attribute List Declaration
• Like elements, attributes must have a name and a type.
  - Attributes can use custom data types.
  - They can be restricted w.r.t. cardinality or default values.
  - They can refer to other attribute definitions.
• Example (written in XML Schema syntax):
<xsd:attribute name="city"
               type="xsd:string"
               fixed="Kharagpur"/>
Entity Declarations
• They allow us to associate a name with some other fragment of content.
• Two types:
  - Internal entities
      They associate a name with a string of literal text.
      Five entities are predefined: &lt;, &gt;, &amp;, &apos;, &quot;
  - External entities
      They associate a name with the contents of another file.
      The contents of the (text) file are inserted at the point of reference.
      Example:
<!ENTITY IITLOGO
         SYSTEM "/institute/logo.gif">
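A small sketch of an internal entity in use, parsed with Python's standard xml.etree.ElementTree. The entity name "inst" and the element name "note" are hypothetical. Only an internal entity is shown, since this parser does not fetch external entities.

```python
import xml.etree.ElementTree as ET

# "inst" is a hypothetical entity declared in the internal DTD subset.
DOCUMENT = """<!DOCTYPE note [
  <!ENTITY inst "I.I.T. Kharagpur">
]>
<note>&inst; &amp; its departments</note>"""

note = ET.fromstring(DOCUMENT)

# Both the declared entity and the predefined &amp; are expanded in the text.
text = note.text

print(text)
```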
Notation Declarations
• They identify specific types of
external binary data.
• This information gets forwarded to
the processing application.
• Example:
<!NOTATION GIF87A SYSTEM "GIF">
Linking Documents in XML
• The XPointer and XLink specifications provide a standard linking model for XML.
• We look into some of the features of XLink.
  - It gives us control over the semantics of the link.
  - It introduces the concept of extended links, which can involve more than two resources.
• XML processors identify links by recognizing the attribute "xml:link". (This is the working-draft syntax; the final XLink 1.0 Recommendation uses xlink:type in its place.)
Simple Links
• A simple link strongly resembles an HTML <A> link.
<link xml:link="simple"
      href="http://www.iitkgp.ac.in">
Our Institute Home page </link>
• The simple link identifies a link between two resources, one of which is the content of the linking element itself.
Extended Links
• They allow us to express relationships
between more than two resources.
<elink xml:link="extended">
  <locator xml:link="locator" href="text.htm">
  Some text here </locator>
  <locator xml:link="locator" href="face.jpg">
  Photo of the face </locator>
……..
</elink>
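As a rough sketch, an application could collect the resources of an extended link like the one above with Python's standard xml.etree.ElementTree. The parser reports the reserved xml prefix as a namespace URI, so the attribute is looked up by its qualified name. The elided part of the slide's example is simply left out here.

```python
import xml.etree.ElementTree as ET

# The reserved "xml" prefix always maps to this namespace URI.
XML_NS = "http://www.w3.org/XML/1998/namespace"
LINK = "{" + XML_NS + "}link"

elink = ET.fromstring(
    '<elink xml:link="extended">'
    '<locator xml:link="locator" href="text.htm">Some text here</locator>'
    '<locator xml:link="locator" href="face.jpg">Photo of the face</locator>'
    "</elink>"
)

link_type = elink.get(LINK)

# Collect the resource named by each locator child.
resources = [
    child.get("href") for child in elink if child.get(LINK) == "locator"
]

print(link_type, resources)
```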
Issue of White Space
• By default, white space in an XML document is not significant.
• We can change this:
  - The special attribute xml:space can be used to specify that white space is significant.
      On any element that includes the attribute specification
        xml:space="preserve"
      all white space is significant.
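A sketch of how an application might honour xml:space, using Python's standard xml.etree.ElementTree. The parser itself hands over all character data; the helper display_text and the element name "poem" are hypothetical. For brevity the helper checks only the element itself, not its ancestors.

```python
import xml.etree.ElementTree as ET

# The parser reports xml:space under the reserved XML namespace URI.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"


def display_text(element):
    """Keep white space if xml:space="preserve" is set, else collapse it."""
    text = element.text or ""
    if element.get(XML_SPACE) == "preserve":
        return text
    return " ".join(text.split())


preserved = display_text(
    ET.fromstring('<poem xml:space="preserve">  two  spaces  </poem>')
)
collapsed = display_text(ET.fromstring("<poem>  two  spaces  </poem>"))

print(repr(preserved), repr(collapsed))
```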
Including a DTD
<?xml version="1.0" standalone="no"?>
<!DOCTYPE chapter SYSTEM "mybook.dtd" [
………
………
]>
<chapter>
……..
……..
</chapter>
Validity of XML Documents
• Two categories of XML documents:
  - Well-formed
      The document obeys the syntax of XML.
      It can be parsed.
  - Valid
      A well-formed document is valid only if it contains a proper DTD and obeys the constraints of that declaration.
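Well-formedness can be tested simply by attempting to parse the document. The sketch below uses Python's standard xml.etree.ElementTree; the helper name is_well_formed is hypothetical. It does not check validity, because this parser does not validate against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Properly nested tags parse; a missing </title> end tag does not.
ok = is_well_formed("<chapter><title>Intro</title></chapter>")
bad = is_well_formed("<chapter><title>Intro</chapter>")

print(ok, bad)
```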
Standard XML Languages
• Synchronized Multimedia Integration Language (SMIL)
  - An XML language for combining audio, video, text and graphics in a precise, synchronized fashion.
• Scalable Vector Graphics (SVG)
  - A language for specifying two-dimensional graphics in XML.
• Mathematical Markup Language (MathML)
  - An XML application for describing mathematical notation and capturing both its structure and contents.
• Wireless Markup Language (WML)
  - An XML application for marking up documents to be delivered to handheld devices.
• Chemical Markup Language (CML)
  - Used for managing and presenting molecular and technical information.
• Open Financial Exchange (OFX)
  - An XML application for describing financial transactions that take place over the Internet.
To Summarize
• We have discussed most of the major
features of XML.
• Details and complete examples were
beyond the scope of the discussion.
• With this background, XML documents
can be interpreted and understood
without much difficulty.
SOLUTIONS TO QUIZ
QUESTIONS ON
LECTURE 15
Quiz Solutions on Lecture 15
1. What are the HTML tags associated with
table definitions?
<TABLE>, <TH>, <TD>, <TR>
2. How do you specify table entries spanning
multiple columns?
By using the colspan attribute of the <td> tag
(rowspan does the same for multiple rows).
3. What is the purpose of the <FRAMESET>
tag?
It is used to define a collection of
frames. The <FRAME> tag can be
embedded inside it.
4. What is the purpose of the <NOFRAMES>
tag?
To handle browsers that do not support
frames.
5. What does “*” signify when specifying
the width/height of a frame?
“*” specifies the relative value with
respect to the available space.
6. What does “%” signify when specifying
the width/height of a frame?
It specifies the percentage of the
available space.
7. What is inline style for specifying style
sheets? Give an example.
Where the style is specified “in-line” as
part of the same document.
<H2 style="color: blue"> This will
appear as blue. </H2>
8. What is external style for specifying style
sheets?
All styles exist in a separate document,
a link to which is specified.
QUIZ QUESTIONS ON
LECTURE 16
Quiz Questions on Lecture 16
1. What is a markup language?
2. What are the three main specifications
defining XML?
3. Give an example of an XML element. How
can an empty element be specified?
4. What is an XML attribute? Give an example.
5. Using entity references, how will the string
“Hello ma’m” be represented?
6. How do you insert comments in XML?
7. Why is the CDATA section used?
8. What does an element type declaration do?
9. What does an attribute list declaration do?
10. Give an example of a simple link.
11. How do you specify extended links in
XML?
12. How do you retain white spaces in the
document?