UNIT 1
HTML stands for Hypertext Markup Language, and it is the most widely used language to write Web
Pages.
Hypertext refers to the way in which Web pages (HTML documents) are linked together. Thus, the link
available on a webpage is called Hypertext.
As its name suggests, HTML is a Markup Language which means you use HTML to simply
"mark-up" a text document with tags that tell a Web browser how to structure it to display.
Originally, HTML was developed with the intent of defining the structure of documents like headings,
paragraphs, lists, and so forth to facilitate the sharing of scientific information between researchers.
HTM or HTML Extension?
When you save an HTML file, you can use either the .htm or the .html extension. The .htm extension
comes from the past when some of the commonly used software only allowed three letter extensions. It is
perfectly safe to use either .html or .htm, but be consistent. mypage.htm and mypage.html are treated as
different files by the browser.
Basic HTML Document
In its simplest form, an HTML document looks like this:
<!DOCTYPE html>
<html>
<head>
<title>This is document title</title>
</head>
<body>
<h1>This is a heading</h1>
<p>Document content goes here.....</p>
</body>
</html>
HTML Tags
What are HTML tags?
HTML tags are used to mark-up HTML elements
HTML tags are surrounded by the two characters < and >
The surrounding characters are called angle brackets
HTML tags normally come in pairs like <b> and </b>
The first tag in a pair is the start tag, the second tag is the end tag
The text between the start and end tags is the element content
HTML tags are not case sensitive, <b> means the same as <B>
Tag Description
<html> Defines an HTML document
<body> Defines the document's body
<h1> to <h6> Defines heading 1 to heading 6
<p> Defines a paragraph
<br> Inserts a single line break
<hr> Defines a horizontal rule
<!-- ... --> Defines a comment
Headings
Headings are defined with the <h1> to <h6> tags. <h1> defines the largest heading while <h6> defines the
smallest.
<h1>This is a heading</h1>
<h2>This is a heading</h2>
<h3>This is a heading</h3>
<h4>This is a heading</h4>
<h5>This is a heading</h5>
<h6> This is a heading</h6>
HTML automatically adds an extra blank line before and after a heading. A useful heading attribute is
align, although current HTML favors the CSS text-align property for this purpose.
<h5 align="left">I can align headings </h5>
<h5 align="center">This is a centered heading </h5>
<h5 align="right">This is a heading aligned to the right </h5>
Paragraphs
Paragraphs are defined with the <p> tag. Think of a paragraph as a block of text. You can use the align
attribute with a paragraph tag as well.
The horizontal rule tag, <hr>, does not have a closing tag. It takes attributes such as align and
width.
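As a small illustration of the align and width attributes just mentioned (the 50% width and the paragraph text are arbitrary values chosen for this sketch), a horizontal rule between two paragraphs might be written as:

```html
<p align="center">This paragraph is centered.</p>
<!-- The rule has no closing tag; width is given as a share of the page -->
<hr align="center" width="50%">
<p>This paragraph follows the horizontal rule.</p>
```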
Comments in HTML
The comment tag is used to insert a comment in the HTML source code. A comment can be placed
anywhere in the document and the browser will ignore everything inside the brackets. You can use
comments to write notes to yourself, or write a helpful message to someone looking at your source
code.
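A minimal sketch of a comment, with made-up note text, might look like this. Everything between <!-- and --> stays in the source code but is not shown on the page:

```html
<!-- Note to self: update the welcome text next week -->
<p>This paragraph is displayed by the browser.</p>
```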
HTML automatically adds an extra blank line before and after some elements, like before and after a
paragraph, and before and after a heading. If you want to insert blank lines into your document, use
the <br> tag.
Try It Out!
Open your text editor and type the following text:
<html>
<head>
<title>My First Webpage</title>
</head>
<body>
<h1 align="center">My First Webpage</h1>
<p>Welcome to my first web page. I am writing this page using a text editor and plain
old html.</p>
<p>By learning html, I'll be able to create web pages like a pro....<br>
which I am of course.</p>
</body>
</html>
HTML – ELEMENTS
An HTML element is defined by a starting tag. If the element contains other content, it ends with a closing
tag, where the element name is preceded by a forward slash.
This is an HTML element:
<b>This text is bold</b>
The HTML element begins with a start tag: <b>
The content of the HTML element is: This text is bold
The HTML element ends with an end tag: </b>
The purpose of the <b> tag is to define an HTML element that should be displayed as bold.
This is also an HTML element:
<body>
This is my first homepage. <b>This text is bold</b>
</body>
This HTML element starts with the start tag <body>, and ends with the end tag </body>. The purpose of
the <body> tag is to define the HTML element that contains the body of the HTML document.
HTML – ATTRIBUTES
We have seen a few HTML tags and their usage, like the heading tags <h1> and <h2>, the paragraph tag <p>,
and other tags. We have used them so far in their simplest form, but most HTML tags can also have attributes,
which are extra bits of information.
An attribute is used to define the characteristics of an HTML element and is placed inside the element's
opening tag. All attributes are made up of two parts: a name and a value:
The name is the property you want to set. For example, the paragraph <p> element in the example
carries an attribute whose name is align, which you can use to indicate the alignment of paragraph on the
page.
The value is what you want the property to be set to, and it is always put within quotation marks. The
example below shows three possible values of the align attribute: left, center, and right.
Attribute names are case-insensitive. Some attribute values, such as those of id and class, are
case-sensitive. The World Wide Web Consortium (W3C) recommends lowercase attributes/attribute values
in their HTML 4 recommendation.
Example
<!DOCTYPE html>
<html>
<head>
<title>Align Attribute Example</title>
</head>
<body>
<p align="left">This is left aligned</p>
<p align="center">This is center aligned</p>
<p align="right">This is right aligned</p>
</body>
</html>
HTML – FORMATTING
Bold Text
Anything that appears within a <b>...</b> element is displayed in bold, as shown below:
Example
<!DOCTYPE html>
<html>
<head>
<title>Bold Text Example</title>
</head>
<body>
<p>The following word uses a <b>bold</b> typeface.</p>
</body>
</html>
This will produce the following result:
The following word uses a bold typeface.
Italic Text
Anything that appears within an <i>...</i> element is displayed in italics, as shown below:
Example
<!DOCTYPE html>
<html>
<head>
<title>Italic Text Example</title>
</head>
<body>
<p>The following word uses an <i>italicized</i> typeface.</p>
</body>
</html>
This will produce the following result:
The following word uses an italicized typeface.
Underlined Text
Anything that appears within a <u>...</u> element is displayed with an underline, as shown below:
Example
<!DOCTYPE html>
<html>
<head>
<title>Underlined Text Example</title>
</head>
<body>
<p>The following word uses an <u>underlined</u> typeface.</p>
</body>
</html>
This will produce the following result:
The following word uses an underlined typeface.
Strike Text
Anything that appears within a <strike>...</strike> element is displayed with strikethrough, which is a thin
line through the text, as shown below. Note that <strike> is obsolete in HTML5, where <s> or <del> is
used instead.
Example
<!DOCTYPE html>
<html>
<head>
<title>Strike Text Example</title>
</head>
<body>
<p>The following word uses a <strike>strikethrough</strike> typeface.</p>
</body>
</html>
This will produce the following result:
The following word uses a strikethrough typeface.
Superscript Text
The content of a <sup>...</sup> element is written in superscript: it is displayed about half a character's
height above the surrounding characters, and most browsers also render it in a slightly smaller font.
Example
<!DOCTYPE html>
<html>
<head>
<title>Superscript Text Example</title>
</head>
<body>
<p>The following word uses a <sup>superscript</sup> typeface.</p>
</body>
</html>
This will produce the following result:
The following word uses a superscript typeface.
Subscript Text
The content of a <sub>...</sub> element is written in subscript: it is displayed about half a character's
height beneath the surrounding characters, and most browsers also render it in a slightly smaller font.
Example
<!DOCTYPE html>
<html>
<head>
<title>Subscript Text Example</title>
</head>
<body>
<p>The following word uses a <sub>subscript</sub> typeface.</p>
</body>
</html>
This will produce the following result:
The following word uses a subscript typeface.
HTML Fonts
The <font> tag in HTML is deprecated. The World Wide Web Consortium (W3C) has removed the <font>
tag from its recommendations. Style sheets (CSS) should be used instead to define the layout and display
properties of HTML elements.
The <font> tag should NOT be used.
HTML Backgrounds
Backgrounds
The <body> tag has two attributes where you can specify backgrounds. The background can be a color or
an image.
Bgcolor
The bgcolor attribute specifies a background color for an HTML page. The value of this attribute can be a
hexadecimal number or a color name:
<body bgcolor="#000000">
<body bgcolor="black">
Both lines above set the background color to black. The rgb(0,0,0) notation belongs to CSS and is not
reliably understood in the bgcolor attribute; it works in a style rule such as
<body style="background-color: rgb(0,0,0)">.
Background
The background attribute can also specify a background-image for an HTML page. The value of this
attribute is the URL of the image you want to use. If the image is smaller than the browser window, the
image will repeat itself until it fills the entire browser window.
<body background="clouds.gif">
<body background="http://profdevtrain.austincc.edu/html/graphics/clouds.gif">
The URL can be relative (as in the first line above) or absolute (as in the second line above).
If you want to use a background image, you should keep in mind:
Will the background image increase the loading time too much?
Will the background image look good with other images on the page?
Will the background image look good with the text colors on the page?
Will the background image look good when it is repeated on the page?
Will the background image take away the focus from the text?
Example
<html>
<head>
<title>My First Webpage</title>
</head>
<body background="http://profdevtrain.austincc.edu/html/graphics/clouds.gif" bgcolor="#EDDD9E">
<h1 align="center">My First Webpage</h1>
<p>Welcome to my <strong>first</strong> webpage. I am writing this page using a text editor and plain
old html.</p>
<p>By learning html, I'll be able to create webpages like a <del>beginner</del> pro....<br>
which I am of course.</p>
</body>
</html>
HTML Lists
HTML provides a simple way to show unordered lists (bullet lists) or ordered lists (numbered lists).
Unordered Lists
An unordered list is a list of items marked with bullets (typically small black circles). An unordered list
starts with the <ul> tag. Each list item starts with the <li> tag.
This code:
<ul>
<li>Coffee</li>
<li>Milk</li>
</ul>
Would display:
• Coffee
• Milk
Ordered Lists
An ordered list is also a list of items. The list items are marked with numbers. An ordered list starts with
the <ol> tag. Each list item starts with the <li> tag.
This code:
<ol>
<li>Coffee</li>
<li>Milk</li>
</ol>
Would display:
1. Coffee
2. Milk
Inside a list item you can put paragraphs, line breaks, images, links, other lists, etc.
Definition Lists
Definition lists consist of two parts: a term and a description. To mark up a definition list, you need three
HTML elements: a container <dl>, a definition term <dt>, and a definition description <dd>.
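A short sketch of the three elements working together, with terms and descriptions invented for illustration, might look like this:

```html
<dl>
<!-- Each term (dt) is followed by its description (dd) -->
<dt>Coffee</dt>
<dd>A hot drink made from roasted beans</dd>
<dt>Milk</dt>
<dd>A cold white drink</dd>
</dl>
```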
HTML Images
The Image Tag and the Src Attribute
The <img> tag is empty, which means that it contains attributes only and it has no closing tag. To display
an image on a page, you need to use the src attribute. Src stands for "source". The value of the src attribute
is the URL of the image you want to display on your page. The syntax of defining an image:
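The syntax line appears to be missing at this point. Judging from the path discussed in the next paragraph and from the later example with dimensions, it would be along these lines (the alt text is borrowed from that later example):

```html
<img src="graphics/chef.gif" alt="Smiling Happy Chef">
```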
The source attribute specifies not only which image to use but also where the image is located. The above
image, graphics/chef.gif, means that the browser will look for an image named chef.gif in a graphics folder
in the same folder as the HTML document itself.
Image Dimensions
When you have an image, the browser usually figures out how big the image is all by itself. If you put in
the image dimensions in pixels, however, the browser can reserve space for the image before it has loaded
and carry on laying out the rest of the page, filling in the image once it arrives. Without dimensions,
the browser does not know how much room the image needs until the image has been downloaded, so the page
content may shift as images load. The chef image would then be:
<img src="graphics/chef.gif" width="130" height="101" alt="Smiling Happy Chef">
Tables
Tables are defined with the <table> tag. A table is divided into rows (with the <tr> tag), and each row is
divided into data cells (with the <td> tag). The letters td stand for table data, which is the content of a data
cell. A data cell can contain text, images, lists, paragraphs, forms, horizontal rules, tables, etc.
Table Tags
Tag Description
<table> Defines a table
<th> Defines a table header
<tr> Defines a table row
<td> Defines a table cell
<caption> Defines a table caption
<colgroup> Defines groups of table columns
<col> Defines the attribute values for one or more columns in a table
Tables and the Border Attribute
To display a table with borders, you will use the border attribute.
Headings in a Table
Headings in a table are defined with the <th> tag.
Cellspacing is the space, in pixels, between the individual data cells in the table (the gaps in the table
grid). Browsers typically apply a small default spacing rather than zero, so set cellspacing="0" to remove
it. If the border is set to 0, the cellspacing lines will be invisible.
Cellpadding is the space, in pixels, between the cell contents and the cell border. Its default is also
small. It comes in handy when you have your borders turned on and you want the contents to sit a little
away from the border for easier viewing. Cellpadding is invisible, even with the border turned on, and it
can also be handled in a style sheet.
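The table attributes described above can be sketched together in one small table. The caption, cell contents, and attribute values below are made up for illustration:

```html
<table border="1" cellspacing="2" cellpadding="5">
<caption>Drinks</caption>
<!-- First row: header cells defined with th -->
<tr>
<th>Item</th>
<th>Price</th>
</tr>
<!-- Following rows: data cells defined with td -->
<tr>
<td>Coffee</td>
<td>2.50</td>
</tr>
<tr>
<td>Milk</td>
<td>1.00</td>
</tr>
</table>
```

Setting border="0" in this sketch would hide the grid lines while leaving the spacing and padding in place.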
What Is a Markup Language?
A markup language is a set of rules that defines how the layout and presentation of text
and images should appear in a digital document. It allows structuring documents,
adding formatting, and specifying how different elements should be displayed (or
“rendered”) on webpages. This structuring helps search engines like Google understand
the information on websites better.
XML Basics
XML stands for Extensible Markup Language and is a text-based markup language
derived from Standard Generalized Markup Language (SGML).
XML is a software- and hardware-independent tool for storing and transporting data.
XML tags identify the data and are used to store and organize the data, rather than
specifying how to display it like HTML tags, which are used to display the data. XML
is not going to replace HTML in the near future, but it introduces new possibilities by
adopting many successful features of HTML.
Several characteristics and uses of XML make it valuable in a variety of
systems and solutions:
XML is extensible: XML allows you to create your own self-descriptive tags, or
language, that suits your application.
XML carries the data, does not present it: XML allows you to store the data
irrespective of how it will be presented.
Comprehension by humans: The readability aspect of XML files means they are
simple to edit and keep up with.
Web development: XML is used to store and exchange data in web applications.
API: XML is often used to implement APIs. APIs allow different applications to
communicate with each other and exchange data.
Features of XML
XML has a number of features that make it a popular data format. These features
include:
Flexibility: XML is a flexible data format as it allows you to create your own
custom tags to meet your specific needs.
Human-readability: XML files are also human-readable, which makes them easy
to edit and maintain.
XML Syntax:
<?xml version="1.0"?>
<contact_info>
<name>Rajesh</name>
<company>TCS</company>
<phone>9333332354</phone>
</contact_info>
You can notice there are two kinds of information in the above example: markup, such as
<contact_info>, and text (character data), such as Rajesh and 9333332354.
The following diagram depicts the syntax rules to write different types of markup and
text in an XML document.
XML Example
An XML document has a hierarchical structure that looks like a tree, so it is known as an XML
tree; it starts at "the root" and branches to "the leaves".
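The document being walked through in the following lines does not appear in the text. A reconstruction consistent with that description (XML 1.0, ISO-8859-1 encoding, a note root element with four children) might look like this; the text inside the elements is placeholder content, not taken from the source:

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
<to>Recipient name</to>
<from>Sender name</from>
<heading>Reminder</heading>
<body>Message text goes here</body>
</note>
```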
The first line is the XML declaration. It defines the XML version (1.0) and the encoding
used (ISO-8859-1 = Latin-1/West European character set).
The next line describes the root element of the document (like saying: "this document is
a note"):
<note>
The next 4 lines describe 4 child elements of the root (to, from, heading, and body).
And finally the last line defines the end of the root element.
</note>
XML documents must contain a root element. This element is "the parent" of all other
elements.
The elements in an XML document form a document tree. The tree starts at the root and
branches to the lowest level of the tree.
<root>
<child>
<subchild>.....</subchild>
</child>
</root>
The terms parent, child, and sibling are used to describe the relationships between
elements. Parent elements have children. Children on the same level are called siblings
(brothers or sisters).
All elements can have text content and attributes (just like in HTML).
File: books.xml
<bookstore>
<book category="COOKING">
<title>Everyday Indian</title>
<author>XYZ</author>
<year>2005</year>
<price>30.00</price>
</book>
<book category="CHILDREN">
<title>Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
<book category="WEB">
<title>Learning XML</title>
<author>Erik T. Ray</author>
<year>2003</year>
<price>39.95</price>
</book>
</bookstore>
The root element in the example is <bookstore>. All elements in the document are
contained within <bookstore>.
Each <book> element has 4 children: <title>, <author>, <year>, and <price>.
XML Declaration
The XML document can optionally have an XML declaration, in which version is the XML
version and encoding specifies the character encoding used in the document.
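A typical declaration, shown here with UTF-8 as an assumed encoding, looks like this. When present, it must be the very first line of the document:

```xml
<?xml version="1.0" encoding="UTF-8"?>
```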
XML Tags
XML tags are an important feature of an XML document. They are similar to HTML tags, but
XML is more flexible than HTML: it allows you to create new (user-defined) tags.
The first element of an XML document is called the root element. A simple XML
element consists of an opening tag and a closing tag. XML tags are case sensitive, i.e.,
<root> and <Root> are different tags. XML tags are used to define the
scope of elements in an XML document.
Properties of XML Tags:
1. XML tags are case-sensitive. The following line of code is an example of wrong syntax.
<address>This is wrong syntax</Address>
The following code shows the correct way, where we use the same case to name the start
and the end tag.
<address>This is correct syntax</address>
2. XML tags must be closed in an appropriate order, i.e., an XML tag opened inside
another element must be closed before the outer element is closed.
For example:
<outer_element>
<internal_element>
This tag is closed before the outer_element
</internal_element>
</outer_element>
XML Elements
XML elements can be defined as the building blocks of an XML document. Elements can behave as
containers to hold text, elements, attributes, media objects, or all of these.
Each XML document contains one or more elements, the scope of which is delimited either
by start and end tags or, for empty elements, by an empty-element tag.
Syntax
<element-name attribute1 attribute2>
....content..
</element-name>
where,
element-name is the name of the element. The name and its case in the start and
end tags must match.
attribute1, attribute2 are attributes of the element separated by white spaces.
An attribute defines a property of the element. It associates a name with a value,
which is a string of characters. An attribute is written as −
name = "value"
name is followed by an = sign and a string value inside double(" ") or single(' ') quotes.
Empty Element
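An empty element has no content. As a sketch, reusing the plants element from the attribute example further on, it can be written either as a start tag immediately followed by an end tag or as a single empty-element tag. The two lines below are a fragment, not a complete document:

```xml
<!-- A start tag and an end tag with nothing between them -->
<plants category="shrubs"></plants>
<!-- The same kind of element written as one empty-element tag -->
<plants category="flowers" />
```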
XML Attributes
Attributes are part of XML elements. An element can have multiple unique attributes.
Attributes give more information about XML elements; to be more precise, they define
properties of elements. An XML attribute is always a name-value pair.
Syntax
An XML attribute has the following syntax −
<element-name attribute1 attribute2>
....content..
</element-name>
where attribute1 and attribute2 have the following form −
name = "value"
value has to be in double (" ") or single (' ') quotes. Here, attribute1 and attribute2 are
unique attribute labels.
Attributes are used to distinguish among elements of the same name, when you do not
want to create a new element for every situation. Hence, the use of an attribute can add
a little more detail in differentiating two or more similar elements.
Example:-
<?xml version = "1.0" encoding = "UTF-8"?>
<!DOCTYPE garden [
<!ELEMENT garden (plants)*>
<!ELEMENT plants (#PCDATA)>
<!ATTLIST plants category CDATA #REQUIRED>
]>
<garden>
<plants category = "flowers" />
<plants category = "shrubs">
</plants>
</garden>
In the above example, we have categorized the plants by including attribute category
and assigning different values to each of the elements. Hence, we have two categories
of plants, one flowers and other shrubs. Thus, we have two plant elements with
different attributes.
Attribute Types
Attribute Type | Description
String Type | It takes any literal string as a value. CDATA is a String Type. CDATA is character data. This means any string of non-markup characters is a legal part of the attribute.
Tokenized Type | This is a more constrained type. The validity constraints noted in the grammar are applied after the attribute value is normalized. The Tokenized Type attributes are ID, IDREF, IDREFS, ENTITY, ENTITIES, NMTOKEN, and NMTOKENS.
Enumerated Type | This has a list of predefined values in its declaration, out of which it must be assigned one value. There are two types of enumerated attribute: NotationType and Enumeration.
The following rules apply to attributes:
An attribute name must not appear more than once in the same start-tag or
empty-element tag.
An attribute must be declared in the Document Type Definition (DTD) using an
Attribute-List Declaration.
Attribute values must not contain direct or indirect entity references to external
entities.
The replacement text of any entity referred to directly or indirectly in an attribute
value must not contain a less than sign (<)
XML comments
XML comments are similar to HTML comments. Comments are added as notes or
lines that explain the purpose of XML code.
Comments can be used to include related links, information, and terms. They are visible
only in the source code and are not treated as part of the document's data. Comments may
appear anywhere in a document outside other markup, but not before the XML declaration.
Syntax
<!--Your comment-->
A comment starts with <!-- and ends with -->. You can add textual notes as comments
between the characters. You must not nest one comment inside the other.
Example
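No example follows this heading in the text. A minimal sketch, reusing the contact_info document from the XML Syntax section with invented comment text, might be:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Contact details for one person -->
<contact_info>
<name>Rajesh</name>
<!-- The phone number is stored as plain text -->
<phone>9333332354</phone>
</contact_info>
```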
XML Technologies
XML technologies are the XML-based languages and standards used to process XML
documents and to build web applications.
Extensible Markup Language (XML) is a platform-independent markup language, which is
why it has remained popular: it adapts to new technologies and supports a wide variety of
applications. XML also supports Unicode, so data can be written in almost any human
language. Because it enables efficient data sharing, programs written in virtually any
programming language can read and process an XML file.
No. | Technology | Meaning | Description
1) | XHTML | Extensible HTML | It is a clearer and stricter version of HTML. It belongs to the family of XML markup languages. It was developed to make HTML more extensible and to increase interoperability with other data formats.
2) | XML DOM | XML document object model | It is a standard document model that is used to access and manipulate XML. It defines the XML file as a tree structure.
3) | XSL, which contains three parts: i) XSLT (XSL transform), ii) XSL, iii) XPath | Extensible stylesheet language | i) It transforms XML into other formats, like HTML. ii) It is used for formatting XML for screen, paper, etc. iii) It is a language to navigate XML documents.
4) | XQuery | XML query language | It is an XML-based language which is used to query XML-based data.
5) | DTD | Document type definition | It is a standard which is used to define the legal elements in an XML document.
6) | XSD | XML schema definition | It is an XML-based alternative to DTD. It is used to describe the structure of an XML document.
7) | XLink | XML linking language | XLink stands for XML linking language. This is a language for creating hyperlinks (external and internal links) in XML documents.
8) | XPointer | XML pointer language | It is a system for addressing components of XML-based internet media. It allows XLink hyperlinks to point to more specific parts of the XML document.
9) | SOAP | Simple object access protocol | It is an acronym that stands for simple object access protocol. It is an XML-based protocol that lets applications exchange information over HTTP. In simple words, it is a protocol used for accessing web services.
10) | WSDL | Web services description language | It is an XML-based language to describe web services. It also describes the functionality offered by a web service.
11) | RDF | Resource description framework | RDF is an XML-based language to describe web resources. It is a standard model for data interchange on the web. It is used to describe the title, author, content, and copyright information of a web page.
12) | SVG | Scalable vector graphics | It is an XML-based vector image format for two-dimensional images. It defines graphics in XML format. It also supports animation.
13) | RSS | Really simple syndication | RSS is an XML-based format to handle web content syndication. It is used for fast browsing of news and updates. It is generally used for news-like sites.
XML Tree Structure:
An XML document has a self-descriptive structure. It forms a tree structure, which is
referred to as an XML tree. The tree structure makes it easy to describe an XML document.
A tree structure contains a root element (as parent), child elements, and so on. It is
easy to traverse all succeeding branches, sub-branches, and leaf nodes starting from
the root.
Example-2 XML Document
<?xml version="1.0"?>
<college>
<student>
<firstname>Tamanna</firstname>
<lastname>Bhatia</lastname>
<contact>09990449935</contact>
<email>[email protected]</email>
<address>
<city>Ghaziabad</city>
<state>Uttar Pradesh</state>
<pin>201007</pin>
</address>
</student>
</college>
The following terms are used to describe the relationships between the elements. They show
whether an element is a child or a parent of another element.
Ancestors: An element that contains other elements is called an "ancestor" of those
elements. In the above example, the root element (<college>) is the ancestor of all other
elements.
XML Namespaces
A namespace is a set of unique names. It is a mechanism by which element and
attribute names can be assigned to a group. A namespace is identified by a URI
(Uniform Resource Identifier).
Name Conflicts
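In XML, element names are defined by the developer. This often results in a conflict when
trying to mix XML documents from different XML applications. In the two fragments below, the
first carries HTML table information and the second describes a table as a piece of furniture: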
<table>
<tr>
<td>Apples</td>
<td>Bananas</td>
</tr>
</table>
<table>
<name>African Coffee Table</name>
<width>80</width>
<length>120</length>
</table>
If these XML fragments were added together, there would be a name conflict. Both
contain a <table> element, but the elements have different content and meaning.
A user or an XML application will not know how to handle these differences.
Name conflicts in XML can be avoided by using a name prefix. This XML carries information
about an HTML table and a piece of furniture:
<h:table>
<h:tr>
<h:td>Apples</h:td>
<h:td>Bananas</h:td>
</h:tr>
</h:table>
<f:table>
<f:name>African Coffee Table</f:name>
<f:width>80</f:width>
<f:length>120</f:length>
</f:table>
In the example above, there will be no conflict because the two <table> elements have
different names.
XML Namespaces - The xmlns Attribute
When using prefixes in XML, a namespace for the prefix must be defined.
The namespace can be defined by an xmlns attribute in the start tag of an element.
Syntax
The Namespace starts with the keyword xmlns.
The word name is the Namespace prefix.
The URL is the Namespace identifier.
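The general form described by the three statements above can be sketched as follows, where element, name, and URL are placeholders rather than literal values:

```xml
<element xmlns:name="URL">
...
</element>
```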
<root>
<h:table xmlns:h="http://www.w3.org/TR/html4/">
<h:tr>
<h:td>Apples</h:td>
<h:td>Bananas</h:td>
</h:tr>
</h:table>
<f:table xmlns:f="https://www.w3schools.com/furniture">
<f:name>African Coffee Table</f:name>
<f:width>80</f:width>
<f:length>120</f:length>
</f:table>
</root>
The xmlns attribute in the first <table> element gives the h: prefix a qualified
namespace.
The xmlns attribute in the second <table> element gives the f: prefix a qualified
namespace.
When a namespace is defined for an element, all child elements with the same prefix are
associated with the same namespace.
Note: The namespace URI is not used by the parser to look up information.
However, companies often use the namespace as a pointer to a web page containing
namespace information.
XML Validator
Validation is the process of checking an XML document against a set of rules. An XML
document is said to be valid if its contents match the elements, attributes, and associated
document type declaration (DTD), and if the document complies with the constraints
expressed in it. An XML parser deals with validation in two ways. They are:
Well-formed XML document
Valid XML document
Well-Formed XML Documents: An XML document with correct syntax is called
"well formed".
Example 1
Following is an example of a well-formed XML document:
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE address
[
<!ELEMENT address (name,company,phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT company (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
]>
<address>
<name>Tanmay Patil</name>
<company>TutorialsPoint</company>
<phone>(011) 123-4567</phone>
</address>
Example 2:
<?xml version="1.0"?>
<book>
<title>Java</Title>
<author>James</book>
<price>570
</author>
The above XML document is not well formed, for the following reasons:
The tags do not match: <title> … </Title>
The elements are not properly nested: <author> … </book>
A tag is not closed: <price>
Valid XML Documents :-
A "well formed" XML document is not the same as a "valid" XML document.
A "valid" XML document must be well formed. In addition, it must conform to a
document type definition.
There are two different document type definitions that can be used with XML: DTD (the
original Document Type Definition) and XML Schema (an XML-based alternative to DTD).
A document type definition defines the rules and the legal elements and attributes for
an XML document.
The XML Document Type Declaration, commonly known as DTD, is a way to describe
XML language precisely. DTDs check vocabulary and validity of the structure of XML
documents against grammatical rules of appropriate XML language.
Syntax
The purpose of a DTD is to define the structure of an XML document. It defines the
structure with a list of legal elements.
<!DOCTYPE book [
<!ELEMENT book (title,author,price)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT price (#PCDATA)>
]>
!DOCTYPE book defines that the root element of the document is book
!ELEMENT book defines that the book element must contain the elements
"title, author, price"
!ELEMENT title defines the title element to be of type "#PCDATA"
!ELEMENT author defines the author element to be of type "#PCDATA"
!ELEMENT price defines the price element to be of type "#PCDATA"
A DTD can be declared in two ways: 1) Internal DTD and 2) External DTD.
1. Internal DTD
A DTD is referred to as an internal DTD if the elements are declared within the XML file. To
treat it as an internal DTD, the standalone attribute in the XML declaration must be set to yes.
This means the declaration works independently of an external source.
Syntax
where root-element is the name of root element and element-declarations is where you
declare the elements.
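The syntax line itself is missing from the text. Based on the two placeholder names explained above, the general form would be along these lines:

```xml
<!DOCTYPE root-element [
element-declarations
]>
```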
Example
<address>
<name>Tanmay Patil</name>
<company>TutorialsPoint</company>
<phone>(011) 123-4567</phone>
</address>
Start Declaration − Begin with the XML declaration, for example
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>.
DTD − Immediately after the XML header, the document type declaration follows,
commonly referred to as the DOCTYPE −
<!DOCTYPE address [
The DOCTYPE declaration has an exclamation mark (!) at the start of the element name.
The DOCTYPE informs the parser that a DTD is associated with this XML document.
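DTD Body − The DOCTYPE declaration is followed by the body of the DTD, where you
declare elements, attributes, entities, and notations.
<!ELEMENT address (name,company,phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT company (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
Several elements are declared here that make up the vocabulary of the <address>
document. <!ELEMENT name (#PCDATA)> defines the element name to be of type
"#PCDATA". Here #PCDATA means parsed character data, that is, text that the parser
will parse.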
End Declaration − Finally, the declaration section of the DTD is closed using a closing
bracket and a closing angle bracket (]>). This effectively ends the definition, and
thereafter, the XML document follows immediately.
Rules
The document type declaration must appear at the start of the document
(preceded only by the XML header) − it is not permitted anywhere else within the
document.
Similar to the DOCTYPE declaration, the element declarations must start with an
exclamation mark.
The Name in the document type declaration must match the element type of the
root element.
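The last rule can be checked programmatically. The sketch below uses Python's xml.dom.minidom to read the name from the document type declaration and compare it with the root element. The helper name doctype_matches_root and the sample strings are assumptions made for illustration. The standard-library parser is non-validating, so it reads the DOCTYPE but does not enforce the element declarations inside it.

```python
from xml.dom import minidom

# Sample document with an internal DTD (assembled here for illustration).
INTERNAL_DTD_DOC = """<?xml version="1.0" standalone="yes"?>
<!DOCTYPE address [
  <!ELEMENT address (name,company,phone)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT company (#PCDATA)>
  <!ELEMENT phone (#PCDATA)>
]>
<address>
  <name>Tanmay Patil</name>
  <company>TutorialsPoint</company>
  <phone>(011) 123-4567</phone>
</address>"""

# The DOCTYPE names "book" but the root element is "address".
MISMATCHED_DOC = "<!DOCTYPE book [<!ELEMENT book (#PCDATA)>]><address/>"


def doctype_matches_root(xml_text):
    """Return True if the DOCTYPE name equals the root element name."""
    doc = minidom.parseString(xml_text)
    if doc.doctype is None:
        # No document type declaration at all.
        return False
    return doc.doctype.name == doc.documentElement.tagName
```

A validating parser would reject the mismatched document outright; this helper only reports the mismatch.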
2. External DTD
In an external DTD, elements are declared outside the XML file. They are accessed by
specifying the system attribute, which may be either a legal .dtd file or a valid URL.
To refer to it as an external DTD, the standalone attribute in the XML declaration must be
set to no. This means the declaration includes information from the external source.
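Syntax
<!DOCTYPE root-element SYSTEM "file-name">
where file-name is the .dtd file or URL that holds the element declarations.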
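Example
A minimal illustration, assuming the element declarations for address are stored in a
separate file with the hypothetical name address.dtd:
<?xml version = "1.0" standalone = "no" ?>
<!DOCTYPE address SYSTEM "address.dtd">
The address element then follows the DOCTYPE line, as in any XML document.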
XML Schema
An XML Schema describes the structure of an XML document, just like a DTD.
An XML document validated against an XML Schema is both "Well Formed" and
"Valid".
Syntax
You need to declare a schema in your XML document as follows:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <xs:element name="author" type="xs:string"/>
        <xs:element name="price" type="xs:integer"/>
        <xs:element name="edition" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
If an XML document follows the exact rules defined in its schema file (“student.xsd” in
this example), then we can conclude that it is a valid document. Otherwise it is an
invalid document.
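To make the idea of "following the rules" concrete, the sketch below hand-codes the rules of the book schema shown above: a book root element whose children are title, author, price, and edition in that order, with price holding an integer. It is not an XSD validator (Python's standard library does not include one), and the names BOOK_RULES and conforms_to_book_rules are assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

# (element name, required type) pairs mirroring the xs:sequence in the schema.
BOOK_RULES = [
    ("title", "string"),
    ("author", "string"),
    ("price", "integer"),
    ("edition", "string"),
]


def conforms_to_book_rules(xml_text):
    """Hand-written check of the book rules; NOT a general XSD validator."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return False  # not even well formed
    if root.tag != "book":
        return False
    children = list(root)
    # The sequence must contain exactly these elements, in this order.
    if [child.tag for child in children] != [name for name, _ in BOOK_RULES]:
        return False
    for child, (_, kind) in zip(children, BOOK_RULES):
        text = (child.text or "").strip()
        # Integers: optional sign followed by ASCII digits.
        if kind == "integer" and not re.fullmatch(r"[+-]?[0-9]+", text):
            return False
    return True


valid_book = (
    "<book><title>XML</title><author>A. Writer</author>"
    "<price>250</price><edition>2</edition></book>"
)
bad_price = valid_book.replace("250", "cheap")
```

A real schema-aware validator checks much more (namespaces, attributes, nested types); this only illustrates the kind of typed, ordered rule an XSD expresses.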
DTD vs XSD
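There are many differences between DTD (Document Type Definition) and XSD (XML
Schema Definition). In short, DTD provides less control over XML structure whereas XSD
(XML schema) provides more control. For example, an XSD can declare that price is an
xs:integer, whereas a DTD can only declare it as #PCDATA.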
XML Parsers
An XML parser is a software library or package that provides interfaces for client
applications to work with an XML document. It reads the XML and exposes it in a form
that programs can use.
An XML parser checks that the document is well formed and may also validate it.
Modern browsers have built-in XML parsers.
Following diagram shows how XML parser interacts with XML document:
These are the two main types of XML Parsers: 1. DOM 2. SAX
Advantages of the DOM parser
1) It supports both read and write operations and the API is very simple to use.
Disadvantages of the DOM parser
1) It is memory inefficient. (It consumes more memory because the whole XML document
needs to be loaded into memory.)
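The read and write support mentioned above can be illustrated with Python's xml.dom.minidom, which loads the whole document into memory as a tree. This is a minimal sketch; the helper name update_phone and the replacement phone number are assumptions made for illustration.

```python
from xml.dom import minidom

ADDRESS_XML = (
    "<address><name>Tanmay Patil</name>"
    "<company>TutorialsPoint</company>"
    "<phone>(011) 123-4567</phone></address>"
)


def update_phone(xml_text, new_phone):
    """Read the current phone number, overwrite it, and return both versions."""
    doc = minidom.parseString(xml_text)           # whole document is held in memory
    phone = doc.getElementsByTagName("phone")[0]
    old_phone = phone.firstChild.data             # read the text node
    phone.firstChild.data = new_phone             # write a new value into the tree
    return old_phone, doc.documentElement.toxml()


old_value, updated_xml = update_phone(ADDRESS_XML, "(011) 000-0000")
```

Because the tree lives entirely in memory, any node can be read or changed in any order, which is also why DOM costs more memory on large documents.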
Example
<!DOCTYPE html>
<html>
<body>
<h1>TutorialsPoint DOM example </h1>
<div>
<b>Name:</b> <span id = "name"></span><br>
<b>Company:</b> <span id = "company"></span><br>
<b>Phone:</b> <span id = "phone"></span>
</div>
<script>
// Create the request object; very old Internet Explorer versions need ActiveX.
var xmlhttp;
if (window.XMLHttpRequest) {
  // IE7+, Firefox, Chrome, Opera, Safari
  xmlhttp = new XMLHttpRequest();
} else {
  // IE6, IE5
  xmlhttp = new ActiveXObject("Microsoft.XMLHTTP");
}
// The third argument (false) makes the request synchronous.
xmlhttp.open("GET", "/xml/address.xml", false);
xmlhttp.send();
var xmlDoc = xmlhttp.responseXML;
// Copy the text of each XML element into the matching span.
document.getElementById("name").innerHTML =
  xmlDoc.getElementsByTagName("name")[0].childNodes[0].nodeValue;
document.getElementById("company").innerHTML =
  xmlDoc.getElementsByTagName("company")[0].childNodes[0].nodeValue;
document.getElementById("phone").innerHTML =
  xmlDoc.getElementsByTagName("phone")[0].childNodes[0].nodeValue;
</script>
</body>
</html>
Keep the two files, sample.htm and address.xml, in the same directory /xml and open
sample.htm in any browser. The page should display the name, company, and phone
values read from address.xml.
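SAX, the second parser type listed earlier, works differently from the DOM example above: instead of building a tree, it reports events (element start, text, element end) as it reads the document. The following is a minimal sketch in Python using the standard xml.sax module; the class name ContactHandler and the helper parse_address are assumptions made for illustration.

```python
import xml.sax

ADDRESS_XML = (
    "<address><name>Tanmay Patil</name>"
    "<company>TutorialsPoint</company>"
    "<phone>(011) 123-4567</phone></address>"
)


class ContactHandler(xml.sax.ContentHandler):
    """Collects the text of name, company, and phone as events arrive."""

    WANTED = ("name", "company", "phone")

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._current = None   # element currently being read, if wanted
        self._buffer = []      # text chunks for that element

    def startElement(self, name, attrs):
        if name in self.WANTED:
            self._current = name
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self._current is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == self._current:
            self.fields[name] = "".join(self._buffer).strip()
            self._current = None


def parse_address(xml_text):
    handler = ContactHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.fields
```

Since no tree is kept, memory use stays small even for large files, but the handler sees each element only once and cannot modify the document.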
XML - Databases
An XML database is used to store large amounts of information in XML format. As the
use of XML grows in every field, a secure place to store XML documents is needed.
The data stored in the database can be queried using XQuery, serialized, and exported
into a desired format. XML databases are usually associated with document-oriented
databases.
An XML-enabled database is an extension provided for the conversion of XML
documents. It is a relational database, where data is stored in tables consisting of rows
and columns. The tables contain sets of records, which in turn consist of fields.
A native XML database is based on containers rather than a table format. It can store
large amounts of XML documents and data, and it is queried using XPath expressions.
A native XML database has an advantage over an XML-enabled database: it is more
capable of storing, querying, and maintaining XML documents.
Example
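<contact-info>
  <contact1>
    <name>Tanmay Patil</name>
    <company>TutorialsPoint</company>
    <phone>(011) 123-4567</phone>
  </contact1>
  <contact2>
    <name>Manisha Patil</name>
    <company>TutorialsPoint</company>
    <phone>(011) 789-4567</phone>
  </contact2>
</contact-info>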
Here, a table of contacts is created that holds the contact records (contact1 and
contact2), each of which consists of three elements − name, company and phone.
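Querying such contact records with XPath-style expressions can be sketched in Python. The standard library's ElementTree supports only a limited subset of XPath, which is enough for this illustration. The helper names and the contact1 values used below are assumptions made for the example, and no real XML database is involved.

```python
import xml.etree.ElementTree as ET

# Sample data; the contact1 values are assumed for illustration.
CONTACTS_XML = """<contact-info>
  <contact1>
    <name>Tanmay Patil</name>
    <company>TutorialsPoint</company>
    <phone>(011) 123-4567</phone>
  </contact1>
  <contact2>
    <name>Manisha Patil</name>
    <company>TutorialsPoint</company>
    <phone>(011) 789-4567</phone>
  </contact2>
</contact-info>"""


def all_phones(xml_text):
    """Return the text of every phone element, at any depth."""
    root = ET.fromstring(xml_text)
    return [phone.text for phone in root.findall(".//phone")]


def names_at_company(xml_text, company):
    """Return the names of contacts whose company element equals company.

    Assumes company contains no single quote, since it is placed
    directly inside the path expression.
    """
    root = ET.fromstring(xml_text)
    matches = root.findall(f"./*[company='{company}']")
    return [contact.findtext("name") for contact in matches]
```

A native XML database would evaluate full XPath or XQuery on the server side; the path expressions here only show the style of query.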