Html basics
UNIT 2
HTML: HTML (Hypertext Markup Language) is a markup language that is used to describe web
pages. A markup language is written using markup tags. Markup tags are keywords
surrounded by opening (<) and closing (>) angle brackets, for example <html>.
Hypertext is text displayed on an electronic device with references (hyperlinks) to other text that
the reader can immediately access, usually by a mouse click or key press sequence.
An HTML document is itself a text file that contains text and markup in the form of tags. Any simple
text editor, such as Notepad, can be used to create and edit an HTML file. The only requirement is to save
the file with the .html or .htm extension.
HTML Elements: An element consists of a tag, its attributes and its content. Tags are codes, each of
which marks up a certain region of an HTML document. These marked-up regions are displayed on
the screen in the style indicated by the marking tags. The general format of an element is:
<tag>tag content</tag>
For example: <b>Hello RKGITians</b>
Here <b> is the opening tag and </b> is the closing tag. The <b></b> pair tells the browser to display the content
in bold. Thus each HTML tag has a specific meaning to the browser.
There are two types of tags: 1. Container tags 2. Empty tags
Container tags have both an opening and a closing tag. Tags that have no closing tag are
called empty tags.
Tags may have properties that can optionally be assigned values to change the default behavior of
the tag. These properties are called attributes. Attributes are placed within the opening tag.
Even empty tags may have attributes. For example:
<font face="arial">Hello RKGITians</font>
<hr width="400">
Multiple attributes of a single tag are separated by white space. For example:
<hr width="400" color="red">
<font face="arial" size="2" color="blue">Hello RKGITians</font>
Each tag has its own set of predefined attributes. How a particular attribute value of a
particular tag is rendered may differ from browser to browser.
Basic Structure of HTML Page: Each HTML document should start with the <html> tag
and end with the </html> tag.
<html>
<head>
-------
-------------
</head>
<body>
-----------
---------------
</body>
</html>
An HTML page has two distinct logical sections: the head section, enclosed by the <head> and
</head> tags, and the body section, enclosed by the <body> and </body> tags. The head section contains
meta information about the HTML page. This section is processed but not rendered on
screen. Typically the head section contains the <title> tag, which assigns a title to the web
page; the title appears in the title bar of the browser window. The head section quite often also contains
JavaScript code, CSS code, etc.
The body section contains text and tags, which are rendered on the screen as specified there.
HTML Comments: An HTML comment is not really a tag. It starts with <!-- and ends with
-->. Everything between these character sequences is ignored by the browser and is not
displayed. For example:
<!--This is a comment. Comments are not displayed in the browser-->
Adding a Title: The <title> and </title> tags enclose the page title. Their content is displayed in
the title bar at the top of the browser window. The following example adds the title "My RKGIT" to
your page.
<html>
<head>
<title>My RKGIT</title>
</head>
</html>
First HTML page:
<html>
<head>
<title>My RKGIT</title>
</head>
<body> Hello RKGITians
</body>
</html>
HTML Background: The background of an HTML page is specified in the <body> tag. We
may use a plain color or an image as the background.
Using a plain color as background:
<body bgcolor="pink">
Using an image as background:
<body background="Diksha.jpg">
Text Style:
Changing base font: To set the default font style, we use the <basefont> tag. The face, size and color
attributes specify the font face, size and color respectively. The size attribute takes a value
from 1 (smallest) to 7 (largest).
The following code sets the default font face to "arial", the font size to "5" and the font color to "red":
<basefont face="arial" size="5" color="red">
Changing style of a specific text: To change the style of a specific piece of text, use the <font> tag.
<font face="verdana" size="5" color="blue">Hello RKGITians</font>
HTML Headings: HTML provides six levels of headings, <h1> to <h6>. <h1> has
the largest and <h6> the smallest font size.
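As a small illustration of the six heading levels, the following page can be saved and opened in a browser. The heading text is made up for this sketch.

```html
<html>
<body>
<!-- Each level is rendered smaller than the one before it -->
<h1>Heading level 1 (largest)</h1>
<h2>Heading level 2</h2>
<h3>Heading level 3</h3>
<h4>Heading level 4</h4>
<h5>Heading level 5</h5>
<h6>Heading level 6 (smallest)</h6>
</body>
</html>
```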
Creating Hypertext (links) in HTML: The HTML code for a link is simple. It looks like this:
<a href="url">Link text</a>
For example:
<a href="http://www.rkgit.edu.in">Visit RKGIT</a>
HTML Links - The target Attribute:
The target attribute specifies where to open the linked document.
The example below will open the linked document in a new browser window or a new tab:
<a href="http://www.rkgit.edu.in" target="_blank">Visit RKGIT</a>
HTML Images: An image is inserted with the <img> tag, whose src attribute names the image file.
The browser displays the image where the <img> tag occurs in the document. If you put
an image tag between two paragraphs, the browser shows the first paragraph, then the image, and
then the second paragraph.
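A minimal sketch of this behavior, assuming an image file named campus.jpg (a hypothetical name) stored in the same directory as the page:

```html
<html>
<body>
<p>This is the first paragraph.</p>
<!-- campus.jpg is a placeholder file name; alt is the text shown if the image cannot be loaded -->
<img src="campus.jpg" alt="Campus photo">
<p>This is the second paragraph.</p>
</body>
</html>
```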
HTML Lists: A list is a collection of one or more items. HTML supports three types of list:
1. Unordered list
2. Ordered list
3. Definition list
Unordered list:
An unordered list starts with the <ul> tag. Each list item starts with the <li> tag.
The list items are marked with bullets (typically small black circles) by default.
<ul>
<li>Coffee</li>
<li>Milk</li>
</ul>
Output will be:
• Coffee
• Milk
The <ul> tag has a type attribute that specifies the bullet style. Its value may be disc, circle or square.
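For instance, the list above could be drawn with square bullets as in this short sketch:

```html
<!-- type="square" replaces the default round bullets with small squares -->
<ul type="square">
<li>Coffee</li>
<li>Milk</li>
</ul>
```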
Ordered list: An ordered list starts with the <ol> tag. Each list item starts with the <li> tag.
The list items are marked with numbers by default.
<ol>
<li>Coffee</li>
<li>Milk</li>
</ol>
Output:
1. Coffee
2. Milk
The <ol> tag has the attributes type and start to control the numbering. The value of type
may be 1, a, A, i or I, and start specifies the number from which numbering begins.
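As an illustrative sketch, the following list uses lowercase letters and begins at the third position, so the two items are labelled c and d:

```html
<!-- type="a" numbers with lowercase letters; start="3" begins at the third label -->
<ol type="a" start="3">
<li>Coffee</li>
<li>Milk</li>
</ol>
```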
Definition List: A definition list is a list of items, with a description of each item.
The <dl> tag defines a definition list.
The <dl> tag is used in conjunction with <dt> (defines the item in the list) and <dd> (describes
the item in the list):
<dl>
<dt>Coffee</dt>
<dd>- black hot drink</dd>
<dt>Milk</dt>
<dd>- white cold drink</dd>
</dl>
Output:
Coffee
- black hot drink
Milk
- white cold drink
HTML Table:
Tables are defined with the <table> ... </table> tags. Table-specific properties are defined
in the opening <table> tag.
A table is divided into rows (with the <tr> tag), and each row is divided into data cells (with the
<td> tag). td stands for "table data," and holds the content of a data cell. A <td> tag can contain
text, links, images, lists, forms, other tables, etc.
Rows and Columns header can be created using <th> tag .
<caption> tag is used to specify the caption of the table.
Attributes of the <td> and <th> tags:
Attribute   Description                                          Value
colspan     Specifies the number of columns this cell must span  Number of columns
rowspan     Specifies the number of rows this cell must span     Number of rows
height      Specifies the height of this cell in pixels          Number of pixels
width       Specifies the width of this cell in pixels           Number of pixels
Example:
<html>
<body>
<h4>Cell that spans two columns:</h4>
<table border="1">
<tr>
<th>Name</th>
<th colspan="2">Telephone</th>
</tr>
<tr>
<td>Bill Gates</td>
<td>555 77 854</td>
<td>555 77 855</td>
</tr>
</table>
<h4>Cell that spans two rows:</h4>
<table border="1">
<tr>
<th>First Name:</th>
<td>Bill Gates</td>
</tr>
<tr>
<th rowspan="2">Telephone:</th>
<td>555 77 854</td>
</tr>
<tr>
<td>555 77 855</td>
</tr>
</table>
</body>
</html>
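The <caption> tag is not used in the example above. A short sketch of a table with a caption and a header row follows; the caption text is made up for illustration.

```html
<table border="1">
<!-- The caption is displayed above the table by default -->
<caption>Contact list</caption>
<tr>
<th>Name</th>
<th>Telephone</th>
</tr>
<tr>
<td>Bill Gates</td>
<td>555 77 854</td>
</tr>
</table>
```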
HTML Forms: The HTML <form> tag is used to create a form in an HTML page. <form> creates a section in
the HTML document that collects information from visitors. A form provides several control elements
such as buttons, text boxes, password fields, check boxes, radio buttons, selection boxes and hidden
fields. Visitors enter data using these control elements, and the data is then usually sent (submitted) to
a program on the server side. The server program, in turn, processes the data and takes the
necessary action.
The <form> tag has an optional attribute action that specifies the target URL, which handles the
form data on submission. Usually this is the URL of a server-side program written to handle the form
data and return a response.
The <form> tag has another optional attribute, method, that specifies the HTTP method used
to send the form data to the URL given by the action attribute.
The method attribute may have the value GET or POST.
If the GET method is used, the form data is appended to the URL, so the information being passed
is visible in the address bar of the browser. This makes GET unsuitable for sensitive data. GET
is typically used for simple requests such as search-engine queries, like those of Google. Because the
maximum URL length varies from browser to browser, the amount of data that can be sent with GET
is limited.
If the POST method is used, the data is sent as part of the HTTP request message body and is
not appended to the URL. Data sent this way does not appear in the address bar, which makes POST the
better choice for sensitive data. A form typically also has a name attribute that may be used to refer to the form from
other places where needed.
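To make the GET behavior concrete, here is a minimal sketch of a search form. The program name search.jsp and the field name q are hypothetical and chosen only for illustration.

```html
<!-- search.jsp is a hypothetical server-side program -->
<form action="search.jsp" method="GET" name="searchform">
Search: <input type="text" name="q">
<input type="submit" value="Go">
</form>
<!-- If the visitor types html and clicks Go, the browser requests search.jsp?q=html -->
```

The submit button has no name attribute here, so only the text field appears in the URL.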
The following is a simple form that sends login information to the server program handler.jsp:
<form action="handler.jsp" method="POST" name="loginform">
Login: <input type="text" name="login">
Password: <input type="password" name="pass">
<input type="submit" name="submit" value="Login">
</form>
Form Element: Most form elements are created using the <input> tag. It has an attribute
type that specifies the type of the input control.
Attributes of the input tag:
type Specifies the type of control element. Can have the values text, password, checkbox,
radio, submit, reset, file, hidden, image, button. Its default value is text.
There are ten input types: text, password, checkbox, radio, button, submit, reset, hidden, file, and
image.
Text Fields:
<input type="text" /> defines a one-line input field that a user can enter text into:
<form>
First name: <input type="text" name="firstname" /><br />
Last name: <input type="text" name="lastname" />
</form>
How the HTML code above looks in a browser:
First name:
Last name:
Note: The default width of a text field is 20 characters.
Password Field:
<input type="password" /> defines a password field:
<form>
Password: <input type="password" name="pwd" />
</form>
How the HTML code above looks in a browser:
Password:
Note: The characters in a password field are masked (shown as asterisks or circles).
Radio Buttons:
<input type="radio" /> defines a radio button. Radio buttons let a user select ONLY ONE of a
limited number of choices:
<form>
<input type="radio" name="sex" value="male" /> Male<br />
<input type="radio" name="sex" value="female" /> Female
</form>
Checkboxes:
<input type="checkbox" /> defines a checkbox. Checkboxes let a user select zero or more
options of a limited number of choices.
<form>
<input type="checkbox" name="vehicle" value="Bike" /> I have a bike<br />
<input type="checkbox" name="vehicle" value="Car" /> I have a car
</form>
How the HTML code above looks in a browser:
I have a bike
I have a car
Submit Button:
<input type="submit" /> defines a submit button.
A submit button is used to send form data to a server. The data is sent to the page specified in the
form's action attribute. The file defined in the action attribute usually does something with the
received input:
<form name="input" action="html_form_action.asp" method="get">
Username: <input type="text" name="user" />
<input type="submit" value="Submit" />
</form>
How the HTML code above looks in a browser:
Username:
Submit
If you type some characters in the text field above, and click the "Submit" button, the browser
will send your input to a page called "html_form_action.asp". The page will show you
the received input.
Hidden Field:
Hidden fields are not displayed by the browser, so the user cannot interact with them directly. One
important application of a hidden field is session tracking, where information is sent back and
forth between the server and the browser.
<form>
<input type="hidden" name="name" value="some values" />
</form>
This name-value pair can be used by the server and the client to identify each other during
session tracking.
Text Area:
A text area is an extension of a text field, where the text can span multiple lines.
Example:
<textarea rows="2" cols="20">
RKGIT provides world class infrastructure.
</textarea>
File Upload: The file form field allows users to select a file to be sent to the server
side. When the form is submitted, the content of the file is sent to the server. For the file
content to be sent, the form must use the POST method and the enctype "multipart/form-data".
<form action="upload.jsp" method="post" enctype="multipart/form-data">
Select a Question file<br>
<input type="file" name="question"><br>
<input type="submit" value="Upload">
</form>
Reset Button: When a reset button is clicked, the form fields are restored to their default initial
values.
<input type="reset" value="Restore Default">
HTML Frames: HTML allows us to divide a web page into several blocks called frames. Each
frame may display a separate HTML file (web page). Frames thus allow us to display multiple HTML
documents in one browser window simultaneously. They are refreshed separately, and each maintains
its own content independently of the others.
The typical use of frames is to have a menu in one frame and the content in
another. When the user clicks a menu item in one frame, the corresponding content is
displayed in the other frame.
Frameset Element: The layout of the document is specified using the <frameset>
tag. The <frameset> tag is placed in the parent HTML document in place of the <body> tag. <frameset> contains
<frame> tags, each of which creates a frame. The relative sizes of the frames may be specified using the
cols and rows attributes of the <frameset> tag.
Cols: This attribute is used to create frames side by side (a vertical division). Its value is a comma-separated list of
percentages, pixels and relative lengths that indicate the widths of the frames. The default value
is 100%, which means one column.
Rows: This attribute is used to create frames stacked one above another (a horizontal division). Its value is a comma-separated list of
percentages, pixels and relative lengths that indicate the heights of the frames. The default value
is 100%, which means one row.
The following specifications are valid:
cols="10%,*" Two frames are created vertically. The left frame has a width of 10% of
the total page width and the right frame uses the rest (90%).
cols="20%,50%,*" Three frames are created vertically. The left and middle frames have widths of
20% and 50% respectively. The right one uses the rest (30%) of the page width.
rows="20%,*,30%" Three frames are created horizontally. The top and bottom frames have heights
of 20% and 30% respectively. The middle one uses the rest (50%) of the page height.
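The last specification might be used in a complete frameset document as sketched below. The frame names and the three file names are hypothetical.

```html
<html>
<!-- Three frames stacked top to bottom: 20%, the remaining 50%, and 30% of the page height -->
<frameset rows="20%,*,30%">
<frame name="header" src="header.html">
<frame name="content" src="content.html">
<frame name="footer" src="footer.html">
</frameset>
</html>
```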
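[Fig1: a page divided into a top frame and, below it, bottom-left and bottom-right frames]
Framesets can be nested. The following frameset divides the page into two rows, each split into two columns:
<frameset rows="10%,90%">
<frameset cols="50%,50%">
<frame name="top-left" src="top-left.html">
<frame name="top-right" src="top-right.html">
</frameset>
<frameset cols="50%,50%">
<frame name="bottom-left" src="bottom-left.html">
<frame name="bottom-right" src="bottom-right.html">
</frameset>
</frameset>
[Fig2: a two-by-two grid of frames: top-left, top-right, bottom-left, bottom-right]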
Frame Targeting: The target attribute may be set on elements that create links, such as <a> and
<link>. The value of the target attribute names the frame in which the document is to be loaded.
It must be the name of an existing frame. The following example
demonstrates how to specify the target attribute.
First we define a frameset document main.html as follows:
<html>
<frameset cols="20%,80%">
<frame name="left" src="left.html">
<frame name="right" src="right.html">
</frameset>
</html>
The above divides the window into two frames vertically. The HTML document left.html
is loaded in the left frame. It looks like this:
<html>
<head>
<title>Target demo </title>
</head>
<body>
<a href="http://www.rkgit.edu.in" target="right">Visit RKGIT</a><br>
<a href="http://www.google.com" target="right">Open Google</a><br>
</body>
</html>
It contains two links, Visit RKGIT and Open Google. When either is clicked, the linked
page is opened in the frame named right.
Inline frame: An inline frame, created with the <iframe> tag, is used to display a web page within a web page. Inline frames allow us
to insert a frame even within a block of text.
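A minimal sketch of an inline frame placed inside a paragraph follows. The width and height values are arbitrary, and right.html is reused from the frameset example purely for illustration.

```html
<html>
<body>
<p>Text before the frame
<!-- The embedded page is shown in a 300 by 150 pixel box in the middle of the text -->
<iframe src="right.html" width="300" height="150"></iframe>
and text after the frame.</p>
</body>
</html>
```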
Style Sheet:
A style sheet is a document that contains style information about one or more documents written
in markup languages. It enables us to control rendering of styles such as fonts, color, typeface,
size, spacing, margins and other aspects of document style. A style sheet is composed of a set of
style rules written in a specified format. This set of style rules instructs browsers on
how to present a document on the screen.
CSS:
Cascading Style Sheets (CSS) is a style sheet language that specifies how style information
is written in a style sheet. The term "cascading" indicates that several style sheets
can be blended to present a document on the browser's screen. Later style sheets have
greater precedence than earlier ones.
Advantages of CSS:
1. Separates document presentation from document content.
2. Allows us to give a different look to the same document without significant effort.
3. The same style can be applied to different documents.
4. Reduces development time significantly.
5. Most browsers cache external style sheets, which speeds up overall response time.
6. CSS provides many more style properties and pseudo-classes for defining the look and
feel of web pages than plain HTML.
A style rule has the following general form:
selector {
property1: value1;
property2: value2;
property3: value3;
...
propertyN: valueN;
}
For example:
body {
background-color: gray;
color: red;
}
This rule has the selector body. It is only applied to the <body> tag in an html document.
Simple selector: If the selector is simply the name of an element, then it is a simple selector.
Complex Selector: A selector that consists of a rich contextual pattern is called a complex
selector. A complex selector consists of one or more simple selectors separated by combinators
such as white space, ">" and "+".
Type Selector: A type selector is a simple selector that is the name of a document element,
and it matches every element of that type in the document. For example, the selector p selects every
<p> element in the document.
Universal Selector: The universal selector is denoted by * and matches every single element
in the document. For example:
*{color: red ;}
It makes all the text in the document red.
Universal selector is useful when element names are not known in advance during the
development of the style sheet.
Descendant Selectors: A descendant selector selects only those elements that are descendants of
a specified element. For example:
<div><b>C</b>ascading <b>S</b>tyle <b>S</b>heet</div>
<p>Descendant <b>Selectors</b></p>
<p>This <b>is</b> a <i><b>paragraph</b></i></p>
How can one select only the <b> elements inside the two paragraphs? Note that all
these <b> elements are descendants of a <p> element. The type selector b would select all <b>
elements, including those in the <div>, which we do not want. The correct way is to use a descendant selector as follows: p b
This selects only those <b> elements that are descendants of <p> elements, i.e., every <b>
element that has a <p> element as its ancestor.
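Put together as a complete page, the selector might be used as in the sketch below. The color red is an arbitrary choice for this illustration.

```html
<html>
<head>
<style type="text/css">
/* Matches only <b> elements that have a <p> ancestor */
p b {color: red;}
</style>
</head>
<body>
<!-- The three <b> elements in the <div> keep the default color -->
<div><b>C</b>ascading <b>S</b>tyle <b>S</b>heet</div>
<!-- All <b> elements below are red, including the one nested inside <i> -->
<p>Descendant <b>Selectors</b></p>
<p>This <b>is</b> a <i><b>paragraph</b></i></p>
</body>
</html>
```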
Child Selectors: Child selectors select elements that are immediate children of a specified
element.
For example:
<p>This <b>is</b> a <i><b>paragraph</b></i></p>
Note that the first <b> element ("is") is an immediate child of the <p> element, whereas the second
("paragraph") is a child of the <i> element. We can select only the first <b> element with a child selector as follows:
P>B
The following selector selects the <b> element whose parent is the <i> element whose parent is,
in turn, the <p> element.
P>I>B
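The two child selectors can be tried side by side in a small page such as this sketch, where the colors are arbitrary choices:

```html
<html>
<head>
<style type="text/css">
/* <b> elements whose parent is <p> */
p > b {color: blue;}
/* <b> elements whose parent is <i>, which is itself a child of <p> */
p > i > b {color: green;}
</style>
</head>
<body>
<!-- "is" is shown in blue and "paragraph" in green -->
<p>This <b>is</b> a <i><b>paragraph</b></i></p>
</body>
</html>
```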
Class Selectors: Class selectors deal with elements that have the class attribute. The
class attribute adds an element to a group whose name is the attribute value. In this way a set of
related elements can be grouped and a common style applied to every element in the group.
An element may belong to more than one group. A class selector is defined by placing a "."
symbol before the class name. To make a class named intro, we declare the class selector as
follows:
.intro
{
color:yellow;
}
To apply this class to a specific <p> element, use the class attribute as follows:
<p class="intro">My name is Donald.</p>
<p>I live in Duckburg.</p>
Output will be :
My name is Donald.
I live in Duckburg.
The rule can also be written with the selector p.intro, which applies the style only to <p>
elements whose class is intro.
ID Selectors: The id attribute of an element is a unique identifier within a web page. This means that
no two id attributes can have the same value within the document. The id differs from class in
that id identifies a single element uniquely whereas class identifies a set of related elements.
An id selector is defined by placing a # symbol before the id value, and
is used to specify a style for a single, unique element.
The style rule below will be applied to the element with id="para1":
#para1
{
text-align:center;
color:red;
}
To apply this rule to a specific <p> element, use the id attribute as follows:
<p id="para1">Hello World!</p>
<p>This paragraph is not affected by the style.</p>
The rule can also be written with the selector p#para1, which applies the style only if the
element with id para1 is a <p> element.
Note: The selector p#para1 selects the p element whose id attribute value is para1. So it matches
the following:
<p id="para1">----</p>
But not
<div id="para1">---</div>
The selector #para1 matches both. However, before using it make sure that id attribute values are
unique.
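As a sketch of an id selector in a complete page, the example below uses the p#para1 form together with the declarations from the rule shown earlier:

```html
<html>
<head>
<style type="text/css">
/* Applies only to a <p> element whose id is para1 */
p#para1 {
text-align: center;
color: red;
}
</style>
</head>
<body>
<p id="para1">Hello World!</p>
<p>This paragraph is not affected by the style.</p>
</body>
</html>
```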
External Style Sheets: In this case, style information is written in a separate file and is
referenced from an HTML document. The external style sheet should be saved with the .css
extension. An external style sheet is useful when the same style is applied to several
documents. External style sheets are cached by most browsers, so on later requests the browser has to
download only the documents themselves. This implies faster response. In an HTML document, the
external style sheet is specified using the HTML <link> tag. For example,
to use the file mystyle.css located in the same directory as the HTML file, write
the following:
<link rel="stylesheet" type="text/css" href="mystyle.css">
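A page that uses the external file might look like the following sketch. The rule shown in the comment is only an assumed example of what mystyle.css could contain.

```html
<html>
<head>
<title>External style demo</title>
<!-- Assumed content of mystyle.css for this illustration:
     p {color: blue;}
-->
<link rel="stylesheet" type="text/css" href="mystyle.css">
</head>
<body>
<p>This paragraph is styled by the rules in mystyle.css.</p>
</body>
</html>
```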
Embedded Style Sheets: In this method, style information is placed inside the <style> tag in the
head section of an HTML page. For example:
<HEAD>
<STYLE type="text/css">
H1 {border: 1px solid; text-align: center}
</STYLE>
</HEAD>
Imported Style Sheet: Another way of bringing in a style sheet is to use the @import statement. It
allows us to include external style sheets in our document. It is a way of creating a style sheet
within your document and then importing additional rules into it.
To use the @import rule, type:
<style type="text/css">
@import url("import1.css");
@import "import2.css";
</style>
The url() wrapper is not required: @import accepts either url("...") or a plain quoted string. Inside url() the
quotes are optional, and browsers accept the address with or without them.
You can also include an @import rule in a style sheet that has styles of its own:
<style type="text/css">
@import url("import3.css");
p { color : #f00; }
</style>
@import rules must always come first in a style sheet, before any other rules.
Internal rules override conflicting rules in the imported style sheets. For example, the following
style rule makes all paragraphs green even if the file style1.css contains the rule p{color:red;}
<style>
@import url("style1.css");
P{color:green;}
</style>
Later rules get preference if they conflict with a rule in the imported style sheet.
Inline Style Sheet: In this case, style information is incorporated directly into the HTML tags.
<p style="color:red">Hello RKGITians</p>
Cascading Rule: There are several ways to specify style rules. If more than one rule is specified,
conflicting rules are resolved according to the following rules:
Specificity Rule: More specific rules get preference over less specific rules. For example:
P b {color:green;}
B{color:red;}
The former makes text under a <b> tag that is a descendant of a <p> tag green. The latter
makes text under any <b> tag red. Every <b> tag that matches the former rule also matches the
latter, but the reverse is not true. So the former is the more specific one.
Order Rule: For conflicting rules, later rules get preference over earlier rules. The figure shows
the order of precedence for resolving conflicting styles, from highest to lowest:
Inline style > Internal style > External style > Imported style > Browser's default style
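The specificity rule can be observed with a small test page such as this sketch. The body text is made up for illustration.

```html
<html>
<head>
<style type="text/css">
/* More specific: applies to <b> only inside <p> */
p b {color: green;}
/* Less specific: applies to every <b> */
b {color: red;}
</style>
</head>
<body>
<!-- Both rules match here; the more specific one wins, so the text is green -->
<p>Inside a paragraph: <b>green text</b></p>
<!-- Only the second rule matches here, so the text is red -->
<div>Outside a paragraph: <b>red text</b></div>
</body>
</html>
```

The more specific rule wins even though the less specific rule appears later in the style sheet.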
Pseudo Classes and Elements: Pseudo-classes and pseudo-elements are used to add style to
those elements or parts of elements that are not accessible by ordinary selectors. They are useful
when information about an element is not available from the document tree. For example, no
ordinary selector refers to the first line of a paragraph or the first letter of a line. Pseudo-classes match
elements using information other than their name, content or attributes, such as the states
of an anchor element. A pseudo-element, on the other hand, addresses a sub-part of an element, such
as the first letter of a paragraph.
The general forms of pseudo class and pseudo element look like this:
Selector:pseudo-class {declarations}
Or
Selector:pseudo-element {declarations}
For example: :first-child, :last-child, :only-child, a:link, a:visited, a:hover, a:active, a:focus
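A brief sketch showing a few pseudo-classes and one pseudo-element in use follows. The colors and the font size are arbitrary choices for this illustration.

```html
<html>
<head>
<style type="text/css">
/* Pseudo-classes: style a link according to its state */
a:link {color: blue;}
a:visited {color: purple;}
a:hover {color: red;}
/* Pseudo-element: style only the first letter of each paragraph */
p:first-letter {font-size: 200%;}
</style>
</head>
<body>
<p>The first letter of this paragraph is enlarged.</p>
<a href="http://www.rkgit.edu.in">Visit RKGIT</a>
</body>
</html>
```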
XML: XML is a very handy format for storing data and communicating it between disparate
systems in a platform-independent fashion. XML is more than just a format for computers:
a guiding principle in its creation was that it should be human-readable and easy to
create.
XML allows UNIX systems written in C to communicate with Web Services that, for example,
run on the Microsoft .NET architecture and are written in ASP.NET. XML is, however, only the
meta-language that the systems understand, and they both need to agree on the format that the
XML data will be in. Typically, one of the partners in the process will offer a service to
the other: one is in charge of the format of the data.
The definition of that format, for example a DTD (Document Type Definition), serves two purposes. The first is to ensure that the data that makes it past
the parsing stage is at least in the right structure. As such, it's a first level at which 'garbage'
input can be rejected. Secondly, the definition documents the protocol in a standard,
formal way, which makes it easier for developers to understand what is available.
External Definition:
<?xml version="1.0"?>
The actual body of the DTD itself contains definitions in terms of elements and their attributes.
For example, the following short DTD defines a bookstore. It states that a bookstore has a name,
and stocks books on at least one topic.
Each topic has a name and 0 or more books in stock. Each book has a title, author and ISBN
number. The name of the topic and the name of the bookstore are defined as being the same type
of element: this stores PCDATA, which is just text data. The title and author of the book are
stored as CDATA, text data that won't be parsed for further characters by the XML parser. The ISBN
number is stored as an attribute of the book:
<!DOCTYPE bookstore [
]>
<?xml version="1.0"?>
<!DOCTYPE bookstore [
]>
<bookstore>
<name>Mike's Store</name>
<topic>
<name>XML</name>
<book isbn="123-456-789">
<author>Mike Jervis</author>
</book>
</topic>
</bookstore>
Using an inline definition is handy when you only have a few documents and they are offline, as
the definition is always in the file. However, if, for example, your DTD defines the
XML protocol used to talk between two separate systems, re-transmitting the DTD with each
document adds an overhead to the communications. Having an external DTD eliminates the need
to re-send it each time. We could remove the DTD from the document, and place it in a DTD file
on a Web server that's accessible by the two systems:
<?xml version="1.0"?>
<bookstore>
<name>Mike's Store</name>
<topic>
<name>XML</name>
<book isbn="123-456-789">
<author>Mike Jervis</author>
</book>
</topic>
</bookstore>
The file bookstore.dtd would contain the full definition in a plain text file:
So for example, if you stored your applications settings in an XML file, it could be manually
edited so that the windows start coordinates were strings — and you’d still need to validate this
in your code, rather than have the parser do it for you.
XML SCHEMAS
XML Schemas provide a much more powerful means by which to define your XML document
structure and limitations. XML Schemas are themselves XML documents. They reference
the XML Schema Namespace and even have their own DTD.
What XML Schemas do is provide an Object Oriented approach to defining the format of
an XML document. XML Schemas provide a set of basic types. These types are much
wider ranging than the basic PCDATA and CDATA of DTDs. They include most basic
programming types such as integer, byte, string and floating point numbers, but they also expand
into Internet data types such as ISO country and language codes (en-GB for example). A
full list can be found at http://www.w3c.org/TR/xmlschema-0/#simpleTypesTable
The author of an XML Schema then uses these core types, along with various operators
and modifiers, to create complex types of their own. These complex types are then used to define
an element in the XML Document.
As a simple example, let’s try to create a basic XML Schema for defining the bookstore that we
used as an example for DTDs. Firstly, we must declare this as an XSD Document, and, as we
want this to be very user friendly, we’re going to add some basic documentation to it:
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:annotation>
<xsd:documentation xml:lang="en">
</xsd:documentation>
</xsd:annotation>
Now, in the previous example, the bookstore consisted of the sequence of a name and at least
one topic. We can easily do that in an XML Schema:
<xsd:complexType name="bookstoreType">
<xsd:sequence>
</xsd:sequence>
</xsd:complexType>
In this example, we’ve defined an element, bookstore, that will equate to an XML element in our
document. We’ve defined it of type bookstoreType, which is not a standard type, and so
we provide a definition of that type next.
We then define a complexType, which defines bookstoreType as a sequence of name and topic
elements. Our "name" type is an xsd:string, a type defined by the XML Schema Namespace, and
so we’ve fully defined that element.
The topic element, however, is of type topicType, another custom type that we must define. We
have also defined our topic element with minOccurs="1", which means there must be at least one
topic element at all times. Note that maxOccurs also defaults to 1, so maxOccurs="unbounded" must be
specified for there to be no upper limit on the number of
elements that might be included. If we had specified neither, the default would be exactly one
instance, as is used in the name element. Next, we define the schema for the topicType.
<xsd:complexType name="topicType">
</xsd:complexType>
This is all similar to the declaration of the bookstoreType, but note that we have to re-define our
name element within the scope of this type. If we'd used a complex type for name, such
as nameType, which defined only an xsd:string, and defined it outside our types, we could re-use
it in both. However, to illustrate the point, I decided to define it within each section. XML
gets interesting when we get to defining our bookType:
<xsd:complexType name="bookType">
</xsd:complexType>
<xsd:simpleType name="isbnType">
<xsd:restriction base="xsd:string">
<xsd:pattern value="[0-9]{3}[-][0-9]{3}[-][0-9]{3}"/>
</xsd:restriction>
</xsd:simpleType>
So the definition of the bookType is not particularly interesting. But the definition of its attribute
"isbn" is. Not only does XML Schema support the use of types such as
xsd:nonNegativeInteger, but we can also create our own simple types from these basic types
using various modifiers. In the example for isbnType above, we base it on a string, and restrict it
to match a given regular expression. Excusing my poor regex, that should limit any ISBN
attribute to match the standard of three groups of three digits separated by a dash.
This is just a simple example, but it should give you a taste of the many things you can do to
control the content of an attribute or an element. You have far more control over what
is considered a valid XML document using a schema. You can even
extend your types from other types you've created,
require uniqueness within scope, and
provide lookups.
It has a nicely object oriented approach. You could build a library of complexTypes and
simpleTypes for re-use throughout many projects, and even find other definitions of
common types (such as an "address", for example) from the Internet and use these to provide powerful
definitions of your XML documents.
The DTD provides a basic grammar for defining an XML Document in terms of the metadata that
comprise the shape of the document. An XML Schema provides this, plus a detailed way to
define what the data can and cannot contain. It provides far more control for the developer over
what is legal, and it provides an Object Oriented approach, with all the benefits this entails.
Firstly, and rather an important point, is that XML Schema is a new technology. This means that
whilst some XML Parsers support it fully, many still don’t. If you use XML to
communicate with a legacy system, perhaps it will not support the XML Schema.
Many systems interfaces are already defined as a DTD. They are mature definitions, rich
and complex. The effort in re-writing the definition may not be worthwhile.
DTD is also established, and examples of common objects defined in a DTD abound on
the Internet — freely available for re-use. A developer may be able to use these to define a DTD
more quickly than they would be able to accomplish a complete re-development of the
core elements as a new schema.
Finally, you must also consider the fact that the XML Schema is an XML document. It has an
XML Namespace to refer to, and an XML DTD to define it. This is overhead. When a parser
examines the document, it may have to link this all in, interpret the DTD for the Schema, load
the namespace, and validate the schema, etc., all before it can parse the actual XML document in
question. If you’re using XML as a protocol between two systems that are in heavy use, and need
a quick response, then this overhead may seriously degrade performance.
Then again, if your system is available for third party developers as a Web service, then
the detailed enforcement of the XML Schema may protect your application a lot more
effectively from malicious — or just plain bad — XML packets. As an example, Muse.net is an
interesting technology. They have a publicly available SOAP API defined with an XML
Schema that provides their developers more control over what they receive from the user
community.
On the other hand, I was recently involved in designing a system to handle incoming transactions
from multiple devices. In order to scale the system, the chosen service that processes requests is a
SOAP server. However, the system is completely closed, and a simple DTD on the server is
enough to ensure that the packets sent from the clients arrive complete and uncorrupted, without
the additional overhead of XML Schema.