Introduction to XML
What is XML?
• XML stands for Extensible Markup Language
• XML is a markup language much like HTML
• XML was designed to carry data, not to display data
• XML tags are not predefined. You must define your own tags
• XML is designed to be self-descriptive
• XML is a W3C Recommendation
The Difference Between XML and HTML
XML is not a replacement for HTML.
XML and HTML were designed with different goals:
• XML was designed to transport and store data, with focus on what data
is
• HTML was designed to display data, with focus on how data looks
HTML is about displaying information, while XML is about carrying
information.
With XML You Invent Your Own Tags
Tags such as <to> and <from>, used in the note examples throughout this
document, are not defined in any XML standard. These tags are "invented" by
the author of the XML document.
That is because the XML language has no predefined tags.
The tags used in HTML are predefined. HTML documents can only use tags
defined in the HTML standard (like <p>, <h1>, etc.).
XML allows authors to define their own tags and their own document
structure.
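As a small illustration of author-defined tags, the following Python sketch
builds a note document with the standard library module
xml.etree.ElementTree. The tag names are ordinary strings chosen by the
author. The helper name build_note is hypothetical and not part of any
standard.

```python
import xml.etree.ElementTree as ET

def build_note(to, sender):
    """Build a <note> element whose tag names are chosen by the author."""
    note = ET.Element("note")              # "note" is not defined by any XML standard
    ET.SubElement(note, "to").text = to    # neither is "to" ...
    ET.SubElement(note, "from").text = sender  # ... nor "from"
    return note

# Serialize the element tree to a text string.
xml_text = ET.tostring(build_note("Tove", "Jani"), encoding="unicode")
print(xml_text)
```

Any other tag names would work just as well, which is the point: the parser
and serializer do not depend on a fixed vocabulary.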
XML is Not a Replacement for HTML
XML is a complement to HTML.
It is important to understand that XML is not a replacement for HTML. In
most web applications, XML is used to transport data, while HTML is used to
format and display the data.
My best description of XML is this:
XML is a software- and hardware-independent tool for carrying
information.
XML is Everywhere
XML has become an important part of the Web alongside HTML.
XML is one of the most common tools for transmitting data between all sorts
of applications.
XML Tree
An Example XML Document
XML documents use a self-describing and simple syntax:
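Attribute values must always be quoted. Study the two documents below.
Incorrect:
<note date=12/11/2007>
<to>Tove</to>
<from>Jani</from>
</note>
Correct:
<note date="12/11/2007">
<to>Tove</to>
<from>Jani</from>
</note>
The error in the first document is that the date attribute in the note element
is not quoted.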
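The quoting rule can be checked with any XML parser. The sketch below uses
Python's standard xml.etree.ElementTree, which raises ParseError for a
document that is not well formed. The function name is_well_formed is a
hypothetical helper introduced for this illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# The date attribute is not quoted, so the parser rejects the document.
unquoted = '<note date=12/11/2007><to>Tove</to><from>Jani</from></note>'
# The same document with the attribute value in quotes.
quoted = '<note date="12/11/2007"><to>Tove</to><from>Jani</from></note>'

print(is_well_formed(unquoted))
print(is_well_formed(quoted))
```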
Entity References
Some characters have a special meaning in XML.
If you place a character like "<" inside an XML element, it will generate an
error because the parser interprets it as the start of a new element.
This will generate an XML error:
<message>if salary < 1000 then</message>
To avoid this error, replace the "<" character with an entity reference:
<message>if salary &lt; 1000 then</message>
There are 5 predefined entity references in XML:
&lt;      <    less than
&gt;      >    greater than
&amp;     &    ampersand
&apos;    '    apostrophe
&quot;    "    quotation mark
Note: Only the characters "<" and "&" are strictly illegal in XML. The greater
than character is legal, but it is a good habit to replace it.
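The replacement does not have to be done by hand. As a minimal sketch, the
Python standard library function xml.sax.saxutils.escape replaces "&", "<",
and ">" with their entity references, and the parser turns them back into
the original characters when the document is read. The variable names are
chosen for this example only.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "if salary < 1000 then"

# escape() replaces &, < and > with &amp;, &lt; and &gt;.
escaped = escape(raw)
document = "<message>" + escaped + "</message>"
print(document)

# Parsing the document restores the original character in the element text.
parsed = ET.fromstring(document)
print(parsed.text)
```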
Comments in XML
The syntax for writing comments in XML is similar to that of HTML.
<!-- This is a comment -->
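Imagine an application that extracts the <to>, <from>, and <body> elements
from a note document to display a message. Now imagine that the author of
the XML document added some extra information to it: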
<note>
<date>2008-01-10</date>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
Should the application break or crash?
No. The application should still be able to find the <to>, <from>, and
<body> elements in the XML document and produce the same output.
One of the beauties of XML is that it can be extended without breaking
applications.
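A hedged sketch of such an application, written in Python with
xml.etree.ElementTree, is shown below. The ORIGINAL document is an assumed
earlier version of the note (without <date> and <heading>), and
render_message is a hypothetical function name. Because the code looks
elements up by name, the extra elements are simply ignored.

```python
import xml.etree.ElementTree as ET

# Assumed earlier version of the note, before the extra elements were added.
ORIGINAL = """<note>
<to>Tove</to>
<from>Jani</from>
<body>Don't forget me this weekend!</body>
</note>"""

# The extended note from the text above.
EXTENDED = """<note>
<date>2008-01-10</date>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>"""

def render_message(xml_text):
    """Extract <to>, <from> and <body> by name and format them as text."""
    note = ET.fromstring(xml_text)
    return "To: {}\nFrom: {}\n{}".format(
        note.findtext("to"),
        note.findtext("from"),
        note.findtext("body"),
    )

# Both versions of the document produce the same output.
print(render_message(ORIGINAL) == render_message(EXTENDED))
```

An application that relied on element positions instead of names (for
example, "the first child is <to>") would not be this robust.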
XML Attributes
XML elements can have attributes, just like HTML.
Attributes provide additional information about an element.
In HTML, attributes provide additional information about elements:
<img src="computer.gif">
<a href="demo.asp">
Attributes often provide information that is not a part of the data. In the
example below, the file type is irrelevant to the data, but can be important
to the software that wants to manipulate the element:
<file type="gif">computer.gif</file>
My Favorite Way
The following three XML documents contain exactly the same information:
A date attribute is used in the first example:
<note date="10/01/2008">
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
A date element is used in the second example:
<note>
<date>10/01/2008</date>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
An expanded date element is used in the third example (this is my favorite):
<note>
<date>
<day>10</day>
<month>01</month>
<year>2008</year>
</date>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
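To see what the three styles mean for the software that reads them, here is
a rough Python sketch that extracts the date from each form. The function
name read_date and the shortened documents are assumptions made for this
illustration, and the day/month/year order is taken from the third example.
Note that the first two forms need string splitting, while the expanded form
can be read directly by element name.

```python
import xml.etree.ElementTree as ET

AS_ATTRIBUTE = '<note date="10/01/2008"><to>Tove</to></note>'
AS_ELEMENT = '<note><date>10/01/2008</date><to>Tove</to></note>'
AS_EXPANDED = ('<note><date><day>10</day><month>01</month>'
               '<year>2008</year></date><to>Tove</to></note>')

def read_date(xml_text):
    """Return the note's date as a (day, month, year) tuple of strings."""
    note = ET.fromstring(xml_text)
    date = note.find("date")
    if date is None:
        # First style: the date is an attribute and must be split by hand.
        return tuple(note.get("date").split("/"))
    if len(date) == 0:
        # Second style: a date element with plain text, also split by hand.
        return tuple(date.text.split("/"))
    # Third style: each part is its own element, no string parsing needed.
    return (date.findtext("day"), date.findtext("month"), date.findtext("year"))

for doc in (AS_ATTRIBUTE, AS_ELEMENT, AS_EXPANDED):
    print(read_date(doc))
```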
Avoid XML Attributes?
Some of the problems with using attributes are:
• attributes cannot contain multiple values (elements can)
• attributes cannot contain tree structures (elements can)
• attributes are not easily expandable (for future changes)
Attributes are difficult to read and maintain. Use elements for data. Use
attributes for information that is not relevant to the data.
Don't end up like this:
<note day="10" month="01" year="2008"
to="Tove" from="Jani" heading="Reminder"
body="Don't forget me this weekend!">
</note>
XML DTD
The purpose of a DTD is to define the structure of an XML document. It
defines the structure with a list of legal elements:
<!DOCTYPE note
[
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
If you want to study DTD, you will find our DTD tutorial on our homepage.
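The rule "note (to,from,heading,body)" says that a note must contain exactly
these four child elements in this order. Python's standard library does not
validate against a DTD, so the sketch below is only a hand-written check of
that one content model, not a real DTD validator. The names
EXPECTED_CHILDREN and matches_note_model are hypothetical.

```python
import xml.etree.ElementTree as ET

# Child elements required by: <!ELEMENT note (to,from,heading,body)>
EXPECTED_CHILDREN = ["to", "from", "heading", "body"]

def matches_note_model(xml_text):
    """Check the note content model by hand (not a full DTD validation)."""
    note = ET.fromstring(xml_text)
    if note.tag != "note":
        return False
    children = list(note)
    # The children must appear exactly once each, in the declared order.
    if [child.tag for child in children] != EXPECTED_CHILDREN:
        return False
    # (#PCDATA) means text only, so no child may contain nested elements.
    return all(len(child) == 0 for child in children)

valid = ("<note><to>Tove</to><from>Jani</from>"
         "<heading>Reminder</heading><body>See you</body></note>")
# An extra <date> element is not allowed by the content model.
invalid = ("<note><date>2008-01-10</date><to>Tove</to><from>Jani</from>"
           "<heading>Reminder</heading><body>See you</body></note>")

print(matches_note_model(valid))
print(matches_note_model(invalid))
```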
XML Schema
W3C supports an XML-based alternative to DTD, called XML Schema:
<xs:element name="note">
<xs:complexType>
<xs:sequence>
<xs:element name="to" type="xs:string"/>
<xs:element name="from" type="xs:string"/>
<xs:element name="heading" type="xs:string"/>
<xs:element name="body" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
If you want to study XML Schema, you will find our Schema tutorial on
our homepage.
Note: This only checks whether your XML is "well-formed". If you want to
validate your XML against a DTD, see the last paragraph on this page.
Note: If you get an "Access denied" error, it's because your browser security
does not allow file access across domains.
The file "note_error.xml" demonstrates your browser's error handling. If you
want to see an error-free message, substitute "cd_catalog.xml" for
"note_error.xml".
Note: Only Internet Explorer will actually check your XML against the DTD.
Firefox, Mozilla, Netscape, and Opera will not.