XML: In This Chapter, You Will Learn
Introduction to XML
XML (Extensible Markup Language) is a markup language for documents containing well-structured information. Structured information contains both content (words, pictures, etc.) and some indication of what role that content plays (for example, content in a section heading has a different significance from content in a footnote, which in turn differs from content in a figure caption or a database table). Almost all documents have some structure. A markup language is a mechanism for identifying the structures in a document, and XML defines a standard way to add markup to documents. It was designed to carry data, not to display data. Its tags are not predefined; you must define your own. XML is designed to be self-descriptive. Like the Hypertext Markup Language (HTML), the language of today's Web pages, XML is a formal recommendation from the World Wide Web Consortium (W3C).
Markup
HTML, SGML, and XML all mark up content using tags. The difference is that SGML and XML mainly deal with the relationship between content and structure: the structural tags that mark up the content are not predefined (you can make up your own language), and style is kept completely separate. HTML, on the other hand, is a mix of content marked up with both structural and stylistic tags, and its tags are predefined by the HTML language. By mixing structure, content and style you limit yourself to one form of presentation, which in HTML's case means a limited group of browsers for the World Wide Web. By separating structure and content from style, you can take one file and present it in multiple forms. XML can be transformed to HTML/XHTML and displayed on the Web, the information can be transformed and published to paper, and the data can be read by any XML-aware browser or application.
SGML (Standard Generalized Markup Language)
Historically, electronic publishing applications such as Microsoft Word, Adobe PageMaker or QuarkXPress "marked up" documents in a proprietary format that was recognized only by that particular application. The document markup for both structure and style was mixed in with the content and was published to only one medium, the printed page. These programs and their proprietary markup had no capability to define the appearance of the information for any medium besides paper, and did not describe the actual content of the document very well beyond paragraphs, headings and titles. The file format could not be read by or exchanged with other programs; it was useful only within the application that created it. Because SGML is a nonproprietary international standard, it allows you to create documents that are independent of any specific hardware or software. The document structure (what elements are used and their relationship to each other) is described in a file called the DTD (Document Type Definition). The DTD defines the relationships between a document's elements, creating a consistent, logical structure for each document. SGML is good for handling large-scale, long-term information management needs and has been around for more than a decade as the language of defense contractors and the electronic publishing industry. Because SGML is very large, powerful, and complex, it is hard to learn and understand and is not well suited to the Web environment.
Features of XML
XML is Just Plain Text
XML is nothing special. It is just plain text. Software that can handle plain text can also handle XML. However, XML-aware applications can handle the XML tags specially. The functional meaning of the tags depends on the nature of the application.

With XML You Invent Your Own Tags
The tags are "invented" by the author of the XML document, because the XML language has no predefined tags. The tags used in HTML (and the structure of HTML) are predefined. HTML documents can only use tags defined in the HTML standard (like <p>, <h1>, etc.). XML allows the author to define his own tags and his own document structure.

XML is Not a Replacement for HTML
XML is a complement to HTML. It is important to understand that XML is not a replacement for HTML. In most web applications, XML is used to transport data, while HTML is used to format and display the data. My best description of XML is this: XML is a software- and hardware-independent tool for carrying information.

XML is a W3C Recommendation
XML became a W3C Recommendation on 10 February 1998.

XML is Everywhere
We have been participating in XML development since its creation. It has been amazing to see how quickly the XML standard has developed, and how quickly a large number of software vendors have adopted the standard. XML is now as important for the Web as HTML was to the foundation of the Web. XML is everywhere. It is the most common tool for data transmission between all sorts of applications, and is becoming more and more popular for storing and describing information.

XML Separates Data from HTML
If you need to display dynamic data in your HTML document, it will take a lot of work to edit the HTML each time the data changes. With XML, data can be stored in separate XML files. This way you can concentrate on using HTML for layout and display, and be sure that changes in the underlying data will not require any changes to the HTML. With a few lines of JavaScript, you can read an external XML file and update the data content of your HTML. You will learn more about this in a later chapter of this tutorial.

XML Simplifies Data Sharing
In the real world, computer systems and databases contain data in incompatible formats. XML data is stored in plain text format. This provides a software- and hardware-independent way of storing data, which makes it much easier to create data that different applications can share.

XML Simplifies Data Transport
One of the most time-consuming challenges for developers is to exchange data between incompatible systems over the Internet. Exchanging data as XML greatly reduces this complexity, since the data can be read by different incompatible applications.

XML Simplifies Platform Changes
Upgrading to new systems (hardware or software platforms) is always very time consuming. Large amounts of data must be converted, and incompatible data is often lost. XML data is stored in text format. This makes it easier to expand or upgrade to new operating systems, new applications, or new browsers without losing data.

XML Makes Your Data More Available
Since XML is independent of hardware, software and application, XML can make your data more available and useful. Different applications can access your data, not only in HTML pages, but also from XML data sources. With XML, your data can be available to all kinds of "reading machines" (handheld computers, voice machines, news feeds, etc.), and more accessible to blind people or people with other disabilities.

XML is Used to Create New Internet Languages
A lot of new Internet languages are created with XML. Here are some examples:
- XHTML, the latest version of HTML
- WSDL, for describing available web services
- WAP and WML, as markup languages for handheld devices
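The data sharing and data transport points above can be illustrated with a short sketch. It uses Python's standard xml.etree.ElementTree module; the record fields (name, level) and the helper names to_xml and from_xml are made up for this illustration and do not come from the chapter.

```python
import xml.etree.ElementTree as ET

# Hypothetical records; in practice these could come from any database.
records = [
    {"name": "XML Tutorial", "level": "beginner"},
    {"name": "HTML Tutorial", "level": "beginner"},
]

def to_xml(rows):
    """Serialize a list of dicts to an XML string with a <tutorials> root."""
    root = ET.Element("tutorials")
    for row in rows:
        item = ET.SubElement(root, "tutorial")
        for key, value in row.items():
            ET.SubElement(item, key).text = value
    return ET.tostring(root, encoding="unicode")

def from_xml(text):
    """Rebuild the list of dicts from the XML produced by to_xml."""
    root = ET.fromstring(text)
    return [{child.tag: child.text for child in item} for item in root]

# One system writes plain-text XML, another reads it back.
xml_text = to_xml(records)
restored = from_xml(xml_text)
```

Because the exchanged value is plain text, the receiving side needs only an XML parser, not the sender's software or storage format.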
The following example shows the parts of an XML document. The first four lines form the optional prolog: the XML declaration, the document type declaration (DTD), a comment, and a processing instruction, separated by white space. The prolog is followed by the root element's opening tag and the required elements and content.

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <!DOCTYPE tutorials SYSTEM "tutorials.dtd">
    <!-- Here is a comment -->
    <?xml-stylesheet type="text/css" href="myStyles.css"?>
    <tutorials>
      <tutorial>
        <name>XML Tutorial</name>
        <url>http://www.quackit.com/xml/tutorial</url>
      </tutorial>
      <tutorial>
        <name>HTML Tutorial</name>
        <url>http://www.quackit.com/html/tutorial</url>
      </tutorial>
    </tutorials>
Prolog (optional)
Right at the top of the document, we have a prolog (also spelt prologue). A prolog is optional, but if it is included, it must come at the beginning of the document. The prolog can contain things such as the XML declaration, comments, processing instructions, white space, and a document type declaration. Although the prolog (and everything in it) is optional, it's recommended that you include the XML declaration in your XML documents.
XML Declaration
The XML declaration indicates that the document is written in XML and specifies which version of XML. The XML declaration, if included, must be on the first line of the document. The XML declaration can also specify the character encoding for the document (optional) and whether the document relies on external declarations (optional). In our example, we specify that the document uses UTF-8 encoding (although we don't really need to, as UTF-8 is the default), and we specify that the document refers to external entities by using standalone="no". This is not a standalone document as it relies on an external resource (i.e. the DTD).

Document Type Definition (DTD)
The DTD defines the rules of your XML document. Although XML itself has rules, the rules defined in a DTD are specific to your own needs. More specifically, the DTD allows you to specify the names of the elements that are allowed in the document, which elements are allowed to be nested inside other elements, and which elements can only contain data. The DTD is used when you validate your XML document. A validating parser reports an error if the document doesn't adhere to the DTD; a non-validating parser checks only that the document is well formed. DTDs can be internal (i.e. specified within the document) or external (i.e. specified in an external file). In our example, the DTD is external.

Comments
XML comments begin with <!-- and end with -->. Similar to HTML comments, XML comments allow you to write notes within your document without them being treated as part of the document's data. You normally write comments as an explanatory note to yourself or another programmer. Comments can appear almost anywhere within your document, but not inside a tag and not before the XML declaration.

Processing Instructions
Processing instructions begin with <? and end with ?>. Processing instructions are instructions for the application that processes the XML. The individual instructions are not defined by the XML recommendation. Rather, they are processor-dependent, so not all processors understand all processing instructions.
White Space
White space is simply blank space created by carriage returns, line feeds, tabs, and/or spaces. White space used to lay out the markup (for example, indentation between elements) doesn't normally change how an application treats the document, so you can choose to include it or not. Speaking of white space, there is a special attribute (xml:space) that you can use to signal that white space within an element should be preserved (but we won't concern ourselves with that just now).
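To see how an application views the document structure described above, here is a small sketch using Python's standard xml.etree.ElementTree module. It assumes a copy of the tutorials example without the DOCTYPE and style sheet lines, so that it runs without any external files; the variable names are chosen for this illustration only.

```python
import xml.etree.ElementTree as ET

# The example document, reduced to the XML declaration, a comment and the
# root element (DOCTYPE and xml-stylesheet lines left out for this sketch).
DOCUMENT = """<?xml version="1.0" encoding="UTF-8"?>
<!-- Here is a comment -->
<tutorials>
  <tutorial>
    <name>XML Tutorial</name>
    <url>http://www.quackit.com/xml/tutorial</url>
  </tutorial>
  <tutorial>
    <name>HTML Tutorial</name>
    <url>http://www.quackit.com/html/tutorial</url>
  </tutorial>
</tutorials>"""

# Parse from bytes so the encoding named in the declaration is honoured.
root = ET.fromstring(DOCUMENT.encode("utf-8"))

# The comment and the indentation do not appear as elements; the parser
# hands back the root element and its children.
tutorials = [(t.findtext("name"), t.findtext("url")) for t in root.findall("tutorial")]
```

After parsing, root.tag is the name of the root element and tutorials holds one (name, url) pair per tutorial element.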
XML Elements
XML elements are represented by tags. Elements usually consist of an opening tag and a closing tag, but they can consist of just one tag. Opening tags consist of <, followed by the element name, and ending with >. Closing tags are the same but have a forward slash inserted between the less than symbol and the element name. Example: <tag>Data</tag> Example of empty tag: <tag />
The following syntax rules are important to note, especially if you're used to working with HTML where you don't usually need to worry about these rules. All Elements Must Be Closed Properly If you're familiar with HTML, you will know that some HTML tags don't need to be closed. In XML however, you must close all tags. This is usually done in the form of a closing tag where you repeat the opening tag, but place a forward slash before the element name (i.e. </child>). If you are using an empty element (i.e. one with no closing tag), you need to place a forward slash before the greater than symbol at the end of the tag (i.e. <child />). Example for opening/closing tags: <child>Data</child> Example for empty elements: <child attribute="value" />
Tags Are Case Sensitive
All tags must be written using the correct case. XML sees <tutorial> as a different tag to <Tutorial>.

Wrong:
    <Tutorial>XML</tutorial>
Right:
    <Tutorial>XML</Tutorial>
    <tutorial>XML</tutorial>
    <TUTORIAL>XML</TUTORIAL>

Elements Must Be Nested Properly
You can place elements inside other elements, but you need to ensure each element's closing tag doesn't overlap with any other tags.

Wrong:
    <tutorial>
      <name>XML</tutorial>
    </name>
Right:
    <tutorial>
      <name>XML</name>
    </tutorial>
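The element rules above can be checked mechanically, because an XML parser rejects any document that breaks them. The sketch below uses Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the sample labels are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Right and wrong forms taken from the rules above.
samples = {
    "matching case": "<tutorial>XML</tutorial>",
    "mismatched case": "<Tutorial>XML</tutorial>",
    "proper nesting": "<tutorial><name>XML</name></tutorial>",
    "overlapping tags": "<tutorial><name>XML</tutorial></name>",
    "unclosed element": "<tutorial><name>XML</tutorial>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
```

Only the "matching case" and "proper nesting" samples parse; the other three raise a parse error, which the helper turns into False.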
XML Attributes
The previous section covered the syntax rules related to XML elements. XML elements can also contain attributes. You use attributes within your elements to provide more information about the element. These are represented as name/value pairs.

Example:
    <tag attribute="value">Data</tag>

It's important to remember the following syntax rules when using attributes.

Quotes
You must place quotation marks around the attribute's value.

Wrong:
    <tutorials type=Web>
      <tutorial>
        <name>XML</name>
      </tutorial>
    </tutorials>
Right:
    <tutorials type="Web">
      <tutorial>
        <name>XML</name>
      </tutorial>
    </tutorials>
Shorthand Is Prohibited
Attributes must contain a value. Some HTML coders like to use shorthand, where if you provide the attribute name without a value, it will equal true. This is not allowed in XML.

Wrong:
    <tutorials published>
      <tutorial>
        <name>XML</name>
      </tutorial>
    </tutorials>
Right:
    <tutorials published="true">
      <tutorial>
        <name>XML</name>
      </tutorial>
    </tutorials>
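As a rough illustration of the attribute rules, the following sketch reads attributes with Python's standard xml.etree.ElementTree module and shows that the two "wrong" forms are rejected by the parser. The author attribute and the helper name parses are hypothetical, introduced only for this example.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<tutorials type="Web" published="true">'
    "<tutorial><name>XML</name></tutorial>"
    "</tutorials>"
)

# Attribute values always arrive as strings (name/value pairs).
kind = root.get("type")
published = root.get("published") == "true"
# A default is returned for an attribute that is not present.
author = root.get("author", "unknown")

def parses(text):
    """Return True if the parser accepts text as XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

unquoted_ok = parses("<tutorials type=Web></tutorials>")    # missing quotes
shorthand_ok = parses("<tutorials published></tutorials>")  # missing value
```

Both unquoted_ok and shorthand_ok come out False, which matches the "Wrong" examples above.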
DTD is a simple language with four main types of declarations: DOCTYPE, ELEMENT, ATTLIST, and ENTITY. One DOCTYPE declaration defines one document type. Within the DOCTYPE declaration, one or more ELEMENT declarations, together with any ATTLIST and ENTITY declarations, define the details of the document type. The DTD declarations can either be included inside the XML file itself or be stored as a separate file and linked to the XML file. XML files can be validated against their document types with XML validation tools.
1. Internal DTDs
Internal DTDs (markup declarations) are inserted within the DOCTYPE declaration. DTDs inserted this way are used only in that specific document. This might be the approach to take when a small number of tags is used in a single document, as in this example:

    <?xml version="1.0"?>
    <!DOCTYPE film [
      <!ENTITY COM "Comedy">
      <!ENTITY SF "Science Fiction">
      <!ELEMENT film (title, genre, year)+>
      <!ELEMENT title (#PCDATA)>
      <!ATTLIST title
        xml:lang NMTOKEN "EN"
        id ID #IMPLIED>
      <!ELEMENT genre (#PCDATA)>
      <!ELEMENT year (#PCDATA)>
    ]>
    <film>
      <title id="f1">Tootsie</title>
      <genre>&COM;</genre>
      <year>1982</year>
      <title id="f2">Jurassic Park</title>
      <genre>&SF;</genre>
      <year>1993</year>
    </film>

Note that the value of an ID attribute must be an XML name, so it has to start with a letter or underscore rather than a digit.

2. External DTDs
DTDs can be very complex, and creating a DTD requires a certain amount of work. DTDs are stored as plain text files with the extension '.dtd'. In the following example we assume that the declarations of the previous internal DTD were saved as a separate file (under the name film.dtd), which is therefore now referred to as an external DTD:

    <?xml version="1.0"?>
    <!DOCTYPE film SYSTEM "film.dtd">
    <film>
      <title id="f1">Tootsie</title>
      <genre>&COM;</genre>
      <year>1982</year>
      <title id="f2">Jurassic Park</title>
      <genre>&SF;</genre>
      <year>1993</year>
    </film>
If a document is valid, it is clearly defined which elements and attributes it may contain and how they fit together. A tag that is not declared in the DTD cannot be used. Companies that exchange XML documents can check them against the same DTD. Because a valid XML document is also well formed and every element name must be declared, a mistyped tag is reported as an error. What validation checks is the structure of the document; with a DTD there is essentially no check of the text content itself.

Difference between Valid XML and Well-Formed XML
A well-formed document respects only the basic syntax rules of XML. A valid document is well formed and, in addition, conforms to the rules of a document type, which may be user defined (for example in a DTD).
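The difference between well-formed and valid can be made concrete with a non-validating parser. Python's standard xml.etree.ElementTree module expands entities declared in an internal DTD but does not check the document against the DTD, so the sketch below (a cut-down version of the film example, with a hypothetical undeclared director element) parses both documents without complaint. Checking validity would need a validating tool, which is outside this sketch.

```python
import xml.etree.ElementTree as ET

# Well formed and valid: every element is declared in the internal DTD.
VALID = """<?xml version="1.0"?>
<!DOCTYPE film [
  <!ENTITY COM "Comedy">
  <!ELEMENT film (title, genre, year)+>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT genre (#PCDATA)>
  <!ELEMENT year (#PCDATA)>
]>
<film><title>Tootsie</title><genre>&COM;</genre><year>1982</year></film>"""

# Still well formed, but <director> is not declared in the DTD, so a
# validating parser would reject it. (The element is made up for this demo.)
INVALID = VALID.replace("<year>1982</year>", "<director>unknown</director>")

valid_root = ET.fromstring(VALID)
invalid_root = ET.fromstring(INVALID)  # no error: only syntax is checked

# The internal entity &COM; is replaced by its declared text.
genre = valid_root.findtext("genre")
undeclared = invalid_root.find("director")
```

Here genre holds the expanded entity text, and undeclared is a normal element object even though the DTD never declared it.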
Linking
To link to a style sheet, you use an XML processing instruction to associate the style sheet with the current document. This statement should occur before the root element of the document.

    <?xml-stylesheet type="text/css" href="styles/general.css"?>

The two attributes of the instruction are as follows:
href: The URL for the style sheet.
type: The MIME type of the document being linked, which in this case is text/css.

MIME stands for Multipurpose Internet Mail Extensions. It is a standard which defines how to make systems aware of the type of content being included in e-mail messages.
general.css
    employees {
      background-color: #ffffff;
      width: 100%;
    }
    employee {
      display: block;
      margin-bottom: 30pt;
      margin-left: 0;
    }
    name {
      color: #FF0000;
      font-size: 20pt;
    }
    city, state, zipcode {
      color: #0000FF;
      font-size: 20pt;
    }
    <?xml version="1.0" encoding="utf-8" standalone="no"?>
    <!-- This XML file represents the details of employees -->
    <?xml-stylesheet type="text/css" href="styles/general.css"?>
    <employees>
      <employee id="1">
        <name>
          <firstName>Mohit</firstName>
          <lastName>Jain</lastName>
        </name>
        <city>Karnal</city>
        <state>Haryana</state>
        <zipcode>98122</zipcode>
      </employee>
      <employee id="2">
        <name>
          <firstName>Rahul</firstName>
          <lastName>Kapoor</lastName>
        </name>
        <city>Ambala</city>
        <state>Haryana</state>
        <zipcode>98112</zipcode>
      </employee>
    </employees>
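A program can discover which style sheet a document asks for by looking at the processing instructions that precede the root element. The sketch below is one way to do this with Python's standard xml.dom.minidom module; it uses a shortened employees document, and the variable names are assumptions for this illustration.

```python
from xml.dom import minidom

# A shortened version of the employees document with its style sheet link.
DOCUMENT = """<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="styles/general.css"?>
<employees>
  <employee id="1"><city>Karnal</city></employee>
</employees>"""

doc = minidom.parseString(DOCUMENT)

# Processing instructions in the prolog are children of the document node.
# Each one has a target (the name after "<?") and data (the rest).
stylesheets = [
    node.data
    for node in doc.childNodes
    if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
    and node.target == "xml-stylesheet"
]
```

The data of a processing instruction is an opaque string as far as the parser is concerned; picking out href and type is left to the application that understands the xml-stylesheet instruction.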
An XSL stylesheet specifies the presentation of a class of XML documents by describing how an instance of the class is transformed into an XML document that uses the formatting vocabulary. Like CSS, an XSL style sheet is linked to an XML document and tells the browser how to display each of the document's elements. An XML document with an attached XSL style sheet can be opened directly in a browser that supports XSL, such as Internet Explorer. You don't need to use an HTML page to access and display the data. There are two basic steps for using XSL to display an XML document: create the XSL file, then link the XSL style sheet to the XML document.

Creating XSL file
An XSL file is a plain text file with the .xsl extension that contains a set of rules telling the web browser how to format and display the elements in a specific XML document. You can create an XSL file using your favorite text editor, such as Notepad, WordPad, or another text or HTML editor.

Linking
To link to a style sheet, you use an XML processing instruction to associate the style sheet with the current document. This statement should occur before the root element of the document.

    <?xml-stylesheet type="text/xsl" href="styles/general.xsl"?>

The two attributes of the instruction are as follows:
href: The URL for the style sheet.
type: The MIME type of the document being linked, which in this case is text/xsl.
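general.xsl
An XSL file contains template rules written in XML rather than CSS rules. A minimal XSLT style sheet for the employees document, mirroring the colours and sizes used in general.css, might look like this:

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:template match="/">
        <html>
          <body style="background-color: #ffffff;">
            <xsl:for-each select="employees/employee">
              <p style="color: #FF0000; font-size: 20pt;">
                <xsl:value-of select="name/firstName"/>
                <xsl:text> </xsl:text>
                <xsl:value-of select="name/lastName"/>
              </p>
              <p style="color: #0000FF; font-size: 20pt; margin-bottom: 30pt;">
                <xsl:value-of select="city"/>
                <xsl:text>, </xsl:text>
                <xsl:value-of select="state"/>
                <xsl:text> </xsl:text>
                <xsl:value-of select="zipcode"/>
              </p>
            </xsl:for-each>
          </body>
        </html>
      </xsl:template>
    </xsl:stylesheet>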
Research is also being carried out into the properties and use cases for binary encoding of the XML information set.
Short Question Answers
1. What is a markup language?
A markup language is a set of words and symbols for describing the identity of pieces of a document (for example: this is a paragraph, this is a heading, this is a list, this is the caption of this figure, etc.). Programs can use this with a style sheet to create output for screen, print, audio, video, Braille, etc.

2. What is XML?
XML is the Extensible Markup Language. It improves the functionality of the Web by letting you identify your information in a more accurate, flexible, and adaptable way. It is extensible because it is not a fixed format like HTML (which is a single, predefined markup language). Instead, XML is actually a meta language (a language for describing other languages), which lets you design your own markup languages for limitless different types of documents. XML can do this because it's written in SGML, the international standard meta language for text document markup (ISO 8879).

3. Aren't XML, SGML, and HTML all the same thing?
Not quite; SGML is the mother tongue, and has been used for describing thousands of different document types in many fields of human activity, from transcriptions of ancient Irish manuscripts to the technical documentation for stealth bombers, and from patients' clinical records to musical notation. SGML is very large and complex, however, and probably overkill for most common office desktop applications. XML is an abbreviated version of SGML, designed to make it easier to use over the Web, easier for you to define your own document types, and easier for programmers to write programs to handle them. It omits all the complex and less-used options of SGML in return for the benefits of being easier to write applications for, easier to understand, and more suited to delivery and interoperability over the Web. But it is still SGML, and XML files may still be processed in the same way as any other SGML file (see the question on XML software).
HTML is just one of many SGML or XML applications, the one most frequently used on the Web. Technical readers may find it more useful to think of XML as being SGML-- rather than HTML++.

4. What are the benefits of XML?
There are many benefits of using XML on the Web:
Simplicity: Information coded in XML is easy to read and understand, plus it can be processed easily by computers.
Openness: XML is a W3C standard, endorsed by software industry market leaders.
Extensibility: There is no fixed set of tags. New tags can be created as they are needed.
Self-description: In traditional databases, data records require schemas set up by the database administrator. XML documents can be stored without such definitions, because they contain metadata in the form of tags and attributes.
Contains machine-readable context information: Tags, attributes and element structure provide context information that can be used to interpret the meaning of content, opening up new possibilities for highly efficient search engines, intelligent data mining, agents, etc.
Facilitates the comparison and aggregation of data: The tree structure of XML documents allows documents to be compared and aggregated efficiently, element by element.
Can embed multiple data types: XML documents can contain any possible data type, from multimedia data (image, sound, video) to active components (Java applets, ActiveX).

5. What is the difference between XML and HTML?
HTML: is used to mark up text so it can be displayed to users.
XML: is used to mark up data so it can be processed by computers.

HTML: describes both structure (e.g. <p>, <h2>, <em>) and appearance (e.g. <br>, <font>, <i>).
XML: describes only content, or meaning.

HTML: uses a fixed, unchangeable set of tags.
XML: you make up your own tags.
XML in no way clashes with HTML, since they serve two different purposes.

6. Do I have to know HTML or SGML before I learn XML?
No, although it's useful because a lot of XML terminology and practice derives from two decades' experience of SGML. Be aware that knowing HTML is not the same as understanding SGML. Although HTML was written as an SGML application, browsers ignore most of it (which is why so many useful things don't work), so just because something is done a certain way in HTML browsers does not mean it's correct, least of all in XML.

7. What is the structure of an XML document?
8. Define XML elements and attributes with an example.
There are no rules about when to use attributes and when to use elements. For example:

1.
    <person sex="female">
      <firstname>Anna</firstname>
      <lastname>Smith</lastname>
    </person>
2.
    <person>
      <sex>female</sex>
      <firstname>Anna</firstname>
      <lastname>Smith</lastname>
    </person>

In the first example sex is an attribute. In the second, sex is an element. Both examples provide the same information.

9. What is a well-formed XML document?
If a document is syntactically correct, it can be called a well-formed XML document. A well-formed document conforms to XML's basic rules of syntax:
Every open tag must be closed.
The open tag must exactly match the closing tag: XML is case-sensitive.
All elements must be embedded within a single root element.
Child tags must be closed before parent tags.
A well-formed document has correct XML tag syntax, but the elements might be invalid for the specified document type.

10. What is a valid XML document?
If a document is structurally correct, it can be called a valid XML document. A valid document conforms to the predefined rules of a specific type of document. These rules can be written by the author of the XML document or by someone else. The rules determine the type of data that each part of a document can contain.
Note: A valid XML document is implicitly well formed, but a well-formed document may not be valid.

11. How is the XML structure defined?
An XML document has a structure that must be defined before we can create documents and work with them. The structural rules can be defined using many available technologies; two popular ones are the DTD and XML Schema, covered in the questions that follow.

12. What is DTD?
A Document Type Definition (DTD) defines the legal building blocks of an XML document. It defines rules for a specific type of document, including:
Names of elements, and how and where they can be used
The order of elements
Proper nesting and containment of elements
Element attributes
To apply a DTD to an XML document, you can:
Include the DTD's element definitions within the XML document itself.
Provide the DTD as a separate file, whose name you reference in the XML document.

13. What is XML Schema?
An XML Schema describes the structure of an XML instance document by defining what each element must or may contain. An XML Schema is expressed in the form of a separate XML file. XML Schema provides much more control over element and attribute datatypes. Some datatypes are predefined and new ones can be created. A schema begins like this:

    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
      <xsd:element name="test">
        <xsd:complexType>

14. What are the differences between DTDs and Schema?

Schema: A schema document is an XML document, i.e. the structure of an XML document is specified by another XML document.
DTD: DTDs follow SGML syntax.

Schema: Supports a variety of datatypes, similar to a programming language.
DTD: Everything is treated as text.

Schema: It is possible to inherit and create relationships among elements.
DTD: This is not possible without invalidating existing documents.

Schema: It is possible to group elements and attributes so that they can be treated as a single logical unit.
DTD: Grouping of elements and attributes is not possible.

Schema: It is possible to specify an upper limit for the number of occurrences of an element.
DTD: Only the ?, * and + occurrence indicators are available, so an arbitrary upper limit on the occurrences of an element cannot be specified.

15. Define XML namespace.
XML namespaces provide a method to avoid element name conflicts. An XML namespace is a collection of element type and attribute names. A reasonable argument can be made that XML namespaces don't actually exist as physical or conceptual entities.

16. What is XSL?
XSL is a language for expressing style sheets. An XSL style sheet is a file that describes the way to display an XML document. Using XSL style sheets, we can separate the content of the XML document from its styling. An XSL style sheet begins with the XML declaration:
    <?xml version="1.0" encoding="ISO-8859-1"?>
The <xsl:stylesheet> element defines that the document is an XSLT style sheet document. The <xsl:template> element defines a template.

17. Define CSS and XSL.
XSL is a language for expressing style sheets. An XSL style sheet is a file that describes the way to display an XML document. Cascading Style Sheets (CSS) is an answer to the limitations of HTML, where the structure of documents was defined but not the display. CSS formats documents for display in browsers that support it.

18. What is the relation between XHTML and XML?
XML (Extensible Markup Language) is a generic markup language for organizing generic information into a structured document with embedded tags. XHTML is entirely based on XML. You can actually say that XHTML is a child language of XML.
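
The name conflict that namespaces (question 15) are meant to avoid can be sketched in a few lines of Python with the standard xml.etree.ElementTree module. The two namespace URIs and the element names below are invented for this illustration; any two vocabularies that both define a table element would do.

```python
import xml.etree.ElementTree as ET

# Two vocabularies both use an element called "table"; the prefixes h and f
# bind each one to its own (made-up) namespace URI.
DOCUMENT = """<root xmlns:h="http://example.com/html"
      xmlns:f="http://example.com/furniture">
  <h:table><h:td>Apples</h:td></h:table>
  <f:table><f:name>Coffee table</f:name></f:table>
</root>"""

# Prefix-to-URI map used when searching; the prefixes here are local to
# this code and need not match the ones used in the document.
NS = {"h": "http://example.com/html", "f": "http://example.com/furniture"}

root = ET.fromstring(DOCUMENT)
html_table = root.find("h:table", NS)
furniture_table = root.find("f:table", NS)

# Internally each tag is stored as {namespace URI}local-name, so the two
# "table" elements never collide.
html_tag = html_table.tag
furniture_tag = furniture_table.tag
```

Although both elements are written as table in the document, the parser keeps them apart by their namespace URI rather than by their prefix.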
d. Specify tag filename

6. The advantages of XML over HTML are
(i) It allows processing of data stored in web-pages
(ii) It uses meaningful tags which aids in understanding the nature of a document
(iii) Is simpler than HTML
(iv) It separates presentation and structure of document
a. (i), (ii) and (iii)
b. (i), (ii) and (iv)
c. (ii), (iii) and (iv)
d. (i), (iii) and (iv)

7. XSL definition is used along with XML definition to specify
a. The data types of the contents of XML document
b. The presentation of XML document
c. The links with other documents
d. The structure of XML document

8. XLL definition is used along with XML to specify
a. The data types of the contents of XML document
b. The presentation of XML document
c. The links with other documents
d. The structure of XML document

9. DTD definition is used along with XML to specify
a. The data types of the contents of XML document
b. The presentation of XML document
c. The links with other documents
d. The structure of XML document

10. Output of XML document can be viewed in a
a. Word Processor
b. Web browser
c. Notepad
d. None of the above

11. What is the correct way of describing XML data?
a. XML uses a DTD to describe data
b. XML uses a description node to describe data
c. XML uses XSL to describe the data
d. XML uses a validator to describe the data

12. Comments in XML document is given by:
a. <?_ _ _ _>
b. <!_ _ _ _!>
c. <!_ _ _ _>
d. </_ _ _ _>

13. Which statement is true?
a. An XML document can have one root element
b. An XML document can have one child element
c. XML elements have to be in lower case
d. All of the above
What is a well-formed document? How is it made valid?
What are the differences between DTDs and Schema?
Explain the structure of XML documents.
What are XML elements and attributes?
What is XSL? What are the steps to transform XML into HTML using XSL?
How is XSL different from CSS? Discuss.
Why is XML such an important development?
What are the XML rules for distinguishing between the content of a document and markup element?