01 Introduction

XML (Extensible Markup Language) is a markup language used for data interchange, allowing for the storage and transport of data in a software- and hardware-independent manner. It features a flexible structure with user-defined tags, enabling extensibility and simplicity in data sharing and transport. XML documents consist of a prolog and a body, and they can include elements, attributes, comments, and namespaces to manage data effectively.

Uploaded by

acsample0

Extensible Markup Language

(XML)
XML
Definition
• A markup language that is fast becoming a standard of data
interchange
• A software- and hardware-independent tool for storing and
transporting data
– An open standard from W3C
– A direct descendant of SGML (Standard Generalized
Markup Language)
XML
Example
Example: Product Inventory Data
<Product>
<Name>Refrigerator</Name>
<ModelNumber>R3456d2h</ModelNumber>
<Manufacturer>General Electric</Manufacturer>
<Price>1290.00</Price>
<Quantity>1200</Quantity>
</Product>
XML
Data Interchange

• XML's key role is data interchange


• Two business partners want to exchange customer data
– Agree on a set of tags
– Exchange data without having to change internal databases
• Other business partners can participate by using the same
tagset
– New tags can be added to extend the functionality
HTML vs. XML
• Both are markup languages
– HTML has a fixed set of tags
– XML allows users to specify the tags based on requirements
• Usage
– HTML tags specify how to display data (with focus on how data looks)
– XML tags specify semantics of the data (with focus on what data is)
• Tag Interpretation
– HTML specifies what each tag and attribute means
– XML tags delimit data & leave interpretation to the parsing application
• Well formedness
– HTML very tolerant of rule violations (nesting, matching tags)
– XML very strictly follows rules of well formedness
HTML vs. XML
• XML Does Not Use Predefined Tags
- HTML has a fixed set of tags
- XML allows users to specify the tags based on requirements
• XML is Extensible
- Most XML applications will work as expected even if new data is added (or removed).

- XML allows user to specify the tags based on requirements
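The extensibility claim can be illustrated with a small sketch. It assumes a hypothetical consumer that only reads the Name and Price tags of the product example, and a hypothetical Warranty tag added later by a business partner; neither the function name nor the new tag comes from the slides.

```python
import xml.etree.ElementTree as ET

def read_product(xml_text):
    """Return only the fields this (hypothetical) consumer cares about."""
    root = ET.fromstring(xml_text)
    return {
        "name": root.findtext("Name"),
        "price": float(root.findtext("Price")),
    }

# Original agreed tag set.
original = "<Product><Name>Refrigerator</Name><Price>1290.00</Price></Product>"

# A partner later adds a Warranty tag (hypothetical); the consumer is unchanged.
extended = (
    "<Product><Name>Refrigerator</Name><Warranty>2 years</Warranty>"
    "<Price>1290.00</Price></Product>"
)

print(read_product(original))
print(read_product(extended))
```

Because the consumer looks tags up by name instead of by position, the added tag is simply ignored.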


HTML vs. XML
• XML Simplifies Things
- XML simplifies data sharing
- XML simplifies data transport
- XML simplifies platform changes
- XML simplifies data availability
• XML is a W3C Recommendation
- XML 1.0 became a W3C Recommendation in February 1998
XML
Structure
• Prolog
– Instructs the parser as to what it is parsing
– Contains processing instructions for processor
• Body
– Tags - Entities
– Attributes - Properties of Entities
– Comments - Statements for clarification in the document
XML
Structure
Example
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?> 🡨 Prolog
<contact>
  <name>
    <firstName>Vishal</firstName>
    <lastName>Srivastava</lastName>
  </name>
  <address> 🡨 Body
    <street>56 KN Road</street>
    <city>Prayagraj</city>
    <state>Uttar Pradesh</state>
    <zip>211004</zip>
  </address>
</contact>
XML
Prolog
• Syntax: <?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
Contains the declaration that identifies a document as XML
• Version
– Version of the XML markup language used in the data
– Not optional
• Encoding
– Identifies the character encoding used for the data
– Default: UTF-8, a compact encoding of Unicode
• Standalone
– Tells whether or not this document references external entities
• May contain entity definitions and tag specifications
XML Syntax
Elements & Attributes
• Uses less-than and greater-than characters (<…>) as delimiters
• Every opening tag must have an accompanying closing tag
– <FirstName>Vishal</FirstName>
– Element names cannot contain spaces
– Empty tags do not require an accompanying closing tag.
– Empty tags have a forward slash before the greater-than sign e.g.
<Name/>
• Tags can have attributes whose values must be enclosed in quotes (double or single)
– <name first="Vishal" last="Srivastava">
• Elements should be properly nested
– The nesting cannot be interleaved
– Each document must have one single root element
• Element and attribute names are case sensitive
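The rules above can be checked mechanically. The following is a minimal sketch that uses Python's standard xml.etree.ElementTree parser as a well-formedness checker; the helper name is_well_formed and the sample strings are illustrative assumptions, not part of the slides.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Each sample exercises one of the syntax rules listed above.
samples = {
    "matched tags": "<name><first>Vishal</first></name>",
    "empty tag": "<contact><name/></contact>",
    "single-quoted attribute": "<name first='Vishal'/>",
    "missing closing tag": "<name><first>Vishal</name>",
    "interleaved nesting": "<a><b></a></b>",
    "two root elements": "<a/><b/>",
    "case mismatch": "<Name>Vishal</name>",
    "unquoted attribute": "<name first=Vishal/>",
    "space in element name": "<first name>Vishal</first name>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(f"{label}: {'well-formed' if ok else 'rejected'}")
```

A strict parser rejects the whole document on the first violation, which is the behaviour contrasted with HTML's tolerance earlier in the deck.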
Tree Structure
Elements
• XML documents have a tree structure containing multiple
levels of nested tags.
– Root element is a single XML element which encloses all
of the other XML elements and data in the document
– All other elements are descendants of the root element (its direct children or nested deeper)
Tree Structure
Elements
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<contact> 🡨 Root Element
  <name>
    <firstName></firstName>
    <lastName>Srivastava</lastName>
  </name>
  <address>
    <street>56 KN Road</street> 🡨 Child Elements
    <city>Prayagraj</city>
    <state>Uttar Pradesh</state>
    <zip>211004</zip>
  </address>
</contact>
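To make the tree structure concrete, here is a small sketch that parses a shortened version of the contact document and prints one line per element, indented by nesting depth. The element names firstName and lastName (written without spaces so that they are legal XML names) and the helper walk are assumptions made for the illustration.

```python
import xml.etree.ElementTree as ET

doc = """<contact>
  <name><firstName>Vishal</firstName><lastName>Srivastava</lastName></name>
  <address><street>56 KN Road</street><city>Prayagraj</city></address>
</contact>"""

def walk(element, depth=0):
    """Return the element names in document order, indented by depth."""
    lines = ["  " * depth + element.tag]
    for child in element:  # iterating an Element yields its direct children
        lines.extend(walk(child, depth + 1))
    return lines

root = ET.fromstring(doc)
outline = walk(root)
print("\n".join(outline))
```

The root element appears at depth 0, and every other element hangs somewhere beneath it.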
Attributes
Definition and Example
• Attributes are properties associated with an element
• Each attribute is a name value pair
– No element may contain two attributes with same name
– Name and value are strings
Example
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<contact>
  <name first="Vishal" last="Srivastava"></name> 🡨 Attributes
  <address>
    <street>56 KN Road</street> 🡨 Nested Elements
    <city>Prayagraj</city>
    <state>Uttar Pradesh</state>
    <zip>211004</zip>
  </address>
</contact>
Elements vs. Attributes
Comparison
• Data should be stored in Elements
• Information about data (meta-data) should be stored in
attributes
– When in doubt use elements
• Rules of thumb
– Elements should hold information that someone may want to read.
– Attributes are appropriate for information about the document that has
nothing to do with the content of the document
e.g. URLs, units, references, ids belong in attributes
– What is metadata to you may be data to someone else
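One way to read the rule of thumb is sketched below: the price itself is element content, while the identifier and the unit are attributes. The document shape and the currency attribute are hypothetical choices made for this illustration.

```python
import xml.etree.ElementTree as ET

# Data (the amount) lives in element content; metadata (id, currency) in attributes.
doc = '<product id="R3456d2h"><price currency="USD">1290.00</price></product>'

root = ET.fromstring(doc)
price = root.find("price")

product_id = root.get("id")           # attribute on the root element
amount = float(price.text)            # element content
currency = price.get("currency")      # attribute describing the content
missing = price.get("unit", "n/a")    # absent attributes fall back to a default

print(product_id, amount, currency, missing)
```

Attribute values are always strings and each name may appear only once per element, which is why they suit short, single-valued metadata.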
Comments
Basics
• XML comments begin with “<!--” and end with “-->”
– All data between these delimiters is discarded
– <!-- This is a list of names of people -->
• Comments must not come before the XML declaration
• Comments cannot be placed inside a tag
• Comments may be used to hide and surround tags
<Name>
<first>Vishal </first>
<!-- <last>Srivastava</last> --> 🡨 Last tag is ignored
</Name>
• “--” string may not occur inside a comment except as part of
Namespaces
Basics
• XML documents come from different sources
– Combining elements from different sources can result in
name conflict
– Namespaces allow the interpreter to resolve the elements
• Namespaces
– Declared within element start-tag using attribute xmlns
– Identified by a URI (so that namespace names are globally unique)
– e.g. <Collection xmlns:book="http://www.mjyOnline.com/books"
xmlns:cd="http://www.mjyOnline.com/cds">
– Here book and cd are shorthands for the full namespace names
Namespaces
Example
<?xml version="1.0"?>
<!-- File Name: Collection.xml -->
<COLLECTION
  xmlns:book="http://www.mjyOnline.com/books"
  xmlns:cd="http://www.mjyOnline.com/cds">
  <ITEM Status="in">
    <TITLE>The Adventures of Huckleberry Finn</TITLE>
    <AUTHOR>Mark Twain</AUTHOR>
    <PRICE>$5.49</PRICE>
  </ITEM>
  <ITEM>
    <TITLE>Violin Concerto in D</TITLE>
    <COMPOSER>Beethoven</COMPOSER>
    <PRICE>$14.95</PRICE>
  </ITEM>
  <ITEM Status="out">
    <TITLE>Leaves of Grass</TITLE>
    <AUTHOR>Walt Whitman</AUTHOR>
    <PRICE>$7.75</PRICE>
  </ITEM>
  <ITEM>
    <TITLE>Violin Concertos Numbers 1, 2, and 3</TITLE>
    <COMPOSER>Mozart</COMPOSER>
    <PRICE>$16.49</PRICE>
  </ITEM>
  <ITEM Status="in">
    <TITLE>The Marble Faun</TITLE>
    <AUTHOR>Nathaniel Hawthorne</AUTHOR>
    <PRICE>$10.95</PRICE>
  </ITEM>
</COLLECTION>
Namespaces
Example
<?xml version="1.0"?>
<!-- File Name: Collection.xml -->
<COLLECTION
  xmlns:book="http://www.mjyOnline.com/books"
  xmlns:cd="http://www.mjyOnline.com/cds">
  <book:ITEM Status="in">
    <book:TITLE>The Adventures of Huckleberry Finn</book:TITLE>
    <book:AUTHOR>Mark Twain</book:AUTHOR>
    <book:PRICE>$5.49</book:PRICE>
  </book:ITEM>
  <cd:ITEM>
    <cd:TITLE>Violin Concerto in D</cd:TITLE>
    <cd:COMPOSER>Beethoven</cd:COMPOSER>
    <cd:PRICE>$14.95</cd:PRICE>
  </cd:ITEM>
  <book:ITEM Status="out">
    <book:TITLE>Leaves of Grass</book:TITLE>
    <book:AUTHOR>Walt Whitman</book:AUTHOR>
    <book:PRICE>$7.75</book:PRICE>
  </book:ITEM>
  <cd:ITEM>
    <cd:TITLE>Violin Concertos Numbers 1, 2, and 3</cd:TITLE>
    <cd:COMPOSER>Mozart</cd:COMPOSER>
    <cd:PRICE>$16.49</cd:PRICE>
  </cd:ITEM>
  <book:ITEM Status="out">
    <book:TITLE>The Legend of Sleepy Hollow</book:TITLE>
    <book:AUTHOR>Washington Irving</book:AUTHOR>
    <book:PRICE>$2.95</book:PRICE>
  </book:ITEM>
  <book:ITEM Status="in">
    <book:TITLE>The Marble Faun</book:TITLE>
    <book:AUTHOR>Nathaniel Hawthorne</book:AUTHOR>
    <book:PRICE>$10.95</book:PRICE>
  </book:ITEM>
</COLLECTION>
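How a parsing application uses the namespaces to tell book titles from CD titles can be sketched as follows. The snippet uses a shortened copy of the collection and Python's xml.etree.ElementTree; the local prefixes b and c are arbitrary choices for this sketch, which shows that only the URI matters and the prefix is just shorthand.

```python
import xml.etree.ElementTree as ET

doc = """<COLLECTION
    xmlns:book="http://www.mjyOnline.com/books"
    xmlns:cd="http://www.mjyOnline.com/cds">
  <book:ITEM Status="in">
    <book:TITLE>The Adventures of Huckleberry Finn</book:TITLE>
    <book:PRICE>$5.49</book:PRICE>
  </book:ITEM>
  <cd:ITEM>
    <cd:TITLE>Violin Concerto in D</cd:TITLE>
    <cd:PRICE>$14.95</cd:PRICE>
  </cd:ITEM>
</COLLECTION>"""

# Prefixes used for lookup are local to this code; they map to the same URIs.
ns = {
    "b": "http://www.mjyOnline.com/books",
    "c": "http://www.mjyOnline.com/cds",
}

root = ET.fromstring(doc)
book_titles = [t.text for t in root.findall("b:ITEM/b:TITLE", ns)]
cd_titles = [t.text for t in root.findall("c:ITEM/c:TITLE", ns)]

# Internally the parser stores each name as {namespace-URI}local-name.
first_tag = root[0].tag

print(book_titles, cd_titles, first_tag)
```

Both kinds of item share the local names ITEM, TITLE and PRICE, yet the lookups never mix them up.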
Display XML
Style Sheets
• A style sheet is a file that contains instructions for
rendering individual elements in an XML document
• Two kinds of style sheets exist
– Cascading Style Sheets (CSS)
– Extensible Stylesheet Language Transformations (XSLT)
• Please refer to the following web site for
comprehensive information on style sheets
– http://www.w3schools.com/css/default.asp
Cascading Style Sheets
Example
<?xml version="1.0"?>
<!-- File Name: Inventory01.xml -->
<?xml-stylesheet type="text/css" href="Inventory01.css"?>
<INVENTORY>
  <BOOK>
    <TITLE>The Adventures of Huckleberry Finn</TITLE>
    <AUTHOR>Mark Twain</AUTHOR>
    <BINDING>mass market paperback</BINDING>
    <PAGES>298</PAGES>
    <PRICE>$5.49</PRICE>
  </BOOK>
  <BOOK>
    <TITLE>Leaves of Grass</TITLE>
    <AUTHOR>Walt Whitman</AUTHOR>
    <BINDING>hardcover</BINDING>
  </BOOK>
  <BOOK>
    <TITLE>The Legend of Sleepy Hollow</TITLE>
    <AUTHOR>Washington Irving</AUTHOR>
    <BINDING>mass market paperback</BINDING>
    <PAGES>98</PAGES>
    <PRICE>$2.95</PRICE>
  </BOOK>
  <BOOK>
    <TITLE>The Marble Faun</TITLE>
    <AUTHOR>Nathaniel Hawthorne</AUTHOR>
    <BINDING>trade paperback</BINDING>
    <PAGES>473</PAGES>
    <PRICE>$10.95</PRICE>
  </BOOK>
  <BOOK>
    <TITLE>Moby-Dick</TITLE>
    <AUTHOR>Herman Melville</AUTHOR>
    <BINDING>hardcover</BINDING>
  </BOOK>
</INVENTORY>
Cascading Style Sheets
Example
/* File Name: Inventory02.css */
BOOK
{display:block;
margin-top:12pt;
font-size:10pt}
TITLE
{display:block;
font-size:12pt;
font-weight:bold;
font-style:italic}
BINDING
{display:block;
margin-left:15pt}
PAGES
{display:none}
PRICE
{display:block;
margin-left:15pt}
AUTHOR
Cascading Style Sheets
Display
Formal Languages/Grammars
Basics
• A formal language is a set of strings
– It is characterized by a set of rules which determine which strings are
a part of the language and which are not
– In the case of programming languages, programs which compile are
grammatically correct (others are not)
– In a natural language, like English, correct sentences follow the rules of
the English language grammar
• More precisely, a grammar defines four things
– A vocabulary out of which the strings are constructed (terminal
symbols)
– Vocabulary that is used to formulate grammar rules (non terminal
symbols)
Validated XML Document
Basics
• An XML document is valid if it conforms to the grammar of
the language
– Validity is different from well-formedness
• Two ways to specify the grammar of the language
– Document Type Definition (DTD)
– XML Schema
• Why bother with the language grammar?
– It provides the blueprint of the language
– Ensures that the data is interchangeable
– Eliminates processing errors in custom software which expects a
particular document content and structure
Document Type Declaration
Basics
• Document type declaration is a block of XML markup added
to the prolog of the document
– It has to follow the XML declaration
– It has to be outside of any other markup
• It defines the content and structure of the language
– Without a document type declaration or schema a document is merely
checked for well-formedness and not validity
• Why bother with the language grammar
– It provides the blueprint of the language
– Ensures that the data is interchangeable
– Eliminates processing errors in custom software which expects a
particular document content and structure
Document Type Definitions
Basics
• Document type definition (DTD) consists of a series of
markup declarations enclosed in square brackets
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE GREETING [
<!ELEMENT GREETING (#PCDATA)>
]>
<GREETING>
Hello XML!
</GREETING>
• A DTD can also be stored separately from the XML document
and referenced in it
• PCDATA is parsed character data
Document Type Definitions
Syntax
• Element Type Declaration
– Syntax: <!ELEMENT Name contentspec>
– Name is the name of the element
– contentspec is the content specification
• Example:
– <!ELEMENT Title (#PCDATA)>
• Content specification can have four types of values
– EMPTY content – Element must not have content
<!ELEMENT Image EMPTY>
– ANY Content – Can contain anything
<!ELEMENT misc ANY>
– Element Content – Child elements but no character data
<!DOCTYPE BOOK [
Element Content Specification
Types
• Content Specification indicates allowed child elements and
their order
– If element has element content it can not contain any character data
• Types of content specifications
– Sequence: Indicates that each element must have a specific sequence
of child elements
– Example
<!DOCTYPE MOUNTAIN [
<!ELEMENT MOUNTAIN (NAME, HEIGHT, STATE)>
<!ELEMENT NAME (#PCDATA)>
<!ELEMENT HEIGHT (#PCDATA)>
<!ELEMENT STATE (#PCDATA)>
]>
– Valid XML
Element Content Specification
Types
• Types of content specifications
– Choice: Indicates that element can have one of a series of child
elements
– Each element is separated by a | sign
– Example
<!DOCTYPE FILM [
<!ELEMENT FILM (STAR | NARRATOR | INSTRUCTOR)>
<!ELEMENT STAR (#PCDATA)>
<!ELEMENT NARRATOR (#PCDATA)>
<!ELEMENT INSTRUCTOR (#PCDATA)>
]>
– Valid XML
<FILM>
<STAR>ROBERT REDFORD</STAR>
</FILM>
Element Content Specification
Number of Elements
• Specifying the number of elements allowed
– ? zero or one
– + one or more
– * zero or more
– Example
<!DOCTYPE MOUNTAIN [
<!ELEMENT MOUNTAIN (NAME+, HEIGHT?, STATE)>
<!ELEMENT NAME (#PCDATA)>
<!ELEMENT HEIGHT (#PCDATA)>
<!ELEMENT STATE (#PCDATA)>
]>
– Valid XML
<MOUNTAIN>
<NAME>Peublo Peak</NAME>
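The sequence, choice and occurrence operators behave much like a regular expression over the list of child element names. The sketch below makes that analogy explicit; it is not a DTD validator. It assumes pure element content (no #PCDATA, EMPTY or ANY), and the helper names and sample data values are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

def content_model_to_regex(model):
    """Translate a DTD element-content model into a regex over child names.

    Each child name becomes a group ending in ';' so names cannot run together.
    '|', '?', '+', '*' and parentheses mean the same in both notations, and
    ',' (sequence) becomes plain concatenation.
    """
    compact = re.sub(r"\s+", "", model)
    pattern = re.sub(
        r"[A-Za-z_][\w.-]*",
        lambda m: "(?:" + re.escape(m.group(0)) + ";)",
        compact,
    )
    return re.compile(pattern.replace(",", ""))

def children_match(element, model):
    """Check the element's direct children against the content model."""
    sequence = "".join(child.tag + ";" for child in element)
    return content_model_to_regex(model).fullmatch(sequence) is not None

mountain_model = "(NAME+, HEIGHT?, STATE)"
film_model = "(STAR | NARRATOR | INSTRUCTOR)+"

two_names = ET.fromstring(
    "<MOUNTAIN><NAME>A</NAME><NAME>B</NAME><STATE>NM</STATE></MOUNTAIN>"
)
no_name = ET.fromstring("<MOUNTAIN><HEIGHT>1</HEIGHT><STATE>NM</STATE></MOUNTAIN>")
film = ET.fromstring("<FILM><NARRATOR>N</NARRATOR><STAR>S</STAR></FILM>")

print(children_match(two_names, mountain_model))
print(children_match(no_name, mountain_model))
print(children_match(film, film_model))
```

A real validating parser also checks attributes, entities and the declarations of every nested element, which this sketch leaves out.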
Element Content Specification
Modification
• Modifying a group of elements
– Example
<!DOCTYPE FILM [
<!ELEMENT FILM (STAR | NARRATOR | INSTRUCTOR)+>
<!ELEMENT STAR (#PCDATA)>
<!ELEMENT NARRATOR (#PCDATA)>
<!ELEMENT INSTRUCTOR (#PCDATA)>
]>
– Valid XML
<FILM>
<NARRATOR>Sir Gregory Parsloe</NARRATOR>
<STAR>ROBERT REDFORD</STAR>
Element Content Specification
Nesting
• Nesting in specification
– Example
<!DOCTYPE FILM [
<!ELEMENT FILM (TITLE, CLASS, (STAR | NARRATOR | INSTRUCTOR)+)>
<!ELEMENT TITLE (#PCDATA)>
<!ELEMENT CLASS (#PCDATA)>
<!ELEMENT STAR (#PCDATA)>
<!ELEMENT NARRATOR (#PCDATA)>
<!ELEMENT INSTRUCTOR (#PCDATA)>
]>
– Valid XML
<FILM>
<TITLE>The Net</TITLE>
Element Content Specification
Mixed Content Model
• Mixed Content Model: Allows element to contain
– Character Data
– Child elements in any position and any frequency (zero or more
repetitions)
– Child elements can be interspersed with data
• Character data only
– Example
<!ELEMENT TITLE (#PCDATA)>
• Character data and elements
– Example:
<!ELEMENT TITLE (#PCDATA | SUBTITLE)*>
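What mixed content looks like to a program can be sketched with Python's xml.etree.ElementTree, which exposes the character data before a child as text and the character data after it as tail. The sample title below is illustrative only.

```python
import xml.etree.ElementTree as ET

# A TITLE with character data interspersed around a SUBTITLE child.
title = ET.fromstring(
    "<TITLE>Moby-Dick <SUBTITLE>Or, the Whale</SUBTITLE> (1851)</TITLE>"
)
subtitle = title.find("SUBTITLE")

before_child = title.text      # character data before the first child
inside_child = subtitle.text   # character data inside the child element
after_child = subtitle.tail    # character data following the child

# itertext() walks the element and yields all character data in document order.
full_text = "".join(title.itertext())

print(repr(before_child), repr(inside_child), repr(after_child))
print(full_text)
```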
Attribute Specification
Basics
• All attributes in the document need to be specified using an
attribute-list declaration, which
– Defines the name of each attribute
– Defines the data type of each attribute
– Specifies whether an attribute is required or not
• Syntax: <!ATTLIST Name Attdefs>
– Name is the name of the element
– Attdefs is a series of one or more attribute definitions
• Attribute definition Syntax: Name AttType DefaultDecl
– Name is the attribute name
– AttType is the type of the attribute (CDATA, Token Type,
Enumerated)
Entity Specification
Types
• There are two kinds of entities in XML documents
– Character entities (referred to by the character's Unicode number)
– Named entities, referred to by name
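Both kinds of reference can be seen in one short sketch. It uses a decimal and a hexadecimal character reference for the same character, two of XML's predefined named entities, and one named entity declared in an internal DTD subset; the entity name pub and its replacement text are hypothetical.

```python
import xml.etree.ElementTree as ET

doc = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ENTITY pub "Example Press">
]>
<note>&#169; &#xA9; &lt;b&gt; &amp; &pub;</note>"""

# The parser replaces every reference before the application sees the text.
text = ET.fromstring(doc).text
print(text)
```

The application never sees the references themselves, only the characters they stand for.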
XML Parsing
Definition and Types
• An XML parser is a program that reads an XML document
and makes its contents available for processing
• There are two standard types of parsers for XML
– Document Object Model (DOM) which makes the document available
as a tree
– Simple API for XML (SAX) which associates an event with each tag
and each block of text
• XML parsers are available from many vendors
– Each vendor conforms to the standardized XML interfaces
– One widely used parser is Apache Xerces
– Sun's API for XML parsing is JAXP (supports basic classes and
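The DOM style of access can be sketched with Python's standard xml.dom.minidom module, used here as a stand-in for the Java parsers named above. The whole document is loaded into a tree that can be queried and modified in memory; the sample document and the added zip element are assumptions for the illustration.

```python
from xml.dom.minidom import parseString

dom = parseString(
    "<contact><name first='Vishal' last='Srivastava'/>"
    "<city>Prayagraj</city></contact>"
)
root = dom.documentElement

# Query the in-memory tree.
name = root.getElementsByTagName("name")[0]
city = root.getElementsByTagName("city")[0]
first = name.getAttribute("first")
city_text = city.firstChild.data  # the text node inside <city>

# Modify the tree: append a new element with a text child.
zip_element = dom.createElement("zip")
zip_element.appendChild(dom.createTextNode("211004"))
root.appendChild(zip_element)

serialized = root.toxml()
print(first, city_text)
print(serialized)
```

Holding the full tree makes random access and editing easy, at the cost of memory proportional to the document size.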
SAX Parser
Basics
• As the parser scans the document it sends notifications of
events, for instance
– Element start
– Element end
– Character sequence between two elements is found
• SAX provides standard names for the callback functions that
are triggered by these events
void characters (char[] ch, int start, int length): notification of character data
void startDocument(): notification of start of document
void endDocument(): notification of end of document
void startElement(String name, AttributeList atts): notification of start of element
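The callback model can be tried out with Python's standard xml.sax module, whose ContentHandler class offers methods with the same names as the Java signatures listed above. This is a sketch of the event flow, not the Java example the slides refer to; the EventLogger class and the sample input are hypothetical.

```python
import xml.sax

class EventLogger(xml.sax.ContentHandler):
    """Record every SAX notification in the order the parser sends it."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startDocument(self):
        self.events.append(("startDocument",))

    def endDocument(self):
        self.events.append(("endDocument",))

    def startElement(self, name, attrs):
        # attrs behaves like a read-only mapping of attribute names to values.
        self.events.append(("startElement", name, dict(attrs.items())))

    def endElement(self, name):
        self.events.append(("endElement", name))

    def characters(self, content):
        # The parser may deliver text in several chunks; skip pure whitespace.
        if content.strip():
            self.events.append(("characters", content))

handler = EventLogger()
xml.sax.parseString(
    b"<name first='Vishal'><last>Srivastava</last></name>", handler
)

for event in handler.events:
    print(event)
```

Nothing is kept in memory unless the handler stores it, which is why SAX suits large documents that only need one pass.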
SAX Parser
Example
From professional JSP page 658
XSLT Parser
Definition and Uses
• XSLT is a language for transforming the structure of XML documents
– Any tree-transforming language needs the ability to refer
to tree paths
– XPath is the sub-language underneath XSLT for tree path
description
• There are two scenarios for use of XSLT
– Browser contains an XSLT processor and uses it to render XML
documents
– XSLT is used for changing the structure of an existing
XML document
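Tree paths of the kind XSLT relies on can be tried without an XSLT processor. The sketch below uses the limited XPath subset supported by Python's xml.etree.ElementTree (the standard library has no XSLT engine, so this only illustrates path selection); the shortened inventory document is an assumption.

```python
import xml.etree.ElementTree as ET

doc = """<INVENTORY>
  <BOOK><TITLE>The Marble Faun</TITLE><BINDING>trade paperback</BINDING></BOOK>
  <BOOK><TITLE>Moby-Dick</TITLE><BINDING>hardcover</BINDING></BOOK>
  <BOOK><TITLE>Leaves of Grass</TITLE><BINDING>hardcover</BINDING></BOOK>
</INVENTORY>"""

root = ET.fromstring(doc)

# A path: every TITLE that is a child of a BOOK child of the root.
all_titles = [t.text for t in root.findall("./BOOK/TITLE")]

# A predicate: only BOOK elements whose BINDING child has the given text.
hardcovers = [
    b.findtext("TITLE") for b in root.findall("./BOOK[BINDING='hardcover']")
]

# A position: the second BOOK (XPath positions start at 1).
second_title = root.find("./BOOK[2]/TITLE").text

print(all_titles)
print(hardcovers)
print(second_title)
```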
XSLT Parser
Basics
• XSLT style sheet is an XML document
• Consists of two parts
– Standard XML declaration including namespace declarations
– Top level elements that set up the general framework for the output,
e.g., variables or import parameters from the command line
• Processing involves the following
– A current list of nodes from the source document is created by
matching a pattern
– Output for the current node is generated by instantiating a template
corresponding to the current pattern
– In process of transformation new nodes can be added to the list
XSLT Parser
Example
• XSLT
Web Services
Web Services
Definition
• Web Services are software programs that use XML to exchange
information with other software programs via common Internet
protocols.
– Web services communicate over the network to provide specific
methods that other applications can invoke.
– Thus applications residing on different computers can work
synergistically by invoking methods on each other
– HTTP is the key protocol used for Web Services.
• Characteristics
– Programmable
– Encapsulate a task
– XML-based data exchange allows programs on heterogeneous platforms
to communicate (SOAP)
Web Services
SOAP
• SOAP – Simple Object Access Protocol
– Enables data transfer between systems distributed over a network
– A SOAP message sent to a Web Service invokes a method provided
by the service
– Web Service may return the result via another SOAP message
• SOAP consists of standardized XML schemas
• Defines a format for transmitting XML messages over a network
– Includes data types and message structure
• Layered over an Internet protocol, such as HTTP, and can be
used to transfer data across the Web and other networks
– HTTP allows message transfer across firewalls since HTTP messages are
Web Services
SOAP
• SOAP message consists of three parts
– Envelope
– Header
– Body
• Envelope wraps the entire message and contains header and
body
• Header (optional) provides information on security and routing
• Body contains application specific data that is being transferred
• Alternatives to SOAP include XML-RPC
– SOAP is the de facto standard due to simplicity, extensibility and
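The envelope, header and body structure can be sketched by building a message with Python's xml.etree.ElementTree. The envelope namespace is the one defined for SOAP 1.1; the application namespace, the GetPrice operation and the build_request helper are hypothetical and stand in for whatever a real service would define.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "http://example.com/inventory"                # hypothetical service namespace

def build_request(product_name):
    """Build a SOAP-style request: Envelope wrapping an empty Header and a Body."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")    # optional in SOAP
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    # Application-specific payload: a hypothetical GetPrice call.
    call = ET.SubElement(body, f"{{{APP_NS}}}GetPrice")
    ET.SubElement(call, f"{{{APP_NS}}}Name").text = product_name
    return ET.tostring(envelope, encoding="unicode")

message = build_request("Refrigerator")
print(message)

# The receiving side parses the same text back into a tree.
parsed = ET.fromstring(message)
requested = parsed.find(
    f"{{{SOAP_NS}}}Body/{{{APP_NS}}}GetPrice/{{{APP_NS}}}Name"
).text
```

In practice the serialized envelope would be sent as the body of an HTTP POST, and the reply would arrive as another envelope.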
Web Services
WSDL
• WSDL – Web Services Description Language
• Provides a means of describing a web service
– Instructions for its use
– Capability of the service
• Provides information on how to connect to the service and
communicate with it
• Syntax is fairly complex
– Normally created using automated tools
– Not important to understand the precise syntax of WSDL while
developing web services
Web Services
UDDI
• UDDI – Universal Description, Discovery and Integration
– Allows developers and businesses to publish and locate web services on
a network via use of registries
– The registries can be made private or public
• Structure similar to a phone book
– White pages contain contact information and textual description
– Yellow pages provide classification information about companies and
details of a company’s electronic capability
– Green pages list technical data relating to services and business processes
