X Cert1424 A4
10 Oct 2006
When an application needs to share data with another system, it is often necessary
to transform an XML document into another XML format, governed by a differing XML
Schema or Document Type Definition (DTD). When the application is required to share or
display XML data to a user, the XML document might be transformed into HTML,
Scalable Vector Graphics (SVG), VoiceXML, plain text, or any of a large number of
human-readable formats. This tutorial deals with XML transformations, and is the
fourth in a series of five tutorials that you can use to help prepare for the IBM
certification Test 142, XML and Related Technologies.
XML transformations
© Copyright IBM Corporation 1994, 2006. All rights reserved. Page 1 of 43
developerWorks® ibm.com/developerWorks
This tutorial is written for programmers and scripters who have a basic
understanding of XML and whose skills and experience are at a beginning to
intermediate level. As such, you should have a general familiarity with data types,
including arrays, graphs, and trees in particular. You should also be familiar with
general programming techniques such as iteration and recursion. Although this
tutorial begins with the basics of the technologies discussed, it is not intended to be
a comprehensive reference. However, if studied well, this tutorial, combined with the
references in Resources, will provide sufficient breadth and depth to master the
transformation aspects of the XML certification exam.
Objectives
After completing this tutorial, you will:
• Understand how to use XSLT to transform XML
• Be able to do string and math operations and to search and traverse XML
with XPath
• Know how to visually format XML with CSS
Prerequisites
This tutorial is written for developers who have a background in programming or
scripting and who have an understanding of basic computer-science models and
data structures. You should be familiar with the following XML-related,
computer-science concepts: tree traversal, recursion, and reuse of data. You should
be familiar with Internet standards and concepts, such as Web browser,
client-server, documenting, formatting, e-commerce, and Web applications.
System requirements
As with Part 3 of this series, you need a Linux® or Microsoft® Windows® box with at
least 50MB of free disk space and administrative access to install software. This
tutorial uses, but does not require:
• Altova XMLSpy (The free Home Edition will suffice.)
• Microsoft™ Internet Explorer, Version 6.0 or greater
• Mozilla Firefox, Version 1.0.7 or greater
Please note that XSLT documents are XML and are therefore capable of being
edited with any text editor, such as Microsoft Notepad or Vim. It is useful, however, if
your editor has the ability to assist you in making your documents well-formed;
XMLSpy can do this and much more. CSS documents are not XML, so well-formedness
does not apply to them; use whatever text editor you prefer for these.
<?xml version="1.0"?>
<catalog>
  <dvd code="_1234567">
    <title>Terminator 2</title>
    <description>A shape-shifting cyborg is sent back
      from the future to kill the leader of the
      resistance.</description>
    <price>19.95</price>
    <year>1991</year>
  </dvd>
  <dvd code="_7654321">
    <title>The Matrix</title>
    <price>12.95</price>
    <year>1999</year>
  </dvd>
  <!-- Further dvd elements follow in the full listing. -->
</catalog>
Another XML instance document for a Web site map will be shown later.
Section 3. XSLT
The wonder is that we can see these trees and not wonder more.
-- Ralph Waldo Emerson
XSLT is basically a system for declaring what should happen when certain element
types are encountered within an XML document. XSLT is not compiled; instead, it --
along with an XML input document -- is interpreted by a stylesheet processor, such
as Xalan or Microsoft XML Core Services (MSXML). You may imagine its usage as
a mathematical function: XSLT( XML ) = output.
Because the word stylesheet appears within the name XSLT, some within the
programming community assume that XSLT has no serious capability as a
programming language -- that it is merely something akin to Cascading Style Sheets
(CSS). Nothing against CSS, but tsk-tsk. Although not as terse as most other
languages -- in large part because it must be well-formed -- XSLT (combined with
XPath, which is its means of searching and traversing XML tree structures and of
performing string and math operations) is capable of rich functionality. Surprisingly
elegant code is possible, as you'll see later when I discuss recursion.
In the sections that follow, you'll see how to employ XSLT and XPath to retrieve data
from XML documents. In most examples, this data will eventually be formatted as
HTML.
on a server or within an environment such as XMLSpy, the transforms that follow will
also work with minor or no modifications within the latest versions of browsers that
support XSLT, such as Mozilla Firefox and Microsoft Internet Explorer. XML
documents viewed on these browsers require a directive similar to the following,
which you should place in the document prolog, just below the <?xml
version="1.0"?> declaration at the top of the XML input document. Adjust the href
attribute to the appropriate value, which can be absolute or relative:
or:
You would then close these tags at the bottom of the document with
</xsl:stylesheet> or </xsl:transform>, respectively. The terms stylesheet
and transform will therefore be used interchangeably throughout this tutorial.
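As a minimal sketch of the pieces just described, the browser directive and the root element of a transform might look like the following. The file name catalog.xsl is an assumption made for illustration and does not come from the original listings.

```xml
<?xml version="1.0"?>
<!-- catalog.xsl: a hypothetical file name used only for illustration.
     The XML input document would point to it from its prolog with:
     <?xml-stylesheet type="text/xsl" href="catalog.xsl"?> -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Template rules go here. -->
</xsl:stylesheet>
```

Writing xsl:transform in place of xsl:stylesheet, in both the opening and closing tags, gives an equivalent document.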
Template elements
The xsl:template element contains a set of rules that you apply to specified
elements within an XML input document. In practice, nearly every xsl:stylesheet or
xsl:transform has at least one xsl:template element; without one, only the
built-in template rules apply. Much of the
richness of XSLT programming comes from the use of multiple templates as logical
modules, each with its own purpose. You can trigger the template elements through
a match attribute or by direct invocation through a name attribute.
Use of the match attribute is fairly straightforward. When a pattern indicated by the
value of the match attribute is found, a rule is executed. For instance, you could use
the following to indicate each dvd element found in the document with the string
"Another DVD":
Notice that there is one occurrence of the template rule for each dvd element in the
input document. Every time the dvd element is found in the input document, this rule
is invoked.
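A rule of the kind described above might be sketched as follows. This is an illustration under the assumption that the input is the DVD catalog document; it is not necessarily the original listing.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Fires once for every dvd element in the input document. -->
  <xsl:template match="dvd">
    <xsl:text>Another DVD</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Because no other templates are defined, the processor's built-in rules walk the tree from the root down to each dvd element, where this rule takes over.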
Context node
The context node is the element that is currently being examined.
All other elements are referenced (through XPath expressions)
relative to the context node.
The value of the select attribute of the <xsl:value-of/> tag in both cases is a
pattern that gives the text value of the current element being examined, or context
node. The "." in the select attribute values above is XPath's abbreviated syntax
for the context node itself (the self axis). The value of the match attribute of the
template tag is also an XPath expression. Here, the context node is set by the
template match. Other ways of setting the context node are to use
xsl:apply-templates and xsl:for-each.
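A pair of templates that use "." in this way might be sketched as follows. The choice of title and price as match patterns follows the surrounding text; the rest is an assumption.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- The match attribute makes each title element the context
       node; "." then yields that element's text value. -->
  <xsl:template match="title">
    <xsl:value-of select="."/>
  </xsl:template>

  <!-- The same rule, applied to each price element. -->
  <xsl:template match="price">
    <xsl:value-of select="."/>
  </xsl:template>
</xsl:stylesheet>
```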
The previous two template tags won't give you only the title and price values;
everything else in the XML input document will be displayed as well. To display only
the title and price element values, you need one more template rule:
<xsl:template match="dvd">
<p>
<xsl:apply-templates select="title"/>- $<xsl:apply-templates select="price"/>
</p>
</xsl:template>
Some minor HTML formatting (a <p/> tag) has been added as well. The output looks
like this:
Calling templates
A variation on the previous approach that opens up other possibilities is to explicitly
call templates that display the child title and price elements of the dvd element.
In this case, the entire transform looks like this:
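A minimal sketch of such a transform is given here. The template names Title and Price come from the text; the exact layout is an assumption and may differ from the original listing.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="dvd">
    <p>
      <xsl:call-template name="Title"/>
      <xsl:call-template name="Price"/>
    </p>
  </xsl:template>

  <!-- xsl:call-template leaves the context node on the dvd
       element, so title and price are selected as its children. -->
  <xsl:template name="Title">
    <xsl:value-of select="title"/>
    <xsl:text>- </xsl:text>
  </xsl:template>

  <xsl:template name="Price">
    <xsl:text>$</xsl:text>
    <xsl:value-of select="price"/>
  </xsl:template>
</xsl:stylesheet>
```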
For readability, the hyphen for the title and the dollar sign for the price have been
moved to the named templates, Title and Price. The output is the same as that of
the preceding transform.
Iteration
Continuing with the example input document catalog.xml, let's look at ways to repeat
a template rule across a set of elements. In this case, the context node will be the
root, or catalog, element. Instead of the method shown previously of matching
elements to trigger the enforcement of template rules, you can use iteration with the
<xsl:for-each/> tag.
First, look at how you might use xsl:for-each to loop through each of the dvd
elements in the catalog:
<xsl:template match="catalog">
  <html>
    <body>
      <xsl:for-each select="dvd">
        <xsl:call-template name="DVD"/>
      </xsl:for-each>
    </body>
  </html>
</xsl:template>
Notice that some HTML has been added for formatting within a browser. Looking at
the contents of the xsl:for-each tag, you can see that a named template, DVD, is
called. The DVD template might then have the following display logic:
<xsl:template name="DVD">
  <xsl:variable name="label" select="@code"/>
  <p>
    <img src="images/{$label}.gif" alt=""/>
    <xsl:value-of select="title"/>
  </p>
</xsl:template>
Most new XSLT programmers find this behavior annoying, but there
are benefits to be gained from it. Consensus within the
programming community has it that strictly speaking, XSLT is not a
functional programming language -- however, it does bear some
resemblance to one. One of the characteristics that XSLT shares
with functional languages is that it prohibits side effects in variables.
This allows XSLT programmers to know that once a template
returns values correctly, it can be trusted to do so from then on
since no other templates can change the values of any "variables"
that template is using.
Second, the new label variable is used in the creation of an <img/> tag for the
slick, new HTML formatting. (This assumes that an image exists within the images/
directory for each DVD, with the DVD's code attribute value as its file name.)
Notice that to get the value of the variable inside an attribute, you put a dollar
sign ("$") in front of it and then wrap it with curly braces ("{" and "}"); this is
similar to using the xsl:value-of element. (Of course, you do not need to create
the variable in this example; you might instead just directly reference the @code
attribute's value within the img tag like this: <img src="images/{@code}.gif"
alt=""/>.)
<html>
<body>
<p>
<img alt="" src="images/_1234567.gif">Terminator 2</p>
<p>
<img alt="" src="images/_7654321.gif">The Matrix</p>
<p>
<img alt="" src="images/_2255577.gif">Life as a House</p>
<p>
<img alt="" src="images/_7755522.gif">Raiders of the Lost Ark</p>
</body>
</html>
Formatting output
You might have noticed that the previous output no longer has the <?xml?>
declaration at the top. This is because of two things -- either one of which would
cause the output to be treated as HTML. First, the tag <xsl:output method="html"/>
was added just after the opening <xsl:stylesheet> tag. The method attribute of
xsl:output typically has either this value (html) or xml. Another useful attribute
of the xsl:output tag is encoding, which defines the character set of the output
and is used when the method attribute value is xml. The second reason why the
<?xml?> declaration disappeared is that most stylesheet processors notice when HTML
tags are used, such as <html/> and <body/>, which you included in the matched
template. When this happens, the output method is treated automatically as HTML.
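The placement of xsl:output can be sketched as follows. The encoding and indent values are illustrative assumptions, not settings taken from the original transform.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Top-level declaration: serialize the result as HTML. The
       encoding and indent values are illustrative choices. -->
  <xsl:output method="html" encoding="UTF-8" indent="yes"/>

  <xsl:template match="/">
    <html>
      <body>
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

Declaring the method explicitly avoids relying on the processor to infer it from the result's root element.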
Sorting results
Document order
Document order is simply the order in which elements appear
within an XML document. Under this scheme, parent elements
come before their children; this makes the root element the first
element in any document.
<xsl:for-each select="dvd">
<xsl:sort select="title"/>
<xsl:call-template name="DVD"/>
</xsl:for-each>
Incorporating the preceding xsl:sort element into the previous transform gives
the following output:
<html>
<body>
<p>
<img alt="" src="images/_2255577.gif">Life as a House</p>
<p>
<img alt="" src="images/_7755522.gif">Raiders of the Lost Ark</p>
<p>
<img alt="" src="images/_1234567.gif">Terminator 2</p>
<p>
<img alt="" src="images/_7654321.gif">The Matrix</p>
</body>
</html>
Notice that the DVDs are now arranged alphabetically by title. You may put
additional xsl:sort elements after the first one to provide finer-grained sorts of the
results set.
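A finer-grained sort of this kind might be sketched as follows. The choice of year as the primary key and title as the secondary key is an assumption made for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <xsl:template match="catalog">
    <html>
      <body>
        <xsl:for-each select="dvd">
          <!-- Primary key: year, compared as a number, newest first. -->
          <xsl:sort select="year" data-type="number" order="descending"/>
          <!-- Secondary key: title, used only when years are equal. -->
          <xsl:sort select="title"/>
          <p>
            <xsl:value-of select="title"/>
          </p>
        </xsl:for-each>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

The xsl:sort elements must be the first children of xsl:for-each, and their order determines the priority of the keys.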
Recursion
To implement the previous DVD list transform through recursion might seem a little
strange because:
<html>
<body>
<p>
<img alt="" src="images/_2255577.gif">Life as a House</p>
<p>
<img alt="" src="images/_7755522.gif">Raiders of the Lost Ark</p>
<p>
<img alt="" src="images/_1234567.gif">Terminator 2</p>
<p>
<img alt="" src="images/_7654321.gif">The Matrix</p>
</body>
</html>
Tail recursion
Because the recursive call to itself happens at the end of the DVD
template in the example, this kind of recursion is called tail
recursion. Tail recursion can be less computationally expensive
than other kinds of recursion. This is because some stylesheet
processors are able to optimize tail recursion by converting it to
iteration.
Here, the recursion starts within the matched template at the top of the transform
through a call to the DVD template. The DVD template then calls itself until the dvd
element that it is working with is the last one.
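A recursive transform of this shape might be sketched as follows. The template name DVD and the parameter names position and dvdCount come from the text; the remaining details are assumptions and may differ from the original listing.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <xsl:template match="catalog">
    <html>
      <body>
        <!-- First call: position falls back to its default of 1. -->
        <xsl:call-template name="DVD">
          <xsl:with-param name="dvdCount" select="count(dvd)"/>
        </xsl:call-template>
      </body>
    </html>
  </xsl:template>

  <xsl:template name="DVD">
    <xsl:param name="position" select="1"/>
    <xsl:param name="dvdCount"/>
    <!-- The context node is still catalog, so the current dvd
         element is picked out by its position. -->
    <xsl:variable name="label" select="dvd[$position]/@code"/>
    <p>
      <img src="images/{$label}.gif" alt=""/>
      <xsl:value-of select="dvd[$position]/title"/>
    </p>
    <!-- Stop condition: recurse only while dvd elements remain. -->
    <xsl:if test="$position &lt; $dvdCount">
      <xsl:call-template name="DVD">
        <xsl:with-param name="position" select="$position + 1"/>
        <xsl:with-param name="dvdCount" select="$dvdCount"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```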
A few additional things are worth looking at in this example. For one thing, the
context node is never shifted to any of the dvd elements. This is because
xsl:call-template doesn't change the context node like xsl:for-each does.
Because of this, the variable defined within the DVD template must reference the
current dvd element with the help of an xsl:param element, position, which
holds the position of the dvd element that is currently being examined. Parameters
can pass from a template to the named template that it calls through the
<xsl:with-param/> tag. <xsl:with-param> is always a child element of the
<xsl:call-template/> tag. These parameters are picked up in the called
template through the <xsl:param/> tag, which must always come immediately after
the opening <xsl:template/> tag.
You might notice that the xsl:param, position, inside of the DVD template, has
a default value indicated by the select attribute. For this reason, it is not necessary
to use a corresponding xsl:with-param in the first call to the DVD template from
the matched template. Use of default values for xsl:param tags is a good practice;
it can help to prevent unexpected behaviors within called templates. (Were it not for
the need to contrast the two parameters, the dvdCount parameter would also have
a default value.)
Notice too within the xsl:call-template at the end of the DVD template that the
position parameter is incremented as it is passed to the next instance of the DVD
template. As seen within the test attribute of <xsl:if/>, the values of the two
parameters are used as the basis for the stop condition of the recursion.
The position parameter is also used within the DVD template to indicate which
dvd element is being examined. This is done through the predicate used to qualify
the dvd element within the select attribute of both the xsl:variable and
xsl:value-of tags. Both of these are XPath expressions, which you will explore
more fully later in this tutorial. XPath predicates are noted between square brackets
("[" and "]") and immediately follow the element that they qualify. Notice that in this
case, the position parameter is used within the predicate to indicate the position
of the currently examined dvd element: dvd[$position].
The <xsl:if/> element is pretty straightforward in its usage. It has a condition that
is evaluated within its mandatory test attribute for a Boolean (true or false) result.
In case you wondered, there is no <xsl:else/> or <xsl:elseif/> tag. Instead of those,
you use <xsl:choose/>, which I will describe later. Lastly, notice that the less-than
sign within the test attribute of the <xsl:if/> is XML-escaped (&lt;). The use of
a non-escaped less-than sign (<) would cause the stylesheet processor to throw an
exception, since this character as well as the closing angle bracket, or greater-than
sign (>), are used to indicate the beginning and ending of XML elements.
<?xml version="1.0"?>
<site>
<page label="A" href="0.html">
<page label="AA" href="0_0.html">
<page label="AAA" href="0_0_0.html">
<page label="AAAA" href="0_0_0_0.html"/>
<page label="AAAB" href="0_0_0_1.html"/>
<page label="AAAC" href="0_0_0_2.html"/>
</page>
<page label="AAB" href="0_0_1.html"/>
<page label="AAC" href="0_0_2.html"/>
</page>
<page label="AB" href="0_1.html"/>
<page label="AC" href="0_2.html"/>
</page>
<page label="B" href="1.html"/>
<page label="C" href="2.html"/>
</site>
The sample site map in Listing 2 is four levels deep and suggests HTML pages
nested by topic. The three top-level pages have label attribute values A, B, and C.
Page A has three child pages (AA, AB, and AC), and its first child has three child
pages (AAA, AAB, and AAC). Finally, page AAA has child pages (AAAA through
AAAC). This tree needs to be rendered as HTML.
To show the site map, let's build a set of nested HTML unordered lists -- one for
each navigation level. To construct these lists, iterate through one navigation level at
a time, starting from the top level. If you find that a page has child pages, add that
list of child pages after the current page and then continue iterating through the
current navigation level. In pseudocode, the algorithm looks like this:
Build List;
Build List {
  For each page {
    Write page;
    If (page has child pages)
      Build List;
  }
}
This algorithm processes a navigation tree to any number of levels using a small
amount of code (small for XSLT, anyway). The brevity of the XSLT code required
demonstrates how well suited it is to this kind of task. Add a dash of HTML syntax,
and a transform using the previous logic follows. Comments are added to tie it to the
algorithm:
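A minimal sketch of such a transform follows. The template name BuildList comes from the text; the choice of xsl:for-each inside the named template is an assumption, and the original listing may be structured differently.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <xsl:template match="/site">
    <html>
      <body>
        <!-- Build List; -->
        <xsl:call-template name="BuildList"/>
      </body>
    </html>
  </xsl:template>

  <!-- Build List { -->
  <xsl:template name="BuildList">
    <ul>
      <!-- For each page { -->
      <xsl:for-each select="page">
        <!-- Write page; -->
        <li>
          <a href="{@href}">
            <xsl:value-of select="@label"/>
          </a>
        </li>
        <!-- If (page has child pages) Build List; -->
        <xsl:if test="page">
          <xsl:call-template name="BuildList"/>
        </xsl:if>
      </xsl:for-each>
    </ul>
  </xsl:template>
</xsl:stylesheet>
```

Inside xsl:for-each the context node is the current page element, so the recursive call iterates over that page's children.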
Notice that the preceding transform is tail-recursive. Also, if the current page has no
child pages, a stop condition for the recursion is met, and the recursive call to the
BuildList template is not made. The HTML output of this transform then looks like
this:
<html>
<body>
<ul>
<li><a href="0.html">A</a></li>
<ul>
<li><a href="0_0.html">AA</a></li>
<ul>
<li><a href="0_0_0.html">AAA</a></li>
<ul>
<li><a href="0_0_0_0.html">AAAA</a></li>
<li><a href="0_0_0_1.html">AAAB</a></li>
<li><a href="0_0_0_2.html">AAAC</a></li>
</ul>
<li><a href="0_0_1.html">AAB</a></li>
<li><a href="0_0_2.html">AAC</a></li>
</ul>
<li><a href="0_1.html">AB</a></li>
<li><a href="0_2.html">AC</a></li>
</ul>
<li><a href="1.html">B</a></li>
<li><a href="2.html">C</a></li>
</ul>
</body>
</html>
• A
  • AA
    • AAA
      • AAAA
      • AAAB
      • AAAC
    • AAB
    • AAC
  • AB
  • AC
• B
• C
When first rendered, the HTML elements that represent all but the top-tier pages
have their style's display property set to "none" (more about this in CSS
later). You then use JavaScript logic on the browser to expand the HTML tag
representing each ancestor page of the context node by setting the style display
property value to "block". To make this behavior page-specific, you can assume
that the @href attribute of each page element is unique. You can then use an
xsl:variable to hold the value of the context node's @href attribute. Then, as
each page element receives the context through iteration within each navigation tier,
you can compare its @href attribute to the xsl:variable that holds the @href for
the page being rendered. In this way, links for the current page and all pages above
it are displayed.
It is also appropriate to show the child pages of the current page, if there are any.
You can use either the HTML unordered list elements shown previously or HTML
<div/> tags to hold each navigation tier. These elements require unique id
attribute values so that you can programmatically toggle their style properties
between display:none and display:block. If you use <div/> tags, they will
need a non-zero padding-left or margin-left style value to provide indentation.
It is left as an exercise to implement these hints. However, with a few images added
for glitz, the transform output can look like the tree navigation widget shown in
Figure 1. The image is the rendered HTML output of a recursive XSLT transform,
which is the basis of most of the hints given. However, instead of nested elements, I
used sibling elements with pointer-to-parent relationships implemented through a
pair of attributes. The HTML output shown in Figure 1 was rendered in Internet
Explorer 6.0.
In the rendered HTML (not the image shown in Figure 1), clicking on the minus and
plus icons depicted in Figure 1 collapses and expands the subtree beneath them,
while clicking on the node text navigates you to that page. (All nodes in Figure 1 are
shown expanded to reveal the entire tree.) After the destination page is loaded
within the browser, all nodes are initially collapsed so that only the top-tier pages are
shown. Then, the tree is expanded from the current page upward (leftward, really) to
the top tier so that all ancestor pages and the current page are shown. Expanding
the tree to the current page also reveals any child pages it might have.
Stop condition
One last recursion hint: Much like the brakes on a car, the stop
condition is vital -- so understand and code it first.
Because of the nested nature of XML documents, combined with the fact that
xsl:variable elements are immutable, you'll find it beneficial to think and code in
recursion to solve most problems in XSLT. In general, the trick to doing this is just to
design your templates so that, like the BuildList template, they are general enough to
apply to all cases of the problem at hand. If they are, they can then just call
themselves as needed.
Conditional logic
Returning to the DVD catalog example, suppose that you want to base some aspect
of a report of the DVDs within the catalog on the price of each DVD. Instead of
displaying the price, you might instead want to categorize the DVD by cost. To do
this, you'll use xsl:choose.
Let's modify the iterative solution used previously to show each dvd element by
calling the Price template that you wrote earlier:
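The modified iteration might be sketched as follows. The Price template shown is a stand-in that prints the dollar amount, which is an assumption about the earlier listing.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <xsl:template match="catalog">
    <html>
      <body>
        <xsl:for-each select="dvd">
          <p>
            <xsl:value-of select="title"/>
            <!-- The context node is the current dvd element. -->
            <xsl:call-template name="Price"/>
          </p>
        </xsl:for-each>
      </body>
    </html>
  </xsl:template>

  <!-- Stand-in for the earlier Price template (assumed form). -->
  <xsl:template name="Price">
    <xsl:text> - $</xsl:text>
    <xsl:value-of select="price"/>
  </xsl:template>
</xsl:stylesheet>
```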
Now, modify the Price template to display one of three adjectives ("Pricey!,"
"Cheap!," and "So so") instead of the dollar amount:
<xsl:template name="Price">
  <xsl:choose>
    <xsl:when test="price &lt; 15.00"> - Cheap!</xsl:when>
    <xsl:when test="price &gt; 19.00"> - Pricey!</xsl:when>
    <xsl:otherwise> - So so</xsl:otherwise>
  </xsl:choose>
</xsl:template>
Notice that xsl:choose has two child elements: the mandatory xsl:when, which
behaves much like xsl:if, and xsl:otherwise, which is optional. The output of
this transform using xsl:choose looks like the following:
<html>
<body>
<p>Terminator 2 - Pricey!</p>
<p>The Matrix - Cheap!</p>
<p>Life as a House - So so</p>
<p>Raiders of the Lost Ark - Cheap!</p>
</body>
</html>
Like most XSLT coders, you'll be less than thrilled because there are no xsl:else and
xsl:else-if elements -- and xsl:choose is so verbose. XSLT 2.0 has a solution for
this, but I won't discuss it in this tutorial. XML in a Nutshell, 3rd Edition (see
Resources) has sections within most chapters that discuss the differences between
versions 1.0 and 2.0 of XSLT and XPath.
<xsl:import href="URI"/>
and
<xsl:include href="URI"/>
The URI value for the href attributes refers to the path and name of the file
containing the transform that is being added. The URI value can be relative or
absolute. The difference between these two elements is that templates incorporated
through xsl:import are allowed to have name conflicts with those in the importing
transform, in which case the imported templates are ignored. With xsl:include,
named templates belonging to the transforms that are brought in cannot clash with
any in the including transform; these files are simply copied into the current
transform at the point of the xsl:include. For both of these elements, circular
references between imported and importing files are not allowed. As with most other
things XML, the nesting of files through xsl:import and xsl:include is allowed.
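The two elements might be used together as in the following sketch. The file names common.xsl and price.xsl are hypothetical and would have to exist for the transform to run.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- xsl:import elements must come before all other top-level
       elements. common.xsl is a hypothetical file. -->
  <xsl:import href="common.xsl"/>
  <!-- xsl:include may appear anywhere at the top level.
       price.xsl is likewise hypothetical. -->
  <xsl:include href="price.xsl"/>

  <!-- If common.xsl also defines a template named Title, this
       definition takes precedence over the imported one. If
       price.xsl defined one, the clash would be an error. -->
  <xsl:template name="Title">
    <xsl:value-of select="title"/>
  </xsl:template>
</xsl:stylesheet>
```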
XSLT summary
XSLT is a capable programming language that also happens to bear the novelty of
being well-formed. As you'll see in the following section, XSLT combined with XPath
can solve pretty much any computational or formatting problem. Tricks to mastering
XSLT include gaining an appreciation for nesting named template calls and
becoming fond of recursion.
Section 4. XPath
A seed hidden in the heart of an apple is an orchard invisible.
-- Welsh proverb
In the previous section, you saw how to use XPath to reference the elements within
the sample XML documents. For example, you saw that in order to see the elements
within the trees, XPath expressions such as / (the document root), page (the
expected child element type), and .. (the parent of the context node) are used. As
you'll see in the following sections, it is possible to access any part of an XML
document through the appropriate combination of XPath expressions.
XPath also features set, string, and math functions. While these are not necessarily
on par with most other scripting languages such as Perl or JavaScript in terms of
brevity or power, in most cases you can construct XSLT templates such that these
functions -- combined with the appropriate logic -- can accomplish anything you can
do in other languages. This section serves to clarify the XPath expressions seen
within the previous section on XSLT and to elaborate upon a few concepts.
Axes
A key concept within XPath is the notion of axes. An axis within XPath is the set of
nodes that lie above (as in parent and ancestor elements), below (child and
descendant elements), to the left (preceding and preceding-sibling
elements), or to the right (following and following-sibling elements) of the
context node. The preceding and following axes refer to those elements that
occur before and after the context node, respectively, in document order, whereas
preceding-sibling and following-sibling refer to elements that have the
same parent element as the context node. The self axis is the context node itself
and can be combined with two other axes to form ancestor-or-self and
descendant-or-self. Access to any part of an XML document is then ensured
by the addition to these of the attribute and namespace axes.
Location steps form the basis of a location path. The delimiter for a location step is
the forward slash (/). A location path is a series of location steps, each taken
along one of the axes discussed previously. For example, beginning at the context
node, a location path might go up two levels, find an element specified by a
predicate, look down within that subtree for a particular element, and return one of
its attributes:
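One such location path, written against the site map document, might be sketched as follows. The choice of the page labeled AAAA as the context node and the labels used in the predicates are assumptions made for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Make the page labeled AAAA the context node. -->
  <xsl:template match="/">
    <xsl:apply-templates select="//page[@label='AAAA']"/>
  </xsl:template>

  <xsl:template match="page">
    <!-- ../..                  up two levels, to page AA
         page[@label='AAA']     a child chosen by a predicate
         descendant::page[...]  look down within that subtree
         @href                  return one of its attributes -->
    <xsl:value-of
        select="../../page[@label='AAA']/descendant::page[@label='AAAC']/@href"/>
  </xsl:template>
</xsl:stylesheet>
```

Run against the site map, this path arrives at the page labeled AAAC and returns its href value, 0_0_0_2.html.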
XPath functions
Quoting values within XPath
Notice that single quotes (') are used within the previous XPath
expression. This is because this expression would most likely
appear within some XSLT tag as the value of some attribute. The
attribute value would of course be enclosed within double quotes
("), so the use of those within something embedded, such as an
XPath expression, would break the XSLT tag.
You can accomplish string and math operations within XSLT through the functions
provided by XPath. For instance, if you need to add the values of two
xsl:variable elements x and y, you will use:
This returns a number, as do many XPath functions. XPath provides operators
for the following math operations: addition (+), subtraction (-), multiplication (*),
division (div), and modulo, or remainder (mod).
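The arithmetic operators can be sketched with two variables as follows. The values 7 and 2 are arbitrary assumptions chosen to make the results easy to check.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:variable name="x" select="7"/>
    <xsl:variable name="y" select="2"/>
    <!-- Addition: 9 -->
    <xsl:value-of select="$x + $y"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Subtraction: 5 -->
    <xsl:value-of select="$x - $y"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Multiplication: 14 -->
    <xsl:value-of select="$x * $y"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Division: 3.5 -->
    <xsl:value-of select="$x div $y"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Remainder: 1 -->
    <xsl:value-of select="$x mod $y"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Division is spelled div because the forward slash is already taken as the location step delimiter.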
XPath functions can also return a Boolean value (true or false). This is important for
conditional logic:
The concat() function accepts two or more arguments. Another useful string function
is normalize-space(string), which strips leading and trailing whitespace and reduces
each internal whitespace character sequence to a single space. The following list
comprises other XPath string-related functions:
• starts-with(string, string): Returns true if the first argument
starts with the second; otherwise, it returns false. The previous example
demonstrated this.
• contains(string, string): Returns true if the first argument
contains the second; otherwise, it returns false.
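The string functions discussed above might be combined as in this sketch, written against the DVD catalog. The search strings 'The ' and 'cyborg' are assumptions chosen to fit the sample data.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:for-each select="catalog/dvd">
      <!-- starts-with() returns a Boolean that drives xsl:if. -->
      <xsl:if test="starts-with(title, 'The ')">
        <!-- concat() joins two or more strings. -->
        <xsl:value-of select="concat(title, ' (', year, ')')"/>
        <xsl:text>&#10;</xsl:text>
      </xsl:if>
      <!-- contains() is false when no description is present. -->
      <xsl:if test="contains(description, 'cyborg')">
        <!-- normalize-space() collapses the line breaks inside
             the description into single spaces. -->
        <xsl:value-of select="normalize-space(description)"/>
        <xsl:text>&#10;</xsl:text>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```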
XPath functions can also return node sets. For instance, within the example
sitemap.xml, you can find the number of page elements that have child page
elements:
<xsl:value-of select="count(//page[page])"/>
Here, a node set result is wrapped by a function that returns a number. The number
returned by the XPath count() function is 3; it counted the page elements with
label values A, AA, and AAA, each of which has child page elements. Notice that
the previous code uses the descendant-or-self axis in its abbreviated syntax (//). When // is
used with nothing before it, a recursive search is done from the root element; such
searches can be expensive within a large document, so please use with care. Within
the square brackets ([ and ]), a predicate has been placed upon these page
elements, requiring that they have child page elements. The content of this predicate
is itself an XPath expression, "page" within the child axis. Notice also that you can
extend this logic to the following expression to get the number of pages that have
child pages -- that also have child pages -- by nesting another predicate:
<xsl:value-of select="count(//page[page[page]])"/>
The result, as you might expect, is 2 -- a count of the page elements A and AA.
Suppose that you want to find how deep a given context node is within its XML tree
(its navigation tier, to use the language of the site map example). You can again use
the count() function, but this time along a different axis:
<xsl:value-of select="count(ancestor::page)"/>
The ancestor axis is used to determine the depth of the context node. Its use is
shown here within a template:
<xsl:template match="/">
<xsl:for-each select="//page">
Page: <xsl:value-of select="@label"/>
,level: <xsl:value-of select="count(ancestor::page)"/>
</xsl:for-each>
</xsl:template>
Notice that xsl:for-each is driven by the node set that its select attribute
returns. The following output results:
The select="//page" attribute returns a node set containing all page elements in
document order, starting from the root element. The xsl:for-each then sets the
context node to each of these page elements in turn. Without the ability of
xsl:for-each to set the context node, the relative expressions within the
xsl:value-of elements would not be evaluated against each page element.
XPath summary
XPath is crucial to XSLT. It is the means by which math and string operations are
accomplished and by which XML input is searched and traversed. XSLT would be blind
and toothless without XPath.
Section 5. CSS
A fool sees not the same tree that a wise man sees.
-- William Blake
XML is inherently contextual. That is, its tags, their order, and the ways in which they
are nested tell you much about the meaning of the document. Other than their
arrangement, however, there is no information that indicates how the XML content
should be formatted visually (or within other media). The default view of an XML
document within a text editor or browser is to see all of its tags, with no special
visual treatment given to any one element, even the root. But what if you want to
show a big, fat font for certain elements, or to show other elements in a bulleted or
numbered list? What if it makes sense to always position a grouping of information in
the same place on a page, or to make that grouping's background color different
from that of its parent? To do this, you can map presentation rules to elements
through CSS.
CSS presentation rules allow you to separate information (the XML elements) from
its presentation -- such that if you want to, you can apply any one of a variety of
looks to a set of data under different conditions. Please note that CSS cannot
compare to the power of XSLT for styling or transforming XML; neither is it uniformly
interpreted across platforms or Web browsers -- especially in its support for XML
styling, as you will see. CSS is discussed here, however, because this lighter-weight
approach might sometimes be all that is required to do the job. CSS is based upon a
fairly simple, non-XML syntax, and is quite easy to use. Within this tutorial, CSS
documents will be referred to simply as "stylesheets."
When you click on the plus and minus icons, you collapse and expand elements that
are parents of other elements -- very functional -- but with an appearance that only a
computer-science major could love. Firefox is no better, but at least it offers an
excuse, as Figure 3 shows.
The message above the tree in Firefox implies that if there were style information
associated with this XML document, you might like it better. Okay, fine. How do you
do this?
Within HTML documents, you can include style information in the following forms:
• Within the HTML tags that use it -- or inline in CSS-speak
• Within a <style/> tag for styles internal to the HTML document
• External in another file and referenced through an HTML <link/> tag
This is news you can use if you write XSL transforms that produce HTML from XML.
Here, however, you're just concerned with infusing the XML with some visual
formatting. The problem is that the XML probably won't be valid if you add inline
style attributes to its elements wherever you like. In addition, the DTD or XML
schema for the XML document probably doesn't include a style element, so the
internal method of applying styles won't work either. Besides, both of these
approaches modify the XML, which you don't want to do. The only approach left is to
use external styles -- via one or more files with a .css suffix -- to apply visual
treatments to the XML documents. However, you can't use the HTML <link/> tag (not
unless your schema or DTD allows it). Instead, you can associate one or more
CSS files with an XML document through a processing instruction similar to the following:
<?xml-stylesheet type="text/css" href="catalog.css"?>
Place this at the beginning (prolog) of your XML document, just after the XML declaration,
<?xml version="1.0"?>. The stylesheet directive references a CSS file,
catalog.css, which sits in the same directory as the XML file. The value of the
href attribute can also be an absolute URI. You might use other attributes within the
previous directive, one of which is media. Possible values for this include all,
braille, embossed, handheld, print, projection, screen, speech, tty,
and tv.
Thus, you can reference different cascading style sheet rules from the same XML
document. This allows the XML to be formatted for the media within which it is being
viewed (or heard!). You can test for the current media (and thus service it by the
same stylesheet) through the @media rule. The following code provides an example:
@media print {
body { font-size: 10pt; }
}
@media screen {
body { font-size: x-small; }
}
Syntax
The basic grammar for defining a style within a cascading style sheet is: selector
{property: value;}. The selector is the XML element for which you define an
appearance based upon the value of one or more properties. You can group more
than one selector together, each separated by a comma.
You can also use the universal selector, represented by an asterisk (*); this can
apply to everything in the XML document for which you don't define an explicit rule. The
property value is expressed in one of a set of allowable units. These units express
color, length, and so on, as appropriate. Normally, whitespace does not matter within
CSS, allowing you to format these files in whatever way makes sense. However, a
numeric value and the unit that follows it cannot be separated by any
whitespace.
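As a minimal sketch of these points, the following rules group two selectors, set a baseline with the universal selector, and keep a number attached to its unit. The title, price, and dvd element names come from the DVD catalog example; the colors and sizes are illustrative assumptions.

```css
/* Grouped selectors: one declaration block shared by two elements. */
title, price {
  font-weight: bold;
}

/* Universal selector: a baseline for every element. More specific
   selectors, such as the group above, override it. */
* {
  color: #333333;
  font-weight: normal;
}

/* A number and its unit are written together: 5px, not 5 px. */
dvd {
  padding: 5px;
}
```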
Within CSS, you have some flexibility for how you define selectors. For instance, if
you want to set styling rules for an element title that is the child of a dvd element,
you can use the following:
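One way to write such a rule is sketched here. The descendant form of the selector is an assumption, chosen because older browsers such as Internet Explorer 6.0 understand it; the stricter child combinator is shown in the comment.

```css
/* Matches any title element nested inside a dvd element. */
dvd title {
  font-weight: bold;
}

/* Stricter alternative that matches only direct children:
   dvd > title { font-weight: bold; } */
```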
Here, it is declared that DVD title element text values will render bold. You don't
have to use this syntax (you could have more simply declared title as the
selector), but it is a useful thing to remember in case you need to restyle an element
that appears in differing contexts (that is, an element appears as the child of one
element here, but maybe also appears as the child of a different element
elsewhere). Figure 4 shows the effect of this in Internet Explorer 6.0.
For readability, let's now say that each dvd element gets displayed within its own
paragraph, with a little padding around it:
dvd {
display: block;
padding: 5px;
}
I'll discuss these properties and their values shortly. Now, if you want to highlight
those dvd elements that include a genre attribute, you declare the following rule:
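A rule of this kind might look like the following sketch; the background color is an assumed, illustrative value.

```css
/* Matches every dvd element that carries a genre attribute,
   whatever the attribute's value. */
dvd[genre] {
  background-color: #FFFFCC;
}
```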
Astute readers will notice the happy resemblance of this syntax to XPath predicates.
While Internet Explorer 6.0 does not recognize this syntax, Firefox 1.0.7 gives the
rendered output shown in Figure 5.
Now, our favorite fellow movie collector wants to highlight drama movies with her
preferred color:
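A sketch of such a rule follows. Both the spelling of the attribute value ("Drama") and the color are assumptions made for illustration.

```css
/* Matches only dvd elements whose genre attribute is exactly "Drama". */
dvd[genre="Drama"] {
  color: #993300;
}
```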
This rule and the more general one regarding the display of dvd elements
that have a genre attribute are both obeyed, as shown in Figure 6. Again, this
works in Firefox but not Internet Explorer 6.0.
This looks better, but the price of each DVD is a little hard to read, isn't it? Let's put a
dollar sign in front of each price element value:
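A minimal sketch using the :before pseudo element, assuming the prices are in dollars as the text suggests:

```css
/* :before generates content ahead of the element's own text. */
price:before {
  content: "$";
}
```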
Let's also enclose the year that the movie was made in parentheses. For this, you
can use the same pseudo element technique used for the price element:
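In sketch form, with year as an assumed name for the element that holds the release year:

```css
/* Generated content on both sides wraps the year in parentheses. */
year:before {
  content: "(";
}

year:after {
  content: ")";
}
```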
Internet Explorer 6.0 doesn't recognize the selector:before and selector:after pseudo
elements, so its rendered result is not shown.
Listing 3. Stylesheet to visually format the XML document for the DVD
collection
/*
** catalog.css
** This stylesheet visually formats our DVD collection XML doc.
*/
catalog {
font-family: Verdana, Arial, Helvetica, Sans-serif;
font-size: x-small;
padding: 25px;
background-color: #DDDDCC;
}
dvd {
display: block;
padding-bottom: 8px;
border-top: 1px solid gray;
border-left: 2px solid #666666;
border-right: 2px solid white;
Here, you added a few properties to the catalog element to style the entire
document. This sets overall rules for the background color, padding,
font-family, and font-size. As you'll see in the rendered
output, these font properties are inherited by all child elements of the catalog
element. Since catalog is the root element, these rules apply to the entire
document. The nested elements can have their own font property rules, which
override those of the parent.
Next, you added some rules to the dvd elements to make them stand out from the
background (the catalog element). You used two different ways to describe the
colors: name and hexadecimal -- I'll briefly discuss these and other ways to describe
colors in CSS later. You used the display property with a value of "block" for the
catalog and dvd elements to get them to show on their own line and to implement
padding. You also experimented with first-child and last-child pseudo
class selectors to give the first and last dvd elements an extra thick border.
(Pseudo class selectors match conditions of elements instead of element names.)
Finally, our favorite fellow movie collector requested that you hide all movies of the
action genre; this was attempted with the display: none property and value.
Figure 8 and Figure 9 show how Firefox and Internet Explorer 6.0, respectively,
handle this stylesheet.
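The pseudo class and hiding rules described above might be sketched as follows. The border widths and colors are assumptions rather than the exact contents of Listing 3, and display: none is the value that removes an element from the rendering.

```css
/* Pseudo classes match a condition (position among siblings),
   not an element name. */
dvd:first-child {
  border-top: 3px solid #666666;
}

dvd:last-child {
  border-bottom: 3px solid #666666;
}

/* Remove Action titles from the rendering entirely. */
dvd[genre="Action"] {
  display: none;
}
```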
Notice that within Firefox, you've succeeded in hiding the DVD with a genre
attribute value of Action (Raiders of the Lost Ark) by setting its display property
to "none". In fact, Firefox nicely executes all the rules in the stylesheet. With
Internet Explorer 6.0, however, you don't succeed with much at all. This is especially
true after you applied a background-color property to the catalog element; this
is the gray blob that covers most of the content in Figure 9. As shown here, the
not-very-standards-adherent Internet Explorer 6.0 does not come close to Firefox for
CSS support for XML.
Lastly, as seen at the top of the previous stylesheet (Listing 3), you can use
comments within CSS files. These must begin with /* and end with */, just like
those used in C and Java programming.
Color
Color in CSS is typically a red, green, and blue triplet expressed in that order and in
Display
As seen in Listing 3, the display property allows you to show an element in
different ways, given the following values:
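Values that are often useful when styling XML include block, inline, list-item, and none. A brief sketch follows; the pairing of each value with a particular catalog element is an illustrative assumption, not a recommendation.

```css
/* block: starts on a new line and accepts width, padding, and margins. */
catalog { display: block; }

/* inline (the initial value): flows within the current line. */
title { display: inline; }

/* list-item: a block with a list marker in front of it. */
dvd {
  display: list-item;
  list-style-type: disc;
  margin-left: 20px;
}

/* none: the element is not rendered and occupies no space. */
price { display: none; }
```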
Length
Lengths are used for many CSS properties. Width, height, font size, border, padding,
margin, and positioning all use some length unit to describe their values. Lengths
are either absolute or relative. Absolute units, which are appropriate for print media
(but not much else), include points, picas, millimeters, centimeters, and inches.
Relative units are more commonly used for screen media and include pixels,
percentages, ems, and exes.
Pixels might at first seem like absolute units, but they are relative to the resolution of
the display device. They are typically used to size block-type elements relative to
bitmapped images, but you should avoid using them for font sizes. When the pixel unit is
used to describe a font size, it is treated as an absolute unit, and most browsers don't allow
users to scale the font size it describes for readability.
Percentages describe a length of something relative to some other object. Often, this
object is the maximum space available. A width property with a value of 60% will
then take up 60% of the space that it could occupy. For instance, you can replace
the rules for the previous catalog element with these rules:
catalog {
font-family: Verdana, Arial, Helvetica, Sans-serif;
font-size: x-small;
background-color: #DDDDCC;
width: 60%;
}
Keeping all other rules in the stylesheet example as they are yields the output within
Firefox, as shown in Figure 10.
Here you can see that the catalog element indeed occupies 60% of the space
available, which, in this case, is the browser window -- just as the newly added
width property dictates.
Ems and exes are useful units for specifying font sizes relative to the parent font
size. The trick with using these is to be aware of the size of the comparable attribute
Font
The font properties govern how a typeface is displayed. Available font properties
include:
• font-family: A list of comma-delimited names of font families, in order
of preference. Example values include Arial, Verdana, Sans-serif,
"Times New Roman", and Serif. Font family names that contain more than
one word should be wrapped in quotes as shown. The last family within
the list should always be a generic family, either Sans-serif or Serif,
depending on what is needed.
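A short sketch of a font-family declaration that follows this advice; the particular families chosen are assumptions:

```css
catalog {
  /* Quote names that contain spaces, list preferences first,
     and end with a generic family as the fallback. */
  font-family: "Times New Roman", Times, serif;
}
```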
Text
To control attributes of typefaces not handled by the font properties, you can use
the text properties. These include:
• color: Describe the value as noted earlier in Color.
• background-color: Sets the color of the area immediately surrounding
the text. You can enlarge this area by the use of the padding:
properties, discussed in Padding.
• text-align: Accepts the values left (the default), right, center, or
justify.
• text-decoration: Typically takes the values none (the default) or
underline. Other possible values include overline or
line-through. (There is also rumor of a blink value, though common
decency dictates that it not be discussed here.)
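The text properties in this list can be combined as in the following sketch. The colors are assumed values, and text-align is placed on dvd because it takes effect on block-level boxes.

```css
title {
  color: #333366;
  background-color: #EEEEEE;
  padding: 2px;               /* enlarges the colored area around the text */
  text-decoration: underline;
}

dvd {
  display: block;
  text-align: center;
}
```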
Background
You can apply background treatments to various inline, block, and table elements
through the background-color property or by use of background images. The
background-color property values conform to the previous color description. The
use of images, however, can be a little more involved. Properties used to describe a
background image include the following:
• background-image: Requires the syntax url(' image URI ') to
locate the image.
• background-position: Indicates the placement of the image relative
to the element. The syntax for this combines a horizontal keyword (left or
right) with a vertical keyword (top or bottom).
• background-repeat: Controls whether the background image is tiled,
and if so, whether it's tiled vertically or horizontally. The syntax for this is
either no-repeat for no tiling, repeat-x for a horizontal tiling, or
repeat-y for a vertical tiling. For example:
background-image: url("../images/tubeTile.gif");
background-position: left top;
background-repeat: repeat-x;
Padding
Padding properties work for inline and block elements. You can specify either a
single padding: property, or you can specify one or more of the four directions:
padding-top:, padding-left:, padding-right:, and padding-bottom:.
You can use the units described earlier in Length. For a single padding: property,
you can still uniquely specify each of the four directions by putting the values in a
particular order, given the number of values. For instance:
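The ordering rules for the shorthand can be sketched as follows. Each line is an alternative, and the lengths are arbitrary illustrative values.

```css
dvd { padding: 5px; }                 /* one value: all four sides */
dvd { padding: 5px 10px; }            /* two: top and bottom, then left and right */
dvd { padding: 5px 10px 15px; }       /* three: top, then left and right, then bottom */
dvd { padding: 5px 10px 15px 20px; }  /* four: top, right, bottom, left (clockwise) */
```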
Borders
Similar to the padding: properties, you can describe border: either all-around for
an element or by one or more of each of the four directions: border-top:,
border-left:, border-right:, and border-bottom:. The value syntax is:
length border-style color. Allowable length units were discussed earlier in Length.
Border-style can be one of these values: dashed, double, dotted, groove, inset,
outset, or solid. The following rules provide an example of this syntax:
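A minimal sketch of the length, border-style, color order, with assumed widths and colors:

```css
/* The same border on all four sides. */
dvd {
  border: 1px solid gray;
}

/* A border on one side only. */
title {
  border-bottom: 2px dashed #666666;
}
```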
Position
With the CSS position properties, you can tell elements exactly where to display
within a document. For instance, if you want to specify that the first dvd element
within the catalog appear at a fixed point relative to all other elements, you might use
the following code:
dvd:first-child {
border-top: 2px solid #666666;
position: absolute;
left: 80px;
top: 150px;
}
Notice the position property and its value of absolute. Once this is specified,
you can then describe where you want to absolutely position the element from the
top left of the page. In the previous rule, we declared 80 pixels from the left edge of
the layout and 150 pixels down from the top. Figure 11 shows what this looks like
when rendered in Firefox.
In Figure 11, Terminator 2 is the first dvd element in document order, so it received
the absolute positioning rule that was declared for the first dvd element. In addition
to the left and top properties, you can declare right and bottom; these place
the element relative to the right edge or the bottom edge of the layout, respectively.
You can use any combination of these properties, so long as it makes sense for the
layout. Another value of the position property is relative, which offsets the normal
rendering position of an element by the top, left, right, and bottom properties.
For instance, you might use this code to place the first dvd element above and to
the left of its normal position by 10 pixels each:
dvd:first-child {
border-top: 2px solid #666666;
position: relative;
left: -10px;
top: -10px;
}
It is possible to overlay the display of one element upon another. You can control
which element is rendered on top with the z-index: property. Integer values are
used for this, with the element having the greater value placed above the other
elements.
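As a sketch, two absolutely positioned dvd elements can be made to overlap, with z-index deciding which is drawn on top. The offsets and background colors are illustrative assumptions.

```css
dvd:first-child {
  position: absolute;
  left: 80px;
  top: 150px;
  background-color: white;
  z-index: 2;   /* greater value: drawn above the other element */
}

dvd:last-child {
  position: absolute;
  left: 100px;
  top: 170px;
  background-color: #DDDDCC;
  z-index: 1;
}
```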
the page, position: fixed sets an element's position relative to the browser
window. This is hard to show without a scrollable example, but you can try it for
yourself; the syntax is simply position: fixed. The top, left, right, and
bottom properties allow you to set the position where the element will remain, even
as the page is scrolled.
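A minimal sketch that pins the first dvd element to the top-right corner of the browser window; the offsets are assumed values:

```css
dvd:first-child {
  position: fixed;
  top: 10px;
  right: 10px;
}
```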
CSS summary
Although not the generally preferred way to style XML documents, and not capable
of the computation or transformation that XSLT can accomplish, CSS at
least offers a lightweight way to format XML elements visually. Because of the
inconsistent support of CSS (especially for XML) within various browsers, use CSS
with caution, at least for screen media. As shown here, however, CSS support for
XML is strong within Mozilla Firefox. Articles written about CSS for print media have
indicated great success; this is especially useful if you can avoid the greater
complexity of XSLT.
Section 6. Conclusion
Summary
This concludes the discussion of the XML transformation topics XSLT, XPath, and
CSS. It is hoped that this information provides a solid introduction for those who are
new to the subject. An honest effort to understand the material presented here and
examine the references that follow should be more than adequate to prepare you for
the XML transformations portion of Exam 142, XML and Related Technologies.
Resources
Learn
• XML in a Nutshell, 3rd Edition (Elliotte Rusty Harold and W. Scott Means,
O'Reilly Media, September 2004, ISBN: 0596007647): In this comprehensive
XML reference, find excellent chapters on XSLT, XPath, and CSS for XML. Also
find great comparisons between versions 1.0 and 2.0 of XSLT and XPath.
• XSLT Cookbook, 2nd Edition (Sal Mangano, O'Reilly, December 2005, ISBN:
0596009747): Dig into detailed examples of XSLT and XPath usage, plus some
more involved applications of these technologies.
• Universal Turing machine: Find out how XSLT is Turing complete in this
Wikipedia article.
• Investigating XSLT: The XML transformation language (LindaMay Patterson,
developerWorks, August 2001): Read about basic syntax and XSL
programming techniques in this early article on XSLT and XPath.
• Practical data binding: XPath as a data binding tool, Part 1 (Brett McLaughlin,
developerWorks, November 2005): Explore a good primer on XPath, and add to
what you saw in this tutorial.
• XSLT Transformation: Visit W3Schools for more on basic XSLT syntax and
grammar.
• CSS tutorial: Learn to apply style and layout to multiple Web pages at once
from W3Schools. Most of what you do for HTML also applies to XML.
• CSS Length Units Reference (MSDN): Review supported length units for
Cascading Style Sheets (CSS), text, layout, and positioning properties.
• XML Transformations with CSS and DOM: Visit Apple's Developer Connection
for Mozilla support for XML and CSS.
• XPath string functions: Explore string functions for XPath in this W3C
recommendation.
• The CSS @media rule: Learn how to specify target media types in this W3C
recommendation.
• Document order: Review the definition of document order in the W3C XPath
recommendation.
• IBM XML 1.1 certification: Become an IBM Certified Developer in XML 1.1 and
related technologies.
• XML: See developerWorks XML Zone for a wide range of technical articles and
tips, tutorials, standards, and IBM Redbooks.
• developerWorks technical events and webcasts: Stay current with technology in
these sessions.
Get products and technologies
• Altova XMLSpy 2006 Home Edition: Download a free entry level XML editor and
development tool for designing and editing XML-based applications.
• Microsoft Internet Explorer 7: Download Internet Explorer 7, and recommended
updates.
• Mozilla Firefox 1.5: Download Firefox 1.5 with its support of open Web
standards.
• IBM product evaluation versions: Download and try application development
tools and middleware products from DB2®, Lotus®, Rational®, Tivoli®, and
WebSphere®.
Discuss
• XML zone discussion forums: Participate in any of several XML-centered
forums.
• developerWorks blogs: Get involved in the developerWorks community.
Trademarks
IBM, DB2, Lotus, Rational, Tivoli, and WebSphere are trademarks of IBM
Corporation in the United States, other countries, or both.
Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the
United States, other countries, or both.
Linux is a trademark of Linus Torvalds in the United States, other countries, or both.
Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft
Corporation in the United States, other countries, or both.