Dynamic Transformations From XML To PDF Documents
Agenda
  About the speakers
  Types of documents
  Transforming XML documents
  Introduction to LaTeX
  Basic usage of LaTeX
  Converting LaTeX to PDF
  Dynamic creation of LaTeX and PDF documents
  Transforming XML documents
  Using patXMLRenderer to transform XML to PDF
16.09.2005 Slide 2
Stephan Schmidt
  Web Application Developer at Metrix Internet Design GmbH in Karlsruhe/Germany
  Programming since 1988, PHP since 1998
  Publishing open source software at http://www.php-tools.net
  Contributor to the German PHP Magazine
  Regular speaker at conferences
  Maintainer of patXMLRenderer, patTemplate, patUser and others
The problem
  Had been developing a really large application
  Wrote technical as well as end-user documentation
  Documentation was available in XML (served in the application as HTML)
  Customers wanted documentation on paper
XML documents
  Readable by humans:
    self-explanatory tag names
    self-explanatory attribute names
    structured by indentation
  Readable by machines:
    well-formed document
    plain text data only
    validation with DTD or schema
  Describe only the content
PDF documents
  Readable by humans:
    nice layout
    can be viewed on any platform
    can be easily printed
  Not readable by machines:
    binary document
    mixture of content and layout
  Describe the content and layout
Transforming XML
  Data is stored in an XML document
  Needed in different formats and environments:
    other XML formats (DocBook, SVG, ...)
    HTML
    plain text
    LaTeX
    anything else you can imagine
Introduction to LaTeX
  based on TeX by Donald E. Knuth
  not a word processor
  document preparation system for high-quality typesetting
  used for medium to large scientific documents
  can be used for any document: articles, books, letters, invoices, ...
Easy to understand
  \documentclass{article}
    the document is an article
  \title{Dynamic transformations of XML to PDF with LaTeX}
    the title is "Dynamic transformations ..."
  \author{Stephan Schmidt}
    Stephan Schmidt is the author
  \date{April 2003}
    it was written in April 2003
Easy to understand
  \begin{document}
  \maketitle
  We love XML, but everyone wants PDF.
  \end{document}
    the document consists of a title (generated by \maketitle) and some text
LaTeX features
  Typesetting articles, technical reports, letters, books and slide presentations
  Control over large (and I really mean large) documents
  Control over sectioning, cross references, footnotes, tables and figures
  Automatic creation of bibliographies and indexes
  Inclusion of images
  Using PostScript or Metafont fonts
text markup, paper definitions, etc. collection of commands split the document into logical components
LaTeX commands
  start with a backslash ("\")
  parameters are enclosed in curly braces ("{" and "}")
  optional parameters are enclosed in brackets ("[" and "]") and separated by commas
  Example:
    \maketitle
    \footnote{I am a footnote}
    \documentclass[a4paper,twoside]{book}
LaTeX comments
  start with a percent sign ("%")
  end at the end of the line
  Example:
    \documentclass{article} % This will be an article
    % This line is a comment and will be ignored later
LaTeX environments
  used to split the document into logical parts
  similar to tags in an XML document
  start with the "\begin" command and end with the "\end" command
  Example:
    \begin{document}
    % Place anything that is part of the document here
    \end{document}
Creating a document
  a document always starts with the "\documentclass" command, which defines the type of document
  the document class determines the available command set (no use for "\chapter" when you are writing a letter)
  the document class also defines the basic layout style
  load packages after this command
Resulting document and bookmark table (screenshots not included)
Resulting files
  After "pdflatex" has been called, several files are available in the folder:
    myDocument.pdf is the PDF file you wanted to create
    myDocument.log is a log file containing all log messages
    myDocument.toc contains the table of contents
    myDocument.out contains bookmarks for the PDF reader
    myDocument.aux contains all data needed for cross references
Two-pass transformations
  LaTeX parses the file from top to bottom
  While parsing, it generates the table of contents, anchors for links and PDF bookmarks and stores them in external files
  This information often has to be included at the beginning of the document (e.g. the table of contents)
  The LaTeX file therefore has to be parsed twice (two-pass transformation)
  pdflatex has to be called twice
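The two-pass requirement is easy to script. The following is a minimal sketch, not part of the original slides: the function names buildPdflatexCommand and compileTwice are hypothetical, and it assumes pdflatex is installed and on the PATH. The first run writes the .toc, .out and .aux files, the second run reads them.

```php
<?php
// Hypothetical helper: builds the shell command for a single pdflatex run.
// -interaction=nonstopmode keeps pdflatex from waiting for input on errors.
function buildPdflatexCommand(string $texFile, string $outputDir): string
{
    return 'pdflatex -interaction=nonstopmode'
        . ' -output-directory=' . escapeshellarg($outputDir)
        . ' ' . escapeshellarg($texFile);
}

// Hypothetical helper: runs pdflatex several times (two by default) so that
// the table of contents and bookmarks written by the first pass are picked
// up by the second one. Returns false as soon as one run fails.
function compileTwice(string $texFile, string $outputDir, int $passes = 2): bool
{
    $command = buildPdflatexCommand($texFile, $outputDir);
    for ($pass = 1; $pass <= $passes; $pass++) {
        exec($command, $output, $exitCode);
        if ($exitCode !== 0) {
            return false;
        }
    }
    return true;
}
```

A call such as compileTwice('myDocument.tex', '.') would then leave myDocument.pdf next to the source file, provided both runs succeed.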
  Now open http://localhost/latex.php?name=Aquaman and save the result
  Your first dynamic LaTeX document!
dynamicLatex.php
  <?php ob_start(); ?>
  \documentclass{article}
  \begin{document}
  Hi, my name is <?php echo $_GET['name']; ?>.
  \end{document}
  <?php
  // Fetch the generated LaTeX source from the output buffer
  $latex = ob_get_contents();
  ob_end_clean();
  // Write the source to disk and let pdflatex create dynamic.pdf
  $fp = fopen( "dynamic.tex", "w" );
  fputs( $fp, $latex );
  fclose( $fp );
  system( "pdflatex dynamic.tex" );
  ?>
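One caveat with the script above: the request parameter is written into the LaTeX source unchanged, so characters such as "%", "&" or "\" in the name would break the document or inject commands. The following is a minimal sketch of an escaping function, assuming plain text input; the name latexEscape is hypothetical and the list of characters covers only the ten LaTeX special characters.

```php
<?php
// Hypothetical helper: escapes the ten LaTeX special characters so that
// user input is typeset literally instead of being read as markup.
function latexEscape(string $text): string
{
    $map = [
        '\\' => '\\textbackslash{}',
        '{'  => '\\{',
        '}'  => '\\}',
        '$'  => '\\$',
        '&'  => '\\&',
        '#'  => '\\#',
        '%'  => '\\%',
        '_'  => '\\_',
        '~'  => '\\textasciitilde{}',
        '^'  => '\\textasciicircum{}',
    ];
    // strtr() with an array replaces in a single pass, so the braces that
    // belong to an inserted \textbackslash{} are not escaped a second time.
    return strtr($text, $map);
}
```

In the script, the echo statement would then read: echo latexEscape( $_GET['name'] );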
Not state-of-the-art
  Creating larger and more complex files can get messy:
    PHP and LaTeX commands in one file
    No separation of logic, content and layout
State-of-the-art techniques
  Implement the same techniques that are used in dynamic webpages:
    use templates
    store content in databases or XML
    use caching to gain performance
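Caching matters here because every pdflatex run is expensive. One possible approach, sketched below under assumptions that are not from the slides, is to name the cached PDF after a hash of the generated LaTeX source, so that pdflatex only runs when the content has changed. The function names cachedPdfPath and isCached and the cache directory layout are hypothetical.

```php
<?php
// Hypothetical helper: derives the cache file name from the LaTeX source.
// Identical source always maps to the same PDF file name.
function cachedPdfPath(string $latexSource, string $cacheDir): string
{
    return rtrim($cacheDir, '/') . '/' . md5($latexSource) . '.pdf';
}

// Hypothetical helper: true if a PDF for exactly this source already exists,
// in which case the pdflatex runs can be skipped.
function isCached(string $latexSource, string $cacheDir): bool
{
    return is_file(cachedPdfPath($latexSource, $cacheDir));
}
```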
Transforming XML
  XSLT was developed for the task of transforming XML documents
  XSLT stylesheets are XML documents
  Transforms XML trees that are stored in memory
  Uses XPath to access parts of a document
  Based on pattern matching (when you see something that looks like this, do that)
  Functional syntax
  Sounds good? Think again!
Drawbacks of XSLT
XSLT is domain specific:
  Developed to work with XML
  Creating plain text/LaTeX is quite hard, as whitespace is important (<xsl:text>)
  Transforming "world" to "W O R L D" is next to impossible
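For comparison, the same string transformation is a one-liner in a general-purpose language. A minimal sketch in PHP; the function name spaceOut is hypothetical, and it only handles single-byte (ASCII) text because str_split() works on bytes.

```php
<?php
// Hypothetical helper: upper-cases a word and separates its letters with
// spaces. spaceOut('world') returns "W O R L D".
function spaceOut(string $word): string
{
    return implode(' ', str_split(strtoupper($word)));
}
```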
Drawbacks of XSLT
XSLT is verbose and cumbersome:
  <xsl:choose>
    <xsl:when test="@author">
      <xsl:value-of select="@author"/>
      <xsl:text> says: </xsl:text>
      <xsl:value-of select="."/>
    </xsl:when>
    <xsl:otherwise>
      <xsl:text>Somebody says: </xsl:text>
      <xsl:value-of select="."/>
    </xsl:otherwise>
  </xsl:choose>
Drawbacks of XSLT
XSLT is hard to learn:
  Functional programming language
  Complex structure (see if/else example)
  XPath is needed
  Designer needs to learn it
Example
XML
  <section title="XML and PDF">
    <para>We love XML, but everybody wants PDF.</para>
  </section>
Template for <section>
  <table border="0" cellpadding="0" cellspacing="2" width="500">
    <tr><td><b>{TITLE}</b></td></tr>
    <tr><td>{CONTENT}</td></tr>
  </table>
Template for <para>
  <font face="Arial" size="2">{CONTENT}<br></font>
Example (Result)
  <table border="0" cellpadding="0" cellspacing="2" width="500">
    <tr><td><b>XML and PDF</b></td></tr>
    <tr><td>
      <font face="Arial" size="2">
        We love XML, but everybody wants PDF.<br>
      </font>
    </td></tr>
  </table>
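The idea behind the example above can be sketched in a few lines: each tag has a template, each attribute becomes an upper-case variable, and the rendered children become {CONTENT}. This is an illustration of the principle only, not patXMLRenderer's actual code. The function name renderNode, the array representation of the XML tree and the shortened templates are all assumptions made for the sketch.

```php
<?php
// Hypothetical helper: renders a node of an already parsed XML tree.
// A node is either a string (character data) or an array with the keys
// 'tag', 'attributes' and 'children'.
function renderNode($node, array $templates): string
{
    if (is_string($node)) {
        return $node;
    }

    // Render the children first; their output becomes {CONTENT}.
    $content = '';
    foreach ($node['children'] as $child) {
        $content .= renderNode($child, $templates);
    }

    // Assumption: tags without a template just pass their content through.
    if (!isset($templates[$node['tag']])) {
        return $content;
    }

    // Attributes become upper-case template variables, e.g. title => {TITLE}.
    $vars = ['{CONTENT}' => $content];
    foreach ($node['attributes'] as $name => $value) {
        $vars['{' . strtoupper($name) . '}'] = $value;
    }
    return strtr($templates[$node['tag']], $vars);
}

// Shortened stand-ins for the <section> and <para> templates.
$templates = [
    'section' => '<b>{TITLE}</b><br>{CONTENT}',
    'para'    => '<font face="Arial" size="2">{CONTENT}<br></font>',
];

// The XML document from the example, written as a nested array.
$document = [
    'tag'        => 'section',
    'attributes' => ['title' => 'XML and PDF'],
    'children'   => [
        [
            'tag'        => 'para',
            'attributes' => [],
            'children'   => ['We love XML, but everybody wants PDF.'],
        ],
    ],
];

$html = renderNode($document, $templates);
```

Swapping the HTML templates for LaTeX templates changes the output format without touching the XML document or the rendering code.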
Installation of patXMLRenderer
  Download the archive at http://www.php-tools.de
  Unzip the archive
  Adjust all path names and options in the config file (cache, log, etc.)
  Create the templates (transformation rules)
  Create your XML files
  Let patXMLRenderer transform the files
Introduction to patTemplate
  PHP templating class published under LGPL
  Supports LaTeX templates when instantiated with $tmpl = new patTemplate( "tex" );
  Placeholders for variables have to be UPPERCASE and enclosed in { and }, or in <{ and }> if used with LaTeX templates
  Uses <patTemplate:tmpl name="..."> tags to split files into template blocks that may be addressed separately
  Other properties of the templates are written down as attributes, e.g. type="condition" or whitespace="trim"
  Emulation of simple switch/case and if/else statements by using <patTemplate:sub condition="..."> tags
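The alternative delimiters exist because curly braces already have a meaning in LaTeX: in \section{TITLE} a plain {TITLE} placeholder could not be told apart from a command argument. The following sketch illustrates the substitution with plain string functions. It is not patTemplate itself; the function name fillLatexTemplate is hypothetical.

```php
<?php
// Hypothetical helper: replaces <{NAME}> placeholders in a LaTeX template.
// Ordinary LaTeX braces are left untouched.
function fillLatexTemplate(string $template, array $vars): string
{
    $replacements = [];
    foreach ($vars as $name => $value) {
        $replacements['<{' . strtoupper($name) . '}>'] = $value;
    }
    return strtr($template, $replacements);
}

$template = '\\section{<{TITLE}>}' . "\n" . '<{CONTENT}>';

$latex = fillLatexTemplate($template, [
    'title'   => 'XML and PDF',
    'content' => 'We love XML, but everybody wants PDF.',
]);
```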
patTemplate Example
Simple template with two variables (corresponds to the XML tag <box>)
  <patTemplate:tmpl name="box">
    <table border="1" cellpadding="5" cellspacing="0" width="{WIDTH}">
      <tr>
        <td>{CONTENT}</td>
      </tr>
    </table>
  </patTemplate:tmpl>
patTemplate Example 2
  Task: the box should be available in three sizes: small, large and medium (default)
  Solution: a condition template that emulates a switch/case statement:
    Template type is "condition"
    The variable that is checked is called "size"
    Three possible values for "size": "small", "large" and "medium" (or any other unknown value)
    Three subtemplates
patTemplate Example 2
  <patTemplate:tmpl name="box" type="condition" conditionvar="size">
    <patTemplate:sub condition="small">
      <table border="1" cellpadding="5" cellspacing="0" width="200">
        <tr><td>{CONTENT}</td></tr>
      </table>
    </patTemplate:sub>
    <patTemplate:sub condition="large">
      <table border="1" cellpadding="5" cellspacing="0" width="800">
        <tr><td>{CONTENT}</td></tr>
      </table>
    </patTemplate:sub>
    <patTemplate:sub condition="default">
      <table border="1" cellpadding="5" cellspacing="0" width="500">
        <tr><td>{CONTENT}</td></tr>
      </table>
    </patTemplate:sub>
  </patTemplate:tmpl>
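What the condition template expresses is a lookup with a fallback. The sketch below shows the equivalent logic in plain PHP for comparison. It does not use patTemplate; the function name renderBox and the shortened table markup are assumptions.

```php
<?php
// Shortened stand-ins for the three subtemplates of the condition template.
$boxTemplates = [
    'small'   => '<table width="200"><tr><td>{CONTENT}</td></tr></table>',
    'large'   => '<table width="800"><tr><td>{CONTENT}</td></tr></table>',
    'default' => '<table width="500"><tr><td>{CONTENT}</td></tr></table>',
];

// Hypothetical helper: picks the subtemplate that matches $size and falls
// back to "default" for "medium" or any other unknown value.
function renderBox(array $subTemplates, string $size, string $content): string
{
    $template = $subTemplates[$size] ?? $subTemplates['default'];
    return str_replace('{CONTENT}', $content, $template);
}
```

Keeping this decision in the template means a designer can add a new size without touching PHP code.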
The Result
  \documentclass[a4paper,twocolumn]{article}
  \usepackage{hyperref}
  \title{Me and the superheroes, part 2}
  \author{Me, of course}
  \begin{document}
  \tableofcontents
  \section{I lied to you}
  When I was talking about {\em Superman}, I lied. He came back from the dead and rose to the glory he once had.\\
  \end{document}
Simple Example
  <example>
    Today is <time:current format="m-d-Y"/>.
  </example>
Will be transformed to
  <example>
    Today is 05-09-2004.
  </example>
Which will then be transformed to LaTeX using the rules defined in the templates.
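The first transformation step can be pictured as replacing the extension tag with the value it produces. The sketch below imitates this with a regular expression. It is not how patXMLRenderer registers or calls its extensions; the function name expandTimeTags is hypothetical, the attribute value is assumed to be quoted, and the timestamp is passed in so that the result is reproducible.

```php
<?php
// Hypothetical helper: replaces every <time:current format="..."/> tag with
// the given timestamp, formatted by PHP's date() function.
function expandTimeTags(string $xml, int $timestamp): string
{
    $result = preg_replace_callback(
        '/<time:current\s+format="([^"]*)"\s*\/>/',
        function (array $match) use ($timestamp): string {
            return date($match[1], $timestamp);
        },
        $xml
    );
    // preg_replace_callback() returns null on a PCRE error; keep the input then.
    return $result ?? $xml;
}

$xml = '<example>Today is <time:current format="m-d-Y"/>.</example>';

// Noon on 9 May 2004 in the server's default time zone.
$expanded = expandTimeTags($xml, mktime(12, 0, 0, 5, 9, 2004));
```

In real use the second argument would simply be time().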
patXMLRenderer Example
  <page>
    <dbc:connection name="foo">
      <dbc:type>mysql</dbc:type>
      <dbc:host>localhost</dbc:host>
      <dbc:db>myDb</dbc:db>
      <dbc:user>me</dbc:user>
      <dbc:pass>secret</dbc:pass>
    </dbc:connection>
    <dbc:query connection="foo" returntype="assoc">
      SELECT id,name,email FROM authors WHERE id=<var:get scope="_GET" var="uid"/>
    </dbc:query>
    ...place any XML code here...
  </page>
Existing Extensions
  Repository on http://www.php-tools.net
  Examples:
    <xml:...> for XML syntax highlighting
    <php:...> for PHP syntax highlighting
    <dbc:...> database interface
    <var:...> access to variables
    <control:...> control structures
    <rss:...> to include content from RSS feeds
    <file:...> file operations
    and many more...
  Allow you to develop "XML Applications"
The End
Thank you!
More information: http://www.php-tools.net [email protected]
Thanks to: Sebastian Mordziol, gERD Schaufelberger, Metrix Internet Design GmbH