PDFBox - User Guide
Table of contents
1 Introduction
2 Examples
3 PDF File Format Overview
4 PD Model
1 Introduction
This page discusses the internals of PDF documents and how those internals map to
PDFBox classes. Users should refer to the Javadoc to see which classes and methods are
available. The Adobe PDF Reference gives detailed information about fields and their
meanings.
2 Examples
A variety of examples can be found in the src/org/pdfbox/examples folder. This guide will
refer to specific examples as needed.
Copyright © 2008 The Apache Software Foundation All rights reserved. Page 2
A page in a PDF document is represented by a COSDictionary. The entries that are available
for a page are listed in the PDF Reference. An example page dictionary looks like this:
<< /Type /Page /MediaBox [0 0 612 915] /Contents 56 0 R >>
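To make the dictionary structure concrete, here is a minimal sketch of untyped, key-based access to the page entries shown above. It deliberately uses plain java.util collections as stand-ins; the class PageDictionarySketch and its methods are hypothetical names for illustration and are not PDFBox classes. It assumes the usual PDF rectangle layout of [llx lly urx ury].

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for untyped dictionary access; these are not PDFBox classes.
class PageDictionarySketch {

    // Builds a map mirroring << /Type /Page /MediaBox [0 0 612 915] /Contents 56 0 R >>.
    static Map<String, Object> examplePage() {
        Map<String, Object> page = new LinkedHashMap<>();
        page.put("Type", "Page");
        page.put("MediaBox", List.of(0f, 0f, 612f, 915f));
        page.put("Contents", "56 0 R"); // indirect reference kept as plain text in this sketch
        return page;
    }

    // The caller must know the key name and the array layout: width = urx - llx.
    static float mediaBoxWidth(Map<String, Object> page) {
        @SuppressWarnings("unchecked")
        List<Float> box = (List<Float>) page.get("MediaBox");
        return box.get(2) - box.get(0);
    }

    public static void main(String[] args) {
        System.out.println("Width:" + mediaBoxWidth(examplePage()));
    }
}
```

The string key and the unchecked cast are the point of the sketch: nothing stops a caller from misspelling "MediaBox" or reading the wrong array index.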
4 PD Model
The COS Model allows access to all aspects of a PDF document. Programming against it
directly is tedious and error-prone, however, because the user must know the names of all
of the parameters and no helper methods are available. The PD Model was created to
alleviate this problem. Each type of object (page, font, image) has a set of defined
attributes that can appear in its dictionary. A PD Model class is available for each of
these types, so strongly typed methods can be used to access the attributes. The code to
get the page width can be rewritten to use PD Model classes:
PDPage page = ...;
PDRectangle mediaBox = page.getMediaBox();
System.out.println( "Width:" + mediaBox.getWidth() );
PD Model objects sit on top of the COS model. Typically, a PD Model class stores only a
COS object, and all of its getter and setter methods read or modify data held in that
COS object. For example, when you call PDPage.getLastModified(), the method looks up the
key "LastModified" in the COSDictionary; if the key is found, its value is converted to a
java.util.Calendar. When PDPage.setLastModified( Calendar ) is called, the Calendar is
converted to a string and stored in the COSDictionary.
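The wrapper pattern described here can be sketched as follows. This is an illustration under stated assumptions, not the PDFBox implementation: PageWrapperSketch is a hypothetical class, a java.util.Map stands in for the COSDictionary, and the date string is simplified to "D:" followed by yyyyMMddHHmmss in UTC (real PDF date strings can also carry a time zone offset).

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.HashMap;
import java.util.Map;
import java.util.TimeZone;

// Hypothetical stand-in for a PD Model class; not the PDFBox implementation.
class PageWrapperSketch {
    private static final String KEY = "LastModified";
    private final Map<String, String> dictionary; // the only state the wrapper holds

    PageWrapperSketch(Map<String, String> dictionary) {
        this.dictionary = dictionary;
    }

    // Simplified date pattern (assumption): always formatted and parsed as UTC.
    private static SimpleDateFormat format() {
        SimpleDateFormat f = new SimpleDateFormat("yyyyMMddHHmmss");
        f.setTimeZone(TimeZone.getTimeZone("UTC"));
        return f;
    }

    // Looks up the key and converts the stored string; returns null if the entry is absent.
    Calendar getLastModified() {
        String raw = dictionary.get(KEY);
        if (raw == null) {
            return null;
        }
        try {
            Calendar cal = Calendar.getInstance(TimeZone.getTimeZone("UTC"));
            cal.setTime(format().parse(raw.substring(2))); // skip the "D:" prefix
            return cal;
        } catch (ParseException e) {
            throw new IllegalStateException("Malformed date: " + raw, e);
        }
    }

    // Converts the Calendar to a string and stores it in the dictionary.
    void setLastModified(Calendar cal) {
        dictionary.put(KEY, "D:" + format().format(cal.getTime()));
    }

    public static void main(String[] args) {
        Map<String, String> dict = new HashMap<>();
        PageWrapperSketch page = new PageWrapperSketch(dict);
        Calendar cal = Calendar.getInstance(TimeZone.getTimeZone("UTC"));
        cal.clear();
        cal.set(2008, Calendar.JANUARY, 15, 10, 30, 0);
        page.setLastModified(cal);
        System.out.println(dict.get("LastModified")); // prints D:20080115103000
    }
}
```

Because the wrapper keeps no state of its own, anything written through the setter is immediately visible to code that reads the dictionary directly.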
Here is a visual depiction of the COS Model and PD Model design.
Note:
PD Model objects are thin wrappers around COS objects. For example, each call to
PDPage.getMediaBox() returns a new PDRectangle object, but every one of those objects
wraps the same underlying COSArray.
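The behavior in this note can be illustrated with a small sketch. The classes below (SharedArraySketch, Page, Rect) are hypothetical stand-ins built on java.util collections, not PDFBox classes; they only mimic the idea of a getter that creates a fresh wrapper around a shared array on each call.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins; these are not PDFBox classes.
class SharedArraySketch {

    // Typed view over a four-number array [llx lly urx ury].
    static class Rect {
        final List<Float> array;
        Rect(List<Float> array) { this.array = array; }
        float getWidth() { return array.get(2) - array.get(0); }
        void setUpperRightX(float x) { array.set(2, x); }
    }

    static class Page {
        private final Map<String, List<Float>> dictionary = new HashMap<>();
        Page() {
            dictionary.put("MediaBox", new ArrayList<>(List.of(0f, 0f, 612f, 915f)));
        }
        // A new wrapper on every call, always around the same stored list.
        Rect getMediaBox() { return new Rect(dictionary.get("MediaBox")); }
    }

    public static void main(String[] args) {
        Page page = new Page();
        Rect first = page.getMediaBox();
        Rect second = page.getMediaBox();
        System.out.println(first == second);             // false: distinct wrapper objects
        System.out.println(first.array == second.array); // true: same underlying list
        first.setUpperRightX(500f);
        System.out.println(second.getWidth());           // 500.0: change seen through the other wrapper
    }
}
```

One practical consequence under this design is that wrapper identity says nothing about the data: two wrappers can be different objects yet edit the same array, so changes made through one appear through the other.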