PDF To The Max: Multifunctional Tool For PDF Files
PDF Toolkit
Native Linux PDF utilities such as Ghostscript are very useful if you're willing to click through the menus. But if you're looking for something faster, or if you would like to automate a recurring task, try pdftk (the PDF Toolkit). pdftk is a convenient command-line program for processing PDF files. According to creator Sid Steward, "If PDF is electronic paper, then pdftk is an electronic staple-remover, hole-punch, binder, secret-decoder-ring, and X-Ray-glasses."
You can download the PDF Toolkit from one of Sid Steward's websites [1]. The GPL-licensed program is available for Linux, Mac OS X (Panther), FreeBSD, Solaris, and Windows. The platform-specific install proved quite simple on the platforms we tested (including Debian and SuSE Linux). After completing the install, you can run pdftk from a shell. The pdftk --help command gives you a list of commands and options with short help texts. Table 1 lists and explains the major operations. The generic syntax for processing PDF files with the program is:
pdftk inputfile(s) …
Input files have to be in PDF format. For some operations, the tool additionally needs text files in a special format. pdftk outputs one or more PDF files and, in special cases, text files as well. In the following sections, I have put together a few examples that demonstrate some of pdftk's more interesting uses. These examples by no means explore the program's limits.
Alternatively, you could use pdfLaTeX and the attachfile package to add the source code to the finished PDF file. The recipient would then use pdftk to unpack the source code file and other attachments into any directory:
www.linux-magazine.com
LINUXUSER
In this example, pdftk saves the attachments in a directory named Source. Adding the directory name always makes sense if you are handling multiple attachments.
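The unpack step can be sketched as a small shell function. This is a minimal sketch, assuming pdftk's unpack_files operation; the function name unpack_sources and the file name example.pdf are placeholders chosen for illustration, not names from the article.

```shell
# Hypothetical helper: unpack all attachments of a PDF into a directory.
# Usage: unpack_sources example.pdf Source
unpack_sources() {
    in_pdf=$1
    target_dir=$2
    # Create the target directory first, then let pdftk write into it.
    mkdir -p "$target_dir" &&
        pdftk "$in_pdf" unpack_files output "$target_dir"
}
```

Passing the directory explicitly keeps the attachments of different documents apart, which matters as soon as more than one PDF is unpacked in the same working directory.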
The watermark looks like a stamp on any part of the document without content. You can create a small EPS file with a background color of your choice. The PostScript commands for an A4-size page look like this:
%!PS-Adobe-2.0
%%BoundingBox: 0 0 595 842
0.95 0.95 0.90 setrgbcolor
0 0 moveto
595 0 rlineto
0 842 rlineto
-595 0 rlineto
closepath
fill
showpage
It is easy to change the background color in the EPS code. You can then convert the EPS file to a PDF using epstopdf and then run pdftk to use the file as a background:
pdftk example.pdf background Bg.pdf output eg_color.pdf
In both examples, a three-digit page number will be added to the page names. In the second example, pdftk will store the PDF files in an existing subdirectory.
The cat operation tells pdftk to concatenate multiple PDF files to create a new document. You can use wildcards to specify the filenames of the individual source files.
pdftk example.pdf form.pdf attachment.pdf cat output example_concat.pdf
pdftk D=coversheet.pdf B=example.pdf cat D B1-4 output example_coversheet.pdf
As the second example demonstrates, you can also use cat to rearrange documents by linking part of a PDF file with parts of other PDFs to create a new document.
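Both uses of cat, the wildcard form and the handle form with page ranges, can be wrapped in shell functions. The following is a minimal sketch; the function names merge_dir and prepend_cover and all file names in the usage comments are hypothetical.

```shell
# Hypothetical helper: concatenate every PDF in a directory.
# The shell expands the wildcard in sorted order before pdftk runs.
# Usage: merge_dir parts all.pdf
merge_dir() {
    src_dir=$1
    out_pdf=$2
    pdftk "$src_dir"/*.pdf cat output "$out_pdf"
}

# Hypothetical helper: put a cover sheet in front of a page range
# taken from another document, using the handles D and B.
# Usage: prepend_cover coversheet.pdf example.pdf 1-4 out.pdf
prepend_cover() {
    cover=$1
    body=$2
    range=$3
    out_pdf=$4
    pdftk D="$cover" B="$body" cat D "B$range" output "$out_pdf"
}
```

With merge_dir, the output file should live outside the source directory; otherwise a second run would pick up the earlier result as one of its inputs.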
This command saves the meta-information from the PDF document to a file titled info.txt. The information comprises a key field and the matching value (see Listing 1). Before forwarding or archiving PDF documents, it often makes sense to update the meta-data. pdftk allows you to do so without having to recreate or convert the PDF file. To update the meta-information, first create a text file with the meta-data; the file should look something like this (shortened for the sake of brevity):
InfoKey: Creator
InfoValue: TeX
InfoKey: Corporation
InfoValue: Sample and Sons
This file does not need to contain all the information that a PDF file can store. Fields that already contain values are not touched by the update if the text file does not specify them. You can even add new key fields (Corporation in our example) and assign values to them. The fol-
The input and output files are not permitted to have the same name. In other words, you either need to manually rename the output file, or use a shell script to do so.
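A shell function can take care of the renaming. The sketch below assumes pdftk's update_info operation and writes to a temporary file next to the original; the function name update_info_inplace and the temporary file naming scheme are illustrative choices, not part of pdftk.

```shell
# Hypothetical helper: update the meta-data of a PDF "in place".
# pdftk writes to a temporary file; only on success does the
# temporary file replace the original.
# Usage: update_info_inplace example.pdf info.txt
update_info_inplace() {
    pdf=$1
    info=$2
    # $$ is the shell's process ID; it keeps the temporary name
    # reasonably unique without creating the file in advance.
    tmp="$pdf.tmp.$$"
    if pdftk "$pdf" update_info "$info" output "$tmp"; then
        mv "$tmp" "$pdf"
    else
        # Leave the original untouched and clean up after a failure.
        rm -f "$tmp"
        return 1
    fi
}
```

Replacing the original only after pdftk has exited successfully means a failed run never destroys the source document.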
The first example gives you editable results, whereas the flatten option in the second example indicates that the form fields should be indelibly merged with the PDF file. The form feature allows you to use pdftk to create completed PDF forms on an Internet or intranet server. The user fills out the form fields in his or her browser. Then a PHP or Perl script running in the background creates the FDF file; finally, pdftk combines the two parts. The completed PDF file can then be mailed.
PDF files can be protected by user and owner passwords. pdftk allows you to set both the passwords and the permissions for a PDF file. The following example sets both passwords:
pdftk file.pdf output file_new.pdf owner_pw Lie5quai user_pw phupaefu
The passwords in this example were generated using the pwgen tool. You must choose different strings for the user and owner passwords. The owner of a PDF file can assign specific permissions. Table 2 has a list of permissions that you can set with pdftk. The following example first creates a PDF file that can only be printed. The second line creates a PDF file that can be printed and also copied.
pdftk example.pdf output file_new.pdf owner_pw Lie5quai user_pw phupaefu allow printing
pdftk example.pdf output file_new.pdf owner_pw Lie5quai user_pw phupaefu allow printing CopyContents
PDF files can be encrypted with different levels of encryption. To encrypt a file with pdftk, add one of the following as the final option: encrypt_40bit or encrypt_128bit. You also need to supply a password for a password-protected PDF file. If you are processing multiple files, you can bind variables to the filenames and then assign a password to each file. In the following example, only file A is password protected:
pdftk A=file_new.pdf B=eg_color.pdf input_pw …
Because the permissions set in the previous example do not allow file_new.pdf to be concatenated, you need to supply the owner password.
Conclusions
If you are looking for a quick, simple, and efficient tool for editing PDF files from the command line, try the PDF Toolkit. pdftk is a versatile, multifunctional PDF manipulation tool without the burden of a GUI. If you want to dig deeper into the subject of manipulating PDF files, see Sid Steward's book PDF Hacks [2]. pdftk is written in C++ and based on the iText library [3], which in turn was written in Java. The whole program was compiled and linked with tools from the free GNU Compiler Collection [4]. This makes pdftk easily portable and extensible. The pdftk website has links to ports. Development work on the pdftk program still continues. The program's author, Sid Steward, will answer queries on pdftk and PDF programming posted in the comp.text.pdf newsgroup and in his own PDF forum [1].
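For the password-protected input case, where handles bind passwords to individual files, a complete call might look like the following. This is a sketch only, assuming pdftk's input_pw syntax with a handle=password pair followed by cat; the function name cat_with_password and the output file name combined.pdf are hypothetical.

```shell
# Hypothetical helper: concatenate a password-protected PDF (handle A)
# with an unprotected one (handle B). Only A gets a password.
# Usage: cat_with_password file_new.pdf eg_color.pdf Lie5quai combined.pdf
cat_with_password() {
    protected=$1
    other=$2
    owner_pw=$3
    out_pdf=$4
    pdftk A="$protected" B="$other" input_pw A="$owner_pw" cat output "$out_pdf"
}
```

Note that a password passed on the command line can show up in the shell history and the process list, so this form suits scripts on trusted machines only.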
INFO
[1] Sid Steward: pdftk, Version 1.12 (Nov. 2004): http://www.accesspdf.com/pdftk/
[2] Sid Steward: PDF Hacks, O'Reilly, 2004.
[3] Bruno Lowagie, Paulo Soares: iText library, Version 1.1 (Nov. 2004): http://itext.sourceforge.net
[4] GNU Compiler Collection, Version 3.4.3 (Nov. 2004): http://gcc.gnu.org