Minimal PDF: Adobe PDF Specification ("ISO Approved Copy of The ISO 32000-1 Standards Document") Tips

This document contains a minimal valid PDF file with only 4 objects: the catalog, pages, page and content stream. It demonstrates that PDF is a human-readable format by showing the raw code for a simple PDF with no compression, encryption or images. The file specifies PDF version 1.1, defines the root catalog and one page with dimensions. The page content stream draws the text "Hello World".

Uploaded by

prnco
Copyright
© All Rights Reserved

Minimal PDF

PDF is a binary format, but it contains mostly plain text

Most PDF files do not look readable in a text editor. Compression, encryption, and embedded images are largely to blame. After removing these three components, one can more easily see that PDF is a human-readable document description language.

With patience, one can write a PDF file by hand

The Adobe PDF specification (“ISO approved copy of the ISO 32000-1 Standards document”) includes an example “minimal PDF file,” but it's possible to trim it down even further. The trickiest part is making sure that all the byte counts are correct (tips).
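
Since the byte counts are the error-prone part, one option is to let a short script compute them. The following is a minimal sketch in Python, not part of the original page: the helper names build_pdf and stream_object are hypothetical, linefeed newlines are assumed, and the object bodies are written on single lines, so the resulting offsets differ from a hand-indented file.

```python
def stream_object(content):
    """Wrap content bytes in a stream object body with a computed /Length."""
    return b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream"


def build_pdf(bodies):
    """Assemble a PDF from object bodies, computing every byte offset.

    bodies is a list of bytes; item i becomes indirect object i+1,
    generation 0. Object 1 is assumed to be the Catalog.
    """
    out = bytearray(b"%PDF-1.1\n")
    out += b"%\xc2\xa5\xc2\xb1\xc3\xab\n\n"   # high-bit comment, then a blank line
    offsets = []
    for number, body in enumerate(bodies, start=1):
        offsets.append(len(out))              # byte offset of "N 0 obj"
        out += b"%d 0 obj\n" % number
        out += body + b"\nendobj\n\n"
    xref_offset = len(out)                    # byte offset of the "xref" keyword
    out += b"xref\n0 %d\n" % (len(bodies) + 1)
    out += b"0000000000 65535 f \n"           # entry for the free Object 0
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset   # each entry is exactly 20 bytes
    out += b"trailer\n<< /Root 1 0 R /Size %d >>\n" % (len(bodies) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return bytes(out)


# Usage: the same four objects as a one-page "Hello World" document.
font = b"<< /Type /Font /Subtype /Type1 /BaseFont /Times-Roman >>"
pdf = build_pdf([
    b"<< /Type /Catalog /Pages 2 0 R >>",
    b"<< /Type /Pages /Kids [3 0 R] /Count 1 /MediaBox [0 0 300 144] >>",
    b"<< /Type /Page /Parent 2 0 R /Resources << /Font << /F1 "
    + font + b" >> >> /Contents 4 0 R >>",
    stream_object(b"BT /F1 18 Tf 0 0 Td (Hello World) Tj ET"),
])
print(pdf.index(b"1 0 obj"))  # 18: header (9 bytes) + comment (8) + blank line (1)
```

The script records each offset at the moment the object is appended, so the xref table and the startxref value stay consistent however the bodies are edited.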

The file
%PDF-1.1                                    Header; specifies that this file uses PDF version 1.1
%¥±ë                                        Comment containing at least 4 “high bit” characters. This example has 6.

1 0 obj                                     Object 1, Generation 0
  << /Type /Catalog                         Begin a Catalog dictionary
     /Pages 2 0 R                           The root Pages object: Object 2, Generation 0
  >>                                        End dictionary
endobj                                      End object

2 0 obj                                     Object 2, Generation 0
  << /Type /Pages                           Begin a Pages dictionary
     /Kids [3 0 R]                          An array of the individual pages in the document
     /Count 1                               The array contains only one page
     /MediaBox [0 0 300 144]                Global page size, lower-left to upper-right, measured in points
  >>                                        End dictionary
endobj                                      End object

3 0 obj                                     Object 3
  <<  /Type /Page                           Begin a Page dictionary
      /Parent 2 0 R
      /Resources                            The resources for this page…
       << /Font                             Begin a Font “resource dictionary”
           << /F1                           Bind the name “F1” to
               << /Type /Font               a Font dictionary
                  /Subtype /Type1           It's a Type 1 font
                  /BaseFont /Times-Roman    and the font face is Times-Roman
               >>
           >>
       >>
      /Contents 4 0 R                       The contents of the page: Object 4, Generation 0
  >>
endobj

4 0 obj                                     Object 4
  << /Length 55 >>                          A stream, 55 bytes in length
stream                                      Begin stream
  BT                                        Begin Text object
    /F1 18 Tf                               Use “F1” font at 18 point size
    0 0 Td                                  Position the text at 0,0
    (Hello World) Tj                        Show text “Hello World”
  ET                                        End Text
endstream                                   End stream
endobj

xref                                        The xref section
0 5                                         A contiguous group of 5 objects, starting with Object 0
0000000000 65535 f                          Object 0: next free object is number 0, generation 65535, free, space+linefeed
0000000018 00000 n                          Object 1: at byte offset 18, generation 0, in use, space+linefeed
0000000077 00000 n
0000000178 00000 n
0000000457 00000 n
trailer                                     The trailer section
  << /Root 1 0 R                            The document root is Object 1, Generation 0 (the Catalog dictionary)
     /Size 5                                The document contains 5 indirect objects, counting the free Object 0
  >>
startxref                                   Where is the newest xref?
565                                         byte offset 565
%%EOF                                       End of File
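
The offsets in the xref section are easy to get wrong, so it can help to check them mechanically. The sketch below is one possible checker in Python, not something the original page provides: the function name check_xref is hypothetical, and it assumes a file with a single classic xref section containing one subsection, like the file shown here.

```python
import re


def check_xref(pdf):
    """Return a list of problems in a PDF with one classic xref section."""
    match = re.search(rb"startxref\s+(\d+)\s+%%EOF\s*$", pdf)
    if match is None:
        return ["no startxref / %%EOF at the end of the file"]
    xref_at = int(match.group(1))
    if not pdf.startswith(b"xref", xref_at):
        return ["startxref %d does not point at 'xref'" % xref_at]

    lines = pdf[xref_at:].splitlines()
    # Second line of the section: first object number and entry count.
    first, count = (int(field) for field in lines[1].split())
    problems = []
    for index in range(count):
        offset, generation, kind = lines[2 + index].split()
        if kind != b"n":
            continue                      # free entries have no object to find
        number = first + index
        expected = b"%d %d obj" % (number, int(generation))
        if not pdf.startswith(expected, int(offset)):
            problems.append(
                "object %d is not at byte offset %d" % (number, int(offset))
            )
    return problems
```

Calling check_xref on the bytes of a file (for example, the result of reading it in binary mode) returns an empty list when every in-use entry points at its "N G obj" line.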

Download

With linefeed newlines: minimal.pdf
With linefeed newlines and a license comment: minimal_l.pdf
With Windows-style newlines: minimal_crlf.pdf
With Windows-style newlines and a license comment: minimal_crlf_l.pdf

Notes

The high bit comment in this example contains 6 one-byte characters. These happen to show up as 3 two-byte characters when viewing the file as UTF-8 encoded text. To
see 6 characters, try changing your browser's character encoding to “Western.”
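
The relationship between the 3 visible characters and the 6 bytes can be checked with a few lines of Python. This is only an illustration; Latin-1 is used here as a stand-in for the browser's “Western” setting.

```python
# The three characters of the comment, written as escapes so the example
# does not depend on the encoding of this source file.
marker = "\u00a5\u00b1\u00eb"            # yen sign, plus-minus sign, e with diaeresis

raw = marker.encode("utf-8")             # the bytes actually stored in the file
print(len(marker), len(raw))             # 3 6

# Every one of the six bytes has its high bit set.
print(all(byte >= 0x80 for byte in raw)) # True

# Read one byte per character, the same bytes show up as six characters.
as_western = raw.decode("latin-1")
print(len(as_western))                   # 6
```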
Differences from the minimal PDF file in the Adobe spec.

Some optional entities are omitted, such as an Outlines dictionary.
All objects are direct objects, except where indirect objects are mandated by the specification.
Linefeed (0x0a) is used for the newline character.
All the whitespace formatting indents are included in the byte offsets. The text can be copied from the webpage and pasted into a PDF file, as long as linefeed is used for the newlines and the encoding is kept as UTF-8. Note: Windows uses carriage return + linefeed for newline, so pasting into Notepad is not ideal.
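
A short Python illustration of why newlines and indents matter for the byte counts. The indentation of the content stream (two and four spaces) is an assumption made for this sketch; with it, the stream comes to the 55 bytes declared by /Length.

```python
# The start of the file with linefeed newlines, up to the first object.
lf_head = b"%PDF-1.1\n%\xc2\xa5\xc2\xb1\xc3\xab\n\n1 0 obj\n"
crlf_head = lf_head.replace(b"\n", b"\r\n")

print(lf_head.index(b"1 0 obj"))    # 18, the offset recorded in the xref table
print(crlf_head.index(b"1 0 obj"))  # 21, one extra byte per preceding newline

# The content stream, with assumed indents of two and four spaces.
stream = b"\n".join([
    b"  BT",
    b"    /F1 18 Tf",
    b"    0 0 Td",
    b"    (Hello World) Tj",
    b"  ET",
])
print(len(stream))                  # 55

# Dropping the indents changes the length, so /Length would have to change too.
stripped = b"\n".join(line.strip() for line in stream.split(b"\n"))
print(len(stripped))                # 39
```

Either change (CRLF newlines or different indents) shifts every later offset, which is why the Windows-style downloads are separate files with their own byte counts.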

Links

http://blog.idrsolutions.com/?s=%22Make+your+own+PDF+file%22
A series of posts that explains how to write PDF files from scratch.

For a gentler introduction to the specification, read some tips on writing a PDF file (this site) or an introduction to PDF (another site).

Found a mistake?

Submit a comment or correction

License

MIT License. See the LICENSE file.

Updates
14 Mar 2019 Correct download links for the PDF files that include license comments.
02 Dec 2018 Add MIT License because placing in the public domain is not supported in all jurisdictions.
03 May 2014 Add note about 6 high bit characters appearing as 3 UTF-8 characters. Thanks @pdfkungfu!
2012 Jan 08 Comments link
2012 Dec 24 The file was not working in some readers (including Adobe Reader!) because the Contents stream needed to be an indirect object. “All streams shall be indirect objects”
2012 Jun 15 Reword download link
2012 Jan 26 remove document trapdoor tech talk link. it seems off-topic.
2010 Dec 02 link to Google Tech Talk
2010 Nov 20 clean up
2010 Sep 27 Small changes, corrections
2010 Sep 13 Created
