PDF Processing With Gnostice PDFtoolkit (Part 1)
In the first part of this article, originally published on Codegear.com last month, we will see what Gnostice PDFtoolkit
VCL can do for you. We will be using code examples to illustrate the ease with which PDFtoolkit will help you accomplish
your PDF-related tasks.
By V. Subhash
Why PDF?
PDF is best known for its ability to retain high fidelity on all platforms. It is also a final form document
format in that people do not expect PDF documents to undergo further change. That is why PDF is a popular
choice for making invoices and user manuals, and also for transmitting documents over the Internet. PDF is
also liked for its features such as font embedding, bookmarks, thumbnails, attachments, watermarks,
annotations, encryption, and digital signatures. Last but not least, PDF is an open format.
For these reasons, PDF has become a part of our technology-oriented lives. From e-books to web forms to
sophisticated workflow transports, PDF has found innumerable applications.
Although the format supports a lot of features, most applications that produce PDF documents make use of
only a few features. Usually, it is just text and images. At other times, we may have just text and form fields.
PDF users often require more value, such as encryption, compression, bookmarks, stamps, or watermarks. In
a workflow-like environment, the demand to cut, chop, and mince PDF documents is even greater.
To meet this need, there is a flourishing market for PDF processors. In this arena, Gnostice PDFtoolkit has
long established its name as a leader.
This article looks at PDFtoolkit's capabilities in the following areas:
I. Manipulation
II. Content Extraction (text and images)
III. Transformation (merging and splitting)
IV. Enhancement (adding bookmarks, hyperlinks, comments, stamps and watermarks; encryption;
compression)
V. Forms Processing (adding/editing/deleting/flattening form fields)
VI. Viewing and Printing (visual components)
VII. Text Search (visual component)
Reading a PDF document is straightforward, as shown in the code snippet below. Create a
TgtPDFDocument object, load a document, and we are ready to roll.
...
...
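As a minimal sketch of the steps just described, the following console program creates the document object, loads a file, and reports the page count. It uses only calls that appear in the later examples of this article (Create, LoadFromFile, IsLoaded, PageCount); the unit name gtPDFDoc is an assumption taken from the uses clause of the encryption example, and the file name is a placeholder.

```pascal
program LoadDocSketch;

{$APPTYPE CONSOLE}

uses
  SysUtils,
  gtPDFDoc; // Assumed unit name, as listed in the article's encryption example

var
  gtPDFDocument1: TgtPDFDocument;
begin
  // Create a document object without an owner
  gtPDFDocument1 := TgtPDFDocument.Create(nil);
  try
    // Placeholder file name, as used elsewhere in this article
    gtPDFDocument1.LoadFromFile('sample_doc.pdf');
    if gtPDFDocument1.IsLoaded then
      WriteLn('Pages: ', gtPDFDocument1.PageCount)
    else
      WriteLn('The document could not be loaded.');
  finally
    // Release the document object even if loading raised an exception
    gtPDFDocument1.Free;
  end;
end.
```

The try..finally block is ordinary Delphi practice for objects created without an owner, not something specific to PDFtoolkit.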
I. Manipulation of PDF Documents
After a PDF document has been loaded, document contents and their properties can be read and modified
using the properties and methods of the TgtPDFDocument object.
In the next code snippet, we first specify the measurement unit that will be used when rendering elements on
a PDF page. Next, an HTML-formatted string is written on the last page. The number of the last page is
obtained from the TgtPDFDocument.PageCount property.
...
...
The formatted string is written at the center of the last page. To obtain the location of the center of the page,
we first call TgtPDFDocument.GetPageSize().
function GetPageSize(
  PageNo: Integer;
  MMUnit: TgtMeasurementUnit): TgtPageSize;
TgtMeasurementUnit
  = (muPixels, muPoints, muInches, muMM, muTwips);
TgtPageSize = record
  Width,
  Height: Double;
end;
This method returns a TgtPageSize record, whose fields TgtPageSize.Width and TgtPageSize.Height
provide the dimensions of the specified page. From this information, it is easy to calculate the location of the
center of the page.
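The calculation mentioned above is simply half the width and half the height. The sketch below shows it with a stand-in record that has the same fields as TgtPageSize, so that it compiles without PDFtoolkit installed; the type TPageSizeStandIn, the procedure GetPageCenter, and the page dimensions are all hypothetical.

```pascal
program PageCenterSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  // Stand-in with the same fields as the TgtPageSize record quoted above.
  // With PDFtoolkit, the record would come from GetPageSize() instead.
  TPageSizeStandIn = record
    Width,
    Height: Double;
  end;

// Computes the center point of a page from its dimensions.
procedure GetPageCenter(const Size: TPageSizeStandIn; var X, Y: Double);
begin
  X := Size.Width / 2;
  Y := Size.Height / 2;
end;

var
  Size: TPageSizeStandIn;
  CenterX, CenterY: Double;
begin
  // Hypothetical dimensions: a US Letter page measured in inches
  Size.Width := 8.5;
  Size.Height := 11.0;
  GetPageCenter(Size, CenterX, CenterY);
  WriteLn('Center X: ', CenterX:0:2);
  WriteLn('Center Y: ', CenterY:0:2);
end.
```

The result is expressed in whatever measurement unit was passed to GetPageSize(), so the same unit should be in effect when the coordinates are used for drawing.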
As you can see, PDFtoolkit provides an elegant interface that hides the complexities imposed by the format
specification.
II. Content Extraction
function GetPageElements(
  APageNo: Integer;
  ElementTypes: TgtElementTypes;
  MMUnit: TgtMeasurementUnit): TgtPDFPageElementList;
PDFtoolkit often offers more than one way of doing the same thing, each suited to a particular situation.
The TgtPDFDocument.SearchAll() method can perform a variety of text searches for a given search string.
function SearchAll(
  const SearchText: String;
  AOptions: TgtSearchTypes;
  SearchList: TStringList): Integer;
III. Transformation
...
var
  gtPDFDocument1: TgtPDFDocument;
  StringList1: TStringList;
begin
  try
    // Merge the documents
    gtPDFDocument1.MergeDocs(StringList1);
    // Save the merged document to file
    gtPDFDocument1.SaveToFile('merged_doc.pdf');
...
IV. Enhancement
Marking up text
Hyperlinking images and text
Adding bookmarks, stamps, and watermarks
Embedding files as attachments
In this code snippet, we see how to add bookmarks for all pages in a document.
...
var
  I: Integer;
  gtPDFDocument1: TgtPDFDocument;
  // Bookmark
  gtPDFOutline1: TgtPDFOutline;
  // Destination linked by a bookmark
  gtPDFDestination1: TgtPDFDestination;
  // Display style of a bookmark in bookmark panel
  gtBookmarkAttribute1: TgtBookmarkAttribute;
begin
  gtPDFDocument1 := TgtPDFDocument.Create(Nil);
  try
    gtPDFDocument1.LoadFromFile('sample_doc.pdf');
    if gtPDFDocument1.IsLoaded then
    begin
      // For each page in the document
      for I := 1 to gtPDFDocument1.PageCount do
      begin
        // Create a bookmark that links to the top-left
        // corner of the page in the current iteration
        gtPDFDestination1 :=
          TgtPDFDestination.Create(
            I,     // Number of the page
            dtXYZ, // Destination type (use x-y coordinates and zoom)
            0,     // X-coordinate of the destination
            0,     // Y-coordinate of the destination
            100);  // Zoom
        if I = 1 then
        begin
          // If it's the first page, then create a new bookmark
          gtPDFOutline1 := gtPDFDocument1.CreateNewBookmark(
            'Page #' + IntToStr(I), // Bookmark title text
            gtPDFDestination1,
            gtBookmarkAttribute1);
        end
        else
        begin
          // For other pages, add a bookmark next to the
          // previously created bookmark
          gtPDFOutline1 := gtPDFOutline1.AddNext(
            'Page #' + IntToStr(I), // Bookmark title text
            gtPDFDestination1,
            gtBookmarkAttribute1);
        end;
      end;
    end;
    // Save the modified document
    gtPDFDocument1.SaveToFile('modified_doc.pdf');
...
...
uses
  ...
  gtPDFCrypt,
  gtPDFDoc;
var
  gtPDFDocument1: TgtPDFDocument;
begin
  // Create a document object
  gtPDFDocument1 := TgtPDFDocument.Create(Nil);
  try
    // Load input document
    gtPDFDocument1.LoadFromFile('unencrypted_doc.pdf');
    if gtPDFDocument1.IsLoaded then
    begin
      // Modify the document's encryption settings with
      // the TgtPDFEncryption object returned by the
      // TgtPDFDocument.Encryption property
      with gtPDFDocument1.Encryption do
      begin
        Enabled := True;
        Level := el128bit; // 128-bit encryption level
        OwnerPassword := 'Owner';
        UserPassword := 'User';
        UserPermissions
          := [AllowAccessibility,
              AllowPrint,
              AllowHighResPrint];
      end;
    end;
    // Save the encrypted document to file
    gtPDFDocument1.SaveToFile('encrypted_doc.pdf');
...
This code snippet shows how to mark page numbers on all pages in a PDF document.
...
var
  gtPDFDocument1: TgtPDFDocument;
begin
  gtPDFDocument1 := TgtPDFDocument.Create(Nil);
  try
    gtPDFDocument1.LoadFromFile('sample_doc.pdf');
    if gtPDFDocument1.IsLoaded then
    begin
      gtPDFDocument1.MeasurementUnit := muPixels;
      // Write formatted string on all pages
      // at specified location
      gtPDFDocument1.TextOut(
        'Page <%PageNo%> of <%TotPage%>', // page number
        // x-coordinate, measured from the right edge of page 1
        // (assumes all pages share the same width)
        gtPDFDocument1.GetPageSize(1, muPixels).Width - 150,
        100); // y-coordinate
    end;
    // Save the modified document
    gtPDFDocument1.SaveToFile('numbered_pages_doc.pdf');
...
The text string is written by an overloaded TgtPDFDocument.TextOut() method. The string contains two
built-in placeholders for the current page number and the total page number. PDFtoolkit substitutes built-in
placeholders with their values at run time.
You can use placeholders with any TgtPDFDocument method that writes text to a document. You can create
your own placeholders and have them substituted at run time by writing a handler for the
TgtPDFDocument.OnCalcVariables event.
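Conceptually, the substitution described above amounts to replacing each placeholder with its current value before the text is drawn. The sketch below illustrates the idea with plain string replacement from SysUtils; it is not PDFtoolkit's own code, and the function ExpandPlaceholders is hypothetical. Only the two placeholder names come from the example above.

```pascal
program PlaceholderSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Replaces the two built-in placeholders named in the article with
// concrete values. Illustrative only: not how PDFtoolkit is implemented.
function ExpandPlaceholders(const Template: string;
  PageNo, TotalPages: Integer): string;
begin
  Result := StringReplace(Template, '<%PageNo%>', IntToStr(PageNo),
    [rfReplaceAll]);
  Result := StringReplace(Result, '<%TotPage%>', IntToStr(TotalPages),
    [rfReplaceAll]);
end;

var
  I: Integer;
begin
  // Expand the same template once per page of a three-page document
  for I := 1 to 3 do
    WriteLn(ExpandPlaceholders('Page <%PageNo%> of <%TotPage%>', I, 3));
end.
```

Because the template is expanded once per page, a single string can yield different text on every page, which is why one TextOut() call is enough in the page-numbering example.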
V. Forms Processing
PDFtoolkit can add, edit, fill, and flatten PDF form fields. Editing a PDF form field involves changing its
properties such as its appearance, position, or interactivity. Filling a PDF form field involves specifying a
particular value for the form field and saving the modified form field to the document. Flattening a form
field removes all interactivity from the form field while preserving its original appearance.
...
var
  gtPDFDocument1: TgtPDFDocument;
  // List box form field
  gtPDFFormListBox1: TgtPDFFormListBox;
  // Push button form field
  gtPDFFormPushButton1: TgtPDFFormPushButton;
  // Rectangles
  gtRect1: TgtRect;
  gtRect2: TgtRect;
begin
  gtPDFDocument1 := TgtPDFDocument.Create(Nil);
  try
    gtPDFDocument1.LoadFromFile('sample_doc.pdf');
    if gtPDFDocument1.IsLoaded then
    begin
      // Set document measurement unit
      gtPDFDocument1.MeasurementUnit := muInches;
VI. Viewing and Printing
PDFtoolkit’s viewer is a visual component that can be used to display PDF documents in a VCL forms
application. It does not require Adobe® Reader to be installed on the client machine. The viewer’s API
provides methods to implement navigation, zooming, and other toolbar-driven functionality.
...
  gtPDFDocument1: TgtPDFDocument;
  gtPDFViewer1: TgtPDFViewer;
  OpenDialog1: TOpenDialog;
  edFilePath: TEdit;
  edNumberOfPages: TEdit;
...
  try
    // Load the selected PDF document
    gtPDFDocument1.LoadFromFile(edFilePath.Text);
PDFtoolkit’s PDF printer is a non-visual component. It has methods and properties that allow a VCL
application to query available printers, select a printer, specify print settings, and print a specified set of
pages to the selected printer. The most attractive thing about the printer component is that it can print PDF
documents without requiring external components such as GhostScript or Adobe® Reader.
VII. Text Search
PDFtoolkit includes a visual component that provides interactive text search capabilities to VCL
forms applications. It needs to be used in conjunction with PDFtoolkit’s viewer component. The
functionality of the search panel is similar to the one found in Adobe Reader. See screenshot.
PDFtoolkit has several other components such as the PDFOutlineViewer, which can be used to display a
bookmark panel for a PDF document.
In summary, Gnostice PDFtoolkit is a component suite that has well-rounded capabilities in PDF processing.
What’s Next
The next version of PDFtoolkit is currently in beta. Gnostice PDFtoolkit v3.0 will use a whole new PDF
processing engine that is separate from the PDFtoolkit API logic. A key objective in writing the new PDF
processor was modularization of logic. The advantage of this approach has been a phenomenal increase in
speed, scalability, robustness, and scope for optimization. In the next part of this article, we will learn more
about this.