How to convert a PDF document to a preview image in PHP?
Last Updated: 22 Sep, 2021
Converting a PDF document into a set of images may not sound like much fun, but it has a few applications. Because the content of an image cannot be copied as easily as text, the conversion makes the document harder to copy from and adds a layer of protection against plagiarism. The images may also come in handy when you need some ready-made slides for a quick office presentation, or when you want to embed pages in your reports and blogs.
In this post, however, we will limit ourselves to a much smaller example: generating an image preview from a given PDF document. "Why previews?", you may ask. You may need them for a library management system, an online e-book store, or just for a weekend programming challenge. Where do you think you can use this concept in your project? Do let me know in the comments.
Implementing the complete conversion algorithm from scratch is not feasible, so we will rely on third-party tools to ease our task. The methods that I found appealing in this scenario are based on the following tools:
- Ghostscript: It is a command line utility, available for all three major platforms (Windows, Linux and Mac), that interprets PostScript and PDF files. You can read more about it on its official site.
- ImageMagick: It is a free and open-source software suite for displaying, converting, and editing raster image and vector image files. It is available for the majority of mainstream programming languages, including PHP. Here's the standard documentation for a quick overview.
Using Ghostscript
To use Ghostscript in your project, start with its installation. If you are on Windows, download the executable from its download page.
Linux users can install Ghostscript directly through their default package manager:
# RPM based distros, Fedora 26/27/28
$ sudo dnf install ghostscript
Verify the installation with this command:
$ gs --version
After installation, move to the directory containing the PDF file and run the following command.
$ gs -dSAFER -dBATCH -dNOPAUSE -sDEVICE=jpeg \
-dTextAlphaBits=4 -dGraphicsAlphaBits=4 \
-dFirstPage=1 -dLastPage=1 -r300 \
-sOutputFile=preview.jpg input.pdf
This will generate an image of the first page of the document. Let us look at what each option does:
- -dSAFER: restricts the file operations the interpreter is allowed to perform.
- -dBATCH, -dNOPAUSE: exit after processing instead of entering the interactive prompt, and do not pause after each page.
- -sDEVICE: sets the output device, which determines the image format (here, JPEG).
- -dTextAlphaBits, -dGraphicsAlphaBits: set the anti-aliasing for text and graphics in the resultant image. Allowed values are 1, 2 and 4.
- -r{NUM}: sets the resolution (in dpi) of the image.
- -dFirstPage, -dLastPage: set the first and the last page of the document that has to be rendered.
- -sOutputFile: sets the name of the output file.
- input.pdf: the actual PDF document that is used for conversion.
To run this command from PHP, we call the exec() function. For example:
php
<?php
// Run "ls -l" and collect its output lines and exit status.
exec ( "ls -l" , $output_str , $return_val );
foreach ( $output_str as $line ) {
    echo $line . "\n" ;
}
?>
On Linux, this example executes the ls command and prints the files and directories of the current directory to the console.
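Note that exec() only captures standard output, so error messages from the command can go unnoticed. As a minimal sketch, a small wrapper can merge stderr into the captured output and return the exit code alongside it. The name run_command() is hypothetical (it is not part of the original script), and the "2>&1" redirection assumes a POSIX-style shell.

```php
<?php
// Hypothetical helper (not part of the original script): runs a shell
// command, merges stderr into stdout, and returns the output and exit code.
function run_command(string $command): array
{
    $lines = [];
    $exit_code = 0;
    // "2>&1" asks the shell to send error messages to standard output,
    // so they end up in $lines as well.
    exec($command . ' 2>&1', $lines, $exit_code);
    return ['lines' => $lines, 'exit_code' => $exit_code];
}

// Usage: run a harmless command and print what came back.
$result = run_command('echo ' . escapeshellarg('hello from the shell'));
foreach ($result['lines'] as $line) {
    echo $line . "\n";
}
echo "Exit code: " . $result['exit_code'] . "\n";
```

An exit code of 0 conventionally means success, so the caller can check it before using the output.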
We can use the same approach to execute the Ghostscript command from our PHP code. Here's how I have done it:
php
<?php
// Returns true when the file starts with a PDF header such as "%PDF-1.7".
function is_pdf ( $file ) {
    if ( ! is_file( $file ) ) {
        return false;
    }
    $file_content = file_get_contents ( $file );
    return $file_content !== false
        && preg_match( "/^%PDF-[0-9]\.[0-9]+/" , $file_content ) === 1;
}

// Renders the first page of $file to preview.jpg using Ghostscript.
function create_preview ( $file ) {
    $output_format = "jpeg" ;
    $antialiasing = "4" ;
    $preview_page = "1" ;
    $resolution = "300" ;
    $output_file = "preview.jpg" ;
    $exec_command = "gs -dSAFER -dBATCH -dNOPAUSE -sDEVICE=" . $output_format . " " ;
    $exec_command .= "-dTextAlphaBits=" . $antialiasing . " -dGraphicsAlphaBits=" . $antialiasing . " " ;
    $exec_command .= "-dFirstPage=" . $preview_page . " -dLastPage=" . $preview_page . " " ;
    $exec_command .= "-r" . $resolution . " " ;
    // escapeshellarg() quotes the file names so that spaces or quote
    // characters in them cannot break (or inject into) the command.
    $exec_command .= "-sOutputFile=" . escapeshellarg( $output_file ) . " " . escapeshellarg( $file ) ;
    echo "Executing command...\n" ;
    exec ( $exec_command , $command_output , $return_val );
    foreach ( $command_output as $line ) {
        echo $line . "\n" ;
    }
    // An exit status of 0 means Ghostscript finished without errors.
    if ( $return_val === 0 ) {
        echo "Preview created successfully!!\n" ;
    }
    else {
        echo "Error while creating the preview.\n" ;
    }
}

function __main__() {
    global $argv ;
    if ( ! isset( $argv [1] ) ) {
        echo "Usage: php pdf_preview.php <input.pdf>\n" ;
        return;
    }
    $input_file = $argv [1];
    if ( is_pdf( $input_file ) ) {
        create_preview( $input_file );
    }
    else {
        echo "The input file " . $input_file . " is not a valid PDF document.\n" ;
    }
}

__main__();
?>
Execution starts from __main__(), which takes the PDF file name from the command line. It checks whether the input file is a valid PDF document. If it is, the script runs the Ghostscript command on the input file.
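The validity check only needs the first line of the file, yet is_pdf() loads the whole document into memory. As a rough sketch, the same header test can be done by reading just the first eight bytes. The function name has_pdf_header() is hypothetical, and, like the original check, it only confirms the header, not that the rest of the file is a well-formed PDF.

```php
<?php
// Hypothetical variant of is_pdf(): reads only the first few bytes
// instead of loading the whole document into memory.
function has_pdf_header(string $path): bool
{
    if (!is_file($path) || !is_readable($path)) {
        return false;
    }
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        return false;
    }
    // A header such as "%PDF-1.7" is 8 bytes long.
    $header = fread($handle, 8);
    fclose($handle);
    return is_string($header) && preg_match('/^%PDF-\d\.\d/', $header) === 1;
}

// Usage: write a tiny stand-in file and check it.
$sample = tempnam(sys_get_temp_dir(), 'pdf');
file_put_contents($sample, "%PDF-1.7\n");
var_dump(has_pdf_header($sample)); // bool(true)
unlink($sample);
```

This keeps memory use constant regardless of the size of the PDF, which matters when previews are generated for large uploads.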
Output:
$ php pdf_preview.php input.pdf
Executing command...
GPL Ghostscript 9.22 (2017-10-04)
Copyright (C) 2017 Artifex Software, Inc. All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
Processing pages 1 through 1.
Page 1
Preview created successfully!!
Using ImageMagick
As usual, we will start by installing the ImageMagick binaries on the system. Keep Ghostscript installed as well, because ImageMagick relies on it to read PDF files. Start with the dependencies:
$ sudo dnf install gcc php-devel php-pear
After that, install ImageMagick:
$ sudo dnf install ImageMagick ImageMagick-devel
Then install the PHP wrapper classes:
$ sudo pecl install imagick
$ sudo bash -c "echo 'extension=imagick.so' > /etc/php.d/imagick.ini"
If you are planning to use it on a LAMP stack, consider restarting the Apache web server:
$ sudo service httpd restart
Now that our system is ready, we can use ImageMagick in our example project. The basic functionality of the script remains the same. All you have to do is replace the create_preview() function with the following code.
php
function create_preview ( $file ) {
    $output_format = "jpeg" ;
    $preview_page = 1 ;
    $resolution = 300 ;
    $output_file = "imagick_preview.jpg" ;
    echo "Fetching preview...\n" ;
    $img_data = new Imagick();
    // The resolution has to be set before the PDF is read,
    // otherwise it does not affect how the page is rasterized.
    $img_data ->setResolution( $resolution , $resolution );
    // Page indices are zero-based, hence the "- 1".
    $img_data ->readImage( $file . "[" . ( $preview_page - 1) . "]" );
    $img_data ->setImageFormat( $output_format );
    // Write the rendered page to disk and release the image resources.
    $img_data ->writeImage( $output_file );
    $img_data ->clear();
}
The function creates an Imagick instance and sets parameters such as the resolution and the output format. The PDF page you want to render is given as a zero-based index in square brackets after the file name. For example:
First page: input.pdf[0]
Second page: input.pdf[1]
.
.
.
Nth page: input.pdf[N - 1]
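The mapping from a page number to this specifier can be wrapped in a small helper, sketched below. The name pdf_page_spec() is hypothetical and only builds the string; it does not check that the page actually exists in the document.

```php
<?php
// Hypothetical helper: converts a 1-based page number into the
// zero-based "file[index]" specifier described above.
function pdf_page_spec(string $file, int $page): string
{
    if ($page < 1) {
        throw new InvalidArgumentException('Page numbers start at 1.');
    }
    return $file . '[' . ($page - 1) . ']';
}

echo pdf_page_spec('input.pdf', 1) . "\n"; // input.pdf[0]
echo pdf_page_spec('input.pdf', 5) . "\n"; // input.pdf[4]
```

The resulting string can be passed directly to readImage() in place of the manual concatenation.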
Output:
$ php pdf_preview.php input.pdf
Fetching preview...
Some of you might be wondering why you would use this method over the previous one. I found that the ImageMagick version fits more naturally into PHP code: shell commands assembled from strings are harder to read and to debug. However, with the same settings, Ghostscript produced smaller image files than ImageMagick did. I am not sure whether that comes from differences in optimization, but the gap is small enough not to matter much. Choosing one over the other is largely a matter of taste.
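If you want to check the size difference on your own documents, a short sketch like the following prints the size of each preview. The helper name preview_sizes() is hypothetical; the two file names are the ones written by the scripts above and are assumed to be in the current directory.

```php
<?php
// Hypothetical helper: reports the size of each file in kilobytes,
// or null when the file does not exist.
function preview_sizes(array $paths): array
{
    $sizes = [];
    foreach ($paths as $path) {
        // Drop any cached stat data so a freshly written file is measured.
        clearstatcache(true, $path);
        $bytes = is_file($path) ? filesize($path) : false;
        $sizes[$path] = ($bytes === false) ? null : round($bytes / 1024, 1);
    }
    return $sizes;
}

foreach (preview_sizes(['preview.jpg', 'imagick_preview.jpg']) as $path => $kb) {
    echo $path . ': ' . ($kb === null ? 'not found' : $kb . ' KB') . "\n";
}
```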
So this is how you create a preview for a given PDF document. I hope you have learned something new from this post. Which method would you prefer? Have any suggestions for further improvements? Feel free to mention them in the comments.