How to Extract Text from Images with Python?

Last Updated: 26 Dec, 2020

OCR (Optical Character Recognition) is the electronic conversion of digital images into machine-encoded text. The digital image is generally one that contains regions resembling the characters of a language. OCR is a field of research in pattern recognition, artificial intelligence and computer vision, since modern OCR engines are trained on sample data that is run through a machine learning algorithm. This technique is generally used in settings where the images are known to contain text.

In this article, we will learn how to extract text from images using the Python programming language. To give our Python program character recognition capabilities, we will use the pytesseract OCR library. The library can be installed into our Python environment by running the following command in the operating system's command interpreter:

pip install pytesseract

The examples also import the Image module from PIL, which is provided by the Pillow package (pip install Pillow).

pytesseract is a wrapper around the Tesseract OCR engine, so the Tesseract binary must also be installed for the library to work. On Windows, this means installing tesseract.exe. During its installation, we are prompted to specify an install path. Remember this path, as it is used later in the code. For most installations, the path is C:\Program Files (x86)\Tesseract-OCR\tesseract.exe or C:\Program Files\Tesseract-OCR\tesseract.exe, depending on the installer.

Explanation: First, we imported the Image module from the PIL library (to open an image) and then the pytesseract module from the pytesseract library (for text extraction). Then we defined the path_to_tesseract variable, which contains the path to the executable binary (tesseract.exe) installed in the prerequisite step (this path depends on where the binary was installed).
Then we defined the image_path variable, which contains the path to the image file. This path is passed to the Image.open() function to create an image object from our image. After this, we assigned the path stored in path_to_tesseract to pytesseract.tesseract_cmd (the library uses this to find the executable and run the extraction). We then passed the image object (img) to the image_to_string() function, which takes an image object as its argument and returns the text recognized in it. Finally, we displayed the extracted text using text[:-1], which drops the additional form-feed character (^L, "\x0c") that Tesseract appends to its output by default. Note that text[:-1] assumes this character is always present; if it is not, the slice removes the last real character instead.

Example 1:

Image for demonstration: an image of white text on a black background.

Below is the full implementation:

```python
from PIL import Image
from pytesseract import pytesseract

# Defining paths to tesseract.exe
# and the image we would be using
path_to_tesseract = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
image_path = r"csv\sample_text.png"

# Opening the image & storing it in an image object
img = Image.open(image_path)

# Providing the tesseract executable
# location to pytesseract library
pytesseract.tesseract_cmd = path_to_tesseract

# Passing the image object to image_to_string() function
# This function will extract the text from the image
text = pytesseract.image_to_string(img)

# Displaying the extracted text (without the trailing form feed)
print(text[:-1])
```

Output:

now children state should after above same long made such point run take call together few being would walk give

Example 2:

Image for demonstration: [image]

Code:

```python
from PIL import Image
from pytesseract import pytesseract

# Defining paths to tesseract.exe
# and the image we would be using
path_to_tesseract = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
image_path = r"csv\d.jpg"

# Opening the image & storing it in an image object
img = Image.open(image_path)

# Providing the tesseract
# executable location to pytesseract library
pytesseract.tesseract_cmd = path_to_tesseract

# Passing the image object to
# image_to_string() function
# This function will
# extract the text from the image
text = pytesseract.image_to_string(img)

# Displaying the extracted text (without the trailing form feed)
print(text[:-1])
```

Output:

Geeksforgeeks

Article by vasudev4.
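The examples hard-code the Windows install path of tesseract.exe. As a sketch of a more portable setup, the hypothetical helper find_tesseract below (not part of pytesseract) first looks for a "tesseract" executable on the PATH using the standard library's shutil.which, and only then falls back to a list of known install locations. The default fallback path is an assumption copied from the examples; adjust it to wherever the binary was actually installed.

```python
import os
import shutil
from typing import Iterable, Optional

# Assumed fallback location: the Windows install path used in the examples.
DEFAULT_CANDIDATES = (r"C:\Program Files\Tesseract-OCR\tesseract.exe",)


def find_tesseract(candidates: Iterable[str] = DEFAULT_CANDIDATES) -> Optional[str]:
    """Return a usable path to the Tesseract executable, or None if none is found.

    find_tesseract is a hypothetical helper, not part of pytesseract.
    """
    # First choice: an executable named "tesseract" on the PATH
    # (on Windows, shutil.which also tries extensions such as ".exe").
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path

    # Fallback: check each known install location in order.
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate

    # Tesseract does not appear to be installed.
    return None


if __name__ == "__main__":
    print(find_tesseract() or "Tesseract not found")
```

If the helper returns a path, it can be assigned to pytesseract.tesseract_cmd in place of the hard-coded path_to_tesseract; if it returns None, the Tesseract engine still needs to be installed before image_to_string() will work.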
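The examples strip the trailing form feed with text[:-1]. A slightly more defensive variant, sketched here as the hypothetical helper clean_ocr_text (not part of pytesseract), removes that character only when it is actually present and then trims any trailing newlines. The sample strings below are made up for illustration; in practice the argument would be the string returned by pytesseract.image_to_string().

```python
# Form feed (^L), which Tesseract appends to its text output by default.
PAGE_SEPARATOR = "\x0c"


def clean_ocr_text(text: str) -> str:
    """Strip the trailing page separator and whitespace from OCR output.

    clean_ocr_text is a hypothetical helper, not part of pytesseract.
    """
    # Remove the separator only if it is present, so real text is never cut off.
    if text.endswith(PAGE_SEPARATOR):
        text = text[: -len(PAGE_SEPARATOR)]
    # Trim the newline(s) and spaces that usually precede the separator.
    return text.rstrip()


if __name__ == "__main__":
    # Hypothetical raw strings, for illustration only.
    print(repr(clean_ocr_text("Geeksforgeeks\n\x0c")))  # 'Geeksforgeeks'
    print(repr(clean_ocr_text("Geeksforgeeks")))        # 'Geeksforgeeks'
    print(repr("Geeksforgeeks"[:-1]))                   # 'Geeksforgeek'
```

The last line shows the pitfall of the plain slice: when the separator is missing, text[:-1] drops a real character, while the helper leaves the text intact.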