Handwritten Digits Recognition
A MINI-PROJECT REPORT
ON
“Handwritten Digits Recognition”
Submitted by
210818104007 T. Anusha
210818104015 L. Jayalakshmi
KINGS ENGINEERING COLLEGE
Irungattukottai, Sriperumbudur, Chennai – 602117
CERTIFICATE
210818104007 T. Anusha
210818104015 L. Jayalakshmi
Head of the Department
ACKNOWLEDGEMENT
We thank our God for all his blessings showered on us and also for giving
us good knowledge and strength in enabling us to finish our project. Our deep
gratitude goes to our founder (Late) D. SELVARAJ M.A., M.L.A. for his
patronage in the completion of our project.
ABSTRACT
Nowadays, more and more people use images to represent and transmit
information, and extracting important information from images has become
common. Image recognition is an important research area because of its wide
range of applications. In the relatively young field of computer pattern
recognition, one of the challenging tasks is the accurate automated recognition
of human handwriting. Optical Character Recognition (OCR) is a subfield of
Image Processing concerned with extracting text from images or scanned
documents. In this project, we focus on recognizing the handwritten digits
available in the MNIST database. The challenge is to use basic Image
Correlation techniques, also known as Matrix Matching, to maximize the accuracy
of the handwritten digit recognizer without resorting to sophisticated
techniques like machine learning.
TABLE OF CONTENTS
ABSTRACT
1. INTRODUCTION
1.1 OBJECTIVES
1.2 SCOPE
2. LITERATURE SURVEY
3.4 Tools
3.4.1 Octave
4. METHODOLOGY
5. DATA
REFERENCES
LIST OF FIGURES
LIST OF ABBREVIATION
CHAPTER 1
INTRODUCTION
It is easy for the human brain to process and analyze images. When the eye sees an
image, the brain can easily segment it and recognize its different elements. The brain goes
through this process automatically. It involves not only analyzing the image, but also
comparing its characteristics with what the brain already knows in order to recognize
these elements. Image Processing is the field of computer science that tries to do the same
for machines. It is concerned with analyzing images so as to extract useful information
from them. An image is converted into a digital form readable by computers, and algorithms
are applied to it to produce either a better-quality image or a set of characteristics from
which important information can be extracted.
Image processing is applied in many areas today, and a great deal of software has been
built around it. Self-driving cars can detect other cars and pedestrians to avoid accidents.
Some social media applications, like Facebook, can perform facial recognition thanks to
this technique. Other software uses it to recognize the characters in an image, which is the
concept of Optical Character Recognition (OCR) that we explore in this project.
OCR is one of the narrow fields of image processing. It is about reading an image
containing one or more characters, or a scanned text of typed or handwritten characters,
and recognizing those characters. A lot of research has been done in this field to find
techniques with high accuracy. The most widely used algorithms with proven high
performance are machine learning algorithms like Neural Networks and Support Vector
Machines.
One of the main applications of OCR is recognizing handwritten characters. In this
project, we focus on building a mechanism that recognizes handwritten digits. We read
images of handwritten digits extracted from the MNIST database and try to recognize
which digit each image represents. For that we use basic Image Correlation techniques,
also referred to as Matrix Matching. This approach is based on matrix manipulation, as it
reads each image as a matrix in which every element is a pixel.
1.1 OBJECTIVES
• To provide an easy user interface for inputting the object image.
• The user should be able to upload the image.
• The system should be able to load the uploaded image.
• The system should be able to preprocess the given input to suppress the background.
• The system should be able to detect the digit regions present in the image.
• The system should retrieve the digits present in the image and display them to the user.
1.2 SCOPE
• Improving the human-computer interface for people with little computer experience
by providing various computing services on handwritten inputs
• Can be implemented on smartphones and tablets as a virtual keyboard
• The system can create a paperless environment by digitizing handwritten characters
CHAPTER 2
LITERATURE SURVEY
The last frontiers of handwriting recognition are considered to have started in the
last decade of the second millennium. This paper summarizes the nature of the problem of
handwriting recognition, the state of the art of handwriting recognition at the turn of the
new millennium, the results of CENPARMI researchers in automatic recognition of
handwritten digits, touching numerals, cursive scripts, and dates formed by a mixture of
the former 3 categories. Wherever possible, comparable results have been tabulated
according to techniques used, databases, and performance. Aspects related to human
generation and perception of handwriting are discussed. The extraction and usage of human
knowledge, and their incorporation into handwriting recognition systems are presented.
Challenges, aims, trends, efforts and possible rewards, and suggestions for future
investigations are also included.
classifiers are comparable to or higher than those of the nearest neighbor (1-NN) rule and
regularized discriminant analysis (RDA). It is shown that neural classifiers are more
susceptible to small sample size than MQDF, although they yield higher accuracies on large
sample size. As a neural classifier, the polynomial classifier (PC) gives the highest accuracy
and performs best in ambiguity rejection. On the other hand, MQDF is superior in outlier
rejection even though it is not trained with outlier data. The results indicate that pattern
classifiers have complementary advantages and they should be appropriately combined to
achieve higher performance.
At present, the deep neural network is the hottest topic in the domain of machine
learning and can accomplish a deep hierarchical representation of the input data. Thanks
to their deep architecture, large convolutional neural networks can reach very small test
error rates, below 0.4%, on the MNIST database. This work shows that high accuracy can
also be achieved using a reduced, shallow convolutional neural network without adding
distortions to the digits. The main contribution of the paper is to show how a simplified
convolutional neural network can obtain a test error rate of 0.71% on the MNIST
handwritten digit benchmark. This reduces the computational resources needed to model
a convolutional neural network.
CHAPTER 3
There are two ways to provide input to the system. The user can either upload an
image of the digit to be recognized or supply data from the MNIST dataset. The input
images are pre-processed. The digits are then recognized using the different classifiers,
and the accuracy of each classifier is compared. The recognized digits are displayed along
with the accuracy.
3.2 Optical Character Recognition (OCR)
It is easy for the naked eye to recognize a character when spotted in any document;
however, computers cannot identify the characters in an image or scanned document
without dedicated algorithms. A lot of research has been done to make this possible,
which resulted in the development of several such algorithms. One of the fields that
specialize in character recognition within Image Processing is Optical Character
Recognition (OCR). In Optical Character Recognition, a scanned document or an image is
read and segmented in order to decipher the characters it contains. The images are first
preprocessed to remove noise and unify colors and shades. The characters are then
segmented and recognized one by one. The final output is a file of encoded text containing
these characters, which can be easily read by computers.
Optical Character Recognition dates back to the early 1900s, when it was developed in
the United States as part of reading aids for the blind. In 1914, Emanuel Goldberg built a
machine able to convert characters into “standard telegraph code”. In the 1950s, David
Shepard, who was at that time an engineer at the Department of Defense, developed a
machine that he named Gismo, which was able to read characters and translate them into
machine language. In 1974, Ray Kurzweil decided to develop a machine that would read
text for blind and visually impaired people under his company, Kurzweil Computer
Products. In 1996, the United States Postal Service developed a mechanism, HWAI, which
recognizes handwritten mail addresses. Today, many programs use OCR in a wide range
of applications.
A lot of research has been done in the field of OCR, and is still being done. It has
resulted in several algorithms that enable computers to recognize characters from images
or scanned texts. Many of these techniques have attained very high efficiency and a low
error rate. However, these algorithms are still being investigated and improved for better
performance.
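The preprocessing and segmentation steps described in this section can be sketched in a few lines. The sketch below is only an illustration, not the implementation used in this project: it uses Python with plain lists instead of Octave, the function names `binarize` and `segment_columns` and the threshold of 128 are assumptions, and it assumes bright ink on a dark background with characters separated by at least one empty column.

```python
def binarize(image, threshold=128):
    """Turn a grayscale image (rows of 0-255 values) into a 0/1 matrix.

    Assumption: ink is bright, so pixels at or above the threshold become 1.
    """
    return [[1 if pixel >= threshold else 0 for pixel in row] for row in image]


def segment_columns(binary):
    """Return (start, end) column ranges, end exclusive, that contain ink.

    A character region is a maximal run of columns with at least one ink pixel.
    """
    width = len(binary[0])
    has_ink = [any(row[c] for row in binary) for c in range(width)]
    regions = []
    start = None
    for c, ink in enumerate(has_ink):
        if ink and start is None:
            start = c                      # a new character begins
        elif not ink and start is not None:
            regions.append((start, c))     # the character ended at column c
            start = None
    if start is not None:
        regions.append((start, width))     # character touches the right edge
    return regions


# Tiny hypothetical image: a one-column stroke, a gap, then a two-column shape.
image = [
    [0, 200, 0, 0, 255, 255],
    [0, 180, 0, 0,   0, 240],
    [0, 220, 0, 0, 250, 255],
]
regions = segment_columns(binarize(image))
print(regions)  # [(1, 2), (4, 6)]
```

Each region could then be cropped and passed to a recognizer one character at a time. Touching or overlapping characters would need a more careful segmentation rule than empty columns.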
3.3.1 Machine Learning
Machine learning is a field concerned with making programs learn from data how to
behave in different situations. One of its applications is Optical Character Recognition.
Artificial Neural Networks (ANNs) are used for OCR and have shown a very high
accuracy rate. In this case, the ANN would “recognize a character based on its topological
features such as shape, symmetry, closed or open areas, and number of pixels”. The high
accuracy of this kind of algorithm is mainly due to its ability to learn from the training
set, which would contain characters with similar features.
3.3.3 Support Vector Machine
The Support Vector Machine (SVM) is another machine learning algorithm. SVMs are
known as high-performance pattern classifiers. While Neural Networks aim at minimizing
the training error, SVMs aim to minimize the “upper bound of the generalization error”.
SVMs are used for both classification and regression analysis.
This kind of classifier has been used to recognize very complex characters, such as
those of the Khmer language, and has shown very high performance.
problems arise when having characters of different sizes, or when one of them is rotated by
a certain angle.
3.4 Tools
This project’s main objective is to read images containing handwritten digits and
identify those digits using basic image correlation techniques. The images are represented
and read as matrices, in which every element is a pixel. The image correlation technique
compares these matrices with one another in order to find the match that represents the
digit we are trying to identify. Because the project relies mainly on matrices and heavy
numerical computation, it is important to choose tools that provide a suitable environment
for performing these computations.
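To make the idea of comparing matrices concrete, here is a minimal sketch of matrix matching. It is an illustration under stated assumptions rather than the project's actual code: it is written in Python instead of Octave, the images are already binary (1 for ink, 0 for background) and all the same size, and the names `overlap` and `recognize`, the scoring rule (a raw count of shared ink pixels), and the 3x3 glyphs are hypothetical.

```python
def overlap(a, b):
    """Count the positions where both binary images have an ink pixel."""
    return sum(pa & pb
               for row_a, row_b in zip(a, b)
               for pa, pb in zip(row_a, row_b))


def recognize(test_image, references):
    """Return (label, score) of the reference image with the largest overlap.

    `references` maps each digit label to a list of binary reference images.
    Ties are resolved in favour of the first reference encountered.
    """
    best_label, best_score = None, -1
    for label, images in references.items():
        for ref in images:
            score = overlap(test_image, ref)
            if score > best_score:
                best_label, best_score = label, score
    return best_label, best_score


# Hypothetical 3x3 glyphs standing in for full-size digit images.
ONE = [[0, 1, 0],
       [0, 1, 0],
       [0, 1, 0]]
SEVEN = [[1, 1, 1],
         [0, 0, 1],
         [0, 0, 1]]
references = {1: [ONE], 7: [SEVEN]}

test_image = [[0, 1, 0],
              [0, 1, 0],
              [0, 1, 1]]
print(recognize(test_image, references))  # (1, 3)
```

A raw overlap count tends to favour references that contain a lot of ink, so a practical scoring rule would probably also penalize pixels that appear in only one of the two images.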
Figure: 3.4 Work Flow Diagram
3.4.1 Octave
Octave is free and open-source software built around a high-level programming
language. Its language is largely compatible with MATLAB and offers much of the same
functionality. It offers a simple and suitable interface for performing mathematical
computations, and it provides tools for solving common problems such as those of linear
algebra. It is efficient in its use of time and memory for these operations. It is also easy
to use with matrices, as it provides many functions and operations that make them less
costly to manipulate. In this project, we deal with images as matrices in which each
element represents a pixel, so we need a tool that makes our computations easier and more
efficient in terms of time and memory. Both MATLAB and Octave are easy to learn and
work with, and both provide a suitable environment for this kind of project. We opted
for Octave because it is free and open source.
3.4.2 MNIST Database
The MNIST database, which stands for the Modified National Institute of Standards
and Technology database, is a large dataset containing 70,000 handwritten digits
(60,000 training images and 10,000 test images). It was created by mixing different sets
from the original National Institute of Standards and Technology (NIST) data, so as to have
a training set containing several types and shapes of handwritten digits. The NIST data
was divided into digits written by high school students and digits written by Census
Bureau workers.
The MNIST dataset has been the target of a great deal of research on recognizing
handwritten digits. This has enabled the development and improvement of many different
algorithms with very high performance, such as machine learning classifiers. To implement
our recognizer and test its performance, we need a suitable dataset containing a large
number of handwritten digits. The dataset should let us discover the challenges and
limitations of the image correlation technique, push us to look for ways and rules to
enhance it, and allow us to assess its accuracy. We opted for MNIST for testing our
program because it has proved highly reliable and important in the field.
The MNIST database is also open for public use at no charge, and it has been used in
the development of programs with a similar aim. It is convenient for our project and saves
time, since we can use it directly as a test set without having to build one ourselves.
Since all the tools to be used in this project are free of charge and easy to use, we
can conclude that this project is feasible in terms of financial resources as well as effort
and time.
CHAPTER 4
METHODOLOGY
4.1 Getting Familiar with the Tools
The first step in this project was getting familiar with the tools used, i.e., Octave
and the MNIST dataset. After setting up the Octave environment and downloading the
dataset, we started experimenting with both in order to learn how to use them easily
later on. Since all the programming is done mainly in Octave, we had to install it along
with its Graphical User Interface and learn a little about its functions and how to use
them. Octave is free software that makes it easy to work with matrices and vectors and is
efficient in performing calculations on them. We learned how to use it and looked for the
main functions we would need in the implementation of the project. For that, we used
some random images of digits to see how they can be read and modified and how to apply
computations to them.
Moreover, we had to investigate the format of the MNIST dataset and get familiar with
its representation. The MNIST dataset, which was used to create our test set, contains
thousands of handwritten digits represented as matrices. It has been used in the
development of several programs and projects with the same aim as ours. After
downloading the file that contains the handwritten digits, we loaded it into Octave in
order to visualize the images and figure out how to use and manipulate them.
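As an illustration of what investigating the dataset format can involve, the sketch below parses the IDX binary layout in which MNIST is originally distributed. This is an assumption: the report does not state which file format was loaded into Octave, and the sketch uses Python rather than Octave. In the IDX layout, an image file starts with four big-endian 32-bit integers (magic number 2051, image count, rows, columns) followed by one unsigned byte per pixel. A label file starts with two such integers (magic number 2049, label count) followed by one byte per label. The function names are hypothetical.

```python
import struct


def parse_idx_images(data):
    """Parse the bytes of an IDX image file into a list of pixel matrices."""
    magic, count, rows, cols = struct.unpack(">IIII", data[:16])
    if magic != 2051:
        raise ValueError("not an IDX image file (magic %d)" % magic)
    images = []
    offset = 16
    size = rows * cols
    for _ in range(count):
        pixels = data[offset:offset + size]
        # Slice the flat byte string into `rows` lists of `cols` integers.
        images.append([list(pixels[r * cols:(r + 1) * cols])
                       for r in range(rows)])
        offset += size
    return images


def parse_idx_labels(data):
    """Parse the bytes of an IDX label file into a list of digit labels."""
    magic, count = struct.unpack(">II", data[:8])
    if magic != 2049:
        raise ValueError("not an IDX label file (magic %d)" % magic)
    return list(data[8:8 + count])


# Synthetic in-memory example: two 2x2 "images" and their labels.
image_bytes = struct.pack(">IIII", 2051, 2, 2, 2) + bytes([0, 255, 128, 0,
                                                           1, 2, 3, 4])
label_bytes = struct.pack(">II", 2049, 2) + bytes([7, 3])

print(parse_idx_images(image_bytes))  # [[[0, 255], [128, 0]], [[1, 2], [3, 4]]]
print(parse_idx_labels(label_bytes))  # [7, 3]
```

With a real download, the bytes would come from reading the decompressed image and label files in binary mode, and each image would be a 28x28 matrix.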
CHAPTER 5
DATA
Data
It is necessary to know the kind of data we are using before we start the design and
the implementation of the program. That is why we examined its format to understand
how it is represented before creating the reference and the test sets. Furthermore, the
data did not contain noise or any other major problems, so it was used without
preprocessing.
5.3. Test Set
The program needs to be tested against images of handwritten digits so that we can
assess its performance and calculate its success rate. That is why it is necessary to
create a test set. The test set is a sample of images containing handwritten digits, which
are compared to the images in the reference set in order to identify them. This set was
formed using the file from the MNIST database. The original file contained 60,000 images
representing different digits, which made it difficult to look up each digit by its label
when testing the program. To make each digit easier to access, we decided to store a
number of images of each digit in a separate file. We stored 20 images of each digit, so
the resulting test set consists of ten files, each of which represents one digit and
contains 20 images of it. These images were extracted from the initial file by reading
them and their labels using Octave. To make the matrices/images easier to manipulate, we
also modified the elements of all the matrices in the test set. The black pixels were
originally represented as zeros, so they were left unchanged. The white pixels each had a
different non-zero value, so we turned them all into ones.
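The construction of the test set described above can be sketched as follows. This is a hedged illustration in Python rather than the Octave code used in the project. The names `binarize` and `build_test_set` are hypothetical, a dictionary with ten entries stands in for the ten files, and taking the first images found for each digit is an assumption, since the report does not say which 20 images were selected.

```python
def binarize(image):
    """Keep zeros as they are and turn every non-zero pixel into 1."""
    return [[1 if pixel != 0 else 0 for pixel in row] for row in image]


def build_test_set(images, labels, per_digit=20):
    """Group binarized images by digit, keeping at most `per_digit` of each.

    Returns a dict mapping each digit 0-9 to its list of images, which plays
    the role of the ten separate files described in the text.
    """
    groups = {digit: [] for digit in range(10)}
    for image, label in zip(images, labels):
        if len(groups[label]) < per_digit:
            groups[label].append(binarize(image))
    return groups


# Synthetic stand-in for the dataset: 30 tiny images, three per digit.
images = [[[0, i + 1], [i + 1, 0]] for i in range(30)]
labels = [i % 10 for i in range(30)]

test_set = build_test_set(images, labels, per_digit=2)
print({digit: len(group) for digit, group in test_set.items()})
print(test_set[0][0])  # [[0, 1], [1, 0]]
```

With the real data, `per_digit=20` would give the 200-image test set described in this section.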
CHAPTER 6
CONCLUSION AND FUTURE WORK
Conclusion
Optical Character Recognition is a very broad field concerned with turning an
image or a scanned document containing a set of characters into encoded text that can
be read by machines. In this project, we attempted to build a recognizer for handwritten
digits using the MNIST dataset. The challenge was to come up with basic image correlation
techniques, instead of sophisticated algorithms, and see how accurate we could make this
mechanism. We tried several versions and kept improving each one in order to reach a
higher accuracy. The last version reached an accuracy of 57%. Unfortunately, we could not
compare the performance of our mechanism with others that have already been designed or
implemented, because we did not find any academic paper that tackles this method. The
performance we reached is far below that of machine learning, which reaches an accuracy
of 99.3%; however, it could be further improved. The goal of this project was to explore
the field of OCR and try to come up with techniques that could be used without going into
deep computations. Even if the final result is not very reliable, its accuracy is still well
above the 10% expected from random guessing among ten digits.
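For reference, an accuracy figure of this kind is the fraction of test images whose predicted digit equals the true label, and the random baseline for ten equally likely digits is 1/10. A minimal sketch in Python follows. The function name and the label lists are hypothetical and are not results from this project.

```python
def accuracy(predicted, actual):
    """Return the fraction of predictions that match the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical labels for ten test images, with two mistakes.
actual = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
predicted = [0, 1, 2, 3, 9, 5, 6, 1, 8, 9]

random_baseline = 1 / 10  # uniform guessing among ten digits
print(accuracy(predicted, actual))  # 0.8
print(random_baseline)              # 0.1
```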
Future work
The next step would be to take a closer look at the results of all the versions in
order to find new rules. Extracting and implementing these rules should enhance the
performance of each version. It would also be worthwhile to modify both the reference set
and the rules to make the program more general, so that it can identify typed as well as
handwritten digits. Furthermore, we could make good use of the matrices that record the
first maximum overlap of each test image with the reference images, along with the number
of pixels left out from both. These matrices could be fed to clustering algorithms to build
a program able to recognize handwritten digits with much higher accuracy. Last but not
least, we considered using linear or higher-order regression in the versions we have
developed in order to create more rules. Regression lends itself to binary classification
and is not well suited to choosing one digit out of ten, so it could instead be used to
decide which candidate is more suitable, the first maximum or the second maximum. This
would let us generate more rules and thus reach a higher accuracy.
REFERENCES
[1] C.Y. Suen, J. Kim, K. Kim, Q. Xu, L. Lam, Handwriting recognition: the last frontiers,
Proc. 15th ICPR, Barcelona, 2000, Vol. 4, pp. 1-10.
[2] C.-L. Liu, M. Koga, H. Sako, H. Fujisawa, Aspect ratio adaptive normalization for
handwritten character recognition, Advances in Multimodal Interfaces, ICMI 2000,
LNCS 1948, Springer, 2000, pp. 418-425.
[3] Vladimir Golovko, Mikhno Egor, Aliaksandr Brich, and Anatoliy Sachenko (October
2016), “A Shallow Convolutional Neural Network for Accurate Handwritten Digits
Classification”, 13th International Conference, PRIP, Minsk, Belarus, pp. 77-85.
[4] Hongjian Zhan, Shujing Lyu, Yue Lu (August 2018), “Handwritten Digit
String Recognition using Convolutional Neural Network”, 24th International Conference
on Pattern Recognition (ICPR), pp. 3729-3734.
[5] Ahlawat, S., Choudhary, A., Nayyar, A., Singh, S., & Yoon, B. (2020). Improved
handwritten digit recognition using convolutional neural networks (CNN). Sensors,
20(12), 3344. doi:10.3390/s20123344.
[6] N. Hagita, S. Naito, I. Masuda, Handprinted Kanji characters recognition based on
pattern matching method, Proc. ICTP, 1983, pp. 169-174.
[7] D.-S. Lee, S.N. Srihari, Handprinted digit recognition: a comparison of algorithms,
Proc. 3rd IWFHR, 1993, pp. 153-164.
[8] U. Kreßel, J. Schürmann, Pattern classification techniques based on function
approximation, Handbook of Character Recognition and Document Image Analysis,
World Scientific, 1997, pp. 49-78.
[9] Xiaofeng Han and Yan Li (2015), “The Application of Convolution Neural Networks
in Handwritten Numeral Recognition” in International Journal of Database Theory and
Application, Vol. 8, No. 3, pp. 367-376.
[10] T Siva Ajay (July 2017), “Handwritten Digit Recognition Using Convolutional
Neural Networks” International Research Journal of Engineering and Technology
(IRJET), Vol. 04, Issue 07, pp. 2971-2976.