
An Efficient OCR System Based On The Regional Feature Using The ASVM As Classifier

Maninder Kaur | Ms. Manjeet Kaur, "An Efficient OCR System based on the Regional Feature using the ASVM as Classifier", published in International Journal of Trend in Scientific Research and Development (IJTSRD), ISSN: 2456-6470, Volume-1 | Issue-5, August 2017. URL: http://www.ijtsrd.com/papers/ijtsrd2425.pdf http://www.ijtsrd.com/engineering/electronics-and-communication-engineering/2425/an-efficient-ocr-system-based-on-the-regional-feature--using-the-asvm-as-classifier/maninder-kaur


International Journal of Trend in Scientific Research and Development (IJTSRD)
International Open Access Journal
ISSN No: 2456 - 6470 | www.ijtsrd.com | Volume - 1 | Issue – 5

An Efficient OCR System based on the Regional Feature using the ASVM as Classifier

Maninder Kaur
Research Scholar, ECE Department, Rayat and Bahra University, Mohali

Ms. Manjeet Kaur
Assistant Professor, ECE Department, Rayat and Bahra University, Mohali

ABSTRACT
In image processing of handwritten documents, poor handwriting sometimes leaves a gap between diacritics and characters, or between diacritics and the header line. These gaps create small text blocks, which lead to improper text line segmentation and hence to wrong results and overlapping. As a result, the accuracy of the algorithm degrades. In the proposed work, Adaptive SVM will be used to improve the accuracy of the system.

INTRODUCTION:

Optical character recognition (OCR), also commonly known as optical character reader, is the process of converting images of handwritten or printed text, captured mechanically or electronically, into machine-readable text. OCR is a process in which specialized software converts scanned images of documents into electronic text so that the digitized data can be searched, indexed, and retrieved. OCR systems are designed to support many real-world applications, such as extracting data from business documents, checks, passports, invoices, bank statements, insurance documents, and license plates. Each application relies on data sets containing hundreds or thousands of scanned document images that are used to train and improve the system. The training data set is normally prepared by humans so that it provides accurate data from which the engine can learn, which makes it smarter. OCR is mainly used as a form of information entry from printed paper data records, whether passport documents, invoices, bank statements, computerized receipts, business cards, mail, printouts of static data, or any suitable documentation. OCR is mostly used in the areas of computer vision, artificial intelligence, and pattern recognition. Handwriting text recognition (HTR) can be defined as the ability of a computer to transform handwritten input, represented in its spatial form of graphical marks, into an equivalent symbolic representation such as ASCII text. Usually, this handwritten input comes from sources such as paper documents, photographs, or electronic pens and touch-screens.

Fig.1 Handwritten document converted into text

Approaches for learning Optical character recognition: The following are the approaches for learning optical character recognition.
➢ Histogram Approach: This method is based on the pixel histogram. A Y-histogram projection is computed, which gives the text line positions, and a threshold is applied to divide the line into different areas.

@ IJTSRD | Available Online @ www.ijtsrd.com | Volume – 1 | Issue – 5 | July-Aug 2017 Page: 1226
➢ Header line and base line detection based method: This method calculates the header line and base line of a text document for line segmentation.
➢ Line Segmentation: This algorithm is based on the projection profile method. It deals efficiently with skewed text as well as with overlapped and touching text lines.
➢ Character Segmentation: To extract overlapping characters, this method removes the vowel modifiers and consonant modifiers. Once they are removed, the word image contains only the base characters, with clear paths between them.
➢ Overlapping and Touching of Characters: When characters overlap or touch, there may be no significant gap between text lines, and when two or more text lines fall in the same text block, this leads to wrong results.

Fig.2 Overlapping of Character

Machine learning techniques used in handwritten recognition:

Artificial Neural Network (ANN): An artificial neuron is basically an engineering model of a biological neuron. An ANN consists of a number of nodes, called neurons, which are typically organized in layers. Each neuron in the hidden layer receives signals from all the neurons in the input layer. The strength of each signal and the biases are represented by weights and constants, which are calculated during the training phase. After the inputs are weighted and added, the result is transformed by a transfer function into the output.

Support Vector Machine: SVM is a comparatively recent machine learning classifier that can act as a non-linear classifier and is used to solve texture classification and pattern recognition problems. An SVM is designed to work with only two classes: it determines the hyperplane that divides the two classes, and the separation of these classes is performed using different kernels.

PROPOSED METHODOLOGY

OCR involves various steps to read the characters from a scanned image. In the proposed research, a model has been built for handwritten images. The system extracts the characters from handwritten images and writes them into a text file.

Flow Chart of OCR Model

Pre-processing → Segmentation → Normalization → Feature extraction → Classification → Post-processing

Fig.3 Steps of OCR System

Data Acquisition: The most important initial phase in OCR is to acquire the image, either from a device sensor such as a PDA or tablet in the case of online recognition, or directly as images containing characters for offline recognition. The image should be in a specific format such as JPEG or BMP.
Pre Processing: The goal of pre-processing is to simplify the pattern recognition problem without losing any vital information. It reduces noise and inconsistent data, and it enhances the image and prepares it for the next steps.
Segmentation: Segmentation is an integral part of any text-based recognition system. It supports the efficiency of classification and recognition, and the accuracy of character recognition depends heavily on the segmentation phase.
Normalization: The segmentation process yields isolated characters that are ready to pass through the feature extraction stage. The isolated characters are therefore reduced to a specific size, depending on the methods used. This process essentially renders each character image in the form of an m*n matrix.
Feature Extraction: Feature extraction is the process of extracting the relevant features from objects/alphabets to form feature vectors. These feature vectors are then used by classifiers to match the input unit with the target output unit. Feature extraction methods are based on 3 types of features:
➢ Statistical
➢ Structural
➢ Global transformations and moments

Classification: Classification is the last stage, in which the neural net is trained using the feature vectors obtained during feature extraction against the required targets.

Post Processing: The goal of post-processing is to incorporate context and shape information. Incorporating this information in all the stages of an OCR system is necessary for meaningful improvements in recognition rates.

Feature extraction: The following is the feature matching and classification algorithm for matching the extracted plant disease image with different images of the same plant, which are taken at different times, from different viewpoints, or by different sensors.
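The segmentation, normalization, and feature extraction stages can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the zero threshold, the nearest-neighbour resizing, and the zoning (regional ink density) feature are assumptions chosen for illustration, and the tiny binary image is made up.

```python
from typing import List, Tuple

Image = List[List[int]]  # binary image: 1 = ink, 0 = background


def segment_lines(img: Image, threshold: int = 0) -> List[Tuple[int, int]]:
    """Split a page into text lines using the Y-histogram (row projection).

    Returns (start_row, end_row) pairs with end_row exclusive.
    """
    profile = [sum(row) for row in img]  # ink pixels per row
    lines, start = [], None
    for y, count in enumerate(profile):
        if count > threshold and start is None:
            start = y                    # entering a text line
        elif count <= threshold and start is not None:
            lines.append((start, y))     # leaving a text line
            start = None
    if start is not None:                # line touches the bottom edge
        lines.append((start, len(img)))
    return lines


def normalize(img: Image, m: int, n: int) -> Image:
    """Resize a region to an m x n matrix by nearest-neighbour sampling."""
    h, w = len(img), len(img[0])
    return [[img[r * h // m][c * w // n] for c in range(n)] for r in range(m)]


def zoning_features(img: Image, zones: int) -> List[float]:
    """Ink density of each cell in a zones x zones grid (a statistical feature)."""
    zh, zw = len(img) // zones, len(img[0]) // zones
    feats = []
    for zr in range(zones):
        for zc in range(zones):
            ink = sum(img[r][c]
                      for r in range(zr * zh, (zr + 1) * zh)
                      for c in range(zc * zw, (zc + 1) * zw))
            feats.append(ink / (zh * zw))
    return feats


# Hypothetical 6 x 4 page with two text lines separated by a blank row.
page = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
lines = segment_lines(page)                  # [(1, 3), (4, 5)]
top, bottom = lines[0]
region = normalize(page[top:bottom], 4, 4)   # first line scaled to 4 x 4
features = zoning_features(region, 2)        # [1.0, 0.5, 0.5, 0.5]
print(lines, features)
```

A row-projection threshold of zero only works when lines are cleanly separated. Overlapping or touching lines, the case this paper targets, would leave no empty rows and need a higher threshold or a different splitting rule.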

RESULT AND DISCUSSION:


Recognition results for ten handwritten samples (the Images column of the original table showed the handwritten input image for each sample).

No. 1 | Accuracy (%): 96.52
Real Text: The electrical resistance of an electrical conductor is a measure of the diffi-culty to pass an electric current through that conoctor.
Recognized Text: The electrical resistance of an electrical conductor is a measure of the oiffi . Culty to pass an electric current through that conoctor

No. 2 | Accuracy (%): 100
Real Text: Difficult roads often lead to beautiful destinations
Recognized Text: Difficult roads often lead to beautiful destinations

No. 3 | Accuracy (%): 97.50
Real Text: It is a field of reserch in pattern recognition, artificial intelligence and machine vision.
Recognized Text: It is a field of reserch in pattern recognition, artificial intelligence and machine vistion

No. 4 | Accuracy (%): 95.09
Real Text: Morning walk is a good exercise. An early-riser can be a regular morning walker.The benefit of morning walk are manifold.
Recognized Text: Morning walk is a gooo exercise. An early-rtser can be a regular morning walker the benefii of morning walk are manifold

No. 5 | Accuracy (%): 97.56
Real Text: The future is what will happe in the time after the present.Its arrival is considerd inevitable due to the existence of time and the laws of physics.
Recognized Text: The future is what will happe in the time after the present its arrival is considerd inevitable oue to the extstence of time and the laws of physics.

No. 6 | Accuracy (%): 80.00
Real Text: Teamwork makes the dreamwork
Recognized Text: Teamwoik mapeo the dremwoik

No. 7 | Accuracy (%): 98.62
Real Text: Neural networks can be used , if we have a suitable dataset for taining and learning purposes. dataset are one of most important things when constructing new neural network.
Recognized Text: Neural networks can be used , if we have a sutable dataset for taining and learning purposes dataset are one of most important things when constructing new neural network

No. 8 | Accuracy (%): 83.34
Real Text: Good things take time
Recognized Text: Goool things tare time

No. 9 | Accuracy (%): 98.61
Real Text: Older ocr systems match these images against stored bitmaps based on specific fonts.
Recognized Text: Older ocr systems match these images against stored 8itmaps based on specific fonts.

No. 10 | Accuracy (%): 97.60
Real Text: Social media are computer-mediated technologies that facilitate the creation and sharing of informaton ideas , career interest and other forms of expession via virtual communities and networks .
Recognized Text: Social media are computer-medtated technoloc-ies that facilitate the creation and sharting of informaton ideas , career interest and other forms of expession via virtual communities and networrs .

Overall Accuracy of Proposed work: 94.48
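The paper does not state how the per-sample accuracy is computed. One plausible character-level measure, shown here only as an assumption, is 100 × (N − d) / N, where N is the number of non-space characters in the real text and d is the Levenshtein edit distance between the real and recognized text with spaces removed. The function names below are hypothetical. This definition reproduces the reported values for samples 2 and 6, but it is not guaranteed to match every row of the table.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def char_accuracy(real: str, recognized: str) -> float:
    """Assumed metric: percentage of non-space characters recognized correctly."""
    ref = "".join(real.split())         # drop all whitespace
    hyp = "".join(recognized.split())
    errors = edit_distance(ref, hyp)
    return round(100 * max(len(ref) - errors, 0) / len(ref), 2)


sample_6 = char_accuracy("Teamwork makes the dreamwork",
                         "Teamwoik mapeo the dremwoik")
sample_2 = char_accuracy("Difficult roads often lead to beautiful destinations",
                         "Difficult roads often lead to beautiful destinations")
print(sample_6)  # 80.0
print(sample_2)  # 100.0
```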


Fig.4 Accuracy of proposed work


Fig.4 represents the accuracy of the proposed work in the form of a graph. The X-axis denotes the number of samples included in the testing of the proposed work, and the Y-axis denotes the accuracy of the proposed work in percent. From the graph, it has been observed that the average accuracy is more than 94% on handwritten images.
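The reported overall accuracy can be checked as the unweighted mean of the ten per-sample accuracies in the results table. The short Python check below uses only the values reported in this paper. The variable names are illustrative.

```python
# Per-sample accuracies (%) for samples 1 to 10, as reported in the results table.
accuracies = [96.52, 100.0, 97.50, 95.09, 97.56, 80.00, 98.62, 83.34, 98.61, 97.60]

overall = sum(accuracies) / len(accuracies)
print(round(overall, 2))  # 94.48

# Samples whose accuracy falls below 90%.
below_90 = [i + 1 for i, acc in enumerate(accuracies) if acc < 90]
print(below_90)  # [6, 8]
```

The two samples below 90% are also the two shortest texts, where each misrecognized character costs several percentage points.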

Comparison between Classifiers

Classifier      ANN   SVM   ASVM
Accuracy (%)    86    93    94

Fig.5 Accuracy comparison between classifiers


Fig.5 represents the accuracy comparison between artificial neural network (ANN), support vector machine (SVM), and adaptive support vector machine (ASVM) in the form of a bar graph. From the figure, we observe that the accuracy of the proposed character recognition system with ASVM (94%) is better than with the ANN (86%) and SVM (93%) classifiers, which is attributed to better training.

Conclusion: Due to overlapping and touching of characters, there remains no significant gap between the text lines, and hence two or more text lines fall in the same text block, which leads to wrong results. The main focus of this research project is to experiment deeply with, and find alternative solutions to, the image segmentation and character recognition problems within overlapped character recognition. In the existing work, an SVM classifier is applied, but it has lower accuracy. Adaptive SVM is therefore applied to improve the accuracy of the system.

Future work: In future, an artificial neural network can be used along with an optimization algorithm to achieve better results by reducing the noisy data in the images passed to the character recognition system. With an artificial neural network as the classifier instead of SVM, combined with an optimization technique, the precision of character recognition is expected to increase and the rate of noise to decrease.
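As a reference point for the SVM baseline discussed in this paper, the Python sketch below shows how a kernel SVM decision can be evaluated and how two-class SVMs can be combined for several character classes with a one-vs-rest rule. It is a generic illustration, not the ASVM of this paper, whose adaptive step is not specified here. The RBF kernel, the names, and all numeric values are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
# One binary SVM: a list of (support_vector, alpha_i * y_i) pairs plus a bias term.
BinarySVM = Tuple[List[Tuple[Vector, float]], float]


def rbf_kernel(x: Vector, z: Vector, gamma: float = 1.0) -> float:
    """Radial basis function kernel exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def decision(svm: BinarySVM, x: Vector) -> float:
    """Signed score of x: positive on the class side of the hyperplane."""
    support, bias = svm
    return sum(coef * rbf_kernel(sv, x) for sv, coef in support) + bias


def classify(models: Dict[str, BinarySVM], x: Vector) -> str:
    """One-vs-rest: pick the class whose binary SVM gives the largest score."""
    return max(models, key=lambda label: decision(models[label], x))


# Hypothetical, hand-picked models for two character classes (not trained).
models = {
    "A": ([((1.0, 0.5), 1.0), ((0.0, 0.0), -1.0)], 0.0),
    "B": ([((0.0, 0.0), 1.0), ((1.0, 0.5), -1.0)], 0.0),
}
first = classify(models, (0.9, 0.5))
second = classify(models, (0.1, 0.0))
print(first, second)  # A B
```

In practice the support vectors, coefficients, and bias come from training on the feature vectors of labelled characters. They are fixed by hand here only to keep the example self-contained.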
