
NEW 10 August Feature Extraction Techniques For Handwritten Character Recognition

This document discusses feature extraction techniques for handwritten character recognition. It begins by introducing handwritten character recognition and explaining that feature extraction is a key step that affects recognition accuracy. It then categorizes feature extraction techniques into shape-based and non-shape-based categories. Shape-based techniques include geometrical features like angular width and area, as well as moment-based features like Zernike and Hu moments. Non-shape based techniques include statistical features, shadow features, and template matching. The document reviews several papers analyzing different feature extraction methods and reporting recognition accuracies between 82-98% on various character datasets.

Uploaded by shruti
Copyright © All Rights Reserved

Feature Extraction Techniques for Handwritten Character Recognition: A Review

Abstract- Handwritten Character Recognition is one of the eminent fields of research nowadays. Feature extraction is a key step that affects recognition accuracy. The selection of features depends on the types of characters to be recognized, because the characters of each language have different shapes and writing styles vary from user to user in size, tilt, rotation, etc. Much research is being carried out on feature extraction techniques that can recognize any character with high accuracy. However, most existing feature extraction techniques suit some, but not all, types of fonts. There is therefore still scope to work on feature extraction techniques for recognizing different handwritten characters.

Keywords-Character Recognition, Feature Extraction, Accuracy.

I. Introduction
Character recognition is a field of pattern recognition whose main aim is to recognize characters of various languages written in different styles and formats. It is among the most challenging tasks in the field and finds many applications in banks, industries, postal services [5], forensic research, text recognition of old documents, etc. [1]. OCR was first developed as a reading aid for the blind during the early decades of the twentieth century. Present-day OCR systems have made significant progress in recognizing characters in old, noisy documents; however, much work remains to be done on unconstrained handwritten character recognition.

II. Feature Classification
Character recognition in OCR is the process of identifying the characters of different languages with high accuracy. OCR was initially developed as an aid for visually handicapped persons. In the mid-1980s, OCR techniques developed very rapidly owing to intense research in the field of pattern recognition, and present-day OCR systems show considerable improvement in recognizing characters in old documents. India has 22 official languages, written in about 10-12 different scripts. Developing an OCR system for such a diverse country is a challenging task because character shapes are represented in complex ways across these scripts.

Feature extraction techniques can be classified into two major categories: shape based and non-shape based. Non-shape based features mainly capture special points present in the character image, the correlation among images of the same or different characters, template matching, and statistical features [4] such as projection profiles [6], holes and probability distribution functions. Shape based features, on the other hand, usually encode information about a character based on its geometry. However, in some scripts such as Devanagari, some characters have almost the same geometric representation, so a combination of shape based and non-shape based features is used to recognize them with high accuracy. Figure 1 shows the different types of features that are useful for obtaining high accuracy in character recognition.

Figure 1: Different Feature Extraction Techniques

III. Feature Extraction
Shape based Features
Shape based features are categorized into four groups: geometrical features, local features, spatial relational features and moment based features.

(i) Geometrical Features
Geometry based features include angular width, area, square and perpendicular-distance calculations, which in general depend on the shapes of the characters.

Arica and Yarman Vural [8] proposed a shape descriptor called Beam Angle Statistics (BAS), which is based on the angles formed by the lines joining each boundary point to the rest of the boundary points. The shape of the character is then described using third-order statistics of these angles.

Alajlan et al. [9] proposed a novel structure-based method that extracts shape features using a triangle-area representation. Shapes are classified using a dynamic space warping metric, and the method reports better performance than existing techniques on available benchmark databases.

Iqbal et al. [10] proposed a novel approach for extracting translation-, scale- and rotation-invariant features. The character is divided into n sectors around its centroid, and the distance from the centroid to the boundary points on the character's edge, also called cutpoints [24], is used as the feature. An accuracy of 98.6% is obtained using a multilayer feed-forward artificial neural network.

Tripathy and Pal [11] proposed a novel feature extraction technique for recognizing Bangla and Devanagari characters of variable font size and multiple orientations. It uses a distance based feature: the distance from the centroid to the closest contour point. Accuracies of 98.1% and 97.8% are achieved for Devanagari and Bangla characters, respectively.
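The centroid-distance idea behind [10] and [11] can be illustrated with a small sketch. It is not the exact method of either paper; it rests on our own assumptions. The input is a binary image given as a list of rows (1 = ink). Boundary pixels are taken to be ink pixels with a 4-connected background neighbour. The function names `boundary_pixels` and `sector_distance_features` are hypothetical. For each of n angular sectors around the centroid, the sketch records the farthest boundary distance (a stand-in for the sector distances of [10]) and the closest one (in the spirit of [11]).

```python
import math
from typing import List, Sequence, Tuple

Image = Sequence[Sequence[int]]  # binary image as rows of 0/1, 1 = ink (assumption)


def boundary_pixels(img: Image) -> List[Tuple[int, int]]:
    """Ink pixels with at least one 4-connected background neighbour."""
    rows, cols = len(img), len(img[0])
    edge = []
    for r in range(rows):
        for c in range(cols):
            if not img[r][c]:
                continue
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                # Neighbours outside the image count as background.
                if not (0 <= rr < rows and 0 <= cc < cols) or not img[rr][cc]:
                    edge.append((r, c))
                    break
    return edge


def sector_distance_features(img: Image, n_sectors: int = 8) -> List[float]:
    """Return [far_0, near_0, far_1, near_1, ...] for n angular sectors.

    far_k / near_k are the largest / smallest centroid-to-boundary distances
    in sector k, divided by the largest distance over all sectors.
    """
    ink = [(r, c) for r, row in enumerate(img) for c, v in enumerate(row) if v]
    if not ink:
        return [0.0] * (2 * n_sectors)
    # Distances are measured from the centroid, which removes translation.
    cy = sum(r for r, _ in ink) / len(ink)
    cx = sum(c for _, c in ink) / len(ink)
    width = 2 * math.pi / n_sectors
    far = [0.0] * n_sectors
    near = [math.inf] * n_sectors
    for r, c in boundary_pixels(img):
        dx, dy = c - cx, cy - r  # flip y because image rows grow downward
        k = min(int((math.atan2(dy, dx) % (2 * math.pi)) / width), n_sectors - 1)
        d = math.hypot(dx, dy)
        far[k] = max(far[k], d)
        near[k] = min(near[k], d)
    near = [0.0 if math.isinf(d) else d for d in near]  # empty sectors -> 0
    # Dividing by the largest distance removes (approximately) the scale.
    scale = max(far) or 1.0
    pairs = [(f / scale, n / scale) for f, n in zip(far, near)]
    # Coarse rotation normalisation (our assumption, not from [10]/[11]):
    # start the sequence at the sector holding the largest distance.
    start = max(range(n_sectors), key=lambda k: pairs[k][0])
    pairs = pairs[start:] + pairs[:start]
    return [v for pair in pairs for v in pair]


if __name__ == "__main__":
    # A filled 5x5 square inside a 7x7 canvas (hypothetical test input).
    square = [[1 if 1 <= r <= 5 and 1 <= c <= 5 else 0 for c in range(7)]
              for r in range(7)]
    print(sector_distance_features(square, n_sectors=8))
```

Measuring distances from the centroid makes the features translation invariant. Dividing by the largest distance makes them approximately scale invariant. The circular shift is only a coarse rotation normalisation, and on small images pixel sampling can move boundary points between neighbouring sectors. A practical system may therefore need smoothing or resampling before this step.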
S. No. | Features | Database | Number of Samples | Accuracy (%)
1 | Angular width (BAS) | MPEG shape-1 | 100 characters | 91.69
2 | Triangular area | Kimia shapes | 95 characters | 95.59
3 | Geometrical distance | English Character Dataset | 4680 characters | 98.61
4 | Geometrical distance | Bangla Character Dataset | 2890 characters | 97.81
5 | Geometrical distance | Devanagari Character Dataset | 3000 characters | 98.1
Table 1: Performance of Geometrical Features for HCR

(ii) Moment Based Features
The term moment is borrowed from physics. In character recognition, moments are used to analyse the shape and contour of the character to be recognized. Zernike and Hu moments are the two major moment based features used for character recognition. Kunte and Samuel [3] proposed an approach for extracting features of printed Kannada characters using Zernike and Hu's moments, obtaining an accuracy of 96.8%. For handwritten compound character recognition, Kale et al. [2] proposed a method in which the input image is divided into nine zones and Zernike moment features are extracted from each zone. Support Vector Machine (SVM) [7] and k-Nearest Neighbor (k-NN) classifiers are used for classification, achieving accuracies of 98.37% and 95.82%, respectively.

(iii) Local Features
Local features are derived from the skeleton of the image given as input to a recognition system. Basic local features include the types of curves, loops, orientations and positions of lines, etc.

Gaurav and Ramesh [12] proposed a geometry based feature extraction technique for character recognition. The skeleton is first extracted from the character image, and the features derived from it are used to train a neural network. An accuracy of 95.38% is achieved with this method.

Raja and John [13] proposed a method to extract vertical and horizontal lines, loops, curves, etc. The positions of horizontal and vertical lines are obtained using the centroid of the image as the reference point. Small and big loops are extracted using chain codes and classified according to their position relative to the centroid. A decision tree classifier is used for classification, and an accuracy of 95.12% is obtained.

Sinha and Bansal [14] obtained the skeleton of the input character image and divided it into strokes, each described by its geometrical appearance. An accuracy of 70% is achieved with a training dataset of nearly 12,000 characters.

(iv) Spatial Relational Features
The relationship between contour pixels in a character image is referred to as a spatial relational feature. A topological feature of a character in an image is also a spatial feature. In [15], a hybrid technique for classifying handwritten Devanagari characters uses edit distance and chain codes as features. The input character image is encoded as a chain-code string, which is matched against regular expressions. An accuracy of 82% is achieved.

Hsu et al. [16] proposed a character recognition methodology that uses Local Binary Patterns as features, with classification done by two-stage linear discriminant analysis. The paper reports an accuracy of 94%.

S. No. | Features Used | Number of Samples | Accuracy (%)
1 | Local features | 2000 English characters | 94
2 | Topology features | 100 characters | 100
3 | Chain codes | 4000 Devanagari characters | 82
Table 2: Performance of Spatial Relational Features

Non Shape Based Features
Non-shape based features mainly include statistical features, shadow features, template matching, correlation, and special points such as corners, dots, endpoints and centroids. Statistical features mainly include mean, standard deviation, peak extent features, probability distributions, hole size, etc. The shape of a character can also be characterized with the help of statistical features such as projection profiles and shadow features.

Hybrid Features
Combinations of shape based and non-shape based features are called hybrid features. Pal and Chaudhuri [18] explained some specific properties of Indic scripts and suggested methodologies that may be adopted to recognize Kannada, Gujarati and Gurumukhi characters. The paper also emphasized the recognition of multi-font and omni-font text in Indian scripts.

In [19], a novel geometrical shape feature extraction technique for multilingual character recognition is based on geometrical parameters such as perpendicular distance and triangle area. This method achieves accuracies of 99.03% and 98.5% on the media-lab benchmark database and a proprietary database, respectively.

Soora and Deshpande [20] proposed feature extraction using horizontal, vertical and diagonal scan lines. This feature was tested on the media-lab standard database and achieves an accuracy of 98.8%.

Manjunath Aradhya et al. [21] proposed a multilingual OCR for English, Malayalam, Tamil, etc., using Principal Component Analysis (PCA) together with the Fourier transform, and achieved an accuracy of 95.1%.

Cao et al. [17] proposed a method to recognize handwritten numerals with a multistage classifier, using directional code histograms as features. The input image is divided into 4 × 4 rectangular zones and contour chain codes are computed for each zone. A subclass neural network with incremental clustering is used for classification, reporting an accuracy of 99.83%.

Trier et al. [22] described feature extraction techniques for greyscale and binary images. Some of these techniques include
projection histograms, Fourier descriptors, contour profiles, zoning, geometric moment invariants, graph descriptions, etc.

Li and Suen [23] proposed an approach based on partition-combination methods. Regional decomposition methods, mean and standard deviation, confusion and perfect combinations, and recognition of basic parts are the partition-combination methods used in handwritten character recognition. The paper reported an accuracy of 98.55%.

Shen-Zheng et al. [26] proposed a technique based on background area features for recognizing characters with different types of shape change. This technique achieves an accuracy of 98.6%.

Roy et al. [28] proposed an approach for unconstrained handwritten character recognition using a Hidden Markov Model (HMM), in which local gradient histograms are computed with a sliding window. The method is scale and rotation invariant, and the authors report an accuracy of nearly 80%.

Tian et al. [32] proposed a character recognition methodology that combines three features: an extension of the Histogram of Oriented Gradients (EHOG), Co-occurrence HOG (Co-HOG), and Convolutional Co-HOG (ConvCo-HOG). Classification is done with a Support Vector Machine (SVM). The technique was applied to Bengali characters and achieved an accuracy of 92%.

Jain and Dave [27] proposed a novel technique for handwritten text recognition. It is a hybrid geometric feature extraction technique that uses stroke levels such as the middle loop, upper stroke and lower stroke as features. The method achieved 85% recognition accuracy.

Yang and Wang [30] presented an approach for recognizing printed Chinese characters using a rotation-invariant feature extraction technique. The technique extracts a three-dimensional feature vector by computing measures for each pair of pixels. The authors obtained an accuracy of 97.4%.

Rani and Meena [29] proposed a novel feature for handwritten character recognition, called the cross-corner feature. Any background noise in the image is first removed during pre-processing with a suitable filter. The filtered image is then divided into nine zones, and the numbers of left and right diagonal lines in each zone form that zone's cross-corner feature points. Combining the cross-corner features of all zones gives the feature vector of the character image. An average accuracy of 84.25% is obtained on two datasets, Data-set1 and Data-set2.

Kumar et al. [31] proposed a feature extraction technique for handwritten Gurumukhi character recognition called peak extent features, which are similar to longest run length features. Within each zone, the longest run lengths computed for every row are summed to give the horizontal peak extent, and those computed for every column are summed to give the vertical peak extent. With k-NN and linear-SVM classifiers, the method achieved accuracies of 95.62% and 95.48%, respectively.

S. No. | Features Used | Language (Number of Samples) | Accuracy (%)
1 | Local features | Bangla (23,000) | 96
2 | Statistical, spatial relational features | Numerals and English alphabets (196,000) | 98.75
3 | Chain codes, area of triangle, perpendicular distance based features | Devanagari, Marathi characters (20,000) | 97.03
4 | Hybrid features | Devanagari (14,000) | 85.5
5 | Stroke based, run number, water overflow from reservoir based features | Oriya (5000) | 96.3
6 | Gradient, direction based features | Telugu (2795) | 98
7 | Density, moment-based profiles | Devanagari numerals (2460) | 89.68
8 | Hybrid features | Devanagari (4900) | 92.8
9 | Shadow based features, histogram of chain code | Devanagari characters (7154) | 90.74
10 | Hybrid features (moments, pixel density, horizontal zero crossing) | Devanagari alphabets (2,000) | 90
Table 3: Comparison of Performance of Methods which Used Hybrid Features

IV. Gap Areas for Future Work
Based on the discussion of feature extraction techniques for character recognition in this paper, the following points are observed, which may also be considered as scope for future work:

(i) Very few standard datasets are available, especially for Indic scripts. To develop a robust character recognition system, more handwritten datasets for Indic scripts should be created, with variability in writing style, font size, orientation, etc.

(ii) The above discussion does not cover the recognition of old texts in languages such as Sanskrit, available in both Brahmi and Devanagari scripts. Recognizing such texts may help us decode and enhance our knowledge, leading to fruitful scientific developments.

(iii) Very little work has been done on compound characters, which usually occur in Devanagari script.

(iv) Very little work has been done on scale- and rotation-invariant feature extraction techniques, especially for Indic scripts. Such techniques would further improve the ability to recognize uneven characters, broken characters and characters from many languages.

(v) Many characters have a similar appearance in different languages such as Hindi and Marathi, and it is not easy to recognize these similar characters with high accuracy. In such situations, researchers need to develop useful, robust hybrid
feature extraction techniques that can distinguish similar characters with high accuracy.

V. Conclusion
This paper discussed different feature extraction techniques for handwritten characters, classified into non-shape based, shape based and hybrid features. Most of the discussed features were proposed for unconstrained handwritten character recognition in different languages. The methods were also compared in terms of the accuracy obtained in recognizing characters of different languages. Finally, gap areas and scope for future work were discussed.

REFERENCES
[1] B M Vinjit, Mohit Kumar Bhojak, Sujit Kumar and Gitanjali Chalak. “A review on handwritten character recognition methods and techniques,” International Conference on Communication and Signal Processing, pp. 1224–1228.

[2] K. V. Kale, P. D. Deshmukh, S. V. Chavan, M. M. Kazi and Y. S. Rode. (2014). “Zernike moment feature extraction for handwritten Devanagari (Marathi) compound character recognition,” Int. J. Adv. Res. Artif. Intell. 3 (1), pp. 68–76.

[3] R. S. Kunte and R. D. S. Samuel. (2007). “A simple and efficient optical character recognition system for basic symbols in printed Kannada text,” Sadhana 32 (5), pp. 521–533.

[4] K. Hamad and M. Kaya. (2016). “A detailed analysis of optical character recognition technology,” International Journal of Applied Mathematics, Electronics and Computers 4 (Special Issue-1).

[5] R. Ghosh, C. Panda and P. Kumar. (2018). “Handwritten text recognition in bank cheques,” Conference on Information and Communication Technology.

[6] Thi Thi Zin, Shin Thant and Ye Htet. (2020). “Handwritten characters segmentation using projection approach,” in IEEE 2nd Global Conference on Life Sciences and Technologies (LifeTech), pp. 107–108.

[7] Y. B. Hamdan et al. (2021). “Construction of statistical SVM based recognition model for handwritten character recognition,” J. Inf. Technol. Digit. World 3, pp. 92–107.

[8] N. Arica and F. T. Yarman Vural. (2003). “BAS: A perceptual shape descriptor based on the beam angle statistics,” Pattern Recognit. Lett. 24 (9–10), pp. 1627–1639.

[9] N. Alajlan, I. El Rube, M. S. Kamel and G. Freeman. (2007). “Shape retrieval using triangle-area representation and dynamic space warping,” Pattern Recognit. 40 (7), pp. 1911–1920.

[10] A. Iqbal, A. B. M. Musa, A. Tahsin, M. A. Sattar, M. M. Islam and K. Murase. (2008). “A novel algorithm for translation, rotation and scale invariant character recognition,” in Proc. SCIS & ISIS, Nagoya, pp. 1367–1374.

[11] U. Pal and N. Tripathy. (2009). “A contour distance-based approach for multi-oriented and multi-sized character recognition,” Sadhana 34 (5), pp. 755–765.

[12] D. D. Gaurav and R. Ramesh. (2012). “A feature extraction technique based on character geometry for character recognition,” Cornell Univ. Libr. 1202.3884, pp. 1–4.

[13] S. Raja and M. John. (2014). “A novel Tamil character recognition using decision tree classifier,” IETE J. Res. 59 (5), pp. 569–575.

[14] V. Bansal and R. M. K. Sinha. (1999). “On how to describe shapes of Devanagari characters and use them for recognition,” in Proc. 5th Int. Conf. Doc. Anal. Recognit., Bangalore, pp. 410–413.

[15] Parag S. Deshpande, L. Malik and S. Arora. (2008). “Fine classification & recognition of hand written Devanagari characters with regular expressions & minimum edit distance method,” J. Comput. 3 (5), pp. 11–17.

[16] G. S. Hsu, J. C. Chen and Y. Z. Chung. (2013). “Application-oriented license plate recognition,” IEEE Trans. Veh. Technol. 62 (2), pp. 552–561.

[17] J. Cao, M. Ahmadi and M. Shridhar. (1995). “Recognition of handwritten numerals with multiple feature and multistage classifier,” Pattern Recognit. 28 (2), pp. 153–160.

[18] U. Pal and B. B. Chaudhuri. (2004). “Indian script character recognition: A survey,” Pattern Recognit. 37 (9), pp. 1887–1899.

[19] N. R. Soora and P. S. Deshpande. (2016). “Novel geometrical shape feature extraction techniques for multilingual character recognition,” IETE Tech. Rev., pp. 1–10. doi:10.1080/02564602.2016.122958.

[20] N. R. Soora and P. S. Deshpande. (2014). “Robust feature extraction technique for license plate characters recognition,” IETE J. Res. 61 (1), pp. 72–79.

[21] V. N. Manjunath Aradhya, G. Hemantha Kumar and S. Noushath. (2008). “Multilingual OCR system for South Indian scripts and English documents: An approach based on Fourier transform and principal component analysis,” Eng. Appl. Artif. Intell. 21, pp. 658–668.

[22] D. Trier, A. K. Jain and T. Taxt. (1996). “Feature extraction methods for character recognition: A survey,” Pattern Recognit. 29 (4), pp. 641–662.

[23] Z. C. Li and C. Y. Suen. (2000). “The partition-combination method for recognition of handwritten characters,” Pattern Recognit. Lett. 21 (8).

[24] N. Reddy Soora and Parag Deshpande. (2017). “Review of feature extraction techniques for character recognition,” IETE Journal of Research, pp. 1–17. doi:10.1080/03772063.2017.1351323.

[25] Parag S. Deshpande, L. Malik and S. Arora. (2008). “Fine classification & recognition of hand written Devanagari characters with regular expressions & minimum edit distance method,” J. Comput. 3 (5), pp. 11–17.

[26] W. Shen-Zheng and L. His-Jian. (2003). “Detection and recognition of license plate characters with different appearances,” in IEEE Proc. Intell. Transp. Syst., Shanghai, 2, pp. 979–984.

[27] S. T. Jain and H. B. Dave. (1998). “Handwritten text recognition using geometric features,” IETE J. Res. 44 (6), pp. 299–303.

[28] P. P. Roy, S. Roy and U. Pal. (2014). “Multi-oriented text recognition in graphical documents using HMM,” in 11th IAPR Int. Workshop Doc. Anal. Syst., Tours, pp. 136–140.

[29] M. Rani and Y. K. Meena. (2011). “An efficient feature extraction method for handwritten character recognition,” in Swarm Evolutionary Memetic Comput. Conf., Visakhapatnam, 7077, pp. 302–309.

[30] T. N. Yang and S. D. Wang. (2001). “A rotation invariant printed Chinese character recognition system,” Pattern Recognit. Lett. 22, pp. 85–95.

[31] M. Kumar, R. K. Sharma and M. K. Jindal. (2013). “A novel feature extraction technique for offline handwritten Gurumukhi character recognition,” IETE J. Res. 59 (6), pp. 687–691.

[32] S. Tian, U. Bhattacharya, S. Lu, B. Su, Q. Wang, X. Wei, Y. Lu and C. L. Tan. (2016). “Multilingual scene character recognition with co-occurrence of histogram of oriented gradients,” Pattern Recognit. 51 (3), pp. 125–13.
