See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/234762329

OCR for printed Kannada text to machine editable format using database
approach

Article in WSEAS TRANSACTIONS ON COMPUTERS · June 2008

9th WSEAS International Conference on AUTOMATION and INFORMATION (ICAI'08), Bucharest, Romania, June 24-26, 2008

OCR for printed Kannada text to Machine editable format using Database approach

B.M. SAGAR1, Dr. SHOBHA G2, Dr. RAMAKANTH KUMAR P3
Information Science1, Computer Science2, Computer Science3
Visvesvaraya Technological University1, 2, 3
Lecturer1, Professor2, Professor3, R.V.C.E, Bangalore-59, Karnataka, INDIA

Abstract: - This paper describes an Optical Character Recognition (OCR) system for printed text
documents in Kannada, a South Indian language. The proposed system can handle all types of
printed Kannada characters. It first acquires an image of the Kannada script, then segments the
image into lines, the lines into words, and the words into sub-character-level pieces. A
database approach is used for character recognition. The reported accuracy reaches 100%.

Key-words: - Optical Character Recognition, Segmentation, Kannada Scripts

1. Introduction

Optical character recognition (OCR) refers to reading text from paper and translating the
images into a form that the computer can manipulate. OCR systems have been developed
effectively for the recognition of printed characters of non-Indian languages. Until quite
recently, the focus of this endeavor has been on characters of the English language. Such
systems are also available for many European languages and for some Asian languages such as
Japanese and Chinese. However, there are not many reported efforts at developing OCR systems
for Indian languages, especially for a South Indian language like Kannada.

Section 2 describes earlier work on Kannada character recognition. Section 3 describes the
segmentation of lines, words and characters. Section 4 describes the proposed system for
Kannada character recognition. Section 5 describes the method of character recognition with
increased efficiency. Section 6 presents the experimental results, followed by the conclusion
and future work.

1.1 Introduction to Kannada Scripts

Kannada is one of the South Indian languages. The Kannada character set is almost identical to
that of other Indian languages. It is written horizontally from left to right, and the concept
of lower and upper case is absent [1].

The Kannada language has 16 vowels and 34 consonants as its basic alphabet. The number of
written symbols, however, is far greater than these 50 characters, because different characters
can be combined to form compound characters (ottaksharas).

2. Background Study

Owing to the impact of advances in Information Technology, more emphasis is now given in
Karnataka to using Kannada at all levels, and hence the use of Kannada in computer systems is
also a necessity. Efficient OCR systems for Kannada are therefore a present-day requirement.
Many OCR systems are currently available for handling printed English documents with reasonable
levels of accuracy [1], but it is difficult to find OCR systems for Kannada with high accuracy.
A few researchers have worked on Kannada character recognition using a novel set of features
that are computationally simple to extract. Recognition is achieved by employing a number of
2-class classifiers based on the Support Vector Machine (SVM) method. The recognition is
independent of the font and size of the printed text, and the system is seen to deliver
reasonable performance [3].

ISBN: 978-960-6766-77-0 322 ISSN 1790-5117
Other researchers have worked on Kannada character recognition using Hu's invariant moments
and Zernike moments to extract the features of printed Kannada characters. Neural classifiers
were used effectively to classify the characters based on these moment features, and an
encouraging recognition rate of 96.8% was obtained [1].

In our system we use a database approach for character recognition. Section 5 describes this
method of character recognition with increased efficiency.

3. Segmentation Process

Due to the peculiarities of the Kannada script, the following segmentation scheme is proposed:
lines are segmented first, then words, and finally characters. These are then put together for
the recognition of individual aksharas (characters). As Kannada is a non-cursive script, the
individual characters in a word are isolated, so the spacing between characters can be used for
segmentation.

3.1 Line Segmentation

Line segmentation is the process of identifying the lines of text in a given image. The steps
for line segmentation are as follows:

1. Scan the BMP image horizontally, row by row, to find the first row containing an ON pixel,
and remember its y coordinate as y1.
2. Continue scanning the BMP image; the rows that follow contain many ON pixels, since the
characters have started.
3. Stop at the first row that contains no ON pixel (an OFF row) and remember its y coordinate
as y2.
4. The rows from y1 to y2 form the line.
5. Repeat the above steps till the end of the image.

Figure 3.1: Line and character segmentation.

3.2 Word Segmentation

Adjacent words are separated by a gap, and this gap is used for word segmentation. After line
segmentation, each line segment is scanned vertically, column by column. The steps for word
segmentation are as follows:

1. Scan the BMP image vertically within the recognized line segment to find the first column
containing an ON pixel, and remember its x coordinate as x1. Treat this as the starting
coordinate of the word.
2. Continue scanning the BMP image; the columns that follow contain many ON pixels, since the
word has started.
3. Stop when five successive OFF pixel columns are found (five columns is the assumed word
distance) and remember that x coordinate as x2.
4. The columns from x1 to x2 form the word.
5. Repeat the above steps till the end of the line segment.
6. Repeat the above steps for all the recognized line segments.

3.3 Character Segmentation

1. Scan the BMP image vertically within the recognized word segment to find the first column
containing an ON pixel, and remember its x coordinate as x1. Treat this as the starting
coordinate of the character.
2. Continue scanning the BMP image; the columns that follow contain many ON pixels, since the
character has started.
3. Stop at the first OFF pixel column and remember its x coordinate as x2.
4. The columns from x1 to x2 form the character.
5. Repeat the above steps till the end of the word segment and of the line segment.
6. Repeat the above steps for all the recognized line segments.

4. Proposed system

The OCR's task is to identify the characters of the Kannada script, and the word processor
provides an interface for viewing and editing documents in Kannada. Figure 4.1 shows the
details. In this work, the sequence of operations is as follows. A page of Kannada text is
scanned; the input to the system is the scanned image file, in BMP format, of a pure Kannada
document. The document is then segmented into lines and each line into individual characters:
a line is extracted from the image file and given as input to Character Segmentation, where the
characters of the line are segmented one by one. Each extracted character that is still to be
recognized is given as input to the Character Recognizing Module.

5. Character Recognition

After character-by-character segmentation, each character image is stored in a structure. The
character then has to be identified against the predefined character set. Preliminary data is
stored for all the Kannada characters of an identified font and size. This data contains the
following information:

1. Character ASCII value
2. Character name
3. Character BMP image
4. Character width and length
5. Total number of ON pixels in the image

The same information is captured for every segmented character and compared with the
predefined data stored in the system. As the same font and size are used for recognition, there
will be exactly one unique match for each character, which gives the name of the character.
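The comparison described in this section can be sketched as a lookup over stored records. This is a minimal illustration and not the authors' implementation: the `Template` record, the helper names, the toy two-by-two glyphs and the rule that a match means identical width, length, ON-pixel count and pixels are all assumptions made for the example, since the paper only says that the captured information is compared with the stored data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    """One stored record with the five fields listed in Section 5 (hypothetical layout)."""
    code: int        # character code (the paper's "ASCII value")
    name: str        # character name
    bitmap: tuple    # character image as rows of 0 (OFF) / 1 (ON) pixels
    width: int
    height: int      # the paper's "length"
    on_pixels: int   # total number of ON pixels


def freeze(rows):
    """Convert nested lists to nested tuples so bitmaps compare by value."""
    return tuple(tuple(row) for row in rows)


def count_on(bitmap):
    return sum(sum(row) for row in bitmap)


def make_template(code, name, rows):
    bitmap = freeze(rows)
    return Template(code, name, bitmap, len(bitmap[0]), len(bitmap), count_on(bitmap))


def recognize(rows, database):
    """Return the name of the stored character that matches exactly, or None."""
    glyph = freeze(rows)
    width, height, on_pixels = len(glyph[0]), len(glyph), count_on(glyph)
    for t in database:
        # Cheap checks first (size and ON-pixel count), full pixel comparison last.
        if (t.width, t.height, t.on_pixels) == (width, height, on_pixels) and t.bitmap == glyph:
            return t.name
    return None


# Toy database with made-up glyphs; real records would hold scanned character images.
DATABASE = [
    make_template(1, "glyph_a", [[1, 1], [0, 0]]),
    make_template(2, "glyph_b", [[0, 1], [0, 0]]),
]
```

With this toy database, `recognize([[1, 1], [0, 0]], DATABASE)` returns `"glyph_a"`, and an image that matches no stored record returns `None`. Checking the size and ON-pixel count before the pixels keeps most non-matching records cheap to reject, which bears on the computation cost noted in the experimental results.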
If the size of the character varies, it is scaled to the known standard size before
recognition is carried out.
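The paper does not say how a character is scaled to the standard size. One simple possibility, shown here purely as an assumption, is nearest-neighbour resampling of the binary image; the function name `scale_nearest` is hypothetical.

```python
def scale_nearest(bitmap, width, height):
    """Resample a binary image (rows of 0/1) to width x height by nearest neighbour."""
    src_h, src_w = len(bitmap), len(bitmap[0])
    return [
        # Each target pixel copies the source pixel at the proportional position.
        [bitmap[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]
```

For example, a four-by-four image whose top two rows are ON scales down to `[[1, 1], [0, 0]]`. Resampling can change individual pixels, so an exact comparison against stored images may no longer succeed after scaling; a practical system would probably need some tolerance in the match.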

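The scanning procedures of Sections 3.1 to 3.3, which produce the character images used for recognition, all follow the same pattern: mark each row or column as ON if it contains any ON pixel, then cut wherever enough OFF rows or columns follow. The sketch below is one way to write this, assuming the image is already a list of rows of 0/1 values; the function names and the end-exclusive coordinate convention are choices made for the example, and the five-column word gap is the value assumed in Section 3.2.

```python
def runs(flags, min_gap=1):
    """Return (start, end) pairs, end exclusive, for runs of ON flags.

    A run is closed only after `min_gap` consecutive OFF flags,
    so shorter gaps stay inside the run.
    """
    segments, start, gap = [], None, 0
    for i, on in enumerate(flags):
        if on:
            if start is None:
                start = i
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                segments.append((start, i - gap + 1))
                start, gap = None, 0
    if start is not None:
        # The image ended while a run was open; drop any trailing OFF flags.
        segments.append((start, len(flags) - gap))
    return segments


def segment_lines(image):
    """Section 3.1: rows from the first ON row up to the first OFF row."""
    return runs([any(row) for row in image])


def segment_columns(image, y1, y2, x1, x2, min_gap):
    """Sections 3.2 and 3.3: scan columns x1..x2 of the rows y1..y2."""
    flags = [any(image[y][x] for y in range(y1, y2)) for x in range(x1, x2)]
    return [(x1 + a, x1 + b) for a, b in runs(flags, min_gap)]


def segment_page(image, word_gap=5):
    """Return [((y1, y2), [[(x1, x2) per character] per word])] for each line."""
    result = []
    for y1, y2 in segment_lines(image):
        words = []
        for wx1, wx2 in segment_columns(image, y1, y2, 0, len(image[0]), word_gap):
            # Inside a word, a single OFF column separates two characters.
            words.append(segment_columns(image, y1, y2, wx1, wx2, 1))
        result.append(((y1, y2), words))
    return result
```

Because words and characters differ only in the gap threshold, a single helper serves both levels. Any character whose parts are separated by an OFF column will be split by this rule, which is consistent with the abstract's description of sub-character-level pieces.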
6. Experimental Results

Figure 4.1 shows the input to the system; once "recognize" is selected, the output appears at
the bottom.

Since we use a database approach for character recognition, we get 100% accuracy. The
limitation of this approach is that, for each character, we need to store details such as the
character ASCII value, character name, character BMP image, character width and length, and the
total number of ON pixels in the image. This takes a lot of space and involves a lot of
computation in recognizing each character, but it gives 100% accuracy.

Figure 4.1: Interface for viewing and editing documents in Kannada.

7. Conclusion & future work

In this paper, we have presented a database approach for recognizing Kannada characters.
Kannada is a widely used language in South India, and many applications need a Kannada OCR that
can give 100% accuracy. The database approach delivers the required accuracy, but with the
limitation described above. Recognition can also be carried out using neural networks or
Support Vector Machines, but not with the required accuracy. However, a dictionary approach
can be used to increase the accuracy.

References:

[1] R. Sanjeev Kunte and R. D. Sudhaker Samuel, "A simple and efficient optical character
recognition system for basic symbols in printed Kannada text".

[2] Bharath A and Sriganesh Madhvanath, "Hidden Markov Models for Online Handwritten Tamil
Word Recognition", HP Laboratories India, HPL-2007-108, July 6, 2007.

[3] T. V. Ashwin and P. S. Sastry, "A font and size-independent OCR system for printed Kannada
documents using support vector machines", Department of Electrical Engineering, Indian
Institute of Science, Bangalore 560 012, India.

[4] Rohana K. Rajapakse and A. Ruvan Weerasinghe, "A Neural Network based character
recognition system for Sinhala Script", Department of Statistics and Computer Science,
University of Colombo.

[5] Seethalakshmi R, "Optical Character Recognition for printed Tamil text using Unicode",
Thanjavur, Tamil Nadu.

About the authors

B.M. Sagar is a Lecturer in the Department of Information Science and Engineering. He obtained
his Master's Degree in Computer Science & Engineering from VTU and his B.E. in Computer Science
& Engineering from VTU. His research interest is Pattern Recognition. He has guided more than
25 undergraduate projects, and has presented and published papers at national and
international conferences.

Dr. Shobha G. is a Professor of Computer Science & Engineering. She was awarded a Ph.D. for her
thesis titled "Knowledge Discovery in Transactional Database Systems" by Mangalore University,
Mangalore. She obtained her M.S. degree in Software Systems from BITS, Pilani and her B.E. in
Computer Science from Gulbarga University. Her research interests are Data Mining, DBMS, and
Operating Systems & Networking. She has guided more than 30 undergraduate and 9 postgraduate
projects.

Dr. Ramakanth Kumar P. was awarded a Doctorate by Mangalore University and has around 14 years
of teaching experience in academia and industry. His areas of research are Artificial
Intelligence and Pattern Recognition. He has to his credit 3 National Journals, 2 International
Journals, 12 Conferences and 15 Research Publications. He is guiding 4 M.Tech. students and
3 Ph.D. students.
