2021 Textreader
Mohd Nadhir Ab Wahab¹, Ahmad Sufril Azlan Mohamed¹, Abdul Syafiq Abdull Sukor², Ong Chia Teng¹
¹School of Computer Sciences, Universiti Sains Malaysia, 11800, Penang
²Centre of Advanced Sensor Technology (CEASTech), Universiti Malaysia Perlis, 02600, Arau, Perlis
Abstract. Approximately 1.3 billion people in the world live with some form of visual impairment.
They usually read printed material in Braille, which limits them whenever a document is not printed
in Braille. Although many electronic devices can help them read, these devices are often too expensive
to afford. This paper therefore proposes an affordable mobile application designed for visually
impaired people. The application captures an image of printed material with the phone camera. The
captured image is converted to text using an Optical Character Recognition (OCR) framework, and the
text is then read aloud using a Text to Speech (TTS) framework. As a result, a visually impaired person
can understand printed material that is not written in Braille by listening instead of touching. Alert
sounds tell users what is happening in the application at each step. The application is therefore
user-friendly for visually impaired people, because its audio guidance lets them follow the progress
of the application at all times.
1. Introduction
Visual impairment is the loss of the ability to see as clearly as a person with normal vision.
According to the World Health Organization (WHO), about 2.3 billion people worldwide have some form
of visual impairment, which represents one-third of the world population [1]. Many of these people
cannot read textual information, which limits their independence in everyday life. According to the
International Classification of Diseases (ICD-11, 2019 version), a person is considered visually
impaired if their presenting distance visual acuity is worse than 3/60, and approximately 36 million
people fall into this category [2].
Table 1 shows the categories of visual impairment worldwide. Distance visual acuity is the basic
measurement used to assign an individual to one of these categories. It is tested with a Snellen chart,
which presents letters arranged in rows of decreasing size [3]. Each eye is tested separately while the
other eye is covered. A visually impaired person has a visual acuity worse than 3/60, which means that
at a distance of 3 meters the person can, at best, identify letters that a person with normal vision can
read from 60 meters [4].
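The 3/60 notation is a ratio of distances, so the threshold above can be checked with simple arithmetic. The minimal Python sketch below assumes acuity is given as the two Snellen distances in meters; the function names are illustrative and are not part of the described application.

```python
from fractions import Fraction

# Threshold quoted in the text: presenting distance acuity worse than 3/60.
THRESHOLD = Fraction(3, 60)


def snellen_acuity(test_distance_m: float, normal_distance_m: float) -> Fraction:
    """Snellen acuity as an exact fraction: the distance at which the person
    reads a letter divided by the distance at which a person with normal
    vision reads the same letter."""
    return Fraction(test_distance_m) / Fraction(normal_distance_m)


def worse_than_threshold(test_distance_m: float, normal_distance_m: float) -> bool:
    # A smaller fraction means poorer acuity, so "worse than 3/60" means "< 3/60".
    return snellen_acuity(test_distance_m, normal_distance_m) < THRESHOLD
```

For example, `worse_than_threshold(2, 60)` is true, while 3/60 itself sits exactly on the boundary and is not counted as worse. Using `Fraction` avoids floating-point rounding at that boundary.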
Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution
of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
Published under licence by IOP Publishing Ltd
5th International Conference on Electronic Design (ICED) 2020 IOP Publishing
Journal of Physics: Conference Series 1755 (2021) 012055 doi:10.1088/1742-6596/1755/1/012055
For reading and writing, visually impaired people use the Braille system. Braille is based on a ‘basic
cell’ of six raised dots arranged like a domino, and each letter of the alphabet is formed from a
combination of dots in this basic cell [5]. The 63 possible dot combinations represent not only the
letters of the alphabet but also punctuation and groups of letters known as contractions, such as isn’t,
aren’t and don’t [6]. Although Braille is a useful way for visually impaired people to read without
sight, the system has some limitations.
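Because each of the six dots is either raised or flat, the number of non-empty patterns is 2^6 - 1 = 63, which matches the count above. The sketch below, an illustration not taken from the paper, enumerates those patterns and maps a pattern to the Unicode Braille Patterns block (U+2800 plus a bitmask in which dot n sets bit n - 1). For example, the English Braille letters a (dot 1), b (dots 1 and 2) and c (dots 1 and 4) map to ⠁, ⠃ and ⠉.

```python
from itertools import combinations

DOTS = range(1, 7)  # dots 1-6 of the standard six-dot Braille cell


def all_patterns():
    """Every non-empty set of raised dots in a six-dot cell."""
    return [set(c) for k in range(1, 7) for c in combinations(DOTS, k)]


def cell_to_unicode(dots) -> str:
    """Map a set of raised dots to its character in the Unicode Braille
    Patterns block: dot n sets bit n-1 of the offset from U+2800."""
    offset = 0
    for d in dots:
        if d not in DOTS:
            raise ValueError(f"invalid dot number: {d}")
        offset |= 1 << (d - 1)
    return chr(0x2800 + offset)
```

Every pattern maps to a distinct character, so `len({cell_to_unicode(p) for p in all_patterns()})` is also 63.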
A survey found that the average Braille reading rate of children is just under half the reading rate of
sighted print readers [7]. This is because sighted readers can take in several words at once, whereas
fingers pass over Braille one word at a time. Reading material is also limited, because many books,
papers and journals are not printed in Braille. Many electronic devices can help visually impaired
people read, but they are comparatively expensive [8], which limits the reading media available to
them. Furthermore, learning Braille and learning to use a new electronic device are both
time-consuming [9].
This paper proposes an application that is convenient for visually impaired people because it runs on
a smartphone. The mobile application allows users to capture an image of printed material. The image
is then converted to text using an Optical Character Recognition (OCR) approach [10]. Finally, the
resulting text is converted to speech and spoken aloud using a Text to Speech (TTS) framework [11].
2. System Design
Figure 1 shows the system architecture of the text reader system. Users install the Text Reader
application on their Android smartphone. The user captures an image as input to the app, and the app
processes the image (converting the image to speech). After processing, the Text Reader application
delivers the speech output to the user and reads it aloud.
The designed system consists of three modules. First, the Pre-process Image module converts the
original image into a better representation, which helps to obtain a clear output. Next, the OCR module
converts the pre-processed image into text using an OCR algorithm. Finally, the TTS module converts the
obtained text to speech. Table 2 summarizes the input and output of each module in the app.
Table 2. Details of Text Reader System
No   Module                                Input                 Output
1    Pre-process Image                     Image                 Pre-processed Image
2    Optical Character Recognition (OCR)   Pre-processed Image   Text
3    Text to Speech (TTS)                  Text                  Speech
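The three modules in Table 2 form a simple chain in which each module's output is the next module's input. The minimal Python sketch below assumes the pre-processing step is grayscale conversion (ITU-R BT.601 luma weights) followed by a fixed global threshold, which is a common OCR pre-processing choice but is not stated in the paper; `ocr` and `tts` are hypothetical callables standing in for the OCR and TTS frameworks.

```python
from typing import Callable, List, Tuple

Pixel = Tuple[int, int, int]      # (R, G, B), each 0-255
RGBImage = List[List[Pixel]]
BinaryImage = List[List[int]]     # each pixel is 0 (black) or 255 (white)


def preprocess(image: RGBImage, threshold: int = 128) -> BinaryImage:
    """Assumed pre-processing: grayscale via BT.601 luma weights,
    then binarization with a fixed global threshold."""
    binary = []
    for row in image:
        out_row = []
        for r, g, b in row:
            luma = 0.299 * r + 0.587 * g + 0.114 * b
            out_row.append(255 if luma >= threshold else 0)
        binary.append(out_row)
    return binary


def text_reader_pipeline(image: RGBImage,
                         ocr: Callable[[BinaryImage], str],
                         tts: Callable[[str], bytes]) -> bytes:
    """Chain the modules of Table 2: image -> pre-processed image -> text -> speech."""
    pre_processed = preprocess(image)
    text = ocr(pre_processed)
    return tts(text)
```

Passing the OCR and TTS engines in as functions keeps the chain testable without a camera or speech engine; on a phone they would wrap whatever OCR and TTS libraries the app actually uses.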
Figure 2 shows the flowchart of the text reader system. When the application starts, it opens the camera
so that the user can capture an image. Once an image has been captured, the application processes it; if
the user does not want to capture any more images, the application stops. During processing, the captured
image is converted to text and the text is converted to speech. When the conversion is complete, the text
is spoken aloud. The user can then choose to capture another image or to terminate the application.
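The flowchart amounts to a loop that repeats capture, conversion and speech until the user stops. In this Python sketch, `captures` is a hypothetical stream of captured images in which `None` stands for the user declining to capture another image, and `image_to_text` and `speak` are placeholders for the conversion and speech steps.

```python
from typing import Callable, Iterable, Optional


def run_text_reader(captures: Iterable[Optional[bytes]],
                    image_to_text: Callable[[bytes], str],
                    speak: Callable[[str], None]) -> int:
    """Follow the flowchart: capture -> convert to text -> speak, repeated
    until the user chooses to stop. Returns the number of images read."""
    images_read = 0
    for image in captures:
        if image is None:
            break  # the user chose not to capture another image
        text = image_to_text(image)
        speak(text)
        images_read += 1
    return images_read
```

Anything after the first `None` is never processed, mirroring the point in the flowchart where the application terminates.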
3. System Implementation
If the phone runs a version earlier than Android 6.0 (API level 23), or the app’s targetSdkVersion is
lower than 23, permissions are granted at install time rather than requested at runtime, so the user can
proceed directly to the phone camera, as shown in Figure 5.
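The condition above can be written as a small predicate. On Android, runtime permission prompts only apply when the device runs API level 23 or higher and the app targets API level 23 or higher; in an app, the two values would come from `Build.VERSION.SDK_INT` and the app's `targetSdkVersion`. This Python sketch only models that rule and is not the application's code.

```python
MARSHMALLOW_API_LEVEL = 23  # Android 6.0


def needs_runtime_permission_prompt(device_api_level: int, target_sdk_version: int) -> bool:
    """True only when both the device and the app's target are API 23 or later;
    otherwise the legacy install-time permission model applies."""
    return (device_api_level >= MARSHMALLOW_API_LEVEL
            and target_sdk_version >= MARSHMALLOW_API_LEVEL)
```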
When the user must grant permissions at runtime, the Text Reader application plays the notification
sound “Phone permissions is needed” to alert the visually impaired user that the permission settings
need attention. The application proceeds to the camera only if the user allows both the camera and
storage permissions; if the permissions are denied, the application is terminated. If “deny & don’t ask
again” is chosen, the application terminates and cannot be launched until the user changes its
permissions in the phone settings. The message “Some permissions is Denied” pops up whenever the app
is terminated because of a permissions issue.
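The permission outcomes described above can be summarized as a decision function. This is a hypothetical Python model (`Grant`, `Decision` and `after_permission_dialog` are illustrative names) in which the app opens the camera only when both the camera and storage permissions are granted, and a “deny & don’t ask again” answer prevents the app from prompting again on the next launch.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Grant(Enum):
    GRANTED = "granted"
    DENIED = "denied"
    DENIED_DONT_ASK_AGAIN = "deny & don't ask again"


@dataclass(frozen=True)
class Decision:
    open_camera: bool
    popup: Optional[str]          # message shown when the app is terminated
    prompt_again_on_launch: bool  # whether the next launch may ask again


DENIED_MESSAGE = "Some permissions is Denied"  # wording as quoted in the text


def after_permission_dialog(camera: Grant, storage: Grant) -> Decision:
    """Decide what the app does once the user has answered the permission dialog."""
    if camera is Grant.GRANTED and storage is Grant.GRANTED:
        return Decision(open_camera=True, popup=None, prompt_again_on_launch=False)
    permanently_denied = Grant.DENIED_DONT_ASK_AGAIN in (camera, storage)
    # Any denial terminates the app; a permanent denial also blocks future prompts.
    return Decision(open_camera=False, popup=DENIED_MESSAGE,
                    prompt_again_on_launch=not permanently_denied)
```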
Since a visually impaired person needs someone else’s help to install the Text Reader application, that
helper is also asked to set up the phone permissions when the application is launched for the first time.
When all the requirements are fulfilled (a rectangular object is fixed in the middle of the screen and is
not too small), the application plays the sound “Hold for 2 seconds” before capturing the image
automatically. There are three ways to capture an image in this application: auto-capture, tapping the
button in the middle at the bottom of the screen, and pressing the volume down button, which is easy to
find by touch.
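The auto-capture rule combines a framing check with a two-second hold. In the Python sketch below, the rectangle is assumed to come from some object detector, and the centering tolerance (15% of the frame) and minimum area (25% of the frame) are hypothetical values, since the paper only says the object must be in the middle of the screen and not too small; only the 2-second hold comes from the text.

```python
from dataclasses import dataclass
from typing import Optional

HOLD_SECONDS = 2.0  # from the "Hold for 2 seconds" prompt


@dataclass
class Rect:
    left: float
    top: float
    right: float
    bottom: float


def is_well_framed(rect: Rect, frame_w: float, frame_h: float,
                   center_tolerance: float = 0.15,
                   min_area_ratio: float = 0.25) -> bool:
    """Hypothetical check: the rectangle's center lies near the frame center
    and the rectangle covers at least min_area_ratio of the frame."""
    cx = (rect.left + rect.right) / 2
    cy = (rect.top + rect.bottom) / 2
    centered = (abs(cx - frame_w / 2) <= center_tolerance * frame_w and
                abs(cy - frame_h / 2) <= center_tolerance * frame_h)
    area_ratio = ((rect.right - rect.left) * (rect.bottom - rect.top)) / (frame_w * frame_h)
    return centered and area_ratio >= min_area_ratio


class AutoCaptureTimer:
    """Fires once the object has stayed well framed for HOLD_SECONDS."""

    def __init__(self) -> None:
        self._framed_since: Optional[float] = None

    def update(self, framed: bool, now: float) -> bool:
        if not framed:
            self._framed_since = None  # object moved away: restart the hold
            return False
        if self._framed_since is None:
            self._framed_since = now  # the point at which "Hold for 2 seconds" would play
        return now - self._framed_since >= HOLD_SECONDS
```

Resetting the timer whenever framing is lost means a brief wobble restarts the two-second hold instead of triggering a blurred capture.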
When the image is captured, the sound “Image is captured. Saved Image” is played to alert the user. The
application then converts the captured image to speech in the back end and notifies the user with the
sound “Please wait, the image is converting to text”. The user can also press the volume up button to
exit the application.
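On the camera screen, the volume keys act as physical buttons that a user can locate by touch. The Python sketch below models that mapping; the constants 24 and 25 are the values of Android's `KeyEvent.KEYCODE_VOLUME_UP` and `KeyEvent.KEYCODE_VOLUME_DOWN`, and in an Android activity such logic would typically live in `onKeyDown`. The function name and returned action strings are hypothetical.

```python
from typing import List, Tuple

KEYCODE_VOLUME_UP = 24    # value of android.view.KeyEvent.KEYCODE_VOLUME_UP
KEYCODE_VOLUME_DOWN = 25  # value of android.view.KeyEvent.KEYCODE_VOLUME_DOWN

CAPTURED = "Image is captured. Saved Image"
CONVERTING = "Please wait, the image is converting to text"


def on_camera_key(keycode: int) -> Tuple[str, List[str]]:
    """Return (action, announcements to play in order) for a key on the camera screen."""
    if keycode == KEYCODE_VOLUME_DOWN:
        return "capture", [CAPTURED, CONVERTING]
    if keycode == KEYCODE_VOLUME_UP:
        return "exit", []
    return "ignore", []  # other keys keep their default behavior
```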
In case 1, the captured image is clear and the result is good. In case 2, the image is poorly captured
and the result is bad, possibly because the image is blurred, contains no text, or contains text that is
not in English. In that case, the application asks the user to capture the image again, but still allows
the user to listen to the resulting text.
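The paper does not say how the application distinguishes a good result from a bad one. As a purely hypothetical illustration, the Python sketch below accepts OCR output only when it has enough tokens and most of them look like English words; the thresholds `min_words` and `min_ratio` are arbitrary assumptions, not values from the authors.

```python
import re

# One English-looking word: letters, optionally with one apostrophe part (e.g. "don't").
WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")
PUNCTUATION = ".,;:!?\"()"


def ocr_result_seems_usable(text: str, min_words: int = 2, min_ratio: float = 0.6) -> bool:
    """Hypothetical heuristic: accept the OCR output only if it has at least
    min_words tokens and at least min_ratio of them look like English words."""
    tokens = text.split()
    if len(tokens) < min_words:
        return False
    wordlike = sum(1 for t in tokens if WORD.fullmatch(t.strip(PUNCTUATION)))
    return wordlike / len(tokens) >= min_ratio
```

A check like this would flag blurred or non-text images, where OCR tends to return symbol noise, while still leaving the text available for the user to hear.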
In both cases, the application repeatedly reminds the user that tapping anywhere on the screen plays or
stops the speech. When the user taps the screen, the text is spoken in the format “Text of image that had
been captured is ……”. The user can press the volume down button to return to the camera and capture
another image, or press the volume up button to exit the application, in which case the sound “bye, see
you” is played.
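The result screen can be modeled as a small state holder: a tap anywhere toggles speech, volume down returns to the camera, and volume up exits with “bye, see you”. In this hypothetical Python model, the spoken text is the prefix quoted above followed by the OCR text, which is what the “……” in the format stands for; the class and method names are illustrative, and the key values are those of Android's `KeyEvent` volume key codes.

```python
from typing import Optional, Tuple

KEYCODE_VOLUME_UP = 24    # value of android.view.KeyEvent.KEYCODE_VOLUME_UP
KEYCODE_VOLUME_DOWN = 25  # value of android.view.KeyEvent.KEYCODE_VOLUME_DOWN
PREFIX = "Text of image that had been captured is "


class ResultScreen:
    """Hypothetical model of the result screen described in the text."""

    def __init__(self, ocr_text: str) -> None:
        self.utterance = PREFIX + ocr_text
        self.speaking = False

    def on_tap(self) -> Optional[str]:
        """Toggle playback; return the utterance to speak when speech starts."""
        self.speaking = not self.speaking
        return self.utterance if self.speaking else None

    def on_key(self, keycode: int) -> Tuple[str, Optional[str]]:
        """Return (next screen, sound to play) for a key press."""
        if keycode == KEYCODE_VOLUME_DOWN:
            self.speaking = False
            return "camera", None
        if keycode == KEYCODE_VOLUME_UP:
            self.speaking = False
            return "exit", "bye, see you"
        return "result", None
```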
3.3 System Implementation
can be played using a wave file player on the smartphone. The speech waveform varies according to the
text produced by the OCR.
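A wave file is a RIFF/WAVE header followed by raw PCM samples. The Python sketch below writes mono 16-bit samples with the standard `wave` module; the sine tone is only a placeholder for synthesized speech, and the 16 kHz sample rate is an assumption. On Android, `TextToSpeech.synthesizeToFile` is one API that writes synthesized speech to a file, but the paper does not state which call the application uses.

```python
import io
import math
import struct
import wave


def write_pcm_wav(samples, sample_rate: int = 16000) -> bytes:
    """Pack mono 16-bit PCM samples (ints in [-32768, 32767]) into WAV bytes."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)           # mono
        wav.setsampwidth(2)           # 16-bit samples
        wav.setframerate(sample_rate)
        # WAV stores 16-bit PCM as little-endian signed integers.
        wav.writeframes(struct.pack("<%dh" % len(samples), *samples))
    return buf.getvalue()


def placeholder_tone(seconds: float = 0.1, freq: float = 440.0,
                     sample_rate: int = 16000):
    """Synthetic sine tone standing in for TTS output (not real speech)."""
    n = int(seconds * sample_rate)
    return [int(12000 * math.sin(2 * math.pi * freq * i / sample_rate)) for i in range(n)]
```

The bytes returned by `write_pcm_wav` can be saved with a `.wav` extension and opened by any standard wave file player.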
Acknowledgement
This project is supported by USM Short Term Grant (PKOMP/6315262) and is part of a collaboration
project between the Robotics, Computer Vision, and Image Processing (RCVIP) Research Group of
Universiti Sains Malaysia (USM) and the Centre of Advanced Sensor Technology (CEASTech), Universiti
Malaysia Perlis (UniMAP).
References
[1] World Health Organization (WHO), World Report on Vision. 2014.
[2] World Health Organization (WHO), “International Statistical Classification of Diseases and Related Health
Problems,” 2011.
[3] I. S. for the E. of Eyesight, “20/20 Vision Activity – Eye Chart,” 2006.
[4] M. Bowen et al., “The Prevalence of Visual Impairment in People with Dementia (the PrOVIDe
study): a cross-sectional study of people aged 60–89 years with dementia and qualitative
exploration of individual, carer and professional perspectives,” Heal. Serv. Deliv. Res., vol. 4,
no. 21, pp. 1–200, 2016.
[5] T. Saba, G. Sulong, and A. Rehman, “A Survey on Methods and Strategies on Touched
Characters Segmentation,” Int. J. Res. Rev. Comput. Sci., vol. 1, no. 2, pp. 103–114, 2010.
[6] K. Vijayabharathi and V. Mahalakshmi, “Implementation of OCR Using Raspberry Pi for
Visually Impaired Person,” Int. J. Pure Appl. Math., vol. 119, no. 15, pp. 111–117, 2018.
[7] D. Dimitrova, “Students with Visual Impairments: Braille Reading Rate,” Int. J. Cogn. Res. Sci.
Eng. Educ., vol. 3, no. 1, pp. 1–6, 2015.
[8] L. A. Vader, “Measuring Vision and Vision Loss,” Nurs. Clin. North Am., vol. 27, no. 3, pp.
705–714, 2009.
[9] E. Ashrafi et al., “National and sub-national burden of visual impairment in Iran 1990–2013;
Study protocol,” Arch. Iran. Med., vol. 17, no. 12, pp. 810–815, 2014.
[10] S. K. Singla and R. K. Yadav, “Optical character recognition based speech synthesis system
using LabVIEW,” J. Appl. Res. Technol., vol. 12, no. 5, pp. 919–926, 2014.
[11] N. Jondhale and S. Gupta, “Reading text extracted from an image using OCR and android Text
to Speech,” Int. J. Latest Eng. Manag. Res. (IJLEMR), ISSN 2455-4847, vol. 03, no. 04, pp. 64–67, 2018.
[12] H. Esmaeel, “Apply Android Studio (SDK) Tools,” Int. J. Adv. Res. Comput. Sci. Softw. Eng.,
vol. 5, no. 5, pp. 88–92, 2019.