
Multi-script Handwritten Digit Recognition Using Multi-task Learning

Mesay Samuel Gondere1, Lars Schmidt-Thieme2, Durga Prasad Sharma1, and Randolf Scholz2

1 Arba Minch University, Faculty of Computing and Software Engineering, Ethiopia
{mesay.samuel,sharma.dp}@amu.edu.et
2 Information Systems and Machine Learning Lab, 31141 Hildesheim, Germany
{schmidt-thieme,scholz}@ismll.uni-hildesheim.de

arXiv:2106.08267v1 [cs.CV] 15 Jun 2021

Abstract. Handwritten digit recognition is one of the most extensively studied areas in machine learning. Apart from the wide research on handwritten digit recognition on the MNIST dataset, there are many other research works on the recognition of various scripts. However, multi-script digit recognition, which encourages the development of robust and multipurpose systems, is not very common. Additionally, working on multi-script digit recognition enables multi-task learning, for instance by considering script classification as a related task. It is evident that multi-task learning improves model performance through inductive transfer using the information contained in related tasks. Therefore, in this study multi-script handwritten digit recognition using multi-task learning is investigated. As a specific case demonstrating the solution, Amharic handwritten character recognition is also experimented with. The handwritten digits of three scripts, Latin, Arabic, and Kannada, are studied to show that multi-task models with a reformulation of the individual tasks yield promising results. A novel way of using the individual tasks' predictions is proposed to help classification performance and regularize the different losses for the purpose of the main task. This approach outperformed the baseline and the conventional multi-task learning models. More importantly, it avoided the need for weighting the different losses of the tasks, which is one of the challenges in multi-task learning.

Keywords: Multi-script · Handwritten Digit Recognition · Multi-task Learning · Amharic Handwritten Character Recognition

1 Introduction
Handwritten digit recognition is commonly known as the "Hello World" of machine learning. Accordingly, it has been studied widely for different languages [21,17,12]. However, this is not the case for multi-script digit recognition, which encourages the development of robust and multipurpose systems, even though in practice it is possible to see multiple scripts in a single document. More importantly, working on multi-script recognition opens a way for multi-task learning (MTL), for instance by considering script classification as an auxiliary task. Deep learning methods have shown very good recognition performance on such classification tasks. Apart from these success stories, the requirement for large amounts of data, the issue of over-fitting, and the computation cost of complex models have remained challenges in deep learning. On the other hand, multi-task learning seems to offer remedies for these. With multi-task learning one can address multiple problems at once, reducing the need for individual models [29]. On top of this, it has been shown to regularize models, which prevents over-fitting [27]. One can also take advantage of multi-task learning to increase the amount of data, which is usually required in machine learning. However, multi-task learning itself is not free from challenges. Combining the losses of the different tasks, tuning the hyper-parameters, and using the estimate of one task as a feature for another task are the major challenges in multi-task learning [18].
In this study we make use of multi-task learning and also avoid one of its challenges, namely combining the different weighted losses. First we introduce the formulation of a multi-task learning setting from the individual tasks. The motivation behind this formulation is to bring the problem of Amharic, Indian, Japanese, and related character recognition [14,17,4,28] to a more general setting, so that researchers can contribute to the solution with ease. In these languages the alphabets can be organized in a matrix form, where one can exploit the information available over the rows and columns as they exhibit similarities (Figs. 1, 2, 3). Since there is no baseline with this method, we aim at presenting an exploratory investigation towards a higher goal. Hence, in this study we organize the main task (classifying the exact label) into additional row and column classification tasks as shown in Table 1. All these digits belong to Hindu-Arabic numeral systems, where the widespread Western Arabic numerals are used with Latin scripts and the Eastern Arabic numerals are used with Arabic scripts. Kannada, with its own script, is the official and administrative language of the state of Karnataka in India [7,21]. In this study, this general method will also be applied to the specific case of Amharic handwritten character recognition.
Finally, we compare three models. The first is a baseline model that classifies each label as a thirty-class (3 scripts × 10 digits) classification problem. Using a related task as an auxiliary task is the classical choice for MTL [25]. Hence, the second model employs conventional multi-task learning, treating the classification of the rows (scripts) and columns (digits) as auxiliary tasks. The third model, the proposed one, also applies multi-task learning, however with a new way of controlling and exploiting the information contained in the related tasks. This is done by creating a four-class classification problem as an auxiliary task. The four class labels indicate whether the main task is good at identifying the digit, the language, the label (both digit and language), or none. By doing this we obtain information about the training behaviour of the different tasks, which can be used to help and control the main task. Since this study emphasizes showing the useful formulation and advantage of
multi-task learning, we adapted the pretrained ResNet [16] model to build our models. The rest of this paper is organized as follows: related works are reviewed in the next section. Section 3 outlines the methodology followed for the study, and experimental results are discussed in Section 4. Finally, conclusions and future works are forwarded.

Contributions of the Paper:


i. Presents a possible formulation of individual tasks into a multi-task learning setting
ii. Proposes a novel way of exploiting auxiliary tasks to regularize and help the main task
iii. Demonstrates the proposed method on the specific case of Amharic handwritten character recognition

Fig. 1. Parts of the Amharic alphabet.

Fig. 2. Parts of the Devanagari alphabet.

Fig. 3. Parts of the Japanese Hiragana alphabet.

2 Related Works
There are some related works on multi-script recognition, and a few of them employ multi-task learning. Sadeghi et al. [26] performed a comparative study between monolingual and bilingual training (Persian and Latin digits) using deep neural networks. They reported the superior performance of bilingual networks in handwritten digit recognition, thereby suggesting that mastering multiple languages might facilitate knowledge transfer across similar domains. Bai et al. [8] proposed a shared-hidden-layer deep convolutional neural network (SHL-CNN), in which the input and hidden layers are shared across characters of different tasks while the final soft-max layer is task dependent. They used Chinese and English superimposed texts and showed that the SHL-CNN reduces recognition errors by 16-30% relative to models trained on characters of only one language. Maitra et al. [19] employed six databases: MNIST, Bangla numerals, Devanagari numerals, Oriya numerals, Telugu numerals, and Bangla basic characters. They used the larger class (Bangla basic characters) pretrained on a CNN as a feature extractor, aiming to show that transfer learning results in good performance on other scripts with smaller classes. None of the above-mentioned works addressed balancing the effects of the related tasks.

Table 1. Organization of the individual tasks into multi-task settings.
Multi-Task Learning (MTL) is a learning paradigm in machine learning whose aim is to leverage useful information contained in multiple related tasks to help improve the generalization performance of all the tasks [29]. Technically, it optimizes more than one loss function, in contrast to single-task learning. We can view multi-task learning as a form of inductive transfer. Inductive transfer can help improve a model by introducing an inductive bias provided by the auxiliary tasks, which causes the model to prefer hypotheses that explain more than one task [29,25]. According to Zhang et al. [29], MTL algorithms are classified into five categories: feature learning (feature transformation and feature selection approaches), low-rank, task clustering, task relation learning, and decomposition. The approach used in this study is homogeneous, parameter-based MTL with a decomposition approach. In this case the tasks are decomposed by their relevance, and usually the main task remains unpenalized. Zhang et al. [29] suggest decomposition as a good MTL approach, with the limitation of the black box associated with its coefficients, and put forward its formalization as future work, since there is no guarantee that MTL is better than single-task learning.
Ruder [25] introduced the two most common methods for MTL in deep learning, soft parameter sharing and hard parameter sharing. As in most computer vision tasks, this study uses hard parameter sharing, where the hidden layers are shared between all tasks while task-specific output layers are kept. The author [25] further stresses that only a few papers have looked at developing better mechanisms for MTL in deep neural networks, and that our understanding of tasks, their similarity, relationship, hierarchy, and benefit for MTL is still limited. Since multi-task learning models are sensitive to task weights, and task weights are typically selected through extensive hyperparameter tuning, Guo et al. [15] introduced dynamic task prioritization for multi-task learning. This avoids imbalances in task difficulty, which can lead to unnecessary emphasis on easier tasks, thus neglecting and slowing progress on difficult tasks.
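The hard-parameter-sharing scheme described above, shared hidden layers with task-specific output layers, can be sketched in a few lines of PyTorch. The toy trunk below is illustrative only (the paper's models are ResNet-based), and the class and head names are our own:

```python
import torch
import torch.nn as nn

class SharedTrunkMTL(nn.Module):
    """Hard parameter sharing: one shared trunk, one head per task."""

    def __init__(self, n_labels=30, n_scripts=3, n_digits=10):
        super().__init__()
        # Shared hidden layers (illustrative; the paper uses a ResNet trunk).
        self.trunk = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28 * 28, 128),
            nn.ReLU(),
        )
        # Task-specific output layers.
        self.head_label = nn.Linear(128, n_labels)    # main task
        self.head_script = nn.Linear(128, n_scripts)  # auxiliary task
        self.head_digit = nn.Linear(128, n_digits)    # auxiliary task

    def forward(self, x):
        z = self.trunk(x)
        return self.head_label(z), self.head_script(z), self.head_digit(z)
```

Each head receives gradients through the same trunk, which is what gives hard parameter sharing its regularizing effect.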
Research works on Amharic document recognition in general lack combined effort, mainly due to the unavailability of a publicly available standard dataset. Accordingly, different techniques have been applied at different times without tracing and following a common baseline. Worth mentioning is the work done by Assabie et al. [6] on handwritten Amharic word recognition. Betselot et al. [23] also worked on handwritten Amharic character recognition. Both works used their own datasets and employed conventional machine learning techniques. Recently, some encouraging works have been emerging on Amharic character recognition applying deep learning techniques. The different authors emphasized different types of documents, including printed [2,9,13], ancient [20,11], and handwritten documents [14,13,1]. Accordingly, the research efforts in this regard lack complementing each other and improving results based on a clear baseline.

3 Methodology
This section outlines the dataset preparation and organization of the models for
the experiments.

3.1 Dataset Preparation


We have used the publicly available datasets MNIST [24], MADBase [12], and Kannada MNIST [21] for the Latin, Arabic, and Kannada handwritten digit scripts respectively. All these datasets consist of 28 × 28 pixel images, each with 60,000 training and 10,000 test samples. We have used 16% of the training set for validation with a balanced stratified split.
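A balanced stratified split like the one described above can be sketched in a few lines; real pipelines would typically use a library routine such as scikit-learn's train_test_split with stratify=labels, but the idea is simply to take the same fraction from every class (function name and structure here are ours, not the paper's):

```python
from collections import defaultdict

def stratified_split(labels, val_fraction=0.16):
    """Return (train_idx, val_idx) with val_fraction of each class in val."""
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)
    train_idx, val_idx = [], []
    for lab, idxs in by_class.items():
        n_val = int(round(len(idxs) * val_fraction))
        # Deterministic for the sketch; shuffle idxs first in practice.
        val_idx.extend(idxs[:n_val])
        train_idx.extend(idxs[n_val:])
    return train_idx, val_idx
```

Because every class contributes the same fraction, the validation set keeps the class balance of the training set.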
The dataset for the Amharic handwritten character recognition experiment was organized from Assabie et al. and Samuel et al. [5,14]. It was organized into 77 characters in an 11 (row) by 7 (column) tabular structure as shown in Table 2. This was done intentionally to keep the number of classes small compared to the number of samples available per character, which is 150. Another reason was to see the application of the proposed method in a balanced dataset setup on visually similar characters.

3.2 Organization of the Experiments


In this study the main task is to classify each label from all three scripts. That is, a trained model is expected to classify each image as "digit X from script Y". Hence, our baseline model is a thirty-class (3 scripts × 10 digits) classification model, the blue line in Fig. 4. Another baseline is the usual multi-task learning, the outer rectangle in Fig. 4, which uses the advantage of reformulating the main problem into two auxiliary tasks. The third, proposed approach introduces a novel way of integrating multi-task learning to extract relevant information from the auxiliary tasks while balancing their effect on each other. For the usual multi-task learning we show the optimum performance obtained by tuning the loss weights, which is better than the first baseline, the 30-class vanilla model. To give a glance at how specialized models perform, we also experimented with the three independent single-task models. All the experiments conducted in this study are described in Table 3.

Table 2. 77 visually similar Amharic characters.
The proposed method removes the two auxiliary tasks and introduces one reformulated auxiliary task instead: a four-class classification problem whose classes are getting both the row and column (the label), only the row (language), only the column (digit), and missing both. This information can be obtained from the main task itself by converting the predicted label into a row and a column using row = label div 10 and column = label mod 10 respectively. This helps to learn properties of the characters, such as how they confuse the model during training, without the tasks affecting each other. We give the highest number, 3, as the label for getting both the row and the column, which in turn signals over-fitting. We also sum these numbers within each batch and use the result as a factor multiplied into the loss of the main task. That is, the better the model already knows the labels, the larger the main loss becomes. Therefore, the model prefers to minimize the second loss instead, that is, predicting the properties of the characters into four classes. This balances and controls the whole training process.
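The label decomposition and the four-class auxiliary target described above can be sketched as follows. Function and variable names are ours, not the paper's, and the assignment of 2 vs. 1 to "row only" vs. "column only" is an assumption; the paper only fixes 3 for getting both:

```python
def split_label(label: int, n_cols: int = 10) -> tuple[int, int]:
    """Decompose a joint label into (row, column): row = label div n_cols,
    column = label mod n_cols. Use n_cols=10 for the three digit scripts
    and n_cols=7 for the 11 x 7 Amharic character table."""
    return label // n_cols, label % n_cols

def aux_label(true_label: int, pred_label: int, n_cols: int = 10) -> int:
    """Four-class auxiliary target: 3 = row and column both correct,
    2 = row (script) only, 1 = column (digit) only, 0 = neither."""
    t_row, t_col = split_label(true_label, n_cols)
    p_row, p_col = split_label(pred_label, n_cols)
    # Bool arithmetic encodes the four cases as 0..3 directly.
    return 2 * (t_row == p_row) + (t_col == p_col)
```

Summing aux_label over a batch then yields the factor that scales the main loss in the proposed objective.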
Likewise, the same procedure is followed for Amharic handwritten character recognition. Here the number of rows is 11 instead of 3 and the number of columns is 7 instead of 10. The 3 × 10 = 30 class classification problem becomes an 11 × 7 = 77 class classification problem in the case of Amharic characters.

Fig. 4. Model structure. The blue line shows the baseline model, the outer rectangle shows the multi-task models, and the inner rectangle represents the individual (single-task) models.

L_Base = l(y, ŷ)                                        (1)
L_Wloss = l(y, ŷ) + σ1 · l(y1, ŷ1) + σ2 · l(y2, ŷ2)     (2)
L_New = factor · l(y, ŷ) + l(ya, ŷa)                    (3)

where y, y1, y2 and ŷ, ŷ1, ŷ2 are the ground truths and predictions of the label, digit, and script classes respectively, ya and ŷa are the ground truth and prediction of the four-class auxiliary task, and l is the cross-entropy loss.
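Equations (1)-(3) map directly to code. The sketch below is framework-free on purpose: `l` is any loss callable (cross-entropy in the paper), and the function names are ours:

```python
def loss_base(l, y, y_hat):
    # Eq. (1): a single 30-class loss on the joint label.
    return l(y, y_hat)

def loss_wloss(l, y, y_hat, y1, y1_hat, y2, y2_hat, sigma1, sigma2):
    # Eq. (2): main loss plus weighted digit (y1) and script (y2) auxiliary
    # losses; sigma1 and sigma2 must be tuned by hand.
    return l(y, y_hat) + sigma1 * l(y1, y1_hat) + sigma2 * l(y2, y2_hat)

def loss_new(l, y, y_hat, ya, ya_hat, factor):
    # Eq. (3): `factor` is accumulated from the per-batch auxiliary labels,
    # so the main loss grows as the model gets better at the exact labels,
    # steering optimization toward the four-class auxiliary task (ya).
    return factor * l(y, y_hat) + l(ya, ya_hat)
```

Note that Eq. (3) has no hand-tuned weights at all, which is the point of the proposed formulation.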

4 Experimental Results

Due to our emphasis on showing the useful formulation and advantage of multi-task learning over individual tasks, in all the models in these experiments we adapted the pretrained ResNet model from torchvision.models. We used a mini-batch size of 32 and the Adam optimizer. All these configurations are kept unchanged between the individual, baseline, and multi-task models. All the experiments were performed using the PyTorch 1.3.0 machine learning framework on GPU nodes connected to a computing cluster at the Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim.

Table 3. Description of the different models in the experiment.

Model Name | Equation | Description
Lat   | (1) | Single-task model trained on the Latin digits
Arab  | (1) | Single-task model trained on the Arabic digits
Kan   | (1) | Single-task model trained on the Kannada digits
Base  | (1) | Single-task model trained on all three scripts
Wloss | (2) | Multi-task model with loss weights σ1 = 0.2, σ2 = 0.3 for multi-script recognition and σ1 = 0.65, σ2 = 0.35 for Amharic recognition
New   | (3) | The newly proposed model
Each model was run for up to 100 epochs, three times. The average results on the test sets from the three evaluations of each model are presented in Table 4. The accuracy and loss curves of the competing models (baseline, conventional multi-task, and the proposed multi-task model) are shown in Fig. 5 for multi-script recognition and in Fig. 6 for Amharic recognition. Further, Fig. 7 shows how the proposed multi-task model regularizes the main task compared to conventional multi-task learning.

Table 4. Accuracy scores of the models on the test sets.

Model | Latin digits | Arabic digits | Kannada digits | Average | Range | Amharic characters
Lat   | 98.45 | 0     | 0     | -     | -    | -
Arab  | 0     | 98.49 | 0     | -     | -    | -
Kan   | 0     | 0     | 96.25 | -     | -    | -
Base  | 97.19 | 95.51 | 94.90 | 95.87 | 2.99 | 73.83
Wloss | 97.18 | 97.94 | 96.13 | 97.08 | 1.81 | 74.68
New   | 97.85 | 98.07 | 97.23 | 97.71 | 0.84 | 75.91

The results in Table 4 show the advantages gained from multi-task learning. However, finding the optimum sigmas for weighting the different losses in the conventional multi-task setting is expensive. In contrast, the proposed multi-task approach performed best and is shown to be robust enough not to be affected by the auxiliary tasks, without any need for these hidden coefficients. This is also likely one reason why the proposed model shows the minimum range between its scores.

As can be seen in Fig. 7, the proposed model enforces a regularization effect on the main task. This is visible in the oscillating behavior of the auxiliary task while a relatively smooth curve is maintained for the main task, an interesting and expected behavior, since we aim at being good at the main task. The technique incorporates the auxiliary task, their combined contribution, and the usual main task. It is formulated in such a way that optimizing the loss implicitly observes the main task and enforces that the contribution of the auxiliary task agrees with the main task. Even though this tie-up is feasible for this particular problem, it is still possible to untie it by introducing desired operations that allow focus on the auxiliary tasks as well when needed. More importantly, this technique can open the opportunity to exploit the learned parameters of the auxiliary task during model evaluation as well. This is not common in conventional multi-task learning, where the parameters of the auxiliary tasks are wasted.

Fig. 5. The learning behavior of the models, multi-script.

Fig. 6. The learning behavior of the models, Amharic.

Fig. 7. Regularizing the main task, multi-script.

5 Conclusion
This study shows a formulation of a multi-task learning setting from individual tasks which can be adapted to solve related problems that can be organized in a matrix form. The study thereby addressed multi-script handwritten digit recognition using multi-task learning. Apart from exploiting the auxiliary tasks for the main task, this study presented a novel way of using the individual tasks' predictions to help classification performance and regularize the different losses for the purpose of the main task. This approach outperformed the baseline and the conventional multi-task learning models while avoiding weighted losses, which is one of the challenges in multi-task learning. In this paper the proposed method also worked for the specific case of Amharic handwritten character recognition. Hence, similar approaches can be followed to address similarly structured languages. Finally, we suggest that future works address similar multi-script multi-task learning problems, encouraging the development of robust and multi-purpose systems. The generalization of the proposed model to other types of multi-task settings would also be good future work.

References
1. Abdurahman, F., 2019. Handwritten Amharic Character Recognition System Using
Convolutional Neural Networks. Engineering Sciences, 14(2), pp.71-87.
2. Addis, D., Liu, C.M. and Ta, V.D., 2018, June. Printed ethiopic script recognition
by using lstm networks. In 2018 International Conference on System Science and
Engineering (ICSSE) (pp. 1-6). IEEE.
3. Alani, A., 2017. Arabic handwritten digit recognition based on restricted boltzmann
machine and convolutional neural networks. Information, 8(4), p.142.
4. Alom, Z., Sidike, P., Hasan, M., Taha, T.M. and Asari, V.K., 2018. Handwritten Bangla Character Recognition Using the State-of-the-Art Deep Convolutional Neural Networks. Computational Intelligence and Neuroscience.
5. Assabie, Y. and Bigun, J., 2009. A comprehensive Dataset for Ethiopic Handwriting
Recognition. Proceedings SSBA ’09 : Symposium on Image Analysis, Halmstad
University, pp 41–43
6. Assabie, Y. and Bigun, J., 2011. Offline handwritten Amharic word recognition.
Pattern Recognition Letters, 32(8), pp.1089-1099.
7. Ashiquzzaman, A. and Tushar, A.K., 2017, February. Handwritten Arabic numeral
recognition using deep learning neural networks. In 2017 IEEE International Con-
ference on Imaging, Vision and Pattern Recognition (icIVPR) (pp. 1-4). IEEE.
8. Bai, J., Chen, Z., Feng, B. and Xu, B., 2014, October. Image character recognition
using deep convolutional neural network learned from different languages. In 2014
IEEE International Conference on Image Processing (ICIP) (pp. 2560-2564). IEEE.
9. Belay, B.H., Habtegebirial, T., Liwicki, M., Belay, G. and Stricker, D., 2019, Septem-
ber. Amharic text image recognition: Database, algorithm, and analysis. In 2019
International Conference on Document Analysis and Recognition (ICDAR) (pp.
1268-1273). IEEE.
10. Das, S. and Banerjee, S., 2014. Survey of Pattern Recognition Approaches in
Japanese Character Recognition. International Journal of Computer Science and
Information Technologies, 5(1), pp.93-99.
11. Demilew, F.A., 2019. Ancient Geez Script Recognition Using Deep Convolutional
Neural Network (Doctoral dissertation, Near East University).
12. El-Sawy, A., Hazem, E.B. and Loey, M., 2016, October. CNN for handwritten
arabic digits recognition based on LeNet-5. In International Conference on Advanced
Intelligent Systems and Informatics (pp. 566-575). Springer, Cham.
13. Gebretinsae Beyene, E., 2019. Handwritten and Machine printed OCR for Geez
Numbers Using Artificial Neural Network. arXiv e-prints, pp.arXiv-1911.
14. Gondere, M.S., Schmidt-Thieme, L., Boltena, A.S. and Jomaa, H.S., 2019. Hand-
written amharic character recognition using a convolutional neural network. arXiv
preprint arXiv:1909.12943.
15. Guo, M., Haque, A., Huang, D.A., Yeung, S. and Fei-Fei, L., 2018. Dynamic task
prioritization for multitask learning. In Proceedings of the European Conference on
Computer Vision (ECCV) (pp. 270-287).
16. He, K., Zhang, X., Ren, S. and Sun, J., 2016. Deep residual learning for image
recognition. In Proceedings of the IEEE conference on computer vision and pattern
recognition (pp. 770-778).
17. Jangid, M. and Srivastava, S., 2018. Handwritten devanagari character recogni-
tion using layer-wise training of deep convolutional neural networks and adaptive
gradient methods. Journal of Imaging, 4(2), p.41.
18. Kendall, A., Gal, Y. and Cipolla, R., 2018. Multi-task learning using uncertainty
to weigh losses for scene geometry and semantics. In Proceedings of the IEEE Con-
ference on Computer Vision and Pattern Recognition (pp. 7482-7491).
19. Maitra, D.S., Bhattacharya, U. and Parui, S.K., 2015, August. CNN based common
approach to handwritten character recognition of multiple scripts. In 2015 13th
International Conference on Document Analysis and Recognition (ICDAR) (pp.
1021-1025). IEEE.
20. Negashe, G. and Mamuye, A., 2020. Modified Segmentation Algorithm for Recog-
nition of Older Geez Scripts Written on Vellum. arXiv preprint arXiv:2006.00465.
21. Prabhu, V.U., 2019. Kannada-mnist: A new handwritten digits dataset for the
kannada language. arXiv preprint arXiv:1908.01242.
22. Rajput, G.G., Horakeri, R. and Chandrakant, S., 2010. Printed and handwritten
kannada numeral recognition using crack codes and fourier descriptors plate. In-
ternational Journal of Computer Application (IJCA) on Recent Trends in Image
Processing and Pattern Recognition (RTIPPR), pp.53-58.
23. Reta, B.Y., Rana, D. and Bhalerao, G.V., 2018, May. Amharic handwritten char-
acter recognition using combined features and support vector machine. In 2018 2nd
International Conference on Trends in Electronics and Informatics (ICOEI) (pp.
265-270). IEEE.
24. Romanuke, V.V., 2016. Training data expansion and boosting of convolutional
neural networks for reducing the MNIST dataset error rate.
25. Ruder, S., 2017. An overview of multi-task learning in deep neural networks. arXiv
preprint arXiv:1706.05098.
26. Sadeghi, Z., Testolin, A. and Zorzi, M., 2017, October. Bilingualism advantage in
handwritten character recognition: A deep learning investigation on Persian and
Latin scripts. In 2017 7th International Conference on Computer and Knowledge
Engineering (ICCKE) (pp. 27-32). IEEE.
27. Sener, O. and Koltun, V., 2018. Multi-task learning as multi-objective optimiza-
tion. In Advances in Neural Information Processing Systems (pp. 527-538).
28. Tsai, C., 2016. Recognizing handwritten Japanese characters using deep convolutional neural networks. Stanford University, Stanford, California.
29. Zhang, Y. and Yang, Q., 2017. A survey on multi-task learning. arXiv preprint
arXiv:1707.08114.
