Multi-script Handwritten Digit Recognition Using Multi-task Learning (arXiv:2106.08267)
{mesay.samuel,sharma.dp}@amu.edu.et
2 Information Systems and Machine Learning Lab, 31141 Hildesheim, Germany
{schmidt-thieme,scholz}@ismll.uni-hildesheim.de
1 Introduction
Handwritten digit recognition is commonly known to be the “Hello World” of
machine learning. Accordingly, it has been studied widely for different languages
[21,17,12]. However, this is not the case for multi-script digit recognition, which encourages the development of robust and multipurpose systems, even though in practice it is common to see multiple scripts in a single document. More importantly, working on multi-script recognition opens a way for multi-task learning.
2 Related Works
There are some related works on multi-script recognition, and a few of them employ multi-task learning. Sadeghi et al. [26] performed a comparative study between monolingual and bilingual training (Persian and Latin digits) using deep neural networks. They reported the superior performance of bilingual networks in handwritten digit recognition, suggesting that mastering multiple languages might facilitate knowledge transfer across similar domains. Bai et al. [8] proposed a shared-hidden-layer deep convolutional neural network (SHL-CNN), in which the input and hidden layers are shared across characters of different tasks while the final softmax layer is task-dependent. Using Chinese and English superimposed texts, they showed that the SHL-CNN reduces recognition errors by 16-30% relative to models trained on characters of only one language. Maitra et al. [19] employed six databases: MNIST, Bangla numerals, Devanagari numerals, Oriya numerals, Telugu numerals, and Bangla basic characters. They pretrained a CNN on the largest class set (Bangla basic characters) and used it as a feature extractor, aiming to show that transfer learning yields good performance on the other scripts with smaller class sets. None of the above-mentioned works addressed balancing the effects of the related tasks.
Multi-Task Learning (MTL) is a learning paradigm in machine learning whose aim is to leverage useful information contained in multiple related tasks to help improve the generalization performance of all the tasks [29]. Technically, it also optimizes more than one loss function, in contrast to single-task learning. We can view multi-task learning as a form of inductive transfer. Inductive transfer can help improve a model by introducing an inductive bias provided by the auxiliary tasks, which causes the model to prefer hypotheses that explain more than one task [29,25]. According to Zhang et al. [29], MTL algorithms are classified into five categories: feature learning (feature transformation and feature selection approaches), low-rank, task clustering, task relation learning, and decomposition. The widely used approach of MTL, including in this study, is homogeneous, parameter-based MTL with decomposition. In this case the tasks are decomposed according to their relevance, and usually the main task remains unpenalized. Zhang et al. [29] suggest decomposition as a good MTL approach, with the limitation that its coefficients act as a black box, and put forward its formalization as future work, since there is no guarantee that MTL is better than single-task learning.
Ruder [25] introduced the two most common methods for MTL in deep learning: soft parameter sharing and hard parameter sharing. As in most computer vision tasks, this study uses hard parameter sharing, where the hidden layers are shared between all tasks while task-specific output layers are kept.
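Hard parameter sharing can be sketched in a few lines of PyTorch: one shared trunk feeds several task-specific heads. The tiny trunk below is purely illustrative (the models in this study adapt a ResNet); the class counts match the three tasks of this paper (30-class main task, 3-class script task, 10-class digit task).

```python
import torch
import torch.nn as nn

class HardSharedMTL(nn.Module):
    """Hard parameter sharing: one shared trunk, one output layer per task."""

    def __init__(self, num_classes_per_task=(30, 3, 10)):
        super().__init__()
        # Shared hidden layers (illustrative; the study adapts a ResNet).
        self.trunk = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Task-specific output layers (main, script, digit).
        self.heads = nn.ModuleList(nn.Linear(16, c) for c in num_classes_per_task)

    def forward(self, x):
        h = self.trunk(x)                  # shared representation
        return [head(h) for head in self.heads]

logits = HardSharedMTL()(torch.randn(4, 1, 28, 28))  # one logit tensor per task
```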
The author [25] further stresses that only a few papers have looked at developing better mechanisms for MTL in deep neural networks, and that our understanding of tasks, their similarity, relationship, hierarchy, and benefit for MTL is still limited.
Since multi-task learning models are sensitive to task weights, and task weights are typically selected through extensive hyperparameter tuning, Guo et al. [15] introduced dynamic task prioritization for multi-task learning. This avoids imbalances in task difficulty, which can lead to unnecessary emphasis on easier tasks and thus neglect and slow progress on difficult tasks.
Research on Amharic document recognition in general lacks combined effort, mainly due to the unavailability of a publicly available standard dataset. Accordingly, different techniques have been applied at different times without tracing and following a common baseline. It is worth mentioning the work done by Assabie et al. [6] on handwritten Amharic word recognition. Betselot et al. [23] also worked on handwritten Amharic character recognition. Both works used their own datasets and employed conventional machine learning techniques. Recently, some encouraging works have emerged on Amharic character recognition applying deep learning techniques. The different authors focused on different types of documents, including printed [2,9,13], ancient [20,11], and handwritten documents [14,13,1]. Accordingly, the research efforts in this area fail to complement each other and to improve results based on a clear baseline.
3 Methodology
This section outlines the dataset preparation and organization of the models for
the experiments.
from Y script”. Hence, our baseline model will be a thirty-class (3 scripts × 10 digits) classification model, the blue line in Fig. 4. Another baseline is the usual multi-task learning, the outer rectangle in Fig. 4, which exploits the advantages of reformulating the main problem into two other auxiliary tasks. The third, proposed approach introduces a novel way of integrating multi-task learning to extract relevant information from the auxiliary tasks while balancing their effect on each other. For the usual multi-task learning we show the optimum performance by tuning the loss weights, which is better than the first baseline, the 30-class vanilla model. To give a glimpse of how the specialized models perform, we also experimented with the three independent single-task models. All the experiments conducted in this study are described in Table 3.
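The weighted multi-task baseline combines the per-task cross-entropy losses into one objective. A minimal sketch, where the weights `w` are the hyperparameters that must be tuned (the values shown are placeholders, not the tuned ones from this study):

```python
import torch
import torch.nn.functional as F

def weighted_mtl_loss(logits_main, logits_script, logits_digit,
                      y_main, y_script, y_digit, w=(1.0, 0.3, 0.3)):
    """Weighted sum of the three task losses; w holds illustrative
    placeholder weights, found in practice by hyperparameter search."""
    loss_main = F.cross_entropy(logits_main, y_main)        # 30-class main task
    loss_script = F.cross_entropy(logits_script, y_script)  # 3-class script task
    loss_digit = F.cross_entropy(logits_digit, y_digit)     # 10-class digit task
    return w[0] * loss_main + w[1] * loss_script + w[2] * loss_digit
```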
The proposed method removes the two auxiliary tasks and introduces one reformulated auxiliary task instead. That is a four-class classification problem whose classes are: getting both the row and column right (the label), only the row (language), only the column (digit), and missing both. This information can be obtained from the main task itself by converting the predicted label into a row and a column using the formulas row = label div 10 and column = label mod 10, respectively. This helps to learn properties of the characters, such as how they confuse the model during training, without the tasks affecting each other. We assign the highest number, 3, as the label for getting both the row and the column right, which in turn signals overfitting. We also sum these numbers within batches and use the sum as a factor multiplied into the loss of the main task. That is, the better this label is already known, the larger the main loss becomes. Therefore, the model prefers to minimize the second loss instead, that is, predicting the properties of the characters into four classes. This balances and controls the whole training process.
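The label arithmetic above can be sketched as follows. The text fixes only that 3 means "both correct"; the encoding of the remaining three classes and the normalization of the batch factor are assumptions for illustration:

```python
import torch

def auxiliary_labels(pred_label, true_label, num_digits=10):
    """Derive the 4-class auxiliary target from predicted vs. true labels.
    row = label div 10 (script), column = label mod 10 (digit).
    Assumed encoding (the paper fixes only 3 = both correct):
      0 = both wrong, 1 = row correct only, 2 = column correct only, 3 = both.
    """
    row_ok = (pred_label // num_digits == true_label // num_digits).long()
    col_ok = (pred_label % num_digits == true_label % num_digits).long()
    return row_ok + 2 * col_ok

def scaled_main_loss(main_loss, aux_labels):
    # The batch-summed auxiliary labels act as a factor on the main loss:
    # the more "both correct" (3) labels, the larger the main loss becomes,
    # steering optimization toward the auxiliary objective.
    # Averaging over the batch is an assumed normalization.
    return aux_labels.float().mean() * main_loss
```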
Likewise, for Amharic handwritten character recognition the same procedure is followed. Here the row index ranges over 11 values instead of 3, and the column index over 7 instead of 10. The 3 × 10 = 30-class classification problem thus becomes an 11 × 7 = 77-class classification problem in the case of Amharic characters.
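The row/column decomposition generalizes to any such grid via integer division; a minimal sketch:

```python
def decompose(label, num_cols):
    """Map a flat class index to (row, column) in a rows x columns grid."""
    return divmod(label, num_cols)

# Digit scripts: 30 classes laid out as 3 scripts (rows) x 10 digits (columns).
assert decompose(23, num_cols=10) == (2, 3)
# Amharic: 77 classes laid out as 11 rows x 7 columns.
assert decompose(76, num_cols=7) == (10, 6)
```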
Fig. 4. Model structure. The blue line shows the baseline model, the outer rectangle shows the multi-task models, and the inner rectangle represents the individual (single-task) model.
4 Experimental Results
Since our emphasis is on showing the useful formulation and the advantage of multi-task learning over individual tasks, all the models in these experiments adapt the pretrained ResNet model from torchvision.models. We used a mini-batch size of 32 and the Adam optimizer. All these configurations are kept unchanged between the individual, baseline, and multi-task models. All the experiments were performed using the PyTorch 1.3.0 machine learning framework on
Table 4. Test accuracies (%) of the models.

Model   Latin digits   Arabic digits   Kannada digits   Average   Range   Amharic characters
Lat     98.45          0               0                -         -       -
Arab    0              98.49           0                -         -       -
Kan     0              0               96.25            -         -       -
Base    97.19          95.51           94.90            95.87     2.99    73.83
Wloss   97.18          97.94           96.13            97.08     1.81    74.68
New     97.85          98.07           97.23            97.71     0.84    75.91
The results in Table 4 show the advantages gained from multi-task learning. However, in the conventional multi-task setting it is expensive to find the optimal sigmas for weighting the different losses. The proposed multi-task approach, in contrast, performed best and is shown to be robust enough not to be affected by the auxiliary tasks, without a need for these hidden coefficients. This is also likely one reason we see the minimum range between the scores of the proposed model.
As can be seen in Fig. 7, the proposed model enforces a regularization effect on the main task. This can be seen from the oscillating behavior of the auxiliary task while the curve of the main task remains relatively smooth. This behavior is expected, since we aim at being good on the main task. The technique incorporates the auxiliary task, its combined contribution, and the usual main task. It is formulated in such a way that optimizing the loss implicitly observes the main task and enforces that the contribution of the auxiliary task agrees with the main task. Even though this tie-up is feasible for this particular problem, it is still possible to untie it by introducing operations that allow focusing on the auxiliary tasks as well when needed. More importantly, this technique can open the opportunity to exploit the learned parameters of the auxiliary task during model evaluation as well. This is not common in conventional multi-task learning, where the parameters of the auxiliary tasks are wasted.
5 Conclusion
This study shows a formulation of a multi-task learning setting from individual tasks, which can be adapted to solve related problems that can be organized in a matrix fashion. The study thus addressed multi-script handwritten digit recognition using multi-task learning. Apart from exploiting the auxiliary tasks for the main task, this study presented a novel way of using the individual task predictions to help classification performance and to regularize the different losses for the purpose of the main task. The proposed method outperformed the baseline and the conventional multi-task learning models while avoiding weighted losses, which are one of the challenges in multi-task learning. In this paper the proposed method also worked for the specific case of Amharic handwritten character recognition. Hence, similar approaches can be followed to address similarly structured languages.
Finally, we suggest that future works address similar multi-script multi-task learning problems, encouraging the development of robust and multi-purpose systems. Generalizing the proposed model to any type of multi-task setting could also be a good direction for future work.
References
1. Abdurahman, F., 2019. Handwritten Amharic Character Recognition System Using
Convolutional Neural Networks. Engineering Sciences, 14(2), pp.71-87.
2. Addis, D., Liu, C.M. and Ta, V.D., 2018, June. Printed Ethiopic script recognition using LSTM networks. In 2018 International Conference on System Science and Engineering (ICSSE) (pp. 1-6). IEEE.
3. Alani, A., 2017. Arabic handwritten digit recognition based on restricted boltzmann
machine and convolutional neural networks. Information, 8(4), p.142.
4. Alom, Z., Sidike, P., Hasan, M., Taha, T.M. and Asari, V.K., 2018. Handwritten Bangla character recognition using the state-of-the-art deep convolutional neural networks. Computational Intelligence and Neuroscience.
5. Assabie, Y. and Bigun, J., 2009. A comprehensive Dataset for Ethiopic Handwriting
Recognition. Proceedings SSBA ’09 : Symposium on Image Analysis, Halmstad
University, pp 41–43
6. Assabie, Y. and Bigun, J., 2011. Offline handwritten Amharic word recognition.
Pattern Recognition Letters, 32(8), pp.1089-1099.
7. Ashiquzzaman, A. and Tushar, A.K., 2017, February. Handwritten Arabic numeral
recognition using deep learning neural networks. In 2017 IEEE International Con-
ference on Imaging, Vision and Pattern Recognition (icIVPR) (pp. 1-4). IEEE.
8. Bai, J., Chen, Z., Feng, B. and Xu, B., 2014, October. Image character recognition
using deep convolutional neural network learned from different languages. In 2014
IEEE International Conference on Image Processing (ICIP) (pp. 2560-2564). IEEE.
9. Belay, B.H., Habtegebirial, T., Liwicki, M., Belay, G. and Stricker, D., 2019, Septem-
ber. Amharic text image recognition: Database, algorithm, and analysis. In 2019
International Conference on Document Analysis and Recognition (ICDAR) (pp.
1268-1273). IEEE.
10. Das, S. and Banerjee, S., 2014. Survey of Pattern Recognition Approaches in
Japanese Character Recognition. International Journal of Computer Science and
Information Technologies, 5(1), pp.93-99.
11. Demilew, F.A., 2019. Ancient Geez Script Recognition Using Deep Convolutional
Neural Network (Doctoral dissertation, Near East University).
12. El-Sawy, A., Hazem, E.B. and Loey, M., 2016, October. CNN for handwritten Arabic digits recognition based on LeNet-5. In International Conference on Advanced Intelligent Systems and Informatics (pp. 566-575). Springer, Cham.
13. Gebretinsae Beyene, E., 2019. Handwritten and machine printed OCR for Geez numbers using artificial neural network. arXiv e-prints.
14. Gondere, M.S., Schmidt-Thieme, L., Boltena, A.S. and Jomaa, H.S., 2019. Hand-
written amharic character recognition using a convolutional neural network. arXiv
preprint arXiv:1909.12943.
15. Guo, M., Haque, A., Huang, D.A., Yeung, S. and Fei-Fei, L., 2018. Dynamic task
prioritization for multitask learning. In Proceedings of the European Conference on
Computer Vision (ECCV) (pp. 270-287).
16. He, K., Zhang, X., Ren, S. and Sun, J., 2016. Deep residual learning for image
recognition. In Proceedings of the IEEE conference on computer vision and pattern
recognition (pp. 770-778).
17. Jangid, M. and Srivastava, S., 2018. Handwritten devanagari character recogni-
tion using layer-wise training of deep convolutional neural networks and adaptive
gradient methods. Journal of Imaging, 4(2), p.41.
18. Kendall, A., Gal, Y. and Cipolla, R., 2018. Multi-task learning using uncertainty
to weigh losses for scene geometry and semantics. In Proceedings of the IEEE Con-
ference on Computer Vision and Pattern Recognition (pp. 7482-7491).
19. Maitra, D.S., Bhattacharya, U. and Parui, S.K., 2015, August. CNN based common
approach to handwritten character recognition of multiple scripts. In 2015 13th
International Conference on Document Analysis and Recognition (ICDAR) (pp.
1021-1025). IEEE.
20. Negashe, G. and Mamuye, A., 2020. Modified Segmentation Algorithm for Recog-
nition of Older Geez Scripts Written on Vellum. arXiv preprint arXiv:2006.00465.
21. Prabhu, V.U., 2019. Kannada-mnist: A new handwritten digits dataset for the
kannada language. arXiv preprint arXiv:1908.01242.
22. Rajput, G.G., Horakeri, R. and Chandrakant, S., 2010. Printed and handwritten
kannada numeral recognition using crack codes and fourier descriptors plate. In-
ternational Journal of Computer Application (IJCA) on Recent Trends in Image
Processing and Pattern Recognition (RTIPPR), pp.53-58.
23. Reta, B.Y., Rana, D. and Bhalerao, G.V., 2018, May. Amharic handwritten char-
acter recognition using combined features and support vector machine. In 2018 2nd
International Conference on Trends in Electronics and Informatics (ICOEI) (pp.
265-270). IEEE.
24. Romanuke, V.V., 2016. Training data expansion and boosting of convolutional
neural networks for reducing the MNIST dataset error rate.
25. Ruder, S., 2017. An overview of multi-task learning in deep neural networks. arXiv
preprint arXiv:1706.05098.
26. Sadeghi, Z., Testolin, A. and Zorzi, M., 2017, October. Bilingualism advantage in
handwritten character recognition: A deep learning investigation on Persian and
Latin scripts. In 2017 7th International Conference on Computer and Knowledge
Engineering (ICCKE) (pp. 27-32). IEEE.
27. Sener, O. and Koltun, V., 2018. Multi-task learning as multi-objective optimiza-
tion. In Advances in Neural Information Processing Systems (pp. 527-538).
28. Tsai, C., 2016. Recognizing handwritten Japanese characters using deep convolutional neural networks. Stanford University, Stanford, California.
29. Zhang, Y. and Yang, Q., 2017. A survey on multi-task learning. arXiv preprint
arXiv:1707.08114.