Super-Resolution of Document Images Using Transfer Deep Learning of An ESRGAN Model
2022 5th International Symposium on Informatics and its Applications (ISIA) | 978-1-6654-7473-3/22/$31.00 ©2022 IEEE | DOI: 10.1109/ISIA55826.2022.9993497

Abstract— This paper presents a novel super-resolution approach for document images. It is based on transfer deep learning of an ESRGAN model. This model, which showed good robustness on natural images, has been adapted to document images by using better levels of fine-tuning and a post-processing step to enhance contrast. The experiments were carried out on our own document image dataset, built from documents of different categories that present different challenges, complexity levels, and kinds of degradation. The results obtained are better than those of ten existing approaches, which we implemented and tested on the same dataset with the same evaluation protocol.

Keywords— Image processing; super-resolution; document images; deep learning; intelligent vision and AI.

I. INTRODUCTION

Over the past decade, deep learning-based image processing has become a very active area of research, notably in super-resolution (SR), an image restoration task that aims to enhance image quality. The principle consists in transforming a low-resolution (LR) image into a high-resolution (HR) image of better quality and readability. In the field of automatic document processing, the transformation of LR images into HR images is an important step to reduce the errors and difficulties of automatic reading of document images. This task is very complex on documents of great variability that present degradations, fine writing, or text in a very small font size. These characteristics pose difficulties for conventional SR approaches and require more advanced approaches that are better adapted to these constraints.

The current literature presents many super-resolution methods, related to the types of images to be processed. These approaches fall into two major categories, single-image super-resolution (SISR) and multi-image super-resolution (MISR), depending on the number of input LR images needed to construct the HR image. In our case, we generate an HR image from a single LR image, so the first category of techniques (SISR) is more suitable for this problem, even if it presents more difficulties and challenges. In the field of document scanning, there is a strong demand for higher-resolution images with sufficient quality and text sharpness to succeed in the automatic reading of document images of any type. The search for a solution that meets this need therefore presents challenges in the world of computer vision. It is in fact a problem of image restoration or reconstruction to achieve a higher resolution. In this PFE (final-year) project, we propose a software solution to increase the resolution of images of documents of any type. The main objective is to improve the readability of text on these images and to reduce the processing time. For this purpose, we have employed a bio-inspired approach based on deep learning neural networks. We show that the latter is able to learn the mechanism of a super-resolution approach and to make its extension possible, which is vital for its quality, with very low complexity.

The rest of this article is structured as follows: Section II reviews the existing and most interesting SR approaches, Section III presents our approach, and Section IV reports the evaluation and experimental results.

II. EXISTING SISR APPROACHES

SISR techniques are further divided into two categories: techniques based on direct transformation of the LR image (without learning) and techniques based on a machine learning process.

A. Techniques based on direct transformation

These techniques can be divided into two broad groups: interpolation-based and reconstruction-based techniques. Interpolation-based techniques apply an interpolation to the LR image to obtain an HR image. In the literature, there are two types of interpolation methods: non-adaptive and adaptive.

The non-adaptive (classic) methods first increase the size of the LR image, then estimate the unknown value of each pixel from its neighboring pixels. This type of algorithm is very efficient in homogeneous regions; however, it fails to maintain the integrity of contour structures. Among these methods, we cite nearest-neighbor interpolation, bilinear interpolation [1], bicubic interpolation [1], basis spline (B-spline) [2], Mitchell [3], Hanning apodization, Lanczos [3], the Bell filter, and Gaussian interpolation [4].

The adaptive interpolation techniques address the limits of the conventional techniques, which stem from applying the same model across the whole image. These methods treat each part of the image differently, depending on the local variations. Among the adaptive methods, we cite NEDI [5], DDT [6], FCBI [7], and ICBI [8].

Reconstruction-based techniques rely on a transformation of the image, using mathematical operators (masks, convolution, etc.) to find the high-resolution image. This category includes several techniques, such as wavelet-based methods [9,10], Fields of Experts [11], gradient profile priors [12], primal sketch priors [13], methods based on the contourlet transform [14], bilateral filters used for edge preservation [15], methods based on Gaussian mixture models [16], and SISR methods based on surveying adjustment [10].
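To make the behavior of non-adaptive interpolation concrete, the following is a minimal pure-Python sketch of nearest-neighbor and bilinear upscaling. It is an illustration only, not the implementation evaluated in this paper: the function names, the list-of-lists grayscale image representation, the integer scale factor, and the pixel-center alignment convention are all assumptions made for the example.

```python
def upscale_nearest(img, scale):
    """Nearest-neighbor upscaling: each HR pixel copies the closest LR pixel."""
    h, w = len(img), len(img[0])
    return [[img[y // scale][x // scale] for x in range(w * scale)]
            for y in range(h * scale)]


def _source_coord(dst, scale, size):
    """Map an HR index back to LR coordinates (pixel centers aligned, clamped)."""
    src = (dst + 0.5) / scale - 0.5
    return min(max(src, 0.0), size - 1)


def upscale_bilinear(img, scale):
    """Bilinear upscaling: each HR pixel is a weighted mean of 4 LR neighbors."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h * scale):
        src_y = _source_coord(y, scale, h)
        y0 = int(src_y)
        y1 = min(y0 + 1, h - 1)
        fy = src_y - y0  # vertical weight of the lower neighbor
        row = []
        for x in range(w * scale):
            src_x = _source_coord(x, scale, w)
            x0 = int(src_x)
            x1 = min(x0 + 1, w - 1)
            fx = src_x - x0  # horizontal weight of the right neighbor
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


# A tiny LR image with a sharp vertical edge (black | white).
lr = [[0, 255],
      [0, 255]]
hr_nearest = upscale_nearest(lr, 2)
hr_bilinear = upscale_bilinear(lr, 2)
```

On this toy input, nearest-neighbor keeps the edge hard but blocky, while bilinear introduces intermediate gray levels across the edge, which is the contour degradation that motivates the adaptive and learning-based methods.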
Authorized licensed use limited to: VIT University. Downloaded on January 20,2024 at 07:06:44 UTC from IEEE Xplore. Restrictions apply.
$$MSE = \frac{1}{mn} \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} \left[ I(i,j) - I'(i,j) \right]^2 \qquad (2)$$
where I(i,j) represents the pixel values of the reference HR image and I'(i,j) the pixel values of the HR image obtained by SR; m is the number of pixel rows in the image and i the index of these rows; n is the number of pixel columns in the image and j the index of these columns. Max_f is the maximum value of the signal f in the original image (assuming that the original image is of good quality).
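As an illustration of how these quantities can be computed, here is a minimal pure-Python sketch. The MSE function follows equation (2) directly. The PSNR function assumes the standard definition, 10·log10(Max_f² / MSE), since the corresponding equation is not reproduced in this excerpt; the function names, the list-of-lists image representation, and the default `max_value=255.0` (8-bit images) are likewise assumptions of the example.

```python
import math


def mse(ref, out):
    """Mean squared error between reference image I and SR output I' (eq. 2)."""
    m, n = len(ref), len(ref[0])  # m rows, n columns
    total = 0.0
    for i in range(m):
        for j in range(n):
            diff = ref[i][j] - out[i][j]
            total += diff * diff
    return total / (m * n)


def psnr(ref, out, max_value=255.0):
    """PSNR in dB, assuming the standard form 10 * log10(max_value^2 / MSE)."""
    err = mse(ref, out)
    if err == 0:
        return math.inf  # identical images: no noise
    return 10.0 * math.log10(max_value ** 2 / err)


# Toy 2x2 example: reference HR image and a slightly wrong SR output.
reference = [[0, 255], [255, 0]]
sr_output = [[0, 250], [255, 10]]
error = mse(reference, sr_output)
quality_db = psnr(reference, sr_output)
```

A lower MSE and a higher PSNR both indicate that the SR output is closer to the reference HR image.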
limitations and vulnerabilities rigorously and objectively, we can demonstrate the reliability and flexibility of our approach to restoring images of all types.

REFERENCES

[1] Prajapati, Evaluation of Different Image Interpolation Algorithms, International Journal of Computer Applications, 7, November 2012.
[2] T. Acharya and P. Tsai, Computational Foundations of Image Interpolation Algorithms, ACM Ubiquity, 8, 2007.
[3] W. Burger and M. Burge, Digital Image Processing: An Algorithmic Introduction Using Java, Springer Science, New York, NY, 2009.
[4] R. Appledorn, A new approach to the interpolation of sampled data, in IEEE Transactions on Medical Imaging, 15(3), pp. 369-376, 1996.
[5] X. Li and M. T. Orchard, New edge-directed interpolation, in IEEE Transactions on Image Processing, pp. 1521-1527, 2001.
[6] Dan Su and P. W., Image Interpolation by Pixel Level Data Dependent Triangulation, Computer Graphics Forum, 2002.
[7] K.S. Reddy and K.R.L. Reddy, Enlargement of Image Based Upon Interpolation Techniques, Department of Electronics and Communication Engineering VITS, Karimnagar, India, December 2013.
[8] J.C. Gillette, T.M. Stadtmiller, and R.C. Hardie, Aliasing reduction in staring infrared imagers utilizing subpixel techniques, Optical Engineering, 34, pp. 31-37.
[9] G. Anbarjafari and H. Demirel, Image Super Resolution Based on Interpolation of Wavelet Domain High Frequency Subbands and the Spatial Domain Input Image, 32, pp. 390-394, 2010.
[10] Z. Jianjun, Z. Cui, F. Donghao, and Z. Jinghong, A New Method for Super resolution Image Reconstruction Based on Surveying Adjustment, Journal of Nanomaterials, 2014.
[11] S. Roth and M. Black, Fields of experts: a framework for learning image priors, in IEEE Int. Conf. on Computer Vision and Pattern Recognition, USA, 2005.
[12] J. Sun, Z.B. Xu, and H.Y. Shum, Image super-resolution using gradient profile prior, in IEEE Int. Conf. on Computer Vision and Pattern Recognition, USA, 2008.
[13] J. Sun, N. Zheng, H. Tao, and H.Y. Shum, Image hallucination with primal sketch priors, in IEEE Int. Conf. on Computer Vision and Pattern Recognition, pp. 729-736, 2003.
[14] C.V. Jiji and S. Chaudhuri, Single-frame image super-resolution through contourlet learning, EURASIP Journal Advanced Signal Process, 2006.
[15] S. Dai, M. Han, Y. Wu, and Y. Gong, Bilateral back-projection for single image super resolution, in IEEE Int. Conf. on Multimedia and Expo, pp. 1039-1042, 2007.
[16] Y. Ogawa, Y. Ariki, and T. Takiguchi, Super-resolution by GMM-based conversion using self-reduction image, in IEEE Int. Conf. on Acoustics, Speech and Signal Processing, pp. 1285-1288, 2012.
[17] S. Haykin, Neural network, a comprehensive foundation, Prentice-Hall, 1994.
[18] F. Ahmed, S.C. Gustafson, and M.A. Karim, High fidelity image interpolation using radial basis function neural networks, in IEEE National Aerospace and Electronics Conference, pp. 588-592, 1995.
[19] N. Plaziac, Image interpolation using neural networks, IEEE Transactions on Image Processing, pp. 1647-1651, 1999.
[20] F. Pan and L. Zhang, New image super-resolution scheme based on residual error restoration by neural networks, Optical Engineering, pp. 3038-3046, 2003.
[21] M. Carcenac, A modular neural network for super-resolution of human faces, Applied Intelligence, pp. 168-186, 2007.
[22] C. Dong, C.C. Loy, K. He, and X. Tang, Learning a Deep Convolutional Network for Image Super-Resolution, European Conference on Computer Vision (ECCV), 2014.
[23] W. Shi, J. Caballero, F. Huszar, J. Totz, A. P. Aitken, R. Bishop, D. Rueckert, and Z. Wang, Real-time single image and video super resolution using an efficient sub-pixel convolutional neural network, in IEEE Int. Conf. on Computer Vision and Pattern Recognition, pp. 1874-1883, 2016.
[24] J. Kim, J. Kwon Lee, and K. Mu Lee, Accurate image super-resolution using very deep convolutional networks, in IEEE Conference on Computer Vision and Pattern Recognition, pp. 1646-1654, 2016.
[25] Y. Tai, J. Yang, and X. Liu, Image super-resolution via deep recursive residual network, in IEEE Conference on Computer Vision and Pattern Recognition, pp. 3147-3155, 2017.
[26] Y. Tai, J. Yang, X. Liu, and C. Xu, MemNet: A persistent memory network for image restoration, in IEEE Conference on Computer Vision and Pattern Recognition, pp. 4539-4547, 2017.
[27] G. Suryanarayana, K. Chandran, O. I. Khalaf, Y. Alotaibi, A. Alsufyani, and S. A. Alghamdi, Accurate Magnetic Resonance Image Super-Resolution Using Deep Networks and Gaussian Filtering in the Stationary Wavelet Domain, in IEEE Access, pp. 71406-71417, 2021.
[28] C. Ledig, L. Theis, F. Huszar, and J. Caballero, Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network, in IEEE Conference on Computer Vision and Pattern, 2017.
[29] J. Lei, H. Xue, Sh. Yang, W. Shi, Sh. Zhang, and Y. Wu, HFF-SRGAN: super-resolution generative adversarial network based on high-frequency feature fusion, Journal of Electronic Imaging, 2022.
[30] H. Lei, W. Zugen, Ch. Tian, and Z. Yongmei, An Improved SRGAN Infrared Image Super-Resolution Reconstruction Algorithm, Journal of System Simulation, pp. 2109-2118, 2021.
[31] B. Lim, S. Son, H. Kim, S. Nah, and K.M. Lee, Enhanced deep residual networks for single image super-resolution, in IEEE Conference on Computer Vision and Pattern, 2017.
[32] S. Nah, T.H. Kim, and K.M. Lee, Deep multi-scale convolutional neural network for dynamic scene deblurring, in IEEE Conference on Computer Vision and Pattern Recognition, pp. 3883-3891, 2017.
[33] C. Szegedy, S. Ioffe, and V. Vanhoucke, Inception-v4, inception-resnet and the impact of residual connections on learning, Thirty-First AAAI Conference on Artificial Intelligence, 2016.
[34] L. Gatys, A.S. Ecker, and M. Bethge, Texture synthesis using convolutional neural networks, Pattern Recognition Association of South Africa and Robotics and Mechatronics International Conference (PRASA-RobMech), pp. 1-6, 2016.
[35] J. Bruna, P. Sprechmann, and Y. LeCun, Super-resolution with deep convolutional sufficient statistics, in International Conference on Learning Representations, 2015.
[36] J. Johnson, A. Alahi, and L. Fei-Fei, Perceptual Losses for Real-Time Style Transfer and Super-Resolution, in Computer Vision and Pattern Recognition, 2016.
[37] C. Ledig, L. Theis, F. Huszar, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, and Z. Wang, Photo-realistic single image super-resolution using a generative adversarial network, in Computer Vision and Pattern Recognition, 2017.
[38] U. Sara, M. Akter, and M. Uddin, Image Quality Assessment through FSIM, SSIM, MSE and PSNR—A Comparative Study, Journal of Computer and Communications, vol. 7, pp. 8-18, 2019.
[39] X. Wang, L. Xie, C. Dong, and Y. Shan, Real-ESRGAN: Training real-world blind super-resolution with pure synthetic data, in Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1905-1914, 2021.