Super-Resolution of Document Images Using Transfer Deep Learning of An ESRGAN Model

This paper presents a novel super-resolution approach for document images based on transferring a deep learning model called ESRGAN. The ESRGAN model showed good results on natural images and is adapted here for document images with fine-tuning and post-processing. The approach is tested on a custom document image dataset and shows better results than 10 other approaches when evaluated with the same protocol.


Super-resolution of document images using transfer deep learning of an ESRGAN model

1st Zakia Kezzoula
University M'Hamed Bougara of Boumerdes, LIMOSE Laboratory, ALGERIA
[email protected]

2nd Djamel Gaceb
University M'Hamed Bougara of Boumerdes, LIMOSE Laboratory, ALGERIA
[email protected]

3rd Nadjat Gritli
University M'Hamed Bougara of Boumerdes, LIMOSE Laboratory, ALGERIA
[email protected]

2022 5th International Symposium on Informatics and its Applications (ISIA) | 978-1-6654-7473-3/22/$31.00 ©2022 IEEE | DOI: 10.1109/ISIA55826.2022.9993497

Abstract: This paper presents a novel super-resolution approach for document images, based on transfer deep learning of an ESRGAN model. This model, which has shown good robustness on natural images, is adapted to document images through carefully chosen levels of fine-tuning and a post-processing step that enhances contrast. The experiments were carried out on a dataset that we built from document images presenting different challenges: documents of different categories, with different complexity levels and kinds of degradation. The results obtained are better than those of ten existing approaches, which we implemented and tested on the same dataset with the same evaluation protocol.

Keywords: Image processing; super-resolution; document images; deep learning; intelligent vision and AI.

I. INTRODUCTION

Over the past decade, deep learning-based image processing has become a very active area of research, notably for super-resolution (SR), an image restoration task that aims to enhance image quality. The principle consists in transforming a low-resolution (LR) image into a high-resolution (HR) image of better quality and readability. In the field of automatic document processing, the transformation of LR images into HR images is an important step to reduce the errors and difficulties of automatic reading of document images. This task is very complex on documents of great variability that present degradations, fine handwriting or text in a very small font. These pose difficulties for conventional SR approaches and require more advanced approaches that are better adapted to these constraints.

The current literature presents many super-resolution methods, related to the types of images to be processed. Single-image super-resolution (SISR) and multi-image super-resolution (MISR) are the two major categories, distinguished by the number of LR input images needed to construct the HR image. In our case, we generate an HR image from a single LR image, so the first category (SISR) is more suitable for this problem, even if it presents more difficulties and challenges. In the field of document scanning, there is a strong demand for higher-resolution images with sufficient quality and text sharpness for the automatic reading of document images of any type to succeed. Therefore, the search for a solution that meets this need presents challenges in computer vision. It is in fact a problem of image restoration or reconstruction to achieve a higher resolution. In this final-year (PFE) project, we propose a software solution to increase the resolution of images of documents of any type. The main objective is to improve the readability of text in these images and to reduce the processing time. For this purpose, we have employed a bio-inspired approach based on deep neural networks; we have shown that it is able to learn the mechanism of a super-resolution approach and that it can be extended, which is vital for its quality, with very low complexity.

The structure of this article is as follows: Section II reviews the existing and most interesting SR approaches, Section III presents our approach, and Section IV reports the evaluation and experimental results.

II. EXISTING SISR APPROACHES

The techniques in this category are further divided into two categories: techniques based on direct transformation of the LR image (without learning) and techniques based on a machine learning process.

A. Techniques based on direct transformation

These techniques fall into two broad groups: interpolation-based and reconstruction-based techniques. Interpolation-based techniques apply an interpolation to the LR image to obtain an HR image. The literature distinguishes two types of interpolation methods: non-adaptive and adaptive.

Non-adaptive (classic) methods first increase the size of the LR image, then estimate the unknown value of each pixel from its neighboring pixels. This type of algorithm is very efficient in homogeneous regions; however, it fails to preserve the integrity of contour structures. Among these methods, we cite nearest-neighbor interpolation, bilinear interpolation [1], bicubic interpolation [1], B-spline (basis spline) [2], Mitchell [3], Hanning apodization, Lanczos [3], the Bell filter and Gaussian interpolation [4].

Adaptive interpolation techniques address the limits of the conventional techniques, which apply the same model across the whole image. These methods treat each part of the image differently, depending on the local variations. Among the adaptive methods, we cite NEDI [5], DDT [6], FCBI [7] and ICBI [8].

Reconstruction-based techniques transform the image using mathematical operators (masks, convolution, etc.) to find the high-resolution image. This category includes several techniques, such as wavelet-based methods [9,10], Fields of Experts [11], gradient profile [12], primal sketch [13], contourlet transform-based methods [14], bilateral filters used for edge preservation [15], methods based on Gaussian mixture models [16], and SISR methods based on surveying adjustment [10].
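As a rough illustration of the non-adaptive interpolation family described above, the following sketch upscales a grayscale image with nearest-neighbor and bilinear interpolation using NumPy. The function names `upscale_nearest` and `upscale_bilinear` and the corner-aligned sampling grid are assumptions made for this example; they are not taken from the paper or from [1].

```python
import numpy as np


def upscale_nearest(img, factor):
    """Nearest-neighbor upscaling: every LR pixel is repeated factor x factor times."""
    return np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)


def upscale_bilinear(img, factor):
    """Bilinear upscaling of a 2-D grayscale array (corner-aligned grid, an assumption)."""
    img = np.asarray(img, dtype=np.float64)
    h, w = img.shape
    # Positions of the HR samples expressed in LR pixel coordinates.
    ys = np.linspace(0, h - 1, h * factor)
    xs = np.linspace(0, w - 1, w * factor)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)  # clamp at the bottom border
    x1 = np.minimum(x0 + 1, w - 1)  # clamp at the right border
    wy = (ys - y0)[:, None]         # vertical interpolation weights
    wx = (xs - x0)[None, :]         # horizontal interpolation weights
    # Interpolate along x on the two neighboring rows, then along y.
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bottom = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy
```

On a sharp edge, the nearest-neighbor result stays blocky while the bilinear result introduces intermediate gray levels, which is one way to see why these methods have trouble preserving contours.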


Authorized licensed use limited to: VIT University. Downloaded on January 20,2024 at 07:06:44 UTC from IEEE Xplore. Restrictions apply.

Techniques based on a machine learning process: this class of super-resolution techniques takes advantage of training to predict the HR image, using a dictionary built from an image dataset. The methods proposed in the literature comprise two phases: a dictionary construction (learning) phase, in which a learning base is created by generating the y sequence from the x sequence, and a super-resolution phase (exploitation of the model resulting from learning). The second phase reconstructs an HR image. It consists of extracting patches from the LR images and estimating their associated HR patches using the learning base.

There are several approaches within this category of methods: local learning approaches based on nearest-neighbor search, sparse coding approaches based on a sparsity assumption, and artificial neural network (ANN) approaches [17]. ANNs have been effective in super-resolution for many years: in [18] and [19] they were used to handle the interpolation problem, i.e. choosing the ideal pixels to interpolate. On natural images and images of text, the authors of [20] effectively used an ANN to predict the residual errors of the HR images from LR images. In [21], Carcenac develops an ANN for face image SR. [22] presents a deep convolutional neural network (CNN) with three layers.

The authors of [23] proposed the ESPCN model (Real-time single image and video super-resolution), which uses an efficient sub-pixel CNN. This model has a light structure and has been evaluated on widely adopted datasets. The authors of [24] present a highly accurate super-resolution method, VDSR (Accurate image super-resolution using very deep convolutional networks). This method is based on the VGG network. According to its results, increasing the network depth significantly improves accuracy. The authors of [25] propose a very deep CNN model called DRRN (Image super-resolution via deep recursive residual network), which strives to create networks that are both deep and concise. The technique presented in [26] is based on a very deep persistent memory network, MemNet (for image restoration), which introduces a memory block composed of a recursive unit and a gate unit. An SR method for magnetic resonance imaging is presented in [27]; it uses a very deep residual model (VDR-net) in the training phase. By applying the 2D Stationary Wavelet Transform (SWT), each example (LR, HR) image pair is decomposed into its low- and high-frequency sub-bands.

New deep neural network algorithms such as Generative Adversarial Networks (GANs) have started to gain attention in the super-resolution literature. SRGAN [28], one of the early successful examples of GANs in image super-resolution, is optimized for a new perceptual loss. It replaces the MSE-based content loss with a loss calculated on feature maps of the VGG network, which are more invariant to changes in pixel space. Like other GAN architectures, SRGAN contains two parts: a generator and a discriminator. The generator architecture (Fig. 1) replaces plain deep convolutional networks with residual networks, whose skip connections simplify training and allow substantially deeper networks that generate better results.

Fig. 1. Generator and discriminator architecture of SRGAN model [28].

The discriminator's goal is to distinguish between real HR images and artificial SR images. Leaky ReLU is the activation function used in the discriminator architecture.

The authors of [30] targeted the low resolution of infrared images and designed an improved SRGAN super-resolution reconstruction algorithm. In the generative network, a residual dense network is applied to obtain the image features extracted from each network layer, so as to retain more high-frequency information of the image, and a progressive upsampling method is adopted to improve the super-resolution reconstruction effect under a large scale factor. A perceptual loss function, which is more consistent with human perception, is adopted to make the generated image closer to the real high-resolution image in both perception and content. The paper [29] presents a high-frequency feature fusion-based super-resolution GAN (HFF-SRGAN), which can improve the texture detail of the reconstructed picture while lowering noise and artifacts at a lower computational cost. According to experimental findings, this method performs much better than SRGAN and has a major edge over existing SR algorithms in terms of texture details and noise.

III. PROPOSED METHOD: ADOPTION AND ADAPTATION

For our SR approach to document images, we adopt the Enhanced Super-Resolution Generative Adversarial Network (ESRGAN) model in combination with a contrast enhancement stage. ESRGAN was originally proposed to enhance the quality of the images restored by SRGAN. Most changes are made to the generator's structure (Fig. 2). First, all BN layers are removed, which improves SR [31] and blur removal [32] performance and reduces computational complexity. Second, the original basic block is replaced with the Residual-in-Residual Dense Block (RRDB), which combines a multi-level residual network and dense connections. Third, residual scaling [32, 33] is used, i.e. the residuals are scaled down by multiplying them by a constant between 0 and 1 before they are added to the main path, to avoid instability, together with a smaller initialization.
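To make the RRDB and residual scaling ideas more concrete, here is a minimal NumPy sketch, not the ESRGAN implementation. It assumes a scaling constant `beta = 0.2`, replaces the convolutions with per-pixel linear maps on the channel axis, and uses hypothetical helper names (`make_dense_weights`, `dense_block`, `rrdb`); no batch normalization appears anywhere, in line with the description above.

```python
import numpy as np


def leaky_relu(z, slope=0.2):
    """Leaky ReLU, valid for 0 < slope < 1."""
    return np.maximum(slope * z, z)


def make_dense_weights(rng, channels, growth, n_layers=5):
    """Random weights for one toy dense block (hypothetical helper)."""
    shapes = [(channels + i * growth, growth) for i in range(n_layers - 1)]
    shapes.append((channels + (n_layers - 1) * growth, channels))
    return [0.1 * rng.standard_normal(shape) for shape in shapes]


def dense_block(x, weights):
    """Each layer receives the concatenation of the input and all earlier outputs."""
    feats = [x]
    for w in weights[:-1]:
        feats.append(leaky_relu(np.concatenate(feats, axis=-1) @ w))
    # The last layer maps back to the input channel count, without activation.
    return np.concatenate(feats, axis=-1) @ weights[-1]


def rrdb(x, blocks, beta=0.2):
    """Residual-in-residual dense block with residual scaling factor beta."""
    out = x
    for weights in blocks:
        # Inner residual: scale the dense block output before adding it back.
        out = out + beta * dense_block(out, weights)
    # Outer residual: scale the whole branch before adding it to the main path.
    return x + beta * out
```

With `beta = 0` the block reduces to the identity, which shows how the constant controls how much each residual branch can perturb the main path.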


Fig. 2. ESRGAN network architecture [31].

A more effective perceptual loss is used by constraining features before activation rather than after activation. Johnson et al. [36] propose a perceptual loss, extended in [37], based on the idea of being closer to perceptual similarity [34,35]. Perceptual loss was previously defined on the activation layers of a pre-trained deep network, where the distance between two activated features is minimized. ESRGAN instead uses features before the activation layers, which addresses two problems of the initial design. First, activated features are very sparse, particularly after a very deep network, and sparse activation provides weak supervision and leads to lower performance. Second, using features after activation yields a reconstructed brightness that is inconsistent with the original image.

The network is pre-trained on the DIV2K dataset, which contains 800 images, the Flickr2K dataset, with 2650 images collected from the Flickr website, and the OutdoorSceneTraining (OST) dataset. It is trained in RGB channels, with the training data augmented by random horizontal flips and 90-degree rotations.

The model is evaluated on benchmark datasets: Set5, Set14, BSD100, Urban100 and the PIRM self-validation dataset provided in the PIRM-SR challenge. All experiments are performed with a scale factor of ×4 between the LR and HR images. LR images are obtained by down-sampling HR images with the MATLAB bicubic kernel function. The mini-batch size is set to 16. The spatial size of the cropped HR patch is 128 × 128.

ESRGAN won first place in the PIRM2018-SR Challenge. It preserves and restores the fine details of the image. This property is very important for overcoming the limitations and constraints of scanned document images during resolution enhancement, so that document information is not lost after the process is applied.

We have adopted the ESRGAN model and adapted it to document images (by transfer learning) because of the promising super-resolution results it has already obtained on natural images. We use the pre-trained network of this model, following the fine-tuning approach, to obtain a super-resolution approach better suited to images of scanned documents.

Transfer learning is a machine learning technique in which a model created for one problem is used as the foundation for a model for another problem. Pre-trained models are used as a starting point for deep learning tasks in computer vision and natural language processing because of the massive computational resources and time needed to develop ANNs for these problems.

Our super-resolution approach using transfer learning takes ESRGAN's pre-trained network and trains it on a document image dataset. We freeze all network layers except the last encoder block and the first decoder block and pass our dataset through the network, so that the network's specialization in super-resolution of natural images is removed. This operation is repeated by sequentially unfreezing the upper layers.

To train our model, we followed the steps below:
• First, we convert the images from the dataset to matrices so that pixels can be manipulated, without applying any pre-processing.
• Then we build from these matrices batches of 10 LR and HR images, while maintaining the correspondence between LR / HR images through a dictionary.
• The network is applied to the LR batches. The resulting images are compared to the HR batches to calculate the loss using the MSE loss function, and this loss is then used to adjust the network weights by backpropagation. Backpropagation applies only to unfrozen layers.
• Once the training is completed, we have a model able to improve the resolution of the images.
• We apply contrast enhancement to the SR result image with an existing function; in some cases this removes a slight blur on the edges of characters.

Using fine-tuning, we first selected the best architecture for our neural network, as well as the best kind of images, before comparing it to existing SR techniques. We examined 6 levels of fine-tuning of the pre-trained model. In the first level, we freeze all the network layers, with a learning rate a = 0.0004. In the second, we unfreeze the last layer of the encoder part (the tenth network layer) and the first decoder part, and we restart the training keeping a = 0.0004. Then we unfreeze the next generator block and the previous encoder block, keeping the same learning rate. We repeat this operation until the sixteenth network layer. We found that the best architecture using the ESRGAN generator and discriminator is obtained when we unfreeze the last four generator layers and the first four discriminator layers. The next figure represents the state of our network for this level.

Fig. 3. Elaboration of different levels of fine-tuning (from top to bottom, from level 2 to level 6), the layers framed in red are fine-tuned, the other layers are frozen.

IV. EVALUATION AND EXPERIMENTAL RESULTS

A. The image dataset used to train our model

The selection of the training dataset is important in the design of a powerful network, because a bad choice of this dataset considerably reduces the performance of the model. Faced with the absence of SR datasets for deep learning on document images, we created our new SR_VISION dataset of document images. This dataset makes it possible
to test super-resolution deep learning approaches on images of scanned documents of all types (historical, administrative, manuscripts, printed matter) with degradations of all types (noise, blur, poor lighting, light ink, physical degradation, etc.). It presents challenges for assessing the robustness of various methods and their preservation of layout quality. All images are divided into two subfolders, TRAIN: 757 images and TEST: 757 images.

The existing folders are:
- HR_GT_512×512: contains the HR images of size 512×512; these images are compared with those resulting from super-resolution approaches to assess the quality of the result (e.g. PSNR).
- LR_256×256: contains LR images of size 256×256.
- LR_128×128: contains LR images of size 128×128.

Fig. 4. Example of images of our SR_VISION dataset.

B. Evaluation metrics used

The most used image evaluation metrics in the super-resolution literature are PSNR and SSIM. Other metrics exist, such as MSE (which is part of the PSNR formula), RMSE as a function of MSE and DSSIM as a function of SSIM. According to [38], the SSIM and FSIM metrics are comparatively better than MSE and PSNR from the point of view of human visual perception. There are further metrics, but they are not widely used. To evaluate our model we used three metrics: PSNR, MSE and SSIM.

$$\mathrm{PSNR} = 10 \log_{10}\left(\frac{\mathrm{Max}_f^{2}}{\mathrm{MSE}}\right) \tag{1}$$

$$\mathrm{MSE} = \frac{1}{mn} \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} \left[ I(i,j) - I'(i,j) \right]^{2} \tag{2}$$

where I(i,j) represents the pixel values of the reference HR image and I'(i,j) the pixel values of the HR image obtained by SR; m is the number of rows of pixels in the image and i the index of these rows; n is the number of columns of pixels in the image and j the index of these columns. Max_f is the maximum value of the signal f in the original image (given that the original image is of good quality).

$$\mathrm{SSIM}(x,y) = \frac{2\mu_x\mu_y + c_1}{\mu_x^{2} + \mu_y^{2} + c_1} \cdot \frac{2\sigma_x\sigma_y + c_2}{\sigma_x^{2} + \sigma_y^{2} + c_2} \cdot \frac{\mathrm{cov}_{xy} + c_3}{\sigma_x\sigma_y + c_3} \tag{3}$$

where μx is the mean of x; μy the mean of y; σx² the variance of x; σy² the variance of y; cov_xy the covariance of x and y, and

$$c_1 = (k_1 L)^{2}, \quad c_2 = (k_2 L)^{2} \quad \text{and} \quad c_3 = \frac{c_2}{2}$$

C. SR Results

While building our system, the network went through different stages of learning, each stage being an improvement on the previous one. The following figure presents the results of certain levels of the model.

Fig. 5. Example of results of different fine tuning levels of our model.

At the first level, we did not notice any change in image quality: the image is blurred and contains unnecessary details (PSNR = 24.32 and SSIM = 0.698). At the second level, the resulting images lose their color and the excessive details are incompletely removed (PSNR = 23.30 and SSIM = 0.784). At level 3, the images regain their color, with a little blur remaining (PSNR = 27.36 and SSIM = 0.829). At the other levels (4, 5 and 6), we notice a progressive improvement in image quality and the continued disappearance of the degradations, with successive values PSNR = 27.26, 27.56, 27.73 and SSIM = 0.833, 0.835, 0.847.

To estimate the efficiency of our model, we compared it with classic techniques (Fig. 6) and with other approaches such as EDSR, SRGAN, ESRGAN, Real-ESRGAN and SRCNN (Fig. 7).

Fig. 6. Comparative results of our approach with bilinear and bicubic.
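The fine-tuning levels compared in this section rely on freezing part of a pre-trained network and updating only the remaining layers with an MSE loss. The toy NumPy sketch below illustrates that mechanism on a stack of linear layers. It is an assumption-laden simplification: `frozen_mask` simply unfreezes one more layer from the end at each level, which only approximates the encoder/decoder schedule described in the paper, and all function names are hypothetical. The batch size of 10 and the learning rate of 0.0004 follow the values reported in the text.

```python
import numpy as np


def frozen_mask(n_layers, level):
    """Level 1 freezes every layer; each further level unfreezes one more layer from the end."""
    n_trainable = min(max(level - 1, 0), n_layers)
    return [i < n_layers - n_trainable for i in range(n_layers)]


def forward(weights, x):
    """Return the input and the output of every layer of a stack of linear layers."""
    acts = [x]
    for w in weights:
        acts.append(acts[-1] @ w)
    return acts


def mse_loss(pred, target):
    return float(np.mean((pred - target) ** 2))


def fine_tune_step(weights, frozen, lr_batch, hr_batch, learning_rate=4e-4):
    """One gradient step on the MSE loss; frozen layers keep their weights."""
    acts = forward(weights, lr_batch)
    pred = acts[-1]
    grad = 2.0 * (pred - hr_batch) / pred.size  # gradient of the loss w.r.t. the prediction
    updated = list(weights)
    for i in reversed(range(len(weights))):
        grad_w = acts[i].T @ grad               # gradient w.r.t. the weights of layer i
        grad = grad @ weights[i].T              # gradient w.r.t. the input of layer i
        if not frozen[i]:
            updated[i] = weights[i] - learning_rate * grad_w
    return updated, mse_loss(pred, hr_batch)
```

In a deep learning framework the same effect is usually obtained by excluding the frozen parameters from the optimizer or from gradient computation, rather than by masking updates by hand.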


From the comparison in Fig. 6, we conclude that our system overcomes the limits of traditional techniques for restoring images of scanned documents, because most of these methods produce a poor-quality image, which involves a risk of losing information. We observe that the results of our approach are much closer to the original picture than those of the bicubic and bilinear interpolation methods.

Fig. 7. Comparative results of our approach with deep learning methods.

Image restoration using the other deep learning models shows deterioration of the resulting images, in the form of blurring (SRGAN, SRCNN) and the emergence of unnecessary details (EDSR, SRCNN, Real-ESRGAN [39]). We can therefore conclude that our approach gives a good restoration of the image while increasing the initial resolution, so it offers better performance in terms of image quality compared to other methods.

These results show that the noise suppression rate begins to decrease from level 6 by a very low percentage (0.3), which can be neglected if we take into account the images from the last levels, but this does not mean that this rate will not increase if we add other levels. As for the rate of similarity between the resulting image and the target image, the SSIM curve shows that the resulting image gets closer and closer to the original image during training and as the number of levels increases. The following figure shows the comparison of our proposed approach to the ten existing approaches.

Fig. 8. Comparison of our proposed approach to the ten existing approaches.

TABLE I. COMPARISON OF AVERAGE PSNR, MSE AND SSIM OF OUR PROPOSED APPROACH TO THE TEN EXISTING APPROACHES.

        SRGAN    SRCNN    EDSR     ESRGAN   Our method   Real-ESRGAN
PSNR    28.11    25.75    28.09    26.40    29.36        20.23
MSE     136.15   232.08   139.02   204.23   99.77        250.23
SSIM    0.775    0.726    0.778    0.723    0.815        0.623

This study enriches our knowledge of the limits of conventional interpolation techniques for increasing resolution. They do not remove all the degradations present in the low-resolution image, and sometimes these methods themselves cause degradations in the processed images, as emerges from the PSNR curve, while the SSIM and MSE curves show unsatisfactory similarity results for these methods. All these results allow us to conclude that our approach gives better results than these conventional methods.
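For readers who want to reproduce scores of the kind reported in Table I, the metrics of equations (1) to (3) can be sketched as follows. This is an illustrative NumPy version and not the authors' evaluation code. It assumes 8-bit images (maximum value and dynamic range L of 255), the commonly used constants k1 = 0.01 and k2 = 0.03, and statistics computed over the whole image rather than over local windows, so its SSIM values will generally differ from those of windowed implementations.

```python
import numpy as np


def mse(ref, out):
    """Mean squared error between a reference image and a restored image, equation (2)."""
    ref = np.asarray(ref, dtype=np.float64)
    out = np.asarray(out, dtype=np.float64)
    return float(np.mean((ref - out) ** 2))


def psnr(ref, out, max_value=255.0):
    """Peak signal-to-noise ratio in dB, equation (1); infinite for identical images."""
    err = mse(ref, out)
    if err == 0.0:
        return float("inf")
    return float(10.0 * np.log10(max_value ** 2 / err))


def ssim_global(x, y, k1=0.01, k2=0.03, dynamic_range=255.0):
    """SSIM of equation (3) using whole-image statistics (a simplification)."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1 = (k1 * dynamic_range) ** 2
    c2 = (k2 * dynamic_range) ** 2
    c3 = c2 / 2.0
    mu_x, mu_y = x.mean(), y.mean()
    sigma_x, sigma_y = x.std(), y.std()
    cov_xy = np.mean((x - mu_x) * (y - mu_y))
    luminance = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    contrast = (2 * sigma_x * sigma_y + c2) / (sigma_x ** 2 + sigma_y ** 2 + c2)
    structure = (cov_xy + c3) / (sigma_x * sigma_y + c3)
    return float(luminance * contrast * structure)
```

Averaging these values over all test images gives one row per metric, in the same layout as Table I.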
V. CONCLUSION

Deep learning-based methods have recently become the center of attention of the scientific community because of their impressive results compared to classical methods, which is what led us to adopt them in our effort to find a solution to the problem of image SR and restoration. In this article we have presented our system for restoring scanned document images using fine-tuning and transfer learning. The system has been tested on our SR_VISION dataset, which contains images of documents of all types. The results of our system are very satisfactory in terms of precision and response time and show the feasibility and robustness of the approach employed. After comparing the different super-resolution methods and examining their
limitations and vulnerabilities rigorously and objectively, we can demonstrate the reliability and flexibility of our approach to restoring images of all types.

REFERENCES

[1] Prajapati, Evaluation of Different Image Interpolation Algorithms, International Journal of Computer Applications, 7, November 2012.
[2] T. Acharya and P. Tsai, Computational Foundations of Image Interpolation Algorithms, ACM Ubiquity, 8, 2007.
[3] W. Burger and M. Burge, Digital Image Processing: An Algorithmic Introduction Using Java, Springer Science, New York, NY, 2009.
[4] R. Appledorn, A new approach to the interpolation of sampled data, IEEE Transactions on Medical Imaging, 15(3), pp. 369-376, 1996.
[5] X. Li and M.T. Orchard, New edge-directed interpolation, IEEE Transactions on Image Processing, pp. 1521–1527, 2001.
[6] Dan Su, P. W., Image Interpolation by Pixel Level Data Dependent Triangulation, Computer Graphics Forum, 2002.
[7] K.S. Reddy and K.R.L. Reddy, Enlargement of Image Based Upon Interpolation Techniques, Department of Electronics and Communication Engineering, VITS, Karimnagar, India, December 2013.
[8] J.C. Gillette, T.M. Stadtmiller and R.C. Hardie, Aliasing reduction in staring infrared imagers utilizing subpixel techniques, Optical Engineering, 34, 31-37.
[9] G. Anbarjafari and H. Demirel, Image Super Resolution Based on Interpolation of Wavelet Domain High Frequency Subbands and the Spatial Domain Input Image, 32, pp. 390-394, 2010.
[10] Z. Jianjun, Z. Cui, F. Donghao and Z. Jinghong, A New Method for Super resolution Image Reconstruction Based on Surveying Adjustment, Journal of Nanomaterials, 2014.
[11] S. Roth and M. Black, Fields of experts: a framework for learning image priors, in IEEE Int. Conf. on Computer Vision and Pattern Recognition, USA, 2005.
[12] J. Sun, Z.B. Xu and H.Y. Shum, Image super-resolution using gradient profile prior, in IEEE Int. Conf. on Computer Vision and Pattern Recognition, USA, 2008.
[13] J. Sun, N. Zheng, H. Tao and H.Y. Shum, Image hallucination with primal sketch priors, in IEEE Int. Conf. on Computer Vision and Pattern Recognition, pp. 729–736, 2003.
[14] C.V. Jiji and S. Chaudhuri, Single-frame image super-resolution through contourlet learning, EURASIP Journal Advanced Signal Process, 2006.
[15] S. Dai, M. Han, Y. Wu and Y. Gong, Bilateral back-projection for single image super resolution, in IEEE Int. Conf. on Multimedia and Expo, pp. 1039–1042, 2007.
[16] Y. Ogawa, Y. Ariki and T. Takiguchi, Super-resolution by GMM-based conversion using self-reduction image, in IEEE Int. Conf. on Acoustics, Speech and Signal Processing, pp. 1285–1288, 2012.
[17] S. Haykin, Neural networks: a comprehensive foundation, Prentice-Hall, 1994.
[18] F. Ahmed, S.C. Gustafson and M.A. Karim, High fidelity image interpolation using radial basis function neural networks, in IEEE National Aerospace and Electronics Conference, pp. 588-592, 1995.
[19] N. Plaziac, Image interpolation using neural networks, IEEE Transactions on Image Processing, pp. 1647–1651, 1999.
[20] F. Pan and L. Zhang, New image super-resolution scheme based on residual error restoration by neural networks, Optical Engineering, pp. 3038-3046, 2003.
[21] M. Carcenac, A modular neural network for super-resolution of human faces, Applied Intelligence, pp. 168-186, 2007.
[22] C. Dong, C.C. Loy, K. He and X. Tang, Learning a Deep Convolutional Network for Image Super-Resolution, European Conference on Computer Vision (ECCV), 2014.
[23] W. Shi, J. Caballero, F. Huszar, J. Totz, A. P. Aitken, R. Bishop, D. Rueckert and Z. Wang, Real-time single image and video super resolution using an efficient sub-pixel convolutional neural network, in IEEE Int. Conf. on Computer Vision and Pattern Recognition, pp. 1874–1883, 2016.
[24] J. Kim, J. Kwon Lee and K. Mu Lee, Accurate image super-resolution using very deep convolutional networks, in IEEE Conference on Computer Vision and Pattern Recognition, pp. 1646–1654, 2016.
[25] Y. Tai, J. Yang and X. Liu, Image super-resolution via deep recursive residual network, in IEEE Conference on Computer Vision and Pattern Recognition, pp. 3147–3155, 2017.
[26] Y. Tai, J. Yang, X. Liu and C. Xu, MemNet: A persistent memory network for image restoration, in IEEE Conference on Computer Vision and Pattern Recognition, pp. 4539–4547, 2017.
[27] G. Suryanarayana, K. Chandran, O. I. Khalaf, Y. Alotaibi, A. Alsufyani and S. A. Alghamdi, Accurate Magnetic Resonance Image Super-Resolution Using Deep Networks and Gaussian Filtering in the Stationary Wavelet Domain, IEEE Access, pp. 71406-71417, 2021.
[28] C. Ledig, L. Theis, F. Huszar and J. Caballero, Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network, in IEEE Conference on Computer Vision and Pattern Recognition, 2017.
[29] J. Lei, H. Xue, Sh. Yang, W. Shi, Sh. Zhang and Y. Wu, HFF-SRGAN: super-resolution generative adversarial network based on high-frequency feature fusion, Journal of Electronic Imaging, 2022.
[30] H. Lei, W. Zugen, Ch. Tian and Z. Yongmei, An Improved SRGAN Infrared Image Super-Resolution Reconstruction Algorithm [J], Journal of System Simulation, pp. 2109-2118, 2021.
[31] B. Lim, S. Son, H. Kim, S. Nah and K.M. Lee, Enhanced deep residual networks for single image super-resolution, in IEEE Conference on Computer Vision and Pattern Recognition, 2017.
[32] S. Nah, T.H. Kim and K.M. Lee, Deep multi-scale convolutional neural network for dynamic scene deblurring, in IEEE Conference on Computer Vision and Pattern Recognition, pp. 3883-3891, 2017.
[33] C. Szegedy, S. Ioffe and V. Vanhoucke, Inception-v4, inception-resnet and the impact of residual connections on learning, Thirty-First AAAI Conference on Artificial Intelligence, 2016.
[34] L. Gatys, A.S. Ecker and M. Bethge, Texture synthesis using convolutional neural networks, Pattern Recognition Association of South Africa and Robotics and Mechatronics International Conference (PRASA-RobMech), pp. 1-6, 2016.
[35] J. Bruna, P. Sprechmann and Y. LeCun, Super-resolution with deep convolutional sufficient statistics, in International Conference on Learning Representations, 2015.
[36] J. Johnson, A. Alahi and L. Fei-Fei, Perceptual Losses for Real-Time Style Transfer and Super-Resolution, in Computer Vision and Pattern Recognition, 2016.
[37] C. Ledig, L. Theis, F. Huszar, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz and Z. Wang, Photo-realistic single image super-resolution using a generative adversarial network, in Computer Vision and Pattern Recognition, 2017.
[38] U. Sara, M. Akter and M. Uddin, Image Quality Assessment through FSIM, SSIM, MSE and PSNR: A Comparative Study, Journal of Computer and Communications, vol. 7, pp. 8-18, 2019.
[39] X. Wang, L. Xie, C. Dong and Y. Shan, Real-ESRGAN: Training real-world blind super-resolution with pure synthetic data, in Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1905-1914, 2021.

