
Lightweight Inception Non-Local Residual Dense Network

for
Image Super Resolution

Submitted in partial fulfilment of the requirements


of the degree of
Bachelor of Technology (B.Tech)
by

Lakshya Jalan (207138)


Kishan Jha (207137)
Hritik Raushan (207130)

Under the esteemed guidance of:

Dr. U.S.N. Raju


Associate Professor

Department of Computer Science and Engineering


NATIONAL INSTITUTE OF TECHNOLOGY WARANGAL
2020-2024
Acknowledgement

We would like to express our heartfelt gratitude to Dr. U.S.N. Raju, Associate Professor, CSE Department, for his valuable guidance, supervision, suggestions, encouragement and help throughout the semester and for the completion of our project work. He kept us going when we were down and gave us the courage to keep moving forward.
We would like to take this opportunity once again to thank Prof. R. Padmavathy, Head of the Department, Computer Science and Engineering, NIT Warangal, for giving us this opportunity and the resources to work on this project and for supporting us throughout. We also want to thank the evaluation committee for their valuable suggestions on our proposals and research and for the smooth conduct of the project presentations.

Lakshya Jalan Kishan Jha Hritik Raushan


207138 207137 207130

Date:
NATIONAL INSTITUTE OF TECHNOLOGY WARANGAL
2023-24

APPROVAL SHEET

The Project Work entitled Lightweight Inception Non-Local Residual Dense


Network for Image Super Resolution by
Lakshya Jalan (207138), Kishan Jha (207137), Hritik Raushan (207130),
is approved for the degree of Bachelor of Technology (B.Tech) in Computer Science
and Engineering.

Examiners

————————————–

————————————–

————————————–

Supervisor

————————————–

Dr. U.S.N. Raju


Associate Professor

Chairman

————————————–

Date: _________

Place: NIT, Warangal


Declaration

We declare that this written submission represents our ideas and our supervisor's ideas in our own words, and where others' ideas or words have been included, we have adequately cited and referenced the original sources. We also declare that we have adhered to all principles of academic honesty and integrity and have not misrepresented or fabricated or falsified any idea/data/fact/source in our submission. We understand that any violation of the above will be cause for disciplinary action by the Institute and can also invoke penal action from the sources which have thus not been properly cited or from whom proper permission has not been taken when needed.

Lakshya Jalan Kishan Jha Hritik Raushan


207138 207137 207130

Date:
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
NATIONAL INSTITUTE OF TECHNOLOGY WARANGAL
2023-24

Certificate

This is to certify that the dissertation work entitled Lightweight Inception Non-Local Residual Dense Network for Image Super Resolution is a bonafide record of the work carried out by Lakshya Jalan (207138), Kishan Jha (207137) and Hritik Raushan (207130), submitted to Dr. U.S.N. Raju of the Department of Computer Science and Engineering, in partial fulfilment of the requirements for the award of the degree of B.Tech at the National Institute of Technology, Warangal, during the academic year 2023-2024.

(Signature)
Dr. R. Padmavathy
Professor and Head of the Department
Department of Computer Science and Engineering
NIT Warangal

(Signature)
Dr. U.S.N. Raju
Associate Professor
Department of Computer Science and Engineering
NIT Warangal
Abstract

Image super-resolution is a challenging problem in image processing that aims to obtain a high-resolution (HR) output image from one or more of its low-resolution (LR) counterparts. Various methods, including recent powerful deep learning algorithms, have been applied to this problem and have achieved state-of-the-art performance in this domain, which has a wide range of real-world applications. This project focuses first on understanding the current accomplishments in the field of super-resolution and on benchmarking various classical and modern methods against many evaluation metrics. It then focuses on enhancing the results of super-resolution techniques by combining different aspects of image processing that complement each other. We also highlight the several approaches we took during the development of the final model; these approaches can make future developers and researchers aware of the pros and cons of each direction taken towards their goal.

Keywords: Image Super-Resolution, High Resolution Image, Low Resolution Image
Contents

Declaration
Certificate
Abstract

1 Introduction
  1.1 Image Super-resolution
  1.2 Advantages and Challenges
    1.2.1 Advantages
    1.2.2 Challenges
  1.3 Metrics of Comparison
  1.4 Objectives

2 Related Work
  2.1 Traditional methods of upscaling
  2.2 Deep Learning methods of Image Super-resolution
  2.3 Literature Review

3 Methodology
  3.1 Overview
  3.2 Implementing Evaluation Metrics
  3.3 Models/methods implemented for study
    3.3.1 Interpolation methods
    3.3.2 Deep learning methods used
  3.4 Proposed Model - Lightweight Inception Non-Local Residual Dense Network

4 Experimentation and Discussion
  4.1 Comparison Models
  4.2 Platform
  4.3 Dataset
  4.4 Results of testing
    4.4.1 Changes in Total and Trainable Parameters by implementing Depthwise Separable Convolution (DSC)
    4.4.2 Comparing LINLRDN against different models and methods

5 Conclusion and Future Work

References
List of Figures

2.1 SRCNN Architecture
2.2 Block diagram of a very-deep super-resolution (VDSR). VDSR has 20 consecutive convolutional layers and one skip connection.
2.3 The architecture of residual dense network (RDN).
2.4 Left: vanilla residual block. Middle, WDSR-A: residual block with wide activation. Right, WDSR-B: residual block with wider activation and linear low-rank convolution.
2.5 Cross-Scale Non-Local (CS-NL) attention module. The bottom green box is for patch-level cross-scale similarity matching. The upper branch shows extracting the original HR patches in the LR image.
2.6 Illustration of the Non-Local Fast Fourier Convolution (NL-FFC) layer.
2.7 The network architecture of F2SRGAN, in which DSC is the Depthwise Separable Convolution, and RFFC is the Revised Fast Fourier Convolution.
2.8 The overall network architecture of the proposed Fast Non-Local Attention Network (FNLNET).

3.1 PSNR and SSIM values
3.2 Residual Dense Block
3.3 Inception Module
3.4 DIRDN Architecture
3.5 Inception Module with DSC Convolution
3.6 Non-Local Module Architecture
3.7 Proposed Architecture of LINLRDN

4.1 Normal Convolution on DIRDN
4.2 DSC on DIRDN
4.3 DSC on LINLRDN
4.4 Example output of LINLRDN
List of Tables

2.1 Literature Survey on Image Super-Resolution
4.1 Comparison of Image Quality Metrics Across Different Interpolation and Super-Resolution Methods
Chapter 1

Introduction

1.1 Image Super-resolution

Image super-resolution (SR) is a field in computer vision that enhances the resolution of digital images. Unlike simple upscaling, super-resolution reconstructs high-resolution images from low-resolution ones by inferring fine details to improve image clarity. This technology is crucial in areas like satellite imaging, medical imaging, and surveillance, where detailed visual information is necessary.
With the rise of deep learning, new methods using neural networks
have significantly advanced super-resolution, often surpassing traditional tech-
niques like bicubic interpolation. Super-resolution can be categorized into
single-image and multi-image techniques, each employing different methods to
achieve high-quality upscaling. This area remains a dynamic field of research
with broad applications across various industries.

1.2 Advantages and Challenges

1.2.1 Advantages

Image super-resolution offers several compelling advantages, making it a valuable technique across various fields. Firstly, it enhances image quality by increasing resolution and adding finer details, which is crucial for applications like medical imaging where precision is vital for diagnosis. In the realm of consumer electronics, super-resolution improves the viewing experience by upscaling lower-resolution content to fit high-definition displays without losing quality. Additionally, it extends the usability of legacy data in surveillance and archival footage by enabling clearer visual interpretations. For remote sensing and satellite imagery, super-resolution allows for better analysis of environmental and geographical data from low-resolution images, supporting more accurate and informed decision-making. These benefits underline the transformative impact of super-resolution, driving its adoption in both professional and consumer applications.

1.2.2 Challenges

Computational Intensity: Super-resolution, especially when implemented using deep learning techniques, often requires substantial computational resources. Training deep learning models for super-resolution is resource-intensive, involving large datasets and powerful GPUs.

Dependency on Training Data: The performance of deep learning-based super-resolution models heavily depends on the diversity and quality of the training data. Models might not generalize well to types of images or scenarios not well-represented in the training set, leading to poorer performance on unseen data.

Overfitting: There is a risk that super-resolution models become overfitted to the training data, especially if the training set is not sufficiently varied. Overfitting results in models that perform exceptionally on training data but fail to deliver similar results on real-world or diverse datasets.

1.3 Metrics of Comparison

We have considered a variety of benchmarking parameters to compare the super-resolution techniques in use today. Several of these are lesser-known parameters that can give important insights in specific use cases. We have used deep-learning-based, reference-based and no-reference parameters. A short Python sketch of the two most widely used metrics, PSNR and SSIM, is given at the end of this section. The parameters are as follows:

1. Peak Signal To Noise Ratio: The Peak Signal-to-Noise Ratio (PSNR) is a widely used metric to measure the quality of reconstructed images, especially in the field of image processing. It compares the similarity between the original high-quality image and its compressed or degraded version. The PSNR is most commonly expressed on the logarithmic decibel scale.

\[ \mathrm{PSNR} = 10 \cdot \log_{10}\!\left(\frac{\mathrm{MAX}_I^2}{\mathrm{MSE}}\right) \]

2. Weighted Peak Signal To Noise Ratio: The Weighted Peak Signal-to-Noise Ratio (WPSNR) is a variant of the traditional PSNR that considers the varying sensitivity of the human eye to different image regions. It modifies the basic PSNR formula by introducing a weighting function into the mean squared error calculation, emphasizing regions that are more important or noticeable to human viewers.

\[ \mathrm{WPSNR} = 10 \cdot \log_{10}\!\left(\frac{\mathrm{MAX}_I^2}{\mathrm{WMSE}}\right) \]

3. Structural Similarity Index Metric: The Structural Similarity Index Measure (SSIM) is a widely recognized metric for assessing the perceptual quality of digital images and videos. SSIM evaluates image quality by comparing three key components of an image: luminance, contrast, and structure.
The Structural Similarity Index (SSIM) is defined as:

\[ \mathrm{SSIM}(x, y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)} \]

where \mu_x and \mu_y are the means of x and y, \sigma_x^2 and \sigma_y^2 are their variances, and \sigma_{xy} is their covariance. Constants C_1 and C_2 are used to stabilize the division with a weak denominator.

4. MS-SSIM: The Multi-Scale Structural Similarity Index (MS-SSIM) is an advanced version of the widely used Structural Similarity Index Measure (SSIM) that evaluates image quality across multiple scales. MS-SSIM was developed to provide a more comprehensive assessment of image quality by considering variations in image resolution and viewing conditions. This approach recognizes that the perception of image quality can change depending on scale, particularly when images are viewed at different sizes or resolutions.
The multi-scale structural similarity (MS-SSIM) index is defined as:

\[ \mathrm{MS\text{-}SSIM}(x, y) = [l_M(x, y)]^{\alpha_M} \prod_{j=1}^{M} [c_j(x, y)]^{\beta_j} [s_j(x, y)]^{\gamma_j} \]

where l_M is the luminance comparison at the coarsest scale, and c_j and s_j are the contrast and structure comparisons at each scale j.

5. FSIM: The Feature Similarity Index Metric (FSIM) is an advanced image quality assessment metric designed to evaluate the similarity between two images based on their feature similarity. Unlike traditional quality metrics such as PSNR or SSIM, FSIM focuses on extracting low-level features from images and assessing their similarity, which aligns closely with human visual perception.
The Feature Similarity Index Metric (FSIM) for comparing two images is defined as:

\[ \mathrm{FSIM}(X, Y) = \frac{\sum_{x,y} S_L(x, y) \cdot PC(X, Y)}{\sum_{x,y} PC(X, Y)} \]

where PC(X, Y) is the phase congruency between the two images, representing the significance of local features, and S_L(x, y) is the local similarity at each point, which includes assessments of phase congruency and gradient magnitude.

6. NQM: The Noise Quality Measure (NQM) is an image quality assessment metric specifically designed to evaluate the impact of visual noise on perceived image quality. NQM considers both the visibility of noise and the preservation of important image features, making it particularly suitable for applications where noise is an inherent issue, such as digital photography and medical imaging.
The Noise Quality Measure (NQM) for evaluating the impact of noise on image quality is defined as:

\[ \mathrm{NQM}(X, Y) = \sum_{i} \frac{S(i)}{N(i) + V(i)} \]

where S(i) is the strength of the i-th image component, N(i) is the noise level, and V(i) represents the visibility of noise, which varies based on the local image content and viewing conditions.

7. IFC: The Information Fidelity Criterion (IFC) is an advanced metric used for assessing the quality of images based on the fundamental concept of information fidelity. Unlike traditional metrics that evaluate visual quality based on error visibility, IFC quantifies the mutual information between the reference and distorted images, measuring how much information from the reference image is preserved in the distorted version. This approach reflects a more detailed understanding of image degradation by considering the preservation of information, making it particularly useful for applications like image compression and transmission.
The Information Fidelity Criterion (IFC) for evaluating the quality of images based on information preservation is defined as:

\[ \mathrm{IFC}(X, Y) = \sum_{i} \log_2\!\left( \frac{1 + \dfrac{S_{xy}^2}{S_x^2 S_y^2}}{1 + \dfrac{N_{xy}^2}{N_x^2 N_y^2}} \right) \]

where S_{xy}, S_x^2, S_y^2 represent the signal-related terms and N_{xy}, N_x^2, N_y^2 represent the noise-related terms of the images X and Y.

8. LPIPS: Learned Perceptual Image Patch Similarity (LPIPS) is a modern metric used to evaluate image quality based on perceptual similarity, especially in the context of images generated by deep learning methods such as those used in super-resolution, image synthesis, and image compression. Developed to better reflect human visual perception, LPIPS uses deep features from pre-trained neural networks to assess the perceptual differences between images.
The Learned Perceptual Image Patch Similarity (LPIPS) metric for assessing image quality based on perceptual differences is defined as:

\[ \mathrm{LPIPS}(X, Y) = \sum_{l} \frac{1}{H_l W_l} \sum_{h,w} w_l \cdot \left\| \phi_l(X)_{hw} - \phi_l(Y)_{hw} \right\|_2^2 \]

where \phi_l(\cdot) represents the feature map from layer l of a pre-trained network, w_l is the layer's weight, and H_l, W_l are the dimensions of the feature map at that layer.

9. NRQM: The No Reference Quality Metric (NRQM) is a specialized image quality assessment (IQA) tool designed to evaluate the quality of an image without the need for a reference image. This type of metric is particularly valuable in situations where an original, undistorted image is unavailable for comparison. NRQM assesses image quality based solely on the characteristics of the image being analyzed, using models that predict perceived image quality from inherent features that are indicative of typical distortions or quality degradations.
The No Reference Quality Metric (NRQM) for assessing image quality in the absence of a reference image is conceptually defined as:

\[ \mathrm{NRQM}(I) = f(\mathrm{features}(I)) \]

where features(I) extracts various indicators of image quality from the image I, and f is a function that evaluates these features to produce a quality score.

10. NIQE: The Natural Image Quality Evaluator (NIQE) is a sophisticated no-reference image quality assessment (NR-IQA) metric that operates without the need for a distortion-specific reference image. NIQE is based on the statistical regularities observed in natural, undistorted images, which are often violated by the presence of distortions. The model quantifies the degree of naturalness and predicts perceived image quality using a collection of features based on natural scene statistics.
The Natural Image Quality Evaluator (NIQE), which assesses image quality based on natural scene statistics without reference to a distorted image, is defined as:

\[ \mathrm{NIQE}(I) = \sqrt{(\mathbf{f}(I) - \mathbf{m})^{T} \, \mathbf{C}^{-1} \, (\mathbf{f}(I) - \mathbf{m})} \]

where f(I) are the features extracted from the image I, m is the mean vector, and C is the covariance matrix derived from a collection of natural images. This measure quantifies how much the image deviates from typical natural image statistics.

11. PIQE: The Perception based Image Quality Evaluator (PIQE) is another no-reference image quality assessment (NR-IQA) metric that quantifies the perceptual quality of digital images without requiring a reference image. PIQE is designed to assess visual quality based on perceptual factors that are critical to human viewers, making it useful in contexts where subjective image quality is more important than objective metrics.
The Perception based Image Quality Evaluator (PIQE), which assesses the perceptual quality of images without reference, is conceptually defined as:

\[ \mathrm{PIQE}(I) = g(\{p_1(I), p_2(I), \ldots, p_n(I)\}) \]

where p_i(I) represents various perceptual quality factors measured in the image I, and g is the function that aggregates these factors into a final quality score. This metric is designed to reflect the perceived visual quality based on significant human perceptual criteria.
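
As referenced at the start of this section, the sketch below shows how the two most consistent reference-based metrics, PSNR and SSIM, can be computed with NumPy. It is a minimal illustration: the PSNR function follows the formula above directly, while the SSIM function uses global image statistics in a single window, whereas practical SSIM implementations average the same expression over local (usually Gaussian-weighted) windows.

import numpy as np

def psnr(ref, test, max_val=255.0):
    """Peak Signal-to-Noise Ratio, in dB, between a reference and a test image."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10((max_val ** 2) / mse)

def ssim_global(ref, test, max_val=255.0):
    """SSIM computed once from global statistics; practical implementations
    average this expression over local windows of the image."""
    x = ref.astype(np.float64)
    y = test.astype(np.float64)
    c1, c2 = (0.01 * max_val) ** 2, (0.03 * max_val) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))

# Toy usage: compare a ground-truth image with a noisy version of itself.
hr = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
sr = np.clip(hr + np.random.normal(0, 5, hr.shape), 0, 255).astype(np.uint8)
print(psnr(hr, sr), ssim_global(hr, sr))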

1.4 Objectives

1. To understand the current state of the art in image super-resolution techniques. This involves the study of several interpolation and deep learning based super-resolution techniques and their implementation from scratch.
2. To study, analyse and implement the various evaluation metrics for image comparison and then select the most consistent parameters for further evaluations.
3. To benchmark these various models against the various parameters and tabulate their results.
4. To incorporate new ideas and image processing concepts into existing techniques that perform well, in order to arrive at a new model that delivers the best results.
5. To discuss in detail the architecture and working of our proposed model "Lightweight Inception Non-Local Residual Dense Network", as well as other directions explored before reaching the final model.
6. To tabulate the results obtained from the various techniques and compare them with our proposed technique.
Chapter 2

Related Work

2.1 Traditional methods of upscaling

Classical methods of image upscaling, also known as interpolation techniques, are fundamental processes used to increase the resolution of digital images. These methods are crucial in various applications, including image editing, video upscaling, and real-time graphics rendering. A short code sketch comparing several of these methods is given at the end of this section.
(a) Nearest Neighbor Interpolation: This is the simplest form of image
upscaling. It works by assigning to each pixel in the upscaled image the
value of the closest pixel from the original image. This method is very
fast but often results in a blocky, pixelated image, especially noticeable
in higher magnification levels.
Given an input image I with dimensions (w, h) and the desired output dimensions (W, H), the nearest neighbor interpolation formula for a pixel (x′, y′) in the output image can be expressed as:

\[ I'(x', y') = I\!\left( \left\lfloor x' \tfrac{w}{W} \right\rfloor,\; \left\lfloor y' \tfrac{h}{H} \right\rfloor \right) \]

where:
• I(x, y) is the pixel value at coordinates (x, y) in the input image.
• I′(x′, y′) is the pixel value at coordinates (x′, y′) in the output image.
• ⌊·⌋ denotes the floor function, which maps a real number to the largest previous integer.

(b) Bilinear Interpolation: Bilinear interpolation considers the closest 2x2 neighborhood of known pixel values surrounding an unknown pixel. It then performs a linear interpolation first in one direction, and then in the other, to obtain the final pixel value. This method smooths out the image more effectively than nearest neighbor interpolation but can still introduce blurring, especially around edges.
Given an input image I with dimensions (w, h), and the desired output dimensions (W, H), the bilinear interpolation for a pixel (x′, y′) in the output image can be expressed using the coordinates of the four nearest pixels in the input image:

\[ I'(x', y') = \frac{1}{(x_2 - x_1)(y_2 - y_1)} \big[ I(x_1, y_1)(x_2 - x')(y_2 - y') + I(x_1, y_2)(x_2 - x')(y' - y_1) + I(x_2, y_1)(x' - x_1)(y_2 - y') + I(x_2, y_2)(x' - x_1)(y' - y_1) \big] \]

where:
• (x_1, y_1) and (x_2, y_2) are the coordinates of the top-left and bottom-right corners, respectively, of the pixel square surrounding the target location (x′, y′).
• x_1 = ⌊x′ w/W⌋, x_2 = ⌈x′ w/W⌉, y_1 = ⌊y′ h/H⌋, y_2 = ⌈y′ h/H⌉.
• I(x, y) denotes the pixel value at coordinates (x, y) in the input image.
• ⌊·⌋ and ⌈·⌉ denote the floor and ceiling functions, respectively.

(c) Bicubic Interpolation: A more advanced technique, bicubic interpolation, improves on bilinear by considering the closest 4x4 neighborhood of pixels and using a cubic polynomial to interpolate new pixel values. This method generally produces smoother edges and fewer artifacts than bilinear interpolation, making it well-suited for photographic images where smooth gradations are important. The formula for bicubic interpolation can be expressed in a generalized form as follows:

\[ I'(x', y') = \sum_{i=-1}^{2} \sum_{j=-1}^{2} I(x_i, y_j) \cdot f_x(x' - x_i) \cdot f_y(y' - y_j) \]

where:
• I(x, y) is the pixel value at coordinates (x, y) in the input image.
• x_i = ⌊x′ w/W⌋ + i − 1, and y_j = ⌊y′ h/H⌋ + j − 1 for i, j ∈ {−1, 0, 1, 2}.
• f_x(t) and f_y(t) are the bicubic kernel functions applied in the x and y directions, respectively. These functions often take the form of a cubic polynomial, the most common being:

\[ f(t) = \begin{cases} 1.5|t|^3 - 2.5|t|^2 + 1 & \text{for } |t| < 1 \\ -0.5|t|^3 + 2.5|t|^2 - 4|t| + 2 & \text{for } 1 \le |t| < 2 \\ 0 & \text{otherwise} \end{cases} \]

(d) Lanczos Resampling: Lanczos resampling uses a sinc function as its interpolation kernel, which is then windowed by another sinc function (the Lanczos kernel). It considers a larger neighborhood of source pixels than bicubic, often leading to better preservation of details with less blurring and fewer ringing artifacts. Lanczos is particularly effective for images with high-contrast details and is favored for its ability to maintain sharp edges.
The formula for Lanczos interpolation at a point (x′, y′) in the output image, based on the input image dimensions (w, h) and the desired output dimensions (W, H), is given by:

\[ I'(x', y') = \sum_{i=-a}^{a} \sum_{j=-a}^{a} I(x_i, y_j) \cdot L\!\left(x' \tfrac{w}{W} - x_i\right) \cdot L\!\left(y' \tfrac{h}{H} - y_j\right) \]

where:
• I(x, y) denotes the pixel value at coordinates (x, y) in the input image.
• x_i = ⌊x′ w/W⌋ + i and y_j = ⌊y′ h/H⌋ + j for i, j ∈ {−a, ..., a}.
• L(t) represents the Lanczos function, defined as:

\[ L(t) = \begin{cases} \dfrac{\sin(\pi t)\,\sin(\pi t / a)}{\pi^2 t^2 / a} & \text{if } t \neq 0 \\ 1 & \text{if } t = 0 \end{cases} \]

The parameter a is the size of the Lanczos kernel, typically a small integer such as 2 or 3, which determines the number of pixels considered on either side of the target pixel. This interpolation kernel uses the sinc function, windowed by another sinc function, to limit the size of the data window to (2a + 1) by (2a + 1) around each output pixel, thus managing computational complexity while enhancing accuracy.

(e) Quintic Interpolation: Quintic interpolation is an advanced technique used in image processing, particularly for tasks like image upscaling where higher-resolution versions of an existing image are desired. It is a type of polynomial interpolation that uses fifth-degree (quintic) polynomials to interpolate the values of new pixels based on the values of existing pixels around them.
The generic form of the quintic interpolation polynomial for estimating a value y based on a known set of points and a target interpolant x can be given by:

\[ P(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4 + a_5 x^5 \]

where:
• P(x) is the polynomial function representing the interpolated value at x.
• a_0, a_1, a_2, a_3, a_4, and a_5 are the polynomial coefficients, which are determined based on boundary conditions and the values and derivatives at the known points.
To calculate the coefficients a_0, a_1, ..., a_5, one typically uses conditions derived from the function values and derivatives at key points. For example, if interpolating between two points, values and derivatives up to the second derivative at both ends might be used to set up a system of equations to solve for these coefficients.
The detailed calculation of these coefficients generally involves solving a system of linear equations derived from applying the interpolation conditions to the polynomial form, ensuring that P(x) not only passes through the given points but also meets any required derivative conditions at those points.

(f) Septic Interpolation: Septic interpolation is an advanced image processing technique used for tasks such as image upscaling, where the goal is to create higher-resolution versions of an existing image. This method involves the use of seventh-degree (septic) polynomials to perform interpolation, offering an even higher level of detail and smoothness compared to lower-degree polynomial methods like quintic interpolation.
The general form of the septic interpolation polynomial for estimating a value y at a given point x is expressed as:

\[ P(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4 + a_5 x^5 + a_6 x^6 + a_7 x^7 \]

where:
• P(x) is the polynomial function representing the interpolated value at x.
• a_0, a_1, a_2, a_3, a_4, a_5, a_6, and a_7 are the coefficients of the polynomial. These coefficients are determined based on the known data points and any boundary conditions or derivative requirements at those points.
To compute these coefficients, conditions from the function values, and potentially their derivatives at key interpolation points, are used. This often involves solving a system of linear equations derived from setting up the polynomial to meet all specified conditions, such as function values and derivatives up to certain orders at the endpoints of the interpolation range.
Septic interpolation provides an extremely smooth transition between data points, suitable for applications requiring high-degree continuity and minimal visual or analytic discontinuities.
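
As mentioned at the start of this section, the sketch below compares several of these classical methods in practice. It is a minimal illustration, not part of the report's benchmark code: OpenCV's resize function exposes nearest-neighbour, bilinear, bicubic and Lanczos kernels directly, and a from-scratch nearest-neighbour routine is included to mirror the formula given above. The input file name is a placeholder; quintic and septic interpolation have no built-in OpenCV flag and would need a custom kernel.

import cv2
import numpy as np

def nearest_neighbor_upscale(img, W, H):
    """From-scratch nearest-neighbour upscaling following the floor-based
    coordinate mapping described earlier in this section."""
    h, w = img.shape[:2]
    xs = np.arange(W) * w // W   # source column for each output column
    ys = np.arange(H) * h // H   # source row for each output row
    return img[ys][:, xs]

lr = cv2.imread("input_lr.png")                     # placeholder path to an LR image
scale = 4
size = (lr.shape[1] * scale, lr.shape[0] * scale)   # cv2 expects (width, height)

upscaled = {
    "nearest":  cv2.resize(lr, size, interpolation=cv2.INTER_NEAREST),
    "bilinear": cv2.resize(lr, size, interpolation=cv2.INTER_LINEAR),
    "bicubic":  cv2.resize(lr, size, interpolation=cv2.INTER_CUBIC),
    "lanczos":  cv2.resize(lr, size, interpolation=cv2.INTER_LANCZOS4),
    "nearest_scratch": nearest_neighbor_upscale(lr, size[0], size[1]),
}
for name, img in upscaled.items():
    print(name, img.shape)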

2.2 Deep Learning methods of Image Super-resolution

(a) SRCNN[1]: SRCNN[1], introduced by Dong et al. in 2014, was one of the first deep learning approaches to tackle the super-resolution problem using convolutional neural networks (CNNs). This model demonstrated that deep learning could be effectively applied to super-resolve images, achieving superior results over traditional methods like bicubic interpolation. A minimal code sketch of its three-layer layout is given at the end of this section.

Figure 2.1: SRCNN Architecture

(b) VDSR[2]: VDSR[2] (Very Deep Super-Resolution) is a model developed to enhance the resolution of images using deep learning techniques. Introduced in a paper by Jiwon Kim, Jung Kwon Lee, and Kyoung Mu Lee in 2016, VDSR leverages the power of very deep convolutional neural networks (CNNs) to significantly improve the accuracy of super-resolved images beyond what previous models had achieved.

Figure 2.2: Block diagram of a very-deep super-resolution (VDSR). VDSR has 20 consecutive convolutional layers and one
skip connection.

(c) RDN[3]: RDN[3] (the Residual Dense Network) is an advanced deep learning model specifically designed to tackle the challenge of image super-resolution (SR). Introduced by Yulun Zhang, Yapeng Tian, Yu Kong, Bineng Zhong, and Yun Fu in their 2018 paper, "Residual Dense Network for Image Super-Resolution," RDN represents a significant evolution in the architecture of convolutional neural networks (CNNs) used for SR tasks. The model is particularly noted for its ability to effectively capture abundant local features through its unique design.

Figure 2.3: The architecture of residual dense network (RDN).

(d) WDSR[4]: The concept of "Wide Activation for Efficient and Accurate Image Super-Resolution"[4] introduces an innovative approach to designing convolutional neural networks (CNNs) for the task of image super-resolution (SR). This approach primarily focuses on enhancing the network architecture by broadening the activation layers rather than deepening the network itself. The main objective is to create a balance between efficiency and accuracy, achieving high-quality upscaling results without excessively increasing computational demands.

Figure 2.4: Left: vanilla residual block. Middle WDSR-A: residual block with wide activation. Right WDSR-B: residual block
with wider activation and linear low-rank convolution

(e) CSNL-SEM[5]: "Image Super-Resolution with Cross-Scale Non-Local Attention and Exhaustive Self-Exemplars Mining"[5]. This mechanism addresses a common limitation in many super-resolution techniques: limited receptive fields. Typical convolutional layers in deep neural networks focus on local information, which might miss out on the broader context needed for reconstructing high-quality images from low-resolution inputs.
It also exploits the self-similarity property of natural images. Images often contain repetitive structures and patterns. By mining these self-exemplars (i.e., similar patches within the same image or across different scales), the model can better learn how to upscale novel patches based on previously learned examples.

Figure 2.5: Cross-Scale Non-Local (CS-NL) attention module. The bottom green box is for patch-level cross-scale similarity matching. The upper branch shows extracting the original HR patches in the LR image.

(f) NL-FFC[6]: The concept of "Non-Local Fast Fourier Convolution for Image Super Resolution"[6] involves an innovative approach to enhance the performance of super-resolution (SR) techniques by integrating non-local operations and Fast Fourier Transform (FFT) convolutions. This method addresses some of the common limitations in conventional SR approaches, primarily by improving the ability to capture long-range dependencies and increasing the efficiency of the convolution operations.

Figure 2.6: Illustration of Non Local-Fast Fourier Convolution (NL-FFC) layer.

(g) F2SRGAN[7]: "Fast and Flexible Super-Resolution Generative Adversarial Network"[7] is an enhanced version of the standard Super-Resolution Generative Adversarial Network (SRGAN)[8] designed to offer improvements in both speed and flexibility for image super-resolution tasks. F2SRGAN[7] aims to address some of the practical challenges faced by earlier GAN-based super-resolution models, such as computational inefficiency and limited adaptability to different upscaling factors or image conditions.
F2SRGAN incorporates optimizations in the network architecture to speed up the training and inference processes. These optimizations might include more efficient convolutional layers, reduced model complexity, or improved training techniques that converge faster than traditional methods.
The model is designed to be lightweight, making it suitable for deployment in environments with limited computational resources, such as mobile devices or embedded systems.

Figure 2.7: The network architecture of F2SRGAN, in which DSC is the Depthwise Separable Convolution, and RFFC is the
Revised Fast Fourier Convolution.

(h) FNLNET[9] : The "Fast Non-Local Attention network for light super-
resolution"[9] (FNLN) is a specialized deep learning model that in-
tegrates the principles of non-local attention mechanisms within a
lightweight network architecture to enhance image super-resolution ef-
ficiently. This model is particularly designed to handle the compu-
tational and memory constraints of devices with limited processing
power, such as smartphones and embedded systems, while still deliver-
ing significant improvements in image quality.

Figure 2.8: The overall network architecture of the proposed Fast Non-Local Attention Network (FNLNET).
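
As referenced in item (a) above, the sketch below shows a minimal Keras network in the spirit of SRCNN: three convolutional layers (patch extraction, non-linear mapping, reconstruction) with the commonly cited 9-1-5 kernel sizes and 64/32 filters, applied to a bicubic-upscaled input. It is an illustrative layout only and not the authors' released implementation; training details (loss, optimizer, data pipeline) are simplified.

import tensorflow as tf
from tensorflow.keras import layers, models

def build_srcnn(channels=3):
    """Three-layer SRCNN-style network operating on a bicubic-upscaled input."""
    inp = layers.Input(shape=(None, None, channels))
    x = layers.Conv2D(64, 9, padding="same", activation="relu")(inp)  # patch extraction
    x = layers.Conv2D(32, 1, padding="same", activation="relu")(x)    # non-linear mapping
    out = layers.Conv2D(channels, 5, padding="same")(x)               # reconstruction
    return models.Model(inp, out, name="srcnn_sketch")

model = build_srcnn()
model.compile(optimizer="adam", loss="mse")
model.summary()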

2.3 Literature Review

Table 2.1: Literature Survey on Image Super-Resolution

SNo. | Title         | Authors                                          | Method                                                         | Year
1    | SRCNN[1]      | Chao Dong, Chen Change Loy et al.                | Super-resolution using a series of CNNs                        | 2015
2    | EDSR[10]      | Lim, Bee, Son et al.                             | Enhanced Deep Residual Networks                                | 2017
3    | SRGAN[8]      | Ledig et al.                                     | Super-Resolution GAN                                           | 2017
4    | RDN[3]        | Zhang et al.                                     | Residual Dense Network                                         | 2018
5    | VDSR[2]       | Kim et al.                                       | Very Deep Super-Resolution                                     | 2016
6    | WDSR[4]       | Jiahui Yu, Yuchen Fan et al.                     | Wider features before ReLU activation                          | 2018
7    | CSNL-SEM[5]   | Yiqun Mei, Yuchen Fan et al.                     | Cross-Scale Non-Local Attention and Exhaustive Self-Exemplars  | 2020
8    | NL-FFC[6]     | Abhishek Kumar Sinha, S. Manthira Moorthi et al. | Non-Local Fast Fourier Convolution for Image Super Resolution  | 2022
9    | F2SRGAN[7]    | Duc Phuc Nguyen, Khanh Hung Vu et al.            | Lightweight approach to super-resolution                       | 2023
10   | FNLNET[9]     | Jonghwan Hong, Bokyeung Lee et al.               | Fast Non-Local Attention network for light super-resolution    | 2023
Chapter 3

Methodology

In this chapter, we are going to discuss the proposed model and the
various steps taken to develop the final architecture.

3.1 Overview

The proposed Lightweight Inception Non-Local Residual Dense Network uses the base architecture of Residual Dense Blocks as used in the RDN[3] model. It also integrates the Inception module used in DIRDN and further tries to improve results by using image processing concepts such as non-local feature importance together with computational efficiency methodologies. The resulting model consistently gives better results on standard evaluation metrics when compared with previously implemented deep learning models for image super-resolution.

3.2 Implementing Evaluation Metrics

The study and implementation of different evaluation metrics was important to our research. We found 11 such metrics, and a detailed description of each of them has been provided in the Introduction. We deal with both reference-based and no-reference parameters: 7 of the discussed parameters are reference-based, one is a deep-learning-based reference parameter (LPIPS), and 3 further deep-learning-based no-reference parameters are also discussed.
The two metrics that are the most widely used, as they are the most consistent across all methodologies and datasets, are:
i. Peak Signal To Noise Ratio (PSNR)
ii. Structural Similarity Index Metric (SSIM)

Figure 3.1: PSNR and SSIM values

3.3 Models/methods implemented for study

The analysis of evaluation metrics was followed by the in-depth study and implementation of a variety of methods and models for image super-resolution. These models cover the classical methods as well as deep learning methods, and even the latest developments take different approaches to image super-resolution. Some of these methods, starting from the classical ones, are:

3.3.1 Interpolation methods

A brief explanation of each of these methods has been given in Related Work (Chapter 2). These methods are deterministic and do not utilize neural networks. The interpolation methods are as follows:

i. Nearest Neighbour Interpolation
ii. Bilinear Interpolation
iii. Bicubic Interpolation
iv. Lanczos Interpolation
v. Quintic Interpolation
vi. Septic Interpolation

Among these methods, bicubic interpolation has consistently produced better results on the images. This is because it strikes a strong balance between preserving detail and avoiding blockiness or ringing artifacts. Even standard photo editing software such as Adobe Photoshop still uses bicubic interpolation for standard scaling factors.

3.3.2 Deep learning methods used

Although our exploration included reading, analysing and implementing various deep learning models, a few of which were mentioned in Chapter 2, here we discuss the key techniques and models that form the base of our research and the subsequent improvements which led to our proposed work.

i. Deep Inception Residual Dense Network (DIRDN):
As discussed in Related Work, the Residual Dense Network uses Residual Dense Blocks (RDBs), which are the building blocks of the network. DIRDN builds on the same blocks, but here the RDBs are fully connected, meaning that each RDB receives input from all of its previous blocks and hence makes full use of the original LR image's hierarchical properties.

Figure 3.2: Residual Dense Block

Also, an Inception block[11] is added after each RDB so that, instead of using only a 3x3 kernel, the 1x1, 3x3 and 5x5 kernels are all used simultaneously. Features at different spatial scales with larger receptive fields can thus be captured, enabling the network to learn more global and spatially distributed patterns.

Figure 3.3: Inception Module

Figure 3.4: DIRDN Architecture

ii. Inception Module with Depthwise Separable Convolution[12][11]:
Depthwise Separable Convolution[12] (DSC) is an advanced technique implemented in architectures like MobileNets, aimed at optimizing the efficiency of convolutional neural networks. This method significantly cuts down on the computational demands and the parameter count of models without greatly affecting their accuracy. It accomplishes this by splitting a traditional convolution into two distinct processes: depthwise convolution and pointwise convolution. A depthwise convolution is a spatial convolution applied independently to each input channel, while a pointwise convolution is a regular convolution with a 1x1 kernel (hence looking at a single point across all the channels).

Figure 3.5: Inception Module with DSC Convolution

iii. Non Local Module[6][9][5]: In the architectures seen so far, a larger receptive field is often obtained by stacking more convolutional layers. However, a vanilla convolutional layer only provides local spatial information while missing global spatial information. To integrate both local and global spatial information into SR networks, and to include global information efficiently, we use non-local operations to directly calculate the relationship between two locations, so as to quickly capture long-distance correlations and obtain more global feature information in our results. Non-local operations can be defined as follows:
Given:
• x: Input feature map
• i: Output feature position index
• j: Index of all possible positions
• x and y: Inputs and outputs of non-local operations
• f (xi , x j ): Similarity function between xi and x j
• g(x j ): Representation of feature map at position j

The non-local operation can be described as follows:

\[ y_i = \frac{1}{c(x)} \sum_{\forall j} f(x_i, x_j)\, g(x_j) \]

where

\[ f(x_i, x_j) = \exp\!\left(\theta(x_i)^{T} \phi(x_j)\right) \quad \text{and} \quad c(x) = \sum_{\forall j} f(x_i, x_j) \]

Here, y_i represents the output feature at position i, computed based on the input feature map x and the similarity between features at different positions. Code sketches of the DSC and non-local building blocks are given at the end of this section.

Figure 3.6: Non Local Module Architecture
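
As referenced above, the sketch below illustrates the two building blocks just described in plain TensorFlow (eager) code: Keras's SeparableConv2D realises the depthwise-plus-pointwise factorisation of DSC, and the non-local block follows the embedded-Gaussian form of the equations above, with theta, phi and g implemented as 1x1 convolutions and the normalisation c(x) handled by a softmax over positions. Channel widths and the toy feature map are illustrative choices, not the exact sizes used in our network.

import tensorflow as tf
from tensorflow.keras import layers

# Depthwise separable convolution: per-channel spatial filtering followed by
# a 1x1 pointwise convolution that mixes information across channels.
dsc = layers.SeparableConv2D(64, 3, padding="same", activation="relu")

def non_local_block(x, inter_channels=32):
    """Embedded-Gaussian non-local operation on a (batch, H, W, C) feature map.
    Assumes statically known shapes; new layers are created per call, which is
    fine for a one-off sketch."""
    b, h, w, c = x.shape
    theta = layers.Conv2D(inter_channels, 1)(x)   # theta(x_i), the query
    phi = layers.Conv2D(inter_channels, 1)(x)     # phi(x_j), the key
    g = layers.Conv2D(inter_channels, 1)(x)       # g(x_j), the value
    theta = tf.reshape(theta, (b, h * w, inter_channels))
    phi = tf.reshape(phi, (b, h * w, inter_channels))
    g = tf.reshape(g, (b, h * w, inter_channels))
    # f(x_i, x_j) = exp(theta^T phi); the softmax supplies the 1/c(x) normalisation.
    attn = tf.nn.softmax(tf.matmul(theta, phi, transpose_b=True), axis=-1)
    y = tf.reshape(tf.matmul(attn, g), (b, h, w, inter_channels))
    y = layers.Conv2D(c, 1)(y)                    # project back to C channels
    return x + y                                  # residual connection

feat = tf.random.normal((1, 24, 24, 64))          # toy feature map
out = non_local_block(dsc(feat))
print(out.shape)                                  # (1, 24, 24, 64)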

3.4 Proposed Model - Lightweight Inception Non-Local Residual Dense Network

The LINLRDN architecture is a novel deep neural network architecture designed for image super-resolution tasks. It incorporates several advanced modules and techniques to improve performance while maintaining computational efficiency. The primary goals are to extract rich multi-scale features, model long-range dependencies, and effectively propagate and reuse features throughout the network.

It uses the Inception module for hierarchical feature representation by combining features extracted at different scales, Residual Dense Blocks (RDBs)[3] for residual learning and handling the vanishing gradient problem, DSC convolutions to achieve computational efficiency by reducing the number of parameters and computations, and the Non-Local Block to capture global dependencies from the image using a similarity metric.

As shown in the following diagram, the pipeline proceeds as follows (a toy, shape-level code sketch of these steps is given at the end of this section):

Figure 3.7: Proposed Architecture of LINLRDN

1) The LR image first goes through two DSC[12] convolutional layers, which extract low-level features from the input LR image.

2) The extracted features are then processed by the consecutive Residual Dense Blocks (RDBs)[3], where dense connections and residual learning are employed for effective feature propagation and reuse. Our architecture uses 7 RDBs, where each RDB receives inputs from the outputs of all the previous RDBs. The outputs of all the RDBs are concatenated at the end.

3) The outputs from each RDB are also passed through the respective Inception modules to capture multi-scale features. These features are then processed by the ConCat module, which concatenates them.

4) The concatenated output from the Inception modules is passed through the Non-Local Block, which captures long-range dependencies and models the global context within the features. This is concatenated with the concatenated RDB output.

5) After this, the features are processed by another set of DSConv layers and the UPSAMPLE module, which is responsible for upsampling the features to generate the high-resolution (HR) output image. The model also performs an elementwise addition of the concatenated output with the initial feature maps before it goes to the upsampling function.
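
As referenced above, the following is a toy, shape-level sketch of the five steps, intended only to make the data flow concrete. The block definitions are deliberately simplified stand-ins: block widths, the dense connections between RDBs, the exact placement of the non-local stage (omitted here, see the sketch in Section 3.3.2) and the real upsampling operator all differ in the actual LINLRDN model.

import tensorflow as tf
from tensorflow.keras import layers

def dsc(x, filters, k=3):
    return layers.SeparableConv2D(filters, k, padding="same", activation="relu")(x)

def rdb(x, growth=32, n_layers=3):
    """Toy residual dense block: densely connected convolutions + 1x1 fusion."""
    feats = [x]
    for _ in range(n_layers):
        inp = feats[0] if len(feats) == 1 else layers.Concatenate()(feats)
        feats.append(dsc(inp, growth))
    fused = layers.Conv2D(x.shape[-1], 1)(layers.Concatenate()(feats))
    return x + fused

def inception(x, filters=32):
    """Toy inception module: 1x1, 3x3 and 5x5 branches concatenated."""
    b1 = layers.Conv2D(filters, 1, padding="same", activation="relu")(x)
    return layers.Concatenate()([b1, dsc(x, filters, 3), dsc(x, filters, 5)])

def linlrdn_sketch(lr, n_rdbs=7, scale=4):
    x0 = dsc(dsc(lr, 64), 64)                # 1) shallow DSC feature extraction
    rdb_outs, incep_outs, x = [], [], x0
    for _ in range(n_rdbs):                  # 2) cascaded RDBs
        x = rdb(x)
        rdb_outs.append(x)
        incep_outs.append(inception(x))      # 3) per-RDB inception branch
    # 4) the non-local block would be applied to the concatenated inception
    #    features here before fusing with the concatenated RDB features.
    fused = layers.Conv2D(64, 1)(layers.Concatenate()(rdb_outs + incep_outs))
    fused = fused + x0                       # global residual connection
    up = layers.UpSampling2D(scale)(fused)   # 5) upsample + DSC reconstruction
    return layers.SeparableConv2D(3, 3, padding="same")(up)

sr = linlrdn_sketch(tf.random.normal((1, 32, 32, 3)))
print(sr.shape)                              # (1, 128, 128, 3)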
Chapter 4

Experimentation and Discussion

4.1 Comparison Models

A. Nearest-Neighbour Interpolation
B. Bilinear Interpolation
C. Bicubic Interpolation
D. Lanczos Interpolation
E. Super-Resolution Convolution Neural Network (SRCNN)[1]
F. Very Deep Super Resolution (VDSR)[2]
G. Residual Dense Network (RDN)[3]
H. Deep Inception Residual Dense Network (DIRDN)

4.2 Platform

The deep learning models were implemented using the Python programming language together with a variety of libraries. These models were executed on the Google Colab IDE and Jupyter Notebook.
We tested the models on the Google Colab IDE with the following system configuration:

Intel Xeon CPU @ 2.20 GHz, 13 GB RAM and 12 GB GDDR5 VRAM.

The classical methods of interpolation, along with SRCNN and VDSR, were trained and tested locally as they are not very resource-intensive. The local system specifications are:

Processor: Intel(R) Core(TM) i5-8250U CPU @ 1.60 GHz, 1801 MHz, 4 Core(s), 8 Logical Processor(s)
Installed Physical Memory (RAM): 8.00 GB
GPU: Nvidia GeForce MX110

4.3 Dataset

The classical methods of interpolation are deterministic and do not require training. The other machine learning models were trained using the DIV2K dataset. Testing was performed on images from the DIV2K dataset and the results were aggregated. A sketch of how LR-HR training pairs can be prepared from the DIV2K HR images is given below.
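
The following is a minimal sketch of this preparation step, under the common assumption that LR training inputs are obtained by bicubic-downscaling the DIV2K HR images. The directory name and patch settings are placeholders; in practice several crops per image with augmentation are used.

import glob
import cv2
import numpy as np

def make_lr_hr_pairs(hr_dir, scale=4, patch=96):
    """Cut one random HR patch per DIV2K image and pair it with a
    bicubic-downscaled LR patch. hr_dir is a placeholder path."""
    lr_patches, hr_patches = [], []
    for path in sorted(glob.glob(f"{hr_dir}/*.png")):
        hr = cv2.imread(path)
        if hr is None:
            continue
        h, w = hr.shape[:2]
        y = np.random.randint(0, h - patch + 1)
        x = np.random.randint(0, w - patch + 1)
        hr_crop = hr[y:y + patch, x:x + patch]
        lr_crop = cv2.resize(hr_crop, (patch // scale, patch // scale),
                             interpolation=cv2.INTER_CUBIC)
        hr_patches.append(hr_crop.astype(np.float32) / 255.0)
        lr_patches.append(lr_crop.astype(np.float32) / 255.0)
    return np.stack(lr_patches), np.stack(hr_patches)

# Example (placeholder directory):
# lr, hr = make_lr_hr_pairs("DIV2K_train_HR", scale=4)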

4.4 Results of testing

In this section we will show the results first based on the memory
and number of parameters that had to be trained by implement-
ing normal convolution and by using Depthwise Separable Con-
volution. Then we will compare the various models and methods
using the evaluation metrics that we had previously defined.

4.4.1 Changes in Total and Trainable Parameters by implementing Depthwise Separable Convolution (DSC):

These are the results of the initial DIRDN model compilation without using DSC:

Figure 4.1: Normal Convolution on DIRDN

Depthwise Separable Convolutions give us a significantly smaller model size. We achieve roughly 7 times fewer total parameters, which is also very computationally efficient; for example, compared to EDSR, which has about 42M parameters, we achieve about 10M with DSC. The back-of-the-envelope count below illustrates where this reduction comes from.
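
The reduction can be checked with the standard parameter-count expressions: a k x k convolution from C_in to C_out channels has k·k·C_in·C_out weights, while its depthwise separable counterpart has k·k·C_in (depthwise) plus C_in·C_out (pointwise). The layer sizes below are illustrative, not the exact widths of our network.

def conv_params(k, c_in, c_out):
    """Weights of a standard k x k convolution (bias terms ignored)."""
    return k * k * c_in * c_out

def dsc_params(k, c_in, c_out):
    """Depthwise (k x k per input channel) plus pointwise (1 x 1) weights."""
    return k * k * c_in + c_in * c_out

k, c_in, c_out = 3, 64, 64                 # illustrative layer sizes
normal = conv_params(k, c_in, c_out)       # 36,864
separable = dsc_params(k, c_in, c_out)     # 4,672
print(normal, separable, round(normal / separable, 1))   # ratio of about 7.9x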

The results of DIRDN model compilation using DSC :



Figure 4.2: DSC on DIRDN

The results of LINLRDN model compilation using DSC :

Figure 4.3: DSC on LINLRDN

Inference: Our proposal to use DSC instead of normal convolution significantly reduces the total and trainable parameters. This greatly improves efficiency and reduces computational overhead.

4.4.2 Comparing LINLRDN against different models and methods:

Table 4.1: Comparison of Image Quality Metrics Across Different Interpolation and Super-Resolution Methods

Method PSNR WPSNR SSIM MS-SSIM FSIM NQM IFC LPIPS


Nearest-neighbour 30.535 30.535 0.797 0.976 0.709 29.4 0.986 0.433
Bilinear 29.991 29.992 0.818 0.966 0.69 28.86 0.984 0.326
Bicubic 24.98 24.99 0.78 0.965 0.669 23.86 0.954 0.446
Biquintic 27.042 27.05 0.812 0.963 0.69 25.91 0.98 0.3197
BiSeptic 31.55 31.55 0.8283 0.974 0.711 30.42 0.989 0.3165
Lanczos 31.73 31.73 0.833 0.975 0.715 30.61 0.98 0.317
SRCNN 20.69 20.67 0.503 0.893 0.6 19.168 0.884 0.3777
VDSR 20.456 20.432 0.525 0.898 0.613 18.933 0.879 0.342
RDN 29.97 29.991 0.823 0.984 0.765 28.448 0.985 0.194
DIRDN 30.031 29.97 0.823 0.985 0.762 28.5 0.985 0.206
LINLRDN (ours) 30.656 30.59 0.971 0.988 0.795 30.6 0.992 0.090

Inferences: Based on the metrics, our proposed model LINLRDN gives the best results overall, particularly on the structural and perceptual measures (SSIM, MS-SSIM, FSIM, IFC and LPIPS).

Figure 4.4: Example output of LINLRDN


Chapter 5

Conclusion and Future Work

In this project, we implemented the classical models of image super-resolution from scratch: Nearest Neighbour, Bilinear interpolation, Bicubic interpolation, Quintic interpolation, Septic interpolation and Lanczos interpolation.

We also implemented various deep learning models from scratch: SRCNN[1], VDSR[2], RDN[3] and DIRDN.

We implemented the evaluation metrics to compare the different models: PSNR, WPSNR, SSIM, MS-SSIM, FSIM, IFC, LPIPS, NQM, NIQE, PIQE and NRQM.

We then implemented LINLRDN and showed that it gives better results than DIRDN and RDN[3]. However, training such models is very resource-intensive, and if we want to apply them to real-world scenarios, the training dataset has to be large enough. The training step thus takes a lot of time; this is where distributed deep learning can be applied to train a model across different systems in parallel. Moreover, even though our results are good, they can be better, and new techniques are emerging every day.
Bibliography

[1] C. Dong, C. C. Loy, K. He, and X. Tang, "Image super-resolution using deep convolutional networks," 2015.
[2] J. Kim, J. K. Lee, and K. M. Lee, "Accurate image super-resolution using very deep convolutional networks," 2016.
[3] Y. Zhang, Y. Tian, Y. Kong, B. Zhong, and Y. Fu, "Residual dense network for image super-resolution," 2018.
[4] J. Yu, Y. Fan, J. Yang, N. Xu, Z. Wang, X. Wang, and T. Huang, "Wide activation for efficient and accurate image super-resolution," 2018.
[5] Y. Mei, Y. Fan, Y. Zhou, L. Huang, T. S. Huang, and H. Shi, "Image super-resolution with cross-scale non-local attention and exhaustive self-exemplars mining," 2020.
[6] A. K. Sinha, S. Manthira Moorthi, and D. Dhar, "NL-FFC: Non-local fast fourier convolution for image super resolution," in 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 466-475, 2022.
[7] D. P. Nguyen, K. H. Vu, D. D. Nguyen, and H.-A. Pham, "F2SRGAN: A lightweight approach boosting perceptual quality in single image super-resolution via a revised fast fourier convolution," IEEE Access, vol. 11, pp. 29062-29073, 2023.
[8] C. Ledig, L. Theis, F. Huszar, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, and W. Shi, "Photo-realistic single image super-resolution using a generative adversarial network," 2017.
[9] J. Hong, B. Lee, K. Ko, and H. Ko, "Fast non-local attention network for light super-resolution," J. Vis. Commun. Image Represent., vol. 95, p. 103861, 2023.
[10] B. Lim, S. Son, H. Kim, S. Nah, and K. M. Lee, "Enhanced deep residual networks for single image super-resolution," 2017.
[11] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions," 2014.
[12] F. Chollet, "Xception: Deep learning with depthwise separable convolutions," 2017.
