Final Year Report
for
Image Super Resolution
We would like to express our heartfelt gratitude to Dr. U.S.N. Raju, Associate Professor, CSE Department, for his valuable guidance, supervision, suggestions, encouragement, and help throughout the semester and throughout the completion of our project work. He kept us going when we were down and gave us the courage to keep moving forward.
We would also like to take this opportunity to thank Prof. Dr. R. Padmavathy, Head of the Department, Computer Science and Engineering, NIT Warangal, for giving us the opportunity and resources to work on this project and for supporting us throughout. We also want to thank the evaluation committee for their valuable suggestions on our proposals and research, and for conducting a smooth presentation of the project.
Date:
NATIONAL INSTITUTE OF TECHNOLOGY WARANGAL
2023-24
APPROVAL SHEET
Examiners
————————————–
————————————–
————————————–
Supervisor
————————————–
Chairman
————————————–
Date: _________
We declare that this written submission represents our ideas and our supervisor's ideas in our own words, and that where others' ideas or words have been included, we have adequately cited and referenced the original sources. We also declare that we have adhered to all principles of academic honesty and integrity and have not misrepresented, fabricated, or falsified any idea/data/fact/source in our submission. We understand that any violation of the above will be cause for disciplinary action by the Institute and can also evoke penal action from the sources which have thus not been properly cited or from whom proper permission has not been taken when needed.
Date:
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
NATIONAL INSTITUTE OF TECHNOLOGY WARANGAL
2023-24
Certificate
This is to certify that the Dissertation work entitled Lightweight Inception Non-Local Residual Dense Network for Image Super Resolution is a bonafide record of work carried out by Lakshya Jalan (207138), Kishan Jha (207137), and Hritik Raushan (207130), submitted to Dr. U.S.N. Raju of the Department of Computer Science and Engineering, in partial fulfilment of the requirements for the award of the degree of B.Tech at the National Institute of Technology, Warangal, during the academic year 2023-2024.
(Signature)
Dr. R. Padmavathy
Professor
Head of the Department
Department of Computer Science and Engineering
NIT Warangal

(Signature)
Dr. U.S.N. Raju
Associate Professor
Department of Computer Science and Engineering
NIT Warangal
Abstract
Contents

Declaration
Certificate
Abstract
1 Introduction
  1.1 Image Super-resolution
  1.2 Advantages and Challenges
    1.2.1 Advantages
    1.2.2 Challenges
  1.3 Metrics of Comparison
  1.4 Objectives
2 Related Work
  2.1 Traditional methods of upscaling
  2.2 Deep Learning methods of Image Super-resolution
  2.3 Literature Review
3 Methodology
  3.1 Overview
  3.2 Implementing Evaluation Metrics
  3.3 Models/methods implemented for study
    3.3.1 Interpolation methods
    3.3.2 Deep learning methods used
References
List of Figures
Introduction
1.2.1 Advantages
1.2.2 Challenges
The Peak Signal-to-Noise Ratio (PSNR) is defined as:

PSNR = 10 \cdot \log_{10}\left(\frac{MAX_I^2}{MSE}\right)

where MAX_I is the maximum possible pixel value of the image and MSE is the mean squared error between the reference and distorted images.
The Structural Similarity Index (SSIM) between two images x and y is defined as:

SSIM(x, y) = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}

where \mu_x and \mu_y are the means of x and y, \sigma_x^2 and \sigma_y^2 are their variances, and \sigma_{xy} is their covariance. Constants C_1 and C_2 are used to stabilize the division with a weak denominator.
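Both metrics are standard; as a minimal sketch (assuming scikit-image >= 0.19 is installed, and with hr.png and sr.png as placeholder file names for an aligned ground-truth/output pair), they can be computed as follows:

```python
from skimage import io
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

# Load the ground-truth HR image and the super-resolved output.
hr = io.imread("hr.png")
sr = io.imread("sr.png")

# PSNR: 10 * log10(MAX_I^2 / MSE), with MAX_I = 255 for 8-bit images.
psnr = peak_signal_noise_ratio(hr, sr, data_range=255)

# SSIM: computed over local windows and averaged; channel_axis=-1
# treats the last axis as the colour channel.
ssim = structural_similarity(hr, sr, data_range=255, channel_axis=-1)

print(f"PSNR: {psnr:.2f} dB, SSIM: {ssim:.4f}")
```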
where S(i) is the strength of the i-th image component, N(i) is the noise
level, and V (i) represents the visibility of noise, which varies based on the
local image content and viewing conditions.
It compares the reference and distorted images, measuring how much information from the reference image is preserved in the distorted version. This approach reflects a more detailed understanding of image degradation by considering the preservation of information, making it particularly useful for applications like image compression and transmission.
The Information Fidelity Criterion (IFC) for evaluating the quality of images based on information preservation is defined as:
IFC(X, Y) = \sum_i \log_2 \left( \frac{1 + \frac{S_{xy}^2}{S_x^2 S_y^2}}{1 + \frac{N_{xy}^2}{N_x^2 N_y^2}} \right)

where S_{xy}, S_x^2, S_y^2 represent the signal-related terms and N_{xy}, N_x^2, N_y^2 represent the noise-related terms of the images X and Y.
The Learned Perceptual Image Patch Similarity (LPIPS) between images x and y is defined as:

LPIPS(x, y) = \sum_l \frac{w_l}{H_l W_l} \sum_{h,w} \left\| \phi_l(x)_{h,w} - \phi_l(y)_{h,w} \right\|_2^2

where \phi_l(\cdot) represents the feature map from layer l of a pre-trained network, w_l is the layer's weight, and H_l, W_l are the dimensions of the feature map at that layer.
Score(I) = f(\text{features}(I))

where features(I) extracts various indicators of image quality from the image I, and f is a function that evaluates these features to produce a quality score.
10. NIQE : The Natural Image Quality Evaluator (NIQE) is a sophisticated no-reference image quality assessment (NR-IQA) metric that operates without the need for a distortion-specific reference image. NIQE is based on the statistical regularities observed in natural, undistorted images, which are often violated by the presence of distortions. The model quantifies the degree of naturalness and predicts perceived image quality using a collection of features based on natural scene statistics.
The Natural Image Quality Evaluator (NIQE) that assesses image quality
based on natural scene statistics without reference to a distorted image is
defined as:

NIQE(I) = \sqrt{(f(I) - m)^T C^{-1} (f(I) - m)}
where f(I) are the features extracted from the image I, m is the mean vector,
and C is the covariance matrix derived from a collection of natural images.
This measure quantifies how much the image deviates from typical natural
image statistics.
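Once m and C have been fitted offline to a corpus of pristine images, the NIQE distance itself is a few lines of NumPy; the sketch below assumes the feature vector has already been produced by some NSS feature extractor, which we do not reproduce here:

```python
import numpy as np

def niqe_distance(feat_img, m, C):
    """Mahalanobis-style distance between an image's NSS feature
    vector and the multivariate Gaussian fit to pristine images."""
    d = feat_img - m
    # Solve C x = d rather than forming the explicit inverse of C.
    return float(np.sqrt(d @ np.linalg.solve(C, d)))
```

(The published NIQE additionally averages the covariances of the test image and the pristine model; that detail is omitted in this sketch.)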
11. PIQE : The Perception based Image Quality Evaluator (PIQE) is another
no-reference image quality assessment (NR-IQA) metric that quantifies the
perceptual quality of digital images without requiring a reference image.
PIQE is designed to assess visual quality based on perceptual factors that
are critical to human viewers, making it useful in contexts where subjective
image quality is more important than objective metrics.
The Perception based Image Quality Evaluator (PIQE) that assesses the
perceptual quality of images without reference is conceptually defined as:
PIQE(I) = g(\{p_1(I), p_2(I), \ldots, p_n(I)\})

where p_1(I), \ldots, p_n(I) are perceptual quality indicators extracted from the image and g is a function that aggregates them into a single quality score.
1.4 Objectives
Related Work
Bilinear interpolation smooths out the image more effectively than nearest neighbor interpolation but can still introduce blurring, especially around edges.
Given an input image I with dimensions (w, h), and the desired output
dimensions (W, H), the bilinear interpolation for a pixel (x′ , y′ ) in the
output image can be expressed using the coordinates of the four nearest
pixels in the input image:
I'(x', y') = \frac{1}{(x_2 - x_1)(y_2 - y_1)} \Big[ I(x_1, y_1)(x_2 - x')(y_2 - y') + I(x_1, y_2)(x_2 - x')(y' - y_1) + I(x_2, y_1)(x' - x_1)(y_2 - y') + I(x_2, y_2)(x' - x_1)(y' - y_1) \Big]
where:
• (x1 , y1 ) and (x2 , y2 ) are the coordinates of the top-left and bottom-
right corners, respectively, of the pixel square surrounding the target
location (x′ , y′ ).
• x_1 = \lfloor x' \frac{w}{W} \rfloor, \; x_2 = \lceil x' \frac{w}{W} \rceil, \; y_1 = \lfloor y' \frac{h}{H} \rfloor, \; y_2 = \lceil y' \frac{h}{H} \rceil.
• I(x, y) denotes the pixel value at coordinates (x, y) in the input image.
• ⌊·⌋ and ⌈·⌉ denote the floor and ceiling functions, respectively.
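As a sanity check of the formula, here is a direct (unoptimized) NumPy sketch for one grayscale output pixel; the border clamping is our own assumption, not spelled out above:

```python
import numpy as np

def bilinear_pixel(I, xp, yp, W, H):
    """Value at output pixel (xp, yp) when upscaling the (h, w)
    image I to (H, W), following the four-neighbour formula."""
    h, w = I.shape
    # Map output coordinates back into input coordinates.
    x = xp * w / W
    y = yp * h / H
    x1, y1 = int(np.floor(x)), int(np.floor(y))
    x2, y2 = min(x1 + 1, w - 1), min(y1 + 1, h - 1)
    if x1 == x2 or y1 == y2:  # exactly on a grid line or at the border
        return float(I[min(y1, h - 1), min(x1, w - 1)])
    return (I[y1, x1] * (x2 - x) * (y2 - y) +
            I[y2, x1] * (x2 - x) * (y - y1) +
            I[y1, x2] * (x - x1) * (y2 - y) +
            I[y2, x2] * (x - x1) * (y - y1)) / ((x2 - x1) * (y2 - y1))
```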
Bicubic interpolation instead considers a 4×4 neighbourhood of pixels around the mapped location:

I'(x', y') = \sum_{i=-1}^{2} \sum_{j=-1}^{2} I(x_i, y_j) \cdot f_x(x' - x_i) \cdot f_y(y' - y_j)
where:
• I(x, y) is the pixel value at coordinates (x, y) in the input image.
• x_i = \lfloor x' \frac{w}{W} \rfloor + i and y_j = \lfloor y' \frac{h}{H} \rfloor + j for i, j \in \{-1, 0, 1, 2\}.
• fx (t) and fy (t) are the bicubic kernel functions applied in the x and
y directions, respectively. These functions often take the form of a
cubic polynomial, the most common being:
f(t) =
\begin{cases}
1.5|t|^3 - 2.5|t|^2 + 1 & \text{for } |t| < 1 \\
-0.5|t|^3 + 2.5|t|^2 - 4|t| + 2 & \text{for } 1 \le |t| < 2 \\
0 & \text{otherwise}
\end{cases}
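This kernel translates directly to code; a small vectorized NumPy sketch using the a = -0.5 coefficients shown above:

```python
import numpy as np

def bicubic_kernel(t):
    """Cubic convolution kernel with a = -0.5 (the common choice above)."""
    t = np.abs(t)
    return np.where(t < 1, 1.5 * t**3 - 2.5 * t**2 + 1,
           np.where(t < 2, -0.5 * t**3 + 2.5 * t**2 - 4 * t + 2, 0.0))
```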
I'(x', y') = \sum_{i=-a}^{a} \sum_{j=-a}^{a} I(x_i, y_j) \cdot L\!\left(x' \frac{w}{W} - x_i\right) \cdot L\!\left(y' \frac{h}{H} - y_j\right)
where:
• I(x, y) denotes the pixel value at coordinates (x, y) in the input image.
• x_i = \lfloor x' \frac{w}{W} \rfloor + i and y_j = \lfloor y' \frac{h}{H} \rfloor + j for i, j \in \{-a, \ldots, a\}.
• L(t) represents the Lanczos function, defined as:
L(t) =
\begin{cases}
\dfrac{\sin(\pi t) \cdot \sin(\pi t / a)}{\pi^2 t^2 / a} & \text{if } t \neq 0 \\
1 & \text{if } t = 0
\end{cases}
The parameter a is the size of the Lanczos kernel, typically a small integer such as 2 or 3, which determines the number of pixels considered on either side of the target pixel. This interpolation kernel uses the sinc function, windowed by another sinc function to limit the size of the data window to 2a + 1 by 2a + 1 around each output pixel, thus managing computational complexity while enhancing accuracy.
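The Lanczos kernel can be coded the same way, and in practice all four classical interpolations are available through OpenCV's cv2.resize; in the sketch below, lr.png is a placeholder file name and the 4x factor is illustrative:

```python
import numpy as np
import cv2

def lanczos_kernel(t, a=3):
    """Lanczos kernel: sinc windowed by a wider sinc, zero for |t| >= a."""
    t = np.asarray(t, dtype=np.float64)
    # Guard the division at t = 0; L(0) = 1 by definition.
    with np.errstate(divide="ignore", invalid="ignore"):
        out = np.sin(np.pi * t) * np.sin(np.pi * t / a) / (np.pi**2 * t**2 / a)
    out = np.where(t == 0, 1.0, out)
    return np.where(np.abs(t) < a, out, 0.0)

# Classical upscaling baselines via OpenCV (4x upscale shown).
lr = cv2.imread("lr.png")
H, W = lr.shape[0] * 4, lr.shape[1] * 4
baselines = {
    "nearest":  cv2.resize(lr, (W, H), interpolation=cv2.INTER_NEAREST),
    "bilinear": cv2.resize(lr, (W, H), interpolation=cv2.INTER_LINEAR),
    "bicubic":  cv2.resize(lr, (W, H), interpolation=cv2.INTER_CUBIC),
    "lanczos":  cv2.resize(lr, (W, H), interpolation=cv2.INTER_LANCZOS4),
}
```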
For quintic interpolation, a fifth-degree polynomial is fitted between sample points:

P(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4 + a_5 x^5
where:
• P(x) is the polynomial function representing the interpolated value at x.
• a_0, a_1, a_2, a_3, a_4, and a_5 are the polynomial coefficients, which are determined based on boundary conditions and the values and derivatives at the known points.
To calculate the coefficients a_0, a_1, \ldots, a_5, one typically uses conditions derived from the function values and derivatives at key points. For example, if interpolating between two points, values and derivatives up to the second derivative at both ends might be used to set up a system of equations to solve for these coefficients.
The detailed calculation of these coefficients generally involves solving a system of linear equations derived from applying the interpolation conditions to the polynomial form, ensuring that P(x) not only passes through the given points but also meets any required derivative conditions at those points.
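As a small illustration, the following sketch solves that system for two endpoints using values plus first and second derivatives (six conditions for the six coefficients); the function and variable names are our own:

```python
import numpy as np

def quintic_coeffs(x0, x1, y0, y1, d0, d1, s0, s1):
    """Solve for a0..a5 given values (y), first (d) and second (s)
    derivatives at the endpoints x0 and x1."""
    def rows(x):
        return [
            [1, x, x**2, x**3, x**4, x**5],        # P(x)
            [0, 1, 2*x, 3*x**2, 4*x**3, 5*x**4],   # P'(x)
            [0, 0, 2, 6*x, 12*x**2, 20*x**3],      # P''(x)
        ]
    A = np.array(rows(x0) + rows(x1), dtype=float)
    b = np.array([y0, d0, s0, y1, d1, s1], dtype=float)
    return np.linalg.solve(A, b)  # a0..a5

# Example: interpolate between (0, 0) and (1, 1) with flat, inflection-free ends.
a = quintic_coeffs(0, 1, 0, 1, 0, 0, 0, 0)
```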
Septic interpolation extends this to a seventh-degree polynomial:

P(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4 + a_5 x^5 + a_6 x^6 + a_7 x^7

where the coefficients a_0, \ldots, a_7 are determined analogously, using function values and derivatives up to the third derivative at the endpoints.
(a) SRCNN[1] : SRCNN [1], introduced by Dong et al. in 2014, was one of the first deep learning approaches to tackle the super-resolution problem using convolutional neural networks (CNNs). This model demonstrated that deep learning could be effectively applied to super-resolve images, achieving superior results over traditional methods like bicubic interpolation.
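A minimal PyTorch sketch of this three-layer design follows, using the 9-1-5 kernel sizes and 64/32 channel widths from the original paper; the input is assumed to be a bicubically pre-upscaled image:

```python
import torch.nn as nn

class SRCNN(nn.Module):
    """Patch extraction -> non-linear mapping -> reconstruction."""
    def __init__(self, channels=3):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, 64, kernel_size=9, padding=4),  # feature extraction
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 32, kernel_size=1),                   # non-linear mapping
            nn.ReLU(inplace=True),
            nn.Conv2d(32, channels, kernel_size=5, padding=2),  # reconstruction
        )

    def forward(self, x):
        # x: bicubic-upscaled LR image, same spatial size as the HR target.
        return self.body(x)
```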
(b) VDSR[2] : VDSR (Very Deep Super-Resolution) marked a significant step beyond earlier techniques. Introduced in a paper by Jiwon Kim, Jung Kwon Lee, and Kyoung Mu Lee in 2016, VDSR leverages the power of very deep convolutional neural networks (CNNs) to significantly improve the accuracy of super-resolved images beyond what previous models had achieved.
Figure 2.2: Block diagram of a very-deep super-resolution (VDSR). VDSR has 20 consecutive convolutional layers and one
skip connection.
(d) WDSR[4] : The concept of "Wide Activation for Efficient and Accurate Image Super-Resolution"[4] introduces an innovative approach to designing convolutional neural networks (CNNs) for the task of image super-resolution (SR). This approach primarily focuses on enhancing the network architecture by broadening the activation layers rather than deepening the network itself. The main objective is to create a balance between reconstruction accuracy and computational efficiency; a sketch of the wide-activation block follows Figure 2.4 below.
Figure 2.4: Left: vanilla residual block. Middle, WDSR-A: residual block with wide activation. Right, WDSR-B: residual block with wider activation and linear low-rank convolution.
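A sketch of the WDSR-A idea from Figure 2.4: widen the channel count before the ReLU, then project back. The feature and expansion sizes here are illustrative, not the paper's exact configuration:

```python
import torch.nn as nn

class WideActivationBlock(nn.Module):
    """WDSR-A style residual block: widen before ReLU, then narrow."""
    def __init__(self, n_feats=32, expansion=4):
        super().__init__()
        wide = n_feats * expansion
        self.body = nn.Sequential(
            nn.Conv2d(n_feats, wide, kernel_size=3, padding=1),  # widen
            nn.ReLU(inplace=True),
            nn.Conv2d(wide, n_feats, kernel_size=3, padding=1),  # narrow
        )

    def forward(self, x):
        return x + self.body(x)  # identity skip around the wide activation
```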
Figure 2.5: Cross-Scale Non-Local (CS-NL) attention module. The bottom green box is for patch-level cross-scale similarity matching. The upper branch shows extracting the original HR patches in the LR image.
Figure 2.7: The network architecture of F2SRGAN, in which DSC is the Depthwise Separable Convolution, and RFFC is the
Revised Fast Fourier Convolution.
(h) FNLNET[9] : The "Fast Non-Local Attention network for light super-resolution"[9] (FNLNET) is a specialized deep learning model that integrates the principles of non-local attention mechanisms within a lightweight network architecture to enhance image super-resolution efficiently. This model is particularly designed to handle the computational and memory constraints of devices with limited processing power, such as smartphones and embedded systems, while still delivering significant improvements in image quality.
Figure 2.8: The overall network architecture of the proposed Fast Non-Local Attention Network (FNLNET).
2.3 Literature Review
Methodology
In this chapter, we are going to discuss the proposed model and the
various steps taken to develop the final architecture.
3.1 Overview
The non-local operation underlying the attention mechanism computes each output response as a weighted sum over all positions:

y_i = \frac{1}{c(x)} \sum_{\forall j} f(x_i, x_j) \, g(x_j)

where

f(x_i, x_j) = \exp\left(\theta(x_i)^T \phi(x_j)\right)

and

c(x) = \sum_{\forall j} f(x_i, x_j)
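A compact PyTorch sketch of this embedded-Gaussian non-local operation follows; 1x1 convolutions play the roles of theta, phi, and g, and the softmax implements the f(x_i, x_j)/c(x) normalization. Channel sizes are illustrative:

```python
import torch
import torch.nn as nn

class NonLocalBlock(nn.Module):
    """Embedded-Gaussian non-local block over all spatial positions."""
    def __init__(self, channels, reduced=None):
        super().__init__()
        reduced = reduced or channels // 2
        self.theta = nn.Conv2d(channels, reduced, 1)
        self.phi = nn.Conv2d(channels, reduced, 1)
        self.g = nn.Conv2d(channels, reduced, 1)
        self.out = nn.Conv2d(reduced, channels, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)  # (b, hw, c')
        k = self.phi(x).flatten(2)                    # (b, c', hw)
        v = self.g(x).flatten(2).transpose(1, 2)      # (b, hw, c')
        # Softmax over j realises f(x_i, x_j) / c(x).
        attn = torch.softmax(q @ k, dim=-1)           # (b, hw, hw)
        y = (attn @ v).transpose(1, 2).reshape(b, -1, h, w)
        return x + self.out(y)                        # residual connection
```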
3.4 Proposed Model - Lightweight Inception Non-Local Residual Dense Network
3) The outputs from each RDB are also passed through the respective Inception modules to capture multi-scale features. The resulting features are then processed by the ConCat module, which concatenates them.
A. Nearest-Neighbour Interpolation
B. Bilinear Interpolation
C. Bicubic Interpolation
D. Lanczos Interpolation
E. Super-Resolution Convolutional Neural Network (SRCNN)[1]
F. Very Deep Super Resolution (VDSR)[2]
G. Residual Dense Network (RDN)[3]
H. Deep Inception Residual Dense Network (DIRDN)
4.2 Platform
4.3 Dataset
In this section, we first present results on memory usage and the number of trainable parameters when using standard convolution versus Depthwise Separable Convolution. We then compare the various models and methods using the evaluation metrics defined previously.
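To make the parameter comparison concrete, the following sketch contrasts a standard 3x3 convolution with its depthwise separable counterpart in PyTorch; the 64-channel sizes are illustrative:

```python
import torch.nn as nn

def n_params(m):
    return sum(p.numel() for p in m.parameters())

cin, cout, k = 64, 64, 3

standard = nn.Conv2d(cin, cout, k, padding=1)
separable = nn.Sequential(
    nn.Conv2d(cin, cin, k, padding=1, groups=cin),  # depthwise: one filter per channel
    nn.Conv2d(cin, cout, 1),                        # pointwise: 1x1 channel mixing
)

# Standard:  cin*cout*k*k + cout                  = 36,928 parameters.
# Separable: cin*k*k + cin + cin*cout + cout      =  4,800 parameters.
print(n_params(standard), n_params(separable))
```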
Table 4.1: Comparison of Image Quality Metrics Across Different Interpolation and Super-Resolution Methods