Project Report Final
CERTIFICATE
Certified that the project work entitled “Facial Recognition On Low Resolution Images”, carried out
by Ms. Muteeba Shoukat, USN 1CR20CS121, Ms. Moksha Sri S, USN 1CR20CS119, and Ms. P Varshika
Prashanth, USN 1CR20CS133, bona fide students of CMR Institute of Technology, is in partial fulfilment
for the award of Bachelor of Engineering in Computer Science and Engineering of the Visvesvaraya
Technological University, Belagavi, during the year 2023-2024. It is certified that all
corrections/suggestions indicated for Internal Assessment have been incorporated in the report
deposited in the departmental library.
The project report has been approved as it satisfies the academic requirements in respect of Project work
prescribed for the said Degree.
External Viva
1. ___________________________ ________________________
2. ___________________________ ________________________
DECLARATION
We, the students of 8th semester of Computer Science and Engineering, CMR Institute of
Technology, Bangalore, declare that the work entitled “Facial Recognition On Low Resolution
Images” has been successfully completed under the guidance of Prof. Paramita Mitra, Assistant
Professor, Department of Computer Science and Engineering, CMR Institute of Technology,
Bangalore. This dissertation work is submitted in partial fulfilment of the requirements for the
award of the Degree of Bachelor of Engineering in Computer Science and Engineering during the
academic year 2023-2024. Further, the matter embodied in the project report has not been
submitted previously by anybody for the award of any degree or diploma to any university.
Place: Bangalore
Date:
Team members:
ABSTRACT
The project aims to address the challenge of enhancing image resolution through the application
of advanced deep learning techniques. Image resolution enhancement is a critical task in various
domains like surveillance. Traditional methods often suffer from limitations in handling complex
patterns and generating high-quality results. In this project, we propose a novel approach
leveraging SR3 for image resolution enhancement.
Single Image Super-Resolution techniques are crucial for recovering high-resolution images from
low-resolution counterparts, essential in various image processing applications. SR3 is a novel
approach to single image super-resolution emphasizing structural information preservation while
achieving significant visual quality enhancement. Unlike conventional methods, which struggle to
maintain sharp edges and fine details, SR3 utilizes advanced deep learning architectures and
regularization techniques to reconstruct high-resolution images with high fidelity and
naturalness.
At the core of the SR3 framework lies its innovative structural reconstruction module, effectively
capturing and restoring vital structural features like edges, textures, and contours during upscaling.
Through the integration of perceptual loss functions and attention mechanisms, SR3 ensures that
generated high-resolution images not only exhibit superior visual quality but also closely resemble
authentic high-resolution imagery.
Moreover, SR3 offers scalability and adaptability across different magnification factors and input
conditions, rendering it suitable for a wide range of practical applications, including image
enhancement, content generation, and computer vision tasks. Extensive experiments on
benchmark datasets demonstrate SR3's effectiveness, showcasing significant improvements in
both quantitative metrics and qualitative visual assessment compared to state-of-the-art SR
methods.
ACKNOWLEDGEMENT
We take this opportunity to express our sincere gratitude and respect to CMR Institute of
Technology, Bengaluru, for providing us a platform to pursue our studies and carry out our final
year project.
We have great pleasure in expressing our deep sense of gratitude to Dr. Sanjay Jain,
Principal, CMRIT, Bangalore, for his constant encouragement.
We would like to thank Dr. Kavitha P, Associate Professor & HOD, Department of
Computer Science and Engineering, CMRIT, Bangalore, who has been a constant support and
encouragement throughout the course of this project.
We consider it a privilege and an honor to express our sincere gratitude to our guide,
Prof. Paramita Mitra, Assistant Professor, Department of Computer Science and Engineering,
for the valuable guidance throughout the tenure of this project.
We also extend our thanks to all the faculty of Computer Science and Engineering who
directly or indirectly encouraged us.
Finally, we would like to thank our parents and friends for all the moral support they have
given us during the completion of this work.
TABLE OF CONTENTS
Certificate ii
Declaration iii
Abstract iv
Acknowledgement v
Table of contents vi-vii
List of Figures viii
List of Tables ix
List of Abbreviations x
1 INTRODUCTION 1-5
1.1 Relevance of the Project 1
1.2 Problem Statement 2
1.3 Objectives 2-3
1.4 Scope of the project 3
1.5 Software Engineering Methodology 3-4
1.6 Tools and Technologies 4-5
1.7 Chapter Wise Summary 5
2 LITERATURE SURVEY 6-19
2.1 Overview 6
2.2 Image Super Resolution Via Iterative Refinement 6-7
2.3 Dense Nested Attention Network for Infrared Small Target Detection 8-9
2.4 Deep Convolutional Neural Network for Inverse Problems in Imaging 9-11
2.5 High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs 11-12
2.6 Convolutional Sparse Coding for Compressed Sensing CT Reconstruction 13-14
2.7 A Variational Auto-Encoder Approach for Image Transmission in Noisy Channel 14-15
2.8 A Comparative Study on Variational Autoencoder and Generative Adversarial Networks 16-17
2.9 Research Gap / Market Analysis 18-19
3 PROPOSED ARCHITECTURE AND DESIGN 20-26
3.1 Data Flow Diagram 21-22
3.2 Use Case Diagram 23-24
3.3 UML Diagram 24-26
4 IMPLEMENTATION 27-29
4.1 Datasets 27
4.2 Training 27-28
4.3 Evaluation Metrics 28
4.4 Algorithm 29
5 RESULTS AND DISCUSSION 30-32
6 CONCLUSION 33
6.1 Scope For Future Work 34
REFERENCES 35
LIST OF FIGURES
Page No.
Fig 1.1 Software Engineering Methodology Model 4
Fig 2.1 The representation of small targets in deep CNN layers of (a) U-shape network (b) Dense Nested U-shape (DNA-Net) network. 8
Fig 2.2 Architecture of CNNs 10
Fig 2.3 Architecture of the generator 11
Fig 2.4 The architecture of the proposed VAE model 14
Fig 2.5 Architecture of Proposed Model 16
Fig 3.1 Depiction of U-Net architecture of SR3 20
Fig 3.2 Data Flow Diagram 21
Fig 3.3 Use Case Diagram 23
Fig 3.4 UML Diagram 26
Fig 5.1 High-resolution output 31
Fig 5.2 High-resolution output 31
LIST OF TABLES
Page No.
Table 2.1 Comparison of different approaches 18
LIST OF ABBREVIATIONS
CT Computed Tomography
CNN Convolutional Neural Network
DDPM Denoising Diffusion Probabilistic Model
GANs Generative Adversarial Networks
PSNR Peak Signal-to-Noise Ratio
RNNs Recurrent Neural Networks
SR3 Super-Resolution via Repeated Refinement
SSIM Structural Similarity Index Measure
VAEs Variational Autoencoders
CHAPTER 1
INTRODUCTION
The need for high-resolution images continues to surge across fields such as medical
imaging, satellite observation, and surveillance within computer vision and image
processing. This project introduces a novel approach to image resolution enhancement
through the implementation of the SR3 (Super-Resolution via Repeated Refinement) model.
The SR3 (Super-Resolution via Repeated Refinement) model is highly relevant for image
resolution enhancement. By iteratively refining a noisy estimate with a learned denoising
model, SR3 effectively captures complex relationships between low-resolution and
high-resolution images. The model preserves contextual information, improves its estimate
over successive refinement steps, and adapts to diverse image content. It is assessed
using quantitative metrics like PSNR and SSIM, ensuring robust evaluation. It also holds
significant relevance in real-world applications.
1.3 Objectives
• Develop a State-of-the-Art Super Resolution Technique: Create an
innovative image super-resolution method that leverages diffusion models to
significantly enhance image quality and detail.
• Human Perception Testing: Conduct rigorous human evaluation tests to
validate the perceptual quality and realism of super-resolved images generated
by the diffusion model.
• Achieve High-Quality Results: Aim to produce super-resolved images that
exhibit superior visual quality, closely resembling high-resolution ground truth
images.
• High-Quality Image Reconstruction: SR3 focuses on reconstructing high-
resolution images with enhanced quality, aiming for sharper details, better
texture preservation, and reduced artifacts compared to traditional methods.
• Preservation of Structural Information: SR3 aims to preserve important
structural information such as edges, lines, and contours during the upscaling
process, ensuring that the generated high-resolution image maintains the
integrity and coherence of the original scene.
• Natural-Looking Results: Unlike some existing technologies that may
produce overly smooth or artificially sharpened images, SR3 strives to generate
high-resolution images that appear natural and visually pleasing to human
observers, minimizing the perception of distortion or manipulation.
• Efficient Computational Performance: SR3 seeks to achieve high-quality
super-resolution while maintaining computational efficiency, enabling real-
time or near-real-time processing.
• Robustness to Various Input Conditions: SR3 is designed to perform well
under diverse input conditions, including images with different levels of noise,
blur, or compression artifacts, ensuring robust performance across different
scenarios.
1.4 Scope of the Project
Image Enhancement: The primary focus of the project is to enhance the quality and
detail of low-resolution images, making them more useful for various applications,
including visual content creation, medical imaging, surveillance, and remote sensing.
Data Requirements: Consideration of the project's scope should include the need for
large-scale datasets to train and validate the model.
1.5 Software Engineering Methodology
Our project uses the agile development methodology for cyclic development and
improvement through reviews. The major stages of our software cycle are shown in Fig 1.1.
1.7 Chapter Wise Summary
In Chapter 1, we give a short introduction to the project, outlining its scope and
relevance. The objectives of the project are also defined.
In Chapter 2, the literature survey is presented: the source papers are analyzed and
discussed along with their advantages and disadvantages.
In Chapter 3, we discuss the system architecture. The data flow is defined, and use
case and UML diagrams are shown.
In Chapter 4, implementation details are discussed: the datasets used, the training
process, the performance metrics, and the algorithm.
In Chapter 5, the results obtained from training the model are presented and described.
In Chapter 6, the conclusion and future scope of the project are discussed.
CHAPTER 2
LITERATURE SURVEY
The literature survey on the SR3 model for image resolution enhancement reveals a
foundational paper introducing iterative refinement built on denoising diffusion
probabilistic models. Researchers have explored architectural innovations, emphasizing the
need for curated datasets and optimal training strategies. Quantitative metrics like PSNR
and SSIM are commonly used for performance evaluation, and real-world applications and
domains have been investigated.
2.1 Overview
Keywords used for the search: Image super-resolution, diffusion models, deep
generative models, image-to-image translation, denoising process, iterative methods,
face recognition.
2.2 Image Super Resolution Via Iterative Refinement
Advantages
1. The iterative nature of the approach may enable the model to capture finer
details in the images over successive iterations, giving more accurate and
detailed reconstruction.
2. Iterative refinement methods may enhance the robustness of the super-
resolution model by mitigating noise and artifacts present in low-resolution
images through successive improvements.
3. The model may adapt and learn from its own previous iterations, allowing it to
refine its predictions based on the feedback and information gained during each
iteration.
Disadvantages
2.3 Dense Nested Attention Network for Infrared Small Target Detection
This research paper introduces a novel solution to address the intricacies of infrared
target detection. Leveraging deep learning and attention mechanisms, this proposed
network architecture aims to enhance the accuracy of target detection in infrared
imagery. The integration of dense nested attention mechanisms facilitates the model's
ability to capture both global context and intricate local features, enabling it to discern
subtle target signatures against challenging backgrounds.
Figure 2.1 The representation of small targets in deep CNN layers of (a) U-shape
network (b) Dense Nested U-shape (DNA-Net) network.
Advantages
1. The proposed Dense Nested Attention Network may lead to improved accuracy,
thanks to the integration of advanced attention mechanisms that capture both
global and local context.
Disadvantages
1. If the Attention Network has a high computational cost, it might limit its
practicality, especially in real-time applications or scenarios with resource
constraints.
2. The iterative and complex nature of attention mechanisms could potentially
lead to overfitting, where the model memorizes details from the training
data but struggles to generalize well to new, unseen infrared images.
3. The success of the iterative refinement process may be sensitive to the
quality of the initializations. If the model's performance is highly dependent
on the initial estimates, it could be a limitation.
2.4 Deep Convolutional Neural Network for Inverse Problems in Imaging
The research paper focuses on the use of deep convolutional neural networks to address
inverse problems in the domain of imaging. Inverse problems involve the estimation of
input parameters or information from observed data, and this is a common challenge in
various imaging applications such as medical imaging, computer vision, and remote
sensing.
Advantages
Disadvantages
2. Deep learning models need a lot of labelled training data to generalize well.
If the approach lacks diverse and representative training data, the model's
performance might be limited in real-world applications.
3. Deep models are susceptible to overfitting, where the model does well for
the training data but fails for unseen data. The paper may face criticism if it
does not adequately address or mitigate overfitting issues.
Advantages
Disadvantages
Advantages
Disadvantages
2.7 A Variational Auto-Encoder Approach for Image Transmission in Noisy Channel
The paper discusses how the variational autoencoder is structured and trained to encode
images into a latent space and decode them back to the original form, emphasizing its
ability to handle noisy channel conditions.
Advantages
1. VAEs are known to generate data with inherent noise robustness. The
probabilistic nature of VAEs allows them to handle noise in the
transmission channel more effectively, resulting in improved image
reconstruction under noisy conditions.
2. VAEs encode images into a latent space, which often captures meaningful
and compact representations of the input data. This can lead to efficient
transmission as the information is concentrated in a lower-dimensional
space.
3. VAEs are generative models, meaning they can produce samples from the
trained latent space. This generative capability can be advantageous in
scenarios where reconstructed images need to be generated from partial or
degraded data received in a noisy channel.
Disadvantages
1. Although VAEs can produce samples from the trained latent space, the
quality of the generated images may be lower than that of the input images.
This could be a limitation in scenarios where high-fidelity image
reconstruction is crucial.
2. The latent space representation learned by VAEs might lack interpretability.
Understanding the significance of specific dimensions in the latent space
may be challenging, impacting the model's transparency and explainability.
3. The effectiveness of VAEs in handling noise may depend on the properties
of the noise.
2.8 A Comparative Study on Variational Autoencoder and Generative Adversarial Networks
The paper explores and contrasts two prominent generative models: VAEs and GANs.
Both VAEs and GANs are popular frameworks in the domain of deep learning for
generating realistic data, and their analysis provides important insights into their
strengths and weaknesses.
The paper highlights the importance of generative models in many fields, like image
synthesis, data augmentation, and other generative tasks.
Advantages
3. The paper presents the architectural differences between VAEs and
GANs, explaining how each model operates and generates realistic data.
This knowledge can be beneficial for researchers aiming to design or
modify generative models.
Disadvantages
2.9 Research Gap / Market Analysis
CHAPTER 3
PROPOSED ARCHITECTURE AND DESIGN
We up-sample the low-resolution input image x to the target resolution using bicubic
interpolation. Then, we concatenate it with the noisy high-resolution output image y_t.
We illustrate the activation dimensions for a super-resolution model transitioning from
16×16 to 128×128. Self-attention is applied to the 16×16 feature maps.
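The conditioning step described above can be sketched as follows. This is a minimal NumPy illustration: nearest-neighbour upsampling stands in for the bicubic interpolation used by SR3, and the names `x_lr` and `y_t` are our own.

```python
import numpy as np

def upsample_nearest(x, factor):
    """Upsample an (H, W, C) image by an integer factor (nearest-neighbour
    stand-in for the bicubic interpolation described in the text)."""
    return np.repeat(np.repeat(x, factor, axis=0), factor, axis=1)

def condition_input(x_lr, y_t):
    """Concatenate the upsampled low-resolution image with the noisy
    high-resolution estimate y_t along the channel axis, forming the
    input to the U-Net."""
    factor = y_t.shape[0] // x_lr.shape[0]
    x_up = upsample_nearest(x_lr, factor)
    return np.concatenate([x_up, y_t], axis=-1)

# 16x16 low-resolution input, 128x128 noisy target estimate
x_lr = np.random.rand(16, 16, 3)
y_t = np.random.rand(128, 128, 3)
net_input = condition_input(x_lr, y_t)
print(net_input.shape)  # (128, 128, 6)
```

Concatenating along the channel axis doubles the channel count, which matches the idea that the network sees both the conditioning image and the current noisy estimate at every pixel.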
The U-Net architecture is a well-known neural network design used across a spectrum of
image processing tasks, including super-resolution. A U-Net comprises an encoder-decoder
network augmented with skip connections, which facilitate effective capture of both
low-level and high-level features while preserving spatial detail.
3.1 Data Flow Diagram
The data flow of SR3 (Super-Resolution via Repeated Refinement) encompasses several key
stages, each contributing to the process of enhancing the resolution of input images.
Here's a depiction of the data flow:
2. Preprocessing: Before inputting the low-resolution images into the SR3 model,
preprocessing steps are applied to enhance their suitability for super-resolution
tasks. It includes processes like normalization, resizing, or noise reduction.
3. Model Input: The preprocessed low-resolution images are then fed into the SR3
model. This model typically consists of a deep neural network architecture, such
as a U-Net.
6. Output Generation: The final output of the SR3 model consists of high-
resolution images generated from the input low-resolution images. These output
images exhibit enhanced clarity, sharpness, and detail compared to their low-
resolution counterparts.
In summary, the data flow of SR3 encompasses the acquisition of low-resolution input
images, preprocessing, super-resolution processing using a deep neural network model,
postprocessing, output generation of high-resolution images, evaluation, and potential
deployment or further processing for various applications.
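The preprocessing and output stages above can be illustrated with a minimal normalization round trip. The [-1, 1] input range is an assumed convention common to diffusion models, not a detail taken from SR3 itself.

```python
import numpy as np

def preprocess(img_uint8):
    """Map 8-bit pixel values [0, 255] to the [-1, 1] range often used
    for diffusion-model inputs (an assumed convention)."""
    return img_uint8.astype(np.float32) / 127.5 - 1.0

def postprocess(img_float):
    """Invert the normalization back to displayable 8-bit pixels,
    rounding and clipping to the valid range."""
    return np.clip(np.rint((img_float + 1.0) * 127.5), 0, 255).astype(np.uint8)

img = np.array([[0, 128, 255]], dtype=np.uint8)
norm = preprocess(img)
restored = postprocess(norm)
print(norm.min(), norm.max())  # -1.0 1.0
print(restored)                # [[  0 128 255]]
```

Keeping the normalization and its inverse as a matched pair ensures the output-generation stage can recover displayable images without drift.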
3.2 Use Case Diagram
Use case diagrams show the various interactions between the system and its users or
external components. Here's a representation for SR3:
The actors in this scenario include the input image, the super-resolution algorithm,
and the output high-resolution image. Each of these actors has corresponding use cases
representing their actions and interactions within the system.
This diagram shows the various interactions and functionalities of the SR3 system,
encompassing input, processing, output, evaluation, and feedback loops.
3.3 UML Diagram
A UML diagram for SR3 depicts the various components and their relationships within the
system. Here's a UML diagram for SR3:
1. User Interface (UI): Represents the interface through which users interact with the
SR3 system. Users can provide input images and receive super-resolved output images.
2. Controller: Acts as an intermediary between the UI and the SR3 Engine, handling
user inputs, triggering image processing tasks, and managing the flow of data.
8. Image Data: Represents the input low-resolution images and output high-resolution
images processed by the SR3 system.
This UML diagram provides a clear overview of the components and their interactions
within the SR3 system, facilitating understanding and development of the super-
resolution functionality.
CHAPTER 4
IMPLEMENTATION
We evaluate the performance of SR3 on super-resolution tasks involving faces, natural
scenes, and synthetic images generated by a low-resolution model. This synthetic
dataset facilitates high-resolution image synthesis through a cascaded modelling
approach.
4.1 Datasets
We train face super-resolution models on the Flickr-Faces-HQ (FFHQ) dataset and assess
their performance on CelebA-HQ. Additionally, we train unconditional face and
class-conditional ImageNet generative models using DDPM.
Throughout both training and testing phases, we employ low-resolution images produced by
downsampling via bicubic interpolation with antialiasing enabled. We take the largest
central crop of each image and resize it to the target resolution using area resampling
to generate the high-resolution image.
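The degradation pipeline described above can be sketched as follows. Block averaging stands in for both the area resampling and the antialiased bicubic downsampling mentioned in the text, so treat this as an illustrative approximation rather than the exact pipeline.

```python
import numpy as np

def center_crop(img, size):
    """Take the largest central crop of side `size` from an (H, W, C) image."""
    h, w = img.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]

def area_downsample(img, factor):
    """Downsample by averaging non-overlapping factor x factor blocks
    (a simple stand-in for antialiased bicubic downsampling)."""
    h, w, c = img.shape
    return img.reshape(h // factor, factor, w // factor, factor, c).mean(axis=(1, 3))

hr = np.random.rand(130, 140, 3)   # raw image of arbitrary size
hr = center_crop(hr, 128)          # 128x128 high-resolution target
lr = area_downsample(hr, 8)        # 16x16 low-resolution input
print(hr.shape, lr.shape)  # (128, 128, 3) (16, 16, 3)
```

The 8x factor matches the 16×16 to 128×128 setting used elsewhere in this report.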
4.2 Training
Implemented in Python using the PyTorch deep learning framework, SR3 integrates a
U-Net architecture trained with denoising objectives. The denoising process follows a
stochastic iterative refinement approach, drawing inspiration from denoising diffusion
probabilistic models.
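A toy sketch of the denoising objective under the DDPM formulation follows: corrupt a clean image with Gaussian noise at a random timestep and regress the model's prediction onto the true noise. The linear schedule and the trivial zero-predicting "model" are our own placeholders, not the SR3 implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder linear noise schedule (assumed, not SR3's actual schedule)
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas_cum = np.cumprod(1.0 - betas)

def noising_step(y0, t):
    """Sample y_t ~ q(y_t | y_0) = N(sqrt(a_t) * y_0, (1 - a_t) * I)."""
    eps = rng.standard_normal(y0.shape)
    a = alphas_cum[t]
    return np.sqrt(a) * y0 + np.sqrt(1.0 - a) * eps, eps

def denoising_loss(model, y0, t):
    """L2 regression of the model's noise prediction onto the true noise."""
    y_t, eps = noising_step(y0, t)
    eps_hat = model(y_t, t)
    return np.mean((eps_hat - eps) ** 2)

# Trivial stand-in "model" that predicts zero noise everywhere.
zero_model = lambda y_t, t: np.zeros_like(y_t)
y0 = rng.standard_normal((16, 16, 3))
loss = denoising_loss(zero_model, y0, t=500)
print(round(loss, 2))  # close to 1.0 for standard-normal eps
```

In a real implementation the model would be the conditional U-Net and the loss would be minimized by gradient descent; here the zero model simply makes the expected loss equal to the noise variance.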
4.3 Evaluation Metrics
A higher PSNR implies a higher ratio of signal power to noise power, resulting in less
noticeable distortion in the reconstructed image. Within SR3, these metrics serve as
vital quantitative indicators of how effectively super-resolution algorithms improve
image quality.
By evaluating the MSE and PSNR values across different algorithms or parameter
configurations, researchers and developers can gauge and optimize the performance of the
super-resolution model, ensuring it aligns with the desired standards of image
enhancement and fidelity.
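The two metrics above can be computed with their standard definitions; `max_val=255` assumes 8-bit images.

```python
import numpy as np

def mse(ref, test):
    """Mean squared error between a reference and a reconstructed image."""
    return np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)

def psnr(ref, test, max_val=255.0):
    """Peak Signal-to-Noise Ratio in decibels; higher means less distortion."""
    err = mse(ref, test)
    if err == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / err)

ref = np.zeros((8, 8), dtype=np.uint8)
test = np.full((8, 8), 10, dtype=np.uint8)
print(round(psnr(ref, test), 2))  # 28.13
```

Casting to float64 before subtraction avoids the unsigned-integer wraparound that would otherwise corrupt the error term.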
4.4 Algorithm
Within SR3, denoising plays a pivotal role in elevating the visual quality of images,
particularly in the realm of single image super-resolution (SR). Denoising techniques
are pivotal for eliminating undesired noise from low-resolution input images before
applying super-resolution algorithms, consequently refining the fidelity of the resulting
high-resolution images.
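As a toy illustration of pre-cleaning a noisy low-resolution input, the sketch below applies a fixed 3×3 mean filter. SR3 itself learns its denoiser, so this classical filter is purely illustrative of the idea.

```python
import numpy as np

def mean_filter(img):
    """Denoise a 2-D image with a 3x3 box (mean) filter, edge-padded.
    A fixed classical filter shown only to illustrate pre-cleaning;
    SR3's denoiser is learned, not hand-designed."""
    padded = np.pad(img, 1, mode="edge")
    out = np.zeros_like(img, dtype=np.float64)
    h, w = img.shape
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + h, dx:dx + w]
    return out / 9.0

rng = np.random.default_rng(0)
noisy = np.full((32, 32), 100.0) + rng.normal(0, 20, (32, 32))
clean = mean_filter(noisy)
print(clean.std() < noisy.std())  # True: averaging reduces noise variance
```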
4. Inference: During the inference stage, the trained denoising model is applied
to noisy input images to generate denoised outputs. The model's parameters
remain fixed, and no further training occurs during this phase.
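The fixed-parameter inference stage can be sketched as an iterative refinement loop: start from pure Gaussian noise and repeatedly apply the frozen denoiser, conditioned on the upsampled low-resolution input. The four-step schedule matches the trade-off discussed later in this report, but the halfway-pull "denoiser" is our own illustrative placeholder.

```python
import numpy as np

rng = np.random.default_rng(0)

def refine(denoiser, cond, steps=4):
    """Iteratively refine a pure-noise estimate toward a high-resolution
    image, conditioned on the upsampled low-resolution input `cond`.
    The denoiser's parameters are fixed at inference time."""
    y = rng.standard_normal(cond.shape)  # start from Gaussian noise
    for t in reversed(range(steps)):
        y = denoiser(y, cond, t)
    return y

# Placeholder denoiser that pulls the estimate halfway toward the
# conditioning image each step (illustrative; SR3 uses a trained U-Net).
toy_denoiser = lambda y, cond, t: 0.5 * (y + cond)

cond = np.ones((128, 128, 3))  # stand-in for the upsampled LR image
sr_img = refine(toy_denoiser, cond, steps=4)
print(sr_img.shape)  # (128, 128, 3)
```

Each pass shrinks the remaining noise, mirroring how successive refinement steps trade generation speed against sample quality.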
CHAPTER 5
RESULTS AND DISCUSSION
Our findings underscore SR3's effectiveness across different image types and
magnification factors, showcasing superior performance compared to several GAN and
Normalizing Flow baselines. Human perception studies affirm SR3's capability in
generating outputs with high fidelity, as indicated by fool rates close to 50% for faces
and 40% for natural images.
However, the computational cost of a large number of refinement steps during inference
presents a practical challenge. We explored trade-offs between sample quality and
generation speed, achieving satisfactory results with just four refinement steps.
Figure 5.1 and Figure 5.2 show the training results, where the high-resolution images
are generated. The performance metrics are also displayed.
Moreover, in face super-resolution, the model tended to produce smooth skin textures,
overlooking details such as moles, pimples, and piercings present in the reference
image. These biases underscore the necessity for further investigations and
considerations before deploying SR3 in production settings. Nevertheless, diffusion
models like SR3 hold promise for mitigating dataset biases by generating synthetic data
from underrepresented groups.
CHAPTER 6
CONCLUSION
REFERENCES
[1] Image Super-Resolution via Iterative Refinement, Chitwan Saharia, Jonathan Ho,
William Chan, Tim Salimans, David J. Fleet, and Mohammad Norouzi, IEEE, 2023
[2] Dense Nested Attention Network for Infrared Small Target Detection, Boyang Li,
Chao Xiao, Longguang Wang, Yingqian Wang, Zaiping Lin, Miao Li, Wei An, and
Yulan Guo, IEEE, 2023
[3] Deep Convolutional Neural Network for Inverse Problems in Imaging, Kyong Hwan Jin,
Michael T. McCann, Emmanuel Froustey, and Michael Unser, IEEE, 2017
[4] High-Resolution Image Synthesis and Semantic Manipulation with Conditional
GANs, Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, and Bryan
Catanzaro, NVIDIA Corporation and UC Berkeley, 2018
[5] Convolutional Sparse Coding for Compressed Sensing CT Reconstruction, Peng Bao
et al., IEEE, 2019
[6] A Variational Auto-Encoder Approach for Image Transmission in Noisy Channel, Amir
Hossein Estiri, Ali Banaei, Benyamin Jamialahmadi, and Mahdi Jafari Siavoshani, 2021
[7] A Comparative Study on Variational Autoencoder and Generative Adversarial
Networks, Mirza Sami and Iftekharul Mobin, 2019