
VISVESVARAYA TECHNOLOGICAL UNIVERSITY

Jnana Sangama, Belgaum-590018

A PROJECT REPORT (18CSP83) ON

“Facial Recognition On Low Resolution Images”


Submitted in Partial fulfillment of the Requirements for the Degree of

Bachelor of Engineering in Computer Science & Engineering


By

Muteeba Shoukat (1CR20CS121)

Moksha Sri S (1CR20CS119)

P Varshika Prashanth (1CR20CS133)

Under the Guidance of,


Prof. Paramita Mitra
Assistant Professor, Dept. of CSE

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

CMR INSTITUTE OF TECHNOLOGY

#132, AECS LAYOUT, IT PARK ROAD, KUNDALAHALLI, BANGALORE-560037


CMR INSTITUTE OF TECHNOLOGY
#132, AECS LAYOUT, IT PARK ROAD, KUNDALAHALLI, BANGALORE-560037

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

CERTIFICATE
Certified that the project work entitled “Facial Recognition On Low Resolution Images” carried out
by Ms. Muteeba Shoukat, USN 1CR20CS121, Ms. Moksha Sri S, USN 1CR20CS119, Ms. P Varshika
Prashanth, USN 1CR20CS133, bonafide students of CMR Institute of Technology, in partial fulfillment
for the award of Bachelor of Engineering in Computer Science and Engineering of the Visvesvaraya
Technological University, Belgaum, during the year 2023-2024. It is certified that all
corrections/suggestions indicated for Internal Assessment have been incorporated in the Report
deposited in the departmental library.

The project report has been approved as it satisfies the academic requirements in respect of Project work
prescribed for the said Degree.

________________ ________________ ________________


Prof. Paramita Mitra Dr. Kavitha P Dr. Sanjay Jain
Assistant Professor Associate Professor & Head Principal
Dept. of CSE, CMRIT Dept. of CSE, CMRIT CMRIT

External Viva

Name of the Examiners Signature with Date

1. ___________________________ ________________________

2. ___________________________ ________________________

DECLARATION

We, the students of 8th semester of Computer Science and Engineering, CMR Institute of
Technology, Bangalore, declare that the work entitled “Facial Recognition On Low Resolution
Images” has been successfully completed under the guidance of Prof. Paramita Mitra, Assistant
Professor, Computer Science and Engineering Department, CMR Institute of Technology,
Bangalore. This dissertation work is submitted in partial fulfillment of the requirements for the
award of Degree of Bachelor of Engineering in Computer Science and Engineering during the
academic year 2023 - 2024. Further the matter embodied in the project report has not been
submitted previously by anybody for the award of any degree or diploma to any university.

Place: Bangalore

Date:

Team members:

Muteeba Shoukat (1CR20CS121) __________________

Moksha Sri S (1CR20CS119) __________________

P Varshika Prashanth (1CR20CS133) __________________

ABSTRACT

The project aims to address the challenge of enhancing image resolution through the application
of advanced deep learning techniques. Image resolution enhancement is a critical task in various
domains like surveillance. Traditional methods often suffer from limitations in handling complex
patterns and generating high-quality results. In this project, we propose a novel approach
leveraging SR3 for image resolution enhancement.

Single Image Super-Resolution techniques are crucial for recovering high-resolution images from
low-resolution counterparts, essential in various image processing applications. SR3 is a novel
approach to single image super-resolution emphasizing structural information preservation while
achieving significant visual quality enhancement. Unlike conventional methods struggling with
maintaining sharp edges and fine details, SR3 utilizes advanced deep learning architectures and
regularization techniques to reconstruct high-resolution images with unparalleled fidelity and
naturalness.

At the core of the SR3 framework lies its innovative structural reconstruction module, effectively
capturing and restoring vital structural features like edges, textures, and contours during upscaling.
Through the integration of perceptual loss functions and attention mechanisms, SR3 ensures that
generated high-resolution images not only exhibit superior visual quality but also closely resemble
authentic high-resolution imagery.

Moreover, SR3 offers scalability and adaptability across different magnification factors and input
conditions, rendering it suitable for a wide range of practical applications, including image
enhancement, content generation, and computer vision tasks. Extensive experiments on
benchmark datasets demonstrate SR3's effectiveness, showcasing significant improvements in
both quantitative metrics and qualitative visual assessment compared to state-of-the-art SR
methods.

Keywords: Image super-resolution, diffusion models, deep generative models, image-to-image translation, denoising process, iterative methods, face recognition

ACKNOWLEDGEMENT

We take this opportunity to express our sincere gratitude and respect to CMR Institute of
Technology, Bengaluru, for providing us a platform to pursue our studies and carry out our final
year project.
We have great pleasure in expressing our deep sense of gratitude to Dr. Sanjay Jain,
Principal, CMRIT, Bangalore, for his constant encouragement.
We would like to thank Dr. Kavitha P, Associate Professor & HOD, Department of
Computer Science and Engineering, CMRIT, Bangalore, who has been a constant source of support
and encouragement throughout the course of this project.
We consider it a privilege and honor to express our sincere gratitude to our guide,
Prof. Paramita Mitra, Assistant Professor, Department of Computer Science and Engineering,
for the valuable guidance throughout the tenure of this project.
We also extend our thanks to all the faculty of Computer Science and Engineering who
directly or indirectly encouraged us.
Finally, we would like to thank our parents and friends for all the moral support they have
given us during the completion of this work.

TABLE OF CONTENTS

Certificate
Declaration
Abstract
Acknowledgement
Table of Contents
List of Figures
List of Tables
List of Abbreviations
1 INTRODUCTION
1.1 Relevance of the Project
1.2 Problem Statement
1.3 Objectives
1.4 Scope of the Project
1.5 Software Engineering Methodology
1.6 Tools and Technologies
1.7 Chapter Wise Summary
2 LITERATURE SURVEY
2.1 Overview
2.2 Image Super Resolution Via Iterative Refinement
2.3 Dense Nested Attention Network for Infrared Small Target Detection
2.4 Deep Convolutional Neural Network for Inverse Problems in Imaging
2.5 High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs
2.6 Convolutional Sparse Coding for Compressed Sensing CT Reconstruction
2.7 A Variational Auto-Encoder Approach for Image Transmission in Noisy Channel
2.8 A Comparative Study on Variational Autoencoder and Generative Adversarial Networks
2.9 Research Gap / Market Analysis
3 PROPOSED ARCHITECTURE AND DESIGN
3.1 Data Flow Diagram
3.2 Use Case Diagram
3.3 UML Diagram
4 IMPLEMENTATION
4.1 Datasets
4.2 Training
4.3 Evaluation Metrics
4.4 Algorithm
5 RESULTS AND DISCUSSION
6 CONCLUSION
6.1 Scope For Future Work
REFERENCES

LIST OF FIGURES

Fig 1.1 Software Engineering Methodology Model
Fig 2.1 The representation of small targets in deep CNN layers of (a) U-shape network (b) Dense Nested U-shape (DNA-Net) network
Fig 2.2 Architecture of CNNs
Fig 2.3 Architecture of the generator
Fig 2.4 The architecture of the proposed VAE model
Fig 2.5 Architecture of Proposed Model
Fig 3.1 Depiction of U-Net architecture of SR3
Fig 3.2 Data Flow Diagram
Fig 3.3 Use Case Diagram
Fig 3.4 UML Diagram
Fig 5.1 High-resolution output
Fig 5.2 High-resolution output

LIST OF TABLES

Table 2.1 Comparison of different approaches

LIST OF ABBREVIATIONS

CT Computed Tomography
CNN Convolutional Neural Network
DDPM Denoising Diffusion Probabilistic Model
GANs Generative Adversarial Networks
PSNR Peak Signal-to-Noise Ratio
RNNs Recurrent Neural Networks
SR3 Super Resolution Via Repeated Refinement
SSIM Structural Similarity Index Measure
VAEs Variational Autoencoders


CHAPTER 1

INTRODUCTION

The need for high-resolution images continues to surge across various fields, like
medical imaging, satellite observations, and surveillance systems in computer vision
and image processing. This project introduces a novel approach to Image Resolution
Enhancement through the implementation of SR3 (Super-Resolution) modeling.

Super-Resolution (SR) techniques aim to reconstruct high-resolution images from their low-resolution counterparts, offering a solution to enhance visual quality and extract finer details. The SR3 technique employed in this project leverages denoising diffusion probabilistic models and iterative refinement to achieve superior image resolution.

1.1 Relevance of the Project

The SR3 (Super-Resolution via Repeated Refinement) model is highly relevant for Image Resolution Enhancement. By combining a conditional denoising diffusion process with iterative refinement, SR3 effectively captures complex relationships between low-resolution and high-resolution images. The model preserves contextual information, progressively removes noise, and adapts to diverse image content. It is assessed using quantitative metrics like PSNR and SSIM, ensuring robust evaluation. It also holds significant relevance in real-world applications.

Its capacity to enhance image resolution has practical implications, including improved medical diagnostics, precise satellite observations, enhanced surveillance
systems, sharper media production, detailed geospatial mapping, augmented
capabilities in artificial intelligence and autonomous vehicles, forensic analysis, and
improved resolution in astronomy and astrophysics. The versatility of SR3 underscores
its potential to positively impact multiple sectors by providing sharper and more
detailed visual information in practical and meaningful ways.


1.2 Problem Statement

To enhance Image Resolution using SR3 modelling.

1.3 Objectives
• Develop a State-of-the-Art Super Resolution Technique: Create an
innovative image super-resolution method that leverages diffusion models to
significantly enhance image quality and detail.
• Human Perception Testing: Conduct rigorous human evaluation tests to
validate the perceptual quality and realism of super-resolved images generated
by the diffusion model.
• Achieve High-Quality Results: Aim to produce super-resolved images that
exhibit superior visual quality, closely resembling high-resolution ground truth
images.
• High-Quality Image Reconstruction: SR3 focuses on reconstructing high-
resolution images with enhanced quality, aiming for sharper details, better
texture preservation, and reduced artifacts compared to traditional methods.
• Preservation of Structural Information: SR3 aims to preserve important
structural information such as edges, lines, and contours during the upscaling
process, ensuring that the generated high-resolution image maintains the
integrity and coherence of the original scene.
• Natural-Looking Results: Unlike some existing technologies that may
produce overly smooth or artificially sharpened images, SR3 strives to generate
high-resolution images that appear natural and visually pleasing to human
observers, minimizing the perception of distortion or manipulation.
• Efficient Computational Performance: SR3 seeks to achieve high-quality
super-resolution while maintaining computational efficiency, enabling real-
time or near-real-time processing.
• Robustness to Various Input Conditions: SR3 is designed to perform well
under diverse input conditions, including images with different levels of noise,
blur, or compression artifacts, ensuring robust performance across different
scenarios.


• Scalability Across Resolutions: SR3 aims to provide scalable solutions capable of generating high-resolution images at different magnification factors,
allowing flexibility in adapting to specific resolution requirements or
constraints.
• Adaptability to Different Domains: SR3 strives to be versatile and adaptable
to various domains such as photography, security, and multimedia content
generation, catering to different application needs and user requirements.

1.4 Scope of the project

The project's scope is comprehensive and involves a multi-faceted approach to improving image quality and resolution using diffusion models. It encompasses both
technical and practical aspects, with the potential to influence various industries and
contribute to the domain of computer vision and image processing.

Image Enhancement: The primary focus of the project is to enhance the quality and
detail of low-resolution images, making them more useful for various applications,
including visual content creation, medical imaging, surveillance, and remote sensing.

Data Requirements: Consideration of the project's scope should include the need for
large-scale datasets to train and validate the model.

Industry Impact: Consideration of the practical and commercial scope includes identifying potential industry applications, collaborations, or licensing opportunities
for the developed technology.

1.5 Software Engineering Methodology

Our project uses the agile development methodology for cyclic development and improvement (reviews). The major stages of our software cycle are:

• Requirements Gathering: Ensure all requirements are well-documented, including user stories, functional specifications, and technical requirements.


• Dataset Retrieval: Gathering the required dataset for the implementation of our project.
• Training: With the acquired dataset, we went through various training
processes.
• Testing: Conduct various tests after the training process.
• Refinement and Iteration: Address bugs, gather feedback, and make
necessary improvements based on user testing and reviews.
• Comparison: Keeping up with the current and available technologies and comparing them with the algorithm used by us in order to make our project function better.
• Monitoring and Optimization: Put monitoring instruments into place, keep an
eye on data, and improve the system iteratively in response to performance
evaluations and new needs.
• Documentation: To make future maintenance and additions easier, keep
thorough documentation throughout the development lifecycle.

Fig. 1.1 – Software Engineering Methodology Model

1.6 Tools and Technologies

• Deep Learning Frameworks: TensorFlow and Keras (built on top of TensorFlow).
• Computer Vision Libraries: OpenCV, for image and video processing tasks, including face detection and video capture.
• Data Collection and Annotation Tools: Video surveillance hardware or a surveillance camera.
• Data Pre-processing: Image and video pre-processing libraries for tasks like resizing, normalization, and augmentation.
• Kaggle
• GPU

1.7 Chapter Wise Summary

In Chapter 1, we give a short introduction to the project, covering its scope and relevance. The objectives of the project have also been defined.

In Chapter 2, the literature survey of the selected papers is presented. Each source is discussed with its advantages and disadvantages.

In Chapter 3, we discuss the system architecture. The data flow is defined, and use case and UML diagrams are shown.

In Chapter 4, implementation details are discussed. The datasets used, the training process, the performance metrics, and the algorithm are defined.

In Chapter 5, the results obtained from training the model are presented and discussed.

In Chapter 6, the conclusion and future scope of the project are discussed.


CHAPTER 2

LITERATURE SURVEY

The literature survey on the SR3 model for image resolution enhancement reveals a foundational paper introducing diffusion-based iterative refinement for super-resolution. Researchers have explored architectural innovations, emphasizing the need for curated datasets and optimal training strategies. Quantitative metrics like PSNR and SSIM are commonly used for performance evaluation, and real-world applications and domains have been investigated.

Studies also focus on interpretability and visualization of SR3 models, highlighting ongoing research on attention mechanisms, multi-scale architectures, and adversarial
training. Challenges related to computational complexity, generalization, and real-time
deployment are addressed as future directions. The literature reflects a dynamic field
with continuous advancements in super-resolution techniques.

2.1 Overview
Keywords used for the search: Image super-resolution, diffusion models, deep
generative models, image-to-image translation, denoising process, iterative methods,
face recognition.

2.2 Image Super Resolution Via Iterative Refinement [1]


The paper introduces a novel framework that employs iterative refinement techniques
to enhance the resolution of low-resolution images effectively. The key focus of the
research lies in developing a model that refines its predictions through multiple
iterations, progressively improving the visual quality of the reconstructed high-
resolution images.


The paper addresses the challenges of image super-resolution with a multi-stage approach, demonstrating promising results through comprehensive experiments and
evaluations. This iterative refinement strategy contributes to the growing body of
research in image processing and computer vision, offering a valuable perspective on
improving image resolution through successive enhancements.

Advantages

1. The iterative nature of the approach may enable the model to capture finer
details in the images over successive iterations, giving more accurate and
detailed reconstruction.
2. Iterative refinement methods may enhance the robustness of the super-
resolution model by mitigating noise and artifacts present in low-resolution
images through successive improvements.
3. The model may adapt and learn from its own previous iterations, allowing it to
refine its predictions based on the feedback and information gained during each
iteration.

Disadvantages

1. Iterative refinement approaches can be computationally intensive, requiring multiple passes through the network for each image. This may lead to increased
computational time and resource requirements.
2. Training models with iterative refinement might be more complex compared to
single-pass models. It may involve additional challenges related to
convergence, stability, and tuning hyperparameters for multiple stages.
3. The interpretation of the learning process and feature extraction in each iteration
may be challenging, making it harder to understand and explain the decisions
made by the model.


2.3 Dense Nested Attention Network for Infrared Small Target Detection [2]

This research paper introduces a novel solution to address the intricacies of infrared
target detection. Leveraging deep learning and attention mechanisms, this proposed
network architecture aims to enhance the accuracy of target detection in infrared
imagery. The integration of dense nested attention mechanisms facilitates the model's
ability to capture both global context and intricate local features, enabling it to discern
subtle target signatures against challenging backgrounds.

Figure 2.1 The representation of small targets in deep CNN layers of (a) U-shape
network (b) Dense Nested U-shape (DNA-Net) network.

Advantages

1. The proposed Dense Nested Attention Network may lead to improved accuracy,
thanks to the integration of advanced attention mechanisms that capture both
global and local context.


2. The dense nested attention mechanisms can contribute to a more effective representation of features, allowing the network to discern subtle details of
small targets against complex backgrounds, leading to better discrimination.
3. If the proposed Dense Nested Attention Network is computationally efficient,
it could be advantageous for real-time applications.

Disadvantages

1. If the Attention Network has a high computational cost, it might limit its
practicality, especially in real-time applications or scenarios with resource
constraints.
2. The iterative and complex nature of attention mechanisms could potentially
lead to overfitting, where the model memorizes details from the training
data but struggles to generalize well to new, unseen infrared images.
3. The success of the iterative refinement process may be sensitive to the
quality of the initializations. If the model's performance is highly dependent
on the initial estimates, it could be a limitation.

2.4 Deep Convolutional Neural Network for Inverse Problems in Imaging [3]

The research paper focuses on the use of deep convolutional neural networks to address
inverse problems in the domain of imaging. Inverse problems involve the estimation of
input parameters or information from observed data, and this is a common challenge in
various imaging applications such as medical imaging, computer vision, and remote
sensing.


Figure 2.2 Architecture of CNNs

Advantages

1. Deep convolutional neural networks have the ability to learn complex mappings, enabling more accurate reconstructions in inverse problems. The
paper may demonstrate improved accuracy in reconstructing images or
information from noisy or incomplete data.
2. A well-designed deep learning model can often generalize well to unseen
data. The paper proposes a model that can be used in a variety of imaging
tasks and datasets.
3. Deep learning models can learn features from data, reducing the need for
manually designed algorithms. This can be advantageous in situations
where the underlying mathematical model of the inverse problem is
complex or not well understood.

Disadvantages

1. Deep learning models can be computationally intensive. The paper might face criticism if it does not adequately address concerns about the computational complexity of the proposed method, especially in scenarios where resources are limited.


2. Deep learning models need a lot of labelled training data to generalize well.
If it suffers from a lack of diverse and representative training data, the
model's performance might be limited in real-world applications.
3. Deep models are susceptible to overfitting, where the model does well for
the training data but fails for unseen data. The paper may face criticism if it
does not adequately address or mitigate overfitting issues.

2.5 High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs [4]

The research paper focuses on the application of Conditional Generative Adversarial Networks to synthesize high-resolution images and perform semantic manipulation. GANs consist of a
generator and a discriminator trained adversarially. The term "conditional" implies that
the generation process is conditioned on specific information, often in the form of
additional input data or labels.

Figure 2.3 Architecture of the generator


Advantages

1. The paper may introduce a novel architecture or training strategy that enhances the fidelity of generated images, producing results with higher
resolution and visual quality compared to existing methods.
2. Leveraging conditional GANs allows for the generation of images based on
specific conditions or attributes. This capability enables more controlled
and customizable image synthesis, addressing the needs of various
applications requiring specific visual characteristics.
3. If the paper focuses on semantic manipulation, it could provide a way for
precise control over specific features or aspects of the images. This fine-
grained control is valuable in applications where users need to modify or
customize certain visual elements.

Disadvantages

1. GANs are susceptible to mode collapse, generating a limited variety of samples and failing to capture the full range of the data distribution. If the proposed model is
prone to mode collapse, it could limit the range of generated images.
2. GAN training is known for being sensitive and prone to instability. If the
paper does not address or mitigate training challenges, such as oscillations
or divergence issues, it may hinder the practical applicability of the
proposed model.
3. Generating high-resolution images with complex models can be
computationally intensive, requiring substantial resources in terms of
memory and processing power. This could limit the accessibility of the
proposed approach, for users with less computational resources.


2.6 Convolutional Sparse Coding for Compressed Sensing CT Reconstruction [5]

The research paper focuses on the application of convolutional sparse coding techniques for the reconstruction of CT images using compressed sensing principles.
Compressed sensing involves signal processing, allowing for the reconstruction of
images, reducing the amount of data acquired during the imaging process. Sparse
coding is a method that represents signals as a combination of a few basis functions,
and convolutional sparse coding extends this idea by incorporating local spatial
relationships through convolutional operations. The paper explores how these methods
can be tailored to the specific challenges of image reconstruction from sparse data.

Advantages

1. Compressed sensing techniques aim to reconstruct images from a reduced set of acquired data, potentially leading to a lower radiation dose for patients undergoing CT scans. If the paper successfully demonstrates a reduction in radiation dose without reducing image quality, that is a significant advantage.
2. Convolutional sparse coding methods may capture local spatial
relationships more effectively than traditional sparse coding approaches.
This could lead to improved image quality, reduced artifacts, and better
preservation of fine details in the reconstructed CT images.
3. Convolutional sparse coding, by incorporating local spatial relationships,
may contribute to achieving higher spatial resolution in reconstructed CT
images. Higher resolution is crucial for accurate diagnosis and better
visualization of anatomical structures.

Disadvantages

1. Convolutional sparse coding methods, especially when integrated into complex algorithms or neural network architectures, can be
computationally intensive. Challenges can arise for real-time applications
or environments with limited computational resources.
2. Convolutional neural networks and sparse coding models need a lot of
training data to generalize well. If the proposed method demands extensive


training datasets, it can be a challenge when such data is scarce or difficult to obtain.
3. Convolutional sparse coding models might have numerous hyperparameters
that require careful tuning for optimal performance.

2.7 A Variational Auto-Encoder Approach for Image Transmission in Noisy Channel [6]

The paper focuses on leveraging variational autoencoders for the transmission of images over a noisy communication channel. Variational autoencoders are generative
models that aim to learn a probabilistic representation of input data, and they are used
in image processing and compression tasks.

The paper discusses how the variational autoencoder is structured and trained to encode
images into a latent space and decode them back to the original form, emphasizing its
ability to handle noisy channel conditions.

Figure 2.4 The architecture of the proposed VAE model
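
As a rough illustration of the encoder-decoder structure described above (not the paper's specific model from Figure 2.4), a minimal VAE sketch in PyTorch might look as follows; the layer sizes and the optional channel-noise comment are illustrative assumptions.

import torch
import torch.nn as nn

class TinyVAE(nn.Module):
    """Generic minimal VAE: encode an image to a latent distribution, sample, decode."""
    def __init__(self, img_dim=16 * 16, latent_dim=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(img_dim, 256), nn.ReLU())
        self.to_mu = nn.Linear(256, latent_dim)
        self.to_logvar = nn.Linear(256, latent_dim)
        self.dec = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                 nn.Linear(256, img_dim), nn.Sigmoid())

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterization trick
        # A noisy channel could be simulated here, e.g. z = z + 0.1 * torch.randn_like(z)
        return self.dec(z), mu, logvar

x = torch.rand(4, 16 * 16)                      # flattened toy images in [0, 1]
recon, mu, logvar = TinyVAE()(x)
recon_loss = nn.functional.binary_cross_entropy(recon, x)
kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())   # KL divergence to N(0, I)
print(recon.shape, (recon_loss + kl).item())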


Advantages

1. VAEs are known to generate data with inherent noise robustness. The
probabilistic nature of VAEs allows them to handle noise in the
transmission channel more effectively, resulting in improved image
reconstruction under noisy conditions.
2. VAEs encode images into a latent space, which often captures meaningful
and compact representations of the input data. This can lead to efficient
transmission as the information is concentrated in a lower-dimensional
space.
3. VAEs are generative models, meaning they can produce samples from the
trained latent space. This generative capability can be advantageous in
scenarios where reconstructed images need to be generated from partial or
degraded data received in a noisy channel.

Disadvantages

1. Although VAEs can produce samples from the trained latent space, the
quality of images may be less than the quality of the input images. This
could be a limitation in scenarios where high-fidelity image reconstruction
is crucial.
2. The latent space representation learned by VAEs might lack interpretability.
Understanding the significance of specific dimensions in the latent space
may be challenging, impacting the model's transparency and explainability.
3. The effectiveness of VAEs in handling noise may depend on the properties
of the noise.


2.8 A Comparative Study on Variational Autoencoder and Generative Adversarial Networks [7]

The paper explores and contrasts two prominent generative models: VAEs and GANs.
Both VAEs and GANs are popular frameworks in the domain of deep learning for generating realistic data, and their analysis provides important insights into their strengths and weaknesses.

The paper highlights the importance of generative models in many fields, like image
synthesis, data augmentation, and generative tasks.

Figure 2.5 Architecture of Proposed Model

Advantages

1. The paper aids researchers and practitioners in choosing the appropriate generative model for their specific tasks.
Understanding the pros and cons of both VAEs and GANs can guide the
choice according to the requirements.
2. A comparative study offers an overview of the strengths and weaknesses of
VAEs and GANs. This can serve as an important result for readers wanting
to understand these generative models.


3. The paper describes the architectural differences between VAEs and
GANs, explaining how each model operates and generates realistic data.
This knowledge can be beneficial for researchers aiming to design or
modify generative models.

Disadvantages

1. Comparative studies can be sensitive to the choice of datasets, hyperparameters, and evaluation metrics. Small variations in these factors
might lead to different conclusions. It's essential for authors to thoroughly
detail their experimental setup to enhance the study's reproducibility.
2. Findings from a comparative study may be specific to the datasets and tasks
chosen for evaluation. The study's generalizability to different domains or
applications might be limited, and this limitation should be acknowledged.
3. VAEs and GANs can be sensitive to hyperparameter tuning. The study
might not capture the full range of each model's performance if certain
hyperparameter configurations are not explored.


Table 2.1 Comparison of different approaches

2.9 Research Gap / Market Analysis


Analyzing research gaps and conducting a market analysis for image resolution
involves identifying areas where current research or market offerings fall short of
meeting specific needs or expectations.

Research Gaps-

• Super-Resolution Techniques for Real-time Applications: A gap exists in the development of super-resolution techniques that can achieve high-quality
image upscaling in real-time or near-real-time scenarios, such as video
streaming or live broadcasts.
• Adversarial Attacks on Image Super-Resolution: The vulnerability of super-
resolution models to adversarial attacks is evolving and needs more


investigation. Understanding and mitigating vulnerabilities can be crucial for the security of systems relying on high-resolution images.
• Cross-Domain Super-Resolution: Research addressing the challenges of
super-resolving images across different domains or modalities, such as medical
imaging or satellite imagery, is relatively limited. Models that can adapt to
diverse domains remain a research gap.

Market Analysis Considerations-

• Demand for High-Resolution Imaging Devices: Analyzing the market demand for high-resolution imaging devices, such as cameras, smartphones,
and drones, is crucial. Understanding consumer preferences and industry needs
can guide the development of technologies that meet market expectations.
• Applications in Medical Imaging: There is a growing market for high-
resolution medical imaging devices and software. Analyzing the specific
requirements of medical professionals and healthcare institutions can identify areas for advancement in image resolution technologies.
• Entertainment and Gaming Industry: The entertainment and gaming
industry often requires high-resolution graphics for immersive experiences.
Investigating market trends and demands in these industries can guide the
development of technologies that cater to their specific needs.


CHAPTER 3

PROPOSED ARCHITECTURE AND DESIGN

We up-sample the low-resolution input image x to the target resolution using bicubic
interpolation. Then, we concatenate it with the noisy high-resolution output image y_t.
We illustrate the activation dimensions for a super-resolution model transitioning from
16x16 to 128x128. Self-attention is applied to the 16x16 feature maps.

Figure 3.1 Depiction of U-Net architecture of SR3

The U-Net architecture is a well-known neural network design utilized across a spectrum of image processing tasks, including super-resolution. A U-Net comprises an encoder-decoder network bolstered by skip connections, facilitating effective capture of both low-level and high-level features while preserving spatial detail.
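
To make the conditioning mechanism concrete, the following is a minimal PyTorch-style sketch, not the exact SR3 network: it builds a 6-channel input by concatenating the bicubic-upsampled low-resolution image with the noisy high-resolution estimate y_t and passes it through a toy one-level U-Net with a skip connection. Layer sizes are illustrative assumptions, and the self-attention block applied at the 16x16 resolution is omitted for brevity.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyUNet(nn.Module):
    """Toy U-Net: one down/up level with a skip connection (illustrative only)."""
    def __init__(self, in_ch=6, base_ch=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(in_ch, base_ch, 3, padding=1), nn.ReLU())
        self.down = nn.Conv2d(base_ch, base_ch * 2, 3, stride=2, padding=1)
        self.mid = nn.Sequential(nn.Conv2d(base_ch * 2, base_ch * 2, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(base_ch * 2, base_ch, 4, stride=2, padding=1)
        self.dec = nn.Conv2d(base_ch * 2, 3, 3, padding=1)   # skip concat doubles the channels

    def forward(self, x):
        e = self.enc(x)                        # low-level features at full resolution
        m = self.mid(F.relu(self.down(e)))     # bottleneck features at half resolution
        u = self.up(m)
        return self.dec(torch.cat([u, e], dim=1))   # skip connection before the output conv

# Conditioning: bicubic-upsampled LR image concatenated with the noisy HR image y_t.
lr = torch.rand(1, 3, 16, 16)                  # low-resolution input x
y_t = torch.randn(1, 3, 128, 128)              # current noisy high-resolution estimate
lr_up = F.interpolate(lr, size=(128, 128), mode="bicubic", align_corners=False)
model_input = torch.cat([lr_up, y_t], dim=1)   # 6-channel conditioning input

noise_pred = TinyUNet()(model_input)           # predicts the noise (or a denoised estimate)
print(noise_pred.shape)                        # torch.Size([1, 3, 128, 128])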


3.1 Data Flow Diagram


The diagram shows the flow of data from different sources to the system where it is
trained to produce high resolution images and subsequent testing is performed.

Figure 3.2 Data Flow Diagram

The data flow of SR3 (Super-Resolution via Repeated Refinement) encompasses several key
stages, each contributing to the process of enhancing the resolution of input images.
Here's a depiction of the data flow:

1. Input Data Acquisition: The process begins with acquiring low-resolution input images, which serve as the initial data fed into the SR3 model. These images may be obtained from different sources, such as public datasets or real-world scenarios.

2. Preprocessing: Before inputting the low-resolution images into the SR3 model,
preprocessing steps are applied to enhance their suitability for super-resolution
tasks. These include processes like normalization, resizing, or noise reduction.

3. Model Input: The preprocessed low-resolution images are then fed into the SR3 model. This model typically consists of a deep neural network architecture, such as a U-Net.


4. Super-Resolution Processing: Within the SR3 model, the low-resolution images undergo processing to generate high-resolution counterparts. This
process involves leveraging machine learning techniques to predict high-
frequency details and enhance image resolution while preserving important
structural features.

5. Postprocessing: After the super-resolution processing, postprocessing steps can be used to further refine the output images. This may include processes like denoising, contrast adjustment, or sharpening to improve the visual quality of the high-resolution images.

6. Output Generation: The final output of the SR3 model consists of high-
resolution images generated from the input low-resolution images. These output
images exhibit enhanced clarity, sharpness, and detail compared to their low-
resolution counterparts.

7. Evaluation: To assess the performance of the SR3 model, evaluation metrics such as PSNR (Peak Signal-to-Noise Ratio) and SSIM (Structural Similarity
Index) may be computed. These metrics quantify the fidelity and similarity of
the generated high-resolution images compared to ground truth high-resolution
images.

8. Deployment or Further Processing: Depending on the application, the high-resolution images generated by SR3 may be deployed directly for downstream
tasks or further processed as needed. This could involve tasks like image
analysis, object recognition, or image editing.

In summary, the data flow of SR3 encompasses the acquisition of low-resolution input
images, preprocessing, super-resolution processing using a deep neural network model,
postprocessing, output generation of high-resolution images, evaluation, and potential
deployment or further processing for various applications.
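
The stages above can be summarized in a small, purely illustrative Python sketch; every helper here is a hypothetical placeholder rather than the project's actual code, and a trained SR3 model would replace the nearest-neighbour stand-in.

import numpy as np

def preprocess(img):
    """Stage 2: normalize 8-bit pixel values to [0, 1]."""
    return img.astype(np.float32) / 255.0

def fake_super_resolve(img, scale=4):
    """Stages 3-4: stand-in for the SR3 model (nearest-neighbour upscaling only)."""
    return img.repeat(scale, axis=0).repeat(scale, axis=1)

def postprocess(img):
    """Stage 5: clip back to a displayable range."""
    return np.clip(img, 0.0, 1.0)

lr_image = np.random.randint(0, 256, (16, 16, 3), dtype=np.uint8)    # stage 1: input
sr_image = postprocess(fake_super_resolve(preprocess(lr_image)))     # stages 2-5
print("high-resolution output:", sr_image.shape)                     # stage 6: output
# Stages 7-8 (PSNR/SSIM evaluation and deployment) are covered in Chapter 4.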


3.2 Use Case Diagram


The use-case diagram shows the user interacting with the system to get the desired
outputs. It involves visualizing the interactions between the various components and
processes involved in the super-resolution process.

The actors in this scenario could include the input image, the super-resolution
algorithm, and the output high-resolution image. Each of these actors would have
corresponding use cases representing their actions and interactions within the system.

Figure 3.3 Use Case Diagram

Use case diagrams show the various interactions between the system and its users or
external components. Here's a representation for SR3:

1. User: Represents the user interacting with the SR3 system.

2. Input Low-Resolution Image: The low-resolution image provided by the user to the SR3 system for super-resolution processing.

3. Preprocessing: Preparing the input low-resolution image for super-resolution, which may involve tasks such as normalization, resizing, or noise reduction.

4. Super-Resolution Processing: The core functionality of the SR3 system, where the low-resolution input image undergoes processing to generate a high-resolution output image.


5. Postprocessing: Optional step after super-resolution processing, involving tasks like denoising, contrast adjustment, or sharpening to further refine the output image.

6. Output High-Resolution Image: The high-resolution image generated by the SR3 system, which is provided as output to the user.

7. Evaluation Metrics: Metrics such as PSNR (Peak Signal-to-Noise Ratio) and SSIM (Structural Similarity Index) are computed to analyse the generated high-resolution image compared to ground truth high-resolution images.

8. Feedback: Users may provide feedback on the generated high-resolution image, which can be used to improve the performance of the SR3 system in future iterations.

This diagram shows the various interactions and functionalities of the SR3 system,
encompassing input, processing, output, evaluation, and feedback loops.

3.3 UML Diagram


The UML diagram shows the relationship between the user and the super-resolution system for the input and output of data. The UML diagram includes several components such as
classes representing the input image, the super-resolution algorithm, and the output
high-resolution image.

Additionally, it depicts the relationships and interactions between these components, through sequence diagrams illustrating the iterative refinement process.

A UML diagram for SR3 involves depicting the various components and their
relationships within the system. Here's a UML diagram for SR3:

1. User Interface (UI): Represents the interface through which users interact with the SR3 system. Users can provide input images and receive super-resolved
output images.


2. Controller: Acts as an intermediary between the UI and the SR3 Engine, handling
user inputs, triggering image processing tasks, and managing the flow of data.

3. SR3 Engine: The core component responsible for super-resolution processing. It takes low-resolution input images as input and generates high-resolution output images
using advanced algorithms and deep learning models.

4. Preprocessing Module: Prepares the input images for super-resolution processing by performing tasks such as normalization, resizing, and noise reduction.

5. Postprocessing Module: Performs optional post-processing tasks on the super-resolved output images to further enhance their quality. This may include denoising,
contrast adjustment, or sharpening.

6. Evaluation Module: Computes evaluation metrics such as PSNR (Peak Signal-to-Noise Ratio) and SSIM (Structural Similarity Index) to assess the quality of the
generated high-resolution images compared to ground truth images.

7. Feedback Loop: Allows users to provide feedback on the super-resolved images, which can be used to improve the performance of the SR3 system in future iterations.

8. Image Data: Represents the input low-resolution images and output high-resolution
images processed by the SR3 system.


Figure 3.4 UML Diagram

This UML diagram provides a clear overview of the components and their interactions
within the SR3 system, facilitating understanding and development of the super-
resolution functionality.


CHAPTER 4

IMPLEMENTATION

We evaluate the performance of SR3 in super resolution tasks involving faces, natural
scenes, and synthetic images generated from a low-resolution model. This synthetic
dataset facilitates high resolution image synthesis through a cascaded modelling
approach.

4.1 Datasets
We conduct training of face super-resolution models on the Flickr-Faces-HQ (FFHQ)
dataset and assess their performance on CelebA-HQ. Additionally, we train
unconditional face and class conditional ImageNet generative models using DDPM on
the same datasets mentioned earlier.

Throughout both the training and testing phases, we employ low-resolution images obtained by downsampling via bicubic interpolation with antialiasing enabled. We take the largest central crop of each image and subsequently resize it to the target resolution using area resampling to generate the high-resolution image.
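
A minimal sketch of this pairing step, assuming PyTorch-style tensor operations; the crop and resize sizes shown are examples, not necessarily the exact values used during training.

import torch
import torch.nn.functional as F

def make_lr_hr_pair(img, hr_size=128, lr_size=16):
    """Build an (LR, HR) training pair from an image tensor of shape (C, H, W).

    Assumed recipe: largest central crop -> area-resample to the HR target,
    then bicubic downsample (with antialiasing) to produce the LR input.
    """
    c, h, w = img.shape
    side = min(h, w)                                    # largest central crop
    top, left = (h - side) // 2, (w - side) // 2
    crop = img[:, top:top + side, left:left + side].unsqueeze(0)

    hr = F.interpolate(crop, size=(hr_size, hr_size), mode="area")
    lr = F.interpolate(hr, size=(lr_size, lr_size), mode="bicubic",
                       antialias=True, align_corners=False)
    return lr.squeeze(0), hr.squeeze(0)

lr, hr = make_lr_hr_pair(torch.rand(3, 180, 240))
print(lr.shape, hr.shape)    # torch.Size([3, 16, 16]) torch.Size([3, 128, 128])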

4.2 Training
Implemented in Python using the PyTorch deep learning framework, SR3 integrates a
U-Net architecture trained with denoising objectives. The denoising process follows a
stochastic iterative refinement approach, drawing inspiration from denoising diffusion
probabilistic models.
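
A simplified sketch of one such denoising training step is given below, assuming a DDPM-style noise-prediction objective with a linear beta schedule; the actual SR3 schedule, conditioning details, and model signature may differ.

import torch
import torch.nn.functional as F

# Assumed DDPM-style setup: T timesteps, linear beta schedule, noise-prediction loss.
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

def training_step(model, lr_up, hr, optimizer):
    """One denoising step: corrupt HR with noise at a random timestep, then train
    the model to predict that noise given (upsampled LR, noisy HR, timestep)."""
    b = hr.size(0)
    t = torch.randint(0, T, (b,))
    a_bar = alphas_cumprod[t].view(b, 1, 1, 1)
    noise = torch.randn_like(hr)
    noisy_hr = a_bar.sqrt() * hr + (1.0 - a_bar).sqrt() * noise   # forward diffusion

    pred_noise = model(torch.cat([lr_up, noisy_hr], dim=1), t)    # conditioned U-Net (assumed signature)
    loss = F.mse_loss(pred_noise, noise)                          # denoising objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()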

Throughout training, we utilize datasets such as Flickr-Faces-HQ (FFHQ), employing bicubic interpolation for down-sampling and preprocessing. Evaluation of the model's
performance involves standard metrics such as PSNR and SSIM on validation datasets.


Overall, the SR3 implementation provides a robust framework for super-resolution tasks, delivering high-quality results across diverse image datasets and magnification factors. Following model training, significant improvements in output quality were observed.

4.3 Evaluation Metrics


Mean Squared Error (MSE) and Peak Signal-to-Noise Ratio (PSNR) stand as widely
utilized metrics for evaluating the performance of super-resolution algorithms. MSE
measures the average squared difference between the original high-resolution image
and the super-resolved image generated by the algorithm, representing the average
reconstruction error.

Lower MSE values indicate superior performance in minimizing disparities between the original and reconstructed images.

In contrast, PSNR offers a more easily interpretable measure of image quality by comparing the peak signal power to that of the distortion introduced during the super-
resolution process. Evaluated as the ratio between the maximum possible signal power
and the noise power, expressed in decibels (dB), higher PSNR values signify enhanced
reconstruction fidelity.

This implies a higher ratio of signal power to noise power, resulting in less noticeable
distortion in the reconstructed image.

Within SR3, these metrics play a vital role as quantitative indicators of the effectiveness
of super-resolution algorithms in improving image quality.

By evaluating the MSE and PSNR values across different algorithms or parameter
configurations, researchers and developers can gauge and optimize the performance of the super-resolution component within the system, ensuring it aligns with the desired standards of image enhancement and fidelity.
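
For reference, a small sketch of how MSE and PSNR can be computed on images scaled to [0, 1]; SSIM is omitted here since it is usually taken from a library such as scikit-image.

import numpy as np

def mse(pred, target):
    """Mean Squared Error between two images (arrays scaled to [0, 1])."""
    return float(np.mean((pred - target) ** 2))

def psnr(pred, target, max_val=1.0):
    """Peak Signal-to-Noise Ratio in decibels: 10 * log10(MAX^2 / MSE)."""
    err = mse(pred, target)
    if err == 0:
        return float("inf")              # identical images
    return 10.0 * np.log10(max_val ** 2 / err)

hr = np.random.rand(128, 128, 3)                                   # ground-truth image
sr = np.clip(hr + 0.01 * np.random.randn(*hr.shape), 0, 1)         # super-resolved estimate
print(f"MSE: {mse(sr, hr):.6f}  PSNR: {psnr(sr, hr):.2f} dB")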


4.4 Algorithm
Within SR3, denoising plays a pivotal role in elevating the visual quality of images,
particularly in the realm of single image super-resolution (SR). Denoising techniques
are pivotal for eliminating undesired noise from low-resolution input images before
applying super-resolution algorithms, consequently refining the fidelity of the resulting
high-resolution images.

Denoising is segmented into several key steps:

1. Collecting Data and Preprocessing: Initially, a dataset comprising pairs of noisy and clean images is amassed. Noisy images may either be synthetically
generated by adding noise to clean images or sourced from real-world origins.

2. Model Training: The model is trained with the objective of minimizing a loss function that quantifies the disparity between the denoised output and the pristine target image.

3. Model Evaluation: The trained denoising model undergoes assessment on a distinct validation set to gauge its efficacy in noise removal while retaining
image intricacies. Metrics like peak signal-to-noise ratio (PSNR) and structural
similarity index (SSIM) are commonly employed to assess denoising
effectiveness.

4. Inference: During the inference stage, the trained denoising model is applied
to noisy input images to generate denoised outputs. The model's parameters
remain fixed, and no further training occurs during this phase.

5. Integration with Super-Resolution: Subsequently, denoised images are fed into the super-resolution algorithm to produce high-resolution images with
heightened clarity and fidelity. By eliminating noise from the input images, the
denoising process aids the super-resolution algorithm in generating more
precise and visually captivating results.

In essence, the denoising process in SR3 encompasses training deep learning-based models to adeptly eliminate noise from low-resolution images, thereby augmenting the
quality of input data for subsequent super-resolution tasks.
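
To tie the steps together, a highly simplified sketch of the iterative refinement loop used at inference time is shown below, assuming a DDPM-style sampler conditioned on the bicubic-upsampled low-resolution image; the real SR3 sampler uses a tuned noise schedule and can operate with far fewer steps.

import torch

@torch.no_grad()
def iterative_refinement(model, lr_up, steps, betas):
    """Start from pure Gaussian noise and iteratively denoise it, conditioning
    each step on the bicubic-upsampled low-resolution image lr_up."""
    alphas = 1.0 - betas
    alphas_cumprod = torch.cumprod(alphas, dim=0)

    y = torch.randn_like(lr_up)                            # y_T ~ N(0, I)
    for t in reversed(range(steps)):
        t_batch = torch.full((y.size(0),), t, dtype=torch.long)
        eps = model(torch.cat([lr_up, y], dim=1), t_batch)     # predicted noise
        a, a_bar = alphas[t], alphas_cumprod[t]
        # Posterior mean of the reverse step (standard DDPM update rule).
        y = (y - (1 - a) / torch.sqrt(1 - a_bar) * eps) / torch.sqrt(a)
        if t > 0:
            y = y + torch.sqrt(betas[t]) * torch.randn_like(y)  # add sampling noise
    return y                                               # y_0: the super-resolved image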


CHAPTER 5

RESULTS AND DISCUSSION

The performance evaluation of SR3 encompassed diverse datasets, including FFHQ and CelebA-HQ, for both face and natural image super-resolution tasks. Training
iterations involved refining the denoising model using bicubic-interpolated low-
resolution inputs. Results showcased SR3's efficacy in producing high-quality super-
resolved images, outperforming baseline methods and achieving state-of-the-art
performance across varying magnification factors and image categories.

SR3 employs conditional diffusion models to address single-image super-resolution tasks. It initiates the output image with random Gaussian noise, progressively refining
it based on the low-resolution input.

Our findings underscore SR3's effectiveness across different image types and
magnification factors, showcasing superior performance compared to several GAN and
Normalizing Flow baselines. Human perception studies affirm SR3's capability in
generating outputs with high fidelity, as indicated by fool rates close to 50% for faces
and 40% for natural images.

However, the computational cost of a large number of refinement steps during inference presents a practical challenge. We explored trade-offs between sample quality and
generation speed, achieving satisfactory results with just four refinement steps.

Recent research suggests alternative approaches for faster samplers in diffusion models. Additionally, while self-attention enhances model performance, it imposes
constraints on output dimensions, which we aim to address in future iterations of SR3.
Regarding biases, we recognize the importance of mitigating biases inherent in
generative models like SR3.


While our log-likelihood-based objective aims to cover multiple modes, instances of mode drop were observed, where the model consistently generated similar outputs for
the same input.

Figure 5.1 High-resolution output

Figure 5.2 High-resolution output

Figure 5.1 and Figure 5.2 show the training results, where the high-resolution images
are generated. The performance metrics are also displayed.


Moreover, in face super-resolution, the model tended to produce smooth skin textures,
overlooking details such as moles, pimples, and piercings present in the reference
image. These biases underscore the necessity for further investigations and
considerations before deploying SR3 in production settings. Nevertheless, diffusion
models like SR3 hold promise for mitigating dataset biases by generating synthetic data
from underrepresented groups.


CHAPTER 6

CONCLUSION

The development of facial recognition technology for low-resolution images involves the creation of specialized algorithms, datasets, and techniques to accurately identify
faces despite limitations in image quality. Researchers and engineers contribute by
innovating methods that use deep learning, specifically tailored for low-resolution
facial recognition tasks. They also play a crucial role in assembling datasets containing
diverse low-resolution facial images, crucial for training and evaluating algorithm
performance under challenging conditions. These efforts extend to devising effective
feature extraction techniques to improve facial feature recognition in images with
limited detail, noise, or distortion.

The integration of low-resolution facial recognition into various applications spans security, biometric authentication, healthcare, and marketing domains. Its utility lies in enhancing security measures, providing secure authentication methods, enabling assistive technologies for people with disabilities, and analysing customer behaviour for targeted
marketing strategies. Overall, contributions in this field aim to advance accuracy and
usability, expanding the applicability of facial recognition technology across diverse
environments while ensuring originality in content creation.


6.1 Scope for future work

• Enhanced Accuracy: As technology progresses, facial recognition algorithms are expected to advance, becoming more sophisticated in identifying faces even within low-resolution images. Models can be trained on extensive datasets containing low-resolution images to further improve their accuracy.

• Strengthened Security Measures: The integration of low-resolution facial recognition can bolster security across various sectors like law enforcement,
surveillance, and access control. It aids in identifying individuals in CCTV footage
or security camera feeds, even amidst poor image quality, thereby aiding in the
prevention and investigation of criminal activities.

• Biometric Authentication: Low-resolution facial recognition can be seamlessly integrated into biometric authentication systems, providing a
convenient and secure means for user verification. Especially beneficial in mobile
devices, ATMs, and other systems necessitating user authentication, it enhances
security while ensuring a smooth user experience.

• Assistive Technologies: Within healthcare and assistive technologies, low-resolution facial recognition serves to develop systems assisting individuals with
disabilities. For instance, it enables devices to recognize facial expressions or
gestures within low-quality images, thereby facilitating communication and
interaction for individuals with mobility or speech impairments.

• Privacy Concerns and Ethical Considerations: Despite the potential benefits, the widespread adoption of low-resolution facial recognition raises pertinent
concerns surrounding privacy and ethical implications. Consequently, future
advancements in this domain will likely entail addressing these concerns through
the implementation of robust privacy-enhancing technologies, transparent policies,
and ethical guidelines, ensuring responsible deployment and usage.


REFERENCES
[1] Image Super Resolution via Iterative Refinement, Chitwan Saharia, Jonathan Ho, William Chan, Tim Salimans, David J. Fleet, and Mohammad Norouzi, IEEE, 2023
[2] Dense Nested Attention Network for Infrared Small Target Detection, Boyang Li, Chao Xiao, Longguang Wang, Yingqian Wang, Zaiping Lin, Miao Li, Wei An, and Yulan Guo, Senior Member, IEEE, 2023
[3] Deep Convolutional Neural Network for Inverse Problems in Imaging, Chitwan Saharia, Jonathan Ho, William Chan, Tim Salimans, David J. Fleet, and Mohammad Norouzi, IEEE, 2019
[4] High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs, Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, Bryan Catanzaro, NVIDIA Corporation, UC Berkeley, 2018
[5] Convolutional Sparse Coding for Compressed Sensing CT Reconstruction, Chitwan Saharia, Jonathan Ho, William Chan, Tim Salimans, David J. Fleet, and Mohammad Norouzi, IEEE, 2023
[6] A Variational Auto-Encoder Approach for Image Transmission in Noisy Channel, Amir Hossein Estiri, Ali Banaei, Benyamin Jamialahmadi, Mahdi Jafari Siavoshani, 2021
[7] A Comparative Study on Variational Autoencoder and Generative Adversarial Networks, Mirza Sami, Iftekharul Mobin, 2019
