
PHOTO-SKETCH SYNTHESIS USING AUTOENCODER BASED

GENERATIVE MODELING

A PROJECT REPORT

Submitted by

Jack Andre J RA2011026020147


Abishek Raj M RA2011026020131
Charudeve KS RA2011026020139

Under the guidance of

Mrs. Angeline R
Assistant Professor (Selection Grade),
Department of Computer Science and Engineering

in partial fulfillment for the award of the degree of

BACHELOR OF TECHNOLOGY
in

COMPUTER SCIENCE AND ENGINEERING

with specialization in

ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING

of

FACULTY OF ENGINEERING AND TECHNOLOGY

S R M INSTITUTE OF SCIENCE AND TECHNOLOGY


CHENNAI - 600089

May 2025
S R M INSTITUTE OF SCIENCE AND TECHNOLOGY
(Deemed to be University U/S 3 of UGC Act, 1956)

BONAFIDE CERTIFICATE

Certified that this project report titled “PHOTO-SKETCH SYNTHESIS USING


AUTOENCODER BASED GENERATIVE MODELING” is the bonafide work of
JACK ANDRE J [REG NO: RA2011026020147], ABISHEK RAJ M [REG NO:
RA2011026020131], CHARUDEVE KS [REG NO: RA2011026020139] who
carried out the project work under my supervision. Certified further, that to the best of
my knowledge, the work reported herein does not form part of any other project report or
dissertation on the basis of which a degree or award was conferred on an earlier occasion on
this or any other candidate.

SIGNATURE SIGNATURE

Name of the Supervisor Name of the HoD


Designation Professor/Associate Professor and Head
Name of the Department, Name of the Department,
S R M Institute of Science and Technology, S R M Institute of Science and Technology,
Chennai-89. Chennai-89.

Submitted for the project viva-voce held on___________at S R M


Institute of Science and Technology, Chennai -600089.

INTERNAL EXAMINER EXTERNAL EXAMINER

S R M INSTITUTE OF SCIENCE AND TECHNOLOGY,
CHENNAI - 89

DECLARATION

We hereby declare that the entire work contained in this project report
titled “PHOTO-SKETCH SYNTHESIS USING AUTOENCODER
BASED GENERATIVE MODELING” has been carried out by JACK
ANDRE J [REG NO: RA2011026020147], ABISHEK RAJ M [REG
NO: RA2011026020131], CHARUDEVE KS [REG NO:
RA2011026020139] at SRM Institute of Science and Technology,
Ramapuram, Chennai- 600089, under the guidance of Ms. Angeline R,
M.Tech., (Ph.D)., Assistant Professor (Selection Grade), Department of
Computer Science and Engineering.

Place: Chennai
Date: JACK ANDRE J

ABISHEK RAJ M

CHARUDEVE KS

Own Work Declaration

Department of Computer Science and Engineering

S R M Institute of Science and Technology


Own Work Declaration form

Degree/ Course : B. Tech Computer Science and Engineering with Specialization in


Artificial Intelligence and Machine Learning
Student Name : JACK ANDRE J, ABISHEK RAJ M, CHARUDEVE KS
Registration Number : RA2011026020147, RA2011026020131, RA2011026020139
Title of Work : PHOTO-SKETCH SYNTHESIS USING AUTOENCODER BASED
GENERATIVE MODELING
We hereby certify that this assessment complies with the University’s Rules and Regulations relating to
Academic misconduct and plagiarism, as listed in the University Website, Regulations, and the Education
Committee guidelines.
We confirm that all the work contained in this assessment is our own except where indicated, and that we have
met the following conditions:
● Clearly referenced / listed all sources as appropriate
● Referenced and put in inverted commas all quoted text (from books, web, etc.)
● Given the sources of all pictures, data, etc. that are not our own
● Not made any use of the report(s) or essay(s) of any other student(s), either past or present
● Acknowledged in appropriate places any help that we have received from others (e.g., fellow students,
technicians, statisticians, external sources)
● Complied with any other plagiarism criteria specified in the Course handbook / University website
We understand that any false claim for this work will be penalised in accordance with the University policies and
regulations.

DECLARATION:
We are aware of and understand the University’s policy on Academic misconduct and
plagiarism and we certify that this assessment is our own work, except where indicated by
referencing, and that we have followed the good academic practices noted above.

RA2011026020147 RA2011026020131 RA2011026020139


ACKNOWLEDGEMENT
We place on record our deep sense of gratitude to our esteemed Chairman
Dr. R. SHIVAKUMAR, MBBS., MD., for providing us with the requisite
infrastructure throughout the course.
We take the opportunity to extend our hearty and sincere thanks to our
Dean, Dr. M. SAKTHI GANESH, Ph.D., for guiding us towards the
accomplishment of the project.
We take the privilege to extend our hearty and sincere gratitude to the
Professor and Chairperson, Dr. K. RAJA, Ph.D., for his suggestions, support and
encouragement towards the completion of the project with perfection.
We thank our honorable Head of the department Dr.XXX. Designation,
Department for his/her constant motivation and unwavering support.
We express our hearty and sincere thanks to our guide
Dr.XXX. Designation, Department for his/her encouragement, constructive
criticism and constant guidance throughout this project work.
Our thanks to the teaching and non-teaching staff of the Department of
Computer Science and Engineering of S R M Institute of Science and Technology,
Chennai, for providing the necessary resources for our project.

ABSTRACT

The transformation of images into sketches is a critical area in computer vision, with wide-ranging applications from artistic rendering to enhancing visual content. This project introduces an innovative method for synthesizing sketches from photos using an autoencoder-based generative model. Unlike conventional methods that modify input images to create sketch-like appearances, our system employs autoencoders to generate sketches from scratch. The system is structured around an encoder-decoder architecture built upon autoencoders. When presented with a photo, the encoder extracts crucial features and representations, capturing the image's underlying structure and content. These extracted features are then used by the decoder to produce a sketch, employing generative modeling techniques to simulate hand-drawn strokes, lines, and textures. This photo-sketch synthesis system presents significant benefits for law enforcement agencies by aiding suspect identification, complementing facial recognition technology, assisting in cold case investigations, verifying witness testimonies, and supporting undercover operations.

TABLE OF CONTENTS

Page. No

ABSTRACT vi

LIST OF FIGURES x

LIST OF TABLES xi

LIST OF ACRONYMS AND ABBREVIATIONS xii

1 INTRODUCTION 1
1.1 Problem Statement 12
1.2 Aim of the Project 13
1.3 Project Domain 13
1.4 Scope of the Project 13
1.5 Methodology 14
1.6 Organization of the Report 14

2 LITERATURE REVIEW 16

3 PROJECT DESCRIPTION 19
3.1 Existing System 19
3.2 Proposed System 19
3.2.1 Advantages 19
3.3 Feasibility Study 20
3.3.1 Economic Feasibility 20

3.3.2 Technical Feasibility 20
3.3.3 Social Feasibility 20
3.4 System Specification 21
3.4.1 Hardware Specification 21
3.4.2 Software Specification 21
3.4.3 Standards and Policies 21

4 PROPOSED WORK 22
4.1 General Architecture 22
4.2 Design Phase 22
4.2.1 Data Flow Diagram 23
4.2.2 UML Diagram 24
4.2.3 Use Case Diagram 25
4.2.4 Sequence Diagram 25
4.3 Module Description 26
4.3.1 Module 1: Image Processing 26
4.3.2 Module 2: Feature Extraction 26
4.3.3 Module 3: Image to Sketch Synthesis 27
4.3.4 Step 2: Processing of Data 27
4.3.5 Step 3: Split the Data 28
4.3.6 Dataset Sample 28
4.3.7 Step 4: Building the Model 29
4.3.8 Step 5: Compiling and Training the Model 30

5 IMPLEMENTATION AND TESTING 31


5.1 Input and Output 31
5.1.1 Image of the Subject 31
5.1.2 Predicted Sketch Synthesis of Subject 31

5.2 Testing 32
5.2.1 Types of Testing 32
5.2.2 Unit testing 32
5.2.3 Integration testing 34
5.2.4 Functional testing 35
5.2.5 Test Result 35

6 RESULTS AND DISCUSSIONS


6.1 Efficiency of the Proposed System
6.2 Comparison of Existing and Proposed System

7 CONCLUSION AND FUTURE ENHANCEMENTS


7.1 Conclusion
7.2 Future Enhancements
7.3 Results

8 SOURCE CODE & POSTER PRESENTATION


8.1 Sample Code 39

References

A. Sample screenshots

B. Proof of Publication/Patent filed/ Conference Certificate

LIST OF FIGURES

4.1 Architecture Diagram 23


4.2 Data Flow Diagram 24
4.3 UML Diagram 25
4.4 Use Case Diagram 26
4.5 Sequence Diagram 26
4.6 Pre-Processing Of Data 28
4.7 Dataset Of Photos 29
4.8 Dataset Of Sketches 29
4.9 Model Summary 30
4.10 Code To Compile Model And Fit Model To Dataset 31

5.1 Original Image Of The Subject 32


5.2 Predicted Sketch Of The Subject 33
5.3 Image Augmentation Using OpenCV-Python 34
5.4 Code For Printing Images And Their Corresponding Sketches 35
5.5 Sketches And Their Image Printed As Output 35
5.6 Training And Testing Split 36
5.7 Model Is Being Compiled And Trained 36

7.1 Prediction And Sketch Matched 38


7.2 Model Prediction And Sketch Matched 38

8 Sample Code 40

LIST OF TABLES

Table No. Table Name Page No.


1 Literature survey 16

LIST OF ACRONYMS AND ABBREVIATIONS

API APPLICATION PROGRAM INTERFACE


CBIR CONTENT BASED IMAGE RETRIEVAL
CNN CONVOLUTIONAL NEURAL NETWORK
CUHK CHINESE UNIVERSITY OF HONG KONG
SBIR SKETCH BASED IMAGE RETRIEVAL

CHAPTER 1

INTRODUCTION

Face sketch synthesis is a crucial aspect of face style transformation, essential for generating
sketches from input photos. Its utility spans across law enforcement and digital entertainment, where it
aids in enhancing face recognition accuracy by minimizing texture disparities. Exemplar-based
methods, which rely on photo-sketch pairs for training, are prominent in this field. Face sketch
recognition tackles modality disparities through synthesis, projection, or optimization-based
approaches. Additionally, sketch-based image synthesis enables the creation of realistic images from
sketches, catering to individuals without artistic expertise. The formulation of face sketches from
reference photos facilitates tasks in law enforcement and entertainment sectors. Exemplar-based
techniques involve segmenting photos and sketches into patches to streamline the synthesis process.
Overall, face sketch synthesis plays a vital role in various applications, offering efficient solutions for
generating sketches and enhancing recognition accuracy.

1.1 PROBLEM STATEMENT

Edges, boundaries, and contours are pivotal in both computer graphics and computer vision,
conveying 3D shapes and indicating occlusion events. This paper aims to generate contour drawings,
capturing scene outlines, distinct from boundary detection. Content-Based Image Retrieval (CBIR)
offers valuable solutions across various domains, including medical and meteorological applications.
Facial Sketched-Real Image Retrieval (FSRIR) poses a significant challenge due to human face
complexity and domain disparities. This paper contributes by extending the CUFS dataset, presenting
three retrieval systems, and evaluating them on diverse datasets. The proposed systems outperform
recent algorithms, with InfoGAN and Vision Transformer (ViT) excelling in distinguishing freehand
sketches and objects. The ViT system achieves a 1.183 F-score on the ESRIR dataset. Future research
could explore alternative distance metrics and feature extraction algorithms.

1.2 AIM OF THE PROJECT

● Enhance model generalization to diverse groups by improving its adaptability to new contexts and
data distributions.
● Update contextual associations between newly derived descriptions and features to ensure
relevance and accuracy.
● Implement a more efficient linear-based attention scheme to enable effective long sequence
interactions on large inputs, enhancing model scalability and performance.
● Improve fine-grained feature extraction capabilities to produce high-quality results with enhanced
detail and accuracy.
● Optimize latent vector optimization processes to minimize errors and improve overall model
performance and image fidelity.

1.3 PROJECT DOMAIN

The domain of the project is CNN (Convolutional Neural Network) and DL (Deep Learning),
particularly emphasizing their applications in computer vision and image processing.

1.4 SCOPE OF THE PROJECT

The scope of this project encompasses the development and implementation of a Collaborative
Generative Representation Learning Neural Network Algorithm for image processing tasks. This
includes modules for image preprocessing, feature extraction, and sketch to image prediction,
leveraging techniques such as histogram equalization and neural network-based feature extraction. The
project aims to address challenges in handling non-linear relationships, maintaining continuity in neural
network operation, and learning from similar events.

1.5 METHODOLOGY

In this project, the proposed Collaborative Generative Representation Learning Neural Network
Algorithm is employed to bridge the domain gap between sketches and images, facilitating robust
mapping between the two modalities. By incorporating skip layer connections, the neural network
enhances its ability to process both iconic aspects and advanced features of images, thereby improving
performance in identifying small and large images alike. Additionally, noise is strategically added to
deeper hidden layers to enhance the robustness of the network. The project encompasses three main
modules: Image Preprocessing, Feature Extraction, and Sketch to Image Prediction. Each module plays
a critical role in preparing, analyzing, and predicting images, leveraging techniques such as histogram
equalization for contrast enhancement and convolutional layers for feature extraction. Through this
comprehensive methodology, the project aims to advance the state-of-the-art in learning the mapping
between sketches and images.
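As a small illustration of the contrast-enhancement step mentioned above, the following sketch applies histogram equalization with the opencv-python library; the file name and the grayscale assumption are illustrative rather than taken from the project code.

import cv2

# Load a face photo in grayscale and enhance its contrast with histogram
# equalization, as used in the image preprocessing module.
# "photo.jpg" is a placeholder path, not a file from the project dataset.
photo = cv2.imread("photo.jpg", cv2.IMREAD_GRAYSCALE)
equalized = cv2.equalizeHist(photo)
cv2.imwrite("photo_equalized.jpg", equalized)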

1.6 ORGANIZATION OF THE REPORT

Chapter 2 contains a literature review of relevant papers.

Chapter 3 examines the challenges encountered in Content-Based Image Retrieval (CBIR),


particularly in Facial Sketched-Real Image Retrieval (FSRIR). It presents solutions through the
extension of the Chinese University Face Sketch (CUFS) dataset, resulting in the creation of the
Extended Sketched-Real Image Retrieval (ESRIR) dataset. Additionally, it introduces novel retrieval
systems based on convolutional autoencoder, InfoGAN, and Vision Transformer (ViT) models,
showcasing their effectiveness in differentiating facial sketches and objects across datasets. Cloud-
based approaches for handling large-scale datasets are discussed, along with evaluations of model
performance on diverse image types. The findings emphasize the significance of robust feature
extraction methods and suggest avenues for broader applications in image classification and retrieval.

Chapter 4 outlines the proposed work, starting with the general architecture depicted in Figure 4.1,
inspired by the Collaborative Generative Representation Learning Neural Network (CGRL-NN)
tailored for facial recognition. The design phase includes a data flow diagram (Figure 4.2), illustrating
the image processing pipeline from raw data input to feature extraction and neural network analysis. A
UML diagram (Figure 4.3) further details the processing stages. The proposed system comprises three
modules: Image Preprocessing, Feature Extraction, and Image to Sketch Synthesis, described in detail.
Additionally, steps for data processing, model building, and training are delineated, supported by
sample datasets and model summaries. The feasibility study assesses economic, technical, and social
feasibility, ensuring practicality and acceptance of the proposed system.

Chapter 5 focuses on the implementation and testing of the proposed system. Input and output
evaluation involve feeding a random student image into the Neural Network to generate a synthesized
sketch, as depicted in Figures 5.1 and 5.2. Testing encompasses various types, including Unit Testing,
Integration Testing, and Functional Testing. Unit testing verifies source code units for efficiency and
correctness, illustrated in Figure 5.3 for image augmentation. Integration testing assesses system
efficiency with functional requirements, while functional testing verifies output against provided inputs.
A testing strategy incorporating unit, integration, and functional testing ensures comprehensive
evaluation of the system's performance and compliance with requirements.

Chapter 6 provides the results and discussions, highlighting the efficiency of the proposed system
and comparing it to existing methods. It discusses the use of an encoder-decoder architecture, achieving
an accuracy of approximately 40 percent. The proposed system demonstrates suitability for various
tasks and outperforms more data-intensive models with smaller datasets. Results are visually
represented in Figures 6.1 and 6.2.

Chapter 7 provides the conclusion and outlines future enhancements. The proposed approach
involves an end-to-end fully convolutional network for modeling the mapping between face photos and
sketches. Experimental results demonstrate the efficacy of this approach. Future enhancements include
refining the loss function, experimenting with different databases, and exploring correlations with non-
photorealistic rendering methodologies.

Chapter 8 contains the source code and details about the poster presentation. It includes sample code
snippets for reference.

CHAPTER 2
LITERATURE REVIEW

This chapter provides a thorough examination of the present body of research and academic
publications pertinent to the subject of the project. Through this overview, important discoveries in
the field, technical developments, and the evolution of approaches are described. Critical evaluations
and comparisons of several methods and models also offer a useful basis for understanding the current
state of the art and for identifying gaps and opportunities for further investigation and innovation. This
chapter is essential for placing the project in a broader context by drawing on knowledge and lessons from
earlier research.

Mohamad M et al. [6] examine the realm of cross-modal text-image retrieval within remote sensing
(RS), highlighting its potential as a versatile approach for extracting valuable insights from RS
databases. They note that current methodologies are primarily designed to accommodate queries in
English, potentially limiting accessibility for non-English speakers. To address this limitation, the study
advocates for the integration of multilingual queries, which can enhance interaction with the retrieval
system and broaden the accessibility of RS data. In response, they propose a multi-language framework
employing transformers. Anticipated benefits include improved reliability and resilience in data
retrieval, as well as enhanced quality and consistency of extracted data. Moreover, the proposed
framework offers easy scalability to accommodate increasing demands. However, the study
acknowledges challenges in optimizing system performance, citing difficulties in achieving better
performance metrics and describing the process of message updating as tedious. Such complexities may
inadvertently lead to errors in the resulting data. These insights, derived from research conducted in
2022, underscore both the advancements and challenges within the domain of cross-modal text-image
retrieval in remote sensing.

Zhenghang Yuan et al. [9] explore the application of intelligent human-computer interaction systems
in leveraging visual question answering (VQA) for remote sensing scenes. Despite the considerable
attention VQA has garnered within computer vision, its counterpart tailored for remote sensing data
(RSVQA) remains in its nascent stages of development. The study underscores the potential for RSVQA
to simplify the implementation process, thereby enhancing accessibility to insights from remote sensing
imagery. Notably, the benefits of implementing RSVQA are evident in its ability to streamline tasks
affected by visual data. However, challenges persist, including the need for built-in error handling
mechanisms and concerns over the time-consuming nature of the approach. Additionally, the study

identifies limitations in existing solutions, citing their ineffectiveness in addressing pertinent issues.
These observations, drawn from research conducted in 2022, shed light on the evolving landscape of
intelligent human-computer interaction systems and the opportunities they present for advancing remote
sensing applications.

Arka Ujjal Dey et al. [10] delve into the open-ended question answering task of Text-VQA, which
often necessitates deciphering scene text content from images, even when such text is obscured or
seldom observed. To address the inherent challenges of this zero-shot task, the study proposes
leveraging external knowledge in a generalized manner to enhance understanding of scene text. Their
system is tailored to extract, validate, and reason over scene text for vision-language understanding
tasks, employing a conventional multimodal transformer. Noteworthy attributes of the
proposed system include its quick and efficient usability, capable of minimizing human intervention
requirements. Additionally, it offers a relatively straightforward and computationally economical
approach. However, the implementation of such systems may incur significant capital and operating
expenditures, particularly due to increased payloads. Moreover, despite its capabilities, the system may
struggle to achieve noise-resistant detection. These insights, derived from research conducted in 2022,
underscore both the advancements and limitations in leveraging external knowledge for enhancing text
comprehension within visual question answering tasks.

Tengfei Wu et al. [8] examine the rising popularity of fingerprints in the biometrics industry,
attributing it to the myriad benefits they offer. Within this context, the research community has
increasingly turned to deep learning-based techniques for palmprint recognition, capitalizing on deep
learning's remarkable performance in computer vision tasks. Notably, deep hashing networks (DHNs)
have emerged as a promising approach, offering the ability to compress storage requirements and
expedite matching and retrieval processes by encoding outputs as binary bit strings. The study highlights
the magnificent effectiveness of DHNs in producing results, alongside their low deployment cost and
potential to enhance worst-case performance scenarios. However, it also underscores challenges faced
by DHNs, particularly in adverse conditions where they may perform poorly. These challenges include
feature loss and inaccurate feature extraction, which can introduce distortions affecting the readability
and measurability of attribute values. These insights, derived from research conducted in 2022, shed
light on the evolving landscape of deep learning-based techniques in palmprint recognition and their
implications for biometric security applications.

Arai, Hayato, et al. [55] introduce a novel framework termed Disease-oriented Image Embedding
with Pseudo-Scanner Standardization (DI-PSS), aimed at developing a reliable Content-Based Image
Retrieval (CBIR) system tailored to clinical brain MRI databases. This framework comprises two
primary methods: harmonization of data to mitigate variations stemming from diverse scanning settings,
and a technique for generating low-dimensional embeddings conducive to illness categorization. Despite
the significance of clinical brain MRI databases, research on CBIR in this domain has been relatively
scarce. The study highlights the high robustness and imperceptibility of DI-PSS, emphasizing its
efficacy in clinical settings. Additionally, it notes the framework's efficiency and its capacity to reduce
hardware resource consumption. However, challenges exist, such as the inability to implement DI-PSS
in real-time scenarios and the risk of inaccurate estimations of missing pixels, which can increase the
complexity of the problem. These insights, drawn from research conducted in 2021, underscore the
potential of DI-PSS in advancing CBIR systems for clinical brain MRI databases, while also recognizing
the challenges that need to be addressed for its effective implementation.

Luqing Luo et al. [3] present a novel hybrid framework integrating a multi-channel deep learning
network with a non-iterative and fast feedforward neural network to address the stringent efficiency and
accuracy requirements in intelligent manufacturing. This framework serves as an intelligent tool
recognition system, aiming to achieve a balance between accurate feature extraction and swift
identification. The approach combines the random parameter assignment process of Extreme Learning
Machines (ELMs) with the fine-tuning capabilities of Convolutional Neural Networks (CNNs), thus
offering increased flexibility in model architectures. By leveraging this hybrid framework, the research
anticipates enhanced efficiency and speed in intelligent manufacturing processes. However, it
acknowledges the need for additional configuration and raises concerns regarding the system's ability to
meet current network business demands. Moreover, the system is described as opportunistic and
uncontrollable, implying potential challenges in its implementation. These insights, derived from
research conducted in 2020, underscore the innovative strides in integrating deep learning techniques
into intelligent manufacturing systems while also recognizing the associated limitations and areas for
improvement.

Sain [56] presents a thesis that aims to improve Sketch-Based Image Retrieval
(SBIR) for real-world use. The research utilizes methods like Cross-Modal Co-Attention and
Meta-Learning-based Variational Auto-Encoder to enhance system performance and practicality. The study
leverages paired data of free-hand sketches and photos to improve accuracy in SBIR. The research
demonstrates improved accuracy in SBIR, making it a significant contribution to the field. The study
was conducted in 2023.

Khokhlova, Margarita, et al. [57] present a paper that introduces a Multi-Modal Network for aerial
image retrieval. The research leverages original images and segmented regions with a Siamese network
for feature extraction and a kNN classifier for geo-matching. The approach outperforms state-of-the-art
descriptors like GEM and ResNet50. The study utilizes multi-modal data with labeled information and
extracts descriptors using a Siamese network. The research was conducted in 2020.

CHAPTER 3

PROJECT DESCRIPTION

3.1 EXISTING SYSTEM

The utilization of Content-Based Image Retrieval (CBIR), particularly in Facial Sketched-Real


Image Retrieval (FSRIR), poses significant challenges due to the complexity of facial features and the
need for effective similarity matching algorithms. This paper addresses these challenges by extending
the Chinese University Face Sketch (CUFS) [54] dataset to create the Extended Sketched-Real Image
Retrieval (ESRIR) dataset, comprising 53,000 facial sketches and 53,000 real facial images.
Additionally, it introduces three new retrieval systems based on convolutional autoencoder, InfoGAN,
and Vision Transformer (ViT) unsupervised models, demonstrating their effectiveness in differentiating
facial sketches and objects across various datasets. The study also presents cloud-based approaches to
handle large-scale datasets and evaluates the models' performance on diverse image types, showcasing
their superiority over recent algorithms. The findings underscore the significance of robust feature
extraction methods and propose avenues for further research, emphasizing the potential for broader
applications in image classification and retrieval.

3.2 PROPOSED SYSTEM

In our proposed system, we address the challenge of learning the mapping between sketches and
images by employing Collaborative Generative Representation Learning. This technique enables the
creation of accurate sketches corresponding to images while promoting robustness against small
perturbations. By focusing on the iconic aspects of objects during image scanning, we aim to bridge
the inherent domain gap between sketches and photos. Utilizing skip layer connections in the neural
network structure allows for more comprehensive information transmission, particularly in
recognizing lower image features. Conversely, non-skip layer connections facilitate the analysis of
advanced features but may lead to overreliance on original image features, hindering the recognition of
small images. Through our approach, we optimize the neural network's ability to identify both small
and large images effectively, mitigating the impact of noise and enhancing confidence in image
recognition tasks.
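To make the skip-layer idea concrete, the following minimal Keras sketch concatenates an early feature map with a later, upsampled one before producing the output; the layer sizes and filter counts are arbitrary illustrations and are not taken from the project's actual network.

import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(256, 256, 3))
# Early (low-level) features.
low = layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
down = layers.MaxPooling2D()(low)
# Deeper (high-level) features.
deep = layers.Conv2D(64, 3, padding="same", activation="relu")(down)
up = layers.UpSampling2D()(deep)
# Skip connection: reuse the low-level features alongside the upsampled deep ones.
merged = layers.Concatenate()([up, low])
outputs = layers.Conv2D(3, 3, padding="same", activation="sigmoid")(merged)
skip_model = tf.keras.Model(inputs, outputs)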

3.2.1 ADVANTAGES

● Boost performance on very high-dimensional datasets
● Lessen computing workload while enhancing detection capabilities
● Simplicity and explainability
● Reveal highly nonlinear relationships
● Guide label assignment effectively and boost label confidence
● Simplify implementation process

3.3 FEASIBILITY STUDY

A feasibility study is conducted to assess the viability of the project and analyze its strengths and
weaknesses. In this context, the feasibility study is conducted across three dimensions:

• Economic Feasibility

• Technical Feasibility

• Social Feasibility

3.3.1 ECONOMIC FEASIBILITY

The proposed system does not require expensive equipment, making it economically feasible.
Development can be carried out using readily available software, eliminating the need for additional
investment.

3.3.2 TECHNICAL FEASIBILITY

The proposed system is based entirely on a machine learning model. The use of tools such as the
Anaconda prompt, Visual Studio, Kaggle datasets, and Jupyter Notebook, all of which are freely
available, ensures technical feasibility. The technical skills required to use these tools are practical and
accessible, further supporting the feasibility of the project.

3.3.3 SOCIAL FEASIBILITY

The proposed system is based entirely on a machine learning model and relies only on freely available
tools such as the Anaconda prompt, Visual Studio, Kaggle datasets, and Jupyter Notebook. The technical
skills required to use these tools are practical and accessible, further supporting the feasibility of the
project.

3.4 SYSTEM SPECIFICATION

An effective system is crucial for any computational task. It's important to have the correct
hardware and software components to ensure everything runs smoothly. From strong processors to
essential software packages, each part helps create an efficient environment for data analysis and
machine learning tasks.

3.4.1 HARDWARE SPECIFICATION

● Processor: Intel 10th gen or higher with i5, i7 or i9


● Ethernet connection (LAN) OR a wireless adapter (Wi-Fi)
● Hard Drive: Minimum 100 GB; Recommended 200 GB or more
● Memory (RAM): Minimum 8 GB; Recommended 32 GB or above

3.4.2 SOFTWARE SPECIFICATION

● Python
● Anaconda
● Jupyter Notebook
● TensorFlow
● Keras
● opencv-python
● pandas
● matplotlib

CHAPTER 4

PROPOSED WORK

4.1 GENERAL ARCHITECTURE

Figure 4.1: Architecture Diagram

Figure 4.1 illustrates a potential architecture inspired by Collaborative Generative Representation


Learning Neural Network (CGRL-NN), tailored for a facial recognition system.

4.2 DESIGN PHASE

During the design phase, diverse diagrams and models are crafted to depict various elements of the
system, such as its components, interactions, and data flow. These diagrams, including UML, sequence,
use case, and data flow diagrams, aid in conveying the system's design and functionality to stakeholders
and development teams. In essence, the design phase is pivotal for ensuring that the software solution
achieves its objectives in a proficient and effective manner.

4.2.1 DATA FLOW DIAGRAM

Figure 4.2: Data Flow Diagram

This Figure 4.2 shows a data flow diagram of the image processing pipeline. Raw image data enters
first, followed by pre-processing to enhance quality. Feature extraction then condenses the data by
selecting key characteristics. Finally, a fully connected neural network layer analyzes these features for
image classification or prediction, outputting the processed data.
4.2.2 UML DIAGRAM

Figure 4.3: UML Diagram

The Figure 4.3 is a UML diagram depicting the image processing pipeline. It consists of three main
stages: image preprocessing, feature extraction, and sketch to image prediction. Preprocessing aims to
improve the image data by suppressing noise or enhancing features. Feature extraction reduces the data
by selecting the most relevant information. Finally, sketch to image prediction utilizes a neural network
to convert a sketch into a complete image.

4.2.3 USE CASE DIAGRAM

Figure 4.4: Use Case Diagram

The diagram in Figure 4.4 illustrates the flow of the Collaborative Generative Representation
Learning technique for mapping sketches to images, involving Image Preprocessing, Feature
Extraction, and Sketch to Image Prediction.

4.2.4 SEQUENCE DIAGRAM

Figure 4.5: Sequence Diagram

The sequence diagram illustrates the process of loading and preprocessing images from a dataset
directory. Initiated by the user, the code module iterates over image files with progress tracking by the
tqdm library. Using the OpenCV library, images are loaded, preprocessed, and returned to the code
module. Matplotlib then visualizes the preprocessed images. This iterative process continues until all
images are processed, concluding the sequence.
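A minimal sketch of this loading-and-visualization loop is given below, assuming a local folder of photos; the directory name and the 256-pixel size are placeholders.

import os
import cv2
import matplotlib.pyplot as plt
from tqdm import tqdm

SIZE = 256                 # placeholder image size
image_dir = "photos/"      # placeholder dataset directory

loaded = []
for fname in tqdm(sorted(os.listdir(image_dir))):      # progress tracking with tqdm
    img = cv2.imread(os.path.join(image_dir, fname))   # load with OpenCV
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)         # OpenCV loads BGR; convert to RGB
    img = cv2.resize(img, (SIZE, SIZE))                # basic preprocessing
    loaded.append(img)

plt.imshow(loaded[0])      # visualize the first preprocessed image with matplotlib
plt.axis("off")
plt.show()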

4.3 MODULE DESCRIPTION

The following modules form the core components of our image processing and analysis pipeline,
each serving a distinct yet interconnected role in photo-sketch synthesis:

4.3.1 MODULE 1: IMAGE PREPROCESSING

Pre-processing involves applying various procedures to images at their most basic level, where
both input and output are represented as intensity images. These images closely resemble the original
data captured by sensors, typically appearing as matrices of brightness values. The main aim of pre-
processing is to refine image data, reducing unintended distortions while enhancing specific
attributes vital for subsequent processing.

4.3.2 MODULE 2: FEATURE EXTRACTION

Feature extraction constitutes a pivotal component of the dimensionality reduction process,


wherein an initial set of raw data undergoes segmentation and condensation into more manageable
groups, facilitating streamlined processing. A defining attribute of these extensive data sets is their
abundance of variables, demanding significant computational resources for analysis. Feature
extraction addresses this challenge by discerning and amalgamating variables into features, thereby
substantively diminishing the data volume. These resultant features are conducive to efficient
processing while retaining the capacity to accurately depict the original data set with fidelity and
precision.
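In the autoencoder setting used here, one hedged way to realize this module is to reuse the trained encoder as the feature extractor, taking its bottleneck output as the condensed representation. The layer index below is purely illustrative, and the model variable refers to the encoder-decoder built later in Section 4.3.7.

import tensorflow as tf

# Assume `model` is the trained encoder-decoder from Section 4.3.7.
# Truncating it at its deepest encoder layer yields a standalone feature extractor.
bottleneck = model.layers[3].output     # illustrative index of the bottleneck layer
feature_extractor = tf.keras.Model(model.input, bottleneck)

features = feature_extractor.predict(photos[:5])
print("Extracted feature maps of shape:", features.shape)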

4.3.3 MODULE 3: IMAGE TO SKETCH SYNTHESIS

The neural network is regarded as an adept feature extractor, comprising two primary components
integral to its functioning. Firstly, the feature extractor incorporates convolutional and pooling layers,
tasked with autonomously discerning and assimilating key attributes from raw data. Subsequently, the
fully connected layer employs the acquired features to execute classification tasks. The input layer
serves to ingest individual data values, while the output layer yields results corresponding to the
number of distinct categories requiring classification. Within the convolutional layer, localized regions
of the data undergo scrutiny to extract pertinent features, while pooling layers serve to streamline
computational complexity by reducing parameter quantities.

4.3.4 STEP 2: PROCESSING OF DATA

• The CUHK (CUFS) dataset contains 188 images of students from the Chinese University of
Hong Kong along with the sketches of those images [54].
• The dataset is assigned to image_path and sketch_path and sorted in alphanumeric order.

• In Figure 4.6, using the opencv library, they are color corrected, resized, rotated and converted
from an image to an array.

Figure 4.6: Pre-processing Of Data
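A hedged sketch of these steps is shown below. The folder names, the 256-pixel size, and the normalization to the [0, 1] range are illustrative choices rather than the exact values used in Figure 4.6.

import os
import cv2
import numpy as np

SIZE = 256
photo_files = sorted(os.listdir("CUHK/photos"))     # placeholder folder of the 188 photos
sketch_files = sorted(os.listdir("CUHK/sketches"))  # placeholder folder of matching sketches

def load_folder(folder, files):
    data = []
    for f in files:
        img = cv2.imread(os.path.join(folder, f))
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)      # color correction from BGR to RGB
        img = cv2.resize(img, (SIZE, SIZE))             # resize to a fixed shape
        data.append(img.astype("float32") / 255.0)      # convert to a normalized array
    return np.array(data)

photos = load_folder("CUHK/photos", photo_files)
sketches = load_folder("CUHK/sketches", sketch_files)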

4.3.5 STEP 3: SPLIT THE DATA

• After the preprocessing part, both sketches and images are split into training and testing sets.

• The training data contains 80 percent of the database. This applies to both the images and
the sketch versions of the images.
• The test data contains the remaining 20 percent of the database, as sketched below.
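The split can be sketched as follows, assuming the photos and sketches arrays from the previous step are index-aligned; scikit-learn's train_test_split is one convenient (assumed) way to do it.

from sklearn.model_selection import train_test_split

# 80 percent for training and 20 percent for testing, applied to photos and
# sketches together so that each photo stays paired with its sketch.
train_x, test_x, train_y, test_y = train_test_split(
    photos, sketches, test_size=0.2, random_state=42)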

4.3.6 DATASET SAMPLE

Figure 4.7: Dataset Of Photos

Figure 4.8: Dataset Of Sketches

4.3.7 STEP 4: BUILDING THE MODEL
Our model of choice is an encoder-decoder architecture-based model implemented using the
Keras functional API.

INPUT LAYER:
The input layer of the model is defined with the input shape (SIZE, SIZE, 3), denoting an
image with dimensions 'SIZE x SIZE' and 3 channels (RGB).

ENCODER SECTION:
We proceed to the encoder section by implementing a downsample function that reduces the
spatial dimensions of the input tensor with an increase in depth. Each downsample operation uses
a 4x4 convolutional layer with a specified number of filters.

DECODER SECTION:
The decoder section mirrors the encoder, but in reverse. This is done by using upsample
operations to increase spatial dimensions while decreasing depth. Each upsample operation is
paired with a 4x4 transposed convolutional layer (Conv2DTranspose), except for the last two
layers, which use a 2x2 transposed convolutional layer.

MODEL CREATION:
Finally, the encoder input and decoder output are used to create a Keras Model object. Figure 4.9
below shows the model summary.

Figure 4.9: Model Summary
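A condensed sketch of an encoder-decoder of this kind, written with the Keras functional API, is given below. The number of stages and the filter counts are illustrative assumptions; the exact configuration should be read from the model summary in Figure 4.9.

import tensorflow as tf
from tensorflow.keras import layers

SIZE = 256

def downsample(x, filters):
    # 4x4 convolution with stride 2: halves the spatial size, increases depth.
    return layers.Conv2D(filters, 4, strides=2, padding="same", activation="relu")(x)

def upsample(x, filters, kernel=4):
    # Transposed convolution with stride 2: doubles the spatial size, decreases depth.
    return layers.Conv2DTranspose(filters, kernel, strides=2, padding="same", activation="relu")(x)

inputs = tf.keras.Input(shape=(SIZE, SIZE, 3))
x = downsample(inputs, 64)      # encoder
x = downsample(x, 128)
x = downsample(x, 256)
x = upsample(x, 128)            # decoder mirrors the encoder
x = upsample(x, 64, kernel=2)   # the last two layers use a 2x2 transposed convolution
outputs = layers.Conv2DTranspose(3, 2, strides=2, padding="same", activation="sigmoid")(x)

model = tf.keras.Model(inputs, outputs)
model.summary()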

4.3.8 STEP 5: COMPILING AND TRAINING THE MODEL

The model is now compiled by using the Adam optimizer with a specified learning rate of
0.0001.

The model’s performance during training and evaluation is monitored using the accuracy
metric (‘acc’).

The compiled model is now made to fit, or train, using the training data. The number of epochs in this
case has been set to 150, as this showed the best results. Figure 4.10 shows the lines of
code used to compile and fit the model.

Figure 4.10: Code To Compile Model And Fit Model To Dataset
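A hedged version of this step is sketched below, reusing the model and the split arrays from the earlier sketches. Only the Adam learning rate of 0.0001, the 'acc' metric, and the 150 epochs come from the text; the reconstruction loss and batch size are assumptions.

from tensorflow.keras.optimizers import Adam

model.compile(optimizer=Adam(learning_rate=0.0001),
              loss="mean_squared_error",     # assumed pixel-wise reconstruction loss
              metrics=["acc"])

history = model.fit(train_x, train_y,
                    validation_data=(test_x, test_y),
                    epochs=150,
                    batch_size=16)            # assumed batch size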

CHAPTER 5

IMPLEMENTATION AND TESTING

5.1 INPUT AND OUTPUT

5.1.1 IMAGE OF THE SUBJECT

● A random student has been selected from the 188 students for evaluation of the model.
● This image is fed into the Neural Network and the network gives a sketch synthesized output
of what the predicted sketch would look like for the input image.
● Figure 5.1 below shows the original, augmented image of the subject

Figure 5.1: Original Image Of The Subject

5.1.2 PREDICTED SKETCH SYNTHESIS OF THE SUBJECT

Figure 5.2 below shows the sketch synthesized output by the neural network

Figure 5.2: Predicted Sketch Of The Subject

5.2 TESTING

In photo-sketch synthesis, testing serves to validate whether the generated sketches accurately depict
the original photos. This ensures that the autoencoder-based generative model conforms to specified
criteria. Testing is conducted to confirm whether the synthesized sketches successfully capture essential
details and uphold fidelity to the source images. The process entails rigorous examination aimed at
verifying that the model effectively achieves its intended purpose.
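One simple, hedged way to quantify such a fidelity check is a pixel-wise error between a predicted sketch and its ground-truth sketch; this is an illustrative metric reusing the earlier sketches, not the project's own test procedure.

import numpy as np

# Predict a sketch for one held-out photo and compare it with the real sketch.
pred = model.predict(test_x[:1])[0]
truth = test_y[0]

mse = np.mean((pred - truth) ** 2)    # lower is better
print(f"Pixel-wise MSE between predicted and ground-truth sketch: {mse:.4f}")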

5.2.1 TYPES OF TESTING

5.2.2 UNIT TESTING

Unit testing is a software testing method in which individual units of source code are tested to check
the efficiency and correctness of the program. Figure 5.3 below contains the code for the image
augmentation.

INPUT:

Figure 5.3: Image Augmentation Using opencv-python Library

TEST RESULT

• Images of size 224*224 pixels are considered

• The images undergo augmentation using opencv-python

• The considered images are loaded into an array for preprocessing.
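A small unit-test style check for this step is sketched below, verifying that every loaded image ends up as a 224 x 224 array; the folder path is a placeholder and the check itself is an assumed example rather than the project's test code.

import os
import cv2
import numpy as np

def load_augmented(folder, size=224):
    arrays = []
    for fname in sorted(os.listdir(folder)):
        img = cv2.imread(os.path.join(folder, fname))
        img = cv2.resize(img, (size, size))    # resizing/augmentation via opencv-python
        arrays.append(img)
    return np.array(arrays)

images = load_augmented("CUHK/photos")         # placeholder folder
assert images.shape[1:3] == (224, 224), "unexpected image size after augmentation"
print("Unit check passed:", images.shape)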

5.2.3 INTEGRATION TESTING

INPUT:

Figure 5.4 below shows the code snippet for printing the images and sketches while Figure 5.5 shows
the output of the code

Figure 5.4: Code For Printing Images And Their Corresponding Sketches

TEST RESULT

• The images from the Images folder and their corresponding sketches from the sketches folder
are loaded.

• The images and the sketches are then displayed side by side as output using the matplotlib
library.

Figure 5.5: Sketches And Their Image Printed As Output
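A minimal matplotlib sketch for this parallel display is given below, assuming the photos and sketches arrays from the data-processing step; the number of pairs shown is arbitrary.

import matplotlib.pyplot as plt

n = 3   # number of photo/sketch pairs to display
fig, axes = plt.subplots(n, 2, figsize=(6, 3 * n))
for i in range(n):
    axes[i, 0].imshow(photos[i])
    axes[i, 0].set_title("Photo")
    axes[i, 1].imshow(sketches[i])
    axes[i, 1].set_title("Sketch")
    for ax in axes[i]:
        ax.axis("off")
plt.tight_layout()
plt.show()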

5.2.4 FUNCTIONAL TESTING

INPUT

Figure 5.6: Training And Testing Split

Figure 5.7: Model Is Being Compiled And Trained

TEST RESULT

• All the images from the dataset are loaded into the model and used for training.

• Training is done by considering each image and saving the characteristics of the image.

CHAPTER 6
RESULTS AND DISCUSSIONS

6.1 EFFICIENCY OF THE PROPOSED SYSTEM

The current proposed system uses an encoder-decoder architecture to learn a mapping between images
and sketches. The autoencoder's simple but efficient architecture results in faster training times
and lower computational requirements. It also does not require labeled data, as it trains in an
unsupervised manner. The current model gives an accuracy of about 40 percent. With the
autoencoder's generative capabilities, this architecture is well suited to image generation tasks.
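The reported figure can be checked in a hedged way with Keras' evaluate call on the held-out split from the earlier sketches; the exact number will depend on the metric definition and on the split.

# Evaluate the reconstruction loss and the 'acc' metric on the 20 percent test split.
loss, acc = model.evaluate(test_x, test_y, verbose=0)
print(f"Test loss: {loss:.4f}, test accuracy: {acc:.2%}")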

6.2 COMPARISON OF EXISTING AND PROPOSED SYSTEM

The proposed system can achieve good performance with relatively smaller datasets compared to more
data-hungry models, reducing data requirements and training time. It is also well-suited for tasks such
as image denoising, image compression, anomaly detection, and image-to-image translation, providing
versatility in application areas. The proposed system can also generalize well to unseen data and is
capable of adapting to different domains with minor adjustments to training strategies.

CHAPTER 7

CONCLUSION AND FUTURE ENHANCEMENTS

7.1 CONCLUSION

The proposed approach in this study involves an end-to-end fully convolutional network aimed at
directly modeling the intricate nonlinear mapping between face photos and sketches. Experimental
findings underscore the efficacy of the fully convolutional network in adeptly addressing this
challenging task, facilitating pixel-wise predictions with both effectiveness and efficiency.

7.2 FUTURE ENHANCEMENTS

Future enhancements will focus on refining the existing loss function and conducting experiments
across various databases. Additionally, investigations into the correlation between our approach and
non-photorealistic rendering methodologies will be pursued.

7.3 RESULTS

Figure 7.1: Prediction And Sketch Matched.

Figure 7.2: Model Prediction And Sketch Matched

CHAPTER 8

SOURCE CODE & POSTER PRESENTATION

8.1 SAMPLE CODE

REFERENCES

[1] Eman S. Sabry, Salah S. Elagooz, Fathi E. Abd El-Samie, Walid El-Shafai, Nirmeen A. El-Bahnasawy,
Ghada M. El-Banby, Abeer D. Algarni, Naglaa F. Soliman and Rabie A. Ramadan, Image
Retrieval Using Convolutional Autoencoder, InfoGAN, and Vision Transformer Unsupervised
Models, IEEE Access, 2023.

[2] Hayato Arai, Yuto Onga, Kumpei Ikuta, Yusuke Chayama, Hitoshi Iyatomi and Kenichi Oishi, Disease-
Oriented Image Embedding With Pseudo-Scanner Standardization for Content-Based Image
Retrieval on 3D Brain MRI, IEEE Access, 2021.

[3] Luqing Luo, Zhi-Xin Yang, Lulu Tang and Kun Zhang, An ELM-Embedded Deep Learning Based
Intelligent Recognition System for Computer Numeric Control Machine Tools, IEEE Access, 2020.

[4] Jahanzaib Latif, Chuangbai Xiao, Shanshan Tu, Sadaqat Ur Rehman, Azhar Imran and Anas Bilal,
Implementation and Use of Disease Diagnosis Systems for Electronic Medical Records Based on
Machine Learning: A Complete Review, IEEE Access, 2020.

[6] Mohamad M. Al Rahhal, Yakoub Bazi, Norah A. Alsharif, Laila Bashmal, Naif Alajlan and Farid
Melgani, Multilanguage Transformer for Improved Text to Remote Sensing Image Retrieval, IEEE
Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2022.

[7] Mohamad M. Al Rahhal, Yakoub Bazi, Norah A. Alsharif, Laila Bashmal, Naif Alajlan and Farid
Melgani, Multilanguage Transformer for Improved Text to Remote Sensing Image Retrieval, IEEE
Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2022.

[8] Tengfei Wu, Lu Leng, Muhammad Khurram Khan and Farrukh Aslam Khan, Palmprint-Palmvein
Fusion Recognition Based on Deep Hashing Network, IEEE Access, 2021.

[9] Zhenghang Yuan, Lichao Mou, Qi Wang and Xiao Xiang Zhu, From Easy to Hard: Learning Language-
Guided Curriculum for Visual Question Answering on Remote Sensing Data, IEEE Transactions
on Geoscience and Remote Sensing, 2022.

[10] Arka Ujjal Dey, Ernest Valveny and Gaurav Harit, EKTVQA: Generalized Use of External
Knowledge to Empower Scene Text in Text-VQA, IEEE Access, 2022.

[11] Naushad Varish, Arup Kumar Pal, Rosilah Hassan, Mohammad Kamrul Hasan, Asif Khan, Nikhat
Parveen, Debrup Banerjee, Vidyullatha Pellakuri, Amin Ul Haq and Imran Memon, Image
Retrieval Scheme Using Quantized Bins of Color Image Components and Adaptive Tetrolet
Transform, IEEE Access, 2020.

[12] Ali Ahmed and Sharaf J. Malebary, Query Expansion Based on Top-Ranked Images for Content-
Based Medical Image Retrieval, IEEE Access, 2020.

[13] Khawaja Tehseen Ahmed, Humaira Afzal, Muhammad Rafiq Mufti, Arif Mehmood and Gyu Sang
Choi, Deep Image Sensing and Retrieval Using Suppression, Scale Spacing and Division,
Interpolation and Spatial Color Coordinates With Bag of Words for Large and Complex Datasets,
IEEE Access, 2020.

[14] N. F. Soliman, M. Khalil, A. D. Algarni, S. Ismail, R. Marzouk and W. El-Shafai, Efficient


HEVC steganography approach based on audio compression and encryption in QFFT domain for
secure multimedia communication, Multimedia Tools Appl., vol. 80, no. 3, pp. 4789-4823, 2020.

[15] W. El-Shafai, Joint adaptive pre-processing resilience and post-processing concealment


schemes for 3D video transmission, 3D Res., vol. 6, no. 1, pp. 1-13, Mar. 2015.

[16] K. M. Abdelwahab, S. M. A. El-Atty, W. El-Shafai, S. El-Rabaie and F. E. A. El-Samie,


Efficient SVD-based audio watermarking technique in FRT domain, Multimedia Tools Appl., vol.
79, no. 9, pp. 5617-5648, Mar. 2020.

[17] A. D. Algarni, G. El Banby, S. Ismail, W. El-Shafai, F. E. A. El-Samie and N. F. Soliman,


Discrete transforms and matrix rotation based cancelable face and fingerprint recognition for
biometric security applications, Entropy, vol. 22, no. 12, pp. 1361, Nov. 2020.

[18] N. A. El-Hag, A. Sedik, W. El-Shafai, H. M. El-Hoseny, A. A. Khalaf, A. S. El-Fishawy, et


al., Classification of retinal images based on convolutional neural network, Microsc. Res.
Technique, vol. 84, no. 3, pp. 394-414, 2021. M. Woelfle, P. Olliaro and M. H. Todd, Open
science is a research accelerator, Nature Chem., vol. 3, no. 10, pp. 745-748, Oct. 2011.

[19] A. Kumar, J. Kim, W. Cai, M. Fulham and D. Feng, Content-based medical image retrieval:
A survey of applications to multidimensional and multimodality data, J. Digit. Imag., vol. 26, no.
6, pp. 1025-1039, 2013.

[20] L. R. Nair, K. Subramaniam and G. K. D. Prasannavenkatesan, A review on multiple


approaches to medical image retrieval system, Intell. Comput. Eng., vol. 1125, pp. 501-509, Apr.
2020.

[21] S. F. Salih and A. A. Abdulla, An improved content based image retrieval technique by
exploiting bi-layer concept, UHD J. Sci. Technol., vol. 5, no. 1, pp. 1-12, Jan. 2021.

[22] Z. Tu and X. Bai, Auto-context and its application to high-level vision tasks and 3D brain
image segmentation, IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 10, pp. 1744-1757, Oct.
2010.

[23] A. Kusiak, Intelligent Manufacturing Systems, Upper Saddle River, NJ, USA:Prentice-Hall,
pp. 448, 1990.

[24] Z. Yang, P. Zhang and L. Chen, RFID-enabled indoor positioning method for a real-time
manufacturing execution system using OS-ELM, Neurocomputing, vol. 174, pp. 121-133, Jan.
2016.

[25] W. Xu, Z. Yang and X. Wang, A technical and business perspective on wireless sensor
network for manufacturing execution system, Math. Problems Eng., vol. 2015, pp. 1-15, 2015.

[26] M. Kuroki, T. Yoneoka, T. Satou, Y. Takagi, T. Kitamura and N. Kayamori, Bar-code


recognition system using image processing, Proc. IEEE 6th Int. Conf. Emerg. Technol. Factory
Autom. (EFTA), Nov. 2002.

[27] R. Adelmann, M. Langheinrich and C. Floerkemeier, Toolkit for bar code recognition and
resolving on camera phones—Jump starting the Internet of Things, Proc. INFORMATIK,
Informatik für Menschen-Band, vol. 2, pp. 366-373, 2006.

[28] M. Stewart, Patient-Centered Medicine: Transforming the Clinical Method, Oxford,


U.K.:Radcliffe Publishing, 2003.

[29] J. Stausberg, D. Koch, J. Ingenerf and M. Betzler, Comparing paper-based with electronic
patient records: Lessons learned during a study on diagnosis and procedure codes, J. Amer. Med.
Inform. Assoc., vol. 10, pp. 470-477, Sep. 2003.

[30] C. S. Kruse, R. Goswamy, Y. Raval and S. Marawi, Challenges and opportunities of big data
in health care: A systematic review, JMIR Med. Informat., vol. 4, no. 4, pp. e38, Nov. 2016.

[31] J. J. Firthous and M. M. Sathik, Survey on using electronic medical records (EMR) to
identify the health conditions of the patients, J. Eng. Sci., vol. 11, no. 5, 2020.

[32] G. Makoul, R. H. Curry and P. C. Tang, The use of electronic medical records:
Communication patterns in outpatient encounters, J. Amer. Med. Inform. Assoc., vol. 8, no. 6, pp.
610-615, Nov. 2001.

[33] M. Sudmanns et al., Big earth data: Disruptive changes in earth observation data
management and analysis?,Int. J. Digit. Earth, vol. 13, no. 7, pp. 832-850, Jul. 2020.

[34] Y. Li, J. Ma and Y. Zhang, Image retrieval from remote sensing big data: A survey, Inf.
Fusion, vol. 67, pp. 94-115, Mar. 2021.

[35] Y. Yang and S. Newsam, Geographic image retrieval using local invariant features, IEEE
Trans. Geosci. Remote Sens., vol. 51, no. 2, pp. 818-832, Feb. 2013.

[36] E. Aptoula, Remote sensing image retrieval with global morphological texture descriptors,
IEEE Trans. Geosci. Remote Sens., vol. 52, no. 5, pp. 3023-3034, May 2014.

[37] X.-Y. Tong, G.-S. Xia, F. Hu, Y. Zhong, M. Datcu and L. Zhang, Exploiting deep features
for remote sensing image retrieval: A systematic investigation, IEEE Trans. Big Data, vol. 6, no. 3,
pp. 507-521, Sep. 2020.

[38] A.-S. Ungureanu, S. Salahuddin and P. Corcoran, Toward unconstrained palmprint


recognition on consumer devices: A literature review, IEEE Access, vol. 8, pp. 86130-86148,
2020.

[39] X. Zhou, K. Zhou and L. Shen, Rotation and translation invariant palmprint recognition with
biologically inspired transform, IEEE Access, vol. 8, pp. 80097-80119, 2020.

[40] Y. Hao, Z. Sun, T. Tan and R. Chao, Multispectral palm image fusion for accurate contact-
free palmprint recognition, Proc. 15th IEEE Int. Conf. Image Process., pp. 281-284, Oct. 2008.

[41] A. Iula and M. Micucci, A feasible 3D ultrasound palmprint recognition system for secure
access control applications, IEEE Access, vol. 9, pp. 39746-39756, 2021.

[42] S. Sun, X. Cong, P. Zhang, B. Sun and X. Guo, Palm vein recognition based on NPE and
KELM, IEEE Access, vol.9, pp. 71778-71783, 2021.

[43] X. X. Zhu et al., Deep learning in remote sensing: A comprehensive review and list of
resources, IEEE Geosci. Remote Sens. Mag., vol. 5, no. 4, pp. 8-36, Dec. 2017.

[44] S. Talukdar et al., Land-use land-cover classification by machine learning classifiers for
satellite observations—A review, Remote Sens., vol. 12, no. 7, pp. 1135, Apr. 2020.

[45] M. Castelluccio, G. Poggi, C. Sansone and L. Verdoliva, Land use classification in remote
sensing images by convolutional neural networks, arXiv:1508.00092, 2015.

[46] G. Cheng and J. Han, A survey on object detection in optical remote sensing images, ISPRS
J. Photogramm.Remote Sens., vol. 117, pp. 11-28, Jul. 2016.

[47] K. Li, G. Wan, G. Cheng, L. Meng and J. Han, Object detection in optical remote sensing
images: A survey and a new benchmark, ISPRS J. Photogramm. Remote Sens., vol. 159, pp. 296-
307, Jan. 2020.

[48] A. U. Dey, S. K. Ghosh, E. Valveny and G. Harit, Beyond visual semantics: Exploring the
role of scene text in image understanding, Pattern Recognit. Lett., vol. 149, pp. 164-171, Sep.
2021.

[49] S. Karaoglu, R. Tao, T. Gevers and A. W. M. Smeulders, Words matter: Scene text for
image classification and retrieval, IEEE Trans. Multimedia, vol. 19, no. 5, pp. 1063-1076, May
2017.

[50] Z. Hussain, M. Zhang, X. Zhang, K. Ye, C. Thomas, Z. Agha, et al., Automatic


understanding of image and video advertisements, Proc. IEEE Conf. Comput. Vis. Pattern
Recognit. (CVPR), pp. 1100-1110, Jul. 2017.

[51] S. Antol, A. Agrawal, J. Lu, M. Mitchell, D. Batra, C. L. Zitnick, et al., VQA: Visual
question answering, Proc. IEEE Int. Conf. Comput. Vis. (ICCV), pp. 2425-2433, Dec. 2015.

[52] S. Bai and S. An, A survey on automatic image caption generation, Neurocomputing, vol.
311, pp. 291-304, Oct.2018.

[53] K. Seetharaman and M. Kamarasan, Statistical framework for image retrieval based on
multiresolution features and similarity method, Multimedia Tools Appl., vol. 73, pp. 1943-1962,
Dec. 2014.

[54] W. Zhang, X. Wang and X. Tang, Coupled information-theoretic encoding for face photo-sketch
recognition, Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), pp. 513-520, 2011.

[55] Arai, Hayato, et al. "Disease-oriented image embedding with pseudo-scanner standardization for
content-based image retrieval on 3D brain MRI." IEEE Access 9 (2021): 165326-165340.

[56] Sain, Aneeshan. Exploring Sketch Traits for Democratising Sketch Based Image Retrieval. Diss.
University of Surrey, 2023.

[57] Khokhlova, Margarita, et al. "Cross-year multi-modal image retrieval using siamese networks." 2020
IEEE International Conference on Image Processing (ICIP). IEEE, 2020.

SRM INSTITUTE OF SCIENCE AND TECHNOLOGY
(Deemed to be University u / s 3 of UGC Act, 1956)

Office of Controller of Examinations


REPORT FOR PLAGIARISM CHECK ON THE DISSERTATION / PROJECT REPORT FOR UG / PG
PROGRAMMES
(To be attached in the dissertation / project report)

1 Name of the Candidate (IN BLOCK LETTERS) : JACK ANDRE J, ABISHEK RAJ M, CHARUDEVE KS

2 Address of Candidate : Bharathi Salai, Chennai - 600089
  Mobile Number : +91 98401 64161, +91 91508 90969, +91 97911 50465

3 Registration Number : RA2011026020147, RA2011026020131, RA2011026020139

4 Date of Birth 17/06/2002, 25/03/2002, 06/08/2002

5 Department Computer Science and Engineering with


specialization in Artificial Intelligence and Machine
Learning

6 Faculty Engineering and Technology

7 Title of the Dissertation / Project : TITLE

Individual or group: Group


(Strike whichever is not applicable)

8 Whether the above project / dissertation is done by an individual or a group:
  a) If the project / dissertation is done in a group, then how many students together completed the project : 03
  b) Name and Register number of the other candidates :
     JACK ANDRE J [RA2011026020147], ABISHEK RAJ M [RA2011026020131], CHARUDEVE KS [RA2011026020139]

9 Name and address of the Supervisor / Guide : ADDRESS OF GUIDE
  Mail ID :
  Mobile Number :

10 Name and address of the Co-Supervisor / Guide : NA
   Mail ID : NA
   Mobile Number : NA

11 Software Used Turnitin

12 Date of Verification DD/MM/YYYY

13 Plagiarism Details: (to attach the final report from the software)

Chapter | Title of the Report | Percentage of similarity index (including self citation) | Percentage of similarity index (excluding self citation) | % of plagiarism after excluding Quotes, Bibliography, etc.
1 | TITLE | NA | NA | 10%
Appendices | | NA | NA | NA

We declare that the above information has been verified and found true to the best of our knowledge.

Name and Signature of the Staff (who uses the plagiarism check software)

Signature of the Candidate

Name and Signature of the Supervisor / Name and Signature of the Co-Supervisor / Co-Guide
Guide

Dr. XXX
Name and Signature of the HOD
