PROJECT REPORT FORMAT 2025
PHOTO-SKETCH SYNTHESIS USING AUTOENCODER BASED GENERATIVE MODELING
A PROJECT REPORT
Submitted by
Mrs. Angeline R
Assistant Professor (Selection Grade),
Department of Computer Science and Engineering
BACHELOR OF TECHNOLOGY
in
with specialization in
of
May 2025
S R M INSTITUTE OF SCIENCE AND TECHNOLOGY
(Deemed to be University U/S 3 of UGC Act, 1956)
BONAFIDE CERTIFICATE
SIGNATURE SIGNATURE
S R M INSTITUTE OF SCIENCE AND TECHNOLOGY,
CHENNAI - 89
DECLARATION
We hereby declare that the entire work contained in this project report
titled “PHOTO-SKETCH SYNTHESIS USING AUTOENCODER
BASED GENERATIVE MODELING” has been carried out by JACK
ANDRE J [REG NO: RA2011026020147], ABISHEK RAJ M [REG
NO: RA2011026020131], CHARUDEVE KS [REG NO:
RA2011026020139] at SRM Institute of Science and Technology,
Ramapuram, Chennai- 600089, under the guidance of Ms. Angeline R,
M.Tech., (Ph.D)., Assistant Professor (Selection Grade), Department of
Computer Science and Engineering.
Place: Chennai
Date:
JACK ANDRE J
ABISHEK RAJ M
CHARUDEVE KS
Own Work Declaration
DECLARATION:
We are aware of and understand the University’s policy on Academic misconduct and
plagiarism and we certify that this assessment is our own work, except where indicated by
referencing, and that we have followed the good academic practices noted above.
ACKNOWLEDGEMENT
We place on record our deep sense of gratitude to our esteemed Chairman
Dr. R. SHIVAKUMAR, MBBS., MD., for providing us with the requisite
infrastructure throughout the course.
We take the opportunity to extend our hearty and sincere thanks to our
Dean, Dr. M. SAKTHI GANESH., Ph.D., for guiding us towards
accomplishing the project.
We take the privilege to extend our hearty and sincere gratitude to the
Professor and Chairperson, Dr. K. RAJA, Ph.D., for his suggestions, support and
encouragement towards the completion of the project with perfection.
We thank our honorable Head of the department Dr.XXX. Designation,
Department for his/her constant motivation and unwavering support.
We express our hearty and sincere thanks to our guide
Dr.XXX. Designation, Department for his/her encouragement, constructive
criticism and constant guidance throughout this project work.
Our thanks to the teaching and non-teaching staff of the Department of
Computer Science and Engineering of S R M Institute of Science and Technology,
Chennai, for providing the necessary resources for our project.
ABSTRACT
The transformation of images into sketches is a critical area in computer vision with wide-ranging
applications, from artistic rendering to enhancing visual content. This project introduces a photo-sketch
synthesis system built on an autoencoder-based generative model. Unlike conventional methods that
modify input images to create sketch-like appearances, our system employs autoencoders to generate
sketches from scratch. When the system is provided with a photo, the encoder extracts crucial features
and representations, capturing the image's underlying structure and content. These extracted features are
then used by the decoder to reconstruct the subject as a sketch composed of contours, lines, and textures.
This photo-sketch synthesis system presents significant benefits for law enforcement, where it can
complement face recognition technology, assist in cold case investigations, verify witness testimonies
and support undercover operations.
TABLE OF CONTENTS
Page No.
ABSTRACT vi
LIST OF FIGURES x
LIST OF TABLES xi
1 INTRODUCTION 1
1.1 Problem Statement 12
1.2 Aim of the Project 13
1.3 Project Domain 13
1.4 Scope of the Project 13
1.5 Methodology 14
1.6 Organization of the Report 14
2 LITERATURE REVIEW 16
3 PROJECT DESCRIPTION 19
3.1 Existing System 19
3.2 Proposed System 19
3.2.1 Advantages 19
3.3 Feasibility Study 20
3.3.1 Economic Feasibility 20
3.3.2 Technical Feasibility 20
3.3.3 Social Feasibility 20
3.4 System Specification 21
3.4.1 Hardware Specification 21
3.4.2 Software Specification 21
3.4.3 Standards and Policies 21
4 PROPOSED WORK 22
4.1 General Architecture 22
4.2 Design Phase 22
4.2.1 Data Flow Diagram 23
4.2.2 UML Diagram 24
4.2.3 Use Case Diagram 25
4.2.4 Sequence Diagram 25
4.3 Module Description 26
4.3.1 Module 1: Image Processing 26
4.3.2 Module 2: Feature Extraction 26
4.3.3 Module 3: Image to Sketch Synthesis 27
4.3.4 Step 2: Processing of Data 27
4.3.5 Step 3: Split the Data 28
4.3.6 Dataset Sample 28
4.3.7 Step 4: Building the Model 29
4.3.8 Step 5: Compiling and Training the Model 30
5.2 Testing 32
5.2.1 Types of Testing 32
5.2.2 Unit testing 32
5.2.3 Integration testing 34
5.2.4 Functional testing 35
5.2.5 Test Result 35
References
A. Sample screenshots
LIST OF FIGURES
8 Sample Code 40
LIST OF TABLES
LIST OF ACRONYMS AND ABBREVIATIONS
CHAPTER 1
INTRODUCTION
Face sketch synthesis is a crucial aspect of face style transformation, essential for generating
sketches from input photos. Its utility spans across law enforcement and digital entertainment, where it
aids in enhancing face recognition accuracy by minimizing texture disparities. Exemplar-based
methods, which rely on photo-sketch pairs for training, are prominent in this field. Face sketch
recognition tackles modality disparities through synthesis, projection, or optimization-based
approaches. Additionally, sketch-based image synthesis enables the creation of realistic images from
sketches, catering to individuals without artistic expertise. The formulation of face sketches from
reference photos facilitates tasks in law enforcement and entertainment sectors. Exemplar-based
techniques involve segmenting photos and sketches into patches to streamline the synthesis process.
Overall, face sketch synthesis plays a vital role in various applications, offering efficient solutions for
generating sketches and enhancing recognition accuracy.
Edges, boundaries, and contours are pivotal in both computer graphics and computer vision,
conveying 3D shapes and indicating occlusion events. Work on contour drawing generation aims to
capture scene outlines, a task distinct from boundary detection. Content-Based Image Retrieval (CBIR)
offers valuable solutions across various domains, including medical and meteorological applications.
Facial Sketched-Real Image Retrieval (FSRIR) poses a significant challenge due to the complexity of
human faces and the disparity between the sketch and photo domains. Prior work in this area contributes
by extending the CUFS dataset, presenting three retrieval systems, and evaluating them on diverse
datasets. The proposed systems outperform recent algorithms, with InfoGAN and Vision Transformer
(ViT) models excelling in distinguishing freehand sketches and objects; the ViT system reports an
F-score of 1.183 on the ESRIR dataset. Future research could explore alternative distance metrics and
feature extraction algorithms.
1.2 AIM OF THE PROJECT
● Enhance model generalization to diverse groups by improving its adaptability to new contexts and
data distributions.
● Update contextual associations between newly derived descriptions and features to ensure
relevance and accuracy.
● Implement a more efficient linear-based attention scheme to enable effective long sequence
interactions on large inputs, enhancing model scalability and performance.
● Improve fine-grained feature extraction capabilities to produce high-quality results with enhanced
detail and accuracy.
● Optimize latent vector optimization processes to minimize errors and improve overall model
performance and image fidelity.
1.3 PROJECT DOMAIN
The domain of the project is CNN (Convolutional Neural Network) and DL (Deep Learning),
particularly emphasizing their applications in computer vision and image processing.
1.4 SCOPE OF THE PROJECT
The scope of this project encompasses the development and implementation of a Collaborative
Generative Representation Learning Neural Network Algorithm for image processing tasks. This
includes modules for image preprocessing, feature extraction, and sketch to image prediction,
leveraging techniques such as histogram equalization and neural network-based feature extraction. The
project aims to address challenges in handling non-linear relationships, maintaining continuity in neural
network operation, and learning from similar events.
1.5 METHODOLOGY
In this project, the proposed Collaborative Generative Representation Learning Neural Network
Algorithm is employed to bridge the domain gap between sketches and images, facilitating robust
mapping between the two modalities. By incorporating skip layer connections, the neural network
enhances its ability to process both iconic aspects and advanced features of images, thereby improving
performance in identifying small and large images alike. Additionally, noise is strategically added to
deeper hidden layers to enhance the robustness of the network. The project encompasses three main
modules: Image Preprocessing, Feature Extraction, and Sketch to Image Prediction. Each module plays
a critical role in preparing, analyzing, and predicting images, leveraging techniques such as histogram
equalization for contrast enhancement and convolutional layers for feature extraction. Through this
comprehensive methodology, the project aims to advance the state-of-the-art in learning the mapping
between sketches and images.
1.6 ORGANIZATION OF THE REPORT
Chapter 4 outlines the proposed work, starting with the general architecture depicted in Figure 4.1,
inspired by the Collaborative Generative Representation Learning Neural Network (CGRL-NN)
tailored for facial recognition. The design phase includes a data flow diagram (Figure 4.2), illustrating
the image processing pipeline from raw data input to feature extraction and neural network analysis. A
UML diagram (Figure 4.3) further details the processing stages. The proposed system comprises three
modules: Image Preprocessing, Feature Extraction, and Image to Sketch Synthesis, described in detail.
Additionally, steps for data processing, model building, and training are delineated, supported by
sample datasets and model summaries. The feasibility study assesses economic, technical, and social
feasibility, ensuring practicality and acceptance of the proposed system.
Chapter 5 focuses on the implementation and testing of the proposed system. Input and output
evaluation involve feeding a random student image into the Neural Network to generate a synthesized
sketch, as depicted in Figures 5.1 and 5.2. Testing encompasses various types, including Unit Testing,
Integration Testing, and Functional Testing. Unit testing verifies source code units for efficiency and
correctness, illustrated in Figure 5.3 for image augmentation. Integration testing assesses system
efficiency with functional requirements, while functional testing verifies output against provided inputs.
A testing strategy incorporating unit, integration, and functional testing ensures comprehensive
evaluation of the system's performance and compliance with requirements.
Chapter 6 provides the results and discussions, highlighting the efficiency of the proposed system
and comparing it to existing methods. It discusses the use of an encoder-decoder architecture, achieving
an accuracy of approximately 40 percent. The proposed system demonstrates suitability for various
tasks and outperforms more data-intensive models with smaller datasets. Results are visually
represented in Figures 6.1 and 6.2.
Chapter 7 provides the conclusion and outlines future enhancements. The proposed approach
involves an end-to-end fully convolutional network for modeling the mapping between face photos and
sketches. Experimental results demonstrate the efficacy of this approach. Future enhancements include
refining the loss function, experimenting with different databases, and exploring correlations with non-
photorealistic rendering methodologies.
Chapter 8 contains the source code and details about the poster presentation. It includes sample code
snippets for reference.
CHAPTER 2
LITERATURE REVIEW
This chapter provides a thorough examination of existing research and academic publications
pertinent to the subject of the project. Through this overview, important discoveries in the field,
technical developments, and the evolution of approaches are described. Critical evaluations and
comparisons of several methods and models also offer a useful basis for understanding the current
state of the art and for identifying gaps and opportunities for further investigation and innovation. This
chapter is essential for situating the project in a broader context by applying knowledge and lessons from
earlier research.
Mohamad M et al. [6] examine the realm of cross-modal text-image retrieval within remote sensing
(RS), highlighting its potential as a versatile approach for extracting valuable insights from RS
databases. They note that current methodologies are primarily designed to accommodate queries in
English, potentially limiting accessibility for non-English speakers. To address this limitation, the study
advocates for the integration of multilingual queries, which can enhance interaction with the retrieval
system and broaden the accessibility of RS data. In response, they propose a multi-language framework
employing transformers. Anticipated benefits include improved reliability and resilience in data
retrieval, as well as enhanced quality and consistency of extracted data. Moreover, the proposed
framework offers easy scalability to accommodate increasing demands. However, the study
acknowledges challenges in optimizing system performance, citing difficulties in achieving better
performance metrics and describing the process of message updating as tedious. Such complexities may
inadvertently lead to errors in the resulting data. These insights, derived from research conducted in
2022, underscore both the advancements and challenges within the domain of cross-modal text-image
retrieval in remote sensing.
Zhenghang Yuan et al. [9] explore the application of intelligent human-computer interaction systems
in leveraging visual question answering (VQA) for remote sensing scenes. Despite the considerable
attention VQA has garnered within computer vision, its counterpart tailored for remote sensing data
(RSVQA) remains in its nascent stages of development. The study underscores the potential for RSVQA
to simplify the implementation process, thereby enhancing accessibility to insights from remote sensing
imagery. Notably, the benefits of implementing RSVQA are evident in its ability to streamline tasks
affected by visual data. However, challenges persist, including the need for built-in error handling
mechanisms and concerns over the time-consuming nature of the approach. Additionally, the study
identifies limitations in existing solutions, citing their ineffectiveness in addressing pertinent issues.
These observations, drawn from research conducted in 2022, shed light on the evolving landscape of
intelligent human-computer interaction systems and the opportunities they present for advancing remote
sensing applications.
Arka Ujjal Dey et al. [10] delve into the open-ended question answering task of Text-VQA, which
often necessitates deciphering scene text content from images, even when such text is obscured or
seldom observed. To address the inherent challenges of this zero-shot task, the study proposes
leveraging external knowledge in a generalized manner to enhance understanding of scene text. Their
system is tailored to extract, validate, and reason over scene text for vision-language comprehension
tasks, employing a conventional multimodal transformer. Noteworthy attributes of the
proposed system include its quick and efficient usability, capable of minimizing human intervention
requirements. Additionally, it offers a relatively straightforward and computationally economical
approach. However, the implementation of such systems may incur significant capital and operating
expenditures, particularly due to increased payloads. Moreover, despite its capabilities, the system may
struggle to achieve noise-resistant detection. These insights, derived from research conducted in 2022,
underscore both the advancements and limitations in leveraging external knowledge for enhancing text
comprehension within visual question answering tasks.
Tengfei Wu et al. [8] examine the rising popularity of fingerprints in the biometrics industry,
attributing it to the myriad benefits they offer. Within this context, the research community has
increasingly turned to deep learning-based techniques for palmprint recognition, capitalizing on deep
learning's remarkable performance in computer vision tasks. Notably, deep hashing networks (DHNs)
have emerged as a promising approach, offering the ability to compress storage requirements and
expedite matching and retrieval processes by encoding outputs as binary bit strings. The study highlights
the notable effectiveness of DHNs in producing results, alongside their low deployment cost and
potential to enhance worst-case performance scenarios. However, it also underscores challenges faced
by DHNs, particularly in adverse conditions where they may perform poorly. These challenges include
feature loss and inaccurate feature extraction, which can introduce distortions affecting the readability
and measurability of attribute values. These insights, derived from research conducted in 2022, shed
light on the evolving landscape of deep learning-based techniques in palmprint recognition and their
implications for biometric security applications.
Arai, Hayato, et al. [55] introduce a novel framework termed Disease-oriented Image Embedding
with Pseudo-Scanner Standardization (DI-PSS), aimed at developing a reliable Content-Based Image
Retrieval (CBIR) system tailored to clinical brain MRI databases. This framework comprises two
primary methods: harmonization of data to mitigate variations stemming from diverse scanning settings,
and a technique for generating low-dimensional embeddings conducive to illness categorization. Despite
the significance of clinical brain MRI databases, research on CBIR in this domain has been relatively
scarce. The study highlights the high robustness and imperceptibility of DI-PSS, emphasizing its
efficacy in clinical settings. Additionally, it notes the framework's efficiency and its capacity to reduce
hardware resource consumption. However, challenges exist, such as the inability to implement DI-PSS
in real-time scenarios and the risk of inaccurate estimations of missing pixels, which can increase the
complexity of the problem. These insights, drawn from research conducted in 2021, underscore the
potential of DI-PSS in advancing CBIR systems for clinical brain MRI databases, while also recognizing
the challenges that need to be addressed for its effective implementation.
Luqing Luo et al. [3] present a novel hybrid framework integrating a multi-channel deep learning
network with a non-iterative and fast feedforward neural network to address the stringent efficiency and
accuracy requirements in intelligent manufacturing. This framework serves as an intelligent tool
recognition system, aiming to achieve a balance between accurate feature extraction and swift
identification. The approach combines the random parameter assignment process of Extreme Learning
Machines (ELMs) with the fine-tuning capabilities of Convolutional Neural Networks (CNNs), thus
offering increased flexibility in model architectures. By leveraging this hybrid framework, the research
anticipates enhanced efficiency and speed in intelligent manufacturing processes. However, it
acknowledges the need for additional configuration and raises concerns regarding the system's ability to
meet current network business demands. Moreover, the system is described as opportunistic and
uncontrollable, implying potential challenges in its implementation. These insights, derived from
research conducted in 2020, underscore the innovative strides in integrating deep learning techniques
into intelligent manufacturing systems while also recognizing the associated limitations and areas for
improvement.
Sain [56] presents a thesis that aims to improve Sketch-Based Image Retrieval (SBIR) for real-world
use. The research utilizes methods such as Cross-Modal Co-Attention and a Meta-Learning-based
Variational Auto-Encoder to enhance system performance and practicality. The study
leverages paired data of free-hand sketches and photos to improve accuracy in SBIR. The research
demonstrates improved accuracy in SBIR, making it a significant contribution to the field. The study
was conducted in 2023.
Khokhlova, Margarita, et al. [57] present a paper that introduces a Multi-Modal Network for aerial
image retrieval. The research leverages original images and segmented regions with a Siamese network
for feature extraction and a kNN classifier for geo-matching. The approach outperforms state-of-the-art
descriptors like GEM and ResNet50. The study utilizes multi-modal data with labeled information and
extracts descriptors using a Siamese network. The research was conducted in 2020.
CHAPTER 3
PROJECT DESCRIPTION
3.2 PROPOSED SYSTEM
In our proposed system, we address the challenge of learning the mapping between sketches and
images by employing Collaborative Generative Representation Learning. This technique enables the
creation of accurate sketches corresponding to images while promoting robustness against small
perturbations. By focusing on the iconic aspects of objects during image scanning, we aim to bridge
the inherent domain gap between sketches and photos. Utilizing skip layer connections in the neural
network structure allows for more comprehensive information transmission, particularly in
recognizing lower image features. Conversely, non-skip layer connections facilitate the analysis of
advanced features but may lead to overreliance on original image features, hindering the recognition of
small images. Through our approach, we optimize the neural network's ability to identify both small
and large images effectively, mitigating the impact of noise and enhancing confidence in image
recognition tasks.
3.2.1 ADVANTAGES
● Boost performance on very high-dimensional datasets
● Lessen computing workload while enhancing detection capabilities
● Simplicity and explainability
● Reveal highly nonlinear relationships
● Guide label assignment effectively and boost label confidence
● Simplify implementation process
3.3 FEASIBILITY STUDY
A feasibility study is conducted to assess the viability of the project and analyze its strengths and
weaknesses. In this context, the feasibility study is conducted across three dimensions:
• Economic Feasibility
• Technical Feasibility
• Social Feasibility
3.3.1 ECONOMIC FEASIBILITY
The proposed system does not require expensive equipment, making it economically feasible.
Development can be carried out using readily available software, eliminating the need for additional
investment.
3.3.2 TECHNICAL FEASIBILITY
The proposed system is based entirely on a machine learning model. Using tools such as the
Anaconda prompt, Visual Studio, Kaggle datasets, and Jupyter Notebook, all of which are freely
available, ensures technical feasibility. The technical skills required to use these tools are practical and
accessible, further supporting the feasibility of the project.
3.4 SYSTEM SPECIFICATION
An effective system is crucial for any computational task. It is important to have the correct
hardware and software components to ensure everything runs smoothly. From strong processors to
essential software packages, each part helps create an efficient environment for data analysis and
machine learning tasks.
3.4.2 SOFTWARE SPECIFICATION
● Python
● Anaconda
● Jupyter Notebook
● TensorFlow
● Keras
● opencv-python
● pandas
● matplotlib
CHAPTER 4
PROPOSED WORK
4.2 DESIGN PHASE
During the design phase, diverse diagrams and models are crafted to depict various elements of the
system, such as its components, interactions, and data flow. These diagrams, including UML, sequence,
use case, and data flow diagrams, aid in conveying the system's design and functionality to stakeholders
and development teams. In essence, the design phase is pivotal for ensuring that the software solution
achieves its objectives in a proficient and effective manner.
4.2.1 DATA FLOW DIAGRAM
Figure 4.2 shows a data flow diagram of the image processing pipeline. Raw image data enters
first, followed by pre-processing to enhance quality. Feature extraction then condenses the data by
selecting key characteristics. Finally, a fully connected neural network layer analyzes these features for
image classification or prediction, outputting the processed data.
4.2.2 UML DIAGRAM
Figure 4.3 is a UML diagram depicting the image processing pipeline. It consists of three main
stages: image preprocessing, feature extraction, and sketch to image prediction. Preprocessing aims to
improve the image data by suppressing noise or enhancing features. Feature extraction reduces the data
by selecting the most relevant information. Finally, sketch to image prediction utilizes a neural network
to convert a sketch into a complete image.
4.2.3 USE CASE DIAGRAM
The diagram in Figure 4.4 illustrates the flow of the Collaborative Generative Representation
Learning technique for mapping sketches to images, involving Image Preprocessing, Feature
Extraction, and Sketch to Image Prediction.
4.2.4 SEQUENCE DIAGRAM
The sequence diagram illustrates the process of loading and preprocessing images from a dataset
directory. Initiated by the user, the code module iterates over image files with progress tracking by the
tqdm library. Using the OpenCV library, images are loaded, preprocessed, and returned to the code
module. Matplotlib then visualizes the preprocessed images. This iterative process continues until all
images are processed, concluding the sequence.
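The loop below is a minimal sketch of this loading-and-preprocessing sequence, written to match the libraries named above (tqdm, OpenCV, matplotlib); the directory path, target size and normalization are assumptions for illustration, not the report's exact code.

import os
import cv2
import matplotlib.pyplot as plt
from tqdm import tqdm

SIZE = 256                       # assumed target resolution
image_dir = "dataset/photos"     # placeholder dataset directory

def load_images(directory, size=SIZE):
    # Iterate over the image files with a tqdm progress bar, as in the sequence diagram.
    images = []
    for name in tqdm(sorted(os.listdir(directory))):
        img = cv2.imread(os.path.join(directory, name))
        if img is None:                                   # skip unreadable files
            continue
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)        # colour correction (BGR to RGB)
        img = cv2.resize(img, (size, size))               # resize to a fixed shape
        images.append(img.astype("float32") / 255.0)      # scale pixel values to [0, 1]
    return images

photos = load_images(image_dir)

# Visualise a few of the preprocessed images with matplotlib.
for i, img in enumerate(photos[:3]):
    plt.subplot(1, 3, i + 1)
    plt.imshow(img)
    plt.axis("off")
plt.show()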
4.3 MODULE DESCRIPTION
The following modules form the core components of our image processing and analysis pipeline,
each serving a distinct yet interconnected role in photo-sketch synthesis:
4.3.1 MODULE 1: IMAGE PROCESSING
Pre-processing involves applying various procedures to images at their most basic level, where
both input and output are represented as intensity images. These images closely resemble the original
data captured by sensors, typically appearing as matrices of brightness values. The main aim of pre-
processing is to refine image data, reducing unintended distortions while enhancing specific
attributes vital for subsequent processing.
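As an illustration of one such pre-processing step, the snippet below sketches the histogram-equalization contrast enhancement mentioned in the project scope and methodology, applied to the luminance channel with OpenCV; the file name is a placeholder, and this is an assumed example rather than the report's exact code.

import cv2

def enhance_contrast(image_bgr):
    # Equalize only the luminance channel so colours are preserved.
    ycrcb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2YCrCb)
    ycrcb[:, :, 0] = cv2.equalizeHist(ycrcb[:, :, 0])
    return cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2BGR)

photo = cv2.imread("sample_photo.jpg")   # placeholder file name
enhanced = enhance_contrast(photo)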
4.3.2 MODULE 2: FEATURE EXTRACTION
The neural network is regarded as an adept feature extractor, comprising two primary components
integral to its functioning. Firstly, the feature extractor incorporates convolutional and pooling layers,
tasked with autonomously discerning and assimilating key attributes from raw data. Subsequently, the
fully connected layer employs the acquired features to execute classification tasks. The input layer
serves to ingest individual data values, while the output layer yields results corresponding to the
number of distinct categories requiring classification. Within the convolutional layer, localized regions
of the data undergo scrutiny to extract pertinent features, while pooling layers serve to streamline
computational complexity by reducing parameter quantities.
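A minimal Keras sketch of the generic feature extractor described above is given below; the layer sizes, filter counts and number of output categories are assumptions made for illustration only.

from tensorflow import keras
from tensorflow.keras import layers

NUM_CLASSES = 10   # assumed number of categories

feature_classifier = keras.Sequential([
    layers.Input(shape=(256, 256, 3)),                 # input layer ingests raw pixel values
    layers.Conv2D(32, 3, activation="relu"),           # convolution scans localized regions
    layers.MaxPooling2D(),                             # pooling reduces parameter counts
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),              # fully connected layer uses learned features
    layers.Dense(NUM_CLASSES, activation="softmax"),   # one output per category to classify
])
feature_classifier.summary()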
4.3.4 STEP 2: PROCESSING OF DATA
• The CUHK (CUFS) dataset contains 188 images of students from the Chinese University of
Hong Kong along with the sketches of those images [54].
• The dataset is assigned to image_path and sketch_path and sorted in alphanumeric order.
• In Figure 4.7, using the OpenCV library, the images are color corrected, resized, rotated and converted
from images to arrays.
4.3.5 STEP 3: SPLIT THE DATA
• After the preprocessing step, both sketches and images are split into training and testing sets.
• The training set contains 80 percent of the database. This applies to both the images and
their sketch versions.
• The test set contains the remaining 20 percent of the database (an illustrative snippet of this split
follows below).
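The snippet below sketches the 80/20 split; scikit-learn's train_test_split is an assumption (the report does not name the splitting utility), and random arrays stand in for the preprocessed photos and sketches so the fragment runs on its own.

import numpy as np
from sklearn.model_selection import train_test_split

# Random arrays stand in for the 188 preprocessed photo and sketch arrays.
photos = np.random.rand(188, 256, 256, 3).astype("float32")
sketches = np.random.rand(188, 256, 256, 3).astype("float32")

# 80 percent of the photo-sketch pairs for training, 20 percent held out for testing.
train_photos, test_photos, train_sketches, test_sketches = train_test_split(
    photos, sketches, test_size=0.2, random_state=42)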
4.3.6 DATASET SAMPLE
4.3.7 STEP 4: BUILDING THE MODEL
Our model of choice is an encoder-decoder architecture-based model implemented using the
Keras functional API.
INPUT LAYER:
The input layer of the model is defined with the input shape (SIZE, SIZE, 3), denoting an
image with dimensions SIZE × SIZE and 3 channels (RGB).
ENCODER SECTION:
We proceed to the encoder section by implementing a downsample function that reduces the
spatial dimensions of the input tensor with an increase in depth. Each downsample operation uses
a 4x4 convolutional layer with a specified number of filters.
DECODER SECTION:
The decoder section mirrors the encoder but in reverse. This is done by using upsample
operations to increase spatial dimensions while decreasing depth. Each upsample operation is
paired with a 4x4 transposed convolutional layer (Conv2DTranspose), except for the last two
layers, which use a 2x2 transposed convolutional layer.
MODEL CREATION:
Finally, the encoder input and decoder output are used to create a Keras model object. Figure 4.10
below shows the model summary.
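The following is a hedged sketch of such an encoder-decoder built with the Keras functional API; the number of blocks, filter counts and activations are assumptions, since only the 4x4 and 2x2 kernel pattern is stated in the text.

from tensorflow import keras
from tensorflow.keras import layers

SIZE = 256  # assumed input resolution

def downsample(x, filters):
    # Halve the spatial dimensions with a strided 4x4 convolution while increasing depth.
    x = layers.Conv2D(filters, 4, strides=2, padding="same")(x)
    return layers.LeakyReLU()(x)

def upsample(x, filters, kernel=4):
    # Double the spatial dimensions with a transposed convolution while decreasing depth.
    x = layers.Conv2DTranspose(filters, kernel, strides=2, padding="same")(x)
    return layers.ReLU()(x)

inputs = keras.Input(shape=(SIZE, SIZE, 3))          # photo with 3 RGB channels

# Encoder: progressively compress the photo into a compact representation.
x = downsample(inputs, 64)
x = downsample(x, 128)
x = downsample(x, 256)
x = downsample(x, 512)

# Decoder: mirror the encoder, ending with 2x2 transposed convolutions as described.
x = upsample(x, 256)
x = upsample(x, 128)
x = upsample(x, 64, kernel=2)
outputs = layers.Conv2DTranspose(3, 2, strides=2, padding="same",
                                 activation="sigmoid")(x)  # sketch output in [0, 1]

model = keras.Model(inputs, outputs)
model.summary()   # corresponds to the summary shown in Figure 4.10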
4.3.8 STEP 5: COMPILING AND TRAINING THE MODEL
The model is now compiled by using the Adam optimizer with a specified learning rate of
0.0001.
The model’s performance during training and evaluation is monitored using the accuracy
metric (‘acc’).
The compiled model is then fitted to the training data. The number of epochs is
set to 150, as this produced the best results. Figure 4.11 shows the lines of
code used to compile and fit the model.
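Continuing the model sketch above, the compile-and-fit step might look as follows; the mean-squared-error loss and batch size are assumptions, while the Adam learning rate of 0.0001, the 'acc' metric and the 150 epochs follow the text.

from tensorflow import keras

# Adam optimizer with the stated learning rate; the loss is assumed to be mean squared error.
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-4),
              loss="mse",
              metrics=["acc"])

# Train on the 80 percent split for 150 epochs, as described in the text.
history = model.fit(train_photos, train_sketches,
                    validation_data=(test_photos, test_sketches),
                    epochs=150,
                    batch_size=8)   # batch size is an assumption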
CHAPTER 5
IMPLEMENTATION AND TESTING
5.1 INPUT AND OUTPUT
● A random student has been selected from the 188 students for evaluation of the model.
● This image is fed into the Neural Network and the network gives a sketch synthesized output
of what the predicted sketch would look like for the input image.
● Figure 5.1 below shows the original, augmented image of the subject.
● Figure 5.2 below shows the sketch output synthesized by the neural network (an illustrative snippet of
this prediction step follows below).
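A minimal sketch of this prediction-and-display step, continuing the earlier snippets, is shown below; the variable names and the random selection are illustrative assumptions.

import numpy as np
import matplotlib.pyplot as plt

# Pick a random student photo from the test split and synthesize its sketch.
idx = np.random.randint(len(test_photos))
photo = test_photos[idx]
predicted = model.predict(photo[np.newaxis, ...])[0]   # add, then drop, the batch dimension

plt.subplot(1, 2, 1); plt.imshow(photo); plt.title("Input photo"); plt.axis("off")
plt.subplot(1, 2, 2); plt.imshow(predicted); plt.title("Predicted sketch"); plt.axis("off")
plt.show()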
Figure 5.2: Predicted Sketch Of The Subject
5.2 TESTING
In photo-sketch synthesis, testing serves to validate whether the generated sketches accurately depict
the original photos. This ensures that the autoencoder-based generative model conforms to specified
criteria. Testing is conducted to confirm whether the synthesized sketches successfully capture essential
details and uphold fidelity to the source images. The process entails rigorous examination aimed at
verifying that the model effectively achieves its intended purpose.
5.2.2 UNIT TESTING
Unit testing is a beneficial software testing method in which individual units of source code are tested
to check the efficiency and correctness of the program. Figure 5.3 below contains the code for the
image augmentation.
INPUT:
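Figure 5.3 is not reproduced here; the fragment below is a hypothetical re-creation of an augmentation unit test of this kind, using a random flip and a small rotation as assumed operations.

import cv2
import numpy as np

def augment(image):
    # Random horizontal flip followed by a small random rotation.
    if np.random.rand() < 0.5:
        image = cv2.flip(image, 1)
    angle = np.random.uniform(-15, 15)
    h, w = image.shape[:2]
    matrix = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    return cv2.warpAffine(image, matrix, (w, h))

# Unit test: the augmented image must keep the original shape and data type.
sample = (np.random.rand(256, 256, 3) * 255).astype("uint8")
augmented = augment(sample)
assert augmented.shape == sample.shape
assert augmented.dtype == sample.dtype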
TEST RESULT
5.2.3 INTEGRATION TESTING
INPUT:
Figure 5.4 below shows the code snippet for printing the images and sketches, while Figure 5.5 shows
the output of the code.
Figure 5.4: Code For Printing Images And Their Corresponding Sketches
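Figure 5.4 is likewise not reproduced; the snippet below sketches how images and sketches might be loaded from the two folders and displayed in parallel with matplotlib. The folder names and the number of pairs shown are assumptions.

import os
import cv2
import matplotlib.pyplot as plt

image_dir, sketch_dir = "Images", "Sketches"   # folder names as referenced in the test result
names = sorted(os.listdir(image_dir))[:3]      # show the first few photo-sketch pairs

for row, name in enumerate(names):
    photo = cv2.cvtColor(cv2.imread(os.path.join(image_dir, name)), cv2.COLOR_BGR2RGB)
    sketch = cv2.cvtColor(cv2.imread(os.path.join(sketch_dir, name)), cv2.COLOR_BGR2RGB)
    plt.subplot(len(names), 2, 2 * row + 1); plt.imshow(photo); plt.axis("off")
    plt.subplot(len(names), 2, 2 * row + 2); plt.imshow(sketch); plt.axis("off")
plt.show()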
TEST RESULT
• The images from the Images folder and their corresponding sketches from the Sketches folder
are loaded.
• The images and the sketches are then displayed side by side as output using the matplotlib
library.
5.2.4 FUNCTIONAL TESTING
INPUT
TEST RESULT
• All the images from the dataset are loaded into the model and used for training.
• Training is done by considering each image and saving its characteristics.
CHAPTER 6
RESULTS AND DISCUSSIONS
The current proposed system uses an encoder-decoder architecture to learn a mapping between images
and sketches. The autoencoder's simple yet efficient architecture results in faster training times
and lower computational requirements. It also does not require labeled data, as it trains in an
unsupervised manner. The current model gives us an accuracy of about 40 percent. The
autoencoder's generative capabilities make it well suited for image generation tasks.
The proposed system can achieve good performance with relatively smaller datasets compared to more
data-hungry models, reducing data requirements and training time. It is also well-suited for tasks such
as image denoising, image compression, anomaly detection, and image-to-image translation, providing
versatility in application areas. The proposed system can also generalize well to unseen data and is
capable of adapting to different domains with minor adjustments to training strategies.
CHAPTER 7
CONCLUSION AND FUTURE ENHANCEMENTS
7.1 CONCLUSION
The proposed approach in this study involves an end-to-end fully convolutional network aimed at
directly modeling the intricate nonlinear mapping between face photos and sketches. Experimental
findings underscore the efficacy of the fully convolutional network in adeptly addressing this
challenging task, facilitating pixel-wise predictions with both effectiveness and efficiency.
7.2 FUTURE ENHANCEMENTS
Future enhancements will focus on refining the existing loss function and conducting experiments
across various databases. Additionally, investigations into the correlation between our approach and
non-photorealistic rendering methodologies will be pursued.
7.3 RESULTS:
Figure 7.2: Model Prediction And Sketch Matched
CHAPTER 8
SOURCE CODE AND POSTER PRESENTATION
REFERENCES
[3] Luqing Luo, Zhi-Xin Yang, Lulu Tang and Kun Zhang, An ELM-Embedded Deep Learning Based
Intelligent Recognition System for Computer Numeric Control Machine Tools, IEEE Access, 2020.
[5] Use of Disease Diagnosis Systems for Electronic Medical Records Based on Machine Learning: A
Complete Review, IEEE Access, 2020.
[9] Zhenghang Yuan, Lichao Mou, Qi Wang and Xiao Xiang Zhu, From Easy to Hard: Learning Language-
Guided Curriculum for Visual Question Answering on Remote Sensing Data, IEEE Transactions
on Geoscience and Remote Sensing, 2022.
[10] Arka Ujjal Dey, Ernest Valveny and Gaurav Harit, EKTVQA: Generalized Use of External
Knowledge to Empower Scene Text in Text-VQA, IEEE Access, 2022.
[12] Ali Ahmed and Sharaf J. Malebary, Query Expansion Based on Top-Ranked Images for Content-
Based Medical Image Retrieval, IEEE Access, 2020.
[19] A. Kumar, J. Kim, W. Cai, M. Fulham and D. Feng, Content-based medical image retrieval:
A survey of applications to multidimensional and multimodality data, J. Digit. Imag., vol. 26, no.
6, pp. 1025-1039, 2013.
[21] S. F. Salih and A. A. Abdulla, An improved content based image retrieval technique by
exploiting bi-layer concept, UHD J. Sci. Technol., vol. 5, no. 1, pp. 1-12, Jan. 2021.
[22] Z. Tu and X. Bai, Auto-context and its application to high-level vision tasks and 3D brain
image segmentation, IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 10, pp. 1744-1757, Oct.
2010.
[23] A. Kusiak, Intelligent Manufacturing Systems, Upper Saddle River, NJ, USA:Prentice-Hall,
pp. 448, 1990.
[24] Z. Yang, P. Zhang and L. Chen, RFID-enabled indoor positioning method for a real-time
manufacturing execution system using OS-ELM, Neurocomputing, vol. 174, pp. 121-133, Jan. 2016.
[25] W. Xu, Z. Yang and X. Wang, A technical and business perspective on wireless sensor
network for manufacturing execution system, Math. Problems Eng., vol. 2015, pp. 1-15, 2015.
[27] R. Adelmann, M. Langheinrich and C. Floerkemeier, Toolkit for bar code recognition and
resolving on camera phones—Jump starting the Internet of Things, Proc. INFORMATIK
Informatik für Menschen-Band, vol. 2, pp. 366-373, 2006.
[29] J. Stausberg, D. Koch, J. Ingenerf and M. Betzler, Comparing paper-based with electronic
patient records: Lessons learned during a study on diagnosis and procedure codes, J. Amer. Med.
Inform. Assoc., vol. 10, pp. 470-477, Sep. 2003.
[30] C. S. Kruse, R. Goswamy, Y. Raval and S. Marawi, Challenges and opportunities of big data
in health care: A systematic review, JMIR Med. Informat., vol. 4, no. 4, pp. e38, Nov. 2016.
[31] J. J. Firthous and M. M. Sathik, Survey on using electronic medical records (EMR) to
identify the health conditions of the patients, J. Eng. Sci., vol. 11, no. 5, 2020.
[32] G. Makoul, R. H. Curry and P. C. Tang, The use of electronic medical records:
Communication patterns in outpatient encounters, J. Amer. Med. Inform. Assoc., vol. 8, no. 6, pp.
610-615, Nov. 2001.
[33] M. Sudmanns et al., Big earth data: Disruptive changes in earth observation data
management and analysis?, Int. J. Digit. Earth, vol. 13, no. 7, pp. 832-850, Jul. 2020.
[34] Y. Li, J. Ma and Y. Zhang, Image retrieval from remote sensing big data: A survey, Inf.
Fusion, vol. 67, pp. 94-115, Mar. 2021.
[35] Y. Yang and S. Newsam, Geographic image retrieval using local invariant features, IEEE
Trans. Geosci. Remote Sens., vol. 51, no. 2, pp. 818-832, Feb. 2013.
[36] E. Aptoula, Remote sensing image retrieval with global morphological texture descriptors,
IEEE Trans. Geosci. Remote Sens., vol. 52, no. 5, pp. 3023-3034, May 2014.
[37] X.-Y. Tong, G.-S. Xia, F. Hu, Y. Zhong, M. Datcu and L. Zhang, Exploiting deep features
for remote sensing image retrieval: A systematic investigation, IEEE Trans. Big Data, vol. 6, no. 3,
pp. 507-521, Sep. 2020.
[39] X. Zhou, K. Zhou and L. Shen, Rotation and translation invariant palmprint recognition with
biologically inspired transform, IEEE Access, vol. 8, pp. 80097-80119, 2020.
[40] Y. Hao, Z. Sun, T. Tan and R. Chao, Multispectral palm image fusion for accurate contact-
free palmprint recognition, Proc. 15th IEEE Int. Conf. Image Process., pp. 281-284, Oct. 2008.
[41] A. Iula and M. Micucci, A feasible 3D ultrasound palmprint recognition system for secure
access control applications, IEEE Access, vol. 9, pp. 39746-39756, 2021.
[42] S. Sun, X. Cong, P. Zhang, B. Sun and X. Guo, Palm vein recognition based on NPE and
KELM, IEEE Access, vol. 9, pp. 71778-71783, 2021.
[43] X. X. Zhu et al., Deep learning in remote sensing: A comprehensive review and list of
resources, IEEE Geosci. Remote Sens. Mag., vol. 5, no. 4, pp. 8-36, Dec. 2017.
[44] S. Talukdar et al., Land-use land-cover classification by machine learning classifiers for
satellite observations—A review, Remote Sens., vol. 12, no. 7, pp. 1135, Apr. 2020.
[45] M. Castelluccio, G. Poggi, C. Sansone and L. Verdoliva, Land use classification in remote
sensing images by convolutional neural networks, arXiv:1508.00092, 2015.
[46] G. Cheng and J. Han, A survey on object detection in optical remote sensing images, ISPRS
J. Photogramm.Remote Sens., vol. 117, pp. 11-28, Jul. 2016.
[47] K. Li, G. Wan, G. Cheng, L. Meng and J. Han, Object detection in optical remote sensing
images: A survey and a new benchmark, ISPRS J. Photogramm. Remote Sens., vol. 159, pp. 296-
307, Jan. 2020.
[48] A. U. Dey, S. K. Ghosh, E. Valveny and G. Harit, Beyond visual semantics: Exploring the
role of scene text in image understanding, Pattern Recognit. Lett., vol. 149, pp. 164-171, Sep.
2021.
[49] S. Karaoglu, R. Tao, T. Gevers and A. W. M. Smeulders, Words matter: Scene text for
image classification and retrieval, IEEE Trans. Multimedia, vol. 19, no. 5, pp. 1063-1076, May
2017.
[51] S. Antol, A. Agrawal, J. Lu, M. Mitchell, D. Batra, C. L. Zitnick, et al., VQA: Visual
question answering, Proc. IEEE Int. Conf. Comput. Vis. (ICCV), pp. 2425-2433, Dec. 2015.
[52] S. Bai and S. An, A survey on automatic image caption generation, Neurocomputing, vol.
311, pp. 291-304, Oct. 2018.
[53] K. Seetharaman and M. Kamarasan, Statistical framework for image retrieval based on
multiresolution features and similarity method, Multimedia Tools Appl., vol. 73, pp. 1943-1962,
Dec. 2014.
[55] Arai, Hayato, et al. "Disease-oriented image embedding with pseudo-scanner standardization for
content-based image retrieval on 3D brain MRI." IEEE Access 9 (2021): 165326-165340.
[56] Sain, Aneeshan. Exploring Sketch Traits for Democratising Sketch Based Image Retrieval. Diss.
University of Surrey, 2023.
[57] Khokhlova, Margarita, et al. "Cross-year multi-modal image retrieval using siamese networks." 2020
IEEE International Conference on Image Processing (ICIP). IEEE, 2020.
SRM INSTITUTE OF SCIENCE AND TECHNOLOGY
(Deemed to be University u/s 3 of UGC Act, 1956)
8. Whether the above project / dissertation is done by:
a) If the project / dissertation is done in group, then how many students together completed the project: 03
b) Mention the Name and Register number of other candidates:
JACK ANDRE J [RA2011026020147], ABISHEK RAJ M [RA2011026020131], CHARUDEVE KS [RA2011026020139]
9. Name and address of the Supervisor / Guide:
Address of Guide:
Mail ID:
Mobile Number:
10. Name and address of the Co-Supervisor / Guide: NA
Mail ID: NA
Mobile Number: NA
13. Plagiarism Details: (to attach the final report from the software)
Appendices: NA
We declare that the above information has been verified and found true to the best of our knowledge.
Name and Signature of the Supervisor / Guide          Name and Signature of the Co-Supervisor / Co-Guide
Dr. XXX
Name and Signature of the HOD