IMAGE FORGERY DETECTION WITH MACHINE LEARNING
A Thesis
Presented to
The Graduate Faculty
Central Washington University
In Partial Fulfillment
of the Requirements for the Degree
Master of Science
Computational Science
by
Lubna Alzamil
June 2020
ABSTRACT
Lubna Alzamil
June 2020
The issue of forged images is currently a global issue that spreads mainly via social
networks. Image forgery has weakened Internet users' confidence in digital images. In
recent years, extensive research has been devoted to the development of new techniques to
combat various image forgery attacks. Detecting fake images prevents counterfeit photos
from being used to deceive or cause harm to others. In this thesis, we propose methods
using the error level analysis algorithm to detect manipulated images. We show that a patch-classification approach with a pre-trained VGG16 model obtains the most accurate results.
ACKNOWLEDGMENTS
I want to thank my committee chair for their guidance, support, and participation in the success of this thesis through all the steps
involved. I also want to thank my committee members: Dr. Szilard Vajda and Dr. Boris
Kovalerchuk for generously giving their time, support, advice, and goodwill during this
document’s review.
TABLE OF CONTENTS
Chapter
I INTRODUCTION
Related Work
Contributions
II THEORETICAL BACKGROUND
III ERROR LEVEL ANALYSIS
Overview
The Algorithm
Algorithm Output
Further Discussion on the Error Level Analysis Algorithm
Quality Dependence
APPENDICES
A – CASIA DATASET
C – RGB TABLE
LIST OF FIGURES
Figure
2 Model architecture
12 Patch accuracy
22 Pristine image
23 Copy-move image
24 Image splicing
27 RGB table
CHAPTER I
INTRODUCTION
With the advent of digital cameras and other smart devices, it has become easy for
anyone to manipulate an image. Some manipulations are not harmful, such as changing
the brightness of an image or converting it to black and white. On the other hand, some
manipulations can cause harm to others and defame them, especially politicians and
celebrities.
Image forgery can be used to conceal essential content or to force the viewer to believe an idea [1]. It has been defined as the
process of manipulating an original digital image to either conceal its original identity
or create an entirely different image than what was originally intended by the user of the
digital platform [2]. Forged images can cause disappointment and emotional distress and
affect public sentiment and behavior [3]. Images can transmit much more information
than text. People tend to believe what they can see, and this affects their judgment, which is why the urgency to detect forgeries has significantly increased. The copy-move approach is one of the most widely used forgery techniques. It copies a part of the image and pastes it onto another part of the same image. The technique itself is not harmful, but it can be used to create misleading images.
Image forgery is done mainly for malicious reasons. It serves to distort information,
spread immorality and fake news, obtain money fraudulently from an unsuspecting
audience, ruin the reputation of a popular celebrity or any other public figure, and
spread adverse political influence among the users of a digital platform. Therefore, clear
authentication of images and videos uploaded to digital media platforms, before they are used in any way, makes it more difficult for malicious users to spread false information
[4]. Image forgery is often used by malicious people to ruin the reputation of public
figures. Image forgery, especially through Photoshop, can be used to falsely portray public figures as producing unethical goods or offering unethical services, driving their audience to other markets
[5]. This forgery could also be used for political reasons against opponent politicians
or their agents, spreading images that portray their immoral side. This aims to convey a
message to the public regarding the lack of integrity of the subject. Image forgery often
leads to emotional problems for those whose images are released to public websites in
disregard for their privacy. There have been reports of suicide due to the leaking of private
images to public digital platforms, after which the victims undergo significant rebuke.
Forged images also appear in increasingly common fraud schemes. The forged images are uploaded with embedded
text, purportedly from the owner of the original image, with instructions that end in
innocent people losing money. This is also done with images portraying people who are
in dire need of help, with intentions of fraudulently acquiring money from unsuspecting
members of the public. Society then ceases helping even those who are in genuine need of help.
For all of these reasons, it is vital to develop methods of detecting whether an image has been manipulated.
Related Work
Many studies have proposed methods of detecting whether an image is manipulated. In [6], Bunk et al. discuss how resampling is a vital
signature of manipulated images. They proposed a method of detecting and locating
the area of manipulation on an image using resampling detectors with deep learning. Abdalla et al. [7] proposed a method of detecting whether an image is forged that involved transfer learning. Bappy et al. [8] applied a long short-term memory
network with encoder-decoder architecture to detect and localize the area of manipulation.
Wang et al. [9] employed a feature pyramid network based on ResNet with Mask R-CNN to detect and localize manipulated regions. In [10], the authors proposed dividing images into patches and passing these patches into a patch classifier (image pre-processing). They
subsequently trained the resulting complete images with a convolutional neural network
(CNN). In [11], Kaur and Manro pre-processed the images first. They then altered
the images to grayscale space and performed Gaussian pyramid decomposition from
time to time. Afterward, they detected image forgery using the block-based approach.
Amit Doegar [12] proposed a method involving AlexNets with support vector machines
(SVMs) to classify whether an image was forged without specifying the exact area of
manipulation. Zhou et al. [13] employed a pre-trained model, VGG16, with a steganalysis
match and robust match, and compared their performances. In [15], Shivakumar and
Baboo proposed an approach using the speeded-up robust features algorithm in parallel
with the K-dimensional tree algorithm to identify the manipulated region. Salloum
et al. [16] proposed using fully convolutional networks. In the beginning, they tried
using single-task fully convolutional networks. However, they noticed that multi-task
fully convolutional networks obtained better results than single-task fully convolutional
networks. Pham et al. [17] segmented the manipulated images into spliced areas
and background areas in the manipulation-detection stage before the image-retrieval stage to improve the accuracy of the retrieval. They suggested a hybrid approach that could easily retrieve images using Zernike moment features and features found by a scale-invariant feature transform.
Contributions
At the beginning, I worked on the error level analysis (ELA) algorithm with various machine learning classifiers. A detailed discussion of the algorithm and the classifiers follows in Chapter II, Chapter III, and Chapter IV. I continued researching this area until I realized that the ELA algorithm is not the best way to detect image forgery. Other approaches, discussed in Chapter V, obtain more accurate and promising results. I used the same pre-processing technique
presented in [10]. The authors of [10] divided each image into 100 x 100 patches and
passed them to a patch classifier that classifies whether the patch belongs to a raw image
or a computer graphics image. They passed the resulting images to a CNN to predict
the result. I used the same patch classifier technique, but instead of passing the resulting
images into a standard CNN, I passed them to a VGG16 pre-trained model. The authors
of [10] achieved high accuracy, 93.4%, with their model, but I obtained a higher accuracy, 94.5%.
CHAPTER II
THEORETICAL BACKGROUND
Convolutional Neural Networks
Convolutional neural networks (CNNs) are deep neural networks that are used effectively in image recognition, classification, and related applications. Specifically,
these neural networks are effective in facial, traffic-sign, and object identification. They
also help power vision in robots and remote-controlled cars (self-driving) [18]. They
evolved from the LeNet architecture, which was the initial CNN that was useful in the
development of deep learning [19]. There are four operations that form the foundation of
every CNN:
From computer graphics concepts, every image can be represented as pixel value matrices. In camera images, three channels are present: red, green, and blue [20]. These three channels are stacked in layers in the form of 2-dimensional matrices, and each channel has a pixel value that is in the range of 0 to 255. Grayscale images are represented by a single channel.
The convolution step entails extracting features from the input image. This
operation maintains the patterns between the pixels with the help of small squares
of input data that learn the features of the image. The operation involves sliding a filter matrix by one pixel at a time over the original image; at each position, the element-wise products are summed into an integer that represents a single element of the desired output matrix. The operations in this block thus involve a filter (the sliding matrix) and a convolved feature (the output matrix), which interact to extract features from the image.
After every convolution operation comes the ReLU, a non-linear operation. The ReLU operates on every pixel and replaces all negative pixel values in the feature map with zero. The ReLU aims to introduce non-linearity to the CNN, as almost all real-world data is non-linear.
The pooling step retains the most significant information of the feature map while
reducing the dimensionality of every feature through spatial pooling [21]. The
pooling step involves pooling of different types such as average, sum, and max.
For example, if the operation involves max pooling, then the spatial neighborhood
must be defined, and subsequently, the largest element from the rectified feature
map within the window is selected [21]. The operation uses the average instead of
the largest element for average pooling and the sum for sum pooling. Therefore,
pooling, convolution, and ReLU are the foundation blocks for the effective
implementation of a CNN.
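To make these operations concrete, the following minimal NumPy sketch illustrates convolution, ReLU, and max pooling on a single grayscale channel. It is an illustration only; the function names and the 6 x 6 example matrix are my assumptions, not code from the thesis experiments.

import numpy as np

def convolve2d(image, kernel):
    # Slide the kernel one pixel at a time over the image ("valid" mode)
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

def relu(feature_map):
    # Replace every negative value in the feature map with zero
    return np.maximum(feature_map, 0)

def max_pool(feature_map, size=2):
    # Keep the largest element in each non-overlapping size x size window
    h, w = feature_map.shape[0] // size, feature_map.shape[1] // size
    return feature_map[:h*size, :w*size].reshape(h, size, w, size).max(axis=(1, 3))

image = np.random.randint(0, 256, (6, 6)).astype(float)            # one grayscale channel
kernel = np.array([[1., 0., -1.], [1., 0., -1.], [1., 0., -1.]])   # vertical-edge filter
print(max_pool(relu(convolve2d(image, kernel))).shape)             # (2, 2)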
Figure 1 shows our model architecture: the ELA algorithm serves as a pre-processing step, and the resulting images are passed into a CNN. The algorithm is discussed in Chapter III.
FIGURE 1: Error Level Analysis model architecture
Support Vector Machines
The Support Vector Machine (SVM) [22] is a model used in classification and
regression. It can solve linear and non-linear problems and performs well on various
practical challenges. The SVM algorithm generates a hyperplane that divides the data
into categories [23]. It is best applied in regression and classification problems, and it
produces high accuracy while using less computational power [24]. This algorithm works by finding a hyperplane that separates distinct data points. The SVM is classified as a supervised machine learning model. It categorizes training data into one of two categories, and then a training algorithm builds a model that assigns new examples to their respective groups. The SVM is a non-probabilistic binary linear classifier, although it can be made probabilistic through methods such as Platt scaling.
When working with textual analysis classification tasks, the SVM process involves refining training data, sometimes alongside other methods such as Naive Bayes algorithms. A confidence score is generated for each recognized text or digit. When sufficient confidence is achieved in the dataset, the SVM continues the classification by applying a classification algorithm that is suitable in situations with limited data. The algorithm involves
separating two classes of data points with various choices of hyperplane. The SVM
focuses on finding the plane with the maximum margin, that is, the largest distance between the hyperplane and the nearest data points of both classes. Classifying future data points becomes more effective with such a maximum-margin hyperplane.
In SVMs, hyperplanes represent decision boundaries that are essential for data
point classification. The allocation of a data point to a certain class is based on the
side of the hyperplane that the point falls on. The dimension of the hyperplane depends on the number of input features. The data points closest to the hyperplane, which determine its position and orientation, are called support vectors. The margin of the classifier is maximized through the support vectors. Support vector machines use the kernel trick to perform linear classification while implicitly mapping inputs into feature spaces of high dimension. Because the hyperplane depends only on the support vectors, the SVM can be applied effectively even to large statistical problems.
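As a small illustration of the kernel trick (my own sketch, not part of the thesis experiments), the following scikit-learn example shows that a linear SVM cannot separate two concentric circles, while an RBF-kernel SVM can, because the kernel implicitly maps the points into a higher-dimensional space:

from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two concentric circles: not linearly separable in the original space
X, y = make_circles(n_samples=500, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

linear_svm = SVC(kernel='linear').fit(X_train, y_train)
rbf_svm = SVC(kernel='rbf').fit(X_train, y_train)

print('linear accuracy:', linear_svm.score(X_test, y_test))  # near chance level
print('RBF accuracy:', rbf_svm.score(X_test, y_test))        # near 1.0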
SVMs are used in many real-world applications. A main application is text and hypertext categorization, which reduces the requirement for labeled training instances in both transductive and inductive settings. The classification of images is another major area employing SVMs, where the SVM is believed to achieve high search accuracy.
Random Forests
A random forest (RF) is an ensemble model in which the final decision is made using the results from various models. In most cases, the outcome from an
ensemble model is better than that of any individual model [27]. Several decision trees
are generated by RFs, and the decision is determined based on the outcome of all the
decision trees [28]. An RF is a learning algorithm that randomly collects decision trees.
Each decision tree consists of several decisions, and a combination of them forms the
RF [29]. An RF integrates a collection of decision models from individual decision
trees in the forest to improve the accuracy of the results. This prevents relying on a
single learning model from a single decision tree in the RF. The merging of individual decision trees, each with its own set of decision rules, therefore creates a combined vote of the individual decision trees. This forms a basis of prediction that is more accurate than the prediction that could have been made by a single decision tree or by a combination of only a few trees.
Decision trees are the building blocks of an RF, and the individual trees are used
to differentiate various events based on their most unique aspects [30]. An RF consists
of many trees that make the work more straightforward when complex sets of data are
involved. They work on the principle that many uncorrelated decision trees, if made
to operate as a group, will yield clearer and more accurate results than any individual
decision tree. This is possible because the decision trees protect each other from the
independent errors they make [31]. For an RF to work effectively, the decisions made
by the individual decision trees should have little or no correlation with each other. There
should also be unique signs in the differentiating features so that the models perform well.
Random forests utilize the idea of bagging, a process that allows decision trees to
sample from the dataset with replacement, which results in different trees [32].
This is possible because individual decision trees are highly sensitive to the data that they
are trained on. Bagging therefore produces results where the individual trees not only
train on different sets of data but can also use different characteristics to make informed
decisions.
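A hedged scikit-learn sketch of bagging follows; the dataset and hyperparameters are illustrative assumptions. Each of the 100 trees is fit on a bootstrap sample of the training data (bootstrap=True) and considers a random subset of features at each split, and the ensemble vote is usually stronger than any single tree:

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 100 trees, each fit on a bootstrap sample of the training data
forest = RandomForestClassifier(n_estimators=100, bootstrap=True, random_state=0)
forest.fit(X_train, y_train)

print('ensemble accuracy:', forest.score(X_test, y_test))
# A single tree from the forest is usually weaker than the ensemble vote
print('single-tree accuracy:', forest.estimators_[0].score(X_test, y_test))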
Image Pre-Processing
Image pre-processing involves performing operations on images and suppressing unwanted distortions in them. It can also be used
to enhance certain unique features in an image that are crucial to further processing.
Image processing may be a basic task, such as resizing [33]. For example, to feed an
image dataset into a deep learning model, all images must be of the same size. Other
pre-processing tasks include geometric and color conversions, such as the transition from color to grayscale.
In this thesis, I used the pre-processing technique presented in [10]. The pre-
processing step takes every image in the dataset and divides it into 100 x 100 patches.
Subsequently, the patches are passed into a CNN classifier that classifies whether the
given patch belongs to a raw image (green) or a computer graphics image (red). After
the classification of patches, the complete images are generated again from the classified
patches. The authors of [10] passed the resulting images into another CNN to classify
whether an image is spliced. I used the VGG16 pre-trained model instead to accomplish
better accuracy than a standard CNN. Figure 2 shows our model architecture. The pre-
processing subfigure is from [10], and the VGG16 subfigure is from [35]. The datasets are discussed in Chapter V.
FIGURE 2: Model architecture
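The following is a minimal sketch (my illustration, not the code of [10]) of how an image can be divided into 100 x 100 patches with PIL; the function name is an assumption:

from PIL import Image

def split_into_patches(path, patch_size=100):
    # Divide an image into non-overlapping patch_size x patch_size tiles
    img = Image.open(path).convert('RGB')
    w, h = img.size
    patches = []
    for top in range(0, h - patch_size + 1, patch_size):
        for left in range(0, w - patch_size + 1, patch_size):
            patches.append(img.crop((left, top, left + patch_size, top + patch_size)))
    return patches

# Each returned patch is then classified as raw (green) or computer
# graphics (red), and the labeled patches are reassembled into an image.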
Pre-Trained Models
A pre-trained model is a model that was previously trained on a large dataset to solve a problem that is similar to the problem I want the model to solve
[36]. I used a model pre-trained for a certain task on the ImageNet dataset. The initial
training of the model could have been done on a similar or very different domain, but the
ability to solve problems remains useful [37]. Studies on modern computer vision have
revealed that models that perform better on ImageNet usually perform better on other
vision tasks as well. It is common practice to use imported models, such as MobileNets
or VGG, due to the relatively high costs involved in training these models from scratch.
The task of importing, usually referred to as transfer learning, is not only effective but
also cost friendly to any profit-making institution. Transfer learning is also common
because pre-training a model requires a relatively large dataset for the model to extract
the characteristics required for the given task [19]. For instance, ImageNet contains over
one million images in 1000 categories [38]. A lack of data makes it difficult to train a
model from scratch and makes it necessary to import a pre-trained model. Another reason
for importing a pre-trained model instead of training one from scratch is the time required
to train a model from scratch, depending on the experience of the trainer. This is because
one must do many calculations and experiments before discovering a CNN architecture.
Pre-trained models are also commonly used because training a model from scratch
requires specific computational resources that might not be available. This also makes a pre-trained model more efficient than a model trained from scratch [39]. Pre-trained models are
more accurate in most cases because they have been trained on a large number of classes,
such as the 1000 classes in ImageNet. Employing a pre-trained model enhances its
suitability to work on a wider range of issues compared to a model that is trained from
scratch. Importing a pre-trained model is also advantageous because the most complex
work of optimizing the parameters has usually been completed; what remains is only fine-
tuning the model, a process that involves adjusting the hyperparameters to improve the
pre-trained model. Another advantage of the pre-trained model is that it uses fewer steps
before the convergence of the output [39]. This is because for a classification task, the
features to be extracted will be similar, and it thus requires less time. Prior to choosing a pre-trained model, one should research the problem in question, and the keywords should be determined based on the type of dataset to be used.
This is because, depending on the complexity of the dataset, some models usually work
better than others. VGG16 is a CNN model that is used in large-scale image recognition.
It provides high testing accuracy on ImageNet, which consists of 14 million images in 1000 classes. The model comprises 16 layers with weights, indicated by the value 16 in its name.
The input to the first convolutional layer of the VGG16 model is an RGB image of fixed size, 224 x 224. The image is passed through a stack of convolutional layers with receptive fields of size 3 x 3, the smallest size that can capture the notions of left/right, up/down, and center. The VGG16 also makes use of 1 x 1 convolution filters, which act as linear transformations of the input channels. The convolution stride has a fixed value of 1 pixel, and the spatial padding of each convolutional layer is chosen so that the spatial resolution is preserved during convolution. The stack of convolutional layers is followed by fully connected layers, which differ from the convolutional stack in architecture and depth. The first two fully connected layers comprise 4096 channels each, and the third performs the 1000-way ImageNet large-scale visual recognition challenge classification, giving it 1000 channels. The softmax layer forms the final layer of the network [41]. All hidden layers use the ReLU non-linearity. One of the challenges facing this model is its increased memory requirement due to high space consumption. Nevertheless, the VGG16 is a dependable model that assists machine learning practitioners in applying pre-trained networks to their own tasks.
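The following Keras sketch shows the transfer-learning pattern described above: the VGG16 convolutional base keeps its ImageNet weights, and only a small new classification head is trained. The head sizes and the binary output are my assumptions for the forgery-detection setting, not the exact configuration used in the experiments:

from tensorflow.keras.applications import VGG16
from tensorflow.keras import layers, models

base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze the ImageNet-trained convolutional layers

model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation='relu'),
    layers.Dense(1, activation='sigmoid'),  # pristine vs. manipulated
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])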
CHAPTER III
ERROR LEVEL ANALYSIS
Error Level Analysis (ELA) is a tool for exposing fabricated regions in JPEG images by detecting the noise distribution present after resaving the image at a particular compression rate.
Overview
Neal Krawetz invented the concept of Error Level Analysis for images when he
noticed how errors spread when a JPEG image is saved [43], [44]. When cutting out a section of an image and pasting it into another image, the ELA for the pasted section often shows a more significant error level, which appears brighter than the rest of the
image. There are several implementations of this algorithm, but they all follow the same
steps.
The Algorithm
1. Compress the input image with a given compression rate and save it as a new image.
2. Calculate the pixel-by-pixel difference between the original image and the new image.
3. Store the difference as a new image, elaImg.
4. Compute the minimum and maximum pixel values for each band in the image and take the largest maximum as diff.
5. If diff is 0, set it to 1 to avoid division by zero.
6. Compute the brightness scale as 255 / diff.
7. Enhance the brightness of elaImg based on the resulting scale, then save and return the result.
import os
from PIL import Image, ImageChops, ImageEnhance

def convertToEla(filename, resavedImageName, quality=90):
    # Wrapper signature assumed; the thesis shows only the function body.
    # Step 1: resave the input image at the given JPEG quality.
    img = Image.open(filename).convert('RGB')
    img.save(resavedImageName, 'JPEG', quality=quality)
    imgResaved = Image.open(resavedImageName)
    # Steps 2-3: pixel-by-pixel difference between original and resaved image.
    elaImg = ImageChops.difference(img, imgResaved)
    # Step 4: largest per-band difference.
    extrema = elaImg.getextrema()
    diff = max([ex[1] for ex in extrema])
    # Steps 5-6: avoid division by zero and compute the brightness scale.
    if diff == 0:
        diff = 1
    scale = 255.0 / diff
    # Step 7: brighten the ELA image so the differences become visible.
    elaImg = ImageEnhance.Brightness(elaImg).enhance(scale)
    os.remove(resavedImageName)
    return elaImg
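The function can then be applied to any image; for example (the file names here are placeholders):

elaImg = convertToEla('example.jpg', 'example_resaved.jpg', quality=90)
elaImg.save('example_ela.png')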
Algorithm Output
Figure 5 and Figure 6 show pristine and manipulated images with their corresponding ELA outputs.
I loaded Figure 5a into the ELA algorithm, and the output is given in Figure 5b. Figure 5b shows a nearly entirely black box, indicating that there is no noise in the image and that it is therefore pristine. I then calculated the error level of Figure 6a, as shown in Figure 6b. The noise in the image is clear and indicates that the image of the hybrid animal is manipulated.
Further Discussion on the Error Level Analysis Algorithm
Every JPEG image is saved at a chosen quality level. An image at 100% quality has (almost) no loss, and a 1%-quality image is a very low-quality image. In general, quality levels of 90% or higher are considered high quality, 80% to 90% is medium quality, and 70% to 80% is low quality. Anything below 70% is typically very low quality [45]. Low-quality images can reduce the ability of analysis tools to detect manipulation. The ELA algorithm resaves the image at a known quality level, such as 75%, and through the difference computed in Step 3 of the algorithm, it then identifies the regions whose error levels differ from the rest of the image.
Quality Dependence
Saving an image with different quality levels affects the ELA algorithm, and this
leads to a distinct number of bright pixels. The lower the quality, the higher the number of
bright pixels. An image can be modified and saved in a quality lower than the quality
used in the ELA algorithm, and this makes it difficult to detect whether the image is
manipulated. Figure 7 shows an example of a pristine image that is saved with different
qualities. Subfigures (a) and (d) show the outcome of saving an image with 90% quality.
Subfigures (b) and (e) show the outcome of saving an image with 70% quality. Subfigures (c) and (f) show the outcome of saving an image with 48% quality. From Figure 7, I have concluded that the lower the quality, the greater the noise. This means that the ELA algorithm's output depends heavily on the quality at which an image was saved.
CHAPTER IV
The CASIA dataset [46] contains three categories of images that make it an appropriate dataset for these experiments:
1. Pristine images: Images that have not been manipulated.
2. Copy-move images: The manipulated region has been copied from the same image and pasted onto another part of it.
3. Spliced images: The manipulated region has been copied from a different image and pasted into the host image.
System Specifications
The specifications of the computer I used to conduct these experiments are the
following:
– CPU: Intel® Core™ i7.
– RAM: 16 GB.
I tested the ELA algorithm with three classifiers: CNNs [47], SVMs, and RFs. The following sections contain explanations of and results from each classifier.
The image paths were passed to a function that converts the images into their ELA
form, as discussed in Section 3.2. I then split the dataset into training and testing sets
using the train_test_split method from sklearn. I subsequently created a CNN
with three convolutional layers because it achieved better results than two layers. With
four layers, the model became overfitted. The results are discussed in Section 4.3. The primary packages used include sklearn and keras to create the neural network, along with a plotting library for the performance curves.
The Image, ImageChops and ImageEnhance modules were used in the ELA
algorithm. Figure 8 shows validation and training accuracy for various numbers of epochs
and batches.
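A hedged sketch of this setup follows; the array files, layer widths, and optimizer are my assumptions, while the three convolutional layers, the 80/20 split, and the epoch and batch values mirror the description in this section:

import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow.keras import layers, models

# Placeholder arrays: ELA images scaled to [0, 1] and binary labels
# (0 = pristine, 1 = manipulated); the file names are assumptions.
X = np.load('ela_images.npy')
y = np.load('labels.npy')
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=X.shape[1:]),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=60, batch_size=50,
          validation_data=(X_test, y_test))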
Results
FIGURE 8: Training and validation accuracy for combinations of 60 or 100 epochs with batch sizes of 50 or 100
With the CNN, the model achieved 79% accuracy. It had reached 80% when I initially ran it on Pop!_OS 19.04. However, I was required to downgrade the system to Ubuntu 18.04 Bionic Beaver, since some Python packages and libraries were not yet supported by Pop!_OS. Figure 8 shows various performance measure curves: (a) shows
the accuracy of training with 60 epochs and a batch size of 50, (b) shows the accuracy of
training with 60 epochs and a batch size of 100, (c) shows the accuracy of training with
100 epochs and a batch size of 50, and (d) shows the accuracy of training with 100 epochs
and a batch size of 100. Of the CASIA dataset, 80% was used for training; the remaining 20% was used for validation. Figure 8 shows that training accuracy was high. With
the validation data used on the model to evaluate its performance, the accuracy dropped to
72%.
I converted all the images to their ELA format. I then passed the resulting two
folders (a pristine image folder and a manipulated image folder) to the SVM [48] with the RBF kernel. The results are shown in Section 4.4. The primary packages used include
skimage to open images and sklearn and keras to use the SVM classifier.
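A minimal sketch of this pipeline follows (the folder layout and image size are assumptions, not the exact experiment code): the ELA images are loaded with skimage, flattened into feature vectors, and fit with an RBF-kernel SVM.

import os
import numpy as np
from skimage import io, transform
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def load_folder(folder, label, size=(128, 128)):
    # Load every ELA image in a folder and flatten it into a feature vector
    X, y = [], []
    for name in os.listdir(folder):
        img = transform.resize(io.imread(os.path.join(folder, name)), size)
        X.append(img.ravel())
        y.append(label)
    return X, y

Xp, yp = load_folder('ela/pristine', 0)      # placeholder paths
Xm, ym = load_folder('ela/manipulated', 1)
X, y = np.array(Xp + Xm), np.array(yp + ym)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
clf = SVC(kernel='rbf').fit(X_train, y_train)
print('accuracy:', clf.score(X_test, y_test))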
Results
For feature extraction, I used the brightness of the pixels, counting bright pixels in the ELA form of each image. I estimated a threshold of 150 after consulting the RGB table shown in Appendix C. According to the table, three zeros represent the color black. The combination (255,255,255) represents white, the brightest shade. The second row of the RGB table shows the combination (127,127,127), which represents gray. In my opinion, the shade of gray provided by this combination was too dark. I therefore chose 150; pixels brighter than this threshold are considered to be noise in the ELA images. The brighter the pixels in an ELA image, the greater the noise. Figure 6b shows the ELA for a fake image. It contains 634 bright pixels. I created the condition that if there are more than 300 bright pixels in an ELA image, then the image is fake. I estimated 300 because some pristine images may have some noise, as discussed in Section 3.4. The results are shown in Section 4.5. The primary packages used include pandas to read image paths and sklearn to create the RF.
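The bright-pixel rule can be sketched as follows; treating a pixel as bright when all three channels exceed 150 is my reading of the RGB-table discussion, and the paths are placeholders:

import numpy as np
from PIL import Image

def count_bright_pixels(ela_path, threshold=150):
    # Count pixels whose R, G, and B values all exceed the threshold
    pixels = np.array(Image.open(ela_path).convert('RGB'))
    return int(np.all(pixels > threshold, axis=-1).sum())

n_bright = count_bright_pixels('example_ela.png')  # placeholder path
print('fake' if n_bright > 300 else 'pristine')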
Results
CHAPTER V
Two datasets were used in this chapter. The first consists of RAW images, which are uncompressed and never processed [49]. The second is a computer graphics database whose images are taken from games [50]. Sample images from these datasets appear in Appendix B. I compare our results to those from [10]. The authors of [10] used CNNs to detect forgery in an image. I used the same approach for image pre-processing but with a pre-trained model, VGG16.
As mentioned in Chapter II, the pre-processing function takes every image in the
dataset and divides it into 100 x 100 patches. The patches are then passed into a CNN
classifier that classifies whether the given patch belongs to a raw image (green) or a
computer graphics image (red). After classifying 100 patches, the pre-processing function
displays the accuracy of the classification. The classifier saves weights in a .ckpt file
after classifying 500 patches. After the classification of patches, the complete images are
regenerated from the classified patches. The authors of [10] passed the resulting images
into another CNN to obtain a final result, and they achieved an accuracy of 93.4%. I used
the VGG16 pre-trained model to accomplish higher accuracy than using a standard CNN.
For the pre-processing step, the authors of [10] achieved an accuracy of 84.4%. I obtained 87% at the patch level. This higher accuracy was achieved in part because of how our weights were trained.
FIGURE 13: Patch validation accuracy
The authors of [10] passed the resulting complete images to another CNN and achieved 93.4% accuracy by training on 1,800 images. I obtained 94.5% accuracy by using a VGG16 pre-trained model instead.
FIGURE 15: Whole image training and validation accuracy
I wrote a program that generates randomly spliced images, illustrated in Figure 17.
import os
import random
from PIL import Image

export = "/lubna/CWU/Thesis/generatingFakeImages/SpHuge/"
realPath = "/lubna/CWU/Thesis/generateFakeImages/realimSP/"
realfiles = os.listdir(realPath)
fakePath = "/lubna/CWU/Thesis/generatingFake/fakeimgSP/"
fakefiles = os.listdir(fakePath)

for i in range(1439):
    # Fixed crop window taken from a randomly chosen donor (fake) image
    left, top, right, bottom = 155, 65, 600, 600
    # Random paste position within the host (real) image
    x = random.randint(0, 2300)
    y = random.randint(0, 2300)
    a = random.choice(realfiles)
    b = random.choice(fakefiles)
    im1 = Image.open(realPath + a)
    im2 = Image.open(fakePath + b)
    im3 = im2.crop((left, top, right, bottom))
    backim = im1.copy()
    backim.paste(im3, (x, y))
    backim.save(export + str(i) + '.jpg', quality=95)
By using the image pre-processing function found in the library, I could locate the spliced area in an image. I tested our model with a randomly generated spliced image, as shown in Figure 18.
FIGURE 18: Detecting spliced area
I also took a photograph from a personal camera and pasted an object onto it.
I also attempted to input an image with lower brightness and a different object pasted onto it.
Generating the ground truth mask of a manipulated image is more reliable than the
ELA algorithm, and it is not affected by the quality of the image. It also shows the exact
area of manipulation. Obtaining the ground truth mask, however, requires knowledge of image pre-processing.
Conclusions
Image forgery is done mainly for malicious reasons. This involves a genuine image that has been displayed on a public
website or a digital communication platform and is edited into an entirely different image.
The new image will likely be immoral in nature or targeted to spread negative publicity.
The ELA algorithm shows whether an image is manipulated when the input image's
quality is close to the quality used in the algorithm. If there is a large difference between
the quality of the image and the quality of the algorithm, then the result will always be
incorrect. Furthermore, the algorithm does not show the exact area of manipulation.
A pre-trained model is a model that has been trained on a certain task on the
ImageNet dataset. It is a model that has been trained to solve issues that might be similar
to the problem at hand. A pre-trained model is preferred in most cases to training a model from scratch; the process of importing and fine-tuning such a model is known as transfer learning.
Other approaches do not depend on the quality of the images and show the exact
area of manipulation. The patch classification approach is not affected by the quality of
the image and achieves more accurate results. Commonly imported models such as VGG
and MobileNets have been trained on large sets of data and are therefore very efficient
on any given dataset. The time required to train a model from scratch, which depends on the experience of the trainer, also favors pre-trained models.
Bibliography
[2] S. Walia and K. Saluja, “Digital image forgery detection: a systematic scrutiny,”
images: The effects of source, intermediary, and digital media literacy on contextual
assessment of image credibility online,” New Media & Society, vol. 21, no. 2, pp.
[5] C. Salge, “Is that social bot behaving unethically?” Communications of the ACM,
localization of image forgeries using resampling features and deep learning,” in CVPR Workshops, 2017. [Online]. Available: http://dblp.uni-trier.de/db/conf/cvpr/cvprw2017.html#BunkBMNFMCRP17
[7] Y. Abdalla, M. Iqbal, and M. Shehata, “Image forgery detection based
[Online]. Available: https://ejece.org/index.php/ejece/article/view/125
IEEE Transactions on Image Processing, vol. 28, no. 7, pp. 3286–3300, July 2019.
[9] X. Wang, H. Wang, S. Niu, and J. Zhang, “Detection and localization of image
graphics from natural images using convolution neural networks,” in 2017 IEEE
[11] G. Kaur and D. R. Manro, “A brief review: Copy-move forgery detection,” 2018.
[12] K. G. Amit Doegar, Maitreyee Dutta, “CNN based image forgery detection using
[13] P. Zhou, X. Han, V. I. Morariu, and L. S. Davis, “Learning rich features for image
[14] A. Gupta, N. Saxena, and S. Kumar, “Detecting copy move forgery in digital images,”
94–97, 03 2013.
[15] B. Shivakumar and S. Baboo, “Detection of region duplication forgery in digital
images using surf,” International Journal of Computer Science Issues, vol. 8, no. 3,
[16] R. Salloum, Y. Ren, and C.-C. J. Kuo, “Image splicing localization using a multi-task
fully convolutional network (mfcn),” J. Vis. Commun. Image Represent., vol. 51, pp.
201–209, 2018.
[17] N. T. Pham, J.-W. Lee, G.-R. Kwon, and C.-S. Park, “Hybrid image-retrieval method
[19] K. Kang, “Comparison of face recognition and detection models: Using different
convolution neural networks,” Opt. Mem. Neural Netw., vol. 28, no. 2, pp. 101–108,
[20] Y. Mao, Z. He, Z. Ma, X. Tang, and Z. Wang, “Efficient convolution neural networks
for object tracking using separable convolution and filter pruning,” IEEE Access,
towardsdatascience.com/, 2018.
[24] W. Land and J. Schaffer, The Support Vector Machine, 01 2020, pp. 45–76.
[26] Tin Kam Ho, “Random decision forests,” vol. 1, pp. 278–282, 1995.
Classifier,” https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html, 2020.
[28] N. Horning, “Random forests: An algorithm for image classification and generation
[29] R. Pandya and J. Pandya, “C5.0 algorithm to improved decision tree with
[31] Q. Zhang, Y. Yang, H. Ma, and Y. Wu, “Interpreting cnns via decision trees,” 06
[32] L. Ma, B. Sun, and Z. Li, “Bagging likelihood-based belief decision trees,” 07 2017.
[35] GeeksforGeeks, “VGG-16 | CNN model,” shorturl.at/tvAKM, 2019.
Features and Synthetic Images for Deep Learning: Munich, Germany, September
[37] M. Simon, E. Rodner, and J. Denzler, “Imagenet pre-trained models with batch
[38] M. Long, H. Zhu, J. Wang, and M. I. Jordan, “Deep transfer learning with joint
pp. 4917–4926.
[40] R. Thakur, “Step by step VGG16 implementation in Keras for beginners,” shorturl.at/gjzL5, 2019.
[41] A. S. Pankaj Kumar Kandpal, Ashish Mehta, “Honey bee bearing pollen and
[42] A. Omar, “Lung ct parenchyma segmentation using vgg-16 based segnet model,”
[43] I. Steadman, “ ’Fake’ World Press Photo isn’t fake, is lesson in need for forensic
tutorial-ela.php, 2012-2019.
Neural Networks with TensorFlow: Solve Computer Vision Problems with Modeling
html, 2020.
raw images dataset for digital image forensics,” in Proceedings of the 6th ACM Multimedia Systems Conference, ser. MMSys ’15. New York, NY, USA: ACM, 2015. [Online]. Available: https://doi.org/10.1145/2713168.2713194
APPENDIX A
CASIA DATASET
In Figure 23, the top right flower has been cut, resized, and pasted onto the lower part of the image.
In Figure 24, the two yellow flowers in the top left have been cut from Figure 24a
and pasted onto Figure 24b. The result is shown in Figure 24c.
APPENDIX B
FIGURE 26: An image from the Reference Database
APPENDIX C
RGB TABLE