Classifying Authentic and AI-Generated Images with a Fine-Tuned ResNet50 Model


Nizamuddin Mandekar, Akash Chaudhary
Department of Information Technology, B. K. Birla College of Arts, Science & Commerce (Autonomous)
Kalyan, 421301, Maharashtra, India.
ABSTRACT
With rapid advances in Artificial Intelligence (AI), it is becoming harder to tell real images apart from those created by AI. Technologies such as Generative Adversarial Networks (GANs) and Latent Diffusion Models (LDMs) can produce very realistic images that look just like real photographs. This is a problem for areas such as digital security, news, and social media, where people need to be able to trust that images are authentic.

This paper examines how to identify AI-generated images using a ResNet50 deep learning model. The model was trained on a balanced set of 140,000 facial images, half real and half AI-generated, and was adapted through transfer learning to detect the subtle details that separate real images from fake ones. Two approaches were tested: feature extraction and full fine-tuning. The fine-tuned model reached up to 98% test accuracy in its best run.

This research highlights the effectiveness of a fine-tuned ResNet50 in detecting AI-generated images. It contributes to the effort to build stronger image-verification systems, counter the spread of fake AI-generated content, and protect the trustworthiness of online media.

Keywords: Artificial Intelligence, AI-Generated Images, Generative Adversarial Networks, Latent Diffusion Models, Image Verification, Deep Learning, ResNet50, Transfer Learning.

INTRODUCTION
In recent years, Artificial Intelligence (AI) has advanced rapidly, affecting many areas such as healthcare, entertainment, and security. One of the most notable developments is generative models: AI systems that can create new content, such as realistic images that are often indistinguishable from real photos. These models, particularly GANs and LDMs, have shown great potential in fields like art, design, and entertainment. However, they also raise concerns, especially about the authenticity of images.

As AI-generated images become more realistic, it is getting harder to tell them apart from real ones. This is a problem in journalism, law enforcement, and security, where the trustworthiness of images is crucial. AI-generated images can be used to create misleading or fake content, such as deepfakes or false evidence, which can cause serious harm. This makes it very important to find reliable ways to distinguish real images from AI-created ones.

This research addresses the problem by using deep learning to classify images as either real or AI-generated. Specifically, we use a pre-trained deep learning model called ResNet50, which has been trained on a large-scale image dataset and has shown great success in image classification tasks. We fine-tune this model to detect subtle differences between real images and those generated by AI.

The goal of this research is to create an accurate model that can reliably identify AI-generated images. By fine-tuning ResNet50, we take advantage of its existing knowledge and adapt it to this specific task. Telling real and synthetic images apart is becoming increasingly important as AI-generated content becomes more common. This research aims to contribute to tools that help address the challenges of fake and manipulated images, so that people can trust the images they see in digital media.

The misuse of generative AI is a growing concern because the technology is so easily available. AI-generated images can be used to hide people's identities online, enabling scams. They can also trick facial recognition systems, and AI-generated videos or audio can be used for blackmail. Deepfakes can even be used to create fake evidence that frames innocent people [2].

This paper presents a solution for classifying AI-generated and real images using a fine-tuned ResNet50 [3] model.

DATASETS
The dataset used for this research is the 140K Real and Fake Faces dataset [4], available on Kaggle. It consists of 140,000 images of human faces, evenly split between 70,000 real faces and 70,000 fake faces generated by StyleGAN. The dataset is divided into training, validation, and test sets: the training set contains 50,000 images of each category, and the validation and test sets each contain 20,000 images (10,000 of each category), so each category totals 50,000 + 10,000 + 10,000 = 70,000 images.

The 70,000 real face images are taken from the Flickr-Faces-HQ (FFHQ) dataset collected by Nvidia [5]. The 70,000 fake face images are sampled from 1 million fake faces generated by StyleGAN [6]. All images are resized to 256 × 256 pixels for consistency. Some examples from the dataset are shown below.

[Figure: example images from the dataset. Top row: Fake 1, Fake 2, Fake 3. Bottom row: Real 1, Real 2, Real 3.]


METHODS

1) Transfer Learning: ResNet50
Transfer learning is a deep learning method that reuses existing, pre-trained models to tackle new but related problems. There are two main ways to do this:

1. Fine-Tuning: The model's weights are not frozen. The model's structure is kept, and all of its layers are allowed to adjust to the new problem. This takes more time and computational resources because all the weights are updated through backpropagation. Sometimes only certain layers are unfrozen while the others are kept fixed.
2. Feature Extraction: Most of the model's layers are frozen, and only a few layers are updated to fit the new problem. This is faster and less computationally demanding because only a few layers are trained.

In this paper, we use the ResNet50 model. ResNet50 is a deep learning model with "residual connections", which make very deep networks trainable by avoiding the vanishing-gradient problem (gradients becoming too small to update the weights effectively). These connections let information skip layers, which keeps the gradients strong enough to update the model. When the shape of the data changes between layers, the connection includes a convolutional layer to adjust it.

EXPERIMENT 1: FEATURE EXTRACTION
In the first experiment, minimal changes were made to the ResNet50 model. Only the fully connected (fc) layer was adjusted, by changing the number of outputs of its linear layer from 1,000 to the number of our output classes (i.e., 2: real and fake). The structure of ResNet50 is shown in the figure below.

[Figure: structure of ResNet50]
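The residual connection described above can be illustrated with a simplified block. This is a sketch for illustration only: ResNet50 itself is built from three-layer "bottleneck" blocks (1 × 1, 3 × 3, 1 × 1 convolutions), and the class name `ResidualBlock` is hypothetical. The block computes y = ReLU(F(x) + shortcut(x)), where the shortcut is the identity when shapes match and a 1 × 1 convolution when the channel count or spatial size changes.

```python
import torch
from torch import nn


class ResidualBlock(nn.Module):
    """Simplified two-convolution residual block (not ResNet50's bottleneck block)."""

    def __init__(self, in_channels: int, out_channels: int, stride: int = 1):
        super().__init__()
        # F(x): the learned residual branch.
        self.body = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_channels),
        )
        if stride != 1 or in_channels != out_channels:
            # Projection shortcut: a 1x1 convolution reshapes x to match F(x).
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels),
            )
        else:
            # Identity shortcut: x is added unchanged.
            self.shortcut = nn.Identity()
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The addition gives gradients a direct path around the residual branch.
        return self.relu(self.body(x) + self.shortcut(x))
```

With 64 input and 64 output channels the shortcut is the identity; with 64 to 128 channels and stride 2 it becomes a strided 1 × 1 convolution, halving the spatial size so that both branches can be added.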
The code for the feature-extraction head is:
model.fc = torch.nn.Sequential(torch.nn.Linear(2048, 2))  # 2,048 ResNet50 features -> 2 classes (real, fake)
EXPERIMENT 2: FINE-TUNING
In the second experiment, further changes were made to improve accuracy. The fully connected (fc) layer of ResNet50 was replaced with a combination of Linear layers, ReLU() activations, and Dropout layers, as follows:

model.fc = torch.nn.Sequential(
    torch.nn.Linear(2048, 1000),
    torch.nn.ReLU(),
    torch.nn.Linear(1000, 500),
    torch.nn.Dropout(),
    torch.nn.Linear(500, 100),
    torch.nn.ReLU(),
    torch.nn.Dropout(),
    torch.nn.Linear(100, 2)  # number of classes (real, fake)
)

2) Training Setup
With 50,000 training images already available for each class, no additional data augmentation was applied. The images were pre-resized to 256 × 256 pixels. The feature-extraction model was trained for 15 epochs using the Adam optimizer with a learning rate of 0.01 and a batch size of 32. The fine-tuned model was trained for 5, 10, and 15 epochs with the same setup.

Results and Discussion
Plotting loss and accuracy over the training epochs is a useful way to assess how a model learns. This section presents the results of the experiments.

Experiment 1: Feature Extraction Model
Figure 1 shows the training loss, test loss, training accuracy, and test accuracy for this experiment. Both training and test accuracy improve over the epochs, reaching as high as 89%, and the training and test loss decrease accordingly. However, from the 12th epoch onward, the model begins to overfit.

Figure 1
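The training setup described under Methods (Adam optimizer, learning rate 0.01, batch size 32) and the per-epoch loss and accuracy curves in the figures can be reproduced in outline with a loop like the following. This is a sketch, not the authors' code: the cross-entropy loss is an assumption (the paper does not name its loss function, although the 2-output head suggests it), and `run_epoch` and `fit` are hypothetical names.

```python
from typing import Optional

import torch
from torch import nn
from torch.utils.data import DataLoader


def run_epoch(model: nn.Module, loader: DataLoader, loss_fn: nn.Module,
              optimizer: Optional[torch.optim.Optimizer] = None) -> tuple[float, float]:
    """One pass over `loader`: trains if an optimizer is given, otherwise evaluates.

    Returns (mean loss, accuracy), the two per-epoch quantities plotted in the figures.
    """
    training = optimizer is not None
    model.train(training)  # dropout and batch-norm updates are active only when training
    device = next(model.parameters()).device
    total_loss, correct, seen = 0.0, 0, 0
    with torch.set_grad_enabled(training):
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            logits = model(images)
            loss = loss_fn(logits, labels)
            if training:
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            batch = labels.size(0)
            total_loss += loss.item() * batch  # undo the per-batch mean
            correct += (logits.argmax(dim=1) == labels).sum().item()
            seen += batch
    return total_loss / seen, correct / seen


def fit(model: nn.Module, train_loader: DataLoader, test_loader: DataLoader,
        epochs: int, lr: float = 0.01) -> list[dict]:
    """Adam + cross-entropy; the batch size (32 in the paper) is set on the loaders."""
    trainable = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.Adam(trainable, lr=lr)
    loss_fn = nn.CrossEntropyLoss()  # assumed loss; not named in the paper
    history = []
    for epoch in range(1, epochs + 1):
        train_loss, train_acc = run_epoch(model, train_loader, loss_fn, optimizer)
        test_loss, test_acc = run_epoch(model, test_loader, loss_fn)
        history.append({"epoch": epoch, "train_loss": train_loss, "train_acc": train_acc,
                        "test_loss": test_loss, "test_acc": test_acc})
    return history
```

Each entry in `history` holds the four quantities plotted per epoch, so an overfitting point like the one observed at epoch 12 in Experiment 1 shows up as test loss rising while training loss keeps falling.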
Experiment 2: Fine-Tuning
Figure 2 shows the results for Experiment 2 with 5 epochs. The training loss decreases steadily, while the test loss initially rises until the second epoch, then starts to decrease along with the training loss. Training accuracy consistently improves, while test accuracy dips at first but begins to rise after the first epoch, eventually reaching 98%. However, when this experiment was repeated with the same setup, test accuracy did not exceed 94%.

Figure 2
Figure 3 displays the results for Experiment 2 conducted over 10 epochs. Both the training and test loss decrease, with the training loss dropping smoothly, while the test loss shows a more zigzag pattern. The accuracy curves indicate an increase in both training and test accuracy, with test accuracy reaching 95% and training accuracy reaching around 97%.

Figure 3
Figure 4 illustrates the results for Experiment 2 over 15 epochs. Both training and test loss keep decreasing, though a clear gap appears between them. Training and test accuracy initially increase, reaching a peak of 96%, but then begin to fluctuate in a zigzag pattern.
Figure 4
Performance Metrics
The performance metrics used in this study are accuracy, precision, recall, and F1 score. Accuracy is the number of correctly classified outputs divided by the total number of outputs (predictions). The metrics are defined in terms of the following counts:

• True Positive (TP): the number of positive samples correctly predicted as positive.
• False Positive (FP): the number of negative samples incorrectly predicted as positive.
• True Negative (TN): the number of negative samples correctly predicted as negative.
• False Negative (FN): the number of positive samples incorrectly predicted as negative.

The formulas for these metrics are:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\text{Precision} = \frac{TP}{TP + FP}$$

$$\text{Recall} = \frac{TP}{TP + FN}$$

$$\text{F1 Score} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$
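The formulas above translate directly into code. The sketch below computes them from true and predicted labels in plain Python; the function names are hypothetical, and treating label 1 as the positive class is an assumption, since the paper does not say whether "real" or "fake" is the positive class.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, TN, FN) for a binary task, treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, fp, tn, fn


def metrics_from_counts(tp: int, fp: int, tn: int, fn: int) -> Metrics:
    """Apply the accuracy, precision, recall, and F1 formulas to confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0  # guard against no positive predictions
    recall = tp / (tp + fn) if tp + fn else 0.0  # guard against no positive samples
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return Metrics(accuracy, precision, recall, f1)
```

For instance, with true labels `[1, 1, 1, 0, 0, 0]` and predictions `[1, 1, 0, 1, 0, 0]`, the counts are TP = 2, FP = 1, TN = 2, FN = 1, giving an accuracy of 4/6 ≈ 0.667 and precision, recall, and F1 all equal to 2/3.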

The accuracy, precision, recall, and F1 score for all models are summarized in the table below:

Model                          Accuracy   Precision   Recall   F1 Score
Feature Extraction Model       0.8997     0.9172      0.8505   0.8505
Fine-Tuned Model (5 epochs)    0.9776     0.9803      0.9947   0.9875
Fine-Tuned Model (10 epochs)   0.9542     0.9348      0.9425   0.9386
Fine-Tuned Model (15 epochs)   0.9685     0.9475      0.9416   0.9445
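As a consistency check, the F1 formula can be applied to the precision and recall values in the table. The short script below does this in plain Python (`f1_score` and `REPORTED` are illustrative names). For the three fine-tuned models, the recomputed F1 matches the reported value to within rounding (for example, 0.9874 versus 0.9875 for 5 epochs). For the feature-extraction model it gives about 0.8826 rather than the listed 0.8505, which is identical to that row's recall, so one of those two entries may be a transcription error.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the F1 formula above)."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the results table.
REPORTED = {
    "Feature Extraction Model": (0.9172, 0.8505, 0.8505),
    "Fine-Tuned Model (5 epochs)": (0.9803, 0.9947, 0.9875),
    "Fine-Tuned Model (10 epochs)": (0.9348, 0.9425, 0.9386),
    "Fine-Tuned Model (15 epochs)": (0.9475, 0.9416, 0.9445),
}

for name, (p, r, reported) in REPORTED.items():
    recomputed = f1_score(p, r)
    print(f"{name:30s} reported F1 = {reported:.4f}  recomputed F1 = {recomputed:.4f}")
```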

DISCUSSION
The experiments demonstrate that fine-tuning the ResNet50 model significantly improves its classification accuracy compared to using it as a feature extractor. When ResNet50 was used for feature extraction, it achieved an accuracy of about 89%, a reasonable result for a model whose pre-trained layers were not adapted to the task. When we applied fine-tuning, updating the weights of the pre-trained network with additional training, the accuracy rose to about 98% after just 5 epochs. This shows that allowing the model to adjust to the specifics of the current dataset significantly improves its ability to classify images correctly.

However, the results were not consistent across runs. Although fine-tuning for 5 epochs led to high accuracy, repeating the experiment gave different results (test accuracy no higher than 94%), indicating some instability. In addition, when training was extended to 10 or 15 epochs, the model's accuracy remained stable but was slightly lower than after 5 epochs of fine-tuning. This slight drop after longer training may indicate that the model was starting to overfit. Overfitting occurs when a model becomes too specialized to the training data and loses its ability to generalize to new, unseen data. The drop could also point to instability in the training process, where the model's performance fluctuates with prolonged training.

For future improvements, several steps can be considered:

1. Exploring Alternative Architectures: Trying other deep learning architectures (such as DenseNet, EfficientNet, or Vision Transformers) could show whether a different architecture gives more stable or higher performance.
2. Further Fine-Tuning: Fine-tuning for more epochs, or adjusting hyperparameters such as the learning rate and batch size, could improve stability. A more gradual learning-rate decay, or techniques such as learning-rate warm-up, could help stabilize longer training runs.
3. Incorporating Data Augmentation: To help the model generalize to new data, data augmentation techniques such as rotations, flips, color adjustments, and cropping can be applied. These artificially increase the diversity of the training set, reducing the chance of overfitting by giving the model more varied examples.
4. Cross-Validation and Regularization: Using cross-validation during training could give a better estimate of the model's ability to generalize. Regularization techniques such as dropout or weight decay could also help reduce overfitting and improve robustness.

By incorporating these improvements, we aim for more stable and reliable performance, making the model better suited to real-world applications where consistency is key.

FUTURE WORK
While this research shows that ResNet50 can detect AI-generated images well, there are several ways to improve and build upon this work. Below are some ideas for future research:

1. Try More Advanced Models: ResNet50 is a good model, but newer models might work even better. For example, EfficientNet and Vision Transformers (ViTs) have shown strong results in many image tasks. Future research could test whether these models detect AI-generated images more accurately or quickly. There are also models built specifically to detect AI-generated content, such as those trained on images from GANs (Generative Adversarial Networks), which could be worth exploring.
2. Use a Larger Dataset: One limitation of this research is the size and variety of the dataset: it contains only faces, and all of its fake images come from a single generator (StyleGAN). To make the detection system more reliable, future research should use a bigger and more varied dataset. This could include higher-quality images, images from different AI models, and images from a variety of sources, such as social media, news, and entertainment. A larger dataset would help the model perform well on a wider range of AI-generated content.
3. Real-Time Detection: The current model may take some time to process large images or datasets. In the future, it would be useful to make the model faster so it can classify images in real time, for practical uses such as detecting fake images on social media or during live TV broadcasts.
4. Include Other Types of Data: This research focused only on images, but AI-generated content often comes with other data, such as text or metadata (for example, the time and place an image was uploaded). Future work could combine different types of data in one model. For example, if an AI-generated image has a caption, analyzing the image and the text together could help the model detect fakes more accurately.
5. Handle Evolving AI Models: AI-generated images are getting better and harder to detect, and some newer AI tools are designed specifically to fool detection systems. Future research could make models more robust to these changes through adversarial training, that is, training the model to recognize fake images even when they have been altered to look more real.
6. Address Ethical Issues: AI-generated images raise serious concerns about misinformation and manipulation, especially when they are used to mislead people. Future research should look not only at improving detection but also at the ethical side of these technologies: how they can be misused, and what rules can ensure they are used responsibly. This includes considering how fake images might affect public trust, privacy, and security.
7. Explore Other Applications: This research focused on detecting whether an image is real or AI-generated, but the technology could be used in other ways. For example, it could help in medicine, where fake medical images could harm patients and detecting them would be crucial for correct diagnosis. It could also be useful in fields such as journalism and art, where fake images could mislead people.

CONCLUSIONS
This research addresses the misuse of generative technology, especially AI-generated images, and proposes a model to detect these images. Initially, the model used a feature-extraction approach, which performed noticeably worse (about 89% accuracy) because it struggled to tell real and AI-generated images apart. This was likely because the frozen pre-trained features could not adapt to the specific characteristics of AI-generated images.

To improve the model, the fully connected layer was changed, and the model was fine-tuned for different numbers of epochs (full passes over the training data). These changes led to better results, with the model's accuracy improving
after fine-tuning the ResNet50 model. Fine-tuning allowed the model to learn more effectively and increased its ability to classify images correctly. Comparing models trained for different numbers of epochs showed that fine-tuning for 5 epochs produced the best accuracy, while training for longer (10 or 15 epochs) led to slightly lower performance, likely due to overfitting, where the model becomes too focused on the training data and struggles to generalize to new data.

This research shows that fine-tuning pre-trained models such as ResNet50 can greatly improve their ability to detect AI-generated images. However, the results also show that there is room for improvement, as the model's performance can still be affected by factors such as overfitting and run-to-run variability.

REFERENCES

[1] Goodfellow, Ian, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. "Generative adversarial nets." Advances in Neural Information Processing Systems 27 (2014).

[2] Jovanović, Radiša. "Convolutional Neural Networks for Real and Fake Face Classification." In Sinteza 2022 - International Scientific Conference on Information Technology and Data Related Research, pp. 29-35. Singidunum University, 2022.

[3] He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep residual learning for image recognition." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770-778. 2016.

[4] P. Datasets. "140k Real and Fake Faces." [Online]. Available: https://www.kaggle.com/datasets/xhlulu/140k-real-and-fake-faces [Accessed 23 Mar. 2022].

[5] P. Datasets. "70k Real Faces." [Online]. Available: https://www.kaggle.com/c/deepfake-detection-challenge/discussion/122786 [Accessed 23 Mar. 2022].

[6] P. Datasets. "1 Million Fake Faces on Kaggle." [Online]. Available: https://www.kaggle.com/c/deepfake-detection-challenge/discussion/121173 [Accessed 23 Mar. 2022].

[7] Radford, Alec, Luke Metz, and Soumith Chintala. "Unsupervised representation learning with deep convolutional generative adversarial networks." arXiv preprint arXiv:1511.06434 (2015).

[8] Isola, Phillip, Jun-Yan Zhu, Tinghui Zhou, and Alexei A. Efros. "Image-to-image translation with conditional adversarial networks." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2017): 1125-1134.

[9] Zhang, Richard, Phillip Isola, and Alexei A. Efros. "Colorful image colorization." European Conference on Computer Vision (2016).

[10] Odena, Augustus, Christopher Olah, and Jonathon Shlens. "Conditional image synthesis with auxiliary classifier GANs." Proceedings of the 34th International Conference on Machine Learning - Volume 70 (2017): 2642-2651.

[11] Karras, Tero, Samuli Laine, and Timo Aila. "A style-based generator architecture for generative adversarial networks." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2019): 4401-4410.

[12] Zhu, Jun-Yan, Taesung Park, Phillip Isola, and Alexei A. Efros. "Unpaired image-to-image translation using cycle-consistent adversarial networks." Proceedings of the IEEE International Conference on Computer Vision (2017): 2223-2232.

[13] Li, Xin, Jianchao Yang, Hongdong Li, and Haibin Ling. "Horizon: A scalable framework for learning deep generative models for 3D object modeling." IEEE Transactions on Pattern Analysis and Machine Intelligence 41.10 (2019): 2379-2392.

[14] Kingma, D. P., and M. Welling. "Auto-Encoding Variational Bayes." International Conference on Learning Representations (ICLR), 2014.

[15] Xie, L., and L. Ren. "Deepfake detection with GAN-based methods." Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops (2021): 3154-3162.

[16] Choi, Yong-Hyun, et al. "Learning deep generative models for efficient image synthesis and generation." International Journal of Computer Vision (2020).

[17] Wu, Y., and M. Zeng. "Exposing deepfakes with adaptive learning." IEEE Transactions on Image Processing 29 (2020): 741-755.

[18] Rössler, Andreas, et al. "FaceForensics++: Learning to Detect Manipulated Facial Images." Proceedings of the IEEE International Conference on Computer Vision (2019): 1-11.

DATA AVAILABILITY STATEMENT
The dataset is available on Kaggle [4] at https://www.kaggle.com/datasets/xhlulu/140k-real-and-fake-faces (accessed 23 Mar. 2022). It was also used in the research paper by Jovanović [2].
