Malware Detection and Classification using Generative Adversarial Network
ABSTRACT
Generative Adversarial Networks (GANs) play an important role in deep-learning-based malware
classification, where they help address dataset imbalance and previously unseen malware. Generative AI
is widely used in applications such as improving image resolution and generating audio, video, and text.
Cybercriminals also use generative AI to create malware and deepfake videos that harm a targeted
person or device. Training on synthetic data makes a deep learning model more robust in detecting such
unseen and adversarial attacks. This work uses GANs to generate adversarial malware samples for
training a classification and detection model, improving the model’s ability to identify sophisticated
malware variants. The proposed Conditional Generative Adversarial Network (CGAN) model is
evaluated on a multiclass grayscale malware image dataset and a binary-class RGB malware image
dataset, and its performance is compared with the current state of the art. Results indicate a significant
improvement in classification accuracy and a reduction in training time and false positives, showcasing
the potential of GANs in the dynamic cybersecurity landscape.
KEYWORDS
Malware detection, Generative Adversarial Networks, deep learning, classification, adversarial learning,
cybersecurity.
1. INTRODUCTION
Malware remains a significant challenge to global cybersecurity, with attackers continuously
refining their methods, making it difficult for traditional detection systems to identify zero-day
exploits and obfuscated malware[1]. The rapid expansion of cloud computing, IoT, sensor
devices, and Industry 4.0 has significantly increased the risk of cyberattacks[2]. This digital
growth, along with the rise of big data analytics for business decisions, highlights the growing
dependency on computing resources[3], [4]. The use of generative AI technologies, such as
ChatGPT, poses new challenges for conventional cybersecurity techniques, as cybercriminals
exploit these tools to perpetrate fraud, creating fake images, audio, and videos[5]. Consequently,
researchers are focusing on enhancing security measures, particularly against zero-day,
ransomware, and APT attacks[6]. Generative Adversarial Networks (GANs) are useful for
overcoming the imbalanced-dataset problem and improve model performance in terms of
malware detection accuracy. Deep learning advancements, particularly in GANs, offer a
promising new direction by enhancing the detection of
DOI: 10.5121/ijcsit.2024.16508 93
International Journal of Computer Science & Information Technology (IJCSIT) Vol 16, No 5, October 2024
emerging threats. This study explores how GANs can be effectively applied to malware detection
and classification. GANs not only identify malware but also learn from adversarial samples to
build a more resilient detection model.
The Generative Adversarial Network is a deep learning model proposed by Goodfellow et al. [7].
It is based on a two-player minimax game between two components: the generative model 𝐺 and
the discriminative model 𝐷. The generator maps a latent vector 𝑧, drawn from a prior 𝑝𝑧(𝑧), to
synthetic samples 𝐺(𝑧) and thereby learns a distribution 𝑝𝑔 over the data 𝑥, while the
discriminator differentiates between fake samples 𝐺(𝑧) and real samples drawn from 𝑝𝑑𝑎𝑡𝑎(𝑥).
Both the generator and discriminator are iteratively fine-tuned through training to create a highly
optimized GAN-based detection system (Figure 1).
𝐺 and 𝐷 are trained simultaneously: the parameters of 𝐺 are updated to minimize 𝑙𝑜𝑔(1 −
𝐷(𝐺(𝑧))), and the parameters of 𝐷 are updated to maximize 𝑙𝑜𝑔𝐷(𝑥) + 𝑙𝑜𝑔(1 − 𝐷(𝐺(𝑧))). Training
continues until neither model can improve further, which happens when pg = pdata. 𝐷 and 𝐺 play
the minimax game with the value function V(G, D) [7]:
$$\min_G \max_D V(G, D) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$
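As a numerical illustration of the value function from [7], V(G, D) = E[log D(x)] + E[log(1 − D(G(z)))], the following minimal sketch estimates it from a handful of discriminator outputs. The function name `gan_value` and the sample probabilities are illustrative assumptions, not part of the proposed model.

```python
import math

def gan_value(d_real, d_fake):
    """Monte Carlo estimate of V(G, D) from discriminator outputs.

    d_real: values D(x) for real samples, each in (0, 1)
    d_fake: values D(G(z)) for generated samples, each in (0, 1)
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

# A discriminator that separates real from fake well drives V towards 0.
v_strong = gan_value([0.99, 0.98], [0.01, 0.02])

# When the generator matches the data distribution, the best the discriminator
# can do is output 0.5 everywhere, and V equals -log(4).
v_equilibrium = gan_value([0.5, 0.5], [0.5, 0.5])
```

The second case corresponds to the stopping condition pg = pdata described above, where the discriminator can no longer tell the two sources apart.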
The generator learns to generate fake images, and the discriminator is trained to distinguish real
images from generated ones. Including adversarial malware samples in training makes the DL
model more robust to novel and unseen malware. The generator learns to produce fake malware
samples without labels, which is a form of unsupervised learning. The discriminator, on the other
hand, classifies malware images as real or fake using supervised learning, because label
information is provided along with the image samples from both sources: the generator and the
real dataset. Generated images alongside the corresponding real images for the grayscale malware
image dataset and the RGB image dataset are displayed in Figure 2 and Figure 3.
The Conditional Generative Adversarial Network (CGAN) extends the GAN model to
condition-based learning, where the data 𝑥 is provided together with its label information 𝑦 to
both the generator 𝐺 and the discriminator 𝐷. The two-player minimax game then becomes [9]:
$$\min_G \max_D V(G, D) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x \mid y)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z \mid y)))]$$
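One common way to supply the label 𝑦 to the generator, following the general idea in [9], is to append a one-hot encoding of the class to the latent vector. The sketch below assumes 25 classes and a latent length of 100; the helper names are hypothetical, and the text does not state how the proposed model injects the label.

```python
import random

def one_hot(label, num_classes):
    """Return a one-hot list with a 1.0 at position `label`."""
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec

def conditioned_latent(label, num_classes=25, latent_dim=100, rng=None):
    """Concatenate Gaussian noise z with the one-hot label y (assumed scheme)."""
    rng = rng or random.Random(0)
    z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
    return z + one_hot(label, num_classes)

# Generator input for class index 3: 100 noise values followed by 25 label values.
g_input = conditioned_latent(label=3)
```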
There are many cybersecurity threats and challenges that can only be reduced by employing
unsupervised or semi-supervised deep learning techniques [6]. Novel and unseen malware
cannot be detected effectively by traditional intrusion detection systems[10]. The
discriminator plays a crucial role in distinguishing real samples from fake ones. It also helps the
model learn to detect unseen malware samples.
Nagaraju and Stamp [12] introduced an Auxiliary-Classifier GAN (AC-GAN) model for
malware classification, which was trained and tested on two malware image datasets: Malimg
and MalExe. They experimented with three different image sizes—32×32, 64×64, and 128×128.
The AC-GAN model achieved its highest classification accuracy of 95% on these datasets,
demonstrating its effectiveness in identifying malware based on image-based data representation.
Chui et al. [13] introduced a lightweight GAN model to address the imbalanced malware
classification issue. They utilized two efficient architectures—ShuffleNetV2 and MobileNetV3,
both integrated with a GAN framework for malware detection. Their approach achieved
classification accuracies of 94.2% and 95%, respectively, showcasing the potential of lightweight
deep learning models in handling malware classification tasks while maintaining performance
and efficiency.
Won et al. [14] introduced four GAN models—DCGAN, LSGAN, WGAN-GP, and E-GAN—
for classifying zero-day malware attacks. Their study used the Malimg dataset, focusing on 8
malware classes with shared family names, such as Allaple.A and Allaple.L, comprising 10,868
samples. Among the models, E-GAN achieved the highest classification accuracy at 96.35%,
demonstrating its effectiveness in handling malware classification within these specific
categories.
Reilly et al. [15] introduced robust GAN models for malware classification using byteplot and
space-filling curve conversion techniques. The vanilla byteplot model achieved a classification
accuracy of 95.76%, while the vanilla Z-order model reached 93.12%. Their study highlighted
that incorporating adversarial images during training could significantly enhance the robustness
of the classification models, making them more resilient against evolving malware threats.
Lu and Li [16] proposed a GAN-based approach to enhance deep learning models for malware
classification. By incorporating GAN-generated synthetic malware samples, they significantly
improved the performance of a deep residual network. Initially, the residual network classified
malware images with 84% accuracy. However, after augmenting the training set with
GAN-generated samples, the test classification accuracy increased by 6 percentage points to 90%,
demonstrating the effectiveness of using GANs for enhancing deep learning-based malware
detection systems.
These studies collectively highlight the increasing focus on applying GANs for malware
classification, emphasizing their capacity to address challenges like limited labelled data and
generating diverse malware samples. The research underscores GANs as a promising approach
in improving cybersecurity, particularly by enhancing malware analysis and classification
performance. By handling imbalanced datasets and producing adversarial samples, GANs
provide a robust mechanism for building more effective and resilient malware detection systems.
3. PROPOSED METHODOLOGY
This section describes the benchmark datasets used for performance evaluation and the details of
the GAN model. Two datasets are used, one for malware classification and one for malware
detection. Dataset-1 (Malimg [17]) contains 9,339 grayscale malware image samples from
25 distinct malware classes. The image samples vary in shape across the classes, so they are
resized to a standard size of 224×224 pixels to ensure consistency across model training and
testing. The distribution of malware samples across the 25 classes is displayed in Figure 4.
Dataset-2 (IEEE DataPort [18]) contains a total of 48,240 malware image samples and source
code files, of which only 24,109 malware and normal images are used for the binary
classification model. This subset contains 11,919 malicious image samples and 12,190
normal image samples. These samples are used to test the model's ability to detect
anomalies and accurately differentiate between malware and benign data.
The image samples from the grayscale image dataset are resized to 224×224×1, the input
dimensions required by the malware classification model. The dataset is split in an 80:20 ratio
into training and testing sets. The input pixel values are normalized to the range [-1, 1] and
stored in a NumPy array. Dataset-2 is a binary-class dataset of 3-channel images. These image
samples are normalized and reshaped to 224×224×3, the input dimensions required by the
malware detection model.
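The normalization and 80:20 split described above can be sketched in a few lines. This is a minimal illustration that assumes 8-bit pixel values and a seeded random shuffle; the scaling formula `p / 127.5 - 1`, the seed, and the helper names are assumptions, since the text only states the target range and the split ratio.

```python
import random

def normalize_pixels(pixels):
    """Map 8-bit pixel values from [0, 255] to [-1, 1] (assumed scaling)."""
    return [p / 127.5 - 1.0 for p in pixels]

def train_test_split(samples, train_fraction=0.8, seed=42):
    """Shuffle the samples and split them into training and testing parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Darkest, mid-grey, and brightest pixel values after scaling.
scaled = normalize_pixels([0, 127.5, 255])

# Split the indices of the 9,339 Malimg samples 80:20.
train_ids, test_ids = train_test_split(range(9339))
```

Scaling to [-1, 1] matches the output range of a tanh activation, so real and generated images share the same value range.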
The generator model produces data that resembles the real samples in the dataset. Here, the real
data are grayscale malware images of size 224×224 pixels, so the generator must produce images
of the same size from a latent vector of length 100. It starts with a fully connected layer that
maps the 100-dimensional latent input to 7×7×256 units, which are reshaped into 256 feature
maps of size 7×7. The following layers use Conv2DTranspose and LeakyReLU layers to upsample
the feature maps to 14×14, 28×28, 56×56, and 112×112, and the final layer uses the tanh
activation function to produce a fake image of size 224×224. The model architecture for
generator 𝐺 is given in Figure 5(a).
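The size progression in the generator can be checked with simple arithmetic. The sketch below assumes that every transposed convolution uses a stride of 2 with 'same' padding, which doubles the side length at each step; the stride and padding are assumptions, since the text gives only the resulting sizes.

```python
def transposed_conv_same(size, stride=2):
    """Output side length of a strided transposed convolution with 'same' padding."""
    return size * stride

def generator_shapes(start=7, target=224):
    """List the feature-map side lengths from the reshaped dense output to the image."""
    sizes = [start]
    while sizes[-1] < target:
        sizes.append(transposed_conv_same(sizes[-1]))
    return sizes

# Number of units the first dense layer must produce for 256 maps of size 7x7.
dense_units = 7 * 7 * 256

# Side lengths after each upsampling step.
shapes = generator_shapes()
```

Under these assumptions, five doubling steps are needed to go from 7×7 to 224×224.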
The discriminator model takes a 224×224 input image and classifies it as real or fake. In this
experimental work, the discriminator model contains three sets of layers built from 2D
convolution and LeakyReLU layers. The final set of layers contains a flatten layer, followed by
a dropout layer with a dropout rate of 0.4 and a final dense layer with a single output and a
sigmoid activation function. The model architecture for discriminator 𝐷 is shown in
Figure 5(b).
Figure 5. (a) Generator model architecture, and (b) Discriminator model architecture
A GAN model is created by combining the two separate models, generator and discriminator, in
a sequential model using the Keras library (Figure 6).
The GAN model has two sequential components: first the generator, which takes a latent input of
size 100 and generates a 224×224 fake image, and then the discriminator, which takes a 224×224
input image and produces a binary output of 0 or 1. Here, 0 represents a fake image and 1
represents a real image. The GAN model is then trained on the set of real and fake images,
adjusting the model parameters until the generator is competitive with the discriminator.
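With the label convention above (1 for real, 0 for fake), the two training objectives can be written as binary cross-entropy losses. The sketch below is a minimal illustration; it assumes binary cross-entropy for GAN training and the common practice of training the generator against the label 1, neither of which the text states explicitly.

```python
import math

def bce(label, prob, eps=1e-7):
    """Binary cross-entropy for one prediction, clipped for numerical safety."""
    prob = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))

def discriminator_loss(d_real, d_fake):
    """Real images are labelled 1 and generated images are labelled 0."""
    real = sum(bce(1, p) for p in d_real) / len(d_real)
    fake = sum(bce(0, p) for p in d_fake) / len(d_fake)
    return real + fake

def generator_loss(d_fake):
    """The generator is rewarded when the discriminator outputs 1 for fakes (assumed)."""
    return sum(bce(1, p) for p in d_fake) / len(d_fake)

# Illustrative discriminator outputs: it currently separates the two sets well,
# so its own loss is low and the generator's loss is high.
d_loss = discriminator_loss([0.9, 0.8], [0.2, 0.1])
g_loss = generator_loss([0.2, 0.1])
```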
The Conditional Generative Adversarial Network (CGAN) [9] model extends the discriminator so
that it classifies samples into the classes of a given dataset. For this, a dense layer with the
softmax activation function is added as the final layer for multiclass malware classification. The
model created for multiclass malware classification is named ‘CGAN Model1’ (Figure 7(a)).
A second CGAN model, named ‘CGAN Model2’, is created for the malware detection task. Here
the GAN model is extended by adding a flatten layer, followed by one dense layer with the ReLU
activation function and a final output layer with a sigmoid activation function for binary
classification (Figure 7(b)).
The performance metrics used to evaluate the CGAN model are described below.
3.6.1. Accuracy
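Accuracy measures the overall correctness of the model as the proportion of correctly predicted
instances, both positive and negative, out of all predictions. Here TP, TN, FP, and FN denote the
numbers of true positive, true negative, false positive, and false negative samples.
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$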
3.6.2. Precision
Precision is the proportion of true positive samples among all samples predicted as positive by
the model. It is also called the positive predictive value.
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
Figure 7. (a) Classification Model CGAN Model1, and (b) Detection Model CGAN Model2
3.6.3. Recall
Recall is the proportion of correctly predicted positive samples out of all actual positive
samples. It is also called the true positive rate.
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
3.6.4. F1-score
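The F1-score is the harmonic mean of precision and recall.
$$F_1\text{-score} = \frac{2 \cdot (\mathrm{Recall} \cdot \mathrm{Precision})}{\mathrm{Recall} + \mathrm{Precision}}$$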
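The four metrics above can be computed directly from the confusion counts. The sketch below is a minimal illustration; the counts passed in are made-up values for demonstration and are not results from this work.

```python
def binary_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall, and F1-score from confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Illustrative counts only: 90 TP, 85 TN, 15 FP, 10 FN.
m = binary_metrics(tp=90, tn=85, fp=15, fn=10)
```

For a multiclass problem, the same function can be applied per class by treating that class as positive and all others as negative.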
The Receiver Operating Characteristic (ROC) curve plots the true positive rate against the false
positive rate as the decision threshold varies between 0 and 1. The area under this curve (AUC)
summarizes how well the model separates positive from negative samples.
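The area under the ROC curve equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative sample, with ties counted as one half. The sketch below computes it that way for a binary problem; the labels and scores are illustrative values, and the function name is hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))

# Illustrative scores: three of the four positive/negative pairs are ordered correctly.
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])
```

This pairwise form is quadratic in the number of samples, so it suits small checks rather than full test sets.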
The confusion matrix tabulates the model’s predicted class against the true class for every
sample. It makes it easier to understand the performance of the classification and detection
models in terms of the true positive, false positive, true negative, and false negative samples of
each class.
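A multiclass confusion matrix and the per-class counts derived from it can be sketched as follows. The layout (rows for true classes, columns for predicted classes) and the toy labels are assumptions for illustration.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Build a matrix where entry [t][p] counts samples of true class t predicted as p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix

def per_class_counts(matrix, k):
    """Return (TP, FP, TN, FN) for class k, treating all other classes as negative."""
    tp = matrix[k][k]
    fn = sum(matrix[k]) - tp                   # class k samples predicted as something else
    fp = sum(row[k] for row in matrix) - tp    # other samples predicted as class k
    tn = sum(sum(row) for row in matrix) - tp - fn - fp
    return tp, fp, tn, fn

# Toy example with three classes and six samples.
cm = confusion_matrix([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], num_classes=3)
counts_class1 = per_class_counts(cm, 1)
```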
The experimental setup consists of the computational resources, software tools, and libraries
used to create, train, and evaluate the GAN model. Google Colaboratory is a cloud-based
platform that provides computational resources on demand. The experiments were performed
on an A100 GPU runtime with 83.5 GB of system RAM, 40 GB of GPU RAM, and 201.3 GB of
disk storage. The virtual computing platform had Python 3 (version 3.10.12), TensorFlow
(version 2.15.0), and the NumPy library installed.
4. EXPERIMENTAL ANALYSIS
The GAN-based detection model was evaluated using multiple performance metrics, including
accuracy, precision, recall, F1-score, confusion matrix, and ROC-AUC curve.
4.1. Malware Classification
After training the GAN model on the real and fake malware image samples, the model is
modified for multiclass classification. The models are compiled with two different optimizers,
Adam and Stochastic Gradient Descent (SGD), and named ‘CGAN Model1’ and ‘CGAN
Model2’, respectively. The CGAN multiclass classifiers are trained on the training split of the
grayscale image dataset (Dataset-1) for 50 epochs. The training and validation accuracy and loss
curves for the classification models are shown in Figure 8.
The performance of both classification models is evaluated on the test dataset. CGAN Model1
and CGAN Model2 achieve test accuracies of 96.96% and 95.67%, respectively. The
performance metrics for CGAN Model1 are given in Table 1.
The confusion matrix shows that CGAN Model1 accurately classifies most classes, except
for ‘Swizzor.gen!I’ (Figure 9). It performs well in distinguishing similar classes, such as
‘Allaple.A’ and ‘Allaple.L’, and the various ‘Lolyda’ variants (‘Lolyda.AA1’, ‘Lolyda.AA2’,
‘Lolyda.AA3’, ‘Lolyda.AT’). The ROC-AUC curves for malware classification show that the
AUC value for CGAN Model1 surpasses that of CGAN Model2, indicating better overall
classification performance and a lower false positive rate (Figure 10).
The CGAN model was adapted for binary classification to detect malware. CGAN Model1 was
compiled using the Adam optimizer, while CGAN Model2 used the Stochastic Gradient Descent
(SGD) optimizer, both using binary_crossentropy as the loss function. These models were trained
on Dataset-2 for 50 epochs. The resulting accuracy and loss trends over the training process are
illustrated in Figure 11, demonstrating how each model performed across epochs during training.
For malware detection, CGAN Model1 achieves a detection accuracy of 95.19% and
CGAN Model2 achieves a detection accuracy of 93.69% on the test dataset. The
performance metrics of the malware detection model CGAN Model1 are shown in Table 2.
The confusion matrix for the malware detection model CGAN Model1 shows a false positive
rate that is significantly lower than the false negative rate (Figure 12). CGAN Model1 performs
better than CGAN Model2, as it achieves a higher ROC-AUC value (Figure 13).
The performance of the proposed GAN model for malware detection and classification is
compared with the state of the art in Table 4. The proposed malware classification model
‘CGAN Model1’, using the Adam optimizer on the Malimg dataset, outperformed all the
compared models with the highest classification accuracy of 96.96%.
Table 4. Performance Comparison with State-of-the-Art.
ACKNOWLEDGEMENTS
The authors are thankful to the Ministry of Social Justice & Empowerment, India, for providing
financial support as an RGNF-SC fellowship for the research work.
REFERENCES
[1] Prachi Chauhan, Hardwari Lal Mandoria, and Alok Negi, “Security and Privacy Defensive
Techniques for Cyber Security Using Deep Neural Networks (DNNs),” in Advanced Smart
Computing Technologies in Cybersecurity and Forensics, I., 2021, pp. 11–22.
[2] M. Rai and H. L. Mandoria, “A Study on Cyber Crimes, Cyber Criminals and Major Security
Breaches,” International Research Journal of Engineering and Technology, no. July, 2008.
[3] K. Kumar and A. Dwivedi, “Big Data Issues and Challenges in 21st Century,” International Journal
on Emerging Technologies (Special Issue NCETST-2017), vol. 8, no. 1, pp. 72–77, 2017, [Online].
Available: www.researchtrend.net
[4] A. Dwivedi, R. P. Pant, S. Pandey, and K. Kumar, “Internet of Things’ (IoT’s) Impact on Decision
Oriented Applications of Big Data Sentiment Analysis,” in Proceedings - 2018 3rd International
Conference On Internet of Things: Smart Innovation and Usages, IoT-SIU 2018, 2018.
doi: 10.1109/IoTSIU.2018.8519922.
[5] H. Shahnawaz, S. C. Gupta, C. Mukesh, and H. L. Mandoria, “A proposed model for intrusion
detection system for mobile adhoc network,” in 2010 International Conference on Computer and
Communication Technology, ICCCT-2010, 2010. doi: 10.1109/ICCCT.2010.5640420.
[6] K. Kumar and H. L. Mandoria, “Cybersecurity Threats, Forensics, and Challenges,”
Proceedings of International Conference, Intelligent Control, Robotics, and Industrial Automation.
RCAAI 2023, Vol 1220, pp. 281-295, October 2024. Springer, Singapore.
doi: 10.1007/978-981-97-4650-7_21.
[7] I. J. Goodfellow et al., “Generative adversarial nets,” in Advances in Neural Information Processing
Systems, 2014. doi: 10.1007/978-3-658-40442-0_9.
[8] A. Dash, J. Ye, and G. Wang, “A Review of Generative Adversarial Networks (GANs) and Its
Applications in a Wide Variety of Disciplines: From Medical to Remote Sensing,” IEEE Access,
vol. 12, 2024. doi: 10.1109/ACCESS.2023.3346273.
[9] M. Mirza and S. Osindero, “Conditional Generative Adversarial Nets,” arXiv:1411.1784v1
[cs.LG], Nov. 2014.
[10] S. Pujari, H. L. Mandoria, R. P. Shrivastava, and R. Singh, “To Identify Malware Using Machine
Learning Algorithms,” in Communications in Computer and Information Science, 2022. DOI:
10.1007/978-3-031-10551-7_9.
[11] M. Rai and H. L. Mandoria, “Network Intrusion Detection: A comparative study using state-of-the-
art machine learning methods,” in IEEE International Conference on Issues and Challenges in
Intelligent Computing Techniques, ICICT 2019, 2019. DOI: 10.1109/ICICT46931.2019.8977679.
[12] R. Nagaraju and M. Stamp, “Auxiliary-Classifier GAN for Malware Analysis,” in Advances in
Information Security, vol. 54, 2022. doi: 10.1007/978-3-030-97087-1_2.
[13] K. T. Chui, B. B. Gupta, V. Arya, R. Bansal, and F. Colace, “A Lightweight Generative Adversarial
Network for Imbalanced Malware Image Classification,” in ACM International Conference
Proceeding Series, Association for Computing Machinery, Nov. 2023. DOI:
10.1145/3647444.3652455.
[14] D. O. Won, Y. N. Jang, and S. W. Lee, “PlausMal-GAN: Plausible Malware Training Based on
Generative Adversarial Networks for Analogous Zero-Day Malware Detection,” IEEE Trans Emerg
Top Comput, vol. 11, no. 1, pp. 82–94, Jan. 2023. doi: 10.1109/TETC.2022.3170544.
[15] C. Reilly, S. O’Shaughnessy, and C. Thorpe, “Robustness of Image-Based Malware Classification
Models trained with Generative Adversarial Networks,” in ACM International Conference
Proceeding Series, Association for Computing Machinery, Jun. 2023, pp. 92–99. DOI:
10.1145/3590777.3590792.
[16] Y. Lu and J. Li, “Generative Adversarial Network for Improving Deep Learning Based Malware
Classification.”
[17] L. Nataraj, S. Karthikeyan, G. Jacob, and B. S. Manjunath, “Malware images: Visualization and
automatic classification,” in ACM International Conference Proceeding Series, 2011. DOI:
10.1145/2016904.2016908.
[18] B. Saridou, J. Rose, S. Shiaeles, and B. Papadopoulos, “48,240 Malware samples and binary
visualisation images for machine learning anomaly detection (2021),” IEEE Dataport, 2022.
doi: https://dx.doi.org/10.21227/vs0r-8s26.
AUTHORS
Krishna Kumar received the BTech in Information Technology in 2013 and the MTech in Computer
Science and Engineering in 2016. Currently, he is pursuing a PhD in information
technology from the Department of Information Technology, College of Technology,
G. B. Pant University of Agriculture and Technology, Pantnagar, Uttarakhand, India.
His research interests focus on Cyber Security, Machine Learning, and Cyber
Forensics.
Hardwari Lal Mandoria completed his doctoral degree from Maulana Azad National Institute of
Technology Barkatullah University, Bhopal (Madhya Pradesh) in Computer Science &
Engineering in 2003. He is a Professor in Department of Information Technology,
College of Technology, G. B. Pant University of Agriculture and Technology,
Pantnagar, Uttarakhand, India. He has professional experience of more than 33 years in
teaching and research in the field of Computer Science and Engineering including two
years of foreign teaching experience under UNDP at Ethiopia. He has published more
than 100 research papers in various National and International peer-reviewed journals. He has written one
book, many book chapters and successfully guided 03 PhD and 21 MTech students. His research expertise
covers a wide range of areas, including Computer Networks, Information Security, Wireless
Communication Networks, Cyber Security & Forensics.
Rajeev Singh received his M.Tech. degree in computer science & engineering from Indian Institute of
Technology, Roorkee, India in 2008 and his Ph.D. degree from National Institute of
Technology, Hamirpur, India in 2014. Currently, he is working as a Professor with the
Department of Computer Engineering, Govind Ballabh Pant University of Agriculture &
Technology, Uttarakhand, India. His research interest includes Computer Networks &
Network Security, Information Systems Network Security Modeling and Simulation
Wireless LANs Programming Languages.
Shri Prakash Dwivedi received his M. E. degree in Computer Science & Automation from Indian Institute
of Science (IISc) and his PhD degree in Computer Science & Engineering from Indian
Institute of Technology (Banaras Hindu University), Varanasi. Currently, he is working
as an Assistant Professor with the Department of Information Technology, G. B. Pant
University of Agriculture & Technology, Pantnagar, India. His research interest includes
Algorithms, Graph Matching and Pattern Recognition.
Paras received his BTech, MTech, and PhD degrees in Electronics & Communication
Engineering from G. B. Pant University of Agriculture & Technology, Pantnagar, India.
Currently, he is working as a Professor in the Department of Electronics &
Communication Engineering at G. B. Pant University of Agriculture & Technology,
Pantnagar, India. He has more than 21 years of professional experience in teaching and
research. He has guided 14 MTech students and one PhD student. His primary research
interests include Antennas, Metamaterials and Artificial Neural Networks.