Multiclass Classification of DGA Based Malware Using NLP

This document discusses multiclass classification of domain generation algorithm (DGA) based malware using deep learning. It aims to classify malware samples into their respective families, beyond only distinguishing legitimate from malicious domains as in previous binary classification experiments. However, prior work has shown poor performance in distinguishing similar malware families or detecting newer, more sophisticated DGAs. This research aims to address gaps in multiclass classification through techniques like data balancing and different encoding methods in deep learning.

Uploaded by

Bereket Hailu

Multiclass classification of DGA based malware in deep learning

The binary experiment is designed to answer the ML question of separating legitimate FQDNs
from malicious algorithmically generated domains (AGDs), treating all malware families as a
single category. The multiclass experiment goes beyond the binary one: it not only identifies
the legitimate FQDNs but also sorts malware samples according to their families (Mattia Zago
et al., 2019).
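The difference between the two experiments can be illustrated with a minimal sketch of how the same labelled data might be prepared for each. The label name "legit" and the helper names are assumptions made for illustration; they are not taken from Zago et al.

```python
# Hypothetical label used for legitimate FQDNs (an assumption for this sketch).
LEGIT_LABEL = "legit"

def to_binary(labels):
    """Binary experiment: collapse every malware family into one 'malicious' class."""
    return [LEGIT_LABEL if label == LEGIT_LABEL else "malicious" for label in labels]

def to_class_ids(labels):
    """Multiclass experiment: keep one class per family and map each to an integer id."""
    classes = sorted(set(labels))
    index = {name: i for i, name in enumerate(classes)}
    return [index[label] for label in labels], classes

family_labels = ["legit", "matsnu", "suppobox", "matsnu"]
binary_labels = to_binary(family_labels)
class_ids, class_names = to_class_ids(family_labels)
```

In the binary view the model only has to separate two groups, while in the multiclass view it must also tell the families apart, which is where similar families become hard to distinguish.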
Machine learning models that attempt DGA classification based only on the domain name
itself, such as those evaluated by Peck et al., might not be sufficient to detect a DGA like
CharBot. This result highlights the need for ML models that exploit additional context features,
such as the IP addresses that the domains are mapped to or temporal access patterns (e.g.
how often the domain was requested, and when) [3], [16]–[18], as was done successfully for
dictionary DGAs [10] (Jonathan Peck et al., 2019).
Research questions
• Which feature reduction strategy best approximates the data? Preliminary results using
nonlinear feature reduction techniques seem promising. Candidate features include character
features, Unicode features, and bag-of-words n-gram models.
• Can we increase the performance of multiclass classification by balancing our data?
• Can we detect all malware families?
• Can the encoding technique improve performance?
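As an illustration of the kinds of encodings these questions refer to, the following minimal sketch shows two common ways of turning a domain name into model input: a padded sequence of character ids, and a bag of character n-grams. The alphabet, the maximum length, and the function names are assumptions chosen for this example, not settings from any of the cited studies.

```python
from collections import Counter

# Assumed alphabet: lowercase letters, digits, hyphen and dot.
# Id 0 is reserved for padding; one extra id is used for unknown characters.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-."
CHAR_TO_ID = {ch: i + 1 for i, ch in enumerate(ALPHABET)}
UNKNOWN_ID = len(ALPHABET) + 1

def encode_chars(domain, max_len=20):
    """Character-level encoding: map each character to an id, then truncate or pad."""
    ids = [CHAR_TO_ID.get(ch, UNKNOWN_ID) for ch in domain.lower()[:max_len]]
    return ids + [0] * (max_len - len(ids))

def char_ngrams(domain, n=2):
    """Bag-of-n-grams encoding: count every run of n consecutive characters."""
    text = domain.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

sequence = encode_chars("ab.c", max_len=6)
bigrams = char_ngrams("abab", n=2)
```

The id sequence is the usual input to an embedding layer followed by a CNN or RNN, whereas the n-gram counts suit classifiers that expect a fixed-length feature vector.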
Objectives
• To study and analyze the properties of each malware family
• To apply a multiclass classification solution using deep learning
• To analyze the statistical properties of the malicious domains of specific families

Literature review (Gap analysis)

In 2020, Karunakaran P. came up with a deep learning approach to DGA classification for detecting
DGAs that generate malicious domains randomly []. They achieved 94.9% accuracy for DGA
classification with the help of additional feature extraction and knowledge-based extraction in the
deep learning architecture (CNN and RNN). However, they used only 20 malware types in their
dataset, which suggests that their model will fail to detect other malware types. They also did not
mention how many of those types were detected in the training or testing phase. I also observed
that they evaluated their model using accuracy alone, whereas other metrics such as the false
positive rate (FPR) are essential in actual deployment (Bin Yu et al., 2018).
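To make the point about metrics concrete, the sketch below computes recall and false positive rate per class alongside overall accuracy. The function names and the toy labels are assumptions for illustration only. FPR is taken here in the one-vs-rest sense: the share of samples not belonging to a class that were nevertheless assigned to it.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def per_class_rates(y_true, y_pred):
    """One-vs-rest recall and false positive rate for every class."""
    rates = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        tn = sum(t != label and p != label for t, p in pairs)
        rates[label] = {
            "recall": tp / (tp + fn) if tp + fn else 0.0,
            "fpr": fp / (fp + tn) if fp + tn else 0.0,
        }
    return rates

# Toy data: nine legitimate domains and one matsnu domain, all predicted "legit".
y_true = ["legit"] * 9 + ["matsnu"]
y_pred = ["legit"] * 10
overall = accuracy(y_true, y_pred)
rates = per_class_rates(y_true, y_pred)
```

In this toy case the accuracy is 0.9 even though the single malware sample is missed entirely, which is why accuracy alone says little about a model trained on skewed data.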
In 2019, Mattia Zago et al. presented the state of the art with the aim of refining the feature discovery process,
which is the single most time-consuming part of any ML approach to detecting DGA-based botnets [].
Their results show that only a minor fraction of the defined features are actually practical and informative,
especially when considering 0-day botnet identification. Both binary and multiclass classification
experiments were performed. The multiclass experiment performed worse than the binary
one because the model was unable to distinguish similar malware families such as Oakbot and Matsnu.
In 2019, Ryan R. Curtin et al. proposed detecting DGA domains with a combination of a novel recurrent
neural network architecture and domain registration side information. Their model was
able to detect DGA families with a high smashword score, but with low accuracy. In addition, the
difficult-to-detect word-based malware called CharBot was not included in the
experiment. Their model was also unable to detect DGA families that do not look like natural domain
names.

In the same year, Daniel S. Berman proposed a 1D application of capsule networks to DGA detection.
They used CapsNet, CNN and LSTM algorithms to detect different types of DGA malware []. Their
experiment did not succeed in detecting some families, such as vawtrak, Vidro, Sphinx, corebot, virut and
cryptowall.

The greatest weakness of all the models tested is their deficiency in detecting real-word-based
DGAs. In some cases, these real-word-based DGAs use a limited dictionary to
generate domain names and change that dictionary after some time. This manifests in three
ways. The first is that when the model is trained on data from that DGA, time is not taken into
account and the model fails to detect the malicious domain names, as is the case for matsnu
and gozi. The second is when the model can detect the malicious domain names only when it is
trained on data from that DGA, regardless of time, but fails to detect them otherwise, as is the case
for unknowndropper. Finally, there are models that initially perform well but whose
performance declines significantly over time because of a change in the DGA generator, as is the case
with pizd and suppobox. Developing a model capable of detecting malicious domains in all
three of these situations is critical, and all models tested here fail to do so [] (Ryan R. Curtin et al.,
2019).
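One way to expose the time-related failures described above is to evaluate on domains observed after the training period instead of on a random split. The sketch below is a minimal illustration under stated assumptions: each sample is a hypothetical (domain, family, first_seen) tuple, and the domain names, dates, and cutoff are placeholders, not data from the cited study.

```python
from datetime import date

def temporal_split(samples, cutoff):
    """Train on samples first seen before the cutoff date and test on the rest."""
    train = [s for s in samples if s[2] < cutoff]
    test = [s for s in samples if s[2] >= cutoff]
    return train, test

# Placeholder samples: (domain, family, first_seen). All values are made up.
samples = [
    ("example-one.com", "suppobox", date(2018, 1, 10)),
    ("example-two.com", "suppobox", date(2018, 3, 5)),
    ("example-three.com", "suppobox", date(2018, 9, 20)),
    ("example-four.com", "pizd", date(2018, 11, 2)),
]
train, test = temporal_split(samples, cutoff=date(2018, 6, 1))
```

If a family changes its dictionary after the cutoff, the drop in performance appears in the test set, whereas a random split would mix old and new domains and hide it.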
Jonathan Peck et al. presented a novel DGA called CharBot, which is capable of producing large
numbers of unregistered domain names. In their experiment they observed very poor performance
from state-of-the-art classifiers for real-time detection of DGAs, including the recently
published methods FANCI (a random forest based on human-engineered features) and LSTM.MI
(a deep learning approach). They sought to highlight a dangerous weakness of modern DGA
classifiers, namely their vulnerability to extremely simple attacks that make no use of
sophisticated machine learning techniques.
Yanchen Qiao et al. proposed a DGA domain name classification method based on Long Short-Term
Memory (LSTM) with an attention mechanism []. They used the character sequence of the domain name as
a feature, but because of an imbalanced dataset they achieved poor performance for 10 of the 18
malware classes.
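A common way to counter this kind of imbalance is to weight each class inversely to its frequency during training. The sketch below uses the heuristic weight = N / (K * count), where N is the number of samples and K the number of classes. It is an illustrative assumption, not the method used by Qiao et al., and the family names and counts are made up.

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Give each class a weight of N / (K * count), so rare classes weigh more."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}

# Toy label set: one frequent family and one rare family.
labels = ["suppobox"] * 8 + ["matsnu"] * 2
weights = inverse_frequency_weights(labels)
```

These weights can be supplied to a weighted loss function so that errors on rare families cost more. Resampling the data (oversampling rare families or undersampling frequent ones) is an alternative that serves the same goal.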

Xiaochun Yun et al. proposed Khaos, a novel DGA with high anti-detection ability based on neural
language models and the Wasserstein Generative Adversarial Network (WGAN). The experimental results
show that Khaos outperforms the other nine DGAs on all detection indices of the detection approaches, but the
others were detected with poor performance.

Much research has been done on the classification and detection of DGA-based malware, but it has been
unsuccessful in identifying some malware. Current detection systems have difficulty categorizing samples
by malware family because of the nature of the reduction and classification algorithms used,
inappropriate context, and unbalanced data (Duc Tran, ). This research aims to fill the gap in
multiclass classification by using different encoding techniques in deep learning.
