
Evolving Strategies in Android Malware

Detection: Leveraging Network Insights

A project report submitted in fulfillment of the requirements


for the award of

Degree of Bachelor of Technology in Computer


Science and Engineering
by
Mohak (2020UCP1025)
Dommari Anitha (2020UCP1070)
Under the Supervision of

Dr. Satyendra Singh Chouhan

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING


MALAVIYA NATIONAL INSTITUTE OF TECHNOLOGY

May 14, 2024

1
Declaration

We,

Mohak (2020UCP1025)
Dommari Anitha (2020UCP1070)

declare that this project titled ”Evolving Strategies in Android Malware Detection: Leveraging
Network Insights” and the work presented in it are our own. We confirm that:

• This project work was done wholly or mainly while in candidature for a B.Tech. De-
gree in the Department of Computer Science and Engineering at Malaviya National
Institute of Technology, Jaipur (MNIT).

• Where any part of this report has previously been submitted for a degree or any other
qualification at MNIT or any other institution, this has been clearly stated. Where we
have consulted the published work of others, this is always clearly attributed. Where
we have quoted from the work of others, the source is always given. With the exception
of such quotations, this project is entirely our own work.

• We have acknowledged all main sources of help.

Mohak Dommari Anitha


2020UCP1025 2020UCP1070

i
Certificate

This is to certify that this B.Tech project titled ”Evolving Strategies in Android Malware De-
tection: Leveraging Network Insights” submitted by Mohak (2020UCP1025) and Dommari
Anitha (2020UCP1070) in partial fulfilment of the requirement for the award of the Bachelor of
Technology (CSE) degree is a record of the students’ project work carried out under my supervision
and guidance.

Date

Dr. Satyendra Singh Chouhan

Assistant Professor
Department of Computer Science and Engineering
Malaviya National Institute of Technology

ii
Abstract

This project report focuses on the design and development of a network-
traffic-based technique for identifying malicious applications. The study aims
to develop an efficient and reliable method for differentiating between benign,
adware and general malicious network traffic, contributing to enhanced net-
work security. The report is structured into four main sections: Introduc-
tion, Related Work, Proposed Methodology and Algorithms, and Experimen-
tal Setup and Result Analysis.
The introduction provides a brief overview of the problem and the sig-
nificance of detecting malicious applications in network traffic. The related
work section presents a review of existing literature on network traffic anal-
ysis and malicious application detection. In the proposed methodology and
algorithms section, various techniques and tools used in this study are de-
scribed, including the available dataset, architectures such as Flow Labeler
and CICFlowMeter, feature selection, and machine learning algorithms such
as K-Nearest Neighbours, AdaBoost, Random Forest, Decision tree, Logistic
Regression, MLPC and Linear Discriminant classifiers.
The experimental setup and result analysis section details the experiment
environment, data preprocessing, and classification using Flow Labeler. It
also covers dimensionality reduction, output labels, and the results obtained
from the study. Finally, the report presents a classification of benign, adware
and malicious network traffic, highlighting the effectiveness of the proposed
technique in detecting malicious applications.
In conclusion, the report demonstrates the successful development and
implementation of a network-based traffic technique for detecting malicious
applications, providing valuable insights for enhancing network security and
reducing the risk of cyber threats.

iii
Acknowledgements

We are deeply grateful to our esteemed supervisor, Dr. Satyendra Singh


Chouhan, for his invaluable guidance and dedication of his time to our work.
His expert insights and suggestions have been instrumental in the successful
completion of this project. We are also grateful for the technical and admin-
istrative support he provided throughout our research.
In addition, we would like to express our gratitude to our Head of Depart-
ment, Dr. Namita Mittal, for her continued encouragement and support in
our academic pursuits.
We also extend our appreciation to the Department of Computer Science
and Engineering at Malaviya National Institute of Technology, Jaipur, as well
as all individuals who have played a vital role in the successful realization of
this report.

iv
Contents

Declaration 1

Certificate i

Abstract ii

Acknowledgements iii

Contents iv

List of Figures vi

List of Tables vii

1 Introduction 1

2 Related Work 2

3 Proposed methodology and algorithms 4


3.1 Dataset Generation . . . . . . . . . . . . . . . . . . . . . . 4
3.2 Architectures . . . . . . . . . . . . . . . . . . . . . . . . . 5
3.2.1 Manual Flow Labeling Methodology . . . . . . . . 5
3.2.2 CICFlowMeter . . . . . . . . . . . . . . . . . . . . 7
3.2.3 Feature Selection . . . . . . . . . . . . . . . . . . . 8
3.3 Algorithms Used for Training Dataset . . . . . . . . . . . . 9
3.3.1 K-Nearest Neighbours (KNN) . . . . . . . . . . . . 9
3.3.2 AdaBoost . . . . . . . . . . . . . . . . . . . . . . . 10
3.3.3 Random Forest . . . . . . . . . . . . . . . . . . . . 11
3.3.4 Logistic Regression . . . . . . . . . . . . . . . . . . 13
3.3.5 Decision Tree Algorithm . . . . . . . . . . . . . . . 14
3.3.6 Multilayer Perceptron (MLP) . . . . . . . . . . . . 15
3.3.7 Linear Discriminant Analysis (LDA) . . . . . . . . 17
3.4 Loss Functions Used . . . . . . . . . . . . . . . . . . . . . 18
3.4.1 Cross Entropy Loss . . . . . . . . . . . . . . . . . . 18

v
3.4.2 Mean Absolute Loss . . . . . . . . . . . . . . . . . 19

4 Experimental Setup and Result Analysis 20


4.1 Experiment Environment . . . . . . . . . . . . . . . . . . . 20
4.2 Data Preprocessing . . . . . . . . . . . . . . . . . . . . . . 20
4.3 Output Labels . . . . . . . . . . . . . . . . . . . . . . . . . 21
4.3.1 Results . . . . . . . . . . . . . . . . . . . . . . . . 21
4.4 Classification of Benign, Malicious and Adware Network Traffic 31
4.4.1 Application . . . . . . . . . . . . . . . . . . . . . . 34

5 Conclusion and Future Scope 38

vi
List of Figures
3.1 CICFlowMeter Working . . . . . . . . . . . . . . . . . . . 8

4.1 Confusion Matrix -knn . . . . . . . . . . . . . . . . . . . . 22


4.2 Canadian Institute for Cybersecurity-AAGM2017 (Dataset) . 22
4.3 Confusion Matrix-random forest . . . . . . . . . . . . . . . 23
4.4 Canadian Institute for Cybersecurity-AAGM2017 (Dataset) . 24
4.5 Application . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.6 Canadian Institute for Cybersecurity-AAGM2017 (Dataset) . 25
4.7 Application . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.8 Canadian Institute for Cybersecurity-AAGM2017 (Dataset) . 27
4.9 Application . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.10 Canadian Institute for Cybersecurity-AAGM2017 (Dataset) . 28
4.11 Application . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4.12 Canadian Institute for Cybersecurity-AAGM2017 (Dataset) . 30
4.13 Ensembled Confusion Matrix of All models . . . . . . . . . 31
4.14 Correlation Matrix . . . . . . . . . . . . . . . . . . . . . . 33
4.15 Home screen of the application . . . . . . . . . . . . . . . . 35
4.16 counts of different categories . . . . . . . . . . . . . . . . . 35
4.17 counts of different categories . . . . . . . . . . . . . . . . . 36
4.18 counts of different categories . . . . . . . . . . . . . . . . . 36

vii
List of Tables
4.1 Performance Metrics for six Algorithms (Macro Average) . . 37
4.2 Performance Metrics for Ensembled Predictions (Macro Average) 37

viii
Chapter 1
Introduction

The widespread availability of Android devices has attracted the attention


of malicious actors who aim to exploit vulnerabilities and compromise user
data. In response to this escalating threat landscape, our project focuses on
developing an advanced malware detection system specifically tailored for
Android environments.
Our primary objective is to accurately identify and classify malware ap-
plications by analyzing network traffic patterns. These patterns are crucial
indicators of malicious activity, allowing us to differentiate between benign
apps, network-based malware apps (which engage in activities like unautho-
rized file transfers or remote access), and OS-level malware apps that corrupt
the device without network involvement.
To achieve this, we employ a sophisticated approach that involves extract-
ing meaningful features from network flows. These features encapsulate key
behaviors and characteristics exhibited by different types of apps, enabling
our classification model to discern between legitimate and malicious applica-
tions with high precision.
By harnessing the power of machine learning algorithms and big data an-
alytics, our system can effectively detect and mitigate the risks posed by mal-
ware in Android devices. Through extensive experimentation and evaluation
using real-world data, we demonstrate the efficacy and reliability of our ap-
proach in identifying diverse forms of malware and enhancing the overall
security posture of Android ecosystems.
In summary, our project represents a significant step towards bolstering
cybersecurity defenses in the Android domain, providing organizations and
users with a robust toolset for proactive threat detection.

1
Chapter 2
Related Work

The paper by Arash Habibi Lashkari[1] addresses the increasing threat of


mobile malware, particularly on the Android platform. The authors stress
the essential requirement for efficient and dependable detection systems to
protect both users and cellular infrastructure companies from malicious ap-
plications.
Their research proposes a novel detection and characterization system that
focuses on identifying significant deviations in the network behavior of smart
devices applications. The proposed framework aims to detect and classify
various types of malware, including adware, with just nine traffic feature
measurements. They employ machine learning techniques, including clas-
sifiers like Random Forest Classifier, KNN-Classifier, Decision Tree Clas-
sifier, Random Tree Classifier, and Logistic-Regression classifier, achieving
promising accuracy rates.
The paper offers a labeled dataset comprising 400 Android device malware
traffic samples, encompassing both benign applications and 12 different fam-
ilies of malware, including adware. The authors highlight the importance of
this dataset for advancing research in Android malware detection.
This proposed model streamlines the process of Android malware detec-
tion and characterization by leveraging network traffic analysis. By employ-
ing a systematic approach that integrates data collection, feature measure-
ment, and machine learning techniques, the model demonstrates potential for
accurately identifying and categorizing malware on mobile devices.
In the literature review section, the paper surveys previous work in An-
droid malware detection, emphasizing the significance of dynamic analysis
of network traffic. It discusses various approaches, including static and dy-
namic detection methods, sensor-based detection, and network-based behav-
ioral analysis, highlighting their strengths and limitations.

In total, the paper provides a detailed view of the challenges posed by


Android malware and proposes a network-based detection framework with

2
promising results, supported by a substantial dataset for further research and
evaluation.

Another paper titled ”Machine Learning Models for Network Traffic Clas-
sification in Programmable Logic” explores the application of machine learn-
ing models for network traffic classification, particularly focusing on the anal-
ysis of network packet payloads. It highlights the significance of accurate
classification and identification of anomalous payloads for network security.
The study investigates the implementation of several neural network mod-
els, including convolutional neural networks (CNNs), residual neural net-
works (ResNets), autoencoders, and variational autoencoders, on field pro-
grammable gate arrays (FPGAs) such as the Xilinx VC1902.

These models are evaluated for their inference speeds and accuracy in
classifying packet payloads, with a specific emphasis on achieving inference
speeds exceeding 10,000 packets per second. The performance of FPGA
implementations is compared against that of advanced graphics processing
units (GPUs), specifically the NVIDIA V100 and A100.

The study provides insights into the effectiveness of FPGA-based imple-


mentations for packet payload inspection driven by machine learning, demon-
strating competitive accuracy with GPU implementations while surpassing
the desired inference speeds. Additionally, it presents the design and perfor-
mance of autoencoder and variational autoencoder architectures programmed
on FPGAs for identifying anomalous packet payloads.

Overall, the paper contributes to the understanding of deep learning mod-


els for network traffic analysis and highlights the potential of FPGA-based
solutions for efficient and accurate network security applications.

3
Chapter 3
Proposed methodology and algorithms

3.1 Dataset Generation

In our project on ”Evolving Strategies in Android Malware Detection: Lever-


aging Network Insights,” the initial phase involves meticulously constructing
datasets comprising legitimate, malware, and adware applications. These
datasets will serve as the cornerstone for training and testing our detection
algorithms. The dataset generation process is outlined below:

1. Root Access Acquisition using ADB (Android Debug Bridge):

To ensure comprehensive access to the inner workings of Android applica-
tions, we leverage the Android Debug Bridge (ADB) to attain root access.
Root access allows us to delve deeper into the system, enabling us to monitor
and analyze application behavior more effectively.

2. Running Apps on Emulator: Simulating real-world conditions, we


execute the gathered applications on an Android emulator environment. This
step is crucial for observing their behavior in a controlled setting, facilitating
data collection and analysis without risking the integrity of physical devices.

3. Traffic Capture using TCPDump: Leveraging TCPDump, a network


packet analyzer, we capture the traffic generated by the executed applications.
This includes network communications such as HTTP requests, DNS queries,
and other interactions with remote servers. By capturing network traffic, we
gain valuable insights into the communication patterns of the applications,
aiding in the identification of potential malicious behavior.

4. Data Extraction via ADB: Utilizing ADB once again, we extract the
captured network traffic data from the emulator environment. This data re-
trieval step ensures that we possess a comprehensive dataset containing the
network interactions of each application under scrutiny.

4
5. CICFlowMeter Integration for Flow Extraction: Employing CI-
CFlowMeter, a flow-based network traffic analysis tool, we extract relevant
flow information from the captured traffic data. Flows encapsulate sequences
of packets sharing common attributes, enabling us to discern patterns and
anomalies in application behavior more effectively.

6. Data Storage in CSV Format: The extracted flow data is then stored
in CSV (Comma-Separated Values) format, ensuring compatibility with var-
ious data analysis and machine learning tools. This format enables seamless
integration of the dataset into our analysis pipeline while maintaining flexi-
bility for future modifications and enhancements.

7. Data Refinement with Pandas: Leveraging the power of Pandas, a


Python library for data manipulation and analysis, we refine the extracted
dataset. This step involves data cleaning, filtering, and preprocessing to en-
sure consistency and reliability in subsequent analysis tasks.
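The seven steps above can be sketched as a small driver script. The device serial, package name, and capture path below are illustrative assumptions, not values from the report; a real run would execute each command with Python's subprocess module.

```python
# Sketch of the dataset-generation pipeline (steps 1-4 above): root access,
# app execution, tcpdump capture, and data extraction over ADB. All names
# and paths here are hypothetical placeholders.

EMULATOR = "emulator-5554"            # assumed emulator serial
REMOTE_PCAP = "/sdcard/capture.pcap"  # assumed capture path on the device

def adb(*args):
    """Build an adb command line targeting the emulator."""
    return ["adb", "-s", EMULATOR, *args]

def capture_commands(package):
    """Commands for one app: gain root, install, capture traffic, pull the dump."""
    return [
        adb("root"),                                              # step 1: root access via ADB
        adb("install", f"{package}.apk"),                         # step 2: run app on emulator
        adb("shell", "tcpdump", "-i", "any", "-w", REMOTE_PCAP),  # step 3: traffic capture
        adb("pull", REMOTE_PCAP, f"{package}.pcap"),              # step 4: data extraction
    ]

for cmd in capture_commands("com.example.sample"):
    print(" ".join(cmd))
# Each command would be executed with subprocess.run(cmd, check=True); the
# pulled .pcap is then fed to CICFlowMeter to produce the flow CSV (steps 5-6).
```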

3.2 Architectures

3.2.1 Manual Flow Labeling Methodology

The manual flow labeling methodology involves a step-by-step process to


classify and label network flows by analyzing TCP dumps obtained from
installed malware, benign, and adware applications. This approach is labor-
intensive but offers precise control over the labeling process, ensuring accu-
rate classification of network behavior.
Steps for labeling:

1. Application Installation: Malware, benign, and adware applications are


installed via the Android Debug Bridge (ADB) for data capture.

2. TCP Dump Acquisition: TCP dumps are obtained from the installed
applications through the ADB interface. These dumps contain detailed
network traffic information.

5
3. Data Extraction: The TCP dumps are pulled from the device’s SD card
and processed using CICFlowMeter to generate CSV files. This step ensures
the data is in a structured format for further analysis.

4. Noise Removal: Data preprocessing is performed to eliminate noise


from the TCP dumps. Idle emulator situations and irrelevant traffic are
filtered out to focus on relevant network flows.

5. Manual Labeling: Each network flow is manually labeled according to


the class of application it belongs to (malware, benign, or adware). This
labeling process involves careful examination of traffic patterns and be-
haviors.

Methodology Enhancements
To enhance the manual flow labeling process, several improvements have
been made to the existing Flow Labeler architecture:

1. Customized Labels: The labeling engine has been modified to produce


three different labels: Benign, General Malware and Adware. These
labels provide granular insights into network traffic behavior.

2. Code Modifications: The Flow Labeler code has been updated to ac-
commodate the new labeling requirements and feature extraction meth-
ods. This ensures compatibility with the revised methodology and facil-
itates accurate flow classification.

Feature Engineering Pre-Labelling

Feature engineering is a critical step in


the manual flow labeling methodology, where additional features are gen-
erated from existing ones to enrich the dataset and improve model training
accuracy.

1. Feature Generation Steps:

(a) Statistical Aggregation: Statistical metrics like mean, median, stan-


dard deviation, minimum, and maximum are computed for numer-
ical features such as duration, number of packets, and total bytes.

6
These metrics provide insights into the distribution and variability
of flow characteristics.

(b) Protocol Specific Features: For each protocol (e.g., HTTP, Telnet,
SMTP), protocol-specific features are generated to capture unique
characteristics associated with different network protocols. For in-
stance, for HTTP flows, features like HTTP method (GET, POST),
status code, and content type are extracted.

(c) Flow Rate Features: Flow rate features, such as packets per sec-
ond and bytes per second, are calculated to quantify the rate of data
transfer within each flow. These features provide insights into flow
intensity and help distinguish between normal and anomalous traffic.
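The flow-rate and statistical-aggregation steps above can be sketched with Pandas; the column names (duration, total_packets, total_bytes) are hypothetical stand-ins for the actual CICFlowMeter field names, and the values are toy data.

```python
# Minimal sketch of pre-labelling feature engineering: flow-rate features per
# flow, then statistical aggregation per application. Column names are assumed.
import pandas as pd

flows = pd.DataFrame({
    "app":           ["a", "a", "b", "b"],
    "duration":      [2.0, 4.0, 1.0, 3.0],   # flow duration in seconds
    "total_packets": [20, 40, 5, 15],
    "total_bytes":   [2000, 4000, 500, 1500],
})

# Flow-rate features: packets per second and bytes per second (step 1c).
flows["pkts_per_sec"] = flows["total_packets"] / flows["duration"]
flows["bytes_per_sec"] = flows["total_bytes"] / flows["duration"]

# Statistical aggregation (step 1a): mean, std, min, max of flow duration per app.
stats = flows.groupby("app")["duration"].agg(["mean", "std", "min", "max"])
print(stats)
```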

3.2.2 CICFlowMeter

CICFlowMeter is a comprehensive network traffic analysis tool that provides


valuable insights into network flow data at both the packet and flow levels.
It was created by the Canadian Institute for Cybersecurity (CIC) and is in-
tended to extract more than 80 properties from unprocessed network data.
These features can be utilised for a variety of tasks, including malware analysis,
network monitoring, intrusion detection, and traffic profiling.

CICFlowMeter reads network traffic from a network interface or a pcap


file and generates flow-based features in CSV format. The generated features
can be used for machine learning and statistical analysis, helping network
administrators and security analysts in understanding the behavior of traffic
and detecting potential threats.

7
Figure 3.1: CICFlowMeter Working

After Running the CICFlowMeter application we have a CSV file of 80


different attributes and the following table is a snippet of attributes that have
been generated by the application. We shall not be using all the attributes
here and rather have a selected set of attributes having the most effect during
the modelling process.

3.2.3 Feature Selection

A critical stage of machine learning and data analysis is feature selection,


as it helps in identifying the data’s most pertinent characteristics or factors.
This process aims to simplify the model, reduce computational complexity,
and enhance the overall performance by eliminating noise or redundant in-
formation. By selecting a smaller subset of features, the model becomes
more interpretable and comprehensible, leading to better generalization and
reduced overfitting. Different techniques are utilized for feature selection, in-
cluding filter methods, wrapper methods, and embedded methods, each with
its distinct approach and criteria for selecting the most informative features.
The selection of a feature selection technique is determined by factors such
as the problem domain, dataset attributes, and the desired performance of
the model. Ultimately, effective feature selection enables the development of
robust and efficient machine-learning models, capable of delivering accurate
and insightful predictions.
We have used the correlation matrix technique to identify and remove
highly correlated features in our dataset. The pairwise correlation coeffi-

8
cients between variables in a dataset are displayed in a table called a correla-
tion matrix. It is an essential tool for understanding the relationships among
variables, identifying multicollinearity, and guiding feature selection. The
correlation coefficient quantifies the strength and direction of the linear rela-
tionship between two variables, encompassing values that range from -1 to
1. A direct association is shown by a positive correlation coefficient (as one
variable rises, the other rises as well), whereas an inverse relationship is in-
dicated by a negative correlation coefficient (as one variable rises, the other
falls). There is no linear relationship between the variables, as shown by a
value of 0.
To compute the correlation matrix, a correlation measure such as Pear-
son’s correlation coefficient is used. Pearson’s correlation coefficient, de-
noted by ‘r’, is calculated as follows:

r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \sum_i (y_i - \bar{y})^2}}

where x_i and y_i are individual data points, \bar{x} and \bar{y} are the means of the
respective variables, and \sum denotes the summation. Pearson’s correlation
coefficient is sensitive only to linear relationships between variables.
The correlation matrix is symmetric, with diagonal elements equal to 1,
as the correlation of a variable with itself is always 1. The coefficients of
correlation between two pairs of variables are represented by the off-diagonal
elements.
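The correlation-based pruning described above can be sketched with Pandas: compute the pairwise matrix, examine its upper triangle, and drop one feature from each highly correlated pair. The 0.9 cut-off and the synthetic features below are illustrative assumptions, not the report's settings.

```python
# Sketch of correlation-matrix feature selection on synthetic data:
# f2 nearly duplicates f1, so it should be dropped; f3 is independent.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({"f1": rng.normal(size=100)})
df["f2"] = df["f1"] * 2.0 + 0.01 * rng.normal(size=100)  # highly correlated with f1
df["f3"] = rng.normal(size=100)                          # independent feature

corr = df.corr().abs()  # symmetric matrix, diagonal elements equal to 1
# Keep only the upper triangle (k=1 excludes the diagonal) to avoid double counting.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [col for col in upper.columns if (upper[col] > 0.9).any()]
reduced = df.drop(columns=to_drop)
print("dropped:", to_drop)
```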

3.3 Algorithms Used for Training Dataset

3.3.1 K-Nearest Neighbours (KNN)

For classification and regression tasks, K-Nearest Neighbors (KNN) stands


out as a straightforward yet effective supervised machine learning technique.
It’s a non-parametric, instance-based learning approach that hinges on prox-
imity, assuming that data points sharing similar features are likely to belong
to the same category or possess similar target values.
In predicting outcomes, the KNN algorithm identifies the ’k’ nearest neigh-
bors—data points—that are closest to a specific input instance. For classifi-

9
cation, it assigns the majority class label among these neighbors, while for
regression, it calculates the average of their target values.
Various distance metrics can be employed to compute the distance be-
tween data points. For example, the Euclidean distance between two points
’x’ and ’y’ in an n-dimensional space can be computed using the formula:
d(x, y) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + \cdots + (x_n - y_n)^2}

In different scenarios, alternative distance metrics like Manhattan or Minkowski


distance can also be employed. The selection of the distance metric and the
value of ’k’ are critical hyperparameters that significantly impact the KNN
algorithm’s performance. Typically, techniques such as cross-validation or
grid search are used to determine these hyperparameters.

One of the primary advantages of KNN is its simplicity, which makes it


easy to implement and comprehend. Additionally, it can swiftly adapt to
new data since there’s no need to retrain the model when introducing new
instances. However, KNN can be computationally expensive, particularly for
large datasets.
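The prediction step can be sketched from scratch using the Euclidean distance formula above and a majority vote over the k nearest points; the toy data and labels are illustrative only, and a library implementation would normally be used.

```python
# Minimal from-scratch KNN classifier: Euclidean distances, then majority vote.
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Return the majority class among the k nearest training points to x."""
    dists = np.sqrt(((X_train - x) ** 2).sum(axis=1))  # Euclidean distances
    nearest = np.argsort(dists)[:k]                    # indices of the k closest
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy training set: two tight clusters with hypothetical labels.
X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
y = np.array(["benign", "malware", "malware", "malware"])
y = np.array(["benign", "benign", "malware", "malware"])
print(knn_predict(X, y, np.array([0.2, 0.1]), k=3))  # point near the first cluster
```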

3.3.2 AdaBoost

AdaBoost (Adaptive Boosting) is an ensemble learning method used for
classification challenges. It combines multiple weak learners, usually de-
cision stumps or shallow decision trees, to form a strong classifier. The core
concept of AdaBoost is to sequentially train weak learners and dynamically
adjust their weights based on their performance, leading to a weighted aggre-
gation of learners that collectively form a robust classifier.
The AdaBoost algorithm operates as follows:
1. Assign every instance in the training dataset an equal weight.

2. For each iteration:

(a) Utilise the weighted dataset to train a weak learner.


(b) Calculate the error rate of the weak learner, which is the weighted
sum of misclassified instances.

10
(c) Compute the learner’s weight in the ensemble by using the error
rate. A lower error rate results in a higher weight.
(d) Update the weights of the instances in the dataset. Instances
that were incorrectly classified receive heavier weights, while cor-
rectly classified instances receive lighter weights. This ensures
that misclassified instances have a higher probability of being se-
lected in the next iteration.
(e) Normalize the instance weights so they sum up to 1.

3. Utilise a weighted majority vote to combine the weak learners into a


strong classifier.

The final classifier is given by:

H(x) = \mathrm{sign}\!\left( \sum_{m=1}^{M} \alpha_m h_m(x) \right)

where M is the number of weak learners, \alpha_m is the weight of the m-th
weak learner, and h_m(x) is the prediction of the m-th weak learner.
AdaBoost is effective in various problem domains and often achieves high
classification accuracy. It can, however, be sensitive to noisy data and out-
liers as it focuses on correcting misclassifications during the training process.
For best performance, hyperparameters such as the number of iterations
and the type of weak learner must be carefully tuned.
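The steps above can be sketched from scratch with one-feature threshold stumps as the weak learners; this is a toy illustration of the algorithm, not the report's implementation, and the data is synthetic.

```python
# Compact from-scratch AdaBoost with threshold-stump weak learners.
import numpy as np

def fit_stump(X, y, w):
    """Find the (feature, threshold, polarity) stump with lowest weighted error (step 2a-2b)."""
    best = (None, None, 1, np.inf)  # feature, threshold, polarity, error
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f]):
            for pol in (1, -1):
                pred = np.where(pol * (X[:, f] - t) >= 0, 1, -1)
                err = w[pred != y].sum()  # weighted sum of misclassifications
                if err < best[3]:
                    best = (f, t, pol, err)
    return best

def adaboost(X, y, rounds=5):
    n = len(y)
    w = np.full(n, 1.0 / n)                        # step 1: uniform instance weights
    ensemble = []
    for _ in range(rounds):
        f, t, pol, err = fit_stump(X, y, w)        # step 2a: train weak learner
        err = max(err, 1e-10)                      # guard against a perfect stump
        alpha = 0.5 * np.log((1 - err) / err)      # step 2c: lower error -> higher weight
        pred = np.where(pol * (X[:, f] - t) >= 0, 1, -1)
        w *= np.exp(-alpha * y * pred)             # step 2d: reweight instances
        w /= w.sum()                               # step 2e: normalize to sum 1
        ensemble.append((alpha, f, t, pol))
    return ensemble

def predict(ensemble, X):
    """Step 3: weighted majority vote, H(x) = sign(sum_m alpha_m h_m(x))."""
    score = sum(a * np.where(p * (X[:, f] - t) >= 0, 1, -1)
                for a, f, t, p in ensemble)
    return np.sign(score)

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([-1, -1, 1, 1])
model = adaboost(X, y)
print(predict(model, X))  # should recover the training labels
```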

3.3.3 Random Forest

Random Forest is a powerful ensemble learning technique commonly used


for tasks such as classification and regression. In the training phase, it con-
structs multiple decision trees and outputs the mode of the predicted classes
(for classification) or the mean prediction (for regression). Ran-
dom Forest is specifically engineered to enhance performance and mitigate
the overfitting issue often encountered with individual decision trees.
The Random Forest algorithm’s key steps are as follows:

11
1. For each tree in the ensemble:

(a) Select a random subset from the training data with replacement
(known as bootstrapping). This subset is used to train the current
decision tree.
(b) Random Forest introduces additional randomness by randomly se-
lecting a subset of features to consider for each decision tree split.
This random feature selection process helps to decorrelate the trees
within the ensemble, leading to a more robust model.
(c) Grow the decision tree until it reaches its maximum depth or until a
stopping requirement, such as a required minimum of samples per
leaf node, is satisfied.

2. For making predictions, input the new instance to all the decision trees
in the ensemble.

3. For classification tasks, the predictions of individual trees are aggregated


by taking the majority vote of the predicted classes.

4. For regression tasks, the predictions are aggregated by computing the


mean of the predicted values.

Random Forest offers numerous benefits, including its capacity for handling
large datasets, high-dimensional feature spaces, and missing data. It also pro-
vides robust performance with less risk of overfitting compared to individual
decision trees. Moreover, it can be used for feature importance estimation,
as it calculates the average feature impurity decrease across all trees in the
ensemble.
However, Random Forest can be computationally expensive due to the
need to train multiple trees, and it may not be as interpretable as a single de-
cision tree given its ensemble nature. Despite these limitations, Random For-
est remains a popular and versatile algorithm for various machine-learning
tasks.
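An illustrative run with scikit-learn's RandomForestClassifier on synthetic data; the hyperparameters shown mirror steps 1(a)-(b) above (bootstrap sampling, random feature subsets per split) and are assumptions, not the report's settings.

```python
# Random Forest on a simple synthetic classification task.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # label depends on the first two features

clf = RandomForestClassifier(
    n_estimators=100,     # number of trees in the ensemble
    bootstrap=True,       # step 1a: sample training data with replacement per tree
    max_features="sqrt",  # step 1b: random feature subset considered at each split
    random_state=0,
)
clf.fit(X, y)
print("train accuracy:", clf.score(X, y))
# Feature importance estimation via average impurity decrease across trees.
print("feature importances:", clf.feature_importances_.round(2))
```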

12
3.3.4 Logistic Regression

Logistic Regression is a statistical technique employed in binary classifica-


tion tasks, where the output variable takes one of two possible values.
Contrary to its name, logistic regression is used predominantly for classifica-
tion purposes rather than regression. It models the probability of an instance
belonging to a specific class based on its features.
Key Steps of the Logistic Regression Algorithm:

1. Model Training:

(a) Parameter Estimation: Logistic Regression estimates the parame-


ters (coefficients) of the logistic function using optimization tech-
niques such as gradient descent. These parameters determine the
relationship between the features and the probability of belonging
to a particular class.

2. Prediction:

(a) Probability Estimation: Given a new instance with a set of features,


Logistic Regression calculates the probability that the instance be-
longs to the positive class (class 1) using the learned parameters and
the logistic function.
(b) Thresholding: After computing the predicted probability, it’s com-
pared to a predetermined threshold, typically set at 0.5. If the prob-
ability surpasses the threshold, the instance is classified into the
positive class; otherwise, it’s classified into the negative class.

The simplicity and interpretability of Logistic Regression ren-
der it a favored option for binary classification tasks, offering straightforward
insights into the impact of features on classification decisions. Its computa-
tional efficiency and ability to incorporate regularization techniques enhance
insights into the impact of features on classification decisions. Its computa-
tional efficiency and ability to incorporate regularization techniques enhance
generalization performance. However, Logistic Regression’s linearity as-
sumption may limit its effectiveness in capturing complex relationships, and
its binary nature necessitates extensions for multi-class classification. Addi-
tionally, sensitivity to outliers and reliance on certain assumptions regarding

13
feature distribution can affect performance in real-world datasets. Despite
these limitations, Logistic Regression remains a valuable tool due to its bal-
ance of simplicity, interpretability, and efficiency.

3.3.5 Decision Tree Algorithm

Decision trees are robust and intuitive models used for both classification and regression tasks. They partition the feature space into regions and assign a class label, or predict a target value, for each region. Because decision trees are easy to understand and interpret, they are widely preferred in domains such as healthcare and finance, where understanding the underlying decision process is vital.
Key Steps of the Decision Tree Algorithm:

1. Node Splitting:

(a) Decision trees choose the most suitable feature to split the data at
each node by employing a selected criterion, like Gini impurity or
information gain. This process entails assessing each feature’s ca-
pacity to segregate the data into homogeneous classes.
(b) In the case of numerical features, decision trees identify the ideal
split point that maximizes the homogeneity of the child nodes pro-
duced.

2. Recursive Partitioning:

(a) After making a split, decision trees recursively partition the data into subsets using the chosen feature and split point. This recursion continues until a stopping criterion is satisfied, such as reaching a maximum depth or a minimum number of samples per leaf.

3. Leaf Node Assignment:

(a) At each terminal node (leaf), decision trees assign a class label de-
termined by the majority class of the instances in that node. For
regression tasks, the leaf nodes may instead hold the mean or me-
dian of the target variable.

4. Tree Pruning (Optional):

(a) Decision trees may undergo pruning to prevent overfitting and im-
prove generalization performance. Pruning involves removing nodes
that do not contribute significantly to reducing impurity or that re-
sult in minor improvements in predictive accuracy.

5. Prediction:

(a) For predicting new instances, decision trees navigate the tree start-
ing from the root node and moving to a leaf node based on the
feature values of the instance. The class label or predicted value
stored in the leaf node is then assigned as the final prediction.
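A minimal sketch of these steps with scikit-learn, assuming synthetic data; the depth and leaf-size limits illustrate the stopping criteria and act as a simple form of pre-pruning:

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Illustrative synthetic data in place of the real traffic features
X, y = make_classification(n_samples=400, n_features=5, random_state=1)

# Gini-impurity splitting (steps 1-2) with stopping criteria that also
# serve as pre-pruning (step 4)
tree = DecisionTreeClassifier(criterion="gini", max_depth=5,
                              min_samples_leaf=10, random_state=1)
tree.fit(X, y)

# Step 5: a prediction walks from the root to a leaf node
pred = tree.predict(X[:1])[0]
depth = tree.get_depth()
```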

Decision trees offer simplicity and interpretability, making them valuable for understanding complex data relationships. They efficiently handle both numerical and categorical data and are computationally efficient for small to medium-sized datasets. However, decision trees are prone to overfitting, particularly without proper constraints, and can be sensitive to variations in the training data. They may also struggle with complex interactions between features and exhibit bias towards features with many levels. Despite these limitations, decision trees remain versatile tools in machine learning, and these weaknesses are often mitigated by techniques such as pruning and ensemble methods.

3.3.6 Multilayer Perceptron (MLP)

The Multilayer Perceptron (MLP) is a type of artificial neural network commonly used for supervised learning tasks such as classification and regression. With their ability to model complex nonlinear relationships in data, MLPs have become a cornerstone of machine learning research and applications.
Key Steps of the Multilayer Perceptron Algorithm:

1. Initialization: MLPs begin by initializing the weights and biases of the neurons in each layer, either randomly or using predefined strategies such as Xavier or He initialization.

2. Forward Propagation: During forward propagation, the input data is fed through the network, and the activations of the neurons in each layer are computed using activation functions such as ReLU (Rectified Linear Unit) or sigmoid.

3. Calculation of Loss: After forward propagation, the loss between the predicted outputs and the actual targets is computed using a suitable loss function, such as cross-entropy loss for classification or mean squared error for regression tasks.

4. Backpropagation: Backpropagation is used to update the weights and biases of the network to minimize the loss. This involves computing the gradients of the loss function with respect to the network parameters and adjusting the parameters using optimization algorithms such as stochastic gradient descent (SGD) or Adam.

5. Iteration: Steps 2-4 are repeated for multiple epochs until the model converges to a satisfactory solution or a predefined stopping criterion is met.

Multilayer Perceptron networks offer remarkable flexibility in modeling complex relationships within data, enabling accurate predictions across diverse domains. Because they learn hierarchical representations of features from raw data, MLPs reduce the need for manual feature engineering and excel in tasks requiring nonlinear mappings. However, MLPs are computationally demanding, especially when training deep architectures, and are sensitive to hyperparameter settings, often requiring meticulous tuning for optimal performance. Additionally, their black-box nature poses challenges for interpretability, hindering insight into the model's reasoning. Despite these limitations, MLPs remain indispensable in modern machine learning, powering advances in artificial intelligence and data-driven decision-making.
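A minimal sketch of such a network using scikit-learn's MLPClassifier, assuming synthetic data; one hidden ReLU layer trained with the Adam optimizer covers steps 1-5 internally:

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Illustrative synthetic data in place of the real traffic features
X, y = make_classification(n_samples=400, n_features=8, random_state=2)

# One hidden layer of 16 ReLU units; weights are initialized, then forward
# propagation, loss computation, and backpropagation repeat for up to 500 epochs
mlp = MLPClassifier(hidden_layer_sizes=(16,), activation="relu",
                    solver="adam", max_iter=500, random_state=2)
mlp.fit(X, y)
train_acc = mlp.score(X, y)
```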

3.3.7 Linear Discriminant Analysis (LDA)

Linear Discriminant Analysis (LDA) is a classical classification technique that finds a linear combination of features that characterizes or separates two or more classes of objects or events. It is commonly applied in pattern recognition and machine learning to classify data points into predefined classes.
Key Steps of Linear Discriminant Analysis:

1. Data Preprocessing: LDA starts by standardizing the input features so that they have zero mean and unit variance. This step helps achieve robustness and consistency in the analysis.

2. Compute Class Means: LDA calculates the mean vector for each class
in the dataset. These class means act as representatives of the data dis-
tribution for each class.

3. Compute Scatter Matrices: LDA computes two scatter matrices: the within-class scatter matrix, which measures the spread of the data within each class, and the between-class scatter matrix, which quantifies the separation between classes.

4. Compute Eigenvectors and Eigenvalues: By solving the generalized eigenvalue problem involving the within-class and between-class scatter matrices, LDA determines eigenvectors and corresponding eigenvalues. These eigenvectors represent directions (axes) in the feature space that maximize class separation.

5. Feature Transformation: LDA selects the top k eigenvectors correspond-
ing to the largest eigenvalues to create a transformation matrix. This
matrix is then utilized to project the data onto a lower-dimensional sub-
space while maximizing class separability.
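The steps above can be sketched with scikit-learn, assuming synthetic data; note that the projection uses at most (number of classes − 1) discriminant axes:

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Illustrative 3-class synthetic data
X, y = make_classification(n_samples=300, n_features=6, n_classes=3,
                           n_informative=4, random_state=3)

# Fit LDA and project onto at most (n_classes - 1) = 2 discriminant axes
lda = LinearDiscriminantAnalysis(n_components=2)
X_proj = lda.fit_transform(X, y)
```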

Linear Discriminant Analysis provides efficient dimensionality reduction while preserving class-discriminatory information, making it a valuable tool for classification tasks. By maximizing between-class scatter and minimizing within-class scatter, LDA achieves optimal linear separability between classes, leading to effective discrimination. Furthermore, as a supervised technique, LDA explicitly incorporates class labels during training, enhancing its discriminatory power. However, LDA's performance may suffer when its assumption of Gaussian feature distributions with equal class covariances does not hold, and it is sensitive to outliers. In addition, it can project the data onto at most (number of classes − 1) dimensions and may face challenges with the curse of dimensionality in high-dimensional feature spaces. Despite these limitations, LDA remains a robust and interpretable method for classification and dimensionality reduction.

3.4 Loss Functions Used

3.4.1 Cross Entropy Loss

The cross-entropy loss function, also known as logistic loss or log loss, is a common cost function used in machine learning for classification tasks. It measures the difference between the probabilities predicted by the model and the actual labels of the output, and it is particularly useful for training models to classify data into multiple categories.

The cross-entropy loss is the negative average, over all samples, of the log-probability that the model assigns to the true class. During training, this value is minimized, since a lower loss means the model assigns higher probability to the correct labels.

The formula for cross-entropy loss is:

L = -(1/N) * Σ_{i=1}^{N} [y_i log(p_i) + (1 - y_i) log(1 - p_i)]

where:
L is the cross-entropy loss
N is the number of samples in the dataset
y_i is the true label of sample i (0 or 1)
p_i is the predicted probability that sample i belongs to the positive class
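The formula can be computed directly with NumPy (a sketch; the clipping constant eps is an implementation detail added here to avoid log(0)):

```python
import numpy as np

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Average negative log-likelihood of the true labels, per the formula above."""
    p = np.clip(p_pred, eps, 1 - eps)  # keep probabilities away from 0 and 1
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

y = np.array([1, 0, 1, 1])      # true labels
p = np.array([0.9, 0.1, 0.8, 0.7])  # predicted positive-class probabilities
loss = binary_cross_entropy(y, p)
```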

3.4.2 Mean Absolute Loss

Mean Absolute Loss, often referred to as Mean Absolute Error (MAE), serves
as a metric to evaluate the performance of a regression model. It computes
the average absolute difference between the predicted and actual values of a
dataset, providing insight into the model’s accuracy in predicting continuous
variables.

The formula for calculating MAE is as follows:

MAE = (1/n) * Σ_{i=1}^{n} |y_i - ŷ_i|

where:
n is the total number of data points
y_i is the actual value of the i-th data point
ŷ_i is the predicted value of the i-th data point

MAE measures the average magnitude of the errors the model makes in its predictions. Because every error contributes linearly, MAE is less sensitive to outliers than squared-error metrics such as MSE. It is commonly used in applications such as finance, where predicting actual values with small errors can have a significant impact on decision-making.
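A minimal NumPy sketch of the formula, with illustrative values:

```python
import numpy as np

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    return np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)))

actual = [3.0, 5.0, 2.0]
predicted = [2.5, 5.5, 2.0]
mae = mean_absolute_error(actual, predicted)  # (0.5 + 0.5 + 0.0) / 3
```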

Chapter 4
Experimental Setup and Result Analysis

4.1 Experiment Environment

The experiments were performed on a local system with a 4 GB GPU. The data was split into training and testing sets with a test ratio of 0.2.

4.2 Data Preprocessing

1. Combining the datasets: We began by merging several datasets into one. This simplified the subsequent preprocessing steps and allowed us to view the data as a whole.

2. Handling missing values: The next step was to identify and remove
any instances with null or empty values. This was crucial to ensure
the integrity of the data and prevent any inaccuracies in the subsequent
analysis.

3. Removing duplicates: We then eliminated any duplicate instances in the dataset. Duplicates could skew the results of our analysis, so their removal was essential for maintaining the accuracy of our findings.

4. Balancing the data: Finally, we balanced the dataset between benign and malicious instances. This step avoids bias in the analysis and ensures a fair representation of both categories.
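The four steps above can be sketched with pandas (the column names and values here are hypothetical, not the actual flow features):

```python
import pandas as pd

# Illustrative flow records; column names and values are hypothetical
frames = [
    pd.DataFrame({"duration": [1.0, 2.0, None, 2.0], "label": ["benign"] * 4}),
    pd.DataFrame({"duration": [3.0, 3.0], "label": ["malicious"] * 2}),
]

# Step 1: combine the datasets into one
df = pd.concat(frames, ignore_index=True)

# Step 2: drop instances with null or empty values
df = df.dropna()

# Step 3: remove duplicate instances
df = df.drop_duplicates()

# Step 4: balance classes by down-sampling each class to the minority-class size
n_min = df["label"].value_counts().min()
balanced = df.groupby("label").sample(n=n_min, random_state=0)
```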

We initially had 631955 entries, of which 471597 were benign packets, 4745 malicious packets, and 155613 adware network traffic. After preprocessing, we reduced the total to about 555345 entries of malicious and benign network traffic. The data was then divided into a training set (505564 entries) and a testing set (126391 entries).

4.3 Output Labels

The final output labels for the processed dataset are ’benign’, ’general mal-
ware’, and ’adware’. Each row in the dataset would represent a specific in-
stance of network activity, categorized based on these labels. Here’s a clearer
representation:

1. Benign: Instances of network activity that are considered normal and non-malicious.

2. General Malware: Instances of network activity associated with common types of malware or malicious behavior.

3. Adware: Instances of network activity related to advertising-supported software that may display unwanted advertisements or collect user data.

These output labels help categorize the network activity captured in the CSV
file into different types, such as benign, general malware, or adware, enabling
further analysis and classification of the data.

4.3.1 Results

1. KNN: We trained and tested the KNN classifier on our generated dataset, calculated the accuracy, precision, recall, and F1-score along with the confusion matrix, and compared the results with the available Canadian Institute for Cybersecurity AAGM2017 dataset.

1. Accuracy: 0.895

2. Precision: 0.895

3. Recall: 0.895

4. F1-Score: 0.895

Figure 4.1: Confusion Matrix -knn

Figure 4.2: Canadian Institute for Cybersecurity-AAGM2017 (Dataset)

2. Random Forest: We trained and tested the Random Forest classifier on our generated dataset, calculated the accuracy, precision, recall, and F1-score along with the confusion matrix, and compared the results with the Canadian Institute for Cybersecurity AAGM2017 dataset.

1. Accuracy: 0.913

2. Precision: 0.913

3. Recall: 0.913

4. F1-Score: 0.912

Figure 4.3: Confusion Matrix-random forest

Figure 4.4: Canadian Institute for Cybersecurity-AAGM2017 (Dataset)

3. Decision Tree: We trained and tested the Decision Tree classifier on our generated dataset, calculated the accuracy, precision, recall, and F1-score along with the confusion matrix, and compared the results with the Canadian Institute for Cybersecurity AAGM2017 dataset.

1. Accuracy: 0.920

2. Precision: 0.919

3. Recall: 0.920

4. F1-Score: 0.912

Figure 4.5: Application

Figure 4.6: Canadian Institute for Cybersecurity-AAGM2017 (Dataset)

4. Multilayer Perceptron: We trained and tested the MLP classifier on our generated dataset, calculated the accuracy, precision, recall, and F1-score along with the confusion matrix, and compared the results with the Canadian Institute for Cybersecurity AAGM2017 dataset.

1. Accuracy: 0.829

2. Precision: 0.830

3. Recall: 0.829

4. F1-Score: 0.827

Figure 4.7: Application

Figure 4.8: Canadian Institute for Cybersecurity-AAGM2017 (Dataset)

5. AdaBoost: We trained and tested the AdaBoost classifier on our generated dataset, calculated the accuracy, precision, recall, and F1-score along with the confusion matrix, and compared the results with the Canadian Institute for Cybersecurity AAGM2017 dataset.

1. Accuracy: 0.741

2. Precision: 0.745

3. Recall: 0.741

4. F1-Score: 0.731

Figure 4.9: Application

Figure 4.10: Canadian Institute for Cybersecurity-AAGM2017 (Dataset)

6. Logistic Regression: We trained and tested the Logistic Regression classifier on our generated dataset, calculated the accuracy, precision, recall, and F1-score along with the confusion matrix, and compared the results with the Canadian Institute for Cybersecurity AAGM2017 dataset.

1. Accuracy: 0.672

2. Precision: 0.6525

3. Recall: 0.672

4. F1-Score: 0.646

Figure 4.11: Application

Figure 4.12: Canadian Institute for Cybersecurity-AAGM2017 (Dataset)

7. Ensemble: Confusion matrix and accuracy, precision, recall, and F1-score of the ensembled models:

1. Accuracy: 0.890

2. Precision: 0.890

3. Recall: 0.890

4. F1-Score: 0.889
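As a sketch of such an ensemble, scikit-learn's VotingClassifier combines individual classifiers by majority (hard) vote; the member models and synthetic data below are illustrative, not the exact configuration used in the report:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Illustrative synthetic data in place of the real traffic features
X, y = make_classification(n_samples=300, n_features=6, random_state=4)

# Majority (hard) vote across the individual classifiers
ensemble = VotingClassifier(estimators=[
    ("rf", RandomForestClassifier(n_estimators=50, random_state=4)),
    ("dt", DecisionTreeClassifier(random_state=4)),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
], voting="hard")
ensemble.fit(X, y)
acc = ensemble.score(X, y)
```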

Figure 4.13: Ensembled Confusion Matrix of All models

4.4 Classification of Benign, Malicious, and Adware Network Traffic

The main steps involved in this process to classify network packets as benign
or malicious are as follows:

Feature extraction using CICFlowMeter

The initial step in the classification process is to extract relevant features from the network packets. For this purpose, we utilized CICFlowMeter, a widely used network traffic analysis tool.

min fpktl, furg cnt, burg cnt, fPktsPerSecond, bPktsPerSecond, flowBytesPerSecond, flowPktsPerSecond, mean fpktl, max bpktl, std flowpktl, mean bpktl, std bpktl, std fpktl, min flowpktl, mean flowpktl, min bpktl, max flowiat, duration, fpsh cnt, max fiat, total fiat, mean fiat, min flowiat, min fiat, max fpktl, max biat, bpsh cnt, total bpktl, total bpackets, total bhlen, total biat, std biat, total fhlen, total fpackets, mean biat, max flowpktl, min biat, std fiat, total fpktl, flow urg, flow cwr, flow ece, fAvgSegmentSize, min seg size forward, avgPacketSize, bAvgSegmentSize, flow fin, init win bytes forward, flow syn, init win bytes backward, downUpRatio, max active, max idle, mean active, mean idle, min active, min idle, mean flowiat, flow psh, bVarianceDataBytes, flow rst, sflow bbytes, std active, sflow bpacket, std flowiat, flow ack, std idle, bAvgPacketsPerBulk, RRT samples clnt, FFNEPD, sflow fpacket, Act data pkt forward, bAvgBulkRate, fHeaderBytes, fAvgPacketsPerBulk, fAvgBytesPerBulk, sflow fbytes, fAvgBulkRate

Identifying important features through correlation matrix

Once the features have been extracted, it is essential to identify the most
important ones that contribute significantly to the classification process. To
achieve this, we calculated the correlation matrix of the extracted features.
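One common way to implement such correlation-based selection with pandas is sketched below; the feature columns, target encoding, and 0.1 threshold are illustrative assumptions, not the project's actual values:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Illustrative feature matrix with a numeric target column (names hypothetical)
df = pd.DataFrame({
    "duration": rng.normal(size=200),
    "flowPktsPerSecond": rng.normal(size=200),
    "noise": rng.normal(size=200),
})
df["target"] = (df["duration"] + 0.5 * df["flowPktsPerSecond"] > 0).astype(int)

# Absolute correlation of each feature with the target
corr = df.corr()["target"].drop("target").abs()

# Keep features whose correlation exceeds a chosen threshold
selected = corr[corr > 0.1].index.tolist()
```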

Feature selection based on correlation matrix

After identifying the important features through the correlation matrix, we proceeded with feature selection. By choosing only the most pertinent features, we can reduce the dataset's complexity and enhance the effectiveness of the classification model. We used the correlation matrix to select features that have a strong relationship with the target variable while discarding less significant ones. The finally selected features are:
min fpktl, furg cnt, burg cnt, fPktsPerSecond, bPktsPerSecond, flowBytesPerSecond, flowPktsPerSecond, mean fpktl, max bpktl, std flowpktl, mean bpktl, std bpktl, std fpktl, min flowpktl, mean flowpktl, min bpktl, max flowiat, duration, fpsh cnt, max fiat, total fiat, mean fiat

4.4.1 Application

We built an application to check whether an application on the device is malicious. The following screenshots show the detection procedure of the application.

Figure 4.15: Home screen of the application

Figure 4.16: counts of different categories

Figure 4.17: counts of different categories

Figure 4.18: counts of different categories

Performance Metrics
Algorithm        Precision  Recall  F1-Score  Accuracy
Random Forest    0.908      0.88    0.89      0.9135
Decision Tree    0.90       0.89    0.90      0.92
AdaBoost         0.75       0.64    0.66      0.74
KNN              0.87       0.86    0.86      0.90
LDA              0.55       0.53    0.53      0.64
MLP              0.81       0.78    0.79      0.8291
Table 4.1: Performance Metrics for Six Algorithms (Macro Average)

Performance Metrics
Algorithm             Precision  Recall  F1-Score  Accuracy
Ensemble Predictions  0.88       0.85    0.86      0.8911
Table 4.2: Performance Metrics for Ensembled Predictions (Macro Average)

Chapter 5
Conclusion and Future Scope
In conclusion, this report delved into the development and evaluation of effective methods to identify and thwart malicious applications through network traffic analysis. We examined the performance of five widely used machine learning algorithms, namely KNN, Decision Tree, Random Forest, MLP, and AdaBoost, for the detection of malicious activities. To process the dataset, we employed state-of-the-art tools such as CICFlowMeter and FlowLabeler, which facilitated the proper extraction and labelling of network traffic features.
Our comprehensive analysis of these algorithms, combined with the ad-
vanced tools utilized, has provided valuable insights into the capabilities and
limitations of network traffic-based techniques for detecting malicious An-
droid applications. However, there are several avenues for future research
and improvement in this area.

1. Future Scope

(a) Exploration of Deep Learning Techniques: While traditional machine learning algorithms have shown promise in malware detection, the potential of deep learning techniques remains largely unexplored in this domain. Future studies could investigate the application of deep learning architectures such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs) for feature learning and classification in Android malware detection. These techniques can capture intricate patterns and behaviors present in malware traffic data, thereby improving detection accuracy and efficiency.

(b) Deployment of Real-time Detection Systems: As the threat landscape evolves rapidly, there is a growing need for real-time detection systems capable of identifying and mitigating emerging threats promptly.

Bibliography
1. Abdul Kadir A, Habibi Lashkari A and Daghmehchi Firoozjaei M. (2024). Android Operating System. Understanding Cybersecurity on Smartphones.

2. Nawshin F, Unal D, Hammoudeh M and Suganthan P. (2024). AI-powered malware detection with Differential Privacy for zero trust security in Internet of Things networks. Ad Hoc Networks.

3. Li J, He J, Li W, Fang W, Yang G and Li T. (2024). SynDroid. Computers & Security.

4. Sanamontre T, Visoottiviseth V and Ragkhitwetsagul C. (2024). Detecting Malicious Android Game Applications on Third-Party Stores Using Machine Learning. Advanced Information Networking and Applications.

5. Sharma T, Rattan D, Kaur P, Gupta A and Gill J. (2024). Enhancing Android Malware Detection: CFS Based Texture Feature Selection and Ensembled Classifier for Malware App Analysis. Recent Trends in Image Processing and Pattern Recognition.

6. Khan A and Sharma I. (2023). SAndro: Artificial Intelligence Enabled Detection Framework of Adware Attacks on Android Machines. 2023 Global Conference on Information Technologies and Communications (GCITC).

7. Guerra-Manzanares A. (2023). Android malware detection: mission accomplished? A review of open challenges and future perspectives. Computers & Security.

8. Buriya S and Sharma N. (2023). Malware Detection Using 1d Convolution with Batch Normalization and L2 Regularization for Android. 2023 International Conference on System, Computation, Automation and Networking (ICSCAN).

9. Kumari A, Dubey R and Sharma I. (2023). ShNP: Shielding Nuclear Plants from Cyber Attacks Using Artificial Intelligence Techniques. 2023 Annual International Conference on Emerging Research Areas: International Conference on Intelligent Systems (AICERA/ICIS).

10. Kumari A and Sharma I. (2023). SmRM: Ensemble Learning Devised Solution for Smart Riskware Management in Android Machines. 2023 Annual International Conference on Emerging Research Areas: International Conference on Intelligent Systems (AICERA/ICIS).

11. Mijoya I, Khurana S, Gupta N and Gupta K. (2023). Malware Detection in Mobile Devices Using Hard Voting Ensemble Technique. 2023 International Conference on Computing, Communication, and Intelligent Systems (ICCCIS).

12. Nguyen C, Khoa N, Doan K and Cam N. (2023). Android Malware Category and Family Classification Using Static Analysis. 2023 International Conference on Information Networking (ICOIN).

13. Rodriguez-Bazan H, Sidorov G and Escamilla-Ambrosio P. Android Ransomware Analysis Using Convolutional Neural Network and Fuzzy Hashing Features. IEEE Access.

14. Bayazit E, Sahingoz O and Dogan B. Protecting Android Devices From Malware Attacks: A State-of-the-Art Report of Concepts, Modern Learning Models and Challenges. IEEE Access.

15. Chaudhary M and Masood A. (2023). RealMalSol: real-time optimized model for Android malware detection using efficient neural networks and model quantization. Neural Computing and Applications.
