aayuv17/Malware-Analysis

Malware-Analysis

Malware Analysis using Machine Learning

Introduction

The objective of this project is to perform malware analysis by applying several basic classification-based machine learning algorithms to a given dataset. We make use of two datasets: one self-generated and one obtained from the internet. The classification algorithms used are Gaussian Naive Bayes, RandomForestClassifier, DecisionTreeClassifier, and Linear SVC.
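
As a rough illustration of how the four classifiers named above can be trained and compared, here is a minimal sketch using scikit-learn. It is not the project's actual code: the synthetic data from `make_classification`, the 70/30 split, the `evaluate` helper, and all hyperparameters are assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a table of PE-file features (assumption: the real
# data would have one row per file and a 0/1 benign/malicious label).
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# The four classifiers listed in the introduction; hyperparameters are
# illustrative defaults, not the project's settings.
classifiers = {
    "Gaussian Naive Bayes": GaussianNB(),
    "RandomForestClassifier": RandomForestClassifier(random_state=0),
    "DecisionTreeClassifier": DecisionTreeClassifier(random_state=0),
    "Linear SVC": LinearSVC(max_iter=10000),
}


def evaluate(models, X_train, X_test, y_train, y_test):
    """Fit each model on the training split and return its test accuracy."""
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores


scores = evaluate(classifiers, X_train, X_test, y_train, y_test)
for name, acc in scores.items():
    print(f"{name}: {acc:.2%}")
```

Accuracy here is measured on a held-out test split, so the printed numbers reflect performance on files the models did not see during training.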

Summary

The self-generated dataset was created by:

  • running Windows PE files through the Cuckoo Sandbox to obtain benign file data
  • obtaining malware hashes from virusshare.com and running them through an online malware analyzer (virustotal.com) to obtain malicious file data
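
The two steps above produce one set of benign records and one set of malicious records. A minimal sketch of merging them into a single labeled CSV might look like the following. The `write_labeled_csv` helper, the feature names (`sha256`, `num_sections`, `num_imports`), and the 0/1 label encoding are all hypothetical; the project's real columns are not documented here.

```python
import csv
import io


def write_labeled_csv(benign_rows, malicious_rows, out):
    """Write both groups to one CSV with a 'label' column (0 = benign, 1 = malicious)."""
    rows = [dict(r, label=0) for r in benign_rows]
    rows += [dict(r, label=1) for r in malicious_rows]
    # Use the union of all feature names so records with missing keys still fit.
    features = sorted({key for row in rows for key in row if key != "label"})
    writer = csv.DictWriter(out, fieldnames=features + ["label"], restval="")
    writer.writeheader()
    writer.writerows(rows)
    return len(rows)


# Hypothetical feature records standing in for sandbox and analyzer output.
benign = [{"sha256": "aaa", "num_sections": 5, "num_imports": 120}]
malicious = [{"sha256": "bbb", "num_sections": 3, "num_imports": 8}]

buffer = io.StringIO()  # swap for open("dataset.csv", "w", newline="") to write a file
count = write_labeled_csv(benign, malicious, buffer)
print(buffer.getvalue())
```

Keeping the label in the same file as the features means the whole dataset can be loaded in one step before training.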

The online dataset used was: https://www.kaggle.com/amauricio/pe-files-malwares

The accuracy obtained using various algorithms on the self-generated dataset:

  • using Gaussian Naive Bayes: 50.5%
  • using RandomForestClassifier: 96.96%
  • using DecisionTreeClassifier: 100%
  • using Linear SVC: 100%

The accuracy obtained using various algorithms on the online dataset:

  • using Gaussian Naive Bayes: 32.24%
  • using RandomForestClassifier: 98.44%
  • using DecisionTreeClassifier: 41.01%
  • using Linear SVC: 96.06%

Future Work (Documentation):

  • Include instructions for using Cuckoo Sandbox and storing the data obtained in CSV form.
