VolMemLyzer Volatile Memory Analyzer for Malware Classification Using Feature Engineering

The document introduces VolMemLyzer, a Python-based tool designed for extracting critical features from volatile memory dumps during live malware infections, aiding in malware classification. It focuses on kernel space features and has been tested with a dataset of 1900 samples, achieving a high true positive rate for both binary and multi-class classification. The paper discusses the importance of memory forensics and presents various contributions including feature engineering and evaluation of the tool's effectiveness.

VolMemLyzer: Volatile Memory Analyzer for Malware Classification using Feature Engineering

2021 Reconciling Data Analytics, Automation, Privacy, and Security: A Big Data Challenge (RDAAPS) | 978-1-7281-6937-8/20/$31.00 ©2021 IEEE | DOI: 10.1109/RDAAPS48126.2021.9452028

Arash Habibi Lashkari, Beiqi Li, Tristan Lucas Carrier, Gurdip Kaur
Canadian Institute for Cybersecurity (CIC)
University of New Brunswick (UNB)
Fredericton, NB, Canada
{A.Habibi.L, beiqi.li, tcarrie2, gurdip.kaur}@unb.ca

Abstract—Memory forensics is a fundamental step that inspects malicious activities during live malware infection. Memory analysis not only captures malware footprints but also collects several essential features that may be used to extract hidden original code from obfuscated malware. There are significant efforts in analyzing volatile memory using several tools and approaches. These approaches fetch relevant information from the kernel and user space of the operating system to investigate running malware. However, the fetching process will accelerate if the most dominating features required for malware classification are readily available. This paper introduces VolMemLyzer, a Python-based tool developed to extract the most critical characterization feature set from the memory dumps taken during live malware infection. It extracts the thirty-six most essential features and ranks them to classify malware. The tool is tested with a dataset of 1900 benign and malware samples with high true positive rate for binary and multi-class malware classification.

Index Terms—memory forensics, volatility, memory analysis, memory dump, malware classification

I. INTRODUCTION

Memory forensics is the process of investigating illegitimate behavior by acquiring and analyzing volatile memory dumps. A memory dump is the snapshot of the current state of the system's memory that is captured when the malware is running. It contains a plethora of data that includes running processes, dynamic link libraries, loaded drivers, handles, callbacks, logged users, active network connections, and opened files and sockets.

Memory forensics has gained importance over the last decade due to its capability to capture malware behavior in the running state. Although malware may be obfuscated or encrypted, it still leaves its footprints in the memory [1]. Most of the information required for memory forensics resides in the kernel space of the operating system. This information includes process lists, threads, sockets, and connections with other systems. It can be fetched with tools like Rekall [2] and Volatility [3]. Some information also resides in the user space; for example, the user space heap stores internet protocol (IP) addresses, domain name system (DNS) information, and credentials. Since kernel space manages user space, this paper is concerned with kernel space information.

To effectively analyze volatile memory, the investigator needs to identify and extract useful information from memory dumps. This information includes the location of data, the kind of data stored at a particular location in memory dumps, and the total amount of memory occupied by the particular data [4].

This paper presents VolMemLyzer, a Python-based tool developed to extract the most important characterization feature set from the memory dumps taken during live malware infection. The feature set includes different categories of kernel space features such as processes, threads, sockets, dynamic link libraries, number of connections, and callbacks. These features account for the majority of memory forensic activities taking place in the kernel space, which makes them important for memory forensics. The main contributions of this paper are as follows:

• We develop VolMemLyzer, a tool to extract the most important features from memory dumps in kernel space obtained by using Volatility (Development contribution).
• We perform feature engineering to shortlist the best memory features from the extracted list of thirty-six memory features, which are classified into different kernel space categories (Scientific contribution).
• We analyze the benign and malicious samples to perform binary and multi-class malware classification with high true positive rate (Evaluation contribution).

The rest of the paper is organized as follows: Section II gives a brief overview of the background on memory analysis and forensics. Section III introduces related works. Section IV details the dataset and Section V describes the proposed model. Section VI introduces VolMemLyzer. The discussion of results and analyses is presented in Section VII. Finally, Section VIII concludes the paper and highlights future work.

II. BACKGROUND

Memory forensic techniques have evolved a lot from simple string search to a complex kernel space data acquisition for various operating systems [5]. There are a large number of tools and techniques available to accomplish this task such as mdd, inception, the Windows process monitor, and crash dumps. A taxonomy of different memory acquisition methods for ring-based architecture is presented in [6]. This taxonomy puts forward the advantages and disadvantages of memory acquisition methods to perform memory forensics.

Authorized licensed use limited to: AMRITA VISHWA VIDYAPEETHAM AMRITA SCHOOL OF ENGINEERING. Downloaded on July 20,2023 at 12:10:40 UTC from IEEE Xplore. Restrictions apply.
Use of encryption hinders the process of recovering digital evidence. Therefore, it becomes important to acquire live memory dumps to perform memory analysis. On the other hand, with the prevalence of memory forensic tools, malware writers have started using anti-forensic techniques to hide malicious memory space. These techniques manipulate shared memory and kernel space responsible for managing user space [7]. In order to prevent malware from manipulating data and tools, identifying hidden code is essential in memory forensics [8].

Memory analysis tools are very helpful to identify a specific area of compromise and direct the analyst to that area [9]. Memory forensics tools support not only the Windows operating system but also Executable and Linkable Format (ELF) executable files in Linux. The Windows Subsystem for Linux (WSL) breaks the analysis of ELF files and tracks the meta-data for every process [10]. It integrates Linux-specific data structures into Windows' existing data structures. However, this integration may result in inconsistent analysis results and is an area of extended research.

Memory forensics imposes some limitations on the digitally signed Windows portable executable files such as data incompleteness, data modification, and process inconsistencies [11]. Since memory dumps capture an extremely large amount of information, the use of automatic detection tools plays a pivotal role to shortlist useful information for analysis [12]. However, they need to be stress-tested for both open- and closed-source memory frameworks to ensure that they do not fail in critical moments [13].

III. RELATED WORKS

This section uncovers the related works in malware analysis through memory forensics.

Malware classification is a trade-off between time and accuracy. To reduce detection time and improve accuracy of detection, a word2vec-based Long Short-Term Memory (LSTM) method for classifying malware is proposed [14]. It analyzes opcodes and API function names with reduced dimensions. The proposed method is validated using the Microsoft Malware Classification Challenge dataset. Working on the same concept, static analysis and memory forensic techniques [15] are combined to reduce detection time and increase precision. To speed up the malware detection process, [16] proposes simultaneous use of kernel and user space to extract changes in registry files, calls to library files and operating system functions.

Although memory dumps can fully describe malware behavior, dynamic malware analysis is vulnerable to malware evasion. Therefore, researchers propose the use of hardware features to analyze malware by converting the memory dump file into a gray-scale image [17]. The proposed method achieves higher accuracy on a backdoor malware dataset. Similarly, memory access images are used to detect malware using a three-dimensional heat mapping system [18]. In another similar attempt, Li et al. [19] collect a memory snapshot of the system and convert it into a gray-scale image using a convolutional neural network. The deep learning model performs binary classification with reduced run time and increased accuracy of detection.

The use of virtual machines (VMs) in malware analysis is prevalent in several malware investigation techniques. A MinHash-based efficient volatile memory dump comparison method is generalized to detect ransomware and RATs [20]. In another approach, Virtual Machine Introspection (VMI) is used to propose a Virtual Machine Monitor (VMM)-based, guest-assisted Automated Internal-and-External (A-IntExt) introspection system to investigate a live guest operating system [21]. Xu et al. also use virtual memory access patterns [22] to support hardware-assisted malware detection. Finally, a new agentless sandbox design is introduced that works independently of the VM hypervisor [23].

Most of the sophisticated malware samples are obfuscated to hide the original source code. Traditional code extraction processes require manual intervention. Therefore, an automated behavior-based malware unpacking algorithm is introduced in [24]. It uses a stealth debugger that avoids detection, so the malware is unable to understand when to stay dormant. The debugger can detect most packers at an incredibly high accuracy rate, with some packers getting a perfect detection rate. To strengthen the detection of obfuscated malware, a dataset consisting of positive and negative memory snapshots is designed [25]. The dataset challenges detection methods on one of the biggest malware detection problems by using many advanced payload systems and obfuscated malware.

Lakhotia and Black [26] create a plugin for Volatility that embeds code to automatically extract information about what data was viewed by the malware. It focuses on finding what documents might have been read or compromised due to malicious activity. Usually extracting this information can be extremely difficult and time-consuming, but the plugin helps to speed up the process of identifying indicators of compromise.

Case and Richard [27] propose a novel memory forensics technique to examine malware code written in Objective-C on the Mac operating system (OS). The proposed method focuses on new methods for detecting userland malware code that utilizes the rich set of APIs to manipulate and steal application data and perform other malicious activities. The authors also investigate the technique against memory samples infected with the malware found in targeted OSX attacks.

To summarize, there is significant progress in memory forensic techniques using malware analysis. These efforts result in reduced malware detection time and improved accuracy and precision of malware classification. Some of the approaches utilize a virtual machine environment to detect obfuscated malware samples. Table I summarizes the advantages and disadvantages of related works presented in this section. However, extracting useful memory-related features from memory dumps and using learning techniques are two key requirements to accomplish all these mentioned tasks. This paper presents VolMemLyzer that gathers the most important thirty-six features from the memory dumps. These features
help to perform binary and multi-class malware classification with high precision and very low false-positive rate. The results are validated with a dataset presented in the next section.

TABLE I: Analyzing the related works
Article | Technique | Advantages | Disadvantages
Kang et al., 2019 | Long short-term memory | Quicker learn time | High false positives
Rathnayaka et al., 2017 | Efficient Advanced Analysis | Quicker investigation time, detection pathing | False negatives
Aghaeikhierabady et al., 2014 | Analysis in a memory image | Strengthens future security | High learning time
Dai et al., 2018 | Gray-scale image | Low overhead comparison time | High memory usage
Yucel et al., 2019 | Imaging and evaluating | Low classification time, indicates malware intensity | Very high memory usage
Li et al., 2019 | Deep-Learning-based analysis | Low run-time | High memory usage
Nissim et al., 2019 | MinHash | High memory efficiency | Longer training time
Xu et al., 2017 | Virtual Memory Patterns | Multiple level security | Very high learning time
Tien et al., 2020 | Virtual Machine Introspection | Ease of use and obfuscation detection | False negatives
Kawakoya et al., 2010 | Behavior-Based Unpacking | Automatic obfuscation detection | High complexity
Sadek et al., 2019 | Obfuscation evasion dataset | Obfuscation detection | High memory usage
Lakhotia et al., 2017 | Mining Malware Secrets | Compromised detection | Higher extract time

IV. DATASET

We used a dataset with 1900 memory dump files, containing 1127 benign and 773 malicious memory dumps as presented in Fig. 1.

Fig. 1: Dataset Details

• Benign:
The benign memory dumps capture normal user activities such as browsing, sending and receiving email from an email server, accessing Microsoft Office files, running video and music services, and other miscellaneous activities.
• Malware:
We selected seven malware families that have minimal manipulation in the memory space. These families include BE2, CoreFlood, Sality, Tigger, Prolaco, Laqma, and Zeus.

BE2 is a tool that is used as a custom plugin by malicious parties. BE2 focuses on spreading payloads and launching different attacks and is used with other malware to steal data while the website is down.

Laqma is a Trojan that uses a rootkit. It is often distributed with phishing emails. It gains access to the user's device and looks for confidential information that could be encrypted and sold back to the user for ransom. As such, Laqma works as Trojan ransomware.

CoreFlood is a botnet that focuses on a targeted website and aims at turning off that website through automated means. It consists of Trojan-like bots that have targeted high profile targets to steal credentials. CoreFlood has stolen large amounts of data, ranging from banking information, email accounts, and social media accounts to confidential reports of large and small companies.

Prolaco is a worm that focuses on infection to execute other malicious programs. Prolaco propagates through email attachments in PDF, doc, or txt forms and other well known and non-malicious files. It makes copies of itself, and while it hides and stays free of detection the copies spread out and explore the system.

Sality is a virus family that infects files and focuses on connecting malware to the machine. Along with virus infection, which is done by downloading the malicious file, it can also use a rootkit and backdoor to gain access to the machine. Sality uses a payload system that gets executed when accessing a machine to steal user data.

Tigger is a Trojan malware family that thrives on stealing data. It is highly evasive to keep the user from realising that their device has been compromised. It goes even further by disabling all other malware to prevent them from being detected. It performs a cleaning operation at the end to erase its footprints. Tigger also embeds a rootkit to make its detection more difficult.

Zeus is a Trojan that focuses on getting credentials and manipulating data. Zeus is often used for stealing banking
information and sending it back to a linked botnet. With the ability of key logging the victim's device, Zeus is able to steal banking credentials after being downloaded using message spamming or drive-by download activity.

V. PROPOSED MODEL USING MEMORY FEATURE ENGINEERING

This section uncovers the proposed model to analyze memory dumps using feature engineering. Fig. 2 presents the step-by-step procedure to analyze volatile memory and classify benign and malware samples. The proposed model comprises three components: (1) malware execution, (2) volatile memory analysis, and (3) machine learning (ML) classification.

Fig. 2: Proposed Model

Before presenting the components of the proposed model, we compare Autopsy, Redline and Volatility as the common memory analyzer tools that are used in digital memory forensics.

Autopsy is a user interface analyzer tool that focuses on timeline analysis and web artifact extraction on data to display the file information history. With its data carving facility, Autopsy is able to retrieve files that have been deleted but are still held in allocated memory. Autopsy is able to scan for indicators of compromise in the device quickly by using multiple cores with tasks done in parallel [28].

Redline is a live response end point analysis tool that focuses on collecting all data in memory. It emphasizes end point memory security to prevent exploiting end-point vulnerabilities. Redline's toolkit can identify indicators of compromise. It uses the SHA-1 and MD5 hash algorithms to verify the integrity of data [29].

Volatility reads a memory dump file and extracts specific information for analysis. The extraction is done separately to not interfere with the system being analyzed. The Volatility framework breaks down each module by feature. It further breaks down these features into lists of processes, sockets, handles as well as other features. These features are then broken down into more detail; for example, the total number of sockets and the count of TCP and UDP sockets are listed [3].

In general, Redline works with live response, Autopsy deals with web artifact extraction, and Volatility extracts and breaks down features into very specific modules that can be used for comparison on a detailed level. That is why we selected Volatility to obtain memory dumps.

A. Step 1: Malware Execution

To begin with, we selected seven malware and twelve benign samples, totalling nineteen samples. We executed malware samples in a sandbox in a virtual environment by using a Windows virtual machine. We also executed benign samples in parallel. Volatility is used to capture memory dumps as a result of execution of benign and malware samples. Finally, we obtained nineteen memory dumps at the end of this step.

B. Step 2: Volatile Memory Analysis

We analyzed the volatile memory dumps obtained in the previous step using VolMemLyzer. It extracted thirty-six characteristics from the memory dumps taken during live malware infection. A detailed description of the working of VolMemLyzer is given in Section VI.

C. Step 3: ML Classification

Since we got only nineteen memory dumps, we performed sampling in Weka to increase the size of the memory dumps by 100% and added 7% noise to the sampled data so that the classifier does not memorize the training samples. We also tried adding less noise, but the classifier was still memorizing the training samples, which resulted in the classifier detecting all the samples with 100% accuracy. This way we obtained 1900 memory dumps in the dataset. We applied adaboost, k-nearest neighbour, decision trees and random forest ML classifiers to perform binary and multi-class classification on benign and malware samples. We used 10-fold cross validation to do so. It is found that decision trees and random forest classifiers perform equally well. These two classifiers also outperform other ML classifiers in classifying benign samples and malware families.

VI. VOLMEMLYZER: FEATURE EXTRACTION FROM MEMORY DUMPS

We introduce VolMemLyzer [30], a Python-based script that aids the feature extraction from memory dumps generated by Volatility [3]. VolMemLyzer extracts thirty-six features from memory dumps corresponding to nine categories: processes, dynamic link libraries, handles, loaded modules, code injections, connections, sockets, services and drivers, and callbacks. Table II provides the list of features with their description.

1. Process features: Different features corresponding to processes include total number of processes, parent processes, number of threads created by a process, and average number of handles created by the process. Further, it extracts total number of processes not found in various process lists, handles, and sessions.
2. Dynamic link libraries: It includes total and average number of loaded libraries for each process.
3. Handles: It extracts total and average number of handles opened per process.
4. Loaded modules: It covers total number of modules missing from the load, init, and memory list.

TABLE II: Features list
Module | Feature No. | Feature Name | Description
pslist | F0 | nproc | Total number of processes
pslist | F1 | nppid | Total number of parent processes
pslist | F2 | avg_threads | Average number of threads for the processes
pslist | F3 | nprocs64bit | Total number of 64 bit processes
pslist | F4 | avg_handlers | Average number of handlers
dlllist | F5 | ndlls | Total number of loaded libraries for every process
dlllist | F6 | avg_dlls_per_proc | Average number of loaded libraries per process
handles | F7 | nhandles | Total number of opened handles
handles | F8 | avg_handles_per_proc | Average number of handles per process
ldrmodules | F9 | not_in_load | Total number of modules missing from the load list
ldrmodules | F10 | not_in_init | Total number of modules missing from the init list
ldrmodules | F11 | not_in_mem | Total number of modules missing from the memory list
malfind | F12 | ninjections | Total number of hidden code injections
psxview | F13 | not_in_pslist | Total number of processes not found in the pslist
psxview | F14 | not_in_eprocess_pool | Total number of processes not found in the psscan
psxview | F15 | not_in_ethread_pool | Total number of processes not found in the thrdproc
psxview | F16 | not_in_pspcid_list | Total number of processes not found in the pspcid
psxview | F17 | not_in_csrss_handles | Total number of processes not found in the csrss
psxview | F18 | not_in_session | Total number of processes not found in the session
psxview | F19 | not_in_deskthrd | Total number of processes not found in the deskthrd
connections | F20 | nconnections | Total number of connections
connections | F21 | nremotes | Total number of remote connections
sockets | F22 | nsockets | Total number of sockets
sockets | F23 | ntcp | Total number of TCP sockets
sockets | F24 | nudp | Total number of UDP sockets
modules | F25 | nmodules | Total number of modules
svcscan | F26 | nservices | Total number of services
svcscan | F27 | kernel_drivers | Total number of kernel drivers
svcscan | F28 | fs_drivers | Total number of file system drivers
svcscan | F29 | process_services | Total number of Windows 32 owned processes
svcscan | F30 | shared_process_services | Total number of Windows 32 shared processes
svcscan | F31 | interactive_process_services | Total number of interactive service processes
svcscan | F32 | nactive | Total number of actively running service processes
callbacks | F33 | ncallbacks | Total number of callbacks
callbacks | F34 | nanonymous | Total number of unknown processes
callbacks | F35 | ngeneric | Total number of generic processes
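
The pslist rows of Table II (F0, F1, F2 and F4) are simple aggregates over the process table. The following is a minimal sketch of how such values might be computed, not VolMemLyzer's actual code. It assumes that the Volatility JSON table is an object with "columns" and "rows" keys, that the relevant pslist columns are named PPID, Thds and Hnds, and that "parent processes" means distinct parent process IDs. The sample data is invented and all function names are hypothetical. F3 is left out because the text does not say how it is derived.

```python
import csv
import io
import json

# Hypothetical sample shaped like a tabular JSON dump
# ({"columns": [...], "rows": [[...], ...]}); all values are made up.
SAMPLE_PSLIST_JSON = """
{"columns": ["Name", "PID", "PPID", "Thds", "Hnds"],
 "rows": [["System", 4, 0, 80, 500],
          ["smss.exe", 260, 4, 2, 30],
          ["csrss.exe", 340, 332, 10, 400],
          ["wininit.exe", 380, 332, 8, 270]]}
"""


def table_to_records(table):
    """Turn a header-plus-rows table into a list of key-value records."""
    return [dict(zip(table["columns"], row)) for row in table["rows"]]


def pslist_features(records):
    """Compute illustrative versions of F0, F1, F2 and F4 from Table II."""
    nproc = len(records)
    if nproc == 0:
        # Avoid dividing by zero on an empty process table.
        return {"nproc": 0, "nppid": 0, "avg_threads": 0.0, "avg_handlers": 0.0}
    return {
        "nproc": nproc,
        # Assumption: count distinct parent process IDs.
        "nppid": len({r["PPID"] for r in records}),
        "avg_threads": sum(r["Thds"] for r in records) / nproc,
        "avg_handlers": sum(r["Hnds"] for r in records) / nproc,
    }


def features_to_csv(features):
    """Serialize one feature record as CSV text (header plus one row)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(features))
    writer.writeheader()
    writer.writerow(features)
    return buf.getvalue()


records = table_to_records(json.loads(SAMPLE_PSLIST_JSON))
features = pslist_features(records)
print(features_to_csv(features))
```

Keeping the table-to-record conversion separate from the aggregation means the same conversion could be reused for the other modules in Table II, with one small aggregation function per module.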

5. Code injections: It extracts hidden code injections found in the memory dump.
6. Connections: It includes total number of connections including remote connections.
7. Sockets: Sockets correspond to TCP and UDP sockets created by the malware during execution.
8. Services and drivers: It extracts total number of services, kernel drivers, file system drivers, Windows 32 owned processes, Windows 32 shared processes, interactive services, and actively running processes.
9. Callbacks: It includes total number of callbacks, unknown processes, and generic processes.

Volatility provides several tools or modules that conduct various types of memory forensics such as listing running processes (pslist) or loaded DLLs for each process (dlllist). VolMemLyzer calls these modules via the volatility command-line tool as shown in Fig. 3. Each Volatility module outputs the result as a table formatted in JSON (enabled by passing the --output=json argument to Volatility), which is written to a temporary file allocated by VolMemLyzer. Every time Volatility is invoked, VolMemLyzer parses the table-like Volatility output to a list of key-value pairs to simplify the later feature extraction process. Each key-value pair uses the table headers as the keys and their corresponding data cells as values. VolMemLyzer then extracts features from the key-value pair list according to our proposed model. The extracted features are then written to a CSV file specified by user-defined command-line arguments.

    volatility \
        --output=json \
        --output-file=/tmp/tmpaabbccdd11223344 \
        pslist \
        /path/to/capture.vmem

Fig. 3: Sample command sent to Volatility by VolMemLyzer

VII. ANALYSIS AND DISCUSSION

This section presents major findings and discussion of results.

A. Feature Extraction

We used the Extra Trees Classifier [31] to rank the thirty-six features extracted from nine categories presented in Table II and plotted them in Fig. 4. We computed the gini importance value of each feature and ranked them in descending order to shortlist the most important features. It is evident that total number of services (F26), kernel drivers (F27), modules
missing from the init list (F10), Windows 32 shared processes AVAILABILITY
(F30), and generic processes (F35) are the top five features.
Further, it is observed that total number of file system drivers The source code for VolMemLyzer will be made publicly
(F28), Windows 32 owned processes (F29), processes not available in GitHub after publication of this work [30].
found in the psscan (F14), 64-bit processes (F3), interactive
service processes (F31), and unknown processes (F34) are the R EFERENCES
least important features with zero gini importance value. [1] Y. Cheng, X. Fu, X. Du, B. Luo, and M. Guizani, “A lightweight
live memory forensic approach based on hardware virtualization,”
B. Binary and Multi-class Classification Information Sciences, vol. 379, pp. 23 – 41, 2017. [Online]. Available:
The results of ML classification for benign and malware http://www.sciencedirect.com/science/article/pii/S0020025516305011
[2] “Recall forensics,” 2020, =http://www.rekall-forensic.com/.
samples are presented in Table III. Decision tree and random [3] “Volatility,” 2020, =https://github.com/volatilityfoundation/
forest classifiers have 93% true positive rate and 6.6% false volatility.
positive rate. These are followed by k-nearest neighbour with [4] F. Block and A. Dewald, “Linux memory forensics:
Dissecting the user space process heap,” Digital Investigation,
87.2% true positive rate whilst adaboost has only 63.8% true vol. 22, pp. S66 – S75, 2017. [Online]. Available:
positive rate with zero precision and f-measure. http://www.sciencedirect.com/science/article/pii/S1742287617301895
Binary classification of benign and malware samples by all [5] A. Case and G. G. Richard, “Memory forensics: The path
forward,” Digital Investigation, vol. 20, pp. 23 – 33, 2017,
the ML classifiers is plotted in confusion matrices in Fig. special Issue on Volatile Memory Analysis. [Online]. Available:
5. Adaboost successfully classifies only 96 malware samples http://www.sciencedirect.com/science/article/pii/S1742287616301529
while k-nearest neighbour classifies 540 malware samples [6] T. Latzo, R. Palutke, and F. Freiling, “A universal taxonomy
and survey of forensic memory acquisition techniques,” Digital
correctly. Decision tree and random forest classify 651 out of Investigation, vol. 28, pp. 56 – 69, 2019. [Online]. Available:
http://www.sciencedirect.com/science/article/pii/S1742287618304535

733 malware samples. All these classifiers correctly identify 1,116 benign samples.

Finally, the multi-class classification of malware families is plotted in Fig. 6. AdaBoost gives the worst results: it does not correctly classify a single sample of the BE2, Tigger, Prolaco, Laqma, and Zeus malware families, and it is biased towards the CoreFlood and Sality malware families. The k-nearest neighbour classifier improves on these results considerably and correctly classifies the majority of the malware family samples. The best results are obtained from the decision tree and random forest classifiers, which both correctly classify 93% of the malware samples across all malware families.

VIII. CONCLUSION AND FUTURE WORKS

Memory analysis is gaining prominence owing to its ability to capture live malware behavior. Several tools in the related works fetch information from kernel and user space to classify malware in the running state. To improve the malware classification process, we developed VolMemLyzer to extract the thirty-six most essential features from nine different categories in memory dumps obtained during live malware infection. We also ranked these features using an extra trees classifier to shortlist the best of the extracted features. Finally, we tested the developed tool with 1900 memory dumps, comprising 1127 benign and 773 malware samples, and validated the results with ML classifiers. In binary and multi-class malware classification, the decision tree and random forest classifiers achieved a 93% true positive rate when applied to a shortlisted set of features.

However, this work has some limitations. We tested the results on a sampled dataset with seven malware families and a few benign samples. To improve the malware classification results, we will generate a new dataset consisting of diverse malware families and a comparatively large number of benign and malware samples.

[7] R. Palutke, F. Block, P. Reichenberger, and D. Stripeika, “Hiding process memory via anti-forensic techniques,” Forensic Science International: Digital Investigation, vol. 33, p. 301012, 2020. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S2666281720302614
[8] F. Block and A. Dewald, “Windows memory forensics: Detecting (un)intentionally hidden injected code by examining page table entries,” Digital Investigation, vol. 29, pp. S3–S12, 2019. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1742287619301574
[9] D. Uroz and R. J. Rodríguez, “Characteristics and detectability of windows auto-start extensibility points in memory forensics,” Digital Investigation, vol. 28, pp. S95–S104, 2019. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1742287619300362
[10] N. Lewis, A. Case, A. Ali-Gombe, and G. G. Richard, “Memory forensics and the windows subsystem for linux,” Digital Investigation, vol. 26, pp. S3–S11, 2018. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1742287618301944
[11] D. Uroz and R. J. Rodríguez, “On challenges in verifying trusted executable files in memory forensics,” Forensic Science International: Digital Investigation, vol. 32, p. 300917, 2020. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S2666281720300123
[12] A. Case, R. D. Maggio, M. Firoz-Ul-Amin, M. M. Jalalzai, A. Ali-Gombe, M. Sun, and G. G. Richard, “Hooktracer: Automatic detection and analysis of keystroke loggers using memory forensics,” Computers & Security, vol. 96, p. 101872, 2020. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0167404820301450
[13] A. Case, A. K. Das, S.-J. Park, J. R. Ramanujam, and G. G. Richard, “Gaslight: A comprehensive fuzzing architecture for memory forensics frameworks,” Digital Investigation, vol. 22, pp. S86–S93, 2017. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1742287617301986
[14] J. Kang, S. Jang, S. Li, Y.-S. Jeong, and Y. Sung, “Long short-term memory-based malware classification method for information security,” Computers and Electrical Engineering, vol. 77, 2019.
[15] C. Rathnayaka and A. Jamdagni, “An efficient approach for advanced malware analysis using memory forensic technique,” 2017 IEEE Trustcom/BigDataSE/ICESS, Sydney, NSW, pp. 1145–1150, 2017.
[16] M. Aghaeikheirabady, S. M. R. Farshchi, and H. Shirazi, “A new approach to malware detection by comparative analysis of data structures in a memory image,” 2014 International Congress on Technology, Communication and Knowledge (ICTCK), Mashhad, pp. 1–4, 2014.
[17] Y. Dai, H. Li, Y. Qian, and X. Lu, “A malware classification method based on memory dump grayscale image,” Digital Investigation, vol. 27, pp. 30–37, 2018.
[18] C. Yucel and A. Koltuksuz, “Imaging and evaluating the memory access for malware,” Forensic Science International: Digital Investigation, vol. 32, 2019.

Fig. 4: Feature Ranking
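A feature ranking of the kind shown in Fig. 4 can be sketched with scikit-learn's ExtraTreesClassifier [31]. This is a minimal illustration rather than the authors' script: the data is synthetic (generated with make_classification), the feature names are hypothetical placeholders, and the hyperparameters (100 trees, a top-10 shortlist) are assumptions, not values taken from the paper.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# Synthetic stand-in for the extracted feature matrix: 36 columns mirror the
# number of features the tool extracts; the values themselves are random.
X, y = make_classification(
    n_samples=300, n_features=36, n_informative=8, random_state=0
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # hypothetical names

# Fit an extra trees ensemble; impurity-based importances sum to 1.
model = ExtraTreesClassifier(n_estimators=100, random_state=0)
model.fit(X, y)
importances = model.feature_importances_

# Sort features from most to least important.
order = np.argsort(importances)[::-1]
ranking = [(feature_names[i], float(importances[i])) for i in order]

# Shortlist the top-ranked features (the cutoff of 10 is an assumption).
top_features = [name for name, _ in ranking[:10]]

for name, score in ranking[:10]:
    print(f"{name}: {score:.4f}")
```

The shortlist in top_features could then be used to select columns before training the classifiers that are compared in Table III.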

TABLE III
Classifier            TP Rate   FP Rate   Precision   Recall   F-Measure
AdaBoost              0.638     0.085     0           0.638    0
k-Nearest Neighbour   0.872     0.155     0.866       0.872    0.849
Decision Tree         0.930     0.066     0.930       0.930    0.928
Random Forest         0.930     0.066     0.930       0.930    0.928
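The columns of Table III can be derived from a confusion matrix such as those in Figs. 5 and 6. The sketch below computes one-vs-rest metrics per class and then averages them weighted by class size. The weighted averaging is an assumption (the paper does not state how the per-class values were aggregated), and the 2x2 matrix is an invented example, not data from the paper.

```python
from typing import Dict, List

METRIC_NAMES = ("tp_rate", "fp_rate", "precision", "recall", "f_measure")


def per_class_metrics(cm: List[List[int]], k: int) -> Dict[str, float]:
    """One-vs-rest metrics for class k.

    cm[i][j] counts samples of true class i that were predicted as class j.
    """
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                   # class-k samples predicted as another class
    fp = sum(row[k] for row in cm) - tp    # other samples predicted as class k
    tn = total - tp - fn - fp

    tp_rate = tp / (tp + fn) if tp + fn else 0.0   # identical to recall
    fp_rate = fp / (fp + tn) if fp + tn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + tp_rate
    f_measure = 2 * precision * tp_rate / denom if denom else 0.0
    return {
        "tp_rate": tp_rate,
        "fp_rate": fp_rate,
        "precision": precision,
        "recall": tp_rate,
        "f_measure": f_measure,
    }


def weighted_metrics(cm: List[List[int]]) -> Dict[str, float]:
    """Average the per-class metrics, weighting each class by its sample count."""
    total = sum(sum(row) for row in cm)
    summary = {name: 0.0 for name in METRIC_NAMES}
    for k, row in enumerate(cm):
        weight = sum(row) / total
        for name, value in per_class_metrics(cm, k).items():
            summary[name] += weight * value
    return summary


# Invented binary example (rows: true benign, true malware).
example_cm = [[90, 10],
              [20, 80]]
summary = weighted_metrics(example_cm)
for name in METRIC_NAMES:
    print(f"{name}: {summary[name]:.3f}")
```

Under this weighting the averaged TP rate equals overall accuracy, which explains why the TP Rate and Recall columns of Table III carry the same values.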

(a) AdaBoost (b) k Nearest Neighbour (c) Decision Tree (d) Random Forest
Fig. 5: Confusion Matrix for Binary Classification

(a) AdaBoost (b) k Nearest Neighbour (c) Decision Tree (d) Random Forest
Fig. 6: Confusion Matrix for Classifying Malware Families

[19] H. Li, D. Zhan, T. Liu, and L. Ye, “Using deep-learning-based memory analysis for malware detection in cloud,” 2019 IEEE 16th International Conference on Mobile Ad Hoc and Sensor Systems Workshops (MASSW), Monterey, CA, USA, pp. 1–6, 2019.
[20] N. Nissim, O. Lahav, A. Cohen, Y. Elovici, and L. Rokach, “Volatile memory analysis using the minhash method for efficient and secured detection of malware in private cloud,” Computers & Security, vol. 87, 2019.
[21] M. Ajay Kumara and C. Jaidhar, “Leveraging virtual machine introspection with memory forensics to detect and characterize unknown malware using machine learning techniques at hypervisor,” Digital Investigation, vol. 23, pp. 99–123, 2017. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1742287617303328
[22] Z. Xu, S. Ray, P. Subramanyan, and S. Malik, “Malware detection using machine learning based analysis of virtual memory access patterns,” Design, Automation & Test in Europe Conference & Exhibition (DATE), Lausanne, pp. 169–174, 2017.
[23] C.-W. Tien, J.-W. Liao, S.-C. Chang, and S.-Y. Kuo, “Memory forensics using virtual machine introspection for malware analysis,” IEEE Conference on Dependable and Secure Computing, Taipei, pp. 518–519, 2017.
[24] Y. Kawakoya, M. Iwamura, and M. Itoh, “Memory behavior-based automatic malware unpacking in stealth debugging environment,” 5th International Conference on Malicious and Unwanted Software, Nancy, Lorraine, pp. 39–46, 2010.
[25] I. Sadek, P. Chong, S. U. Rehman, Y. Elovici, and A. Binder, “Memory snapshot dataset of a compromised host with malware using obfuscation evasion techniques,” Data in brief, vol. 26, 2019.
[26] A. Lakhotia and P. Black, “Mining malware secrets,” 12th International Conference on Malicious and Unwanted Software (MALWARE), Fajardo, pp. 11–18, 2017.
[27] A. Case and G. G. Richard, “Detecting objective-c malware through memory forensics,” Digital Investigation, vol. 18, pp. S3–S10, 2016. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1742287616300524
[28] “Autopsy,” 2020. [Online]. Available: https://github.com/sleuthkit/autopsy
[29] “Redline,” 2020. [Online]. Available: https://github.com/craigwblake/redline
[30] “Volatility memory analyzer,” 2019. [Online]. Available: https://github.com/ahlashkari/VolMemLyzer
[31] “Extra trees classifier,” 2020. [Online]. Available: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.ExtraTreesClassifier.html
