VolMemLyzer Volatile Memory Analyzer for Malware Classification Using Feature Engineering

The document introduces VolMemLyzer, a Python-based tool designed for extracting critical features from volatile memory dumps during live malware infections, aiding in malware classification. It focuses on kernel space features and has been tested with a dataset of 1900 samples, achieving a high true positive rate for both binary and multi-class classification. The paper discusses the importance of memory forensics and presents various contributions including feature engineering and evaluation of the tool's effectiveness.

VolMemLyzer: Volatile Memory Analyzer for Malware Classification using Feature Engineering

Arash Habibi Lashkari, Beiqi Li, Tristan Lucas Carrier, Gurdip Kaur
Canadian Institute for Cybersecurity (CIC)
University of New Brunswick (UNB)
Fredericton, NB, Canada
{A.Habibi.L, beiqi.li, tcarrie2, gurdip.kaur}@unb.ca

Abstract: Memory forensics is a fundamental step that inspects malicious activities during live malware infection. Memory analysis not only captures malware footprints but also collects several essential features that may be used to extract hidden original code from obfuscated malware. There are significant efforts in analyzing volatile memory using several tools and approaches. These approaches fetch relevant information from the kernel and user space of the operating system to investigate running malware. However, the fetching process will accelerate if the most dominating features required for malware classification are readily available. This paper introduces VolMemLyzer, a Python-based tool developed to extract the most critical characterization feature set from the memory dumps taken during live malware infection. It extracts the thirty-six most essential features and ranks them to classify malware. The tool is tested with a dataset of 1900 benign and malware samples and achieves a high true positive rate for binary and multi-class malware classification.

Index Terms: memory forensics, volatility, memory analysis, memory dump, malware classification

I. INTRODUCTION

Memory forensics is the process of investigating illegitimate behavior by acquiring and analyzing volatile memory dumps. A memory dump is a snapshot of the current state of the system's memory that is captured while the malware is running. It contains a plethora of data that includes running processes, dynamic link libraries, loaded drivers, handles, callbacks, logged users, active network connections, and opened files and sockets.

Memory forensics has gained importance over the last decade due to its capability to capture malware behavior in the running state. Although malware may be obfuscated or encrypted, it still leaves its footprints in the memory [1]. Most of the information required for memory forensics resides in the kernel space of the operating system. This information includes process lists, threads, sockets, and connections with other systems. It can be fetched with tools like Rekall [2] and Volatility [3]. Some information also resides in the user space; for example, the user-space heap stores internet protocol (IP) addresses, domain name system (DNS) information, and credentials. Since kernel space manages user space, this paper is concerned with kernel space information.

To effectively analyze volatile memory, the investigator needs to identify and extract useful information from memory dumps. This information includes the location of data, the kind of data stored at a particular location in the memory dump, and the total amount of memory occupied by that data [4].

This paper presents VolMemLyzer, a Python-based tool developed to extract the most important characterization feature set from the memory dumps taken during live malware infection. The feature set includes different categories of kernel space features such as processes, threads, sockets, dynamic link libraries, number of connections, and callbacks. These features account for the majority of memory forensic activities taking place in the kernel space, which makes them important for memory forensics. The main contributions of this paper are as follows:

• We develop VolMemLyzer, a tool to extract the most important features from memory dumps in kernel space obtained by using Volatility (Development contribution).
• We perform feature engineering to shortlist the best memory features from the extracted list of thirty-six memory features, which are classified into different kernel space categories (Scientific contribution).
• We analyze the benign and malicious samples to perform binary and multi-class malware classification with a high true positive rate (Evaluation contribution).

The rest of the paper is organized as follows: Section II gives a brief overview of the background on memory analysis and forensics. Section III introduces related works. Section IV details the dataset and Section V describes the proposed model. Section VI introduces VolMemLyzer. The discussion of results and analyses is presented in Section VII. Finally, Section VIII concludes the paper and highlights future work.

II. BACKGROUND

Memory forensic techniques have evolved a lot from simple string search to complex kernel space data acquisition for various operating systems [5]. There are a large number of tools and techniques available to accomplish this task, such as mdd, inception, the Windows process monitor, and crash dumps. A taxonomy of different memory acquisition methods for ring-based architecture is presented in [6]. This taxonomy puts forward the advantages and disadvantages of memory acquisition methods to perform memory forensics.

978-1-7281-6937-8/20/$31.00 ©2020 IEEE

Authorized licensed use limited to: AMRITA VISHWA VIDYAPEETHAM AMRITA SCHOOL OF ENGINEERING. Downloaded on July 20,2023 at 12:10:40 UTC from IEEE Xplore. Restrictions apply.
Use of encryption hinders the process of recovering digital evidence. Therefore, it becomes important to acquire live memory dumps to perform memory analysis. On the other hand, with the prevalence of memory forensic tools, malware writers have started using anti-forensic techniques to hide malicious memory space. These techniques manipulate shared memory and the kernel space responsible for managing user space [7]. In order to prevent malware from manipulating data and tools, identifying hidden code is essential in memory forensics [8].

Memory analysis tools are very helpful to identify a specific area of compromise and direct the analyst to that area [9]. Memory forensics tools support not only the Windows operating system but also Executable and Linkable Format (ELF) executable files in Linux. The Windows Subsystem for Linux (WSL) breaks the analysis of ELF files and tracks the meta-data for every process [10]. It integrates Linux-specific data structures into Windows' existing data structures. However, this integration may result in inconsistent analysis results and is an area of extended research.

Memory forensics imposes some limitations on digitally signed Windows portable executable files, such as data incompleteness, data modification, and process inconsistencies [11]. Since memory dumps capture an extremely large amount of information, automatic detection tools play a pivotal role in shortlisting useful information for analysis [12]. However, they need to be stress-tested for both open- and closed-source memory frameworks to ensure that they do not fail in critical moments [13].

III. RELATED WORKS

This section uncovers the related works in malware analysis through memory forensics.

Malware classification is a trade-off between time and accuracy. To reduce detection time and improve accuracy of detection, a word2vec-based Long Short-Term Memory (LSTM) method for classifying malware is proposed [14]. It analyzes opcodes and API function names with reduced dimensions. The proposed method is validated using the Microsoft Malware Classification Challenge dataset. Working on the same concept, static analysis and memory forensic techniques [15] are combined to reduce detection time and increase precision. To speed up the malware detection process, [16] proposes simultaneous use of kernel and user space to extract changes in registry files, calls to library files, and operating system functions.

Although memory dumps can fully describe malware behavior, dynamic malware analysis is vulnerable to malware evasion. Therefore, researchers propose the use of hardware features to analyze malware by converting the memory dump file into a gray-scale image [17]. The proposed method achieves higher accuracy on a backdoor malware dataset. Similarly, memory access images are used to detect malware using a three-dimensional heat mapping system [18]. In another similar attempt, Li et al. [19] collect a memory snapshot of the system and convert it into a gray-scale image using a convolutional neural network. The deep learning model performs binary classification with reduced run time and increased accuracy of detection.

The use of virtual machines (VMs) in malware analysis is prevalent in several malware investigation techniques. A MinHash-based efficient volatile memory dump comparison method is generalized to detect ransomware and RATs [20]. In another approach, Virtual Machine Introspection (VMI) is used to propose a Virtual Machine Monitor (VMM)-based, guest-assisted Automated Internal-and-External (A-IntExt) introspection system to investigate a live guest operating system [21]. Xu et al. also use virtual memory access patterns [22] to support hardware-assisted malware detection. Finally, a new agentless sandbox design is introduced that works independently of the VM hypervisor [23].

Most sophisticated malware samples are obfuscated to hide the original source code. Traditional code extraction processes require manual intervention. Therefore, an automated behavior-based malware unpacking algorithm is introduced in [24]. It uses a stealth debugger that avoids detection, so the malware is unable to tell when to stay dormant. The debugger can detect most packers at a very high accuracy rate, with some packers getting a perfect detection rate. To strengthen the detection of obfuscated malware, a dataset consisting of positive and negative memory snapshots is designed [25]. The dataset challenges detection methods on one of the biggest malware detection problems by using many advanced payload systems and obfuscated malware.

Lakhotia and Black [26] create a plugin for Volatility that embeds code to automatically extract information about what data was viewed by the malware. It focuses on finding what documents might have been read or compromised due to malicious activity. Extracting this information is usually extremely difficult and time-consuming, but the plugin helps to speed up the process of identifying indicators of compromise.

Case and Richard [27] propose a novel memory forensics technique to examine malware code written in Objective-C on the Mac operating system (OS). The proposed method focuses on new methods for detecting userland malware code that utilizes the rich set of APIs to manipulate and steal application data and perform other malicious activities. The authors also investigate the technique against memory samples infected with the malware found in targeted OSX attacks.

To summarize, there is significant progress in memory forensic techniques using malware analysis. These efforts result in reduced malware detection time and improved accuracy and precision of malware classification. Some of the approaches utilize a virtual machine environment to detect obfuscated malware samples. Table I summarizes the advantages and disadvantages of related works presented in this section. However, extracting useful memory-related features from memory dumps and using learning techniques are two key requirements to accomplish all these tasks. This paper presents VolMemLyzer, which gathers the thirty-six most important features from the memory dumps. These features

TABLE I: Analyzing the related works
Article | Technique | Advantages | Disadvantages
Kang et al., 2019 | Long short-term memory | Quicker learn time | High false positives
Rathnayaka et al., 2017 | Efficient Advanced Analysis | Quicker investigation time, detection pathing | False negatives
Aghaeikheirabady et al., 2014 | Analysis in a memory image | Strengthens future security | High learning time
Dai et al., 2018 | Gray-scale image | Low overhead comparison time | High memory usage
Yucel et al., 2019 | Imaging and evaluating | Low classification time, indicates malware intensity | Very high memory usage
Li et al., 2019 | Deep-Learning-based analysis | Low run-time | High memory usage
Nissim et al., 2019 | MinHash | High memory efficiency | Longer training time
Xu et al., 2017 | Virtual Memory Patterns | Multiple level security | Very high learning time
Tien et al., 2020 | Virtual Machine Introspection | Ease of use and obfuscation detection | False negatives
Kawakoya et al., 2010 | Behavior-Based Unpacking | Automatic obfuscation detection | High complexity
Sadek et al., 2019 | Obfuscation evasion dataset | Obfuscation detection | High memory usage
Lakhotia et al., 2017 | Mining Malware Secrets | Compromised detection | Higher extract time

help to perform binary and multi-class malware classification with high precision and a very low false-positive rate. The results are validated with a dataset presented in the next section.

IV. DATASET

We used a dataset with 1900 memory dump files, containing 1127 benign and 773 malicious memory dumps, as presented in Fig. 1.

Fig. 1: Dataset Details

• Benign:
The benign memory dump files capture normal user activities such as browsing, sending and receiving email from an email server, accessing Microsoft Office files, running video and music services, and other miscellaneous activities.

• Malware:
We selected seven malware families that have minimal manipulation in the memory space. These families include BE2, CoreFlood, Sality, Tigger, Prolaco, Laqma, and Zeus.

BE2 is a tool that is used as a custom plugin by malicious parties. BE2 focuses on spreading payloads and launching different attacks and is used with other malware to steal data while the website is down.

Laqma is a Trojan that uses a rootkit. It is often distributed with phishing emails. It gains access to the user's device and looks for confidential information that could be encrypted and sold back to the user for ransom. As such, Laqma works as Trojan ransomware.

CoreFlood is a botnet that focuses on a targeted website and aims at turning off the website through automated means. It consists of Trojan-like bots that have targeted high-profile targets to steal credentials. CoreFlood has stolen large amounts of data, ranging from banking information, email accounts, and social media accounts to confidential reports of large and small companies.

Prolaco is a worm that focuses on infection to execute other malicious programs. Prolaco propagates through email attachments in PDF, doc, or txt form and other well-known, non-malicious file types. It makes copies of itself, and while it hides and stays free of detection, the copies spread out and explore the system.

Sality is a virus family that infects files and focuses on connecting malware to the machine. Along with virus infection, which is done by downloading the malicious file, it can also use a rootkit and backdoor to gain access to the machine. Sality uses a payload system that gets executed when accessing a machine to steal user data.

Tigger is a Trojan malware family that thrives on stealing data. It is highly evasive to keep users from realising that their device has been compromised. It goes even further by disabling all other malware to prevent it from being detected. It performs a cleaning operation at the end to erase its footprints. Tigger also embeds a rootkit to make its detection more difficult.

Zeus is a Trojan that focuses on getting credentials and manipulating data. Zeus is often used for stealing banking
information and sending it back to a linked botnet. With its ability to log keystrokes on the victim's device, Zeus is able to steal banking credentials after being downloaded through message spamming or drive-by download activity.

V. PROPOSED MODEL USING MEMORY FEATURE ENGINEERING

This section uncovers the proposed model to analyze memory dumps using feature engineering. Fig. 2 presents the step-by-step procedure to analyze volatile memory and classify benign and malware samples. The proposed model comprises three components: (1) malware execution, (2) volatile memory analysis, and (3) machine learning (ML) classification.

Fig. 2: Proposed Model

Before presenting the components of the proposed model, we compare Autopsy, Redline, and Volatility as the common memory analyzer tools that are used in digital memory forensics.

Autopsy is a user interface analyzer tool that focuses on timeline analysis and web artifact extraction on data to display the file information history. With its data carving facility, Autopsy is able to retrieve files that have been deleted but are still held in allocated memory. Autopsy is able to scan for indicators of compromise in the device quickly by using multiple cores with tasks done in parallel [28].

Redline is a live response endpoint analysis tool that focuses on collecting all data in memory. It emphasizes endpoint memory security to prevent the exploitation of endpoint vulnerabilities. Redline's toolkit can identify indicators of compromise. It uses the SHA-1 and MD5 hash algorithms to preserve the integrity of data [29].

Volatility reads a memory dump file and extracts specific information for analysis. The extraction is done separately so as not to interfere with the system being analyzed. The Volatility framework breaks down each module by feature. It further breaks down these features into lists of processes, sockets, and handles, as well as other features. These features are then broken down into more detail; for example, the total number of sockets and the counts of TCP and UDP sockets are listed [3].

In general, Redline works with live response, Autopsy deals with web artifact extraction, and Volatility extracts and breaks down features into very specific modules that can be used for comparison on a detailed level. That is why we selected Volatility to obtain memory dumps.

A. Step 1: Malware Execution

To begin with, we selected seven malware and twelve benign samples, totalling nineteen samples. We executed the malware samples in a sandbox in a virtual environment by using a Windows virtual machine. We also executed the benign samples in parallel. Volatility is used to capture memory dumps as a result of the execution of benign and malware samples. Finally, we obtained nineteen memory dumps at the end of this step.

B. Step 2: Volatile Memory Analysis

We analyzed the volatile memory dumps obtained in the previous step using VolMemLyzer. It extracted thirty-six characteristics from the memory dumps taken during live malware infection. A detailed description of the working of VolMemLyzer is given in Section VI.

C. Step 3: ML Classification

Since we got only nineteen memory dumps, we performed sampling in Weka to increase the size of the memory dumps by 100% and added 7% noise to the sampled data so that the classifier does not memorize the training samples. We also tried adding less noise, but the classifier still memorized the training samples, which resulted in it detecting all the samples with 100% accuracy. This way we obtained 1900 memory dumps in the dataset. We applied AdaBoost, k-nearest neighbour, decision tree, and random forest ML classifiers to perform binary and multi-class classification on benign and malware samples, using 10-fold cross validation. We found that the decision tree and random forest classifiers perform equally well. These two classifiers also outperform the other ML classifiers in classifying benign samples and malware families.

VI. VOLMEMLYZER: FEATURE EXTRACTION FROM MEMORY DUMPS

We introduce VolMemLyzer [30], a Python-based script that aids feature extraction from memory dumps generated by Volatility [3]. VolMemLyzer extracts thirty-six features from memory dumps corresponding to nine categories: processes, dynamic link libraries, handles, loaded modules, code injections, connections, sockets, services and drivers, and callbacks. Table II provides the list of features with their description.

1. Process features: Different features corresponding to processes include the total number of processes, parent processes, the number of threads created by a process, and the average number of handles created by the process. Further, it extracts the total number of processes not found in various process lists, handles, and sessions.
2. Dynamic link libraries: It includes the total and average number of loaded libraries for each process.
3. Handles: It extracts the total and average number of handles opened per process.
4. Loaded modules: It covers the total number of modules missing from the load, init, and memory lists.

TABLE II: Features list
Module | Feature No. | Feature Name | Description
pslist | F0 | nproc | Total number of processes
pslist | F1 | nppid | Total number of parent processes
pslist | F2 | avg_threads | Average number of threads for the processes
pslist | F3 | nprocs64bit | Total number of 64 bit processes
pslist | F4 | avg_handlers | Average number of handlers
dlllist | F5 | ndlls | Total number of loaded libraries for every process
dlllist | F6 | avg_dlls_per_proc | Average number of loaded libraries per process
handles | F7 | nhandles | Total number of opened handles
handles | F8 | avg_handles_per_proc | Average number of handles per process
ldrmodules | F9 | not_in_load | Total number of modules missing from the load list
ldrmodules | F10 | not_in_init | Total number of modules missing from the init list
ldrmodules | F11 | not_in_mem | Total number of modules missing from the memory list
malfind | F12 | ninjections | Total number of hidden code injections
psxview | F13 | not_in_pslist | Total number of processes not found in the pslist
psxview | F14 | not_in_eprocess_pool | Total number of processes not found in the psscan
psxview | F15 | not_in_ethread_pool | Total number of processes not found in the thrdproc
psxview | F16 | not_in_pspcid_list | Total number of processes not found in the pspcid
psxview | F17 | not_in_csrss_handles | Total number of processes not found in the csrss
psxview | F18 | not_in_session | Total number of processes not found in the session
psxview | F19 | not_in_deskthrd | Total number of processes not found in the deskthrd
connections | F20 | nconnections | Total number of connections
connections | F21 | nremotes | Total number of remote connections
sockets | F22 | nsockets | Total number of sockets
sockets | F23 | ntcp | Total number of TCP sockets
sockets | F24 | nudp | Total number of UDP sockets
modules | F25 | nmodules | Total number of modules
svcscan | F26 | nservices | Total number of services
svcscan | F27 | kernel_drivers | Total number of kernel drivers
svcscan | F28 | fs_drivers | Total number of file system drivers
svcscan | F29 | process_services | Total number of Windows 32 owned processes
svcscan | F30 | shared_process_services | Total number of Windows 32 shared processes
svcscan | F31 | interactive_process_services | Total number of interactive service processes
svcscan | F32 | nactive | Total number of actively running service processes
callbacks | F33 | ncallbacks | Total number of callbacks
callbacks | F34 | nanonymous | Total number of unknown processes
callbacks | F35 | ngeneric | Total number of generic processes
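As a rough illustration of how the pslist rows of Table II (F0 to F4) might be derived, the sketch below parses a table-like JSON document into key-value records, with the table header as keys, computes the five features, and writes them as one CSV row. It is not the VolMemLyzer source. The `columns`/`rows` JSON layout, the column names (`PPID`, `Thds`, `Hnds`, `Wow64`), the sample values, the reading of "parent processes" as distinct parent PIDs, and the rule that a process is 64-bit when `Wow64` is false are all assumptions made for this example, as are the helper names `table_to_records`, `pslist_features`, and `features_to_csv`.

```python
import csv
import io
import json

# Assumed layout of a module's JSON table: a header list plus row lists.
# The sample values are invented for illustration only.
SAMPLE_PSLIST_JSON = json.dumps({
    "columns": ["Name", "PID", "PPID", "Thds", "Hnds", "Wow64"],
    "rows": [
        ["System", 4, 0, 80, 500, 0],
        ["smss.exe", 300, 4, 2, 30, 0],
        ["explorer.exe", 1500, 1400, 20, 600, 0],
        ["notepad.exe", 2000, 1500, 2, 60, 1],
    ],
})


def table_to_records(raw_json):
    """Convert a table-like JSON document into a list of header -> cell dicts."""
    table = json.loads(raw_json)
    return [dict(zip(table["columns"], row)) for row in table["rows"]]


def pslist_features(records):
    """Compute hypothetical versions of features F0-F4 from parsed pslist rows."""
    n = len(records)
    return {
        "nproc": n,
        # Assumption: "parent processes" means distinct parent PIDs.
        "nppid": len({r["PPID"] for r in records}),
        "avg_threads": sum(r["Thds"] for r in records) / n if n else 0.0,
        # Assumption: a process is counted as 64-bit when Wow64 is false.
        "nprocs64bit": sum(1 for r in records if not r["Wow64"]),
        "avg_handlers": sum(r["Hnds"] for r in records) / n if n else 0.0,
    }


def features_to_csv(features):
    """Serialize one feature dict as a CSV header line plus one data line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(features))
    writer.writeheader()
    writer.writerow(features)
    return buf.getvalue()


records = table_to_records(SAMPLE_PSLIST_JSON)
features = pslist_features(records)
csv_text = features_to_csv(features)
print(csv_text)
```

Keeping the parsing step separate from the per-module feature functions means that a similar function could be written for each remaining module in Table II without touching the parser.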

5. Code injections: It extracts hidden code injections found in the memory dump.
6. Connections: It includes the total number of connections, including remote connections.
7. Sockets: Sockets correspond to TCP and UDP sockets created by the malware during execution.
8. Services and drivers: It extracts the total number of services, kernel drivers, file system drivers, Windows 32 owned processes, Windows 32 shared processes, interactive services, and actively running processes.
9. Callbacks: It includes the total number of callbacks, unknown processes, and generic processes.

Volatility provides several tools or modules that conduct various types of memory forensics, such as listing running processes (pslist) or loaded DLLs for each process (dlllist). VolMemLyzer calls these modules via the volatility command-line tool as shown in Fig. 3. Each Volatility module outputs the result as a table formatted in JSON (enabled by passing the --output=json argument to Volatility), which is written to a temporary file allocated by VolMemLyzer. Every time Volatility is invoked, VolMemLyzer parses the table-like Volatility output into a list of key-value pairs to simplify the later feature extraction process. Each key-value pair uses the table headers as the keys and their corresponding data cells as values. VolMemLyzer then extracts features from the key-value pair list according to our proposed model. The extracted features are then written to a CSV file specified by user-defined command-line arguments.

    volatility \
        --output=json \
        --output-file=/tmp/tmpaabbccdd11223344 \
        pslist \
        /path/to/capture.vmem

Fig. 3: Sample command sent to Volatility by VolMemLyzer

VII. ANALYSIS AND DISCUSSION

This section presents major findings and discussion of results.

A. Feature Extraction

We used the Extra Trees Classifier [31] to rank the thirty-six features extracted from the nine categories presented in Table II and plotted the ranking in Fig. 4. We computed the Gini importance value of each feature and ranked the features in descending order to shortlist the most important ones. It is evident that total number of services (F26), kernel drivers (F27), modules

missing from the init list (F10), Windows 32 shared processes (F30), and generic processes (F35) are the top five features. Further, it is observed that total number of file system drivers (F28), Windows 32 owned processes (F29), processes not found in the psscan (F14), 64-bit processes (F3), interactive service processes (F31), and unknown processes (F34) are the least important features with zero Gini importance value.

B. Binary and Multi-class Classification

The results of ML classification for benign and malware samples are presented in Table III. The decision tree and random forest classifiers have a 93% true positive rate and a 6.6% false positive rate. These are followed by k-nearest neighbour with an 87.2% true positive rate, whilst AdaBoost has only a 63.8% true positive rate with zero precision and f-measure.

Binary classification of benign and malware samples by all the ML classifiers is plotted in confusion matrices in Fig. 5. AdaBoost successfully classifies only 96 malware samples while k-nearest neighbour classifies 540 malware samples correctly. Decision tree and random forest classify 651 out of 733 malware samples. All these classifiers correctly identify 1,116 benign samples.

Finally, multi-class classification of malware families is plotted in Fig. 6. AdaBoost gives the worst results: it does not classify a single sample of the BE2, Tigger, Prolaco, Laqma, and Zeus malware families correctly. AdaBoost is biased towards the CoreFlood and Sality malware families. These results are improved considerably by k-nearest neighbour, which correctly classifies the majority of the malware family samples. Finally, the best results are obtained from the decision tree and random forest classifiers, which equally classify 93 malware samples of all malware families correctly.

VIII. CONCLUSION AND FUTURE WORKS

Memory analysis is gaining prominence owing to its ability to capture live malware behavior. There are several tools in the related works that fetch information from kernel and user space to classify malware in the running state. Nevertheless, to improve the malware classification process, we developed VolMemLyzer to extract the thirty-six most essential features from nine different categories in the memory dumps obtained during live malware infection. We also ranked these features using the extra trees classifier to shortlist the best features from the extracted list of features. Finally, we tested the developed tool with 1900 memory dumps, including 1127 benign and 773 malware samples. The results are validated with ML classifiers. We performed binary and multi-class malware classification and achieved a 93% true positive rate with the decision tree and random forest classifiers when applied to a shortlisted set of features.

However, there are some limitations to this work. We tested the results on a sampled dataset with seven malware families and a few benign samples. To improve the malware

AVAILABILITY

The source code for VolMemLyzer will be made publicly available on GitHub after publication of this work [30].

REFERENCES

[1] Y. Cheng, X. Fu, X. Du, B. Luo, and M. Guizani, “A lightweight live memory forensic approach based on hardware virtualization,” Information Sciences, vol. 379, pp. 23-41, 2017. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0020025516305011
[2] “Rekall forensics,” 2020. [Online]. Available: http://www.rekall-forensic.com/
[3] “Volatility,” 2020. [Online]. Available: https://github.com/volatilityfoundation/volatility
[4] F. Block and A. Dewald, “Linux memory forensics: Dissecting the user space process heap,” Digital Investigation, vol. 22, pp. S66-S75, 2017. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1742287617301895
[5] A. Case and G. G. Richard, “Memory forensics: The path forward,” Digital Investigation, vol. 20, pp. 23-33, 2017, special issue on Volatile Memory Analysis. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1742287616301529
[6] T. Latzo, R. Palutke, and F. Freiling, “A universal taxonomy and survey of forensic memory acquisition techniques,” Digital Investigation, vol. 28, pp. 56-69, 2019. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1742287618304535
[7] R. Palutke, F. Block, P. Reichenberger, and D. Stripeika, “Hiding process memory via anti-forensic techniques,” Forensic Science International: Digital Investigation, vol. 33, p. 301012, 2020. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S2666281720302614
[8] F. Block and A. Dewald, “Windows memory forensics: Detecting (un)intentionally hidden injected code by examining page table entries,” Digital Investigation, vol. 29, pp. S3-S12, 2019. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1742287619301574
[9] D. Uroz and R. J. Rodríguez, “Characteristics and detectability of windows auto-start extensibility points in memory forensics,” Digital Investigation, vol. 28, pp. S95-S104, 2019. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1742287619300362
[10] N. Lewis, A. Case, A. Ali-Gombe, and G. G. Richard, “Memory forensics and the windows subsystem for linux,” Digital Investigation, vol. 26, pp. S3-S11, 2018. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1742287618301944
[11] D. Uroz and R. J. Rodríguez, “On challenges in verifying trusted executable files in memory forensics,” Forensic Science International: Digital Investigation, vol. 32, p. 300917, 2020. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S2666281720300123
[12] A. Case, R. D. Maggio, M. Firoz-Ul-Amin, M. M. Jalalzai, A. Ali-Gombe, M. Sun, and G. G. Richard, “Hooktracer: Automatic detection and analysis of keystroke loggers using memory forensics,” Computers & Security, vol. 96, p. 101872, 2020. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0167404820301450
[13] A. Case, A. K. Das, S.-J. Park, J. R. Ramanujam, and G. G. Richard, “Gaslight: A comprehensive fuzzing architecture for memory forensics frameworks,” Digital Investigation, vol. 22, pp. S86-S93, 2017. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1742287617301986
[14] J. Kang, S. Jang, S. Li, Y.-S. Jeong, and Y. Sung, “Long short-term memory-based malware classification method for information security,” Computers and Electrical Engineering, vol. 77, 2019.
[15] C. Rathnayaka and A. Jamdagni, “An efficient approach for advanced malware analysis using memory forensic technique,” 2017 IEEE Trustcom/BigDataSE/ICESS, Sydney, NSW, pp. 1145-1150, 2017.
[16] M. Aghaeikheirabady, S. M. R. Farshchi, and H. Shirazi, “A new approach to malware detection by comparative analysis of data structures in a memory image,” 2014 International Congress on Technology, Communication and Knowledge (ICTCK), Mashhad, pp. 1-4, 2014.
[17] Y. Dai, H. Li, Y. Qian, and X. Lu, “A malware classification method based on memory dump grayscale image,” Digital Investigation, vol. 27,
classification results, we will generate a new dataset that will pp. 30 – 37, 2018.
[18] C. Yucel and A. Koltuksuz, “Imaging and evaluating the memory access
consist of diverse malware families and a comparatively large for malware,” Forensic Science International: Digital Investigation,
number of benign and malware samples. vol. 32, 2019.

Authorized licensed use limited to: AMRITA VISHWA VIDYAPEETHAM AMRITA SCHOOL OF ENGINEERING. Downloaded on July 20,2023 at 12:10:40 UTC from IEEE Xplore. Restrictions apply.
Fig. 4: Feature Ranking
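
Fig. 4 reports the feature ranking, which the paper obtains with an extra trees classifier. As a rough sketch of how such a ranking can be produced with scikit-learn's ExtraTreesClassifier, the snippet below fits the model on a synthetic 36-feature matrix and sorts the features by impurity-based importance. The synthetic data, the placeholder feature names, the helper name rank_features, and the hyperparameters are all assumptions made for illustration; they are not the paper's dataset or settings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier


def rank_features(X, y, feature_names, n_estimators=100, seed=0):
    """Return (name, importance) pairs sorted from most to least important.

    rank_features is a hypothetical helper, not part of VolMemLyzer.
    """
    model = ExtraTreesClassifier(n_estimators=n_estimators, random_state=seed)
    model.fit(X, y)
    importances = model.feature_importances_
    # argsort is ascending, so reverse it to put the largest importance first.
    order = np.argsort(importances)[::-1]
    return [(feature_names[i], float(importances[i])) for i in order]


# Synthetic stand-in for an extracted feature matrix (assumption: 36 columns,
# binary benign/malware label). This is NOT the paper's data.
X, y = make_classification(
    n_samples=300, n_features=36, n_informative=8, random_state=0
)
names = [f"feature_{i}" for i in range(X.shape[1])]

ranking = rank_features(X, y, names)
top_10 = ranking[:10]  # shortlist the ten highest-ranked features
for name, score in top_10:
    print(f"{name}: {score:.4f}")
```

The shortlist size of ten is arbitrary here; in practice the cut-off would be chosen by checking how classification quality changes as lower-ranked features are dropped.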

TABLE III
Classifier            TP Rate   FP Rate   Precision   Recall   F-Measure
AdaBoost              0.638     0.085     0           0.638    0
k-Nearest Neighbour   0.872     0.155     0.866       0.872    0.849
Decision Tree         0.930     0.066     0.930       0.930    0.928
Random Forest         0.930     0.066     0.930       0.930    0.928
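
The five measures in Table III can all be derived from confusion-matrix counts. The sketch below shows the standard definitions for the binary case using plain Python. The function name binary_metrics and the example counts are hypothetical and are not taken from the paper's confusion matrices; returning 0.0 when a denominator is zero is a convention chosen here, not something the paper specifies.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute TP rate, FP rate, precision, recall and F-measure.

    tp/fn count malware samples classified correctly/incorrectly;
    tn/fp count benign samples classified correctly/incorrectly.
    """
    def ratio(num, den):
        # Convention for this sketch: an undefined ratio is reported as 0.0.
        return num / den if den else 0.0

    tp_rate = ratio(tp, tp + fn)       # identical to recall (sensitivity)
    fp_rate = ratio(fp, fp + tn)       # share of benign samples flagged
    precision = ratio(tp, tp + fp)     # share of flagged samples that are malware
    f_measure = ratio(2 * precision * tp_rate, precision + tp_rate)
    return {
        "tp_rate": tp_rate,
        "fp_rate": fp_rate,
        "precision": precision,
        "recall": tp_rate,
        "f_measure": f_measure,
    }


# Hypothetical counts for illustration only.
metrics = binary_metrics(tp=90, fp=5, fn=10, tn=95)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because the TP rate and recall are the same quantity, the two columns coincide in every row of the table. For the multi-class case, the same per-class values are typically averaged across classes, often weighted by class size.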

(a) AdaBoost (b) k Nearest Neighbour (c) Decision Tree (d) Random Forest
Fig. 5: Confusion Matrix for Binary Classification

(a) AdaBoost (b) k Nearest Neighbour (c) Decision Tree (d) Random Forest
Fig. 6: Confusion Matrix for Classifying Malware Families

[19] H. Li, D. Zhan, T. Liu, and L. Ye, “Using deep-learning-based memory
analysis for malware detection in cloud,” 2019 IEEE 16th International
Conference on Mobile Ad Hoc and Sensor Systems Workshops (MASSW),
Monterey, CA, USA, pp. 1–6, 2019.
[20] N. Nissim, O. Lahav, A. Cohen, Y. Elovici, and L. Rokach, “Volatile
memory analysis using the minhash method for efficient and secured
detection of malware in private cloud,” Computers & Security, vol. 87,
2019.
[21] M. Ajay Kumara and C. Jaidhar, “Leveraging virtual machine
introspection with memory forensics to detect and characterize
unknown malware using machine learning techniques at hypervisor,”
Digital Investigation, vol. 23, pp. 99–123, 2017. [Online]. Available:
http://www.sciencedirect.com/science/article/pii/S1742287617303328
[22] Z. Xu, S. Ray, P. Subramanyan, and S. Malik, “Malware detection using
machine learning based analysis of virtual memory access patterns,”
Design, Automation & Test in Europe Conference & Exhibition (DATE),
Lausanne, pp. 169–174, 2017.
[23] C.-W. Tien, J.-W. Liao, S.-C. Chang, and S.-Y. Kuo, “Memory forensics
using virtual machine introspection for malware analysis,” IEEE
Conference on Dependable and Secure Computing, Taipei, pp. 518–519, 2017.
[24] Y. Kawakoya, M. Iwamura, and M. Itoh, “Memory behavior-based
automatic malware unpacking in stealth debugging environment,” 5th
International Conference on Malicious and Unwanted Software, Nancy,
Lorraine, pp. 39–46, 2010.
[25] I. Sadek, P. Chong, S. U. Rehman, Y. Elovici, and A. Binder, “Memory
snapshot dataset of a compromised host with malware using obfuscation
evasion techniques,” Data in brief, vol. 26, 2019.
[26] A. Lakhotia and P. Black, “Mining malware secrets,” 12th International
Conference on Malicious and Unwanted Software (MALWARE), Fajardo,
pp. 11–18, 2017.
[27] A. Case and G. G. Richard, “Detecting objective-c
malware through memory forensics,” Digital Investigation,
vol. 18, pp. S3–S10, 2016. [Online]. Available:
http://www.sciencedirect.com/science/article/pii/S1742287616300524
[28] “Autopsy,” 2020. [Online]. Available:
https://github.com/sleuthkit/autopsy
[29] “Redline,” 2020. [Online]. Available:
https://github.com/craigwblake/redline
[30] “Volatility memory analyzer,” 2019. [Online]. Available:
https://github.com/ahlashkari/VolMemLyzer
[31] “Extra trees classifier,” 2020. [Online]. Available:
https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.ExtraTreesClassifier.html

