Behavior-based features model for malware detection
J Comput Virol Hack Tech
DOI 10.1007/s11416-015-0244-0
ORIGINAL PAPER
malware used by AV software. Of course, modern AV software also depends on a heuristic engine component that detects unknown malware instances based on a set of rules [6]. A signature-based detection technique matches a previously generated set of signatures against suspicious samples. A signature is a sequence of bytes at specific locations within the executable, a regular expression, a hash value of binary data, or any other format created by a malware analyst that should accurately identify malware instances. This approach has at least three major drawbacks [7]. First, signatures are commonly created by humans; this is an error-prone task and can lead to a signature that falsely flags a benign program. Second, depending on signatures of previously analyzed malware inherently prevents the detection of unknown malware for which no signatures yet exist. Finally, malware samples use obfuscation techniques such as packing, polymorphism, and metamorphism [8] to evade signature-based detection, since signatures are sensitive to the smallest changes in the malware binary image.
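To make the signature formats above concrete, the following is a minimal sketch (not taken from the paper or from any AV product) of how a scanner might match the three kinds of signatures mentioned: a byte pattern at a fixed offset, a regular expression, and a hash of the binary. All signature values and names here are hypothetical placeholders.

```python
import hashlib
import re

# Hypothetical example signatures illustrating the three formats discussed above.
BYTE_SIGNATURES = [(0x1F0, bytes.fromhex("deadbeef"))]        # (offset, expected bytes)
REGEX_SIGNATURES = [re.compile(rb"cmd\.exe /c .{0,40}\.vbs")]  # pattern over raw bytes
HASH_SIGNATURES = {"0" * 64}                                   # placeholder SHA-256 digest

def matches_signature(path):
    """Return True if the file matches any byte, regex, or hash signature."""
    with open(path, "rb") as f:
        data = f.read()
    if hashlib.sha256(data).hexdigest() in HASH_SIGNATURES:
        return True
    for offset, pattern in BYTE_SIGNATURES:
        if data[offset:offset + len(pattern)] == pattern:
            return True
    return any(rx.search(data) for rx in REGEX_SIGNATURES)
```

Even this toy matcher shows why the approach is brittle: flipping a single byte of the payload defeats the hash and byte-pattern rules, which is exactly the sensitivity the obfuscation techniques above exploit.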
On the contrary, behavior-based techniques assume that malware can be detected by observing the malicious behaviors it exhibits during runtime [9]. They do not suffer from the limitations of signature-based techniques, since they make their decision after observing malware actions rather than looking for previously known signatures. Therefore, they are effective in detecting malware variants that share similar behaviors yet have different structures. Nonetheless, they suffer from a high false positive rate, where a benign program is falsely classified as malicious. Additionally, they can be evaded by mimicry attacks, which reconstruct the malicious behavior so that it appears legitimate. For example, a Trojan could inject its code into a web-browser application so that it gains the privileges of the web browser; its communication with the attacker can then pass through the firewall successfully, since it appears to come from the web-browser application rather than from another suspect process.

1.2 Malware analysis techniques

The signature-based and behavior-based detection techniques depend on a variety of malware analysis techniques. Malware analysis is the art of dissecting malware to understand how it works, how to identify it, and how to defeat or eliminate it. While malware appears in many different forms, three common techniques exist for malware analysis: static analysis, dynamic analysis, and hybrid analysis [10].

Static analysis is a passive method in the sense that the malware sample is not executed; instead, it is inspected using tools such as disassemblers and executable analyzers. Static analysis has many advantages. First, it is considered a safe analysis method, since the malware is not executed and there is less chance of infecting the analysis machine. Second, disassembling the malware provides information about all possible execution paths that might be taken by the malware. However, a packed malware sample poses quite a challenge for static analysis, since considerable experience and skill are required to figure out the unpacking routine and extract the real payload [11].

On the other hand, dynamic analysis [9] is considered an active method. It involves executing the malware and monitoring its actions and impacts on the system. Unlike static analysis, it provides information only about the execution path that actually runs. The malware sample is analyzed within a controlled environment such as a virtual machine (VM) [9]. Dynamic analysis has a considerable time overhead compared to static analysis. Additionally, the increased usage of VMs in dynamic analysis has inspired malware authors to incorporate additional code to detect the VM presence [12]. Hence, once the malware sample detects a VM, it can either infect the host machine by exploiting vulnerabilities found in the VM or change its execution path and turn into a passive process without any malicious impact on the system [12].

In this paper, we aim to provide a behavior-based features model used by an AV filtering tool to cope with the increasing release rate of malware variants. The proposed model describes the malicious actions exhibited by malware during runtime. It is extracted by performing dynamic analysis on a relatively recent malware dataset. We employ an API hooking library [13] to log information about API calls and their parameters; we then further process these API calls into sets of sequences that share a common semantic purpose. After that, the sequences are analyzed by a set of heuristic functions to infer representative semantic features, which we refer to as actions.

The contributions of the paper are as follows:

– We provide a new processing approach on the raw information gathered by API call hooking and produce a set of actions representing the malicious behaviors exhibited.
– We demonstrate the semantic value provided by actions and their insight to help malware analysts.
– We assess actions as a feasible features model and employ various classification techniques to evaluate its accuracy.

The rest of this paper is organized as follows. Section 2 provides a review of related techniques for extracting malware detection features, while we describe the process of extracting actions in Section 3. In Section 4, we evaluate the efficiency and value of actions through various experiments. Finally, conclusions and limitations are presented in Section 5.
2 Related work

In this section, we cover research efforts that claim to detect malware variants. We group the research techniques into three categories, namely statistical-based, graph-based, and structural-based.

2.1 Statistical-based techniques

Wong and Stamp [8] proposed a technique to detect metamorphic malware based on the hidden Markov model (HMM) and provided a benchmark used in other studies on metamorphic malware, such as Canfora et al. [14], Kalbhor et al. [15], Lin and Stamp [16], Musale et al. [17], and Shanmugam et al. [18]. They analyzed the degree of metamorphism produced by different malware generators, such as G2, MPCGEN, NGVCK, and VCL32, by training an HMM on the opcode sequences of the metamorphic malware samples.

Annachhatre et al. [19] used HMM analysis to detect certain challenging classes of malware. In their research, they considered the related problem of classifying malware variants based on HMMs. More than 8,000 malware variants were scored against these models and separated into clusters based on the resulting scores. They observed that the clustering results could be used to group the malware samples into their appropriate families with good accuracy.

Faruki et al. [20] used API call-grams to detect malware. An API call-gram captures the sequence in which API calls are made in a program. First, a call graph is generated from the disassembled instructions of a binary program. This call graph is converted to call-grams, which become the input to a pattern matching engine.
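As an illustration of the call-gram idea described by Faruki et al. [20], the sketch below builds fixed-length n-grams over an ordered API call sequence. The example trace and the gram size of 3 are assumptions made for illustration, not values taken from that work.

```python
from collections import Counter

def call_grams(api_calls, n=3):
    """Slide a window of size n over the ordered API call sequence
    and count each resulting call-gram."""
    grams = zip(*(api_calls[i:] for i in range(n)))
    return Counter(grams)

# Hypothetical API call sequence recovered from a call-graph traversal.
trace = ["CreateFileW", "WriteFile", "CloseHandle",
         "CreateProcessW", "WriteProcessMemory", "ResumeThread"]
print(call_grams(trace, n=3))
```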
2.2 Graph-based techniques

Park et al. [21] proposed a method to construct a common behavioral graph representing the execution behavior of a family of malware instances. The method generates one common behavioral graph by clustering a set of individual behavioral graphs, which represent kernel objects and their attributes based on system call traces. The resulting common behavioral graph has a common path, called HotPath, which is observed in all the malware instances of the same family. The derived common behavioral graph is highly scalable regardless of newly added instances. It is also robust against system call attacks.

Eskandari and Hashemi [22] proposed a technique that uses a control flow graph (CFG) to represent the control structure and the semantic aspects of a malware sample. The extracted CFG is annotated with API calls only, rather than assembly instructions. This new representation model is referred to as API-CFG. Finally, they converted the resulting graphs into binary feature vectors and trained various classification techniques on them.

2.3 Structural-based techniques

Eskandari et al. [5] presented a novel approach that utilizes machine learning techniques and takes advantage of hybrid analysis methods in order to improve the accuracy of the malware analysis procedure while keeping its speed at a reasonable point. They called their approach HDM-Analyzer, which stands for Hybrid Analyzer based on Data Mining techniques. They used dynamic analysis to extract API call sequences during the execution of a malware sample; meanwhile, they used static analysis to extract an Enriched Control Flow Graph (ECFG) that incorporates information about API calls. After the extraction of features, they used a matching engine that combines the features obtained by dynamic analysis with the corresponding ECFG, so that each conditional jump receives a label according to the dynamic information. At this point, a machine learning algorithm is employed to build a learning model from the labeled nodes of the ECFG. This learning model is used by HDM-Analyzer at scanning time to analyze unknown executable files.

Islam et al. [23] proposed a classification approach based on features extracted from static and dynamic analysis. During static analysis, they extracted Function Length Frequency (FLF) and Printable String Information (PSI) vectors. The FLF feature is based on counting the number of functions in different length ranges, or bins; they derived a vector interpretation of an executable file based on the number of bins chosen and where the function lengths lie across the bins. For PSI, they extracted all printable strings from the malware samples to create a global list of strings. Then, for each sample, they reported the count of distinct strings, followed by a binary report on the presence of each string in the global list, where a 1 represents the fact that the string is present and a 0 that it is not. On the other hand, during dynamic analysis they extracted API features such as API function names and parameters from the log files; they again constructed a global list and used a binary vector where a 1 indicates that an API function in the global list is called by the sample, and a 0 otherwise.
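The global-list encoding used for PSI, and similarly for the dynamic API features of Islam et al. [23], can be sketched as follows. The global list and the per-sample strings below are made-up placeholders, and the exact feature layout in [23] may differ.

```python
def binary_feature_vector(sample_strings, global_list):
    """Encode a sample as [count of distinct strings] followed by
    one 0/1 entry per string in the global list."""
    present = set(sample_strings)
    return [len(present)] + [1 if s in present else 0 for s in global_list]

# Hypothetical global list built from all samples, and one sample's strings.
global_list = ["kernel32.dll", "cmd.exe", "http://", "RegSetValueExW"]
sample = ["cmd.exe", "http://", "cmd.exe"]
print(binary_feature_vector(sample, global_list))  # [2, 0, 1, 1, 0]
```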
3 The actions model

In this section, we demonstrate the dynamic analysis technique used to extract the actions model, as outlined in Fig. 1. Each malware sample passes through three stages: API extraction, sequence extraction, and action extraction, which we discuss below.
The output is referred to as the API-Trace, which is a log file with each line formatted as (l, a, r, p1, ..., pn), where l is the line number, a is the API function name, r is the return value of the API, and pi is the ith parameter's value. Figure 2 shows an example of the output produced by this stage.
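A minimal sketch of reading such a trace is shown below. The on-disk layout (one comma-separated record per line, in the (l, a, r, p1, ..., pn) order) and the field names are assumptions made purely for illustration, since the concrete file format produced by the hooking tool is not specified here.

```python
from collections import namedtuple

ApiCall = namedtuple("ApiCall", ["line", "name", "ret", "params"])

def parse_api_trace(path):
    """Parse an API-Trace file assuming one comma-separated record
    (l, a, r, p1, ..., pn) per line."""
    calls = []
    with open(path, encoding="utf-8") as f:
        for raw in f:
            fields = [x.strip() for x in raw.rstrip("\n").split(",")]
            if len(fields) < 3:
                continue  # skip malformed lines
            line_no, name, ret, *params = fields
            calls.append(ApiCall(int(line_no), name, ret, params))
    return calls
```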
We use the handle value to identify data dependence among API calls of the top three categories, while API calls of the last category are represented individually in separate sequences, as shown in Fig. 3. We refer to the collection of all extracted sequences as the Sequence-Trace.
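The grouping step can be sketched as below: calls that share a handle value are chained into one sequence, and calls without a shared handle each form their own sequence. Which parameter carries the handle, and the category test, are caller-supplied placeholders here, since those details depend on per-API metadata not reproduced in this excerpt.

```python
from collections import defaultdict

def build_sequence_trace(calls, handle_of, is_handle_based):
    """Group API calls into sequences keyed by the handle they operate on.
    `handle_of(call)` extracts the relevant handle value (or None) and
    `is_handle_based(call)` says whether the call belongs to the
    handle-bearing categories; both are hypothetical helpers."""
    by_handle = defaultdict(list)
    standalone = []
    for call in calls:
        handle = handle_of(call) if is_handle_based(call) else None
        if handle is not None:
            by_handle[handle].append(call)
        else:
            standalone.append([call])  # one sequence per independent call
    return list(by_handle.values()) + standalone
```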
To extract actions from sequence-traces, we use a set of heuristic functions that infer unique actions. A heuristic function selects a sequence of API calls based on their API category. In Table 1, we list some of the actions produced by the heuristic functions. Actions consist of fields with a semantic value that describe the behavior of a sample. The collection of actions for a given sample is referred to as the Action-Trace, where each action is formatted as (N, Fi = Vi, ...), such that N is the action name and Fi and Vi are the ith field and value, respectively. For example, the output of this stage after processing the sequence-trace extracted above is shown in Fig. 4.
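As a concrete illustration of what a heuristic function over a sequence might look like (this is not one of the actual functions behind Table 1), the sketch below scans a file-related sequence for a file creation whose path is later passed to CreateProcessW, and emits a hypothetical "DropAndExecute" action with named fields.

```python
def infer_drop_and_execute(sequence):
    """Hypothetical heuristic: if a sequence creates a file and the same path
    is later passed to CreateProcessW, emit a DropAndExecute action.
    Each call is expected to expose .name and .params as in the parsed trace."""
    dropped_path = None
    for call in sequence:
        if call.name == "CreateFileW" and call.params:
            dropped_path = call.params[0]          # lpFileName
        elif call.name == "CreateProcessW" and call.params:
            if dropped_path and call.params[0] == dropped_path:
                return ("DropAndExecute", {"path": dropped_path})
    return None

def extract_action_trace(sequence_trace, heuristics):
    """Run every heuristic function over every sequence and collect actions."""
    return [a for seq in sequence_trace for h in heuristics
            if (a := h(seq)) is not None]
```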
The unique value of the proposed actions model is the high-level insight it gives to the malware analyst compared to techniques based on n-grams, such as API n-grams [20], or techniques based on CFGs, such as the API-CFG proposed in [22]. This insight is very helpful for an AV malware analyst when it comes to filtering thousands of submitted samples, as it gives a brief report on the actions exhibited by a sample during its runtime. Moreover, malware analysts can design new heuristic functions based on their expertise to infer additional complex actions.

In this section, we describe the dataset used during the experiments and the extraction of actions. Then, we present an evaluation of the proposed features model's performance and provide insights gained from the experiments. Finally, we compare our work with a recent technique discussed in the related work.

4.1 Dataset

In this research, we have separate datasets for malware and benign samples. The malware dataset has a diverse number of malware families for different malware types, and each family is represented by an equal number of different variants.

We downloaded 9993 samples from VirusSign [24] in the period from October 14, 2013 to March 2, 2014. However, the obtained samples are labeled by MD5 hash values, which do not provide any information about their malware family. We scanned the samples with AV software to identify their malware families. Then, we selected 2000 samples covering 50 different malware families, such that each family is represented by 40 malware variants. A partial list of malware families along with their types is given in Table 2.
Table 2 Part of the malware families in the dataset
Backdoor: Androm, Bifrose, DarkKomet, Hupigon, Kelihos, Zegost
Trojan: Buzus, Graftor, Sirefef, Urusay, Vundo, ZBot
Virus: Alman, Chir, Elkern, Jadtre, Neshta, Sality
Worm: Ratab, Allaple, Darkbot, Fesber, Mydoom, Vofbus

3. False positive (FP) is the number of benign samples incorrectly classified as malicious.
4. False negative (FN) is the number of malware samples incorrectly classified as benign.

These terms are used to define four performance comparison criteria between DT, RF, and SVM. The first criterion is Sensitivity, which measures the proportion of actual positives (malware samples) that are correctly identified:

Sensitivity = TP / (TP + FN)   (1)
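For reference, the sketch below computes sensitivity together with the other criteria reported later (specificity and accuracy) from raw confusion-matrix counts. It is a generic illustration of Eq. (1) and of the standard definitions of those measures, not code from the paper; the example counts are hypothetical.

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard confusion-matrix criteria; the sensitivity term is Eq. (1)."""
    sensitivity = tp / (tp + fn)                 # true positive rate, Eq. (1)
    specificity = tn / (tn + fp)                 # true negative rate
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {"sensitivity": sensitivity,
            "specificity": specificity,
            "accuracy": accuracy}

# Hypothetical counts for one test split.
print(classification_metrics(tp=1946, tn=579, fp=21, fn=54))
```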
Table 3 Classifier results for the proposed features model
Algorithm  Sensitivity  Specificity  Accuracy  AUC
DT   97.3 %   96.53 %  97.19 %  97.65 %
RF   97.19 %  96.35 %  96.84 %  99.48 %
SVM  92.28 %  96.35 %  93.98 %  98.55 %

Fig. 5 ROC curves of RF, SVM, and DT classifiers

We used the Orange machine learning toolbox [28] to train and test the classifiers. It contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. Figure 6 outlines the workflow between the different widgets employed to train and test the DT, RF, and SVM classifiers. First, we loaded the binary vectors of all samples into a data table. Then, we filtered the features by selecting only those with an information gain above a certain threshold. The threshold value was obtained after carrying out several experiments to achieve the highest classification accuracy; we used 0.03 as the threshold. The feature selection resulted in a reduced dataset that we fed to the DT, RF, and SVM classifiers. The default parameters of the classifiers were used without changes, and 10-fold cross validation was used to validate the accuracy achieved by the classifiers. After training and testing the classifiers, we report the evaluation results in Table 3; the ROC curves for each classifier are shown in Fig. 5.

Fig. 6 Work-flow between widgets of the Orange toolbox to train and test RF, SVM, and DT and perform ROC analysis
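The experiments above were run with Orange widgets (Fig. 6). Purely to make the steps concrete, the following scikit-learn sketch performs the same sequence: filter binary feature vectors by an information-gain-style score (approximated here with mutual information) at the 0.03 threshold quoted above, then run 10-fold cross validation with the three classifiers. Function and variable names are ours, and X/y stand for the 0/1 action vectors and labels as NumPy arrays.

```python
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

def evaluate(X, y, threshold=0.03):
    """Filter features by mutual information, then 10-fold cross-validate."""
    gain = mutual_info_classif(X, y, discrete_features=True)
    X_reduced = X[:, gain > threshold]
    results = {}
    for name, clf in [("DT", DecisionTreeClassifier()),
                      ("RF", RandomForestClassifier()),
                      ("SVM", SVC())]:
        scores = cross_val_score(clf, X_reduced, y, cv=10, scoring="accuracy")
        results[name] = scores.mean()
    return results
```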
During the experiments, we investigated the reasons for false positives and false negatives. An example of a false positive is the Virtual-CD setup program. It exhibited many actions similar to those of malware samples; for example, it installed a service driver in a way similar to rootkit behavior and set an auto-start extensibility point in the registry.

On the other hand, one of the IRCBot malware samples escaped detection due to an inherent drawback of dynamic analysis. Precisely, it did not show its malicious actions because it detected virtual machine artifacts.

We have implemented the technique of [19] with 50 hidden Markov models (HMMs), one for each malware family in our malware dataset. The opcode sequences were extracted using the ObjDump tool. We used the following parameters: the number of states N = 2, the number of distinct symbols S = 827 (i.e., the number of different opcodes), the number of iterations = 800, and likelihood = 0.001. We used a vector that holds the scores from all HMMs as the feature vector. We noticed that, since we trained the HMMs only on malware families, benign samples have low scores on all models, while malware samples have at least one model that produces a high score. The evaluation results are given in Table 4.
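A sketch of the per-family scoring used in that comparison is shown below, using the discrete-emission model of the hmmlearn package (CategoricalHMM in recent releases; older releases call it MultinomialHMM) as a stand-in implementation. The two-state and 800-iteration settings mirror the parameters quoted above, and the quoted likelihood value of 0.001 is mapped to hmmlearn's convergence tolerance as an assumption; everything else (function names, data layout) is hypothetical.

```python
import numpy as np
from hmmlearn.hmm import CategoricalHMM

def train_family_models(family_opcode_seqs, n_states=2, n_iter=800, tol=0.001):
    """Train one HMM per malware family on its integer-coded opcode sequences.
    `family_opcode_seqs` maps family name -> list of opcode index sequences."""
    models = {}
    for family, seqs in family_opcode_seqs.items():
        X = np.concatenate([np.asarray(s).reshape(-1, 1) for s in seqs])
        lengths = [len(s) for s in seqs]
        m = CategoricalHMM(n_components=n_states, n_iter=n_iter, tol=tol)
        models[family] = m.fit(X, lengths)
    return models

def score_vector(models, opcode_seq):
    """Log-likelihood of one sample's opcode sequence under every family model."""
    X = np.asarray(opcode_seq).reshape(-1, 1)
    return [models[f].score(X) for f in sorted(models)]
```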
Table 4 Classification results for the HMM-based features model [19]
Algorithm  Sensitivity  Specificity  Accuracy  AUC
DT   97.6 %   97.63 %  96.89 %  97.72 %
RF   97.1 %   96.3 %   96.14 %  97.68 %
SVM  93.17 %  95.45 %  94.8 %   95.55 %

While the results of this state-of-the-art technique and our proposed features model are comparable in terms of performance, there are some key advantages to using actions. First, actions provide helpful semantic insight into malware behavior for the malware researcher. Second, the proposed technique for extracting actions is extensible, as it relies on a set of heuristic functions that can be improved by the technical expertise of the malware researcher; this leads to extracting complex new actions exhibited by evolved malware samples. Finally, actions are easier for non-experts to understand than statistical HMMs, as indicated previously in Fig. 4.

5 Conclusion

In this paper, we proposed a behavior-based features model to describe the malicious actions exhibited by malware during runtime. The proposed features model is referred to as actions; it is created by performing dynamic analysis over a relatively recent malware dataset. In the dynamic analysis, we used an API hooking technique to trace the API calls invoked by a malware sample; we then further processed the traced API calls into groups of semantically dependent API calls. These API sequences are further processed by a set of heuristic functions that extract the actions.

The unique value of actions is the high-level insight they give to the malware analyst. More precisely, actions describe the malicious behaviors of a sample with a better semantic value than API n-gram based techniques. Additionally, malware analysts can utilize their technical knowledge by adding more heuristic functions to extract additional actions. During the experiments, we assessed the actions as features for classifying malware and benign programs. Based on the experimental results, the classifiers achieved high classification accuracy rates.

5.1 Limitations

The technique used to extract the actions has some limitations inherent to dynamic analysis methods, which observe only a partial behavior of a malware sample. In other words, it is not suitable for malware samples that depend on external events, such as receiving a remote command, or that wait for a specific time to trigger their malicious actions. Additionally, the dynamic analysis fails against malware samples that check for the existence of virtual machine artifacts [12] and become a passive process or simply terminate themselves, resulting in clean actions. Finally, the extraction technique is not fully automatic, since it requires the support of a malware analyst to create the set of heuristic functions.

5.2 Future work

Future work on the actions includes refining them by utilizing data flow dependence, which should provide more insight into malware behavior. Additionally, the tool that extracts API call information can be modified to work outside the virtual machine environment; this approach provides a more robust solution against malware samples that check for the existence of the mentioned tool.

References

1. Fossi, M., Egan, G., Haley, K., Johnson, E., Mack, T., Adams, T., Blackbird, J., Low, M.K., Mazurek, D., McKinney, D., et al.: Symantec internet security threat report trends for 2010, vol. 16 (2011)
2. Gennari, J., French, D.: Defining malware families based on analyst insights. In: 2011 IEEE International Conference on Technologies for Homeland Security (HST), pp. 396–401. IEEE (2011)
3. Mairh, A., Barik, D., Verma, K., Jena, D.: Honeypot in network security: a survey. In: Proceedings of the 2011 International Conference on Communication, Computing & Security, pp. 600–605. ACM (2011)
4. Kiemt, H., Thuy, N.T., Quang, T.M.N.: A machine learning approach to anti-virus system (artificial intelligence i). IPSJ SIG Notes. ICS 2004(125), 61–65 (2004)
5. Eskandari, M., Khorshidpour, Z., Hashemi, S.: HDM-Analyser: a hybrid analysis approach based on data mining techniques for malware detection. J. Comput. Virol. Hacking Tech. 9(2), 77–93 (2013)
6. Kaspersky: Heuristic analysis in anti-virus. http://support.kaspersky.com/8641 (2013). Accessed 1 April 2015
7. Moser, A., Kruegel, C., Kirda, E.: Limits of static analysis for malware detection. In: Twenty-Third Annual Computer Security Applications Conference (ACSAC 2007), pp. 421–430 (2007)
8. Wong, W., Stamp, M.: Hunting for metamorphic engines. J. Comput. Virol. 2(3), 211–229 (2006)
9. Egele, M., Scholte, T., Kirda, E., Kruegel, C.: A survey on automated dynamic malware-analysis techniques and tools. ACM Comput. Surv. (CSUR) 44(2), 6 (2012)
10. Sikorski, M., Honig, A.: Practical Malware Analysis: The Hands-On Guide to Dissecting Malicious Software. No Starch Press (2012)
11. Cesare, S., Xiang, Y., Zhou, W.: Malwise – an effective and efficient classification system for packed and polymorphic malware. IEEE Trans. Comput. 62(6), 1193–1206 (2013)
12. Lindorfer, M., Kolbitsch, C., Comparetti, P.M.: Detecting environment-sensitive malware. In: Recent Advances in Intrusion Detection, pp. 338–357. Springer (2011)
13. Nektra Advanced Computing: Deviare API hook. http://www.nektra.com/products/deviare-api-hook-windows/ (2015). Accessed 1 April 2015
14. Canfora, G., Iannaccone, A.N., Visaggio, C.A.: Static analysis for the detection of metamorphic computer viruses using repeated-instructions counting heuristics. J. Comput. Virol. Hacking Tech. 10(1), 11–27 (2014)
15. Kalbhor, A., Austin, T.H., Filiol, E., Josse, S., Stamp, M.: Dueling hidden Markov models for virus analysis. J. Comput. Virol. Hacking Tech. 11, 1–16 (2014)
16. Lin, D., Stamp, M.: Hunting for undetectable metamorphic viruses. J. Comput. Virol. 7(3), 201–214 (2011)
17. Musale, M., Austin, T.H., Stamp, M.: Hunting for metamorphic JavaScript malware. J. Comput. Virol. Hacking Tech. 1–14 (2014)
18. Shanmugam, G., Low, R.M., Stamp, M.: Simple substitution distance and metamorphic detection. J. Comput. Virol. Hacking Tech. 9(3), 159–170 (2013)
19. Annachhatre, C., Austin, T.H., Stamp, M.: Hidden Markov models for malware classification. J. Comput. Virol. Hacking Tech. 1–15 (2014)
20. Faruki, P., Laxmi, V., Gaur, M.S., Vinod, P.: Mining control flow graph as API call-grams to detect portable executable malware. In: Proceedings of the Fifth International Conference on Security of Information and Networks, pp. 130–137. ACM (2012)
21. Park, Y., Reeves, D.S., Stamp, M.: Deriving common malware behavior through graph clustering. Comput. Secur. 39, 419–430 (2013)
22. Eskandari, M., Hashemi, S.: A graph mining approach for detecting unknown malwares. J. Vis. Lang. Comput. 23(3), 154–162 (2012)
23. Islam, R., Tian, R., Batten, L.M., Versteeg, S.: Classification of malware based on integrated static and dynamic features. J. Netw. Comput. Appl. 36(2), 646–656 (2013)
24. VirusSign: Malware research and data center. http://www.VirusSign.com (2015). Accessed 1 April 2015
25. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)
26. Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995)
27. Safavian, S.R., Landgrebe, D.: A survey of decision tree classifier methodology (1990)
28. Demšar, J., Curk, T., Erjavec, A., Gorup, Č., Hočevar, T., Milutinovič, M., Možina, M., Polajnar, M., Toplak, M., Starič, A., Štajdohar, M., Umek, L., Žagar, L., Žbontar, J., Žitnik, M., Zupan, B.: Orange: data mining toolbox in Python. J. Mach. Learn. Res. 14, 2349–2353 (2013)