Uploaded by

Tahir Hussain
Applying Data Mining Techniques to Digital Forensics

Abstract:
Digital forensics is the process of identifying, preserving, extracting and documenting digital
evidence. Forensic investigations become very complex when dealing with large volumes of data. As a
result, more time is consumed while only minimal outputs and results are produced. Using data mining
techniques alongside the digital forensics process helps to overcome this issue. This paper
introduces a framework for applying data mining techniques within the digital forensics process. The
results obtained after implementing the model are also shown.

Keywords: Digital Forensics, Digital Evidence, KMT, Apriori-gen, Data Mining.


1 Introduction
Digital forensics, also known as computer forensics, is the process of identifying, collecting,
analysing and examining digital evidence while preserving the information and maintaining the
integrity of the evidence. The main goal of digital forensics is to preserve electronic evidence in
its original form throughout the investigation. A digital forensics investigation has four phases:
collection, examination, analysis and presentation. In the collection phase, digital evidence is
collected from the suspect's device. This may include seizing devices present at the crime scene, as
well as other devices that may contain potential digital evidence, with the utmost care taken to
avoid contaminating the evidence. In the examination phase, data is extracted from the evidence
containers and examined. This phase also includes extracting relevant pieces of information and
recovering any deleted files and folders. In the analysis phase, the extracted data is analysed in
detail to draw conclusions. In the presentation phase, which is the final phase, the forensics
analyst prepares and presents detailed reports of the outcomes of the analysis phase. Figure 1.1
shows the digital forensics investigation process.

Today, every organization holds a huge amount of data, including employee details, contracts,
business information and other sensitive information. All of this is stored in some kind of
database, either relational or non-relational (NoSQL). In digital forensics, the hard part is
getting the data out of these large databases for analysis. Forensics analysts, law enforcement and
intelligence agencies face an extremely difficult challenge in analysing the large volumes of data
collected from organizational databases involved in crime and terrorism. A suitable technique for
tackling this issue is data mining. Data mining is a scientific method of extracting meaningful
information from existing databases. Using data mining techniques, we can identify patterns in
related data and construct relationships among data items to support analysis. This comes in handy
in the analysis phase of digital forensics, as it saves time and makes the forensics analyst's work
easier. We can also generate crime-related patterns through data mining to anticipate future crimes.
Like digital forensics, data mining has general phases of its own: data collection, data storage and
management, data access and operations, and data presentation.

In general, digital forensics software only assists the forensics analyst during an investigation.
Solving a crime takes a lot more than assistance, and this is where data mining techniques become
useful for digital forensics. With data mining, not only can a large amount of data be analysed to
produce crime-related patterns and correlations, but investigation time and complexity are also
reduced, which is a basic need in digital forensics.
The rest of the paper is organized as follows: Section 2 reviews some data mining frameworks
proposed for digital forensics. Section 3 gives brief details of data mining techniques. Section 4
presents our framework for digital forensics with data mining techniques. Section 5 concludes the
paper.

2 Review of Related Work


In this section, various solutions and frameworks proposed for digital forensics through data
mining are analyzed. In 2014, Chrysoula Tsochataridou, Avi Arampatzis and Vasilios Katos [1]
proposed a mechanism for digital forensics based on data mining clustering techniques. Weka, a
collection of data mining algorithms, was used to perform operations on the data. The scheme first
stores all the collected data in a MySQL database and then performs different operations on it using
Weka, including data pre-processing, classification, regression, clustering and visualization. In
2016, Ashwinkumar Malwadkar and Prof. Sonali Patil [2] proposed a system for digital forensics
comprising computer forensics tools and data mining techniques. A data mining algorithm based on the
Apriori algorithm was proposed for working with data gathered by computer forensics tools to find
crime patterns, correlations and associations between data items. In 2012, K. K. Sindhu and B. B.
Meshram [3] proposed a tool for digital forensics to find motive, cyber crime patterns and the
frequency of a specific attack over a certain period of time. The proposed system combines digital
forensics investigation with crime data mining techniques. Data gathered from different areas, e.g.
network traffic, file system analysis and log file analysis, was analysed by a data mining
algorithm. In 2018, Raburu George, Omollo Richard and Okumu Daniel [4] proposed different data
mining techniques for digital forensics to extract data from extremely large databases. The main
focus was merging data mining techniques with digital forensics to extract digital evidence and
reduce complexity. In 2015, Prof. Sonal Honale and Jayshree Borkar [6] presented a framework for
live digital forensics using data mining. The data mining algorithms used are K-Means and the
Apriori algorithm, which determine the cyber attacks that occur and count the number of times any
specific attack occurs while the system is running. Various system tools are also used, namely
WinPcap, Jpcap and WMIC. The table below gives a brief summary of the literature:

| Ref. | Approach | Novel Contribution | Tools / Parameters | Limitations / Drawbacks |
|------|----------|--------------------|--------------------|-------------------------|
| [1] | Model Based Approach | A model combined with data mining techniques and system tools | The WEKA tool is used, with electronic mail messages as unstructured data | Works well for small data sets, but memory can overflow if the data volume is very large |
| [2] | System Based Approach | A system combined with computer forensics tools and a data mining algorithm | The Apriori algorithm is used for data mining. Operations are performed on data gathered by computer forensics tools | The Apriori algorithm can become slow while processing large item sets to generate candidates and relations among them |
| [3] | Tool Based Approach | A tool combined with digital forensics exploration and cyber crime data mining techniques | Digital forensics tools and cyber crime mining techniques applied to data | A non-standard algorithm is used. Limited to network-related data only |
| [4] | Techniques Based Approach | Suggests a way to implement data mining techniques | No tools are used. Just a literature review of certain data mining techniques | A non-standard model-based diagram that shows the point where data mining should be applied |
| [5] | Priority Model Based Approach | Knowledge management theory is used to give priority to evidence containers to speed up investigations | Knowledge management theory with data mining techniques. Parameters were seized devices that contain electronic evidence | Things are sometimes not what they appear to be; because the approach is priority based, it may become useless in such cases |
| [6] | Framework Based Approach | Framework for live digital forensics with K-Means and the Apriori algorithm during system running time | Tools used are WinPcap, Jpcap, WMIC, K-Means and the Apriori algorithm. Parameters used are network traffic packets and log files | The Apriori algorithm can be slow for large item sets |
| [7] | Monitoring Model Based Approach | Network traffic forensics model based on data mining techniques to monitor network traffic in real time | Network monitoring tool and data mining techniques for analysing data to conclude results | Layer-based approach, so if a failure occurs at one layer the process will not continue to work |
| [8] | Experimental Based Approach | Digital forensics system with data mining techniques | Data mining techniques to perform experiments | Data requires fine tuning to get better results |

We analyzed the basic requirements for data mining techniques and the model proposed by Malwadkar
and Patil [2]. We then implemented their proposed model on certain item sets. The Apriori algorithm
was implemented in Node.js because of its flexibility and scalability. We identified some data for
experimentation and created item sets from that data. The implementation is low level and uses a
limited number of item sets. The Apriori algorithm was run using a ready-made module that takes item
sets as transactions and, after processing, returns the most frequent items together with their
frequencies. The results of implementing the model proposed in [2] on certain item sets are shown
below:
Figure 2.1

Figure 2.2

Figure 2.3

Note that the above implementation is low level and deals with a specific number of item sets.
However, the number of item sets can be increased or decreased as needed. The size of an item set
can also vary: an item set can be small or large and can contain whole strings or just words. One of
the core issues with the model is that item sets have to be identified manually from the data for
the Apriori algorithm, and this identification and creation of item sets takes a great deal of time.
Converting data from different sources and storing it in a database is also a complex task. Many
databases are now available and used by different organizations. Some use traditional relational
databases, while others use NoSQL databases for their flexibility and scalability. Choosing which
database to use is therefore also an issue.
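
The frequent item set step discussed in this section can be sketched as follows. This is a minimal
illustration written in Python rather than Node.js, and it is not the module used in our experiments.
The function name `apriori`, the `min_support` threshold and the example transactions are all
assumptions made for the sketch.

```python
from itertools import combinations
from collections import Counter


def apriori(transactions, min_support):
    """Return {itemset (frozenset): support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    # Pass 1: count single items and keep those meeting the support threshold.
    counts = Counter(item for t in transactions for item in t)
    current = {frozenset([i]): c for i, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Candidate generation: join frequent (k-1)-itemsets into k-itemsets.
        prev = list(current)
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        # Count the support of the surviving candidates with one scan of the data.
        current = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t)
            if support >= min_support:
                current[c] = support
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical item sets standing in for events extracted from evidence.
transactions = [
    ["login_fail", "port_scan", "usb_insert"],
    ["login_fail", "port_scan"],
    ["login_fail", "file_delete"],
    ["port_scan", "usb_insert"],
    ["login_fail", "port_scan", "file_delete"],
]

result = apriori(transactions, min_support=3)
for itemset, support in sorted(result.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(itemset), support)
```

Every pass rescans all transactions, which is why the approach slows down as the item sets grow, and
the transactions still have to be prepared by hand before the function can be called.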

3 Data mining Techniques


Data mining is a scientific method of constructing useful information from raw data stored in
databases. Data mining techniques such as clustering and classification can help us identify and
track crime patterns and thus speed up the investigation process. Digital forensics is pivotal in
prosecuting a cyber criminal. Data mining constructs interesting structures from data already stored
in a certain format; these structures are essentially patterns or statistical or graphical
representations of the data. Data mining is still a young field in the context of criminal and
intelligence analysis. Cyber crime data mining techniques that are helpful for solving crimes are
clustering, association, deviation detection, classification and string operator techniques. Below
is a diagram that shows the data mining techniques specific to cyber crimes:
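
Of the techniques named in this section, deviation detection is the simplest to illustrate in code.
The sketch below uses a plain z-score rule, which is only one possible way to detect deviations and
is chosen here for brevity. The function name `find_deviations`, the threshold of 2.0 and the
failed-login counts are hypothetical.

```python
from statistics import mean, pstdev


def find_deviations(counts, threshold=2.0):
    """Return (index, value, z_score) for values far from the mean."""
    mu = mean(counts)
    sigma = pstdev(counts)
    if sigma == 0:
        return []  # all values are identical, so nothing deviates
    return [
        (i, x, (x - mu) / sigma)
        for i, x in enumerate(counts)
        if abs(x - mu) / sigma > threshold
    ]


# Hypothetical failed-login counts per hour taken from a log file.
failed_logins = [3, 4, 2, 5, 3, 4, 41, 3]
deviations = find_deviations(failed_logins)
for hour, count, z in deviations:
    print(f"hour {hour}: {count} failed logins (z = {z:.2f})")
```

With these made-up counts, only the hour with 41 failed logins is flagged. On real evidence the
threshold would need tuning, since a single extreme value also inflates the standard deviation.
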
4 Proposed System
4.1 Overview

Our proposed system combines data mining techniques with digital forensics techniques. The main
aim of our work is to get the data ready for analysis and to produce outputs and results. As
discussed above, data mining techniques allow us to analyze huge amounts of data. With our proposed
model, we can find certain patterns, the most frequent words, and positive, negative and neutral
words in any raw data. The model works in the following steps:

1 Data acquisition using digital forensics techniques

2 Conversion or transfer of the data into text, PDF or CSV files

3 Application of data mining techniques to the text, PDF or CSV files for analysis

4 Querying the analyzed data to obtain further improved results

5 Showing the results in the form of graphs

In our proposed model, there is no need to store the forensically gathered data in a database. We
store the data in text or CSV files, which eliminates the use of a database. We then apply data
mining techniques to whole files, whether text, PDF or CSV. Applying data mining techniques to whole
documents eliminates the need to identify and create item sets manually. This saves a lot of time
and makes the work less complex for the forensics analyst.
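
Steps 2 to 4 above can be sketched roughly as follows. The sketch is in Python rather than R and is
not our implementation; it only handles plain text files (PDF and CSV input would need extra
parsing) and leaves out the graphing step. The word lists, file names and the helper names
`tokenize`, `mine_folder` and `polarity` are hypothetical.

```python
import re
import tempfile
from collections import Counter
from pathlib import Path

# Hypothetical word lists; a real analysis would use a full sentiment lexicon.
POSITIVE = {"safe", "agree", "thanks"}
NEGATIVE = {"threat", "delete", "steal"}
STOPWORDS = {"the", "a", "to", "and", "we", "is", "for", "of", "you"}


def tokenize(text):
    """Lower-case the text, split it into words and drop stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def mine_folder(folder):
    """Count terms across every .txt file in a folder, without manual item sets."""
    counts = Counter()
    for path in sorted(Path(folder).glob("*.txt")):
        counts.update(tokenize(path.read_text(encoding="utf-8")))
    return counts


def polarity(word):
    """Label a word as positive, negative or neutral using the word lists."""
    if word in POSITIVE:
        return "positive"
    if word in NEGATIVE:
        return "negative"
    return "neutral"


# Demonstration on three small made-up documents written to a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    samples = {
        "mail1.txt": "Delete the files and steal the backup.",
        "mail2.txt": "We agree to delete the backup. Thanks.",
        "mail3.txt": "The backup is safe.",
    }
    for name, text in samples.items():
        Path(folder, name).write_text(text, encoding="utf-8")
    term_counts = mine_folder(folder)

# Query the analyzed data: the five most frequent terms with their polarity.
for term, n in term_counts.most_common(5):
    print(term, n, polarity(term))
```

Because whole files are tokenized directly, no database and no hand-built item sets are involved,
which is the point of the proposed model.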

4.2 Implementation and Results

We implemented our proposed model in the R language, which is one of the best languages for data
mining and offers many functions and libraries that make the work easier. For experimentation,
example data was taken and stored in multiple text files. We then applied text mining techniques to
those files to produce results. The figure below shows the results.

Figure 4.2.1

In the above figure, the results are shown in concentric circles, with the most frequent terms in
the middle and less frequent terms towards the outside. However, much more complex computations can
be performed. We performed experiments on 3 documents, but the approach is not limited to any
number; in principle, it can be extended to thousands of documents.
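
The layout described for the figure, with frequent terms at the centre and rarer terms further out,
can be approximated by ranking terms and filling rings from the inside. This is a sketch of the idea
only, not the plotting code behind the figure; the function name `assign_rings`, the ring sizes and
the term counts are hypothetical.

```python
from collections import Counter


def assign_rings(term_counts, ring_sizes=(1, 3, 6)):
    """Place terms on rings: ring 0 is the centre, higher rings lie further out."""
    ranked = [term for term, _ in Counter(term_counts).most_common()]
    rings, start = [], 0
    for size in ring_sizes:
        rings.append(ranked[start:start + size])
        start += size
    if start < len(ranked):
        rings.append(ranked[start:])  # everything left goes on the outermost ring
    return [ring for ring in rings if ring]


# Hypothetical term frequencies produced by an earlier text mining step.
counts = {"backup": 9, "delete": 6, "server": 5, "password": 4, "invoice": 2, "meeting": 1}
rings = assign_rings(counts)
for depth, terms in enumerate(rings):
    print(depth, terms)
```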

5 Conclusion
Digital forensics is the process of identifying, preserving, extracting and presenting digital
evidence. Digital evidence is data gathered from the devices of suspects or criminals. These devices
range from small to large and include mobile phones, PDAs, laptops, iPods, computers and other
electronic devices. There can be a huge amount of data, which makes it very difficult for
investigators to analyse. As a result, forensic investigations can be very time consuming and
complex. To overcome this issue, we proposed and implemented a new model for digital forensics that
applies data mining techniques. Using data mining techniques along with digital forensics techniques
can help forensics analysts and intelligence agencies reduce the complexity of their work and reach
conclusions in less time.

6 References
[1] Chrysoula Tsochataridou, Avi Arampatzis, Vasilios Katos, "Improving Digital Forensics Through
Data Mining," IMMM 2014: The Fourth International Conference on Advances in Information Mining and
Management.
[2] Ashwinkumar Malwadkar, Sonali Patil, "Data mining Techniques for Digital Forensic Analysis,"
International Journal on Recent and Innovation Trends in Computing and Communication, ISSN:
2321-8169, Volume 4, Issue 3.
[3] K. K. Sindhu, B. B. Meshram, "Digital Forensics and Cyber Crime Data mining," Journal of
Information Security, 2012, 3, 196-201, http://dx.doi.org/10.4236/jis.2012.33024, published online
July 2012 (http://www.SciRP.org/journal/jis).
[4] Raburu George, Omollo Richard, Okumu Daniel, "Applying Data Mining Principles in the Extraction
of Digital Evidence," International Journal of Computer Science and Mobile Computing (IJCSMC), ISSN
2320-088X, Vol. 7, Issue 3, March 2018, pp. 101-109.
[5] Rosamaria Bertè, Fabio Marturana, Gianluigi Me, Simone Tacconi, "Data Mining based
Crime-Dependent Triage in Digital Forensics Analysis," 2012 International Conference on Affective
Computing and Intelligent Interaction, Lecture Notes in Information Technology, Vol. 10.
[6] Sonal Honale, Jayshree Borkar, "Framework for Live Digital Forensics using Data Mining,"
International Journal of Computer Trends and Technology (IJCTT), Volume 22, Number 3, April 2015.
[7] Peng Cheng, Hui Qu, "A digital forensic model based on data mining," International Conference
on Information Sciences, Machinery, Materials and Energy (ICISMME 2015).
[8] Antonio J. Tallón-Ballesteros, José C. Riquelme, "Data Mining Methods Applied to a Digital
Forensics Task for Supervised Machine Learning," Computational Intelligence in Digital Forensics:
Forensic Investigation and Applications, Studies in Computational Intelligence 555, Springer
International Publishing Switzerland, 2014, DOI: 10.1007/978-3-319-05885-6_17.
[9] S. Umar, A. Praveen, S. Gouse, N. Deepthi, "Imminent accession of Artificial Intelligence based
Forensic Exploratory with Data Mining Analysis," International Journal of Computer Sciences and
Engineering, Volume 5, Issue 3, E-ISSN: 2347-2693.
