Applying Data Mining Techniques To Digital Forensics: Abstract
Abstract:
Digital forensics is the process of identifying, preserving, extracting and documenting digital
evidence. Forensic investigations become very complex when they deal with large volumes of data,
so more time is consumed while only minimal outputs and results are produced. Using data mining
techniques alongside the digital forensics process helps to overcome this issue. This paper
introduces a framework for applying data mining techniques within the digital forensics process. The
results obtained after implementing the model are also shown.
Today, every organization holds a huge amount of data, including employee details, contracts,
business information and other sensitive information. All of this is stored in some kind of
database, either relational or non-relational (referred to as NoSQL). In digital forensics, the hard
part is getting the data out of these large databases for analysis. Forensic analysts, law
enforcement and intelligence agencies face an extremely difficult challenge in analysing the large
volumes of data collected from organizational databases involved in crime and terrorism. A suitable
technique for tackling the issue is data mining. Data mining is a scientific method of extracting
meaningful information from existing databases. By using data mining techniques, we can identify
patterns in related data and construct relationships among data items to support analysis. This
comes in handy in the analysis phase of digital forensics, as it saves time and makes the forensic
analyst's work easier. We can also generate crime-related patterns through data mining to anticipate
future crimes. Like digital forensics, data mining proceeds through general phases: data collection,
data storage and management, data access and operations, and data presentation.
In general, digital forensics software only assists the forensic analyst during an investigation.
Solving a crime needs much more than assistance, and this is where data mining techniques become
useful for digital forensics. With data mining, a large amount of data can be analysed to produce
crime-related patterns and correlations, and investigation time and complexity are reduced as well,
which is a basic need in digital forensics.
The rest of the paper is organized as follows: Section 2 reviews some data mining frameworks
proposed for digital forensics. Section 3 gives brief details of data mining techniques. Section 4
presents our framework for digital forensics with data mining techniques. Section 5 concludes the
paper.
| Ref. | Approach | Description | Tools and data used | Limitations |
|------|----------|-------------|---------------------|-------------|
| [1] | Model Based Approach | A model combined with data mining techniques and system tools | The WEKA tool is used, with electronic mail messages as unstructured data | Works well for small data sets, but memory can overflow if the data volume is very large |
| [2] | System Based Approach | A system combined with computer forensics tools and a data mining algorithm | The Apriori algorithm is used for data mining. Operations are performed on data gathered by computer forensics tools | The Apriori algorithm can become slow while processing large item sets to generate candidates and relations among them |
| [3] | Tool Based Approach | A tool combined with digital forensics exploration and cyber crime data mining techniques | Digital forensics tools and cyber crime mining techniques applied to data | A non-standard algorithm is used. Limited to network-related data only |
| [4] | Techniques Based Approach | Suggests a way to implement data mining techniques | No tools are used. Only a literature review of certain data mining techniques | A non-standard, model-based diagram that shows the point where data mining should be applied |
| [5] | Priority Model Based Approach | Knowledge management theory is used to give priority to evidence containers to speed up investigations | Knowledge management theory with data mining techniques. The parameters were seized devices that contain electronic evidence | Evidence is not always what it appears to be, so a priority-based ordering may become useless in such cases |
| [6] | Framework Based Approach | Framework for live digital forensics with the K-Means and Apriori algorithms during system running time | Tools used are wincap, jpcap, wmic, K-means and the Apriori algorithm. Parameters used are network traffic packets and log files | The Apriori algorithm can be slow for large item sets |
| [7] | Monitoring Model Based Approach | Network traffic forensics model based on data mining techniques to monitor network traffic in real time | Network monitoring tool and data mining techniques for analysing data to conclude results | Layer-based approach, so if a failure occurs at one layer the process will not continue to work |
| [8] | Experimental Based Approach | Digital forensics system with data mining techniques | Data mining techniques used to perform experiments | The data requires fine tuning to get better results |
We analyzed the basic requirements for data mining techniques and the model proposed by Kumar.
Ashwin et al. [2]. We then implemented their proposed model on certain item sets. The Apriori
algorithm was implemented in Node.js because of its flexibility and scalability. We identified some
data for experimentation and created item sets from that data. The implementation was kept small,
with a limited number of item sets. It used an existing Apriori module that takes item sets as
transactions and, after processing, returns the most frequent items with their frequencies. The
results of implementing the model proposed in [2] on these item sets are shown below:
Figure 2.1
Figure 2.2
Figure 2.3
Note that the above implementation is small in scale and deals with a specific number of item sets.
The number of item sets can be increased or decreased as needed. The size of an item set can also
vary: it can be small or large, and it can contain whole strings or single words. One of the core
issues with the model is that item sets must be identified manually from the data before the Apriori
algorithm can run. Identifying and creating item sets takes a considerable amount of time.
Converting data from different sources and storing it in a database is also a complex task. Many
databases are now available and in use by different organizations. Some use traditional relational
databases, while others use NoSQL databases for their flexibility and scalability. Choosing which
database to use is therefore also an issue.
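
The frequent-item-set step described above can be sketched in a few lines. The example below is
written in Python purely for illustration; it is not the Node.js module used in the experiment. The
transactions, the function name `apriori` and the `min_support` threshold are all assumptions made
for this sketch.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(items): count} for item sets seen in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count how many transactions contain each single item.
    counts = {}
    for transaction in transactions:
        for item in transaction:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: merge pairs of frequent (k-1)-sets that form a k-set.
        candidates = {a | b for a, b in combinations(list(current), 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Count step: scan the transactions once per level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical item sets, one list per transaction (made up for illustration).
transactions = [
    ["login", "usb", "delete"],
    ["login", "email"],
    ["login", "usb", "email"],
    ["usb", "delete"],
    ["login", "usb", "delete", "email"],
]

result = apriori(transactions, min_support=3)
for items, count in sorted(result.items(), key=lambda pair: (-pair[1], sorted(pair[0]))):
    print(sorted(items), count)
```

The count step rescans every transaction at each level, which is one reason Apriori slows down as
item sets grow, and the item sets still have to be prepared by hand before the function is called.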
Our proposed system combines data mining techniques with digital forensics techniques. The main
aim of our work is to get the data ready for analysis and to produce outputs and results. As
discussed above, data mining techniques let us analyze huge amounts of data. With our proposed
model, we can find patterns, the most frequent words, and positive, negative and neutral words in
any raw data. The model works in the following steps:
3 Apply data mining techniques to text, PDF or CSV files for analysis
In our proposed model, there is no need to store the forensically gathered data in a database. We
store the data in text files or CSV files instead, which eliminates the database. We then apply
data mining techniques to whole files, whether text, PDF or CSV. Applying data mining techniques to
whole documents removes the need to identify and create item sets manually. This saves a lot of
time and makes the forensic analyst's work less complex.
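
As a rough illustration of mining whole files without a database or hand-built item sets, the
sketch below counts term frequencies across text and CSV files. It is written in Python rather than
the R used for the paper's experiments. The file names, file contents, stop-word list and the helper
names `read_text` and `term_frequencies` are assumptions made for this example. PDF input is left
out because it would need a third-party library.

```python
import csv
import re
import tempfile
from collections import Counter
from pathlib import Path

# Tiny illustrative stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "on", "for", "at"}


def read_text(path):
    """Return the raw text of a .csv file (cells joined by spaces) or of any other text file."""
    path = Path(path)
    if path.suffix.lower() == ".csv":
        with path.open(newline="", encoding="utf-8") as handle:
            return " ".join(cell for row in csv.reader(handle) for cell in row)
    return path.read_text(encoding="utf-8")


def term_frequencies(paths):
    """Count lower-cased alphabetic words across all files, skipping stop words."""
    counts = Counter()
    for path in paths:
        words = re.findall(r"[a-z]+", read_text(path).lower())
        counts.update(word for word in words if word not in STOP_WORDS)
    return counts


# Hypothetical evidence files, created in a temporary directory for the demonstration.
with tempfile.TemporaryDirectory() as tmp:
    note = Path(tmp) / "note.txt"
    note.write_text("Transfer the funds tonight. Delete the logs.", encoding="utf-8")
    log = Path(tmp) / "log.csv"
    log.write_text("user,action\nalice,delete logs\nbob,transfer funds\n", encoding="utf-8")
    frequencies = term_frequencies([note, log])

print(frequencies.most_common(5))
```

Because the files are read one at a time and only the running counts are kept in memory, the same
loop can be pointed at a larger set of files without changing the code.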
We implemented our proposed model in R, a language well suited to data mining, with many
functions and libraries that make the work easier. For experimentation, example data was stored in
multiple text files. We then applied text mining techniques to those files to produce results. The
figure below shows the results.
Figure 4.2.1
In the figure above, the results are shown in concentric circles, with the most frequent terms in
the middle and less frequent terms towards the outer edge. Much more complex computations are also
possible. We performed the experiments on 3 documents, but the approach is not limited to any
particular number; in practice, it can be extended to thousands of documents.
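
One of the further computations the model is meant to support, separating positive, negative and
neutral words, could look like the following minimal sketch. It is written in Python for
illustration and is not the R code used in the experiments. The two word lists are a hypothetical
mini-lexicon, and the sample frequencies are invented; a real analysis would load a published
sentiment word list.

```python
from collections import Counter

# Hypothetical mini-lexicon, for illustration only.
POSITIVE = {"safe", "agree", "thanks", "good"}
NEGATIVE = {"threat", "steal", "destroy", "attack"}


def polarity(word):
    """Label a word as positive, negative or neutral using the lexicon above."""
    if word in POSITIVE:
        return "positive"
    if word in NEGATIVE:
        return "negative"
    return "neutral"


def group_by_polarity(frequencies):
    """Split a word -> count mapping into one Counter per polarity label."""
    groups = {"positive": Counter(), "negative": Counter(), "neutral": Counter()}
    for word, count in frequencies.items():
        groups[polarity(word)][word] = count
    return groups


# Invented term frequencies standing in for the output of a text mining pass.
sample = Counter({"attack": 5, "server": 4, "thanks": 2, "steal": 1})
groups = group_by_polarity(sample)

for label, words in groups.items():
    print(label, words.most_common())
```

A lexicon lookup like this ignores context such as negation, so its labels are best treated as a
first filter for the analyst, not as a finding in themselves.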
5 Conclusion:
Digital forensics is the process of identifying, preserving, extracting and presenting digital
evidence. Digital evidence is data gathered from the devices of suspects or criminals. These
devices range from small to large and include mobile phones, PDAs, laptops, iPods, computers and
other electronic devices. The amount of data can be very large, which makes it very difficult for
investigators to analyse. As a result, forensic investigations can be very time consuming and
complex. To overcome this issue, we proposed and implemented a new model for digital forensics
that applies data mining techniques. Using data mining techniques together with digital forensics
techniques can help forensic analysts and intelligence agencies reduce the complexity of their work
and reach conclusions in less time.
6 References
[1] Tsochataridou Chrisoula, Arampatzis Avi, Katos Vasilios. "Improving Digital Forensics
Through Data Mining." IMMM 2014: The Fourth International Conference on Advances in
Information Mining and Management.
[2] Malwadkar Ashwinkumar, Patil Sonali. "Data Mining Techniques for Digital Forensic
Analysis." International Journal on Recent and Innovation Trends in Computing and
Communication, ISSN 2321-8169, Volume 4, Issue 3.
[3] Sindhu K., Meshram B. B. "Digital Forensics and Cyber Crime Data Mining." Journal of
Information Security, 2012, 3, 196-201. http://dx.doi.org/10.4236/jis.2012.33024. Published
online July 2012 (http://www.SciRP.org/journal/jis).
[4] Raburu George, Omollo Richard, Okumu Daniel. "Applying Data Mining Principles in
the Extraction of Digital Evidence." International Journal of Computer Science and Mobile
Computing (IJCSMC), ISSN 2320-088X, Vol. 7, Issue 3, March 2018, pp. 101-109.
[5] Rosamaria Bertè, Fabio Marturana, Gianluigi Me, Simone Tacconi. "Data Mining
based Crime-Dependent Triage in Digital Forensics Analysis." 2012 International Conference
on Affective Computing and Intelligent Interaction, Lecture Notes in Information
Technology, Vol. 10.
[6] Honale Sonal, Jayshree Borkar. "Framework for Live Digital Forensics using Data
Mining." International Journal of Computer Trends and Technology (IJCTT), Volume 22,
Number 3, April 2015.
[7] Peng Cheng, Hui Qu, Training. "A digital forensic model based on data mining."
International Conference on Information Sciences, Machinery, Materials and Energy
(ICISMME 2015).