Chapter 12
Advances in technology have produced a great deal of information, and data volumes are
increasing day by day. Big data means data of large volume and of varied types and
formats. Artificial Intelligence (AI) has made it easier to process big data.
This chapter deals with the forensic challenges an investigator faces because of big data and
with how AI and Blockchain can help to perform digital forensics efficiently.
1.1. Big Data
Big data refers to data sets that are too large or complex to be dealt with by
traditional data-processing application software. Big data simply has greater volume,
greater variety and greater velocity. Greater Volume means a greater amount of
low-density data, more precisely unstructured data. This can be data of unknown value,
such as Facebook data feeds. Such data can amount to tens of terabytes or even
hundreds of petabytes. Greater Velocity means data is transferred at a fast rate. In this new
technology era, packet speed is of great interest. Internet-enabled smart devices usually
work in real time or near real time, and consequently they require real-time investigation
too. Greater Variety refers to the different formats of data. Data can be structured or
unstructured. Structured data is highly specific and is stored in a predefined format,
whereas unstructured data is a collection of many varied types of data, such as text,
audio and video, that are stored in their native formats and require
additional preprocessing.
The concept of big data is relatively new. In 2005, with the emergence of Facebook,
YouTube, Orkut and similar services, researchers realized that the data generated by users was
increasing day by day. Users are still generating huge amounts of data, but it is not just
humans who are doing this. With the advent of the Internet of Things (IoT), devices are
becoming interconnected over the internet, and consequently the data is increasing. The
emergence of machine learning has also created more data.
Big data is huge. Although new technologies have been developed with the data storage
requirement in mind, data volumes are doubling and tripling with every coming
year, and organizations still face challenges in storing data effectively. Similarly, in the
digital forensics field this huge volume of data is hard to analyze and thus creates many
challenges for the investigator. Big data can therefore also be defined in terms of veracity.
Veracity means the quality and accuracy of the data. Data volumes can bring risks and
excessive costs if proper investment is not made in big data veracity. For digital forensics,
veracity is a standard that must be maintained. There is one more V that is important in
defining big data. The fifth V is Value. The Value of data refers to the usefulness of the
gathered data.
[Figure: the five Vs of big data: Volume, Velocity, Variety, Veracity and Value]
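One common way to maintain veracity for digital evidence is to record a cryptographic hash when the evidence is acquired and to recompute it before analysis. The following is a minimal sketch, assuming SHA-256 and a short byte string standing in for an evidence image; the function names and sample data are hypothetical and do not come from this chapter.

```python
import hashlib


def evidence_digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of an evidence byte string."""
    return hashlib.sha256(data).hexdigest()


def is_unaltered(data: bytes, recorded_digest: str) -> bool:
    """Check that the evidence still matches the digest recorded at acquisition."""
    return evidence_digest(data) == recorded_digest


# Hypothetical evidence content, used only for illustration.
original = b"log entry: user login at 10:42"
recorded = evidence_digest(original)

# A single changed character produces a different digest.
tampered = b"log entry: user login at 11:42"

print(is_unaltered(original, recorded))
print(is_unaltered(tampered, recorded))
```

In practice the recorded digest would be stored separately from the evidence, so that a change to one cannot silently be matched by a change to the other.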
2. Data classification
Data classification means classifying unstructured data according to its format
and type. Such classification lets a digital forensic investigator analyse evidence
of the same kind together, which is much easier than analysing evidence that
is in different formats.
3. IoT Forensics
The IoT creates a complex environment and comprises network forensics,
device forensics and cloud forensics. The forensic process is very time-consuming,
so AI can help to speed up the process for IoT forensics.
4. Cyber Forensics
Cyber forensics is another type of forensics, in which evidence is collected at the
local end as well as from the cloud. Random Forest (RF) can be the best AI
algorithm for cyber forensics.
5. Network Forensics
This branch of forensics faces many challenges due to the increasing number of packets
on the network. The best algorithm so far for network analysis is the Decision
Tree (DT) [3].
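The data classification step in item 2 can be illustrated by grouping evidence items by their detected format. The following is a minimal sketch, assuming classification by a handful of well-known file signatures (magic bytes); the function names, the small signature table and the sample evidence are illustrative assumptions, not the chapter's method.

```python
from collections import defaultdict

# A few well-known file signatures (magic bytes); far from exhaustive.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"%PDF-": "pdf",
    b"PK\x03\x04": "zip",
}


def classify(data: bytes) -> str:
    """Return a format label based on the leading bytes, or 'unknown'."""
    for magic, label in SIGNATURES.items():
        if data.startswith(magic):
            return label
    return "unknown"


def group_by_format(items: dict) -> dict:
    """Group evidence names by detected format so like items are analysed together."""
    groups = defaultdict(list)
    for name, data in items.items():
        groups[classify(data)].append(name)
    return dict(groups)


# Hypothetical evidence items: name -> leading bytes of the content.
evidence = {
    "item_a": b"%PDF-1.7 ...",
    "item_b": b"\xff\xd8\xff\xe0 ...",
    "item_c": b"plain text note",
    "item_d": b"%PDF-1.4 ...",
}

groups = group_by_format(evidence)
print(groups)
```

Signature matching only covers formats with a fixed header. Items that land in the "unknown" group are the ones that would need a learned classifier or manual review.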
Many open-source tools are available for AI, and more particularly for
machine learning and deep learning. The tools are mostly written in Python. Other
programming languages, such as R, are also very useful for AI.
Table 12.1 shows important open-source AI tools that can later help to
automate the process of digital forensics. Table 12.2 provides information about some
well-known dataset sources that can be used for applying intelligent algorithms to
different types of digital forensics.
Table 12.1: Open-source tools for AI, machine learning and deep learning
Tool          Written in       Algorithms
WEKA          Java             Data preparation, classification, regression,
                               clustering, visualization, association rule mining
Shogun        C++              Regression, classification, clustering,
                               support vector machines, dimensionality reduction
LIBSVM        C++              Support vector machine
RapidMiner    Java             Preprocessing, classification
Scikit-learn  C, C++, Python   Classification, regression, clustering,
                               preprocessing, model selection,
                               dimensionality reduction
1. AI makes identifying and investigating cybercrime easier, with a greater probability of success.
2. AI is a cost-effective solution. It saves time by sifting through the unstructured data
in the evidence.
3. Data analysis without contradictions becomes possible using AI because of its
cognitive properties.
4. AI makes it possible to identify a suspect by looking up analytic records.
5. AI can help to identify the metadata of photos and videos.
6. AI also helps to find commonalities among pieces of evidence, such as date, time and
location. This can help to predict the criminal's next move.
7. AI can perform efficient analysis on large volumes of evidence data and can calculate
statistical results.
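The idea in points 6 and 7, finding commonalities across evidence and calculating statistics, can be shown in its simplest form as frequency counting. The following is a minimal sketch and not a learning algorithm; the record fields, values and function names are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical evidence records; the field names are assumptions for illustration.
records = [
    {"date": "2021-03-04", "hour": 23, "location": "site_A"},
    {"date": "2021-03-11", "hour": 23, "location": "site_A"},
    {"date": "2021-03-18", "hour": 22, "location": "site_B"},
    {"date": "2021-03-25", "hour": 2, "location": "site_A"},
]


def common_values(items, field):
    """Count how often each value of `field` occurs across the records."""
    return Counter(item[field] for item in items)


def most_common_value(items, field):
    """Return the most frequent value of `field` and its count."""
    return common_values(items, field).most_common(1)[0]


print(most_common_value(records, "location"))
print(most_common_value(records, "hour"))
```

A recurring location or hour is only a lead for the investigator. A trained model, for example one built with a tool from Table 12.1, would be needed to turn such patterns into predictions.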
1.3. Blockchain
Blockchain is a relatively new technology that refers to a shared and immutable ledger
that helps to record transactions and track assets within a business network. Assets can be
of two types: tangible (a car, a house, land, etc.) or intangible (copyrights, patents,
intellectual property, etc.). Many businesses require information that is transparent and
accurate. Blockchain technology provides such information immediately.
A blockchain network can help to track payments, account statements, orders and so on. All
transaction details can be seen end to end, which provides greater efficiency. Figure 12.3
demonstrates how blockchain technology performs a transaction.
The key elements of blockchain technology are as follows.
1. Distributed ledger technology: all users have access to the shared ledger.
2. Immutable records: no user can tamper with the transaction details.
3. Smart contracts: these define conditions for insurance, bond transfers, etc.
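The immutable-records element can be illustrated with a hash-linked list of blocks, where each block stores the hash of the one before it. The following is a minimal sketch in plain Python, assuming SHA-256 and a JSON encoding of each block; it has no network, consensus or signatures, and all names and sample transactions are hypothetical.

```python
import hashlib
import json

GENESIS_PREV_HASH = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index, prev_hash, transactions):
    """Hash the block contents deterministically with SHA-256."""
    payload = json.dumps(
        {"index": index, "prev_hash": prev_hash, "transactions": transactions},
        sort_keys=True,
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_block(chain, transactions):
    """Append a new block that is linked to the hash of the previous block."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV_HASH
    index = len(chain)
    block = {
        "index": index,
        "prev_hash": prev_hash,
        "transactions": transactions,
        "hash": block_hash(index, prev_hash, transactions),
    }
    chain.append(block)
    return block


def is_valid(chain):
    """Verify every stored hash and every link to the previous block."""
    prev_hash = GENESIS_PREV_HASH
    for block in chain:
        if block["prev_hash"] != prev_hash:
            return False
        expected = block_hash(block["index"], block["prev_hash"], block["transactions"])
        if block["hash"] != expected:
            return False
        prev_hash = block["hash"]
    return True


ledger = []
append_block(ledger, [{"from": "alice", "to": "bob", "amount": 10}])
append_block(ledger, [{"from": "bob", "to": "carol", "amount": 4}])
print(is_valid(ledger))

# Tampering with a recorded amount no longer matches the stored hash.
ledger[0]["transactions"][0]["amount"] = 1000
print(is_valid(ledger))
```

Because every participant in a distributed ledger holds a copy and can run the same check, an altered block is detected rather than silently accepted.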
The emergence of blockchain technology has brought many advantages. Some of them are
as follows.
1. Highly secure
Blockchain technology uses digital signatures to identify fraudulent activity.
The use of digital signatures also makes it impossible for other users to change
a user's details unless they hold that user's particular signature.
2. Decentralized system
In traditional financial systems, transactions require approval from regulatory
authorities such as banks, whereas blockchain technology is based
on a decentralized system, meaning that transactions are carried out with the mutual
consensus of users.
3. Automation capability
Blockchain technology acts immediately when the trigger requirements are
met, without any outside command.
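The automation capability can be pictured as a rule that fires on its own once its trigger condition holds. The following is a minimal sketch in plain Python, not code for any real blockchain platform; the class name, the insurance rule and the event fields are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SimpleContract:
    """Toy stand-in for a smart contract: a trigger condition and an action."""

    condition: Callable[[dict], bool]
    action: Callable[[dict], str]
    log: list = field(default_factory=list)

    def on_event(self, event: dict) -> bool:
        """Run the action automatically if the event satisfies the condition."""
        if self.condition(event):
            self.log.append(self.action(event))
            return True
        return False


# Hypothetical insurance rule: pay out when a flight delay exceeds 120 minutes.
contract = SimpleContract(
    condition=lambda e: e.get("delay_minutes", 0) > 120,
    action=lambda e: f"payout issued for flight {e['flight']}",
)

contract.on_event({"flight": "XY100", "delay_minutes": 45})
contract.on_event({"flight": "XY200", "delay_minutes": 180})
print(contract.log)
```

Only the second event meets the trigger, so only one payout is logged, and no outside command is needed to issue it.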
References
[1] S. Qadir and B. Noor, “Applications of Machine Learning in Digital Forensics,” IEEE
Xplore, May 01, 2021.
[2] “IDC: The premier global market intelligence firm,” IDC: The premier global market
intelligence company, 2019. [Online]. Available: https://www.idc.com/
[3] N. Usman, S. Usman, F. Khan, M. A. Jan, A. Sajid, M. Alazab, and P. Watters, “Intelligent
Dynamic Malware Detection using Machine Learning in IP Reputation for Forensics Data
Analytics,” Future Generation Computer Systems, vol. 118, pp. 124–141, 2021.
[4] X. Du et al., “SoK: Exploring the State of the Art and the Future Potential of Artificial
Intelligence in Digital Forensic Investigation,” Proceedings of the 15th International Conference
on Availability, Reliability and Security, pp. 1–10, Aug. 2020, doi: 10.1145/3407023.3407068.
[5] Simran Fitzgerald, George Mathews, Colin Morris, and Oles Zhulyn, “Using NLP
techniques for file fragment classification,” Digital Investigation, vol. 9, pp. S44–S49, 2012.
[6] Mohammed Murtaz Amir Naviq, Hassan Azwar, Syed Baqir Ali, and Saad Rehman,
“A framework for Android Malware detection and classification,” in 2018 IEEE 5th
International Conference on Engineering Technologies and Applied Sciences (ICETAS), 2018, pp. 1–5.