Chapter 12: Artificial Intelligence, Big Data, Blockchain and Digital Forensics

Advances in technology have produced enormous amounts of information, and data volumes
grow every day. Big data refers to data of large volume that comes in many types and
formats. Artificial Intelligence (AI) has made Big Data easier to process. This chapter
deals with the forensic challenges that big data poses for an investigator and with how
AI and Blockchain can help to perform digital forensics efficiently.
1.1. Big Data
Big data refers to data sets that are too large or complex to be dealt with by
traditional data-processing application software. Put simply, big data has greater volume,
greater variety and greater velocity. Greater Volume means a larger amount of low-density
data, most of it unstructured. This can be data of unknown value, such as Facebook data
feeds. Such data can amount to tens of terabytes or even hundreds of petabytes. Greater
Velocity means that data is transferred at a fast rate. In this new technology era, packet
speed is of great interest. Internet-enabled smart devices usually work in real time or
near real time, and consequently they require real-time investigation too. Greater Variety
refers to the different formats of data. Data can be structured or unstructured.
Structured data is highly specific and is stored in a predefined format, whereas
unstructured data is a collection of many varied types of data, such as text, audio and
video, that are stored in their native formats and require additional preprocessing.
The concept of big data is relatively new. In 2005, with the emergence of Facebook,
YouTube, Orkut etc., researchers realized that the data generated by users was increasing
day by day. Users are still generating huge amounts of data, but it is not only humans
who are doing this. With the advent of the Internet of Things (IoT), devices are becoming
interconnected over the internet, and consequently the data is increasing. The emergence
of machine learning has also created more data.
Big data is huge. Although new technologies have been developed with storage requirements
in mind, data volumes are doubling and tripling with every coming year, and organizations
still face challenges in storing data effectively. Similarly, in the Digital Forensics
field this huge amount of data is hard to analyze and creates many challenges for the
investigator. Big data can therefore also be defined in terms of veracity. Veracity means
the quality and accuracy of the data. Large data volumes can bring risks and excessive
costs if proper investment is not made in big data veracity. For digital forensics,
veracity is a standard that must be maintained. One more V is important in defining big
data: the fifth V is Value, which refers to the usefulness of the gathered data.

Figure 12.1: 5 Vs of Big Data (Volume, Velocity, Variety, Veracity and Value)

Challenges and adaptation in Digital Forensics:

1. Big data storage


As data is increasing day by day, there is no fixed threshold for the size at which
data is considered to be big data. In the literature, a dataset of about 1 terabyte
is commonly accepted as "big data". According to the International Data Corporation,
in 2020 every online user was creating an average of 1.7 megabytes of information
every second [2]. According to the report, only 37% of all this data could be
analyzed, leaving a large share of the data volume unattended.
Therefore, the first challenge that big data poses is storage. Although storage
applications and software have large capacities, the rate at which data volumes are
increasing is an alarming risk for the existing resources.

2. Structured vs Unstructured Data processing


Data can be classified into two types: structured data and unstructured data. The
two types require different ways of processing, which is a big challenge for a
forensic investigator. Structured data only requires a traditional search, while
unstructured data requires search queries to find the required evidence in the
data set.
Structured data is data that has some defined structure, like binary trees, strings,
queues etc. Such structured data can be processed using tools that parse the user's
data from different applications, whether supported or unsupported, and present the
stored data.
Unstructured data is data that has no defined structure such as the ordered columns
and rows of a database. Examples of unstructured data include text messages, emails,
audio recordings, video files, web pages, social media pages etc. An investigator
can use search queries to find data in an unstructured data set.
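
The contrast between the two kinds of processing can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the call records, the message texts and the function names are invented here, and real casework would rely on dedicated forensic parsing and search tools. The structured records are looked up by field, while the unstructured texts have to be scanned with a keyword pattern.

```python
import re
import sqlite3

# Structured evidence: rows with a fixed schema (hypothetical call log).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (caller TEXT, callee TEXT, duration_s INTEGER)")
conn.executemany(
    "INSERT INTO calls VALUES (?, ?, ?)",
    [("alice", "bob", 120), ("carol", "bob", 15), ("alice", "dave", 300)],
)


def calls_from(caller):
    """Field-based lookup: the schema says exactly where to look."""
    rows = conn.execute(
        "SELECT callee, duration_s FROM calls WHERE caller = ? ORDER BY duration_s",
        (caller,),
    )
    return rows.fetchall()


# Unstructured evidence: free text with no schema (hypothetical messages).
MESSAGES = [
    "Please transfer the funds tonight.",
    "Lunch at noon?",
    "Transfer confirmed, delete this message.",
]


def search_text(documents, pattern):
    """Keyword search: every document has to be scanned for the pattern."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [i for i, doc in enumerate(documents) if regex.search(doc)]


print(calls_from("alice"))
print(search_text(MESSAGES, r"\btransfer\b"))
```

The structured lookup touches only the matching rows, whereas the text search must read every message, which is one reason unstructured evidence is more expensive to process at scale.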

Potential solutions to big data challenges for Digital Forensics


Big data is a product of growing information. Digital forensics faces many challenges
when dealing with big data. Several possible solutions to these challenges are discussed
below.
1. Data reduction
One solution is to reduce the data volume. For this, extra and by-product data must
be filtered out before the digital forensic process starts. With a smaller data
volume, the analysis becomes more efficient.

2. Data classification
Data classification means classifying unstructured data according to its format and
type. Such classification lets a digital forensic investigator analyse data of the
same kind together, which is much easier than analysing evidence that comes in
different formats.

3. Use of AI, machine learning or deep learning for forensic analysis


The advent of artificial intelligence, machine learning and deep learning has opened
new avenues in every field. The increasing volumes of data can likewise be dealt
with using AI, machine learning and deep learning techniques [1].
AI, particularly machine learning and deep learning, enables automation of the
digital forensic process and hence helps to process big data for analysis.
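
The first two solutions above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: it assumes that data reduction is done by dropping files whose hash appears on a list of known, irrelevant files, and that classification is done by looking at the leading "magic bytes" of each file. The file names, contents and the small signature table are hypothetical.

```python
import hashlib

# Small, illustrative signature table; real tools use much larger ones.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"%PDF": "pdf",
    b"PK\x03\x04": "zip",
}


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def reduce_evidence(files, known_hashes):
    """Data reduction: drop files whose hash is on a known-irrelevant list."""
    return {
        name: data
        for name, data in files.items()
        if sha256(data) not in known_hashes
    }


def classify(data: bytes) -> str:
    """Data classification: label a file by its leading signature bytes."""
    for magic, label in SIGNATURES.items():
        if data.startswith(magic):
            return label
    return "unknown"


def group_by_type(files):
    """Group file names by detected type so like can be analysed with like."""
    groups = {}
    for name, data in files.items():
        groups.setdefault(classify(data), []).append(name)
    return groups


# Hypothetical evidence set.
FILES = {
    "logo.png": b"\x89PNG\r\n\x1a\n" + b"\x00" * 8,
    "report.pdf": b"%PDF-1.7 sample",
    "notes.txt": b"meeting at 5",
    "system.dll": b"MZ stock operating system library",
}
KNOWN = {sha256(FILES["system.dll"])}  # assumed known-file list

reduced = reduce_evidence(FILES, KNOWN)
print(group_by_type(reduced))
```

Filtering first and grouping second keeps the later analysis focused on a smaller set of files of the same kind.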

1.2. Artificial intelligence


Artificial Intelligence, as opposed to natural intelligence, is intelligence demonstrated
by machines. It refers to the simulation of human-like intelligence in machines: machines
are programmed to think and act like humans. Traits such as learning and problem solving
are associated with AI. Machine learning is a subset of AI in which machines learn from
datasets and adapt to new data without human assistance. Deep Learning is in turn a
subset of Machine Learning that uses neural networks and enables learning from huge
unstructured datasets.
AI is based on the principle that machines can act and think the way a human does, from
the simplest problems to the most complex ones. The goals of AI include copying human
cognitive actions. AI is evolving rapidly, and machines are now more capable in fields
such as mathematics, linguistics, psychology and computer science.
Artificial intelligence has numerous applications, and the technology can be applied
almost anywhere. In the healthcare industry, for example, its uses extend to surgical
methods. Smart cars, smart toys and smart homes are applications of AI. It also has
applications in the financial industry, where it helps to identify fraudulent activities.
Digital forensics also benefits from AI. Automation helps to process big data better and
to derive results more efficiently.
Applications of AI in digital forensics
Artificial intelligence and its subfields, machine learning and deep learning, have many
applications in digital forensics. From malware analysis to memory forensics, and from
image/video forensics to IoT forensics, AI has applications in every area of digital
forensics.
1. Malware Analysis
AI has made it easier to detect and analyse malware in a system. Algorithms such as
C4.5, k-NN and Support Vector Machine (SVM) have been used by different researchers
for malware analysis. Many malware datasets are already available from sources such
as VirusShare and Kaggle.

2. Image/ Video Forensics


A lot of literature is available on the application of AI in image and video
forensics. Datasets can be collected from sources such as Pascal VOC and ImageNet.
Neural networks and SVM are considered the best AI algorithms for image/video
forensics.

3. IoT Forensics
The IoT creates a complex environment and comprises network forensics, device
forensics and cloud forensics. The forensic process is very time-consuming, so AI
can help to speed up IoT forensics.
4. Cyber Forensics
Cyber forensics is another type of forensics, in which evidence is collected at the
local end as well as from the cloud. Random Forest (RF) is considered a strong AI
algorithm for cyber forensics.

5. Network Forensics
This branch of forensics faces many challenges because of the increasing number of
packets on the network. The best algorithm reported so far for network analysis is
Decision Tree (DT) [3].
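
To make one of the algorithms named above concrete, the following is a minimal k-NN classifier written with the Python standard library only. It is a sketch under stated assumptions: the two features (byte entropy and the fraction of suspicious API calls) and all sample values are invented for illustration and do not come from any of the datasets mentioned in this chapter.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest training samples."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training set: (byte entropy, fraction of suspicious API calls).
TRAIN = [
    ((7.8, 0.90), "malware"),
    ((7.5, 0.80), "malware"),
    ((7.9, 0.70), "malware"),
    ((4.2, 0.10), "benign"),
    ((5.0, 0.20), "benign"),
    ((4.8, 0.05), "benign"),
]

print(knn_predict(TRAIN, (7.6, 0.85)))
print(knn_predict(TRAIN, (4.5, 0.15)))
```

In practice the features would be scaled to comparable ranges and the model would be evaluated on held-out samples; a library such as Scikit Learn from Table 12.1 provides these steps.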

Many open-source tools are available for AI, and more particularly for machine learning
and deep learning. Most of them are written in Python, and other programming languages
such as R are also very useful for AI. Table 12.1 lists important open-source AI tools
that can later help to automate the digital forensic process. Table 12.2 lists some
well-known dataset sources that can be used for applying intelligent algorithms to
different types of digital forensics.
Table 12.1: Open-Source tools for AI, Machine learning and Deep learning

Tool            Written in        Algorithms
WEKA            Java              Data preparation, classification, regression, clustering,
                                  visualization, association rules mining
Shogun          C++               Regression, classification, clustering, support vector
                                  machines, dimensionality reduction
LIBSVM          C++               Support Vector Machine
RapidMiner      Java              Preprocessing, classification
Scikit Learn    C, C++, Python    Classification, regression, clustering, preprocessing,
                                  model selection, dimensionality reduction

Table 12.2: Datasets for different Digital Forensics Types.

Digital Forensic Type             Datasets
Malware Analysis                  Virus Share, VX Heaven, Comodo Cloud Security Center
Image Forensics                   Pascal VOC 2012, MS-COCO, ImageNet
Video Forensics                   Karina
Network Forensics                 CAIDA
Memory/ File System Forensics     Real Data Corpus, 2007 INEX Wikipedia

State-of-the-art in AI based digital forensics


1. File System Forensics
Within a filesystem, deleted files can be recovered if some metadata remains.
Sometimes, however, files lose their metadata upon deletion and the file contents
reside in unallocated parts of the disk. To recover such files, either file carving
is used or the disk fragments are examined. Because a large number of fragments is
created, it is hard to search them manually, so some kind of automation is required.
Fitzgerald et al. proposed a file fragment classification approach using Natural
Language Processing (NLP) techniques. An SVM and a bag-of-words model are combined
to perform the machine learning. File fragments are treated as bags of bytes:
unigrams and bigrams are calculated, as well as other statistical measurements [5].

2. Network Traffic Analysis


AI can help to filter redundant information out of the large volumes of data
associated with network traffic analysis. Algorithms such as RF, K-Nearest Neighbor
(KNN), Decision Tree (DT), Random Tree (RT) and regression have been applied by
different researchers to automate this process [6].

3. Forensics on Encrypted data


Digital investigators face many challenges in decrypting encrypted data. For
example, if a device uses disk encryption, it becomes very hard for a forensic
investigator to collect evidence from the disk image. A brute-force attack is also
infeasible, because the large key lengths of modern cryptographic algorithms make
it require far too much time.
4. Event Reconstruction
In the digital forensics process, event reconstruction is an important step that
requires great care. With the spread of the internet and other networks, evidence
collection has become a time-consuming task. AI can help to collect evidence from
every node and hence reconstruct the criminal event with greater accuracy for
analysis.
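
The bag-of-bytes idea described under File System Forensics above can be sketched as follows. This is a minimal illustration and not the implementation of Fitzgerald et al.: it only computes byte unigram and bigram counts, plus Shannon entropy as one assumed example of the "other statistical measurements". The resulting features would then be passed to a classifier such as an SVM, which is not shown here.

```python
import math
from collections import Counter


def unigram_counts(fragment: bytes) -> Counter:
    """Count single byte values (iterating over bytes yields integers)."""
    return Counter(fragment)


def bigram_counts(fragment: bytes) -> Counter:
    """Count pairs of adjacent byte values."""
    return Counter(zip(fragment, fragment[1:]))


def byte_entropy(fragment: bytes) -> float:
    """Shannon entropy in bits per byte, computed from the unigram counts."""
    total = len(fragment)
    counts = unigram_counts(fragment)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


fragment = b"abab"  # stand-in for a file fragment
print(unigram_counts(fragment))
print(bigram_counts(fragment))
print(byte_entropy(fragment))
```

Text-like fragments tend to use few byte values, while compressed or encrypted fragments approach 8 bits per byte, which is why such statistics are useful features for telling fragment types apart.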

Framework of intelligent automation in DF


According to NIST, digital forensics has four basic steps: evidence collection,
preservation/examination, analysis and reporting. The most important and time-consuming
phases are evidence collection and data analysis, because of the large volumes of data
involved. AI procedures can be used to automate these phases. Strong datasets are
required for the best results, and Table 12.2 lists some dataset sources that can be
helpful. Figure 12.2 shows a framework of intelligent automation in digital forensics.
Machine learning and deep learning algorithms are required for proficient data analysis.
Figure 12.2: Framework of intelligent automation in Digital Forensics

Potential impacts of AI on Digital Forensics
AI can address many problems of digital forensics, from big data issues to time
consumption. Following are some of the potential impacts of AI on Digital Forensics.

1. AI makes identifying and investigating cybercrime easier and more likely to succeed.
2. AI is a cost-effective solution. It saves time by sifting through the unstructured
data in the evidence.
3. Consistent data analysis, free of contradictions, becomes possible with AI because
of its cognitive properties.
4. AI helps to identify suspects by looking up analytic records.
5. AI can help to identify the metadata of photos and videos.
6. AI also helps to find commonalities among pieces of evidence, such as date, time
and location. These can help to predict the next move of a criminal.
7. AI can perform efficient analysis on large volumes of evidence data and can
calculate statistical results.
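
Finding commonalities among pieces of evidence, as mentioned in the list above, can be illustrated with a very small Python sketch. It is only a counting example under stated assumptions: the evidence records, field names and values are hypothetical, and it does not attempt any prediction.

```python
from collections import Counter

# Hypothetical evidence records with a few metadata fields.
EVIDENCE = [
    {"id": "photo_1", "date": "2021-05-01", "location": "site_A"},
    {"id": "photo_2", "date": "2021-05-01", "location": "site_B"},
    {"id": "video_1", "date": "2021-05-03", "location": "site_A"},
    {"id": "chat_1", "date": "2021-05-01", "location": "site_A"},
]


def common_values(items, field, min_count=2):
    """Return (value, count) pairs for values of `field` seen at least `min_count` times."""
    counts = Counter(item[field] for item in items if field in item)
    return [(value, n) for value, n in counts.most_common() if n >= min_count]


print(common_values(EVIDENCE, "date"))
print(common_values(EVIDENCE, "location"))
```

Values that recur across otherwise unrelated items point the investigator to dates and places worth a closer look.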

Challenges to AI based Digital Forensics


AI faces many challenges in the field of Digital Forensics because of modern encryption
systems and anti-forensics techniques.
1. Adversarial attacks are a major challenge for AI in Digital Forensics. An
adversary can manipulate inputs so that a model produces incorrect results. Such
attacks have counter-forensic properties, and they also affect AI results because
the available datasets lose effectiveness.
2. The lack of large and effective datasets is a major challenge for AI because it
affects the efficiency of the process.
3. Classification models need up-to-date datasets to work properly. As data volumes
increase day by day, it is hard to keep the datasets current.
4. Cryptography is a major challenge for the digital forensic process and hence for AI.
5. Event reconstruction in digital forensics needs timestamps. Timestamps must be
correct for efficient analysis, but sometimes they are distorted or tampered with
through clock drift, overwritten timestamps, manual clock changes etc.
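
The timestamp problem in the last item can be made concrete with a short Python sketch. It assumes that the offset of each device clock from a reference clock has already been estimated; the devices, offsets and events below are hypothetical. The sketch subtracts each offset before ordering the events.

```python
from datetime import datetime, timedelta

# Hypothetical offsets: device clock minus reference clock.
CLOCK_OFFSETS = {
    "laptop": timedelta(seconds=0),
    "router": timedelta(seconds=90),   # router clock runs 90 s fast
    "phone": timedelta(seconds=-30),   # phone clock runs 30 s slow
}

# Hypothetical events as (device, recorded timestamp, description).
EVENTS = [
    ("router", datetime(2021, 5, 1, 10, 1, 30), "outbound connection"),
    ("laptop", datetime(2021, 5, 1, 10, 0, 10), "file opened"),
    ("phone", datetime(2021, 5, 1, 9, 59, 50), "message sent"),
]


def build_timeline(events, offsets):
    """Return (corrected time, device, description) tuples in chronological order."""
    corrected = [
        (recorded - offsets[device], device, description)
        for device, recorded, description in events
    ]
    return sorted(corrected)


for when, device, description in build_timeline(EVENTS, CLOCK_OFFSETS):
    print(when.isoformat(), device, description)
```

With the raw timestamps the phone event appears first and the router event last; after correction the order is reversed, which shows how uncorrected clocks can mislead event reconstruction.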

1.3. Blockchain
Blockchain is a relatively new technology. It refers to a shared and immutable ledger
that helps to record transactions and track assets within a business network. Assets can
be of two types: tangible (a car, a house, land etc.) or intangible (copyrights, patents,
intellectual property etc.). Many businesses require information that is transparent and
accurate, and blockchain technology provides such information immediately. A blockchain
network can help to track payments, account statements, orders etc. All transaction
details can be seen end-to-end, which provides greater efficiency. Figure 12.3
demonstrates how blockchain technology works to perform a transaction.
Following are the key elements of Blockchain Technology.
1. Distributed ledger technology: all users have access to the shared ledger.
2. Immutable records: no user can tamper with the transaction details.
3. Smart contracts: they define conditions for insurance, bond transfers etc.
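
The idea of immutable records can be illustrated with a minimal hash chain in Python. This is a teaching sketch under stated assumptions, not how any particular blockchain is implemented: it has no network, no consensus and no digital signatures, and the block layout and transactions are hypothetical. Each block stores the hash of the previous block, so changing an old record breaks the chain.

```python
import hashlib
import json

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index, prev_hash, transactions):
    """Hash the block contents in a stable (sorted-key) JSON form."""
    payload = json.dumps(
        {"index": index, "prev_hash": prev_hash, "transactions": transactions},
        sort_keys=True,
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_block(chain, transactions):
    """Add a block that links to the hash of the current last block."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV
    index = len(chain)
    chain.append({
        "index": index,
        "prev_hash": prev_hash,
        "transactions": transactions,
        "hash": block_hash(index, prev_hash, transactions),
    })


def is_valid(chain):
    """Recompute every hash and check every link to the previous block."""
    for i, block in enumerate(chain):
        expected_prev = chain[i - 1]["hash"] if i > 0 else GENESIS_PREV
        if block["prev_hash"] != expected_prev:
            return False
        recomputed = block_hash(block["index"], block["prev_hash"], block["transactions"])
        if block["hash"] != recomputed:
            return False
    return True


ledger = []
append_block(ledger, [{"from": "alice", "to": "bob", "amount": 5}])
append_block(ledger, [{"from": "bob", "to": "carol", "amount": 2}])
print(is_valid(ledger))
```

If the amount in the first block is later edited, `is_valid(ledger)` returns `False`, because the stored hash no longer matches the recomputed one.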

The emergence of blockchain technology has brought many advantages. Following are some
of them.
1. Highly secure
Blockchain technology uses digital signatures to identify fraudulent activity.
Digital signatures also make it infeasible for other users to change a user's
details unless they hold that user's signing key.

2. Decentralized system
In traditional financial systems, transactions require approval from regulatory
authorities such as banks, whereas blockchain technology is based on a
decentralized system, which means that transactions are made with the mutual
consensus of the users.

Figure 12.3: Working of Blockchain Technology

3. Automation capability
Blockchain technology acts immediately, without any outside command, when the
trigger requirements are met.

Blockchain based digital forensics


Blockchain forensics is an emerging field that leverages science and technology to
combat cybercriminals in the cryptocurrency market. The primary goal is to recover and
analyze the different kinds of evidence left on the transparent blockchain digital
ledger.
Blockchain forensics can build user trust in blockchain technology and can detect
possible cybercrimes, including illegal transactions. It includes multiple practices to
find and track all necessary details of cryptocurrencies, such as the history, origin,
number of tokens, ownership, transaction details, and other sensitive information.
Evidence against cybercriminals in the cryptocurrency market can be used in courts of
law as strong legal proof of crypto crimes such as hidden crypto assets, crypto
embezzlement, identity fraud, tax evasion, terrorist financing, and so on.
The integration of blockchain technology is transforming digital forensics and helping
to catch those involved in cybercrimes or crypto crimes. Because investigations of
crypto crimes depend heavily on blockchain forensic analysis, the demand has produced
multiple open-source blockchain forensic tools. Some popular, easy-to-deploy tools for
tracking individual transactions are Blockchain Explorer, Bitcoin Block Explorer,
Wallet Explorer, Blockparser, ORS CryptoHound, and many more.
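
Tracking individual transactions amounts to following transfers from one address to the next. The Python sketch below shows the idea with a breadth-first search over a small list of transfers. It is a simplified illustration and does not use the interface of any of the tools named above; the addresses, amounts and function name are hypothetical.

```python
from collections import deque

# Hypothetical transfers: (sender address, receiver address, amount).
TRANSFERS = [
    ("addr_A", "addr_B", 4.0),
    ("addr_B", "addr_C", 1.5),
    ("addr_B", "addr_D", 2.5),
    ("addr_D", "addr_E", 2.0),
    ("addr_X", "addr_Y", 9.0),
]


def trace_funds(transfers, start):
    """Map every address reachable from `start` to its hop count from `start`."""
    outgoing = {}
    for sender, receiver, _amount in transfers:
        outgoing.setdefault(sender, []).append(receiver)

    hops = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for receiver in outgoing.get(current, []):
            if receiver not in hops:  # visit each address only once
                hops[receiver] = hops[current] + 1
                queue.append(receiver)
    return hops


print(trace_funds(TRANSFERS, "addr_A"))
```

Addresses that never receive funds from the starting address, such as `addr_Y` here, do not appear in the result, which narrows the set of addresses an investigator has to examine.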

References
[1] S. Qadir and B. Noor, “Applications of Machine Learning in Digital Forensics,” IEEE
Xplore, May 01, 2021.
[2] “IDC: The premier global market intelligence firm,” IDC: The premier global market
intelligence company, 2019. [Online]. Available: https://www.idc.com/
[3] N. Usman, S. Usman, F. Khan, M. A. Jan, A. Sajid, M. Alazab, and P. Watters,
“Intelligent Dynamic Malware Detection using Machine Learning in IP Reputation for
Forensics Data Analytics,” Future Generation Computer Systems, vol. 118, pp. 124–141, 2021.
[4] X. Du et al., “SoK: Exploring the State of the Art and the Future Potential of Artificial
Intelligence in Digital Forensic Investigation,” Proceedings of the 15th International Conference
on Availability, Reliability and Security, pp. 1–10, Aug. 2020, doi: 10.1145/3407023.3407068.
[5] Simran Fitzgerald, George Mathews, Colin Morris, and Oles Zhulyn, “Using NLP
techniques for file fragment classification,” Digital Investigation, vol. 9, pp. S44–S49, 2012.
[6] Mohammed Murtaz Amir Naviq, Hassan Azwar, Syed Baqir Ali, and Saad Rehman,
“A framework for Android Malware detection and classification,” 2018 IEEE 5th
International Conference on Engineering Technologies and Applied Sciences (ICETAS), pp. 1–5, 2018.
