Data Fingerprinting and Visualization For AI
This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2024.3482728
KEYWORDS cyber-defense, cyber security, data fingerprint, data visualization, intelligent system
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4
security, confidentiality, and privacy concerns [10,11].
Research conducted by [12, 6, 18] has determined that most ML cybersecurity research has not been tested or trained in real-time environments. This matters for determining detection efficiency in the practical scenarios in which cyber-defence systems are intended to be deployed. In [13], the author(s) concluded that high-quality real-world and real-time data are required to counter cyber threats.
Training machine learning models on visualized data has proven to be more successful than training on raw data [14]. Researchers have identified that visualizations can represent complex, large, multimodal datasets as simple datasets [14], which simplifies the learning task for AI models. This gives developers of cyber-defense systems an opportunity to build AI-enhanced tools that can be trained on visualized data. Furthermore, visualized representations of data create an opportunity to extract more meaningful real-world data from threat-related environments such as computer networks.
This study represents the first step in addressing certain aspects of the data problem by proposing a methodology for the development of AI-enhanced cyber-defense tools that includes tasks for data fingerprinting and data visualization. The remainder of this paper is organized as follows. First, related work on threats to the cyber-defense lifecycle and on the efficiency of state-of-the-art cyber-defense machine learning models is discussed. This is followed by the proposal of a methodology for the development of AI-enhanced cyber-defence solutions. The data fingerprinting tasks in this methodology are discussed in more detail. The application of this methodology is demonstrated using a use case that focuses on the discovery of cyber threats through fingerprinted network sessions.

II. RELATED WORK
Current cyber-defense research outputs that are important for the research at hand include the following:
A. The cyber-defense lifecycle.
B. State-of-the-art ML-based cyber-defense tools.

A. CYBER-DEFENSE LIFE CYCLE
The cyber-defense lifecycle stipulates the phases that a cyber-attacker must complete to infiltrate an organization. The cyber-defense lifecycle phases [7] are as follows: Reconnaissance, which collects information and intelligence for the planned cyber-attack; Weaponization, which focuses on the effectiveness of the cyberattack; Delivery, which bypasses existing safeguards; Exploitation, which infiltrates the target; Installation, which opens the network for malicious attacks; Command & Control, which provides remote control of the network; and Actions, which execute the intended malicious activity. All phases in the cyber-defense life cycle should be considered when developing methodologies for the development of AI-enhanced cyber-defense systems. For the research conducted for this paper, the reconnaissance phase is of particular interest for the encoding and visualization of data for the development of AI-enhanced cyber-defense countermeasures.
Researchers have identified cyber threats that demonstrate the use of AI-enhanced attack tools within different phases of the cyber-defense lifecycle [15, 7]. Consider, for example, the reconnaissance phase, in which an AI-enhanced tool such as MalGAN can be deployed. MalGAN generates concealed adversarial malware that can successfully bypass "black-box" malware detectors [16]. Another tool, DeepLocker [17], conceals its malware payload and activates it only when triggered. This is achieved by training adversarial samples that mutate the payload to obfuscate its normal appearance. DeepLocker represents advances in AI-enhanced tools that can be deployed in the Command & Control phase of the cyber-defence lifecycle.

B. STATE-OF-THE-ART CYBER-DEFENSE TOOLS
The ML-based cybersecurity countermeasures detailed in the literature show that there is still a significant gap in achieving a cyber-defense system that can overcome current cybersecurity threats. Researchers have concluded that most datasets used in ML-based detection systems research are outdated and do not accurately reflect real-world traffic or the latest cyber-attacks [18, 19, 6, 12]. The findings indicate that legacy datasets represent 88% of the dataset distribution in Intrusion Detection Software (IDS) ML research [6]. In addition, this study indicated that in ML research on Malware Detection Software (MDS), customized datasets represent 33% of the dataset distribution, and 20% of the datasets were created before 2012. Findings in [18] determined that legacy datasets formed the largest share, representing 56% of the dataset distribution in ML research. According to [18], although experimental results on legacy datasets are excellent, they decrease significantly when models are tested on more recent datasets, including real-world datasets.
The lack of real-world datasets is compounded by the inability to extract meaningful information from real-world systems such as computer network environments. Other studies [13, 20, 6, 12] have indicated that most experiments for prototyping network-based cyber-defense systems used simplified calculated features based on data telemetry and averaging statistics. According to [6], simplified calculated features result in increased inference sensitivity and time delay when classifying cyber-attacks.
ML-based IDS countermeasures have evolved from techniques heavily dependent on feature engineering to Deep Learning, which is less dependent on feature engineering. This has produced more complex models with incremental performance improvements. However, the detection of threats with minuscule malicious samples has not yet improved. Although several attempts have used
dataset rebalancing, no advancements have been made in techniques that perform better at detecting threats with minute malicious samples. This, combined with real-time timing differences, is most likely the reason why IDS systems perform worse in real-time environments than in laboratories. In addition, most IDS research has been conducted on outdated datasets that contain no current threats.
As with ML-based IDS countermeasures, [21] conducted a comprehensive review of ML-based MDS approaches, considering signature, behavior, heuristic, model-checking, DL, cloud, mobile, and Internet-of-Things-based detection. According to this study, although there have been advancements in every approach, no ML-based MDS approach has successfully detected all malware types.
Malialis [22] was one of the few researchers to develop a Reinforcement Learning (RL) model for network intrusion detection and response in real time. The input source was real-time network packets [22]. The author(s) proposed the use of a distributed RL defence system to throttle DDoS attacks. This was modelled using a mesh network with a distributed group of routers configured by the author(s) as RL agents. The agents were then trained to limit the amount of DDoS traffic that could pass through the network, based on the flow of packets through each router.
In a survey by [23] on adversarial ML in cyber warfare, the author(s) concluded that there are serious concerns regarding vulnerabilities in ML-based cyber-defense systems. According to the author(s), faulty assumptions during ML model training are the main cause of vulnerabilities. The author(s) further noted that the ability of adversarial AI to confuse models during inference is a direct result of assuming that the data in datasets are linearly separable and solvable using linear functions. This was indirectly supported in practice by [3], who reported in 2022 that there will be an increase in sophisticated cyber threats, enabling threat actors to repeat cyberattacks at a greater scale and speed.
Based on the above discussion, the following requirements are considered important for methodologies that provide guidelines for the development of AI-enhanced cyber-defence systems:
• Improve detection rates and reduce detection time.
• Employ dynamic self-learning and RL approaches.
• Detect adversarial and unknown cyber-attacks.
• Detect threats and attacks with minute sample datasets.
• Train AI-enhanced countermeasures in real-world environments using real-time data.
• Extract and encode meaningful data from real-world systems, which is also referred to as fingerprinting.
• Visualize the data to overcome complexity in multimodal threat-related data.

III. THE AIECDS METHODOLOGY FOR THE DEVELOPMENT OF AI-ENHANCED CYBER-DEFENSE SYSTEMS
Figure 1 depicts the AIECDS (AI-Enhanced Cyber-defense System) methodology, which was developed by the authors of the research at hand and adapted from previous research [24]. The AIECDS methodology provides guidelines for the development of AI-enhanced cyber-defence systems. However, this study presents only a high-level overview of the AIECDS methodology and discusses the fingerprinting and visualization of the data in more detail. Furthermore, the application of the AIECDS methodology is illustrated through a use case study for the discovery of cyber threats in fingerprinted network sessions.
FIGURE 1: AIECDS-methodology adapted for the use case that fingerprints network sessions. (Recoverable panel labels: real-time environment, captured real-time packets, until buffer is full; extract features from the packet session; threat labelling: determine threat label (during training); protocol discourse: encode using tornado graph using relevant features; threat detection DRL model: predict.)
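The first steps in Figure 1 are capturing real-time packets until a buffer is full and then working on each packet session. They can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation. The `Packet` record, the `buffer_size` parameter, and the direction-independent key are all hypothetical. The key is made direction-independent so that both directions of an exchange fall into one session. It is built from the source and destination IP addresses, ports, and protocol, which the use case below treats as sufficient to identify a unique network session.

```python
from collections import defaultdict
from itertools import islice
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Packet(NamedTuple):
    """Hypothetical minimal packet record; real captures carry many more fields."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int  # IP protocol number, e.g. 6 = TCP, 17 = UDP
    length: int


SessionKey = Tuple[Tuple[str, int], Tuple[str, int], int]


def session_key(p: Packet) -> SessionKey:
    """Direction-independent key from IP addresses, ports, and protocol (assumed)."""
    a = (p.src_ip, p.src_port)
    b = (p.dst_ip, p.dst_port)
    return (min(a, b), max(a, b), p.protocol)


def group_sessions(packets: Iterable[Packet],
                   buffer_size: int = 10_000) -> Dict[SessionKey, List[Packet]]:
    """Read packets until the (hypothetical) buffer is full, then group by session."""
    sessions: Dict[SessionKey, List[Packet]] = defaultdict(list)
    for p in islice(packets, buffer_size):
        sessions[session_key(p)].append(p)
    return dict(sessions)
```

Packets travelling in either direction between the same two endpoints are mapped to the same key. Each group can then be fingerprinted on its own.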
specific manner is to enlarge its prominence within the final fingerprint, because the significance of behaviors for certain unique events or sessions may otherwise be missed.
Both TCP and UDP port numbers range from 0 to 65535 and can be encoded using four colors (from light gray to black) and two 8 × 8 Hilbert curves. This was achieved by counting up to 255 in the first Hilbert curve for every increment of one in the second Hilbert curve, so that the two curves hold the low and high parts of the port number. Protocol numbers range from 0 to 255 and are encoded using an approach similar to that of the IP sections. The last 8 × 8 Hilbert curve is reserved for future use and is needed to complete the 128 columns of the 128 × 128 Hilbert curve used for the transmitted data section. One possible future application for the reserved 8 × 8 space could be to record the frame sizes for IP, TCP, and UDP, or metrics for other transport or application protocols.
Use case: The source and destination IP addresses, ports, and protocols are sufficient for representing a unique network session. This is illustrated in Figure 2.
Protocol discourse
The protocol discourse section of a fingerprint must represent the communication sequence among multiple hosts. This is achieved by using certain attributes and features originating from a sequence of communications. The use case is illustrated in Figure 3.
Use case: In Figure 3, the protocol exchange is visualized for a TCP session between 59.166.0.7 on port 53421 and 149.171.126.4 on port 80. The exchange is initiated with a synchronization request (1), which is acknowledged (2). The initial setup is then acknowledged, pushed, and acknowledged (3, 4, and 5). Large packets (6, 7, 9, 11, and 13) were then sent and acknowledged (8, 10, and 12). The session is finalized with a finish and an acknowledgement (14 and 15) before the exchange is closed (16). The fingerprint has sufficient capacity to capture 128 interactions between two hosts, which can contain multiple flows within the same unique session.
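The port encoding described above can be illustrated with a short sketch. It rests on one plausible reading that the text does not spell out:

- Each 8 × 8 Hilbert curve is filled cell by cell in Hilbert order.
- Each cell holds one of four intensity levels (1 = light gray up to 4 = black, 0 = empty), so one curve can count from 0 to 255.
- The port is split into a low part and a high part, and the high curve advances by one each time the low curve has counted through 0 to 255.

The Hilbert index-to-coordinate conversion is the standard iterative algorithm. The helper names are hypothetical.

```python
from typing import List, Tuple

Grid = List[List[int]]
LEVELS = 4  # assumed intensity levels per cell: 1 = light gray ... 4 = black


def hilbert_d2xy(n: int, d: int) -> Tuple[int, int]:
    """Map index d along an n x n Hilbert curve (n a power of two) to (x, y)."""
    x = y = 0
    s, t = 1, d
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:  # rotate/flip the current quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def count_on_curve(value: int, n: int = 8) -> Grid:
    """Encode a count by filling cells in Hilbert order, up to LEVELS per cell."""
    if not 0 <= value < n * n * LEVELS:
        raise ValueError("value out of range for this curve")
    grid = [[0] * n for _ in range(n)]
    full_cells, remainder = divmod(value, LEVELS)
    for d in range(full_cells):
        x, y = hilbert_d2xy(n, d)
        grid[y][x] = LEVELS
    if remainder:  # partially shaded next cell
        x, y = hilbert_d2xy(n, full_cells)
        grid[y][x] = remainder
    return grid


def encode_port(port: int) -> Tuple[Grid, Grid]:
    """Split a TCP/UDP port (0-65535) into low and high 8 x 8 curves."""
    if not 0 <= port <= 65535:
        raise ValueError("port must be in 0..65535")
    high, low = divmod(port, 256)
    return count_on_curve(low), count_on_curve(high)
```

Summing the cell values of each curve recovers the two parts, and 256 × high + low recovers the port. For example, port 53421 splits into a high part of 208 and a low part of 173.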
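A protocol discourse sequence like the one in the use case can be sketched as a fixed-length list of 128 signed packet lengths. Several choices here are assumptions made for illustration, and the text does not define them:

- The sign convention: positive for packets sent by the session initiator, negative for replies.
- Clamping lengths to 1500.
- Zero-padding of unused positions.

They are consistent with the range of -1500 to 1500 reported for this section in the results. The function name is hypothetical.

```python
from typing import Iterable, List, Tuple

MAX_INTERACTIONS = 128  # capacity stated for the protocol discourse section
MAX_LENGTH = 1500       # assumed cap, matching the stated value range


def protocol_discourse(packets: Iterable[Tuple[str, int]], initiator: str) -> List[int]:
    """Encode (source IP, packet length) pairs as 128 signed lengths.

    Positive = sent by the initiator, negative = sent by the other host
    (an assumed convention); unused positions are padded with 0.
    """
    sequence: List[int] = []
    for src_ip, length in packets:
        if len(sequence) == MAX_INTERACTIONS:
            break  # fingerprint capacity reached
        length = min(length, MAX_LENGTH)
        sequence.append(length if src_ip == initiator else -length)
    sequence.extend([0] * (MAX_INTERACTIONS - len(sequence)))
    return sequence
```

The order of the list preserves the order of the exchange. Setup, transfer, and teardown patterns therefore remain visible as runs of small and large values.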
Transmitted data
The data section of the fingerprint must encode the relevant data within a unique session or event until the Hilbert curve is completed.
Data are transmitted in bytes, each composed of eight bits. As a result, each byte can be converted into a decimal value from 0 to 255, which can be encoded as a grayscale color. Therefore, each element of the 128 × 128 Hilbert curve can depict one byte using 256 grayscale colors. A 128 × 128 Hilbert curve was selected to produce a dense visualization of the transmitted data and to limit future changes in the fingerprint shape.
Use case: The data section of the fingerprint must encode the packet data for all packets within a unique session, or until the 128 × 128 Hilbert curve is completed. The complete Hilbert curve is shown in Figure 4.
5) THREAT DETECTION
The criteria for the AIECDS methodology include the use of dynamic self-learning and RL. Therefore, the threat detection phase of the AIECDS methodology was designed as shown in Figure 1. The threat detection phase consists of two main tasks: one for managing the fingerprint system and another for training the threat detection DRL (deep reinforcement learning) model. The purpose of the fingerprint management system is to buffer and maintain all fingerprints. This is achieved by recording a state for each fingerprint, which should include the available fingerprint space, when the fingerprint was last presented to the threat detection DRL model, and when it was last updated. The fingerprint management system shown in Figure 1 is illustrated in Figure 5. It inserts newly created fingerprints into the buffer and schedules them to be presented to the threat detection DRL model. It also routinely schedules existing fingerprints to be presented to the model again once they have been updated.
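The transmitted data encoding can be sketched as follows. The code walks the 128 × 128 Hilbert curve and writes one payload byte per cell as a grayscale value from 0 to 255, stopping when all 16,384 cells are full. The index-to-coordinate conversion is the standard iterative Hilbert algorithm. Leaving unused cells at 0 and the function names are assumptions, not details taken from the paper.

```python
from typing import Iterable, List, Tuple

SIDE = 128  # 128 x 128 curve -> 16,384 one-byte cells


def hilbert_d2xy(n: int, d: int) -> Tuple[int, int]:
    """Map index d along an n x n Hilbert curve (n a power of two) to (x, y)."""
    x = y = 0
    s, t = 1, d
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:  # rotate/flip the current quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def data_fingerprint(payloads: Iterable[bytes]) -> List[List[int]]:
    """Write payload bytes, in order, along the Hilbert curve as grayscale 0-255."""
    grid = [[0] * SIDE for _ in range(SIDE)]  # unused cells stay 0 (assumption)
    d = 0
    for payload in payloads:
        for byte in payload:  # iterating bytes yields ints 0..255
            if d == SIDE * SIDE:
                return grid  # curve complete: remaining data is ignored
            x, y = hilbert_d2xy(SIDE, d)
            grid[y][x] = byte
            d += 1
    return grid
```

In this sketch, a zero byte and an empty cell share the value 0. An implementation that must tell them apart would need a separate fill mask, and the text does not say how this case is handled.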
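The fingerprint management system described in the threat detection phase can be outlined as a small buffer plus a first-in, first-out schedule. For each fingerprint, the sketch records the three state items the text names: available space, the time it was last presented to the DRL model, and the time it was last updated. The FIFO policy, the class and method names, and the unit of `available_space` are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, Hashable, Optional, Set


@dataclass
class FingerprintState:
    key: Hashable
    available_space: int                    # remaining capacity (unit is an assumption)
    last_updated: float
    last_presented: Optional[float] = None  # None until first shown to the model


class FingerprintManager:
    """Hypothetical buffer with a FIFO schedule for presenting fingerprints to a model."""

    def __init__(self) -> None:
        self.states: Dict[Hashable, FingerprintState] = {}
        self._queue: Deque[Hashable] = deque()
        self._queued: Set[Hashable] = set()

    def upsert(self, key: Hashable, available_space: int, now: float) -> None:
        """Insert a new fingerprint or record an update, then schedule it."""
        state = self.states.get(key)
        if state is None:
            self.states[key] = FingerprintState(key, available_space, now)
        else:
            state.available_space = available_space
            state.last_updated = now
        if key not in self._queued:  # avoid duplicate entries in the schedule
            self._queue.append(key)
            self._queued.add(key)

    def next_for_model(self, now: float) -> Optional[FingerprintState]:
        """Pop the next scheduled fingerprint and mark it as presented."""
        if not self._queue:
            return None
        key = self._queue.popleft()
        self._queued.discard(key)
        state = self.states[key]
        state.last_presented = now
        return state
```

An updated fingerprint that is still waiting in the schedule is not queued twice. A fingerprint that has already been presented is queued again on its next update.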
From the example illustrated in Figure 7, the squared difference between each pair of corresponding elements was calculated for the two 64-element matrices. The square root of the sum of these squared differences was calculated as the Frobenius distance. This distance is 6.2 times the average element value in the example; more simply, it is equivalent to about six additional elements in matrix 1 compared to matrix 2. This measure is required for the comparison between fingerprints, because the DRL threat detection model considers fingerprints based on their element differences.
To determine the significance of the Frobenius distance for each fingerprint comparison, the gauge shown in Figure 8 was used. The gauge is based on the Frobenius distance relative to the average element value for the fingerprints. The significance gauge is relevant to both the transmitted data and the protocol discourse sections.
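The Frobenius distance and the relative measure behind the significance gauge can be computed as shown below. This is a minimal sketch. Matrices are plain nested lists. The phrase "average element value for the fingerprints" is assumed to mean the mean over all elements of both matrices being compared. The gauge thresholds in Figure 8 are not reproduced.

```python
import math


def flatten(matrix):
    """Row-major list of all elements of a nested-list matrix."""
    return [v for row in matrix for v in row]


def frobenius_distance(a, b):
    """Square root of the sum of element-wise squared differences."""
    fa, fb = flatten(a), flatten(b)
    if len(a) != len(b) or len(fa) != len(fb):
        raise ValueError("matrices must have the same shape")
    return math.dist(fa, fb)  # Euclidean distance of the flattened matrices


def relative_distance(a, b):
    """Frobenius distance as a multiple of the mean element value (assumed definition)."""
    values = flatten(a) + flatten(b)
    mean = sum(values) / len(values)
    dist = frobenius_distance(a, b)
    if mean == 0:
        return 0.0 if dist == 0 else math.inf
    return dist / mean
```

Because the differences are squared, one large deviation contributes more to the distance than several small ones of the same total size.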
The investigation of each threat type in Table 1 revealed interesting observations. Consider, for example, the shellcode and reconnaissance threat types. For the shellcode threat type, fingerprints 5.1 and 5.2 have the same transmitted data pattern. However, even though the transmitted data patterns were similar, the distance between the two was 1128 points. By contrast, fingerprints 6.1 and 6.2 do not have overlapping patterns, yet they have a smaller distance of 907 points owing to the small size of the transmitted data shape. The significance of these differences ranges from moderate to significant. The malicious reconnaissance fingerprints and their closest benign fingerprints match completely in their transmitted data patterns. In addition, these fingerprints (30–34) had the smallest distances, with an average of 139 points and a standard deviation of 28 points. The significance of these differences was consistently low.
Overall, the most dominant pattern for both malicious and benign fingerprints was Pattern 9 (used 20 times for malicious fingerprints and 13 times for benign fingerprints). The second most dominant patterns were Pattern 7 for malicious fingerprints and Pattern 10 for benign fingerprints. Patterns 2 and 3 are frequently used by benign fingerprints but only once by a malicious fingerprint, whereas Pattern 4 is frequently used by malicious fingerprints but only once by a benign fingerprint. Finally, Pattern 11 is used by only one malicious fingerprint. The overall pattern analysis is shown in Figure 10.
The transmitted data similarity analysis shows that the proposed solution provides a framework for identifying meaningful differences between malware and benign network sessions, and between malware threat categories. Not a single malware transmitted data section was identical (at zero distance) to its closest benign transmitted data section. All malware threat types, including the malware categories that were undetectable in the UNSW-NB15 simulation (backdoor, shellcode, and worm), showed differences in patterns that could make these threats detectable using less complex algorithms. Even reconnaissance malware, which had the smallest differences and the lowest detection ratio (0.2%) in the UNSW-NB15 simulation, showed a consistent difference that could aid in its discovery. In addition, seven malicious fingerprints (7, 8, 9, 10, 11, 21, and 23) shared their closest benign fingerprints (two unique fingerprints) with other fingerprints. This further indicates the advancement of the proposed solution and its promising effectiveness in increasing the margin of the decision boundary between malware and benign classification. Therefore, visual fingerprints can be developed for the transmitted data to differentiate between malicious and benign fingerprints.
4) RESULTS OF PROTOCOL DISCOURSE ANALYSIS
The protocol discourse section of the fingerprint consists of 128 values that range from -1500 to 1500. Because each position can take up to 3001 integer values, there are up to 3001^128 (about 1.2 × 10^445) possible sequences, a finite but practically unbounded space. The analysis nevertheless shows that set packet ranges and phases together form patterns. This subsection presents the unique patterns, an analysis of the protocol discourse for a few selected malicious and closest benign fingerprints, and the broader findings uncovered during the analysis.
Two different pattern guides were used in the Protocol Discourse Results section. The first is a packet pattern guide that focuses on packet sizes and phases of engagement. The second is a setup-phase packet length and sequence analysis for specific ports with repeating setup-phase patterns.
The following evaluation guide was used to identify unique patterns within the protocol discourse, based on the packet sizes in each phase. The guide for the different types of patterns is shown in Figure 11.
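The size of the protocol discourse value space can be checked with exact integer arithmetic. The sketch assumes that each of the 128 positions independently takes an integer value from -1500 to 1500, giving 3001 possibilities per position. Under that assumption the space is finite but astronomically large.

```python
import math

POSITIONS = 128          # protocol discourse entries per fingerprint
LOW, HIGH = -1500, 1500  # value range stated for each entry
VALUES = HIGH - LOW + 1  # 3001 integer values per entry


def discourse_space_size() -> int:
    """Number of distinct 128-entry sequences (exact Python integer)."""
    return VALUES ** POSITIONS


if __name__ == "__main__":
    size = discourse_space_size()
    print("decimal digits:", len(str(size)))
    print("log10 of size:", round(POSITIONS * math.log10(VALUES), 2))
```

The exact count has 446 decimal digits, which matches the estimate 128 × log10(3001) ≈ 445.09.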
FIGURE 11: Protocol discourse packet guide. Packet-size guide: large packets have a length > 750; other packets have a length ≤ 750. Packet-phase guide: each network session starts with a setup phase, in which the session is initiated; the transfer phase starts when large packets are transmitted; the teardown phase starts after there are no longer any large packets or acknowledgements for large packets.
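The packet-size and phase guide in Figure 11 can be approximated in code. This sketch rests on the following rules:

- A packet is large when its length exceeds 750. The absolute value is used in case lengths are signed by direction.
- Everything before the first large packet is setup.
- Everything from the first to the last large packet is transfer.
- The rest is teardown.

This is a simplification and the function name is hypothetical. The guide also waits for acknowledgements of large packets before teardown, which cannot be detected from lengths alone. The boundary between small and medium packets is not specified here.

```python
from typing import Dict, List, Sequence

LARGE_THRESHOLD = 750  # Figure 11: large packets have length > 750


def split_phases(lengths: Sequence[int]) -> Dict[str, List[int]]:
    """Split a packet-length sequence into setup, transfer, and teardown."""
    large = [i for i, n in enumerate(lengths) if abs(n) > LARGE_THRESHOLD]
    if not large:
        # No large packets: the whole session stays in the setup phase
        return {"setup": list(lengths), "transfer": [], "teardown": []}
    first, last = large[0], large[-1]
    return {
        "setup": list(lengths[:first]),
        "transfer": list(lengths[first:last + 1]),
        "teardown": list(lengths[last + 1:]),
    }
```

Sessions with no large packets never leave the setup phase in this sketch, which matches the backdoor and shellcode sessions described in the results.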
Three different phases were identified that corresponded to the typical flow of information and sequence of events in a network session: setup, transfer, and teardown. In the example shown in Figure 11, there are ten small packets in the setup phase. Six large and four small packets were transmitted in the transfer phase, and the teardown phase contained ten small packets and two medium packets.
To illustrate and reveal patterns within the repeating setup-phase sequences, the following guide (Figure 12) can be used to interpret the annotations, indicating how different sequences were combined.
[Figure 12: setup-phase sequence annotation guide. Recoverable labels: medium packets, small packets, whole length.]
In this example, three different protocol discourse setup-phase sequences are overlaid onto one illustration on the left side of Figure 12. Using the annotation guide for the whole sequence, partial lengths, and partial sequences, three different sequences were identified, as depicted on the right-hand side of Figure 12.
5) PROTOCOL DISCOURSE PATTERN ANALYSIS
Table 2 presents the results of the protocol discourse pattern analysis. Each malicious fingerprint and its closest benign fingerprint are placed in the same row so that their pattern similarities can be compared easily. For malicious fingerprints, the columns include the row number (#), threat category, protocol, port, and pattern guide results. For the closest benign fingerprints, they include the row number (#), protocol, port, and pattern guide results. In addition, the sum of differences, Frobenius distance measure, mean, standard deviation, and significance are included. The sum of the differences was included to highlight the overall differences, based on the protocol discourse guide.
The following conclusions are drawn from the analysis in Table 2.
• In the reconnaissance section, there is no difference between the reconnaissance malware fingerprints and their closest benign fingerprints, because all packets have the same sequence and packet size, except for fingerprint 33. In fingerprint 33, the malicious fingerprint has the same number of transmitted packets, but the sequence is different, leading to a distance of 226 points. This roughly aligns with the detection efficiency of 0.2% for UNSW-NB15. In addition, all reconnaissance network sessions used port 111, which is used for remote procedure calls.
• In the backdoor and shellcode sections, only small packets were exchanged, and the sessions remained in the setup phase. There were differences in the number of packets exchanged in fingerprints 1, 3, 4, and 6, and all distances were nonzero. The average distances were 232 and 125 points, with standard deviations of 56 and 167 points, respectively. The significance of the differences ranged from minor to moderate, except for fingerprint 5, which had minuscule significance.
From the results in Table 2, the protocol discourse analysis identified 14 fingerprints with no packet difference. These had an average distance of 288 points. The fingerprints with one or more differences (20 of 34) had an average distance of 2473 points. Only four fingerprints had both zero distance and zero packet differences: reconnaissance fingerprints 30, 31, 32, and 34, whose packet sequences and flags matched exactly.
In summary, except for four reconnaissance fingerprints (30, 31, 32, and 34), protocol discourse data and visual fingerprints can aid in differentiating between malicious and benign fingerprints.

V. CONCLUSION AND FUTURE WORK
The AIECDS methodology discussed in this paper includes guidelines for the development of AI-enhanced cyber-defence systems. The focus was on extracting meaningful data and producing visualized fingerprints. This was achieved through the design of a fingerprint that enabled the discovery of hidden patterns. Visually comparing malicious fingerprints with the closest benign fingerprints demonstrates a significant improvement in detecting malicious threats. Furthermore, the use of fingerprinted data and data visualization in cyber-defense systems can significantly reduce the complexity of the decision boundary. It can also simplify the machine-learning models required to improve detection efficiency, even for malicious threats with minuscule sample datasets.
Therefore, the contribution of this study is an improvement in the development of AI-enhanced cyber-defence systems. Furthermore, the application of the AIECDS methodology is illustrated through a use case study for the
PROFESSOR DR. JAN ELOFF is a full Professor in Computer Science at the University of Pretoria, South Africa. From 2016 to 2021 he served as Deputy Dean of Research and Postgraduate Studies, and in 2022 as Acting Dean, in the Faculty of Engineering, Built Environment and IT at the University of Pretoria. From 2008 to 2015 he was Research Director for SAP Research in Africa. He holds a B2 rating from the National Research Foundation in South Africa, indicating that he receives considerable international recognition for his research in safeguarding platforms against societal and organisational cyber-threats. He is also a leading international scholar conducting research in the convergence of cybersecurity and AI. He has published widely in leading international journals. In 2018 he published a scholarly book on Software Failure Investigations. He is an associate editor of Computers & Security, the world's leading journal for the advancement of computer security. He is the co-inventor of a number of patents registered in the USA. Jan is a member of the governing and advisory board of the International Knowledge Centre for Engineering Sciences and Technology (UNESCO (IKCEST)) in China. During his research career he represented South Africa as an expert on IFIP Technical Committee 11 (Information Security) and received the IFIP Silver Core and Outstanding Services Award. He also served as the South African representative on ISO (the International Organization for Standardization) and is a former president of the South African Institute of Computer Scientists and Information Technologists (SAICSIT). In 2017, he received a SAICSIT award recognising him as an individual who has played a pioneering role in promoting computer science and information technology as academic disciplines in South Africa. In 2020 he received the Chancellor's Medal for Research from the University of Pretoria, and in 2021 he was listed as a finalist for the NSTF Lifetime Award for exemplary life-long research in Cybersecurity.
Christiaan Klopper is a master's student in IT focusing on big data science at the University of Pretoria, SA. He received his BEng in electronic engineering from the University of Pretoria in 2010. His main research areas are data science, big data analytics, and developing a self-learning cyber-defence system that can discover undetectable threats.