Intelligent Intrusion Detection System For Smart Grid Applications

The document presents an intelligent intrusion detection system (IDS) for smart grid applications that utilizes deep reinforcement learning, specifically a novel CVAE-DDQN architecture. This proposed system aims to address issues such as robustness, accuracy, and adaptability to new attack patterns while maintaining a low false positive rate. Experimental results demonstrate the effectiveness of the system using benchmark datasets, highlighting its capability to detect novel attacks with minimal human interaction.

Intelligent Intrusion Detection System for Smart Grid Applications

Dinesh Mohanty∗, Kamalakanta Sethi∗, Sai Prasath∗, Rashmi Ranjan Rout†, and Padmalochan Bera∗
∗Indian Institute of Technology Bhubaneswar, India. Email: [dm22, ks23, sps13, plb]@iitbbs.ac.in
†National Institute of Technology, Warangal. Email: [email protected]

2021 International Conference on Cyber Situational Awareness, Data Analytics and Assessment (CyberSA) | 978-1-6654-2529-2/20/$31.00 ©2021 IEEE | DOI: 10.1109/CyberSA52016.2021.9478200

Abstract—A smart grid is a cyber-physical system that enhances the capability of conventional power networks by leveraging functional automation of information and communication technologies. These systems allow energy provider companies to deliver low-cost, reliable power with minimal losses. Despite these advantages, such cyber-physical systems are prone to heterogeneous attacks that lead to breaches of data integrity and confidentiality. A significant part of the research that aims to tackle such weaknesses of smart grids suggests intrusion detection systems (IDS) as an effective solution. However, robustness, accuracy, and adaptability to new attacks are the major concerns in such systems. Therefore, we propose an intelligent intrusion detection system for smart grid networks that uses deep reinforcement learning. Our proposed IDS is robust and highly accurate, with a low false alarm rate. Our model is based on the novel CVAE-DDQN architecture, which combines a generative model with deep reinforcement learning. Due to the lack of smart-grid-specific datasets, we have used the benchmark network-based NSL-KDD dataset and the cloud-specific ISOT-CID dataset. The experimental results show the effectiveness of our proposed system in terms of accuracy and false positive rate as well as network attack detection capabilities. We have also evaluated the adaptiveness of our model to changes in attack patterns against critical attack types.

Index Terms—Smart Grid, IDS, False Positive Rate, Reinforcement Learning

I. INTRODUCTION

Smart power grid, or smart grid, refers to a set of technologies that involve machine-to-machine communication between components such as sensors, smart meters, gateways and other intelligent devices to provide on-demand electricity to users from central generation units. Due to the presence of physically unprotected entry points and wireless networks that could be watched and tampered with by attackers, ensuring security becomes one of the most significant concerns in a smart grid system [1]. The network component of the smart grid consists of a three-layered architecture which can be used for metering-related functions: Wide Area Network (WAN), Neighborhood Area Network (NAN), and Home Area Network (HAN). Unfortunately, ongoing research has limited focus on the security aspect of smart grids, although reliability is considered one of the most important factors in the development of smart grids. In recent designs, intrusion detection systems (IDS) have garnered importance in the smart grid system.

IDS can be classified as (1) signature-based systems, (2) specification-based intrusion detection, and (3) anomaly-based systems [2]. Signature-based systems work by making use of a database of past attack signatures. Hence, they do not work very well for new attacks due to a lack of available information. Anomaly-based systems work much better in the case of the first occurrence of an attack, as they flag activities that they find to be considerably different from normal activities. However, as not all digressions are malicious, such IDS are known for a high false positive rate (FPR). Manually defined software behavioural requirements are used as a basis for detecting attacks in the third category of IDS (specification-based IDS).

We identified some major problems in several research works aimed at proposing artificially intelligent IDS [3]–[10]: 1) limited resilience towards new attacks and changes in attack patterns, 2) a combination of high accuracy and low FPR was rarely found, 3) high manual interaction required, 4) lack of testing with real-world raw data, and 5) lower speed of adaptation due to a lack of curiosity-based learning. Our research aims at dealing with all such issues through the use of Deep Reinforcement Learning algorithms combined with intelligent flow analysis and smart experience replay powered by the newly introduced Generative Auto Encoders. Our proposed IDS can detect novel attacks and adapt to every possible change in attack patterns in a smart grid environment, with very little human interaction. We could not find any publicly available smart grid dataset for evaluating our model. Hence, following the general trend, we evaluated our proposed system using NSL-KDD, a very popular conventional network dataset. Based on [11], we found that although NSL-KDD is very popular for IDS evaluation, it has some major flaws (flaws derived from KDD 1999 [12]). Hence, we have also evaluated our proposed system with the help of ISOT-CID, a real-time intrusion dataset (which is an 8 TB dataset and has greater resemblance to real-world smart grid data). The results show that our proposed model effectively achieves the right balance between accuracy and false-positive rate.

II. RELATED WORK BASED ON LITERATURE STUDY

In this section, we present the related works on Intrusion Detection Systems for Smart Grids.

Boumkheld et al. [3] proposed an anomaly-based IDS for Advanced Metering Infrastructure (AMI) using the AODV protocol to detect blackhole attacks. It achieved 100% TPR, 99% accuracy, and 66% precision. They worked on simulated data. Faisal et al. [4] also proposed an anomaly-based IDS for AMI using MOA software to detect DoS, R2L, U2R, and Probing attacks. It achieved 94.67% accuracy and 3.31% FPR. They worked on the KDD CUP 1999 and NSL-KDD datasets.
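To make the metrics quoted in this section concrete, the sketch below computes TPR, FPR, precision and accuracy from confusion-matrix counts. The counts are hypothetical, chosen only to show how 100% TPR and 99% accuracy can coexist with roughly 66% precision when attacks are rare; they are not taken from [3] or [4], and the function name is illustrative.

```python
def detection_metrics(tp, fp, tn, fn):
    """Standard IDS metrics from confusion-matrix counts."""
    return {
        "tpr": tp / (tp + fn),              # true positive rate (recall)
        "fpr": fp / (fp + tn),              # false positive rate
        "precision": tp / (tp + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }

# Hypothetical counts (an assumption for illustration): 20 attacks, all
# detected, plus 10 false alarms among 980 normal samples.
m = detection_metrics(tp=20, fp=10, tn=970, fn=0)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

With so few attacks in the sample, a handful of false alarms is enough to pull precision down while accuracy stays high, which is why accuracy and FPR are reported together throughout this paper.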

Authorized licensed use limited to: Indian Institute of Information technology Sricity. Downloaded on May 11,2025 at 05:31:28 UTC from IEEE Xplore. Restrictions apply.
Goldberg et al. [5] designed an anomaly-based IDS for the SCADA component of the smart grid using the Modbus protocol and software tools such as Wireshark, pcapy and Impacket to detect several attacks. They were able to achieve 100% precision, 0% FNR, 100% accuracy and 0% FPR. They worked on a self-generated real-world dataset. Feng et al. [6] also designed a hybrid IDS for the SCADA component of the smart grid using the Profinet protocol and the Snort software tool to detect reconnaissance, protocol anomalies and DoS attacks.

Kwon et al. [7] targeted the smart grid substation using the MMS and IEC 61850 protocols and software tools such as Wireshark to detect DoS, port scanning, portable executable, GOOSE, MMS, and SNMP attacks. They were able to achieve 100% precision, 1.1% FNR, 98.9% TPR and 0% FPR. They worked on real data from a substation in South Korea. The proposed IDS was specification based. Yoo et al. [8] also designed an anomaly-based IDS for the smart grid substation using the MMS and GOOSE protocols and software tools such as the WEKA framework to detect several attacks. They were able to achieve an average of 3.5% FPR. They also worked on real data from a substation.

Pan et al. [9] proposed a hybrid IDS for the Synchrophasor component of the smart grid using the Snort and OpenPDC software tools to detect single-line-to-ground faults, Replay, Command Injection and Disable Relay attacks. They were able to achieve 90.4% accuracy. They worked on a simulated dataset. Yang et al. [10] also targeted the Synchrophasor component of the smart grid using the IEEE C37.118 protocol and software tools such as ITACA, Nmap, Metasploit and hping to detect reconnaissance and DoS attacks. They were able to achieve 0% FPR. Their IDS was also specification based.

The aforementioned IDS works have covered major aspects of smart grid security. They have also used a wide variety of datasets and IDS types, and they show acceptable performance. However, robustness, accuracy, and adaptability to new attacks are the major concerns in such systems. Moreover, most of the existing solutions cannot cope with the huge amount of data involved in smart grid security. Also, most of the anomaly-based models do not address the forgetting tendency, which is a major problem in the long run. Therefore, there is a need to design advanced and intelligent intrusion detection systems for smart grid applications that handle the above-mentioned problems.

III. BACKGROUND

In this section we briefly introduce the various Q-learning models and smart grid networks that are analysed in this research work.

A. Deep Learning Models

1) Q Learning: Q Learning is one of the most famous Reinforcement Learning algorithms. It uses a Q (Quality) function to estimate reward values, which are used to provide the reinforcement. Q-learning defines an optimal action selection strategy for any Finite Markov Decision Process (FMDP) with the goal of maximising the expected value of the total reward that can be obtained in successive stages [13]. It uses Temporal Difference (TD) learning for this purpose. The TD value can be understood as an estimate of the amount of reward that can be expected in the future. If the TD value is very small, it means that the classifier has understood the environment well and there is little scope for further improvement. Hence, the major goal is to minimize the TD values. The equation for updating the Q value is as follows:

$$Q^{new}(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left( r_t + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right)$$

where $s_t$ is the state at time $t$, $a_t$ is the action taken at time $t$, $r_t$ is the reward obtained at time $t$, $\alpha$ is the learning rate, $\gamma$ is the discount factor, $Q(s_t, a_t)$ is the old Q value, $\max_{a} Q(s_{t+1}, a)$ is the estimate of the optimal future value, $r_t + \gamma \max_{a} Q(s_{t+1}, a)$ is the temporal difference target and $r_t + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t)$ is the temporal difference value.

2) Deep Q Learning: In complex environments, one cannot store the full state-action table, as such environments do not have discrete and finite state-action spaces. Q Learning does not work very well in such environments. To tackle them, Deep Neural Networks can be used as function approximators. Deep Q Learning is a solution for applying the concept of Q Learning in more complex environments. It uses Deep Neural Networks to estimate the Q values of all possible actions at a given state. The loss function is generally modeled to represent the Temporal Difference equation of Q Learning (as the objective of reducing the TD value holds true here as well). The loss function is defined as

$$\text{loss} = \left( r + \gamma \max_{a'} \hat{Q}(s', a') - Q(s, a) \right)^2$$

where $s$ is the state, $a$ is the action taken, $r$ is the reward obtained, $s'$ is the next state, $\gamma$ is the discount factor and $\hat{Q}$ is the delayed Q function. As we can see, the loss function is very similar to the temporal difference function in the case of Q Learning.

Deep Q Learning was also tested on classic Atari 2600 games, where it outshone other Machine Learning approaches in almost all of them and performed on par with or better than skilled human games testers. This displayed the ability of the model to recognise and learn complex logic by playing on its own.

3) Double Deep Q Learning: van Hasselt et al. [13] discuss the recurrent overestimation problem observed in Deep Q Learning due to the inherent estimation errors of learning in their paper [14]. Thrun and Schwartz [13] studied such overestimation-related errors in the Q Learning algorithm for the first time. They showed that if the action values contain random errors uniformly distributed in an interval $[-\epsilon, \epsilon]$, then each target is overestimated up to $\gamma \epsilon (m - 1)/(m + 1)$, where $m$ is the number of actions. They also gave an example in which such errors led to sub-optimal policies. Later, van Hasselt (2010) showed how noise from the environment could lead to overestimations even while using tabular representation.
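The Thrun and Schwartz bound can be checked numerically. The sketch below is a minimal simulation under a simplifying assumption: all $m$ actions have the same true value (zero), so any positive value of the maximum is pure estimation noise. The function names, the seed and the choice $m = 4$, $\epsilon = 1$ are illustrative and not taken from the paper.

```python
import random

def estimate_max_bias(m, epsilon, trials=100_000, seed=0):
    """Monte Carlo estimate of E[max_a noise_a] for m actions whose true
    values are all zero and whose errors are uniform on [-epsilon, epsilon]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += max(rng.uniform(-epsilon, epsilon) for _ in range(m))
    return total / trials

def thrun_schwartz_bound(m, epsilon, gamma=1.0):
    """Overestimation bound gamma * epsilon * (m - 1) / (m + 1)."""
    return gamma * epsilon * (m - 1) / (m + 1)

simulated = estimate_max_bias(m=4, epsilon=1.0)
bound = thrun_schwartz_bound(m=4, epsilon=1.0)
print(f"simulated bias: {simulated:.3f}, bound: {bound:.3f}")
```

In this equal-value case the expected maximum of the noisy estimates sits at the bound, which illustrates why taking a max over noisy Q values biases the target upward even when the noise itself has zero mean.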

Van Hasselt suggested Double Q-learning as a solution, in which the action selection and action assessment procedures are decoupled, which dramatically reduces the overestimation problem. Van Hasselt et al. [14] later suggested a Double Deep Q Learning architecture, which uses the DQN algorithm's architectural design and deep neural network but finds better policies, thus improving performance [14]. They use the target network and the current network to replicate the decoupling that was proposed in the Double Q-learning procedure. The TD equation for Double Q-Learning is given by

$$Y_t^{DoubleDQN} \leftarrow r_t + \gamma\, Q\left(s_{t+1}, \arg\max_{a} Q(s_{t+1}, a; \theta_t), \theta_t^{-}\right)$$

where $Y_t^{DoubleDQN}$ is the TD value at time $t$, $s_t$ is the state at time $t$, $a_t$ is the action taken at time $t$, $r_t$ is the reward obtained at time $t$, $\gamma$ is the discount factor, $\theta_t$ is the weight matrix of the Current Q Network at time $t$, and $\theta_t^{-}$ is the weight matrix of the Target Q Network at time $t$. The authors tested their model on six Atari games by running DQN and Double DQN with six different random seeds. The results showed that the overoptimistic estimation in Deep Q Learning was more common and severe than what was previously acknowledged. In some cases the overestimations were so high that a log scale had to be used to show the comparison with the optimal policy. The results also confirmed that Double Deep Q Learning achieves state-of-the-art results on the Atari 2600 domain.

We have employed the concept of decoupling action selection and action evaluation to build our Double Deep Q Learning-based model and evaluated it on a real-world cloud dataset (ISOT-CID) and a conventional network-based NSL-KDD dataset. Evaluation suggests significant improvements as compared to the previously proposed DQN model and other simple classifiers, as discussed further.

4) Curiosity Driven Variational Autoencoder: Before defining CVAE, we need to define VAE first. VAE is a generative model that can learn the unsupervised latent representations of complex high-dimensional data by encoding the high-dimensional data into a latent space and then decoding it back to the original data [15]. It uses encoders and decoders for this purpose. It tries to find out the marginal likelihood of a sample. The marginal likelihood consists of a reconstruction loss and a KL (Kullback–Leibler) divergence term. The CVAE improves upon the VAE by using prediction losses as an intrinsic reward to ensure sufficient exploration. Thus the training samples generated by CVAE will be better in terms of intrinsic learning capacity than the VAE training samples. This is a very brief description of CVAE. The paper [16] can be referred to for more vivid descriptions that include the mathematical foundations as well.

B. Smart Grids

Electricity generation by traditional power grids is quickly becoming outdated due to the lack of efficient power generation and management. Along with the increasing demand for electricity, the conventional grids are being continuously overloaded and are not stable. Therefore there is a need for efficient and intelligent power generation, management and distribution.

Smart grids are the ideal solution to these challenges as they are reliable, secure and economic with improved monitoring capabilities and dynamic performance. Smart grids in general are a combination of physical hardware and software components that are integrated to ensure optimal performance and reduce operating costs in both the generation and management phases, thus in turn reducing the cost for the end consumers. To achieve these lofty goals, smart grids rely on the data generated by millions of interconnected intelligent components which share information along with electricity. The data generated by the power grids helps in making the grids "intelligent" as opposed to conventional grids, which generate only electricity. A smart grid consists of 4 main components, namely the transmission system, generation system, distribution system, and control and data centre.

The primary objective of the generation system of the smart grid is energy efficiency. The energy generated by the smart grids can't be stored efficiently and is generally expensive, thus it is used right away. The power generation usually doesn't happen at a centralized location; rather, it is distributed across various locations depending on the nature of the power being generated (solar, wind, coal). Therefore it is crucial to ensure the generation systems are working at maximum capacity to ensure optimal performance and that the demand for energy at any given time is fulfilled by using the various available resources.

The role of the transmission and the distribution system is to manage the timely supply and ensure that the lines are well maintained. They play a major role in transferring the power generated at various locations to domestic consumers, small-scale factories and large-scale industries. The generation, transmission and distribution systems are also a part of traditional electric grids.

But the control and data centres are exclusive to the smart grids and are absent in conventional grid systems. They perform advanced management techniques like real-time distributed automation via two-way communication with the substation. Furthermore, a smart grid infrastructure contains advanced grid systems that help develop self-healing characteristics in the grid. The aim of self-healing is to diagnose problems in the grid early and fix them as quickly as possible without the need for human interference. As a result, the smart grid becomes more resistant to threats, increasing availability and reliability. Smart grids require self-healing in order to redirect and change the flow of energy into alternate routes in the event of a disruption, a task that can only be accomplished through continuous self-assessment of the state of the power system.

Nowadays smart meters collect data on customer use, which is then used to reliably track, adapt to supply requirements by forecasting peak usage, and recognize the system's flaws. Smart grids also provide us an opportunity to price the supply dynamically based on usage. The consumers can be encouraged to use the supply at times when the usage is low (lower price) and also reduce the usage when the systems

are at maximum capacity (high price). This real-time feed to consumers creates awareness about the grid systems and also helps improve the abilities of the system.

Compared to conventional systems, the electric grids share a lot more information, which makes them vulnerable to cyber attacks. Altering the power generation, denying service to the consumers, overloading the grids etc. are some of the many ways in which the grids are susceptible to attack. Thus monitoring the various components for any unusual behaviours or anomalies is now an essential part of any cyber-physical system. Attacking the power grids can compromise the security of the country and affect its economy drastically. Thus modernizing the grids comes at the cost of a compromised and vulnerable system that is prone to attacks.

IV. DATASET DESCRIPTION

A. Overview

We used the ISOT Cloud Intrusion Dataset (ISOT-CID) [17] to test our model, which is the first publicly accessible cloud-specific dataset. The dataset was gathered over the cloud infrastructure of the 'Compute Canada' cloud service provider, which provides its services for supporting the computational needs of researchers [18]. The dataset was collected at the hypervisor layer, guest hosts layer, and network layer of an OpenStack-based development environment. We used only the network traffic data part of the ISOT-CID for our purposes. The dataset consists of data obtained from a variety of sources including system logs, network traffic, CPU performance, memory dumps, system call traces, etc. We have considered the dataset obtained in phase 2 (out of the two phases). Aldribi et al. (2018) [19] contains a description of the cloud platform, data collection procedures, and the Phase 1 dataset. Data collection in both phases was made on the same cloud environment and the same collection procedures were followed. However, the second phase of collection occurred more recently and covered a wider variety of newer attack vectors.

B. Cloud environment and Data collection outline

Three hypervisor nodes (A, B, and C) were used in the ISOT-CID selection environment. In addition, the cloud environment includes ten instances of VM (VM1 to VM10) that were launched in three cloud zones: A, B, and C. In zone A, five instances (VM2, VM3, VM4, VM5, and VM6) were launched; in zone B, four instances (VM7, VM8, VM9, and VM10) were launched; and in zone C, one instance (VM1) was launched. The data was collected in the cloud for several days with time slots of 1-2 hours per day. Data was gathered using a variety of collector agents that were categorised and integrated into three cloud layers: virtual machine or instance-based agents, hypervisor-based agents, and network-based agents. For storage and analysis, the data obtained by these collectors was sent to the ISOT lab log server. The dataset contains both normal and malicious activities. The malicious data consists primarily of attacks executed. However, there were some unsolicited malicious sessions that were undertaken by unknown external hackers, which were identified by the tiger team based on the source IP addresses and attack timing. A wide variety of attack scenarios were covered, including simultaneous attack scenarios, coordinated attack scenarios, etc. Moreover, a wide variety of distinct geographical locations in Europe, North America, and Asia were used for launching the attacks. The normal data collected was also quite varied and complex, involving 160 legitimate visitors and ranging from data involved in maintaining the status of VMs to rebooting, updating, creating files, SSHing to the machine, etc. The types of attacks are further mentioned in Table I.

TABLE I
ISOT DATASET ATTACK TYPE DISTRIBUTION

Insider attacks                        | Outsider attacks
Trojan Horse                           | Unclassified (unsolicited traffic)
Backdoor (reverse shell)               | DNS amplification DoS
Unauthorized Crypto-mining             | Ports and Network scanning
UDP Flood DoS                          | Brute Force login/Dictionary attack
Stepping Stone Attack                  | HTTP Flood DoS
Ports and Network scanning             | Directory/Path Traversal
Synflood DoS                           | Brute Force login/Dictionary attack
Revealing Users Confidential Data      | Fuzzers
Brute Force login/Dictionary attack    | Synflood DoS

C. Attack categories

The ISOT-CID malicious activities were divided into outside and inside attacks, based on whether they were performed by outsiders or insiders respectively. The outside attacks comprised those that were made from the outside world (by the tiger team or the unsolicited activities). Inside malicious activities were carried out by either an insider with high privileges on the hypervisor nodes within the cloud environment or a hacked VM inside the cloud environment that was later used as a launching pad for targeting other instances in the cloud or the outside world. Some of the inside attacks were network scanning, password cracking, backdoor and Trojan horse, DoS attacks, etc. [19]

D. Network Traffic Data

The entire ISOT-CID dataset of size 8TB consisted of 55.2GB of network traffic data. The network traffic data was composed of three levels of network communications:
• external traffic - traffic between the instances
• hypervisor traffic or internal traffic - network traffic flow among the hypervisor nodes
• local traffic - traffic between two VMs on a given hypervisor node.

The network traffic data was obtained and recorded in packet capture (PCAP) format for public use. In part one, a total of 22,372,418 packets were captured, with 15,649 (0.07%) of them being malicious. In part 2, a total of 11,509,254 packets were registered, with 2,006,382 (17.43%) of them being malicious. The data collected were organised into folders based on the date and hypervisor on which the data collection took place.
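The class balance implied by these packet counts can be checked directly. The sketch below uses only the packet counts quoted for the two parts of the ISOT-CID network traffic; the helper name is illustrative, and the combined figure is derived here rather than reported in the paper.

```python
def malicious_share(malicious, total):
    """Percentage of packets labelled malicious."""
    return 100.0 * malicious / total

part1 = malicious_share(15_649, 22_372_418)
part2 = malicious_share(2_006_382, 11_509_254)
# Derived here for illustration; not a figure reported in the paper.
combined = malicious_share(15_649 + 2_006_382, 22_372_418 + 11_509_254)

print(f"part 1: {part1:.2f}%  part 2: {part2:.2f}%  combined: {combined:.2f}%")
```

The two parts differ by more than two orders of magnitude in attack share, so a detector that labels everything in part one as normal would still be almost always correct there. This is one reason accuracy is read alongside the false positive rate.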

E. Use of Tranalyzer for pre-processing

Since packet payload processing entails a large amount and rate of data being processed, flow-based intrusion detection is thought to be better for high-speed networks due to lower processing loads. However, the problem with such flow-based analysis is high false alarm rates. We used the open-source tool Tranalyzer, which is a lightweight flow generator and packet analyzer developed for practitioners and researchers, to obtain flow-based data from packet-based data. It was used to process the PCAP files, and files containing multiple JSON files were obtained as output. We were able to extract about 1.8GB of flow-based output data in JSON format from 32.2GB of packet-based input data in PCAP format using Tranalyzer [20].

V. PROPOSED IDS

We present our proposed intrusion detection system based on deep reinforcement learning in this section. We present the dataset and basic processing elements relevant to our system before introducing the components of our system.

A. IDS in Smart Grids

To improve the robustness of smart grids against various attacks, intrusion detection systems are implemented to analyse collected data and report any findings. They provide real-time surveillance of the whole electric grid, which is used to take swift action against any attacks. It is important to ensure that the error rate (normal activity wrongly reported as an attack) is minimized to reduce disruptions in the network.

The smart grid network is composed of the Home Area Network (HAN), Neighborhood Area Network (NAN), and Wide Area Network (WAN). An IDS is present at all these levels to collect and analyse incoming/outgoing data. Different types of attacks target different levels; for instance, the power required to be generated can be manipulated at the WAN level, whereas a DoS attack can be implemented at the NAN level to deny services to one specific house in a network. Thus collecting data at various levels helps in recognising the patterns more easily, helps provide more robust estimates and reduces the rate of wrong predictions.

One IDS is located at each HAN, where various devices are connected to the electric power lines; this is the first level of the smart grid. Each HAN has a service and a metering component to provide energy and measure the cost associated with it. This IDS tracks the incoming and outgoing communication of the HAN and will notify the central IDS system in case of any security breaches. The NAN is the second layer of the smart grid. Multiple HANs which are spatially nearby are considered to be a part of the NAN. The IDS at the NAN collects the service and the metering information from all the HANs within the considered neighbourhood for further analysis. The third level of the smart grid is the WAN; it facilitates communication amongst the various components of the smart grid such as power generation, transmission, and distribution (across various NANs), including a centralised IDS server which is monitored by professionals. This layer also contains Supervisory Control and Data Acquisition (SCADA) and the Energy Distribution System (EDS) for controlling the distribution component and managing the metered connections. The data collected at various levels is analysed within each IDS, and in case any security breaches are identified they are reported to the central IDS system, which takes necessary action if required upon further analysis.

B. DDQN architecture

The main goal of DDQN is to deal with the overestimation of action values that occurs in DQN. The decoupling of action selection and action valuation elements is one of the DDQN's main concepts. In our case, we use two separate Neural Networks to accomplish this. One of them implements the current Q function while the other implements the target Q function. Back propagation is performed in the current Q Neural Network, and its weights are copied into the target Q Neural Network with delayed synchronisation. The copying is performed at periodic intervals of a defined number of epochs. We used an epoch duration of 16 in our tests because it was found to provide the best results in the majority of cases. The activities (raising necessary alarms) are carried out according to the current Q function, but the new Q values ($Q^{new}$) are calculated using the target Q function. This is done to avoid the shifting target effect while doing gradient descent. Manuel Lopez-Martin, Belen Carro, and Antonio Sanchez-Esguevillas used a similar approach in their research [21]. This process of delayed synchronisation between two Neural Networks achieves the needed decoupling and, as a result, handles the DQN's shifting target phenomenon.

C. CVAE-DDQN Architecture

Here, we present our algorithms for intrusion detection. Algorithm 1 shows the working of the CVAE along with DDQN in each target network update cycle. The administrator network's role is depicted in Algorithm 2. It uses a voting mechanism to determine whether or not an attack has occurred.

The Agent Network gets its input from the Host Network. It conducts flow-based analysis and preprocessing before feeding the input to the DDQN model. The DDQN model predicts the output. The actual result is obtained from the administrative network. Based on the result, the reward is calculated. The input state along with the action, reward and output state is stored in an experience pool. The CVAE generates tuples that get an additional intrinsic curiosity reward. Normal Experience Replay tuples are considered with probability G and the generated experience tuples are considered with probability 1-G. These tuples are then used to retrain the DDQN model. The IDS deployment architecture shown in Figure 1 and the CVAE-DDQN model architecture shown in Figure 2 can be used for further reference.

VI. EXPERIMENTAL RESULTS AND DISCUSSION

The deep learning models were implemented in Python using the TensorFlow library. The performance of the models was evaluated on the ISOT-CID and NSL-KDD datasets. Along with multi-class classification, attack specific

Authorized licensed use limited to: Indian Institute of Information technology Sricity. Downloaded on May 11,2025 at 05:31:28 UTC from IEEE Xplore. Restrictions apply.
Algorithm 1: CVAE DDQN Logic
Initialise replay memory D with capacity N, generated replay memory D_g with capacity N_g, minibatch size M, proportion factor G;
Initialise the action-value function and the target action-value function with weights θ and θ⁻;
for episode = 1 to I do
    Observe state s_0;
    for t = 1 to I do
        Choose an action a_t based on the ε-greedy policy;
        Observe transition (s_t, a_t, r_t, s_{t+1});
        Store transition (s_t, a_t, r_t, s_{t+1}) in D;
        Sample a random minibatch of transitions (s_t, a_t, r_t, s_{t+1}) from D;
        Generate transition (s_t, a_t, r'_t, s'_{t+1}) with the CVAE;
        Compute the prediction error e_t;
        Store transition (s_t, a_t, r'_t + β e_t, s'_{t+1}) in D_g;
        Randomly sample M × (1 − G) transitions (s_j, a_j, r_j, s_{j+1}) from D;
        Randomly sample M × G transitions (s_j, a_j, r_j, s_{j+1}) from D_g;
        if episode terminates at step j+1 then
            y_j ← r_j;
        else
            y_j ← r_j + γ max_{a'} Q(s_{j+1}, a'; θ⁻);
        end
        Perform gradient descent on (y_j − Q(s_j, a_j; θ))^2 w.r.t. network parameters θ;
        Every C steps, θ⁻ ← θ;
    end
end
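
The replay-mixing and target steps of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `sample_mixed_minibatch` and `td_targets`, the tuple layout (state, action, reward, next state, done flag) and the toy target network `q_target` are all hypothetical. The sketch follows Algorithm 1 in drawing M × (1 − G) transitions from D and M × G from D_g (the prose of Section V states the proportions the other way round), and it computes the target exactly as written in Algorithm 1, with the maximum taken over the target network.

```python
import random
import numpy as np

def sample_mixed_minibatch(real_memory, generated_memory, batch_size, g, rng=random):
    """Draw about batch_size*(1-g) transitions from D and batch_size*g from D_g."""
    n_generated = min(int(round(batch_size * g)), len(generated_memory))
    n_real = min(batch_size - n_generated, len(real_memory))
    return rng.sample(real_memory, n_real) + rng.sample(generated_memory, n_generated)

def td_targets(batch, q_target, gamma):
    """y_j = r_j if the episode ended, else r_j + gamma * max_a' Q(s_{j+1}, a'; theta^-)."""
    targets = []
    for state, action, reward, next_state, done in batch:
        if done:
            targets.append(reward)
        else:
            targets.append(reward + gamma * float(np.max(q_target(next_state))))
    return np.array(targets)

# Toy data (assumed): observed transitions are non-terminal with reward 1.0,
# CVAE-generated transitions are terminal with reward 0.5.
real = [((0.0,), 0, 1.0, (1.0,), False) for _ in range(10)]
generated = [((0.0,), 1, 0.5, (1.0,), True) for _ in range(10)]

def q_target(state):
    # Stand-in for the target network: fixed Q-values for two actions.
    return np.array([0.2, 0.7])

batch = sample_mixed_minibatch(real, generated, batch_size=8, g=0.25)
y = td_targets(batch, q_target, gamma=0.9)
```

In a full agent, `y` would be regressed against Q(s_j, a_j; θ) and the target weights would be refreshed every C steps; those parts are omitted here.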

Algorithm 2: Administrator Network Logic
Obtain AgentResult and featurevector from the agents;
for each agent do
    a = number of bits set in agent result;
    b = number of bits not set in agent result;
    if a ≤ b then
        result ← "nonattack";
    else
        result ← "attack";
    if (result == "attack") then
        feature vector is preprocessed for classification of attack;
        attack type is obtained from classifier by inputting the processed features;
        attacktype = classifier's output;
    else
        attacktype = "nonattack";
    actualresult is obtained;
    actualresult is sent to the agent to be used for reward calculation;

Fig. 1. IDS Deployment Architecture

Fig. 2. Pictorial representation of our CVAE-DDQN model architecture

classification and performance on changing attack patterns were also analysed.
The models were evaluated on four standard machine learning performance metrics, i.e., FPR (False Positive Rate), TPR (True Positive Rate), AUC (Area under ROC Curve), and ACC (Accuracy). AUC serves as a single metric that balances the TPR and the FPR and helps us compare the performance of various models. We used an 80 to 20 train-test split. We ran our models on Google Colab. Although CVAE-DDQN is an online learning model, we used a part of the datasets for testing as well. For the ISOT-CID dataset, the preprocessing creates lists of JSON objects and later converts them into CSV files. The flow based analysis was done before running CVAE-DDQN, but it can be done in parallel with the CVAE-DDQN model. Similarly, we implemented the CVAE in sequence with the DDQN algorithm, but the two can be run in parallel in a real-world scenario. Upon testing with the datasets, we obtain the following results:

TABLE II
MODEL PERFORMANCE ON ISOT-CID AND NSL-KDD DATASETS

Model               Dataset    Accuracy (%)   FPR (%)   AUC
CVAE DDQN           ISOT-CID   98.16          1.56      0.896
CVAE DDQN           NSL-KDD    89.20          1.77      0.8812
DDQN [22]           ISOT-CID   96.87          1.57      0.886
DDQN [22]           NSL-KDD    83.40          1.48      0.8432
Anomaly Based [4]   NSL-KDD    82.1           -         -

To justify the use of CVAE-DDQN, we have compared the results with the DDQN model proposed in [22]. The comparison is presented in Table II. We have also compared the results with another state-of-the-art anomaly based model [4]. Relative to DDQN, accuracy rises from 96.87% to 98.16% on ISOT-CID and from 83.40% to 89.20% on NSL-KDD, while the FPR stays below 2% on both datasets. Such results, obtained with large real-world datasets, confirm the pragmatic approach with which the model is built. The fact that the DQN-CVAE model outperformed various other advanced reinforcement learning based algorithms in Atari 2600 games [23] further supports the effectiveness of our CVAE-DDQN algorithm.

A. Evaluation of Model robustness against constantly evolving attack patterns
To test the adaptability and robustness of our model, we leveraged the characteristic features of the ISOT-CID dataset.

Most of the data was collected during the first six days, although log collection spanned 18 days. Moreover, the attack type was different for each day. Leveraging this fact, we train the model on the data from day 1 to day (n-1) and evaluate it on the data from day n. We have presented the results in Table III. The results indicate that our model performs well even when it is trained on only a few days of data and is then exposed to a previously unseen attack. Accuracy and AUC increase on each subsequent day (from 83.11% and 0.8421 on day 2 to 94.10% and 0.9448 on day 5), while the FPR falls from 3.31% to 2.00%. This reaffirms the robustness of our model, making it a practical fit for smart grid deployment.

TABLE III
EVALUATION OF MODEL ROBUSTNESS AGAINST DAILY CHANGING ATTACK PATTERN

Day   Attack Type                 ACC      FPR     AUC
1     DTA and UCM                 -        -       -
2     NS                          83.11%   3.31%   0.8421
3     SQLIA, CSS, PTA, S-DOS      90.16%   2.24%   0.8801
4     BFLA (failed)               93.11%   2.32%   0.9301
5     UCMA, DNSADOSA, HTTPFDOS    94.10%   2.00%   0.9448

ACC: Accuracy, DTA: Dictionary Traversal Attack, UCMA: Unauthorized Crypto-mining attack, NS: Network scanning, SQLIA: SQL Injection Attack, CSS: Cross-site Scripting (XSS), PTA: Path Traversal Attack, S-DOS: Slowloris DOS, BFLA: Brute Force login attack, DNSADOSA: DNS amplification DOS attack, HTTPFDOS: HTTP flood DOS
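
The day-wise protocol described above (train on days 1 to n-1, evaluate on day n) can be sketched as follows. This is only an illustrative outline under assumptions: the helper names `expanding_day_splits` and `accuracy_and_fpr`, the label convention (1 = attack, 0 = normal) and the toy labels are hypothetical and are not taken from the paper's code.

```python
def expanding_day_splits(day_ids):
    """Yield (train_days, test_day): train on all earlier days, test on the next day."""
    ordered = sorted(set(day_ids))
    for i in range(1, len(ordered)):
        yield ordered[:i], ordered[i]

def accuracy_and_fpr(y_true, y_pred):
    """Binary accuracy and false positive rate, with 1 = attack and 0 = normal."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    acc = (tp + tn) / len(pairs)
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return acc, fpr

# Day labels of seven hypothetical records spread over five days.
splits = list(expanding_day_splits([1, 1, 2, 3, 3, 4, 5]))

# Toy predictions for one test day: one normal record is flagged as an attack.
acc, fpr = accuracy_and_fpr([1, 0, 0, 1, 0], [1, 0, 1, 1, 0])
```

Under this scheme there is no split that tests on day 1, since no earlier data exists to train on, which would be consistent with the empty first row of Table III.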
B. Attack specific classification on NSL-KDD Dataset
To ensure that the model performs well against all types of attacks, we perform a binary classification for each attack type to predict whether or not a data point belongs to that attack category. We classify all the attacks into 4 major types, namely DOS, Probe, R2L and U2R. Our analysis concludes that the model performs fairly well across all the categories, although there is a significant fall in accuracy from DoS to R2L/U2R that can be attributed to the lack of enough data points for R2L/U2R. The attack type classification results are shown in Table IV.

TABLE IV
ATTACK SPECIFIC PERFORMANCE FOR NSL-KDD

Sl No   Attack Type   ACC      FPR     AUC
1       DOS           98.80%   4.1%    0.974
2       Probe         88.01%   9.41%   0.8421
3       R2L           89.41%   0.33%   0.8408
4       U2R           92.88%   4.08%   0.8891

VII. CONCLUSION
The ISOT-CID and NSL-KDD datasets were used to test an advanced deep reinforcement learning based intrusion detection system for smart grids that provided high accuracy and low FPR. Our models aim to meet real-world resource constraints by creating a distributed network of IDS and reducing the reliance on a centralised IDS server, in the process making the system safer and more robust.
We introduced the Curiosity driven variational autoencoder Double Deep Q Network-based (CVAE DDQN) IDS to handle the overestimation of action values in the Deep Q Learning-based model and to ensure curiosity enabled learning so as to handle restricted exploration and forgetting tendencies of machine learning models. We also ensured that the model can cater to changing attack patterns on a day to day basis without any major compromise in the accuracy and false positive rate, along with ensuring that the model performs relatively well across all attack categories. As compared to individual classifiers and recently established Deep Q learning techniques, the experimental results indicate substantial improvements in evaluation metrics.
Overall, the model performance was better than existing works in the field of IDS for smart grids on the NSL-KDD and ISOT-CID datasets. The decentralised nature of the model and decreased reliance on a central server reduce the amount of human intervention required for the smooth working of the systems. The use of the novel CVAE-DDQN algorithm makes our IDS suitable for environments as complex as smart grids. In the future, we intend to deploy, implement and evaluate the proposed IDS on large scale real-world electric grids.

REFERENCES
[1] D. Vozikis, E. Darra, T. Kuusk, D. Kavallieros, A. Reintam, and X. Bellekens, “On the importance of cyber-security training for multi-vector energy distribution system operators,” in Proceedings of the 15th International Conference on Availability, Reliability and Security, ser. ARES ’20. New York, NY, USA: Association for Computing Machinery, 2020. [Online]. Available: https://doi.org/10.1145/3407023.3409313
[2] S. Parampottupadam and A. Moldovann, “Cloud-based real-time network intrusion detection using deep learning,” in 2018 International Conference on Cyber Security and Protection of Digital Services (Cyber Security), 2018, pp. 1–8.
[3] M. G. N. Boumkheld and M. E. Koutbi, “Intrusion detection system for the detection of blackhole attacks in a smart grid,” in Proc. 4th Int. Symp. Comput. Bus. Intell. (ISCBI), vol. 01, 2016, pp. 108–111.
[4] J. R. W. M. A. Faisal, Z. Aung and A. Sanchez, “Data-stream-based intrusion detection system for advanced metering infrastructure in smart grid: A feasibility study,” in IEEE Syst. J., vol. 09, no. 1, 2015, pp. 31–44.
[5] N. Goldenberg and A. Wool, “Accurate modeling of Modbus/TCP for intrusion detection in SCADA systems,” in Int. J. Critical Infrastructure Protection, vol. 06, no. 2, 2016, pp. 63–75.
[6] X. H. P. P. Y. L. Z. Feng, S. Qin and L. Wang, “Snort improvement on Profinet RT for industrial control system intrusion detection,” in 2nd IEEE Int. Conf. Comput. Commun. (ICCC), 2016, pp. 942–946.
[7] Y. H. L. Y. Kwon, H. K. Kim and J. I. Lim, “A behavior-based intrusion detection technique for smart grid infrastructure,” in IEEE Eindhoven PowerTech, vol. 0, 2015, pp. 1–6.
[8] H. Yoo and T. Shon, “Novel approach for detecting network anomalies for substation automation based on IEC 61850,” in Multimedia Tools Appl., vol. 74, no. 1, 2015, pp. 303–318.
[9] T. M. S. Pan and U. Adhikari, “Developing a hybrid intrusion detection system using data mining for power systems,” in IEEE Trans. Smart Grid, vol. 06, no. 6, 2015, pp. 3104–3113.
[10] Y. Y. et al., “Intrusion detection system for network security in synchrophasor systems,” in IET Int. Conf. Inf. Commun. Technol. (IETICT), 2013, pp. 246–252.
[11] H. Hindy, D. Brosset, E. Bayne, A. K. Seeam, C. Tachtatzis, R. Atkinson, and X. Bellekens, “A taxonomy of network threats and the effect of current datasets on intrusion detection systems,” IEEE Access, vol. 8, pp. 104 650–104 675, 2020.
[12] A. M. A. Tobi and I. Duncan, “KDD 1999 generation faults: a review and analysis,” Journal of Cyber Security Technology, vol. 2, no. 3-4, pp. 164–200, 2018. [Online]. Available: https://doi.org/10.1080/23742917.2018.1518061
[13] S. Thrun and A. Schwartz, “Prioritized experience replay,” in Proceedings of the 1993 Connectionist Models Summer School, M. Mozer, P. Smolensky, D. Touretzky, J. Elman, and A. Weigend, Eds. Hillsdale, NJ: Lawrence Erlbaum, 1993, pp. 1–6.
[14] H. Van Hasselt, A. Guez, and D. Silver, “Deep reinforcement learning with double q-learning,” in Thirtieth AAAI conference on artificial intelligence, 2016.
[15] D. P. Kingma and M. Welling, “Auto-encoding variational bayes,” 2014.
[16] W. H. M. C. Han GJ., Zhang XF., “Curiosity-driven variational autoencoder for deep q network,” in Lauw H., Wong RW., Ntoulas A., Lim EP., Ng SK., Pan S. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2020. Lecture Notes in Computer Science, vol 12084. Springer, Cham, 2020, pp. 1–6.
[17] “Isot cid website.” [Online]. Available: https://www.uvic.ca/engineering/ece/isot/datasets/index.phpion
[18] A. Aldribi, I. Traoré, B. Moa, and O. Nwamuo, “Hypervisor-based cloud intrusion detection through online multivariate statistical change tracking,” Computers & Security, vol. 88, p. 101646, 2020. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0167404819301907
[19] I. T. A. Aldribi and B. Moa, “Data sources and datasets for cloud intrusion detection modeling and evaluation,” 2018.
[20] “Tranalyzer documentation.” [Online]. Available: https://tranalyzer.com/documentation
[21] M. L. Martín, B. Carro, and A. Sánchez-Esguevillas, “Application of deep reinforcement learning to intrusion detection for supervised problems,” Expert Syst. Appl., vol. 141, 2020.
[22] K. Sethi, R. Kumar, D. Mohanty, and P. Bera, “Robust adaptive cloud intrusion detection system using advanced deep reinforcement learning,” in Security, Privacy, and Applied Cryptography Engineering, L. Batina, S. Picek, and M. Mondal, Eds. Cham: Springer International Publishing, 2020, pp. 66–85.
[23] G.-J. Han, X.-F. Zhang, H. Wang, and C.-G. Mao, “Curiosity-driven variational autoencoder for deep q network,” in Advances in Knowledge Discovery and Data Mining, H. W. Lauw, R. C.-W. Wong, A. Ntoulas, E.-P. Lim, S.-K. Ng, and S. J. Pan, Eds. Cham: Springer International Publishing, 2020, pp. 764–775.
