Transmission Line Fault Detection, Syedazehranadeemdsai2
A Project Report submitted in Partial fulfillment of the requirements for a Postgraduate Diploma in
Data Science with Artificial Intelligence (AI)
Batch: Batch-2
Project Title: Transmission Line Fault Detection and Classification Using Deep Learning Techniques
________________________
Signature of Supervisor
CERTIFICATE
This is to certify that Mr. / Ms. Syeda Zehra Nadeem of batch II has successfully completed the PGD
project in partial fulfillment of requirements for a PGD in Data Science with Artificial Intelligence
(AI) (PGD Title) from NED Academy, NED University of Engineering and Technology, Karachi,
Pakistan.
Project Supervisor
Instructor/Supervisor
NED Academy
I hereby state that this Project titled, Transmission Line Fault Detection and Classification
using Deep Learning Techniques, is my own work and has not been submitted previously by
me for taking any degree/ diploma from anywhere else in the world.
Signature:
Date: 08-06-2023
PLAGIARISM UNDERTAKING
I solemnly declare that the research work presented in this PGD Project titled: Transmission Line
Fault Detection and Classification using Deep Learning Techniques, is solely my research work
except where otherwise acknowledged and properly referenced.
Signature:
Student: Syeda Zehra Nadeem
Date: 08-06-2023
Contents
ABSTRACT
Chapter 1: INTRODUCTION
5.2.3 Convolutional Neural Network (CNN) Training and Validation Accuracy
5.2.4 Long Short-Term Memory (LSTM) Training and Validation Accuracy
ABSTRACT
It is pivotal for a modern society to have a power system that is reliable and robust. All sectors of
human civilization including industries, households, and businesses rely heavily on power
distribution. Any power disruption can have lasting repercussions. Hence, prompt detection and classification of power system faults are crucial. The main focus of this research project is to develop a fault detection and classification system through a deep learning approach that is both accurate and robust.
The study highlights the importance of fault detection in power systems. Particularly, shunt
faults in electrical networks are prevalent. The need for advanced fault detection techniques is
discussed. Exploring the potential of deep learning is the primary aim of the project, especially Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) models.
The methodology covers data acquisition, detailed pre-processing, model training, and evaluation. Both Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN) models are showcased to detect and classify shunt faults.
Due to higher test accuracy and lower test loss, the Long Short-Term Memory (LSTM) model
displays superiority. The challenges of classifying the LLLG shunt fault are acknowledged throughout the study.
While highlighting the importance of work done so far, the study concludes by making future
recommendations for the challenging fault classes. Increasing robustness and exploring ensemble methods are among the suggested future directions.
Background of Study
The role of power systems nowadays is of foremost importance; they provide the basic framework for our society to run upon. The distribution and supply of electricity sustain our industries, households, businesses, and especially our economy. The entire human ecosystem gravely
depends on uninterrupted and seamless Power Transmission Systems worldwide, otherwise, the
productivity and functionality of all sectors of the ecosystem are disturbed. After power
generation in the Generation station, high-voltage electricity is transmitted over long distances
using a network of Transmission lines, towers, and substations. High-voltage AC is used for efficient long-distance transmission from generating stations to load centers, ensuring that electricity flows throughout the world, provided that their operational
integrity remains intact. The performance of Transmission lines can be affected by several faults.
85 to 87% of power system faults occur on Transmission lines (M. Singh, 2011). For the
system to remain efficient, reliable, and fast; robust fault detection techniques must be
developed.
Faults that remain undetected can cause severe economic losses because of equipment damage,
production stoppage, and impending safety hazards. For sectors that are heavily dependent on
power supply, even a second-long disruption can cause substantial losses. Unattended faults are therefore a serious threat to both safety and the economy.
Traditionally, fault detection and classification in transmission lines were done through a variety
of methods such as relay-based protection schemes, signal processing techniques, and expert
systems (Zhang, 2016). However, these methods have limited efficiency due to their limitations
in complex and ever-changing environments. Power systems have grown perpetually more complex due to advancements in technology and changing demands, so the traditional methods struggle to adapt. This motivates the exploration of more advanced and efficient techniques, such as employing artificial intelligence for the detection and classification of faults in Transmission lines. AI has shown commendable results in solving problems in all sectors of life, whether it's computer vision or language
processing. Deep learning, a subfield of Machine learning, has especially shown remarkable
progress in sorting out complex systems resembling the power system at hand.
Deep learning can automate learning and feature extraction from raw data without the need to do
feature engineering manually. There are a lot of previous research projects available where
approaches like neural networks, graphical or otherwise, have been used to solve power system
problems. There is a catalytic interest in such an approach in the field of fault analysis as well,
given that deep learning models have continuously shown great effectiveness in tasks such as
text analysis and image recognition in the same field. Deep learning therefore seems to be the natural approach for this problem.
Professional knowledge is the main roadblock in the way of developing new fault diagnosis methods. This reliance can only be bypassed through deep learning technology that combines
feature extraction and classification. This way accurate fault detection and classification can be
performed without developing and adapting to highly complex and sophisticated techniques.
One such approach of applying deep learning technology for back-to-back MMC HVDC fault
classification has been applied through the employment of a convolutional neural network. The
neural network has been used for complex feature extraction from current and voltage readings (Qinghua Wang, 2020).
Transmission line faults can be divided into two types: Series (open-conductor) faults and Shunt (short-circuit) faults. For the sake of this project, we will focus only on Shunt faults; from now on, all faults addressed are of the Shunt type. The 11 such faults are elaborated in Figure 1.2 below.
Figure 1.2 Types of faults in overhead Transmission lines
The relative frequency of occurrence of the different fault types in a Transmission line is reported in (Prerana P. Wasnik, 2019).
A transient condition is initiated whenever a fault occurs, resulting in overcurrents that often cause
lasting damage to the system. There needs to be a system in place for accurate and efficient
monitoring of the transmission network so that preventive measures can be adopted before any
serious complications occur (Prerana P. Wasnik, 2019). Conventional methods like Impedance-
based techniques and Relay-based protection systems have proved quite fruitful in the past for
fault detection and classification (Jalal Sahebkar Farkhani, 2020). However, their efficacy is
limited, as they are not able to accommodate the vast and overly complex power systems of
today. With the advancements in technology, a need arises for better, more adaptable, and more intelligent fault detection techniques.
Deep learning, a subset of Machine learning, employs complex neural networks to
unfold deep intricacies in data. Informed decisions are then made upon the information gathered
from the data. The innate ability of deep learning models to autonomously extract complex and
hierarchical features from raw data, combined with their scalability and adaptability, has sparked
immense interest in their application to power system fault detection and classification.
Ideally, a model developed should be so robust that by only entering the 3 phase voltages and
currents on a single timestamp, accurate detection and classification of transmission line fault
can be carried out. The project aims to explore the possibility of developing a deep learning
technique for the detection and classification of Transmission line faults, built upon the foundation of realistically simulated fault data.
Notably, most of the work in the fault detection area accounts for only 10 types of faults, ignoring the LLLG fault (ABCG). This is mainly because, unlike in any other fault, the phase currents spike rather than dip. This is the research gap that this project largely addresses. To train the
model more accurately, all 11 shunt faults would be taken into account.
Problem Statement
Timely detection and accurate classification of Transmission Line (TL) faults is a critical
challenge for the current state of power transmission and distribution networks. Conventional
methods have often struggled to carry out real-time analysis, which results in grid stability being
compromised and extended downtime. It is necessary to develop a Deep learning (DL) solution
that is robust and capable of efficiently detecting and classifying power line faults, ensuring
immediate results that can minimize power disruption downtime. The proposed system intends to leverage advanced Deep learning models to further enhance fault detection and classification accuracy. The objectives of the study are:
• To analyze the dataset of the 11 shunt faults in Transmission lines and a no-fault condition, and to extract the required variables using different data analytics tools and techniques.
• To fill a research gap in Shunt fault analysis by including the ABCG fault in the classification scheme.
Data Acquisition is the prerequisite of any Machine learning endeavor. The data set must be
reliably sourced, mimicking real-life conditions for our trained model to work effectively. For
this purpose, a robust MATLAB Simulink simulation for a Power system was created. Data for
all 11 fault conditions and a No-fault condition was acquired. The acquired dataset is then pre-processed and used to train the models.
The project delves into multiple Deep learning models including Dense, Convolutional Neural
Networks (CNN), and Long Short-Term Memory (LSTM). The main objective remains to
provide the most efficient and accurate fault detection and classification model, hence
contributing towards the collective human effort of building and maintaining a better power
system.
Limitations of Study
Thesis Structure
The thesis report is divided into six chapters to present a brief flow of literature. In particular,
Chapter 1: It is the introduction part of the thesis. In this chapter, the theme of inquiry is
introduced. The background of the study explains the significance of shunt fault classification
in Transmission lines. Moreover, the justification for the study is also explained.
Chapter 2: This chapter covers the literature review. In this chapter, numerous pieces of
literature related to fault classification in power systems are reviewed. This chapter provides the foundation for the present work.
Chapter 3: This chapter explains the methodology section. In this chapter, the different
methods, tools, and techniques for feature engineering and feature selection are discussed.
This chapter also explains the intricacies and particularities of the deep learning models being
used.
Chapter 4: This chapter explains the implementation of the whole project from data acquisition to model evaluation. The implementation steps are discussed in relation to the data analytics tools and Deep learning models used.
Moreover, in this chapter, the data visualization results are also presented for the best-fitted
model.
Chapter 6: This chapter shares the conclusion and future scope of the project. In this chapter,
only the major findings are discussed, and recommendations are also suggested based on the
findings.
Chapter 2 LITERATURE REVIEW
Sami and Junio (2017) have tried to predict gold prices. By using machine learning techniques, they
have analyzed twenty-two market variables. The study revealed that machine learning models
such as artificial neural networks and linear regression are the most appropriate techniques to
predict future gold rates. Hence the results are expected to be fruitful for investors and financial
institutions.
Makala and Li (2022) have used autoregressive integrated moving average (ARIMA) and
support vector machine (SVM) techniques to predict future rates of gold. In order to conduct
the study, the daily data from world gold council has been used from 1979 to 2019. The
analysis has used data up to 2014 for the training of the models. The data beyond 2014 was used for validation. The results have shown that the support vector machine (SVM) outperformed ARIMA.
Chukwudike et al. (2020) have applied an artificial neural network technique to predict future
gold rates. The study has used monthly gold prices in US dollars from October 2004 to
February 2020. The artificial neural network (ANN) model is found to be an adequate technique for predicting gold prices. The study has further carried out graphical analysis to confirm the accuracy of the model. Predicted results have suggested a fall in gold prices in the future.
Arena, et al. (2021) have successfully tried to predict gold prices by using machine learning
algorithms. The study has used various economic indices from different countries and
businesses. Two models, an artificial neural network and a linear regression model, have been employed in the analysis.
Vidya and Hari (2020) have carried out research to predict future rates of gold. The study has
used LSTM Network and ANN to conduct the underlying analysis. It has been observed that
gold rates are nonlinear in nature. Hence graphically gold price swings have been represented
in the form of exponential curve. The study has used data of World Gold Council. Artificial
neural networks have been found to be the most appropriate model to deal with nonlinearities in the data. Results of the study have shown the best predictions for future gold rates.
Rady, et al. (2021) have taken into account ARIMA, DT, RF and GBT models to predict
future gold rates. The study has used time series data of a monthly gold rates from Nov-1989
to Dec 2019. Researchers have tried to build comparison between underlying models to find
out the best forecasting technique. The study has revealed that results of RF were more
precise than those of the DT, GBT and ARIMA models; RF has therefore been recommended for predicting gold prices.
Bingo, et al. (2020) have examined the association between gold rates and economic variables.
Underlying economic variables are termed as indicators of financial and geopolitical chaos.
The study has conducted multiple linear regression, support vector machine and auto
regression integrated moving average (ARIMA) algorithms. Results have revealed that auto
regression integrated moving average (ARIMA) model performed very well. It has been
suggested that during pandemics, investors should consider swings in historical gold rates in their investment decisions.
In another study, artificial neural network (ANN) and LSTM models have been used for gold price forecasting. In the underlying study, data of daily gold rates from the World Gold Council has been used from 3 September 2018 to 30 October 2020. LSTM and ANN outperformed the ARIMA model that had also been used for forecasting the
gold price.
Sarangi, et al. (2021) have used various statistical and machine learning techniques to predict
the expected return on gold investment. The underlying study has tried to explore the efficacy
of a machine learning based hybrid model in order to predict the future gold rates. Artificial
neural network (ANN) model has been used to predict monthly gold rates in India dated from
January 2012 to June 2021. Results have revealed that ANN is the best model to predict gold rates.
Abdullah and Chena (2020) have carried out the study to predict gold prices by using machine
learning techniques. The study has used weekly time series data from the period of 1 January
2009 to 1 June 2018. Data has been collected from the investing.com website. An autoregressive integrated moving average (ARIMA) model has been used for the analysis. To evaluate it, the researchers have used evaluation metrics, namely mean absolute error (MAE) and mean absolute percentage error (MAPE). It has been observed that the larger the data available for prediction, the higher the accuracy of the results. The underlying study revealed 99.22% accuracy in its predictions.
Chandaria and Suresh (1991) have tried to explore the appropriate machine learning
algorithms to predict gold rates in future. The study has obtained the monthly data of gold prices
in India dated from December 1999 to November 2019. The data has been collected from
website of Index mundi. The underlying study has used various machine learning techniques
namely linear regression, random forest, support vector regression and moving average
method. On comparison, it has been found that the regression models are the most appropriate for predicting gold prices.
Yurtsever (2021) has tried to explore the performance of LSTM, Bi-LSTM and GRU to
predict future gold rates by using monthly data. The study has used economic indices such as
crude oil price, consumer price index, stock market index, effective exchange rate, and interest rate as predictors.
Khan (2021) tried to use both linear and non-linear models such as auto-regressive integrated
moving average (ARIMA) and artificial neural network (ANN) to forecast gold prices. Hence
it was stated as ARIMA-ANN model. The study has collected data for Pakistan dated from 1
July 2003 to 1 June 2021. The data has been divided into two parts: in the first part the models have been calculated, while in the second part they are evaluated. The study has taken into account two
error metrics such as root mean square error (RMSE) and mean absolute error (MAE) to
estimate the models. Results have revealed that ANN outperformed ARIMA in terms of
predicting validity of models. Hence the findings of the underlying study have supported
ARIMA-ANN combination which delivered the best predictions compared to ARIMA and
ANN.
Chapter 3: METHODOLOGY
This section provides an insight into the selection of processes and techniques rudimentary to the
project to extract useful outcomes. This process describes the project steps along with necessary
algorithms and equations, which further helps to understand the project implementation.
Initiate by installing the necessary libraries, including Keras, scikit-learn (sklearn), NumPy, pickle, and Pandas. Keras (via TensorFlow) is the primary library for Deep learning model building, while scikit-learn supports pre-processing and data splitting. NumPy and Pandas provide numerical computation and data-handling capabilities.
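A minimal sketch of such an environment setup is shown below; the exact package versions and the full import list used in the project are not specified in the report.

```python
# Typical setup assumed throughout the following chapters
# pip install numpy pandas scikit-learn tensorflow

import pickle                                   # saving/loading Python objects
import numpy as np                              # numerical arrays
import pandas as pd                             # tabular data handling
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.model_selection import train_test_split
from tensorflow.keras import layers, models    # Keras API bundled with TensorFlow
```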
It is a pivotal phase in Deep learning that involves gathering data from the source, preparing it, and then organizing it into a format suitable to be taken as input by the model. The process involves
data sourcing from various mediums. It is very important that the acquired data be
diverse, responsibly sourced and of high quality, since it directly impacts the performance of the
given model.
Once acquired, it is important to pre-process the data to make it suitable for use with deep learning models:
• Data Cleaning: Remove any inconsistencies, errors, or outliers in the data that might mislead the model.
• Handling Missing Values: Deal with missing values in the dataset, either by imputing them or removing the affected records.
• Feature Scaling: Normalize or standardize the features so that variables with different scales do not bias the model's learning process. Common techniques include min-max scaling and standardization.
• Encoding Categorical Variables: Convert categorical variables into a numerical form suitable for the model. This may involve one-hot encoding, where each category is represented as a binary vector.
The process aims to pick, transform, and manipulate the acquired data into usable features that allow us to apply deep learning models. New features are derived irrespective of the original datatypes and their limitations; the overall process of feature extraction is illustrated in Figure 5.1.
Figure 5.1: Feature extraction workflow (Raw Data/Information → Remove Redundant Data → Scaling & Normalization → Filling Null Values)
Features are the input variables that are given to the deep learning algorithms. Feature selection reduces the input variables so that only relevant data moves forward into the layered network. Careful feature selection is imperative to training an optimal model and avoiding redundancies in the learning process. An over-abundance of features can lead to chaos; models will learn to capture irrelevant patterns and pick up noise. Therefore, the right feature selection helps minimize noise and produce a model that generalizes well.
Supervised Models: The supervised feature selection model uses the output labelled class for
feature selection, in which the target variables are identified to increase the efficiency of
the machine learning model. The supervised feature selection approach has three types:
Filter Method: In this type of method, features are selected or dropped based on their
correlations with the output. The filter method checks whether each feature is positively or negatively correlated with the output targets and drops the redundant features accordingly.
Wrapper Method: In the wrapper method, the data is split into subsets, which are then used to train a model. After analysing the performance of the model, the subsets undergo the addition and removal of features, and the cycle of model training continues until the best-performing feature subset is found.
Intrinsic Method: The intrinsic method is a combination of the Filter and Wrapper methods, embedding feature selection within the model training itself.
Unsupervised Models: The unsupervised feature selection model does not require the
output labelled class for feature selection. It works on unlabelled data.
After pre-processing the data, split it into a training set and a test set. The training set will be
used to train the regression model, while the test set will be used to evaluate its performance.
A common practice is to allocate around 80% of the data for training and reserve the
remaining 20% for testing. This split ensures that the model is trained on a sufficient amount of data while reserving unseen data for evaluation.
Utilize TensorFlow's Keras API, which provides high-level abstractions for building neural
networks, to construct the regression model. The model's architecture needs to be defined,
including the number of layers, the number of nodes in each layer, and the activation function
used in each node. The choice of architecture depends on the complexity of the problem and
the available data. For a simple regression model, a single layer with one node and no
activation function may suffice. However, for more complex relationships and patterns,
multiple layers with varying numbers of nodes and appropriate activation functions, such as ReLU, may be required.
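As a hedged illustration of the two cases described above (a minimal single-node model versus a deeper network), the following sketch uses the Keras Sequential API; the layer sizes are placeholders rather than values taken from the project.

```python
from tensorflow.keras import layers, models

# Simplest case: one Dense layer with a single node and no activation (linear output)
simple_model = models.Sequential([
    layers.Input(shape=(7,)),              # 7 input features, as in this project
    layers.Dense(1),
])

# More expressive case: stacked Dense layers with ReLU activations
deeper_model = models.Sequential([
    layers.Input(shape=(7,)),
    layers.Dense(64, activation="relu"),
    layers.Dense(32, activation="relu"),
    layers.Dense(1),
])
```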
Train the regression model using the training set. TensorFlow provides optimization
algorithms, such as stochastic gradient descent (SGD) or Adam, to update the model's
parameters iteratively and minimize the loss function. The loss function measures the
discrepancy between the predicted values and the actual values in the training data. During
training, the model learns to adjust its parameters to minimize this discrepancy and improve its predictions.
Evaluate the performance of the trained regression model using the test set. Calculate
relevant metrics, such as mean squared error (MSE), root mean squared error (RMSE), mean
absolute error (MAE), or R-squared (R2), to assess the model's accuracy and predictive
capability. These metrics provide insights into how well the model generalizes to unseen data
and its ability to make accurate predictions. Additionally, visualizations such as scatter plots
or residual plots can help analyze the model's performance and identify any patterns or
discrepancies.
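A minimal sketch of computing these metrics with scikit-learn is shown below; the arrays y_test and y_pred are placeholders standing in for the actual test targets and model predictions.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Placeholder arrays; in practice these come from the test split and the trained model
y_test = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8])

mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)                         # root mean squared error
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)               # coefficient of determination
print(f"MSE={mse:.4f}  RMSE={rmse:.4f}  MAE={mae:.4f}  R2={r2:.4f}")
```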
Iterate on the model by fine-tuning its architecture, hyperparameters, and preprocessing steps
to improve its performance. This may involve adjusting the number of layers and nodes, changing activation functions, or tuning the learning rate.
Chapter 4: IMPLEMENTATION AND RESULTS
This chapter discusses the implementation and results of the project based on the different tools and models used. The implementation process outlines the steps of the project, which are designed to ascertain the
desired goals. It also helps to understand the project alternatives. The results are examined
based on the implementation process for the success and failure tendency of the project.
Data Source
There is a lack of open-source data regarding shunt faults. Most of the work done in this domain
relies on synthetic data generated through simulation that mimics real conditions (Khaoula
Assadi, 2023) . The same approach is extended over here, a dataset is synthesized using real-
world-based simulations.
To enable the evaluation of the deep learning approach, a simulated dataset is generated through a MATLAB Simulink model of a power system. The simulated system, operating at a frequency of 60 Hz and using six 350 MVA generators as the source, delivers power from the generating station to a network connected to a variable load through a transmission line of 300 km length, operating at a base voltage of 735 kV and a base power of 100 MW. The first bus B1 is on the
generation side, second bus B2 is on the load side. CB1 and CB2 are the two line-circuit
breakers.
Figure 4.1 MATLAB Simulation of a real-time power distribution system, used to simulate shunt faults in the system
Voltages and Currents are measured on both buses B1 and B2. The Fault Breaker block is used
to execute the desired fault on the transmission line at the desired fault resistance and location along the line.
Data recording: Three-phase voltages and currents are recorded per millisecond after 2 seconds,
with no fault (perfect condition). The acquired data is then stored as a MATLAB workspace
under variables named after the files. They are then converted into individual CSV files for each fault, owing to the CSV format's compatibility with other Machine-learning frameworks. A wide range of fault
scenarios and fault references are represented through these to further dive into training a
suitable model.
All 12 csv files representing each shunt fault are imported as Pandas Data Frames. It is important
to note here that since the fault is generated at t = 0.167 s of the total 10 seconds, and the data is recorded every 0.000005 s, there are 2,000,001 entries per CSV file (the one additional entry accounts for t = 0 s). We skip the first 4000 rows per CSV file so that only fault data is retained.
Figure 4.2 Importing data from the MATLAB exported csv files
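A sketch of what the import step shown in Figure 4.2 might look like is given below; the file names, the absence of a header row, and the column list are assumptions based on the surrounding description.

```python
import pandas as pd

fault_types = ["AB", "ABC", "ABCG", "ABG", "AC", "ACG",
               "AG", "BC", "BCG", "BG", "CG", "No_Fault"]
columns = ["Time", "Va", "Vb", "Vc", "Ia", "Ib", "Ic"]

frames = {}
for fault in fault_types:
    # Skip the first 4000 rows so that only post-fault data is retained
    frames[fault] = pd.read_csv(f"{fault}.csv", header=None,
                                names=columns, skiprows=4000)
```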
A new column representing the fault type is introduced in each data frame so that the fault data can still be differentiated after all the data frames are concatenated into one.
Figure 4.3 Adding a new fault column to all the data frames
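The labelling step of Figure 4.3 might look like the following sketch, reusing the frames dictionary assumed above.

```python
# Add a 'Fault' column so each row remembers its fault type after concatenation
for fault, df in frames.items():
    df["Fault"] = fault
```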
All data frames contain a timestamp (Time), the three-phase voltage reading (Va, Vb, Vc), the
three-phase current reading (Ia, Ib, Ic), and lastly the shunt fault type (Fault).
The data is acquired at 60 Hz (the period of one cycle comes out to be about 0.0167 s). All 8 sets of data contain
roughly 480 repetitions of a single wave cycle for all the voltages and currents. The interval from 0.2 s to 0.3 s is plotted for visualization.
All 12 data frames need to be combined into one to achieve the goal of applying Deep learning
techniques to the data. The approach is to concatenate all 12 data frames row-wise, using the pd.concat command of the pandas library and setting axis=0 for row-wise concatenation.
Since concatenation is a linear process and it is required to spread the data evenly throughout and
prevent biases in the random batch and later into the model training, the obtained data set is then
shuffled. Shuffling is done through the DataFrame.sample command with the parameter frac=1 (frac stands for the fraction of data to be sampled, between 0 and 1, where 1 means all of the data), which samples all the data and ends up with a shuffled data frame carrying all the fault
data.
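A sketch of the concatenation and shuffling step is shown below; frames is the assumed dictionary of per-fault DataFrames from the previous step, and the random_state value is arbitrary.

```python
import pandas as pd

data = pd.concat(frames.values(), axis=0)        # row-wise concatenation of all 12 frames
data = data.sample(frac=1, random_state=42)      # frac=1 samples (and shuffles) every row
data = data.reset_index(drop=True)               # restore a clean, contiguous index
```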
Figure 4.6 The data now has 9 columns x 2352012 rows, with the last column representing our target column which is Fault
It is a crucial step in the field of Machine learning (ML) and Deep learning in particular. Raw
data is transformed into a format well-suited for model training. This process is particularly
pivotal as the data typically encountered is high-dimensional and complex in nature, such as images, audio, or text.
The electrical parameters of time and the three-phase voltages and currents, namely 'Time', 'Va',
'Vb', 'Vc', 'Ia', 'Ib', and 'Ic’ were extracted from the data frame as Features for the Deep
learning models and saved in an array named X. These features collectively provide comprehensive insights into the system's behavior and are crucial for fault detection and classification. The 'Fault' column was extracted separately as the target y.
Data normalization is a very important pre-processing tool, especially with feature extraction in
deep learning. Features are scaled and transformed into a uniform distribution aiding the neural
networks to train and perform better with more stability (Singh, 2019).
Normalization was applied to the feature set to ensure that all features were on the same scale.
This process enhances the model's convergence during training and avoids the dominance of
certain features due to their larger magnitudes. The StandardScaler from the sklearn.preprocessing module was used to normalize the features, resulting in a dataset with zero mean and unit variance.
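A minimal sketch of the feature extraction and normalization steps is given below, assuming the combined DataFrame is named data; the variable names X and X_normalized mirror those used in the text.

```python
from sklearn.preprocessing import StandardScaler

feature_cols = ["Time", "Va", "Vb", "Vc", "Ia", "Ib", "Ic"]
X = data[feature_cols].values            # feature matrix
y = data["Fault"].values                 # categorical fault labels

scaler = StandardScaler()                # zero mean, unit variance per feature
X_normalized = scaler.fit_transform(X)
```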
It is a rudimentary tool in Machine learning and Deep learning, particularly in cases where
categorical data is dealt with. Any discrete, non-numeric values are termed categorical, e.g., the fault-type labels in this dataset.
Label encoding is the process of transforming such categorical data into numerical format. Each
unique category is assigned a unique whole number, essentially creating a category to their
corresponding numeral label mapping (Sebastian Raschka, 2020) e.g., if we have a categorical
array {cat, dog, bird}, it might get assigned numeral labels such as {0, 1, 2} respectively.
The 'Fault' column, representing categorical fault labels, was encoded using label encoding.
This transformation converts categorical labels into numerical values, assigning each fault category a unique integer. The LabelEncoder class from the sklearn.preprocessing module was used for this purpose.
There remains a chance of ordinal relationships being created between label-encoded categories.
To resolve this, one hot encoding is applied. One hot encoding assigns a unique index for each
category, hence each category has a binary vector corresponding to it. It ensures that each
category has an isolated dimension, so rather than learning spurious ordinal relations, the neural network treats each class independently.
The label-encoded target column is one-hot encoded through the to_categorical utility from tensorflow.keras.utils, turning the integer labels into binary vectors.
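A sketch of the two encoding steps is given below; LabelEncoder comes from scikit-learn, while to_categorical is the Keras utility.

```python
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras.utils import to_categorical

label_encoder = LabelEncoder()
y_encoded = label_encoder.fit_transform(y)   # e.g. 'AB' -> 0, ..., 'No_Fault' -> 11
y_onehot = to_categorical(y_encoded)         # integer labels -> 12-element binary vectors
```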
Figure 4.8 Transition of target values from categorical to ordinal into binary vectors
y y_encoded y_onehot
No_Fault 11 [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]
CG 10 [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0.]
AB 0 [1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]
ABC 1 [0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]
AG 6 [0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0.]
BCG 8 [0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0.]
ABCG 2 [0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0.]
ACG 5 [0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0.]
ABG 3 [0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.]
BC 7 [0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0.]
AC 4 [0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0.]
BG 9 [0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0.]
The table above shows the key to how each fault now has a corresponding binary vector. After
feature extraction, data normalization, and one-hot encoding, the data is in a suitable form to be fed into the deep learning models.
To train a neural network it is exposed to a dataset holding output-input pairs, which results in
the network learning the underlying relationships and patterns within the dataset. However, it’s
important to evaluate the performance of the trained model on new unseen data, to see if it
generalizes well to novel data. So, data must be subjected to a train-test split.
Data is split into two mutually exclusive subgroups i.e., train set and test set. The train set is only
used to train the model while the model is oblivious to the test set. The test set is only introduced
to the model after training to evaluate the performance of the model generally on novel data.
Furthermore, the train and test sets should be representative of the overall data distribution, to prevent any skewed or biased model performance. This is ensured through stratified splitting of the data.
For the fine-tuning of hyperparameters during the training process, an additional subset, called a validation set, is sometimes needed. It helps make important decisions about the model
architecture. It is a practice set for the model before being evaluated on an unseen testing set.
To evaluate the model's performance, the dataset was split into training and testing subsets using
a widespread practice of an 80-20 split. The normalized feature matrix X_normalized and the
encoded target vector y_encoded were divided into X_train, X_test, y_train, and y_test using
the train_test_split function from sklearn.model_selection. This separation ensures that the model is evaluated only on data it has never seen during training.
The stratify parameter ensures that the split maintains the same proportion of classes in the two
subsets as present in the original data. This is highly crucial in cases like the shunt fault dataset
with a large number of classes. Fixing random_state to a number will ensure reproducible splits. In summary, the data preparation pipeline comprised feature extraction, normalization, and label encoding. These steps prepare the dataset for training and evaluation,
enabling the deep learning model to effectively learn and generalize from the data.
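A sketch of the stratified 80-20 split described above follows; whether the one-hot or the integer-encoded targets are passed to the split is not fully specified in the text, so one reasonable reading is shown, and random_state=42 is an arbitrary choice.

```python
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X_normalized, y_onehot,      # normalized features and one-hot targets
    test_size=0.2,               # 80% training / 20% testing
    stratify=y_encoded,          # keep class proportions identical in both subsets
    random_state=42,             # reproducible split
)
```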
For accurate and robust results, it is important to select a model well-suited to the dataset, which requires well-rounded exploration of various models and hyperparameter tuning (R. Fan, 2019). The data at hand, with its nuanced features, needs a model whose complexity matches our dataset.
Two architectures of deep learning have emerged as promising in this context, the Convolutional
Neural Network (CNN) and the Long Short-Term Memory (LSTM) network. They were chosen
specifically for their unique capacity of accommodating the challenges imposed by our data.
By concentrating on the two selected architectures, the aim is to deeply evaluate their respective
performance and discern the superior model that is better at modeling underlying scenarios of the
data. The two models are selected on the criterion that they not only fit the data effectively but are also practical to train and deploy.
The Convolutional Neural Network (CNN) is a type of Artificial Neural Network designed to process and analyze grid-like data, as in image-based data. It is a remarkably effective architecture for fault detection and classification
(Anshuman Bhuyan).
The data was reshaped into a 2D, image-like structure, as Convolutional Neural Networks expect grid-like input. The network is designed to take input data of shape (number of features, 1); the data has 7 features. X_train_reshaped and X_test_reshaped are reshaped from 2D arrays into 3D arrays of shape (number of samples, number of features, 1). This is a very common practice when applying 1D convolutions to tabular or time-series data. The architecture is as follows:
1D Convolutional Layer: This foundational layer has 32 filters. The input data is to be scanned
by a kernel of size 3, capturing local patterns of data efficiently. Nonlinearity is infused into the
network through the activation function 'relu' (rectified linear unit), which is widely effective in practice.
Max Pooling Layer: The convolution layer is followed by a max pooling layer. It decreases the data dimensions, downsampling while keeping the essential information. The pool
size of 2 effectively reduces data without compromising the model’s ability to capture salient
features.
Batch Normalization Layer: As the depth of the network increases plenty of problems such as
gradient vanishing arise, which can lead to slow and stalled learning and difficulty in updating
weights. Batch Normalization is a strategic decision to normalize the output of each layer and combat these issues.
Flatten Layer: The output of the earlier layers must be flattened (reshaped) so that it can be taken as input by the fully connected layers.
Dense Layers: A 128-unit Dense layer is employed to learn sophisticated spatial patterns. The non-linearity factor is once again introduced through the 'relu' activation function for unfolding complex relationships.
Dropout Layer: Dropout layers are generally used to prevent overfitting by randomly deactivating a proportion of neurons during training of the model. The network's ability to generalize is enhanced by this layer.
Output Layers: The ‘softmax’ activation function is used in the final Dense layer. The activation
function converts the output vectors into target class probabilities. It creates a distribution over all 12 fault classes (the 11 shunt faults plus the no-fault condition).
The Convolutional Neural Network (CNN) model is then compiled with the 'categorical cross-entropy' loss, which measures the difference between the predicted class probabilities and the true
class probabilities. The optimizer of choice is ‘Adam’ (Adaptive Moment Estimation), It adapts
the learning rates of individual model parameters based on their historical gradients and squared
gradients, helping to balance the speed of convergence and the stability of the optimization
process.
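The description above translates into roughly the following Keras sketch; the filter count, kernel size, pooling size, 128-unit Dense layer, and softmax output follow the text, while the padding choice and the dropout rate are assumptions.

```python
from tensorflow.keras import layers, models

num_features = 7      # Time, Va, Vb, Vc, Ia, Ib, Ic
num_classes = 12      # 11 shunt faults plus the no-fault condition

cnn = models.Sequential([
    layers.Input(shape=(num_features, 1)),                 # reshaped (features, 1) input
    layers.Conv1D(32, kernel_size=3, padding="same", activation="relu"),
    layers.MaxPooling1D(pool_size=2),                      # downsample while keeping salient features
    layers.BatchNormalization(),                           # stabilize training of deeper layers
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),                                   # dropout rate assumed, not stated in the text
    layers.Dense(num_classes, activation="softmax"),       # class-probability output
])

cnn.compile(optimizer="adam",
            loss="categorical_crossentropy",
            metrics=["accuracy"])
```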
Early Stopping: The validation loss metric is monitored and the learning is halted if there is no improvement in the metric; this prevents overfitting and conserves computational power. The patience setting determines how many epochs without improvement are tolerated before stopping.
ReduceLROnPlateau: The validation loss metric is monitored and whenever the metric plateaus
the learning rate is dynamically reduced. This adds to the fine-tuning of the model to fit the data better.
The learning rate is reduced down to a minimum of 1e-6, and both callbacks are passed to the training call.
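Based on the surviving code fragments and the description above, the callback and training configuration might look like the following sketch; min_lr=1e-6, the batch size of 64, and the 50 epochs come from the text, while the patience values, reduction factor, and validation split are assumptions.

```python
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau

early_stopping = EarlyStopping(monitor="val_loss", patience=5,
                               restore_best_weights=True)   # stop when validation loss stalls
reduce_lr = ReduceLROnPlateau(monitor="val_loss", factor=0.5,
                              patience=3, min_lr=1e-6)       # shrink the learning rate on plateaus

history = cnn.fit(X_train_reshaped, y_train,
                  validation_split=0.2,                      # carve a validation set from the training data
                  batch_size=64, epochs=50,
                  callbacks=[early_stopping, reduce_lr])
```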
The model is trained in batches of size 64 over 50 epochs. The carefully selected techniques, parameters, and layers combined contribute to the efficient and robust nature of the Convolutional Neural Network (CNN) model for grid-based data analysis, particularly for fault detection and classification.
The Long Short-Term Memory (LSTM) network is designed to mitigate vanishing-gradient problems, hence it is remarkably effective for training on Time series data. Special memory cells store
information for extended periods to prevent loss of context. 3 gates: input, forget, and output;
govern the memory cells. The input gate decides which new information to incorporate, while the forget gate decides which information should be discarded. Finally, the output gate decides which information from the cell state is exposed as the output. The TL fault problem involves Time series data, so Long Short-Term
Memory (LSTM) seems to be an excellent choice (Fezan Rafique, 2021). Inherent temporal
tendencies of the data are the center of the architecture created. The network is designed as
follows:
Long Short-Term Memory (LSTM) Layers: An LSTM layer with 64 units uses the 'tanh' activation function. Tanh introduces nonlinearity into the gated memory-cell mechanism of the LSTM, enabling the network to capture complex relations in sequence data. Returning sequences prepares the output for the next layer for seamless sequence-based analysis.
Dropout Layer: To prevent overfitting and add to the network's robustness, dropout at a rate of 0.2 is applied. A second LSTM layer, again using the 'tanh' activation function, follows. This set of LSTM layers tracks temporal patterns in the provided data.
Dropout Layer: It acts as another regularization mechanism, preventing overfitting and improving generalization.
Dense Layer: The final layer uses the 'softmax' activation function; prediction vectors are converted into class probabilities over the 12 fault classes.
The Long Short-Term Memory (LSTM) model is then compiled with the 'categorical cross-entropy' loss, which measures the difference between the predicted class probabilities and the true class
probabilities. The optimizer of choice is ‘Adam’ (Adaptive Moment Estimation), It adapts the
learning rates of individual model parameters based on their historical gradients and squared
gradients, helping to balance the speed of convergence and the stability of the optimization
process.
Two callback functions are also used, 'Early Stopping' and 'ReduceLROnPlateau', to enhance the training process.
Early Stopping: The validation loss metric is monitored and the learning is halted if there is no improvement in the metric; this prevents overfitting and conserves computational power. The patience setting determines how many epochs without improvement are tolerated before stopping.
ReduceLROnPlateau: The validation loss metric is monitored and whenever the metric plateaus
the learning rate is dynamically reduced. This adds to the fine-tuning of the model to fit the data better.
The same callback configuration as for the CNN is defined for the LSTM training.
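Assembling the layers described above gives roughly the following sketch; the 64-unit first LSTM layer, tanh activation, 0.2 dropout, and softmax output follow the text, while the 32-unit size of the second LSTM layer is an assumption.

```python
from tensorflow.keras import layers, models

num_features = 7
num_classes = 12

lstm_model = models.Sequential([
    layers.Input(shape=(num_features, 1)),                      # same reshaped input as the CNN
    layers.LSTM(64, activation="tanh", return_sequences=True),  # pass full sequences to the next layer
    layers.Dropout(0.2),
    layers.LSTM(32, activation="tanh"),                          # second LSTM layer (units assumed)
    layers.Dropout(0.2),
    layers.Dense(num_classes, activation="softmax"),             # class probabilities
])

lstm_model.compile(optimizer="adam",
                   loss="categorical_crossentropy",
                   metrics=["accuracy"])
```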
In the area of Deep learning, to achieve accurate and reliable results across a diverse dataset, it is
important to select and fine-tune an appropriate model. In this project, we selected two very
powerful neural networks and rigorously examined them, namely the Convolution Neural
Network (CNN) and the Long Short-Term Memory (LSTM) network. The primary objective is a comparative analysis: by diving into it, we shed light on the distinct strengths and limitations of the respective models. Rigorous testing on the held-out test set decides which architecture is better suited to the task. Furthermore, valuable insights will be provided for further research in the field. Through this endeavor, it is aspired to find the most suitable architecture for the shunt fault detection and classification problem.
Two models were developed: a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network. Both were evaluated on the test dataset, and performance metrics such as loss and accuracy were recorded.
The Convolutional Neural Network (CNN) was trained on 14,701 samples from the dataset. It
took 61 seconds per epoch approximately for the training process to complete. The average error
between the predicted and actual values was calculated by the final loss on the set which came
out to be 0.1441. A lower loss signifies better performance. An accuracy of 90.96% was achieved on the test set; this percentage represents the proportion of correctly classified samples.
The Long Short-Term Memory (LSTM) model was also trained on 14,701 samples from the
dataset. It took 283 seconds per epoch approximately for the training process to complete. The
average error between the predicted and actual values was calculated by the final loss on the set
which came out to be 0.1156. A lower loss signifies better performance. An accuracy of 91.67% was achieved on the test set; this percentage represents the proportion of correctly classified samples.
However, the training process for the Long Short-Term Memory (LSTM) model was more
computationally intensive, taking approximately 283 seconds per epoch. This suggests that Long
Short-Term Memory (LSTM) models might require more time to process sequential data due to their inherently sequential, recurrent computation.
In summary, based on the provided information, the Long Short-Term Memory (LSTM) model
outperformed the Convolutional Neural Network (CNN) model in terms of both loss and
accuracy on the test dataset. This could imply that the dataset contains temporal dependencies
that the Long Short-Term Memory (LSTM) network can capture effectively.
Long Short-Term Memory (LSTM) might seem to be the superior model here, but computational intensiveness is its biggest limitation; 61 seconds versus 283 seconds per epoch is a major difference in training cost.
However, it's important to note that these results do not provide insights into the broader context
of the task or the specific dataset used, and further analysis would be required to draw more
robust conclusions.
To examine the training and validation processes in detail their progress should be visually
represented. This could be achieved through loss and accuracy being plotted over epochs.
Training loss starts higher up at 0.3568 and declines to 0.25 within the first 10 epochs; this steep decline represents strong learning over the training data. From 10 to 20 epochs there is only a drop of about 0.05 as the learning stabilizes, and this gentler decline continues for the next 30 epochs as the training loss settles at 0.1617. Notice the sudden drops that appear whenever the
Learning Rate is reduced due to call-backs; these occur specifically at 23, 39, and 47 epochs.
Validation loss starts lower than the training loss, at 0.2387, a level the training loss reaches only after about 12 epochs. There is a steady decline through the epochs, with sudden drops at 23, 39, and 47 due to the ReduceLROnPlateau callback. At 50 epochs the validation loss comes out to 0.1440.
Training loss starts higher up at 0.1409 and declines to 0.1164 within the first 5 epochs; this steep decline represents strong learning over the training data. From 5 to 10 epochs there is only a drop of 0.0004 as the learning stabilizes, and this very minuscule decline continues for the remaining epochs.
Validation loss starts lower than the training loss, at 0.1162, a level the training loss reaches only after about 5 epochs. Interestingly, the validation loss first increases and then decreases; both loss curves converge into one at 8 epochs, indicating that the model is well fitted without overfitting.
Training accuracy starts at 0.8459 and keeps increasing steadily over the 50 epochs to 0.9046; although this is only an increase of about 0.06, it is still considered a good accuracy curve. Validation accuracy starts at 0.8810 and steadily increases to 0.9095. Although both curves show the same curvature, they do not converge at any point. The persistent gap indicates that the validation and training accuracy never quite match, hence the model is slightly skewed.
Figure 5.4 CNN Training and Validation Accuracy
The Long Short-Term Memory (LSTM) training and validation accuracy curves do not show much progress; they continuously oscillate and end up close to where they begin, at about 0.916. The curves meet at several points during training.
Upon close observation of all the curves, Long Short-Term Memory (LSTM) curves seem to be
more divergent, and hence Long Short-Term Memory (LSTM) is a slightly better model. We
cannot conclude by looking at the curves alone; further exploration is still required to draw valid conclusions.
The classification report provides parameters that measure how well the data is being classified into different classes. It is especially useful for multiclass problems such as this one.
Precision: Measures how many of the positive predictions made by the model are correct, by taking the ratio of true positives to all predicted positives.
Recall: Measures the ability of the model to catch all the positive instances by taking the ratio of true positives to all actual positive instances.
F1-score: It is a measure of both precision and recall by calculating their harmonic mean.
Figure 5.8 F1-score equation
Accuracy: Measures how accurately the model classifies overall, by taking the ratio of the sum of true positives and true negatives to the total number of samples.
Weighted Average: Average of F1 score weighted by the number of samples of each class.
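These per-class figures can be produced with scikit-learn's classification_report; the sketch below assumes the one-hot test targets, the reshaped test features, and the fitted label_encoder from the earlier steps.

```python
import numpy as np
from sklearn.metrics import classification_report

y_true = np.argmax(y_test, axis=1)                             # one-hot vectors -> integer labels
y_pred = np.argmax(lstm_model.predict(X_test_reshaped), axis=1)

print(classification_report(y_true, y_pred,
                            target_names=label_encoder.classes_))
```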
Long Short-Term Memory (LSTM) has higher precision and recall in general, as most of the per-class values are 1; this shows that the LSTM classifies the faults more correctly. A higher F1 score in the
case of Long Short-Term Memory (LSTM) shows a good precision and recall balance being
maintained.
A confusion matrix is a table of actual class vs. predicted class that describes the deep learning model's behaviour on test data with known labels. It is particularly pivotal for multiclass classification, where model performance cannot be gauged by accuracy alone. It gives a visual insight into which classes are being confused with one another. The ABCG (LLLG) fault has a complex structure to classify, which is clear from its low precision and recall scores. An LG fault (CG) also shows low scores. To investigate this behaviour further, a confusion matrix of predicted class vs. true class is created. All other classifications show only minuscule misclassification rates.
The conclusion is that while most faults have a high probability of being correctly classified, the ABCG and CG classes remain comparatively difficult.
The Receiver Operating Characteristic (ROC) curve plots the true positive rate vs. the false positive rate for various thresholds. The Area Under the Curve (AUC) is the most common summary metric derived from it to measure the classification performance of a model.
It shows that the ABCG and CG fault types have a lower Area Under the Curve (AUC) of 0.95, which confirms their comparatively weaker separability.
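A sketch of how the confusion matrix and per-class AUC values could be computed is given below; it treats each class one-vs-rest using the one-hot test targets and the model's predicted probabilities, and reuses names assumed in the earlier sketches.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

y_prob = lstm_model.predict(X_test_reshaped)      # predicted class probabilities
y_true = np.argmax(y_test, axis=1)
y_pred = np.argmax(y_prob, axis=1)

cm = confusion_matrix(y_true, y_pred)             # rows: true class, columns: predicted class
print(cm)

# One-vs-rest AUC for each fault class
for i, cls in enumerate(label_encoder.classes_):
    auc = roc_auc_score(y_test[:, i], y_prob[:, i])
    print(f"{cls}: AUC = {auc:.3f}")
```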
5.6 Discussion
The project examined two Deep learning models, the Convolutional Neural Network (CNN) and
Long Short-Term Memory (LSTM) for shunt fault detection and classification in Transmission
lines. Through careful study of the experimental results, valuable insights into the strengths and weaknesses of each model were gained.
Long Short-Term Memory (LSTM) outperforms Convolutional Neural Network (CNN) upon
evaluation of the test dataset in terms of accuracy and loss. Hence, it has a more effective
tendency to capture the temporal patterns of the dataset. The limiting factor remains that it takes a long 283 seconds per epoch and is hence computationally intensive and practically less convenient.
Furthermore, the meticulous analysis of learning curves provided an in-depth understanding of the
learning journey of both models. In the case of Convolutional Neural Network (CNN), there is a
steady decline in both the losses representing the robustness of the process. On the contrary, the
Long Short-Term Memory (LSTM) model's training loss curve displayed a rapid initial decline but then plateaued, with the curves oscillating around a low loss thereafter.
The Classification Report, Confusion Matrix, and Receiver Operating Characteristic (ROC)
curve analysis showed an overall very high rate of true classification, with two exceptions. The
fault types ABCG and CG have comparatively lower Area Under the Curve (AUC) scores and an
overall higher rate of misclassification. This indicates suboptimal classification for these two
fault types. To conclude, the superior performance of Long Short-Term Memory (LSTM) in fault detection and classification makes it the preferred model, albeit at a higher computational cost.
6.1. Conclusion
The aim of this project was to craft an effective and accurate Transmission Line shunt Fault Detection and Classification system utilizing a deep learning approach. The 11 shunt faults are the main focus, for which an encompassing methodology is developed, from data generation through model evaluation.
A Fault dataset was generated through a real-time MATLAB Simulink simulation. The dataset was then pre-processed through feature extraction, normalization, and label encoding (Venkatesh, 2018). This prepared the data for model training and evaluation.
The experimental results showed that both the Convolutional Neural Network (CNN)
and Long Short-Term Memory (LSTM) models were capable of effectively detecting
and classifying transmission line faults. However, the Long Short-Term Memory
(LSTM) model shows slightly better performance compared to the Convolutional Neural
Network (CNN) model, achieving lower test loss and higher test accuracy. This
superiority of the Long Short-Term Memory (LSTM) model is attributed to its ability to capture the temporal dependencies in the data.
Out of the 11 shunt faults, two, ABCG and CG, proved challenging to classify. These showcased low recall and precision scores. Additionally, the lower AUC for the said faults confirmed this limitation.
6.2. Recommendations
Further research is highly recommended in terms of the two more challenging faults. Several
small changes to the process crafted above can have highly improved results.
Exploring Real Data vs. Simulated Data: Integrating real data with the simulated data
acquired in the project can enhance data quality further (Tayo, 2019). Obtaining real data that is
more versatile is a feat since the area of our study involves data not readily available to the
general public.
Conduct Sensitivity Analysis: It enables us to make better decisions for our data modeling and
analysis. Key predictors are identified and their impact on the results can be further explored (Markus J. Ankenbrand, 2021).
Using Ensemble Methods: Training several models and taking the mean of their predictions will result in a superior, robust model that employs the strengths of all the individual models.
Employing Hybrid Models: Deep learning models, when integrated with probabilistic or classical statistical approaches, can further improve classification performance (Cach N. Dang, 2021).
The research so far adds to the advancement of Power System fault detection and classification.
It truly accentuates the Long Short-Term Memory (LSTM) model’s potential for efficaciously
dealing with Time series datasets like TL fault analysis. While there are still some imposing
challenges, the overall high performance indicates that the progress made is decent and successful.
Ali Raza, A. B. (2020). A Review of Fault Diagnosing Methods in Power Transmission Systems. MPDI.
Anshuman Bhuyan, B. K. (n.d.). Convolutional Neural Network Based Fault Detection for Transmission
Line. International Conference on Intelligent Controller and Computing for Smart Power (pp. 1-4), 2022.
Cach N. Dang, M. N.-G. (2021). Hybrid Deep Learning Models for Sentiment Analysis. Hindawi.
Feifei Xu, Y. L. (2023). An improved ELM-WOA-based fault diagnosis for electric power. Frontiers in Energy Research.
Fezan Rafique, L. F. (2021). End to end machine learning for fault detection and classification in power
Jalal Sahebkar Farkhani, M. Z. (2020). The Power System and Microgrid Protection—A Review.
Sustainable Technologies in Intensive Energy Industrial Consumers: The New Path to Carbon Neutrality.
Khaoula Assadi, J. B. (2023). Shunt faults detection and classification in electrical power transmission
line systems based on artificial neural networks. The international journal for computation and
Lee, J. B. (2016, September 21). How To Improve Deep Learning Performance. Deep Learning
Performance.
M. Singh, B. K. (2011). Transmission line fault detection and classification. International Conference on
Majid Jamil, S. K. (2011). Fault detection and classification in electrical power transmission system.
Springer Plus.
Markus J. Ankenbrand, L. S. (2021, February 15). Sensitivity analysis for interpretation of machine
learning based segmentation models in cardiac MRI. BMC Medical Imaging, 21.
Qinghua Wang, Y. Y. (2020). Fault Detection and Classification in MMC-HVDC Systems Using
R. Fan, T. Y. (2019). Transmission Line Fault Location Using Deep Learning Techniques. North
Sebastian Raschka, J. P. (2020). Machine Learning in Python: Main Developments and Technology Trends
Singh, D. S. (2019). Data normalization is a very important pre-processing tool, especially with feature
extraction in deep learning. Features are scaled and transformed into a uniform distribution, aiding the neural networks to train and perform better with more stability. Science Direct.
Tayo, B. O. (2019, December 13). Combining Actual Data with Simulated Data in Machine Learning.
Commonwealth University.
Zhang, C. L. (2016). Fault classification on transmission line of 10kV rural power grid. Proceedings of
the 2015 4th International Conference on Sensors, Measurement and Intelligent Materials.