Deep Learning for Electrical Insulator Inspection
Master Thesis
Acknowledgements
First of all, I thank my parents for the effort they made and for the opportunity they gave me.
To Laura, for her support and understanding.
I thank my supervisor, Prof. Dr. André Dias, for the proposed challenge and for the possibility of studying this topic.
To the whole LSA group for the help throughout the Master's programme, and especially to the Aerial Team members André Ferreira, Miguel Moreira and Tiago Miranda.
Thank you to my friends and colleagues from the Master's programme, and also to my friends and colleagues from the Electromechanical Engineering degree at UBI, for the support and motivation throughout the whole (long) academic journey.
The work developed was only possible thanks to everyone's support. My sincere thanks.
Abstract
Keywords: Deep learning, YOLO, power line inspection, UAV, gimbal control,
real-time, object detection
Contents

1 Introduction
  1.1 Contextualisation
  1.2 Motivation
    1.2.1 Electrical asset inspection project - EDP Labelec
    1.2.2 Contests
  1.3 Goals
  1.4 Thesis structure

2 State of the Art
  2.1 Non-Neural Network Methods
  2.2 Neural Network Methods
  2.3 Convolutional Neural Network (CNN) models
  2.4 On cable robots for power line inspection
  2.5 Aerial robots for power line inspection

3 Deep Learning concepts

4 Implementation
  4.1 What is an insulator?
  4.2 Dataset
    4.2.1 Bounding boxes
  4.3 Hardware
    4.3.1 Processing platform

5 Results
  5.1 Error analysis
  5.2 YOLO kernels
  5.3 Insulator detection speed
    5.3.1 Running from video file
    5.3.2 Running on ROS
    5.3.3 False positives
  5.4 Gimbal control
    5.4.1 Insulator tracking
    5.4.2 Insulator not found
    5.4.3 Tracking issues
List of Figures
4.12 The batch size influence during the training. The green line marks 3 and the red line marks 10.
4.13 System architecture overview - Processing pipeline
4.14 ROS architecture
4.15 Gimbal Control - ROS package
4.16 Gimbal Control - State machine
1 Introduction
1.1 Contextualisation
Due to population growth, it is necessary to build more and better infrastructure to meet the population's demand. Every infrastructure needs to be in good condition to remain operational, and electrical power systems are no exception. With this in mind, preventing faults on power line systems is a crucial task. Nowadays, power line inspection is done with helicopters [1] or by workers climbing the towers, which is both expensive and dangerous. Every year, all around the world, there are helicopter crashes or accidents during these inspections.
Robotic systems have grown in importance and performance in many fields, from military applications to sea exploration and medicine. This work aims to provide an alternative to the traditional methods, avoiding risk to human life with a cheaper and faster solution that uses an Unmanned Aerial Vehicle (UAV) for visual inspection and data recording during power line inspection.
1.2 Motivation
The Autonomous Systems Laboratory of Instituto Superior de Engenharia do Porto (ISEP) has a long record of work in robotics, specifically in marine missions and aerial inspection. All this know-how creates a good environment in which to develop new methods and apply cutting-edge technologies to the most diverse challenges.
1.2.2 Contests
The Autonomous Systems Laboratory also participates in contests using its own technology. Its most successful participation was at Eurathlon 2015, where it won the Grand Challenge (Land + Air + Sea) as well as the Sub-Challenge (Land + Air): survey the building and search for a missing worker. At Eurathlon 2017 it won the Grand Challenge (Scenario 1: land, sea and air) and the pipe inspection and search for missing workers challenge (Scenario 3: sea and air), and took third place in the survey the building and search for missing workers challenge (Scenario 2: land and sea). These results attest to the quality of the knowledge that exists in the laboratory.
1.3 Goals
The purpose is to develop a system capable of automatically detecting, in real time, insulators on electric poles. The images are captured by a UAV with a camera mounted on a gimbal. The system has to be able to adjust the gimbal orientation so as to centre the image on the insulator, providing a better view during the visual inspection.
Insulators can be made of several materials, such as porcelain or glass, each with its own way of reflecting light, which makes the detection process difficult. In some cases, their inspection involves capturing images that combine the visible, infrared and ultraviolet ranges [3]. To cope with all these differences and perform a better detection, an Artificial Neural Network (ANN) was trained to distinguish the object from the background. The ANN chosen was the You Only Look Once (YOLO)¹ ² architecture, developed on the Darknet framework created by Joseph Redmon.
The goals for this work are:
• Develop a method to centre the insulator in the image by controlling the gimbal.
1 Official documentation: [Link]
2 TED Talk by Joseph Redmon: [Link] computer_learns_to_recognize_objects_instantly
2 State of the Art
This chapter gives a brief review of work on visual detection and power line inspection. For visual detection, both feature-based methods and the use of ANNs are covered. Some CNN models are mentioned, not because they were used in this work, but because of their importance in the CNN field. Robots have already been developed for power line inspection, and some of them are also described in this chapter.
AlexNet [16]
Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton created a “large, deep convolutional neural network” that won the 2012 ImageNet Large-Scale Visual Recognition Challenge (ILSVRC). The network, called AlexNet, achieved a top-5 test error rate of 15.4%, while the second place entry had a 26.2% error rate. This result marked the beginning of the widespread use of CNNs for image detection and classification.
The network contains 8 layers (5 convolutional layers and 3 fully-connected layers) and applies the ReLU activation to all outputs. The first convolutional layer has 96 filters (kernels) of size 11 × 11 × 3.
The network was trained on the ImageNet dataset and used augmentation techniques (applying transformations to the images), similar to what was done by [15].
Region Based CNN: R-CNN [17], Fast R-CNN [18], Faster R-CNN [19]

VGG [20]
The contribution of VGG was its simplicity and depth. The network is composed of 19 layers using 3 × 3 filters and 2 × 2 max-pooling layers. The input images have a fixed size of 224 × 224 RGB pixels.
GoogLeNet [21]
Google presented a very deep CNN with the Inception module. It contains about 100 layers (building blocks) with filters of several sizes (1 × 1, 3 × 3 and 5 × 5). Google won the 2014 ILSVRC with this work.
YOLO [22][23]
Figure 2.1: YOLO image division and bounding boxes (Source: [22]).
The paper [23] won a CVPR 2017 Best Paper Honorable Mention Award.
2.4 On cable robots for power line inspection

Figure 2.2: Two examples of robots for line inspection: (a) CAD design of the Robonwire cable inspection robot (Source: [25]); (b) cable inspection robot (Source: [26]). The robot hangs on the cable and moves along it, recording data or sending it to a ground station.
Figure 2.3: Insulator inspection robot. This robot has cameras installed for visual inspection and also performs electrical tests (Source: [29]).
2.5 Aerial robots for power line inspection
Figure 2.4: The proposed cooperative UAV systems for power line inspection (Source: [35]).
3 Deep Learning concepts
This chapter presents some basic concepts of ANNs. There are many different types of ANN (discussed in section 3.1) and it is not possible to cover them all. This work focuses on image processing, so this chapter covers only CNN models, which, by their nature, are widely used for image processing.
Figure 3.1: Illustration of a simple ANN. The input layer (red) receives the input data and passes it to the hidden layers (blue), which in turn feed the output layer (green) (Source: [37]).
If there is only one hidden layer, the ANN is usually referred to as (shallow) Machine Learning, and if there is more than one it is called Deep Learning.
The hidden layers never interact directly with the input or the output data; as the names imply, that is the job of the input and output layers. These layers can be trained for a purpose: for example, if the network was trained for boat detection in images, it is able to classify/detect whether an image contains a boat or not.
Applications
Deep Learning is being used in many fields. Wherever there is data to organise or process, there is probably a neural network behind it.
Figure 3.2: Deep Learning performance as a function of the amount of data. The performance of traditional Computer Vision methods eventually plateaus, whereas Deep Learning keeps improving with more data (Source: Andrew Ng in “What data scientists should know about deep learning”, ExtractConf 2015).
Models of ANN
There are many models of ANN and it is important to choose the right one. Even with a good dataset (in quality and number of samples), if the model is not appropriate for the problem, the ANN will never produce good results. Figure 3.3 shows some ANN models.
The CNN model (the models with pink circles in Figure 3.3) works well with adjacent data, which is one reason why it is commonly used in image processing (it works with adjacent pixels).
3.2.1 Perceptron
A perceptron is an artificial neuron. The concept was developed in the 1950s and 1960s; Figure 3.4 shows an example. The perceptron takes the inputs x1, x2, x3, multiplies them by the weights w1, w2, w3, and sums the results. It then adds what is called the bias and returns the neuron's output, which is 1 if the sum is greater than or equal to 0, and 0 otherwise.
Output = \begin{cases} 1 & \text{if } \sum_i w_i x_i + b \geq 0 \\ 0 & \text{if } \sum_i w_i x_i + b < 0 \end{cases}
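As a concrete illustration of the rule above, a minimal Python sketch (the inputs, weights and bias are made-up example values, not data from this work):

def perceptron(inputs, weights, bias):
    # Weighted sum of the inputs plus the bias, followed by the step rule.
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum + bias >= 0 else 0

x = [0.5, 1.0, -0.3]   # x1, x2, x3
w = [0.8, -0.2, 0.4]   # w1, w2, w3
b = -0.1               # bias

print(perceptron(x, w, b))  # 0.4 - 0.2 - 0.12 = 0.08; 0.08 - 0.1 = -0.02 < 0, so the output is 0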
By adjusting the values of the weights (or the bias) we obtain different outputs. During training, Deep Learning adjusts these values to obtain a good model fit.
The problem with perceptrons is that small changes in the weights can cause large, abrupt changes in the outputs passed to the next neurons. To avoid this, other types of activation are used, which allow small changes to be applied without producing large changes in the next step [38]; this behaviour is related to the so-called vanishing gradient problem. For that purpose, Sigmoid or Rectified Linear Unit (ReLU) functions are used, so that values between 0 and 1 can be considered instead of a binary output (Figure 3.5).

Figure 3.3: Some models of ANN. In this work the CNN model (pink colour) is used (Source: [Link]/neural-network-zoo).

Figure 3.4: Representation of a perceptron. The xi are the inputs, wi the weights and b is the bias.
Figure 3.5: Examples of perceptron activation functions. On the left is the Sigmoid, in the centre tanh, and on the right ReLU (Source: [37]).
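To make Figure 3.5 concrete, a short Python sketch of the three activations (illustrative only, not code from this work):

import math

def sigmoid(z):
    # Smooth output between 0 and 1.
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    # Smooth output between -1 and 1.
    return math.tanh(z)

def relu(z):
    # 0 for negative inputs, identity for positive inputs.
    return max(0.0, z)

for z in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(z, round(sigmoid(z), 3), round(tanh(z), 3), relu(z))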
3.2.2 Convolution
In a CNN, the neuron output is the result of the convolution between the image and the filter being applied. The more similar the image region is to the filter, the larger the output value of the convolution.
Figure 3.6 shows an example of a convolution between an image (left) and a filter (centre). The filter searches for diagonal lines running from top left to bottom right. As shown, the more the filter matches the image, the more activated the neuron is.
Figure 3.6: Example of a convolution between an image and a filter. This filter looks for diagonal lines in the image. Where the filter matches the image, the output is high.
After the filter is applied to the image and the output is “normalised” with a ReLU or Sigmoid function, what is called an Activation Map is obtained. The Activation Map shows how strongly the neuron was activated by each particular region of the image.
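A minimal numpy sketch of this idea follows; the 5 × 5 toy image and the 3 × 3 diagonal filter are made-up values, not the data shown in Figure 3.6:

import numpy as np

# Toy 5x5 image containing a top-left-to-bottom-right diagonal line.
image = np.array([
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1],
], dtype=float)

# 3x3 filter (kernel) that "looks for" the same diagonal pattern.
kernel = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
], dtype=float)

def conv2d_valid(img, k):
    # Slide the kernel over the image and sum the element-wise products.
    kh, kw = k.shape
    out_h = img.shape[0] - kh + 1
    out_w = img.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

activation_map = np.maximum(conv2d_valid(image, kernel), 0)  # ReLU
print(activation_map)  # highest values where the diagonal matches the filter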
3.2.4 Back-propagation
In some way, the CNN needs feedback on its predictions in order to update itself and fit the model. This feedback process is called Back-propagation. Back-propagation can be separated into four distinct steps: the Forward Pass (FWP), the Loss Function (LF), the Backward Pass (BWP), and the Weight Update (WU). The sequence is listed below:
1. Forward Pass: Pass the data through the network, always towards the next layer. In this step the CNN makes its predictions.
2. Loss Function: The Loss Function tells how far the predictions are from the labels. In most cases the Loss Function is calculated with the Mean Squared Error (MSE).
3. Backward Pass: Propagate the error backwards through the network, computing how much each weight contributed to the loss (the gradients).
4. Weight Update: Update the weight values and repeat the process. The new weight value is calculated as

W_{new} = W_{initial} - \eta \frac{\partial LF}{\partial W}

where \eta is the learning rate.
The learning rate allows the CNN to learn faster or slower during the training phase. If the learning rate is too big, training is faster but it may never reach the minimum error. If the learning rate is too small, training will be very slow. To reach a balance, what is usually done is to start with a big learning rate, since at the beginning the error is large and “big steps” are possible, and then reduce the learning rate during training to reach the minimum error. For each set of training images, a batch, the program repeats this process a fixed number of times, the iterations. When all the data has passed through the CNN once, it is called an epoch. As an example, if the dataset is composed of 1000 images and the batch size is 500, 2 iterations count as 1 epoch.
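A small sketch of the update rule and of the batch/iteration/epoch terms described above; the numbers are illustrative examples, not the settings used in this work:

# Toy loss: LF = W^2, so dLF/dW = 2W.
dataset_size = 1000                          # images in the dataset
batch_size = 500                             # images seen per iteration
iters_per_epoch = dataset_size // batch_size # 2 iterations = 1 epoch

weight = 0.5                                 # a single weight, for illustration
learning_rate = 0.1                          # start "big", reduce during training

for iteration in range(1, 7):
    gradient = 2.0 * weight                  # dLF/dW of the toy loss
    weight -= learning_rate * gradient       # weight update rule
    if iteration % iters_per_epoch == 0:
        learning_rate *= 0.5                 # smaller steps as the error shrinks
    print(iteration, round(weight, 4), learning_rate)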
• Convolutional: This layer computes the convolution between the image and the filter (the neuron output) and passes the output to the next layers. After the convolution an activation (e.g. ReLU, sigmoid, tanh) is applied. The purpose of the activation is to introduce non-linearity. ReLU is faster than the other methods and also helps to alleviate the vanishing gradient problem, the issue where the lower layers of the network train very slowly because the gradient decreases exponentially through the layers.
• Pooling: This layer is used to reduce the amount of data. It reduces the computational cost, which makes the CNN faster, and helps against overfitting. The most common option is max-pooling, which selects the highest value.
• Fully Connected: Applied at the end of the network. This layer computes the class scores and outputs an N-dimensional vector, where N is the number of classes, each entry corresponding to a particular class. Each neuron has full connections to all activations in the previous layer. A minimal sketch combining these three layer types is given after this list.
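The sketch below shows how the three layer types can be stacked, assuming a recent PyTorch installation; the layer sizes are illustrative and are not the YOLO configuration used in this work:

import torch
import torch.nn as nn

# Illustrative stack of the three layer types described above.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),  # convolution on an RGB input
    nn.ReLU(),                                   # non-linear activation
    nn.MaxPool2d(2),                             # max-pooling halves the resolution
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 56 * 56, 2),                  # fully connected: N = 2 class scores
)

scores = model(torch.randn(1, 3, 224, 224))      # one 224x224 RGB image
print(scores.shape)                               # torch.Size([1, 2])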
4 Implementation
This chapter describes the dataset, the YOLO architectures, and how the training and detection tests were performed.
Figure 4.1: On the left, an image of a glass insulator. On the right, an illustration of how an insulator is built (Source: [Link]).
4.2 Dataset
Training YOLO required a large dataset of insulator images. This dataset contains images of low/medium and high voltage insulators, different materials (glass and ceramic), different backgrounds (blue sky, yellowish ground, green forest and poles/towers), different views (top, bottom and front) and insulators in several positions (vertical, horizontal and diagonal). Some images contain more than one insulator, including insulators in different positions within the same image. Figure 4.2 shows some dataset images. The lighting conditions are not always constant and the ANN should be invariant to that; to help with this, the images cover different lighting conditions, as Figure 4.3 shows.
Figure 4.2: Some dataset images. (a) Green background and long insulators. (b) Top view of white ceramic insulators. (c) Sky and ground in the background with small insulators. (d) Bottom view of glass insulators.
The dataset was initially composed of more than 3 000 images of insulators. Since the images are very similar, rotations and cropped regions were applied to the images to obtain more samples (Figure 4.4). Images were rotated by ±5° (simulating the UAV's oscillations) and were cropped around the insulators, resulting in more than 90 000 images with different resolutions and orientations, from 600 × 600 px cropped images up to 3000 × 3000 px. This technique was inspired by [41], who did the same to raise the number of training samples.
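A sketch of this kind of augmentation using OpenCV; the rotation angles match the ±5° described above, while the file name and crop box are hypothetical examples, not the exact script used to build the dataset:

import cv2

def rotate(image, angle_deg):
    # Rotate around the image centre, keeping the original resolution.
    h, w = image.shape[:2]
    matrix = cv2.getRotationMatrix2D((w / 2, h / 2), angle_deg, 1.0)
    return cv2.warpAffine(image, matrix, (w, h))

image = cv2.imread("insulator.jpg")       # hypothetical dataset image
variants = [
    rotate(image, +5),                     # simulate UAV oscillation one way
    rotate(image, -5),                     # and the other way
    image[100:700, 200:800],               # crop around an insulator (example box)
]
for i, img in enumerate(variants):
    cv2.imwrite(f"insulator_aug_{i}.jpg", img)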
Figure 4.3: Histograms of some dataset images. Some images do not have the white balance corrected and others are dark. The graphs show the number of pixels (y-axis) with a certain intensity (x-axis) for the three colour channels, Red, Green and Blue.
4.3 Hardware
For this work, an NVIDIA Jetson TX2 (on the NVIDIA Jetson TX2 Developer Kit) was used to train the YOLO network and perform detection. Due to its small size and low power consumption, it is well suited for robotics and mobile platforms. Table 4.1 lists some features of this GPU.
1 Available here: [Link]
Figure 4.4: From the original image 4.4(a) it was possible to generate 8 “new” images with different orientations: (a) original image; (b) original image rotated CCW; (c) original image rotated CW; (d) first insulator crop; (e) crop rotated CCW; (f) crop rotated CW; (g) second insulator crop; (h) crop rotated CCW; (i) crop rotated CW.
The UAV used to test the concept is shown in Figure 4.6. It is a hexacopter equipped with a Point Grey Chameleon3 camera and a Fujinon YV2.8×2.8SA-2 lens (image
Figure 4.5: Labelling insulators in an image with the YOLO-Mark program. YOLO-Mark converts the boxes drawn by the user into the YOLO format.
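For reference, the Darknet/YOLO label format stores, for each object, the class index followed by the box centre and size normalised by the image dimensions. A small sketch of the conversion from pixel corners (the example numbers are made up):

def to_yolo_format(class_id, xmin, ymin, xmax, ymax, img_w, img_h):
    # Convert pixel corner coordinates into the normalised
    # "class x_center y_center width height" line used by Darknet/YOLO.
    x_center = (xmin + xmax) / 2.0 / img_w
    y_center = (ymin + ymax) / 2.0 / img_h
    width = (xmax - xmin) / float(img_w)
    height = (ymax - ymin) / float(img_h)
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"

# Example: a 300x120 px box in a 1280x720 image, class 0 (insulator).
print(to_yolo_format(0, 500, 300, 800, 420, 1280, 720))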
Figure 4.6: Image capture platform (UAV + camera + gimbal) used during the tests. On the right is a close-up of the gimbal system.
Error calculation
YOLO calculates the error using the Jaccard index, also known as Intersection over Union (IoU), which compares the given label area with what the network thinks is an object. The model computes the ratio between the overlap area (detection overlapped with the label) and the union area (detection and label areas combined).
Figure 4.7(a) illustrates the IoU ratio and Figure 4.7(b) shows a real example of labels (blue box) overlapped with the YOLO predictions (pink boxes).
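A minimal sketch of the IoU computation between a label and a prediction, with boxes given as pixel corner coordinates (the example boxes are made up):

def iou(box_a, box_b):
    # Boxes as (xmin, ymin, xmax, ymax). Returns intersection over union.
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    inter = max(0, ix_max - ix_min) * max(0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Example: label vs. prediction shifted 50 px to the right.
print(iou((100, 100, 300, 200), (150, 100, 350, 200)))  # 0.6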
Darknet outputs the error at each iteration, as shown in Figure 4.8. This output shows how the loss evolves at each iteration and indicates how YOLO performs in each sub-batch. The code that produces these lines can be found in the file detector.c, line 136.
The analysis of the last line of the output in Figure 4.8 is given in Table 4.2.
The other lines are the result of each subdivision. In this case the subdivision is 16, which means the batch is divided into 16 groups that are processed in turn; so if batch = 64 and subdivision = 16, each training iteration has 16 groups of 4 images each, allowing better performance on systems with low memory. The output description is given in Table 4.3.
Table 4.3: Darknet output for each batch

Region Avg IOU: 0.816444   Average of the IoU in the current subdivision; the last subdivision has an overlap of 81.64%.
Class: 1.0000              Proportion of classes classified correctly; in this case there is only one class.
Obj: 0.657818              In the code, the ratio between the average objectness of the detected objects and the true positives (count).
No Obj: 0.002966           Ratio between all objects detected and the number of real true positives (count).
Avg Recall: 1.00000        The average recall in the subdivision.
count: 4                   The number of real true positives in the subdivision.
YOLO has an option to automatically re-size the images during training (independently of the image size) every 10 iterations. This option is enabled in the CFG file with the parameter "random=1" and gives better training performance; the iteration speed is slightly lower, but it is worth it overall.
The average loss is less stable because the detection is affected by the image size, as shown in Figure 4.9; however, the ANN becomes more invariant to the object size. The differences are shown in subsection 4.4.1.
Hardware consumption
The Darknet framework makes full use of the GPU resources (it uses 99% of the GPU capacity at 1.12 GHz). The RAM, 8 GB in total on the Jetson TX2, is shared between the GPU and the CPU, and the total usage is a little more than 5 GB. The resource consumption is shown in Figure 4.10.

Figure 4.9: Re-sizing effect (the red line marks 10). Every 10 iterations the input image size is changed and the error changes as well. Batch=64, Subdivisions=16, random=1.

Figure 4.10: GPU usage during the training stage. On top is the GPU utilisation and on the bottom the GPU frequency range.
Comparison graphs
The next figures (Figure 4.11 and Figure 4.12) show the training comparison results. The tests were performed with two variations: first, enabling or disabling the Random option, which changes the image size before feeding the network; second, changing the batch size, batch=1 versus batch=64 images. The x-axis is the number of iterations and the y-axis the average loss (error) on a logarithmic scale.
The random option changes the image size every 10 iterations, allowing a more size-independent training. In Figure 4.11, the training with the random option enabled (Figure 4.11(a)) has a less stable average loss and the error is a little higher when compared with its disabled counterpart (Figure 4.11(b)). However, the CNN learns with different image sizes, making it more size tolerant.

(a) Re-size option enabled (random=1). (b) Re-size option disabled (random=0).
Figure 4.11: Random option influence during the training. The green line marks 3 and the red line marks 10.
The batch size is the number of images the CNN “sees” per iteration. The bigger that number is, the more images the CNN uses to adjust the weights. Figure 4.12 compares a training run with batch=64 (Figure 4.12(a)) against batch=1 (Figure 4.12(b)). The conclusion is immediate: the average loss decreases much faster with a bigger batch size.
Training time
Figure 4.12: The batch size influence during the training. The green line marks 3 and the red line marks 10.
Training times were measured over 10 000 iterations for both architectures, with batch sizes of 1 and 64 and with/without the resize option. The results are shown in Table 4.4.
The hardware used for this test was the Jetson TX2 in most cases. To put the Jetson TX2 performance in perspective, the test was also run on an Nvidia Tesla K80, a Quadro K2000 and an Intel Xeon E5-1650 v2.
The performance of the Jetson TX2 is very similar to that of the Quadro K2000, as expected, because they have practically the same number of CUDA cores. However, when the Jetson TX2 is compared with the Tesla K80 the scenario is totally different: the Tesla K80 is about 3× faster than the Jetson TX2. Using a GPU is very important to accelerate the training. The proof is that using only the CPU, an Intel Xeon E5-1650 v2, the time per iteration is more than 500 seconds, while with a GPU it is about 20 seconds, or 8 seconds with the Tesla K80.
Note: In the CPU-only case, the full 10 000 iterations were not executed.
Note 2: The Nvidia Tesla K80 has 4992 CUDA cores and 24 GB of memory; however, on Amazon Web Services (AWS) only half of the graphical resources are provided, 2496 CUDA cores and 12 GB of memory.
                           Batch = 1, Subdivisions = 1      Batch = 64, Subdivisions = 16
YOLO 2.0, Random = 1       12 758 s (3.54 hours)            517 470 s (143 hours)
                           Average: 1.27 s/iteration        Average: 51.75 s/iteration (a)
YOLO 2.0, Random = 0       11 933 s (3.31 hours)            417 653 s (116 hours)
                           Average: 1.19 s/iteration        Average: 41.76 s/iteration
Tiny YOLO, Random = 1      2 970 s (0.82 hours)             229 272 s (63.68 hours)
                           Average: 0.29 s/iteration        Average: 22.92 s/iteration (b)(c)(d)
Tiny YOLO, Random = 0      2 738 s (0.76 hours)             200 242 s (55.62 hours)
                           Average: 0.27 s/iteration        Average: 20.02 s/iteration

(a) On an Nvidia Tesla K80, 2496 CUDA cores ([Link] AWS), the average is 16.63 s/iteration.
(b) On an Nvidia Quadro K2000, 384 CUDA cores, the average is 22.34 s/iteration.
(c) On an Intel Xeon E5-1650 v2, 6 × 3.50 GHz, the average is > 500 s/iteration.
(d) On an Nvidia Tesla K80, 2496 CUDA cores ([Link] AWS), the average is 7.80 s/iteration.

Table 4.4: YOLO and Tiny YOLO training time comparison. Time to perform 10 000 iterations.
The system is composed of two main parts (Figure 4.13). The UAV part (orange boxes) has the camera, which captures the images and sends them to YOLO, and is responsible for receiving the new coordinates for the gimbal and pointing the camera to the right position. The Jetson TX2 part (blue boxes) is the brain of the system: it runs YOLO, which detects and classifies the insulators, and performs the calculations to correct the gimbal through the gimbal control software.
The second part uses the ROS framework, explained in section 4.6; it also works without ROS, but in that case it is not possible to control the gimbal.
4.7 Gimbal control
Figure 4.15: ROS package /gimbal_control used to control the gimbal position. It subscribes to two topics from /darknet_ros and publishes on the topic /mavros/vision_pose/pose.
However, if YOLO does not detect any insulator in the image, the gimbal moves towards the horizontal position one degree at a time. If an insulator is detected during this phase, the image is centred on it. The state transitions are illustrated in Figure 4.16.
Figure 4.16: Gimbal Control - State machine. The system starts with the gimbal in the horizontal position. If an insulator is detected, the image is centred on it; otherwise the gimbal returns to the horizontal position.
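The following rospy sketch illustrates the state machine of Figure 4.16. It is a simplified illustration, not the actual /gimbal_control package: the bounding-box topic and message type follow the standard darknet_ros package, while the gimbal command topic, message type and gains are hypothetical examples.

#!/usr/bin/env python
import rospy
from darknet_ros_msgs.msg import BoundingBoxes
from geometry_msgs.msg import Vector3

IMAGE_W, IMAGE_H = 1280, 720
GAIN = 0.01            # proportional gain, degrees per pixel of error (example value)
STEP_TO_LEVEL = 1.0    # degrees per cycle when no insulator is found

class GimbalControl:
    def __init__(self):
        self.pitch, self.yaw = 0.0, 0.0
        self.last_detection = rospy.Time(0)
        self.cmd_pub = rospy.Publisher("/gimbal/command", Vector3, queue_size=1)
        rospy.Subscriber("/darknet_ros/bounding_boxes", BoundingBoxes, self.on_boxes)
        rospy.Timer(rospy.Duration(0.5), self.on_timer)

    def on_boxes(self, msg):
        if not msg.bounding_boxes:
            return
        # Track only the detection with the highest probability.
        box = max(msg.bounding_boxes, key=lambda b: b.probability)
        err_x = (box.xmin + box.xmax) / 2.0 - IMAGE_W / 2.0
        err_y = (box.ymin + box.ymax) / 2.0 - IMAGE_H / 2.0
        self.yaw += GAIN * err_x       # steer towards the insulator centre
        self.pitch -= GAIN * err_y
        self.last_detection = rospy.Time.now()
        self.publish()

    def on_timer(self, _event):
        # No recent detection: move one degree at a time back to horizontal.
        if (rospy.Time.now() - self.last_detection).to_sec() > 1.0 and self.pitch != 0.0:
            step = min(STEP_TO_LEVEL, abs(self.pitch))
            self.pitch -= step if self.pitch > 0 else -step
            self.publish()

    def publish(self):
        self.cmd_pub.publish(Vector3(x=0.0, y=self.pitch, z=self.yaw))

if __name__ == "__main__":
    rospy.init_node("gimbal_control")
    GimbalControl()
    rospy.spin()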
5 Results
Figure 5.2: YOLO and Tiny-YOLO error analysis. Figure 5.2(a) is the precision-recall curve and Figure 5.2(b) shows precision and recall as a function of the proposal threshold.
of the use of an ANN. The training session with Tiny-YOLO (batch=64 images and the re-size option active) took approximately 20 days on the Jetson TX2. After more than 72 000 iterations, the average loss (error) was about 2.0 with a learning rate of 0.01. The average loss should be as close to zero as possible.
However, it was not possible to achieve a lower value, even by changing the learning rate to lower values.
The variation of the average loss at each iteration is shown in Figure 5.3.
Figure 5.3: Average loss variation of Tiny YOLO during training. After more than 72 200 iterations, approximately 471 hours (almost 20 days), the average loss is about 2.0 (green line); the red line marks 10.0.
As explained in Chapter 3, the first layer detects low-level features. Figure 5.4 shows the 3 × 3 × 16 Tiny-YOLO kernels (filters) applied in the first convolutional layer at different training stages. In the case of YOLO (Figure 5.5), the kernels in the first convolutional layer are 3 × 3 × 32.
Both figures show that the kernels change slightly during the training phase according to the dataset.
(a) Initial Tiny YOLO kernels, trained on the VOC dataset.
Figure 5.4: Tiny-YOLO kernel evolution before (Figure 5.4(a)) and after (Figure 5.4(b)) the training. A slight difference is visible in the leftmost kernels.
Figure 5.5: YOLO kernels in the first convolutional layer. In this case, the kernels did not change during the training.
The video, with a resolution of 1280 × 720 px, was filmed from the ground. Figure 5.6 shows some frames with insulator detection running at between 10 and 12 frames per second (FPS). The GPU usage varies during detection but was not as intense as during training (the GR3D parameter at the bottom of Figure 5.7). As the background is mostly sky, the detection performed quite well.
The system was also tested in a ROS environment in two ways: first, with a rosbag recorded with a UAV, stored on an HDD connected via USB; second, by subscribing to an image topic from a USB camera pointing at printed images of insulators.
Figure 5.8 shows the detection running from the rosbag. On the left is the image with the bounding boxes marking the detected insulators; on the top right is the output of the darknet_ros package with the frame rate (in this case 0.8 FPS), the number of objects detected and the detection probabilities; on the bottom right is the CPU and GPU usage during the detection. As with the video file method, the GPU was not used as heavily as in training mode.
The processing/detection speed was very slow (running at less than 1 FPS), regardless of whether the YOLO or Tiny-YOLO architecture was used.
Figure 5.6: Tiny-YOLO detection on a video file (1280 × 720 px) running at between 10 and 12 FPS on the Jetson TX2. The images in the video were not seen by the CNN during training.
Figure 5.7: Detection probabilities and GPU usage for video file detection. (a) 9.5 FPS; probability 34% to 71%. (b) 11.5 FPS; probability 34% to 82%.
When getting images from the camera via USB, the average was 3 FPS with Tiny-YOLO (2 FPS with YOLO), which was faster than the rosbag method, even though /camera/image_raw publishes at 7 FPS.
To avoid false positives, the bounding box threshold was adjusted to 55% (left image in Figure 5.9). The centre and right images were taken with a threshold of 15%, with which it was possible to detect all the insulators in the image.
Figure 5.8: ROS detection from a rosbag. On top is the image with the bounding boxes, in the middle the detection probability of each bounding box, and on the bottom the resource usage (memory, CPU and GPU).
Figure 5.9: Tiny-YOLO running on ROS with the image coming from a camera. The insulators are in different positions/orientations to test the invariance of the detection.
The tests indicated that most of the false positives occur where there are patches of blue sky between green vegetation. One way to avoid this is to use a higher threshold on the bounding boxes, because the majority of false positives have probability values around 40%.
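A trivial post-filter illustrating the thresholding idea; the 0.55 threshold is the value discussed above, while the detection list format is a hypothetical example:

# Keep only detections above a confidence threshold to suppress the
# false positives that typically score around 0.40.
THRESHOLD = 0.55

detections = [                       # hypothetical (label, probability) output
    ("insulator", 0.77),
    ("insulator", 0.69),
    ("insulator", 0.41),             # likely a sky gap between vegetation
]

kept = [d for d in detections if d[1] >= THRESHOLD]
print(kept)                          # [('insulator', 0.77), ('insulator', 0.69)]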
5.4 Gimbal control
The system tracks only the insulator with the highest probability, regardless of how many insulators are detected. Figure 5.11 shows a sequence in which the rightmost insulator is tracked because it is the one with the higher probability (77% vs 69%).
The tests demonstrated that the system is capable of tracking the insulator correctly at a low frame rate. If the frame rate is increased, it will probably be necessary to adjust the controller parameters.
Figure 5.11: ROS package /gimbal_control tracking the insulator with the higher probability (77% vs 69%). The UAV is static; only the insulator print is moving.
If YOLO does not detect any insulator, /gimbal_control sends orders for the gimbal to go to the horizontal position (Figure 5.12). If YOLO detects an insulator in the meantime, the image is centred on it.
Figure 5.12: The gimbal moving to the horizontal position. In this case, after the gimbal reached the horizontal position, YOLO detected an insulator; the next step is to centre the image on it.
6 Conclusions and Future Work
The mission of this work was to develop a system capable of helping with insulator inspection, making it more efficient, faster and safer than traditional methods.
The proposed system is capable of detecting insulators either during the inspection, by capturing images from a camera, or after the inspection from the recorded video (in post-processing).
The YOLO training process is efficient: with good GPU hardware it is possible to train it in about a week. YOLOv2 shows better performance than Tiny-YOLO but is slightly slower.
Insulators are centred in the image by controlling the gimbal, so the UAV is not forced to be directly in front of the insulator and can be slightly off to the side (within the gimbal limits). If the UAV is moving, the gimbal will follow the insulator until it finds an object with a higher probability of being an insulator.
However, the detection has low precision and produces many false positives, falling short of the expected results. For the same threshold, YOLOv2 produces fewer false positives than Tiny-YOLO. From early on, the importance of having a good dataset for training the CNN was clear.
One of the problems was the similarity between images, with few variations in position, distance, solar reflection, etc. Another problem was the white balance of the images, which in some cases was not adjusted, so there were many images with high gains in the green channel and others that were too dark. As a consequence, the CNN considered green background (like trees or leaves) to be an insulator. When marking the bounding boxes, only insulators with two or more caps visible were considered. Perhaps the dataset should not contain images with partial insulators or partially overlapped insulators, to avoid being ambiguous to YOLO. In the continuation of this work, it is extremely important to obtain a good dataset in order to validate the influence of such images, either by reviewing the current dataset and/or using different images.
Since there are several different insulator types, it could be interesting to differentiate the insulators by material type or size during the detection. This feature could provide feedback for statistical analyses of asset quality, automatically adjust camera parameters, or adjust the UAV behaviour.
The next steps could be to plan the UAV's trajectories around the insulators, in such a way that it can automatically record images of the insulators and provide better and more detailed imagery. These trajectories can be pre-planned with waypoints, or an algorithm can be developed to keep the distance and watch for possible collisions.
Detecting faulty insulators would be a valuable and important feature for the system. However, it can be difficult to achieve if the cracks/defects are small and impossible to see from a UAV. Another point is how to train a CNN, or another model, to detect defective/broken insulators; help from power companies would probably be needed to obtain images for the training.
Another field where this could be applied is railway power line inspection, since the insulators are very similar to those of electrical power distribution lines, so the training might be faster.
In short, YOLO and the Jetson TX2 seem to be a good choice for this application, but more work is needed on the training phase. Personally, this work gave me the opportunity to learn about deep learning and about all the difficulties of training an ANN with a non-ideal dataset. I hope the work continues and more features are added towards a possible commercial product. Could such a system, in the future, prevent the crashes and the injuries?
Bibliography
[4] W. Cheng and Z. Song, “Power Pole Detection Based on Graph Cut,” 2008
Congress on Image and Signal Processing 3, 720–724 (2008).
[5] I. Golightly and D. Jones, “Corner detection and matching for visual track-
ing during power line inspection,” Image and Vision Computing 21, 827–840
(2003).
[7] X. Zhang, J. An, and F. Chen, “A Simple Method of Tempered Glass Insulator
Recognition from Airborne Image,” 2010 International Conference on Optoelec-
tronics and Image Processing 1, 127–130 (2010).
[8] F. Zhang, W. Wang, Y. Zhao, P. Li, Q. Lin, and L. Jiang, “Automatic diagnosis
system of transmission line abnormalities and defects based on UAV,” 2016 4th
International Conference on Applied Robotics for the Power Industry (CARPI)
pp. 1–5 (2016).
[10] M. Oberweger, A. Wendel, and H. Bischof, “Visual Recognition and Fault De-
tection for Power Line Insulators,” Computer Vision Winter Workshop (2014).
[11] J. Zhao, X. Liu, J. Sun, and L. Lei, in Intelligent Computing Technology: 8th
International Conference, ICIC 2012, Huangshan, China. Proceedings, D.-S.
Huang, C. Jiang, V. Bevilacqua, and J. C. Figueroa, eds., (Springer Berlin
Heidelberg, Berlin, Heidelberg, 2012), pp. 442–450.
[15] Y. Liu, J. Yong, L. Liu, J. Zhao, and Z. Li, “The method of insulator recognition
based on deep learning,” 2016 4th International Conference on Applied Robotics
for the Power Industry (CARPI) pp. 1–5 (2016).
[19] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards Real-Time
Object Detection with Region Proposal Networks,” IEEE Transactions on Pat-
tern Analysis and Machine Intelligence 39, 1137–1149 (2017).
[20] K. Simonyan and A. Zisserman, “Very Deep Convolutional Networks for Large-
Scale Image Recognition,” CoRR abs/1409.1556 (2014).
[22] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You Only Look Once:
Unified, Real-Time Object Detection,” 2016 IEEE Conference on Computer
Vision and Pattern Recognition (CVPR) pp. 779–788 (2016).
[23] J. Redmon and A. Farhadi, “YOLO9000: Better, Faster, Stronger,” The IEEE
Conference on Computer Vision and Pattern Recognition (CVPR) (2017).
[24] L. Wang and H. Wang, “A survey on insulator inspection robots for power
transmission lines,” 2016 4th International Conference on Applied Robotics for
the Power Industry (CARPI) pp. 1–6 (2016).
[26] Guo Rui, Zhang Feng, Cao Lei, and Yong Jun, “A mobile robot for inspec-
tion of overhead transmission lines,” Proceedings of the 2014 3rd International
Conference on Applied Robotics for the Power Industry pp. 1–3 (2014).
[27] J.-y. Park, J.-k. Lee, B.-h. Cho, and K.-y. Oh, “Development of Advanced In-
sulator Inspection Robot for 345kV Suspension Insulator Strings,” Proceedings
of the International MultiConference of Engineers and Computer Scientists II,
17–20 (2010).
[29] L. Zhong, J. Jia, R. Guo, J. Yong, and J. Ren, “Mobile robot for inspection of
porcelain insulator strings,” Proceedings of the 2014 3rd International Confer-
ence on Applied Robotics for the Power Industry pp. 1–4 (2014).
[31] I. Golightly and D. Jones, “Visual control of an unmanned aerial vehicle for
power line inspection,” pp. 288–295 (2005).
[33] S. Antunes and K. Bousson, “Safe flight envelope for overhead line inspection,”
Proceedings of the 2014 3rd International Conference on Applied Robotics for
the Power Industry pp. 1–6 (2014).
[34] M. Malveiro, “Inspection of High Voltage Overhead Power Lines With UAV’s,”
pp. 3–5 (2015).
[35] C. Deng, S. Wang, Z. Huang, Z. Tan, and J. Liu, “Unmanned aerial vehicles for
power line inspection: A cooperative way in platforms and communications,”
Journal of Communications 9, 687–692 (2014).
[36] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, pp. 436–444 (2015).
[38] M. A. Nielsen, Neural Networks and Deep Learning (Determination Press, 2015).