
Studies in Computational Intelligence 1093

Ahmad Taher Azar · Anis Koubaa
Editors

Artificial Intelligence for Robotics and Autonomous Systems Applications
Studies in Computational Intelligence

Volume 1093

Series Editor
Janusz Kacprzyk, Polish Academy of Sciences, Warsaw, Poland
The series “Studies in Computational Intelligence” (SCI) publishes new develop-
ments and advances in the various areas of computational intelligence—quickly and
with a high quality. The intent is to cover the theory, applications, and design methods
of computational intelligence, as embedded in the fields of engineering, computer
science, physics and life sciences, as well as the methodologies behind them. The
series contains monographs, lecture notes and edited volumes in computational
intelligence spanning the areas of neural networks, connectionist systems, genetic
algorithms, evolutionary computation, artificial intelligence, cellular automata, self-
organizing systems, soft computing, fuzzy systems, and hybrid intelligent systems.
Of particular value to both the contributors and the readership are the short publica-
tion timeframe and the world-wide distribution, which enable both wide and rapid
dissemination of research output.
Indexed by SCOPUS, DBLP, WTI Frankfurt eG, zbMATH, SCImago.
All books published in the series are submitted for consideration in Web of Science.
Ahmad Taher Azar · Anis Koubaa
Editors

Artificial Intelligence
for Robotics
and Autonomous Systems
Applications
Editors

Ahmad Taher Azar
College of Computer and Information Sciences, Prince Sultan University, Riyadh, Saudi Arabia
Automated Systems and Soft Computing Lab (ASSCL), Prince Sultan University, Riyadh, Saudi Arabia
Faculty of Computers and Artificial Intelligence, Benha University, Benha, Egypt

Anis Koubaa
College of Computer and Information Sciences, Prince Sultan University, Riyadh, Saudi Arabia

ISSN 1860-949X ISSN 1860-9503 (electronic)


Studies in Computational Intelligence
ISBN 978-3-031-28714-5 ISBN 978-3-031-28715-2 (eBook)
https://doi.org/10.1007/978-3-031-28715-2

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature
Switzerland AG 2023
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface

Robotics, autonomous control systems, and artificial intelligence technology are all
hallmarks of the Fourth Industrial Revolution. To advance the age of the autonomous car,
artificial intelligence is being integrated with robots and autonomous driving systems.
The remarkable advancements in artificial intelligence technology have sparked the development
of new services and online solutions to a variety of social problems. However, there is
still more to be done to integrate artificial intelligence with physical space. Robotics, a
technical and scientific discipline that deals with physical interaction with
the real world, is mechanical in nature. It encompasses several domain-specific skills,
including sensing and perception, computation of kinematic/dynamic actions, and
control theory. Robotics and artificial intelligence work together to revolutionize
society by connecting cyberspace and physical space.

Objectives of the Book

This book’s objective is to compile original papers and reviews that demonstrate
numerous uses of robotics and artificial intelligence (AI). It seeks to showcase cutting-
edge robotics and AI applications as well as developments in machine learning and
computational intelligence technologies in a variety of scenarios. Contributors are
also urged to develop and critically evaluate data analysis methodologies using such
approaches, in order to give a cogent and comprehensive strategy that employs
technology and analytics. For applied AI in robotics, this book should serve as a
useful point of reference for both beginners and experts.
Both novice and expert readers should find this book a useful reference in the
field of artificial intelligence, mathematical modelling, robotics, control systems,
and reinforcement learning.


Organization of the Book

This well-structured book consists of 15 full chapters.

Book Features

• The book chapters deal with the recent research problems in the areas of
artificial intelligence, mathematical modelling, robotics, control systems, and
reinforcement learning.
• The book chapters present advanced techniques of AI applications in robotics and
drones.
• The book chapters contain a good literature survey with a long list of references.
• The book chapters are well-written with a good exposition of the research problem,
methodology, block diagrams, and mathematical techniques.
• The book chapters are lucidly illustrated with numerical examples and simula-
tions.
• The book chapters discuss details of applications and future research areas.

Audience

The book is primarily meant for researchers from academia and industry who are
working in research areas such as robotics engineering, control engineering,
mechatronic engineering, biomedical engineering, medical informatics, computer
science, and data analytics. The book can also be used at the graduate or advanced
undergraduate level, as well as by many other readers.

Acknowledgements

As the editors, we hope that the chapters in this well-structured book will stimulate
further research in artificial intelligence, mathematical modelling, robotics, control
systems, and reinforcement learning, and utilize them in real-world applications.
We hope sincerely that this book, covering so many different topics, will be very
useful for all readers.

We would like to thank all the reviewers for their diligence in reviewing the
chapters.
Special thanks go to Springer, especially the book Editorial team.

Riyadh, Saudi Arabia/Benha, Egypt Prof. Ahmad Taher Azar


[email protected]
[email protected]
[email protected]
Riyadh, Saudi Arabia Prof. Anis Koubaa
[email protected]
Contents

Efficient Machine Learning of Mobile Robotic Systems Based
on Convolutional Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Milica Petrović, Zoran Miljković, and Aleksandar Jokić
UAV Path Planning Based on Deep Reinforcement Learning . . . . . . . . . . . 27
Rui Dong, Xin Pan, Taojun Wang, and Gang Chen
Drone Shadow Cloud: A New Concept to Protect Individuals
from Danger Sun Exposure in GCC Countries . . . . . . . . . . . . . . . . . . . . . . . 67
Mohamed Zied Chaari, Essa Saad Al-Kuwari, Christopher Loreno,
and Otman Aghzout
Accurate Estimation of 3D-Repetitive-Trajectories using
Kalman Filter, Machine Learning and Curve-Fitting Method
for High-speed Target Interception . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
Aakriti Agrawal, Aashay Bhise, Rohitkumar Arasanipalai,
Lima Agnel Tony, Shuvrangshu Jana, and Debasish Ghose
Robotics and Artificial Intelligence in the Nuclear Industry: From
Teleoperation to Cyber Physical Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
Declan Shanahan, Ziwei Wang, and Allahyar Montazeri
Deep Learning and Robotics, Surgical Robot Applications . . . . . . . . . . . . . 167
Muhammad Shahid Iqbal, Rashid Abbasi, Waqas Ahmad,
and Fouzia Sher Akbar
Deep Reinforcement Learning for Autonomous Mobile
Robot Navigation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
Armando de Jesús Plasencia-Salgueiro
Event Vision for Autonomous Off-Road Navigation . . . . . . . . . . . . . . . . . . . 239
Hamad AlRemeithi, Fakhreddine Zayer, Jorge Dias, and Majid Khonji


Multi-armed Bandit Approach for Task Scheduling of a Fixed-Base
Robot in the Warehouse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271
Ajay Kumar Sandula, Pradipta Biswas, Arushi Khokhar,
and Debasish Ghose
Machine Learning and Deep Learning Approaches for Robotics
Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303
Lina E. Alatabani, Elmustafa Sayed Ali, and Rashid A. Saeed
A Review on Deep Learning on UAV Monitoring Systems
for Agricultural Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 335
Tinao Petso and Rodrigo S. Jamisola Jr
Navigation and Trajectory Planning Techniques for Unmanned
Aerial Vehicles Swarm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 369
Nada Mohammed Elfatih, Elmustafa Sayed Ali, and Rashid A. Saeed
Intelligent Control System for Hybrid Electric Vehicle
with Autonomous Charging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 405
Mohamed Naoui, Aymen Flah, Lassaad Sbita, Mouna Ben Hamed,
and Ahmad Taher Azar
Advanced Sensor Systems for Robotics and Autonomous Vehicles . . . . . . 439
Manoj Tolani, Abiodun Afis Ajasa, Arun Balodi, Ambar Bajpai,
Yazeed AlZaharani, and Sunny
Four Wheeled Humanoid Second-Order Cascade Control
of Holonomic Trajectories . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 461
A. A. Torres-Martínez, E. A. Martínez-García, R. Lavrenov,
and E. Magid
Efficient Machine Learning of Mobile
Robotic Systems Based on Convolutional
Neural Networks

Milica Petrović, Zoran Miljković, and Aleksandar Jokić

Abstract During the last decade, Convolutional Neural Networks (CNNs) have been
recognized as one of the most promising machine learning methods that are being
utilized for deep learning of autonomous robotic systems. Faced with everlasting
uncertainties while working in unstructured and dynamic real-world environments,
robotic systems need to be able to recognize different environmental scenarios and
make adequate decisions based on machine learning of the current environment’s
state representation. One of the main challenges in the development of machine
learning models based on CNNs is in the selection of appropriate model structure
and parameters that can achieve adequate accuracy of environment representation.
In order to address this challenge, the book chapter provides a comprehensive anal-
ysis of the accuracy and efficiency of CNN models for autonomous robotic applica-
tions. Particularly, different CNN models (i.e., structures and parameters) are trained,
validated, and tested on real-world image data gathered by a mobile robot’s stereo
vision system. The best performing CNN models based on two criteria—the number
of frames per second and mean intersection over union—are implemented on the
real-world wheeled mobile robot RAICO (Robot with Artificial Intelligence based
COgnition), which is developed in the Laboratory for robotics and artificial intelli-
gence (ROBOTICS&AI) and tested for obstacle avoidance tasks. The achieved exper-
imental results show that the proposed machine learning strategy based on CNNs
provides high accuracy of mobile robot’s current environment state estimation.

Keywords Efficient deep learning · Convolutional neural networks · Mobile robot control · Robotic vision · NVidia Jetson Nano

M. Petrović (B) · Z. Miljković · A. Jokić


Faculty of Mechanical Engineering, University of Belgrade, Belgrade, Serbia
e-mail: [email protected]
Z. Miljković
e-mail: [email protected]
A. Jokić
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023

A. T. Azar and A. Koubaa (eds.), Artificial Intelligence for Robotics and Autonomous
Systems Applications, Studies in Computational Intelligence 1093,
https://doi.org/10.1007/978-3-031-28715-2_1

1 Introduction

The worldwide interest in Artificial Intelligence (AI) techniques has become evident
after the paper [1] reached a substantially better result on the image classification task
by utilizing Artificial Neural Networks (ANNs). Afterward, numerous ANN models
that achieve even better results on image classification and various other tasks have
been developed in [2]. Consequently, Deep Learning (DL) emerged as a new popular
AI subfield. DL represents the process of training and using ANNs that utilize much
deeper architectures, i.e., models with a large number of sequential layers. Another
important innovation provided by [1, 3] was that the deep ANNs provide better
results with convolutional layers instead of fully connected layers. Therefore, deep
ANNs with convolution as the primary layer are termed Convolutional Neural
Networks (CNNs). CNN models such as ResNet [4], VGG [5], and Xception
[6] have become the industry and research go-to options, and many researchers tried
and succeeded in improving models’ accuracy or modifying the models for other
tasks and purposes. An introductory explanation of the CNN layers (e.g., Pooling,
ReLU, convolution, etc.) is beyond the scope of this chapter, and interested readers
are referred to the following literature [2].
Background of the research performed in this chapter includes the high intercon-
nection of robotics, computer vision, and AI fields that has led numerous researchers
in the robotics community to get interested in DL. Many robotics tasks that have a
highly non-linear nature can be effectively approximated by utilizing DL techniques.
Nowadays, the utilization of DL in robotics spans from Jacobian matrix approxima-
tion [7] to decision-making systems [8]. However, in the robotics context, AI is mainly
used when a high dimensional sensory input is utilized in control [9], simultaneous
localization and mapping [10], indoor positioning [11], or trajectory learning [12,
13]. One of the main challenges for utilizing state-of-the-art DL models in robotics
is related to processing time requirements. Keeping in mind that all robotic algo-
rithms need to be implemented in the real-world setting, where processing power is
limited by robot hardware, DL models need to be able to fulfill the time-sensitive
requirements of embedded robotic processes.
In the beginning, the accuracy of the utilized CNN models was the only relevant
metric researchers considered, and therefore the trend was to utilize larger models.
Larger models not only require more time, energy, and resources to train but are also
impossible to implement in real-world time-sensitive applications. Moreover, one
major wake-up call was the realization that the energy utilized for training one of
the largest DL models for natural language processing [14] was around 1,287 MWh
[15], whereas a detailed analysis of the power usage and pollution generated from
training DL models was shown in [16]. Having that in mind, the researchers started
exploring the models that do not utilize a large amount of power for training, as well
as the models that are usable in real time.
As can be concluded from the previous elaboration, the motivation for the research in
this chapter lies in the possible utilization of highly accurate DL models within the robotic
domain. Particularly, the DL models that provide an effective tool for the mobile robot
perception system to further enhance the understanding of the robot's surroundings and
utilize that information for further decision-making will be considered. The objective
of the research presented in this chapter is to identify how large (in terms of number
of parameters and layers) the CNN model needs to be to achieve high accuracy for
the semantic segmentation task, while having practical value in terms of the capability for
its implementation on the computationally restricted Nvidia Jetson Nano board.
The main contributions of this chapter include the analysis of the efficiency of the
developed DL models implemented on the Nvidia Jetson Nano board within the mobile
robot RAICO (Robot with Artificial Intelligence based COgnition) for real-world
evaluation. Particularly, efficient CNN models with different levels of computational
complexity are trained on a well-known dataset and tested on the mobile robot RAICO.
Different from other approaches (see e.g., [17–19]) that usually utilize AI boards
with a higher level of computational resources or even high-end GPUs (that are much
harder to integrate within robotic systems), the authors analyze models that have
far lower computational complexity and that are implementable on Jetson Nano.
After the model selection process and a thorough analysis of the achieved results, the best
CNN model is implemented within the obstacle avoidance algorithm of the mobile robot
RAICO for experimental evaluation within a real robotic algorithm.
The chapter outline is as follows. The formulation and initial analysis of the
problem at hand are given in Sect. 2. The related work regarding the efficient DL
models in robotic domain is presented in Sect. 3. Section 4 includes the imple-
mentation details of different efficient DL models. The methodology for mobile
robot obstacle avoidance algorithm based on DL is considered in Sect. 5. Section 6
is devoted to the analysis of experimental results and followed by discussion of
achieved results presented in Sect. 7. Section 8 has concluding remarks with future
research directions.

2 Problem Analysis and Formulation

The efficiency of CNN models is measured in FLoating Point Operations (FLOPs).


FLOPs represent the number of computational operations that need to be performed
for a model to produce an output for a given input. If the CNN models are compared
and tested on the same hardware platform, inference time (time required for the CNN
to produce an output) can also be utilized for their comparison. In terms of robotic
applications, inference time (e.g., 0.02 s) can be a more informative metric since
its utilization gives a human-understandable estimate of the model speed. When a
model is implemented on a robotic system, the inverse of the inference time (also known
as Frames Per Second, FPS) is also provided as a standard metric. Models with
FPS above 30 are usually considered real-time models. On the other end of the
spectrum, CNN efficiency can also be analyzed in terms of training time. Since the
focus of this chapter is on deep learning for robotic applications, this type of analysis
will not be discussed further (the interested reader is referred to literature sources
[15]) and model efficiency will be focused solely on inference time.

Efficient general purpose (backbone) CNNs that can be adapted to multiple
computer vision tasks have started emerging as a popular research topic. The devel-
opment of novel efficient CNN models will be shown through an analysis of the three
popular models.
One of the first efficient CNN models was proposed in [20] and entitled
SqueezeNet. The authors proposed to minimize the number of parameters in the
model by using mainly convolutional layers with 1 × 1 and 3 × 3 filters. Moreover,
high accuracy was achieved by down-sampling feature maps within later layers in
the network. The network is defined with so-called fire modules that contain two 1
× 1 convolution and one 3 × 3 convolution layer combined with ReLU activations.
The resulting network has 50 × fewer parameters than AlexNet, while achieving the
same accuracy.
In [21], the authors developed a procedure to implement convolution more efficiently
and named their model MobileNet. The number of parameters utilized for
standard 3 × 3 convolution layers (and also a number of FLOPs) can be greatly
reduced by using depthwise and pointwise convolution layers instead. For the example
given in Fig. 1, the following equations demonstrate the difference between the
number of parameters for standard convolution (1) and MobileNet convolution (2):

P_c = F_wc · F_hc · N_c · M_c = 3 · 3 · 5 · 7 = 315 (1)

P_m = P_d + P_p = F_wd · F_hd · N_d · M_d + F_wp · F_hp · N_p · M_p = 3 · 3 · 1 · 5 + 1 · 1 · 5 · 7 = 80 (2)

where P is the number of parameters, F_w and F_h are the width and height of the filter,
N is the number of channels (depth) of the input feature maps, and M is the number
of filters used; all the parameters have an additional index that shows which layer
they represent: c—standard convolution, d—depthwise convolution, p—pointwise
convolution, m—MobileNet. The difference between the standard and MobileNet
convolutional layer can be graphically seen in Fig. 1.
Moreover, Eqs. (3) and (4) represent the difference between a number of FLOPs
(without bias, padding is 0, and stride is 1) utilized for these two convolution layers,

F_c = F_wc · F_hc · N_c · D_w · D_h · M_c = 3 · 3 · 5 · 10 · 7 · 7 = 22050, (3)

F_m = F_d + F_p = F_wd · F_hd · N_d · M_d · D_w · D_h + F_wp · F_hp · N_p · M_p · D_w · D_h
= 3 · 3 · 1 · 5 · 10 · 7 + 1 · 1 · 5 · 7 · 10 · 7 = 3150 + 2450 = 5600, (4)

where F represents the number of FLOPs, Dw and Dh are width and height of the
output feature map, with the same notation as in (1) and (2). As it can be seen,
both memory footprint (according to the number of parameters) and inference time
according to the FLOPs are four times lower for the MobileNet convolution in the

Fig. 1 Difference between the standard convolution process (one layer with 7 filters of size 3 × 3) and the depthwise separable convolution process (a depthwise layer with 5 filters of size 3 × 3 followed by a pointwise layer with 7 filters of size 1 × 1)

considered example. For larger layers, the difference can be even more significant
(up to 8 or 9 times [21]).
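For illustration, the short Python sketch below (not part of the chapter; the function names are ours) reproduces the counts from Eqs. (1)–(4) for the example in Fig. 1, assuming a 5-channel input feature map, 7 output filters of size 3 × 3, and a 10 × 7 output feature map.

```python
# Illustrative sketch: parameter and FLOP counts for a standard convolution layer
# versus a depthwise separable (depthwise + pointwise) convolution layer.

def conv_params_flops(f_w, f_h, n_in, m_out, d_w, d_h):
    """Standard convolution: every filter spans all n_in input channels."""
    params = f_w * f_h * n_in * m_out
    flops = params * d_w * d_h          # one multiply-accumulate per output pixel
    return params, flops

def dw_separable_params_flops(f_w, f_h, n_in, m_out, d_w, d_h):
    """Depthwise (one f_w x f_h filter per channel) followed by pointwise (1 x 1) convolution."""
    p_depthwise = f_w * f_h * 1 * n_in
    p_pointwise = 1 * 1 * n_in * m_out
    flops = (p_depthwise + p_pointwise) * d_w * d_h
    return p_depthwise + p_pointwise, flops

if __name__ == "__main__":
    # 3x3 filters, 5 input channels, 7 output filters, 10x7 output feature map
    print(conv_params_flops(3, 3, 5, 7, 10, 7))          # (315, 22050) -> Eqs. (1) and (3)
    print(dw_separable_params_flops(3, 3, 5, 7, 10, 7))  # (80, 5600)   -> Eqs. (2) and (4)
```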
Another efficient general-purpose CNN model is ShuffleNet [22]. In the same
manner as MobileNet, ShuffleNet utilizes depthwise and pointwise convolution
layers. Differently, it utilizes a group convolution to further reduce the number of
FLOPs. Additionally, the model also performs the channel shuffle between groups
to increase the overall information provided for feature maps. ShuffleNet achieves
better accuracy than MobileNet while having the same number of FLOPs. Both
MobileNet and ShuffleNet have novel versions of their models to further improve
their performance ([23, 24]).
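As an illustration of the channel shuffle operation (a minimal sketch based on the ShuffleNet idea, not code from this chapter), the shuffle can be implemented in PyTorch with a reshape–transpose–reshape sequence:

```python
import torch

def channel_shuffle(x: torch.Tensor, groups: int) -> torch.Tensor:
    """Interleave channels between groups, as done in ShuffleNet."""
    n, c, h, w = x.size()
    channels_per_group = c // groups
    x = x.view(n, groups, channels_per_group, h, w)  # split channels into groups
    x = x.transpose(1, 2).contiguous()               # interleave the groups
    return x.view(n, c, h, w)                        # flatten back to (N, C, H, W)

# Example: 8 channels in 2 groups -> channel order becomes 0, 4, 1, 5, 2, 6, 3, 7
x = torch.arange(8.0).view(1, 8, 1, 1)
print(channel_shuffle(x, 2).flatten().tolist())
```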

3 Efficient Deep Learning for Robotics—Related Work

In the best-case scenario, the developed DL models applied in robotic applications
should be able to achieve real-time specifications by utilizing a small embedded
device. One of the most popular family of AI-embedded devices is NVidia Jetson,
and since many DL models are tested on these devices (including our models), their
specifications are given in Table 1.

Table 1 Embedded NVidia Jetson devices

                      Nano                  TX2                  Xavier NX
  Processing power    0.472 TFLOPS          1.33 TFLOPS          21 TOPS
  GPU                 Maxwell (128 cores)   Pascal (256 cores)   Volta (384 cores)
  RAM                 4 GB                  8 GB                 8 GB
  Power               10 W                  15 W                 15 W

Different CNN models have been developed for various computer vision appli-
cations. Therefore, the following related work is divided into two sections based on
computer vision tasks that are utilized in robotic applications.

3.1 Efficient Deep Learning Models for Object Detection in Robotic Applications

The first frequently utilized computer vision application in robotics is object detec-
tion. Object detection represents the process of finding a specific object in the image,
defining its location with the bounding box, and classifying the object into one of the
predefined classes with the prediction confidence score. The most common efficient
detection networks are analyzed next. The faster R-CNN [25] represents one of the
first detection CNN models that brought the inference time so low that it encouraged
the further development of real-time detection models. Nowadays, detection models
can be used in real-time with significant accuracy, e.g., YOLO [26] and its variants
(e.g., [27, 28]), and SSD [29].
Object detection-based visual control of an industrial robot was presented in [30]. The
authors utilized the faster R-CNN model in conjunction with an RGBD camera to detect
objects and decide if an object was reachable for a manipulator. Human–robot
collaboration based on hand gesture detection was considered in [31]. The authors
improved the SSD network by replacing the VGG backbone with ResNet and adding an extra
feature combination layer. The considered modifications improved the detection of
hand signs even when the human was far away from the robot. In [32], the authors
analyzed dynamic Simultaneous Localization And Mapping (SLAM) based on SSD
network. Dynamic objects were detected to enhance the accuracy of the standard
visual ORB-SLAM2 method by excluding parts of the image that were likely to move.
The proposed system significantly improved SLAM performance in both indoor and
outdoor environments. Human detection and tracking performed with SSD network
were investigated in [33]. The authors proposed a mobile robotic system that can
find, recognize, track, and follow (using visual control) a certain human in order
to achieve human–robot interaction. The authors of [34] developed a YOLOv3-
based bolt position detection algorithm to infer the orientation of the pallets the

industrial robot needs to fill up. The YOLOv3 model was improved by using a k-
means algorithm, a better detector, and a novel localization fitness function. The
human intention detection algorithm was developed in [35]. The authors utilized
YOLOv3 for object detection and an LSTM ANN for human action recognition. Both networks
were integrated into a single human intention detection algorithm, and the robot's decision-
making system utilizes that information for decision-making purposes.

3.2 Efficient Deep Learning Models for Semantic Segmentation in Robotic Applications

The second common computer vision task that is utilized within robotic applications
is semantic segmentation (e.g., [36]). Semantic segmentation represents the process
of assigning (labeling) every pixel in the image with an object class. The accuracy of
DL models for semantic segmentation can be represented either in pixel accuracy or
mean Intersection over Union (mIoU) [37]. A few modern efficient CNN models for
semantic segmentation are analyzed next, followed by the ones that are integrated
into robotic systems.
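As a reference for how this metric is typically computed (an illustrative sketch, not the evaluation code used later in the chapter), mIoU averages the per-class ratio of intersection to union between the predicted and ground-truth masks:

```python
import numpy as np

def mean_iou(pred: np.ndarray, target: np.ndarray, num_classes: int) -> float:
    """Mean Intersection over Union, skipping classes absent from both masks."""
    ious = []
    for c in range(num_classes):
        pred_c, target_c = (pred == c), (target == c)
        union = np.logical_or(pred_c, target_c).sum()
        if union == 0:
            continue  # class not present in either mask
        intersection = np.logical_and(pred_c, target_c).sum()
        ious.append(intersection / union)
    return float(np.mean(ious))

# Toy example with 2 classes on a 2x2 mask: (1/2 + 2/3) / 2 ≈ 0.583
pred = np.array([[0, 1], [1, 1]])
target = np.array([[0, 0], [1, 1]])
print(mean_iou(pred, target, num_classes=2))
```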
The first analyzed CNN model was ENet [17]. The authors improved the effi-
ciency of ResNet model by adding a faster reduction of feature map resolution with
either max-pooling or convolution with stride 2. Moreover, the batch normalization
layer was added after each convolution. The results showed that ENet achieved mIoU
accuracy close to state-of-the-art while having a much lower inference time (e.g., 21
FPS on Jetson TX1). Another efficient CNN model, entitled ERFNet, was proposed
in [18]. ERFNet further increased the efficiency of a residual block by splitting 2D
convolution into two 1D convolution layers. Each n × n convolution layer was split
into 1 × n, followed by ReLU activation and another n × 1 convolution. ERFNet
achieved higher accuracy than ENet, at the expense of some inference time (ERFNet–
11 FPS on TX1). The authors of [38] proposed an attention-based CNN for semantic
segmentation. Fast attention blocks represent the core contribution of that
paper. ResNet was utilized as the backbone network. The network was
evaluated on the Cityscapes dataset, where it achieved 75.0 mIoU while being imple-
mented on Jetson Nano and achieving 31 FPS. The authors of [39] proposed a CNN
model that integrates U-net [40] with ResNet’s skip connection. The convolution
layers were optimized with CP decomposition. Moreover, the authors proposed an
iterative algorithm for fine-tuning the ratio of compression and achieved accuracy.
At the end, the compressed network achieved astonishingly low inference time (25
FPS–Jetson Nano), with decent mIoU accuracy. The authors in [19] proposed to
utilize CNNs for semantic segmentation of crops and weeds. The efficiency of the
proposed network was mainly achieved by splitting the residual 5 × 5 convolution
layer into the following combination of convolution layers 1 × 1—5 × 1—1 ×
5—1 × 1. The proposed system was tested on an unmanned agriculture robot with
both 1080Ti NVidia GPU (20 FPS) and Jetson TX2 (5FPS). The novel semantic

Table 2 Overview of the CNN models utilized for robotic tasks

  CNN model          Vision task   Robotic task
  R-CNN [30]         Detection     Visual control
  ResNet-SSD [31]    Detection     Hand gesture detection used for control
  SSD [32]           Detection     Visual SLAM
  SSD [33]           Detection     Human detection and tracking
  YOLOv3 [34]        Detection     Bolt position detection algorithm
  YOLOv3 [35]        Detection     Human intention detection
  Mininet [41]       Sem. seg.     Visual SLAM
  ERFNet [42]        Sem. seg.     Person detection/free space representation
  ResNet50 [43]      Sem. seg.     Visual SLAM

segmentation CNN model (Mininet) was proposed in [41]. The main building block
included two subblocks; the first one has depthwise and pointwise convolution where
depthwise convolution was factorized into two layers with filter n × 1 and 1 × n,
and the second subblock includes Atrous convolution with a factor greater than 1.
Both subblocks included ReLU activation and Batch normalization. At the end of the
block, both subblocks were summed, and another 1 × 1 convolution was performed.
The proposed model achieves high accuracy at around 30 FPS on a high-end GPU. In
regard to robotic applications, the network was evaluated on efficient keyframe
selection for the ORB-SLAM2 method. The authors in [42] proposed a CNN model for
RGBD semantic segmentation. The system was based on a ResNet18 backbone with
a decoder that utilized ERFNet modules. The mobile robot had a Kinect2 RGBD camera,
and the proposed model was implemented on Jetson Xavier. The resulting system was
able to perform high-accuracy person detection with free space representation based on
floor discretization. Visual SLAM based on a depth map generated by a CNN model was
considered in [43]. The authors utilized a version of ResNet50 with improved infer-
ence time, so that it could be implemented on Jetson TX2 in a near real-time manner (16
FPS). An overview of all analyzed efficient CNN models utilized for different robotic
tasks is given in Table 2.

4 The Proposed Models of Efficient CNNs for Semantic Segmentation Implemented on Jetson Nano

As it can be seen from Sect. 3, numerous CNN models have been proposed for small
embedded devices that can be integrated into robotic systems. In Sect. 4, the authors
will describe the CNN models that will be trained and deployed to the NVidia Jetson

Nano single-board computer. Models will be trained on the Cityscapes dataset [44]
with images that have 512 × 256 resolution.
As a baseline model, we have utilized the CNN network proposed by the official
NVidia Jetson instructional guide for inference (real-time CNN vision library) [45].
The network is based on a fully convolutional network with a ResNet18 backbone
(entitled ResNet18_2222). Due to the limited processing power, the decoder of the
network is omitted, and the output feature map is of lower resolution. The baseline
model is created from several consecutive ResNet Basic Blocks (BB) and Basic
Reduction Blocks (BRB), see Fig. 2. When the padding and stride are symmetrical,
only one number is shown. The number of feature maps in each block is determined
by the level in which the block is, see Fig. 3.

Fig. 2 Blocks of layers utilized for ResNet18 and ResNet18_1D architectures



Fig. 3 ResNet18_2222 and ResNet18_1D_2300 architectures

The complete architecture of the baseline and selected 1D model with three levels
is presented in Fig. 3. As it can be seen, the architecture is divided into four levels.
In the baseline model, each level includes two blocks, either two BB or BB + BRB
(defined in Fig. 2). Size of the feature maps is given between each level and each
layer. For all architectures, the number of features per level is defined as follows:

level 1—64 features, level 2—128 features, level 3—256 features, and level 4—512
features, regardless of the number of blocks in each level.
The first modification we propose to the baseline model is to change the number
of blocks in each level. The intuition for this modification is twofold, (i) a common
method of increasing the efficiency of CNN models is rapid reduction of feature maps
resolution, and (ii) the prediction mask resolution can be increased by not reducing
input resolution (since we do not use decoder). Classes that occupy small spaces
(e.g., poles in the Cityscapes dataset) cannot be accurately predicted if the whole
image with 256 × 512 resolution is represented by a prediction mask of 8 × 16;
therefore, a higher resolution of the prediction mask can increase both accuracy and
mIoU measure.
The second set of CNN models that are trained and tested include the decomposi-
tion of 3 × 3 layer into 1 × 3 and 3 × 1 convolution layers. Two types of blocks—1D
Block (DB) and 1D Reduction Block (DRB), created from this type of layer can be
seen in Fig. 2. CNN models with 1D blocks are entitled ResNet_1D (or RN_1D).
One of the 1D ResNet models is shown in Fig. 3. This model includes only the first
three levels with a larger number of blocks per level compared to the baseline model.
Since there is one less level, the output resolution is larger with the output mask of
16 × 32.
Lastly, depthwise separable and pointwise convolutions are added to the 1D layers
to create a new set of blocks (Fig. 4). Additional important parameters for the separation
block are the number of feature maps at the input and the output of the level.

Fig. 4 Separable convolutional blocks



Fig. 5 ResNet_sep_4400 architecture

Another eight architectures named separable ResNet models (RN_sep) are created
using separable blocks. The example of the separable convolutional model with only
two levels is shown in Fig. 5.
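To make the block structure concrete, the following PyTorch sketch shows one possible implementation of the 1D Block (DB) from Fig. 2; the exact ordering of the BatchNorm/ReLU layers and other details of the original blocks may differ.

```python
import torch
import torch.nn as nn

class OneDBasicBlock(nn.Module):
    """Residual block with the 3x3 convolution factorized into 3x1 and 1x3 layers.

    A sketch of the chapter's 1D Block (DB); layer ordering is an assumption.
    """

    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=(3, 1), padding=(1, 0), bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=(1, 3), padding=(0, 1), bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.relu(self.body(x) + x)  # residual (skip) connection

# Example: a level-2 block with 128 feature maps on a 64 x 128 feature map
block = OneDBasicBlock(128)
print(block(torch.randn(1, 128, 64, 128)).shape)  # torch.Size([1, 128, 64, 128])
```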

5 Obstacle Avoidance Algorithm for Mobile Robots Based on Semantic Segmentation

After mobile robots receive high-level tasks that need to be performed (see, e.g.
[46–48]), a path planning step is required to ensure the safe and efficient execution
of tasks. Along the way, new obstacles can be detected in the defined plan; therefore,
local planning needs to occur to avoid collisions.
In this work, the efficient deep learning system is utilized to generate the semantic
map of the environment. Afterward, the semantic map is utilized to detect obstacles

in the mobile robot’s path. The considered mobile robot system moves in a hori-
zontal plane, and therefore the height of the camera remains the same for the whole
environment. Moreover, since the pose and intrinsic camera parameters are known,
it is possible to geometrically link the position of each group of pixels produced by
the semantic map to the position in the world frame. By exploiting the class of each
group of pixels, the mobile robot can determine how to avoid obstacles and reach the
desired pose. A mathematical and algorithmic explanation of the proposed system
is discussed next.
Mobile robot pose is defined by its position and orientation, included in the state
vector (5):

x = (z, x, θ)^T (5)

where x and z are mobile robot coordinates, and θ is the current heading angle. The
camera utilized by the mobile robot RAICO is tilted downwards by the inclination
angle α. The camera angles of view in terms of image height and width are denoted as γ_h
and γ_w, respectively. As mentioned in Sect. 4, the output semantic mask is smaller
than the input image; therefore, the dimensions of the output mask are defined with
its width (W ) and height (H) defined in pixels. The geometric relationships between
the output mask and the area in front of the mobile robot, in the vertical plane, can
be seen in Fig. 6.
If the output semantic mask pixel belongs to the class “floor”, we can conclude that
there is no obstacle in that area (e.g., between z1 and z2 in Fig. 6) of the environment,

Fig. 6 Camera geometric information defined in the vertical plane



Fig. 7 Camera geometric information defined in the horizontal plane

and the mobile robot can move to that part of the environment. The same view in the horizontal
plane is shown in Fig. 7.
In order to calculate the geometric relationships, the first task is to determine the
increment of the angle between the edges of the output pixels in terms of both camera
width (β_w) and height (β_h) by using (6) and (7):

β_w = γ_w / W, (6)

β_h = γ_h / H. (7)

Afterward, starting angles for width and height need to be determined by using
(8) and (9):

φ_h = (90 − α) − 0.5 γ_h, (8)

φ_w = 90 − 0.5 γ_w. (9)

Therefore, it is possible to calculate the edges of the area that is covered by each
pixel of the semantic map, defined with their z and x coordinates (10) and (11):

z_i = z_c + y_c · tan(φ_h + (i − 1) β_h), i = 1, ..., H + 1, (10)

x_ij = x_c − z_j / tan(φ_w + (i − 1) β_w), i = 1, ..., W + 1, j = 1, ..., H + 1. (11)
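The following NumPy sketch illustrates Eqs. (6)–(11); the camera values used in the example call are placeholders rather than RAICO's actual calibration, and y_c is assumed to denote the camera height above the floor.

```python
import numpy as np

def ground_grid(W, H, gamma_w, gamma_h, alpha, x_c, y_c, z_c):
    """Edges of the ground area covered by each semantic-mask pixel, Eqs. (6)-(11)."""
    beta_w = gamma_w / W                                   # Eq. (6)
    beta_h = gamma_h / H                                   # Eq. (7)
    phi_h = (90.0 - alpha) - 0.5 * gamma_h                 # Eq. (8)
    phi_w = 90.0 - 0.5 * gamma_w                           # Eq. (9)

    i_h = np.arange(H + 1)                                 # row-edge indices ((i - 1) in the text)
    z = z_c + y_c * np.tan(np.radians(phi_h + i_h * beta_h))                   # Eq. (10)

    i_w = np.arange(W + 1)                                 # column-edge indices
    x = x_c - z[None, :] / np.tan(np.radians(phi_w + i_w[:, None] * beta_w))   # Eq. (11)
    return z, x

# Placeholder camera values: 64 x 32 output mask, 62 x 49 degree field of view,
# camera tilted down by 30 degrees and mounted 120 mm above the floor.
z_edges, x_edges = ground_grid(W=64, H=32, gamma_w=62.0, gamma_h=49.0,
                               alpha=30.0, x_c=0.0, y_c=120.0, z_c=0.0)
print(z_edges.shape, x_edges.shape)   # (33,) (65, 33)
```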



An example of the map generated by the mobile robot is shown in Fig. 8, where
the green area is accessible while obstacles occupy the red areas.
The width that a mobile robot occupies while moving is defined by its width B (see
Fig. 8). Since the obstacles that are too far away from the robot do not influence the
movement, we defined threshold distance D. The area defined with B and D is utilized
to determine if the obstacle avoidance procedure needs to be initiated. Therefore, if
any of the pixels that correspond to this area include obstacles, the obstacle avoidance
procedure is initiated. The whole algorithm utilized for both goal-achieving behavior
and obstacle avoidance is represented in Fig. 9.

Fig. 8 Representation of the free and occupied areas in the environment

Fig. 9 State-action transitions algorithm within obstacle avoidance algorithm



Fig. 10 Examples of the trajectory the mobile robot will take with and without obstacles

It is assumed that the mobile robot is localized (i.e., the initial pose of the mobile
robot is known), and the desired pose is specified. Therefore, the initial plan is to
rotate the mobile robot until it is directed towards the desired position and perform translation
until the position is reached. If an obstacle is detected within the planned path (according to
the robot width), the mobile robot performs an additional obstacle avoidance strategy
before computing new control parameters to achieve the desired pose.
There are five states (S1 –S5 ) in which a mobile robot can be and two actions it
can take. The actions that mobile robot performs are translation or rotation. At the
start of the movement procedure, the mobile robot is in state S1 , which indicates
that rotation to the desired pose needs to be performed. After the rotation is finished,
obstacle detection is performed. If the obstacle is not detected (O = 0), the mobile
robot transitions to state S2 ; otherwise, it transitions to state S3 . Within state S2 , the
mobile robot performs translational movement until the desired position is reached
or until the dynamic obstacle is detected in the robot’s path. In state S3 , the mobile
robot calculates a temporary goal (new goal) and rotates until there is no obstacle in
its direction. Afterward, it transitions to state S4 , where the mobile robot performs
translation until the temporary goal is achieved or a dynamical obstacle is detected.
If the translation is completed, the mobile robot starts rotating to the desired position
(S1 ). On the other hand, if the obstacle is detected in the S4 , the robot transitions to
S3 and generates a new temporary goal. The obstacle avoidance process is performed
until the mobile robot achieves state S5 , indicating the desired pose’s achievement.
An example of the mobile robot’s movement procedure with and without obstacles
is shown in Fig. 10.
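A simplified sketch of the S1–S5 transition logic described above is given below; the obstacle and motion flags are assumed to come from the semantic-map corridor check (area B × D) and the low-level controller, respectively.

```python
from enum import Enum, auto

class State(Enum):
    S1_ROTATE_TO_GOAL = auto()
    S2_TRANSLATE_TO_GOAL = auto()
    S3_ROTATE_AWAY_FROM_OBSTACLE = auto()
    S4_TRANSLATE_TO_TEMP_GOAL = auto()
    S5_GOAL_REACHED = auto()

def next_state(state, obstacle_detected, motion_finished, goal_reached):
    """One step of the S1-S5 transition logic (a simplified sketch)."""
    if goal_reached:
        return State.S5_GOAL_REACHED
    if state == State.S1_ROTATE_TO_GOAL and motion_finished:
        # After rotating towards the goal, check the corridor for obstacles
        return State.S3_ROTATE_AWAY_FROM_OBSTACLE if obstacle_detected else State.S2_TRANSLATE_TO_GOAL
    if state == State.S2_TRANSLATE_TO_GOAL and obstacle_detected:
        return State.S3_ROTATE_AWAY_FROM_OBSTACLE
    if state == State.S3_ROTATE_AWAY_FROM_OBSTACLE and not obstacle_detected:
        return State.S4_TRANSLATE_TO_TEMP_GOAL
    if state == State.S4_TRANSLATE_TO_TEMP_GOAL:
        if obstacle_detected:
            return State.S3_ROTATE_AWAY_FROM_OBSTACLE
        if motion_finished:
            return State.S1_ROTATE_TO_GOAL  # temporary goal reached, re-aim at the desired pose
    return state
```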

6 Experimental Results

The experimental results are divided into two sections. The first includes the results
of training of deep learning models, while the second one involves utilizing the best
model within an obstacle avoidance algorithm.
All CNN models are trained and tested on the same setup to ensure a fair compar-
ison. Models have been trained on the Cityscapes dataset [44] with input images
of 512 × 256 resolution. Low-resolution images are selected since the used NVidia

Jetson Nano has the lowest level of computation power out of all NVidia Jetson
devices (see Table 1). Models are trained on a deep learning workstation with three
NVidia Quadro RTX 6000 GPUs and two Xeon Silver 4208 CPUs using the PyTorch
v1.6.0 framework. All analyzed models are compared based on two metrics, the mIoU
and FPS achieved on Jetson Nano. At the same time, global accuracy and model size
are also shown to compare the utilized models better. All networks are converted to
TensorRT (using the ONNX format) with FP16/INT8 precision to reduce the models'
inference time.
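As an illustration of this deployment path (a sketch with a stand-in model and placeholder file names, not the chapter's actual export script), a trained PyTorch model can be exported to ONNX and then built into a TensorRT engine on the Jetson:

```python
import torch
import torch.nn as nn

# Stand-in model; in practice this would be one of the trained RN_* variants.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(16, 21, 1),                    # 21 output classes is a placeholder
).eval()

dummy = torch.randn(1, 3, 256, 512)          # 512 x 256 input resolution used in the chapter
torch.onnx.export(model, dummy, "model.onnx", opset_version=11,
                  input_names=["image"], output_names=["mask"])

# On the Jetson, the ONNX file can then be built into a TensorRT engine, e.g.:
#   trtexec --onnx=model.onnx --fp16 --saveEngine=model.trt
```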
Table 3 includes all the variations of all three CNN models, whose detailed expla-
nation is provided in Sect. 4. The experiment is proposed not to change the number
of used blocks for each network but only to change their position within four levels.
Since the networks need to be tested on a real-world mobile robot instead of FLOPs,
we compare the efficiency of the networks in FPS. Also, the network size in MB is
also provided.
The CNN with the best mIoU value is the model RN_8000. The model with the
lowest memory footprint is RN_1D_8000, with its size being only 1.6 MB. The
model with the fastest inference time represented in FPS is RN_sep_1115. However,
since the primary motivation for these experiments was to determine the best network
with regard to the ratio of FPS and mIoU, the network selected for utilization in the
obstacle avoidance process is RN_2600, since it achieves both a high level of accuracy
and a high number of FPS. The primary motivation for training the CNNs on the Cityscapes
dataset is its popularity and complexity. Afterward, the selected network is trained
again on the Sun indoor dataset [49] to be used in mobile robot applications.
By utilizing the algorithm proposed in Sect. 5, the obstacle avoidance ability
of the mobile robot RAICO is experimentally evaluated (Fig. 11). The mobile robot is
positioned on the floor within the ROBOTICS&AI laboratory.
The mobile robot is set to the initial pose x = (0, 0, 0), while the desired pose is set to x_d
= (600, 100, −0.78). The change in pose of the mobile robot is calculated according to
dead-reckoning odometry by utilizing wheel encoders [50]. A spherical obstacle
is set at position (300, 50) with a diameter of roughly 70 mm. The proposed
algorithm is started, and the mobile robot achieves the trajectory shown in Fig. 12.
The mobile robot representation is shown with different colors for the different states (see
Sect. 5): S1 is red, S2 is blue, S3 is dark yellow, and S4 is dark purple. Moreover, the
obstacle is indicated with a blue-filled circle. The desired and final positions are shown
with black and green dots, respectively.
Moreover, selected images the mobile robot acquired, with the semantic segmentation
masks generated by the CNN overlaid on them, can be seen in Fig. 13. As
can be seen, segmentation of the floor (red), walls (teal), chairs (blue), tables (black), and
objects (yellow) is performed well, with precise edges between the mentioned classes.
By utilizing accurate semantic maps, the mobile robot was able to avoid the obstacle
and successfully achieve the desired pose.
Figure 14 shows four examples of the influence of the semantic maps
on the free and occupied areas in the environment generated during the experimental
evaluation. The green area is free and corresponds to the floor class, while the red
occupied area corresponds to all other classes. In the first image, the mobile robot

Table 3 Experimental results for all CNN models

  Num  Model title    Blocks per level  Output resolution  Model size [MB]  FPS   mIoU [%]  Accuracy [%]
  1    RN_2222        [2 2 2 2]         8 × 16             46.0             44.7  28.674    77.816
  2    RN_1133        [1 1 3 3]         8 × 16             67.6             46.0  27.672    77.255
  3    RN_1115        [1 1 1 5]         8 × 16             95.3             46.8  27.060    76.541
  4    RN_2330        [2 3 3 0]         16 × 32            17.3             43.4  36.091    81.543
  5    RN_1160        [1 1 6 0]         16 × 32            28.5             44.7  34.508    81.089
  6    RN_4400        [4 4 0 0]         32 × 64            5.7              41.9  40.350    83.484
  7    RN_2600        [2 6 0 0]         32 × 64            7.5              42.6  40.668    83.651
  8    RN_8000        [8 0 0 0]         64 × 128           2.4              38.7  40.804    84.589
  9    RN_1D_2222     [2 2 2 2]         8 × 16             33.7             41.4  28.313    77.236
  10   RN_1D_1133     [1 1 3 3]         8 × 16             48.2             41.7  27.304    77.115
  11   RN_1D_1115     [1 1 1 5]         8 × 16             66.6             42.2  26.854    76.420
  12   RN_1D_2330     [2 3 3 0]         16 × 32            12.3             40.9  34.819    80.928
  13   RN_1D_1160     [1 1 6 0]         16 × 32            19.8             41.3  33.937    80.693
  14   RN_1D_4400     [4 4 0 0]         32 × 64            4.0              40.2  39.065    83.304
  15   RN_1D_2600     [2 6 0 0]         32 × 64            5.2              40.8  39.212    83.061
  16   RN_1D_8000     [8 0 0 0]         64 × 128           1.6              37.7  37.947    84.058
  17   RN_sep_2222    [2 2 2 2]         8 × 16             25.6             46.4  28.901    78.000
  18   RN_sep_1133    [1 1 3 3]         8 × 16             41.3             48.0  28.563    77.383
  19   RN_sep_1115    [1 1 1 5]         8 × 16             61.3             48.7  27.909    77.250
  20   RN_sep_2330    [2 3 3 0]         16 × 32            10.7             42.9  36.452    81.834
  21   RN_sep_1160    [1 1 6 0]         16 × 32            18.8             44.7  35.172    81.018
  22   RN_sep_4400    [4 4 0 0]         32 × 64            3.8              39.1  39.990    83.894
  23   RN_sep_2600    [2 6 0 0]         32 × 64            5.0              40.1  39.529    83.125
  24   RN_sep_8000    [8 0 0 0]         64 × 128           1.8              34.5  38.010    84.062

detects the obstacle in its path and then rotates to the left until it can avoid the
obstacle. The second image corresponds to the moment the obstacle almost leaves
the robot's field of view due to its translational movement. The third image represents
the moment at the end of the obstacle avoidance state, and the last image is generated
near the final pose. By analyzing the images in the final pose, it can be seen that the mobile
robot accurately differentiates between the free space (floor) in the environment and (in
this case) the wall class that represents the occupied area. This indicates that it is
possible to further utilize CNN models in conjunction with the proposed free space
detection algorithm to define the areas in which the mobile robot can safely perform
the desired tasks.

Fig. 11 Mobile robot RAICO with the obstacle in the environment

Fig. 12 Real trajectory the mobile robot achieved (mobile robot trajectories; Z axis [mm] vs. X axis [mm])

7 Discussion of Results

The experimental results are divided into two sections, one regarding finding the
optimal CNN model in terms of both accuracy and inference speed and the other
regarding experimental verification with the mobile robot. Within the first part of
the experimental evaluation, three types of CNN models with a different number of
layers in each level are analyzed. The experimental results show that the best network
(RN_8000) in terms of accuracy is the one with all layers concentrated in the first

Fig. 13 Mobile robot perception during movement



Fig. 14 Mobile robot obstacle detection during movement

level. This type of CNN has the highest output resolution, which is the main reason
why it provides the overall best results. Moreover, the general trend is that networks
with fewer levels have higher accuracy (the best CNN model has 40.8 mIoU and
84.6 accuracy). Regarding the model size, it is shown that networks with many layers
concentrated in the fourth level occupy more disk space (the largest model occupies
96 MB of disk space compared to the smallest model, which occupies only 1.6 MB).

The main reason for this occurrence is that the higher levels include a larger number
of filters. However, the largest CNN models also have the lowest inference time and,
therefore, the highest number of FPS, reaching even 48.7 average FPS. Moreover, it
is shown that both proposed improvements of the network (depth separable and 1D
convolution) show a marginal decrease in inference time at the expense of a slight
decrease in accuracy. By exploiting the ratio between the accuracy and inference
time, the RN_2600 CNN model is utilized in the experiment with a mobile robot.
This network achieved the second-best accuracy and is 4 FPS faster than the network
with the best accuracy. Moreover, since modern SSDs or micro-SD cards are readily
available with capacities much larger than the proposed model sizes in MB,
it can also be concluded that the disk space the models occupy is not a substantial
restriction.
On the other hand, the main achievement of this chapter is shown through the
experiment with the mobile robot RAICO as it performed obstacle avoidance to
demonstrate a case study for the utilization of an accurate and fast CNN model.
The model is employed within the obstacle detection and avoidance algorithm. The
output of the network is processed and generates the output semantic segmentation
masks. Afterward, the geometric relationship between the camera position and its
parameters is utilized to determine the free area in the environment. If the obstacle
is detected close to the mobile robot's path, the algorithm transitions the mobile robot
from the goal-achieving states to the obstacle avoidance states. The mobile robot avoids the obstacle
and transitions back to the goal-achieving states, all while checking for new obstacles
in the path. The experimental evaluation reinforces the validity of the proposed algorithm,
which can, in conjunction with the CNN model, successfully avoid obstacles and
achieve the desired position with a satisfactory error.
algorithm has shown accurate free space detection, it can be further utilized within
other mobile robotic tasks.

8 Conclusion

In this work, we propose an efficient deep learning model employed within an
obstacle avoidance algorithm. The CNN model is used in real time on the Jetson Nano
development board. The utilized CNN model is inspired by the ResNet model inte-
grated with depth separable convolution and 1D convolution processes. We proposed
and trained 24 variants of CNN models for semantic segmentation. The best model
is selected according to the ratio of mIoU measure and the number of FPS it achieves
on Jetson Nano. The selected model is RN_2600 with two levels of layers and it
achieves 42.6 FPS with 40.6 mIoU. Afterward, the selected CNN model is employed
in the novel obstacle avoidance algorithm. Within obstacle avoidance, the mobile
robot has four states. Two states are reserved for goal achieving and two for obstacle
avoidance purposes. According to the semantic mask and the known camera pose, the
area in front of the mobile robot is divided into free and occupied sections. According to
those areas, the mobile robot transitions between goal-seeking and obstacle avoidance

states during the movement procedure. The experimental evaluation shows that the
mobile robot managed to avoid the obstacle successfully and achieve the desired position
with an error of −15 mm in the Z direction and 23 mm in the X direction, measured
according to the wheel encoder data.
Further research directions include the adaptation of the proposed CNN models and
their implementation on an industrial-grade mobile robot with additional computa-
tional resources. The proposed method should be a subsystem of the entire mobile
robot decision-making framework.

Acknowledgements This work has been financially supported by the Ministry of Education,
Science and Technological Development of the Serbian Government, through the project “Inte-
grated research in macro, micro, and nano mechanical engineering–Deep learning of intelligent
manufacturing systems in production engineering”, under the contract number 451-03-47/2023-
01/200105, and by the Science Fund of the Republic of Serbia, Grant No. 6523109, AI-MISSION4.0,
2020-2022.

Appendix A

Abbreviation List

RAICO Robot with Artificial Intelligence based COgnition


AI Artificial Intelligence
ML Machine Learning
ANN Artificial Neural Networks
DL Deep Learning
CNN Convolutional Neural Network
FLOPs FLoating Point Operations
FPS Frames Per Second
VGG Visual Geometry Group
R-CNN Region–Convolutional Neural Network
SSD Single Shot Detector
YOLO You Only Look Once
SLAM Simultaneous Localization And Mapping
LSTM Long Short-Term Memory
RGBD Red Green Blue Depth
BN Batch Normalization
ReLU Rectified Linear Unit
BB Basic Block
BRB Basic Reduction Block
DB 1D Block
DRB 1D Reduction Block
SB Separation Block

SRB Separation Reduction Block


RN ResNet
mIoU Mean Intersection over Union
ONNX Open Neural Network eXchange.

UAV Path Planning Based on Deep
Reinforcement Learning

Rui Dong, Xin Pan, Taojun Wang, and Gang Chen

Abstract UAVs are now widely used for both military and civil purposes. The rotor UAV in
particular, with its vertical take-off and landing capability, six degrees of freedom and ability
to hover in the air, has become a working platform for many environments and missions thanks
to its high mobility. When a UAV performs an autonomous flight mission, it encounters both
static and dynamic obstacles, so research on effective obstacle avoidance and path planning
technology for unknown environments is very important. Traditional path planning technology
relies on map information and highly real-time algorithms, which require huge storage space and
computing resources. In this chapter, the authors study deep reinforcement learning algorithms
for UAV path planning. In view of the challenges currently faced by UAVs flying autonomously in
obstacle environments, this chapter proposes an improved DQN algorithm combined with artificial
potential fields and establishes a reward function that evaluates the behavior of the UAV and
guides it to reach the target point as soon as possible while avoiding obstacles. The network
structure, state space, action space and reward function of the DQN algorithm are designed, and
a UAV reinforcement learning path planning system is established. In order to better verify the
advantages of the algorithm proposed in this chapter, a comparative experiment between the
improved DQN algorithm and the original DQN algorithm is carried out. The path planning performance of
the two algorithms in the same environment is then compared, with the loss function and success
rate selected as the comparison criteria. The experimental results show that the improved DQN
algorithm is faster and more stable than the original DQN algorithm for UAV path planning, which
verifies the superiority of the improved DQN algorithm for path planning.

Keywords Rotor UAV · Path planning · Trajectory planning · Deep reinforcement
learning · DQN

R. Dong · X. Pan · T. Wang · G. Chen (B)
State Key Laboratory for Strength and Vibration of Mechanical Structures, School of Aerospace
Engineering, Xi’an Jiaotong University, No. 28, Xianning West Road, Xi’an 710049, Shaanxi, China
e-mail: [email protected]
URL: http://gr.xjtu.edu.cn/web/aachengang
R. Dong, e-mail: [email protected]
X. Pan, e-mail: [email protected]
T. Wang, e-mail: [email protected]

1 Introduction

1.1 Research Background and Significance

The multirotor has the function of vertical take-off and landing, has six degrees
of freedom, and can hover in the air. UAVs can provide many conveniences for
human society. In the military field, UAVs can be used for tasks such as target strike,
reconnaissance monitoring, and information communication; in the civilian field, they are more
and more widely used for security inspections, plant protection, epidemic prevention and control,
aerial photography, and other applications.
The development of artificial intelligence and information technology has put
forward higher and higher requirements for the autonomous intelligence of UAVs.
UAVs need path planning technology to perform various tasks autonomously. The
task of path planning is to obtain a path from the starting point to the target point,
preventing collisions with obstacles and making the path as short as possible. UAVs
have six degrees of freedom and can move in all directions in the air, so two-
dimensional path planning cannot meet the needs of UAVs. For 3D environments,
there are a lot of complex structures and uncertainties, especially in complex envi-
ronments such as forests, caves, and cities, and efficient 3D path planning algorithms
are required. However, research on 3D path planning algorithms for UAVs is full of challenges.
Since the birth of UAV path planning algorithms, their adaptive stability in scenes with high
environmental complexity, unstable external lighting, and highly dynamic obstacles has always
been a difficult point in this field.
UAVs are usually equipped with depth cameras and lidars to perceive the
surrounding environment, establish environmental maps through vision and laser
SLAM (Simultaneous Localization and Mapping) technologies, map the environ-
mental maps into a form that can be processed by computers, and then perform path
planning. This structured design is currently the most effective solution in the field of
autonomous driving. However, path planning in an unknown environment without map information is
more complicated than in a known environment. When the environment is unknown, complex obstacles
and unexpected events force the UAV to rely on the data collected from its sensors and on the
efficiency of the algorithm to quickly decide a passable path, avoid obstacles, and navigate to
the target location. Therefore, when there is an obstacle in the flight direction of the UAV,
the path planning algorithm must not only decide to move in a certain direction to
avoid the obstacle, but also comprehensively consider the overall path condition to
form a good trajectory, which requires a shorter route length and less time spent. In
recent years, R&D technicians have been continuously studying the use of different
algorithms and methods to deal with problems in path planning, so that the path plan-
ning of UAVs can achieve the best results. Research on effective obstacle avoidance
and path planning technology for unknown environments is crucial for the autonomous flight of
UAVs.

1.2 Research Status

• Classical path planning algorithm

Path planning is an important basis for drones to achieve autonomous flight tasks.
The purpose is to obtain a global path from the starting point to the target location,
reduce collisions with obstacles, and make the path as short as possible. At present, the
classical path planning algorithms are mainly divided into three categories, namely
artificial potential field, heuristic search and sampling-based algorithms. The artificial
potential field method was published by Khatib [1] in 1985. By defining a potential field
function, a potential field is artificially assigned to each point in the space: obstacles in the
potential field exert a repulsive force on the mobile robot, while the target point attracts it,
so the robot drives toward the attractive target point and effectively avoids collisions thanks
to the repulsive force of the obstacles [2].
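To make this idea concrete, the following minimal Python sketch (not taken from [1] or [2]; the
gains k_att and k_rep and the influence radius rho0 are illustrative assumptions) computes the
resultant attractive and repulsive force acting on a robot at position q:

```python
import numpy as np

def apf_force(q, q_goal, obstacles, k_att=1.0, k_rep=100.0, rho0=2.0):
    """Resultant artificial potential field force at position q (2D or 3D arrays)."""
    # The goal attracts the robot with a force proportional to the distance.
    f = k_att * (q_goal - q)
    for q_obs in obstacles:
        rho = np.linalg.norm(q - q_obs)              # distance to this obstacle
        if 0.0 < rho <= rho0:                        # only nearby obstacles repel
            # Negative gradient of the repulsive potential 0.5*k_rep*(1/rho - 1/rho0)^2
            f += k_rep * (1.0 / rho - 1.0 / rho0) * (q - q_obs) / rho**3
    return f

# Example: one obstacle between the robot and the goal.
q = np.array([0.0, 0.0])
print(apf_force(q, q_goal=np.array([5.0, 0.0]), obstacles=[np.array([1.0, 0.5])]))
```

At each iteration the robot takes a small step along f; if the attractive and repulsive forces
cancel, f vanishes and the robot stops, which is exactly the local minimum problem discussed next.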
The design of the algorithm has the disadvantage that it is easy to fall into the
local extremum. When the attractive and repulsive forces received by the mobile
robot in the potential field cancel out, the mobile robot will stop moving. Secondly,
the artificial potential field does not introduce kinematic and dynamic constraints.
The mobile robot flies in the direction of the resultant force in the configuration
space without any flight angle restrictions. Therefore, the planned trajectory does not conform
to the dynamic model, and the UAV cannot actually fly the planned trajectory. Mabrouk in [3]
proposed a new extended artificial potential field method that uses dynamic internal agent
states. The internal state is modeled as a coupled first-order dynamic system whose equations
manipulate the agent’s potential field; the internal state dynamics are driven by the interaction
between the agent and the external environment, so local equilibria of the potential field can be
shifted from stable to unstable, allowing escape from local minima in the potential field.
This new method successfully solves the complex maze problem with multiple local
minima that cannot be completed by traditional artificial potential fields.
The method of heuristic search is also a representative class of path planning
algorithms. The method is based on a sampling strategy to discretize the configu-
ration space and transform the path search problem into a graph search, which is
easier to handle than the continuous problem. Dijkstra’s algorithm [4] can quickly
plan the shortest path. Its main idea is to find the unvisited node with the shortest current
distance, mark it as visited, update the distances of its adjacent nodes, and repeat this loop
until all nodes are visited. The algorithm is applicable to graphs where all edge weights are
non-negative. The A* search algorithm improves on Dijkstra’s algorithm by combining two parts,
equal-cost search and heuristic search, to comprehensively evaluate the cost of the path already
traversed and the estimated cost of the path still to be searched. Because of the heuristic idea,
path planning stores fewer nodes and consumes less time [5].
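For illustration, a minimal grid-based A* sketch in Python is given below; the 4-connected grid
and the Manhattan-distance heuristic are assumptions made for this example rather than the exact
formulation of [5]:

```python
import heapq

def a_star(grid, start, goal):
    """A* on a 2D occupancy grid (0 = free, 1 = obstacle); returns a list of cells or None."""
    def h(p):                                    # heuristic: Manhattan distance to the goal
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])

    open_list = [(h(start), 0, start, None)]     # entries: (f = g + h, g, node, parent)
    g_cost, parents = {start: 0}, {}
    while open_list:
        f, g, node, parent = heapq.heappop(open_list)
        if node in parents:                      # node already expanded (closed list)
            continue
        parents[node] = parent
        if node == goal:                         # goal reached: backtrack along parents
            path = [node]
            while parents[path[-1]] is not None:
                path.append(parents[path[-1]])
            return path[::-1]
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (node[0] + dx, node[1] + dy)
            if (0 <= nxt[0] < len(grid) and 0 <= nxt[1] < len(grid[0])
                    and grid[nxt[0]][nxt[1]] == 0
                    and g + 1 < g_cost.get(nxt, float("inf"))):
                g_cost[nxt] = g + 1
                heapq.heappush(open_list, (g + 1 + h(nxt), g + 1, nxt, node))
    return None

# Example: 3 x 3 grid with one obstacle in the middle.
print(a_star([[0, 0, 0], [0, 1, 0], [0, 0, 0]], (0, 0), (2, 2)))
```

Here g is the cost already traversed and h estimates the remaining cost, matching the two-part
evaluation described above.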
The A* search algorithm has shortcomings similar to Dijkstra’s algorithm. It requires the nodes
in the environment to be continuously expanded from the starting point through a certain strategy,
and the expanded nodes are saved in the open list and closed list, on which search and comparison
operations are continuously performed; search strategies in high-dimensional spaces therefore
require large memory and computational resources. Podsedkowski compares a variety of
heuristic functions, discretizes the nodes in the space, and when a new obstacle is
detected, the obstacle node and the node related to the obstacle are removed from
the open list, so that the algorithm’s performance is improved. The author in [6,
7] proposed an improved A * algorithm, which was applied to automated guided
vehicles by traversing all nodes on the path and removing unnecessary nodes and
connections. Sedighi in [8] combined the A* search method with visibility graph search, introduced
an application-aware cost function, and used the derived shortest path to provide the correct
waypoints so that A* plans the optimal path with respect to nonholonomic constraints.
In addition to the artificial potential field and heuristic search algorithms, the other kind is
the sampling-based algorithm. In 1998, LaValle proposed the Rapidly-exploring Random Trees (RRT)
algorithm. By randomly sampling the free space and expanding outward from the starting point, it
can always find a feasible path, although not necessarily the shortest one [9]. Karaman [10]
proved that RRT is not asymptotically optimal. Kavraki proposed the roadmap algorithm PRM
(Probabilistic Roadmaps) [11], which mainly builds a roadmap through sampling and then applies
the A* algorithm for path finding. Karaman also proposed methods based on asymptotically optimal
sampling, including RRG (Rapidly-exploring Random Graph), PRM* and RRT*; as the number of samples
increases, the solution converges to the global optimum. RRG is an extension of the RRT algorithm
that connects new samples not only to the nearest node but also to all other nodes in range, and
searches for a path after building the graph. The PRM* algorithm attempts to connect a range of
roadmap vertices, and RRT* is an asymptotically optimal form of RRT that uses a rewiring mechanism
to locally reconnect nodes in the tree and maintain the shortest path from the root node to each
leaf node.
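The basic RRT expansion loop can be sketched as follows (a 2D toy version in Python; the workspace
bounds, the fixed step size and the user-supplied collision_free edge check are illustrative
assumptions):

```python
import math
import random

def rrt(start, goal, collision_free, step=0.5, max_iter=5000, goal_tol=0.5):
    """Basic RRT: grow a tree from start until a node falls inside the goal region."""
    nodes, parent = [start], {start: None}
    for _ in range(max_iter):
        # Sample a random configuration, with a small goal bias (5 % of the time).
        q_rand = goal if random.random() < 0.05 else (random.uniform(0, 20),
                                                      random.uniform(0, 20))
        # Extend the nearest tree node one step of length `step` toward the sample.
        q_near = min(nodes, key=lambda q: math.dist(q, q_rand))
        d = math.dist(q_near, q_rand)
        if d == 0.0:
            continue
        q_new = (q_near[0] + step * (q_rand[0] - q_near[0]) / d,
                 q_near[1] + step * (q_rand[1] - q_near[1]) / d)
        if collision_free(q_near, q_new):        # user-supplied edge collision check
            nodes.append(q_new)
            parent[q_new] = q_near
            if math.dist(q_new, goal) < goal_tol:
                path = [q_new]                   # goal region reached: backtrack
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
    return None

# Example in an empty 20 x 20 workspace (every edge is collision-free).
print(rrt((1.0, 1.0), (18.0, 18.0), collision_free=lambda a, b: True))
```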
Webb et al. [12] fixed the final state and free final time optimal controllers in
combination with the RRT* method to ensure the asymptotic optimality and dynam-
ical feasibility of the path. In this method, the optimal state transition trajectory
connecting the two states is computed by solving a linear quadratic regulator problem.
Bry and Roy [13] proposed another method combining RRG and belief roadmap.
In this method, a partial order is introduced to weigh beliefs and distances while
expanding the graph in the belief space. There are also some improvements that
can speed up the convergence rate, such as RRT*-Smart [14], informed RRT* [15],
which shows some advantages over the classical RRT algorithm in various scenarios.
Based on the sampling algorithm, the FAST lab at ZJU [16] proposed a lightweight but
effective topology-guided motion dynamics planner, TGK-Planner, for fast quadrotor
flight with limited on-board computational resources. The system follows the tradi-
tional hierarchical planning workflow and uses a novel design to improve the robust-
ness and efficiency of the pathfinding and trajectory optimization submodules. The
method proposes a topology-guided graph, which guides the state sampling of the
dynamic planner through the topology of the rough environment, thereby improving
the efficiency of exploring safe and dynamically feasible trajectories, and integrates
the proposed system into a fully autonomous quadrotor aircraft and verified in a
variety of simulated and real-world scenarios.
• Bionic algorithm and fusion algorithm
Traditional path planning algorithms solve the problem of finding a passable path and
can control mobile robots to avoid obstacles, but they all have common defects. The
traditional path planning method requires a lot of data, so it needs a lot of storage and
computing space. In order to reduce the demand for computer hardware and reduce
the consumption of storage space, researchers often propose innovative solutions,
such as bionic algorithms and fusion algorithms.
Genetic algorithm is a random search algorithm inspired by the mechanism of biological evolution
in nature. This method abstracts the problem to be solved into chromosomes, defines a population
of chromosomes, and evaluates all individuals in the population through the adaptability of their
chromosomes to the environment, so as to guide the evolution of the population and finally iterate
toward the optimal solution of the problem. The algorithm generates the next generation of
offspring by crossover operators, and there are many crossover operators for each type of
chromosome representation, associated with different types of optimization problems. The crossover
operations designed for permutation-based combinatorial optimization problems are computationally
more expensive than other cases. This is mainly because duplicate numbers are not allowed in
chromosomes, so the offspring need to be legalized after each substring exchange. The time required
to perform the crossover operation increases significantly with the chromosome size, which can
seriously affect the efficiency of these genetic algorithms.
Koohestani [17] proposed a genetic algorithm in the form of partial map crossover
substitution, which represented the path as a chromosome. Numerical experimental
results on a benchmark problem show that using this crossover operator can improve
the effectiveness of permutation-based genetic algorithms and help efficiently handle
path planning problems. Lamini et al. [18] proposed an improved crossover operator for a
genetic-algorithm-based path planner that prevents premature convergence and yields feasible
offspring paths with better fitness values than their parents, making the algorithm converge
faster. A new fitness function that considers distance, stability and energy is also
described. In order to verify the effectiveness of the scheme, it is applied to many
different environments, and the results show that the genetic algorithm with improved
crossover operator and fitness function is beneficial to the solution of the optimal
solution compared with other methods. Combining genetic algorithm with artificial
potential field technology can overcome the problem that artificial potential field
algorithm is easy to fall into local minimum. Li et al. [19] adopted the idea of
integrating two algorithms to rasterize the environment. First, the genetic algorithm
was used for path planning. On this basis, the artificial potential field method was
used for local dynamic obstacle avoidance, which was able to handle local minima.
It improves the search efficiency and the ability to solve path planning problems.
The swarm intelligence algorithm that integrates bionic and artificial intelligence
technology is also a popular research direction. It mainly simulates the food-seeking
behavior of groups represented by fish swarms, bee swarms and bird flocks, and optimizes the
direction of the search by accumulating the experience of all members of the swarm. Liang and
Lee [20] provided an artificial bee colony path planning method for swarm robots: using the
artificial bee colony objective function, the robots can reach the target point without collision,
and a real-time sharing strategy and a method of adjusting the swarm size are proposed so that
robots avoid obstacles and other members of the group. The ant colony algorithm is an intelligent
optimization algorithm, which imitates
the behavior of ants to find paths according to the concentration of pheromone.
Because of its advantages of good feedback information, strong robustness and strong
distributed computing ability, it has been applied to the path planning of mobile
robots. It also has the problem of slow convergence. Akka and Khaber [21] optimized
the ant colony algorithm, adding stimulus probability while pheromone concentration
guided the selection network, expanding the exploration range and improving the
visibility accuracy. And the improved algorithm introduces new pheromone update
rules and dynamic regulation of evaporation rate, which improves the convergence
rate and expands the search space.
Su et al. [22] designed an improved ant colony algorithm to solve the problem
that the traditional ant colony algorithm is prone to path redundancy and easy to fall
into the local optimal solution. First, analyze the process of ant movement, because
they choose the path base on probability, and the path has redundancy, backward, and
wave-like forward, so it is difficult to find the optimal path. To solve this problem, path
correction is used to modify the path to the target point, thus effectively improving the
convergence of the ant colony algorithm and obtaining a shorter optimal path, while
avoiding the pheromone update of some redundant paths to affect the probability
of later ant colony path selection. For the obtained optimal path, the path nodes are
optimized to improve the path smoothness. After the simulation test, the improved
ant colony algorithm has better convergence and fewer nodes than the conventional
ant colony algorithm, which is more in line with the actual needs of robot motion. In
order to improve the performance of bird flock search, Cheng, Wang and Xiong [23]
proposed a new improved bird flock search algorithm. In this algorithm, the search
range is expanded by adding exploration strategies. At the same time, the control
parameters of step size and discovery probability are adaptively adjusted through the
improvement rate of the solution to the optimal value.
Fuzzy logic is a method that does not give a crisp result but rather a range of values. It is an
implicit control strategy commonly used in motion control systems.
Khaksar, Hong, Khaksar and Motlagh [24] proposed an algorithm based on real-
time sampling. To evaluate the generated samples, the genetic algorithm strategy
is adopted to improve the controller parameters, and the scope of application of the
algorithm is improved. Xiang [25] proposed an improved dynamic window algorithm
DWA (Dynamic Window Approach), adding the weight coefficient of the original
DWA evaluation function to the fuzzy controller to realize the weight coefficient self-
adaptation, so as to adapt to a more complex environment and generate a smoother
path.

1.2.1 Research Status of Learning-Based Path Planning Algorithms

• Reinforcement learning method

Reinforcement learning methods have been tried in many scenarios and tested in
Atari games, using high-dimensional image information as input, and using game
scores as evaluations to surpass human performance through reinforcement learning
strategies [26]. In 2010, Jaradat et al. [27] proposed to apply the Q-Learning
method to the path planning problem of mobile robots, limiting the number of states
and reducing the size of the Q table. Shi et al. [28] combined the objective function
of the ant colony algorithm with the Q-Learning algorithm, and used pheromone
to spread information among swarm agents, realizing the information interaction of
multi-robot path planning.
• Deep learning methods
Due to the update of high-performance computing hardware, deep neural networks
have shown great potential in dealing with complex computing problems. However,
the practice of deep learning in the field of robotics is usually limited by various
constraints. On the one hand, the workspace is not completely observable and will
change at any time [29]. On the other hand, robots are usually used in complex
working environments, so they will greatly Increase sample space. Usually, in order
to simplify the problem, the workspace is discretized [30, 31]. Due to the advance-
ment of graphics processing capabilities, deep neural networks will also be applied
to high-dimensional complex environments, and have been successfully applied to
obstacle avoidance tasks based on depth images [32]. Some neural network-based
methods have been proposed to solve the problem of autonomous navigation of
small UAVs in unknown environments, but the trained networks are opaque,
unintuitive and difficult to understand, which affects their use in the real world. He
et al. [33] proposed an interpretable deep neural network path planning method for
autonomous flight of quadrotors in unknown environments. The navigation problem
is described as a Markov decision process, and to better understand the trained model,
a new model interpretation method based on feature attributes is proposed, and some
easily interpretable textual and visual explanations are generated to allow the end
user to understand what triggers a particular behavior. In addition, a global anal-
ysis is performed to evaluate and improve the network, and real-world flight tests are
performed to verify that the trained path planner can be directly applied to real-world
environments.
Jeong et al. [34] proposed a learning model that simplifies the processing steps.
The laser information is input into the neural network, and then the A* algorithm
is used to label the information for supervised learning. After training, the model can take the
two-dimensional laser data and target coordinates as input and directly output robot motion
commands. Chen et al. [35] used semantic information obtained from pictures by
deep neural networks to make behavioral decisions for autonomous vehicles. Wu et al.
[36] proposed a deep neural network approach for real-time online path planning in
unknown cluttered environments, and designed an end-to-end deep neural network
architecture for online 3D path planning networks to learn a 3D local path planning
strategy. It is based on multivalued iterative computation approximated by a recurrent
2D convolutional neural network to determine actions in 3D space. In addition, a
path planning framework is developed to achieve near-optimal real-time online path
planning.
• Deep reinforcement learning method
Deep reinforcement learning combines the abstract ability of deep learning and the
strategy of reinforcement learning, which can be more suitable for human thinking to
solve practical problems. Maw et al. [37] proposed a hybrid path planning algorithm
that uses a graph-based path planning algorithm for global planning and deep
reinforcement learning for local planning, which is applied to a real-time mission
planning system for autonomous UAVs. It mainly solves the problem that local
planning and collision avoidance are not fully considered in the shortest path search.
The main work consists of two main parts: optimal flight path generation and collision
avoidance, a graph-based path planning algorithm is fused with a learning-based local
planning algorithm, and a hybrid path planning method is developed to allow UAVs
to avoid collisions in real time. The global path planning problem is solved in the
first stage using a novel incremental search algorithm on-the-fly called Modified
On-the-Fly A*, validated in the AirSim environment.
Gao et al. [38] proposed an incremental training mode that employs deep rein-
forcement learning to solve the path planning problem. The related graph search
algorithm and reinforcement learning algorithm were evaluated in a lightweight
two-dimensional environment. Then a deep reinforcement learning-based algorithm
is designed in a 2D environment, including observation states, reward functions,
network structures, and parameter optimization to avoid time-consuming work in a
3D environment. The designed algorithm is transferred to a simple 3D environment
for retraining to obtain converged network parameters, including the weights and
biases of the deep neural network. These parameters are used as initial values to train
the model in a complex 3D environment. To improve the generalization of
the model to different scenes, the deep reinforcement learning algorithm TD3 (Twin
Delayed Deep Deterministic Policy gradients) is proposed as a novel path planner by


combining it with the traditional global path planning algorithm PRM. The exper-
imental results show that the incremental training mode can significantly improve
the development efficiency. Moreover, the PRM + TD3 path planner can effectively
improve the model’s generalization ability.
• Problems existing in the current research
Based on the research status at home and abroad, current research mainly faces the following
problems and challenges:

– The path planning algorithm of the UAV needs to use a three-dimensional path
planning algorithm. The two-dimensional path planning algorithm cannot solve
more complex three-dimensional scenes. For three-dimensional environments such as corridors,
caves, and cities, the presence of complex obstacles and uncertain factors makes classical
algorithms prone to the curse of dimensionality and insufficient computing power, and it is
difficult to realize real-time path planning. A single algorithm can no longer meet the actual
needs of UAV path planning. In order to find a better path planning solution, it is necessary
to comprehensively consider constraints such as environment, time, and performance; at present,
a main solution is to combine multiple algorithms or to improve some existing algorithms.
– The path planning method in the traditional dynamic environment requires the use
of lidar, depth camera, or a combination of the two to collect surrounding envi-
ronmental information, thereby forming an environmental map, and completing path planning under
the condition that the map information is known; completing path planning in an unknown
environment is therefore also a challenge. In
the face of complex three-dimensional environments, especially cities and other
environments, creating three-dimensional map information requires huge storage
space.
– The environmental navigation problem is usually divided into several processes
such as perception, mapping and planning that are solved in sequence: first, equipment such as
lidar and depth cameras is used to build an environmental map, then the point cloud information
is mapped into a grid map on the premise of known map information, and collision-free trajectories
are calculated on this grid map. This increases processing latency and reduces the correlation
between steps.
– Reinforcement learning obtains rewards and punishments by interacting with the
environment and continuous trial and error. Therefore, for UAVs, it has high trial
and error costs and huge security risks, and deep reinforcement learning training in real
environments requires a lot of data and is time-consuming and labor-intensive, so training is
usually performed in a simulated environment.
– The rational design of the reward function is challenging. Reinforcement learning
is an end-to-end decision learning model, and the design of the reward function
will affect the learning effect of the strategy.
– The structure of the real environment is uncertain, the light is unstable, and the
environment is highly dynamic, which has a large gap with the simulation envi-
ronment. Adaptive and stable navigation in path planning and the migration from the simulation
environment to the real environment are the main difficulties in the field of UAV autonomous
navigation.
In summary, it is of great significance for UAV path planning to deeply discuss how
to improve the existing reinforcement learning algorithm, reduce the trial and error
cost of deep reinforcement learning training through a realistic physical simulation
engine, and set a reasonable reward function.

1.3 The Main Research Content and Chapter Arrangement of this Chapter

For unknown indoor environments or relatively unfamiliar environmental conditions such as
suburban residential areas, the rotary-wing drone can perceive the external environment through
lidar and, not by building maps but by continuous trial and error, learn strategies to avoid
obstacles and safely reach a specific target location, thereby achieving correct path planning in
the absence of reference standards and prior map information under unknown environmental
conditions. This chapter proposes a path planning method that combines the deep reinforcement
learning DQN (Deep Q-Network) algorithm with the artificial potential field method, and designs
simulation scenes in the Gazebo and AirSim simulation environments to train the UAV. According to
the above research content, this chapter is divided into six sections, arranged as follows.
The first section introduces the background and significance of UAV path planning, and expounds
the key points and difficulties of UAV path planning tasks. It summarizes domestic and foreign
research on classical UAV path planning technology and learning-based path planning technology,
and introduces the current problems to be solved, research trends, research content and
significance.
The second section gives the definition of the ground coordinate system of the
UAV, the coordinate system of the body and the transformation relationship between
the coordinate systems, so as to describe the movement of the UAV. Complete the 3D
model modeling and SDF model representation of the UAV, and realize the simulation
of the UAV in the simulation environment. The UAV flight control scheme used in
this chapter is designed, and the UAV path planning task decision scheme used in
this chapter is given and verified.
The third section, according to the research content of this chapter, analyzes and compares the
three branches of machine learning, introduces deep learning methods and deep neural networks,
expounds the theoretical methods of reinforcement learning, introduces the DQN algorithm in deep
reinforcement learning, combines the idea of the artificial potential field algorithm with the
DQN algorithm, and emphasizes the improvement of the DQN algorithm proposed in this chapter to
make it perform better in the UAV path planning task.
The fourth section compares and analyzes the simulation environment commonly
used in the field of autonomous driving and UAV path planning algorithm research.
Gazebo and Airsim are selected as the environment used in this chapter, and the soft-
ware and hardware platform parameters are given. Based on the above two simulation
environments, a simulation environment for training the indoor path planning ability,
outdoor path planning ability and dynamic obstacle avoidance ability is designed and
built respectively to establish a foundation for training the UAV path planning model.
The fifth section, combining the UAV model, the reinforcement learning path planning algorithm
and the simulation environment, trains the model according to the system state, action and reward
function of the UAV path planning reinforcement learning task. Simulated flight tests of the
indoor path planning ability and the outdoor dynamic obstacle avoidance ability are performed,
and the test results are analyzed. The path search results before and after the improvement of
the DQN algorithm are compared; by comparing the number of UAV collisions and the change of the
loss function, the improvement in efficiency and stability of the algorithm is assessed. The path
planning trajectory of the UAV is analyzed in the experimental verification environment, verifying
the feasibility and stability of the deep reinforcement learning path planning method designed in
this chapter.
The sixth section presents the conclusions of this chapter, points out the remaining deficiencies,
and gives an outlook on future work in combination with the actual situation.

2 Deep Learning and Reinforcement Learning

Decision learning based on reinforcement learning is an important means of achieving autonomous
control of robots. However, considering the limitations and difficulties of traditional
reinforcement learning in dealing with complex, high-dimensional and continuous-action problems,
deep learning technology and reinforcement learning are combined to solve the high-dimensional
feature learning problem and to handle continuous action spaces, yielding deep reinforcement
learning with wider application scenarios. This section compares the training methods and
characteristics of deep learning and reinforcement learning, briefly introduces deep learning
methods and the relevant theory of reinforcement learning, and introduces the DQN algorithm in
detail, together with the proposed improvements that give it better performance in the UAV path
planning decision task.

2.1 Comparison of Supervised Learning, Unsupervised Learning and Reinforcement Learning

There are three main types of machine learning algorithms: supervised learning, unsu-
pervised learning and reinforcement learning. Supervised learning requires manually
given labels, and after learning a large number of labeled samples, makes predic-
tions on new data. Its essence is to first carry out the process of labeling according to
the existing data set samples, then determine the relationship between the input and
output, and iteratively train to obtain an optimal model according to this relation-
ship, and finally use the model to Samples outside the training set are used to make
predictions. The training data in supervised learning must have labels. The limitation
is that the subjectivity and limitations of manual labeling, as well as low efficiency
and high cost, will seriously affect the learning effect.
Unsupervised learning can learn unlabeled samples to mine potential structural
information in training samples, instead of relying on artificially given labels. It is suitable
for situations where prior knowledge is unknown or insufficient and manual labeling is difficult
or expensive. The trade-off is that unsupervised learning requires a huge set of samples as
support in order to find structural features without category information (Table 1).
The training samples of reinforcement learning also do not need any labels,
and only learn through the reward signal given by the environment, which is an
autonomous learning mode. Reinforcement learning requires the agent to acquire
state information by exploring the environment. It doesn’t give a solution directly, it
needs to find the answer autonomously in the environment through trial and error. In
each state, the agent can choose to perform a variety of actions. Each choice can be
based on a greedy strategy or other strategies such as softmax, and the choices made
are then evaluated. The evaluation is reflected by the reward value, but the reward
value can only be regarded as an evaluation score, and it is impossible to determine
whether the current selection is correct. But the better the action the agent chooses,
the more reward it will get. Therefore, reinforcement learning does not need to be
marked in advance, and the agent can judge the quality of the final result by executing
these actions to complete the process of autonomous learning and optimization of
the strategy.
From the above comparison, it can be concluded that supervised learning is suitable for situations
where the environment cannot be fully explored and actions cannot be evaluated, while
reinforcement learning is suitable for situations where the label information is noisy or the
sample labels obtained are not accurate enough. However, it is difficult for reinforcement
learning alone to extract an optimal strategy from higher-dimensional states. With the help of
deep learning’s perception of high-dimensional input, it becomes possible to learn optimal
strategies from high-dimensional data such as images or videos. Deep reinforcement learning fuses
the abstraction ability of deep learning with the strategies of reinforcement learning, and is
therefore more in line with the human way of thinking when solving practical problems.

Table 1 Comparison of supervised learning, unsupervised learning and reinforcement learning

                     Supervised learning    Unsupervised learning    Reinforcement learning
Manual marking       Need                   Partly needed            Unnecessary
The amount of data   Huge demand            Great demand             Greater demand
Learning mode        Educational            Autonomous               Autonomous

2.2 Deep Learning Methods

Because it is difficult for reinforcement learning to learn policies directly from high-
dimensional raw data, some methods have been proposed to use deep neural networks
for reinforcement learning. As an end-to-end learning algorithm, the deep learning
method belongs to a branch of machine learning. It has powerful feature extraction
capabilities and overcomes the limitations of traditional machine learning in the
fields of image processing, speech analysis and other feature extraction. Using deep
learning methods to study classification and regression problems mainly involves four parts:
data, model, loss function and optimizer.
For the model, the simplest is a single-layer perceptron, which takes several
features as input, multiplies each input feature with its corresponding weight and
sums it up, similar to a model in which a neuron collects information through synaptic
weighting. Linear operations, namely addition and scalar multiplication, can be
efficiently completed by matrix multiplication [39], as shown in Eq. (1):

   y = g(W^T x + b) = g( ∑_{i=1}^{m} wi xi + b )    (1)

In the formula, W is the weight vector of the network, b is the bias of the network, and g(·) is
the activation function.
Simple linear models cannot solve nonlinearly separable problems such as XOR. Adding a nonlinear acti-
vation function after the linear layer enables it to handle more complex classifica-
tion problems, making it meaningful to deepen the network. A linear layer, plus
a nonlinear activation layer constitute the basic unit of a fully connected neural
network, which is called a neuron or node, as shown in Fig. 1. The weights w and
biases b are learned from the input data, so the model is an adaptive model.
The multi-layer perceptron is composed of several hidden layers and output layers.
The neurons of each layer of the network are connected to the previous layer, and
each group of connections has an exclusive weight. Figure 2 shows a three-layer fully connected
network; its model expression is similar to formula (1), but takes the form of a composition of
multiple such functions.
Fig. 1 Basic unit of fully connected neural network

Fig. 2 Three-layer fully connected network

The deep fully connected network is shown in Fig. 3. In theory, the deeper the network, the
stronger the feature extraction and learning ability of the neural network. However, a fully
connected network that is too deep will lead to overfitting, and Dropout technology can alleviate
this problem [40]. The fully connected network
is not suitable for the case of too many inputs, and its input needs to be manually
selected or processed by feature extraction in advance. When an image is input, there
are more than a million parameters, and the larger the image, the deeper the network,
the larger the amount of parameters. In this chapter, the deep reinforcement learning
algorithm is applied to the UAV path planning task. The deep neural network acts as
a function fitter in the entire algorithm platform. The lidar data is used as the input of
the deep network, and its dimension is generally one-dimensional data. Therefore,
it can be directly input to the fully connected layer.
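As a concrete illustration, a fully connected network of this kind that maps a one-dimensional
lidar scan to one value per discrete action can be written in a few lines of PyTorch (a minimal
sketch; the layer widths and the assumed 24-beam scan are illustrative choices, not the network
used later in this chapter):

```python
import torch
import torch.nn as nn

class LidarMLP(nn.Module):
    """Fully connected network: 1D lidar scan -> one value per discrete action."""
    def __init__(self, n_beams=24, n_actions=5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_beams, 64), nn.ReLU(),   # linear layer + nonlinear activation
            nn.Linear(64, 64), nn.ReLU(),        # deeper layers = stronger feature extraction
            nn.Linear(64, n_actions),            # output layer: one value per action
        )

    def forward(self, x):
        return self.net(x)

# Example: a batch of 8 scans, each with 24 range readings.
q_values = LidarMLP()(torch.rand(8, 24))
print(q_values.shape)  # torch.Size([8, 5])
```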

Fig. 3 Deep neural network

2.3 Reinforcement Learning Methods

In reinforcement learning, rewards are obtained from the interaction with the environment so that
the agent (Agent) learns the desired behavior. The goal of solving
reinforcement learning problems is to find the optimal policy for each state. Solving
reinforcement learning problems generally requires two steps: constructing the math-
ematical model of reinforcement learning Markov decision process and solving the
optimal solution of the Markov decision model.
Intuitively speaking, in a Markov process the state at each moment is related only to the state
at the previous moment. In a real environment, however, the state at a certain time is usually
related to states at earlier times as well. In that case, the Markov property can still be
satisfied by folding the historical states into the current state according to a certain rule.
Let f_n and f_{n+1} denote the states at two adjacent moments; when f_{n+1} is related not only
to f_n but also to f_{n−1}, f_{n−2}, ..., f_{n−m}, the historical states are transformed into the
current state by letting f_n = (f_{n−m}, f_{n−m+1}, ..., f_{n−1}, f_n). A typical Markov decision
process is shown in Fig. 4, where d_n and f_n represent the decision action and state value at the
nth moment [39], respectively.

Fig. 4 Markov decision process diagram



Fig. 5 Basic principles of reinforcement learning

The agent is in the environment (Environment), the state (State) represents the
agent’s perception of the current environment; the agent performs actions (Action)
to interact with the environment. After an action is performed, the environment
transitions from the current state to another state with a certain probability, and a
reward is given to the agent to evaluate the action according to potential reward rules.
Through this trial and error, the agent learns from experience to optimize its action strategy.
The state, action, state transition probability and reward function are the main components of
the reinforcement learning process, which is defined by a quadruple < S, A, T, R >, where S is
the system state space, A is the system action space, T : S × A → S is the state transition
probability function and R is the reward function of the system. Therefore, at time t the
immediate reward obtained by the system is rt = R(st, at, st+1) when it performs action at ∈ A in
state st ∈ S and transfers to state st+1 ∈ S. The interactive process can be seen in Fig. 5 [39].
It can be seen more clearly from the above figure that when the agent performs a certain task, it
first obtains the current state information by perceiving the environment, then selects the action
to be performed and interacts with the environment to generate a new state; at the same time, the
environment gives a reinforcement signal, namely the reward, which evaluates the pros and cons of
the action, and so on. The reinforcement learning process thus consists of executing the strategy
in the environment to obtain new state data, and using the new data to modify the behavior
strategy under the guidance of the reward function. After many iterations, the agent learns the
optimal behavior policy required to complete the task.
Solving the Markov decision problem refers to finding the distribution of behaviors in each state
such that the accumulated reward is maximized. For model-free methods where the environment is
unknown, value-function-based methods are generally adopted: only the state value function is
estimated during the solution, and the optimal strategy is obtained during the iterative solution
of the value function. The DQN method used in this chapter is a value-function-based method.

2.4 DQN Algorithm

This section first analyzes the principle of the DQN algorithm, and then proposes
some improvements to the DQN algorithm according to the characteristics of the
UAV path planning task. It mainly studies and analyzes the boundaries and goals
of the interaction between the agent and the environment, so as to establish a deep
reinforcement learning model design that meets the requirements of the task.
Reinforcement learning realizes the learning of the optimal strategy by maxi-
mizing the cumulative reward. Formula (2) is a general cumulative reward model,
which represents the future cumulative reward value of the agent executing the
strategy from time t.


   Rt = ∑_{k=t}^{n} γ^(k−t) rk    (2)

where: γ ∈ [0, 1] is the discount rate, which is used to adjust the reward effect of
the future state on the reward at the current moment [39].
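As a small numerical illustration of formula (2) (the reward values below are arbitrary):

```python
def discounted_return(rewards, gamma=0.9):
    """R_t of formula (2) for t = 0: sum over k of gamma**k * r_k."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))

# Three-step episode: with gamma = 0.9 this gives 1 + 0.9*0 + 0.81*10 = 9.1.
print(discounted_return([1.0, 0.0, 10.0]))
```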
In the specific algorithm design, the reinforcement learning model establishes a
value function based on the expectation of accumulated reward to evaluate the policy.
Value functions include the action value function Qπ(s, a) and the state value function Vπ(s).
The action value function Qπ(s, a) of formula (3) reflects the expected reward of executing
action a in state s and then following strategy π; the state value function Vπ(s) of formula (4)
reflects the expected reward when strategy π is executed from state s.

Q π (s, a) = E π {Rt |st = s, at = a} (3)

V π (s) = E π {Rt |st = s} (4)

For reinforcement learning, obtaining the maximum cumulative return is equivalent to maximizing
the value functions Qπ(s, a) and Vπ(s). The optimal action value function Q∗(s, a) and the optimal
state value function V∗(s) are defined as formulas (5) and (6):

   Q∗(s, a) = max_π Qπ(s, a)    (5)

   V∗(s) = max_π Vπ(s)    (6)

Substituting Eq. (3) into Eq. (5), it can be expressed in the iterative form shown in Eqs. (7)
and (8):

   Q∗(s, a) = max_π E_π{Rt | st = s, at = a}    (7)

            = E_π{ R(s, a, s′) + γ max_{a′} Q∗(s′, a′) | s, a }    (8)

In the formula, s′ and a′ are the successor state and action of s and a, respectively, and
formula (8) is called the Bellman optimality equation of Q∗(s, a).
It can be seen that the value function includes both the immediate reward and the value function
of the next moment, which means that when reinforcement learning evaluates the strategy at a
certain moment, it considers not only the value of the current moment but also the long-term
value over future moments, that is, the possible cumulative reward. This avoids the limitation of
focusing only on the size of the immediate reward while ignoring the long-term value, which would
not lead to the optimal strategy. Iteratively solving the Bellman equation for the MDP yields
Q∗(s, a), and then the optimal action policy function π∗(s) shown in formula (9) [39]:

   π∗(s) = arg max_a Q∗(s, a)    (9)

In order to overcome the limitation of traditional reinforcement learning in continuous or
large-scale discrete state-action spaces, deep learning technology is introduced and a neural
network is used to fit the value function. For example, the optimal action value function
Q∗(s, a) is estimated by a neural network trained with the loss function of formula (10):

   Li(θi) = E{ (yi − Q(s, a; θi))² }    (10)

where: θ —the parameters of the neural network.


fitting objective yi is shown in formula (11):
   
yi = R s, a, s  + γ max Q s  , a  ; θi−1 (11)
a

When modeling and fitting the value function, the neural network is very sensitive to the sample data, while the sequential samples produced by executing the reinforcement learning policy are strongly correlated. This seriously degrades the fitting accuracy of the neural network and can cause the iterative policy optimization process to fall into local minima or even fail to converge. To overcome these problems, deep reinforcement learning algorithms generally use experience replay to weaken the coupling between data collection and policy optimization and to remove the correlation between samples. Specifically, during the reinforcement learning process, the data obtained from the interaction between the agent and the environment are temporarily stored in a database, as shown in Fig. 6, and mini-batches are then drawn by random sampling to guide the neural network update.

Fig. 6 Experience playback database
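For illustration, the following is a minimal Python sketch of such an experience replay pool; the class name, capacity and batch size are assumptions for this sketch, not the chapter's actual code:

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size experience replay pool (illustrative helper)."""
    def __init__(self, capacity=1_000_000):
        self.pool = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state, done):
        # Each interaction step is saved as one transition tuple.
        self.pool.append((state, action, reward, next_state, done))

    def sample(self, batch_size=64):
        # Random sampling breaks the temporal correlation of consecutive transitions.
        return random.sample(self.pool, batch_size)

    def __len__(self):
        return len(self.pool)
```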
In addition, in order to obtain the optimal policy, it is desirable to obtain as high a reward value as possible when selecting actions, while the model should also retain a certain exploration ability so that the state space is searched adequately. Obviously, the traditional greedy method cannot satisfy both requirements, so soft strategies are generally used for action selection. The ε-greedy action selection method is as follows: when executing an action, a high-value action is selected according to π*(s) with probability (1 − ε), and a random action from the action space is selected with probability ε. The mathematical expression is given in formula (12):

\pi_\varepsilon(s) =
\begin{cases}
\pi^*(s), & \text{with probability } 1-\varepsilon \\
\text{randomly selected } a \in A, & \text{with probability } \varepsilon
\end{cases}
\qquad (12)

The resulting DQN algorithm is shown in Table 2. In the table, M is the maximum number of training steps, the subscript j denotes the index of a state transition sample in the mini-batch of size N_batch, s_i is the environmental state of the mobile robot, a_i is an executable action in the action space, and D is the experience replay pool.

Table 2 DQN algorithm pseudo code
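For illustration only, the following is a rough Python sketch of one DQN update step and the ε-greedy action selection of formulas (10)-(12); it is not a reproduction of the pseudo code in Table 2, and the network objects (assumed Keras models), the `ReplayBuffer` helper from the sketch above, and all default values are assumptions:

```python
import numpy as np

def dqn_train_step(q_net, target_net, buffer, batch_size=64, gamma=0.99):
    """One learning iteration: fit Q(s, a; theta) toward y = r + gamma * max_a' Q_target(s', a')."""
    batch = buffer.sample(batch_size)
    states      = np.array([t[0] for t in batch], dtype=np.float32)
    actions     = np.array([t[1] for t in batch])
    rewards     = np.array([t[2] for t in batch], dtype=np.float32)
    next_states = np.array([t[3] for t in batch], dtype=np.float32)
    dones       = np.array([t[4] for t in batch], dtype=np.float32)

    # TD target from the slowly updated target network, as in formula (11).
    next_q = target_net.predict(next_states, verbose=0)
    targets = q_net.predict(states, verbose=0)
    targets[np.arange(len(batch)), actions] = (
        rewards + gamma * (1.0 - dones) * next_q.max(axis=1))

    # Gradient step on the squared error of formula (10).
    q_net.fit(states, targets, verbose=0)

def select_action(q_net, state, n_actions, epsilon):
    """Epsilon-greedy soft policy of formula (12)."""
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)            # explore
    q_values = q_net.predict(state[np.newaxis, :], verbose=0)
    return int(np.argmax(q_values[0]))                 # exploit
```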

3 Design of Improved DQN Algorithm Combined with Artificial Potential Field

In the DQN algorithm, the state transition probability is not considered; only the state space, action space and reward function of the agent are described, and these elements should be designed according to the specific task [40]. For the path planning reinforcement learning task, it is first necessary to design a state space based on the sensor parameters, an action space based on the UAV motion characteristics, and a reward function based on the characteristics of path planning, so as to build a UAV path planning reinforcement learning system. In order to improve the robustness of UAV path planning in an unknown environment and improve the learning efficiency, a reward function suitable for path planning tasks is designed from the obstacle position information and target position information of the environment in which the UAV is located, fully considering the influence of position and orientation. In addition, this chapter supplements and improves the reward function to address the way the repulsive-force range affects path planning in the artificial potential field method: through an analysis of UAV motion collisions, a direction-penalty obstacle avoidance function is additionally established to evaluate the UAV's movement more effectively, guiding the UAV to reach the target position quickly while avoiding obstacles.

3.1 Network Structure Design

Since the DQN method generally overestimates the Q value of the action value function, there is an over-optimization problem: the estimated value function is larger than the true value function, and the error grows as the number of actions increases. As shown in Fig. 7, two networks are therefore generally used, a Q network and a target Q network, which implement action selection and action evaluation with different value functions. The two networks have exactly the same structure. To ensure convergence and learning ability during training, the parameters of the target Q network are updated more slowly than those of the online Q network; in this section, the target Q network is updated every 300 steps by default, which can be adjusted according to actual training needs. Because the agent's learning time and the network training cost increase with model complexity, this chapter designs a network structure with low model complexity that still meets the task requirements and builds it with the Keras layered API; to avoid over-fitting, random deactivation (Dropout) is added after the fully connected layer.
Based on the above, and considering that the input size of the deep network model is the size of the feature state of the regional perception centered on the UAV, the final network model consists of three fully connected layers and one hidden layer, adapted to the network input size and scale. According to the actual requirements of the control task, the feature state of the robot is used as the network input, the network outputs the Q values of the 7 actions, and the action with the largest Q value is selected for execution. The network model is shown in Fig. 8.
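As a rough illustration of such a lightweight Keras network, the following sketch uses the 26-dimensional state and 7 discrete actions described in this chapter; the layer widths and the dropout rate are assumptions, not values reported by the authors:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.optimizers import Adam

def build_q_network(state_size=26, n_actions=7, learning_rate=0.00025):
    """Small fully connected model: feature state in, one Q value per discrete action out."""
    model = Sequential([
        Dense(64, activation="relu", input_shape=(state_size,)),
        Dense(64, activation="relu"),
        Dropout(0.2),                              # random deactivation after the dense layers
        Dense(64, activation="relu"),
        Dense(n_actions, activation="linear"),     # Q value for each of the 7 actions
    ])
    model.compile(optimizer=Adam(learning_rate=learning_rate), loss="mse")
    return model

# The target Q network is an identical copy whose weights are refreshed periodically,
# e.g. every 300 steps: target_net.set_weights(q_net.get_weights())
```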

Fig. 7 Graph

Fig. 8 Schematic diagram of network structure

3.2 State Space Design

The distance from the UAV to surrounding obstacles is the most intuitive indicator to
reflect the environmental state of the UAV. In this chapter, the current environmental
state is detected based on the distance information from the UAV to the surrounding
obstacles, and the lidar is used as a sensor to detect the relative position and distance
between the UAV and the obstacles. Therefore, the detection distance information
diagram as shown in Fig. 9 is designed.
Fig. 9 Schematic diagram of distance perception

The feature state information is mainly composed of three parts: sensor feedback information, target position information and motion state information, forming a standard finite Markov process, so that deep reinforcement learning can be used to deal with this task. The state space is represented by a one-dimensional array formed from the distance values of the lidar. Considering that the lidar provides at least 360 depth-value channels in the circular area centered on the UAV, which are not all required for the task in this chapter, so many depth values and such a large state space would increase the computational cost and weaken the learning ability of the model. Therefore, in actual lidar detection, as shown in Fig. 9, this chapter sets the sampling interval of the lidar to 15° and obtains a down-sampled lidar data array. Specifically, the length of the lidar data array is 24, the first interval represents the forward direction of the UAV, and the distance and included angle between the UAV and the obstacle position are appended. The state space format is shown in formula (13), where state is the state space, l_i is the distance value of the ith lidar interval, d is the distance between the UAV and the target zone, and a is the angle between the UAV's forward direction and the obstacle.

state = [l_1, l_2, \ldots, l_{22}, l_{23}, l_{24}, d, a] \qquad (13)

During the simulation, the lidar information is obtained at a fixed frequency, and the lidar data are extracted through a ROS topic message. The state space data include the distance information between the UAV and the target point and the azimuth information between the UAV and the obstacle. The above information is processed into a multi-dimensional vector of length 26, which serves as the state information of the UAV.
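The following sketch shows one plausible way to build that 26-element state vector from a ROS laser scan; the topic name, the choice of taking the minimum range per 15° sector, and the helper names are assumptions for illustration:

```python
import numpy as np
from sensor_msgs.msg import LaserScan

def build_state(scan_msg, dist_to_target, angle_to_obstacle, n_sectors=24):
    """Down-sample a 360-beam scan into 24 sectors (15 degrees each) and append
    the distance d and angle a, giving the 26-element state of formula (13)."""
    ranges = np.array(scan_msg.ranges, dtype=np.float32)
    ranges[~np.isfinite(ranges)] = scan_msg.range_max       # replace 'no return' readings
    beams_per_sector = len(ranges) // n_sectors
    sectors = ranges[: n_sectors * beams_per_sector].reshape(n_sectors, beams_per_sector)
    lidar_features = sectors.min(axis=1)                     # nearest obstacle per sector
    return np.concatenate([lidar_features, [dist_to_target, angle_to_obstacle]])

# Example subscription with an assumed topic name:
#   rospy.Subscriber("/scan", LaserScan, scan_callback)
```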

3.3 Action Space Design

The action space should be designed so that the drone can explore the environment as fully as possible in search of rewards. The UAV controls its heading through autopilot commands. By defining fixed yaw-rate changes of the UAV, plus ascending and descending speeds, the UAV's movement can basically cover the entire exploration space through speed and angular velocity control. Therefore, the action space, that is, the value range of the UAV actions, is shown in Table 3. The DQN algorithm is discrete, and the UAV's actions are divided into seven options: fast left turn, left turn, straight ahead, right turn, fast right turn, ascent and descent, with angular velocities of −1.2, −0.6, 0, 0.6 and 1.2 rad/s and ascent and descent speeds of 0.5 m/s and −0.5 m/s. The speed command is sent to the drone at a fixed frequency, so the actual path of the drone is a continuous sequence of arcs and polylines.

Table 3 Action space

Action | Angular velocity (rad/s) or velocity (m/s)
0      | −1.5 (rad/s)
1      | −0.75 (rad/s)
2      | 0 (rad/s)
3      | 0.75 (rad/s)
4      | 1.5 (rad/s)
5      | 0.5 (m/s)
6      | −0.5 (m/s)
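A minimal sketch of how these discrete actions could be turned into ROS velocity commands; the constant forward speed and the exact command interface to Pixhawk are assumptions:

```python
from geometry_msgs.msg import Twist

# One entry per discrete action of Table 3: (yaw rate in rad/s, vertical speed in m/s).
ACTION_TABLE = {
    0: (-1.5, 0.0),   # fast turn one way
    1: (-0.75, 0.0),  # slow turn
    2: (0.0, 0.0),    # straight ahead
    3: (0.75, 0.0),   # slow turn the other way
    4: (1.5, 0.0),    # fast turn
    5: (0.0, 0.5),    # ascend
    6: (0.0, -0.5),   # descend
}

def action_to_twist(action_index, forward_speed=0.5):
    """Convert a discrete action index into a velocity command message."""
    yaw_rate, vertical_speed = ACTION_TABLE[action_index]
    cmd = Twist()
    cmd.linear.x = forward_speed      # assumed constant forward speed
    cmd.linear.z = vertical_speed
    cmd.angular.z = yaw_rate
    return cmd
```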

3.4 Reward Function Design

The flight mission of a UAV is generally flown along a planned route composed of multiple connected waypoints, so the flight mission can be decomposed into multiple path planning tasks between waypoint sequences. The UAV starts from the starting point and passes through the designated waypoints in turn. When the UAV encounters an obstacle in flight and there is a danger of collision, it needs to avoid the obstacle. Reinforcement learning is used for path planning: since the UAV's motion behavior is selected from the action space, the resulting path is guaranteed to be flyable. At the same time, the artificial potential field method is introduced into the reinforcement learning algorithm. As shown in Fig. 10, an attractive potential is assigned to the target waypoint and a repulsive potential is assigned to the obstacle. The multi-rotor flies under the combined action of the attractive force of the target waypoint and the repulsive force of the obstacles, and thus follows the desired path to the target waypoint while avoiding obstacles along the way. The artificial potential field method is embodied in the reward function of reinforcement learning. The following is an introduction to the design of the improved DQN algorithm.
Fig. 10 Schematic diagram of artificial potential field path planning

As shown in Fig. 11, the movement of the drone in its surroundings is modeled as movement in an abstract electric field: the target point is assumed to carry a negative charge, while the drone and the obstacles are assumed to carry positive charges. The target point and the drone carry opposite charges, so there is an attractive force between them, and the obstacle and the drone carry the same charge, so there is a repulsive force between them. The movement of the drone is guided by the resultant force in space, where r_1 is the distance from the drone to the target point, r_2 is the distance from the drone to the obstacle, Q_G is the amount of negative charge assigned to the target point, Q_O is the amount of charge assigned to the obstacle, Q_U is the amount of positive charge assigned to the UAV, k_a, k_b and k_c are proportional coefficients, ϕ is the angle between the direction of the attractive force and the direction of motion of the UAV, and U_B is the resultant force received by the UAV. In practice, to prevent the UAV from choosing back-and-forth motion to avoid collisions instead of moving toward the target point, the attraction of the target point on the UAV should be greater than the repulsion of the obstacle, so Q_G is set larger than Q_O to ensure that the UAV can both avoid obstacles and reach the target point. As the drone approaches the target point, the attractive force increases, and as it approaches an obstacle, the repulsive force increases. The reward function of the DQN deep reinforcement learning algorithm is expressed as the following three parts:
Gravitational reward function (14):

R_{UG'} = \overrightarrow{UG'} \cdot k_a = k_a \, \frac{Q_U Q_G}{r_1^2} \, \frac{\overrightarrow{UG}}{|\overrightarrow{UG}|} \qquad (14)

In the formula: R_{UG'} is the reward caused by the attractive (gravitational) force; \overrightarrow{UG'} is the attractive force acting on the drone; \overrightarrow{UG} is the vector from the drone to the target point; |\overrightarrow{UG}| is the distance from the drone to the target point; and \overrightarrow{UG}/|\overrightarrow{UG}| is the unit vector in the direction from the drone to the target point.

Fig. 11 Schematic diagram of the UAV reward function

Repulsion reward function (15):

R_{UO'} = \overrightarrow{UO'} \cdot k_b = k_b \, \frac{Q_U Q_O}{r_2^2} \, \frac{\overrightarrow{UO}}{|\overrightarrow{UO}|} \qquad (15)

In the formula: R_{UO'} is the reward caused by the repulsive force; \overrightarrow{UO'} is the repulsive force received by the UAV; \overrightarrow{UO} is the vector from the UAV to the obstacle; |\overrightarrow{UO}| is the distance from the UAV to the obstacle; and \overrightarrow{UO}/|\overrightarrow{UO}| is the unit vector in the direction from the UAV to the obstacle.
Direction reward function (16):

R_\varphi = k_c \arccos \frac{(\overrightarrow{UO'} + \overrightarrow{UG'}) \cdot \overrightarrow{UC}}{|\overrightarrow{UO'} + \overrightarrow{UG'}| \, |\overrightarrow{UC}|} \qquad (16)

where \overrightarrow{UC} is the force actually received by the UAV, and the arccos term is the angle between the actual motion direction and the expected motion direction. The total reward function is shown in (17):

R = R_{UG'} + R_{UO'} + R_\varphi \qquad (17)
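Since formulas (14)-(17) mix vector and scalar quantities, the following sketch gives one plausible scalar reading of the combined reward; all charge values and coefficients are illustrative assumptions (with Q_G larger than Q_O so that attraction dominates repulsion):

```python
import numpy as np

def potential_field_reward(uav_pos, target_pos, obstacle_pos, motion_dir,
                           q_u=1.0, q_g=2.0, q_o=1.0,
                           k_a=1.0, k_b=-1.0, k_c=-0.1):
    """Attraction grows as the UAV nears the target, repulsion penalizes closeness
    to the obstacle, and a direction term penalizes deviation between the actual
    motion direction and the resultant 'force' direction."""
    ug = np.asarray(target_pos, float) - np.asarray(uav_pos, float)    # UAV -> target
    uo = np.asarray(obstacle_pos, float) - np.asarray(uav_pos, float)  # UAV -> obstacle
    r1, r2 = np.linalg.norm(ug), np.linalg.norm(uo)

    r_ug = k_a * q_u * q_g / r1**2                 # formula (14): attraction reward
    r_uo = k_b * q_u * q_o / r2**2                 # formula (15): repulsion penalty (k_b < 0)

    resultant = r_ug * ug / r1 + r_uo * uo / r2    # combined 'field' direction
    cos_phi = np.dot(resultant, motion_dir) / (
        np.linalg.norm(resultant) * np.linalg.norm(motion_dir))
    r_phi = k_c * np.arccos(np.clip(cos_phi, -1.0, 1.0))   # formula (16): direction penalty

    return r_ug + r_uo + r_phi                     # formula (17)
```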

This chapter analyzes the advantages and disadvantages of the three major types of machine learning (supervised learning, unsupervised learning and reinforcement learning), discusses the training methods and limitations of deep learning and reinforcement learning, introduces their basic theory, and introduces the DQN algorithm in deep reinforcement learning, focusing on the improvement of the DQN algorithm combined with the artificial potential field algorithm from classical path planning so that it performs better in the UAV path planning decision task. Constraining the agent to discrete actions makes the algorithm easier to converge, at the cost of reduced mobility. With the fusion of the electric potential field and deep reinforcement learning, obstacles generate a repulsive force and the target point generates an attractive force, which are combined in the reward function to guide the drone to reach the target point without collision, so that the algorithm converges quickly. A lightweight network structure is adopted, consisting of three fully connected layers and one hidden layer, which enhances real-time performance. Finally, the feasibility of the algorithm is verified.

4 Simulation Experiment and Result Analysis

This chapter now turns to training and testing. The improved DQN algorithm is applied to the UAV path planning task, and common UAV mission scenarios are selected to design multiple sets of experiments for algorithm feasibility testing, training and verification. Based on the training model of the improved DQN algorithm, a feasibility test of indoor path planning in the Gazebo environment and training of dynamic obstacle avoidance were carried out. After training is completed, the system no longer relies on the reinforcement learning strategy and only outputs action values through the deep neural network.

4.1 Reinforcement Learning Path Planning Training and Testing

With the UAV model, the reinforcement learning path planning algorithm and the simulation environment in place, the system state, action and reward function of the reinforcement learning task are designed according to the UAV path planning task. The process of this experimental study is shown in Fig. 12. First, the UAV and the simulation environment are initialized, the UAV is loaded into the simulation environment, and the reward R corresponding to the state space S and the action space A of the previous moment is obtained as the UAV flies in the simulation environment. In addition, the reward R' at the current moment is stored in the data container, and the data stored in the data container are updated in real time as the drone moves. When the sample size is sufficient, the training process is started: the decision-making network in DQN is used to fit the Q value, and the action in the action space with the highest expected value is selected as the action command of the UAV. When the UAV approaches or collides with an obstacle, the generated reward value R is small, and when the UAV approaches or reaches the target point, the generated reward value R is large. As training progresses, the drone learns to avoid obstacles and reach the target point. When the reward value meets the requirement or the set number of training steps is reached, the weights and parameters of the optimal deep neural network are saved and then verified for reliability in the test environment. After training, the weight and parameter files of the network model are obtained. In the algorithm testing and application stage, there is no need to train the model or the reinforcement learning strategy; it is only necessary to send the state information to the deep neural network module to output the action value.
This chapter mainly conducts three parts of the experiment. The first part verifies the feasibility of the algorithm in the indoor path planning environment of Gazebo. The second part is agent training: the UAV path planning task is carried out in the training environment, the environment is observed and the observations are sent to the neural network, and the network model is trained according to the DQN algorithm. The third part is the agent test, which loads the trained model into the test environment, executes the path task according to the network model obtained during training, and finally counts the completion of the task. In order to train and test the UAV path planning algorithm proposed above, a variety of path planning training and testing scenarios have been constructed. All training environments are designed and developed based on TensorFlow, ROS and Pixhawk using the Python language under the Ubuntu 20.04 system.

Fig. 12 Training and testing flow chart
In order to determine the path planning ability of the network model obtained after training, testing is required. In the same simulation environment as the training environment, the path is determined entirely by the deep reinforcement learning network model, and 100 test rounds are designed. Drone initialization marks the beginning of a test round, and the drone reaching the target point marks the end condition of the round. A total of 100 rounds of testing are carried out, with the target point placed within the range that the drone can reach in 50 training steps; that is, by defining a limited number of points within a certain range, the drone target point appears randomly at one of the defined locations. At the beginning of each round, the drone is initialized and returns to the set position.

In order to evaluate the path planning performance of the algorithm, this chapter designs three evaluation indicators as quantitative measures of the algorithm's effect, defined as follows:
• Loss function: the error between the target value produced by the target network and the predicted value output by the training network. Training uses gradient descent to reduce this value, and a decreasing loss indicates that the network is approaching convergence. The change of the loss function during testing is recorded to judge the convergence rate and the size of the error;
• Maximum Q value: during training, each time a batch of observations is taken from the experience pool, the current state is input into the training network to obtain the Q value, the corresponding next state is input into the target network, and the maximum Q value is selected. The change of the maximum Q value during testing is recorded to judge the learning effect of the reinforcement learning strategy;
• Success rate: for each path planning run, if the UAV finally reaches the target smoothly, it is counted as a successful path planning run; if the target is still not reached after more than 50 actions, or if during execution the drone moves beyond the specified range of motion, for example by hitting an obstacle or leaving the specified motion area, the run is counted as unsuccessful. The number of successes in one hundred rounds is counted and the success rate is computed (a counting sketch follows this list).
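A counting sketch of the success-rate indicator under these termination rules; the environment and policy interfaces are assumptions for illustration:

```python
def test_success_rate(env, policy, n_rounds=100, max_actions=50):
    """Run n_rounds test episodes and count the rounds that end at the target point."""
    successes = 0
    for _ in range(n_rounds):
        state = env.reset()                  # drone initialized at the set position
        outcome = None
        for _ in range(max_actions):
            state, outcome = env.step(policy(state))  # outcome: None, 'goal', 'collision', 'out_of_area'
            if outcome is not None:
                break
        if outcome == "goal":
            successes += 1
    return successes / n_rounds
```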

4.2 Training and Results

In an indoor closed space without obstacles, the path planning capability of the UAV 3D path planning algorithm is verified. The UAV collects obstacle information through lidar, obtaining the distances of nearby obstacles relative to the agent as well as the included angle and distance relative to the target point. Training is performed each time according to the method proposed in this chapter, and the artificial potential field rewards and punishments are applied throughout the training process. The network model parameter settings are shown in Table 4. The initial value of the greedy coefficient epsilon is set to 1.0, and it gradually decreases until it reaches 0.05, after which it no longer decays. The deep neural network architecture consists of one input layer, two fully connected layers, one hidden layer followed by another fully connected layer, and finally the output layer. Therefore, this chapter designs a network structure with low model complexity that meets the task requirements and builds it with the Keras layered API; to avoid over-fitting, random deactivation (Dropout) is added after the fully connected layer. Considering that the input data size of the deep network model is the size of the feature state of the regional perception centered on the UAV, and adapting to the network's input size and scale, the final network model consists of three fully connected layers and one hidden layer.

Table 4 Network model training parameters

Hyperparameter   | Value     | Description
episode_step     | 1000      | Time steps
target_update    | 200       | Target network update rate
discount_factor  | 0.99      | How much future value is discounted per time step
learning_rate    | 0.00025   | Learning speed; too large harms learning, too small makes training very long
epsilon          | 1.0       | Probability of choosing a random action
epsilon_decay    | 0.99      | Epsilon reduction rate; epsilon decreases when one step is over
epsilon_min      | 0.05      | Epsilon minimum
batch_size       | 64        | Size of one set of training samples
train_start      | 64        | Start training once the replay memory holds more than 64 samples
memory           | 1,000,000 | Size of the replay memory
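A small sketch of how the ε schedule of Table 4 could be applied after each step; the variable names are illustrative:

```python
epsilon, epsilon_decay, epsilon_min = 1.0, 0.99, 0.05

def decay_epsilon(current_epsilon):
    """Multiplicative decay toward the 0.05 floor, applied when one step is over."""
    return max(epsilon_min, current_epsilon * epsilon_decay)
```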

According to the actual requirements of the mobile robot control task, the feature state of the UAV is used as the input of the network, the network outputs the Q values of the 7 actions, and the action with the largest Q value is selected for execution.
Therefore, training is carried out according to the above method, and a UAV reinforcement learning deep network model with two-dimensional path planning ability is obtained. Figure 13 shows the change of the model's epsilon value, and Fig. 14 shows the change of the model's maximum Q value with the number of training steps. It can be concluded that as the number of training steps increases, the maximum Q value gradually increases and the model error tends to stabilize. After the reinforcement learning path planning model is obtained, there is no need to train the model or the reinforcement learning strategy; it is only necessary to send the state information to the deep neural network module to output the action value. The motion command is selected from the action space, so it conforms to the kinematic model, that is, it can be applied to three-dimensional path planning (Fig. 15).
In this path planning task, the UAV starts from the starting point, moves toward the target point, and deflects itself toward the target point. When approaching an obstacle, it performs an obstacle avoidance action. It does not collide with the boundary or obstacles and always maintains a safe distance, which indicates that the path planning strategy has been learned and proves that the reinforcement learning strategy designed in this chapter can realize UAV path planning (Table 5).
After training is completed, the parameters and weights of the 1000-step training model are loaded, and only the state information needs to be sent to the deep neural network module to output the action value. The action value is sent to Pixhawk through ROS to control the drone's movement. Two sets of tests are conducted, in which the starting point of the drone is (0, 0, 2) and the target points are (2, 0, 0) and (−2, −2, 2); each group conducts 100 tests, for a total of 200, and the success rate of the drone reaching the target point and the number of collisions are counted. If the drone collides or stops, the next test is started. The test results are shown in the table below. As shown in Fig. 16, the results show that the UAV has a certain path planning ability in the indoor environment.

Fig. 13 Changes in epsilon values during training

Fig. 14 Maximum Q value change during training

4.3 Comparative Analysis of Improved DQN and Traditional DQN Algorithms

Fig. 15 Loss changes during training

Table 5 Gazebo dynamic obstacle avoidance test results

             | Target point location | Number of times to reach the target point | Number of collisions | Success rate
Test group 1 | (2, 0, 0)             | 76                                        | 24                   | 0.76
Test group 2 | (−2, −2, 2)           | 64                                        | 36                   | 0.64

On the basis of the previous section, in order to better verify the performance of the improved DQN algorithm for UAV path planning, an improved DQN path planning experiment was carried out in the indoor simulation environment. The classic DQN algorithm was set to reward reaching the target point, penalize collisions, and give a reward related to the distance from the target point. Under the same environment, the traditional DQN algorithm is used as the comparative experiment. 100 test rounds are designed: drone initialization marks the start of a test round, and the drone reaching the target point marks the end condition of the round. A total of 100 rounds of tests are carried out, with the target point placed within the range that the drone can reach in 50 steps; that is, by defining a limited number of points within a certain range, the drone target point appears randomly at one of the defined locations. At the beginning of each round, the drone is initialized and returns to the set position, and the loss change and average path length are selected as the comparison criteria.
The reward function of the classic DQN algorithm is shown in formula (18):

R = R_a + R_d \cos\theta + R_c + R_s \qquad (18)

In the formula: R_a is the target reward, R_d the direction reward, R_c the collision penalty, and R_s the step penalty [41].
A run is recorded as a successful UAV path planning run when the UAV reaches the target point, and the success rates of the improved DQN algorithm and the classic DQN algorithm are calculated accordingly. The results are shown in Table 6.

Fig. 16 3D indoor path planning test environment: (a) environment initialization, (b) target point generation, (c) the drone moves towards the target point, (d) the drone reaches the target point

Table 6 Comparison results of improved DQN and classic DQN

             | Target location | Number of times to reach the target point | Collision frequency | Average path length | Exceeds the maximum number of rounds | Success rate
Improved DQN | (2, 0, −1)      | 89                                        | 11                  | 8                   | 0                                    | 0.89
Classic DQN  | (2, 0, −1)      | 52                                        | 10                  | 13                  | 38                                   | 0.52

Fig. 17 Comparison of loss

Compared with the classic DQN algorithm, the improved DQN algorithm has a shorter average path length and a higher success rate, while the classic DQN algorithm still cannot reach the target point in many of the 100 rounds. The comparison of the success rate curves shows that the improved DQN algorithm performs better than the classic DQN algorithm in the obstacle avoidance experiment.
Figure 17 shows the loss curves obtained by the DQN algorithm and the improved DQN algorithm in the indoor 3D path planning environment of the UAV. The red curve represents the loss trend of the improved DQN algorithm, while the blue curve represents the loss trend of the classic DQN algorithm. The figure reflects the variation of the error obtained by the UAV at each step after training with the two reinforcement learning algorithms. It can be seen that the error of the improved DQN algorithm is smaller and its convergence speed is faster.
Figures 18 and 19 show the path planning trajectories of the UAV in the test environment. The improved DQN algorithm reaches the target point along the shortest path achievable in the discrete action space, while the test results of the classic DQN algorithm show that the UAV cannot successfully complete the path planning requirements, which manifests as back-and-forth movement in an open area of space to avoid collisions without moving toward the target point. It can be seen that, due to the introduction of the artificial potential field idea, the improved DQN algorithm avoids the phenomenon of the DQN algorithm repeatedly moving in place to avoid collisions, and can successfully complete the path planning task.
Comparing the test results of the DQN algorithm and the improved algorithm shows that the introduction of the artificial potential field method can effectively guide the UAV toward the target point and allow it to reach the target point to obtain rewards, whereas the strategy learned by the traditional algorithm is to try to avoid collisions rather than actively move toward the target point. This shows that in the UAV path planning task the improved DQN algorithm is more efficient and faster than the classic DQN algorithm. At the same time, the magnitude of the change of the loss curve of the DQN algorithm is greater than that of the improved DQN algorithm, which shows that DQN is not as good as the improved DQN algorithm in terms of stability.

Fig. 18 Improved DQN path

Fig. 19 Classic DQN path

5 Conclusions

UAVs have become working platforms for a variety of environments and purposes. This chapter takes a rotor UAV as the experimental platform to study deep reinforcement learning path planning for UAVs. ROS is used for communication, sending decision instructions to Pixhawk to control the UAV and achieve path planning, and a path planning method based on an improved DQN algorithm is proposed. The algorithm combines the advantages of the artificial potential field method. After testing in the simulation environment, the algorithm supports the decisions of deep reinforcement learning and greatly reduces the time required for training. In order to achieve more realistic rendering effects and build larger scenes, Gazebo and AirSim are selected as the training and testing environments for the deep reinforcement learning algorithms. Experimental results show that the algorithm can achieve collision-free path planning from the starting point to the target point, and it converges more easily than the classical DQN method. The conclusions of this chapter are as follows:
According to the task requirements of this chapter, a path planning method based on the improved DQN algorithm is designed. The algorithm combines the advantages of the artificial potential field method, which can effectively guide the UAV to reach the target point while avoiding obstacles during training, and significantly improves the efficiency of the UAV deep reinforcement learning path planning algorithm.
Combining the UAV model, the reinforcement learning path planning algorithm and the simulation environment, training is conducted according to the system state, action and reward function of the UAV path planning reinforcement learning task, and simulated flight tests of the UAV's indoor path planning ability and outdoor dynamic obstacle avoidance ability are then carried out on the simulation test platform. The test results show that the algorithm can achieve indoor and outdoor path planning and dynamic obstacle avoidance.
The path search results of the improved DQN algorithm and the classical DQN algorithm are compared and analyzed. The efficiency and stability of the algorithms are analyzed by comparing the number of UAV collisions and the numerical changes of the reward function, and the UAV's path planning trajectories in the test environment are presented, which finally verifies the stability and feasibility of the UAV deep reinforcement learning path planning method designed in this chapter.

References

1. Khatib, O. (1995). Real-time obstacle avoidance for manipulators and mobile robots. Inter-
national Journal of Robotics Research, 5(1), 500–505. https://doi.org/10.1177/027836498600
500106.
2. Ge, S. S., & Cui, Y. J. (2002). Dynamic motion planning for mobile robots using potential field method. Autonomous Robots, 13(3), 207–222. https://doi.org/10.1023/A:1020564024509.
3. Mabrouk, M. H., & McInnes, C. R. (2008). Solving the potential field local minimum problem
using internal agent states. Robotics and Autonomous Systems, 56(12), 1050–1060. https://doi.
org/10.1016/j.robot.2008.09.006.
4. Jurkiewicz, P., Biernacka, E., Domżał, J., & Wójcik, R. (2021). Empirical time complexity of
generic Dijkstra algorithm. In 2021 IFIP/IEEE International Symposium on Integrated Network
Management (IM) (pp. 594–598). IEEE. (May, 2021).
5. Knuth, D. E. (1977). A generalization of Dijkstra’s algorithm. Information Processing Letters,
6(1), 1–5.

6. Podsędkowski, L., Nowakowski, J., Idzikowski, M., & Vizvary, I. (2001). A new solution for path planning in partially known or unknown environment for nonholonomic mobile robots. Robotics and Autonomous Systems, 34(2–3), 145–152. https://doi.org/10.1016/S0921-8890(00)00118-4.
7. Zhang, Y., Li, L. L., Lin, H. C., Ma, Z., & Zhao, J. (2017, September). Development of path planning approach based on improved A-star algorithm in AGV system. In International Conference on Internet of Things as a Service (pp. 276–279). Springer, Cham. https://doi.org/10.1007/978-3-030-00410-1_32.
8. Sedighi, S., Nguyen, D. V., & Kuhnert, K. D. (2019). Guided hybrid A-star path planning
algorithm for valet parking applications. In 2019 5th International Conference on Control,
Automation and Robotics (ICCAR) (pp. 570–575). IEEE. https://doi.org/10.1109/ICCAR.2019.
8813752. (Apr, 2019).
9. LaValle, S. M. (1998). Rapidly-exploring random trees: A new tool for path planning (pp. 293–
308).
10. Karaman, S., & Frazzoli, E. (2012). Sampling-based algorithms for optimal motion planning
with deterministic μ-calculus specifications. In 2012 American Control Conference (ACC)
(pp. 735–742). IEEE. https://doi.org/10.1109/ACC.2012.6315419. (June, 2012).
11. Kavraki, L. E., Svestka, P., Latombe, J. C., & Overmars, M. H. (1996). Probabilistic roadmaps
for path planning in high-dimensional configuration spaces. IEEE Transactions on Robotics
and Automation, 12(4), 566–580. https://doi.org/10.1109/70.508439.
12. Webb, D. J., & Van Den Berg, J. (2013). Kinodynamic RRT*: Asymptotically optimal motion
planning for robots with linear dynamics. In 2013 IEEE International Conference on Robotics
and Automation (pp. 5054–5061). IEEE. https://doi.org/10.1109/ICRA.2013.6631299. (May,
2013).
13. Bry, A., & Roy, N. (2011). Rapidly-exploring random belief trees for motion planning under
uncertainty. In 2011 IEEE International Conference on Robotics and Automation (pp. 723–
730). IEEE. https://doi.org/10.1109/ICRA.2011.5980508. (May, 2011).
14. Nasir, J., Islam, F., Malik, U., Ayaz, Y., Hasan, O., Khan, M., & Muhammad, M. S.
(2013). RRT*-SMART: A rapid convergence implementation of RRT. International Journal
of Advanced Robotic Systems, 10(7), 299. https://doi.org/10.1109/ICRA.2011.5980508.
15. Gammell, J. D., Srinivasa, S. S., & Barfoot, T. D. (2014). Informed RRT*: Optimal sampling-
based path planning focused via direct sampling of an admissible ellipsoidal heuristic. In 2014
IEEE/RSJ International Conference on Intelligent Robots and Systems (pp. 2997–3004). IEEE.
https://doi.org/10.1109/IROS.2014.6942976. (Sept, 2014).
16. Ye, H., Zhou, X., Wang, Z., Xu, C., Chu, J., & Gao, F. (2020). Tgk-planner: An efficient topology
guided kinodynamic planner for autonomous quadrotors. IEEE Robotics and Automation
Letters, 6(2), 494–501. arXiv:2008.03468.
17. Koohestani, B. (2020). A crossover operator for improving the efficiency of permutation-based
genetic algorithms. Expert Systems with Applications, 151, 113381. https://doi.org/10.1016/j.
eswa.2020.113381.
18. Lamini, C., Benhlima, S., & Elbekri, A. (2018). Genetic algorithm based approach for autonomous mobile robot path planning. Procedia Computer Science, 127, 180–189. https://doi.org/10.1016/J.PROCS.2018.01.113.
19. Li, Q., Wang, L., Chen, B., & Zhou, Z. (2011). An improved artificial potential field method for
solving local minimum problem. In 2011 2nd International Conference on Intelligent Control
and Information Processing (Vol. 1, pp. 420–424). IEEE. https://doi.org/10.1109/ICICIP.2011.
6008278. (July, 2011).
20. Liang, J. H., & Lee, C. H. (2015). Efficient collision-free path-planning of multiple mobile
robots system using efficient artificial bee colony algorithm. Advances in Engineering Software,
79, 47–56. https://doi.org/10.1016/j.advengsoft.2014.09.006.
21. Akka, K., & Khaber, F. (2018). Mobile robot path planning using an improved ant colony
optimization. International Journal of Advanced Robotic Systems, 15(3), 1729881418774673.
https://doi.org/10.1177/1729881418774673.

22. Su, Q., Yu, W., & Liu, J. (2021). Mobile robot path planning based on improved ant colony
algorithm. In 2021 Asia-Pacific Conference on Communications Technology and Computer
Science (ACCTCS) (pp. 220–224). IEEE. https://doi.org/10.1109/ACCTCS52002.2021.00050.
(Jan, 2021).
23. Cheng, J., Wang, L., & Xiong, Y. (2018). Modified cuckoo search algorithm and the prediction
of flashover voltage of insulators. Neural Computing and Applications, 30(2), 355–370. https://
doi.org/10.1007/s00521-017-3179-1.
24. Khaksar, W., Hong, T. S., Khaksar, M., & Motlagh, O. R. E. (2013). A genetic-based opti-
mized fuzzy-tabu controller for mobile robot randomized navigation in unknown environment.
International Journal of Innovative Computing, Information and Control, 9(5), 2185–2202.
25. Xiang, L., Li, X., Liu, H., & Li, P. (2021). Parameter fuzzy self-adaptive dynamic window
approach for local path planning of wheeled robot. IEEE Open Journal of Intelligent
Transportation Systems, 3, 1–6. https://doi.org/10.1109/OJITS.2021.3137931.
26. Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A.,
Riedmiller, M., Fidjeland, A. K., Ostrovski, G., Petersen, S., & Hassabis, D. (2015). Human-
level control through deep reinforcement learning. Nature, 518(7540), 529–533. https://doi.
org/10.1038/nature14236.
27. Jaradat, M. A. K., Al-Rousan, M., & Quadan, L. (2011). Reinforcement based mobile robot
navigation in dynamic environment. Robotics and Computer-Integrated Manufacturing, 27(1),
135–149. https://doi.org/10.1016/j.rcim.2010.06.019.
28. Shi, Z., Tu, J., Zhang, Q., Zhang, X., & Wei, J. (2013). The improved Q-learning algorithm
based on pheromone mechanism for swarm robot system. In Proceedings of the 32nd Chinese
Control Conference (pp. 6033–6038). IEEE. (July, 2013).
29. Zhu, Y., Mottaghi, R., Kolve, E., Lim, J. J., Gupta, A., Fei-Fei, L., & Farhadi, A. (2017).
Target-driven visual navigation in indoor scenes using deep reinforcement learning. In 2017
IEEE International Conference on Robotics and Automation (ICRA) (pp. 3357–3364). IEEE.
https://doi.org/10.1109/ICRA.2017.7989381. (May, 2017).
30. Sadeghi, F., & Levine, S. (2016). Cad2rl: Real single-image flight without a single real image.
https://doi.org/10.48550/arXiv.1611.04201. arXiv:1611.04201.
31. Tai, L., & Liu, M. (2016). Towards cognitive exploration through deep reinforcement learning
for mobile robots. https://doi.org/10.48550/arXiv.1610.01733. arXiv:1610.01733.
32. Jisna, V. A., & Jayaraj, P. B. (2022). An end-to-end deep learning pipeline for assigning
secondary structure in proteins. Journal of Computational Biophysics and Chemistry, 21(03),
335–348. https://doi.org/10.1142/S2737416522500120.
33. He, L., Aouf, N., & Song, B. (2021). Explainable deep reinforcement learning for UAV
autonomous path planning. Aerospace Science and Technology, 118, 107052. https://doi.org/
10.1016/j.ast.2021.107052.
34. Jeong, I., Jang, Y., Park, J., & Cho, Y. K. (2021). Motion planning of mobile robots for
autonomous navigation on uneven ground surfaces. Journal of Computing in Civil Engineering,
35(3), 04021001. https://doi.org/10.1061/(ASCE)CP.1943-5487.0000963.
35. Chen, C., Seff, A., Kornhauser, A., & Xiao, J. (2015). DeepDriving: Learning affordance for
direct perception in autonomous driving. In 2015 IEEE International Conference on Computer
Vision (ICCV). https://doi.org/10.1109/ICCV.2015.312.
36. Wu, K., Wang, H., Esfahani, M. A., & Yuan, S. (2020). Achieving real-time path planning
in unknown environments through deep neural networks. IEEE Transactions on Intelligent
Transportation Systems. https://doi.org/10.1109/tits.2020.3031962.
37. Maw, A. A., Tyan, M., Nguyen, T. A., & Lee, J. W. (2021). iADA*-RL: Anytime graph-based
path planning with deep reinforcement learning for an autonomous UAV. Applied Sciences,
11(9), 3948. https://doi.org/10.3390/APP11093948.
38. Gao, J., Ye, W., Guo, J., & Li, Z. (2020). Deep reinforcement learning for indoor mobile robot path planning. Sensors, 20(19), 5493. https://doi.org/10.3390/s20195493.
39. Yongqi, L., Dan, X., & Gui, C. (2020). Rapid trajectory planning method of UAV based on improved A* algorithm. Flight Dynamics, 38(02), 40–46. https://doi.org/10.13645/j.cnki.f.d.20191116.001.

40. Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., & Riedmiller, M. (2013). Playing Atari with deep reinforcement learning. https://doi.org/10.48550/arXiv.1312.5602. arXiv:1312.5602.
41. Ruan, X., Ren, D., Zhu, X., & Huang, J. (2019). Mobile robot navigation based on deep reinforcement learning. In 2019 Chinese Control and Decision Conference (CCDC) (pp. 6174–6178). IEEE. https://doi.org/10.1109/CCDC.2019.8832393.
Drone Shadow Cloud: A New Concept
to Protect Individuals from Danger Sun
Exposure in GCC Countries

Mohamed Zied Chaari, Essa Saad Al-Kuwari, Christopher Loreno, and Otman Aghzout

Abstract The peak temperature in the Gulf (Persian Gulf) region is around 47° in the summer. The hot season lasts for six months in this region, starting at the end of April and ending in October. The average temperature in the same period exceeds 44° in the USA and Australia. High temperatures worldwide affect the body's capability to function outdoors. "Heat stress" refers to excessive amounts of heat that the body cannot handle without suffering physiological degeneration. Heat stress due to high ambient temperature seriously threatens workers worldwide. It increases the risk of limitations in physical abilities, discomfort, injuries, and heat-related illnesses. According to Industrial Safety & Hygiene News, the worker's body must maintain a core temperature of 36° to maintain its normal function. Many companies have their workers wear UV-absorbing clothing to protect themselves from the sun's rays. This chapter presents a new concept to protect construction workers from dangerous sun exposure in hot temperatures: a flying umbrella drone with a UV-blocker fabric canopy that provides a stable shaded area. The solution minimizes heat stress and protects workers from UV rays when they work outdoors. According to the sun's position, the flying umbrella moves dynamically through an open space, providing shade for workers.
Keywords Safety · Heat stress · Smart flying umbrella · Drone · Outdoor worker

1 Introduction

Global temperatures are rising, and how to reverse the trend is the subject of discussion. A special report published by the Intergovernmental Panel on Climate Change (IPCC) in October 2018 addressed the impacts of global warming of 1.5°, as shown in Fig. 1.
M. Z. Chaari (B) · E. S. Al-Kuwari · C. Loreno


Qatar Scientific Club, Fab Lab Department, Doha, Qatar
e-mail: [email protected]
O. Aghzout
University of Abdelmalek Essaadi, Tetouan, Morocco
e-mail: [email protected]


Fig. 1 Global surface temperature [38]

Increasing temperatures have an uneven impact on subregions. Heat stress is predicted to have the greatest impacts in Western Africa and Southern Asia, with 4.8% and 5.3% productivity losses in 2030, equivalent to 43 and 9 million jobs, respectively, as shown in Fig. 2. A weakening ozone shield has caused an increase in UV light transmission from the sun to the planet, resulting in skin diseases [37]. The report concluded that limiting global warming to 1.5° would require rapid, far-reaching, and unprecedented social changes [17]. The work in [27] shows that the GCC countries have been growing and expanding rapidly, with urbanization occurring at an accelerated pace. Approximately 80% of the population of the GCC lives in urban areas, making it one of the most urbanized regions in the world. Urbanization contributes to global warming through increased energy consumption and, consequently, higher carbon emissions. From 1979 to 2018, the death rate as a direct result of heat exposure (the underlying cause of death) generally ranged between 0.5 and 2 deaths per million people, with spikes in specific years, as shown in Fig. 3. Death certificates show that more than 11,000 Americans have died from heat-related causes since 1979. This increase affects the heat exposure of outdoor construction workers [4, 22, 42]. High temperatures can cause various illnesses, such as heat stroke, permanent damage, or even death.
In [28], the authors review scientific reports on the health status of workers exposed to high temperatures in the workplace. Heat exposure has been associated with heart-related diseases, deaths, accidents, and psychological effects on construction workers. The authors' review suggests that many workers are vulnerable to heat exposure, which affects workers worldwide.
In [39], the authors describe a model of heat stress impacts on cardiac mortality in Nepali migrant workers in Qatar. They used this model to present the effect of hot temperatures on workers' bodies. The authors demonstrated that the increased cardiovascular mortality during hot periods is most likely due to severe heat stress among these construction workers.

Fig. 2 Percentage of working hours lost to heat stress by subregion since 1995 and projected for
2030 [23]

Fig. 3 Heat-related deaths in the United States



In [32], the authors present a model based on a worldwide analysis showing that heat exposure has killed hundreds of U.S. workers. At least 384 workers in the United States have died from environmental heat exposure in the past decade. Across 37 states, the count includes people working essential jobs, such as farm workers in California, construction workers in Texas, and tree trimmers in North Carolina and Virginia. This shows that heat stress is not limited to GCC countries and is prevalent throughout the United States.
In [26], the authors and reporters from CJI and NPR examined worker heat deaths recorded by OSHA between 2010 and 2020. They compared the high temperature of each incident day with historical averages for the previous 40 years. Most of the deaths happened on days that were unusually hot for that date. Approximately two-thirds of the incidents occurred on days when the temperature reached at least 50°.
In [12], the authors present a thorough longitudinal study describing the heat exposure and renal health of Costa Rican rice workers over three months of production. In this study, 72 workers with various jobs in a rice company provided pre-shift urine and blood samples at baseline and three months later. NIOSH guidelines and the WBGT index were used to calculate metabolic and ambient heat loads. Based on the results, the study recommended that efforts be made to provide adequate water, rest, and shade to heat-exposed workers in compliance with national regulations.
In [31], the authors explain that during the stages of the milk manufacturing cycle, Italian dairy production exposes workers to uncomfortable temperatures and potentially subjects them to heat shock. That study aimed to assess the risks of heat stress for dairy workers who process buffalo milk in southern Europe.
The United States has a high rate of heat-related deaths, although they are generally considered preventable. In the United States, heat-related deaths averaged 702 per year between 2004 and 2018, as shown in Fig. 4. As part of the CDC's effort to study heat-related deaths by age group, gender, race/ethnicity, and urbanization level, and to evaluate comorbid conditions associated with heat-related deaths, the CDC analyzed mortality data from the National Vital Statistics System (NVSS). The highest heat-related mortality rates were observed among males 65 years and older, American Indians/Alaska Natives who lived in nonmetropolitan counties, and those in large central metro counties [6]. To counteract this risk, legal texts as well as technological solutions exist. Nations agree to stop all work if the wet-bulb globe temperature (WBGT) exceeds 32.1° in a particular workplace, regardless of the time. The factors considered by the WBGT index are air temperature, humidity, sunlight, and wind strength. Construction workers should not stay outside in the heat for long periods because they will face serious health problems [5, 19]. In [8], the authors explain that the epidemic of chronic kidney disease in Central America is largely attributed to heat stress and dehydration from strenuous work in hot environments. That study describes efforts to reduce heat stress and increase efficiency among sugarcane workers in El Salvador, and it pushed the Salvadoran government to provide mobile canopies for workers, as shown in Fig. 5. Umbrellas are widely used for shade in Middle Eastern countries. This research aims to develop a flying umbrella that provides shade and safe working conditions for outdoor workers.

Fig. 4 U.S. heat-related deaths from 2004 to 2018

Fig. 5 Portable canopies provide shade to the cane field workers during a break

2 Related Work

2.1 Overview

Scientists and researchers have devoted much attention to this issue. In [18], the authors present a technique based on shade structures installed in primary schools to help reduce children's exposure to ultraviolet radiation (UVR) during their formative years and to provide areas where they can conduct outdoor activities safely. In [25], scientists such as Keith seek ways to mimic the volcanic effect artificially. They explain in a Nature journal article that a method to cool the planet rapidly is to inject aerosol particles into the stratosphere to reflect away some inbound sunlight, as shown in Fig. 6.

Fig. 6 An illustration of solar geoengineering techniques [14, 24, 33]

In [21, 29], the authors report that the process involves injecting sulfur aerosols into the stratosphere between 9 and 50 kilometers above the Earth's surface. Solar geoengineering involves reflecting sunlight into space to limit global warming and climate change. After the aerosols combine with water particles, sunlight will be reflected more than usual for one to three years.
One scientific team is developing a global sunshade that uses balloons or jets to shield the most vulnerable countries in the global south from the adverse effects of global warming [34]. In [7], the authors focus on modifying fabric as a primary protective layer for the skin against harmful radiation. Today, many people and outdoor workers use umbrellas to protect themselves from the sun and UV rays, as shown in Fig. 7. In the modern world, umbrellas are a necessity, and in sweltering conditions they are especially beneficial.
In addition, when we are working or doing something outdoors under changing weather conditions, umbrellas are a handy tool, as shown in Fig. 7a. However, under such circumstances they have noticeable shortcomings: one hand is always occupied when handling an umbrella, limiting some hand functions and requiring extra care and attention, so holding an umbrella by hand has clear disadvantages. Therefore, many companies and associations supply umbrella hats that generate a canopy, as shown in Fig. 7b. Several high-tech solutions help workers cope with and adapt to this climate, particularly outdoors. Providing shade to customers using robotics technology requires a great deal of work.

Fig. 7 Ordinary umbrellas: (a) Southern Area Municipality distributes umbrella hats to workers (Bahrain); (b) people use umbrellas to block UV rays in Dubai

Researchers at Dongseo University have developed a solution for finding the perfect spot to enjoy shade during a summer picnic [3]. A new type of portable architecture solves the mundane but frustrating problem of adjusting shades throughout the day. Researchers at this university have demonstrated that an adaptive canopy can change shape as the sun moves throughout the day, providing stable shade and shadowing regardless of the solar position or time of day, while still considering its configuration irrespective of location.
Researchers at the fabrication laboratory of QSC have developed a prototype robot that provides cool air, offering a comfortable spot during a summer picnic [11]. It is a new type of air-conditioner robot prototype that follows humans in outdoor applications. Several robotic systems are sensitive to solar radiation, including some integrated with solar panels, and can even use them for shading. The researchers conclude that "the resulting architectural system can autonomously reconfigure, and stable operation is driven by adaptive design patterns, rather than solely robotic assembly methods." Cyber-physical macro materials and aerial robotics are utilized to construct the canopy [44]. The drone is a lightweight carbon fiber structure with integrated electronics to sense, process, and communicate data [1, 13].
University of Maryland researchers developed a system called RoCo which provides cooling while preserving the user's comfort [15, 16]. The engineering faculty of this university has created multiple versions of air conditioner robots with different options. In today's fast-moving world, robots that assist humans are increasingly

relevant. In many fields and industries, a robot helps a human [35, 40]. Researchers
have confirmed the possibility of creating an intelligent flying umbrella by combining
drone and canopy technology. Drones significantly impact the production capabilities
of individuals such as artists, designers, and scientists, and they can detect and track
their users continuously. Over the past few years, UAV and mini UAV technology
have grown significantly [10]. It is present in our daily lives and helps us in vari-
ous fields, recently in the COVID-19 pandemic (Spraying, Surveillance, Homeland
Security, etc.). Today, drones can perform a wide range of activities, from delivering
packages to transporting patients [9].
In [20], a team of engineering experts at Asahi Power Service invented a drone
called “Umbrella Sidekick,” which can be regarded as an imaginative flying
umbrella.
While the techniques and solutions described above are good, outdoor workers
need better and more efficient methods, especially concerning global warming. The
following subsection presents a new proposal for flying umbrellas.

2.2 Proposal

This work aims to develop a flying umbrella with stable shading and a motion tracking
system. As its name suggests, this umbrella performs the same function as a conventional
one, flying above the workers' heads and serving the same purpose as an umbrella
hat. UV-protective flying umbrellas reduce the temperature in outdoor construction areas
and prevent heat illness at work. Flying umbrella drones are designed mainly to:
• Provide consistent shadowing and shading regardless of the angle of the sun's
rays.
• Keep a safe distance of approximately ten meters from the construction workers'
field.
• Prevent heat illness at work.
The sun heats the earth during the day, and clear skies allow more heat to reach
the earth's surface, which increases temperatures. However, when the sky is cloudy,
cloud droplets reflect some of the sun's rays back into space, so less of the sun's
energy reaches the earth's surface, which causes the earth to heat up more slowly and
leads to cooler temperatures [2, 43]. The prototype idea comes from the shadow
cast by clouds, since clouds can block the sun's rays and provide shade, as shown in
Fig. 8a. The study aims to develop an intelligent flying umbrella that would improve
efficiency and offer a comfortable outdoor environment for workers. We chose a
height of ten meters above the workers' area for two reasons. On the one hand, at this
height the workers barely hear and feel the noise generated by the propellers
[30, 36, 41]. On the other hand, the canopy still reflects significant amounts of solar
radiation. The shadow position (x) is a function of the solar position and the umbrella
position, as shown in Fig. 8b. The umbrella is on standby in a dedicated parking area,
awaiting the order to fly. If it receives an order, it follows the workers automatically. The umbrella will

(a) Cloud shade. (b) Flying umbrella shade.

Fig. 8 A flying umbrella that casts a shadow

Fig. 9 Proposed approach for the flying umbrella

return to its original position if it loses the target workers. A failed signal will cause
the umbrella to issue an alarm and return smoothly to the parking area with the help of
the ultrasonic sensor implemented in the umbrella, as shown in Fig. 9. The sunshade
protects workers from solar and heat stress, and it needs to be adjusted daily
according to the solar radiation. The umbrella concept consists of several components: an
aluminum frame, an electronics board, sensors, and radio-frequency communication.
The remainder of the chapter is structured as follows: Sect. 3 presents the proposed
method and the fabrication of the flying umbrella, Sect. 4 describes the experiment
phase, Sect. 5 demonstrates the results, and Sect. 6 concludes with a discussion of the
prototype and plans for the future.

3 Proposed Method

This flying umbrella includes a flight and operating system to provide shade to
workers outdoors. The product is born from the combination of a drone equipped with
the necessary equipment and a canopy umbrella. Control commands for the umbrella
drone: this module sends specific control commands to the drone via the radio-
frequency link to control the umbrella (i.e., pitch, roll, yaw, and throttle). RF remote
control is used to track and follow workers in an open environment. The VU0002
digital ultrasonic sensor employs the ultrasonic time-of-flight principle to measure
the distance between the umbrella and an obstacle. Various communication protocols,
including LIN bus and 2/3-wire IO, are used to output a digital distance signal
and self-test data, making it suitable for an array of intelligent umbrella parking
systems. This ultrasonic sensor is highly weather-resistant, stable, and resistant to interference.
The logical design in Fig. 10 shows the interaction between the components of the
flying umbrella and how data flows from the input layer to the designated actions
by the umbrella. Through the onboard camera, the pilot perceives the umbrella's
environment. Via the RF link, the onboard computer receives the pilot's commands
and executes them by sending movement control commands to
the umbrella flight controller, which is the brain of the umbrella positioning system.
The transmitter sends orders that are picked up by the receiver and passed to
the flight controller, which instructs the actuators to move. As part of the prototype,
an ultrasonic proximity sensor detects obstacles in the umbrella's path so that the
umbrella can track the worker efficiently while avoiding them. The drone tracking module
is responsible for monitoring the spatial relationship between real-world elements

Fig. 10 Diagram of a flying umbrella drone that uses an ultrasonic sensor



Fig. 11 Three cases for flying umbrella cloud geometry effect on workers from space. (S: Shadow
shown on the image; E: shadow covered by cloud, and F: bright surface covered by cloud)

and their virtual representations. After an initial calibration process and using an
IMU-based approach, tracking is possible.
A pilot must be able to receive a live video feed from the umbrella, as well as
have real-time access to flight data. The umbrella should be able to switch between
modes without malfunctioning. The umbrella should be able to adjust its position
automatically without human assistance. For example, the umbrella must not exceed
five seconds in latency when delivering pilot instructions to the drone and transmitting
telemetry information from the drone to the pilot. At ten meters in height, the canopy
creates a shade geometry around the worker’s barn, as shown in Fig. 11. According
to the position of the flying umbrella, the shade position changes. Consider these
three possibilities: S, the shadow provided by the canopy; E, the shadow covered by
the umbrella; and F, the bright surface covered by the canopy. The projected
and actual positions of the umbrella will most likely differ significantly when the
observing and solar zenith angles are relatively large.

3.1 Design of Mechanical Structure

For the umbrella structure frame, we use aluminum instead of carbon fiber in
our prototype since we do not have the facility to produce carbon-fiber parts. Plastic
propellers are the most common, but carbon fiber propellers are of better quality, so we
chose G32X 11CF carbon fiber propellers for our prototype. A motor's technical
specification is imperative: more efficient motors save battery life and give the
owner more flying time, which is what every pilot wants. We selected the U13II KV130
motor, which is highly efficient and can produce 24 kg of thrust. Multirotor drones rely on electronic speed

Fig. 12 Diagram of a flying umbrella drone

controllers to provide high frequency, high power, high-resolution AC power to the


brushless electric motors in a very compact package. The flight controller regulates
motor speeds via the ESC to provide steering; we selected the FLAME 180A 12S V2.0 ESC.
The flight controller manages the autopilot, worker following, failsafe, and many other
autonomous functions using inputs from the receiver, GPS module, battery monitor,
and IMU. It is the heart of the umbrella drone, as shown in Fig. 12. The
propellers are positioned within a maximum dimension of about 1100 mm, as shown
in Fig. 13.
Steps to fabricate the prototype:
• Making aluminum umbrella frames.
• Fixing the Flame 180A 12s V2.0 & the U13II KV130 motors.
• Installing the G32X11CF propeller.
• Installing the flight controller board.
• Installing the ultrasonic sensor.
• Installing the camera and power distribution system.
• Fixing the batteries.
The umbrella body was designed and fabricated in our mechanical workshop. Assem-
bly and programming took place in the FabLab (QSC), as shown in Fig. 14. The
umbrella comprises six propellers and electronics that balance the canopy; it can

Fig. 13 Schematic top and right view of the umbrella drone dimensions: (a) top view; (b) right view

move in two directions at the same speed. It uses six U13II KV130 motors (engines)
manufactured by T-MOTOR. Based on the data in Table 1, each engine produces 5659 W
and a maximum thrust of 24 kg. An ultrasonic sensor with a 2.5 m measurement range
is used to prevent collisions.
The specifications of the hardware and mechanical parts of the flying umbrella are
described in Table 2.
An exploded 3D view of the flying umbrella concept is shown in Fig. 15a. The
parts of the umbrella drone are mounted and secured well, as depicted in
Fig. 15b.
In the meantime, the umbrella remains on standby in a safe area and awaits an
order to take off, and it will follow workers after stabilizing at an altitude of ten
meters to provide shade. The pilot can move the position of the umbrella to give
the maximum area of shade to workers in the construction field. If the pilot loses
communication with the umbrella, a ringing alarm goes on, and the GPS lands
the umbrella automatically in the parking area. The flowchart of the system algorithm is
shown in Fig. 16.
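To make the standby, follow, and return logic of Fig. 16 easier to follow, a minimal Python sketch of the supervisory state machine is given below. The mode names and the boolean inputs (fly_order, rf_link_ok, worker_tracked, landed) are illustrative assumptions and not the prototype's actual flight software interface.

from enum import Enum, auto

class Mode(Enum):
    STANDBY = auto()  # parked, waiting for a take-off order
    FOLLOW = auto()   # flying at about 10 m altitude and shading the workers
    RETURN = auto()   # RF link or target lost: raise the alarm, return to parking

def next_mode(mode, fly_order, rf_link_ok, worker_tracked, landed):
    """One step of the supervisory logic summarised in Fig. 16 (sketch only)."""
    if mode is Mode.STANDBY:
        return Mode.FOLLOW if (fly_order and rf_link_ok) else Mode.STANDBY
    if mode is Mode.FOLLOW:
        return Mode.FOLLOW if (rf_link_ok and worker_tracked) else Mode.RETURN
    # Mode.RETURN: keep heading to the parking area until landed
    return Mode.STANDBY if landed else Mode.RETURN

# Example: the RF link drops while following the workers -> switch to RETURN
print(next_mode(Mode.FOLLOW, fly_order=True, rf_link_ok=False,
                worker_tracked=True, landed=False))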

(a) Umbrella frame ready. (b) Fixing the Flame 180A 12s V2.0 & the
U13II KV130 motors.

(c) Installation of the G32X11CF propeller. (d) Verify the weight of the umbrella body.

Fig. 14 Manufacturing the umbrella prototype

Table 1 The list of components

Item               | Manufacturer                 | Part number
UV-blocker fabric  | Blocks up to 90% of UV rays  | N/A
Ultrasonic sensor  | Brand AUDIOWELL              | VU0002
Motor              | T-motor                      | U13II KV130 (power: 5659 W)
Propellers         | T-motor                      | G32X 11CF Prop
ESC                | T-motor                      | FLAME 180A 12S V2.0
DC-DC converter    | MEAN WELL                    | RSD-30G-5

Table 2 Mechanical specifications of the flying umbrella


List Technical specification
Umbrella frame size (mm) 2500 × 2500
Umbrella aluminum frame weight (kg) 33
Distance between propellers and the UV-blocker fabric (mm) 540
Distance between propellers and the ground (mm) 305

(a) An umbrella in 3D exploded view. (b) All parts of the umbrella mounted.

Fig. 15 Flying umbrella ready to fly

3.2 Shade Fabric

We used a fabric canopy screen that can block 95% of UV rays while allowing water
and air to pass through. It is a polyethylene fabric of 110 grams per square meter with
galvanized buttonholes and strong seams. The fabric is very breathable, which makes
the space underneath more comfortable, and it creates cool shadows while still letting
light through. Raindrops also pass through the fabric, so there is no water pooling.

3.3 Selecting an Umbrella Flight Controller

The process of making an umbrella drone can be rewarding, but the choice of a
flight controller can be challenging. This prototype can only be operated by specific
drone controllers available today on the market. Knowing exactly what the
umbrella drone will look like narrows down the list of potential autopilot
boards and makes the decision easier. Here are some criteria to consider when selecting
an autopilot board. In our analysis, we compared seven of the top drone controllers
on the market based on factors such as:
• Affordability
• Open Source Firmware
• FPV Racing friendly
• Autonomous functionality
• Linux or microcontroller-based environment
• Frame size typical
• Popularity
• CPU.

Fig. 16 The overall system flowchart

In this subsection, we will present the best flight controller boards and select the best
for our prototype.

• APM Flight Controller: The APM was the predecessor of the Pixhawk, which developed
it into a much more powerful flight controller. The Pixhawk uses a 32-bit processor,
while the APM has an 8-bit processor. The APM was a massive leap for open-source drone
controllers, so DIY drone builders widely used it. The Pixhawk is compatible
with ArduPilot and PX4, two major open-source drone projects, and is also entirely
open-source.

• Pixhawk: After the original Pixhawk, the Pixhawk open-source hardware project
produced many flight control boards, including the Cube. Open-source projects like
ArduPilot therefore have a good chance of adding new functionality and support to the
Cube, which remains very similar to the original Pixhawk.
• Navio2: The Navio2 uses a raspberry pi to control the flight. As a result, Navio2 is
simply a shield that attaches to a Raspberry Pi 3. Debian OS images that come pre-
installed with ArduPilot are available for free from Emlid, which makes Navio2.
A simple flash of an SD card will do the trick.
• BeagleBone Blue: The first Linux implementation of ArduPilot was ported to
the BeagleBone Black before being ported to the BeagleBone in 2014. Creating
the BeagleBone Blue was a direct result of the success of the Linux porting of
ArduPilot for the BeagleBone Black.
• Naza Flight Controller: Naza-M V2 kits can be found on Amazon for about $200
and come with essential components like GPS. The flight control software is
closed-source, which means the community does not have access to its
code. Naza flight controllers aren't appropriate for people who want to build a
drone they can tinker with.
• Naze32: The Naze32 model boards are lightweight and affordable, costing about
$30-40. Many manufacturers offer Naze32 boards; choose one that is an F3 or F4
flight controller.

3.4 Flying Umbrella Power Calculation

A significant problem with electricity-powered robots is their battery life, and this is
also an issue for the flying umbrella. Lithium Polymer (LiPo) batteries are used in
this prototype because of their light weight and high capacity. It is also possible to use
Nickel Metal Hydride (NiMH) batteries, which are cheaper but heavier than LiPo,
causing problems and reducing the umbrella's efficiency. Due to battery weight,
there is a tradeoff between the umbrella's total weight and flight time. Each U13II KV130
brushless DC motor can produce 24.3 kg of thrust at a battery voltage of 48 VDC, as
shown in Fig. 17. So with six U13II KV130 brushless motors, the flying umbrella can
lift 144 kg. Two sources power the flying umbrella:
• For six DC brushless motors at 5659 W each, it is 59.65 KW for the whole load and
full thrust. So the power of the total motor is approximately equal to 41.7 kW. Six
batteries 44.44 Vdc/8000 mAh (Two batteries 22.22 Vdc/8000 mAh series). The
total amount of energy produced by the umbrella is 41.7 kWh = 9721 KgCo2.
• Battery 12 Vdc 5.5 AH for powering all sensors (ultrasonic, flight controller, GPS
module, camera, etc.). So total consumption power is 35 W.
The time for the umbrella to fly is based upon the specifications of the batteries, as
shown in Table 3.
The umbrella cannot fly for more than 22 min with the CX48100 battery. Alu-
minum umbrella frames weigh about 33 kg, and CX48100 batteries weigh 42 kg.

Fig. 17 U13II KV130 brushless DC Motor

Table 3 Batteries specifications

Batteries (Brand) | Current (Ah) | Voltage (V) | Umbrella expected flight time (min) | Number of batteries                        | Batteries total weight (kg)
HRB               | 8            | 22.2        | 2                                   | 12 (each two batteries arranged in series) | 11
CX48100           | 100          | 48          | 22                                  | 1                                          | 42
CXC4825           | 25           | 48          | 11                                  | 2 (two batteries arranged in series)       | 16

The total weight of the six-propeller umbrella drone with a battery is 73 kg,
against a lift capability of about 120 kg (thrust-to-weight ratio, TTWR). The battery was fixed in the
center to ensure an appropriate weight distribution. The flying umbrella can carry the
total weight easily. Flight time calculation:

ACD = TFW × (P ÷ V)

T = (C × BDM) ÷ ACD = 22 min

where:
ACD: Average current Draw,
TFW: Total flight weight,
BDM: Battery Discharge Margin,
P: Power to Weight ratio,
V: Voltage.
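The same calculation can be written as a short Python sketch. The 100 Ah / 48 V values come from the CX48100 row of Table 3; the 80% usable discharge margin and the roughly 87 W/kg power-to-weight ratio are assumptions chosen here only to illustrate how a flight time of about 22 min follows from the formula.

def flight_time_minutes(capacity_ah, discharge_margin, total_weight_kg,
                        power_to_weight_w_per_kg, voltage_v):
    """T = (C x BDM) / ACD, with ACD = TFW x (P / V), as in the formula above.

    Capacity in Ah gives T in hours, so the result is converted to minutes."""
    acd = total_weight_kg * (power_to_weight_w_per_kg / voltage_v)  # average current draw [A]
    t_hours = (capacity_ah * discharge_margin) / acd
    return 60.0 * t_hours

# CX48100 pack (Table 3) with an assumed margin and power-to-weight ratio
print(round(flight_time_minutes(100, 0.8, 120, 87, 48), 1))  # about 22 min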

4 Experiment Phase

We chose radio frequency technology to control the umbrella via a ground
control operator to improve worker safety. A ground control station and GPS data
manage the umbrella remotely. The large size of the umbrella (2500 mm × 2500 mm)
necessitates the use of a remote controller and flight system. This
ensures its usability and safety in urban areas. The umbrella flight control
system maintains balance by taking all parameters into account. With the addition
of a high-quality ultrasonic sensor (VU0002), the umbrella can take off and land
easily and avoid obstacles. To ensure that the umbrella functions appropriately, the
ultrasonic sensor in the umbrella must work with an obstacle avoidance algorithm.
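A minimal sketch of how such an obstacle avoidance check could use the ultrasonic reading is given below. The 2.0 m safety threshold, the validity window, and the command strings are assumptions for illustration; they are not the VU0002 driver interface or the flight controller's real command set.

SAFE_DISTANCE_M = 2.0              # assumed clearance, below the 2.5 m sensing range
MIN_VALID_M, MAX_VALID_M = 0.2, 2.5

def obstacle_ahead(distance_m):
    """Return True only if a valid echo indicates an object inside the safety margin."""
    if distance_m is None or not (MIN_VALID_M <= distance_m <= MAX_VALID_M):
        return False               # no echo or out-of-range reading: treat as clear
    return distance_m < SAFE_DISTANCE_M

def avoidance_command(distance_m):
    """Hold and climb when something is too close; otherwise keep following."""
    return "HOLD_AND_CLIMB" if obstacle_ahead(distance_m) else "CONTINUE_FOLLOW"

print(avoidance_command(1.4))   # HOLD_AND_CLIMB
print(avoidance_command(2.4))   # CONTINUE_FOLLOW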
A flight controller is one of the most comprehensive features of a flying umbrella.
The application supports everything from return-to-home and carefree flight to altitude
hold. The altitude hold and return-to-home features are particularly helpful for
stable shade and for following the sun's position, and they may help a pilot who is
disoriented about the flight direction. Continuously flying around the yard without
experiencing extreme ups and downs is easy. The Naza-M V2 used in this
work is highly efficient, as shown in Fig. 18a. The flying umbrella is equipped with
an accelerometer, gyroscope, magnetometer, barometric pressure sensor, and GPS
receiver for greater safety in urban environments, as shown in Fig. 18b. As a
safety precaution, a rope is used to pull back the umbrella drone in case communication
is lost, as shown in Fig. 19.
During the flight test, one of the objectives was to determine how the control sys-
tem would perform when the large umbrella size synced with the electronic system.
Pilots were able to fly the umbrella via remote control during flight tests. The first
step in determining parameters after moving an umbrella is to select longitudinal
axes. In this function, the FCS controls the longitudinal axes while the pilot contin-
ues to direct the umbrella. The pilot steered the umbrella in and out of the airport

(a) Connection diagram. (b) A GPS receiver, a barometric pressure sensor, accelerometers, gyroscopes, and magnetometers make up the fly umbrella's control system.

Fig. 18 Fly umbrella RF components



Fig. 19 The umbrella pulled with a rope

and increased gain until the umbrella was stable to minimize steady-state errors. It
is height-adjusted in ascending and descending steps from left to right to ensure sta-
bility and maintain a moderate rise and fall rate. Shades can be controlled remotely
and moved based on the worker’s location.
The umbrella took off successfully, flew steadily, and landed in the desired area,
as shown in Fig. 20. Flying umbrellas provide shade while remaining stable in the
air, as illustrated in Fig. 20b. The workers move in different directions (left, right, and
back) to evaluate the umbrella's mechanical response. We measured the flying
umbrella's trajectory and the tracking error distance during the current study. We
kept the umbrella at a ten-meter altitude. During the testing phase, we observed
that the PID controller of the umbrella was affected by the wind speed. So one of
the limitations of the umbrella is that it cannot fly in strong winds. Because of the
high-efficiency propellers, the flying umbrella can be loud because large quantities
of air are rapidly displaced. While the propeller spins, pressure spikes are generated,
resulting in a distinctive buzzing noise. The flying umbrella produced in-air levels
of 80 dB, with fundamental frequencies at 120 Hz. Noise levels were around 95 dB
when the umbrella flew at altitudes of 5 and 10 m. Ambient noise levels were already very
high in the construction area, so the umbrella's noise was not a significant effect compared to heat stress.

(a) Flying umbrella successful take-off (b) Flying umbrella successful flying

(c) Landed

Fig. 20 Flight test scenes: a–b successful take-off and flight of the flying umbrella, c landing

5 Results and Discussion

The following is a summary of the analysis of the experimental results:


• For safety reasons, we implemented ultrasonic sensors in the umbrella to avoid
obstacles and a GPS module to guide the umbrella to the homing area in case of
losing the RF link with the pilot. The GPS module enables the umbrella to know
its location relative to the home area.
• The umbrella was controlled remotely with success after adjusting flight parameters
such as the speed and the altitude.
• A stable industrial shade was produced that protects workers in construction areas
from high temperatures and UV rays.
• In addition to providing shade, this umbrella is equipped with a water tank in the
middle that produces a mist of water to keep workers feeling cool.
• On a sunny day, the flying umbrella shields more than 75% of ultraviolet light.
• Thanks to its high-performance design and six brushless motors, this umbrella can
carry more than 120 kg.
• The umbrella provides shade over a surface area of 6.25 m2 (2.5 m × 2.5 m).
• The GPS sensor allows the flying umbrella to return to the parking station if it
loses contact with the umbrella pilot.
• The UV-blocker fabric used offers excellent protection from the sun.
A minimum flight height may be required for follow-me modes in this application.
Umbrella drones should fly at a higher altitude, with no obstacles in front of or
behind them. The umbrella drone produces high levels of noise.

Fig. 21 The air temperature difference between shade and full sun on 05 June 2022 (morning)

Table 4 The difference in temperature between objects in shade and full sunlight

Object           | Time     | Full sun | Shade provided by flying umbrella
Workers' hat     | 10.00 AM | 42.3°    | 41.2°
Air temperature  | 10.15 AM | 44°      | 42.3°
Soil temperature | 10.20 AM | 46.5°    | 44°

• A larger-scale version of the umbrella will be usable in many more places after
further R&D.
Comparing temperatures in two locations in the morning, the first under the umbrella
shade and the second under full sun, there was an average difference of 2.5° over
22 min, as shown in Fig. 21. We can observe that the shade reduces air and soil
temperatures and blocks the sun's rays, as shown in Table 4.
On the same date in the afternoon, comparing the same two locations, one under the
umbrella shade and one under direct sunlight, there was an average difference of 2.7°
over 22 min, as shown in Fig. 22. According to Table 5, the shade reduces the
air and soil temperatures by blocking the sun's rays.
Testing at the airport reveals some parameters we can consider to increase the
efficiency of the umbrella. Another power source, such as a solar system, should
be considered to keep the umbrella flying for a long time. Based on the GPS signal,
tracking capabilities are very high. The umbrella produces high levels of noise, which
are acceptable in the workers' area.

Fig. 22 The air temperature difference between shade and full sun on 05 June 2022 (afternoon)

Table 5 The difference in temperature between objects in shade and full sunlight (afternoon)

Object           | Time    | Full sun | Shade provided by flying umbrella
Workers' hat     | 4.00 PM | 41.2°    | 40.4°
Air temperature  | 4.15 PM | 40.4°    | 39.8°
Soil temperature | 4.20 PM | 39.8°    | 39.7°

6 Conclusion

The purpose of this research was to develop a flying umbrella. We demonstrated how
the flying umbrella prototype successfully provides shade to individuals working
outdoors in hot climates. As a prototype, this umbrella design can perform all the
functions of an ordinary umbrella but with human assistance, and it protects workers
from the sun's rays and prevents heat illness at work. In this work, the reader can find
the possible draft design and a list of all the devices required to perform the desired
task. The drone is changing the philosophy of flying objects towards the manufactured
cloud. We compared the temperature difference between objects in the shade and in full
sunlight at two separate times on the same day, morning and afternoon. The temperature
under the canopy is lower than in full sunlight: the air and soil temperatures decreased
by an average of 2.5° in the morning and 2.7° in the afternoon. More work needs to be
done to design umbrellas for different climates and critical conditions. Protecting an
environment subject to rain, snow, and, most importantly, scorching heat is essential. In
future work, we propose the possibility of installing a wireless charging station in
the parking area for the flying umbrella drone.

Acknowledgements No funding to declare.

References

1. Agarwal, G. (2022). Brief history of drones. In: Civilian Drones, Visual Privacy and EU Human
Rights Law, Routledge (pp. 6–26). https://doi.org/10.4324/9781003254225-2
2. Ahmad, L., Kanth, R. H., Parvaze, S., & Mahdi, S. S. (2017). Measurement of cloud cover.
Experimental Agrometeorology: A Practical Manual (pp. 51–54). Springer International Pub-
lishing. https://doi.org/10.1007/978-3-319-69185-5_8
3. Ahmadhon, K., Al-Absi, M. A., Lee, H. J., & Park, S. (2019). Smart flying umbrella drone on
internet of things: AVUS. In 2019 21st International Conference on Advanced Communication
Technology (ICACT), IEEE. https://doi.org/10.23919/icact.2019.8702024
4. Al-Bouwarthan, M., Quinn, M. M., Kriebel, D., & Wegman, D. H. (2019). Assessment of heat
stress exposure among construction workers in the hot desert climate of saudi arabia. Annals
of Work Exposures and Health, 63(5), 505–520. https://doi.org/10.1093/annweh/wxz033.
5. Al-Hatimy, F., Farooq, A., Abiad, M. A., Yerramsetti, S., Al-Nesf, M. A., Manickam, C.,
et al. (2022). A retrospective study of non-communicable diseases amongst blue-collar migrant
workers in qatar. International Journal of Environmental Research and Public Health, 19(4),
2266. https://doi.org/10.3390/ijerph19042266.
6. Ambarish Vaidyanathan PSSS Josephine Malilay. (2020). Heat-related deaths - united states,
2004–2018. Centers for Disease Control and Prevention, 69(24), 729–734.
7. Bashari, A., Shakeri, M., & Shirvan, A. R. (2019). UV-protective textiles. In The Impact and
Prospects of Green Chemistry for Textile Technology (pp. 327–365). Elsevier. https://doi.org/
10.1016/b978-0-08-102491-1.00012-5
8. Bodin, T., García-Trabanino, R., Weiss, I., Jarquín, E., Glaser, J., Jakobsson, K., et al. (2016).
Intervention to reduce heat stress and improve efficiency among sugarcane workers in el sal-
vador: Phase 1. Occupational and Environmental Medicine, 73(6), 409–416. https://doi.org/
10.1136/oemed-2016-103555.
9. Chaari, M. Z., & Al-Maadeed, S. (2021). The game of drones/weapons makers war on drones.
In Unmanned Aerial Systems (pp. 465–493). Elsevier. https://doi.org/10.1016/b978-0-12-
820276-0.00025-x
10. Chaari, M. Z., & Aljaberi, A. (2021). A prototype of a robot capable of tracking anyone with
a high body temperature in crowded areas. International Journal of Online and Biomedical
Engineering (iJOE), 17(11), 103–123. https://doi.org/10.3991/ijoe.v17i11.25463.
11. Chaari, M. Z., Abdelfatah, M., Loreno, C., & Al-Rahimi, R. (2021). Development of air condi-
tioner robot prototype that follows humans in outdoor applications. Electronics, 10(14), 1700.
https://doi.org/10.3390/electronics10141700.
12. Crowe, J., Rojas-Garbanzo, M., Rojas-Valverde, D., Gutierrez-Vargas, R., Ugalde-Ramírez,
J., & van Wendel de Joode, B. (2020). Heat exposure and kidney health of costa rican rice
workers. ISEE Conference Abstracts, 2020(1). https://doi.org/10.1289/isee.2020.virtual.o-os-
549
13. DeFrangesco, R., & DeFrangesco, S. (2022). The history of drones. In The big book of drones
(pp. 15–28). CRC Press. https://doi.org/10.1201/9781003201533-2
14. Desch, S. J., Smith, N., Groppi, C., Vargas, P., Jackson, R., Kalyaan, A., et al. (2017). Arctic
ice management. Earths Future, 5(1), 107–127. https://doi.org/10.1002/2016ef000410.

15. Dhumane, R., Ling, J., Aute, V., & Radermacher, R. (2017). Portable personal conditioning
systems: Transient modeling and system analysis. Applied Energy, 208, 390–401. https://doi.
org/10.1016/j.apenergy.2017.10.023.
16. Dhumane, R., Mallow, A., Qiao, Y., Gluesenkamp, K. R., Graham, S., Ling, J., & Radermacher,
R. (2018). Enhancing the thermosiphon-driven discharge of a latent heat thermal storage system
used in a personal cooling device. International Journal of Refrigeration, 88, 599–613. https://
doi.org/10.1016/j.ijrefrig.2018.02.005.
17. Geffroy, E., Masia, M., Laera, A., Lavidas, G., Shayegh, S., & Jolivet, R. B. (2018). Mcaa
statement on ipcc report “global warming of 1.5 c”. https://doi.org/10.5281/ZENODO.1689921
18. Gies, P., & Mackay, C. (2004). Measurements of the solar UVR protection provided by shade
structures in new zealand primary schools. Photochemistry and Photobiology. https://doi.org/
10.1562/2004-04-13-ra-138.
19. Hameed, S. (2021). India’s labour agreements with the gulf cooperation council coun-
tries: An assessment. International Studies, 58(4), 442–465. https://doi.org/10.1177/
00208817211055344.
20. Hayes, M. J., Levine, T. P., & Wilson, R. H. (2016). Identification of nanopillars on the cuticle
of the aquatic larvae of the drone fly (diptera: Syrphidae). Journal of Insect Science, 16(1), 36.
https://doi.org/10.1093/jisesa/iew019.
21. Haywood, J., Jones, A., Johnson, B., & Smith, W. M. (2022). Assessing the consequences of
including aerosol absorption in potential stratospheric aerosol injection climate intervention
strategies. https://doi.org/10.5194/acp-2021-1032.
22. How, V., Singh, S., Dang, T., Lee, L. F., & Guo, H. R. (2022). The effects of heat exposure
on tropical farm workers in malaysia: Six-month physiological health monitoring. Interna-
tional Journal of Environmental Health Research, 1–17. https://doi.org/10.1080/09603123.
2022.2033706
23. (ILO) ILO (2014) Informal Economy and Decent Work: A Policy Resource Guide Supporting
Transitions to Formality. INTL LABOUR OFFICE
24. Irvine, P., Emanuel, K., He, J., Horowitz, L. W., Vecchi, G., & Keith, D. (2019). Halving
warming with idealized solar geoengineering moderates key climate hazards. Nature Climate
Change, 9(4), 295–299. https://doi.org/10.1038/s41558-019-0398-8.
25. Irvine, P., Burns, E., Caldeira, K., Keutsch, F., Tingley, D., & Keith, D. (2021). Expert judge-
ments judgements on solar geoengineering research priorities and challenges. https://doi.org/
10.31223/x5bg8c
26. JULIA SHIPLEY DNRBSMCCWT BRIAN EDWARDS (2021) Hot days: Heat’s mounting
death toll on workers in the u.s.
27. Khan, H. T. A., Hussein, S., & Deane, J. (2017). Nexus between demographic change and
elderly care need in the gulf cooperation council (GCC) countries: Some policy implications.
Ageing International, 42(4), 466–487. https://doi.org/10.1007/s12126-017-9303-9.
28. Lee, J., Lee, Y. H., Choi, W. J., Ham, S., Kang, S. K., Yoon, J. H., et al. (2021). Heat exposure
and workers’ health: a systematic review. Reviews on Environmental Health, 37(1), 45–59.
https://doi.org/10.1515/reveh-2020-0158.
29. Lee, W. R., MacMartin, D. G., Visioni, D., & Kravitz, B. (2021). High-latitude stratospheric
aerosol geoengineering can be more effective if injection is limited to spring. Geophysical
Research Letters, 48(9). https://doi.org/10.1029/2021gl092696.
30. Lee, Y. K., Yeom, T. Y., & Lee, S. (2022). A study on noise analysis of counter-rotating
propellers for a manned drone. The KSFM Journal of Fluid Machinery, 25(2), 38–44. https://
doi.org/10.5293/kfma.2022.25.2.038.
31. Marucci, A., Monarca, D., Cecchini, M., Colantoni, A., Giacinto, S. D., & Cappuccini, A.
(2014). The heat stress for workers employed in a dairy farm. Journal of Agricultural Engi-
neering, 44(4), 170. https://doi.org/10.4081/jae.2013.218.
32. Mehta, B. (2021) Heat exposure has killed hundreds of u.s. workers - it’s time to do something
about it. Industrial Saftey & Hygiene News, 3(52).
33. Meyer, R. (2018). A radical new scheme to prevent catastrophic sea-level rise. The Atlantic

34. Ming, T., de_Richter, R., Liu, W., & Caillol, S. (2014). Fighting global warming by climate
engineering: Is the earth radiation management and the solar radiation management any option
for fighting climate change? Renewable and Sustainable Energy Reviews,31, 792–834. https://
doi.org/10.1016/j.rser.2013.12.032
35. Niedzielski, T., Jurecka, M., Miziński, B., Pawul, W., & Motyl, T. (2021). First successful
rescue of a lost person using the human detection system: A case study from beskid niski (SE
poland). Remote Sensing, 13(23), 4903. https://doi.org/10.3390/rs13234903.
36. Oliver Jokisch, D. F. (2019). Drone sounds and environmental signals - a first review. 30th
ESSV ConferenceAt: TU Dresden
37. Pan, Q., Sumner, D. A., Mitchell, D. C., & Schenker, M. (2021). Compensation incentives and
heat exposure affect farm worker effort. PLOS ONE, 16(11), e0259,459. https://doi.org/10.
1371/journal.pone.0259459
38. Peace, A. H., Carslaw, K. S., Lee, L. A., Regayre, L. A., Booth, B. B. B., Johnson, J. S., &
Bernie, D. (2020). Effect of aerosol radiative forcing uncertainty on projected exceedance year
of a 1.5 c global temperature rise. Environmental Research Letters, 15(9), 0940a6. https://doi.
org/10.1088/1748-9326/aba20c
39. Pradhan, B., Kjellstrom, T., Atar, D., Sharma, P., Kayastha, B., Bhandari, G., & Pradhan, P. K.
(2019). Heat stress impacts on cardiac mortality in nepali migrant workers in qatar. Cardiology,
143(1–2), 37–48. https://doi.org/10.1159/000500853.
40. Sankar, S., & Tsai, C. Y. (2019). ROS-based human detection and tracking from a wireless
controlled mobile robot using kinect. Applied System Innovation, 2(1), 5. https://doi.org/10.
3390/asi2010005.
41. Thalheimer, E. (2021). Community acceptance of drone noise. INTER-NOISE and NOISE-
CON Congress and Conference Proceedings, 263(6), 913–924. https://doi.org/10.3397/in-
2021-1694.
42. Uejio, C. K., Morano, L. H., Jung, J., Kintziger, K., Jagger, M., Chalmers, J., & Holmes,
T. (2018). Occupational heat exposure among municipal workers. International Archives of
Occupational and Environmental Health, 91(6), 705–715. https://doi.org/10.1007/s00420-
018-1318-3.
43. Woelders, T., Wams, E. J., Gordijn, M. C. M., Beersma, D. G. M., & Hut, R. A. (2018).
Integration of color and intensity increases time signal stability for the human circadian system
when sunlight is obscured by clouds. Scientific Reports, 8(1). https://doi.org/10.1038/s41598-
018-33606-5
44. Wood, D., Yablonina, M., Aflalo, M., Chen, J., Tahanzadeh, B., & Menges, A. (2018). Cyber
physical macro material as a UAV [re]configurable architectural system. Robotic Fabrication
in Architecture, Art and Design 2018 (pp. 320–335). Springer International Publishing. https://
doi.org/10.1007/978-3-319-92294-2_25
Accurate Estimation of 3D-Repetitive-Trajectories using Kalman Filter, Machine Learning and Curve-Fitting Method for High-speed Target Interception

Aakriti Agrawal, Aashay Bhise, Rohitkumar Arasanipalai, Lima Agnel Tony, Shuvrangshu Jana, and Debasish Ghose

Abstract Accurate estimation of trajectory is essential for the capture of any high-
speed target. This chapter estimates and formulates an interception strategy for the
trajectory of a target moving in a repetitive loop using a combination of estimation
and learning techniques. An extended Kalman filter estimates the current location of
the target using the visual information in the first loop of the trajectory to collect data
points. Then, a combination of Recurrent Neural Network (RNN) with least-square
curve-fitting is used to accurately estimate the future positions for the subsequent
loops. We formulate an interception strategy for the interception of a high-speed
target moving in a three-dimensional curve using noisy visual information from
a camera. The proposed framework is validated in the ROS-Gazebo environment
for interception of a target moving in a repetitive figure-of-eight trajectory. Astroid,
Deltoid, Limacon, Squircle, and Lemniscates of Bernoulli are some of the high-order
curves used for algorithm validation.

Keywords Extended kalman filter · RNN · Least-square curve-fitting ·


3D-repetitive-trajectory · High-speed interception · Estimation

Nomenclature
f focal length of the camera
P Error covariance matrix

A. Agrawal · A. Bhise · R. Arasanipalai · L. A. Tony · S. Jana · D. Ghose (B)


Department of Aerospace Engineering, Indian Institute of Science, Guidance Control and
Decision Systems Laboratory (GCDSL), Bangalore 560012, India
e-mail: [email protected]
A. Agrawal
e-mail: [email protected]
R. Arasanipalai
e-mail: [email protected]


r Radius of instantaneous centre of curvature of target trajectory


rdes Desired yaw rate for interceptor
k
xtarget The inertial coordinates of the target x-position at kth sampling time
k
ytarget The inertial coordinates of the target y-position at kth sampling time
k
xtarget,vision The target x-position observed using camera at kth sampling time
k
ytarget,vision The target y-position observed using camera at kth sampling time
xk Target pixel x coordinate at kth sampling time
yk Target pixel y coordinate at kth sampling time
xck x coordinate of instantaneous centre of curvature of target trajectory at
kth sampling time
yck y coordinate of instantaneous centre of curvature of target trajectory at
kth sampling time.
X̂ k−1 State variable at the instant of availability of sensor measurement.
X̂ k+1 State variable after measurement update.
X̂ k+1 State variable after measurement update.
Vtarget Speed of target.
Vdes Desired velocity of interceptor.
Zk Depth of target
RNN Recurrent Neural Network
UAV Unmanned Aerial Vehicle
EKF Extended Kalman Filter
IMU Inertial Measurement Unit

1 Introduction

In automated robotics systems, a class of problems that have engaged the attention
of researchers is that of motion tracking and guidance using visual information by
which an autonomously guided robot can track and capture a target moving in an
approximately known or predicted trajectory. Interception of a target in an outdoor
environment is challenging, and it is important for the defence of the military as well
as important civilian infrastructures. Interception of intruder targets using UAVs has
the advantages of low cost and quick deployability; however, the performance of the
UAV is limited by its payload capability. In this case, the detection of the target is
performed using visual information. Interception of a target with UAVs using visual
information is reported in various literature such as [8, 17, 33] and it is difficult due to
limitations on sensing, payload and computational capability of UAVs. Interception
strategies are generally based on the estimation of target future trajectories [30, 31],
controller based on visual servoing [5, 12], or using vision based guidance law
[22, 34]. Controller based on visual servoing is mostly applicable for slow-moving
targets. Guidance strategies for the interception of a high-speed target are difficult as
the capturability region of the guidance law is small for the interceptor compared to
a low-speed target. Interception using the visual information is further difficult as the

visual output can be noisy in the outdoor environment, and the range of view is small
compared to other sensors such as radar. The interception strategy by prediction of
target trajectory over a shorter interval is not effective in the case of a high-speed
target. Therefore, accurate estimation of the trajectory of the target is important for
efficient interception of a high-speed target. Once the looping trajectory of the target
is obtained, the interceptor could be placed in a favourable position to increase the
probability of interception. In this chapter, we consider a problem where an aerial
target is moving in a repetitive loop at high speed and an aerial robot, or a drone, has
to observe the target’s motion via its imaging system and predict the target trajectory
in order to guide itself for effective capture of the target.
In this chapter, the strategy for interception of a high-speed target moving in a
repetitive loop is formulated after estimation and prediction of target trajectory using
the Extended Kalman Filter, Recurrent Neural Network (RNN) and least square curve
fitting techniques. An Extended Kalman filter (EKF) is used to track a manoeuvring
target moving in an approximately repetitive loop by using the first loop of the tra-
jectory to collect data points and then using a combination of machine learning
with least-square curve-fitting to accurately estimate future positions for the sub-
sequent loops. The EKF estimates the current location of the target from its visual
information and then predicts its future position by using the observation sequence.
We utilise noisy visual information of the target from the three-dimensional trajec-
tory to carry out the trajectory estimation. Several high-order curves, expressed as
univariate polynomials, are considered test cases. Some of these are Circle/Ellipse,
Astroid, Deltoid, Limacon, Nephroid, Quadrifolium, Squircle, and Lemniscates of
Bernoulli and Gerono, among others. The proposed algorithm is demonstrated in the
ROS-Gazebo environment and is implemented in field tests. The problem statement
is motivated by Challenge-1 of MBZIRC-2020, where the objective is to catch an
intruder target moving in an unknown repetitive figure-of-eight trajectory (shown
in Fig. 1). The ball attached to the target drone is moving in an approximate figure-
of-eight trajectory in 3D, and the parameters of the trajectory are unknown to the
interceptors. Interceptors need to detect, estimate and formulate strategies for grab-
bing or interception of the target ball. In this chapter, the main focus is to estimate
the target trajectory using visual information. The method proposed in the chapter is
used first to estimate the position of the target using the Kalman Filter techniques,
and then the geometry of the looping trajectory is estimated using the learning and
curve fitting techniques. The main contributions of the chapter are the following:
1. Estimation of target position using visual information in EKF framework.
2. Estimation of target trajectory moving in a standard geometric curve in a closed
loop using Recurrent Neural Network and Least-Square curve fitting techniques.
3. Development of a strategy for interception of a high-speed target moving in a
standard geometric curve in a repetitive loop.
The rest of the chapter is organised as follows: Relevant literature is presented
in Sect. 2. Estimation and prediction of target location using visual information are
presented in Sect. 3. Detailed curve fitting methods using learning in 2D and 3D are

Fig. 1 Problem statement

described in Sect. 4. The strategy for interception of a high-speed target is formulated


in Sect. 5. Simulation results are presented in Sect. 6. The summary of the chapter is
discussed in Sect. 7.

2 Related Work

Estimation of the target position during an interception scenario is traditionally


obtained using radar information. Estimation of the target position using an Extended
Kalman Filter from noisy GPS measurements is reported in the literature [3, 19, 23].
Several interesting works have been reported in the literature about the interception
of a missile having a higher speed than the interceptor [35, 37]. However, interception
using visual information is a relatively new topic. The estimation of the target
position from visual information in an outdoor environment is highly uncertain due to
the presence of high noise in the target pixel information.
Estimation and prediction of moving objects using Kalman filtering techniques
are reported in [1, 24, 27]; however, prediction accuracy with these techniques
reduces with the time horizon. Target trajectory estimation based on learning techniques
is reported in various literature such as k-means [26], Markov models [7], and Long
Short Term Memory (LSTM) [29] techniques. In [29], the LSTM learning technique
is used for the estimation of the trajectory of highway vehicles from temporal data.
Here, the data is geographically classified into different clusters, and then a different
LSTM model is applied for robust trajectory estimation from a large volume of
information on vehicle position. After partitioning the trajectories into approximate
line segments, a novel trajectory clustering technique is proposed to estimate the
pattern of target motion [32]. In [28], the Inertial Measurement Unit (IMU) and visual
information are combined through an unsupervised deep neural network, the Visual-
Inertial-Odometry Learner (VIOLearner) network, for trajectory estimation. In [25],
a Convolutional Neural Network is used to estimate the vehicle trajectory using
optical flow.

Trajectory estimation using curve fitting methods is reported in [2, 10, 15, 16, 18].
In [2], bearing-only measurements are used for fitting splines for highly manoeuvring
targets. In [16], a smooth target trajectory is estimated using a sliding time
window approach where the parameters of a parametric curve are updated iteratively.
In [18], a Spline Fitting Filtering (SFF) algorithm is used to fit a cubic spline to estimate
the trajectory of the manoeuvring target. In [15], the trajectory is estimated using data-driven
regression analysis, known as "fitting for smoothing (F4S)", with the assumption that
the trajectory is a function of time.
Interception of a target using the estimation of the target’s future position is
reported in various literature such as [9, 13, 36]. In [13], the target trajectory is
estimated using a simple linear extrapolation method and uncertainty in the target
position is considered using a random variable. In [36], the interception strategy
is formulated using the prediction of the target trajectory from historical data and the
selection of an optimal path using third-order Bezier curves. The proposed formulation
is validated using simulation only and not using real visual information.
Capturing an aerial target using a robotic manipulator after the target’s pose and
motion estimation using the adaptive extended Kalman filter and photogrammetry is
reported in [9].
Other research groups approached a similar problem statement (as shown in
Fig. 1) [4, 6, 38], where the estimation of the trajectory is performed using filtering
techniques assuming that the target follows the figure-of-eight trajectory; however,
a general approach for estimating a trajectory following an unknown geometric
curve is not reported.

3 Vision Based Target Position Estimation

In this section, the global position of the target is estimated from visual information.
It is assumed that the target trajectory lies in the 2D plane, and thus measurements
of the target in the global X -Y plane are considered. The target’s motion is assumed
to be smooth; that is, the change in curvature of the trajectory remains bounded and
smooth over time. Let xtarget and ytarget are coordinates of the target position, and
Vtarget is target speed, ψ is flight path angle with the horizontal. The target motion
without considering wind can be expressed as,

ẋtarget = Vtarget cosψ (1)

ẏtarget = Vtarget sinψ (2)

ψ̇ = ω (3)

The target trajectory, instantaneous circle, and the important variables at the kth sampling
time are shown in Fig. 2. Let the position of the target at the kth sampling time be $X_k$,
given as

Fig. 2 Target trajectory

$$X_k = \begin{bmatrix} x_{target}^k & y_{target}^k \end{bmatrix}^T \qquad (4)$$

where $x_{target}^k$ and $y_{target}^k$ are the inertial coordinates of the target in the global X-Y plane
at the kth sampling time, and $\{x_c^k, y_c^k\}$ are the coordinates of the centre of the instantaneous
curvature of the target trajectory at that instant. From Fig. 2, the variables $\theta_k$ and $\psi_k$
are related as follows,
$$\theta_k = \psi_k - \frac{\pi}{2} \qquad (5)$$
based on which Eqs. (1) and (2) can be simplified to,

ẋtarget = −Vtarget sinθ (6)

ẏtarget = Vtarget cosθ (7)

Therefore, the motion of the target can be represented in discrete time-space by the
following equations,

$$x_{target}^k = x_{target}^{k-1} - V_{target}\,\Delta t\,(y_{target}^k - y_c^k)\big/\sqrt{(y_{target}^k - y_c^k)^2 + (x_{target}^k - x_c^k)^2} \qquad (8)$$

$$y_{target}^k = y_{target}^{k-1} + V_{target}\,\Delta t\,(x_{target}^k - x_c^k)\big/\sqrt{(y_{target}^k - y_c^k)^2 + (x_{target}^k - x_c^k)^2} \qquad (9)$$

where Vtarget is the speed of the target and Δt is the sampling time interval.
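As a quick illustration of Eqs. (8)-(9), the following Python sketch propagates the target one sampling step along the instantaneous circle. It uses the previous position on the right-hand side (a simple explicit step), which is an assumption made for this sketch rather than something prescribed by the chapter.

import math

def propagate_target(x_prev, y_prev, xc, yc, v_target, dt):
    """One explicit step of the discrete target model of Eqs. (8)-(9)."""
    dist = math.hypot(x_prev - xc, y_prev - yc)   # distance to the instantaneous centre
    x_new = x_prev - v_target * dt * (y_prev - yc) / dist
    y_new = y_prev + v_target * dt * (x_prev - xc) / dist
    return x_new, y_new

# Example: a target circling a centre at the origin with radius 5 m at 3 m/s
x, y = 5.0, 0.0
for _ in range(10):
    x, y = propagate_target(x, y, 0.0, 0.0, v_target=3.0, dt=0.05)
print(round(x, 2), round(y, 2))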

The target position is estimated using the target pixel information from the image
plane of a monocular camera. The target position is obtained considering the
perspective projection of the target in the image plane. If the coordinates of the estimated
target position are $(x_{target,vision}^k, y_{target,vision}^k)$, then

$$x_{target,vision}^k = \frac{Z_k\, x_k}{f} \qquad (10)$$

$$y_{target,vision}^k = \frac{Z_k\, y_k}{f} \qquad (11)$$

where $x_k$ and $y_k$ are the coordinates of the target pixel in the image plane, $f$ is the
focal length of the camera, and $Z_k$ is the target depth.
The target measurement $Y_k$ can be presented as,

$$Y_k = \begin{bmatrix} x_{target,vision}^k \\ y_{target,vision}^k \end{bmatrix} = \begin{bmatrix} x_{target}^k \\ y_{target}^k \end{bmatrix} + \eta_k \qquad (12)$$

where $\eta_k \sim N(0, R)$ is the measurement noise, assumed to be normally distributed.
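For clarity, a small sketch of the perspective back-projection of Eqs. (10)-(11) is shown below; the numerical values (focal length in pixels, target depth, pixel offsets from the principal point) are assumptions used only for the example.

def pixel_to_world(x_px, y_px, depth_z, focal_length):
    """Back-project a pixel offset to metric coordinates: world = Z * pixel / f (Eqs. 10-11)."""
    return depth_z * x_px / focal_length, depth_z * y_px / focal_length

# Assumed example: f = 600 px, target depth Z_k = 12 m, pixel offset (40, -25)
print(pixel_to_world(40.0, -25.0, 12.0, 600.0))   # (0.8, -0.5) metres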

3.1 Computation of Centre of Instantaneous Curvature of Target Trajectory

The computation of the coordinates of the centre of instantaneous curvature of the target
trajectory and the prediction of the target trajectory from the observation sequences are derived
using an approach similar to the derivation of the discrete-time guidance law reported
in [20]. Let the coordinates of the centre of instantaneous curvature be calculated
from the previous m observations, and assume that the trajectory maintains a constant
curvature over the observed sequence of the target's positions. Using Fig. 3, the target
motion is expressed as,

Fig. 3 Centre of curvature



$$x_{target}^{k+1} = x_{target}^k - r\,\delta \sin\theta_k \qquad (13)$$

$$y_{target}^{k+1} = y_{target}^k + r\,\delta \cos\theta_k \qquad (14)$$

$$\theta_k = \theta_{k-1} + \delta \qquad (15)$$

where r is the radius of the instantaneous circle, and δ is the change in the target’s
flight path angle θ between the time steps. Let the sequence of last m observations
of target positions gathered at sample index k be,

$$\{x_{target}^{k-i},\, y_{target}^{k-i}\} \quad \text{where } i = 0, 1, 2, \ldots, m-1 \qquad (16)$$

We will define the difference in x-position and y-position of target of jth sequence
at kth sample index as,
$$\Delta x_{target}(k, j) = x_{target}^{k-j} - x_{target}^{k-j-1} = -r\,\delta \sin\theta_{k-j-1} \qquad (17)$$

$$\Delta y_{target}(k, j) = y_{target}^{k-j} - y_{target}^{k-j-1} = r\,\delta \cos\theta_{k-j-1} \qquad (18)$$

Equation 17 can be written as,

Δxtarget (k, j) = −r δ sin(θk− j−2 + δ) (19)

Equivalently,

Δxtarget (k, j) = −r δ sin θk− j−2 cos δ − r δ cos θk− j−2 sin δ (20)

Therefore,

Δxtarget (k, j) = Δxtarget (k, j − 1) cos δ − Δytarget (k, j − 1) sin δ (21)

Similarly, Eq. 18 can be written as,

Δytarget (k, j) = r δ cos(θk− j−2 + δ) (22)

Therefore,

Δytarget (k, j) = Δxtarget (k, j − 1) sin δ + Δytarget (k, j − 1) cos δ (23)

Since the parameter δ describes the evolution of the target’s states, the elements of
the evolution matrix contain (cos δ, sin δ). The difference-in-observation equations
are written in matrix form as Eq. 24, for j = 0, 1, ..., m − 1.
$$\begin{bmatrix} \vdots \\ \Delta x_{target}(k, j) \\ \Delta y_{target}(k, j) \\ \vdots \end{bmatrix} = \begin{bmatrix} \vdots & \vdots \\ \Delta x_{target}(k, j-1) & -\Delta y_{target}(k, j-1) \\ \Delta y_{target}(k, j-1) & \Delta x_{target}(k, j-1) \\ \vdots & \vdots \end{bmatrix} \begin{bmatrix} \cos\delta \\ \sin\delta \end{bmatrix} \qquad (24)$$

The least squares solution of the observation sequence provides the estimation of
the evolution matrix at every sampling step, and we obtain the estimated value of δ
as δ̂.
Let (xc (k), yc (k)) be the co-ordinates of the instantaneous center of curvature of
the target trajectory, then from Fig. 3 we can write,

$$x_c(k) + r\cos\theta_k = x_{target}^k \qquad (25)$$

$$y_c(k) + r\sin\theta_k = y_{target}^k \qquad (26)$$

Therefore using (17) and (18), (xc (k), yc (k)) is calculated as follows:

$$x_c(k) = x_{target}^k - \frac{\Delta y_{target}(k, 1)}{\hat{\delta}} \qquad (27)$$

$$y_c(k) = y_{target}^k + \frac{\Delta x_{target}(k, 1)}{\hat{\delta}} \qquad (28)$$
Steps for calculating the centre of curvature of the target trajectory are mentioned
in detail in Algorithm 1.

Algorithm 1: Algorithm for computing the instantaneous centre of curvature of the target trajectory

Input: Sequence of m target measurements {x^{k-i}_target, y^{k-i}_target}, where i = 0, 1, 2, ..., m − 1

1. Populate Δx_target(k, j) and Δy_target(k, j) in b(k)
2. Populate Δx_target(k, j − 1) and Δy_target(k, j − 1) in A(k)
3. Solve for the evolution matrix:
   (cos δ, sin δ) ← b(k)A(k)^T (A(k)A(k)^T)^{−1}
4. Compute the centre coordinates:
   x_c(k) ← x^k_target − Δy_target(k, 1)/δ̂
   y_c(k) ← y^k_target + Δx_target(k, 1)/δ̂

Output: x_c(k) and y_c(k)
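To make the procedure concrete, a minimal Python/NumPy sketch of Algorithm 1 is given below. The function name, the ordering of the input arrays (oldest observation first), and the use of the most recent position difference in Eqs. (27)-(28) are illustrative assumptions; the pipeline described later in the chapter is implemented in C++/ROS, so this sketch is for exposition only.

import numpy as np

def centre_of_curvature(xs, ys):
    """Sketch of Algorithm 1: estimate (cos d, sin d) of the evolution matrix by
    least squares from the last m observations and return the instantaneous
    centre of curvature.  xs, ys hold the m most recent target positions,
    ordered oldest to newest."""
    # Consecutive position differences, Eqs. (17)-(18)
    dx = np.diff(xs)
    dy = np.diff(ys)
    # Stacked linear system of Eq. (24): newer difference = rotated older difference
    b = np.concatenate([dx[1:], dy[1:]])
    A = np.vstack([np.column_stack([dx[:-1], -dy[:-1]]),
                   np.column_stack([dy[:-1],  dx[:-1]])])
    cos_d, sin_d = np.linalg.lstsq(A, b, rcond=None)[0]
    delta_hat = np.arctan2(sin_d, cos_d)   # estimated change in flight-path angle
    # Centre of instantaneous curvature, Eqs. (27)-(28); a delta_hat close to zero
    # (straight-line motion) pushes the centre towards infinity.
    xc = xs[-1] - dy[-1] / delta_hat
    yc = ys[-1] + dx[-1] / delta_hat
    return xc, yc, delta_hat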



3.2 EKF Formulation

Once the instantaneous centre of curvature of the target trajectory at the current
instant is estimated, the target position is estimated using the continuous-discrete
extended Kalman Filter (EKF) framework. The continuous target motion model is
represented as,
\dot{X} = F(X, U, \xi)                                                                        (29)

where

F(X, U) = \begin{bmatrix} -V_{target}(y_{target} - y_c)/\sqrt{(y_{target} - y_c)^2 + (x_{target} - x_c)^2} \\ \;\; V_{target}(x_{target} - x_c)/\sqrt{(y_{target} - y_c)^2 + (x_{target} - x_c)^2} \end{bmatrix}          (30)

The discrete measurement model is


Y_k = H_k(X_k, \eta_k) = \begin{bmatrix} x^k_{target} \\ y^k_{target} \end{bmatrix} + \eta_k             (31)

where ξ is the process noise and ηk is the measurement noise. It is assumed that
process noise and measurement noises are zero mean Gaussian white noise, that is,
ξ ∼ N (0, Q) and ηk ∼ N (0, R).
The prediction step is the first stage of the EKF algorithm, where the previous state
estimate and the input are propagated through the non-linear process model, Eq. (32),
to arrive at the predicted state estimate.

\dot{\hat{X}} = F(\hat{X}, U, 0)                                                              (32)

The error covariance matrix is propagated as follows:

\dot{P} = AP + PA^T + Q                                                                       (33)

where the matrix A = \frac{\partial F}{\partial X}\big|_{\hat{X}}. The matrix A can be derived as,

A = \frac{V_{target}}{\left((y_{target} - y_c)^2 + (x_{target} - x_c)^2\right)^{3/2}} \, \Gamma          (34)

where

\Gamma = \begin{bmatrix} (y_{target} - y_c)(x_{target} - x_c) & -(x_{target} - x_c)^2 \\ (y_{target} - y_c)^2 & -(x_{target} - x_c)(y_{target} - y_c) \end{bmatrix}          (35)

The state and measurement update equations are given by,

\hat{X}^+_k = \hat{X}^-_k + L_k (Y_k - C_k \hat{X}^-_k)                                       (36)

P^+_k = (I - L_k C_k) P^-_k                                                                   (37)

where

L_k = P_k C_k^T (R + C_k P_k C_k^T)^{-1}                                                      (38)

C_k = \frac{\partial H}{\partial X}\big|_{\hat{X}}                                            (39)
The EKF position estimation framework provides a filtered position of the target,
which is then used for predicting the target’s trajectory. The workflow of trajectory
prediction is divided into two phases, namely, the observation phase and the predic-
tion phase. During the observation phase, a predefined sequence of observations of
the estimated target position is gathered, and the trajectory is predicted in the near
future.
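The continuous-discrete EKF of Eqs. (29)-(39) can be summarised in a short sketch. The function below is an illustrative Python/NumPy rendering under simplifying assumptions (a single Euler integration step per cycle, a position-only state, and measurement matrix C = I); it is not the chapter's C++ implementation.

import numpy as np

def ekf_step(x_hat, P, z, xc, yc, V, Q, R, dt):
    """One predict/update cycle of the EKF in Sect. 3.2 for a target moving on a
    circle centred at (xc, yc) with speed V.  x_hat = [x, y] is the state
    estimate, z the vision measurement of Eq. (12)."""
    dx, dy = x_hat[0] - xc, x_hat[1] - yc
    rho = np.hypot(dx, dy)
    # Process model of Eq. (30): velocity tangential to the instantaneous circle
    f = np.array([-V * dy / rho, V * dx / rho])
    # Jacobian A = dF/dX, Eqs. (34)-(35)
    A = (V / rho**3) * np.array([[dy * dx, -dx**2],
                                 [dy**2,   -dx * dy]])
    # Prediction, Eqs. (32)-(33), integrated with a single Euler step over dt
    x_pred = x_hat + f * dt
    P_pred = P + (A @ P + P @ A.T + Q) * dt
    # Update, Eqs. (36)-(38); the camera measures position directly, so C = I
    C = np.eye(2)
    L = P_pred @ C.T @ np.linalg.inv(R + C @ P_pred @ C.T)
    x_new = x_pred + L @ (z - C @ x_pred)
    P_new = (np.eye(2) - L @ C) @ P_pred
    return x_new, P_new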

3.3 Future State Prediction

Prediction of the target position over a short horizon is important for ease of tracking
the target. For prediction of the target position up to n steps ahead, we can write, for
j = 0, 1, 2, ..., n − 1,

\hat{x}^{k+j+1}_{target} = \hat{x}^{k+j}_{target} + \Delta\hat{x}_{target}(k, j+1)                        (40)

\hat{y}^{k+j+1}_{target} = \hat{y}^{k+j}_{target} + \Delta\hat{y}_{target}(k, j+1)                        (41)

where Δx̂_target(k, j + 1) and Δŷ_target(k, j + 1) are expressed as,

\Delta\hat{x}_{target}(k, j+1) = \hat{x}^{k+j+1}_{target} - \hat{x}^{k+j}_{target} = \Delta\hat{x}_{target}(k, j)\cos\hat{\delta} - \Delta\hat{y}_{target}(k, j)\sin\hat{\delta}        (42)

\Delta\hat{y}_{target}(k, j+1) = \hat{y}^{k+j+1}_{target} - \hat{y}^{k+j}_{target} = \Delta\hat{x}_{target}(k, j)\sin\hat{\delta} + \Delta\hat{y}_{target}(k, j)\cos\hat{\delta}        (43)
The steps for the trajectory prediction are described in Algorithm 2.

4 Mathematical Formulation for Curve Fitting Method

In this section, the mathematical formulation for the estimation of the looping trajectory
is derived. The curve-fitting technique is applied in the next loop based on the initial
observations of the target position in the first loop. It is to be noted that conventional
curve-fitting techniques using the regression method will fail to estimate complex curves
with multiple loops; therefore, the curve-fitting technique is formulated using learning
techniques.

Algorithm 2: Trajectory prediction

Input: Sequence of m measurements {x^{k-i}_target, y^{k-i}_target}, where i = 0, 1, 2, ..., m − 1

• Populate Δx_target(k, j) and Δy_target(k, j) in b(k)
• Populate Δx_target(k, j − 1) and Δy_target(k, j − 1) in A(k)
• Estimate δ̂ after solving for the evolution matrix:
  (cos δ, sin δ) ← b(k)A(k)^T (A(k)A(k)^T)^{−1}
• Predict the sequential change in the target location:
  Δx̂_target(k, j + 1) ← Δx̂_target(k, j) cos δ̂ − Δŷ_target(k, j) sin δ̂
  Δŷ_target(k, j + 1) ← Δx̂_target(k, j) sin δ̂ + Δŷ_target(k, j) cos δ̂
• Predict the target position by propagating the sequential changes:
  x̂^{k+j+1}_target ← x̂^{k+j}_target + Δx̂_target(k, j + 1)
  ŷ^{k+j+1}_target ← ŷ^{k+j}_target + Δŷ_target(k, j + 1)
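A compact sketch of the prediction phase of Algorithm 2 (Eqs. 40-43) is given below; the function name and argument layout are assumptions made for illustration.

import numpy as np

def predict_trajectory(x_hat, y_hat, dx_last, dy_last, delta_hat, n):
    """Propagate the latest filtered position (x_hat, y_hat) n steps ahead by
    rotating the most recent position difference (dx_last, dy_last) by
    delta_hat at every step, as in Eqs. (40)-(43)."""
    c, s = np.cos(delta_hat), np.sin(delta_hat)
    predictions = []
    dx, dy, x, y = dx_last, dy_last, x_hat, y_hat
    for _ in range(n):
        dx, dy = dx * c - dy * s, dx * s + dy * c   # Eqs. (42)-(43)
        x, y = x + dx, y + dy                       # Eqs. (40)-(41)
        predictions.append((x, y))
    return predictions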

Table 1  Equations of all high-order curves taken into consideration

Circle/Ellipse:            x^2/a^2 + y^2/b^2 = 1
Astroid:                   x^{2/3} + y^{2/3} = a^{2/3}
Deltoid:                   (x^2 + y^2)^2 + 18a^2(x^2 + y^2) − 27a^4 = 8a(x^3 − 3xy^2)
Limacon:                   (x^2 + y^2 − ax)^2 = b^2(x^2 + y^2)
Nephroid:                  (x^2 + y^2 − 4a^2)^3 = 108a^4 y^2
Quadrifolium:              (x^2 + y^2)^3 = 4a^2 x^2 y^2
Squircle:                  (x − a)^4 + (y − b)^4 = r^4
Lemniscate of Bernoulli:   (x^2 + y^2)^2 = 2a^2(x^2 − y^2)
Lemniscate of Gerono:      x^4 = a^2(x^2 − y^2)

The following assumptions are made about the target motion.
• The target drone is moving continuously in a looping trajectory in standard geo-
metrical curves.
• The target trajectory is a closed loop curve.
We have considered all high-order closed curves to the best of our knowledge, and the
method fits the data to the appropriate curve equation without prior knowledge about the
shape of the figure. The closed curves taken into consideration are listed in Table 1. We
have considered curves with one parameter (for example, Astroid, Nephroid) and with two
parameters (for example, Limacon, Squircle). Since the circle is a special case of the
ellipse, we include both in a single category. This method is also applicable to any closed,
mathematically derivable curve.

The curves mentioned above have been well studied, and their characteristics are
well known. They are usually the zero set of some multivariate polynomials. We can
write their equations as

f(x, y) = 0                                                                                   (44)

For example, the Lemniscate of Bernoulli is

(x^2 + y^2)^2 − 2a^2(x^2 − y^2) = 0                                                           (45)

Lemniscate of Bernoulli has a single parameter a, which needs to be estimated. On


the other hand, the equation of an ellipse has two parameters, a and b, that need to
be estimated. Therefore, we can write a general function for the curves as

f (x, y, a, b) = 0 (46)

where b may or may not be used based on the category of shape the points are being
fitted to.
Univariate polynomials of the form

f(x) = a_0 + a_1 x + a_2 x^2 + ... + a_k x^k                                                  (47)

can be solved using matrices if there are enough points to solve for the k + 1 unknown
coefficients. On the other hand, multivariate equations require different methods to
solve for their coefficients. One method for curve fitting uses an iterative least-squares
approach along with specifying related constraints.

4.1 Classification of Curves

The above-mentioned categories of curves all have different equations. Classifying


the curve into one of the above categories is required before curve fitting. We train
a neural network to classify the curves into various categories based on the (x, y)
points collected from the target drone.
The architecture of the network used is shown in Fig. 4. The input I is a vector
of m points arranged as [x0 , x1 , . . . , xm , y0 , y1 , . . . , ym ]. The output O is a vector of
length 9, denoting the probabilities of the given set of points belonging to the various
categories. Therefore, the network can be represented as a function f trained to map

f : [x0 , x1 , . . . , xm , y0 , y1 , . . . , ym ] → O (48)

The training parameters are listed in Table 2.


This network can classify 2D curves into the above-mentioned categories. In the
case of 3D, we can use this same network to classify the curve once it has been
rotated into a 2D plane (like the X -Y plane).
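A possible realisation of the classifier is sketched below in PyTorch. Only the input/output dimensions and the training settings of Table 2 come from the text; the hidden-layer sizes, the number of points m, and the data-loader interface are assumptions, since Fig. 4 defines the actual architecture.

import torch
import torch.nn as nn

m = 100                      # number of sampled points per curve (assumed value)
num_classes = 9              # curve categories of Table 1

# Hidden-layer widths are illustrative; Fig. 4 shows the architecture actually used.
model = nn.Sequential(
    nn.Linear(2 * m, 256), nn.ReLU(),
    nn.Linear(256, 64), nn.ReLU(),
    nn.Linear(64, num_classes),          # logits; softmax is applied inside the loss
)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)   # learning rate from Table 2
criterion = nn.CrossEntropyLoss()

def train_epoch(loader):
    """One epoch over a loader yielding (points, labels); points are float
    tensors of shape [batch, 2m] = [x0..x_{m-1}, y0..y_{m-1}], labels are class
    indices.  Table 2 reports 9 epochs and ~98% final training accuracy."""
    for points, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(points), labels)
        loss.backward()
        optimizer.step()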

Fig. 4 Network architecture

Table 2  Training parameters for the classification network

Parameter                   Value
Optimizer                   Adam
Learning rate               10^{-4}
No. of training epochs      9
Final training accuracy     98%

4.2 Least-Squares Curve Fitting in 2D

Considering any of the above-mentioned curves in two dimensions, the base equation
has to be modified to account for both offset and orientation in 2D. Therefore, let the
orientation be some θ , and the offset be (x0 , y0 ). On applying a counter-clockwise θ
rotation to a set of points, the rotation is defined by this matrix equation:

\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x' \\ y' \end{bmatrix}          (49)

Substituting (x, y) from Eq. 49 into Eq. 46, we get the following function:

f(y' cos θ + x' sin θ, x' cos θ − y' sin θ, a, b) = 0                                         (50)



Letting

g(x', y', θ, a, b) = f(y' cos θ + x' sin θ, x' cos θ − y' sin θ, a, b)                        (51)

and rewriting it by replacing x' and y' by x and y respectively, we have

g(x, y, θ, a, b) = 0                                                                          (52)

To account for the offset from the origin, we can replace all x and y with x' and y',
respectively, where

x' = x − x_0                                                                                  (53)

y' = y − y_0                                                                                  (54)

and (x0 , y0 ) is the offset of the centre of the figure from the origin. Therefore, we
have
g(x, y, θ, a, b, x0 , y0 ) = 0 (55)

as the final equation of the figure we are trying to fit. The least-squares method is then
applied to the above equation for curve fitting of the m empirical points (x_i, y_i):

E^2 = \sum_{i=0}^{m} \left( g(x_i, y_i, θ, a, b, x_0, y_0) - 0 \right)^2                      (56)

Our aim is to find x_0, y_0, a, b and θ such that E^2 is minimised. This can only be
done by,

\frac{dE^2}{d\beta} = 0,  where  β ∈ {x_0, y_0, a, b, θ}                                      (57)

If g had been a linear equation, simple matrix multiplication would have yielded the
optimum parameters. However, since Eq. 55 is a complex nth-order (where n is 2, 4 or 6)
nonlinear equation with trigonometric variables, we need to use iterative methods in
order to estimate the parameters a, b, θ, x_0, and y_0. Therefore, this work uses the
Levenberg-Marquardt [14, 21] least-squares algorithm to solve the non-linear Eq. 57.
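As an illustration of the fitting step, the sketch below uses scipy's Levenberg-Marquardt solver on the Lemniscate of Bernoulli (Eq. 45), treating the shape parameter a, the orientation θ and the offset (x0, y0) as unknowns. The residual formulation and the initial guess are assumptions; in particular, the rotation/offset handling uses the standard inverse transform rather than reproducing the chapter's exact substitution.

import numpy as np
from scipy.optimize import least_squares

def lemniscate_residual(params, x, y):
    """Implicit residual g(x, y, theta, a, x0, y0) of Eq. (55) for the
    Lemniscate of Bernoulli (one shape parameter a)."""
    a, theta, x0, y0 = params
    # Remove the offset, then undo the orientation (cf. Eqs. 49-54)
    xs, ys = x - x0, y - y0
    xr = xs * np.cos(theta) + ys * np.sin(theta)
    yr = -xs * np.sin(theta) + ys * np.cos(theta)
    return (xr**2 + yr**2)**2 - 2 * a**2 * (xr**2 - yr**2)   # Eq. (45)

# Illustrative call on observed points x_obs, y_obs (1-D NumPy arrays):
# result = least_squares(lemniscate_residual, [1.0, 0.0, 0.0, 0.0],
#                        args=(x_obs, y_obs), method='lm')
# a_hat, theta_hat, x0_hat, y0_hat = result.x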

4.3 Least-Squares Curve Fitting of Any Shape in 3D

If the orientation of any shape is in 3D, the above algorithm will need some modifi-
cations. We first compute the equation of the plane in which the shape lies and then
transform the set of points to a form where the method in Sect. 4.2 can be applied.

In order to find the normal to the plane of the shape, we carry out singular value
decomposition (SVD) of the given points. Let the set of points (x, y, z) be represented
as the matrix A ∈ R^{n×3}. From each point, subtract the centroid and calculate the SVD of A,

A = U Σ V^T                                                                                   (58)

where the columns of U = (u_1, u_2, ..., u_n) (left singular vectors) span the space of the
columns of A, the columns of V = (v_1, v_2, v_3) (right singular vectors) span the space of
the rows of A, and Σ = diag(σ_1, σ_2, σ_3) contains the singular values linked to each
left/right singular vector. Now, since the points are supposed to lie in a 2D plane,
σ_3 ≈ 0 and v_3 = (n_1, n_2, n_3) gives the normal vector to the plane. Therefore, the equation of the
plane is,

n1 x + n2 y + n3 z = C (59)

where C is a constant. The next step is to transform the points to the X − Y plane. For
that, we first find the intersection of the above plane with X -Y plane by substituting
z = 0 in Eq. 59. We get the equation of line as,

n 1 x + n 2 y = C, where C is a constant. (60)

Then we rotate the points about the z-axis such that the above line is parallel to the
x-axis. The angles of rotation α = 0, β = 0 and γ = arctan(−n_1/n_2) need to be substituted
into the matrix R given in Eq. 61. The new points will be A_z = A R.

R = \begin{bmatrix} \cos\beta\cos\gamma & \sin\alpha\sin\beta\cos\gamma - \cos\alpha\sin\gamma & \cos\alpha\sin\beta\cos\gamma + \sin\alpha\sin\gamma \\ \cos\beta\sin\gamma & \sin\alpha\sin\beta\sin\gamma + \cos\alpha\cos\gamma & \cos\alpha\sin\beta\sin\gamma - \sin\alpha\cos\gamma \\ -\sin\beta & \sin\alpha\cos\beta & \cos\alpha\cos\beta \end{bmatrix}          (61)

Algorithm 3: Least Mean Squares Algorithm

• Initialise parameters for the shape detection network
• Store the data points in the variable Shape
• If Shape is in 3D then
  Transform the shape to the X-Y plane using the method in Sect. 4.3
  Shape ← Shape_transformed
• Get the shape prediction, shape_pred, of Shape from the shape detection network
• Apply the curve-fitting algorithm on Shape using the equation of shape_pred
• Generate the target drone trajectory using the estimated shape parameters

We then rotate the points about the X-axis by the angle cos^{-1}(|n_3| / ‖(n_1, n_2, n_3)‖) to
make the points lie in the X−Y plane. That is, we substitute the angles
α = arccos(|n_3| / ‖(n_1, n_2, n_3)‖), β = 0 and γ = 0 in the rotation matrix given in (61). Finally,

the set of points in the X−Y plane will be A_final = A_z R. We can then use the
neural network described in Sect. 4.1 to classify the curve into one of the nine
categories. Then, we can compute the parameters (x_0, y_0, a, b, θ) of the classified curve
using the method given in Sect. 4.2. The combined algorithm for shape detection and
parameter estimation is shown in Algorithm 3.
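The 3D pre-processing of this section can be sketched as follows in Python/NumPy. The sign conventions of the two rotations depend on the orientation of the fitted normal, so in practice they may need to be flipped; the function name is an assumption for illustration.

import numpy as np

def align_to_xy_plane(A):
    """Fit the best plane to the 3D points A (n x 3) via SVD and rotate the
    centred points so that they lie (approximately) in the X-Y plane,
    following Eqs. (58)-(61)."""
    centred = A - A.mean(axis=0)
    _, _, Vt = np.linalg.svd(centred)          # A = U * Sigma * V^T
    n1, n2, n3 = Vt[2]                          # right singular vector with smallest singular value
    # Rotation about z so the plane's intersection with the X-Y plane is parallel to x
    gamma = np.arctan2(-n1, n2)
    Rz = np.array([[np.cos(gamma), -np.sin(gamma), 0.0],
                   [np.sin(gamma),  np.cos(gamma), 0.0],
                   [0.0, 0.0, 1.0]])
    Az = centred @ Rz.T
    # Rotation about x by the angle between the plane normal and the z-axis
    alpha = np.arccos(abs(n3) / np.linalg.norm([n1, n2, n3]))
    Rx = np.array([[1.0, 0.0, 0.0],
                   [0.0, np.cos(alpha), -np.sin(alpha)],
                   [0.0, np.sin(alpha),  np.cos(alpha)]])
    return Az @ Rx.T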

5 Interception Strategy

In this section, the interception strategy is formulated considering that the target is
moving in a repetitive figure-of-eight trajectory; however, the proposed framework
could be extended to other geometric curves. Once the target trajectory is estimated,
the favourable location for the interceptor is selected where interception can lead to
almost head-on collision. Consider a target moving in the direction of the arrow
(marked in green), as shown in Fig. 5. Once the trajectory of the target is estimated
through the EKF, machine learning and curve-fitting techniques, and the direction of
target motion is known, it is found that I1 and I2 are the favourable locations for
generating an almost head-on interception scenario. The part of the curve between the
red and yellow lines is a straight line, so the target can be detected earlier, and the
interceptor will have a higher response time for target engagement.
Once the target is detected, the interceptor applies the pursuit-based guidance strategy
to generate the desired velocity and desired yaw rate. Details of the guidance strategy
are given in [11, 34]. Here, in Algorithm 4, we list the important steps of the guidance
command for the sake of completeness. The desired velocity
(Vdes ) for the interceptor is generated to drive the interceptor along the line joining
the interceptor and the projection of the target in the image plane. The desired yaw
rate (rdes ) is generated to keep the target in the field of view of the camera by using
a PD controller based on the error (eψ ) between the desired yaw and the actual
yaw. In the case of an interception scenario with multiple interceptors, the estimated
favourable standoff locations are allocated to multiple drones through task allocation
architecture considering the current states of target drones and interceptors.

Fig. 5 Favorable standoff location for interception

Fig. 6 Complete architecture for high-speed interception

The important steps of the overall framework for interception of a high-speed


target moving in a looping trajectory are shown in the block diagram (Fig. 6) and
Algorithm 4.

6 Results

6.1 Simulation Experiments

In the Gazebo environment, the complete simulation setup is created considering the
target and interceptors, where the red ball attached to the target is considered for
interception (shown in Figs. 7 and 8).
A vision module is developed for the detection and tracking of the red ball; the visual
information captured in the simulation environment has uncertainty similar to that of
the outdoor environment. A ROS-based pipeline is
written in C++ for the simulation of filtering, estimation, and interception algorithms.
We have tested the estimation framework considering the target motion on different
geometric curves. Initially, the desired geometric curve is represented by the various
waypoints, and these waypoints are fed to the target autopilot. The target follows
the waypoints at a constant speed, and the trace of the ball attached to the target can
also be approximately considered as the shape of the desired geometric curve.
The path followed by the ball is slightly perturbed by the wind and the oscillation
in the attaching mechanism between the target drone and the ball. The Interceptor
drone detects the ball with the attached camera, and the ball position is derived using

Algorithm 4: Interception of high-speed target

Input: Target pixel coordinates in the image plane, focal length, target depth, interceptor's
yaw (ψ), the magnitude of the desired target velocity (V), rotation from camera frame to
inertial frame (R_c2i)

1. Obtain the target position from the target pixel coordinates in the camera frame:
   x^k_target,vision ← Z_k x_k / f
   y^k_target,vision ← Z_k y_k / f
2. Calculate the instantaneous centre of curvature of the target trajectory:
   x_c(k) ← x^k_target − Δy_target(k, 1)/δ̂
   y_c(k) ← y^k_target + Δx_target(k, 1)/δ̂
3. Estimate the target position using the EKF framework.
   EKF prediction steps:
   \dot{\hat{X}} ← F(X̂, U, 0)
   \dot{P} ← AP + PA^T + Q
   EKF update steps:
   X̂_k^+ ← X̂_k^- + L_k(Y_k − C_k X̂_k^-)
   P_k^+ ← (I − L_k C_k)P_k^-
4. Estimate the target trajectory using Algorithm 3.
5. Obtain the suitable standoff point for the interceptors for ease of interception.
6. Once the target is detected, apply the guidance strategy to generate the desired velocity
   command and yaw rate command:
   V_des ← R_c2i V_c, where V_c = (V_cx, V_cy, V_cz)
   V_cx ← V x_k / √(x_k² + y_k² + f²),  V_cy ← V y_k / √(x_k² + y_k² + f²),  V_cz ← V f / √(x_k² + y_k² + f²)
   r_des ← k_pψ e_ψ + k_dψ de_ψ/dt

Output: Desired velocity (V_des) and yaw rate (r_des)
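Step 6 of Algorithm 4 can be sketched as follows; the PD gains, the finite-difference derivative of the yaw error, and the function signature are illustrative assumptions rather than the values used in [11, 34].

import numpy as np

def guidance_commands(xk, yk, f, V, R_c2i, e_psi, e_psi_prev, dt, kp=1.0, kd=0.1):
    """Pursuit guidance from the target's pixel coordinates (xk, yk): drive the
    interceptor along the camera ray through the target and keep the target in
    the field of view with a PD yaw-rate command."""
    norm = np.sqrt(xk**2 + yk**2 + f**2)
    Vc = V * np.array([xk, yk, f]) / norm     # desired velocity in the camera frame
    V_des = R_c2i @ Vc                        # rotate into the inertial frame
    r_des = kp * e_psi + kd * (e_psi - e_psi_prev) / dt
    return V_des, r_des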

Fig. 7 Gazebo environment with IRIS drone: Target

the perspective projection. The position of the ball is fed to the EKF module as the
measurement for filtered position estimation of the ball. After one loop of data, the
neural network predicts the category of the shape, after

Fig. 8 Gazebo environment: Target and interceptor

Fig. 9 Figure showing the collected data points from the drone trajectory and predicted shape through curve-fitting for Circle

which the curve fitting algorithm estimates the shape parameters using the appropriate
shape equation. The raw position data and the estimated shape of different curves are
shown in Figs. 9–17. The green colour dot shows the raw position of the ball obtained
using the information from the camera, and the blue curve represents the predicted
shape of the curve. As can be seen in Figs. 9–17, the overall estimation framework is
able to reasonably approximate the desired geometry of the curve.
The proposed estimation framework is tested for the estimation of 3D geometric
curves. Estimation of the Lemniscate of Gerono in 3D is shown in Fig. 18.
The proposed high-speed target interception strategy is tested by creating an inter-
ception scenario in the Gazebo environment where two interceptor drones are trying
to intercept the target ball moving in the figure-of-eight (Lemniscate of Gerono)
curve (similar to Fig. 1). Different snapshots of the experiments are shown in Figs. 19–
24. Figure 20 shows a snapshot during the tracking and estimation of ball position
and the corresponding trajectory using the visual information. Figures 22–24 show
the snapshots during the engagement once the target ball is detected by Drone 2 while
waiting at the standoff location.

Fig. 10 Figure showing the collected data points from the drone trajectory and predicted shape through curve-fitting for Astroid

Fig. 11 Figure showing the collected data points from the drone trajectory and predicted shape through curve-fitting for Deltoid

Fig. 12 Figure showing the collected data points from the drone trajectory and predicted shape through curve-fitting for Limacon

Fig. 13 Figure showing the collected data points from the drone trajectory and predicted shape through curve-fitting for Nephroid

Fig. 14 Figure showing the collected data points from the drone trajectory and predicted shape through curve-fitting for Quadrifolium

Fig. 15 Figure showing the data points and predicted shape for Squircle

Fig. 16 Figure showing the collected data points from the drone trajectory and predicted shape through curve-fitting for Lemniscate of Bernoulli

Fig. 17 Figure showing the collected data points from the drone trajectory and predicted shape through curve-fitting for Lemniscate of Gerono

Fig. 18 Curve prediction in 3D using estimated figure parameters on simulated data

Fig. 19 Snapshots of target drone and interceptors in Gazebo environment

Fig. 20 Drone estimates the target trajectory while tracking the target drone

Fig. 21 After trajectory estimation, Drone 1 waits in standoff location till detection of target ball

Fig. 22 Drone 2 detects the target drone and starts applying the guidance strategy

Fig. 23 Snapshots during the head-on engagement

6.2 Hardware Experiments

The experimental setup (shown in Fig. 25) consists of two drones. One of the drones,
the target drone, has a red ball attached and will fly in a figure-of-8 trajectory. The
second drone is fitted with Sec3CAM 130 monocular camera for the detection of the
ball. The second drone will follow the target drone and estimate the ball position and
velocity using computer vision techniques. The raw position data as obtained using
the information from the image plane is shown in Fig. 26. The raw position data
is fed to the EKF algorithm and subsequently through the RNN and Least-Square
curve fitting techniques. Figure 27 shows the estimated figure-of-eight and raw ball

Fig. 24 Snapshots during the terminal phase of interception

Fig. 25 Interceptor drone and target drone. Using visual information, the interceptor drone estimates the trajectory of the ball

position observed using visual information using the proposed framework (Figs. 13,
14, 15, 17, 21, 23).

Fig. 26 Red trace represents the raw position data as obtained by tracking the target

Fig. 27 Curve prediction in 3D using estimated figure parameters on experimental data

7 Discussions

Interception of a target with low speed is easy compared to a target with a higher
speed as the region of capturability for a given guidance algorithm will be higher for
interception of a low-speed target. In the case of interception with small UAVs, the
target information is obtained from a camera’s information, and a camera’s detection
range is small compared to other sensors like radar. Interception of a high-speed target
using a conventional guidance strategy is difficult, even if the high-speed target is
moving in a repetitive loop. Therefore, the looping trajectory of the target needs to
be estimated so that interceptor can be placed at a favourable position for ease in the
interception of the high-speed target. The target position estimation using the visual
sensor is too noisy in an outdoor environment to estimate the target trajectory, as

observed from field experiments. So, we have estimated the target position using the
extended Kalman filter framework. To obtain the target position, the interceptor needs to
track the target, so we have proposed to predict the target trajectory over a shorter
horizon using least-squares methods considering the sequence of observed target
positions. Once the initial observations of the target position are made, learning
techniques and curve fitting methods are applied to identify the curve. Once the
parameter of the curve is estimated, the interceptors are placed for the head-on
collision situation. We have successfully validated the estimation framework for
various geometric curves in the Gazebo and outdoor environments. The geometric
curves should be a standard closed loop curve. While formulating the motion model
for target position estimation, it is assumed that the target’s motion is smooth, i.e.,
the change in curvature of the target’s trajectory remains bounded and smooth over
time. This assumption is the basis of our formulation of the target motion model.
The interception strategy is checked only in simulation. The maximum speed of the
target should be within a limit such that tracking by the interceptor is possible in the
first loop. The standby location for the interceptors is selected such that the
interceptor will have a higher reaction time to initiate the engagement. The proposed
framework provides a better interception strategy for a high-speed target rather than
directly chasing the target after detection due to higher response time and better
alignment along the target’s path.

8 Conclusions

In this chapter, we present a framework designed to estimate and predict the position
of a moving target that follows a repetitive path of some standard shape. The proposed
trajectory estimation algorithm is used to formulate the interception strategy for a
target having a higher speed than the interceptor. The target
position is estimated using the EKF framework using visual information, and then
the target position is used to estimate the shape of the repetitive loop of the target.
Estimation of different curves such as Lemniscate of Bernoulli, Deltoid, and Limacon
are performed using realistic visual sensors set up in the Gazebo environment. The
proposed high-speed interception strategy is validated by simulating an interception
scenario of a high-speed target moving in a figure-of-eight trajectory in the ROS-
Gazebo framework. Future work includes the integration of the proposed estimation
and prediction algorithm in the interception framework and validation of the com-
plete architecture in the outdoor environment. The proposed technique can also be
used to help the motion planning of autonomous cars and develop driver-assistance
systems in traffic junctions.

Acknowledgements We would like to acknowledge the Robert Bosch Center for Cyber Physical
Systems, Indian Institute of Science, Bangalore, and Khalifa University, Abu Dhabi, for partial
financial support.

References

1. Abbas, M. T., Jibran, M. A., Afaq, M., & Song, W. C. (2020). An adaptive approach to vehicle
trajectory prediction using multimodel kalman filter. Transactions on Emerging Telecommuni-
cations Technologies, 31(5), e3734.
2. Anderson-Sprecher, R., & Lenth, R. V. (1996). Spline estimation of paths using bearings-only
tracking data. Journal of the American Statistical Association, 91(433), 276–283.
3. Banerjee, P., & Corbetta, M. (2020). In-time uav flight-trajectory estimation and tracking using
bayesian filters. In 2020 IEEE Aerospace Conference (pp. 1–9). IEEE
4. Barisic, A., Petric, F., & Bogdan, S. (2022). Brain over brawn: using a stereo camera to detect,
track, and intercept a faster uav by reconstructing the intruder’s trajectory. Field Robotics, 2,
34–54.
5. Beul, M., Bultmann, S., Rochow, A., Rosu, R. A., Schleich, D., Splietker, M., & Behnke, S.
(2020). Visually guided balloon popping with an autonomous mav at mbzirc 2020. In 2020
IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR) ( pp. 34–41).
IEEE
6. Cascarano, S., Milazzo, M., Vannin, A., Andrea, S., & Stefano, R. (2022). Design and develop-
ment of drones to autonomously interact with objects in unstructured outdoor scenarios. Field
Robotics, 2, 34–54.
7. Chen, M., Liu, Y., & Yu, X. (2015). Predicting next locations with object clustering and tra-
jectory clustering. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (pp.
344–356). Springer
8. Cheung, Y., Huang, Y. T., & Lien, J. J. J. (2015). Visual guided adaptive robotic intercep-
tions with occluded target motion estimations. In 2015 IEEE/RSJ International Conference on
Intelligent Robots and Systems (IROS) (pp. 6067–6072). IEEE
9. Dong, G., & Zhu, Z. H. (2016). Autonomous robotic capture of non-cooperative target by
adaptive extended kalman filter based visual servo. Acta Astronautica, 122, 209–218.
10. Hadzagic, M., & Michalska, H. (2011). A bayesian inference approach for batch trajectory
estimation. In 14th International Conference on Information Fusion (pp. 1–8). IEEE
11. Jana, S., Tony, L. A., Varun, V., Bhise, A. A., & Ghose, D. (2022). Interception of an aerial
manoeuvring target using monocular vision. Robotica, 1–20
12. Kim, S., Seo, H., Choi, S., & Kim, H. J. (2016). Vision-guided aerial manipulation using a
multirotor with a robotic arm. IEEE/ASME Transactions On Mechatronics, 21(4), 1912–1923.
13. Kumar, A., Ojha, A., & Padhy, P. K. (2017). Anticipated trajectory based proportional navi-
gation guidance scheme for intercepting high maneuvering targets. International Journal of
Control, Automation and Systems, 15(3), 1351–1361.
14. Levenberg, K. (1944). A method for the solution of certain non-linear problems in least squares.
Quarterly of Applied Mathematics, 2(2), 164–168.
15. Li, T., Prieto, J., & Corchado, J. M. (2016). Fitting for smoothing: a methodology for
continuous-time target track estimation. In 2016 International Conference on Indoor Posi-
tioning and Indoor Navigation (IPIN) (pp. 1–8). IEEE
16. Li, T., Chen, H., Sun, S., & Corchado, J. M. (2018). Joint smoothing and tracking based on
continuous-time target trajectory function fitting. IEEE transactions on Automation Science
and Engineering, 16(3), 1476–1483.
17. Lin, L., Yang, Y., Cheng, H., & Chen, X. (2019). Autonomous vision-based aerial grasping for
rotorcraft unmanned aerial vehicles. Sensors, 19(15), 3410.
18. Liu, Y., Suo, J., Karimi, H. R., & Liu, X. (2014). A filtering algorithm for maneuvering target
tracking based on smoothing spline fitting. In Abstract and Applied Analysis (Vol. 2014).
Hindawi
19. Luo, C., McClean, S. I., Parr, G., Teacy, L., & De Nardi, R. (2013). UAV position estimation
and collision avoidance using the extended kalman filter. IEEE Transactions on Vehicular
Technology, 62(6), 2749–2762.

20. Ma, H., Wang, M., Fu, M., & Yang, C. (2012). A new discrete-time guidance law base on
trajectory learning and prediction. In AIAA Guidance, Navigation, and Control Conference (p.
4471)
21. Marquardt, D. W. (1963). An algorithm for least-squares estimation of nonlinear parameters.
Journal of the society for Industrial and Applied Mathematics, 11(2), 431–441.
22. Mehta, S. S., Ton, C., Kan, Z., & Curtis, J. W. (2015). Vision-based navigation and guidance
of a sensorless missile. Journal of the Franklin Institute, 352(12), 5569–5598.
23. Pang, B., Ng, E. M., & Low, K. H. (2020). UAV trajectory estimation and deviation analysis
for contingency management in urban environments. In AIAA Aviation 2020 Forum (p. 2919)
24. Prevost, C. G., Desbiens, A., & Gagnon, E. (2007). Extended kalman filter for state estimation
and trajectory prediction of a moving object detected by an unmanned aerial vehicle. In 2007
American Control Conference (pp. 1805–1810). IEEE
25. Qu, L., & Dailey, M. N. (2021). Vehicle trajectory estimation based on fusion of visual motion
features and deep learning. Sensors, 21(23), 7969.
26. Roh, G. P., & Hwang, S. W. (2010). Nncluster: an efficient clustering algorithm for road network
trajectories. In International Conference on Database Systems for Advanced Applications (pp.
47–61). Springer
27. Schulz, J., Hubmann, C., Löchner, J.,& Burschka, D. (2018). Multiple model unscented kalman
filtering in dynamic bayesian networks for intention estimation and trajectory prediction. In
2018 21st International Conference on Intelligent Transportation Systems (ITSC) (pp. 1467–
1474). IEEE
28. Shamwell, E. J., Leung, S., & Nothwang, W. D. (2018). Vision-aided absolute trajectory esti-
mation using an unsupervised deep network with online error correction. In 2018 IEEE/RSJ
International Conference on Intelligent Robots and Systems (IROS) (pp. 2524–2531). IEEE
29. Shrivastava, A., Verma, J. P. V., Jain, S., & Garg, S. (2021). A deep learning based approach
for trajectory estimation using geographically clustered data. SN Applied Sciences, 3(6), 1–17.
30. Strydom, R., Thurrowgood, S., Denuelle, A., & Srinivasan, M. V. (2015). UAV guidance: a
stereo-based technique for interception of stationary or moving targets. In Conference Towards
Autonomous Robotic Systems (pp. 258–269). Springer
31. Su, K., & Shen, S. (2016). Catching a flying ball with a vision-based quadrotor. In International
Symposium on Experimental Robotics (pp. 550–562). Springer
32. Sung, C., Feldman, D., & Rus, D. (2012). Trajectory clustering for motion prediction. In 2012
IEEE/RSJ International Conference on Intelligent Robots and Systems (pp. 1547–1552). IEEE
33. Thomas, J., Loianno, G., Sreenath, K., & Kumar, V. (2014). Toward image based visual servoing
for aerial grasping and perching. In 2014 IEEE International Conference on Robotics and
Automation (ICRA) (pp. 2113–2118). IEEE
34. Tony, L. A., Jana, S., Bhise, A. A., Gadde, M. S., Krishnapuram, R., Ghose, D., et al. (2022).
Autonomous cooperative multi-vehicle system for interception of aerial and stationary targets
in unknown environments. Field Robotics, 2, 107–146.
35. Yan, L., Jg, Zhao, Hr, Shen, & Li, Y. (2014). Biased retro-proportional navigation law for
interception of high-speed targets with angular constraint. Defence Technology, 10(1), 60–65.
36. Zhang, X., Wang, Y., & Fang, Y. (2016). Vision-based moving target interception with a mobile
robot based on motion prediction and online planning. In 2016 IEEE International Conference
on Real-time Computing and Robotics (RCAR) (pp. 17–21). IEEE
37. Zhang, Y., Wu, H., Liu, J., & Sun, Y. (2018). A blended control strategy for intercepting high-
speed target in high altitude. Proceedings of the Institution of Mechanical Engineers, Part G:
Journal of Aerospace Engineering, 232(12), 2263–2285.
38. Zhao, M., Shi, F., Anzai, T., Takuzumi, N., Toshiya, M., Kita, I., et al. (2022). Team JSK at
MBZIRC 2020: interception of fast flying target using multilinked aerial robot. Field Robotics,
2, 34–54.
Robotics and Artificial Intelligence
in the Nuclear Industry: From
Teleoperation to Cyber Physical Systems

Declan Shanahan, Ziwei Wang, and Allahyar Montazeri

Abstract This book chapter looks to address how upcoming technology can be used
to improve the efficiency of decommissioning processes within the nuclear industry.
Challenges associated with decommissioning are introduced with a brief overview
of the previous efforts and current practices of nuclear decommissioning. A high-
level cyber-physical architecture for nuclear decommissioning applications is then
proposed by drawing upon recent technological advances in the realm of Industry 4.0
such as internet of things, sensor networks, and increased use of data analytics and
cloud computing approaches. In the final section, based on demands and proposals
from industry, possible applications within the nuclear industry are identified and
discussed.

Keywords Cyber-physical systems · Sensor networks · Robotics · Industry 4.0 ·


Nuclear · Decommissioning · Artificial intelligence

1 Introduction

1.1 Background

Around the world, many nuclear power plants are reaching the end of their active
life and are in urgent need of decontamination and decommissioning (D&D). In the
UK alone, seven advanced gas cooled reactors are due to enter decommissioning by
2028 [1]. This is in addition to the 11 Magnox reactors along with research
sites, weapon production facilities, and fuel fabrication and reprocessing facilities
already at various stages of decommissioning. Current estimates are that the decom-
missioning programme will not be completed until 2135, at a cost of £237bn [2].
The D&D process requires that the systems, structures, and components (SSC) of
a nuclear facility be characterised and then handled accordingly. This may include

D. Shanahan · Z. Wang · A. Montazeri (B)


School of Engineering, LA14YW, Lancaster University, Bailrigg, Lancashire, UK
e-mail: [email protected]


removing materials that are unhazardous through to dealing with highly radioactive
materials such as nuclear fuel. The main objectives of the D&D process are to protect
workers, the public and environment, while also minimising waste and associated
costs [3]. Despite this, at present many D&D tasks are still reliant on manual labour,
putting workers at risk of radiation exposure. The need to limit this exposure in the
nuclear industry, as defined by as low as reasonably practical (ALARP) principles,
is a challenging problem [4].

1.2 Motivation

Certain facilities within a nuclear plant are not accessible by humans due to levels
of radioactivity present. Where access is possible such as in alpha-contaminated
areas, heavy duty personal protective equipment is required, including air fed
suits and leather overalls. This makes work cumbersome and strenuous, while
still not removing all risk to the worker. In addition, the high cost of material
disposal, including the associated equipment for segregation of materials, some of
which may be highly radioactive, has necessitated more efficient processes. Conse-
quently, decommissioning tasks such as cutting and dismantling components present
challenges to the nuclear industry and require novel solutions [5].
In the nuclear industry, due to the uniqueness in the diversity and severity of its chal-
lenges, significant barriers prevent robotic and autonomous system (RAS) deploy-
ment. These challenges, for example, are: (i) highly unstructured, uncertain and clut-
tered environments, (ii) high risk environments, with radioactive, thermal and chem-
ical hazards, (iii) exploration, mapping and modelling of unknown or partially known
extreme environments, (iv) powerful, precise, multi-axis manipulators needed with
complex multimodal sensing capabilities, (v) critically damaging effects of radiation
on electronic systems, (vi) need for variable robot supervision, from tele-immersion
to autonomous human–robot collaboration [6].
The Nuclear Decommissioning Authority (NDA) in the UK has set out 4 grand
challenges for nuclear decommissioning as shown in Table 1. As can be inferred
from Table 1, digitisation of the nuclear industry, in combination with autonomous
systems, advanced robotics, and wearable technologies, plays a significant role in
addressing these challenges. Sellafield Ltd., in charge of managing the world's largest
inventory of untreated nuclear waste, has also identified key areas for technical
innovation, based upon the grand challenges presented by the NDA [7]. A major
difficulty in decommissioning current nuclear power plants is the lack of available
data regarding storage and previous operation. Future builds would benefit from a
comprehensive accounting of all lifecycle activities and incidents, which would in
turn, make planning dismantling and decontamination a much easier process.

Table 1  Grand challenges identified by the NDA [8]

Reducing waste and reshaping the waste hierarchy: Finding new ways to drive the waste hierarchy, increasing recycling and reuse to reduce volumes sent for disposal
Intelligent infrastructure: Using autonomous technology to manage assets and buildings proactively and efficiently
Moving humans away from harm: Reducing the need for people to enter hazardous environments using autonomous systems, robotics, and wearable technology
Digital delivery (enabling data driven decisions): Adopting digital approaches for capturing and using data, to improve planning, training and aid decision making

1.3 Problem Statement

Robots are a natural solution to challenges faced with D&D processes. Neverthe-
less, uptake of robot systems in the nuclear industry has been slow, with handling
applications limited to teleoperated manipulators controlled through thick lead
glass windows, greatly reducing situational awareness [5]. Furthermore, while these
systems are robust and rugged, they lack feedback sensors or inverse kinematic
controllers, making them slow to operate and reliant on the experience and skill of
the operator. To address these challenges innovative technology is needed to improve
efficiency and reduce the time taken to decommission plants.
The need for robotic capability in the nuclear industry has long been recognised,
particularly in the aftermath of accidents where radioactive sources can become
dispersed and the danger to human life is much greater. Particular examples of these
accidents include Three Mile Island in the USA, Chernobyl in Russia, and more
recently at Fukushima in Japan. While the risks posed by nuclear are great, the need
for cleaner energy and net-zero carbon emission is critical and so there is a necessity
for systems that can deal with nuclear accidents, as well as provide support in the
handling of common decommissioning tasks [9].
The current study expands on earlier research on the use of robotics and
autonomous systems for nuclear decommissioning applications done at Lancaster
University [10, 11]. Although the hydraulically actuated robotic manipulators are
crucial for decommissioning operations, the inherent nonlinearities in the hydraulic
joints make modelling and control extremely difficult. For instance, in [12] it is
suggested to use a genetic algorithm technique to estimate the unknown parame-
ters of a hydraulically actuated, seven degree of freedom manipulator. In [13], the
estimation outcomes are enhanced by utilising a multi-objective cost function to
replace the output error system identification cost function. Another issue arising
in using hyper redundant manipulators in nuclear decommissioning is the need for
developing a numerically efficient inverse kinematic approach to be robust against
potential singularities [14]. An explanation of the earliest studies on the approaches

to capturing the nonlinearities of the hydraulic manipulator is provided in [15, 16].


These findings are intended to be applied to state-dependent regulation of the robot
joints under Wiener-type nonlinearity [17, 18]. It is evident that a significant amount
of time must be allotted for pre-planning in decommissioning applications in order
to gather and process the relevant data without the need for human intervention.
Utilising an autonomous unmanned aerial vehicle in conjunction with the manipulator
can bring important benefits to decommissioning by increasing speed, accuracy, and
success [19]. A quadcopter would enable quicker
3D mapping and access to locations that could call for extensive preparation and
labour. For UAV attitude control in the presence of uncertainties and direct wind
disturbance on the system model, a unique multi-channel chattering free robust
nonlinear control system is developed in [20, 21]. Applying the Extended Kalman
Filter (EKF) for state estimation [22] and using event-triggered particle filters to
enhance UAV energy management throughout the state estimation process [23] are
recent innovations that have brought the controller closer to practical application.
The results obtained are encouraging, however they haven’t been tested on a real
quadcopter platform yet. Additionally, expanding the number of robots to develop
a heterogeneous collection of multi-agent systems would enable faster and more
efficient execution of hitherto impossible missions. As a result, the goal of this book
chapter is to discuss the technological foundations for the creation of such a system.

1.4 Recent Technological Advances–Industry 4.0

Industry 4.0 originated in Germany and has since expanded in scope to include digital-
isation of manufacturing processes, becoming known as the fourth industrial revolu-
tion. The industry 4.0 paradigm can be characterised by key technologies that enable
the shift towards greater digitalisation and the creation of cyber-physical systems
(CPSs). This includes merging of technology that was once isolated, providing oppor-
tunities for new applications. As the key area of Industry 4.0 is manufacturing, many
applications are focused on increasing productivity within factories, most commonly
accomplished by improving the efficiency of machinery.
Industrial automation in manufacturing encompasses a wide range of technologies
that can be used to improve processes. With the advent of industry 4.0, there is now
a trend for a holistic approach to increasing automation by considering all aspects
of a process and how they are interlinked. Autonomous industrial processes can be
used to replace humans in work that is physically hard, monotonous or performed
in extreme environments as well as perform tasks beyond human capabilities such
as handling heavy loads or working to fine tolerances. At the same time, they offer
the opportunity for data collection, analytics, and quality checks, while providing
improved efficiency and reduced operation costs [24].
Automation of industrial processes and manufacturing systems requires the use
of an array of technologies and methods. Some of the technologies used at present

include distributed control systems (DCS), supervisory control and data acquisi-
tion (SCADA), and programmable logic controllers (PLC). These can be combined
with systems such as robot manipulators, CNC machining centres and/or bespoke
machinery.
Industry 4.0 continues past developments and looks to enhance the level of
autonomy in an industrial process. This can include new concepts and technologies
such as:

• Sensor networks
• The industrial internet of things (IIoT)
• Cloud computing
• Big data
• Fault tolerant control systems
• Simulation and digital twins
• Cloud robotics
• Cognitive computing
• Blockchain
• Artificial intelligence (AI)

These technologies can also be applied to processes outside of those targeted


by traditional automation and provide advanced abilities for decision making. The
industry 4.0 concept involves integration at all possible levels, horizontal, vertical,
and end-to-end, allowing data to be shared across the entirety of a process. According
to [25], Industry 4.0 can be specified according to three paradigms: the smart product,
smart machine, and augmented operator. At present the nuclear industry is somewhat
lagging other industries such as manufacturing in adopting these new technologies.
While Industry 4.0 concepts show great potential, there are several challenges
associated with practical implementation, especially in the nuclear sector. One is the
lack of technical skills in the workforce, compounded by the fact that many of the
workers first employed by the industry are now reaching retirement age and taking with
them tacit knowledge that could be used for the future development of the nuclear
industry, in particular with regard to decommissioning [26]. Cyber security
is of paramount importance in the nuclear industry with cyber-attacks becoming a
more common occurrence, both from lone hackers and state actors. Introducing new
technologies and greater interconnectivity within an ecosystem will typically also
introduce new vulnerabilities that may be exploited, commonly known as a zero-
day. In the case of a CPS, this could lead to data breaches of sensitive information
from IT systems or disrupt operational technology (OT) systems, leading to loss
of control and/or destruction of hardware or infrastructure, possibly also causing
harm to or loss of human life [27]. Further challenges with implementing industry
4.0 concepts in nuclear include the need for interoperability between systems. As
many systems in nuclear are bespoke due to the unique design requirements, often
managing transfer of data between linked systems is not a design consideration;
therefore, new communication systems will need to be established to manage this
transition.

1.5 Chapter Outlines and Contributions

The aim of this chapter is to provide an overview of the current challenges faced
during D&D operations at nuclear power plants, and how these can be addressed
by greater digital integration and advanced technology such as seen as part of the
Industry 4.0 initiative. A new conceptual design for the decommissioning process
using the cyber-physical framework is proposed and the challenges and opportunities
for the integrated technologies is discussed by drawing upon the literature. The start
of the chapter will give background on the D&D process and some of the challenges
faced. A review of the current state of the art in relevant research is then provided
for common D&D tasks. In the final section, a framework for autonomous systems
is developed, building upon the current advances in robot and AI system research.

2 Nuclear Decommissioning Processes

This section will provide the background on the D&D process and some of the chal-
lenges faced at each stage of the process. There are several main processes that must
be completed during D&D, categorised here as characterisation, decontamination,
dismantling and demolition, and waste management.

2.1 Characterisation

The initial stage of any decommissioning operation is to develop an inventory of all


materials in a facility which can then be used to plan subsequent decommissioning
processes. The first step of characterisation is performing a historical assessment of a
facility. This can consist of reviewing plant designs along with operational and main-
tenance reports, which can then allow expected materials to be categorised as contam-
inated, unrestricted, or suspected contaminated, and help focus further characterisa-
tion efforts. The initial inventory of materials can subsequently be further developed
by inspection, and then detailed characterisation of radiological and hazardous mate-
rials performed. Separate characterisation techniques may be required depending on
the types of contamination present, particularly as some parts of a nuclear facility
are not easily characterised due to high contamination and lack of access. Examples
of this are components of the reactor which become radioactive over time due to
neutron activation. A solution to this is to model expected radioactivity and then
take samples at suitable locations. Comparing the model predictions with the sample
readings can then allow for accurate characterisation of the complete reactor [3].
Such techniques often entail some level of uncertainty which must be consid-
ered when characterising waste. Factors such as radionuclide migration or spatial

variations in radionuclide concentrations can make sampling to the required confi-


dence a difficult task. Therefore, new approaches are required across a spectrum of
characterisation scenarios, such as in-situ characterisation of SSC in inaccessible areas,
non-destructive monitoring of packaged waste in disposal facilities, and accurate
characterisation of waste on the boundary between intermediate and low level [28].
Research focused on new solutions for characterisation is already underway such
as the CLEANDEM (Cyber physicaL Equipment for unmAnned Nuclear DEcom-
missioning Measurements) project that aims to implement an unmanned ground
vehicle (UGV) with an array of radiological sensing probes that can perform initial
assessments as well as ongoing monitoring during D&D operations [29].

2.2 Decontamination

During decommissioning operations, decontamination is a process that may be


performed with the primary aim to reduce radioactivity for subsequent disman-
tling operations. This can reduce risk to workers during the dismantling procedure,
along with providing cost savings in waste being reclassified at a lower level. These
benefits however must be balanced with the increased cost, additional secondary
waste, and radiation doses to workers associated with decontamination processes.
In determining whether decontamination will be beneficial, many variables must be
considered including:

• Types of structures
• Accessibility of structure surfaces
• Material radioactivity levels and type
• Composition of materials
• Type of contaminant
• Destination of waste components

Decontamination can be performed in situ or parts can be relocated to a specialist


location. There is a variety of decontamination options that can be used including
chemical, electrochemical, and mechanical techniques. Each can have advantages
for specific applications [30].

2.3 Dismantling and Demolition

Dismantling (or segmentation) is often required for the components and systems
within a nuclear facility, a key example of this is the reactor. There is no standardised
method for dismantling; each facility will require a tailored approach depending on
the design and conditions. As with decontamination, variables such as component
types and ease of access need to be considered to determine the optimal approach.
There are several techniques available for dismantling, these include:

• Plasma arc cutting
• Electric discharge machining (EDM)
• Metal disintegration machining (MDM)
• Abrasive water jet cutting
• Laser cutting
• Mechanical cutting

Each technique can have advantages for a given application, depending on require-
ments for cutting depth, speed, and type of waste generated. Off-the-shelf equip-
ment may be adapted for the nuclear environment, along with the application of
tele-operation and/or robotics to reduce the risk of radiation to operators. Once the
systems and components have been removed, and the building is free from radioac-
tivity and contamination, demolition can be carried out. This can be done using
conventional techniques such as ball and breakers, collapsing, or explosives where
appropriate [3].

2.4 Waste Management

Careful consideration must be given to handling and disposal of waste materials


produced during decommissioning operations. The waste management hierarchy
should be adhered to and is of particular importance during decommissioning due
to higher waste disposal costs. In the UK, waste is classified into one of 3 categories
depending on radioactivity and heat production. These are high, intermediate, and
low-level waste. Low level waste (LLW) can be further broken down to include
very low-level waste (VLLW) which includes waste that can be disposed of through
traditional waste routes.
In determining the available routes for waste, it is necessary to review tech-
nical requirements and processing capabilities including aspects such as radioac-
tivity levels, storage capacities, treatment facilities, and material handling options.
Raw waste must undergo treatment and packaging to a form suitable for disposal or
long-term storage; these activities can be broadly categorised as post-processing
operations. Treatment and packaging of raw waste may use several processes,
some of which have already been introduced. These can include retrieval of
waste, sorting and segregation, size reduction, decontamination, treatment, condi-
tioning/immobilisation and/or packaging [8]. Post-processing provides several bene-
fits including reduced waste classification, smaller final volumes, safer waste
packages, and the opportunity to generate waste inventories.
As with other stages of the D&D process, humans are not able to work in the
vicinity of waste materials, thereby necessitating the use of tele-operated or
autonomous systems [31]. The PREDIS (PRE-DISposal management of radioactive
waste) project has already started work focusing on developing treatment and condi-
tioning methods for different decommissioning wastes by producing new solutions
or adopting immature solutions by increasing technology readiness levels [32].

3 Current Practice in Nuclear Decommissioning Research

There is a range of robots required in the nuclear industry, each designed for specific
applications. These can be water, land or air based, and often have to deal with chal-
lenging environments that include high radiation and restricted access. Robots used
in the nuclear industry have additional requirements over those in other industries, particularly in relation to the ability to cope with radioactive environments. This results in the need for high equipment reliability with low maintenance requirements. While conventional off-the-shelf equipment may be suitable for some nuclear applications, it will invariably need to be adapted before deployment. This can involve techniques such as relocating sensitive electronics, adding shielding, upgrading components to radiation-tolerant counterparts, and/or adding greater redundancy through alternative recovery methods [33].

3.1 Assisted Teleoperation and Manipulation in Nuclear Decommissioning

Approaches looking to improve control of teleoperated robots include virtual and augmented reality, AI assistance, and haptic feedback, such as the work being carried out by researchers at the QMUL Centre for Advanced Robotics [34]. Virtual reality (VR)
would allow the operator to have a greater awareness of surroundings as opposed to
the current approach of using a set of screens to provide different perspectives of the
workspace. AI assistance can be used in conjunction with VR to automate the more
routine tasks and allow faster operation. This could also provide the possibility of
allowing a single operator to control multiple manipulators and only have to take full
control when a new problem is encountered. Haptic feedback could further improve
the operability of teleoperation systems by providing feedback that allows safer
grasping of an object [34]. Significant research has been carried out into the use of
hydraulic manipulators for decommissioning tasks such as pipe cutting as described
in [35]. The system under development uses COTS 3D vision to allow an operator
to select the intended work piece without having to manually position the cutting
tool or manipulators. This was shown to be faster than using tele-operational control
alone.
A common task required of robots is grasping objects. While this has been
performed for some time in manufacturing, nuclear industry environments are
often unstructured making such tasks considerably more difficult. To address
this, researchers at the University of Birmingham have developed autonomous
grasp planning algorithms with vision-based tracking that does not require a-priori
knowledge of objects or learning from training data. This is achieved using a local
contact moment based grasp metric using characteristic representations of surface
regions [36].

3.2 Robot-Assisted Glovebox Teleoperation

Gloveboxes are used throughout the nuclear industry to provide a contained environ-
ment for handling hazardous objects and materials. They can however be difficult to
use and still present a risk of exposure to radioactive materials for the operator. In
addition, the gloves reduce tactile feedback and restrict mobility of the arms, making
simple tasks tiring and challenging; it would therefore be beneficial to incorporate a
robotic system. There are however challenges for implementation such as integrating
robots within a glovebox which can be dark and cluttered, and protecting systems
from the harmful effects of radiation which can cause rapid deterioration to electrical
components. In [37], new technologies in robotics and AI are explored to determine how glovebox operations can be improved through the use of robotics. It is suggested that it is
preferable to design a robot to operate within the gloves to simplify maintenance
and protect it from contaminants. It is also noted that while greater autonomy would
improve productivity, glovebox robotics mainly utilise teleoperation due to the risk
of breaking containment when using an autonomous system. Teleoperation methods
were further developed by a team at the University of Manchester which allowed
control of a manipulator using only the posture and gesture of bare hands. This was
implemented using virtual reality with Leap Motion and was successful at executing
a simple pick and place task. The system could be further improved with the addition
of haptic feedback, possibly achieved virtually using visual or audio feedback [38].

3.3 Post-processing of Nuclear Waste

Currently under development by Sellafield Ltd. and the National Nuclear Laboratory
is the box encapsulation plant (BEP) which is intended to be used for post-processing
of decommissioning waste. Post-processing offers benefits including lower waste
classifications, reduced waste volume and safer waste packaging while also allowing
creation of an inventory. Due to the radioactivity of the waste materials, it is necessary
to have a remotely operated system. Current designs are tele-operated; however, they
are extremely difficult to operate. To address this, research [31] looks at how greater
autonomy can be applied to the manipulators used in the BEP. The key requirements
of the autonomous system can be defined as:
• Visual object detection, recognition, and localisation of waste canisters.
• Estimation of 6DOF manipulator pose and position.
• Decision making agent utilising vision data and acting with the manipulator
control system.
• Autonomous manipulation and disruption.

3.4 Modular and Cooperative Robotic Platforms

A consideration in deploying robotics in radioactive environments is their total integrated dose (TID), which is the amount of radiation a robot can be exposed to before
failure. In highly radioactive areas such as close to reactors, this can be a matter
of minutes. A further challenge in deploying robotics in nuclear facilities is the
unstructured nature of the operating environment. This means that robotic systems
will often spend a large proportion of time planning and mapping. These factors
combined mean that the time available to complete necessary tasks can be limited.
Greater autonomy is also a key driver for improved technology. Current methods
of operation by teleoperation can result in slow and inefficient operation. With the
high risk of failure, it is useful to have redundancy in robotic systems; modular and
multi-robot systems have been identified as a possible solution to this challenge [39].

3.5 Unmanned Radiation-Monitoring Systems

Characterisation of nuclear environments has long been a challenge within the industry, particularly in the aftermath of incidents such as Fukushima, TMI and
Chernobyl. Monitoring radiation levels in and around a nuclear power plant is an
important task during operation, decommissioning and in response to incidents that
may result in emission of radiation. Typically, plants will have numerous monitoring points; however, these do not provide data over a large area and may become faulty, requiring an alternative solution. The authors in [40] propose an unmanned radiation monitoring system combining radiation sensors and an unmanned aerial vehicle that can be deployed in quick response to a suspected source of radia-
tion. Some of the key features for this include easy decontamination, hibernating
mode, and custom software, all with a high standard of reliability. Further detail of
previous robots can be found in [41]. A similar project for carrying out mapping of
radioactive environments was carried out as detailed in [42]. This work focused on
implementing gamma radiation avoidance capabilities that would allow a robot to
navigate a nuclear environment while avoiding high radiation doses that may cause
system failure.
In response to the flooding of primary containment vessels at the Fukushima
nuclear plant, a submersible ROV named AVEXIS was developed by collabora-
tion between the Universities of Manchester and Lancaster along with institutions
in Japan. Characterisation is achieved by combining visual inspection, acoustic
mapping, and radiation assessment by a CeBr3 inorganic scintillator. The system
has been validated as being able to detect fuel debris from experimental testing at
the Naraha test facility and National Maritime Research Institute in Japan, as well
as in water tanks at Lancaster University, UK.
MallARD, an autonomous surface vehicle, has also been developed by the Univer-
sity of Manchester for an International Atomic Energy Agency (IAEA) robotics

challenge on the inspection of spent fuel storage ponds. This project focused on
development of robust localisation and positioning techniques based on Kalman filters and model predictive control. Experimental testing showed the MallARD system was able to follow planned paths with a reduction in error of two orders of magnitude [43].
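To make the role of such filters concrete, the following is a minimal sketch of a linear Kalman filter of the kind that typically underpins localisation and positioning pipelines like the one described above. The constant-velocity model, noise covariances and measurement values are illustrative assumptions and are not taken from the MallARD implementation.

    import numpy as np

    # State x = [px, py, vx, vy]; constant-velocity model, position-only measurements.
    dt = 0.1
    F = np.array([[1, 0, dt, 0],
                  [0, 1, 0, dt],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]], dtype=float)   # state transition
    H = np.array([[1, 0, 0, 0],
                  [0, 1, 0, 0]], dtype=float)   # we only measure position
    Q = 0.01 * np.eye(4)                        # process noise (assumed)
    R = 0.05 * np.eye(2)                        # measurement noise (assumed)

    x, P = np.zeros(4), np.eye(4)               # initial estimate and covariance

    def kf_step(x, P, z):
        # Predict
        x_pred = F @ x
        P_pred = F @ P @ F.T + Q
        # Update with position measurement z
        y = z - H @ x_pred                      # innovation
        S = H @ P_pred @ H.T + R
        K = P_pred @ H.T @ np.linalg.inv(S)     # Kalman gain
        return x_pred + K @ y, (np.eye(4) - K @ H) @ P_pred

    for z in ([0.10, 0.00], [0.21, 0.05], [0.31, 0.09]):
        x, P = kf_step(x, P, np.array(z))
    print(x)                                    # fused position/velocity estimate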

4 Towards an Autonomous Nuclear Decommissioning Process

The review in the previous section reveals that a large body of work remains to
complete all aspects of autonomy expected in the nuclear sector. Moreover, the
overview in the last two sections highlights the challenges the nuclear sector is
currently facing for decommissioning of power plants and clarifies the urgent need
for increasing the autonomy level in the whole process. Inspired by autonomous industrial processes, and drawing upon what is currently practiced in various industrial sectors after the fourth industrial revolution, in this section we aim to study the underlying architecture and recent technological advances that motivate the design of an autonomous decommissioning process for the nuclear sector.

4.1 Different Levels of Autonomy

In autonomous systems, autonomy is typically defined by levels that depend on the degree of human input. The categorisations vary depending on the application, but
all have the same structure of being non-autonomous (lowest) up to completely
autonomous (highest). Likewise, within an automated system there is a hierarchy of
processes from the lowest being actuator and sensor control, up to high level which
can consist of sophisticated decision making. The processes of an automated system
can also be grouped by function. In [44], these classes are proposed as information
acquisition, information analysis, decision and action selection, and action imple-
mentation. When designing a system, a key question is what should be automated,
particularly with regard to safety and reliability requirements. Advancements in the
field of robotics and software mean that systems can operate with a higher degree of
automation than ever before. A proposed classification structure of autonomy levels
in industrial systems is shown in Fig. 1. Due to the similarities between process plants
and decommissioning operations, these levels are relevant to autonomous operations
within the nuclear industry for decommissioning tasks. At present many nuclear
decommissioning technologies have only reached Level 2.
A key issue in using new technologies within safety-critical applications such as
the nuclear industry is ensuring that they are fit for use and safe; the process of
determining this is known as verification and validation (V&V) [46]. V&V methods

Fig. 1 Taxonomy of autonomy levels for industrial systems [45]

will vary depending on how a system is designed. Early stages of development may rely more on simulation due to the unpredictable nature of systems in the initial phases. Considerations for design include robust design methods, facility design and possible adaptations to accommodate the system. In addition, considerations regarding
the maintenance and decommissioning of a new system must be evaluated such as
methods of recovery in the event of power loss or other critical fault [46]. For a
tele-operated system this may include physical testing, user analysis and simulated
testing. Semi-autonomous systems require greater analysis and may require stability
proofs along with hardware-in-the-loop testing. Such systems are at risk from events such as loss of communication and may need to be continuously recalibrated to ensure they are operating within safe limits. Research at the University of the West of England has investigated how simulation-based internal models can be used in the decision-making process of robots to identify safety hazards at runtime. This approach
would allow a robot to adapt to a dynamic environment and predict dangerous actions,
allowing a system to be deployed across more diverse applications [47].

4.2 The Cyber Physical System Architecture

A cyber physical system (CPS) can be defined as “integrations of computation with physical processes” [48]. They generally consist of embedded computers joined by
networks and may involve feedback loops for controlling physical processes. CPSs
are unique in their intricate connection between the cyber and physical world; while
this can present many problems for implementation, exploiting this fact can lead to
advanced systems with greater capabilities than ever seen before. The potential appli-
cations for CPSs are vast and spread across a spectrum of sectors such as healthcare,
transport, manufacturing, and robotics.
According to the National Institute of Standards and Technology (NIST) [49],
CPSs are defined as systems that “integrate computation, communication, sensing,
and actuation with physical systems to fulfil time-sensitive functions with varying
degrees of interaction with the environment, including human interaction”. There
are some key elements of a CPS that distinguish them from conventional systems.
These are:
• Amalgamation of cyber and physical domains, along with interconnectedness.
• The scope for systems of systems (SoS).
• Expected emergent behaviour.
• Requirements for methods to ensure interoperability.
• The possibility for functionality extended beyond initial design scope.
• Cross-domain linked applications.
• A greater emphasis on trustworthiness.
• Modifiable system architecture.
• Broad range of computational models and communication protocols.
• Time-sensitive nature, with latency and real time operation a key design issue.
• Characterisation by interaction with their operating environment.
The current CPS concept has developed from the increasing number of embedded
systems, associated sensors, and greater connectivity inherent in today’s systems.
This allows data to be collected directly from a system and be processed to give
insight into operation, made possible by the use of advances in data processing with
concepts such as big data, machine learning and AI. This can result in a system that
exhibits a level of intelligence [50].
As CPSs are a relatively recent concept, methods and structure for their design are
still being developed. An early architecture, proposed in [51], is a 5-level structure
for developing CPSs for Industry 4.0 applications. An overview of this is given in
Fig. 2.
Along with the 5C architecture, other architectures have been proposed including
the Reference Architectural Model Industry 4.0 (RAMI4.0) and Industrial Internet
Reference Architecture (IIRA) [52]. A key challenge in the implementation of CPSs is the lack of interoperability of individual systems. In terms of software implementation, this interoperability refers to the ability to exchange data between systems,
more specifically known as semantic interoperability [53]. In the context of CPSs,

Fig. 2 The 5C architecture for a cyber physical system [51]

cyber physical production systems have been proposed as an adaptation to the manufacturing industry. This is based on the same framework as the conventional CPS,
mainly the 5Cs [54]. An example of a CPS developed for manufacturing automation
is proposed in [55]. The authors account for the hybrid system approach, integrating
the continuous-time physical process with the discrete-time control. In addition, the
paper notes that service-orientated architecture and multi-agent system approaches
are promising for the development of such a CPS. The concept system integrates the
robot movements with object recognition and fetching, all while taking safety into consideration during human–robot interaction.
The interconnected nature of CPSs makes them inherently susceptible to cyber-
attacks. These attacks can come from actors such as hacking organisations, govern-
ments, users (that may be intentional or not), or hacktivists. Hackers can exploit
vulnerabilities, such as cross site scripting or misconfigured security in order to
break into a system. These risks can be mitigated by maintaining security defined
by the CIA triad—confidentiality, integrity, and availability. Many cyber-attacks are
confined to the cyber domain and focus on sensitive data or disrupting computer
systems. In contrast, CPS attacks can have a direct impact in the physical world that
can cause damage to the physical systems as well as the potential to endanger life.
Preventing future attacks, which could have greater impact due to the greater number
of connected systems, is of utmost importance [56].

4.3 Enabling Technologies

The cyber physical architecture discussed in the previous section can be realised
on the pillars of various recently proposed technologies, referred to as enabling
technologies here. In the following, these technologies are reviewed in more
depth to set the scene for development of a cyber-physical system for the nuclear
industry.
Industrial Internet of Things. The Industrial Internet of Things (IIoT) is a subset
of the Internet of Things (IoT) concept whereby devices are all interconnected, facil-
itated by the internet. IoT has seen increasing development in many sectors such
as transportation and healthcare. Increasing connectivity has been enabled by the
greater use of mobile devices each of which can provide feedback data. IIoT covers
machine-to-machine (M2M) communication and similar industrial automation inter-
action technology. While there are some similarities with consumer based IoT, IIoT
differs in that connectivity must be structured, applications are critical, and data
volume is high [53]. While IoT tends to use wireless communications, industrial
applications often rely on wired connectivity due to greater reliability.
As IIoT is characterised by the substantial amounts of data transferred, the
methods for data exchange are a key consideration for IIoT systems. Industrial
communication networks have developed considerably over the years, utilising devel-
opments in fields such as IT. Initial industrial communication was developed using
fieldbus system networks which helped to improve communication between low level
devices. As the internet continued to grow, automation networks changed to incor-
porate more ethernet based technologies despite issues with real time capabilities.
More recently wireless networks have become more common as they allow easier
reconfiguration of systems and do not require extensive cabling; however, they still have issues with real-time capabilities and concerns regarding reliability. In particular, the new wave of communication technology that has aided the development of the IoT and IIoT is more focused on consumer requirements and so is not currently suitable for many of the demanding requirements of industry [57].
Using IIoT can help improve productivity and efficiency of processes. Real time
data acquisition and data analytics can be used in conjunction with IIoT to predict
equipment maintenance requirements and allow fast response to failures. In healthcare settings, IoT can be used to improve patient safety through better monitoring of patient conditions.
There are many examples of IoT applications; some have been slowly devel-
oped while others are quickly being implemented as enabling technologies become
available. Examples include:
• Smart Cities—Smart highways can be used to improve traffic flow and reduce
accidents while monitoring of available parking spaces can be used to assist
drivers.
• Smart Agriculture—Monitoring weather along with soil moisture and quality can
ensure planting and harvesting is done at the correct time.

• Smart Homes—using IoT devices in the home can improve efficiency in use of
utilities in addition to also allowing better safety through detection of break ins.
Internet of Things systems generally comprise a wireless sensor network (WSN). WSNs have some key objectives such as sensing properties of their environment, sampling signals to allow digital processing, and, in some cases, extracting useful information from the collected data. Typically, WSNs will involve low-cost, low-power communication methods such as Wi-Fi, Bluetooth, or near-field communication (NFC); some drawbacks of using these methods include interference and loss of data. This can be particularly problematic in a nuclear environment where good reliability is required [58].
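As an illustration of how a low-complexity node in such a network might report its readings, the sketch below publishes dose-rate samples to a broker over MQTT, a lightweight publish/subscribe protocol widely used in WSN and IoT deployments. The broker address, topic name and payload fields are illustrative assumptions and do not describe any deployed nuclear system.

    import json
    import time
    import paho.mqtt.client as mqtt      # lightweight publish/subscribe client

    BROKER = "192.168.0.10"               # hypothetical on-site broker address
    TOPIC = "plant/area1/dose_rate"       # hypothetical topic naming scheme

    def read_dose_rate():
        return 0.42                       # placeholder for a real detector driver

    client = mqtt.Client()                # paho-mqtt 1.x style constructor
    client.connect(BROKER, 1883, keepalive=60)
    client.loop_start()                   # background network loop
    try:
        for _ in range(10):
            payload = json.dumps({"node": "ws-07",
                                  "dose_uSv_h": read_dose_rate(),
                                  "timestamp": time.time()})
            # QoS 1 asks the broker to acknowledge delivery, useful on lossy links.
            client.publish(TOPIC, payload, qos=1)
            time.sleep(10)
    finally:
        client.loop_stop()
        client.disconnect()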
The convergence of IoT and robotics has resulted in the concept of an internet of
robotic things (IoRT), which can be seen in control approaches such as cloud and ubiquitous robotics [59]. The term was initially coined to refer to the fusion of
sensor data and manipulation of objects resulting in a cyber-physical approach to
robotics, and sharing characteristics with the CPS concept. Elementary capabilities
of an IoRT system include perception, motion, and manipulation, along with higher
level processing including decisional autonomy, cognitive ability, and interactive
capacity.
Cloud Computing. Cloud computing allows quick access to IT resources, providing flexibility to match requirements. Resources are available on demand, over the internet, and are used across a range of industries. There are several benefits of cloud computing, such as reduced upfront and running costs for IT infrastructure, and the ability to quickly adapt capacity to requirements, reducing the need for planning. There are several deployment methods
for cloud computing which each offer benefits in terms of flexibility and management.
Some examples are shown in Fig. 3.

Fig. 3 Cloud computing service models: Infrastructure as a Service (IaaS), access to networking and data storage with high flexibility and management; Platform as a Service (PaaS), access to cloud computing without associated infrastructure maintenance; Software as a Service (SaaS), software available on demand through cloud infrastructure with low to no maintenance



Edge computing is becoming a common alternative to cloud computing. It can be defined as computing resources located part way between the data sources and a
cloud computing centre. This has the advantage of improving system performance in
terms of overall delay and reducing the requirements for bandwidth. In addition, fog
computing can provide greater security over the cloud, reducing the possibility of
data being intercepted during transmission and allowing greater control over storage
[60]. Edge-based systems will also help drive the development of CPSs by allowing
data processing in real-time, greatly improving the response of such systems.
The architecture for an edge-based network consists of an edge layer situated
between a device layer and cloud layer, which can be further broken down into near,
mid and far levels. The edge provides opportunities for integration of 5G, allowing
applications such as real-time fault monitoring. This however requires consideration
of quality of service based on availability, throughput and delay. 5G networks also
offer the ability to be sliced, such as by using network function virtualisation (NFV)
technology, a form of logical separation, which allows sharing of network resources.
As the edge relies on distributed computing power, data offloading and balancing
must be considered to avoid overloading resources and improving availability and
efficiency [61].
Cloud Robotics. A cloud robot can be defined as a robotic system that uses data
from a network to assist operating tasks [62]. This is in contrast to a conventional
robotic system whereby all processing and related tasks are carried out in a standalone
system. As there is latency in passing data, often cloud robotics are designed with
some on board abilities as required for real-time control. Using the cloud allows
access to several possibilities including large libraries of data, parallel computing
for statistical analysis, collective learning between robots, and including multiple
humans in the loop.
Some of the challenges involved with cloud computing include privacy and secu-
rity requirements, along with varying network latency. Using the cloud with robotics
offers the opportunity for a SaaS approach to robotic development where pack-
ages such as those used with ROS can be available without requiring the lengthy
setup currently associated with using them. Research in [63] investigated the use
of cloud computing for global path planning with large grid maps, as these tend to
be computationally intensive. This was performed by using a vertex-centric RA*
planning algorithm, implemented with Apache Giraph distributed graph processing
framework. The research ultimately found that current cloud computing techniques
are unsuitable for the real-time requirements of path planning due to the latency in
network connections.
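The vertex-centric RA* implementation in [63] is specific to the Giraph framework; as a minimal, self-contained illustration of the underlying global path planning problem that such work offloads to the cloud, the sketch below runs a standard A* search over a small occupancy grid.

    import heapq

    def astar(grid, start, goal):
        """A* on a 4-connected occupancy grid; grid[r][c] == 1 means blocked."""
        rows, cols = len(grid), len(grid[0])
        h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])   # Manhattan heuristic
        open_set = [(h(start), 0, start, None)]
        came_from, g_cost = {}, {start: 0}
        while open_set:
            _, g, current, parent = heapq.heappop(open_set)
            if current in came_from:
                continue                      # already expanded with a better cost
            came_from[current] = parent
            if current == goal:               # reconstruct the path
                path = []
                while current is not None:
                    path.append(current)
                    current = came_from[current]
                return path[::-1]
            r, c = current
            for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                    ng = g + 1
                    if ng < g_cost.get((nr, nc), float("inf")):
                        g_cost[(nr, nc)] = ng
                        heapq.heappush(open_set, (ng + h((nr, nc)), ng, (nr, nc), current))
        return None                           # no path exists

    grid = [[0, 0, 0, 0],
            [1, 1, 0, 1],
            [0, 0, 0, 0]]
    print(astar(grid, (0, 0), (2, 0)))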
Big Data. The use of IIoT and WSNs generates a large amount of data, commonly
referred to as big data. Big data analytics is then required to make sense of the large
quantity of data and provide useful insights which can be used for tasks such as struc-
tural health monitoring and diagnostics. It can be combined with cloud computing
and AI techniques to manage data and perform analysis. Big data is characterised by
four main characteristics, known as the 4Vs [64], shown in Table 2.

Table 2 The 4Vs of big data [64]
Volume: Data can be in the order of 1000+ TB
Variety: Data can be structured or unstructured, with varying formats
Velocity: Data is generated rapidly
Value: Data must be processed to extract value

Within a manufacturing environment, data can be produced from numerous sources. This can include resource data from IoT equipment along with material
and product parameter and environmental readings. In addition, management data
can be included arising from information systems such as ERP and PDM, as well as
CAD data. Data must be cleaned and organised before being stored, at which point
it can then be analysed. The processed data can be integrated into a digital twin of a
system to provide real time feedback to operators and develop predictive capabilities
for a system [65].
A framework for managing big data during nuclear decommissioning was developed in [66]. This involved combining imaging robotics and machine learning to help assess the condition of end-of-life (EOL) nuclear facilities. Data was collected using LiDAR, which
was subsequently processed using AI techniques and stored as a distributed file
system using Hadoop.
Big data has the potential for multiple applications in robotics such as for object
recognition and pose estimation. Using datasets such as the Willow Garage household
object dataset, models can be trained for parameters such as grasp stability, robust grasping, and scene comprehension [62]. A major challenge remains in producing
data that is suitable for cross platform applications by standardising formats and
ensuring good interoperability between systems. In addition, research is required
into sparse representation of data to improve transmission efficiency, and approaches
that are robust to dirty data.
Digital Twins and Simulation. In 2012, NASA released a paper [67] defining a
digital twin as “an integrated multi-physics, multiscale, probabilistic simulation of
an as-built vehicle or system that uses the best available physical models, sensor
updates, fleet history, etc., to mirror the life of its corresponding flying twin.” While
this definition was created with aerospace engineering in mind, the concept can be
extended to a wide range of systems. A more recent definition gives a digital twin
(DT) in terms of physical objects (POs) as “a comprehensive software representation
of an individual PO. It includes the properties, conditions, and behaviour(s) of the
real-life object through models and data [68].”
A DT is a set of realistic models that can simulate an object’s behaviour in the
deployed environment. The DT represents and reflects its physical twin and remains
its virtual counterpart across the object’s entire lifecycle. This means that a digital
twin can reflect the real-world condition of a system, a component, or a system of systems, along with providing access to relevant historical data.
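A minimal sketch of the data side of this idea is given below: a software object mirrors the latest state of a physical asset from streamed sensor updates, keeps a history across the lifecycle, and exposes a trivial predictive check. The class, field names and threshold are illustrative assumptions rather than a reference digital twin design.

    import time
    from dataclasses import dataclass, field
    from typing import Dict, List, Tuple

    @dataclass
    class DigitalTwin:
        """Software counterpart of a single physical asset (pump, valve, robot joint, ...)."""
        asset_id: str
        state: Dict[str, float] = field(default_factory=dict)      # latest mirrored values
        history: List[Tuple[float, Dict[str, float]]] = field(default_factory=list)

        def ingest(self, sensor_update: Dict[str, float]) -> None:
            # Mirror the physical twin: merge new readings into the current state.
            self.state.update(sensor_update)
            self.history.append((time.time(), dict(self.state)))

        def predict_overheat(self, limit_c: float = 70.0) -> bool:
            # Trivial stand-in for a physics-based or data-driven model.
            return self.state.get("temperature_c", 0.0) > limit_c

    pump_twin = DigitalTwin("coolant-pump-3")
    pump_twin.ingest({"temperature_c": 64.2, "vibration_mm_s": 1.1})
    pump_twin.ingest({"temperature_c": 72.8})
    print(pump_twin.predict_overheat())    # True: flag the asset for maintenance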
Digital twins have applications across a range of industries and sectors; some key
examples are in smart cities and healthcare. IoT sensors that gather data from cities

can be used to give insight into the use of utilities and how it may be possible to
save energy. Within manufacturing, the real-time status of machine performance can be obtained, helping to predict maintenance issues and increase productivity. Within
medicine and healthcare, digital twins can be used to provide real time diagnostics
of the human body and may even allow simulation of the effects of certain drugs, or
surgical procedures [69].
There are some key enablers in related technology that have allowed the develop-
ment of more comprehensive digital twins. While digital modelling has been around
for decades, advances now allow real world scenarios to be replicated and testing
carried out without any risk to the physical system. Digital modelling, made possible
by advances in processor power, can likewise be used to make predictions about the condition of a system.
As shown in Fig. 4, digital twins can have different applications depending on the
computing layer in which they operate. In addition, it is possible to have multiple
digital twins running simultaneously each providing a different application. This
could include deployment within the system of interest itself, allowing rapid response to conditions that fall outside of nominal operating conditions. Simultaneously, a digital twin could be applied to work with historical data, possibly utilising data from concurrent systems and allowing predictions that can influence mainte-
nance strategies and test possible scenarios. A key consideration in the development
of a digital twin is the modelling method used; these can be categorised into data-
driven or physics-based modelling, however modelling approaches also exist as a
combination of these two techniques.
Multi-agent Systems. The concept of multi-agent system (MAS) developed from
the field of distributed artificial intelligence (DAI) first seen in the 1970s. DAI can
be defined as “the study, construction, and application of multiagent systems, that
is, systems in which several interacting, intelligent agents pursue some set of goals
or perform some set of tasks” [71]. An agent can be either cyber based such as a
software program, sometimes referred to as a bot, or a physical robot that can interact
directly with its environment.
Fig. 4 Digital twin strategies at different levels [70]

Research in DAI has developed as a necessity from the ever-increasing distributed computing in modern systems. A major aspect of DAI is that agents are intelligent,
and therefore have some degree of flexibility while being able to optimise their
operations in relation to a given performance indicator. Typically, an agent will
have a limited set of possible actions which is known as its effectoric capability.
This capability may vary dependent on the current state of the environment. The
task environment for an agent can be characterised according to a set of properties.
According to [72], these can be defined as:
• Observability—whether the complete state of the environment is available.
• Number of agents—how many agents are operating in the environment; this also requires consideration of how an agent will act, whether it is competitive, cooperative, or can be viewed as a simple entity.
• Causality—the environment may be deterministic allowing future states to be
predicted with certainty, or alternatively stochastic in which there will be a level
of uncertainty in actions.
• Continuity—whether previous decisions affect future decisions.
• Changeability—an environment can be static or dynamic and requiring continual
updates.
• State—can either be discrete or continuous.
Agents can have software architectures not dissimilar from those of robots. Reflex
agents respond directly to stimuli, while model-based agents use an internal state to
maintain a belief of the environment. Goal based agents may involve some aspects
of search and planning in order to achieve a specific goal, utility agents have some
sense of optimising, and learning agents can improve their performance based on a
given performance indicator.
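The difference between the first two of these architectures can be illustrated in a few lines of code: a reflex agent maps the current percept directly to an action, while a model-based agent also maintains an internal state, here a set of already-inspected cells. The percept fields and actions are illustrative assumptions.

    class ReflexAgent:
        """Maps the current percept directly to an action via condition-action rules."""
        def act(self, percept):
            if percept["obstacle_ahead"]:
                return "turn_left"
            return "move_forward"

    class ModelBasedAgent:
        """Keeps an internal belief about the environment and acts on percept + belief."""
        def __init__(self):
            self.visited = set()              # internal state: cells already inspected

        def act(self, percept):
            self.visited.add(percept["cell"])
            if percept["obstacle_ahead"]:
                return "turn_left"
            if percept["cell_ahead"] in self.visited:
                return "turn_right"           # prefer unexplored cells
            return "move_forward"

    agent = ModelBasedAgent()
    print(agent.act({"cell": (0, 0), "cell_ahead": (0, 1), "obstacle_ahead": False}))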
MASs can assist with many tasks required of the robots in a previously unknown
environment such as planning, scheduling, and autonomous operation. The perfor-
mance of a multi-agent system can be measured in terms of rationality, i.e., choosing
the correct action. The four main approaches to the implementation of a MAS are shown in Table 3.

Table 3 MAS approaches
Acting humanly: An approach based on the Turing test whereby an agent can respond to written questions, with a result that is indistinguishable from one given by a human
Thinking humanly: This approach relies on programming the working of the human mind; this has developed into the field of cognitive science
Thinking rationally: At its centre, this approach uses logic and reasoning. The main challenges with this approach are formalising knowledge and managing the computational burden when analysing a problem with a large number of aspects
Acting rationally: An approach that uses multiple methods to achieve rationality and optimise performance for a given task

Deep Learning Techniques. Machine learning has developed as a culmination of research in associated areas such as statistics and computer science. The
main goal of a machine learning approach is to generate predictions via a model
inference algorithm, using a machine learning model that has been trained for a
specific purpose. Different approaches for training can be implemented depending
on the characteristic of the data set being used such as supervised, unsupervised, or
reinforcement learning. Training can be performed by splitting a dataset to give a
training set, along with a test set which can be used to optimise the model by varying
parameters to minimise a loss function [73].
Deep learning algorithms, sometimes also referred to as artificial neural networks
or multilayer perceptrons, are a subset of machine learning algorithms that mimic
pathways through the brain and are designed to recognise patterns. So far, deep
learning algorithms have generated a great impact in semantic image processing
and designing vision systems for robotic applications. The main advantage of deep
learning techniques is their potential to extract features from images and classify
them using data, with minimum human intervention, once they are trained.
The neural networks in deep learning applications are composed of nodes, or neurons, which can be used to solve a variety of problems, most commonly separating data into groups. Hidden layers add to the depth of the network using activation functions which convert an input signal to an output that may be passed to the next layer. These functions can be set up to create different networks such as feedforward, recurrent or convolutional, depending on the desired results.
Deep learning can be used to improve the performance of machine learning
algorithms through building representations from simpler representations, therefore
allowing more abstract features to be identified. Neural networks use weightings to
control the outputs. These can be determined by training the network, either online
or offline. This is done by showing the network many examples of different classes,
known as supervised learning, and minimising the cost function. With multiple
weightings, it is infeasible to test all possibilities. In this case, the weights that minimise the cost function can be found through gradient descent [74].
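As a minimal, self-contained illustration of these ideas, the sketch below trains a single-hidden-layer network on the toy XOR problem using supervised learning and gradient descent on a squared-error cost. The architecture, learning rate and number of iterations are illustrative choices.

    import numpy as np

    rng = np.random.default_rng(0)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)   # toy inputs
    y = np.array([[0], [1], [1], [0]], dtype=float)               # XOR labels

    W1, b1 = rng.normal(0, 1, (2, 8)), np.zeros((1, 8))           # hidden layer
    W2, b2 = rng.normal(0, 1, (8, 1)), np.zeros((1, 1))           # output layer
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

    lr = 0.5
    for _ in range(5000):
        # Forward pass through the two layers
        h = sigmoid(X @ W1 + b1)
        out = sigmoid(h @ W2 + b2)
        # Backpropagate the squared-error cost
        d_out = (out - y) * out * (1 - out)
        d_h = (d_out @ W2.T) * h * (1 - h)
        # Gradient descent updates of the weightings
        W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0, keepdims=True)
        W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0, keepdims=True)

    print(np.round(out.ravel(), 2))   # approaches [0, 1, 1, 0] as training proceeds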
Human–Robot Collaboration. In the context of nuclear scenarios, human–robot collaboration refers to a teleoperation scheme where the robot assists the human in a virtual interaction to perform the task. Robot motion would benefit from a human-in-the-loop cooperative framework, where limited robot autonomy can be effectively compensated by introducing human intelligence. It has been shown in [75] that learning from and adapting to humans can overcome the difficulties of decision making in fully manual and autonomous tasks. Therefore, a mixture of human and robot intelligence presents opportunities for effective teamwork and task-directed assistance in the nuclear decommissioning context. On the other hand, the robot's autonomy is often limited by its perception capabilities when performing complex tasks. Projecting sensorimotor capabilities onto the robot side would help reduce the high demand on environment perception while maintaining operational safety.
Current nuclear robots are typically commanded according to a master–slave
scheme (Level 1–2 in Fig. 1), where the human operator at a console remotely controls

the robot during operation using visual and haptic feedback through the workstation.
However, local sensors equipping the tools at the remote environment may provide
better quality or complementary sensory information. Additionally, the operator’s
performance may decrease with fatigue, and in general robotic control should be
adapted to the operator’s specific sensorimotor abilities. Therefore, empowering
robots with some autonomy would effectively regulate flexible interaction behaviour.
In this regard, how to understand and predict human motion intent is the funda-
mental problem for the robot side. Three human motion intents are typically consid-
ered in physical human–robot interaction, namely motion intent, action intent and
task-level intent [75], which covers human intent in the short to long time and even
full task horizon. A more intuitive approach is developing physical or semi-physical
connection between robot and human. Such method can be observed in human–
human interaction literature. In terms of the performances of physically interacting
subjects in a tracking task, subjects can improve their performance by interacting with
(even a worse) partner [76]. Similar performances obtained with a robotic partner
demonstrated that this property results from the sensory exchange between the two
agents via the haptic channel [77]. This haptic communication is currently exploited
to develop sensory augmentation in a human–robot system [78].
When both human and robot perform partner intent estimation and interaction
control design, it is essential to investigate the system performance under this process
of bilateral adaptation. Game theory is a suitable mathematical tool to address this
problem [79]. For example, for novice or operator in fatigue, the robot controller
tends to compensate for human motion to mitigate any possible adverse effects due
to incorrect human manipulation. On the other hand, human may still wish to operate
along his/her own target trajectory. With the estimated intent and gain from the robot
side, human can counteract robot’s effects by further increasing his/her control input
(gain). Thus, there is haptic communication between human and robot regarding
their respective behaviour. The strategies of the two controllers can converge to a
Nash equilibrium as defined in non-cooperative game theory.
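A minimal numerical sketch of this kind of bilateral adaptation is given below: human and robot each choose a control input that minimises their own quadratic cost given the other's current input, and repeated best responses settle at the Nash equilibrium of the game. The targets, effort weights and one-step interaction model are illustrative simplifications rather than a model of any specific teleoperation task.

    # Human and robot both push a shared variable x = u_h + u_r towards their own
    # targets while penalising their own control effort.
    target_h, target_r = 1.0, 0.5      # conflicting intents (illustrative)
    r_h, r_r = 0.2, 0.2                # effort weights (illustrative)

    def best_response(target, other_input, effort_weight):
        # argmin over u of (u + other_input - target)**2 + effort_weight * u**2
        return (target - other_input) / (1.0 + effort_weight)

    u_h = u_r = 0.0
    for _ in range(50):                # iterated best responses
        u_h = best_response(target_h, u_r, r_h)
        u_r = best_response(target_r, u_h, r_r)

    # At convergence neither player can reduce its own cost by unilaterally
    # changing its input: a Nash equilibrium of the non-cooperative game.
    print(round(u_h, 3), round(u_r, 3), round(u_h + u_r, 3))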

5 A Cyber Physical Nuclear Robotic System

The overall architecture of the cyber physical system proposed for nuclear decommissioning applications is illustrated in Fig. 5. This architecture consists of two layers, namely the 'Cyber Environment' and the 'Extreme Physical Environment'. Each robot, based on local information from its own sensors, information from other robots and the cloud, and a mission defined by the 'Decision Making and Action Invoking Unit' or the 'Assisted Teleoperator', can obtain a proper environmental perception and design a path plan that avoids obstacles and leads to the mission position.
As can be seen from Fig. 5, the proposed system involves various integrated
heterogeneous subsystems with a great level of interoperability and a high level of
automation in both vertical and horizontal dimensions of the system. In this sense,
the proposed architecture can be viewed as a complex system of systems with a

Fig. 5 A schematic block diagram of the proposed solution, comprising a 'Cyber Environment' layer (Decision Making and Action Invoking Unit; Data Analytics and AI Algorithms; Digital Twin (ROS, VR, etc.); Assisted Tele-operator; Cloud Infrastructure) and an 'Extreme Physical Environment' layer of autonomous robotic agents on the edge, each with Sensing, Perception (object recognition, pose estimation, sensor fusion, feature extraction), Navigation (SLAM, path planning, obstacle avoidance, robot constraints), Introspective Autonomy (fault detection, isolation, accommodation, FTC), Actuation and Communication modules, alongside the plant sensors and assets. The components studied in this project and their relevance to the work packages are highlighted in light blue colour

hierarchy of various software, hardware, algorithms, and robotic platforms aiming to enhance the autonomous operation of the overall system and execute complex
decommissioning tasks in an uncertain and unstructured environment. The proposed
system relies on a heterogeneous multi-robot system by which the complex decommissioning tasks are distributed and executed in the most efficient way to reduce the execution time and hence the exposure of the robots to high-dose radioactive environments. The conceptual cyber physical system illustrated in Fig. 5 follows Level 5 of the autonomy taxonomy depicted in Fig. 1, and works according to the 5C architecture discussed in Fig. 2 of this chapter.
At the very bottom level, the system involves a mobile sensor network using a
set of aerial and ground-based mobile manipulators. The mobile robots are equipped
with a range of sensors to collect multi-sensory data from the physical assets and the
surrounding environment. The collected data is processed in real-time on the edge
and through the on-board computer available on the robotic platforms and used to
characterise the nuclear environment. This is usually carried out by estimating the
spatial radiation field and other environmental variables such as temperature and
humidity, and identifying different objects and their positions within the generated

map of the nuclear site. For autonomous operation of individual robots and their
interaction with other robots, various algorithms such as motion planning, trajec-
tory tracking controller, simultaneous localisation and mapping, object detection,
and pose estimation techniques should be designed by respecting the environmental
constraints imposed on each robot. In the following, the most important components
of the proposed cyber physical system are explained in more depth.
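As a simple illustration of how scattered dose-rate measurements gathered by the mobile robots could be turned into an estimate of the spatial radiation field, the sketch below interpolates point samples onto a grid with inverse-distance weighting. Real characterisation pipelines may instead use Gaussian-process regression or source-term estimation, and the sample values here are illustrative.

    import numpy as np

    # Dose-rate samples (x, y, uSv/h) collected by the mobile robots (illustrative).
    samples = np.array([[1.0, 1.0, 12.0],
                        [4.0, 1.5, 55.0],
                        [2.5, 4.0, 20.0],
                        [5.0, 5.0, 8.0]])

    def idw_estimate(query_xy, samples, power=2.0, eps=1e-6):
        """Inverse-distance-weighted estimate of the field at query_xy."""
        d = np.linalg.norm(samples[:, :2] - query_xy, axis=1) + eps
        w = 1.0 / d**power
        return float(np.sum(w * samples[:, 2]) / np.sum(w))

    # Interpolate onto a coarse grid covering the surveyed area.
    xs, ys = np.linspace(0, 6, 7), np.linspace(0, 6, 7)
    field = np.array([[idw_estimate(np.array([x, y]), samples) for x in xs] for y in ys])
    print(field.round(1))              # estimated dose-rate map, rows = y, columns = x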

5.1 Software Architectures

The development of more advanced robotics has necessitated better structures for
designing robots based on how subsystems interact or communicate. Developments
have seen the formation of paradigms which provide approaches for solving robotic
design problems. Initial robots such as Shakey [80] were based on the sense-plan-act (hierarchical) paradigm. As robotics developed, this paradigm became inadequate: planning would take too long, and operating in a dynamic environment meant that plans could be outdated by the time they were executed.
To address the challenges in control architecture, new paradigms were created
such as reactive planning. A reactive system is one of the simplest forms of control architecture, as depicted in Fig. 6. While not effective for general purpose
applications, this paradigm is particularly useful in applications where fast execution
is required. This structure also exhibits similarities with some biological systems.
Another alternative approach that has been widely adopted is the subsumption archi-
tecture in Fig. 7, which is built from layers of interacting behaviours. This utilised an
arbitration method that would allow higher-level behaviours to override lower-level
ones [81]. While this method proved popular, it was unable to deal with longer term
planning.

Fig. 6 Example of SPA architecture [81]

Fig. 7 Example of subsumption architecture [81]

Fig. 8 Real-time control system architecture [83]

Hybrid Architecture. Hybrid architectures combine both reactive and deliberative control architectures to give a combined behaviour; they are sometimes also referred to as layered. Many robots have been designed according to the hybrid model including
MITRE, ATLANTIS, and LAAS [82]. A good example of structured approach to
control implementation is the Real-Time Control System (RCS) reference model
architecture depicted in Fig. 8 [83]. The architecture provides a way to organise
system complexity based on a hierarchical structure and takes account of the entire
range of processes that affect the system. Figure 8 shows a high-level overview of the
system architecture. The same structure can also be applied at various levels of the
system hierarchy whereby tasks are decomposed and passed to subordinate levels.
A three-tiered approach to hybrid control architecture is commonly used. This consists of behavioural control at the lowest level, which involves interfacing with components. Above this is an executive level which manages current tasks, and at the highest level is the planning tier which is used for achieving long-term goals.
Middleware/Communication. As the proposed robotic CPS consists of many
components interacting with each other, it requires software that can facilitate internal communication between different software and hardware modules. A
middleware allows this by providing an abstraction layer to communicate between
devices with different protocols. Typically, this is achieved by a client–server or
publish-subscribe approach such as used by the popular framework Robot Operating
System (ROS). This open-source software has the key features of being modular and
reusable and is integrated with other software libraries such as OpenCV for real-time
computer vision, and open-source Python libraries such as KERAS and OpenAI for
machine learning, allowing for rapid development of advanced robotic systems.
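A minimal ROS 1 (rospy) node illustrating this publish/subscribe pattern is sketched below; the topic name, message type and publishing rate are illustrative assumptions rather than part of the proposed system.

    #!/usr/bin/env python
    import rospy
    from std_msgs.msg import Float32

    def on_dose(msg):
        # Subscriber callback: runs whenever a new sample arrives on the topic.
        rospy.loginfo("dose rate: %.2f uSv/h", msg.data)

    if __name__ == "__main__":
        rospy.init_node("dose_monitor")
        pub = rospy.Publisher("/survey/dose_rate", Float32, queue_size=10)
        rospy.Subscriber("/survey/dose_rate", Float32, on_dose)
        rate = rospy.Rate(1)                      # publish at 1 Hz
        while not rospy.is_shutdown():
            pub.publish(Float32(data=0.42))       # placeholder for a real sensor reading
            rate.sleep()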

5.2 Autonomous Multi-robot Systems

The early generations of the multi-robot system proposed in Fig. 5 were tele-operated, ensuring that workers need not enter hazardous environments. Such systems are complex to operate and require intensive human supervision. The next generation will be autonomous multi-robot systems with the hierarchical architecture depicted in Fig. 8.
A robot can be defined as an autonomous system if it exists in the physical world,
senses its environment, acts on what it senses and has a goal or objective. They
consist of a mixture of hardware and software components which are designed
to work together. Robot designs and suitable components will vary depending on
the intended application along with constraints regarding size, power, and budget
[84]. Autonomous robots require electronics for different purposes; often a micro
controller is used for motors while a separate and more powerful on-board computer
is used for processing sensor data. Since electronic components are vulnerable to radiation, the current practice is to protect them using radiation-hardened electronics. This may increase the tolerable TID of the nuclear robot; however, from the viewpoint of completing a specific mission, the resulting operating time may still not be sufficient. Also, deploying numerous individual robots will not necessarily improve the progress of the intended mission.
Power efficiency is a key consideration for designing embedded systems in robotic
applications. This may constrain practical implementation of advanced processing
algorithms such as deep learning techniques, which may alternatively be performed using cloud-based computing.
A multi-robot system also requires hardware for external communication. The
simplest method of communication is using a wired connection. This is straightfor-
ward if the robot is static. However, for mobile robots a tether can become tangled or
caught. Alternatively, wireless communication methods are often used, for example
using Wi-Fi, allowing for greater system mobility and reconfiguration along with
quicker setup times due to the reduction in cabling required. Within a radioactive
environment, wireless communication and in particularly wireless sensor networks
can be challenging to implement. Some of the challenges include lack of acces-
sible power sources, radiation protection of communication system components,
and reinforced walls in nuclear plants that result in significant signal attenuation,
along with the need to ensure all communications are secure and reliable. A new
wireless communication design architecture using nodes with a base station has
recently been tested at the Sellafield site, UK, and was shown to be operationally
effective within reinforced concrete structures found in nuclear plants, while the low-complexity sensor nodes used allow for greater radiation tolerance [85]. Low-power but shorter-range communication methods such as Bluetooth and ZigBee can also be considered for the deployment of wireless sensor networks in nuclear environments.
Using multi-robot systems or cheap modular robotic units with simple functionality can be a possible solution to the radiation exposure or TID problem. Such a system is inherently
redundant and the collective behaviour of the multi-robot system is significant. In

order to better understand the behaviour of a multi-robot system, it is useful to have a kinematics model. A key element in the modelling of a robot is the generalised state
of the system. This provides a complete representation of the multi-robot system
by adding the states of the individual robots together. This is completed by defining
the local and global reference frames to define the location of the robot and the
relative distance between the multi-robot system and the extracted features in the
environment. In addition to the state, forward and inverse kinematics of the robots
are required for complete understanding of the robot behaviour. Building on the
kinematics, a dynamic model can also be developed to give a relationship between
control inputs and states.
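A minimal sketch of these ideas for differential-drive (unicycle) ground robots is given below: a forward kinematics update in the global frame is applied to each robot of a small team, and the generalised state of the multi-robot system is formed by stacking the individual states. Team size, poses and inputs are illustrative.

    import numpy as np

    def diff_drive_step(state, v, omega, dt):
        """Forward kinematics of a differential-drive (unicycle) robot.
        state = [x, y, theta] in the global frame; v, omega are body-frame inputs."""
        x, y, theta = state
        return np.array([x + v * np.cos(theta) * dt,
                         y + v * np.sin(theta) * dt,
                         theta + omega * dt])

    # Individual robot states in the global reference frame (illustrative).
    team = [np.array([0.0, 0.0, 0.0]),
            np.array([2.0, 1.0, np.pi / 2]),
            np.array([-1.0, 3.0, np.pi])]
    inputs = [(0.5, 0.0), (0.3, 0.1), (0.4, -0.05)]     # (v, omega) per robot

    team = [diff_drive_step(s, v, w, dt=0.1) for s, (v, w) in zip(team, inputs)]
    generalised_state = np.concatenate(team)            # state of the whole team
    print(generalised_state)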

5.3 Control System Design

A robot acts on its environment in two main ways: locomotion and manipulation. This
is achieved by various components depending on the task that is required. Effectors
are components or devices that allow a robot to carry out tasks such as moving and
grasping and consist of or are driven by actuators. Actuators typically produce linear
or rotary motion and may be powered via different mediums including electrics,
hydraulics, or pneumatics. While an actuator typically has one degree of freedom, a
robot can be designed with mechanisms comprising joints and links giving multiple
degrees of freedom.
Dexterous operation of robotic manipulators in the restricted environments found in nuclear sites requires more degrees of freedom than the task itself demands. In this case, the manipulator is kinematically redundant and the extra degrees of freedom are used to satisfy various environmental and robotic constraints such as obstacle avoidance, joint limit avoidance, avoiding singularities, etc. Figure 9 illus-
trates the detailed block diagram of the control system designed for the manipulation
and grasping of a single dual arm robot. Similarly, the control system designed for
autonomous operation of a single UAV in an unstructured environment is depicted
in Fig. 10.
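A common way to exploit such kinematic redundancy is to resolve joint velocities with the Jacobian pseudoinverse and to project a secondary objective, for example a joint-limit or obstacle avoidance gradient, into the null space of the primary task. The sketch below illustrates this resolution step with a randomly generated stand-in Jacobian; it is not a model of the dual-arm system of Fig. 9.

    import numpy as np

    def redundant_ik_step(J, x_dot_des, q_dot_secondary):
        """Joint velocities tracking a task-space velocity while using the null
        space of J for a secondary objective (e.g. joint-limit avoidance)."""
        J_pinv = np.linalg.pinv(J)
        N = np.eye(J.shape[1]) - J_pinv @ J            # null-space projector
        return J_pinv @ x_dot_des + N @ q_dot_secondary

    rng = np.random.default_rng(1)
    J = rng.normal(size=(3, 7))                        # stand-in 3x7 Jacobian, 7-DOF arm
    x_dot_des = np.array([0.05, 0.0, -0.02])           # desired end-effector velocity
    q_dot_secondary = -0.1 * rng.normal(size=7)        # illustrative secondary motion

    q_dot = redundant_ik_step(J, x_dot_des, q_dot_secondary)
    print(np.allclose(J @ q_dot, x_dot_des))           # True: primary task unaffected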
Improving control systems leads to better performance, in turn allowing faster
operation. In the development of new systems, better control can also allow the use of smaller and lighter weight components for a given application. As many physical systems are inherently non-linear, there has been increasing demand for control schemes that can improve control of such systems. A common example of a system requiring non-linear control is the hydraulic manipulator. Greater availability of computing power has allowed more complex control schemes to be implemented in real time. Nevertheless, proportional-derivative and proportional-integral-derivative control are still commonly used controllers for industrial robotics [86]. Developing control systems using non-linear control methods has many benefits over classical methods [87]. An overview of these is given below:

Fig. 9 The schematic of the control system designed for a dual arm manipulator

Fig. 10 The schematic of the control system designed for autonomous operation of a single UAV

• Improvement in performance over linear control methods.


• Control of systems that cannot easily be linearised, such as those with hard
nonlinearities.
• Reliable control in the presence of model uncertainties and external disturbances,
classed as robust controllers and adaptive controllers.
• The possibility of simpler control design due to methods being more consistent
with the true system physics.
• Guarantee of stability.

• Cost saving, using cheaper components which are not required to have a linear
operating region.
Control Algorithms. Using the inverse dynamics of a system, a linear response can
be obtained by an inverse dynamics control, sometimes also called computed torque.
Often when designing a control scheme, knowledge of system parameters is not
perfect. One technique to address this is adaptive control. Adaptive control uses
system identification techniques in combination with a model-based control law that
allows model parameters to be updated in response to the error.
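A minimal sketch of inverse dynamics (computed-torque) control for a single joint, modelled as an inertia with viscous friction, is given below; in an adaptive scheme the model parameters used by the controller would additionally be updated online from the tracking error. All parameters and the reference trajectory are illustrative.

    import numpy as np

    # Plant: I * q_ddot + b * q_dot = tau (illustrative single-joint model).
    I_true, b_true = 0.8, 0.4
    I_hat, b_hat = 0.8, 0.4            # controller's model (perfect knowledge here)
    Kp, Kd, dt = 100.0, 20.0, 0.001

    q = qd = 0.0
    errors = []
    for k in range(3000):
        t = k * dt
        # Reference trajectory and its derivatives
        q_des, qd_des, qdd_des = np.sin(t), np.cos(t), -np.sin(t)
        # Computed-torque law: cancel the modelled dynamics, impose linear error dynamics
        e, ed = q_des - q, qd_des - qd
        tau = I_hat * (qdd_des + Kd * ed + Kp * e) + b_hat * qd
        # Integrate the true plant (explicit Euler)
        qdd = (tau - b_true * qd) / I_true
        qd += qdd * dt
        q += qd * dt
        errors.append(abs(e))

    print(max(errors[-500:]))          # tracking error stays small once converged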
As with most control laws, the challenge of a rigorous proof of stability is of central importance [88]. Unlike traditional continuous-time systems, communication over a distance in cyber-physical nuclear robotic systems can result in time delays, data loss, and even instability. Since the active nodes in the network are typically not synchronised with each other, the resulting delays tend to be stochastic. Given the potential for network congestion and packet loss in closed-loop robotic systems, the delay is typically assumed to follow a known probability distribution [89]. For model simplification, the time-delay sequence can be assumed to be independently and identically distributed, or a Markov chain can be used to model network-induced delays [90, 91]. Maintaining system stability in the presence of time delays is therefore critical yet challenging for systematic stability analysis and control synthesis. A classic control strategy is built on passivity theory, which characterises the input–output dissipation property of the system: the energy stored in a passive system does not exceed the energy imported from the environment. Passivity-based time-delay stabilisation methods have been studied intensively and have yielded rich results, such as the scattering approach, wave variables, damping injection, and the time-domain passivity approach [92]. In contrast to passivity-based methods, predictive control strategies avoid passivity analysis by compensating for the uncertainty of communication delays in the system, and provide unique advantages, especially in handling constraints and uncertain delays [93]. Another alternative is machine learning control, a subset of optimal control that allows performance to improve as the controlled system repeats tasks, not necessarily involving a parametric model. This allows compensation of phenomena that are difficult to model, such as friction and system nonlinearities. This type of control can also be used to validate dynamic models of a system by studying how the torque error function varies over the system's operating envelope. Research detailed in [94] shows how a remote centre of movement (RCM) strategy can be used to avoid collisions while working through small openings. This is particularly relevant to the nuclear industry, where ports are used to access highly radioactive environments.
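As a small illustration of the delay modelling discussed above, the sketch below samples network-induced delays from a hypothetical two-state Markov chain (uncongested/congested); the transition probabilities and delay values are arbitrary placeholders rather than figures from the cited studies.

import numpy as np

# Hypothetical two-state Markov chain: state 0 = uncongested, state 1 = congested.
P = np.array([[0.95, 0.05],    # transition probabilities from state 0
              [0.30, 0.70]])   # transition probabilities from state 1
delay_ms = {0: 5.0, 1: 60.0}   # representative delay for each state (illustrative)

def sample_delays(n_steps, seed=0):
    """Sample a sequence of network-induced delays from the Markov chain."""
    rng = np.random.default_rng(seed)
    state, delays = 0, []
    for _ in range(n_steps):
        delays.append(delay_ms[state])
        state = int(rng.choice(2, p=P[state]))
    return delays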
The physical and computational components are intertwined through communication networks in cyber-physical systems (CPSs). When control design for CPSs is considered, they can also be regarded as a type of nonlinear networked control system (NCS), in which the plant is controlled through the communication networks [95–98]. For NCSs, one main concern is security, since the communication network may suffer from different types of malicious attacks, such as deception attacks, replay attacks, and denial-of-service (DoS) attacks [99–101]. Unlike deception attacks, which attempt to alter data from the sensors and controllers, DoS attacks attempt to jam the communication networks of the control systems and degrade control performance [99]. Of these, DoS attacks are of particular interest for the networked control design considered here. To improve communication efficiency and save communication resources, the event-triggered mechanism is a promising approach for NCSs [102–106]. Under an event-triggered mechanism, instead of sending all the sampled data, only the samples considered strictly necessary are sent from the plant end to the controller end. By adopting the event-triggered mechanism in NCSs, control performance can be maintained as desired with lower consumption of communication bandwidth. Despite these merits, the event-triggered mechanism makes stability analysis more complex, and feasible stability conditions are more difficult to obtain.
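A minimal sketch of a relative-threshold event-triggering rule of the kind used in this literature is given below; the threshold value is an illustrative assumption, whereas in practice it would be derived from the stability analysis.

import numpy as np

class EventTrigger:
    """Transmit a new sample only when the state has drifted sufficiently
    from the last transmitted value (relative-threshold rule)."""

    def __init__(self, sigma=0.1):
        self.sigma = sigma        # triggering threshold (illustrative value)
        self.last_sent = None

    def should_send(self, x):
        x = np.asarray(x, dtype=float)
        if self.last_sent is None:
            self.last_sent = x
            return True
        # Trigger when the deviation exceeds a fraction of the current norm
        if np.linalg.norm(x - self.last_sent) > self.sigma * np.linalg.norm(x):
            self.last_sent = x
            return True
        return False              # otherwise the sample is not transmitted

Samples that fail the test are simply not sent, which is where the bandwidth saving comes from.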
Fault tolerant control (FTC) schemes are those that include techniques for reducing downtime and identifying developing faults and safety risks, by predicting and recognising faults early and mitigating their effects before they develop into more serious problems. Within electrohydraulic systems, there are many faults that may develop which can impact tracking performance and system safety. These include leakages, changes in supply pressure, and sensor faults [107]. FTC can be broadly categorised into passive and active approaches. Passive FTC can often be implemented as an extension of robust control, by which a fixed controller is chosen that can satisfy requirements for all expected operating conditions. While this provides redundancy and improves system reliability, it is often at the cost of system performance, as both normal operation and faults are considered together. Conversely, active FTC employs fault detection and diagnosis (FDD). Once a fault is detected, the active FTC scheme will first generate an admissible control before gradually improving system performance once system safety is guaranteed [108]. An example of FTC with FDD for robot manipulators with unknown loss of torque at a joint is detailed in [109]. Experimental tests showed that the scheme was able to reduce tracking errors related to both bias and additive time-varying faults.
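The fault detection and diagnosis step behind active FTC can be illustrated with a simple residual test: compare the measured output with the prediction of a nominal model and flag a fault when the residual exceeds a threshold. The threshold below is a placeholder; practical schemes use observers or online adaptive identification, as in [107].

import numpy as np

def detect_fault(y_measured, y_predicted, threshold=0.05):
    """Flag a fault when the output residual exceeds a fixed threshold.

    y_measured  : sensor readings (e.g. pressures or joint torques)
    y_predicted : outputs expected from the nominal system model
    threshold   : illustrative detection threshold
    Returns a boolean fault flag per channel.
    """
    residual = np.abs(np.asarray(y_measured) - np.asarray(y_predicted))
    return residual > threshold

In an active FTC scheme, a raised flag would switch the system to an admissible fallback controller before performance is gradually recovered.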
Visual servoing employs data obtained via a vision sensor in order to provide feed-
back and control the motion of a robot. Classical approaches include image-based
(IBVS) and pose-based (PBVS) visual servo control. From this, a global control
scheme can be created, such as a velocity controller. PBVS tends to be computation-
ally expensive while IBVS creates problems for control as it omits the pose estimation
step used with PBVS and so requires image features to be solved from a nonlinear
function of the camera pose [110]. For reliable visual servoing, it is important to
have good object tracking ability, in order to be able to specify the desired end posi-
tion. Tracking can be achieved through different methods including using fiducial
markers, 2-D contour tracking and pose estimation. The two basic configurations
of the end effector and camera are eye-in-hand, whereby the camera is attached to
the end effector, and eye-to-hand, where the camera is fixed externally in the world.
Visual servoing can be improved through good path planning, which accounts for constraints and uncertainties within a system. Path planning approaches for visual servoing can be categorised into four groups: image space, optimisation-based, potential field-based, and global [111].
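The classical IBVS velocity controller mentioned above can be sketched in a few lines: the camera velocity is computed from the image-feature error through the pseudo-inverse of the interaction (image Jacobian) matrix. The interaction matrix and gain value are assumed to be supplied; this is an illustration of the standard law rather than a complete servoing pipeline.

import numpy as np

def ibvs_velocity(s, s_des, L, lam=0.5):
    """Classical IBVS law: v = -lambda * pinv(L) @ (s - s_des).

    s     : current image-feature vector (e.g. stacked point coordinates)
    s_des : desired image-feature vector
    L     : interaction (image Jacobian) matrix relating feature rates
            to the 6-DOF camera velocity
    lam   : convergence gain (illustrative value)
    Returns the commanded camera twist [vx, vy, vz, wx, wy, wz].
    """
    e = np.asarray(s, dtype=float) - np.asarray(s_des, dtype=float)
    return -lam * np.linalg.pinv(L) @ e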

5.4 Motion Planning Algorithms

Motion planning is the process of determining the sequence of actions and motions required to achieve a goal such as moving from point A to B; in the case of a manipulator, this will involve moving or rearranging objects in an environment. At present, teleoperation is still widely used in the nuclear industry, which can be slow and tedious for operators due to poor situational awareness and difficulty in using controllers. Improved motion planning capability has the possibility of speeding up tasks such as packing waste into safe storage containers [112].
The goal of a planner is to find a solution to a planning problem while satisfying any constraints, such as the kinematic and dynamic constraints of the links, along with constraints that arise from the environment, such as obstacles. This must be done in the presence of uncertainties arising from modelling, actuation, and sensing. Motion planning has traditionally been split into macro planning for large-scale movements and fine planning for high precision. Typically, as the level of constraints increases, such as during fine motor movements, feedback must be obtained at a higher rate and actions are more computationally expensive [111].
One of the most basic forms of motion planning is by artificial potential fields, which operate by incrementally exploring free space until a solution is found. This can be treated as an optimisation problem, using gradient descent to find the optimal solution. Other methods include graph-based path planning, which evaluates different path trees using a graph representation to get from the start to the goal state. Examples of these are A*, Dijkstra, Breadth First Search (BFS), and Depth First Search (DFS). Alternatively, sampling-based path planning can be used, which randomly adds points to a tree until a solution is found. Examples include RRT and PRM [113].
Graphs are often used in motion planners, consisting of nodes, which typically represent states, and edges, which represent the ability to move between two nodes. Graphs can be directed or undirected, depending on whether edges are bidirectional. Weightings can be given to edges as a cost associated with traversing them. A tree refers to a graph with one root node, several leaf nodes, no cycles, and at most one parent per node. Graphs can also be represented as matrices. Once a graph is built, it can then be searched. A* is a popular best-first search algorithm which finds the minimum-cost path through a graph. It is an efficient search technique which has been applied to general manipulators and those with revolute joints. In order to allow a search such as A*, the configuration space must be discretised, which is most easily done with a grid. Using a grid requires consideration of the appropriate costs, as well as of how many directions a path planner can travel in. Multi-resolution grids can be used which repeatedly subdivide cells that are in contact with an obstacle, and therefore reduce the computational complexity that is one of the main drawbacks of using grid methods [114].
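A compact sketch of A* over a 4-connected occupancy grid, as described above, is shown below; the grid encoding (1 marks an obstacle), unit step costs and Manhattan-distance heuristic are illustrative choices.

import heapq, itertools

def astar(grid, start, goal):
    """A* over a 4-connected grid; grid[r][c] == 1 marks an obstacle.
    Returns a minimum-cost path as a list of cells, or None if unreachable."""
    def h(cell):                       # admissible Manhattan-distance heuristic
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    tie = itertools.count()            # tie-breaker so the heap never compares cells
    open_set = [(h(start), next(tie), 0, start, None)]
    came_from, closed = {}, set()
    while open_set:
        _, _, g, cell, parent = heapq.heappop(open_set)
        if cell in closed:
            continue
        closed.add(cell)
        came_from[cell] = parent
        if cell == goal:               # walk parent pointers to recover the path
            path = [cell]
            while came_from[path[-1]] is not None:
                path.append(came_from[path[-1]])
            return path[::-1]
        r, c = cell
        for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nb[0] < len(grid) and 0 <= nb[1] < len(grid[0])
                    and grid[nb[0]][nb[1]] == 0 and nb not in closed):
                heapq.heappush(open_set, (g + 1 + h(nb), next(tie), g + 1, nb, cell))
    return None

For example, astar([[0, 0], [1, 0]], (0, 0), (1, 1)) returns [(0, 0), (0, 1), (1, 1)].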
Sampling methods sacrifice resolution-optimality but can find satisficing solutions quickly. These methods are split into two main groups: RRT for single-query and PRM for multiple-query planning. RRT is a sampling-based data structure and search scheme that can quickly explore high-dimensional spaces that have both algebraic and differential constraints. It does this by biasing exploration in the state space to “pull” towards
unexplored areas [115]. PRM uses randomly generated free configurations of the
robot which are then connected and stored as a graph. This learning phase is then
followed by the query phase where a graph search is used to connect two nodes
within the roadmap from the start and goal configurations. Segments of the path are
then concatenated to find a full path for the robot. Difficulties found when querying
can then be used to improve the roadmap [116].
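The “pull towards unexplored areas” behaviour of RRT can be sketched for a planar robot: sample a random point, find the nearest tree node, and extend a fixed step towards the sample if the new point is collision-free. The workspace bounds, step size and user-supplied collision predicate are assumptions of this illustration.

import math, random

def rrt(start, goal, is_free, step=0.5, max_iters=5000, goal_tol=0.5, seed=0):
    """Single-query RRT in the plane.

    start, goal : (x, y) tuples
    is_free     : predicate returning True if a point is collision-free
    Returns a path from start to goal as a list of points, or None.
    """
    rng = random.Random(seed)
    nodes, parent = [start], {start: None}
    for _ in range(max_iters):
        sample = (rng.uniform(-10, 10), rng.uniform(-10, 10))  # illustrative workspace bounds
        nearest = min(nodes, key=lambda n: math.dist(n, sample))
        d = math.dist(nearest, sample)
        if d == 0.0:
            continue
        new = (nearest[0] + step * (sample[0] - nearest[0]) / d,
               nearest[1] + step * (sample[1] - nearest[1]) / d)
        if not is_free(new):
            continue
        nodes.append(new)
        parent[new] = nearest
        if math.dist(new, goal) < goal_tol:   # close enough: reconstruct the branch
            path, p = [goal], new
            while p is not None:
                path.append(p)
                p = parent[p]
            return path[::-1]
    return None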
Research into robotic path planning is an active area; recent developments have utilised machine learning and deep neural networks to solve complex multi-objective planning problems [117]. The increasing number of planning algorithms is necessitating a greater capability for benchmarking and comparing algorithms on the basis of performance indicators such as computational efficiency, success rates, and optimality of the generated paths. Recently, the PathBench framework has been created, which allows comparison of traditional sampling- and graph-based algorithms and newer ML-based algorithms such as value iteration networks and long short-term memory (LSTM) networks [118]. This is achieved by using a simulator to test each algorithm, a generator and trainer component for the ML models, and an analyser component to generate statistical data on each trial. Research on the PathBench platform has shown that, at present, ML algorithms have longer path planning times in comparison to classical planning approaches [117].
Another key component of planning manipulation tasks is the ability to properly
grasp an object. Research by Levine et al. [119] used a deep convolutional neural
network to predict the chance of a successful grasp, and a continuous servoing mech-
anism to update the motor commands. The CNN was trained with data from over
80,000 grasp attempts. These were obtained with the same robot model; however,
each robot is not identical and so the differences provided a diverse dataset for the
neural network to learn from. The proposed grasping methods can find non-obvious
grasping strategies and have the possibility to be extended to a wider range of grasping
strategies as the dataset increases.
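For illustration only, a toy PyTorch network of the kind described above, mapping an image patch and a candidate motor command to a grasp-success probability; the layer sizes and the 7-dimensional command vector are arbitrary choices, not the architecture used in [119].

import torch
import torch.nn as nn

class GraspSuccessNet(nn.Module):
    """Toy grasp-success predictor: image features and a candidate motor
    command are fused and mapped to a probability of a successful grasp."""

    def __init__(self, command_dim=7):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Sequential(
            nn.Linear(32 + command_dim, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Sigmoid(),
        )

    def forward(self, image, command):
        # image: (B, 3, H, W) patch; command: (B, command_dim) candidate action
        z = self.features(image)
        return self.head(torch.cat([z, command], dim=1))

# Usage: p = GraspSuccessNet()(torch.rand(1, 3, 64, 64), torch.rand(1, 7))

In a servoing loop, many candidate commands would be scored this way and the command with the highest predicted success probability executed.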

5.5 Vision and Perception

Sensors are a key component of robotics, needed for measuring physical properties such as position and velocity. They can be classified as either proprioceptive or exteroceptive, depending on whether they take measurements of the robot itself or of the environment. They can also be categorised according to energy output, being either active or passive. Commonly used sensors include LiDAR (Light Detection and Ranging), SONAR (Sound Navigation and Ranging), RADAR (Radio Detection and Ranging), RGB cameras, and RGB-D cameras.
Processing sensor data can be a difficult task in robotics as measurements are
often noisy, can be intermittent, and must sometimes be taken indirectly. Many
sensors provide a large amount of data which requires processing to extract useful
components such as for obstacle detection and object recognition. One of the most
commonly used forms of sensing is by visual data from a camera. Light has many
properties that can be measured such as intensity and wavelength and can interact by
different means such as absorption and reflection. Using an image, the size, shape,
and/or position of an object can be determined. Digital cameras use a light sensor to
convert a projection of the 3D world into a 2D image, a technique known as perspec-
tive projection. As images are collected using a lens, it is important to account for
how the image is formed as it passes through the lens such as using the thin lens
equation to relate the distances between the object and image [110].
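The perspective-projection and thin-lens relations mentioned above can be written down directly; the focal length and principal point below are placeholder intrinsics.

import numpy as np

f_px = 800.0                  # focal length expressed in pixels (illustrative)
cx, cy = 320.0, 240.0         # principal point (illustrative)

def project(point_cam):
    """Perspective (pinhole) projection of a 3-D point (X, Y, Z) expressed in
    the camera frame: u = f*X/Z + cx, v = f*Y/Z + cy."""
    X, Y, Z = point_cam
    return np.array([f_px * X / Z + cx, f_px * Y / Z + cy])

def thin_lens_image_distance(f, z_object):
    """Thin-lens equation 1/f = 1/z_o + 1/z_i, solved for the image distance.
    An object at the focal plane (z_object == f) maps to infinity."""
    return 1.0 / (1.0 / f - 1.0 / z_object)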
Recently, deep neural networks have been used for a range of computer vision
tasks such as object recognition and classification, depth map inference, and pose
and motion estimation. This can be extended to visual servoing, using direct visual
servoing (DVS), which does not require feature extraction or tracking. Using a deep neural network, the convergence domain can be increased to create a CNN-based VS, as in [120].
Object Recognition and Detection. Object recognition has many applications such
as position measurement, inspection, sorting, counting, and detection. Requirements
for an object recognition task vary depending on the application and may include
evaluation time, accuracy, recognition reliability, and invariance. Invariance can be
with respect to illumination, scale, rotation, background clutter, partial occlusion, and
viewpoint change [121]. In unstructured environments such as with nuclear decom-
missioning, it is likely that all these aspects will in some way affect the object recog-
nition algorithm. A neural network can be used for image classification, outputting a
probability for a given object. Images can be converted to a standard array of pixels
which are then used as input to the neural network [74].
Deformable part models (DPM) with discriminatively trained classifiers have been shown to be efficient and accurate on difficult datasets [122]. This can also be
implemented using C++ with OpenCV which combines the DPM with a cascade
algorithm to speed up detection. Research in [123] looked to exploit visual context,
to aid object recognition, for example by identifying the room an object is in, and then
using that information to help narrow down possible objects. This allows recognition
with less local stimulus.
Advances in deep learning algorithms have opened up new avenues of research in
object detection. Some commonly used detectors include R-CNN, YOLO and SSD,
which generally operate by localising objects in terms of bounding boxes [124]. Research in [125] looks to address the issue within the nuclear industry of detecting and categorising waste objects using an RGB-D camera. Common objects in decommissioning include PPE, tools, and pipes. These need to be detected, categorised, sorted, and segregated according to their radioactivity level. Presently, DCNN methods are a good solution for object detection and recognition; however, they rely on large amounts of training data, which may be unavailable for new applications. Research at the University of Birmingham [125] looked to address this by using weakly-supervised deep learning for detection and recognition of common nuclear waste objects. Using minimally annotated data for initial training, the network was able to handle sparse examples and could be implemented in a real-time recognition pipeline for
detecting and categorising unknown waste objects. Researchers at the University of
Birmingham have also been able to expand 3D geometric reconstruction to allow
semantic mapping and provide understanding of features within scene contexts. This
was achieved using a Pixel-Voxel network to process RGB image and point cloud
data [126].
Pose Estimation. 6D object detection is the combination of object detection with
6D pose estimation. Within manufacturing there is demand for algorithms that can
perform 6D object detection for tasks such as grasping and quality control. Task situations in manufacturing have the benefits of known CAD models, good cameras, and controlled environments in terms of lighting. In the nuclear industry this is often not the case, and algorithms are required that can cope with the lack of these factors along with other difficulties such as occlusions, lack of texture, unknown instances and colours, and difficult surface properties. A common approach to training 6D object detection algorithms is via model-based training. This uses CAD models to generate augmented images; for geometric manipulations, producing training images is straightforward, while generating images with variations in surface properties or projections can be more difficult. It is still, however, more cost-effective and time-efficient to use model-based training when possible rather than creating real images, which can be particularly problematic in varying environmental conditions [127].

5.6 Digital Twins in Nuclear Environment

The creation of digital twins has been identified by the UK’s National Nuclear Labo-
ratory (NNL) as an area that could be adapted for use in a nuclear environment. Digital
twins have the possibility to be utilised throughout the lifespan of a nuclear facility.
However, while the technology may be available, implementing digital twins could
be more difficult due to stringent requirements for safety and security demanded by
regulatory bodies. Despite this, the nuclear industry is in a good position to make better use of digital twin technology, having already built a strong system of documentation based on destructive and non-destructive testing and analysis of components and infrastructure [128]. Some progress in the development of digital twin solutions for nuclear has been made by the Consortium for Advanced Simulation of Light Water Reactors [129]. In [130], digital twins are identified as a possible technology to help the UK develop the next generation of nuclear plants. This can be achieved through benefits including increased efficiency and improved safety analysis. This would build on the Integrated Nuclear Digital Environment (INDE) as proposed in [128].
Digital twins offer the opportunity to visualise and simulate work tasks in a virtual environment using up-to-date data to plan work and improve efficiency and safety. Task simulation may not require a digital twin, and a digital twin may not necessarily be used for task simulation. The authors in [131] used Choreonoid to simulate tasks to be performed by remotely controlled robots. To achieve this, they developed and used several plug-ins for the software to emulate behaviours such as underwater and aerial operation, camera-view modifications and disturbances, a gamma camera, and communication failure effects. In another piece of research, a digital twin in a virtual environment is used to analyse scenarios involving a remote teleoperation system. A benefit of using the digital twin is the opportunity to test configuration changes, including in the development of a convolutional neural network [132].
In [133] the authors developed a digital environment within Gazebo to allow the
simulation of ionising radiation to study the effects of interactions with radioactive
sources and how radiation detectors can be better developed. While this allowed
some research into the optimisation of robotic activities in radioactive environments,
due to the heavy computational burden of modelling complex radiation sources,
some simplifications had to be made such as point sources and assumption of a
constant radioactivity. Simulation is also often used in the development of control systems, to aid with system design and operator training. A real-time simulator was developed in [134] that was verified using open-loop control experiments and was subsequently applied to investigate the performance of trajectory tracking and pipe-cutting tasks.

6 Conclusions

Future work will be focused on developing a virtual environment to allow sharing of data between robots. This can be combined with improved methods to facilitate human-in-the-loop control, where data processing can be embedded within the system to provide insights and predictions to an operator, allowing for more efficient completion of tasks. In addition, fault-tolerant control schemes will be researched to allow for more robust systems that are able to handle the demands of a nuclear environment. Ongoing research will require collaboration with both research groups and industry to ensure results are feasible in relation to regulatory requirements.
This chapter has presented the background on decontamination and decommissioning tasks, along with a review of current methods in industry to perform D&D operations. A key goal for the industry is to develop more autonomous systems which can reduce the need for workers to enter dangerous radioactive environments or spend excessive time operating equipment. Such advances have applications and benefits in related industries and other hazardous environments. Building upon more advanced autonomous systems, the concept of cyber-physical systems is introduced, and some of the progress made in utilising such systems in manufacturing as part of the Industry 4.0 concept is detailed. Finally, an overview of the enabling technologies, along with a concept framework for a nuclear decommissioning CPS, is developed, with attention to how developments in Industry 4.0 can be transferred for application in nuclear decommissioning activities.

References

1. NAO. (2022). The decommissioning of the AGR nuclear power stations. https://www.nao.
org.uk/report/the-decommissioning-of-the-agr-nuclear-power-stations/.
2. Nuclear Decommissioning Authority. (2022). Nuclear Decommissioning Authority Annual
Report and Account 2021/22. http://www.nda.gov.uk/documents/upload/Annual-Report-and-
Accounts-2010-2011.pdf.
3. NEA. (2014). R&D and Innovation Needs for Decommissioning Nuclear Facili-
ties. https://www.oecd-nea.org/jcms/pl_14898/r-d-and-innovation-needs-for-decommission
ing-nuclear-facilities.
4. Industry Radiological Protection Co-ordination Group. (2012). The application of ALARP
to radiological risk, (IRPCG) Group.
5. Marturi, N., et al. (2017). Towards advanced robotic manipulations for nuclear decommis-
sioning. In Robots operating in hazardous environments. https://doi.org/10.5772/intechopen.
69739.
6. Watson, S., Lennox, B., & Jones, J. (2020). Robots and autonomous systems for nuclear
environments.
7. Sellafield Ltd. (2021). Future research and development requirements 2021 (pp. 1–32).
8. NDA. (2019). Integrated waste management radioactive waste strategy. https://www.gov.uk/
government/consultations/nda-radioactive-waste-management-strategy.
9. Bogue, R. (2015). Robots in the nuclear industry: a review of technologies and applications.
10. Montazeri, A., & Ekotuyo, J. (2016). Development of dynamic model of a 7DOF hydraulically
actuated tele-operated robot for decommissioning applications. In Proceedings of American
Control Conference (Vol. 2016-July, pp. 1209–1214). https://doi.org/10.1109/ACC.2016.752
5082. (Jul 2016).
11. Montazeri, A., West, C., Monk, S. D., & Taylor, C. J. (2017). Dynamic modelling and param-
eter estimation of a hydraulic robot manipulator using a multi-objective genetic algorithm.
International Journal of Control, 90(4), 661–683. https://doi.org/10.1080/00207179.2016.
1230231.
12. West, C., Montazeri, A., Monk, S. D., & Taylor, C. J. (2016). A genetic algorithm approach
for parameter optimization of a 7DOF robotic manipulator. IFAC-PapersOnLine, 49(12),
1261–1266. https://doi.org/10.1016/j.ifacol.2016.07.688.
13. West, C., Montazeri, A., Monk, S. D., Duda, D. & Taylor, C. J. (2017). A new approach to
improve the parameter estimation accuracy in robotic manipulators using a multi-objective
output error identification technique. In RO-MAN 2017-26th IEEE International Symposium
on Robot and Human Interactive Communication, Dec. 2017 (Vol. 2017-Jan, pp. 1406–1411).
https://doi.org/10.1109/ROMAN.2017.8172488.
14. Burrell, T., Montazeri, A., Monk, S., & Taylor, C. J. J. (2016). Feedback control—based
inverse kinematics solvers for a nuclear decommissioning robot. IFAC-PapersOnLine, 49(21),
177–184. https://doi.org/10.1016/j.ifacol.2016.10.541.
15. Oveisi, A., Anderson, A., Nestorović, T., Montazeri, A. (2018). Optimal input excitation
design for nonparametric uncertainty quantification of multi-input multi-output systems (Vol.
51, no. 15, pp. 114–119). https://doi.org/10.1016/j.ifacol.2018.09.100.
16. Oveisi, A., Nestorović, T., & Montazeri, A. (2018). Frequency domain subspace identification
of multivariable dynamical systems for robust control design, vol. 51, no. 15, pp. 990–995.
https://doi.org/10.1016/j.ifacol.2018.09.065.
17. West, C., Monk, S. D., Montazeri, A., & Taylor, C. J. (2018) A vision-based positioning
system with inverse dead-zone control for dual-hydraulic manipulators. In 2018 UKACC
12th International Conference on Control, CONTROL 2018 (pp. 379–384). https://doi.org/
10.1109/CONTROL.2018.8516734. (Oct, 2018).
18. West, C., Wilson, E. D., Clairon, Q., Monk, S., Montazeri, A., & Taylor, C. J. (2018).
State-dependent parameter model identification for inverse dead-zone control of a hydraulic
manipulator∗ . IFAC-PapersOnLine, 51(15), 126–131. https://doi.org/10.1016/j.ifacol.2018.
09.102.
19. Burrell, T., West, C., Monk, S. D., Montezeri, A., & Taylor, C. J. (2018). Towards a cooperative
robotic system for autonomous pipe cutting in nuclear decommissioning. In 2018 UKACC
12th International Conference on Control, CONTROL 2018 (pp. 283–288). https://doi.org/
10.1109/CONTROL.2018.8516841. (Oct 2018).
20. Nemati, H., & Montazeri, A. (2018). Analysis and design of a multi-channel time-varying
sliding mode controller and its application in unmanned aerial vehicles. IFAC-PapersOnLine,
51(22), 244–249. https://doi.org/10.1016/j.ifacol.2018.11.549.
21. Nemati, H., & Montazeri, A. (2018). Design and development of a novel controller for robust
attitude stabilisation of an unmanned air vehicle for nuclear environments. In 2018 UKACC
12th International Conference on Control (CONTROL) (pp. 373–378). https://doi.org/10.
1109/CONTROL.2018.8516729.
22. Nemati, H., Montazeri, A. (2019). Output feedback sliding mode control of quadcopter using
IMU navigation. In Proceedings-2019 IEEE International Conference on Mechatronics, ICM
2019 (pp. 634–639). https://doi.org/10.1109/ICMECH.2019.8722899. (May 2019).
23. Nokhodberiz, N. S., Nemati, H., & Montazeri, A. (2019). Event-triggered based state esti-
mation for autonomous operation of an aerial robotic vehicle. IFAC-PapersOnLine, 52(13),
2348–2353. https://doi.org/10.1016/j.ifacol.2019.11.557.
24. Lamb, F. (2013). Industrial automation hands-on.
25. Weyer, S., Schmitt, M., Ohmer, M., & Gorecky, D. (2015). Towards industry 4.0-
Standardization as the crucial challenge for highly modular, multi-vendor production systems.
IFAC-PapersOnLine, 28(3), 579–584. https://doi.org/10.1016/j.ifacol.2015.06.143.
26. IAEA. (2004). The nuclear power industry’s ageing workforce : transfer of knowledge to the
next generation (p. 101). (no. June).
27. Department for Business Energy and Industrial Strategy UK. 2022 Civil Nuclear Cyber
Security Strategy. https://assets.publishing.service.gov.uk/government/uploads/system/
uploads/attachment_data/file/1075002/civil-nuclear-cyber-security-strategy-2022.pdf. (no.
May, 2022).
28. Emptage, M., Loudon, D., Mcleod, R., Milburn, H., & Row, N. (2016). Characterisation:
Challenges and opportunities–A UK perspective (pp. 1–10).
29. Euratom (2022) Cyber physicaL Equipment for unmAnned Nuclear DEcommissioning
Measurements. Horizon 2020. Retrieved September 08, 2022, from https://cordis.europa.
eu/project/id/945335.
30. OECD/NEA. (1999). Decontamination techniques used in decommissioning activities. In
Nuclear Energy Agency (p. 51).
31. Aitken, J. M., et al. (2018). Autonomous nuclear waste management. IEEE Intelligent Systems,
33(6), 47–55. https://doi.org/10.1109/MIS.2018.111144814.
32. Euratom (2020) PREDIS. Horizon 2020. https://doi.org/10.3030/945098.
33. Smith, R., Cucco, E., & Fairbairn, C. (2020). Robotic development for the nuclear envi-
ronment: Challenges and strategy. Robotics, 9(4), 1–16. https://doi.org/10.3390/robotics9
040094.
34. Vitanov, I., et al. (2021). A suite of robotic solutions for nuclear waste decommissioning.
Robotics, 10(4), 1–20. https://doi.org/10.3390/robotics10040112.
35. Monk, S. D., Grievson, A., Bandala, M., West, C., Montazeri, A., & Taylor, C. J. (2021).
Implementation and evaluation of a semi-autonomous hydraulic dual manipulator for cutting
pipework in radiologically active environments. Robotics, 10(2). https://doi.org/10.3390/rob
otics10020062.
36. Adjigble, M., Marturi, N., Ortenzi, V., Rajasekaran, V., Corke, P., & Stolkin, R. (2018).
Model-free and learning-free grasping by Local Contact Moment matching. In IEEE Inter-
national Conference on Intelligent Robots and Systems (pp. 2933–2940). https://doi.org/10.
1109/IROS.2018.8594226.
37. Tokatli, O., et al. (2021). Robot-assisted glovebox teleoperation for nuclear industry. Robotics,
10(3). https://doi.org/10.3390/robotics10030085.
38. Jang, I., Carrasco, J., Weightman, A., & Lennox, B. (2019). Intuitive bare-hand teleoperation
of a robotic manipulator using virtual reality and leap motion. In TAROS 2019 (pp. 283–294).
London: Springer.
39. Sayed, M. E., Roberts, J. O., & Donaldson, K. (2022). Modular robots for enabling operations
in unstructured extreme environments. Advanced Intelligent Systems. https://doi.org/10.1002/
aisy.202000227.
40. Cerba, Š, Lüley, J., Vrban, B., Osuský, F., & Nečas, V. (2020). Unmanned radiation-monitoring
system. IEEE Transactions on Nuclear Science, 67(4), 636–643. https://doi.org/10.1109/TNS.
2020.2970782.
41. Tsitsimpelis, I., Taylor, C. J., Lennox, B., & Joyce, M. J. (2019). A review of ground-based
robotic systems for the characterization of nuclear environments. Progress in Nuclear Energy,
111, 109–124. https://doi.org/10.1016/j.pnucene.2018.10.023. (no. Oct, 2018).
42. Groves, K., Hernandez, E., West, A., Wright, T., & Lennox, B. (2021). Robotic exploration of
an unknown nuclear environment using radiation informed autonomous navigation. Robotics,
10(2), 1–15. https://doi.org/10.3390/robotics10020078.
43. Groves, K., West, A., Gornicki, K., Watson, S., Carrasco, J., & Lennox, B. (2019). MallARD:
An autonomous aquatic surface vehicle for inspection and monitoring of wet nuclear storage
facilities. Robotics, 8(2). https://doi.org/10.3390/ROBOTICS8020047.
44. Parasuraman, R., Sheridan, T. B., & Wickens, C. D. (2000). A model for types and levels of
human interaction with automation. IEEE Transactions on Systems, Man, and Cybernetics-
Part A: Systems and Humans, 30(3), 286–297. https://doi.org/10.1109/3468.844354.
45. Gamer, T., Hoernicke, M., Kloepper, B., Bauer, R., & Isaksson, A. J. (2020). The autonomous
industrial plant–future of process engineering, operations and maintenance. Journal of Process
Control, 88, 101–110. https://doi.org/10.1016/j.jprocont.2020.01.012.
46. Luckcuck, M., Fisher, M., Dennis, L., Frost, S., White, A., & Styles, D. (2021). Princi-
ples for the development and assurance of autonomous systems for safe use in hazardous
environments. https://doi.org/10.5281/zenodo.5012322.
47. Blum, C., Winfield, A. F. T., & Hafner, V. V. (2018). Simulation-based internal models for
safer robots. Frontiers in Robotics and AI, 4. https://doi.org/10.3389/frobt.2017.00074. (no.
Jan, 2018).
48. Lee, E. A. (2008). Cyber physical systems: Design challenges. In Proceedings-11th IEEE
Symposium Object/Component/Service-Oriented Real-Time Distributed Computing ISORC
2008, (pp. 363–369). https://doi.org/10.1109/ISORC.2008.25.
49. NIST. (2017). Framework for Cyber-Physical Systems: Volume 1, Overview NIST Special
Publication 1500–201 Framework for Cyber-Physical Systems: Volume 1, Overview. https://
nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.1500-201.pdf.
50. Wang, L., Törngren, M., & Onori, M. (2015). Current status and advancement of cyber-
physical systems in manufacturing. Journal of Manufacturing Systems, 37, 517–527. (no.
Oct, 2020). https://doi.org/10.1016/j.jmsy.2015.04.008.
51. Lee, J., Bagheri, B., & Kao, H. A. (2015). A cyber-physical systems architecture for Industry
4.0-based manufacturing systems. Manufacturing Letters, 3, 18–23. https://doi.org/10.1016/
j.mfglet.2014.12.001.
52. Pivoto, D. G. S., de Almeida, L. F. F., da Rosa Righi, R., Rodrigues, J. J. P. C., Lugli, A. B., &
Alberti, A. M. (2021). Cyber-physical systems architectures for industrial internet of things
applications in Industry 4.0: A literature review. Journal of Manufacturing Systems, 58(no.


PA), 176–192. https://doi.org/10.1016/j.jmsy.2020.11.017.
53. Sisinni, E., Saifullah, A., Han, S., Jennehag, U., & Gidlund, M. (2018). Industrial internet
of things: Challenges, opportunities, and directions. IEEE Transactions on Industrial
Informatics, 14(11), 4724–4734. https://doi.org/10.1109/TII.2018.2852491.
54. Aceto, G., Persico, V., Pescapé, A., & Member, S. (2019). A Survey on information and
communication technologies for industry 4.0: State-of-the-art, taxonomies, perspectives, and
challenges. IEEE Communications Surveys and Tutorials, 21(4), 3467–3501.
55. Luo, R. C., & Kuo, C. W. (2016). Intelligent seven-DoF robot with dynamic obstacle avoidance
and 3-D object recognition for industrial cyber-physical systems in manufacturing automation.
Proceedings of the IEEE, 104(5), 1102–1113. https://doi.org/10.1109/JPROC.2015.2508598.
56. Yaacoub, J. P. A., Salman, O., Noura, H. N., Kaaniche, N., Chehab, A., & Malli, M. (2020).
Cyber-physical systems security: Limitations, issues and future trends. Microprocessors and
Microsystems, 77. https://doi.org/10.1016/j.micpro.2020.103201.
57. Wollschalger, M., Sauter, T., & Jasperneite, J. (2017). The Future of Industrial Communica-
tion. IEEE Industrial Electronics Magazine, pp. 17–27. (no. March).
58. Krishnamurthi, R., Kumar, A., Gopinathan, D., Nayyar, A., & Qureshi, B. (2020). An overview
of IoT sensor data processing, fusion, and analysis techniques. Sensors, 20(21), 1–23. https://
doi.org/10.3390/s20216076.
59. Simoens, P., Dragone, M., & Saffiotti, A. (2018). The internet of robotic things: A review of
concept, added value and applications. International Journal of Advanced Robotic Systems,
15(1), 1–11. https://doi.org/10.1177/1729881418759424.
60. Mukherjee, M., Shu, L., & Wang, D. (2018). Survey of fog computing: Fundamental, network
applications, and research challenges. IEEE Communications Surveys and Tutorials, 20(3),
1826–1857. https://doi.org/10.1109/COMST.2018.2814571.
61. Qiu, T., Chi, J., Zhou, X., Ning, Z., Atiquzzaman, M., & Wu, D. O. (2020). Edge computing
in industrial internet of things: Architecture, advances and challenges. IEEE Communications
Surveys and Tutorials, 22(4), 2462–2488. https://doi.org/10.1109/COMST.2020.3009103.
62. Kehoe, B., Patil, S., Abbeel, P., & Goldberg, K. (2015). A survey of research on cloud robotics
and automation. IEEE Transactions on Automation Science and Engineering, 12(2), 398–409.
https://doi.org/10.1109/TASE.2014.2376492.
63. Chaari, I., Koubaa, A., Qureshi, B., Youssef, H., Severino, R., & Tovar, E. (2018). On the
robot path planning using cloud computing for large grid maps. In 18th IEEE International
Conference on Autonomous Robot Systems and Competitions. ICARSC 2018, (pp. 225–230).
https://doi.org/10.1109/ICARSC.2018.8374187.
64. Chen, M., Mao, S., & Liu, Y. (2014). Big data: A survey. Mobile Network Application, 19(2),
171–209. https://doi.org/10.1007/s11036-013-0489-0.
65. Tao, F., Qi, Q., Wang, L., & Nee, A. Y. C. (2019). Digital twins and cyber-physical systems
toward smart manufacturing and industry 4.0: Correlation and comparison. Engineering, 5(4),
653–661. https://doi.org/10.1016/j.eng.2019.01.014.
66. Upadhyay, H., Lagos, L., Joshi, S., & Abrahao, A. (2018) Big data framework with machine
learning for D&D applications.
67. Glaessgen, E. H., & Stargel, D. S. (2012). The digital twin paradigm for future NASA and
U.S. Air force vehicles. In 53rd Structures, Structural Dynamics, and Materials Conference:
Special Session on the Digital Twin (pp. 1–14).
68. Minerva, R., Lee, G. M., & Crespi, N. (2020). Digital twin in the IoT context: A survey on
technical features, scenarios, and architectural models. Proceedings of the IEEE, 108(10),
1785–1824. https://doi.org/10.1109/JPROC.2020.2998530.
69. Fuller, A., Fan, Z., Day, C., & Barlow, C. (2020). Digital twin: Enabling technologies, chal-
lenges and open research. IEEE Access, 8, 108952–108971. https://doi.org/10.1109/ACCESS.
2020.2998358.
70. Mathworks (2021) Digital twins for predicitive maintenance. https://explore.mathworks.com/
digital-twins-for-predictive-maintenance.
71. Weiss, G. (1999). Multiagent Systems: A Modern Approach to Distributed Artificial Intel-
ligence, (Vol. 3, no. 2). http://books.google.com/books?hl=nl&lr=&id=JYcznFCN3xcC&
pgis=1.
72. Russell, S., & Norvig, P. (2010). Artificial intelligence: A modern approach. Prentice Hall.
73. Alpaydın, E. (2010). Introduction to machine learning second edition. MIT Press. https://doi.
org/10.1007/978-1-62703-748-8_7.
74. Goodfellow, I., Bengio, Y., & Courville, A. (2012) Deep learning.
75. Li, Y., et al. (2022) A review on interaction control for contact robots through intent detection.
Progress in Biomedical Engineering, 4(3). https://doi.org/10.1088/2516-1091/ac8193.
76. Ganesh, G., Takagi, A., Osu, R., Yoshioka, T., Kawato, M., & Burdet, E. (2014). Two is better
than one: Physical interactions improve motor performance in humans. Science and Reports,
4(1), 3824. https://doi.org/10.1038/srep03824.
77. Takagi, A., Ganesh, G., Yoshioka, T., Kawato, M., & Burdet, E. (2017). Physically interacting
individuals estimate the partner’s goal to enhance their movements. Nature Human Behaviour,
1(3), 54. https://doi.org/10.1038/s41562-017-0054.
78. Li, Y., Eden, J., Carboni, G., & Burdet, E. (2020). Improving tracking through human-robot
sensory augmentation. IEEE Robotics and Automation Letters, 5(3), 4399–4406. https://doi.
org/10.1109/LRA.2020.2998715.
79. Başar, T., & Olsder, G. J. (1998). Dynamic noncooperative game theory (2nd ed.). Society
for Industrial and Applied Mathematics. https://doi.org/10.1137/1.9781611971132.
80. Nilsson, N. (1969). A mobile Automaton. An application of artificial intelligence techniques.
81. Brooks, R. A. (1986). A robust layered control system for a mobile robot. IEEE Journal of
Robotics and Automation, 2(1), 14–23. https://doi.org/10.1109/JRA.1986.1087032.
82. Siciliano, B., & Khatib, O. (2012). Handbook of robotics. https://link.springer.com/book/.
https://doi.org/10.1007/978-3-319-32552-1.
83. Albus, J., et al. (2002). 4D/RCS version 2.0: A reference model architecture for unmanned
vehicle systems. NIST Interagency/Internal Report (NISTIR), National Institute of Standards
and Technology, Gaithersburg, MD. https://doi.org/10.6028/NIST.IR.6910.
84. Mataric, M. J. (2008). The robotics primer. MIT Press. https://doi.org/10.5860/choice.45-
3222.
85. Di Buono, A., Cockbain, N., Green, P., & Lennox, B. (2021). Wireless communications in
nuclear decommissioning environments. In UK-RAS Conference: Robots Working For and
Among us Proceedings (Vol. 1, pp. 71–73). https://doi.org/10.31256/ukras17.23.
86. Spong, M. W. (2022). An historical perspective on the control of robotic manipulators. Annual
Review of Control, Robotics, and Autonomous Systems, 5(1). https://doi.org/10.1146/annurev-
control-042920-094829.
87. Slotine, J.-J. E., & Li, W. (2011). Applied nonlinear control. Prentice Hall.
88. Craig, J. J., Hsu, P., & Sastry, S. S. (1987). Adaptive control of mechanical manipulators. The
International Journal of Robotics Research, 6(2), 16–28. https://doi.org/10.1177/027836498
700600202.
89. Shousong, H., & Qixin, Z. (2003). Stochastic optimal control and analysis of stability of
networked control systems with long delay. Automatica, 39(11), 1877–1884. https://doi.org/
10.1016/S0005-1098(03)00196-1.
90. Huang, D., & Nguang, S. K. (2008). State feedback control of uncertain networked control
systems with random time delays. IEEE Transactions on Automatic Control, 53(3), 829–834.
https://doi.org/10.1109/TAC.2008.919571.
91. Shi, Y., & Yu, B. (2009). Output feedback stabilization of networked control systems with
random delays modeled by Markov chains. IEEE Transactions on Automatic Control, 54(7),
1668–1674. https://doi.org/10.1109/TAC.2009.2020638.
92. Hokayem, P. F., & Spong, M. W. (2006). Bilateral teleoperation: An historical survey.
Automatica, 42(12), 2035–2057. https://doi.org/10.1016/j.automatica.2006.06.027.
93. Bemporad, A. (1998). Predictive control of teleoperated constrained systems with unbounded
communication delays. In Proceedings of the 37th IEEE Conference on Decision and Control
(Cat. No.98CH36171), 1998 (Vol. 2, pp. 2133–2138). https://doi.org/10.1109/CDC.1998.
758651.
94. Guo, K., Su, H., & Yang, C. (2022) A small opening workspace control strategy for redundant
manipulator based on RCM method. IEEE Transactions on Control Systems Technology, 1–9.
https://doi.org/10.1109/TCST.2022.3145645.
95. Walsh, G. C., Ye, H., & Bushnell, L. G. (2002). Stability analysis of networked control systems.
IEEE Transactions on Control Systems Technology, 10(3), 438–446. https://doi.org/10.1109/
87.998034.
96. Tipsuwan, Y., & Chow, M.-Y. (2003). Control methodologies in networked control systems.
Control Engineering Practice, 11, 1099–1111. https://doi.org/10.1016/S0967-0661(03)000
36-4.
97. Yue, D., Han, Q.-L., & Lam, J. (2005). Network-based robust H∞ control of systems with
uncertainty. Automatica, 41(6), 999–1007. https://doi.org/10.1016/j.automatica.2004.12.011.
98. Zhang, X.-M., Han, Q.-L., & Zhang, B.-L. (2017). An overview and deep investigation
on sampled-data-based event-triggered control and filtering for networked systems. IEEE
Transactions on Industrial Informatics, 13(1), 4–16. https://doi.org/10.1109/TII.2016.260
7150.
99. Pasqualetti, F., Member, S., Dör, F., Member, S., & Bullo, F. (2013). Attack detection and
identification in cyber-physical systems. Attack Detection and Identification in Cyber-Physical
Systems, 58(11), 2715–2729.
100. Dolk, V. S., Tesi, P., De Persis, C., & Heemels, W. P. M. H. (2017). Event-triggered control
systems under denial-of-service attacks. IEEE Transactions on Control of Network Systems.,
4(1), 93–105. https://doi.org/10.1109/TCNS.2016.2613445.
101. Ding, D., Han, Q.-L., Xiang, Y., Ge, X., & Zhang, X.-M. (2018). A survey on security
control and attack detection for industrial cyber-physical systems. Neurocomputing, 275(C),
1674–1683. https://doi.org/10.1016/j.neucom.2017.10.009.
102. Yue, D., Tian, E., & Han, Q.-L. (2013). A delay system method for designing event-triggered
controllers of networked control systems. IEEE Transactions on Automatic Control, 58(2),
475–481. https://doi.org/10.1109/TAC.2012.2206694.
103. Wu, L., Gao, Y., Liu, J., & Li, H. (2017). Event-triggered sliding mode control of stochastic
systems via output feedback. Automatica, 82, 79–92. https://doi.org/10.1016/j.automatica.
2017.04.032.
104. Li, X.-M., Zhou, Q., Li, P., Li, H., & Lu, R. (2020). Event-triggered consensus control for
multi-agent systems against false data-injection attacks. IEEE Transactions on Cybernetics,
50(5), 1856–1866. https://doi.org/10.1109/TCYB.2019.2937951.
105. Zhang, L., Liang, H., Sun, Y., & Ahn, C. K. (2021). Adaptive event-triggered fault detection
scheme for semi-markovian jump systems with output quantization. IEEE Transactions on
Systems, Man, and Cybernetics: Systems, 51(4), 2370–2381. https://doi.org/10.1109/TSMC.
2019.2912846.
106. Huo, X., Karimi, H. R., Zhao, X., Wang, B., & Zong, G. (2022). Adaptive-critic design for
decentralized event-triggered control of constrained nonlinear interconnected systems within
an identifier-critic framework. IEEE Transactions on Cybernetics, 52(8), 7478–7491. https://
doi.org/10.1109/TCYB.2020.3037321.
107. Dao, H. V., Tran, D. T., & Ahn, K. K. (2021). Active fault tolerant control system design
for hydraulic manipulator with internal leakage faults based on disturbance observer and
online adaptive identification. IEEE Access, 9, 23850–23862. https://doi.org/10.1109/ACC
ESS.2021.3053596.
108. Yu, X., & Jiang, J. (2015). A survey of fault-tolerant controllers based on safety-related issues.
Annual Reviews in Control, 39, 46–57. https://doi.org/10.1016/j.arcontrol.2015.03.004.
109. Freddi, A., Longhi, S., Monteriù, A., Ortenzi, D., & Proietti Pagnotta, D. (2019). Fault tolerant
control scheme for robotic manipulators affected by torque faults. IFAC-PapersOnLine,
51(24), 886–893. https://doi.org/10.1016/j.ifacol.2018.09.680.
110. Corke, P. (2016). Robotics, vision and control (2nd ed.). Springer.
111. Brock, O., Kuffner, J., & Xiao, J. (2012) Robotic motion planning. In Springer handbook of
robotics. Springer.
112. Marturi, N., et al. (2017). Towards advanced robotic manipulation for nuclear decom-
missioning: A pilot study on tele-operation and autonomy. In International Conference
on. Robotics and Automation for Humanitarian Applications RAHA 2016-Conference
Proceedings. https://doi.org/10.1109/RAHA.2016.7931866.
113. Spong, M. W., Hutchinson, S., & Vidyasgar, M. (2004). Robot dynamics and control.
114. Lozano-PéRez, T. (1987). A simple motion-planning algorithm for general robot manipula-
tors. IEEE Journal of Robotics and Automation, 3(3), 224–238. https://doi.org/10.1109/JRA.
1987.1087095.
115. Lavalle, S., & Kuffner, J. (2000). Rapidly-exploring random trees: Progress and prospects.
Algorithmic Computational Robotics. (New Dir.).
116. Kavraki, L. E., Švestka, P., Latombe, J. C., & Overmars, M. H. (1996). Probabilistic roadmaps
for path planning in high-dimensional configuration spaces. IEEE Transactions on Robotics
and Automation, 12(4), 566–580. https://doi.org/10.1109/70.508439.
117. Hsueh, H.-Y., et al. (2022). Systematic comparison of path planning algorithms using
PathBench (pp. 1–23). http://arxiv.org/abs/2203.03092.
118. Guo, N., Li, C., Gao, T., Liu, G., Li, Y., & Wang, D. (2021). A fusion method of local
path planning for mobile robots based on LSTM neural network and reinforcement learning.
Mathematical Problems in Engineering, 2021. https://doi.org/10.1155/2021/5524232.
119. Levine, S., Pastor, P., Krizhevsky, A., Ibarz, J., & Quillen, D. (2018). Learning hand-eye
coordination for robotic grasping with deep learning and large-scale data collection. Interna-
tional Journal of Robotics Research, 37(4–5), 421–436. https://doi.org/10.1177/027836491
7710318.
120. Bateux, Q., et al. (2018). Training deep neural networks for visual servoing. In ICRA 2018-
IEEE International Conference on Robotics and Automation, 2018 (pp. 3307–3314).
121. Treiber, M. (2013). An introduction to object recognition selected algorithms for a wide variety
of applications. Springer.
122. Felzenszwalb, P. F., Girshick, R. B., McAllester, D., & Ramanan, D. (2010). Object detection
with discriminatively trained part based models. IEEE Transactions on Pattern Analysis and
Machine Intelligence, 32(9). https://doi.org/10.1109/MC.2014.42.
123. Torralba, A., Murphy, K. P., Freeman, W. T., & Rubin, M. A. (2003). Context-based vision
system for place and object recognition. In Proceedings of the IEEE International Conference
on Computer Vision (Vol. 1, pp. 273–280). https://doi.org/10.1109/iccv.2003.1238354.
124. Zakharov, S., Shugurov, I., & Ilic, S. (2019) DPOD: 6D pose object detector and refiner.
In Proceedings of the IEEE International Conference on Computer Vision, (Vol. 2019 Oct,
pp. 1941–1950). https://doi.org/10.1109/ICCV.2019.00203.
125. Sun, L., Zhao, C., & Yan, Z. (2019). A novel weakly-supervised approach for RGB-D-based
nuclear waste object detection (Vol. 19, no. 9, pp. 3487–3500).
126. Zhao, C., Sun, L., Purkait, P., Duckett, T., & Stolkin, R. (2018). Dense RGB-D semantic
mapping with pixel-voxel neural network. Sensors (Switzerland), 18(9). https://doi.org/10.
3390/s18093099.
127. Gorschlüter, F., Rojtberg, P., & Pöllabauer, T. (2022). A Survey of 6D object detection based
on 3D models for industrial applications. Journal of Imaging, 8(3), 1–18. https://doi.org/10.
3390/jimaging8030053.
128. Patterson, E. A., Taylor, R. J., & Bankhead, M. (2016). A framework for an integrated nuclear
digital environment. Progress in Nuclear Energy, 87, 97–103. https://doi.org/10.1016/j.pnu
cene.2015.11.009.
129. Lu, R. Y., Karoutas, Z., & Sham, T. L. (2011). CASL virtual reactor predictive simulation:
Grid-to-rod fretting wear. JOM Journal of the Minerals Metals and Materials Society, 63(8),
53–58. https://doi.org/10.1007/s11837-011-0139-6.
130. Bowman, D., Dwyer, L., Levers, A., Patterson, E. A., Purdie, S., & Vikhorev, K. (2022) A
unified approach to digital twin architecture–Proof-of-concept activity in the nuclear sector.
IEEE Access, 1–1. https://doi.org/10.1109/access.2022.3161626.
131. Kawabata, K., & Suzuki, K. (2019) Development of a robot simulator for remote operations for
nuclear decommissioning. In 2019 16th Int. Conf. Ubiquitous Robot. UR 2019 (pp. 501–504).
https://doi.org/10.1109/URAI.2019.8768640.
132. Partiksha, & Kattepur, A. (2022). Robotic tele-operation performance analysis via digital twin
simulations (pp. 415–417). https://doi.org/10.1109/comsnets53615.2022.9668555.
133. Wright, T., West, A., Licata, M., Hawes, N., & Lennox, B. (2021). Simulating ionising radi-
ation in gazebo for robotic nuclear inspection challenges. Robotics, 10(3), 1–27. https://doi.
org/10.3390/robotics10030086.
134. Kim, M., Lee, S. U., & Kim, S. S. (2021). Real-time simulator of a six degree-of-freedom
hydraulic manipulator for pipe-cutting applications. IEEE Access, 9, 153371–153381. https://
doi.org/10.1109/ACCESS.2021.3127502.
Deep Learning and Robotics, Surgical
Robot Applications

Muhammad Shahid Iqbal, Rashid Abbasi, Waqas Ahmad,


and Fouzia Sher Akbar

Abstract Surgical robots can perform difficult tasks that humans cannot. They can perform repetitive tasks, work with hazardous materials, and manipulate difficult objects. This has helped businesses save time and money while also preventing numerous accidents. The use of surgical robots, also known as robot-assisted surgery, allows medical professionals to perform a wide range of complex procedures with greater accuracy, adaptability, and control than traditional methods. Minimally invasive surgery, which is frequently associated with robotic surgery, is performed through small incisions; robot assistance is also used in some traditional open surgical procedures. This chapter discusses advanced robotic surgical systems and deep learning
(DL). The purpose of this chapter is to provide an overview of the major issues in
artificial intelligence (AI), including how they apply to and limit surgical robots.
Each surgical system is thoroughly explained in the chapter, along with its most recent AI-based improvements. Case studies are provided with information on
recent advancements and on the role of DL, and future surgical robotics applications
in ophthalmology are also thoroughly discussed. The new ideas, comparisons, and
updates on surgical robotics and deep learning are all summarized in this chapter.

Keywords Robotics · Deep learning · Surgical robot · Application of surgical robot · Modern trends in surgical robotics

M. S. Iqbal (B) · F. S. Akbar


Department of Computer Science and Information Technology, Women University of AJ&K,
Bagh, Pakistan
e-mail: [email protected]
R. Abbasi
Anhui Polytechnic University Hefei, Wuhu, China
W. Ahmad
Higher Education Department Govt, AJ&K, Mirpur 10250, Pakistan


1 Introduction

Robotics is an interdisciplinary field of science and engineering concerned with the
design, construction, and operation of robots. This chapter provides a thorough
overview of robotic technology, including the various types of robots and how they
are used in industry [1, 2].
One barrier to robots mimicking humans is a lack of proprioception—an awareness of
one's muscles and body parts—a kind of "intuition" that is essential to how humans
coordinate movement. Roboticists have been able to give robots a sense of sight through
cameras and a sense of smell and taste through chemical sensors, while microphones help
robots hear. However, they have struggled to help robots acquire this "intuition" about
their own bodies. Progress is now being made using tactile materials and machine learning
algorithms. In one case, randomly positioned sensors detect contact and pressure and send
data to a learning algorithm that interprets the signals. In another example, roboticists
are attempting to develop a robotic arm that is essentially as capable as a human arm and
that can grasp an assortment of objects. Until recent developments, the process involved
either training a robot separately for each task or providing a learning algorithm with a
huge dataset of experience to learn from. Robert Kwiatkowski and Hod Lipson of Columbia
University are working on "task-agnostic self-modeling machines." Similar to an infant in
its first year of life, the robot starts without any knowledge of its own body or the
physics of motion. As it repeats thousands of movements it observes the outcomes and
builds a model of them. The learned model is then used to help the robot plan future
movements in light of its earlier motion. In this way, the robot learns to interpret its
own actions. A group of researchers at the USC Viterbi School of Engineering believe they
are the first to develop an AI-controlled robotic limb that can recover from falling
without being explicitly programmed to do so. This is pioneering work that shows robots
learning by doing. Artificial intelligence powers modern robotics; AI and machine learning
help robots see, walk, talk, smell, and move in increasingly human-like ways [3–13].
In this chapter, we propose comparing the automated segmentation performance of the
convolutional neural network-based surgical robot with that of other robots and surgical
robots, as well as with the industry standard of expert manual segmentation. Different
convolutional neural network designs can be obtained by changing the number of feature
maps and the encoding and decoding layers. An analysis is carried out to determine how
segmentation performance is affected by these architectural design parameters. The
chapter describes each surgical system in detail, as well as its most recent advancements
through the use of AI. Future surgical robotics applications in ophthalmology are
thoroughly discussed, with case studies provided alongside recent progress and the
role of DL. This chapter summarizes the new concepts and comparisons, as well as
updates on surgical robotics and deep learning. Figure 1 shows the PRISMA diagram
of this chapter.

Fig. 1 Overview of the subtopics covered in this chapter

2 Related Work

Surgical robots have been available for some time, and robotic surgery has made
significant progress over the past decade by accumulating data and experience with
unusual situations. Robotic and laparoscopic procedures are known to have less of an
impact on patients because they can complete tasks with minimal invasion [14–19]. A
traditional open procedure requires a large incision when operating on an internal
organ, and obtaining a sufficient field of view comes at a cost because of the intricate
arrangement of human organs. This motivated laparoscopic surgery and the insertion of
an endoscope to obtain a field of view, so that the abdominal cavity could be examined
and treated. It was also promoted as a more precise and meticulous method of treatment.
Advances in imaging and visualization, contact-force sensing, and control have made
tissue palpation through the manipulator possible [20]. The affected area is less
damaged by robotic surgery, and the patient's faster recovery reduces the time lost to
convalescence [21–23]. Because this kind of operation is performed by controlling a
robot arm, it is difficult for the surgeon to interact with the patient directly.
Completing such a task requires a great deal of skill and careful attention. On the
operating table, the patient is surrounded by numerous machines, while the operating
surgeon sits at the robot's console, located away from the surgical site. The surgeon
simultaneously controls and monitors the console unit and relies on visual information
to understand how the robotic arm behaves during surgery [24, 25]. The effectiveness of
robotic surgery depends on this. Surgeons also report that robotic surgery requires
greater caution than open surgery because it relies more heavily on visual cues, as the
affected area cannot be observed directly. When the surgical site is narrow and the
camera view is inevitably restricted, the surgeon receives even less information
[26, 27]. This is a major drawback of surgical robots. To ensure a clear view, a
gas-insufflated workspace is created in the abdomen, surgical instruments are inserted,
and a camera is introduced to show the surgeon the state of the abdominal cavity. The
pinch-type master controller of the surgical robot enables extremely precise operation,
but it is demanding and requires considerable skill to use. 3D imaging has recently been
developed and integrated into these systems to address surgeons' concerns [26, 27].
Combining images from various vantage points can improve the quality of the information
provided. However, compared with surgeons performing open operations, it remains
difficult to improve the visual information available. In an emergency, operating away
from the patient can also lead to poor judgment.
Surgeons can use a variety of simulations to improve their accuracy and become more
accustomed to performing operations. Several subtask devices have recently been
developed to provide surgeons with a virtual practice environment prior to robotic
surgery [28]. The da Vinci Research Kit, developed at Johns Hopkins, is the most
well-known tool. Surgeons can practice in a setting built from materials that resemble
human tissues. Because minimally invasive surgery is guided by what the camera sees in
the treated area, manual dexterity remains essential even with this equipment. To
improve the success rate of surgery performed with such a limited view, additional
sensory feedback is required. A haptic feedback system is still missing from even the
most widely used da Vinci robot [29, 30]. Therefore, if RMIS can provide the surgeon
with tactile sensation data in real time during surgery, this issue will be partially
resolved. In robotic surgery, it is anticipated that the proposed haptic system will
speed up decision-making and enhance surgical quality and accuracy [31–37]. Surgeons
who need to operate with great dexterity may require haptic systems [31, 36, 37]. When
the surgeon has access to the patient's real-time data, they are able to make decisions
quickly and precisely. The human body's internal conditions can vary greatly. Tissue
stiffness may differ from that of the surrounding area if a tumor has not yet been
found, but unless the deeply concealed area is touched directly, the issue might not be
apparent. Tactile feedback can help with some of these issues.
Using a single tactile feedback device against a variety of human body tissues, it
ought to be possible to alter the tactile perception of various organs and tissues in
real time. Numerous tactile transmission devices have been investigated with these
factors in mind. The vibration feedback system is the most widely used tactile feedback
system, as previously mentioned [38, 39]. The intensity of the vibration can convey the
tactile sensation, although it is more frequently used to issue a warning signal in
response to external stimuli.
It is well known that a piezoelectric-based vibration feedback system can also function
as a tactile device. According to numerous sources, human organs and tissues are
viscoelastic. To ensure high surgical quality and safety, a tactile device with
properties comparable or identical to those of human tissues and organs ought to be
used in order to provide surgeons with more precise information about viscoelastic
properties. However, due to the time delay of the viscous effect, implementing
viscoelastic properties via a vibration feedback system is extremely challenging, which
is why such a tactile device cannot be used with the surgical robot console.
Piezoelectric technology is another option [40]. Depending on the configuration, this
method can provide tactile sensation and can succeed when a sufficient range of forces
is available [40]. However, conveying the state of body tissue through simple force
alone is inadequate; a method that can simultaneously express all of the viscoelastic
properties of the human body is more suitable. To incorporate viscoelastic properties,
a pneumatic tactile transmission device has been proposed [41], but the compressibility
of the gas prevents the condition of incompressible tissues from being expressed under
pneumatic pressure. In RMIS (Robot-assisted Minimally Invasive Surgery), numerous
tactile devices made of magnetorheological (MR) materials have recently been proposed
to address these points [42–49]. The development of the Haptic Master with MR materials
has been the subject of numerous studies [42, 50]. Additionally, a method for directly
delivering haptic information to the surgeon's hand has been proposed in the form of
the MR Tactile Cell Device [51–56]. Because a magnetorheology-based tactile device can
alter its yield stress by varying the intensity of the magnetic field, a single sample
can be used to represent the organ characteristics of various human tissues.
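To make the viscoelasticity requirement concrete, the sketch below (not taken from the
cited works) renders a simple Kelvin–Voigt contact model, F = kx + c(dx/dt), which is
the kind of stiffness-plus-damping response a tactile display would need to reproduce.
The stiffness and damping values are illustrative placeholders only.

```python
import numpy as np

def kelvin_voigt_force(depth, depth_rate, k=800.0, c=6.0):
    """Kelvin-Voigt viscoelastic contact force: F = k*x + c*dx/dt.

    depth      -- indentation depth into the tissue (m), >= 0
    depth_rate -- indentation velocity (m/s)
    k, c       -- illustrative stiffness (N/m) and damping (N*s/m) values
    """
    if depth <= 0.0:            # no contact, no force
        return 0.0
    return k * depth + c * depth_rate

# Example: rendering a 5 mm indentation ramp over one second
t = np.linspace(0.0, 1.0, 1000)
x = 0.005 * t                   # indentation profile (m)
dx = np.gradient(x, t)          # indentation velocity (m/s)
forces = [kelvin_voigt_force(xi, vi) for xi, vi in zip(x, dx)]
print(f"peak rendered force: {max(forces):.2f} N")
```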

3 Machine Learning and Surgical Robot

A crucial component of any surgeon's training is accurate and impartial performance
evaluation. Despite the abundance of technology available to the modern surgeon,
however, surgeons continue to track their performance using relatively basic metrics
such as operative duration, postoperative outcomes, and complication rates. These
metrics do not accurately capture the surgeon's performance during the operation
itself. It is also difficult to track performance consistently because trainer feedback
on intraoperative performance is often unstructured and inconsistent.
It is nothing new to find more systematic and objective methods for evaluating
intraoperative performance. The Objective Structured Assessment of Technical Skills
(OSATS) is one of many rating scales that expert raters can use to evaluate surgeons
in a variety of areas, such as effectiveness, tissue handling, and operation flow [57].
These have also been modified for use with robotic platforms [58, 59], laparoscopic
procedures [60], and specific research fields [61, 62]. Despite their widespread use
in academic research, these scales are rarely utilized in clinical settings. This is due
to the fact that they require a professional reviewer, are prone to rater bias, and
require a lot of time and effort.
These issues might be solved by putting ML to use. The scientific field that focuses
on how computers learn from data is referred to as "machine learning" (ML). It is
capable of quickly generating automated feedback that can be replicated without the
assistance of professional reviewers once it has undergone training or is constructed
empirically. It can also easily process the vast amount of data that is available from
the modern operating room. Due to the ever-increasing availability of computa-
tional power, machine learning (ML) is being utilized in a variety of medical fields,
including surgery. Postoperative mortality risk prediction [63], autonomous perfor-
mance of simple tasks [64], and surgical work-flow analysis [65] are just a few of
the many surgical applications of machine learning (ML) and artificial intelligence
(AI). The widespread use of machine learning (ML) has led to the development of the
field of surgical data science, which aims to improve the value and quality of surgery
through data collection, organization, analysis, and modeling [63, 66–68]. Over the
past ten years, the use of machine learning (ML) in the assessment of surgical skill
has increased rapidly. However, the extent to which ML can be utilized to eval-
uate surgical performance is still unclear. HMM, SVM, and ANN were the most
commonly used machine learning (ML) techniques for evaluating surgical perfor-
mance. Coincidentally, these three important ML techniques follow the research
trends in this area, which initially emphasized the use of HMM before moving on to
SVM methods and, more recently, ANN and deep learning.
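To illustrate the general recipe behind these studies, the following is a minimal
sketch (not the pipeline of any cited study) of how summary kinematic features from a
robotic console might be classified into skill levels with an SVM, one of the ML
techniques named above. The feature names, labels, and random data are placeholders.

```python
# Minimal SVM-based skill classification sketch on synthetic features.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Each row: [task time (s), path length (m), mean speed (m/s), motion smoothness]
X = rng.normal(size=(60, 4))
y = rng.integers(0, 2, size=60)   # 0 = novice, 1 = expert (placeholder labels)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
scores = cross_val_score(clf, X, y, cv=5)
print(f"cross-validated accuracy: {scores.mean():.2f}")
```

In practice the features would be computed from recorded tool-tip trajectories, and the
labels would come from expert annotations such as GRS-based ratings.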

4 Robotics and Deep Learning

When people and animals engage in object manipulation behaviors, the interaction
inherently involves a fast feedback loop between perception and action. Even complex
manipulation tasks, such as extracting a single object from a cluttered bin, can be
performed without first localizing the objects or planning the scene, relying instead on
continuous sensing from touch and vision. In contrast, robotic manipulation often (though
not always) relies more heavily on advance planning and analysis, with relatively simple
feedback, such as trajectory following, to ensure stability during execution. Part of the
reason for this is that incorporating complex sensory inputs, such as vision, directly
into a feedback controller is very challenging. Techniques such as visual servoing perform
continuous feedback on visual features, but typically require the features to be specified
by hand, and both open-loop perception and feedback (for example via visual servoing)
require manual or automatic calibration to determine the precise geometric relationship
between the camera and the robot's end-effector. In this work, the authors propose a
learning-based approach to hand–eye coordination for robotic grasping. The approach is
data-driven and goal-driven: the method learns to servo a robotic gripper to poses that
are likely to produce successful grasps, with end-to-end training directly from image
pixels to task-space gripper motion. By continuously re-computing the most promising
motor commands, the method continuously integrates sensory cues from the environment,
allowing it to react to perturbations and adjust the grasp to maximize the probability of
success [69, 70].
Moreover, the motor commands are issued in the frame of the robot's base, which is not
known to the model at test time. This means that the model does not require the camera to
be precisely calibrated with respect to the end-effector, but instead uses visual cues to
determine the spatial relationship between the gripper and graspable objects in the scene.
The aim in designing and evaluating this approach is to understand how well a grasping
system can be learned entirely from scratch, with minimal prior knowledge or manual
engineering. The method consists of two parts: a grasp success predictor, which uses a
deep convolutional neural network (CNN) to determine how likely a given motion is to
produce a successful grasp, and a continuous servoing mechanism that uses the CNN to
update the robot's motor commands continuously. By continuously choosing the best
predicted path to a successful grasp, the servoing mechanism provides the robot with fast
feedback to perturbations and object motion, as well as robustness to inaccurate
actuation. The principal contributions of this work are: a method for learning continuous
visual servoing for robotic grasping from monocular cameras, a novel convolutional neural
network architecture for learning to predict the outcome of a grasp attempt, and a
large-scale data collection framework for robotic grasps. The authors also present a broad
experimental evaluation aimed at assessing the efficacy of this strategy, determining its
data requirements, and analyzing the possibility of reusing grasping data across different
types of robots. The work extends an earlier conference paper by performing a second
experimental evaluation on another robotic platform and evaluating transfer learning
using data from two different robots [71].
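As a rough illustration of how these two components fit together, the following is a
highly simplified sketch, not the authors' implementation: a small CNN scores candidate
task-space motions given the current camera image, and a servo step samples candidate
motions and returns the highest-scoring one. The network architecture, candidate-sampling
scheme, and camera interface are all illustrative placeholders.

```python
# Simplified grasp-success predictor plus one servoing step (placeholder components).
import torch
import torch.nn as nn

class GraspSuccessCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(                 # image encoder
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.head = nn.Sequential(                    # fuse image features + motion command
            nn.Linear(32 + 3, 64), nn.ReLU(),
            nn.Linear(64, 1))

    def forward(self, image, motion):
        feat = self.encoder(image)
        return torch.sigmoid(self.head(torch.cat([feat, motion], dim=1)))

def servo_step(model, image, n_samples=64):
    """Sample candidate task-space motions and return the most promising one."""
    candidates = torch.randn(n_samples, 3) * 0.02     # small Cartesian displacements (m)
    with torch.no_grad():
        scores = model(image.expand(n_samples, -1, -1, -1), candidates)
    return candidates[scores.squeeze(1).argmax()]

model = GraspSuccessCNN()
frame = torch.rand(1, 3, 64, 64)                      # placeholder camera frame
print("next gripper displacement:", servo_step(model, frame))
```

The original system used a more sophisticated optimizer over candidate motions; random
sampling is used here only to keep the sketch short.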
The authors describe two large-scale experiments conducted on two separate robotic
platforms. In the core set of experiments, the grasp prediction CNN was trained on a
dataset of around 800,000 grasp attempts, collected using a set of 7-degree-of-freedom
robotic arms. Although the hardware parameters of every robot were initially identical,
each unit experienced different wear and tear over the course of data collection,
interacted with different objects, and used slightly different camera poses relative to
the robot base. The assortment of objects provides a diverse dataset for learning grasp
strategies, while the variability in camera poses provides a variety of conditions for
learning continuous hand–eye coordination for the grasping task. The first experiment was
aimed at assessing the effectiveness of the proposed method, as well as comparing it with
baselines and earlier procedures. The dataset used in these trials is available for
download at https://sites.google.com/website/brainrobotdata/home. The second set of
experiments was aimed at assessing whether grasping data collected by one type of robot
could be used to improve the grasping capability of a different robot. In these
experiments, the authors gathered more than 900,000 additional grasp attempts using a
different robotic manipulator, with a considerably larger assortment of objects. This
second robotic platform was used to test whether combining data from multiple robots
results in better overall grasping capability. Experiments showed that the convolutional
neural network grasping controller achieves a high success rate when grasping in clutter
over a wide range of objects, including objects that are large, small, hard, soft,
deformable, and transparent. Supplemental videos of the grasping system showed that the
robot uses continuous feedback to constantly adjust its grasp, accounting for motion of
the objects and inaccurate actuation commands. The authors additionally compare the
approach to open-loop variants to show the importance of continuous feedback, as well as
to a hand-engineered grasping baseline that uses manual hand-to-eye calibration and depth
sensing. The proposed technique achieves the highest success rates in these experiments.
Finally, the authors show that data collected for two different types of robots can be
combined, and that data from one robot can be used to improve the grasping capability of
another [71, 72].
Robotic grasping is one of the most widely investigated areas of manipulation. While a
complete survey of grasping is outside the scope of this work, the authors refer the
reader to standard surveys on the subject for a more complete treatment. Broadly,
grasping techniques can be categorized as geometrically driven and data-driven. Geometric
techniques analyze the shape of a target object and plan an appropriate grasp pose based
on criteria such as force closure or caging. These techniques typically need to reason
about the geometry of the scene, using depth or stereo sensors and matching of previously
scanned models to observations. The authors' approach is most closely related to recent
work on self-supervised learning of grasp poses by Pinto and Gupta, as well as earlier
work on learning from autonomous trial and error, which proposed to learn a network to
predict the optimal grasp orientation for a given image patch, trained with
self-supervised data collected using a heuristic grasping system based on object
proposals. In contrast to this prior work, the present approach achieves continuous
hand–eye coordination for grasping by observing the gripper and choosing the best motor
command to move the gripper toward a successful grasp, instead of making open-loop
predictions. Since the technique uses no human annotations, it can also collect a large
real-world dataset entirely autonomously [73, 74].
Since the strategy makes considerably weaker assumptions about the available human
supervision (none) and the available sensing (only an over-the-shoulder RGB camera),
direct comparisons of grasp success rates to values reported in prior work are not
possible. The set of objects used for evaluation includes extremely difficult items, such
as transparent bottles, small round objects, deformable objects, and clutter. A mismatch
in object difficulty between this work and earlier studies further complicates direct
comparison of reported accuracy. The aim of the work is therefore not to establish which
system is best, since such comparisons are impossible without standardized benchmarks,
but rather to analyze how far a grasping method based entirely on learning from raw,
autonomously collected data can scale to complex and diverse grasping scenarios [75].
Another area related to the authors' technique is robotic reaching, which deals with
coordination and feedback for reaching motions, and visual servoing, which addresses
moving a camera or end-effector to a desired pose using visual feedback. In contrast to
the approach described here, visual servoing methods are typically concerned with
reaching a pose relative to objects in the scene, and often (though not always) rely on
manually designed or specified features for feedback control. Photometric visual servoing
uses a target image instead of features, and several visual servoing methods have been
proposed that do not directly require prior calibration between the robot and camera.
Some recent visual servoing methods have also used learning and computer vision
techniques. To the authors' knowledge, no prior learning-based method has been proposed
that uses visual servoing to move directly into a pose that maximizes the probability of
success on a given task (such as grasping) [76].
To predict the optimal motor commands that maximize grasp success, the authors use
convolutional neural networks (CNNs) trained on grasp success prediction. Although the
technology behind CNNs has been known for a long time, they have made remarkable progress
in recent years on a wide range of challenging computer vision benchmarks, becoming the
de facto standard for computer vision systems. Nonetheless, applications of CNNs to
robotic control problems have been less prevalent, compared with applications to passive
perception tasks such as object recognition, localization, and segmentation. Several
works have proposed to use CNNs for deep reinforcement learning applications, including
playing video games, executing simple task-space motions for visual servoing, controlling
simple simulated robotic systems, and performing an assortment of robotic control tasks.
Many of these applications have been in simple or synthetic domains, and all of them have
focused on relatively constrained environments with small datasets [77].

5 Surgical Robots and Deep Learning

After dominating the market with the Da Vinci system for many years, Intuitive Surgical
is now finally up against international companies that are vying for market share with
their own iterations of cutting-edge robots [78]. These systems will typically include
open consoles, lighter equipment, and increased mobility. Even interest in automation,
which had not been seen in nearly 30 years, has been reignited. The STAR robot can suture
soft tissue more consistently than a human hand without human intervention; to perform a
gastrointestinal anastomosis in a pig, it combined 3-dimensional imaging and sensors
(near-infrared fluorescent/NIRF tags) with the concept of supervised autonomous suturing
[63]. The Revo-I, a Korean robot, recently completed its first clinical trials, including
a Retzius-sparing robot-assisted radical prostatectomy (RARP). Even in the hands of
skilled practitioners, three patients required blood transfusion, and the positive margin
rate was 23% [79]; this is a good example of honest advertising.
advertising.
The new devices may be able to reduce the cost of an automated medical proce-
dure to be similar to that of a laparoscopy, even though the underlying equipment
cost may still be substantial. The UK’s Cambridge Medical Robotics has plans to
present more up-to-date costing models that cover support, tools, and even aides as
a whole package in addition to the actual equipment. This may attract multidisci-
plinary development in the east, among high volume open and laparoscopic special-
ists, to advanced mechanics. For instance, lower costs could support a more notable
acknowledgment of an automated medical procedure in eastern India, where prostate
malignant growth is rare but instead aggressive in those who get it. According to data
from the Vattikuti Foundation, there are currently 60 Da Vinci cases in India, with
urologists making up about 50% of the specialists and RARP being the most popular
approach. In a review of a recent series of RARPs from Kolkata, 90% self-control
and a biochemical repeat-free endurance of 75% at 5 years were found in cases of
mostly high-risk prostate cancer. While effective multidisciplinary teamwork will
reduce costs, it is almost certain that the use of Markov displaying will determine the
automated medical procedure’s medium-term cost-adequacy in the developing scene.
The two distinct perspectives in the field of new robots that are stirring up excite-
ment are man-made consciousness (AI) and quicker advanced correspondence, even
though cost may outweigh the titles. The era of careful AI has begun, even though
the concept is not new and can be traced back to Alan Turing, a genius whose deci-
phering skills had a significant impact on the outcome of World War II. Despite how
trendy it may sound, AI is probably going to be the main force behind the digitization
of meticulous practice. Artificial intelligence is the superset of managing a group of
intricate computer programs designed to achieve a goal by making decisions. With
models like visual discrimination, discourse acknowledgment, and language inter-
pretation, it is comparable to human insight in this way. A subset of AI called AI (ML)
uses dynamic PC calculations to understand and respond to specific information. By
determining, for example, whether a particular image represents a prostate malignant
growth, a prostate recognition calculation might enable the machine to reduce the
variability in radiologists’ interpretations of attractive resonance imaging. Current
machine learning frameworks have been transformed by fake neural networks explic-
itly deep learning, graphics processing units, and limitless information stockpiling
limits, making the executions faster, less expensive, and more impressive than at any
time in recent memory. The video accounts of experts performing RARP can now be
converted into Automated Performance Metrics through a black box, and they reveal
Deep Learning and Robotics, Surgical Robot Applications 177

astounding discoveries, such as the finding that not all highly productive experts are
necessarily those who achieve the best results [80].
Medical intervention is intended to become a more dependable, safer, and less invasive
process through the use of robotic technology [81, 82]. New developments are moving in
the direction of fully autonomous robotic surgeons and robot-assisted systems. The
surgical system that has been used the most frequently to date is the da Vinci robot.
Through remote-controlled laparoscopic surgery in gynaecology, urology, and general
surgery, it has already demonstrated its effectiveness [81]. The data available in the
surgical console of a robot-assisted surgical system include crucial details for
intraoperative guidance that can support the decision-making process. Typically, this
information is presented as 2D images or video showing surgical tools and human tissue.
Understanding these data, which includes estimating the pose of surgical instruments
within the surgical scene, is a complex problem. Semantic segmentation of the instruments
in the surgical console is a fundamental component of this process. Semantic segmentation
of robotic instruments is a challenging task due to the complexity and dynamic nature of
background tissue, lighting changes such as shadows and specular reflections, and visual
occlusions such as blood and camera lens fogging. Segmentation masks can make a
significant contribution to instrument tracking and navigation systems. This creates a
compelling need for the development of accurate and robust computer vision techniques for
the semantic segmentation of surgical instruments in operative images and video. Numerous
vision-based techniques have been developed for robotic instrument detection and tracking
[82]. Instrument-background segmentation can be viewed as a binary or instance
segmentation problem, and classical machine learning algorithms have been applied to it,
utilising both texture features and colour [83, 84]. Later applications moved to semantic
segmentation, which refers to distinguishing different instruments or their components
[85, 86].
Deep learning-based approaches have recently demonstrated performance improvements over
conventional machine learning methods for several biomedical problems [87, 88].
Convolutional neural networks have been successfully used in the field of medical imaging
for a variety of purposes, including the analysis of breast cancer histology images [89],
bone disease prediction [90], age estimation [91], and others [87]. Applications of deep
learning to robotic instrument segmentation have previously shown solid performance in
binary segmentation [92, 93] and promising results in multiclass segmentation [94]. Deep
neural network variants suitable for use on embedded and mobile devices, such as clinical
robots, are beginning to emerge [95]. The authors of this paper offer a deep
learning-based approach to the semantic segmentation of robotic instruments that produces
state-of-the-art results in both a two-class and a multi-class setting. Using this method,
the authors developed a solution for the Robotic Instrument Segmentation MICCAI 2017
Endoscopic Vision Sub-Challenge [96]. This solution placed first in the binary and
multi-class instrument segmentation sub-tasks and second in the instrument-part
segmentation sub-task. The authors illustrate the details of the solution, which is based
on a modification of the U-Net model [97], and also provide further improvements using
other modern deep architectures.
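For orientation, the sketch below shows the general recipe referenced above: an
encoder–decoder network with a skip connection and a soft Dice loss for binary instrument
segmentation. It is a minimal illustration, not the challenge-winning model; the channel
counts, image size, and random mask are placeholders.

```python
# Tiny U-Net-style encoder-decoder and soft Dice loss for binary segmentation.
import torch
import torch.nn as nn

def block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(cout, cout, 3, padding=1), nn.ReLU())

class TinyUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc1, self.enc2 = block(3, 16), block(16, 32)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.dec = block(32, 16)                       # skip connection doubles channels
        self.out = nn.Conv2d(16, 1, 1)

    def forward(self, x):
        e1 = self.enc1(x)                              # full-resolution features
        e2 = self.enc2(self.pool(e1))                  # half-resolution features
        d = self.dec(torch.cat([self.up(e2), e1], 1))  # decode with skip connection
        return self.out(d)                             # per-pixel instrument logit

def soft_dice_loss(logits, target, eps=1e-6):
    prob = torch.sigmoid(logits)
    inter = (prob * target).sum()
    return 1 - (2 * inter + eps) / (prob.sum() + target.sum() + eps)

net = TinyUNet()
img = torch.rand(1, 3, 128, 128)                       # placeholder endoscopic frame
mask = (torch.rand(1, 1, 128, 128) > 0.5).float()      # placeholder instrument mask
loss = soft_dice_loss(net(img), mask)
print(f"dice loss: {loss.item():.3f}")
```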

6 Current Innovation in Surgical Robotics

Minimally invasive surgery (MIS) has recently gained popularity for procedures such as
oncological surgery, colorectal surgery, and general surgery [98]. Natural Orifice
Transluminal Endoscopic Surgery (NOTES), with robotic assistance, has the greatest
potential and reliability of any technique for performing tasks inside the peritoneal
cavity without abdominal incisions. Phee et al. created a master–slave flexible
endoscopic surgical robot [99] to significantly enhance the dexterity of surgeons within
the peritoneal cavity. A flexible endoscope guides the slave robot arms to the desired
locations, while the master robot moves in accordance with instructions from the surgeon
at the proximal end. Although the strength, adaptability, and consistency of
robot-assisted NOTES improved quickly over time, the absence of precise haptic feedback
remained a critical flaw; as a result, surgeons rely heavily on experience and visual
information to make decisions [100]. Numerous studies [31, 101–103] have shown that
providing doctors with haptic feedback will not only significantly shorten the time spent
in the operating room during the procedure, but will also reduce instances of excessive
or insufficient force application, thereby reducing tissue damage.
Although Omega.7, CyberForce, and CyberGrasp [104] are among the well-known haptic
devices available on the market, the force information linking the surgical robot and the
manipulated objects is missing from the loop. Tendon-Sheath Mechanisms (TSMs) have been
widely used for motion and force transmission in robotic systems for NOTES due to their
high flexibility and controllability in confined and tortuous paths. TSMs are frequently
associated with issues such as backlash, hysteresis, and nonlinear tension loss due to
the friction that exists between the tendon and the surrounding sheath. Therefore, it is
challenging to obtain precise haptic feedback using these systems. At the distal end of a
surgical robot, various sensors based on displacement [105], current [106], pressure
[107], resistance [108], capacitance [109], vibration [110], and optical principles [111]
can be mounted to directly measure the interaction force for haptic feedback. However,
these sensors are typically limited by the difficulty of sterilization, the harsh
environment during the procedure, the lack of mounting space at the distal end, issues
with the associated wiring and fittings, and other factors. Consequently, extensive
efforts have been made to describe the force transmission of TSM-driven robots
mathematically, so that models can estimate the force at the robot's distal end using
measurements from its proximal end. Kaneko et al. studied the tension transmission in
TSMs [112] based on the Coulomb friction model. Lampaert, Pitkowski, and others
[113, 114] proposed the Dahl, LuGre, and Leuven models as alternatives to the Coulomb
model in an effort to refine the modeling procedure. However, as these modeling
techniques became more accurate, they began to exhibit inconsistencies between different
hysteresis phases and were unable to accurately describe the friction force when the
system was operating at zero velocity. In a subsequent development for clinical robots,
the nonlinear friction in TSMs was modeled using the Bouc-Wen model [115]. Backlash
compensation is a common strategy in Bowden-cable control to reduce hysteresis
[116, 117]. Do et al. [118, 119] proposed an improved Bouc-Wen model with dynamic
properties that uses velocity and acceleration information to describe the friction
profile considerably more accurately. It is worth noting that springs have frequently
been used in the literature to mimic the response of tissue. In fact, in order to perform
force prediction effectively on a haptic device, the nonlinear properties of tissue need
to be taken into consideration. Wang et al. [120] investigated the Voigt, Kelvin, and
Hunt-Crossley models, among other approaches, for modeling the tissue force response for
TSMs in NOTES while keeping the insertion speed constant. For each viscoelastic model to
be constructed, a number of difficult parameters must be carefully identified for the
scenarios in which the tendon pretension is sufficiently large to prevent any slack in
the system and the types of interaction with tissue are constrained; this is necessary in
order to accurately predict the distal force in TSMs using mathematical models. In
robotic control problems where robots derive policies directly from images, neural
networks have demonstrated empirical success [100, 121]. Learning control policies with
convolutional features suggests that these features may also capture additional
properties of the underlying dynamical system. Dynamical systems theory motivates the
authors' investigation of methods for integrating the transition state model, with
significant emphasis on segmentation. The authors acknowledge that precise segmentation
necessitates the appropriate selection of visual features, and that segmentation is an
essential initial step in numerous robot learning applications.
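To make the friction-modeling discussion concrete, the following is a minimal numerical
sketch of the standard Bouc-Wen hysteresis model mentioned above, simulated for a cyclic
proximal displacement. The parameter values are illustrative placeholders, not values
identified for any actual tendon-sheath mechanism.

```python
# Standard Bouc-Wen hysteresis model integrated with a simple Euler scheme.
import numpy as np

def bouc_wen_force(x, dt=1e-3, alpha=0.6, k=40.0, A=1.0, beta=3.0, gamma=2.0, n=1.5):
    """Hysteretic restoring force F(t) for a displacement history x(t).

    F = alpha*k*x + (1 - alpha)*k*z, with internal state z evolving as
    dz/dt = A*dx/dt - beta*|dx/dt|*|z|^(n-1)*z - gamma*(dx/dt)*|z|^n.
    """
    z, forces = 0.0, []
    x_prev = x[0]
    for xi in x:
        dx = (xi - x_prev) / dt
        dz = A * dx - beta * abs(dx) * abs(z) ** (n - 1) * z - gamma * dx * abs(z) ** n
        z += dz * dt
        forces.append(alpha * k * xi + (1 - alpha) * k * z)
        x_prev = xi
    return np.array(forces)

t = np.arange(0.0, 2.0, 1e-3)
displacement = 0.01 * np.sin(2 * np.pi * 1.0 * t)   # cyclic proximal motion (m)
force = bouc_wen_force(displacement)
print(f"force range over the cycle: {force.min():.2f} to {force.max():.2f} N")
```

Plotting force against displacement would show the characteristic hysteresis loop that
the improved Bouc-Wen variants cited above aim to reproduce more accurately.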

7 Limitation of Surgical Robot

Instrumentation: According to a significant portion of the studies that are taken into
consideration in this review, the absence of instrumentation designed specifically for
microsurgery severely limits the capabilities of the current robotic surgical systems.
The majority of the published articles examined the viability of robotic microsurgery
performed with the da Vinci surgical robot. Even though this particular system is
approved for seven different types of minimally invasive surgery, it is not recom-
mended or intended to be used for open plastic and reconstructive microsurgery. The
majority of compatible instruments with the da Vinci are thought to be too big to
handle the delicate tissue that is frequently encountered during microsurgery. Using
the da Vinci’s Black Diamond Micro Forceps, nerves and small vessels can be oper-
ated on with success. However, the process of handling submillimeter tissue and
equipment is time-consuming and difficult due to the absence of a comprehensive
set of appropriate microsurgical instruments. When compared to traditional micro-
surgery, using a surgical robot makes routine microsurgical procedures like dissecting
blood vessels, applying vessel clamps, and handling fine sutures more difficult.
The variety of tissues encountered during microsurgery is not completely covered
by the surgical toolkit. The large instruments are also present. Several operations
involving the upper or lower limbs necessitate manipulating a variety of tissues,
including bone, blood vessels, and skin. Right now, there isn't a robotic system
that has all the tools needed to work on different kinds of tissue. Consequently,
surgical robotics cannot be used exclusively for procedures involving both soft and
hard tissues. Additionally, robotics are ineffective in reconstructive surgery due to the
extensive use of microscopic and macroscopic techniques. It is therefore challenging
and time-consuming to switch between traditional and robotic-assisted microsurgery.
During microsurgery, the right optical aids to magnify and make it easier to see
the surgical field are just as essential as the right surgical instruments for working
on delicate tissue. An endoscopic 3D imaging system with a digital zoom that can
magnify up to ten times is provided by the da Vinci surgical robot. Sadly, the da
Vinci surgical robot’s image quality and magnification are below those of surgical
microscopes. The use of this system is limited because microsurgical procedures
sometimes require more magnification than the da Vinci can provide. It is preferable
to use surgical microscopes or other imaging systems that can provide sufficient
optical magnification while still maintaining high image quality. Due to a lack of
microsurgical instruments and robotic platforms designed specifically for plastic
and reconstructive microsurgery, the theoretical potential of surgical robots cannot
be realized.
Tactile feedback: The lack of tactile feedback during operations has been
cited numerous times by medical professionals as a disadvantage of surgical robotics.
The absence of haptic feedback in current surgical robots is cited as a limitation in just
17 of the reviewed papers. However, it is debatable whether this is a disadvantage.
Even though some people are unhappy that there isn’t haptic feedback, others say
that tactile feedback is optional and can be replaced with something else. It has been
demonstrated that visual feedback during microsurgery can reliably compensate for
this deficit, despite the fact that the capacity to sense the amount of forces applied
to delicate tissue may initially appear to be crucial. In addition, there are those who
contend that microsurgery’s forces are too weak to be felt by humans and should
not be trusted. Although surgical robots do not require tactile feedback, it can still
be beneficial. Soft tissue may deform during manipulation, but the rigid instruments
used in the procedure do not. Numerous clinical studies have demonstrated that
when the needle is handled by two robotic arms, the absence of tactile feedback can
result in needle bending. If implemented in such a way that the surgeon’s forces
are scaled, tactile feedback may also be beneficial. A surgeon can feel and evaluate
artificially increased forces, potentially minimizing unnecessary trauma to delicate
tissue. However, extensive testing is required to determine whether tactile feedback
can lower the risk of soft tissue damage. The differences in tissue trauma between
traditional manual microsurgery and robotic microsurgery with tactile feedback may
be better investigated in future studies.
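The force-scaling idea mentioned above can be sketched very simply: interaction forces
measured at the instrument tip are amplified before being rendered on the surgeon's
haptic handle, and clipped to a safe output range. The gain and limits below are
illustrative placeholders, not values from any clinical system.

```python
# Minimal force-scaling sketch for haptic rendering of microsurgical forces.
def scale_force_for_display(tip_force_n, gain=15.0, max_display_n=4.0):
    """Amplify a small tool-tissue force for the haptic display, with saturation."""
    scaled = gain * tip_force_n
    return max(-max_display_n, min(max_display_n, scaled))

# Microsurgical forces are often only tens of millinewtons; after scaling, the
# surgeon feels them in a range a human hand can discriminate.
for f in (0.01, 0.05, 0.5):          # measured tip forces in newtons
    print(f"{f:.2f} N at the tip -> {scale_force_for_display(f):.2f} N at the handle")
```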
High Price: In the literature, the cost of purchasing, using, and maintaining surgical
robotic systems has been a frequent topic. Because a single surgical robot can cost
more than $2 million, it takes a significant financial commitment to purchase one.
Running the system and providing a safe environment for robot-assisted surgery
come with additional direct and indirect costs. The cost of a procedure’s consumables
can range from $1,800 to $4,600 per instrument. In order to familiarize personnel
working in operating rooms with surgical robots, training resources must be allocated.
In order to guarantee these systems’ dependability outside of the operating room,
more staff is required. Because of their intrinsic complexity, the repair and maintenance
of surgical robots require specialist knowledge. Because of this, hospitals that use surgical robots
have to negotiate service agreements with the manufacturers, which result in a 10%
increase in the system’s annual cost. The use of surgical robots is becoming less
appealing due to the rising costs brought on by the increased demands placed on
personnel and supplies.
If costly treatment options are linked to improved outcomes or increased revenue
over time, hospitals may benefit. However, only a small amount of evidence indicates
that plastic reconstructive microsurgery is an exception. There are currently few
reasons to spend a lot of money on surgical robots. The operating time for robotic-
assisted microsurgery is also longer than that of traditional microsurgery, according to
published data. As a result, waiting times may be longer and the number of patients
who can be treated may be reduced. The claimed paradoxical cost savings from
shorter hospital stays and fewer post-operative complications do not yet outweigh the
investment required to justify the use of surgical robots in plastic and reconstructive
microsurgery. Very few plastic surgery departments will currently be willing to invest
in robotic-assisted surgery unless patient turnover and cost efficiency both rise.
In surgical training, the apprenticeship model is frequently utilized, in which
students first observe a skilled professional before becoming more involved in proce-
dures. Typically, surgical robots only permit a single surgeon to complete the proce-
dure and operate the entire system. As a result, assistants rarely have the opportunity
to participate in robotically assisted tasks. Surgeons’ exposure to surgical robotics and
opportunities to improve their skills may be limited if they are not actively involved.
This problem can be solved by switching surgeons in the middle of the procedure
or by using two or more complete surgical robotic systems. Even though switching
between different users is a quick process, clinical outcomes may be jeopardized
if crucial circumstances are delayed. It might be safer to train new surgeons with
multiple surgical robots. Students can learn the skills necessary for robotic micro-
surgery while also providing the lead surgeon with an assistant who can assist during
the procedure. However, considering that each robotic system can cost more than
$2 million, it is difficult to justify purchasing one solely for training purposes. Last
but not least, it’s important to know that surgical robots shouldn’t replace traditional
microsurgery; rather, they should be seen as an additional tool. The skills required
for each type of microsurgery are very different. Due to the very different movements
and handling of delicate tissue, the skills required to successfully use a surgical robot
in these circumstances cannot be directly applied to conventional microsurgery. For
future surgeons to be able to deal with the many different problems that will arise
during their careers, they will need to receive training in both conventional and
robotic-assisted microsurgery. Therefore, surgical training ought to incorporate both
traditional and robotically assisted surgical experience.

Procedure flow: According to research, traditional manual microsurgery may be
preferable to robotic surgery in some instances. Willems and his colleagues demon-
strated that traditional surgery is quicker than robotic-assisted microsurgery when
there is sufficient access to the surgical field. Only by reviewing patients prior to
any treatment and planning procedures in advance can the best treatment plan be
developed. Because there will always be some degree of uncertainty, it is chal-
lenging to predict which procedures will provide good surgical access and which
will not. Consequently, in order to achieve the desired outcomes, surgeons may need
to switch between robotic and conventional surgery during a procedure. It is abso-
lutely possible to transition during a procedure; however, this is a laborious and
time-consuming process that requires the operating room staff to be knowledgeable
about surgical robots. Costs could rise if this procedure is put off for too long. In
addition, complications may be more likely in situations that extend surgical and anes-
thetic procedures. Surgical robots must be able to accommodate uncertainty during
microsurgery and facilitate a seamless and quick transition between conventional
and robotic microsurgery in order to maximize surgical workflow [122].

8 Future Direction of Surgical Robot

Market competition in laparoscopic robot-assisted surgery should drive down the cost of
RAS systems and supplies, making laparoscopic RAS more affordable. Given the benefits it
provides to the patient and the cost savings it offers, RAS should then be used more
frequently for laparoscopic procedures. Laparoscopic RAS surgery will continue to become
more affordable thanks to the economies of scale that result from lower costs for RAS
systems, supplies, and maintenance [123].
Despite the fact that da Vinci continues to dominate the market for single port
laparoscopic RAS surgery, we can see that a few rival systems are still in the testing
phase. The cost and frequency of single port laparoscopic RAS surgery should go
down as a result of these systems’ availability. Single port laparoscopic RAS surgery
is likely to become the technique of choice for both surgeons and patients due to
the advantages of almost scar-free surgery and the decreasing costs. Endo Wrist
instruments with a single port are likely to be purchased by hospitals that have
purchased the da Vinci Xi system in order to perform both single-port and multi-port
laparoscopic surgery with the same RAS system. As single-port laparoscopic RAS
systems become available in the operating room, we are likely to see an increase in
the use of NOTES for genuine scar-free procedures. Similar to how Intuitive Survival
introduced the dedicated single port laparoscopic RAS system for the da Vinci SP
[123], they will probably introduce instruments that the da Vinci SP can use with
NOTES procedures to compete with the new NOTES-specific systems on the market.
Finally, both new RAS systems and upgrades to existing RAS systems are likely to
include augmented reality as a standard feature. Surgeons will be able to overlay real-
time endoscope camera feeds on top of elements of the operating room workspace
using augmented reality [53, 86]. Technology advancements that can map features
like blood vessels, nerves, and even tumors and overlay their locations on the
surgeon’s display in real time have made this possible [54–56, 80]. Overlaid medical
images can also include images taken prior to a diagnosis or intervention design.
By assisting the surgeon in locating the area of interest and avoiding major blood
vessels and nerves that could cause the patient problems after surgery, this will help
the surgeon provide the safest and best care possible throughout the intervention.
New surgical systems that improve either manipulation or imaging, two essential
aspects of surgery, must be researched. Given the widespread adoption of these
technologies, it seems inevitable that new and improved imaging will be developed.
They must continue in order to keep up with robotic technology advancements on
the manipulation side [124].
The use of robotic surgery is still in its infancy. Equipment is incorporating new
technologies to boost performance and cut down on downtime. Siemens employee
Balasubramaniac asserts that digital twins and AI will improve future performance.
The procedure can undoubtedly be recorded and analyzed in the future for educa-
tional and process improvement purposes using the digital twin technology. It is
necessary to keep a minute-by-minute record of the process. There is a lot of hope that
robotic surgery will eventually improve precision, efficiency, and safety while poten-
tially lowering healthcare costs. Additionally, it may facilitate access to specialists
in difficult-to-reach locations. Santosh Kesari, M.D., Ph.D., co-founder and director
of neuro-oncology at the Pacific Neuroscience Institute in Santa Monica, California,
stated, "Access to surgical expertise is limited in many rural areas of the United
States as well as in many parts of the world." It is anticipated that robotic-assisted
surgical equipment will be utilized by a growing number of healthcare facilities
for both in-person and online procedures. The technology will keep developing and
improving.
The technology of the future will be more adaptable, portable, and based on AI.
Additional robotic equipment, such as handheld devices, will be developed to accel-
erate telehealth and remote care. How quickly high-speed communication infrastruc-
ture is established will play a role in this. 5G will be useful due to its 20 Gbps peak
data rate and 1 ms latency, but 6G is anticipated to be even better. With a latency
of 0.1 ms, 6G’s peak data rate ought to theoretically reach one terabit per second.
However, speeds can vary significantly depending on the technology’s application
and location. Open Signal, a company that monitors 5G performance all over the
world, asserts that South Korea frequently takes the lead in achieving the fastest 5G
performance, such as Ultra-Wideband download speeds of 988.37 Mbps. Verizon,
on the other hand, recently achieved a peak speed of 1.13 Gbps. The speed is signifi-
cantly impacted by the position of the 5G antennas. Even if you only reach your peak
performance once, that does not mean it will last. 5G has a long way to go before it
reaches 20 Gbps, even though it is currently at 1 Gbps. In conclusion, the medical
field can benefit greatly from remote robotic-assisted surgery. There are numerous
advantages. Ramp-up time will be affected by reliable communications systems and
secure chips, as well as the capacity to monitor each component in the numerous
interconnected systems that must cooperate for RAS to be successful.
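As a back-of-the-envelope illustration of why these latency figures matter for remote
RAS, the sketch below sums an assumed command-and-video round trip. All numbers are
illustrative assumptions rather than measurements of any particular network or system.

```python
# Rough remote-teleoperation latency budget (all values are assumptions).
def round_trip_ms(network_one_way_ms, encode_ms=5.0, decode_ms=5.0, robot_cycle_ms=1.0):
    """Command out plus video back, with codec and robot-controller overheads."""
    return 2 * network_one_way_ms + encode_ms + decode_ms + robot_cycle_ms

for label, one_way in (("5G (best case)", 1.0), ("5G (typical)", 10.0), ("6G (target)", 0.1)):
    print(f"{label:14s}: ~{round_trip_ms(one_way):.1f} ms round trip")
```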

9 Discussion

Verb Surgical is a joint venture between Johnson and Johnson's medical device division
Ethicon and Google's life sciences division Verily. It has recently designed its first
digital surgery prototype, boasting leading-edge robotic capabilities and best-in-class
medical device technology. Robotics, visualization, advanced instrumentation, data
analytics, and connectivity are its principal pillars. IBM's Watson also aspires to be an
intelligent surgical assistant. It is a gateway to vast amounts of clinical information,
using natural language processing to answer a surgeon's queries. It is currently being
used to analyze electronic medical records and match tumor characteristics with the aim
of forming more personalized treatment plans. Surgery may be further democratized by
low-latency, ultrafast 5G connectivity. The Internet of Skills could make remote robotic
surgery, teaching, and mentorship readily accessible, independent of the location of the
expert surgeon [125]. In summary, the three watchwords for the future of robotic surgery
are cost, data, and connectivity. The effect of these advances on patient care is being
watched with considerable interest. The authors intend to examine whether performance
improves when the CNNs are trained with surgical images [85]. The authors will also
investigate how to extract consistent structure across inconsistent demonstrations,
having found that some surgical performances contain loops, i.e., repetitive motions in
which the surgeon repeats a subtask until it succeeds. Merging these motions into a
single primitive is an important priority. The next step is to apply this and future
automated segmentation methods to skill assessment and policy learning.
A potential next step of this work is to use the weighting factor matrix from boosting
techniques to train the unified state estimation model more efficiently. Although modeled
as an FSM, the fine-grained states within each surgical task are currently estimated
independently, without influence from the previous state(s). Another potential next step
is therefore to perform state prediction based on the previously estimated state
sequence. In the future, the authors also plan to apply this state estimation framework
to applications such as smart assistance technologies and supervised autonomy for
surgical subtasks.
This study had several limitations. First, the proposed framework was applied to video
sets of a training model and of patients with thyroid cancer who underwent BABA surgery.
It is necessary to verify the effectiveness of the proposed framework using other
surgical techniques and surgical regions. Second, the authors could not directly compare
the performance of the kinematics-based and the proposed image-based methods because
access to the da Vinci Research Interface is restricted, allowing most researchers to
obtain only raw kinematic data [85]. Nonetheless, past studies have reported that the
kinematics method using the da Vinci robot had an error of around 4 mm [9]. Direct
comparison of performance is difficult because the surgical images used in the previous
studies and in this study differed. Nonetheless, the average RMSE of the proposed
image-based tracking algorithm was 3.52 mm, indicating that this method is more accurate
than the kinematics method and that the latter cannot be described as superior. The
performance of the current method could not be directly compared with previous visual
methods because no comparable study has detected and tracked the tip trajectories of the
surgical instruments (SIs). Nonetheless, studies have used deep learning-based detection
methods to determine the bounding boxes of the SIs and to show the trajectories of the
center points of these boxes [94, 95]. Nevertheless, because this approach cannot
determine the specific locations of the SIs, it cannot intuitively be regarded as an
exact tracking method. Comparison of the quantitative performance of the proposed method
with other approaches is important, making it necessary to examine other SI tracking
techniques. Third, because SIs are detected in two-dimensional views, errors may occur
owing to the absence of depth information. Errors of magnification were therefore
minimized by measuring the width of the SIs in the view and converting pixels to
millimeters. Nevertheless, methods are needed that use three-dimensional information
based on stereoscopic matching of left and right images during robotic surgery [10, 11].
Fourth, because the proposed method is a combination of several algorithms, longer videos
can result in the accumulation of additional errors, degrading the performance of the
system. Consequently, it is particularly important to train additional negative examples
with the instance segmentation framework, which is the start of the pipeline; for
example, cloth or tubes in the robotic surgery view can be mistaken for SIs
(Supplementary Figure S4). Finally, because errors from re-identification in the tracking
framework could fundamentally affect the ability to determine correct trajectories,
accurate assessment of surgical skills requires manual correction of such errors.
Despite the progress of the present work, several limitations of deep learning models remain on the way toward a capable online skill assessment. First, as confirmed by the results, the classification accuracy of supervised deep learning depends heavily on the labeled samples. The primary concern in this study lies with the JIGSAWS dataset and the lack of strict ground-truth labels of skill levels. It is important to mention that there is a lack of consensus in the ground-truth annotation of surgical skills. In the GRS-based labeling, skill labels were annotated based on a predefined cutoff threshold of GRS scores; however, no commonly accepted cutoff exists. For future work, a refined labeling approach with stronger ground-truth knowledge of surgeon skill might further improve the overall skill assessment [9, 10]. Second, the authors will look for a detailed optimization of the deep architecture, parameter settings, and augmentation strategies to better handle motion time-series data and further improve the online performance. Likewise, the interpretability of automatically learned representations is currently limited because of the black-box nature of deep learning models. It would be interesting to examine a visualization of deep hierarchical representations to understand hidden skill patterns, in order to better justify the decision taken by a deep learning classifier.

At this point, the authors acknowledge that the main limitation of the deep architecture is its high computational cost. Running several deep neural networks simultaneously requires multiple processing units, which limits the update rates of the trackers. Lightweight deep neural networks would be well suited for real-time surgical applications, provided they can be adapted without sacrificing accuracy. As recent progress has been made on deep learning-based modeling and transmission methods [117, 118], a future direction could be using a learnable tissue tracker and instrument tracker to further improve the perception framework. Another direction to pursue is surgical task automation. By including the perceived environment as feedback, controllers applied to the surgical instrument will be able to accomplish tasks in unstructured, deforming surgical scenes.

10 Conclusions

Deep learning techniques are currently surpassing the prior state of the art in a wide range of robotics, medical robotics, and drug development efforts. Returning to our main question: has surgical robot research been transformed by deep learning in light of this rapid advancement? Although the answer depends directly on the specific problem and domain, we believe that deep learning has yet to trigger a decisive turning point in this transformation. Although deep learning is regarded as holding a strong position in many different areas, such issues have not yet been resolved by incremental improvements in predictive capability. This research focuses on surgical robots and deep learning: progress, achievements, and future perspectives. This area requires more attention; in the future, more medical and surgical robots are needed.

References

1. Levine, S., Finn, C., Darrell, T., & Abbeel, P. (2016). End-to-end training of deep visuomotor
policies. The Journal of Machine Learning Research, 17(1), 1334–1373.
2. Bakshi, G., Kumar, A., & Puranik, A. N. (2022). Adoption of robotics technology in healthcare
sector. In Advances in communication, devices and networking (pp. 405–414). Singapore:
Springer.
3. Maibaum, A., Bischof, A., Hergesell, J., & Lipp, B. (2022). A critique of robotics in health
care. AI & Society, 37(2), 467–477.
4. Tasioulas, J. (2019). First steps towards an ethics of robots and artificial intelligence. Journal
of Practical Ethics, 7(1).
5. Hallevy, G. (2013). When robots kill: Artificial intelligence under criminal law. UPNE.
6. Bryndin, E. (2019). Robots with artificial intelligence and spectroscopic sight in hi-tech labor
market. International Journal of Systems Science and Applied Mathematic, 4(3), 31–37.
7. Lopes, V., Alexandre, L. A. & Pereira, N. (2019). Controlling robots using artificial
intelligence and a consortium blockchain. arXiv:1903.00660.
8. Bataev, A. V., Dedyukhina, N., & Nasrutdinov, M. N. (2020, February). Innovations in the
financial sphere: performance evaluation of introducing service robots with artificial intel-
ligence. In 2020 9th International Conference on Industrial Technology and Management
(ICITM) (pp. 256–260). IEEE.
9. Nitto, H., Taniyama, D., & Inagaki, H. (2017). Social acceptance and impact of robots and
artificial intelligence. Nomura Research Institute Papers, 211, 1–15.
10. Yoganandhan, A., Kanna, G. R., Subhash, S. D., & Jothi, J. H. (2021). Retrospective and
prospective application of robots and artificial intelligence in global pandemic and epidemic
diseases. Vacunas (English Edition), 22(2), 98–105.
11. Rajan, K., & Saffiotti, A. (2017). Towards a science of integrated AI and Robotics. Artificial
Intelligence, 247, 1–9.
12. Chatila, R., Renaudo, E., Andries, M., Chavez-Garcia, R. O., Luce-Vayrac, P., Gottstein, R.,
Alami, R., Clodic, A., Devin, S., Girard, B., & Khamassi, M. (2018). Toward self-aware
robots. Frontiers in Robotics and AI, 5, 88.
13. Gonzalez-Jimenez, H. (2018). Taking the fiction out of science fiction:(Self-aware) robots
and what they mean for society, retailers and marketers. Futures, 98, 49–56.
14. Schostek, S., Schurr, M. O., & Buess, G. F. (2009). Review on aspects of artificial tactile
feedback in laparoscopic surgery. Medical Engineering & Physics, 31(8), 887–898.
15. Naitoh, T., Gagner, M., Garcia-Ruiz, A., Heniford, B. T., Ise, H., & Matsuno, S. (1999). Hand-
assisted laparoscopic digestive surgery provides safety and tactile sensation for malignancy
or obesity. Surgical Endoscopy, 13(2), 157–160.
16. Schostek, S., Ho, C. N., Kalanovic, D., & Schurr, M. O. (2006). Artificial tactile sensing in
minimally invasive surgery–a new technical approach. Minimally Invasive Therapy & Allied
Technologies, 15(5), 296–304.
17. Kraft, B. M., Jäger, C., Kraft, K., Leibl, B. J., & Bittner, R. (2004). The AESOP robot system in
laparoscopic surgery: Increased risk or advantage for surgeon and patient? Surgical Endoscopy
And Other Interventional Techniques, 18(8), 1216–1223.
18. Troisi, R. I., Patriti, A., Montalti, R., & Casciola, L. (2013). Robot assistance in liver surgery:
A real advantage over a fully laparoscopic approach? Results of a comparative bi-institutional
analysis. The International Journal of Medical Robotics and Computer Assisted Surgery, 9(2),
160–166.
19. Dupont, P. E., Nelson, B. J., Goldfarb, M., Hannaford, B., Menciassi, A., O’Malley, M. K.,
Simaan, N., Valdastri, P., & Yang, G. Z. (2021). A decade retrospective of medical robotics
research from 2010 to 2020. Science Robotics, 6(60), eabi8017.
20. Fuchs, K. H. (2002). Minimally invasive surgery. Endoscopy, 34(02), 154–159.
21. Robinson, T. N., & Stiegmann, G. V. (2004). Minimally invasive surgery. Endoscopy, 36(01),
48–51.
22. McDonald, G. J. (2021) Design and modeling of millimeter-scale soft robots for medical
applications (Doctoral dissertation, University of Minnesota).
23. Currò, G., La Malfa, G., Caizzone, A., Rampulla, V., & Navarra, G. (2015). Three-dimensional
(3D) versus two-dimensional (2D) laparoscopic bariatric surgery: A single-surgeon prospec-
tive randomized comparative study. Obesity Surgery, 25(11), 2120–2124.
24. Dogangil, G., Davies, B. L., & Rodriguez, Y., & Baena, F. (2010) A review of medical robotics
for minimally invasive soft tissue surgery. Proceedings of the Institution of Mechanical
Engineers, Part H: Journal of Engineering in Medicine, 224(5), 653–679.
25. Yu, L., Wang, Z., Yu, P., Wang, T., Song, H., & Du, Z. (2014). A new kinematics method
based on a dynamic visual window for a surgical robot. Robotica, 32(4), 571–589.
26. Byrn, J. C., Schluender, S., Divino, C. M., Conrad, J., Gurland, B., Shlasko, E., & Szold,
A. (2007). Three-dimensional imaging improves surgical performance for both novice and
experienced operators using the da Vinci Robot System. The American Journal of Surgery,
193(4), 519–522.
27. Kim, S., Chung, J., Yi, B. J., & Kim, Y. S. (2010). An assistive image-guided surgical robot
system using O-arm fluoroscopy for pedicle screw insertion: Preliminary and cadaveric study.
Neurosurgery, 67(6), 1757–1767.
28. Nagy, T. D., & Haidegger, T. (2019). A dvrk-based framework for surgical subtask automation.
Acta Polytechnica Hungarica (pp. 61–78).
29. Millan, B., Nagpal, S., Ding, M., Lee, J. Y., & Kapoor, A. (2021). A scoping review of
emerging and established surgical robotic platforms with applications in urologic surgery.
Société Internationale d’Urologie Journal, 2(5), 300–310
30. Nagyné Elek, R., & Haidegger, T. (2019). Robot-assisted minimally invasive surgical skill
assessment—Manual and automated platforms. Acta Polytechnica Hungarica, 16(8), 141–
169.
31. Okamura, A. M. (2009). Haptic feedback in robot-assisted minimally invasive surgery. Current
Opinion Urology, 19(1), 102.
32. Bark, K., McMahan, W., Remington, A., Gewirtz, J., Wedmid, A., Lee, D. I., & Kuchenbecker,
K. J. (2013). In vivo validation of a system for haptic feedback of tool vibrations in robotic
surgery. Surgical Endoscopy, 27(2), 656–664.
33. Van der Meijden, O. A., & Schijven, M. P. (2009). The value of haptic feedback in conventional
and robot-assisted minimal invasive surgery and virtual reality training: A current review.
Surgical Endoscopy, 23(6), 1180–1190.
34. Bethea, B. T., Okamura, A. M., Kitagawa, M., Fitton, T. P., Cattaneo, S. M., Gott, V. L.,
Baumgartner, W. A., & Yuh, D. D. (2004). Application of haptic feedback to robotic surgery.
Journal of Laparoendoscopic & Advanced Surgical Techniques, 14(3), 191–195.
35. Amirabdollahian, F., Livatino, S., Vahedi, B., Gudipati, R., Sheen, P., Gawrie-Mohan, S., &
Vasdev, N. (2018). Prevalence of haptic feedback in robot-mediated surgery: A systematic
review of literature. Journal of robotic surgery, 12(1), 11–25.
36. Okamura, A. M. (2004). Methods for haptic feedback in teleoperated robot-assisted surgery.
Industrial Robot: An International Journal, 31(6), 499–508.
37. Pacchierotti, C., Scheggi, S., Prattichizzo, D., & Misra, S. (2016). Haptic feedback for
microrobotics applications: A review. Frontiers in Robotics and AI, 3, 53.
38. Yeh, C. H., Su, F. C., Shan, Y. S., Dosaev, M., Selyutskiy, Y., Goryacheva, I., & Ju, M. S.
(2020). Application of piezoelectric actuator to simplified haptic feedback system. Sensors
and Actuators A: Physical, 303, 111820.
39. Okamura, A. M., Dennerlein, J. T., & Howe, R. D. (1998, May). Vibration feedback models for
virtual environments. In Proceedings of the 1998 IEEE International Conference on Robotics
and Automation (Cat. No. 98CH36146) (Vol. 1, pp. 674–679). IEEE.
40. Luostarinen, L. O., Åman, R., & Handroos, H. (2016, October). Haptic joystick for improving
controllability of remote-operated hydraulic mobile machinery. In Fluid Power Systems
Technology (Vol. 50473, p. V001T01A003). American Society of Mechanical Engineers.
41. Shang, W., Su, H., Li, G., & Fischer, G. S. (2013, November). Teleoperation system with hybrid
pneumatic-piezoelectric actuation for MRI-guided needle insertion with haptic feedback. In
2013 IEEE/RSJ International Conference on Intelligent Robots and Systems (pp. 4092–4098).
IEEE.
42. Kim, P., Kim, S., Park, Y. D., & Choi, S. B. (2016). Force modeling for incisions into various
tissues with MRF haptic master. Smart Materials and Structures, 25(3), 035008.
43. Hooshiar, A., Payami, A., Dargahi, J., & Najarian, S. (2021). Magnetostriction-based
force feedback for robot-assisted cardiovascular surgery using smart magnetorheological
elastomers. Mechanical Systems and Signal Processing, 161, 107918.
44. Shokrollahi, E., Goldenberg, A. A., Drake, J. M., Eastwood, K. W., & Kang, M. (2018,
December). Application of a nonlinear Hammerstein-Wiener estimator in the development
and control of a magnetorheological fluid haptic device for robotic bone biopsy. In Actuators
(Vol. 7, No. 4, p. 83). MDPI.
45. Najmaei, N., Asadian, A., Kermani, M. R., & Patel, R. V. (2015). Design and performance
evaluation of a prototype MRF-based haptic interface for medical applications. IEEE/ASME
Transactions on Mechatronics, 21(1), 110–121.
46. Song, Y., Guo, S., Yin, X., Zhang, L., Wang, Y., Hirata, H., & Ishihara, H. (2018). Design and
performance evaluation of a haptic interface based on MR fluids for endovascular tele-surgery.
Microsystem Technologies, 24(2), 909–918.
47. Kikuchi, T., Takano, T., Yamaguchi, A., Ikeda, A. and Abe, I. (2021, September). Haptic
interface with twin-driven MR fluid actuator for teleoperation endoscopic surgery system. In
Actuators (Vol. 10, No. 10, p. 245). MDPI.
48. Najmaei, N., Asadian, A., Kermani, M. R. & Patel, R. V. (2015, September). Performance
evaluation of Magneto-Rheological based actuation for haptic feedback in medical applica-
tions. In 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
(pp. 573–578). IEEE.
49. Gao, Q., Zhan, Y., Song, Y., Liu, J., & Wu, J. (2021, August). An MR fluid based master manip-
ulator of the vascular intervention robot with haptic feedback. In 2021 IEEE International
Conference on Mechatronics and Automation (ICMA) (pp. 158–163). IEEE.
50. Nguyen, N. D., Truong, T. D., Nguyen, D. H. & Nguyen, Q. H. (2019, March). Development
of a 3D haptic spherical master manipulator based on MRF actuators. In Active and Passive
Smart Structures and Integrated Systems XIII (Vol. 10967, pp. 431–440). SPIE.
51. Kim, S., Kim, P., Park, C. Y., & Choi, S. B. (2016). A new tactile device using magneto-
rheological sponge cells for medical applications: Experimental investigation. Sensors and
Actuators A: Physical, 239, 61–69.
52. Cha, S. W., Kang, S. R., Hwang, Y. H., & Choi, S. B. (2017, April). A single of MR sponge
tactile sensor design for medical applications. In Active and Passive Smart Structures and
Integrated Systems (Vol. 10164, pp. 520–525). SPIE.
53. Oh, J. S., Sohn, J. W., & Choi, S. B. (2018). Material characterization of hardening soft sponge
featuring MR fluid and application of 6-DOF MR haptic master for robot-assisted surgery.
Materials, 11(8), 1268.
54. Park, Y. J., & Choi, S. B. (2021). A new tactile transfer cell using magnetorheological materials
for robot-assisted minimally invasive surgery. Sensors, 21(9), 3034.
55. Park, Y. J., Yoon, J. Y., Kang, B. H., Kim, G. W., & Choi, S. B. (2020). A tactile device
generating repulsive forces of various human tissues fabricated from magnetic-responsive
fluid in porous polyurethane. Materials, 13(5), 1062.
56. Park, Y. J., Lee, E. S., & Choi, S. B. (2022). A cylindrical grip type of tactile device using
Magneto-Responsive materials integrated with surgical robot console: design and analysis.
Sensors, 22(3), 1085.
57. Martin, J. A., Regehr, G., Reznick, R., Macrae, H., Murnaghan, J., Hutchison, C., & Brown,
M. (1997). Objective structured assessment of technical skill (OSATS) for surgical residents.
British Journal of Surgery, 84(2), 273–278.
58. Vassiliou, M. C., Feldman, L. S., Andrew, C. G., Bergman, S., Leffondré, K., Stanbridge, D., &
Fried, G. M. (2005). A global assessment tool for evaluation of intraoperative laparoscopic
skills. The American Journal of Surgery, 190(1), 107–113.
59. Goh, A. C., Goldfarb, D. W., Sander, J. C., Miles, B. J., & Dunkin, B. J. (2012). Global
evaluative assessment of robotic skills: Validation of a clinical assessment tool to measure
robotic surgical skills. The Journal of Urology, 187(1), 247–252.
60. Insel, A., Carofino, B., Leger, R., Arciero, R., & Mazzocca, A. D. (2009). The development
of an objective model to assess arthroscopic performance. JBJS, 91(9), 2287–2295.
61. Champagne, B. J., Steele, S. R., Hendren, S. K., Bakaki, P. M., Roberts, P. L., Delaney, C. P.,
Brady, J. T., & MacRae, H. M. (2017). The American Society of Colon and Rectal Surgeons
assessment tool for performance of laparoscopic colectomy. Diseases of the Colon & Rectum,
60(7), 738–744.
62. Koehler, R. J., Amsdell, S., Arendt, E. A., Bisson, L. J., Bramen, J. P., Butler, A., Cosgarea, A.
J., Harner, C. D., Garrett, W. E., Olson, T., & Warme, W. J. (2013). The arthroscopic surgical
skill evaluation tool (ASSET). The American Journal of Sports Medicine, 41(6), 1229–1237.
63. Shademan, A., Decker, R. S., Opfermann, J. D., Leonard, S., Krieger, A., & Kim, P. C. (2016).
Supervised autonomous robotic soft tissue surgery. Science Translational Medicine, 8(337),
337ra64–337ra64.
64. Garrow, C. R., Kowalewski, K. F., Li, L., Wagner, M., Schmidt, M. W., Engelhardt, S.,
Hashimoto, D. A., Kenngott, H. G., Bodenstedt, S., Speidel, S., & Mueller-Stich, B. P. (2021).
Machine learning for surgical phase recognition: A systematic review. Annals of Surgery,
273(4), 684–693.
65. Deo, R. C. (2015). Machine learning in medicine. Circulation, 132(20), 1920–1930.


66. Lee, C. K., Hofer, I., Gabel, E., Baldi, P., & Cannesson, M. (2018). Development and vali-
dation of a deep neural network model for prediction of postoperative in-hospital mortality.
Anesthesiology, 129(4), 649–662.
67. Maier-Hein, L., Vedula, S. S., Speidel, S., Navab, N., Kikinis, R., Park, A., Eisenmann, M.,
Feussner, H., Forestier, G., Giannarou, S., & Hashizume, M. (2017). Surgical data science for
next-generation interventions. Nature Biomedical Engineering, 1(9), 691–696.
68. Maier-Hein, L., Eisenmann, M., Sarikaya, D., März, K., Collins, T., Malpani, A., Fallert, J.,
Feussner, H., Giannarou, S., Mascagni, P., & Nakawala, H. (2022). Surgical data science–from
concepts toward clinical translation. Medical Image Analysis, 76, 102306.
69. Kosak, O., Wanninger, C., Angerer, A., Hoffmann, A., Schiendorfer, A., & Seebach, H.
(2016, September). Towards self-organizing swarms of reconfigurable self-aware robots. In
2016 IEEE 1st International Workshops on Foundations and Applications of Self * Systems
(FAS* W) (pp. 204–209). IEEE.
70. Pierson, H. A., & Gashler, M. S. (2017). Deep learning in robotics: A review of recent research.
Advanced Robotics, 31(16), 821–835.
71. Sünderhauf, N., Brock, O., Scheirer, W., Hadsell, R., Fox, D., Leitner, J., Upcroft, B., Abbeel,
P., Burgard, W., Milford, M., & Corke, P. (2018). The limits and potentials of deep learning
for robotics. The International Journal of Robotics Research, 37(4–5), 405–420.
72. Miyajima, R. (2017). Deep learning triggers a new era in industrial robotics. IEEE Multimedia,
24(4), 91–96.
73. Degrave, J., Hermans, M., & Dambre, J. (2019) A differentiable physics engine for deep
learning in robotics. Frontiers in Neurorobotics, 6.
74. Károly, A. I., Galambos, P., Kuti, J., & Rudas, I. J. (2020). Deep learning in robotics:
Survey on model structures and training strategies. IEEE Transactions on Systems, Man,
and Cybernetics: Systems, 51(1), 266–279.
75. Mouha, R. A. (2021). Deep learning for robotics. Journal of Data Analysis and Information
Processing, 9(02), 63.
76. Morales, E. F., Murrieta-Cid, R., Becerra, I., & Esquivel-Basaldua, M. A. (2021). A survey on
deep learning and deep reinforcement learning in robotics with a tutorial on deep reinforcement
learning. Intelligent Service Robotics, 14(5), 773–805.
77. McLaughlin, E., Charron, N., & Narasimhan, S. (2020). Automated defect quantification in
concrete bridges using robotics and deep learning. Journal of Computing in Civil Engineering,
34(5), 04020029.
78. Rassweiler, J. J., Autorino, R., Klein, J., Mottrie, A., Goezen, A. S., Stolzenburg, J. U., Rha,
K. H., Schurr, M., Kaouk, J., Patel, V., & Dasgupta, P. (2017). Future of robotic surgery in
urology. BJU International, 120(6), 822–841.
79. Chang, K. D., Abdel Raheem, A., Choi, Y. D., Chung, B. H., & Rha, K. H. (2018). Retzius-
sparing robot-assisted radical prostatectomy using the Revo-i robotic surgical system: Surgical
technique and results of the first human trial. BJU International, 122(3), 441–448.
80. Chen, J., Oh, P. J., Cheng, N., Shah, A., Montez, J., Jarc, A., Guo, L., Gill, I. S., & Hung,
A. J. (2018). Use of automated performance metrics to measure surgeon performance during
robotic vesicourethral anastomosis and methodical development of a training tutorial. The
Journal of Urology, 200(4), 895–902.
81. Burgner-Kahrs, J., Rucker, D. C., & Choset, H. (2015). Continuum robots for medical
applications: A survey. IEEE Transactions on Robotics, 31(6), 1261–1280.
82. Münzer, B., Schoeffmann, K., & Böszörmenyi, L. (2018). Content-based processing and
analysis of endoscopic images and videos: A survey. Multimedia Tools and Applications,
77(1), 1323–1362.
83. Speidel, S., Delles, M., Gutt, C., & Dillmann, R. (2006, August). Tracking of instruments in
minimally invasive surgery for surgical skill analysis. In International Workshop on Medical
Imaging and Virtual Reality (pp. 148–155). Berlin, Heidelberg: Springer.
84. Doignon, C., Nageotte, F., & Mathelin, M. D. (2006). Segmentation and guidance of multiple
rigid objects for intra-operative endoscopic vision. In Dynamical Vision (pp. 314–327). Berlin,
Heidelberg: Springer.
85. Pezzementi, Z., Voros, S., & Hager, G. D. (2009, May). Articulated object tracking by
rendering consistent appearance parts. In 2009 IEEE International Conference on Robotics
and Automation (pp. 3940–3947). IEEE.
86. Bouget, D., Benenson, R., Omran, M., Riffaud, L., Schiele, B., & Jannin, P. (2015). Detecting
surgical tools by modelling local appearance and global shape. IEEE Transactions on Medical
Imaging, 34(12), 2603–2617.
87. Ching, T., Himmelstein, D. S., Beaulieu-Jones, B. K., Kalinin, A. A., Do, B. T., Way, G. P.,
Ferrero, E., Agapow, P. M., Zietz, M., Hoffman, M. M., & Xie, W. (2018). Opportunities and
obstacles for deep learning in biology and medicine. Journal of The Royal Society Interface,
15(141), 20170387.
88. Kalinin, A. A., Higgins, G. A., Reamaroon, N., Soroushmehr, S., Allyn-Feuer, A., Dinov, I.
D., Najarian, K., & Athey, B. D. (2018). Deep learning in pharmacogenomics: From gene
regulation to patient stratification. Pharmacogenomics, 19(7), 629–650.
89. Yong, C. W., Teo, K., Murphy, B. P., Hum, Y. C., Tee, Y. K., Xia, K., & Lai, K. W. (2021).
Knee osteoarthritis severity classification with ordinal regression module. Multimedia Tools
and Applications, 1–13.
90. Tiulpin, A., Thevenot, J., Rahtu, E., Lehenkari, P., & Saarakkala, S. (2018). Automatic knee
osteoarthritis diagnosis from plain radiographs: A deep learning-based approach. Scientific
Reports, 8(1), 1–10.
91. Iglovikov, V. I., Rakhlin, A., Kalinin, A. A., & Shvets, A.A. (2018). Paediatric bone age assess-
ment using deep convolutional neural networks. In Deep learning in medical image analysis
and multimodal learning for clinical decision support (pp. 300–308). Cham: Springer.
92. Garcia-Peraza-Herrera, L. C., Li, W., Fidon, L., Gruijthuijsen, C., Devreker, A., Attilakos,
G., Deprest, J., Vander Poorten, E., Stoyanov, D., Vercauteren, T., & Ourselin, S. (2017,
September). Toolnet: holistically-nested real-time segmentation of robotic surgical tools.
In 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
(pp. 5717–5722). IEEE.
93. Attia, M., Hossny, M., Nahavandi, S., & Asadi, H. (2017, October). Surgical tool segmentation
using a hybrid deep CNN-RNN auto encoder-decoder. In 2017 IEEE International Conference
on Systems, Man, and Cybernetics (SMC) (pp. 3373–3378). IEEE.
94. Pakhomov, D., Premachandran, V., Allan, M., Azizian, M., & Navab, N. (2019, October). Deep
residual learning for instrument segmentation in robotic surgery. In International Workshop
on Machine Learning in Medical Imaging (pp. 566–573). Cham: Springer.
95. Solovyev, R., Kustov, A., Telpukhov, D., Rukhlov, V., & Kalinin, A. (2019, January).
Fixed-point convolutional neural network for real-time video processing in FPGA. In 2019
IEEE Conference of Russian Young Researchers in Electrical and Electronic Engineering
(EIConRus) (pp. 1605–1611). IEEE.
96. Shvets, A. A., Rakhlin, A., Kalinin, A. A., & Iglovikov, V. I. (2018, December). Automatic
instrument segmentation in robot-assisted surgery using deep learning. In 2018 17th IEEE
International Conference on Machine Learning and Applications (ICMLA) (pp. 624–628).
IEEE.
97. Ronneberger, O., Fischer, P., & Brox, T. (2015, October). U-net: Convolutional networks for
biomedical image segmentation. In International Conference on Medical Image Computing
and Computer-Assisted Intervention (pp. 234–241). Cham: Springer.
98. Hamad, G. G., & Curet, M. (2010). Minimally invasive surgery. The American Journal of
Surgery, 199(2), 263–265.
99. Phee, S. J., Low, S. C., Huynh, V. A., Kencana, A. P., Sun, Z. L. & Yang, K. (2009, September).
Master and slave transluminal endoscopic robot (MASTER) for natural orifice transluminal
endoscopic surgery. In 2009 Annual International Conference of the IEEE Engineering in
Medicine and Biology Society (pp. 1192–1195). IEEE.
100. Wang, Z., Sun, Z., & Phee, S. J. (2013). Haptic feedback and control of a flexible surgical
endoscopic robot. Computer Methods and Programs in Biomedicine, 112(2), 260–271.
101. Ehrampoosh, S., Dave, M., Kia, M. A., Rablau, C., & Zadeh, M. H. (2013). Providing haptic
feedback in robot-assisted minimally invasive surgery: A direct optical force-sensing solution
for haptic rendering of deformable bodies. Computer Aided Surgery, 18(5–6), 129–141.
102. Akinbiyi, T., Reiley, C. E., Saha, S., Burschka, D., Hasser, C. J., Yuh, D .D. & Okamura, A.
M. (2006, September). Dynamic augmented reality for sensory substitution in robot-assisted
surgical systems. In 2006 International Conference of the IEEE Engineering in Medicine and
Biology Society (pp. 567–570). IEEE.
103. Tavakoli, M., Aziminejad, A., Patel, R. V., & Moallem, M. (2006). Methods and mechanisms
for contact feedback in a robot-assisted minimally invasive environment. Surgical Endoscopy
and Other Interventional Techniques, 20(10), 1570–1579.
104. Hayward, V., Astley, O. R., Cruz-Hernandez, M., Grant, D., & Robles-De-La-Torre, G. (2004).
Haptic interfaces and devices. Sensor Review.
105. Rosen, J., Hannaford, B., MacFarlane, M. P., & Sinanan, M. N. (1999). Force controlled and
teleoperated endoscopic grasper for minimally invasive surgery-experimental performance
evaluation. IEEE Transactions on Biomedical Engineering, 46(10), 1212–1221.
106. Tholey, G., Pillarisetti, A., Green, W., & Desai, J. P. (2004, June). Design, development,
and testing of an automated laparoscopic grasper with 3-D force measurement capability. In
International Symposium on Medical Simulation (pp. 38–48). Berlin, Heidelberg: Springer.
107. Tadano, K., & Kawashima, K. (2010). Development of a master–slave system with force-
sensing abilities using pneumatic actuators for laparoscopic surgery. Advanced Robotics,
24(12), 1763–1783.
108. Valdastri, P., Harada, K., Menciassi, A., Beccai, L., Stefanini, C., Fujie, M., & Dario, P. (2006).
Integration of a miniaturised triaxial force sensor in a minimally invasive surgical tool. IEEE
Transactions on Biomedical Engineering, 53(11), 2397–2400.
109. Howe, R. D., Peine, W. J., Kantarinis, D. A., & Son, J. S. (1995). Remote palpation technology.
IEEE Engineering in Medicine and Biology Magazine, 14(3), 318–323.
110. Ohtsuka, T., Furuse, A., Kohno, T., Nakajima, J., Yagyu, K., & Omata, S. (1995). Application
of a new tactile sensor to thoracoscopic surgery: Experimental and clinical study. The Annals
of Thoracic Surgery, 60(3), 610–614.
111. Lai, W., Cao, L., Xu, Z., Phan, P. T., Shum, P., & Phee, S. J. (2018, May). Distal end force
sensing with optical fiber bragg gratings for tendon-sheath mechanisms in flexible endo-
scopic robots. In 2018 IEEE International Conference on Robotics and Automation (ICRA)
(pp. 5349–5255). IEEE.
112. Kaneko, M., Wada, M., Maekawa, H., & Tanie, K. (1991, January). A new consideration on
tendon-tension control system of robot hands. In Proceedings of the 1991 IEEE International
Conference on Robotics and Automation (pp. 1028–1029). IEEE Computer Society.
113. Lampaert, V., Swevers, J., & Al-Bender, F. (2002). Modification of the Leuven integrated
friction model structure. IEEE Transactions on Automatic Control, 47(4), 683–687.
114. Piatkowski, T. (2014). Dahl and LuGre dynamic friction models—The analysis of selected
properties. Mechanism and Machine Theory, 73, 91–100.
115. Do, T. N., Tjahjowidodo, T., Lau, M. W. S., & Phee, S. J. (2015). Nonlinear friction modelling
and compensation control of hysteresis phenomena for a pair of tendon-sheath actuated
surgical robots. Mechanical Systems and Signal Processing, 60, 770–784.
116. Dinh, B. K., Cappello, L., Xiloyannis, M., & Masia, L. Position control using adaptive
backlash.
117. Dinh, B. K., Cappello, L., Xiloyannis, M., & Masia, L. (2016, October). Position control using
adaptive backlash compensation for bowden cable transmission in soft wearable exoskeleton.
In 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
(pp. 5670–5676). IEEE.
118. Do, T. N., Tjahjowidodo, T., Lau, M. W. S., & Phee, S. J. (2014). An investigation of friction-
based tendon sheath model appropriate for control purposes. Mechanical Systems and Signal
Processing, 42(1–2), 97–114.
119. Do, T. N., Tjahjowidodo, T., Lau, M. W. S., & Phee, S. J. (2015). A new approach of fric-
tion model for tendon-sheath actuated surgical systems: Nonlinear modelling and parameter
identification. Mechanism and Machine Theory, 85, 14–24.
120. Do, T. N., Tjahjowidodo, T., Lau, M. W. S., Yamamoto, T., & Phee, S. J. (2014). Hysteresis
modeling and position control of tendon-sheath mechanism in flexible endoscopic systems.
Mechatronics, 24(1), 12–22.
121. Lenz, I., Lee, H., & Saxena, A. (2015). Deep learning for detecting robotic grasps. The
International Journal of Robotics Research, 34(4–5), 705–724.
122. Tan, Y. P., Liverneaux, P., & Wong, J. K. (2018). Current limitations of surgical robotics in
reconstructive plastic microsurgery. Frontiers in surgery, 5, 22.
123. Longmore, S. K., Naik, G., & Gargiulo, G. D. (2020). Laparoscopic robotic surgery: Current
perspective and future directions. Robotics, 9(2), 42.
124. Camarillo, D. B., Krummel, T. M., & Salisbury, J. K., Jr. (2004). Robotic technology in
surgery: Past, present, and future. The American Journal of Surgery, 188(4), 2–15.
125. Kim, S. S., Dohler, M., & Dasgupta, P. (2018). The Internet of Skills: Use of fifth-generation
telecommunications, haptics and artificial intelligence in robotic surgery. BJU International,
122(3), 356–358.
Deep Reinforcement Learning for Autonomous Mobile Robot Navigation

Armando de Jesús Plasencia-Salgueiro

Abstract Numerous fields, such as the military, agriculture, energy, welding, and
automation of surveillance, have benefited greatly from autonomous robots’ contri-
butions. Since mobile robots need to be able to navigate safely and effectively, there
was a strong demand for cutting-edge algorithms. The four requirements for mobile
robot navigation are as follows: perception, localization, planning a path and control-
ling movement. Numerous algorithms for autonomous robots have been developed
over the past two decades. The number of algorithms that can navigate and control
robots in dynamic environments is limited, even though the majority of autonomous
robot applications take place in dynamic environments. A qualitative comparison of
the most recent Autonomous Mobile Robot Navigation techniques for controlling
autonomous robots in dynamic environments with safety and uncertainty consid-
erations is presented in this paper. The work covers different aspects such as the basic methodology, benchmarking, and the teaching side of the development process. The structure, pseudocode, tools, and practical, in-depth applications of the
particular Deep Reinforcement Learning algorithms for autonomous mobile robot
navigation are also included in the research. This study provides an overview of
the development of suitable Deep Reinforcement Learning techniques for various
applications.

Keywords Autonomous mobile robot navigation · Deep reinforcement learning · Methodology · Benchmarking · Teaching

A. de J. Plasencia-Salgueiro (B)
National Center of Animals for Laboratory (CENPALAB), La Habana, Cuba
e-mail: [email protected]
BioCubaFarma, National Center of Animal for Laboratory (CENPALAB), La Habana, Cuba


1 Introduction

Autonomous robots have significantly influenced the development of numerous social sectors. Since mobile robots need to be able to navigate safely and effectively, there was a strong demand for cutting-edge algorithms. Under the data-driven concept, mobile robots gained a variety of effective algorithms for navigation and motion control with the development of Machine Learning.
The four requirements for mobile robot navigation are as follows: perception,
localization, path planning, and motion control.
The number of algorithms that can navigate and control robots in dynamic envi-
ronments is limited, although the majority of autonomous robot applications take
place in dynamic environments.
Application- and platform-independent systems are created by introducing deep reinforcement learning (DRL) as the generally proposed framework for AI learning. At the moment, human-level control is the ultimate goal of AI and robotics.
Because robotics and artificial intelligence (AI) are among the most complex
engineering sciences and highly multidisciplinary, you should be well-versed in
computer science, mathematics, electronics, and mechatronics before beginning the
construction of an AI robot [1].
Autonomous robot control methods from the last five years that can control autonomous robots in dynamic environments are the subject of the qualitative comparative study in this paper. Autonomous Mobile Robot Navigation (AMRN) methods using deep reinforcement learning algorithms are discussed. The experience of theoretical and practical implementation, and validation through simulation and experimentation, are all taken into consideration when discussing the evolution of each method's application. Researchers benefit from this investigation by gaining an understanding of the development and applications of appropriate approaches that make use of DRL algorithms for AMRN.
The outstanding contributions of this work will be as follows:

– Define the benefits of developing mobile robots under a machine learning conception using DRL.
– Give the relation and the detailed configuration of DRL for Mobile Robot Navigation (MRN).
– Explain the methodology and the necessary benchmarking techniques for applying the most representative DRL algorithms for MRN.
– Show the key considerations for teaching DRL for MRN and propose two exercises using CoppeliaSim.
– Define the application requirements of Autonomous Robots (AR), establishing practical safety considerations.

The work is structured as follows. In Sect. 2, Antecedents, the historical development of AR control is briefly related, from conventional linear control to DRL. In Sect. 3, Background, the theoretical foundations of AMRN and Machine Learning (ML) are exposed, including the requirements and the applications of ML algorithms such as Reinforcement Learning (RL), Convolutional Neural Networks (CNN), the different approaches to DRL, Long Short-Term Memory (LSTM), and application requirements, particularly navigation in dynamic environments, safety, and uncertainty. In Sect. 4, DRL Methods, an accurate description is made of the most common methods described in the recent scientific literature, including the theoretical conception, representation, logical flow chart, and pseudo-code of the different DRL algorithms and their combinations. In Sect. 5, Design Methodology, the necessary steps to follow in the design of DRL systems for autonomous navigation are described, together with the particularities and techniques of benchmarking under different conceptions. In Sect. 6, Teaching, the particularities of the teaching process of DRL algorithms are exposed, along with two exercises to develop in class using a simulation conception. In Sect. 7, Discussion, the principal conceptions and problems exposed in the work are treated. In Sect. 8, Conclusions, a brief summary and the future perspective of the work are provided.
The nomenclature used in this paper is listed in the Abbreviations section.

2 Antecedents

2.1 Control Theory, Linear Control, and Mechatronics

Electronics and ICs (Integrated Circuits) made it possible to control machines with more flexibility and accuracy using conventional linear control systems, with sensors providing feedback from the system output.
Linear control is motivated by Control Theory; it relies on mathematical solutions, specifically linear algebra, implemented on hardware using mechatronics, electronics, ICs, and micro-controllers.
These systems used sensors to feed back the error and tried to minimize it to stabilize the system output. They relied on linear algebra to derive the function that maps input to output. This field of interest was known as Automation, and the goal was to create automatic systems [1].

2.2 Non-linear Control

Non-linear control became crucial for deriving the non-linear function (or kernel function) mathematically for more complicated tasks. The reason behind the non-linearity was that input and output had different, and sometimes large, dimensionality, and the complexity simply could not be modeled using linear control and linear algebra. This was the main motivation and fuel for the rise of non-linear function learning, i.e., how to derive these functions [1].

Fig. 1 Control modules for generating the controlling commands [2] (Pieter Abbeel—UC Berkeley/OpenAI/Gradescope)

2.3 Classical Robotics

With the advancement in the computer industry, non-linear control gave birth to
intelligent control which is using AI for high-level control of the robot and systems.
Classical robotics was the dominating approach. These approaches were mostly
application dependent and highly platform-dependent. Generally speaking, these
approaches were hand-crafted, hand-engineered, and addressed as shallow AI [1].
These architectures are also referred to as GNC (Guidance, Navigation, and
Control) architectures, mostly composed of perception, planning, and control
modules. Perception modules were mostly used for mapping the environment and
localization of the robot inside the environment, Planning modules (also referred to
as navigation modules) to plan the path in terms of motion and mission, and Control
modules for generating the controlling commands (controlling behaviors) required
for the robot kinematics [1] (see Fig. 1).

2.4 Probabilistic Robotics

Sebastian Thrun in “Probabilistic Robotics” introduces a new way of looking at robotics and how to incorporate ML algorithms for probabilistic decision-making and robot control [1].
These architectures are the best examples of classical robotics with the addition of
machine learning in the planning/control (navigation/control) part. Sebastian Thrun
impacted the field of robotics by adding machine learning from AI to high-level
control architecture (or system software architecture).
By looking at these three architectures, you can see how machine learning and computer vision have been used in a very successful way. Aside from the interfaces, the main core architecture is composed of Perception and Planning/Control or Navigation (as you can see, planning/control is equal to navigation in Sebastian Thrun's terminology). The perception part has been fully regarded as a computer vision problem and has been solved using computer vision approaches; on the other hand, planning/control or navigation has been successfully solved using ML techniques, mostly the Support Vector Machine (SVM) [1].
With the advances of ML algorithms in solving computer vision problems, ML
as a whole (end-to-end) started to be taken a lot more seriously for intelligent robot
control and navigation as an end-to-end approach for high-level control architecture
(AI framework). This gave a huge boost to cognitive robotics among researchers and
in the robotic community [1].

2.5 Introduction of Back-Propagation for Feed-Forward Neural Networks

The main boost in reconsidering the use of Neural Networks (NN) was the intro-
duction of the Back Propagation algorithm as a fast optimization approach. In one
of Geoff Hinton's talks, he explained the biological foundation of back-propagation
and how it might happen in our brains [1].

2.6 Deep Reinforcement Learning

DRL-based control was initially introduced and coined by a company called Google DeepMind (www.deepmind.com). This company started using this learning approach for simulated agents in Atari games. The idea is to let the agent learn on its own until it reaches the human level of game control, or perhaps a superior level. Recent excitement in AI was brought about by this DRL method, the Deep Q-network (DQN), applied to Atari games as a simple simulated environment and to robots for testing [1].
The term DRL is used where a Deep Neural Network (DNN) is employed to extract high-dimensional observation features in Reinforcement Learning (RL). Figure 2 shows how a DNN is used to approximate the Q value for each state and how the agent acts by observing the environment accordingly.
With the implementation of DRL, the robotics architecture is transformed as shown in Fig. 3.

Fig. 2 Working of DRL [3]



Fig. 3 Deep reinforcement learning [2]

3 Background: Autonomous Mobile Robot Navigation and Machine Learning

3.1 Requirements

Examples of mobile robots (MR) include ships that move with their surround-
ings, autonomous vehicles, and spacecraft. Their navigation involves looking for
an optimal or suboptimal route while simultaneously avoiding obstacles and consid-
ering their destination. To simplify this challenge, the majority of researchers have
concentrated solely on the navigation issue in two-dimensional space. The robot’s
sense of perception is its ability to perceive its surroundings.
The effectiveness with which an intelligent robot completes its mission is influ-
enced in part by the properties of the robot’s sensor and control systems, such as its
capacity to plan the trajectory and avoid obstacles.
Sensor monitoring for environments can be used in a broad range of locations.
Mounting sensors on robotic/autonomous systems is one approach to addressing the
issues of mobility and adaptability [4].
In order for efficient robot action to be realized in real time, particularly in environments that are unknown or uncertain, strict requirements on the robot's sensor and control system parameters must be met. Among them are the following [5]:
– Increasing the precision of the remote sensor information;
– Reduction of sensor signal formation time to a minimum;
– Reducing the processing time of the sensor data;
– Reducing the amount of time required for the robot’s control system to make
decisions in a dynamic or uncertain environment with obstacles;
– Expanding the robots' functional capabilities through the use of fast calculation algorithms and effective sensors.
The recommender software, which makes use of machine learning, gets to work once the anomaly detection software has discovered an anomaly. The sensor data is combined with the robot's current course, using the onboard navigation system or a sensor-equipped compass together with the warning recognition data from the vehicle's collision avoidance system. An off-policy deep learning (DL) model is used by the recommender to make recommendations for the MR based on the current conditions, surroundings, and sensor readings. Thanks to this DL model, the MR can send the precise coordinates of the anomaly site and, if necessary, sensor data back to the base for additional investigation as required. This is especially important when safety is at stake or when investigators can only wear breathing apparatus or hazardous material suits for a short time. The drone can go straight to the tagged location while it analyzes additional sensors [4].

Localization
Localization is the method of determining where the robot is in its environment.
Ground or aerial vehicles’ precise positioning and navigation in complex
spatial environments are essential for effective planning, unmanned driving, and
autonomous operation [6].
In fact, the Kalman filter combined with reinforcement learning is regarded as one of the more promising strategies for precise positioning. The RL-AKF (adaptive Kalman filter navigation algorithm) uses the deep deterministic policy gradient to find the optimal state estimation and process noise covariance matrix from the continuous action space, taking the integrated navigation system as the environment and the negative of the current positioning error as the reward. When the GNSS signal is unavailable, the RL-AKF significantly improves the positioning performance of integrated navigation [6].
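As a rough illustration of this idea (not the published RL-AKF implementation), the following Python/NumPy sketch builds a toy one-dimensional constant-velocity Kalman filter in which a continuous action scales the process-noise covariance and the reward returned to the learning agent is the negative of the current positioning error; the class name, model dimensions, and noise values are illustrative assumptions.

import numpy as np

# Toy environment in the spirit of RL-AKF: the agent's action scales the
# process-noise covariance Q of a 1-D constant-velocity Kalman filter and
# the reward is the negative positioning error (all values illustrative).
class AdaptiveKFEnv:
    def __init__(self, dt=1.0):
        self.F = np.array([[1.0, dt], [0.0, 1.0]])   # state transition
        self.H = np.array([[1.0, 0.0]])              # position is measured
        self.R = np.array([[4.0]])                   # measurement noise covariance
        self.reset()

    def reset(self):
        self.true_x = np.array([0.0, 1.0])           # true position and velocity
        self.x = np.zeros(2)                         # filter estimate
        self.P = np.eye(2)                           # estimate covariance
        return self.x.copy()

    def step(self, action):
        Q = float(action) * np.diag([0.1, 0.1])      # action > 0 scales process noise
        self.true_x = self.F @ self.true_x + np.random.multivariate_normal(np.zeros(2), np.diag([0.1, 0.1]))
        z = self.H @ self.true_x + np.random.multivariate_normal(np.zeros(1), self.R)
        self.x = self.F @ self.x                     # Kalman prediction
        self.P = self.F @ self.P @ self.F.T + Q
        S = self.H @ self.P @ self.H.T + self.R      # Kalman update
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ (z - self.H @ self.x)
        self.P = (np.eye(2) - K @ self.H) @ self.P
        reward = -abs(self.true_x[0] - self.x[0])    # negative positioning error
        return self.x.copy(), reward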
Path-planning
In Path-planning, the robot chooses how to maneuver to reach the goal without
collision.
Even though the majority of mobile robot applications take place in dynamic
environments, there aren’t many algorithms that can guide robots through them [7].
For automatically mapping high-dimensional sensor data to robot motion
commands without referring to the ground truth, DRL algorithms are regarded
as powerful and promising tools. They only require a scalar reward function to
encourage the learning agent to experiment with the environment to determine the
best course of action for each state [8]. Building a modular DQN architecture to
combine data from a variety of vehicle-mounted sensors is demonstrated in [8]. In the
real world, the developed algorithm can fly without hitting anything. Path planning,
3D mapping, and expert demonstrations are not required for the proposed method.
Using an end-to-end CNN, it turns merged sensory data into a robot’s velocity control
input.
Motion control
In motion control, the robot's movements are regulated to follow the desired trajectory. Linear and angular velocities, for example, fall under the category of motion control [9].
In plain contrast to the conventional framework for hierarchical planning, data-driven techniques are also being applied to the autonomous navigation problem as a result of recent advancements in ML research. Early work of this kind developed systems that use end-to-end learning algorithms to obtain navigation policies that map directly from perceptual inputs to motion commands, avoiding the traditional hierarchical paradigm. With such systems, an MR can navigate without symbolic, rule-based human knowledge or engineered design [9].
In Fig. 4, the mentioned requirements are linked.

Fig. 4 Requirement interrelation in AMRN [7]

3.2 Review of RL in AMR

A review of reinforcement learning approaches in robotics, proposing robotics as an application platform for RL experiments and case studies, is presented in [10].
In 2013, [11], drawing on the works of Huang [12] and Hatem [13], proposed a framework for the simulation of RL using an Artificial Neural Network (ANN), as in Fig. 5.
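As a minimal sketch of the structure in Fig. 5b, the code below approximates Q(s, ·) with a small MLP and applies the standard temporal-difference Q-learning update; the state and action dimensions, network sizes, and learning rate are placeholder assumptions, and PyTorch is used only for convenience.

import torch
import torch.nn as nn

# An MLP approximates Q(s, .) for a discrete action set and is updated with
# the Q-learning target Q(s, a) <- r + gamma * max_a' Q(s', a').
q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
gamma = 0.99

def q_learning_step(s, a, r, s_next, done):
    """One TD(0) update; s and s_next are float tensors, a is an int index."""
    q_sa = q_net(s)[a]
    with torch.no_grad():
        target = r + gamma * (1.0 - done) * q_net(s_next).max()
    loss = (q_sa - target).pow(2)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()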

3.3 Introduction of Convolutional Neural Networks

Bengio's group, together with LeCun, was among the first to introduce CNNs and has consistently proposed them as a leading AI architecture [14]. They successfully applied DL to OCR (Optical Character Recognition) for document analysis.

3.4 Advanced AMR: Introduction to DRL (Deep Reinforcement Learning) Approach

Closer examination reveals that there are two families of DRL methods: value-based and policy-based approaches. Value-based DRL indirectly obtains the agent's policy by iteratively updating the value function. When the agent reaches an optimal value, the optimal policy is derived from the optimal value function. Using the function approximation method, the policy-based approach directly builds a policy network. After that, it selects actions within the network to determine the value of the reward and optimizes the policy network parameters in the gradient direction to produce an optimized policy that maximizes the value of the reward [15].

Fig. 5 a An RL structure using an ANN, b the function Q with an ANN of MLP type for Q-learning [13]
Value-Based DRL Methods

Deep Q network
Mnih et al. published an influential preliminary work on DQN-related research in Nature in 2015, reporting that the trained network could perform at a human level across 49 games. In DQN, the action-value function is represented by a DNN, implemented as a CNN and trained with Q-learning. The feedback from game rewards is used to train the network. DQN's fundamental characteristics are the following [15]:

(1) The TD error of the temporal-difference algorithm is computed with a separate target network.
(2) An experience replay mechanism is used to select the samples: the experience pool stores and manages transitions (s, a, r, s'). To train the Q network, minibatches of these samples are drawn at random from the experience pool. The elimination of sample correlation by the experience replay mechanism results in approximately independent and identically distributed training samples. Gradient descent is used to update the NN's parameters [15].
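A minimal training-step sketch of these two ingredients, a separate target network for the TD target and an experience-replay pool, is given below; the layer sizes, buffer length, and optimizer settings are illustrative assumptions rather than the configuration used in [15].

import random
from collections import deque
import torch
import torch.nn as nn

obs_dim, n_actions, gamma = 8, 4, 0.99
q_net = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, n_actions))
target_net = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, n_actions))
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
replay = deque(maxlen=100_000)   # experience pool of (s, a, r, s', done), stored as tensors

def dqn_train_step(batch_size=64):
    if len(replay) < batch_size:
        return
    s, a, r, s2, d = map(torch.stack, zip(*random.sample(replay, batch_size)))
    q_sa = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():        # TD target comes from the frozen target network
        target = r + gamma * (1 - d) * target_net(s2).max(dim=1).values
    loss = nn.functional.mse_loss(q_sa, target)
    optimizer.zero_grad(); loss.backward(); optimizer.step()

def sync_target():               # copy weights into the target network periodically
    target_net.load_state_dict(q_net.state_dict())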

Double DQN
Double DQN (DDQN) was introduced by Hado van Hasselt. It is used to reduce the overestimation problem by decomposing the max operation in the target into action selection and action evaluation. A DQN and a second deep network are combined in this system. The approach was developed to address the issue of overestimating Q values in the models discussed previously. The action with the higher Q value appears to be the best option for the next state, but the accuracy of that Q value depends on which actions have been tried and which states have been visited. At the beginning of training there are not enough reliable Q values to estimate the best possibility; since there are fewer Q values to choose from at this point, the highest Q value may lead to an incorrect action toward the target. DDQN is used to solve this issue: one network (the online DQN) is used to select the action, and the other (the target network) calculates the target Q value for that particular action. This helps limit the overestimation of the Q values, which in turn helps reduce the training time [3].
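The decoupling described above can be written in a few lines: the online network selects the next action and the target network evaluates it (plain DQN uses the target network for both). The sketch below assumes a q_net/target_net pair like the one in the previous DQN sketch and only shows the target computation.

import torch

def double_dqn_target(r, s2, d, q_net, target_net, gamma=0.99):
    # r, s2, d are batched tensors of rewards, next states and done flags
    with torch.no_grad():
        best_a = q_net(s2).argmax(dim=1, keepdim=True)        # action selection (online net)
        q_eval = target_net(s2).gather(1, best_a).squeeze(1)  # action evaluation (target net)
        return r + gamma * (1 - d) * q_eval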
Dueling Q network
A dueling Q network is used, together with two networks (the online network and the target network), to tackle the remaining issues of the DQN model. The online network approximates the Q values, while the target network selects the next best action and evaluates it. It may not be necessary to approximate the value of each action in every situation, and the dueling Q network exploits this. In some gaming settings, for instance when a collision is imminent, it only matters whether the agent moves left (or right), whereas in other situations the precise action matters. A dueling network is an architecture built on a single Q network: two streams are employed instead of a single sequence after the convolution layers. These two streams separate the state-value estimate and the advantage function, which are then recombined into a single Q value. As a result, the dueling network produces the Q function, which can be trained with a variety of existing algorithms, such as DDQN and SARSA. The progression of the dueling deep Q network is the dueling double deep Q network (D3QN), which will be revisited later [3].
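A possible sketch of such a dueling head is shown below: after a shared convolutional or MLP trunk, one stream estimates the state value V(s) and the other the advantages A(s, a), and the two are recombined into Q(s, a); the layer sizes and the mean-subtraction variant of the recombination are illustrative choices.

import torch
import torch.nn as nn

class DuelingHead(nn.Module):
    def __init__(self, feat_dim=128, n_actions=4):
        super().__init__()
        self.value = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 1))
        self.advantage = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))

    def forward(self, features):
        v = self.value(features)                      # (batch, 1) state value
        a = self.advantage(features)                  # (batch, n_actions) advantages
        return v + a - a.mean(dim=1, keepdim=True)    # recombined Q(s, a)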
Policy-based DRL methods

Deep Deterministic Policy Gradient (DDPG)


Of particular interest is that policy-based DRL methods can also solve problems with a high-dimensional observation space, whereas value-based DRL methods (DQN and its variants) can only deal with discrete, low-dimensional action spaces. However, there are a number of important tasks, particularly physical control tasks, that require continuous, high-dimensional action spaces. To address this, the action space can be discretized, but it remains high-dimensional, with the number of actions increasing exponentially with the degrees of freedom [15].
The Deep Deterministic Policy Gradient (DDPG), a policy gradient-based method, can be used to directly optimize the policy for problems involving a continuous action space.

In contrast to a stochastic policy represented by a probability distribution function,
DDPG employs a deterministic policy function. It also borrows the target network from
DQN, uses a CNN to approximate the policy and Q functions, and uses experience
replay to stabilize training and guarantee high sample utilization efficiency. The
Q network is updated by gradient steps over time, with the K samples in the experience
pool chosen at random [15].
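As a rough illustration of a single DDPG update step (a minimal sketch under assumed
conditions, not the exact setup of [15]: small fully connected networks instead of CNNs,
and random placeholder tensors instead of a replay buffer), consider the following
Python/PyTorch code:

import torch
import torch.nn as nn

state_dim, action_dim = 4, 2

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, out_dim))

actor, critic = mlp(state_dim, action_dim), mlp(state_dim + action_dim, 1)
actor_targ, critic_targ = mlp(state_dim, action_dim), mlp(state_dim + action_dim, 1)
actor_targ.load_state_dict(actor.state_dict())
critic_targ.load_state_dict(critic.state_dict())
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
gamma, tau = 0.99, 0.005

# One update on a random minibatch of K transitions (placeholder data).
K = 32
s, a = torch.randn(K, state_dim), torch.randn(K, action_dim)
r, s2 = torch.randn(K, 1), torch.randn(K, state_dim)

# Critic: regress Q(s, a) toward the bootstrapped target built with the target networks.
with torch.no_grad():
    y = r + gamma * critic_targ(torch.cat([s2, actor_targ(s2)], dim=1))
critic_loss = nn.functional.mse_loss(critic(torch.cat([s, a], dim=1)), y)
critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

# Actor: deterministic policy gradient, i.e. ascend Q(s, actor(s)).
actor_loss = -critic(torch.cat([s, actor(s)], dim=1)).mean()
actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

# Soft (Polyak) update of the target networks.
for net, targ in ((actor, actor_targ), (critic, critic_targ)):
    for p, pt in zip(net.parameters(), targ.parameters()):
        pt.data.mul_(1 - tau).add_(tau * p.data)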
Asynchronous Advantage Actor-Critic (A3C)
Mnih et al. developed the Asynchronous Advantage Actor-Critic (A3C) algorithm,
which builds on the actor-critic (AC) structure. The conventional policy gradient
algorithm directly optimizes the agent's policy; in order to update the policy, it must
collect a series of complete sequence data. The collection of complete DRL sequences
can result in large variance [15].
In particular, the AC structure, which combines the policy gradient method with
the value function, is receiving a lot of attention.
This is because, within the AC structure, the actor uses the policy gradient method to
select actions, while the critic uses the value function method to evaluate those actions.
Both the actor's and the critic's parameters are alternately updated during training. One
of its advantages is that the AC structure transforms the sequence update of the policy
gradient into a one-step update, removing the need to wait until the end of the sequence
before evaluating and improving the policy. The policy gradient algorithm's variance and
the difficulty of data collection are both reduced by this feature [15].
A3C enhances the AC structure in the following ways:
(1) Agents in parallel: Because the A3C algorithm generates multiple parallel
environments, multiple agents with secondary structures can update the main
structure’s parameters simultaneously in these environments. Multiple actors
frequently explore the environment.
(2) Return by taking N steps: In contrast to other algorithms, which usually use a
one-step return computed from the instant reward obtained in the sample, A3C
updates the critic's value function using the multi-step cumulative return (a small
sketch of this N-step return follows this list). Calculating the N-step return
accelerates the propagation of iterative updates and convergence.
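A minimal sketch of the N-step return mentioned in item (2), with hypothetical rewards
and a hypothetical critic estimate used as the bootstrap value:

import numpy as np

def n_step_return(rewards, bootstrap_value, gamma=0.99):
    """N-step return: discounted sum of the next N rewards plus the
    discounted critic estimate of the state reached after N steps."""
    g = bootstrap_value
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# Hypothetical 5-step rollout and critic value of the final state.
rewards = [0.0, 0.0, 1.0, 0.0, -0.5]
print(n_step_return(rewards, bootstrap_value=2.0))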
A3C has a lower computational cost than DQN and can run on a multi-core CPU.
According to the experimental results, A3C has been successful in tasks like continuous
robotic arm control and maze navigation, despite issues with hyperparameter adjustment
and low sampling efficiency [15]. The particularities of Actor-Critic learning are
explained in Section 4.4.
Proximal Policy Optimization (PPO)
The sampled minibatch can only be used for one update epoch in traditional policy
gradient methods, which use an on-policy strategy, and must be resampled before
the subsequent policy update can be implemented. The PPO algorithm's capacity to
carry out minibatch updates over a number of epochs increases the effectiveness of
sample utilization [15].
The PPO algorithm uses a surrogate objective to optimize the new policy using samples
collected under the old policy. It is used to improve the new policy's actions in
comparison with the previous policy. However, the training algorithm becomes unstable
if the new policy changes too much in a single update, so the PPO algorithm constrains
the objective function. A detailed explanation of PPO is given in Section 4.5.
Due to its ability to strike a balance between sample complexity, simplicity, and
time efficiency, PPO outperforms A3C and other on-policy gradient methods [15].
Long Short-Term Memory (LSTM) Recurrent Neural Networks (RNNs)
RNNs are a family of NNs that are not constrained to the feed-forward architecture.
RNNs are obtained by introducing auto or backward connections—that is, recurrent
connections—into feed-forward neural networks [16].
Introducing a recurrent connection also introduces the concept of time. This
allows RNNs to take context into account; that is, to remember inputs from the past
by capturing the dynamic of the signal.
Introducing recurrent connections changes the nature of the NN from static to
dynamic and is therefore suitable for analyzing time series.
The simplest recurrent neural unit consists of a network with just one single hidden
layer, with activation function tanh(), and with an auto connection. In this case, the
output, h(t), is also the state of the network, which is fed back into the input—that
is, into the input of the next copy of the unrolled network at time t + 1 [16].
This simple recurrent unit already shows some memory, in the sense that the
current output also depends on previously presented samples at the input layer.
However, this is often not enough to solve most tasks of interest. Something more
powerful is needed that can reach farther back into the past than the simple
recurrent unit can. LSTM units were introduced to solve this [16].
LSTM is a more complex type of recurrent unit, using an additional hidden vector,
the cell state or memory state, s(t), and the concept of gates. Figure 6 shows the
structure of an unrolled LSTM unit.

Fig. 6 LSTM layer [16]



An LSTM layer contains three gates: a forget gate, an input gate, and an output
gate.
LSTM layers are a very powerful recurrent architecture, capable of keeping
the memory of a large number of previous inputs. These layers thus fit—and are
often used to solve—problems involving ordered sequences of data. If the ordered
sequences of data are sorted based on time, then we talk about time series. Indeed,
LSTM-based RNNs have been applied often and successfully to time series analysis
problems. A classic task to solve in time series analysis is demand prediction [16].
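As a minimal, hypothetical sketch (not the architecture used in [16]), an LSTM-based
time-series regressor of the kind used for demand prediction could look as follows in
Python/PyTorch:

import torch
import torch.nn as nn

class DemandForecaster(nn.Module):
    """Minimal LSTM regressor: maps a window of past values to the next value."""
    def __init__(self, n_features=1, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):              # x: (batch, time, features)
        out, _ = self.lstm(x)          # out: (batch, time, hidden)
        return self.head(out[:, -1])   # prediction from the last time step

model = DemandForecaster()
window = torch.randn(8, 24, 1)         # 8 sequences of 24 past observations
print(model(window).shape)             # torch.Size([8, 1])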

3.5 Application Requirements

Among the application requirements of autonomous robotic systems, only the dynamic
environment, safety, and uncertainty are considered here, as the most important
requirements.
Dynamic environment
Any decision-maker in RL is an agent, and the environment is anything that is not
the agent. The agent interacts with the environment to maximize the accumulated
reward, which serves as the training feedback signal. Using the S, A, R, and P
components, the agent-environment interaction process can be modeled as a Markov
Decision Process (MDP). S represents the environment's state, A the agent's action,
R the reward, and P the probability of a state transition. The agent's policy π is the
mapping from state space to action space. In state st ∈ S, the agent takes action
at ∈ A and then transfers to the next state st+1 according to the state transition
probability P, while receiving the reward value feedback rt ∈ R from the environment [15].
Although the agent receives immediate reward feedback at each time step, the objective
of RL is to maximize the long-term cumulative reward rather than the short-term reward.
The agent continuously improves the policy π by optimizing the value function.
Because dynamic programming necessitates massive memory consumption and complete
dynamics information, both of which are impracticable, researchers have proposed and
developed two learning strategies: Monte Carlo learning and temporal-difference (TD)
learning. The Q-learning algorithm combines TD learning with the Bellman equations
and MDP theory. Since then, RL research has made significant progress, and RL
algorithms have been utilized to solve a wide variety of real-world issues [15].
Safety
Safety tasks in MR are viewed within an Autonomous Internet of Things framework.
The sensors gather data about the state of the system, which is used by intelligent
agents in the Internet of Things devices and Edge/Fog/Cloud servers to decide how to
control the actuators so they can act. A promising strategy for autonomous intelligent
agents is to use artificial intelligence decision-making techniques, particularly RL and
DRL [17].
Uncertainty assessment for Autonomous Systems
Robots can now operate in environments that are becoming increasingly dynamic,
unpredictable, unstructured, and only partially observable thanks to improvements
in robotics. Uncertainty is a major effect of a lack of information in these settings.
Therefore, autonomous robots must be able to deal with uncertainty [18].
The operating environment of autonomous or unmanned systems is uncertain.
Autonomous systems must rely on imperfect perceptions and approximated models
to plan and decide.
tency, and incompleteness. The analysis of such massive amounts of data necessi-
tates sophisticated decision-making strategies and refined analytical techniques for
effectively reviewing and/or predicting future actions with high precision.
Without a measurement of prediction uncertainty, DL algorithms cannot be fully
integrated into robotic systems under these circumstances. In DNNs, there
are typically two sources of prediction uncertainty: uncertainty in the model and
uncertainty in the data.
Autonomous vehicles operate in highly unpredictable, non-stationary, and
dynamic environments. The most promising methods for treating this DL weakness
are Bayesian Deep Learning (BDL), which combines DL and Bayesian probability
approaches, and fuzzy-logic machine learning (ML) within an Internet of Things framework.
Under an informational conception of the Internet of Things and Big Data, it is
necessary to describe the most well-known control strategies, map representation
techniques, and interactions with external systems, including pedestrian interactions,
for proper comprehension of this process [19].

4 Deep Reinforcement Learning Methods

4.1 Continuous Control

Extending the ideas behind the success of Deep Q-Learning (DQL) to the continuous
action domain, Lillicrap et al. [20] developed an actor-critic, model-free algorithm
based on the deterministic policy gradient that can operate over continuous action
spaces. The proposed algorithm successfully solves numerous simulated physics
problems, including well-known ones like cart-pole swing-up, dexterous manipulation,
legged locomotion, and car driving, employing the same learning algorithm, network
architecture, and hyper-parameters. Because it has full access to all of the space's
components and subsystems, the algorithm is able to identify strategies that resemble
those of planning algorithms. It is demonstrated that the algorithm can learn policies
"end-to-end" for a variety of tasks, directly from raw sensor inputs.

4.2 A Simple Implementation of Q-learning

The Bellman equation (1) provides the standard formulation of RL, where [21]:

V(s) = max_a (R(s, a) + γ V(s′))   (1)

where:
• V(s): present value of state s;
• R(s, a): reward related to action a in state s;
• V(s′): future value in the future state s′;
• a: action taken by the agent;
• s: current agent state;
• γ: discount factor.

To apply the Bellman equation to Q-learning, it is transformed into formula (2),
which calculates the quality of the actions in each agent state, relating the current
time step (t) to the previous one (t − 1).

Q(s, a)_t = Q(s, a)_{t−1} + α(R(s, a) + γ max_a′ Q(s′, a′) − Q(s, a)_{t−1})   (2)

The Q-learning Eq. (2) was the foundation for the DQL algorithm, which makes
Q values available based on the agent’s state so that actions can be taken.
The agent's Q values in its current state are determined using a dense artificial NN with
four inputs (the sensors) and a hidden layer with thirty artificial neurons, which executes
the Q-learning estimation, as displayed in the network diagram of Fig. 7. The network's
output contains four Q values.

Fig. 7 NN used in the learning algorithm [21]



Based on the Q values produced by the network for the agent's state, the action most
likely to be chosen is determined using the SoftMax equation (3). After making a
decision, the agent receives a positive reward for a good decision and a negative reward
for a bad one.

SoftMax = e^Q / Σ e^Q   (3)

With each interaction, the effectiveness of DQL increases. A technique known as
experience replay supports this: it forces the agent to recall the decisions it previously
made in a given state and feeds these values back to the network. If the previous
interaction resulted in a positive reward, the agent can keep the same decision; if it was
punished, it may choose a different path [21].
At the start of execution, the Q-value table is initialized, the episode counter is set to 1,
and the maximum number of episodes (max-episode) is set. Then the cycle begins: after
observing the environment's state S, the next action A is selected from the Q table; the
next state S' is reached and the immediate environmental reward R is obtained; the
Q table is updated; S is updated, the episode counter is increased by one, and the
endpoint and maximum path-length limit are checked. If they have not been reached, the
"observe S—select action A—reach S'" exploration cycle continues.
When the agent arrives at the desired state, the program closes the loop and checks
whether the current number of learning episodes exceeds the maximum. The program
iterates until the required maximum number of episodes is reached, in order to obtain the
best Q values and the best path; until then, the most recent Q-value table is carried over
to the next iteration. The whole procedure is shown in the flowchart of Fig. 8, where the
numbers indicate the sequence in which the algorithm works [22].
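A minimal tabular sketch of this procedure, combining the update rule of Eq. (2) with the
SoftMax action selection of Eq. (3) on a hypothetical placeholder environment, is:

import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 16, 4
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.9

def softmax_policy(q_row):
    """Eq. (3): action probabilities from the Q values of the current state."""
    e = np.exp(q_row - q_row.max())          # subtract max for numerical stability
    return e / e.sum()

def step(state, action):
    """Placeholder environment: random next state, reward 1 only at the goal state."""
    next_state = rng.integers(n_states)
    reward = 1.0 if next_state == n_states - 1 else -0.01
    return next_state, reward, next_state == n_states - 1

for episode in range(200):
    s, done = 0, False
    while not done:
        a = rng.choice(n_actions, p=softmax_policy(Q[s]))
        s2, r, done = step(s, a)
        # Eq. (2): move Q(s, a) toward the bootstrapped target.
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        s = s2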
The pseudo-code in Fig. 9 shows the basic structure of the DQL algorithm.

4.3 Dueling Double DQN

In the given example, D3QN is used to navigate an unfamiliar indoor environment while
the robot avoids hitting anything. The outputs are the linear and angular velocities, and
the raw data from the depth sensor, which carries no size scale, is used as the input [23].
D3QN combines DDQN and Dueling DQN. Consequently, it mitigates the
overestimation issue and boosts performance. The learning sequences are modeled as an
MDP, the standard RL formulation. The robot interacts with its surroundings to make
decisions. At each time step, the robot chooses an action based on the current
observation st at time t, where the observation is a stack of four frames of depth-sensor
data.

Fig. 8 Q-learning Logical flow chart [22]

Fig. 9 DQL algorithm, (https://ailephant.com/category/algorithms/) [21]



The robot then receives a reward signal r(st, at) delivered by the reward function. The
actions are moving forward, half-turning left, turning left, half-turning right, and turning
right. The MR then moves on to the subsequent observation, st+1. The cumulative
discounted future reward (4) is:

Rt = Σ_{τ=t}^{T} γ^(τ−t) rτ,   (4)

and the MR’s goal is to get the most discount on the reward γ is a discount factor
between 0 and 1 that weighs the importance of immediate versus future rewards. The
immediate will be more significant the smaller γ it is, and vice versa. The termination
time step is denoted by T. The algorithm’s goal is to make the action value function
Q as maximal as possible. In contrast to DQN, D3QN’s Q function is (5).

Q(s, a; θ, α, β) = V(s; θ, β) + (A(s, a; θ, α) − max_a′ A(s, a′; θ, α))   (5)

where θ, α, and β are the parameters of the CNN and of the two streams of fully
connected layers. A loss function (6) can be used to train the Q-network:

L(θ) = (1/n) Σ_{k=1}^{n} (yk − Q(s, a; θ))^2   (6)

Figure 10 displays the network’s structure.


Its components are the perception network and the control network. The perception
network is a three-layer CNN with a convolution and an activation at each layer. The
first CNN layer uses 32 5 × 5 convolution kernels with stride 2.

Fig. 10 The structure of the D3QN network. It has a dueling network and a three-layer CNN [23]

The second layer uses 64 3 × 3 convolution kernels with stride 2, and the third layer uses
64 2 × 2 convolution kernels with stride 2. The control network is the dueling network,
which consists of two sequences of fully connected (FC) layers, each stream
independently estimating the state value and the advantages of each action. In the FC1
layer, there are 512 nodes in each of the two FC streams. In the FC2 layer, there are two
FC layers, each with six nodes. In the FC3 layer, there is one FC layer with six nodes.
The ReLU function serves as the activation function for every layer. Figure 11 describes
the D3QN model's parameters.

Fig. 11 Algorithm D3QN [23]

4.4 Actor-Critic Learning

Hafner et al. developed the Dreamer algorithm for learning on robots without the use of
simulators. Dreamer builds a world model from a replay buffer of previous experiences
gathered by acting in the environment. The predicted trajectories of the learned model
are used by an actor-critic algorithm to learn actions. Learning updates are decoupled
from data collection to meet latency requirements and permit rapid training without
waiting for the environment. In the implementation, a learner thread continuously trains
the world model and the actor-critic behavior while an actor thread simultaneously
computes actions for environment interaction [24].
World Model Learning: The world model, as shown in Fig. 13, is a DNN that
teaches itself to anticipate the dynamics of the environment (Left). Future represen-
tations are predicted rather than future inputs because sensory inputs can be large
images. This makes massively parallel training with large batch sizes possible and
reduces the number of errors that accumulate. As a result, the world model can be
interpreted as a compact environment simulation that the robot acquires on its own. As
it explores the real world, the model keeps improving. The Recurrent State-
Space Model (RSSM), which has four parts, is the foundation for the world model
[24]:

Encoder Network: encθ(st | st−1, at−1, xt)   (7)
Dynamics Network: dynθ(st | st−1, at−1)
Decoder Network: decθ(st) ≈ xt
Reward Network: rewθ(st+1) ≈ rt

Proprioceptive (the sense of one's own movement, force, and body position) joint
readings, force sensors, and high-dimensional inputs like RGB and depth camera
images are all provided by the various physical AMR sensors. All of the sensory inputs
xt are combined by the encoder network into the stochastic representations zt. Using
its recurrent state ht, the dynamics model learns to predict the sequence of stochastic
representations. The decoder reconstructs the sensory inputs to provide a rich signal for
learning representations and to permit human inspection of model predictions, but it is
not required when learning behaviors from latent rollouts (Fig. 12).
In the authors’ real-world experiments, the AMR must interact with the real
world to discover task rewards, which the reward network learns to predict. It is
also possible to use rewards that are specified by hand in response to the decoded
sensory inputs. All of the world model’s components were jointly optimized using
stochastic backpropagation [24].

Fig. 12 Dreamer algorithm uses a direct method for learning on AMR hardware without the use
of simulators. The robot’s experience is collected by the current learned policy. The replay buffer is
expanded by this experience. Through supervised learning, the world model is trained on replayed
off-policy sequences. An actor-critic algorithm uses imagined rollouts in the world model’s latent
space to improve a NN policy. Low-latency action computation is made possible by parallel data
collection and NN learning, and learning steps can continue while the AMR is moving [24]

Actor-Critic algorithm
The world model represents task-agnostic knowledge about the dynamics, whereas the
behavior that the actor-critic algorithm learns is specific to the task at hand. Behaviors
are learned from rollouts predicted by the world model, without decoding observations,
as depicted in Fig. 13 (right). Similar to specialized modern simulators, this enables
massively parallel behavior learning on a single GPU with typical batch sizes of 16 K.
Two NNs make up the actor-critic algorithm [24]:

Actor Network: π(at | st);
Critic Network: ν(st)   (8)

Finding a distribution of successful actions that maximizes the total sum of task
rewards that can be predicted for each latent model state st is the job of the actor
network. The critic network learns to anticipate the total quantity of future task
rewards through temporal difference learning. This is important because it makes it
possible for the algorithm to learn long-term strategies by, for example, considering
rewards after the H = 16 step planning horizon has passed. The critic is regressed toward
the return of the predicted model-state trajectory. A simple option is to calculate the
return as the sum of N intermediate rewards plus the critic's prediction of the value of
the state reached after N steps. The computed returns average over all N ∈ [1, H − 1],
rather than using an arbitrary N value [24]:

Fig. 13 Training of the NNs in Hafner et al.'s Dreamer algorithm (2019; 2020) for rapid robot
learning in real-world situations. Dreamer comprises two neural-network components. Left: The
world model is structured as a deep Kalman filter trained on replay-buffer subsequences. The
encoder combines all sensory modalities into discrete codes. The decoder reconstructs the inputs
from the codes, providing a significant learning signal and making it possible for humans to
examine model predictions. Without observing intermediate inputs, an RSSM is trained to predict
subsequent codes based on actions. Right: Without having to reconstruct sensory inputs, the world
model enables massively parallel policy optimization with large batch sizes. Dreamer trains a value
and policy network using imagined rollouts and a learned reward function [24]

Vt^λ = rt + γ[(1 − λ)ν(st+1) + λ Vt+1^λ],   VH^λ = ν(sH).   (9)

The actor network is trained to maximize returns, whereas the critic network is trained to
regress the λ-returns. Two gradient estimators can be used for the actor: Reinforce and
the reparameterization trick. Reinforce estimates the policy gradient from samples of the
actor's actions, whereas the reparameterization trick, following Rezende and others, uses
the differentiable dynamics network to directly backpropagate return gradients.
Following Hafner et al., Reinforce gradients were selected for discrete action tasks and
reparameterization gradients for continuous control tasks. In addition to maximizing
returns, the actor is encouraged to maintain a high entropy level throughout training in
order to avoid a deterministic policy collapse and maintain some exploration [24]:
L(π) = −E[ Σ_{t=1}^{H} ( ln π(at | st) sg(Vt^λ − ν(st)) + η H[π(at | st)] ) ];   (10)

Both the actors and the critics were optimized using the Adam optimizer (Kingma and
Ba, 2014). As is typical in the literature (Mnih et al., 2015; Lillicrap et al., 2015), a
slowly updated copy of the critic network was used to calculate the λ-returns. The world
model is not affected by the gradients of the actor and critic, because propagating them
would result in incorrect and overly optimistic model predictions [24].
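A minimal sketch of the λ-return recursion of Eq. (9), applied to a hypothetical imagined
rollout, is:

import numpy as np

def lambda_returns(rewards, values, gamma=0.99, lam=0.95):
    """Eq. (9): recursively mix bootstrapped one-step targets with longer returns.
    rewards: r_1..r_{H-1}; values: critic predictions v(s_1)..v(s_H)."""
    H = len(values)
    V = np.zeros(H)
    V[-1] = values[-1]                       # V_H^lambda = v(s_H)
    for t in range(H - 2, -1, -1):
        V[t] = rewards[t] + gamma * ((1 - lam) * values[t + 1] + lam * V[t + 1])
    return V

# Hypothetical imagined rollout of length H = 5.
print(lambda_returns(np.array([0.1, 0.0, 0.2, 0.0]),
                     np.array([0.5, 0.4, 0.6, 0.3, 0.2])))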

4.5 Learning Autonomous Mobile Robotics with Proximal


Policy Optimization

Based on the research in [25], the distribution of the AMR's current angular velocity is
learned using PPO. The PPO algorithm, which makes it simple for NNs to work at large
scale by using first-order gradients, can be considered an approximate version of trust
region policy optimization. The continuous control learning algorithm using PPO is
shown in pseudocode in Fig. 14. The proposed algorithm makes use of the actor-critic
architecture.
To begin, the policy πθ drives the mobile robot through the environment one step at a
time. The state, the action, and the reward are collected at each step for later training.
The advantage function is then provided by the temporal difference (TD) error, which is
the difference between the state value Vϕ(st) and the discounted rewards
Σ_{t1>t} γ^(t1−t) rt1. The actor updates θ by applying a gradient method to J_PPO(θ),
maximizing a surrogate function whose probability ratio is πθ(at|st)/πold(at|st). The actor
optimizes the new policy πθ(at|st) based on the advantage function and the old policy
πold(at|st). The larger the advantage function is, the more probable the new policy
changes are. However, if the advantage function is too large, the algorithm is very likely
to diverge. Therefore, a KL penalty is introduced to limit the rate of change from the old
policy πold(at|st) to the new policy πθ(at|st). The critic updates ϕ by a gradient method
on L_BL(ϕ), which minimizes the loss function of the TD error given data of length-T
time steps. The desired change is set by the hyperparameter KLtarget in each policy
iteration. If the actual change KL[πold|πθ] falls below or exceeds the target range
[βlow KLtarget, βhigh KLtarget], a scaling term α > 1 adjusts the coefficient of KL[πold|πθ].
The clipped surrogate objective is another approach that can be used in place of
the KL penalty coefficient for updating the actor-network. The following summarizes
the primary objective:

Fig. 14 Continuous control learning algorithm through PPO in pseudo-code



L^CLIP(θ) = Êt[min(rt(θ)Ât, clip(rt(θ), 1 − ε, 1 + ε)Ât)]   (11)

where ε = 0.2 is the clipping hyperparameter. The clip term clip(rt(θ), 1 − ε, 1 + ε)Ât has
the same motivation as the KL penalty: it is also used to limit excessively large policy
updates.
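A minimal sketch of the clipped surrogate objective of Eq. (11), computed on a
hypothetical minibatch, is:

import numpy as np

def clipped_surrogate(log_prob_new, log_prob_old, advantages, epsilon=0.2):
    """Eq. (11): mean of min(r_t * A_t, clip(r_t, 1 - eps, 1 + eps) * A_t),
    where r_t is the probability ratio between the new and old policies."""
    ratio = np.exp(log_prob_new - log_prob_old)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - epsilon, 1.0 + epsilon) * advantages
    return np.minimum(unclipped, clipped).mean()

# Hypothetical minibatch of log-probabilities and advantage estimates.
lp_new = np.array([-0.9, -1.2, -0.4])
lp_old = np.array([-1.0, -1.0, -1.0])
adv    = np.array([0.5, -0.3, 1.2])
print(clipped_surrogate(lp_new, lp_old, adv))   # value to maximize (negate for a loss)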
Reward Function.
To simplify the reward function, the critic network uses only two distinct
conditions without normalization or clipping:

rt(st, at) = { rmove       if no collision
             { rcollision   if collision        (12)

A positive reward rmove is given to the AMR for freely operating in the environment.
Otherwise, a large negative reward rcollision is given if the AMR collides with an
obstacle, detected through a check of the minimum sensor scanning range. This reward
function encourages the AMR to keep to its lane and avoid collisions as it moves through
the environment.

4.6 Multi-Agent Deep Reinforcement Learning

Multi-Agent Reinforcement Learning (MARL) has been shown to perform well


in decision-making tasks in such a dynamic environment. Multi-agent learning is
challenging in itself, requiring agents to learn their policies while taking into account
the consequences of the actions of others [25].
In the last-mentioned work [25], a directly collaborative Communication-Enabled
Multi-Agent Decentralized Double Deep Q-Network (CMAD–DDQN) method was
proposed for energy optimization of AMRs (or UAVs), where each agent relies on its
local observations, as well as the information it receives from nearby AMRs, for
decision making. The communicated information from the nearby AMRs contains the
number of connected ground users, the instantaneous energy value, and the distances
from nearby AMRs in each time step. The authors propose an approach where each
agent executes actions based on state information, assuming a two-way communication
link among neighboring AMRs [25].
For improved system performance, the cooperative CMAD-DDQN strategy,
which relies on a communication mechanism between nearby UAVs, was proposed.
In the scenario under consideration, each agent's reward reflects its neighborhood's
coverage performance. Each AMR is controlled by a DDQN agent, which
aims to maximize the system’s energy efficiency (EE) by jointly optimizing its 3D
trajectory, the number of connected ground users, and the amount of energy used by
the AMRs, as shown in Fig. 15 [25].

Fig. 15 Top-Left: The CMAD-DDQN framework, in which each AMR (or UAV) j equipped with
a DDQN agent interacts with and learns from its closest neighbors' state spaces. In order to boost
system performance as a whole, each AMR collaborates directly. Bottom-Left: The multi-agent
decentralized double deep Q-network (MAD-DDQN) framework, in which each AMR j equipped
with a DDQN agent relies solely on the information it gathers from the surrounding environment
and does not collaborate directly with its immediate neighbors. UAV j's DDQN agent observes its
current state s in the environment at each time step and updates its trajectory by selecting an action
in accordance with its policy, receiving a reward r and moving to a new state s´ [25]

It is presumed that, as the agents collaborate with one another in a dynamic shared
environment, they may observe learning uncertainties brought on by other agents'
conflicting policies. The algorithm in Fig. 16 depicts Agent j's DDQN with direct
collaboration with its neighbors. Agent j adheres to an ε-greedy policy by carrying out an
action in its current state s, then moving to a new state s´ and receiving a reward that
reflects its neighborhood's coverage performance. Moreover, the DDQN procedure
depicted in lines 23–31 of the algorithm improves the agent's decisions (Fig. 16).

Fig. 16 DDQN for Agent j with direct collaboration with its neighbors [25]
4.7 Fusion Method

Due to AMR’s limited understanding of the environment when performing local path-
planning tasks, the issues of path redundancy and local deadlock that arise during
planning are present in environments that are unfamiliar and complex. A novel algo-
rithm based on the fusion (combination) of LSTM, NN, fuzzy logic control, and
RL was proposed by Guo et al. This algorithm uses the advantages of each algo-
rithm to overcome its disadvantages. For the purpose of local path planning, a NN model
with LSTM units is first developed. Second, a low-dimensional-input fuzzy logic control
(FL) algorithm is used to collect training data, and the network model LSTM_FT is
pretrained by transferring the learned policy to acquire the required skill. RL is combined
with autonomous learning of new rules from the environment so that the planner adapts
better to different situations. In static and dynamic environments, the FL and LSTM_FT
algorithms are compared with the fusion algorithm LSTM_FTR. Numerical simulations
show that LSTM_FTR can significantly improve path-planning success rate, path-length
optimization, and decision-making efficiency compared with FL. LSTM_FTR can learn
new rules and has a higher success rate than LSTM_FT [26]. The research's simulation
phase is still ongoing.

4.8 Hybrid Method

Deep neuro-fuzzy systems


New hybrid systems that are categorized as deep neuro-fuzzy systems (DNFS) were
created as a result of the concepts of hybrid approaches.
A DNN is a good way to deal with big data; the superior accuracy of the model,
however, comes at the cost of high complexity. Before using this kind of network to
solve a problem, a few things should be kept in mind. A DNN can provide a deeper
analytical model because it employs multiple hidden layers; however, the computational
complexity increases with each layer. In addition, these networks are based on a
conventional NN that trains using the gradient-descent optimization method. As a result,
the DNN frequently runs into the issue of getting stuck in local minima. The main
disadvantage of a DNN, in addition to these difficulties, is that the model is frequently
criticized for not being transparent, and the black-box nature of the model prevents
humans from tracing its predictions. It is hard to trust the results that these deep
networks produce, so there is always a risk of a communication gap between analysts
and DNNs. According to Talpur et al., this drawback restricts the usability of such
networks in the majority of real-world problems, where verification of predicted results
is a major concern [27].
A few studies in the literature have created novel DNFS by combining a DNN with
fuzzy systems to address these issues. Fuzzy systems are information-processing
structures built with fuzzy techniques. They are mostly applied in systems where
traditional binary logic is difficult or impossible to use. Their main characteristic is that
they use fuzzy conditional IF–THEN rules to represent symbolic knowledge. Thus, the
hybridization of DNNs and fuzzy systems has proved a viable method for reducing
uncertainty using fuzzy rules [27].
Figure 17 shows DNFS, which combines the advantages of a DNN and fuzzy
systems (Aviles et al., 2016).
Time-series data and other problems with high nonlinearity can be solved with a
sequential DNFS.
DNN and a fuzzy system process data sequentially. In fuzzy theory, a fuzzy set A

Fig. 17 Representation of DNFS by combining the advantages of fuzzy systems and a DNN [27]

Fig. 18 Sequential DNFS: a fuzzy systems incorporated with a DNN and b a DNN incorporated
with fuzzy systems [28]

in a universe of discourse X is represented by a membership function μA taking the


values from the unit interval, μA : X → [0, 1]. The membership function gives the
degree of membership of a data point x ∈ X within the universe of discourse.
To accurately describe real-world uncertainty, the fuzzy system makes use of the
approximate reasoning and decision-making capabilities of fuzzy logic. It can work with
data that lack precision or certainty, or that are ambiguous [27].
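As a minimal, hypothetical sketch of such a membership function (a triangular shape is
only one common choice; the "near" set and its parameters below are illustrative):

import numpy as np

def triangular_membership(x, a, b, c):
    """Triangular membership function mu_A: X -> [0, 1], support [a, c], peak at b."""
    x = np.asarray(x, dtype=float)
    left = (x - a) / (b - a)
    right = (c - x) / (c - b)
    return np.clip(np.minimum(left, right), 0.0, 1.0)

# Degree to which distances (in metres) belong to the hypothetical fuzzy set "near".
distances = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
print(triangular_membership(distances, a=0.0, b=0.5, c=2.0))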
A* hybrid with the deep deterministic policy gradient.
The expansion of nodes based on an evaluation function, which is the sum of two
costs, is a characteristic of the traditional graph-search algorithm known as A*: (1)
from the starting point to the node under consideration and (2) from the node to the
objective.
In contrast to traditional path planning methods, current approaches use RL
to solve the problem of robot navigation by implicitly learning to deal with the
interaction ambiguity of surrounding moving obstacles.
Because they are unaffected by changes in the number of robots and are decentralized,
decentralized methods for multi-robot path planning in dynamic scenes are becoming
increasingly common.
A*, adaptive Monte Carlo localization (AMCL), and RL are utilized in this additional
hybrid approach to develop a robot navigation strategy for particular dynamic scenes.
Another hybrid strategy is AMRN with DRL [28]. Model migration costs are reduced
when the A* and DDPG methods are used together.
The DDPG algorithm is modeled on the actor-critic algorithm. The procedure, whose
pseudocode is shown in Fig. 19 and whose architecture is shown in Fig. 20, uses two
critics to speed up the training process. One critic advises the actor on how to avoid
collisions and estimates the probability of a collision. The other critic, in addition to
instructing the actor on how to reach the target, reduces the difference between the input
speed and the output speed.

4.9 Hierarchical Framework

Pursuing sampling efficiency and sim-to-real transfer capability, Wei Zhu and
co-authors describe in [29] a hierarchical DRL framework for fast and safe navigation.
The low-level DRL policy enables the robot to move toward the target position while
simultaneously maintaining a safe distance from obstacles; the high-level DRL policy
was added to further enhance navigational safety. A waypoint on the path from the robot
to the ultimate goal is chosen as a sub-goal to avoid sparse rewards and reduce the state
space. The path can be generated from a local or a global map, which can make the
proposed DRL framework's generalizability, safety, and sampling efficiency much
better. The sub-goal can also be used to reduce the action space and increase motion
efficiency by creating a target-directed representation of the action space.
The objective is a DRL strategy with high training efficiency for quick and secure
navigation in complex environments that can be used in a variety of environments
and robot platforms. The low-level DRL policy is in charge of quick motion, and
the high-level DRL policy was added to improve obstacle avoidance safety. As a
result, a two-layer DRL framework is built, as shown in Fig. 21. When the sub-goal,
which is a waypoint on the path from the robot to the ultimate goal, is chosen, the
observation space of RL is severely limited. When conventional global path planning
strategies are used to generate the path, which takes into account both obstacles and
the final goal position, the sampling space is further reduced. Due to the inclusion of
the sub-goal, the training effectiveness of this DRL framework is significantly higher
than that of pure DRL methods. The DNN only generates a discrete linear velocity,
while the angular velocity is inversely proportional to the sub-goal's orientation in the
robot frame. Consequently, both the exploration action space and the observation space
are reduced.
variety of robot platforms and environments for three reasons: (1) the DRL elements
are represented by a sub-goal on a feasible path; (2) the observation includes a high-
dimensional sensor scan whose features were extracted using DNN; (3) Generalized
linear and angular velocities are used to convert the actions into actuator commands
[29].
The DQN algorithm is utilized by the value-based RL framework in both low-level
and high-level DRL. The discrete action space was chosen due to its simplicity, even
though the DDPG algorithm and the soft actor-critic (SAC) framework are required for
smoother motion.

Fig. 19 A* combined with DDPG method pseudo code

Fig. 20 A* combined with the DDPG architectural strategy. The laser input is provided by the
robot sensor. The input for navigation comes from global navigation. Using the vel-output, the
mobile base controls the robot. The gradient and computed mean squared error (MSE) are used to
update the actors' neural networks [28]

Fig. 21 DRL framework in a hierarchy. The high-level DRL strategy aims to safely avoid obstacles,
while the low-level DRL policy is utilized for rapid motion. A 37-dimension laser scan, the robot's
linear and angular velocities (v and w), and the sub-goal's position in the robot frame (r and θ) are
all part of the same state input for both the low-level and high-level DRL policies. In contrast, the
high-level DRL policy generates two abstract choices, one of which relates to the low-level DRL
policy, while the low-level DRL policy generates five specific actions [29]

5 Design Methodology

With DRL systems for autonomous navigation, the two most significant issues are
data inefficiency and a lack of generalizability to new goals.
A design methodology for designing DRL applications in autonomous systems is given
in [30]. Hillebrand's methodology was designed for the development of DRL-based
systems, accommodating the need to make trade-offs in the design method and to take
consecutive design decisions.
The V-Model’s fundamental principles serve as the methodology’s foundation.
Figure 22 describes the process’s four consecutive interactions.
The Operational Domain Analysis is the first step. The operational environment and the
robot's tasks, in terms of the use case and requirements, are defined in this phase. The
requirements serve as the criteria for testing and evaluation.
The second stage is the conceptual design. At this point, the primary characteristics of
the reinforcement learning problem are established. Key characteristics include the
action space, the perception (observation) space, the reward range, and the relevant
environment conditions.
The Systems Design is the third step. The various design decisions that need to
be made and an understanding of the fundamental factors that influence them are all
part of this phase.
The design of the reward is the first crucial aspect. The goal that the agent is
supposed to achieve is implicitly encoded in the reward.
The selection of an algorithm is the second design decision. When choosing an
algorithm, there are a few things to consider. The kind of action space first. Both a
discrete and a continuous action space can be handled by the DRL algorithm.

Fig. 22 Design Process for DRL [30]



The design of the NN is the third design decision. In DRL, NNs are used as function
approximators for the value and policy networks.
The inductive bias is the fourth design decision. Domain heuristics that are utilized
to accelerate the algorithm’s learning processor performance are referred to as an
inductive bias.
The learning rate is the final design factor; it determines the rate at which the NN is
trained. Virtual commissioning is the fourth step. X-in-the-Loop techniques and virtual
testbeds are used to evaluate agent performance and integrate the model in this step [30].

5.1 Benchmarking

Benchmarking is a difficult problem because of the learning process’s stochastic


nature and the limited datasets examined in algorithm comparisons. This problem
gets even worse with DRL. Since DRL incorporates both the environment’s and
model learning’s inherent stochasticities, it is particularly challenging to guarantee
reproducibility and fair comparisons. To this end, benchmarks have been created
using simulations of numerous sequential decision-making tasks [31].
Best practices to benchmark DRL

Number of Trials, Random Seeds, and Significance Testing


The number of trials, random seeds, and significance testing all play a significant role in
DRL. Stochasticity comes from the environments themselves and from randomness in
NN initialization. Simply changing the random seed can cause significant variations in
the results. Therefore, it is essential to conduct a large number of trials with various
random seeds when evaluating the performance of algorithms [31].
In DRL, it is common practice to simply use the average of several learning trials to
determine an algorithm's effectiveness. A more rigorous benchmarking strategy
complements this average with significance testing, which provides statistically
supported arguments in favor of a particular hypothesis. Using a variety of random seeds
and environmental conditions, significance testing can be applied to DRL in practice to
account for the standard deviation across multiple trials. A simple 2-sample t-test, for
instance, can be used to determine whether performance gains are genuinely attributable
to the algorithm or whether the results are too noisy in highly stochastic environments.
In particular, several works have argued that, for accurate comparisons, it is not
sufficient to simply present the top-K trials as performance gains [31].
In addition, it is important to exercise caution when interpreting the outcomes. It is
possible to demonstrate that a hypothesis holds for one or more specific environments
and sets of hyperparameters, but fails in other contexts.
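As a minimal sketch of such a test (the per-seed returns below are hypothetical; SciPy's
Welch variant of the 2-sample t-test is used here):

import numpy as np
from scipy import stats

# Hypothetical average returns of two algorithms, one value per random seed.
returns_a = np.array([212.0, 198.5, 230.1, 205.7, 219.3, 201.2, 224.8, 210.4])
returns_b = np.array([195.2, 188.9, 207.4, 199.1, 192.6, 203.3, 190.8, 197.5])

# Welch's two-sample t-test: does algorithm A really outperform B,
# or is the difference explained by seed-to-seed noise?
t_stat, p_value = stats.ttest_ind(returns_a, returns_b, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
# A small p-value (e.g. < 0.05) supports a significant difference across these trials.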

Hyperparameter Tuning and Ablation Comparisons


Ablation and tuning are two additional important considerations. In this instance,
a variety of random seed combinations are compared using an ablation analysis
across a number of trials. For baseline algorithms, it is especially important to fine-
tune hyperparameters as much as possible. A false comparison between a novel
algorithm and a baseline algorithm may occur if the hyperparameters used are not
selected appropriately. Particularly, a large number of additional parameters, such
as the learning rate, reward scale, network architecture, and training discount factor,
among others, have the potential to significantly affect outcomes.
To ensure that a novel algorithm performs significantly better, the appropriate
scientific procedure must be followed when selecting such hyperparameters.
Reporting Results, Benchmark Environments, and Metrics
Some studies have used metrics like the maximum return within Z samples or the
average maximum return, but these metrics may be biased to make the results of
highly unstable algorithms appear more significant. These metrics, for instance,
would guarantee that an algorithm is successful if it quickly achieves a high maximum
return but then diverges. When selecting metrics to report, it is essential to choose ones
that provide a fair comparison. If the algorithm performs better in average maximum
return but worse in average return, it is essential to highlight both results and describe
the algorithm's advantages and disadvantages. This also applies to selecting the
evaluation's benchmark environments. Ideally, empirical results ought to cover a wide
range of conditions, to figure out which conditions an algorithm succeeds in and which it
does not. This is required to identify the algorithm's applications and capabilities in the
real world.
Open-source software for DRL simulation
A learning algorithm—model-based or model-free—and a particular structure or
structures of function approximators make up a DRL agent.
There is a lot of software available to simulate DRL for autonomous mobile robot
navigation. For the validation and verification of AMR, there are 37 tools listed under
Effect Tools in Araujo [32]. Of those 37 tools, 27 have their source artifacts made public
and are open source.
According to the same authors [32], the most significant gap is the absence of
agreed-upon rigorous measures and real-world benchmarks for evaluating the
interventions' efficiency and effectiveness. Measures of efficiency and effectiveness that
are applicable to the AMR sub-domains are scarce, and many of them are extremely
generic. Due to the absence of domain-specific modeling languages and runtime
verification methods, AMR testing strategies leave room for improvement. Another
significant gap is the lack of quantitative specification languages for expressing the
desired system properties; property languages must cover topics like the combination of
discrete and continuous dynamics, stochastic and epistemic aspects, and user- and
environment-related aspects of behavior because of the inherent heterogeneity of AMR.
This is related to the lack of interventions that provide quantitative quality metrics as the
test's result and conduct a quantitative system analysis [32].
Techniques for benchmarking
Gazebo (https://gazebosim.org/home): An open-source 3D simulator for robotics
applications is called Gazebo. From 2004 to 2011, Gazebo was a part of the Player
Project. Gazebo became an independent project in 2012, and the Open Source
Robotics Foundation (OSRF) began providing support for it. Open Dynamics Engine
(ODE), Bullet, Simbody, and Dynamic Animation and Robotics Toolkit (DART) are
among the physics engines that are incorporated into Gazebo. Each of these physics
engines is capable of loading a physical model that is described in XML format by a
Simulation Description Format (SDF) or Unified Robot Description Format. Gazebo
also allows users to create their own world, model, sensor, system, visual, and GUI
plugins by implementing C++ Gazebo extensions. This capability enables users to
extend the simulator further into more complex scenarios. OSRF provides a bridge
between Gazebo and Robot Operating System (ROS) with the gazebo_ros plugin
package [33].
ROS is an open-source software framework for robot software development main-
tained by OSRF. ROS is a widely used middleware by robotics researchers to leverage
the communication between different modules in a robot and between different
robots and to maximize the re-usability of robotics code from simulation to the
physical devices. ROS allows the running of different device modules as a node and
provide multiple different types of communication layers between the nodes such as
service, publisher-subscriber, and action communication models to satisfy different
purposes. This allows robotics developers to encapsulate, package, and re-use each
of the modules independently. Additionally, it allows each module to be used in both
simulation and physical devices without any modification [33].
In robotics, as well as many other real-world systems, continuous control frameworks
are required. Numerous works provide RL-compatible access to highly realistic robotic
simulations by combining ROS with physics engines like ODE or Bullet. Most of them
can be run on real robotic systems with the same software.
Two examples of the implementation of Gazebo in benchmarking deep reinforce-
ment learning algorithms are the following.
Yue et al. [34] address the problem that the AMR cannot construct an environment map
before moving to its desired position; instead, it relies only on what is currently visible.
Within the DRL framework, the DQN is used to map the initial image to the mobile
robot's best action. As previously stated, it is difficult to directly apply RL in a
real-world robot navigation scenario because of the large number of training examples
needed. Before being used to tackle the problem in a real mobile robot navigation
scenario, the DQN is therefore first trained in the Gazebo simulation environment. The
proposed method was validated in both simulation and real-world testing. The
experimental results of autonomous mobile robot navigation in the Gazebo simulation
environment demonstrate that the trained DQN is able to accurately map the current
original image to the AMR's optimal action and approximate the AMR's state
action-value function. The experimental results in real-world indoor scenes demonstrate
that the DQN that was trained in a simulated environment can be utilized in a real-world
indoor environment. The AMR can likewise avoid obstacles and reach the intended
location, even in dynamic conditions where there is interference. As a result, it can be
used as an effective and environmentally adaptable AMRN method by AMRs operating
in unknown environments.
For robots moving in tight spaces [35], the authors assert that mapping, localization, and
control noise could result in collisions when motion planning is based on the
conventional hierarchical autonomous system. In addition, such planning cannot operate
when no map is available. To address these issues, the authors employ DRL, a
self-decision-making technique, to self-explore in small spaces without a map and avoid
collisions. The rectangular safety region, which represents states and detects collisions
sions. The rectangular safety region, which represents states and detects collisions
for robots with a rectangular shape, and a meticulously constructed reward function,
which does not require information about the destination, were suggested to be used
for RL using the Gazebo simulator. After that, they test five reinforcement learning
algorithms—DDPG, DQN, SAC, PPO, and PPO-discrete—in a narrow track simu-
lation. After training, the successful DDPG and DQN models can be applied to three
brand-new simulated tracks and three actual tracks. (https://sites.google.com/view/
rl4exploration).
Benchmarking Techniques for Autonomous Navigation
Article [36] confirms, that “a lack of an open-source benchmark and reproducible
learning methods specifically for autonomous navigation makes it difficult for
roboticists to choose what RL algorithm to use for their mobile robots and for
learning researchers to identify current shortcomings of general learning methods
for autonomous navigation”.
Before utilizing DRL approaches for AMRN, the four primary requirements that must be
satisfied are as follows: reasoning about safety, generalization to diverse and novel
environments, learning from limited trial-and-error data, and reasoning under the
uncertainty of partially observed sensory inputs. The four main categories of
learning methods that can satisfy one or more of the aforementioned requirements are
safe RL, memory-based NN architectures, model-based RL, and domain randomiza-
tion. A comprehensive investigation of the extent to which these learning strategies
are capable of meeting these requirements for RL-based navigation systems is carried
out by incorporating them into a brand-new open-source large-scale navigation
benchmark. This benchmarking’s codebase, datasets, and experiment configurations
can be found at https://github.com/Daffan/ros_jackal.
Benchmarking multi-agent deep reinforcement learning algorithms
An open-source Framework for Multi-robot Deep Reinforcement Learning
(MADRL), named MultiRoboLearn was proposed by Chen et al. [37]. In terms
of generality, efficiency, and capability in an unstructured and large complex envi-
ronment, it is also important to include the support of multi-robot systems in existing
robot learning frameworks. More specifically, complex tasks such as search/rescue,

group formation control, or uneven terrain exploration require robust, reliable, and
dynamic collaboration among robots. MultiRoboLearn acts as a bridge that links
multi-agent DRL algorithms with real-world multi-robot systems. The framework has
two key characteristics compared with other frameworks: compared with learning-based
single-robot frameworks, it considers how robots collaborate to perform tasks
intelligently and how they communicate with each other efficiently; in addition, the
work extends the system to the domain of learning algorithms
(https://github.com/JunfengChenrobotics/MultiRoboLearn).

6 Teaching

In typical educational robotics approaches, robot navigation is accomplished through
imperative programming. Given the increasing presence of AI in everyday life, these
methods miss an opportunity to introduce ML techniques grounded in an authentic and
engaging learning context. Additionally,
barriers that prevent all students from participating in robotics experiences include
the requirement of a lot of physical space as well as pricey, specialized equipment
[38].
Individual learning path planning has become more common in online learning
systems in recent years [39], but few studies have looked at teaching path planning
in traditional classrooms.
The authors of "Open Source Robotic Simulators Platforms for Teaching Deep
Reinforcement Learning Algorithms" [40] suggest, based on their experience, two
open-source components for teaching RL and DRL algorithms: the union of Gym, a
toolkit for developing and comparing RL algorithms, and the robotic simulator
CoppeliaSim (V-REP). This conception is followed in this chapter.
CoppeliaSim: A distributed control architecture is the foundation of the integrated
development environment for the robotics simulator CoppeliaSim (https://www.cop
peliarobotics.com/). Using an embedded script, a plugin, a ROS node, a remote API
client, or a custom solution, each object or model can be individually controlled.
This makes CoppeliaSim very versatile and ideal for multi-robot applications. Control
programs can be written in Octave, C/C++, Python, Java, Lua, or MATLAB.
CoppeliaSim is utilized for a variety of purposes, including rapid algorithm development,
simulations of factory automation, rapid prototyping and verification, robotics-related
education, remote monitoring, safety double-checking, and more [41].
The first exercise proposed in this paper is to control the position of a simulated Khepera IV mobile robot in a virtual environment using RL algorithms. In order to carry out the experiments and control the robot's movement in the simulated environment, the OpenAI Gym library and the 3D simulation platform CoppeliaSim are utilized. The results of the RL agents used, DDPG and DQN, are compared with those of two control algorithms, Villela and IPC. The results of the

Fig. 23 Communication program [42]

analyses conducted under both obstacle and obstacle-free conditions demonstrate that DDPG and DQN are able to learn and identify the best actions in the environment. This makes it possible to carry out position control over various target locations and achieve the best results across a variety of metrics and records (https://github.com/Fco-Quiroga/exercise center Khepera position) [42].
An interface that converts the CoppeliaSim simulation into an environment that
is compatible with RL agents and control algorithms was developed using the
OpenAI Gym library. Gym provides a straightforward description of RL environ-
ments, formalized as Partially Observable Markov Decision Processes (POMDPs).
Figure 23, which is a diagram of the communication that takes place in the environ-
ments, shows the information flow that takes place between the algorithm or agent,
the Gym environment, the API, and the CoppeliaSim program.
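As a minimal sketch of such a Gym-compatible interface (the class name, spaces, and the toy kinematic update below are illustrative assumptions; the real environment of [42] drives the CoppeliaSim scene through its remote API instead), only the reset and step methods need to be implemented:

    import gym
    import numpy as np
    from gym import spaces

    class KheperaPositionEnv(gym.Env):
        """Skeleton Gym environment for position control of a simulated robot.
        The dynamics here are a trivial kinematic stand-in so the example runs
        standalone; in the real setup these updates would be CoppeliaSim
        remote-API calls, as sketched in Fig. 23."""

        def __init__(self, goal=(1.0, 1.0)):
            super().__init__()
            self.goal = np.array(goal, dtype=np.float32)
            # Actions: two wheel velocities; observations: x, y, heading, distance to goal.
            self.action_space = spaces.Box(-1.0, 1.0, shape=(2,), dtype=np.float32)
            self.observation_space = spaces.Box(-np.inf, np.inf, shape=(4,), dtype=np.float32)

        def _observe(self):
            dist = np.linalg.norm(self.goal - self.pose[:2])
            return np.array([*self.pose, dist], dtype=np.float32)

        def reset(self):
            self.pose = np.zeros(3, dtype=np.float32)   # x, y, heading
            return self._observe()

        def step(self, action):
            v = 0.05 * (action[0] + action[1]) / 2.0    # forward speed from wheel commands
            w = 0.5 * (action[1] - action[0])           # turning rate
            self.pose[2] += w
            self.pose[0] += v * np.cos(self.pose[2])
            self.pose[1] += v * np.sin(self.pose[2])
            obs = self._observe()
            reward = -float(obs[3])                     # dense reward: closer to the goal is better
            done = bool(obs[3] < 0.05)
            return obs, reward, done, {}

Any Gym-based agent, for instance a DQN or DDPG implementation, can then interact with the simulator through the usual reset/step loop, which is exactly the information flow shown in Fig. 23.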
In the second exercise, CoppeliaSim (V-REP) is used to implement a learning-based mapless motion planner. This planner takes the target position relative to the mobile robot as input and outputs continuous steering commands [44]. For mobile ground robots equipped with laser range sensors, traditional motion planners rely primarily on an obstacle map of the navigation environment, which requires both the map-building work for the environment and a highly precise laser sensor. Using a deep reinforcement learning strategy, it is demonstrated that a mapless motion planner can be trained end-to-end without manually designed features or prior demonstrations. The trained planner can put its knowledge to use immediately, both in seen and unseen environments. The tests demonstrate that the proposed mapless motion planner is able to successfully drive the nonholonomic mobile robot to its intended locations. In the examples at https://paperswithcode.com/paper/virtual-to-real-deep-reinforcement-learning, Gazebo is used.

7 Discussion

The introduction of deep reinforcement learning as the proposed general learning framework for AI creates application-independent and platform-independent systems.

The ultimate goal in AI and Robotics currently is to reach human-level control.


Robotics and AI are among the most complicated and highly multidisciplinary engineering sciences, supported by a good amount of basic knowledge from other sciences. The theory of autonomous mobile robots originates in control theory and reaches current machine learning applications.
The requirements of mobile robot navigation are perception, localization, path-planning, and motion control. In the case of path-planning for mapless robot navigation, this requirement may become fuzzy, as represented in Fig. 4. Continuing with the Background section, the reader is introduced to the foundations of DRL through the concepts and functional structures of the machine learning algorithms that contribute to the general DRL conception. Mentioned there are RL, CNN, LSTM, the value-based DQN, DDQN, and D3QN, and the policy-based DDPG, A3C, and PPO, all of which serve to meet the requirements of dynamic behavior, safety, and uncertainty.
The Methods section is more theoretical and covers the explanation, structure, and pseudo-code of the algorithms that permit the programming implementation of the most common and promising methods for autonomous mobile robotics (simple algorithms, fusion, hybrid, hierarchical, and multi-agent), allowing the researcher to visualize all the options and possibilities for the development of autonomous mobile robots using DRL.
The two most important problems with DRL systems for autonomous navigation
are data inefficiency and lack of generalization to new goals. The methodology and
benchmarking tools described in Sect. 5 support the minimization of these prob-
lems in the development effort. The emphasis is on the use of simulation tools. Gazebo is the simulation tool most widely used in the deep reinforcement learning world for AMR development.
One important topic shown above is the key role of Teaching Robotics and
Machine Learning and the importance of Simulation for better comprehension. For
teaching, the text has proposed the use of CoppeliaSim.
Continuous theoretical development is expected in this field, particularly in fusion, hybrid, and hierarchical implementations. With all this knowledge, it is possible for researchers to find a way toward the development of autonomous mobile robots using deep reinforcement learning algorithms with better functionality for safe and efficient navigation.
At the time of writing, the author has not found another work that presents, in a single paper, a general characterization of the use of DRL for the development of AMRN embracing all of these topics.

8 Conclusions

Numerous fields have benefited greatly from the contributions of autonomous robots. Since mobile robots need to be able to navigate safely and effectively, there is a strong demand for innovative algorithms; paradoxically, the number of algorithms that

can navigate and control robots in dynamic environments is limited, even though the
majority of autonomous robot applications take place in dynamic environments.
With the development of machine learning algorithms, in particular reinforcement learning and deep learning, and with the creation of deep reinforcement learning algorithms, a wide field of applications has opened at a new level for autonomous mobile robot navigation techniques in dynamic environments with safety and uncertainty considerations.
However, this is a very fast-moving field, and for better development it is necessary to establish a methodological conception that, first, selects and characterizes the fundamental deep reinforcement learning algorithms at the code level, making a qualitative comparison of the most recent autonomous mobile robot navigation techniques for control in dynamic environments with safety and uncertainty considerations, and highlighting the most complex and promising techniques such as fusion, hybrid, and hierarchical frameworks. Second, it includes the design methodology and establishes the different benchmarking techniques for selecting the best algorithm according to the specific environment. Finally, but no less significantly, it recommends the tools and most suitable examples, based on experience, for teaching autonomous robot navigation using deep reinforcement learning algorithms.
Looking at the future perspective of this work, it is necessary to continue developing the methodology and to write a more homogeneous and practical document that permits the inclusion of newly developed algorithms and a better comprehension of the exposed methodology. It is hoped that this methodology will help students and researchers in their work.

Abbreviations

AC Actor-Critic Method
AI Artificial Intelligence
AMRN Autonomous Mobile Robot Navigation
ANN Artificial Neural Networks
AR Autonomous Robot
BDL Bayesian Deep Learning
CNN Convolutional Neural Networks
CMAD-DDQN Communication-Enabled Multiagent Decentralized DDQN
DRL Deep Reinforcement Learning
DNN Deep Neural Networks
DNFS Deep Neuro-fuzzy systems
DL Deep Learning
DQN Deep Q-network
DDQN Double DQN
D3QN Dueling Double Deep Q-network
DDPG Deep Deterministic Policy Gradient

DDP Deep Deterministic Policy


DQL Deep Q-Learning
DART Dynamic Animation and Robotics Toolkit
FC Fully Connected
FL Fuzzy Logic Control
GNC Guidance, Navigation and Control
LSTM Long Short Term Memory
MNR Mobile Robot Navigation
ML Machine Learning
MR Mobile Robot
MDP Markov Decision Process
MARL Multi-Agent Reinforcement Learning
MADRL Multi Robot Deep Reinforcement Learning
MSE Mean Square Error
NN Neural Networks
ODE Open Dynamics Engine
OSRF Open Source Robotics Foundation
POMDPs Partially Observable Markov Decision Processes
PPO Proximal Policy Optimization
RL Reinforcement Learning
RL-AKF Adaptive Kalman Filter Navigation Algorithm
RNN Recurrent Neural Network
ROS Robot Operating System
RSSM Recurrent State-Space Model
SAC Soft Actor Critic
SDF Simulation Description Format
URDF Unified Robotic Description Format.

References

1. Dargazany, A. (2021). DRL: Deep reinforcement learning for intelligent robot control–Concept, literature, and future (p. 16). arXiv:2105.13806v1.
2. Abbeel, P. (2016). Deep learning for robotics. In DL-workshop-RS.
3. Balhara, S. (2022). A survey on deep reinforcement learning architectures, applications and
emerging trends. IET Communications, 16.
4. Hodge, V. J. (2020). Deep reinforcement learning for drone navigation using sensor data. Neural
Computing and Applications, 20.
5. Kondratenko, Y., Atamanyuk, I., Sidenko, Machine learning techniques for increasing effi-
ciency of the robot’s sensor and control information processing. Sensors MDPI, 22(1062),
31.
6. Gao, X. (2020). RL-AKF: An adaptive kalman filter navigation algorithm based on reinforce-
ment learning for ground vehicles. Remote Sensing, 12(1704), 25.
7. Hewawasam, H. S. (2022). Past, present and future of path-planning algorithms for mobile
robot navigation in dynamic environments. IEEE Industrial Electronics Society, 3(2022), 13.

8. Doukhi, O. (2022). Deep reinforcement learning for autonomous map-less navigation of a


flying robot. IEEE Access, 13.
9. Xiao, X. (2022). Motion planning and control for mobile robot navigation using machine
learning: A survey. Autonomous Robots, 29.
10. Kober, J. (2013). Reinforcement learning in robotics: A survey. The International Journal of
Robotics Research, no. Res.0278364913495721.
11. Plasencia, A. (2013). Simulación de la navegación de los robots móviles mediante algoritmos
de aprendizaje por refuerzo para fines docentes. In TCA-2013, La Habana.
12. H. B. (2005). Reinforcement learning neural network to the problem of autonomous mobile
robot obstacle avoidance. In: Proceedings of the Fourth International Conference on Machine
Learning and Cybernetics, Guangzhou.
13. H. M. (2008). Simulation of the navigation of a mobile robot by the Q Learning using artificial
neuron networks. In University Hadj Lakhdar, Batna, Algeria.
14. Bengio, Y. (2009). Learning deep architectures for AI. in Now Publishers Inc.
15. Zhu, K. (2021). Deep reinforcement learning based mobile robot navigation: A review.
Tsinghua Science and Technology, 26(5), 18.
16. Melcher, K., Silipo, R. (2020). Codeless deep learning with KNIME. Packt Publishing.
17. Plasencia, A.: Autonomous robotics safety. in X Taller Internacional De Cibernética Aplicada,
La Habana.
18. González-Rodríguez, L. (2021). Uncertainty-Aware autonomous mobile robot navigation
with deep reinforcement learning. In: Deep learning for unmanned systems, Switzerland AG
(pp. 225–257). Springer.
19. Plasencia, A. (2021). Managing deep learning uncertainty for unmanned systems. In Deep
Learning for Unmanned Systems, Switzerland (pp. 184–223). Cham: Springer.
20. Lillicrap, T. P. (2016). Continuous control with deep reinforcement. In ICLR 2016, London,
UK.
21. Rodrigues, M. (2021). Robot training and navigation through the deep Q-Learning algorithm.
In IEEE International Conference on Consumer Electronics (ICCE).
22. Jiang, Q. (2022). Path planning method of mobile robot based on Q-learning. in AIIM-2021
Journal of Physics: Conference Series.
23. Ruan, X. (2019). Mobile robot navigation based on deep reinforcement learning. in The 31th
Chinese Control and Decision Conference (2019 CCDC), Beijing.
24. Wu, P. (2022). DayDreamer: World models for physical robot learning (p. 15).
arXiv:2206.14176v1 [cs.RO].
25. Omoniwa, Communication-Enabled multi-agent decentralised deep reinforcement learning
to optimise energy-efficiency in UAV-Assisted networks. In IEEE transactions on cognitive
communications and networking (p. 12).
26. Guo, N. (2021). A fusion method of local path planning for mobile robots based on LSTM neural network and reinforcement learning. Mathematical Problems in Engineering, Hindawi, 2021, Article ID 5524232, p. 21.
27. Talpur, N. (2022). Deep Neuro-Fuzzy System application trends, challenges, and future
perspectives: a systematic survey. Artificial Intelligence Review, 49.
28. Zhao, K. (2022). Hybrid navigation method for multiple robots facing dynamic obstacles.
Tsinghua Science and Technology, 27(6), 8.
29. Zhu, W. (2022). A hierarchical deep reinforcement learning framework with high efficiency
and generalization for fast and safe navigation. IEEE Transactions on Industrial Electronics,
10.
30. Hillebrand, M. (2020). A design methodology for deep reinforcement learning in autonomous
systems. Procedia Manufacturing, 52, 266–271.
31. François-Lavet, V. (2018). An introduction to deep reinforcement learning. Foundations and
Trends in Machine Learning, 11(3–4), 140. arXiv:1811.12560v2 [cs.LG].
32. Araujo, H. (2022). Testing, validation, and verification of robotic and autonomous systems: A
systematic review. Association for Computing Machinery ACM, 62.

33. La, W. G. (2022). DeepSim: A reinforcement learning environment build toolkit for ROS and
Gazebo (p. 10). arXiv:2205.08034v1 [cs.LG].
34. Yue, P. (2019). Experimental research on deep reinforcement learning in autonomous navigation of mobile robot.
35. Tian, Z. (2022). Reinforcement Learning for Self-exploration in Narrow Spaces (Vol. 17, p. 7).
arXiv:2209.08349v1 [cs.RO].
36. Xu, Z. Benchmarking reinforcement learning techniques for autonomous navigation.
37. Chen, J. (2022). MultiRoboLearn: An open-source Framework for Multi-robot Deep Reinforce-
ment Learning (p. 7). arXiv:2209.13760v1 [cs.RO].
38. Dietz, G. (2022). ARtonomous: Introducing middle school students to reinforcement learning
through virtual robotics. In IDC ’22: Interaction Design and Children.
39. Yang, T., Zuo (2022). Target-Oriented teaching path planning with deep reinforcement learning
for cloud computing-assisted instructions. Applied Sciences, 12(9376), 18.
40. Armando Plasencia, Y. S. (2019). Open source robotic simulators platforms for teaching deep
reinforcement learning algorithms. Procedia Computer Science, 150, 9.
41. Coppelia robotics. Retrieved October 10, 2022, from https://www.coppeliarobotics.com/.
42. Quiroga, F. (2022). Position control of a mobile robot through deep reinforcement learning.
Applied Sciences, 12(7194), 17.
43. Zeng, T. (2018). Learning continuous control through proximal policy optimization for mobile
robot navigation. In: 2018 International Conference on Future Technology and Disruptive
Innovation, Hangzhou, China.
44. Tai, L. (2017). Virtual-to-real deep reinforcement learning: continuous control of mobile robots
for mapless navigation. In IROS 2017, Hong Kong.
Event Vision for Autonomous Off-Road
Navigation

Hamad AlRemeithi, Fakhreddine Zayer, Jorge Dias, and Majid Khonji

Abstract Robotic automation has always been employed to optimize tasks that are
deemed repetitive or hazardous for humans. One instance of such an application
is within transportation, be it in urban environments or other harsh applications.
In said scenarios, it is required for the platform’s operator to be at a heightened
level of awareness at all times to ensure the safety of on-board materials being
transported. Additionally, during longer journeys it is often the case that the driver
might also be required to traverse difficult terrain under extreme conditions. For
instance, low light, fog, or haze-ridden paths. To counter this issue, recent studies
have proven that the assistance of smart systems is necessary to minimize the risk
involved. In order to develop said systems, this chapter discusses a concept of a Deep
Learning (DL) based Vision Navigation (VN) approach capable of terrain analysis
and determining the appropriate steering angle within a margin of safety. Within the
framework of Neuromorphic Vision (NV) and Event Cameras (EC), the proposed
concept tackles several issues within the development of autonomous systems, in particular the use of a Transformer-based backbone for off-road depth estimation with an event camera, for better accuracy and processing time. The implementation of the above-mentioned deep learning system with an event camera is supported by the necessary processing of the event data prior to the training phase. Besides, binary convolutions and, alternately, spiking convolution paradigms using the latest technology trends have been deployed as acceleration methods, with efficiency in terms of energy, latency, and environmental robustness. Initial results hold promising potential for the future development of real-time projects with event cameras.

H. AlRemeithi
Tawazun Technology and Innovation, Abu Dhabi, United Arab Emirates
e-mail: [email protected]
H. AlRemeithi · F. Zayer (B) · J. Dias · M. Khonji
Khalifa University, Abu Dhabi, United Arab Emirates
e-mail: [email protected]
J. Dias
e-mail: [email protected]
M. Khonji
e-mail: [email protected]



Keywords Autonomous robotics · Off-road navigation · Event camera ·


Neuromorphic sensing · Robotic vision · Deep learning systems

1 Introduction

Despite the advancements in autonomous driving algorithms, there still exists much
room for development within the realm of off-road systems. Current state-of-the-
art techniques for self-driving platforms have matured in the context of inter-city
travel [1, 2] and thus neglect the challenges faced when navigating environments
such as deserts. Uneven terrain and the lack of relevant landmarks and/or significant
features, pose serious challenges when the platform attempts to localize itself or
analyze the terrain to determine suitable navigation routes [3]. When discussing
desert navigation, even for skilled drivers, maneuverability is a complex task to
achieve that requires efficient decision-making. It has been shown that self-driving
platforms are capable of meeting such standards when using a combination of sensors
that measure the state of the robot and its surroundings [2].
Popular modern approaches include stereo-vision in addition to ranging sensors
like LiDARs to map the environment [4]. Utilizing several sensors allows for redundancy and safer navigation, but at the cost of increased development complexity, system integration requirements, and financial burden [5]. Researchers have
addressed this concern by developing algorithms that minimize on-board sensors,
where in extreme cases a monocular vision-based approach is developed [6]. Whilst
this reduces the computational resources required to run the system in real-time, it
is still feasible to further reduce the system complexity. In recent years, neuromorphic computing has been researched heavily to further optimize these systems, and to accommodate these novel architectures a new kind of vision sensor, the event camera, is used in place of traditional frame-based cameras. So, from the above,
efficient off-road navigation is yet to be achieved, especially in the context of extreme
environments. The study presented discusses scenarios that may be experienced in
the UAE deserts.
The contribution presented in this chapter is a concept of an end-to-end neural
network for depth and steering estimation in the desert. To the best of the authors’
knowledge, this work is the first to investigate and argue the implications of uti-
lizing a Transformer based backbone for off-road depth estimation using an event
camera. The implementation of the above-mentioned deep learning system with an event camera is supported by the necessary processing of the event data prior to the training phase. During inference, an acceleration method, namely

Binary Convolutions, is implemented and initial results hold promising potential for
the future development of real-time projects with event cameras.
The remaining sections of the chapter are as follows. Section 2 presents related work on off-road navigation and neuromorphic vision, as well as the use of event cameras and their feasibility. Section 3 discusses event-based vision navigation. Section 4 shows the proposed end-to-end deep learning navigation model, including event processing, depth estimation, steering prediction, and the dataset. Section 5 discusses the implementation of the system and its possible acceleration using the binary convolution method. In addition, energy-efficient processing using a memristive-technology-based neuromorphic infrastructure is proposed in the case study. Results and discussion are presented in Sect. 6 to show the obtained results and the main achievements. Finally, conclusions and future work are drawn in Sect. 7.

2 Related Work

2.1 Off-Road Navigation

Given a desert setting, traditional navigation techniques may not be directly, or even completely, compatible with the setting. Considerable modifications in terms of mechanical and algorithmic design are required when navigating off-road environments. In such environments, it is expected to be
exposed to high temperatures, visual distortions due to dust, and instability when
driving on uneven terrains [7, 8]. Due to these challenges, the risk involved for a
human operator is increased significantly and it is often the case that some sort of
enhancement is needed to existing systems [9]. Specifically, in night-time transportation, convoys may be more inclined to halt the journey to minimize such risks, which results in profit loss for businesses or delays in critical missions [10–12]. It is crucial to devise a strategy that may be utilized around the clock and that is standalone as well, to allow for system integration flexibility with multiple platform types. Multiple paradigms are used to navigate autonomously, as mentioned in the previous section, for example the well-established vision plus ranging sensor configuration (camera and LiDAR) [9]. Unfortunately, using such a configuration in harsh and unpredictable environments is not reliable due to the degradation of LiDAR performance caused by heat and by diffraction from sand particles [13, 14].
to design a stronger filter to reconstruct the noisy LiDAR data due to the hot tempera-
tures, and the diffraction and refraction from the laser and the sand particles [15, 16].
Although there has been a study that managed to overcome this issue by employing
a high-power LiDAR [17], such a solution is not preferable. It is not viable in this scenario, as adding a high-power variant will hinder performance when operating at high temperatures. Subsequently, the complexity of integration is also increased, as
larger power supplies and cooling solutions must be integrated on the platform. Since

the project proposes a standalone solution, using a monocular setup is preferable,


to avoid additional peripherals. Software-wise, with extra sensors the data fusion
complexity is increased because of the uneven terrain [18–20].
This research addresses the challenges faced in a dynamic desert environment
when utilizing a purely vision-based strategy by introducing Event cameras to the
system architecture [21]. Using Event cameras will allow us to reduce the compu-
tational requirements of the system and speed up processing [22]. By nature of its
design, these cameras record change in light intensities asynchronously within a
scene rather than synchronously recording current intensity values of a pixel within
a frame [23, 24]. There are two main advantages of this feature, one is the reduced
power consumption which yields a better thermal profile in hot environments, and
in-turn implies better long-term system performance. More importantly, since only
changes in light intensities are recorded, i.e. only moving objects are seen and static
information is discarded inherently, in practice it is reflected as a reduction in system
latency; it is expected to have an internal latency of almost 1 million times faster
than traditional cameras [23, 25, 26].
To develop the desired system, a certain due diligence is required, and hence
this section is dedicated to a literature review of state-of-the-art techniques in
autonomous driving, specifically for off-road settings. The rest of the chapter shall
discuss the following topics: a modest introduction to event-based vision sensors, algorith-
mic compatibility of event streams with traditional computing paradigms, possible
improvements for high speed vision in robotics, state-of-the-art depth estimation
techniques, and finally off-road steering.

2.2 Neuromorphic Vision

Recent advances in the last couple of decades in neuromorphic computing have


manifested in the form of event cameras. Event cameras are bio-inspired vision
sensors whose purpose is to tackle the increased demands of dynamic environments in
robotic applications [27–29]. These sensors are capable of capturing the change in light intensities per pixel and recording them asynchronously rather than continuously recording frames. The operational concept is illustrated in Fig. 1, with such a neuromorphic
sensor as presented in Fig. 2. Frames display the rotating disk, while events are in
red and blue; positive and negative events to highlight change in position.
Event cameras have also been demonstrated to be robust against environmental luminosity variations thanks to a high dynamic range (HDR) of approximately 140 dB, in comparison to 60 dB in traditional setups [24, 29, 31] (see Fig. 3). Additionally, this also implies better feature detection, since robustness against luminosity changes also indicates robustness against motion blur. Low-light performance is also superior to
traditional frame-based cameras as the HDR allows for more reception from moon-
light and daylight. This is evident in a study by [32–34] where an Event camera was
used for high speed (10 µs) variant-light feature reconstruction.

Fig. 1 Operation of frame camera versus event camera [30]

Fig. 2 Simplified
Neuromorphic sensor
architecture [22]

As mentioned previously, since only non-static artifacts are recorded, this means
that operationally the discarded pixels indirectly conserve power. This is seen in the
power consumption of a regular Event camera being approximately 100–140 mW in
an active state; four orders of magnitude less than a frame-based camera [35].
In summary, the advantages are, as listed by [22]:
• Minimal Internal Latency and High Temporal Resolution
• High Dynamic Range Analog Sensor
• Reduced Power Consumption and Heat Dissipation.

Fig. 3 Frame reconstruction under different illuminations [33]

2.3 Key Features of Event Cameras

Event cameras output asynchronous sparse data according to log intensity differ-
ences through time [26]. Expanding on the advantages listed earlier in the chapter,
this section elaborates further. The ms-scale latency and high dynamic range is a
benefit towards robotics applications especially in navigation. The available readout
data from the sensor is provided in Address-Event Representation, which was first
introduced in [36]. Since the data is asynchronous, unlike frames, it must first be interpreted in terms of timestamps to correlate the event with a pixel counter-
part from frames or with another event from the same stream for further processing.
Techniques are discussed further below.

2.4 Data Registration

Event Processing is capable of providing frame-based vision performance with the


advantages mentioned in the prior section. Natively, events are not compatible with
modern techniques, but by adding the necessary pre-processing steps it becomes possible to leverage existing state-of-the-art computer vision algorithms, so that tasks such as depth estimation [37] become realizable. In order to fully utilize these techniques, event streams

must first be interpreted as frames. Firstly, it must be noted that an event packet from a
neuromorphic camera contains the following data e( p, x, y, ts ). This form of data is
referred to as Address Event Representation (AER) [25, 26, 38]. In the AER packet,
p is the polarity of the event, which implies the direction of the intensity change (old and new positions as seen in Fig. 1). Pixel position is represented by the pair (x, y), and ts indicates the timestamp at which the event was recorded, i.e. when an external trigger was visually detected by the silicon retina. The required pre-processing of AER streams consists of accumulating the packets and overlaying them after passing through an integration block [27]. By also providing raw frames, image enhancement is achieved because of the characteristics inherited from the event camera, as visually represented in Fig. 4.
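As a minimal sketch of this accumulation step (the function below is an illustrative count-based integrator, not the exact integration block of [27]), the AER packets falling inside a time window can be binned into a 2D frame as follows:

    import numpy as np

    def events_to_frame(events, height, width, t_start, t_end):
        """Accumulate AER events e = (p, x, y, ts) inside the window [t_start, t_end)
        into a 2D frame: positive polarity increments a pixel, negative decrements it."""
        frame = np.zeros((height, width), dtype=np.float32)
        for p, x, y, ts in events:
            if t_start <= ts < t_end:
                frame[y, x] += 1.0 if p > 0 else -1.0
        # Normalize to [0, 1] so the result can be overlaid on, or fused with, raw frames.
        if frame.max() > frame.min():
            frame = (frame - frame.min()) / (frame.max() - frame.min())
        return frame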
One drawback of using the approach in Fig. 4 is the saturation of the frame over
time when no motion is present. It is expected to encounter this problem if no raw
frames are provided; generated frames are inferred from events only. On a con-
tinuously moving platform this might not pose a serious issue, but if applied to
surveillance applications, the added redundancy of having a dedicated frame sensor
can be beneficial. Another study [39] demonstrated this concepts by using an Event
camera unit that houses both a frame and an event sensor (Fig. 5).

Fig. 4 Event-To-Frame block diagram

Fig. 5 Event-To-Frame results comparison [27]



2.5 Feature Detection

Feature Engineering has been a well-established topic of research for data gathered
using the traditional frame-based camera. Using a similar approach, object recog-
nition on event data can be accomplished through feature detection, followed by
classification. For event data, there are a few feature detection techniques. Corner
features, edges, and lines are the most commonly used features. This section describes
studies that detect corners from event data but were not further investigated for the
use of object recognition. However, characteristics around detected corners could be
retrieved and given to a classifier for further classification. The case for using an event-based camera can be argued by addressing the issues commonly faced when operating a regular camera. In high-speed vision, it is seldom the case that ideal lighting conditions and an absence of motion blur exist. As a result, degraded features are registered onto the receiving sensor, which in turn reduces the effectiveness of whatever strategy is being employed. Reconstruction techniques have been developed, as in [8, 40, 41], to de-blur images and generate high-frame-rate video sequences. The method discussed in [40] presents an efficient optimization process that rivals the state-of-the-art in terms of high-speed reconstruction under varying lighting conditions and dynamic scenes. The approach is described as an Event-based Double Integral model; the mathematical formulation and derivation can be found in the original material, and the results are shown in Fig. 6. It was also noted in [8] that reconstructing a frame-based video output purely from events is not feasible if the objective is to achieve the same temporal resolution. This factor restricts the output video to be only

Fig. 6 De-blurring results comparison [40]



Fig. 7 Event-only feature tracking

as fast as the physical limitations brought by the frame-based counterpart allow. This conclusion has stood for some time, although other published research has challenged this perspective and suggests doing away with frame-based cameras completely in such approaches. For instance, in [42] the feature tracking technique defined was independent of actual frames, relying instead on logarithmic intensities reconstructed from events, as depicted in Fig. 7. The authors also argue that this approach retains the high-dynamic-range aspect and is more favorable than using the original frames. The tested case scenario involved uneven lighting settings or objects with perceived motion blur, where the frames reconstructed from events were resistant against these degradation factors, unlike the original frames.

2.6 Algorithmic Compatibility

Since Event cameras are a recent commercial technology, often researchers investi-
gate the applicability of legacy computer vision techniques and their effectiveness
with neuromorphic sensors. Firstly, it is crucial to have a distinct pipeline to calibrate
a vision sensor prior to any algorithm deployment. One method shown in [43] tack-
les this by having flashing patterns in specific intervals which define sharp features
within the video stream. Although the screen is not moving, due to the flashing, the
intensity value is changing with time in that position, which emulates the concept of
motion within a frame, as such, calibration patterns are recorded. Moreover, other
techniques like in [44] surfaced which use a deep learning model to achieve a generic
event camera calibration system. The paper shows that neural-network-based image
reconstruction is ideally suited for the task of intrinsic and extrinsic calibration of
event cameras, rather than depending on blinking patterns or external screens like in
[43]. The benefit of the suggested method is that it allows the use of conventional
calibration patterns that do not require active lighting. Furthermore, the technique
enables extrinsic calibration between frame-based and event-based sensors without
adding complexity. Both simulation and real-world investigations from the paper
show that picture reconstruction calibration is accurate under typical distortion mod-
els and a wide range of distortion factors (Fig. 8).

Fig. 8 Learning-based calibration pipeline

3 Event-Based Vision Navigation

In this section, relevant techniques used for autonomous navigation are discussed.
Challenges related to said techniques will also be addressed with respect to the case
scenario presented by desert environments. Currently, state-of-the-art approaches uti-
lize a combination of Stereoscopic vision setups, LiDAR, and single-view cameras
around the vehicle for enhanced situational awareness (as seen in Tesla vehicles).
However, for this study, research has been limited to front-view for driving assis-
tance or full autonomy. As a result, the discussed methods include the well established
stereo-vision configurations, with the optional addition of a LiDAR, or in extreme
cases, a monocular setup is used. Additionally, not much reliable research has been conducted regarding steering using event data, apart from [45, 46]. The results presented in these studies mainly address a driving scenario similar to inter-city navigation, which may not be fruitful when applied to an off-road environment, and so further investigations such as this chapter must be conducted. Researchers in [47]
have also proposed an event-frame driving dataset for end-to-end neural networks.
This work shall also be extended to accommodate other environments like those in
the Middle East, specifically, the United Arab Emirates (Fig. 9).

Fig. 9 Driving sample for day/night on-road navigation [47]

3.1 Vision and Ranging Sensor Fusion

Different feature extraction methods are discussed in the literature that handle 3D
space generation from 3D representations [48]. The algorithms mentioned opt for
LiDAR-camera data fusion to generate dense point-clouds for geometric and depth-
completion purposes. The images are first processed from the camera unit and the
LiDAR records the distances based on the surrounding obstacles. Algorithms such as
RANSAC are also employed during the correlation phase to determine the geomet-
ric information of the detected targets. Machine Learning based approaches are also
gaining popularity as they can optimize the disparity map when using a stereo config-
uration alongside a LiDAR, which reduces development complexity when designing
an algorithm and running it in real-time [1, 48, 49]. The purpose of this study is to investigate reliable techniques and address viable real-time options for off-road navigation; as such, the previous factors are taken into consideration. Environmental challenges are also addressed in [15–17], which highlight the issues of using a ranging sensor, a LiDAR, in adverse environments, where it may yield sub-optimal data readings. Since the LiDAR is essentially a laser, performance degradation is expected if it operates in a harsh off-road setting, such as that seen in the Arab regions. A study discussing a similar attenuation profile with the sand particles seen in the desert is reported next (Fig. 10).

3.2 Stereo Event Vision

An alternative to using a LiDAR and Camera combination is by having a stereo


setup instead. It is well established that depth may be determined by generating a
disparity map between two images from the same scene being observed at different
perspectives. Given the homography matrix and intrinsic camera parameters, depth
can be estimated using regular cameras. The same can be seen by using Event cameras

Fig. 10 Laser attenuation during Sandstorms [13]

Fig. 11 Event-based stereo configuration [50]

in a stereo configuration. Like its frame-based counterpart, this configuration has


applicability towards Depth Estimation, Visual Odometry (extended to SLAM), and
because of the Event stream, high-speed semi-dense reconstruction is possible [50].
For the purposes of off-road navigation, the disparity maps can be used for slope detection of incoming dunes; the detailed strategy of the proposed method will be discussed in the following sections (Fig. 11).
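For reference, the standard rectified-stereo relation underlying such disparity maps (using the usual textbook symbols, which are not chapter-specific notation) recovers metric depth as

    Z = \frac{f \, B}{d}

where \(Z\) is the depth of a point, \(f\) the focal length in pixels, \(B\) the baseline between the two cameras, and \(d\) the measured disparity in pixels; larger disparities therefore correspond to nearby terrain, such as the face of an approaching dune.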

3.3 Monocular Depth Estimation from Events

The data fusion difficulties are apparent, as there has been a shift in the research of leading institutes, such as DARPA [51] and NASA [52], toward purely vision-based strategies. Avoiding such complexities allows for simpler hardware integration and
simpler software design with improved real-time performance as well. Also, since
monocular setups require less setup and calibration than stereo configurations, it is
possible to capitalize further on these optimizations. The reliability of monocular

Fig. 12 Traversability analysis during DARPA challenge [51]

Fig. 13 Event mono-depth network overview [53]

vision in regular frame cameras has been tested early on in the literature to push the
limits of autonomous systems in the context of off-road terrain analysis [51]. The
terrain traversability tests have been conducted in the DARPA challenge involving
an extensive 132 mile test in 7 h in off-road environments (Fig. 12).
Researchers have also proposed a deep neural network capable of combining both modalities and achieving better results than the state-of-the-art (∼30%) for monocular depth estimation when tested on the KITTI dataset.
The developed method in [39, 53] presents a network architecture that generates voxel grids and feeds them through recurrent neural networks that yield logarithmic depth estimations of the terrain. The model may be reproduced in future work by implementing an encoder-decoder based model with residual networks as a backbone; Vision Transformers may also be investigated due to their recently proven reliability [54].
From [53], we see that although contrast information is not provided about the scene,
meaningful representations of events are recorded which is evident in the depth
estimation from the following (Figs. 13 and 14).

Fig. 14 Qualitative analysis of night-time depth estimation (On-road)

4 Proposed End-to-End Navigation Model

From the previous findings and the discussed literature, we can theorize a possible
solution to address the gaps. The main contribution is towards a Realizable Event-
based Deep Learning Neural Network for Autonomous Off-road Navigation. The
system can be designed using well-established frameworks in the community, like
PyTorch, to ensure stability during development. Moreover, the hardware imple-
mentation may be done on a CUDA enabled single-board computer. This is done to
further enhance the real-time performance by the on-board GPU parallelism capa-
bilities [55].
The main task is to optimize the steering angle of the mobile platform during off-road navigation with the assistance of a vision sensor. In principle, the strategy shall incor-
porate Depth/Height estimation of terrain and possible obstacles using events and
frame fusion. The system shall contain a pre-processor block to filter noise and
enhance fused data before injecting them into the deep learning pipeline. In addition,
as seen in [39, 53], for meaningful latent space representation, Encoder-Decoder
based architectures are employed. Some examples to reduce unwanted artifacts and
recreate the scene from event-frame fusion are Variational Auto-Encoders, Adver-
sarial Auto-Encoders, and UNet Architectures. By including relevant loss functions
and optimizers to generate viable depth maps, the yielded results are then taken to the

Fig. 15 Proposed end-to-end deep neural network

steering optimizer block. At this stage, the depth maps are analyzed using modern
computer vision techniques or inferred from a neural network, as shown in [45]. The
two main branches in the model, adapted from the aforementioned literature, are the
Depth Estimation branch and the Steering branch (Fig. 15).
The end-to-end model is derived from two independent models developed by the Department of Informatics at the University of Zurich. The main contribution in our concept is to combine the federated systems into one trainable model and to adjust the asynchronous encoder-decoder based embedding backbone into a more computationally efficient, lightweight variant suitable for deployment on restricted hardware; further details are discussed in Sect. 6 of the chapter.

4.1 Event Preprocessing

Due to the design of the sensors, they are particularly susceptible to the Background
Activity noise caused by cyclic noise and circuit leakage currents. Since background
activity rises when there is less light or when the sensitivity is increased, a filter
is frequently required for a variety of applications. A noise filter can be helpful in
certain situations for eliminating real events that are caused by slight changes in
the light and maintaining a greater delineation between the mobile target and the
surroundings. For this study, the neuromorphic camera tool DV, from the Swiss-based startup iniVation, is used for prototyping and for selecting the denoising algorithm.
From the literature, two prevalent noise removal techniques are used, coined as
knoise [56] and ynoise [57]. The former algorithm is depicted to have O(N) memory
complexity background removal capability. The proposed method is preferred for
memory sensitive tasks where near sensor implementations and harsh energy and
memory constraints are imposed. The method stores recovered events from the stream
as long as they are unique per row and column within a specific timestamp. Doing so minimizes the memory utilization of the on-board processor, and the reported error

Fig. 16 Qualitative comparison of noise removal algorithms

rates were tens of magnitudes better than previous spatiotemporal filter designs.
The main use-case for such a filter would be for mobile platforms with limited in-
memory computing resources, such as off-road navigation platforms. The latter,
ynoise, presents a two-stage filtering solution. The method discards background
activity based on the duration of events within a spatiotemporal window around
a hot pixel. Results of knoise, ynoise, and a generic noise removal algorithm from iniVation are shown next (Fig. 16).
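A minimal illustration of the spatiotemporal idea behind such filters (this is a generic background-activity filter written for clarity, not the actual knoise or ynoise implementations) is the following:

    import numpy as np

    def background_activity_filter(events, height, width, dt_max):
        """Keep an event only if one of its 3x3 spatial neighbours (or the pixel
        itself) fired within the last dt_max microseconds; isolated noise events
        have no such support and are discarded. Events are assumed time-sorted."""
        last_ts = np.full((height, width), -np.inf)
        kept = []
        for p, x, y, ts in events:
            y0, y1 = max(0, y - 1), min(height, y + 2)
            x0, x1 = max(0, x - 1), min(width, x + 2)
            if (ts - last_ts[y0:y1, x0:x1]).min() <= dt_max:
                kept.append((p, x, y, ts))
            last_ts[y, x] = ts
        return kept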

4.2 Depth Estimation Branch

The depth estimation branch is adopted from [39] where the network architecture is
RAMNet. RAMNet is a Recurrent Asynchronous Multimodal Neural Network which
serves as a generalized variant of RNNs that can handle asynchronous datastreams
depending on sensor-specific learnable encoding parameters [39]. The architecture is
a fully convolutional encoder-decoder architecture based on U-Net. The architecture
of the depth estimation branch is shown in Fig. 17.
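For illustration, a heavily reduced encoder-decoder of the same flavour (a toy stand-in written for this chapter, not the published RAMNet of [39]) can be sketched in PyTorch as:

    import torch
    import torch.nn as nn

    class TinyDepthUNet(nn.Module):
        """Minimal U-Net-style encoder-decoder regressing a dense (log-)depth map
        from a single-channel event frame of shape (B, 1, H, W)."""
        def __init__(self, ch=32):
            super().__init__()
            self.enc1 = nn.Sequential(nn.Conv2d(1, ch, 3, padding=1), nn.ReLU())
            self.enc2 = nn.Sequential(nn.Conv2d(ch, 2 * ch, 3, stride=2, padding=1), nn.ReLU())
            self.dec1 = nn.Sequential(nn.ConvTranspose2d(2 * ch, ch, 4, stride=2, padding=1), nn.ReLU())
            self.out = nn.Conv2d(2 * ch, 1, 3, padding=1)  # the skip connection doubles the channels

        def forward(self, x):
            e1 = self.enc1(x)                              # full-resolution features
            e2 = self.enc2(e1)                             # downsampled bottleneck features
            d1 = self.dec1(e2)                             # upsample back to the input resolution
            return self.out(torch.cat([e1, d1], dim=1))    # dense depth prediction

The real RAMNet additionally keeps recurrent state so that asynchronous event and frame inputs can update the prediction at their own rates.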

Fig. 17 Depth estimation branch

4.3 Steering Prediction Branch

It was demonstrated in [45] how a deep learning model can take advantage of the
event stream from a moving platform to determine the steering angles. The steering
branch for this study is based on the previous study, where subtle motion cues during
desert navigation will be fed into the model to learn the necessary steering behaviour
in harsh terrain. For simplicity, this steering algorithm will not be adjusted to take
into consideration slip, but rather only the planar geometry of the terrain to avoid
collision with hills and overturning the vehicle. To implement the discussed approach
from the reference material, the events are dispatched into an accumulator first to
obtain the frames which will be used in a regression task to determine an appropriate
angular motion. As a backbone, ResNet models are to be investigated as a baseline
for the full model (Fig. 18).
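A minimal version of such a regression head (a sketch of the idea in [45] with an off-the-shelf ResNet-18 backbone, not the authors' exact network) could be:

    import torch
    import torch.nn as nn
    import torchvision.models as models

    class SteeringRegressor(nn.Module):
        """ResNet-18 over accumulated event frames, ending in one scalar: the steering angle."""
        def __init__(self):
            super().__init__()
            backbone = models.resnet18()
            # Accumulated event frames are single-channel, so the first convolution is adapted.
            backbone.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
            backbone.fc = nn.Linear(backbone.fc.in_features, 1)
            self.backbone = backbone

        def forward(self, event_frame):        # event_frame: (B, 1, H, W)
            return self.backbone(event_frame)  # (B, 1) predicted steering angle

    # Training sketch: mean-squared error against the recorded steering angles.
    model = SteeringRegressor()
    criterion = nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)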

Fig. 18 Steering regression branch

Fig. 19 Sample structure—MADMAX dataset

4.4 Desert Driving Dataset

To the best of the author’s knowledge, there is yet to be a purely ground-based


monocular vision driving dataset for desert navigation in day/night-time. One of the
contributions of this study is to produce a UAE specific Driving Dataset. Another
off-road driving dataset which was generated from the Moroccan Deserts is used as
reference throughout the investigation. Original details pertaining to the dataset organization are in the paper “The MADMAX data set for visual-inertial rover navigation
on Mars” [58] (Fig. 19).

Fig. 20 NVIDIA xavier AGX peripherals

5 System Implementation

The proposed embedded system for processing and real-time deployment is the
NVIDIA Xavier AGX. From the official datasheet, the developer kit consists of a 512-core Volta GPU with Tensor Cores, 32 GB of memory, and an 8-core ARM v8.2 64-bit CPU running a Linux-based distribution. NVIDIA Volta also allows for various data
operations which gives flexibility during the reduction of convolution operations
[55, 59]. Moreover, it has a moderate power requirement of 30−65 W while deliver-
ing desktop-grade performance in a small form-factor. Consequently, the NVIDIA
Xavier is a suitable candidate for the deployment of a standalone real-time system
(Fig. 20).
For a more extreme approach it is also possible to fully implement the proposed
network on an FPGA board as demonstrated by [60–62]. Since the design of an
FPGA framework is not within the scope of the project, as a proof of concept to
motivate further research, this chapter will detail the implementation of convolution
optimization techniques in simplistic networks on the PYNQ-Z1. A key feature is that it allows for ease of implementation and rapid prototyping because of the Python-
enabled development environment [60]. System features are highlighted in the next
figure (Fig. 21).
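To illustrate the rapid-prototyping flow, driving a custom accelerator from Python on the PYNQ-Z1 only takes a few lines; note that the bitstream name, the IP instance name lbc_0, and the register offsets below are hypothetical placeholders for a user-generated design, not files shipped with the board:

    import numpy as np
    from pynq import Overlay, allocate

    overlay = Overlay("lbc_accel.bit")        # hypothetical bitstream with a binary-convolution IP
    lbc_ip = overlay.lbc_0                    # IP cores are exposed as attributes of the overlay

    # Physically contiguous buffers shared between the ARM cores and the programmable logic.
    in_buf = allocate(shape=(64, 64), dtype=np.uint8)
    out_buf = allocate(shape=(64, 64), dtype=np.int16)
    in_buf[:] = 0                             # write an input event frame here

    # Hypothetical AXI-Lite register map: buffer addresses, then a start/done handshake.
    lbc_ip.write(0x10, in_buf.physical_address)
    lbc_ip.write(0x18, out_buf.physical_address)
    lbc_ip.write(0x00, 1)                     # start
    while not (lbc_ip.read(0x00) & 0x2):      # poll the done bit
        pass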

Fig. 21 PYNQ-Z1 key technologies and environment breakdown

5.1 Deep Learning Acceleration

When developing an autonomous driving platform, it is often the case that tradeoffs between performance and system complexity are taken into consideration [9]. Efforts to reduce the challenges are seen in the systems discussed in the previous sections transitioning towards a single-sensor approach. The solutions are bio-inspired, purely linked to vision and neuromorphic computing paradigms, as they tend to offer a significant performance boost [22, 63, 64]. Considerable studies have been
published in the Deep Learning field aiming towards less computationally expensive
inference setups, mainly through CUDA optimizations [55, 59] and FPGA Hardware
Acceleration [60].
The optimizations mentioned in the previous work take advantage of the hardware primarily used in autonomous system deployment, NVIDIA development boards. The enhancements are achieved by improving the pipeline through CUDA programming and by minimizing the operations needed to perform a convolution. Of the discussed methods, one achieved performance comparable to regular convolution by converting floating-point operations into binary operations, while the other reduces the non-trivial elements by averaging the pixels around a point of interest and discarding the surrounding neighbours [59]; the latter is coined Perforated Convolution (Fig. 22).
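A compact PyTorch sketch of the first idea, in the spirit of the Local Binary Convolution of [55] (the layer sizes and sparsity are illustrative choices, not the reference implementation), is shown below:

    import torch
    import torch.nn as nn

    class LocalBinaryConv2d(nn.Module):
        """Fixed sparse binary anchor convolution (weights in {-1, 0, +1}, never trained),
        followed by a nonlinearity and a learnable 1x1 convolution that linearly
        combines the binary responses."""
        def __init__(self, in_ch, out_ch, num_anchors=64, kernel_size=3, sparsity=0.5):
            super().__init__()
            self.anchor = nn.Conv2d(in_ch, num_anchors, kernel_size,
                                    padding=kernel_size // 2, bias=False)
            weights = torch.sign(torch.randn_like(self.anchor.weight))
            mask = (torch.rand_like(weights) < sparsity).float()
            self.anchor.weight.data = weights * mask       # random sparse binary filters
            self.anchor.weight.requires_grad = False       # frozen for the whole run
            self.combine = nn.Conv2d(num_anchors, out_ch, kernel_size=1)  # the only learnable part
            self.act = nn.ReLU()

        def forward(self, x):
            return self.combine(self.act(self.anchor(x)))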

Fig. 22 Local Binary Convolution Layer (LBC) [55]

5.2 Memristive Neuromorphic Computing

When addressing the hardware implementation of deep learning systems, neuromorphic computing platforms are often mentioned. In the context of event-based vision,
neuromorphic architectures are recently being favored over traditional von Neumann
architectures. This is due to comparative improvements in metrics such as computa-
tional speed up and reduced power consumption [65, 66]. Studies show that devel-
oping neuromorphic systems for mobile robotics, such as employing Spiking Neural
Networks (SNN), will yield faster and less computationally intensive pipelines with
reduced power consumption [67, 68]. SNNs are defined as a neuromorphic archi-
tecture which provides benefits such as improved parallelism due to neurons firing
asynchronously and possibly at the same time. The aforementioned is realizable
because by design, neuromorphic systems combine the functionality of processing
and memory within the same subsystem that carries out tasks by the design of the
artificial network rather than a set of algorithms as defined in von Neumann archi-
tectures [66]. The following figures demonstrate the main differences between both
architectures in addition to the working concept of a SNN (Fig. 23).

Fig. 23 Von Neumann and neuromorphic computing architectures



Fig. 24 SNN operation

In theory, in SNNs, in contrast to other forms of artificial neural networks such as multilayer perceptrons, the function of the network's neurons and synapses is more closely modeled after biological systems. The most important distinction between standard artificial neural networks and SNNs is that the operation of SNNs takes timing into consideration [66] (Fig. 24).
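As an illustration of this timing-driven operation, the textbook leaky integrate-and-fire neuron (a generic model, not tied to any particular neuromorphic chip discussed here) can be simulated in a few lines:

    import numpy as np

    def lif_neuron(input_current, dt=1e-3, tau=20e-3, v_rest=0.0, v_thresh=1.0):
        """Leaky integrate-and-fire neuron: the membrane potential integrates the
        input and leaks back to rest; crossing the threshold emits a spike and resets."""
        v = v_rest
        spikes = []
        for i in input_current:                    # one input sample per time step dt
            v += (dt / tau) * (-(v - v_rest) + i)  # leaky integration
            if v >= v_thresh:
                spikes.append(1)                   # spike emitted
                v = v_rest                         # reset after spiking
            else:
                spikes.append(0)
        return np.array(spikes)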
Recently, implementations of neuromorphic hardware have been deployed for
robotics applications as seen in [69–71]. Applications included optical flow and
autonomous navigation of both aerial and ground vehicles. Additionally, they also
demonstrated power-efficient and computationally modest approaches on neuromor-
phic hardware, which is related to the current offroad study in this chapter. The previ-
ously mentioned studies have proven the concepts discussed in earlier neuromorphic
literature, and extending on that, this offroad study aims to build on that knowledge by
possibly combining SNN like architectures in conjunctions with Event-based vision
sensors.
Furthermore, for a hardware-friendly architecture and efficient processing, memristor-based implementations have also been researched in the literature for further reductions in power consumption [72]. In addition to the traditional CMOS for signal control, compatible sub-blocks of nanocomposite materials known as resistive memory, also called memristive devices due to their ability to store information in a nonvolatile manner with high-density integration [73, 74], are considered as the processing unit. The latter is ideally suited for the development of in-memory computing engines for computer vision applications [75–77] and efficient substrates for spiking neural networks. The previous claim is explained by the fact
that crossbar-configured memristive devices can imitate synaptic efficacy and plas-
ticity [78, 79]. Synaptic efficacy is the creation of a synaptic output depending on
incoming neural activity. This may be measured using Ohm’s law by measuring the
device’s current when a read voltage signal is supplied. Synaptic plasticity is the
synapse’s capacity to adjust its weight during learning. By applying write pulses
along the crossbar’s wires, the crossbar design may execute synaptic plasticity in a
parallel and efficient manner. The mentioned architecture is seen next (Fig. 25).
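In an idealized crossbar readout (a textbook description rather than a model of any specific device in [72–79]), this parallel multiply-accumulate follows directly from Ohm's law and Kirchhoff's current law:

    I_j = \sum_i G_{ij} \, V_i

where \(V_i\) is the read voltage applied on row \(i\), \(G_{ij}\) the programmed conductance of the memristive device at the crossing of row \(i\) and column \(j\), and \(I_j\) the current collected on column \(j\); a single read operation therefore evaluates a full vector-matrix product in the analog domain.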

Fig. 25 Memristor based architecture

6 Results and Discussion

We take the two available networks; what we are mainly proposing is to change the encoder-decoder CNN backbone into a purely attention-based model, a Vision Transformer, and to use binary convolution layers for the steering regression. For the transformer, we argue that there is an apparent benefit for the real-time and hardware implementation. The advantages include reduced power consumption and processing time. This is seen through a reduced number of parameters in the Vision Transformer model, which implies less processing and, in terms of the hardware implementation of an accelerator, fewer arithmetic operations. Fewer physical computations are also preferable, as they yield lower power consumption for power-constrained platforms, such as those seen in a desert environment, which may have reduced computing capabilities to deal with extreme heat and other adverse weather conditions. Our proposed solution is to replace the architecture seen in Fig. 17 with the transformer model (Fig. 26).
Studies have proven similar concepts in different domains, and what this chapter
aims to contribute towards is a first step towards efficient means of depth estimation
in desert navigation from the above rationale and the following results [81].

Fig. 26 Proposed event transformer model for the system’s integration [80]

Architecture            Estimate     Speed (Frames/sec) ↑   Energy (Joules/Coulomb) ↓
CNN-backbone            Depth        84.132                 3.206
                        Intrinsics   97.498                 2.908
Transformer-backbone    Depth        40.215                 5.999
                        Intrinsics   60.190                 4.021

In terms of hardware implementation, using a Vision Transformer improves the


power consumption profile, which is inferred from the smaller number of parameters in the trainable model. During implementation this of course translates to less heat dissipation and power loss, owing to the reduced number of arithmetic operations conducted by, for example, the memristor-based architecture discussed in the previous section.
Method                    Type     Parameters   FLOPs   Top-1 Accuracy
EST                       frame    21.38M       4.28G   52.4
M-LSTM                    frame    21.43M       4.82G   63.1
MVF-Net                   frame    33.62M       5.62G   59.9
ResNet-50                 frame    25.61M       3.87G   55.8
EventNet                  voxel    2.81M        0.91G   17.1
RG-CNNs                   voxel    19.46M       0.79G   54.0
EV-VGCNN                  voxel    0.84M        0.70G   65.1
VMV-GCN                   voxel    0.86M        1.30G   66.3
Event Transformer [80]    tensor   15.87M       0.51G   71.2

The initial results were obtained towards a computationally efficient regression scheme on the PYNQ-Z1. The implementation was conducted to optimize the matrix multiplication operations in the convolution layers for the steering estimation, by regressing an angle according to the depth map from the previous branch. The first step is

Table 1 Results of SMR with adjusted LBC block


Resource                    SMR: Conv2D   SMR: BinConv2D
CPU                         ∼40%          ∼40%
RAM                         ∼35%          ∼33%
VRAM                        ∼51.25%       ∼40%
Processing time per batch   1.1477 s      0.9122 s
δT                          20.5193%

to recreate the results on GPU, then validate the approach on FPGA. The specific
algorithm optimized was first discussed in [82]. The study by Hu et al. demon-
strated Single-Mesh Reconstruction (SMR) to construct a 3D model from a single
RGB image. The approach depends on consistency between interpolated features
and features learnt through regression, and takes as input:
• Single-view RGB
• Silhouette mask of the detected object.
The applicability of SMR is useful in the context of autonomous platforms. Potentially,
the platform can establish a 3D framework of itself with respect to the detected
3D obstacles (instead of the artifacts mentioned in [82]). Evidently, this can enhance
navigation strategies in the absence of additional sensors. The convolution layers were
adjusted in line with the methods discussed earlier to create an LBC layer [55]. The
purpose of this experiment was to demonstrate the reduction in processing resources
required when performing convolution, whether during training or inference (Table 1).
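For illustration, the following sketch builds an LBC-style layer in the spirit of [55]: a fixed, sparse, ternary filter bank followed by a learnable 1×1 convolution replaces a standard learnable convolution, which is the substitution evaluated in this experiment. The layer widths, sparsity level and activation are assumptions made for the example, not the exact configuration used here.

import torch
import torch.nn as nn

class LocalBinaryConv2d(nn.Module):
    """Sketch of an LBC block following the idea of Juefei-Xu et al. [55].

    A fixed (non-trainable) sparse {-1, 0, +1} filter bank produces bitmap-like
    responses; only the subsequent 1x1 convolution carries trainable weights,
    which reduces the number of learnable parameters and multiplications.
    """

    def __init__(self, in_ch, out_ch, num_anchors=32, kernel_size=3, sparsity=0.5):
        super().__init__()
        self.anchor = nn.Conv2d(in_ch, num_anchors, kernel_size,
                                padding=kernel_size // 2, bias=False)
        with torch.no_grad():
            w = torch.sign(torch.randn_like(self.anchor.weight))      # -1 / +1 values
            mask = (torch.rand_like(w) < sparsity).float()            # keep only a fraction
            self.anchor.weight.copy_(w * mask)
        self.anchor.weight.requires_grad_(False)                      # fixed anchor filters
        self.act = nn.ReLU(inplace=True)
        self.pointwise = nn.Conv2d(num_anchors, out_ch, kernel_size=1)  # learnable part

    def forward(self, x):
        return self.pointwise(self.act(self.anchor(x)))

# Quick shape check on a dummy feature map.
layer = LocalBinaryConv2d(in_ch=16, out_ch=32)
y = layer(torch.randn(1, 16, 64, 64))
print(y.shape)  # torch.Size([1, 32, 64, 64])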
Clearly, we cannot establish a high level of confidence in this software-based
acceleration technique without examining a more relevant network and, as a standard,
the KITTI dataset for self-driving vehicles. The network described in [83] follows a
similar approach to SMR, but is applied to vehicle pose detection. During training,
the pipeline is given:
• Single-view RGB
• Egocentric 3D Vehicle Pose Estimation.
The network architecture is shown in Fig. 27.
The 2D/3D Intermediate Representation stage is of interest to us since the main objective
is to recreate a 3D framework from 2D inputs. Replacing the regular convolution with an
LBC block yields the following results (Table 2).
From the results shared above, it is seen that the approach is viable, but its potency
reduces as the dataset complexity increases. In the first dataset, singular artifacts
were provided without any background or ambiguous features. In the KITTI dataset,
however, the vehicles exist in a larger ecosystem, which may induce incorrect results;
the network therefore requires more time to determine clear boundaries from the
background before transforming the 2D information into a 3D representation. Furthermore,

Fig. 27 EgoNet architecture—dotted outline is the required region of optimization

Table 2 Results of EgoNet with adjusted LBC block


Resource                    EgoNet: Conv2D   EgoNet: BinConv2D
CPU                         ∼43%             ∼42%
RAM                         ∼32%             ∼30%
VRAM                        ∼65%             ∼61%
Processing time per batch   0.8554 s         0.7607 s
δT                          11.0708%

Fig. 28 PYNQ-Z1 LBC implementation results

we have successfully deployed an implementation of a binary convolution block on the
PYNQ-Z1 to test the improved performance on FPGA hardware. The FPGA-specific metrics
of our implementation are shown below (Fig. 28).

7 Conclusions

The implications of our experiments extend beyond the development of autonomous
robotics; they also address another problem, namely the financial aspect of deploying
such systems. Firstly, for the case study addressed, desert navigation has yet to achieve

autonomy due to the complexities of terrain analysis, ranging from depth and steering to
slip estimation. These approaches are often too computationally expensive to run in real
time, but with the proposed approach we believe our system to be the first attempt towards
efficient real-time computing in constrained settings for off-road navigation. Secondly,
the proposed pipeline may be implemented on other systems, such as unmanned aerial
platforms, which tend to be deployed for search-and-rescue missions and seldom have
sufficient on-board computing resources. This chapter served as a modest introduction to
event-based camera systems within the context of off-road navigation. The chapter began
by establishing the foundations behind the neuromorphic sensor hardware driving the
camera, before moving on to the data processing aspect and the applicability of
traditional techniques. The knowledge base was assessed to determine whether traditional
techniques are indeed viable with this novel sensor. Furthermore, implementations of a
deep learning system utilizing an event camera are also possible through the necessary
data processing of the events prior to the training phase. For inference, an acceleration
method, namely binary convolutions, was implemented, and the initial results hold
promising potential for the future development of real-time projects with event cameras.
Future work is still necessary, specifically in addressing the data collection aspect
within the UAE environment. To summarize, the Event Transformer-Binary CNN (EvT-BCNN)
concept proposed in this chapter is a first attempt towards the deployment of
memristive-based systems and neuromorphic vision sensors as computing-efficient
alternatives to classical vision systems.

Acknowledgements This project is funded by Tawazun Technology & Innovation (TTI), under
Tawazun Economic Council, through the collaboration with Khalifa University. The work shared
is part of a MSc Thesis project by Hamad AlRemeithi, and all equipment is provided by TTI.
Professional expertise is also a shared responsibility between both entities, and the authors extend
their deepest gratitude for the opportunity to encourage research in this field.

References

1. Badue, C., Guidolini, R., Carneiro, R. V., Azevedo, P., Cardoso, V. B., Forechi, A., Jesus, L.,
Berriel, R., Paixão, T. M., Mutz, F., de Paula Veronese, L., Oliveira-Santos, T., & De Souza,
A. F. (2021). Self-driving cars: A survey. Expert Systems with Applications, 165.
2. Ni, J., Chen, Y., Chen, Y., Zhu, J., Ali, D., & Cao, W. (2020) A survey on theories and appli-
cations for self-driving cars based on deep learning methods. Applied Sciences (Switzerland),
10.
3. Chen, G., Cao, H., Conradt, J., Tang, H., Rohrbein, F., & Knoll, A. (2020). Event-based neuro-
morphic vision for autonomous driving: A paradigm shift for bio-inspired visual sensing and
perception. IEEE Signal Processing Magazine, 37.
4. Lin, M., Yoon, J., & Kim, B. (2020) Self-driving car location estimation based on a particle-
aided unscented kalman filter. Sensors (Switzerland), 20.
5. Mugunthan, N., Naresh, V. H., & Venkatesh, P. V. (2020). Comparison review on lidar vs camera
in autonomous vehicle. In International Research Journal of Engineering and Technology.

6. Ming, Y., Meng, X., Fan, C., & Yu, H. (2021) Deep learning for monocular depth estimation:
A review. Neurocomputing, 438.
7. Li, X., Tang, B., Ball, J., Doude, M., & Carruth, D. W. (2019). Rollover-free path planning for
off-road autonomous driving. Electronics (Switzerland), 8.
8. Pan, Y., Cheng, C. A., Saigol, K., Lee, K., Yan, X., Theodorou, E. A., & Boots, B. (2020).
Imitation learning for agile autonomous driving. International Journal of Robotics Research,
39.
9. Liu, O., Yuan, S., & Li, Z. (2020). A survey on sensor technologies for unmanned ground
vehicles. In Proceedings of 2020 3rd International Conference on Unmanned Systems, ICUS
2020.
10. Shin, J., Kwak, D. J., & Kim, J. (2021). Autonomous platooning of multiple ground vehicles
in rough terrain. Journal of Field Robotics, 38.
11. Naranjo, J. E., Jiménez, F., Anguita, M., & Rivera, J. L. (2020). Automation kit for dual-mode
military unmanned ground vehicle for surveillance missions. IEEE Intelligent Transportation
Systems Magazine, 12.
12. Browne, M., Macharis, C., Sanchez-diaz, I., Brolinson, M., & Illsjö, R. (2017). Urban traffic
congestion and freight transport : A comparative assessment of three european cities. Interdis-
ciplinary Conference on Production Logistics and Traffic.
13. Zhong, H., Zhou, J., Du, Z., & Xie, L. (2018). A laboratory experimental study on laser
attenuations by dust/sand storms. Journal of Aerosol Science, 121.
14. Koepke, P., Gasteiger, J., & Hess, M. (2015). Technical note: Optical properties of desert aerosol
with non-spherical mineral particles: Data incorporated to opac. Atmospheric Chemistry and
Physics Discussions, 15, 3995–4023.
15. Raja, A. R., Kagalwala, Q. J., Landolsi, T., & El-Tarhuni, M. (2007). Free-space optics chan-
nel characterization under uae weather conditions. In ICSPC 2007 Proceedings - 2007 IEEE
International Conference on Signal Processing and Communications.
16. Vargasrivero, J. R., Gerbich, T., Buschardt, B., & Chen, J. (2021). The effect of spray water
on an automotive lidar sensor: A real-time simulation study. IEEE Transactions on Intelligent
Vehicles.
17. Strawbridge, K. B., Travis, M. S., Firanski, B. J., Brook, J. R., Staebler, R., & Leblanc, T.
(2018). A fully autonomous ozone, aerosol and nighttime water vapor lidar: A synergistic
approach to profiling the atmosphere in the canadian oil sands region. Atmospheric Measure-
ment Techniques, 11.
18. Hummel, B., Kammel, S., Dang, T., Duchow, C., & Stiller, C. (2006). Vision-based path-
planning in unstructured environments. In IEEE Intelligent Vehicles Symposium, Proceedings.
19. Mueller, G. R., & Wuensche, H. J. (2018). Continuous stereo camera calibration in urban
scenarios. In IEEE Conference on Intelligent Transportation Systems, Proceedings, ITSC, 2018-
March.
20. Rankin, A. L., Huertas, A., & Matthies, L. H. (2009). Stereo-vision-based terrain mapping for
off-road autonomous navigation. Unmanned Systems Technology X, I, 7332.
21. Litzenberger, M., Belbachir, A. N., Donath, N., Gritsch, G., Garn, H., Kohn, B., Posch, C., &
Schraml, S. (2006). Estimation of vehicle speed based on asynchronous data from a silicon
retina optical sensor. In IEEE Conference on Intelligent Transportation Systems, Proceedings,
ITSC.
22. Gallego, G., Delbruck, T., Orchard, G., Bartolozzi, C., Taba, B., Censi, A., Leutenegger, S.,
Davison, A. J., Conradt, J., Daniilidis, K., & Scaramuzza, D. (2020). Event-based vision: A
survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44.
23. Delbrück, T., Linares-Barranco, B., Culurciello, E., & Posch, C. (2010). Activity-driven, event-
based vision sensors. In ISCAS 2010 - 2010 IEEE International Symposium on Circuits and
Systems: Nano-Bio Circuit Fabrics and Systems.
24. Rebecq, H., Ranftl, R., Koltun, V., & Scaramuzza, D. (2021). High speed and high dynamic
range video with an event camera. IEEE Transactions on Pattern Analysis and Machine Intel-
ligence, 43.

25. Lichtsteiner, P., Posch, C., & Delbruck, T. (2008). A 128× 128 120 db 15 µs latency asyn-
chronous temporal contrast vision sensor. IEEE Journal of Solid-State Circuits, 43, 566–576.
26. Brändli, C., Berner, R., Yang, M., Liu, S.-C., & Delbruck, T. (2014). A 240 × 180 130 db 3
µs latency global shutter spatiotemporal vision sensor. IEEE Journal of Solid-State Circuits,
49, 2333–2341.
27. Scheerlinck, C., Barnes, N., & Mahony, R. (2019). Continuous-time intensity estimation using
event cameras. Lecture notes in computer science (including subseries Lecture notes in artificial
intelligence and lecture notes in bioinformatics), 11365 LNCS.
28. Gallego, G., Lund, J. E. A., Mueggler, E., Rebecq, H., Delbruck, T., & Scaramuzza, D. (2018).
Event-based, 6-dof camera tracking from photometric depth maps. IEEE Transactions on Pat-
tern Analysis and Machine Intelligence, 40.
29. Mostafavi, M., Wang, L., & Yoon, K. J. (2021). Learning to reconstruct hdr images from events,
with applications to depth and flow prediction. International Journal of Computer Vision, 129.
30. Mueggler, E., Huber, B., & Scaramuzza, D. (2014). Event-based, 6-dof pose tracking for high-
speed maneuvers.
31. Posch, C., Matolin, D., & Wohlgenannt, R. (2011). A qvga 143 db dynamic range frame-free
pwm image sensor with lossless pixel-level video compression and time-domain cds. IEEE
Journal of Solid-State Circuits, 46.
32. Lee, S., Kim, H., & Kim, H. J. (2020). Edge detection for event cameras using intra-pixel-area
events. In 30th British Machine Vision Conference 2019, BMVC 2019.
33. Rebecq, H., Ranftl, R., Koltun, V., & Scaramuzza, D. (2019). Events-to-video: Bringing modern
computer vision to event cameras. In Proceedings of the IEEE Computer Society Conference
on Computer Vision and Pattern Recognition, 2019-June.
34. Xu, H., Gao, Y., Yu, F., & Darrell, T. (2017). End-to-end learning of driving models from
large-scale video datasets. In Proceedings - 30th IEEE Conference on Computer Vision and
Pattern Recognition, CVPR 2017, 2017-January.
35. Xu, H., Gao, Y., Yu, F., & Darrell, T. (2017). End-to-end learning of driving models from
large-scale video datasets. In Proceedings - 30th IEEE Conference on Computer Vision and
Pattern Recognition, CVPR 2017, 2017-January.
36. Boahen, K. A. (2004). A burst-mode word-serial address-event link - i: Transmitter design.
IEEE Transactions on Circuits and Systems I: Regular Papers, 51.
37. Wang, C., Buenaposada, J. M., Zhu, R., & Lucey, S. (2018). Learning depth from monocular
videos using direct methods. In Proceedings of the IEEE Computer Society Conference on
Computer Vision and Pattern Recognition.
38. Guo, S., Kang, Z., Wang, L., Zhang, L., Chen, X., Li, S., & Xu, W. (2020). A noise filter for
dynamic vision sensors using self-adjusting threshold.
39. Gehrig, D., Ruegg, M., Gehrig, M., Hidalgo-Carrio, J., & Scaramuzza, D. (2021). Combining
events and frames using recurrent asynchronous multimodal networks for monocular depth
prediction. IEEE Robotics and Automation Letters, 6.
40. Pan, L., Scheerlinck, C., Yu, X., Hartley, R., Liu, M., & Dai, Y. (2019). Bringing a blurry frame
alive at high frame-rate with an event camera. In Proceedings of the IEEE Computer Society
Conference on Computer Vision and Pattern Recognition, 2019-June.
41. Pan, L., Hartley, R., Scheerlinck, C., Liu, M., Yu, X., & Dai, Y. (2022). High frame rate video
reconstruction based on an event camera. IEEE Transactions on Pattern Analysis and Machine
Intelligence, 44.
42. Gehrig, D., Rebecq, H., Gallego, G., & Scaramuzza, D. (2020). Eklt: Asynchronous photometric
feature tracking using events and frames. International Journal of Computer Vision, 128.
43. Saner, D., Wang, O., Heinzle, S., Pritch, Y., Smolic, A., Sorkine-Hornung, A., & Gross, M.
(2014). High-speed object tracking using an asynchronous temporal contrast sensor. In 19th
International Workshop on Vision, Modeling and Visualization, VMV 2014.
44. Muglikar, M., Gehrig, M., Gehrig, D., & Scaramuzza, D. (2021). How to calibrate your event
camera. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Workshops.

45. Maqueda, A. I., Loquercio, A., Gallego, G., Garcia, N., & Scaramuzza, D. (2018). Event-based
vision meets deep learning on steering prediction for self-driving cars. In Proceedings of the
IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
46. Galluppi, F., Denk, C., Meiner, M. C., Stewart, T. C., Plana, L. A., Eliasmith, C., Furber, S.,
& Conradt, J. (2014). Event-based neural computing on an autonomous mobile platform. In
Proceedings - IEEE International Conference on Robotics and Automation.
47. Hu, Y., Binas, J., Neil, D., Liu, S. C., & Delbruck, T. (2020). Ddd20 end-to-end event camera
driving dataset: Fusing frames and events with deep learning for improved steering prediction.
In 2020 IEEE 23rd International Conference on Intelligent Transportation Systems, ITSC 2020.
48. Zhong, H., Wang, H., Wu, Z., Zhang, C., Zheng, Y., & Tang, T. (2021). A survey of lidar and
camera fusion enhancement. Procedia Computer Science, 183.
49. Song, R., Jiang, Z., Li, Y., Shan, Y., & Huang, K. (2018). Calibration of event-based camera
and 3d lidar. In 2018 WRC Symposium on Advanced Robotics and Automation, WRC SARA
2018 - Proceeding.
50. Zhou, Y., Gallego, G., & Shen, S. (2021). Event-based stereo visual odometry. IEEE Transac-
tions on Robotics, 37.
51. Dahlkamp, H., Kaehler, A., Stavens, D., Thrun, S., & Bradski, G. (2007). Self-supervised
monocular road detection in desert terrain. Robotics: Science and Systems, 2.
52. Bayard, D. S., Conway, D. T., Brockers, R., Delaune, J., Matthies, L., Grip, H. F., Merewether,
G., Brown, T., & Martin, A. M. S. (2019). Vision-based navigation for the nasa mars helicopter.
AIAA Scitech 2019 Forum.
53. Hidalgo-Carrio, J., Gehrig, D., & Scaramuzza, D. (2020). Learning monocular dense depth
from events. In Proceedings - 2020 International Conference on 3D Vision, 3DV 2020.
54. Li, Z., Asif, M. S., & Ma, Z. (2022). Event transformer.
55. Juefei-Xu, F., Boddeti, V. N., & Savvides, M. (2017). Local binary convolutional neural net-
works. Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition,
CVPR 2017, 2017-January.
56. Khodamoradi, A., & Kastner, R. (2021). O(n)-space spatiotemporal filter for reducing noise in
neuromorphic vision sensors. IEEE Transactions on Emerging Topics in Computing, 9.
57. Feng, Y., Lv, H., Liu, H., Zhang, Y., Xiao, Y., & Han, C. (2020). Event density based denoising
method for dynamic vision sensor. Applied Sciences (Switzerland), 10.
58. Meyer, L., Smíšek, M., Villacampa, A. F., Maza, L. O., Medina, D., Schuster, M. J., Steidle,
F., Vayugundla, M., Müller, M. G., Rebele, B., Wedler, A., & Triebel, R. (2021). The madmax
data set for visual-inertial rover navigation on mars. Journal of Field Robotics, 38.
59. Figurnov, M., Ibraimova, A., Vetrov, D., & Kohli, P. (2016). Perforatedcnns: Acceleration
through elimination of redundant convolutions. Advances in Neural Information Processing
Systems, 29.
60. Salman, A. M., Tulan, A. S., Mohamed, R. Y., Zakhari, M. H., & Mostafa, H. (2020). Compar-
ative study of hardware accelerated convolution neural network on pynq board. In 2nd Novel
Intelligent and Leading Emerging Sciences Conference, NILES 2020.
61. Yoshida, Y., Oiwa, R., & Kawahara, T. (2018). Ternary sparse xnor-net for fpga implementation.
In Proceedings - 7th International Symposium on Next-Generation Electronics. ISNE, 2018.
62. Ding, C., Wang, S., Liu, N., Xu, K., Wang, Y., & Liang, Y. (2019). Req-yolo: A resource-aware,
efficient quantization framework for object detection on fpgas. In FPGA 2019 - Proceedings
of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays.
63. Li, J. N., & Tian, Y. H. (2021). Recent advances in neuromorphic vision sensors: A survey.
Jisuanji Xuebao/Chinese Journal of Computers, 44.
64. Chen, G., Cao, H., Aafaque, M., Chen, J., Ye, C., Röhrbein, F., Conradt, J., Chen, K., Bing, Z.,
Liu, X., Hinz, G., Stechele, W., & Knoll, A. (2018) Neuromorphic vision based multivehicle
detection and tracking for intelligent transportation system. Journal of Advanced Transporta-
tion, 2018.
65. Gutierrez-Galan, D., Schoepe, T., Dominguez-Morales, J. P., Jiménez-Fernandez, A., Chicca,
E., & Linares-Barranco, A. (2020). An event-based digital time difference encoder model
implementation for neuromorphic systems.

66. Schuman, C. D., Kulkarni, S. R., Parsa, M., Mitchell, J. P., Date, P., & Kay, B. (2022).
Opportunities for neuromorphic computing algorithms and applications. Nature Computational
Science, 2.
67. Richter, C., Jentzsch, S., Hostettler, R., Garrido, J. A., Ros, E., Knoll, A., et al. (2016). Muscu-
loskeletal robots: Scalability in neural control. IEEE Robotics & Automation Magazine, 23(4),
128–137.
68. Zenke, F., & Gerstner, W. (2014). Limits to high-speed simulations of spiking neural networks
using general-purpose computers. Frontiers in Neuroinformatics, 8.
69. Dupeyroux, J., Hagenaars, J. J., Paredes-Vallés, F., & de Croon, G. C. H. E. (2021). Neuromor-
phic control for optic-flow-based landing of mavs using the loihi processor. In Proceedings -
IEEE International Conference on Robotics and Automation, 2021-May.
70. Mitchell, J. P., Bruer, G., Dean, M. E., Plank, J. S. Rose, G. S., & Schuman, C. D. (2018).
Neon: Neuromorphic control for autonomous robotic navigation. In Proceedings - 2017 IEEE
5th International Symposium on Robotics and Intelligent Sensors, IRIS 2017, 2018-January.
71. Tang, G., Kumar, N., & Michmizos, K. P. (2020). Reinforcement co-learning of deep and
spiking neural networks for energy-efficient mapless navigation with neuromorphic hardware.
In IEEE International Conference on Intelligent Robots and Systems.
72. Rajendran, B., Sebastian, A., Schmuker, M., Srinivasa, N., & Eleftheriou, E. (2019). Low-
power neuromorphic hardware for signal processing applications: A review of architectural
and system-level design approaches. IEEE Signal Processing Magazine, 36.
73. Lahbacha, K., Belgacem, H., Dghais, W., Zayer, F., & Maffucci, A. (2021) High density rram
arrays with improved thermal and signal integrity. In 2021 IEEE 25th Workshop on Signal and
Power Integrity (SPI) (pp. 1–4).
74. Fakhreddine, Z., Lahbacha, K., Melnikov, A., Belgacem, H., de Magistris, M., Dghais, W.,
& Maffucci, A. (2021). Signal and thermal integrity analysis of 3-d stacked resistive random
access memories. IEEE Transactions on Electron Devices, 68(1), 88–94.
75. Zayer, F., Mohammad, B., Saleh, H., & Gianini, G. (2020). Rram crossbar-based in-memory
computation of anisotropic filters for image preprocessingloa. IEEE Access, 8, 127569–127580.
76. Bettayeb, M., Zayer, F., Abunahla, H., Gianini, G., & Mohammad, B. (2022). An efficient
in-memory computing architecture for image enhancement in ai applications. IEEE Access,
10, 48229–48241.
77. Ajmi, H., Zayer, F., Fredj, A. H., Hamdi, B., Mohammad, B., Werghi, N., & Dias, J.
(2022). Efficient and lightweight in-memory computing architecture for hardware security.
arXiv:2205.11895.
78. Zayer, F., Dghais, W., Benabdeladhim, M., & Hamdi, B. (2019). Low power, ultrafast synaptic
plasticity in 1r-ferroelectric tunnel memristive structure for spiking neural networks. AEU-
International Journal of Electronics and Communications, 100, 56–65.
79. Zayer, F., Dghais, W., & Belgacem, H. (2019). Modeling framework and comparison of mem-
ristive devices and associated stdp learning windows for neuromorphic applications. Journal
of Physics D: Applied Physics, 52(39), 393002.
80. Li, Z., Asif, M., & Ma, Z. (2022). Event transformer.
81. Varma, A., Chawla, H., Zonooz, B., & Arani, E. (2022). Transformers in self-supervised monoc-
ular depth estimation with unknown camera intrinsics.
82. Hu, T., Wang, L., Xu, X., Liu, S., & Jia, J. (2021). Self-supervised 3d mesh reconstruction
from single images. In Proceedings of the IEEE Computer Society Conference on Computer
Vision and Pattern Recognition.
83. Li, S., Yan, Z., Li, H., & Cheng, K. T. (2021). Exploring intermediate representation for
monocular vehicle pose estimation. In Proceedings of the IEEE Computer Society Conference
on Computer Vision and Pattern Recognition.
Multi-armed Bandit Approach for Task
Scheduling of a Fixed-Base Robot in the
Warehouse

Ajay Kumar Sandula , Pradipta Biswas , Arushi Khokhar ,


and Debasish Ghose

Abstract Robot task scheduling is an increasingly important problem in a multi-


robot system. The problem becomes more complicated when the robots are heterogeneous
with complementary capabilities and must work in coordination to accomplish
a task. This chapter describes a scenario where a fixed-base and a mobile robot with
complementary capabilities accomplish the ‘task’ of moving a package from a pickup
point to a shelf in a warehouse environment. We propose a two-fold optimised task
scheduling approach. The proposed approach reduces the task completion time based
on spatial and temporal constraints of the environment. The approach ensures that
the fixed-base robot reaches the mobile robot exactly when it brings the package to
the reachable workspace of the robotic arm. This helps us to reduce the waiting time
of the mobile robot. The multi-armed bandit (MAB) based stochastic task sched-
uler considers the history of the tasks to estimate the probabilities of corresponding
pickup requests (or tasks). The stochastic MAB scheduler ensures that the mobile
robot with higher estimates of probabilities is given top priority. Results demonstrate
that a stochastic multi-armed bandit based approach reduces the time taken to com-
plete a set of tasks compared to a deterministic first-come-first-serve approach to
solving the scheduling problem.

Keywords Task scheduling · Multi-agent coordination · Multi-armed bandit ·


Heterogeneous robots · Reinforcement learning

A. K. Sandula (B) · P. Biswas · D. Ghose


Indian Institute of Science, Bangalore 560012, India
e-mail: [email protected]
P. Biswas
e-mail: [email protected]
D. Ghose
e-mail: [email protected]
A. Khokhar
Jaypee University of Information Technology, Waknaghat 173234, India


1 Introduction

In the near future, we will experience high penetration of heterogeneous multi-robot


systems in industries and daily life. Fragapane et al. [9] explain how material handling
technologies have advanced over the decades with innovation in robotics technology.
Multi-robot systems (MRS) are employed in warehouses and industries to handle
the logistics within an environment. In multi-robot systems, task allocation is the
assignment of tasks to a single or a group of agents based on the nature of the task. Task
scheduling means the arrangement of the tasks or sub-tasks for execution, depending
on the objectives and constraints. The problem of multi-agent pickup and dispatch
involves task allocation, task scheduling, multi-agent path planning and control. The
objective of task allocation or task scheduling techniques is to allocate or schedule
the tasks in a way that optimises a desired cost or objective (time for execution,
distance travelled). In the previous decades, several approaches have been used to
solve the task allocation problem, such as centralized, hybrid and decentralized
approaches. Among the decentralized approaches, auction-based approaches have
been proven to be more efficient. The works by Zlot and Stentz [30], Wang et al. [28] and
Viguria et al. [26] have explained the advantage of decentralized auction-based multi-
robot task allocation methods for various scenarios. This chapter focuses on the task
scheduling problem. The problem, multi-agent task scheduling, becomes challenging
when robots with different capabilities need to work in a coalition. Any schedule of
tasks or sub-tasks affects the individual task completion times for each agent. In the
literature, Stavridis and Doulgeri [23], Borrell Méndez et al. [1] and Szczepanski
et al. [24] investigated task schedule optimization algorithms in specific scenarios
where a single robot is executing the tasks. The papers by Wang et al. [27], Ho and
Liu [11], Zhang and Parker [29], Kalempa et al. [12] and Kousi et al. [14] describe
the multi-agent task scheduling scenarios where robots need to work in coalition to
accomplish the tasks. These works have detailed heuristic and rule-based approaches
for the task scheduling problem. This chapter investigates possible task scheduling
strategies for a warehouse-type scenario, where a fixed-base and a mobile robot must
collaborate to accomplish a task. The mobile robots carry a package from a pickup
point toward the shelf. Once a mobile robot reaches the workspace of the fixed-base
robot at the shelf, the fixed-base robot picks and places the load from the mobile robot.
When multiple mobile robots arrive at the fixed-base robot, a schedule of sub-tasks
needs to be generated for the pick and place execution at the shelf. This schedule affects
the individual sub-task completion time for the mobile robots because of the extra
waiting time; that is, a mobile robot has to wait for the pick and place execution until
the scheduler chooses it.
In this chapter, we contribute a multi-armed bandit (Slivkins [22]) formulation of the
task scheduling problem. The multi-armed bandit is a classical reinforcement learning
problem where an agent has to prioritise among several arms to receive the best reward.
The agent learns the probability of reward at a given arm by random exploration and
exploitation. We propose a multi-armed bandit-

based stochastic scheduler that gives priority to the mobile robot with a higher
probability estimate. The proposed approach further ensures coordination between the
agents (fixed-base and mobile) considering the temporal and spatial constraints of the
environment. Coordination between the fixed-base and mobile robots is ensured while
scheduling the sub-tasks: the sub-tasks are scheduled so that the fixed-base robot
moves towards the parking spot of the mobile robot at the same time as the mobile
robot reaches the shelf.
We organise the chapter as follows. In Sect. 2, we present the literature review. In
Sect. 3, we elaborate on the problem formulation, along with a detailed explanation of
the motion planning algorithms used. In Sect. 4, we propose a multi-armed bandit
formulation to organise the sequence of tasks of the robot. We report the results with
their analysis and interpretation in Sect. 5. In Sect. 6, we conclude the chapter with
a summary of the presented work and directions for future work.

2 Related Work

Task allocation is the assignment of tasks to the agents. In the context of MRS,
multi-robot task allocation (MRTA) was extensively investigated in the literature
(Zlot and Stentz [30], Wang et al. [28], Viguria et al. [26] and Tang and Parker [25]).
However, the assigned tasks are to be scheduled by the individual agents for the
execution. Task scheduling is the arrangement of tasks while execution. Researchers
investigated task scheduling for a robotic arm’s pick and place operation for several
applications. The work by Stavridis and Doulgeri [23] proposed an online task priority
strategy for assembling two parts considering the relative motions of robotic arms
while avoiding dynamic obstacles. Borrell Méndez et al. [1] investigated a decision
tree model to predict the optimal sequence of tasks for pick and place operation for
a dynamic scenario. The application is to assemble pieces which arrive in a tray to
manufacture footwear. Szczepanski et al. [24] explored a nature-inspired algorithm
for task sequence optimisation considering multiple objectives.
Wang et al. [27] investigated the heterogeneous multi-robot task scheduling for
a robotic arm by predicting the total time for execution of tasks and minimising
the pick and place execution time. However, the picking robot did not choose any
priority in the case when multiple mobile robots approached at the same time. Ho
and Liu [11] investigated the performance of nine pickup-dispatching rules for task
scheduling. The paper by Ho and Liu [11] found that the LTIS (Longest Time In System)
rule has the worst performance, whilst the GQL (Greater Queue Length) rule has the best
performance for the multiple-load pickup and dispatch problem. Under the LTIS rule, the
station that has not been served for the longest time has top priority. The GQL rule, on
the other hand, gives priority to the station with the most pending pickup requests.
However, the study did not investigate the tasks that needed
collaboration between heterogeneous robots with complementary abilities. Zhang
and Parker [29] explored four heuristic approaches to solve the multi-robot task
scheduling in the case where robots needed to work in a coalition to accomplish a

task. The proposed methods have tried to schedule the tasks to reduce interference
with other tasks. However, the approach did not use the history of the tasks to prioritise
the scheduling process. The study by Kalempa et al. [12] reported a robust preemptive
task scheduling approach by categorising the tasks as ‘Minor’, ‘Normal’, ‘Major’
and ‘Critical’. The categories are decided based on the number of robots needed
to allocate for the task execution and urgency. ‘Minor’ tasks often do not require
any robot to perform the job. There are alternative means to accomplish these minor
tasks. ‘Normal’ tasks need one robot to finish the task. A task is ’Major’ when two
robots are required to complete the job. For the ‘Critical’ tasks, execution should
ideally be started as soon as the task is generated. A minimum of three robots are
required to accomplish the task. However, the proposed model did not consider the
criticality of the tasks within the categories. Kousi et al. [14] investigated a service-
oriented architecture (SOA) for controlling the execution of in-plant logistics. The
suggested scheduling algorithm in the architecture is search-based. The scheduler
finds all possibilities of alternatives available at the decision horizon and calculates
the utility for each of the alternatives. The task sequence with the highest utility is
then executed. The scheduler continues to generate task sequences until the task
execution is completed. The utility is calculated by taking the weighted sum of the
consequence for the alternatives, considering the criteria such as distance travelled
and time for execution. However, the study did not consider robots working in a
coalition. To the best of our knowledge, none of the existing works in the current
literature has used the tasks’ history to set the task scheduler’s priority. In this chapter,
we investigate a multi-armed bandit approach to estimate the probability of a task
appearing in the future, and using this information, the task scheduler assigns the
priority accordingly.
In robotics, the Multi-armed bandit (MAB) approach has been utilised, where the
robots must learn the preferences of the environment to allocate limited resources
among multiple alternatives. Korein and Veloso [13] reported a MAB approach to
learning the users’ preferences to schedule the mobile robots during their spare time
while servicing the users. Claure et al. [2] suggested a MAB approach with fairness
constraints for a robot to distribute resources based on the skill level of humans in
a human collaboration task. Dahiya et al. [3] investigated a MAB formulation to
allocate limited human operators for multiple semi-autonomous robots. Pini et al.
[20] explored a MAB formulation for task partitioning problems in swarm robotics.
Task partitioning can be useful for saving resources, reducing physical interference and
increasing efficiency; however, coordinating among the different sub-tasks linked to one
another can also be costly. The paper by Pini et al. [20] proposed a MAB
approach to estimate whether a task needed to be partitioned or not. The results are
compared with an ad-hoc algorithm given by Ozgul et al. [19], and the suggested
approach is shown to outperform the ad-hoc approach. Koval et al. [15] investigated a
MAB approach to select the most ‘robust’ trajectory under uncertainty for the
rearrangement planning problem (Durrant-Whyte et al. [5]). Eppner and Brock [6] reported a
MAB approach to decide on the best trajectory for a robotic arm to grasp an object
exploiting the environment surrounding the object. Krishnasamy et al. [16] proposed
a MAB formulation to reduce the queue regret of service by learning the service

probabilities over time. The learned probabilities help the server choose a service
with more probability. In this chapter, we propose a MAB formulation to decide the
priority for scheduling pick and place operations for a fixed-base robot (a limited
resource) among multiple mobile robots (competing alternatives) carrying the load
(Fig. 1).

3 Problem Formulation

3.1 Problem Framework and Description

In this subsection, we formulate the problem statement of task scheduling in a het-


erogeneous multi-robot system. We consider an environment of a warehouse with
pickup points and shelves. A pickup point Φi where i ∈ [1, p] ( p is the number of
pickup points) is defined as a point in a warehouse where a load-carrying mobile robot
picks up the load of type Ti . We define a shelf υ j where j ∈ [1, s] (s is the number
of shelves) as a point in the warehouse where the load is delivered. At the shelves, a
fixed-base robot picks up the load that is carried by a mobile robot and places it on a
shelf. A set η = [η1 , η2 , ..., η|η| ] is defined as the set of mobile robots that are present
in the environment, each capable of carrying a specific type of load. We consider
|X| as the cardinality of the set X. Similarly, the set Γ = [γ1 , γ2 , ..., γ|Γ| ] is the set
of the fixed-base robots which are needed to work in coalition with mobile robots η
to accomplish a task. A task τ [i, j] is said to be accomplished when a mobile robot
carries a load of type Ti from pickup point Φi and moves to υ j where a fixed-base
robot γ j at the shelf would pick and place the load Ti into the shelf.

The task τ [i, j] can be decomposed into the following sub-tasks. (a) Mobile
robot moving to the pickup point Φi , (b) the mobile robot picks up the load
of type Ti , (c) the mobile robot carries the load towards the shelf υ j , (d) the
fixed-base robot moves towards the parking spot of the mobile robot, and (e)
the fixed-base robot picks and places the load Ti onto the shelf.

We simulate the tasks at every time step using a preset probability matrix defined
as P, where P[i, j] is the probability that a task τ [i, j] is going to be generated
within the next time step ‘t’. Note that P is a constant two-dimensional matrix used
to simulate the task requests. At every time-step ‘t’, a new set of tasks are generated
based on the probability matrix P. Hence, we can have multiple requests for a fixed-
base robot to execute at any given instant, which are stored in a queue. We define
the queue for γ j as Q j , which contains the list of tasks of the type τ [:, j]. Now, we
investigate the assignment of a priority among the tasks in Q j , using the previous
history of tasks, to reduce the overall task completion time. We use the multi-armed

(a) Side-view of the simulation environment

(b) Top-view of the simulation environment (as seen in Rviz)

Fig. 1 Simulation of the environment where η1 , η2 and η3 are mobile robots approaching the fixed
base robot γ1 from their respective starting points carrying different load types

bandit approach to schedule the tasks in Q j by generating a priority queue Q j^p . The
tasks are scheduled based on the previous history of tasks and the estimated time of
arrival(s) of the mobile robot(s) carrying the load towards the shelf.
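As a concrete reading of this formulation, the sketch below draws Bernoulli task requests from a preset matrix P at every time step and accumulates them in per-shelf queues Q j; the particular probabilities, sizes and random seed are illustrative assumptions only.

import numpy as np
from collections import deque

rng = np.random.default_rng(seed=1)

NUM_PICKUPS, NUM_SHELVES = 3, 1
# Preset (ground-truth) probability matrix used only to *simulate* task requests;
# the scheduler never sees it and must estimate it from the task history.
P = np.array([[0.6], [0.3], [0.1]])              # P[i, j] = Pr(task tau[i, j] in the next step)

queues = [deque() for _ in range(NUM_SHELVES)]   # Q_j: pending tasks per shelf
history = []                                     # tau*: one binary task matrix per time step

def simulate_step(t):
    """Draw new task requests from P and append them to the shelf queues."""
    tasks = (rng.random(P.shape) < P).astype(int)
    history.append(tasks)
    for i in range(NUM_PICKUPS):
        for j in range(NUM_SHELVES):
            if tasks[i, j]:
                queues[j].append((i, j, t))      # task tau[i, j] generated at time t
    return tasks

for t in range(5):
    print(t, simulate_step(t).ravel(), [list(q) for q in queues])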

3.2 Motion Planning

For the motion planning algorithm of the robotic arm, we use the RRT-connect
algorithm Kuffner and LaValle [17]. The RRT-connect algorithm is a sampling-based
motion planning algorithm. RRT-connect is an extension of the RRT (Rapidly-exploring
Random Tree) motion planning algorithm given by LaValle et al. [18], which is
probabilistically complete. The RRT algorithm grows a uniformly exploring random tree
with every iteration and finds a path if one exists. Initially, we label the entire
workspace into free and obstacle spaces, which are assumed to be known. If a state of the
robot does not collide with the obstacles present in the environment, we consider that
state to belong to free space; otherwise, it belongs to obstacle space. The algorithm
samples a node (a state in the configuration space) with a bias towards the goal and
determines whether it lies in free space or in obstacle space. From the nearest neighbour
in the tree, we steer towards the sampled node, determine a new node, and add it to the
tree, provided that a collision-free straight line exists from the nearest neighbour to
the new node. The node to which the newly sampled node is attached in the tree is said to
be its parent node. The exploration ends when a sampled node connected to the tree lies
within a tolerance of the goal. Any node connected to the tree is reachable from the
start node.

The RRT-connect algorithm, on the other hand, explores the environment from both the
start and goal regions. The two trees stop exploring when any newly sampled node connected
to one tree falls within a tolerance of a node from the other tree. Figure 2 illustrates
how the random trees from the start (red) and goal (green) approach each other with an
increasing number of iterations. The full path from start to goal can then be found by
following the parent nodes from the meeting point of the two trees. This algorithm is
known to give the quickest solution when the free space does not contain very narrow
passages. In our simulation, there were no narrow passages for moving the robotic arm;
hence, RRT-connect is well suited for executing the pick and place task.
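A compact 2D sketch of this bidirectional growth is given below. It follows the RRT-connect idea with a simplified connect step (a single extension of the opposite tree per iteration) and uses assumed circular obstacles, step size and tolerance; it is not the planner used in the simulation, which relies on the standard RRT-connect implementation for the robotic arm.

import numpy as np

rng = np.random.default_rng(3)

# Illustrative 2D configuration space with circular obstacles: (center_x, center_y, radius).
OBSTACLES = [(5.0, 5.0, 1.5), (2.0, 7.0, 1.0)]
LOW, HIGH, STEP, TOL = 0.0, 10.0, 0.5, 0.5

def in_free_space(q):
    return all(np.hypot(q[0] - ox, q[1] - oy) > r for ox, oy, r in OBSTACLES)

def segment_free(a, b, n=10):
    # Approximate straight-line collision check by sampling points along the segment.
    return all(in_free_space(a + s * (b - a)) for s in np.linspace(0.0, 1.0, n))

def nearest(tree, q):
    return min(tree, key=lambda idx: np.linalg.norm(tree[idx][0] - q))

def extend(tree, q_target):
    # Steer from the nearest node towards q_target by at most one step; add it if collision-free.
    idx = nearest(tree, q_target)
    q_near = tree[idx][0]
    d = q_target - q_near
    dist = np.linalg.norm(d)
    q_new = q_target if dist <= STEP else q_near + STEP * d / dist
    if segment_free(q_near, q_new):
        new_idx = len(tree)
        tree[new_idx] = (q_new, idx)          # store (configuration, parent index)
        return new_idx
    return None

def backtrack(tree, idx):
    path = []
    while idx is not None:
        q, idx = tree[idx]
        path.append(q)
    return path

def rrt_connect(start, goal, max_iters=2000):
    start, goal = np.asarray(start, float), np.asarray(goal, float)
    tree_a = {0: (start, None)}               # tree grown from the start
    tree_b = {0: (goal, None)}                # tree grown from the goal
    for _ in range(max_iters):
        q_rand = rng.uniform(LOW, HIGH, size=2)
        idx_a = extend(tree_a, q_rand)
        if idx_a is not None:
            # Simplified CONNECT step: grow the other tree once towards the new node.
            idx_b = extend(tree_b, tree_a[idx_a][0])
            if idx_b is not None and np.linalg.norm(tree_a[idx_a][0] - tree_b[idx_b][0]) < TOL:
                path = backtrack(tree_a, idx_a)[::-1] + backtrack(tree_b, idx_b)
                return path if np.allclose(path[0], start) else path[::-1]
        tree_a, tree_b = tree_b, tree_a        # swap roles so both trees keep growing
    return None

path = rrt_connect((1.0, 1.0), (9.0, 9.0))
print("solution nodes:", None if path is None else len(path))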
We used the ROS 1 navigation framework (also called move_base) for moving the mobile
robot to a given goal location while avoiding obstacles (Quigley et al. [21]). The goal
of the navigation framework is to localize the robot within the indoor environment map
and simultaneously move towards the goal. Fox et al. [7] proposed a probabilistic
localization algorithm with great practical success, which places computation where it is
needed. The mobile robot has a 360-degree laser scan sensor which gives the distance of
the obstacles (in two dimensions only) around the robot, that is, a 2D point cloud. The
Adaptive Monte-Carlo Localization algorithm by Fox et al. [7] uses the 2D point cloud data
and locates the position and orientation of the robot in an indoor environment. The ROS
navigation stack allows the robot to navigate

Fig. 2 RRT-connect algorithm illustration

from its current localized position to the goal point. The navigation framework uses
a two-level navigation approach by using global and local planning algorithms. The
goal of a global planner is to generate a path avoiding the static obstacles in the envi-
ronment. The goal of the local planner is to move the mobile robot along the planned
global path while avoiding dynamic obstacles. The local planner heavily reduces the
computational load of replanning the global path for changing environments in the
case of dynamic obstacles. A* (Hart et al. [10]) and Dijkstra [4] are popular graph
search algorithms which guarantee an optimal solution if a path exists from one point to
another. The Dijkstra algorithm is an undirected search algorithm which follows a greedy
approach. The A* algorithm is a directed search algorithm which uses heuristics to focus
the search towards the goal. Both algorithms are proven to give optimal solutions, but A*
takes less time to reach the goal because the search is directed. We used the A* algorithm
as the global planner in the navigation framework. The global planner gives a set of
waypoints on the map

Fig. 3 Forward simulation

for the mobile robot to follow to reach the goal. These waypoints avoid the static
obstacles on the map.
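For reference, sending such a goal to move_base is done through the standard ROS 1 actionlib interface; the sketch below mirrors the Send_Goal step used later for the mobile robots, with the frame name and coordinates as placeholder values.

import rospy
import actionlib
from move_base_msgs.msg import MoveBaseAction, MoveBaseGoal

def send_goal(x, y, yaw_w=1.0):
    """Send a single navigation goal to the move_base action server and wait for the result."""
    client = actionlib.SimpleActionClient('move_base', MoveBaseAction)
    client.wait_for_server()

    goal = MoveBaseGoal()
    goal.target_pose.header.frame_id = 'map'      # goal expressed in the static map frame
    goal.target_pose.header.stamp = rospy.Time.now()
    goal.target_pose.pose.position.x = x
    goal.target_pose.pose.position.y = y
    goal.target_pose.pose.orientation.w = yaw_w   # identity orientation for simplicity

    client.send_goal(goal)
    client.wait_for_result()
    return client.get_state()

if __name__ == '__main__':
    rospy.init_node('send_parking_goal')
    # Hypothetical parking-spot coordinates in front of the shelf.
    state = send_goal(2.0, 3.5)
    rospy.loginfo('move_base finished with state %d', state)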
The mobile robot uses the Dynamic-Window Approach (DWA) (Fox et al. [8]), a
reactive collision avoidance approach, as the local path planning algorithm in the
navigation framework. The DWA algorithm was proposed for robots equipped with a
synchro-drive system. In a synchro-drive system, all the wheels of the robot orient
in the same direction and rotate with the same angular velocity. The control inputs
for such a system are linear and angular velocities. The DWA algorithm changes the
control inputs of the robot at a particular interval. This approach considers the mobile
robot’s dynamic constraints to narrow the search space for choosing the control input.
Based on the maximum angular and linear accelerations of the motors at any given
instant, the reachable set of control inputs (angular and linear velocity for the mobile
robot) is determined. The reachable set of inputs is discretized uniformly. For each
sample, a kinematic trajectory is generated, and the algorithm estimates the simulated
forward location of the mobile robot. Figure 3 shows the forward simulation of the
’reachable’ velocities for the robot. The reachable velocities are the set of linear and
angular velocities in the control input plane that can be reached within the next time
step of choosing another velocity. For each trajectory of the forward simulation, a
cost is computed. The original implementation of Fox et al. [8] computes the cost
based on clearance from obstacles, progress towards the goal and forward velocity.
The ROS’s implementation is as follows. The cost is the weighted sum of three
components. The weighted sum includes the distance of the path to the endpoint
of the simulated trajectory. Hence, increasing the weight of this component would
make the robot stay on the global path. The second component of the cost is the

Fig. 4 RViz

distance from the goal to the endpoint of the trajectory. Increasing the weight of this
component makes the robot choose any higher velocity to move towards the goal.
The other component is the obstacle cost along the simulated forward trajectory. While
computing obstacle costs, we assign very high values to the map points occupied by
obstacles. Hence, if the robot would collide along any simulated forward trajectory, the
cost is very high, and that (v, ω) pair is not chosen by the DWA planner
(Fig. 4).

cost = path_distance_bias * (distance to path from the endpoint of the tra-


jectory) + goal_distance_bias * (distance to goal from the endpoint of the
trajectory) + occdist_scale * (maximum obstacle cost along the trajectory in
obstacle cost)

This cost depends on the distance from obstacles, proximity to the goal point and
velocity. The trajectory with the least cost is chosen, and the process is repeated
periodically till the goal point is reached.
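A minimal sketch of this evaluate-and-select loop is shown below: velocity pairs inside the dynamic window are sampled, each is rolled forward with a unicycle model, and the weighted cost above is applied. The weights, acceleration limits and helper functions are assumptions for the example and do not reproduce the actual move_base configuration.

import numpy as np

# Illustrative weights, mirroring the parameter names used in the cost expression above.
PATH_DISTANCE_BIAS, GOAL_DISTANCE_BIAS, OCCDIST_SCALE = 32.0, 24.0, 0.1
DT, SIM_TIME = 0.1, 1.5                     # control period and forward-simulation horizon (s)
MAX_ACC_V, MAX_ACC_W = 0.5, 1.0             # assumed linear / angular acceleration limits

def rollout(x, y, th, v, w):
    """Forward-simulate a constant (v, w) command with a unicycle model."""
    poses = []
    for _ in range(int(SIM_TIME / DT)):
        x += v * np.cos(th) * DT
        y += v * np.sin(th) * DT
        th += w * DT
        poses.append((x, y))
    return poses

def score(poses, goal, path_pts, obstacle_cost):
    end = np.array(poses[-1])
    path_dist = min(np.linalg.norm(end - p) for p in path_pts)      # stay near the global path
    goal_dist = np.linalg.norm(end - goal)                          # make progress towards the goal
    occ = max(obstacle_cost(p) for p in poses)                      # worst obstacle cost on the rollout
    if occ >= 1.0:                                                  # treat as a collision: discard
        return np.inf
    return (PATH_DISTANCE_BIAS * path_dist + GOAL_DISTANCE_BIAS * goal_dist
            + OCCDIST_SCALE * occ)

def dwa_step(state, v0, w0, goal, path_pts, obstacle_cost):
    """Pick the lowest-cost (v, w) pair inside the dynamic window around (v0, w0)."""
    best, best_cmd = np.inf, (0.0, 0.0)
    for v in np.linspace(max(0.0, v0 - MAX_ACC_V * DT), v0 + MAX_ACC_V * DT, 5):
        for w in np.linspace(w0 - MAX_ACC_W * DT, w0 + MAX_ACC_W * DT, 11):
            c = score(rollout(*state, v, w), goal, path_pts, obstacle_cost)
            if c < best:
                best, best_cmd = c, (v, w)
    return best_cmd

# Toy usage: one obstacle-free step towards a goal along a straight global path.
cmd = dwa_step((0.0, 0.0, 0.0), 0.2, 0.0, np.array([3.0, 0.0]),
               [np.array([x, 0.0]) for x in np.linspace(0, 3, 10)],
               obstacle_cost=lambda p: 0.0)
print("chosen (v, w):", cmd)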

4 Methodology

This section explains the proposed approach to schedule the tasks based on a multi-
armed bandit formulation solved with the ε-greedy algorithm. We can further opti-
mize by scheduling the tasks’ execution so that the fixed-base robot reaches the
parking spot of the mobile robot synchronously when the mobile robot reaches the
workspace of the fixed-base robot. The following subsections explain the modules
associated with the suggested approach and summarise the methodology.

4.1 Multi-armed Bandit Formulation

The multi-armed bandit (MAB) problem (Slivkins [22]) is a classical reinforcement
learning problem in which a fixed set of resources should be allocated among competing
choices. We formulate the approach such that, at every time step, a choice has to be
made by the algorithm. A reward is associated with each choice (arm, or competing
alternative) based on a preset probability. A MAB solver works on the principle of
exploration and exploitation. The objective is to choose the arm with the highest expected
gain. In the context of the current problem, multiple mobile robots approaching a
given shelf are the competing choices. The fixed-base robot is a limited resource which
should collaborate with the mobile robots to accomplish the tasks. The scheduling of
the tasks would be prioritised in a way that increases the maximum expected gain of
the multi-armed bandit problem. We used the Bernoulli multi-armed bandit, where
the reward is binary, that is, 0 or 1.
The goal of this module is to prioritise the order of the requests based on the
previous history of requests. We schedule the order of the task requests based on the
priority we get from the MAB solver(s). The history of task requests is the input to
the module; the output is the estimated order of probabilities of these task requests.
Hence we define a function α : τ ∗ → P ∗ , where τ ∗ represents the history of the
tasks accomplished until that time point. Note that τ ∗ is a three-dimensional binary
matrix to which a two-dimensional matrix is appended at every time-stamp. Each
row represents the list of tasks, and P ∗ represents the estimated set of probabilities
which is the output of this module. The ε-greedy algorithm works on the principle
of exploration and exploitation as explained in Algorithm 1. The parameter ε is the
probability with which the algorithm explores rather than exploits at a given time step.
In the state of exploitation, we use the cumulative reward estimates and pick the
best (greedy) arm available. Here, we consider a total of |υ| choices, each
corresponding to a shelf, taken at a given time stamp. The aim is
to find the most probable pickup-shelf task request for each shelf. Hence, the robot
which carries the load type of that corresponding pickup point can be given priority
while scheduling the task when multiple load-carrying mobile robots come to a shelf
simultaneously. The value of ε should be set to a sufficiently high value to let the
algorithm explore all the arms to get the correct priority order.
The arms’ order keeps changing as the algorithm seeks to learn with the updated
history of tasks. The following equation calculates the estimates of the bandits.

This module prioritises the order of requests by calculating the estimated prob-
ability P ∗ , which is updated at every time stamp and helps us to schedule the
tasks.

P*(i, j) = (1 / N_j(a_T = a(i, j))) · Σ_{T=1}^{T=cur} τ*(i, j, T) β(a_T = a(i, j))    (1)

Algorithm 1 ε − gr eedy(T imesteps)


Ts = 0
n = a randomly generated number at each time step
for Ts < T imesteps do
if ε > n then
Explore
Ts = Ts + 1
else
Exploit
Ts = Ts + 1
end if
Update reward
end for

Algorithm 2 Grid Approach(i,G i ,vi ,ωi )


Robot Id ← i
Current location of robot ← G i = [xgi , ygi ]
Current linear velocity of robot ← vi
Current angular velocity of robot ← ωi
if GRID_LOCK(i,G i ) then
P O L I CY2 (G i )
Grid Approach(i,G i ,vi ,ωi )
end if
Forward simulation(i,vi ,ωi ) → Priorit yi
if Conflict_Check(Priorit yi ) then
P O L I CY1 (G i , Priorit yi )
end if
Continue moving to the goal using ROS-NAV1 architecture

In Eq. 1, P*(i, j) represents the estimated probability of a task request from the ith
pickup point to the jth shelf. Here, ‘T’ is the variable for the time step at which the
tasks are generated, and N_j(a_T = a(i, j)) represents the number of times the action
a(i, j) was chosen by the MAB solver corresponding to shelf j up to the current time
stamp, that is, T = cur. The function β(X) is a binary function which returns one if the
condition X is satisfied and zero otherwise. The denominator
N_j(a_T = a(i, j)) = Σ_{T=1}^{cur} β(a_T = a(i, j)) therefore counts the number of times
arm i was chosen by the MAB solver at shelf j until the current time step.
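Combining Eq. 1 with the ε-greedy selection of Algorithm 1 for a single shelf can be sketched as follows; the number of arms, the value of ε and the simulated task probabilities are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(7)

TRUE_P = np.array([0.6, 0.3, 0.1])    # hidden per-pickup task probabilities (simulation only)
EPSILON, TIMESTEPS = 0.2, 2000

counts = np.zeros(len(TRUE_P))        # N_j(a_T = a(i, j)): times each arm was chosen
estimates = np.zeros(len(TRUE_P))     # P*(i, j): running estimate for each arm

for t in range(TIMESTEPS):
    if rng.random() < EPSILON:                    # explore
        arm = rng.integers(len(TRUE_P))
    else:                                         # exploit the current best estimate
        arm = int(np.argmax(estimates))
    reward = float(rng.random() < TRUE_P[arm])    # Bernoulli reward: did a task appear?
    counts[arm] += 1
    # Incremental form of Eq. 1: the mean reward observed when this arm was chosen.
    estimates[arm] += (reward - estimates[arm]) / counts[arm]

priority = np.argsort(-estimates)                 # highest estimate first
print("estimates:", np.round(estimates, 3), "priority order:", priority)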

4.2 Task Scheduling Based on Time Synchronization

In this subsection, we present the scheduling of the task requests that a fixed-base
robot at a shelf must execute. We assign the priority among the task requests based on
P ∗ we receive from the MAB solvers and the estimated time of arrival(s) of the mobile
robot(s). As explained in Sect. 3.1, accomplishing a task requires a mobile robot and

a fixed-base robot to work in a coalition. The mobile robot carries the package and
parks itself at a parking slot within the reachable workspace of the fixed-base robot.
We schedule the tasks in such a way that the fixed-base robot reaches the mobile robot
at the same time. This formulation helps us achieve collaboration between them to
reduce the overall time to finish the task. From P ∗ , we get the priority order (
p ) by
finding the element with the highest probability at each column. Sorting the indices
of the rows based on the highest estimate to the lowest gives us the priority order.

Algorithm 3 MAB Scheduler(ETA, p)
Estimated Time of Arrivals ← E T A
for ∀x ∈ E T A do
if x < δ then
Append load type of x to list E
end if
end for
count=0
for ∀y ∈ p do    ⊲ sort E subject to p
if y ∈ E then
Move y in E to position count
Shift the requests after count by one position
count = count+1
end if
end for
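A direct transcription of Algorithm 3 into Python could look like the sketch below; the structure of the ETA input and the example values of δ and the priority order are assumptions made for the illustration.

def mab_schedule(eta, priority_order, delta):
    """Reorder imminent pickup requests according to the MAB priority order (Algorithm 3 sketch).

    eta: dict mapping load type -> estimated time of arrival (s) of the robot carrying it.
    priority_order: load types sorted from the highest to the lowest probability estimate.
    delta: only requests arriving within this horizon are reordered
           (e.g. the duration of one pick-and-place execution).
    """
    # Requests whose robots will arrive within the horizon, initially in arrival order.
    queue = [load for load, t in sorted(eta.items(), key=lambda kv: kv[1]) if t < delta]

    # Stable front-insertion of prioritised load types, mirroring the 'count' pointer.
    count = 0
    for load in priority_order:
        if load in queue:
            queue.remove(load)
            queue.insert(count, load)
            count += 1
    return queue

# Hypothetical example: the robot carrying load type 1 is prioritised even though type 2 arrives first.
print(mab_schedule(eta={1: 8.0, 2: 5.0, 3: 30.0}, priority_order=[1, 2, 3], delta=12.0))
# -> [1, 2]  (type 3 is outside the horizon and is not scheduled yet)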

So, we define the priority order as

p = sort(i ← max-to-min(max(P*(i, :))))    (2)

The priority order of load types (p) is obtained by sorting the estimated probabilities of
task requests (P*) from the highest to the least with respect to the maximum element among
the rows of P*. Hence, we check the highest probability value in every row and compare it
across the rows. The row indices, from the highest value to the lowest, are sorted into the
priority order p. We conclude that the robots that carry a load type which appears at the
beginning of p have a higher prob-
ability of the tasks accumulated than the latter. We assume t j as the set of pickup
requests for the fixed-base robot at shelf j at any given instant. The estimated time of
arrival(s) of the mobile robot(s) is(are) calculated from the path length(s) with cur-
rent velocity as given in Fig. 5. We use the RRT-connect Kuffner and LaValle [17]
algorithm for the motion planning algorithm of the robotic arm. In this algorithm, a
tree from the source and a tree from the goal point are grown towards each other until
they meet. The shortest path from the set of nodes (tree) is then chosen to execute the
movement of the robotic arm. The scheduled requests are executed by the fixed-base
robot when the movement time of the fixed-base robot equals the estimated time
of arrival of the mobile robot. The estimated arrival time is calculated based on the
distance remaining for the mobile robot to travel, divided by the current velocity. The
time taken by the fixed-base robot is calculated by the angular distance (argument θ )

Fig. 5 Calculation of estimated time of arrival

the base joint has to travel and the velocity profile of the controller. The mobile robot
navigates using a global planner A∗ and a local planner dynamic-window approach
Fox et al. [8], which is an online collision avoidance algorithm.
The fixed base robot has the velocity profile as shown in Fig. 6. The angular
velocity, ω, increases with a uniform angular acceleration for θ < 10◦ after which
ω = 20◦ /s. For the last 10◦ of the angular displacement, ω decreases till it becomes
zero. As shown in Algorithm 4, the time taken to execute the movement of the fixed-
base robot, γ1 , is calculated using the angular displacement θ of the base joint of the
fixed-base robot from its current position.

Algorithm 4 Movement Time(θ )


Movement Time ← M T
if θ ≤ 10◦ then
M T = √(2θ)/2
end if
if 10◦ ≤ θ ≤ 20◦ then
M T = 1 + √(2θ − 10)/2
else
M T = 2 + (θ − 20)/2
end if
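The synchronisation rule itself (trigger the arm so that its motion finishes exactly when the mobile robot parks) can be sketched as follows, with the movement-time value taken as the output of Algorithm 4 and the trigger tolerance as an assumed constant.

def eta_seconds(remaining_path_length, current_speed, eps=1e-6):
    """Estimated time of arrival of the mobile robot: remaining distance over current speed."""
    return remaining_path_length / max(current_speed, eps)

def should_start_arm(arm_movement_time, robot_eta, tolerance=0.2):
    """Start the fixed-base robot when its motion would finish just as the mobile robot parks.

    arm_movement_time: output of the Movement Time routine (Algorithm 4) for the required
    base-joint displacement; tolerance (s) absorbs timing jitter and is an assumed value.
    """
    return robot_eta <= arm_movement_time + tolerance

# Hypothetical numbers: 4 m left at 0.5 m/s -> ETA 8 s; a 3 s arm motion should not start yet.
eta = eta_seconds(4.0, 0.5)
print(eta, should_start_arm(3.0, eta))   # 8.0 False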

Fig. 6 Velocity profile of the base joint of the fixed base robot

Algorithm 5 MR_Task_Executor_ηi (T ,statei )


if i in T then
if statei = start then
Wait for the package deployment
MR_Task_Executor_ηi (T ,to shelf)
end if
if statei = to shelf then
Send_Goal(parking point)
Wait for execution
MR_Task_Executor_ηi (T ,pick)
end if
if statei = pick then
Wait for the pick execution of fixed-base robot
MR_Task_Executor_ηi (T ,to start)
end if
if statei = to start then
Send_Goal(pickup point)
Wait for execution
MR_Task_Executor_ηi (T ,start)
end if
else
Update(T)
MR_Task_Executor_ηi (T ,statei )
end if

The Fig. 7 shows the entire architecture of the proposed methodology and the
modules involved in the work. The two modules of the architecture are titled ‘Multi-
armed bandit’ and ‘Multi-agent coordination. The module titled ‘Multi-armed bandit’
gives us the estimated probabilities of the task requests based on the history of
the tasks, as explained in Sect. 4.1. The module titled ‘Multi-agent coordination’
takes into consideration the movement time and estimated time of arrivals and the
probability estimates to plan the sequence of tasks.

Fig. 7 Data flow between the sub-modules of the proposed architecture

Since this work focuses on task scheduling of fixed-base robots, we only consid-
ered one shelf and three pickup points. We have a total of four agents, one fixed-base
robotic arm γ1 , and three different load-carrying mobile robots (η1 , η2 , η3 ), which
start from three different pickup points. Tasks are allocated to the mobile robot, which
can carry the particular load type. Each mobile robot can carry a specific load type
from a pickup point. Hence, after finishing the task, the mobile robots move back to
the pickup point from the shelf to execute future tasks, if any. Figure 8 shows the
flow chart of the execution of requests by a load-carrying robot.
The detailed algorithm is explained in Algorithm 5. The input T is the list of tasks
which are not accomplished. If ‘i’ is in the list T , a load from pickup point ‘i’ is to be
carried to the shelf. We define four states of the mobile robot. State ‘start’ means the
robot is waiting for the load to be deployed at the pickup point. State ‘to shelf’ means
that the mobile robot will move with the load to the shelf at a parking point reachable
to the robotic arm. State ‘pick’ means the robot is waiting for the fixed-base robot
to reach the package to execute the pick and place operation. State ‘to start’ means
the robot has finished its task and is moving back to its corresponding pickup point.
Once the robot reaches the start position, it will follow the same loop if there are
unfinished tasks.

Fig. 8 Mobile robot task execution

Now, the fixed base robot at a given shelf prioritizes the pickup requests based
on the MAB scheduler algorithm, as explained in Algorithm 1. The output E of
the algorithm gives the order of the tasks that the fixed-base robot must execute.
The algorithm uses the multi-armed bandit formulation to estimate the priority to
be allocated among the mobile robots at that current instant. We only prioritize the
requests approaching the fixed-base robot within a threshold δ equal to the time
taken for executing a pick and place task. We move the fixed-base robot towards the
parking spot of the mobile robot in such a way that the robotic arm could reach for
pickup precisely when the mobile robot delivers the package to make the scheduler

robust. This can be achieved by moving the robotic arm when movement time equals
the estimated time of arrival of the mobile robot. We schedule the tasks based on the
priority p because we want to reduce the waiting time for the mobile robot, which
has more probability of getting the tasks accumulated in future. We investigate the
performance of the MAB task scheduler to a deterministic scheduler which works on
a first-come-first-serve (FCFS) approach. In the FCFS approach, the mobile robot,
which is estimated (based on the ETA) to arrive the earliest to the shelf, would be
scheduled first for the pick and place operation irrespective of the history of the
task requests. The position of the load is estimated using a classical colour-based
object detection approach. A mask is used on the image frame to recognise the object
in real-time, which only detects a particular colour. The mask is created using upper and lower limits on the hue, saturation, and value (brightness). We can detect
any colour which falls within the specified range by the camera, which is attached to
the end-effector of the robotic arm. In our simulation, the red-coloured region in the
camera view is first detected using the colour-based object detection technique, as
explained in Algorithm 6. A contour is created around the red object, which is used
to detect the object’s centroid, the weighted average of all the pixels that make up the
object. A contour is a curve that joins all the continuous (along the boundary) points
having the same colour. As the depth (Z) was constant, the X and Y coordinates of

Fig. 9 View from the camera attached to the end-effector of the robotic arm

the block can be determined by the difference (in pixels) between the centre of the
red bounded region from the image’s centre (Fig. 9).

Algorithm 6 Colour-based object detection


Lower limit of colour range ← L
Upper limit of colour range ← U
Area of contour ← A
while true do
Find contours for the colour range
Calculate A
if A > 2000 then
Find centroid of object
end if
end while
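As an illustration of how Algorithm 6 could look with a standard vision library, the following OpenCV-based Python sketch detects the red block and returns its centroid in pixel coordinates. The HSV bounds are illustrative assumptions, not the calibration used in the simulation.

```python
import cv2
import numpy as np

# Illustrative HSV bounds for a red object (not the authors' calibrated values)
LOWER_RED = np.array([0, 120, 70])
UPPER_RED = np.array([10, 255, 255])

def detect_centroid(frame_bgr):
    """Return the pixel centroid of the largest red contour, or None."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, LOWER_RED, UPPER_RED)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    largest = max(contours, key=cv2.contourArea)
    if cv2.contourArea(largest) <= 2000:  # area threshold as in Algorithm 6
        return None
    m = cv2.moments(largest)
    return (m["m10"] / m["m00"], m["m01"] / m["m00"])
```

Because the depth is constant, the offset of this centroid from the image centre can then be converted to the X and Y coordinates of the block, as described in the text.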

5 Results and Analysis

5.1 Simulation Results

We simulated a warehouse scenario in ROS-Gazebo architecture with pickup points 1, 2, 3 at [1, 4], [−3.5, 0.5] and [1, −3], respectively. The fixed-base robotic arm is
placed on the shelf located at [0,0.7]. The task requests are generated at a probability
of 0.6, 0.4, and 0.8 from each of the pickup points to the shelf at a time-step of 35 s.
After empirical analysis, the variable δ is set to 4.0 s. Figure 10 shows a comparison
between the deterministic and the stochastic task scheduling approach for Simulation
1. In Simulation 1, ε is chosen to be 0.3. Figure 11 shows a comparison between
the deterministic and the stochastic task scheduling approach in Simulation 2. In
Simulation 2 also, ε is chosen to be 0.3. Table 1 shows the total time taken by Robot
1, Robot 2 and Robot 3 to complete the tasks using the deterministic first-come-first-
serve approach and the stochastic multi-armed bandit approach in Simulation 1 and
Simulation 2. In Simulation 1, the stochastic approach reduced the total time taken
to complete the tasks by Robot 1, Robot 2 and Robot 3 by 25.3%, 64.9% and 41.8%, respectively. In Simulation 2, the stochastic approach reduced the total time taken to complete the tasks of Robot 2 and Robot 3 by 2.608% and 11.966%, respectively. However, the
total time taken to complete the tasks of Robot 1 was increased by 44.033% using
the multi-armed bandit based approach. Table 2 shows the cumulative time taken
by Robot 3 to complete consecutive sets of 20 tasks using the deterministic and the
stochastic approach. In Simulation 2, the difference between the time Robot 3 to
complete the first 20 tasks using the suggested multi-armed bandit based approach
and the first-come-first-serve approach is 0.76 h. For Simulation 1, the difference
is 1.1 h. This difference exponentially increases for higher sets of consecutive tasks
because of the accumulated waiting time. Figures 12 and 13 capture this difference.
For Robot 2, the difference between the time taken to complete the first six tasks

Fig. 10 Task duration in Simulation 1

using the proposed multi-armed bandit based approach and the first-come-first-serve
approach is 0.02 h in Simulation 1 (see Figs. 14 and 15). However, in Simulation 2,
the FCFS approach is faster by 0.05 h. It was observed that the multi-armed bandit

Fig. 11 Task duration in Simulation 2

based approach is faster for the last set of 6 tasks for Robot 2 by 0.45 h and 0.23 h in
Simulation 1 and Simulation 2, respectively. In Simulation 1, the difference between
the time for Robot 1 to complete the first 11 tasks using the suggested multi-armed

Table 1 Time taken (in hours) to complete 100 tasks in Simulation 1 and Simulation 2
Robot | Sim. 1 FCFS (h) | Sim. 1 MAB (h) | Sim. 2 FCFS (h) | Sim. 2 MAB (h)
1 | 7.2 | 5.4 | 7.2 | 10.4
2 | 1.2 | 0.4 | 1.2 | 1.2
3 | 34.9 | 20.3 | 34.9 | 30.7

Table 2 Time taken (in hours) to complete consecutive sets of 20 tasks by Robot 3 in Simulation
1 and Simulation 2
Tasks | Sim. 1 FCFS (h) | Sim. 1 MAB (h) | Sim. 2 FCFS (h) | Sim. 2 MAB (h)
20 | 2.52 | 1.42 | 2.52 | 1.76
40 | 6.33 | 3.73 | 6.33 | 5.41
60 | 10 | 6.25 | 10 | 9.42
80 | 15.29 | 8.54 | 15.29 | 13.53

Fig. 12 Cumulative task duration in Simulation 1 for Robot 3

bandit approach and the first-come-first-serve approach is 0.01 h, the FCFS approach being the faster one in this case. For Simulation 2, the difference is 0.17 h, also in favour of the deterministic FCFS approach. However, for higher sets of consecutive 11 tasks, the stochastic multi-armed bandit based approach is faster in Simulation 1, which is not the case in Simulation 2, as shown in Figs. 16 and 17.

Fig. 13 Cumulative task duration in Simulation 2 for Robot 3

Fig. 14 Cumulative task duration in Simulation 1 for Robot 2

Table 3 shows the cumulative time taken by Robot 2 to complete consecutive sets
of 6 tasks using the deterministic and the stochastic approach. Table 4 shows the
cumulative time taken by Robot 1 to complete consecutive sets of 11 tasks using the
deterministic and the stochastic approach.
We observe that the total task completion time, summed over all the robots, was higher for the FCFS approach than for the MAB approach in both simulations.

Fig. 15 Cumulative task duration in Simulation 2 for Robot 2

Fig. 16 Cumulative task duration in Simulation 1 for Robot 1

5.2 Discussion

We observe that the proposed approach can outperform the deterministic task sched-
uler. However, the uncertainty in executing the path by the mobile robot can affect the
task completion time. In Simulation 2, even though Robot 1 was given priority, the
task completion time was more than Simulation 1. The reason for this difference is
the uncertainty in executing the path by the mobile robot. The mobile robot uses the
Dynamic-Window Approach Fox et al. [8], a robust local path planning algorithm,
to avoid dynamic obstacles. Even though the global path decided by the robot is the
same in every case, it keeps updating based on the trajectory decided by the local path

Fig. 17 Cumulative task duration in Simulation 2 for Robot 1

Table 3 Time taken (in hours) to complete consecutive sets of 6 tasks by Robot 2 in Simulation 1
and Simulation 2
Tasks | Sim. 1 FCFS (h) | Sim. 1 MAB (h) | Sim. 2 FCFS (h) | Sim. 2 MAB (h)
6 | 0.09 | 0.07 | 0.09 | 0.14
12 | 0.24 | 0.08 | 0.24 | 0.38
18 | 0.14 | 0.1 | 0.14 | 0.25
24 | 0.2 | 0.07 | 0.2 | 0.15
30 | 0.45 | 0.09 | 0.45 | 0.22

Table 4 Time taken (in hours) to complete consecutive sets of 11 tasks by Robot 1 in Simulation
1 and Simulation 2
Tasks | Sim. 1 FCFS (h) | Sim. 1 MAB (h) | Sim. 2 FCFS (h) | Sim. 2 MAB (h)
11 | 0.26 | 0.27 | 0.26 | 0.43
22 | 0.62 | 0.42 | 0.62 | 0.94
33 | 1.15 | 0.85 | 1.15 | 1.71
44 | 1.91 | 1.53 | 1.91 | 2.73
55 | 2.52 | 1.8 | 2.52 | 3.52

planner. Hence, the execution of the path by the mobile robot does not necessarily
have the same time taken for execution for the same initial and final goal point.
In this chapter, we have proposed a novel task scheduling approach in the context
of heterogeneous robot collaboration. However, we did not consider the case where
robots are semi-autonomous. We observe that the difference between the time taken

Fig. 18 Top view of the mixed-reality warehouse environment

to complete the tasks using the deterministic first-come-first-serve approach and the stochastic multi-armed bandit based approach was significantly high for Robot
3 in Simulation 1 and Simulation 2. Although the total task completion time for
the MAB approach was still less in Simulation 2, making the scheduler robust to
handle uncertainties would save more time. A robust task scheduler considering the
uncertainty of the task execution has to be investigated in future work.
In this work, we considered only one shelf and three pickup points. However, the
performance of this approach could be better investigated with multiple shelf points
coupled with multi-robot task allocation. This needs to be investigated further.
In a warehouse scenario, some tasks require human intervention; that is when
the robots have to work collaboratively with humans. Hence, we simulated a mixed-
reality warehouse environment with real robots in an indoor environment. Figure 18
shows the top view of the simulated mixed reality interface. Figures 19, 20 and 21
illustrate the experiment setup in a mixed reality environment. The user manually
places the load on the top of the mobile robot. The mobile robot will carry the load
to the fixed-base robot (a shelf point). Once the load is placed, the user presses the
button where the load is to be carried. The experimental setup considers avoiding
collision with the virtual obstacles present in the indoor environment. This helps us
recreate a warehouse-type scenario in an indoor environment. In the same mixed
reality environment, we added two virtual buttons to follow the user wherever he
moves. These two buttons serve as the user input.
We considered two fixed-base robots to be chosen by the user for transporting the
load. In Fig. 19, the user presses the button on the left. The goal of the mobile robot

Fig. 19 The mobile robot carries the load towards the Fixed-base Station 1 when the user presses
the first button

is to autonomously avoid real and virtual obstacles and reach the desired goal. The
robot is clearly observed avoiding the virtual obstacles while reaching the goal. The
top part of the figure shows the Hololens camera view from the user. Hololens is a
holographic device which is developed and manufactured by Microsoft. The bottom
part shows the real-world view of the environment.
In Fig. 20, the user pressed the other button, choosing the other station. Hence, a
new goal position is sent to the navigation stack. The robot changes the global path
and, consequently, stops at that moment. Immediately after that, the robot started
moving along the new global path. The arrows in the Hololens view and the real-
world view represent the direction of the robot’s velocity at that moment. It can be
observed that the direction of velocity is such that the robot avoids the virtual obstacles
present on both its sides. This is done by providing an edited map to the navigation
stack. We send the edited map to the move_base node of ROS as a parameter. We
use the unedited map as an input parameter to the amcl node, which is responsible
for the indoor localisation of the robot.
In Fig. 21a, we can observe that the mobile robot is avoiding both the real and
virtual obstacles while parallelly progressing towards the new goal. The real-world
camera view of the figure shows the real obstacle and the end-effector of the fixed-
base robot. In Fig. 21b, we show that the mobile robot reached the workspace of the
fixed-base robot, and the fixed-base robot picks and places the load carried towards

Fig. 20 Mobile robot changes its trajectory when the second button is pressed

(a) Mobile robot moves towards Fixed-base (b) Fixed-base robot executes the pick and place
Station 2, avoiding the real obstacle task

Fig. 21 Pick and place task execution in a mixed-reality warehouse environment with real and
virtual obstacle avoidance - link to video

it using a camera attached to the end-effector. This work can be extended to a case
where the scheduler considers human input and prioritises the mobile robots for
collaboration to finish the task.

6 Conclusion

This chapter has presented the challenges of a heterogeneous multi-robot task scheduling scenario. It has proposed a solution and has discussed the multi-armed
bandit technique used in this application domain. Future work will investigate a sce-
nario with multiple shelves coupled with dynamic task allocation. The uncertainty
of the execution of a mobile robot can cause increased task completion time. The
delays in the initial tasks could add up to the later tasks. Future work will further
investigate a task scheduler which can account for the uncertainty in the execution
of the tasks. A task scheduler that considers a human in the loop system for the case
with semi-autonomous robots is to be further investigated.

References

1. Borrell Méndez, J., Perez-Vidal, C., Segura Heras, J. V., & Pérez-Hernández, J. J. (2020).
Robotic pick-and-place time optimization: Application to footwear production. IEEE Access,
8, 209428–209440.
2. Claure, H., Chen, Y., Modi, J., Jung, M. F. & Nikolaidis, S. (2019). Reinforcement learning with
fairness constraints for resource distribution in human-robot teams. arXiv:abs/1907.00313.
3. Dahiya, A., Akbarzadeh, N., Mahajan, A. & Smith, S. L. (2022). Scalable operator allocation for
multi-robot assistance: A restless bandit approach. IEEE Transactions on Control of Network
Systems, 1.
4. Dijkstra, E. W. (1959). A note on two problems in connexion with graphs. Numerische math-
ematik, 1(1), 269–271.
5. Durrant-Whyte, H., Roy, N., & Abbeel, P. (2012). A framework for Push-Grasping in clutter
(pp. 65–72).
6. Eppner, C., & Brock, O. (2017). Visual detection of opportunities to exploit contact in grasping
using contextual multi-armed bandits. In 2017 IEEE/RSJ international conference on intelligent
robots and systems (IROS) (pp. 273–278).
7. Fox, D., Burgard, W., Dellaert, F., & Thrun, S. (1999). Monte Carlo localization: Efficient
position estimation for mobile robots. AAAI/IAAI (343–349), 2.
8. Fox, D., Burgard, W., & Thrun, S. (1997). The dynamic window approach to collision avoid-
ance. IEEE Robotics and Automation Magazine, 4(1), 23–33.
9. Fragapane, G., de Koster, R., Sgarbossa, F., & Strandhagen, J. O. (2021). Planning and control of
autonomous mobile robots for intralogistics: Literature review and research agenda. European
Journal of Operational Research, 294(2), 405–426.
10. Hart, P., Nilsson, N., & Raphael, B. (1968). A formal basis for the heuristic determination of
minimum cost paths. IEEE Transactions on Systems Science and Cybernetics, 4(2), 100–107. https://doi.org/10.1109/tssc.1968.300136
11. Ho, Y.-C., & Liu, H.-C. (2006). A simulation study on the performance of pickup-dispatching rules for multiple-load AGVs. Computers and Industrial Engineering, 51(3), 445–463. Special Issue on Selected Papers from the 34th International Conference on Computers and Industrial Engineering (ICC&IE). https://www.sciencedirect.com/science/article/pii/S0360835206001069
12. Kalempa, V. C., Piardi, L., Limeira, M., & de Oliveira, A. S. (2021). Multi-robot preemptive task scheduling with fault recovery: A novel approach to automatic logistics of smart factories. Sensors, 21(19). https://www.mdpi.com/1424-8220/21/19/6536
13. Korein, M., & Veloso, M. (2018). Multi-armed bandit algorithms for a mobile service robot's spare time in a structured environment. In D. Lee, A. Steen & T. Walsh (Eds.), GCAI-2018. 4th Global Conference on Artificial Intelligence. EPiC Series in Computing, EasyChair (Vol. 55, pp. 121–133). https://easychair.org/publications/paper/cLdH
14. Kousi, N., Koukas, S., Michalos, G., & Makris, S. (2019). Scheduling of smart intra-factory material supply operations using mobile robots. International Journal of Production Research, 57(3), 801–814. https://doi.org/10.1080/00207543.2018.1483587
15. Koval, M. C., King, J. E., Pollard, N. S., & Srinivasa, S. S. (2015). Robust trajectory selection
for rearrangement planning as a multi-armed bandit problem. In 2015 IEEE/RSJ International
Conference on Intelligent Robots and Systems (IROS) (pp. 2678–2685).
16. Krishnasamy, S., Sen, R., Johari, R., & Shakkottai, S. (2021). Learning unknown service rates in queues: A multiarmed bandit approach. Operations Research, 69(1), 315–330. https://doi.org/10.1287/opre.2020.1995
17. Kuffner, J., & LaValle, S. (2000). Rrt-connect: An efficient approach to single-query path
planning. In Proceedings 2000 ICRA. Millennium conference. IEEE international conference
on robotics and automation. Symposia proceedings (Cat. No.00CH37065) (Vol. 2, pp. 995–
1001).
18. LaValle, S. M. et al. (1998). Rapidly-exploring random trees: A new tool for path planning.
19. Ozgul, E. B., Liemhetcharat, S., & Low, K. H. (2014). Multi-agent ad hoc team partitioning
by observing and modeling single-agent performance. In Signal and information processing
association annual summit and conference (APSIPA), 2014 Asia-Pacific (pp. 1–7).
20. Pini, G., Brutschy, A., Francesca, G., Dorigo, M., & Birattari, M. (2012). Multi-armed bandit
formulation of the task partitioning problem in swarm robotics. In International conference on
swarm intelligence (pp. 109–120). Springer.
21. Quigley, M., Conley, K., Gerkey, B., Faust, J., Foote, T., Leibs, J., Wheeler, R., Ng, A. Y.,
et al. (2009). Ros: an open-source robot operating system. In ICRA workshop on open source
software (Vol. 3, p. 5). Kobe: Japan.
22. Slivkins, A. (2019). Introduction to multi-armed bandits. CoRR. arXiv:abs/1904.07272
23. Stavridis, S., & Doulgeri, Z. (2018). Bimanual assembly of two parts with relative motion gen-
eration and task related optimization. In 2018 IEEE/RSJ international conference on intelligent
robots and systems (IROS) (pp. 7131–7136). IEEE Press. https://doi.org/10.1109/IROS.2018.8593928
24. Szczepanski, R., Erwinski, K., Tejer, M., Bereit, A., & Tarczewski, T. (2022). Optimal scheduling for palletizing task using robotic arm and artificial bee colony algorithm. Engineering Applications of Artificial Intelligence, 113, 104976. https://www.sciencedirect.com/science/article/pii/S0952197622001774
25. Tang, F., & Parker, L. E. (2005). Distributed multi-robot coalitions through asymtre-d.
In 2005 IEEE/RSJ international conference on intelligent robots and systems, Edmonton,
Alberta, Canada, August 2–6, 2005 (pp. 2606–2613). IEEE. https://doi.org/10.1109/IROS.2005.1545216
26. Viguria, A., Maza, I., & Ollero, A. (2008). S+t: An algorithm for distributed multirobot task
allocation based on services for improving robot cooperation. In 2008 IEEE international
conference on robotics and automation (pp. 3163–3168).
27. Wang, H., Chen, W., & Wang, J. (2020). Coupled task scheduling for heterogeneous multi-
robot system of two robot types performing complex-schedule order fulfillment tasks. Robotics
and Autonomous Systems, 131, 103560. https://www.sciencedirect.com/science/article/pii/S0921889020304000
28. Wang, J., Gu, Y., & Li, X. (2012). Multi-robot task allocation based on ant colony algorithm.
Journal of Computers, 7.

29. Zhang, Y., & Parker, L. E. (2013). Multi-robot task scheduling. In 2013 IEEE international
conference on robotics and automation (pp. 2992–2998).
30. Zlot, R., & Stentz, A. (2006). Market-based multirobot coordination for complex tasks.
The International Journal of Robotics Research, 25(1), 73–101. https://doi.org/10.1177/0278364906061160
Machine Learning and Deep Learning
Approaches for Robotics Applications

Lina E. Alatabani , Elmustafa Sayed Ali , and Rashid A. Saeed

Abstract Robotics plays a significant part in raising the standard of living, with a variety of useful applications in several service sectors such as transportation, manufacturing, and healthcare. Continuous improvement is required to make these services effective and efficient and to have robots obey the directions supplied to them by their programs. Intensive research has focused on ways to improve these services, which has led to the use of sub-fields of artificial intelligence, represented by ML and DL, whose state-of-the-art algorithms and architectures add positive improvements to the field of robotics. Recent studies apply various ML/DL algorithms to robotic system architectures to offer solutions for different issues related to robotic autonomy and decision making. This chapter provides a thorough review of autonomous and automatic robotics along with their uses. Additionally, the chapter discusses machine learning techniques for robotics, including extreme learning machines. Finally, it discusses the issues and the future of artificial intelligence applications in robotics.

Keywords Robotics applications · Machine learning · Deep learning · Visioning applications · Assistive technology · Imitation learning · Soft robotics

L. E. Alatabani
Faculty of Telecommunications, Department of Data Communications and Network Engineering,
Future University, Khartoum, Sudan
E. S. Ali (B)
Faculty of Engineering, Department of Electrical and Electronics Engineering, Red Sea
University (RSU), Port Sudan, Sudan
e-mail: [email protected]
E. S. Ali · R. A. Saeed
Department of Electronics Engineering, Collage of Engineering, Sudan University of Science and
Technology (SUST), Khartoum, Sudan
R. A. Saeed
Department of Computer Engineering, College of Computers and Information Technology, Taif
University, P.O. Box 11099, Taif 21944, Saudi Arabia


1 Introduction

Robotics has recently emerged as one of the most significant and pervasive technologies. Artificial intelligence has played a major role in the development of advanced robots, making them more coherent and responsive [1].
Machine learning (ML) and deep learning (DL) approaches helped with the creation
of enhanced and intelligent control capabilities as well as the development of smart
solutions to a variety of problems affecting robotics applications [2]. Artificial intel-
ligence techniques have recently been used to create a variety of robots, giving them
the capacity to increase correlation, human traits, and productivity in addition to the
enhanced humanistic cognitive capacities [3]. Robots can learn using precise ML
techniques to increase their precision and understanding of spatial relations func-
tions, grab objects, control movement, and other tasks that let them comprehend
and respond to unobserved data and circumstances. Recently, the robotic process
automation and the capacity to interact with the environment have been incorporated
into the mechanisms of DL [4]. It also enables robots to perform various tasks by
understanding physical and logistical data patterns and act accordingly.
Due to the difficulty of translating and analyzing natural events, event control
is one of the hardest activities to design through code when creating robots. This
is especially true when there is a wide variety of actions that the robot performs
in reality [5]. Therefore, algorithms that can gain expert human knowledge of the
robot as structured parameters and improve control techniques are needed when
constructing robots [6]. For these reasons, ongoing modifications to the robot's programming are required, because the world around it is constantly changing and because a composite analytical model is required to create application solutions [7]. One strategy that can address these challenges in a comprehensive and unified manner is the use of machine and deep learning architectures.
The rationale for utilizing ML and DL in robotics is that they are more general, and deep networks are excellent for robots in unstructured environments since they are capable of high-level reasoning and conceptualization [8]. Given the importance of AI approaches in robotic automation for solving complex tasks, the contribution of this chapter is to provide a brief overview of autonomous and automatic robots and the differences between them. The chapter also discusses the most important robotics applications and the solutions that can be provided by AI approaches, in addition to reviewing the concept of extreme learning machine methods for robotics. Moreover, it reviews different robot learning approaches such as multi-agent, self-supervised, and imitation learning.
The remainder of this chapter is arranged as follows. Section 2 reviews the differences between autonomous robots and automatic robots. Section 3 reviews different robotics applications with respect to the AI solution approaches. Extreme learning machine methods for robotics are presented in Sect. 4. Machine learning for soft robotics and machine-learning-based robotics applications are presented in Sects. 5 and 6, respectively. Section 7 discusses the challenges and open issues in robotics applications. The chapter is concluded in Sect. 8.

2 Autonomous Versus Automatic Robots

Autonomous robots are defined by the Massachusetts Institute of Technology (MIT) as intelligent machines that can execute tasks in the world by themselves, without explicit human control [9]. Modern robots are expected to execute complex commands to perform tasks related to sensitive fields; therefore, machine learning and deep learning approaches were introduced to improve the accuracy of command execution in autonomous robots. The use of these approaches improved the decision-making capabilities of robots, adding self-learning abilities to autonomous robotic systems [10]. Using neural networks (NN) as an ML approach adds more time to the learning process, because the size of the neural network grows with the complexity of the commands to be learnt. As a result, the convolutional neural network (CNN) was introduced; CNN brought concepts for decreasing learning time through (a) tensor decompositions of the weights and activations of the convolutional layers, (b) quantization of weights, (c) thinning weights to decrease network size, (d) flexible network architectures, and (e) partitioning the training to have a large network train a small network [6, 11].
Complex control tasks in robotic systems require innovative planning strategies to let robots interact like humans when doing tasks. Some classical planning mechanisms have been used, such as Fast-Forward, which has a limitation when it comes to the span of scenarios it can handle; for example, if some objects in the environment are not known, the robot cannot execute the actions. Thus, knowledge-based planning was produced, aimed at making robots complete complex control tasks; the mechanism was successful, with a limitation in the computation of long series of actions [12]. To overcome this limitation, another approach was introduced, knowledge-based reasoning, which is used to accomplish complex control tasks with motion planning; along with a perception module, a semantic description is provided for the robot to follow [13].
Knowledge representation is accomplished through numerous approaches, such as ontologies, which are aimed at enabling the use of structured concepts and relations in the reasoning tasks used by robots. The authors in [9] define an ontology as a formal, explicit specification of a shared conceptualization. Conceptualization refers to the abstraction of the entities in a certain domain [14]. The abstractions are accomplished by determining the related concepts and relations. In robotics, some knowledge-oriented approaches use models to describe actions for the control domain, while others use terminologies and inference in a given domain [7, 10]. Although these approaches have contributed to the performance of autonomous robots, they have some limitations, represented by the lack of a generic framework: the previous approaches are task specific. Thus, the range of heterogeneous robotic collaborative tasks can be increased by other methodologies, for example the Core Ontology for Robotics and Automation (CORA), which enables adding more value to autonomous robotic systems and knowledge development [11]. The task and motion planning approach, which combines these domains of autonomous robotics, is illustrated in Fig. 1.

Fig. 1 Task and motion planning (TAMP) for robotics

Automatic robots is a concept that describes expert systems in terms of having a software robot that imitates human responses or actions. Automating human processes is accomplished through the application of robotic process automation (RPA). RPA encompasses a set of tools that operate on the computer system interface and enable a robot to act like a human [12, 15]. RPA has a variety of applications in today's industries, such as agriculture, power plants, and manufacturing. This technology targets the automation of simple, repetitive, and traditional work steps [13, 14]. Figure 2 represents the potential of process automation in relation to cognitive and routine tasks.

Fig. 2 Robotics process automation based on tasks characteristics



Fig. 3 Intelligent robotic process automation (RPA) framework with AI approach (task characteristics, AI technology capabilities, task–technology fit, automation level, and performance impact)

Artificial Intelligence (AI) gives the traditional concept of RPA more value in multiple areas, as AI has a range of capabilities that can add value to bots in two major areas: (a) capturing information and (b) understanding the captured information. In capturing information, the aims are speech recognition, image recognition, search, and data analysis/clustering. In understanding the captured information, the aims are natural language understanding (i.e. acting as a translator between humans and machines), optimization, and prediction. An intelligent framework to be used with RPA includes classifying tasks according to their characteristics and fitting them with the AI capabilities in order to select the task that is most suitable to be automated [15]. The potential framework is illustrated in Fig. 3.

3 Robotics Applications

Robotics has made a significant contribution to the advancement of both current technology and the next generation of technology by incorporating features that increase the efficiency of numerous fields and by creatively integrating machine learning (ML) into a variety of applications, some of which are discussed in the following subsections.

3.1 Computer Vision

Complex digital picture challenges like image segmentation, detection, colorization, and classification frequently involve machine learning. Convolutional Neural Networks (CNN), a DL method that incorporates ML techniques, have improved prediction accuracy using a variety of resources. A branch of machine learning known as “Deep Learning” was developed in accordance with Artificial Neural Networks (ANN), which simulate how the human brain works [16]. The comparison between traditional and deep learning computer vision is shown in Fig. 4. In CNN, multiple layers are used to generate the network. The data passes through several pre-processing steps, such as mean subtraction and normalization, and is then processed by the CNN network [17].

Fig. 4 Traditional computer vision (manual feature engineering followed by a classifier with a shallow structure) versus DL workflows (end-to-end feature learning and classification)



A. Convolutional layer
The network's convolutional layer is made up of several sets of filters (kernels) that take a certain input and produce a feature map as output. Filters are represented as a multi-dimensional grid of discrete numbers. The numbers represent the weights of the filter, which are learnt during the training phase of the network. CNN uses a sub-sampling feature to reduce the dimensions, resulting in a smaller feature map as output. The output feature allows mild invariance to the scale and pose of objects, which is useful for applications such as object recognition in image processing.
Zero-padding is introduced for denoising, super-resolution, or segmentation, when an image's spatial size needs to be kept constant or larger after convolution, because these operations require more pixel-intensive predictions. It also gives more room to design deeper networks. In order to increase the size of the output feature map when processing with multi-dimensional filters, zero-padding places zeros around the border of the input feature map [18].
B. Pooling layers
This layer defines regions or blocks of the input feature map and then aggregates the feature activations within each region. The aggregation is represented by a pooling function, such as the max or average function. For the pooling layer there is a need to specify the size of the pooled region; if we consider a pooled region of size f × f, the size of the output feature map is calculated as

h' = ⌊(h − f + s)/s⌋,  w' = ⌊(w − f + s)/s⌋   (1)

where h is the height, w is the width, and s is the stride. In order to extract a compressed feature representation, the pooling operation efficiently down-sizes the input feature map [19].
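As a quick sanity check of Eq. (1), a small Python helper (illustrative only) computes the pooled feature-map size:

```python
def pooled_size(h: int, w: int, f: int, s: int) -> tuple:
    # Output height/width of a pooling layer with window f and stride s, Eq. (1).
    return (h - f + s) // s, (w - f + s) // s

# e.g. a 224 x 224 feature map with 2 x 2 pooling and stride 2 -> (112, 112)
print(pooled_size(224, 224, f=2, s=2))
```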
C. Fully Connected Layers
This layer is usually placed at the end of the network, although some studies have shown that it is also efficient to place it in the middle of the network. It corresponds to convolutional layers with filters of size 1 × 1, where each unit is densely connected to all units in the previous layer. The layer output is calculated by performing a straightforward matrix multiplication, adding a bias vector, and then applying an element-wise nonlinear function,

y = f(W^T x + b)   (2)

where x and y denote the input and output parameters, b represents the bias parameter, and W is the matrix of connection weights between the units [20].
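A minimal NumPy sketch of Eq. (2), with tanh assumed as an example of the nonlinear function f:

```python
import numpy as np

def fully_connected(x: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    # Eq. (2): y = f(W^T x + b); W has shape (n_in, n_out), x has shape (n_in,).
    # tanh is used only for illustration of the element-wise nonlinearity f.
    return np.tanh(W.T @ x + b)
```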
D. Region of Interest (ROI)
Used in object detection, this layer is an important element of CNN. By creating a bounding box and labeling each object with a specific object class, this method

makes it possible to precisely pinpoint every object in a picture. Objects may be


located in any area of an image with a variety of different attributes. For the aim
of determining the approximate location of an object, the ROI pooling layer simply
employs the input feature map of an image and the coordinates of each region as
its inputs. However, an issue with varied spatial sizes results in each ROI having
variable dimensions. Because the CNN only operates on a fixed dimensional input,
the ROI pooling layer modifies these variable sized features to a predetermined sized
output feature maps. ROI has demonstrated that using a single set of input feature
maps to create a feature representation for each region improves the performance of
a deep network [21].

3.2 Learning Through Imitation

The idea of a robot was first proposed in the early 1960s, when a mobile industrial
robot with two fingers was designed to transport goods and execute specified duties.
Therefore, much study was done to enhance the gripping and directional processes.
Learning through presentation, also known as imitation learning, is a theory that
has been demonstrated to exist in the performance of difficult maneuvering tasks
that recognize and copy human motion without the use of sophisticated behavior
algorithms [22, 27].
Deep Reinforcement Learning (DRL) is extremely valuable in the imitation
learning field because it has the ability to create policies on its own, which is not
possible with traditional imitation learning techniques that require prior knowledge of
the learning system’s full model. Robots can immediately learn actions from images
thanks to DRL, which combines decision-making and insight abilities. The Markov
Decision Process (MDP), which is the basis of reinforcement learning, produces the
anticipated summation of rewards as the output of the action state function.
Q_π(s, a) = E_π[ Σ_{t=0}^{T} γ^t r_t | s_t = s, a_t = a ]   (3)

where Q_π(s, a) represents the action-state value, E_π is the expected outcome under the motion strategy π, r_t represents the reward value, and γ denotes the discount factor [23, 24].
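For concreteness, a short Python sketch of the discounted return whose expectation defines Q_π(s, a) in Eq. (3); the discount value is an arbitrary example:

```python
def discounted_return(rewards, gamma=0.99):
    # Sum of gamma^t * r_t over a trajectory, the quantity inside the
    # expectation of Eq. (3).
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total

print(discounted_return([1.0, 0.0, 0.5]))  # 1.0 + 0.99**2 * 0.5
```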
Robots learn motions and maneuvers by watching the expert’s demonstration
through the process known as imitation learning, which is the concept of exactly
mimicking the instructor’s behavior or action. In order to optimize the learning
process, the robot also learns to correlate the observed motion with the performance
[27, 28]. Instead of having to learn the entire process from scratch, training data
can be achieved by learning from available motion samples. This has a significant
positive impact on increasing learning efficiency. Combining many reinforcement

Fig. 5 Imitation learning classification (behavior cloning via supervised learning, inverse reinforcement learning via a learned reward, and generative adversarial imitation learning with a generator and discriminator)

learning approaches can increase the speed and accuracy of imitation learning [25].
The three major types into which imitation learning is divided are presented in Fig. 5.
In behavior cloning, once the policy is acquired, learning aims to make the distribution of state-action trajectories generated by the agent match that of the demonstrated trajectories. A robotic arm or other device would typically only be able to repeat a specific movement after receiving manual instruction or a teaching procedure, and it cannot become accustomed to an unfamiliar environment change. The availability of data-driven machine learning techniques allows the robot to recognize superior maneuvering primitives and adjust to environmental changes.
In inverse reinforcement learning a reward function is introduced to test if the
action is performed as it should be. This approach outperforms the traditional
behavior cloning for its adaptation to different environments qualities, it is described
as an efficient approach of imitation learning. Generative adversarial imitation
learning concept is satisfied by ensuring that the generated strategy is aligned with
the expert strategy [26]. The imitation learning framework contains trajectory and
force learnings. In a trajectory training approach, an existing trajectory profile is
taken for task as an input, then create the nominal trajectory of the next task. The
force learning part of the framework uses Reinforcement Learning agent, and an
equivalent controller is to learn both the position and the parameters commands
of the controller [27]. Figure 6 illustrates an imitation learning framework where
the update of dynamic movement primitives (DMPs) are updated using a modular
learning strategy.

Fig. 6 Adaptive robotic imitation framework (a trajectory-learning module that updates DMPs from the trajectory profile and skill policy, and a force-learning module in which an RL agent provides force feedback to the controller)

3.3 Self-supervised Learning

Self-supervised learning has been used to improve the functionality of several robotic
application characteristics, such as improving robot navigation, movement, and
vision. Since they rely on positional information for efficient movement and task
fulfillment, almost all robots navigate by evaluating input from sensors. Robots use
motion-capture and GPS systems, as well as other external sources, to determine
their positions. They can also use on-board sensors that are currently in vogue, such
as 3D LiDARs that record varied distances, to determine their positions [29]. Self-
supervised learning, in which the robot does not require supervision and the target
does not require labeling, is a successful learning technique. Self-supervised learning
is therefore most suitable for usage when the data being investigated is unlabeled [30].
A variety of machine learning approaches has been used in the visual localiza-
tion area to enhance the effectiveness of the visual based manipulation. Researchers
have established that self-supervised learning techniques for feature extraction based
on image production and translation enhanced the performance of robotic systems.
Feature extraction using Domain-Invariant Super Point (DISP) is satisfied through
two major tasks: key point detection and description. A function detects key points on
a certain image via map calculation, the aim of this process is to compare and match

key points found in other images using similarity matrix, the output is a fixed-length
vector to describe each pixel of the image.
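A minimal sketch of the descriptor-matching step described above, using a cosine similarity matrix over fixed-length descriptors; the function name and the choice of cosine similarity are assumptions made for illustration:

```python
import numpy as np

def match_keypoints(desc_a: np.ndarray, desc_b: np.ndarray) -> np.ndarray:
    # Rows are fixed-length descriptors for key points in images A and B.
    a = desc_a / np.linalg.norm(desc_a, axis=1, keepdims=True)
    b = desc_b / np.linalg.norm(desc_b, axis=1, keepdims=True)
    similarity = a @ b.T               # similarity matrix between the two key point sets
    return similarity.argmax(axis=1)   # best match in B for each key point in A
```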
With the goal of obtaining a domain-invariant feature extraction function, image domain adaptation and self-supervised training are combined. The image is parsed into sub-domains, and the network is trained from a specified function to locate associated key points. Instead of using domain translation for the feature detector and descriptor, optimization is employed to reduce the matching loss, which aids in learning a better feature extraction function. This enables the network to filter essential points and match them under varying circumstances. When applied within deep convolutional layers, this cross-domain idea expands the scope of the learning process, using scenes and objects as opposed to a single visual space. When particular conditions are being examined, feature extraction has stronger abilities if image-to-image translation is used [31].

3.4 Assistive and Medical Technologies

According to its definition, assistive technology is a concept of applications designed to help the elderly and others with long-term disabilities overcome their lack of capability or decline. The demand for assistive technology has increased quickly, particularly in a post-COVID-19 world, which has necessitated the need to improve
particularly in a port covid-19 world, which has necessitated the need to improve
performance. The recent advancement in machine learning methods has made it
easier for the development and improvement of autonomous robots which gave them
the abilities of adapting, responding, and interacting to the surrounding environment.
This advancement led to the enhancement of human–machine collaboration when
it comes to building companion robots, exoskeletons, and autonomous vehicles.
Companion robots are designed to do certain duties that improve the patient’s quality
of life by keeping an eye on their mental and physical well-being, communicating
with them, and providing them with amusement [32].
Utilizing care robots primarily focuses on human–robot interaction, which neces-
sitates careful consideration of the user’s history. These users, who typically work
in the medical profession, lack the technical know-how to interact with robots. The
development of assistive technology places a significant emphasis on the nursing
staff and patients’ families, due to the fact that they are regarded as secondary users
of the system [33].
Incorporating Part Affinity Field (PAF) for human body prediction after thor-
oughly analyzing human photos has a good effect on improving the performance of
assistive technology when using convolutional neural network as a sub-system of
deep learning methodologies. High efficiency and accuracy are two features of the
PAF approach, which can successfully identify a human body’s 2D posture using
picture analysis [34].

3.5 Multi-agent Learning

The use of learning-based methods for multi-robot planning has yielded promising
results due to their abilities to manage a multi-dimensional environment with state-
space representation. Difficult robotics problems, such as teaching numerous robots or multiple agents to perform a task simultaneously, have been addressed by applying rein-
forcement learning [35, 36]. By moving the compute portion to the cloud, multi-agent
systems must create strategies to overcome problems like energy consumption and
computational complexity. The performance of multi-agent systems has substantially
improved as a result of the integration of multi-robot systems with edge and cloud
computing, adding value and enhancing user experience.
Depending on the application being used, especially when deploying robots in
the medical industry, robotic applications have increased the demand for speedier
processing. Resource allocation made it simple to handle this issue by utilizing
multi-agent resource allocation. To meet the Quality-of-Service standards, which
vary from one robot to another according on its application, resource allocation
refers to allocating a resource to a job based on its availability. Robotics applications
required a variety of latency-sensitive, data-intensive, and computational operations.
For these tasks, a variety of resources are needed, including computing, network,
and storage resources [37].
The Markov Decision Making Process is applied to a Markov model in Multi-
Agent Reinforcement Learning (MARL), where a group of agents N are taken into
account together with S State Space, A Joint Action Space, and a Reward Function
R. Each time-step, the Reward function computes N rewards, one for each actor.
Considering T as a transitional function that takes into account the likelihood that a
state will be reached after taking a combined action of a. Every time-step, an obser-
vation function O is sampled from each agent and placed under the observation space
Z of each agent. Multi-Agent systems can be either heterogeneous or homogenous,
i.e. having either distinct or share the same action space respectively. MARL systems
configurations vary depending on their reward functions which can be either coop-
erative or competitive, in addition to their learning setups which directly impact the
type of policies learnt [38].
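To illustrate the MARL setting described above (N agents, a joint action, and per-agent observations and rewards at every time-step), a minimal, hypothetical environment interface is sketched below; the class and method names are assumptions, not an existing API.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass
class StepResult:
    observations: Dict[str, object]  # one observation per agent, sampled from O
    rewards: Dict[str, float]        # one reward per agent, from the reward function R
    done: bool

class MultiAgentEnv:
    """Skeleton of a multi-agent environment following the Markov game model."""

    def reset(self) -> Dict[str, object]:
        raise NotImplementedError

    def step(self, joint_action: Dict[str, int]) -> StepResult:
        # Apply the joint action a, sample the next state from the transition
        # function T, then return per-agent observations and rewards.
        raise NotImplementedError
```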

4 Extreme Learning Machines Methods for Robotics

ELM was produced to overcome the restrictions of gradient-based learning algorithms in classification tasks; ELM convergence is faster because there are no iterations in the learning process. Due to its superior learning speed and generalization capability, ELM has been deployed in a wide range of learning problems such as clustering, classification, regression, and feature mapping. ELM has continued to evolve to further improve its performance by elevating the classification accuracy, lessening the number of manual interventions, and reducing the training time.
ELM uses two sets of random parameters and freezes them during the training process. The random parameters are kept in the ELM hidden layer. To make the training process more efficient, the input vector is mapped into a random feature space with random configurations and nonlinear activation functions. In the linear parameter solution step, β_i is acquired by the Moore–Penrose inverse, as it is a linear problem Hβ = T [39, 41].
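A minimal NumPy sketch of the basic ELM training procedure just described (random, frozen hidden-layer parameters and a pseudo-inverse solve for the output weights); the names and the tanh activation are illustrative choices:

```python
import numpy as np

def elm_train(X, T, n_hidden, seed=0):
    # X: (samples x features) inputs, T: (samples x outputs) targets.
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))  # random input weights (frozen)
    b = rng.standard_normal(n_hidden)                # random biases (frozen)
    H = np.tanh(X @ W + b)                           # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ T                     # Moore-Penrose solve of H beta = T
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta
```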
The Extreme Learning Machine and a Colour Feature variant were combined to create a single-hidden-layer feed-forward neural network, the CF-ELM. The CF-ELM includes a fully connected output layer and a hidden layer organised into three sections. During the training phase, the product of the random weight values and the inputs is recorded in the hidden-layer output matrix H. The activation function g(x) is used to process the input signal and convert it to an output; the CF-ELM is processed using the soft-sign activation function g(x), which is represented by the following function.
g(x) = x / (1 + |x|)   (4)

The H matrix can be seen as follows:

H(W_1, …, W_Ñ, b_1, …, b_Ñ, Y_1, …, Y_N, U_1, …, U_N, V_1, …, V_N) =

⎡ g(W_1·Y_1 + b_1)  ⋯  g(W_Ñ·Y_1 + b_Ñ) ⎤
⎢        ⋮                    ⋮          ⎥
⎢ g(W_1·Y_N + b_1)  ⋯  g(W_Ñ·Y_N + b_Ñ) ⎥
⎢ g(W_1·U_1 + b_1)  ⋯  g(W_Ñ·U_1 + b_Ñ) ⎥
⎢        ⋮                    ⋮          ⎥
⎢ g(W_1·U_N + b_1)  ⋯  g(W_Ñ·U_N + b_Ñ) ⎥
⎢ g(W_1·V_1 + b_1)  ⋯  g(W_Ñ·V_1 + b_Ñ) ⎥
⎢        ⋮                    ⋮          ⎥
⎣ g(W_1·V_N + b_1)  ⋯  g(W_Ñ·V_N + b_Ñ) ⎦  (3N × Ñ)   (5)

where N represents the number of training samples, Ñ denotes the number of neurons in the hidden layer, W represents the input weights, and b represents the biases. Y, U and V are the colour input samples for each pixel; the difference between CF-ELM and ELM is that ELM uses greyscale images. The output coming from the hidden layer is used as the input multiplier for the output weights β, and T represents the target output:

T = Hβ   (6)

The β can be expressed by,


β = [β_1ᵀ; β_2ᵀ; … ; β_Ñᵀ]  (Ñ × m)   (7)

where m is the total number of neurons in the output layer (the number of classes). The following equation can be used to represent the target output matrix T:
T = [t_1ᵀ; t_2ᵀ; … ; t_Nᵀ]  (N × m)   (8)

Each target vector t is typically a vector of ones and zeros determined by the class of the corresponding input training sample. The value of β can be obtained by making it the subject:

β = H −1 .T (9)

where H⁻¹ represents the Moore–Penrose pseudo-inverse of the matrix H.
The classification function is formed by the feed-forward dot product between the input picture and the weights in the hidden layer, followed by the dot product with the output-layer weights. The hidden layer needs to be broken up into three sub-layers in order to reflect the colour properties; therefore, the number of neurons in the hidden layer must be divisible by 3. To get the output y_i, the m classes are processed using the classifier function.

   


y_i = Σ_{j=1}^{Ñ/3} [ β_ij g(W_j·Y + b_j) + β_i(j+Ñ/3) g(W_{j+Ñ/3}·U + b_{j+Ñ/3}) + β_i(j+2Ñ/3) g(W_{j+2Ñ/3}·V + b_{j+2Ñ/3}) ]   (10)

where W_j represents the weight vector for the jth neuron [40].
Another joint approach, based on convolutional neural networks and ELM, has been proposed to harness the strength of CNN while avoiding the gradient calculations that are used for updating network weights. Convolutional Extreme Learning Machines (CELM) are fast-training CNNs in which filters are defined effectively and used for feature extraction [41].
A Level-Based Learning Swarm Optimizer (LLSO) based ELM approach has been introduced to address the limitation of ordinary ELM, namely its reduced generalization performance. The concept is to use LLSO to search for the optimal parameters of the ELM; the optimization problem in ELM is large-scale because of the fully connected input and hidden layers. The essential considerations in the optimization of ELM parameters are fitness and particle encoding. The input layer's weight vector and the hidden neurons' bias vector make up the LLSO-ELM particles. The following equation can be used to numerically represent particle P:

P = [w_11, w_12, …, w_1L, …, w_n1, …, w_nL, b_1, …, b_L]   (11)

where n is the dimension of the input dataset, and L is the number of hidden neurons.
The following equations represent the particle’s length, Len of Particle:

Len of Particle = (n + 1) × L   (12)

Fitness is used to assess the quality of the particles, with smaller values indicating
better classification. As a result, the equation below can be used to determine fitness
value.

Fitness = 1 − Acc (13)

where Acc is the ratio of the number of correctly classified samples to the total number of samples, obtained by the ELM algorithm after the LLSO has searched for and selected the optimal parameters [42].
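A small sketch of the particle encoding and fitness evaluation of Eqs. (11)–(13); the function names are illustrative only:

```python
import numpy as np

def encode_particle(W: np.ndarray, b: np.ndarray) -> np.ndarray:
    # Flatten the (n x L) input weights and the L biases into one particle
    # of length (n + 1) * L, as in Eqs. (11)-(12).
    return np.concatenate([W.ravel(), b])

def fitness(accuracy: float) -> float:
    # Eq. (13): smaller fitness means better classification.
    return 1.0 - accuracy
```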

5 Machine Learning for Soft Robotics

The general concept of robotics has been known for being precise and rigid. However,
in the latest years an improved technology was introduced with modern concepts such
as flexibility and adaptability. Soft robotics presented abilities that were not present
in the stiff formation of robotics [43]. Large and multidimensional data sets can be
used by ML to extract features. When coupled with the application of soft sensors to
robots, soft robotics’ performance has increased significantly. By using soft sensors,
robotics sensitivity and adaptability may be improved. Due to the non-linearity and
high hysteresis of soft materials, soft sensors are however constrained in terms of
their capabilities and calibration difficulty.
Learning approaches can overcome the limitations of implementing soft sensors:
ML enables accurate calibration and characterization by accounting for their
nonlinearity and hysteresis [44]. Robotics tasks can be divided into two categories,
perception and control. Perception tasks aim at collecting information about the
environment via sensors in order to extract the target properties, while under a control
policy the agent interacts with the environment for the purpose of learning a
certain behavior depending on the received reward. However, control in soft robotics
is much more complicated and needs more effort for the following reasons:

• Data distribution: in perception, observations are identically distributed, while in
control the data are collected in one place and stored online.
• Supervision signal: perception tasks are completely supervised, while in control only
selected rewards are available.
• Data collection: data are collected offline for perception and online for control.
Control tasks necessitate a big amount of training data, expensive interaction
between the soft robot and its environment, a sizable action space, and a robot whose
structure is constantly changing due to the robot’s constant bending and twisting.
These challenges have drawn attention to the development of models that can easily
adapt and evolve to learn from previous experiences and can also handle large
datasets.
DL techniques are known to perform well in control-type applications;
DL has the ability to learn from previous experiences, treating them as a factor in the
learning algorithm, and to process large datasets. DL algorithms have excelled over other
approaches with their accuracy and precision. The continuous growth and evolution
of soft robotics and its applications has raised the need for smarter control
systems that can perform tasks involving objects of different shapes, adapt to different
environments, and perform a considerable number of tasks using soft robots [45].
Due to the wide action and state space that soft robot applications represent, policy
optimizations pose a significant difficulty. By combining reinforcement learning and
neural networks, soft robotics applications performed better overall. A wide variety
of Deep Reinforcement Learning (DRL) techniques have been introduced to address
control-related issues.
A. Deep Q-Networks (DQN)
DQN is a deep neural network based on a CNN that aims to approximate the Q-function
Q*(s, a) through the weights of the network, where the error and target can be
mathematically represented by the following equations.

δ_t^{DQN} = y_t^{DQN} − Q(s_t, a_t; θ_t^Q)                  (14)

And,

y_t^{DQN} = R_{t+1} + γ max_a Q(s_{t+1}, a; θ^−)                  (15)

The weights are iteratively updated by,


θ_{t+1} ← θ_t − α ∂[(δ_t^{DQN})^2] / ∂θ_t^Q                  (16)

Two learning strategies are used by DQN. The first is the target network, which uses
the same framework as the Q-network; the weights of the Q-network are updated and
iteratively copied to the weights of the target network. The second is experience
replay, in which the network's inputs are gathered as state-action pairs together with
their rewards, saved in a replay memory, and later retrieved as samples that serve as
mini-batches for learning. Gradient descent is used in the remaining learning steps to
lessen the loss between the learnt Q-network and the target Q-network.
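The sketch below ties Eqs. (14)-(16) to the two strategies just described, using a simple tabular Q-function so that it stays self-contained; the replay sampling, the learning rate, and the discount factor are illustrative assumptions.

import random
import numpy as np

class TinyDQN:
    def __init__(self, n_states, n_actions, gamma=0.99, alpha=0.05):
        self.q = np.zeros((n_states, n_actions))      # Q-network parameters
        self.q_target = self.q.copy()                  # target-network parameters
        self.replay = []                                # experience replay memory
        self.gamma, self.alpha = gamma, alpha

    def store(self, s, a, r, s_next):
        self.replay.append((s, a, r, s_next))          # state-action pair with its reward

    def update(self, batch_size=32):
        for s, a, r, s_next in random.sample(self.replay, min(batch_size, len(self.replay))):
            y = r + self.gamma * np.max(self.q_target[s_next])   # target of Eq. (15)
            delta = y - self.q[s, a]                              # TD error of Eq. (14)
            self.q[s, a] += self.alpha * delta                    # descent step as in Eq. (16)

    def sync_target(self):
        self.q_target = self.q.copy()                  # periodic copy to the target network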
B. Deep Deterministic Policy Gradients (DDPG)
DDPG is a combination of actor-critic methods aimed at modeling problems
with a high-dimensional action space. DDPG is represented mathematically by the
stochastic and deterministic policies in the following Eqs. (17) and (18).

Q^π(s_t, a_t) = E_{R_{t+1}, s_{t+1} ∼ E} [ R_{t+1} + γ E_{a_{t+1} ∼ π} [ Q^π(s_{t+1}, a_{t+1}) ] ]                  (17)

And,
  
Q^μ(s_t, a_t) = E_{R_{t+1}, s_{t+1} ∼ E} [ R_{t+1} + γ Q^μ(s_{t+1}, μ(s_{t+1})) ]                  (18)

This method is one of the first algorithms in the field of DRL applied to soft robots.
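A minimal sketch of the critic target implied by Eq. (18) is given below; the target actor and target critic are passed in as callables, and the toy linear stand-ins, the terminal mask, and the discount value are assumptions made only for illustration.

import numpy as np

def ddpg_critic_targets(rewards, next_states, dones, target_actor, target_critic, gamma=0.99):
    # y_t = R_{t+1} + gamma * Q'(s_{t+1}, mu'(s_{t+1})), zeroed at terminal transitions.
    next_actions = target_actor(next_states)
    next_q = target_critic(next_states, next_actions)
    return rewards + gamma * (1.0 - dones) * next_q

# Toy usage with linear stand-ins for the target networks:
mu_target = lambda s: 0.1 * s
q_target = lambda s, a: (s * a).sum(axis=1)
print(ddpg_critic_targets(np.array([1.0, 0.0]),
                          np.array([[0.5, 0.2], [0.1, 0.3]]),
                          np.array([0.0, 1.0]),
                          mu_target, q_target))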
C. Normalized Advantage Function (NAF)
With the aid of deep learning, Q-learning is made possible in continuous and high-dimensional
action spaces. This technique differs from standard DQN in that it outputs
V, L, and μ in the neural network's final layer. The advantage required for the learning
strategy is predicted from μ and L. To lessen correlations in the observation data gathered
over time, NAF uses the target network and experience replay.
 
A(s, a; θ^μ, θ^L) = −(1/2) (a − μ(s; θ^μ))^T P(s; θ^L) (a − μ(s; θ^μ))                  (19)

where:

P(s; θ^L) = L(s; θ^L) L(s; θ^L)^T                  (20)
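The following small example evaluates Eqs. (19)-(20) numerically: a lower-triangular L is turned into the positive semi-definite matrix P and the quadratic advantage is computed; the concrete numbers are illustrative only.

import numpy as np

def naf_advantage(a, mu, L):
    P = L @ L.T                                   # Eq. (20): P is positive semi-definite
    diff = a - mu
    return -0.5 * diff @ P @ diff                 # Eq. (19): maximal (zero) at a = mu

mu = np.array([0.2, -0.1])                        # greedy action mu(s; theta_mu)
L = np.tril(np.array([[1.0, 0.0], [0.3, 0.8]]))   # lower-triangular network output
print(naf_advantage(np.array([0.2, -0.1]), mu, L))   # 0.0 at the greedy action
print(naf_advantage(np.array([1.0, 0.5]), mu, L))    # negative elsewhere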

D. Asynchronous Advantage Actor Critic

The algorithm collects observations through various actor-learners, and the network
weights are updated from the gradients each actor-learner stores with its corresponding
observations. The algorithm uses a score function in the form of an advantage function,
which relies on the policy representation π(a|s; θ^π) and the value estimation V(s; θ^V).
The observations made by the actor-learners are what lead to the creation of the advantage
function. In iterative phases of up to T steps, each actor-learner gathers observations about
its local environment and accumulates gradients from samples in the rollouts.
The following equation mathematically represents the advantage function:
A(s_t, a_t; θ^π, θ^V) = Σ_{k=t}^{T−1} γ^{k−t} r_k + γ^{T−t} V(s_T; θ^V) − V(s_t; θ^V)                  (21)

where θ^π and θ^V are the network parameters, s is the state, a is the action, r_k is the reward at step k, and t is the
learning step.
Instead of updating the parameters sequentially, they are updated simultaneously,
negating the need for stabilizing strategies such as memory replay. The mechanism
uses actor-learners that inspect a broader view of the environment to
aid in learning the best course of action.
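For illustration, the sketch below computes the n-step advantage of Eq. (21) for one rollout; the reward sequence, the value estimates, and the discount factor are arbitrary example values.

import numpy as np

def a3c_advantages(rewards, values, bootstrap_value, gamma=0.99):
    # rewards[k] = r_k for k = t..T-1, values[k] = V(s_k), bootstrap_value = V(s_T).
    T = len(rewards)
    adv = np.zeros(T)
    for t in range(T):
        n_step_return = sum(gamma ** (k - t) * rewards[k] for k in range(t, T))
        n_step_return += gamma ** (T - t) * bootstrap_value
        adv[t] = n_step_return - values[t]        # Eq. (21)
    return adv

print(a3c_advantages(np.array([0.0, 0.0, 1.0]), np.array([0.2, 0.3, 0.5]), 0.0))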
E. Trust Region Policy Optimization (TRPO)
This algorithm was introduced to overcome limitations that may occur with other
algorithms when optimizing large nonlinear policies, thereby enhancing accuracy.
A cost function is used in place of the reward function to achieve this. Utilizing conjugate
gradient followed by a line search to solve the optimization problem has been shown to
improve performance [46]. The following equation illustrates it:
η(π) = E_π [ Σ_{t=0}^{∞} γ^t c(s_t) | s_0 ∼ ρ_0 ]                  (22)

With the same concept, the state-value function is replaced by the advantage function, represented by Eq. (23):

A^π(s, a) = Q^π(s, a) − V^π(s)                  (23)

The result of the optimization function would be an updating rule for the policy
given by Eq. (24),
 

 
η(π) = η(π_old) + E_π [ Σ_{t=0}^{∞} γ^t A^{π_old}(s_t, a_t) | s_0 ∼ ρ_0 ]                  (24)
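A minimal sketch of estimating the expected-improvement term of Eq. (24) from a sampled rollout is shown below; the advantage values are arbitrary, and the trust-region step itself (conjugate gradient plus line search over a KL constraint [46]) is not reproduced here.

import numpy as np

def surrogate_improvement(advantages, gamma=0.99):
    # Monte-Carlo estimate of E_pi[ sum_t gamma^t A_pi_old(s_t, a_t) ] in Eq. (24).
    discounts = gamma ** np.arange(len(advantages))
    return np.sum(discounts * advantages)

rollout_advantages = np.array([0.5, -0.1, 0.3, 0.0])   # A_pi_old(s_t, a_t), cf. Eq. (23)
print(surrogate_improvement(rollout_advantages))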

6 ML-Based Robotics Applications

By incorporating improved sensorimotor capabilities that can give a robot the ability
to adapt to a changing environment, robotics technology is a field that is quickly
developing. This is made possible by the integration of AI into robotics, which
enables optimizing the level of autonomy through learning using ML techniques.
The usefulness of adding intelligence to machines is determined by their capacity
to predict the future through planning how to finish a job and interacting with the
environment through successfully manipulating or navigating [47, 48].

6.1 Robotics Recommendation Systems Using ML

The development of modern life has made it clear that better decision-making
processes are required when it comes to addressing e-services to enhance client
decision making. These systems utilize personalized e-services with artificial
intelligence-based approaches and procedures to identify user profiles and prefer-
ences [49]. With the application of multiple ML methods, recommendation quality
was raised and user experience increased. Recommendation systems are primarily
designed to help people who lack the knowledge or skills to deal with a multitude
of alternatives through systems that estimate use preferences as a result of reviewing
data from multiple sources. Knowledge-based recommender systems, collaborative
filtering-based recommender systems, and content-based recommender systems are
the three types of recommender systems that exist [50].
Making recommendations for products that are comparable to those that have
previously caught the user’s attention is the aim of content-based recommender
systems. Item attributes are collected from documents or pictures using retrieval
techniques like the vector space model [51]. As a result, information about the user’s
preferences, or what they like and dislike, is included in their user profile. Collabora-
tive filtering (CF), the most popular technique in recommender systems, is predicated
on the idea that consumers who are similar to one another will typically consume
comparable products. A system based on user preferences, on the other hand, will
function using information about users with comparable interests. Memory-based and
model-based CF techniques are the two categories; memory-based is the more tradi-
tional type and uses heuristic algorithms to identify similarities between people and
things. The fundamental method utilized in memory-based CF is closest neighbor,
which is simple, efficient, and precise. As a result, user-based CF and item-based CF
are two subcategories of memory-based CF.
Model-based CF was initially offered as a remedy for the shortcomings in the
prior methodology, but its use has since spread to suit additional applications.
Model-based CF uses machine learning and data mining techniques to forecast user
behavior. Knowledge-based recommender systems are built on the user knowledge
already in existence and are based on user knowledge gleaned from past behavior.
A popular method known as “case-based” is used by knowledge-based systems to
use previous difficulties to address present challenges. Knowledge-based applica-
tion fields include, but are not limited to, real estate sales, financial services, and
health decision assistance. Each of these application areas deals with a different
issue and necessitates a thorough knowledge of that issue. Figure 7 displays robotics
applications for machine learning systems.
Figure 8 illustrates how the implementation of AI methods has improved the
performance of many techniques in numerous fields, including knowledge engi-
neering, reasoning, planning, communication, perception, and motion [52]. Simply
defined, recommender systems use historical behavior to predict client needs [53].

Fig. 7 Recommended machine learning system in robotics

The Root Mean Square Error (RMSE), which is widely used to evaluate prediction
accuracy, is the most basic performance evaluation technique among the quantitative
evaluation metrics. This evaluation is based on the mean squared error (MSE), which
is calculated by dividing the sum of the squared differences between the actual
score and the predicted score by the total number of predicted scores. Additional
qualitative evaluations that are covered by a confusion matrix and used to calcu-
late the value of the qualitative evaluation index include precision, re-call, accuracy,
F-measure, ROC curve, and Area under curve (AUC). By identifying whether or
not the user’s preference is based on a recommender system, this matrix enables the
evaluation of a recommender system. Each row in Table 1 represents a user-preferred
item, and each column shows whether a recommender model has suggested a related
item.
Here, TP represents the number of recommended items that fit the user's preferences.
FN is the number of user favorites that the recommendation system does not suggest.
FP represents the frequency at which the system suggests products that users dislike.
Finally, TN stands for the number of items that the user dislikes and the system does
not recommend.
Table 2 also includes other qualitative measures: accuracy, which is the proportion
of recommendations that are successful; precision, which is the proportion of recommended
items that exactly match the user's preference; recall, which is the proportion of the
user's actual preferences that the recommender system manages to recommend; the
F-measure, which is the harmonic average of precision and recall; and the ROC curve,
which is a graph showing the relationship between the True Positive Rate (TPR) and
the False Positive Rate (FPR) [54].
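As a worked illustration of these metrics, the snippet below computes them directly from the four confusion-matrix counts of Table 1; the counts themselves are arbitrary example numbers.

def recommender_metrics(tp, tn, fp, fn):
    precision = tp / (tp + fp)                    # recommended items the user actually likes
    recall = tp / (tp + fn)                       # liked items that were actually recommended
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    tpr = tp / (tp + fn)                          # true positive rate (ROC y-axis)
    fpr = fp / (fp + tn)                          # false positive rate (ROC x-axis)
    return precision, recall, accuracy, f_measure, tpr, fpr

print(recommender_metrics(tp=40, tn=30, fp=10, fn=20))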

Fig. 8 Artificial intelligence techniques in robotics applications: AI areas (knowledge engineering, reasoning, planning, communication, perception) mapped to techniques (transfer learning, active learning, deep neural networks, fuzzy technologies, evolutionary algorithms, reinforcement learning, natural language processing, computer vision)

Table 1 Confusion matrix in the qualitative evaluation

Priority                  Recommended            Not recommended
User's favorite item      True Positive (TP)     False Negative (FN)
User unfavorable item     False Positive (FP)    True Negative (TN)

6.2 Nano-Health Applications Based on Machine Learning

With the development of robots and ML applications toward smart life, nanotech-
nology is advancing quickly. Nanotechnology is still in its infancy, though, and this
has researchers interested in expanding this subject. The term “Nano” describes the

Table 2 Qualitative evaluation using confusion matrix components

Evaluation metric    Equation
Precision            TP / (TP + FP)
Recall               TP / (TP + FN)
Accuracy             (TP + TN) / (TP + FN + FP + TN)
F-measure            2 × (Precision × Recall) / (Precision + Recall)
ROC curve            Plot of TP rate = TP / (TP + FN) against FP rate = FP / (FP + TN)
AUC                  Area under the ROC curve

creation of objects with a diameter smaller than a human hair [55]. Reference [56]
claims that nanotechnology entails the process of creating, constructing, and manu-
facturing materials at the Nano scale. Robots that are integrated on the Nano scale are
known as Nano robots [57]. According to the authors, Nano robots are objects that
can sense, operate, and transmit signals, process information, exhibit intelligence, or
exhibit swarm behavior at the Nano scale.
They are made up of several parts that work together to carry out particular
functions; the parts are constructed at the Nano scale, where sizes range from 1 to
100 nm. In the medical industry, Nano robots are frequently utilized in procedures
including surgery, dentistry, sensing and imaging, drug delivery, and gene therapy.
Figure 9 shows an application of Nano robotics [58].
ML will improve the performance of Nano health applications by adding value to
medical image processing through picture recognition, grouping, and classification.
By incorporating machine learning (ML)
into biological analysis using microscopic images, disease identification accuracy
can be increased. In order to better understand the influence nanoparticles have
on their characteristics and interactions with the targeted tissue and cells, machine
learning (ML) methods have been utilized to predict the pathological response to
breast cancer treatment with a high degree of accuracy. Artificial neural networks,
which decreased the prediction error rate, enable the use of techniques without the
need for enormous data sets [59].

Fig. 9 Nano robots for medical applications: surgery, dentistry, sensing and imaging, drug delivery, and gene therapy



6.3 Localizations Based on ML Applications

Localization is the process of locating objects within a predetermined area. Machine
learning-based localization has been used in a wide variety of applications, ranging
from Simultaneous Localization and Mapping (SLAM) to pedestrian localization systems.
Numerous fields, including autonomous driving, indoor robotics, and building surveying
and mapping, utilize SLAM. SLAM can be applied both indoors and outdoors.
A SLAM system typically consists of front-end sensors to gather information
about unknown scenes and back-end algorithms to map the scene and pinpoint the
sensor's location within it. Fringe projection profilometry (FPP), a technique that has
demonstrated good accuracy and speed, is one of the optical metrological tools utilized
in SLAM [60]. Simple hardware configuration, flexible application, and dense point
cloud acquisition are benefits of FPP.
Front-end FPP technique: the system consists of a projector and a camera; it projects
coded fringe patterns while simultaneously capturing the height-modulated fringe
patterns with the camera. Using transform-based or phase-shifting algorithms, the
desired phase is then determined from the collected patterns. Since the computed
phase is wrapped into the region of (−π, π), a phase unwrapping technique is needed
to obtain the absolute phase. With the help of system calibration and 3-D reconstruction,
the absolute phase creates the image correspondence between the projector and camera,
which then yields a 3-D shape [60]. The wrapped phase and absolute phase are used
in phase-shifting to achieve system calibration and 3-D reconstruction. Using a
phase-shifting algorithm, the fringe pattern can be represented as follows:
 
I_n(u^c, v^c) = a(u^c, v^c) + b(u^c, v^c) cos(ϕ(u^c, v^c) − δ_n),  n = 1, 2, …, N                  (25)

where (u^c, v^c) are the pixel coordinates, N represents the number of phase steps,
δ_n = 2π(n − 1)/N represents the phase shift, a is the background intensity, b is the
modulation amplitude, and ϕ is the desired phase. The desired phase is then calculated
using the least-squares algorithm represented in Eq. (26):
 
ϕ(u^c, v^c) = arctan[ Σ_{n=1}^{N} I_n(u^c, v^c) sin(δ_n) / Σ_{n=1}^{N} I_n(u^c, v^c) cos(δ_n) ]                  (26)
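A brief NumPy sketch of the least-squares phase recovery of Eqs. (25)-(26) is given below; the synthetic four-step fringe images and the background/modulation values are assumptions used only to check the computation.

import numpy as np

def wrapped_phase(images):
    # images: N fringe images I_n(u, v) captured with shifts delta_n = 2*pi*(n-1)/N.
    N = len(images)
    delta = 2 * np.pi * np.arange(N) / N
    num = sum(I * np.sin(d) for I, d in zip(images, delta))
    den = sum(I * np.cos(d) for I, d in zip(images, delta))
    return np.arctan2(num, den)                  # wrapped phase in (-pi, pi], Eq. (26)

# Synthetic 4-step example following Eq. (25): recover a known phase map.
phi_true = np.linspace(-3.0, 3.0, 100).reshape(10, 10)
images = [100 + 50 * np.cos(phi_true - 2 * np.pi * n / 4) for n in range(4)]
print(np.allclose(wrapped_phase(images), phi_true))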

A gray-code-based phase unwrapping technique, expressed theoretically by Eq. (27),
is used to obtain accurate absolute phase data.

Φ(u^c, v^c) = ϕ(u^c, v^c) + K(u^c, v^c) × 2π                  (27)

where K represents the fringe order and Φ is the absolute phase. The mapping to 3D points
in the world coordinate frame (x^w, y^w, z^w) is represented by the following equations.



s^c [u^c, v^c, 1]^T = A^c [x^w, y^w, z^w, 1]^T = [ a^c_{11} a^c_{12} a^c_{13} a^c_{14} ; a^c_{21} a^c_{22} a^c_{23} a^c_{24} ; a^c_{31} a^c_{32} a^c_{33} a^c_{34} ] [x^w, y^w, z^w, 1]^T                  (28)

s^p [u^p, v^p, 1]^T = A^p [x^w, y^w, z^w, 1]^T = [ a^p_{11} a^p_{12} a^p_{13} a^p_{14} ; a^p_{21} a^p_{22} a^p_{23} a^p_{24} ; a^p_{31} a^p_{32} a^p_{33} a^p_{34} ] [x^w, y^w, z^w, 1]^T                  (29)
1

where: c is the camera and p are the projector. s = scaling factor. A = the resultant
product of the intrinsic and extrinsic matrix which is represented as follows;
 
A^c = I^c × [R^c | T^c] = [ f^c_x 0 u^c_0 ; 0 f^c_y v^c_0 ; 0 0 1 ] [ r^c_{11} r^c_{12} r^c_{13} t^c_1 ; r^c_{21} r^c_{22} r^c_{23} t^c_2 ; r^c_{31} r^c_{32} r^c_{33} t^c_3 ]                  (30)

A^p = I^p × [R^p | T^p] = [ f^p_x 0 u^p_0 ; 0 f^p_y v^p_0 ; 0 0 1 ] [ r^p_{11} r^p_{12} r^p_{13} t^p_1 ; r^p_{21} r^p_{22} r^p_{23} t^p_2 ; r^p_{31} r^p_{32} r^p_{33} t^p_3 ]                  (31)

where I is the intrinsic matrix, f_x and f_y are the focal lengths, (u_0, v_0) is the principal
point, and [R|T] is the extrinsic matrix. Hence, the matching of the absolute phase can
be satisfied through the following equation:

u^p = Φ(u^c, v^c) × λ / (2π)                  (32)

which gives the matching between points from the camera and the projector, with λ the fringe pitch.
The back-end: following the collection of high-quality data, rapid and accurate
mapping is required. This is accomplished by using a registration technique based on
a 2D-to-3D descriptor. Accurate mapping is achieved by resolving the transformation
matrix, which uses the coordinate transformation to convert the affine transformation
between 2D feature points into a 3D point-cloud transformation that carries out the
registration. The Speeded Up Robust Features (SURF) algorithm is used to extract 2D
matching points, resulting in a 2D transformation matrix [60, 61]. The 2D transformation
matrix is then converted to a 3D transformation matrix, after which the corresponding
3D points are extracted. Next, the transformation matrix is combined with the initial
prior. Finally, the cloud registration is performed by applying the output transformation
matrix from the previous step [62]. The registration objective can be represented
mathematically by the following equation.


min_{R,T} Σ_{i=1}^{n} || P1_i − R · P2_i − T ||^2                  (33)

where P1 and P2 represent the 3D data points P1 = (x^w_1, y^w_1, z^w_1) and P2 = (x^w_2, y^w_2, z^w_2), and R and T are the rotation and translation between the two point clouds.
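As a small numerical illustration of the objective in Eq. (33), the sketch below applies a known rigid transform (R, T) to a source cloud and evaluates the summed squared registration error; the point values and the transform are arbitrary, and the actual estimation of R and T from SURF correspondences is not shown.

import numpy as np

def registration_error(P1, P2, R, T):
    # Sum over i of || P1_i - (R @ P2_i + T) ||^2, the objective of Eq. (33).
    residual = P1 - (P2 @ R.T + T)
    return float(np.sum(residual ** 2))

theta = np.deg2rad(10.0)
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
T = np.array([0.5, -0.2, 1.0])
P2 = np.random.default_rng(1).random((50, 3))      # source point cloud
P1 = P2 @ R.T + T                                   # target cloud under the same transform
print(registration_error(P1, P2, R, T))             # ~0 for the correct (R, T)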
Pedestrian localization systems have been growing recently, and machine learning
has been applied to different types of pedestrian localization. There is a tendency to
apply supervised learning and scene analysis in pedestrian localization because of their
accuracy. DL, as a branch of ML, is implemented for its high processing capacity.
Scene analysis is the most frequently used ML approach in pedestrian localization
for its easy implementation and fair performance [61, 62].

6.4 Control of Dynamic Traffic Robots

Automated guided vehicles (AGV) have developed into autonomous mobile robots
(AMR) as a result of recent breakthroughs in robotic applications. To arrive at the
vision-based systems we have today, the main component of AGV material handling
systems has advanced through numerous mechanical, optical, inductive, inertial,
and laser-guided milestones [63]. The technologies that improve the performance
of these systems include sensors, strong on-board processors, artificial intelligence,
and simultaneous localization and mapping. These technologies allow the robot to
comprehend the workplace.
AMRs have AI algorithms applied to improve their navigation; they travel on
their own through unpredictable terrain. Machine learning (ML) methods can be
used to identify and categorize obstacles. A few examples include fuzzy logic, neural
networks, genetic algorithms, and neuro-fuzzy systems. All of these techniques are
routinely used to move the robot from one place to another while avoiding collisions. The
ability of the brain to do specific tasks serves as the source of inspiration for these
strategies [63, 64]. For example, if we take into consideration a dual-arm robot,
we may construct and analyze a control algorithm for the oscillation, position, and
speed control of the dual-arm robot. This required the usage of a dynamic system. The
system design incorporates time delay control and pole placement-based feedback

control for the control of oscillation (angular displacement), precise position control,
and speed control, respectively.

7 Robotics Applications Challenges

As robots are employed in homes, offices, the healthcare industry, operating auto-
mobiles, and education, robotics applications have become a significant part of our
life. In order to improve performance, including accuracy, efficiency, and security,
it is increasingly common to deploy bots to integrate several applications utilizing
machine learning techniques [65]. Open difficulties in AI for robots include the
type of learning to be used, ML application, ML architecture, standardization, and
incorporation of other performance evaluation criteria in addition to accuracy [66].
Exploring different learning approaches would be beneficial for performance and
advancement, even though supervised learning is the most typical type of learning
in robotic applications [67]. ML tools can also be used to solve issues brought on by wireless
connectivity, which increases multipath and lowers system performance. Robots are
adopting DL architectures more frequently, particularly for localization, but their
use is restricted since DL designs require a significant amount of difficult-to-obtain
training data. To analyze the efficacy of ML systems, it is crucial to identify best
practices in these fields and take into account alternative evaluation criteria, because
standard performance evaluation criteria are constrained [68].
There are many uses for machine learning, including in forensics, energy manage-
ment, health, and security. Since they are evolving so quickly, new trends in robotics
and machine learning require further study. Among the trends are end-to-end auto-
mated, common, and continuous DL approaches for data-driven intelligent systems.
Technologies that are quickly evolving, such as 5G, cloud computing, and blockchain,
offer new opportunities to improve the system as a whole [69, 70]. Issues with
user security, privacy, and safety must be resolved. Black box smart systems have
opportunities in AI health applications because of their low deployment costs, rapid
performance, and accuracy [71, 72]. These applications will aid in monitoring,
rehabilitation, and diagnosis. The future research trends also include:
• AI algorithms, which play an essential role in data analytics and decision making for
robotics operations.
• IT infrastructure such as 5G, which plays an integral role with low latency, high
traffic capacity, and fast connections for robot-based industrial applications.
• Human–robot collaboration (HRC), which has gained a strong reputation lately in
health applications during the pandemic.
• Big data and cloud-based applications, which are expected to accelerate in the coming
years and to be applied with robotics for their powerful analytics that support the
decision-making process.

8 Conclusions

Machine learning provides effective solutions in several areas such as robotics. ML
has enabled the development of autonomous robots for applications related to industrial
automation, the Internet of Things, and autonomous vehicles. It has also contributed
significantly to the medical field. When ML is combined with other technology such
as computer vision, this can develop a number of improved applications that serve
healthcare with the possibility of high classification efficiency and the development
of a generation of robots that have the ability to move and interact with patients
according to their movement and gestures [73]. ML-based robotics applications add
value to the performance of robotic systems through recommender systems, which are
used in marketing by analyzing user behavior to recommend certain choices; Nano-health
applications, which introduce Nano-sized robots to perform certain health tasks in
treatment and diagnosis; localization, to determine the location of certain items or
users within a certain area; and dynamic traffic control, where ML improves the movement
and manipulation of robots through obstacle avoidance and smooth movements [74, 75].
The challenges of robotics applications can be addressed by extended research in areas
that are considered trending issues, which will give promising solutions and improvements
to the currently open issues and challenges.

References

1. Saeed, M., Omri, S., Abdel-Khalek, E. S., Ali, M. F., & Alotaibi, M. (2022). Optimal path planning for
drones based on swarm intelligence algorithm. Neural Computing and Applications, 34, 10133–
10155. https://doi.org/10.1007/s00521-022-06998-9
2. Niko, S., et al. (2018). The limits and potentials of deep learning for robotics. The International
Journal of Robotics Research, 37(4), 405–420. https://doi.org/10.1177/0278364918770733
3. Ali, E. S., Zahraa, T., & Mona, H. (2021). Algorithms optimization for intelligent IoV applica-
tions. In J. Zhao & Vinoth K. (Eds.), Handbook of Research on Innovations and Applications
of AI, IoT, and Cognitive Technologies (pp. 1–25). Hershey, PA: IGI Global (2021). https://doi.
org/10.4018/978-1-7998-6870-5.ch001
4. Matt, L., Marie, F., Louise, A., Clare, D., & Michael, F. (2020). Formal specification and verifi-
cation of autonomous robotic systems: A survey. ACM Computing Surveys, 52(5), 100, 1–41.
https://doi.org/10.1145/3342355
5. Alexander, L., Konstantin, M., & Pavol. B. (2021). Convolutional Neural Networks Training
for Autonomous Robotics, 29, 1, 75–79. https://doi.org/10.2478/mspe-2021-0010.
6. Hassan, M., Mohammad, H., Othman, O., & Aisha, H. (2022). Performance evaluation of
uplink shared channel for cooperative relay based narrow band internet of things network. In
2022 International Conference on Business Analytics for Technology and Security (ICBATS).
IEEE.
7. Fahad, A., Alsolami, F., & Abdel-Khalek, S. (2022). Machine learning techniques in internet of
UAVs for smart cities applications. Journal of Intelligent & Fuzzy Systems, 42(4), 3203–3226
8. Salih, A., & Sayed A.: Machine learning in cyber-physical systems in industry 4.0. In A.
Luhach & E. Atilla (Eds.), Artificial Intelligence Paradigms for Smart Cyber-Physical Systems.
(pp. 20–41). Hershey, PA: IGI Global. https://doi.org/10.4018/978-1-7998-5101-1.ch002.
9. Gruber, T. R. (1995). Toward principles for the design of ontologies used for knowledge sharing.
International Journal of Human-Computer Studies, 43, 907–928.

10. Lim, G., Suh, I., & Suh, H. (2011). Ontology-Based unified robot knowledge for service robots
in indoor environments. IEEE Transactions on Systems, Man, and Cybernetics Part A: Systems
and Humans, 41, 492–509.
11. Mohammed, D., Aliakbar, A., Muhayy, U., & Jan, R. (2019). PMK—A knowledge processing
framework for autonomous robotics perception and manipulation. Sensors, 19, 1166. https://
doi.org/10.3390/s19051166
12. Wil, M., Martin, B., & Armin, H. (2018). Robotic Process Automation, Springer Fachmedien
Wiesbaden GmbH, part of Springer Nature (2018)
13. Aguirre, S., & Rodriguez, A. (2017). Automation of a business process using robotic
process automation (RPA): A case study. Applied Computational Science and Engineering
Communications in Computer and Information Science.
14. Ilmari, P., & Juha, L. (2021). Robotic process automation (RPA) as a digitalization related tool
to process enhancement and time saving. Research. https://doi.org/10.13140/RG.2.2.13974.
68161
15. Mona, B., & Sayed, A. (2021). Intelligence IoT Wireless Networks. Intelligent Wireless
Communications, IET Book Publisher.
16. Niall, O. et al. (2020). In K. Arai & S. Kapoor (Eds.), Deep Learning versus Traditional
Computer Vision. Springer Nature Switzerland AG 2020: CVC 2019, AISC 943 (pp. 128–144).
https://doi.org/10.1007/978-3-030-17795-9_10.
17. Othman, O., & Muhammad, H. et al. (2022). Vehicle detection for vision-based intelligent
transportation systems using convolutional neural network algorithm. Journal of Advanced
Transportation, Article ID 9189600. https://doi.org/10.1155/2022/9189600.
18. Ross, G., Jeff, D., Trevor, D., & Jitendra, M. (2019). Region-based convolutional networks
for accurate object detection and segmentation. IEEE Transactions on Pattern Analysis and
Machine Intelligence, 38(1), 142–158.
19. Ian, G., Yoshua, B., & Aaron. C. (2016). Deep Learning (Adaptive Computation and Machine
Learning series) Deep Learning. MIT Press.
20. Macaulay, M. O., & Shafiee, M. (2022). Machine learning techniques for robotic and
autonomous inspection of mechanical systems and civil infrastructure. Autonomous Intelligent
Systems, 2, 8. https://doi.org/10.1007/s43684-022-00025-3
21. Khan, S., Rahmani, H., Shah, S. A. A., Bennamoun, M. (2018). A guide to convolutional neural
networks for computer vision. Springer. https://doi.org/10.2200/S00822ED1V01Y201712CO
V01.
22. Kober, J., Bagnell, J. A., & Peters, J. (2013). Reinforcement learning in robotics: A survey.
International Journal of Robotics Research, 32, 1238–1274.
23. Bakri, H., & Elmustafa, A., & Rashid, A.: Machine learning for industrial IoT systems. In J.
Zhao & V. Vinoth Kumar, (Eds.), Handbook of Research on Innovations and Applications of
AI, IoT, and Cognitive Technologies (pp. 336–358). Hershey, PA: IGI Global, (2021). https://
doi.org/10.4018/978-1-7998-6870-5.ch023
24. Luong, N. C., Hoang, D. T., Gong, S., Niyato, D., Wang, P., Liang, Y. C., & Kim, D. I. (2019).
Applications of deep reinforcement learning in communications and networking: A survey.
IEEE Communications Surveys Tutorials, 21, 3133–3174.
25. Chen, Z., & Huang, X. (2017). End-to-end learning for lane keeping of self-driving cars. In
2017 IEEE Intelligent Vehicles Symposium (IV) (pp. 1856–1860). https://doi.org/10.1109/IVS.
2017.7995975.
26. Jiang, H., Liangcai, Z., Gongfa, L., & Zhaojie, J. (2021). Learning for a robot: Deep reinforcement
learning, imitation learning, transfer learning. Sensors, 21, 1278. https://doi.org/10.3390/
s21041278
27. Yan, W., Cristian, C., Beltran, H., Weiwei, W., & Kensuke, H. (2022). An adaptive imitation
learning framework for robotic complex contact-rich insertion tasks. Frontiers in Robotics and
AI, 8, 90–123.
28. Ali, E. S., Hassan, M. B., & Saeed, R. (2020). Machine learning technologies in internet of
vehicles. In: M. Magaia, G. Mastorakis, C. Mavromoustakis, E. Pallis & E. K Markakis (Eds.),

Intelligent Technologies for Internet of Vehicles. Internet of Things. Cham: Springer. https://
doi.org/10.1007/978-3-030-76493-7_7.
29. Alatabani, L. E., Ali, E. S., & Saeed, R. A. (2021). Deep learning approaches for IoV applica-
tions and services. In: N. Magaia, G. Mastorakis, C. Mavromoustakis, E. Pallis, E. K. Markakis
(Eds.), Intelligent Technologies for Internet of Vehicles. Internet of Things. Cham: Springer.
https://doi.org/10.1007/978-3-030-76493-7_8
30. Lina, E., Ali, E., & Mokhtar A. et al. (2022). Deep and reinforcement learning technologies on
internet of vehicle (IoV) applications: Current issues and future trends. Journal of Advanced
Transportation, Article ID 1947886. https://doi.org/10.1155/2022/1947886.
31. Venator, M. et al. (2021). Self-Supervised learning of domain-invariant local features for robust
visual localization under challenging conditions. IEEE Robotics and Automation Letters, 6(2).
32. Abbas, A., Rania, A., Hesham, A. et al. (2021). Quality of services based on intelligent IoT
wlan mac protocol dynamic real-time applications in smart cities. Computational Intelligence
and Neuroscience, 2021, Article ID 2287531. https://doi.org/10.1155/2021/2287531.
33. Maibaum, A., Bischof, A., Hergesell, J., et al. (2022). A critique of robotics in health care.
AI & Society, 37, 467–477. https://doi.org/10.1007/s00146-021-01206-z
34. Yanxue, C., Moorhe, C., & Zhangbo, X. (2021). Artificial intelligence assistive technology
in hospital professional nursing technology. Journal of Healthcare Engineering, Article ID
1721529, 7 pages. https://doi.org/10.1155/2021/1721529.
35. Amanda, P., Jan, B., & Qingbiao, L. (2022). The holy grail of multi-robot planning: Learning
to generate online-scalable solutions from offline-optimal experts. In International Conference
on Autonomous Agents and Multiagent Systems (AAMAS 2022).
36. Lorenzo, C., Gian, C., Cardarilli, L., et al. (2021). Multi-agent reinforcement learning: A
review of challenges and applications. Applied Sciences, 11, 4948. https://doi.org/10.3390/app
11114948
37. Mahbuba, A., Jiong, J., Akhlaqur, R., Ashfaqur, R., Jiafu, W., & Ekram, H. (2021). Resource
allocation and service provisioning in multi-agent cloud robotics: A comprehensive survey.
Manuscript. IEEE. Retrieved February 10, 2021.
38. Wang, Y., Damani, M., Wang, P., et al. (2022). Distributed reinforcement learning for robot
teams: A review. Current Robotics Reports. https://doi.org/10.1007/s43154-022-00091-8
39. Elfatih, N. M., et al. (2022). Internet of vehicle’s resource management in 5G networks using
AI technologies: Current status and trends. IET Communications, 16, 400–420. https://doi.org/
10.1049/cmu2.12315
40. Edmund, J., Greg, F., David, M., & David, W. (2021). The segmented colour feature extreme
learning machine: applications in agricultural robotics. Agronomy, 11, 2290. https://doi.org/10.
3390/agronomy11112290
41. Rodrigues, I. R., da Silva Neto, S. R.,Kelner, J., Sadok, D., & Endo, P. T. (2021). Convolutional
extreme learning machines: A systematic review. Informatics 8, 33. https://doi.org/10.3390/inf
ormatics8020033.
42. Jianwen, G., Xiaoyan, L, Zhenpeng, I., & Yandong, L. et al. (2021). Fault diagnosis of indus-
trial robot reducer by an extreme learning machine with a level-based learning swarm opti-
mizer. Advances in Mechanical Engineering 13(5), 1–10. https://doi.org/10.1177/168781402
11019540
43. Ali, Z., Lorena, D., Saleh, G., Bernard, R., Akif, K., & Mahdi, B. (2021). 4D printing soft robots
guided by machine learning and finite element models. Sensors and Actuators A: Physical, 322,
112774.
44. Elmustafa, S. et al. (2021). Machine learning technologies for secure vehicular communica-
tion in internet of vehicles: Recent advances and applications. Security and Communication
Networks, Article ID 8868355. https://doi.org/10.1155/2021/8868355.
45. Ho, S., Banerjee, H., Foo, Y., Godaba, H., Aye, W., Zhu, J., & Yap, C. (2017). Experimental
characterization of a dielectric elastomer fluid pump and optimizing performance via composite
materials. Journal of Intelligent Material Systems and Structures, 28, 3054–3065.
46. Sarthak, B., Hritwick, B., Zion, T., & Hongliang, R. (2019). Deep reinforcement learning for
soft, flexible robots: brief review with impending challenges. Robotics, 8, 4. https://doi.org/10.
3390/robotics8010004

47. Estifanos, T., & Mihret, M.: Robotics and artificial intelligence. International Journal of
Artificial Intelligence and Machine Learning, 10(2).
48. Andrius, D., Jurga, S., Žemaitien, E., & Ernestas, Š. et al. (2022). Advanced applications of
industrial robotics: New trends and possibilities. Applied Science, 12, 135. https://doi.org/10.
3390/app12010135.
49. Elmustafa, S. A. A., & Mujtaba, E. Y. (2019). Internet of things in smart environment: Concept,
applications, challenges, and future directions. World Scientific News (WSN), 134(1), 151.
50. Ali, E. S., Sohal, H. S. (2017). Nanotechnology in communication engineering: Issues,
applications, and future possibilities. World Scientific News (WSN), 66, 134-148.
51. Reham, A. A., Elmustafa, S. A., Rania, A. M., & Rashid, A. S. (2022). Blockchain for IoT-Based
cyber-physical systems (CPS): Applications and challenges. In: D. De, S. Bhattacharyya, &
Rodrigues, J. J. P. C. (Eds.), Blockchain based Internet of Things. Lecture Notes on Data
Engineering and Communications Technologies (Vol. 112). Singapore: Springer. https://doi.
org/10.1007/978-981-16-9260-4_4.
52. Zhang, Q, Lu, J., & Jin, Y. (2020). Artificial intelligence in recommender systems. Complex &
Intelligent Systems. Retrieved September 28, 2020 from, https://doi.org/10.1007/s40747-020-
00212-w.
53. Abdalla, R. S., Mahbub, S. A., Mokhtar, R. A., Elmustafa, S. A., Rashid, A. S. (2021). IoE
design principles and architecture. In Book: Internet of Energy for Smart Cities: Machine
Learning Models and Techniques. USA: Publisher: CRC group, Taylor & Francis Group.
54. Hyeyoung, K., Suyeon, L., Yoonseo, P., & Anna, C. (2022). A survey of recommendation
systems: Recommendation models, techniques, and application fields. Electronics, 11, 141.
https://doi.org/10.3390/electronics11010141
55. Yuanyuan, C., Dixiao, C., Shuzhang, L. et al. (2021). Recent advances in field-controlled micro–
nano manipulations and micro–nano robots. Advanced Intelligent Systems, 4(3), 2100116, 1–23
(2021). https://doi.org/10.1002/aisy.202100116,
56. Mona, B., et al. (2021). Artificial intelligence in IoT and its applications. Intelligent Wireless
Communications, IET Book Publisher.
57. Neto, A., Lopes, I. A., & Pirota, K. (2010). A Review on Nanorobotics. Journal of
Computational and Theoretical Nanoscience, 7, 1870–1877.
58. Gautham, G., Yaser, M., & Kourosh, Z. (2021). A Brief review on challenges in design and
development of nanorobots for medical applications. Applied Sciences, 11, 10385. https://doi.
org/10.3390/app112110385
59. Egorov, E., Pieters, C., Korach-Rechtman, H., et al. (2021). Robotics, microfluidics, nanotech-
nology and AI in the synthesis and evaluation of liposomes and polymeric drug delivery
systems. Drug Delivery and Translational Research, 11, 345–352. DOI: 10.1007/s13346-021-
00929-2.
60. Yang, Z., Kai, Z., Haotian, Y., Yi, Z., Dongliang, Z., & Jing, H. (2022). Indoor simultaneous
localization and mapping based on fringe projection profilometry 23, arXiv:2204.11020v1
[cs.RO].
61. Miramá, V. F., Díez, L. E., Bahillo, A., & Quintero, V. (2021). A survey of machine learning
in pedestrian localization systems: applications, open issues and challenges. IEEE Access, 9,
120138–120157. https://doi.org/10.1109/ACCESS.2021.3108073
62. Tian, Y., Adnane, C., & Houcine, C. (2021). A survey of recent indoor localization scenarios
and methodologies. Sensors, 21, 8086. https://doi.org/10.3390/s21238086
63. Giuseppe, F., René, D., Fabio, S., Strandhagen, J. O. (2021). Planning and control of
autonomous mobile robots for intralogistics: Literature review and research agenda European
Journal of Operational Research, 294(2), (405–426). https://doi.org/10.1016/j.ejor.2021.01.
019. Published 2021.
64. Alfieri, A., Cantamessa, M., Monchiero, A., & Montagna, F. (2012). Heuristics for puzzle-based
storage systems driven by a limited set of automated guided vehicles. Journal of Intelligent
Manufacturing, 23(5), 1695–1705.

65. Ahmad, B., Xiaodong, Z., & Haiming, S. et al. (2022). Precise motion control of a power line
inspection robot using hybrid time delay and state feedback control. Frontiers in Robotics and
AI 9(24). https://doi.org/10.3389/frobt.2022.746991.
66. Elsa, J., Hung, K., & Emel, D. (2022). A survey of human gait-based artificial intelligence
applications. Frontiers in Robotics and AI, 8. https://doi.org/10.3389/frobt.2021.749274.
67. Xi, V., & Lihui, W. (2021). A literature survey of the robotic technologies during the COVID-
19 pandemic. Journal of Manufacturing Systems, 60, 823–836. https://doi.org/10.1016/j.jmsy.
2021.02.005
68. Ahir, S., Telavane, D., & Thomas, R. (2020). The impact of artificial intelligence, blockchain,
big data and evolving technologies in coronavirus disease-2019 (COVID-19) curtailment. In:
Proceedings of the International Conference of Smart Electronics Communication ICOSEC
2020 (pp. 113–120). https://doi.org/10.1109/ICOSEC49089.2020.9215294.
69. Lana, I. S., Elmustafa, S., & Saeed, A. (2022). Machine learning in healthcare: Theory, applica-
tions, and future trends. In R. El Ouazzani & M. Fattah & N. Benamar (Eds.), AI Applications
for Disease Diagnosis and Treatment (pp. 1–38). Hershey, PA: IGI Global, (2022). https://doi.
org/10.4018/978-1-6684-2304-2.ch001
70. Jat, D., & Singh, C. (2020). Artificial intelligence-enabled robotic drones for COVID-19
outbreak. Springer Briefs Applied Science Technology, 37–46 (2020). DOI: https://doi.org/
10.1007/978–981–15–6572–4_5.
71. Schulte, P., Streit, J., Sheriff, F., & Delclos, G. et al. (2020). Potential scenarios and hazards in
the work of the future: a systematic review of the peer-reviewed and gray literatures. Annals of
Work Exposures and Health, 64, 786–816, (2020), DOI: https://doi.org/10.1093/annweh/wxa
a051.
72. Alsolami, F., Alqurashi, F., & Hasan, M. K. et al. (2021). Development of self-synchronized
drones’ network using cluster-based swarm intelligence approach. IEEE Access, 9, 48010–
48022. https://doi.org/10.1109/ACCESS.2021.3064905.
73. Alatabani, L. E., Ali, E. S., Mokhtar, R. A., Khalifa, O. O., & Saeed, R. A. (2022). Robotics
architectures based machine learning and deep learning approaches. In 8th International
Conference on Mechatronics Engineering (ICOM 2022), Online Conference, Kuala Lumpur,
Malaysia (pp. 107–113). https://doi.org/10.1049/icp.2022.2274.
74. Malik, A. A., Masood, T., & Kousar, R. (2020). Repurposing factories with robotics in the face
of COVID-19. IEEE Transactions on Automation Science and Engineering, 5(43), 133–145.
https://doi.org/10.1126/scirobotics.abc2782.
75. Yoon, S. (2020). A study on the transformation of accounting based on new technologies:
Evidence from korea. Sustain, 12, 1–23. https://doi.org/10.3390/su12208669
A Review on Deep Learning on UAV
Monitoring Systems for Agricultural
Applications

Tinao Petso and Rodrigo S. Jamisola Jr

Abstract In this chapter we present literature review on UAV monitoring systems


that utilized deep learning algorithms to ensure improvement on plant and animal
production yields. This work is important because of the growing world population
and thus increased demand for food production, that threaten food security and
national economy. Hence the need to ensure sustainable food production that is made
more complicated with the advent of global warming, occupational preference for
food consumption and not food production, diminishing land and water resources. We
choose to consider studies that utilize the UAV platform to collect images compared
to satellite because UAVs are easily available, cheaper to maintain, and the collected
images can be updated at any time. Previous studies with research foci on plant and
animal monitoring are evaluated in terms of their advantages and disadvantages. We
looked into different deep learning models and compared their model performances
in using various types of drones and different environmental conditions during data
gathering. Topics on plant monitoring include pest infiltration, plant growth, fruit
conditions, weed invasion, etc. While topics on animal monitoring include animal
identification and animal population count. The monitoring systems used in previous
studies utilize computer vision processes such as image classification, object detection,
and segmentation. Computer vision helps provide efficient, highly accurate,
automatic, and intelligent systems for a particular task. The recent advancements in
deep learning models and off-the-shelf drones open more opportunities with lesser
costs and faster operations in most agricultural monitoring applications.

Keywords Deep learning algorithms · UAV monitoring systems · Animal


agricultural applications · Plant agricultural applications · Computer vision ·
Literature review

T. Petso (B) · R. S. Jamisola Jr


Botswana International University of Science and Technology, Palapye, Botswana
e-mail: [email protected]
R. S. Jamisola Jr
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 335
A. T. Azar and A. Koubaa (eds.), Artificial Intelligence for Robotics and Autonomous
Systems Applications, Studies in Computational Intelligence 1093,
https://doi.org/10.1007/978-3-031-28715-2_11

1 Introduction

Agriculture significantly contributes to the growth of an individual country’s econ-


omy [95, 125]. The rapid increase of the growing population and the negative challenges
of changing environmental conditions, to name a few, cultivate the need to research
technological applications that enhance sustainable agricultural productivity with better
monitoring strategies [114]. The capability to employ remote sensing technology
has greatly advanced precision agriculture. Precision agriculture is the capacity to
make the appropriate decision at the ideal time by evaluating the agricultural data [136].
It optimizes the input elements, such as water, fertilizer to enhance productivity,
quality and yield and minimize pests and diseases through specific targeted applica-
tion and appropriate amount and time to crop and livestock management [73, 81].
Efficient plant monitoring is important, as excessive usage of chemicals such as
pesticides and fertilizers can destroy the fertility of the soil [110]. In animal monitoring,
inappropriate usage of antibiotics or zoonosis can harm human health [81].
The implementation of precision agriculture ensures an efficient approach to food
security in the world [112].
Conventional open-field farming has been used for many years, but it has been
established to suffer from environmental conditions that hamper agricultural production,
hence extensive monitoring is needed [116]. Drones have been identified as an
ideal complement to existing agricultural monitoring applications for effective
and better production yield [94]. The common reason for using drones for agricultural
applications is their remote sensing capability [77]. Drones have been replacing
satellites, which are deemed costly and of lower efficiency and accuracy as compared
to drones [76, 93]. Drones also hold the capability to access inaccessible agricultural
areas [82]. A controlled environmental condition
to access inaccessible agricultural areas [82]. A controlled environmental condition
farming also known as indoor farming holds great benefits such as reduced labour
and increased production yield for both plant and livestock [18]. The challenge of
crop diseases and pests is also reduced [38]. Indoor drone operation has been established
to be a challenge, hence an investigation in a greenhouse and a dairy farm barn
was conducted [59]. From this study, indoor operation was established to be effective
with the implementation of visual simultaneous localization and mapping (VSLAM). This
approach can assist in real-time agricultural monitoring to aid precision farming.
It has also been established that drones reduce labour and time, hence saving costs [30].
Drones have also been established to increase agricultural productivity, efficiency and
sustainability [92, 96, 121]. Drones hold the capability to provide near real-time
information for the farmers to make an appropriate decision in agricultural manage-
ment [10]. These include crop as well as livestock conditions.
Deep learning has been applied in several agricultural application fields such as
automatic diseases detection, pest detection, weed detection, fruit detection, livestock
detection, to name a few [39, 71, 105, 122]. Its greatest attribute is the capability of
automatic feature extraction [126]. Recently, an increase in deep learning applications
has been observed due to several aspects, such as the affordability of graphics processing unit (GPU) hardware

Fig. 1 DJI Phantom 3 SE performing fruit condition monitoring

and advancements of deep learning algorithm architectures
in the computer vision domain [36].

There are several drone agricultural monitoring applications; an example application is illustrated in Fig. 1.

The application of drones with deep learning algorithms improves agricultural


management efforts [13]. The combination of drones and deep learning algorithms
aids as a powerful tool to agricultural applications to ensure food security [20, 57].
They have proven to hold promising near real-time agricultural monitoring capability
for better production yield as compared to human effort [20]. Crop diseases and pest
infections have been established to cause significant losses to agricultural production
yields [34]. Pest infections alone can result in the loss of as much as 30–33% of crop
production in a year [126]. The study conducted by [7] used drones for automatic
early detection of crop diseases to prevent future spreading. The advancements in
unmanned systems for automated livestock monitoring had been established to be a
promising alternative approach to better production yield [75].
The implementation of drones and deep learning algorithms holds some lim-
itations, such as image quality due to elements like drone altitude, shadows and
natural occlusion. The continuously advancing technological and hardware improve-
ments highlight the need to periodically review the deep learning algorithms with
drone-based systems and establish vital potential research directions. This chapter
strives to achieve the following objectives:
• Identification of research studies with contributions to drone application incorpo-
ration with deep learning strategies in agricultural applications

• Identification of research focus, advantages and disadvantages


• Identification of possible future research areas
This chapter provides researchers with fundamental basics into the usage of drones
with deep learning strategies for agricultural applications and directions for future
research. It also guides future research works and highlights UAV operational capa-
bilities, advantages and disadvantages of using deep learning models.
The rest of the chapter is organised as follows: Sect. 2 outlines the methodology
and the basic steps involved in the development of UAV agricultural monitoring
applications. Sections 3 and 4 present findings on applications of deep learning mod-
els in plant and animal monitoring, respectively. Section 5 entails a discussion of
the findings with contributions and advantages and disadvantages. Finally, Sect. 6
concludes the chapter and highlight possible future research areas.

2 Proposed Methodology

In this literature review study, we perform an analysis to explore the use of drones
for different agricultural applications through the use of deep learning strategies.
Google Scholar was mainly used as a search engine for data collection. The key-
words that were used include “animal agricultural applications”, “computer vision”,
“deep learning algorithms”, “plant agricultural application”, and “UAV monitoring
system”. The initial step was the collection of related work, and the second was a detailed
review of that research work. The approach reveals a detailed overview of deep learning
strategies, advantages, disadvantages, and future research ideas that can be exploited
in agriculture.

2.1 An Overview of Deep Learning Strategies used


in Agriculture

In recent years, advancements in deep learning have brought tremendous achievements in
agricultural applications, such as saving the labour cost, time, and expertise needed [72].
Deep learning architectures pass the input data through several layers, each of which
is capable of extracting features along the way [74]. Figure 2 highlights the feature
extraction and classification process. Deep learning based models are made up of
different elements, such as convolution layers, pooling layers, and activation functions,
depending upon the architecture [51]. Each architecture has its pros and cons,
hence an appropriate selection is needed for a specific task [90].
Most of the architectures are pre-trained on a larger dataset before being used for agricultural
applications, and this process is known as transfer learning [31, 128].
Deep learning architecture training time is greatly reduced by the use of transfer
learning. The two most common types of detectors are one-stage and two-stage detectors.

Fig. 2 The basic architecture of the convolutional neural network

The two most common types of detectors are one-stage and two-stage detectors. The
difference between the two is a trade-off between detection speed and accuracy.
One-stage detectors are capable of higher detection speeds than two-stage detectors [63];
these include the single shot detector (SSD), the you only look once (YOLO) models,
and RetinaNet, to name a few [87]. Two-stage detectors achieve higher detection
accuracy than one-stage detectors [26]; some of these include the region-based
convolutional neural network (R-CNN), Fast R-CNN, Faster R-CNN, and Mask
R-CNN, to name a few [102].
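As an illustration of the transfer learning workflow described above, the following minimal sketch fine-tunes a COCO pre-trained two-stage detector (Faster R-CNN from the torchvision library) by replacing its classification head; the class list, dataset, and training step are hypothetical placeholders rather than a setup taken from any of the reviewed studies.

```python
# Illustrative sketch: transfer learning with a pre-trained two-stage detector
# (torchvision Faster R-CNN). Class names and the training sample are hypothetical.
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

NUM_CLASSES = 3  # e.g. background + "pest" + "healthy leaf" (assumed labels)

# Load a detector pre-trained on COCO, then replace its classification head so
# that only the new head (and any unfrozen layers) is trained on the much
# smaller agricultural UAV dataset -- the essence of transfer learning.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, NUM_CLASSES)

# Freeze the backbone so the pre-trained feature extractor is reused as-is.
for param in model.backbone.parameters():
    param.requires_grad = False

optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad],
    lr=0.005, momentum=0.9, weight_decay=0.0005,
)

# One dummy training step; a real pipeline would loop over a labelled UAV dataset.
model.train()
images = [torch.rand(3, 640, 640)]
targets = [{"boxes": torch.tensor([[100., 120., 220., 260.]]),
            "labels": torch.tensor([1])}]
optimizer.zero_grad()
loss_dict = model(images, targets)
loss = sum(loss_dict.values())
loss.backward()
optimizer.step()
```

Freezing the backbone, as in this sketch, is only one possible choice; unfreezing the later backbone layers is a common alternative when the agricultural dataset is large enough.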

The five basic steps for the development of a UAV agricultural monitoring appli-
cation include the UAV platform, data collection, deep learning model, perfor-
mance evaluation and agricultural application (Fig. 3).

UAV Platform
The three most common types of drones used for agricultural applications are
rotary-wing, fixed-wing, and fixed-wing hybrid Vertical Take-Off and Landing
(VTOL) drones. These types hold different advantages and disadvantages. Rotary-wing
drones can fly at low altitude, hover, and are highly manoeuvrable, and are hence
beneficial for image-related agricultural monitoring. Their greatest challenge is low
endurance due to high power usage. Fixed-wing drones have high endurance and
payload capability, which can be used for agricultural pesticide spraying. Fixed-wing
hybrids combine rotary-wing and fixed-wing characteristics; they hold the attributes
needed for agricultural applications, such as hovering, high endurance, and better
manoeuvrability.
Data Collection
Fig. 3 Deep learning agricultural monitoring application development steps include data collection through a UAV platform, deep learning model selection, model performance evaluation and agricultural application through an intelligent robotic system

The images used to develop deep learning agricultural monitoring systems are obtained
by remote sensing from the drone. The limited public datasets of agricultural drone
videos and images for developing deep learning algorithms highlight the need to
collect new datasets. The remote sensing specifications and the drone altitude contribute
strongly to the image quality and hence to the capabilities of the monitoring system.
Different conditions such as brightness, contrast, drone altitude and sharpness, to
name a few, are taken into account to ensure the development of a robust deep
learning agricultural monitoring system. A sample of drone altitude variation, which
affects model performance through the feature extraction characteristics, is
highlighted in Figs. 4 and 5.
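As a rough illustration of how flight altitude and camera specifications translate into image detail, the following sketch computes the ground sampling distance (GSD) using the standard photogrammetric relation; the camera parameters below are illustrative assumptions, not values taken from the reviewed studies.

```python
# Illustrative sketch: ground sampling distance (GSD) versus flight altitude,
# used to reason about how altitude trades area coverage against the pixel
# detail available for feature extraction. Camera parameters are hypothetical.
def ground_sampling_distance_cm(altitude_m: float,
                                sensor_width_mm: float = 13.2,
                                focal_length_mm: float = 8.8,
                                image_width_px: int = 5472) -> float:
    """GSD (cm/pixel) = (sensor width * altitude) / (focal length * image width)."""
    gsd_m = (sensor_width_mm / 1000.0) * altitude_m / (
        (focal_length_mm / 1000.0) * image_width_px)
    return gsd_m * 100.0

if __name__ == "__main__":
    for altitude in (10, 30, 80, 120):  # metres above ground level
        print(f"{altitude:>4} m -> {ground_sampling_distance_cm(altitude):.2f} cm/pixel")
```

Under these assumed parameters, a flight at 120 m produces a GSD roughly four times coarser than a flight at 30 m, which is consistent with the loss of fine feature detail illustrated in Figs. 4 and 5.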
Deep Learning Model
The selection of the deep learning algorithm used for agricultural applications depends
on the research objective and the hardware capability.

Fig. 4 Low drone altitude, capable of capturing the detailed feature characteristics needed by a deep learning model

Fig. 5 High drone altitude, susceptible to limited feature characteristics

To ensure proper training, data augmentation is applied and hyperparameters such as the
optimizer, batch size, and learning rate are tuned for optimum results during model training.
Data augmentation is primarily used to strengthen the size and quality of the training
dataset, so that a robust agricultural deep learning model can be developed [113].
Collecting a training dataset tends to be expensive, so data augmentation also provides
a way to enlarge a limited training dataset [23]. The hyperparameters are mainly
used to fine-tune the deep learning models for improved performance [83].
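The following minimal sketch illustrates the kind of augmentation pipeline and hyperparameter choices described above, using torchvision transforms; the specific transforms, dataset assumptions and values are illustrative, not settings reported in the reviewed studies.

```python
# Illustrative sketch: a simple augmentation pipeline and hyperparameter setup
# for training a classifier on UAV crop images. Transform choices and values
# are assumptions for illustration only.
import torch
from torchvision import transforms

train_transforms = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.7, 1.0)),   # simulate altitude/zoom changes
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ColorJitter(brightness=0.3, contrast=0.3),  # lighting variation in the field
    transforms.RandomRotation(degrees=15),                 # drone yaw differences
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Hyperparameters of the kind typically tuned for optimum results.
hyperparameters = {
    "optimizer": "SGD",      # or Adam
    "learning_rate": 1e-3,
    "batch_size": 16,
    "epochs": 50,
}
```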
Performance Evaluation
The four fundamental evaluation elements are graphically represented in Fig. 6. These
elements are used to compute the performance evaluation metrics: precision,
recall, F1 score, and accuracy. Equations (1)–(6) present their mathematical defini-
tions. Other metrics commonly used in research studies are average precision (AP)
and mean average precision (mAP). The fundamental evaluation
elements are defined as follows:

True positive (TP): the model correctly detects an object that is present.
True negative (TN): the model correctly identifies that no object is present.
False positive (FP): the model detects an object that is not actually present.
False negative (FN): the model fails to detect an object that is present.

The mathematical deep learning performance evaluation metrics are as follows:


Fig. 6 Basic confusion matrix for model evaluation

\[ \text{Precision}\,(P) = \frac{TP}{TP + FP} \tag{1} \]

\[ \text{Recall}\,(R) = \frac{TP}{TP + FN} \tag{2} \]

\[ F1\ \text{Score} = \frac{2PR}{P + R} \tag{3} \]

\[ \text{Accuracy}\,(A) = \frac{TP + TN}{TP + TN + FP + FN} \tag{4} \]

\[ \text{Average Precision}\,(AP) = \sum_{n} \left( \text{Recall}_n - \text{Recall}_{n-1} \right) \text{Precision}_n \tag{5} \]

\[ \text{Mean Average Precision}\,(mAP) = \frac{1}{N} \sum_{i=1}^{N} AP_i \tag{6} \]
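As a small illustration, the following sketch computes the metrics of Eqs. (1)–(6) directly from confusion-matrix counts and from a dummy precision–recall curve; all numbers are placeholders, and the per-class AP list used for Eq. (6) is assumed to come from a separate evaluation run.

```python
# Illustrative sketch: computing the evaluation metrics of Eqs. (1)-(6) from
# confusion-matrix counts and a precision-recall curve. All values are dummies.
def precision(tp, fp):
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    return tp / (tp + fn) if (tp + fn) else 0.0

def f1_score(p, r):
    return 2 * p * r / (p + r) if (p + r) else 0.0

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def average_precision(recalls, precisions):
    """Eq. (5): AP as precision weighted by the increase in recall."""
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap

def mean_average_precision(ap_per_class):
    """Eq. (6): mAP as the mean of the per-class average precisions."""
    return sum(ap_per_class) / len(ap_per_class)

# Example with dummy detection results for a single class.
tp, tn, fp, fn = 85, 40, 10, 15
p, r = precision(tp, fp), recall(tp, fn)
print(f"P={p:.2f} R={r:.2f} F1={f1_score(p, r):.2f} A={accuracy(tp, tn, fp, fn):.2f}")
print(f"AP={average_precision([0.2, 0.5, 0.8], [1.0, 0.9, 0.7]):.2f}")
print(f"mAP={mean_average_precision([0.85, 0.90]):.2f}")
```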
Agricultural Applications
The agricultural applications that are reviewed consider both plant and animal moni-
toring. A graphical representation of these UAV monitoring systems is detailed in
Fig. 7.

3 Findings on Applications of Deep Learning Models in Plant Monitoring

The traditional agricultural monitoring systems used to tackle challenges such as a changing
climate, food security, and a growing population have been established to be insufficient
to address these requirements [118]. The recent advancement of automated UAV
monitoring systems with deep learning models has been proven to improve plant yield
efficiently with minimal environmental damage [134]. Plant monitoring at different
growth stages is an important attribute to ensure that plant problems are combated in time
for a better management strategy [29]. It is vital for farmers to have proper yield estimates
to enable preparation for harvesting and market supply projections [9]. Pest outbreaks
in crops are unpredictable, so continuous monitoring to prevent crop losses is
of paramount importance [42]. The capability of near real-time pest detection aids
in making immediate and appropriate decisions on time [25]. The implementation of
deep learning algorithms with drones contributes to appropriate decision making at
the right time for prompt plant monitoring tactics [125]. It has been established
to increase agricultural productivity as well as efficiency and to save costs [29]. The
study conducted by [101] addressed insect monitoring through automated detection
to protect soft-skinned fruits; it was established to be cost-effective, less time-
consuming and less labour-demanding as compared to the existing monitoring methods.

Fig. 7 Agricultural applications

3.1 Pest Infiltration

Pest infiltration greatly impacts the overall plant production yield [70]. Pests can
cause a production yield loss of roughly 20–40% [49]. Due to the gradual climate
change over the years, pest outbreaks have become common [84]. The usage of deep learning
techniques with drones to combat pests in plants is a practical approach [120]. The
study conducted by [25] automatically detected pests through a deep learning technique
(YOLO v3 tiny) in near real-time. This capability enabled the drone to spray the pests at
the appropriate location, thereby ensuring less destruction and improving production
yield. The capability to detect, locate and quantify pest distribution with the aid of
drones and deep learning techniques eliminates human labour, which is costly and
time-consuming [33]. The capability to detect pests during production minimises yield
loss in time [24]. Rice is one of the primary agricultural products in Asian countries,
and effective monitoring is vital to ensure quality and a high production yield [17, 58].
Table 1 highlights the advantages and disadvantages of a summary of studies that use
UAV monitoring systems for automatic pest detection with the application of
deep learning models. Deep learning models provide the capability of near
real-time plant monitoring; the mobile hardware needed to achieve this
requires a high-performance graphics processing unit (GPU) to improve the model performance.
Deep learning models hold great opportunities for better feature extraction capabil-
ities vital for pest detection. An increase in drone altitude and occlusion reduce the
capability of automatic pest detection.
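To illustrate how a lightweight detector of the kind used in these studies might be run on a single UAV frame, the following sketch performs tiny-YOLO (Darknet) inference with OpenCV's DNN module; the configuration, weight and image file names, as well as the thresholds, are hypothetical placeholders rather than artefacts of any reviewed study.

```python
# Illustrative sketch: running a tiny YOLO (Darknet) detector on one UAV frame
# with OpenCV's DNN module, as commonly done for near real-time pest or crop
# detection. File names and thresholds below are assumptions for illustration.
import cv2
import numpy as np

CONF_THRESH, NMS_THRESH = 0.4, 0.45

net = cv2.dnn.readNetFromDarknet("yolov4-tiny-pest.cfg", "yolov4-tiny-pest.weights")
layer_names = net.getLayerNames()
out_layers = [layer_names[i - 1] for i in net.getUnconnectedOutLayers().flatten()]

frame = cv2.imread("uav_frame.jpg")
h, w = frame.shape[:2]

# YOLO expects a square, normalised blob; 416x416 keeps tiny models fast.
blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), swapRB=True, crop=False)
net.setInput(blob)
outputs = net.forward(out_layers)

boxes, confidences = [], []
for output in outputs:
    for det in output:
        scores = det[5:]
        conf = float(scores[np.argmax(scores)])
        if conf > CONF_THRESH:
            cx, cy, bw, bh = det[:4] * np.array([w, h, w, h])
            boxes.append([int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh)])
            confidences.append(conf)

# Non-maximum suppression removes duplicate boxes before any counting or
# spraying-decision logic is applied.
keep = cv2.dnn.NMSBoxes(boxes, confidences, CONF_THRESH, NMS_THRESH)
print(f"Detections kept after NMS: {len(keep)}")
```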

3.2 Plant Growth

Agriculture has a great impact on human survivability. Monitoring agricultural yield
during plant growth is ideal to ensure improvements in food planning and
security. The ability to acquire agricultural data at different growth stages is vital for
better farm management strategies [28]. The capability of near real-time monitoring
of seedling germination is vital for early crop yield estimation [48]. The determi-
nation of plant flowering is vital for agricultural monitoring purposes [3]. It is also
essential to establish plant maturity at the appropriate time for decision-making
purposes [79], thus enabling yield prediction for better field management. The use
of deep learning models with drones has been established to save costs and time as
compared to the traditional physical approach [139]. The study conducted by [79]
highlighted the effectiveness of saving time and costs when estimating the maturity
of soybeans over time. The ability to attain pre-harvest information, such as product
number and size, is important for management decision-making and marketing plan-
ning [127]. Conventional maize plant growth monitoring was established to be
time-demanding and labour-intensive as compared to the use of deep learning with
drones [88]. A study by [45] highlighted the capability of automatically monitoring oil palms
that are not accessible to humans to establish whether they are ready for harvest or not.
Table 2 highlights the capability of plant growth monitoring at different stages,
such as germination, flowering, immaturity and maturity, for UAV agricultural appli-
cations. Plant growth monitoring provides an effective approach to the
early decision making needed to ensure a high production yield and better agricultural
planning. Figure 8 demonstrates the capability of automatic seedling detection with a
lightweight deep learning algorithm from a UAV platform. Factors such as
occlusion due to plant growth, environmental background conditions, and the density of the
surrounding vegetation, to name a few, impact the overall performance of UAV monitoring
systems. Hardware with a graphics processing unit is needed to enable the
near real-time capability essential for appropriate and timely action.

Table 1 Advantages and disadvantages of UAV based pest infiltration monitoring applications
Main focus Pests Advantages Disadvantages References
Rice pest Leafhopper, Instant evaluation Requires an [17]
detection Insect, Arthropod, and prevent rice internet platform
Invertebrate, Bug damage in timely
manner
Coconut trees Rhinoceros Immediate pest High rate Wi-Fi [24]
pest detection beetle, Red palm monitoring required
weevil, Black capability
headed
caterpillar,
Coconut
Eriophyid mite,
Termite
Longan crop pest Tessaratoma Approach hold Limited power [25]
detection papillosa near real-time for GPU systems
capability with is a challenge
optimum pest
location and
distribution
Maize pest Spodoptera Early detection of Some leaves [33]
detection frugiperda infected maize under occlusion
leaves in time could not detect
pest damage
Maize leaves Fall armyworm Capability of near Challenge to [43]
affected by fall real-time results established exact
armyworms location for
future reference
Detect and Insects Deep learning Hardware with [106]
measure models high memory
leaf-cutting ant (YOLOv5s) (GPU) need
nest outperformed the
traditional
multilayer
perceptron neural
network
Soybean crop Acrididae, Deep learning Pest detection [120]
pest detection Anticarsia approach challenge at
gemmatalis, outsmarted the higher drone
Coccinellidae, other feature altitudes
Diabrotica extraction models
speciosa, Edessa
meditabunda,
Euschistus heros
adult, Euschistus
heros nymph,
Gastropoda,
Lagria villosa,
Nezara viridula
adult, Nezara
viridula nymph,
Spodoptera spp.

Table 2 Advantages and disadvantages of plant growth monitoring applications


Main focus Stage Advantages Disadvantages References
Maize growth Flowering Feasibility of Challenge for thin [3]
automatic tassel tassel detection
detection
Sea cucumber Population The ability to Challenge for sea [65]
distribution growth monitor sea cucumber
cucumber detection in a
efficiently over a dense
large piece of environment
land
Peanut seedling Seedling The approach Requires a GPU [67]
detection and emergence saved hardware to
counting approximately achieve near
80% total time to real-time
count germinated capability
seedling as
compared to
human effort
Cotton seedling Seedling Positive Environmental [68]
detection and emergence capability for conditions such
count accurate and as soil
timely cotton background,
detection and lighting, clouds,
count wind affects deep
learning model
performance
Soybean maturity Maturity The approach Weather or soil [79]
aided making conditions can
decision in lead to
soybean maturity discrepancy
over time results
Individual Olive Tree biovolume Positive Fine resolution [104]
tree estimation estimation of provided better
biovolume for detection
automatic tree accuracy as
growth compared to
monitoring coarse resolution
Rice phenology Germination, leaf Estimated harvest Challenge for [133]
development, time detection of rice
tillering, stem corresponded phenology at
elongation, well with early stage
inflorescence expected harvest
emergence and time
heading,
flowering and
anthesis,
development of
fruit, ripening

Fig. 8 Sample seedling detection from a UAV platform with custom YOLO v4 tiny

3.3 Fruit Conditions

The capability to estimate fruit yield and location is important for farm man-
agement and planning purposes [45]. The combination of drones with deep learning algo-
rithms has been established as an effective method to locate the ideal areas for picking
fruits [64]. The traditional approach has been determined to be time-consuming and
labour-demanding [130]. The drone-based approach enhances fruit detection in challenging ter-
rains. The study conducted by [8] established that automatic detection, counting and sizing
of citrus fruits on individual trees aided yield estimation. Another study by [129]
highlighted the capability of automatic mango detection and estimation using a UAV
monitoring system with deep learning algorithms. The study conducted by [50] high-
lighted melon detection, size estimation and localisation through a UAV monitoring
system to be less labour-intensive.
Table 3 highlights the positive capabilities and research gaps of UAV monitoring
systems using deep learning algorithms for fruit condition monitoring. Some of the
advantages highlighted include effective approaches such as promising automatic
counting accuracy as compared to the manual approach. Figure 9 illustrates the
capability of automatic ripe lemon detection needed for harvest planning pur-
poses. High errors are contributed by the fruit tree canopy during fruit detection and
counting for yield estimation [78]. Environmental conditions such as occlusion and lighting
variations persist as existing challenges to fruit condition monitoring.

3.4 Weed Invasion

The presence of weeds poses a great challenge to the overall agricultural production
yield [80]. Weeds can cause a production loss of as much as 45% of the yield
[99]. Weeds, also known as unwanted crops, in the agricultural field

Table 3 Advantages and disadvantages of fruit conditions monitoring applications


Main focus Advantages Disadvantages References
Fruit size and yield Positive yield Lightning conditions, [8]
estimation estimation of citrus distance to the citrus
fruits with average fruit to name a few
standard error of affected the detection
6.59% capability
Strawberry yield Promising strawberry Misclassification of [27]
prediction forecast monitoring mature and immature
system with counting strawberry with dead
accuracy of 84.1% leaves. Occlusion still
exists for strawberries
under leaves
Detection of oil palm The capability of Some fruits are [47]
loose fruits lightweight models subjected to occlusion
aids in fruit detection
at lower costs
To detect and locate Capable to establish Occlusion from [64]
longan fruits an accurate location to branches and leaves
pick longan fruits with hampered the
a capability total time detection capability
of 13.7 s
Detection of green An effective mango Lighting variation [129]
mangoes yield estimation affected the mango
approach with detection accuracy
estimation error rate of
1.1%
Strawberry monitoring Efficient and effective False classification [135]
approach to monitor and missed strawberry
strawberry plant detection still exist
growth
Strawberry maturity To automatically Strawberry flowers [139]
monitoring detect strawberry and fruits under leaves
flowering, immature were not detected
and mature fruits

compete for needed resources such as sunlight, water, soil nutrients and growing
space [107]. An effective approach to weed monitoring is vital for appropriate farm
management [66]. Early weed detection is important to improve agricultural produc-
tivity and ensure sustainable agriculture [86]. The use of herbicides to combat weeds
has negative consequences, such as damage to the surrounding environment
and harm to human health [11]. Using an appropriate quantity of herbicides
and pesticides, based on correct weed identification and location, is an impor-
tant factor [41, 56]. Drones hold the capability of precise localisation and appropriate
chemical usage [56]. The ability to detect weeds in near real-time is vital for farm
management [76, 124].

Fig. 9 Automatic ripe lemon detection from a UAV platform with custom YOLO v4 tiny

The traditional approach to weed detection is manual, labour-intensive
and time-consuming [85].
Table 4 highlights the advantages and disadvantages of UAV monitoring systems for
weed detection through deep learning models. The capability of weed detection with
deep learning models is an effective approach for early weed management strategies
to ensure a high agricultural production yield. From the reviewed studies, some of
the challenges encountered include high drone altitude, lighting conditions, and the
automatic detection of weed species that were not classified during deep learning model
training.

3.5 Crop Disease Monitoring

Crop diseases have been established to hamper production yield, hence the need for
extensive crop monitoring [19, 119, 138]. They also have an economic impact on
agricultural production in terms of quality and quantity [55]. Automatic crop disease
detection is critically important for crop management and efficiency [54]. Early and
correct crop disease detection aids plant management strategies in a timely manner
and ensures a high production yield [52, 62]. Early disease symptoms are likely to look
identical, and proper classification is vital to tackle them [1, 35]. Plant diseases
have been established to affect the continuity of the food supply. The study by [117] used
Mask R-CNN for the successful automatic detection of northern leaf blight, which affects
maize. The traditional manual methods of disease identification have been established
to be time-consuming and labour-demanding compared to drones [60]. They are also susceptible to
human error [46]. Figure 10 presents a visualisation of crop disease symptoms vital for
monitoring purposes. The advantages and disadvantages of crop disease monitoring
are highlighted in Table 5.

Table 4 Advantages and disadvantages of weed invasion monitoring applications


Main focus Weed Advantages Disadvantages References
Weed detection Species name not Capability of Uneven plant [11]
based on provided automatic weed spacing hampered
inter-row localisation the model
detection performance as
weeds are
identical to plants
Crop and weed Matricaria Promising Limited [22]
classification chamomilla L., classification capability to
Papaver rhoeas capability classify unknown
L., Veronica species
hederifolia L.,
Viola arvensis
ssp. arvensis
Weed detection – A cost-effective Demanding task [44]
and localisation approach for for labelling the
weed detection dataset for the
and location growing season
estimation
Automatic near Rice weed Aids the Different drone [61]
real-time rice capability of nearaltitude and angle
weed detection real-time weed under various
detection light conditions
affect detection
capability
Detection of Heracleum Capability for Power [76]
hogweed sosnówskyi hogweed consumption for
identification on embedded system
embedded system (NVIDIA Jetson
for near real-time Nano) is sensitive
to combat the to input voltage
weed spreading in
a timely manner
Weed estimation Species name not Capability of An increase in [86]
provided automatic plant area coverage the
detection and deep learning
weed estimation model (YOLO)
overestimated the
weed distribution
Weed detection Avena Fatu, Capability of near Lightweight [137]
and location in Cirsium Setosum, real-time models have low
wheat field Descurainia automatic detection
Sophia, detection with accuracy as
Euphorbia lightweight compared to full
Helioscopia, models on mobile models
Veronica Didyma devices

Fig. 10 Discolouration at the edges of the vegetable leaves

4 Findings on Applications of Deep Learning Models in Animal Monitoring

Drones have been established to be an effective approach to monitoring livestock
remotely, as compared to stationary cameras, for farm management [13, 69]. Their
incorporation with deep learning techniques has highlighted positive outcomes for
automatic livestock identification [100]. They also offer the feasibility of monitoring
animals for various purposes, such as behaviour, health and movement patterns, which
is a critical element of proper farm management [12, 91, 98]. They are a time-saving and
cost-effective alternative as compared to traditional approaches, such as
the use of horse riding or vehicles for frequent visual physical monitoring of farm animals
[2, 14, 16]. The capability of continuous animal monitoring is deemed a complex
task, and other alternative approaches, such as the use of GPS devices, are expensive
[123]. The study conducted by [6] on automatic individual cattle identification,
implemented with a drone and deep learning, yielded a promising proof-of-
concept for agricultural automation for cattle production monitoring in near real-time.
Another study by [4] highlighted that deep learning techniques can be implemented
for automatic cattle counting, which is less labour-demanding as compared to the manual
approach.

Table 5 Advantages and disadvantages of crop disease monitoring applications


Main focus Crop disease Advantages Disadvantages References
Automatic Esca Efficient Data [52]
detection of Esca monitoring preprocessing
disease approach required human
with expertise to
label data for
training purposes
Automatic Mildew Approach Colour variations [53]
detection of vine monitoring slightly affect the
disease method model
incorporates performance
visible, infrared
spectrum and
depth mapping
Automatic Mildew Potential early Limited training [54]
detection of monitoring under dataset
mildew disease visible and
infrared spectrum
Identification of Cercospora leaf Cost-effective Drone motion [97]
affected okra spot approach to okra highlight to
plants disease plant hamper the
monitoring detection
accuracy
Detection and Brown rust & Capability to Colour variation [103]
classification of yellow rust detect and similar
diseased-infected automatically background
wheat crops detect diseased hampered
individual leaves detection
capability
Automatic Powdery mildew Efficient Different lighting [119]
soybean leaf approach to and background
disease monitor soaybean variation
identification disease conditions can
hamper
identification
capabilities
Automatic Yellow rust Elimimates the Low possibility [138]
detection of manual approach of
yellow rust and skilled misclassification
disease personnel still exists

Table 6 UAV based deep learning models used for plant monitoring applications
Deep learning UAV platform Application Findings Performance References
YOLO v3 Self assembled Pest detection Efficient mAP: [25]
YOLO v3 tiny APD-616X Pest location pesticide usage YOLO v3: 93.00% YOLO
v3 tiny: 89.00%
Speed:
YOLO v3: 2.95 FPS
YOLO v3 tiny: 8.71 FPS
ResNeSt-50 DJI Mavic air 2 Pest detection Effective pest Validation accuracy: [33]
ResNet-50 Pest location monitoring ResNeSt-50: 98.77%
Efficient Net Pest approach ResNet-50: 97.59%
RegNet Quantification Efficient Net: 97.89%
RegNet: 98.07%
VGG-16 DJI Phantom 4 Pest damage Effective Accuracy: [43]
VGG-19 Pro approach to VGG-16: 96.00%
Xception v3 increase crop VGG-19: 93.08%
MobileNet v2 yield Xception v3: 96.75%
MobileNet v2: 98.25%
YOLO v5xl DJI Phantom 4 Ant nest pest Precise Accuracy: [106]
YOLO v5l Adv detection monitoring YOLO v5xl: 97.62%
YOLO v5m approach YOLO V5l: 98.45%
YOLO v5s YOLO v5m: 97.89%
YOLO v5s: 97.65%
Inception-V3 DJI Phantom 4 Pest control Alternative pest Accuracy: [120]
ResNet-50 Adv monitoring Inception-V3: 91.87%
VGG-16 strategies ResNet-50: 93.82%
VGG-19 VGG-16: 91.80%
Xception VGG-19: 91.33%
Xception: 90.52%
Faster R-CNN DJI Phantom 3 Maize tassel Positive F1 Score: [3]
CNN Pro detection productivity Faster R-CNN: 97.90%
growth CNN: 95.90%
monitoring
VGG-16 eBee Plus Crop Useful crop F1 Score: 86.00% [28]
identification identification
YOLO v3 DJI Phantom 4 Sea cucumber Successful mAP = 85.50% [65]
Pro detection Sea growth density Precision = 82.00%
cucumber estimation Recall = 83.00% F1
density Score = 82.00%
CenterNet DJI Phantom 4 Cotton stand Feasibility of mAP: [68]
MobileNet Pro detection early seedling CenterNet: 79.00%
Cotton stand monitoring MobileNet: 86.00%
count Average Precision:
CenterNet: 73.00%
MobileNet: 72.00%
Faster R-CNN DJI Phantom 3 Citrus grove Positive yield SE: 6.59% [8]
Pro detection Count estimation
Size estimation
R-CNN DJI Phantom 4 Apple detection Effective F1 Score > 87.00% [9]
Pro Apple approach for Precision > 90.00%
estimation yield prediction
RetinaNet DJI Phantom 4 Melon detection Successful Precision = 92.00% F1 [50]
Pro Melon number yield estimation Score > 90.00%
estimation
Melon weight
estimation
FPN DJI Jingwei Longan fruit Effective fruit mAP: [64]
SSD M600 PRO detection detection FPN: 54.22%
YOLO v3 Longan fruit SSD: 66.53%
YOLO v4 location YOLO v3: 72.56%
MobileNet- YOLO v4: 81.72%
YOLO v4 MobileNet-YOLOv4:
89.73%
YOLO v2 DJI Phantom 3 Mango Effective mAP: 86.40% [129]
detection approach for Precision: 96.10%
Mango mango Recall: 89.00%
estimation estimation
YOLO v3 DJI Phantom 4 Flower Effective mAP: 88.00% AP: [139]
Pro detection approach for 93.00%
Immature fruit yield prediction
detection
Mature fruit
detection
YOLO v3 DJI Matrice Monocot weed Capability of AP Monocot: 91.48% [32]
600 Pro detection Dicot weed detection AP Dicot: 86.13%
weed detection in the field
FCNN DJI Phantom 3 Hogweed Positive results ROC AUC: 96.00% [76]
DJI Mavic Pro detection for hogweed Speed: 0.46 FPS
Embedded detection
devices
Faster R-CNN DJI Matrice Weed detection Weed Precision: [124]
SSD 600 Pro monitoring Faster R-CNN: 65.00%
SSD: 66.00%
Recall:
Faster R-CNN: 68.00%
SSD: 68.00%
F1 Score:
Faster R-CNN: 66.00%
SSD: 67.00%
YOLO v3 tiny DJI Phantom 3 Weed detection Effective mAP = 72.50% [137]
approach for
weed detection
in wheat field
SegNet Quadcopter Mildew disease Promising Accuracy: [54]
(Scanopy) detection disease Grapevine-level > 92%
detection grapes Leaf-level > 87%
SqueezeNet Quadcopter Cercospora Promising Validation accuracy: [97]
ResNet-18 (Customised) Leaf Spot disease SqueezeNet: 99.10%
disease detection ResNet-18: 99.00%
detection
DCNN DJI S1000 Yellow dust Crop disease Accuracy: 85.00% [138]
detection monitoring
Inception-V3 DJI Phantom 3 Leaf disease Crop disease Accuracy: [119]
ResNet-50 Pro detection monitoring Inception-V3: 99.04%
VGG-19 ResNet-50: 99.02%
Xception VGG-19: 99.02%
Xception: 98.56%
1 [DCNN—Deep Convolutional Neural Network; Faster R-CNN—Faster Region-based Convolu-
tional Neural Network; FCNN—Fully Convolutional Neural Networks; FPN—Feature Pyramid
Network; VGG-16—Visual Geometry Group 16; VGG-19—Visual Geometry Group 19; YOLO
v2—You Only Look Once version 2; YOLO v3—You Only Look Once version 3; YOLO v4—You
Only Look Once version 4;YOLO v5—You Only Look Once version 5; R-CNN—Region-based
Convolutional Neural Network; CNN—Convolutional Neural Network; SSD—Single Shot Detec-
tor; SE—Standard Error]

4.1 Animal Population

Farm management is deemed a demanding task on a large piece
of land [15]. The traditional approach to livestock counting is mainly a manual and
labour-demanding task, which is susceptible to errors [108]. Farms are most likely
to be in rural areas with limited personnel to perform frequent animal counts [37].
Drones have been investigated to enhance animal population management [131].
The study conducted by [123] investigated an alternative approach for the automatic
detection of animals and their activities in their natural habitat using a drone.
Figure 11 illustrates the automatic identification of sheep from a UAV platform with
a deep learning algorithm. Low drone altitudes hold a high probability of altering
animal behaviour as well as providing minimal animal coverage [89, 123]. The study conducted
by [109] highlighted that increasing the drone altitude from 80 to 120 m increased
sheep coverage, whereas the capability to detect sheep contours was hampered.
Fig. 11 Sample sheep detection from a UAV image with custom YOLO v4 tiny model

The study conducted by [21] highlighted great unease in a flock of sheep towards the
presence of a drone as compared to the presence of dogs and humans. Other challenges
included vegetation growth and old trees hindering better sheep detection capability.
Environmental conditions such as high winds or rain affect the deployment of a
drone. The advantages and disadvantages of the reviewed studies are highlighted in
Table 7.

Table 7 Advantages and disadvantages of animal identification and population count monitoring
applications
Main focus Advantages Disadvantages References
Sheep detection and Online approach Online approach used [2]
counting yielded promising more power as
results immediately as compared to offline
compared to offline
Individual cattle Capability of non Challenge of false [5]
identification intrusive cattle positive exists in cases
detection and such as multi-cattle
individual alignment and cattle
identification with similar features
on the cattle coat
Aerial cattle Several deep learning Challenges conditions [13]
monitoring algorithms highlighted such as blur images
capability of livestock hampered
monitoring
Aerial livestock Automatic sheep Challenge such as [109]
monitoring detection to aid sheep close contact sheep for
counting individual
identification and
sheep under trees and
bushes could not be
detected
Cattle detection and Capability for cattle The model [111]
count management for performance decreases
grazing purposes with fast moving
animals
Detecting and Capability of cattle Challenge of cattle [115]
counting cattle counting by deleting movement hampers
duplicate animals cattle counting
capability
Livestock detection Capability of livestock Challenge with [132]
for counting capability monitoring based on overestimation due to
mask detection limited training
images

5 Discussion and Comparison of Deep Learning Strategies in Agricultural Applications

The capability of effective agricultural monitoring is vital to ensure sustainable food
security for the growing human population under a changing climate. The recent
advancements in openly available deep learning models and readily available drones have
opened up research studies on automatic agricultural monitoring appli-
cations. Tables 6 and 8 from the reviewed studies primarily highlight over 91% usage
of off-the-shelf drones as compared to self-assembled or customised drones.
The advantages and disadvantages of different deep learning models from differ-
ent agricultural applications on UAV monitoring platforms are summarised under
each subsection: pest infiltration, plant growth, fruit conditions, weed invasion, crop
disease, and animal identification and count. The capability of deep learning strategies
on a UAV monitoring system greatly contributes to efficient and effective agri-
cultural outputs. They reduce the skilled human involvement and the time needed for
immediate decision-making. From the reviewed studies, the application of transfer
learning is a widely used approach for different agricultural deep learning mod-
els. The model size in terms of model layers, power consumption and processing
speed are the deciding factors in deep learning architecture selection. Recent
architecture versions outperform the older versions.
An effective pest monitoring strategy is vital for localisation and appropriate pes-
ticide usage to combat plant pests. Pest detection can also be interpreted through the extent
of the damage, and other indicators include ant nests. In pest management, the identifica-
tion of the pest location is essential so that pesticides are applied to the appropriate site to save
time and costs. Based on the reviewed studies, the investigated deep learning models
yielded a detection accuracy greater than 90.00%. The highest accuracy was achieved with
ResNeSt-50 at 98.77%. A possible reason is the high number of layers for
the feature extraction needed for better pest detection. It is vital to investigate differ-
ent deep learning models to identify the optimum one with respect to the performance
evaluation. The study conducted by [106] highlighted that significantly increasing
the number of convolutional kernels and hidden layers in the model architecture can
increase model accuracy only to a certain degree: YOLO v5xl, with more convolutional
kernels and hidden layers, had a lower accuracy as compared to YOLO v5l.
Plant growth monitoring at different stages aids better agricultural management,
such as appropriate and timely interventions to ensure a high production yield. Early,
timely intervention efforts allow crops with unsuccessful germination to be replenished to catch up
with the appropriate plant density, thus increasing the probability of a high production
yield. The later stages of plant growth monitoring aid in harvest estimation to help
with market decisions, manpower estimation and equipment requirements. Plant
growth is not significant within a day; thus, two-stage detectors are ideal for this
application. Lightweight and non-lightweight deep learning models were investi-
gated in the reviewed studies. The two-stage detectors attained high model perfor-
mance in terms of the F1 score, with Faster R-CNN at 97.90%. Two-stage detectors
have been established to achieve high detection accuracy due to their division of regions
of interest.

Table 8 UAV based deep learning models used for animal monitoring applications
Deep learning UAV platform Application Findings Performance References
YOLO Self- Sheep Promising Accuracy: [2]
Assembled detection on-board Offline processing: 60.00%
drone Sheep system Online pre-processing: 89%
counting Online processing: 97.00%
LRCN DJI Inspire Cattle Non-intrusive mAP: [5]
Mk1 detection Cattle detection: 99.3%
Single frame Single frame individual: 86.07%
individual Accuracy:
Video based Video based individual: 98.13%
individual
YOLO v2 DJI Matrice Individual Practical Accuracy: [6]
Inception v3 100 cattle biometric YOLO v2: 92.40%
identification identification Inception v3: 93.60%
CNN DJI Phantom Cattle Effective and Accuracy > 90.00% [15]
4 Pro DJI detection efficient
Mavic 2 Cattle approach
counting
YOLO v3 DJI Tello Cattle Improved Not provided [37]
Deep Sort detection livestock
Cattle monitoring
counting
YOLO v2 DJI Phantom Cattle Positive cattle Precision: 95.70% [111]
4 detection grazing Recall: 94.60%
Cattle management F1 Score: 95.20%
counting
VGG-16 DJI Phantom Canchim Promising Accuracy: [13]
VGG-19 4 Pro cattle results for VGG-16: 97.22%
ResNet-50 v2 detection detection VGG-19: 97.30%
ResNet-101 ResNet-50 v2: 97.70%
v2 ResNet-101 v2: 98.30%
ResNet-152 ResNet-152 v2: 96.70%
v2 MobileNet MobileNet: 98.30%
MobileNet v2 MobileNet v2: 78.70%
DenseNet 121 DenseNet 121: 85.20%
DenseNet 169 DenseNet 169: 93.50%
DenseNet 201 DenseNet 201: 93.50%
Xception v3 Xception v3: 97.90%
Inception Inception ResNet v2: 98.30%
ResNet v2 NASNet Mobile: 85.70%
NASNet NASNet Large: 99.20%
Mobile
NASNet
Large
Mask R-CNN DJI Mavic Pro Cattle Potential Accuracy: [131]
detection cattle Pastures: 94.00%
Cattle monitoring Feedlot: 92.00%
counting
Mask R-CNN DJI Mavic Pro Livestock Effective Accuracy: [132]
classification approach for Cattle: 96%
Livestock livestock Sheep: 92%
counting monitoring
1 [LRCN—Long-term Recurrent Convolutional Network; YOLO—You Only Look Once; YOLO
v2—You Only Look Once version 2; YOLO v3—You Only Look Once version 3; R-CNN—Region-
based Convolutional Neural Network; VGG-16—Visual Geometry Group 16; VGG-19—Visual
Geometry Group 19]

The disadvantage of two-stage detectors is their lower detection speed compared to one-
stage detectors for near real-time capability. Thus, applying a two-stage detector is
appropriate for plant growth agricultural monitoring.
Effective fruit condition monitoring is beneficial for better agricultural decision-
making in relation to fruit quantity, size, weight and degree-of-maturity estimation, to
name a few. These capabilities are needed for the production yield prediction required for
agricultural management planning, such as fruit picking and market value estimation.
Fruit detection can help in planning fruit picking by considering the ease or difficulty of
picking and possible dangers. This helps in acquiring appropriate equipment to ensure
a smooth harvest process during fruit picking time. Fruit detection is performed at
different stages, flowering, mature and immature, to help decide the harvest time
and ensure the maximum number of ripe fruits. The reviewed studies investigated
both lightweight and non-lightweight deep learning models. This approach requires
minimal time and less labour, and is less error-prone as compared to manual fruit
monitoring. The highest performance evaluation was 91.10% precision for mango
detection and estimation from the reviewed studies. Though the model performance
can be good, challenges such as tree occlusion and lighting variations can
hamper the overall model performance.
The presence of weeds hampers plant growth, as they compete for sunlight,
water, space and soil nutrients. Detecting them early and addressing them appropriately
greatly contributes to a better agricultural production yield. Detecting
weeds is a challenging task because their characteristic features are similar to those of the crop plants. To improve
the accuracy of weed detection, we have to increase our knowledge of the weeds expected
to be associated with a particular crop. Considering that there are many types of
weeds, we can concentrate on the ones most relevant to a specific crop and
disregard others. This way, we can save time in deep learning model training. The
highest performance evaluation for weed detection was established to be 96.00%,
achieved by the classification performance of the FCN model. A possible reason for this
high model performance is that the FCN makes the feature
extraction learning process faster, as it avoids the application of dense layers in the
model architecture.
Plant disease detection is commonly characterised by a change in leaf colour,
such as isolated spots, widespread spots, isolated lesions, or a cluster of lesions,
to name a few. SqueezeNet had the highest accuracy of 99.10%. Other studies high-
lighted high model accuracies of over 85.00%. A possible reason for this high accuracy
capability is the visible change in plant leaves used for detection purposes. We recommend
studies on detecting fallen or broken leaves caused by an external force to help
determine the plant's health.
Automatic livestock identification and counting from a UAV also requires min-
imal animal behavioural change in response to the presence of the drone. Most of the reviewed
studies individually identified livestock for population and health monitoring; how-
ever, other studies counted livestock without individual identification.
The higher the drone altitude, the greater the challenge of acquiring the
distinguishing features required by deep learning. There are limited studies establishing the
response of livestock towards drones at different altitudes, thus great caution must be
taken in livestock monitoring applications [40]. The highest per-
formance evaluation for livestock detection from the reviewed studies was identified
at 99.30% in terms of mean average precision with the LRCN model. LRCN is a
model approach ideal for visual features in videos, activities and image classifica-
tion. The incorporation of deep learning algorithms and livestock grazing monitoring
capability on drones can aid animal grazing management systems. It is essential
to ensure animal grazing management for agricultural sustainability and to maintain
continuous animal production [111].

6 Conclusions

The conventional approach of manual agricultural monitoring has been established
to be dependent on human skill and demanding in terms of time and labour. A UAV
monitoring system in agriculture provides an automatic tool to speed up assessment
and the application of intervention methods that can improve productivity. Deep learn-
ing is the most used machine learning tool in automatic agricultural applications, and its
advancements provide an efficient approach to intelligent drone systems. The appli-
cation of transfer learning speeds up the deep learning training process. The reviewed
studies report high performance evaluation results. The capability of near
real-time processing has been established to be vital for immediate agricultural management
decision making to ensure the probability of a high production yield. However, the
development of deep learning models with appropriate hardware, given high
power consumption and the need for good internet connectivity in the field, remains an area for improvement.
Innovative methods to tackle environmental conditions such as lighting variations,
tree occlusion and drone altitude can also be developed to improve UAV monitoring systems.
The collection of a vast and diverse amount of data under these different
conditions can be implemented to ensure the development of an accurate UAV system.
Additionally, the availability of publicly shared datasets obtained from UAVs would allow
different deep learning networks to be compared and accelerate the development of better monitor-
ing systems. The overall benefits motivate the development of an integrated, robust,
intelligent system for sustainable agriculture to ensure world food security.

Acknowledgements Not applicable.

References

1. Abdulridha, J., Ampatzidis, Y., Kakarla, S. C., & Roberts, P. (2020). Detection of target spot
and bacterial spot diseases in tomato using UAV-based and benchtop-based hyperspectral
imaging techniques. Precision Agriculture, 21(5), 955–978.
2. Al-Thani, N., Albuainain, A., Alnaimi, F., & Zorba, N. (2020). Drones for sheep livestock
monitoring. In 2020 IEEE 20th Mediterranean Electrotechnical Conference (MELECON)
(pp. 672–676). IEEE.

3. Alzadjali, A., Alali, M. H., Sivakumar, A. N. V., Deogun, J. S., Scott, S., Schnable, J. C., &
Shi, Y. (2021). Maize tassel detection from UAV imagery using deep learning. Frontiers in
Robotics and AI, 8.
4. de Andrade Porto, J. V., Rezende, F. P. C., Astolfi, G., de Moraes Weber, V. A., Pache, M. C.
B., & Pistori, H. (2021). Automatic counting of cattle with faster R-CNN on UAV images. In
Anais do XVII Workshop de Visão Computacional, SBC (pp. 1–6).
5. Andrew, W., Greatwood, C., & Burghardt, T. (2017). Visual localisation and individual identi-
fication of holstein friesian cattle via deep learning. In Proceedings of the IEEE International
Conference on Computer Vision Workshops (pp. 2850–2859).
6. Andrew, W., Greatwood, C., & Burghardt, T. (2019). Aerial animal biometrics: Individual
friesian cattle recovery and visual identification via an autonomous UAV with onboard deep
inference. In 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems
(IROS) (pp. 237–243). IEEE.
7. Anghelache, D., Persu, C., Dumitru, D., Băltatu, C., et al. (2021). Intelligent monitoring of
diseased plants using drones. Annals of the University of Craiova-Agriculture, Montanology,
Cadastre Series, 51(2), 146–151.
8. Apolo, O. E. A., Guanter, J. M., Cegarra, G. E., Raja, P., & Ruiz, M. P. (2020). Deep learning
techniques for estimation of the yield and size of citrus fruits using a UAV. European Journal
of Agronomy: The Official Journal of the European Society for Agronomy, 115(4), 183–194.
9. Apolo-Apolo, O. E., Pérez-Ruiz, M., Martínez-Guanter, J., & Valente, J. (2020). A cloud-
based environment for generating yield estimation maps from apple orchards using UAV
imagery and a deep learning technique. Frontiers in Plant Science, 11, 1086.
10. Ayamga, M., Akaba, S., & Nyaaba, A. A. (2021). Multifaceted applicability of drones: A
review. Technological Forecasting and Social Change, 167(120), 677.
11. Bah, M. D., Hafiane, A., & Canals, R. (2018). Deep learning with unsupervised data labeling
for weed detection in line crops in UAV images. Remote Sensing, 10(11), 1690.
12. Barbedo, J. G. A., & Koenigkan, L. V. (2018). Perspectives on the use of unmanned aerial
systems to monitor cattle. Outlook on Agriculture, 47(3), 214–222.
13. Barbedo, J. G. A., Koenigkan, L. V., Santos, T. T., & Santos, P. M. (2019). A study on the
detection of cattle in UAV images using deep learning. Sensors, 19(24), 5436.
14. Barbedo, J. G. A., Koenigkan, L. V., & Santos, P. M. (2020). Cattle detection using oblique
UAV images. Drones, 4(4), 75.
15. Barbedo, J. G. A., Koenigkan, L. V., Santos, P. M., & Ribeiro, A. R. B. (2020). Counting
cattle in UAV images-dealing with clustered animals and animal/background contrast changes.
Sensors, 20(7), 2126.
16. Behjati, M., Mohd Noh, A. B., Alobaidy, H. A., Zulkifley, M. A., Nordin, R., & Abdullah,
N. F. (2021). Lora communications as an enabler for internet of drones towards large-scale
livestock monitoring in rural farms. Sensors, 21(15), 5044.
17. Bhoi, S. K., Jena, K. K., Panda, S. K., Long, H. V., Kumar, R., Subbulakshmi, P., & Jebreen, H.
B. (2021). An internet of things assisted unmanned aerial vehicle based artificial intelligence
model for rice pest detection. Microprocessors and Microsystems, 80(103), 607.
18. Bhoj, S., Tarafdar, A., Singh, M., Gaur, G. (2022). Smart and automatic milking systems:
Benefits and prospects. In Smart and sustainable food technologies (pp. 87–121). Springer.
19. Bouguettaya, A., Zarzour, H., Kechida, A., Taberkit, A. M. (2021). Recent advances on UAV
and deep learning for early crop diseases identification: A short review. In 2021 International
Conference on Information Technology (ICIT) (pp. 334–339). IEEE.
20. Bouguettaya, A., Zarzour, H., Kechida, A., & Taberkit, A. M. (2022). Deep learning tech-
niques to classify agricultural crops through UAV imagery: A review. Neural Computing and
Applications, 34(12), 9511–9536.
21. Brunberg, E., Eythórsdóttir, E., Dỳrmundsson, Ó. R., & Grøva, L. (2020). The presence of
icelandic leadersheep affects flock behaviour when exposed to a predator test. Applied Animal
Behaviour Science, 232(105), 128.
22. de Camargo, T., Schirrmann, M., Landwehr, N., Dammer, K. H., & Pflanz, M. (2021). Opti-
mized deep learning model as a basis for fast UAV mapping of weed species in winter wheat
crops. Remote Sensing, 13(9), 1704.

23. Cauli, N., & Reforgiato Recupero, D. (2022). Survey on videos data augmentation for deep
learning models. Future Internet, 14(3), 93.
24. Chandy, A., et al. (2019). Pest infestation identification in coconut trees using deep learning.
Journal of Artificial Intelligence, 1(01), 10–18.
25. Chen, C. J., Huang, Y. Y., Li, Y. S., Chen, Y. C., Chang, C. Y., & Huang, Y. M. (2021)
Identification of fruit tree pests with deep learning on embedded drone to achieve accurate
pesticide spraying. IEEE Access, 9, 21,986–21,997.
26. Chen, J. W., Lin, W. J., Cheng, H. J., Hung, C. L., Lin, C. Y., & Chen, S. P. (2021). A
smartphone-based application for scale pest detection using multiple-object detection meth-
ods. Electronics, 10(4), 372.
27. Chen, Y., Lee, W. S., Gan, H., Peres, N., Fraisse, C., Zhang, Y., & He, Y. (2019). Strawberry
yield prediction based on a deep neural network using high-resolution aerial orthoimages.
Remote Sensing, 11(13), 1584.
28. Chew, R., Rineer, J., Beach, R., O’Neil, M., Ujeneza, N., Lapidus, D., Miano, T., Hegarty-
Craver, M., Polly, J., & Temple, D. S. (2020). Deep neural networks and transfer learning for
food crop identification in UAV images. Drones, 4(1), 7.
29. Delavarpour, N., Koparan, C., Nowatzki, J., Bajwa, S., & Sun, X. (2021). A technical study on
UAV characteristics for precision agriculture applications and associated practical challenges.
Remote Sensing, 13(6), 1204.
30. Dileep, M., Navaneeth, A., Ullagaddi, S., & Danti, A. (2020). A study and analysis on various
types of agricultural drones and its applications. In 2020 Fifth International Conference on
Research in Computational Intelligence and Communication Networks (ICRCICN) (pp. 181–
185). IEEE
31. Espejo-Garcia, B., Mylonas, N., Athanasakos, L., Vali, E., & Fountas, S. (2021). Combining
generative adversarial networks and agricultural transfer learning for weeds identification.
Biosystems Engineering, 204, 79–89.
32. Etienne, A., Ahmad, A., Aggarwal, V., & Saraswat, D. (2021). Deep learning-based object
detection system for identifying weeds using UAS imagery. Remote Sensing, 13(24), 5182.
33. Feng, J., Sun, Y., Zhang, K., Zhao, Y., Ren, Y., Chen, Y., Zhuang, H., & Chen, S. (2022).
Autonomous detection of spodoptera frugiperda by feeding symptoms directly from UAV
RGB imagery. Applied Sciences, 12(5), 2592.
34. Fenu, G., & Malloci, F. M. (2021). Forecasting plant and crop disease: an explorative study
on current algorithms. Big Data and Cognitive Computing, 5(1), 2.
35. Görlich, F., Marks, E., Mahlein, A. K., König, K., Lottes, P., & Stachniss, C. (2021). UAV-
based classification of cercospora leaf spot using RGB images. Drones, 5(2), 34.
36. Guo, Y., Liu, Y., Oerlemans, A., Lao, S., Wu, S., & Lew, M. S. (2016). Deep learning for
visual understanding: A review. Neurocomputing, 187, 27–48.
37. Hajar, M. M. A., Lazim, I. M., Rosdi, A. R., & Ramli, L. (2021). Autonomous UAV-based
cattle detection and counting using YOLOv3 and deep sort.
38. Hande, M. J. (2021). Indoor farming hydroponic plant grow chamber. International Journal
of Scientific Research and Engineering Trends, 7, 2050–2052.
39. Hasan, A. M., Sohel, F., Diepeveen, D., Laga, H., & Jones, M. G. (2021). A survey of deep
learning techniques for weed detection from images. Computers and Electronics in Agricul-
ture, 184(106), 067.
40. Herlin, A., Brunberg, E., Hultgren, J., Högberg, N., Rydberg, A., & Skarin, A. (2021). Animal
welfare implications of digital tools for monitoring and management of cattle and sheep on
pasture. Animals, 11(3), 829.
41. Huang, H., Lan, Y., Yang, A., Zhang, Y., Wen, S., & Deng, J. (2020). Deep learning versus
object-based image analysis (OBIA) in weed mapping of UAV imagery. International Journal
of Remote Sensing, 41(9), 3446–3479.
42. Iost Filho, F. H., Heldens, W. B., Kong, Z., & de Lange, E. S. (2020). Drones: innovative
technology for use in precision pest management. Journal of Economic Entomology, 113(1),
1–25.

43. Ishengoma, F. S., Rai, I. A., & Said, R. N. (2021). Identification of maize leaves infected
by fall armyworms using UAV-based imagery and convolutional neural networks. Computers
and Electronics in Agriculture, 184(106), 124.
44. Islam, N., Rashid, M. M., Wibowo, S., Wasimi, S., Morshed, A., Xu, C., & Moore, S. (2020).
Machine learning based approach for weed detection in chilli field using RGB images. In
The International Conference on Natural Computation, Fuzzy Systems and Knowledge
Discovery (pp. 1097–1105). Springer.
45. Jintasuttisak, T., Edirisinghe, E., & Elbattay, A. (2022). Deep neural network based date palm
tree detection in drone imagery. Computers and Electronics in Agriculture, 192(106), 560.
46. Joshi, R. C., Kaushik, M., Dutta, M. K., Srivastava, A., & Choudhary, N. (2021). Virleafnet:
Automatic analysis and viral disease diagnosis using deep-learning in vigna mungo plant.
Ecological Informatics, 61(101), 197.
47. Junos, M. H., Mohd Khairuddin, A. S., Thannirmalai, S., & Dahari, M. (2022). Automatic
detection of oil palm fruits from UAV images using an improved YOLO model. The Visual
Computer, 38(7), 2341–2355.
48. Juyal, P., & Sharma, S. (2021). Crop growth monitoring using unmanned aerial vehicle for farm
field management. In 2021 6th International Conference on Communication and Electronics
Systems (ICCES) (pp. 880–884). IEEE
49. Kaivosoja, J., Hautsalo, J., Heikkinen, J., Hiltunen, L., Ruuttunen, P., Näsi, R., Niemeläi-
nen, O., Lemsalu, M., Honkavaara, E., & Salonen, J. (2021). Reference measurements in
developing UAV systems for detecting pests, weeds, and diseases. Remote Sensing, 13(7),
1238.
50. Kalantar, A., Edan, Y., Gur, A., & Klapp, I. (2020). A deep learning system for single and
overall weight estimation of melons using unmanned aerial vehicle images. Computers and
Electronics in Agriculture, 178(105), 748.
51. Kamilaris, A., & Prenafeta-Boldú, F. X. (2018). Deep learning in agriculture: A survey.
Computers and Electronics in Agriculture, 147, 70–90.
52. Kerkech, M., Hafiane, A., & Canals, R. (2018). Deep leaning approach with colorimetric
spaces and vegetation indices for vine diseases detection in UAV images. Computers and
Electronics in Agriculture, 155, 237–243.
53. Kerkech, M., Hafiane, A., & Canals, R. (2020). Vddnet: Vine disease detection network based
on multispectral images and depth map. Remote Sensing, 12(20), 3305.
54. Kerkech, M., Hafiane, A., & Canals, R. (2020). Vine disease detection in UAV multispec-
tral images using optimized image registration and deep learning segmentation approach.
Computers and Electronics in Agriculture, 174(105), 446.
55. Kerkech, M., Hafiane, A., Canals, R., & Ros, F. (2020). Vine disease detection by deep
learning method combined with 3D depth information. In International Conference on Image
and Signal Processing (pp. 82–90). Springer.
56. Khan, S., Tufail, M., Khan, M. T., Khan, Z. A., & Anwar, S. (2021). Deep learning-based
identification system of weeds and crops in strawberry and pea fields for a precision agriculture
sprayer. Precision Agriculture, 22(6), 1711–1727.
57. Kitano, B. T., Mendes, C. C., Geus, A. R., Oliveira, H. C., & Souza, J. R. (2019). Corn plant
counting using deep learning and UAV images. IEEE Geoscience and Remote Sensing Letters.
58. Kitpo, N., & Inoue, M. (2018). Early rice disease detection and position mapping system using
drone and IoT architecture. In 2018 12th South East Asian Technical University Consortium
(SEATUC) (Vol. 1, pp. 1–5). IEEE
59. Krul, S., Pantos, C., Frangulea, M., & Valente, J. (2021). Visual SLAM for indoor livestock
and farming using a small drone with a monocular camera: A feasibility study. Drones, 5(2),
41.
60. Lan, Y., Huang, Z., Deng, X., Zhu, Z., Huang, H., Zheng, Z., Lian, B., Zeng, G., & Tong,
Z. (2020). Comparison of machine learning methods for citrus greening detection on UAV
multispectral images. Computers and Electronics in Agriculture, 171(105), 234.
61. Lan, Y., Huang, K., Yang, C., Lei, L., Ye, J., Zhang, J., Zeng, W., Zhang, Y., & Deng, J.
(2021). Real-time identification of rice weeds by UAV low-altitude remote sensing based on
improved semantic segmentation model. Remote Sensing, 13(21), 4370.
62. León-Rueda, W. A., León, C., Caro, S. G., & Ramírez-Gil, J. G. (2022). Identification of
diseases and physiological disorders in potato via multispectral drone imagery using machine
learning tools. Tropical Plant Pathology, 47(1), 152–167.
63. Li, B., Yang, B., Liu, C., Liu, F., Ji, R., & Ye, Q. (2021) Beyond max-margin: Class margin
equilibrium for few-shot object detection. In Proceedings of the IEEE/CVF Conference on
Computer Vision and Pattern Recognition (pp. 7363–7372).
64. Li, D., Sun, X., Elkhouchlaa, H., Jia, Y., Yao, Z., Lin, P., Li, J., & Lu, H. (2021). Fast detection
and location of longan fruits using UAV images. Computers and Electronics in Agriculture,
190(106), 465.
65. Li, J. Y., Duce, S., Joyce, K. E., & Xiang, W. (2021). Seecucumbers: Using deep learning and
drone imagery to detect sea cucumbers on coral reef flats. Drones, 5(2), 28.
66. Liang, W. C., Yang, Y. J., & Chao, C. M. (2019). Low-cost weed identification system using
drones. In 2019 Seventh International Symposium on Computing and Networking Workshops
(CANDARW) (pp. 260–263). IEEE.
67. Lin, Y., Chen, T., Liu, S., Cai, Y., Shi, H., Zheng, D., Lan, Y., Yue, X., & Zhang, L. (2022).
Quick and accurate monitoring peanut seedlings emergence rate through UAV video and deep
learning. Computers and Electronics in Agriculture, 197(106), 938.
68. Lin, Z., & Guo, W. (2021). Cotton stand counting from unmanned aerial system imagery
using mobilenet and centernet deep learning models. Remote Sensing, 13(14), 2822.
69. Liu, C., Jian, Z., Xie, M., & Cheng, I. (2021). A real-time mobile application for cattle tracking
using video captured from a drone. In 2021 International Symposium on Networks (pp. 1–6).
IEEE: Computers and Communications (ISNCC).
70. Liu, J., & Wang, X. (2021). Plant diseases and pests detection based on deep learning: A
review. Plant Methods, 17(1), 1–18.
71. Liu, J., Abbas, I., & Noor, R. S. (2021). Development of deep learning-based variable rate
agrochemical spraying system for targeted weeds control in strawberry crop. Agronomy, 11(8),
1480.
72. Loey, M., ElSawy, A., & Afify, M. (2020). Deep learning in plant diseases detection for agri-
cultural crops: A survey. International Journal of Service Science, Management, Engineering,
and Technology (IJSSMET), 11(2), 41–58.
73. Maes, W. H., & Steppe, K. (2019). Perspectives for remote sensing with unmanned aerial
vehicles in precision agriculture. Trends in plant science, 24(2), 152–164.
74. Mathew, A., Amudha, P., & Sivakumari, S. (2020). Deep learning techniques: An overview.
In International Conference on Advanced Machine Learning Technologies and Applications
(pp. 599–608). Springer
75. Meena, S. D., & Agilandeeswari, L. (2021). Smart animal detection and counting frame-
work for monitoring livestock in an autonomous unmanned ground vehicle using restricted
supervised learning and image fusion. Neural Processing Letters, 53(2), 1253–1285.
76. Menshchikov, A., Shadrin, D., Prutyanov, V., Lopatkin, D., Sosnin, S., Tsykunov, E., Iakovlev,
E., & Somov, A. (2021). Real-time detection of hogweed: UAV platform empowered by deep
learning. IEEE Transactions on Computers, 70(8), 1175–1188.
77. van der Merwe, D., Burchfield, D. R., Witt, T. D., Price, K. P., & Sharda, A. (2020). Drones
in agriculture. In Advances in agronomy (Vol. 162, pp. 1–30).
78. Mirhaji, H., Soleymani, M., Asakereh, A., & Mehdizadeh, S. A. (2021). Fruit detection and
load estimation of an orange orchard using the YOLO models through simple approaches
in different imaging and illumination conditions. Computers and Electronics in Agriculture,
191(106), 533.
79. Moeinizade, S., Pham, H., Han, Y., Dobbels, A., & Hu, G. (2022). An applied deep learning
approach for estimating soybean relative maturity from UAV imagery to aid plant breeding
decisions. Machine Learning with Applications, 7(100), 233.
80. Mohidem, N. A., Che’Ya, N. N., Juraimi, A. S., Fazlil Ilahi, W. F., Mohd Roslim, M. H.,
Sulaiman, N., Saberioon, M., & Mohd Noor, N. (2021). How can unmanned aerial vehicles
be used for detecting weeds in agricultural fields? Agriculture, 11(10), 1004.
81. Monteiro, A., Santos, S., & Gonçalves, P. (2021). Precision agriculture for crop and livestock
farming-brief review. Animals, 11(8), 2345.
82. Nazir, S., & Kaleem, M. (2021). Advances in image acquisition and processing technologies
transforming animal ecological studies. Ecological Informatics, 61(101), 212.
83. Nematzadeh, S., Kiani, F., Torkamanian-Afshar, M., & Aydin, N. (2022). Tuning hyperpa-
rameters of machine learning algorithms and deep neural networks using metaheuristics: A
bioinformatics study on biomedical and biological cases. Computational Biology and Chem-
istry, 97(107), 619.
84. Nguyen, H. T., Lopez Caceres, M. L., Moritake, K., Kentsch, S., Shu, H., & Diez, Y. (2021).
Individual sick fir tree (abies mariesii) identification in insect infested forests by means of
UAV images and deep learning. Remote Sensing, 13(2), 260.
85. Ofori, M., El-Gayar, O. F. (2020). Towards deep learning for weed detection: Deep convolu-
tional neural network architectures for plant seedling classification.
86. Osorio, K., Puerto, A., Pedraza, C., Jamaica, D., & Rodríguez, L. (2020). A deep learning
approach for weed detection in lettuce crops using multispectral images. AgriEngineering,
2(3), 471–488.
87. Ouchra, H., & Belangour, A. (2021). Object detection approaches in images: A survey. In
Thirteenth International Conference on Digital Image Processing (ICDIP 2021) (Vol. 11878,
pp. 118780H). International Society for Optics and Photonics.
88. Pang, Y., Shi, Y., Gao, S., Jiang, F., Veeranampalayam-Sivakumar, A. N., Thompson, L., Luck,
J., & Liu, C. (2020). Improved crop row detection with deep neural network for early-season
maize stand count in UAV imagery. Computers and Electronics in Agriculture, 178(105), 766.
89. Petso, T., Jamisola, R. S., Mpoeleng, D., & Mmereki, W. (2021) Individual animal and herd
identification using custom YOLO v3 and v4 with images taken from a UAV camera at different
altitudes. In 2021 IEEE 6th International Conference on Signal and Image Processing (ICSIP)
(pp. 33–39). IEEE.
90. Petso, T., Jamisola, R. S., Jr., Mpoeleng, D., Bennitt, E., & Mmereki, W. (2021). Automatic
animal identification from drone camera based on point pattern analysis of herd behaviour.
Ecological Informatics, 66(101), 485.
91. Petso, T., Jamisola, R. S., & Mpoeleng, D. (2022). Review on methods used for wildlife
species and individual identification. European Journal of Wildlife Research, 68(1), 1–18.
92. Ponnusamy, V., & Natarajan, S. (2021). Precision agriculture using advanced technology of
iot, unmanned aerial vehicle, augmented reality, and machine learning. In Smart Sensors for
Industrial Internet of Things (pp. 207–229). Springer.
93. Qian, W., Huang, Y., Liu, Q., Fan, W., Sun, Z., Dong, H., Wan, F., & Qiao, X. (2020). UAV
and a deep convolutional neural network for monitoring invasive alien plants in the wild.
Computers and Electronics in Agriculture, 174(105), 519.
94. Rachmawati, S., Putra, A. S., Priyatama, A., Parulian, D., Katarina, D., Habibie, M. T., Sia-
haan, M., Ningrum, E. P., Medikano, A., & Valentino, V. (2021). Application of drone technol-
ogy for mapping and monitoring of corn agricultural land. In 2021 International Conference
on ICT for Smart Society (ICISS) (pp. 1–5). IEEE.
95. Raheem, D., Dayoub, M., Birech, R., & Nakiyemba, A. (2021). The contribution of cereal
grains to food security and sustainability in Africa: potential application of UAV in Ghana,
Nigeria, Uganda, and Namibia. Urban Science, 5(1), 8.
96. Rahman, M. F. F., Fan, S., Zhang, Y., & Chen, L. (2021). A comparative study on application
of unmanned aerial vehicle systems in agriculture. Agriculture, 11(1), 22.
97. Rangarajan, A. K., Balu, E. J., Boligala, M. S., Jagannath, A., & Ranganathan, B. N. (2022).
A low-cost UAV for detection of Cercospora leaf spot in okra using deep convolutional neural
network. Multimedia Tools and Applications, 81(15), 21,565–21,589.
98. Raoult, V., Colefax, A. P., Allan, B. M., Cagnazzi, D., Castelblanco-Martínez, N., Ierodia-
conou, D., Johnston, D. W., Landeo-Yauri, S., Lyons, M., Pirotta, V., et al. (2020). Operational
protocols for the use of drones in marine animal research. Drones, 4(4), 64.
99. Razfar, N., True, J., Bassiouny, R., Venkatesh, V., & Kashef, R. (2022). Weed detection in
soybean crops using custom lightweight deep learning models. Journal of Agriculture and
Food Research, 8(100), 308.
100. Rivas, A., Chamoso, P., González-Briones, A., & Corchado, J. M. (2018). Detection of cattle
using drones and convolutional neural networks. Sensors, 18(7), 2048.
101. Roosjen, P. P., Kellenberger, B., Kooistra, L., Green, D. R., & Fahrentrapp, J. (2020). Deep
learning for automated detection of Drosophila suzukii: Potential for UAV-based monitoring.
Pest Management Science, 76(9), 2994–3002.
102. Roy, A. M., Bose, R., & Bhaduri, J. (2022). A fast accurate fine-grain object detection model
based on YOLOv4 deep neural network. Neural Computing and Applications, 34(5), 3895–
3921.
103. Safarijalal, B., Alborzi, Y., & Najafi, E. (2022). Automated wheat disease detection using a
ROS-based autonomous guided UAV.
104. Safonova, A., Guirado, E., Maglinets, Y., Alcaraz-Segura, D., & Tabik, S. (2021). Olive
tree biovolume from UAV multi-resolution image segmentation with mask R-CNN. Sensors,
21(5), 1617.
105. Saleem, M. H., Potgieter, J., & Arif, K. M. (2021). Automation in agriculture by machine
and deep learning techniques: A review of recent developments. Precision Agriculture, 22(6),
2053–2091.
106. dos Santos, A., Biesseck, B. J. G., Latte, N., de Lima Santos, I. C., dos Santos, W. P., Zanetti, R.,
& Zanuncio, J. C. (2022). Remote detection and measurement of leaf-cutting ant nests using
deep learning and an unmanned aerial vehicle. Computers and Electronics in Agriculture,
198(107), 071.
107. dos Santos, Ferreira A., Freitas, D. M., da Silva, G. G., Pistori, H., & Folhes, M. T. (2017).
Weed detection in soybean crops using convnets. Computers and Electronics in Agriculture,
143, 314–324.
108. Sarwar, F., Griffin, A., Periasamy, P., Portas, K., & Law, J. (2018). Detecting and counting
sheep with a convolutional neural network. In 2018 15th IEEE International Conference on
Advanced Video and Signal Based Surveillance (AVSS) (pp. 1–6). IEEE.
109. Sarwar, F., Griffin, A., Rehman, S. U., & Pasang, T. (2021). Detecting sheep in UAV images.
Computers and Electronics in Agriculture, 187(106), 219.
110. Shankar, R. H., Veeraraghavan, A., Sivaraman, K., Ramachandran, S. S., et al. (2018). Appli-
cation of UAV for pest, weeds and disease detection using open computer vision. In 2018
International Conference on Smart Systems and Inventive Technology (ICSSIT) (pp. 287–292).
IEEE
111. Shao, W., Kawakami, R., Yoshihashi, R., You, S., Kawase, H., & Naemura, T. (2020). Cattle
detection and counting in UAV images based on convolutional neural networks. International
Journal of Remote Sensing, 41(1), 31–52.
112. Sharma, A., Jain, A., Gupta, P., & Chowdary, V. (2020). Machine learning applications for
precision agriculture: A comprehensive review. IEEE Access, 9, 4843–4873.
113. Shorten, C., & Khoshgoftaar, T. M. (2019). A survey on image data augmentation for deep
learning. Journal of Big Data, 6(1), 1–48.
114. Skendžić, S., Zovko, M., Živković, I. P., Lešić, V., & Lemić, D. (2021). The impact of climate
change on agricultural insect pests. Insects, 12(5), 440.
115. Soares, V., Ponti, M., Gonçalves, R., & Campello, R. (2021). Cattle counting in the wild with
geolocated aerial images in large pasture areas. Computers and Electronics in Agriculture,
189(106), 354.
116. Stein, E. W. (2021). The transformative environmental effects large-scale indoor farming may
have on air, water, and soil. Air, Soil and Water Research, 14(1178622121995), 819.
117. Stewart, E. L., Wiesner-Hanks, T., Kaczmar, N., DeChant, C., Wu, H., Lipson, H., Nelson,
R. J., & Gore, M. A. (2019). Quantitative phenotyping of northern leaf blight in UAV images
using deep learning. Remote Sensing, 11(19), 2209.
118. Talaviya, T., Shah, D., Patel, N., Yagnik, H., & Shah, M. (2020). Implementation of artificial
intelligence in agriculture for optimisation of irrigation and application of pesticides and
herbicides. Artificial Intelligence in Agriculture, 4, 58–73.
119. Tetila, E. C., Machado, B. B., Menezes, G. K., Oliveira, Ad. S., Alvarez, M., Amorim, W. P.,
Belete, N. A. D. S., Da Silva, G. G., & Pistori, H. (2019). Automatic recognition of soybean
leaf diseases using UAV images and deep convolutional neural networks. IEEE Geoscience
and Remote Sensing Letters, 17(5), 903–907.
120. Tetila, E. C., Machado, B. B., Astolfi, G., de Souza Belete, N. A., Amorim, W. P., Roel, A. R.,
& Pistori, H. (2020). Detection and classification of soybean pests using deep learning with
UAV images. Computers and Electronics in Agriculture, 179(105), 836.
121. Tiwari, A., Sachdeva, K., & Jain, N. (2021). Computer vision and deep learning-based frame-
work for cattle monitoring. In 2021 IEEE 8th Uttar Pradesh Section International Conference
on Electrical (pp. 1–6). IEEE: Electronics and Computer Engineering (UPCON).
122. Ukwuoma, C. C., Zhiguang, Q., Bin Heyat, M. B., Ali, L., Almaspoor, Z., & Monday, H.
N. (2022). Recent advancements in fruit detection and classification using deep learning
techniques. Mathematical Problems in Engineering.
123. Vayssade, J. A., Arquet, R., & Bonneau, M. (2019). Automatic activity tracking of goats using
drone camera. Computers and Electronics in Agriculture, 162, 767–772.
124. Veeranampalayam Sivakumar, A. N., Li, J., Scott, S., Psota, E., Jhala, A., Luck, J. D., & Shi, Y.
(2020). Comparison of object detection and patch-based classification deep learning models
on mid-to late-season weed detection in UAV imagery. Remote Sensing, 12(13), 2136.
125. Velusamy, P., Rajendran, S., Mahendran, R. K., Naseer, S., Shafiq, M., & Choi, J. G. (2021).
Unmanned aerial vehicles (UAV) in precision agriculture: Applications and challenges. Ener-
gies, 15(1), 217.
126. Wani, J. A., Sharma, S., Muzamil, M., Ahmed, S., Sharma, S., & Singh, S. (2021). Machine
learning and deep learning based computational techniques in automatic agricultural dis-
eases detection: Methodologies, applications, and challenges. In Archives of Computational
Methods in Engineering (pp. 1–37).
127. Wittstruck, L., Kühling, I., Trautz, D., Kohlbrecher, M., & Jarmer, T. (2020). UAV-based
RGB imagery for hokkaido pumpkin (cucurbita max.) detection and yield estimation. Sensors,
21(1), 118.
128. Xie, W., Wei, S., Zheng, Z., Jiang, Y., & Yang, D. (2021). Recognition of defective carrots
based on deep learning and transfer learning. Food and Bioprocess Technology, 14(7), 1361–
1374.
129. Xiong, J., Liu, Z., Chen, S., Liu, B., Zheng, Z., Zhong, Z., Yang, Z., & Peng, H. (2020).
Visual detection of green mangoes by an unmanned aerial vehicle in orchards based on a deep
learning method. Biosystems Engineering, 194, 261–272.
130. Xiong, Y., Zeng, X., Chen, Y., Liao, J., Lai, W., & Zhu, M. (2022). An approach to detecting
and mapping individual fruit trees integrated YOLOv5 with UAV remote sensing.
131. Xu, B., Wang, W., Falzon, G., Kwan, P., Guo, L., Chen, G., Tait, A., & Schneider, D. (2020).
Automated cattle counting using mask R-CNN in quadcopter vision system. Computers and
Electronics in Agriculture, 171(105), 300.
132. Xu, B., Wang, W., Falzon, G., Kwan, P., Guo, L., Sun, Z., & Li, C. (2020). Livestock classi-
fication and counting in quadcopter aerial images using mask R-CNN. International Journal
of Remote Sensing, 41(21), 8121–8142.
133. Yang, Q., Shi, L., Han, J., Yu, J., & Huang, K. (2020). A near real-time deep learning approach
for detecting rice phenology based on UAV images. Agricultural and Forest Meteorology,
287(107), 938.
134. Yang, S., Yang, X., & Mo, J. (2018). The application of unmanned aircraft systems to plant
protection in china. Precision Agriculture, 19(2), 278–292.
135. Zhang, H., Lin, P., He, J., & Chen, Y. (2020) Accurate strawberry plant detection system based
on low-altitude remote sensing and deep learning technologies. In 2020 3rd International
Conference on Artificial Intelligence and Big Data (ICAIBD) (pp. 1–5). IEEE.
136. Zhang, H., Wang, L., Tian, T., & Yin, J. (2021). A review of unmanned aerial vehicle low-
altitude remote sensing (UAV-LARS) use in agricultural monitoring in china. Remote Sensing,
13(6), 1221.
137. Zhang, R., Wang, C., Hu, X., Liu, Y., Chen, S., et al. (2020) Weed location and recognition
based on UAV imaging and deep learning. International Journal of Precision Agricultural
Aviation, 3(1).
138. Zhang, X., Han, L., Dong, Y., Shi, Y., Huang, W., Han, L., González-Moreno, P., Ma, H., Ye,
H., & Sobeih, T. (2019). A deep learning-based approach for automated yellow rust disease
detection from high-resolution hyperspectral UAV images. Remote Sensing, 11(13), 1554.
139. Zhou, X., Lee, W. S., Ampatzidis, Y., Chen, Y., Peres, N., & Fraisse, C. (2021). Strawberry
Maturity Classification from UAV and Near-Ground Imaging Using Deep Learning. Smart
Agricultural Technology, 1(100), 001.
Navigation and Trajectory Planning
Techniques for Unmanned Aerial
Vehicles Swarm

Nada Mohammed Elfatih, Elmustafa Sayed Ali, and Rashid A. Saeed

Abstract Navigation and trajectory planning are among the most important issues in unmanned aerial vehicle (UAV) and robotics research. Recently, UAV swarms, or flying ad-hoc networks, have attracted extensive attention from the aviation industry, academia, and the research community, as they have become powerful tools for smart cities, rescue and disaster management, and military applications. A UAV swarm is a scenario in which multiple UAVs interact with one another. The control and communication structure of a UAV swarm requires specific decisions to improve its trajectory planning and navigation operations. In addition, it demands high processing time and power under resource scarcity to operate the flight plans efficiently. Artificial intelligence (AI) is a powerful tool that provides optimized and accurate solutions for decision-making and power-management issues, although it comes with high data communication and processing requirements. Leveraging AI for navigation and path planning adds considerable value and improves system robustness. The UAV industry is moving toward AI approaches in developing UAV swarms, promising more intelligent swarm interaction. Given the importance of this topic, this chapter provides a systematic review of AI approaches and the main algorithms that enable the development of navigation and trajectory planning strategies for UAV swarms.

Keywords UAV swarm · Drones · Small unmanned aircraft systems (UASs) · Flight robotics · Artificial intelligence · Control and communication

N. M. Elfatih · E. S. Ali (B) · R. A. Saeed


Department of Electrical and Electronics Engineering, Red Sea University (RSU), Port Sudan,
Sudan
e-mail: [email protected]
E. S. Ali
Department of Electronics Engineering, College of Engineering, Sudan University of Science and
Technology (SUST), Khartoum, Sudan
R. A. Saeed
Department of Computer Engineering, College of Computers and Information Technology, Taif
University, P.O. Box 11099, Taif 21944, Saudi Arabia

1 Introduction

Drones, also known as unmanned aerial vehicles (UAVs), can operate remotely without humans on board [1]. UAVs have been investigated as a disruptive technology that complements and supports operations traditionally performed by humans. Owing to their excellent mobility, flexibility, easy deployment, high performance, low maintenance, and adaptive altitude, UAVs are widely used in many civil and military applications, for example wildfire monitoring, traffic control, emergency rescue, the medical field, and intelligent transportation. UAVs can provide wide-coverage sensing of different environments [2].
For UAVs, various communication technologies and standards have emerged, including cloud computing, software-defined networking, and big data analytics. UAV design has also passed through several communication evolutions, beginning with 3G broadband signals to achieve higher data rates and moving toward 5G end-to-end connectivity. The evolution of UAV communications from 4G to 5G provides new technologies that support cellular communications for UAV operations with high reliability and efficient energy utilization [3]. 5G cellular networks provide enhanced UAV broadband communication and also enable a UAV to act as a flying base station for a UAV swarm and as a gateway to the ground cellular stations.
Navigation and trajectory planning are crucial issues for UAVs. Planning UAV trajectories in complex environments that contain many obstacles is one of the major challenges facing their application [4]. In addition, establishing a network of UAVs that can avoid collisions while taking their kinematic characteristics into account is one of the most important requirements in UAV swarm applications.
In view of these challenges, and to achieve both operational efficiency and safety for UAV swarms, it is important to plan the flight intelligently, especially in complex environments. Trajectory planning for UAVs has therefore become a research hotspot. In this chapter we provide a comprehensive review of the technical concepts behind UAV swarm architectures, applications, navigation, and trajectory planning technologies. Our main contributions are summarized as follows.
• A review of UAV swarm architectures, communications, and control systems.
• A discussion of the classifications of UAV swarm navigation and trajectory planning.
• A review of the most important intelligent techniques used for UAV swarm trajectory planning.
The rest of this chapter is organized as follows. Section 2 provides UAV technical background together with UAV swarm advantages and applications. Swarm communication and control system architectures are presented in Sect. 3. Section 4 covers navigation and path planning for UAV swarms. Classical techniques for UAV swarm navigation and path planning are reviewed in Sect. 5. In Sect. 6, reactive approaches for UAV swarm navigation and path planning are discussed. Finally, the conclusion is provided in Sect. 7.

2 UAV Technical Background

A milestone in the development of drone technology came in 2016, when federal law was announced in the United States regulating the public use of UAVs. Depending on the purpose, UAVs have been used in a number of fields such as agricultural monitoring, power-line inspection, and photography, in addition to various search and rescue operations [5]. In recent years, the concept of a UAV swarm has become an important research topic: managing a group of drones and enabling interaction between them through intelligent algorithms, so that the swarm can conduct joint operations along pre-defined paths controlled from an operations center. UAV operations depend on the capability for control, maneuvering, and power utilization. The following section provides a brief overview of UAV architecture and intelligent operations.

2.1 UAV Architecture

The architecture of a UAV consists of three layers, as shown in Fig. 1. These layers relate to data collection, processing, and operation [7]. The data collection layer consists of devices such as sensors, light detectors, and cameras. The other layers contain processing devices, control systems, and further systems related to maps and decision-making [8].
The central control system shown in Fig. 2 controls the UAV trajectory in the real environment. The controller manages the speed, flight control, radio, and power source. The components are described as follows [9].
• Speed controller: provides the high-frequency signals that drive the UAV motors and control their speed.
• Positioning system: calculates the time and location information of the UAV and determines the coordinates and altitude of the aircraft.
• Flight controller: manages flight operations by reading the positioning system information while controlling communications.
• Battery: UAV batteries are made of materials such as lithium polymer that give high energy density and long range; additional batteries can be added to support long-range flight.
• Gimbal: stabilizes the UAV payload about its three axes.
Fig. 1 UAVs architecture layers

Fig. 2 UAV systems and devices

• Sensors: a number of sensors on the UAV capture 3D images or detect obstacles in order to avoid collisions. A small illustrative control-loop sketch tying these components together is given below.
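To illustrate how these components interact in software, the following minimal Python sketch polls a positioning system and pushes a throttle command to a speed controller in a crude altitude-hold loop. All class and method names here are hypothetical and illustrative; they do not correspond to any actual autopilot API.

from dataclasses import dataclass

@dataclass
class Position:
    x: float
    y: float
    altitude: float

class PositioningSystem:
    def read(self) -> Position:
        # Placeholder reading; a real GNSS/IMU driver would go here.
        return Position(0.0, 0.0, 12.0)

class SpeedController:
    def set_throttle(self, value: float) -> None:
        # In a real UAV this would generate the high-frequency motor signals.
        print(f"throttle -> {value:.2f}")

class FlightController:
    """Tiny altitude-hold loop tying the positioning system and speed controller together."""

    def __init__(self, target_altitude: float):
        self.positioning = PositioningSystem()
        self.esc = SpeedController()
        self.target = target_altitude

    def step(self) -> None:
        pos = self.positioning.read()
        error = self.target - pos.altitude
        throttle = max(0.0, min(1.0, 0.5 + 0.05 * error))  # crude proportional control
        self.esc.set_throttle(throttle)

FlightController(target_altitude=15.0).step()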

2.2 UAV Swarm Current State

UAVs can act together and carry out different operational scenarios in a swarm, as a group of cooperating vehicles. Recent studies have explored the benefits and features of swarming insect and animal behavior in nature [10]. For example, bee and bird flocks provide an intelligent concept of flying that can inspire good solutions for UAV tasks. The swarm concept, as a behavior of complex collective operations, emerges through intelligent interactions between large numbers of UAVs [11]. A UAV swarm can carry out tactical operations with high efficiency and performance, in addition to increasing operational quality and reliability across different applications [12].
A UAV swarm can perform higher-level tasks than a single UAV. Moreover, UAV swarms allow for fault tolerance: if one UAV of the swarm is lost, the other group members can accomplish the assigned tasks by reallocating the missions to the surviving team members [13]. A swarm of UAVs can be deployed to perform various missions including searching, target tracking, high-accuracy search, and surveillance [14]. All of these operations are carried out by a squadron or group of UAVs directed by navigation, guidance, and control systems that manage UAV allocation, flight-area coordination, and communications. All these tasks operate within a complex system that combines various integrated technologies [15]. Additionally, artificial intelligence (AI) techniques are used in building UAV systems for purposes such as intelligent maneuvering, trajectory planning, and swarm interaction.

2.3 UAV Swarm Advantages

Many previous papers have discussed individual UAV systems and their various applications; however, few have studied UAV swarms and their limitations versus advantages. The literature makes it clear that UAVs serve a number of surveillance-related applications and scenarios and offer benefits even when used alone [15]. However, operating UAVs as a swarm provides further advantages over operating a single UAV, especially in search tasks: with a swarm, searching can be done in parallel and the range of operations can be increased considerably. Even so, a UAV swarm can face issues in trajectory planning and interaction. In general, the advantages of a UAV swarm compared with a single UAV are summarized in Table 1 [16].

A. Synchronized Actions

A swarm of UAVs can simultaneously collect information from different locations and can use the collected information to build a decision-making model for complex tasks [17].

Table 1 Comparison between single-UAV and swarm-UAV systems

Features                      Single UAV   Swarm UAV
Operations duration           Poor         High
Scalability                   Limited      High
Mission speed                 Slow         Fast
Independence                  Low          High
Cost                          High         Low
Communication requirements    High         Low
Radar cross section           Large        Small
B. Time Efficiency
A swarm of UAVs can reduce the time needed to carry out searching or monitoring tasks and missions. As an example, the authors in [7] present a study using a UAV swarm for detecting nuclear radiation to build a map for rescue operations.
C. Complementarity of Team Members
With a swarm of heterogeneous UAVs, further advantages can be achieved because different capabilities can be integrated into different operations and tasks at the same time [18].
D. Reliability
The UAV swarm delivers solutions with greater fault tolerance and flexibility in case a single UAV's mission fails.
E. Technology Evolution
With the development of integrated systems and miniaturization techniques, UAV models suited to swarm operation can be produced, characterized by light weight and small size [18].
F. Cost
A single high-performance UAV performing complex tasks is very costly compared with using a number of low-cost UAVs to perform the same task, where cost is related to power, size, and weight [18].

2.4 UAV Swarm Applications

A wide variety of applications, as shown in Fig. 3, use UAV swarm systems. Figure 4 shows the number of publications on UAV swarms and their applications between 2018 and 2022. The following subsections provide an overview of the most important UAV applications.
A. Photogrammetry
Photogrammetry extracts quantitative information from scanned images and recovers surface point positions. Several works have addressed UAV swarms performing imagery collection. For example, [19] presented a low-altitude thermal imaging system able to observe a specific area according to a flight plan.
B. Security and Surveillance
Many applications use UAV swarms for camera-based video surveillance to cover specific targets [20]. They also help in monitoring, traffic control operations, and many military surveillance operations.
Fig. 3 UAVs swarm systems applications

Fig. 4 Publications of UAVs Swarm systems applications (Google scholar)

C. Battlefields
Swarms of UAVs help cover battlefields to gather intelligence and transfer it to
ground receiving stations for decision-making [21]. In many military applications,
UAV swarms serve to locate enemy positions in urban or remote areas, on land or at sea.
D. Earth Monitoring
UAV swarms are used to monitor geophysical processes and pollutant levels by
means of sensors and processing units that make independent surveys along a
predetermined path [21].
E. Precision agriculture
In the agricultural field, UAV swarms help spray pesticides on plants to combat
agricultural pests while ensuring high productivity and efficiency [22]. They can also
monitor specific areas and analyze data to make spraying decisions.
F. Disaster Management and Goods Delivery
UAV swarms assist in rescue operations during disasters, especially in the early hours, and help deliver emergency medical supplies [24]. They can also assess risks and damage in a timely manner by means of integrated computing. Some companies, such as Amazon, are working on using UAV swarms to deliver goods through computing systems and the Internet [25, 26].
G. Healthcare Application
In healthcare, UAVs help to collect data at different medical levels, from patient-related sensor information to health centers [27]. One example of these applications is a UAV star network topology that uses radio alert technology to allocate resources and consists of the following stages.
• Stage 1: data collection, in which the UAV gathers patients' information.
• Stage 2: data reporting, in which the information collected by the UAV is reported to medical servers or doctors' end devices.
• Stage 3: data processing, in which decisions about the patient's healthcare are made to provide diagnosis and prescription. A toy sketch of this three-stage flow is given below.
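As an illustration of the three stages above, the short sketch below chains data collection, reporting, and processing into a single flow. The data fields, the alert thresholds, and the function names are invented for the example and are not taken from the chapter.

def collect_patient_data(patient_id: str) -> dict:
    # Stage 1: the UAV gathers patient sensor readings (values assumed).
    return {"patient": patient_id, "heart_rate": 96, "spo2": 93}

def report_to_server(record: dict, server: list) -> None:
    # Stage 2: the collected record is forwarded to a medical server or doctor's device.
    server.append(record)

def process_records(server: list) -> list:
    # Stage 3: a simple rule-based triage decision per record.
    return [{**r, "alert": r["spo2"] < 94 or r["heart_rate"] > 110} for r in server]

medical_server: list = []
report_to_server(collect_patient_data("P-001"), medical_server)
print(process_records(medical_server))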

3 Swarm Communication and Control System Architectures

To design efficient and highly stable UAV swarm communication architectures and protocols, it is necessary to take the following challenges into account [28]. Figure 5 shows a general reliable communication service scenario for UAVs. UAV swarm communication architectures and the services they provide must deal with many requirements, as follows.
• Increasing spectrum demand according to expected UAV applications.
• Higher bandwidth and data rates to support upstream data traffic for several UAV surveillance applications. Accordingly, there is a need to develop new strategies to handle the big data traffic between the UAV members in a swarm.
Fig. 5 UAVs network communication architecture

• Heterogeneous QoS is required in both uplink and downlink communications to integrate the operations of UAVs in a swarm through the cellular network [19].
• High mobility and dynamic topology in UAV swarms, in addition to high speeds, require high-reliability and low-latency communication networks.
• The ability to manage spectrum overcrowding is needed, since UAVs can operate on different bands such as the L and S bands, in addition to the medical and industrial bands [13]. UAVs are also able to communicate with other wireless technologies such as Bluetooth and Wi-Fi networks [29]. Accordingly, UAVs must carry several radio devices to deal with all these communication bands.

3.1 Centralized Communication Architecture

The UAV swarm communication architecture describes the mechanism for exchanging information among UAVs and with the ground infrastructure, and it plays an essential role in the network performance, intelligent control, and cooperation of UAV swarms [30]. Figure 6 shows the general UAV swarm communication architecture in the centralized approach. The central station, known as the Ground Control Station (GCS), enables communication with all UAV swarm members [31]. The centralized approach allows the UAV network to be extended from a single UAV to many UAVs. The GCS monitors the UAV swarm and makes decisions to manage the UAVs' speeds and positions [32]. The GCS also provides message control to let the UAVs communicate with one another [14].
Fig. 6 Centralized UAVs swarm architecture

3.2 Decentralized Communication Architecture

As the number of UAVs in the swarm increases, a decentralized communication approach can be used; it provides an organizational structure that reduces the number of UAVs connected to the central network and gives some UAVs independence [33]. Also, UAVs travelling long distances can lose their connection to the central network, so separate decentralized networks are allocated to the aircraft to carry out interactive communications in real time [15].

3.2.1 Single Group Swarm Ad Hoc Architecture

In a single-group swarm ad hoc network, as shown in Fig. 7, the swarm's internal communication does not depend on infrastructure. Communication between the swarm and the infrastructure is a single-point link through a specific UAV acting as a gateway [34]. In single-group swarm ad hoc networks, some UAVs act as relay nodes to forward data between the UAV members of the swarm. The UAVs can share situational information in real time to improve collaborative control and efficiency. Likewise, the interconnection between the UAV gateway and the infrastructure enables the exchange of swarm information [35].
The UAV gateway communicates with the other UAVs over short distances and with the infrastructure over long range. Gateways reduce the burden on the other UAVs by reducing their onboard devices and their cost, which helps to extend the communication range and speed up the maneuvering performance of the UAVs [16, 35]. However, to ensure consistent swarm communication, the flight patterns of all UAVs in the swarm must be similar and must operate under a single scenario proportional to their size and speed [17, 36].
Fig. 7 Single group swarm Ad hoc network architecture

3.2.2 Multi Group Swarm Ad Hoc Architecture

Dedicated single-group swarm networks can be combined with other networks, as shown in Fig. 8, so that each network has its own centralized or ad hoc structure with different applications depending on the task. The overall architecture is organized in a centralized manner, but the differences appear at the level of the UAVs within each private network group [37]. Communication within each UAV swarm group is similar to communication within a single swarm, with the mechanism for communication between groups defined by the infrastructure. The gateway UAVs are responsible for connecting to the infrastructure and for coordinating communications between the missions of the various UAV groups. This architecture supports specific multitasking applications in which groups conduct joint multi-theater military operations, with the central control center communicating with the different UAV swarms [18, 37].

3.2.3 Multi-Layer Swarm Ad Hoc Architecture

An ad hoc multi-layer swarm network architecture is an important type of architecture suitable for a wide range of UAVs, as shown in Fig. 9. A group of neighboring drones of the same type forms a dedicated network, which constitutes the first layer of the communication infrastructure [38]. Drone gateways of the different group types form the second layer, which connects each group to the nearest gateway and to the infrastructure. At the third layer, communication between any two arbitrary drones is not required; interconnection of drones within the same group is handled at the first level. The multi-layer ad hoc network architecture accommodates increases or decreases in the number of UAV nodes and quickly implements network reconfiguration [39]. It suits scenarios in which the swarm missions are complex and a large number of UAVs perform them, allowing changes in the network topology and in the communication between the UAVs [19, 39].

Fig. 8 Multi-group swarm Ad hoc network architecture
Fig. 9 Multi-layer swarm Ad hoc network architecture

Table 2 UAVs swarm communication architectures summary

Features                    Centralized   Single-group   Multi-group   Multi-layer
Multi-hop communication     ✕             ✓              ✓             ✓
UAVs relay                  ✕             ✓              ✓             ✓
Heterogeneous UAVs          ✕             ✕              ✓             ✓
Auto configuration          ✕             ✓              ✕             ✓
Limited coverage            ✓             ✓              ✓             ✕
Single point of failure     ✓             ✕              ✓             ✕
Robustness                  ✓             ✕              ✕             ✓
Note: “✓” = supported, “✕” = not supported; the single-group, multi-group, and multi-layer columns are decentralized architectures

According to what has been reviewed, UAV swarm communication architectures have evolved significantly to serve a number of different and important scenarios, and there are several structures to choose from. Table 2 summarizes the advantages and disadvantages of the discussed architectures. The centralized communication architecture is suitable for scenarios with a small number of UAVs and relatively simple tasks; the more complex the tasks and the larger the swarms, the more the other architectures are used according to the required scenario [40]. When coverage must be extended through a multi-hop network scenario, the decentralized communication architectures are suitable for this purpose [41].
Many communication technologies can provide UAV communications. Figure 10 shows the classification of UAV communication technologies into four types: cellular, satellite, Wi-Fi-based, and cognitive-radio-based UAV communications.

4 Navigation and Path Planning for UAV Swarm

UAV navigation can be described as the procedure by which the robot forms a strategy for reaching the goal position quickly and safely, which typically depends on the current location and the environment. For the scheduled assignment to be completed effectively, a UAV should be fully aware of its status, comprising heading direction, navigation speed, and location, as well as the target location and starting point [41]. Several navigation techniques have been introduced and can be divided into three groups: satellite, vision-based, and inertial navigation. However, none of these techniques is perfect, so it is critical to select a suitable one for UAV navigation according to the specific mission [42]. Vision-based navigation has proven to be a promising and primary research direction for autonomous navigation thanks to the rapid development of computer vision. First, visual sensors offer rich operational information about the surroundings; second, they are highly suitable for active environment perception due to their strong anti-interference capability; third, most visual sensors are passive, which helps them avoid detection by attackers [43].
Table 3 Summary of trajectory planning methods

CD (cell decomposition) — Approach: workspace modeling. Extends to 3D: yes. Advantages: can be extended to 3D; used in mobile robot applications. Disadvantages: needs a search algorithm such as A*.
VD (Voronoi diagram) — Approach: workspace modeling and roadmap. Extends to 3D: yes. Advantages: the obstacles remain far from the planned routes. Disadvantages: difficult to apply in a 3D environment.
VG (visibility graph) — Approach: workspace modeling and roadmap. Extends to 3D: yes. Advantages: better performance in polygon-based and regular environments. Disadvantages: the obstacles lie close to the planned route; hard to use in cluttered environments.
APF (artificial potential field) — Approach: potential field. Extends to 3D: yes. Advantages: no environment-modeling algorithm needed; low time complexity; easy path generation; real-time obstacle avoidance. Disadvantages: easy to fall into a local minimum.
A* — Approach: searching. Extends to 3D: yes. Advantages: low-cost and short path; can be combined with an environment-modeling algorithm (e.g., GD). Disadvantages: possibly high time complexity.
Dijkstra — Approach: searching. Extends to 3D: yes. Advantages: the shortest path is guaranteed. Disadvantages: high time complexity.
RM (roadmap) — Approach: workspace modeling and roadmap. Extends to 3D: yes. Advantages: path finding is guaranteed (the number of sampling nodes must be increased without bound). Disadvantages: no optimal path is guaranteed; hard to generate a path through a slight gap; computational complexity.

Path planning is defined as the method of finding the shortest, optimal path between a source and a destination, and it is one of the most important problems in the UAV arena. The core goal of UAV path planning is to find a flight path of acceptable cost that fulfils the UAV performance requirements with a small collision probability during the flight [20]. UAV route planning normally comprises three main terms [21, 44]: motion planning, navigation, and trajectory planning. Motion planning handles constraints such as the flight route and drives the route planning process. Trajectory planning covers route planning that includes velocity, time, and the UAV's kinematics, whereas navigation is concerned with localization and collision avoidance.
Fig. 10 UAV swarm communication technologies

For UAV route planning, a three-dimensional model of the environment is essential, since in a complex environment a two-dimensional (2D) route planning technique would not be able to discover all objects and obstacles. There are several UAV route planning techniques for obstacle navigation; three-dimensional (3D) techniques formulate route planning as an optimization problem [45].

4.1 UAVs Network Communication and Path Planning Architecture

A UAV needs to perform route planning while moving from a source to a destination. UAVs perceive the neighboring environment using sensors in order to navigate, control, and plan their flight. The UAV route planning stages that must be followed during operation are (i) climate and weather sensing, (ii) navigation, and (iii) UAV movement control; these stages are applied throughout the trip [46]. Climate and weather sensing gives the UAVs environmental awareness. Route planning and navigation methods are applied continuously in search of an optimal route. The UAV's movement and velocity are monitored by a central controller for collision avoidance. Furthermore, the UAVs need to communicate with neighboring UAVs for network management during their mission [47].
There are strong requirements for 3D route planning in complex environments. 2D route planning techniques are not suitable for such environments because they struggle to discover objects and obstacles compared with sophisticated 3D techniques. Hence, 3D route planning methods are in high demand for UAV navigation and surveillance applications in complex and cluttered environments [47]. The communication energy of the UAVs and base stations can be decreased by optimizing the transmission power. Likewise, reducing the UAV's mechanical energy requires a consumption model for UAV systems, where the UAV energy can be modeled as:

E = (Pmin + h)t + Pmax(h/s) (1)

where t is the operating time, h is the height, and s is the UAV velocity. Pmin and Pmax depend on the motor specification and weight; Pmin is the lowest power required for the UAV to start, with α the motor speed. Henceforth, the total communication cost (Tcom), which captures the UAV cost and time, can be modeled as:

Tcom = ts + (to + th)l (2)

where ts denotes the UAV start time, to denotes the time overhead, th denotes the UAV hop time, and l denotes the number of accumulated links between the start point and the target. Together with such parameters, collision avoidance, robustness, and completeness aspects are considered when designing optimal path-finding algorithms for UAVs.
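To make the two cost models concrete, the minimal Python sketch below evaluates Eqs. (1) and (2) for one candidate flight. The helper names and all numerical values are illustrative assumptions, not figures from the chapter.

def uav_energy(p_min, p_max, height, speed, op_time):
    # Eq. (1): E = (Pmin + h) * t + Pmax * (h / s)
    return (p_min + height) * op_time + p_max * (height / speed)

def communication_cost(t_start, t_overhead, t_hop, num_links):
    # Eq. (2): Tcom = ts + (to + th) * l
    return t_start + (t_overhead + t_hop) * num_links

# Illustrative values only (assumed, not from the chapter).
E = uav_energy(p_min=80.0, p_max=150.0, height=50.0, speed=10.0, op_time=600.0)
T = communication_cost(t_start=0.5, t_overhead=0.02, t_hop=0.01, num_links=4)
print(f"estimated energy E = {E:.1f}, communication cost Tcom = {T:.3f} s")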

4.1.1 Trajectory Planning in 2D Environments

In conventional route planning techniques, environment information has generally been defined on a 2D level; in the UAV case, the vehicle is assumed to maintain a manually adjusted or fixed height during flight. Optimization of 2D route planning is a non-deterministic polynomial-time hard (NP-hard) problem, hence no exact solution is available [48]. Fortunately, Collective Influence (CI) algorithms reduce the requirements for computing the gradients of the constraint and cost functions, which allows the NP-hard problem to be optimized and resolved. Usually, 2D route planning algorithms can be categorized into three forms according to the UAV constraints. The first class of algorithms treats the UAV as a particle. In this situation, designers can concentrate on computing the optimal path. However, this computation is quite sophisticated and hard to implement, as the NP-hard problem must be converted into an optimal route constraint with spatial search [49]. Although the NP-hard problem has no general solution, multiple CI algorithms [23, 24] can be used for route planning optimization by simplifying the computation of the cost and constraint function gradients.
The second class models the problem based on the shape of the UAV. In shape-based algorithms, the problem is converted [25, 49] into a 2D shape with shape parameters, i.e., center of gravity and wing span; it can then be solved by the same method used when the UAV is treated as a particle.
In the third class, the UAV is modeled with its dynamic and kinematic constraints, such as its maximum and minimum turning radii. Compared with the previous techniques, this third class is more complicated but more applicable in practice. For the dynamic and kinematic constraints, CI methods [26, 50] can be used, with advantages in computation and fast convergence.

4.1.2 Trajectory Planning in 3D Environments

A growing range of fields, for example navigation, detection, operations, and transportation, all require UAV applications. Because of the complexity of the environment, which involves many uncertain and unstructured factors, robust 3D route planning methods are crucially required [51]. Although UAV route planning in 3D space presents great opportunities compared with 2D route planning, the challenges increase dramatically once kinematic constraints are considered. A traditional problem can be modeled in 3D space while considering the kinematic constraints for collision-free route planning. Bear in mind that kinematic constraints, such as temporal, geometric, and physical ones, are difficult to resolve with conventional CI methods, which may encounter numerous difficulties, i.e., a low convergence rate and a wide exploration range [52].
This chapter focuses on 3D environments with emphasis on the challenges mentioned in the sections above. 3D techniques have various advantages and characteristics when combined with suitable CI algorithms. To overcome immature and slow convergence in low-height UAV route planning in 3D, genetic algorithms (GA) can be used for route planning [27, 28]. Enhanced particle swarm optimization (PSO) techniques [29, 30] can be utilized to overcome the blind exploration of wide-range problems and to perform comprehensive 3D route optimization.
Improved ant colony optimization (ACO) techniques [31] for 3D route planning have been discussed extensively; they can also increase the selection speed and reduce the probability of getting trapped in local optima. Unlike swarm methods, federated learning (FL) algorithms have generally been utilized for vision-based UAV navigation to enable image-based decisions and detection [32]. To address the UAV route planning attack problem, a fusion neural network (NN) based technique has been proposed [33]; this method can easily be enhanced by parallelization. Recently, with the development of computing chips, the computing time and performance required by deep learning (DL) and machine learning (ML) techniques [34, 35] have become attainable. These ML and DL methods have been widely used in UAV 3D route planning to resolve NP-hard problems more accurately over a wide search region [53].
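As a concrete illustration of how a swarm-intelligence optimizer can be applied to 3D waypoint selection, the sketch below runs a basic particle swarm optimization loop that places intermediate waypoints while trading off path length against proximity to a spherical no-fly zone. The obstacle, weights, and particle counts are assumptions made for the example; this is a minimal sketch, not the enhanced PSO of [29, 30].

import random

START, GOAL = (0.0, 0.0, 10.0), (100.0, 80.0, 20.0)
OBSTACLE, SAFE_R = (50.0, 40.0, 15.0), 15.0      # assumed spherical no-fly zone
N_WAYPOINTS, N_PARTICLES, N_ITERS = 3, 30, 200
DIM = 3 * N_WAYPOINTS

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def cost(flat):
    # Decode the flat vector into 3D waypoints and score the whole route.
    wps = [tuple(flat[i:i + 3]) for i in range(0, DIM, 3)]
    route = [START] + wps + [GOAL]
    length = sum(dist(route[i], route[i + 1]) for i in range(len(route) - 1))
    penalty = sum(max(0.0, SAFE_R - dist(w, OBSTACLE)) for w in wps)  # safety violation
    return length + 50.0 * penalty

pos = [[random.uniform(0.0, 100.0) for _ in range(DIM)] for _ in range(N_PARTICLES)]
vel = [[0.0] * DIM for _ in range(N_PARTICLES)]
pbest = [p[:] for p in pos]
gbest = min(pbest, key=cost)

for _ in range(N_ITERS):
    for i in range(N_PARTICLES):
        for d in range(DIM):
            r1, r2 = random.random(), random.random()
            vel[i][d] = (0.7 * vel[i][d]
                         + 1.5 * r1 * (pbest[i][d] - pos[i][d])
                         + 1.5 * r2 * (gbest[d] - pos[i][d]))
            pos[i][d] += vel[i][d]
        if cost(pos[i]) < cost(pbest[i]):
            pbest[i] = pos[i][:]
    gbest = min(pbest, key=cost)

waypoints = [tuple(round(x, 1) for x in gbest[i:i + 3]) for i in range(0, DIM, 3)]
print("best waypoints:", waypoints, "cost:", round(cost(gbest), 1))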

4.2 Trajectory Planning for UAVs Navigation Classifications

Route planning methods for UAVs can be categorized into three groups, namely combinatorial, sampling-based, and biologically inspired methods, as presented in Fig. 11.

4.3 Route Planning Challenges

• Route length: the total path length that the UAV travels from the start point to the end point.
• Optimization: the route calculation and its parameters should be efficient in time, energy, and cost; routes can be classified into three classes, i.e., non-optimal, sub-optimal, and optimal.
• Extensiveness: the characteristics utilized in route planning for discovering the route; it offers the UAV a platform and an optimal route solution.
• Cost-efficiency: relies on the computation of the UAV network cost; it comprises several factors such as peer-to-peer cost, fuel cost, battery charging cost, memory cost, and hardware and software cost.
• Time-efficiency: the minimum time in which the UAV can move from the start point to the target point assuming there are obstacles on the path; this is achievable if the UAV follows the shortest, optimal path.
• Energy-efficiency: the minimum energy consumed by the UAV, in terms of fuel or battery, to travel from the start point to the destination.
• Robustness: the tolerance of the UAVs and their resilience against errors, i.e., hardware, software, protocol, and communication errors during route planning.
• Collision avoidance: the capability of the UAVs to detect and avoid collisions so as to prevent any crashes or physical damage to the UAV.
The first of these metrics (route length, time, and energy) can be evaluated directly from a candidate waypoint list, as the short sketch below illustrates.
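The minimal sketch below, assuming constant speed and constant power draw, computes the route length, travel time, and a rough energy figure for a candidate list of 3D waypoints; the constants and function names are illustrative only.

import math

def route_length(waypoints):
    """Total Euclidean length of a polyline of (x, y, z) waypoints."""
    return sum(math.dist(a, b) for a, b in zip(waypoints, waypoints[1:]))

def route_metrics(waypoints, speed_mps=10.0, power_watts=150.0):
    """Return (length [m], time [s], energy [J]) assuming constant speed and power."""
    length = route_length(waypoints)
    travel_time = length / speed_mps
    energy = power_watts * travel_time
    return length, travel_time, energy

candidate = [(0, 0, 10), (40, 30, 15), (80, 60, 20), (100, 80, 20)]
L, T, E = route_metrics(candidate)
print(f"length = {L:.1f} m, time = {T:.1f} s, energy = {E:.0f} J")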

5 Classical Techniques for UAV Swarm Navigation and Path Planning

5.1 Roadmap Approach (RA)

The roadmap approach (RA) is also known as the highway approach. It builds a two-dimensional network of straight lines connecting the start and destination points without intersecting the obstacles defined in the map [54]. The basic idea of this algorithm is to randomly generate sampling nodes in the C-space and connect them. The RA generates a fixed number of random points, which can be called milestones, in the search space. Milestones falling inside an obstacle are discarded. All remaining milestones are sequentially interconnected by straight lines, starting from the robot's start point, and straight-line segments that pass through obstacles are discarded [55]. The remaining line segments become the edges along which the robot can travel collision-free. For a given start point (Ps) and target or finish point (Pf), all possible connecting paths are generated while avoiding collisions with obstacles. A graph search technique such as A* is then used to find the shortest route between the initial point and the destination. The resulting route consists of a series of waypoints connecting the start and target locations.
Overall, the algorithm works as shown in Fig. 12. The map is defined over a total region, Crange, which is separated into the obstacle-free region Cfree and the obstacle region Cobst. A connectivity graph Qfree is then created by choosing a set of points that can be linked by straight lines, such that the resulting discretization generates a set of polygons surrounding the obstacles in Crange. The obtained graph connectivity is used to propose all probable collision-free paths, and the A* search algorithm is then used to discover one or more paths from the start to the end point or positions in between [56]. A minimal sketch of this roadmap construction is given below.
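The sketch below is a minimal probabilistic-roadmap construction in Python following the steps just described: sample random milestones, discard those inside obstacles, connect mutually visible milestones, and search the resulting graph for a shortest route. The circular obstacles, the connection radius, and the use of a Dijkstra search are simplifying assumptions.

import math, random, heapq

OBSTACLES = [((50, 50), 12), ((25, 70), 8)]   # assumed circular obstacles (center, radius)

def in_obstacle(p):
    return any(math.dist(p, c) <= r for c, r in OBSTACLES)

def segment_clear(a, b, steps=20):
    # Approximate collision check by sampling points along the segment.
    return all(not in_obstacle((a[0] + (b[0] - a[0]) * t / steps,
                                a[1] + (b[1] - a[1]) * t / steps)) for t in range(steps + 1))

def build_roadmap(start, goal, n_samples=150, radius=20.0):
    nodes = [start, goal] + [(random.uniform(0, 100), random.uniform(0, 100))
                             for _ in range(n_samples)]
    nodes = [p for p in nodes if not in_obstacle(p)]      # discard milestones inside obstacles
    edges = {i: [] for i in range(len(nodes))}
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            if math.dist(nodes[i], nodes[j]) <= radius and segment_clear(nodes[i], nodes[j]):
                w = math.dist(nodes[i], nodes[j])
                edges[i].append((j, w)); edges[j].append((i, w))
    return nodes, edges

def shortest_path(nodes, edges, src=0, dst=1):
    # Dijkstra search over the roadmap graph.
    dist_to = {src: 0.0}; prev = {}; pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == dst:
            break
        for v, w in edges[u]:
            if d + w < dist_to.get(v, float("inf")):
                dist_to[v] = d + w; prev[v] = u; heapq.heappush(pq, (d + w, v))
    path, u = [], dst
    while u in prev or u == src:
        path.append(nodes[u])
        if u == src:
            break
        u = prev[u]
    return list(reversed(path))

nodes, edges = build_roadmap(start=(5, 5), goal=(95, 95))
print(shortest_path(nodes, edges))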
Fig. 12 The road map method

This technique is also utilized for obstacles in polygonal environments, in which the polygon edges and vertices are represented by graph edges and nodes. Two such techniques are used to represent paths by graph connectivity: the visibility graph and the Voronoi diagram [36, 56].
A. Visibility Graphs
The visibility graph (VG) is widely used in route-planning algorithms based on the C-space roadmap modeling approach and is one of the earliest methods used for path planning. As the name suggests, the VG produces line-of-sight (LoS) routes throughout the environment. In the visibility graph, the obstacle vertices are represented by a finite number of nodes between the start and end points. Each VG node represents the location of a point, while the route between points is represented by connecting lines [57]. If a connecting line does not cross any obstacle, the path is feasible; it is considered a visible path and is drawn in the visibility graph as a solid line, as shown in Fig. 13. Otherwise, it is considered an infeasible route and must be deleted from the visibility graph. The same procedure is repeated for the remaining nodes until the finish point/node is reached. The VG builds the roadmap that identifies the free spaces around obstacles, thus translating the connected C-space into a structure with a graph skeleton. Lastly, a path is produced using a graph-search algorithm such as Dijkstra's algorithm to discover the shortest route linking the start and end points [58].
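As an illustration of the line-of-sight test underlying a visibility graph, the following Python sketch assumes polygonal obstacles given as lists of vertices; it handles only proper edge crossings and ignores degenerate (collinear or touching) cases:

```python
def ccw(a, b, c):
    # Positive if the turn a -> b -> c is counter-clockwise.
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def segments_cross(p1, p2, q1, q2):
    # Proper intersection test between segments p1p2 and q1q2.
    d1, d2 = ccw(q1, q2, p1), ccw(q1, q2, p2)
    d3, d4 = ccw(p1, p2, q1), ccw(p1, p2, q2)
    return (d1 * d2 < 0) and (d3 * d4 < 0)

def visible(a, b, polygons):
    # a and b "see" each other if the segment ab crosses no obstacle edge.
    for poly in polygons:
        for i in range(len(poly)):
            e1, e2 = poly[i], poly[(i + 1) % len(poly)]
            if segments_cross(a, b, e1, e2):
                return False
    return True

def visibility_edges(points, polygons):
    # Candidate edges among the start, the goal, and all obstacle vertices.
    return [(p, q) for i, p in enumerate(points)
            for q in points[i + 1:] if visible(p, q, polygons)]
```

A shortest-path search (Dijkstra or A*) over these edges then yields the LoS route.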
The VG concept can be expanded to a 3D environment by using 3D planes rather than lines. Many papers in the literature discuss the use of the VG in 3D spaces. For example, the authors in [38] presented a technique for transforming the 3D problem into a 2D one, discovering the route using legacy 2D VG algorithms, and finally adding an additional dimension, the path altitude [59].

Fig. 13 Visibility graph

B. A* algorithm
A* is a graph-traversal and route-searching algorithm that is commonly used to discover the optimum route due to its optimality and completeness [60]. It finds the optimum route with little processing time, but it has to store and remember all nodes that have been visited earlier. It uses this memory to identify the best path that can be taken from its current state. To find the following node, one can use the expression below.

f(n) = g(n) + h(n) (3)

where n denotes the following node on the route, g(n) is the route cost from the start node S to n, and h(n) is a heuristic function that estimates the lowest path cost from n to the goal G. The minimum path cost is estimated in order to reach the next optimum node, and successive optimum nodes are selected based on these costs so that the route avoids obstacles [61]. Algorithm 1 below lists the steps involved in searching for the optimum route with the A* algorithm. The method relies on an efficient heuristic cost, expands large search areas, and is appropriate only in static circumstances. Since the A* algorithm builds the optimum route from neighbouring nodes of the roadmap, it can produce jagged and long routes.

Algorithm 1: The A* Algorithm

Input: start, goal(n), h(n), expand(n)
Output: path
if goal(start) = true then
    return makePath(start)
end
open ← {start}
closed ← ∅
while open ≠ ∅ do
    sort(open)
    n ← open.pop()
    kids ← expand(n)
    forall kid ∈ kids do
        kid.f ← (n.g + 1) + h(kid)
        if goal(kid) = true then return makePath(kid)
        if kid ∉ closed then open ← open ∪ {kid}
    end
    closed ← closed ∪ {n}
end
return path
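The following is a runnable Python version of this pseudocode, restricted for illustration to a 4-connected occupancy grid with the Manhattan distance as h(n); the grid representation is an assumption for illustration, not part of the original algorithm statement:

```python
import heapq

def astar(grid, start, goal):
    """A* on a 2D grid of 0 (free) / 1 (obstacle) cells; f(n) = g(n) + h(n)."""
    h = lambda n: abs(n[0] - goal[0]) + abs(n[1] - goal[1])   # admissible Manhattan heuristic
    open_heap = [(h(start), 0, start)]
    g = {start: 0}
    parent = {}
    closed = set()
    while open_heap:
        f, gn, n = heapq.heappop(open_heap)
        if n == goal:                       # reconstruct the path back to the start
            path = [n]
            while n in parent:
                n = parent[n]
                path.append(n)
            return path[::-1]
        if n in closed:
            continue
        closed.add(n)
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            kid = (n[0] + dr, n[1] + dc)
            if (0 <= kid[0] < len(grid) and 0 <= kid[1] < len(grid[0])
                    and grid[kid[0]][kid[1]] == 0 and kid not in closed):
                if gn + 1 < g.get(kid, float("inf")):
                    g[kid] = gn + 1
                    parent[kid] = n
                    heapq.heappush(open_heap, (g[kid] + h(kid), g[kid], kid))
    return None

# Example: astar([[0, 0, 0], [1, 1, 0], [0, 0, 0]], (0, 0), (2, 0)) returns the cell
# sequence from (0, 0) to (2, 0) around the blocked middle row.
```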

5.2 Cell Decomposition (CD)

Cell decomposition (CD) is a path-planning algorithm based on a 2D C-space modeling approach. In the cell decomposition method, the environment is divided into non-overlapping grids (cells), and connectivity graphs are used to traverse from one cell to another to reach the goal. Possible routes from the start to the finish points are then created that pass through neighbouring free cells (cells containing no obstacles) [62]. The obstacles are isolated by finding the connectivity among the free cells; thus, a discrete version of the environment is created. Search algorithms are used to connect neighbouring free cells. Figure 14 presents a schematic of the process: the shaded (grey) cells are removed because they are occupied by obstacles, and connectivity between the start and destination points is computed by linking the free cells with a series of straight lines [63]. This simple environment division is known as the exact cell decomposition method. If no path is found, the cells are decomposed into smaller cells and a new search is completed. The CD method is characterized as exact, adaptive, or approximate.
In exact CD, the cells do not have a precise size and shape; they are determined by the environment map, the locations, and the shapes of the obstacles [63]. This approach uses the regular grid in several ways. First, the available free space of the environment is decomposed into small regions (triangular and trapezoidal), each of which is numbered. Each region of the environment represents a node in the connectivity graph, and neighbouring nodes are then allowed to be joined.

Fig. 14 Cell decomposition (CD)

A route in this graph corresponds to a channel in free space, drawn as a succession of striped cells [64]. These channels are then converted into a free route by linking the initial configuration to the goal configuration through the midpoints of the intersections of the contiguous cells in the channel.
In approximate CD, the planning space is represented by a regular grid with an explicit size and shape, which makes it easy to configure. Adaptive CD recognizes the available information about the free space and follows the basic obstacle-avoidance concept of the free space in regular CD [44, 64].
The benefits of this method are that it is practical to implement in more than two dimensions and relatively quick to compute. However, because it is an iterative process, it is not necessarily practical to compute online, as there is no guarantee of when, or if, a solution will be found. Additionally, while there are both exact and approximate cell decomposition methods, the approximate method (shown in the figure above) can provide very suboptimal solutions.
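The approximate variant can be sketched in a few lines of Python: the workspace is cut into a regular grid of cells, cells whose centres fall inside an obstacle are marked occupied, and a breadth-first search connects neighbouring free cells. The cell size and the circular obstacle model are illustrative assumptions:

```python
from collections import deque

def occupancy_grid(width, height, cell, obstacles):
    """Approximate decomposition: mark a cell occupied if its centre falls in a circular obstacle."""
    rows, cols = int(height / cell), int(width / cell)
    grid = [[False] * cols for _ in range(rows)]
    for row in range(rows):
        for col in range(cols):
            x, y = col * cell + cell / 2, row * cell + cell / 2
            grid[row][col] = any((x - cx) ** 2 + (y - cy) ** 2 <= rad ** 2
                                 for cx, cy, rad in obstacles)
    return grid

def connect_free_cells(grid, start, goal):
    """Breadth-first search through neighbouring free cells; returns the cell sequence or None."""
    rows, cols = len(grid), len(grid[0])
    frontier, parent = deque([start]), {start: None}
    while frontier:
        cell = frontier.popleft()
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cell[0] + dr, cell[1] + dc)
            if (0 <= nxt[0] < rows and 0 <= nxt[1] < cols
                    and not grid[nxt[0]][nxt[1]] and nxt not in parent):
                parent[nxt] = cell
                frontier.append(nxt)
    return None
```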

5.3 Artificial Potential Field (APF)

Motion planning using the APF was initially used for online collision avoidance where the UAV has no previous knowledge of the obstacles but avoids them in real time. This comparatively simple concept treats the vehicle as a point under the effect of an artificial potential field whose variations in space characterize the structure of the environment [65]. The attractive potential reflects the pull of the vehicle towards the goal, and the repulsive potential reflects the push of the UAV away from the obstacles [44, 66]. Consequently, the environment is decomposed into a set of values where high values are linked to obstacles and low values are linked to the goal.

Fig. 15 Example of the potential field method

Several steps are used to construct the map using potential fields. First, the target point is assigned a large negative value, and Cfree is assigned increasing values as the distance from the goal increases; typically, the inverse of the distance from the goal is used as the value [10, 65]. Second, Cobstacle is assigned the highest values, and Cfree is assigned decreasing values as the distance from the obstacles increases; typically, the inverse of the distance from the obstacle is used as the value. Finally, the two potentials in Cfree are added, and a steepest-descent approach is used to find an appropriate path from the start point to the end point (see Fig. 15 on the right) [45] (Table 3).
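A minimal Python sketch of the steepest-descent idea described above, assuming a quadratic attractive field and an inverse-distance repulsive field with a limited influence radius; all gains and step sizes are illustrative values:

```python
import math

def apf_step(pos, goal, obstacles, k_att=1.0, k_rep=100.0, rho0=2.0, step=0.05):
    """One steepest-descent step: move along the negative gradient of U = U_att + U_rep."""
    # Attractive force: pulls the vehicle towards the goal.
    fx = k_att * (goal[0] - pos[0])
    fy = k_att * (goal[1] - pos[1])
    # Repulsive force: pushes away from obstacles closer than the influence radius rho0.
    for ox, oy, rad in obstacles:
        dx, dy = pos[0] - ox, pos[1] - oy
        d = max(math.hypot(dx, dy) - rad, 1e-6)      # distance to the obstacle surface
        if d < rho0:
            mag = k_rep * (1.0 / d - 1.0 / rho0) / (d ** 2)
            fx += mag * dx / (d + rad)
            fy += mag * dy / (d + rad)
    norm = math.hypot(fx, fy) or 1e-6
    return (pos[0] + step * fx / norm, pos[1] + step * fy / norm)

def plan(start, goal, obstacles, max_iters=5000, tol=0.1):
    """Follow the descent direction until the goal neighbourhood is reached."""
    path, pos = [start], start
    for _ in range(max_iters):
        if math.hypot(goal[0] - pos[0], goal[1] - pos[1]) < tol:
            break
        pos = apf_step(pos, goal, obstacles)
        path.append(pos)
    return path
```

As usual with APF planners, this simple descent can stall in local minima between closely spaced obstacles, which is one of the known drawbacks of the method.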

6 Reactive Approaches for UAV Swarm Navigation and Path Planning

6.1 Genetic Algorithm (GA)

The Genetic Algorithm (GA) is a dynamic stochastic search algorithm based on natural genetics and selection that is used to solve optimization problems [46, 66]. In terms of route planning, the genes are the waypoints on the route, and the GA uses genetic operations to optimize an initial route. There are five genetic operations in GA route planning [47, 66]:
• The cross operation: randomly select one point in each of two routes and exchange the remaining route segments after the selected points.
• The mutation operation: randomly select one point of a route and swap it with a point that is not selected by any route.
• The mobile operation: randomly select a point of the route and move it to a neighbouring location.
• The delete operation: randomly select a point of the route and connect its two neighbouring nodes directly. If eliminating the selected node results in a shorter, collision-free route, then this point is removed from the route.
• The enhance operation: can only be used on collision-free routes. Choose a point of the route and add two new points on either side of the chosen point, then link the two new points with a route segment. If the new route is feasible, eliminate the chosen point.
The above genetic operations are applied to parent routes to create an optimized child route. In GAs, the parent route is the preliminary route obtained from a preceding route-planning operation, which could, for example, be obtained using a roadmap. The parent routes should be sequences of line segments that link the start and end points via numerous intermediate waypoints. GAs are robust search algorithms that need very little information about the environment to search efficiently [67]. Most studies have addressed only static-environment navigation using GAs; navigation in dynamic environments with moving obstacles has not been discussed extensively in the literature. To achieve excellent results in UAV route planning, several studies have applied GAs jointly with other intelligent algorithms, in what are sometimes called hybrid approaches [50].
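A compact Python sketch of two of the waypoint-based genetic operations listed above (the cross and mutation operations), together with a simple length-plus-collision fitness and selection loop; the midpoint collision check, population handling, and all parameters are illustrative simplifications:

```python
import math
import random

def route_cost(route, obstacles, penalty=1000.0):
    """Fitness = total length plus a penalty for every segment whose midpoint hits an obstacle."""
    cost = 0.0
    for p, q in zip(route, route[1:]):
        cost += math.dist(p, q)
        mx, my = (p[0] + q[0]) / 2, (p[1] + q[1]) / 2     # coarse midpoint check only
        if any(math.hypot(mx - cx, my - cy) <= rad for cx, cy, rad in obstacles):
            cost += penalty
    return cost

def crossover(a, b):
    """Cross operation: swap the tails of two parent routes after a random cut point."""
    cut = random.randint(1, min(len(a), len(b)) - 2)      # keep shared start/goal in place
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def mutate(route, sigma=0.5):
    """Mutation/mobile operation: move one intermediate waypoint to a nearby location."""
    i = random.randint(1, len(route) - 2)
    x, y = route[i]
    route[i] = (x + random.gauss(0, sigma), y + random.gauss(0, sigma))
    return route

def evolve(population, obstacles, generations=100):
    """Keep the better half of the population and refill it with mutated crossover children."""
    for _ in range(generations):                          # population size of at least 4 assumed
        population.sort(key=lambda r: route_cost(r, obstacles))
        parents = population[: len(population) // 2]
        children = []
        while len(children) < len(population) - len(parents):
            c1, c2 = crossover(*random.sample(parents, 2))
            children += [mutate(c1), mutate(c2)]
        population = parents + children[: len(population) - len(parents)]
    return min(population, key=lambda r: route_cost(r, obstacles))
```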

6.2 Neural Network (NN)

The ANN structure is inspired by the operation of biological neural networks. It is built on a group of connected computational units known as artificial neurons (ANs). Each link between ANs is able to transmit a signal from one point to another [68]. The ANs process the received signal and then signal the ANs connected to them. In an ANN configuration for UAV route planning, the quantity transmitted over the link between neurons is called the signal and is usually a real number, and the neuron output is computed by a nonlinear function. ANNs are typically optimized through stochastic mathematical approaches by fitting large amounts of data [69]; a suitable solution can then be obtained and expressed as a mathematical function.
ANN algorithms reduce the mathematical complexity by eliminating the collocation requirement of the computational environment and by exploiting fast computing hardware [62]. Since an ANN operates through parallel computation, convergence is generally very fast, and the created route is safe and optimal [63]. Two key forms of ANN approaches have been used in UAV route planning: in the first, a UAV builds its route from a sample trajectory and uses a direct association approach to optimize and compute the trajectory [64]; in the second, NNs are used to estimate the system dynamics, objective function, and gradient, which eliminates the collocation requirement and thus reduces the size of the nonlinear programming problem [65]. Presently, approaches of the second type are more popular, and they have been extended to solve the multiple-UAV problem [66]. Additionally, ANNs have generally been combined with other approaches and algorithms [67, 68] such as

the PFM, PSO, and GA, to maximize their advantages. Deep neural networks (DNNs) are multi-layer NNs that have recently been used extensively in the AI field, for example in speech recognition and image processing. Due to their capability to characterize and extract features precisely, they can be applied in the future to facilitate UAV route planning in complex environments.

6.3 Firefly Algorithm (FA)

Firefly algorithms (FAs) are inspired by the behavior and flashing activity of fireflies and belong to the family of metaheuristic algorithms. Their concepts include general identification and random states representing the statistical trial-and-error behavior of fireflies in nature [70]. The firefly is a flying beetle of the Lampyridae family, usually called a lightning bug due to its capability to create light. It creates light through the oxidation of luciferin in the presence of the enzyme luciferase, a process that occurs very rapidly. This light-creation process is known as bioluminescence, and fireflies use it to glow without emitting heat. Fireflies use the light for mate selection and message communication, and occasionally also to frighten off other insects that try to attack them.
Recently, FAs have been used as an optimization tool, and their applications are spreading to nearly all engineering areas, including mobile robot navigation. In [70], the authors presented a firefly-algorithm-based mobile robot navigation approach in the presence of static obstacles. The work attained the three primary navigation objectives of route safety, route length, and route smoothness. In [71], the authors applied FAs to find the shortest collision-free path for single mobile robot navigation in a simulation environment. The work in [72] established FAs for underwater mobile robot navigation; the authors developed a strategy for scheduling swarm robots to avoid jamming and interference in a 3D marine environment. Reference [73] discussed a similar environment, presenting real-life underwater robot navigation in a partially known environment using a Lévy-flight firefly-based method.
An FA-based cooperative strategy for dead-robot detection in a multi-mobile-robot environment is discussed in [74]. A 3D FA application for world exploration with aerial navigation is implemented and developed in [75]. An enhanced version of the FA is applied to unmanned combat aerial vehicle (UCAV) route planning in a crowded, complex environment, avoiding hazardous areas and minimizing the fuel cost. A concentric-sphere-based modified FA has been presented in [76] to avoid random movement of the fireflies with less computational effort; the experimental and simulation results show great promise in achieving the navigation goals in a complex environment. Reference [77] addressed the navigation problem specifically under dynamic conditions.

6.4 Ant Colony Optimization (ACO)

ACO algorithms originate from the behaviour of ant communities and their capability to search for the best (shortest) route from the source (nest) to a destination while seeking food [78]. In route planning, all the routes of the ant swarm establish the solution space of the optimization problem. The pheromone concentration accumulates progressively on shorter routes, and the number of ants selecting those routes also grows. Ultimately, all the ants concentrate on the shortest route under this positive feedback, and the corresponding solution is the optimum of the route-planning optimization problem [80]. ACO algorithms for UAV route planning are typically developed by dividing the flying area into a grid and improving a route between a grid point and the destination points [85] so as to search for the optimal route efficiently and rapidly [81]. An improved algorithm was discussed in [81] with the assistance of a climbing weight and a 3D grid. Today, ACO is used for efficient route planning and to handle mobile robot navigation problems involving obstacle avoidance.
Compared with other Collective Influence (CI) algorithms, ACO has solid robustness and a strong capability to search for the best solution. Furthermore, ACO is a population-based evolutionary algorithm that is fundamentally simple and easy to run in parallel. To enhance the performance of ACO in route-planning problems, ACO algorithms can easily be combined with various heuristic algorithms.
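The pheromone mechanism can be sketched as follows in Python for a waypoint graph, where ants choose successor nodes with probability proportional to the pheromone level and shorter complete tours deposit more pheromone; the graph encoding (nodes as a dict of id to coordinates, edges as adjacency lists) and the parameter values are assumptions for illustration:

```python
import math
import random

def aco_paths(nodes, edges, start, goal, ants=20, iters=50, rho=0.5, q=1.0):
    """nodes: {id: (x, y)}, edges: {id: [neighbour ids]}; returns the best node sequence found."""
    tau = {(u, v): 1.0 for u in edges for v in edges[u]}        # pheromone per directed edge
    best, best_len = None, float("inf")
    for _ in range(iters):
        tours = []
        for _ in range(ants):
            tour, current, visited = [start], start, {start}
            while current != goal and len(tour) < len(nodes):
                choices = [v for v in edges[current] if v not in visited]
                if not choices:
                    break
                weights = [tau[(current, v)] for v in choices]   # probability ~ pheromone level
                current = random.choices(choices, weights=weights)[0]
                tour.append(current)
                visited.add(current)
            if tour[-1] == goal:
                length = sum(math.dist(nodes[a], nodes[b]) for a, b in zip(tour, tour[1:]))
                tours.append((tour, length))
                if length < best_len:
                    best, best_len = tour, length
        # Evaporation, then deposit: shorter tours reinforce their edges more strongly.
        for e in tau:
            tau[e] *= (1.0 - rho)
        for tour, length in tours:
            for a, b in zip(tour, tour[1:]):
                tau[(a, b)] += q / length
    return best
```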

6.5 Cuckoo Search (CS)

CS algorithms are based on the brood-parasitic behavior of cuckoos, which lay their eggs in the nests of other birds. The algorithms follow three basic rules for solving an optimization problem, as discussed in [79]. At each step, each cuckoo lays one egg in a randomly selected nest; the best nests, with high-quality eggs, are passed on to the next generation; and the number of available nests is fixed, with each laid cuckoo egg having a probability P ∈ (0, 1) of being discovered by the host bird. In that case, the host bird can either abandon the current nest and build another one or get rid of the egg. CS algorithms are an enhanced approach because of their increased efficiency and rate of convergence, and hence they are widely recognized in various engineering optimization problems. Mobile robot navigation is one of the areas where computational time and performance need to be optimized [80].
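A minimal Python sketch of the three rules above for a generic minimization problem; the Lévy-flight step used in standard CS is approximated here by a Gaussian perturbation, and the objective, bounds, and parameters are placeholders:

```python
import random

def cuckoo_search(objective, dim, n_nests=15, pa=0.25, iters=200, lo=-5.0, hi=5.0):
    """Each nest is a candidate solution; pa is the discovery/abandon probability."""
    nests = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(n_nests)]
    fitness = [objective(n) for n in nests]
    for _ in range(iters):
        # Rule 1: a cuckoo lays an egg (a new candidate) near a randomly chosen nest.
        i = random.randrange(n_nests)
        egg = [min(hi, max(lo, x + random.gauss(0, 0.1 * (hi - lo)))) for x in nests[i]]
        # Rule 2: the egg replaces a random host nest only if it is better.
        j = random.randrange(n_nests)
        if objective(egg) < fitness[j]:
            nests[j], fitness[j] = egg, objective(egg)
        # Rule 3: a fraction pa of the worst nests is abandoned and rebuilt at random.
        order = sorted(range(n_nests), key=lambda k: fitness[k], reverse=True)
        for k in order[: int(pa * n_nests)]:
            nests[k] = [random.uniform(lo, hi) for _ in range(dim)]
            fitness[k] = objective(nests[k])
    best = min(range(n_nests), key=lambda k: fitness[k])
    return nests[best], fitness[best]

# Example: minimise the squared distance of a 2-D waypoint from the origin.
# best, value = cuckoo_search(lambda p: p[0] ** 2 + p[1] ** 2, dim=2)
```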
CS algorithms have been used for wheeled robot navigation in a static, partially known environment and have been demonstrated in real-life experiments and simulations over complex environments. The simulation and experimental results show good agreement, with much smaller deviation errors [81].
CS-based algorithms perform well when combined with other navigation methods. One such method combines adaptive neuro-fuzzy inference systems (ANFIS) with CS and was proposed to obtain better navigation results in uncertain environments. Another hybrid route-planning method for an uncertain 3D environment hybridizes CS with differential evolution (DE) algorithms to accelerate the global convergence speed; the enhanced convergence speed helps the aerial robot to explore the 3D environment. A 3D CS application, particularly for a battlefield, has been discussed in [82]. In that work, a hybrid method (combining CS and DE) was proposed for the aerial 3D route-planning optimization problem. The DE is added to optimize the cuckoos' selection process, which enhances the CS algorithm noticeably, with the cuckoos acting as search agents for the optimum route.

6.6 Particle Swarm Optimization (PSO)

PSO is an optimization approach inspired by the flocking of birds. There are two parameters in this approach: position and speed. Position defines the movement direction, while speed is the movement variable. Each particle in the search space individually searches for the optimum solution, saves it as its present individual best value, and shares this value with the other particles in the swarm, from which the optimum value for the entire swarm is determined [82]. To track the present global swarm optimum, all particles belonging to the swarm adapt their position and speed according to the present individual best value they have found and the present global optimum that is distributed to the entire particle swarm [83].
Extensive studies of UAV route planning have been performed by applying PSO approaches and their variants. In PSO, each individual, or particle, is initialized randomly. Each of these particles represents a possible solution to the path-planning problem and searches within a certain space for the optimum position. PSO has an advantage over other computing approaches in that it can find a solution faster [81, 83].

Each particle in the swarm has its own individual speed Vi and individual location Xi and searches towards the local optimal position Pi and the global optimal position Pg. The local optimal position is the location at which the particle attains its best fitness during the fitness-evaluation phase, while the global optimal position is the best position obtained by any particle in the whole swarm. The optimum solution is achieved by iteration: in each iteration, every particle updates its position and speed until the maximum number of iterations is reached.
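The position and speed updates can be sketched as follows in Python for a generic minimization objective; the inertia and acceleration coefficients are common textbook values, not values taken from the chapter:

```python
import random

def pso(objective, dim, n_particles=20, iters=200, w=0.7, c1=1.5, c2=1.5, lo=-5.0, hi=5.0):
    """Each particle keeps its personal best Pi; the swarm shares a global best Pg."""
    x = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    p_best = [xi[:] for xi in x]
    p_val = [objective(xi) for xi in x]
    g_best = p_best[min(range(n_particles), key=lambda i: p_val[i])][:]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                # Speed update: inertia + pull towards personal best + pull towards global best.
                v[i][d] = (w * v[i][d]
                           + c1 * r1 * (p_best[i][d] - x[i][d])
                           + c2 * r2 * (g_best[d] - x[i][d]))
                x[i][d] = min(hi, max(lo, x[i][d] + v[i][d]))
            val = objective(x[i])
            if val < p_val[i]:
                p_best[i], p_val[i] = x[i][:], val
                if val < objective(g_best):
                    g_best = x[i][:]
    return g_best, objective(g_best)
```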

6.7 Bacterial Foraging Optimization (BFO)

BFO is inspired by the optimization behavior of M. xanthus and E. coli bacteria. The bacteria search for nutrients so as to make the best use of the energy obtained per unit time. BFO algorithms are characterized by chemotaxis, the tracking of chemical gradients by which bacteria send special signals to each other. This process has four key concepts: reproduction/swarming, chemotaxis, dispersal, and elimination. The nutrient-searching behavior of the bacteria [84] is as follows; a minimal chemotaxis sketch is given after the list.
• Bacteria continually move in search of regions of the space with more nutrients. Bacteria with enough food live longer and can split into two equivalent parts, while bacteria in regions with fewer nutrients die or disperse.
• Bacteria in the richer nutrient regions attract others through a chemical phenomenon, while bacteria in poorer nutrient regions give a warning to other bacteria using a special signal.
• Bacteria gather in the highly nutritious regions of the space.
• Bacteria are dispersed across the space to find new nutrient regions.
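The following Python sketch shows only the chemotaxis step (tumble to a random direction, then swim while the nutrient level keeps improving); reproduction and elimination/dispersal are omitted, and the objective, bounds, and all parameters are illustrative assumptions:

```python
import random

def bfo_chemotaxis(cost, dim, n_bacteria=20, steps=50, swim_len=4, step_size=0.1, lo=-5.0, hi=5.0):
    """Chemotaxis only: lower cost plays the role of a higher nutrient level."""
    bacteria = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(n_bacteria)]
    for _ in range(steps):
        for b in bacteria:
            # Tumble: pick a random unit direction.
            d = [random.uniform(-1, 1) for _ in range(dim)]
            norm = sum(c * c for c in d) ** 0.5 or 1.0
            d = [c / norm for c in d]
            last = cost(b)
            # Swim: keep moving in that direction while the nutrient level improves.
            for _ in range(swim_len):
                trial = [min(hi, max(lo, b[i] + step_size * d[i])) for i in range(dim)]
                if cost(trial) < last:
                    b[:], last = trial, cost(trial)
                else:
                    break
    return min(bacteria, key=cost)
```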
The application of the BFO algorithm to mobile robot navigation in static environments was first discussed in [84], with variable speed based on Cauchy, uniform, and Gaussian distributions. The same strategy in the presence of obstacles has also been discussed for navigation in static environments. Real-time navigation in building floor, corridor, and lobby environments for a single mobile robot system is discussed in [85]. To improve the route-planning performance of wheeled robots, an improved BFO algorithm has been proposed [86]. The proposed method models the environment using an APF algorithm with two contrasting forces, i.e., repulsive forces for the obstacles and attractive forces for the goals. The method examines the negative feedback from the algorithm to choose appropriate direction vectors that lead the search process to the promising area with an optimal local search. Navigation in the presence of several robots is itself a problematic issue, and BFO algorithms have been proposed to deal with this condition [87]; the authors combined the harmony search algorithm with BFO. Beyond wheeled robot applications, BFO algorithms have been validated effectively for an industrial manipulator, as reviewed by the authors in [88], who found that the enhanced BFO gives better results than the conventional BFO. The UAV navigation problem has also been addressed using BFO [88]; in that work, BFO was combined with a proportional-integral-derivative (PID) controller to obtain the optimum search coefficients in 3D space and to avoid complex models while tuning the controller for UAVs.

6.8 Artificial Bee Colony (ABC)

The ABC algorithm is a swarm-intelligence technique adapted from the food-searching activities of honey bees and was initially introduced in [83]. The ABC algorithm is a population-based protocol comprising a population of candidate solutions (i.e., food sources for the bees). It is a comparatively simple, light-processing, population-based stochastic search method in the swarm-algorithm field. The ABC food-search cycle comprises the following three stages: sending the employed bees to the food sources and assessing the nectar quality; the onlooker bees selecting a food source after obtaining information from the employed bees and evaluating the nectar quality; and sending the scout bees to probable new food sources [87].
The application of the ABC algorithm to mobile robot navigation in static environments is proposed in [89]. The proposed method applies ABC for the local search and evolutionary algorithms to identify the optimum route; a real-time experiment in an indoor environment is discussed to verify the results.

Similar techniques in static environments are also discussed in [89], although the results were limited to a simulation environment. To meet the navigation goal in real-life dynamic environments, an ABC-based technique is proposed in [90]; the authors proposed a hybrid method that combines the ABC with a rolling time-window protocol. Navigation of several mobile robots in an environment is a challenging issue, and the development of ABC for it has been successfully completed in static environments. In addition to wheeled mobile robot navigation, the ABC has been examined for the routing problems of aerial, underwater, and autonomous vehicles [83].
UCAV route planning aims to attain an optimum 3D flight path by considering the constraints and threats in the battlefield. Researchers have discussed the UCAV navigation problem using an enhanced ABC in which the ABC is amended by balance-evolution strategies (BESs); these fully use the convergence information throughout the iterations to improve the search accuracy and to balance the global exploration and local exploitation capabilities [89]. ABC applications in the military sector have been discussed in [90], where an unmanned helicopter was examined for demanding missions such as accurate measurement and information gathering.

6.9 Adaptive Artificial Fish Swarm Algorithm (AFSA)

AFSA belongs to the family of swarm intelligence algorithms and was proposed in [91]. In nature, fish move to the location with the most consistent food supply by executing social search behaviors. AFSA has roughly four behaviors: prey, follow, swarm, and leap [90]. Lately, with its strong global search capability, good robustness, and fast convergence rate, AFSA has been widely used to deal with robot route-planning problems. Hence, several studies have proposed methods to enhance the performance of the standard AFSA through fictitious entities modeled on real fish. A novel AFSA algorithm, identified as NAFSA, has been presented to address the weaknesses of the standard AFSA and to speed up the convergence of the algorithm. A modified form of AFSA called MAFSA, with dynamic parameter control, has been proposed to choose the optimum feature subset and enhance the classification accuracy of support vector machines; experimental results show that the proposed method outperforms the standard AFSA [91].

A new AFS optimization was presented to bring the imitation of fish behavior closer to reality and to enhance the environmental awareness of the fishes' foraging behavior. By probing the environment, the artificial fish can monitor the surrounding information to attain an optimum state for a better movement direction. The hybrid adaptive-system niche artificial fish swarm algorithm (AHSNAFSA) was proposed to solve vehicle routing problems, and the ecological-niche concept is discussed and presented to remedy the deficiency of the conventional AFSA in achieving an optimum solution [92].

7 Conclusions

A review of UAV navigation and route-planning approaches for autonomous mobile robots, along with the advantages and disadvantages of these algorithms, was discussed and presented extensively in this chapter. A comprehensive discussion of each method in the research field under study, covering UAV route-planning and navigation algorithms, was presented. Despite the major enhancements in recent studies compared with those of some years ago, only a few of these works could be reported in this chapter. This survey categorizes the various techniques into conventional and reactive techniques. The main themes of this review are listed below.
• Reactive techniques perform much better than conventional techniques due to their higher ability to handle the uncertainty present in the environment. Few studies were based on dynamic environments compared with static environments.
• Reactive approaches are commonly used for real-time navigation problems.
• In dynamic environments, there is less research on UAV navigation towards moving goals compared with the moving-obstacle problem.
• Most studies are set up in a simulation environment; studies in real-time environments are much fewer.
• Research on the navigation of multiple UASs is scarce compared with that on a single UAS.
• There is considerable scope for using newly developed algorithms such as CS, SFLA, BA, FA, DE, HS, ABC, BFO, and IWO for navigation in uncertain, complex environments with high uncertainty, and these could be used to propose new types of hybrid mechanisms.
• The efficiency of the classical approaches can be improved by hybridizing them with reactive mechanisms.

References

1. Lu, Y., Zhucun, X., Xia, G.-S., & Zhang, L. (2018). A survey on vision-based UAV navigation.
Geo-Spatial Information Science, 21(1), 1–12.
2. Rashid, A., & Mohamed, O. (2022). Optimal path planning for drones based on swarm intel-
ligence algorithm. Neural Computing and Applications, 34, 10133–10155. https://doi.org/10.
1007/s00521-022-06998-9
3. Aggarwal, S., & Kumar, N. (2019). Path planning techniques for unmanned aerial vehicles: A review, solutions, and challenges. Computer Communications, 149, 270–299.

4. Lina, E., Ali, A., & Rania, A., et al, (2022). Deep and reinforcement learning technologies on
internet of vehicle (IoV) applications: Current issues and future trends. Journal of Advanced
Transportation, Article ID 1947886. https://doi.org/10.1155/2022/1947886
5. Farshad, K., Ismail, G., & Mihail, L. S. (2018). Autonomous tracking of intermittent RF source
using a UAV swarm. IEEE Access, 6, 15884–15897.
6. Saeed, M. M., Saeed, R. A., Mokhtar, R. A., Alhumyani, H., & Ali, E. S. (2022). A novel
variable pseudonym scheme for preserving privacy user location in 5G networks. Security and
Communication Networks, Article ID 7487600. https://doi.org/10.1155/2022/7487600
7. Han, J., Xu, Y., Di, L., & Chen, Y. (2013). Low-cost multi-uav technologies for contour mapping
of nuclear radiation field. Journal of Intelligent and Robotic Systems, 70(1–4), 401–410.
8. Merino, L., Martínez, J. R., & Ollero, A. (2015). Cooperative unmanned aerial systems for fire
detection, monitoring, and extinguishing. In Handbook of unmanned aerial vehicles (pp. 2693–
2722).
9. Othman, O. et al. (2022). Vehicle detection for vision-based intelligent transportation systems
using convolutional neural network algorithm. Journal of Advanced Transportation, Article ID
9189600. https://doi.org/10.1155/2022/9189600
10. Elfatih, N. M., et al. (2022). Internet of vehicle’s resource management in 5G networks using
AI technologies: Current status and trends. IET Communications, 16, 400–420. https://doi.org/
10.1049/cmu2.12315
11. Sana, U., Ki-Il, K., Kyong, H., Muhammad, I., et al. (2009). UAV-enabled healthcare
architecture: Issues and challenges”. Future Generation Computer Systems, 97, 425–432.
12. Haifa, T., Amira, C., Hichem, S., & Farouk, K. (2021). Cognitive radio and dynamic TDMA
for efficient UAVs swarm Communications. Computer Networks, 196.
13. Saleem, Y., Rehmani, M. H., & Zeadally, S. (2015). Integration of cognitive radio technology-
with unmanned aerial vehicles: Issues, opportunities, and future research challenges. Journal
of Network and Computer Applications, 50, 15–31. https://doi.org/10.1016/j.jnca.2014.12.002
14. Rashid, A., Sabira, K., Borhanuddin, M., & Mohd, A. (2006). UWB-TOA geolocation
techniques in indoor environments. Institution of Engineers Malaysia (IEM), 67(3), 65–69,
Malaysia.
15. Xi, C., Jun, T., & Songyang, L. (2020). Review of unmanned aerial vehicle Swarm communi-
cation architectures and routing protocols. Applied Sciences, 10, 3661. https://doi.org/10.3390/
app10103661
16. Sahingoz, O. K. (2013). Mobile networking with UAVs: Opportunities and challenges. In
Proceedings of the 2013 international conference on unmanned aircraft systems (ICUAS),
Atlanta, GA, USA, 28–31 May 2013 (pp. 933–941). New York, NY, USA: IEEE.
17. Kaleem, Z., Qamar, A., Duong, T., & Choi, W. (2019). UAV-empowered disaster-resilient edge
architecture for delay-sensitive communication. IEEE Network, 33, 124–132.
18. Sun, Y., Wang, H., Jiang, Y., Zhao, N. (2019). Research on UAV cluster routing strategy
based on distributed SDN. In Proceedings of the 2019 IEEE 19th International Conference
on Communication Technology (ICCT), Xi’an, China, 2019 (pp. 1269–1274). New York, NY,
USA: IEEE.
19. Khan, M., Qureshi, I, & Khan, I. (2017). Flying ad-hoc networks (FANETs): A review of
communication architectures, and routing protocols. In Proceedings of the 2017 first inter-
national conference on latest trends in electrical engineering and computing technologies
(INTELLECT). (pp. 1–9). New York, NY, USA.
20. Shubhani, A., & Neeraj, K. (2020). Path planning techniques for unmanned aerial vehicles: A
review, solutions, and challenges. Computer Communications, 149, 270–299.
21. Mamoon, M., et al. (2022). A comprehensive review on the users’ identity privacy for 5G
networks. IET Communications, 16, 384–399. https://doi.org/10.1049/cmu2.12327
22. Yijing, Z., Zheng, Z., & Yang, L. (2018). Survey on computational-intelligence-based UAV
path planning. Knowledge-Based Systems, 158, 54–64.
23. Zhao, Y., Zheng, Z., Zhang, X., & Liu Y. (2017). Q learning algorithm-based UAV path learning
and obstacle avoidance approach. In: 2017 thirty-sixth chinese control conference (CCC)

24. Zhang, H. (2017). Three-dimensional path planning for uninhabited combat aerial vehicle
based on predator-prey pigeon-inspired optimization in dynamic environment. Press.
25. Alaa, M., et al. (2022). Performance evaluation of downlink coordinated multipoint joint trans-
mission under heavy IoT traffic load. Wireless Communications and Mobile Computing, Article
ID 6837780.
26. Sharma, R., & Ghose, D. (2009). Collision avoidance between uav clusters using swarm
intelligence techniques. International Journal of Systems Science, 40(5), 521–538.
27. Abdurrahman, B., & Mehmetnder, E. (2016). Fpga based offline 3d UAV local path planner
using evolutionary algorithms for unknown environments. Proceedings of the Conference of
the IEEE Industrial Electronics Society, IECON, 2016, 4778–4783.
28. Yang, X., Cai, M., Li, J. (2016). Path planning for unmanned aerial vehicles based on genetic
programming. In Chinese control and decision conference (pp. 717–722).
29. Luciano, B., Simeone, B., & Egidio, D. (2017). A mixed probabilistic-geometric strategy
for UAV optimum flight path identification based on bit-coded basic manoeuvres. Aerospace
Science Technology, 71.
30. Phung, M., Cong, H., Dinh, T., & Ha, Q. (2017). Enhanced discrete particle swarm optimization
path planning for UAV vision-based surface inspection. Automation in Construction, 81, 25–33.
31. Ugur, O., Koray, S. O. (2016). Multi colony ant optimization for UAV path planning with
obstacle avoidance. In International conference on unmanned aircraft systems (pp 47–52).
32. Adhikari, E., & Reza, H. (2017). A fuzzy adaptive differential evolution for multi-objective 3d
UAV path optimization. Evolutionary Computation, 6(9).
33. Choi, Y., Jimenez, H., & Mavris, D. (2017). Two-layer obstacle collision avoidance with
machine learning for more energy-efficient unmanned aircraft trajectories. Robotics and
Autonomous Systems, 6(2).
34. Abdul, Q. (2017). Saeed M: Scene classification for aerial images based on CNN using sparse
coding technique. International Journal of Remote Sensing, 38(8–10), 2662–2685.
35. Kang, Y., Kim, N., Kim, B., Tahk, M. (2017). Autopilot design for tilt-rotor unmanned aerial
vehicle with nacelle mounted wing extension using single hidden layer perceptron neural
network. In Proceedings of the Institution of Mechanical Engineers G Journal of Aerospace
Engineering, 2(6), 743–789.
36. Bygi, M., & Mohammad, G. (2007). 3D visibility graph. In International conference on compu-
tational science and its applications, conference: computational science and its applications,
2007. ICCSA 2007. Kuala Lampur.
37. Rashid, A., Rania, A., & Jalel, C., Aisha, H. (2012). TVBDs coexistence by leverage sensing
and geo-location database. In IEEE international conference on computer & communication
engineering (ICCCE2012) (pp. 33–39).
38. Fahad, A., Alsolami, F., & Abdel-Khalek, S. (2022). Machine learning techniques in internet of
UAVs for smart cities applications. Journal of Intelligent and Fuzzy Systems, 42(4), 3203–3226.
39. Ali, S., Hasan, M., & Rosilah, H, et al. (2021). Machine learning technologies for secure
vehicular communication in internet of vehicles: recent advances and applications. Security
and Communication Networks, Article ID 8868355. https://doi.org/10.1155/2021/8868355
40. Zeinab, K., & Ali, S. (2017). Internet of things applications, challenges and related future
technologies. World Scientific News (WSN), 67(2), 126–148.
41. Wang, Y., & Yuan, Q. (2011). Application of Dijkstra algorithm in robot path-planning. In
2011 2nd international conference mechnical automation control engineering (MACE 2011)
(pp. 1067–1069).
42. Patle, B. K., Ganesh, L., Anish, P., Parhi, D. R. K., & Jagadeesh, A. (2019). A review: On path
planning strategies for navigation of mobile robot. Defense Technology, 15, 582e606. https://
doi.org/10.1016/j.dt.2019.04.011
43. Reham, A, Ali, A., et al. (2022). Blockchain for IoT-based cyber-physical systems (CPS): appli-
cations and challenges. In: De, D., Bhattacharyya, S., Rodrigues, J. J. P. C. (Eds.), Blockchain
based internet of things. Lecture notes on data engineering and communications technologies
(Vol. 112). Springer. https://doi.org/10.1007/978-981-16-9260-4_4

44. Jia, Q., & Wang, X. (2009). Path planning for mobile robots based on a modified potential
model. In Proceedings of the IEEE international conference on mechatronics and automation,
China.
45. Gul, W., & Nazli, A. (2019). A comprehensive study for robot navigation techniques. Cogent
Engineering, 6(1),1632046.
46. Hu, Y., & Yang, S. (2004). A knowledge based genetic algorithm for path-planning of a mobile
robot. In IEEE international conference on robotics automation.
47. Pratihar, D., Deb, K., & Ghosh, A. (1999). Fuzzy-genetic algorithm and time-optimal obstacle
free path generation for mobile robots. Engineering Optimization, 32(1), 117e42.
48. Hui, N. B., & Pratihar, D. K. (2009). A comparative study on some navigation schemes of a
real robot tackling moving obstacles. Robot Computer Integrated Manufacture, 25, 810e28.
49. Wang, X., Shi, Y., Ding, D., & Gu, X. (2016). Double global optimum genetic algorithm
particle swarm optimization-based welding robot path planning. Engineering Optimization,
48(2), 299e316.
50. Vachtsevanos, K., & Hexmoor, H. (1986). A fuzzy logic approach to robotic path planning
with obstacle avoidance. In 25th IEEE conference on decision and control (pp. 1262–1264).
51. Ali Ahmed, E. S., & Zahraa, T, et al. (2021). Algorithms optimization for intelligent IoV
applications. In Zhao, J., and Vinoth Kumar, V. (Eds.), Handbook of research on innovations
and applications of AI, IoT, and cognitive technologies (pp. 1–25). Hershey, PA: IGI Global.
https://doi.org/10.4018/978-1-7998-6870-5.ch001
52. Rashid, A., & Khatun, S. (2005) Ultra-wideband (UWB) geolocation in NLOS multipath fading
environments. In Proceeding of IEEE Malaysian international communications conference–
IEEE conference on networking 2005 (MICC-ICON’05) (pp. 1068–1073). Kuala Lumpur,
Malaysia.
53. Hassan, M. B., & Saeed, R. (2021). Machine learning for industrial IoT systems. In Zhao,
J., & Vinoth, K. (). Handbook of research on innovations and applications of AI, IoT, and
cognitive technologies (pp. 336–358). Hershey, PA: IGI Global. https://doi.org/10.4018/978-
1-7998-6870-5.ch023
54. Ali, E. S., & Hassan, M. B. et al. (2021). Terahertz Communication Channel characteristics and
measurements Book: Next Generation Wireless Terahertz Communication Networks Publisher.
CRC group, Taylor & Francis Group.
55. Rania, S., Sara, A., & Rania, A., et al. (2021). IoE design principles and architecture. In Book:
Internet of energy for smart cities: Machine learning models and techniques, publisher. CRC
group, Taylor & Francis Group.
56. Jaradat, M., Al-Rousan, M., & Quadan, L. (2011). Reinforcement based mobile robot
navigation in dynamic environment. Robot Computer Integrated Manufacture, 27, 135e49.
57. Tschichold, N. (1997). The neural network model Rule-Net and its application to mobile robot
navigation. Fuzzy Sets System, 85, 287e303.
58. Alsaqour, R., Ali, E. S., Mokhtar, R. A., et al. (2022). Efficient energy mechanism in heteroge-
neous WSNs for underground mining monitoring applications. IEEE Access, 10, 72907–72924.
https://doi.org/10.1109/ACCESS.2022.3188654
59. Jaradat, M., Garibeh, M., & Feilat, E. A. (2012). Autonomous mobile robot planning using
hybrid fuzzy potential field. Soft Computing, 16, 153e64.
60. Yen, C., & Cheng, M. (2018). A study of fuzzy control with ant colony algorithm used in
mobile robot for shortest path planning and obstacle avoidance. Microsystem Technology, 24(1),
125e35.
61. Duan, L. (2014). Imperialist competitive algorithm optimized artificial neural networks for
UCAV global path planning. Neurocomputing, 125, 166–171.
62. Liang, K. (2010). The application of neural network in mobile robot path planning. Journal of
System Simulation, 9(3), 87–99.
63. Horn, E., Schmidt, B., & Geiger, M. (2012). Neural network-based trajectory optimization for
unmanned aerial vehicles. Journal of Guidance, Control, and Dynamics, 35(2), 548–562.
64. Geiger, B., Schmidt, E., & Horn, J. (2009). Use of neural network approximation in multiple
unmanned aerial vehicle trajectory optimization. In Proceedings of the AIAA guidance,
navigation, and control conference, Chicago, IL.

65. Ali, E., Hassan, M., & Saeed, R. (2021). Machine learning technologies in internet of vehicles.
In: Magaia, N., Mastorakis, G., Mavromoustakis, C., Pallis, E., Markakis, E. K. (Eds.), Intelli-
gent technologies for internet of vehicles. Internet of things. Cham : Springer. https://doi.org/
10.1007/978-3-030-76493-7_7
66. Gautam, S., & Verma, N., Path planning for unmanned aerial vehicle based on genetic algo-
rithm & artificial neural network in 3d. In Proceedings of the 2014 international conference
on data mining and intelligent computing (ICDMIC) (pp. 1–5). IEEE.
67. Wang, N., Gu, X., Chen, J., Shen, L., & Ren, M. (2009). A hybrid neural network method for
UAV attack route integrated planning. In Proceedings of the advances in neural networks–ISNN
2009 (pp. 226–235). Springer.
68. Alatabani, L, & Ali, S. et al. (2021). Deep learning approaches for IoV applications and
services. In Magaia, N., Mastorakis, G., Mavromoustakis, C., Pallis, E., & Markakis, E. K.
(Eds.), Intelligent technologies for internet of vehicles. Internet of things. Cham : Springer.
https://doi.org/10.1007/978-3-030-76493-7_8
69. Hidalgo, A., Miguel, A., Vegae, R., Ferruz, J., & Pavon, N. (2015). Solving the multi-objective
path planning problem in mobile robotics with a firefly-based approach. Soft Computing, 1e16.
70. Brand, M., & Yu, H. (2013). Autonomous robot path optimization using firefly algorithm. In
International conference on machine learning and cybernetics, Tianjin (Vol. 3, p. 14e7).
71. Salih, A., & Rania, A. A., et al. (2021). Machine learning in cyber-physical systems in industry
4.0. In Luhach, A. K., and Elçi, A. (Eds.), Artificial intelligence paradigms for smart cyber-
physical systems (pp. 20–41). Hershey, PA: IGI Global. https://doi.org/10.4018/978-1-7998-
5101-1.ch002
72. Mahboub, A., & Ali, A., et al. (2021). Smart IDS and IPS for cyber-physical systems. In
Luhach, A. K., and Elçi, A. (Eds.), Artificial intelligence paradigms for smart cyber-physical
systems (pp. 109–136). Hershey, PA: IGI Global. https://doi.org/10.4018/978-1-7998-5101-1.
ch006
73. Christensen, A., & Rehan, O. (2008). Synchronization and fault detection in autonomous robots.
In IEEE/RSJ intelligent conference on robots and systems (p. 4139e40).
74. Wang, G., Guo, L., Hong, D., Duan, H., Liu, L., & Wang, H. (2012). A modified firefly algorithm
for UCAV path planning. International Journal of Information Technology, 5(3), 123e44.
75. Patle, B., Parhi, D., Jagadeesh, A., & Kashyap, S. (2017). On firefly algorithm: optimization
and application in mobile robot navigation. World Journal of Engineering, 14(1):65e76, (2017).
76. Patle, B., Pandey, A., Jagadeesh, A., & Parhi, D. (2018). Path planning in uncertain environment
by using firefly algorithm. Defense Technology, 14(6), 691e701. https://doi.org/10.1016/j.dt.
2018.06.004.
77. Ebrahimi, J., Hosseinian, S., & Gharehpetian, G. (2011). Unit commitment problem solution
using shuffled frog leaping algorithm. IEEE Transactions on Power Systems, 26(2), 573–581.
78. Tang, D., Yang, J., & Cai, X. (2012). Grid task scheduling strategy based on differential
evolution-shuffled frog leaping algorithm. In Proceedings of the 2012 international conference
on computer science and service system, (CSSS 2012) (pp. 1702–1708).
79. Hassanzadeh, H., Madani, K., & Badamchizadeh, M. (2010). Mobile robot path planning
based on shuffled frog leaping optimization algorithm. In 2010 IEEE international conference
on automation science and engineering, (CASE 2010) (pp. 680–685).
80. Cekmez, U., Ozsiginan, M., & Sahingoz, O. (2014). A UAV path planning with parallel
ACO algorithm on CUDA platform. In Proceedings of the 2014 international conference on
unmanned aircraft systems (ICUAS) (pp. 347–354).
81. Zhang, C., Zhen, Z., Wang, D., & Li, M. (2010). UAV path planning method based on ant
colony optimization. In Proceedings of the 2010 Chinese Control and Decision Conference
(CCDC) (pp. 3790–3792). IEEE.
82. Brand, M., Masuda, M., Wehner, N., & Yu, X. (2010). Ant colony optimization algorithm for
robot path planning. In 2010 international conference on computer design and applications,
3(V3-V436-V3), 440.
83. Mohanty, P., & Parhi, D. (2015). A new hybrid optimization algorithm for multiple mobile
robots’ navigation based on the CS-ANFIS approach. Memetic Computing, 7(4), 255e73.

84. Wang, G., Guo, L., Duan, H., Wang, H., Liu, L., & Shao, M. (2012). A hybrid metaheuristic DE/CS algorithm for UCAV three-dimension path planning. The Scientific World Journal, 2012, 583973, 11 pages. https://doi.org/10.1100/2012/583973
85. Abbas, N., & Ali, F. (2017). Path planning of an autonomous mobile robot using enhanced
bacterial foraging optimization algorithm. Al-Khwarizmi Engineering Journal, 12(4), 26e35.
86. Jati, A., Singh, G., Rakshit, P., Konar, A., Kim, E., & Nagar, A. (2012). A hybridization
of improved harmony search and bacterial foraging for multi-robot motion planning. In:
Evolutionary computation (CEC), IEEE congress, 1e8, (2012).
87. Asif, K., Jian, P., Mohammad, K., Naushad, V., Zulkefli, M., et al. (2022). PackerRobo: Model-
based robot vision self-supervised learning in CART. Alexandria Engineering Journal, 61(12),
12549–12566. https://doi.org/10.1016/j.aej.2022.05.043
88. Mohanty, P., & Parhi, D. (2016). Optimal path planning for a mobile robot using cuckoo search
algorithm. Journal of Experimental and Theoretical Artificial Intelligence, 28(1e2), 35e52.
89. Wang, G., Guo, L., Duan, H., Wang, H., Liu, L., & Shao, M. (2012). A hybrid metaheuristic
DE/ CS algorithm for UCAV three-dimension path planning. The Scientific World Journal,
583973, 11 pages. https://doi.org/10.1100/2012/583973
90. Ghorpade, S. N., Zennaro, M., & Chaudhari, B. S., et al. (2021). A novel enhanced quantum
PSO for optimal network configuration in heterogeneous industrial IoT, in IEEE access, 9,
134022–134036. https://doi.org/10.1109/ACCESS.2021.3115026
91. Ghorpade, S. N., Zennaro, M., Chaudhari, B. S., et al. (2021). Enhanced differential crossover
and quantum particle Swarm optimization for IoT applications. IEEE Access, 9, 93831–93846.
https://doi.org/10.1109/ACCESS.2021.3093113
92. Saeed, R. A., Omri, M., Abdel-Khalek, S., et al. (2022). Optimal path planning for drones
based on swarm intelligence algorithm. Neural Computing and Applications. https://doi.org/
10.1007/s00521-022-06998-9
Intelligent Control System for Hybrid
Electric Vehicle with Autonomous
Charging

Mohamed Naoui, Aymen Flah, Lassaad Sbita, Mouna Ben Hamed, and Ahmad Taher Azar

Abstract The present chapter provides a general review of electric vehicles (EVs) and tests the efficiency of modern charging systems. The work also concentrates on hybrid vehicle architectures and recharging systems. In the first step, and more precisely, a global study of the different architectures and technologies for EVs examines the battery, the electric motor, and the different sensor actions in electric vehicles. The second part discusses the different types of charging systems used in EVs, which are divided into two types: classic chargers and autonomous chargers. In addition, an overview of the autonomous charger is presented along with its corresponding mathematical modeling, addressing the photovoltaic (PV) charger and the wireless (WR) charging system. After a clear mathematical description of each part, and by showing the electronic equipment needed to ensure each tool's role, a simple management loop is designed and implemented. A hybrid charging system combining PV and WR is then proposed, together with an intelligent power distribution system. Then, Matlab/Simulink software is used to simulate the energetic performance of an electric vehicle with this hybrid recharge tool under various simulation conditions. At the end of this study, the given results and their corresponding discussions show the benefits and drawbacks of each solution and prove the importance of this hybrid recharge tool for increasing vehicle autonomy.

Keywords Hybrid electric vehicle · Wireless charging system · Batteries technology · Intelligent control · Photovoltaic · Fuzzy logic control

M. Naoui (B) · A. Flah · L. Sbita · M. Ben Hamed


Environment and Electrical Systems LR18ES34, National Engineering School of Gabes,
University of Gabes, Zrig Eddakhlania, Tunisia
e-mail: [email protected]
A. Flah
e-mail: [email protected]
A. T. Azar
College of Computer and Information Sciences, Prince Sultan University, Riyadh 11586,
Saudi Arabia
e-mail: [email protected]; [email protected]; [email protected]
Faculty of Computers and Artificial Intelligence, Benha University, Benha 13518, Egypt

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 405
A. T. Azar and A. Koubaa (eds.), Artificial Intelligence for Robotics and Autonomous
Systems Applications, Studies in Computational Intelligence 1093,
https://doi.org/10.1007/978-3-031-28715-2_13

1 Introduction

Electrifying transport systems have become a necessity in the modern city, and having
an electric vehicle instead of a fuel-powered one is becoming essential given the
technological advantages of communication techniques as well as the advantages of
taxes and the reduced price of electrical energy, compared to that of fuels [1]. This
transport system has been applied to several models and architectures which differ
internally. In recent years, most countries have sought to develop their transport
systems. Indeed, the facilities offered by new technologies and the global orientation
in saving the planet from atmospheric pollution have pushed toward the electrification
of transport systems. The future objective has therefore become the elimination of
all transport systems based on polluting energies, the replacement of which by other
systems using clean energy has become a necessity in most countries. This means
that modern transport systems are based, either totally or partially, on electrical
energy, which is non-polluting energy. The scarcity of fossil fuels and ecological
concerns are leading to a phase of the energy transition. The transport sector absorbed
nearly 66% of global oil production in 2011, producing more than 70% of global
greenhouse gas emissions [2]. The automotive sector is at the heart of these problems.
Therefore, research and development of technologies related to electric vehicles have
become indispensable [3]. Regarding this sector and its basic elements, the storage of electrical energy is one of the main success factors in this field [4]. Generally, for an electric vehicle to be competitive in the market, the factor mentioned above must offer high profitability. In this respect, the charging system model used directly influences how a vehicle model ranks compared with other models. These charging systems have been treated and studied in various research works since the appearance of accumulators, which led to the notion of a charger connected to the grid [5]. These conventional systems immobilize the system that carries the accumulators and limit its area of movement [6]. This behavior has weaknesses and strengths for some applications. Concerning the objective studied in this chapter, electric vehicles using grid-connected chargers are exposed to various problems [7]. These are especially linked to the recharging time, which is generally long, and to the need for lengthy stops on long journeys [5]. Weaknesses thus appeared with this type of vehicle. The resolution of this problem began with the appearance of new lithium battery technology as well as mobile charging systems, such as photovoltaic, hybrid, or even moving-contact systems (as in the metro). In this context, the authors in [8, 9] proposed the first version of an adaptable photovoltaic energy system for an electric vehicle [10]. This system is used in fixed stations (charging points equipped with photovoltaic panels, or panels installed on the vehicle itself).
From another point of view, the integration of wireless charging while the vehicle is parked appeared in [11], after which problems appeared in relation to the frequency factor. The researchers in [12, 13] proposed a version characterized by a frequency study of this system, taking into account the static and dynamic problems [14]. However, the efficiency of all charging systems is related to the type of control

and energy distribution used. These kinds of control are indirect or may employ procedures that do not require information or knowledge of the recharging system, such as fuzzy logic (FL), neural network (NN), and ANFIS-based tactics [15–17].
In this context, the work presented in this chapter aims to study the history of electric vehicles. We have tried to present this system in accordance with what the literature suggests, and in particular we have discussed the different topologies that exist as well as the known architectures. The rest of this part consists of a review of the known and used recharging systems, covering their modern and classic architectures. An intelligent power distribution system is then used to save the energy in the battery and to use the available energy sources in the most efficient way. Finally, in this chapter, the authors study the incorporation of several recharging devices within the vehicle, which can also be used while the vehicle is in motion. This hybrid recharging system consists of photovoltaic panels mounted on the body of the car, which collect solar energy to be used in the power pack. In addition, the wireless charging device allows recharging to be managed even while the vehicle is in motion on rechargeable roads.

These systems are modeled and explained in order to define the hybrid recharging system. This system is also studied using the Matlab/Simulink tool to obtain information about the power flow and the influence of external factors and vehicle speed on the battery state-of-charge parameter. Therefore, this chapter is organized into four sections. After the introduction, a general review presents the general classification of electric vehicles. The third section explains the architecture of electric vehicles, accurately presenting its components. The next section describes electric vehicle charging, the mathematical model of the autonomous charging system, and the simulation results. Finally, a conclusion summarizes the chapter.

2 Preliminaries

2.1 Hybrid Vehicle and Pure Electric Vehicle

The electric vehicle comes in two models: the hybrid version and the pure EV. The combustion engine is the only difference between the two models, as it exists only in the hybrid model. The initial pack of components groups an energy source beside a battery system. This block is connected to the power electronic converter feeding the main electric motor. The functional block needs a control system beside a high-performance calculator for supervising all the processes of energy management and vehicle speed control [18].
Advantages and drawbacks can be found for each of these models [19]. The pure electric vehicle is friendly to the environment. Based on the problem of the environment and gas emissions, the pure electric vehicle field has encouraged research into making this transport solution more efficient. Optimizing the size of the motor and improving battery technologies, by making various solutions
for recharge, are the different fields in this sector of research [20].

Fig. 1 General classification of electric vehicle

The EVs use
electrical energy to drive the vehicle and to interact with the vehicle’s electrical
system. According to the Technical Committee of the International Electro-Technical
Commission (IETCTC), whether a vehicle uses two or more energy sources or storage
systems, or converters to drive the vehicle, it is referred to as a hybrid electric vehicle
(HEV) as long as at least one source supplies electricity [21]. EVs are categorized
into various categories according to the combination of sources [3]: the battery alone serves as the source for the battery electric vehicle (BEV); a fuel cell together with a battery for the fuel cell electric vehicle (FCEV); a battery and ICE for the HEV; and a battery with the grid or an external charging station for the PEV, as shown in Fig. 1. In the following section, the specifics of the EV forms are discussed.

2.2 Hybrid Vehicle Architecture

The HEV can be classified into three major architectures. Architecture refers to the
configuration of the main elements of the powertrain. In our case, it is the heat engine,
an electric machine, and a battery. Three architectures can be characterized by how
thermal and electrical energies are channeled to the wheels: series, parallel, or power
bypass (series–parallel) [22].
Fig. 2 The series architecture of the hybrid vehicle (ICE, generator, electric motor)

2.2.1 Hybrid Series

In this configuration (Fig. 2), the heat engine drives an alternator that supplies the
battery in the event of discharge and the electric motor in the event of a high-power
demand. This type of model allows great flexibility of propulsion: it consists in operating the heat engine in the range of its highest efficiency and in increasing the autonomy of the vehicle. On the other hand, the overall efficiency is very low because of the double conversion of energy. Moreover, it requires a relatively powerful electric motor because it alone provides all of the propulsion [23].
However, this architecture makes it possible to satisfy one of the constraints raised in the problem statement, particularly low emissions in the urban cycle and a saving of 15 to 30% in consumption.

2.2.2 Parallel Hybrid

In a parallel hybrid structure, the heat engine supplies its power to the wheels as for
a traditional vehicle. It is mechanically coupled to an electric machine which helps
it. This configuration is shown in Fig. 3 [24].
Fig. 3 Double-shaft and single-shaft parallel hybrid configuration

The peculiarity of its coupling also gives it the name of parallel hybrid with the
addition of torque or of addition of speed depending on the structure and design of the
vehicle. The torque addition structure adds the torques of the electric machine and the
heat engine to propel the vehicle (or to recharge the battery). This connection can be
made by belts, pulleys, or gears (a technology called parallel double-shaft hybrid).
The electric machine can also be placed on the shaft connecting the transmission
to the heat engine (a technology called parallel single shaft). The speed addition
structure adds the speeds of the heat engine and the electric machine. The resulting
speed is related to the transmission. This type of coupling allows great flexibility in
terms of speeds. The connection is made mechanically by planetary gear (also called
epicyclic gear).
This architecture requires more complex control than that of the serial architecture
and requires additional work for the physical integration of power sources. Never-
theless, not insignificant gains can be obtained, even by using electrical components
of low power and low capacity. Also, these gains make it possible to compensate
for the additional cost of this architecture and the excess weight associated with the
batteries and the electric motor.

2.2.3 Series–Parallel Hybrid

A series–parallel architecture combines the operating modes and the advantages of
both series and parallel architectures. The best-known of the series/parallel hybrid
architectures is that of the Toyota Prius. The latter uses a planetary gear and a first
electric machine that brings the engine to its best performance points, while a second
machine participates in traction.
Within these structures, part of the energy delivered by the heat engine is trans-
mitted mechanically to the wheels. At the same time, electric machines take or supply
energy to the transmission to meet the objectives (acceleration, charge, or discharge
of the battery, optimal consumption of the heat engine). In most cases, there are two
electric machines, each of which can be either a motor or a generator. This configu-
ration, therefore, allows at least 4 operating modes each having certain advantages.
Such an architecture is described in Fig. 4. This architecture has the advantage of
being very efficient, without the use of a clutch or variable speed drive, but with very
delicate management [25].

2.2.4 Comparison of the Different Propulsion Structures

Depending on the configuration used, here are some advantages and disadvantages
of each one presented in Table 1.
Fig. 4 Schematic of the double hybridization vehicle

3 The Architecture of Electric Vehicles

3.1 Battery Technologies

The battery is defined as a device that ensures the storage of energy to deliver
electrical energy. It is divided into two categories: Primary batteries and secondary
batteries. Primary batteries provide energy only once, during a single discharge, while secondary batteries can store and deliver energy repeatedly through charge and discharge cycles over the whole life of the battery. The characteristics of a battery are generally defined by several criteria indicating its performance: energy density, market cost, number of charging cycles, discharge processes, influence on the environment, temperature range, and memory effects. This section will focus on secondary
batteries as they are the ones used in electric or hybrid vehicles. There are a lot of
batteries used in EVs, generally based on Lead Acid, Nickel, Lithium Metal, Silver,
and Sodium-Sulfur. The following battery technologies that are used in EVs will
be described respectively: Lead-acid (Pb-acid), Nickel–Cadmium (NiCd), Nickel-
metal-hydride (Ni-MH), Lithium-ion (Li-ion) [26]. Table 2 shows the characteristics
of the different battery technologies used in hybrid vehicles, their electrification systems, costs, and CO2 emission minimization in each case [27].

Table 1 Comparison of various architectures

Hybrid series
Advantages:
– Good energy efficiency at low speeds (all-electric mode in urban areas)
– Good control of the heat engine
– The generator set is not necessarily placed next to the electric traction machine: an additional degree of freedom for placing the various components (example of the low-floor bus)
– Relatively easy to manage (compared to other architectures)
– Easy to design and control; it requires very little mechanical equipment (no clutch or gearbox)
Disadvantages:
– Poor energy efficiency of the overall chain (in extra-urban areas)
– Use of 3 machines, one of which (the electric traction machine) is at least of high power (maximum size)
– Not all thermal modes are possible

Parallel hybrid
Advantages:
– Good energy efficiency
– Use of a single electric machine
– All-thermal and all-electric modes are possible (in some cases)
– The transmission is little modified (in some cases) compared to the conventional vehicle
Disadvantages:
– More demanding operation of the heat engine: poor dynamics
– The torque setpoint must be distributed at all times between the two torque sources
– Mechanical coupling and complex energy control

Mixed hybrid
Advantages:
– Good energy efficiency
– Very good energy distribution
– Vehicle flexibility: all modes are authorized (thermal, electric, series, parallel, or series–parallel)
– No break in torque at the wheel
Disadvantages:
– Use of 3 machines, or 2 machines with 2 clutches
– Very complex coupling and very delicate management
– It requires at least two electric machines in addition to the heat engine, which makes it expensive and very heavy

Table 2 Comparative table of battery technologies


Battery                               Lead-acid      Ni–Cd          Ni-MH          Li-ion
Energy density (Wh/kg)                30–50          45–80          60–120         160–200
Number of cycles (charge/discharge)   500–800        1000–2000      600–1500       400–1200
Loading time                          6–12 h         1–2 h          2–4 h          2–4 h
Operating temperature                 –20 to 60 °C   –40 to 60 °C   –20 to 60 °C   –20 to 60 °C
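To give a feel for what the gravimetric energy densities of Table 2 mean in practice, the short sketch below converts them into pack-level energy for a fixed mass budget. The 100 kg pack mass is an assumed value chosen only for illustration; it is not a figure from this chapter.

```python
# Illustrative estimate of pack energy from the gravimetric energy densities of Table 2.
# The 100 kg pack mass is an assumption for the example only.

ENERGY_DENSITY_WH_PER_KG = {
    "Lead-acid": (30, 50),
    "Ni-Cd": (45, 80),
    "Ni-MH": (60, 120),
    "Li-ion": (160, 200),
}

PACK_MASS_KG = 100.0  # assumed mass budget for the battery pack

for chemistry, (lo, hi) in ENERGY_DENSITY_WH_PER_KG.items():
    e_lo_kwh = lo * PACK_MASS_KG / 1000.0  # pessimistic estimate
    e_hi_kwh = hi * PACK_MASS_KG / 1000.0  # optimistic estimate
    print(f"{chemistry:10s}: {e_lo_kwh:4.1f} to {e_hi_kwh:4.1f} kWh for a {PACK_MASS_KG:.0f} kg pack")
```

The order-of-magnitude gap between lead-acid and Li-ion explains why lithium technology is the one retained for the vehicle studied here.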

Fig. 5 Example of supercapacitor

3.2 Super-Capacitors

The principle of supercapacitors (SC), an example of which is shown in Fig. 5, is to store energy in electrostatic form. They are energy storage systems with a low energy density but a significant power density. Consequently, they are used in the transient phases to provide the requested power peaks, in order to reduce current stresses, reduce the size, and increase the lifespan of the main energy source (battery or fuel cell) [28].
The supercapacitor consists of two metal collectors (see Fig. 6), each coupled to
two carbonaceous, porous electrodes impregnated with an electrolyte.
To remedy the problem of oversizing batteries in HEV applications, supercapacitors have very interesting properties: their charge-transfer kinetics are faster than those of batteries, and their lifetime is of the order of a few hundred thousand charge/discharge cycles [29].
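As a rough order-of-magnitude comparison of the storage technologies discussed above, the snippet below evaluates the supercapacitor energy E = ½CV² and compares it with a small battery pack. The module capacitance, voltage, and battery figures are assumed values used only for illustration.

```python
# Energy stored in a supercapacitor bank versus a small battery (illustrative values).
def supercap_energy_wh(capacitance_f: float, voltage_v: float) -> float:
    """E = 1/2 * C * V^2, converted from joules to watt-hours."""
    return 0.5 * capacitance_f * voltage_v ** 2 / 3600.0

sc_energy = supercap_energy_wh(capacitance_f=165.0, voltage_v=48.0)  # assumed 165 F / 48 V module
battery_energy = 288.0 * 20.0                                        # assumed 288 V, 20 Ah pack -> Wh

print(f"Supercapacitor module : {sc_energy:8.1f} Wh")
print(f"Battery pack          : {battery_energy:8.1f} Wh")
# The supercapacitor stores far less energy, but it can deliver it with much higher power,
# which is why it is reserved for transient power peaks rather than for range.
```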

3.3 The Electric Motor

The motor is a relatively simple component at the heart of an electric vehicle. It operates on the interaction forces (force vectors) between an electromagnet and a permanent magnet. When braking, the mechanical chain becomes part of the power source, and the main energy source (battery) becomes the receiver. The motor is thus an actuator that creates rotational motion from electrical energy.

Fig. 6 Composition of a supercapacitor

Electric motors are widely used because of their reliability, simplicity, and good performance. An electric motor is composed of an output shaft, a frame body, and two electric spindles [30]. There are a large number of motor types.

3.3.1 DC Motors

Drives with DC motors have long been used in electric vehicles because they provide simple speed control. Furthermore, this sort of motor has good electric propulsion properties (a very favorable torque curve at low speed). However, their production is costly, and the brush-collector system must be maintained [31]. Their speed is limited, and they have a low specific power, typically 0.3 to 0.5 kW/kg, whereas gasoline engines have a specific power of 0.75 to 1.1 kW/kg. As a result, they are less reliable and unsuitable for this purpose [32].

3.3.2 Asynchronous Motors

The asynchronous motor is made up of a stator and a rotor. The stator is the fixed part of the motor and carries three windings, which can be connected in star (Y) or in delta (Δ) depending on the supply network. The rotor is the rotating part of the motor and is cylindrical; it carries either a winding (usually three-phase, like the stator) accessible through three slip rings and three brushes, or an inaccessible squirrel cage made of conductive aluminum bars. In both cases, the rotor circuit is short-circuited (by rings or a rheostat) [33]. The asynchronous machine, owing to its simplicity of manufacture and maintenance, is currently the most widespread in the industrial sector and has much better performance than other types of machines. However, these machines have a lower torque-to-mass ratio, efficiency, and power factor than permanent-magnet machines.

3.3.3 Synchronous Motors

Although more difficult to manage, more expensive, and potentially less robust, the synchronous motor has become a key choice in electric and hybrid vehicles. In both generator and motor mode, the synchronous machine has the highest efficiency. Like an asynchronous motor, a synchronous motor consists of a stator and a rotor separated by an air gap; the only difference lies in the rotor design.

3.3.4 Operation of the Electric Motor

Electric vehicles are increasingly part of our daily lives, so it is worth looking at the operation of their motor as well as its different versions (synchronous, asynchronous, permanent-magnet, induction, etc.). Let us therefore review the general principle of this technology, which is, however, not new [34].
A. The principle of an electric motor
The principle of an electric motor (Fig. 7), regardless of its construction, is to use magnetic force to generate movement. Magnetic force is familiar to us, since magnets may repel or attract other magnets. Two primary elements are employed for this: permanent magnets and copper coils (copper being well suited to this task because it is highly conductive), or in some cases only copper coils (therefore without a permanent magnet). Everything is mounted on a circular axis to achieve a continuous and regular movement; the idea is to create a cycle that will repeat itself as long as the motor is fed.

Fig. 7 Electric motor



It should also be noted that a coil carrying a current behaves like a magnet, producing an electromagnetic field with two poles, north and south. All electric motors are reversible: if the magnet is moved mechanically, a current is generated in the coil (which can be used, for example, to recharge the battery during regeneration); if a current is injected into the coil, the magnet begins to move. In reality, the electrons flow from – to +, although the conventional current direction was defined as going from + to – (this convention was adopted before the true direction of electron flow was known).
B. Parts of an electric motor

B.1 Accumulator:
This is where the current that powers the motor comes from, generally a lithium-ion or Ni-MH battery.
B.2 Stator:
This is the peripheral part of the motor, the one that does not rotate. To remember it, think of the static part (stator). In 99% of cases it is made up of coils that are supplied to a greater or lesser extent (and, in AC motors, with alternating polarity) in order to make the rotor turn.
B.3 Rotor:
This is the moving part; to remember it, think of the word rotation (rotor). It is generally not supplied directly, because feeding a moving part is difficult (or in any case not durable over time).
C. Transmission:
Because the electric motor has a very wide operating range (up to 16,000 rpm on a Model S, for example) and its torque is available immediately (the lower the revs, the more torque), it is not necessary to use a gearbox, so the motor is, in effect, directly connected to the wheels through a fixed ratio. The gear ratio remains constant whether you are traveling at 15 or 200 km/h.
The rhythm of the electric motor is not exactly that of the wheels; there is what is known as a reduction. On a Model S, the reduction ratio is around 10:1, which means that the wheel turns 10 times slower than the electric motor. An epicyclic gear train, which is common in automatic gearboxes, is used to obtain the reduction ratio. Figure 8 depicts this global structure.
After this reducer, there is finally the differential, which allows the wheels to rotate at different speeds. There is no need for a clutch or a torque converter: while a thermal engine needs to keep turning at all times, this is not the case for an electric motor. It therefore has no idling speed and no need for a clutch acting as a bridge between the wheels and the engine: when the wheels stop, there is no need to disengage.
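The relation between motor speed, reduction ratio, and road speed described above can be checked with a few lines of arithmetic. The 16,000 rpm figure and the roughly 10:1 reduction are quoted in the text; the wheel radius is an assumed value.

```python
import math

# Vehicle speed from motor speed through a fixed reduction gear (no gearbox, no clutch).
MOTOR_RPM = 16000.0     # maximum motor speed quoted in the text
REDUCTION_RATIO = 10.0  # ~10:1 reduction quoted in the text
WHEEL_RADIUS_M = 0.35   # assumed wheel radius

wheel_rpm = MOTOR_RPM / REDUCTION_RATIO
wheel_rad_s = wheel_rpm * 2.0 * math.pi / 60.0
speed_kmh = wheel_rad_s * WHEEL_RADIUS_M * 3.6

print(f"Wheel speed : {wheel_rpm:.0f} rpm")
print(f"Road speed  : {speed_kmh:.0f} km/h at {MOTOR_RPM:.0f} rpm motor speed")
```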

Fig. 8 Transmission system

3.3.5 The Calculator and Different Sensors of Electric Vehicles

The calculator is a power computer and manages many functions; for example, it controls the energy flows thanks to the many sensors it reads. When the driver accelerates, a sensor on the pedal (a potentiometer, the same as on modern thermal vehicles) is pressed, and the computer then manages the flow of energy to be sent to the motor according to the degree of acceleration. Likewise, when the pedal is released, it manages energy recovery by sending the power generated by the (reversible) electric motor back to the battery while modulating the electric flow. It can chop the current using a chopper (battery to motor) or rectify it (recovery of AC energy towards the DC battery). The different sensor actions in electric vehicles are shown in Table 3.
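The behavior described above, reading the accelerator potentiometer and switching between traction power and regenerative recovery, can be summarized by a very simple decision function. This is only a sketch of the idea: the power limits and thresholds are assumptions, not values taken from the chapter.

```python
# Minimal sketch of the traction/regeneration decision made by the vehicle computer.
# Positive power flows from the battery to the motor, negative power is recovered.

MAX_TRACTION_POWER_W = 50_000.0   # assumed motor power limit
MAX_REGEN_POWER_W = 15_000.0      # assumed regeneration limit

def power_command(pedal_position: float, vehicle_speed_kmh: float) -> float:
    """pedal_position in [0, 1], read from the accelerator potentiometer."""
    if pedal_position > 0.0:
        # Traction: power proportional to the "degree of acceleration".
        return pedal_position * MAX_TRACTION_POWER_W
    if vehicle_speed_kmh > 5.0:
        # Pedal released while moving: recover energy towards the battery.
        return -MAX_REGEN_POWER_W
    return 0.0  # stopped: an electric motor needs no idling power

print(power_command(0.6, 80.0))   # partial acceleration -> 30 kW to the motor
print(power_command(0.0, 80.0))   # pedal released -> -15 kW back to the battery
```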

4 Electric Vehicles Charging

4.1 Types of Classic Chargers

Integrated chargers can reuse all, or part, of the components of the traction chain to
perform the recharge. For example, the power of the traction chain of the Renault
ZOE 2nd generation is 80 kW with an on-board battery and a capacity of 40 kWh,
which makes it possible to envisage a substantial recharge in 30 min using the components of the traction inverter. The tree in Fig. 9, taken from a review of integrated on-board chargers carried out as part of a CIFRE thesis with Renault S.A.S., lists the different means of exploiting the traction chain for charging. This classification is based on the study of 67 publications, including patented topologies, journal articles, and conference papers [35].
Table 3 Different sensor actions in the electric vehicles

Battery discharge status sensor
Battery discharge status sensor
Brake pedal action sensor
Parking brake sensor
Electric motor temperature sensors
Outdoor temperature sensor
4 tire pressure sensors
Front obstacle sensors
Reversing radar sensor
Light sensor

Fig. 9 Classification of on-board and powertrain-integrated chargers

The reuse of onboard power electronics and/or electrical machine windings can
cause EMC interference problems with other equipment connected to the electrical
system and also with domestic protection devices. This may affect the availability of
the EV load. If the high-frequency components of the leakage currents are too high, two things can happen: blinding or untimely tripping of protection devices, such as the differential (residual-current) circuit breaker. Any event that can cause an RCD to blind poses a high safety risk to the user. Therefore, the IEC 61851-21 safety standard specifies that the leakage current must not exceed 3.5 mA RMS.
Thus, the reduction of emissions conducted towards the network, and more partic-
ularly those of common mode currents, in a large frequency range [150 kHz –
30 MHz] is often achieved by galvanic isolation through the use of topologies based
on power transformers. Given the charging power levels, galvanic isolation has an
impact on the cost and volume of the charger. When the charger is not isolated, manu-
facturers use passive and active filtering in order to limit the disturbances generated
by the charger.

4.2 Autonomous Charger

Autonomous chargers, or non-traditional chargers, are charging systems that combine the use of new energy sources and advanced charging techniques, which ensure simplicity of the charging task, energy savings, and even savings in recharging time.

4.2.1 Photovoltaic Charger

The average intensity of solar energy reaching the earth is 1367 W/m². The availability of this amount of energy has encouraged researchers to design solar receivers intended to transform this solar energy into electrical energy. The results obtained have guided vehicle manufacturers towards another energy source that can be used to improve vehicle autonomy. This solar charging system is based on a set of components that includes the solar receivers, which deliver direct-current electricity when light reaches them. The efficiency of this conversion depends mainly on the type of solar panel, such as polycrystalline, monocrystalline, or amorphous silicon.
Charge controllers are also indispensable tools in this operating loop since the
outputs of the panels are variable and must be adjusted before being stored in
the battery or supplied to the load. Charge controllers work by monitoring battery
voltage. In other words, they extract the variable voltage from the photovoltaic panels,
depending on the safety of the battery. Once fully charged, the controller can short
out the solar panel to prevent further charge buildup in the battery. These controllers
are usually DC-DC converters. Figure 10 shows the architecture of the solar vehicle
and the location of the charge controller [36].
Most of these controllers measure the battery voltage and supply current to the battery accordingly, or completely stop the flow of current. This is done by measuring the current capacity of the battery rather than looking at its state of charge (SOC). The maximum voltage the battery is allowed to reach is called the "charging set point". The controller also protects against deep discharge, battery sulfation, overcurrent, and short circuits. A deep discharge can be detected by the microcontroller, which will then initiate an automatic accelerated charge to keep the battery activated.

Fig. 10 Solar vehicle architecture

Depending on the connections, charge controllers
can be of two types: the parallel controller, which is connected in parallel with the
battery and the load, and the series controller, which is placed in series between the
solar, the battery, and the load.
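A minimal sketch of the charge-controller behavior described above (measure the battery voltage and connect or cut the PV current around the charging set point, with protection against deep discharge) is given below. The voltage thresholds and the hysteresis band are assumed values for illustration.

```python
# Simplified series charge-controller decision: connect or disconnect the PV panel
# around the "charging set point" (assumed thresholds, with a small hysteresis).

CHARGING_SET_POINT_V = 14.4   # assumed maximum battery voltage
RECONNECT_V = 13.6            # assumed reconnection threshold
DEEP_DISCHARGE_V = 10.5       # assumed deep-discharge threshold

def controller_step(batt_voltage: float, pv_connected: bool) -> tuple[bool, bool]:
    """Returns (pv_connected, boost_charge_requested)."""
    boost = batt_voltage < DEEP_DISCHARGE_V          # deep discharge -> accelerated charge
    if batt_voltage >= CHARGING_SET_POINT_V:
        return False, boost                           # fully charged: cut the PV current
    if batt_voltage <= RECONNECT_V:
        return True, boost                            # below the hysteresis band: reconnect
    return pv_connected, boost                        # inside the band: keep previous state

state = True
for v in (12.8, 14.5, 14.0, 13.5, 10.2):
    state, boost = controller_step(v, state)
    print(f"V = {v:4.1f} V -> PV connected: {state}, boost charge: {boost}")
```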

4.2.2 Inductive Power Transfer

Wireless vehicle charging, implemented by inductive power transfer, is convenient in terms of safety and comfort: the user need not worry about handling power cords, thus avoiding the risk of electrocution, and can park the vehicle in an appropriate space so that the charging operation takes place automatically. The coils are generally placed in the following way: the one connected to the grid is placed on the ground, and the other, connected to the battery, is placed below the chassis of the vehicle, as can be seen in Fig. 11 [37]. The minimum power of the electric
vehicle charging level is generally 3 kW. Various examples of commercial electric
vehicles’ wireless charging stations can be provided as electric vehicle companies
are increasingly interested in this innovation. Among vehicle manufacturers, Toyota,
Nissan, General Motors, and Ford are some of the companies showing interest in
inductive charging [38].
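Both coils of an inductive power transfer link are usually tuned to the same resonant frequency with series compensation capacitors, a point exploited in the model of the next section. The snippet below sizes such a capacitor for an assumed coil inductance and operating frequency; both numbers are illustrative.

```python
import math

# Series compensation capacitor for a resonant IPT coil: f = 1 / (2*pi*sqrt(L*C)).
L_COIL_H = 120e-6        # assumed coil inductance (120 uH)
F_OPERATING_HZ = 85e3    # assumed operating frequency (85 kHz)

c_series = 1.0 / ((2.0 * math.pi * F_OPERATING_HZ) ** 2 * L_COIL_H)
print(f"Required series capacitance: {c_series * 1e9:.1f} nF")

# Cross-check: the resonant frequency obtained with this capacitor.
f_check = 1.0 / (2.0 * math.pi * math.sqrt(L_COIL_H * c_series))
print(f"Resonant frequency check   : {f_check / 1e3:.1f} kHz")
```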
Among companies producing wireless charging systems for electric vehicles, Evatran and HaloIPT are leaders in providing and improving inductive charging technology. Evatran has created the Plugless Power inductive charging system. HaloIPT, one of whose inductive chargers is shown in Fig. 11, was acquired by Qualcomm. The opportunity for fast charging would make IPT more attractive for electric vehicles [39, 40].

5 The Mathematical Model for the Autonomous Charging System

In this part, we test the efficiency of the autonomous system. We chose the two systems most used in electric and hybrid vehicle applications: the photovoltaic charging system and the wireless charging system. We then proposed a hybrid system combining PV and wireless (WR) charging and tested its efficiency. The different blocks of the hybrid recharge system are shown in Fig. 12.
Fig. 11 Wireless charging of electric vehicles, based on IPT: a wireless V2G, b plug-in V2G

5.1 Inductive Power Transfer Model

A simplified representation of the IPT system is given in Fig. 13 where “V1 ” and
“V2 ” indicate the input and output voltages of this system. Each part consists of a set
of resistance and capacitance placed in series, between the source and the part either
emitting or receiving. This system is similar to that of a moving transformer [22].
Fig. 12 Proposed hybrid system

Fig. 13 The IPT system: a simplified representation

From this representation, it is possible to express the primary voltage delivered by the DC-AC stage according to the parameters of the coil of the transmitting part, Eq. (1):

\begin{cases}
\vec{V}_1 = \left( j\omega L_1 + \frac{1}{j\omega C_1} + R_1 \right)\vec{I}_1 - j\omega M\,\vec{I}_2 \\
\vec{V}_2 = j\omega M\,\vec{I}_1 - \left( j\omega L_2 + \frac{1}{j\omega C_2} + R_2 \right)\vec{I}_2
\end{cases} \quad (1)
Subsequently, the vectors linked to V1 and V2, considering that ϕ1 and ϕ2 are their phases with respect to a zero-phase reference vector, are given by (2):

\begin{cases}
\vec{V}_1 = \frac{2\sqrt{2}}{\pi} V_1 (\cos\varphi_1 + j\sin\varphi_1) \\
\vec{V}_2 = \frac{2\sqrt{2}}{\pi} V_2 (\cos\varphi_2 + j\sin\varphi_2)
\end{cases} \quad (2)

The real part of the power on the primary and secondary sides is equivalent to the active power, as given by Eq. (3):

\begin{cases}
P_1 = \mathrm{Re}\left( \vec{V}_1 \vec{I}_1 \right) \\
P_2 = \mathrm{Re}\left( \vec{V}_2 \vec{I}_2 \right)
\end{cases} \quad (3)

From Eq. (1), the vector of the primary current is expressed according to Eq. (4):

\begin{cases}
\vec{I}_1 = \dfrac{\vec{V}_1 - \dfrac{j\omega M}{R_2}\vec{V}_2}{R_1 + \dfrac{(\omega M)^2}{R_2}} \\
\omega = \dfrac{1}{\sqrt{LC}} = 2\pi f
\end{cases} \quad (4)

where L is the intrinsic inductance of the primary and secondary coils, assumed to be
identical. C is the value of the series compensation capacitors C1 and C2, assumed
to be equal (C1 = C2). The expression of the current on the emitting side is therefore
expressed in Eq. (5).
\begin{cases}
\vec{I}_1 = \dfrac{2\sqrt{2}}{\pi} \cdot \dfrac{X - Y}{R_1 + \dfrac{(\omega M)^2}{R_2}} \\
X = V_1 (\cos\varphi_1 + j\sin\varphi_1) \\
Y = \dfrac{j\omega M V_2}{R_2} (\sin\varphi_2 + j\cos\varphi_2)
\end{cases} \quad (5)

However, the phase delay is defined as the phase difference between V2 and V1,
hence:

\varphi_D = \varphi_1 - \varphi_2 \quad (6)

According to Eqs. (2), (3), and (5), the real power of the primary side is defined according to Eq. (7):

P_1 = \dfrac{8}{\pi^2} \cdot \dfrac{V_1^2 + \dfrac{V_1 V_2 \omega M}{R_2}\sin\varphi_D}{R_1 + \dfrac{(\omega M)^2}{R_2}} \quad (7)

From Eq. (1), the secondary current vector is as follows:

\vec{I}_2 = \dfrac{\dfrac{j\omega M \vec{V}_1}{R_1} - \vec{V}_2}{R_2 + \dfrac{(\omega M)^2}{R_1}} \quad (8)

The secondary current vector is, therefore:

\begin{cases}
\vec{I}_2 = \dfrac{2\sqrt{2}}{\pi} \cdot \dfrac{A - B}{R_2 + \dfrac{(\omega M)^2}{R_1}} \\
A = \dfrac{j\omega M V_1}{R_1} (\sin\varphi_1 + j\cos\varphi_1) \\
B = V_2 (\cos\varphi_2 + j\sin\varphi_2)
\end{cases} \quad (9)

According to Eqs. (3), (8), and (9), the real power on the secondary side is defined by Eq. (10):

P_2 = \dfrac{8}{\pi^2} \cdot \dfrac{\dfrac{V_1 V_2 \omega M}{R_1}\sin\varphi_D - V_2^2}{R_2 + \dfrac{(\omega M)^2}{R_1}} \quad (10)

The real effective values of the primary and secondary waveforms are related to
the direct voltages V1 and V2, as a function of their phase shift values, ϕs1 and ϕs2 ,
relating, respectively, to the primary and secondary bridges. Considering Vdc and
Vbatt as the amplitudes of V1 and V2 :
  
\begin{cases}
V_1 = V_{dc} \cdot \sin\left( \dfrac{\varphi_{s1}}{2} \right) \\
V_2 = V_{batt} \cdot \sin\left( \dfrac{\varphi_{s2}}{2} \right)
\end{cases} \quad (11)

Finally, substituting (11) into (7) and (10), the real powers are obtained as:

\begin{cases}
P_1 = \dfrac{8}{\pi^2} \cdot \dfrac{V_{dc}\sin\left(\frac{\varphi_{s1}}{2}\right)}{R_1 + \dfrac{(\omega M)^2}{R_2}} \left[ V_{dc}\sin\left(\frac{\varphi_{s1}}{2}\right) + E \right] \\
P_2 = \dfrac{8}{\pi^2} \cdot \dfrac{V_{batt}\sin\left(\frac{\varphi_{s2}}{2}\right)}{R_2 + \dfrac{(\omega M)^2}{R_1}} \left[ F - V_{batt}\sin\left(\frac{\varphi_{s2}}{2}\right) \right] \\
E = \dfrac{V_{batt}\,\omega M \sin\left(\frac{\varphi_{s2}}{2}\right)\sin(\varphi_D)}{R_2} \\
F = \dfrac{V_{dc}\,\omega M \sin\left(\frac{\varphi_{s1}}{2}\right)\sin(\varphi_D)}{R_1}
\end{cases} \quad (12)
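Equation (12) can be evaluated directly once the bridge phase shifts and the coil parameters are known. The sketch below implements it under the chapter's assumptions (series compensation, operation at the resonant frequency); every numerical value in the example call is an assumption chosen only to exercise the function.

```python
import math

def ipt_real_powers(v_dc, v_batt, phi_s1, phi_s2, phi_d, r1, r2, m, f):
    """Primary and secondary active powers of the series-compensated IPT link, Eq. (12).

    v_dc, v_batt : DC-side voltage amplitudes (V)
    phi_s1, phi_s2 : phase-shift angles of the primary and secondary bridges (rad)
    phi_d : phase delay between V2 and V1 (rad)
    r1, r2 : coil resistances (ohm); m : mutual inductance (H); f : frequency (Hz)
    """
    w = 2.0 * math.pi * f
    v1 = v_dc * math.sin(phi_s1 / 2.0)
    v2 = v_batt * math.sin(phi_s2 / 2.0)
    e_term = v2 * w * m * math.sin(phi_d) / r2
    f_term = v1 * w * m * math.sin(phi_d) / r1
    p1 = (8.0 / math.pi ** 2) * v1 * (v1 + e_term) / (r1 + (w * m) ** 2 / r2)
    p2 = (8.0 / math.pi ** 2) * v2 * (f_term - v2) / (r2 + (w * m) ** 2 / r1)
    return p1, p2

# Illustrative call with assumed parameters (not taken from the chapter).
p1, p2 = ipt_real_powers(v_dc=400.0, v_batt=288.0, phi_s1=math.pi, phi_s2=math.pi,
                         phi_d=math.pi / 2.0, r1=0.3, r2=0.3, m=30e-6, f=85e3)
print(f"P1 = {p1:.0f} W, P2 = {p2:.0f} W")
```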

5.2 Photovoltaic Generator Model

The solar cell is an electrical component used in certain applications (such as the electric vehicle) to transform solar energy into electricity and thus cover the electrical energy requirements. Many authors have suggested various models for the solar cell in their research work [41–50]. The current Ic can be given by

I_c = I_{ph} - I_d - I_{sh} \quad (13)

The current Iph (PV cell photocurrent) can be evaluated as:

\begin{cases}
I_{ph} = \left( I_{rs\text{-}ref} + K_{SCT}\,(T_c - T_{c\text{-}ref}) \right) \dfrac{G}{G_{ref}} \\
I_d = I_{rs} \left( \exp\left( \dfrac{q(V_c + R_s I_c)}{\alpha k T} \right) - 1 \right) \\
I_{sh} = \dfrac{1}{R_p} (V_c + R_s I_c)
\end{cases} \quad (14)

The current Irs can be approximately obtained as

I_{rs} = \dfrac{I_{rs\text{-}ref}}{\exp\left( \dfrac{q\,V_{oc}}{n_s\, n\, \beta\, T_c} \right) - 1} \quad (15)

Finally, the current I c can be given by

I_c = I_{ph} - I_{rs}\left[ \exp\left( \dfrac{q(V_c + R_s I_s)}{\alpha k T} \right) - 1 \right] - \dfrac{1}{R_p}(V_c + R_s I_s) \quad (16)

The model of a photovoltaic generator depends on the number of parallel and series cells, respectively Np and Ns:

\begin{cases}
I_p = N_p I_c \\
V_p = N_s n_s V_c
\end{cases} \quad (17)

Finally, the photovoltaic generator current can be given by

\begin{cases}
I_p = N_p I_{ph} - N_p I_{rs}\, K - \dfrac{N_p}{R_p}\left( \dfrac{V_p}{n_s N_s} + \dfrac{R_s I_p}{N_p} \right) \\
K = \exp\left( \dfrac{q}{\alpha k T}\left( \dfrac{V_p}{n_s N_s} + \dfrac{R_s I_p}{N_p} \right) \right) - 1
\end{cases} \quad (18)

The resistances Rp and Rs are then neglected (Rp ≫ Rs). The model with Rp = ∞ and Rs = 0 becomes:

I_p = N_p I_{ph} - N_p I_{rs}\left( \exp\left( \dfrac{q\, V_p}{n \beta T_c\, n_s N_s} \right) - 1 \right) \quad (19)
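The simplified generator model of Eq. (19), with Rp = ∞ and Rs = 0, is straightforward to evaluate numerically. The sketch below sweeps an I–V curve for a small assumed array; all cell parameters are illustrative assumptions and not the values used in the chapter's simulation.

```python
import math

Q = 1.602e-19   # elementary charge (C)
K = 1.381e-23   # Boltzmann constant (J/K), written as beta in Eq. (19)

def pv_generator_current(v_p, n_p, n_s, cells_per_module, i_ph, i_rs, ideality, t_c):
    """Simplified PV generator current, Eq. (19), with Rp = infinity and Rs = 0.

    v_p : generator voltage (V); n_p, n_s : parallel / series module counts;
    cells_per_module : ns, series cells per module; i_ph : photocurrent per cell (A);
    i_rs : reverse saturation current (A); ideality : diode factor n; t_c : cell temperature (K).
    """
    exponent = Q * v_p / (ideality * K * t_c * cells_per_module * n_s)
    return n_p * i_ph - n_p * i_rs * (math.exp(exponent) - 1.0)

# Illustrative sweep of the I-V curve for an assumed small array (values are assumptions).
for v in (0, 20, 40, 60, 80, 100, 105):
    i = pv_generator_current(v_p=float(v), n_p=2, n_s=4, cells_per_module=36,
                             i_ph=5.0, i_rs=1e-9, ideality=1.3, t_c=298.0)
    print(f"Vp = {v:3d} V -> Ip = {i:6.2f} A")
```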

6 Simulation Results and Discussion

Following this presentation, it is important to state the conditions of the simulation carried out during this phase. Table 4 gives the technical specifications of the hybrid system, as well as the driving conditions applied to the vehicle. The driving cycle exhibits a variety of acceleration levels: low, medium, and high, together with a deceleration phase, in order to demonstrate that this hybrid system can achieve a visible energy gain, especially when the traction motor is not consuming. This phase is summarized in Fig. 14, especially between 8 and 13 s.

Table 4 Characteristics of the hybrid system

Electrical characteristics of the hybrid system
Electric motor power: 50 kW
Type of electric motor: PMSM
Type of battery: lithium
Battery voltage: 288 V
Maximum vehicle speed: 120 km/h
Max motor torque: 150 Nm

Mechanical characteristics of the vehicle
Max vehicle weight: 332 kg
Tilt angle αr: variable according to route
Vehicle front surface: 2.7 m2
Air density: 1.225 kg/m3

Features of the PV charging system
Vehicle front surface: 1.5 m2
Number of PV cells: 145

Fig. 14 Driving cycle selected during the simulation phase

6.1 Fuzzy Logic Algorithms

The fuzzy logic technique has recently been established as one of the intelligent methods used in power distribution systems to detect the power generated by the recharge systems and distribute it in the most efficient way, so as to recharge the battery with as much power as possible. This control is more robust than traditional control techniques and does not require a perfect knowledge of the system's mathematical model. The three basic functional blocks of a fuzzy logic supervisor are fuzzification, the inference engine, and defuzzification. It is therefore characterized by its input variables, output variables, membership functions, and fuzzy rules. The success of any fuzzy controller is determined by factors such as the number and relevance of the chosen inputs, the fuzzification method, and the number of rules. In this application, the chosen variables are related to these three power signals and are based on multiple tests performed previously in the study of [51] and in our earlier works. To manage this energy, we propose a simple energy-management flowchart, which is presented in Table 5.
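In its crisp (non-fuzzified) form, the supervisor of Table 5 reduces to a small lookup of power-source priorities per driving state. The sketch below implements that simplified version so the intended behavior can be seen at a glance; the speed thresholds used to classify the driving state are assumed values, and the fuzzification and defuzzification stages of the real controller are deliberately omitted.

```python
# Simplified, crisp version of the supervisor of Table 5: the fuzzification /
# defuzzification stages are omitted and replaced by hard speed thresholds (assumed).

RULES = {
    # driving state: (battery contribution, PV contribution, wireless contribution)
    "deceleration": ("zero", "high", "medium"),
    "low_speed":    ("low",  "high", "high"),
    "medium_speed": ("low",  "high", "medium"),
    "high_speed":   ("high", "high", "low"),
}

def driving_state(speed_kmh: float, accel_kmh_s: float) -> str:
    if accel_kmh_s < 0.0:
        return "deceleration"
    if speed_kmh < 30.0:        # assumed threshold for "low speed"
        return "low_speed"
    if speed_kmh < 80.0:        # assumed threshold for "medium speed"
        return "medium_speed"
    return "high_speed"

def supervisor(speed_kmh: float, accel_kmh_s: float):
    state = driving_state(speed_kmh, accel_kmh_s)
    p_batt, p_pv, p_wr = RULES[state]
    return state, {"P-batt": p_batt, "P-PV": p_pv, "P-WR": p_wr}

for sp, acc in ((20.0, 1.0), (60.0, 0.5), (110.0, 0.2), (60.0, -1.0)):
    print(sp, acc, supervisor(sp, acc))
```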

6.2 Power Delivered by the Charging System

To test the profitability of this hybrid system, especially with respect to the state of charge of the battery while the vehicle is in motion, we refer in what follows to the simulation conditions mentioned above. Indeed, we propose to study the energetic behavior of the vehicle for a speed profile of the form given in Fig. 14. The power waveforms delivered by the studied sources are shown in Fig. 15, corresponding respectively to the photovoltaic and wireless cases. Combining these two energy sources gives a new kind of power output. The average value of the power acquired from this hybrid system is greater than that of the photovoltaic or wireless mode alone.
As explained in the previous part, it is clear that the average value of the power acquired by the hybrid system is quite remarkable. During the action phase of the traction system, the power consumed by the electric machine follows a variable path proportional to the selected acceleration or driving state. The main power source is usually the battery, whose supplied voltage is quite stable. The other sources are also used, as additional sources, to minimize the load on the accumulators. At this point, the state of charge of the battery is related to various conditions and situations, especially the driving state and external factors related to the climate and to the sizing of the wireless system. Coordination between the various energy sources requires power management control. In our work, the battery device is used as the primary source to begin generating electricity, the WR system generates electricity when available, and the photovoltaic system works to convert solar irradiation into electrical energy supplied to a DC bus. The total power is measured as presented in Fig. 15 [52–54].

Fig. 15 Power dynamics of the hybrid system at a predefined driving cycle

6.3 Power Distribution and SOC Evolution

It is important to note that the selected driving cycle is the one used in Fig. 14. The contribution of the additive sources can be clearly seen in the power delivered by the accumulators in Fig. 16. The first part of the simulation shows that the dips in the power delivered by the accumulator do not affect the power consumed by the machine. On the other hand, we can notice that when the motor is not operating ("zero acceleration"), the battery power is negative, which confirms that the battery is being charged by the photovoltaic and wireless sources. For low acceleration levels, the implemented hybrid system provides enough power to drive the motor and charge the battery simultaneously. Figure 16 shows this conclusion.
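The sign convention visible in Fig. 16 (negative battery power whenever the PV and wireless sources exceed the traction demand) follows from a simple power balance on the DC bus. The short sketch below updates the state of charge with that balance; the battery capacity, time step, and power profile are assumed values for illustration only.

```python
# Battery power balance on the DC bus: P_batt = P_motor - P_pv - P_wr.
# A negative P_batt means the battery is being charged. All numbers are illustrative.

BATTERY_CAPACITY_WH = 288.0 * 40.0   # assumed 288 V, 40 Ah pack
DT_S = 1.0                           # time step of the simulation (s)

def soc_step(soc_percent, p_motor_w, p_pv_w, p_wr_w):
    p_batt = p_motor_w - p_pv_w - p_wr_w
    delta_wh = p_batt * DT_S / 3600.0
    return soc_percent - 100.0 * delta_wh / BATTERY_CAPACITY_WH, p_batt

soc = 40.0
profile = [(30000.0, 1500.0, 2000.0),   # high acceleration: battery discharges
           (5000.0, 1500.0, 2000.0),    # low acceleration: sources cover most of the load
           (0.0, 1500.0, 2000.0)]       # stop / zero acceleration: battery recharges
for p_motor, p_pv, p_wr in profile:
    soc, p_batt = soc_step(soc, p_motor, p_pv, p_wr)
    print(f"P_batt = {p_batt:8.0f} W -> SOC = {soc:.4f} %")
```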
Along with this energy behavior, due to this hybrid system, it is possible to monitor
the state of charge of the battery, in order to officially validate whether this model is
profitable or not. Figure 17 shows the state of charge of the battery and proves that during weak acceleration the SOC rate increases, although the vehicle is in motion, and the same during the stop phase.

Fig. 16 Evolution of power in relation to speed (battery power): P-Batt (W) versus time (s)

Fig. 17 Evolution of the SOC taking into account the layout of the hybrid charging system: SOC (%) versus time (s), showing the low-, high-, and zero-acceleration phases
In the same context, we wanted to test the contribution of this hybrid charging system against a purely wireless or purely photovoltaic one. Figure 18 shows two cases of SOC evolution: the first case is when the hybrid charging system is deactivated and there is only battery consumption; the second case presents the evolution of the SOC with this hybrid system in operation. The difference between the two curves is clear, and the energy gain is equal to 0.86%.
On the other hand, it is possible to monitor the evolution of the power supplied by the photovoltaic, wireless, and hybrid charging systems. Figure 19 shows that the power of the hybrid system represents the sum of the powers obtained by the single charging systems. It is clear that the power obtained by the wireless system is zero as long as the vehicle has not yet passed over a transmitter coil. Table 6 summarizes the energy statistics of the charging systems studied and proves that the hybrid system provides a greater gain compared to a purely photovoltaic or purely wireless charging system. This performance will ensure a gain in terms of the distance traveled and will increase the life of the battery, which improves the overall performance of the electric vehicle.

Fig. 18 Hybrid system SOC variation (PV + WR): SOC (%) versus time (s), with and without the PV + WR charging system (final SOC ≈ 39.63% with, ≈ 38.77% without)

Fig. 19 Summary of the average power of the three charging systems

7 Conclusion

In this chapter, we have discussed the state of the art of the electrified transport system
relating to electric vehicles. Indeed, we have tried to present the different architectures
and models cited in the literature, such as pure electric and hybrid models. More
precisely, in this chapter, the recharge systems used as well as their different internal
architectures have been demonstrated. During this study, we divided these types of
chargers into two categories: classic chargers, and a set of modern or advanced chargers comprising solar chargers and inductive power transfer. By focusing on the second category, the rest of this work offers an in-depth study of these charging systems and exposes detailed modeling of their different blocks. An operating simulation is applied to determine their performance under specific operating conditions. Then, a detailed mathematical model leads to simulation results regarding the hybrid recharge system and its energy. This recharge system is installed in an electric vehicle to improve the vehicle's autonomy. Each recharge block, such as the photovoltaic recharge system and the wireless recharge tool, was modeled, and the corresponding mathematical expressions are given. Then, the energetic performances were improved, as the battery state-of-charge behavior became more favorable, which proved the vehicle's total profitability.

Table 5 Fuzzy logic algorithms

Step 1. Start
Step 2. Measure the parameters Speed(k), P-batt(k), P-PV(k), P-WR(k)
Step 3. Fuzzification
Step 4. Rules
Energy/Speed   Deceleration   Low speed   Medium speed   High speed
P-batt         zero           Low         Low            High
P-PV           High           High        High           High
P-WR           Medium         High        Medium         Low
Step 5. Apply inferences
If (Speed = Low speed) then P-batt = Low, P-PV = High, P-WR = High
If (Speed = Medium speed) then P-batt = Low, P-PV = High, P-WR = Medium
If (Speed = High speed) then P-batt = High, P-PV = High, P-WR = Low
If (Speed = Deceleration) then P-batt = zero, P-PV = High, P-WR = Medium
Step 6. Defuzzification
Step 7. End

Table 6 Summary of the effectiveness of the three charging systems

                          SOC loss (%)   Average power sum   Minimum power   Maximum power   Power harvested from the battery
Only WR generator         0.27           7588 W              0 W             6533 W          5772 W
Only PV generator         0.8            22,290 W            5662 W          10,235 W        6972 W
Only PV + WR generator    1.07           26,360 W            5662 W          12,219 W        12,760 W

Acknowledgements The authors would like to thank Prince Sultan University, Riyadh, Saudi
Arabia for supporting this work. Special acknowledgement to Automated Systems & Soft
Computing Lab (ASSCL), Prince Sultan University, Riyadh, Saudi Arabia.

References

1. Bai, H., & Mi, C. (2011). The impact of bidirectional DC-DC converter on the inverter operation
and battery current in hybrid electric vehicles. In 8th international conference power electron.
- ECCE Asia "Green world with power electron. ICPE 2011-ECCE Asia (pp. 1013–1015).
https://doi.org/10.1109/ICPE.2011.5944686.
2. Sreedhar, V. (2006). Plug-in hybrid electric vehicles with full performance. In 2006 IEEE
configuration electrical hybrid vehicle ICEHV (pp. 1–2). https://doi.org/10.1109/ICEHV.2006.
352291.
3. Mohamed, N., Aymen, F., Ali, Z. M., Zobaa, A. F., & Aleem, S. H. E. A. (2021). Efficient
power management strategy of electric vehicles based hybrid renewable energy. Sustainability,
13(13), 7351. https://doi.org/10.3390/su13137351
4. Ertan H. B., & Arikan, F. R. (2018). Sizing of series hybrid electric vehicle with hybrid energy
storage system. In SPEEDAM 2018 - proceedings: international symposium on power elec-
tronics, electrical drives, automation and motion (pp. 377–382). https://doi.org/10.1109/SPE
EDAM.2018.8445422.
5. Kisacikoglu, M. C., Ozpineci, B., & Tolbert, L. M. (2013). EV/PHEV bidirectional charger
assessment for V2G reactive power operation. IEEE Transactions on Power Electronics, 28(12),
5717–5727. https://doi.org/10.1109/TPEL.2013.2251007
6. Lee, J. Y., & Han, B. M. (2015). A bidirectional wireless power transfer EV charger using
self-resonant PWM. IEEE Transactions on Power Electronics, 30(4), 1784–1787. https://doi.
org/10.1109/TPEL.2014.2346255
7. Tan, L., Wu, B., Yaramasu, V., Rivera, S., & Guo, X. (2016). Effective voltage balance control
for bipolar-DC-Bus-Fed EV charging station with three-level DC-DC Fast Charger. IEEE
Transactions on Industrial Electronics, 63(7), 4031–4041. https://doi.org/10.1109/TIE.2016.
2539248
8. Abdelwahab O. M., & Shaaban, M. F. (2019). PV and EV charger allocation with V2G capa-
bilities. In Proceedings - 2019 IEEE 13th international conference on compatibility, power
electronics and power engineering, CPE-POWERENG 2019 (pp. 1–5). https://doi.org/10.1109/


CPE.2019.8862370.
9. Domínguez-Navarro, J. A., Dufo-López, R., Yusta-Loyo, J. M., Artal-Sevil, J. S., & Bernal-
Agustín, J. L. (2019). Design of an electric vehicle fast-charging station with integration of
renewable energy and storage systems. International Journal of Electrical Power and Energy
Systems, 105, 46–58. https://doi.org/10.1016/j.ijepes.2018.08.001
10. Ali, Z. M., Aleem, S. H. E. A., Omar, A. I., & Mahmoud, B. S. (2022). Economical-
environmental-technical operation of power networks with high penetration of renewable
energy systems using multi-objective coronavirus herd immunity algorithm. Mathematics,
10(7), 1201. https://doi.org/10.3390/math10071201
11. Omori, H., Tsuno, M., Kimura, N., & Morizane, T. (2018). A novel type of single-ended
wireless V2H with stable power transfer operation against circuit constants variation. 2018 7th
international conference on renewable energy research and applications (vol. 5, pp. 1–5).
12. Maeno, R., Omori, H., Michikoshi, H., Kimura, N., & Morizane, T. (2018). A 3kW single-
ended wireless EV charger with a newly developed SiC-VMOSFET. In 7th onternational
IEEE conference on renewable energy research and application, ICRERA 2018 (pp. 418–423).
https://doi.org/10.1109/ICRERA.2018.8566866.
13. Colak, K., Bojarski, M., Asa, E., & Czarkowski, D. (2015). A constant resistance analysis
and control of cascaded buck and boost converter for wireless EV chargers. In Conference
proceedings - IEEE applied power electronics conference and exposition - APEC (vol. 2015,
pp. 3157–3161). https://doi.org/10.1109/APEC.2015.7104803.
14. Mohamed, N. et al. (2021). A new wireless charging system for electric vehicles using two
receiver coils. Ain Shams Engineering Journal. https://doi.org/10.1016/j.asej.2021.08.012.
15. Azar, A. T., Serrano, F. E., Flores, M. A., Kamal, N. A., Ruiz, F., Ibraheem, I. K., Humaidi, A.
J., Fekik, A., Alain, K. S. T., Romanic, K., Rana, K. P. S., Kumar, V., Gorripotu, T. S., Pilla,
R., & Mittal, S. (2021). Fractional-order controller design and implementation for maximum
power point tracking in photovoltaic panels. In Advances in nonlinear dynamics and chaos
(ANDC), renewable energy systems, 2021 (pp. 255–277). Academic. https://doi.org/10.1016/
B978-0-12-820004-9.00031-0.
16. Azar, A. T., Abed, A. M., Abdulmajeed, F. A., Hameed, I. A., Kamal, N. A., Jawad, A. J. M.,
Abbas, A. H., Rashed, Z. A., Hashim, Z. S., Sahib, M. A., Ibraheem, I. K., & Thabit, R. (2022).
A new nonlinear controller for the maximum power point tracking of photovoltaic systems
in micro grid applications based on modified anti-disturbance compensation. Sustainability,
14(17), 10511.
17. Tian, X., He, R., Sun, X., Cai, Y., & Xu, Y. (2020). An ANFIS-based ECMS for energy
optimization of parallel hybrid electric bus. IEEE Transactions on Vehicular Technology, 69(2),
1473–1483. https://doi.org/10.1109/TVT.2019.2960593
18. Lulhe A. M., & Date, T. N. (2016). A technology review paper for drives used in electrical
vehicle (EV) and hybrid electrical vehicles (HEV). In 2015 international conference on control,
instrumentation, communication and computational technologies, ICCICCT 2015 (pp. 632–
636). https://doi.org/10.1109/ICCICCT.2015.7475355.
19. Datta, U. (2019). A price - regulated electric vehicle charge - discharge strategy. In Energy
research (pp. 1032–1042). https://doi.org/10.1002/er.4330.
20. Rawat, T., Niazi, K. R., Gupta, N., & Sharma, S. (2019). Impact assessment of electric vehicle
charging/discharging strategies on the operation management of grid accessible and remote
microgrids. International Journal of Energy Research, 43(15), 9034–9048. https://doi.org/10.
1002/er.4882
21. Hu, Y. et al. (2015). Split converter-fed SRM drive for flexible charging in EV/HEV
applications, 62(10), 6085–6095.
22. Mohamed, N., et al. (2022). A comprehensive analysis of wireless charging systems for electric
vehicles. IEEE Access, 10, 43865–43881. https://doi.org/10.1109/ACCESS.2022.3168727
23. Wang, J., Cai, Y., Chen, L., Shi, D., Wang, R., & Zhu, Z. (2020). Review on multi-power
sources dynamic coordinated control of hybrid electric vehicle during driving mode transition
process. International Journal of Energy Research, 44(8), 6128–6148. https://doi.org/10.1002/
er.5264
24. Zhao, C., Zu, B., Xu, Y., Wang, Z., Zhou, J., & Liu, L. (2020). Design and analysis of an
engine-start control strategy for a single-shaft parallel hybrid electric vehicle. Energy, 202(5),
2354–2363. https://doi.org/10.1016/j.energy.2020.117621
25. Cheng, M., Sun, L., Buja, G., & Song, L. (2015). Advanced electrical machines and machine-
based systems for electric and hybrid vehicles. Energies, 8(9), 9541–9564. https://doi.org/10.
3390/en8099541
26. Naoui, M., Aymen, F., Ben Hamed, M., & Lassaad, S. (2019). Analysis of battery-EV state of
charge for a dynamic wireless charging system. Energy Storage, 2(2). https://doi.org/10.1002/
est2.117.
27. Rajashekara, K. (2013). Present status and future trends in electric vehicle propulsion tech-
nologies. IEEE Journal of Emerging and Selected Topics in Power Electronics, 1(1), 3–10.
https://doi.org/10.1109/JESTPE.2013.2259614
28. Paladini, V., Donateo, T., de Risi, A., & Laforgia, D. (2007). Super-capacitors fuel-cell
hybrid electric vehicle optimization and control strategy development. Energy Conversion
and Management, 48(11), 3001–3008. https://doi.org/10.1016/j.enconman.2007.07.014
29. Chopra, S. (2011). Contactless power transfer for electric vehicle charging application. Science
(80).
30. Emadi, A. (2017). Handbook of automotive power electronics and motor drives.
31. Naoui, M., Flah, A., Ben Hamed, M., & Lassaad, S. (2020). Brushless motor and wireless
recharge system for electric vehicle design modeling and control. In Handbook of research on
modeling, analysis, and control of complex systems.
32. Guarnieri, M. (2011). When cars went electric, Part 2. IEEE Industrial Electronics Magazine,
5(2), 46–53. https://doi.org/10.1109/MIE.2011.941122
33. Levi, E., Bojoi, R., Profumo, F., Toliyat, H. A., & Williamson, S. (2007). Multiphase induction
motor drives-a technology status review. IET Electric Power Applications, 1(5), 643–656.
https://doi.org/10.1049/iet-epa
34. Mohamed, N., Flah, A., Ben Hamed, M., & Lassaad, S. (2021). Modeling and simulation of
vector control for a permanent magnet synchronous motor in electric vehicle. In 2021 4th
international symposium on advanced electrical and communication technologies (ISAECT),
2021 (pp. 1–5). https://doi.org/10.1109/ISAECT53699.2021.9668411.
35. Yilmaz, M., & Krein, P. T. (2013). Review of battery charger topologies, charging power
levels, and infrastructure for plug-in electric and hybrid vehicles. IEEE Transactions on Power
Electronics, 28(5), 2151–2169. https://doi.org/10.1109/TPEL.2012.2212917
36. Mohamed, N., Flah, A., & Ben Hamed, M. (2020). Influences of photovoltaics cells number for
the charging system electric vehicle. In Proceedings of the 17th international multi-conference
system signals devices, SSD 2020 (pp. 244–248). https://doi.org/10.1109/SSD49366.2020.936
4141.
37. Wu, H. H., Gilchrist, A., Sealy, K., Israelsen, P., & Muhs, J. (2011). A review on inductive
charging for electric vehicles. 2011 IEEE international electrical machine drives conference
IEMDC, 2011 (pp. 143–147). https://doi.org/10.1109/IEMDC.2011.5994820
38. Xie, L., Shi, Y., Hou, Y. T., & Lou, A. (2013). Wireless power transfer and applications to
sensor networks. IEEE Wireless Communications, 20(4), 140–145. https://doi.org/10.1109/
MWC.2013.6590061
39. Cao, P. et al. (2018). An IPT system with constant current and constant voltage output features
for EV charging. In Proceedings of the IECON 2018 - 44th annual conference IEEE industrial
electronics society (vol. 1, pp. 4775–4780). https://doi.org/10.1109/IECON.2018.8591213.
40. Nagendra, G. R., Chen, L., Covic, G. A., & Boys, J. T. (2014). Detection of EVs on IPT
highways. In Conference proceedings of the - IEEE applied power electronics conference and
exposition - APEC (pp. 1604–1611). https://doi.org/10.1109/APEC.2014.6803521.
41. Mohamed, N., Aymen, F., & Ben Hamed, M. (2019). Characteristic of photovoltaic generator
for the electric vehicle. International Journal of Scientific and Technology Research, 8(10),
871–876.
42. Dheeban, S. S., Selvan, N. M., & Kumar, C. S. (2019). Design of standalone pv system.
International Journal of Scientific and Technology Research (vol. 8, no. 11, pp. 684–688).
43. Kamal, N. A., & Ibrahim, A. M. (2018). Conventional, intelligent, and fractional-order control
method for maximum power point tracking of a photovoltaic system: A review. In Advances in
nonlinear dynamics and chaos (ANDC), fractional order systems (pp. 603–671). Academic.
44. Amara, K., Malek, A., Bakir, T., Fekik, A., Azar, A. T., Almustafa, K. M., Bourennane,
Advanced Sensor Systems for Robotics and Autonomous Vehicles

Manoj Tolani, Abiodun Afis Ajasa, Arun Balodi, Ambar Bajpai, Yazeed AlZaharani, and Sunny

Abstract In robotic and autonomous vehicle applications, sensor systems play a critical role. Machine learning (ML), data science, artificial intelligence (AI), and the internet of things (IoT) are all advancing, which opens up new possibilities for autonomous vehicles. For vehicle control, traffic monitoring, and traffic management applications, the integration of robotics, IoT, and AI is a very powerful combination. For effective robotic and vehicle control, robot sensor devices require an advanced sensor system. As a result, AI-based systems attract researchers' attention to make the best use of sensor data for various robotic applications while conserving energy. The efficient collection of data from sensors is a significant difficulty that AI technologies can effectively address. The data consistency method can also be used for time-constrained data collection applications. The present chapter discusses three important methods to improve the quality of service (QoS) and quality of experience (QoE) parameters of robotic and autonomous vehicle applications. The first is a consistency-guaranteed and collision-resistant approach that advanced sensor devices can use for data aggregation and the removal of redundant data. The second is an aggregation-aware, AI-based method to improve the lifetime of robotic devices, and the last is dividing sensor devices according to continuous- and event-monitoring robotic applications and using application-specific protocols to handle the corresponding data. In addition, the present chapter discusses the role of sensor systems in various applications.

Keywords Machine learning (ML) · Artificial intelligence (AI) · Continuous monitoring · Event-monitoring · Consistency-guaranteed

M. Tolani (B)
Manipal Institute of Technology, Udupi, Manipal, Karnataka, India
e-mail: [email protected]
A. A. Ajasa
Universiti Teknologi Malaysia, Skudai, JB, Malaysia
e-mail: [email protected]
A. Balodi · A. Bajpai
Atria Institute of Technology, Bangalore, India
e-mail: [email protected]
A. Bajpai
e-mail: [email protected]
Y. AlZaharani
University of Wollongong, Wollongong, Australia
Sunny
Indian Institute of Information Technology, Allahabad, India

1 Introduction

Sensors play an important role in a wide range of robotic applications; robots are used for industrial, commercial, and domestic purposes [1–11]. Advances in machine learning methods make such systems more intelligent, and the use of advanced sensor systems raises robot intelligence to the next level. The present work deals with the need for, and uses of, advanced sensor systems in autonomous vehicle applications. Here, robotic vehicle applications are divided into railway trains and road vehicles. For railways, advanced sensor systems are mainly deployed on the track or inside the train for automatic operation. Similarly, for road vehicles, sensors are used on the vehicle for efficient operation and at the roadside for automatic driving and other services. In this chapter, the sensors are divided into three categories of operation [12–79]. The first category comprises sensors that sense data continuously according to the operational requirement; these are called continuous monitoring sensors [80]. The second category, event monitoring sensors, generates data only when an event occurs. The third category, periodic sensors, transmits data to the monitoring station at regular intervals. The roadside monitoring system for vehicle applications and the track monitoring system for railway applications are discussed in the subsections below [14–122].
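
To make these three categories concrete, the following short Python sketch contrasts one plausible transmission policy for each; the class names, the event threshold, and the reporting period are illustrative assumptions for this chapter's taxonomy rather than part of any cited protocol.

from dataclasses import dataclass

@dataclass
class Reading:
    t: float      # timestamp in seconds
    value: float  # sensed quantity (e.g., vibration amplitude)

class ContinuousSensor:
    """Continuous monitoring: every sample is transmitted."""
    def should_transmit(self, reading, last_sent):
        return True

class EventSensor:
    """Event monitoring: transmit only when the value crosses a threshold."""
    def __init__(self, threshold):
        self.threshold = threshold
    def should_transmit(self, reading, last_sent):
        return reading.value >= self.threshold

class PeriodicSensor:
    """Periodic monitoring: transmit at a fixed reporting period."""
    def __init__(self, period):
        self.period = period
    def should_transmit(self, reading, last_sent):
        return last_sent is None or (reading.t - last_sent.t) >= self.period

def run(sensor, readings):
    sent, last = [], None
    for r in readings:
        if sensor.should_transmit(r, last):
            sent.append(r)
            last = r
    return sent

data = [Reading(t, v) for t, v in [(0, 0.1), (1, 0.2), (2, 0.9), (3, 0.15), (4, 0.2)]]
print(len(run(ContinuousSensor(), data)))   # 5 packets: every sample
print(len(run(EventSensor(0.8), data)))     # 1 packet: only the threshold exceedance
print(len(run(PeriodicSensor(2.0), data)))  # 3 packets: t = 0, 2, 4

The same readings thus cost five, one, or three transmissions depending on the category, which is why the chapter treats each category with a different, application-specific protocol.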

1.1 Automatic Driving Application

Nowadays, advanced sensors are used for automatic driving and vehicle monitoring applications. For automatic driving, efficient and accurate monitoring supports better prediction and tracking, and researchers are investigating various prediction methods for vehicle tracking. Kalman and extended Kalman filter based prediction methods are widely used, and advanced algorithms such as the cuckoo search algorithm and particle swarm optimization are also applied to improve prediction accuracy. Prediction algorithms play an important role, but they have limits; prediction efficiency can be further improved with the help of advanced sensors combined with an efficient communication system. The advanced sensor system for vehicle applications is shown in Fig. 1. As shown in the figure, a sensor device is mounted on each vehicle. The sensor devices communicate directly with the roadside devices and transmit their data to them. All roadside devices forward the data to the monitoring station, which analyzes the data and generates the control signal for the vehicle. The advanced sensors used in the vehicle make the system more efficient [91, 120, 121].

Fig. 1 Advanced sensors of vehicle application
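
As a concrete illustration of the prediction step mentioned above, the sketch below implements a textbook constant-velocity Kalman filter on a single coordinate of a tracked vehicle; the noise covariances, the sampling period, and the constant-velocity model are illustrative assumptions and not the specific filters used in the cited works.

import numpy as np

def kalman_track_1d(measurements, dt=0.1, q=1e-2, r=0.09):
    """Constant-velocity Kalman filter estimating position from noisy position fixes."""
    F = np.array([[1.0, dt], [0.0, 1.0]])   # state transition for [position, velocity]
    H = np.array([[1.0, 0.0]])              # only position is measured
    Q = q * np.eye(2)                       # process noise covariance
    R = np.array([[r]])                     # measurement noise covariance
    x = np.zeros((2, 1))                    # initial state estimate
    P = np.eye(2)                           # initial estimate covariance
    estimates = []
    for z in measurements:
        x = F @ x                           # predict the state one step ahead
        P = F @ P @ F.T + Q
        y = np.array([[z]]) - H @ x         # innovation
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)      # Kalman gain
        x = x + K @ y                       # correct with the measurement
        P = (np.eye(2) - K @ H) @ P
        estimates.append(float(x[0, 0]))
    return estimates

rng = np.random.default_rng(0)
true_pos = [0.5 * k * 0.1 for k in range(50)]           # vehicle moving at 0.5 m/s
noisy = [p + rng.normal(0.0, 0.3) for p in true_pos]    # noisy roadside measurements
print(kalman_track_1d(noisy)[-3:])                      # smoothed position estimates

The filtered estimates follow the true motion more closely than the raw measurements, and the same predict-correct structure extends to the multi-dimensional extended Kalman filters mentioned above.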

Automatic driving is an important application that requires advanced sensors for autonomous robotic operation. The advanced sensors of the vehicle can be subdivided into continuous monitoring, event monitoring, and periodic monitoring sensors.

1.2 Railway Monitoring Application

Railway monitoring is an important field for making railway trains robotic and autonomous [13]. Researchers have worked on railway monitoring applications in different areas; most work concerns MAC protocols [14–58, 81, 86–100, 104, 114–119], aggregation protocols [59–83, 101, 102, 122], and data consistency [91, 120, 121]. However, the performance of a railway monitoring system depends on advanced sensor devices, and many such devices are now available for wireless sensor network and IoT applications. The operation of a railway track monitoring application is shown in Fig. 2. The sensor devices are placed on the track and transmit their data to the monitoring station via a base station.

Fig. 2 Advanced sensors of railway application

Railway monitoring is an important application that requires advanced sensors for autonomous robotic operation. The advanced sensors used in railway monitoring can be subdivided into continuous monitoring, event monitoring, and periodic monitoring sensors.

The advancement of artificial intelligence (AI) and machine learning (ML) opens new doors in the field of monitoring applications, and the integration of advanced sensors with AI/ML is now used for advanced applications. In the present chapter, several methods for advanced robotic sensor devices are discussed, among them data consistency, data aggregation, and application-specific data transmission protocols.
In the rest of the chapter, the literature review and its analysis are presented in Sect. 2. The use of advanced sensors for various applications is discussed in Sect. 3. Finally, Sect. 4 summarizes the overall research contribution, future scope, and conclusions based on the results and findings.

2 Related Works

Researchers have reported many works related to advanced sensor applications. Victoria et al. discussed various uses of advanced sensors in railway track condition monitoring systems, reporting research works that use WSNs to (1) maintain process tolerance, (2) verify and protect machines, (3) detect maintenance requirements, (4) minimize downtime, and (5) prevent failure, saving businesses money and time. That work also covers many advanced sensors for bridge, tunnel, track, and rail-bed monitoring. Elia et al. proposed a system to measure bogie vibration [56], and Maly et al. proposed a heterogeneous sensor model to integrate data [94]. Many other research works have been reported. For a better analysis of the contributions in this direction, the research papers were selected based on inclusion and exclusion criteria. Approximately 280 research papers were identified at the first stage, and about 70% of them were rejected by these criteria, leaving roughly 84 papers for full review.

Inclusion and exclusion criteria play an important role in the identification of the papers. Research papers related to advanced sensors and automatic robot applications were closely examined.

Research works dealing only with core data communication were rejected at the screening stage, while papers whose core idea involves advanced sensors were identified for full review, as shown in Fig. 3. The keywords of the works closely related to advanced sensors are also listed; the identified keywords follow the inclusion criteria.
As part of this process, the year-wise contribution of researchers in the field of advanced sensors was also analyzed. The analysis shows that research contributions have increased in the last 4–5 years, as shown in Fig. 4. The main reason for this attraction is the advancement of AI/ML and IoT: advanced sensors are a primary requirement of the IoT, and AI/ML methods make IoT systems more powerful. Therefore, the requirement for advanced sensors is growing exponentially.

The literature study shows that researchers' contributions have been increasing over the last 5–10 years. The advancement of IoT and AI/ML is the major cause of the demand for advanced sensors, and automatic driving efficiency strongly depends on them.

The main inclusion terms are listed in Fig. 3, and the cumulative use of each term is shown in Fig. 5. The results in Fig. 5 show that most of the papers are based on event monitoring sensors, while continuous monitoring is also mentioned in many of them. Therefore, the discussion of advanced sensors is categorized into continuous monitoring and event monitoring sensors.

Fig. 3 Screening and Identification process of the research papers (Inclusion/Exclusion criteria,
Keywords)

Fig. 4 Year-wise contribution of the researchers in the field of advanced sensors

Fig. 5 Cumulative use of inclusion terms in manuscripts



Fig. 6 The cumulative count of the keywords

Apart from this, the main focus of researchers is on AI/ML-based advanced methods to control the vehicle, with advanced vehicle sensors used for autonomous driving. Researchers have also used several different types of sensor devices, mainly categorized into reduced function devices (RFD) and fully functional devices (FFD). RFD devices can only perform sensing and data transmission, whereas FFD devices can perform all types of computational operations; clustering and aggregation operations performed by FFD devices have been reported.

An RFD mainly works as an end device, whereas an FFD device can work as an intermediate device and perform all the mathematical operations. An FFD can also work as a cluster head device.
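
The division of labour between RFD end devices and an FFD cluster head can be sketched as follows; the class names and the simple mean aggregation are illustrative assumptions, not the exact scheme of any cited protocol.

import random

class RFD:
    """Reduced function device: senses and forwards a raw reading only."""
    def __init__(self, name, read_fn):
        self.name = name
        self.read_fn = read_fn
    def sense(self):
        return (self.name, self.read_fn())

class FFD:
    """Fully functional device: acts as cluster head and aggregates before uplink."""
    def __init__(self, members):
        self.members = members
    def collect_and_aggregate(self):
        readings = [m.sense() for m in self.members]
        values = [v for _, v in readings]
        # one aggregated packet replaces len(values) raw packets on the uplink
        return {"count": len(values), "mean": sum(values) / len(values)}

cluster = FFD([RFD(f"node{i}", lambda: 20 + random.random()) for i in range(4)])
print(cluster.collect_and_aggregate())   # single aggregated report to the base station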

The keywords of the inclusion criteria mainly describe the focus of the researchers in a particular direction; therefore, the cumulative count of the keywords was also determined. The keywords of the shortlisted manuscripts show that the main focus of the researchers is on either event monitoring or continuous monitoring sensors, used for WSN or IoT applications. Similarly, data aggregation and filtering of the data collected from advanced sensors are also reported in the research works (Fig. 6).

Fig. 7 Research Contribution in various fields

The cumulative count of the keywords signifies the focus of the researchers in a particular direction. The current count clearly indicates that advanced continuous and event monitoring sensors play an important role in autonomous robotic applications.

As already mentioned, researchers are working in various domains of the autonomous vehicle. Their contributions in these fields are analyzed in Fig. 7. The analysis shows that the major focus is on reducing energy consumption, in which advanced sensors play an important role. Various other works for reducing energy consumption have also been reported; the efficient design of MAC protocols is one such field, with both contention-based and contention-less protocols reported. Intelligent systems for road/railway safety are reported by various researchers, and such systems can only be built with the help of AI/ML and IoT.

There are various ways to reduce energy consumption, but the energy-efficient methods can be categorized into two fields, i.e., hardware- and software-related. In the software-related field, researchers are working on MAC, routing, and aggregation protocols; in the hardware-related field, they are working on advanced sensor design and fabrication.

Researchers have reported many works related to bridge monitoring [4–9]. The acoustic emission sensor is used for crack/fatigue detection, while the strain gauge sensor is used for stress detection on the railway track [1, 3, 8, 12]. A piezoelectric strain gauge sensor has been used for bridge monitoring [11], and the strain gauge sensor is also used to measure the weight of the train, as reported in [10]. For dynamic load applications, accelerometer sensors are used in various settings [2, 8]. Many other works have been reported for other applications.

3 Types of Sensors for Various Applications

The advanced sensors can be used for different applications. In this section, we discuss two categories of advanced sensor use, as given below:

• Efficient road monitoring
• Efficient railway monitoring.

3.1 Efficient Road Monitoring

Various requirements and dependencies on other technologies and parameters are discussed for efficient road monitoring, which depends on several other technologies as shown in Fig. 8. Nowadays, the demand for data rates is increasing day by day; to fulfill this huge demand, researchers are moving to higher frequency ranges, and 5G technology meets the current bandwidth requirement. Advanced sensors integrated with device-to-device communication and IoT increase the efficiency of road monitoring. Researchers have proposed various aggregation protocols for energy-efficient data transmission, and data consistency methods are also important; there is a trade-off between energy-efficient data transmission and data consistency.
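
One way to see this trade-off is the dead-band reporting sketch below: a larger dead-band saves transmissions (and hence energy) but lets the value held at the monitoring station drift further from the sensed value. The dead-band figures and the synthetic signal are illustrative assumptions.

import math

def report_with_deadband(samples, deadband):
    """Transmit only when the value moves more than `deadband` from the last report."""
    sent, held, worst = 0, None, 0.0
    for v in samples:
        if held is None or abs(v - held) > deadband:
            held = v                       # transmit and update the remotely held value
            sent += 1
        worst = max(worst, abs(v - held))  # inconsistency seen at the monitoring station
    return sent, worst

signal = [math.sin(0.05 * k) for k in range(400)]   # slowly varying roadside quantity
for db in (0.0, 0.05, 0.2):
    sent, err = report_with_deadband(signal, db)
    print(f"deadband={db:4.2f}  transmissions={sent:3d}  worst inconsistency={err:.2f}")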

Data is transmitted from the sensor devices to the roadside equipment via direct communication. The monitoring station receives data from every roadside device, evaluates the information, and produces the vehicle's control signal. The technology is more effective thanks to the sophisticated sensors utilized in the vehicle.

Fig. 8 Various dependencies of efficient road monitoring

3.2 Efficient Railway Monitoring

Efficient railway monitoring depends on various other technologies, similar to road monitoring, as shown in Fig. 9. The demand for data rates is rising nowadays, and researchers are stepping up the frequency range in order to meet this enormous demand; the present bandwidth need of railway applications is met by 5G technology. The effectiveness of railway monitoring is increased by modern sensors combined with device-to-device communication and IoT. Researchers have put forward a number of aggregation protocols for low-energy data transfer, and methods for ensuring data consistency are also crucial; data consistency and energy-efficient data transmission are in trade-off [80, 84, 85, 90, 92, 95, 103, 105–113].
Researchers have reported various uses of advanced sensors for different railway application needs. Accelerometer, gyroscope, and FB strain sensors are used for train shell monitoring. Humidity, motion detector, and vibration sensors are used for wagon monitoring. Surface acoustic and inertia sensors are used for bogie monitoring. Gyro and gap sensors are used for wheel monitoring, and wind pressure sensors for brake monitoring. A few other sensors are used for other applications, as given in Table 1.

Fig. 9 Various dependencies of efficient railway monitoring

Table 1 Advanced sensors used for railway monitoring applications

Advanced sensors | Application
Accelerometer, gyroscope, FB strain | Train shell monitoring
Humidity, motion detector, vibration sensor | Wagon monitoring
Surface acoustic sensor, inertia sensors | Bogie monitoring
Gyro, gap sensors | Wheel monitoring
Wind pressure sensors | Brake monitoring
Load cell, FBS, FBG, FBT | Pantograph monitoring
Thermocouple, SAW temperature | Axle monitoring

The field of railroad monitoring is crucial to the development of robotic and autonomous trains. Researchers have developed railway monitoring applications in several sectors, the majority focused on data consistency, MAC protocols, and aggregation protocols. However, modern sensor devices are what make the railway monitoring system operate effectively, and several cutting-edge sensor devices are now available for use with IoT applications and wireless sensor networks.

The roadside application requires many different types of sensors. As shown in Fig. 10, these include footpath sensors, light sensors, service road shoulder width, safety barriers, median width, gyroscopes, road capacity, traffic signals, operating speed, and many others; most of these sensors are listed in Fig. 10.

Fig. 10 Advanced sensors for Robotic Road Application

Fig. 11 Advanced sensors for Robotic Railway Application

Similarly, advanced sensors can also be used for railway applications; those used are shown in Fig. 11. Accelerometer, FBG, inclinometer, magnetoelectric, acoustic emission, gyroscope, displacement, and many other sensors are used for railway monitoring.

The monitoring of bridges is a topic on which researchers have published a lot of work. Crack and fatigue detection is done using the acoustic emission sensor, and on the railway track a strain gauge sensor is used to monitor stresses. To monitor bridges, researchers have developed piezoelectric strain gauge sensors. As described in [118], the strain gauge sensor is also used to measure the train's weight. A variety of applications use accelerometer sensors for dynamic loads, and numerous further research projects have been reported for many other uses.

4 Conclusion

Sensor systems are essential in applications involving robotics and autonomous vehicles. The advancement of data science, artificial intelligence, machine learning, and the internet of things (IoT) creates new opportunities for autonomous cars. The fusion of robots, IoT, and AI is a particularly potent combination for applications such as vehicle control, traffic monitoring, and traffic management. Advanced sensor systems are necessary for efficient robotic and vehicle control with robot sensor devices. As a result, AI-based systems attract researchers' attention in order to maximize the utilization of sensor data for diverse robotic applications while minimizing energy consumption. One key challenge that AI technology can successfully address is the effective collection of data from sensors. Applications requiring time-constrained data collection can also make use of the data consistency method. The current chapter has examined several crucial ways to raise the quality of service (QoS) and quality of experience (QoE) standards of robotic and autonomous vehicle applications.
In the future, nano-electromechanical system (NEMS) advanced sensors and actuators can be developed for low-energy, long-lifetime monitoring applications. New physical layer protocols can also be developed for efficient operation.

References

1. Bischoff, R., Meyer, J., Enochsson, O., Feltrin, G., & Elfgren, L. (2009). Eventbased strain
monitoring on a railway bridge with a wireless sensor network. In Proceedings of the 4th
International Conference on Structural Health Monitor, (pp. 1–8). Zurich, Switzerland: Intell.
Infrastructure.
2. Chebrolu, K., Raman, B., Mishra, N., Valiveti, P., & Kumar, R. (2008). Brimon: A sensor
network system for railway bridge monitoring. In Proceedings of the 6th International Con-
ference on Mobile System and Application Services, Breckenridge, CO, USA (pp. 2–14).

3. Feltrin, G. (2012). Wireless sensor networks: A monitoring tool for improving remaining
lifetime estimation. In Civil Struct (Ed.), Health Monitoring Workshop (pp. 1–8). Berlin:
Germany.
4. Grosse, C., et al. (2006). Wireless acoustic emission sensor networks for structural health mon-
itoring in civil engineering. In Proceedings of the European Conference on Non-Destructive
Testing (pp. 1–8), Berlin, Germany.
5. Grosse, C., Glaser, S., & Kruger, M. (2010). Initial development of wireless acoustic emission
sensor Motes for civil infrastructure state monitoring. Smart Structures and Systems, 6(3),
197–209.
6. Hay, T. et al. (2006). Transforming bridge monitoring from time-based to predictive mainte-
nance using acoustic emission MEMS sensors and artificial intelligence. In Proceedings of
the 7th World Congress on Railway Research, Montreal, Canada, CD-ROM.
7. Hay, T. (2007). Wireless remote structural integrity monitoring for railway bridges. Trans-
portation Research Board, Washington, DC, DC, USA, Technical report no. HSR-IDEA
Project 54.
8. Krüger, M. et al. (2007). Sustainable Bridges. Technical Report on Wireless Sensor Networks
using MEMS for Acoustic Emission Analysis including other Monitoring Tasks. Stuttgart,
Germany: European Union.
9. Ledeczi, A., et al. (2009). Wireless acoustic emission sensor network for structural monitoring.
IEEE Sensors Journal, 9(11), 1370–1377.
10. Reyer, M., Hurlebaus, S., Mander, J., & Ozbulut, O. E. (2011). Design of a wireless sensor
network for structural health monitoring of bridges. In Proceedings of the 5th International
Conference on Sens Technology, Palmerston North, New Zealand (pp. 515–520).
11. Sala, D., Motylewski, J., & Koaakowsk, P. (2009). Wireless transmission system for a railway
bridge subject to structural health monitoring. Diagnostyka, 50(2), 69–72.
12. Townsend, C., & Arms, S. (2005). Wireless sensor networks. Principles and applications. In
J. Wilson (Ed.), Sensor Technology Handbook (Chap. 22). Oxford, UK: Elsevier.
13. Tolani, M., Sunny, R., Singh, K., Shubham, K., & Kumar, R. (2017). Two-Layer optimized
railway monitoring system using Wi-Fi and ZigBee interfaced WSN. IEEE Sensors Journal,
17(7), 2241–2248.
14. Rasouli, H., Kavian, Y. S., & Rashvand, H. F. (2014). ADCA: Adaptive duty cycle algorithm
for energy efficient IEEE 802.15.4 beacon-enabled WSN. IEEE Sensors Journal, 14(11),
3893–3902.
15. Misic, J., Misic, V. B., & Shafi, S. (2004). Performance of IEEE 802.15.4 beacon enabled
PAN with uplink transmissions in non-saturation mode-access delay for finite buffers. In First
International Conference on Broadband Networks, San Jose, CA, USA (pp. 416–425).
16. Jung, C. Y., Hwang, H. Y., Sung, D. K., & Hwang, G. U. (2009). Enhanced markov chain
model and throughput analysis of the slotted CSMA/CA for IEEE 802.15.4 under unsaturated
traffic conditions. In IEEE Transactions on Vehicular Technology (Vol. 58, no. 1, pp. 473–478),
January 2009.
17. Zhang, H., Xin, S., Yu, R., Lin, Z., & Guo, Y. (2009). An adaptive GTS allocation mechanism
in IEEE 802.15.4 for various rate applications. In 2009 Fourth International Conference on
Communications and Networking in China.
18. Ho, C., Lin, C., & Hwang, W. (2012). Dynamic GTS allocation scheme in IEEE 802.15.4 by
multi-factor. In 2012 Eighth International Conference on Intelligent Information Hiding and
Multimedia Signal Processing.
19. Yang, L., Zeng, S. (2012). A new GTS allocation schemes For IEEE 802.15.4. In 2012 5th
International Conference on BioMedical Engineering and Informatics (BMEI 2012)
20. Hurtado-López, J., & Casilari, E. (2013). An adaptive algorithm to optimize the dynamics of
IEEE 802.15.4 network. In Mobile Networks and Management (pp. 136–148).
21. Standard for Part 15.4: Wireless Medium Access Control (MAC) and Physical Layer Specifications for Low Rate Wireless Personal Area Networks (LR-WPAN), IEEE Standard 802.15.4, January 2006.

22. Pei, G., & Chien, C. (2001). Low power TDMA in large WSNs. In 2001 MILCOM Proceed-
ings Communications for Network-Centric Operations: Creating the Information Force (Cat.
No.01CH37277) (Vol. 1, pp. 347–351).
23. Shafiullah, G. M., Thompson, A., Wolf, P., & Ali, S. (2008). Energy-efficient TDMA MAC
protocol for WSNs applications. In Proceedings of the 5th ICECE, Dhaka, Bangladesh,
December 24–27, 2008 (pp. 85–90).
24. Hoesel & Havinga. (2004). A lightweight medium access protocol (LMAC) for WSNs: Reduc-
ing preamble transmissions and transceiver state switches. In 1st International Workshop on
Networked Sensing Systems (pp. 205–208).
25. Alvi, A. N., Bouk, S. H., Ahmed, S. H., Yaqub, M. A., Sarkar, M., & Song, H. (2016). BEST-
MAC: Bitmap-Assisted efficient and scalable TDMA-Based WSN MAC protocol for smart
cities. IEEE Access, 4, 312–322.
26. Li, J., & Lazarou, G. Y. (2004). A bit-map-assisted energy-efficient MAC scheme for WSNs.
In Third International Symposium on Information Processing in Sensor Networks. IPSN 2004
(pp. 55–60).
27. Shafiullah, G., Azad, S. A., & Ali, A. B. M. S. (2013). Energy-efficient wireless MAC protocols
for railway monitoring applications. IEEE Transactions on Intelligent Transportation Systems,
14(2), 649–659.
28. Patro, R. K., Raina, M., Ganapathy, V., Shamaiah, M., & Thejaswi, C. (2007). Analysis and
improvement of contention access protocol in IEEE 802.15.4 star network. In 2007 IEEE
International Conference on Mobile Adhoc and Sensor Systems, Pisa (pp. 1–8).
29. Pollin, S. et al. (2008). Performance analysis of slotted carrier sense IEEE 802.15.4 medium
access layer. In IEEE Transactions on Wireless Communications (Vol. 7, no. 9, pp. 3359–
3371), September 2008.
30. Park, P., Di Marco, P., Soldati, P., Fischione, C., & Johansson, K. H. (2009). A generalized
Markov chain model for effective analysis of slotted IEEE 802.15.4. In IEEE 6th International
Conference on Mobile Adhoc and Sensor Systems Macau (pp. 130–139).
31. Aboelela, E., Edberg, W., Papakonstantinou, C., & Vokkarane, V. (2006). WSN based model
for secure railway operations. In Proceedings 25th IEEE International Performance, Com-
puter Communication Conference, Phoenix, AZ, USA (pp. 1–6).
32. Shafiullah, G., Gyasi-Agyei, A., & Wolfs, P. (2007). Survey of wireless communications
applications in the railway industry. In Proceedings of 2nd International Conferences on
Wireless Broadband Ultra Wideband Communication, Sydney, NSW, Australia (p. 65).
33. Shrestha, B., Hossain, E., & Camorlinga, S. (2010). A Markov model for IEEE 802.15.4
MAC with GTS transmissions and heterogeneous traffic in non-saturation mode. In IEEE
International Conference on Communication Systems, Singapore (pp. 56–61).
34. Park, P., Di Marco, P., Fischione, C., & Johansson, K. H. (2013). Modeling and optimization
of the IEEE 802.15.4 protocol for reliable and timely communications. In IEEE Transactions
on Parallel and Distributed Systems (Vol. 24, no. 3, pp. 550–564), March 2013.
35. Farhad, A., Zia, Y., Farid, S., & Hussain, F. B. (2015). A traffic aware dynamic super-frame
adaptation algorithm for the IEEE 802.15.4 based networks. In IEEE Asia Pacific Conference
on Wireless and Mobile (APWiMob), Bandung (pp. 261–266).
36. Moulik, S., Misra, S., & Das, D. (2017). AT-MAC: Adaptive MAC-Frame payload tuning
for reliable communication in wireless body area network. In IEEE Transactions on Mobile
Computing (Vol. 16, no. 6, pp. 1516–1529), June 1, 2017.
37. Choudhury, N., & Matam, R. (2016). Distributed beacon scheduling for IEEE 802.15.4 cluster-
tree topology. In IEEE Annual India Conference (INDICON), Bangalore, (pp. 1–6).
38. Choudhury, N., Matam, R., Mukherjee, M., & Shu, L. (2017). Adaptive duty cycling in
IEEE 802.15.4 Cluster Tree Networks Using MAC Parameters. In Proceedings of the 18th
ACM International Symposium on Mobile Ad Hoc Networking and Computing, Mobihoc’17,
Chennai, India (pp. 37:1–37:2).
39. Moulik, S., Misra, S., & Chakraborty, C. (2019). Performance evaluation and Delay-Power
Trade-off analysis of ZigBee Protocol. In IEEE Transactions on Mobile Computing (Vol. 18,
no. 2, pp. 404–416), February 1, 2019.

40. Barbieri, A., Chiti, F., & Fantacci, R. (2006). Proposal of an adaptive MAC protocol for
efficient IEEE 802.15.4 low power communications. In Proceedings of IEEE 49th Global
Telecommunication Conference, December 2006 (pp. 1–5).
41. Lee, B.-H., & Wu, H.-K. (2010). Study on a dynamic superframe adjustment algorithm for
IEEE 802.15.4 LR-WPAN. In Proceedings of Vehicular Technology Conference (VTC), May
2010 (pp. 1–5).
42. Jeon, J., Lee, J. W., Ha, J. Y., & Kwon, W. H. (2007). DCA: Duty-cycle adaptation algorithm
for IEEE 802.15.4 beacon-enabled networks. In Proceedings of the 65th IEEE Vehicular
Technology Conference, April 2007 (pp. 110–113).
43. Goyal, R., Patel, R. B., Bhadauria, H. S., & Prasad, D. (2014). Dynamic slot allocation scheme
for efficient bandwidth utilization in Wireless Body Area Network. In 9th International Con-
ference on Industrial and Information Systems (ICIIS), Gwalior (pp. 1–7).
44. Na, C., Yang, Y., & Mishra, A. (2008). An optimal GTS scheduling algorithm for time-
sensitive transactions in IEEE 802.15.4 networks. In Computer Networks (Vol. 52 no. 13 pp.
2543–2557), September 2008.
45. Akbar, M. S., Yu, H., & Cang, S. (2017). TMP: Tele-Medicine protocol for slotted 802.15.4
with duty-cycle optimization in wireless body area sensor networks. IEEE Sensors Journal,
17(6), 1925–1936.
46. Koubaa, A., Alves, M., & Tovar, E. (2006). GTS allocation analysis in IEEE 802.15.4 for
real-time WSNs. In Proceedings 20th IEEE International Parallel and Distributed Processing
Symposium, Rhodes Island (p. 8).
47. Park, P., Fischione, C., & Johansson, K. H. (2013). Modeling and stability analysis of hybrid
multiple access in the IEEE 802.15.4 protocol. ACM Transactions on Sensor Networks, 9(2),
13:1–13:55.
48. Alvi, A., Mehmood, R., Ahmed, M., Abdullah, M., & Bouk, S. H. (2018). Optimized GTS
utilization for IEEE 802.15.4 standard. In International Workshop on Architectures for Future
Mobile Computing and Internet of Things.
49. Song, J., Ryoo, J., Kim, S., Kim, J., Kim, H., & Mah, P. (2007). A dynamic GTS allocation
algorithm in IEEE 802.15.4 for QoS guaranteed real-time applications. In IEEE International
Symposium on Consumer Electronics. ISCE 2007.
50. Lee, H., Lee, K., & Shin, Y. (2012). A GTS Allocation Scheme for Emergency Data Trans-
mission in Cluster-Tree WSNs, ICACT2012, February 2012 (pp. 19–22).
51. Lei, X., Choi, Y., Park, S., & Hyong Rhee, S. (2012). GTS allocation for emergency data
in low-rate WPAN. In 18th Asia-Pacific Conference on Communications (APCC), October
2012.
52. Yang, L., & Zeng, S. (2012). A new GTS allocation schemes For IEEE 802.15.4. In 2012 5th
International Conference on BioMedical Engineering and Informatics (BMEI 2012).
53. Cheng, L., Bourgeois, A. G., & Zhang, X. (2007). A new GTS allocation scheme for IEEE
802.15.4 networks with improved bandwidth utilization. In International Symposium on Com-
munications and Information Technologies
54. Udin Harun Al Rasyid, M., Lee, B., & Sudarsono, A. (2013). PEGAS: Partitioned GTS
allocation scheme for IEEE 802.15.4 networks. In International Conference on Computer,
Control, Informatics and Its Applications.
55. Roy, S., Mallik, I., Poddar, A., & Moulik, S. (2017). PAG-MAC: Prioritized allocation of
GTSs in IEEE 802.15.4 MAC protocol—A dynamic approach based on Analytic Hierarchy
Process. In 14th IEEE India Council International Conference (INDICON), December 2017.
56. Heinzelman, W. B., Chandrakasan, A. P., & Balakrishnan, H. (2002). An application-specific
protocol architecture for wireless microsensor networks. IEEE Wireless Communication
Transactions, 1(4), 660–670.
57. Philipose, A., & Rajesh, A. (2015). Performance analysis of an improved energy aware MAC
protocol for railway systems. In 2nd International Conference on Electronics and Communi-
cation Systems (ICECS), Coimbatore, (pp. 233–236).
58. Kumar, D., & Singh, M. P. (2018). Bit-Map-Assisted Energy-Efficient MAC protocol for
WSNs. International Journal of Advanced Science and Technology, 119, 111–122.

59. Duarte-Melo, E. J., & Liu, M. (2002). Analysis of energy-consumption and lifetime of hetero-
geneous WSNs. In Global Telecommunications Conference. GLOBECOM ’02. IEEE, 2002
(Vol. 1. pp. 21–25).
60. Shabna, V. C., Jamshid, K., & Kumar, S. M. (2014). Energy minimization by removing
data redundancy in WSNs. In 2014 International Conference on Communication and Signal
Processing, Melmaruvathur (pp. 1658–1663).
61. Yetgin, H., Cheung, K. T. K., El-Hajjar, M., & Hanzo, L. (2015). Network-Lifetime maxi-
mization of WSNs. IEEE Access, 3, 2191–2226.
62. Rajagopalan, R., & Varshney, P. K. (2006). Data-aggregation techniques in sensor networks:
A survey. In IEEE Communications Surveys & Tutorials (Vol. 8, no. 4, pp. 48–63). Fourth
Quarter 2006.
63. Jesus, P., Baquero, C., & Almeida, P. S. (2015). A survey of distributed data aggregation algo-
rithms. In IEEE Communications Surveys Tutorials (Vol. 17, no. 1, pp. 381–404). Firstquarter
2015.
64. Zhou, F., Chen, Z., Guo, S., & Li, J. (2016). Maximizing lifetime of Data-Gathering trees
with different aggregation modes in WSNs. IEEE Sensors Journal, 16(22), 8167–8177.
65. Sofra, N., He, T., Zerfos, P., Ko, B. J., Lee, K. W., & Leung, K. K. (2008). Accuracy analysis
of data aggregation for network monitoring. MILCOM 2008–2008 IEEE Military Communi-
cations Conference, San Diego, CA (pp. 1–7).
66. Heinzelman, W., Chandrakasan, A., & Balakrishnan, H. (2000). Energy-Efficient communi-
cation protocols for wireless microsensor networks. In Proceedings of the 33rd Hawaaian
International Conference on Systems Science (HICSS), January 2000.
67. Liang, J., Wang, J., Cao, J., Chen, J., & Lu, M. (2010). Algorithm, an efficient, & for construct-
ing maximum lifetime tree for data gathering without aggregation in WSNs. In Proceedings
IEEE INFOCOM, San Diego, CA (pp. 1–5).
68. Wu, Y., Mao, Z., Fahmy, S., & Shroff, N. B. (2010). Constructing maximum-lifetime data-
gathering forests in sensor networks. IEEE/ACM Transactions on Networking, 18(5), 1571–
1584.
69. Luo, D., Zhu, X., Wu, X., & Chen, G. (2011). Maximizing lifetime for the shortest path
aggregation tree in WSNs. Proceedings IEEE INFOCOM, Shanghai (pp. 1566–1574).
70. Hua, C., & Yum, T. S. P. (2008). Optimal routing and data aggregation for maximizing lifetime
of WSNs. IEEE/ACM Transactions on Networking, 16(4), 892–903.
71. Choi, K., & Chae, K. (2014). Data aggregation using temporal and spatial correlations in
Advanced Metering Infrastructure. In The International Conference on Information Network-
ing 2014 (ICOIN2014), Phuket (pp. 541–544).
72. Villas, L. A., Boukerche, A., Guidoni, D. L., de Oliveira, H. A. B. F., de Araujo, R. B., &
Loureiro, A. A. F. (2013). An energy-aware spatio-temporal correlation mechanism to perform
efficient data collection in WSNs. Computer Communications, 36(9), 1054–1066.
73. Liu, C., Wu, K., & Pei, J. (2007). An energy-efficient data collection framework for WSNs by
exploiting spatiotemporal correlation. IEEE Transactions on Parallel and Distributed Systems,
18(7), 1010–1023.
74. Kandukuri, S., Lebreton, J., Lorion, R., Murad, N., & Daniel Lan-Sun-Luk, J. (2016). Energy-
efficient data aggregation techniques for exploiting spatio-temporal correlations in WSNs.
Wireless Telecommunications Symposium (WTS) (pp. 1–6), London.
75. Mantri, D., Prasad, N. R., & Prasad, R. (2014). Wireless Personal Communications, 5, 2589.
https://doi.org/10.1007/s11277-013-1489-x.
76. Mantri, D., Prasad, N. R., Prasad, R., & Ohmori, S. (2012). Two tier cluster based data aggre-
gation (TTCDA) in WSN. In 2012 IEEE International Conference on Advanced Networks
and Telecommunciations Systems (ANTS).
77. Pham, N. D., Le, T. D., Park, K., & Choo, H. SCCS: Spatiotemporal clustering and com-
pressing schemes for efficient data collection applications in WSNs. International Journal of
Communication Systems, 23, 1311–1333.
78. Villas, L. A., Boukerche, A., de Oliveira, H. A. B. F., de Araujo, R. B., & Loureiro, A. A. F.
(2014). A spatial correlation aware algorithm to perform efficient data collection in WSNs.
Ad Hoc Networks, 12, 69–85. ISSN 1570-8705.

79. Krishnamachari, B., Estrin, D., & Wicker, S. B. (2002). The impact of data aggregation in
WSNs. In ICDCSW ’02: Proceedings of the 22nd International Conference on Distributed
Computing Systems (pp. 575–578). Washington, DC, USA: IEEE Computer Society.
80. Tolani, M., & Sunny, R. K. S. (2019). Lifetime improvement of WSN by information sensitive
aggregation method for railway condition monitoring. Ad Hoc Networks, 87, 128–145. ISSN
1570-8705. https://doi.org/10.1016/j.adhoc.2018.11.009.
81. Tolani, M., & Sunny, R. K. S. (2019). Energy Efficient Adaptive Bit-Map-Assisted Medium
Access Control Protocol, Wireless Personal Communication (Vol. 108, pp. 1595–1610).
https://doi.org/10.1007/s11277-019-06486-9.
82. MacQueen, J. B. (1967). Some methods for classification and analysis of multivariate observations. In Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability (Vol. 1, pp. 281–297). Berkeley: University of California Press.
83. Mišić, J., Shafi, S., & Mišić, V. B. (2005). The impact of MAC parameters on the performance
of 802.15.4 PAN. Ad Hoc Network. 3, 5 (September 2005), 509–528. https://doi.org/10.1016/
j.adhoc.2004.08.002.
84. An IEEE 802.15.4 compliant and ZigBee-ready 2.4 GHz RF transceiver. (2004). Microwave Journal, 47(6), 130–135.
85. Dargie, W., & Poellabauer, C. (2010). Fundamentals of WSNs: Theory and Practice. Wiley
Publishing.
86. Park, P., Fischione, C., & Johansson, K. H. (2013). Modeling and stability analysis of hybrid
multiple access in the IEEE 802.15.4 protocol. ACM Transactions on Sensor Networks, 9, 2,
Article 13, 55 pages.
87. Zhan, Y., & Xia, M. A. (2016). GTS size adaptation algorithm for IEEE 802.15.4 wireless
networks. Ad Hoc Networks, 37, Part 2, pp. 486–498. ISSN 1570-8705, https://doi.org/10.
1016/j.adhoc.2015.09.012.
88. Iala, I, Dbibih, I., & Zytoune, O. (2018). Adaptive duty-cycle scheme based on a new predic-
tion mechanism for energy optimization over IEEE 802.15.4 wireless network. International
Journal of Intelligent Engineering and Systems, 11(5). https://doi.org/10.22266/ijies2018.
1031.10.
89. Boulis, A. (2011). Castalia: A simulator for WSNs and Body Area Networks, user’s manual
version 3.2, NICTA.
90. Kolakowski, P., Szelazek, J., Sekula, K., Swiercz, A., Mizerski, K., & Gutkiewicz, P. (2011).
Structural health monitoring of a railway truss bridge using vibration-based and ultrasonic
methods. Smart Materials and Structures, 20(3), 035016.
91. Al-Janabi, T. A., & Al-Raweshidy, H. S. (2019). An energy efficient hybrid MAC protocol
with dynamic sleep-based scheduling for high density IoT networks. IEEE Internet of Things
Journal, 6(2), 2273–2287.
92. Penella-López, M. T., & Gasulla-Forner, M. (2011). Powering autonomous sensors: An inte-
gral approach with focus on solar and RF energy harvesting. Springer Link. https://doi.org/
10.1007/978-94-007-1573-8.
93. Farag, H., Gidlund, M., & Österberg, P. (2018). A delay-bounded MAC protocol for mission-
and time-critical applications in industrial WSNs. IEEE Sensors Journal, 18(6), 2607–2616.
94. Lin, C. H., Lin, K. C. J., & Chen, W. T. (2017). Channel-Aware polling-based MAC protocol
for body area networks: Design and analysis. IEEE Sensors Journal, 17(9), 2936–2948
95. Hodge, V. J., O’Keefe, S., Weeks, M., & Moulds, A. (2015). WSNs for condition monitoring
in the railway Industry: A survey. IEEE Transactions on Intelligent Transportation Systems,
16(3), 1088–1106.
96. Ye, W., Heidemann, J., & Estrin, D. (2002). An energy-efficient MAC protocol for WSNs. In
Twenty-First Annual Joint Conference of the IEEE Computer and Communications Societies
(Vol. 3, pp. 1567–1576).
97. Siddiqui, S., Ghani, S., & Khan, A. A. (2018). ADP-MAC: An adaptive and dynamic polling-
based MAC protocol for WSNs. IEEE Sensors Journal, 18(2), 860–874.
98. Stem, M., & Katz, R. H. (1997). Measuring and reducing energy-consumption of network
interfaces in hand held devices. IEICE Transactions on Communications, E80-B(8), 1125–
1131.

99. Lee, A. H., Jing, M. H., & Kao, C. Y. (2008). LMAC: An energy-latency trade-off MAC
protocol for WSNs. International Symposium on Computer Science and its Applications,
Hobart, ACT (pp. 233–238).
100. Karl, H., & Willig, A. (2005). Protocols and Architectures for WSNs. Wiley.
101. Balakrishnan, C., Vijayalakshmi, E., & Vinayagasundaram, B. (2016). An enhanced itera-
tive filtering technique for data aggregation in WSN. In 2016 International Conference on
Information Communication and Embedded Systems (ICICES), Chennai (pp. 1–6).
102. Nayak, P., & Devulapalli, A. (2016). A fuzzy logic-based clustering algorithm for WSN to
extend the network lifetime. In IEEE Sensors Journal, 16(1), 137–144.
103. Tolani, M., Bajpai, A., Sunny, R. K. S., Wuttisittikulkij, L., & Kovintavewat, P. (2021). Energy
efficient hybrid medium access control protocol for WSN. In The 36th International Tech-
nical Conference on Circuits/Systems, Computers and Communications, June 28th(Mon)–
30th(Wed)/Grand Hyatt Jeju, Republic of Korea.
104. Calle Torres, M. G. Energy consumption in WSNs using GSP. M.Sc. thesis, University of Pittsburgh, April.
105. Chebrolu, K., Raman, B., Mishra, N., Valiveti, P., & Kumar, R. (2008). Brimon: A sensor
network system for railway bridge monitoring. In Proceeding 6th International Conference
on Mobile Systems, Applications, and Services, Breckenridge, CO, USA, pp. 2–14.
106. Pascale, A., Varanese, N., Maier, G., & Spagnolini, U. (2012). A WSN architecture for railway
signalling. In Proceedings of 9th Italian Network Workshop, Courmayeur, Italy (pp. 1–4).
107. Grudén, M., Westman, A., Platbardis, J., Hallbjorner, P., & Rydberg, A. (2009). Reliability
experiments for WSNs in train environment. in Proceedings of European Wireless Technology
Conferences, (pp. 37–40).
108. Rabatel, J., Bringay, S., & Poncelet, P. (2009). SO-MAD: Sensor mining for anomaly detection
in railway data. Advances in Data Mining: Applications and Theoretical Aspects, LNCS (Vol.
5633, pp. 191–205).
109. Rabatel, J., Bringay, S., & Poncelet, P. (2011). Anomaly detection in monitoring sensor data
for preventive maintenance. Expert Systems With Applications, 38(6), 7003–7015.
110. Reason, J., Chen, H., Crepaldi, R., & Duri, S. (2010). Intelligent telemetry for freight trains.
Mobile computing, applications, services (Vol. 35, pp. 72–91). Berlin, Germany: Springer.
111. Reason, J., & Crepaldi, R. (2009). Ambient intelligence for freight railroads. IBM Journal of
Research and Development, 53(3), 1–14.
112. Tuck, K. (2010). Using the 32 Samples First In First Out (FIFO) in the MMA8450Q,
Energy Scale Solutions by free scale, FreeScale Solutions, 2010. http://www.nxp.com/docs/
en/application-note/AN3920.pdf.
113. Pagano, S., Peirani, S., & Valle, M. (2015). Indoor ranging and localisation algorithm based
on received signal strength indicator using statistic parameters for WSNs. In IET Wireless
Sensor Systems (Vol. 5, no. 5, pp. 243–249), October 2015.
114. Tolani, M., Bajpai, A., Sharma, S., Singh, R. K., Wuttisittikulkij, L., & Kovintavewat, P. (2021). Energy efficient hybrid medium access control protocol for WSN. In 36th International Technical Conference on Circuits/Systems, Computers and Communications (ITC-CSCC 2021), Jeju, South Korea, 28–30 June 2021.
115. Tolani, M., Sunny, R. K. S. (2020). Energy-Efficient adaptive GTS allocation algorithm for
IEEE 802.15.4 MAC protocol. Telecommunication systems. Springer. https://doi.org/10.1007/
s11235-020-00719-0.
116. Tolani, M., Sunny, R. K. S. Adaptive Duty Cycle Enabled Energy-Efficient Bit-Map-Assisted
MAC Protocol. Springer, SN Computer Science. https://doi.org/10.1007/s42979-020-00162-
7.
117. Tolani, M., Sunny, R. K. S. (2020). Energy-Efficient Hybrid MAC Protocol for Railway Mon-
itoring Sensor Network (Vol. 2, p. 1404). Springer, SN Applied Sciences (2020). https://doi.
org/10.1007/s42452-020-3194-1.
118. Tolani, M., Sunny, R. K. S. (2018). Energy-efficient aggregation-aware IEEE 802.15.4
MAC protocol for railway, tele-medicine & industrial applications. In 2018 5th IEEE Uttar
Pradesh Section International Conference on Electrical, Electronics and Computer Engi-
neering (UPCON), Gorakhpur (pp. 1–5).

119. Khan, A. A., Jamal, M. S., & Siddiqui, S. (2017). Dynamic duty-cycle control for WSNs using
artificial neural network (ANN). International Conference on Cyber-Enabled Distributed
Computing and Knowledge Discovery (CyberC), 2017, 420–424. https://doi.org/10.1109/
CyberC.2017.93
120. Wahyono, I. D., Asfani, K., Mohamad, M. M., Rosyid, H., Afandi, A., & Aripriharta (2020).
The new intelligent WSN using artificial intelligence for building fire disasters. In 2020 Third
International Conference on Vocational Education and Electrical Engineering (ICVEE) (pp.
1–6). https://doi.org/10.1109/ICVEE50212.2020.9243210.
121. Aliyu, F., Umar, S., & Al-Duwaish, H. (2019). A survey of applications of artificial neural
networks in WSNs. In 2019 8th International Conference on Modeling Simulation and Applied
Optimization (ICMSAO) (pp. 1–5). https://doi.org/10.1109/ICMSAO.2019.8880364.
122. Sun, L., Cai, W., & Huang, X. (2010). Data aggregation scheme using neural networks in
WSNs. In 2010 2nd International Conference on Future Computer and Communication,
May 2010 (Vol. 1, pp. V1-725–V1-729).
123. Elia, M. et al. (2006). Condition monitoring of the railway line and overhead equipment
through onboard train measurement-an Italian experience. In Proceedings of IET International
Conference on Railway Condition Monitor, Birmingham, UK (pp. 102–107).
124. Maly, T., Rumpler, M., Schweinzer, H., & Schoebel, A. (2005). New development of an overall
train inspection system for increased operational safety. In Proceedings of IEEE Intelligent
Transportation Systems, Vienna, Austria (pp. 188–193).
Four Wheeled Humanoid Second-Order
Cascade Control of Holonomic
Trajectories

A. A. Torres-Martínez, E. A. Martínez-García, R. Lavrenov, and E. Magid

Abstract This work develops a model-based second-order cascade motion controller for a holonomic humanoid-like wheeled robot. The locomotion structure comprises four radially arranged mecanum wheels. The model is given as a function of all wheels' contributions, adding maneuverability to the upper limbs. High-order derivatives are synchronized through numeric differentiation and integration, obtained online for consistent performance of the inner feedback loops. The controller deploys reference input vectors, both global and local to each cascade loop. In this approach, the controller decreases errors in position, velocity, and acceleration simultaneously through Newton-based recursive numerical approximations. A main advantage of this approach is the robustness obtained by three recursive feedback cascades: distance, velocity, and acceleration. Observers are modeled by combining multi-sensor inputs. The controller showed relative complexity, effectiveness, and robustness, and the proposed approach demonstrated good performance, re-routing flexibility, and maneuverability in numerical simulations.

Keywords Cascade-control · Holonomy · Wheeled-humanoid · Path-tracking · Control-loop · Mobile-robot

A. A. Torres-Martínez · E. A. Martínez-García (B)


Institute of Engineering and Technology, Universidad Autónoma de Ciudad Juárez,
Ciudad Juárez, Mexico
e-mail: [email protected]
R. Lavrenov · E. Magid
Institute of Information Technology and Intelligent Systems, Kazan Federal University,
Kazan, Russian Federation
e-mail: [email protected]
E. Magid
HSE Tikhonov Moscow Institute of Electronics and Mathematics, HSE University,
Moscow, Russian Federation


1 Introduction

Wheeled mobile robots are widely used in a number of applications. The performance of a wheeled robot is considerably good, in particular on flat and structured floors. For instance, they are faster, more efficient in reaching positions, and usually more effective in terms of mechanical energy requirements than walking bipeds. In numerous robotic applications, particularly in cluttered environments, omnidirectional rolling locomotion is well suited to easily changing the robot's body posture and moving in essentially any direction without explicit yaw control. Nowadays, the deployment of omnidirectional wheels as a means of locomotion in mobile robotics is in high demand in a number of applications due to their ability to move in any direction, particularly when driving in flat, structured, confined spaces [1, 2]. Unlike conventional wheels, omniwheels impose fewer kinematic constraints and allow the robot a wide range of mobility, adding holonomy and considerable maneuverability. The cases of omniwheel-based holonomic robots developed for different applications are nowadays numerous and relevant. For instance, personal assistant robots used as walking helpers have demonstrated the capability to provide guidance and dynamic support for people with impaired walking [3]. There are also other types of human-robot assistants in which mobile robots perform socially assistive interaction [4]. In healthcare, robotic systems have been designed with mecanum wheels to provide omnidirectional motion to wheelchairs [5]. Although manipulators onboard mobile platforms are not a new approach, mobile manipulators with omnidirectional locomotion provide interesting advantages; mecanum-wheeled platforms have served as holonomic vehicular manipulators moving in industrial working spaces [6]. Park et al. [7] presented a controller for velocity tracking and vibration reduction of a cart-pole inverted-pendulum-like model of an omnidirectional assistive mobile robot. The robot adopted mecanum wheel rolling with suspension to keep consistent contact between the mecanum wheels and the ground while transporting heavy goods placed at high locations.
Moreover, instrumented omnidirectional platforms with visually guided servoing devices have been reported [8]. Furthermore, holonomic robotic platforms have been exploited as robotized sporting and training technology to provide assistance and training in racquet sports [9]. A traditional robotic application exploits the advantages of omniwheel-based mobile robots deployed as domestic assistants in household environments [10]. The application of mecanum wheels in omnidirectional mobile robots has also been important in industrial fields, for instance autonomous indoor material transportation [11] as well as robotic platforms with four omnidirectional wheels working in warehouses [12]. The work [13] performed collaborative multirobot manipulation, displacing payloads to desired locations in planar obstacle-cluttered scenarios and maneuvering through narrow pathways, for which the use of mecanum-wheeled robots positioning without body-orientation change was advocated. The work [14] developed modular reconfiguration by deploying a group of vehicles to perform different mission tasks; reconfiguration was done at the level of motion planning, deploying four-wheel-drive mecanum mobile robots. Another important deployment of omniwheel robotics, in a field of growing popularity for omnidirectional humanoids, is in nursing and rehabilitation [15].
Demands on the use of robots differ considerably in how they are deployed, for instance in industry, with robots working on-site with highly accurate robotic arms, or mobile platforms moving heavy loads and assisting human workers in close proximity [16].
This chapter presents the trajectory tracking model of a humanoid robot at the stage of kinematic modeling and simulation. The work approaches a model of two upper limbs with three joints each, fixed on a trunk that is placed on a mecanum-wheeled platform with four asynchronously driven wheels. The main contribution of this research is the development of a three-cascade kinematic trajectory tracking controller, where each cascade comprises a different-order derivative deduced from the robot's kinematic model. Observers to complement the control are developed using a deterministic approach based on wheel encoders and an inertial measurement unit. The physical arrangement of the omniwheels is radially equidistant and tangentially rotated with respect to (w.r.t.) the center of reference. Numerical simulation results allow validating and understanding the proposed models and ideas, as well as refining them before converting them into feasible and operational physical systems.
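
As a rough illustration of the cascade idea, the following sketch nests three proportional loops (position to velocity to acceleration) on a single axis and integrates them numerically; the gains, the purely proportional form of each loop, and the one-dimensional setting are assumptions made only for illustration and do not reproduce the Newton-based, mecanum-wheel controller derived in this chapter.

def cascade_step(x_ref, x, v, a, kp=2.0, kv=4.0, ka=8.0):
    """One step of a simplified position -> velocity -> acceleration cascade (1-D)."""
    v_ref = kp * (x_ref - x)   # outer loop: position error gives a velocity reference
    a_ref = kv * (v_ref - v)   # middle loop: velocity error gives an acceleration reference
    return ka * (a_ref - a)    # inner loop: acceleration error gives the command (a jerk)

def simulate(x_ref=1.0, dt=0.01, steps=2000):
    x = v = a = 0.0
    for _ in range(steps):
        jerk = cascade_step(x_ref, x, v, a)
        a += jerk * dt          # the derivative chain is integrated numerically online,
        v += a * dt             # mirroring the synchronization of high-order derivatives
        x += v * dt
    return round(x, 3), round(v, 3), round(a, 3)

print(simulate())   # position settles at the 1.0 m reference with near-zero velocity and acceleration

With the gains above the closed loop is stable (characteristic polynomial s^3 + 8s^2 + 32s + 64), so the position converges to the reference while the inner loops drive the velocity and acceleration errors to zero, which is the behaviour the three feedback cascades are designed to obtain.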
This chapter organizes the sections as follows. Section 2 briefly discusses similar works. Section 3 deduces the motion equations of the robot's arms and its four-wheel, four-drive omnidirectional rolling platform. Section 4 defines the sensing model and observers used as online feedback elements. Section 5 describes the three-cascade controller. Finally, Sect. 7 provides the conclusion of the work.

2 Related Work

The study of position errors and calibration methods for robot locomotion with omnidirectional wheels has proven relevant [17]. The work [18] developed a reference-based control for a mecanum-wheeled omnidirectional robot platform, relying on the robot's kinematic model to generate trajectories and optimal constrained navigation. The cost function quantified the differences between the robot's path prediction and a family of parameterized reference trajectories. The work [19] demonstrated a time-varying proportional-integral-derivative controller for trajectory tracking of a mecanum-wheeled robot; it used a linearization of a nonlinear kinematic error model, with the controller's parametric coefficients adjusted by trial and error.
Omniwheel-based robot motion is affected by systematic perturbations differently than conventional wheeled robots. Identifying the sources of pose errors is critical to developing methods for kinematic error reduction in omnidirectional robotic systems [20]. The work [21] evaluated a method to correct systematic odometry errors of a humanoid-like three-wheeled omnidirectional mobile robot; correction was made by iteratively adjusting effective values of the robot's kinematic parameters, matching referenced positions by estimation. The correct functionality of a four-mecanum-wheel robot, navigating with Dijkstra's algorithm, was tested in [22]. The work [23] proposed an odometry-based calibration of kinematic parameter errors, deploying least-squares linear regression for a mobile robot with three omniwheels. Similarly, [24] presented a calibration system to reduce pose errors based on the kinematic formulation of a three-wheeled omnidirectional robot, considering both systematic and non-systematic error compensation. Another three-omniwheel robot was reported in [27], where motion calibration is obtained by optimizing the effective kinematic parameters; the inverse Jacobian elements are minimized through a cost function during path tracking. The previously cited works reported different solutions to calibrate odometric position errors in omniwheel-based mobile robots, highlighting two main approaches: numerical estimation and modeling of deterministic metric errors. A main difference with respect to the present work is its focus on tracking control of local Cartesian points within a global trajectory; besides encoders, other pose measurement methods are considered, for instance online data fusion of heterogeneous inertial measurements.
The research [25] reported a radially arranged omnidirectional four-wheeled robot controlled by three proportional-integral-derivative (PID) controllers. The PIDs controlled speed, heading, and position during trajectory tracking, using odometry to measure the robot's posture. The research [26] developed a theoretical kinematic basis for accurate motion control of combined mobility configurations, based on compensation coefficients for the speed errors mainly caused by wheel slippage in a four-mecanum-wheel industrial robot. The work [28] reported an observer-based, high-order sliding-mode controller for a multirobot system of three-wheeled omnidirectional platforms. The previously cited works reported motion control approaches for omniwheel robotic platforms that tackled either slippage problems or motion inaccuracies to improve the robot's posture estimate. In contrast, the present work assumes that pose observation is already adequate and focuses instead on robustly controlling the robot's motion along a linear trajectory segment by simultaneous triple kinematic control of higher-order derivatives.
The research reported in [29] introduced a general model for the analysis of symmetrical multi-wheel omnidirectional robots; constrained trajectory planning optimization was implemented for collision-free navigation. The work reported in [30] introduced a kinematic model for trajectory tracking control of a radially oriented four-wheel omnidirectional robot using odometry as feedback.
The work [31] presented a four-mecanum-wheel omnidirectional mobile robot for motion planning tasks, implementing wheel fault tolerance with a fuzzy controller. The work conducted by [32] proposed a neural control algorithm that adapts the neural network weights under parametric disturbances, as an intelligent path-motion controller for a four-mecanum-wheel mobile robot. Fault-tolerant navigation control of a four-mecanum-wheel structure was developed in [33], using adaptive control of second-order dynamics under parametric uncertainty. A controller for an omniwheeled industrial manipulator was presented in [34]; it adaptively combined a fuzzy wavelet neural network, a sliding mode, and a fractional-order criterion. Finally, a path-following control using extended Kalman filtering for sensor fusion was introduced in [35].
Some of the previously cited works combined soft-computing techniques with traditional control methods for tracking, either to recover from disturbances or to tolerate faults during tracking motion control. In contrast, the present research proposes a model-based recursive control whose particularity is the implementation of inner multi-cascades combining multiple higher-order inputs. Numerical errors with respect to a reference model are reduced by successive approximations acting as convergence laws. The focus of this research differs from most of the cited related work fundamentally in the class of control structure and the kind of observer models. For instance, while a traditional PID controller combines three different derivative orders as a summation of terms in one algebraic expression, the proposed approach nests each derivative inside another of lower order and faster sampling, as different recursive control cycles.

3 Robot Motion Model

This section describes the essential design parts of the proposed robotic structure at the level of the simulation model. Additionally, both kinematic models are illustrated: that of the onboard manipulators and that of the four-mecanum-wheel omnidirectional locomotive structure.
Figure 1a depicts the humanoid CAD concept of the proposed robotic platform. Figure 1b shows a basic model created in C/C++ as a resource for numerical simulations, which deploy the Open Dynamics Engine (ODE) libraries to create simulated animations.

Fig. 1 Mecanum four-wheeled humanoid structure. a CAD model. b Simulation model from the physics engine ODE

Fig. 2 Onboard arms basic mechanism. Joints and links kinematic parameters (above). Side of the
elbow mechanism (middle). Side of the wrist mechanism and shoulder gray-color gear (below)

The four mecanum wheels are arranged symmetrically, radially and equidistantly beneath the chassis structure. Each wheel is independently driven in both rotary directions. This work places the emphasis on the omnidirectional locomotion controller, since motion over the ground plane impacts the manipulators' position; the addition of the robot's translation and orientation is given in separate models along the manuscript. Figure 2 illustrates a basic conceptual design intended to help describe the joints' functional form. The purpose of the limbs in this manuscript is to illustrate general interaction with manipulable objects in general scenarios.
Therefore, the onboard arms may be modeled with multiple degrees of freedom. However, in this manuscript the manipulators are established as symmetrically planar with three rotary joints: shoulder (θ0), elbow (θ1) and wrist (θ2), all turning in pitch (see Fig. 2). Additionally, the robot's orientation is assumed to provide the arms' yaw motion (θt). The onboard arm's side view is shown in Fig. 2 (below), where the gray-color gear is the actuating device ϕ0 that rotates a shoulder. The arm's joint φl1 describes angular displacements of link l1. Figure 2 (middle) shows the antagonistic arm's side view, where the orange-color joint mechanism for θ1 (elbow) is depicted. The mechanism for θ1 moves asynchronously from θ0 and θ2. Additionally, Fig. 2 (middle) shows the yellow-color gearing system that depicts how the wrist motion is obtained and transmitted from the actuating gear ϕ2 towards ϕ5. The wrist rotary angle is the joint θ2, which rotates the gripper's elevation angle.
Hence, without loss of generality, the Cartesian position is a system of equations established for now in two dimensions, with z = 0, ż = 0 and z̈ = 0, where z is the depth dimension, not treated in this section. Subsequently, a third Cartesian component may be stated once the robot's yaw is defined, since it impacts the arms' pose; this is given in the next sections.
From the depiction of Fig. 2 (above), the following arm position expressions xa, ya are deduced to describe the motion in the sagittal plane (pitch),

xa = l1·cos(θ0) + l2·cos(θ0 + θ1) + l3·cos(θ0 + θ1 + θ2),    (1)

and

ya = l1·sin(θ0) + l2·sin(θ0 + θ1) + l3·sin(θ0 + θ1 + θ2).    (2)

where the functional forms of the actuating joints are described in the following Definition 1.

Definition 1 (Joints functional forms) Denoting gear angles and teeth numbers by ϕi and nj, respectively, let ϕ0 be the actuating joint,

θ0 = ϕ0.    (3)

Let ϕ6 be an actuating gear that transmits rotation to gear ϕ8 ≡ θ1 for link l2,

ϕ8 = (n6 n7)/(n7 n8) · ϕ6 = (n6/n8) · ϕ6,    (4)

therefore θ1 = ϕ8. Let ϕ1 transmit motion to ϕ5 ≡ θ2 for the rotation of l3 by

ϕ5 = (n1 n2 n3 n4)/(n2 n3 n4 n5) · ϕ1 = (n1/n5) · ϕ1,    (5)

therefore θ2 = ϕ5.
The previous statements lead to the following Proposition 1.

Proposition 1 (Arm's kinematic law) The kinematic control, including the gear mechanical advantages n6/n8 and n1/n5, reaches the reference angular positions θ0,1,2 while varying the joint angles ϕ0,8,5 by

(xa_{t+1} − xa_t, ya_{t+1} − ya_t)^T =
[ −l1 c0   −(n6/n8)(l1 c0 + l2 c01)   −(n1/n5)(l1 c0 + l2 c01 + l3 c012)
   l1 s0     (n6/n8)(l1 s0 + l2 s01)     (n1/n5)(l1 s0 + l2 s01 + l3 s012) ]
· (θ0 − ϕ0, θ1 − ϕ8, θ2 − ϕ5)^T.    (6)

Hence, the law (6) is satisfied when lim_{ϕi→θi} (x, y)_{t+1} − (x, y)_t = 0, where (x, y) is a Cartesian position and (θi − ϕi) is an instantaneous joint error.

Fig. 3 Onboard arms local Cartesian motion simulation for an arbitrary trajectory
Validating the previous kinematic expressions (1) and (2), Fig. 3 shows a numerical simulation of the Cartesian position along an arbitrary trajectory.
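To make the evaluation of (1)-(2) and the joint functional forms of Definition 1 concrete, the following C++ sketch computes the arm's planar end-effector position. It is a minimal illustration only: the link lengths, gear ratios and gear angles are assumed example values, not the chapter's actual design parameters.

#include <cmath>
#include <cstdio>

/* Planar forward kinematics of one onboard arm, Eqs. (1)-(2): three pitch
   joints (shoulder th0, elbow th1, wrist th2) and link lengths l1, l2, l3. */
struct Point2 { double x, y; };

Point2 armForwardKinematics(double th0, double th1, double th2,
                            double l1, double l2, double l3) {
    Point2 p;
    p.x = l1*std::cos(th0) + l2*std::cos(th0 + th1) + l3*std::cos(th0 + th1 + th2);
    p.y = l1*std::sin(th0) + l2*std::sin(th0 + th1) + l3*std::sin(th0 + th1 + th2);
    return p;
}

int main() {
    /* Joint functional forms of Definition 1: actuating gears phi0, phi6, phi1
       drive the joints through the ratios n6/n8 and n1/n5 (values assumed). */
    const double n6_over_n8 = 0.5, n1_over_n5 = 0.5;   /* assumed gear ratios */
    const double phi0 = 0.3, phi6 = 0.8, phi1 = -0.4;  /* gear angles [rad]   */
    const double th0 = phi0;               /* Eq. (3)                         */
    const double th1 = n6_over_n8 * phi6;  /* Eq. (4): phi8 = (n6/n8) * phi6  */
    const double th2 = n1_over_n5 * phi1;  /* Eq. (5): phi5 = (n1/n5) * phi1  */
    const Point2 p = armForwardKinematics(th0, th1, th2, 0.25, 0.22, 0.10);
    std::printf("end-effector: x = %.4f m, y = %.4f m\n", p.x, p.y);
    return 0;
}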
Moreover, from the system of nonlinear equations modeling position, (1) and (2), and hereafter assuming that the joints θj(ϕk) are functions of the gear rotations, the first-order derivative w.r.t. time is algebraically deduced and the Cartesian velocities are described by

(ẋa, ẏa)^T = l1 (−s0, c0)^T θ̇0 + l2 (−s01, c01)^T Σ_{i=0}^{1} θ̇i + l3 (−s012, c012)^T Σ_{i=0}^{2} θ̇i.    (7)

It follows the second-order derivative, which describes the arm's Cartesian accelerations, where the serial links' Jacobian is assumed to be a non-stationary matrix Jt ∈ R^{2×3}, such that

(ẍa, ÿa)^T = Jt · (θ̈0, θ̈1, θ̈2)^T + J̇t · (θ̇0, θ̇1, θ̇2)^T.    (8)

The ultimate purpose of this section is merely to establish the kinematic models as a basis for the following sections; the essential focus of this research is the locomotive control of the mobile holonomic structure. At this point, the two-dimensional manipulators can exploit the native holonomic mobility, i.e. position and rotation, to produce three-dimensional spatial manipulator trajectories. Additionally, omnidirectional mobility, as a complement to the arms, allows reducing the complexity of the arms' degrees of freedom.

Fig. 4 4W4D holonomic kinematics. Mecanum wheel locations without twist (left). Wheel positions twisted ±π/2 w.r.t. the center (right)
Let us establish the following mobility kinematic constraints depicted in Fig. 4.
Therefore, without loss of generality, let us state the following Proposition 2.

Proposition 2 (Holonomic motion model) Let u_t be the robot state vector in Cartesian form with components (x, y), such that u_t ∈ R^2, u = (x, y)^T. Hence, the forward kinematics is

u̇ = r·K·Φ̇,    (9)

where K is a stationary kinematic control matrix containing geometrical parameters and r is the wheel radius. Let Φ = (φ1, φ2, φ3, φ4)^T collect the four wheel angles, so that Φ̇ is the wheels' angular velocity vector. Likewise, the backward kinematics, where the constraint matrix forms a non-square system, is

Φ̇ = (1/r)·K^+·u̇ = (1/r)·K^T (K·K^T)^{-1}·u̇.    (10)

Therefore, according to the geometry of Fig. 4 and the general models of the previous Proposition 2, the following algebraic deduction arises: the Cartesian speeds ẋ and ẏ in holonomic motion are obtained from the wheel tangential velocities Vk as

ẋ = V1·cos(α1 − π/2) + V2·cos(α2 − π/2) + V3·cos(α3 − π/2) + V4·cos(α4 − π/2),    (11)

as well as

ẏ = V1·sin(α1 − π/2) + V2·sin(α2 − π/2) + V3·sin(α3 − π/2) + V4·sin(α4 − π/2).    (12)

Moreover, let Vk be the wheel tangential velocities described in terms of the angular speeds φ̇k, such that the following equality is stated,

Vk = r·φ̇k.    (13)

From this, the stationary non-square kinematic control matrix K is provided by Definition 2.
Definition 2 (4W4D holonomic kinematic matrix) Each wheel with angle αk w.r.t. the robot's geometric center, thus

K = [ cos(α1 − π/2)  cos(α2 − π/2)  cos(α3 − π/2)  cos(α4 − π/2)
      sin(α1 − π/2)  sin(α2 − π/2)  sin(α3 − π/2)  sin(α4 − π/2) ].    (14)

It follows that the holonomic speed model, as a function of the wheel angular speeds and the matrix K, is

(ẋ, ẏ)^T = r·K·(φ̇1, φ̇2, φ̇3, φ̇4)^T.    (15)

Similarly, higher-order derivatives are deduced from the previous model for subsequent treatment, for the sake of building the controller cascades. Thus, the second-order kinematic model is

(ẍ, ÿ)^T = r·K·(φ̈1, φ̈2, φ̈3, φ̈4)^T.    (16)
Fig. 5 General higher-order derivatives for the 4W4D holonomic model: first-order (velocity, above) and second-order (acceleration, below) performance

Likewise, a third-order derivative is provided by the model

(d³x/dt³, d³y/dt³)^T = r·K·(d³φ1/dt³, d³φ2/dt³, d³φ3/dt³, d³φ4/dt³)^T.    (17)
The fact that the matrix K is stationary keeps these linear derivative expressions simple. For this type of four-wheel holonomic platform, the kinematic models produce the behavior curves shown in Fig. 5.
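As an illustration of Definition 2 and of Eqs. (10) and (15), the following C++ sketch builds K for an assumed wheel arrangement (αk at 45°, 135°, 225°, 315°) and maps wheel speeds to Cartesian speeds and back through the Moore-Penrose pseudoinverse. The Eigen library is used here only for the matrix algebra and is an assumption of this sketch; the chapter's own simulations rely on plain C/C++ with ODE.

#include <Eigen/Dense>   /* assumed linear-algebra dependency */
#include <cmath>
#include <cstdio>

int main() {
    const double PI = 3.14159265358979323846;
    const double r = 0.05;                                    /* wheel radius [m], assumed  */
    const double alpha[4] = {PI/4, 3*PI/4, 5*PI/4, 7*PI/4};   /* assumed wheel angles       */

    /* Stationary kinematic matrix K of Definition 2, Eq. (14). */
    Eigen::Matrix<double, 2, 4> K;
    for (int k = 0; k < 4; ++k) {
        K(0, k) = std::cos(alpha[k] - PI/2);
        K(1, k) = std::sin(alpha[k] - PI/2);
    }

    /* Forward kinematics, Eq. (15): Cartesian speeds from wheel angular speeds. */
    Eigen::Vector4d phi_dot(10.0, -10.0, 10.0, -10.0);        /* [rad/s], arbitrary example */
    Eigen::Vector2d u_dot = r * K * phi_dot;

    /* Backward kinematics, Eq. (10): wheel speeds for a desired Cartesian speed,
       through the Moore-Penrose pseudoinverse of the non-square matrix K.       */
    Eigen::Vector2d u_dot_ref(0.3, 0.1);                      /* desired (x_dot, y_dot) [m/s] */
    Eigen::Matrix<double, 4, 2> K_pinv = K.transpose() * (K * K.transpose()).inverse();
    Eigen::Vector4d phi_dot_ref = (1.0 / r) * K_pinv * u_dot_ref;

    std::printf("u_dot = (%.3f, %.3f) m/s\n", u_dot(0), u_dot(1));
    std::printf("phi_dot_ref = (%.2f, %.2f, %.2f, %.2f) rad/s\n",
                phi_dot_ref(0), phi_dot_ref(1), phi_dot_ref(2), phi_dot_ref(3));
    return 0;
}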

4 Observer Models

This section establishes the main sensing models, which are assumed deterministic and are used in the cascade controller as feedback elements for observing the robot's model state. It is worth saying that perturbation models, noisy sensor measurements, and calibration methods are out of the scope of this manuscript.
Thus, let us assume a pulse shaft encoder fixed to each wheel. Hence, let φ̂εk be a measurement of the angular position of the k-th wheel,

φ̂εk(η) = (π/R)·ηt,    (18)
where ηt is the instantaneous number of pulses detected while the wheel is rotating and R is the encoder angular resolution. Furthermore, the encoder-based angular velocity observation is given by the backward high-precision first-order derivative,

φ̇ˆε(η, t) = (3φ̂t − 4φ̂t−1 + φ̂t−2) / ((tk − tk−1)(tk−1 − tk−2)),    (19)

with three previous measurements of the angle φ̂ε and time tk. Hence, the k-th wheel's tangential velocity is obtained by

υk = (πr/(RΔt))·(3ηt − 4ηt−1 + ηt−2),    (20)

where r is the wheel radius and, considering varying loop times, Δt = (tk − tk−1)(tk−1 − tk−2). Without loss of generality, let us substitute the previous statements into Proposition 3 to describe the Cartesian speed observation.
Proposition 3 (Encoder-based velocity observer) For simplicity, let us define the constants βk = αk − π/2 as the wheel orientation angles. Therefore, the encoder-based velocity observers ẋˆ, ẏˆ are modeled by

ẋˆ = Σ_{k=1}^{4} υk sin(βk)    (21)

and

ẏˆ = Σ_{k=1}^{4} υk cos(βk).    (22)

Moreover, the four wheel tangential speeds contribute to the yaw motion w.r.t. the center of the robot. Thus, since the wheels' linear displacements can be inferred from the encoders, an encoder-based yaw observation θ̂ε is possible,

θ̂ε = (πr/(2L))·Σ_{k=1}^{4} ηk,    (23)

where L is the distance between any wheel and the robot's center of rotation. Thus, the robot's angular velocity observer based only on the encoder measurements is

θ̇ˆε = (πr/(4RLΔt))·Σ_{k=1}^{4} (3ηk_t − 4ηk_{t−1} + ηk_{t−2}).    (24)

In addition, in order to decrease the time-derivative order of the gyroscope's raw measurements θ̇g, let us integrate the sequence of raw measurements according to the Newton-Cotes integration approach of Definition 3.

Definition 3 (Online sensor data integration) The robot's yaw observation is inferred by time integration of the raw angular velocity measurements, such that

θ̂g = ∫_{t0}^{tN} θ̇ˆg dt ≈ ((tN − t0)/(2N))·(θ̇ˆg_0 + 2·Σ_{k=1}^{N−1} θ̇ˆg_k + θ̇ˆg_N),    (25)

with the measurements θ̇ˆg_0, …, θ̇ˆg_N available in the time segment tN − t0.
Furthermore, for the robot's yaw motion let us assume a fusion of encoder and inertial measurements of the angle θι (inclinometer [rad]) and the angular velocity ωg (gyroscope [rad/s]), such that a complete yaw rate observer is provided by Proposition 4.

Proposition 4 (Yaw deterministic observers) The robot's yaw angle is an averaged model of three sensing models, encoder θ̂ε, inclinometer θ̂ι and gyroscope θ̂g, such that

θ̂t = (1/3)·θ̂ε + (1/3)·θ̂ι + (1/3)·∫_{t1}^{t2} θ̇ˆg dt,    (26)

where θ̂ε is substituted by (23). It follows that its first-order derivative is

ω̂t = (1/3)·θ̇ˆε + (1/3)·(dθ̂ι/dt) + (1/3)·θ̇ˆg,    (27)

where θ̇ˆε is substituted by (24).
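A minimal C++ sketch of the deterministic observers of this section is given below: the backward high-precision difference of Eq. (20), the encoder-based Cartesian velocity observers of Proposition 3, and the trapezoidal (Newton-Cotes) integration of Definition 3. All numeric values (encoder resolution, pulse counts, wheel radius, sample times, wheel angles) are illustrative assumptions.

#include <cmath>
#include <vector>
#include <cstdio>

static const double PI = 3.14159265358979323846;

/* Encoder-based wheel tangential velocity, Eq. (20): three consecutive pulse counts
   eta_t, eta_t1, eta_t2, encoder resolution R, wheel radius r, time factor dt.     */
double tangentialVelocity(long eta_t, long eta_t1, long eta_t2,
                          double R, double r, double dt) {
    return PI * r * (3.0*eta_t - 4.0*eta_t1 + eta_t2) / (R * dt);
}

/* Encoder-based Cartesian velocity observers of Proposition 3, Eqs. (21)-(22),
   with beta_k = alpha_k - pi/2 as defined in the text.                          */
void cartesianVelocityObserver(const double v[4], const double beta[4],
                               double& x_dot_hat, double& y_dot_hat) {
    x_dot_hat = y_dot_hat = 0.0;
    for (int k = 0; k < 4; ++k) {
        x_dot_hat += v[k] * std::sin(beta[k]);
        y_dot_hat += v[k] * std::cos(beta[k]);
    }
}

/* Composite trapezoidal (Newton-Cotes) integration of gyroscope rates, Eq. (25). */
double integrateYawRate(const std::vector<double>& theta_dot_g, double t0, double tN) {
    const int N = static_cast<int>(theta_dot_g.size()) - 1;
    double inner = 0.0;
    for (int k = 1; k < N; ++k) inner += theta_dot_g[k];
    return (tN - t0) / (2.0 * N) * (theta_dot_g[0] + 2.0*inner + theta_dot_g[N]);
}

int main() {
    const double beta[4] = {-PI/4, PI/4, 3*PI/4, 5*PI/4};     /* assumed beta_k values */
    double v[4];
    for (int k = 0; k < 4; ++k)                               /* example pulse counts  */
        v[k] = tangentialVelocity(120 + 5*k, 110 + 5*k, 100 + 5*k, 1024.0, 0.05, 0.01);
    double xd, yd;
    cartesianVelocityObserver(v, beta, xd, yd);
    std::vector<double> gyro = {0.00, 0.02, 0.05, 0.06, 0.05}; /* [rad/s] samples      */
    std::printf("x_dot = %.3f, y_dot = %.3f, theta_g = %.4f\n",
                xd, yd, integrateYawRate(gyro, 0.0, 0.4));
    return 0;
}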

5 Omnidirectional Cascade Controller

The central topic of this manuscript is a second-order cascade controller. It implies three nested control cycles: the highest-frequency loop is the acceleration loop, which runs within the velocity cycle, and both in turn run inside the position loop, the latter being the slowest. The global position cycle uses global references, while the inner cycles work on predictions as local references. The proposed cascade controller considers three feedback loops simultaneously and reduces the errors by recursive calculations and successive approximations at different sampling frequencies.
By stating the equation provided in Proposition 2 in the form of a differential equation,

du/dt = r·K·(dΦ/dt),    (28)

canceling the differentials dt on both sides and integrating with the corresponding limits on each side,

∫_{u1}^{u2} du = r·K·∫_{Φ1}^{Φ2} dΦ,    (29)

results in the following equality:

u2 − u1 = r·K·(Φ2 − Φ1).    (30)

Therefore, considering the Moore-Penrose approach to obtain the pseudoinverse of the non-square stationary kinematic matrix K, and solving and algebraically rearranging in continuous-time notation for an expression in terms of a recursive backward solution,

Φ_{t+1} = Φ_t + (1/r)·K^T (K·K^T)^{-1}·(u_ref − û_t),    (31)

where the prediction vector u_{t+1} is re-formulated as the global reference u_ref, i.e. the goal the robot is desired to reach. Likewise, the forward kinematic solution for u_{t+1} is

u_{t+1} = u_t + r·K·(Φ_{t+1} − Φ̂_t).    (32)

Fig. 6 A global cascade recursive simulation for Cartesian position

Therefore, the first (global) controller cascade is formed by the pair of recursive expressions (31) and (32). Proposition 5 states this global cascade.

Proposition 5 (Feedback position cascade) Given the inverse kinematic motion with observation û_t in the workspace,

Φ_{t+1} = Φ_t + (1/r)·K^T (K·K^T)^{-1}·(u_ref − û_t),    (33)

and the direct kinematic motion with observation Φ̂_t in the space of control variables,

u_{t+1} = u_t + r·K·(Φ_{t+1} − Φ̂_t).    (34)

Proposition 5 is validated through the numerical simulations shown in Fig. 6, where the automatic Cartesian segments and the decreasing feedback position errors can be observed.
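The following C++ sketch shows one possible realization of the position cascade of Proposition 5, again using Eigen for the pseudoinverse as an assumption of this sketch. The observers are idealised (taken equal to the model state), so the loop converges almost immediately; with real sensor feedback it would iterate by successive approximations.

#include <Eigen/Dense>   /* assumed linear-algebra dependency */
#include <cmath>
#include <cstdio>

int main() {
    const double PI = 3.14159265358979323846, r = 0.05;       /* assumed wheel radius  */
    const double alpha[4] = {PI/4, 3*PI/4, 5*PI/4, 7*PI/4};   /* assumed wheel angles  */
    Eigen::Matrix<double, 2, 4> K;
    for (int k = 0; k < 4; ++k) {
        K(0, k) = std::cos(alpha[k] - PI/2);
        K(1, k) = std::sin(alpha[k] - PI/2);
    }
    const Eigen::Matrix<double, 4, 2> K_pinv =
        K.transpose() * (K * K.transpose()).inverse();

    Eigen::Vector2d u_ref(1.0, 0.5);                 /* global Cartesian reference [m] */
    Eigen::Vector2d u = Eigen::Vector2d::Zero();     /* current position               */
    Eigen::Vector4d Phi = Eigen::Vector4d::Zero();   /* accumulated wheel angles [rad] */
    const double eps = 1e-4;                         /* convergence precision          */

    while ((u_ref - u).norm() > eps) {
        const Eigen::Vector2d u_hat   = u;           /* idealised position observer    */
        const Eigen::Vector4d Phi_hat = Phi;         /* idealised wheel-angle observer */
        Eigen::Vector4d Phi_next = Phi + (1.0/r) * K_pinv * (u_ref - u_hat);  /* Eq. (33) */
        u = u + r * K * (Phi_next - Phi_hat);                                 /* Eq. (34) */
        Phi = Phi_next;
    }
    std::printf("converged to u = (%.4f, %.4f)\n", u(0), u(1));
    return 0;
}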
Without loss of generality, and following the same approach as in Proposition 5, let us derive a second controller cascade to control velocity. The second-order kinematic expression given in (16) is written as a differential equation,

du̇/dt = r·K·(dΦ̇/dt),    (35)

and solved through definite integrals,
∫_{u̇1}^{u̇2} du̇ = r·K·∫_{Φ̇1}^{Φ̇2} dΦ̇,    (36)

Fig. 7 Numerical simulation of the second cascade (inner recursion) in terms of velocities

and similarly obtaining the following first-order equality,

u̇2 − u̇1 = r·K·(Φ̇2 − Φ̇1).    (37)

It follows Proposition 6, establishing the second cascade, which controls the first-order derivatives.

Proposition 6 (Feedback velocity cascade) The backward kinematic recursive function, with in-loop velocity observers and the prediction u̇_{t+1} used as the local reference u̇_ref, is given by

Φ̇_{t+1} = Φ̇_t + (1/r)·K^T (K·K^T)^{-1}·(u̇_ref − u̇ˆ_t),    (38)

and likewise the forward speed kinematic model,

u̇_{t+1} = u̇_t + r·K·(Φ̇_{t+1} − Φ̇ˆ_t).    (39)

Proposition 6 is validated by the simulation of Fig. 7.


Similarly, the third-order model from Eq. (17),

d³u/dt³ = r·K·(d³Φ/dt³),    (40)

is stated as a differential equation in the accelerations,

dü/dt = r·K·(dΦ̈/dt),    (41)

which is solved by definite integration on both sides of the equality,

∫_{ü1}^{ü2} dü = r·K·∫_{Φ̈1}^{Φ̈2} dΦ̈,    (42)

thus a consistent second-order derivative (acceleration) equality is obtained,

ü2 − ü1 = r·K·(Φ̈2 − Φ̈1).    (43)

Therefore, the following Proposition 7 is provided, with the notation rearranged for a third recursive inner control loop in terms of accelerations.

Proposition 7 (Feedback acceleration cascade) The backward kinematic recursive function, with in-loop acceleration observers and the prediction ü_{t+1} used as the local reference ü_ref, is given by

Φ̈_{t+1} = Φ̈_t + (1/r)·K^T (K·K^T)^{-1}·(ü_ref − üˆ_t).    (44)

Additionally, the forward acceleration kinematic model is

ü_{t+1} = ü_t + r·K·(Φ̈_{t+1} − Φ̈ˆ_t).    (45)

Proposition 7 is validated through the numerical simulation of Fig. 8, which shows an arbitrary performance.
At this point it is worth highlighting a general convergence criterion for the recursive control loops: the cycles end when the feedback error satisfies a numerical precision ε_{u,Φ}, as follows.

Definition 4 (Convergence criterion) A loop converges when the feedback error numerically meets its local reference according to the limit

lim_{ΔΦ→0} (Φ_ref − Φ̂) = 0,

where the feedback error e_Φ = (Φ_ref − Φ̂) numerically approaches zero. Thus, the criterion e_Φ < ε_Φ is expressed in normalized form as

‖Φ_ref − Φ̂_t‖ / ‖Φ_ref‖ < ε.    (46)

Fig. 8 Numerical simulation of the third cascade (inner recursion) in terms of accelerations

Fig. 9 Second-order cascade controller block diagram

Although Definition 4 is described as a criterion for Φ, it has general applicability and can condition any other control variable in the process.
Therefore, given Propositions 5, 6 and 7, which establish the recursive control loops for each derivative order involved, Fig. 9 shows the coupling of cascades forming the controller. Figure 9 is summarized in Table 1, which depicts only the coupling order of the solutions provided in Propositions 5-7.
Essentially, the coupling element between steps 1 and 2 of Table 1 is a first-order derivative of the wheel angular velocities w.r.t. time. Likewise, the following list briefly describes the observers, or sensing models, interconnecting every step in the controller cascades.
Table 1 Cascade-coupled recursive controllers

Step 1:  Φ_{t+1} = Φ_t + (1/r)·K^T (K·K^T)^{-1}·(u_ref − û_t)
Step 2:  u̇_{t+1} = u̇_t + r·K·(Φ̇_{t+1} − Φ̇ˆ_t)
Step 3:  Φ̈_{t+1} = Φ̈_t + (1/r)·K^T (K·K^T)^{-1}·(ü_{t+1} − üˆ_t)
Step 4:  ü_{t+1} = ü_t + r·K·(Φ̈_{t+1} − Φ̈ˆ_t)
Step 5:  Φ̇_{t+1} = Φ̇_t + (1/r)·K^T (K·K^T)^{-1}·(u̇_{t+1} − u̇ˆ_t)
Step 6:  u_{t+1} = u_t + r·K·(Φ_{t+1} − Φ̂_t)

Basically, these couplings are variational operations, grounded in numerical differentiation and integration, that transform the sensor data into consistent physical units.
1. Feedback from step 1 to step 2:

   Φ̇_{t+1} = (d/dt)Φ_{t+1} = (3Φ̂_t − 4Φ̂_{t−1} + Φ̂_{t−2}) / Δt,

2. Feedback from step 2 to step 3:

   ü_{t+1} = (d/dt)u̇_{t+1} = (3u̇ˆ_t − 4u̇ˆ_{t−1} + u̇ˆ_{t−2}) / Δt,

3. Feedback from step 4 to step 5:

   u̇_{t+1} = ∫_{t0}^{tn} ü_{t+1} dt ≈ ((tn − t0)/(2·nt))·(üˆ_0 + 2·Σ_{i=1}^{n−1} üˆ_i + üˆ_n),

4. Feedback from step 5 to step 6:

   Φ_{t+1} = ∫_{t0}^{tn} Φ̇_{t+1} dt ≈ ((tn − t0)/(2·nt))·(Φ̇ˆ_0 + 2·Σ_{i=1}^{n−1} Φ̇ˆ_i + Φ̇ˆ_n).

Additionally, Algorithm 1 lists the controller in pseudocode notation.
Data: r, K, u_ref, u_t, û_t, u̇_t, u̇ˆ_t, ü_t, üˆ_t, Φ_t, Φ̂_t, Φ̇_t, Φ̇ˆ_t, Φ̈_t, Φ̈ˆ_t
u_ref = (x_i, y_i)^T;
while ‖u_ref − û_t‖ > ε_u do
    Φ_{t+1} = Φ_t + (1/r)·K^T (K·K^T)^{-1}·(u_ref − û_t);
    Φ̇_{t+1} = (3Φ̂_t − 4Φ̂_{t−1} + Φ̂_{t−2}) / Δt;
    while ‖Φ̇_{t+1} − Φ̇ˆ_t‖ > ε_Φ̇ do
        u̇_{t+1} = u̇_t + r·K·(Φ̇_{t+1} − Φ̇ˆ_t);
        ü_{t+1} = (3u̇ˆ_t − 4u̇ˆ_{t−1} + u̇ˆ_{t−2}) / Δt;
        while ‖ü_{t+1} − üˆ_t‖ > ε_ü do
            Φ̈_{t+1} = Φ̈_t + (1/r)·K^T (K·K^T)^{-1}·(ü_{t+1} − üˆ_t);
            ü_{t+1} = ü_t + r·K·(Φ̈_{t+1} − Φ̈ˆ_t);
            u̇_{t+1} = ∫_a^b ü_{t+1} dt ≈ ((b − a)/(2n))·(üˆ_0 + 2·Σ_{j=1}^{n−1} üˆ_{k_j} + üˆ_{k_n});
        end
        Φ̇_{t+1} = Φ̇_t + (1/r)·K^T (K·K^T)^{-1}·(u̇_{t+1} − u̇ˆ_t);
        Φ_{t+1} = ∫_a^b Φ̇_{t+1} dt ≈ ((b − a)/(2n))·(Φ̇ˆ_0 + 2·Σ_{j=1}^{n−1} Φ̇ˆ_{k_j} + Φ̇ˆ_{k_n});
    end
    u_{t+1} = u_t + r·K·(Φ_{t+1} − Φ̂_t);
end
Algorithm 1: Second-order three-cascade controller pseudocode
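The nesting of Table 1 and Algorithm 1 can be sketched structurally as in the following C++ listing. This is not the chapter's implementation: the observers are idealised, the integral couplings are replaced by direct assignments, an iteration guard is added, and Eigen plus the wheel parameters are assumptions; the sketch only illustrates how the three recursive loops wrap one another.

#include <Eigen/Dense>   /* assumed linear-algebra dependency */
#include <cmath>
#include <cstdio>

int main() {
    const double PI = 3.14159265358979323846;
    const double r = 0.05, dt = 0.01, eps = 1e-4;             /* assumed parameters       */
    const double alpha[4] = {PI/4, 3*PI/4, 5*PI/4, 7*PI/4};   /* assumed wheel angles     */
    Eigen::Matrix<double, 2, 4> K;
    for (int k = 0; k < 4; ++k) {
        K(0, k) = std::cos(alpha[k] - PI/2);
        K(1, k) = std::sin(alpha[k] - PI/2);
    }
    const Eigen::Matrix<double, 4, 2> Kp = K.transpose() * (K * K.transpose()).inverse();

    Eigen::Vector2d u_ref(1.0, 0.5), u(0, 0), ud(0, 0), udd(0, 0);
    Eigen::Vector4d Phi = Eigen::Vector4d::Zero(), Phid = Eigen::Vector4d::Zero(),
                    Phidd = Eigen::Vector4d::Zero();
    int guard = 0;                                             /* safety bound for sketch  */

    while ((u_ref - u).norm() > eps && ++guard < 1000) {       /* outer loop: position     */
        Eigen::Vector4d Phi_next = Phi + (1.0/r) * Kp * (u_ref - u);            /* step 1 */
        Eigen::Vector4d Phid_ref = (Phi_next - Phi) / dt;      /* coupling: wheel-speed reference */

        while ((Phid_ref - Phid).norm() > eps) {               /* middle loop: velocity    */
            Eigen::Vector2d ud_ref  = ud + r * K * (Phid_ref - Phid);           /* step 2 */
            Eigen::Vector2d udd_ref = (ud_ref - ud) / dt;      /* coupling: acceleration reference */

            while ((udd_ref - udd).norm() > eps) {             /* inner loop: acceleration */
                Eigen::Vector4d Phidd_next = Phidd + (1.0/r) * Kp * (udd_ref - udd); /* step 3 */
                udd   = udd + r * K * (Phidd_next - Phidd);    /* step 4 (ideal observer)  */
                Phidd = Phidd_next;
            }
            Phid = Phid + (1.0/r) * Kp * (ud_ref - ud);        /* step 5                   */
            ud   = ud_ref;           /* stands in for the integral coupling of the sketch  */
        }
        u   = u + r * K * (Phi_next - Phi);                    /* step 6                   */
        Phi = Phi_next;
    }
    std::printf("reached u = (%.4f, %.4f) after %d outer iterations\n", u(0), u(1), guard);
    return 0;
}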

The figures in Sect. 6 show the numerical simulation results under the controlled scheme. The robot navigates to different Cartesian positions; within each trajectory segment the cascade controller controls the position, then the velocity exerted within that segment of distance, and similarly the acceleration within the small velocity window currently being controlled.

6 Results Analysis and Discussion

The three cascade controllers required couplings between them through numerical differentiation and integration over time. In this case, backward high-precision derivatives and Newton-Cotes integration were used. Although the traditional PID also deploys three derivative orders, their use is implemented very differently here. The proposed cascade model proved considerably stable, reliable, and numerically precise.
A fundamental aspect of the proposed method is that the three loops are nested. The slowest loop establishes the current metric error distance toward a global vector reference. Then, the second and third nested control loops establish the local reference velocity and acceleration, both corresponding to the actual incremental distance. The three loops are conditioned to recursively reduce their errors down to a numerical precision value by means of successive approximations.

Fig. 10 Controlled Cartesian position along a trajectory compounded of four global points

For instance, Fig. 10 shows a nonlinear Cartesian trajectory comprised of four main linear local segments, each with a different slope.
The first cascade loop basically controls the local displacements between pairs of global points. Figure 11 shows four peaks that represent the frontier of each global point reached. After reaching each goal point, the controller locally restarts the metric displacement and approaches the next one by means of successive approximations, producing nonlinear rates of motion.
The second (inner) looped cascade controls the robot's Cartesian velocities w.r.t. time, as shown in Fig. 12. At each goal reached, the first-order derivative shows a local minimum or maximum, with magnitudes depending on the speeds given by the loop reference values (local predictions as references); the last trajectory point reached is the global control reference. In this case, the velocity references are local values to be fitted, also known as the predicted values for the next local loop calculation at t + 1, such as u̇_{t+1} or Φ̇_{t+1}. As the velocity control loop is inside the metric displacement loop, the velocity is controlled by a set of loops, only for the segment of velocity being calculated in the current displacement loop.
The third inner control loop, which manages the second-order derivative, produces the results shown in Fig. 13. This control loop is the fastest iterative process and exerts the highest sampling frequency. In this case, the acceleration references are local values to be fitted, i.e. the predicted values for the next local loop calculation, such as ü_{t+1} or Φ̈_{t+1}. As the acceleration control loop is inside the velocity loop, the acceleration is controlled by a set of loops, only for the segment of velocity being calculated in the current velocity loop.

Fig. 11 Controlled displacement performance

Fig. 12 Controlled linear velocity

Fig. 13 Controlled acceleration
Finally, the observers that provided feedback were stated as a couple of single expressions representing a feasible model of sensor fusion (summarized by Proposition 4). The robot's angular motion (yaw angle and yaw rate) combined wheel motion and inertial measurements into a compound observer model. The in-loop transition between numerical derivatives worked highly reliably. The multiple inner control cascade approach was numerically accurate and worked online considerably fast. This type of cascade controller has the advantage that input, reference, and state vectors and matrices can easily be augmented without any alteration to the algorithm; compared with a PID controller in terms of speed, however, the latter is faster due to its lower computational complexity.

7 Conclusions

The natural kinematic constraints of a genuinely omnidirectional platform always produce straight local paths, which is an advantage because it facilitates displacement. When it is strictly necessary to generate deliberately curved or even discontinuous displacements, an omnidirectional platform traverses them through linearization into infinitesimal segments.

The proposed cascade control model was derived directly from the physics of the
robotic mechanism. In such an approach, the kinematic model was obtained as an
independent cascade and established as a proportional control type with a constant
gain represented by the robot’s geometric parameters. The gain or convergence factor
resulted in a non-square stationary matrix (MIMO). Unlike a PID controller, the
inverse analytic solution was formulated to obtain a system of linear differential
equations. In its solution, definite integration produced a recursive controller, which
converged to a solution by successive approximations of the feedback error.
The strategy proposed in this work focuses on connecting all the higher order
derivatives of the system in nested forms (cascades), unlike a PID controller which is
described as a linear summation of all derivative orders. Likewise, a cascade approach
does not need gain adjustment.
The lowest-order derivative was organized in the outer cascade, which is the loop with the slowest control cycle frequency and contains the global control references (desired positions). The intermediate cascade handles the next higher derivative order and is governed by a local speed reference; that is, it controls the speed during the displacement segment projected by the cycle of the external cascade. Finally, the acceleration cascade cycle is the fastest loop and controls the portions of acceleration during a small interval of displacement along the trajectory towards the next Cartesian goal.
The proposed research evidenced good performance, showing controlled limits of disturbances due to the three controllers acting over the same portion of motion. The controller was robust, and the precision error ε allowed adjusting the accuracy of the robot's closeness to the goal.

Acknowledgements The corresponding author acknowledges the support of Laboratorio de


Robótica. The third and fourth authors acknowledge the support of the Kazan Federal University
Strategic Academic Leadership Program (‘PRIORITY-2030’).

References

1. Mutalib, M. A. A., & Azlan, N. Z. (2020). Prototype development of mecanum wheels mobile
robot: A review. Applied Research and Smart Technology, 1(2), 71–82, ARSTech. https://doi.
org/10.23917/arstech.v1i2.39.
2. Yadav P. S., Agrawal V., Mohanta J. C., & Ahmed F. (2022) A theoretical review of mobile
robot locomotion based on mecanum wheels. Joint Journal of Novel Carbon Resource Sciences
& Green Asia Strategy, 9(2), Evergreen.
3. Palacín, J., Clotet, E., Martínez, D., Martínez, D., & Moreno, J. (2019). Extending the appli-
cation of an assistant personal Robot as a Walk-Helper Tool. Robotics, 8(27), MDPI. https://
doi.org/10.3390/robotics8020027.
4. Cooper S., Di Fava A., Vivas C., Marchionni L., & Ferro F. (2020). ARI: The social assistive
robot and companion. In 29th IEEE International Conferences on Robot and Human Inter-
active Communication, Naples Italy, August 31–September 4. https://doi.org/10.1109/RO-
MAN47096.2020.9223470.

5. Li, Y., Dai, S., Zheng, Y., Tian, F., & Yan, X. (2018). Modeling and kinematics simulation of a
mecanum wheel platform in RecurDyn. Journal of Robotics Hindawi. https://doi.org/10.1155/
2018/9373580
6. Rohrig, C., Hes, D., & Kunemund, F. (2017). Motion controller design for a mecanum wheeled
mobile manipulator. In 2017 IEEE Conferences on Control Technology and Applications, USA
(pp. 444–449), August 27–30. https://doi.org/10.1109/ccta.2017.8062502.
7. Park J., Koh D., Kim J., & Kim C. (2021). Vibration reduction control of omnidirectional
mobile robot with lift mechanism. In 21st International Conferences on Control, Automation
and Systems. https://doi.org/10.23919/ICCAS52745.2021.9649932.
8. Belmonte, Á., Ramón, J. L., Pomares, J., Garcia, G. J., & Jara, C. A. (2019). Optimal image-
based guidance of mobile manipulators using direct visual servoing. Electronics, 8(374). https://
doi.org/10.3390/electronics8040374.
9. Yang, F., Shi, Z., Ye, S., Qian, J., Wang, W., & Xuan D. (2022). VaRSM: Versatile autonomous
racquet sports machine. In ACM/IEEE 13th International Conferences on Cyber-Physical Sys-
tems, Milano Italy, May 4–6. https://doi.org/10.1109/ICCPS54341.2022.00025.
10. Eirale A., Martini M., Tagliavini L., Gandini D., Chiaberge M., & Quaglia G. (2022). Marvin:
an innovative omni-directional robotic assistant for domestic environments. arXiv:2112.05597,
https://doi.org/10.48550/arXiv.2112.05597.
11. Qian J., Zi B., Wang D., Ma Y., & Zhang D. (2017). The design and development of an omni-
directional mobile robot oriented to an intelligent manufacturing system. Sensors, 17 (2073).
https://doi.org/10.3390/s17092073.
12. Zalevsky, A., Osipov, O., & Meshcheryakov, R. (2017). Tracking of warehouses robots based on
the omnidirectional wheels. In International Conferences on Interactive Collaborative Robotics
(pp. 268–274). Springer. https://doi.org/10.1007/978-3-319-66471-2_29.
13. Rauniyar A., Upreti H. C., Mishra A., & Sethuramalingam P. (2021). MeWBots: Mecanum-
Wheeled robots for collaborative manipulation in an obstacle-clustered environment without
communication. J. of Intelligent & Robotic Systems, 102(1). https://doi.org/10.1007/s10846-
021-01359-5.
14. Zhou, J., Wang, J., He, J., Gao, J., Yang, A., & Hu, S. (2022). A reconfigurable modular
vehicle control strategy based on an improved artificial potential field. Electronics, 11(16),
2539. https://doi.org/10.3390/electronics11162539
15. Tanioka, T. (2019). Nursing and rehabilitative care of the elderly using humanoid robot. The
Journal of Medical Investigation, 66,. https://doi.org/10.2152/jmi.66.19
16. Shepherd, S., & Buchstab, A. (2014). KUKA Robots On-Site. In W. McGee and M. Ponce
de Leon M (Eds.), Robotic Fabrication in Architecture, Art and Design (pp. 373–380). Cham:
Springer. https://doi.org/10.1007/978-3-319-04663-1_26.
17. Taheri, H., & Zhao, C. X. (2020). Omnidirectional mobile robots, mechanisms and navigation
approaches. Mechanism and Machine Theory, 153(103958), Elsevier. https://doi.org/10.1016/
j.mechmachtheory.2020.103958.
18. Slimane Tich Tich, A., Inel, F., & Carbone, G. (2022). Realization and control of a mecanum
wheeled robot based on a kinematic model. In V. Niola, A. Gasparetto, G. Quaglia & G. Carbone
(Eds.), Advances in Italian Mechanism Science, IFToMM Italy, Mechanisms and Machine
Science (Vol. 122). Cham: Springer. https://doi.org/10.1007/978-3-031-10776-4_77.
19. Thai, N. H., Ly, T. T. K., & Dzung, L. Q. (2022). Trajectory tracking control for mecanum wheel
mobile robot by time-varying parameter PID controller. Bulletin of Electrical Engineering and
Informatics, 11(4), 1902–1910. https://doi.org/10.11591/eei.v11i4.3712
20. Han K., Kim H., & Lee J. S. (2010). The sources of position errors of omni-directional mobile
robot with mecanum wheel. In IEEE International Conferences on Systems, Man and Cyber-
netics, October 10–13, Istanbul, Turkey (pp. 581–586). https://doi.org/10.1109/ICSMC.2010.
5642009.
21. Palacín J., Rubies E., & Clotet E. (2022). Systematic odometry error evaluation and correction
in a human-sized three-wheeled omnidirectional mobile robot using flower-shaped calibration
trajectories. Applied Sciences, 12(5), 2606, MDPI. https://doi.org/10.3390/app12052606.

22. Cavacece, M., Lanni, C., & Figliolini, G. (2022). Mechatronic design and experimentation of a
mecanum four wheeled mobile robot. In: V. Niola, A. Gasparetto, G. Quaglia & G. Carbone G.
(Eds.) Advances in Italian Mechanism Science. IFToMM Italy 2022. Mechanisms and Machine
Science (Vol. 122). Cham: Springer. https://doi.org/10.1007/978-3-031-10776-4_93.
23. Lin, P., Liu, D., Yang, D., Zou, Q., Du, Y., & Cong, M. (2019). Calibration for odometry
of omnidirectional mobile robots based on kinematic correction. In IEEE 14th International
Conferences on Computer Science & Education, August 19–21, Toronto, Canada (pp. 139–
144). https://doi.org/10.1109/iccse.2019.8845402.
24. Maddahi, Y., Maddahi, A., & Sepehri, N. (2013). Calibration of omnidirectional wheeled
mobile robots: Method and experiments. In Robotica (Vol. 31, pp. 969–980). Cambridge Uni-
versity Press. https://doi.org/10.1017/S0263574713000210.
25. Ma’arif, I. A., Raharja, N. M., Supangkat, G., Arofiati, F., Sekhar, R., & Rijalusalam, D.U.
(2021). PID-based with odometry for trajectory tracking control on four-wheel omnidirec-
tional Covid-19 aromatherapy robot. Emerging Science Journal, 5. SI “COVID-19: Emerging
Research”. https://doi.org/10.28991/esj-2021-SPER-13.
26. Li, Y., Ge, S., Dai, S., Zhao, L., Yan, X., Zheng, Y., & Shi, Y. (2020). Kinematic modeling of a
combined system of multiple mecanum-wheeled robots with velocity compensation. Sensors,
20(75), MDPI. https://doi.org/10.3390/s20010075.
27. Savaee E., & Hanzaki A. R. (2021). A new algorithm for calibration of an omni-directional
wheeled mobile robot based on effective kinematic parameters estimation. Journal of Intelligent
& Robotic Systems, 101(28), Springer. https://doi.org/10.1007/s10846-020-01296-9.
28. Khoygani, M. R. R., Ghasemi, R., & Ghayoomi, P. (2021). Robust observer-based control of
nonlinear multi-omnidirectional wheeled robot systems via high order sliding-mode consensus
protocol. International Journal of Automation and Computing, 18, 787–801, Springer, https://
doi.org/10.1007/s11633-020-1254-z.
29. Almasri, E., & Uyguroğlu, M. K. (2021). Modeling and trajectory planning optimization for the
symmetrical multiwheeled omnidirectional mobile robot. Symmetry, 13(1033), MDPI. https://
doi.org/10.3390/sym13061033.
30. Rijalusalam, D.U., & Iswanto, I. (2021). Implementation kinematics modeling and odometry
of four omni wheel mobile robot on the trajectory planning and motion control based micro-
controller. Journal of Robotics and Control, 2(5). https://doi.org/10.18196/jrc.25121.
31. Alshorman, A. M., Alshorman, O., Irfan, M., Glowacz, A., Muhammad, F., & Caesarendra, W.
(2020). Fuzzy-Based fault-tolerant control for omnidirectional mobile robot. Machines, 8(3),
55, MDPI. https://doi.org/10.3390/machines8030055.
32. Szeremeta, M., & Szuster, M. (2022). Neural tracking control of a four-wheeled mobile
robot with mecanum wheels. Applied Science, 2022(12), 5322, MDPI. https://doi.org/10.3390/
app12115322.
33. Vlantis, P., Bechlioulis, C. P., Karras, G., Fourlas, G., & Kyriakopoulos, K. J. (2016). Fault
tolerant control for omni-directional mobile platforms with 4 mecanum wheels. In IEEE Inter-
national Conferences on Robotics and Automation (pp. 2394–2400). https://doi.org/10.1109/
icra.2016.7487389.
34. Wu, X., & Huang, Y. (2021). Adaptive fractional-order non-singular terminal sliding mode
control based on fuzzy wavelet neural networks for omnidirectional mobile robot manipulator.
ISA Transactions. https://doi.org/10.1016/j.isatra.2021.03.035
35. Pizá, R., Carbonell, V., Casanova, Á., Cuenca, J. J., & Salt L. (2022). Nonuniform dual-rate
extended kalman-filter-based sensor fusion for path-following control of a holonomic mobile
robot with four mecanum wheels. Applied Science, 2022(12), 3560, MDPI. https://doi.org/10.
3390/app12073560.
