
Automatic Control of Unmanned Vehicles Based on Deep Reinforcement Learning and YOLO Algorithm Using Airsim Simulation

Xiulin Zhang1, Xiaolei Qu1,2, Shuting Yang1, Junbiao Dong1, Jingcheng Zhang3, and Ke Li3(B)
1 Avic Shenyang Aircraft Design and Research Institute, Beijing, China
2 Northwestern Polytechnical University, Xi'an, China
3 School of Aeronautical Science and Engineering, Beihang University, Beijing, China

[email protected]

Abstract. Through a comparative investigation of recent research on UAV distribution centers, our group selected unmanned vehicles as the carrier for UAV distribution, used to collect and transport the UAVs. At the same time, the application workflow was split into five main functional modules: UAV autonomous navigation and obstacle avoidance, UAV landing on a moving target, unmanned-vehicle autonomous navigation and obstacle avoidance, moving-object recognition and tracking, and gesture recognition. This paper introduces two of these modules for the unmanned vehicle in a simulation environment: autonomous navigation with obstacle avoidance, and object detection. In particular, the paper focuses on the experimental methods and results of AirSim-based self-driving simulation of the unmanned vehicle through Deep Reinforcement Learning under the UE4 engine, analyzes the simulation results, puts forward corresponding optimization ideas, and introduces the object detection method and concrete implementation details based on the YOLO algorithm. A more complete solution is thus provided for the unmanned-vehicle part of the UAV distribution center management dilemma. The simulation results show that the Deep Q-Network and the simulation environment used in this paper are suitable for verifying unmanned-vehicle control: after a certain period of training, the neural network could make stable decisions that bring the unmanned vehicle to its destination in a specific indoor simulation environment. This verification of the unmanned vehicle provides a solid foundation for implementing these technologies in the UAV distribution center.

Keywords: UAV · Distribution Centers · Deep Reinforcement Learning · Autonomous Driving · Object Detection · YOLO

1 Introduction
1.1 Background
The term “self-driving,” often associated with artificial intelligence, encompasses a spectrum of interpretations. For instance, “unmanned vehicles” is a phrase commonly employed to denote a range of specific technologies and applications, which may
include varying degrees of driving assistance. Some of these applications necessitate human interaction, yet the term “unmanned” signifies full automation in vehicle control, excluding the primary functions typically associated with assisted driving.
The aspiration is for fully autonomous vehicles to eventually supplant human drivers. However, the evolution of autonomous driving technology suggests a transitional phase lasting a decade or more. To delineate the distinct levels of autonomous driving capability, SAE International introduced a comprehensive six-tier classification system in 2014. This system serves to clarify the nuances between the different stages of self-driving advancement [1, 8].
Each control system of the vehicle needs to be able to link to the decision-making system via the vehicle bus and to precisely execute driving movements such as acceleration, braking, and steering according to the instructions issued by the decision-making system over the bus [2].

1.2 Deep Reinforcement Learning (DRL)


1.2.1 Value-Based Learning
The core of Value-Based Learning is the action-value function Q∗(s, a) [9]. Each time the agent observes a state st, it feeds the state into the function Q∗, which evaluates all actions, and then selects the best one:

Q∗ (st , Left) = 152, (1)

Q∗ (st , Right) = 396, (2)

Q∗ (st , Up) = −232 (3)

These Q-values represent the expected return of each action [10]: the expected return for moving left is 152, for moving right is 396, and for moving up is −232. The agent should take the action with the largest Q-value, i.e., move right.
How do we obtain the Q∗ function? We collect states, actions, and returns as training data and train a neural network that approximates the true Q∗ function. The most widely known value-based learning method is the Deep Q-Network (DQN) [5].
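To make the mechanism concrete, the following is a minimal, illustrative DQN sketch in PyTorch. The paper does not list its network architecture or hyperparameters, so the layer sizes, action count, and learning rate below are assumptions; the sketch shows an epsilon-greedy action choice from the current Q-network and a one-step temporal-difference update against a target network.

    import random
    import torch
    import torch.nn as nn

    STATE_DIM, NUM_ACTIONS, GAMMA = 16, 3, 0.99   # assumed sizes and discount factor

    q_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, NUM_ACTIONS))
    target_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, NUM_ACTIONS))
    target_net.load_state_dict(q_net.state_dict())
    optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

    def select_action(state, epsilon):
        # Explore with probability epsilon, otherwise pick the action with the largest Q-value.
        if random.random() < epsilon:
            return random.randrange(NUM_ACTIONS)
        with torch.no_grad():
            return int(q_net(state).argmax())

    def dqn_update(state, action, reward, next_state, done):
        # One gradient step on the squared temporal-difference error for a single transition.
        q_value = q_net(state)[action]
        with torch.no_grad():
            target = reward + GAMMA * target_net(next_state).max() * (1.0 - done)
        loss = (q_value - target) ** 2
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

In a full training loop, transitions would typically be stored in a replay buffer and sampled in mini-batches, and the target network would be synchronized with the Q-network at a fixed interval.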

1.2.2 Policy-Based Learning


We can directly use the policy function to compute the probability of every action and then randomly select an action to execute according to those probabilities [4]. Each time a state st is observed, it is fed into the function π(a|s), which evaluates all actions and outputs their probability values:

π (Left|st ) = 0.2 (4)

π (Right|st ) = 0.5 (5)



π (Up|st ) = 0.3 (6)

The agent randomly selects one of the three actions according to these probabilities and executes it [7].
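As an illustration only (the action set Left/Right/Up and the network sizes are assumptions, not taken from the paper), a minimal sketch of how such a policy network samples an action in PyTorch:

    import torch
    import torch.nn as nn

    STATE_DIM, NUM_ACTIONS = 16, 3   # assumed sizes; actions indexed as Left, Right, Up
    policy_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, NUM_ACTIONS))

    def sample_action(state):
        # The network outputs logits; a softmax turns them into pi(a|s), and an action is sampled.
        logits = policy_net(state)
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()
        return int(action), dist.log_prob(action)   # log-probability kept for a policy-gradient update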

1.3 AirSim Simulation Environment

Given the features required above, the group chose AirSim as the simulation software for its open-source nature and its well-documented Python interface.
Specifically, AirSim is an Unreal Engine-based simulator for drones, cars, and more. AirSim supports popular flight controllers such as PX4 and ArduPilot, and can also perform hardware-in-the-loop simulation for physically and visually realistic experiments. AirSim is an open-source, cross-platform Unreal plug-in that is simple to add to an Unreal environment [6].
AirSim aims to develop into an AI research platform for self-driving vehicles. To this end, AirSim opens up Application Programming Interfaces (APIs) for retrieving data and controlling vehicles.
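As a minimal sketch of how these APIs are used from Python (the calls follow AirSim's published car API; the camera name, image settings, and control values are placeholders rather than the configuration used in the paper):

    import airsim
    import numpy as np

    client = airsim.CarClient()
    client.confirmConnection()
    client.enableApiControl(True)

    # Request one uncompressed scene image from camera "0".
    responses = client.simGetImages([
        airsim.ImageRequest("0", airsim.ImageType.Scene, False, False)
    ])
    resp = responses[0]
    frame = np.frombuffer(resp.image_data_uint8, dtype=np.uint8)
    frame = frame.reshape(resp.height, resp.width, 3)   # image array for the perception pipeline

    # Send a simple throttle/steering command to the simulated car.
    controls = airsim.CarControls()
    controls.throttle = 0.5
    controls.steering = 0.0
    client.setCarControls(controls)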
An ALU (arithmetic logic unit) is a combinational logic circuit that can perform multiple kinds of arithmetic and logic operations. GPUs contain far more ALUs than CPUs and can therefore perform massive amounts of computation in parallel. Simply put, the CPU is like the brain of the computer, with many functions including scheduling, managing, and coordinating the other hardware, whereas the GPU is like an employee that accepts the CPU's scheduling and provides huge computational power. Deep learning requires a large amount of matrix computation, which is exactly what the GPU is good at. Therefore, our lab chose one of the best available GPUs, the RTX 3090, as the core hardware.
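As a simple illustration of this point (not part of the original experiments), the following sketch times the same matrix multiplication on the CPU and, when a CUDA device is available, on the GPU with PyTorch:

    import time
    import torch

    a = torch.randn(4096, 4096)
    b = torch.randn(4096, 4096)

    start = time.time()
    _ = a @ b                             # matrix multiplication on the CPU
    print(f"CPU: {time.time() - start:.3f} s")

    if torch.cuda.is_available():
        a_gpu, b_gpu = a.cuda(), b.cuda()
        torch.cuda.synchronize()          # make sure the transfer has finished before timing
        start = time.time()
        _ = a_gpu @ b_gpu                 # the same multiplication on the GPU
        torch.cuda.synchronize()          # wait for the kernel to complete
        print(f"GPU: {time.time() - start:.3f} s")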

2 Algorithms

2.1 YOLO

2.1.1 Algorithm
Extracting information that a computer can understand from an image is the central problem of machine vision. Because of its powerful representational ability, combined with the accumulation of data and the progress of computational power, the deep learning model has become a hot research direction in machine vision. As for how a computer can understand a picture, there are three main levels, depending on the needs of the downstream task: classification, detection, and segmentation, as shown in Fig. 1 [11].
The YOLO algorithm is based on the network structure described in [3]. It consists of six convolutional layers and two fully connected layers, used respectively for extracting image features and classifying them.
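The paper does not state which implementation was used for inference; the sketch below runs a standard pre-trained YOLOv3 model through OpenCV's DNN module, which is one common way to obtain class-labeled boxes of the kind discussed later. The file names and thresholds are placeholders.

    import cv2
    import numpy as np

    # Placeholder paths to a standard pre-trained YOLOv3 configuration and weights.
    net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")
    layer_names = net.getUnconnectedOutLayersNames()

    image = cv2.imread("frame.png")
    h, w = image.shape[:2]

    # YOLO expects a square, normalized blob; 416x416 is the standard input size.
    blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)
    outputs = net.forward(layer_names)

    boxes, confidences, class_ids = [], [], []
    for output in outputs:
        for detection in output:
            scores = detection[5:]                      # per-class scores follow the box fields
            class_id = int(np.argmax(scores))
            confidence = float(scores[class_id])
            if confidence > 0.35:                       # confidence threshold (placeholder)
                cx, cy, bw, bh = detection[:4] * np.array([w, h, w, h])
                boxes.append([int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh)])
                confidences.append(confidence)
                class_ids.append(class_id)

    # Non-maximum suppression removes overlapping duplicate boxes.
    keep = cv2.dnn.NMSBoxes(boxes, confidences, 0.35, 0.4)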

Fig. 1. The three different levels of understanding a picture

3 Results

3.1 Simulation Environment

3.1.1 Automotive Winter Scene


The second scene chosen is the automotive winter scene from the official AirSim release. The main color of this environment is white, which is very similar to the color of the road, so detecting the road and obstacles is more difficult than in the parking lot scene. One reason for selecting this scene is to further test the accuracy of the trained autonomous-driving network. Another reason is to test how much the network depends on the YOLO object detection algorithm. Specifically, the network by itself can take the car as an agent and drive it around obstacles. With the help of the YOLO algorithm, the input image to the network is labeled with proper classifications, so autonomous-driving performance can be improved. However, with further training, the network parameters may gradually be adjusted to fit the input information produced by the YOLO algorithm; when the car then enters an environment where the YOLO algorithm struggles to work, the accuracy of the car's actions may decrease, which could endanger both the unmanned vehicle and the UAVs it carries. The winter scene was therefore selected precisely because object detection is difficult for YOLO there: much less information is provided by the YOLO network, so the agent has to be trained almost entirely on its own interaction with the environment (Fig. 2).

Fig. 2. Outdoor Unreal scene - automotive winter scene

3.2 Simulation Results

The figure shows that only one object was detected, and only with a rather low confidence of 0.35. The warning posts and trees were not detected, owing to the complex environment and the snow covering the objects. Therefore, the YOLO algorithm may not be suitable for improving performance in the winter scene (Fig. 3).

Fig. 3. YOLO result in the automotive winter scene

3.2.1 DQN+YOLO
As introduced before, combining YOLO with DQN in the parking lot environment supplies the DQN input with additional information before the network is trained. Using the same method of plotting the training reward as for DQN alone, the reward record of the combined YOLO and DQN algorithm is shown below. The main difference from the previous figure is the training speed: the combined method first reaches the destination after only 80 episodes, about 65% fewer than with DQN alone. This shows that applying YOLO to the input of the deep reinforcement learning algorithm can improve the training speed (Fig. 4).

Fig. 4. Reward of YOLO + DQN in 300 episodes
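The paper does not specify exactly how the detections are injected into the DQN input. The sketch below shows one plausible scheme, assumed purely for illustration, in which each YOLO detection is rasterized into a per-class mask that is stacked onto the image channels before the state is fed to the Q-network; the class count, input resolution, and fusion scheme are assumptions, not the authors' exact design.

    import numpy as np

    NUM_CLASSES = 5          # assumed number of object classes
    H, W = 84, 84            # assumed network input resolution

    def build_state(rgb_image, detections):
        # rgb_image: (H, W, 3) float array in [0, 1].
        # detections: list of (class_id, x, y, w, h, confidence) in pixel coordinates.
        masks = np.zeros((H, W, NUM_CLASSES), dtype=np.float32)
        for class_id, x, y, bw, bh, conf in detections:
            x0, y0 = max(int(x), 0), max(int(y), 0)
            x1, y1 = min(int(x + bw), W), min(int(y + bh), H)
            # Fill the box region of the class channel with the detection confidence.
            masks[y0:y1, x0:x1, class_id] = np.maximum(masks[y0:y1, x0:x1, class_id], conf)
        # Channels-last state with 3 + NUM_CLASSES channels, e.g. (84, 84, 8).
        return np.concatenate([rgb_image.astype(np.float32), masks], axis=-1)

A state built this way can be fed to the same kind of Q-network as before, with the input channel count simply increased from 3 to 3 + NUM_CLASSES.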

4 Conclusion

This research aimed to identify a suitable approach for the UAV dispatch hub, focusing on the practical application of a deep reinforcement learning strategy for unmanned vehicle control and the associated performance enhancements.
Specifically, the study employed Microsoft’s AirSim platform as the simulation set-
ting, primarily utilizing the indoor Unreal Engine 4 parking lot scenario. The simulation
leveraged the Deep Q-Network algorithm due to its efficacy and straightforwardness. To
enhance the performance of the trained model, the YOLO v3 object detection technique
was integrated as the detection mechanism for the UAV. The model was further refined
by incorporating the object detection outputs as inputs to the network, thereby expediting
the training phase.
Through three experimental iterations, it was observed that the introduction of the
YOLO network prior to the DQN input significantly expedited the network’s learning
curve. In a series of 300-episode experiments, the DQN achieved its initial peak reward
at the 130th episode, whereas the YOLO-enhanced DQN reached its initial peak at the
80th episode. Moreover, the YOLO-augmented DQN demonstrated a 30% increase in
average reward over the 300 episodes compared to the standard DQN. This indicates
that effective object detection can notably elevate the average reward during the training
of deep reinforcement learning algorithms for autonomous driving neural networks.
The deployment of these algorithms has effectively validated the practicality of
deploying deep reinforcement learning agents for UAVs within the project. Additionally,
the integration of efficient object detection techniques lays a robust foundation for the
identification of anomalous objects in the operational environment of the UAV agent.

Acknowledgements. This work was supported by the Chinese National Natural Science Foundation (No. 61773039), the Aeronautical Science Foundation of China (No. 2017ZDXX1043), and the Aeronautical Science Foundation of China (No. 2018XXX).

References
1. Geiger, A., Lenz, P., Urtasun, R.: Are we ready for autonomous driving? The KITTI vision benchmark suite. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 3354–3361 (2012). https://doi.org/10.1109/CVPR.2012.6248074
2. Levinson, J., et al.: Towards fully autonomous driving: systems and algorithms. In: IEEE Intelligent Vehicles Symposium, Proceedings, pp. 163–168 (2011). https://doi.org/10.1109/IVS.2011.5940562
3. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection (2016). http://pjreddie.com/yolo/. Accessed 07 June 2021
4. Arulkumaran, K., Deisenroth, M.P., Brundage, M., Bharath, A.A.: Deep reinforcement learning: a brief survey. IEEE Signal Process. Mag. 34(6), 26–38 (2017). https://doi.org/10.1109/MSP.2017.2743240
5. Lv, L., Zhang, S., Ding, D., Wang, Y.: Path planning via an improved DQN-based learning policy. IEEE Access 7, 67319–67330 (2019). https://doi.org/10.1109/ACCESS.2019.2918703
6. Shah, S., Dey, D., Lovett, C., Kapoor, A.: AirSim: high-fidelity visual and physical simulation for autonomous vehicles. arXiv, pp. 621–635, 15 May 2017. https://doi.org/10.1007/978-3-319-67361-5_40
7. Lillicrap, T.P., et al.: Continuous control with deep reinforcement learning, September 2016. https://goo.gl/J4PIAz. Accessed 24 May 2021
8. Shende, V.: Analysis of research in consumer behavior of automobile passenger car customer. Int. J. Sci. Res. Publ. 4(2), 1–8 (2014). http://www.ijsrp.org/research-paper-0214/ijsrp-p2670.pdf
9. Li, Y.: Deep reinforcement learning: an overview, January 2017. http://arxiv.org/abs/1701.07274. Accessed 24 May 2021
10. Li, Y.: Deep reinforcement learning. Nature 511(7508), 184–190 (2018). http://arxiv.org/abs/1810.06339. Accessed 24 May 2021
11. Tian, Y., Yang, G., Wang, Z., Wang, H., Li, E., Liang, Z.: Apple detection during different growth stages in orchards using the improved YOLO-V3 model. Comput. Electron. Agric. 157, 417–426 (2019). https://doi.org/10.1016/j.compag.2019.01.012
