Automatic Control of Unmanned Vehicles Based on Deep Reinforcement Learning and YOLO Algorithm Using Airsim Simulation
Xiulin Zhang1 , Xiaolei Qu1,2 , Shuting Yang1 , Junbiao Dong1 , Jingcheng Zhang3 ,
and Ke Li3(B)
1 AVIC Shenyang Aircraft Design and Research Institute, Beijing, China
2 Northwestern Polytechnical University, Xi'an, China
3 School of Aeronautical Science and Engineering, Beihang University, Beijing, China
1 Introduction
1.1 Background
The term “self-driving”, often associated with artificial intelligence, encompasses a
spectrum of interpretations. For instance, “unmanned vehicles” is a phrase commonly
employed to denote a range of specific technologies and applications, which may
These Q-values represent the expected benefit of the action [10]. The expected gain for the action "left" is 152, for the action "right" is 396, and for the action "up" is −232. The agent should take the action with the largest Q-value, i.e., the action to the right.
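As a small illustration of this greedy choice, using the example Q-values above (the array layout and action names are only for illustration):

```python
import numpy as np

# Example Q-values for the three candidate actions in the current state,
# taken from the numbers above.
actions = ["left", "right", "up"]
q_values = np.array([152.0, 396.0, -232.0])

# Greedy policy: take the action with the largest expected return.
best_action = actions[int(np.argmax(q_values))]
print(best_action)  # -> "right"
```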
How do we obtain the Q∗ function? We collect states, actions, and returns as training data and learn a neural network that approximates the true Q∗ function. The most widely known value learning method is Deep Q-Networks (DQN) [5].
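A minimal sketch of such a Q-network and a single temporal-difference update is given below; the state dimension, network width, and hyperparameters are assumptions for illustration and not the configuration used in this work:

```python
import torch
import torch.nn as nn

# Q-network: maps a state vector to one Q-value per action.
class QNetwork(nn.Module):
    def __init__(self, state_dim=84, n_actions=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, state):
        return self.net(state)

q_net = QNetwork()
target_net = QNetwork()
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
gamma = 0.99  # discount factor

def dqn_update(state, action, reward, next_state, done):
    """One temporal-difference step on a single transition.

    state/next_state: 1-D float tensors, action: int, reward/done: floats.
    """
    q_pred = q_net(state)[action]
    with torch.no_grad():
        q_next = target_net(next_state).max()
        q_target = reward + gamma * q_next * (1.0 - done)
    loss = nn.functional.mse_loss(q_pred, q_target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```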
The intelligent agent randomly selects among the three actions and executes the selected one [7].
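A minimal sketch of how such random exploration is commonly balanced against the greedy choice, using an epsilon-greedy rule (the epsilon value is an illustrative assumption, not taken from this work):

```python
import random
import numpy as np

def epsilon_greedy(q_values, epsilon=0.1):
    """With probability epsilon explore randomly, otherwise act greedily."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))   # random exploration
    return int(np.argmax(q_values))              # greedy exploitation

action_index = epsilon_greedy(np.array([152.0, 396.0, -232.0]))
```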
With the preferred features introduced above, the group chose AirSim as the simulation software for its open-source nature and well-documented Python interface.
Specifically, AirSim is an Unreal Engine-based simulator for drones, cars, and more. AirSim supports popular flight controllers such as PX4 and ArduPilot, and can also perform hardware-in-the-loop simulation with physically and visually realistic results. AirSim is an open-source, cross-platform Unreal plug-in that is easy to apply to an Unreal environment [6].
AirSim aims to become an AI research platform for self-driving vehicles. To this end, it exposes Application Programming Interfaces (APIs) for retrieving data and controlling vehicles.
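As a brief sketch of how these APIs can be used from Python to control a car and retrieve sensor data (the throttle, steering, and camera values are illustrative):

```python
import airsim

# Connect to the running AirSim simulation and take API control of the car.
client = airsim.CarClient()
client.confirmConnection()
client.enableApiControl(True)

# Send a simple control command (illustrative throttle/steering values).
controls = airsim.CarControls()
controls.throttle = 0.5
controls.steering = 0.0
client.setCarControls(controls)

# Retrieve an RGB image from the front camera for the perception pipeline.
responses = client.simGetImages([
    airsim.ImageRequest("0", airsim.ImageType.Scene, False, False)
])

# Read back the vehicle state (speed, pose, etc.).
state = client.getCarState()
print(state.speed, len(responses[0].image_data_uint8))
```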
The arithmetic logic unit (ALU) is a combinational logic circuit that can perform multiple sets of arithmetic and logic operations.
It is clear that GPUs have more ALUs and can therefore perform powerful parallel calculations. To put it simply, the CPU is like the brain of a computer with multiple functions, including scheduling, management, and coordination of the other hardware. In contrast, a GPU is like an employee who accepts the CPU's scheduling and provides huge computational power. Deep learning requires a great deal of matrix computing, and the GPU meets exactly this requirement. Therefore, our lab chose one of the best GPUs, the RTX 3090, as the core hardware.
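As a simple illustration of moving this kind of matrix workload onto the GPU (the matrix sizes are illustrative):

```python
import torch

# Pick the GPU if one is available (e.g. the RTX 3090 used as our core hardware).
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# A large matrix multiplication, the core operation of deep learning,
# runs on the GPU simply by placing the tensors on that device.
a = torch.randn(4096, 4096, device=device)
b = torch.randn(4096, 4096, device=device)
c = a @ b
print(device, c.shape)
```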
2 Algorithms
2.1 YOLO
2.1.1 Algorithm
Extracting information that a computer can understand from an image is the central problem of machine vision. Because of its powerful representation ability, combined with the accumulation of data and the progress of computational power, the deep learning model has become a hot research direction in machine vision. As for how a computer can understand a picture, there are three main levels, depending on the needs of the downstream task: classification, detection, and segmentation, as shown in the figure below [11] (Fig. 1).
The YOLO algorithm is based on the structure shown in [3]. It consists of six convolutional layers and two fully connected layers, used respectively for extracting the features of the image and for the classification of the different features.
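A minimal sketch of a network with this layout, six convolutional layers for feature extraction followed by two fully connected layers, is given below; the channel counts, input resolution, and output size are assumptions for illustration and do not reproduce the exact YOLO configuration:

```python
import torch
import torch.nn as nn

# Six convolutional layers for feature extraction, two fully connected
# layers producing the detection output (all sizes are illustrative).
class TinyYoloLikeNet(nn.Module):
    def __init__(self, n_outputs=7 * 7 * 30):
        super().__init__()
        chs = [3, 16, 32, 64, 128, 256, 512]
        conv_layers = []
        for c_in, c_out in zip(chs[:-1], chs[1:]):
            conv_layers += [
                nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
                nn.LeakyReLU(0.1),
                nn.MaxPool2d(2),
            ]
        self.features = nn.Sequential(*conv_layers)
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(512 * 7 * 7, 1024), nn.LeakyReLU(0.1),
            nn.Linear(1024, n_outputs),
        )

    def forward(self, x):
        return self.head(self.features(x))

# A 448x448 RGB image gives a 7x7 feature map after six 2x downsamplings.
out = TinyYoloLikeNet()(torch.randn(1, 3, 448, 448))
print(out.shape)  # torch.Size([1, 1470])
```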
3 Results
could lead to danger for the unmanned vehicle as well as the UAVs on the vehicle.
In this case, the winter scene is selected because of the difficulty it poses for the YOLO algorithm in object detection. That is, the YOLO network provides much less information, and therefore the agent needs to be trained almost solely on its interaction with the environment (Fig. 2).
As shown in the figure, only one object was detected, and with a rather low confidence of 0.35. The warning piles and trees are not detected because of the complex environment and the snow covering the objects. Therefore, the YOLO algorithm may not be suitable for improving performance in the winter scene (Fig. 3).
3.2.1 DQN+YOLO
As introduced before, combining YOLO and DQN in the parking lot environment provides the DQN input with more information before the network is trained. Using the same method as for plotting the DQN reward curve, the reward record of the algorithm combining YOLO and DQN is shown below. The main difference from the previous figure is the training speed. Specifically, the method reaches its first data point at the destination after only 80 episodes, which is about 65% less than using DQN alone. This demonstrates that applying YOLO before the input of the deep reinforcement learning algorithm can improve the training speed (Fig. 4).
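A sketch of how the YOLO detections can be appended to the DQN input is shown below; the feature layout (a fixed-length vector of box coordinates, confidences, and class ids concatenated with the image features) is an assumption for illustration, not the exact encoding used in this work:

```python
import numpy as np

def detections_to_features(detections, max_boxes=5):
    """Flatten up to max_boxes YOLO detections into a fixed-length vector.

    Each detection is (x, y, w, h, confidence, class_id); missing slots are
    zero-padded so the DQN always sees an input of the same size.
    """
    feats = np.zeros(max_boxes * 6, dtype=np.float32)
    for i, det in enumerate(detections[:max_boxes]):
        feats[i * 6:(i + 1) * 6] = det
    return feats

def build_dqn_input(image_features, detections):
    """Concatenate raw image features with YOLO detection features."""
    return np.concatenate([image_features, detections_to_features(detections)])

# Example: two detections from YOLO, padded to the fixed length.
dets = [(0.2, 0.4, 0.1, 0.3, 0.87, 2.0), (0.6, 0.5, 0.2, 0.2, 0.35, 0.0)]
x = build_dqn_input(np.zeros(84, dtype=np.float32), dets)
print(x.shape)  # (114,)
```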
4 Conclusion
This research aimed to identify a suitable approach within the UAV dispatch hub, focus-
ing on the practical application of a deep reinforcement learning strategy for unmanned
aerial vehicle control and the associated performance enhancements.
Specifically, the study employed Microsoft’s AirSim platform as the simulation set-
ting, primarily utilizing the indoor Unreal Engine 4 parking lot scenario. The simulation
leveraged the Deep Q-Network algorithm due to its efficacy and straightforwardness. To
enhance the performance of the trained model, the YOLO v3 object detection technique
was integrated as the detection mechanism for the UAV. The model was further refined
by incorporating the object detection outputs as inputs to the network, thereby expediting
the training phase.
Through three experimental iterations, it was observed that the introduction of the
YOLO network prior to the DQN input significantly expedited the network’s learning
curve. In a series of 300-episode experiments, the DQN achieved its initial peak reward
at the 130th episode, whereas the YOLO-enhanced DQN reached its initial peak at the
80th episode. Moreover, the YOLO-augmented DQN demonstrated a 30% increase in
average reward over the 300 episodes compared to the standard DQN. This indicates
that effective object detection can notably elevate the average reward during the training
of deep reinforcement learning algorithms for autonomous driving neural networks.
The deployment of these algorithms has effectively validated the practicality of
deploying deep reinforcement learning agents for UAVs within the project. Additionally,
the integration of efficient object detection techniques lays a robust foundation for the
identification of anomalous objects in the operational environment of the UAV agent.
Acknowledgements. This work was supported by the Chinese National Natural Science Foundation (No. 61773039), the Aeronautical Science Foundation of China (No. 2017ZDXX1043), and the Aeronautical Science Foundation of China (No. 2018XXX).
References
1. Geiger, A., Lenz, P., Urtasun, R.: Are we ready for autonomous driving? The KITTI vision benchmark suite. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 3354–3361 (2012). https://doi.org/10.1109/CVPR.2012.6248074
2. Levinson, J., et al.: Towards fully autonomous driving: systems and algorithms. In: IEEE Intelligent Vehicles Symposium, Proceedings, pp. 163–168 (2011). https://doi.org/10.1109/IVS.2011.5940562
3. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection (2016). http://pjreddie.com/yolo/. Accessed 07 June 2021
4. Arulkumaran, K., Deisenroth, M.P., Brundage, M., Bharath, A.A.: Deep reinforcement learning: a brief survey. IEEE Signal Process. Mag. 34(6), 26–38 (2017). https://doi.org/10.1109/MSP.2017.2743240
5. Lv, L., Zhang, S., Ding, D., Wang, Y.: Path planning via an improved DQN-based learning policy. IEEE Access 7, 67319–67330 (2019). https://doi.org/10.1109/ACCESS.2019.2918703
6. Shah, S., Dey, D., Lovett, C., Kapoor, A.: AirSim: high-fidelity visual and physical simulation for autonomous vehicles. arXiv, pp. 621–635, 15 May 2017. https://doi.org/10.1007/978-3-319-67361-5_40
7. Lillicrap, T.P., et al.: Continuous control with deep reinforcement learning, September 2016. https://goo.gl/J4PIAz. Accessed 24 May 2021
8. Shende, V.: Analysis of research in consumer behavior of automobile passenger car customer. Int. J. Sci. Res. Publ. 4(2), 1–8 (2014). http://www.ijsrp.org/research-paper-0214/ijsrp-p2670.pdf
9. Li, Y.: Deep reinforcement learning: an overview, January 2017. http://arxiv.org/abs/1701.07274. Accessed 24 May 2021
10. Li, Y.: Deep reinforcement learning. Nature 511(7508), 184–190 (2018). http://arxiv.org/abs/1810.06339. Accessed 24 May 2021
11. Tian, Y., Yang, G., Wang, Z., Wang, H., Li, E., Liang, Z.: Apple detection during different growth stages in orchards using the improved YOLO-V3 model. Comput. Electron. Agric. 157, 417–426 (2019). https://doi.org/10.1016/j.compag.2019.01.012