Learning A Deep Neural Net Policy For End-to-End Control of Autonomous Vehicles
Abstract— Deep neural networks are frequently used for computer vision, speech recognition and text processing. The reason is their ability to regress highly nonlinear functions. We present an end-to-end controller for steering autonomous vehicles based on a convolutional neural network (CNN). The deployed framework does not require explicit hand-engineered algorithms for lane detection, object detection or path planning. The trained neural net directly maps pixel data from a front-facing camera to steering commands and does not require any other sensors. We compare the controller performance with the steering behavior of a human driver.

I. INTRODUCTION

Self-driving cars are expected to have a huge impact on the automotive industry. More than 90% of all car accidents are caused by human errors and only 2% by vehicle failures [1]. Self-driving cars can become a safer alternative to human drivers and save thousands of lives yearly. Tremendous efforts are undertaken in industry and academia on hardware and algorithmic research. Self-driving cars have to cope with different challenges such as recognition tasks (traffic, humans, road, lanes etc.), path planning and controls. For each of these tasks a vast amount of different sensor data is gathered and needs to be processed. In recent years, deep neural nets have emerged as a promising method for several applications where complex data needs to be processed.

Neural nets are referred to as deep if they possess a sufficiently large number of layers. Deep learning refers to training a deep neural net. Due to their complex network structure, deep neural nets can be used to approximate or to learn highly nonlinear functions [2]. They are firmly established in machine learning and are commonly used in computer vision, speech recognition and text processing. The convolutional neural net (CNN) is a type of deep neural net which is particularly well suited for image recognition [3]. CNNs perform a dimensional reduction of a high-dimensional input through convolution. Convolutional neural nets lead to superior performance even on large data sets [4]. Applied to autonomous driving, CNNs could in principle be used to learn all required road image features without any manual optimization or initialization.

Very recently, deep learning was introduced as an end-to-end framework for control policy search [3], [5], [6], [7]. Unlike in classification tasks, the approximated function is continuous and maps complex sensor readings such as camera frames to control actions. Deep learning based end-to-end systems improved the policy performance significantly in several robotic tasks such as grasping [5]. Since an explicit mapping of observations to control actions no longer has to be determined a priori in a design process, deep learning turns out to be very efficient for certain high-level control objectives [6].

Our goal is to demonstrate a deep learning based end-to-end system that uses deep neural nets to encode a control policy for steering a car. Instead of engineering each autopilot component separately, we combine all efforts within one framework. Hence, lane detection, object detection, path planning and controls are not considered explicitly.

Camera systems allow large amounts of information to be collected as frames. We use frames from a camera inside a vehicle as our control input, which is comparable to the visual information a human driver receives. The controller (a CNN) maps the camera frames to a steering angle. The high-level objective is lane following.

In summary, a deep learning based end-to-end controller could be a promising strategy for many problems related to autonomous driving. This paper intends to provide a basis and an overview for implementing a simple deep-learning system. Researchers associated with classical control techniques, specifically in autonomous vehicle research, are the main audience for this contribution. We describe a neural net architecture and how to train a deep neural net successfully for an end-to-end steering control system of autonomous vehicles. We use a car simulator as the testbed. We also compare different kinds of solvers for generating the parameter updates during the net training.

The remainder of this paper is organized as follows. Section II presents the concept of deep control policies.
Steering control algorithm design is carried out in Section III. Section IV presents simulation results. Finally, conclusions are drawn in Section V.

This work was supported by the German Academic Exchange Service (DAAD).
1 Authors are with the Department of Mechanical Engineering, University of California, Berkeley, USA
2 Authors are with the Institute of Mechanics and Ocean Engineering, Hamburg University of Technology, Germany

II. DEEP CONTROL POLICIES

A control policy can be regarded as a generalization of a control law, whereby the mapping from inputs to outputs is not necessarily an analytical functional relation. It can either be deterministic or stochastic and maps observations
[Figure: block diagram with the labels "deep neural net", "control signal", "plant (e.g. vehicle)" and "output (e.g. position)".]

III. ALGORITHM DESIGN

This section presents the main aspects of the deep neural net policy algorithm.

A. Neural Networks
at α = 0.01 to 0.001 and should drop down after a certain number of iterations [4]. The described optimization can be stopped as soon as the performance of the neural network on the training data set is satisfactory.

B. Implementation

The implementation environment consists of two main parts. The first part simulates the vehicle dynamics and generates the training data and the second part deals with the neural net algorithm.

1) Car simulation: For a first safe policy training step the simulation environment CARSIM is deployed, which is a commercial tool for simulating car behavior. CARSIM computes the vehicle dynamics based on the driver inputs such as steering or braking. The driving environment (road course, traffic, wind) is freely selectable. Screen captures and matching steering angles can be easily recorded via CARSIM and fed to a neural net.

2) Deep neural network framework: We use CAFFE to implement the deep neural net. It is a C++ based library with MATLAB and Python interfaces. Developed at UC Berkeley, CAFFE supports training, testing, and fine-tuning of neural nets and runs on CPUs and, for faster computation, on GPUs [9].

C. Data collection

In order to train a CNN we need labeled frames; therefore each road screen capture requires a matching steering angle. A human driver steers a car with a joystick wheel in a CARSIM simulation to provide the labeled data for the training step. Here, the considered scenario includes a two-lane road without traffic. We sample the simulation at 12 frames per second (FPS). More FPS would only lead to more similar frames without more relevant information [3]. The car drives at a constant speed of 60 km/h to further simplify the scenario. The steering angles vary between −45° and 40° and the frame resolution is 1912 × 1036 pixels. The steering angle used is the wheel steering angle

φ_w = R · φ_t,   (6)

where φ_t is the tire steering angle and R is the steering ratio [11]. The car model has a steering ratio of R ≈ 15.

After completing the data collection, we reject the improper frames by hand. Improper frames result from bad human driving behavior or graphic errors in the CARSIM simulation.

Figure 2 shows one sample screen capture used during the training phase. Every pixel has a value between 0 and 255. As is common practice, we normalize each pixel to values from 0 to 1 and the steering angles to values from −1 to 1. As a result the training loss function will converge faster. If the steering angles are not scaled, the loss could become too large and might impair the convergence of the training loss function. The larger the frames, the more computational power is required for training the net. Therefore, we scale the frame size to 190 × 100 pixels. In order to solve the regression task we generate a hierarchical data format (HDF5) file which includes the scaled and normalized frames and steering angles. This file is the input for the training process in CAFFE. We use a 15 minute driving video to train the net, resulting in 10,800 training frames.

Fig. 2: An exemplary screen capture of a right curve, generated with CARSIM. The steering wheel angle is −10.17°.

D. Network structure

The structure of the neural net determines the quality of the results. However, there is more than one appropriate net structure. Each task requires a specific net structure. Finding a good net is an iterative process. Some net structures do not lead to convergence of the loss function; others lead to convergence but the performance on unknown test data is still bad. We use a CNN with four hidden layers: three convolutional layers and one fully connected layer. The input data has the size 190×100×3 and we use a batch size of 128. The first two convolutional layers have a kernel size of 5 × 5. The first one has 20 feature maps as output and the second one 48. They are used to extract the features of the road frames. After each of the first two convolutional layers we use a pooling layer with a 2 × 2 kernel to downscale the feature maps. The third convolutional layer has a 3 × 3 kernel size. The fully connected layer has 500 outputs and the last layer has the steering angle as an output. After every layer we use dropout, which is a regularization technique to prevent the neural net from overfitting [12]. In order to achieve faster learning we use the rectified linear unit (ReLU) instead of the sigmoid function as an activation function [13]. Figure 3 shows the described neural net structure.

IV. RESULTS

This section presents results for the previously described case study. We compare the deep neural net controller performance with a human driver on a novel data set and analyse different solvers.

A. Training of the neural net

We train the net on a CPU, which can be a long process. It can be accelerated by exploiting GPUs. There is no exact rule for when to stop the training phase. As soon as the loss converges and does not decrease anymore, the net is either trained well enough or the net structure is not suitable for the task at hand. If the net is already sufficiently trained and the training process is not stopped, overfitting can
occur even if dropout is used. Figure 4 shows a sample loss curve over the training iterations. Note that after the 5,000th iteration the loss does not decrease noticeably, so the training process should be stopped.

for non-convex problems [15]. Figure 5 shows the Euclidean loss between the predicted and labeled steering angles, scaled to the range of −1 to 1, for the different solvers. The three loss curves are Gaussian distributed and quite similar, but each solver reaches a different final loss. Table I shows the converged mean loss of each solver. The Adam solver achieves the lowest loss after 8,000 iterations with 0.0012.

Layer                  Output size     Kernel
Input                  3@100 x 190     -
Convolutional 1        20@96 x 186     5x5
Pooling 1              20@48 x 93      2x2
Convolutional 2        48@44 x 89      5x5
Pooling 2              48@22 x 45      2x2
Convolutional 3        64@20 x 43      3x3
Fully-connected layer  500             -
Output                 steering angle  -

Fig. 3: The net structure of the designed neural net. The net consists of three convolutional layers, two pooling layers and one fully connected layer. We also use dropout to prevent overfitting and the ReLU activation function.
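The feature-map sizes listed in Fig. 3 can be reproduced from the kernel sizes with a short calculation. The following is a minimal sketch, not the authors' code: the function names are hypothetical, and the stride of 1 for convolutions, the stride of 2 for pooling and the absence of padding are assumptions, since the text only states the kernel sizes. Pooling output sizes are rounded up, which is the convention of CAFFE's pooling layer; convolution output sizes are rounded down.

```python
import math


def conv_out(size, kernel, stride=1, pad=0):
    """Output length of a convolution along one axis (rounded down)."""
    return (size + 2 * pad - kernel) // stride + 1


def pool_out(size, kernel, stride=2, pad=0):
    """Output length of a pooling layer along one axis (rounded up)."""
    return math.ceil((size + 2 * pad - kernel) / stride) + 1


# (layer name, layer type, kernel size, number of output feature maps)
# Kernel sizes and map counts are taken from the text and Fig. 3;
# the strides used by conv_out and pool_out are assumptions.
LAYERS = [
    ("Convolutional 1", "conv", 5, 20),
    ("Pooling 1", "pool", 2, 20),
    ("Convolutional 2", "conv", 5, 48),
    ("Pooling 2", "pool", 2, 48),
    ("Convolutional 3", "conv", 3, 64),
]


def feature_map_shapes(height=100, width=190, channels=3):
    """Return (name, maps, height, width) for the input and every layer."""
    shapes = [("Input", channels, height, width)]
    for name, kind, kernel, maps in LAYERS:
        if kind == "conv":
            height = conv_out(height, kernel)
            width = conv_out(width, kernel)
        else:
            height = pool_out(height, kernel)
            width = pool_out(width, kernel)
        shapes.append((name, maps, height, width))
    return shapes


if __name__ == "__main__":
    for name, maps, h, w in feature_map_shapes():
        print(f"{name:16s} {maps}@{h} x {w}")
```

Under these assumptions the width of 45 after the second pooling layer is only obtained with rounding up, because (89 − 2) / 2 is not an integer; rounding down would give 44.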
Fig. 4: The Euclidean loss between the predicted and labeled steering angle over the SGD iteration steps. [Plot axes: iteration (·10^4) on the horizontal axis, scaled Euclidean loss (E) on the vertical axis.]
[Second plot: scaled Euclidean loss (E) (·10^−2) on the vertical axis; legend: NAG, Adam, SGD.]
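The text gives no exact rule for stopping the training, only the observation that training should stop once the loss no longer decreases noticeably. One possible way to turn this into a check is sketched below. It is an illustration under stated assumptions, not the procedure used in the paper: the function names, the window length and the tolerance are hypothetical, and the loss curve in the example is synthetic and unrelated to the data behind Fig. 4.

```python
import math


def has_plateaued(losses, window=500, rel_tol=0.01):
    """Return True if the mean loss over the last `window` iterations improved
    by less than `rel_tol` (relative) compared with the window before it."""
    if len(losses) < 2 * window:
        return False  # not enough history for two full windows
    previous = sum(losses[-2 * window:-window]) / window
    recent = sum(losses[-window:]) / window
    if previous <= 0.0:
        return True  # nothing left to improve
    return (previous - recent) / previous < rel_tol


def first_plateau_iteration(losses, window=500, rel_tol=0.01, check_every=100):
    """Return the first checked iteration at which the loss has plateaued,
    or None if it never does within the recorded curve."""
    for iteration in range(check_every, len(losses) + 1, check_every):
        if has_plateaued(losses[:iteration], window, rel_tol):
            return iteration
    return None


def synthetic_loss(num_iters=14000):
    """Synthetic, exponentially decaying loss curve used only for illustration."""
    return [0.03 * math.exp(-i / 1000.0) + 0.002 for i in range(num_iters)]


if __name__ == "__main__":
    stop_at = first_plateau_iteration(synthetic_loss())
    print("suggested stopping iteration:", stop_at)
```

Comparing window averages rather than single iterations keeps the check robust against the noise of mini-batch training. Such a check cannot tell a well-trained net from an unsuitable net structure, so the converged loss value still has to be judged separately.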
only 4,000 iterations and a training data set consisting of 10,800 frames. Similar to the loss curves, the steering angles are in agreement.

Table II shows the mean loss and standard deviations of the results. Although the Adam solver has the best loss curve, the NAG-trained net shows the best performance on the validation data set, i.e. on unknown road images. Although the trained nets perform well on unknown data, the mean loss on the validation data is higher than on the training data by one order of magnitude. However, this is expected and a typical machine learning phenomenon, because the average fit of a model is typically worse for novel, unseen data than for the data used for training.

[Figure panels: (a) 6 out of 20 feature maps of the first convolutional layer. (b) 6 out of 48 feature maps of the second convolutional layer.]
[Plot: scaled steering angle; legend: Human driver, NAG, Adam, SGD.]
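The mean loss and standard deviation reported for the validation data can be computed from the scaled predicted and labeled steering angles. The sketch below assumes a per-frame loss of one half of the squared error, which follows the convention of CAFFE's Euclidean loss layer; whether the paper uses exactly this per-frame definition is an assumption. The function names are hypothetical and the sample values are made up for illustration.

```python
import statistics


def per_frame_losses(predicted, labeled):
    """Half squared error per frame for scaled steering angles in [-1, 1]."""
    if len(predicted) != len(labeled):
        raise ValueError("predicted and labeled must have the same length")
    return [0.5 * (p - y) ** 2 for p, y in zip(predicted, labeled)]


def loss_statistics(predicted, labeled):
    """Return (mean loss, population standard deviation of the loss)."""
    losses = per_frame_losses(predicted, labeled)
    return statistics.mean(losses), statistics.pstdev(losses)


if __name__ == "__main__":
    # Made-up scaled steering angles, not results from the paper.
    labeled = [0.00, 0.10, -0.20, 0.35]
    predicted = [0.02, 0.08, -0.25, 0.30]
    mean_loss, std_loss = loss_statistics(predicted, labeled)
    print(f"mean loss: {mean_loss:.6f}, standard deviation: {std_loss:.6f}")
```

Running the same function on the training frames and on the validation frames gives the two numbers whose gap the text describes.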
raw sensor data and high level objectives well, there are no
formal guarantees regarding stability or convergence. The
authors are convinced that a combination of classical control
theory and deep learning will result in a new generation of
controllers.
ACKNOWLEDGMENT
The financial support received from the German Academic Exchange Service (DAAD) is gratefully acknowledged. Thanks go, among others, to Ziya Ercan from the Department of Mechanical Engineering, University of California, Berkeley, USA, Gregory Kahn from the Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, USA, and Wenshuo Wang from the Mechanical Engineering Department, Beijing Institute of Technology, Beijing, China, for their technical support and helpful comments.
REFERENCES
[1] S. Singh, “Critical Reasons for Crashes Investigated in the National
Motor Vehicle Crash Causation Survey,” NHTSA's National Center for
Statistics and Analysis, Tech. Rep., 2015.
[2] T. Liu, S. Fang, Y. Zhao, P. Wang, and J. Zhang, “Implementation
of Training Convolutional Neural Networks,” arXiv preprint
arXiv:1506.01195, 2015.
[3] M. Bojarski, D. Del Testa, D. Dworakowski, B. Firner, B. Flepp,
P. Goyal, L. D. Jackel, M. Monfort, U. Muller, J. Zhang, et al.,
“End to End Learning for Self-Driving Cars,” arXiv preprint
arXiv:1604.07316, 2016.
[4] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet Classification
with Deep Convolutional Neural Networks,” in Advances in
Neural Information Processing Systems, 2012, pp. 1097–1105.
[5] S. Levine, C. Finn, T. Darrell, and P. Abbeel, “End-to-end training
of deep visuomotor policies,” Journal of Machine Learning Research,
vol. 17, no. 39, pp. 1–40, 2016.
[6] T. Zhang, G. Kahn, S. Levine, and P. Abbeel, “Learning Deep Control
Policies for Autonomous Aerial Vehicles with MPC-Guided Policy
Search,” arXiv preprint arXiv:1509.06791, 2015.
[7] V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. P. Lillicrap, T. Harley,
D. Silver, and K. Kavukcuoglu, “Asynchronous Methods for Deep
Reinforcement Learning,” in International Conference on Machine
Learning, 2016.
[8] Net-Scale Technologies, Inc. (2016, August) Autonomous Off-Road
Vehicle Control Using End-to-End Learning. Final technical report.
[Online]. Available: http://net-scale.com/doc/net-scale-dave-report.pdf
[9] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick,
S. Guadarrama, and T. Darrell, “Caffe: Convolutional Architecture for
Fast Feature Embedding,” arXiv preprint arXiv:1408.5093, 2014.
[10] L. Bottou, “Stochastic gradient descent tricks,” in Neural Networks:
Tricks of the Trade. Springer, 2012, pp. 421–436.
[11] R. Rajamani, Vehicle Dynamics and Control. Springer Science &
Business Media, 2011.
[12] N. Srivastava, G. E. Hinton, A. Krizhevsky, I. Sutskever, and
R. Salakhutdinov, “Dropout: A Simple Way to Prevent Neural Networks
from Overfitting,” Journal of Machine Learning Research,
vol. 15, no. 1, pp. 1929–1958, 2014.
[13] A. Radford, L. Metz, and S. Chintala, “Unsupervised Representation
Learning with Deep Convolutional Generative Adversarial Networks,”
arXiv preprint arXiv:1511.06434, 2015.
[14] D. Kingma and J. Ba, “Adam: A Method for Stochastic Optimization,”
arXiv preprint arXiv:1412.6980, 2014.
[15] I. Sutskever, J. Martens, G. E. Dahl, and G. E. Hinton, “On the
importance of initialization and momentum in deep learning,” ICML
(3), vol. 28, pp. 1139–1147, 2013.