
Integrating Deep Reinforcement Learning with Model-based Path Planners for Automated Driving

Ekim Yurtsever*,†, Linda Capito*,†, Keith Redmill* and Umit Ozguner*

*E. Yurtsever, L. Capito, K. Redmill and U. Ozguner are with The Ohio State University, Ohio, US.
†These authors contributed equally to this work.
Corresponding author: Ekim Yurtsever, [email protected]
1 https://github.com/Ekim-Yurtsever/Hybrid-DeepRL-Automated-Driving

Abstract— Automated driving in urban settings is challenging. Human participant behavior is difficult to model, and conventional, rule-based Automated Driving Systems (ADSs) tend to fail when they face unmodeled dynamics. On the other hand, the more recent, end-to-end Deep Reinforcement Learning (DRL) based model-free ADSs have shown promising results. However, pure learning-based approaches lack the hard-coded safety measures of model-based controllers. Here we propose a hybrid approach for integrating a path planning pipe into a vision-based DRL framework to alleviate the shortcomings of both worlds. In summary, the DRL agent is trained to follow the path planner's waypoints as closely as possible. The agent learns this policy by interacting with the environment. The reward function contains two major terms: the penalty of straying away from the path planner and the penalty of having a collision. The latter has precedence in the form of having a significantly greater numerical value. Experimental results show that the proposed method can plan its path and navigate between randomly chosen origin-destination points in CARLA, a dynamic urban simulation environment. Our code is open-source and available online1.

Fig. 1. An overview of our framework. FC stands for Fully Connected layers. The proposed system is a hybrid of a model-based planner and a model-free DRL agent. *Other sensor inputs can be anything the conventional pipe needs. **We integrate planning into the DRL agent by adding 'distance to the closest waypoint' into our state-space, where the path planner gives the closest waypoint. Any kind of path planner can be integrated into the DRL agent with the proposed method.

I. INTRODUCTION

Automated Driving Systems (ADSs) promise a decisive answer to the ever-increasing transportation demands. However, widespread deployment is not on the horizon, as the state of the art is not robust enough for urban driving. The recent Uber accident [1] is an unfortunate precursor: the technology is not ready yet.

There are two common ADS design choices [2]. The first one is the more conventional, model-based, modular pipeline approach [3]–[10]. A typical pipe starts with a perception module. The robustness of perception modules has increased greatly due to the recent advent of deep Convolutional Neural Networks (CNNs) [11]. The pipe usually continues with scene understanding [12], assessment [13], and planning [14], and finally ends with motor control. The major shortcomings of modular model-based planners can be summarized as complexity, error propagation, and lack of generalization outside pre-postulated model dynamics.

The alternative end-to-end approaches [15]–[24] eliminate the complexity of conventional modular systems. With the recent developments in the machine learning field, sensory inputs can now be mapped directly to an action space. Deep Reinforcement Learning (DRL) based frameworks can learn to drive from front-facing monocular camera images directly [21]. However, the lack of hard-coded safety measures, interpretability, and direct control over path constraints limits the usefulness of these methods.

We propose a hybrid methodology to mitigate the drawbacks of both approaches. In summary, the proposed method integrates a short pipeline of localization and path planning modules into a DRL driving agent. The training goal is to teach the DRL agent to oversee the planner and follow it if it is safe to follow. The proposed method was implemented with a Deep Q Network (DQN) [25] based RL agent and the A* [26] path planner. First, the localization module outputs the ego-vehicle position. With a given destination point, the path planner uses the A* algorithm [26] to generate a set of waypoints. The distance to the closest waypoint, along with monocular camera images and ego-vehicle dynamics, is then fed into the DQN based RL agent to select discretized steering and acceleration actions. During training, the driving agent is penalized asymmetrically for making collisions and for being far from the closest waypoint, with the former term having precedence. We believe this can make the agent prone to follow waypoints during free driving but have enough flexibility to stray from the path for collision avoidance using visual cues. An overview of the proposed approach is shown in Figure 1.

The major contributions of this work can be summarized as follows:

• A general framework for integrating path planners into model-free DRL based driving agents
• Implementation of the proposed method with an A* planner and a DQN RL agent. Our code is open-source and available online1.

The remainder of the paper is organized in five sections. A brief literature survey is given in Section II. Section III explains the proposed methodology and is followed by experimental details in Section IV. Results are discussed in Section V and a short conclusion is given in Section VI.
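To make the interface between the planner and the learning agent described above concrete, the following minimal Python sketch shows how a planned waypoint list can be reduced to the single scalar observation d that is appended to the agent's state. The function name and array layout are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

def distance_to_closest_waypoint(ego_xy, waypoints_xy):
    """Reduce a planned path to the scalar observation d used by the DRL agent.

    ego_xy       : (2,) array-like, ego-vehicle position from the localization module.
    waypoints_xy : (N, 2) array-like, waypoints produced by the A* path planner.
    Returns the Euclidean distance (in meters) to the closest waypoint.
    """
    diffs = np.asarray(waypoints_xy, dtype=float) - np.asarray(ego_xy, dtype=float)
    return float(np.min(np.linalg.norm(diffs, axis=1)))

# Toy usage: ego at the origin, three waypoints along the planned path.
d = distance_to_closest_waypoint([0.0, 0.0], [[2.0, 0.0], [10.0, 0.0], [18.0, 0.0]])
# d == 2.0; this value is fed into the state together with camera features and speed.
```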

II. RELATED WORKS


End-to-end driving systems use a single algorithm/module
to map sensory inputs to an action space. ALVINN [16] was
the first end-to-end driving system and utilized a shallow,
fully connected neural network to map image and laser
range inputs to a discretized direction space. The network
was trained in a supervised fashion with labeled simulation
data. More recent studies employed real-world driving data
and used convolutional layers to increase performance [18].
However, real-world urban driving has not been realized with
an end-to-end system yet.
A CNN based partial end-to-end system was introduced to map the image space to a finite set of intermediary "affordance indicators" [15]. A simple controller logic was then used to generate driving actions from these affordance indicators. ChauffeurNet [27] is another example of a mid-to-mid system. These systems benefit from robust perception modules on one end, and rule-based controllers with hard-coded safety measures on the other end.

All the methods mentioned above suffer from the shortcomings of supervised learning, namely a significant dependency on labeled data, overfitting, and lack of interpretability. Deep Reinforcement Learning (DRL) based automated driving agents [20], [21] replaced the need for huge amounts of labeled data with online interaction. DRL agents try to learn the optimum way of driving instead of imitating a target human driver. However, the need for interaction raises a significant issue. Since failures cannot be tolerated for safety-critical applications, in almost all cases the agent must be trained in a virtual environment. This adds the additional virtual-to-real transfer learning problem to the task. In addition, DRL still suffers from a lack of interpretability and hard-coded safety measures.

A very recent study [28] focused on general tactical decision making for automated driving using the AlphaGo Zero algorithm [29]. AlphaGo Zero combines tree search with neural networks in a reinforcement learning framework, and its implementation in the automated driving domain is promising. However, this study [28] was limited to only high-level tactical driving actions such as staying in a lane or making a lane change.

Against this backdrop, here we propose a hybrid DRL-based driving automation framework. The primary motivation is to integrate path planning into DRL frameworks for achieving a more robust driving experience and a faster learning process.

III. PROPOSED METHOD

A. Problem formulation

In this study, automated driving is defined as a Markov Decision Process (MDP) with the tuple (S, A, P, r). We integrate path planning into the MDP by adding d, the distance to the closest waypoint, to the state-space.

Fig. 2. Illustration of state s_t ≐ (z_t, e_t, d_t) and distance to the final destination l_t at time t. Waypoints w ∈ W are to be obtained from the path planner.

S: A set of states. We associate observations made at time t with the state s_t as s_t ≐ (z_t, e_t, d_t), where: 1) z_t = f_cnn(I_t) is a visual feature vector which is extracted using a deep CNN from a single image I_t captured by a front-facing monocular camera; 2) e_t is a vector of ego-vehicle states including speed and location; 3) d_t is the distance to the closest waypoint obtained from the model-based path planner. d_t is the key observation which links model-based path planners to the MDP.

A: A set of discrete driving actions, illustrated in Figure 3. Actions consist of discretized steering angle and acceleration values. The agent executes actions to change states.
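A minimal sketch of how the state tuple s_t ≐ (z_t, e_t, d_t) and the discrete action set could be assembled is given below. The steering/acceleration grid and the CNN feature extractor are not specified in the text, so the values and names here are assumptions for illustration only.

```python
import numpy as np
from itertools import product

# Hypothetical discretization: the paper uses discrete steering/acceleration
# values (Fig. 3) but does not list them; this grid is only an example.
STEERING = [-0.5, -0.25, 0.0, 0.25, 0.5]   # normalized steering angles
ACCEL    = [-1.0, 0.0, 1.0]                # brake / coast / throttle
ACTIONS  = list(product(STEERING, ACCEL))  # |A| = 15 discrete actions

def make_state(image, speed, location, d, cnn_features):
    """Assemble s_t = (z_t, e_t, d_t) as a flat feature vector.

    cnn_features : callable implementing z_t = f_cnn(I_t).
    d            : distance to the closest A* waypoint (links planner and MDP).
    """
    z_t = np.asarray(cnn_features(image), dtype=np.float32)   # visual features
    e_t = np.array([speed, *location], dtype=np.float32)      # ego-vehicle states
    d_t = np.float32(d)
    return np.concatenate([z_t, e_t, [d_t]])
```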
P: The transition probability P_t = Pr(s_{t+1} | s_t, a_t), which is the probability of reaching state s_{t+1} after executing action a_t in state s_t.

r: A reward function r(s_{t+1}, s_t, a_t), which gives the instant reward of going from state s_t to s_{t+1} with a_t.

The goal is to find a policy function π(s_t) = a_t that will select an action given a state such that it maximizes the following expectation of cumulative future rewards, where s_{t+1} is taken from P_t:

\mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^{t} \, r(s_t, s_{t+1}, a_t)\right] \tag{1}

where γ is the discount factor, a scalar with 0 ≤ γ ≤ 1 that determines the relative importance of later rewards with respect to earlier rewards. In practice, we fix the horizon of this expectation to a finite value.

Our problem formulation is similar to a previous study [21], the critical difference being the addition of d_t to the state space and the reward function. An illustration of our formulation is shown in Figure 2.
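As a small worked example of Eq. (1), the sketch below computes the discounted return of a short episode; the reward values are arbitrary toy numbers, not ones taken from the paper.

```python
def discounted_return(rewards, gamma=0.99):
    """Cumulative discounted reward of Eq. (1) over a finite horizon."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

# A toy 3-step episode: two on-path steps followed by a collision penalty.
print(discounted_return([1.0, 1.0, -1.0], gamma=0.9))  # 1.0 + 0.9 - 0.81 = 1.09
```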

Fig. 3. The DQN based DRL agent. FC stands for fully connected. After training, the agent selects the best action by taking the argmax of predicted Q values.

B. Reinforcement Learning

Reinforcement learning is an umbrella term for a large number of algorithms derived for solving Markov Decision Problems (MDPs) [21].

In our framework, the objective of reinforcement learning is to train a driving agent that can execute 'good' actions so that the new state and the possible state transitions until a finite expectation horizon will yield a high cumulative reward. The overall goal is quite straightforward for driving: not making collisions and reaching the destination should yield a good reward, and vice versa. It must be noted that RL frameworks are not greedy unless γ = 0. In other words, when an action is chosen, not only the immediate reward but the cumulative rewards of all the expected future state transitions are considered.

Here we employ DQN [25] to solve the MDP problem described above. The main idea of DQN is to use neural networks to approximate the optimal action-value function Q(s, a). This Q function maps the state-action space to the reals, Q: S × A → ℝ, while maximizing equation 1. The problem comes down to approximating, or learning, this Q function. The following loss function is used for Q-learning at iteration i:

L_i(\theta_i) = \mathbb{E}_{(s,a,r)}\left[\left(r + \gamma \max_{a_{t+1}} Q^{\theta_i^{-}}(s_{t+1}, a_{t+1}) - Q^{\theta_i}(s_t, a_t)\right)^{2}\right] \tag{2}

where Q-learning updates are applied on samples (s, a, r) ∼ U(D), and U(D) draws random samples from the data batch D. θ_i are the Q-network parameters and θ_i^- are the target network parameters at iteration i. Details of DQN can be found in [25].
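The loss in Eq. (2) can be rendered as a short sketch. The paper's implementation builds on the framework in [31]; the PyTorch fragment below is only an illustration under assumed tensor shapes, and the terminal-state mask is a standard DQN detail that Eq. (2) does not write out.

```python
import torch
import torch.nn.functional as F

def dqn_loss(q_net, target_net, batch, gamma=0.99):
    """Eq. (2): (r + gamma * max_a' Q_target(s', a') - Q(s, a))^2 on a sampled batch."""
    s, a, r, s_next, done = batch          # tensors drawn uniformly from the data batch D
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)      # Q^{theta_i}(s_t, a_t)
    with torch.no_grad():                                     # theta_i^- stays frozen
        q_next = target_net(s_next).max(dim=1).values
        target = r + gamma * (1.0 - done) * q_next            # done: 1.0 on terminal steps
    return F.mse_loss(q_sa, target)
```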
C. Integrating path planning into model-free DRL frameworks

The main contribution of this work is the integration of path planning into DRL frameworks. We achieve this by modifying the state-space with the addition of d. Also, the reward function is changed to include a new reward term r_w, which rewards being close to the nearest waypoint obtained from the model-based path planner, i.e. a small d. Utilizing waypoints to evaluate a DRL framework was suggested in a very recent work [30], but their approach does not consider integrating the waypoint generator into the model.

The proposed reward function is as follows:

r = \beta_c r_c + \beta_v r_v + \beta_l r_l + \beta_w r_w \tag{3}

where r_c is the no-collision reward, r_v is the not-driving-very-slowly reward, r_l is the being-close-to-the-destination reward, and r_w is the proposed being-close-to-the-nearest-waypoint reward. The distance to the nearest waypoint d is shown in Figure 2. The weights of these rewards, β_c, β_v, β_l, β_w, are parameters defining the relative importance of the reward terms. These parameters are determined heuristically. In the special case of β_c = β_v = β_l = 0, the integrated model should mimic the model-based planner.

Please note that any planner, from the naive A* to more complicated algorithms with complete obstacle avoidance capabilities, can be integrated into this framework as long as it provides a waypoint.
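A minimal sketch of the weighted reward in Eq. (3) follows. The β values are placeholders, since the paper states they were set heuristically without listing them; the call at the end illustrates the special case β_c = β_v = β_l = 0.

```python
def combined_reward(r_c, r_v, r_l, r_w,
                    beta_c=1.0, beta_v=1.0, beta_l=1.0, beta_w=1.0):
    """Weighted reward of Eq. (3); the beta weights here are illustrative placeholders."""
    return beta_c * r_c + beta_v * r_v + beta_l * r_l + beta_w * r_w

# Special case beta_c = beta_v = beta_l = 0: only the waypoint term survives,
# so the integrated agent should mimic the model-based planner.
waypoint_only = combined_reward(r_c=0.0, r_v=0.2, r_l=0.1, r_w=0.75,
                                beta_c=0.0, beta_v=0.0, beta_l=0.0)
```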
IV. EXPERIMENTS

As in all RL frameworks, the agent needs to interact with the environment and fail a lot to learn the desired policies. This makes training RL driving agents in the real world extremely challenging, as failed attempts cannot be tolerated. As such, we focused only on simulations in this study. Real-world adaptation is outside the scope of this work.

The proposed method was implemented in Python based on an open-source RL framework [31], and CARLA [32] was used as the simulation environment. The commonly used A* algorithm [26] was employed as the model-based path planner, and the recently proposed DQN [25] was chosen as the model-free DRL.

Fig. 4. The experimental process: I. A random origin-destination pair was selected. II. The A* algorithm was used to generate a path. III. The hybrid DRL agent starts to take action with the incoming state stream. IV. The end of the episode.

A. Details of the reward function

The general form of r was given in the previous section in equation 3. Here, the special case and the numerical values used throughout the experiments are explained.

r = \begin{cases} r_v + r_l + r_w, & r_c = 0 \text{ and } l \geq \epsilon \\ 100, & r_c = 0 \text{ and } l < \epsilon \\ r_c, & r_c \neq 0 \end{cases} \tag{4}

r_c = \begin{cases} 0, & \text{no collision} \\ -1, & \text{there is a collision} \end{cases} \tag{5}

r_v = \frac{v}{v_0} - 1 \tag{6}

r_l = 1 - \frac{l}{l_{\text{previous}}} \tag{7}

r_w = 1 - \frac{d}{d_0} \tag{8}

where ε = 5 m, the desired speed v_0 = 50 km/h, and d_0 = 8 m. In summary, r_w rewards keeping a distance of less than d_0 to the closest waypoint at every time step, and r_l rewards decreasing l over l_previous, the distance to the destination in the previous time step. The last term of r_l allows the agent to be continuously penalized or rewarded for getting further from, or closer to, the final destination.

If there is a collision, the episode is over and the reward is a penalty equal to −1. If the vehicle reaches its destination (l < ε), a reward of 100 is sent back. Otherwise, the reward consists of the sum of the other terms. d_0 was selected as 8 m because the average distance between the waypoints of the A* planner equals this value.
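The piecewise reward of Eqs. (4)-(8) with the stated constants can be sketched as follows. The function signature and the assumption that v is given in km/h and distances in meters are illustrative choices, not details confirmed by the paper.

```python
EPSILON = 5.0    # m, destination-reached threshold
V0      = 50.0   # km/h, desired speed
D0      = 8.0    # m, average spacing of the A* waypoints

def step_reward(collision, v, l, l_previous, d):
    """Reward of Eqs. (4)-(8); v in km/h, l, l_previous, d in meters (assumed units)."""
    r_c = -1.0 if collision else 0.0          # Eq. (5)
    if r_c != 0.0:
        return r_c                            # collision ends the episode
    if l < EPSILON:
        return 100.0                          # destination reached
    r_v = v / V0 - 1.0                        # Eq. (6): penalize driving very slowly
    r_l = 1.0 - l / l_previous                # Eq. (7): reward approaching the goal
    r_w = 1.0 - d / D0                        # Eq. (8): reward staying near the path
    return r_v + r_l + r_w                    # Eq. (4), first case
```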

B. DQN architecture and hyperparameters

The deep neural network architecture employed in the DQN is shown in Figure 3. The CNN consisted of three identical convolutional layers with 64 filters and a 3 × 3 window. Each convolutional layer was followed by average pooling. After flattening, the output of the final convolutional layer, the ego-vehicle speed, and the distance to the closest waypoint were concatenated and fed into a stack of two fully connected layers with 256 hidden units. All but the last layer had rectifier activation functions. The final layer had a linear activation function and outputted the predicted Q values, which were used to choose the optimum action by taking the argmax.
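A PyTorch rendering of this description is sketched below. The input resolution, pooling window, and number of actions are not given in the text, so those values are placeholders; the original implementation builds on [31] rather than on this exact module.

```python
import torch
import torch.nn as nn

class HybridDQN(nn.Module):
    """Three 64-filter 3x3 conv layers with average pooling; the flattened features
    are concatenated with ego speed and waypoint distance and passed through a
    256-unit fully connected layer to a linear Q-value output."""

    def __init__(self, n_actions=15, in_channels=3):    # n_actions is a placeholder
        super().__init__()
        conv = []
        for i in range(3):
            conv += [nn.Conv2d(in_channels if i == 0 else 64, 64, kernel_size=3),
                     nn.ReLU(),
                     nn.AvgPool2d(2)]                    # pooling window assumed
        self.cnn = nn.Sequential(*conv, nn.Flatten())
        self.fc1 = nn.LazyLinear(256)                    # 256-unit FC layer, ReLU
        self.q_out = nn.Linear(256, n_actions)           # final linear layer -> Q values

    def forward(self, image, speed, waypoint_dist):
        z = self.cnn(image)                              # flattened visual features
        x = torch.cat([z, speed, waypoint_dist], dim=1)  # speed, waypoint_dist: (B, 1)
        x = torch.relu(self.fc1(x))
        return self.q_out(x)

# Greedy action after training: q_values.argmax(dim=1)
```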
C. Experimental process & training

The experimental process is shown in Figure 4. The following steps were carried out repeatedly until the agent learned to drive (a minimal training-loop sketch follows the list).

1) Select two random points on the map as an origin-destination pair for each episode.
2) Use the A* path planner to generate a path between origin and destination using the road topology graph of CARLA.
3) Start feeding the stream of states, including the distance to the closest waypoint, into the DRL agent. The DRL agent starts to take actions at this point. If this is the first episode, initialize the DQN with random weights.
4) End the episode if a collision is detected or the goal is reached.
5) Update the weights of the DQN after each episode with the loss function given in equation 2.
6) Repeat the above steps sixty thousand times.
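The sketch below mirrors steps 1)-6). The environment and agent interfaces (sample_random_od_pair, astar_path, act, store, update) are hypothetical stand-ins for the CARLA-based implementation rather than its actual API.

```python
def train(env, agent, episodes=60_000):
    """Minimal rendering of the training loop in steps 1)-6) under assumed interfaces."""
    for episode in range(episodes):
        origin, destination = env.sample_random_od_pair()        # step 1
        waypoints = env.astar_path(origin, destination)          # step 2
        state = env.reset(origin, waypoints)                     # step 3: start state stream
        done = False
        while not done:
            action = agent.act(state)                            # epsilon-greedy on predicted Q
            next_state, reward, done = env.step(action)          # step 4: ends on collision or goal
            agent.store(state, action, reward, next_state, done)
            state = next_state
        agent.update()                                           # step 5: minimize the loss in Eq. (2)
    # step 6: the outer loop repeats for sixty thousand episodes
```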
D. Comparison and evaluation

The proposed hybrid approach was compared against a complete end-to-end DQN agent. The complete end-to-end agent took only monocular camera images and ego-vehicle speed as input. The same network architecture was employed for both methods.

A human driving experiment was also conducted to serve as a baseline. The same reward function that was used to train the DRL agent was used as the evaluation metric. Four adults aged between 25 and 30 years old participated in the experiments. The participants drove a virtual car in CARLA using a keyboard and were told to follow the on-screen path (marked by a green line). The participants did not see their scores. Every participant drove each of the seven predefined routes five times. The average cumulative reward of each route was accepted as the "average human score."

V. RESULTS

Figure 5 illustrates the training process. The result is clear and evident: the proposed hybrid approach learned to drive much faster than its complete end-to-end counterpart. It should be noted that the proposed approach made a quick jump at the beginning of the training. We believe the waypoints acted as a 'guide' and made the algorithm learn faster that way. Our method can also be used to speed up the training process of a complete end-to-end variant with transfer learning. Qualitative analysis of the driving performance can be done by watching the simulation videos on our repository1.

Fig. 5. Normalized reward versus episode number. The proposed hybrid approach learned to drive faster than its complete end-to-end counterpart.

The proposed method outperformed the end-to-end DQN; however, it is still not as good as the average human driver, as can be seen in Table I.

TABLE I. AVERAGE REWARD SCORES FOR FIVE RUNS IN EACH ROUTE TYPE.

Route type                    Hybrid-DQN   Human average
Straight (highway)                  21.1            43.4
Straight (urban)                    27.6            38.1
Straight (under bridge)             31.6            45.2
Slight curve                        30.4            49.5
Sharp curve                        -74.4            -8.9
Right turn in intersection        -136.9           -12.1
Left turn in intersection         -385.9           -25.5

Even though promising results were obtained, the experiments at this stage can only be considered a proof of concept rather than an exhaustive evaluation. The proposed method needs to consider other integration options, be compared against other state-of-the-art agents, and eventually should be deployed to the real world and tested there.

The model-based path planner tested here is also very naive. In addition, the obstacle avoidance capabilities of the proposed method were not evaluated. Future experiments should focus on this aspect. The integration of more complete path planners with full obstacle avoidance capabilities can yield better results.
VI. CONCLUSIONS

In this study, a novel hybrid approach for integrating path planning into model-free DRL frameworks was proposed. A proof-of-concept implementation and experiments in a virtual environment showed that the proposed method is capable of learning to drive.

The proposed integration strategy is not limited to path planning. Potentially, the same state-space modification and reward strategy can be applied for integrating vehicle control and trajectory planning modules into model-free DRL agents.

Finally, the current implementation was limited to outputting only discretized actions. Future work will focus on enabling continuous control and real-world testing.

ACKNOWLEDGMENT

This work was funded by the United States Department of Transportation under award number 69A3551747111 for Mobility21: the National University Transportation Center for Improving Mobility.

Any findings, conclusions, or recommendations expressed herein are those of the authors and do not necessarily reflect the views of the United States Department of Transportation, Carnegie Mellon University, or The Ohio State University.

REFERENCES

[1] P. Kohli and A. Chadha, "Enabling pedestrian safety using computer vision techniques: A case study of the 2018 Uber Inc. self-driving car crash," in Future of Information and Communication Conference. Springer, 2019, pp. 261–279.
[2] E. Yurtsever, J. Lambert, A. Carballo, and K. Takeda, "A survey of autonomous driving: Common practices and emerging technologies," IEEE Access, vol. 8, pp. 58443–58469, 2020.
[3] C. Urmson, J. Anhalt, D. Bagnell, C. Baker, R. Bittner, M. Clark, J. Dolan, D. Duggins, T. Galatali, C. Geyer, et al., "Autonomous driving in urban environments: Boss and the urban challenge," Journal of Field Robotics, vol. 25, no. 8, pp. 425–466, 2008.
[4] J. Levinson, J. Askeland, J. Becker, J. Dolson, D. Held, S. Kammel, J. Z. Kolter, D. Langer, O. Pink, V. Pratt, et al., "Towards fully autonomous driving: Systems and algorithms," in Intelligent Vehicles Symposium (IV), 2011 IEEE. IEEE, 2011, pp. 163–168.
[5] J. Wei, J. M. Snider, J. Kim, J. M. Dolan, R. Rajkumar, and B. Litkouhi, "Towards a viable autonomous driving research platform," in Intelligent Vehicles Symposium (IV), 2013 IEEE. IEEE, 2013, pp. 763–770.
[6] A. Broggi, M. Buzzoni, S. Debattisti, P. Grisleri, M. C. Laghi, P. Medici, and P. Versari, "Extensive tests of autonomous driving technologies," IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 3, pp. 1403–1415, 2013.
[7] W. Maddern, G. Pascoe, C. Linegar, and P. Newman, "1 year, 1000 km: The Oxford RobotCar dataset," The International Journal of Robotics Research, vol. 36, no. 1, pp. 3–15, 2017.
[8] N. Akai, L. Y. Morales, T. Yamaguchi, E. Takeuchi, Y. Yoshihara, H. Okuda, T. Suzuki, and Y. Ninomiya, "Autonomous driving based on accurate localization using multilayer lidar and dead reckoning," in IEEE 20th International Conference on Intelligent Transportation Systems (ITSC). IEEE, 2017, pp. 1–6.
[9] E. Guizzo, "How Google's self-driving car works," IEEE Spectrum Online, vol. 18, no. 7, pp. 1132–1141, 2011.
[10] J. Ziegler, P. Bender, M. Schreiber, H. Lategahn, T. Strauss, C. Stiller, T. Dang, U. Franke, N. Appenrodt, C. G. Keller, et al., "Making Bertha drive – an autonomous journey on a historic route," IEEE Intelligent Transportation Systems Magazine, vol. 6, no. 2, pp. 8–20, 2014.
[11] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
[12] M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele, "The Cityscapes dataset for semantic urban scene understanding," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 3213–3223.
[13] E. Yurtsever, Y. Liu, J. Lambert, C. Miyajima, E. Takeuchi, K. Takeda, and J. H. Hansen, "Risky action recognition in lane change video clips using deep spatiotemporal networks with segmentation mask transfer," in 2019 IEEE Intelligent Transportation Systems Conference (ITSC). IEEE, 2019, pp. 3100–3107.
[14] M. McNaughton, C. Urmson, J. M. Dolan, and J.-W. Lee, "Motion planning for autonomous driving with a conformal spatiotemporal lattice," in 2011 IEEE International Conference on Robotics and Automation. IEEE, 2011, pp. 4889–4895.
[15] C. Chen, A. Seff, A. Kornhauser, and J. Xiao, "DeepDriving: Learning affordance for direct perception in autonomous driving," in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 2722–2730.
[16] D. A. Pomerleau, "ALVINN: An autonomous land vehicle in a neural network," in Advances in Neural Information Processing Systems, 1989, pp. 305–313.
[17] U. Muller, J. Ben, E. Cosatto, B. Flepp, and Y. L. Cun, "Off-road obstacle avoidance through end-to-end learning," in Advances in Neural Information Processing Systems, 2006, pp. 739–746.
[18] M. Bojarski, D. Del Testa, D. Dworakowski, B. Firner, B. Flepp, P. Goyal, L. D. Jackel, M. Monfort, U. Muller, J. Zhang, et al., "End to end learning for self-driving cars," arXiv preprint arXiv:1604.07316, 2016.
[19] H. Xu, Y. Gao, F. Yu, and T. Darrell, "End-to-end learning of driving models from large-scale video datasets," arXiv preprint, 2017.
[20] A. E. Sallab, M. Abdou, E. Perot, and S. Yogamani, "Deep reinforcement learning framework for autonomous driving," Electronic Imaging, vol. 2017, no. 19, pp. 70–76, 2017.
[21] A. Kendall, J. Hawke, D. Janz, P. Mazur, D. Reda, J.-M. Allen, V.-D. Lam, A. Bewley, and A. Shah, "Learning to drive in a day," in 2019 International Conference on Robotics and Automation (ICRA). IEEE, 2019, pp. 8248–8254.
[22] S. Baluja, "Evolution of an artificial neural network based autonomous land vehicle controller," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 26, no. 3, pp. 450–463, 1996.
[23] J. Koutník, G. Cuccu, J. Schmidhuber, and F. Gomez, "Evolving large-scale neural networks for vision-based reinforcement learning," in Proceedings of the 15th Annual Conference on Genetic and Evolutionary Computation. ACM, 2013, pp. 1061–1068.
[24] K. Makantasis, M. Kontorinaki, and I. Nikolos, "A deep reinforcement learning driving policy for autonomous road vehicles," arXiv preprint arXiv:1905.09046, 2019.
[25] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, pp. 529–533, 2015.
[26] P. E. Hart, N. J. Nilsson, and B. Raphael, "A formal basis for the heuristic determination of minimum cost paths," IEEE Transactions on Systems Science and Cybernetics, vol. 4, no. 2, pp. 100–107, 1968.
[27] M. Bansal, A. Krizhevsky, and A. Ogale, "ChauffeurNet: Learning to drive by imitating the best and synthesizing the worst," arXiv preprint arXiv:1812.03079, 2018.
[28] C.-J. Hoel, K. Driggs-Campbell, K. Wolff, L. Laine, and M. Kochenderfer, "Combining planning and deep reinforcement learning in tactical decision making for autonomous driving," IEEE Transactions on Intelligent Vehicles, 2019.
[29] D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang, A. Guez, T. Hubert, L. Baker, M. Lai, A. Bolton, et al., "Mastering the game of Go without human knowledge," Nature, vol. 550, no. 7676, pp. 354–359, 2017.
[30] B. Osinski, A. Jakubowski, P. Milos, P. Ziecina, C. Galias, and H. Michalewski, "Simulation-based reinforcement learning for real-world autonomous driving," arXiv preprint arXiv:1911.12905, 2019.
[31] Sentdex, "Carla-RL," https://github.com/Sentdex/Carla-RL, 2020.
[32] A. Dosovitskiy, G. Ros, F. Codevilla, A. Lopez, and V. Koltun, "CARLA: An open urban driving simulator," arXiv preprint arXiv:1711.03938, 2017.
