Sensor
1 Abstract
According to [1], road traffic tasks can be divided into three categories in terms of the human driver's responsibility:
navigation, guidance, and control. In this work, we focus on the guidance level in high-risk scenarios,
which is responsible for producing the desired trajectory and/or speed. First, we apply
deep imitation learning to obtain driver agent models from data generated by predefined
control laws. Then, reinforcement learning is applied to find a policy for high-risk scenarios via
a switching control model that considers both efficiency and safety.
2 Introduction
Autonomous driving technology has grown rapidly in recent years. However, high-risk scenarios, where a
potential accident is likely to happen, are still not handled well, because the appropriate action depends strongly
on the behavior of other drivers. As a result, the agent may need to change its actions significantly to stay safe.
According to recent studies [2] [3], reinforcement learning (RL) and imitation learning (IL) are the two
dominant approaches for learning driving actions in autonomous driving. Reinforcement learning learns
driving policies that maximize a reward function, while imitation learning tries to learn the behavior
of an expert, i.e., behavior cloning. One shortcoming of RL is that it needs to fully explore the environment,
while IL requires a large amount of expert demonstration data. Moreover, neither approach is well suited to
rapid transitions between actions, since the learned action varies continuously. Our approach combines RL and
IL to allow the driving agent to switch actions accordingly and meet the safety requirements of near-accident
scenarios.
First, we obtain driver agent models using deep imitation learning. The input to our algorithm is a dataset
generated from the CARLO simulator containing a four-component observation (the ego vehicle's location, the ego
vehicle's velocity, the other vehicle's location with noise, and the other vehicle's velocity with noise), the control input
(steering and throttle), and a driving mode indicator. We then use Conditional Imitation Learning (CoIL) [7]
to output a predicted control input given the observation and the driving mode indicator.
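As an illustration, a minimal sketch of such a branched CoIL model is given below (PyTorch); the class name BranchedCoIL, the observation dimension, and the layer sizes are our own assumptions rather than details taken from [7].

```python
import torch
import torch.nn as nn

class BranchedCoIL(nn.Module):
    """Conditional imitation learning model: a shared encoder over the
    observation plus one control head (branch) per driving mode."""

    def __init__(self, obs_dim=6, n_modes=3, hidden=64):
        super().__init__()
        # Shared feature extractor over the observation
        # (ego location/velocity, other vehicle's location/velocity with noise).
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # One branch per driving-mode indicator; each branch outputs
        # the 2-D control input (steering, throttle).
        self.branches = nn.ModuleList(
            [nn.Linear(hidden, 2) for _ in range(n_modes)]
        )

    def forward(self, obs, mode):
        # obs: (batch, obs_dim) float tensor; mode: (batch,) integer tensor.
        feat = self.encoder(obs)
        out = torch.stack([branch(feat) for branch in self.branches], dim=1)
        # Pick, for each sample, the branch selected by its mode indicator.
        return out[torch.arange(obs.shape[0]), mode]
```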
The basic environment is CARLO [4], which performs 2D driving simulation. Two different high-risk
scenarios, cross traffic and wrong direction, are tested. The different driving modes are evaluated based
on completion time and collision rate.
3 Related work
Imitation learning (IL) is one of the popular methods. Muller et al. implemented behavior-cloning IL for off-road
obstacle avoidance [5]. The algorithm learns a driving policy from the state-action pairs in
the dataset. One drawback is that it generalizes poorly to unpredicted behaviors in new
test domains. It also requires a huge amount of expert demonstrations, leading to low data efficiency.
Codevilla et al. [6] proposed a method called Conditional Imitation Learning (CoIL), which extends IL with high-level
commands, as shown in Figure 1 [7]. It learns a separate IL model for each high-level command, with some
features shared between the learned IL models. This improves data efficiency, but it still requires high-level
commands at test time. Our approach proposes to solve this problem by using reinforcement learning to learn an
agent that provides the high-level commands, instead of relying on commands provided by drivers.
Reinforcement learning (RL) is another main approach applied in autonomous driving [2]. It explores the
environment and then takes, in each state, the action that maximizes a pre-defined reward. One shortcoming
is that the state space in driving scenarios is very large, which makes it hard to explore fully.
Hierarchical Reinforcement Learning [8] was proposed to address this problem. It consists of multiple layers:
the higher layer acts like a manager that sets goals for the lower layers, and the lower layer acts like a worker that achieves
those goals. This improves exploration efficiency. Finally, Nair et al. extended hierarchical RL to use expert
demonstrations to obtain the high-level commands that guide the exploration of RL [9]. However, none of these algorithms
addresses near-accident scenarios well, since it is difficult to design the low-level reward
function for hierarchical RL. Instead of using RL to learn the low-level policy, our approach first uses IL to
obtain the low-level policy. Then we use RL to obtain the high-level commands.
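To make this division of labor concrete, one possible formulation of the high-level problem is a gym-style environment whose discrete action picks a driving mode, while a pre-trained low-level IL policy for that mode produces the actual control. The class name, the simulator interface, and the reward weights below are illustrative assumptions rather than the implementation of [4].

```python
import gym
import numpy as np

class ModeSelectionEnv(gym.Env):
    """High-level environment: the RL agent picks a driving mode at each step;
    a pre-trained low-level IL policy for that mode produces the
    (steering, throttle) control applied in the simulator."""

    def __init__(self, simulator, il_policies):
        super().__init__()
        self.sim = simulator            # assumed CARLO scenario wrapper (hypothetical interface)
        self.il_policies = il_policies  # list of per-mode IL policies: observation -> control
        self.action_space = gym.spaces.Discrete(len(il_policies))
        self.observation_space = gym.spaces.Box(-np.inf, np.inf, shape=(6,), dtype=np.float32)

    def reset(self):
        self.obs = self.sim.reset()
        return self.obs

    def step(self, mode):
        control = self.il_policies[mode](self.obs)          # low-level action from the chosen IL mode
        self.obs, collided, reached, dt = self.sim.step(control)  # assumed simulator return values
        # Reward trades off efficiency (time penalty) against safety (collision penalty).
        reward = -dt - 100.0 * collided + 10.0 * reached
        done = collided or reached
        return self.obs, reward, done, {}
```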
6 Experiments/Results/Discussion
The preliminary experiments use the scenarios proposed in [4]. The experiments are conducted in
two different scenarios:
(1) In the first scenario, called the intersection scenario, the ego car approaches a crossroad while an ado car (simulated
by the computer using a pre-defined control law) is also approaching (Figure 2a). For this scenario, there are
three control modes governing the ego car: aggressive, normal, and timid. In the aggressive mode, the ego car
generally drives at a higher speed and is more likely to collide with the ado car, while in the timid mode collisions
are avoided. The normal mode is designed so that its completion time and collision rate both lie between those of
the aggressive and timid modes. All three modes are hard-coded control laws.
(2) The second scenario simulates the situation where the ado car drives in the opposite direction
towards the ego car, as shown in Figure 2b. For this scenario, there are only two modes: aggressive and timid.
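The hard-coded modes can be viewed as simple rule-based controllers. The sketch below is a hypothetical illustration of what such control laws might look like; the thresholds, gains, and the ego/ado attribute names are our own assumptions, not the values used in the experiments.

```python
import numpy as np

def aggressive_control(ego, ado, target_speed=8.0):
    """Hypothetical aggressive mode: hold a high target speed and ignore the ado car."""
    throttle = 1.0 if ego.speed < target_speed else 0.0
    return 0.0, throttle  # (steering, throttle); straight-line driving

def timid_control(ego, ado, target_speed=4.0, safe_gap=15.0):
    """Hypothetical timid mode: drive slowly and brake whenever the ado car is close."""
    gap = np.linalg.norm(np.asarray(ado.position) - np.asarray(ego.position))
    if gap < safe_gap:
        return 0.0, -1.0  # brake (negative throttle as deceleration, by assumption)
    throttle = 1.0 if ego.speed < target_speed else 0.0
    return 0.0, throttle
```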
The simulator used in this experiment is CARLO, a customized 2D driving simulator with a simple
dynamics model and visualizations. CARLO runs 2D simulations quickly and provides perception
and measurement data.
We assume a point-mass dynamics model in these scenarios and no other obstacles. Considering both
safety and efficiency, a test is defined as a success when the ego car reaches the target within a certain amount of
time without colliding with the ado car or the environment. The data we use consist of two parts: the
first part is provided by Erdem Bıyık and is associated with the paper [4]; the second part is generated by
hard-coded control laws in the gym environment.
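With this success definition, collision rate and completion time can be aggregated over test episodes roughly as follows; run_episode and the time limit are placeholder assumptions, not part of the original setup.

```python
def evaluate(policy, env, n_episodes=100, time_limit=20.0):
    """Roll out a policy and report collision rate and mean completion time.
    An episode counts as a success only if the ego car reaches the target
    within the time limit without any collision."""
    collisions, times = 0, []
    for _ in range(n_episodes):
        reached, collided, t = run_episode(policy, env, time_limit)  # assumed helper
        collisions += int(collided)
        if reached and not collided:
            times.append(t)
    collision_rate = collisions / n_episodes
    mean_time = sum(times) / len(times) if times else float("inf")
    return collision_rate, mean_time
```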
For both scenarios, the observations are the ego car's location and velocity and the ado car's location and velocity. The
only difference between scenarios 1 and 2 is that in scenario 1 the ego car's location is one-dimensional (because the
ego car only drives in a straight line), while in scenario 2 it is two-dimensional. The observation,
along with the high-level command (timid or aggressive), is fed into a neural network to obtain a policy that
minimizes a loss function defined by the difference between the ego car's and the expert's behavior.
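Concretely, the training step is standard behavior cloning: minimize the mean-squared error between the predicted and expert controls for the same observation and command. The sketch below assumes the BranchedCoIL model from the earlier sketch and uses random placeholder tensors in place of the real CARLO demonstration data.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Placeholder tensors standing in for the demonstrations:
# observations, mode indicators, and expert (steering, throttle) controls.
obs = torch.randn(1000, 6)
mode = torch.randint(0, 2, (1000,))
expert_control = torch.randn(1000, 2)
loader = DataLoader(TensorDataset(obs, mode, expert_control), batch_size=64, shuffle=True)

model = BranchedCoIL(obs_dim=6, n_modes=2)   # branched model from the earlier sketch
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(50):
    for o, m, u in loader:
        pred = model(o, m)        # branch selected by the high-level command
        loss = loss_fn(pred, u)   # imitation loss against the expert control
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```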
The results of scenario 1 are shown in Figure 2a. So far, for conditional imitation learning, we have
three results: "CoIL-aggressive", "CoIL-middle" and "CoIL-timid". "Aggressive", "Timid" and "Normal"
are the test results of the hard-coded control policies before CoIL and RL. "Random" shows the result
of a policy that combines the three control modes at random. Finally, "RL-2 mode" and "RL-3 mode" are the results
of using reinforcement learning when there are two control modes to select from (aggressive and timid) and when
there are three control modes to select from (aggressive, middle and timid).
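RL-2 mode and RL-3 mode differ only in how many IL modes the high-level agent may select from. As an illustration only, a policy-gradient method such as PPO [11] could be trained on the mode-selection environment sketched earlier along these lines; stable-baselines3 and the object names below are our own assumptions, not necessarily what was used.

```python
from stable_baselines3 import PPO

# Pass two IL policies (aggressive, timid) for "RL-2 mode";
# pass three (aggressive, middle, timid) for "RL-3 mode".
env = ModeSelectionEnv(simulator, [coil_aggressive, coil_timid])  # assumed objects from earlier sketches
agent = PPO("MlpPolicy", env, verbose=0)
agent.learn(total_timesteps=200_000)
```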
First, we compare the results of imitation learning and the hard-coded policies. In terms of collision rate,
there is little difference between normal and CoIL-normal or between timid and CoIL-timid. The collision rate
for aggressive is 0.9, while for CoIL-aggressive it is 0.24. For completion time, the trend is
reversed. This is partly because if the ego car behaves aggressively, it drives at
a comparatively higher speed, which leads to a shorter completion time and a higher collision rate. It is not surprising
that the timid mode has the longest completion time, while the aggressive mode has the shortest.
A random policy is also included for later comparison with the reinforcement learning results. Its
collision rate and completion time fall between those of the three hard-coded policies, which makes
sense because the random policy chooses among the three modes with equal probability, and the chosen mode directly
determines the ego car's action, which is the throttle in this case.
The results of scenario 2 are shown in Figure 4. For this scenario, we only train the RL policy with two driving
modes. The overall performance of the RL policy is similar to what we saw in scenario 1. The completion
time of RL-2 mode is about the same as that of the random policy, but the collision rate of RL-2 mode is much
lower (0.11 versus 0.24 for the random policy).
7 Conclusion/Future Work
In summary, we conclude that the proposed approach, which first uses conditional imitation learning to learn
driving models from an expert and then trains a high-level policy with reinforcement learning, performs well in
both scenarios. Although the two scenarios in this project are rather simple and the simulation relies on several
assumptions, the results shed light on the application of CoIL-RL to more complicated scenarios where
the motion planning of the vehicle is challenging due to the environment. However, some work
remains to be done. First, more high-risk scenarios, including a halting car, merging, and unprotected turns, can be
used to evaluate the performance of different driving models and switching techniques. Second, the simulation
can be done in CARLA [12], where the physical model is more realistic. Finally, the expert data can be
obtained from real drivers instead of hard-coded policies.
References
[3] Dean A. Pomerleau. ALVINN: An autonomous land vehicle in a neural network. In Proceedings of
the 1st International Conference on Neural Information Processing Systems, NIPS'88, pages 305–313,
Cambridge, MA, USA, 1988. MIT Press.
[4] Zhangjie Cao, Erdem Bıyık, Woodrow Z. Wang, Allan Raventos, Adrien Gaidon, Guy Rosman, and
Dorsa Sadigh. Reinforcement learning based control of imitative policies for near-accident driving. In
Proceedings of Robotics: Science and Systems (RSS), July 2020.
[5] Urs Muller, Jan Ben, Eric Cosatto, Beat Flepp, and Yann L. Cun. Off-road obstacle avoidance through
end-to-end learning. In Y. Weiss, B. Schölkopf, and J. C. Platt, editors, Advances in Neural Information
Processing Systems 18, pages 739–746. MIT Press, 2006.
[6] Felipe Codevilla, Matthias Müller, Alexey Dosovitskiy, Antonio López, and Vladlen Koltun. End-to-end
driving via conditional imitation learning. CoRR, abs/1710.02410, 2017.
[7] Felipe Codevilla, Matthias Müller, Antonio López, Vladlen Koltun, and Alexey Dosovitskiy. End-to-
end driving via conditional imitation learning. In 2018 IEEE International Conference on Robotics and
Automation (ICRA), May 2018.
[8] Tejas D Kulkarni, Karthik Narasimhan, Ardavan Saeedi, and Josh Tenenbaum. Hierarchical deep
reinforcement learning: Integrating temporal abstraction and intrinsic motivation. In D. D. Lee,
M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, editors, Advances in Neural Information
Processing Systems 29, pages 3675–3683. Curran Associates, Inc., 2016.
[9] Ashvin Nair, Bob McGrew, Marcin Andrychowicz, Wojciech Zaremba, and Pieter Abbeel. Overcoming
exploration in reinforcement learning with demonstrations. CoRR, abs/1709.10089, 2017.
[10] Qingwen Xue, Ke Wang, Jian Lu, and Yujie Liu. Rapid driving style recognition in car-following using
machine learning and vehicle trajectory data. Journal of Advanced Transportation, 2019:1–11, 01 2019.
[11] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy
optimization algorithms. CoRR, abs/1707.06347, 2017.
[12] Alexey Dosovitskiy, German Ros, Felipe Codevilla, Antonio Lopez, and Vladlen Koltun. CARLA: An
open urban driving simulator. In Proceedings of the 1st Annual Conference on Robot Learning, pages
1–16, 2017.