Final Project

Pin-Shiang Chu
Department of Civil Engineering
National Central University
Taoyuan, Taiwan
[email protected]
I. Introduction

This final project aims to use reinforcement learning algorithms to enable drones to autonomously learn how to navigate from a starting point to an endpoint while avoiding obstacles. To evaluate the effectiveness of the algorithm, the training process will be simulated in a virtual environment. In the future, the learning results can be applied in real-world settings, allowing drones to complete autonomous flights in the real world based on their training in the simulator.

II. Background

This project is designed to operate within the Gazebo environment, aiming to develop an algorithm that enables an unmanned aerial vehicle (UAV) to learn to fly from (6, 3) to (6, -6). Due to the time-consuming nature of reweighting during the learning process, the implementation will initially simulate the reweighting process using MATLAB. This approach ensures the correctness and effectiveness of the algorithm before applying it to the Gazebo environment.
III. Reinforcement Learning

In this environment, it is highly suitable to use reinforcement learning to train robots. Reinforcement learning is a type of machine learning in which a computer learns to perform a task correctly through repeated interactions with a dynamic environment. In the real world, it is not feasible to let robots make constant trial-and-error choices, as this would be very costly. Therefore, by training robots in a simulator, they can learn which behaviors to avoid and then achieve the desired objectives in the real world. This trial-and-error learning method enables computers to make a series of decisions without human intervention and without being explicitly programmed to perform specific tasks.

IV. Q-Learning

Q-Learning is a reinforcement learning algorithm based on value iteration. Its fundamental concept is to guide an agent in choosing the optimal action by learning the value function Q(s,a) of state-action pairs. The Q-value represents the expected cumulative reward obtained after taking action a in state s. The algorithm approximates the optimal value function by continuously updating the Q-values. The Q-value is updated with the following formula:

Q(s,a) = Q(s,a) + α[r + γ max_a′ Q(s′,a′) − Q(s,a)]
A. Q(s,a): The current Q-value for taking action a in state s, which we want to update.
B. α: Learning rate, with a range between 0 and 1. It determines the weight given to new information. A higher value makes learning faster but more unstable, while a lower value makes learning slower but more stable.
C. r: Immediate reward received after taking action a in state s.
D. γ: Discount factor, with a range between 0 and 1. It determines the present value of future rewards. A higher value means future rewards have a greater impact on current decisions.
E. max_a′ Q(s′,a′): The maximum Q-value over all possible actions a′ in the next state s′. It represents the highest expected reward for the next state.
F. r + γ max_a′ Q(s′,a′) − Q(s,a): This term is called the Temporal Difference (TD) error. It represents the difference between the current estimate and the newly observed information, i.e., the gap between the current Q-value and the updated Q-value.
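To make this update concrete, the following minimal Python sketch applies the formula above to a Q-table. It is illustrative only; the table size, state indexing, and parameter values are assumptions, not part of the project.

import numpy as np

# Q-table for a discretized problem; the sizes here are assumptions for illustration.
n_states, n_actions = 100, 5
Q = np.zeros((n_states, n_actions))

def q_update(s, a, r, s_next, alpha=0.1, gamma=0.9):
    # TD error: r + gamma * max_a' Q(s', a') - Q(s, a)
    td_error = r + gamma * np.max(Q[s_next]) - Q[s, a]
    # Incremental update of the current Q-value.
    Q[s, a] += alpha * td_error
    return td_error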
V. Algorithm

For each episode:
1. a = max(Q(s, a)) (ε-greedy)
2. Get the reward r for (s, a)
3. Q(s,a) = Q(s,a) + α[r + γ max_a′ Q(s′,a′) − Q(s,a)]

A. Choose a: With a probability of x, a direction is chosen randomly, and with a probability of 1 − x, the direction with the highest Q-value is chosen.
B. Calculate the TD error: δ = r + γ max_a′ Q(s′,a′) − Q(s,a). The TD error represents the difference between the current Q-value and the estimated Q-value after the update.
C. Update the Q-value: Q(s,a) ← Q(s,a) + αδ. The learning rate α controls the impact of the TD error, adjusting the Q-value incrementally.
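As a rough illustration of this episode loop (a sketch, not the project's Gazebo/ROS code), one ε-greedy training episode could look like the following; the env object with reset() and step() is an assumed interface.

import numpy as np

def run_episode(env, Q, alpha=0.1, gamma=0.9, epsilon=0.1):
    # One training episode; `env` (reset/step) and the Q-table layout are assumptions.
    s = env.reset()
    done = False
    while not done:
        # Choose a: random direction with probability epsilon,
        # otherwise the direction with the highest Q-value.
        if np.random.rand() < epsilon:
            a = np.random.randint(Q.shape[1])
        else:
            a = int(np.argmax(Q[s]))
        s_next, r, done = env.step(a)
        # TD error and incremental Q-value update.
        delta = r + gamma * np.max(Q[s_next]) - Q[s, a]
        Q[s, a] += alpha * delta
        s = s_next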
VI. Proposed Method

The above method involves recording the Q-value for each state-action pair to build a Q-table. However, when the state or action space is very large or continuous, this approach becomes impractical. In such cases, we can use function approximation to replace the Q-table. This method is known as function-based Q-Learning, and we can define our own Q-function for this purpose.

A. Function-Based Q-Learning

Q(s,a;x) ← Q(s,a;x) + α[r + γ max_a′ Q(s′,a′;x) − Q(s,a;x)]

The Q function can therefore be expressed as the inner product [f1, f2, f3, f4, f5] · [x1, x2, x3, x4, x5]^T of a feature vector and a parameter vector x.

B. Gradient Descent for Parameter Updates

To update the parameters x, we use gradient descent to minimize a loss function, typically defined as the squared TD error:

L(x) = [r + γ max_a′ Q(s′,a′;x) − Q(s,a;x)]^2

The update rule for x is:

x ← x + α[r + γ max_a′ Q(s′,a′;x) − Q(s,a;x)] ∇x Q(s,a;x)

For the linear Q function above, the gradient ∇x Q(s,a;x) is simply the feature vector [f1, f2, f3, f4, f5].
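The following short Python sketch shows how this update works under the linear-Q assumption above; the function and variable names are illustrative and not taken from the project code.

import numpy as np

def q_value(x, f):
    # Linear Q-function: Q(s, a; x) = [f1..f5] . [x1..x5]
    return float(np.dot(x, f))

def update_weights(x, f, r, q_next_max, alpha=0.01, gamma=0.9):
    # Semi-gradient update: for a linear Q, the gradient with respect to x
    # is the feature vector f, so the TD error is simply scaled by f.
    td_error = r + gamma * q_next_max - q_value(x, f)
    return np.asarray(x) + alpha * td_error * np.asarray(f)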
C. Algorithm

For each episode:
1. a = max(Q(s, a))
2. Get the reward r for (s, a)
3. W := W + α * [r + γ * max(Q(s′, a′)) − Q(s, a)] * X, where W is the parameter vector and X is the feature vector.

D. Parameter Design
i. The feature design of the Q function selects [f1, f2, f3, f4, f5] to represent the following (a computation sketch follows this list):
1. Constant
2. abs(goal.x - robot.x)
3. abs(goal.y - robot.y)
4. abs(atan2(goal.y - robot.y, goal.x - robot.x) - robot_t), representing the difference between the drone's heading and the angle to the target.
5. sqrt(pow(robot.x - goal.x, 2) + pow(robot.y - goal.y, 2)), representing the absolute distance to the target.
ii. The reward is designed such that encountering an obstacle or a wall results in -10, reaching the endpoint gives a reward of 10, and all other situations give a reward of -0.05.
iii. The actions are designed to include five directions:
Left 30 degrees
Left 15 degrees
Forward (straight ahead)
Right 15 degrees
Right 30 degrees
iv. The states:
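For reference, the five features in D.i could be computed as in the sketch below. The variable names (robot_x, goal_x, robot_t, and so on) are hypothetical placeholders for the robot pose and goal position; this is not the project's code.

import math

def compute_features(robot_x, robot_y, robot_t, goal_x, goal_y):
    # Features [f1..f5] from D.i, computed from the robot pose and the goal position.
    f1 = 1.0                                                             # constant term
    f2 = abs(goal_x - robot_x)                                           # x-offset to the goal
    f3 = abs(goal_y - robot_y)                                           # y-offset to the goal
    f4 = abs(math.atan2(goal_y - robot_y, goal_x - robot_x) - robot_t)   # heading error
    f5 = math.sqrt((robot_x - goal_x) ** 2 + (robot_y - goal_y) ** 2)    # straight-line distance
    return [f1, f2, f3, f4, f5]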

VII. Conclusion

During the process of working on the final project, several critical issues were encountered. First, during training in Gazebo, it is crucial to be careful when detecting obstacles through ROS. The UAV may not maintain a perfectly horizontal attitude during flight, which can inadvertently cause the floor to be detected as an obstacle and lead to training errors. Another issue arose during the weight-updating process. Initially, when the UAV collided with an obstacle, the intention was to reset the robot's position to (6, 3, 1) using a function. However, this method caused the UAV to fall due to the lack of initial velocity. Consequently, the weights had to be saved to a file after each round and then loaded from the file for the next round of training.