
2018 IEEE 19th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC)

Trajectory Optimization for Autonomous Flying Base Station via Reinforcement Learning

Harald Bayerlein, Paul de Kerret, and David Gesbert
Communication Systems Department, EURECOM
Sophia Antipolis, France
Email: {harald.bayerlein, paul.dekerret, gesbert}@eurecom.fr

Abstract—In this work, we study the optimal trajectory of an unmanned aerial vehicle (UAV) acting as a base station (BS) to serve multiple users. Considering multiple flying epochs, we leverage the tools of reinforcement learning (RL) with the UAV acting as an autonomous agent in the environment to learn the trajectory that maximizes the sum rate of the transmission during flying time. By applying Q-learning, a model-free RL technique, an agent is trained to make movement decisions for the UAV. We compare table-based and neural network (NN) approximations of the Q-function and analyze the results. In contrast to previous works, movement decisions are made directly by the neural network; the algorithm requires no explicit information about the environment and is able to learn the topology of the network to improve the system-wide performance.

I. INTRODUCTION

Fig. 1. UAV BS optimizing its trajectory to maximize the sum rate of the transmission to a group of users, e.g. in case of stationary transmitter failure.

Compared to traditional mobile network infrastructure, mounting base stations (BSs) or access points (APs) on unmanned aerial vehicles (UAVs) promises faster and more dynamic network deployment, the possibility to extend coverage beyond existing stationary APs, and additional capacity for users in localized areas of high demand, such as concerts and sports events. Fast deployment is especially useful in scenarios where a sudden network failure occurs and delayed re-establishment is not acceptable, e.g. in disaster and search-and-rescue situations [1]. In remote areas where it is not feasible or economically efficient to extend permanent network infrastructure, high-flying balloons or unmanned solar planes (as in Google's project Loon and Facebook's Internet.org initiative) could provide Internet access to the half of the world's population currently without it.

In all mentioned scenarios where flying APs hold promise, a decisive factor for the system's ability to serve the highest possible number of users with the best achievable Quality of Service (QoS) is the UAV's location. Previous work has either addressed the placement problem of finding optimal positions for flying APs (e.g. [2], [3]) or optimized the UAV's trajectory from start to end [4]–[7]. Whereas fixed locations fulfilling a certain communication network's goal are determined in the placement problem, the alternative is to embed the optimization of the communication system within the path planning of the UAV base station. This allows for optimizing the users' QoS during the whole flying time, as well as combining it with other mission-critical objectives such as energy conservation by reducing flying time (e.g. [2] and [6]) or integrating landing spots for the UAV into the trajectory [4].

In this work and as depicted in figure 1, we consider the UAV acting as a BS serving multiple users while maximizing the sum of the information rate over the flying time, but a multitude of other applications exist. [8] and [9] provide summaries of the general challenges and opportunities. In [4], [10] and [11], the authors investigate an IoT-driven scenario where an autonomous drone gathers data from distant network nodes. The authors of [7] and [12] work on an application where an existing ground-based communications network could be used for beyond line-of-sight (LOS) control of UAVs if the resulting interference within the ground network is managed. In [2] and [5], a scenario similar to this work is considered where a UAV-mounted BS serves a group of users. Whereas the authors in [5] also maximize the sum rate of the users, the goal in [2] is to cover the highest possible number of users while minimizing transmit power.

Recent successes in the application of deep reinforcement learning to problems of control, perception and planning, achieving superhuman performance, e.g. in playing Atari video games [13], have created interest in many areas, though RL-based path planning for mobile robots and UAVs in particular has not been investigated widely.

The authors acknowledge the support of the SeCIF project within the French-German Academy for the Industry of the Future as well as the support from the PERFUME project funded by the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement no. 670896).




Deep learning applications in UAV guidance often focus on perception and have mostly auxiliary functions for the actual path planning, see [14] for a review. In [3], a radio map is learned which is then used to find optimal UAV relay positions. In [7], a deep RL system based on echo state network (ESN) cells is used to guide cellular-connected UAVs towards a destination while minimizing interference.

Our work focuses on a different scenario where the UAV carries a base station and becomes part of the mobile communication infrastructure serving a group of users. Movement decisions to maximize the sum rate over the flying time are made directly by a reinforcement Q-learning system. Previous works not employing machine learning often rely on strict models of the environment or assume the channel state information (CSI) to be predictable. In contrast, the Q-learning algorithm requires no explicit information about the environment and is able to learn the topology of the network to improve the system-wide performance. We compare a standard table-based approach and a neural network as Q-function approximators.

II. SYSTEM MODEL

A. UAV Model

The UAV has a maximum flying time T, by the end of which it is supposed to return to a final position. During the flying time t ∈ [0, T], the UAV's position is given by (x(t), y(t)) and a constant altitude H. It moves with a constant velocity V. The initial position of the UAV is (x_0, y_0), whereas (x_f, y_f) is the final position. x(t) and y(t) are smooth functions of class C^∞ defined as

$$ x : [0, T] \to \mathbb{R},\; t \mapsto x(t), \qquad y : [0, T] \to \mathbb{R},\; t \mapsto y(t) \tag{1} $$

subject to

$$ x(0) = x_0, \quad y(0) = y_0, \qquad x(T) = x_f, \quad y(T) = y_f. $$

The UAV's constant velocity is enforced through the time derivatives ẋ(t) and ẏ(t) with

$$ \sqrt{\dot{x}^2(t) + \dot{y}^2(t)} = V, \quad t \in [0, T] \tag{2} $$

B. Communication Channel Model

The communication channel between the UAV AP and a number of K users is described by the log-distance path loss model, including small-scale fading and a constant attenuation factor in the shadow of the obstacle. The communication link is modeled as an orthogonal point-to-point channel. The information rate of the k-th user, k ∈ {1, ..., K}, located at a constant position (a_k, b_k) ∈ ℝ² at ground level, is given by

$$ R_k(t) = \log_2\!\left(1 + \frac{P}{N} \, L_k\right) \tag{3} $$

with transmit power P, noise power N and pathloss L_k of the k-th user. The UAV-user distance d_k(t), with the UAV at constant altitude H and all users at ground level, is given as

$$ d_k(t) = \sqrt{H^2 + (x(t) - a_k)^2 + (y(t) - b_k)^2} \tag{4} $$

With the pathloss exponent set to α = 2 for vacuum, the pathloss for user k is given as

$$ L_k = d_k(t)^{-\alpha} \cdot 10^{X_{\mathrm{Rayleigh}}/10} \cdot \beta_{\mathrm{shadow}} \tag{5} $$

where small-scale fading is modeled as a Rayleigh-distributed random variable X_Rayleigh with scaling factor σ = 1. The attenuation through obstacle obstruction is modeled with a discrete factor β_shadow ∈ {1, 0.01}, which is set to β_shadow = 0.01 in the obstacle's shadow and to β_shadow = 1 everywhere else. Using the described model, the maximization problem can be formulated as

$$ \max_{x(t),\,y(t)} \int_{0}^{T} \sum_{k=1}^{K} R_k(t)\, \mathrm{d}t \tag{6} $$

To guarantee that a feasible solution exists, T and V must be chosen such that the UAV is at least able to travel from the initial to the final position along the minimum-distance path, i.e. $V T \geq \sqrt{(x_f - x_0)^2 + (y_f - y_0)^2}$.
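To make the channel model concrete, the following Python/NumPy sketch evaluates equations (3)–(5) for a given UAV position. The transmit power P, noise power N and the per-user shadowing flags are placeholder inputs of ours, not values specified in the paper; only α = 2, σ = 1, β_shadow ∈ {1, 0.01} and H = 20 are taken from the text.

```python
import numpy as np

def pathloss(d, shadowed, alpha=2.0, sigma=1.0):
    """Pathloss of Eq. (5): distance decay with exponent alpha, Rayleigh
    small-scale fading (scale sigma = 1) and the discrete shadow factor."""
    x_rayleigh = np.random.rayleigh(scale=sigma)
    beta_shadow = 0.01 if shadowed else 1.0
    return d ** (-alpha) * 10.0 ** (x_rayleigh / 10.0) * beta_shadow

def user_rate(uav_xy, user_xy, shadowed, H=20.0, P=1.0, N=1e-3):
    """Information rate of Eq. (3); the UAV-user distance follows Eq. (4)
    with the UAV at constant altitude H. P and N are placeholders."""
    d = np.sqrt(H ** 2 + (uav_xy[0] - user_xy[0]) ** 2
                       + (uav_xy[1] - user_xy[1]) ** 2)
    return np.log2(1.0 + (P / N) * pathloss(d, shadowed))

def sum_rate(uav_xy, users_xy, shadowed_flags):
    """Integrand of the objective in Eq. (6): sum of per-user rates,
    given one LOS/shadow flag per user (e.g. from a shadowing map)."""
    return sum(user_rate(uav_xy, u, s)
               for u, s in zip(users_xy, shadowed_flags))
```

With this, the objective (6) corresponds to accumulating sum_rate(...) along the discretized trajectory.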
III. FUNDAMENTALS OF Q-LEARNING

Q-learning is a model-free reinforcement learning method first proposed by Watkins and further developed in 1992 [15]. It is classified as model-free because it has no internal representation of the environment.

Reinforcement learning in general proceeds in a cycle of interactions between an agent and its environment. At time t, the agent observes a state s_t ∈ S, performs an action a_t ∈ A and subsequently receives a reward r_t ∈ ℝ. The time index is then incremented and the environment propagates the agent to a new state s_{t+1}, from where the cycle restarts.

Q-learning specifically allows an agent to learn to act optimally in an environment that can be represented by a Markov decision process (MDP). Consider a finite MDP (S, A, P, R, γ) with state space S, action space A, state transition probability P_a(s, s') = Pr(s_{t+1} = s' | s_t = s, a_t = a), reward function R_a(s, s') and discount factor γ ∈ [0, 1), which controls the importance of future rewards in relation to the present reward.

The goal for the agent is to learn a behavior rule that maximizes the reward it receives. A behavior rule that tells it how to select actions given a certain state is referred to as a policy and can in general be stochastic. It is given as

$$ \pi(a \mid s) = \Pr\left[a_t = a \mid s_t = s\right] \tag{7} $$

Q-learning is based on iteratively improving the state-action value function (or Q-function), which represents an expectation of the future reward when taking action a in state s and following policy π from then on. The Q-function is

$$ Q^{\pi}(s, a) = \mathbb{E}_{\pi}\{ R_t \mid s_t = s, a_t = a \} \tag{8} $$

where the discounted sum of all future rewards at the current time t is called the return R_t ∈ ℝ, given by

$$ R_t = \sum_{k=0}^{T-1} \gamma^{k} r_{t+1+k} \tag{9} $$

with discount factor γ ∈ [0, 1) as set in the MDP definition and the terminal state being reached at time t + T. Given the Q-function with perfect information Q^{π*}(s, a), an optimal policy can be derived by selecting actions greedily:

$$ \pi^{*}(a \mid s) = \arg\max_{a} Q^{\pi^{*}}(s, a) \tag{10} $$

From combining (8) and (9) it follows that R_t can be approximated in expectation based on the agent's next step in the environment. The central Q-learning update rule for making iterative improvements to the Q-function is therefore given by

$$ Q^{\pi}(s_t, a_t) \leftarrow Q^{\pi}(s_t, a_t) + \alpha \left[ r_t + \gamma \max_{a} Q^{\pi}(s_{t+1}, a) - Q^{\pi}(s_t, a_t) \right] \tag{11} $$

with learning rate α ∈ [0, 1] determining to what extent old information is overridden, and discount factor γ ∈ [0, 1) balancing the importance of short-term and long-term reward. A γ approaching 1 makes the agent focus on gaining long-term reward, whereas choosing γ = 0 makes it consider only the immediate reward of an action. A value of γ = 1 could lead to diverging action values. Q-learning converges to the optimal policy regardless of the exploration strategy being followed, under the assumption that each state-action pair is visited an infinite number of times and the learning rate α is decreased appropriately [16].
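A minimal NumPy sketch of the greedy policy (10) and of one application of the update rule (11) is given below, assuming the Q-function is stored as a mapping from states to arrays of per-action values (the data layout is our choice; the default α and γ are the table-based values reported later in Section V-B).

```python
import numpy as np

def greedy_action(Q, s):
    """Greedy policy of Eq. (10): the action with the largest Q-value."""
    return int(np.argmax(Q[s]))

def q_update(Q, s, a, r, s_next, terminal, alpha=0.3, gamma=0.99):
    """One Q-learning update, Eq. (11)."""
    # At a terminal state there is no future reward to bootstrap from.
    bootstrap = 0.0 if terminal else gamma * np.max(Q[s_next])
    Q[s][a] += alpha * (r + bootstrap - Q[s][a])
```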
IV. Q-LEARNING FOR TRAJECTORY OPTIMIZATION

A. Table-based Q-learning

In this section, we describe how the Q-learning algorithm was adapted for trajectory optimization of a UAV BS inside a simulated environment, with the Q-function being approximated by a four-dimensional table of Q-values. Each Q-value thereby represents a unique state-action pair, and a higher value relative to other values promises a higher return according to definition (8).

In order to promote initial exploration of the state space, the Q-table is initialized with high Q-values to entice the agent to visit each state-action pair at least once, a concept known as optimism in the face of uncertainty. After the UAV's position (x_0, y_0) and the time index t = 0 have been initialized, the agent makes its first movement decision according to the ε-greedy policy: with probability ε ∈ [0, 1] a random action is taken to explore the state space, and in all other cases the action that maximizes the Q-function is chosen. A balance must therefore be found between the share of random and non-random actions, which is referred to as the exploration-exploitation trade-off [16]. The probability ε for random actions is exponentially decreased over the learning time.

The agent's initial movement decision starts the first learning episode, which terminates upon reaching the maximum flying time T. Each new movement decision is evaluated by the environment according to the achieved sum rate computed with the channel model described in II-B, and a numerical reward based on the rate result is issued to the agent. The reward is then used to update the Q-value of the state and chosen action according to the rule defined in (11). As the drone is propagated to its new position and the time index t is incremented, the cycle repeats until the maximum flying time is reached and the learning episode ends. The random action probability ε is then decreased, and the drone position and time index are reset for the start of a new episode. The number of episodes must be chosen so that sufficient knowledge of the environment and network topology has accumulated through iterative updates of the Q-table.
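The training cycle just described can be summarized in the following Python sketch. The environment interface (reset/step), the optimistic initial Q-value and the exact form of the exponential ε decay are our assumptions; the number of episodes, the learning rate, the discount factor and the decay constant follow the values given in Section V-B.

```python
import numpy as np
from collections import defaultdict

def train_table(env, n_episodes=800_000, alpha=0.3, gamma=0.99,
                q_init=100.0, lam_action=14.0):
    """Table-based Q-learning for the UAV trajectory task.
    `env` is a hypothetical grid world with reset() -> state and
    step(a) -> (next_state, reward, done); states are (x, y, t) tuples
    and the four actions are up, right, down and left."""
    # Optimistic initialization ("optimism in the face of uncertainty").
    Q = defaultdict(lambda: np.full(4, q_init))
    for episode in range(n_episodes):
        # Exponentially decaying probability of taking a random action
        # (one plausible schedule with decay constant lambda_action = 14).
        eps = np.exp(-lam_action * episode / n_episodes)
        s, done = env.reset(), False
        while not done:
            if np.random.rand() < eps:
                a = np.random.randint(4)       # explore
            else:
                a = int(np.argmax(Q[s]))       # exploit
            s_next, r, done = env.step(a)
            # Update rule (11), without bootstrapping past the episode end.
            target = r + (0.0 if done else gamma * np.max(Q[s_next]))
            Q[s][a] += alpha * (target - Q[s][a])
            s = s_next
    return Q
```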
B. NN-based Q-learning (Q-net)

Representation of the Q-function by a table is clearly not practical in large state and action spaces, as the table size increases exponentially. Instead, the Q-function can be represented by an alternative nonlinear function approximator, such as a neural network (NN) composed of connected artificial neurons organized in layers.

A model with two hidden layers, each with n_nodes = 100 neurons, proved adequate for a direct comparison with the standard table-based approach. Choosing the right architecture and learning parameters is in general a difficult task and has to be done through heuristics and extensive simulations. The NN input was chosen to contain only the minimal information of one state space sample, feeding the current position (x_t, y_t) of the drone and the time index t into the network, denoted Q^π_θ with NN parameters θ. Four output nodes directly represent the Q-values of the action space. The neural network was implemented using Google's TensorFlow library.

The basic procedure of the NN Q-learning algorithm is the same as in the table-based approach described in the previous section IV-A. However, during training the weights of the network are iteratively updated based on the reward signal such that the output Q-values better represent the achieved reward, using the update rule (11). To avoid the divergence and oscillations typically associated with NN-based Q-learning, the training process makes use of the replay memory and target network improvements described in [13].
V. SIMULATION

A straightforward simulated environment was set up to evaluate the Q-learning algorithm. The state space S = {x, y, t} was chosen to contain the position of the drone and the time. For simplicity, the available actions for the drone were limited to movement in four directions, A = {up, right, down, left}, within the plane of constant altitude H = 20. It follows that there are four Q-values representing the action space for each position in the grid and each time index, as shown in figure 2 exemplarily for one position and time index.

A. Environment

The simulated environment, as depicted in figure 2, is based on a 15 by 15 grid world which is populated at initialization with two static users at ground level and a cuboid obstacle with height equal to the constant altitude of the drone, standing on a 2 by 4 ground plane. The initial and final position of the UAV are set to the lower left corner, (x_0, y_0) = (x_f, y_f) = (0, 0).

The obstacle obstructs the LOS connection between users and UAV BS in part of the area. The signal strength in the shadowed part, shown in gray in figure 2, is reduced by a fixed factor of β_shadow = 0.01. After initialization of the environment and placement of users and obstacle, a shadowing map for the whole area is computed using ray tracing, which is then used as a lookup table during the learning process. In addition, random samples drawn at each new time index from a Rayleigh-distributed random variable X_Rayleigh modeling small-scale fading are used to compute the current information rate for each user according to the channel model equations (3) and (5). The resulting sum rate forms the basis of the reward signal.
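A minimal Python sketch of such an environment is given below; the class interface, the layout of the shadowing map and the omission of the reward-shaping terms of Section V-B are our simplifications, not the authors' implementation.

```python
import numpy as np

class UAVGridEnv:
    """Simplified 15x15 grid-world sketch: static users, a precomputed
    boolean shadowing map (True where the obstacle blocks the LOS to a
    user), episodes of fixed length T, reward = achieved sum rate."""

    def __init__(self, shadow_map, users, sum_rate_fn, size=15, T=50):
        self.shadow_map = shadow_map      # shape (size, size, n_users)
        self.users = users                # [(a_k, b_k), ...] user positions
        self.sum_rate_fn = sum_rate_fn    # e.g. sum_rate() sketch from Sec. II
        self.size, self.T = size, T
        self.moves = [(0, 1), (1, 0), (0, -1), (-1, 0)]  # up, right, down, left

    def reset(self):
        self.x, self.y, self.t = 0, 0, 0  # start in the lower left corner
        return (self.x, self.y, self.t)

    def step(self, action):
        dx, dy = self.moves[action]
        # Keep the drone on the grid (boundary penalty omitted here).
        self.x = int(np.clip(self.x + dx, 0, self.size - 1))
        self.y = int(np.clip(self.y + dy, 0, self.size - 1))
        self.t += 1
        # The shadowing-map lookup replaces ray tracing at run time;
        # fresh Rayleigh fading is drawn inside the rate computation.
        shadowed = self.shadow_map[self.x, self.y]
        reward = self.sum_rate_fn((self.x, self.y), self.users, shadowed)
        return (self.x, self.y, self.t), reward, self.t >= self.T
```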
B. Learning Parameters

The main component of the reward signal is the achieved sum rate between users and BS. An additional negative reward is added if the action chosen by the agent would lead the UAV to step outside the 15 by 15 grid. A third component is added by a safety check that activates when the UAV fails to make the decision to return to the landing position before the maximum flying time T = 50 is reached. The safety system then forces the UAV to return while awarding a negative reward for each necessary activation of the system.

Except when the safety system is activated, movement decisions are made based on the ε-greedy policy described in IV-A. The probability ε for random actions is exponentially decreased over the learning time with decay constant λ_action = 14 for the NN and table-based approach alike.

No explicit rules exist for choosing the learning parameters and the parameters of the update rule (11) in general, which is why they have to be found through a combination of heuristics and search over the parameter space. For the table-based approximation, a combination of constant learning rate α_table = 0.3 and n_table = 800,000 learning episodes was selected. As the goal in our scenario, independent of the approach, is to maximize the sum rate over the whole flying time, the discount factor was set to γ = 0.99 to make the agent focus on long-term reward for both approaches.

The learning rate in the update rule (11) for the NN-based approach is set to α_nn = 1. Instead, the learning speed during NN training is controlled by the gradient descent step size, which is exponentially decayed with decay constant λ_gradient = 5 over the whole training time from a value of 0.005 to 0.00005. A number of n_nn = 27,000 learning episodes proved sufficient for the training of the NN.
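For illustration, the reward composition and the exploration decay described in this subsection could look as follows; the penalty magnitudes and the exact functional form of the ε schedule are not given in the paper and are therefore placeholders.

```python
import numpy as np

def epsilon(episode, n_episodes, lam_action=14.0):
    """Exponentially decaying exploration probability with decay constant
    lambda_action = 14; the normalized-exponential shape is an assumption."""
    return float(np.exp(-lam_action * episode / n_episodes))

def shaped_reward(sum_rate, left_grid, safety_activated,
                  out_penalty=-1.0, safety_penalty=-1.0):
    """Reward components of Section V-B: achieved sum rate, a negative
    reward for trying to step outside the 15x15 grid, and a negative
    reward for each activation of the return-to-landing safety system.
    The penalty values are placeholders."""
    return (sum_rate
            + (out_penalty if left_grid else 0.0)
            + (safety_penalty if safety_activated else 0.0))
```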
VI. RESULTS

The final trajectory learned by the table-based approach is depicted in figure 2, while the development of the resulting sum rate per episode during learning is shown in figure 3 for both approaches. It is important to note that the number of episodes in figure 3 is clipped, due to the fact that the table-based solution only converged after n = 800,000 episodes in comparison to n = 27,000 for Q-net. The NN approximator therefore shows a much higher training data efficiency, mainly because training data can be reused in the NN training. The sum rate shows a steep increase in roughly the first third of the learning phase, when the rough layout of the trajectory is learned. Exploration slows down in the later phases of the learning process, which means that only details in the trajectory change and the absolute impact on the sum rate is consequently small.

The final trajectory shows that the agent is able to infer information about the network topology and environment from the reward signal. Both approaches, the table-based and the NN approximator, converge to a trajectory with the same characteristics. Specifically, the agent's behavior shows that it learned the following:
• The UAV reaches the maximum cumulative rate point between the two users on a short and efficient path.
• It avoids flying through the shadowed area, keeping the sum rate high during the whole flying time.
• While the action space does not allow for hovering in one position, the drone learns to circle around the maximum cumulative rate point.
• The agent decides to return to its landing position in time to avoid crashing, and does so on an efficient trajectory.

In this simple environment, both approaches are able to find efficient trajectories in reasonable computation time. This changes for larger state spaces. Evaluating both approaches in a 30 by 30 grid environment with four randomly placed users and obstacles each showed that table-based learning is not able to find a trajectory outperforming random movement decisions within a realistic computation time. In the same environment, Q-net converges to a high sum rate trajectory within n_NN = 30,000 training episodes and a computation time of about one hour on a basic office computer.

VII. DISCUSSION

A. Summary

We have introduced a novel Q-learning system to directly make movement decisions for a UAV BS serving multiple users. The UAV acts as an autonomous agent in the environment to learn the trajectory that maximizes the sum rate of the transmission over the whole flying time, without the need for explicit information about the environment. We have formulated a maximization problem for the sum rate, which we solved iteratively by approximating the Q-function.

Our simulation has shown that the agent is able to learn the network topology and infer information about the environment to find a trajectory that maximizes the sum rate and lets the UAV autonomously return to its landing spot within the flying time limit. Comparing table-based and neural network approximators for the Q-function showed that using a table is not feasible for large state spaces, whereas training a NN provides the necessary scalability and proved to be more efficient, using less training data than table-based Q-learning.

B. Limitations and Future Work

The relatively high number of learning episodes needed to obtain the described results shows a limitation of choosing a Q-learning approach in comparison to the methods of previous works. This is a consequence of the generality of Q-learning and of the approach avoiding any assumptions about the environment. Integrating even a coarse model of the environment with an alternative model-based RL method would result in faster convergence, but would also entail a loss in learning universality. The long learning time is put into perspective by the fact that the main training can be completed offline, based on the prior distribution, before a shorter adaptation phase to the true setting. Future work will include considerations of dynamically changing environments, as well as a more detailed look at real-world constraints such as the energy efficiency of the learned trajectory.
Fig. 2. The final trajectory for the table-based approach after completed episode n_table = 800,000, depicted in the simulation environment with two users and one obstacle. The gray area is in the shadow of the obstacle. As a visualization of the table, the four Q-values, one for each action, are shown for the start position (0, 0). At time index t = 0, the Q-value for the action 'up' was learned to promise the highest future return.

Fig. 3. Sum information rate between UAV BS and users per episode over the learning time, comparing table-based and NN approximators of the Q-function. The plot only shows a clipped range of learning episodes, as the table-based solution converges after n_table = 800,000 episodes, whereas Q-net only needs n_NN = 27,000 episodes to come to a similar solution.

REFERENCES

[1] K. Namuduri, "Flying cell towers to the rescue," IEEE Spectrum, vol. 54, no. 9, pp. 38–43, Sep. 2017.
[2] M. Alzenad, A. El-Keyi, F. Lagum, and H. Yanikomeroglu, "3-D placement of an unmanned aerial vehicle base station (UAV-BS) for energy-efficient maximal coverage," IEEE Wireless Communications Letters, vol. 6, no. 4, pp. 434–437, Aug. 2017.
[3] J. Chen and D. Gesbert, "Optimal positioning of flying relays for wireless networks: A LOS map approach," in IEEE International Conference on Communications (ICC), 2017.
[4] R. Gangula, D. Gesbert, D.-F. Külzer, and J. M. Franceschi Quintero, "A landing spot approach to enhancing the performance of UAV-aided wireless networks," in IEEE International Conference on Communications (ICC) (accepted), 2018, Kansas City, MO, USA.
[5] R. Gangula, P. de Kerret, O. Esrafilian, and D. Gesbert, "Trajectory optimization for mobile access point," in Asilomar Conference on Signals, Systems, and Computers, 2017, Pacific Grove, CA, USA.
[6] Y. Zeng and R. Zhang, "Energy-efficient UAV communication with trajectory optimization," IEEE Transactions on Wireless Communications, vol. 16, no. 6, pp. 3747–3760, 2017.
[7] U. Challita, W. Saad, and C. Bettstetter, "Cellular-connected UAVs over 5G: Deep reinforcement learning for interference management," arXiv preprint arXiv:1801.05500, 2018.
[8] Y. Zeng, R. Zhang, and T. J. Lim, "Wireless communications with unmanned aerial vehicles: opportunities and challenges," IEEE Communications Magazine, vol. 54, no. 5, pp. 36–42, 2016.
[9] L. Gupta, R. Jain, and G. Vaszkun, "Survey of important issues in UAV communication networks," IEEE Communications Surveys & Tutorials, vol. 18, no. 2, pp. 1123–1152, 2016.
[10] C. Zhan, Y. Zeng, and R. Zhang, "Energy-efficient data collection in UAV enabled wireless sensor network," submitted to IEEE Wireless Communications Letters, available online at https://arxiv.org/abs/1708.00221, 2017.
[11] J. Gong, T.-H. Chang, C. Shen, and X. Chen, "Aviation time minimization of UAV for data collection over wireless sensor networks," arXiv preprint arXiv:1801.02799, 2018.
[12] B. Van der Bergh, A. Chiumento, and S. Pollin, "LTE in the sky: trading off propagation benefits with interference costs for aerial nodes," IEEE Communications Magazine, vol. 54, no. 5, pp. 44–50, 2016.
[13] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, p. 529, 2015.
[14] A. Carrio, C. Sampedro, A. Rodriguez-Ramos, and P. Campoy, "A review of deep learning methods and applications for unmanned aerial vehicles," Journal of Sensors, vol. 2017, no. 3296874, 2017.
[15] C. J. C. H. Watkins and P. Dayan, "Q-learning," Machine Learning, vol. 8, no. 3-4, pp. 279–292, 1992.
[16] R. S. Sutton and A. G. Barto, Introduction to Reinforcement Learning, 2nd ed. Cambridge, Massachusetts: MIT Press, 2017.
