Tradeoff Between Age of Information and Operation Time For UAV Sensing Over Multi-Cell Cellular Networks
This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/TMC.2023.3267656
Abstract—Unmanned aerial vehicles (UAVs) have significant potential for sensing applications in future cellular networks due to their extensive coverage and flexible deployment. In this paper, we consider a multi-cell cellular network with a cellular-connected UAV, which senses data with onboard sensors and uploads the sensory data to ground base stations (BSs). To evaluate the freshness of sensory data, we employ the concept of age of information (AoI), defined as the time elapsed since the latest successful transmission of sensory data. A lower AoI implies fresher sensory data, but achieving it may require a longer UAV operation time. To balance this tradeoff, we aim to minimize the weighted sum of the operation time and the total AoI of the UAV by jointly optimizing transmission scheduling, BS association, and the UAV trajectory. The problem is formulated as a mixed-integer nonlinear programming (MINLP) problem, which is difficult to solve due to the time-varying propagation channels. To this end, we first characterize the average communication performance with statistical channel information. We then develop a search algorithm that obtains the optimal solution by exploiting the structure of the optimal solution together with convex optimization techniques, and a low-complexity Double Graph based Algorithm (DGA) that obtains a suboptimal solution. Then, to account for site-specific performance and make fast online decisions, we propose a Deep reinforcement Learning Algorithm (DLA). Compared to DGA, DLA can adapt to the specific local environment and obtains a solution more rapidly once the training process is completed. Simulation results show that the proposed algorithms outperform the benchmarks by about 30% and achieve a flexible tradeoff between the operation time and AoI of UAV sensing, which is not achievable when only one objective is considered.
Index Terms—Multi-cell cellular network, UAV sensing, operation time, age of information (AoI).
© 2023 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: UNIVERSITE DE LA MANOUBA. Downloaded on May 04,2023 at 13:59:11 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for publication in IEEE Transactions on Mobile Computing.
first scenario, the UAVs serve as aerial communication platforms, such as data collectors or data fusion centers, to collect sensory data generated from ground sensor nodes. The work in [25] studied the age-optimal data collection problem for UAV-enabled WSNs, where the AoI was defined as a weighted sum of the sensor nodes' data uploading time and the corresponding flight time. Fine-grained trajectory planning for multiple UAVs was investigated in [26] to minimize the maximum flight time of the UAVs for data collection. AoI-optimal trajectory planning in UAV-assisted wireless sensor networks was studied in [27], using a framework that consists of a clustering module and a neural trajectory solver. The work in [28] proposed a time-efficient data collection scheme, where sensory data from multiple ground devices were uploaded to the UAV via uplink non-orthogonal multiple access (NOMA). The work in [29] studied the minimum UAV deployment problem to find the data collection tours for multiple UAVs to collect data from ground devices within a delay threshold. However, UAVs were only considered as data collection platforms in the above works.

The other emerging scenario is that UAVs equipped with various sensors act as aerial nodes that directly provide wireless sensing support from the sky, such as image/video surveillance. The work in [30] studied UAV-enabled surveillance of uneven surfaces to achieve maximal compact coverage, where a centralized algorithm and two distributed algorithms were proposed. The work in [19] minimized the maximum time spent by the UAVs for disaster area surveillance by developing an approximation algorithm. The work in [31] developed a UAV-assisted surveillance framework based on random walks that accounts for the battery constraints of the UAVs. The deployment of UAVs for anisotropic monitoring tasks was investigated in [32], where the anisotropy of the monitoring angle was taken into account. The work in [33] studied UAV-assisted multi-task allocation for mobile crowd sensing to maximize sensing coverage with deep reinforcement learning. However, air-to-ground sensory data transmission to BSs was not captured in the above works.

Cellular-connected UAVs have been increasingly considered for UAV sensing due to their operability and applicability to UAV operations over wide areas [34]. In this setting, UAVs act as aerial users in cellular networks, and the sensory data are transmitted to ground BSs over high-speed 5G wireless infrastructure. AoI was minimized in [11] by jointly designing the sensing time, transmission time, UAV trajectory, and task scheduling. The work in [12] considered underlay UAV-to-device communications and studied the AoI minimization problem over a cellular internet of UAVs via trajectory design. AoI-driven quality of service (QoS) provisioning schemes over UAV sixth generation (6G) multimedia mobile networks were proposed in [13] to efficiently support massive ultra-reliable and low-latency communications. The work in [14] studied delay-sensitive, energy-efficient UAV crowdsensing to maximize the data collection ratio from PoIs while keeping data fresh and minimizing the energy consumption of all UAVs. The work in [15] studied a framework of image surveillance UAVs for relay communication between ground users and a remote BS. However, the aforementioned works only considered a single-cell network. In this work, different from the aforementioned works, we study UAV sensing over multi-cell cellular networks in urban areas, where a flexible tradeoff between AoI and operation time for UAV sensing is investigated.

3 SYSTEM MODEL
In this paper, we consider a multi-cell cellular network-enabled UAV sensing scenario. It consists of a target sensing region $\hat{\mathcal{S}} \subseteq \mathbb{R}^{2\times 1}$, $M > 1$ ground BSs denoted by the set $\mathcal{M} = \{s_1, \dots, s_M\}$, and one UAV equipped with sensors and communication devices. The UAV collects the required data (e.g., for aerial surveillance) from $\hat{\mathcal{S}}$ with its sensors at each time instant [8], and the sensing data are transmitted to the ground BSs for further processing or delivery. The UAV is assumed to fly at a constant altitude $H$, since frequent descending and ascending is energy-inefficient [35]. The horizontal location of the UAV at time instant $t$ is denoted by $\{\mathbf{u}(t) \mid \mathbf{u}(t) \in \mathbb{R}^{2\times 1}, 0 \le t \le T\}$. Here $T$ denotes the total time horizon of the UAV flight, also referred to as the UAV operation time (or mission completion time). All BSs are assumed to have the same height $H_G$, with $H_G \ll H$, and $\mathbf{g}_m \in \mathbb{R}^{2\times 1}$ denotes the horizontal coordinate of BS $s_m$, $1 \le m \le M$. There are $K$ important target locations (e.g., well-known attractions for aerial filming) in the sensing region $\hat{\mathcal{S}}$ that must be reached during the flight [36], denoted by $\mathcal{K} = \{\rho_1, \dots, \rho_K\}$. Denote $\mathbf{w}_k \in \mathbb{R}^{2\times 1}$ as the horizontal coordinate of $\rho_k$, $1 \le k \le K$.

The mission of the UAV is to sense and transmit data over region $\hat{\mathcal{S}}$ from an initial location $\mathbf{u}_I$ to a final location $\mathbf{u}_F$, while visiting all important target locations in $\mathcal{K}$. For example, in aerial filming the UAV films not only the target locations in $\mathcal{K}$ but also the region along its flight [15]; at the end of the mission, the whole aerial film can be generated. Another example is aerial virtual reality (VR) applications with a cellular-connected UAV [37]. There, VR users can experience an aerial-view flight along the target locations in $\mathcal{K}$, and the UAV's view is transmitted to the BSs all the time. Thus, we have the target constraint
$$\{\mathbf{w}_k \mid 1 \le k \le K\} \subseteq \{\mathbf{u}(t) \mid t \in [0, T]\}. \tag{1}$$
In practice, $\mathbf{u}_I$ and $\mathbf{u}_F$ may correspond to different charging stations or to the target locations of the UAV's pre- and post-mission. As such, the UAV operation time $T$ is the total time for the UAV to fly from $\mathbf{u}_I$ to $\mathbf{u}_F$ while visiting all important target locations in $\mathcal{K}$, and $T$ is a design variable in this paper. We assume that the UAV battery capacity is sufficient for the UAV to visit all target points. Note that our design framework can also be extended to the multi-UAV scenario with different missions, by separating different UAVs in operation time or flying altitude to avoid collision. Let $\mathbf{v}(t) \triangleq \dot{\mathbf{u}}(t)$ be the UAV velocity at time $t$, with $\|\mathbf{v}(t)\| \le V_{\max}$, $\forall t \in [0, T]$, where $V_{\max}$ is the maximum speed due to mechanical limitations. The distance between BS $s_m$ and the UAV at time $t$ is calculated as $d_m(t) = \sqrt{(H - H_G)^2 + \|\mathbf{u}(t) - \mathbf{g}_m\|^2}$.

3.1 Channel Model
Through A2G channels, the data sensed by the UAV are transmitted to the ground BSs. The UAV is assumed to
be assigned to a dedicated subchannel without inter-cell interference [37], [38]. As we consider UAV sensing in urban environments, buildings affect radio propagation by blocking LoS paths. We assume that buildings are distributed over the considered area, with locations and heights that follow the model suggested by the International Telecommunication Union (ITU) [39]. With a given building distribution, the UAV has an unobstructed LoS link with BS $s_m$ at time $t$ if no building intersects the straight line between horizontal location $\mathbf{g}_m$ at height $H_G$ and $\mathbf{u}(t)$ at height $H$. Otherwise, if at least one building intersects this line, the A2G channel is non-line-of-sight (NLoS).

We assume that complete channel state information (CSI) for the A2G channels in the considered area is not known a priori. However, the UAV can estimate the instantaneous CSI either by receiving signals from the BSs within its communication coverage or by leveraging existing handover mechanisms with continuous reference signal received power (RSRP) measurements [23]. We characterize the A2G channels by both large-scale and small-scale fading. Denote $\beta_m(t)$ as the large-scale channel gain at time $t$ between BS $s_m$ and the UAV, which has LoS and NLoS components. Specifically, $\beta_m(t) = \beta_0 d_m(t)^{-\alpha}$ for LoS links, while $\beta_m(t) = \mu\beta_0 d_m(t)^{-\alpha}$ for NLoS links as in [22], [40]. Here $\alpha \ge 2$ denotes the path loss exponent, $\beta_0$ is the channel gain at the reference distance of 1 m, and $\mu < 1$ denotes the additional attenuation factor caused by NLoS propagation. The small-scale fading $\tilde{h}_m(t)$ of the A2G link between the UAV and $s_m$ at time $t$ can be modeled as Rayleigh fading in the NLoS case and as Rician fading with Rician factor $K_c$ in the LoS case [41], where $\mathbb{E}[|\tilde{h}_m(t)|^2] = 1$.

Typically, ground BSs are equipped with downtilted antennas to ensure good performance for ground users. We adopt a practical BS antenna radiation pattern with downtilt angle $\phi_D \in [0^\circ, 90^\circ]$, and each BS is assumed to be equipped with a vertical uniform linear array (ULA) with $n_0$ elements [16], [42]. The power gain of each antenna element at time $t$ along the direction between the UAV and BS $s_m$ is then calculated by
$$G_e(\phi_m(t)) \triangleq -\min\left\{12\left(\frac{\phi_m(t)}{\mathrm{HPBW}_v}\right)^2, G_0\right\},$$
where $G_e(\phi_m(t))$ is measured in dB and $\phi_m(t) \triangleq \arcsin\left(\frac{H - H_G}{d_m(t)}\right)$ is the elevation angle between BS $s_m$ and the UAV at time $t$. $\mathrm{HPBW}_v$ and $G_0$ denote the half-power beamwidth and the antenna nulls threshold, respectively. As derived in [16], the array factor at time $t$ for the ULA on BS $s_m$ is calculated by
$$A_f(\phi_m(t)) = \frac{1}{\sqrt{n_0}}\,\frac{\sin\left(\frac{n_0\pi}{2}\left(\sin\phi_m(t) - \sin\phi_D\right)\right)}{\sin\left(\frac{\pi}{2}\left(\sin\phi_m(t) - \sin\phi_D\right)\right)}.$$
As a result, the overall antenna gain is given by $G_m(t) = 10^{G_e(\phi_m(t))/10} A_f(\phi_m(t))^2$. The baseband equivalent channel between the UAV and $s_m$ at time $t$, denoted by $h_m(t)$, can then be expressed as $h_m(t) = \sqrt{G_m(t)\beta_m(t)}\,\tilde{h}_m(t)$.

3.2 Data Transmission
Define $x_m(t) \in \{0, 1\}$ as the data transmission scheduling indicator, where $x_m(t) = 1$ if the UAV transmits the sensed data to BS $s_m$ at time $t$ and $x_m(t) = 0$ otherwise. We assume that at most one ground BS is scheduled for reception at each time instant $t$, so that $\sum_{m=1}^{M} x_m(t) \le 1$, $t \in [0, T]$. The BS association is thus also implied by the transmission scheduling indicator. Denote $R_m(t)$ as the achievable rate from the UAV to BS $s_m$ if scheduled at time $t$; then $R_m(t)$ is calculated as
$$R_m(t) = B\log_2\left(1 + \frac{P|h_m(t)|^2}{\sigma^2}\right), \tag{2}$$
where $\sigma^2$ and $B$ denote the noise power and channel bandwidth, respectively, and $P$ denotes the UAV's maximum transmit power.

We consider real-time transmission with a just-in-time transmission policy, where data must be transmitted immediately after they are generated by the sensors on the UAV. In particular, we assume that the UAV generates sensory data at a sensing rate of $R_s$ at each time instant. The UAV must finish the data transmission process concurrently with the data sensing process to keep the sensory data up to date for further processing at the BSs [43]. In practice, $R_s$ is determined by the size of the UAV sensing data, which depends on intrinsic properties such as the resolution of the sensors on the UAV. As a result, the transmission rate of the UAV should be no less than its sensing rate, i.e.,
$$\sum_{m=1}^{M} x_m(t) R_m(t) \ge R_s, \quad \forall t. \tag{3}$$
We assume that the above immediate sensory data transmission succeeds whenever (3) is satisfied; otherwise, the immediate data transmission fails. The corresponding problem for non-real-time transmission is left for future work. However, the A2G channel quality depends strongly on the distance between the UAV and the BSs, as well as on the channel randomness caused by small-scale fading and building blockages. Thus, the constraint in (3) may not always be satisfied.

To tackle this issue, we adopt the AoI to measure how timely the transmission of sensory data is, i.e., the freshness of the sensory data [44]. It is worth noting that the sensory data are only generated by the sensors or cameras on the UAV (e.g., for video surveillance). Data transmitted immediately after generation are successfully received whenever (3) is satisfied. Denote $\lambda(t)$ as the latest time at which the immediate data transmission of the UAV is successfully received at the BSs, i.e., $\lambda(t) \triangleq \max_\tau\left\{\tau \mid \sum_{m=1}^{M} x_m(\tau)R_m(\tau) \ge R_s, \tau \in [0, t]\right\}$. Thus, for each time instant $t$, the duration $t - \lambda(t)$ is the time elapsed since the freshest information was successfully received. It specifies the age of that information, and we therefore refer to it as the AoI. As such, denote the AoI of the UAV at time $t$ as $A(t)$, defined as
$$A(t) \triangleq t - \lambda(t). \tag{4}$$
Thus, we have $\lambda(t) \le t$ and $A(t) \ge 0$, $\forall t$. If $\lambda(t) = t$, the UAV transmits the sensory data on time and $A(t) = 0$; otherwise, $A(t)$ increases with time. Therefore, the total AoI throughout the UAV mission can be calculated as $\int_0^T A(t)\,dt$. In this paper, we aim to minimize the total AoI to reduce the overall effect of disconnection events and durations on the freshness of sensory data.
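The per-instant model of Sections 3.1 and 3.2 can be made concrete with a small Python sketch. It evaluates the distance, the elevation angle, the antenna gain, one small-scale fading realization, the rate (2), and a discretized total AoI. This is an illustration under stated assumptions, not the authors' simulator:
- All function names are hypothetical.
- The LoS component of the Rician fading is given zero phase.
- The elevation angle is compared with $\mathrm{HPBW}_v$ in degrees.
- The AoI integral is approximated by a left Riemann sum on a uniform grid, taking $\lambda = 0$ before the first successful instant, a case the definition of $\lambda(t)$ does not cover.

```python
import math
import random


def distance(u, g, H, H_G):
    """3D UAV-BS distance d_m(t) from horizontal coordinates u(t) and g_m."""
    return math.sqrt((H - H_G) ** 2 + (u[0] - g[0]) ** 2 + (u[1] - g[1]) ** 2)


def elevation(d, H, H_G):
    """Elevation angle phi_m(t) = arcsin((H - H_G) / d_m(t)) in radians."""
    return math.asin((H - H_G) / d)


def antenna_gain(phi, phi_D, n0, hpbw_deg, G0_dB):
    """Linear BS antenna gain G_m = 10**(G_e/10) * A_f**2 at elevation phi (radians)."""
    # Element pattern G_e in dB; phi is compared with HPBW_v in degrees (assumption).
    Ge_dB = -min(12.0 * (math.degrees(phi) / hpbw_deg) ** 2, G0_dB)
    x = math.sin(phi) - math.sin(phi_D)
    den = math.sin(math.pi / 2 * x)
    if abs(den) < 1e-12:
        af = math.sqrt(n0)  # limit of A_f when sin(phi) equals sin(phi_D)
    else:
        af = math.sin(n0 * math.pi / 2 * x) / (math.sqrt(n0) * den)
    return 10 ** (Ge_dB / 10) * af ** 2


def small_scale_fading(los, Kc, rng):
    """Sample h_tilde with E|h_tilde|^2 = 1: Rician(Kc) if LoS, Rayleigh otherwise."""
    # CN(0, 1) scattered part: real and imaginary parts each N(0, 1/2).
    s = math.sqrt(0.5)
    scatter = complex(rng.gauss(0.0, s), rng.gauss(0.0, s))
    if not los:
        return scatter
    # Deterministic LoS part with zero phase (illustrative assumption).
    return math.sqrt(Kc / (Kc + 1)) + math.sqrt(1 / (Kc + 1)) * scatter


def rate(P, G, beta, h_tilde, sigma2, B):
    """Achievable rate (2) with h_m = sqrt(G_m * beta_m) * h_tilde."""
    gain = G * beta * abs(h_tilde) ** 2  # |h_m(t)|^2
    return B * math.log2(1 + P * gain / sigma2)


def total_aoi(success, dt):
    """Left Riemann sum of int_0^T A(t) dt on the grid t_i = i * dt.

    success[i] tells whether constraint (3) holds at t_i; lambda is taken
    as 0 before the first success (assumption).
    """
    last = 0.0
    total = 0.0
    for i, ok in enumerate(success):
        t = i * dt
        if ok:
            last = t  # lambda(t_i) = t_i, so A(t_i) = 0
        total += (t - last) * dt
    return total
```

For example, `total_aoi([True, False, False, True], 1.0)` returns `3.0`. The AoI grows as 0, 1, 2 during the outage and resets to 0 when (3) holds again. A finer grid brings the sum closer to the continuous integral.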
3.3 Problem Formulation
Note that a smaller AoI means fresher sensory data. To ensure the freshness of sensory data, the UAV should transmit data to the ground BSs as quickly as possible over a good-quality channel, so that the AoI is kept small. On the other hand, the UAV operation time $T$ should be minimized to increase operational efficiency. Intuitively, a larger $T$ gives the UAV more flexibility to move closer to the ground BSs for better channel quality, which leads to timely transmission of sensory data and a smaller AoI. Thus, a fundamental tradeoff exists between minimizing the UAV operation time $T$ and minimizing the total AoI $\int_0^T A(t)\,dt$. To balance this tradeoff, we associate each of them with a weighting factor and minimize the weighted sum. The optimization variables are the UAV trajectory $\{\mathbf{u}(t)\}$, the transmission scheduling $\{x_m(t)\}$, and the UAV operation time $T$, subject to the target location constraint and the UAV's mechanical constraints. The optimization problem is formulated as
$$
\begin{aligned}
\text{(P1)}:\ \min_{T,\{\mathbf{u}(t)\},\{x_m(t)\}}\ \ & \theta T + (1-\theta)\int_0^T A(t)\,dt \\
\text{s.t.}\ \ & x_m(t) \in \{0,1\},\ \forall m, t, && (5)\\
& \sum_{m=1}^{M} x_m(t) \le 1,\ \forall t, && (6)\\
& \{\mathbf{w}_k \mid 1 \le k \le K\} \subseteq \{\mathbf{u}(t) \mid t \in [0,T]\}, && (7)\\
& \|\mathbf{v}(t)\| \le V_{\max},\ \forall t, && (8)\\
& \mathbf{u}(0) = \mathbf{u}_I,\ \mathbf{u}(T) = \mathbf{u}_F, && (9)
\end{aligned}
$$
where $\theta$ and $1-\theta$, $0 \le \theta \le 1$, denote the weights of the operation time and the total AoI of the UAV, respectively. The tradeoff between $T$ and $\int_0^T A(t)\,dt$ can be obtained by solving problem (P1) for different values of $\theta$. With a large $\theta$, the UAV is expected to fly through the target points while almost ignoring the AoI. Otherwise, the UAV tends to stay connected to the BSs for better channel quality, leading to a smaller AoI.

Note that the optimal solution to problem (P1) is difficult to obtain, since (P1) involves both continuous optimization variables $\{\mathbf{u}(t)\}$ and binary variables $\{x_m(t)\}$. Furthermore, (P1) is an MINLP that contains the complicated target location constraint and an integral whose upper limit $T$ is itself a design variable. In addition, an accurate channel coefficient $h_m(t)$ is usually unavailable due to channel randomness and frequent switching between LoS and NLoS connections during a flight.

To address these difficulties, we first consider the average communication performance with statistical channel information. Based on the expected communication rate over the probabilistic LoS channel model, we obtain a statistically favorable formulation in Section 4. By analyzing the structure of the optimal solution, we develop a search algorithm that obtains the optimal solution. We also develop a low-complexity DGA that obtains a suboptimal solution by employing shortest path and TSP path techniques. Next, in Section 5, we address the site-specific performance for a specific local environment, where the UAV acts as an agent that interacts with the environment through CSI measurements. We propose a DLA to learn from the specific local environment and make fast decisions. In particular, the rate measurements sampled by the UAV are used as input. A DDQN then learns a function that maps the input local environment to the output flying decisions. Note that the DDQN-based offline model training can be conducted at the UAV control center or at the BSs, which usually have more powerful computation capability. The well-trained learning model can then be transferred to and executed on the UAV to perform model inference locally and make fast decisions.

4 PROPOSED GRAPH BASED ALGORITHM
In this section, we first characterize the average communication performance with statistical channel information. We then reformulate the problem into a more tractable form by analyzing the structure of the optimal solution. Finally, we obtain the optimal and low-complexity suboptimal solutions by employing graph-based algorithms.

4.1 Problem Reformulation
We adopt the probabilistic LoS channel model [21] to characterize the average communication performance under building blockages over a large number of similar communication environments, whose statistics are assumed to be known in advance. In particular, the LoS probability at time $t$ between ground BS $s_m$ and the UAV is denoted by
$$P^{L}_m(t) = \frac{1}{1 + a\exp\left(-b\left[|\phi_m(t)| - a\right]\right)},$$
where $a$ and $b$ are two environment-dependent parameters and $\phi_m(t)$ is the elevation angle at time $t$. As a result, $\beta_m(t) = \beta_0 d_m(t)^{-\alpha}$ for an LoS link with probability $P^L_m(t)$, while $\beta_m(t) = \mu\beta_0 d_m(t)^{-\alpha}$ for an NLoS link with probability $P^N_m(t) \triangleq 1 - P^L_m(t)$. As such, we obtain $\mathbb{E}[|h_m(t)|^2] = G_m(t)\hat{P}^L_m(t)\beta_0 d_m(t)^{-\alpha}$. Here $\hat{P}^L_m(t) \triangleq P^L_m(t) + (1 - P^L_m(t))\mu$ is a regularized LoS probability that accounts for the NLoS attenuation factor $\mu$.

Since the channel gains $h_m(t)$ are random variables, the achievable rates $R_m(t)$ are also random variables. Thus, we are interested in the expected achievable rate $\mathbb{E}[R_m(t)]$. Its closed-form expression is difficult to obtain because its probability distribution is hard to derive. To address this issue, we approximate it based on the following result from [45].

Proposition 1. ([45, Theorem 1]). The approximation
$$\mathbb{E}\left[\log_2\left(1 + \frac{X}{Y}\right)\right] \approx \log_2\left(1 + \frac{\mathbb{E}[X]}{\mathbb{E}[Y]}\right)$$
holds, where $X$ and $Y$ are independent random variables, $X \ge 0$, $Y > 0$.

Let $X = P|h_m(t)|^2$ and $Y = \sigma^2$. By applying Proposition 1 to $\mathbb{E}[R_m(t)]$, we obtain
$$\mathbb{E}[R_m(t)] \approx B\log_2\left(1 + \frac{P\,\mathbb{E}[|h_m(t)|^2]}{\sigma^2}\right) = B\log_2\left(1 + \frac{P G_m(t)\hat{P}^L_m(t)\beta_0}{\sigma^2 d_m(t)^{\alpha}}\right).$$
Note that $\hat{P}^L_m(t)$ is still a complicated function of the UAV trajectory $\mathbf{u}(t)$ and is difficult to handle. By applying the homogeneous approximation approach as in [40], [41], we let $\hat{P}^L_m(t) \approx \bar{P}^L_m$ and $G_m(t) \approx \bar{G}_m$. The constants $\bar{P}^L_m$ and $\bar{G}_m$ are the average values along the TSP path that visits each location in $\mathcal{K}$ exactly once from $\mathbf{u}_I$ to $\mathbf{u}_F$, as required by the target location constraint (7). In this way, a satisfactory approximation accuracy can be guaranteed [41]. Thus, we have
$$\mathbb{E}[R_m(t)] \approx B\log_2\left(1 + \frac{P\bar{G}_m\bar{P}^L_m\beta_0}{\sigma^2 d_m(t)^{\alpha}}\right) \triangleq \bar{R}_m(t),$$
and the UAV's total AoI can be approximated as $\int_0^T \left(t - \bar{\lambda}(t)\right) dt$, where $\bar{\lambda}(t) \triangleq \max_\tau\left\{\tau \mid \tau \in [0, t], \sum_{m=1}^{M} x_m(\tau)\bar{R}_m(\tau) \ge R_s\right\}$. As such, the transmission scheduling can be approximately obtained as
$$x_m(t) = \begin{cases} 1, & m = \arg\max_{m' \in \{1, 2, \dots, M\}} \bar{R}_{m'}(t),\\ 0, & \text{otherwise}, \end{cases} \tag{10}$$
and $\bar{\lambda}(t)$ can be written as $\bar{\lambda}(t) = \max\left\{\tau \mid \tau \in [0, t], \max_m \bar{R}_m(\tau) \ge R_s\right\}$.

It can be shown that the QoS requirement $\max_m \bar{R}_m(\tau) \ge R_s$ is equivalent to a constraint on the distance between the UAV and its closest BS, i.e., $\min_m \|\mathbf{u}(t) - \mathbf{g}_m\| \le \bar{d}_m$. Here
$$\bar{d}_m \triangleq \sqrt{\left(\frac{\gamma_m}{2^{R_s/B} - 1}\right)^{2/\alpha} - (H - H_G)^2},$$
and $\gamma_m = \frac{P\bar{G}_m\bar{P}^L_m\beta_0}{\sigma^2}$ denotes the reference signal-to-noise ratio (SNR) at 1 m. As a result, the UAV selects the closest BS for sensory data transmission at each time $t$. Note that this approximate scheduling solution only applies to the average communication performance. It is not suitable for the site-specific performance in a given local environment. Define the service area $\mathcal{A}_m \triangleq \{\mathbf{u} \mid \mathbf{u} \in \mathbb{R}^{2\times 1}, \|\mathbf{u} - \mathbf{g}_m\| \le \bar{d}_m\}$ for ground BS $s_m$, which is a disk on the horizontal plane centered at $\mathbf{g}_m$ with radius $\bar{d}_m$. The QoS requirement $\max_m \bar{R}_m(\tau) \ge R_s$ is always satisfied as long as the UAV's horizontal position lies in the region $\hat{\mathcal{A}} \triangleq \bigcup_{m=1}^{M} \mathcal{A}_m$. If the UAV is outside $\hat{\mathcal{A}}$, no immediate sensory data transmission takes place, and the AoI of the UAV increases.

We denote by $J$, $J \le M$, the number of service areas that the UAV flies over during the mission. The corresponding BS set is denoted as $\mathcal{J} = \{s_{\omega_1}, s_{\omega_2}, \dots, s_{\omega_J}\} \subseteq \mathcal{M}$, where $|\mathcal{J}| = J$ and $\omega_i \in \{1, \dots, M\}$ is the BS index in $\mathcal{M}$, $1 \le i \le J$. Denote $t^s_{\omega_i}$ and $t^l_{\omega_i}$ as the time instants at which the UAV enters and leaves service area $\mathcal{A}_{\omega_i}$, respectively. We obtain the visiting order of $\mathcal{J}$, $\boldsymbol{\pi} = (\pi_1, \pi_2, \dots, \pi_J)$, by rearranging $\{\omega_i\}$ in increasing order of $t^s_{\omega_i}$; $\boldsymbol{\pi}$ is therefore a permutation of $(\omega_1, \omega_2, \dots, \omega_J)$. Thus, we have $t^s_{\pi_i} \le t^l_{\pi_i} \le t^s_{\pi_{i+1}} \le t^l_{\pi_{i+1}}$, $1 \le i \le J-1$, and $\|\mathbf{u}(t) - \mathbf{g}_{\pi_i}\| \le \bar{d}_{\pi_i}$, $t \in [t^s_{\pi_i}, t^l_{\pi_i}]$. By the definition of AoI, $\bar{\lambda}(t) = t$ when $t \in [t^s_{\pi_i}, t^l_{\pi_i}]$, $1 \le i \le J$, and $\bar{\lambda}(t) = t^l_{\pi_{i-1}}$ when $t \in [t^l_{\pi_{i-1}}, t^s_{\pi_i}]$, $2 \le i \le J$. As a result, the total AoI of the UAV can be expressed as
$$\int_0^T \left(t - \bar{\lambda}(t)\right) dt = \sum_{i=1}^{J+1} \frac{\left(t^s_{\pi_i} - t^l_{\pi_{i-1}}\right)^2}{2}, \tag{11}$$
where we define $t^l_{\pi_0} \triangleq 0$ and $t^s_{\pi_{J+1}} \triangleq T$. Based on the above discussion, problem (P1) can be reformulated in the following equivalent form:
$$
\begin{aligned}
\text{(P2)}:\ \min_{T,\mathcal{J},\boldsymbol{\pi},\{\mathbf{u}(t)\},\{t^s_{\pi_i},t^l_{\pi_i}\}}\ \ & \theta T + (1-\theta)\sum_{j=1}^{J+1}\frac{\left(t^s_{\pi_j} - t^l_{\pi_{j-1}}\right)^2}{2} \\
\text{s.t.}\ \ & \mathcal{J} \subseteq \mathcal{M}, && (12)\\
& \|\mathbf{u}(t) - \mathbf{g}_{\pi_j}\| \le \bar{d}_{\pi_j},\ t \in [t^s_{\pi_j}, t^l_{\pi_j}],\ j = 1, \dots, J, && (13)\\
& t^s_{\pi_{j-1}} \le t^l_{\pi_{j-1}} \le t^s_{\pi_j} \le t^l_{\pi_j},\ 2 \le j \le J, && (14)\\
& \text{(7)–(9)}.
\end{aligned}
$$

Theorem 1. Problem (P2) is NP-hard.

Proof. We prove this by a reduction from the TSP, which is a well-known NP-hard problem [46]. Consider the special case of problem (P2) in which the service area of each BS is so large that the UAV lies in the service areas of all BSs throughout its flight. In this case, the UAV can transmit the freshest sensory data to any BS, and its total AoI is always zero. Problem (P2) then reduces to a UAV operation time minimization problem under the target location constraint (7), where the UAV must reach all the target locations in $\mathcal{K}$. To minimize the operation time, the UAV must fly at the maximum speed and visit each target location in $\mathcal{K}$ exactly once. Thus, this special problem is equivalent to the TSP with given initial and end locations [47]. That problem asks for the shortest route that visits each city exactly once, given a list of cities and the distance between each pair of cities. Add a dummy city whose distances to the initial and end locations are 0 and whose distances to all other cities are a sufficiently large value. With this dummy city, the TSP with given initial and end locations becomes equivalent to the standard TSP. Hence, problem (P2) is NP-hard.

In the following, we first analyze the structure of the optimal solution to (P2). Based on this structure, we derive both optimal and low-complexity suboptimal solutions by employing graph theory.

4.2 Proposed Solution
To gain more insight into problem (P2), we consider the special case $\mathcal{K} = \emptyset$, i.e., $K = 0$, where there are no important target locations that the UAV needs to visit. We call this special case of (P2) problem (P2-s). Define the UAV waypoints $\mathbf{u}^s_{\pi_j} \triangleq \mathbf{u}(t^s_{\pi_j})$ and $\mathbf{u}^l_{\pi_j} \triangleq \mathbf{u}(t^l_{\pi_j})$, $1 \le j \le J$. It is easy to prove by contradiction that the optimal UAV trajectory to (P2-s) can be assumed to consist of line segments connecting $\mathbf{u}_I, \mathbf{u}^s_{\pi_1}, \mathbf{u}^l_{\pi_1}, \dots, \mathbf{u}^s_{\pi_J}, \mathbf{u}^l_{\pi_J}, \mathbf{u}_F$, flown at the maximum speed. As such, with given $\mathcal{J}$ and $\boldsymbol{\pi}$, problem (P2-s) can be written as the following problem (P3), where $\mathbf{u}^l_{\pi_0} \triangleq \mathbf{u}_I$ and $\mathbf{u}^s_{\pi_{J+1}} \triangleq \mathbf{u}_F$:
$$
\begin{aligned}
\text{(P3)}:\ \min_{\{\mathbf{u}^s_{\pi_j},\mathbf{u}^l_{\pi_j}\}}\ \ & \frac{\theta}{V_{\max}}\left(\sum_{j=1}^{J}\left\|\mathbf{u}^l_{\pi_j} - \mathbf{u}^s_{\pi_j}\right\| + \sum_{j=1}^{J+1}\left\|\mathbf{u}^s_{\pi_j} - \mathbf{u}^l_{\pi_{j-1}}\right\|\right) + \frac{1-\theta}{2V_{\max}^2}\sum_{j=1}^{J+1}\left\|\mathbf{u}^s_{\pi_j} - \mathbf{u}^l_{\pi_{j-1}}\right\|^2 \\
\text{s.t.}\ \ & \left\|\mathbf{u}^s_{\pi_j} - \mathbf{g}_{\pi_j}\right\| \le \bar{d}_{\pi_j},\ \left\|\mathbf{u}^l_{\pi_j} - \mathbf{g}_{\pi_j}\right\| \le \bar{d}_{\pi_j},\ j = 1, \dots, J. && (15)
\end{aligned}
$$
Note that (P3) is a standard convex optimization problem, which can be solved with CVX [48]. As a result, the optimal solution to (P2-s) can be obtained by an exhaustive search over all possible subsets $\mathcal{J} \subseteq \mathcal{M}$ and all visiting orders $\boldsymbol{\pi}$ of each $\mathcal{J}$, solving (P3) for each to determine the minimum objective value. However, searching over all possible subsets $\mathcal{J}$ has an exponential complexity of $\mathcal{O}(2^M)$, which is infeasible for large values of $M$. Therefore, we propose an efficient suboptimal solution to (P2-s) based on a graph. With given BS locations $\{\mathbf{g}_m\}$ and corresponding service area radii $\{\bar{d}_m\}$, we define a weighted graph $\hat{\mathcal{G}}(\hat{\mathcal{V}}, \hat{\mathcal{E}}, \hat{w})$ in Definition 1, which leads to Proposition 2.
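The ingredients of the reformulation can be evaluated directly:
- the service-area radius $\bar{d}_m$;
- the (P3) objective, which combines the flight time with the closed-form AoI (11);
- the shortest-path bound used by the graph-based approach.

The following Python sketch is illustrative only. The function names are hypothetical, and the code assumes straight segments flown at $V_{\max}$ as in (P3). It evaluates (P3) for given waypoints rather than solving it; the paper uses CVX for that step. It uses a heap-based Dijkstra variant, which differs from the $\mathcal{O}(M^2)$ array implementation cited later. The edge weight coded here is the one given in Definition 1, i.e., the (P3) cost of one gap when every service area shrinks to its center.

```python
import heapq
import math


def service_radius(gamma_m, Rs, B, alpha, H, H_G):
    """Horizontal radius d_bar_m of service area A_m, or None if A_m is empty.

    From B*log2(1 + gamma_m / d**alpha) >= Rs, the 3D distance must satisfy
    d <= (gamma_m / (2**(Rs/B) - 1))**(1/alpha); removing the vertical
    offset H - H_G gives the horizontal radius.
    """
    d_max = (gamma_m / (2 ** (Rs / B) - 1)) ** (1 / alpha)
    r2 = d_max ** 2 - (H - H_G) ** 2
    return math.sqrt(r2) if r2 >= 0 else None


def p3_objective(theta, v_max, u_I, u_F, waypoints):
    """Objective of (P3) for fixed waypoints [(u_s_1, u_l_1), ..., (u_s_J, u_l_J)].

    Segments inside a service area (u_s_j -> u_l_j) only add flight time;
    each gap u_l_{j-1} -> u_s_j also adds AoI gap_time**2 / 2, as in (11).
    """
    inside = sum(math.dist(us, ul) for us, ul in waypoints)
    starts = [u_I] + [ul for _, ul in waypoints]  # u_l_{pi_0} = u_I
    ends = [us for us, _ in waypoints] + [u_F]    # u_s_{pi_{J+1}} = u_F
    gaps = [math.dist(a, b) for a, b in zip(starts, ends)]
    T = (inside + sum(gaps)) / v_max
    aoi = sum((g / v_max) ** 2 / 2 for g in gaps)
    return theta * T + (1 - theta) * aoi


def edge_weight(p, q, theta, v_max):
    """Edge weight of Definition 1 between two locations p and q."""
    d = math.dist(p, q)
    return theta / v_max * d + (1 - theta) / (2 * v_max ** 2) * d ** 2


def shortest_path(points, theta, v_max):
    """Heap-based Dijkstra on the complete graph over points.

    points[0] is the initial location, points[-1] the final location, and
    the points in between are BS locations g_m. Returns (weight, path).
    """
    n = len(points)
    best = [math.inf] * n
    prev = [None] * n
    best[0] = 0.0
    heap = [(0.0, 0)]
    while heap:
        cost, i = heapq.heappop(heap)
        if cost > best[i]:
            continue  # stale heap entry
        for j in range(n):
            if j == i:
                continue
            cand = cost + edge_weight(points[i], points[j], theta, v_max)
            if cand < best[j]:
                best[j], prev[j] = cand, i
                heapq.heappush(heap, (cand, j))
    path, k = [], n - 1
    while k is not None:
        path.append(k)
        k = prev[k]
    return best[-1], path[::-1]
```

As a toy example (units hypothetical), take $\theta = 0.5$ and $V_{\max} = 10$:
- A straight 100 m flight that never enters a service area costs 30.
- The same path crossing a service area between $(30, 0)$ and $(70, 0)$ costs 9.5, because the single 10 s outage becomes two 3 s outages.
- With a BS at $(50, 0)$, `shortest_path` returns weight 17.5 via that BS. This equals `p3_objective` with both waypoints placed at the BS, which is the zero-radius case behind the upper bound.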
Definition 1. $\hat{G}(\hat{V}, \hat{E}, \hat{w})$ is constructed as follows.
• $\hat{V} \triangleq \{\hat{v}_0, \hat{v}_1, \ldots, \hat{v}_M, \hat{v}_{M+1}\}$. We introduce a vertex $\hat{v}_m$ for the location of each BS $s_m$, i.e., $g_m$, $1 \le m \le M$, where $\hat{v}_0$ and $\hat{v}_{M+1}$ represent the initial and final locations, respectively.
• $\hat{E} \triangleq \{(\hat{v}_i, \hat{v}_j) \mid 0 \le i \ne j \le M+1\}$. For any two different vertices $\hat{v}_i, \hat{v}_j \in \hat{V}$, $i \ne j$, there exists an edge $(\hat{v}_i, \hat{v}_j)$.
• $\hat{w}: \hat{E} \to \mathbb{R}^+$ is a weight function. In particular,
$$\hat{w}(\hat{v}_i, \hat{v}_j) = \frac{\theta}{V_{\max}}\|g_i - g_j\| + \frac{1-\theta}{2V_{\max}^2}\|g_i - g_j\|^2, \quad 0 \le i \ne j \le M+1,$$
where $g_0$ and $g_{M+1}$ denote the initial and final locations, respectively.

Proposition 2. The optimal objective value of (P2-s) is upper bounded by the weight of the shortest path between vertices $\hat{v}_0$ and $\hat{v}_{M+1}$ over graph $\hat{G}$.

Proof. Consider a special case of (P2-s) with $\bar{d}_{\pi_j} = 0, \forall j$, which we call problem (P2-ss). Then $u^s_{\pi_j} = u^l_{\pi_j} = g_{\pi_j}, \forall j$. Due to Definition 1, it is easy to see that problem (P2-ss) is equivalent to finding the shortest path over graph $\hat{G}$, and thus the optimal value of (P2-ss) equals the weight of the shortest path $\vec{p}^*$ between vertices $\hat{v}_0$ and $\hat{v}_{M+1}$. The corresponding serving BS set $\mathcal{J}^*$ consists of the BS vertices on $\vec{p}^*$, while $\pi^*$ is obtained by arranging the vertices in $\mathcal{J}^*$ in the order in which they appear along $\vec{p}^*$. In addition, the optimal solution to (P2-ss) is also a feasible solution to (P2-s), since (P2-ss) is a special case of (P2-s). Thus, the optimal objective value of (P2-ss) serves as an upper bound on that of (P2-s), which concludes the proof.

Algorithm 1 Search Algorithm for Problem (P2)
1: $\Lambda^* \leftarrow \infty$;
2: for each permutation $\Pi$ of the target locations in $\mathcal{K}$ do
3:   for $l = 1$ to $K+1$ do
4:     Set initial location $\tilde{u}_I = w_{\Pi_{l-1}}$ and final location $\tilde{u}_F = w_{\Pi_l}$. Let $\Lambda^*_{\Pi,l} \leftarrow \infty$;
5:     for each subset $\mathcal{J} \subseteq \mathcal{M}$ do
6:       for each visiting order $\pi$ for $\mathcal{J}$ do
7:         With given $\tilde{u}_I$, $\tilde{u}_F$, $\mathcal{J}$, and $\pi$, solve the standard convex optimization problem (P3) with CVX to obtain the objective value $\Lambda_{\Pi,l}$ and waypoints $\{u^s_{\pi_j}, u^l_{\pi_j}\}$;
8:         if $\Lambda_{\Pi,l} < \Lambda^*_{\Pi,l}$ then
9:           $\Lambda^*_{\Pi,l} \leftarrow \Lambda_{\Pi,l}$, $\mathcal{J}^*_{\Pi,l} \leftarrow \mathcal{J}$, $\pi^*_{\Pi,l} \leftarrow \pi$, $U^*_{\Pi,l} \leftarrow \{u^s_{\pi_j}, u^l_{\pi_j}\}$;
10:        end if
11:      end for
12:    end for
13:  end for
14:  if $\sum_{l=1}^{K+1} \Lambda^*_{\Pi,l} < \Lambda^*$ then
15:    $\Pi^* \leftarrow \Pi$, $\Lambda^* \leftarrow \sum_{l=1}^{K+1} \Lambda^*_{\Pi,l}$, $\mathcal{J}^*_\Pi \leftarrow \{\mathcal{J}^*_{\Pi,l}\}$, $\pi^*_\Pi \leftarrow \{\pi^*_{\Pi,l}\}$, $U^*_\Pi \leftarrow \{U^*_{\Pi,l}\}$;
16:  end if
17: end for
18: Construct the optimal UAV trajectory based on $\Pi^*$, $\mathcal{J}^*_\Pi$, $\pi^*_\Pi$, $U^*_\Pi$ with line segments and the maximum speed;
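As a minimal sketch (not the authors' implementation), the edge weight $\hat{w}$ of Definition 1 and the shortest-path bound of Proposition 2 can be computed as follows, assuming 2-D ground coordinates in metres. The quadratic term of $\hat{w}$ equals $(1-\theta)\int_0^{d/V_{\max}} t\,dt$, i.e., the weighted AoI accumulated when an edge of length $d$ is flown at $V_{\max}$ without any successful upload. The function names `edge_weight` and `shortest_path_bound` are hypothetical; the array-based Dijkstra search has the $O(M^2)$ cost quoted for $\hat{G}$.

```python
import math


def edge_weight(p, q, theta, v_max):
    """Weight w_hat of Definition 1 between ground points p and q (metres).

    theta * d / v_max is the weighted flight time; (1 - theta) * d^2 / (2 v_max^2)
    is the weighted AoI integral if no upload succeeds on the edge.
    """
    d = math.dist(p, q)
    return theta * d / v_max + (1.0 - theta) * d ** 2 / (2.0 * v_max ** 2)


def shortest_path_bound(start, end, bs_locations, theta, v_max):
    """Array-based Dijkstra (O(V^2)) on the complete graph G_hat.

    Vertex 0 is the initial location, vertices 1..M are the BS locations
    g_1..g_M, and vertex M+1 is the final location. Returns the path weight
    (the upper bound of Proposition 2) and the visited BS indices in order.
    """
    pts = [start] + list(bs_locations) + [end]
    n = len(pts)
    dist = [math.inf] * n
    prev = [None] * n
    done = [False] * n
    dist[0] = 0.0
    for _ in range(n):
        # Settle the unfinished vertex with the smallest tentative distance.
        u = min((i for i in range(n) if not done[i]), key=lambda i: dist[i])
        done[u] = True
        if u == n - 1:
            break
        for v in range(n):
            if not done[v]:
                alt = dist[u] + edge_weight(pts[u], pts[v], theta, v_max)
                if alt < dist[v]:
                    dist[v], prev[v] = alt, u
    # Walk back from the final vertex to recover the path.
    path, v = [], n - 1
    while v is not None:
        path.append(v)
        v = prev[v]
    path.reverse()
    # Interior vertices are BS indices (1..M) in visiting order.
    return dist[n - 1], path[1:-1]


bound, order = shortest_path_bound((0, 0), (2000, 0), [(1000, 0), (0, 3000)], 0.5, 50.0)
print(bound, order)  # 220.0 [1]
```

With $\theta = 0.5$, routing through the BS at (1000, 0) costs 220 instead of 420 for the direct edge, because the AoI term grows quadratically with the uncovered distance; with $\theta = 1$ both routes cost 40 and the direct edge is kept.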
Note that Dijkstra's algorithm can be used to find the shortest path from $\hat{v}_0$ to $\hat{v}_{M+1}$ in $\hat{G}$ with complexity $O(M^2)$ [49]. Thus, a suboptimal solution to (P2-s) can be obtained by finding the shortest path in $\hat{G}$ with Dijkstra's algorithm, which determines the serving BS set $\mathcal{J}$ and the visiting order $\pi$, and then solving the convex optimization problem (P3) to determine the waypoints $\{u^s_{\pi_j}, u^l_{\pi_j}\}$. It is worth noting that this suboptimal solution applies to any initial location $\tilde{u}_I$ and final location $\tilde{u}_F$.

Let us consider the general case $\mathcal{K} \ne \emptyset$, i.e., $K > 0$, where there exist $K$ important target locations that the UAV needs to visit. Then the UAV trajectory can be partitioned into $K+1$ segments, with $\Pi = (\Pi_1, \ldots, \Pi_K)$ denoting the visiting order of $\mathcal{K}$. Specifically, the $l$th segment starts at $w_{\Pi_{l-1}}$ and ends at $w_{\Pi_l}$, $1 \le l \le K+1$, where $w_{\Pi_0} \triangleq u_I$ and $w_{\Pi_{K+1}} \triangleq u_F$. Denote the optimal UAV trajectory for the $l$th segment as $\{u^*_l(t)\}$, and the minimum weighted sum of operation time and total AoI along the $l$th segment as $\Lambda^*_l$. Thus, for a given visiting order $\Pi$, the minimum weighted sum for the entire flight equals $\sum_{l=1}^{K+1} \Lambda^*_l$, since otherwise we could always replace the $l$th segment with $\{u^*_l(t)\}$ and obtain a smaller weighted sum. As a result, we can search all possible permutations of the $K$ target locations and minimize the weighted sum along each segment of the UAV trajectory. With a given permutation $\Pi$, the weighted sum minimization problem for the $l$th segment can be regarded as (P2-s) with initial location $w_{\Pi_{l-1}}$ and final location $w_{\Pi_l}$, whose solution has already been obtained. Based on the above discussions, the optimal solution to (P2) can be obtained by Algorithm 1, where the optimal solution to (P2-s) is obtained from Step 5 to Step 12. The complexity of Algorithm 1 is $O(2^M M!\,K!\,K \log(1/\zeta))$, where $\zeta$ is the solution accuracy.

To further reduce the complexity, especially when $M$ and $K$ are large, we propose a DGA for finding a suboptimal solution to (P2). Specifically, we define another graph $\check{G}(\check{V}, \check{E}, \check{w})$ as follows.

Definition 2. $\check{G}(\check{V}, \check{E}, \check{w})$ is constructed as follows.
• $\check{V} \triangleq \{\check{v}_0, \check{v}_1, \ldots, \check{v}_K, \check{v}_{K+1}\}$. We introduce a vertex $\check{v}_k$ for the target location $\rho_k$, i.e., $w_k$, $1 \le k \le K$, where $\check{v}_0$ and $\check{v}_{K+1}$ represent $u_I$ and $u_F$, respectively.
• $\check{E} \triangleq \{(\check{v}_i, \check{v}_j) \mid 0 \le i \ne j \le K+1\}$. For any two different vertices $\check{v}_i, \check{v}_j \in \check{V}$, $i \ne j$, there exists an edge $(\check{v}_i, \check{v}_j)$.
• $\check{w}: \check{E} \to \mathbb{R}^+$ is a weight function. In particular, $\check{w}(\check{v}_i, \check{v}_j)$ represents the weighted sum of the operation time and total AoI when starting from $\check{v}_i$ and ending at $\check{v}_j$.

Note that the weight $\check{w}(\check{v}_i, \check{v}_j)$ can be obtained by solving problem (P2-s) with given $\check{v}_i$ and $\check{v}_j$. Due to Definition 2, solving (P2) is equivalent to finding a TSP path that starts from $\check{v}_0$, ends at $\check{v}_{K+1}$, and visits each vertex in $\check{V}$ exactly once. Although finding a TSP path is NP-hard, the problem has been well studied, and there exist many efficient algorithms to find an approximate solution, e.g., with complexity $O(K^2)$ [50]. Based on the above discussions, a suboptimal solution to (P2) can be obtained by the following DGA given in Algorithm 2, where the suboptimal solution to (P2-s) is obtained from Step 3 to Step 7. The main steps in Algorithm 2 are Step 2 to Step 8 for weight calculation, whose complexity is $O(K^2 M^2)$.
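The outer level of the DGA (Step 9 of Algorithm 2) can be sketched as follows, assuming the weights $\check{w}$ have already been computed and stored in a matrix. The exhaustive search mirrors the $K!$ permutation loop of Algorithm 1, while a greedy nearest-neighbour rule is used here only as a stand-in for the $O(K^2)$ TSP algorithm of [50], which is not specified in this section. The function names and the weight matrix are hypothetical.

```python
import itertools


def path_cost(order, w):
    """Total weight of v0 -> order[0] -> ... -> order[-1] -> v_{K+1} on G_check."""
    seq = [0] + list(order) + [len(w) - 1]
    return sum(w[a][b] for a, b in zip(seq, seq[1:]))


def exact_path(w):
    """Exhaustive search over all K! target orders (the outer loop of Algorithm 1)."""
    targets = range(1, len(w) - 1)
    best = min(itertools.permutations(targets), key=lambda o: path_cost(o, w))
    return list(best), path_cost(best, w)


def nearest_neighbour_path(w):
    """Greedy O(K^2) stand-in for the TSP heuristic: go to the cheapest unvisited target."""
    unvisited, order, cur = set(range(1, len(w) - 1)), [], 0
    while unvisited:
        nxt = min(sorted(unvisited), key=lambda j: w[cur][j])
        order.append(nxt)
        unvisited.remove(nxt)
        cur = nxt
    return order, path_cost(order, w)


# Hypothetical weights w_check[i][j] for K = 3 targets (vertex 0 = u_I, vertex 4 = u_F).
W = [
    [0, 10, 8, 25, 60],
    [10, 0, 12, 20, 45],
    [8, 12, 0, 15, 20],
    [25, 20, 15, 0, 18],
    [60, 45, 20, 18, 0],
]
print(exact_path(W))              # ([1, 2, 3], 55)
print(nearest_neighbour_path(W))  # ([2, 1, 3], 58)
```

On this toy instance, the greedy rule takes the cheap first edge to vertex 2 and ends with weight 58, whereas the exhaustive search finds 55, which illustrates the accuracy/complexity tradeoff between the two levels of search.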
In addition, the TSP algorithm has complexity $O(K^2)$. In summary, the complexity of Algorithm 2 is $O(K^2 M^2)$.

Algorithm 2 Double Graph Algorithm for Problem (P2)
1: Construct graph $\check{G}(\check{V}, \check{E}, \check{w})$ with Definition 2;
2: for each edge $(\check{v}_i, \check{v}_j) \in \check{E}$ do
3:   Let $\tilde{u}_I$ and $\tilde{u}_F$ be the initial and final locations corresponding to vertices $\check{v}_i$ and $\check{v}_j$;
4:   Construct graph $\hat{G}(\hat{V}, \hat{E}, \hat{w})$ with Definition 1, where $\hat{v}_0$ corresponds to $\tilde{u}_I$ and $\hat{v}_{M+1}$ corresponds to $\tilde{u}_F$;
5:   Obtain the shortest path in $\hat{G}(\hat{V}, \hat{E}, \hat{w})$ with Dijkstra's algorithm to determine the visited BS set $\mathcal{J}_{i,j}$ and visiting order $\pi_{i,j}$;
6:   With given $\tilde{u}_I$, $\tilde{u}_F$, $\mathcal{J}_{i,j}$, and $\pi_{i,j}$, solve the convex optimization problem (P3) with CVX to obtain the objective value $\Lambda_{i,j}$ and waypoints $U_{i,j}$;
7:   $\check{w}(\check{v}_i, \check{v}_j) \leftarrow \Lambda_{i,j}$;
8: end for
9: Find a TSP path $\vec{p}$ over $\check{G}(\check{V}, \check{E}, \check{w})$ starting from $\check{v}_0$ and ending at $\check{v}_{K+1}$ with the TSP algorithm [50] to determine the total weighted sum $\Lambda$ and the visiting order $\Pi$;
10: $\mathcal{J}_\Pi \leftarrow \{\mathcal{J}_{i,j} \mid (\check{v}_i, \check{v}_j) \in \vec{p}\}$, $\pi_\Pi \leftarrow \{\pi_{i,j} \mid (\check{v}_i, \check{v}_j) \in \vec{p}\}$, $U_\Pi \leftarrow \{U_{i,j} \mid (\check{v}_i, \check{v}_j) \in \vec{p}\}$;
11: Construct the UAV trajectory based on $\Pi$, $\mathcal{J}_\Pi$, $\pi_\Pi$, $U_\Pi$ with line segments and the maximum speed;

Note that the above algorithms rely on statistical channel information; thus, they only capture the average communication performance over a large number of similar scenarios, which may not reflect the site-specific performance. Furthermore, these algorithms are time-consuming and thus not suitable for onboard implementation on the UAV, where fast decision making is needed. These limitations motivate the DLA developed based on DRL, which is presented in the following section.

5 PROPOSED DDQN-BASED DLA

To analyze the site-specific performance for a specific local environment and make fast decisions, the DLA is proposed in this section. We first reformulate the problem as a Markov decision process (MDP), and then propose a DDQN-based DLA that learns the local radio propagation environment from sampled rate measurements, without requiring prior knowledge about the radio propagation environment. Specifically, the sampled rates of the A2G channels measured by the UAV are used as input, based on which a DDQN-based model learns a function that maps the local environment input to flying decisions. After suitable training, the DDQN can be used to make quick decisions for the UAV.

5.1 Problem Reformulation

In practice, the rate $R_m(t)$ can be measured in a specific local environment by leveraging the existing handover mechanisms with continuous RSRP measurements [23]. Similar to Section 4, the transmission scheduling for (P1) can be obtained by
$$x_m(t) = \begin{cases} 1, & m = \arg\max_{m' \in \{1, 2, \ldots, M\}} R_{m'}(t), \\ 0, & \text{otherwise}, \end{cases} \tag{16}$$
where $R_{m'}(t)$ is the actual measured rate. As a result, the UAV selects the BS with the maximum measured rate for sensory data transmission at each time $t$. Then, $\lambda(t) = \max_\tau \{\tau \mid \max_m R_m(\tau) \ge R_s, \tau \in [0, t]\}$. Due to (7), the optimal UAV trajectory for (P1) can be partitioned by the target locations in $\mathcal{K}$ into $K+1$ segments, and we can solve $K+1$ independent subproblems, one for each segment of the UAV trajectory, as in Section 4. As such, we first obtain the visiting order of the target locations $\Pi = (\Pi_1, \ldots, \Pi_K)$ with the offline Algorithm 2 in Section 4. The subproblem for each segment $l$, $1 \le l \le K+1$, can be written as the following (P4), where we omit the segment subscript $l$ for ease of exposition:
$$\begin{aligned} \text{(P4)}:\ \min_{\tilde{T}, \{u(t)\}}\ & \theta\tilde{T} + (1-\theta)\int_0^{\tilde{T}} \big(t - \lambda(t)\big)\,dt \\ \text{s.t.}\ & \|v(t)\| = V_{\max},\ \forall t \in [0, \tilde{T}], \qquad (17) \\ & u(0) = \tilde{u}_I,\ u(\tilde{T}) = \tilde{u}_F, \qquad (18) \end{aligned}$$
where $\tilde{T}$ is the UAV operation time along the $l$th segment, and $\tilde{u}_I$ and $\tilde{u}_F$ are the start and end points of the $l$th segment of the UAV trajectory, respectively. Note that $\tilde{u}_I$ and $\tilde{u}_F$ are exactly determined by the given visiting order $\Pi$. In the following, we focus on solving problem (P4).

Since no prior knowledge about the specific radio propagation environment is given, the UAV makes sequential decisions regarding its trajectory in each time step, and the trajectory design influences its future states as well as the future AoI. As such, we first transform problem (P4) into an MDP. For ease of exposition, the time horizon $\tilde{T}$ is discretized into $\tilde{N}$ equal time slots, i.e., $\tilde{T} = \tilde{N}\delta$, where the slot duration $\delta$ is chosen such that the UAV's location can be assumed to be approximately unchanged within each time slot. As such, the UAV trajectory can be approximated by a sequence $\{u[n] \mid 1 \le n \le \tilde{N}\}$. The discretized forms of $R_m(t)$, $\lambda(t)$, $\beta_m(t)$, $h_m(t)$, and $\tilde{h}_m(t)$ are denoted by $R_m[n]$, $\lambda[n]$, $\beta_m[n]$, $h_m[n]$, and $\tilde{h}_m[n]$, respectively. Thus, (P4) can be approximated as
$$\begin{aligned} \text{(P5)}:\ \max_{\{u[n]\}, \tilde{N}}\ & -\theta\tilde{N} - (1-\theta)\sum_{n=1}^{\tilde{N}} \big(n - \lambda[n]\big) \\ \text{s.t.}\ & u[n+1] = u[n] + \delta V_{\max}\vec{v}[n],\ \forall n, \qquad (19) \\ & \|\vec{v}[n]\| = 1,\ \forall n, \qquad (20) \\ & u[1] = \tilde{u}_I,\ u[\tilde{N}] = \tilde{u}_F, \qquad (21) \end{aligned}$$
where $\vec{v}[n]$ denotes the UAV flying direction in time slot $n$. Note that the UAV is at waypoint $u[n]$ in time slot $n$, and the LoS/NLoS states can be exactly determined by checking whether obstacles exist between the BSs and the UAV in the specific local environment, which determines the large-scale channel power gain $\beta_m[n]$. On the other hand, $\tilde{h}_m[n]$ can also be measured with the existing handover mechanisms through continuous RSRP measurements. Note that each time slot may contain multiple fading blocks, e.g., $\Omega \ge 1$ fading blocks. Then the UAV can perform $\Omega$ measurements and adopt their average value as $R_m[n]$, i.e., $R_m[n] = \frac{1}{\Omega}\sum_{j=1}^{\Omega} R_m[n, j]$, where $R_m[n, j]$ is the $j$th measurement in time slot $n$, $1 \le j \le \Omega$. Although such an estimation causes a performance gap, the gap is practically negligible with a sufficiently small $\delta$ [22].
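The measurement-driven quantities above can be illustrated with a small Python sketch: averaging $\Omega$ rate samples into $R_m[n]$, choosing the serving BS by (16), and evaluating the objective of (P5) for a given rate trace, with $\lambda[n]$ tracked as the latest slot whose best measured rate meets $R_s$ (the convention $\lambda[1] = 1$ is assumed). The rate values are invented for illustration, and the function names are hypothetical.

```python
def average_rate(samples):
    """R_m[n] = (1/Omega) * sum_j R_m[n, j] for one BS over Omega measurements."""
    return sum(samples) / len(samples)


def select_bs(rates):
    """Scheduling rule (16): index of the BS with the largest measured rate (0-based)."""
    return max(range(len(rates)), key=lambda m: rates[m])


def p5_objective(rate_trace, r_s, theta):
    """Objective of (P5) for a per-slot trace of averaged rates.

    rate_trace[n - 1] lists R_m[n] for all BSs in slot n. lambda[n] is the latest
    slot up to n whose best rate meets R_s, with the convention lambda[1] = 1.
    """
    lam, total_aoi = 1, 0
    for n, rates in enumerate(rate_trace, start=1):
        if n >= 2 and max(rates) >= r_s:
            lam = n  # a successful upload in slot n resets the AoI
        total_aoi += n - lam
    return -theta * len(rate_trace) - (1 - theta) * total_aoi


# Hypothetical 4-slot trace with two BSs, rates in Mbps, R_s = 1.5 Mbps.
trace = [[1.0, 0.8], [1.2, 1.4], [1.6, 0.5], [0.9, 1.0]]
print(average_rate([1.0, 2.0]))       # 1.5
print(select_bs(trace[2]))            # 0
print(p5_objective(trace, 1.5, 0.5))  # -3.0
```

Only slot 3 meets $R_s$ here, so the AoI terms $n - \lambda[n]$ are 0, 1, 0, 1, and the objective is $-0.5 \cdot 4 - 0.5 \cdot 2 = -3$.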
The rate $R_m[n]$ measured at location $u[n]$ is used below for the reward calculation of DDQN. Problem (P5) can then be modeled as an MDP. In particular, the UAV is regarded as an agent, and all of the network settings (including the local radio propagation environment, target locations, and BSs) are regarded as the environment. As such, we can characterize the UAV by a tuple $\langle \mathcal{S}, \mathcal{A}, \mathcal{P}, \mathcal{R} \rangle$, where $\mathcal{S}$ is the state space, $\mathcal{A}$ is the action space, $\mathcal{P}$ is the state transition function, and $\mathcal{R}$ is the reward function. By formulating the problem as an MDP, the UAV acts as an agent that seeks to maximize its cumulative reward by interacting with the environment. In the following, we define the above elements.

• State Space: We define the state of the UAV at the beginning of the $n$th time slot as the UAV's horizontal location, i.e., $u[n]$. Then $u[1] = \tilde{u}_I$ is the initial state, while $\tilde{u}_F$ is the final state. As a result, $\mathcal{S} \subseteq \mathbb{R}^{2\times 1}$ denotes the continuous state space that contains all possible UAV horizontal locations.
• Action Space: We define the actions of the UAV as its flying directions. In particular, the UAV's action within the $n$th time slot is expressed as $\vec{v}[n]$, where $\|\vec{v}[n]\| = 1$. Note that the UAV updates its decision at the beginning of each time slot and keeps it unchanged within the time slot. As a result, $\mathcal{A} \triangleq \{\vec{v} \mid \|\vec{v}\| = 1\}$ denotes the action space that contains all possible UAV flying directions. For simplicity, we uniformly discretize the set of actions into $\kappa$ values to ensure a finite action space, i.e., $\mathcal{A} = \{\vec{v}^{(1)}, \ldots, \vec{v}^{(\kappa)}\}$.
• State Transition Function: The UAV's state before the $(n+1)$th time slot is determined by its state before the $n$th time slot. Given the current action $\vec{v}[n]$, the movement of the UAV within time slot $n$ follows from (19), i.e., $u[n+1] = u[n] + \delta V_{\max}\vec{v}[n]$.
• Reward Function: For (P5), the reward function is defined to reward the UAV for reaching its destination and to penalize it for moving and for any increase of AoI. In time slot $n$, we define an indicator $I_n = 1$ when $\max_m R_m[n] \ge R_s$, and $I_n = 0$ otherwise. Let $\lambda[1] = 1$. For $n \ge 2$, we have
$$\lambda[n] = \begin{cases} n, & I_n = 1, \\ \lambda[n-1], & I_n = 0. \end{cases} \tag{22}$$
Based on the objective function of (P5), the reward in time slot $n$ is defined as
$$\Phi[n] = -\theta - (1-\theta)\big(n - \lambda[n]\big). \tag{23}$$
Therefore, the UAV is motivated to minimize its weighted sum of operation time and total AoI by deciding its flying directions. Note that if two flying paths yield the same total AoI, the one with the shorter operation time is preferred, since the penalty $-\theta$ is incurred in every additional time slot. In this way, the reward tilts the problem toward solutions that reduce the mission duration.

After obtaining the discretized actions, the state space can be further discretized into a finite space, since both the position increments and the directions are discrete. In the following, a DLA is proposed based on DDQN with a dueling network architecture. The proposed DLA consists of an offline training process and an online execution process. During the training process, the training data are collected and the DDQN model is trained offline. After training, the well-trained DDQN model is executed to obtain the optimized flying strategy for the UAV according to its current state.

5.2 Proposed Solution

In the following, we adopt a DQN with the dueling network architecture [51], termed dueling DQN, to approximate the state-action value $\hat{Q}(s, a)$ with weights $\eta$. Compared to the standard DQN, the dueling DQN consists of two streams that represent the value and advantage functions, which are combined via a special aggregating layer to estimate the state-action value function $Q$. In particular,
$$\hat{Q}(s, a; \alpha, \beta) = V(s; \beta) + \Big(G(s, a; \alpha) - \frac{1}{|\mathcal{A}|}\sum_{a'} G(s, a'; \alpha)\Big),$$
where $\alpha$ and $\beta$ are the parameters of the fully connected layers. The value function $V$ measures how good it is to be in a particular state $s$, while the advantage function $G$ decouples the state value from the Q-function so that the relative importance of each action can be measured. In this way, more robust estimates of the state value can be achieved, and thus stability and convergence rate can be significantly improved [51].

We adopt multi-step bootstrapping, which can effectively accelerate the training [52]. In particular, the truncated $N_0$-step return is given by
$$\Phi[n : n+N_0] = \sum_{k=0}^{N_0-1} \gamma^k\, \Phi[n+k+1], \tag{24}$$
with $\gamma$ denoting the discount factor. In (P5), the objective function corresponds to $\gamma = 1$, which means that all rewards are equally important. Furthermore, DQN training is known to overestimate Q-values. We adopt DDQN to tackle this issue [23], [52]. DDQN includes two neural networks, named the primary neural network and the target neural network, both of which adopt the dueling structure. The key idea of DDQN is to select an action using the primary network with weights $\eta$, while using the target network with weights $\eta'$ to compute the target Q-value for that action. By decoupling action selection from Q-value evaluation, DDQN can efficiently mitigate overestimation and enhance stability during model training.

For the training process of DDQN with the dueling network architecture, the UAV obtains a state from the state space $\mathcal{S}$ and selects an action from the action space $\mathcal{A}$ with the $\epsilon$-greedy policy to balance exploitation and exploration. To be specific, the action $\vec{v}[n]$ that maximizes the Q-value is chosen with probability $1-\epsilon$, while, with probability $\epsilon$, one of the remaining actions is chosen uniformly at random, i.e.,
$$\mathbb{P}(\vec{v}[n]) = \begin{cases} 1-\epsilon, & \vec{v}[n] = \arg\max_{\vec{v}' \in \mathcal{A}} \hat{Q}(u[n], \vec{v}'; \eta), \\ \epsilon/(|\mathcal{A}|-1), & \text{otherwise}. \end{cases} \tag{25}$$
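The per-slot building blocks of the DLA can be sketched in plain Python as below. This is a hypothetical sketch, not the authors' code: the neural networks are replaced by lists of Q-values, the $\kappa$ discretized directions are assumed to be equally spaced angles, and all function names are illustrative. It covers the state transition (19), the dueling aggregation, the $\epsilon$-greedy policy (25), the truncated $N_0$-step return (24), and the double-DQN target in which the primary network selects the action and the target network evaluates it.

```python
import math
import random


def step(u, action, kappa=8, delta=1.0, v_max=50.0):
    """State transition (19) with kappa equally spaced unit directions (assumed)."""
    angle = 2.0 * math.pi * action / kappa
    return (u[0] + delta * v_max * math.cos(angle),
            u[1] + delta * v_max * math.sin(angle))


def dueling_q(value, advantages):
    """Dueling aggregation: Q(s, a) = V(s) + (G(s, a) - mean over a' of G(s, a'))."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + g - mean_adv for g in advantages]


def epsilon_greedy(q_values, eps, rng=random):
    """Policy (25): greedy action w.p. 1 - eps, else one of the other actions uniformly."""
    greedy = max(range(len(q_values)), key=lambda a: q_values[a])
    if len(q_values) == 1 or rng.random() >= eps:
        return greedy
    return rng.choice([a for a in range(len(q_values)) if a != greedy])


def n_step_return(rewards, gamma):
    """Truncated N0-step return (24): sum_k gamma^k * Phi[n + k + 1]."""
    return sum(gamma ** k * r for k, r in enumerate(rewards))


def ddqn_target(rewards, q_next_primary, q_next_target, gamma):
    """Double-DQN target: the primary net picks v*, the target net evaluates it."""
    best = max(range(len(q_next_primary)), key=lambda a: q_next_primary[a])
    return n_step_return(rewards, gamma) + gamma ** len(rewards) * q_next_target[best]


print(step((0.0, 0.0), 0))              # (50.0, 0.0)
print(dueling_q(1.0, [1.0, 2.0, 3.0]))  # [0.0, 1.0, 2.0]
print(ddqn_target([-1.0, -1.0], [0.0, 5.0, 1.0], [10.0, -3.0, 7.0], 1.0))  # -5.0
```

In the last example, taking the maximum of the target network alone would give the value 10, but the double-DQN target uses the action chosen by the primary network and evaluates it at $-3$; this decoupling is what curbs overestimation.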
A replay buffer $\mathcal{D}$, i.e., a memory pool, is used to store the $N_0$-step transitions (or experiences) $(u[n], \vec{v}[n], \Phi[n : n+N_0], u[n+N_0])$, and a mini-batch of experiences is randomly sampled from it to update the weights $\eta$ by minimizing the loss function
$$\Big(\Phi[n : n+N_0] + \gamma^{N_0}\hat{Q}(u[n+N_0], \vec{v}^*; \eta') - \hat{Q}(u[n], \vec{v}[n]; \eta)\Big)^2, \tag{26}$$
where $\vec{v}^* = \arg\max_{\vec{v}' \in \mathcal{A}} \hat{Q}(u[n+N_0], \vec{v}'; \eta)$. Here, the first part $\Phi[n : n+N_0] + \gamma^{N_0}\hat{Q}(u[n+N_0], \vec{v}^*; \eta')$ represents the target toward which the Q-value estimate is moved, and the second part $\hat{Q}(u[n], \vec{v}[n]; \eta)$ denotes the current estimate of the Q-value. As a result, the loss function indicates the estimation error of the DDQN model, and a smaller loss corresponds to better estimation performance. The pseudocode of the DLA for problem (P5) is summarized in Algorithm 3.

Algorithm 3 Training process for problem (P5)
1: Initialize replay memory $\mathcal{D}$ with capacity $|\mathcal{D}|$; initialize exploration probability $\epsilon$ and decaying rate $\nu$;
2: Initialize the primary Q-network with weights $\eta$;
3: Initialize the target Q-network with weights $\eta' = \eta$;
4: for episode = 1 to $\Psi^{\max}$ do
5:   Initialize the initial state $u[1] = \tilde{u}_I$. Initialize $n = 1$;
6:   while $n \le N^{\max}$ and $u[n] \ne \tilde{u}_F$ do
7:     Choose a random action $\vec{v}[n]$ in $\mathcal{A}$ with probability $\epsilon$; otherwise select $\vec{v}[n] = \arg\max_{\vec{v}' \in \mathcal{A}} \hat{Q}(u[n], \vec{v}'; \eta)$;
8:     Perform action $\vec{v}[n]$ and observe the next state $u[n+1]$;
9:     Obtain $\lambda[n]$ according to (22) based on the measurement of $R_m[n]$, and calculate the reward $\Phi[n] = -\theta - (1-\theta)(n - \lambda[n])$;
10:    Calculate the $N_0$-step reward $\Phi[n-N_0 : n]$ according to (24), and store $(u[n-N_0], \vec{v}[n-N_0], \Phi[n-N_0 : n], u[n])$ into the replay buffer $\mathcal{D}$;
11:    Sample a random mini-batch of transitions $(u[j], \vec{v}[j], \Phi[j : j+N_0], u[j+N_0])$ from the replay buffer $\mathcal{D}$;
12:    Update the weights $\eta$ of the primary Q-network by gradient descent with the loss function defined in (26);
13:    Update $n \leftarrow n+1$; $\epsilon \leftarrow \epsilon\nu$;
14:    Every $\hat{B}$ steps, update the target network weights $\eta' = \eta$;
15:  end while
16: end for

In Algorithm 3, the training process consists of $\Psi^{\max}$ episodes, where the $\epsilon$-greedy approach is adopted to choose an action given the current state in each episode. The algorithm starts with a fairly random policy at Step 7 and gradually moves to a deterministic policy due to the update of $\epsilon$ at Step 13. According to [51], the convergence of the learning Algorithm 3 with DDQN can be guaranteed. For the DLA, the training procedure can be deployed in a simulator and run offline at the BSs, where its time complexity is proportional to the number of training steps, i.e., $O(\Psi^{\max} N^{\max})$. Once the neural networks are trained, the converged networks are saved for testing and can be easily deployed at the UAV, where the optimized policy can be generated very quickly with only simple algebraic calculations. In particular, in each time slot, the UAV's action is generated by the trained neural networks, i.e., through the operation at Step 7, which guides the UAV with real-time decisions and considerably reduces the overhead of practical implementation.

6 SIMULATION STUDY

In this section, we provide the system settings and evaluate, through simulation, the performance of both the DGA for average performance and the DLA for site-specific performance. We also study the impact of different parameters on the performance of the proposed algorithms.

6.1 Simulation Setting

As shown in Fig. 2, we consider a cellular-connected UAV sensing system with $M = 7$ ground BSs, which are uniformly distributed in a 3.0 km × 3.0 km square urban area (denoted by $\hat{\mathcal{S}} \subset \mathbb{R}^2$). The UAV collects sensory data from $\hat{\mathcal{S}}$ with its sensors at each time instant, while the sensing data are transmitted to the ground BSs for further processing. The target locations in $\mathcal{K}$ are denoted by black squares in Fig. 2 and must be visited during the flight. The mission of the UAV is to sense and transmit data in region $\hat{\mathcal{S}}$ while flying from $u_I$ to $u_F$ and visiting all the target locations in $\mathcal{K}$. The heights and locations of the buildings in the considered urban area are generated based on one realization of the statistical model specified by the ITU [39]. As such, the LoS/NLoS states at any location can be exactly determined by checking whether obstacles exist between the ground BSs and the UAV, while the statistical model can be used to obtain the LoS probability, which reflects the average communication performance over a large number of realizations. In the following, the results of average communication performance with the statistical model using the DGA are given in Section 6.2, while the results of site-specific performance using the DLA are given in Section 6.3. Similar to [53], the antenna-pattern-related parameters are set as $\phi_D = 10^\circ$, $G_0 = 30$ dB, $\mathrm{HPBW}_v = 65^\circ$, and $n_0 = 8$. The A2G channel parameters are set as $\beta_0 = -60$ dB, $\sigma^2 = -110$ dBm, $a = 10$, $b = 0.6$, $\alpha = 2.2$, $\mu = 0.01$, and $K_c = 15$ dB as in [21]. Unless otherwise stated, the other parameters are set as $H_G = 25$ m, $H = 100$ m, $B = 1$ MHz, $P = 0.1$ W, $R_s = 1.5$ Mbps, $\delta = 1$ s, and $V_{\max} = 50$ m/s.

6.2 Average Performance of the Proposed DGA

We first consider the offline design with the average communication performance. To illustrate the tradeoff between the operation time and the total AoI of the UAV, the optimized trajectories for different values of the weighting factor $\theta$ are shown in Fig. 2. The colored stars denote the ground BSs, and the circle around BS $s_m$ represents the service area $A_m$ (obtained in Section 4.1), within which the timely transmission requirement can always be satisfied, $\forall m$. If the UAV is outside the service areas, the sensory data cannot be transmitted in time, and thus the AoI of the UAV increases. It is observed that the ground BSs have service areas of heterogeneous sizes even though the UAV generates sensory data with a fixed sensing rate $R_s$ and transmits data with the
same transmit power $P$, since the local radio environments around different BSs are heterogeneous (e.g., with different building distributions). From Fig. 2, we can see that the UAV enters and leaves the service areas of the BSs sequentially while visiting all the target locations. With different values of the weighting factor $\theta$, the visiting order of the target locations and BSs as well as the waypoints may differ; they are obtained by the optimal Algorithm 1 and the suboptimal Algorithm 2. It is observed that the suboptimal solution obtained by Algorithm 2 achieves a trajectory similar to that of the optimal solution obtained by Algorithm 1, thus validating the efficiency of the DGA. The results for a different scenario configuration (with $M = 5$, $K = 3$, and $R_s = 1$ Mbps) are provided in Fig. 3. It is observed from Fig. 3 that trends similar to those in Fig. 2 are obtained; the details are omitted for brevity. Thus, unless otherwise stated, we only consider the scenario configuration of Fig. 2 to avoid redundancy.

[Figs. 2 and 3: optimized UAV trajectories, y (m) versus x (m) over 0 to 3000 m, for the optimal solution with Algorithm 1 and the suboptimal solution with Algorithm 2; (a) θ = 0.2, (b) θ = 0.8.]

Fig. 4 depicts the average performance comparison between Algorithm 1 and Algorithm 2 for different numbers of BSs when $\theta = 0.8$, where the BSs are randomly distributed in the area and the results are obtained by averaging over 100 random realizations of the BS locations. Similar results are obtained for $\theta = 0.2$ and are omitted for brevity. The execution times required by the two algorithms are given in Fig. 4(a), measured on a computer with a dual-core 3.4 GHz CPU. It is observed that the execution time of the search Algorithm 1 increases exponentially with $M$ due to the rapidly growing search space, which is consistent with the complexity analysis in Section 4. In contrast, the proposed suboptimal Algorithm 2 requires much less execution time than Algorithm 1 and achieves performance close to that of Algorithm 1 (see Fig. 4(b)). Thus, the proposed DGA is a practically appealing solution for the offline problem (P2) in terms of both complexity and performance. In the following, we only use the DGA for performance comparison with other benchmark schemes.

It can be seen in Fig. 2 that when the weighting factor $\theta$ is large, the UAV tends to fly directly to the target locations to reduce the flying distance, since minimizing the UAV operation time is more important, while the total AoI of the UAV increases due to the long time spent outside the service areas. This is also verified by the tradeoff curves in Fig. 5, which show the total AoI and the average AoI (i.e., $\frac{1}{T}\int_0^T A(t)\,dt$) versus the UAV operation time obtained by the proposed DGA with different values of $\theta$. For any point in Fig. 5(a), its x- and y-coordinates represent the operation time and the total AoI, respectively. As expected, the UAV operation time decreases with $\theta$ while the total AoI increases with $\theta$, which shows that a decrease in total AoI comes at the cost of an increase in UAV operation time. A similar trend is observed for the average AoI in Fig. 5(b), where the details are omitted for brevity. Thus, $\theta$ can be flexibly set to achieve a good balance between the AoI and the operation time of the UAV according to practical system requirements.
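To make the tradeoff of Fig. 5 concrete, the following sketch computes the operation time, the total AoI, and the average AoI $\frac{1}{T}\int_0^T A(t)\,dt$ of a discretized trajectory, together with the weighted sum $\theta T + (1-\theta)\cdot$(total AoI). The two trajectories are hypothetical toy inputs, not simulation results of this paper, and the function names are illustrative.

```python
def aoi_metrics(served, delta=1.0):
    """Operation time, total AoI and average AoI of one discretized trajectory.

    served[i] is True if some BS meets R_s in slot i + 1; slot 1 acts as the
    reference (lambda[1] = 1), so the AoI in slot n is (n - lambda[n]) * delta.
    """
    last, total = 0, 0.0
    for i, ok in enumerate(served):
        if ok:
            last = i
        total += (i - last) * delta * delta  # AoI in the slot times slot length
    op_time = len(served) * delta
    return op_time, total, total / op_time


def weighted_sum(op_time, total_aoi, theta):
    """theta * T + (1 - theta) * total AoI, the quantity traded off in Fig. 5."""
    return theta * op_time + (1 - theta) * total_aoi


# Two hypothetical trajectories: a direct flight that uploads only in its first
# slot, and a longer detour that passes a service area every other slot.
direct = aoi_metrics([True] + [False] * 9)
detour = aoi_metrics([True, False] * 10)
print(direct, detour)  # (10.0, 45.0, 4.5) (20.0, 10.0, 0.5)
for theta in (0.2, 0.8):
    best = min(("direct", direct), ("detour", detour),
               key=lambda c: weighted_sum(c[1][0], c[1][1], theta))
    print(theta, best[0])  # 0.2 detour, then 0.8 direct
```

For $\theta = 0.2$ the longer detour with frequent uploads has the smaller weighted sum, while for $\theta = 0.8$ the shorter direct flight wins, mirroring the trend that the operation time decreases and the total AoI increases with $\theta$.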
To further illustrate the performance gain achieved by the proposed DGA, we consider the following six benchmark schemes, referred to as the operation time minimization benchmark, the AoI minimization benchmark, the min-max AoI benchmark, the uploading rate maximization benchmark, the sense and transmit benchmark, and the energy minimization benchmark. In the operation time minimization benchmark, the UAV operation time is minimized by visiting all target locations with the TSP method as in [47], which corresponds to the case $\theta = 1$. In the AoI minimization benchmark, the total AoI of UAV sensing is minimized as in [54], which corresponds to the case $\theta = 0$. In the min-max AoI benchmark, the maximum AoI along the UAV's flight is minimized with a graph algorithm as in [55]. In the uploading rate maximization benchmark, the UAV flies to the top of the visited BSs to maximize the uploading rate of sensory data as in [22]. In the sense and transmit benchmark, the UAV flies directly to the target points and then transmits sensory data to the nearby BSs, similar to [11]. In the energy minimization benchmark, the UAV flies to the target points at the minimum-energy speed, which is found numerically by a one-dimensional search based on the propulsion energy model derived in [41].

From Fig. 5, we can see that each of the six benchmark schemes results in only a single tradeoff point. Unlike the DGA, these benchmark schemes cannot achieve a flexible tradeoff between the operation time and the AoI of the UAV, since they only optimize either the UAV operation time or the AoI. The uploading rate maximization benchmark achieves an AoI comparable to that of the DGA but leads to a higher UAV operation time; the DGA's advantage here comes from its waypoint optimization. The performance gap between the min-max AoI benchmark and the DGA illustrates the additional gain of jointly optimizing the AoI and the visiting order of the target locations. The min-max AoI benchmark corresponds to egalitarian bargaining (fairness-oriented), while the AoI minimization benchmark corresponds to utilitarian bargaining (efficiency-oriented). It is also observed that the sense and transmit benchmark results in a higher AoI and UAV operation time than the DGA, which is mainly attributed to the gain brought by the optimized BS selection in the DGA. Although the energy minimization benchmark results in the minimum energy consumption of the UAV, it incurs a higher AoI and UAV operation time. This can be verified in Fig. 6, where the energy consumption of the different schemes is compared using the energy model in [41].

[Fig. 4: (a) Execution time comparison, execution time (second, logarithmic scale) versus number of BSs (3 to 7); (b) Weighted sum comparison, weighted sum of UAV operation time and total AoI (second) versus number of BSs; curves: optimal Algorithm 1 and suboptimal Algorithm 2.]
Fig. 4. Performance comparisons between optimal Algorithm 1 and suboptimal Algorithm 2.

[Fig. 5: (a) Total AoI versus UAV operation time (s); (b) Average AoI versus UAV operation time (s); curves: proposed DGA, uploading rate maximization benchmark, min-max AoI benchmark, operation time minimization benchmark, AoI minimization benchmark, sense and transmit benchmark, and energy minimization benchmark.]
Fig. 5. Tradeoff between UAV operation time and AoI for UAV sensing.

6.3 Site-specific Performance of the Proposed DLA

Next, the site-specific performance is studied for the specific urban local environment described in Section 6.1, and we evaluate the performance of the proposed DLA. For the DLA, we set $\Psi^{\max} = 5000$, $N^{\max} = 800$, $\Omega = 100$, $\kappa = 8$,
[Figure panels: one with legend entry "Proposed DGA" (vertical axis scaled by 10^5, horizontal axis from 0 to 1), and one with horizontal axis "Episode" (0 to 5000).]
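The operation time minimization benchmark visits all target locations in a TSP order, as in [47]. The Python sketch below is a rough illustration only. It orders hypothetical 2D target locations with a nearest-neighbour construction followed by 2-opt improvement, then converts the closed-tour length into flight time at an assumed constant speed. The nearest-neighbour plus 2-opt combination is a standard TSP heuristic of the kind surveyed in [50], swapped in here, and is not necessarily the exact procedure of [47]. The following are all assumptions: the fixed start index, the closed tour, the coordinates and the speed value. Hovering and sensing durations are ignored.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # hypothetical horizontal (x, y) coordinates in metres


def tour_length(points: Sequence[Point], order: Sequence[int]) -> float:
    """Length of the closed tour that visits points in `order` and returns to the start."""
    n = len(order)
    return sum(math.dist(points[order[i]], points[order[(i + 1) % n]]) for i in range(n))


def nearest_neighbour_order(points: Sequence[Point], start: int = 0) -> List[int]:
    """Greedy construction: repeatedly fly to the closest unvisited target."""
    unvisited = set(range(len(points))) - {start}
    order = [start]
    while unvisited:
        last = points[order[-1]]
        nxt = min(unvisited, key=lambda j: math.dist(last, points[j]))
        order.append(nxt)
        unvisited.remove(nxt)
    return order


def two_opt(points: Sequence[Point], order: Sequence[int]) -> List[int]:
    """2-opt improvement: reverse segments while the tour gets shorter.
    Position 0 (the assumed start location) is never moved."""
    best = list(order)
    best_len = tour_length(points, best)
    improved = True
    while improved:
        improved = False
        for i in range(1, len(best) - 1):
            for k in range(i + 1, len(best)):
                candidate = best[:i] + best[i:k + 1][::-1] + best[k + 1:]
                cand_len = tour_length(points, candidate)
                if cand_len < best_len - 1e-9:
                    best, best_len = candidate, cand_len
                    improved = True
    return best


def flight_time(points: Sequence[Point], order: Sequence[int], speed: float) -> float:
    """Flight time of the tour at a constant (assumed) speed, ignoring hovering."""
    return tour_length(points, order) / speed


if __name__ == "__main__":
    targets = [(0.0, 0.0), (0.0, 100.0), (100.0, 0.0), (100.0, 100.0)]  # hypothetical
    order = two_opt(targets, nearest_neighbour_order(targets))
    print(order, flight_time(targets, order, speed=20.0))
```

2-opt removes crossing edges, so a tour such as [0, 3, 1, 2] over the four corners of a square is shortened to the 400 m perimeter. For larger instances, a dedicated TSP solver would normally replace this quadratic-per-pass loop.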
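The energy minimization benchmark flies at the minimum-energy speed, found by a one-dimensional search over the rotary-wing propulsion energy model of [41]. Fig. 6 compares energy consumption using the same model. The sketch below assumes the forward-flight propulsion power of [41] for straight, level flight at constant speed V:

P(V) = P0(1 + 3V²/U_tip²) + Pi(sqrt(1 + V⁴/(4v0⁴)) − V²/(2v0²))^(1/2) + (1/2)d0·ρ·s·A·V³

It then picks the speed that minimizes the energy per metre, P(V)/V, by a grid search. The numeric parameters are illustrative assumptions and are not the simulation settings of this paper. Acceleration and hovering are ignored.

```python
import math

# Illustrative rotary-wing parameters (assumed values, not this paper's settings).
P0 = 79.8563   # blade profile power in hover [W]
PI = 88.6279   # induced power in hover [W]
U_TIP = 120.0  # tip speed of the rotor blade [m/s]
V0 = 4.03      # mean rotor induced velocity in hover [m/s]
D0 = 0.6       # fuselage drag ratio
RHO = 1.225    # air density [kg/m^3]
S = 0.05       # rotor solidity
A = 0.503      # rotor disc area [m^2]


def propulsion_power(v: float) -> float:
    """Propulsion power [W] at constant horizontal speed v [m/s] (model of [41])."""
    blade = P0 * (1.0 + 3.0 * v**2 / U_TIP**2)
    # sqrt(1 + x^2) - x with x = v^2 / (2 v0^2) is rewritten as 1 / (sqrt(1 + x^2) + x),
    # which is algebraically equal but avoids cancellation at high speed.
    x = v**2 / (2.0 * V0**2)
    induced = PI * math.sqrt(1.0 / (math.sqrt(1.0 + x * x) + x))
    parasite = 0.5 * D0 * RHO * S * A * v**3
    return blade + induced + parasite


def min_energy_speed(v_lo: float = 0.1, v_hi: float = 40.0, step: float = 0.01) -> float:
    """One-dimensional grid search for the speed minimizing energy per metre, P(v) / v."""
    n = int(round((v_hi - v_lo) / step))
    speeds = (v_lo + k * step for k in range(n + 1))
    return min(speeds, key=lambda v: propulsion_power(v) / v)


def flight_energy(distance: float, v: float) -> float:
    """Energy [J] to fly `distance` metres at constant speed v: P(v) * distance / v."""
    return propulsion_power(v) * distance / v


if __name__ == "__main__":
    v_me = min_energy_speed()
    print(f"minimum-energy speed: {v_me:.2f} m/s")
    print(f"energy for 1 km: {flight_energy(1000.0, v_me):.0f} J")
```

A grid is used instead of golden-section search so the sketch does not rely on P(V)/V being unimodal. With these assumed parameters the minimizer lies between roughly 15 and 22 m/s, well above the hovering-optimal regime.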
[Figure: legend entries "Proposed offline Algorithm 1", "Proposed DGA", "Proposed DLA", "Operation time minimization benchmark"; horizontal axis from 1 to 2.]
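The proposed DLA is built on a DDQN with a dueling network model, following [51] and [52]. Two ingredients distinguish it from a plain DQN, and both can be sketched in a few lines of Python:

- The dueling head combines a state value V(s) and action advantages A(s, a) as Q(s, a) = V(s) + A(s, a) − mean_a' A(s, a').
- The double Q-learning target lets the online network choose the next action while the target network evaluates it.

This is a minimal illustration on plain lists under assumed numbers and discount factor, not the authors' implementation. The state, action and reward design of the DLA, including how Ψmax, Nmax, Ω and κ are used, is not reproduced here.

```python
from typing import List, Sequence


def dueling_q(value: float, advantages: Sequence[float]) -> List[float]:
    """Dueling aggregation [51]: Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a')."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + adv - mean_adv for adv in advantages]


def argmax(values: Sequence[float]) -> int:
    """Index of the largest entry (the first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def double_dqn_target(reward: float, q_online_next: Sequence[float],
                      q_target_next: Sequence[float], done: bool,
                      gamma: float = 0.99) -> float:
    """Double Q-learning target [52]: select a* with the online network and
    evaluate it with the target network; no bootstrapping after a terminal state."""
    if done:
        return reward
    best_action = argmax(q_online_next)
    return reward + gamma * q_target_next[best_action]


def vanilla_dqn_target(reward: float, q_target_next: Sequence[float], done: bool,
                       gamma: float = 0.99) -> float:
    """Standard DQN target, for comparison only: max over the target network."""
    return reward if done else reward + gamma * max(q_target_next)


if __name__ == "__main__":
    q_online = dueling_q(1.0, [0.5, -0.5, 0.0])  # online net prefers action 0
    q_target = [0.2, 3.0, 0.1]                    # hypothetical target-net estimates
    print(double_dqn_target(-1.0, q_online, q_target, done=False, gamma=0.9))
    print(vanilla_dqn_target(-1.0, q_target, done=False, gamma=0.9))
```

In this toy case, the double-DQN target uses the target network's value for the action preferred by the online network (0.2), whereas the plain DQN target takes the maximum (3.0). [52] uses this decoupling of action selection from evaluation to reduce Q-value overestimation.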
communication performance. On the other hand, the actual global service map differs from the one obtained from the statistical model. For example, the service areas are not regular discs like the red circles, and many coverage holes exist. This difference demonstrates the site-specific nature of the performance, which depends on the actual radio propagation environment with buildings and on the downtilted BS antenna radiation pattern.

It is observed in Fig. 8 that the optimized trajectory obtained from DLA follows a trend similar to that of the offline optimal Algorithm 1 and DGA. In particular, the UAV tries to visit all the target locations while avoiding service holes, in a manner that depends on the value of θ, such that the weighted sum of operation time and total AoI is minimized. The solutions obtained by Algorithm 1 and DGA may lead to a longer flight distance or a larger AoI, and hence a larger weighted sum, since they rely only on the statistical model without any specific local environment information. Fig. 9 depicts the transmission scheduling for DLA. We can see that the UAV may not always transmit to its closest BS, due to the specific positioning of buildings as well as the practical BS antenna radiation pattern.

Fig. 10 depicts the site-specific performance of the different schemes. In particular, Fig. 10 shows the weighted sum of operation time and total AoI of the UAV versus the sensing rate Rs when θ = 0.8. Similar results can be obtained for other values of θ and are omitted for brevity. As Rs increases, the service areas shrink, making it more difficult to satisfy the immediate transmission constraint and leading to an increase in AoI. It is observed that DLA outperforms not only the benchmark schemes but also the offline Algorithm 1 and DGA. The reason is that the benchmark schemes and the offline algorithms are all based on the graph model constructed from the disk coverage areas of the statistical model, with no specific local environment information. With DLA, the UAV can learn from accumulated experience, gained by interacting with the specific local environment, to effectively avoid service holes and thus minimize the weighted sum of operation time and total AoI. This demonstrates the effectiveness of DLA. In addition, once the DDQN is properly trained, only a small amount of algebraic computation is needed to obtain the solution.

7 CONCLUSION
In this paper, we proposed an AoI-driven UAV sensing framework in which a cellular-connected UAV provides remote sensing services over multi-cell cellular networks. In particular, we developed a remote UAV sensing model over multi-cell cellular networks in urban environments. Taking into account the tradeoff between the AoI and operation time of the UAV, we formulated a weighted sum minimization problem. The problem jointly optimizes the UAV trajectory and operation time as well as transmission scheduling and BS association. A search algorithm and the low-complexity DGA were developed to obtain optimal and suboptimal solutions for the average performance, respectively. A DLA was then proposed to analyze the site-specific performance by employing a DDQN with a dueling network model. Simulation results validate the proposed schemes in supporting remote UAV sensing and demonstrate the flexible tradeoff between the AoI and operation time of the UAV.

The design framework can be extended to take the energy issue into account, including both an accurate propulsion energy model and a communication-related energy model; this is left for future work. For scenarios in which multiple UAVs share the same mission, a comprehensive cooperative sensing design deserves further study, where game theory may be adopted to tackle the coordination among the UAVs. Furthermore, it would be possible to construct a radio map of the considered area given the locations of the ground BSs and the buildings [56], which may provide environment awareness for the corresponding problem. In future work, accurate radio map construction and radio-map-based optimization need to be further considered in practice.

REFERENCES
[1] Y. Zeng, Q. Wu, and R. Zhang, “Accessing from the sky: A tutorial on UAV communications for 5G and beyond,” Proc. IEEE, vol. 107, no. 12, pp. 2327–2375, Dec. 2019.
[2] Q. Wu et al., “A comprehensive overview on 5G-and-beyond networks with UAVs: From communications to sensing and intelligence,” IEEE J. Sel. Areas Commun., vol. 39, no. 10, pp. 2912–2945, Oct. 2021.
[3] K. Rezaee, S. J. Mousavirad, M. R. Khosravi, M. K. Moghimi, and M. Heidari, “An autonomous UAV-assisted distance-aware crowd sensing platform using deep ShuffleNet transfer learning,” IEEE Trans. Intell. Transp. Syst., vol. 23, no. 7, pp. 9404–9413, July 2022.
[4] M. Mozaffari, W. Saad, M. Bennis, Y.-H. Nam, and M. Debbah, “A tutorial on UAVs for wireless networks: Applications, challenges, and open problems,” IEEE Commun. Surveys Tuts., vol. 21, pp. 2334–2360, third quarter 2019.
[5] Y. Zeng, J. Lyu, and R. Zhang, “Cellular-connected UAV: Potential, challenges, and promising technologies,” IEEE Wireless Commun., vol. 26, no. 1, pp. 120–127, Feb. 2019.
[6] W. Mei and R. Zhang, “Aerial-ground interference mitigation for cellular-connected UAV,” IEEE Wireless Commun., vol. 28, no. 1, pp. 167–173, Feb. 2021.
[7] W. K. New, C. Y. Leow, K. Navaie, Y. Sun, and Z. Ding, “Interference-aware NOMA for cellular-connected UAVs: Stochastic geometry analysis,” IEEE J. Sel. Areas Commun., vol. 39, no. 10, pp. 3067–3080, Oct. 2021.
[8] S. Zhang, H. Zhang, B. Di, and L. Song, “Cellular UAV-to-X communications: Design and optimization for multi-UAV networks,” IEEE Trans. Wireless Commun., vol. 18, no. 2, pp. 1346–1359, Feb. 2019.
[9] F. Wu, H. Zhang, J. Wu, and L. Song, “Cellular UAV-to-device communications: Trajectory design and mode selection by multi-agent deep reinforcement learning,” IEEE Trans. Commun., vol. 68, no. 7, pp. 4175–4189, July 2020.
[10] J. Hu, H. Zhang, L. Song, R. Schober, and H. V. Poor, “Cooperative internet of UAVs: Distributed trajectory design by multi-agent deep reinforcement learning,” IEEE Trans. Commun., vol. 68, no. 11, pp. 6807–6821, Nov. 2020.
[11] S. Zhang, H. Zhang, Z. Han, H. V. Poor, and L. Song, “Age of information in a cellular internet of UAVs: Sensing and communication trade-off design,” IEEE Trans. Wireless Commun., vol. 19, no. 10, pp. 6578–6592, Oct. 2020.
[12] F. Wu, H. Zhang, J. Wu, Z. Han, H. V. Poor, and L. Song, “UAV-to-device underlay communications: Age of information minimization by multi-agent deep reinforcement learning,” IEEE Trans. Commun., vol. 69, no. 7, pp. 4461–4475, July 2021.
[13] X. Zhang, J. Wang, and H. V. Poor, “AoI-driven statistical delay and error-rate bounded QoS provisioning for mURLLC over UAV-multimedia 6G mobile networks using FBC,” IEEE J. Sel. Areas Commun., vol. 39, no. 11, pp. 3425–3443, Nov. 2021.
[14] Z. Dai, C. H. Liu, R. Han, G. Wang, K. Leung, and J. Tang, “Delay-sensitive energy-efficient UAV crowdsensing by deep reinforcement learning,” to appear in IEEE Trans. Mobile Comput., 2023.
[15] N. Van Cuong, Y.-W. Peter Hong, and J.-P. Sheu, “UAV trajectory optimization for joint relay communication and image surveillance,” IEEE Trans. Wireless Commun., vol. 21, no. 12, pp. 10177–10192, Dec. 2022.
[16] X. Xu, Y. Zeng, Y. L. Guan, and R. Zhang, “Overcoming endurance issue: UAV-enabled communications with proactive caching,” IEEE J. Sel. Areas Commun., vol. 36, no. 6, pp. 1231–1244, June 2018.
[17] H. Wang, J. Wang, G. Ding, J. Chen, F. Gao, and Z. Han, “Completion time minimization with path planning for fixed-wing UAV communications,” IEEE Trans. Wireless Commun., vol. 18, no. 7, pp. 3485–3499, July 2019.
[18] J. Gong, T.-H. Chang, C. Shen, and X. Chen, “Flight time minimization of UAV for data collection over wireless sensor networks,” IEEE J. Sel. Areas Commun., vol. 36, no. 9, pp. 1942–1954, Sept. 2018.
[19] Q. Guo et al., “Minimizing the longest tour time among a fleet of UAVs for disaster area surveillance,” IEEE Trans. Mobile Comput., vol. 21, no. 7, pp. 2451–2465, July 2022.
[20] R. Amer, W. Saad, and N. Marchetti, “Mobility in the sky: Performance and mobility analysis for cellular-connected UAVs,” IEEE Trans. Commun., vol. 68, no. 5, pp. 3229–3246, May 2020.
[21] C. You and R. Zhang, “Hybrid offline-online design for UAV-enabled data harvesting in probabilistic LoS channels,” IEEE Trans. Wireless Commun., vol. 19, no. 6, pp. 3753–3768, June 2020.
[22] C. Zhan and Y. Zeng, “Energy-efficient data uploading for cellular-connected UAV systems,” IEEE Trans. Wireless Commun., vol. 19, no. 11, pp. 7279–7292, Nov. 2020.
[23] Y. Zeng, X. Xu, S. Jin, and R. Zhang, “Simultaneous navigation and radio mapping for cellular-connected UAV with deep reinforcement learning,” IEEE Trans. Wireless Commun., vol. 20, no. 7, pp. 4205–4220, July 2021.
[24] M. Samir, C. Assi, S. Sharafeddine, and A. Ghrayeb, “Online altitude control and scheduling policy for minimizing AoI in UAV-assisted IoT wireless networks,” IEEE Trans. Mobile Comput., vol. 21, no. 7, pp. 2493–2505, July 2022.
[25] J. Liu, P. Tong, X. Wang, B. Bai, and H. Dai, “UAV-aided data collection for information freshness in wireless sensor networks,” IEEE Trans. Wireless Commun., vol. 20, no. 4, pp. 2368–2382, Apr. 2021.
[26] C. Luo, M. N. Satpute, D. Li, Y. Wang, W. Chen, and W. Wu, “Fine-grained trajectory optimization of multiple UAVs for efficient data gathering from WSNs,” IEEE/ACM Trans. Netw., vol. 29, no. 1, pp. 162–175, Feb. 2021.
[27] T. Wu et al., “A novel AI-based framework for AoI-optimal trajectory planning in UAV-assisted wireless sensor networks,” IEEE Trans. Wireless Commun., vol. 21, no. 4, pp. 2462–2475, Apr. 2022.
[28] W. Wang, N. Zhao, L. Chen, X. Liu, Y. Chen, and D. Niyato, “UAV-assisted time-efficient data collection via uplink NOMA,” IEEE Trans. Commun., vol. 69, no. 11, pp. 7851–7863, Nov. 2021.
[29] W. Xu et al., “Minimizing the deployment cost of UAVs for delay-sensitive data collection in IoT networks,” IEEE/ACM Trans. Netw., vol. 30, no. 2, pp. 812–825, Apr. 2022.
[30] D. Saha, D. Pattanayak, and P. S. Mandal, “Surveillance of uneven surface with self-organizing unmanned aerial vehicles,” IEEE Trans. Mobile Comput., vol. 21, no. 4, pp. 1449–1462, Apr. 2022.
[31] S. Hosseinalipour, A. Rahmati, D. Y. Eun, and H. Dai, “Energy-aware stochastic UAV-assisted surveillance,” IEEE Trans. Wireless Commun., vol. 20, no. 5, pp. 2820–2837, May 2021.
[32] W. Wang et al., “Deployment of unmanned aerial vehicles for anisotropic monitoring tasks,” IEEE Trans. Mobile Comput., vol. 21, no. 2, pp. 495–513, Feb. 2022.
[33] H. Gao, J. Feng, Y. Xiao, B. Zhang, and W. Wang, “A UAV-assisted multi-task allocation method for mobile crowd sensing,” to appear in IEEE Trans. Mobile Comput., 2023.
[34] B. Chang, W. Tang, X. Yan, X. Tong, and Z. Chen, “Integrated scheduling of sensing, communication, and control for mmWave/THz communications in cellular connected UAV networks,” IEEE J. Sel. Areas Commun., vol. 40, no. 7, pp. 2103–2113, July 2022.
[35] J. Liu, M. Sheng, R. Lyu, Y. Shi, and J. Li, “Access points in the air: Modeling and optimization of fixed-wing UAV network,” IEEE J. Sel. Areas Commun., vol. 38, no. 12, pp. 2824–2835, Dec. 2020.
[36] Z. Zhou et al., “When mobile crowd sensing meets UAV: Energy-efficient task assignment and route planning,” IEEE Trans. Commun., vol. 66, no. 11, pp. 5526–5538, Nov. 2018.
[37] M. Chen, W. Saad, and C. Yin, “Echo-liquid state deep learning for 360° content transmission and caching in wireless VR networks with cellular-connected UAVs,” IEEE Trans. Commun., vol. 67, no. 9, pp. 6386–6400, Sept. 2019.
[38] S. Zhang, Y. Zeng, and R. Zhang, “Cellular-enabled UAV communication: A connectivity-constrained trajectory optimization perspective,” IEEE Trans. Commun., vol. 67, no. 3, pp. 2580–2604, Mar. 2019.
[39] Propagation Data and Prediction Methods Required for the Design of Terrestrial Broadband Radio Access Systems Operating in a Frequency Range From 3 to 60 GHz, document ITU-R, Rec. P.1410-5, Feb. 2012.
[40] J.-H. Lee, K.-H. Park, Y.-C. Ko, and M.-S. Alouini, “Throughput maximization of mixed FSO/RF UAV-aided mobile relaying with a buffer,” IEEE Trans. Wireless Commun., vol. 20, no. 1, pp. 683–694, Jan. 2021.
[41] Y. Zeng, J. Xu, and R. Zhang, “Energy minimization for wireless communication with rotary-wing UAV,” IEEE Trans. Wireless Commun., vol. 18, no. 4, pp. 2329–2345, Apr. 2019.
[42] M. M. Azari, F. Rosas, and S. Pollin, “Cellular connectivity for UAVs: Network modeling, performance analysis, and design guidelines,” IEEE Trans. Wireless Commun., vol. 18, no. 7, pp. 3366–3381, July 2019.
[43] S. Zhang, H. Zhang, B. Di, and L. Song, “Joint trajectory and power optimization for UAV sensing over cellular networks,” IEEE Commun. Lett., vol. 22, no. 11, pp. 2382–2385, Nov. 2018.
[44] S. Kaul, R. Yates, and M. Gruteser, “Real-time status: How often should one update?” in Proc. IEEE INFOCOM, Orlando, FL, Mar. 2012, pp. 2731–2735.
[45] M. Hua, L. Yang, Q. Wu, and A. L. Swindlehurst, “3D UAV trajectory and communication design for simultaneous uplink and downlink transmission,” IEEE Trans. Commun., vol. 68, no. 9, pp. 5908–5923, Sept. 2020.
[46] P. Vansteenwegen, W. Souffriau, and D. V. Oudheusden, “The orienteering problem: A survey,” Eur. J. Oper. Res., vol. 209, no. 1, pp. 1–10, 2011.
[47] Y. Zeng, X. Xu, and R. Zhang, “Trajectory design for completion time minimization in UAV-enabled multicasting,” IEEE Trans. Wireless Commun., vol. 17, no. 4, pp. 2233–2246, Apr. 2018.
[48] M. Grant and S. Boyd, “CVX: MATLAB software for disciplined convex programming,” 2016. [Online]. Available: http://cvxr.com/cvx
[49] D. B. West, Introduction to Graph Theory. Prentice Hall, 2001.
[50] C. Rego, D. Gamboa, F. Glover, and C. Osterman, “Traveling salesman problem heuristics: Leading methods, implementations and latest advances,” Eur. J. Oper. Res., vol. 211, no. 3, pp. 427–441, 2011.
[51] Z. Wang et al., “Dueling network architectures for deep reinforcement learning,” in Proc. 33rd Int. Conf. Machine Learning (ICML), June 2016, pp. 1995–2003.
[52] H. van Hasselt, A. Guez, and D. Silver, “Deep reinforcement learning with double Q-learning,” in Proc. AAAI, 2016, pp. 2094–2100.
[53] Technical Specification Group Radio Access Network: Study on Enhanced LTE Support for Aerial Vehicles, document 3GPP TR 36.777 V15.0.0, Dec. 2017.
[54] B. Zhu, E. Bedeer, H. H. Nguyen, R. Barton, and Z. Gao, “UAV trajectory planning for AoI-minimal data collection in UAV-aided IoT networks by transformer,” IEEE Trans. Wireless Commun., vol. 22, no. 2, pp. 1343–1358, Feb. 2023.
[55] S. Zhang and R. Zhang, “Trajectory design for cellular-connected UAV under outage duration constraint,” in Proc. IEEE International Conference on Communications (ICC), Shanghai, China, May 2019, pp. 1–6.
[56] N. Dal Fabbro, M. Rossi, G. Pillonetto, L. Schenato, and G. Piro, “Model-free radio map estimation in massive MIMO systems via semi-parametric Gaussian regression,” IEEE Wireless Commun. Lett., vol. 11, no. 3, pp. 473–477, Mar. 2022.

Jing Wang received her Ph.D. degree from Peking University in 2011. She is currently an Associate Professor at the School of Information, Renmin University of China. Her research interests include edge intelligence and data analytics, computer systems for artificial intelligence, and energy-efficient computing. She received the Beijing Nova Award and has published papers in top computer conferences such as MICRO, ISCA, and HPCA, and in IEEE/ACM Transactions journals. Her honors include the Best Paper Award of ICCD and a Featured Paper of IEEE Transactions on Computers. She served as a TPC Member of ISCA, NAS, and ASDB.

Zhi Liu (S’10-M’14-SM’19) received the B.E. degree from the University of Science and Technology of China, China, and the Ph.D. degree in informatics from the National Institute of Informatics. He is currently an Associate Professor at The University of Electro-Communications, Japan. His research interests include video network transmission, vehicular networks, and mobile edge computing. He was a recipient of the IEEE StreamComm 2011 Best Student Paper Award, the 2015 IEICE Young Researcher Award, and the ICOIN 2018 Best Paper Award. He is now an Editorial Board Member of Wireless Networks (Springer) and IEEE OPEN JOURNAL OF THE COMPUTER SOCIETY. He is a senior member of IEEE.