H.265/HEVC Video Transmission Over 4G Cellular Networks
Aman Jassal
Long Term Evolution has been standardized by the 3GPP consortium since
2008, with 3GPP Release 12 being the latest iteration of LTE Advanced,
which was finalized in March 2015. High Efficiency Video Coding has been
standardized by the Moving Picture Experts Group since 2012 and is the
tent to users. With video traffic projected to represent the lion’s share of
mobile data traffic in the next few years, providing video and non-video
future 5G systems.
the High Efficiency Video Coding standard such as coding structures and
the most within the coded video bitstream to determine which frames have
higher utility for the High Efficiency Video Coding decoder located at the
user’s device and evaluate the performances of best effort and video users
performance for best effort users and packet loss performance for video users
video users using our proposed Frame Reference Aware Proportional Fair
Preface
I hereby declare that I am the author of this thesis. This thesis is an original,
unpublished work under the supervision of Dr. Cyril Leung. In this work,
I played the primary role in designing and performing the research, doing
data analysis and preparing the manuscript under the supervision of Dr.
Cyril Leung.
Table of Contents
Abstract
Preface
Table of Contents
Acknowledgements
Dedication
1 Introduction
2 Basics of H.265/HEVC
Problem
4 System Model
6.1 Contributions
Bibliography
List of Tables
2.2 Reference Picture Sets for the Hierarchical-B Coding Structure of GOP-size 8
2.3 Reference Picture Lists for the Hierarchical-B Coding Structure of GOP-size 8
List of Figures
List of Acronyms
BE Best Effort.
CB Coding Block.
IP Internet Protocol.
KPI Key Performance Indicator.
PB Prediction Block.
RI Rank Indication.
RU Resource Utilization.
Acknowledgements
sincerest thanks to my supervisor, Dr. Cyril Leung, who has given me great
and his insights helped make this research work more valuable. Without his
I would also like to thank Dr. Ahmed Saadani for his guidance and
colleagues, Mr. Sebastien Jeux and Dr. Sofia Martinez Lopez, and more
tion with the 3GPP, have had a great influence on me and without their
British Columbia.
All of the work that has been done in this thesis was supported in part by
Dedication
Chapter 1
Introduction
With the emergence of Long Term Evolution (LTE) and its subsequent iterations standardized by the 3GPP consortium, video services are fast becoming the dominant data services
72% of the total mobile data traffic by 2019 [1]. The transmission of video
and the effect of error propagation within the video sequence in the event
of packet losses. The current dominant standard for video coding is Advanced Video Coding (H.264/AVC) [2], which is used to deliver a wide range
ized by the Moving Picture Experts Group (MPEG) in 2012 and is expected
to reduce the bit rate compared to H.264 High Profile by about 50% while
As we move towards 5G, one of the key targets that we need to achieve
Division Multiple Access (OFDMA) systems [6], where the scheduling al-
gorithm uses the Mean Opinion Score (MOS) as a way to provide QoE.
Other attributes that the research community has been focusing on in order
to improve the QoE of video users are the playback buffer status and the
rebuffering time [7]-[8]. One of the limitations in these works is the reliance
on video traces that were generated for low-definition video sequences en-
coded using H.264/AVC, which are not representative of the targets that
H.265/HEVC video streaming over Wi-Fi wireless networks and shown that
the QoE of video sequences, reflected through the use of MOS, is very sen-
assumed that packet losses are random; however in cellular networks this
tics of the video sequence and the individual user’s link quality will dictate
mission over 4G networks. Existing works have not used the compression
use Key Performance Indicators (KPIs) that have been recommended by the
3. The joint assessment of the QoE of video users and Best Effort users
lines the basics of the H.265/HEVC standard that are relevant to this work.
Chapter 2
Basics of H.265/HEVC
are directly relevant to this thesis and to the problem formulation that
high-level syntax used to represent the video data, the motion prediction
techniques used for video compression and the coding structures and refer-
H.265/HEVC [3]. The main point to understand is that the encoder knows the specifics of the coding structure and has to provide the decoder with the information needed to reconstitute it. This is done through using
a given coding order (which is implicitly embedded in the way LDUs are
ordered) and through using Reference Picture Sets and Reference Picture
Lists (the former are explicitly transmitted and the latter are derived dur-
ing the decoding process). In this chapter we will explain how all of these
features work.
2.1. Syntax Structures and Syntax Elements
inside logical data units called Network Abstraction Layer (NAL) units.
semantics that the decoder can read and understand. The syntax is the set
of words the decoder knows and the semantics tells the decoder how the
syntax and the semantics is recovered through the decoding process, which
Table 2.1 illustrates the syntax structure of a generic NAL unit and the
elements have associated descriptors which are used for parsing purposes, but these are not covered in this thesis; the interested reader is invited to refer to [4] (Chapter 5) for more details. (In this thesis, we will interchangeably use the terms "Picture" and "Frame".) Every NAL unit carries
NumBytesInNalUnit bytes, which further break down into a 16-bit header made of 4 syntax elements and a payload which is the Raw Byte Sequence Payload (RBSP). The first syntax element is the forbidden zero bit (forbidden_zero_bit). The
second syntax element is nal_unit_type, which is written over 6 bits and
carries the type of the RBSP contained in the NAL unit. The values that
it can take are specified in Table 7-1 of [3]; NAL unit types belong either to
Video Coding Layer (VCL) or non-VCL. VCL types comprise all NAL units
that contain coded video data whereas non-VCL types contain parameter
information. The third syntax element is the layer identifier, nuh_layer_id, which is written over 6 bits. Its value is always 0, although other values are used in the scalable or 3D video coding extensions of [3]. The fourth and final syntax element is nuh_temporal_id_plus1, which is written over 3 bits. Its value is typically 1, which means that there
is only one temporal layer. We assume that this is the case throughout the
thesis. The temporal identifier for the NAL unit, TemporalID, is obtained as TemporalID = nuh_temporal_id_plus1 − 1.
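As an illustration of the header layout described above, the following sketch (our own illustration, not taken from any reference software) parses the two header bytes; the example bytes 0x26 0x01 correspond to an IDR_W_RADL NAL unit:

```python
def parse_nal_unit_header(b0: int, b1: int) -> dict:
    """Parse the 16-bit H.265/HEVC NAL unit header from its two bytes."""
    return {
        "forbidden_zero_bit": (b0 >> 7) & 0x1,          # 1 bit, must be 0
        "nal_unit_type": (b0 >> 1) & 0x3F,              # 6 bits (Table 7-1 of [3])
        "nuh_layer_id": ((b0 & 0x1) << 5) | (b1 >> 3),  # 6 bits, always 0 in this thesis
        "nuh_temporal_id_plus1": b1 & 0x7,              # 3 bits, typically 1
        "TemporalID": (b1 & 0x7) - 1,                   # TemporalID = nuh_temporal_id_plus1 - 1
    }

hdr = parse_nal_unit_header(0x26, 0x01)  # an IDR_W_RADL (type 19) NAL unit header
```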
The payload of NAL units is the RBSP, denoted as the sequence of rbsp_byte elements, where rbsp_byte[i] is the ith byte of the RBSP. Because there are various types of
NAL units, the RBSP itself can be viewed as a syntax structure carrying syntax elements. For each nal_unit_type, the H.265/HEVC standard provides
the description of the associated syntax structure. For instance, the RBSP
of [3]), the RBSP of a Clean Random Access NAL unit has a dedicated
syntax structure further broken into a slice segment header, a slice segment
data and trailing bits (Section 7.3.2.9 of [3]), etc. In order to guarantee
that every NAL unit has a unique start identifier byte, the H.265/HEVC
During the decoding process, this byte is usually discarded. In this thesis,
we assume that a bitstream is only made of generic VCL NAL units and
from this point onwards, a NAL unit will be referred to as a Logical Data Unit (LDU).
2.2. Coding Structures and Reference Picture Lists
which can be decoded using pictures within that sequence. Similarly, a coded
contain all the LDUs associated with that picture. In this section we will
present some of the tools used by the H.265/HEVC standard for motion
with clearly defined dependencies between pictures and a given coding or-
der. The purpose of having pictures depend on others is for prediction, which
can be done from one picture or from two pictures (called uni-prediction and bi-prediction, respectively). The coding order can be different from the output order: the coding order is the order in which pictures are encoded while the output order is the order in which pictures are output for display. The H.265/HEVC standard uses a Picture Order Count (POC) to uniquely identify a given picture
in output order. From this point onwards and for the sake of convenience,
a coding structure is that the first picture in a coding structure does not
ture only reference other pictures within the coding structure for prediction
purposes. In this case, the coding structure is called a closed GOP. The
will use the hierarchical-B coding structure that was used by the Joint Col-
laborative Team on Video Coding (JCT-VC) for the Main Profile Random
will refer to that specific coding structure. For simplicity, throughout the
remainder of this thesis, we will refer to this coding structure simply as the
coding structure. Referenced pictures are denoted by a (*) and arrows point
from the referenced picture to denote all direct dependent pictures. Depen-
dent pictures can be either before or after the referenced picture in display
order. The reference coding structure is actually an open GOP coding struc-
ture and by design it operates with a GOP size of 8. We can see the open
side of the reference coding structure in Fig. 2.1 on the examples where poc0, poc4 and poc6 are the referenced pictures. They are referenced by pictures beyond the GOP size: poc0, poc4 and poc6 are all referenced by poc16. The
reference coding structure uses I-Frames and B-Frames. The coding order
its own since there are no pictures before poc0. Using this definition, we can easily identify that after poc0, the next GOP is comprised of {poc8, poc4, poc2, poc1, poc3, poc6, poc5, poc7}. The reference coding structure is
sequence. The encoder can change the coding structure if it yields better
coding of a video sequence. The decoder at the receiver side will extract the
Coding structures specify the coding order and the dependencies between a
given set of pictures. The decoder does not have any knowledge about the
Table 2.2: Reference Picture Sets for the Hierarchical-B Coding Structure
of GOP-size 8
Reference Picture Set    Reference POCs
0    poc_{n-8}, poc_{n-10}, poc_{n-12}, poc_{n-16}
1    poc_{n-4}, poc_{n-6}, poc_{n+4}
2    poc_{n-2}, poc_{n-4}, poc_{n+2}, poc_{n+6}
3    poc_{n-1}, poc_{n+1}, poc_{n+3}, poc_{n+7}
4    poc_{n-1}, poc_{n-3}, poc_{n+1}, poc_{n+5}
5    poc_{n-2}, poc_{n-4}, poc_{n-6}, poc_{n+2}
6    poc_{n-1}, poc_{n-5}, poc_{n+1}, poc_{n+3}
7    poc_{n-1}, poc_{n-3}, poc_{n-7}, poc_{n+1}
coding structure that was used by the encoder, it must derive this informa-
tion from the LDUs that carry the encoded video data. In this section, we
explain how the encoder transmits the information regarding the dependen-
output. Any picture located in the Decoded Picture Buffer can be reused as
reference for prediction. Pictures that are available for inter prediction are
listed in a so-called Reference Picture Set. The Reference Picture Set is sent
in the Sequence Parameter Set and each picture indexed in there is explicitly
identified using its POC value. Table 2.2 lists the different Reference Picture
Sets defined for the reference coding structure that was used by the JCT-VC
[10]. Eight Reference Picture Sets are defined and for a given picture poc_n, the corresponding referenced POCs are given. Since poc0 is the first POC of the video sequence, if a picture poc_i with i < 0 were to be in a Reference Picture Set, the picture would simply not be included.
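This pruning rule can be sketched as follows, with the POC deltas of Reference Picture Set 0 from Table 2.2 hard-coded for illustration:

```python
def referenced_pocs(poc: int, deltas: list[int]) -> list[int]:
    """Return the reference POCs of a picture, dropping entries before poc0 = 0."""
    return [poc + d for d in deltas if poc + d >= 0]

rps0 = [-8, -10, -12, -16]       # Reference Picture Set 0 from Table 2.2
print(referenced_pocs(8, rps0))  # poc8 keeps only poc0
print(referenced_pocs(16, rps0)) # poc16 references poc8, poc6, poc4 and poc0
```

This reproduces the observation that poc8 only references poc0, while poc16 uses the full set.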
The LDUs of a given picture carry a header that specifies which Reference
Picture Set to activate. H.265/HEVC uses two Reference Picture Lists for
inter prediction, called List0 and List1. The decoder reconstructs these
lists from the Reference Picture Sets that were supplied in the Sequence
Parameter Set and this process is specified in Section 8.3.4. of [3]. The
Set which is actually used for inter prediction. For uni-predicted frames, only List0 is activated, whereas for bi-predicted frames both List0 and List1 are activated. Motion compensated prediction is then
performed using the activated lists. The resulting prediction can be either
made from one picture only or a combination of pictures. Using these lists,
the hierarchy between pictures can be recovered. Table 2.3 depicts the
hierarchical-B coding structure of size 8 that was used by the JCT-VC for
This is the reference coding structure that we use throughout this thesis for
all our video sequences. For each picture, we provide the Reference Picture
Set that is used and the POCs of the pictures in the Reference Picture Lists. poc0 is an I-Frame, and I-Frames do not use inter prediction. Therefore it does not have any associated Reference Picture Set and its associated Reference Picture Lists
are empty. poc8 and poc16 both use the same Reference Picture Set, however
for poc8 , three of the pictures do not exist therefore poc8 only references
2.3. Motion Compensated Prediction
Table 2.3: Reference Picture Lists for the Hierarchical-B Coding Structure
of GOP-size 8
POC RPS used List0 POCs List1 POCs
0 - N/A N/A
8 0 0 0
4 1 0, 8 8, 0
2 2 0, 4 4, 8
1 3 0, 2 2, 4
3 4 2, 0 4, 8
6 5 4, 2 8, 4
5 6 4, 0 6, 8
7 7 6, 4 8, 6
16 0 8, 6, 4, 0 8, 6, 4, 0
12 1 8, 6 16, 8
10 2 8, 6 12, 16
9 3 8, 10 10, 12
... ... ... ...
poc0 . By combining the information in Table 2.2 and Table 2.3, one can
coded frames (I-Frames) whereas inter prediction is used for all other frames,
hind inter prediction is that a given picture uses another picture as ref-
erence, searches for the block in that reference picture that best matches
the predicted area and encodes the information of the motion of that block
two pictures as reference for inter prediction. Fig. 2.2 illustrates the con-
using the coding structures that we introduced in Section 2.2.1. poc does
uni-prediction from picture poc − 2 and does bi-prediction from its adjacent
pictures poc−1 and poc+1. Note that bi-prediction does not require the pictures to be adjacent to poc: one CB from poc uses poc−2 and poc−1 for bi-prediction.
blocks called Prediction Blocks (PBs). After the picture has been partitioned
into PBs, the encoder will then perform prediction on a PB-basis from the
reference pictures whose POCs are given in the Reference Picture Lists.
2.4. Operation with Networking Layers
The encoder will look through the reference pictures for the same area as
encodes the information of the shift as the tuple of the motion vector and
the reference picture’s POC. The motion vector is the shift between the
area corresponding to the PB and the area in the reference picture which
presented the lowest rate-distortion cost. The basic idea behind rate-distortion optimization is that the encoder looks for the coding mode that jointly minimizes the loss of video quality, i.e. the distortion, and the bit rate required to encode that area, i.e. the rate. It is beyond the scope
of this thesis to delve into rate-distortion algorithms and their specifics and
the interested reader is invited to refer to [11] and to [4] (Chapter 2) for
at the Application layer, which sits at the highest level in the Open Sys-
tems Interconnection (OSI) model [12]. The encoder generates LDUs which
are then sent to the lower layers for transmission over packetized networks
based on the Internet Protocol (IP). One of the commonly used solutions
for delivering video content over IP networks is to use the Real Time Proto-
col (RTP). The Internet Engineering Task Force (IETF) has formulated the
RFC 6184, which details the operation of RTP for delivering H.264/AVC
content [13]. Similarly the IETF has formulated a draft RFC for the op-
eration of RTP for delivering H.265/HEVC content [14]. We will look into
thesis, we assume that for all users we have a Single RTP stream on a
single media transport (SRST) and all LDUs are sent in RTP packets that
use the Single NAL unit packet structure. Fig. 2.3 shows the structure of
such an RTP packet. The PayloadHdr field is the bit-exact copy of the LDU
header, the DONL field is optional and carries the 16 least significant bits
of the Decoding Order Number. We assume that this field does not exist.
The NAL unit payload data field is the payload of the LDU and the last
field is also optional and included for the purpose of padding. We assume
that all RTP packets have a padding field occupying 10 bytes. Given that
field in the Single NAL unit packet structure: the RefCount field. Since the
sequence, it can also keep track of the number of times a given picture is
referenced within the video sequence and propagate that information to the
For live streaming services, RTP is used in conjunction with the User
over the Hypertext Transfer Protocol (HTTP) using adaptive bit rate and is
Fidelity (Wi-Fi). Fig. 2.4 gives an illustration of how the protocol stacks
are set up. In this thesis, we will focus on delivering video streaming services to cellular users. We assume the use of RTP and UDP to supply packets
over IP, using the modified Single NAL unit packet structure for the RTP
Chapter 3
structures, syntax structures and syntax elements, which are used to en-
more bandwidth-efficient encoding and reference picture lists for helping the
decoder track which pictures to use as reference when doing motion predic-
that exploits these features and delivers video content based on their de-
3.1. Mathematical Formulation of the Shared Resource Allocation Problem
us consider a user k and let the channel capacity of user k for time-slot n
of the shared resource allocation problem, which has been widely used by
networks. This shared resource allocation problem, which we will call SRAP,
SRAP:
\text{maximize} \quad F(\vec{r}(n)) \triangleq \sum_{k \in S} U_k(r_k(n)) \qquad (3.1)
F is the objective function that we are trying to maximize, Uk (rk (n)) denotes
the utility function of user k and rk (n) is the average throughput of user k
up to time-slot n. Constraint (3.2) ensures that the rate of the user does
not exceed the channel capacity Ck (n) that user k is experiencing during
is strictly concave and differentiable and that the feasible region in (3.2) is
solution exists for SRAP and Kelly has provided an explicit optimal solution
In wireless networks, the channel capacity and the number of users actively sharing resources vary with time. This is due to the random nature of the wireless channel and the network's traffic. As a result, the optimal
solution to SRAP also varies with time. Hosein [17] proposed a solution that serves, in each time-slot, the user which maximizes the gradient of the objective function. Hosein
r_k(n+1) = \begin{cases} \left(1 - \frac{1}{\tau}\right) r_k(n) + \frac{d_k(n)}{\tau}, & \text{if user } k \text{ is served}, \\ \left(1 - \frac{1}{\tau}\right) r_k(n), & \text{otherwise}. \end{cases} \qquad (3.3)
d_k(n) is the throughput of user k estimated for time-slot n in bits per second. τ > 1 is the time constant of the exponential smoothing filter. r_k(n) is the average throughput of user k. Since the objective function is strictly concave, Hosein showed that all we need to find is
the direction, i.e. the user, which maximizes the gradient of the objective
logarithmic function of the rate of that user log(rk ), then the maximum
gradient direction, i.e. the user maximizing the gradient function, is given
by:

k^* = \arg\max_{k \in S} \left\{ \frac{d_k(n)}{r_k(n)} \right\} \qquad (3.5)
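As a concrete illustration, the selection rule in (3.5) and the throughput update in (3.3) can be sketched as follows (a toy sketch of our own; the throughput values are hypothetical):

```python
def pf_select(d: list[float], r: list[float]) -> int:
    """Eq. (3.5): serve the user maximizing d_k(n) / r_k(n)."""
    return max(range(len(d)), key=lambda k: d[k] / r[k])

def update_avg_throughput(r, d, served: int, tau: float = 1000.0):
    """Eq. (3.3): exponential smoothing of each user's average throughput."""
    return [(1 - 1 / tau) * rk + (dk / tau if k == served else 0.0)
            for k, (rk, dk) in enumerate(zip(r, d))]

d = [1.0e6, 0.4e6]   # estimated achievable throughputs in slot n (bit/s)
r = [0.5e6, 0.05e6]  # smoothed average throughputs (bit/s)
k_star = pf_select(d, r)  # user 1 wins: 0.4/0.05 = 8 > 1.0/0.5 = 2
r = update_avg_throughput(r, d, k_star)
```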
Eq. (3.5) is the well-known Proportional Fair metric, widely used for scheduling in cellular systems such as the Universal Mobile Telecommunications System (UMTS) and LTE. An alternate way of finding this result is as follows. The
the rate of user k and we know how the rate of each user is computed. Let
us assume that user i is selected at time-slot n, the new utility value will be
\sum_{k \in S,\, k \neq i} \log((1 - \tau^{-1}) r_k(n)) + \log((1 - \tau^{-1}) r_i(n) + \tau^{-1} d_i(n)). \qquad (3.6)
By adding and subtracting log((1 − τ^{-1}) r_i(n)) in Eq. (3.6), the sum will be
From Eq. (3.8), it is easy to see that the overall utility is maximized if user i maximizes d_i(n)/r_i(n), which is the Proportional Fair metric. Hosein [17]
also proposed the use of barrier methods in order to account for Quality of
Service (QoS) requirements. Barrier methods are penalty methods, which force the solutions to remain in a certain area
to account for frame reference awareness and call this new problem SRAP-
ck (n), to account for the fact that the network does not hold transmission
queues of infinite size. This also prevents the scenario where a video user
SRAP-FRA:
\text{maximize} \quad F(\vec{r}(n), \vec{c}(n)) \triangleq \sum_{k \in S} U_k(r_k(n), c_k(n)) \qquad (3.9)
C'_k(n) is the constraint on the number of frame references the transmission
queue of user k can hold at any given time-slot n, ck (n) is the average number
Uk (rk (n), ck (n)) is the combined utility function of user k that we introduce
for our frame reference aware scheduling framework. For our scheduling
framework, we need to track for each user whether its transmission queue
is holding any frame that is referenced within the video sequence user k is
watching and take any decision based on that. Essentially, we are building
a scheduling framework where users watching video content get sent content
that the decoder needs to perform its task as efficiently as possible and by
functions and express the combined utility function for each user k as
U_k(r_k(n), c_k(n)) = U_{k,1}(r_k(n)) + U_{k,2}(c_k(n)), \qquad (3.12)

where

U_{k,1}(r_k(n)) \triangleq \log(r_k(n)), \quad U_{k,2}(c_k(n)) \triangleq -\lambda \exp(-\mu(c_k(n) - c_{min})). \qquad (3.13)
λ and µ are positive-valued parameters for adjusting the penalty for leaving the feasible
region. Hosein [17] has proposed the use of such functions for delivering
QoS though there is no indication in the literature to suggest that this type
addressing such issues. Our motivation for using a barrier function based
c_k(n+1) = \begin{cases} \left(1 - \frac{1}{T}\right) c_k(n) + \frac{t_k(n)}{T}, & \text{if user } k \text{ is served}, \\ \left(1 - \frac{1}{T}\right) c_k(n), & \text{otherwise}, \end{cases} \qquad (3.14)
where ck (n) is the frame reference count of user k at the beginning of time-
slot n, cmin is the minimum number of frame references that we force the
system to provide to each video user, T > 1 is the time constant of the
SRAP-FRA:
\text{maximize} \quad F(\vec{r}(n), \vec{c}(n)) \triangleq \sum_{k \in S} \left[ U_{k,1}(r_k(n)) + U_{k,2}(c_k(n)) \right] \qquad (3.15)
3.2. Solution to the proposed Shared Resource Allocation Problem
In this section, we are going to derive the solution to the proposed optimiza-
tion problem SRAP-FRA (3.15). We need to find the user that maximizes
utility function as the sum of two separate utility functions (3.12), maximizing the combined utility can be written as maximizing \sum_{k \in S} U_{k,1}(r_k(n)) and \sum_{k \in S} U_{k,2}(c_k(n)) individually. We already know the solution to the
maximization of the sum of the first utility function Uk,1 (rk (n)). We will
focus on deriving the solution to the maximization of the sum of the second
utility function Uk,2 (ck (n)). Let us call j the user selected to be served at
parameterize the movement of the sum of the second utility functions in the
as:
User j is served and all other users are not. Given the update equations of
F_{j,2}(\beta) = U_{j,2}\!\left(c_j(n) + \beta\,\frac{t_j(n) - c_j(n)}{T}\right) + \sum_{k \in S,\, k \neq j} U_{k,2}\!\left(c_k(n) - \beta\,\frac{c_k(n)}{T}\right). \qquad (3.19)
get:
Since we are looking to maximize \partial F_{j,2} / \partial\beta, we can ignore the second term of (3.21) as this term is a sum which is common to all users in the network. We also know the expression of U_{k,2}, so the expression of the maximum gradient direction is

k^* = \arg\max_{k} \left\{ \frac{\lambda\mu}{T}\, t_k(n) \exp(-\mu(c_k(n) - c_{min})) \right\}. \qquad (3.22)
Essentially, this means that the system will maximize the utility of users
vided frames which the decoder can always decode or if the decoder does not
have to wait for other frames before being able to decode those frames, then
video users can watch video sequences with no perceptible delay and this
will enhance the Quality of Experience of video users. This sort of procedure
helps counter error propagation within the video decoding process, thereby improving error resilience. Using (3.5), (3.12) and (3.22), the final expression of the metric
for the proposed scheduling framework (3.15) can then be expressed as:
\frac{d_k(n)}{r_k(n)} + \frac{\lambda\mu}{T}\, t_k(n) \exp(-\mu(c_k(n) - c_{min})). \qquad (3.23)
For the rest of this thesis, we shall refer to our proposed scheduling scheme as the Frame Reference Aware Proportional Fair (FRA-PF) scheduler.
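As a sketch (not the simulator code; the parameter values below are hypothetical), the per-user metric in (3.23) can be computed as:

```python
import math

def fra_pf_metric(d, r, t, c, lam=1.0, mu=1.0, c_min=1.0, T=100.0):
    """Eq. (3.23): PF term plus the barrier term on the frame reference count."""
    return d / r + (lam * mu / T) * t * math.exp(-mu * (c - c_min))

# With lam = 0 the barrier vanishes and the metric reduces to plain PF:
pf_only = fra_pf_metric(2.0, 1.0, t=5.0, c=3.0, lam=0.0)  # equals d/r = 2.0
```

The scheduler then serves, in each time-slot, the user with the largest metric value.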
Chapter 4
System Model
this chapter we will cover the components that are of utmost relevance to this
Analytical traffic models have been proposed for near-real time video stream-
ing in [18], where the packet sizes and packet inter-arrival times are based
4.1. H.265/HEVC Video Content Generation
variability in the packet sizes coming from the video source, it is agnostic to
the application level experience of H.265/HEVC video users and to this end,
test sequences which were used for development and testing purposes by
The characteristics of these video test sequences are given in Table 4.1. For
trace files using HM 14.0 [21], from which we extract the information of the
Reference Picture Lists, as defined in Section 2.2.2, for all frames in order
For simplicity, we assume that each frame consists of only one slice seg-
ment (see Section 2.1), so that each frame is encoded inside one LDU. The
GoP size is set to 8; the Intra-Period is defined as the interval between two successive I-Frames and is chosen so that an I-Frame can be found approximately every second. Its value depends on the frame-rate of the video sequence: for a frame rate of {20, 24,
30, 50, 60} fps, the Intra-Period is set to {16, 24, 32, 48, 64} (respectively).
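This mapping simply rounds the frame rate to the nearest multiple of the GOP size, as a quick sketch shows:

```python
def intra_period(fps: int, gop_size: int = 8) -> int:
    """Round the frame rate to the nearest multiple of the GOP size."""
    return gop_size * round(fps / gop_size)

print([intra_period(f) for f in (20, 24, 30, 50, 60)])  # [16, 24, 32, 48, 64]
```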
Aside from I-Frames, we use B-Frames only. Using the bitstreams generated
from the video sequences we selected, we create a custom Traffic Model for
each video sequence and use it as input to our LTE-A simulator, which is
4.2. LTE-Advanced System Model
In this section, we will describe some of the components and features that
assumed to have three sectors each in order to provide coverage, thus there
grid layout is provided in Fig. 4.1. To ensure that all cells experience
base stations. The central cluster is where the users are created and where
all of the statistics are collected. Fig. 4.2 illustrates the concept of wrap-
around. Virtual clusters are depicted in grey while the central cluster is
yellow. The surrounding clusters are virtual clusters in the sense that no user
is actually dropped there. All the cells in the virtual clusters are copies of the
original cells in the central cluster. The virtual cells are identical in terms of antenna configuration, traffic and fast-fading, with the only
random locations in the central cluster. For all base stations, we assume that
each sector uses 4 transmit antennas and each user uses 2 receive antennas.
Fig. 4.3 depicts a PRB allocation with 4 users in a system with 10 MHz of
bandwidth. Note that at 10 MHz, the last group only contains 2 PRBs as
We model two types of traffic: Best Effort (BE) traffic and video traffic.
Each user is assigned BE traffic or video traffic with equal probability 0.5. Usually, users are assumed to be active for the entire duration of the simulation, i.e. they are created at the beginning of the simulation and dropped at the
end of the simulation, as stated in [18]. In this thesis, we decided to use more
realistic traffic models. Users are created at random time instants according to a Poisson process and remain in the network until they have completed their session or until they are dropped from the
network. For the BE traffic model, we use FTP Traffic Model 1 defined in
the 3GPP Technical Report [19] and whose parameters are summarized in
Table 4.3.
Similarly, we define a traffic model for video users; in this thesis we use
our own custom traffic model. Because we need information about frame references, we generate our own H.265/HEVC video bitstreams for use in our performance evaluation. Section 4.1 covers the actual
generation of the video bitstreams in more detail. We wrap the video bit-
worth of video data. This helps us generate video traffic representing one
minute’s worth of video data. Video users remain in the network until there
are no more packets left for them to receive. The parameters of our video
For every user in the network, we need to model the effects of the large-scale
tion and fading characteristics of the channel may be different. In this thesis,
M.2135 [25]. The ITU-R scenario defines users traveling at vehicular speeds
(30 km/h) whereas the 3GPP Urban Macrocell scenario defines users as
traveling at pedestrian speeds (3 km/h). The reason for using the 3GPP
data rates, which are more practical if the users are moving at pedestrian
such as the Spatial Channel Model [26] to capture these aspects. Typically,
channel models capture the number of clusters and their spatial characteristics such as the delay spread, the angular spread and the power carried
tool we used, IMTAphy, uses the channel model specified by the ITU-R in report M.2135 [25]. (In this thesis, we will interchangeably use the terms "Cluster" and "Tap".) In [25], the channel model for the Urban Macrocell scenario
use is the Spatial Channel Model [26], which is a 6-tap model. There are two
reasons for choosing the 3GPP Spatial Channel Model. The first reason is
that although the ITU-R Channel Model is more accurate, it requires a large
also requires high computational power due to having to sum a large num-
ber of clusters for every link, for every subcarrier and for every time-slot.
two different scheduling schemes. The relevant aspect of the channel model
acteristics of the channel such as Delay Spread and Angular Spread rather
radio channel can typically be described through its large-scale and small-
formula used for the Urban Macrocell scenario is defined in [24] as follows
where PL denotes the mean path loss in dB between a given user and a given
base station and d denotes the distance between the user and the base station
around 2 GHz. The distance between a user and a base station must always
account for the aspects of modelling a MIMO channel and are given for a
given pair of antennas s and u (resp. station and user) and a given cluster
n:
h_{u,s,n}(t) = \begin{cases} \sqrt{\frac{1}{K_R+1}}\, h^{NLoS}_{u,s,n}(t) + \sqrt{\frac{K_R}{K_R+1}}\, h^{LoS}_{u,s,n}(t), & n = 1, \\ \sqrt{\frac{1}{K_R+1}}\, h^{NLoS}_{u,s,n}(t), & 2 \le n \le N, \end{cases} \qquad (4.2)
is applied only to the first cluster. The way the Spatial Channel Model is
designed, the first cluster is the cluster for which the delay is the shortest.
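As a simplified sketch of (4.2), with scalar per-cluster gains standing in for the full polarimetric terms (our own simplification for illustration):

```python
import math

def apply_k_factor(h_nlos: list, h_los: complex, K_R: float) -> list:
    """Eq. (4.2): scale all NLoS clusters and add the LoS part to the first cluster."""
    h = [hn / math.sqrt(K_R + 1) for hn in h_nlos]
    h[0] += math.sqrt(K_R / (K_R + 1)) * h_los  # first cluster has the shortest delay
    return h
```

With K_R = 0 the LoS component vanishes and the NLoS gains are returned unchanged.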
The non line-of-sight channel component is expressed for a given cluster as:

h^{NLoS}_{u,s,n}(t) = \sqrt{\frac{P_n}{M}} \sum_{m=1}^{M} \begin{bmatrix} F_{rx,u,V}(\theta_{n,m}) \\ F_{rx,u,H}(\theta_{n,m}) \end{bmatrix}^{T} \begin{bmatrix} \exp(j\Phi^{vv}_{n,m}) & \sqrt{\kappa^{-1}}\exp(j\Phi^{vh}_{n,m}) \\ \sqrt{\kappa^{-1}}\exp(j\Phi^{hv}_{n,m}) & \exp(j\Phi^{hh}_{n,m}) \end{bmatrix} \begin{bmatrix} F_{tx,s,V}(\phi_{n,m}) \\ F_{tx,s,H}(\phi_{n,m}) \end{bmatrix} \exp(j d_s 2\pi\lambda_0^{-1}\sin(\phi_{n,m})) \exp(j d_u 2\pi\lambda_0^{-1}\sin(\theta_{n,m})) \exp(j 2\pi\nu_{n,m} t) \qquad (4.3)
where P_n is the power of the nth cluster, M is the number of rays within the cluster, F_{rx,u,V} and F_{rx,u,H} are the field patterns of the uth antenna element at the receiver, F_{tx,s,V} and F_{tx,s,H} are the field patterns of the sth antenna element at the transmitter, θ_{n,m} and φ_{n,m} are the arrival and departure angles of the mth ray in the nth cluster, d_s and d_u are the distances between antenna elements at the transmitter and at the receiver, λ_0 is the carrier wavelength, κ is the cross-polarization power ratio, ν_{n,m} is the Doppler frequency component of the mth ray of the nth cluster and t is the time instant. Φ^{vv}_{n,m}, Φ^{vh}_{n,m}, Φ^{hv}_{n,m} and Φ^{hh}_{n,m} are uniformly generated random phases. The line-of-sight channel component is given by:
$$
h_{u,s,n}^{LoS}(t) =
\begin{bmatrix} F_{rx,u,V}(\theta_{LoS}) \\ F_{rx,u,H}(\theta_{LoS}) \end{bmatrix}^{T}
\begin{bmatrix}
\exp(j\Phi^{vv}_{LoS}) & 0 \\
0 & \exp(j\Phi^{hh}_{LoS})
\end{bmatrix}
\begin{bmatrix} F_{tx,s,V}(\phi_{LoS}) \\ F_{tx,s,H}(\phi_{LoS}) \end{bmatrix}
\exp\!\left(j d_s 2\pi\lambda_0^{-1}\sin(\phi_{LoS})\right)
\exp\!\left(j d_u 2\pi\lambda_0^{-1}\sin(\theta_{LoS})\right)
\exp(j 2\pi \nu_{LoS} t)
\tag{4.4}
$$
where F_{rx,u,V} and F_{rx,u,H} are the field patterns of the uth antenna element at the receiver, F_{tx,s,V} and F_{tx,s,H} are the field patterns of the sth antenna element at the transmitter, θ_{LoS} and φ_{LoS} are the arrival and departure angles of the line-of-sight ray, ν_{LoS} is its Doppler frequency component, and Φ^{vv}_{LoS} and Φ^{hh}_{LoS} are its random phases.
The channel impulse responses given by (4.2) are expressed in the time domain. The entries of the frequency-domain channel matrix at the kth subcarrier for a 4x2 MIMO system are given as:
$$
H(k) =
\begin{bmatrix}
H_{1,1}(k) & H_{1,2}(k) & H_{1,3}(k) & H_{1,4}(k) \\
H_{2,1}(k) & H_{2,2}(k) & H_{2,3}(k) & H_{2,4}(k)
\end{bmatrix},
\quad k \in \{1, 2, \ldots, N_{FFT}\}
\tag{4.5}
$$
where N_{FFT} is the Fast Fourier Transform size. Let us denote the Fast Fourier Transform operator by $\mathcal{F}$; then

$$
H_{u,s}(k) = \mathcal{F}\left[h_{u,s,1}(t), h_{u,s,2}(t), \ldots, h_{u,s,N}(t)\right], \quad k \in \{1, 2, \ldots, N_{FFT}\}.
\tag{4.6}
$$
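The mapping in (4.6) for a tapped-delay-line channel can be sketched as follows (function and parameter names are ours; the 1024-point size matches a 10 MHz LTE configuration):

```python
import cmath

def taps_to_subcarriers(tap_gains, tap_delays_s, n_fft=1024, scs_hz=15_000):
    """Frequency response H(k) on each subcarrier from complex tap gains and
    their delays: each tap contributes a phase ramp across frequency."""
    response = []
    for k in range(n_fft):
        f_k = k * scs_hz
        response.append(sum(g * cmath.exp(-2j * cmath.pi * f_k * tau)
                            for g, tau in zip(tap_gains, tap_delays_s)))
    return response
```

A single tap with zero delay yields a flat frequency response, as expected of a frequency-nonselective channel.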
In the specific case of LTE, the subcarrier spacing is defined as 15 kHz. The FFT size is chosen so that the resulting sampling rate is at least higher than the 10 MHz system bandwidth and is a multiple of the subcarrier spacing, i.e. 15 kHz. Since Fast Fourier Transforms are optimized for lengths that are powers of two, we use N_{FFT} = 1024, corresponding to a sampling rate of 15.36 MHz.
It is shown in Chapter 8 of [27] that with CSI knowledge at the transmitter, one can extract the maximum performance available from MIMO systems. The 3GPP standard has outlined several control signalling mechanisms for each of the transmission modes it defines. In this thesis, we use implicit channel state feedback: instead of sending information about the channel matrix itself, the user sends quantized information about different channel statistics that can help the network select suitable transmission parameters.
The Rank Indicator (RI) is the rank of the channel matrix, i.e. the number of degrees of freedom that it can carry. The Precoding Matrix Indicator (PMI) is the index of the precoding matrix that maximizes the received power at the receiver, and the Channel Quality Indicator (CQI) indicates the spectral efficiency that the receiver would be able to achieve. The PMI and CQI
reports are conditioned upon the value of the RI. The reporting mode we
use in this thesis is the Aperiodic CSI Reporting Mode 3-1, as defined in
Section 7.2.1 of [28]. Other reporting modes are also defined by the 3GPP
[28].
In this reporting mode, the user sends a single wideband PMI report and several subband CQI reports. The size of a subband is defined as 6 resource blocks for a system bandwidth of 10 MHz in [28]. Thus, a single CSI report from the user will contain one value
for the RI, one value for the PMI and nine values for the CQI (one CQI
value per subband). In this thesis, we assume that the periodicity of the
CSI reports is set to 5 ms. The RI is typically a statistic that is reported less
frequently than the PMI or the CQI, and its periodicity is set to 20 ms.
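To make the reporting structure above concrete, a small sketch (class and field names are ours, not 3GPP terminology):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class CsiReport:
    """One aperiodic Mode 3-1 style report: a single rank indicator and
    wideband PMI, plus one CQI per subband (nine subbands at 10 MHz)."""
    ri: int                 # rank of the channel matrix, refreshed every 20 ms
    pmi: int                # wideband precoding matrix index
    subband_cqi: List[int]  # nine subband CQI values, refreshed every 5 ms

def make_report(ri: int, pmi: int, cqis: List[int]) -> CsiReport:
    if len(cqis) != 9:
        raise ValueError("expected nine subband CQI values at 10 MHz")
    return CsiReport(ri=ri, pmi=pmi, subband_cqi=list(cqis))
```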
Chapter 5
Some of the key targets specified by the NGMN Alliance for 5G networks are a consistent user experience across the coverage area and enhanced Quality of Experience. These targets are defined and outlined in [5]. One target is that users should obtain a minimum user throughput for 95% of the time across 95% of the coverage area, which is reflected in the 10th percentile of the Cumulative Distribution Function (CDF) of the user throughput. We also look at the average user throughput. In this chapter, we present our simulation results, including insights gained from our results. So far, most works in the field rely on full-buffer traffic models, whose limitation is that they only capture performance metrics (for instance user throughput and served cell throughput) in a range where the network is operating at full load. Since the traffic load varies depending on the time of the day, it is useful for carriers to have a more complete picture of network performance across traffic loads.
5.1. Simulation Assumptions
using traffic models where user arrivals are modelled according to a Poisson process. We expect performance to be good at low traffic load points because there is a small number of users in the network, which results in low interference and high user throughputs. This ensures that users that enter the network are served quickly and leave quickly. This scenario is not ideal for carriers because although performance is excellent, they are earning little revenue due to the small number of users. Conversely, we expect that performance will be bad at high traffic load points because there is a large number of users in the network, which results in high interference and low user throughputs. This scenario is also undesirable for carriers because although revenues are high due to the large number of users, the poor performance will lead to customer dissatisfaction. The desirable scenario for carriers is at intermediate traffic loads, where the number of users in the network leads to acceptable throughputs and users can enjoy reasonably good Quality of Experience.
The main components of our system model are described in Chapter 4. Here,
we describe some of the other assumptions made. We assume that the base stations are able to parse each user's video bitstream and identify specific LDUs. Since our LTE-A base stations can parse video bitstreams, they can specifically look for each user's LDUs and keep track of the Ref Count field in the LDUs. Using the information carried by the Ref Count field, the LTE-A system can then keep track of the rate of referenced frames being sent to each video user, using exponential smoothing.
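As an illustration, this bookkeeping could look as follows (a sketch under our own naming; the smoothing factor and the exact scheduler metric are assumptions, not the thesis' definitions):

```python
def smoothed_ref_rate(previous, ref_bits_sent, alpha=0.1):
    """Exponential smoothing of the referenced-frame data delivered to a user."""
    return (1.0 - alpha) * previous + alpha * ref_bits_sent

def fra_pf_metric(inst_rate, avg_rate, ref_frame_queued, boost=2.0):
    """Proportional Fair metric, boosted when the user's queue holds a
    referenced frame (the boost value here is purely illustrative)."""
    pf = inst_rate / max(avg_rate, 1e-9)
    return boost * pf if ref_frame_queued else pf
```

The scheduler would then serve, in each time slot, the user with the largest metric.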
This enables a performance evaluation using metrics which have been proposed for 5G systems. For our LTE-A system, we decide to model a 4x2 MIMO system. We also assume the use of Single-User MIMO (SU-MIMO): multiple transmit antennas and multiple receive antennas can provide additional degrees of freedom. The difference between SU-MIMO and Multi-User MIMO (MU-MIMO) is that SU-MIMO will focus on sending multiple data streams towards the
same user whereas MU-MIMO will focus on sending data streams towards different users. We use Transmission Mode 10 and assume the use of 4-Tx Release 12 Precoding Matrices [28]-[30]. Transmission Mode 10 is a mode where the system allows the use of so-called Coordinated Multi-Point (CoMP) transmission schemes; it is beyond the scope of this thesis to describe the physical layer procedures and processing features that are relevant for the operation of Transmission Mode 10.
Several methods exist in the literature for mapping a set of per-subcarrier SNR values to a block error rate value, such as Exponential Effective SNR Mapping (EESM) [32] and Mutual Information Effective SNR Metric (MIESM) [33]. In this thesis, we use EESM. The basic idea of EESM is to compress the instantaneous SNR values γ_k at the kth subcarrier, taken over the N_sc subcarriers of an allocation, into a single effective SNR γ_eff:
$$
\gamma_{\mathrm{eff}} = -\beta \ln\left( \frac{1}{N_{sc}} \sum_{k=1}^{N_{sc}} \exp\left( -\frac{\gamma_k}{\beta} \right) \right),
\tag{5.1}
$$
The resulting γ_eff is then mapped to a corresponding block error rate. The values of the β parameter depend on the modulation scheme and the code rate,
e.g. β = 1.49 for Quaternary Phase Shift Keying (QPSK) with a code rate of 1/3 or β = 7.68 for 16-Quadrature Amplitude Modulation (16-QAM) with a code rate of 4/5. These values can be found in Table 19.13, Chapter 19 of [20]. Several sources exist for the values of β that can be applied in an LTE or LTE-A system; for our simulations we use the β values given in [32].
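A minimal sketch of the EESM computation in (5.1) (the function name and the dB conversions are ours):

```python
import math

def eesm_effective_snr(snrs_db, beta):
    """EESM per (5.1): compress per-subcarrier SNRs into one effective SNR.

    snrs_db: instantaneous per-subcarrier SNRs in dB; beta: calibration
    parameter for the modulation and code rate (e.g. 1.49 for QPSK 1/3).
    Returns the effective SNR in dB.
    """
    linear = [10.0 ** (s / 10.0) for s in snrs_db]
    mean_exp = sum(math.exp(-g / beta) for g in linear) / len(linear)
    gamma_eff = -beta * math.log(mean_exp)
    return 10.0 * math.log10(gamma_eff)
```

The effective SNR is then looked up in an AWGN block-error-rate table for the same modulation and coding scheme.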
Parameter values for our LTE-A simulations are summarized in Table 5.1 and reflect those used in study items conducted by 3GPP technical groups. The probability that a newly arriving user is a BE user or a video user is 0.5 each, and in our simulations the user arrival rates for the two traffic models, i.e. BE and video, are equal. This ensures that the average number of users generated for each traffic type is the same.
simulation is chosen such that we generate at least 8000 users for each traffic
type. This was done to ensure that all the metrics that are reported in this
thesis are obtained within a 95% confidence interval of ±10% around the
mean value.
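The arrival process described above can be sketched as follows (function names and the per-type split are ours):

```python
import random

def generate_arrivals(rate_per_s, duration_s, seed=0):
    """Poisson arrivals over [0, duration_s]: exponential inter-arrival
    times, each user tagged 'video' or 'BE' with probability 0.5."""
    rng = random.Random(seed)
    t, users = 0.0, []
    while True:
        t += rng.expovariate(rate_per_s)
        if t > duration_s:
            return users
        users.append((t, "video" if rng.random() < 0.5 else "BE"))
```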
We use offered load per sector and Resource Utilization (RU) as our reference points. This is because for finite-buffer traffic models the 3GPP community uses RU to characterize the load levels a cellular network goes through, and we decided to align our methodology with theirs. RU is defined as the ratio of the number of radio resource blocks allocated for data traffic to the total number of radio resource blocks in the system bandwidth available for data traffic [19].
We first ran simulations using the Proportional Fair scheme and determined the offered loads corresponding to RU values between 40% and 70%. Then we ran simulations using the proposed scheme for those offered loads and compared the resulting performance and QoE for both BE users and video users. These offered loads are listed in Table 5.2. It can be seen that for the PF scheme, the offered load values range between 5.88 Mbps per sector and 6.94 Mbps per sector.
For video users, we report the Active Download Time (ADT), the satisfied video user percentage and the packet loss ratio of Clean Random Access (CRA) NAL units. Even a small packet loss ratio, e.g. a packet loss ratio of 3%, is enough to make the Quality of Experience mediocre. Clean Random Access NAL units carry encoded I-Frames, which represent the most significant portion of the bitstream in terms of bit rate. Since the decoding of the whole video sequence is basically reliant on the correct decoding of these LDUs, the packet loss ratio of
5.2. Simulation Results and Discussion
these LDUs provides a good indication of how much video content becomes
non-viewable.
For BE users, we report the absolute values of the average user throughput and the 10th-percentile of the user throughput CDF. We also report the average user throughput in the outer region of every cell. The reason we do not generate more users is that simulation runs generating 8000 video users and 8000 BE users take between 48 and 72 hours of run time; to obtain tighter 95% confidence intervals, we would need to generate possibly over 30000 users, which could take considerably longer. In this thesis, we will refer to the 10th-percentile of the user throughput CDF as the coverage throughput. The user throughput is computed as the ratio of the total volume of the transferred data to the download time. For BE users, the download time is defined as the difference between the time instant of the last packet correctly received by the user and the time instant at which the user entered the network.
In this section, we present our simulation results and discuss the main find-
ings. We will present our results for video users followed by those for BE
users.
For the performance evaluation of video users, we consider two metrics. The first metric is the ADT, which is the time a video user spends actively downloading video content. The second metric is the Mean Opinion Score (MOS).
The 95% confidence intervals for the active download time are within
±6% of the reported values. Fig. 5.1 shows the active download times video
users spend downloading video content while they are in the network. Using
the Proportional Fair scheme, video users spend between 3.5 seconds and
8 seconds downloading video content (for offered loads between 5.9 Mbps
per sector and 6.9 Mbps per sector respectively). These numbers can be
explained by the fact that the Proportional Fair algorithm tries to be fair to all users, video and BE alike, so resources end up being shared by all users. Using our proposed scheme, video users are given higher importance
[Figure 5.1: Active download time versus offered load (Mbps/sector) for the PF and FRA-PF schemes.]
over BE users and end up being served more quickly, as Fig. 5.1 shows. For offered loads between 5.9 Mbps per sector and 6.9 Mbps per sector, video users spend between 2.2 seconds and 4.2 seconds downloading video content. This is very significant because any time that video users do not spend downloading video content frees up radio resources for other users. Another key metric for video services is the MOS, which reflects the quality of the viewing experience from the users' perspective. We are going to look into the MOS that users would give based on the packet loss that they experience.
[Figure 5.2: Satisfied video user percentage (MOS > 4) versus offered load (Mbps/sector) for the PF and FRA-PF schemes.]
The 95% confidence intervals for the satisfied video user percentage results are within ±8% of the reported values.
It was shown in [9] that the MOS is very sensitive to the Packet Loss Ratio
(PLR). The findings in [9] were that PLRs below 1.5% correspond to a high MOS. Assuming that a video user's MOS is only affected by the PLR it experiences, we can state
that the QoE of a video user will be high if the PLR is below 1.5% (i.e.,
its MOS will be greater than 4, and the video user will be satisfied). The
QoE will be low if the PLR is higher than 1.5% (i.e., its MOS will be lower
than 4, and the video user will experience significant degradation). Fig. 5.2
shows the results in terms of video user percentage for which the MOS is
greater than 4.
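The rule above can be written down directly (the function name is ours; the 1.5% threshold is the one from [9]):

```python
def video_user_satisfied(plr_percent: float) -> bool:
    """A user is satisfied (MOS > 4) when its packet loss ratio is below 1.5%."""
    return plr_percent < 1.5
```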
This is because our proposed scheme gives video users higher priority over BE users. As can be seen from Fig. 5.2, for offered loads around 5.9 Mbps per sector, both PF and FRA-PF schemes are able to satisfy the vast majority of video users. The satisfied percentage drops more quickly under PF as the load increases: for offered loads around 6.8 Mbps per sector, the FRA-PF scheme can satisfy over 80% of video users whereas the PF scheme satisfies significantly fewer. We also examine the number of Clean Random Access (CRA) LDUs lost. I-Frames are typically carried inside CRA LDUs
and they represent the most significant portion of the bitstream in terms of bit rate. In the H.265/HEVC standard, I-Frames are the frames that are referenced the most throughout a video sequence, and the loss of an I-Frame causes error propagation until the next I-Frame arrives. We chose our settings for the Intra-Period so that two consecutive I-Frames are one second apart. Intuitively, the loss of an I-Frame causes the loss of about one second of video content to the end user, because all subsequent B-Frames reference it directly or indirectly; the scheduler should therefore protect as many referenced frames as possible. Fig. 5.3 shows the results obtained for the CRA LDU loss ratio. Since the proposed FRA-PF scheme is able to locate referenced frames
[Figure 5.3: CRA LDU loss ratio (%) versus offered load (Mbps/sector) for the PF and FRA-PF schemes.]
Let us illustrate with an example. The original video bitstream contains 9 CRA LDUs and 600 LDUs in total. Since we wrap the bitstream around 6 times, this results in a total of 54 CRA LDUs for a given user. With the PF scheme, the CRA LDU loss ratio goes from 1.6% to 9.0% of the total 54 CRA LDUs as the offered load changes from 5.9 to 6.9 Mbps per sector. This corresponds to at best 1 LDU and at worst 5 LDUs lost. For offered loads near 7 Mbps per sector, this means that as much as 5 seconds of video content becomes non-viewable because of the loss of CRA LDUs. With the proposed FRA-PF scheme, the CRA LDU loss ratio goes from 0.1% to 1.18% of the total 54 CRA LDUs as the offered load changes from 5.9 to 6.9 Mbps per sector.
This means that in either case at most 1 LDU is lost. For offered loads near 7 Mbps per sector, this means that at most 1 second of video content becomes non-viewable. This shows how the proposed FRA-PF scheme provides the decoder with the reference frames it needs, facilitating the task of decoding, and how the proposed scheme locates the packets with greater importance for the H.265/HEVC decoder.
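The arithmetic behind the example can be checked as follows (a sketch, assuming one second of content is lost per lost CRA LDU, per our Intra-Period setting):

```python
import math

def lost_cra_and_seconds(total_cra, loss_ratio, intra_period_s=1.0):
    """CRA LDUs lost (rounded up to whole LDUs) and the corresponding
    seconds of non-viewable video, one intra-period per lost CRA LDU."""
    lost = math.ceil(total_cra * loss_ratio)
    return lost, lost * intra_period_s
```

With 54 CRA LDUs, a 9.0% loss ratio gives 5 lost LDUs (5 seconds), while 1.6% and 1.18% both give a single lost LDU.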
This ensures smoother playback at the end user and contributes to enhancing the viewing experience. In short, the proposed scheme reduces the loss of packets carrying referenced frames, which improves the Quality of Experience for video users.
For BE users, we report the average user throughput and the coverage throughput. The 95% confidence intervals of the average throughput and coverage throughput are within ±3% and ±9% of the reported values, respectively. The average throughput results are shown in Fig. 5.4. The offered load values of interest to us are in the range of 5.9 to 6.9 Mbps per sector. From Fig. 5.4, it can be seen that with the PF scheme, users can expect average throughputs between 15 Mbps and 10 Mbps. Our proposed FRA-PF scheme yields higher average throughputs over the same range. This is explained by the fact that our proposed FRA-PF scheme serves video
[Figure 5.4: Average BE user throughput (Mbps) versus offered load (Mbps/sector) for the PF and FRA-PF schemes.]
users faster; the availability of more radio resources then helps BE users leave the network more quickly. This illustrates that allocating resources to the right users at the right time will benefit all users, both in terms of the number of users served by the network and the average throughputs they obtain.
Fig. 5.5 shows the coverage throughput results for offered load values between 5.9 and 6.9 Mbps per sector. As expected, the coverage throughput decreases as the offered load increases. Using the PF scheduling scheme, users can expect to get coverage throughputs
between 4.9 Mbps and 1.5 Mbps. In an LTE-A system using our proposed
FRA-PF scheme, for the same offered load values, users can expect to get
[Figure 5.5: Coverage throughput (Mbps) versus offered load (Mbps/sector) for the PF and FRA-PF schemes.]
coverage throughputs between 6.1 Mbps and 3.6 Mbps. This result is a lot
more significant than the average user throughput we have shown earlier.
It shows that 90% of users can expect a throughput of at least 3.6 Mbps,
which is more than double the throughput with the PF scheduling scheme.
For video users, higher throughput mainly translates into shorter download times, since the video bit rate is fixed. For other services, e.g. Web Browsing, higher throughput can translate into noticeably faster loading times and enhanced Quality of Experience.
We stated that the 95% confidence intervals of the coverage throughput are within ±9% of the reported values; this is due to the fact that the statistics of users that experience relatively low Signal to Interference and Noise Ratio (SINR) are very sensitive. We model random user arrivals in our simulations, which leads to inter-cell interference that varies with time.
Finally, we examine the statistics for users that are geographically located within the area covering the outer 10% of the coverage area, as depicted in Fig. 5.6; we will call this region the cell-edge region. The area A of a hexagon is calculated as A = 2√3 a², where a is the apothem of the hexagon. Knowing that the inter-site distance is equal to 500 meters, we can easily find that the apothem is 250 meters. The users in the cell-edge region are those who lie outside the inner hexagon, i.e. outside the hexagon of apothem a′ ≈ 237 meters. The results are shown in Fig. 5.7. For offered loads ranging from 5.9 to 6.9 Mbps per sector, users in the cell-edge region experience lower throughputs with the PF scheme. With our proposed FRA-PF scheme, users in the cell-edge region experience throughputs ranging from 12.2 to 9.9 Mbps for the same offered load values. The trend is consistent with that of the average user throughput. Interestingly, the average throughput of users in the cell-edge region is higher than the 10th percentile of the user throughput CDF.
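The cell-edge boundary follows from the hexagon area formula; a quick numerical check (variable names ours):

```python
import math

a = 250.0                             # apothem from the 500 m inter-site distance
area = 2.0 * math.sqrt(3.0) * a ** 2  # hexagon area, A = 2*sqrt(3)*a^2
a_inner = a * math.sqrt(0.9)          # inner hexagon holding 90% of the area
# a_inner is about 237.2 m, matching the a' of roughly 237 m used above
```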
[Figure 5.7: Cell-edge user throughput (Mbps) versus offered load (Mbps/sector) for the PF and FRA-PF schemes.]
users randomly over time, which leads to inter-cell interference that varies over time. As a result, a user located in the cell-edge region but not strongly interfered with by neighbouring cells can still achieve high throughputs. This contrasts with full-buffer evaluations, where every cell in the network is transmitting at all times, so that path loss essentially dictates whether high throughputs are achievable or not. Under a full-buffer model, users located closer to their serving cell would suffer from lower path loss, which would translate to higher average throughputs; with finite-buffer traffic models this is no longer true, due to users arriving randomly in the network and being subject to the inter-cell interference that is present during the time the user is in the network. Of course, path loss always plays a significant role in dictating overall performance, but it is now tempered by the randomness of user arrivals. A scheduler
should target users that require the lowest amount of resources in order to be
satisfied. This will help the system deliver better user experience to all users
in the network. The QoE of all users improves thanks to the departure of other users, and our proposed scheme accelerates those departures by serving video users faster.
This benefits all users in the network and helps provide a more consistent
user experience across the whole network, which is in line with the objectives
of future 5G networks.
Chapter 6
This chapter summarizes the main contributions of the thesis and provides suggestions for future work.
6.1 Contributions
In this thesis, we studied features of the High Efficiency Video Coding standard, such as coding structures, and the process through which the H.265/HEVC encoder signals the frames that are referenced the most within the coded video bitstream. We proposed a scheduling framework which allocates resources to video users that need to receive referenced frames.
We evaluated the proposed framework using finite-buffer traffic models. To the best of our knowledge, there is no similar work reported in the literature. Results showed that both video and BE users benefit from the proposed scheduling framework. Video users benefit from
6.2. Future Work
the prioritized delivery of referenced frames in the video sequence. As long as there are such frames awaiting transmission, the scheduler will identify the corresponding video users and allocate resources to them. This allows video users to download
video content more quickly and allows BE users to access resources more
quickly, leave the network more quickly and enjoy higher throughputs on
average as a result.
Another benefit of the proposed framework is that it provides a consistent user experience across the coverage area. Results showed that 90% of BE users can expect to get throughputs of at least 1 Mbps, and that users located in the cell-edge region of each cell actually experience much higher throughputs than the 10th percentile of the user throughput CDF.
If one were to focus on the communications side, one direction for future work would be to evaluate the proposed framework with newer air-interface features. This work was conducted with some 3GPP Release-12 features, such as the Release 12 4-Tx Linear Precoding, because at the time the work was undertaken 3GPP was still working on Release 13 and no air-interface had yet been proposed for 5G. Another aspect worth revisiting is the balance between video users and best effort users.
control policies in our traffic models, which would regulate traffic arrival in
high load situations and can have a significant impact on user experience.
Another direction for future work could be to look into traffic offloading schemes. Since 3GPP Release 8, the 3GPP community has been introducing such mechanisms, which a carrier could leverage to provide better QoE to its own users, and therefore provide a more consistent user experience.
scheduling frameworks which will provide the best user experience while using radio resources efficiently. Finally, one direction for future work could be the actual evaluation of subjective quality. Our current decoder implementation is not designed to be robust against any form of packet loss; an implementation that can reconstitute samples of bitstreams with missing LDUs and output the decoded video would enable subjective testing on how the loss of specific packets impacts the
viewing experience. This will give much clearer insights into how packet loss
and Quality of Experience are related for video services, and more specifi-
cally how much the loss of packets carrying I-Frames hurts the Quality of
Experience.
Bibliography
[1] Cisco, “Cisco Visual Networking Index: Global Mobile Data Traffic Forecast Update,” white paper.
[2] ITU-T, Advanced Video Coding for generic audiovisual services, Recommendation ITU-T H.264, April 2013.
[4] M. Wien, High Efficiency Video Coding - Coding Tools and Specification, Springer, 2015.
drews, “Video Capacity and QoE Enhancements over LTE,” IEEE In-
2014.
1998.
December 2012.
[13] Y.-K. Wang, R. Even, T. Kristensen, and R. Jesup, RTP Payload Format for H.264 Video, IETF RFC 6184, May 2011.
[15] F. Kelly, “Charging and rate control for elastic traffic,” European Transactions on Telecommunications, vol. 8, pp. 33-37, 1997.
[17] P. A. Hosein, “QoS Control for WCDMA High Speed Packet Data,” in Proc. International Workshop on Mobile and Wireless Communications Network, 2002.
2010.
2009.
09-30.
05-24.
[26] 3GPP TR 25.996 v9.0.0, “Spatial channel model for Multiple Input Multiple Output (MIMO) simulations,” December 2009.
September 2009.