Sim2024 Sara
Sim2024 Sara
Abstract—The High Efficiency Video Coding (H.265/HEVC) Video 1 (AV1) [5], which have surpassed HEVC as state-of-
standard is currently in a phase of its life cycle where hardware the-art standards, the HEVC is currently in a phase of its life
acceleration support is prevalent across modern video-enabled cycle where widespread hardware acceleration support is
consumer devices. In contrast, its immediate successor, the prevalent across modern video-enabled consumer devices,
Versatile Video Coding (H.266/VVC) standard, does not yet including TVs, smartphones, mobile computers, gaming
benefit from hardware acceleration by any general-purpose or consoles, and various others. Moreover, the development of
graphics processor. This works evaluates the current state of hardware acceleration for these newer codecs is a time-
HEVC hardware acceleration on modern devices, including an
consuming process. Currently, only a few high-end general
Intel laptop processor, an Nvidia desktop graphics processor
purpose and graphics processors support AV1 encoding, with
and a Qualcomm mobile processor. The results of this work
provide an opportunity for researches to assess whether the
none yet supporting VVC encoding. Therefore, even in the
impact in visual quality and bit rate caused by their own presence of more modern codecs, HEVC maintains significant
solutions aligns with the characteristics of devices available in relevance in the industry.
the market. The purpose of this study is to assess the current state of
HEVC hardware acceleration on modern devices. To conduct
Keywords—HEVC, Hardware encoder, Video coding. this evaluation, a general-purpose processor, a graphics
I. INTRODUCTION processor, and a mobile system-on-chip (SoC) were included.
Additionally, tests were also conducted with a fast software
Nowadays, people increasingly rely on digital video- encoder, commonly used when hardware acceleration is not
related services for their leisure, study, work, and available. The results obtained from these encoders were
communication routines. The encoding of video content for directly compared with the reference encoder of the HEVC
compression purposes becomes indispensable, considering standard.
that the amount of data required for the representation of
visual content, without compression, becomes impractical. The results of this work can be highly valuable for
For instance, without compression, an episode of a series in researchers dedicated to the development of algorithms and
Ultra-High-Definition (UHD) 4K resolution (3840x2160 architectures aimed at optimizing the video encoding process.
pixels) at 24 frames per second (fps), with a duration of 40 These findings provide an opportunity to assess whether the
minutes and 24 bits per pixel, has a size of approximately 1.5 impact on the relationship between visual quality and bit rate
terabytes. This exceeds the total storage capacity commonly caused by their solutions aligns with the characteristics of
found in current personal computers, and furthermore, devices currently available in the market.
streaming this video in real-time far exceeds the bandwidths This paper is organized as follows: Section II offers a brief
commonly available to consumers. background on video coding and outlines the features of the
However, there are dedicated tools to reduce the amount HEVC standard. Section III details the experimental setup,
of data needed for video playback, making it a feasible task. while Section IV presents and discusses the results of the
A codec (encoder/decoder) is dedicated to video compression experiments. Finally, Section V concludes this paper.
and decompression and can be implemented in software or
hardware. Different codecs support the exploration of various II. VIDEO CODING BACKGROUND
algorithms for compression purposes. Nevertheless, in Most of the contemporary encoders are based on the
solutions for fast and low-power encoding, whether following signal and data processing operations: (i) inter- and
implemented in software or hardware, many of these intra-frame prediction, (ii) de-correlating transform (T
algorithms need to be discarded or simplified, resulting in a module), (iii) quantization (Q module), and (iv) entropy
loss in compression quality compared to encoders that coding, as shown in Figure 1 [6].
implement more of the available features.
Moreover, a reconstruction loop (decoding path) with
The High Efficiency Video Coding (HEVC), also called inverse quantization (IQ module) and inverse transform (IT
H.265 [1], was collaboratively developed by the Video module) is also included. This design ensures that the encoder
Coding Experts Group (VCEG) and Moving Picture Experts relies solely on reference frames accessible to the decoders,
Group (MPEG). Introduced in 2013, HEVC succeeded enabling the accurate replication of identical predictions.
H.264/AVC [2] and brought about a significant enhancement Additionally, an optional in-loop filtering module can also be
in compression performance, offering up to a 50% reduction included in the reconstruction loop to improve the subjective
in bit rates for equal perceptual quality [3]. Despite the release image quality of reconstructed frames [6].
of the Versatile Video Coding (VVC) [4] and the AOMedia
Original Prediction
Original samples residue
650 was the first, released in 2015 [10]. However, prior to the
Ref
Ref
Ref
frame
Ref Original + − T Q
Entropy
coding commercial release of hardware-accelerated devices, the x265
Input video samples − Predicted [11], a fast and open-source software codec, had already been
samples
Inter-frame utilized since 2013 by users seeking superior encoding to
prediction
Reference Mode Quantized H.264/AVC.
samples decision residue
Intra-frame In this study, encoding experiments were conducted
prediction
following the Common Test Conditions (CTC) [12], which
Reference 01011010
Ref
Ref In-loop establish a set of video sequences and coding parameters.
Ref
frame
Ref Filtering Rec. + Rec.
IT IQ 11011101
10011011
These guidelines are recommended for all researchers
predicted prediction
samples residue
Output conducting a comparative evaluation of works related to
bitstream
HEVC using the HM reference software, ensuring the
Fig. 1. Diagram of a typical hybrid block-based video encoder. reproducibility of the tests. The performance of each tested
encoder was evaluated in terms of encoding time, bitrate, and
Hybrid block-based encoders apply the abovementioned distortion measured by Peak Signal-to-Noise Ratio (PSNR).
operations after partitioning frames into smaller blocks and Finally, the performance of each fast encoder was compared
use both motion- and still-picture coding techniques. to the reference encoder to determine the Bjøntegaard Delta
The focus of this work is the HEVC standard, which Bitrate (BD-BR) [13] values. The BD-BR is a metric
introduced several features aimed at enhancing compression employed to measure the bitrate disparity between two video
efficiency. The coding structure in HEVC is based on Coding codecs while taking into account their respective PSNR
Tree Units (CTUs), providing flexibility with variable block values. A negative BD-BR value signifies an enhancement in
sizes from 64x64 to 8x8 pixels, allowing for improved bitrate efficiency (lower bitrate for the same quality), whereas
representation of diverse regions within a frame [3]. a positive value indicates a decline in bitrate efficiency.
For intra-frame prediction, HEVC added various new A. Software Experiments
modes and improved others from its predecessor, such as The first stage of this work involved generating the
angular, planar, and DC modes, enhancing the accuracy of encoding results of the HEVC Test Model version 18.0 (HM)
predicting pixel values within a block. Inter-frame prediction [14]. The HM is maintained by the VCEG and MPEG groups,
added advanced motion compensation techniques supporting serving as a reference implementation to ensure compliance
a broader range of motion vectors, contributing to improved with the HEVC standard. The primary purpose of the
efficiency [3]. reference software is not real-time encoding and, because of
that, the encoding of just a few seconds of content can take
The HEVC added support to multiple transform types and
hours or even days to finish. Therefore, the results of the fast
the partitioning scheme of Transform Units (TUs), which also
encoders will be compared against the HM results to assess
show flexibility, varying in size from 4x4 to 32x32 pixels. In-
the impact of real-time encoding in terms of BD-BR.
loop filters, such as the Deblocking Filter and Sample
Adaptive Offset (SAO), have been integrated to reduce Then, similar experiments were done in the
compression artifacts and enhance visual quality [3]. MulticoreWare x265 software through its libx265 library
within the FFmpeg 6.1.1 [15]. It is worth noting that x265
Entropy coding in HEVC relies on Context Adaptive
offers ten speed configurations, which alter various encoding
Binary Arithmetic Coding (CABAC), adapting to the statistics
parameters to accelerate the encoding at the expense of
of encoded data for increased compression efficiency [3].
compression quality. For instance, the fastest preset drastically
Finally, the HEVC extensions added support to higher bit limits the block partitioning tree (larger=32, smaller=16),
depths beyond 10 bits per sample, including support for high while the slowest preset allows the use of all block sizes
dynamic range (HDR) content of up to 16 bits per sample, supported by the standard [11]. Therefore, for each encoded
beneficial for applications requiring enhanced color precision. sequence, the slowest preset that still achieved real-time
Support for different chroma sampling schemes were also encoding was selected, ensuring maximum quality.
added, such as monochrome, 4:2:2, and 4:4:4. Additionally,
These experiments were conducted on a server equipped
specialized tools and modes optimized for screen content
with an Intel Xeon Silver 4314 (2.4GHz) processor.
coding (SCC) were added to process scenarios involving
computer-generated content and screen-sharing applications B. Hardware Experiments
[7]. Encoding experiments were done using the following
Collectively, these features made the HEVC a suitable devices:
choice for a range of applications, including UHD video • CPU Intel Core i7-11800H (2.30GHz) from a Dell
streaming and broadcasting. G15 5511 laptop;
III. HEVC EXPERIMENTS • GPU Nvidia GeForce RTX 4070 from a custom
An analysis of the release history of devices with hardware desktop computer;
acceleration for HEVC encoding for 4K resolution at 30 fps
• SoC Qualcomm Snapdragon 8 Gen 3 from a Samsung
revealed that the first Graphics Processing Unit (GPU) to offer
Galaxy S24 Ultra smartphone.
this support was the Nvidia GTX 950 (Maxwell 2nd gen
microarchitecture) in 2014 [8]. In the case of Central The corresponding hardware-accelerated encoders
Processing Units (CPU), the initial support was introduced by interfaced with FFmpeg through the libraries hevc_qsv,
the 6th generation Intel Core (Skylake microarchitecture) in hevc_nvenc, and hevc_mediacodec, respectively. An x86-64
2015 [9]. Regarding mobile SoCs, the Qualcomm Snapdragon build of FFmpeg 6.1.1 was used for the laptop and desktop
computers, while an aarch64 build was utilized for the It is noteworthy that all fast encoders, including the
smartphone. software-based x265, achieved real-time encoding for all test
sequences up to a resolution of 1080p (classes B and below).
Similar to libx265, all these libraries provide various However, for UHD 4K sequences, the x265 failed to encode
presets that offer different trade-offs between encoding speed any sequences in real-time. The Intel and Qualcomm encoders
and encoding quality, and the presets were selected to successfully encoded two out of six sequences in real-time,
maintain real-time encoding while maximizing the encoding while the Nvidia encoder fell short of achieving real-time
quality. encoding only for the Drums sequence, which is the most
IV. RESULTS demanding test sequence running at 100 fps. Nevertheless,
when considering the averages from Table II, the Nvidia
Table I presents the encoding speed results for both the encoder demonstrated real-time processing for all classes.
HM and each of the fast encoders, where 1.00x indicates real- This was an expected result, because the desktop processors
time speed. Additionally, it displays the BD-BR results for have a higher thermal design power (TDP) when compared to
each fast encoder when compared to the HM. Table II depicts their mobile counterparts.
the same results, presented as class averages.
Fig. 2 illustrates the BD-BR averages from Table II. The
As previously mentioned, the HM is not optimized for most prominent characteristic on the graph is the green line
real-time speed, as evidenced by the encoding speed results representing the Qualcomm encoder, which is significantly
shown in the tables below. For instance, the Tango sequence, higher than the others, indicating a much greater impact on
which is an 10-bit UHD 4K video running at 60 fps with a BD-BR. However, this can be attributed to the fact that this is
duration of four seconds, required 75 hours for encoding. a mobile SoC used in a smartphone, with limitations related to