0% found this document useful (0 votes)
22 views4 pages

Sim2024 Sara

Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
22 views4 pages

Sim2024 Sara

Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 4

Performance Analysis of

Hardware-Accelerated HEVC Encoders

Sara Vitória Henssler1, Márcio Spenst1, Luciano Agostini2, Marcel Corrêa1


[email protected], [email protected], [email protected], [email protected]
1
Instituto Federal de Educação, Ciência e Tecnologia Sul-rio-grandense (IFSul), Bagé, Brasil
2
Video Technology Research Group (ViTech), Universidade Federal de Pelotas (UFPel), Pelotas, Brasil

Abstract—The High Efficiency Video Coding (H.265/HEVC) Video 1 (AV1) [5], which have surpassed HEVC as state-of-
standard is currently in a phase of its life cycle where hardware the-art standards, the HEVC is currently in a phase of its life
acceleration support is prevalent across modern video-enabled cycle where widespread hardware acceleration support is
consumer devices. In contrast, its immediate successor, the prevalent across modern video-enabled consumer devices,
Versatile Video Coding (H.266/VVC) standard, does not yet including TVs, smartphones, mobile computers, gaming
benefit from hardware acceleration by any general-purpose or consoles, and various others. Moreover, the development of
graphics processor. This works evaluates the current state of hardware acceleration for these newer codecs is a time-
HEVC hardware acceleration on modern devices, including an
consuming process. Currently, only a few high-end general
Intel laptop processor, an Nvidia desktop graphics processor
purpose and graphics processors support AV1 encoding, with
and a Qualcomm mobile processor. The results of this work
provide an opportunity for researches to assess whether the
none yet supporting VVC encoding. Therefore, even in the
impact in visual quality and bit rate caused by their own presence of more modern codecs, HEVC maintains significant
solutions aligns with the characteristics of devices available in relevance in the industry.
the market. The purpose of this study is to assess the current state of
HEVC hardware acceleration on modern devices. To conduct
Keywords—HEVC, Hardware encoder, Video coding. this evaluation, a general-purpose processor, a graphics
I. INTRODUCTION processor, and a mobile system-on-chip (SoC) were included.
Additionally, tests were also conducted with a fast software
Nowadays, people increasingly rely on digital video- encoder, commonly used when hardware acceleration is not
related services for their leisure, study, work, and available. The results obtained from these encoders were
communication routines. The encoding of video content for directly compared with the reference encoder of the HEVC
compression purposes becomes indispensable, considering standard.
that the amount of data required for the representation of
visual content, without compression, becomes impractical. The results of this work can be highly valuable for
For instance, without compression, an episode of a series in researchers dedicated to the development of algorithms and
Ultra-High-Definition (UHD) 4K resolution (3840x2160 architectures aimed at optimizing the video encoding process.
pixels) at 24 frames per second (fps), with a duration of 40 These findings provide an opportunity to assess whether the
minutes and 24 bits per pixel, has a size of approximately 1.5 impact on the relationship between visual quality and bit rate
terabytes. This exceeds the total storage capacity commonly caused by their solutions aligns with the characteristics of
found in current personal computers, and furthermore, devices currently available in the market.
streaming this video in real-time far exceeds the bandwidths This paper is organized as follows: Section II offers a brief
commonly available to consumers. background on video coding and outlines the features of the
However, there are dedicated tools to reduce the amount HEVC standard. Section III details the experimental setup,
of data needed for video playback, making it a feasible task. while Section IV presents and discusses the results of the
A codec (encoder/decoder) is dedicated to video compression experiments. Finally, Section V concludes this paper.
and decompression and can be implemented in software or
hardware. Different codecs support the exploration of various II. VIDEO CODING BACKGROUND
algorithms for compression purposes. Nevertheless, in Most of the contemporary encoders are based on the
solutions for fast and low-power encoding, whether following signal and data processing operations: (i) inter- and
implemented in software or hardware, many of these intra-frame prediction, (ii) de-correlating transform (T
algorithms need to be discarded or simplified, resulting in a module), (iii) quantization (Q module), and (iv) entropy
loss in compression quality compared to encoders that coding, as shown in Figure 1 [6].
implement more of the available features.
Moreover, a reconstruction loop (decoding path) with
The High Efficiency Video Coding (HEVC), also called inverse quantization (IQ module) and inverse transform (IT
H.265 [1], was collaboratively developed by the Video module) is also included. This design ensures that the encoder
Coding Experts Group (VCEG) and Moving Picture Experts relies solely on reference frames accessible to the decoders,
Group (MPEG). Introduced in 2013, HEVC succeeded enabling the accurate replication of identical predictions.
H.264/AVC [2] and brought about a significant enhancement Additionally, an optional in-loop filtering module can also be
in compression performance, offering up to a 50% reduction included in the reconstruction loop to improve the subjective
in bit rates for equal perceptual quality [3]. Despite the release image quality of reconstructed frames [6].
of the Versatile Video Coding (VVC) [4] and the AOMedia
Original Prediction
Original samples residue
650 was the first, released in 2015 [10]. However, prior to the
Ref
Ref
Ref
frame
Ref Original + − T Q
Entropy
coding commercial release of hardware-accelerated devices, the x265
Input video samples − Predicted [11], a fast and open-source software codec, had already been
samples
Inter-frame utilized since 2013 by users seeking superior encoding to
prediction
Reference Mode Quantized H.264/AVC.
samples decision residue
Intra-frame In this study, encoding experiments were conducted
prediction
following the Common Test Conditions (CTC) [12], which
Reference 01011010
Ref
Ref In-loop establish a set of video sequences and coding parameters.
Ref
frame
Ref Filtering Rec. + Rec.
IT IQ 11011101
10011011
These guidelines are recommended for all researchers
predicted prediction
samples residue
Output conducting a comparative evaluation of works related to
bitstream
HEVC using the HM reference software, ensuring the
Fig. 1. Diagram of a typical hybrid block-based video encoder. reproducibility of the tests. The performance of each tested
encoder was evaluated in terms of encoding time, bitrate, and
Hybrid block-based encoders apply the abovementioned distortion measured by Peak Signal-to-Noise Ratio (PSNR).
operations after partitioning frames into smaller blocks and Finally, the performance of each fast encoder was compared
use both motion- and still-picture coding techniques. to the reference encoder to determine the Bjøntegaard Delta
The focus of this work is the HEVC standard, which Bitrate (BD-BR) [13] values. The BD-BR is a metric
introduced several features aimed at enhancing compression employed to measure the bitrate disparity between two video
efficiency. The coding structure in HEVC is based on Coding codecs while taking into account their respective PSNR
Tree Units (CTUs), providing flexibility with variable block values. A negative BD-BR value signifies an enhancement in
sizes from 64x64 to 8x8 pixels, allowing for improved bitrate efficiency (lower bitrate for the same quality), whereas
representation of diverse regions within a frame [3]. a positive value indicates a decline in bitrate efficiency.
For intra-frame prediction, HEVC added various new A. Software Experiments
modes and improved others from its predecessor, such as The first stage of this work involved generating the
angular, planar, and DC modes, enhancing the accuracy of encoding results of the HEVC Test Model version 18.0 (HM)
predicting pixel values within a block. Inter-frame prediction [14]. The HM is maintained by the VCEG and MPEG groups,
added advanced motion compensation techniques supporting serving as a reference implementation to ensure compliance
a broader range of motion vectors, contributing to improved with the HEVC standard. The primary purpose of the
efficiency [3]. reference software is not real-time encoding and, because of
that, the encoding of just a few seconds of content can take
The HEVC added support to multiple transform types and
hours or even days to finish. Therefore, the results of the fast
the partitioning scheme of Transform Units (TUs), which also
encoders will be compared against the HM results to assess
show flexibility, varying in size from 4x4 to 32x32 pixels. In-
the impact of real-time encoding in terms of BD-BR.
loop filters, such as the Deblocking Filter and Sample
Adaptive Offset (SAO), have been integrated to reduce Then, similar experiments were done in the
compression artifacts and enhance visual quality [3]. MulticoreWare x265 software through its libx265 library
within the FFmpeg 6.1.1 [15]. It is worth noting that x265
Entropy coding in HEVC relies on Context Adaptive
offers ten speed configurations, which alter various encoding
Binary Arithmetic Coding (CABAC), adapting to the statistics
parameters to accelerate the encoding at the expense of
of encoded data for increased compression efficiency [3].
compression quality. For instance, the fastest preset drastically
Finally, the HEVC extensions added support to higher bit limits the block partitioning tree (larger=32, smaller=16),
depths beyond 10 bits per sample, including support for high while the slowest preset allows the use of all block sizes
dynamic range (HDR) content of up to 16 bits per sample, supported by the standard [11]. Therefore, for each encoded
beneficial for applications requiring enhanced color precision. sequence, the slowest preset that still achieved real-time
Support for different chroma sampling schemes were also encoding was selected, ensuring maximum quality.
added, such as monochrome, 4:2:2, and 4:4:4. Additionally,
These experiments were conducted on a server equipped
specialized tools and modes optimized for screen content
with an Intel Xeon Silver 4314 (2.4GHz) processor.
coding (SCC) were added to process scenarios involving
computer-generated content and screen-sharing applications B. Hardware Experiments
[7]. Encoding experiments were done using the following
Collectively, these features made the HEVC a suitable devices:
choice for a range of applications, including UHD video • CPU Intel Core i7-11800H (2.30GHz) from a Dell
streaming and broadcasting. G15 5511 laptop;
III. HEVC EXPERIMENTS • GPU Nvidia GeForce RTX 4070 from a custom
An analysis of the release history of devices with hardware desktop computer;
acceleration for HEVC encoding for 4K resolution at 30 fps
• SoC Qualcomm Snapdragon 8 Gen 3 from a Samsung
revealed that the first Graphics Processing Unit (GPU) to offer
Galaxy S24 Ultra smartphone.
this support was the Nvidia GTX 950 (Maxwell 2nd gen
microarchitecture) in 2014 [8]. In the case of Central The corresponding hardware-accelerated encoders
Processing Units (CPU), the initial support was introduced by interfaced with FFmpeg through the libraries hevc_qsv,
the 6th generation Intel Core (Skylake microarchitecture) in hevc_nvenc, and hevc_mediacodec, respectively. An x86-64
2015 [9]. Regarding mobile SoCs, the Qualcomm Snapdragon build of FFmpeg 6.1.1 was used for the laptop and desktop
computers, while an aarch64 build was utilized for the It is noteworthy that all fast encoders, including the
smartphone. software-based x265, achieved real-time encoding for all test
sequences up to a resolution of 1080p (classes B and below).
Similar to libx265, all these libraries provide various However, for UHD 4K sequences, the x265 failed to encode
presets that offer different trade-offs between encoding speed any sequences in real-time. The Intel and Qualcomm encoders
and encoding quality, and the presets were selected to successfully encoded two out of six sequences in real-time,
maintain real-time encoding while maximizing the encoding while the Nvidia encoder fell short of achieving real-time
quality. encoding only for the Drums sequence, which is the most
IV. RESULTS demanding test sequence running at 100 fps. Nevertheless,
when considering the averages from Table II, the Nvidia
Table I presents the encoding speed results for both the encoder demonstrated real-time processing for all classes.
HM and each of the fast encoders, where 1.00x indicates real- This was an expected result, because the desktop processors
time speed. Additionally, it displays the BD-BR results for have a higher thermal design power (TDP) when compared to
each fast encoder when compared to the HM. Table II depicts their mobile counterparts.
the same results, presented as class averages.
Fig. 2 illustrates the BD-BR averages from Table II. The
As previously mentioned, the HM is not optimized for most prominent characteristic on the graph is the green line
real-time speed, as evidenced by the encoding speed results representing the Qualcomm encoder, which is significantly
shown in the tables below. For instance, the Tango sequence, higher than the others, indicating a much greater impact on
which is an 10-bit UHD 4K video running at 60 fps with a BD-BR. However, this can be attributed to the fact that this is
duration of four seconds, required 75 hours for encoding. a mobile SoC used in a smartphone, with limitations related to

TABLE I. EXPERIMENTS RESULTS

HM x265 Intel Nvidia Qualcomm


Class Sequence
Speed BD-BR (%) Speed BD-BR (%) Speed BD-BR (%) Speed BD-BR (%) Speed
Tango 0.000148x 118.6 0.55x 80.5 0.79x 90.0 2.70x 208.5 0.88x
A1 Drums 0.000127x 215.2 0.30x 70.9 0.43x 98.6 0.65x 232.1 0.42x
Campfire 0.000393x 92.1 0.95x 73.5 1.51x 47.1 1.70x 125.5 1.20x
CatRobot 0.000239x 98.8 0.50x 67.2 0.81x 98.1 2.20x 192.7 0.86x
A2 TrafficFlow 0.000447x 98.8 0.95x 60.4 1.32x 80.1 2.30x 287.9 1.20x
DayLightRoad 0.000648x 99.4 0.50x 62.1 0.73x 92.6 1.15x 189.7 0.58x
Kimono 0.002042x 57.7 1.30x 100.0 2.83x 100.0 6.25x 100.0 5.68x
ParkScene 0.002646x 51.7 1.35x 52.7 2.73x 116.4 6.25x 201.5 5.70x
B Cactus 0.001172x 86.5 1.65x 54.4 2.45x 102.6 3.10x 194.1 2.85x
BasketballDrive 0.000890x 100.4 1.60x 76.5 2.28x 115.3 3.10x 193.0 2.86x
BQTerrace 0.001143x 192.6 1.95x 80.2 2.07x 220.3 2.65x 350.9 2.37x
BasketballDrill 0.005556x 43.6 1.50x 62.2 3.39x 10.4 14.30x 163.3 5.89x
BQMall 0.005051x 74.7 1.60x 70.1 2.99x 120.5 12.50x 198.4 4.81x
C
PartyScene 0.005910x 54.4 1.50x 64.7 2.78x 133.1 14.30x 229.8 5.62x
RaceHorsesC 0.006460x 81.1 1.75x 64.2 4.44x 79.4 20.00x 136.5 6.30x
BasketballPass 0.021368x 55.4 1.70x 61.8 5.81x 93.9 33.35x 139.4 14.50x
BQSquare 0.025253x 52.8 1.70x 59.7 5.25x 189.2 16.70x 406.7 13.67x
D
BlowingBubbles 0.002796x 47.8 1.80x 71.6 6.10x 162.4 12.50x 233.4 15.08x
RaceHorses 0.003075x 43.8 1.80x 69.9 8.63x 74.2 10.00x 145.3 17.77x
FourPeople 0.003053x 6.3 1.35x 33.6 2.66x 41.4 3.60x 145.0 4.41x
E Jhonny 0.003388x 19.0 1.60x 50.5 3.03x 66.3 2.20x 230.0 4.28x
KristenAndSara 0.002271x 20.3 1.50x 55.7 2.92x 62.4 1.55x 242.2 4.32x
BasketballDrillText 0.005669x 47.8 1.50x 61.0 3.38x 111.6 1.40x 192.6 5.32x
ChinaSpeed 0.003735x 78.6 1.50x 45.9 3.40x 47.1 8.35x 309.6 8.71x
F
SlideEditing 0.007310x -31.9 1.50x -3.8 5.46x -53.0 1.25x 95.2 7.82x
SlideShow 0.008469x 86.9 1.72x 132.3 7.70x 127.1 6.70x 100.0 12.30x

TABLE II. EXPERIMENTS RESULTS AVERAGES

HM x265 Intel Nvidia Qualcomm


Class
Speed BD-BR (%) Speed BD-BR (%) Speed BD-BR (%) Speed BD-BR (%) Speed
UHD 4K
0.000333x 120.4 0.62x 69.1 0.93x 84.4 1.78x 206.0 0.85x
(Classes A1 and A2)
1080p
0.001578x 97.9 1.57x 72.7 2.47x 130.9 4.27x 207.9 3.90x
(Class B)
480p
0.005744x 63.4 1.58x 65.3 3.40x 85.8 15.27x 181.9 5.65x
(Class C)
240p
0.013123x 49.9 1.75x 65.7 6.44x 129.9 18.13x 231.2 15.25x
(Class D)
720p
0.008712x 15.2 1.48x 46.6 2.87x 56.7 2.45x 205.7 4.00x
(Class E)
Screen Content
0.006295x 45.3 1.55x 58.8 4.99x 58.2 4.42x 174.3 8.54x
(Class F)
energy consumption and heat dissipation, because it is part of and speed necessary to enable low-power functionality in
a battery-powered device without active cooling. mobile chips.
Another notable characteristic is that the line representing REFERENCES
the Intel encoder is below the line of the Nvidia encoder by a [1] Information technology: High efficiency coding and media delivery in
considerable margin, indicating that the Intel architecture heterogeneous environments – Part 2: High efficiency video coding.
delivers significantly superior encoding quality in terms of ISO/IEC 23008-2. 2013.
BD-BR. [2] Information technology: Coding of audio-visual objects – Part 10:
Advanced Video Coding. ISO/IEC 14496-10. 2003.
Finally, it is worth noting that the x265 encoder
[3] G. J. Sullivan, J. -R. Ohm, W. -J. Han and T. Wiegand, "Overview of
demonstrated an increase in efficiency as the resolution the High Efficiency Video Coding (HEVC) Standard," in IEEE
decreased. This is attributed to the software offering a very Transactions on Circuits and Systems for Video Technology, vol. 22,
wide range of presets, enabling the activation of features very no. 12, pp. 1649-1668, Dec. 2012, doi:
close to those provided by the reference software when 10.1109/TCSVT.2012.2221191.
encoding low-resolution videos. On the other hand, hardware- [4] Information technology: Coded representation of immersive media --
based encoders do not offer the same degree of flexibility. Part 3: Versatile video coding. ISO/IEC DIS 23090-3. 2020.
[5] P. Rivaz, J. Haughton, "AV1 Bitstream & Decoding Process
Specification," Alliance for Open Media, 2019. Accessed on: Mar.
2024. [Online]. Available: https://aomediacodec.github.io/av1-
spec/av1-spec.pdf
[6] M. Corrêa et al., "AV1 and VVC Video Codecs: Overview on
Complexity Reduction and Hardware Design," in IEEE Open Journal
of Circuits and Systems, vol. 2, pp. 564-576, 2021, doi:
10.1109/OJCAS.2021.3107254.
[7] D. Flynn et al., "Overview of the Range Extensions for the HEVC
Standard: Tools, Profiles, and Performance," in IEEE Transactions on
Circuits and Systems for Video Technology, vol. 26, no. 1, pp. 4-19,
Jan. 2016, doi: 10.1109/TCSVT.2015.2478707.
[8] Nvidia, "Video Encode and Decode GPU Support Matrix," 2024.
[Online]. Available: https://developer.nvidia.com/video-encode-and-
decode-gpu-support-matrix-new
[9] FFmpeg, "Hardware Quicksync," 2023. [Online]. Available:
Fig. 2. BD-BR comparison among fast HEVC encoders. https://trac.ffmpeg.org/wiki/Hardware/QuickSync
[10] Qualcomm, "Qualcomm Snapdragon 650 processor," 2014. [Online].
V. CONCLUSIONS Available: https://www.qualcomm.com/content/dam/qcomm-
martech/dm-
This paper presented a performance analysis of hardware- assets/documents/snapdragon_product_brief_650_102816.pdf
accelerated HEVC encoders found in modern devices [11] MulticoreWare, "x265 documentation," 2024. [Online]. Available:
available for consumers. Additionally, it included a https://x265.readthedocs.io/en/master/
performance assessment of a widely used software-based [12] K. Sharman, K. Sühring, "Common Test Conditions for HM video
encoder, particularly useful in the absence of hardware coding experiments," JCT-VC, document JCT-VC-AC1100, Jan. 2018.
acceleration. The obtained results were directly compared [Online] http://phenix.int-
with those from the HEVC reference software. evry.fr/jct/doc_end_user/documents/29_Macau/wg11/JCTVC-
AC1100-v1.zip
The insights provided by this study are of significant value [13] G. Bjøntegaard, "Calculation of average PSNR differences between
to researchers and developers working on algorithms and RD-curves," Video Coding Experts Group (VCEG), document VCEG-
hardware architectures for video coding. These findings M33, Mar. 2001. [Online] https://www.itu.int/wftp3/av-arch/video-
site/0104_Aus/VCEG-M33.doc
enable a direct comparison of the efficacy of their solutions
[14] JVET, "HEVC HM reference software, " 2023. [Online]. Available:
against existing market solutions. https://vcgit.hhi.fraunhofer.de/jvet/HM
For future research, an analysis of equivalent architectures [15] FFmpeg, "Ffmpeg git repo," 2024. [Online]. Available:
in both desktop and mobile versions is planned. For instance, https://git.ffmpeg.org/gitweb/ffmpeg.git
comparing a desktop CPU with its laptop counterpart. This
investigation aims to reveal the impact on encoding quality

You might also like