
Video compression at extremely low bit rates

1st Tianhao Wang
Department of Automation, University of Science and Technology of China, Hefei, China

2nd Yi Fang*
Department of Automation, University of Science and Technology of China, Hefei, China

3rd Jian Wang
Department of Automation, University of Science and Technology of China, Hefei, China

4th Tong Gan
Research and Development, Anhui ShineAuto Autonomous Driving Technology Co., Ltd., Hefei, China

5th Jun Xu
Department of Automation, University of Science and Technology of China, Hefei, China

6th Qiang Ling
Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, China

Abstract—Traditional video compression standards include AVC, HEVC, VVC, and AVS. These algorithms may produce ringing and blocking artifacts at extremely low bit rates, which can seriously degrade video transmission quality. To address these issues, we propose a novel video compression method that combines super-resolution technology with traditional compression standards. In the encoder, we first down-sample the high-resolution input video and then use the HEVC algorithm to encode the low-resolution video into a bitstream. In the decoder, the HEVC algorithm decodes the received bitstream, and the decoded low-resolution video is fed into a super-resolution network to reconstruct the original high-resolution video. On the one hand, transmitting low-resolution video reduces the required bandwidth, effectively mitigating the impact of ringing and blocking artifacts at extremely low bit rates. On the other hand, super-resolution technology can filter out ringing and blocking artifacts while restoring the high-resolution video. Experimental results show that our proposed method achieves better PSNR performance than the traditional HEVC algorithm on the standard HEVC Dataset.

Index Terms—Extremely low bit rates, video compression, video coding, image super-resolution, High Efficiency Video Coding (HEVC)

I. INTRODUCTION

Video compression is the process of reducing the amount of data required to represent digital video while preserving an acceptable video quality [1]. Video plays a vital role in daily life, for example in video calls, short video recording, and video viewing. With the development of network technology, the enormous volume of video data presents significant challenges to its transmission, storage, and processing. Therefore, video coding and decoding technology becomes crucial.

Due to its numerous implications and benefits, video encoding and decoding technology is a critical research area, especially under extremely low bit rate conditions. These conditions pose significant challenges to the efficient transmission of high-quality video content. The demand for video streaming and communication services in today’s society is increasing. Video content contributes to more than 90% of network traffic, and this percentage is expected to increase further in the future [2]. Additionally, with the rapid development of the online short-video industry, congestion and bandwidth limitations in mobile networks will hinder the instantaneous transmission of video content. Therefore, it is crucial to improve compression algorithms to ensure smooth video playback. With the advancement of modern technology, telemedicine and distance education have gradually gained people’s attention. However, promoting these technologies in remote or underdeveloped regions requires video communication support. Studying ultra-low bandwidth video coding and decoding technologies can bridge this digital divide, enabling people to improve their well-being through technological advances. In summary, challenges such as providing high-quality video content and optimizing mobile device transmission in resource-limited environments underscore the need to study ultra-low bandwidth video coding and decoding technology.
Corresponding Author: Yi Fang. This work was supported in part by the Key Common Technology Development Program of Hefei (Research on Multi-sensor Perception and Fusion Algorithms for Autonomous Driving) under Grant GJ2022GX35, in part by the Key Science and Technology Program of Anhui under Grant 202203a05020012, and in part by the Natural Science Foundation of Hefei under Grant 2022003.

979-8-3503-8778-0/24/$31.00 ©2024 IEEE 3996

However, traditional video encoding methods, such as H.264/Advanced Video Coding (AVC) [3] and High Efficiency Video Coding (HEVC) [4], may introduce ringing and blocking artifacts when operating under extremely low bit rate conditions, which poses significant challenges to efficiently transmitting high-quality video content. Blockiness refers to the appearance of visible block structures in the image, especially in areas with sharp transitions or edges. This artifact arises because only a limited amount of data can be transmitted and processed at extremely low bit rates. The ringing effect manifests as halo-like distortions around edges, giving the image an unnatural and distorted appearance. Visually, these distortions look like bands or “ghosts” near the edges. Overshoot artifacts usually accompany ringing artifacts and manifest as an increased jump at the edge transition [5]. These artifacts not only degrade the visual quality of the video but also negatively impact the user experience. For example, they distract the user’s attention, reduce the perceived clarity and detail of the content, and make the video less enjoyable and harder to understand. Therefore, mitigating ringing and blocking artifacts is crucial under extremely low bit rate conditions to ensure optimal video quality and enhance the viewing experience.

This paper introduces an innovative methodology combining conventional video compression algorithms with image super-resolution techniques. The aim is to address the challenges posed by the ringing and blocking artifacts that result from traditional video decoding methods. First, the approach down-samples high-resolution videos before encoding, which decreases the necessary bandwidth and alleviates the ringing and blocking artifacts traditionally observed during video encoding. Moreover, a super-resolution network exploits the spatial correlation among pixels to perform filtering operations, thereby facilitating the efficient removal of ringing and blocking artifacts. With this method, videos can be encoded and decoded with reduced artifacts, improving both the visual fidelity and the viewer’s perception of the content.

The rest of the paper is organized as follows: Section II briefly reviews the traditional video coding algorithms and image super-resolution technology. Section III describes the complete architecture of our method and the details of some critical components. In Section IV, we present our implementation details and experimental results. Section V draws the conclusion.

II. RELATED WORK

A. Traditional video compression

Video coding and decoding are fundamental tasks in the field of computer vision, which can significantly reduce the storage space required for storing videos and reduce the bandwidth of network communication. In recent years, video compression technology has rapidly developed, and the common video compression coding standards are the MPEG series, AVS series, H.26X series, etc. [1], [6]–[10]. The MPEG and H.26X series standards are mainly developed by two international organizations, ITU-T and ISO/IEC. With the continuous development of the standards, MPEG-4 [11] and H.264/AVC technology have now matured and have been widely popularized in the fields of satellite TV, Internet video, and mobile video. With video’s clarity requirements and bandwidth limitations in some industries, H.265/HEVC and H.266/VVC [12] have also begun to be gradually applied. The AVS series is a video coding technology with independent intellectual property rights in China, which is widely used in domestic satellite TV transmission, digital microwave, and terrestrial digital TV systems. Among them, the AVS2 standard is mainly oriented to 4K UHD applications, and the AVS3 standard is mainly oriented to 8K UHD and VR fields. Both have compression efficiency comparable to H.264 and H.266, respectively, and they have advantages in terms of independence, reliability, and royalties. Video encoders based on the AVS series of standards are also widely used in the global broadcast of the Winter Olympics, the domestic airplane C919, and multiple satellite launch bases. Although some image compression coding standards (e.g., JPEG2000) are still used for video compression, such methods are only capable of intra-frame compression and remove inter-frame redundancy inefficiently. In addition, AOMedia, an open media consortium founded by more than 30 top technology companies around the world, has developed the open-source, royalty-free AV1 standard, which is fast to decode and has been popularized in video platforms such as YouTube and Netflix but is still not widely used due to the complexity and high cost of its encoding.

B. Image super-resolution

In recent years, more and more attention has been paid to the development of super-resolution (SR) technology, which utilizes natural image priors and the correlation between multiple frames to reconstruct high-resolution video from low-resolution video. The pioneering work on SR, named SRCNN, was proposed by Dong et al. [13], [14]. It attained state-of-the-art performance at the time by introducing a deep neural network architecture specifically created for super-resolution applications. They then proposed FSRCNN [15], which had better SR performance. In an effort to further enhance SR performance, researchers have investigated a variety of architectures. For instance, the Very Deep Super-Resolution (VDSR) model proposed by Kim et al. in 2016 [16] introduced a deeper network with residual learning to better exploit image features. In 2017, Ledig et al. [17] introduced the Super-Resolution GAN (SRGAN), which generated visually appealing high-resolution images by incorporating adversarial training. Lim et al. [18] modified SRResNet to create a deeper and wider residual network known as EDSR. With its smart topology and a significantly larger number of learnable parameters, EDSR considerably improved SR performance. Zhang et al. introduced local and global residual connections in their proposal of the RDN [19]. Subsequently, they introduced a channel attention mechanism in every local residual block and presented RCAN [20], which markedly enhanced SR performance. The field of SR technology has greatly evolved due to these research endeavors, opening up new applications in computer science, image processing, and medical imaging.

2024 36th Chinese Control and Decision Conference (CCDC) 3997


III. METHOD

A. Video compression framework

Fig. 1: Framework diagram of our video compression method.

In this paper, we propose a novel video compression method that combines super-resolution networks with conventional video coding algorithms. The framework of our method is shown in Fig. 1. At the encoder, the video to be delivered is first down-sampled to obtain a low-resolution video. The down-sampling method used in this paper is bilinear interpolation, which estimates the value of each new pixel as a weighted average of the four nearest pixels around its position. Bilinear interpolation provides smoother results than nearest-neighbor interpolation and reduces jagged edges and mosaic effects. After down-sampling, the low-resolution video is encoded with the conventional HEVC algorithm and sent to the decoder. This low-resolution video stream can be adapted to extremely low bit rate network transmission environments and can realize high-quality video transmission that is not prone to blocking and ringing artifacts. In the decoder, we first decode the received low-resolution video and then reconstruct the original high-resolution input video using an image super-resolution network.

B. Typical HEVC video encoder

With the rapid development of the digital video application industry chain, the trend of video applications toward higher definition, higher frame rate, and higher compression rate is increasingly apparent. Therefore, the market needs more efficient video coding standards than H.264/MPEG-4 AVC. In this context, HEVC came into being as a new generation of video coding standard. The design goal of HEVC is to reduce the bit rate of H.264/AVC by 50% at the same image quality [21]. Its design focuses on two main aspects: high-resolution video and increased use of parallel processing structures. Like the previous video standards developed by ITU-T and ISO/IEC, HEVC uses a block-based hybrid coding process. Fig. 2 illustrates an HEVC encoder. The input is the original video, and the output is an HEVC-compliant bitstream. The HEVC encoding process is as follows.

1) Each input video frame is divided into image block units of different sizes, and the corresponding block division information is added to the code stream and passed to the decoder.
2) Intra-prediction or inter-prediction is performed for each unit, and the residuals are obtained by subtracting the predicted values from the original pixel values. Motion estimation and motion compensation are performed in inter-prediction. In-loop deblocking and sample adaptive offset (SAO) filters further improve the quality of the reconstructed image.
3) For the residuals of each unit, an integer transform is applied, whose basis functions are similar to those of a discrete cosine transform or discrete sine transform. Then, the transform coefficients are quantized.
4) Entropy coding applies binary arithmetic coding to the previously obtained quantized transform coefficients, prediction information, mode information, motion information, and header information. Finally, the resulting coded bitstream is output.

The decoding process is basically the inverse of the encoding process described above.

C. Image super-resolution technology

As shown in Fig. 3, the image super-resolution network used in this paper is the information multi-distillation network (IMDN) [22]. Given an input LR image $I^{LR}$ and its corresponding target HR image $I^{HR}$, the super-resolved image $I^{SR}$ is generated by

$$I^{SR} = H_{IMDN}\left(I^{LR}\right), \tag{1}$$

where $H_{IMDN}(\cdot)$ is the IMDN. Following most previous works, the image super-resolution network is optimized using the mean absolute error (MAE) loss. Given a training set $\{I_i^{LR}, I_i^{HR}\}_{i=1}^{N}$ that has $N$ LR-HR pairs, the loss function can be expressed as

$$L(\Theta) = \frac{1}{N} \sum_{i=1}^{N} \left\| I_i^{SR} - I_i^{HR} \right\|_1, \tag{2}$$

where $\Theta$ indicates the adjustable parameters of IMDN and $I_i^{SR} = H_{IMDN}\left(I_i^{LR}\right)$.

IMDN consists of shallow feature extraction, deep feature extraction, feature fusion, and upsampler reconstruction. It first performs shallow feature extraction on the LR image with a 3×3 convolution. Then comes the vital component of the whole network, i.e., a series of stacked information multi-distillation blocks (IMDBs) for deep feature extraction. The architecture of the IMDB is shown in Fig. 4. We perform a channel split operation on the previous features. 75% of these features are concatenated along the channel dimension, and the remaining 25% are fed into the next calculation unit. Then, the outputs of all IMDBs are fused by a 1×1 convolution to recover 64 channels. Finally, after a 3×3 convolution layer, the SR image is reconstructed by sub-pixel convolution. This network architecture helps preserve the integrity of the collected information and can further boost SR performance with very few additional parameters.
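Two pieces of this section can be made concrete with a minimal NumPy sketch: the bilinear down-sampling of Section III-A and the MAE loss of Eq. (2). This is an illustration under stated assumptions, not the authors' implementation. The function names `bilinear_downsample` and `mae_loss` are hypothetical, the down-sampling factor is assumed to be an integer (the paper does not state it), pixel-centre alignment is assumed, and the loss sums absolute differences per image pair as Eq. (2) is written, whereas training frameworks often average per pixel instead.

```python
import numpy as np


def bilinear_downsample(frame, scale):
    """Shrink an (H, W) or (H, W, C) frame by an integer factor `scale` (assumed)."""
    f = np.asarray(frame, dtype=np.float64)
    h, w = f.shape[:2]
    out_h, out_w = h // scale, w // scale
    # Centre of every output pixel, expressed in input-pixel coordinates.
    ys = (np.arange(out_h) + 0.5) * scale - 0.5
    xs = (np.arange(out_w) + 0.5) * scale - 0.5
    # Row and column indices of the four nearest input pixels.
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    # Fractional offsets are the interpolation weights; reshape for broadcasting.
    extra = (1,) * (f.ndim - 2)
    wy = (ys - y0).reshape((out_h, 1) + extra)
    wx = (xs - x0).reshape((1, out_w) + extra)
    top = (1 - wx) * f[np.ix_(y0, x0)] + wx * f[np.ix_(y0, x1)]
    bottom = (1 - wx) * f[np.ix_(y1, x0)] + wx * f[np.ix_(y1, x1)]
    return (1 - wy) * top + wy * bottom


def mae_loss(sr_batch, hr_batch):
    """Eq. (2): mean over the N pairs of the L1 norm of (I_i^SR - I_i^HR)."""
    sr = np.asarray(sr_batch, dtype=np.float64)
    hr = np.asarray(hr_batch, dtype=np.float64)
    n = sr.shape[0]
    per_pair_l1 = np.abs(sr - hr).reshape(n, -1).sum(axis=1)
    return per_pair_l1.mean()


if __name__ == "__main__":
    frame = np.arange(16, dtype=np.float64).reshape(4, 4)
    # With scale 2, every output pixel is the mean of one 2x2 input block.
    print(bilinear_downsample(frame, 2))
```

Because only the four nearest pixels contribute to each output sample, larger factors are not low-pass filtered first; whether the authors apply any pre-filtering is not stated in the paper.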



Fig. 2: Block diagram of a traditional HEVC encoder.

Fig. 3: Super-resolution network architecture diagram.

Fig. 4: IMDB architecture diagram.

IV. EXPERIMENTS

A. Experiment Setup

1) Dataset: In our experiments, we tested four categories of sequences, ClassA to ClassD, from the standard HEVC Dataset [4]. These four categories cover a wide range of videos with different characteristics, and the texture characteristics of the individual test sequences are listed in Table I.
2) Evaluation Standard: The key to compression is to find a balance between distortion and compression ratio, which is an essential basis for measuring the performance of a compression algorithm. This paper uses the peak signal-to-noise ratio (PSNR) to measure compression distortion. In addition, the compression ratio is measured by the average number of bits required to encode each pixel per frame (bpp).

B. Comparison of Results

We compare our video compression method under extremely low bit rates with the conventional HEVC algorithm. When we mention HEVC, we are referring to x265 as the encoder. The comparison results are shown in Fig. 5. Our method outperforms the traditional HEVC algorithm in terms of PSNR at the same bpp. For example, on the HEVC ClassA dataset, our method improves on HEVC by 2.8 dB at the same bpp (0.0035). Besides, our method has a greater advantage over HEVC at smaller bpp, which indicates that it compresses better in extremely low bit rate scenarios.

Fig. 5: Illustration of the comparison of results on the HEVC Dataset.

C. Visual Comparison

Fig. 6 shows the visual comparison results on the ClassA, ClassB, ClassC, and ClassD datasets. Visually, video compressed using only the HEVC algorithm is less pleasing than ours. In extremely low bit rate scenes, the videos processed by our method are closer to the original video, without severe distortions such as artifacts, blurring, and boundary distortions. The videos processed by the traditional HEVC algorithm have more obvious ringing and blocking artifacts. For example, in the visualization of the ClassC dataset, it can be clearly observed that some details of the video, such as the pattern of the little girl’s dress, the pattern of the wall, and the grain of the floor, are well preserved by our method.

(a) ClassA Dataset (b) ClassB Dataset (c) ClassC Dataset (d) ClassD Dataset

Fig. 6: Illustration of the comparison of visual results on the HEVC Dataset.

V. CONCLUSION

In recent decades, traditional video compression algorithms have been extensively developed and utilized. The approach presented in this paper diverges from conventional architectures and is designed primarily for extremely low bit rate network environments. Within our architecture, the video undergoes down-sampling to generate a low-resolution version, which is subsequently compressed using the traditional HEVC algorithm and then reconstructed via an image super-resolution network. In scenarios with severely constrained bandwidth, this technique effectively mitigates the ringing and blocking artifacts commonly encountered in traditional video compression algorithms, thereby enhancing the quality of video transmission. Experimental results show that our method surpasses the traditional HEVC algorithm both visually and in terms of the objective metric PSNR.

REFERENCES

[1] A. J. Hussain and Z. Ahmed, “A survey on video compression fast block matching algorithms,” Neurocomputing, vol. 335, pp. 215–237, 2019.
[2] C. G. C. Index, “Forecast and methodology, 2016–2021 white paper,” Updated: February, vol. 1, p. 2018, 2018.
[3] T. Wiegand, G. Sullivan, G. Bjontegaard, and A. Luthra, “Overview of the H.264/AVC video coding standard,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 560–576, 2003.
[4] G. J. Sullivan, J.-R. Ohm, W.-J. Han, and T. Wiegand, “Overview of the high efficiency video coding (HEVC) standard,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 12, pp. 1649–1668, 2012.
[5] X. Wang, L. Xie, C. Dong, and Y. Shan, “Real-ESRGAN: Training real-world blind super-resolution with pure synthetic data,” in 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), 2021, pp. 1905–1914.
[6] T. Sikora, “MPEG digital video-coding standards,” IEEE Signal Processing Magazine, vol. 14, no. 5, pp. 82–100, 1997.
[7] G. J. Sullivan, P. N. Topiwala, and A. Luthra, “The H.264/AVC advanced video coding standard: Overview and introduction to the fidelity range extensions,” Applications of Digital Image Processing XXVII, vol. 5558, pp. 454–474, 2004.
[8] G. J. Sullivan and T. Wiegand, “Video compression-from concepts to the H.264/AVC standard,” Proceedings of the IEEE, vol. 93, no. 1, pp. 18–31, 2005.
[9] J.-R. Ohm and G. J. Sullivan, “High efficiency video coding: the next frontier in video compression [standards in a nutshell],” IEEE Signal Processing Magazine, vol. 30, no. 1, pp. 152–158, 2013.
[10] I. Telecom, “Advanced video coding for generic audiovisual services,” ITU-T Recommendation H.264, 2003.
[11] I. J. /Sc, “Information technology - coding of audio-visual objects - part 2: Visual,” 2001.
[12] B. Bross, J. Chen, J.-R. Ohm, G. J. Sullivan, and Y.-K. Wang, “Developments in international video coding standardization after AVC, with an overview of versatile video coding (VVC),” Proceedings of the IEEE, vol. 109, no. 9, pp. 1463–1493, 2021.
[13] C. Dong, C. C. Loy, K. He, and X. Tang, “Learning a deep convolutional network for image super-resolution,” in Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part IV 13. Springer, 2014, pp. 184–199.
[14] ——, “Image super-resolution using deep convolutional networks,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 2, pp. 295–307, 2016.
[15] C. Dong, C. C. Loy, and X. Tang, “Accelerating the super-resolution convolutional neural network,” in Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part II 14. Springer, 2016, pp. 391–407.
[16] J. Kim, J. K. Lee, and K. M. Lee, “Accurate image super-resolution using very deep convolutional networks,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 1646–1654.
[17] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, and W. Shi, “Photo-realistic single image super-resolution using a generative adversarial network,” in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 105–114.
[18] B. Lim, S. Son, H. Kim, S. Nah, and K. M. Lee, “Enhanced deep residual networks for single image super-resolution,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, July 2017.
[19] Y. Zhang, Y. Tian, Y. Kong, B. Zhong, and Y. Fu, “Residual dense network for image super-resolution,” in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018, pp. 2472–2481.
[20] Y. Zhang, K. Li, K. Li, L. Wang, B. Zhong, and Y. Fu, “Image super-resolution using very deep residual channel attention networks,” in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 286–301.
[21] N. Usha Bhanu and C. Saravanakumar, “Investigations of machine learning algorithms for high efficiency video coding (HEVC),” in 2023 International Conference on Signal Processing, Computation, Electronics, Power and Telecommunication (IConSCEPT), 2023, pp. 1–5.
[22] Z. Hui, X. Gao, Y. Yang, and X. Wang, “Lightweight image super-resolution with information multi-distillation network,” in Proceedings of the 27th ACM International Conference on Multimedia, 2019, pp. 2024–2032.

TABLE I: HEVC Test Sequences

Type     Resolution   Video sequence    Characteristic
ClassA   2560*1600    Traffic           Homogeneous, flat
ClassA   2560*1600    PeopleOnStreet    Non-homogeneous, vigorous movement
ClassB   1920*1080    BasketballDrive   Non-homogeneous, vigorous movement
ClassB   1920*1080    Cactus            Complex backgrounds, object rotation
ClassB   1920*1080    BQTerrace         Inhomogeneous
ClassB   1920*1080    Kimono            Simple background, slow movement
ClassB   1920*1080    ParkScene         Homogeneous, flat
ClassC   832*480      BasketballDrill   Non-homogeneous, vigorous movement
ClassC   832*480      BQMall            Inhomogeneous
ClassC   832*480      PartyScene        Complex textures
ClassC   832*480      RaceHorses        Non-homogeneous, vigorous movement
ClassD   416*240      BasketballPass    Non-homogeneous, more vigorous movement
ClassD   416*240      BlowingBubbles    Homogeneous, flat
ClassD   416*240      BQSquare          Homogeneous, flat
ClassD   416*240      RaceHorses        Non-homogeneous, vigorous movement
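For reference, the two evaluation metrics of Section IV-A can be sketched in NumPy using the resolutions listed in Table I. This is a hedged illustration rather than the authors' evaluation script. The function names `psnr` and `bits_per_pixel` are hypothetical, 8-bit samples (peak value 255) are assumed, and the paper does not state whether PSNR is computed on luma only or averaged over colour channels and frames.

```python
import numpy as np


def psnr(reference, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two frames of equal shape."""
    ref = np.asarray(reference, dtype=np.float64)
    rec = np.asarray(reconstructed, dtype=np.float64)
    mse = np.mean((ref - rec) ** 2)
    if mse == 0:
        return float("inf")  # identical frames: distortion-free
    return 10.0 * np.log10(max_value ** 2 / mse)


def bits_per_pixel(total_bits, width, height, num_frames):
    """Average number of coded bits spent on each pixel of each frame (bpp)."""
    return total_bits / (width * height * num_frames)


if __name__ == "__main__":
    # A ClassA frame (2560*1600, Table I) has 4,096,000 pixels, so the
    # 0.0035 bpp operating point of Section IV-B leaves roughly
    # 0.0035 * 4,096,000 = 14,336 bits (about 1.8 kB) per frame.
    print(bits_per_pixel(total_bits=14336 * 10, width=2560, height=1600, num_frames=10))
```

Note that bpp is measured against the original high-resolution pixel count even though the bitstream carries the down-sampled video; this is the natural reading of a same-bpp comparison with HEVC, but it is an interpretation rather than something the paper spells out.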
