SIPaper RG

The document analyzes how spatial and temporal information values vary for 8-bit and 10-bit video sequences under different encoding settings, encoders, resolutions and temporal pooling methods. It presents a comprehensive evaluation of these variations and provides insights into how factors like resolution, bit depth, and compression impact spatial and temporal information values.


See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/335127938

Analysis of Spatial and Temporal Information Variation for 10-Bit and 8-Bit
Video Sequences

Conference Paper · August 2019


DOI: 10.1109/CAMAD.2019.8858486

3 authors, including:
Nabajeet Barman, Sony Interactive Entertainment (PlayStation)
Maria G Martini, Kingston University

All content following this page was uploaded by Nabajeet Barman on 27 March 2021.



Analysis of Spatial and Temporal Information
Variation for 10-bit and 8-bit Video Sequences
Nabajeet Barman, Nabeel Khan, and Maria G. Martini
Wireless Multimedia & Networking Research Group, School of Computer Science and Mathematics,
Kingston University, London, UK
{n.barman, n.khan, m.martini}@kingston.ac.uk

Abstract: Spatial Information (SI) and Temporal Information (TI) have been used widely as an approximate estimation of video complexity. Recently, SI (and TI) have found use in many other applications such as Quality of Experience modeling, Bandwidth and Rate-distortion modeling, etc., for both traditional and non-traditional (gaming, dynamic vision sensors, etc.) videos. It is often assumed that SI and TI only depend on video content, while instead factors such as resolution, bit depth, compression have an impact on the values of SI and TI for a specific video content. A systematic study on SI and TI for videos, investigating the effect of different video encoding and processing steps on SI and TI values has been missing so far. Also, SI and TI calculation has been limited to 8-bit videos, while there has been increasing popularity and usage of 10-bit videos. Towards this end, we present in this paper a comprehensive evaluation of the variation of SI and TI for different 8-bit and 10-bit videos. Results and insights into the variation of SI and TI values for different encoding settings, choice of encoders, temporal pooling methods, resolution, etc. are presented in this study.

Index Terms: Spatial Information, Temporal Information, Video Streaming

I. INTRODUCTION

The measurement of scene complexity can be used to determine the expected data rate and hence the bandwidth requirement or the required compression level of diverse content types. In fact, more spatially and temporally complex videos require a higher data rate to achieve a satisfactory quality. Measuring the scene complexity plays an important role in key applications ranging from the design of video quality metrics well representative of the quality experienced by the actual users [1][2] to the clustering and classification of different video sequences [3]. The metrics to measure scene complexity are widely varied, ranging from subjective complexity measures [4] to diverse objective metrics [5]. The spatial information of an image [6], as a measure of edge energy, is one of the most widely-used metrics for scene complexity estimation. Spatial Information (SI) and Temporal Information (TI), as defined by ITU-T Rec. P.910 [7] as an approximate measure of video content complexity, have been widely used in the field of quality assessment, in particular for the selection of the video content to be used for the subjective tests, that should be representative of different complexity classes.

[Figure 1 content: "Spatial and/or Temporal Information: Example Applications", with four application boxes:
Rate-distortion modelling for Scalable Video Coding [5];
Clustering and classification of content complexity for gaming and non-gaming videos [3, 6, 7];
QoE-based objective Video Quality metrics (Statistical and machine learning based metrics) [8, 9, 10, 11];
Bandwidth modelling of Dynamic Vision Sensors for Visual Sensor Networks [12, 13].]

Fig. 1: Example applications of the spatial and temporal information.

Figure 1 highlights some of the applications which use spatial and temporal information, ranging from rate-distortion modeling ([8]) to clustering and classification ([6], [9], [3]) to QoE evaluation metrics ([10], [2], [11], [1]) to data rate estimation and bandwidth modelling for information acquired by specific video sensors, such as neuromorphic sensors ([12], [13]). For instance, in [8] scene complexity information is used in terms of the spatial content of frames and temporal information calculated between consecutive frames to derive a rate-distortion model for video sequences. The authors in [10] measure the video quality objectively by utilizing the spatial content of the sequences. The authors in [2], [11] and [1] proposed machine learning based QoE models, where spatial and temporal information values are used along with other influence factors for quality estimation of gaming videos. The applications of spatial and temporal information are not only limited to traditional video sequences and have found application in other fields such as neuromorphic engineering. For example, the authors in [12], [13] proposed several spatial information based models to predict the data rate output by Dynamic Vision Sensors (DVS).

A. Spatial and Temporal Information

In this section, we report the mathematical definitions of the spatial and temporal information. Let $g_h$ and $g_v$ denote horizontal and vertical gradients, respectively, of a grey-scale image, evaluated via filtering the grey-scale image with horizontal and vertical Sobel kernels. The magnitude of spatial information calculated at pixel $p$, $SI_p$, is represented as:

$$SI_p = \sqrt{g_h^2 + g_v^2}. \tag{1}$$

The SI statistics used for pooling, to characterize the Spatial Index of an image, are the mean ($SI_{mean} = \frac{1}{P}\sum_p SI_p$) and the standard deviation of the magnitude of spatial information ($SI_{std} = \sqrt{\frac{1}{P}\sum_p (SI_p - SI_{mean})^2}$), where $P$ is the number of
pixels in the image. For video sequences, ITU-T Rec. P.910 [7] defines spatial information as:

$$SI = \max_{time}\{SI_{std}\}. \tag{2}$$

According to (2), $SI_{std}$ is computed for each of the frames in the video sequence and the maximum of $SI_{std}$, among all the frames, is taken (over the whole time duration of the sequence). ITU-T Rec. P.910 [7] defines temporal information as:

$$TI = \max_{time}\{\mathrm{std}[M_p^n]\} \tag{3}$$

$$M_p^n = F_p^n - F_p^{n-1} \tag{4}$$

where $M_p^n$ is the pixel intensity difference between $F_p^n$, current frame $n$, and $F_p^{n-1}$, previous frame $n-1$. For the difference frame the standard deviation is applied across all the pixels. According to (3), the standard deviation of $M_p^n$ is computed for every frame and the maximum is taken over the entire time duration of the video sequence.

B. Contributions

The SI and TI measures, as defined by ITU-T P.910, have been widely used in the research community as an approximate measure of content complexity, but such an evaluation so far has been limited to 8-bit videos. There exists a research gap in the evaluation and analysis of SI and TI values for videos of higher bit-depth (10/12 bits, etc.). Towards this end, our evaluation of open source tools (https://github.com/Telecommunication-Telemedia-Assessment/SITI/blob/master/python/siti.py, and https://github.com/slhck/siti) revealed that these are incompatible with SI and TI calculation for videos of bit-depth higher than 8 bits. Also, a systematic study of SI and TI values associated with different choices of encoder, encoding settings, temporal pooling method, etc. is missing from the literature so far. An in-depth analysis of SI and TI values would help researchers in the design of better QoE estimation models, calculation of SI and TI for higher bit-depth videos, clustering and classification strategies, etc. and possibly find new application areas of SI and TI. Towards this end, we define the following six objectives which address some of the research gaps discussed:

1) To evaluate SI and TI values for 10-bit uncompressed video sequences.
2) To study the relationship between SI and TI values for 8-bit and for 10-bit representation of a video sequence.
3) To evaluate the effect of different temporal pooling methods (Mean, Median and Minimum), other than the currently used “maximum” pooling method, on SI and TI values and on their capability to serve as an indicator for video complexity.
4) To study the effect of different compression standards (H.264/MPEG-AVC and H.265/MPEG-HEVC) on SI and TI values.
5) To evaluate the effect of different encoding settings on the behavior of SI and TI values.
6) To study the variations of SI and TI with different resolutions.

The remainder of this paper is organized as follows. We present the evaluation methodology describing the source video sequences and encoding/processing settings in Section II. Results and observations, addressing each objective mentioned above, are presented in Section III. Section IV concludes the work along with a brief discussion of possible future works.

II. EVALUATION METHODOLOGY

A. Source Sequences

In this work, we used a total of eight pristine, uncompressed videos of 4K/UHD resolution from [14], [15] and [16]. Fig. 2 shows screenshots of the considered video sequences, highlighting the different types of content utilized in this work and Table I summarizes the characteristics of the selected reference video sequences. In order to neglect any effect of the device used to capture the video and/or the capture settings, source video framerate and/or duration, the reference video sequences were selected from different content providers and are of different genres, duration, framerates and resolution. It is important to note that the selected content is representative of the commonly streamed video sequences on YouTube, Netflix, Amazon Prime Video, etc. Sequences C1-C4 are from different episodes of the famous Netflix video sequence Chimera depicting various activities (Bar scene, Netflix card twirl, Seaside and pier and Toddler and fountain). Sequences C5-C7 are from the SJTU 4K dataset and depict a campfire scene at night, fountains and runners running at a competition. The last sequence, C8, is the reconstructed, 4K sequence of the famous video clip Suzie which depicts a girl answering a telephone.

TABLE I: Summary of the eight reference video sequences
Sequence ID   Sequence      Resolution   Frame rate   Duration (s)
C1            ChimeraEP01   4096x2160    59.94        10
C2            ChimeraEP10   4096x2160    59.94        10
C3            ChimeraEP11   4096x2160    59.94        10
C4            ChimeraEP16   4096x2160    59.94        10
C5            Campfire      3840x2160    25           12
C6            Fountains     3840x2160    25           12
C7            Runners       3840x2160    25           12
C8            Suzie         3840x2160    25           9.6

B. Video Processing

Table II summarizes the video characteristics and encoding settings used in this work. We restrict our analysis to short duration video sequences of YUV planar colorspace with 4:2:0 chroma subsampling (YUV420) which is currently the most widely used chroma subsampling across all video streaming and broadcast applications. All video processing tasks such as encoding, 10-bit to 8-bit conversion, chroma subsampling conversion, etc. are done using FFmpeg (https://ffmpeg.org/). For the first four video sequences (C1-C4), both 8-bit and 10-bit versions are made available by Netflix at 59.94 fps and YUV422 pixel format. Such video sequences were cut into four sequences of approximately 10 seconds at original resolution and frame
[Figure 2 panels: (a) ChimeraEP01, (b) ChimeraEP10, (c) ChimeraEP11, (d) ChimeraEP16, (e) Campfire, (f) Fountains, (g) Runners, (h) Suzie]

Fig. 2: Screenshots of the eight videos used in this work.

[Figure 3 panels, one labelled point per sequence: (a) SI vs. TI plot for the eight Reference 8-bit videos; (b) SI vs. TI plot for the eight Reference 10-bit videos]

Fig. 3: SI vs. TI plot for 8-bit and 10-bit reference video sequences.
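Each point in Fig. 3 is one (SI, TI) pair obtained from Eqs. (1)-(4). As a rough illustration only, the following pure-Python sketch computes such a pair from a list of luma frames. It is not the authors' code (the paper reports using MATLAB): the function names are hypothetical, frames are assumed to be lists of rows of luma values, and evaluating the Sobel filter on interior pixels only is an assumption about border handling.

```python
import math
from statistics import pstdev

# Standard 3x3 Sobel kernels for horizontal and vertical gradients.
SOBEL_H = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_V = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def sobel_magnitude(frame):
    """Eq. (1): SI_p = sqrt(g_h^2 + g_v^2), on interior pixels only (assumption)."""
    rows, cols = len(frame), len(frame[0])
    magnitudes = []
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gh = gv = 0
            for dr in range(3):
                for dc in range(3):
                    pixel = frame[r + dr - 1][c + dc - 1]
                    gh += SOBEL_H[dr][dc] * pixel
                    gv += SOBEL_V[dr][dc] * pixel
            magnitudes.append(math.hypot(gh, gv))
    return magnitudes


def si_per_frame(frames):
    """SI_std of every frame: population standard deviation of SI_p."""
    return [pstdev(sobel_magnitude(frame)) for frame in frames]


def ti_per_frame(frames):
    """std[M_p^n] for n >= 1, with M_p^n = F_p^n - F_p^(n-1) as in Eq. (4)."""
    values = []
    for prev, cur in zip(frames, frames[1:]):
        diff = [
            cur_px - prev_px
            for cur_row, prev_row in zip(cur, prev)
            for cur_px, prev_px in zip(cur_row, prev_row)
        ]
        values.append(pstdev(diff))
    return values


def si_ti(frames):
    """Eqs. (2) and (3): maximum over time of the per-frame values."""
    return max(si_per_frame(frames)), max(ti_per_frame(frames))
```

The same functions accept 8-bit or 10-bit sample values, since nothing in Eqs. (1)-(4) depends on the value range; at least two frames with at least 3x3 pixels are needed.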

rate and were subsampled to YUV420 pixel format. The remaining four sequences (C5-C8) were already of shorter duration, 10-bit depth and different pixel formats. They were first processed to create the YUV420 pixel format, 10-bit and 8-bit depth versions. As defined in ITU-T Rec. P.910, SI and TI calculations are performed only on the luminance (Y) channel of the YUV colorspace. We used MATLAB to read the YUV videos and then performed all SI and TI calculations on the Y channel for all the frames of the YUV video.

TABLE II: Video characteristics and encoding settings
Parameter                    Value
Number of Reference Videos   4 (8-bit) + 4 (10-bit) = 8
Chroma Subsampling           YUV420
Frame rate                   59.94, 25
Encoder                      FFmpeg
Encoding Mode                CRF (23, 30), Fixed Bitrate (1, 5 Mbps)
Video Compression Standard   H.264, H.265
Preset                       Medium (default)

To address objectives 3) - 6), the encoding of the reference video sequences was required. For brevity and based on our findings that all selected reference video sequences exhibited the same behaviour during our initial studies addressing objectives 1 and 2, we restricted this analysis only to the first four video sequences (C1-C4). Both the 8-bit and 10-bit versions of the video sequences C1-C4 were then encoded at two different encoding settings (constant bitrate and constant rate factor) using two of the most widely used video compression standards (H.264/MPEG-AVC and H.265/MPEG-HEVC). For the encoders we used the FFmpeg libraries libx264 and libx265, which are the H.264/MPEG-4 AVC and H.265/HEVC encoder wrappers, respectively. The encoded video sequences were decoded back to rawvideo (YUV) format for SI and TI calculations, as is commonly done in the literature. In order to not influence the results due to choice of other encoding settings (preset, GOP size, codec
profile and level, etc.), they were all set to the default value as used by FFmpeg.

III. RESULTS

We present in this section, the results and discussions grouped under six different studies, each of which investigates and addresses the six objectives mentioned earlier in Section I-B.

A. SI and TI calculations for 10-bit and 8-bit videos

Figure 3 shows the SI vs. TI plot for the 10-bit (on the right) and 8-bit (on the left) representation of the eight reference videos. Comparing the two figures, the SI and TI values for the 10-bit videos appear to be remarkably higher than the respective SI and TI values of the 8-bit videos. Let $SI_{10bit}$ and $SI_{8bit}$ indicate the SI value of the 10-bit and 8-bit representation of a video sequence, respectively. We can thus define $R_{SI} = SI_{10bit}/SI_{8bit}$ as the ratio of the two respective representations of a video sequence. Similarly, for TI we have $R_{TI} = TI_{10bit}/TI_{8bit}$, where $TI_{10bit}$ and $TI_{8bit}$ are the TI value of the 10-bit and 8-bit representation of a video sequence, respectively. Table III reports the values of $R_{SI}$ and $R_{TI}$ for all the eight sequences. It can be observed that the ratio is exactly 4 for all eight sequences for both SI and TI ratios. This is due to the fact that the ratio between the number of possible pixel values in the 10-bit and 8-bit case is $2^{10-8} = 4$. Without loss of generality, we expect that this observation can be further extended to higher bit depths (e.g., 12-bit and 16-bit videos/images).

TABLE III: Ratio of $SI_{10bit}$ to $SI_{8bit}$ and $TI_{10bit}$ to $TI_{8bit}$ for the eight video sequences.
Sequence ID   SI_10bit   SI_8bit   R_SI   TI_10bit   TI_8bit   R_TI
C1            150.20     37.55     4.00   229.83     57.46     4.00
C2            135.30     33.83     4.00   262.37     65.59     4.00
C3            190.98     47.75     4.00   218.14     54.54     4.00
C4            195.98     48.99     4.00   216.15     54.04     4.00
C5            179.82     44.95     4.00   142.08     35.52     4.00
C6            187.63     46.89     4.00   47.25      11.82     4.00
C7            220.35     55.08     4.00   101.38     25.35     4.00
C8            113.49     28.36     4.00   123.89     30.97     4.00

Note: The observation reported above depends on how 10-bit sequences are transformed into 8-bit sequences. What we observe above is true if a linear mapping is used (which happens to be the case for the 8-bit sequences from Netflix and also for the ones we generated using FFmpeg). If other transform methods, such as a logarithmic mapping, are used, this may not hold. Hence, SI and TI calculation can be used as a simple way to check if the mapping for bit-depth conversion was linear.

B. Study 2: Effect of different temporal pooling on SI and TI values for 8-bit and 10-bit videos

SI and TI values of a video sequence are defined in ITU-T Rec. P.910 (see (2) and (3)) as the maximum value over all the frames of a video sequence. Our previously discussed results in Section III-A showed that the ratio between the SI and TI values of the 10-bit and 8-bit versions is exactly 4. We further wanted to check whether this observation also holds true when other temporal pooling methods such as minimum, mean and median are considered instead of max as defined in the ITU standard. Towards this end, we define the following:

$$SI_{Std-Min} = \min_{time}\{SI_{std}\}, \tag{5}$$

$$SI_{Std-Mean} = \mathrm{mean}_{time}\{SI_{std}\}, \text{ and} \tag{6}$$

$$SI_{Std-Median} = \mathrm{median}_{time}\{SI_{std}\}. \tag{7}$$

Similarly, for Temporal information we have $TI_{Std-Min}$, $TI_{Std-Mean}$ and $TI_{Std-Median}$. Table IV presents the ratio between the 10-bit and 8-bit representations of the eight video sequences considering the three different pooling strategies. It can be observed that the spatial information related ratios remain constant at an equal value of 4 even when other temporal pooling strategies are considered. The temporal information related ratios also remain approximately constant at 4. This is interesting as it confirms the fact that the amount of spatial and temporal information difference between 10-bit and 8-bit videos is independent of the pooling method used.

C. Study 3: Effect of different encoding settings on SI and TI

In this section, we evaluate the effect of different encoding settings on SI and TI using the following two encoding modes:

1) Constant Rate Factor (CRF) mode of encoding is the preferred mode of encoding when the desired goal is to have consistent quality across the whole video duration, but it results in a variable file size depending on the content complexity of the video.
2) Fixed bitrate mode of encoding tries to achieve a desired average bitrate across the whole video duration. Such mode of encoding is usually preferred for streaming applications as the networks usually can restrain the video bitrate to match the network throughput.

Figure 4 presents the results of the four reference videos encoded using the H.264/AVC compression standard using the two aforementioned encoding settings. Based on the figure, it can be observed that with an increase in CRF value (decreasing bitrate), the SI value decreases. The decrease in SI value with a decrease in the bitrate is due to the fact that during the encoding process, there is a loss of pixel information which can lead to loss of edge information, thus leading to lower SI values. The decrease is more prominent in videos with a higher SI value (e.g., C3 and C4).

D. Study 4: Effect of different encoding standards (H.264/AVC and H.265/HEVC) on SI and TI considering four reference videos

Next, we wanted to evaluate if the choice of the encoder influences the SI and TI values of the 10-bit and 8-bit video sequences differently or in a similar fashion (and hence the ratio between the SI (and TI) values of the 10-bit and 8-bit videos). For brevity, we restrict ourselves to four reference videos (C1-C4), but the results should nevertheless be generalizable across all video sequences. In order to make sure that the results reported are not limited to the choice of the
TABLE IV: Ratios between spatial and temporal information of 10-bit and 8-bit videos considering different “temporal pooling” methods.
Sequence ID   RSI_MIN   RSI_MEAN   RSI_MEDIAN   RTI_MIN   RTI_MEAN   RTI_MEDIAN
C1            4.00      4.00       4.00         3.97      3.98       3.98
C2            4.00      4.00       4.00         3.99      4.00       4.00
C3            4.00      4.00       4.00         3.95      3.97       3.97
C4            4.00      4.00       4.00         4.00      4.00       4.00
C5            4.00      4.00       4.00         4.00      4.00       4.00
C6            4.00      4.00       4.00         4.00      4.00       4.00
C7            4.00      4.00       4.00         4.00      4.00       4.00
C8            4.00      4.00       4.00         3.85      3.99       3.97
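The pooled ratios in Table IV can be illustrated with a small sketch of the temporal pooling in Eqs. (5)-(7), applied here to TI for brevity. Everything in it is an assumption made for illustration: the frames are synthetic, flattened luma values; the 8-bit version is produced by an idealized linear mapping (division by $2^{10-8} = 4$ with no rounding); and the names `POOLERS`, `frame_diff_std` and `pooled_ti` are hypothetical rather than taken from the paper.

```python
from statistics import mean, median, pstdev

# Temporal pooling methods: "max" is the P.910 choice, the others follow Eqs. (5)-(7).
POOLERS = {"max": max, "min": min, "mean": mean, "median": median}


def frame_diff_std(frames):
    """Per-frame std of M^n = F^n - F^(n-1), one value per consecutive frame pair."""
    return [
        pstdev([cur_px - prev_px for cur_px, prev_px in zip(cur, prev)])
        for prev, cur in zip(frames, frames[1:])
    ]


def pooled_ti(frames):
    """Pool the per-frame values over time with every method in POOLERS."""
    per_frame = frame_diff_std(frames)
    return {name: pool(per_frame) for name, pool in POOLERS.items()}


# Synthetic 10-bit luma frames, flattened to 1-D lists (hypothetical data).
frames_10bit = [
    [(7 * p * p + 13 * n * p + 101 * n) % 1024 for p in range(64)]
    for n in range(6)
]
# Assumed idealized linear 10-bit to 8-bit mapping: divide by 4, no rounding.
frames_8bit = [[value / 4 for value in frame] for frame in frames_10bit]

ti_10bit = pooled_ti(frames_10bit)
ti_8bit = pooled_ti(frames_8bit)
ratios = {name: ti_10bit[name] / ti_8bit[name] for name in POOLERS}
print(ratios)
```

Because the standard deviation scales linearly with the sample values, every per-frame value scales by the same factor under this idealized mapping, so the ratio is unaffected by the pooling method. A real 8-bit conversion also rounds to integers, which this sketch leaves out.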

[Figure 4 panels: SI of ChimeraEP01, ChimeraEP10, ChimeraEP11 and ChimeraEP16 for CRF=23 and CRF=30 (top) and for BR=1 Mbps and BR=5 Mbps (bottom)]

Fig. 4: Variation of SI of four 8-bit video sequences (C1-C4) considering two different encoding modes: CRF (top) and fixed average BR (bottom) (Encoder=H.264/AVC).

[Figure 5 panels: SI_x265 plotted against SI_x264, one panel for CRF=23 and CRF=30 and one for BR=1 Mbps and BR=5 Mbps]

Fig. 5: Variation of SI values for four 8-bit reference videos (C1-C4) for CRF and Bitrate encoding using H.264 and H.265.
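For readers who want to reproduce an encoding grid like the one behind Figs. 4 and 5, the sketch below builds FFmpeg argument lists for the settings listed in Table II. The paper does not give its command lines, so this is only one plausible setup: the file names are hypothetical, the source is assumed to be in a container FFmpeg can probe (a headerless .yuv input would need extra input options), and only widely used FFmpeg options are used (`-c:v`, `-preset`, `-crf`, `-b:v`, `-f rawvideo`, `-pix_fmt`).

```python
ENCODERS = {"H.264": "libx264", "H.265": "libx265"}


def encode_command(src, dst, standard="H.264", crf=None, bitrate_mbps=None):
    """Build an FFmpeg argument list for one CRF or average-bitrate encode."""
    if (crf is None) == (bitrate_mbps is None):
        raise ValueError("set exactly one of crf or bitrate_mbps")
    cmd = ["ffmpeg", "-i", src, "-c:v", ENCODERS[standard], "-preset", "medium"]
    if crf is not None:
        cmd += ["-crf", str(crf)]  # constant rate factor mode
    else:
        cmd += ["-b:v", f"{bitrate_mbps}M"]  # target average bitrate
    return cmd + [dst]


def decode_command(src, dst, pix_fmt="yuv420p"):
    """Decode back to raw YUV so SI and TI can be computed on the luma plane."""
    return ["ffmpeg", "-i", src, "-f", "rawvideo", "-pix_fmt", pix_fmt, dst]


def encoding_grid(src):
    """All combinations from Table II: 2 standards x (CRF 23, 30 + 1, 5 Mbps)."""
    commands = []
    for standard in ENCODERS:
        tag = ENCODERS[standard]
        for crf in (23, 30):
            commands.append(encode_command(src, f"{tag}_crf{crf}.mp4", standard, crf=crf))
        for mbps in (1, 5):
            commands.append(
                encode_command(src, f"{tag}_br{mbps}.mp4", standard, bitrate_mbps=mbps)
            )
    return commands


for command in encoding_grid("C1_8bit.y4m"):  # hypothetical file name
    print(" ".join(command))
```

Each list can be passed to `subprocess.run` once the file names are adapted. For 10-bit material, the decode step would use a 10-bit pixel format such as `yuv420p10le` in place of the default.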

encoding setting, we evaluate the SI and TI values considering two different modes of encoding: CRF and “fixed” average bitrate as discussed previously. We define SI values obtained from encoding using H.264 and H.265 encoders as $SI_{x264}$ and $SI_{x265}$ respectively. Fig. 5 shows the $SI_{x265}$ vs. $SI_{x264}$ for four video sequences considering the two encoding modes. It can be observed that the SI value remains the same for the two encoders for both encoding modes. Therefore, SI values appear to be independent of the choice of the encoder and depend only on the content type. Similar observations hold true when considering other temporal pooling methods and also for TI values. The ratio of respective SI and TI values between the 10-bit and 8-bit videos remains constant at 4, as observed earlier for the reference video sequences.

E. Study 5: Variation of SI and TI with Resolution

Since SI is an indicator of the spatial complexity, we wanted to evaluate the effect of resolution on the SI value. Towards this end, the eight reference videos were downscaled using a “bilinear” filter from their native resolution of 4K/UHD to three different resolutions: 1080p (1920x1080), 720p (1280x720) and 480p (640x480), and then their SI and TI values were calculated. Figure 6 presents the variation of SI with the resolution for one of the sample videos. Similar behaviour is observed for the rest of the videos but is not presented here for brevity. It can be observed that with a decrease in the resolution, the SI value increases. This is primarily due to the fact that after the downscaling of a video, the same amount of spatial (edge) information is now contained in a lower number of pixels, leading to a higher SI score. On the other hand, no considerable change is observed in TI values, which is consistent with the fact that TI represents the amount of temporal (motion) information in a video, which is more or less independent of the resolution of the video. Figure 7 shows the difference in SI values for consecutive resolutions for all eight video sequences. It can be observed that while the SI values for some videos such as C5, C6, and C8 (low to medium SI and TI) are not much affected by a decrease in the resolution, for some videos such as C1 and C2 (high TI) and C8 (high SI), the change in SI between consecutive resolutions is of much higher magnitude. Hence, depending
on the SI and TI values of the native resolution of the video, encoding settings can be optimized to take into account the higher variation in the value of SI.

Fig. 6: Variation of SI with Resolution for 8-bit C1 video sequence. Other video sequences show a similar behaviour.

[Figure 7 content: Δ SI for Campfire, ChimeraEP01, ChimeraEP10, ChimeraEP11, ChimeraEP16, Fountains, Runners and Suzie across the resolution steps Δ2K-4K, Δ720p-1080p and Δ480p-720p]

Fig. 7: Difference in values of SI for two consecutive resolutions for all eight video sequences.

IV. CONCLUSION AND FUTURE WORK

In this paper, we addressed an existing research gap by studying the effect of different bit-depth, temporal pooling strategies, encoders, encoding settings and resolution on the variation of SI (and TI). Our results showed that the SI and TI values of the 10-bit representation of the video are exactly four times those of the 8-bit representation of the same video, which remains the same even for different temporal pooling strategies. Such results, as discussed in Section III-A, can be used to predict the transform algorithm used to obtain the lower bit-depth representations of the video. We also found that the SI and TI values were the same for a video encoded with different encoders, which also holds true even when different encoding modes are used, as well as for 10-bit videos. Such information can be taken into account for the design of a generic QoE model considering videos encoded with multiple encoders as done commonly in YouTube and Netflix. Also, the magnitude of the ratio of 4 between the 10-bit and 8-bit representation of the video is approximately the same irrespective of the encoder and/or encoding choice. A proper model design using such information can be used by the decoder to distinguish between different content types for high accuracy content classification. We believe that the results and observations reported in this study will be of interest and use to the research community working on QoE modeling, content classification, data rate estimation and related areas, and that such analysis can possibly be extended to applications other than videos (e.g., neuromorphic data).

V. ACKNOWLEDGMENT

The authors acknowledge the support of the European Commission (project H2020-643072 QoE-NET) and EPSRC (project IoSiRe EP/P022715/1).

REFERENCES

[1] N. Barman, E. Jammeh, S. A. Ghorashi, and M. G. Martini, “No-Reference Video Quality Estimation Based on Machine Learning for Passive Gaming Video Streaming Applications,” IEEE Access, vol. 7, pp. 74511–74527, June 2019.
[2] S. Zadtootaghaj, N. Barman, S. Schmidt, M. G. Martini, and S. Möller, “NR-GVQM: A No Reference Gaming Video Quality Metric,” in 2018 IEEE International Symposium on Multimedia (ISM), Taichung, Taiwan, Dec 2018, pp. 131–134.
[3] S. Zadtootaghaj, S. Schmidt, N. Barman, S. Möller, and M. Martini, “A Classification of Video Games based on Game Characteristics linked to Video Coding Complexity,” Amsterdam, Netherlands, June 2018.
[4] V. Chikhman, V. Bondarko, M. Danilova, A. Goluzina, and Y. Shelepin, “Complexity of images: Experimental and computational estimates compared,” Perception, vol. 41, pp. 631–647, 2012.
[5] R. Cilibrasi and P. M. B. Vitanyi, “Clustering by compression,” IEEE Transactions on Information Theory, vol. 51, pp. 1523–1545, 2005.
[6] H. Yu and S. Winkler, “Image complexity and spatial information,” in IEEE International Conference on Quality of Multimedia Experience (QoMEX), Klagenfurt, Austria, 2013, pp. 12–17.
[7] ITU-T Rec. P.910, Subjective video quality assessment methods for multimedia applications, ITU-T Recommendation, April 2008.
[8] A. Haseeb, M. G. Martini, S. Cicalo, and V. Tralli, “Rate and distortion modeling for real-time MGS coding and adaptation,” in IEEE Wireless Advanced conference (WiAd), London, UK, June 2012.
[9] N. Barman, S. Zadtootaghaj, S. Schmidt, M. G. Martini, and S. Möller, “An objective and subjective quality assessment study of passive gaming video streaming,” International Journal of Network Management, vol. e2054, Nov 2018.
[10] M. H. Pinson and S. Wolf, “A new standardized method for objectively measuring video quality,” IEEE Transactions on Broadcasting, vol. 50, no. 3, pp. 312–322, Sept 2004.
[11] S. Göring, R. R. R. Rao, and A. Raake, “nofu A Lightweight No-Reference Pixel Based Video Quality Model for Gaming Content,” in 2019 Eleventh International Conference on Quality of Multimedia Experience (QoMEX), June 2019, pp. 1–6.
[12] N. Khan and M. Martini, “Data rate estimation based on scene complexity for dynamic vision sensors on unmanned vehicles,” in IEEE International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC), Bologna, Italy, September 2018.
[13] N. Khan and M. Martini, “Bandwidth Modeling of Silicon Retinas for Next Generation Visual Sensor Networks,” Sensors, vol. 19, no. 8: 1751, April 2019.
[14] NETFLIX, “The Consumer Digital Video Library,” https://www.cdvl.org/, Dec 2015, [Online: Accessed 25-May-2019].
[15] Li Song, Xun Tang, Wei Zhang, Xiaokang Yang, and Pingjian Xia, “The SJTU 4K video sequence dataset,” in 2013 Fifth International Workshop on Quality of Multimedia Experience (QoMEX), July 2013, pp. 34–35.
[16] A. Elemental, “Suzie,” https://www.elemental.com/, [Online: Accessed 25-May-2019].
