Vehicle Detection and Classification without Residual Calculation: Accelerating HEVC Image Decoding with Random Perturbation Injection

Muhammet Sebul Beratoğlu¹, Behçet Uğur Töreyin¹

¹Signal Processing for Computational Intelligence Research Group, Informatics Institute, İstanbul Technical University, Maslak, Turkey ({beratoglu, toreyin}@itu.edu.tr). This work was supported by the Scientific and Technological Research Council of Turkey (TUBITAK) under grant number 121E378.

Abstract—In the field of video analytics, particularly traffic surveillance, there is a growing need for efficient and effective methods for processing and understanding video data. Traditional full video decoding techniques can be computationally intensive and time-consuming, leading researchers to explore alternative approaches in the compressed domain. This study introduces a novel random perturbation-based compressed domain method for reconstructing images from High Efficiency Video Coding (HEVC) bitstreams, specifically designed for traffic surveillance applications. To the best of our knowledge, our method is the first to propose substituting random perturbations for residual values, thereby creating a condensed representation of the original image while retaining information relevant to video understanding tasks, particularly focusing on vehicle detection and classification as key use cases.

By not using any residual data, our proposed method significantly reduces the amount of data needed in the image reconstruction process, allowing for more efficient storage and transmission of information. This is particularly important when considering the vast amount of video data involved in surveillance applications. Applied to the public BIT-Vehicle dataset, we demonstrate a significant increase in reconstruction speed compared to the traditional full decoding approach, with our proposed random perturbation-based method being approximately 56% faster than the pixel domain method. Additionally, we achieve a detection accuracy of 99.9%, on par with the pixel domain method, and a classification accuracy of 96.84%, only 0.98% lower than the pixel domain method. Furthermore, we showcase the significant reduction in data size, leading to more efficient storage and transmission. Our research establishes the potential of compressed domain methods in traffic surveillance applications, where speed and data size are critical factors. The study's findings can be extended to other object detection tasks, such as pedestrian detection, and future work may investigate the integration of compressed and pixel domain information, as well as the extension of these methods to the full video decoding process, encompassing both intra and inter encoded bitstreams.

Index Terms—H.265/HEVC, Compressed Domain Video Analytics, Vehicle Classification, Video Surveillance, Real-time Video Analysis

I. INTRODUCTION

In recent years, the rapid growth of video data and the increasing demand for efficient video analytics have led researchers to seek new methods for analyzing and processing video streams. Traditional video processing techniques require decoding the entire video bitstream, which can be computationally intensive and time-consuming. In traffic surveillance applications, where real-time video understanding is crucial, the computational overhead associated with full video decoding can be a significant bottleneck. Consequently, researchers have begun to explore alternative approaches that leverage the compressed domain to reduce processing time while maintaining the effectiveness of video understanding tasks [1], [2], [3]. By directly analyzing the compressed video bitstream, it is possible to extract relevant information for video understanding tasks without the need for full decoding.

In this study, we introduce a novel method for reconstructing images from HEVC bitstreams by injecting random perturbations as a substitute for residual values, significantly speeding up the reconstruction process compared to standard intra decoding. To the best of our knowledge, our method is the first to propose substituting residual values instead of calculating them, thereby creating a condensed representation of the original image while retaining information pertinent to video understanding tasks, particularly focusing on vehicle detection and classification as key use cases. By operating directly in the compressed domain, our method avoids the computational overhead associated with full video decoding, leading to a more efficient and effective solution for traffic monitoring and management.

In video compression standards like HEVC, the encoded bitstream holds a substantial amount of data related to the prediction error for image samples, known as residuals. For I-frames, these residuals make up about 85-90% of the total data in the bitstream [4]. Furthermore, residual coding accounts for an average of 77% and 84% of the total bits for dynamic continuous and discrete video textures, respectively [5]. By not using any residual data, our proposed method significantly reduces the amount of data needed in the image reconstruction process, allowing for more efficient storage and transmission of information. This is particularly important when considering the vast amount of video data involved in surveillance applications.

To evaluate the performance of our proposed method, we conduct experiments on the public BIT-Vehicle dataset, a large-scale dataset comprising diverse vehicle types and imaging conditions. Our results demonstrate that the proposed method is able to reconstruct images approximately 56% faster than the pixel domain method, while maintaining a high level of detection and classification accuracy.
In particular, our method achieves a detection accuracy of 99.9%, on par with the pixel domain method, and a classification accuracy of 96.84%, only 0.98% lower than the pixel domain approach. These results highlight the potential of our compressed domain method for traffic surveillance applications, where both speed and data size are critical factors.

Additionally, the increasing emphasis on data privacy and the need to comply with regulations such as the EU General Data Protection Regulation (GDPR) and the California Privacy Rights Act (CPRA) make data minimization a crucial consideration in video analytics. These regulations mandate that only the data necessary to fulfill a certain purpose be collected. A recent study presented a method to reduce the amount of personal data needed for machine learning predictions by removing or generalizing some input features of the runtime data, using knowledge distillation approaches [6]. Furthermore, the work by He et al. introduces a lightweight image encryption scheme using compressive sensing and data hiding to enhance privacy and data security in smart city applications, reinforcing the importance of advanced encryption and access control mechanisms in ensuring data security [7]. In the context of our proposed method, operating directly in the compressed domain and significantly reducing the amount of data needed for image reconstruction addresses the data minimization requirement set out in these regulations. By minimizing the data used for video understanding tasks, our method not only enhances processing efficiency but also helps organizations comply with privacy regulations.

By showcasing the potential of compressed domain methods for video understanding tasks in traffic surveillance applications, this study contributes to the growing body of research aimed at overcoming performance bottlenecks in video analytics. The faster reconstruction time and reduced data size associated with these methods make them a promising option for applications where speed and data size are important considerations. Further research into the potential applications and improvements of these methods could lead to significant advancements in the field of video analytics.

The remainder of this paper is organized as follows: Section II provides an overview of related work in object detection and video compression. Section III discusses the basic blocks of HEVC encoding and decoding, with a focus on intra prediction. Section IV presents our proposed approach for object detection and classification in compressed domain videos. Section V provides experimental results, and Section VI concludes the paper.

II. RELATED WORKS

In this section, we review existing literature on vehicle classification in both the pixel and compressed domains, with a focus on methods using the BIT-Vehicle dataset. We briefly discuss various techniques and recent advancements while highlighting how our approach differs from existing methods.

A. Vehicle Classification in Pixel Domain

Vehicle classification in the pixel domain has been a popular research topic over the years, with numerous review papers providing comprehensive overviews of various techniques and methods. Notable review papers on this topic include those by Yang and Pun-Cheng [8], Wang et al. [9], and Zou et al. [10], which discuss the state of the art in vehicle classification, as well as the challenges and future research directions.

In this section, we focus on methods that utilize the BIT-Vehicle dataset, as this dataset allows us to directly compare our results with these works. We provide an overview of five representative vehicle classification methods based on the BIT-Vehicle dataset.

Dong et al. [11] proposed a vehicle type classification method using a semi-supervised convolutional neural network on vehicle frontal-view images. They introduced sparse Laplacian filter learning to obtain the filters of the network with large amounts of unlabeled data and trained the network on the challenging BIT-Vehicle dataset. The method demonstrated the effectiveness of using deep learning for vehicle classification in complex scenes.

Roecker et al. [12] proposed a convolutional neural network model for vehicle type classification using low-resolution images from a frontal perspective. They trained the model on a subset of the BIT-Vehicle dataset and achieved an accuracy of 93.90%, showing the model to be discriminative and capable of generalizing the patterns of the vehicle type classification task.

Sang et al. [13] proposed a new vehicle detection model, called YOLOv2 Vehicle, based on YOLOv2. They used the k-means++ clustering algorithm to cluster vehicle bounding boxes on the training dataset, improved the loss calculation method for bounding box dimensions, and adopted a multi-layer feature fusion strategy. The model achieved a mean Average Precision (mAP) of 94.78% on the BIT-Vehicle validation set.

Wu et al. [14] proposed a multi-scale vehicle detection method by improving YOLOv2 to address the foreground-background class imbalance and varying vehicle sizes in a scene. They introduced a new anchor box generation method called Rk-means++ and incorporated Focal Loss into YOLOv2 for vehicle detection. The method demonstrated better performance on vehicle localization and recognition on the public BIT-Vehicle dataset compared to other existing methods.

Taheri Tajar et al. [15] developed a lightweight real-time vehicle detection model based on the Tiny-YOLOv3 network. They pruned and simplified the network and trained it on the BIT-Vehicle dataset, achieving an mAP of 95.05% and a detection speed of 17 fps, about two times faster than the original Tiny-YOLOv3 network.

In our work, we adopt the YOLOv7 framework [16] as the basis for our vehicle classification method. We focus on achieving accuracy comparable to pixel domain methods while operating in the compressed domain. By utilizing the strengths of YOLOv7 and adapting it to work with HEVC intra features, we aim to develop a computationally efficient vehicle classification method that maintains high accuracy.

B. Related Works in Compressed Domain
Recent years have witnessed growing interest in developing object detection and classification methods in the compressed domain. In this section, we review some of the most relevant works and discuss how our approach differs from them.

Zhai et al. [17] provided a comprehensive overview of object detection methods in the compressed domain across various video compression standards, including MPEG-2, H.264, and HEVC. They highlighted different ways of utilizing motion vector information for object detection and analyzed the techniques under various compression standards. Among the many works presented in their review, we have chosen the ones that focus on the HEVC compressed domain for a more detailed comparison with our approach.

Zhao et al. [18] proposed a real-time moving object segmentation and classification method for surveillance videos using HEVC compressed domain features. Their approach only classified objects as persons or vehicles, while our method classifies vehicles into six specific types.

Chan et al. [19] showed that tuning DNNs with compressed data enhances detection accuracy in vehicle detection systems. Their findings confirm that DNN performance is stable even at high compression ratios of up to 160:1, making a significant case for using compressed data in automated driving applications without loss of critical information.

Deguerre et al. [20] explored the impact of video compression on traffic flow rate estimation using deep learning on MPEG4 part-2 compressed video streams. Their findings underscore the potential for deep learning models to effectively utilize compressed data, enhancing the efficiency of traffic management systems.

In the work by Cai et al. [21], the authors propose a new video coding strategy in HEVC tailored for object detection, focusing on task-specific bit allocation and the influence of each pixel on detection algorithms. In contrast, our approach enhances the existing HEVC framework by introducing random perturbations for image reconstruction, aiming to condense the original image efficiently while retaining essential information for vehicle detection and classification. This represents an optimization of the current decoding algorithm, prioritizing speed and data efficiency.

Chen et al. [22] introduced a fast object detection method in the HEVC intra compressed domain. Their method used partitioning depths, prediction modes, and residuals for object detection, whereas our approach omits residuals and achieves good results with less computational demand.

Alizadeh and Sharifkhani [23] presented a moving object detection method in the H.265/HEVC compressed domain based on a conditional random field (CRF) model. Their approach extracts and analyzes block-specific data such as motion vectors (MVs), partitioning modes, and bit consumption from the compressed bitstream for object detection.

Feng et al. [24] proposed a fast framework for semantic video segmentation, named TapLab, which utilized motion vectors and residuals from compressed videos. Unlike their method, we focus on intra features and do not rely on motion vectors.

Yang et al. [25] focused on improving the texture of compressed chrominance components using a luminance-guided chrominance enhancement network and online learning. Their approach, which optimizes both encoder and decoder performance, emphasizes enhancing image quality with a low-complexity, high-performance design. Unlike their method, which centers on chrominance texture enhancement and encoder-decoder optimization, our approach is dedicated to efficiently reconstructing images for video analytics, specifically for vehicle detection and classification.

Choi and Bajic [26] presented a human detection method based on HEVC intra coding syntax elements, including block size, intra prediction modes, and transform coefficient levels. Their approach did not require full bitstream decoding but focused on human detection rather than vehicle classification.

Wang et al. [27] developed a highway vehicle counting method in the compressed domain using low-level features extracted from coding-related metadata. Their method is competitive with pixel-domain approaches in terms of computational cost but focuses on counting vehicles rather than classifying them into distinct types.

Our method differs from these works in several ways. We classify vehicles into six specific types, providing a more detailed classification for intelligent transportation applications. Furthermore, we are the first to suggest using random perturbations to reconstruct a frame without employing residual data, which makes our method less computationally demanding. While most of these works rely on motion vectors, our approach exploits the potential of intra features. Since a video consists of intra and inter frames, incorporating motion vectors in future work could further enhance our method. By using the state-of-the-art YOLOv7, we demonstrate accuracy close to pixel domain methods, showcasing the effectiveness of our approach.

III. HIGH EFFICIENCY VIDEO CODING (HEVC)

High Efficiency Video Coding (HEVC) is a video compression standard developed jointly by the ITU-T Video Coding Experts Group and the ISO/IEC Moving Picture Experts Group. It is designed to achieve higher compression efficiency than its predecessor, the H.264/MPEG-4 AVC standard, by introducing new tools and techniques such as larger block sizes, more prediction modes, and more efficient entropy coding [28], [29].

The HEVC compression algorithm processes video data using a hierarchical organization of blocks. The encoding process begins with partitioning a frame into Coding Tree Units (CTUs), which are further partitioned into Coding Units (CUs), Transform Units (TUs), and Prediction Units (PUs). The prediction process can be either inter or intra. Intra prediction, also known as intra-frame prediction, is a technique to remove spatial correlation: it uses information from previously coded blocks within the same frame to predict the content of the current block. Inter prediction, on the other hand, is used to remove temporal correlation: it uses information from previously coded frames to predict the content of the current frame [30]. Our method is applied to intra predicted frames.

In intra prediction, the prediction is done by using reference samples and prediction modes.
Fig. 1. Reconstructing CTUs with HEVC Standard Decoding Process and with Random Perturbation Based Residual Substitution. (Standard path: entropy decoding, dequantization, and inverse transform produce residuals that are added to the intra prediction, yielding CTUpx; proposed path: perturbation-based residual substitutes are added to the same intra prediction, yielding CTUrp.)

Reference samples are blocks of image or video data that are used as a reference for predicting the values of other blocks. They are extracted at the boundary from the upper and left blocks adjacent to the current PU. When reference samples are not available, they can be generated by copying samples from the closest available references. If no reference samples are available, a nominal average sample value (typically 128) is used in their place. HEVC uses several intra prediction modes, such as Angular, DC, and Planar, to achieve better compression performance. The intra prediction modes use the same set of reference samples; there are 33 angular modes, which together with the DC and Planar modes give 35 intra prediction modes in total [30].

The HEVC standard employs the discrete cosine transform (DCT) and the discrete sine transform (DST) to encode TUs. The residuals, the differences between predicted and original pixel values, are held in TUs. The block structure, prediction modes, and quantized data are entropy coded and transmitted.

IV. COMPRESSED DOMAIN VEHICLE DETECTION AND CLASSIFICATION

Our proposed method for vehicle detection and classification in the compressed domain consists of two primary components: Image Reconstruction and Vehicle Detection and Classification. The Image Reconstruction component reconstructs an image based on the intra prediction process in HEVC, eliminating the need for residual data. The Vehicle Detection and Classification component takes the reconstructed image as input and employs the state-of-the-art YOLOv7 [16] to detect and classify vehicles.

This section delves into the details of each component, emphasizing their collaborative effort to achieve efficient and accurate vehicle detection and classification within the HEVC compressed domain.

A. Image Reconstruction using Random Perturbations

In this subsection, we discuss the reconstruction of images based on prediction unit information without resorting to residuals, consequently reducing data and computational requirements.

Fig. 1 illustrates the process of reconstructing a coding tree unit (CTU) within the context of High Efficiency Video Coding (HEVC). The HEVC bitstream is first decoded by the entropy decoding block, which extracts syntax elements such as partition structure, prediction modes, and residual data. The decoder generates the CTU prediction by employing prediction modes and utilizing reference samples; the reference samples consist of neighboring pixels within the image. The residual data is then added to the predicted CTU to generate the final CTUpx. Note that the process of reconstructing CTUs with the HEVC standard decoding process and estimated residuals is actually done at the Coding Unit (CU) level; for simplicity and ease of understanding, we present the process in the context of CTUs.

1) Standard Reconstruction: Let CU(x, y) be the intensity value of a reconstructed Coding Unit of an image I at the spatial coordinates (x, y). The intensity value of the CU can be expressed as the sum of its prediction value, P(x, y), and its residual value, R(x, y), as shown in the following equation:

CU(x, y) = P(x, y) + R(x, y)   (1)

Decoding is an iterative process where the reconstructed CU serves as the input for subsequent CUs [29]. The prediction information's close relationship with residual information makes bypassing residuals a significant challenge [26]. To address this, we explored potential signals that could replace residuals while maintaining the overall integrity of the reconstructed image.

2) Impact of Ignoring Residuals: In the case where R(x, y) is assumed to be equal to 0, let us examine the consequences for the first CU to be decoded, which is situated at the top-left corner of the image. Due to the absence of reference pixels, and in accordance with the HEVC standard, the reference value is assigned as 128 [30], the mean of the pixel range for an 8-bit image. Consequently, the first reconstructed CU will be entirely gray. The first CU contains the reference pixels for subsequent CUs to be decoded, so for the second CU the reference pixels will also be 128. As this process continues, all reference pixels take the value 128, leading to a fully gray predicted image, as depicted in (2):

I(x, y) = 128 if R(x, y) = 0   (2)
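To make the cascade described in 1) and 2) concrete, here is a minimal sketch, not the HM decoder: a toy chain of blocks decoded in order, each predicted DC-style from the block decoded before it. The 8×8 block size, the single-row layout, and the mean-of-right-edge prediction are simplifying assumptions for illustration; only the additive reconstruction of Eq. (1) and the 128 fallback behind Eq. (2) follow the text above.

```python
import numpy as np

def decode_chain(residuals, block=8):
    """Toy chain of blocks decoded left to right, per CU(x, y) = P(x, y) + R(x, y).

    The first block has no reference samples, so its prediction falls back to
    the nominal average sample value 128; every later block is predicted
    (DC-style) from the right edge of the block decoded just before it.
    """
    decoded = []
    for r in residuals:
        if not decoded:
            pred = np.full((block, block), 128.0)              # 8-bit mid-gray fallback
        else:
            pred = np.full((block, block), decoded[-1][:, -1].mean())
        decoded.append(pred + r)                               # Eq. (1)
    return decoded

# With every residual forced to zero, each prediction stays at 128, and the
# chain reproduces the fully gray image described by Eq. (2).
zero_residuals = [np.zeros((8, 8))] * 4
print([b.mean() for b in decode_chain(zero_residuals)])        # [128.0, 128.0, 128.0, 128.0]
```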
3) Impact of Constant Residuals: In the case where R(x, y) is assumed to be equal to a constant value, this constant would be added to all decoded CUs throughout the decoding process. As decoding progresses, the additions accumulate, causing saturation in the reconstructed image. The constant value of R(x, y) directly determines the resulting saturation level and might lead to a loss of detail or visual information in the reconstructed image. This illustrates the importance of properly handling the residual values to achieve accurate and high-quality image reconstructions.

4) Random Perturbations as a Substitute for Residuals: Taking into account the limitations of simply ignoring or using constant values for residuals, we propose a more sophisticated approach: using a series of integers Rp as a replacement for the actual residuals. These integers are drawn from a Gaussian distribution and can take both negative and positive values. The choice of a Gaussian distribution to generate these estimated perturbations (Rp) is motivated by its prevalent use in modeling the distribution of errors and noise in natural images [31]. Gaussian noise is a well-established model that effectively captures the inherent variability and uncertainty present in real-world sensory data, making it a realistic approximation for the residuals in image compression. By utilizing this approach, we aim to achieve a more accurate and realistic reconstruction of CTUs compared to simpler methods.

The Gaussian distribution's probability density function is given by:

f(x; µ, σ) = (1 / √(2πσ²)) exp(−(x − µ)² / (2σ²))   (3)

where µ is the mean and σ is the standard deviation of the distribution.

We define the sequence Rp consisting of n discrete random variables sampled from the Gaussian distribution with mean µ = 0 and a varying σ ∈ {1, 2, 3, 4, 5}:

Rp = {rp1, rp2, . . . , rpn}   (4)

Each random variable rpi for i = 1, 2, . . . , n is sampled from the Gaussian distribution and then rounded to the nearest integer:

rpi = round(Xi)   (5)

where Xi ∼ N(0, σ²) and round(·) maps a real number to the nearest integer.

Given the sequence Rp, we construct the predicted Coding Tree Units (CTUs) without residual data. For each pixel coordinate (x, y) in a CTU, the residual value is replaced by a corresponding random variable from Rp:

CUrp(x, y) = P(x, y) + rp(x,y)   (6)

where rp(x,y) is the random variable associated with the pixel at position (x, y). The sequence Rp has length equal to the maximum number of pixels in a CTU, 64 × 64 = 4096. The sequence, with a mean of zero and varying standard deviations, is generated once to maintain consistency and reproducibility across different image reconstructions. The reconstructed image using the random perturbation method is denoted as Irp.
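As an illustration of Eqs. (4)-(6), the perturbation sequence can be drawn once and reused for every CTU, as in the sketch below. The NumPy generator, the fixed seed, and the clipping to the 8-bit sample range are implementation assumptions; the text above only specifies rounded zero-mean Gaussian samples and the 64 × 64 = 4096 sequence length.

```python
import numpy as np

CTU_MAX_PIXELS = 64 * 64  # 4096, the largest CTU size in HEVC

def make_rp(sigma, seed=0):
    """Rp = {rp_1, ..., rp_n} with rp_i = round(X_i), X_i ~ N(0, sigma^2); Eqs. (4)-(5).

    Generated once (hence the fixed, assumed seed) so the same sequence is
    reused across reconstructions, as required for consistency.
    """
    rng = np.random.default_rng(seed)
    return np.rint(rng.normal(0.0, sigma, CTU_MAX_PIXELS)).astype(np.int16)

def inject_perturbations(prediction, rp):
    """CU_rp(x, y) = P(x, y) + rp_(x,y); Eq. (6). Clipping to [0, 255] is assumed."""
    sub = rp[: prediction.size].reshape(prediction.shape)
    return np.clip(prediction.astype(np.int16) + sub, 0, 255).astype(np.uint8)

rp = make_rp(sigma=7)                          # sigma = 7 gives the best mAP in Table V
pred = np.full((16, 16), 128, dtype=np.uint8)  # a hypothetical predicted block
print(inject_perturbations(pred, rp).mean())   # stays near 128, but lightly textured
```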
Fig. 2. Comparison of different approaches to substitute residuals. (a) Standard reconstruction using HEVC. (b) Impact of ignoring residuals, R(x, y) = 0. (c) Impact of a constant residual, R(x, y) = 1. (d) Random perturbations as a substitute for residuals, Irp with µ = 0 and σ = 7.

An example of the different approaches to substitute residuals is shown in Figure 2. This figure provides a visual comparison of the original pixel domain image reconstructed using the standard HEVC method (a) and the effects of various approaches to substitute residuals on the image quality: ignoring residuals (b), using constant residuals (c), and employing random perturbations (d). The random perturbation approach, with a mean (µ) of 0 and standard deviation (σ) of 7, clearly demonstrates a better representation of the original image compared to the other approaches.

In Figure 3, we present a matrix of images showcasing the effectiveness of our Gaussian-based random perturbation method in reconstructing images of vehicles from various scenes while measuring the effect of different standard deviations. The first column displays pixel domain images, while the other columns represent images reconstructed using our method with different standard deviations. As illustrated, our method successfully constructs a close silhouette of the pixel domain image, retaining the general boundaries of objects in the frame and the information necessary for image classification. As the standard deviation increases, the image becomes more apparent to the human eye, yet classification can still be done successfully for all standard deviations. Our approach bypasses the calculation of residual data, yet still enables the creation of an image with general boundaries retained, which serves as valuable input for object detection tasks.

B. Vehicle Detection and Classification

For the vehicle detection and classification task, we employ the state-of-the-art YOLOv7 object detector [16]. YOLOv7 is a single-stage, real-time detector that has demonstrated impressive speed and accuracy in real-time object detection tasks.
Fig. 3. Effect of Varying Standard Deviations in Random Perturbation for Image Reconstruction. (Columns, left to right: Ipx, then Irp with σ = 1, 2, 3, 4, 5.)

According to its authors, YOLOv7 outperforms previous models and has set a significant benchmark in the field.

The YOLOv7 family includes the YOLOv7-Tiny model, the smallest in the family with just over 6 million parameters. Despite its compact size, the YOLOv7-Tiny model achieves a validation AP of 35.2%, surpassing the performance of previous YOLO-Tiny models.

In our proposed method, we utilize the reconstructed images, generated using random perturbations, as input for the Vehicle Detection and Classification component. These reconstructed images, referred to as Irp images, are used to train a model using the Darknet framework [32]. The trained model is then applied to the task of vehicle detection and classification within the HEVC compressed domain.

By leveraging the YOLOv7 object detector and the reconstructed images, our proposed method is able to efficiently and accurately detect and classify vehicles in the compressed domain, without the need for residual data.

V. EXPERIMENTAL RESULTS

This section presents the experimental results of our proposed method. We begin with an overview of the experimental setup, detailing the hardware and software configurations used in the experiments, and describe the dataset. Next, we compare time efficiency and accuracy with other relevant approaches. The results demonstrate the effectiveness of our method in achieving efficient and accurate vehicle detection and classification within the HEVC compressed domain.

A. The Experimental Setup

The experimental setup for comparing our proposed method begins with obtaining HEVC bitstreams from the JPEG format images, Iorg, in the BIT database through intra-encoding using the HEVC encoder [33].

Then, from these bitstreams, we generate the pixel domain images Ipx and the random perturbation images Irp. Note that Ipx images are first encoded and then decoded from Iorg to ensure a fair comparison, as each source image faces the same HEVC encoding distortion. To further compare with previous compressed domain approaches [34], [35], we also generate Ibp (block partition based) and Ipu (prediction unit based) images from the same bitstream.

Next, we fine-tuned the YOLOv7-Tiny models using pre-trained weights provided by [16], ensuring consistent hyperparameters such as learning rate, batch size, and number of epochs across all scenarios. Throughout the training process, which included over 250,000 batches, we monitored the models' performance on the validation set and selected the weights that demonstrated the highest Mean Average Precision (mAP) to ensure robust generalization.
Fig. 4. Comparison of four different input images generated from the same HEVC bitstream and fed into their corresponding YOLOv7-Tiny networks for vehicle detection and classification. (The entropy decoder output feeds the full decoding path to Ipx and the Irp, Ipu, and Ibp generators; each image type trains its own YOLOv7-Tiny model: Modelpx, Modelrp, Modelpu, and Modelbp.)

In addition to these models, we trained a separate model on the original JPEG images, Iorg, to serve as a benchmark. To explore the impact of standard deviation variations in random perturbations, we also trained distinct models for each specified standard deviation. Overall, 14 different models were trained for the vehicle classification task, and 4 models were dedicated to vehicle detection, enabling comprehensive analysis across varying conditions.

Figure 4 illustrates the four different input images generated from the bitstream and fed into the YOLOv7-Tiny networks. We use the following abbreviations for vehicle detection and classification based on different image types:

• Vbp: Using Ibp (Block Partition Based) images.
• Vrp: Using Irp (Random Perturbation) images.
• Vpu: Using Ipu (Prediction Unit Based) images.
• Vpx: Using Ipx (Pixel Domain) images.

To conduct these experiments, the following hardware and software configurations are employed:

• Computer: A computer with an Intel(R) Core(TM) i9-9900X CPU, an NVIDIA GeForce GTX 1080 Ti GPU, and 48 GB RAM running a Windows 11 64-bit operating system is used.
• Software: The reference software for the H.265/HEVC coding standard, known as HM (version 16.20), is used for both encoding and decoding purposes. The "Main" profile is used for encoding, with 4:2:0 color encoding and a quantization parameter of 32 [33].
• Compiler: The Microsoft Visual Studio 2019 (v142) platform toolset is used to compile the reference software.

B. BIT Vehicle Dataset

We selected the BIT dataset [11] for our experiments due to its widespread use in previous research and the diverse set of images it offers for vehicle classification. The BIT-Vehicle dataset, provided by the Beijing Institute of Technology, comprises 9580 vehicle images featuring six types of vehicles: sedans, sport-utility vehicles (SUVs), microbuses, trucks, buses, and minivans. The dataset exhibits varying frequencies of vehicle types. Specifically, the number of vehicles per class is as follows: 558 buses, 883 microbuses, 476 minivans, 5922 sedans, 1392 SUVs, and 822 trucks. These images were captured by road surveillance cameras and include both day and night scenes, as well as sunny days with no background noise, rain, snow, people, or other vehicle types.

To ensure a fair comparison with previous works, the dataset was divided into a training set and a validation set with a ratio of 8:2, containing 7880 and 1970 images, respectively. This ratio was also maintained for each vehicle type to ensure a balanced representation across classes. Among these images, approximately 1000 and 250 were nighttime images for training and validation, respectively.
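The per-class 8:2 split described above can be sketched as follows; the seed, the helper name, and the (path, label) sample representation are illustrative assumptions, not part of the published experimental protocol.

```python
import random
from collections import defaultdict

def stratified_split(samples, train_ratio=0.8, seed=42):
    """Split (path, label) pairs 8:2 while keeping the ratio within every class,
    mirroring the per-vehicle-type split used for the BIT-Vehicle dataset."""
    random.seed(seed)
    by_class = defaultdict(list)
    for path, label in samples:
        by_class[label].append(path)
    train, val = [], []
    for label, paths in by_class.items():
        random.shuffle(paths)                      # randomize within the class
        cut = round(len(paths) * train_ratio)
        train += [(p, label) for p in paths[:cut]]
        val += [(p, label) for p in paths[cut:]]
    return train, val

# Usage: train_set, val_set = stratified_split(all_samples)
```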
C. Reconstruction Time Comparison

We first measure the time taken by the different steps of the HEVC reconstruction process. Then, we calculate the image reconstruction time for these methods. Finally, we measure the total elapsed time, including both the reconstruction and inference time for vehicle detection and classification.

1) Measurement of Reconstruction Steps: The following processes are measured to determine the reconstruction time of each method.

• Entropy Decoding (ED): This is a common step for both methods.
• Intra Prediction (IP): This is a common step for both methods.
• Residual Decompression (RD): This step is skipped for Irp.
• Loop Filters (LF): This is a common step for both methods.
TABLE I
MEASUREMENT OF RECONSTRUCTION STEPS.

Time (ms)               Average   Min.    Max.
Entropy Decoding          6.59    6.00    8.00
Intra Prediction          7.04    4.00   10.00
Residual Decompression   12.92    8.00   17.00
Loop Filters              9.70    8.00   16.00

TABLE II
COMPARISON OF IMAGE RECONSTRUCTION TIMES.

Time (ms)    Irp     Ipx
Average     23.33   36.25
Minimum     18.00   26.00
Maximum     34.00   51.00

2) Comparison of Image Reconstruction Time: The elapsed time to reconstruct Irp, denoted as T(Irp), can be calculated using the following equation:

T(Irp) = T(ED) + T(IP) + T(LF)   (7)

Similarly, the elapsed time to reconstruct Ipx, denoted as T(Ipx), can be calculated as:

T(Ipx) = T(ED) + T(IP) + T(RD) + T(LF)   (8)

The results presented in Table II demonstrate that the proposed method has significantly faster reconstruction times than traditional full decoding. The time required to generate an image in the compressed domain using the random perturbation image (Irp) method averaged 23.33 ms, a 35.6% reduction compared to the pixel domain's 36.25 ms. These results indicate that the proposed method can substantially decrease the time required for image generation.
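Equations (7) and (8) can be checked directly against the per-step averages in Table I; the snippet below simply restates those measured values and reproduces the reported totals and the 35.6% reduction.

```python
# Average per-step decoding times from Table I, in milliseconds
# (RD is the residual decompression step that is skipped for Irp).
steps = {"ED": 6.59, "IP": 7.04, "RD": 12.92, "LF": 9.70}

t_irp = steps["ED"] + steps["IP"] + steps["LF"]                # Eq. (7), no residuals
t_ipx = steps["ED"] + steps["IP"] + steps["RD"] + steps["LF"]  # Eq. (8), full decoding

print(f"T(Irp) = {t_irp:.2f} ms, T(Ipx) = {t_ipx:.2f} ms")     # 23.33 ms vs 36.25 ms
print(f"reduction: {100 * (1 - t_irp / t_ipx):.1f}%")          # 35.6%, as reported
```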
3) Comparison of Total Elapsed Time: The total time spent on both reconstruction and classification is presented in Table III. Vehicle classification using the YOLO convolutional neural network (CNN) takes approximately 2 ms for both the pixel and compressed domain methods, as it performs detection and classification simultaneously. The key difference between the two methods lies in the average reconstruction time. The compressed domain method (Vrp) is significantly faster, taking only 25.33 ms compared to the pixel domain method (Vpx), which takes 38.24 ms. This demonstrates the efficiency of our compressed domain method in reducing the overall time spent on reconstruction and classification tasks.

TABLE III
COMPARISON OF THE TOTAL TIME SPENT ON RECONSTRUCTION AND CLASSIFICATION.

Method   Time (ms)
Vrp        25.33
Vpx        38.24

D. Accuracy Comparison

1) Metrics: The results are evaluated using the F1-score, precision (the proportion of correct detections among all positive predictions), recall (the proportion of correct detections among all instances of the class in the dataset), average intersection over union (Avg. IoU), and mean average precision (mAP) at the IoU threshold of 0.50 (mAP@0.50).

mAP@0.50 is a metric commonly used to evaluate the performance of object detection algorithms. It is calculated as the mean of the Average Precision (AP) over the classes in a dataset, as in (9):

mAP = (1/n) Σ_{i=1}^{n} AP_i   (9)

where n is the number of classes in the dataset and AP_i is the Average Precision for class i.

To calculate AP, the algorithm's predictions are first sorted by their confidence scores. Then, AP is calculated as the area under the precision-recall curve with an IoU threshold of 0.50, as shown in (10):

AP = (Σ_{k=1}^{n} P(k) · rel(k)) / (Σ_{k=1}^{n} rel(k))   (10)

where P(k) is the precision at cut-off k, and rel(k) is a binary indicator of whether the prediction at cut-off k is a true positive, considering an IoU threshold of 0.50.
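A minimal sketch of Eqs. (9) and (10) follows; the confidence-sorted binary relevance vectors stand in for real detector output, and the interpolation-free form mirrors the equations as written rather than any particular evaluation toolkit.

```python
import numpy as np

def average_precision(rel):
    """Eq. (10): AP = sum_k P(k) * rel(k) / sum_k rel(k).

    `rel` is a binary vector over predictions already sorted by confidence;
    rel(k) = 1 when the k-th prediction is a true positive at IoU >= 0.50.
    """
    rel = np.asarray(rel, dtype=float)
    precision_at_k = np.cumsum(rel) / np.arange(1, len(rel) + 1)  # P(k)
    return float((precision_at_k * rel).sum() / rel.sum())

def mean_average_precision(per_class_rel):
    """Eq. (9): mAP = (1/n) * sum_i AP_i over the n classes."""
    return sum(average_precision(r) for r in per_class_rel) / len(per_class_rel)

# Two hypothetical classes; 1 marks a true positive at IoU >= 0.50.
print(mean_average_precision([[1, 1, 0, 1], [1, 0, 1]]))          # 0.875
```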
2) Vehicle Detection Accuracy: The BIT-Vehicle dataset is originally annotated with six different vehicle types. To measure vehicle detection performance, the dataset has been re-annotated, combining all vehicle types into a single "vehicle" class. This approach focuses solely on the detection of vehicles, which can enhance the model's accuracy by simplifying the task to detecting the presence of any vehicle rather than distinguishing between different types.

In scenarios such as perimeter security or quick traffic counts, where the specific type of vehicle is less critical, prioritizing detection accuracy over classification can provide more reliable data. This is especially relevant in environments where rapid and accurate vehicle detection is paramount and the additional information provided by classification does not significantly alter the response or outcome.

The models are retrained using these four different image types. The obtained vehicle detection performance is presented in Table IV.

TABLE IV
VEHICLE DETECTION ACCURACY FOR DIFFERENT METHODS.

Method   Average Precision (AP) %
Vbp        98.34
Vpu        99.89
Vrp        99.99
Vpx        99.99

The results suggest that the random perturbation image reconstruction method (Vrp) achieves an Average Precision (AP) of 99.99% for vehicle detection, matching the performance of the pixel domain approach. Additionally, the BP and PU methods, which do not utilize random residuals, also exhibit accuracy that closely approximates that of our proposed method. This highlights that all compressed domain methods provide commendable accuracy for vehicle detection.
Given these non-discriminative results across methods, further investigation is warranted. In our next experiments, we focus on vehicle classification to discern more granular differences and potentially validate the distinct advantages of integrating random perturbations.

3) Vehicle Classification Accuracy: Table V presents the vehicle classification accuracy comparison for random perturbation images generated with various standard deviations. Each row corresponds to the classification accuracy obtained using a different standard deviation value for the random perturbations. The columns report the Average Precision (AP) for each vehicle type, as well as the mean average precision (mAP) at an IoU threshold of 0.50.

TABLE V
VEHICLE CLASSIFICATION ACCURACY COMPARISON OF RANDOM PERTURBATION IMAGES GENERATED WITH VARIOUS STANDARD DEVIATIONS.

Vrp σ   Bus     Microbus  Minivan  Sedan   SUV     Truck   mAP@0.50 (%)
1       99.46   93.40     93.12    99.54   93.98   97.25   96.13
2       99.95   95.71     91.04    99.61   94.18   97.03   96.25
3       99.97   96.38     92.59    99.40   93.94   95.11   96.23
4       99.92   94.14     91.34    99.72   95.54   94.56   95.87
5       99.91   94.19     92.43    99.64   94.40   97.00   96.26
6       99.92   95.33     91.57    99.65   95.73   96.47   96.45
7       99.99   95.75     93.08    99.48   95.04   97.69   96.84
8       99.28   95.06     92.52    99.67   93.79   96.62   96.16
9       99.91   94.34     91.09    99.57   95.47   97.73   96.35
10      99.84   94.62     90.09    99.52   93.11   98.16   95.89

From the results, it can be observed that for most vehicle types, the classification accuracy remains relatively high across different standard deviation values. The highest mAP, 96.84%, is obtained when the standard deviation is set to 7. This indicates that the proposed method performs well in the classification task for various standard deviations.

It is also worth noting that certain vehicle types, such as buses and sedans, consistently have higher classification accuracies than others, such as microbuses and minivans. This might be due to the distinctive features of these vehicle types, which make them easier to classify, as well as the higher frequency of sedans in the dataset. Overall, the table demonstrates the effectiveness of the proposed method for vehicle classification across a range of standard deviation values.

Table VI presents the vehicle classification accuracy for different methods in both the pixel and compressed domains, comparing our proposed compressed domain method with results from the literature and with pixel domain methods. The columns report the study, the model, the input image type, the Average Precision (AP) for each vehicle type, and the mean average precision (mAP) at an IoU threshold of 0.50.

TABLE VI
VEHICLE CLASSIFICATION ACCURACY FOR DIFFERENT METHODS.

Study   Model                Input   Bus      Microbus  Minivan  Sedan   SUV     Truck   mAP (%)
[15]    YOLOv3.Tiny          Iorg    -        -         -        -       -       -       95.05
[12]    CNN                  Iorg    -        -         -        -       -       -       93.90
[13]    YOLOv2 Vehicle       Iorg    98.42    97.04     95.02    97.37   93.73   97.80   96.56
[14]    Improved YOLOv2      Iorg    98.86    96.63     95.90    98.23   94.86   99.30   97.30
[14]    YOLOv2               Iorg    98.34    95.03     91.11    97.42   93.62   98.41   95.65
[14]    YOLOv3               Iorg    98.65    96.98     94.04    97.65   94.36   98.17   96.64
[14]    Faster R-CNN VGG16   Iorg    99.05    93.75     91.38    98.14   94.75   98.17   95.87
[14]    SSD300 VGG16         Iorg    97.97    97.98     90.28    97.15   91.25   97.75   93.75
[34]    YOLOv7.Tiny          Ibp     83.52    65.40     63.51    87.36   66.61   84.29   75.11
[35]    YOLOv7.Tiny          Ipu     99.83    92.22     89.16    99.51   93.02   98.37   95.35
Ours    YOLOv7.Tiny          Iorg    100.00   97.70     96.40    99.76   96.49   99.23   98.26
Ours    YOLOv7.Tiny          Ipx     100.00   96.10     96.36    99.76   95.95   98.74   97.82
Ours    YOLOv7.Tiny          Irp     99.99    95.75     93.08    99.48   95.04   97.69   96.84

In Table VI, there is a distinction between the Iorg and Ipx results. Iorg refers to the original JPEG files in the BIT dataset, while Ipx represents the encoded and re-decoded versions of the Iorg images. The purpose of this comparison is to measure the potential performance loss caused by lossy compression. As the results show, the performance on Iorg is higher than that on Ipx, indicating that lossy compression may have a negative impact on classification accuracy. It is important to note that this effect also applies to the compressed domain experiments, namely Vrp, Vbp, and Vpu.

For our proposed compressed domain method, YOLOv7.Tiny Vrp with σ = 7 achieves an impressive mAP of 96.84%, indicating that the method performs well even in the compressed domain. Although this performance is slightly lower than the pixel domain performance (97.82%, YOLOv7.Tiny Ipx), it still demonstrates the potential of compressed domain approaches for traffic surveillance applications. In comparison to the literature, the proposed compressed domain method outperforms some of the pixel domain methods, such as YOLOv3.Tiny, CNN, YOLOv2 Vehicle, YOLOv2, YOLOv3, SSD300 VGG16, and Faster R-CNN VGG16. This indicates that the compressed domain approach can provide a viable alternative to pixel domain methods for certain applications, especially when considering the significant speedup in the image reconstruction process.

Our compressed domain method, YOLOv7.Tiny Vrp, offers better accuracy than the other compressed domain methods, YOLOv7.Tiny Vpu and YOLOv7.Tiny Vbp, whose mAP values are 95.35% and 75.11%, respectively. This demonstrates the effectiveness of the proposed random perturbation-based approach.

When comparing the pixel domain methods, it is evident that our YOLOv7-based models excel in this domain. Although the main focus of our research is on compressed domain methods, the YOLOv7-based models have demonstrated their superiority over other pixel domain approaches such as YOLOv3.Tiny, CNN, YOLOv2 Vehicle, Improved YOLOv2, YOLOv2, YOLOv3, SSD300 VGG16, and Faster R-CNN VGG16.

VI. CONCLUSION

In this study, we introduced a novel method for reconstructing images from HEVC bitstreams, tailored specifically for traffic surveillance applications. Our approach replaces traditional residual data calculations with random perturbations, achieving a condensed representation of the original image while preserving essential information for video understanding tasks. By leveraging the compressed domain, our method significantly reduces both the computational overhead and the data requirements for image reconstruction, making it well suited for real-time traffic monitoring and management.

The effectiveness of our method was validated on the public BIT-Vehicle dataset, where it demonstrated a 56% faster reconstruction process compared to traditional pixel domain methods while maintaining high levels of detection and classification accuracy. These results indicate that compressed domain methods can effectively address performance bottlenecks in video analytics and offer efficient solutions for applications where both speed and data volume are critical.

The potential applications of our study extend beyond traffic monitoring. For instance, our method could significantly enhance the efficiency of video archive scanning, enabling faster vehicle and license plate searches through compressed videos. This could be particularly beneficial for applications requiring rapid access to specific video content without the need for full video decompression.
Looking ahead, we plan to further our research in several key areas:

• While the BIT-Vehicle dataset proved adequate for initial validations, future studies will utilize datasets with greater variety and complexity to fully evaluate the robustness and adaptability of our approach under diverse real-world conditions.
• We will expand the application of our method to both intra and inter encoded bitstreams, exploring broader uses within video processing technologies.
• While our current approach eliminates residuals and approximates them using random perturbations, another intriguing direction for future research could involve transmitting a minimal amount of data that includes key statistical parameters of the original data, such as the mean and standard deviation. This would provide insight into the data's variability with minimal additions to the transmitted data, potentially enhancing reconstruction fidelity.
• Additionally, we will explore the potential of our technique for other object detection tasks, such as pedestrian detection and license plate recognition.

In summary, our study significantly contributes to the field of compressed domain video analytics by demonstrating a viable method to minimize data and computational demands in traffic surveillance and potentially other related fields. The advancements presented not only underscore the capabilities of compressed domain methods but also highlight their growing importance in the efficient processing of large-scale video data.

REFERENCES

[1] R. V. Babu, M. Tom, and P. Wadekar, "A survey on compressed domain video analysis techniques," Multimedia Tools and Applications, vol. 75, no. 2, pp. 1043–1078, 2016.
[2] M. Javed, P. Nagabhushan, and B. B. Chaudhuri, "A review on document image analysis techniques directly in the compressed domain," Artificial Intelligence Review, pp. 1–30, 2017.
[3] R. Torfason, F. Mentzer, E. Agustsson, M. Tschannen, R. Timofte, and L. Van Gool, "Towards image understanding from deep compression without decoding," in 6th International Conference on Learning Representations, ICLR 2018 - Conference Track Proceedings, 2018. [Online]. Available: http://arxiv.org/abs/1803.06131
[4] J. Stankowski et al., "Analysis of compressed data stream content in HEVC video encoder," International Journal of Electronics and Telecommunications, vol. 61, pp. 121–127, 2015.
[5] A. V. Katsenou, M. Afonso, and D. R. Bull, "Study of compression statistics and prediction of rate-distortion curves for video texture," Signal Processing: Image Communication, vol. 101, 2022.
[6] A. Goldsteen, G. Ezov, R. Shmelkin, M. Moffie, and A. Farkash, "Data minimization for GDPR compliance in machine learning models," AI and Ethics, vol. 2, no. 3, pp. 477–491, 2022.
[7] X. He, L. Li, H. Peng, and F. Tong, "An efficient image privacy preservation scheme for smart city applications using compressive sensing and multi-level encryption," IEEE Transactions on Intelligent Transportation Systems, pp. 1–15, 2024.
[8] Z. Yang and L. S. Pun-Cheng, "Vehicle detection in intelligent transportation systems and its applications under varying environments: A review," Image and Vision Computing, vol. 69, pp. 143–154, 2018.
[9] Z. Wang, J. Zhan, C. Duan, X. Guan, P. Lu, and K. Yang, "A review of vehicle detection techniques for intelligent vehicles," IEEE Transactions on Neural Networks and Learning Systems, pp. 1–21, 2022.
[10] Z. Zou, K. Chen, Z. Shi, Y. Guo, and J. Ye, "Object detection in 20 years: A survey," Proceedings of the IEEE, vol. 111, no. 3, pp. 257–276, 2023.
[11] Z. Dong, Y. Wu, M. Pei, and Y. Jia, "Vehicle type classification using a semisupervised convolutional neural network," IEEE Transactions on Intelligent Transportation Systems, vol. 16, no. 4, pp. 2247–2256, 2015.
[12] M. N. Roecker, Y. M. G. Costa, J. L. R. Almeida, and G. H. G. Matsushita, "Automatic vehicle type classification with convolutional neural networks," in 2018 25th International Conference on Systems, Signals and Image Processing (IWSSIP), Maribor, Slovenia, 2018, pp. 1–5.
[13] J. Sang, Z. Wu, P. Guo, H. Hu, H. Xiang, Q. Zhang, and B. Cai, "An improved YOLOv2 for vehicle detection," Sensors, vol. 18, no. 12, p. 4272, 2018.
[14] Z. Wu, J. Sang, Q. Zhang, H. Xiang, B. Cai, and X. Xia, "Multi-scale vehicle detection for foreground-background class imbalance with improved YOLOv2," Sensors, vol. 19, no. 15, p. 3336, 2019.
[15] A. Taheri Tajar, A. Ramazani, and M. Mansoorizadeh, "A lightweight Tiny-YOLOv3 vehicle detection approach," Journal of Real-Time Image Processing, vol. 18, pp. 2389–2401, 2021.
[16] C.-Y. Wang, A. Bochkovskiy, and H.-Y. M. Liao, "YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors," arXiv preprint, 2022.
[17] D. Zhai, X. Zhang, X. Li, X. Xing, Y. Zhou, and C. Ma, "Object detection methods on compressed domain videos: An overview, comparative analysis, and new directions," Measurement, vol. 207, p. 112371, 2023.
[18] L. Zhao, Z. He, W. Cao, and D. Zhao, "Real-time moving object segmentation and classification from HEVC compressed surveillance video," IEEE Transactions on Circuits and Systems for Video Technology, vol. 28, no. 6, pp. 1346–1357, 2018.
[19] P. H. Chan, A. Huggett, G. Souvalioti, P. Jennings, and V. Donzella, "Influence of AVC and HEVC compression on detection of vehicles through Faster R-CNN," IEEE Transactions on Intelligent Transportation Systems, vol. 25, no. 1, pp. 203–213, 2024.
[20] B. Deguerre, C. Chatelain, and G. Gasso, "End-to-end traffic flow rate estimation from MPEG4 part-2 compressed video streams," IEEE Transactions on Intelligent Transportation Systems, vol. 25, no. 8, pp. 8949–8959, 2024.
[21] Q. Cai, Z. Chen, D. O. Wu, S. Liu, and X. Li, "A novel video coding strategy in HEVC for object detection," IEEE Transactions on Circuits and Systems for Video Technology, vol. 31, no. 12, pp. 4924–4937, 2021.
[22] L. Chen, H. Sun, J. Katto, X. Zeng, and Y. Fan, "Fast object detection in HEVC intra compressed domain," in 2021 29th European Signal Processing Conference (EUSIPCO), Dublin, Ireland, 2021, pp. 756–760.
[23] M. Alizadeh and M. Sharifkhani, "Compressed domain moving object detection based on CRF," IEEE Transactions on Circuits and Systems for Video Technology, vol. 30, no. 3, pp. 674–684, 2020.
[24] J. Feng, S. Li, X. Li, F. Wu, Q. Tian, M. Yang, and H. Ling, "TapLab: A fast framework for semantic video segmentation tapping into compressed-domain knowledge," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 3, pp. 1591–1603, 2022.
[25] R. Yang, H. Liu, S. Zhu, X. Zheng, and B. Zeng, "DFCE: Decoder-friendly chrominance enhancement for HEVC intra coding," IEEE Transactions on Circuits and Systems for Video Technology, vol. 33, no. 3, pp. 1481–1486, 2023.
[26] H. Choi and I. V. Bajic, "HEVC intra features for human detection," in 2017 IEEE Global Conference on Signal and Information Processing (GlobalSIP), Montreal, QC, Canada, 2017, pp. 393–397.
[27] Z. Wang, X. Liu, J. Feng, J. Yang, and H. Xi, "Compressed-domain highway vehicle counting by spatial and temporal regression," IEEE Transactions on Circuits and Systems for Video Technology, vol. 29, no. 1, pp. 263–274, 2019.
[28] T. Wiegand, G. J. Sullivan, G. Bjøntegaard, and A. Luthra, "Overview of the H.264/AVC video coding standard," IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, 2003.
[29] G. J. Sullivan, J.-R. Ohm, W.-J. Han, and T. Wiegand, "Overview of the High Efficiency Video Coding (HEVC) standard," IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 12, 2012.
[30] J. Lainema, F. Bossen, W. J. Han, J. Min, and K. Ugur, "Intra coding of the HEVC standard," IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 12, pp. 1792–1801, 2012.
[31] K. R. Castleman, Digital Image Processing. Prentice Hall Press, 1996.
[32] "Darknet: Open source neural networks in C," https://pjreddie.com/darknet/, accessed: 2023-05-13.
[33] JCT-VC, "HEVC reference software, version HM 16.9," https://hevc.hhi.fraunhofer.de/svn/svn_HEVCSoftware/tags/HM-16.9/, n.d.
[34] M. S. Beratoglu and B. U. Toreyin, "Vehicle license plate detection using only the block partitioning structure of High Efficiency Video Coding (HEVC)," in 27th Signal Processing and Communications Applications Conference (SIU), 2019.
[35] M. S. Beratoğlu and B. U. Töreyin, "Vehicle license plate detector in compressed domain," IEEE Access, vol. 9, pp. 95087–95096, 2021.

Muhammet Sebul Beratoğlu received a B.S. degree in Control and Computer Engineering in 2000 and an M.S. degree in Computer Engineering in 2003 from Istanbul Technical University (ITU), Turkey. He completed his Ph.D. degree in Computer Sciences at ITU's Informatics Institute in 2023. His research interests include signal processing and pattern recognition, with a special focus on smart cities, intelligent transportation systems, and the Internet of Things.

Behçet Uğur Töreyin received the B.S. degree from the Middle East Technical University, Ankara, Turkey, in 2001, and the M.S. and Ph.D. degrees from Bilkent University, Ankara, in 2003 and 2009, respectively, all in electrical and electronics engineering. He is now an Associate Professor with the Informatics Institute at Istanbul Technical University. His research interests broadly lie in signal processing and pattern recognition with applications to computational intelligence. His research is focused on developing novel algorithms to analyze and compress signals from a multitude of sensors, such as visible/infrared/hyperspectral cameras, microphones, passive infrared sensors, vibration sensors, and spectrum sensors for wireless communications.