AVS VIDEO DECODER ON MULTICORE SYSTEMS: OPTIMIZATIONS AND TRADEOFFS

Konstantinos Krommydas¹, Christos D. Antonopoulos², Nikolaos Bellas², Wu-chun Feng¹

¹Department of Computer Science, Virginia Tech, USA

{kokrommy, wfeng}@vt.edu
²Department of Computer and Communications Engineering, University of Thessaly, Greece
{cda, nbellas}@uth.gr

ABSTRACT

Newer video compression standards provide higher video quality and greater compression efficiency than their predecessors. Their increased complexity can be offset by leveraging all levels of available parallelism, task- and data-level, on off-the-shelf hardware such as current-generation chip multiprocessors. As we move to more cores, though, scalability issues arise and need to be tackled in order to take advantage of the abundant computational power.

In this paper we evaluate a previously implemented parallel version of the AVS video decoder on the experimental 32-core Intel Manycore Testing Lab. We examine this previous version's performance bottlenecks and scalability issues and introduce a distributed queue implementation as the proposed solution. Finally, we provide insight on separate optimizations regarding inter macroblocks and investigate performance variations and tradeoffs when these are combined with a distributed queue scheme.

Index Terms: AVS codec, task queue, video decoding

1. INTRODUCTION

Advances in video compression techniques and display technology have facilitated high definition video (resolutions up to 1920x1080 pixels) and the first generation of three-dimensional television. Meanwhile, Quad Full High Definition is making its first steps, and motion picture and television engineers are paving the way for Ultra High Definition TV, which will offer unprecedented picture clarity of 7680x4320 pixels.

The prevalent video standard nowadays, namely H.264/AVC, is extensively used for high-definition video coding. One video codec less known in the Western world is the Chinese Audio Video Standard (AVS), drafted by the AVS Workgroup [1, 3]. The AVS Workgroup was established by the Chinese Ministry of National Information Industry, and AVS is a national standard. AVS can deliver coding efficiency similar to H.264/AVC, and more than two times the coding efficiency of MPEG-2.

These standards can efficiently handle today's typical resolutions, and their implementations can provide the desired frame rate, dictated by human vision's real-time requirement of about 30 frames per second. The trends toward even higher definitions, however, indicate that the already heavy workload will become heavier, as will the technical complexity of future video encoders and decoders.

Programmers have new tools, hardware and even new computing paradigms in their efforts to overcome such problems. Unfortunately, applying solutions tailored to a small number of cores to a larger number of cores introduces a series of issues. These can be related to the scalability of a particular algorithm itself or can pertain to side effects on the part of the hardware (e.g. cache-related issues).

This paper builds on our previous work [2], where we found that the hyper-threading feature of the Intel Core i7 multiprocessor does not yield further performance gains, mainly because of contention for the cores' shared resources. Of great importance is the lock-free queue used, whose contention limits any performance gains. As a solution, we propose a distributed queue scheme.

In Section 2, we provide background on the AVS standard and briefly present its base and previous parallel implementation. In Section 3, we present related literature, motivate our work and discuss our contribution. Section 4 introduces our evaluation platform. Section 5 describes our distributed queue approach and the inter macroblock (MB) optimization tradeoffs, and provides results. Section 6 concludes the paper with some thoughts and future work.

2. AVS DECODER BACKGROUND

2.1. Base implementation

The AVS standard follows MPEG-2's basic structure and incorporates similar tools. The decoding process (Fig. 1) entails the entropy decoding stage, intra prediction, the motion compensation (MC) procedure for inter prediction, inverse transform, inverse quantization, as well as a smart
deblocking filter. It sacrifices some of the video quality and coding efficiency in favor of reducing the extra complexity that comes with smaller block sizes [3]. In AVS, intra predictions can be derived from the neighboring pixels in the top-left, top, top-right and left MBs. A similar dependency set exists for the deblocking filter. For more details on the AVS standard, the reader can refer to [2, 3].

Fig. 1. AVS decoder block diagram

2.2. Parallel implementation

The AVS reference decoder performs decoding in raster scan order, where MBs are processed from top to bottom and from left to right. Any parallelization effort has to take into account the dependencies related to intra prediction and deblocking. Inter encoded MBs, namely those that do not depend on MBs of the same video frame, can start decoding as soon as the reference MBs (in previously decoded frames) have been decoded. The latter is ensured by the way frames are decoded (out-of-order decoding, in the IPBB form). Experiments show that even in high-bitrate encoded videos, inter MBs are abundant and provide an excellent source of available parallelism [2].

Available work, i.e. MBs that can start the decoding (MC) process, is dynamically put into a single shared queue. The worker thread pool takes work from that queue and, when possible, bypasses the queue using a tail submit scheme [4] (dependence-driven self-scheduling scheme, Fig. 2). The latter optimization is beneficial for performance in AVS, as it is in H.264. When a thread finds an available task in the queue, it takes and executes it, and updates the dependency counts of the neighboring MBs. The first MB whose dependency count it brings to zero, it takes for decoding without putting it in the queue. The others (if any exist) it puts in the queue for another thread to take.

In [2] we chose not to assign a single thread exclusively to bitstream parsing (BP) and variable length decoding (VLD). In an effort to present a pragmatic, practical approach, we did not just decouple BP and VLD from the rest of the decoding and measure pure decoding time. Instead, the first thread to enter the BP/VLD critical section proceeds with BP/VLD and starts putting eligible MBs in the queue. This way, the other threads, which would otherwise be blocked, do actual work, which is abundant, especially for inter-decoded frames. Readers interested in the full set of optimizations (sequential code optimizations, vectorization) can refer to our original paper [2].

3. RELATED WORK

Most research has focused on H.264. Since AVS and H.264 are based on the same basic principles, much of the work and conclusions for H.264 are applicable to AVS (and vice versa). First, we list some of the H.264 literature that relates or applies to our work, and then we examine research on the AVS standard in particular.

Earlier works by Van der Tol et al. [5] and Chen et al. [6] have investigated different levels of parallelization for H.264, albeit with limiting assumptions (i.e. static MB-level scheduling, limited frame-level parallelism). In [7], Mesa et al. extend the above works on the parallel scalability of the H.264 decoder. They find that task-level parallelism does not scale well, in contrast to data-level parallelism methods. Distribution of the computation at the MB level proves to be the best solution in terms of scalability and load balancing. In [4], they present findings on a cache-coherent NUMA multiprocessor and comment on the limitations of the single shared queue. They conclude that a work stealing technique or a tail submit method could alleviate these limitations.

Concerning AVS, not much has been done in terms of optimization for multi-core systems. Instead, there has been considerable research on VLSI design of specific kernels, such as Motion Compensation (MC) [8] and the Inverse Quantization kernel [9]. Optimization efforts have also been made for the heterogeneous Tensilica SIMD processor, as well as embedded System on Chip designs [10]. However, with multi-core processors and very high definition videos becoming the norm, a detailed study of scalable techniques for mapping the AVS standard to such multiprocessors is necessary.

To the best of our knowledge, the only all-around optimization strategy for the AVS decoder on a commodity multiprocessor system is [2]. Our paper tries to contribute to the limited literature for AVS, extending the aforementioned work by investigating the applicability of already known techniques (from the H.264 literature) and by trying to apply new ones. We extend the work of [4] by implementing a distributed task queue, measuring its performance, and comparing it to the second proposed method in [7] (tail submit). Finally, we investigate some new optimizations regarding inter macroblocks and their applicability to multi-cores with different numbers of cores.

4. EVALUATION PLATFORM

For our experiments, we used the Intel® Manycore Testing Lab (MTL) [12]. MTL consists of a four-socket system of Intel Xeon X7560 processors, totaling 32 cores, each running at 2.26GHz. Each of the four multiprocessors features a large
24MB last-level cache. Total system memory amounts to 64GB. Intel MTL provides the Intel® Compiler, the VTune profiler and other useful tools for code inspection and optimization.

The executables were all compiled with version 11.1 of the Intel® C/C++ Compiler, with the same set of optimization options (for fairness). The Linux kernel version running on our test machine was 2.6.18.

Throughout the paper, results refer to the "Rush hour" [13] encoded video file, which is indicative of the average case for AVS (according to [2]), at Full HD (1920x1080 pixels), at 20Mbps. It contains a typical amount of inter encoded MBs (in P/B frames) in order to showcase some of the optimization techniques. The benchmark video follows the encoding pattern of and YUV 4:2:0 format.

5. OPTIMIZATIONS

5.1. Distributed queues

Mesa et al. [7] conclude that a single task queue scheme is one contention point that prevents video decoders (H.264 and AVS alike) from scaling well at large numbers of cores. The more threads probe the queue, the more the work distribution gets serialized. In this paper, we propose a distributed scheme of multiple task queues, along with a work stealing technique.

In particular, we extend the single lock-free queue of [2] by assigning a separate queue to each worker thread. The thread that performs VLD is responsible for assigning the inter MBs (i.e. MBs with zero intra dependencies) to these queues in a round-robin fashion. In inter frames, where most of the MBs are inter-decoded, this leads to good load balancing. Although decoding time per MB may vary, the work stealing technique takes care of maintaining a good balance. Worker threads that have work available in their queues continue with actual decoding. When they have no work, they resort to work stealing, probing the other queues in a linear fashion. The update_dependencies function employs the tail submission technique (queue bypassing) for the first dependency-free MB, and puts the rest of the MBs it finds with zero dependencies in the respective queue. We chose this simple scheme for the update_dependencies function, since the load imbalance it may introduce is negligible compared to the data locality gains.

Since we still use a single dependency table (Fig. 2) and other shared data structures (such as queue head pointers), we have to be extra careful about the false sharing effect [11]. Neighboring data in the same cache line may get invalidated needlessly, leading to a high number of off-chip memory transfers. This effect is even worse in architectures with larger cache lines. Appropriate techniques, such as padding and proper alignment along cache lines, were applied to minimize such negative effects.

5.2. Inter MB decoding optimizations

In [2] we made use of a feature of inter MBs: we identify the type of each MB during the VLD phase and enqueue it if it is of inter type (P or B). This optimization (we call it P/B optimization) would intuitively allow the worker threads to immediately start decoding such enqueued MBs. However, for typical encoding bitrates and videos without special characteristics (e.g. explosions, irregular patterns, sudden movement), inter frames (P/B) consist mainly of inter MBs. This effectively leads to most MBs being enqueued during the VLD phase, and thus limits the utilization of the tail submit feature. This, in turn, leads to higher contention if a single queue is used and, accordingly, to performance deterioration.

On the other hand, when we want to make use of the distributed queues scheme, this optimization makes sense, in that it fills all the queues 'on the fly' (during VLD), and the limited use of tail submit is counterbalanced by the large number of threads working concurrently on their private queues. While for a small number of threads the tail submit technique (without the P/B optimization) is more efficient, the new scheme (distributed queues) overtakes it as more threads are added, and we expect it to scale well beyond the 32 cores available in our experimental platform.

A different approach for the single queue case would be not to enqueue inter MBs on the fly, but to zero the number of their dependencies in the dependence table (we name this technique P/B zero). Unfortunately, this is practically equal to using the first P/B optimization. The only difference is that tail submit is used for one in four MBs (on average), whereas in the original P/B optimization tail submit was utilized even less (in inter frames). A combination of the above technique with the distributed queues scheme might be a good compromise for a medium number of cores, but we would need a more efficient technique for the update_dependencies procedure.

5.3. Results

We present results for the above optimization combinations in Table 1, in frames per second (VLD is subtracted from the measurement; we focus on 'pure' MB decoding). Due to space restrictions, we showcase only the 8-, 16- and 32-core runs.
Fig. 2. Parallelization technique schematic.
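The scheduling scheme described in Sections 2.2 and 5.1 (a dependency table, tail submit inside update_dependencies, per-worker queues and linear work stealing) can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the decoder's actual code: it is written in Python, runs the workers in single-threaded lockstep instead of using threads and lock-free queues, models an all-intra frame in which every MB waits on its left, top-left, top and top-right neighbours, and interprets "the respective queue" as the releasing worker's own queue. All names other than update_dependencies (dependants, initial_dependencies, decode_frame) are hypothetical.

```python
from collections import deque


def dependants(x, y, width, height):
    """MBs that wait on MB (x, y).

    An MB reads its left, top-left, top and top-right neighbours, so (x, y)
    is needed by its right, bottom-right, bottom and bottom-left neighbours.
    """
    candidates = [(x + 1, y), (x + 1, y + 1), (x, y + 1), (x - 1, y + 1)]
    return [(cx, cy) for cx, cy in candidates
            if 0 <= cx < width and 0 <= cy < height]


def initial_dependencies(width, height):
    """Dependency table: number of in-frame neighbours each MB waits on."""
    table = {}
    for y in range(height):
        for x in range(width):
            parents = [(x - 1, y), (x - 1, y - 1), (x, y - 1), (x + 1, y - 1)]
            table[(x, y)] = sum(1 for px, py in parents
                                if 0 <= px < width and 0 <= py < height)
    return table


def update_dependencies(mb, table, width, height, own_queue):
    """Release the dependants of a decoded MB.

    The first MB whose count reaches zero is returned for tail submit
    (queue bypass); any further ready MBs go to the caller's own queue
    (an assumption about what "the respective queue" means).
    """
    tail = None
    for child in dependants(mb[0], mb[1], width, height):
        table[child] -= 1
        if table[child] == 0:
            if tail is None:
                tail = child
            else:
                own_queue.append(child)
    return tail


def decode_frame(width, height, n_workers):
    """Simulate one all-intra frame; returns (decode order, bypassed, stolen)."""
    table = initial_dependencies(width, height)
    queues = [deque() for _ in range(n_workers)]
    # Round-robin distribution of the MBs that are ready from the start.
    ready = [mb for mb, count in table.items() if count == 0]
    for i, mb in enumerate(ready):
        queues[i % n_workers].append(mb)

    held = [None] * n_workers          # MB each worker kept via tail submit
    order, bypassed, stolen = [], 0, 0
    while len(order) < width * height:
        for w in range(n_workers):
            mb, held[w] = held[w], None
            if mb is None and queues[w]:
                mb = queues[w].popleft()
            if mb is None:
                # Work stealing: probe the other queues in linear order.
                for k in range(1, n_workers):
                    victim = queues[(w + k) % n_workers]
                    if victim:
                        mb = victim.popleft()
                        stolen += 1
                        break
            if mb is None:
                continue               # nothing to do in this round
            order.append(mb)           # stands in for the actual MB decoding
            held[w] = update_dependencies(mb, table, width, height, queues[w])
            if held[w] is not None:
                bypassed += 1
    return order, bypassed, stolen
```

Calling, for example, `decode_frame(8, 4, n_workers=4)` returns the order in which the 32 MBs were decoded together with the number of MBs that bypassed a queue and the number that were stolen. The counts come from the lockstep simulation and say nothing about real timing or contention; the sketch only shows how tail submit keeps a dependent MB on the worker that released it, while work stealing keeps otherwise idle workers busy.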
We observe that for 8 and 16 cores, the distributed queues scheme performs worse than the two single queue ones. This is mainly due to the extra overhead related to the logistics of the queues and to worse data locality. A single queue still scales well up to that number of cores. Moreover, P/B zero outperforms P/B opt., as it takes better advantage of the tail submission technique in inter frames. In P/B opt., inter MBs enter the single queue as soon as they have gone through VLD. This limits the number of MBs bypassing the queue. P/B zero, in contrast, zeroes inter MBs' dependencies during VLD, but leaves the enqueuing process to the update_dependencies procedure. This way, one of the zeroed MBs bypasses the queue, and the rest are enqueued. This is confirmed by our results (frame rate and the measured number of MBs that bypass the queue).

When it comes to more threads (note that we use one thread per physical core), we can see that the distributed queue scheme starts performing better than the single queue techniques. While the first two techniques show a decline in decoding frame rate from 16 to 32 cores, the distributed queues scheme demonstrates a steady frame-rate increase.

Table 1. Results (decoding frame rate in fps)
# cores   P/B opt.   P/B zero   Distr. Queues
8         69         80         59
16        106        115        88
32        104        106.5      112

6. CONCLUSIONS / FUTURE WORK

Video decoders, like many other applications, need to be tackled from a different perspective as the number of cores of future multiprocessors grows. New problems arise, and new programming paradigms may eventually have to be employed to continue accruing performance gains.

Effective and more complex queue schemes with architecture-aware work stealing have to be used in order to avoid contention and make the best use of available resources. In our case, we conclude that a distributed queue scheme pays off only beyond a certain (large) number of cores. Small multi-core systems will still perform reasonably well with single task queues, combined with techniques, such as those presented, that exploit the fact that inter MBs do not depend on MBs of the same frame. Additionally, sequential video decoder parts (mainly Variable Length Decoding) constitute a serious bottleneck and need to be optimized to the fullest to reduce the implications of Amdahl's Law on parallelism gains.

At the same time, GPU architectures (e.g. Nvidia CUDA) are becoming prevalent in the area of scientific computation, and more computationally powerful many-core GPUs are becoming commercially available. Future work revolves around how video decoding algorithms (both as independent kernels and as a whole) could harness the power of current GPUs in an efficient CPU-GPU co-scheduling scheme.

ACKNOWLEDGMENTS

This work was supported primarily by the Institute for Critical Technology and Applied Science (ICTAS). We would like to thank the management, staff, and facilities of the Intel® Manycore Testing Lab [12].

7. REFERENCES

[1] AVS Workgroup, http://www.avs.org.cn/en/.
[2] K. Krommydas, et al., "Mapping and optimization of the AVS video decoder on a high performance chip multiprocessor," in Multimedia and Expo (ICME), 2010 IEEE International Conference on, 2010, pp. 896-901.
[3] G. Wen, "AVS standard - Audio Video Coding Standard Workgroup of China," Wireless and Optical Communications, 2005. 14th Annual WOCC 2005. International Conference on, 2005.
[4] Mauricio Alvarez, et al., "Performance Evaluation of Macroblock-level Parallelization of H.264 Decoding on a CC-NUMA Multiprocessor Architecture," 4CCC: 4th Colombian Computing
[5] E. van der Tol, Jaspers, E., Gelderblom, R., "Mapping of H.264 Decoding on a Multiprocessor Architecture," in Proc. SPIE Conf. on Image and Video Communications and Processing, 2003.
[6] Y. Chen, Li, E., Zhou, X., Ge, S., "Implementation of H.264 Encoder and Decoder on Personal Computers," Journal of Visual Communications and Image Representation, vol. 17, 2006.
[7] M. A. Mesa, et al., "Scalability of Macroblock-level Parallelism for H.264 Decoding," in Parallel and Distributed Systems (ICPADS), 2009 15th International Conference on, 2009, pp. 236-243.
[8] Z. Dajiang and L. Peilin, "A Hardware-Efficient Dual-Standard VLSI Architecture for MC Interpolation in AVS and H.264," in Circuits and Systems, 2007. ISCAS 2007. IEEE International Symposium on, 2007, pp. 2910-2913.
[9] S. Bin, et al., "An implemented VLSI architecture of inverse quantizer for AVS HDTV video decoder," in ASIC, 2005. ASICON 2005. 6th International Conference On, 2005, pp. 244-247.
[10] J. Xin, et al., "AVS video standard implementation for SoC design," in Neural Networks and Signal Processing, 2008 International Conference on, 2008, pp. 660-665.
[11] J. Torrellas, et al., "False sharing and spatial locality in multiprocessor caches," Computers, IEEE Transactions on, vol. 43, pp. 651-663, 1994.
[12] Home: www.intel.com/software/manycoretestinglab
     Intel® Software Network: www.intel.com/software
[13] Raw benchmark videos: ftp://ftp.ldv.e-technik.tu-muenchen.de/pub/test_sequences/1080p/
