H264 Encoder Short
Systems
Marco D. Santambrogio1,2 , Henry Hoffmann1 , Jonathan Eastep1 , Jason E. Miller1 , Anant Agarwal1
1
Politecnico di Milano
Dipartimento di Elettronica e Informazione
20133 Milano, Italy
[email protected]
I. INTRODUCTION
Resources such as quantities of transistors and memory, the level of integration, and the speed of components have increased dramatically over the years. Even though the technologies have improved, we continue to apply outdated approaches to our use of these resources. Key computer science abstractions have not changed since the 1960s. The operating systems, languages, etc. that we use today were designed for a different era. Therefore, this is the time for a fresh approach to the way systems are designed and used. Self-aware computing research leverages the new balance of resources to improve performance, utilization, reliability and programmability [1, 2].
Within this context, imagine a revolutionary computing system that can observe its own execution and optimize its behavior around a user's or application's needs. Imagine a programming capability by which users can specify their desired goals rather than how to perform a task, along with constraints in terms of an energy budget, a time constraint, or simply a preference for an approximate answer over an exact one. Imagine further a computing chip that performs better, according to a user's preferred goal, the longer it runs an application. Such an architecture will enable, for example, a hand-held radio or a cell phone that runs cooler the longer the connection lasts, or a system that performs reliably and continuously in a range of environments by tolerating hard and transient failures through self-healing. Self-aware computer systems [3] will be able to configure, heal, optimize and protect themselves without the need for human intervention.
A. Application Heartbeats
The Application Heartbeats framework provides a simple, standardized way for applications to report their performance and goals to external observers [4]. As shown in Figure 1, this progress can then be observed by either the application itself or an external system (such as the OS or another application). Having a simple, standardized interface makes it easy for programmers to add Heartbeats to their applications. A standard interface, or API, is also crucial for portability and inter-operability between different applications, runtime systems, and operating systems. Registering an application's goals with external systems enables adaptive systems to make optimization decisions while monitoring the program's performance directly, rather than having to infer that performance from low-level details. If performance is found to be unacceptable, information gleaned from hardware counters can help explain why and what should be changed.
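As a concrete illustration, the report-and-observe pattern above can be sketched in a few lines of Python. This is not the real Heartbeats API (the reference implementation is a C library that communicates through file I/O); the class and method names here are hypothetical stand-ins for the goal-registration, beat, and read-out operations.

```python
import time

class Heartbeat:
    """Sketch of a heartbeat monitor: the application registers its
    performance goals, emits one beat per unit of work, and any observer
    (the app itself, the OS, another application) can read the measured
    heart rate in beats per second over a sliding window."""

    def __init__(self, min_target, max_target, window=10):
        self.min_target = min_target   # minimum acceptable beats/sec
        self.max_target = max_target   # maximum useful beats/sec
        self.window = window           # number of beats kept for the rate
        self.timestamps = []

    def beat(self, now=None):
        """Called by the application once per unit of work."""
        self.timestamps.append(time.time() if now is None else now)
        self.timestamps = self.timestamps[-self.window:]

    def heart_rate(self):
        """Called by an observer: beats per second over the window."""
        if len(self.timestamps) < 2:
            return 0.0
        span = self.timestamps[-1] - self.timestamps[0]
        return (len(self.timestamps) - 1) / span if span > 0 else 0.0

    def meets_goal(self):
        return self.min_target <= self.heart_rate() <= self.max_target

# An application emitting one beat every 10 ms reports ~100 beats/sec:
hb = Heartbeat(min_target=30, max_target=120)
for i in range(10):
    hb.beat(now=i * 0.01)   # simulated timestamps, one per work unit
```

An observer can then poll `hb.heart_rate()` and `hb.meets_goal()` and react when the rate drifts outside the registered window.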
To achieve the vision described in the previous sections, a self-aware system must be able to monitor its behavior and update one or more of its components (hardware architecture, operating system and running applications) in order to achieve its goals. This paper proposes the vision of organic computation that will create such a self-aware computing system. An organic computer is given a goal and a set of resources together with their availability; it then finds the best way to accomplish the goal while satisfying the constraints of interest. An organic computer has four major properties:
• It is goal-oriented in that, given application goals, it takes actions automatically to meet them;
• It is adaptive in that it observes itself, reflects on its behavior to learn, computes the delta between the goal and the observed state, and finally takes actions to optimize its behavior towards the goal;
• It is self-healing in that it constantly monitors for faults and continues to function through them, taking corrective action as needed;
• It is approximate in that it uses the least amount of computation or energy to meet accuracy goals and accomplish a given task.
More importantly, much like biological organisms, an organic computer can go well beyond traditional measures
of goodness like performance and can adapt to different
environments and even improve itself over time.
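The adaptive property described above (observe, compute the delta between the goal and the observed state, then act) can be sketched as a minimal control step. The `adapt` function and the toy performance model below are illustrative assumptions, not part of the system described in the paper.

```python
def adapt(goal, observed, knob, knob_min, knob_max):
    """One step of the observe-reflect-act loop: compute the delta between
    the goal and the observed state, then nudge an abstract resource knob
    (e.g. a core count) in the direction that closes the delta."""
    delta = goal - observed
    if delta > 0:                          # under-performing: add resources
        knob = min(knob + 1, knob_max)
    elif delta < 0:                        # over-performing: reclaim resources
        knob = max(knob - 1, knob_min)
    return knob

# Toy model: observed performance equals the core count; the goal is 4.
cores, history = 1, []
for _ in range(6):
    observed = cores * 1.0                 # pretend perf == core count
    cores = adapt(goal=4, observed=observed, knob=cores,
                  knob_min=1, knob_max=8)
    history.append(cores)
# history climbs to the goal and then holds steady: [2, 3, 4, 4, 4, 4]
```

A real organic computer would replace the toy model with measured state (e.g. a heart rate) and the single knob with the full space of architectural, OS and algorithmic adaptations.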
To adapt what the organic computer is doing, or how it is doing a given task, at run time, it is necessary to develop a control system that observes execution, measures thresholds and compares them to goals, and then adapts the architecture, the operating system or the algorithms as needed. A key challenge is to identify which parts of a computer need to be adapted and to quantify the degree to which adaptation can afford savings in the metrics of interest. Examples of mechanisms that can be adapted include recent research on
Fig. 1: (a) Self-optimizing application using the Application Heartbeats framework. (b) Optimization of machine
parameters by an external observer.
The Application Heartbeats framework measures application progress toward goals using a simple and well-
Fig. 2: Smartlocks architecture. The ML engine tunes the Smartlock internally to maximize the monitor's reward signal. Tuning adapts the lock acquisition scheduling policy by configuring a priority lock and per-thread priority settings.
by adjusting the Smartlock's lock acquisition scheduling policy. The scheduler is implemented as a priority lock, and the scheduling policy is configured by dynamically manipulating per-thread priority settings.
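The priority-lock mechanism can be sketched as follows. This Python model only illustrates the scheduling policy (grant the lock to the highest-priority waiter, with priorities rewritable at run time); the actual Smartlocks synchronization is implemented in native code, and the class and thread names here are hypothetical.

```python
class PriorityLock:
    """Sketch of a priority lock: among the threads currently waiting,
    the lock is granted to the one with the highest priority. An external
    tuner can rewrite the priorities at any time, which changes the
    scheduling policy on the fly."""

    def __init__(self, priorities):
        self.priorities = dict(priorities)   # thread id -> priority

    def next_holder(self, waiters):
        """Choose which waiting thread acquires the lock next."""
        return max(waiters, key=lambda t: self.priorities[t])

# Initial policy: the main thread is favored over the workers.
lock = PriorityLock({"main": 10, "w0": 1, "w1": 1})
# A tuner can later promote a worker without touching the lock itself:
lock.priorities["w1"] = 5
```

With the updated priorities, `lock.next_holder(["w0", "w1"])` now selects `"w1"`; this is exactly the knob the ML engine in Fig. 2 turns.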
IV. PRELIMINARY RESULTS
This section presents several examples illustrating the use of the Heartbeats framework and Smartlocks. First, a brief study is presented using Heartbeats to instrument the PARSEC benchmark suite [37]; then the benefits of using Smartlocks are demonstrated on a synthetic benchmark.
A. Heartbeats in the PARSEC Benchmark Suite
We present several results demonstrating the simplicity and efficacy of the Heartbeat API. These results all make use of our reference implementation of the API, which uses file I/O for communication. Results were collected on an Intel x86 server with dual 3.16 GHz Xeon X5460 quad-core processors.
To demonstrate the broad applicability of the Heartbeats framework across a range of applications, we apply it to the PARSEC benchmark suite (v. 1.0). For each benchmark, we find the outer-most loop used to process inputs and insert a call to register a heartbeat in that loop. In some cases, the application is structured so that multiple inputs are consumed during one iteration of the loop. Table I shows both how the heartbeats relate to the input processed by each benchmark and the average heart rate (measured in beats per second) achieved running the native input data set1 . The Heartbeat interface is found to be easy to insert into an application, as it requires adding less than half a dozen lines of code per benchmark, and only requires identifying the loop that consumes input data. In addition, the interface is low-overhead, resulting in immeasurable overhead for 9
1 freqmine and vips are not included, as the unmodified benchmarks did not compile on the target system with our installed version of gcc.
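The instrumentation step described above can be sketched schematically. The `heartbeat` stub and the `BEAT_EVERY` constant below are hypothetical stand-ins for the real API call and for the per-benchmark beat granularity listed in Table I.

```python
def process(item):
    """Stand-in for the benchmark's real per-input work."""
    return item * 2

beats = 0
def heartbeat():
    """Stand-in for the Heartbeats API call registering one beat."""
    global beats
    beats += 1

BEAT_EVERY = 5                  # e.g. "every 5 inputs", chosen per benchmark
inputs = range(23)              # stand-in for the benchmark's input stream
for i, item in enumerate(inputs):
    process(item)
    if (i + 1) % BEAT_EVERY == 0:
        heartbeat()             # the only instrumentation added to the loop
```

This mirrors the paper's claim: the change is a handful of lines confined to the loop that consumes input data.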
[Table I residue: the "Heartbeat Location" column per benchmark reads: every 25000 options; every frame; every 1875 moves; every chunk; every frame; every query; every frame; every 200000 points; every swaption; every frame.]
[Fig. 3 (plot residue): heart rate over Time (Heartbeat); series: Heartrate, Target Min, Target Max, Cores.]
studies using heartbeats to develop adaptive video encoders. [39] describes the SpeedGuard run-time system, which can be automatically inserted into applications by the SpeedPress compiler. SpeedGuard uses heartbeats to monitor application performance and trade quality-of-service for performance in the presence of faults such as core failures or clock-frequency changes.
[Fig. 4 (plot residue): performance over Time (seconds), 0.0 to 4.0 s, across workload phases Workload #1 / Workload #2 / Workload #1; series: Optimal, Smartlock, Priority lock: policy 1, Priority lock: policy 2.]
Fig. 4: Performance results for the thermal throttling experiment. Smartlocks adapt to different workloads; no single static policy is optimal for all of the different conditions.
core speeds.2 No thread migration is assumed. Instead,
the virtual performance of each thread is adjusted by
adjusting heartbeats. The main thread always runs at 3.16
GHz. At any given time, 1 worker runs at 3.16 GHz and
the others run at 2.11 GHz. The thermal throttling events
change which worker is running at 3.16 GHz. The first
event occurs at time 1.4s. The second occurs at time 2.7s
and reverses the first event.
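Schematically, the adaptation this experiment exercises can be sketched as follows. The real Smartlock discovers its policy online via a machine-learning engine driven by the heart-rate reward [5]; this toy `retune` function simply encodes the outcome such tuning should converge to, namely giving lock priority to the worker on the fast core.

```python
def retune(speeds):
    """Sketch of one tuning step in the throttling experiment: the heart
    rate is maximized by granting the highest lock priority to the worker
    currently running at the highest clock speed (a hand-coded stand-in
    for what the ML engine learns online from the reward signal)."""
    fastest = max(speeds, key=speeds.get)
    return {w: (2 if w == fastest else 1) for w in speeds}

# Before the first throttling event, worker 0 holds the 3.16 GHz core:
prio = retune({"w0": 3.16, "w1": 2.11, "w2": 2.11})
# After the event at t = 1.4 s, the fast core moves to worker 1, and the
# priorities should follow it:
prio2 = retune({"w0": 2.11, "w1": 3.16, "w2": 2.11})
```

A static policy corresponds to fixing the priority map once; the experiment shows why that is suboptimal when the throttling events move the fast core.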
Figure 4 shows several things. First, it shows the performance of the Smartlock against existing reactive lock techniques. Smartlock performance is the curve labeled "Smartlock" and reactive lock performance is labeled "Spin-lock: reactive lock". The performance of any reactive lock implementation is upper-bounded by its best-performing internal algorithm at any given time. The best algorithm for this experiment is the write-biased readers-writer lock, so the reactive lock is implemented as that.3 The graph also compares Smartlock against a baseline test-and-set spin-lock, labeled "Spin-lock: test and set", for reference. The number of cycles required to perform each unit of work has been chosen so that the difference in acquire and release overheads between lock algorithms is not distracting but lock contention is high; what is important is the policy intrinsic to the lock algorithm (and the adaptivity of the policy in the case of the Smartlock). As the figure shows, Smartlock outperforms the reactive lock and the baseline, implying that reactive locks are sub-optimal for this and similar benchmark scenarios.
The second thing that Figure 4 shows is the gap between reactive lock performance and optimal performance. One lock algorithm / policy that can outperform standard techniques is the priority lock with prioritized access. The graph compares reactive locks against two priority locks with hand-coded priority settings (the curves labeled "Priority lock: policy 1" and "Priority lock: policy 2").
ACKNOWLEDGMENT
We'd like to thank all the many people we worked with over the last two years for all the useful discussions and for their thoughtful ideas. Special thanks to Jason E. Miller and all the members of the CARBON research groups, and to the Rocca Foundation for its support.
REFERENCES
[1] Jeffrey O. Kephart and David M. Chess. The vision of autonomic computing. Computer, 36(1):41–50, 2003.
[2] Mazeiar Salehie and Ladan Tahvildari. Self-adaptive software: Landscape and research challenges. ACM Trans. Auton. Adapt. Syst., 4(2):1–42, 2009.
[3] P. Dini. Internet, GRID, self-adaptability and beyond: Are we ready? Aug 2004.
[4] Henry Hoffmann, Jonathan Eastep, Marco D. Santambrogio, Jason E. Miller, and Anant Agarwal. Application heartbeats for software performance and health. In PPoPP, pages 347–348, 2010.
[5] Jonathan Eastep, David Wingate, Marco D. Santambrogio, and Anant Agarwal. Smartlocks: Self-aware synchronization through lock acquisition scheduling. SMART 2010: Workshop on Statistical and Machine learning approaches to ARchitectures and compilaTion, 2010. Online document, http://ctuning.org/dissemination/smart10-05.pdf.
[6] B. Sprunt. Pentium 4 performance-monitoring features. IEEE Micro, 22(4):72–82, Jul/Aug 2002.
[7] R. Kumar, K. Farkas, N. P. Jouppi, P. Ranganathan, and D. M. Tullsen. Processor power reduction via single-ISA heterogeneous multi-core architectures. Computer Architecture Letters, 2(1):2, January–December 2003.
[8] Intel Inc. Intel Itanium architecture software developer's manual, 2006.
[9] R. Azimi, M. Stumm, and R. W. Wisniewski. Online performance analysis by statistical sampling of microprocessor performance counters. In ICS '05: Proceedings of the 19th Inter. Conf. on Supercomputing, pages 101–110, 2005.
[10] Rich Wolski, Neil T. Spring, and Jim Hayes. The network weather service: a distributed resource performance forecasting service for metacomputing. Future Generation Computer Systems, 15(5–6):757–768, 1999.
[11] HP Labs. HP Open View self-healing services: Overview and technical introduction.
[12] David Breitgand, Maayan Goldstein, Ealan Henis, Onn Shehory, and Yaron Weinsberg. Panacea: towards a self-healing development framework. In Integrated Network Management, pages 169–178. IEEE, 2007.
[13] C. M. Garcia-Arellano, S. Lightstone, G. Lohman, V. Markl, and A. Storm. A self-managing relational database server: Examples from IBM's DB2 Universal Database for Linux, Unix and Windows. IEEE Transactions on Systems, Man and Cybernetics, 36(3):365–376, 2006.
[14] Andres Quiroz, Nathan Gnanasambandam, Manish Parashar, and Naveen Sharma. Robust clustering analysis for the management of self-monitoring distributed systems. Cluster Computing, 12(1):73–85, 2009.
[15] Salim Hariri, Guangzhi Qu, R. Modukuri, Huoping Chen, and Mazin S. Yousif. Quality-of-protection (QoP): an online monitoring and self-protection mechanism. IEEE Journal on Selected Areas in Communications, 23(10):1983–1993, 2005.
[16] Onn Shehory. Shadows: Self-healing complex software systems. In ASE Workshops, pages 71–76, 2008.
[17] S. S. Vadhiyar and J. J. Dongarra. Self adaptivity in grid computing. Concurr. Comput.: Pract. Exper., 17(2–4):235–257, 2005.
[18] J. Buisson, F. Andre, and J. L. Pazat. Dynamic adaptation for grid computing. Lecture Notes in Computer Science, Advances in Grid Computing (EGC), pages 538–547, 2005.
[19] P. Reinecke and K. Wolter. Adaptivity metric and performance for restart strategies in web services reliable messaging. In WOSP '08: Proceedings of the 7th International Workshop on Software and Performance, pages 201–212. ACM, 2008.
[20] John Strassner, Sung-Su Kim, and James Won-Ki Hong. The design of an autonomic communication element to manage future internet services. In Choong Seon Hong, Toshio Tonouchi, Yan Ma, and Chi-Shih Chao, editors, APNOMS, volume 5787 of Lecture Notes in Computer Science, pages 122–132. Springer, 2009.
[21] Armando Fox, Emre Kiciman, and David Patterson. Combining statistical monitoring and predictable recovery for self-management. In WOSS '04: Proceedings of the 1st ACM SIGSOFT workshop