A Comparison of the Time Interfaces on HP-UX/Itanium

2/20/07 © Copyright 2007, Hewlett Packard Development Company, L.P 1/11


[email protected], HP Worldwide Presales
Overview
Many applications need to determine and manipulate some representation of the time,
and HP-UX offers many time-based interfaces for doing so. These interfaces differ in the
resolution and accuracy of the time they provide, in whether that time is relative to a
known epoch or to the instant at which the system booted, and of course in their
performance. Selecting the most appropriate time interface, particularly in application
performance paths, can significantly affect application performance.

This short paper provides an overview of the various time-based interfaces on HP-UX.
Each interface is described, and an insight offered into how the data provided by the
interface is generated. Relative performance is evaluated, and any significant accuracy
issues are discussed. Finally, some thoughts on efficient application use of time-based
interfaces are offered.

The Interval Counter
Each Itanium processor has an embedded interval counter accessed through register
AR_ITC. During the boot process the interval counter of each CPU is initialized to zero,
so times provided by reading an interval counter are relative to this instant during the
boot process.

On most platforms the interval counter is updated once per hardware cycle, so for
example the interval counter on a CPU running at 1.5GHz will be updated once every
1/1,500,000,000th of a second. This is not the case for Montecito however; its interval
counter increases at one quarter the base chip frequency. Measurements made using the
interval counter therefore are not in common time units, and will vary from platform to
platform.
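
Because the counter's tick rate differs between platforms, a raw reading has to be scaled by the counter frequency before it means anything in ordinary time units. The following is a minimal sketch of that arithmetic only; the function name itc_ticks_to_ns and the choice to pass the counter frequency as a parameter are illustrative assumptions, not part of any HP-UX interface.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert a difference of two interval-counter readings into nanoseconds.
 * itc_hz is the rate at which the counter advances (an assumed input):
 * the full clock rate on most platforms, one quarter of it on Montecito.
 * Whole seconds and the remainder are scaled separately so that the
 * multiplication by 1e9 cannot overflow for counter rates below ~18 GHz.
 */
uint64_t itc_ticks_to_ns(uint64_t ticks, uint64_t itc_hz)
{
    assert(itc_hz != 0);
    uint64_t whole_seconds = ticks / itc_hz;
    uint64_t remainder     = ticks % itc_hz;
    return whole_seconds * UINT64_C(1000000000)
         + remainder * UINT64_C(1000000000) / itc_hz;
}
```

For example, 1,500 ticks of a counter running at 1.5GHz come to 1,000 nanoseconds, whereas on a Montecito part clocked at 1.6GHz, whose counter would advance at 400MHz, the same 1,000 nanoseconds correspond to only 400 ticks.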

Unfortunately in multi-CPU systems the interval counters are not synchronized, so
interval timer values from one CPU cannot accurately be compared with those from
another. This severely limits the usefulness of directly reading the interval counter, since
in most application environments the operating system is free to migrate threads between
CPUs at will, and this may cause measurements to become inaccurate.

Reading the interval counter directly is the cheapest way to access some form of time.
The cost is approximately 36 cycles, and of course there is no memory or kernel impact
of determining the time in this way.

The interval counter is easily accessed through some simple inline assembly:

#include <ia64/sys/inline.h>
#define GET_ITIMER ((uint64_t) (Asm_mov_from_ar(AREG_ITC,_DFLT_FENCE)))

uint64_t now = GET_ITIMER;

gethrtime(): The Corrected Interval Counter
The HP-UX kernel maintains for each CPU a record of the offset of its interval counter
from that of the monarch (boot) CPU. This offset, when added to a value read from the interval
counter, yields a value that is comparable between CPUs. This technique is the basis of
the approach taken by gethrtime().
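
The offset correction can be pictured with a small model. This is a sketch under stated assumptions, not the kernel's code: the itc_offsets table, the fixed NCPUS, and the function corrected_itc are hypothetical names standing in for data that really lives in kernel address space.

```c
#include <assert.h>
#include <stdint.h>

#define NCPUS 4   /* illustrative CPU count, not taken from the paper */

/*
 * Hypothetical stand-in for the kernel's per-CPU table: offset[i] is the
 * amount to add to CPU i's raw counter to line it up with the monarch
 * (CPU 0 in this model, so offset[0] is 0).
 */
typedef struct {
    int64_t offset[NCPUS];
} itc_offsets;

/* Return a counter value that is comparable between CPUs. */
uint64_t corrected_itc(const itc_offsets *tbl, unsigned cpu, uint64_t raw)
{
    assert(tbl != 0 && cpu < NCPUS);
    /* Unsigned addition wraps, so a negative offset subtracts as intended. */
    return raw + (uint64_t)tbl->offset[cpu];
}
```

With offsets of 0, 600 and -250 for CPUs 0, 1 and 2, raw readings of 1000, 400 and 1250 taken at the same instant all correct to 1000.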

gethrtime() faces a couple of challenges in making this technique work. Firstly it needs to
be able to access the offset correction stored in kernel address space, and secondly it
needs to be sure that it knows which CPU it was running on the instant it read the interval
counter so as to be sure to apply the right offset correction. For these reasons gethrtime()
is implemented as a system call; however to minimize the cost it is implemented as a
lightweight – not full – system call.

gethrtime() further converts the cycle-based time derived through the above technique to
nanoseconds.

The cost of gethrtime() is very low at around 130 cycles per call. It does access memory,
but performs no in-kernel locking.

To use gethrtime():

#include <time.h>

hrtime_t now = gethrtime();

In summary, gethrtime() is an excellent interface to use to determine an ultra-high
resolution time. It is accurate across CPUs, and is very cheap. It is ideal in particular for
timing events, even if those events are of short duration. gethrtime() is not directly useful
if a calendar time is required, for example for a transaction timestamp.
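
Timing an event with gethrtime() amounts to taking a reading before and after and subtracting. The sketch below wraps that in two helpers; now_ns, time_event_ns and example_work are illustrative names, the __hpux predefined macro is assumed to identify an HP-UX compiler, and the clock_gettime(CLOCK_MONOTONIC) branch is only a stand-in so the example can be tried on systems without gethrtime().

```c
#if !defined(__hpux)
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime() in the fallback */
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Nanoseconds from an arbitrary fixed origin; only differences are meaningful. */
int64_t now_ns(void)
{
#if defined(__hpux)
    return (int64_t)gethrtime();
#else
    /* Stand-in for systems without gethrtime(). */
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
#endif
}

/* Time one call of fn(arg) and return its duration in nanoseconds. */
int64_t time_event_ns(void (*fn)(void *), void *arg)
{
    int64_t start = now_ns();
    fn(arg);
    return now_ns() - start;
}

/* Example workload: a short busy loop of *arg iterations. */
void example_work(void *arg)
{
    volatile unsigned long sum = 0;
    unsigned long i, n = *(unsigned long *)arg;
    for (i = 0; i < n; i++)
        sum += i;
}
```

A caller would pass its own function in place of example_work; the result is a duration only, and cannot be turned into a calendar time without further information.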

hg_gethrtime(): A Faster Implementation (11.31 only)
Project Mercury, often called “HG”, introduced a chunk of memory shared between
userspace applications and the kernel, through which one could pass useful information
to the other. The interfaces based on this shared memory include:

hg_public_is_running()      is the specified thread executing on a CPU?
hg_public_is_onrunQ()       is the specified thread waiting to execute on a CPU?
hg_public_is_reporting()    is the specified thread still alive?
hg_setcrit()                turn on/off a “don’t preempt me” hint for caller
hg_getspu()                 which CPU is the caller running on?
hg_context_switch_tries()   how many context switches has the caller performed?
hg_gethrcycles()            read adjusted interval counter
hg_nano_to_cycle_ratio()    return the ratio between cycles and nanoseconds
hg_gethrtime()              HG-based gethrtime() implementation

hg_gethrtime() differs from gethrtime() in that it is implemented without even a
lightweight system call. The offset correction of each CPU is shared by the kernel
through the user/kernel shared memory and so can be read directly without a system call.
And using hg_getspu() and hg_context_switch_tries() we can ensure that we know which
CPU we’re on prior to reading the interval counter and that we didn’t switch during the
read.
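
The "check, read, re-check" idea can be sketched as a retry loop. The real Mercury function signatures are not given in this paper, so the code below does not call them: the clock_env structure and its callbacks are hypothetical stand-ins for the roles played by hg_getspu(), hg_context_switch_tries(), the interval counter read and the shared offset table, and the sim_ functions are a toy simulation in which the thread migrates from CPU 0 to CPU 1 exactly once, between the CPU lookup and the counter read.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the pieces named in the text. */
typedef struct {
    unsigned (*current_cpu)(void);    /* role of hg_getspu() */
    uint64_t (*switch_count)(void);   /* role of hg_context_switch_tries() */
    uint64_t (*read_counter)(void);   /* role of reading AR_ITC */
    const int64_t *cpu_offset;        /* role of the shared per-CPU offsets */
} clock_env;

uint64_t read_corrected(const clock_env *env)
{
    assert(env != 0 && env->cpu_offset != 0);
    for (;;) {
        uint64_t before = env->switch_count();
        unsigned cpu    = env->current_cpu();
        uint64_t raw    = env->read_counter();
        /* No context switch in between: cpu and raw belong together. */
        if (env->switch_count() == before)
            return raw + (uint64_t)env->cpu_offset[cpu];
        /* Otherwise the thread may have migrated mid-read: try again. */
    }
}

/* Toy simulation: one migration from CPU 0 to CPU 1 during the first read. */
static unsigned sim_cpu = 0;
static uint64_t sim_switches = 0;
static int      sim_migration_pending = 1;
static const uint64_t sim_raw[2]    = { 1000, 400 };  /* raw counters "now" */
static const int64_t  sim_offset[2] = { 0, 600 };     /* per-CPU corrections */

static unsigned sim_current_cpu(void)  { return sim_cpu; }
static uint64_t sim_switch_count(void) { return sim_switches; }
static uint64_t sim_read_counter(void)
{
    if (sim_migration_pending) {
        sim_migration_pending = 0;
        sim_cpu = 1;
        sim_switches++;
    }
    return sim_raw[sim_cpu];
}

const clock_env sim_env = {
    sim_current_cpu, sim_switch_count, sim_read_counter, sim_offset
};
```

In the simulation the first attempt pairs CPU 0's offset with CPU 1's counter, which would yield 400; the changed switch count exposes this, and the retry returns the correct value of 1000.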

hg_gethrtime() is significantly faster than gethrtime(), at approximately 61 cycles per call.

To use hg_gethrtime():

#include <time.h>
#include <mercury.h>

hrtime_t start = hg_gethrtime();

time()
The time() system call returns the number of seconds since epoch – defined as midnight
1st January 1970 GMT. As such it provides what might be called a “real world” time
suitable for timestamps and other applications where the caller needs to determine a real-
world date and/or time. The value returned from time() can easily be converted to a
meaningful human-readable string using the ctime(3C) family of functions.

time() does its work by copying the seconds field out of the kernel's time structure. The
copy is atomic so doesn’t require a lock. This seconds value is returned to the caller.

If the caller supplies an address as a parameter, this same seconds value is also written to
that address in the caller's address space. This additional step incurs significant overhead,
and can be avoided by supplying a NULL parameter.

Examples of the use of time():

#include <time.h>

time_t t1, t2;
time_t tloc;

t1 = time(NULL);
t2 = time(&tloc);

In the first form above time() takes approximately 660 cycles per call; in the second form
924 cycles per call.
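
As an illustration of turning the value returned by time() into a readable string, here is a small sketch using gmtime_r() and strftime() from the ctime(3C) family. The function name format_epoch_utc and the output layout are illustrative choices, and the feature-test macro on the first line is an assumption that may need adjusting for a given compiler and platform.

```c
#define _POSIX_C_SOURCE 200809L   /* for gmtime_r(); may differ per platform */
#include <assert.h>
#include <stddef.h>
#include <time.h>

/*
 * Format seconds-since-epoch as "YYYY-MM-DD HH:MM:SS" in UTC.
 * Returns the number of characters written, or 0 on failure.
 */
size_t format_epoch_utc(time_t when, char *buf, size_t buflen)
{
    struct tm parts;
    assert(buf != NULL);
    if (gmtime_r(&when, &parts) == NULL)
        return 0;
    return strftime(buf, buflen, "%Y-%m-%d %H:%M:%S", &parts);
}
```

A typical call would be format_epoch_utc(time(NULL), buf, sizeof buf), using the cheaper NULL form of time() described above.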

times()
The times() system call can be used to determine the user and system CPU time consumed
by the caller and its children. However it can also be used to determine the time since
system boot in operating system clock ticks (1/100th of a second). The latter allows
intervals to be measured in units of 1/100th of a second.

times() is implemented as a full system call, albeit a very simple one. It sums the user and
system mode CPU usage for threads in the current process, then sums those of this
process’ children. It copies the results out to the user-supplied buffer, and then returns the
value of the kernel variable ticks_since_boot.

The approximate cost of a call to times() is 1200 cycles.

While summing the resources used by the calling process's threads, times() holds an
exclusive process-level lock called process_lock; the more threads a process has, the
longer this lock is held. If multiple threads in the same process call times()
simultaneously, severe contention for this lock can occur, and this can impact both
process performance and system performance generally. times() is therefore not suitable
for concurrent use in a multithreaded process.

To use times():

#include <sys/times.h>

struct tms dummy;
clock_t now;

now = times(&dummy);

Frankly it would be nice to be able to specify a NULL value for the struct tms parameter
and thereby avoid the expensive copyout of the target and child CPU use data when using
times() purely for timing purposes. Unfortunately this is not possible.
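
To make the interval use of times() concrete, the sketch below takes two readings and converts their difference to seconds. The helper names ticks_to_seconds and time_with_times are illustrative; the tick rate is obtained from sysconf(_SC_CLK_TCK) rather than hard-coded, on the assumption that it reports the 1/100th second tick described above.

```c
#define _POSIX_C_SOURCE 200809L   /* for sysconf(); may differ per platform */
#include <assert.h>
#include <sys/times.h>
#include <unistd.h>

/* Convert a difference of two times() return values into seconds. */
double ticks_to_seconds(clock_t start, clock_t end, long ticks_per_second)
{
    assert(ticks_per_second > 0);
    return (double)(end - start) / (double)ticks_per_second;
}

/* Seconds taken by fn(), measured with times(); resolution is one tick. */
double time_with_times(void (*fn)(void))
{
    struct tms unused;                /* required by times(), ignored here */
    long hz = sysconf(_SC_CLK_TCK);   /* clock ticks per second */
    clock_t start = times(&unused);
    fn();
    clock_t end = times(&unused);
    return ticks_to_seconds(start, end, hz);
}
```

Given the lock contention described above, a helper like this is best kept to single-threaded or infrequent use.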

gettimeofday()
The gettimeofday() system call returns the current time, expressed in seconds and
microseconds, since epoch – midnight January 1st 1970 GMT. The seconds component of
the returned timeval structure can be used as input to any of the ctime(3C) family of time
formatting functions. The seconds and microseconds fields are often used to measure
intervals with high resolution, however we’ll see in a moment that the cost of
gettimeofday() is not insignificant, so other interfaces may be more appropriate for this
purpose.

gettimeofday() is implemented as a full system call, albeit with a minimal amount of
locking. During the processing of each kernel clock tick (1/100th of a second) the kernel
advances the time since epoch, and notes the corrected interval timer value. When a call
is made to gettimeofday() the caller reads the stored time and applies an offset based on
the difference between the stored and current interval timer values. The returned value is
of a high resolution and accurate.

gettimeofday() is not cheap however. It takes approximately 3750 cycles.

To use gettimeofday():

#include <sys/time.h>

struct timeval start, end;
struct timezone tz;

gettimeofday(&end, &tz);
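
When two gettimeofday() readings are used to measure an interval, the seconds and microseconds fields have to be combined before subtracting. A minimal sketch follows; the helper name timeval_diff_us is an illustrative choice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>

/* Microseconds from *start to *end (negative if end precedes start). */
int64_t timeval_diff_us(const struct timeval *start, const struct timeval *end)
{
    assert(start != NULL && end != NULL);
    return ((int64_t)end->tv_sec - (int64_t)start->tv_sec) * 1000000
         + ((int64_t)end->tv_usec - (int64_t)start->tv_usec);
}
```

Working in a single signed 64-bit microsecond count avoids the borrow handling that is otherwise needed when the end reading has the smaller microseconds field.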

clock_gettime()
The clock_gettime() system call can be used to measure passing walltime
(CLOCK_REALTIME) or execution time (CLOCK_PROFILE). The returned time is
expressed in seconds and nanoseconds.

While clock_gettime() appears to provide higher resolution than gettimeofday(),
internally clock_gettime() is implemented through a call to gettimeofday(), with the result
simply copied from the seconds/microseconds format to seconds/nanoseconds. The
performance characteristics of the two are therefore essentially the same, as is the
effective resolution.
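
The format conversion described above can be illustrated as follows. This is a sketch of the arithmetic only, not the HP-UX library source, and the function name timeval_to_timespec is an illustrative choice.

```c
#define _POSIX_C_SOURCE 200809L   /* for struct timespec; may differ per platform */
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>
#include <time.h>

/* Widen seconds/microseconds to seconds/nanoseconds; no precision is gained. */
struct timespec timeval_to_timespec(const struct timeval *tv)
{
    struct timespec ts;
    assert(tv != NULL);
    ts.tv_sec  = tv->tv_sec;
    ts.tv_nsec = (long)tv->tv_usec * 1000;   /* always a multiple of 1000 */
    return ts;
}
```

Because the nanoseconds field is always a multiple of 1000, the last three digits carry no information, which is why the effective resolution remains one microsecond.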

Interface              Precision        Accuracy         OK across CPUs?  Cost (cycles)  Notes
Read interval counter  Hardware cycles  Hardware cycles  No               36
hg_gethrtime()         Nanoseconds      Nanoseconds      Yes              61             HP-UX 11.31 only
gethrtime()            Nanoseconds      Nanoseconds      Yes              130
time()                 Seconds          Seconds          Yes              660/924        Faster if called with NULL parameter
times()                1/100th second   1/100th second   Yes              1200
gettimeofday()         Microseconds     Microseconds     Yes              3750
clock_gettime()        Nanoseconds      Microseconds     Yes              3900

Some Thoughts
None of the time interfaces is particularly expensive – with the one exception of times()
called from multiple threads in the same process concurrently, where the cost of
contention can be considerable. However it is not uncommon for applications to call a
time interface tens or even hundreds of thousands of times a second, and almost anything
called this frequently becomes a performance concern.

Determining Elapsed Times

If the application need is to determine an elapsed time – maybe the duration of an event –
then gethrtime() is likely the most appropriate interface. gethrtime() is supported on HP-
UX 11.11, 11.23 and 11.31. If support for operating system versions prior to 11.31 is not
required, hg_gethrtime() is a slightly better choice.

Generating Timestamps

Another common application need is to generate timestamps; these generally need to be
expressed as time-since-epoch rather than the time-since-boot available from gethrtime().
A couple of approaches can reduce the cost of generating these timestamps:

1. Have a housekeeping thread wake every second to update a “current time” global. If
the time is always formatted using a member of the ctime(3C) family prior to use, we
might store the formatted rather than raw time in the global to avoid having to frequently
re-do the conversion. If the application already has a housekeeping thread that wakes
periodically this may be an easy approach.

2. Call both the cheap time-since-boot interface (e.g. gethrtime()) and the expensive time-
since-epoch interface (e.g. gettimeofday()) once during application startup to establish the
offset between them. Then when the application needs time-since-epoch it calls the cheap
time-since-boot interface and applies the offset calculated at startup.

When using this latter approach, care must be taken to ensure that processes have a
consistent view of the time. If each of many application processes determines the offset
for itself, the processes will invariably end up with different values (if a context switch
occurs at an inopportune moment these differences could be considerable). It is much
better to determine the relationship once, and then share it between all application
processes through some mechanism such as shared memory. Also, of course, if the date or
time is explicitly changed by the system administrator after the application has
determined the offset, times calculated in this way may be invalid.
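
The second approach reduces to a subtraction at startup and an addition on every later call. The sketch below shows only that arithmetic, with both clocks already expressed in nanoseconds; the function names are illustrative, and on HP-UX the two startup inputs would presumably come from gettimeofday() (converted to nanoseconds) and gethrtime().

```c
#include <assert.h>
#include <stdint.h>

/*
 * Offset between the two clocks, both expressed in nanoseconds.
 * Computed once at startup and, ideally, shared between processes.
 */
int64_t epoch_offset_ns(int64_t epoch_now_ns, int64_t boot_now_ns)
{
    return epoch_now_ns - boot_now_ns;
}

/* Turn a later, cheap time-since-boot reading into time-since-epoch. */
int64_t boot_to_epoch_ns(int64_t boot_ns, int64_t offset_ns)
{
    return boot_ns + offset_ns;
}

/* Split nanoseconds-since-epoch into the seconds/microseconds of a timeval. */
void split_epoch_ns(int64_t epoch_ns, int64_t *sec, int64_t *usec)
{
    assert(epoch_ns >= 0 && sec != 0 && usec != 0);
    *sec  = epoch_ns / 1000000000;
    *usec = (epoch_ns % 1000000000) / 1000;
}
```

For instance, if the epoch clock reads 1172000000.25 seconds when the boot-relative clock reads 5 seconds, a later boot-relative reading of 6.5 seconds maps to 1172000001 seconds and 750000 microseconds since epoch.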

Manipulating and Formatting Times

It’s a little-known fact that most of the library functions that manipulate times (see
ctime(3C)) do not scale well in a multithreaded application. Standards dictate that they
respect the current values of the globals timezone, daylight and tzname, as well
as the environment variable TZ. To prevent the value of any of these changing in a
multithreaded process while one of the time conversion functions is in progress, a single
process-wide mutex is held for the duration. A mutex is of course an exclusive lock, so
only one thread at a time can be executing much of the code in the ctime(3C) family of
functions. To date this has not been the cause of customer dissatisfaction; clearly it’s an
easy enough issue for HP to resolve if it becomes an issue.

Disclaimer
The information contained herein is subject to change without notice. The only
warranties for HP products and services are set forth in the express warranty statements
accompanying such products and services. Nothing herein should be construed as
constituting an additional warranty. HP shall not be liable for technical or editorial errors
or omissions contained herein.

© Copyright 2006, Hewlett Packard Development Company, L.P

Itanium is a registered trademark of Intel Corporation or its subsidiaries in the United
States and other countries.

Printed in the US.
