A Comparison of The Time Interfaces On HP-UX/Itanium
This short paper provides an overview of the various time-based interfaces on HP-UX.
Each interface is described, and an insight offered into how the data provided by the
interface is generated. Relative performance is evaluated, and any significant accuracy
issues are discussed. Finally, some thoughts on efficient application use of time-based
interfaces are offered.
On most platforms the interval counter is updated once per hardware cycle, so for
example the interval counter on a CPU running at 1.5GHz will be updated once every
1/1,500,000,000th of a second. This is not the case on Montecito, however: its interval
counter increases at one quarter of the base chip frequency. Measurements made with the
interval counter are therefore expressed in counter ticks rather than common time units,
and the length of a tick varies from platform to platform.
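Because the tick rate differs between platforms, a raw counter delta has to be scaled by the counter frequency before it means anything in seconds. The sketch below assumes the caller already knows that frequency (how to obtain it on a given system is not covered here), and the helper names are hypothetical. For Montecito the counter frequency is taken as one quarter of the base chip frequency, as described above.

```c
#include <stdint.h>

/* Hypothetical helper: on Montecito the interval counter runs at one
   quarter of the base chip frequency. */
uint64_t montecito_itc_hz(uint64_t cpu_hz)
{
    return cpu_hz / 4;
}

/* Convert a counter delta to nanoseconds, given the counter frequency in Hz.
   Whole seconds and the remainder are scaled separately so that
   ticks * 1e9 cannot overflow 64 bits (valid for frequencies below ~18 GHz). */
uint64_t itc_ticks_to_ns(uint64_t ticks, uint64_t itc_hz)
{
    uint64_t secs = ticks / itc_hz;
    uint64_t rem  = ticks % itc_hz;
    return secs * 1000000000ULL + rem * 1000000000ULL / itc_hz;
}
```

For example, on a 1.6GHz Montecito a delta of 600,000,000 ticks corresponds to 1.5 seconds, not the 0.375 seconds a once-per-cycle assumption would suggest.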
Reading the interval counter directly is the cheapest way to access some form of time.
The cost is approximately 36 cycles, and determining the time this way involves no memory
access and no entry into the kernel.
The interval counter is easily accessed through some simple inline assembly:
#include <inttypes.h>          /* uint64_t */
#include <ia64/sys/inline.h>   /* HP aC++ inline-assembly intrinsics */
#define GET_ITIMER ((uint64_t) (_Asm_mov_from_ar(_AREG_ITC, _DFLT_FENCE)))
gethrtime() faces a couple of challenges in making this technique work. Firstly, it needs to
access the offset correction stored in kernel address space; secondly, it needs to know
which CPU it was running on at the instant it read the interval counter, so that it applies
the right offset correction. For these reasons gethrtime() is implemented as a system call;
however, to minimize the cost, it is implemented as a lightweight (not full) system call.
The cost of gethrtime() is very low, at around 130 cycles per call. It does access memory,
but it performs no in-kernel locking.
To use gethrtime():
#include <time.h>
hrtime_t now = gethrtime();   /* time since boot; use differences to measure intervals */
#include <time.h>
#include <mercury.h>
time()
The time() system call returns the number of seconds since the epoch, defined as midnight
1st January 1970 GMT. As such it provides what might be called a “real world” time,
suitable for timestamps and other applications where the caller needs to determine a
real-world date and/or time. The value returned from time() can easily be converted to a
meaningful human-readable string using the ctime(3C) family of functions.
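As a small illustration of that conversion step, the sketch below formats a time() value as a UTC string. It uses gmtime_r() and strftime() rather than a ctime(3C) call so that the output format is fixed; the function name format_utc and the chosen format are hypothetical.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Format seconds-since-epoch as "YYYY-MM-DD HH:MM:SS" in UTC.
   Returns 0 on success, -1 if the conversion fails or buf is too small. */
int format_utc(time_t secs, char *buf, size_t len)
{
    struct tm tm;

    if (gmtime_r(&secs, &tm) == NULL)
        return -1;
    /* strftime() returns 0 when the result does not fit in len bytes. */
    return strftime(buf, len, "%Y-%m-%d %H:%M:%S", &tm) == 0 ? -1 : 0;
}
```

A typical call is format_utc(time(NULL), buf, sizeof buf); passing 0 yields the epoch itself, "1970-01-01 00:00:00".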
time() does its work by copying the seconds field out of the kernel's time structure. The
copy is atomic, so no lock is required. This seconds value is returned to the caller.
If the caller supplies an address as a parameter, the same seconds value is also written to
that address in the caller's address space. This additional step incurs significant overhead,
and can be avoided by supplying a NULL parameter.
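#include <time.h>
time_t t1, t2, tloc;
t1 = time(NULL);    /* cheaper: result is only returned */
t2 = time(&tloc);   /* result is also copied out to tloc */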
In the first form above, time() takes approximately 660 cycles per call; in the second form,
it takes approximately 924 cycles per call.
times()
The times() system call can be used to determine the user and system CPU time consumed by
the caller and its children. However, it can also be used to determine the time since system
boot in operating system clock ticks (1/100th of a second), which allows intervals to be
measured in units of 1/100th of a second.
times() is implemented as a full system call, albeit a very simple one. It sums the user and
system mode CPU usage for threads in the current process, then gathers the corresponding
totals for the process's children.
While summing the resources used by the calling process's threads, times() holds an
exclusive process-level lock called process_lock; the more threads a process has, the
longer this lock is held. If multiple threads in the same process call times()
simultaneously, severe contention for this lock can occur, which can degrade both
process performance and overall system performance. times() is therefore not suitable
for concurrent use in a multithreaded process.
To use times():
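#include <sys/times.h>
struct tms dummy;              /* required, although only the return value is used */
clock_t now = times(&dummy);   /* clock ticks since boot */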
Frankly, it would be nice to be able to pass a NULL struct tms pointer and thereby avoid
the expensive copyout of the caller's and children's CPU usage data when using times()
purely for timing purposes. Unfortunately, this is not possible.
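Because times() returns elapsed time in clock ticks, an interval is the difference between two return values divided by the tick rate. Rather than hard-coding 100, the sketch below asks sysconf(_SC_CLK_TCK) for the rate. The helper names are hypothetical, and the struct tms buffer is still supplied even though its contents are discarded.

```c
#include <sys/times.h>
#include <unistd.h>

/* Current elapsed time in clock ticks; the CPU-usage data written to
   the buffer is ignored, but the buffer must be supplied. */
clock_t ticks_now(void)
{
    struct tms dummy;
    return times(&dummy);
}

/* Convert a tick interval to seconds using the system's tick rate. */
double ticks_to_seconds(clock_t start, clock_t end)
{
    return (double)(end - start) / (double)sysconf(_SC_CLK_TCK);
}
```

The resolution is one tick, so this is only suitable for coarse intervals, and the lock contention described above still applies to every call.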
gettimeofday()
The gettimeofday() system call returns the current time, expressed in seconds and
microseconds, since the epoch (midnight January 1st 1970 GMT). The seconds component of
the returned timeval structure can be used as input to any of the ctime(3C) family of time
formatting functions. The seconds and microseconds fields are often used to measure
intervals with high resolution; however, gettimeofday() is not cheap, so other interfaces
may be more appropriate for this purpose.
#include <sys/time.h>
struct timeval end;
gettimeofday(&end, NULL);   /* POSIX: pass NULL for the timezone argument */
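When gettimeofday() is used to measure an interval, the seconds and microseconds fields should be combined before subtracting; subtracting them independently gives a wrong result whenever the microsecond count wraps across a second boundary. A minimal sketch, with tv_elapsed_us as a hypothetical name:

```c
#include <sys/time.h>

/* Microseconds from *start to *end.  Combining the fields handles the
   case where end->tv_usec is smaller than start->tv_usec. */
long long tv_elapsed_us(const struct timeval *start, const struct timeval *end)
{
    return ((long long)end->tv_sec - start->tv_sec) * 1000000LL
           + ((long long)end->tv_usec - start->tv_usec);
}
```

For instance, from 1.900000 s to 3.100000 s the result is 1,200,000 microseconds.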
clock_gettime()
The clock_gettime() system call can be used to measure the passage of wall-clock time
(CLOCK_REALTIME) or execution time (CLOCK_PROFILE). The returned time is expressed in
seconds and nanoseconds.
If the application needs to determine an elapsed time, such as the duration of an event,
then gethrtime() is likely the most appropriate interface. gethrtime() is supported on
HP-UX 11.11, 11.23 and 11.31. If support for operating system versions prior to 11.31 is
not required, hg_gethrtime() is a slightly better choice.
Generating Timestamps
1. Have a housekeeping thread wake every second to update a “current time” global. If
the time is always formatted using a member of the ctime(3C) family prior to use, we
might store the formatted rather than raw time in the global, to avoid repeating the
conversion on every use. If the application already has a housekeeping thread that wakes
periodically, this may be an easy approach.
2. Call both the cheap time-since-boot interface (e.g. gethrtime()) and the expensive time-
since-epoch interface (e.g. gettimeofday()) once during application startup to establish the
offset between them. Then when the application needs time-since-epoch it calls the cheap
time-since-boot interface and applies the offset calculated at startup.
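The first approach can be sketched with POSIX threads as follows. This is a minimal illustration under stated assumptions: the buffer size, the "%Y-%m-%d %H:%M:%S" format and the names start_housekeeper and get_stamp are hypothetical, and a mutex guards the shared string because a reader could otherwise copy it while it is being rewritten. The conversion runs once per second in one thread instead of once per timestamp in every thread.

```c
#include <pthread.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

#define STAMP_LEN 32

static char g_stamp[STAMP_LEN];                  /* latest formatted time */
static pthread_mutex_t g_stamp_lock = PTHREAD_MUTEX_INITIALIZER;

/* Format the current local time once and publish it. */
static void refresh_stamp(void)
{
    char buf[STAMP_LEN];
    time_t now = time(NULL);
    struct tm tm;

    if (localtime_r(&now, &tm) == NULL ||
        strftime(buf, sizeof buf, "%Y-%m-%d %H:%M:%S", &tm) == 0)
        buf[0] = '\0';

    pthread_mutex_lock(&g_stamp_lock);
    snprintf(g_stamp, sizeof g_stamp, "%s", buf);
    pthread_mutex_unlock(&g_stamp_lock);
}

/* Housekeeping thread: refresh the stamp roughly once per second. */
static void *housekeeper(void *arg)
{
    (void)arg;
    for (;;) {
        sleep(1);
        refresh_stamp();
    }
    return NULL;
}

/* Publish an initial stamp, then start the detached housekeeper.
   Returns 0 on success. */
int start_housekeeper(void)
{
    pthread_t tid;

    refresh_stamp();
    if (pthread_create(&tid, NULL, housekeeper, NULL) != 0)
        return -1;
    return pthread_detach(tid);
}

/* Copy the most recent stamp into out. */
void get_stamp(char *out, size_t len)
{
    pthread_mutex_lock(&g_stamp_lock);
    snprintf(out, len, "%s", g_stamp);
    pthread_mutex_unlock(&g_stamp_lock);
}
```

The mutex here is held only for a short string copy, so it is far cheaper to contend for than a full time conversion.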
When using the second approach, care is needed to ensure that processes have a
consistent view of the time. If each of many application processes determines the offset
for itself, they will invariably end up with different values (if a context switch occurs
at an inopportune moment, these differences could be considerable!). It is much better to
determine the relationship once, and then share it between all application processes
through some mechanism such as shared memory. Also, if the date or time is explicitly
changed by the system administrator after the application has determined the offset,
times calculated in this way may be invalid.
It is not widely known that most of the library functions that manipulate times (see
ctime(3C)) do not scale well in a multithreaded application. Standards dictate that they
respect the current values of the global variables timezone, daylight and tzname, as well
as the environment variable TZ. To prevent any of these from changing in a multithreaded
process while one of the time conversion functions is in progress, a single process-wide
mutex is held for the duration of the call. A mutex is an exclusive lock, so only one thread
at a time can be executing much of the code in the ctime(3C) family of functions. To date
this has not been a source of customer dissatisfaction, and it would be straightforward
for HP to resolve should it become a problem.
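One way an application can reduce contention on that process-wide mutex is to avoid repeating conversions whose answer cannot have changed. The sketch below keeps a per-thread cache of the last second formatted, so localtime_r() and strftime() run at most once per distinct second per thread. It assumes a compiler supporting C11 _Thread_local, the names are hypothetical, and it does not notice a TZ change made while the process is running.

```c
#include <time.h>

#define STAMP_LEN 32

/* Per-thread cache: the last second converted and its formatted form. */
static _Thread_local int    tl_valid;
static _Thread_local time_t tl_last_sec;
static _Thread_local char   tl_last_str[STAMP_LEN];

/* Format secs as local time, reusing the cached string when secs matches
   this thread's previous call.  Returns NULL on failure; the returned
   pointer refers to thread-local storage overwritten by later calls. */
const char *cached_local_stamp(time_t secs)
{
    if (!tl_valid || secs != tl_last_sec) {
        struct tm tm;

        tl_valid = 0;
        if (localtime_r(&secs, &tm) == NULL ||
            strftime(tl_last_str, sizeof tl_last_str,
                     "%Y-%m-%d %H:%M:%S", &tm) == 0)
            return NULL;
        tl_last_sec = secs;
        tl_valid = 1;
    }
    return tl_last_str;
}
```

A thread stamping many log records within the same second thus takes the conversion lock only once for that second.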