51 Solaris Dev Day DTrace060522
A Deep Dive
Peter Karlsson
Technology Evangelist
Sun Microsystems
http://opensolaris.org
Agenda
Why DTrace
What is DTrace
D - the language
Solaris DTrace Providers
DTrace and Java
DTrace resources
Why DTrace?
Transient problems are hard to debug.
Examples:
> Who sent a kill signal to my process?
> Why does a thread get preempted when it should not be?
> Why does my application not scale on the live production system?
Current Options
Reproduce problem outside of production
> Not easy & Expensive
Current Options
Custom instrumented application or kernel
arbitrary data
> permit dynamically turning on/off instrumentation
> be performant to run in production
> ensures safety
DTrace
Over 30K probes built into Solaris 10
Can create more probes on the fly
New, powerful, dynamically interpreted language (D) to instantiate probes
Probes are lightweight and low overhead
No overhead if probe not enabled
Safe to use on live system
# dtrace -l |wc -l
38485
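Part I - DTrace
[Architecture diagram: in userland, the consumers lockstat(1M), dtrace(1M), and plockstat(1M) sit on top of libdtrace(3LIB), which talks to the dtrace(7D) driver. In the kernel, the DTrace framework is fed by the providers sysinfo, proc, vminfo, syscall, fasttrap, sdt, and fbt.]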
How it works
D - the language
The D Language
DTrace uses a new scripting language called D
> Dynamically interpreted language
D Language - Format.
probe description
/ predicate /
{
action statements
}
syscalls.d
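The listing that went with the syscalls.d label did not survive extraction. As a minimal sketch of one clause in this format (the predicate and the aggregation name @calls are assumptions chosen for illustration, not the original script), the following counts the system calls made by one command:

```d
#!/usr/sbin/dtrace -s

/*
 * Probe description: the entry point of every system call.
 * Predicate: only the process started with -c or attached with -p.
 * Action: count calls, keyed by system call name.
 */
syscall:::entry
/pid == $target/
{
	@calls[probefunc] = count();
}
```

Assuming the script is saved as syscalls.d and made executable, it could be run as ./syscalls.d -c date. The aggregation is printed automatically when the command exits.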
DTrace Example.
# dtrace -n BEGIN -n END
dtrace: description BEGIN matched 1 probe
dtrace: description END matched 1 probe
CPU     ID                    FUNCTION:NAME
  0      1                           :BEGIN
^C
  0      2                             :END
#
Hello World in D
hello.d
#!/usr/sbin/dtrace -s
BEGIN
{
printf("Hello World\n");
exit(0);
}
END
{
printf("Goodbye Cruel World\n");
}
exit.
DEMO
Providers
Providers represent a methodology for instrumenting the system
Providers make probes available to the DTrace framework
DTrace informs providers when a probe is to be enabled
Providers transfer control to DTrace when an enabled probe is fired
Examples
> The syscall provider provides probes in every system call
> The fbt provider provides probes in almost every function in the kernel
Probe
Probes are points of instrumentation
Probes are made available by providers
Probes identify the module and function that they instrument
Each probe has a name
These four attributes (provider, module, function, name) define a tuple that uniquely identifies each probe
provider:module:function:name
Example
syscall::open:entry
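To make the four-part tuple concrete, here is a small sketch that prints the tuple of each probe as it fires. The wildcard description is an illustrative choice, not taken from the slides; an empty field matches anything, and * is a glob.

```d
#!/usr/sbin/dtrace -qs

/* Empty module field plus a wildcard function: matches open, open64, ... */
syscall::open*:entry
{
	/* Print provider:module:function:name of the probe that fired. */
	printf("%s:%s:%s:%s\n", probeprov, probemod, probefunc, probename);
}
```

For the syscall provider the module field is empty, so the output lines look like syscall::open:entry.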
Listing Probes.
Probes can be listed with the -l option to dtrace(1M)
Predicates
A predicate is a D expression
Actions will only be executed if the predicate
expression evaluates to true
A predicate takes the form /expression/ and is
placed between the probe description and the
action
Examples
> Print the pid of every ls process that is started
pred.d
#!/usr/sbin/dtrace -s
proc:::exec-success
/execname == "ls"/
{
trace(pid);
}
Actions
Actions are executed when a probe fires
Actions are completely programmable
Most actions record some specified state in the
system
Some actions change the state of the system in a well-defined manner
> These are called destructive actions and are disabled by default.
Destructive actions
We saw that DTrace is safe to use on a live system because there are checks to make sure it does not modify what it observes.
There are some cases where you want to change the state of the system.
Examples
> stop a process to better analyze it
> kill a runaway process
> run a program using system()
Destructive actions
stop()
> stop the process that fired the probe. Use prun to make the
raise(int signal)
> Send signal to the process that fired the probe.
system(string program)
> Similar to system call in C. Run the program. system also allows
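As a hedged sketch of how these actions combine (the choice of ls, pstack, and prun is an assumption for illustration), the script below stops each new ls process, inspects it from userland, and then resumes it. The -w flag in the interpreter line is what enables destructive actions.

```d
#!/usr/sbin/dtrace -qws

/* Without -w, dtrace refuses to enable a script that uses stop() or system(). */
proc:::exec-success
/execname == "ls"/
{
	stop();                       /* freeze the process that fired the probe */
	system("pstack %d", pid);     /* look at its stack from userland */
	system("prun %d", pid);       /* then set it running again */
}
```

system() takes a printf-style format string, so the pid of the stopped process can be passed straight into the command line.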
Built-in Variable
Here are a few built-in variables.
arg0 ... arg9 - probe arguments as int64_t values
args[] - probe arguments with their correct types, as defined by the probe
cpu - current CPU id
cwd - current working directory
errno - error code from the last system call
gid, uid - real group id and real user id
pid, ppid, tid - process id, parent process id, and thread id
probeprov, probemod, probefunc, probename - the four parts of the name of the probe that fired
timestamp - nanoseconds from an arbitrary point in the past
vtimestamp - nanoseconds the current thread has spent running on a CPU
walltimestamp - nanoseconds since the epoch
External Variable
DTrace provides access to kernel and external variables.
To access the value of an external variable, prefix its name with a backquote (`)
#!/usr/sbin/dtrace -qs
dtrace:::BEGIN
{
printf("physmem is %d\n", `physmem);
printf("maxusers is %d\n", `maxusers);
printf("ufs:freebehind is %d\n", ufs`freebehind);
exit(0);
}
ext.d
Aggregation
Think of a case when you want to know the total time the system spends in a function.
> We could save the amount of time spent in the function on every call
> If the function was called 1000 times, that is 1000 pieces of info stored in the buffer just for us to finally add up to get the total.
Aggregates
Often the patterns are more interesting than each
individual sample
Want to aggregate data to look for trends
Aggregates as first class operations
Aggregation is the result of an aggregating function
Examples:
> count()
> max(), min(), avg()
> quantize()
Aggregation - Format
@name[keys] = aggfunc(args);
'@' - marks name as an aggregation.
keys - comma-separated list of D expressions.
aggfunc is one of...
> count()
> sum(expr)
> avg(expr)
> min(expr), max(expr)
> quantize(expr), lquantize(expr, lower, upper, step)
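A minimal sketch of the format in use, assuming we want per-program read statistics (the aggregation names @reads and @bytes are made up for this example):

```d
#!/usr/sbin/dtrace -qs

/* On return from read(2), arg0 holds the number of bytes read. */
syscall::read:return
/arg0 > 0/
{
	@reads[execname] = count();      /* how many successful reads */
	@bytes[execname] = sum(arg0);    /* how many bytes in total */
}

END
{
	printa("%-20s %@8d reads\n", @reads);
	printa("%-20s %@8d bytes\n", @bytes);
}
```

Only the running totals are kept per key, so the trace buffer does not fill up with one record per call.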
Aggregation Example
#!/usr/sbin/dtrace -s
pid$target:libc:malloc:entry
{
@["Malloc Distribution"]=quantize(arg0);
}
aggr_malloc.d
$ ./aggr_malloc.d -c who
dtrace: script './aggr_malloc.d' matched 1 probe
...
dtrace: pid 6906 has exited
  Malloc Distribution
           value  ------------- Distribution ------------- count
               1 |                                         0
               2 |@@@@@@@@@@@@@@@@@                        3
               4 |                                         0
               8 |@@@@@@                                   1
              16 |@@@@@@                                   1
              32 |                                         0
              64 |                                         0
             128 |                                         0
             256 |                                         0
             512 |                                         0
            1024 |                                         0
            2048 |                                         0
            4096 |                                         0
            8192 |@@@@@@@@@@@                              2
           16384 |                                         0
Aggregation example
#!/usr/sbin/dtrace -s
syscall::mmap:entry
{
@a["number of mmaps"] = count();
@b["average size of mmaps"] = avg(arg1);
@c["size distribution"] = quantize(arg1);
}
profile:::tick-10sec
{
printa(@a);
printa(@b);
printa(@c);
clear(@a);
clear(@b);
clear(@c);
}
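syscall::open*:entry,
syscall::close*:entry
{
self->ts = timestamp; /* thread-local: save the time stamp at entry */
}
syscall::open*:return,
syscall::close*:return
/self->ts/
{
this->timespent = timestamp - self->ts;
printf("ThreadID %d spent %d nsecs in %s\n", tid, this->timespent, probefunc);
self->ts = 0; /* allow DTrace to reclaim the storage */
}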
DTrace Providers
Providers
Here is the list of providers
Providers - cont.
We will now see some more details on a few Solaris providers
fbt Provider
The fbt (Function Boundary Tracing) provider has probes in most functions in the kernel.
Using fbt probes you can track entry to and return from almost every function in the kernel.
There are over 20,000 fbt probes in even the smallest Solaris systems.
You need knowledge of Solaris internals to use it effectively.
Once opensolaris.org has the entire Solaris code you will be able to use these probes more effectively.
Very useful if you develop your own kernel module.
We will see a few examples.
fbt1.d
syscall::ioctl:entry
{
self->traceme = 1; /* entry clause reconstructed; the original was lost */
}
fbt:::
/self->traceme/
{}
syscall::ioctl:return
/self->traceme/
{
self->traceme = 0;
exit(0);
}
profile Provider
The profile provider has probes that fire at regular intervals.
These probes are not associated with any kernel or user code execution.
The profile provider has two kinds of probes: the profile probe and the tick probe.
Format of the profile probe: profile-n
> The probe fires n times a second on every CPU.
> An optional ns or nsec (nano sec), us or usec (microsec), msec or
This one tracks how the priority of a process changes over time.
#!/usr/sbin/dtrace -qs
profile-1001
/pid == $1/
{
@proc[execname]=lquantize(curlwpsinfo->pr_pri,0,100,10);
}
prio.d
tick-n probe
Very similar to profile-n probe
Only difference is that the probe only fires on one
CPU.
The meaning of n is similar to the profile-n probe.
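The two probe kinds are often used together: profile-n to sample on every CPU, tick-n to report once. A minimal sketch, assuming a 997 Hz sample rate and a 5 second run (both values are arbitrary choices for this example):

```d
#!/usr/sbin/dtrace -qs

/* Fires 997 times per second on every CPU: sample what is running. */
profile-997
{
	@oncpu[execname] = count();
}

/* Fires once every 5 seconds on a single CPU: report and stop. */
tick-5sec
{
	printa("%-20s %@d\n", @oncpu);
	exit(0);
}
```

Reporting from a tick probe rather than a profile probe avoids printing the same summary once per CPU.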
proc Provider
The proc provider has probes for the process/LWP lifecycle
> create - fires when a process is created using fork and its variants
> exec - fires when exec and its variants are called
> exec-failure, exec-success - fire when exec fails or succeeds
> lwp-create, lwp-start, lwp-exit - LWP lifecycle probes
> signal-send, signal-handle, signal-clear - probes for various signal states
> start - fires when a process starts, before the first instruction is executed
Examples
The following script prints all the processes that are created. It also prints who created each process.
#!/usr/sbin/dtrace -qs
proc:::exec
{
self->parent = execname;
}
proc:::exec-success
/self->parent != NULL/
{
@[self->parent, execname] = count();
self->parent = NULL;
}
proc1.d
proc:::exec-failure
/self->parent != NULL/
{
self->parent = NULL;
}
END
{
printf("%-20s %-20s %s\n", "WHO", "WHAT", "COUNT");
printa("%-20s %-20s %@d\n", @);
}
More Examples
The following script prints all the signals that are sent in the
system. It also prints who sent the signal to whom.
#!/usr/sbin/dtrace -qs
proc:::signal-send
{
@[execname, stringof(args[1]->pr_fname),args[2]] = count();
}
proc2.d
END
{
printf("%20s %20s %12s %s\n", "SENDER", "RECIPIENT", "SIG", "COUNT");
printa("%20s %20s %12d %@d\n", @);
}
$ ./proc2.d
^C
              SENDER            RECIPIENT          SIG COUNT
               sched               dtrace            2 1
               sched                   ls            2 1
               sched                  ksh           18 4
               sched                  ksh            2 5
                 ksh                  ksh            2 5
                 ksh                  ksh           20 12
Examples:
pid1234:date:main:entry
pid1122:libc:open:return
pid1.d
Examples:
pid1234:date:main:16
pid1122:libc:open:4
offs.d
trace_code.d
pid$1::$2:return
/self->trace_code/
{
exit(0);
}
Execute.
# trace_code.d 1218 printf
process stack
cpy3.d
sig1.d
proc:::signal-send
/args[1]->pr_fname == $$1/
{
printf("%s(pid:%d) is sending signal %d to %s\n", execname, pid, args[2],args[1]->pr_fname);
}
$ ./sig1.d bc
sched(pid:0) is sending signal 24 to bc
sched(pid:0) is sending signal 24 to bc
bash(pid:3987) is sending signal 15 to bc
bash(pid:3987) is sending signal 15 to bc
bash(pid:3987) is sending signal 9 to bc
The above program prints the processes that send a signal to the program bc.
Note: $$1 is argument 1 as a string
The signal-send probe has args[1], which describes the signal destination
The signal-send probe has args[2], which holds the signal number
ustack
#!/usr/sbin/dtrace -s
syscall::write:entry
/pid == $target/
{
ustack(50,500);
}
$ ./ustk.d -c "java -version"
dtrace: script './ustk.d' matched 1 probe
java version "1.5.0_01"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_01-b08)
Java HotSpot(TM) Client VM (build 1.5.0_01-b08, mixed mode, sharing)
CPU     ID                    FUNCTION:NAME
  0     13                      write:entry
libc.so.1`_write+0x8
libjvm.so`JVM_Write+0xb8
libjava.so`0xfe99f580
libjava.so`Java_java_io_FileOutputStream_writeBytes+0x3c
java/io/FileOutputStream.writeBytes
java/io/FileOutputStream.writeBytes
java/io/FileOutputStream.write
java/io/BufferedOutputStream.flushBuffer
java/io/BufferedOutputStream.flush
java/io/PrintStream.write
sun/nio/cs/StreamEncoder$CharsetSE.writeBytes
sun/nio/cs/StreamEncoder$CharsetSE.implFlushBuffer
sun/nio/cs/StreamEncoder.flushBuffer
java/io/OutputStreamWriter.flushBuffer
java/io/PrintStream.write
java/io/PrintStream.print
java/io/PrintStream.println
sun/misc/Version.print
0xf8c05764
0xf8c00218
libjvm.so`__1cJJavaCallsLcall_helper6FpnJJavaValue_pnMmethodHandle_pnRJavaCallArguments_pnGThread__v_+0x548
libjvm.so`jni_CallStaticVoidMethod+0x4a8
java`main+0x824
java`_start+0x108
ustack1.d
jstack() action
jstack action prints mixed mode stack trace
Both java frames and native (C/C++) frames are shown
Only JVM versions 5.0_01 and above are supported
jstack shows hex numbers for JVM versions before 5.0_01
Example (usejstack.d)
#!/usr/sbin/dtrace -s
syscall::pollsys:entry
/ pid == $1 / {
jstack(50);
}
The integer argument limits the number of frames shown
Sample Output
$ usejstack.d 1344 | c++filt
libc.so.1`__pollsys+0xa
libc.so.1`poll+0x52
libjvm.so`int os_sleep(long long,int)+0xb4
libjvm.so`int os::sleep(Thread*,long long,int)+0x1ce
libjvm.so`JVM_Sleep+0x1bc
java/lang/Thread.sleep
dtest.method3
dtest.method2
dtest.method1
dtest.main
[... more output deleted for brevity ...]
vm-init();
vm-death();
thread-start(char *thread_name);
thread-end();
class-load(char *class_name);
class-unload(char *class_name);
gc-start();
gc-finish();
gc-stats(long used_objects, long used_object_space);
object-alloc(char *class_name, long size);
object-free(char *class_name);
method-entry(char *class_name, char *method_name, char *method_signature);
method-return(char *class_name, char *method_name, char *method_signature);
java_method_count.d
dvm$target:::method-entry
{
@[copyinstr(arg0),copyinstr(arg1)] = count();
}
# ./java_method_count.d -p `pgrep -n java`
DTrace in Mustang
Mustang will support DTrace out of the box
It will provide the probes that dvm provides, and will add the following
> Method compilation (method-compile-begin/end)
> Compiled method load/unload (compiled-method-load/unload)
> JNI method probes
> DTrace probes at entry and return of each JNI method
VM lifecycle probes
Thread lifecycle probes
Classloading probes
Garbage collection probes
Method compilation probes
Monitor probes
Application probes (object alloc, method entry/return)
VM Lifecycle Probes
hotspot$1:::vm-init-begin {
/* actions */
}
hotspot$1:::vm-init-end {
/* actions */
}
hotspot$1:::vm-shutdown {
/* actions */
}
Classloading Probes
hotspot$1:::class-loaded {
self->str_ptr = (char*) copyin(arg0, arg1+1);
self->str_ptr[arg1] = '\0';
self->name = (string) self->str_ptr;
printf("class %s loaded\n", self->name);
}
hotspot$1:::class-unloaded {
/* actions */
}
Monitor Probes
hotspot$1:::monitor-contended-enter {
/* actions */
}
hotspot$1:::monitor-contended-entered {
/* actions */
}
hotspot$1:::monitor-wait {
/* actions */
}
hotspot$1:::object-alloc {
self->str_ptr = (char*) copyin(arg1, arg2+1);
self->str_ptr[arg2] = '\0';
self->classname = (string) self->str_ptr;
@allocs_count[self->classname] = count();
@allocs_size[self->classname] = sum(arg3);
}
Method Frequency
hotspot$1:::method-entry {
self->ptr = (char*)copyin(arg1, arg2+1);
self->ptr[arg2] = '\0';
self->classname = (string)self->ptr;
self->ptr = (char*)copyin(arg3, arg4+1);
self->ptr[arg4] = '\0';
self->methodname = (string)self->ptr;
jstack();
}
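The slide title suggests counting how often each method is called, but the aggregation itself is not in the surviving text. A sketch of how that could look, reusing the argument layout shown above (arg1/arg2 for the class name and its length, arg3/arg4 for the method name and its length); the aggregation name @freq and the output format are assumptions:

```d
#!/usr/sbin/dtrace -qs

hotspot$target:::method-entry
{
	/* Copy in the class name and NUL-terminate it. */
	this->cls = (char *)copyin(arg1, arg2 + 1);
	this->cls[arg2] = '\0';

	/* Same for the method name. */
	this->mth = (char *)copyin(arg3, arg4 + 1);
	this->mth[arg4] = '\0';

	/* Count calls per (class, method) pair. */
	@freq[(string)this->cls, (string)this->mth] = count();
}

END
{
	printa("%-40s %-30s %@d\n", @freq);
}
```

In the JDK 6 HotSpot VM the method-entry and method-return probes are off by default and are enabled with the -XX:+ExtendedDTraceProbes option, so the target JVM would need to be started that way for this script to see anything.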
dvm provider
java.net project to add DTrace support in
1.4.2 and 1.5
https://solaris10-dtrace-vm-agents.dev.java.net/
Download shared libs
> libdvmti.so java 1.5
> libdvmpi.so java 1.4.2
DTrace resources
Thank you!
Peter Karlsson
Technology Evangelist
Sun Microsystems
http://opensolaris.org
Reference Slides
0-60.d
Provider details
dtrace Provider
The dtrace provider provides three probes (BEGIN,
END, ERROR)
> BEGIN
> BEGIN is the first probe to fire.
> All BEGIN clauses will fire before any other probe fires.
> Typically used to initialize.
> END
> Will fire after all other probes are completed
> Can be used to output results
> ERROR
> Will fire under an error condition
> For error handling
dtrace.d
ERROR
{
printf("Error has occurred!");
}
END
{
}
lockstat Provider
lockstat has two kinds of probes: contention-event probes and hold-event probes.
> contention-event probes track lock contention. Because contention is rare, these probes impose little overhead and can be safely enabled.
> hold-event probes track the acquiring and releasing of locks. Enabling these probes can incur an overhead, as these events are more common.
lockstat allows you to probe adaptive, spin, thread, and reader/writer locks.
lockstat probes
Lock type        Contention-event                Hold-event
Adaptive         adaptive-block, adaptive-spin   adaptive-acquire, adaptive-release
Spin             spin-spin                       spin-acquire, spin-release
Thread           thread-spin                     (none)
Reader/Writer    rw-block                        rw-acquire, rw-release, rw-upgrade, rw-downgrade
lockstat - Example
Here is an example. It counts all the lock events of the given executable.
#!/usr/sbin/dtrace -qs
lockstat:::
/execname==$$1/
{
@locks[probename]=count();
}
lockstat.d
plockstat provider
One final provider that may be of interest is plockstat.
plockstat is the userland equivalent of lockstat in the kernel.
Three types of lock events can be traced.
> Contention events - probes for user-level lock contention
> Hold events - probes for lock acquiring, releasing, etc.
> Error events - error conditions
There are two families of probes
> Mutex probes
> Reader/writer lock probes
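As a small illustration of a contention-event probe, the sketch below counts how often a process blocks on a mutex, keyed by the user stack that led there. The stack depth of 5 and the aggregation name @blocked are arbitrary choices for this example.

```d
#!/usr/sbin/dtrace -qs

/* Fires each time a thread in the target process blocks on a held mutex. */
plockstat$target:::mutex-block
{
	@blocked[ustack(5)] = count();
}
```

It would be run against a live process with -p, or against a command with -c. The aggregation is printed when the script exits.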
plockstat Provider Probes
                            Contention probes         Hold probes                     Error probes
Mutex probes                mutex-block, mutex-spin   mutex-acquire, mutex-release    mutex-error
Reader/writer lock probes   rw-block                  rw-acquire, rw-release          rw-error
sched Provider
The sched provider allows users to gain insight into how a process is scheduled. It helps answer questions like why and when the thread of interest changed priority.
The following are a few probes that are part of the sched provider.
change-pri - when a thread's priority changes
dequeue / enqueue - when a thread is taken off or put on a run queue
off-cpu / on-cpu - when a thread is taken off or put on a CPU
preempt - when a thread is preempted
sleep / wakeup - when a thread sleeps on a synchronization object and when it wakes up
sched examples.
This script prints the distribution of the time threads spend on a CPU.
#!/usr/sbin/dtrace -qs
sched:::on-cpu
{
self->ts = timestamp;
}
sched.d
sched:::off-cpu
/self->ts/
{
@[cpu] =quantize(timestamp - self->ts);
}
Arrays
name[key] = expression;
name name of array
key list of scalar expression values (tuples)
expression evaluates to the type of array
#!/usr/sbin/dtrace -s
syscall::open*:entry,
syscall::close*:entry
{
ts[probefunc,pid,tid]=timestamp; /* save time stamp at entry */
}
syscall::open*:return,
syscall::close*:return
{
timespent = timestamp - ts[probefunc,pid,tid];
printf("%s threadID %d spent %d nsecs in %s\n", execname, tid, timespent, probefunc);
/* print time-spent at return */
ts[probefunc,pid,tid]=0;
timespent = 0;
}
array.d
struct construct
struct type{
element1;
element2;
}
Example
struct info{
string f_name;
int count;
int timespent;
} /* definition of struct info */
struct info my_callinfo; /* Declaring my_callinfo as variable of type info */
my_callinfo.f_name; /* access to member of struct */
contain.d
sig2.d
proc:::signal-send
/args[1]->pr_fname == $$1/
{
printf("%s(pid:%d) is sending signal %d to %s\n", execname, pid, args[2],args[1]->pr_fname);
stop();
}
$ ./sig2.d bc
bash(pid:3987) is sending signal 9 to bc
Postmortem Tracing
Postmortem tracing
A nifty feature of DTrace is the ability to extract DTrace data from a system crash dump.
This can be very useful to support engineers.
Here is how it works.
> Load the crash dump into mdb
> ::dtrace_state prints details about all DTrace consumers that were active when the dump was generated.
> Take the address of a DTrace consumer from that output and
> <addr>::dtrace prints everything in that consumer's DTrace buffer.
Postmortem tracing
You can create a ring buffer of data using dtrace
> Use the -b option to set the buffer size and -x bufpolicy=ring to select the ring buffer policy.
You can leave this running, and if the system crashes you can analyze the buffer from the crash dump.
Options
> <addr>::dtrace -c 1
> Prints only the data from CPU 1.
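The same buffer settings can also be set inside the script with pragmas instead of command-line options. A minimal sketch, assuming a 64k buffer and a trace of every system call (both are illustrative choices):

```d
#!/usr/sbin/dtrace -s

/* Keep only the most recent data: old records are overwritten. */
#pragma D option bufpolicy=ring
#pragma D option bufsize=64k

syscall:::entry
{
	printf("%s called %s\n", execname, probefunc);
}
```

With the ring policy nothing is printed while the script runs. The buffer contents are shown when dtrace exits, or recovered from the crash dump with ::dtrace as described above.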
Speculation
We will now see how to catch a bad code path using speculation!
Here is why we need speculation.
> Sometimes we only see an error message after the error has occurred.
For example: we see a function return an error, but the problem was caused by something that the function did earlier.
We see the error and want to go back and find out what the function did wrong. But alas, the function has already run.
> One solution could be to save details every time the function executes, but this wastes trace buffer on a lot of useless data when we are only concerned about the one time the function failed.
> A better solution - speculation
Speculation - example
pid$target::fopen64:entry
{
self->spec = speculation();
speculate(self->spec);
printf("Path is %s\n", copyinstr(arg0));
}
pid$target:::entry
/self->spec/
{
speculate(self->spec);
}
pid$target:::return
/self->spec/
{
speculate(self->spec);
}
spec.d
pid$target::fopen64:return
/self->spec && arg1 != 0/
{
discard(self->spec);
self->spec = 0;
}
pid$target::fopen64:return
/self->spec && arg1 == 0/
{
commit(self->spec);
self->spec = 0;
}
Consumers
You just created a provider foobar and two probes foo & bar. That's it. (almost!)
The arguments are the types of the two arguments your probe exposes.
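The provider definition these two sentences refer to is missing from the extracted text. A sketch of what a foobar.d with two probes would look like, assuming both probes take two int arguments (which matches the DTRACE_PROBE2(foobar, foo, inp, 3) call used in the C code, but the bar signature is a guess):

```d
/* foobar.d: provider definition (reconstructed sketch, not the original file) */
provider foobar {
	probe foo(int, int);
	probe bar(int, int);
};
```

This file is what dtrace -G later compiles into foobar.o alongside the instrumented object file.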
...
if(inp<10){
val1 = inp^3;
}else
val1 = inp^2;
}
...
#include <sys/sdt.h>
foo {
...
if(inp<10){
DTRACE_PROBE2(foobar, foo, inp, 3);
val1=inp^3;
}else
...
}
cc -c probeable.c
dtrace -G -32 -s foobar.d probeable.o
cc -o probeable foobar.o probeable.o
The dtrace command compiles the .d file. It takes probeable.o (the object file built from the code where you added the probes) as input.
The -G option generates a .o file (foobar.o).
Use -32 or -64 for 32-bit and 64-bit applications.
The last line links all the .o files into your application.
OK, you are done! Really!