0% found this document useful (0 votes)

10 views17 pages

Cs 8803 Ss Project

This document analyzes security threats related to GPU programming with CUDA. It outlines a threat model using STRIDE and explores potential spoofing, tampering, repudiation, information disclosure, and denial of service attacks on the CUDA system. Experiments were conducted using various NVIDIA GPUs to successfully demonstrate attacks and identify areas for further research.

Uploaded by

nick black

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

10 views17 pages

Cs 8803 Ss Project

Uploaded by

nick black

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 17

My Other Computer is your GPU:

System-Centric CUDA Threat Modeling with CUBAR

Nick Black and Jason Rodzik
CS8803SS Project, Spring 2010

“The charm of history and its enigmatic lesson consist in the fact that, from age to age, nothing changes and yet everything is completely different.”
—Aldous Huxley

Abstract
Heterogeneous computing has definitely arrived, and graphics processing units (GPUs) in the millions are
employed worldwide. History has shown newly programmable domains to be rapidly subjected (and often
found vulnerable) to attacks of the past. We enumerate a system-centric threat model using Microsoft’s
STRIDE process [23]. We then describe an active general purpose programming system, NVIDIA’s Com-
pute Unified Device Architecture (CUDA) 1 , and explore the threat space. We derive and describe previously-
undisclosed memory protection and translation processes in CUDA, successfully mount several attacks on
the CUDA system, and identify numerous directions for future work. Our CUBAR suite of tools, especially
cudash (the CUDA Shell), form a solid platform for research to come.

1 Introduction using version 195.36.15 of the CUDA software. Hard-

ware included a GTS 360M, a GeForce 8400 GS, and a
Within modern desktops, high-performance workstations,
GT 9600. Our source code can be found on GitHub2 .
and even laptops there exist sources of tremendous com-
puting power: programmable, massively parallel graph- 2 An overview of CUDA
ics cards. The hundreds of simple in-order cores found
on recent NVIDIA and ATI hardware provide many more A GPU-accelerated CUDA application on Linux adds
peak FLOPS than the x86 processor packages with which four component parts to a typical process:
they’re commonly coupled. Devices such as NVIDIA’s • A “fat binary” containing host and guest object code,
Tesla™ have already traded video outputs for onboard typically x86 and PTX [20]. Note that PTX is a mere
memory, giving rise to “desktop supercomputing.” Those intermediate representation, converted to the native
who would fully utilize their machines can no more ignore CUBIN format at runtime.
heterogeneous processing than multiple cores or SIMD. • The CUDA userspace library libcuda.so. This
But is it safe? GPUs have traditionally accelerated closed-source archive provides an interface to the
single-instance, interactive tasks (such as video games NVIDIA driver, and management of a CUDA in-
and CAD) and decomposable, compute-bound tasks (such stance.
as batch rendering) for which a deinterleaved schedule
• The NVIDIA driver nvidia.ko. This closed-source
is optimal. Hardware facilities for the support of multi-
module handles ioctls issued by libcuda.so.
ple threads or processes are primitive at best. When the
GPU was limited to blitting, fogging, and anti-aliasing • The hardware, controlled by registers and memory-
framebuffers, minimal isolation and assurance capabili- mapped IO.
ties were tolerable. Compromises of scientific data, cryp- No complete open-source CUDA solution yet exists.
tographic materials and SCADA systems are rather more The CUDA language is an extension of C++, designed
serious, yet these precise applications drive the adoption for compilation with NVIDIA’s nvcc [17] toolchain.
of GPGPU programming. NVIDIA states that a given device supports multiple host
We explored CUDA 3.0 on a 2.6.34-rc2 Linux kernel, threads, but that a given host thread can control only one
1 NVIDIA, CUDA, and GeForce are trademarks or registered trademarks of the NVIDIA Corporation.
2 http://github.com/dankamongmen/wdp/tree/master/cs8803ss-project/

1
device. Data must either be explicitly copied between (by default, /dev/nvidiaX); such privileges are neces-
system and video RAM or, on devices of Compute Ca- sary to compute on the device, and thus a safe assumption.
pability 1.2 or higher, restricted to shared memory maps Escalation to device access is an attack on the operating
[19]. GPU processes (“kernels”) can be launched asyn- system’s access control, and outside the scope of this pa-
chronously, though (until Compute Capability 2.0) only per.
one kernel can be executed at a time by a given device. We seek to answer the following questions, for ma-
CUDA supports numerous memory “spaces”, selected chines with one or more CUDA-capable cards:
via PTX affixes (until the advent of unified addressing in
Compute Capability 2.0, presumably effected via memory 3.1 Spoofing
translation). Spaces differ according to caching, method
• Is it possible for one CUDA kernel to preëmpt data
of initialization, visibility across the device’s multiproces-
copies requested from another?
sors, and mutability. Official documentation does not ad-
dress memory protection or translation. 3.2 Tampering
CUDA kernels are typically distributed as JIT-friendly
PTX binaries [10], an intermediate representation suit- • Can one CUDA kernel manipulate another’s data set?
able for all NVIDIA hardware. Upon being loaded onto • Is it possible to construct a debugging environment
the card, dynamic compilation3 is performed, resulting in around other CUDA kernels?
a locally-optimized CUBIN blob. These blobs are dis- • Can a CUDA kernel modify the active display?
patched to Streaming Multiprocessors across the card, all
• Is it possible to pervert compilation processes, either
of which share a common memory. A given system thread
nvcc or JIT?
can use only one device at a time, but a device may be
used by more than one system thread. 3.3 Repudiation
In our case, the nvidia.ko kernel module and
libcuda.so library weigh in at thirteen and seven • Can a CUDA kernel disassociate itself from the sys-
megabytes respectively. We can assume them to contain tem process which spawned it?
substantial logic. Ultimately, some of our questions can • Can a CUDA kernel spawn new CUDA kernels?
be answered only via analysis of these binaries. This is a • What forensic data, if any, is created as a result of
matter of some importance for open source projects seek- CUDA computing?
ing to duplicate CUDA functionality, such as the Nouveau
Project and libcudest [2]. 3.4 Information disclosure
2.1 Existing controls • Can a CUDA kernel read another kernel’s data set?
The /dev/nvidiaX device nodes control access to the Need they be simultaneously scheduled for this to oc-
CUDA hardware and kernelspace component. Under the cur?
standard Linux security model, these will be restricted via • Is it possible to read another CUDA kernel’s code,
0660 permissions to a group (typically video). Mem- even if it cannot be controlled?
bership in the owning group is thus necessary and suffi- • Is it possible to read the system memory of another
cient to satisfy the operating system. strace output for CUDA application via calls through the CUDA inter-
CUDA applications, along with NVIDIA’s nvidia-smi mediary?
device configuration program, show use of the geteuid • Is it possible to reconstruct the system’s video channel
system call, but no further access controls are exported to from an arbitrary CUDA kernel? What about textures?
the user.
3.5 Denial of service
3 Threat model
• Can a CUDA kernel monopolize resources in the face
Well-known threat taxonomies include the “CIA Triad”
of competitors?
(extended by the “Parkerian Hexad [3]”) and Microsoft’s
STRIDE. The latter, developed as part of Microsoft’s • Can a CUDA kernel prevent another from being con-
Secure Product Lifecycle, is designed for system-centric trolled, or executing data transfers to or from the sys-
threat modeling and suitable for our purposes. We assume tem?
that an attacker has access to the NVIDIA device node(s) • Can a CUDA kernel deny resources beyond the GPU?
3 Likely performed in the driver, not the hardware, though we have not yet verified this.

2
3.6 Escalation of privilege we implemented our own context-multiplexing.
The PTX Reference provides some details. Load and
• Might the driver be exploited, allowing arbitrary ring
store instructions require a “state space” modifier in addi-
0 code to run?
tion to an address. General-purpose registers (r0–r127)
• If it is possible to return doppelgänger data, might it are indexed via 7-bit immediates. Global memory, ac-
be leveraged to attack (and hopefully exploit) system- cessed via one of 16 linear ranges of up to 4GB each (g0–
side processes? g15), is addressed via the contents of a 32-bit general
• Is it possible to arbitrarily (i.e. without exploitation) purpose register4 . Constant memory is referenced through
manipulate another CUDA kernel’s code? Is it possi- one of 16 linear ranges of up to 64KB each (c0–c15), as
ble to construct a CUDA virus? is a block-shared region of up to 16KB. Per-thread local
• Is it possible to arbitrarily (i.e. without exploitation) memory, an abstraction atop the global memory, performs
manipulate a system process’s code maps from a address translation based on block and thread indices.
CUDA kernel? 4.1 Tools
4 Methodology Beyond the basic toolchain, we made use of:
• vbtracetool [15] to dump video BIOS,
Some of the questions asked by our threat model can be
answered via simple experiments (others, such as whether • cuda-gdb [18] to debug CUDA programs,
the driver might be exploited, are essentially undecid- • strace [12] to track system calls,
able). Describing the internal mechanisms implementing • nv50dis [11] to disassemble nv50 binaries,
these externally-visible properties generally requires dis-
• nvtrace [16] to decipher ioctls issued to the
assembly, and is the object of further work.
NVIDIA proprietary driver, and
It is first necessary to develop a model of memory
protection, multiple user contexts, and memory layout • the Linux kernel’s MMIOTrace [21] infrastructure to
in CUDA. Few details have been made public regarding track memory-mapped I/O operations.
these topics; what public knowledge exists takes the form We then developed some tools:
of scattered fora posts, Wikis [14], and mailing lists. Fore- • cudadump (and its helper binary, cudaranger) to
most among these is memory protection; a trusted multi- discover readable regions in a given virtual memory,
process computing base cannot be constructed in its ab-
• cudash, the CUDA Shell, to perform close-in exper-
sence. Memory protection for host accesses of the GPU
iments and prototype attacks, and
could be implemented at three levels, each more effective
than the last: • cudabounder, cudaminimal, cudapinner,
cudaquirky, cudaspawner, and cudastuffer,
• the proprietary CUDA and OpenGL libaries, a series of single-purpose attack tools.
• the proprietary nvidia.ko kernel driver, and
• on the hardware itself.
Userspace protection can likely be thwarted by userspace
code, whereas protection implemented within the kernel
module ought be secure against all but ring 0 operations
(recall that our threat model does not assume access to
ring 0). It is unlikely that protection on the card itself can
be generally circumvented. Furthermore, this is the only
place to protect memory from CUDA kernels.
As an example of the futility of userspace protec-
tion, we were trivially able (contrary to NVIDIA doc-
umentation) to control multiple devices from a single
host thread. Use of the ltrace library call tracer in-
dicated CUDA context association to be performed via
pthread key t thread-specific data. By interposition-
ing ourselves between the binary and libpthread.so,
4 Platforms such as the Tesla™ C1060 provide the full 4GB of accessible memory. Compute Capability 2.0 unifies addressing and extends

addresses to 64 bits. 40 physical bits are currently supported for up to 1TB of RAM.

3
5 Attacking CUDA (A dialogue) (1MB) distinct from either TLB size (4KB and 8MB), we
We move now beyond the realm of the documented. consider this to imply a backing store in video RAM, and
that memory protection is a single bit per entry.
5.1 Von Neumann or Harvard architecture? Members of the cuMemset*() family of functions,
CUDA appears to be a Harvard architecture. Kernels do when given an invalid device address, neither return an
occupy video memory, as can be verified by dumping the error nor segfault. Instead, later context functions re-
video RAM. We assert that, by default, a given kernel’s turn a 700 (“previous kernel failed”) error. This strongly
code is neither readable nor writeable using the global suggests that memory protection is not modeled in soft-
state space. The former was tested by verifying that global ware. Further research, especially involving shared,
state spaces checksummed to the same value over distinct, pinned maps, ought investigate this further.
subsequent kernels (all of which operated strictly on the 5.4 Does memory protection incorporate contexts?
shared memory space). The latter was tested by revers-
ing all possible bits in the global memory space, and then CUDA hardware considers contexts. To ensure that mem-
performing a series of calculations. It is possible that un- ory isolation was not being implemented purely through
documented opcodes can retrieve or modify code. translation, we allocated a large region of memory in one
This is puzzling: the Harvard architecture’s primary context, and operated on it. We then ensured that a dis-
advantage is the ability to fetch instructions and data into tinct, simultaneous context failed accessing any possible
the CPU simultaneously. We suggest that coherence sim- address. This suggests that some manner of unforgeable
plification, combined with the weak memory assurances capability [5] lies at the heart of a CUDA context.
of the global memory space, motivated this solution. Here, we make a controversial conjecture: these ca-
pabilities can be forged. We draw this conclusion from
5.2 Is virtual memory implemented? the interaction of CUDA and the fork system call. A
CUDA implements virtual memories. We conclusively CUDA application’s system resources are wholly accessi-
demonstrated this by examining allocation results in mul- ble by a child process up until being unmapped via exec
tiple concurrent CUDA contexts. Returned device point- or some other system call. Furthermore, we verified allo-
ers are equivalent for equivalent allocations in multiple cations or kernels executed in either process following the
contexts. Allocations begin at 0x101000, honor a mini- fork to be visible in the other. We were unable to forge a
mum alignment of 256 bytes (larger alignments are hon- context, but think it likely that disassembly of the CUDA
ored for very large allocations), and otherwise move con- stack would make a method plain. It is likely that some
tiguously through memory. Freed regions can be re- memory-mapped I/O is involved independent of the con-
claimed. This suggests multilevel memory allocation, text object itself; fork, which preserves maps, thus nat-
split between user and kernelspace. urally facilitates context-forging (the /dev/nvidiaX
Physical memory cannot be aliased or oversubscribed; maps, as can be seen in /proc/X/maps, are mapped
multiple contexts’ maximum allocations cannot add up to MAP SHARED and thus not subject to Copy-on-Write be-
more than a single context’s maximum possible alloca- havior).
tions.
5.5 Is memory scrubbed between kernels?
5.3 Does memory protection exist?
CUDA does not scrub memory. This was demonstrated
CUDA enforces memory protection in hardware. Use of conclusively by verifying that a large (8MB) random
a general-purpose, word-sized register when referencing string could be recovered, contiguously and in its entirety,
the global space means kernels’ memory accesses can- by a subsequent kernel. It was not possible to determine
not be preverified, and the possibility of software-assisted whether code regions were scrubbed, since CUDA does
memory protection can be eliminated by analyzing details not support indirect branches. The cudawrapper [7]
of memory transactions relative to the device clock [25]. tool has been developed to provide scrubbing, but the au-
The use of two-level TLBs has been illustrated in previous thors consider it highly dubious that this userspace wrap-
work [4]. We verified memory protection to be effective per could successfully deal with an uncatchable signal.
beyond the combined capacities of the TLBs, and note a
5.6 Can kernels disassociate from processes?
L2 TLB miss to be only about half again as expensive as
a L2 TLB hit. Thus we assert that memory protection is The CUDA kernelspace tracks processes. This was con-
performed independently of, and in parallel with, address clusively demonstrated by sending SIGKILL signals to
translation. This requires virtual- (and thus per-context) CUDA processes after they had performed large alloca-
indexing; together with a memory protection granularity tions and launched intense, long-running kernels. There

4
was no opportunity for userspace code to run, yet these RLIMIT MEMLOCK limit (the default value on Debian
resources were properly freed upon process termination. Linux is 64KB). CUDA, allocating system memory from
5.7 Can a kernel fork? within kernelspace, does not honor this limit. This more
than once resulted in Linux’s “OOM (Out-of-Memory)
CUDA kernels do not appear capable of forking. A pri- killer” delivering SIGKILL to firefox or even sshd
mum movens in the form of a host-system CPU is required processes during testing.
to launch a CUDA kernel. The kernel-launching interface
must, therefore, be visible to the host. There is no com- 5.11 Any miscellaneous security issues?
pelling reason to then duplicate this interface within the CUDA introduces sundry security issues. The CUDA in-
GPU. staller (which must be run as root throughout) contacts an
Until the mechanism used to launch a kernel is known, anonymous FTP server to check for updated versions of
however, we cannot be sure that the following loophole the driver. This is open to any number of classic man-in-
does not exist: devices of Compute Capability 1.2 or the-middle attacks, leaving the user vulnerable to trojans.
higher can map host memory into a context’s address Even if the download was signed, running the downloader
space. If kernels are launched entirely via memory- as root leaves the system vulnerable to user agent exploits.
mapped I/O transactions, and the map through which this The nvcc compiler does not support a -pipe option
is done is mapped again, this time into shared memory, ala GCC, instead forcing use of temporary files for multi-
device code might be able to manipulate the device. At phase operations. The temporary names used are wholly
this point, all bets are off. predictable, suggesting vulnerability to a class of symlink
5.8 Does CUDA produce forensics? attack. Successfully attacking this scheme via named FI-
FOs, and thus providing a -pipe capability for nvcc, is
CUDA provides insufficient forensic data. CUDA kernels left as an exercise for the reader.
are not tied into the process accounting mechanism. Ex- The ability of any CUDA application to monopo-
ceptions are delivered to klogd, but rate-limited within the lize device memory facilitates “use-of-NULL” attacks
kernel itself; it would be trivial to hide a meaningful ex- throughout the graphics stack, much like those which have
ception behind a flurry of meaningless ones, as this policy plagued the Linux kernel in 2009 and 2010. Below, see
cannot be overridden. No system exists to log who ran two corrupted displays. It is doubtful that our CUDA
what kernel, or for how long. application actively molested the display, since it simply
The nvidia-smi tool claims to decode and report allocated device memory; more likely, the programs ig-
exceptions as stored in a hardware buffer, similar to the nored some allocation failure.
Machine Check Exception registers of x86. We were not, Behold the grim visage of undefined behavior:
however, able to generate any informative output using
nvidia-smi.
5.9 Does CUDA enforce resource limits?
CUDA provides no resource limits. UNIX provides a
rich set of resource limits via the rlimit mechanism.
These are just as important for protecting against pro-
(a) Corrupted AA (b) Pixellated VDPAU
gramming errors as assuring fair resource distribution.
CUDA neither honors the existing infrastructure, nor pro-
vides any of its own. Device memory allocations do not 6 Defending CUDA
count against RLIMIT AS, kernel execution time does Implementors of competing CUDA stacks must ensure
not count against RLIMIT CPU, and executing kernels that their systems implement the relatively successful
are not bounded by RLIMIT NPROC. memory protections extant in the NVIDIA solution. The
5.10 Can CUDA deny access to system resources? following properties ought be considered in designs, and
verified:
CUDA allows system RAM to be monopolized. Devices
of Compute Capability 1.2 or higher support mapping • CUDA manifests a Harvard architecture atop what is
host memory into a device’s address space. For this likely a Von Neumann unified memory. It is neces-
to be done, the memory must be pinned (also known sary that code be properly protected, especially under
as locked), and thus protected against swapping. Since Compute Capability 2.0’s concurrent kernels.
this makes physical memory unusable by the rest of • CUDA performs memory translation and protection,
the system, memory pinning is strictly limited using the almost certainly in hardware. It is necessary that these

5
mechanisms be properly configured, and that proper • Further investments in cache — and thus, hopefully,
invalidations are performed in the face of remappings. fewer memory delays — serve only to approach
Concurrent kernels must not be able to freely read more closely a device’s theoretical peak. That peak
each other’s memory. itself is a function of architecture, not of programs
• CUDA monitors processes from within kernelspace, or their access patterns.
freeing their device resources on termination. This
must be duplicated, and must be performed in ker- • Investments in OOO apparatus (reorder buffers,
nelspace to be effective. frontend reservation stations, register renaming)
• It ought be ensured that device control maps are not yield diminishing returns due to the profound limi-
shared with device code via a secondary mapping. tations of instruction level parallelism [6]. Like im-
This code must be located in kernelspace. provements to cache, OOO can only hide delays,
not find new FLOPS.
• It would be valuable for CUDA to make available
more forensic information. In particular, the standard
• Denser chip-multithreading similarly serves only to
UNIX process accounting mechanism ought be hon-
hide latency.
ored. Furthermore, rate-limiting ought use the stan-
dard, configurable klogd policies.
• Larger chips require either pipelined wires or high
• CUDA resources ought be accounted for by the voltages to operate reliably, and place strict require-
rlimit infrastructure. It is absolutely critical that ments on clock-signaling circuity. Efficient power
this be done for mapped host memory. management requires complex and expensive par-
7 Conclusions tial power gating5 .

The recommendations of the Orange Book [13], require- We see that FLOPS — at any price — can be had only
ments of the Common Criteria [9], and animadversions by adding cores or extending SIMD order. The former
of Saltzer and Schroeder [22] have long provided general is obviously easier with small, in-order, SISD cores, such
principles for security-conscious design. These principles as the unified shaders of a modern GPU; duplicating a
have been at least superficially observed along the way Nehalem core is a much more daunting process. As for
to NVIDIA’s CUDA system. Reverse engineering of the the latter, note that Intel’s “Sandy Bridge” microarchitec-
CUDA software stack and validation of open-source com- ture is expected to add the AVX instruction scheme and its
petition will reveal how truly sound the implementations 256-bit YMM vector registers [8] [1]. The data-based de-
may be. Modern GPUs are some of the world’s most pow- composition of CUDA’s “SIMT” model can immediately
erful devices. Harnessing these mighty processing units is take advantage of their new cores (given sufficiently large
necessary to even approach full machine utilization. This problem sets, of course), whereas x86 binaries6 would re-
will become only more true: quire dynamic translation or recompilation to take advan-
tage of the new SIMD order.
• Increasing the clock frequency affects cooling and
Extensive manycoring and huge memory bandwidths
power requirements (dynamic power consumption
are certainly the key to FLOPS. Without security-
is proportional to frequency). Furthermore, clock
conscious analysis and verification of infrastructure and
frequency increases tend to force decomposition of
policy, they must not be trusted in a multiprocessing con-
pipeline stages, exacerbating branch misprediction
text. As GPUs become more and more general-purpose,
delays and pressuring any OOO system [24].
the attack space will grow; Compute Capability 2.0 al-
• Increasing the issue width, or duplicating functional ready adds significant complexity and danger. The only
units, either requires ISA changes (for VLIW) or answer is constant vigilance.
massive frontend resources for lookahead, disam-
References
biguation, and hazard tracking. Assuming these ex-
penses acceptable, diminishing returns result from [1] AMD C ORPORATION. AMD64 Architecture Pro-
limitations of instruction-level parallelism. Further- grammer’s Manual Volume 6: 128-Bit and 256-
more, very wide issues lead to sparse code flows, Bit XOP and FMA4 Instructions 3.04. No. 43479.
taxing the instruction store subsystem. November 2009.
5 For instance, feeding µops from Nehalem’s Loop Stream Detector results in power-down of the leading three pipeline stages.
6 Most of them, anyway.

6
[2] B LACK , N. libcudest. http://dank.qemfd. [15] N OUVEAU P ROJECT. vbtracetool. http:
net/dankwiki/index.php/Libcudest, //nouveau.freedesktop.org/wiki/
2010. DumpingVideoBios, 2009–.
[3] B OSWORTH , S., AND K ABAY, M. E. The Computer [16] N OUVEAU P ROJECT. nvtrace. http:
Security Handbook. John Wiley & Sons, Inc., New //nouveau.freedesktop.org/wiki/
York, NY, USA, 2002, ch. 5. Nvtrace, 2009.
[4] D EMMEL , J. W., AND VOLKOV, V. Benchmark-
ing GPUs to tune dense linear algebra. ACM/IEEE [17] NVIDIA C ORPORATION. The CUDA Compiler
conference on Supercomputing (2008). Driver NVCC. 2007–2010.

[5] FABRY, R. S. Capability-based addressing. Com- [18] NVIDIA C ORPORATION. CUDA-GDB (NVIDIA
munications of the ACM (1974), 403–412. CUDA Debugger) User Manual 3.0. No. PG-00000-
004. January 2010.
[6] H ENNESSY, J. L., AND PATTERSON , D. A. Com-
puter Architecture: A Quantitative Approach (The [19] NVIDIA C ORPORATION. NVIDIA CUDA™ Pro-
Morgan Kaufmann Series in Computer Architecture gramming Guide 3.0. 2010.
and Design). Morgan Kaufmann, May 2002.
[20] NVIDIA C ORPORATION. NVIDIA PTX: Parallel
[7] I NNOVATIVE S YSTEMS L AB , NATIONAL C EN -
Thread Execution ISA Version 2.0. 2010.
TER FOR S UPERCOMPUTING A PPLICATIONS.
cuda wrapper. http://cudawrapper.
[21] PAALANEN , P., M UIZELAAR , J., I NTEL C OR -
sourceforge.net/, 2009.
PORATION , AND N OUVEAU P ROJECT . In-kernel
[8] I NTEL C ORPORATION. Intel® Advanced Vector Ex- memory-mapped i/o tracing. http://nouveau.
tensions Programming Reference. No. 319433-006. freedesktop.org/wiki/MmioTrace,
July 2009. 2003–.

[9] ISO 15408-1:2009. Common criteria 3.1: Infor- [22] SALTZER, J. H., AND SCHROEDER, M. D.
mation technology — security techniques — evalu- The protection of information in computer systems.
ation criteria for it security. In Fourth ACM Symposium on Operating System
[10] K ERR , A., D IAMOS , G., AND YALAMANCHILI , Principles (1973).
S. A characterization and analysis of PTX ker-
[23] S HOSTACK , A. Experiences threat modeling at Mi-
nels. IEEE Workload Characterization Symposium
crosoft. Workshop on Modeling Security (September
0 (2009), 3–12.
2008).
[11] KO ŚCIELNICKI , M. W. nv50dis. http://0x04.
net/cgit/index.cgi/nv50dis, 2009–. [24] S PRANGLE , E., AND C ARMEAN , D. Increas-
ing processor performance by implementing deeper
[12] K RANENBURG , P., AND M C G RATH , R. strace. pipelines. Proceedings of the 29th annual interna-
http://sourceforge.net/projects/ tional symposium on Computer architecture (ISCA
strace/, 1990–. ’02) 30 (2002).
[13] NATIONAL C OMPUTER S ECURITY C ENTER.
[25] W ONG , H., PAPADOPOULOU , M.-M.,
Trusted Computer System Evaluation Criteria
S ADOOGHI -A LVANDI , M., AND M OSHOVOS ,
(Orange Book). No. DOD 5200.28-STD. 1985.
A. Demystifying GPU microarchitecture through
[14] N OUVEAU P ROJECT. Nouveau project wiki: microbenchmarking. IEEE International Sym-
Cuda. http://nouveau.freedesktop. posium on Performance Analysis of Systems and
org/wiki/CUDA, 2009–. Software (March 2010).

7
A strace(2)d CUDA binary
1 execve("C/bin/linux/release/deviceQueryDrv", ["C/bin/linux/release/deviceQueryD"...], [/* 45 vars */]) = 0
2 brk(0) = 0x1b29000
3 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fdcc8571000
4 access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
5 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fdcc856f000
6 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
7 open("/etc/ld.so.cache", O_RDONLY) = 3
8 fstat(3, {st_mode=S_IFREG|0644, st_size=86224, ...}) = 0
9 mmap(NULL, 86224, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fdcc8559000
10 close(3) = 0
11 access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
12 open("/usr/lib/libcuda.so.1", O_RDONLY) = 3
13 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\20\375\7\0\0\0\0\0"..., 832) = 832
14 fstat(3, {st_mode=S_IFREG|0755, st_size=7404990, ...}) = 0
15 mmap(NULL, 8623832, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fdcc7d1f000
16 mprotect(0x7fdcc8399000, 1044480, PROT_NONE) = 0
17 mmap(0x7fdcc8498000, 618496, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x679000) = 0x7fdcc8498000
18 mmap(0x7fdcc852f000, 169688, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fdcc852f000
19 close(3) = 0
20 access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
21 open("/usr/lib/libstdc++.so.6", O_RDONLY) = 3
22 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0P\243\245\266:\0\0\0"..., 832) = 832
23 fstat(3, {st_mode=S_IFREG|0644, st_size=1046720, ...}) = 0
24 mmap(0x3ab6a00000, 3223576, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3ab6a00000
25 mprotect(0x3ab6af6000, 2097152, PROT_NONE) = 0
26 mmap(0x3ab6cf6000, 36864, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xf6000) = 0x3ab6cf6000
27 mmap(0x3ab6cff000, 81944, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x3ab6cff000
28 close(3) = 0
29 access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
30 open("/lib/libm.so.6", O_RDONLY) = 3
31 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0p> \253:\0\0\0"..., 832) = 832
32 fstat(3, {st_mode=S_IFREG|0644, st_size=533472, ...}) = 0
33 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fdcc7d1e000
34 mmap(0x3aab200000, 2625752, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3aab200000
35 mprotect(0x3aab281000, 2093056, PROT_NONE) = 0
36 mmap(0x3aab480000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x80000) = 0x3aab480000
37 close(3) = 0
38 access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
39 open("/lib/libgcc_s.so.1", O_RDONLY) = 3
40 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0P- \266:\0\0\0"..., 832) = 832
41 fstat(3, {st_mode=S_IFREG|0644, st_size=93072, ...}) = 0
42 mmap(0x3ab6200000, 2186360, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3ab6200000
43 mprotect(0x3ab6216000, 2093056, PROT_NONE) = 0
44 mmap(0x3ab6415000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x15000) = 0x3ab6415000
45 close(3) = 0
46 access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
47 open("/lib/libc.so.6", O_RDONLY) = 3
48 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\320\353\241\252:\0\0\0"..., 832) = 832
49 fstat(3, {st_mode=S_IFREG|0755, st_size=1385152, ...}) = 0
50 mmap(0x3aaaa00000, 3487784, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3aaaa00000
51 mprotect(0x3aaab4a000, 2097152, PROT_NONE) = 0
52 mmap(0x3aaad4a000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x14a000) = 0x3aaad4a000
53 mmap(0x3aaad4f000, 18472, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x3aaad4f000
54 close(3) = 0
55 access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
56 open("/lib/libpthread.so.0", O_RDONLY) = 3
57 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\320X‘\253:\0\0\0"..., 832) = 832
58 fstat(3, {st_mode=S_IFREG|0755, st_size=134033, ...}) = 0
59 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fdcc7d1d000
60 mmap(0x3aab600000, 2208640, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3aab600000
61 mprotect(0x3aab616000, 2097152, PROT_NONE) = 0
62 mmap(0x3aab816000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x16000) = 0x3aab816000
63 mmap(0x3aab818000, 13184, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x3aab818000
64 close(3) = 0
65 access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
66 open("/usr/lib/libz.so.1", O_RDONLY) = 3
67 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260\"\240\253:\0\0\0"..., 832) = 832
68 fstat(3, {st_mode=S_IFREG|0644, st_size=96448, ...}) = 0
69 mmap(0x3aaba00000, 2188976, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3aaba00000
70 mprotect(0x3aaba17000, 2093056, PROT_NONE) = 0
71 mmap(0x3aabc16000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x16000) = 0x3aabc16000
72 close(3) = 0
73 access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
74 open("/lib/libdl.so.2", O_RDONLY) = 3
75 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\340\r\340\252:\0\0\0"..., 832) = 832

8
76 fstat(3, {st_mode=S_IFREG|0644, st_size=17504, ...}) = 0
77 mmap(0x3aaae00000, 2109696, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3aaae00000
78 mprotect(0x3aaae02000, 2097152, PROT_NONE) = 0
79 mmap(0x3aab002000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2000) = 0x3aab002000
80 close(3) = 0
81 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fdcc7d1c000
82 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fdcc7d1b000
83 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fdcc7d1a000
84 arch_prctl(ARCH_SET_FS, 0x7fdcc7d1a710) = 0
85 mprotect(0x3aab002000, 4096, PROT_READ) = 0
86 mprotect(0x3aab816000, 4096, PROT_READ) = 0
87 mprotect(0x3aaad4a000, 16384, PROT_READ) = 0
88 mprotect(0x3aab480000, 4096, PROT_READ) = 0
89 mprotect(0x3ab6cf6000, 28672, PROT_READ) = 0
90 mprotect(0x3aa961c000, 4096, PROT_READ) = 0
91 munmap(0x7fdcc8559000, 86224) = 0
92 set_tid_address(0x7fdcc7d1a7e0) = 4613
93 set_robust_list(0x7fdcc7d1a7f0, 0x18) = 0
94 futex(0x7fff6bcd716c, FUTEX_WAKE_PRIVATE, 1) = 0
95 futex(0x7fff6bcd716c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, NULL, 7fdcc7d1a710) = -1 EAGAIN
96 rt_sigaction(SIGRTMIN, {0x3aab605750, [], SA_RESTORER|SA_SIGINFO, 0x3aab60e990}, NULL, 8) = 0
97 rt_sigaction(SIGRT_1, {0x3aab6057e0, [], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0x3aab60e990}, NULL, 8) = 0
98 rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
99 getrlimit(RLIMIT_STACK, {rlim_cur=8192*1024, rlim_max=RLIM_INFINITY}) = 0
100 sched_get_priority_max(SCHED_RR) = 99
101 sched_get_priority_min(SCHED_RR) = 1
102 futex(0x3ab6cffb68, FUTEX_WAKE_PRIVATE, 2147483647) = 0
103 fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 5), ...}) = 0
104 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fdcc856e000
105 write(1, "CUDA Device Query (Driver API) s"..., 58) = 58
106 open("/proc/stat", O_RDONLY|O_CLOEXEC) = 3
107 read(3, "cpu 9758 0 3719 540454 5815 470"..., 8192) = 2079
108 close(3) = 0
109 brk(0) = 0x1b29000
110 brk(0x1b4a000) = 0x1b4a000
111 geteuid() = 1000
112 geteuid() = 1000
113 open("/dev/nvidiactl", O_RDWR) = 3
114 ioctl(3, 0xc04846d2, 0x7fff6bcd6160) = 0
115 ioctl(3, 0xc00446ca, 0x7fdcc85538e0) = 0
116 ioctl(3, 0xc60046c8, 0x7fdcc85532e0) = 0
117 ioctl(3, 0xc00c4622, 0x7fff6bcd61b0) = 0
118 ioctl(3, 0xc020462a, 0x7fff6bcd6190) = 0
119 geteuid() = 1000
120 open("/dev/nvidia0", O_RDWR) = 4
121 ioctl(3, 0xc048464d, 0x7fff6bcd5f00) = 0
122 open("/proc/interrupts", O_RDONLY) = 5
123 fstat(5, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
124 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fdcc856d000
125 read(5, " CPU0 CPU1 "..., 1024) = 1024
126 read(5, " 97 0 IO-APIC-fasteoi"..., 1024) = 1024
127 read(5, " 0 0 0 "..., 1024) = 1024
128 read(5, "7 13730 23433 Local "..., 1024) = 1024
129 read(5, " 0 0 0 "..., 1024) = 246
130 read(5, "", 1024) = 0
131 read(5, "", 1024) = 0
132 close(5) = 0
133 munmap(0x7fdcc856d000, 4096) = 0
134 ioctl(3, 0xc020462a, 0x7fff6bcd6110) = 0
135 ioctl(3, 0xc020462a, 0x7fff6bcd5f40) = 0
136 ioctl(3, 0xc020462a, 0x7fff6bcd5f40) = 0
137 geteuid() = 1000
138 open("/dev/nvidia0", O_RDWR) = 5
139 ioctl(3, 0xc048464d, 0x7fff6bcd5df0) = 0
140 open("/proc/interrupts", O_RDONLY) = 6
141 fstat(6, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
142 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fdcc856d000
143 read(6, " CPU0 CPU1 "..., 1024) = 1024
144 read(6, " 97 0 IO-APIC-fasteoi"..., 1024) = 1024
145 read(6, " 0 0 0 "..., 1024) = 1024
146 read(6, "7 13730 23433 Local "..., 1024) = 1024
147 read(6, " 0 0 0 "..., 1024) = 246
148 read(6, "", 1024) = 0
149 read(6, "", 1024) = 0
150 close(6) = 0
151 munmap(0x7fdcc856d000, 4096) = 0
152 ioctl(3, 0xc020462b, 0x7fff6bcd60e0) = 0

9
153 ioctl(5, 0xc0204637, 0x7fff6bcd6140) = 0
154 ioctl(5, 0xc0204637, 0x7fff6bcd6140) = 0
155 ioctl(3, 0xc020462a, 0x7fff6bcd6110) = 0
156 ioctl(3, 0xc020462a, 0x7fff6bcd5fb0) = 0
157 ioctl(3, 0xc020462a, 0x7fff6bcd5eb0) = 0
158 ioctl(3, 0xc020462a, 0x7fff6bcd5eb0) = 0
159 geteuid() = 1000
160 open("/dev/nvidia0", O_RDWR) = 6
161 ioctl(3, 0xc048464d, 0x7fff6bcd5d60) = 0
162 open("/proc/interrupts", O_RDONLY) = 7
163 fstat(7, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
164 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fdcc856d000
165 read(7, " CPU0 CPU1 "..., 1024) = 1024
166 read(7, " 97 0 IO-APIC-fasteoi"..., 1024) = 1024
167 read(7, " 0 0 0 "..., 1024) = 1024
168 read(7, "8 13731 23434 Local "..., 1024) = 1024
169 read(7, " 0 0 0 "..., 1024) = 246
170 read(7, "", 1024) = 0
171 read(7, "", 1024) = 0
172 close(7) = 0
173 munmap(0x7fdcc856d000, 4096) = 0
174 ioctl(3, 0xc014462d, 0x7fff6bcd6040) = 0
175 ioctl(3, 0xc020462a, 0x7fff6bcd6110) = 0
176 ioctl(5, 0xc0144632, 0x7fff6bcd5de0) = 0
177 ioctl(5, 0xc0144632, 0x7fff6bcd5de0) = 0
178 ioctl(5, 0xc0144632, 0x7fff6bcd5de0) = 0
179 ioctl(5, 0xc0204637, 0x7fff6bcd5de0) = 0
180 ioctl(3, 0xc020462a, 0x7fff6bcd5db0) = 0
181 ioctl(3, 0xc020462a, 0x7fff6bcd5db0) = 0
182 ioctl(5, 0xc0204637, 0x7fff6bcd5de0) = 0
183 ioctl(5, 0xc0204637, 0x7fff6bcd5de0) = 0
184 ioctl(3, 0xc020462a, 0x7fff6bcd5db0) = 0
185 ioctl(3, 0xc020462a, 0x7fff6bcd5db0) = 0
186 ioctl(5, 0xc0204637, 0x7fff6bcd5de0) = 0
187 ioctl(3, 0xc020462a, 0x7fff6bcd5db0) = 0
188 ioctl(3, 0xc020462a, 0x7fff6bcd5db0) = 0
189 ioctl(3, 0xc020462a, 0x7fff6bcd5db0) = 0
190 ioctl(3, 0xc020462a, 0x7fff6bcd5db0) = 0
191 ioctl(3, 0xc020462a, 0x7fff6bcd5f50) = 0
192 ioctl(3, 0xc020462b, 0x7fff6bcd6020) = 0
193 ioctl(3, 0xc030464e, 0x7fff6bcd6010) = 0
194 mmap(NULL, 4096, PROT_READ, MAP_SHARED, 6, 0xf2009000) = 0x7fdcc856d000
195 ioctl(3, 0xc020462a, 0x7fff6bcd6180) = 0
196 write(1, "There is 1 device supporting CUD"..., 34) = 34
197 ioctl(3, 0xc020462a, 0x7fff6bcd6dc0) = 0
198 write(1, "\n", 1) = 1
199 write(1, "Device 0: \"GeForce GTS 360M\"\n", 29) = 29
200 write(1, " CUDA Driver Version: "..., 53) = 53
201 write(1, " CUDA Capability Major revision"..., 51) = 51
202 write(1, " CUDA Capability Minor revision"..., 51) = 51
203 ioctl(3, 0xc098464a, 0x7fff6bcd6e10) = 0
204 write(1, " Total amount of global memory:"..., 66) = 66
205 write(1, " Number of multiprocessors: "..., 52) = 52
206 write(1, " Number of cores: "..., 52) = 52
207 write(1, " Total amount of constant memor"..., 61) = 61
208 write(1, " Total amount of shared memory "..., 61) = 61
209 write(1, " Total number of registers avai"..., 55) = 55
210 write(1, " Warp size: "..., 52) = 52
211 write(1, " Maximum number of threads per "..., 53) = 53
212 write(1, " Maximum sizes of each dimensio"..., 64) = 64
213 write(1, " Maximum sizes of each dimensio"..., 67) = 67
214 write(1, " Maximum memory pitch: "..., 66) = 66
215 write(1, " Texture alignment: "..., 59) = 59
216 ioctl(3, 0xc020462a, 0x7fff6bcd6c90) = 0
217 ioctl(3, 0xc020462a, 0x7fff6bcd6c90) = 0
218 ioctl(3, 0xc020462a, 0x7fff6bcd6c90) = 0
219 ioctl(3, 0xc020462a, 0x7fff6bcd6d30) = 0
220 ioctl(3, 0xc020462a, 0x7fff6bcd6d30) = 0
221 write(1, " Clock rate: "..., 58) = 58
222 write(1, " Concurrent copy and execution:"..., 53) = 53
223 ioctl(3, 0xc020462a, 0x7fff6bcd6e20) = 0
224 write(1, " Run time limit on kernels: "..., 53) = 53
225 write(1, " Integrated: "..., 52) = 52
226 write(1, " Support host page-locked memor"..., 53) = 53
227 ioctl(3, 0xc020462a, 0x7fff6bcd6e20) = 0
228 write(1, " Compute mode: "..., 116) = 116
229 write(1, "\n", 1) = 1

10
230 write(1, "PASSED\n", 7) = 7
231 write(1, "\n", 1) = 1
232 write(1, "Press ENTER to exit...\n", 23) = 23
233 fstat(0, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 5), ...}) = 0
234 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fdcc856c000
235 read(0, "\n", 1024) = 1
236 exit_group(0) = ?

11
B libcuda.so.195.36.15 3.0 symbols
1 #generated via nm -S -D -g -a -C --defined-only /usr/lib/libcuda.so
2 0000000000102120 0000000000000119 T clGetExtensionFunctionAddress
3 00000000000f03b0 0000000000000005 T clGetPlatformInfo
4 0000000000111150 0000000000000272 T cuArray3DCreate
5 0000000000110ed0 0000000000000272 T cuArray3DGetDescriptor
6 00000000001118a0 0000000000000272 T cuArrayCreate
7 00000000001113d0 000000000000024d T cuArrayDestroy
8 0000000000111620 0000000000000272 T cuArrayGetDescriptor
9 0000000000119c00 0000000000000272 T cuCtxAttach
10 000000000011a0d0 000000000000028e T cuCtxCreate
11 0000000000119e80 000000000000024d T cuCtxDestroy
12 00000000001199b0 000000000000024d T cuCtxDetach
13 00000000001192c0 000000000000024d T cuCtxGetDevice
14 0000000000119510 000000000000024d T cuCtxPopCurrent
15 0000000000119760 000000000000024d T cuCtxPushCurrent
16 000000000010a6a0 000000000000018c T cuCtxSynchronize
17 000000000011aaf0 000000000000028e T cuDeviceComputeCapability
18 000000000011b260 0000000000000272 T cuDeviceGet
19 000000000011a360 000000000000028e T cuDeviceGetAttribute
20 000000000011b010 000000000000024d T cuDeviceGetCount
21 000000000011ad80 000000000000028e T cuDeviceGetName
22 000000000011a5f0 0000000000000272 T cuDeviceGetProperties
23 000000000011a870 0000000000000272 T cuDeviceTotalMem
24 000000000011b4e0 000000000000024d T cuDriverGetVersion
25 000000000010d1b0 0000000000000272 T cuEventCreate
26 000000000010c850 000000000000024d T cuEventDestroy
27 000000000010c5c0 000000000000028e T cuEventElapsedTime
28 000000000010ccf0 000000000000024d T cuEventQuery
29 000000000010cf40 0000000000000261 T cuEventRecord
30 000000000010caa0 000000000000024d T cuEventSynchronize
31 0000000000111da0 000000000000028e T cuFuncGetAttribute
32 00000000001122b0 00000000000002b2 T cuFuncSetBlockShape
33 0000000000111b20 0000000000000272 T cuFuncSetCacheConfig
34 0000000000112030 0000000000000272 T cuFuncSetSharedSize
35 000000000011b8c0 000000000000028e T cuGLCtxCreate
36 000000000011b730 000000000000018c T cuGLInit
37 000000000011c7a0 000000000000028e T cuGLMapBufferObject
38 000000000011bdc0 00000000000002aa T cuGLMapBufferObjectAsync
39 000000000011ca30 0000000000000253 T cuGLRegisterBufferObject
40 000000000011c070 0000000000000261 T cuGLSetBufferObjectMapFlags
41 000000000011c540 0000000000000253 T cuGLUnmapBufferObject
42 000000000011bb50 0000000000000261 T cuGLUnmapBufferObjectAsync
43 000000000011c2e0 0000000000000253 T cuGLUnregisterBufferObject
44 000000000010aa90 0000000000000272 T cuGetExportTable
45 000000000011cf50 000000000000028e T cuGraphicsGLRegisterBuffer
46 000000000011cc90 00000000000002b2 T cuGraphicsGLRegisterImage
47 000000000010afa0 0000000000000286 T cuGraphicsMapResources
48 000000000010b4b0 000000000000028e T cuGraphicsResourceGetMappedPointer
49 000000000010b230 0000000000000272 T cuGraphicsResourceSetMapFlags
50 000000000010b740 00000000000002b2 T cuGraphicsSubResourceGetMappedArray
51 000000000010ad10 0000000000000286 T cuGraphicsUnmapResources
52 000000000010ba00 000000000000024d T cuGraphicsUnregisterResource
53 000000000010a830 0000000000000253 T cuInit
54 000000000010d970 000000000000024d T cuLaunch
55 000000000010d6e0 000000000000028e T cuLaunchGrid
56 000000000010d430 00000000000002aa T cuLaunchGridAsync
57 0000000000117930 0000000000000272 T cuMemAlloc
58 0000000000116ee0 0000000000000272 T cuMemAllocHost
59 0000000000117650 00000000000002d4 T cuMemAllocPitch
60 00000000001173f0 0000000000000253 T cuMemFree
61 0000000000116c90 000000000000024d T cuMemFreeHost
62 0000000000117160 000000000000028e T cuMemGetAddressRange
63 0000000000081ef0 000000000000005f T cuMemGetAttribute
64 0000000000117bb0 0000000000000272 T cuMemGetInfo
65 0000000000116a00 000000000000028e T cuMemHostAlloc
66 0000000000116770 000000000000028e T cuMemHostGetDevicePointer
67 00000000001164f0 0000000000000272 T cuMemHostGetFlags
68 0000000000114d10 000000000000024d T cuMemcpy2D
69 0000000000113850 0000000000000261 T cuMemcpy2DAsync
70 0000000000114ac0 000000000000024d T cuMemcpy2DUnaligned
71 0000000000114870 000000000000024d T cuMemcpy3D
72 00000000001135e0 0000000000000261 T cuMemcpy3DAsync
73 0000000000114f60 00000000000002d4 T cuMemcpyAtoA
74 00000000001157c0 00000000000002b1 T cuMemcpyAtoD
75 0000000000115240 00000000000002b2 T cuMemcpyAtoH

12
76 0000000000113ac0 00000000000002cc T cuMemcpyAtoHAsync
77 0000000000115a80 00000000000002b2 T cuMemcpyDtoA
78 0000000000115d40 000000000000028d T cuMemcpyDtoD
79 0000000000114060 00000000000002aa T cuMemcpyDtoDAsync
80 0000000000115fd0 000000000000028e T cuMemcpyDtoH
81 0000000000114310 00000000000002aa T cuMemcpyDtoHAsync
82 0000000000115500 00000000000002b9 T cuMemcpyHtoA
83 0000000000113d90 00000000000002cc T cuMemcpyHtoAAsync
84 0000000000116260 000000000000028d T cuMemcpyHtoD
85 00000000001145c0 00000000000002aa T cuMemcpyHtoDAsync
86 00000000001130a0 000000000000029d T cuMemsetD16
87 0000000000112850 00000000000002d5 T cuMemsetD2D16
88 0000000000112570 00000000000002d3 T cuMemsetD2D32
89 0000000000112b30 00000000000002d5 T cuMemsetD2D8
90 0000000000112e10 000000000000028d T cuMemsetD32
91 0000000000113340 000000000000029c T cuMemsetD8
92 0000000000118380 000000000000028e T cuModuleGetFunction
93 00000000001180c0 00000000000002b9 T cuModuleGetGlobal
94 0000000000117e30 000000000000028e T cuModuleGetTexRef
95 0000000000119040 0000000000000272 T cuModuleLoad
96 0000000000118dc0 0000000000000272 T cuModuleLoadData
97 0000000000118ae0 00000000000002d4 T cuModuleLoadDataEx
98 0000000000118860 0000000000000272 T cuModuleLoadFatBinary
99 0000000000118610 000000000000024d T cuModuleUnload
100 000000000010e660 0000000000000272 T cuParamSetSize
101 000000000010dbc0 000000000000028e T cuParamSetTexRef
102 000000000010e110 00000000000002b5 T cuParamSetf
103 000000000010e3d0 000000000000028e T cuParamSeti
104 000000000010de50 00000000000002b9 T cuParamSetv
105 000000000010c340 0000000000000272 T cuStreamCreate
106 000000000010bc50 0000000000000245 T cuStreamDestroy
107 000000000010c0f0 0000000000000245 T cuStreamQuery
108 000000000010bea0 0000000000000245 T cuStreamSynchronize
109 0000000000110c80 000000000000024d T cuTexRefCreate
110 0000000000110a30 000000000000024d T cuTexRefDestroy
111 000000000010f580 0000000000000272 T cuTexRefGetAddress
112 000000000010f070 000000000000028e T cuTexRefGetAddressMode
113 000000000010f300 0000000000000272 T cuTexRefGetArray
114 000000000010edf0 0000000000000272 T cuTexRefGetFilterMode
115 000000000010e8e0 0000000000000272 T cuTexRefGetFlags
116 000000000010eb60 000000000000028e T cuTexRefGetFormat
117 00000000001104e0 00000000000002b2 T cuTexRefSetAddress
118 0000000000110220 00000000000002b2 T cuTexRefSetAddress2D
119 000000000010fd00 000000000000028e T cuTexRefSetAddressMode
120 00000000001107a0 000000000000028e T cuTexRefSetArray
121 000000000010fa80 0000000000000272 T cuTexRefSetFilterMode
122 000000000010f800 0000000000000272 T cuTexRefSetFlags
123 000000000010ff90 000000000000028e T cuTexRefSetFormat
124 00000000000d0b30 000000000000003d T cudbgGetAPI
125 00000000000d0b70 000000000000002a T cudbgGetAPIVersion
126 00000000000c9b10 000000000000000a T gpudbgDebuggerAttached

C dmesg output from hung task

1 NVRM: loading NVIDIA UNIX x86_64 Kernel Module 195.36.15 Fri Mar 12 00:29:13 PST 2010
2 NVRM: Xid (0007:00): 13, 0001 00000000 000050c0 00000368 00000000 00000100
3 NVRM: Xid (0007:00): 13, 0003 00000000 000050c0 00000368 00000000 00000100
4 NVRM: Xid (0007:00): 13, 0003 00000000 000050c0 00000368 00000000 00000100
5 NVRM: Xid (0007:00): 13, 0003 00000000 000050c0 00000368 00000000 00000100
6 NVRM: Xid (0007:00): 13, 0003 00000000 000050c0 00000368 00000000 00000100
7 NVRM: Xid (0007:00): 13, 0003 00000000 000050c0 00000368 00000000 00000100
8 NVRM: Xid (0007:00): 13, 0003 00000000 000050c0 00000368 00000000 00000100
9 NVRM: Xid (0007:00): 13, 0003 00000000 000050c0 00000368 00000000 00000100
10 NVRM: Xid (0007:00): 13, 0003 00000000 000050c0 00000368 00000000 00000100
11 INFO: task cudadump:7314 blocked for more than 120 seconds.
12 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
13 cudadump D ffff880143d843b0 0 7314 7000 0x00000004
14 ffff880146f6f9b8 0000000000000046 ffffffff812ddb8e ffff880146f6fa80
15 000000000005e450 0000000000000000 ffff880146f6ffd8 000000000000dea0
16 0000000000012940 0000000000004000 ffff880143d84740 ffff8800e2009000
17 Call Trace:
18 [<ffffffff812ddb8e>] ? common_interrupt+0xe/0x13
19 [<ffffffff8101d749>] ? __change_page_attr_set_clr+0xed/0x983
20 [<ffffffff812dc0a0>] schedule_timeout+0x35/0x1ea
21 [<ffffffff81078ae6>] ? __pagevec_free+0x29/0x3c
22 [<ffffffff81077839>] ? free_pcppages_bulk+0x46/0x244
23 [<ffffffff812dbf26>] wait_for_common+0xc4/0x13a

13
24 [<ffffffff8102f121>] ? default_wake_function+0x0/0xf
25 [<ffffffff812dc026>] wait_for_completion+0x18/0x1a
26 [<ffffffffa12048bd>] os_acquire_sema+0x3f/0x66 [nvidia]
27 [<ffffffffa110c8dc>] _nv006655rm+0x6/0x1f [nvidia]
28 [<ffffffffa1114e9d>] ? rm_free_unused_clients+0x5a/0xb7 [nvidia]
29 [<ffffffffa1201967>] ? nv_kern_ctl_close+0x93/0xcb [nvidia]
30 [<ffffffffa12025b8>] ? nv_kern_close+0xa1/0x373 [nvidia]
31 [<ffffffff810a6af3>] ? __fput+0x112/0x1d1
32 [<ffffffff810a6bc7>] ? fput+0x15/0x17
33 [<ffffffff810a3f71>] ? filp_close+0x58/0x62
34 [<ffffffff8103550a>] ? put_files_struct+0x65/0xb4
35 [<ffffffff81035594>] ? exit_files+0x3b/0x40
36 [<ffffffff81036d1c>] ? do_exit+0x1dc/0x661
37 [<ffffffff81037211>] ? do_group_exit+0x70/0x99
38 [<ffffffff81040435>] ? get_signal_to_deliver+0x2de/0x2f9
39 [<ffffffff81001527>] ? do_signal+0x6d/0x681
40 [<ffffffff81046ce0>] ? remove_wait_queue+0x4c/0x51
41 [<ffffffff810368bc>] ? do_wait+0x1b2/0x1f5
42 [<ffffffff810369a7>] ? sys_wait4+0xa8/0xbc
43 [<ffffffff81001b62>] ? do_notify_resume+0x27/0x51
44 [<ffffffff81035195>] ? child_wait_callback+0x0/0x53
45 [<ffffffff810021cb>] ? int_signal+0x12/0x17
46 INFO: task cudadump:7314 blocked for more than 120 seconds.
47 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
48 cudadump D ffff880143d843b0 0 7314 7000 0x00000004
49 ffff880146f6f9b8 0000000000000046 ffffffff812ddb8e ffff880146f6fa80
50 000000000005e450 0000000000000000 ffff880146f6ffd8 000000000000dea0
51 0000000000012940 0000000000004000 ffff880143d84740 ffff8800e2009000
52 Call Trace:
53 [<ffffffff812ddb8e>] ? common_interrupt+0xe/0x13
54 [<ffffffff8101d749>] ? __change_page_attr_set_clr+0xed/0x983
55 [<ffffffff812dc0a0>] schedule_timeout+0x35/0x1ea
56 [<ffffffff81078ae6>] ? __pagevec_free+0x29/0x3c
57 [<ffffffff81077839>] ? free_pcppages_bulk+0x46/0x244
58 [<ffffffff812dbf26>] wait_for_common+0xc4/0x13a
59 [<ffffffff8102f121>] ? default_wake_function+0x0/0xf
60 [<ffffffff812dc026>] wait_for_completion+0x18/0x1a
61 [<ffffffffa12048bd>] os_acquire_sema+0x3f/0x66 [nvidia]
62 [<ffffffffa110c8dc>] _nv006655rm+0x6/0x1f [nvidia]
63 [<ffffffffa1114e9d>] ? rm_free_unused_clients+0x5a/0xb7 [nvidia]
64 [<ffffffffa1201967>] ? nv_kern_ctl_close+0x93/0xcb [nvidia]
65 [<ffffffffa12025b8>] ? nv_kern_close+0xa1/0x373 [nvidia]
66 [<ffffffff810a6af3>] ? __fput+0x112/0x1d1
67 [<ffffffff810a6bc7>] ? fput+0x15/0x17
68 [<ffffffff810a3f71>] ? filp_close+0x58/0x62
69 [<ffffffff8103550a>] ? put_files_struct+0x65/0xb4
70 [<ffffffff81035594>] ? exit_files+0x3b/0x40
71 [<ffffffff81036d1c>] ? do_exit+0x1dc/0x661
72 [<ffffffff81037211>] ? do_group_exit+0x70/0x99
73 [<ffffffff81040435>] ? get_signal_to_deliver+0x2de/0x2f9
74 [<ffffffff81001527>] ? do_signal+0x6d/0x681
75 [<ffffffff81046ce0>] ? remove_wait_queue+0x4c/0x51
76 [<ffffffff810368bc>] ? do_wait+0x1b2/0x1f5
77 [<ffffffff810369a7>] ? sys_wait4+0xa8/0xbc
78 [<ffffffff81001b62>] ? do_notify_resume+0x27/0x51
79 [<ffffffff81035195>] ? child_wait_callback+0x0/0x53
80 [<ffffffff810021cb>] ? int_signal+0x12/0x17
81 INFO: task cudadump:7314 blocked for more than 120 seconds.
82 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
83 cudadump D ffff880143d843b0 0 7314 7000 0x00000004
84 ffff880146f6f9b8 0000000000000046 ffffffff812ddb8e ffff880146f6fa80
85 000000000005e450 0000000000000000 ffff880146f6ffd8 000000000000dea0
86 0000000000012940 0000000000004000 ffff880143d84740 ffff8800e2009000
87 Call Trace:
88 [<ffffffff812ddb8e>] ? common_interrupt+0xe/0x13
89 [<ffffffff8101d749>] ? __change_page_attr_set_clr+0xed/0x983
90 [<ffffffff812dc0a0>] schedule_timeout+0x35/0x1ea
91 [<ffffffff81078ae6>] ? __pagevec_free+0x29/0x3c
92 [<ffffffff81077839>] ? free_pcppages_bulk+0x46/0x244
93 [<ffffffff812dbf26>] wait_for_common+0xc4/0x13a
94 [<ffffffff8102f121>] ? default_wake_function+0x0/0xf
95 [<ffffffff812dc026>] wait_for_completion+0x18/0x1a
96 [<ffffffffa12048bd>] os_acquire_sema+0x3f/0x66 [nvidia]
97 [<ffffffffa110c8dc>] _nv006655rm+0x6/0x1f [nvidia]
98 [<ffffffffa1114e9d>] ? rm_free_unused_clients+0x5a/0xb7 [nvidia]
99 [<ffffffffa1201967>] ? nv_kern_ctl_close+0x93/0xcb [nvidia]
100 [<ffffffffa12025b8>] ? nv_kern_close+0xa1/0x373 [nvidia]

14
101 [<ffffffff810a6af3>] ? __fput+0x112/0x1d1
102 [<ffffffff810a6bc7>] ? fput+0x15/0x17
103 [<ffffffff810a3f71>] ? filp_close+0x58/0x62
104 [<ffffffff8103550a>] ? put_files_struct+0x65/0xb4
105 [<ffffffff81035594>] ? exit_files+0x3b/0x40
106 [<ffffffff81036d1c>] ? do_exit+0x1dc/0x661
107 [<ffffffff81037211>] ? do_group_exit+0x70/0x99
108 [<ffffffff81040435>] ? get_signal_to_deliver+0x2de/0x2f9
109 [<ffffffff81001527>] ? do_signal+0x6d/0x681
110 [<ffffffff81046ce0>] ? remove_wait_queue+0x4c/0x51
111 [<ffffffff810368bc>] ? do_wait+0x1b2/0x1f5
112 [<ffffffff810369a7>] ? sys_wait4+0xa8/0xbc
113 [<ffffffff81001b62>] ? do_notify_resume+0x27/0x51
114 [<ffffffff81035195>] ? child_wait_callback+0x0/0x53
115 [<ffffffff810021cb>] ? int_signal+0x12/0x17
116 INFO: task cudaranger:7349 blocked for more than 120 seconds.
117 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
118 cudaranger D ffff88002820dea0 0 7349 7314 0x00000004
119 ffff8800663199e8 0000000000000046 0000000000000000 0000000000000000
120 ffff8800663199f8 ffffffff8102694c ffff880066319fd8 000000000000dea0
121 0000000000012940 0000000000004000 ffff880143caddd0 000000010037459f
122 Call Trace:
123 [<ffffffff8102694c>] ? select_task_rq_fair+0x4eb/0x8a2
124 [<ffffffff812dc0a0>] schedule_timeout+0x35/0x1ea
125 [<ffffffff8102f10f>] ? try_to_wake_up+0x328/0x33a
126 [<ffffffff812dbf26>] wait_for_common+0xc4/0x13a
127 [<ffffffff8102f121>] ? default_wake_function+0x0/0xf
128 [<ffffffff812dc026>] wait_for_completion+0x18/0x1a
129 [<ffffffffa12048bd>] os_acquire_sema+0x3f/0x66 [nvidia]
130 [<ffffffffa110c8dc>] _nv006655rm+0x6/0x1f [nvidia]
131 [<ffffffffa1114e9d>] ? rm_free_unused_clients+0x5a/0xb7 [nvidia]
132 [<ffffffffa12025e2>] ? nv_kern_close+0xcb/0x373 [nvidia]
133 [<ffffffff810a6af3>] ? __fput+0x112/0x1d1
134 [<ffffffff810a6bc7>] ? fput+0x15/0x17
135 [<ffffffff810a3f71>] ? filp_close+0x58/0x62
136 [<ffffffff8103550a>] ? put_files_struct+0x65/0xb4
137 [<ffffffff81035594>] ? exit_files+0x3b/0x40
138 [<ffffffff81036d1c>] ? do_exit+0x1dc/0x661
139 [<ffffffffa1204865>] ? os_release_sema+0x47/0x60 [nvidia]
140 [<ffffffff81037211>] ? do_group_exit+0x70/0x99
141 [<ffffffff81040435>] ? get_signal_to_deliver+0x2de/0x2f9
142 [<ffffffff81001527>] ? do_signal+0x6d/0x681
143 [<ffffffff812dbc43>] ? schedule+0x9fd/0xaf0
144 [<ffffffff810b1fd1>] ? do_vfs_ioctl+0x480/0x4c6
145 [<ffffffff81001b62>] ? do_notify_resume+0x27/0x51
146 [<ffffffff810b2059>] ? sys_ioctl+0x42/0x65
147 [<ffffffff812ddc5a>] ? retint_signal+0x3d/0x83
148 [recombinator](0) $

D dmesg output from wayward OOM killer

1 [41561.362741] NVRM: Xid (0001:00): 13, 0006 00000000 000050c0 00000368 00000000 00000100
2 [41592.776280] NVRM: Xid (0001:00): 13, 0006 00000000 000050c0 00000368 00000000 00000100
3 [42842.683531] firefox-bin invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
4 [42842.683536] firefox-bin cpuset=/ mems_allowed=0
5 [42842.683538] Pid: 16771, comm: firefox-bin Tainted: P W 2.6.34-rc2 #1
6 [42842.683540] Call Trace:
7 [42842.683547] [<ffffffff81098a11>] ? T.492+0x5f/0x16f
8 [42842.683550] [<ffffffff8109be60>] ? get_page_from_freelist+0x6ea/0x72d
9 [42842.683553] [<ffffffff8109895d>] ? badness+0x1d2/0x227
10 [42842.683555] [<ffffffff81098b58>] ? T.491+0x37/0xfe
11 [42842.683558] [<ffffffff81098d5f>] ? __out_of_memory+0x140/0x157
12 [42842.683561] [<ffffffff81098ed1>] ? out_of_memory+0x15b/0x18d
13 [42842.683563] [<ffffffff8109c57a>] ? __alloc_pages_nodemask+0x4a0/0x5d5
14 [42842.683567] [<ffffffff8109de57>] ? __do_page_cache_readahead+0x93/0x1b3
15 [42842.683569] [<ffffffff8109df93>] ? ra_submit+0x1c/0x20
16 [42842.683572] [<ffffffff81097807>] ? filemap_fault+0x17e/0x2f3
17 [42842.683575] [<ffffffff810adb86>] ? __do_fault+0x52/0x3ba
18 [42842.683578] [<ffffffff810ae9bc>] ? handle_mm_fault+0x3ed/0x7aa
19 [42842.683582] [<ffffffff81051f21>] ? autoremove_wake_function+0x0/0x2a
20 [42842.683586] [<ffffffff81022417>] ? do_page_fault+0x27e/0x29a
21 [42842.683589] [<ffffffff812a2945>] ? page_fault+0x25/0x30
22 [42842.683591] Mem-Info:
23 [42842.683592] Node 0 DMA per-cpu:
24 [42842.683594] CPU 0: hi: 0, btch: 1 usd: 0
25 [42842.683596] CPU 1: hi: 0, btch: 1 usd: 0
26 [42842.683597] CPU 2: hi: 0, btch: 1 usd: 0

15
27 [42842.683599] CPU 3: hi: 0, btch: 1 usd: 0
28 [42842.683601] CPU 4: hi: 0, btch: 1 usd: 0
29 [42842.683602] CPU 5: hi: 0, btch: 1 usd: 0
30 [42842.683604] CPU 6: hi: 0, btch: 1 usd: 0
31 [42842.683606] CPU 7: hi: 0, btch: 1 usd: 0
32 [42842.683607] Node 0 DMA32 per-cpu:
33 [42842.683609] CPU 0: hi: 186, btch: 31 usd: 172
34 [42842.683611] CPU 1: hi: 186, btch: 31 usd: 183
35 [42842.683612] CPU 2: hi: 186, btch: 31 usd: 170
36 [42842.683614] CPU 3: hi: 186, btch: 31 usd: 133
37 [42842.683615] CPU 4: hi: 186, btch: 31 usd: 65
38 [42842.683617] CPU 5: hi: 186, btch: 31 usd: 74
39 [42842.683619] CPU 6: hi: 186, btch: 31 usd: 164
40 [42842.683620] CPU 7: hi: 186, btch: 31 usd: 184
41 [42842.683622] Node 0 Normal per-cpu:
42 [42842.683623] CPU 0: hi: 186, btch: 31 usd: 169
43 [42842.683625] CPU 1: hi: 186, btch: 31 usd: 149
44 [42842.683627] CPU 2: hi: 186, btch: 31 usd: 170
45 [42842.683628] CPU 3: hi: 186, btch: 31 usd: 146
46 [42842.683630] CPU 4: hi: 186, btch: 31 usd: 122
47 [42842.683632] CPU 5: hi: 186, btch: 31 usd: 96
48 [42842.683633] CPU 6: hi: 186, btch: 31 usd: 167
49 [42842.683635] CPU 7: hi: 186, btch: 31 usd: 153
50 [42842.683639] active_anon:64781 inactive_anon:11419 isolated_anon:32
51 [42842.683640] active_file:166 inactive_file:472 isolated_file:0
52 [42842.683640] unevictable:0 dirty:0 writeback:11229 unstable:0
53 [42842.683641] free:11299 slab_reclaimable:3152 slab_unreclaimable:21764
54 [42842.683642] mapped:1053702 shmem:493 pagetables:7058 bounce:0
55 [42842.683644] Node 0 DMA free:15896kB min:20kB low:24kB high:28kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file
56 [42842.683652] lowmem_reserve[]: 0 3445 7990 7990
57 [42842.683655] Node 0 DMA32 free:22812kB min:4928kB low:6160kB high:7392kB active_anon:64216kB inactive_anon:12900kB active_file:0
58 [42842.683664] lowmem_reserve[]: 0 0 4545 4545
59 [42842.683666] Node 0 Normal free:6488kB min:6500kB low:8124kB high:9748kB active_anon:194908kB inactive_anon:32776kB active_file:
60 [42842.683675] lowmem_reserve[]: 0 0 0 0
61 [42842.683678] Node 0 DMA: 2*4kB 2*8kB 2*16kB 1*32kB 1*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15896kB
62 [42842.683685] Node 0 DMA32: 4687*4kB 0*8kB 2*16kB 2*32kB 0*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 22812kB
63 [42842.683691] Node 0 Normal: 634*4kB 16*8kB 1*16kB 5*32kB 1*64kB 2*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 6488kB
64 [42842.683698] 12609 total pagecache pages
65 [42842.683699] 11387 pages in swap cache
66 [42842.683701] Swap cache stats: add 22707, delete 11320, find 2/2
67 [42842.683703] Free swap = 6744792kB
68 [42842.683704] Total swap = 6835620kB
69 [42842.707808] 2097151 pages RAM
70 [42842.707810] 64340 pages reserved
71 [42842.707812] 1056384 pages shared
72 [42842.707813] 964448 pages non-shared
73 [42842.707816] Out of memory: kill process 8555 (cudapinner) score 1057549 or a child
74 [42842.707820] Killed process 8555 (cudapinner) vsz:4230196kB, anon-rss:744kB, file-rss:15432kB
75 [42842.718544] NVRM: VM: nv_vm_malloc_pages: failed to allocate a page
76 [42891.026550] firefox-bin invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
77 [42891.026559] firefox-bin cpuset=/ mems_allowed=0
78 [42891.026565] Pid: 16520, comm: firefox-bin Tainted: P W 2.6.34-rc2 #1
79 [42891.026569] Call Trace:
80 [42891.026582] [<ffffffff81098a11>] ? T.492+0x5f/0x16f
81 [42891.026588] [<ffffffff8109be60>] ? get_page_from_freelist+0x6ea/0x72d
82 [42891.026595] [<ffffffff8109895d>] ? badness+0x1d2/0x227
83 [42891.026602] [<ffffffff81098b58>] ? T.491+0x37/0xfe
84 [42891.026611] [<ffffffff81098d5f>] ? __out_of_memory+0x140/0x157
85 [42891.026614] [<ffffffff81098ed1>] ? out_of_memory+0x15b/0x18d
86 [42891.026616] [<ffffffff8109c57a>] ? __alloc_pages_nodemask+0x4a0/0x5d5
87 [42891.026619] [<ffffffff8111d910>] ? ext4_get_block+0x0/0xe1
88 [42891.026622] [<ffffffff8109de57>] ? __do_page_cache_readahead+0x93/0x1b3
89 [42891.026624] [<ffffffff8109df93>] ? ra_submit+0x1c/0x20
90 [42891.026626] [<ffffffff81097807>] ? filemap_fault+0x17e/0x2f3
91 [42891.026629] [<ffffffff810adb86>] ? __do_fault+0x52/0x3ba
92 [42891.026632] [<ffffffff810ae9bc>] ? handle_mm_fault+0x3ed/0x7aa
93 [42891.026635] [<ffffffff810034ce>] ? call_function_interrupt+0xe/0x20
94 [42891.026638] [<ffffffff8100948a>] ? read_tsc+0x5/0x16
95 [42891.026641] [<ffffffff81022417>] ? do_page_fault+0x27e/0x29a
96 [42891.026645] [<ffffffff812a2945>] ? page_fault+0x25/0x30
97 [42891.026646] Mem-Info:
98 [42891.026647] Node 0 DMA per-cpu:
99 [42891.026649] CPU 0: hi: 0, btch: 1 usd: 0
100 [42891.026650] CPU 1: hi: 0, btch: 1 usd: 0
101 [42891.026652] CPU 2: hi: 0, btch: 1 usd: 0
102 [42891.026653] CPU 3: hi: 0, btch: 1 usd: 0
103 [42891.026655] CPU 4: hi: 0, btch: 1 usd: 0

16
104 [42891.026656] CPU 5: hi: 0, btch: 1 usd: 0
105 [42891.026658] CPU 6: hi: 0, btch: 1 usd: 0
106 [42891.026659] CPU 7: hi: 0, btch: 1 usd: 0
107 [42891.026660] Node 0 DMA32 per-cpu:
108 [42891.026662] CPU 0: hi: 186, btch: 31 usd: 178
109 [42891.026663] CPU 1: hi: 186, btch: 31 usd: 0
110 [42891.026665] CPU 2: hi: 186, btch: 31 usd: 173
111 [42891.026666] CPU 3: hi: 186, btch: 31 usd: 60
112 [42891.026667] CPU 4: hi: 186, btch: 31 usd: 0
113 [42891.026669] CPU 5: hi: 186, btch: 31 usd: 0
114 [42891.026670] CPU 6: hi: 186, btch: 31 usd: 0
115 [42891.026671] CPU 7: hi: 186, btch: 31 usd: 57
116 [42891.026672] Node 0 Normal per-cpu:
117 [42891.026674] CPU 0: hi: 186, btch: 31 usd: 171
118 [42891.026676] CPU 1: hi: 186, btch: 31 usd: 0
119 [42891.026677] CPU 2: hi: 186, btch: 31 usd: 176
120 [42891.026678] CPU 3: hi: 186, btch: 31 usd: 145
121 [42891.026680] CPU 4: hi: 186, btch: 31 usd: 0
122 [42891.026681] CPU 5: hi: 186, btch: 31 usd: 0
123 [42891.026683] CPU 6: hi: 186, btch: 31 usd: 6
124 [42891.026684] CPU 7: hi: 186, btch: 31 usd: 136
125 [42891.026688] active_anon:45494 inactive_anon:8079 isolated_anon:17
126 [42891.026688] active_file:264 inactive_file:470 isolated_file:0
127 [42891.026689] unevictable:0 dirty:0 writeback:6579 unstable:0
128 [42891.026690] free:11235 slab_reclaimable:2986 slab_unreclaimable:21615
129 [42891.026690] mapped:1053541 shmem:200 pagetables:7107 bounce:0
130 [42891.026692] Node 0 DMA free:15896kB min:20kB low:24kB high:28kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file
131 [42891.026699] lowmem_reserve[]: 0 3445 7990 7990
132 [42891.026702] Node 0 DMA32 free:22784kB min:4928kB low:6160kB high:7392kB active_anon:52436kB inactive_anon:10560kB active_file:9
133 [42891.026710] lowmem_reserve[]: 0 0 4545 4545
134 [42891.026712] Node 0 Normal free:6260kB min:6500kB low:8124kB high:9748kB active_anon:129540kB inactive_anon:21756kB active_file:
135 [42891.026720] lowmem_reserve[]: 0 0 0 0
136 [42891.026722] Node 0 DMA: 2*4kB 2*8kB 2*16kB 1*32kB 1*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15896kB
137 [42891.026728] Node 0 DMA32: 765*4kB 386*8kB 204*16kB 139*32kB 56*64kB 13*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 22948
138 [42891.026734] Node 0 Normal: 315*4kB 54*8kB 56*16kB 5*32kB 2*64kB 2*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 6460kB
139 [42891.026740] 9590 total pagecache pages
140 [42891.026741] 8676 pages in swap cache
141 [42891.026742] Swap cache stats: add 44073, delete 35397, find 892/1022
142 [42891.026743] Free swap = 6664808kB
143 [42891.026744] Total swap = 6835620kB
144 [42891.056305] 2097151 pages RAM
145 [42891.056308] 64340 pages reserved
146 [42891.056309] 1057095 pages shared
147 [42891.056311] 965272 pages non-shared
148 [42891.056313] Out of memory: kill process 8614 (cudapinner) score 1057549 or a child
149 [42891.056316] Killed process 8614 (cudapinner) vsz:4230196kB, anon-rss:748kB, file-rss:15560kB
150 [42891.073527] NVRM: VM: nv_vm_malloc_pages: failed to allocate a page
151 [hyperbox](0) $

Lecture 2
No ratings yet
Lecture 2
77 pages
Cuda C
No ratings yet
Cuda C
70 pages
CUDA Tutorial
No ratings yet
CUDA Tutorial
50 pages
ACA Unit3 Revised
No ratings yet
ACA Unit3 Revised
53 pages
CUDA_1
No ratings yet
CUDA_1
45 pages
лк CUDA - 1 PDCn
No ratings yet
лк CUDA - 1 PDCn
31 pages
Programming Gpus With Cuda: John Mellor-Crummey
No ratings yet
Programming Gpus With Cuda: John Mellor-Crummey
42 pages
Program Structure of CUDA
No ratings yet
Program Structure of CUDA
3 pages
21.L18 Intro To GPU and CUDA C
No ratings yet
21.L18 Intro To GPU and CUDA C
89 pages
1 Cuda
100% (1)
1 Cuda
173 pages
Side-Channel Power Analysis of A GPU AES Implementation: Chao Luo, Yunsi Fei, Pei Luo, Saoni Mukherjee, David Kaeli
No ratings yet
Side-Channel Power Analysis of A GPU AES Implementation: Chao Luo, Yunsi Fei, Pei Luo, Saoni Mukherjee, David Kaeli
8 pages
Lecture 1: An Introduction To CUDA: Mike Giles
No ratings yet
Lecture 1: An Introduction To CUDA: Mike Giles
247 pages
Lecture 1: An Introduction To CUDA: Mike Giles
No ratings yet
Lecture 1: An Introduction To CUDA: Mike Giles
40 pages
Lecture3 Fundamentals of CUDA(Part1)_2025
No ratings yet
Lecture3 Fundamentals of CUDA(Part1)_2025
52 pages
Unit 5 - CUDA Architecture
No ratings yet
Unit 5 - CUDA Architecture
17 pages
CUDAProgModel
No ratings yet
CUDAProgModel
24 pages
Lecture2 Cuda Basic 2010
No ratings yet
Lecture2 Cuda Basic 2010
44 pages
CUDA Programming: Lei Zhou, Yafeng Yin, Yanzhi Ren, Hong Man, Yingying Chen
No ratings yet
CUDA Programming: Lei Zhou, Yafeng Yin, Yanzhi Ren, Hong Man, Yingying Chen
28 pages
cs179 2016 Lec13
No ratings yet
cs179 2016 Lec13
30 pages
Gpu Cuda
No ratings yet
Gpu Cuda
204 pages
High Performance Computing On Gpu
No ratings yet
High Performance Computing On Gpu
37 pages
CUDA Compute Unified Device Architecture
No ratings yet
CUDA Compute Unified Device Architecture
26 pages
GPGPU Programming With CUDA: Leandro Avila - University of Northern Iowa
No ratings yet
GPGPU Programming With CUDA: Leandro Avila - University of Northern Iowa
29 pages
Cuda Talk
100% (1)
Cuda Talk
82 pages
Overview of GPGPU's
No ratings yet
Overview of GPGPU's
81 pages
Unit 6 Chapter 1 Parallel Programming Tools Cuda - Programming
No ratings yet
Unit 6 Chapter 1 Parallel Programming Tools Cuda - Programming
28 pages
rCUDA Guide
No ratings yet
rCUDA Guide
13 pages
Introduction To CUDA C
No ratings yet
Introduction To CUDA C
67 pages
CUDA Programming with C++: From Basics to Expert Proficiency
From Everand
CUDA Programming with C++: From Basics to Expert Proficiency
William Smith
No ratings yet
Lec 1
No ratings yet
Lec 1
27 pages
Parallel & Distributed Computing Report
No ratings yet
Parallel & Distributed Computing Report
4 pages
217 Lec2
No ratings yet
217 Lec2
24 pages
Introduction To CUDA C 3
No ratings yet
Introduction To CUDA C 3
67 pages
CUDA 1_Introduction to GPU, CUDA (1)
No ratings yet
CUDA 1_Introduction to GPU, CUDA (1)
21 pages
Topic GPU1
No ratings yet
Topic GPU1
32 pages
ECE 498AL The CUDA Programming Model
No ratings yet
ECE 498AL The CUDA Programming Model
37 pages
cuuda nvidai guide_Part1
No ratings yet
cuuda nvidai guide_Part1
15 pages
Lec 2 PDC
No ratings yet
Lec 2 PDC
31 pages
govind_6
No ratings yet
govind_6
4 pages
HPC Final 4-8
No ratings yet
HPC Final 4-8
25 pages
VCUDA
No ratings yet
VCUDA
11 pages
cuda
No ratings yet
cuda
25 pages
CUDA PPT Anurita Unit3
No ratings yet
CUDA PPT Anurita Unit3
42 pages
Endsem Imp Hpc Unit 5
No ratings yet
Endsem Imp Hpc Unit 5
24 pages
Getting Started With CUDA Samples
No ratings yet
Getting Started With CUDA Samples
9 pages
GPU Basics
No ratings yet
GPU Basics
93 pages
NVCC 1.1
No ratings yet
NVCC 1.1
30 pages
Nvidia Cuda C Getting Started Guide For Linux: Installation and Verification On Linux Systems
No ratings yet
Nvidia Cuda C Getting Started Guide For Linux: Installation and Verification On Linux Systems
16 pages
Puting Experiences
No ratings yet
Puting Experiences
15 pages
Cuda Lab Manual
100% (1)
Cuda Lab Manual
22 pages
CUDA Programming Invert
No ratings yet
CUDA Programming Invert
36 pages
AcceleratingAIAdvancements Pre Print Doube Blind
No ratings yet
AcceleratingAIAdvancements Pre Print Doube Blind
9 pages
Cuda Review 1
No ratings yet
Cuda Review 1
13 pages
Parallel Processing With Cuda
No ratings yet
Parallel Processing With Cuda
25 pages
CUDA Wikipedia
No ratings yet
CUDA Wikipedia
10 pages
5. Moving to Parallel With CUDA - Hello Program
No ratings yet
5. Moving to Parallel With CUDA - Hello Program
14 pages
CUDA8.0 Installation Guide Linux
No ratings yet
CUDA8.0 Installation Guide Linux
41 pages
Mastering CUDA C Programming
From Everand
Mastering CUDA C Programming
Ed Norex
No ratings yet
Mastering CUDA C++ Programming: A Comprehensive Guidebook
From Everand
Mastering CUDA C++ Programming: A Comprehensive Guidebook
Brett Neutreon
No ratings yet
CUDA Programming Fundamentals: Definitive Reference for Developers and Engineers
From Everand
CUDA Programming Fundamentals: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet

Cs 8803 Ss Project

Uploaded by

Cs 8803 Ss Project

Uploaded by

My Other Computer is your GPU:

System-Centric CUDA Threat Modeling with CUBAR

1 Introduction using version 195.36.15 of the CUDA software. Hard-

C dmesg output from hung task

D dmesg output from wayward OOM killer

You might also like