Heterogeneous Isolated Execution for Commodity GPUs

Insu Jang (School of Computing, KAIST, Daejeon, Republic of Korea)
Adrian Tang (Department of Computer Science, Columbia University, New York, NY, USA)
Taehoon Kim (School of Computing, KAIST, Daejeon, Republic of Korea)
Simha Sethumadhavan (Department of Computer Science, Columbia University, New York, NY, USA)
Jaehyuk Huh (School of Computing, KAIST, Daejeon, Republic of Korea)
Abstract

Traditional CPUs and cloud systems based on them have embraced hardware-based trusted execution environments to securely isolate computation from malicious OS or hardware attacks. However, GPUs and their cloud deployments have yet to include such support for hardware-based trusted computing. As large amounts of sensitive data are offloaded to GPU acceleration in cloud environments, ensuring the security of the data is a current and pressing need. As deployed today, the outsourced GPU model is vulnerable to attacks from compromised privileged software. To support isolated remote execution on GPUs even under vulnerable operating systems, this paper proposes a novel hardware and software architecture, called HIX (Heterogeneous Isolated eXecution). HIX does not require modifications to the GPU architecture to offer protections: instead, it offers security by modifying the I/O interconnect between the CPU and GPU, and by refactoring the GPU device driver to work from within the CPU trusted environment. A result of the architectural choices behind HIX is that the concept can be applied to other offload accelerators besides GPUs. This work implements the proposed HIX architecture on an emulated machine with KVM and QEMU. Experimental results from the emulated security support with a real GPU show that the performance overhead for security is curtailed to 26% on average for the Rodinia benchmark, while providing secure isolated GPU computing.

Keywords: Trusted execution, Heterogeneous computing, GPU security

ACM Reference Format:
Insu Jang, Adrian Tang, Taehoon Kim, Simha Sethumadhavan, and Jaehyuk Huh. 2019. Heterogeneous Isolated Execution for Commodity GPUs. In 2019 Architectural Support for Programming Languages and Operating Systems (ASPLOS '19), April 13-17, 2019, Providence, RI, USA. ACM, New York, NY, USA, 14 pages. https://doi.org/10.1145/3297858.3304021

1 Introduction

In conventional CPU-based computation, hardware-based trusted execution environments (TEE) such as Intel SGX and ARM TrustZone have been providing trusted and isolated computing environments to user applications. Such hardware-based TEEs reduce the trusted computing base (TCB) of the computation to the processor and the critical code running in the TEE. With TEE support, security-critical applications can be protected from compromised privileged software as well as from hardware-based attacks on the memory and system buses, providing secure computation on untrusted remote cloud servers.

With the increasing use of general purpose GPU computing, from traditional high performance computing to data center acceleration and machine learning applications, securing GPU computation has become critical to protect security-sensitive data [34, 45, 56, 57]. However, although more and more critical data are processed in GPUs, trusted computing is yet to be supported in GPU computation. In the current system architecture, high performance discrete GPUs communicate with CPUs through I/O interconnects such as PCI Express (PCIe) buses, and the GPU driver, which is part of the operating system, controls the GPUs [25]. As the privileged operating system can fully control the hardware I/O interconnects and the GPU driver, computing in GPUs is vulnerable to potential attacks on the operating system [8]. Beyond GPU-based computing, the proliferation of various accelerator-based computing models has been increasing
ASPLOS ’19, April 13–17, 2019, Providence, RI, USA I. Jang, et al.

the demand for a higher level of security support for accelerators under vulnerable privileged software.

In existing architectures, both the code and data in GPUs can be compromised by a privileged adversary. Recent work has demonstrated that the integrity of GPU code can be subverted by disrupting and replacing the code at runtime with an off-the-shelf reverse engineering tool [13]. In addition to code, data in the GPU can potentially be uncovered and leaked [45]. GPU data vulnerable to confidentiality attacks comprises both the communication data being transferred to and from a GPU, and the data being processed within a GPU. The susceptibility of GPUs to confidentiality and integrity attacks stems from the lack of access control to their interfaces, such as the I/O interconnects and memory-mapped I/O addresses.

To support secure computing in GPUs, this paper proposes a novel hardware and software architecture for isolating GPUs even from potentially malicious privileged software (OS and hypervisor). The proposed architecture, called Heterogeneous Isolated eXecution (HIX), requires minor extensions to the current PCIe interconnect implementation and the TEE support in CPUs. The goal of HIX is to extend the security guarantees, namely confidentiality and integrity of user data, of TEE technologies to heterogeneous computing environments. At the time of writing, none of these technologies protect accelerators in heterogeneous systems from privileged software attacks; they only protect the code and data in trusted "enclaves" running on the processors. In this work, we expand the scope of a widely used trusted isolation technology, Intel SGX, to secure general purpose accelerators, in particular GPUs.

Our proposed architecture consists of four main hardware and software changes. First, key functions of the GPU driver are removed from the operating system (OS) and relocated to a separate process in its own GPU enclave. The GPU enclave is an extension of the current SGX enclave, designed to exclusively manage the GPU. Second, the PCIe interconnect architecture is slightly modified to prevent the OS from changing the routing configuration of the interconnect once the GPU enclave is completely initialized. Third, the memory management unit (MMU) is augmented to protect the memory-mapped GPU I/O region from unauthorized accesses. Fourth, the CPU counterpart process of a GPU application runs on an SGX enclave, and the SGX enclave sets up a trusted communication path to the GPU enclave, which is robust even against privileged adversaries.

To support secure execution environments for GPUs without any GPU modification, HIX does not provide protection against direct hardware-based attacks, as PCIe buses and the memory of GPUs are exposed to such hardware attacks in the current architecture. Although the security level is lower compared to the hardware TEEs for CPUs, HIX can be extended to other accelerators without requiring any modification of the accelerators themselves, as long as the accelerator is connected via I/O interconnects.

We evaluate the proposed architecture in terms of security and performance. We have implemented a prototype for HIX on KVM and QEMU, adding extra instructions for the GPU enclave and separating the GPU driver from the operating system. The prototype using the emulation connected to a real GPU shows that the performance degradation introduced by HIX secure GPU computation is 26% compared to conventional unsecure GPU computation for the benchmarks from the Rodinia suite.

We summarize the main contributions of this work as follows:

• We provide an attack surface assessment of GPU computation. We identify key GPU components that can be attacked from privileged software: the PCIe interconnect, the memory-mapped I/O region, and the GPU driver.

• We augment the design of the PCIe interconnect to block any routing change after GPU initialization, and to further guarantee the immutability of the address mapping of the memory-mapped I/O region to the GPU.

• We extend the current SGX interface to support the GPU enclave, which runs the GPU driver in a secure way. The MMU design is extended to protect the GPU memory-mapped I/O region from unauthorized accesses.

• We implement a prototype on an emulated system with KVM and QEMU to evaluate the performance overhead of HIX. Although it is implemented in an emulated system due to the required changes in hardware, it faithfully reflects the necessary changes in hardware interfaces and software architectures.

The rest of the paper is organized as follows. Section 2 describes the current architecture of SGX, PCIe, and the GPU driver. Section 3 discusses the threat model. Section 4 presents the proposed architecture. Section 5 discusses the security analysis and shows performance results. Section 6 presents the prior work and Section 7 concludes the paper.

2 Background

HIX is designed on top of the Intel SGX architecture and the PCI Express standard. We provide a brief overview of these technologies in this section.

2.1 Intel Software Guard Extensions (SGX)

Intel SGX is a hardware-based protection technology that provides a trusted execution environment (TEE) called an enclave, protected even from privileged software and direct hardware attacks. SGX protects the enclave memory and execution contexts to support strong isolated execution. The SGX hardware-based isolated execution is augmented by an attestation service that verifies the integrity of the code running on the enclave [1, 35].
[Figure 1. SGX enclave memory mapping structure]

The main memory is untrusted under the SGX threat model, and thus SGX provides memory encryption and access restriction mechanisms to protect a small region of main memory for enclaves, called the enclave page cache (EPC). Although SGX uses the virtual memory support provided by the untrusted OS, it protects EPC pages from unauthorized accesses with hardware-based verification. Figure 1 illustrates the structure of the SGX address space. In the figure, ELRANGE (Enclave Linear Address Range) is the protected virtual address range in the enclave, and the pages in the range are guaranteed to be mapped to EPC pages. When an enclave is created, the system software registers the virtual address and corresponding EPC physical address of a page in the protected memory using the EADD SGX instruction. During handling of the EADD instruction, the hardware stores the mapping information in the enclave page cache map (EPCM) to verify future accesses to the page during address translation in the MMU [9].
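As a rough illustration of this verification, the following C sketch models the EPCM check that the hardware conceptually performs before a translation to an EPC page is honored. The structure layout and field names are our own simplification, not Intel's actual microarchitectural state.

    #include <stdint.h>

    /* Simplified model of one EPCM entry, recording the mapping
     * registered by EADD for a single EPC page. */
    struct epcm_entry {
        int      valid;        /* EPC page is in use                */
        uint64_t enclave_id;   /* enclave that owns the page        */
        uint64_t enclave_va;   /* virtual address given to EADD     */
    };

    /* Conceptually invoked by the page table walker whenever a
     * translation targets an EPC physical page. */
    int epcm_allows(const struct epcm_entry *e,
                    uint64_t cur_enclave_id, uint64_t va)
    {
        if (!e->valid)
            return 0;                      /* page not registered   */
        if (e->enclave_id != cur_enclave_id)
            return 0;                      /* wrong enclave         */
        if (e->enclave_va != va)
            return 0;                      /* OS remapped the page  */
        return 1;                          /* mapping matches EADD  */
    }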
2.2 PCI Express Architecture

Modern GPUs are connected to the system via the PCI Express (PCIe) interface. The PCIe interface facilitates memory-mapped I/O (MMIO) access to PCIe devices for software. Since the MMIO mechanism maps the hardware registers and memory of a device to the system memory address space, software can transparently access the PCIe devices using regular memory addresses. Figure 2 illustrates how the system routes device access requests to the device by using the system memory address map [49]. The CPU is responsible for distinguishing accesses to the MMIO regions from main memory accesses. It uses its internal hardware registers, which are initialized by the BIOS at system boot time, to route access requests for MMIO appropriately [19]. When the address of a memory access falls in the MMIO region, the PCIe root complex takes the request. As PCIe devices are attached to the system as a tree, with the PCIe root complex as its root, the root complex creates a PCIe transaction packet and routes it to the desired device using the hardware routing registers [5, 43]. These registers are also initialized by the BIOS at system boot time to cover the entire physical address ranges of the attached devices.

Modern PCIe devices use direct memory access (DMA) to directly read or write the main memory without CPU intervention. The DMA arrows in Figure 2 show how the system routes the DMA request. An input/output memory management unit (IOMMU) can be used to translate device addresses to physical addresses for DMAs [42].

[Figure 2. I/O path in PCI Express system architecture]

2.3 Controlling GPU in Software

Given the underlying hardware I/O path described in Section 2.2, software is able to control the GPU by writing commands to a GPU command buffer in the GPU MMIO region. Once a virtual address is assigned to the GPU MMIO physical address, the OS or a user process can access the GPU through the MMIO virtual address, if the MMIO virtual address is accessible from the OS or process [47]. Data such as GPU binary code or input data can be transferred to the GPU via MMIO or DMA, where DMA is optimized for bulk data transfers [15].
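To make the MMIO control path concrete, the sketch below shows how software with access to the mapped region could submit a command. The register offsets, the doorbell convention, and the mapping path are hypothetical; real GPU command submission is considerably more involved.

    /* Minimal sketch of MMIO-based device control from user space,
     * assuming the BAR is exposed through a mappable resource file.
     * All offsets and command values are hypothetical. */
    #include <fcntl.h>
    #include <stdint.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define CMD_RING_OFFSET 0x1000   /* hypothetical command buffer */
    #define DOORBELL_OFFSET 0x2440   /* hypothetical doorbell reg   */

    int submit_command(const char *bar_path, uint32_t cmd)
    {
        int fd = open(bar_path, O_RDWR | O_SYNC);
        if (fd < 0)
            return -1;
        /* Map the device MMIO region into our address space. */
        volatile uint32_t *mmio = mmap(NULL, 0x4000,
                                       PROT_READ | PROT_WRITE,
                                       MAP_SHARED, fd, 0);
        if (mmio == MAP_FAILED) { close(fd); return -1; }
        mmio[CMD_RING_OFFSET / 4] = cmd;  /* write command to ring  */
        mmio[DOORBELL_OFFSET / 4] = 1;    /* notify GPU via doorbell */
        munmap((void *)mmio, 0x4000);
        close(fd);
        return 0;
    }

Because the OS controls which processes receive such a mapping, any software granted the mapping in the conventional architecture can drive the device; this is exactly the control surface HIX restricts.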
3 Threat Model

3.1 Attacker Model and Assumptions

The adversarial model we address is a privileged adversary with the goal of breaking the confidentiality and integrity of the data to be processed by GPUs. We focus on attack vectors comprising the hardware and software I/O data path between a user application and the GPU. We assume that the adversary has privileged software control over the target system. Specifically, the adversary can control all the privileged software components, such as the OS kernel and device drivers within the kernel space. In addition to being capable of controlling code execution of these components, the adversary is also able to inspect and observe data in main memory and manage the system address map, the set of information indicating where main memory and MMIO access requests should be routed. We also assume that the CPU package and GPU card are trusted, and that the GPU has its own separate device memory.

3.2 Out of Scope

Consistent with the defense scope of SGX, we do not consider physical attacks on the CPU package or side channel-based attacks [9]. It is not our goal to defend against implementation bugs in user code to be run within the enclaves and GPUs [11]. Availability attacks, such as refusing to schedule a specific process, are not in our scope.
ASPLOS ’19, April 13–17, 2019, Providence, RI, USA I. Jang, et al.
Apart from the limitations we inherit from Intel SGX, HIX has several limitations specific to PCIe devices and the I/O interconnect architecture. Physical attacks on the PCIe interconnects and GPUs, such as directly injecting PCIe packets into the I/O communication path with special hardware or accessing the GPU memory physically, are out of scope for HIX. This is an inherent trade-off we make because this study is based on unmodified GPU hardware. Using the PCIe peer-to-peer transaction functionality with a GPU protected by HIX is not available. While the latest GPUs support an on-demand page-fault mechanism [10, 16], the GPU computing model that HIX supports is restricted to the conventional model, which requires all the data to be in the GPU device memory before a GPU kernel execution. In addition, we do not address availability attacks against GPUs in the form of resource exhaustion or denial-of-service attacks. We discuss the limitations in more detail in Section 5.6.

[Figure 3. HIX architecture overview]

4 HIX Architecture

4.1 Architecture Overview

A key tenet in the HIX design is securing the command and data path from the user application to a GPU at both the software and hardware levels. In a typical unprotected setting, the GPU driver is part of the operating system (OS), and the I/O path to the GPU through MMIO is controlled by the OS. However, in the proposed HIX architecture, the GPU driver is separated from the OS, running in a secure enclave. The OS cannot affect the MMIO mapping and routing to the GPU. To provide secure computing, the following software and hardware components must be supported.

Isolated GPU management with GPU enclave: For secure GPU computing under a vulnerable OS, HIX separates the GPU driver from the OS space. The GPU driver runs in a TEE environment, called the GPU enclave, as illustrated in Figure 3. Only the GPU enclave is allowed to access the GPU MMIO region, protecting the GPU MMIO from the malicious OS.

Secure hardware I/O path: The GPU enclave manages the GPU exclusively by sending commands and data through MMIO, and thus the communication through MMIO must be secured from the OS and other applications. This requires several hardware extensions to the SGX support as well as to the PCIe architecture. First, similar to the enclave memory protection, the OS is not allowed to change the virtual-to-physical address mapping for the GPU MMIO region once the mapping is established for the GPU enclave. Second, any accesses to the GPU MMIO region other than from the GPU enclave must be prohibited. Third, the GPU MMIO mapping and routing configuration in the PCIe root complex must not be changed once the GPU enclave is initialized. Finally, the DMA data from/to the GPU must be protected from the malicious OS.

Trusted application-to-GPU communication: For secure GPU computation, GPU requests are transferred from the user enclave to the GPU enclave, and the GPU enclave sends the corresponding command to the GPU on behalf of the user enclave. HIX leverages attestation and symmetric encryption to ensure secure communication between the user and GPU enclaves.

Table 1 summarizes the required hardware and software changes. With these hardware and software changes, HIX provides trusted GPU services to user enclaves, supporting the confidentiality and integrity of their sensitive data and the secure execution on them.

Table 1. Required hardware and software changes for HIX.

Type | Changed Component | Purpose | Section
SW | GPU enclave | Sole GPU control | 4.2
HW | New SGX instructions | HW support for GPU enclave | 4.2
HW | Internal data structures | HW support for GPU enclave | 4.2
HW | MMU page table walker | MMIO access protection | 4.3
HW | PCIe root complex | MMIO lockdown | 4.3
SW | Inter-enclave communication | Trusted GPU usage for users | 4.4
4.2 GPU Enclave

As illustrated in Figure 3, central to the HIX design is the user-mode GPU enclave, which is responsible for two functions: (1) sole control over the GPU, and (2) the sole user access interface to the GPU. To reduce the attack surface, HIX separates the critical functionality for controlling the GPU from the OS-resident driver, and isolates it within the GPU enclave. The role of the remaining part of the driver in the OS is reduced to offering benign kernel services, such as assigning new virtual addresses for MMIO regions allocated to the GPU enclave. During its initialization, the GPU enclave resets the GPU state to eliminate possible untrusted GPU programs loaded in the GPU. A required extension for SGX to support the GPU enclave is to allow the GPU enclave to access the GPU MMIO region exclusively, preventing all other software from accessing the GPU MMIO region.
[Figure 4. Data structures for protecting MMIO accesses]

4.2.1 GPU MMIO Registration

HIX provides extended SGX instructions to safely manage the GPU MMIO regions related to GPU management and data copy. To protect the hardware I/O path from unauthorized accesses, the hardware needs to know (1) which MMIO region should be protected (the physical addresses of the MMIO region), (2) where it is mapped in the GPU enclave's virtual address space (the corresponding virtual address of the MMIO region), and (3) which GPU enclave should be permitted to access it. To register the GPU MMIO regions, two new instructions, EGCREATE and EGADD, similar to the Intel SGX instructions ECREATE and EADD, are added.

Intel SGX stores its internal data structures in EPC memory pages that are not accessible from software. Likewise, HIX stores additional internal data structures for GPU management in EPC memory pages. Two of these hidden data structures are the GPU enclave control structure (GECS) and the trusted GPU MMIO region (TGMR) table, which are analogous to the SGX enclave control structure (SECS) and the enclave page cache map (EPCM) for regular enclaves. The GECS contains the control information regarding the GPU enclave, including the hardware GPU number and the GPU enclave ID. The TGMR contains the virtual and physical address mapping information of the GPU MMIO region, which is used to verify the address mapping for the MMIO region.

Figure 4 illustrates how the security meta-data structures for a GPU enclave are used. During its initialization, a GPU enclave process creates a GPU enclave by using the EGCREATE instruction with the GPU number, consisting of bus, device, and function numbers, retrieved from the PCIe interface provided by the trusted PCIe root complex. Then the pair of the created GPU enclave ID and the GPU number is stored in the GECS. HIX hardware ensures that the given GPU is a real hardware GPU, and that no GPU is registered to two GPU enclaves at the same time. After creation, the GPU enclave registers virtual address and MMIO physical address pairs to HIX with the EGADD instruction. During the registration, HIX checks whether the virtual address and MMIO address are valid for the GPU enclave and the owning GPU device, and stores them into the TGMR table if they are verified. The registered MMIO regions are access-protected through verification using a virtual-to-physical address mapping protection similar to that of regular SGX enclaves. The MMIO access protection mechanism is detailed in Section 4.3.1.
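The following sketch illustrates how a GPU enclave process might drive this registration. Since the paper defines EGCREATE and EGADD only at the architectural level, the intrinsic signatures, operand structures, and the example addresses (mirroring the Figure 4 example) are all illustrative assumptions.

    /* Hypothetical wrappers for the new HIX instructions; the operand
     * structures are illustrative sketches of what EGCREATE/EGADD
     * might consume, not a defined ABI. */
    #include <stdint.h>

    struct gpu_bdf { uint8_t bus, dev, func; };  /* PCIe bus/dev/func */

    struct egadd_args {
        uint64_t mmio_pa;   /* physical address of one MMIO region   */
        uint64_t va;        /* where it is mapped in the GPU enclave */
        uint64_t size;      /* length of the region                  */
    };

    extern uint64_t egcreate(struct gpu_bdf gpu);  /* returns enclave ID */
    extern int      egadd(uint64_t eid, const struct egadd_args *a);

    int register_gpu_mmio(void)
    {
        /* Device 01:00.0, as in the example of Figure 4. */
        struct gpu_bdf gpu = { 0x01, 0x00, 0x0 };
        uint64_t eid = egcreate(gpu); /* records (enclave ID, GPU) in GECS */

        /* Illustrative addresses; hardware validates the pair and
         * stores it in the TGMR table. */
        struct egadd_args bar0 = {
            0x8e000000ULL, 0x7fff20000000ULL, 0x1000000ULL
        };
        return egadd(eid, &bar0);
    }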
4.2.2 GPU Initialization and Measurement

Once the GPU enclave is created and loaded, it initializes the GPU state to clean up any potentially malicious code in the GPU. In addition, the GPU enclave reads and measures the GPU BIOS, which may have been compromised before the GPU enclave was created. Note that once the GPU enclave is created, it has exclusive control over the GPU, and thus even the operating system cannot change the BIOS of the GPU.

Attesting the GPU hardware is done in two steps: (1) verifying the integrity of the GPU BIOS, and (2) resetting the GPU to eliminate potential malicious code. The GPU enclave reads the GPU BIOS bytecode from the address stored in the PCIe expansion ROM base address register. Once the GPU BIOS is verified to be genuine, HIX initiates the reset step for the GPU, cleansing the GPU device state.
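Under the assumption that a vendor-published reference measurement is available to the GPU enclave, the two steps could be sketched as follows. The helper names and the use of SHA-256 are ours, standing in for whatever digest the attestation infrastructure actually uses.

    /* Sketch of GPU BIOS measurement inside the GPU enclave. The
     * helpers are hypothetical; SHA-256 stands in for the actual
     * digest used by the attestation infrastructure. */
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    extern size_t read_expansion_rom(uint8_t *buf, size_t max);
    extern void   sha256(const uint8_t *data, size_t len, uint8_t out[32]);
    extern void   gpu_reset(void);

    int measure_and_reset_gpu(const uint8_t expected[32])
    {
        uint8_t bios[512 * 1024], digest[32];
        /* 1. Read the GPU BIOS bytecode via the expansion ROM BAR. */
        size_t len = read_expansion_rom(bios, sizeof(bios));
        /* 2. Hash it and compare against the reference measurement. */
        sha256(bios, len, digest);
        if (memcmp(digest, expected, sizeof(digest)) != 0)
            return -1;              /* BIOS may have been tampered with */
        /* 3. Only then reset the GPU to cleanse resident state. */
        gpu_reset();
        return 0;
    }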
4.2.3 GPU Protection on GPU Enclave Termination

Although HIX does not address availability attacks, HIX is still responsible for protecting the data in the GPU when the GPU enclave becomes unavailable. Even if the adversary forcefully kills the GPU enclave, the GPU is protected by the HIX hardware. As the killed GPU enclave process still owns the GPU, the GPU can no longer be accessed by any software, and even a newly created GPU enclave process cannot own the GPU. Hence the user data in the GPU remains inaccessible and protected. The GPU can only be used again after the system is shut down and booted again. During the system cold boot procedure, the GPU memory and register states are all reset, and the GPU registration information stored in the GECS and TGMR table is cleared.

If the OS asks the GPU enclave for a graceful termination, the GPU enclave aborts the entire GPU execution, clears the GPU data, and returns the GPU to the OS. User enclaves are notified that the GPU enclave is terminated and that the GPU is no longer trusted.

4.3 Securing I/O Path: MMIO and DMA

The next step to secure GPU computing is to protect the command and data path to the GPU. The command path to the GPU is through the PCIe interconnect accessed via MMIO, and the data can be transferred by MMIO and DMA. This section presents the I/O path protection by HIX.
ASPLOS ’19, April 13–17, 2019, Providence, RI, USA I. Jang, et al.

4.3.1 MMIO Access Protection

The baseline SGX EPC access protection mechanism validates the virtual-to-physical mapping in the translation lookaside buffer (TLB) with the information in the EPCM. HIX extends it to protect address translation for the MMIO region, using the GECS and TGMR, as illustrated in Figure 4. When software accesses the MMIO region with a virtual address, the MMU translates it to a physical address using the TLB. On a TLB miss, before adding a new entry into the TLB, the hardware page table walker validates it with the following four comparisons: (1) the current process is the GPU enclave, by comparing its enclave ID with the GECS; (2) the virtual address in the new TLB entry matches the one the GPU enclave requested; (3) the virtual address in the new TLB entry matches that in the TGMR; and (4) the physical address in the new TLB entry matches that in the TGMR. The entry will be added into the TLB only if the validation succeeds. Otherwise, the access will be denied. The validation guarantees that only a qualified GPU enclave can access its own MMIO region.

This access validation step shares mostly the same mechanism as regular enclaves, partially sharing the same hardware logic component for verification. One minor difference from the regular SGX enclave is the use of the enclave meta-data dedicated to the GPU enclave (GECS and TGMR) to protect the GPU MMIO regions. For regular enclaves, unchanged SGX does not consider protecting accesses to the MMIO region.
MMIO region. tion code (MAC). With the protection for DMA data, only
the encrypted DMA data exist in the unprotected buffer and
4.3.2 MMIO Lockdown and Securing PCIe Routing the integrity is validated by MAC. Therefore, the OS cannot
In the conventional architecture, the privileged system soft- break the confidentiality and integrity of DMA data. Across
ware can remap the MMIO region, or even maliciously mod- user enclaves, the GPU enclave, and GPU, keys are securely
ify PCIe packet routing direction by modifying PCIe device exchanged as discussed in Section 4.4.1. With the secure
registers such as Base Address Registers (BARs) that store key exchange, the communication through untrusted DMA
the information about the MMIO region. To guarantee that mechanism is protected.
the MMIO region mapping and routing of PCIe messages
to the GPU are not modified by malicious software, HIX 4.4 Application-to-GPU Communication
provides an MMIO lockdown mechanism in the PCIe root As the GPU enclave solely controls the GPU, it should pro-
complex. vide a trusted interface for GPU service to user enclaves.
The MMIO lockdown feature is enabled when EGCREATE Figure 5 shows the communication path between a user en-
is called, to freeze the MMIO address map. The processor clave and the GPU enclave. Note that the GPU enclave can
must freeze the MMIO configuration registers of all PCIe make secure connections with different keys against multiple
devices between the PCIe root complex and GPU. All the user enclaves simultaneously.
information about the MMIO regions and PCIe routing is Trusted Runtime User Library: HIX provides the trusted
stored in hardware registers. When the lockdown is enabled, user runtime library for applications, which runs in each
the PCIe root complex rejects all PCIe configuration write application enclave. This library consists of GPU APIs such
requests that attempt to modify the MMIO address map and as memory copy or GPU kernel launch operation, the se-
routing configuration. The root complex is able to inspect curity module containing key initialization and user data
the destination of a write request to modify register values encryption, and the communication module for data trans-
by inspecting the target device number and register offset fers. The library facilitates the application development for
in the PCIe configuration transaction packet [5, 19, 43]. If the trusted GPU execution with HIX. In the user enclave of a
4.4.1 Secure Inter-Enclave Communication

GPU management and GPU service functions are moved from the OS device driver to the GPU enclave, which runs as a separate user-space process. Therefore, a communication channel that ensures the confidentiality and integrity of the transferred data among a user enclave, the GPU enclave, and the GPU has to be established. To provide the confidentiality and integrity of data transmitted via untrusted inter-enclave shared media, HIX uses symmetric authenticated encryption. A user enclave and the GPU enclave perform SGX-supported local attestation to verify each other. Once they establish trust through attestation, they create a shared symmetric key by using the Diffie-Hellman key exchange protocol. As the Diffie-Hellman key exchange can be done among multiple parties, the GPU also participates in this key setup procedure and generates the shared symmetric key.

The GPU enclave uses two communication channels with each user enclave: a message queue and shared memory. The message queue is used for communication synchronization, and the shared memory is used for the actual encrypted data transmission. The user enclave first writes encrypted data into the inter-enclave shared memory, and transfers a request through the message queue, waking up the GPU enclave. Then, the GPU enclave handles the request with the data in the shared memory after decrypting it with the shared key.
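The user-enclave side of one request can be sketched as follows, assuming the session key has already been established. The OCB and message queue helpers are hypothetical stand-ins for the HIX library internals.

    /* Sketch of the user-enclave side of a request (Section 4.4.1):
     * encrypt the payload into untrusted shared memory, then wake
     * the GPU enclave through the message queue. Helper names are
     * hypothetical. */
    #include <stddef.h>
    #include <stdint.h>

    extern size_t ocb_aes_encrypt(const uint8_t *key, uint64_t nonce,
                                  const void *in, size_t len, void *out);
    extern void   msgq_send(int qid, uint32_t req_type, size_t len);

    void send_request(const uint8_t *session_key, uint64_t *nonce,
                      int qid, uint32_t req_type,
                      const void *payload, size_t len,
                      void *shared_mem /* untrusted shared buffer */)
    {
        /* Ciphertext plus OCB authentication tag land in shared
         * memory; plaintext never leaves the enclave. */
        size_t clen = ocb_aes_encrypt(session_key, (*nonce)++,
                                      payload, len, shared_mem);
        /* Synchronize via the message queue to wake the GPU enclave. */
        msgq_send(qid, req_type, clen);
    }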
4.4.2 Secure Communication between the GPU Enclave and GPU

Once the trust is set and a key is shared, the two enclaves can communicate securely through an unsecure medium such as shared memory. Between the GPU enclave and the GPU itself, the secure communication path is established through the trusted MMIO to the GPU device. A GPU command buffer is allocated in the trusted MMIO region and secured by HIX's MMIO access protection. The GPU enclave sends the commands that the user enclave requests to the GPU through this secure command buffer.

A naive design for the memory copy operation from the user enclave to the GPU is to copy the user-encrypted data to the GPU enclave first. The GPU enclave decrypts the data, re-encrypts it with a different key, and copies it again to the GPU. To eliminate unnecessary data copies and encryption, the HIX design adopts a single-copy mechanism, as the user enclave, GPU enclave, and GPU share a key. The GPU enclave sends a command to the GPU to copy the user-encrypted data in the inter-enclave shared memory to the GPU memory (cuMemcpyHtoD), or to copy the data in the GPU memory to the inter-enclave shared memory (cuMemcpyDtoH) directly. This design mitigates the overheads from cryptography and data copy. The GPU enclave performs in-GPU decryption after copying encrypted data from the shared memory to the GPU memory, or performs in-GPU encryption before copying the data from the GPU memory to the shared memory. HIX supports two ways of copying data: (1) directly writing data to the trusted MMIO that is mapped to the GPU memory, and (2) using a GPU DMA engine to copy the data [26]. In both ways, the single-copy mechanism is used.

4.4.3 Communication Example

This section describes how data is securely transferred between endpoints, i.e., a user enclave and the GPU. For a memory copy from host to device (cuMemcpyHtoD), the user enclave first copies the encrypted metadata for the request, such as the data size, and sends a cuMemcpyHtoD request to the GPU enclave through the message queue. After the GPU enclave decrypts the request and accepts it, the user enclave encrypts the actual data, copies it into the inter-enclave shared memory, and notifies the GPU enclave again. Unlike the request metadata, which is decrypted in the GPU enclave, the user data heading to the GPU is directly copied from the inter-enclave shared memory to the GPU memory, through either MMIO or DMA, by the GPU enclave. Then, the GPU enclave launches an in-GPU decryption kernel to decrypt the data in the GPU, and replies to the user enclave that the data copy is done. The user enclave can then send the next request, such as launching a kernel.
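Putting the steps together, a minimal sketch of the host-to-device copy from the user enclave's perspective is shown below. The request constant and helper functions are assumed names wrapping the channel of Section 4.4.1.

    /* Sketch of the cuMemcpyHtoD flow of Section 4.4.3 as seen from
     * the user enclave. Helper names are hypothetical. */
    #include <stddef.h>
    #include <stdint.h>

    extern void send_encrypted(uint32_t req, const void *buf, size_t len);
    extern void wait_reply(uint32_t req);

    #define REQ_MEMCPY_HTOD 1

    void secure_memcpy_htod(uint64_t gpu_dst, const void *src, size_t size)
    {
        struct { uint64_t dst; uint64_t size; } meta = { gpu_dst, size };

        /* 1. Send the encrypted request metadata via the queue. */
        send_encrypted(REQ_MEMCPY_HTOD, &meta, sizeof(meta));
        wait_reply(REQ_MEMCPY_HTOD);      /* GPU enclave accepted it */

        /* 2. Encrypt the payload into inter-enclave shared memory;
         *    the GPU enclave copies it to GPU memory (MMIO or DMA)
         *    without re-encryption, then launches the in-GPU
         *    decryption kernel. */
        send_encrypted(REQ_MEMCPY_HTOD, src, size);
        wait_reply(REQ_MEMCPY_HTOD);      /* data decrypted in GPU   */
    }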
ASPLOS ’19, April 13–17, 2019, Providence, RI, USA I. Jang, et al.
4.5 Support for Multiple User Contexts

The pre-Volta Multi-Process Server (MPS) from NVIDIA allows concurrent multi-kernel execution in the GPU from different user processes. However, the pre-Volta MPS platform merges kernels from different user processes into a single GPU context with multiple streams, since the GPU allows only one GPU context to be executed at a time [24, 37]. As kernels even from different user processes share the same GPU context, including the address space, a kernel can access the address range used by a different kernel [37].

Unlike the pre-Volta MPS, HIX creates multiple GPU contexts, one for each user enclave, to isolate each user's GPU address space from the others. Each user enclave sets up a unique key with the GPU enclave for secure communication. The GPU enclave creates separate GPU contexts for user enclaves and maintains per-user keys. The GPU multi-context execution is done by context switches in the GPU. If the current context does not have any pending request for kernel execution, a context switch occurs to a different context [16].

Prior studies reported that GPU context switches and memory deallocation, if not carefully done, can leak information through the shared memory and global memory [17, 45, 51]. To prevent such data leaks, the GPU runtime system must cleanse the deallocated global memory and shared memory. The high cost of context switches in GPUs will adversely affect the performance of HIX. The latest NVIDIA Volta architecture supports better isolated simultaneous execution with a fully separate GPU address space for each client [38]. If GPU-side support for concurrent multi-context execution is available, improving HIX with that support is our future work.
5 Evaluation

This section presents the HIX prototype implementation on an emulated system, and evaluates its performance overheads. In addition, the section provides a qualitative assessment of the security of HIX built upon its design principles.

5.1 Trusted Computing Base (TCB)

HIX is secured with a combination of memory encryption and access restriction [36]. In Table 2, we enumerate the components of HIX's TCB, together with their respective attack surfaces and protection mechanisms. To operate on commodity GPUs without any modification, we protect GPU hardware resources with access restriction, where the modified MMU denies all accesses other than those from the GPU enclave. The trusted PCIe I/O routing mechanism guarantees that packets reach the desired GPU, with the MMIO lockdown in the PCIe root complex. Furthermore, the auxiliary control data structures used for access validation are further secured with the hardware-based SGX protection, which stores the data encrypted in EPC pages and allows no software accesses to it. Enclaves, secured with Intel SGX, communicate through the inter-enclave shared memory, protected by authenticated encryption.

Table 2. HIX Trusted Computing Base (TCB) breakdown

Components | Software Attack Surface | Protection Mechanism | Related Section(s)
GPU Enclave | Memory Access (MemAcc.) | SGX EPC Protection§ | 4.2, 4.3, 4.4
GECS & TGMR | MemAcc. & HIX Instructions | SGX EPC Protection | 4.2
GPU BIOS† | MMIO | MMU (access restriction) | 4.2
GPU Registers | MMIO | MMU (access restriction) | 4.2, 4.3
GPU Memory | MMIO & DMA | MMU (access restriction) + OCB-AES (encryption) | 4.2, 4.3
PCIe Infrastructure‡ | MMIO | PCIe Root Complex (access restriction) | 4.3
User Enclave & HIX Library | MemAcc. | SGX EPC Protection | 4.4
Inter-Enclave Shared Memory | MemAcc. & DMA | OCB-AES (encryption) | 4.4

§ SGX EPC protection consists of access restriction with the EPCM, and memory encryption with the MEE.
† The GPU BIOS is first access-restricted, and then measured by the GPU enclave.
‡ The PCIe routing mechanism is protected by the modification to the PCIe root complex.
5.2 Prototype Implementation

We implemented a prototype of HIX using system virtualization and emulation. The software components, such as the trusted GPU driver in the GPU enclave and the protected communication mechanism across the user enclave, GPU enclave, and GPU, are implemented on top of the emulated system. The system emulation uses KVM-SGX [22] and QEMU-SGX [23], which are provided by Intel to enable SGX functionality in a guest virtual machine.

The required hardware modifications, such as the MMIO lockdown and the new instructions, are supported via emulation. For the new HIX instructions, we used the conditional VM exit mechanism for SGX instructions by using the ENCLS-exiting bitmap [18]. It is a 64-bit field in the virtual machine control structure (VMCS), and each bit position of the bitmap forces the corresponding SGX instruction to incur a VM exit. The instructions and internal data structures are implemented in KVM and managed by the VM exit handler. The PCIe MMIO lockdown is implemented in QEMU's emulated IOH3420 PCIe root port device. The modified PCIe root device rejects write requests to the PCIe configuration space if the request modifies the registers for MMIO routing. TLB entry validation, which checks whether the MMIO addresses are modified or whether an adversary is accessing the trusted MMIO, is emulated in the EPT violation handling procedure of KVM [54].

For the GPU driver running in the GPU enclave, we use Gdev, an open-source CUDA platform for GPU computing [27, 28]. Gdev is modified to run on the modified SGX enclave as the GPU enclave. In the Gdev design, synchronization between the GPU driver and the GPU is done via MMIO polling, not interrupts.

The static HIX trusted library is linked to the user enclave for inter-enclave communication, and provides an essential application programming interface (API) almost identical to the corresponding CUDA driver API. Therefore, programmers can easily use HIX in the same way as they use the existing CUDA API. We use the OCB-AES-128 authenticated encryption algorithm for data confidentiality and integrity protection [33]. The Intel SGX-SSL library is used for encryption and decryption in enclaves, and we implemented the GPU cryptography functions based on the OpenSSL OCB implementation and the RFC 7253 specification [14, 21, 33, 46].

To mitigate the cryptography overheads, the memory copies in HIX are pipelined; i.e., authenticated encryption or decryption and the actual copy operation run in parallel. HIX divides a large data block into multiple smaller chunks, and encrypts the (n+1)th chunk during the transfer of the encrypted nth chunk.
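A minimal sketch of this software pipeline is shown below, assuming a 1 MiB chunk size and asynchronous transfer helpers; both are illustrative choices rather than the prototype's actual parameters.

    /* Sketch of the pipelined copy: encrypt chunk n+1 while chunk n
     * is in flight. CHUNK and the helper names are illustrative. */
    #include <stddef.h>
    #include <stdint.h>

    #define CHUNK (1 << 20)   /* 1 MiB chunks, an assumed granularity */

    extern void encrypt_chunk(const uint8_t *in, uint8_t *out, size_t n);
    extern void transfer_async(const uint8_t *buf, size_t n);
    extern void transfer_wait(void);

    void pipelined_send(const uint8_t *data, size_t total)
    {
        static uint8_t enc[2][CHUNK];           /* double buffer */
        size_t off = 0, n = total < CHUNK ? total : CHUNK;
        int cur = 0;

        encrypt_chunk(data, enc[cur], n);       /* prime the pipeline */
        while (off < total) {
            transfer_async(enc[cur], n);        /* copy chunk n ...   */
            size_t next_off = off + n;
            size_t next_n = total - next_off < CHUNK
                          ? total - next_off : CHUNK;
            if (next_n)                         /* ... encrypt n+1    */
                encrypt_chunk(data + next_off, enc[cur ^ 1], next_n);
            transfer_wait();
            off = next_off; n = next_n; cur ^= 1;
        }
    }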
Table 4. Matrix sizes and the corresponding data sizes

Matrix size | HtoD size | DtoH size | Total memory requirement
2048x2048 | 32MB | 16MB | 48MB
4096x4096 | 128MB | 64MB | 192MB
8192x8192 | 512MB | 256MB | 768MB
11264x11264 | 968MB | 484MB | 1452MB

[Figure 6. Execution time of matrix addition and matrix multiplication on Gdev and HIX.]
5.3 Performance Overhead

We evaluate the performance overheads of HIX with two workload scenarios. First, we use micro-benchmarks for matrix addition and multiplication to evaluate the entire execution stages, from the application initiation originating from the user enclave to the completion in the GPU. Next, we use the Rodinia benchmarks for more realistic workload scenarios [6, 7]. Each test is measured five times, and the average is shown.

We perform our evaluation on a system with a GPU and an SGX-enabled CPU. Table 3 presents the system configuration for the evaluation. In this section, Gdev denotes runs with the original unsecure Gdev platform.

Table 3. Prototype system configurations

 | Host | Guest
OS | Ubuntu 16.04.4 LTS 64bit | Ubuntu 16.04.5 LTS 64bit
Kernel | 4.14.28 | 4.13.0
CPU | Intel Core i7 6700 3.40GHz 4C/8T (host and guest)
GPU | - | NVIDIA Geforce GTX 580
SGX | KVM-SGX & QEMU-SGX | SGX SDK ver 2.0

5.3.1 Matrix Operation Microbenchmarks

To analyze the performance of HIX, we first use simple matrix operations, integer matrix addition (A + B = C) and integer matrix multiplication (A × B = C), and compare the results between HIX and the original Gdev with various data sizes. Table 4 presents the sizes of the input and output data in terms of the matrix size. Note that the GPU we used for the tests (NVIDIA Geforce GTX 580¹) has a 1.5GB memory capacity, hence we could not measure the performance for matrix operations requiring more than 1.5GB of memory.

The results are illustrated in Figure 6. For matrix addition, with its low ratio of computation over communication, the overhead from the cryptographic operations dominates the other costs, causing the execution to be 2.5x slower than Gdev. For matrix multiplication, however, the computation time drastically increases compared to the addition, making the security overheads account for a much smaller portion of the execution time. For multiplication with the 11264×11264 input size, HIX is slower than the original Gdev by only 6.34%. As shown by the analysis, the majority of the performance overheads in HIX come from the authenticated encryption between the user enclave and the GPU. The performance cost of HIX highly depends on the ratio of the computation in the GPU to the communication between the CPU and GPU.

¹ This particular GPU was selected in this study due to the availability of Gdev support for the GPU architecture.
5.3.2 Rodinia Microbenchmarks

Table 5. List of Rodinia benchmark applications

App | Memcpy (HtoD / DtoH) | Problem Size
Back Propagation (BP) | 117.0MB / 42.75MB | 589,824 nodes
Breadth-First Search (BFS) | 45.78MB / 3.81MB | 1,000,000 nodes
Gaussian Elimination (GS) | 32.00MB / 32.00MB | 2048×2048 points
Hotspot (HS) | 8.00MB / 4.00MB | 1024×1024 points
LU Decomposition (LUD) | 16.00MB / 16.00MB | 2048×2048 points
Needleman-Wunsch (NW) | 128.1MB / 64.03MB | 4096×4096 points
K-nearest Neighbors (NN) | 334.1KB / 167.05KB | Default inputs
Pathfinder (PF) | 256.0MB / 32.00KB | 8192×8192 points
SRAD | 24.23MB / 24.19MB | 3096×2048 points

Table 5 presents the list of applications selected from the Rodinia benchmark suite, and the amounts of data transferred between the CPU and GPU along with the problem sizes. The application selection follows the one used for the original Gdev evaluation, although there are minor changes due to porting issues.

Figure 7 presents the results for the selected Rodinia benchmark applications. HIX showed 26.8% slower performance than the unsecure Gdev on average. When the computation-to-communication ratio is high, as in GS, HIX exhibits performance comparable to Gdev. However, the performance degradations are higher for the applications with large data transfers (BP, NW, and PF), with 81.5%, 70.1%, and 154% performance degradation respectively. In addition, the task initialization overhead is slightly lower in HIX, and thus small kernel launches in HS, LUD, and NN are faster in HIX than in Gdev.
ASPLOS ’19, April 13–17, 2019, Providence, RI, USA I. Jang, et al.

[Figure 7. Execution time of Rodinia benchmarks with single-user execution]

5.4 Multi-User Execution

In this section, we evaluate the performance when multiple users request the GPU service simultaneously. The results are illustrated in Figure 8 (service to two users) and Figure 9 (service to four users). The execution times are normalized to those of Gdev with one user. As HIX includes multiple in-GPU cryptography kernel executions, the overheads from the cryptography kernel execution itself, the increased context switches, and the resource underutilization for small-data cryptography make HIX performance worse than Gdev. HIX parallel execution shows performance about 45.2% worse with two users, and 39.7% worse with four users, than the Gdev parallel execution. However, the performance is still better than the execution scenario in which the GPU enclave runs the received requests sequentially. Once concurrent multi-user execution without context switches is supported with the introduction of the latest NVIDIA Volta architecture, the performance degradation is expected to be significantly reduced.
[Figure 8. Multi-user execution (two users): Rodinia benchmark execution time with two users, normalized to Gdev with one user.]

[Figure 9. Multi-user execution (four users): Rodinia benchmark execution time with four users, normalized to Gdev with one user.]

5.5 Security Analysis

We first present a minimal set of security axioms that HIX is founded upon and analyze how HIX defends against classes of attacks given these axioms. We assume that the following security axioms hold valid for HIX:

Axiom #1 - Hardware root of trust: Both the GPU and the SGX-enabled CPU are trusted and not subject to physical attacks.

Axiom #2 - SGX-enabled security: SGX preserves the integrity of code running within enclaves and the confidentiality of data stored at runtime in the enclaves.

Axiom #1 guarantees the presence of a trusted CPU that ensures SGX operates correctly as designed. In addition, it assumes that the GPU hardware itself is trusted, as a physical attack on it is out of scope. Axiom #2 ensures the confidentiality and integrity of code and data within the SGX enclaves. The code that executes in the enclaves (both in the user and GPU enclaves) can be attested and verified to be as intended.

In Figure 10, we illustrate the round-trip user data flow to and from the user application and the GPU, and highlight the attack surface of HIX, indicated with circled numbers. We design HIX to guard against these possible attack points.

Data Confidentiality and Integrity Attacks: An attacker can target two forms of data, namely (1) communication data between two entities at runtime, and (2) computational data that is being used in an entity or stored at rest.

First, to protect communication data outside the trusted entities covered by Axiom #1, HIX safeguards the inter-enclave shared memory communication channel (1), the MMIO path (3), and the PCIe routing path (4). To secure the inter-enclave communication, HIX uses the Intel SGX local attestation and the Diffie-Hellman key exchange protocol [12] to negotiate the initial session encryption keys between the enclaves. The subsequent inter-enclave communication flows of the request messages and user data are then encrypted with the OCB-AES authenticated encryption algorithm to ensure their confidentiality and integrity. An incrementing nonce is also used to ensure the freshness of the encrypted messages and to prevent replay attacks. When the data is transferred, it remains encrypted with the key, hence the confidentiality and integrity are still guaranteed until the data reaches the GPU.
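The receiving side of this scheme can be illustrated with a short sketch; ocb_open is a stand-in for the OCB-AES decrypt-and-verify primitive, and the strictly incrementing nonce is what rejects replayed ciphertexts.

    /* Sketch of replay protection on the receiving side: a message
     * must authenticate under OCB-AES and carry the next expected
     * nonce. ocb_open is a stand-in for the OCB decrypt-and-verify
     * primitive. */
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    extern bool ocb_open(const uint8_t key[16], uint64_t nonce,
                         const uint8_t *ct, size_t len, uint8_t *pt);

    bool recv_message(const uint8_t key[16], uint64_t *expected_nonce,
                      const uint8_t *ct, size_t len, uint8_t *pt)
    {
        /* A replayed or reordered message fails either the tag check
         * or the nonce comparison, so stale ciphertexts are rejected. */
        if (!ocb_open(key, *expected_nonce, ct, len, pt))
            return false;
        (*expected_nonce)++;   /* incrementing nonce ensures freshness */
        return true;
    }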
Second, we ensure that critical ephemeral data, such as the session cryptographic keys, remain protected within the confines of the SGX hardware-enforced isolation (2). Axiom #2 ensures that the secret data remains confidential and inaccessible to the adversary. The session cryptographic keys are also created and stored in the GPU; still, they cannot be accessed by the adversary, as the MMIO can only be accessed by the eligible GPU enclave, as detailed in Section 4.3.
[Figure 10. Attack surface analysis illustrates possible attacks and HIX defense at different stages of the secure dataflow.]

Code Integrity Attacks: Since the GPU enclave mediates sole access to the GPU, an attacker may attempt to compromise the code running within the GPU enclave (2) during the setup process. Axiom #2 ensures the integrity of the code running within the enclaves. Furthermore, the user leverages SGX to perform a remote attestation [20] on the code running within the GPU enclave. As part of the attestation process, the GPU enclave code cryptographically confirms its provenance (as being the code provided by the GPU vendor) and further verifies that it has not been modified and is indeed executing on a genuine Intel SGX-enabled system.

MMIO Address Translation Attacks: To subvert the secure hardware I/O path established between the GPU and the GPU enclave via the MMIO (3), an attacker can try to redirect one of the path endpoints to an attacker-controlled entity. Two potential ways to achieve this are: (1) registering an erroneous address pair during TGMR registration, and (2) modifying the page table entry related to the MMIO. These attacks are thwarted by HIX's design. For the first type of attack, the execution of EGADD validates that the virtual and physical addresses are within the proper range of the GPU enclave virtual address space and the physical MMIO region. In the second type of attack, after the trusted MMIO region has been registered, the attacker can attempt to modify a page table entry for the MMIO to redirect traffic between the GPU and GPU enclave to a memory region the attacker controls. To guard against this, the page table entry retrieved by the page table walker is validated before being used, as detailed in Section 4.3.

DMA Attacks: An attacker can modify the target physical address of a DMA and allow attacker-controlled data to be copied to/from the GPU (5). However, this attack will not work in the presence of HIX's use of authenticated encryption. The integrity of the encrypted data is checked based on the OCB-AES algorithm. If an attacker attempts to inject compromised data at runtime, the GPU and user enclave will detect the failure in the integrity check and abort. This protection remains valid even when a malicious IOMMU is used for DMA [58].

PCIe Routing Modification Attacks: An attacker can attempt to intercept PCIe packets heading to the GPU by modifying the intermediate PCIe routing path (4). In addition, an attacker can redirect packets from the GPU enclave to an untrusted destination, which can potentially induce the GPU enclave to create a secret key with an untrusted device other than the GPU. To prevent the PCIe routing table from modification, the GPU enclave locks the MMIO routing information through the MMIO lockdown, and then validates the routing information from the PCIe root complex to the GPU during initialization, as described in Section 4.3.2. After the GPU enclave is initialized, the attacker cannot modify the routing path from the host to the GPU.

GPU Enclave Termination Attacks: As discussed in Section 4.2.3, a forcefully terminated GPU enclave (2) is still registered in the hardware (GECS and TGMR) as the owner process of the GPU. Therefore, even a newly created GPU enclave process cannot access the GPU, as the GPU enclave registration is not reset by the GPU enclave termination. The GPU can be used only after a power recycling and system reboot, which removes any remaining information in the GPU and its memory.

GPU Emulation Attacks: A privileged adversary can set up an emulated GPU (6). However, during the secure initialization of the GPU enclave, HIX checks the hardware status of the GPU. Since the trusted PCIe root complex retrieves only real device attributes, HIX can prevent an emulated GPU from being used and guarantee the trusted routing to the actual hardware GPU.

5.6 Limitations

In this section, we discuss the limitations of HIX, stemming from its key design principle: no modification to the GPU architecture.

Physical Attacks on GPUs: GPUs do not have a trusted memory region, and the data in the GPU memory exists in plaintext. Therefore, direct physical accesses to the GPU memory will expose the user data. In addition, as the PCIe interconnects are exposed, injecting malicious PCIe packets via special hardware is possible. Against such packet injections, securing the routing path to the GPU is not sufficient to secure the control of the GPU via MMIO.

No PCIe Peer-to-Peer Transaction Service: PCIe peer-to-peer (P2P) transaction services, such as NVIDIA GPUDirect, are used in high performance systems. The HIX design in this paper is focused on a single-GPU or multi-GPU system without P2P connections across GPUs, providing protection only for the communication path between a user enclave and the GPU. Investigating P2P communication with HIX is our future work.
ASPLOS ’19, April 13–17, 2019, Providence, RI, USA I. Jang, et al.

without P2P connection across GPUs, providing protection residual information from past computational sessions can be
only for the communication path between a user enclave examined by attackers from the GPU memory [17, 29, 34, 56].
and GPU. Investigating the P2P communication with HIX is Recent studies investigate the security aspects of I/O de-
our future work. vices and their computations. Border Control proposed the
PCIe Feature Not Supported under MMIO Lockdown: The PCI specification defines a way of querying the MMIO size [41]. However, the sizing inquiry involves a BAR write with all 1's, which is not allowed after the MMIO lockdown in HIX. This problem is implementation-specific; it can be solved with an additional mechanism, such as having the PCIe root complex exceptionally accept an MMIO modification when it writes all 1's for the sizing inquiry.
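For reference, the standard sizing sequence for a 32-bit memory BAR proceeds as sketched below; the configuration-space accessors are placeholders for whatever mechanism the platform provides. The all-1's write in this sequence is exactly what the MMIO lockdown rejects.

    #include <stdint.h>

    /* Placeholder config-space accessors; a real implementation would use
     * the platform's PCI configuration mechanism (e.g., ECAM). */
    uint32_t pci_cfg_read32(uint8_t bus, uint8_t dev, uint8_t fn, uint8_t off);
    void pci_cfg_write32(uint8_t bus, uint8_t dev, uint8_t fn, uint8_t off,
                         uint32_t val);

    /* Sizing a 32-bit memory BAR per the PCI specification: save the BAR,
     * write all 1's, read back the address mask, restore the BAR, then
     * decode the region size from the mask. */
    static uint32_t mem_bar_size(uint8_t bus, uint8_t dev, uint8_t fn,
                                 uint8_t bar_off)
    {
        uint32_t saved = pci_cfg_read32(bus, dev, fn, bar_off);

        /* This is the write that the HIX root complex would have to accept
         * as an explicit exception for sizing to keep working. */
        pci_cfg_write32(bus, dev, fn, bar_off, 0xFFFFFFFFu);
        uint32_t mask = pci_cfg_read32(bus, dev, fn, bar_off);
        pci_cfg_write32(bus, dev, fn, bar_off, saved);

        mask &= ~0xFu;      /* clear the memory BAR flag bits [3:0] */
        return ~mask + 1u;  /* size is the two's complement of the mask */
    }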
No GPU Demand Paging Support: Recent GPUs support demand paging, which dynamically copies data from the host to the GPU on page faults to extend the GPU memory to the main memory [44, 47, 48]. Supporting such demand paging requires additional encryption and integrity protection for the pages before they are written back to the main memory. However, our prototype does not provide this feature due to the lack of demand paging support in the open-source Gdev platform. Adding demand paging support will be our future work.
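Although unimplemented, the eviction path of such support would have to seal each page before it leaves the GPU. The sketch below shows that requirement; the helper names are hypothetical, since this feature does not exist in our prototype.

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <stdlib.h>

    #define GPU_PAGE_SIZE 4096

    /* Hypothetical helpers: authenticated encryption of one GPU page and
     * the DMA copy of the sealed page to host memory. */
    bool ocb_aes_encrypt_page(const uint8_t key[16], uint64_t page_nonce,
                              const uint8_t in[GPU_PAGE_SIZE],
                              uint8_t out[GPU_PAGE_SIZE], uint8_t tag[16]);
    void dma_copy_to_host(const uint8_t *src, size_t len, uint64_t host_pa);

    /* On eviction, the page is encrypted and tagged before write-back, so
     * the untrusted main memory only ever holds sealed data. */
    static void evict_gpu_page(const uint8_t key[16], uint64_t page_nonce,
                               const uint8_t page[GPU_PAGE_SIZE],
                               uint64_t host_pa, uint8_t tag_out[16])
    {
        uint8_t sealed[GPU_PAGE_SIZE];

        if (!ocb_aes_encrypt_page(key, page_nonce, page, sealed, tag_out))
            abort();
        dma_copy_to_host(sealed, sizeof sealed, host_pa);
        /* The tag and nonce must be kept in trusted state so the page can
         * be integrity-checked when it is faulted back in. */
    }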
6 Related Work

A recent study conducted in parallel with HIX, Graviton, proposed trusted computation on GPUs through GPU-provided isolated execution [52]. Graviton modifies the GPU hardware to prevent the device driver from directly accessing several critical GPU interfaces, such as communication channels and page table entries. Unlike Graviton, HIX focuses on protecting commodity GPUs with hardware extensions to the I/O components and the SGX support on the CPU side.

There have been several recent studies to improve the security of current systems with SGX. In SCONE [2], a Docker container runs inside an SGX enclave. It uses an asynchronous system call interface to pass the container's system call requests outside the enclave quickly. Kim et al. used SGX to enhance the security of anonymity network software to address its current limitations [30]. Several recent studies aim to reduce the memory capacity limitation of SGX. Eleos proposed a general library for storing data in a secure memory pool outside of the enclave [40]. ShieldStore and SPEICHER proposed application-specific approaches for key-value storage to keep data securely in untrusted memory [3, 31].

Several studies have analyzed the security vulnerabilities of GPUs. CUDA Leaks [45] showed how GPU data can be leaked to a malicious user, and Zhu et al. [58] analyzed the GPU architecture and its potential security holes. PixelVault uses the GPU hardware as secure storage for keys, exploiting the physical isolation between GPU and CPU [51]. Since GPUs typically reuse memory blocks that are not initialized to zero during memory allocations or deallocations, residual information from past computational sessions can be examined by attackers from the GPU memory [17, 29, 34, 56].

Recent studies investigate the security aspects of I/O devices and their computations. Border Control investigated the security of heterogeneous systems with accelerators [39]. That study focuses on protecting the system from potentially malicious accelerators, while HIX provides secure GPU and accelerator computation isolated from compromised privileged software. SUD isolates potentially malicious device drivers from the kernel space by providing an emulated kernel environment in the user space [4].

Several studies investigated hypervisor-based approaches [53, 57] and a system management mode (SMM)-based approach [32] to improve the security between the user and I/O devices under an untrusted OS. SGXIO utilized a formally verified hypervisor to provide a trusted path between user applications and I/O devices [53]. The trusted device driver on the hypervisor provides device services to the user application in an enclave. However, SGXIO does not investigate its approach for performance-oriented GPU computing, as it relies on device virtualization, which has high performance overheads for GPUs. Recent studies of full GPU virtualization showed that the performance overheads are significantly higher than native execution [50, 55].

7 Conclusion

This paper proposed a hardware and software architecture to protect GPU computation from malicious privileged software. HIX isolates the I/O interconnect and the GPU driver from the control of the OS, without requiring any change to the hardware GPU architecture. Although this paper focuses on the discrete GPU platform connected with PCIe buses, HIX can be extended to support various accelerator architectures communicating with CPUs over I/O interconnects by applying the proposed device isolation principles. The prototype implementation on an emulated system demonstrates the feasibility of secure GPU computation with minor hardware I/O interconnect changes.

Acknowledgments

This work was supported by the National Research Foundation of Korea (NRF-2016R1A2B4013352) and by the Institute for Information & communications Technology Promotion (IITP-2017-0-00466). Both grants are funded by the Ministry of Science and ICT, Korea. This work is partially supported by HR0011-18-C-0017 (DARPA) and a gift from Bloomberg. Opinions, findings, conclusions and recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the US Government or commercial entities. Simha Sethumadhavan has a significant financial interest in Chip Scan Inc.
References

[1] Ittai Anati, Shay Gueron, Simon Johnson, and Vincent Scarlata. 2013. Innovative Technology for CPU Based Attestation and Sealing. In The 2nd International Workshop on Hardware and Architectural Support for Security and Privacy (HASP '13), Vol. 13. 1–6.
[2] Sergei Arnautov, Bohdan Trach, Franz Gregor, Thomas Knauth, Andre Martin, Christian Priebe, Joshua Lind, Divya Muthukumaran, Dan O'Keeffe, Mark Stillwell, David Goltzsche, Dave Eyers, Rüdiger Kapitza, Peter Pietzuch, and Christof Fetzer. 2016. SCONE: Secure Linux Containers with Intel SGX. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI '16). 689–703.
[3] Maurice Bailleu, Jörg Thalheim, Pramod Bhatotia, Christof Fetzer, Michio Honda, and Kapil Vaswani. 2019. SPEICHER: Securing LSM-based Key-Value Stores using Shielded Execution. In 17th USENIX Conference on File and Storage Technologies (FAST '19).
[4] Silas Boyd-Wickizer and Nickolai Zeldovich. 2010. Tolerating Malicious Device Drivers in Linux. In 2010 USENIX Annual Technical Conference (USENIX ATC '10). 1–9.
[5] Ravi Budruk, Don Anderson, and Tom Shanley. 2004. PCI Express System Architecture.
[6] Shuai Che, Michael Boyer, Jiayuan Meng, David Tarjan, Jeremy W Sheaffer, Sang-Ha Lee, and Kevin Skadron. 2009. Rodinia: A Benchmark Suite for Heterogeneous Computing. In IEEE International Symposium on Workload Characterization (IISWC '09). 44–54.
[7] Shuai Che, Jeremy W Sheaffer, Michael Boyer, Lukasz G Szafaryn, Liang Wang, and Kevin Skadron. 2010. A Characterization of the Rodinia Benchmark Suite with Comparison to Contemporary CMP Workloads. In IEEE International Symposium on Workload Characterization (IISWC '10). 1–11.
[8] Stephen Checkoway and Hovav Shacham. 2013. Iago Attacks: Why the System Call API is a Bad Untrusted RPC Interface. In The 18th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '13). 253–264.
[9] Victor Costan and Srinivas Devadas. 2017. Intel SGX Explained. IACR Cryptology ePrint Archive (Feb 2017), 1–118.
[10] Advanced Micro Devices. 2017. Radeon's Next Generation Vega Architecture. Technical Report. Advanced Micro Devices, Santa Clara, CA, USA.
[11] Bang Di, Jianhua Sun, and Hao Chen. 2016. A Study of Overflow Vulnerabilities on GPUs. In IFIP International Conference on Network and Parallel Computing (NPC '16). 103–115.
[12] Whitfield Diffie and Martin E. Hellman. 1976. New Directions in Cryptography. Transactions on Information Theory 22, 6 (Nov 1976), 644–654.
[13] Envytools. 2016. Envytools - Tools for People Envious of NVIDIA's Blob Driver. Retrieved August 6, 2018 from https://github.com/envytools/envytools
[14] OpenSSL Software Foundation. 2003. OpenSSL: The Open Source Toolkit for SSL/TLS. Retrieved July 14, 2018 from https://openssl.org
[15] Yusuke Fujii, Takuya Azumi, Nobuhiko Nishio, Shinpei Kato, and Masato Edahiro. 2013. Data Transfer Matters for GPU Computing. In International Conference on Parallel and Distributed Systems (ICPADS '13). 275–282.
[16] Peter N Glaskowsky. 2009. NVIDIA's Fermi: The First Complete GPU Computing Architecture. Technical Report. NVIDIA, Santa Clara, CA, USA.
[17] Ari B Hayes, Lingda Li, Mohammad Hedayati, Jiahuan He, Eddy Z Zhang, and Kai Shen. 2017. GPU Taint Tracking. In 2017 USENIX Annual Technical Conference (USENIX ATC '17). 209–220.
[18] Intel. 2014. Intel Software Guard Extensions Programming Reference. Technical Report. Intel, Santa Clara, CA, USA. https://software.intel.com/sites/default/files/managed/48/88/329298-002.pdf
[19] Intel. 2016. 6th Generation Intel Processor Datasheet for S-Platforms. Technical Report. Intel, Santa Clara, CA, USA. https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/desktop-6th-gen-core-family-datasheet-vol-2.pdf
[20] Intel. 2016. Intel Software Guard Extensions Remote Attestation End-to-End Example. Retrieved Jan 2, 2019 from https://software.intel.com/en-us/articles/intel-software-guard-extensions-remote-attestation-end-to-end-example
[21] Intel. 2018. Intel Software Guard Extensions SSL. Retrieved December 29, 2018 from https://github.com/intel/intel-sgx-ssl
[22] Intel. 2018. KVM-SGX. Retrieved December 29, 2018 from https://github.com/intel/kvm-sgx
[23] Intel. 2018. QEMU-SGX. Retrieved December 29, 2018 from https://github.com/intel/qemu-sgx
[24] Qing Jiao, Mian Lu, Huynh Huynh Phung, and Tulika Mitra. 2015. Improving GPGPU Energy-Efficiency through Concurrent Kernel Execution and DVFS. In IEEE/ACM International Symposium on Code Generation and Optimization (CGO '15). 1–11.
[25] Asim Kadav and Michael M. Swift. 2012. Understanding Modern Device Drivers. In The 17th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '12). 87–98.
[26] Shinpei Kato. 2013. Implementing Open-Source CUDA Runtime. Technical Report. Nagoya University.
[27] Shinpei Kato, Yuki Abe, Jason Aumiller, Takuya Edahiro, Yusuke Fujii, Masaki Iwata, Marcin Koscielnicki, Michael McThrow, Martin Peres, Hiroshi Sasaki, Yusuke Suzuki, Hisashi Usuda, Kaibo Wang, and Hiroshi Yamada. 2014. Gdev: Open-Source GPGPU Runtime and Driver Software. Retrieved June 17, 2018 from https://github.com/shinpei0208/gdev
[28] Shinpei Kato, Michael McThrow, Carlos Maltzahn, and Scott A. Brandt. 2012. Gdev: First-Class GPU Resource Management in the Operating System. In 2012 USENIX Annual Technical Conference (USENIX ATC '12). 401–412.
[29] Michael Kerrisk. 2012. XDC2012: Graphics Stack Security.
[30] Seong Min Kim, Juhyeng Han, Jaehyeong Ha, Taesoo Kim, and Dongsu Han. 2017. Enhancing Security and Privacy of Tor's Ecosystem by Using Trusted Execution Environments. In 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI '17). 145–161.
[31] Taehoon Kim, Joonun Park, Jaewook Woo, Seungheun Jeon, and Jaehyuk Huh. 2019. ShieldStore: Shielded In-memory Key-value Storage with SGX. In 14th European Conference on Computer Systems (EuroSys '19).
[32] Yonggon Kim, Ohmin Kwon, Jinsoo Jang, Seongwook Jin, Hyeongboo Baek, Brent Byunghoon Kang, and Hyunsoo Yoon. 2016. On-demand Bootstrapping Mechanism for Isolated Cryptographic Operations on Commodity Accelerators. Computers & Security 62 (Sep 2016), 33–48.
[33] Ted Krovetz and Phillip Rogaway. 2014. The OCB Authenticated-Encryption Algorithm. Technical Report. 1–19 pages.
[34] Sangho Lee, Youngsok Kim, Jangwoo Kim, and Jong Kim. 2014. Stealing Webpages Rendered on Your Browser by Exploiting GPU Vulnerabilities. In IEEE Symposium on Security and Privacy (SP '14). 19–33.
[35] Frank McKeen, Ilya Alexandrovich, Alex Berenzon, Carlos V. Rozas, Hisham Shafi, Vedvyas Shanbhogue, and Uday R. Savagaonkar. 2013. Innovative Instructions and Software Model for Isolated Execution. In The 2nd International Workshop on Hardware and Architectural Support for Security and Privacy (HASP '13). 1–8.
[36] Zhenyu Ning, Fengwei Zhang, Weisong Shi, and Weidong Shi. 2017. Position Paper: Challenges Towards Securing Hardware-assisted Execution Environments. In The Hardware and Architectural Support for Security and Privacy (HASP '17). 1–8.
[37] NVIDIA. 2017. Multi Process Service. Technical Report. NVIDIA, Santa Clara, CA, USA. https://docs.nvidia.com/deploy/pdf/CUDA_Multi_Process_Service_Overview.pdf
[38] NVIDIA. 2017. NVIDIA Volta Architecture. Technical Report. NVIDIA, Santa Clara, CA, USA.
[39] Lena E. Olson, Jason Power, Mark D. Hill, and David A. Wood. 2015. Border Control: Sandboxing Accelerators. In Proceedings of the 48th International Symposium on Microarchitecture (MICRO '15). 470–481.
[40] Meni Orenbach, Pavel Lifshits, Marina Minkin, and Mark Silberstein. 2017. Eleos: ExitLess OS Services for SGX Enclaves. In 12th European Conference on Computer Systems (EuroSys '17). 238–253.
[41] PCI-SIG. 2004. PCI Local Bus Specification, Revision 3.0. Technical Report. PCI-SIG, Beaverton, OR, USA.
[42] PCI-SIG. 2009. Address Translation Services Specification, Revision 1.1. Technical Report. PCI-SIG, Beaverton, OR, USA.
[43] PCI-SIG. 2010. PCI Express Base Specification, Revision 3.0. Technical Report. PCI-SIG, Beaverton, OR, USA.
[44] Bharath Pichai, Lisa Hsu, and Abhishek Bhattacharjee. 2014. Architectural Support for Address Translation on GPUs: Designing Memory Management Units for CPU/GPUs with Unified Address Spaces. In The 19th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '14). 743–758.
[45] Roberto Di Pietro, Flavio Lombardi, and Antonio Villani. 2016. CUDA Leaks: A Detailed Hack for CUDA and a (Partial) Fix. ACM Transactions on Embedded Computing Systems (TECS) 15, 1, Article 15 (Feb 2016), 25 pages.
[46] Phillip W. Rogaway. 2006. Method and Apparatus for Facilitating Efficient Authenticated Encryption. U.S. Patent No. 7,046,802. Filed July 30, 2001, issued May 16, 2006.
[47] Phil Rogers. 2013. Heterogeneous System Architecture Overview. In A Symposium on High Performance Chips (Hot Chips '13). 1–41.
[48] Nikolay Sakharnykh. 2017. Unified Memory on Pascal and Volta. GPU Technology Conference '17. http://on-demand.gputechconf.com/gtc/2017/presentation/s7285-nikolay-sakharnykh-unified-memory-on-pascal-and-volta.pdf
[49] Darmawan Salihun. 2014. System Address Map Initialization in x86/64 Architecture Part 2: PCI Express-Based Systems. Retrieved Jan 2, 2019 from http://resources.infosecinstitute.com/system-address-map-initialization-x86x64-architecture-part-2-pci-express-based-systems/
[50] Yusuke Suzuki, Shinpei Kato, Hiroshi Yamada, and Kenji Kono. 2014. GPUvm: Why Not Virtualizing GPUs at the Hypervisor?. In 2014 USENIX Annual Technical Conference (USENIX ATC '14). 109–120.
[51] Giorgos Vasiliadis, Elias Athanasopoulos, Michalis Polychronakis, and Sotiris Ioannidis. 2014. PixelVault: Using GPUs for Securing Cryptographic Operations. In ACM SIGSAC Conference on Computer and Communications Security (CCS '14). 1131–1142.
[52] Stavros Volos, Kapil Vaswani, and Rodrigo Bruno. 2018. Graviton: Trusted Execution Environments on GPUs. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI '18). 681–696.
[53] Samuel Weiser and Mario Werner. 2017. SGXIO: Generic Trusted I/O Path for Intel SGX. In ACM Conference on Data and Application Security and Privacy (CODASPY '17). 261–268.
[54] Sheng Yang. 2008. Extending KVM with New Intel Virtualization Technology. KVM Forum. https://www.linux-kvm.org/images/c/c7/KvmForum2008%24kdf2008_11.pdf
[55] Hangchen Yu and Christopher J. Rossbach. 2017. Full Virtualization for GPUs Reconsidered. In 14th Annual Workshop on Duplicating, Deconstructing, and Debunking (WDDD '17). 1–11.
[56] Zhe Zhou, Wenrui Diao, Xiangyu Liu, Zhou Li, Kehuan Zhang, and Rui Liu. 2017. Vulnerable GPU Memory Management: Towards Recovering Raw Data from GPU. Proceedings on Privacy Enhancing Technologies (PoPETs) 2017, 2 (2017), 57–73.
[57] Zongwei Zhou, Virgil D. Gligor, James Newsome, and Jonathan M. McCune. 2012. Building Verifiable Trusted Path on Commodity x86 Computers. In Symposium on Security and Privacy (SP '12). 616–630.
[58] Zhiting Zhu, Sangman Kim, Yuri Rozhanski, Yige Hu, Emmett Witchel, and Mark Silberstein. 2017. Understanding the Security of Discrete GPUs. In Proceedings of the General Purpose GPUs (GPGPU-10). 1–11.