Architectural Support For High Speed Protection of Memory Integrity and Confidentiality in Multiprocessor Systems
Weidong Shi, Hsien-Hsin (Sean) Lee, Mrinmoy Ghosh, Chenghuai Lu
Georgia Institute of Technology, Atlanta, GA 30332
XBOX with MOD-chip installed. MOD-chip is a low cost bus snoop and spoof device widely used to break XBOX security.
Shared-Memory MP Security Architecture 2
[Figure: P-III system with BIOS Flash (some BIOS code is encrypted); BIOS hijacking; low-cost FPGA-based bus snooping device]
Motivation
Unsolved issues of prior security measures
- Uni-processor based security model
- Protected memory cannot be shared
- Large space and performance overhead in security support
- Some schemes compromise security for performance improvement
Our Work
Protect integrity and confidentiality in a Shared-memory Multiprocessor platform
Agenda
- Uni-processor Security Architecture
- Platform-oriented Security Architecture
- Conclusions
[Figure: platform block diagrams showing the Secure Processor, North Bridge (Mem Controller), RAM, and South Bridge with Ethernet, Mouse, Keyboard, and Disk; the final diagram adds a Crypto Engine]
[Figure: message authentication at the receiver, with labels M bit MAC, M bit MAC, and Exception]
- Again, sender and receiver share the same secret key
- Data tampering is detected using a Message Authentication Code (MAC)
- Any attempt by an adversary to modify data or forge a valid authentication code is detected
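The MAC check described above can be sketched as follows. This is a minimal illustration, not the design from the slides: truncated HMAC-SHA256 stands in for the unspecified "M bit MAC", and the names `make_mac`, `receive`, and `IntegrityError` are hypothetical.

```python
import hashlib
import hmac


class IntegrityError(Exception):
    """Stands in for the 'Exception' raised when MAC verification fails."""


def make_mac(key: bytes, data: bytes, mac_bytes: int = 16) -> bytes:
    # Assumption: truncated HMAC-SHA256 plays the role of the M-bit MAC.
    return hmac.new(key, data, hashlib.sha256).digest()[:mac_bytes]


def receive(key: bytes, data: bytes, mac: bytes) -> bytes:
    # The receiver recomputes the MAC with the shared key and compares
    # it to the MAC that arrived with the data.
    expected = make_mac(key, data, len(mac))
    if not hmac.compare_digest(expected, mac):
        raise IntegrityError("MAC mismatch: data was modified or forged")
    return data


key = b"shared-secret"
tag = make_mac(key, b"cache line")
```

Calling `receive(key, b"cache line", tag)` returns the data, while any change to the data or the tag raises `IntegrityError`.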
[Figure: Processor n (PE n) with processor core, caches, and crypto engine; label "Need to be protected"]
Cache-to-Cache
- send encrypted data first, followed by the encrypted MAC
- receiver decrypts data and verifies integrity
Cache-to-Memory
- send encrypted data and MAC to Nbridge
- Nbridge decrypts the data, verifies its integrity, updates the MAC tree, and stores encrypted data to the RAM
[Figure: RAM, crypto engine, MAC tree cache, and MAC]
- An M-ary MAC (message authentication code) tree protects physical memory integrity dynamically (e.g., against replay attacks).
- The root MAC is a signature of the protected memory space.
- The root MAC is kept inside the North Bridge.
- Frequently accessed MAC tree nodes are cached inside the Nbridge.
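A minimal sketch of an M-ary MAC tree, to illustrate why a trusted root detects replay. It assumes HMAC-SHA256 as the MAC and a fan-out of 4; the functions `build_tree` and `verify_block` are hypothetical and do not model the North Bridge node cache.

```python
import hashlib
import hmac


def mac(key: bytes, data: bytes) -> bytes:
    # Assumption: HMAC-SHA256 stands in for the MAC used by the tree.
    return hmac.new(key, data, hashlib.sha256).digest()


def build_tree(key, blocks, m=4):
    """Return the tree as a list of levels: leaf MACs first, root last."""
    level = [mac(key, b) for b in blocks]
    levels = [level]
    while len(level) > 1:
        # Each parent is the MAC of up to m concatenated children.
        level = [mac(key, b"".join(level[i:i + m]))
                 for i in range(0, len(level), m)]
        levels.append(level)
    return levels


def verify_block(key, blocks, index, levels, trusted_root, m=4):
    """Recompute the path from one block to the root and compare it
    with the root kept in trusted storage."""
    node = mac(key, blocks[index])
    for depth in range(len(levels) - 1):
        start = (index // m) * m
        siblings = list(levels[depth][start:start + m])  # read from untrusted RAM
        siblings[index - start] = node
        node = mac(key, b"".join(siblings))
        index //= m
    return hmac.compare_digest(node, trusted_root)


key = b"k" * 16
blocks = [bytes([i]) * 64 for i in range(16)]
levels = build_tree(key, blocks)
root = levels[-1][0]
```

If an attacker later restores an old block together with its old tree nodes, the recomputed path no longer matches the updated root, so the replay is caught.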
Memory-to-Cache
- Nbridge reads encrypted data and MAC from the RAM
- Nbridge decrypts the data, verifies its MAC, re-encrypts the data, and puts the encrypted data and MAC on the shared bus
- receiver decrypts data and verifies integrity
[Figure: sender and receiver each derive the same pseudo-random pad from (Init. Counter + 0). Plaintext A XOR pad gives Ciphertext A; Ciphertext A XOR pad gives Plaintext A.]
To send a data sequence securely:
- Sender and receiver share a secret key and an initial counter value.
- A pseudo-random pad is generated deterministically.
- The counter value does not need to be a secret.
[Figure: the same scheme for the next block, using (Init. Counter + 1). Plaintext B XOR pad gives Ciphertext B; Ciphertext B XOR pad gives Plaintext B.]
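The counter-based pad scheme above can be sketched as follows. The pad generator is an assumption: HMAC-SHA256 over the counter is used here only as a convenient deterministic pseudo-random function, and the names `pseudo_random_pad` and `xor_with_pad` are hypothetical.

```python
import hashlib
import hmac


def pseudo_random_pad(key: bytes, counter: int, length: int) -> bytes:
    # Deterministic pad: both sides get the same bytes from (key, counter).
    pad = b""
    chunk = 0
    while len(pad) < length:
        message = counter.to_bytes(8, "big") + chunk.to_bytes(4, "big")
        pad += hmac.new(key, message, hashlib.sha256).digest()
        chunk += 1
    return pad[:length]


def xor_with_pad(key: bytes, counter: int, data: bytes) -> bytes:
    # XOR is its own inverse, so this both encrypts and decrypts.
    pad = pseudo_random_pad(key, counter, len(data))
    return bytes(d ^ p for d, p in zip(data, pad))


key = b"shared secret key"
init_counter = 1000  # does not need to be secret

cipher_a = xor_with_pad(key, init_counter + 0, b"Plaintext A")
cipher_b = xor_with_pad(key, init_counter + 1, b"Plaintext B")
```

The receiver recovers each block by calling `xor_with_pad` with the same key and counter. Each counter value must be used for one block only, since reusing a pad would expose the XOR of the two plaintexts.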
[Figure: a cryptographic hash produces a One-Time-Pad (OTP), which is combined with a cache line to give the encrypted data]
OTP generation (from the bus sequence number and the process key)
Bus sequence number
- a 64-bit secret initialized after the system is booted
- shared by all the parties connected to the shared bus
- incremented after each transaction
- All PEs on the shared bus snoop each bus transaction
- OTP can be pre-computed based on an approximate range of bus sequence numbers
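A small model of how parties on the bus could stay in step, offered as a sketch under stated assumptions. SHA-256 over the process key and sequence number stands in for the unspecified cryptographic hash, the 64-byte cache line is assumed, and `BusParty`, `send`, and `snoop` are hypothetical names.

```python
import hashlib

CACHE_LINE_BYTES = 64  # assumed cache line size


def otp(process_key: bytes, seq: int, length: int = CACHE_LINE_BYTES) -> bytes:
    # Assumption: the pad is a hash of the process key and bus sequence number.
    pad = b""
    chunk = 0
    while len(pad) < length:
        pad += hashlib.sha256(
            process_key + seq.to_bytes(8, "big") + bytes([chunk])
        ).digest()
        chunk += 1
    return pad[:length]


class BusParty:
    """A PE (or the North Bridge) attached to the shared bus."""

    def __init__(self, process_key: bytes, boot_seq: int):
        self.process_key = process_key
        self.seq = boot_seq  # 64-bit value agreed on after boot

    def _xor(self, line: bytes) -> bytes:
        pad = otp(self.process_key, self.seq, len(line))
        return bytes(a ^ b for a, b in zip(line, pad))

    def send(self, line: bytes) -> bytes:
        encrypted = self._xor(line)
        self.seq = (self.seq + 1) % 2**64  # incremented after each transaction
        return encrypted

    def snoop(self, encrypted: bytes, decrypt: bool = False):
        # Every party snoops every transaction, so every local counter advances.
        line = self._xor(encrypted) if decrypt else None
        self.seq = (self.seq + 1) % 2**64
        return line
```

Because every party increments its counter on every snooped transaction, the sender and any receiver derive the same pad without the sequence number ever appearing on the bus.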
[Figure: process key generation by the secure kernel, with labels Process unique ID, Secret Constant, Session Key, Encryption (AES), Hash (SHA256), and Process Key]
[Figure: on each side a cryptographic hash yields an OTP (one-time-pad); the sender turns a data block into encrypted data and the receiver turns the encrypted data back into the data block]
OTP Pre-computing
[Figure: OTP generation takes the process key and the latest bus sequence number (+1, +2, +3, ...) and fills an OTP queue with OTP(0x1234abcd0000), OTP(0x1234abcd0001), OTP(0x1234abcd0002), ..., OTP(0x1234abcd001e), OTP(0x1234abcd001f). The bus arbitration logic grants ownership of the shared bus at current bus sequence number 0x1234abcd001e, and OTP(0x1234abcd001e) is used for the data to be transmitted.]
- OTP generation is on the critical path
- We can pre-compute the OTPs needed in the neighborhood of the latest bus sequence number
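The pre-computation idea can be sketched as a small queue keyed by sequence number. This is an illustrative model only: the window size of 32, the SHA-256 based `otp` helper, and the `OtpQueue` class are assumptions, not details taken from the slides.

```python
import hashlib


def otp(process_key: bytes, seq: int) -> bytes:
    # Assumption: pad = hash(process key, bus sequence number).
    return hashlib.sha256(process_key + seq.to_bytes(8, "big")).digest()


class OtpQueue:
    """Keeps pads ready for a window of upcoming bus sequence numbers."""

    def __init__(self, process_key: bytes, latest_seq: int, window: int = 32):
        self.process_key = process_key
        self.window = window
        self.pads = {}
        self.refill(latest_seq)

    def refill(self, latest_seq: int) -> None:
        # Drop pads for sequence numbers already consumed, then
        # pre-compute the ones in [latest_seq, latest_seq + window).
        self.pads = {s: p for s, p in self.pads.items() if s >= latest_seq}
        for seq in range(latest_seq, latest_seq + self.window):
            if seq not in self.pads:
                self.pads[seq] = otp(self.process_key, seq)

    def take(self, granted_seq: int):
        """Return (pad, hit) for the sequence number at bus grant time."""
        pad = self.pads.pop(granted_seq, None)
        hit = pad is not None
        if not hit:
            pad = otp(self.process_key, granted_seq)  # on the critical path
        self.refill(granted_seq + 1)
        return pad, hit


queue = OtpQueue(b"process-key", latest_seq=0x1234ABCD0000)
pad, hit = queue.take(0x1234ABCD001E)
```

With the assumed window of 32, the grant at sequence number 0x1234abcd001e falls inside the pre-computed range, so the pad is available immediately; a grant far outside the window would fall back to computing the pad on demand.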
OTP Pre-Computing
[Figure: Processor A and Processor B each use a cryptographic hash to produce the same OTP (one-time-pad); A turns a data block into encrypted data and B turns the encrypted data back into the data block]
[Figure: Processors A, B, and C on a shared bus. Data and MAC transfers are interleaved on the bus: Data(id, seq), Data(id+1, seq+1), MAC(id-3, seq-3), Data(id+2, seq+2), MAC(id, seq), ...]
ASE (Authentication Speculative Execution)
[Figure: dataflow graph for the code below. The nodes Load r3, r4, r5, Load r6, the comparison r5<r6 (outcome N), r1, r7, and Save r7 carry SAB tags with values between 1 and 3.]
    r3 = (addr1)
    r4 = r3 * const1
    r5 = r4 + const2
    r6 = (addr2)
    if (r5 < r6) {
    } else {
        r7 = r6 + r1
    }
    (addr3) = r7
MAC:     Fetched   Fetched   Fetched
Verify?: Verified  Verified  Verified
Save r7: wait until all the data sources are verified
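The rule "wait until all the data sources are verified" can be illustrated with a toy dependency tracker. This is a sketch under loud assumptions: it does not model the SAB tags from the figure, control dependences are passed in by hand as extra sources, and `AseModel` with its methods is a hypothetical name.

```python
class AseModel:
    """Toy model: values may be used speculatively before their MAC is
    verified, but a store must wait for every load it depends on."""

    def __init__(self):
        self.pending = set()  # loads fetched but not yet MAC-verified
        self.deps = {}        # register -> loads it depends on

    def load(self, reg, load_id):
        self.pending.add(load_id)
        self.deps[reg] = {load_id}

    def alu(self, dst, *srcs):
        # The result inherits the unverified sources of all its inputs.
        self.deps[dst] = set().union(*(self.deps.get(s, set()) for s in srcs))

    def verified(self, load_id):
        self.pending.discard(load_id)

    def can_store(self, reg):
        return not (self.deps.get(reg, set()) & self.pending)


m = AseModel()
m.load("r3", "addr1")
m.alu("r4", "r3")
m.alu("r5", "r4")
m.load("r6", "addr2")
# r7 = r6 + r1 sits on the else path of (r5 < r6), so r5 is listed
# as an extra source to stand in for the control dependence.
m.alu("r7", "r6", "r1", "r5")
```

Here `m.can_store("r7")` stays false until both `m.verified("addr1")` and `m.verified("addr2")` have been called, while the arithmetic in between proceeds without waiting.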
Evaluation Methodology
- RSIM MP simulator
- Benchmarks: SPLASH, SPLASH-2
- Modified the RSIM simulator to support bus-snoop-based cache coherence
- Added an accurate DRAM model
- Added shared memory support
- Implemented a North Bridge simulator with MAC tree authentication
- Extended the processor model to support performance simulation of the proposed protection, including speculative authentication
[Figure: performance comparison of AIO and ASE]
ASE outperforms in-order execution by 80% for 2-processor (2P) and 4-processor (4P) systems.
Data Confidentiality
[Figure: Performance of Protection on Confidentiality (4P). Normalized IPC (0 to 1) for fft, lu, radix, quicksort, water, mp3d, and the average; the legend includes "No cache".]
- 40 to 55% performance loss compared to no security support
- The more cache-to-cache transactions, the faster the execution, due to OTP pre-computation
- With a sequence number cache, memory-to-cache operations can be accelerated by ~30%
Conclusions
- Proposed a security scheme to protect the confidentiality and integrity of shared memory in snoop-bus multiprocessor systems.
- Proposed a number of techniques to minimize the overhead caused by security protection, including:
  - Physical memory (RAM) authentication
  - Shared bus sequence number based encryption
  - Split transmission of data and MAC
  - Authentication Speculative Execution without violating the authentication-safe rule
- Lightweight secure processor design with novel security design features (offload to North Bridge).