
Database privacy.
Buffer overflow attacks
Database privacy
 Two general methods to deal with database privacy
– Query restriction: Limit what queries are allowed. Allowed queries are answered correctly, while disallowed queries are simply not answered
– Perturbation: Queries are answered “noisily”. Also includes “scrubbing” (or suppressing) some of the data
Perturbation
 Data perturbation: Add noise to entire table, then
answer queries accordingly (or release entire
perturbed dataset)
 Output perturbation: Keep table intact, but add
noise to answers
(From: “Computer Security,” by Stallings)
Perturbation
 Trade-off between privacy and utility!
 No randomization – bad privacy but perfect utility
 Complete randomization – perfect privacy but no
utility
Data perturbation
 One technique: data swapping
– Substitute and/or swap any values, while maintaining low-order statistics (a minimal sketch of the swap step follows the table)
– The restriction of the table to any two columns is identical before and after the swap

Before swap:          After swap:
F  Bio    4.0         F  Bio    3.0
F  CS     3.0         F  CS     4.0
F  EE     3.0         F  EE     4.0
F  Psych  4.0         F  Psych  3.0
M  Bio    3.0         M  Bio    4.0
M  CS     4.0         M  CS     3.0
M  EE     4.0         M  EE     3.0
M  Psych  3.0         M  Psych  4.0
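
A minimal sketch of the swap step in C, assuming the table is held in memory as an array of records; the record layout and the particular pair of rows swapped are illustrative, not from the slides:

#include <stdio.h>

/* Illustrative record layout for the table above. */
struct row {
    char sex;        /* 'F' or 'M' */
    char major[8];   /* "Bio", "CS", "EE", "Psych" */
    double gpa;
};

/* Swap the GPA values of two rows with the same major: the multiset of
 * (major, GPA) pairs is unchanged, and choosing complementary swaps (as
 * in the table above) keeps every two-column restriction identical,
 * while individual rows no longer show the true GPA of any one person. */
static void swap_gpa(struct row *a, struct row *b) {
    double tmp = a->gpa;
    a->gpa = b->gpa;
    b->gpa = tmp;
}

int main(void) {
    struct row table[2] = {
        { 'F', "Bio", 4.0 },
        { 'M', "Bio", 3.0 },
    };
    swap_gpa(&table[0], &table[1]);
    for (int i = 0; i < 2; i++)
        printf("%c %-6s %.1f\n", table[i].sex, table[i].major, table[i].gpa);
    return 0;
}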
Data perturbation
 Second technique: (re)generate the table based on
derived distribution
– For each sensitive attribute, determine a probability
distribution that best matches the recorded data
– Generate fresh data according to the determined
distribution
– Populate the table with this fresh data
 Queries on the database can never “learn” more
than what was learned initially
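
A minimal sketch of the second technique above, under the illustrative assumption that the sensitive attribute (GPA) is modeled as normally distributed; the fitted parameters and Box-Muller sampling are choices made only for this sketch:

#include <math.h>
#include <stdio.h>
#include <stdlib.h>

/* Step 1: derive a distribution from the recorded data (here: fit a
 * normal distribution to the GPA column via its sample mean and
 * standard deviation -- the normal model is an assumption).
 * Step 2: generate fresh values from that distribution and use them to
 * populate the released table. */
int main(void) {
    const double PI = 3.14159265358979323846;
    double gpa[] = { 4.0, 3.0, 3.0, 4.0, 3.0, 4.0, 4.0, 3.0 };
    int n = sizeof gpa / sizeof gpa[0];

    double mean = 0.0, var = 0.0;
    for (int i = 0; i < n; i++) mean += gpa[i];
    mean /= n;
    for (int i = 0; i < n; i++) var += (gpa[i] - mean) * (gpa[i] - mean);
    var /= n;
    double sd = sqrt(var);

    srand(1);
    for (int i = 0; i < n; i++) {
        /* Box-Muller transform: two uniforms in (0,1) -> one standard normal. */
        double u1 = (rand() + 1.0) / ((double)RAND_MAX + 2.0);
        double u2 = (rand() + 1.0) / ((double)RAND_MAX + 2.0);
        double z  = sqrt(-2.0 * log(u1)) * cos(2.0 * PI * u2);
        printf("fresh GPA: %.2f\n", mean + sd * z);
    }
    return 0;
}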
Data perturbation
 Data cleaning/scrubbing: remove sensitive data, or
data that can be used to breach anonymity
 k-anonymity: ensure that any “identifying
information” is shared by at least k members of
the database
 Example…
Example: 2-anonymity
The original 5-digit ZIP codes are generalized by suppressing the last digit, so that each (Race, ZIP) combination in the published table is shared by at least two rows:

Race   ZIP (original)   ZIP (published)   Smoke?   Cancer?
Asian  02138            0213x             Y        Y
Asian  02139            0213x             Y        N
Asian  02141            0214x             N        Y
Asian  02142            0214x             Y        Y
Black  02138            0213x             N        N
Black  02139            0213x             N        Y
Black  02141            0214x             Y        Y
Black  02142            0214x             N        N
White  02138            0213x             Y        Y
White  02139            0213x             N        N
White  02141            0214x             Y        Y
White  02142            0214x             Y        Y
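
A minimal sketch in C of the generalization step used in the example above; the record layout, the rule of suppressing the last ZIP digit, and the group-size check are illustrative assumptions:

#include <stdio.h>
#include <string.h>

struct person { const char *race; char zip[6]; };

/* Generalize the quasi-identifier: suppress the last ZIP digit
 * (02138 -> 0213x). */
static void generalize_zip(struct person *p) {
    p->zip[4] = 'x';
}

/* Count how many rows share row i's (race, generalized ZIP) combination. */
static int group_size(const struct person *t, int n, int i) {
    int count = 0;
    for (int j = 0; j < n; j++)
        if (strcmp(t[i].race, t[j].race) == 0 && strcmp(t[i].zip, t[j].zip) == 0)
            count++;
    return count;
}

int main(void) {
    struct person t[] = {
        { "Asian", "02138" }, { "Asian", "02139" },
        { "Black", "02138" }, { "Black", "02139" },
    };
    int n = sizeof t / sizeof t[0], k = 2;

    for (int i = 0; i < n; i++) generalize_zip(&t[i]);
    for (int i = 0; i < n; i++)
        printf("%s %s: shared by %d row(s)%s\n", t[i].race, t[i].zip,
               group_size(t, n, i),
               group_size(t, n, i) >= k ? "" : "  <-- violates k-anonymity");
    return 0;
}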
Problems with k-anonymity
 Hard to find the right balance between what is
“scrubbed” and utility of the data
 Not clear what security guarantees it provides
– For example, what if I know that the Asian person in
ZIP code 0214x smokes?
• Does not deal with out-of-band information
– What if all people who share some identifying
information share the same sensitive attribute?
Output perturbation
 One approach: replace the query with a perturbed
query, then return an exact answer to that
– E.g., a query over some set of entries C is answered using some (randomly-determined) subset C' ⊆ C
– User only learns the answer, not C’
 Second approach: add noise to the exact answer
(to the original query)
– E.g., answer SUM(salary, S) with
SUM(salary, S) + noise
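
A minimal sketch of the first approach, assuming a table of salaries and an illustrative sampling rule that places each entry of C in C' independently with probability p:

#include <stdio.h>
#include <stdlib.h>

/* First approach: instead of answering SUM(salary, C) exactly, answer
 * the perturbed query SUM(salary, C') exactly, where C' is a random
 * subset of C.  The sampling rule here is an assumption made for this
 * sketch; the user sees only the returned value, never C'. */
static double sum_over_random_subset(const double *salary, int n, double p) {
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        if ((double)rand() / RAND_MAX < p)   /* entry i is placed in C' */
            sum += salary[i];
    return sum;
}

int main(void) {
    double salary[] = { 52000, 61000, 48000, 75000, 58000 };
    srand(42);
    printf("answer = %.0f\n", sum_over_random_subset(salary, 5, 0.9));
    return 0;
}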
A negative result [Dinur-Nissim]
 Heavily paraphrased:
Given a database with n rows, if roughly n queries
are made to the database then essentially the entire
database can be reconstructed even if O(√n) noise
is added to each answer
 On the positive side, it is known that very small
error can be used when the total number of queries
is kept small
Formally defining privacy
 A problem inherent in all the approaches we have
discussed so far (and the source of many of the
problems we have seen) is that no definition of
“privacy” is offered
 Recently, there has been work addressing exactly
this point
– Developing definitions
– Provably secure schemes!
A definition of privacy
 Differential privacy [Dwork et al.]
 Roughly speaking:
– For each row r of the database (representing, say, an
individual), the distribution of answers when r is
included in the database is “close” to the distribution of
answers when r is not included in the database
• No reason for r not to include themselves in the database!
– Note: can’t hope for “closeness” better than 1/|DB|
 Further refining/extending this definition, and
determining when it can be applied, is an active
area of research
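
Stated precisely, this is the ε-differential-privacy condition: for a randomized answer mechanism M, every pair of databases D, D' differing in a single row r, and every set S of possible outputs,

    \Pr[\,\mathcal{M}(D) \in S\,] \;\le\; e^{\varepsilon} \cdot \Pr[\,\mathcal{M}(D') \in S\,]

A small ε means the two answer distributions are close, so the answers reveal almost nothing about whether r's data is in the database.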
Achieving privacy
 A “converse” to the Dinur-Nissim result is that
adding some (carefully-generated) noise, and
limiting the number of queries, can be proven to
achieve privacy
 An active area of research
Achieving privacy
 E.g., answer SUM(salary, S) with
SUM(salary, S) + noise,
where the magnitude of the noise depends on the
range of plausible salaries (but not on |S|!)
 Automatically handles multiple (arbitrary) queries,
though privacy degrades as more queries are made
 Gives formal guarantees
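
A minimal sketch of this idea in C, assuming salaries are known to lie in [0, MAX_SALARY] and using Laplace noise with scale MAX_SALARY/ε (a standard instantiation, shown here only for illustration):

#include <math.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_SALARY 200000.0   /* assumed range of plausible salaries */

/* Draw Laplace noise with scale b via inverse-CDF sampling. */
static double laplace_noise(double b) {
    double u = (rand() + 1.0) / ((double)RAND_MAX + 2.0) - 0.5;  /* u in (-0.5, 0.5) */
    return (u < 0 ? 1.0 : -1.0) * b * log(1.0 - 2.0 * fabs(u));
}

/* Answer SUM(salary, S) with noise whose magnitude depends only on the
 * range of plausible salaries, not on |S|. */
static double noisy_sum(const double *salary, int n, double epsilon) {
    double sum = 0.0;
    for (int i = 0; i < n; i++) sum += salary[i];
    return sum + laplace_noise(MAX_SALARY / epsilon);
}

int main(void) {
    double salary[] = { 52000, 61000, 48000, 75000, 58000 };
    srand(7);
    printf("noisy SUM = %.0f\n", noisy_sum(salary, 5, 0.5));
    return 0;
}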
Buffer overflows
Buffer overflows
 Previous focus in this class has been on secure
protocols and algorithms
 For real-world security, it is not enough for the
protocol/algorithm to be secure -- the
implementation must also be secure
– We have seen this already when we talked about side-
channel attacks
– Here, the attacks are active rather than passive
– Also, here the attacks exploit the way programs are run
by the machine/OS
Importance of the problem
 Most common cause of Internet attacks
– Over 50% of CERT advisories related to buffer
overflow vulnerabilities
 Morris worm (1988)
– 6,000 machines infected
 CodeRed (2001)
– 300,000 machines infected in 14 hours
 Etc.
Buffer overflows
 Fixed-sized buffer that is to be filled with
unknown data, usually provided directly by user
 If more data “stuffed” into the buffer than it can
hold, that data spills over into adjacent memory
 If this data is executable code, the victim’s
machine may be tricked into running it
 Can overflow on the stack or the heap…
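
A minimal C illustration of the vulnerable pattern (the function name, buffer size, and input source are hypothetical):

#include <stdio.h>

/* Vulnerable pattern: a fixed-size buffer filled with user-provided data
 * of unchecked length.  Input longer than 15 characters spills past the
 * end of name into adjacent stack memory (saved ebp, return address, ...). */
void greet(void) {
    char name[16];
    printf("Your name? ");
    gets(name);               /* no bounds check -- never use gets() */
    printf("Hello, %s\n", name);
}

int main(void) {
    greet();
    return 0;
}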
A glimpse into memory
[Diagram: the registers ebp, esp, and eip alongside the process memory layout: the stack (holding function frames, with ebp and esp delimiting the current frame), the heap below it, and the program code (pointed into by eip).]
Stack overview
 Each function that is executed is allocated its own
frame on the stack
 When one function calls another, a new frame is
initialized and placed (pushed) on the stack
 When a function is finished executing, its frame is
taken off (popped) the stack
Function calls
[Diagram: layout of the stack during a function call, growing from the frame for the caller function toward the frame for the callee function: the function arguments, the saved eip (return address), the saved ebp, and then the callee's local variables, which make up the frame for the callee function.]
“Simple” buffer overflow
 Overflow one variable into another
[Diagram: stack frame holding the local variables (including color and price), then the saved ebp, the return address (ret addr), and the arguments, followed by the frame of the calling function.]
 gets(color)
– What if I type “blue 1” ?
– (Actually, need to be more clever than this)
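
A sketch of the color/price example in C; whether price really sits directly next to color in memory is compiler- and platform-dependent, so the layout assumed here is purely illustrative:

#include <stdio.h>

int main(void) {
    int price = 100;          /* adjacency of price and color is assumed */
    char color[8];

    printf("Color? ");
    gets(color);              /* typing more than 7 characters overflows  */
                              /* color into whatever lies next to it      */
    printf("color=%s price=%d\n", color, price);
    return 0;
}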
More devious examples…
 strcpy(buf, str)
[Diagram: stack frame containing buf, the saved ebp (a pointer to the previous frame), and the return address, followed by the frame of the calling function. An overflow of buf overwrites these slots; whatever lands in the return-address slot is interpreted as a return address, so the code at that address is executed after func() finishes.]
 What if str has more than buf can hold?
 Problem: strcpy does not check that str is shorter
than buf
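
A sketch of the vulnerable pattern and one common way to avoid it (the function names and buffer size are illustrative):

#include <string.h>

/* The vulnerable pattern: strcpy copies until it finds a NUL terminator
 * in str, regardless of how small buf is. */
void func(const char *str) {
    char buf[64];
    strcpy(buf, str);         /* overflows if strlen(str) >= 64 */
    /* ... use buf ... */
}

/* One common fix (illustrative): use a bounded copy and terminate. */
void func_safe(const char *str) {
    char buf[64];
    strncpy(buf, str, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';
    /* ... use buf ... */
}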
Even more devious…
[Diagram: stack frame with buf, the saved frame pointer (sfp), and the return address, followed by the frame of the calling function.]
 Attacker puts actual assembly instructions into his input string, e.g., the binary code of execve(“/bin/sh”)
 In the overflow, a pointer back into the buffer appears in the location where the system expects to find the return address
Severity of attack?
 Theoretically, attacker can cause machine to
execute arbitrary code with the permissions of the
program itself
 Actually carrying out such an attack involves
many more details
– See “Smashing the Stack…”
Heap overflows
 The examples just described all involved
overflowing the stack
 Also possible to overflow the heap
 More difficult to get arbitrary code to execute, but
imagine the effects of overwriting
– Passwords
– Usernames
– Filenames
– Variables
– Function pointers (possible to execute arbitrary code)
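
A sketch of how a heap overflow can clobber a function pointer; the struct layout, with a buffer directly followed by a function pointer, is an illustrative assumption:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct request {
    char name[16];
    void (*handler)(void);    /* called after the name is filled in */
};

static void default_handler(void) { puts("handling request"); }

void handle(const char *input) {
    struct request *r = malloc(sizeof *r);
    if (!r) return;
    r->handler = default_handler;
    strcpy(r->name, input);   /* input longer than 15 chars overflows name   */
                              /* and overwrites r->handler                   */
    r->handler();             /* may now jump to an attacker-chosen address  */
    free(r);
}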