5_LanguageBasedSecurity_Safety
Language-based Security:
'Safe' programming languages
[Chapter 3 of lecture notes]
Erik Poll
1
Producing more secure code
You can try to produce more secure C(++) code.
Not just SAST & DAST, but more importantly:
• ...
2
Language-based security
Security features & guarantees provided by the programming language
• safety guarantees,
incl. memory-safety, type-safety, thread-safety
There are many flavours & levels of 'safety' here.
Eg. different type systems give different notions of type-safety.
• forms of access control
– visibility/access restrictions with eg. public, private
– sandboxing mechanism inside programming language
• forms of information flow control
Some features depend on each other, eg
– type safety & just about anything else relies on memory safety
– sandboxing relies on memory & type safety
This week: safety. See course lecture notes, chapters 2 & 3
3
Other ways the programming language can help
A programming language can also help security by
• offering good APIs/libraries, eg.
– APIs with parametrised queries/prepared statements for SQL
– more secure string libraries for C
• incorporating support for 'external' languages,
– eg support for SQL and HTML in Wyvern
• offering convenient language features,
– esp. exceptions, to simplify handling error conditions
• making assurance of the security easier, by
– being able to understand code in a modular way
– only having to review the public interface, in a code review
These properties require some form of safety
4
(Aside: safety vs security)
Common source of confusion!
The precise border is hard to pin down, but what is good for safety is also
good for security, so the distinction is often not so relevant.
5
'Safe' programming languages?
You can write insecure programs in ANY programming language.
Eg
6
General idea behind safety
Under which conditions does
a[i] = (byte)b
make sense?
• a must be a non-null byte array;
• i should be a non-negative integer less than the array length;
• b should be (castable to?) a byte.
Two approaches
1. the programmer is responsible for ensuring these conditions
“unsafe” approach
2. the language is responsible for checking this
“safe” approach
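To make the "safe" approach concrete, here is a small Java sketch (class and method names are made up for illustration). That a is a byte array is ensured statically by the type checker; that a is non-null and that i is within bounds is checked by the JVM at runtime. The condition on b is handled by the cast, which is a conversion rather than a check: out-of-range values are silently truncated.

```java
// Sketch: Java checks the preconditions of a[i] = (byte)b at runtime
// instead of trusting the programmer to have established them.
class SafeStore {
    // Tries to execute a[i] = (byte) b and reports which check, if any, failed.
    static String store(byte[] a, int i, int b) {
        try {
            a[i] = (byte) b; // JVM checks: a != null, and 0 <= i < a.length
            return "ok";
        } catch (NullPointerException e) {
            return "null array";
        } catch (ArrayIndexOutOfBoundsException e) {
            return "index out of bounds";
        }
    }

    public static void main(String[] args) {
        byte[] buf = new byte[4];
        System.out.println(store(buf, 2, 65));   // ok
        System.out.println(store(buf, 4, 65));   // index out of bounds
        System.out.println(store(buf, -1, 65));  // index out of bounds
        System.out.println(store(null, 0, 65));  // null array
        // The cast is a conversion, not a check: 300 silently becomes 44.
        store(buf, 0, 300);
        System.out.println(buf[0]);              // 44
    }
}
```

Catching the exceptions is only done here to make the outcome visible; normally they would simply propagate.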
7
Safe programming languages
Safe programming languages
• impose some discipline or restrictions on the programmer
• offer some abstractions to the programmer,
with associated guarantees
This takes away some freedom & flexibility from the programmer,
but hopefully the extra safety and clearer understanding make it worth
it.
8
Attempts at a general definition of safety
A programming language can be considered safe if
9
'safer' & 'unsafer' languages
[Figure: spectrum of 'unsafer' and 'safer' languages, including machine code, C, C++, MISRA-C, Java, C#, Scala, Rust, ML, OCaml, Haskell, Clean and Prolog]
10
Dimensions & levels of safety
There are many dimensions of safety
11
Safety: how?
Mechanisms to provide safety include
12
Compiled binaries vs execution engines
• Compiled binary runs on bare hardware
• Execution engine (aka 'runtime') isolates code from hardware
[Figure: in both cases, high-level code on top of hardware]
14
Memory-safety – two different flavours
A programming language is memory-safe if it guarantees that
Here
15
Memory safety
Unsafe language features that break memory safety
• no array bounds checks
• pointer arithmetic
• null pointers, but only if these cause undefined behaviour
16
Null pointers in C
Common (and incorrect!) folklore:
17
Memory safety
Unsafe language features that break memory safety
• no array bounds checks
• pointer arithmetic
• null pointers, but only if these cause undefined behaviour
• manual memory management
19
Types
• Types assert invariant properties of program elements. Eg
– This variable will always hold an integer
– This function will always return an object of class X (or one of its
subclasses)
– This array will never store more than 10 items
NB there is a wide range of expressivity in type systems!
20
Type information & ideally guarantees
public class Demo {
    static private String greeting = "Hello"; // greeting only accessible in class Demo
    final static int CONST = 43;              // CONST will always be 43
}
21
Type-safety
A type-safe programming language guarantees that programs that
pass the type-checker can only manipulate data in ways allowed by
their types.
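In Java, a downcast is one place where the type checker defers to a runtime check. A minimal sketch (hypothetical names): casting an Object that is really a String to Integer throws a ClassCastException, rather than reinterpreting the underlying data as an integer, which is what an unchecked pointer cast in C could do.

```java
// Sketch: a downcast in Java is checked at runtime, so an Object that is really
// a String can never be used as if it were an Integer.
class CheckedCast {
    static String describe(Object o) {
        try {
            Integer n = (Integer) o; // checked cast: throws if o is not an Integer
            return "Integer " + (n + 1);
        } catch (ClassCastException e) {
            return "rejected";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(41));   // Integer 42
        System.out.println(describe("41")); // rejected
    }
}
```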
22
Combinations of memory & type safety
Programming languages can be
• memory-safe, typed, and type sound:
– Java, C#, Rust, Go
– though some of these have loopholes to allow unsafety
– Functional languages such as Haskell, ML, Clean, F#
• memory-safe and untyped
– LISP, Prolog, many interpreted languages
• memory-unsafe, typed, and type-unsafe
– C, C++
Not type sound: using pointer arithmetic in C, you can break
any guarantees the type system could possibly make
23
Example – breaking type soundness in C++
class DiskQuota {
private:
    int MinBytes;
    int MaxBytes;
};
24
Ruling out buffer overflows in Java or C#
Ruled out at language-level, by combination of
• compile-time typechecking (static checks)
– or at load-time, by bytecode verifier (bcv)
• runtime checks (dynamic checks)
What runtime checks are performed when executing the code below?
public class A extends Super {
    protected int[] d;
    private A next;
    ...
}
Runtime checks for 1) non-nullness of d, and 2) array bounds
1. in native code
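A minimal Java sketch of why the classic buffer overflow cannot happen here (names are illustrative): a strcpy-style loop that trusts the length of its input. The bounds check on buf[i] stops the copy at the first out-of-bounds write, so memory next to buf is never overwritten.

```java
// Sketch: a careless strcpy-style copy. In C this could write past the end of
// buf; in Java the write past the end is refused with an exception.
class NoOverflow {
    // Copies src into buf without checking lengths; returns how many bytes were written.
    static int naiveCopy(byte[] src, byte[] buf) {
        int written = 0;
        try {
            for (int i = 0; i < src.length; i++) {
                buf[i] = src[i]; // bounds-checked by the JVM on every store
                written++;
            }
        } catch (ArrayIndexOutOfBoundsException e) {
            // The first out-of-bounds store is rejected; nothing beyond buf is touched.
        }
        return written;
    }

    public static void main(String[] args) {
        byte[] input = new byte[12]; // stands in for 12 bytes of attacker input
        byte[] buf = new byte[8];
        System.out.println(naiveCopy(input, buf)); // 8
    }
}
```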
26
Breaking type safety?
char* y
... ...
} }
28
How do we know a type system is sound? (1)
Representation independence (for booleans)
29
How do we know a type system is sound? (2)
Give two formal definitions of the programming language
• a typed operational semantics, which records and checks type
information at runtime
• an untyped operational semantics, which does not
and prove their equivalence for all well-typed programs.
Or, in other words, prove the equivalence of
• a defensive execution engine (which records and checks all type
information at runtime) and
• a normal execution engine which does not
for any program that passes the type checker.
People have formalised the semantics and type system of eg Java, using
theorem provers (Coq, Isabelle/HOL), to prove such results.
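The idea of comparing a defensive and a normal execution engine can be illustrated on a toy scale. The sketch below assumes a hypothetical mini-language with integers, booleans, + and if (all class names are invented for this example). The defensive engine tags every value as Integer or Boolean and checks the tags; the normal engine represents everything as a raw int, with booleans encoded as 0 and 1. For well-typed expressions both give the same answer; for an ill-typed one the normal engine silently computes nonsense. Running a few examples is of course not a proof; that is what the formalisations in theorem provers are for.

```java
// Hypothetical mini-language (for illustration only): ints, booleans, + and if.
abstract class Expr {}
class Num extends Expr { final int n; Num(int n) { this.n = n; } }
class Bool extends Expr { final boolean b; Bool(boolean b) { this.b = b; } }
class Add extends Expr { final Expr l, r; Add(Expr l, Expr r) { this.l = l; this.r = r; } }
class If extends Expr {
    final Expr c, t, e;
    If(Expr c, Expr t, Expr e) { this.c = c; this.t = t; this.e = e; }
}

class Engines {
    // Type checker: returns "int" or "bool", or null if the expression is ill-typed.
    static String typeOf(Expr x) {
        if (x instanceof Num) return "int";
        if (x instanceof Bool) return "bool";
        if (x instanceof Add) {
            Add a = (Add) x;
            return "int".equals(typeOf(a.l)) && "int".equals(typeOf(a.r)) ? "int" : null;
        }
        If i = (If) x;
        String t = typeOf(i.t);
        return "bool".equals(typeOf(i.c)) && t != null && t.equals(typeOf(i.e)) ? t : null;
    }

    // Defensive engine: values carry their type (Integer or Boolean); every use is checked.
    static Object defensive(Expr x) {
        if (x instanceof Num) return ((Num) x).n;
        if (x instanceof Bool) return ((Bool) x).b;
        if (x instanceof Add) {
            Object l = defensive(((Add) x).l);
            Object r = defensive(((Add) x).r);
            if (!(l instanceof Integer) || !(r instanceof Integer)) {
                throw new IllegalStateException("runtime type error in +");
            }
            return (Integer) l + (Integer) r;
        }
        If i = (If) x;
        Object c = defensive(i.c);
        if (!(c instanceof Boolean)) {
            throw new IllegalStateException("runtime type error in if");
        }
        return (Boolean) c ? defensive(i.t) : defensive(i.e);
    }

    // Normal engine: every value is a raw int (false = 0, true = 1); nothing is checked.
    static int untyped(Expr x) {
        if (x instanceof Num) return ((Num) x).n;
        if (x instanceof Bool) return ((Bool) x).b ? 1 : 0;
        if (x instanceof Add) return untyped(((Add) x).l) + untyped(((Add) x).r);
        If i = (If) x;
        return untyped(i.c) != 0 ? untyped(i.t) : untyped(i.e);
    }

    // Maps a value of the defensive engine to its representation in the normal engine.
    static int encode(Object v) {
        return v instanceof Boolean ? ((Boolean) v ? 1 : 0) : (Integer) v;
    }

    public static void main(String[] args) {
        Expr ok = new If(new Bool(true), new Add(new Num(1), new Num(2)), new Num(0));
        Expr bad = new Add(new Num(1), new Bool(true));
        System.out.println(typeOf(ok) + " " + encode(defensive(ok)) + " " + untyped(ok)); // int 3 3
        System.out.println(typeOf(bad) + " " + untyped(bad)); // null 2
    }
}
```

The property the soundness proof would establish is: for every e with typeOf(e) != null, defensive(e) never throws and encode(defensive(e)) == untyped(e).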
30
Ongoing evolution to richer type systems:
non-null vs nullable
Many ways to enrich type systems further, eg
• Distinguish non-null & possibly-null (aka nullable) types
public @NonNull String hello = "hello";
• to improve efficiency
• to prevent null pointer bugs or detect (some/all?) of them
earlier, at compile time
• Support for this has become mainstream:
– C# supports nullable types written as A? or Nullable<A>
– In Java you can use type annotations @Nullable and @NonNull
– Scala, Rust, Kotlin, Swift, and Ceylon have non-null vs nullable aka
option(al) types
• Typically languages take the approach that references are
non-null by default
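Java's @NonNull and @Nullable annotations need a separate checker tool to be enforced. A standard-library alternative, sketched below with a made-up lookup table and hypothetical method names, is java.util.Optional, which makes possible absence part of the return type, so callers have to handle the empty case. This is weaker than the built-in nullable types of Kotlin or Swift: a variable of type Optional can itself still be null.

```java
import java.util.Map;
import java.util.Optional;

// Sketch: java.util.Optional makes possible absence visible in the return type.
class NullableDemo {
    // Hypothetical lookup table, for illustration only.
    static final Map<String, String> USERS = Map.of("alice", "Alice Smith");

    // Map.get returns null for a missing key; Optional.ofNullable turns that into an empty Optional.
    static Optional<String> fullName(String id) {
        return Optional.ofNullable(USERS.get(id));
    }

    // The caller must say what happens in the empty case before it can use the String.
    static String greeting(String id) {
        return fullName(id).map(n -> "Hello, " + n).orElse("Hello, stranger");
    }

    public static void main(String[] args) {
        System.out.println(greeting("alice")); // Hello, Alice Smith
        System.out.println(greeting("bob"));   // Hello, stranger
    }
}
```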
31
Ongoing evolution to richer type systems:
aliasing & information flow
• Alias control:
restricting possible interference between modules due to aliasing
– More on the risk of aliasing later this lecture
• Information flow:
controlling the way tainted information flows through an implementation
– More on type systems for information flow in later lectures.
32
Other language-based guarantees
• visibility: public, private, etc
– eg private fields not accessible from outside a class
• immutability
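For immutability, Java offers final fields at the language level, while immutability of collections is a library matter. A small sketch (the helper name is hypothetical): List.of (Java 9+) returns an unmodifiable list whose mutator methods throw UnsupportedOperationException. Note that this is a dynamic check: the static type List<String> still offers an add method.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: an unmodifiable List rejects mutation at runtime.
class ImmutableList {
    // Returns true if the add succeeded, false if the list refused to be modified.
    static boolean tryAdd(List<String> list, String s) {
        try {
            list.add(s);
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        List<String> fixed = List.of("a");                     // unmodifiable
        List<String> growable = new ArrayList<>(List.of("a")); // ordinary mutable copy
        System.out.println(tryAdd(fixed, "b"));    // false
        System.out.println(tryAdd(growable, "b")); // true
    }
}
```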
34
Safe arithmetic
What happens if i=i+1; overflows?
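A sketch of two possible answers in Java (method names are illustrative): plain int addition is defined to wrap around, so incrementing Integer.MAX_VALUE gives Integer.MIN_VALUE, whereas Math.addExact throws an ArithmeticException on overflow. This differs from C, where signed integer overflow is undefined behaviour.

```java
// Sketch: Java int arithmetic wraps around (two's complement),
// whereas Math.addExact turns overflow into an exception.
class OverflowDemo {
    static int wrapping(int i) {
        return i + 1;               // Integer.MAX_VALUE + 1 == Integer.MIN_VALUE
    }

    static int checked(int i) {
        return Math.addExact(i, 1); // throws ArithmeticException on overflow
    }

    public static void main(String[] args) {
        System.out.println(wrapping(Integer.MAX_VALUE)); // -2147483648
        try {
            checked(Integer.MAX_VALUE);
        } catch (ArithmeticException e) {
            System.out.println("overflow detected");
        }
    }
}
```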
35
Thread-safety
&
Aliasing
36
Problems with threads (ie. lack of thread safety)
• Two concurrent execution threads both execute the statement
x = x+1;
where x initially has the value 0.
What is the value of x in the end?
– Answer: x can have value 2 or 1
• The root cause of the problem is a data race:
x = x+1 is not an atomic operation, but happens in two steps -
reading x and assigning it the new value - which may be
interleaved in unexpected ways
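The lost-update problem can be reproduced in Java with the sketch below (names are illustrative). Two threads each increment a plain shared int and a java.util.concurrent.atomic.AtomicInteger n times. The atomic counter always ends at exactly 2n; the plain one often ends lower, but whether and by how much differs from run to run.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: two threads each increment a plain int and an AtomicInteger n times.
class RaceDemo {
    static int x = 0;                                    // plain shared field: data race
    static final AtomicInteger ax = new AtomicInteger(); // atomic shared counter

    // Returns {final value of x, final value of ax}.
    static int[] run(int n) {
        x = 0;
        ax.set(0);
        Runnable body = () -> {
            for (int k = 0; k < n; k++) {
                x = x + 1;            // read, then write: another thread may interleave
                ax.incrementAndGet(); // atomic read-modify-write: no lost updates
            }
        };
        Thread t1 = new Thread(body);
        Thread t2 = new Thread(body);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return new int[] { x, ax.get() };
    }

    public static void main(String[] args) {
        int[] r = run(1_000_000);
        // ax is always 2000000; x is often smaller, but that is not guaranteed.
        System.out.println("x = " + r[0] + ", ax = " + r[1]);
    }
}
```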
37
Weird multi-threading behaviour in Java
class A {
    private int i;
    A() { i = 5; }
    int geti() { return i; }
}
Can geti() ever return something other than 5?
Yes!
38
Weird multi-threading behaviour in Java
class A {
    private final int i;
    A() { i = 5; }
    int geti() { return i; }
}
Now geti() always returns 5.
• A revision of the Java Memory Model in 2004 specifies how compilers & VM (incl.
underlying hardware) can deal with concurrency.
• The API implementation of String was only fixed in Java 5 (aka 1.5)
39
Data races and thread-safety
• A program contains a data race if two execution threads
simultaneously access the same variable and at least one of these
accesses is a write
NB data races are highly non-deterministic, and a pain to debug!
• thread-safety = the behaviour of a program consisting of several
threads can be understood as an interleaving of those threads
• In Java, the semantics of a program with data races is effectively
undefined, i.e. only programs without data races are thread-safe
This is the root cause of many problems, not just with concurrency
41
References to mutable data are dangerous
In multi-threaded programs, aliasing of mutable data structures can
be problematic, as the referenced data can change,
• even in safe programming languages such as Java or C# !
If there is aliasing, another thread can modify the content of the array at any
moment.
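A minimal sketch of the problem and a common remedy, defensive copying (class names are hypothetical). To keep the example deterministic, the modification happens in the same thread, but it could just as well come from another thread holding the same reference. Declaring the field final does not help: it fixes the reference, not the array contents.

```java
import java.util.Arrays;

// Sketch: storing a caller's array by reference vs. taking a defensive copy.
class Aliasing {
    // Keeps the caller's reference: whoever else holds that array (e.g. another
    // thread) can change what first() returns at any moment.
    static final class Leaky {
        private final int[] data; // final: the reference cannot change, the contents can
        Leaky(int[] data) { this.data = data; }
        int first() { return data[0]; }
    }

    // Copies the array on construction, so there is no alias to the caller's array.
    static final class Copying {
        private final int[] data;
        Copying(int[] data) { this.data = Arrays.copyOf(data, data.length); }
        int first() { return data[0]; }
    }

    public static void main(String[] args) {
        int[] a = { 1, 2, 3 };
        Leaky leaky = new Leaky(a);
        Copying copying = new Copying(a);
        a[0] = 99;                           // modify through the alias
        System.out.println(leaky.first());   // 99
        System.out.println(copying.first()); // 1
    }
}
```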
42
References to immutable data are less dangerous
In a multi-threaded program, aliasing of immutable data structures
is safer.
Another thread with a reference to the same string cannot change the value
(or 'contents') of the string, as Java strings are immutable.
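A small sketch contrasting the two cases (names are illustrative): an alias to a String keeps seeing the original value, because operations such as concatenation create a new String object, while an alias to a mutable StringBuilder sees every change made through the other reference.

```java
// Sketch: aliasing an immutable String vs. aliasing a mutable StringBuilder.
class ImmutableAlias {
    // Returns what the alias sees after the original reference is "modified".
    static String viaString() {
        String s = "hello";
        String alias = s; // both refer to the same String object
        s = s + "!";      // creates a new String; alias still refers to the old one
        return alias;
    }

    static String viaBuilder() {
        StringBuilder sb = new StringBuilder("hello");
        StringBuilder alias = sb; // both refer to the same StringBuilder object
        sb.append("!");           // mutates the shared object in place
        return alias.toString();
    }

    public static void main(String[] args) {
        System.out.println(viaString());  // hello
        System.out.println(viaBuilder()); // hello!
    }
}
```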
43