Data Types and Representation
Agenda
1 Session Overview
2 Data Types and Representation
3 ML
4 Conclusion
What is the course about?
Textbook:
» Programming Language Pragmatics (3rd Edition)
Michael L. Scott
Morgan Kaufmann
ISBN-10: 0-12374-514-4, ISBN-13: 978-0-12374-514-4, (04/06/09)
Additional References:
» Osinski, Lecture notes, Summer 2010
» Grimm, Lecture notes, Spring 2010
» Gottlieb, Lecture notes, Fall 2009
» Barrett, Lecture notes, Fall 2008
Session Agenda
Session Overview
Data Types and Representation
ML Overview
Conclusion
Session 5 Review
Historical Origins
Lambda Calculus
Functional Programming Concepts
A Review/Overview of Scheme
Evaluation Order Revisited
Higher-Order Functions
Functional Programming in Perspective
Conclusions
Data Types
» Strong vs. Weak Typing
» Static vs. Dynamic Typing
Type Systems
» Type Declarations
Type Checking
» Type Equivalence
» Type Inference
» Subtypes and Derived Types
Scalar and Composite Types
» Records, Variant Records, Arrays, Strings, Sets
Pointers and References
» Pointers and Recursive Types
Function Types
Files and Input / Output
Data Types
Denotational
» type is a set T of values
» value has type T if it belongs to the set
» object has type T if it is guaranteed to be bound to a
value in T
Constructive
» type is either built-in (int, real, bool, char, etc.) or
» constructed using a type-constructor (record, array,
set, etc.)
Abstraction-based
» Type is an interface consisting of a set of operations
Strong Typing
» has become a popular buzz-word like structured
programming
» informally, it means that the language prevents you
from applying an operation to data on which it is not
appropriate
» more formally, it means that the language does not
allow variables to be used in a way inconsistent with
their types (no loopholes)
Weak Typing
» Language allows many ways to bypass the type
system (e.g., pointer arithmetic)
» Trust the programmer vs. not
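As a rough illustration of the informal definition above, here is a minimal sketch in Python, used only as a convenient strongly typed language. The helper name try_add is hypothetical. An operation applied to data of an inappropriate type is rejected rather than silently reinterpreted:

```python
def try_add(a, b):
    """Apply + and report a type violation instead of crashing."""
    try:
        return a + b
    except TypeError as err:
        return f"rejected: {type(err).__name__}"


print(try_add(1, 2))    # integers support +
print(try_add("1", 1))  # str + int is not a meaningful operation, so it is rejected
```

A weakly typed language would instead reinterpret one operand, for example by treating the string's address or bytes as a number.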
Data Types – Static vs. Dynamic Typing
Static Typing
» variables have types
» compiler can do all the checking of type rules at compile time
» ADA, PASCAL, ML
Dynamic Typing
» variables do not have types, values do
» The language implementation checks that type rules are obeyed at run time
» LISP, SCHEME, SMALLTALK, scripting languages
A language can have a mixture
» e.g., Java has mostly a static type system with some runtime checks
Pros and Cons:
» Static is faster
• Dynamic requires run-time checks
» Dynamic is more flexible, and makes it easier to write code
» Static makes it easier to refactor code (easier to understand and
maintain code), and facilitates error checking
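The phrase "variables do not have types, values do" can be sketched in Python, a dynamically typed language. The helper describe and the variable names are illustrative only:

```python
def describe(value):
    """Return the name of the run-time type carried by the value."""
    return type(value).__name__


x = 42                     # x is bound to an int value
kinds = [describe(x)]
x = "forty-two"            # the same variable may later hold a str
kinds.append(describe(x))
x = [4, 2]                 # ... or a list
kinds.append(describe(x))
print(kinds)
```

In a statically typed language such as Ada or ML, the second and third bindings would be rejected at compile time.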
Type Systems
Examples
» Common Lisp is strongly typed, but not statically typed
» Ada is statically typed
» Pascal is almost statically typed
» Java is strongly typed, with a non-trivial mix of things that can be checked statically and things that have to be checked dynamically
Type Systems – Scalar Types Overview
discrete types
» must have clear successor, predecessor
» Countable
» One-dimensional
• integer
• boolean
• character
floating-point types, real
» typically 64 bit (double in C); sometimes 32 bit as well (float in C)
rational types
» used to represent exact fractions (Scheme, Lisp)
complex
» Fortran, Scheme, Lisp, C99, C++ (in STL)
» Examples
• enumeration
• subrange
integer types
» often several sizes (e.g., 16 bit, 32 bit, 64 bit)
» sometimes have signed and unsigned variants
(e.g., C/C++, Ada, C#)
» SML/NJ has a 31-bit integer
boolean
» Common type; C had no boolean until C99
character
» See next slide
enumeration types
Type Systems – Other Intrinsic Types
character, string
» some languages have no character data type (e.g., JavaScript)
» internationalization support
• Java: UTF-16
• C++: 8 or 16 bit characters; semantics implementation
dependent
» string mutability
• Most languages allow it, Java does not.
void, unit
» Used as return type of procedures;
» void: (C, Java) represents the absence of a type
» unit: (ML, Haskell) a type with one value: ()
Type Systems – Enumeration Types and Strong Typing
Ada again:
type Fruit is (Apple, Orange, Grape, Apricot);
type Vendor is (Apple, IBM, HP, Dell);
My_PC : Vendor;
Dessert : Fruit;
...
My_PC := Apple;
Dessert := Apple;
Dessert := My_PC; -- error
Apple is overloaded. It can be of type Fruit or Vendor.
Type Systems – Composite Types
Records
variants, variant records, unions
arrays, strings
classes
pointers, references
sets
Lists
maps
function types
files
Type Systems
For example
» Pascal is more orthogonal than Fortran (because it allows arrays of anything, for instance), but it does not permit variant records as arbitrary fields of other records
Orthogonality is nice primarily because it makes a language easy to understand, easy to use, and easy to reason about
Type Checking
Type Checking – Type Compatibility
Type Checking – Type Equivalence Examples
type student = {
  name : string,
  address : string
}
type school = {
  name : string,
  address : string
}
type age = float;
type weight = float;
With structural equivalence, we can accidentally
assign a school to a student, or an age to a
weight
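The two notions of equivalence can be sketched in Python. The dataclasses mirror the student and school records above. The helper names structurally_equivalent and name_equivalent are hypothetical, and the structural check compares only field names and field types:

```python
from dataclasses import dataclass, fields


@dataclass
class Student:
    name: str
    address: str


@dataclass
class School:
    name: str
    address: str


def structurally_equivalent(t1, t2):
    """Compare two record types by field names and field types only."""
    def shape(t):
        return [(f.name, f.type) for f in fields(t)]
    return shape(t1) == shape(t2)


def name_equivalent(t1, t2):
    """Two types are name-equivalent only if they are the same declaration."""
    return t1 is t2


print(structurally_equivalent(Student, School))  # same shape
print(name_equivalent(Student, School))          # different declarations
```

Under name equivalence the accidental assignment of a school to a student is caught, because the two declarations are distinct even though their shapes match.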
Type Checking – Type Conversion
Type Checking
Coercion
» When an expression of one type is used in a context where a different type is expected, one normally gets a type error
» But what about
var a : integer; b, c : real;
...
c := a + b;
Type Checking
Coercion
» Many languages allow things like this, and coerce an expression to be of the proper type
» Coercion can be based just on the types of the operands, or can take into account the expected type from the surrounding context as well
» Fortran has lots of coercion, all based on operand type
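The mixed-mode addition c := a + b can be tried in Python, whose arithmetic also converts based on operand types. This is only an analogy to the Pascal-like fragment, and the variable c_explicit is added for illustration:

```python
a = 2        # integer
b = 0.5      # real
c = a + b    # a is implicitly converted to float before the addition

print(c, type(c).__name__)

# An explicit conversion (a cast) states the same intent in the source text.
c_explicit = float(a) + b
```

Both forms compute the same value; the difference is whether the conversion is visible in the program.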
Type Checking - Type Coercion
Coercion in C
» The following types can be freely mixed in C:
• char
• (unsigned) (short, long) int
• float, double
» Recent trends in type coercion:
• static typing: stronger type system, less type
coercion
• user-defined: C++ allows user-defined type
coercion rules
SCHEME
(define (length l)
  (cond
    ((null? l) 0)
    (#t (+ (length (cdr l)) 1))))
The types are checked at run-time
ML
fun length xs =
  if null xs
  then 0
  else 1 + length (tl xs)
length returns an int, and can take a list of any element type, because we don't care what the element type is. The type of this function is written 'a list -> int
Subtype examples:
» A record type containing fields a, b and c can be
considered a subtype of one containing only a and c
» A variant record type consisting of fields a or c can
be considered a subtype of one containing a or b or c
» The subrange 1..100 can be considered a subtype of
the subrange 1..500.
Type Checking – Subtype Polymorphisms and Coercion
subtype polymorphism:
» ability to treat a value of a subtype as a value of a
supertype
coercion:
» ability to convert a value of a subtype to a value of a supertype
Example:
» Let’s say type s is a subtype of r.
var vs: s;
var vr: r;
» Subtype polymorphism:
function [t <: r] f (x: t): t { return x; }
f(vr ); // returns a value of type r
f(vs ); // returns a value of type s
» Coercion:
function f (x: r): r { return x; }
f(vr ); // returns a value of type r
f(vs ); // returns a value of type r
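The difference shows up in the run-time type of the result. In this Python sketch, bool stands in for the subtype s and int for the supertype r, since bool is a subclass of int in Python. The names f_poly and f_coerce are hypothetical:

```python
from typing import TypeVar

T = TypeVar("T", bound=int)  # T ranges over int and its subtypes


def f_poly(x: T) -> T:
    """Subtype polymorphism: the argument is returned unchanged."""
    return x


def f_coerce(x: int) -> int:
    """Coercion: the argument is converted to the supertype int."""
    return int(x)


vs = True  # bool plays the role of s
vr = 7     # int plays the role of r

print(type(f_poly(vs)).__name__, type(f_coerce(vs)).__name__)
```

With polymorphism the subtype survives the call; with coercion the information that the value was a bool is lost.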
Type Checking – Overloading and Coercion
» C/C++: const
» Java: final
Type Checking – Type Inference
Type checking:
» Variables are declared with their type
» Compiler determines if variables are used in accordance with
their type declarations
fun f x =
  if x = 5 (* There are two type errors here *)
  then hd x
  else tl x
Records
» A record consists of a set of typed fields.
» Choices:
• Name or structural equivalence? Most statically typed languages
choose name equivalence
• ML, Haskell are exceptions
» Nested records allowed?
• Usually, yes. In FORTRAN and LISP, records but not record
declarations can be nested
» Does order of fields matter?
• Typically, yes, but not in ML
» Any subtyping relationship with other record types?
• Most statically typed languages say no
• Dynamically typed languages implicitly say yes
• This is known as duck typing
“if it walks like a duck and quacks like a duck, I would call it a
duck”
-James Whitcomb Riley
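A minimal duck-typing sketch in Python follows. The classes and the function make_it_quack are invented for illustration. Any object that provides the expected operation is accepted, regardless of its declared class:

```python
class Duck:
    def quack(self):
        return "Quack"


class Robot:
    def quack(self):
        return "Beep (imitating a quack)"


class Stone:
    pass


def make_it_quack(thing):
    """Accept any object that has a quack method, whatever its class."""
    try:
        return thing.quack()
    except AttributeError:
        return "cannot quack"


for thing in (Duck(), Robot(), Stone()):
    print(type(thing).__name__, "->", make_it_quack(thing))
```

Duck and Robot share no declared supertype other than object, yet both are usable here; the mismatch for Stone is discovered only at run time.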
Records (Structures)
» usually laid out contiguously
» possible holes for alignment reasons
» smart compilers may re-arrange fields to minimize holes (C compilers promise not to)
» implementation problems are caused by records containing dynamic arrays
• we won't be going into that in any detail
Scalar and Composite Types – Records Syntax
PASCAL:
type element = record
  name : array[1..2] of char;
  atomic_number : integer;
  atomic_weight : real;
end;
C:
struct element {
  char name[2];
  int atomic_number;
  double atomic_weight;
};
ML:
type element = {
  name: string,
  atomic_number: int,
  atomic_weight: real
};
Scalar and Composite Types – Records and Variant Records
Figure 7.1 Likely layout in memory for objects of type element on a 32-bit
machine. Alignment restrictions lead to the shaded “holes.”
Figure 7.2 Likely memory layout for packed element records. The atomic_number
and atomic_weight fields are nonaligned, and can only be read or written (on most
machines) via multi-instruction sequences.
Figure 7.3 Rearranging record fields to minimize holes. By sorting fields
according to the size of their alignment constraint, a compiler can minimize the
space devoted to holes, while keeping the fields aligned.
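The effect of alignment holes, packing, and field reordering can be estimated with Python's struct module. This sketch uses the three-field C struct element from the syntax slide (char name[2], int atomic_number, double atomic_weight) and assumes a typical platform with 4-byte int and 8-byte double. Note that struct.calcsize does not add the trailing padding a C compiler may append:

```python
import struct

# Format codes: 2s = char[2], i = int, d = double.
# "@" requests native sizes and alignment; "=" requests no padding at all.
aligned = struct.calcsize("@2sid")    # name, atomic_number, atomic_weight
packed = struct.calcsize("=2sid")     # same fields with no holes
reordered = struct.calcsize("@di2s")  # largest alignment constraint first

print("aligned:", aligned)
print("packed:", packed)
print("reordered:", reordered)
print("bytes lost to holes:", aligned - packed)
```

Sorting the fields by alignment constraint recovers the packed size while keeping every field aligned, which is the point of the rearrangement in Figure 7.3.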
Figure 7.15 (CD) Likely memory layouts for element variants. The value of the naturally
occurring field (shown here with a double border) determines which of the interpretations of
the remaining space is valid. Type string_ptr is assumed to be represented by a (four-byte)
pointer to dynamically allocated storage.
Scalar and Composite Types – Variant Records in Ada
Scalar and Composite Types – Discriminant Checking – Part 2
L : Figure (Line);
F : Figure; -- illegal, don't know which kind
P1 := Point;
...
C := (Circle, Red, False, 10, P1);
-- record aggregate
... C.Orientation ...
-- illegal, circles have no orientation
C := L;
-- illegal, different kinds
C.Kind := Square;
-- illegal, discriminant is constant
Discriminant is a visible constant component of object.
Scalar and Composite Types – Free Unions
Scalar and Composite Types – Arrays
index types
» most languages restrict to an integral type
» Ada, Pascal, Haskell allow any scalar type
index bounds
» many languages restrict lower bound:
» C, Java: 0, Fortran: 1, Ada, Pascal: no restriction
when is length determined
» Fortran: compile time; most other languages: can choose
dimensions
» some languages have multi-dimensional arrays (Fortran, C)
» many simulate multi-dimensional arrays as arrays of arrays (Java)
literals
» C/C++ has initializers, but not full-fledged literals
» Ada: (23, 76, 14) Scheme: #(23 76 14)
first-classness
» C and C++ do not allow arrays to be returned from functions
a slice or section is a rectangular portion of an array
» Some languages (e.g., FORTRAN, PERL, PYTHON, APL) have a rich set of array operations for creating and manipulating sections.
Scalar and Composite Types – Array Literals
Figure 7.4 Array slices (sections) in Fortran 90. Much like the values in the header of an enumeration-controlled loop (Section 6.5.1), a : b : c in a subscript indicates positions a, a+c, a+2c, ... through b. If a or b
is omitted, the corresponding bound of the array is assumed. If c is omitted, 1 is assumed. It is even
possible to use negative values of c in order to select positions in reverse order. The slashes in the
second subscript of the lower right example delimit an explicit list of positions.
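The a : b : c rule in the caption can be sketched as a small Python function. The name section_positions is hypothetical. Unlike Python's own slices, the upper bound b is inclusive here, matching the Fortran convention described above:

```python
def section_positions(a, b, c=1):
    """Positions a, a+c, a+2c, ... through b (inclusive), as in a:b:c."""
    if c == 0:
        raise ValueError("stride must be nonzero")
    stop = b + 1 if c > 0 else b - 1  # Python ranges exclude the stop value
    return list(range(a, stop, c))


print(section_positions(1, 10, 3))    # forward with stride 3
print(section_positions(3, 6))        # stride omitted, so 1 is assumed
print(section_positions(10, 1, -4))   # negative stride selects in reverse
```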
Scalar and Composite Types – Arrays Shapes
Two-dimensional arrays
» Row-major layout: Each row of array is in a contiguous chunk of
memory
» Column-major layout: Each column of array is in a contiguous chunk of
memory
» Row-pointer layout: An array of pointers to rows lying anywhere in
memory
If an array is traversed differently from how it is laid out, this can
dramatically affect performance (primarily because of cache
misses)
A dope vector contains the dimension, bounds, and size information
for an array. Dynamic arrays require that the dope vector be held in
memory during run-time
Contiguous elements (see Figure 7.7)
» column major - only in Fortran
» row major
• used by everybody else
• makes array [a..b, c..d] the same as array [a..b] of array [c..d]
Figure 7.7 Row- and column-major memory layout for two-dimensional arrays. In row-major order, the
elements of a row are contiguous in memory; in column-major order, the elements of a column are
contiguous. The second cache line of each array is shaded, on the assumption that each element is an
eight-byte floating-point number, that cache lines are 32 bytes long (a common size), and that the array
begins at a cache line boundary. If the array is indexed from A[0,0] to A[9,9], then in the row-major case
elements A[0,4] through A[0,7] share a cache line; in the column-major case elements A[4,0] through
A[7,0] share a cache line.
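The cache-line claim in the caption can be checked with a few lines of Python, under the caption's own assumptions (8-byte elements, 32-byte lines, a 10 by 10 array starting at a line boundary). The function names are illustrative:

```python
ELEM_SIZE = 8   # eight-byte floating-point elements
LINE_SIZE = 32  # cache line size assumed in the caption
N = 10          # A[0,0] .. A[9,9]


def row_major_offset(i, j):
    """Byte offset of A[i,j] when rows are contiguous."""
    return (i * N + j) * ELEM_SIZE


def column_major_offset(i, j):
    """Byte offset of A[i,j] when columns are contiguous."""
    return (j * N + i) * ELEM_SIZE


def cache_line(offset):
    """Index of the cache line holding the byte at this offset."""
    return offset // LINE_SIZE


# Elements that fall in the second cache line (index 1) under each layout.
row_line_1 = [(i, j) for i in range(N) for j in range(N)
              if cache_line(row_major_offset(i, j)) == 1]
col_line_1 = [(i, j) for j in range(N) for i in range(N)
              if cache_line(column_major_offset(i, j)) == 1]

print(row_line_1)
print(col_line_1)
```

Traversing against the layout touches a new cache line on nearly every access, which is the source of the performance penalty mentioned earlier.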
Figure 7.8 Contiguous array allocation v. row pointers in C. The declaration on the left is a true
two-dimensional array. The slashed boxes are NUL bytes; the shaded areas are holes. The
declaration on the right is a ragged array of pointers to arrays of characters. In both cases, we
have omitted bounds in the declaration that can be deduced from the size of the initializer
(aggregate). Both data structures permit individual characters to be accessed using double
subscripts, but the memory layout (and corresponding address arithmetic) is quite different.
Scalar and Composite Types - Arrays
Example: Suppose
A : array [L1..U1] of array [L2..U2] of
array [L3..U3] of elem;
D1 = U1-L1+1
D2 = U2-L2+1
D3 = U3-L3+1
Let
S3 = size of elem
S2 = D3 * S3
S1 = D2 * S2
Figure 7.9 Virtual location of an array with nonzero lower bounds. By computing the constant
portions of an array index at compile time, we effectively index into an array whose starting
address is offset in memory, but whose lower bounds are all zero.
Example (continued)
We could compute all that at run time, but
we can make do with fewer subtractions:
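A sketch of the address computation for A[i, j, k], using the D and S quantities defined in the example. The function name make_array_addresser, the base address, and the bounds in the usage lines are assumptions for illustration. The naive form subtracts each lower bound on every access; the folded form moves the lower-bound terms into a single constant that can be computed once:

```python
def make_array_addresser(base, bounds, elem_size):
    """bounds = [(L1, U1), (L2, U2), (L3, U3)]; returns two address functions."""
    (L1, _U1), (L2, U2), (L3, U3) = bounds
    D2 = U2 - L2 + 1
    D3 = U3 - L3 + 1
    S3 = elem_size
    S2 = D3 * S3
    S1 = D2 * S2

    def naive(i, j, k):
        # Three subtractions on every access.
        return base + (i - L1) * S1 + (j - L2) * S2 + (k - L3) * S3

    # The lower-bound terms do not depend on i, j, k, so fold them once.
    virtual_origin = base - (L1 * S1 + L2 * S2 + L3 * S3)

    def folded(i, j, k):
        return virtual_origin + i * S1 + j * S2 + k * S3

    return naive, folded


# Hypothetical array: base address 1000, bounds 1..3, -2..2, 5..9, 8-byte elements.
naive, folded = make_array_addresser(1000, [(1, 3), (-2, 2), (5, 9)], 8)
print(naive(2, 0, 7), folded(2, 0, 7))
```

The constant virtual_origin corresponds to the "virtual location" of Figure 7.9: the address the array would start at if all its lower bounds were zero.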
Scalar and Composite Types - Strings
Scalar and Composite Types – Initializers in C++
Pointers and Recursive Types – Pointers and References
Pointers and Recursive Types - Pointers
Figure 7.11 Implementation of a tree in Lisp. A diagonal slash through a box indicates a null
pointer. The C and A tags serve to distinguish the two kinds of memory blocks: cons cells and
blocks containing atoms.
Figure 7.12 Typical implementation of a tree in a language with explicit pointers. As in Figure 7.11, a
diagonal slash through a box indicates a null pointer.
Pointers and Recursive Types – Extra Pointer Capabilities
Questions:
» Is it possible to get the address of a variable?
» Convenient, but aliasing causes optimization difficulties
(the same way that pass by reference does)
» Unsafe if we can get the address of a stack allocated
variable.
Is pointer arithmetic allowed?
» Unsafe if unrestricted.
» In C, no bounds checking:
// allocate space for 10 ints
int *p = malloc(10 * sizeof(int));
p += 42;
... *p ... // out of bounds, but no check
Pointers and Recursive Types – “Generic” Reference Types
Pointers and Recursive Types - Pointers
Figure 7.17 (CD) Tombstones. A valid pointer refers to a tombstone that in turn refers to an
object. A dangling reference refers to an “expired” tombstone.
Figure 7.18 (CD) Locks and Keys. A valid pointer contains a key that matches the lock on an
object in the heap. A dangling reference is unlikely to match.
Figure 7.13 Reference counts and circular lists. The list shown here cannot be found via any
program variable, but because it is circular, every cell contains a nonzero count.
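The situation in Figure 7.13 can be reproduced in CPython, which combines reference counting with a tracing cycle collector. This sketch assumes CPython specifically; the class Cell and the function name are invented for illustration. The tracing collector is switched off so that only reference counts are at work until gc.collect() is called:

```python
import gc
import weakref


class Cell:
    """A list cell with a single link field."""
    def __init__(self):
        self.next = None


def cycle_survives_reference_counting():
    gc.disable()                  # leave only reference counting active
    try:
        a, b = Cell(), Cell()
        a.next, b.next = b, a     # two-cell circular list
        probe = weakref.ref(a)    # observes the cell without keeping it alive
        del a, b                  # no program variable reaches the list now
        alive_after_del = probe() is not None
        gc.collect()              # a tracing pass finds the unreachable cycle
        alive_after_trace = probe() is not None
        return alive_after_del, alive_after_trace
    finally:
        gc.enable()


print(cycle_survives_reference_counting())
```

The circular list stays allocated after the last variable is deleted because each cell still has a nonzero count; only the tracing pass reclaims it.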
Pointers and Recursive Types - Pointers
Mark-and-sweep
» commonplace in Lisp dialects
» complicated in languages with rich type
structure, but possible if language is strongly
typed
» achieved successfully in Cedar, Ada, Java,
Modula-3, ML
» complete solution impossible in languages
that are not strongly typed
» conservative approximation possible in almost
any language (Xerox Portable Common
Runtime approach)
Pointers and Recursive Types – Lists, Sets, and Maps
Recursive Types
» list: ordered collection of elements
» set: collection of elements with fast searching
» map: collection of (key, value) pairs with fast key lookup
Low-level languages typically do not provide these.
High-level and scripting languages do, some as part of a library.
• Perl, Python: built-in, lists and arrays merged.
• C, Fortran, Cobol: no
• C++: part of STL: list<T>, set<T>, map<K,V>
• Java: yes, in library
• Setl: built-in
• ML, Haskell: lists built-in, set, map part of library
• Scheme: lists built-in
• Pascal: built-in sets
– but only for discrete types with few elements, e.g., 32
Pointers and Recursive Types – Dynamic Data Structures
struct cell {
  int value;
  cell *prev; // legal to mention name
  cell *next; // before end of declaration
};
struct list; // incomplete declaration
struct link {
  link *succ; // pointers to the
  list *memberOf; // incomplete type
};
struct list { // full definition
  link *head; // mutually recursive references
};
Function Types
State
» Introduces context sensitivity
» Harder to reuse functions in different context
» Easy to develop inconsistent state
int balance = account.getBalance();
balance += deposit;
// Now there are two different values stored in two different places
Sequence of function calls may change behavior of a
function
• Oh, didn’t you know you have to call C.init() before you…
» Lack of Referential Transparency
These issues can make imperative programs hard to
understand
What is functional programming?
Some Sums
x is a vector of integers
Imperative
Describes how to calculate result
Iterator<Integer> it = x.iterator();
int result = 0;
while (it.hasNext()) {
  result += it.next();
}
Functional
Defines what the result is
fun sum [] = 0
  | sum (x::xs) = x + (sum xs)
+/x
History
SML Implementations
Poly/ML is at www.polyml.org
ML: a quasi-functional language with strong typing
Conventional syntax:
> val x = 5; (* user input *)
val x = 5 : int (* system response *)
> fun len lis = if (null lis) then 0 else 1 + len (tl lis);
val len = fn : 'a list -> int
ML Overview
Records
Lists
Operations on Lists
append ([1,2,3],[4,5,6]);
val it = [1, 2, 3, 4, 5, 6] : int list
Selection
if x = 0 then y = 1 else y = 2
fun fif(0) = 1
  | fif(_) = 2;
Currying: partial bindings
fun reduce f i [] = i
| reduce f i (x::xs) = f x (reduce f i xs);
fun add a b = a + b
fun times a b = a * b
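The same curried reduce can be sketched in Python with nested one-argument functions, to show how partial bindings yield new functions. The names sum_list and product are added for illustration and are not part of the ML code above:

```python
def reduce(f):
    """Curried right fold: reduce(f)(i)(xs) mirrors the ML definition."""
    def with_init(i):
        def over(xs):
            if not xs:
                return i
            return f(xs[0])(reduce(f)(i)(xs[1:]))
        return over
    return with_init


add = lambda a: lambda b: a + b
times = lambda a: lambda b: a * b

# Binding only f and i produces specialized list functions.
sum_list = reduce(add)(0)
product = reduce(times)(1)

print(sum_list([1, 2, 3, 4]), product([1, 2, 3, 4]))
```

In ML the corresponding partial applications are simply reduce add 0 and reduce times 1.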
Simple Functions
A function declaration:
- fun abs x = if x >= 0.0 then x else ~x
val abs = fn : real -> real
A function expression:
- fn x => if x >= 0.0 then x else ~x
val it = fn : real -> real
Functions, II
- fun length xs =
    if null xs
    then 0
    else 1 + length (tl xs);
val length = fn : 'a list -> int
- fun length [] = 0
    | length (x::xs) = 1 + length xs
val length = fn : 'a list -> int
Type inference
Polymorphism
Unification algorithm
Multiple Arguments?
The Tuple Solution
Currying
- fun append2 [] ys = ys
    | append2 (x::xs) ys = x :: (append2 xs ys);
val append2 = fn : 'a list -> 'a list -> 'a list
- append2 [1,2,3] [8,9];
val it = [1,2,3,8,9] : int list
- val app123 = append2 [1,2,3];
val app123 = fn : int list -> int list
- app123 [8,9];
val it = [1,2,3,8,9] : int list
More Partial Application
fun flip f y x = f x y
The type of flip is (α → β → γ) → β → α → γ. Why?
Consider (f x). f is a function; its parameter must have the same type as x:
f : A → B    x : A    (f x) : B
Now consider (f x y). Because function application is left-associative, f x y ≡ (f x) y. Therefore, (f x) must be a function, and its parameter must have the same type as y:
(f x) : C → D    y : C    (f x y) : D
Note that B must be the same as C → D. We say that B must unify with C → D
The return type of flip is whatever the type of f x y is. After renaming the types, we have the type given at the top
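The behavior of flip can be tried out in Python with curried functions written as nested lambdas. The function subtract is a made-up example chosen because its argument order matters:

```python
def flip(f):
    """Swap the first two arguments of a curried two-argument function."""
    return lambda y: lambda x: f(x)(y)


subtract = lambda a: lambda b: a - b  # curried: subtract(a)(b) == a - b

print(subtract(10)(3))              # a = 10, b = 3
print(flip(subtract)(10)(3))        # arguments swapped: a = 3, b = 10
print(flip(flip(subtract))(10)(3))  # flipping twice restores the order
```

Python does not infer or check the type derived above; the sketch only shows the run-time behavior that the type describes.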
User-defined types and inference
Type Rules
Passing Functions
Applying Functionals
Let declarations
fun fib 0 = 0
  | fib n = let fun fibb (x, prev, curr) = if x = 1 then curr
                                           else fibb (x-1, curr, prev + curr)
            in
              fibb (n, 0, 1)
            end;
Another Variant of Mergesort
More on ML Records
Data Types
A datatype declaration:
» defines a new type that is not equivalent to any
other type (name equivalence)
» introduces data constructors
• data constructors can be used in patterns
» they are also values themselves
Datatype Example
Parametrized Datatypes
datatype 'a gentree =
    Leaf of 'a
  | Node of 'a gentree * 'a gentree
val names = Node (Leaf "this", Leaf "that")
Pattern elements:
» integer literals: 4, 19
» character literals: #"a"
» string literals: "hello"
» data constructors: Node (· · ·)
• depending on type, may have arguments, which
would also be patterns
» variables: x, ys
» wildcard: _
Convention is to capitalize data constructors,
and start variables with lower-case.
More Rules of Pattern Matching
Special forms:
» (), {} – the unit value
» [ ] – empty list
» [p1, p2, · · ·, pn]
• means (p1 :: (p2 :: · · · (pn :: [])· · ·))
» (p1, p2, · · ·, pn) – a tuple
» {field1, field2, · · · fieldn} – a record
» {field1, field2, · · · fieldn, ...}
• a partially specified record
» v as p
• v is a name for the entire pattern p
Another Lookup Function
Overloading
Parametric Polymorphism vs. Generics
ML Signature
signature STACKS =
sig
  type stack
  exception Underflow
  val empty : stack
  val push : char * stack -> stack
  val pop : stack -> char * stack
  val isEmpty : stack -> bool
end
Programming in the large in ML
ML Structure
- use ("complex.ml");
signature Complex :
sig
….
- Complex.prod (Complex.i, Complex.i);
val it = (~1.0, 0.0);
Multiple implementations
Information Hiding
Functors
Imperative Programming in ML
fun Id x = x; (* Id : 'a -> 'a *)
val fp = ref Id; (* a function pointer *)
fp := not; !fp 5;
(* must be forbidden! *)
Readings
» Chapter 7
Programming Assignment #2
» See Programming Assignment #2 posted under “handouts” on the
course Web site - Ongoing
» Due on March 31, 2011
Next Session: Program Structure, OO Programming