
Fundamental Data Structures

Contents

1 Introduction
  1.1 Abstract data type
    1.1.1 Defining an abstract data type
    1.1.2 Advantages of abstract data typing
    1.1.3 Typical operations
    1.1.4 Examples
    1.1.5 Implementation
    1.1.6 See also
    1.1.7 References
    1.1.8 Further reading
    1.1.9 External links
  1.2 Data structure
    1.2.1 Overview
    1.2.2 Examples
    1.2.3 Language support
    1.2.4 See also
    1.2.5 References
    1.2.6 Further reading
    1.2.7 External links
  1.3 Analysis of algorithms
    1.3.1 Cost models
    1.3.2 Run-time analysis
    1.3.3 Relevance
    1.3.4 Constant factors
    1.3.5 See also
    1.3.6 Notes
    1.3.7 References

2 Sequences
  2.1 Array data type
    2.1.1 History
    2.1.2 Abstract arrays
    2.1.3 Implementations
    2.1.4 Language support
    2.1.5 See also
    2.1.6 References
    2.1.7 External links
  2.2 Array data structure
    2.2.1 History
    2.2.2 Applications
    2.2.3 Element identifier and addressing formulas
    2.2.4 Efficiency
    2.2.5 Dimension
    2.2.6 See also
    2.2.7 References
    2.2.8 External links
  2.3 Dynamic array
    2.3.1 Bounded-size dynamic arrays and capacity
    2.3.2 Geometric expansion and amortized cost
    2.3.3 Performance
    2.3.4 Variants
    2.3.5 Language support
    2.3.6 References
    2.3.7 External links

3 Dictionaries
  3.1 Associative array
    3.1.1 Operations
    3.1.2 Example
    3.1.3 Implementation
    3.1.4 Language support
    3.1.5 See also
    3.1.6 References
    3.1.7 External links
  3.2 Association list
    3.2.1 References
  3.3 Hash table
    3.3.1 Hashing
    3.3.2 Key statistics
    3.3.3 Collision resolution
    3.3.4 Dynamic resizing
    3.3.5 Performance analysis
    3.3.6 Features
    3.3.7 Uses
    3.3.8 Implementations
    3.3.9 History
    3.3.10 See also
    3.3.11 References
    3.3.12 Further reading
    3.3.13 External links
  3.4 Linear probing
    3.4.1 Algorithm
    3.4.2 Properties
    3.4.3 Dictionary operation in constant time
    3.4.4 See also
    3.4.5 References
    3.4.6 External links

4 Text and image sources, contributors, and licenses
  4.1 Text
  4.2 Images
  4.3 Content license

Chapter 1

Introduction
1.1 Abstract data type

Not to be confused with Algebraic data type.

In computer science, an abstract data type (ADT) is a mathematical model for a certain class of data structures that have similar behavior; or for certain data types of one or more programming languages that have similar semantics. An abstract data type is defined only by the operations that may be performed on it and by mathematical pre-conditions and constraints on the effects (and possibly cost) of those operations. Abstract data types were first proposed by Barbara Liskov and Stephen N. Zilles in 1974.[1]

For example, an abstract stack, which is a last-in-first-out structure, could be defined by three operations: push, which inserts some data item onto the structure; pop, which extracts an item from it; and peek or top, which allows data on top of the structure to be examined without removal. An abstract queue data structure, which is a first-in-first-out structure, would also have three operations: enqueue, to join the queue; dequeue, to remove the first element from the queue; and front, to access and serve the first element in the queue. There would be no way of differentiating these two data types, unless a mathematical constraint is introduced that, for a stack, specifies that each pop always returns the most recently pushed item that has not been popped yet. When analyzing the efficiency of algorithms that use stacks, one may also specify that all operations take the same time no matter how many items have been pushed into the stack, and that the stack uses a constant amount of storage for each element.

Abstract data types are purely theoretical entities, used (among other things) to simplify the description of abstract algorithms, to classify and evaluate data structures, and to formally describe the type systems of programming languages. However, an ADT may be implemented by specific data types or data structures, in many ways and in many programming languages; or described in a formal specification language. ADTs are often implemented as modules: the module's interface declares procedures that correspond to the ADT operations, sometimes with comments that describe the constraints. This information-hiding strategy allows the implementation of the module to be changed without disturbing the client programs.

The term abstract data type can also be regarded as a generalised approach of a number of algebraic structures, such as lattices, groups, and rings.[2] The notion of abstract data types is related to the concept of data abstraction, important in object-oriented programming and design-by-contract methodologies for software development.

1.1.1 Defining an abstract data type

An abstract data type is defined as a mathematical model of the data objects that make up a data type as well as the functions that operate on these objects. There are no standard conventions for defining them. A broad division may be drawn between "imperative" and "functional" definition styles.

Imperative definition style

In the imperative definition style, which is closer to the philosophy of imperative programming languages, an abstract data structure is conceived as an entity that is mutable, meaning that it may be in different states at different times. Some operations may change the state of the ADT; therefore, the order in which operations are evaluated is important, and the same operation on the same entities may have different effects if executed at different times, just like the instructions of a computer or the commands and procedures of an imperative language. To underscore this view, it is customary to say that the operations are executed or applied, rather than evaluated. The imperative style is often used when describing abstract algorithms; this is the style of Donald E. Knuth's The Art of Computer Programming.

Abstract variable

Imperative ADT definitions often depend on the concept of an abstract variable, which may be regarded as the simplest non-trivial ADT. An abstract variable V is a mutable entity that admits two operations:

• store(V,x), where x is a value of unspecified nature; and
• fetch(V), that yields a value;

with the constraint that

• fetch(V) always returns the value x used in the most recent store(V,x) operation on the same variable V.

As in so many programming languages, the operation store(V,x) is often written V ← x (or some similar notation), and fetch(V) is implied whenever a variable V is used in a context where a value is required. Thus, for example, V ← V + 1 is commonly understood to be a shorthand for store(V, fetch(V) + 1).

In this definition, it is implicitly assumed that storing a value into a variable U has no effect on the state of a distinct variable V. To make this assumption explicit, one could add the constraint that

• if U and V are distinct variables, the sequence { store(U,x); store(V,y) } is equivalent to { store(V,y); store(U,x) }.
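
A minimal C sketch (ours, not part of the original text) of an abstract variable, making store and fetch explicit:

/* An abstract variable as a one-slot record. */
typedef struct { int value; } Var;

void store(Var *v, int x) { v->value = x; }
int  fetch(const Var *v)  { return v->value; }

int main(void) {
    Var V;
    store(&V, 41);
    store(&V, fetch(&V) + 1);   /* the shorthand V <- V + 1 */
    return fetch(&V) == 42 ? 0 : 1;
}
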

More generally, ADT definitions often assume that any operation that changes the state of one ADT instance has no effect on the state of any other instance (including other instances of the same ADT) unless the ADT axioms imply that the two instances are connected (aliased) in that sense. For example, when extending the definition of abstract variable to include abstract records, the operation that selects a field from a record variable R must yield a variable V that is aliased to that part of R.

The definition of an abstract variable V may also restrict the stored values x to members of a specific set X, called the range or type of V. As in programming languages, such restrictions may simplify the description and analysis of algorithms, and improve their readability.

Note that this definition does not imply anything about the result of evaluating fetch(V) when V is un-initialized, that is, before performing any store operation on V. An algorithm that does so is usually considered invalid, because its effect is not defined. (However, there are some important algorithms whose efficiency strongly depends on the assumption that such a fetch is legal, and returns some arbitrary value in the variable's range.)

Instance creation

Some algorithms need to create new instances of some ADT (such as new variables, or new stacks). To describe such algorithms, one usually includes in the ADT definition a create() operation that yields an instance of the ADT, usually with axioms equivalent to

• the result of create() is distinct from any instance S in use by the algorithm.

This axiom may be strengthened to exclude also partial aliasing with other instances. On the other hand, this axiom still allows implementations of create() to yield a previously created instance that has become inaccessible to the program.

Preconditions, postconditions, and invariants

In imperative-style definitions, the axioms are often expressed by preconditions, that specify when an operation may be executed; postconditions, that relate the states of the ADT before and after the execution of each operation; and invariants, that specify properties of the ADT that are not changed by the operations.

Example: abstract stack (imperative)

As another example, an imperative definition of an abstract stack could specify that the state of a stack S can be modified only by the operations

• push(S,x), where x is some value of unspecified nature; and
• pop(S), that yields a value as a result;

with the constraint that

• for any value x and any abstract variable V, the sequence of operations { push(S,x); V ← pop(S) } is equivalent to { V ← x }.

Since the assignment { V ← x }, by definition, cannot change the state of S, this condition implies that { V ← pop(S) } restores S to the state it had before the { push(S,x) }. From this condition and from the properties of abstract variables, it follows, for example, that the sequence

    { push(S,x); push(S,y); U ← pop(S); push(S,z); V ← pop(S); W ← pop(S); }

where x, y, and z are any values, and U, V, W are pairwise distinct variables, is equivalent to

    { U ← y; V ← z; W ← x }

Here it is implicitly assumed that operations on a stack instance do not modify the state of any other ADT instance, including other stacks; that is,

• for any values x, y, and any distinct stacks S and T, the sequence { push(S,x); push(T,y) } is equivalent to { push(T,y); push(S,x) }.

A stack ADT definition usually includes also a Boolean-valued function empty(S) and a create() operation that returns a stack instance, with axioms equivalent to

• create() ≠ S for any stack S (a newly created stack is distinct from all previous stacks);
• empty(create()) (a newly created stack is empty);
• not empty(push(S,x)) (pushing something into a stack makes it non-empty).
Single-instance style

Sometimes an ADT is defined as if only one instance of it existed during the execution of the algorithm, and all operations were applied to that instance, which is not explicitly notated. For example, the abstract stack above could have been defined with operations push(x) and pop(), that operate on "the" only existing stack. ADT definitions in this style can be easily rewritten to admit multiple coexisting instances of the ADT, by adding an explicit instance parameter (like S in the previous example) to every operation that uses or modifies the implicit instance.

On the other hand, some ADTs cannot be meaningfully defined without assuming multiple instances. This is the case when a single operation takes two distinct instances of the ADT as parameters. For an example, consider augmenting the definition of the stack ADT with an operation compare(S,T) that checks whether the stacks S and T contain the same items in the same order.

Functional ADT definitions

Another way to define an ADT, closer to the spirit of functional programming, is to consider each state of the structure as a separate entity. In this view, any operation that modifies the ADT is modeled as a mathematical function that takes the old state as an argument, and returns the new state as part of the result. Unlike the imperative operations, these functions have no side effects. Therefore, the order in which they are evaluated is immaterial, and the same operation applied to the same arguments (including the same input states) will always return the same results (and output states).

In the functional view, in particular, there is no way (or need) to define an abstract variable with the semantics of imperative variables (namely, with fetch and store operations). Instead of storing values into variables, one passes them as arguments to functions.

Example: abstract stack (functional)

For example, a complete functional-style definition of a stack ADT could use the three operations:

• push: takes a stack state and an arbitrary value, returns a stack state;
• top: takes a stack state, returns a value;
• pop: takes a stack state, returns a stack state;

with axioms equivalent to

• top(push(s,x)) = x (pushing x onto s leaves x at the top); and
• pop(push(s,x)) = s (popping undoes the push).

In a functional-style definition there is no need for a create operation. Indeed, there is no notion of "stack instance". The stack states can be thought of as being potential states of a single stack structure, and two stack states that contain the same values in the same order are considered to be identical states. This view actually mirrors the behavior of some concrete implementations, such as linked lists with hash cons.

Instead of create(), a functional definition of a stack ADT may assume the existence of a special stack state, the empty stack, designated by a special symbol like Λ or "()", or define a bottom() operation that takes no arguments and returns this special stack state. Note that the axioms imply that

• push(Λ,x) ≠ Λ.

In a functional definition of a stack one does not need an empty predicate: instead, one can test whether a stack is empty by testing whether it is equal to Λ.

Note that these axioms do not define the effect of top(s) or pop(s), unless s is a stack state returned by a push. Since push leaves the stack non-empty, those two operations are undefined (hence invalid) when s = Λ. On the other hand, the axioms (and the lack of side effects) imply that push(s,x) = push(t,y) if and only if x = y and s = t.

As in some other branches of mathematics, it is customary to assume also that the stack states are only those whose existence can be proved from the axioms in a finite number of steps. In the stack ADT example above, this rule means that every stack is a finite sequence of values, that becomes the empty stack (Λ) after a finite number of pops. By themselves, the axioms above do not exclude the existence of infinite stacks (that can be popped forever, each time yielding a different state) or circular stacks (that return to the same state after a finite number of pops). In particular, they do not exclude states s such that pop(s) = s or push(s,x) = s for some x. However, since one cannot obtain such stack states with the given operations, they are assumed not to exist.

Whether to include complexity

Aside from the behavior in terms of axioms, it is also possible to include, in the definition of an ADT's operations, their algorithmic complexity. Alexander Stepanov, designer of the C++ Standard Template Library, included complexity guarantees in the STL's specification, arguing:

    The reason for introducing the notion of abstract data types was to allow interchangeable software modules. You cannot have interchangeable modules unless these modules share similar complexity behavior. If I replace one module with another module with the same functional behavior but with different complexity tradeoffs, the user of this code will be unpleasantly surprised. I could tell him anything I like about data abstraction, and he still would not want to use the code. Complexity assertions have to be part of the interface.

    Alexander Stepanov[3]

1.1.2 Advantages of abstract data typing

Encapsulation

Abstraction provides a promise that any implementation of the ADT has certain properties and abilities; knowing these is all that is required to make use of an ADT object. The user does not need any technical knowledge of how the implementation works to use the ADT. In this way, the implementation may be complex but will be encapsulated in a simple interface when it is actually used.

Localization of change

Code that uses an ADT object will not need to be edited if the implementation of the ADT is changed. Since any changes to the implementation must still comply with the interface, and since code using an ADT may only refer to properties and abilities specified in the interface, changes may be made to the implementation without requiring any changes in code where the ADT is used.

Flexibility

Different implementations of an ADT, having all the same properties and abilities, are equivalent and may be used somewhat interchangeably in code that uses the ADT. This gives a great deal of flexibility when using ADT objects in different situations. For example, different implementations of an ADT may be more efficient in different situations; it is possible to use each in the situation where they are preferable, thus increasing overall efficiency.

1.1.3 Typical operations

Some operations that are often specified for ADTs (possibly under other names) are

• compare(s,t), that tests whether two structures are equivalent in some sense;
• hash(s), that computes some standard hash function from the instance's state;
• print(s) or show(s), that produces a human-readable representation of the structure's state.

In imperative-style ADT definitions, one often finds also

• create(), that yields a new instance of the ADT;
• initialize(s), that prepares a newly created instance s for further operations, or resets it to some "initial state";
• copy(s,t), that puts instance s in a state equivalent to that of t;
• clone(t), that performs s ← create(), copy(s,t), and returns s;
• free(s) or destroy(s), that reclaims the memory and other resources used by s.

The free operation is not normally relevant or meaningful, since ADTs are theoretical entities that do not "use memory". However, it may be necessary when one needs to analyze the storage used by an algorithm that uses the ADT. In that case one needs additional axioms that specify how much memory each ADT instance uses, as a function of its state, and how much of it is returned to the pool by free.

1.1.4 Examples

Some common ADTs, which have proved useful in a great variety of applications, are

• Container
• Deque
• List
• Map
• Multimap
• Multiset
• Priority queue
• Queue
• Set
• Stack
• Tree
• Graph

Each of these ADTs may be defined in many ways and variants, not necessarily equivalent. For example, a stack ADT may or may not have a count operation that tells how many items have been pushed and not yet popped. This choice makes a difference not only for its clients but also for the implementation.
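
To make that design choice concrete, here is a brief C sketch (ours; the names are invented, in the style of the interface shown in section 1.1.5 below). With an array-backed representation, a count operation is a constant-time field read; a bare linked-list representation would instead have to walk the list or cache a counter alongside the head pointer.

#include <stddef.h>

/* One possible concrete representation: an array-backed stack. */
typedef struct {
    void **items;    /* storage for pushed item pointers */
    size_t count;    /* items pushed and not yet popped */
    size_t capacity; /* allocated size of items[] */
} CountedStack;

/* The optional operation discussed above, O(1) for this representation. */
size_t stack_count(const CountedStack *s) { return s->count; }
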

1.1.5 Implementation

Implementing an ADT means providing one procedure or function for each abstract operation. The ADT instances are represented by some concrete data structure that is manipulated by those procedures, according to the ADT's specifications.

Usually there are many ways to implement the same ADT, using several different concrete data structures. Thus, for example, an abstract stack can be implemented by a linked list or by an array.

An ADT implementation is often packaged as one or more modules, whose interface contains only the signature (number and types of the parameters and results) of the operations. The implementation of the module, namely the bodies of the procedures and the concrete data structure used, can then be hidden from most clients of the module. This makes it possible to change the implementation without affecting the clients.

When implementing an ADT, each instance (in imperative-style definitions) or each state (in functional-style definitions) is usually represented by a handle of some sort.[4]

Modern object-oriented languages, such as C++ and Java, support a form of abstract data types. When a class is used as a type, it is an abstract type that refers to a hidden representation. In this model an ADT is typically implemented as a class, and each instance of the ADT is usually an object of that class. The module's interface typically declares the constructors as ordinary procedures, and most of the other ADT operations as methods of that class. However, such an approach does not easily encapsulate multiple representational variants found in an ADT. It also can undermine the extensibility of object-oriented programs. In a pure object-oriented program that uses interfaces as types, types refer to behaviors not representations.

Example: implementation of the stack ADT

As an example, here is an implementation of the stack ADT above in the C programming language.

Imperative-style interface

An imperative-style interface might be:

typedef struct stack_Rep stack_Rep;          /* Type: instance representation (an opaque record). */
typedef stack_Rep *stack_T;                  /* Type: handle to a stack instance (an opaque pointer). */
typedef void *stack_Item;                    /* Type: value that can be stored in stack (arbitrary address). */

stack_T stack_create(void);                  /* Create new stack instance, initially empty. */
void stack_push(stack_T s, stack_Item e);    /* Add an item at the top of the stack. */
stack_Item stack_pop(stack_T s);             /* Remove the top item from the stack and return it. */
int stack_empty(stack_T s);                  /* Check whether stack is empty. */

This implementation could be used in the following manner:

#include <stack.h>               /* Include the stack interface. */

stack_T t = stack_create();      /* Create a stack instance. */
int foo = 17;                    /* An arbitrary datum. */
stack_push(t, &foo);             /* Push the address of 'foo' onto the stack. */
void *e = stack_pop(t);          /* Get the top item and delete it from the stack. */
if (stack_empty(t)) { }          /* Do something if stack is empty. */

This interface can be implemented in many ways. The implementation may be arbitrarily inefficient, since the formal definition of the ADT, above, does not specify how much space the stack may use, nor how long each operation should take. It also does not specify whether the stack state t continues to exist after a call s ← pop(t). In practice the formal definition should specify that the space is proportional to the number of items pushed and not yet popped; and that every one of the operations above must finish in a constant amount of time, independently of that number. To comply with these additional specifications, the implementation could use a linked list, or an array (with dynamic resizing) together with two integers (an item count and the array size).

Functional-style interface

Functional-style ADT definitions are more appropriate for functional programming languages, and vice-versa. However, one can provide a functional-style interface even in an imperative language like C. For example:

typedef struct stack_Rep stack_Rep;          /* Type: stack state representation (an opaque record). */
typedef stack_Rep *stack_T;                  /* Type: handle to a stack state (an opaque pointer). */
typedef void *stack_Item;                    /* Type: item (arbitrary address). */

stack_T stack_empty(void);                   /* Returns the empty stack state. */
stack_T stack_push(stack_T s, stack_Item x); /* Adds x at the top of s, returns the resulting state. */
stack_Item stack_top(stack_T s);             /* Returns the item currently at the top of s. */
stack_T stack_pop(stack_T s);                /* Removes the top item from s, returns the resulting state. */

The main problem is that C lacks garbage collection, and this makes this style of programming impractical; moreover, memory allocation routines in C are slower than allocation in a typical garbage collector, thus the performance impact of so many allocations is even greater.

ADT libraries

Many modern programming languages, such as C++ and Java, come with standard libraries that implement several common ADTs, such as those listed above.

Built-in abstract data types

The specification of some programming languages is intentionally vague about the representation of certain built-in data types, defining only the operations that can be done on them. Therefore, those types can be viewed as built-in ADTs. Examples are the arrays in many scripting languages, such as Awk, Lua, and Perl, which can be regarded as an implementation of the Map or Table ADT.

1.1.6 See also

• Concept (generic programming)
• Design by contract
• Formal methods
• Functional specification
• Initial algebra
• Liskov substitution principle
• Object-oriented programming
• Type system
• Type theory
• Algebraic data type
• Generalized algebraic data type

1.1.7 References

[1] B. Liskov & S. N. Zilles, "Programming with Abstract Data Types", SIGPLAN Notices, 9(4), pp. 50-59, 1974.
[2] Rudolf Lidl (2004). Abstract Algebra. Springer. ISBN 81-8128-149-7. Chapter 7, section 40.
[3] Stevens, Al (March 1995). "Al Stevens Interviews Alex Stepanov". Dr. Dobb's Journal. Retrieved 31 January 2015.
[4] Robert Sedgewick (1998). Algorithms in C. Addison/Wesley. ISBN 0-201-31452-5. Definition 4.4.

1.1.8 Further reading

• Mitchell, John C.; Plotkin, Gordon (July 1988). "Abstract Types Have Existential Type". ACM Transactions on Programming Languages and Systems 10 (3).

1.1.9 External links

• Abstract data type in NIST Dictionary of Algorithms and Data Structures
• Walls and Mirrors, the classic textbook

1.2 Data structure

[Figure: A hash table. A hash function maps keys (John Smith, Lisa Smith, Sandra Dee) into an array of buckets 00-15 holding the associated values (521-8976, 521-1234, 521-9655).]

In computer science, a data structure is a particular way of organizing data in a computer so that it can be used efficiently.[1][2]

Different kinds of data structures are suited to different kinds of applications, and some are highly specialized to specific tasks. For example, databases use B-tree indexes for small percentages of data retrieval, and compilers and databases use dynamic hash tables as look-up tables.

Data structures provide a means to manage large amounts of data efficiently for uses such as large databases and internet indexing services. Usually, efficient data structures are key to designing efficient algorithms. Some formal design methods and programming languages emphasize data structures, rather than algorithms, as the key organizing factor in software design. Storing and retrieving can be carried out on data stored in both main memory and in secondary memory.

1.2.1 Overview

Data structures are generally based on the ability of a computer to fetch and store data at any place in its memory, specified by a pointer, a bit string representing a memory address that can be itself stored in memory and manipulated by the program. Thus, the array and record data structures are based on computing the addresses of data items with arithmetic operations; while the linked data structures are based on storing addresses of data items within the structure itself. Many data structures use both principles, sometimes combined in non-trivial ways (as in XOR linking).

The implementation of a data structure usually requires writing a set of procedures that create and manipulate instances of that structure. The efficiency of a data structure cannot be analyzed separately from those operations. This observation motivates the theoretical concept of an abstract data type, a data structure that is defined indirectly by the operations that may be performed on it, and the mathematical properties of those operations (including their space and time cost).

1.2.2 Examples

Main article: List of data structures

There are numerous types of data structures, generally built upon simpler primitive data types:

• An array is a number of elements in a specific order, typically all of the same type. Elements are accessed using an integer index to specify which element is required (although the elements may be of almost any type). Typical implementations allocate contiguous memory words for the elements of arrays (but this is not always a necessity). Arrays may be fixed-length or resizable.
• A record (also called a tuple or struct) is an aggregate data structure. A record is a value that contains other values, typically in fixed number and sequence and typically indexed by names. The elements of records are usually called fields or members.
• An associative array (also called a dictionary or map) is a more flexible variation on an array, in which name-value pairs can be added and deleted freely. A hash table is a common implementation of an associative array.
• A union type specifies which of a number of permitted primitive types may be stored in its instances, e.g. float or long integer. Contrast with a record, which could be defined to contain a float and an integer; whereas in a union, there is only one value at a time. Enough space is allocated to contain the widest member datatype.
• A tagged union (also called a variant, variant record, discriminated union, or disjoint union) contains an additional field indicating its current type, for enhanced type safety (see the C sketch after this list).
• A set is an abstract data structure that can store specific values, in no particular order and with no duplicate values.
• Graphs and trees are linked abstract data structures composed of nodes. Each node contains a value and one or more pointers to other nodes arranged in a hierarchy. Graphs can be used to represent networks, while variants of trees can be used for sorting and searching, having their nodes arranged in some relative order based on their values.
• An object contains data fields, like a record, as well as various methods which operate on the contents of the record. In the context of object-oriented programming, records are known as plain old data structures to distinguish them from objects.
1.2.3 Language support

Most assembly languages and some low-level languages, such as BCPL (Basic Combined Programming Language), lack built-in support for data structures. On the other hand, many high-level programming languages and some higher-level assembly languages, such as MASM, have special syntax or other built-in support for certain data structures, such as records and arrays. For example, the C and Pascal languages support structs and records, respectively, in addition to vectors (one-dimensional arrays) and multi-dimensional arrays.[3][4]

Most programming languages feature some sort of library mechanism that allows data structure implementations to be reused by different programs. Modern languages usually come with standard libraries that implement the most common data structures. Examples are the C++ Standard Template Library, the Java Collections Framework, and Microsoft's .NET Framework.

Modern languages also generally support modular programming, the separation between the interface of a library module and its implementation. Some provide opaque data types that allow clients to hide implementation details. Object-oriented programming languages, such as C++, Java and Smalltalk, may use classes for this purpose.

Many known data structures have concurrent versions that allow multiple computing threads to access the data structure simultaneously.

1.2.4 See also

• Abstract data type
• Concurrent data structure
• Data model
• Dynamization
• Linked data structure
• List of data structures
• Persistent data structure
• Plain old data structure

1.2.5 References

[1] Paul E. Black (ed.), entry for data structure in Dictionary of Algorithms and Data Structures. U.S. National Institute of Standards and Technology. 15 December 2004. Online version. Accessed May 21, 2009.
[2] Entry data structure in the Encyclopædia Britannica (2009). Online entry, accessed on May 21, 2009.
[3] "The GNU C Manual". Free Software Foundation. Retrieved 15 October 2014.
[4] "Free Pascal: Reference Guide". Free Pascal. Retrieved 15 October 2014.
1.2.6 Further reading

• Peter Brass, Advanced Data Structures, Cambridge University Press, 2008.
• Donald Knuth, The Art of Computer Programming, vol. 1. Addison-Wesley, 3rd edition, 1997.
• Dinesh Mehta and Sartaj Sahni, Handbook of Data Structures and Applications, Chapman and Hall/CRC Press, 2007.
• Niklaus Wirth, Algorithms and Data Structures, Prentice Hall, 1985.

1.2.7 External links

• Descriptions from the Dictionary of Algorithms and Data Structures
• Data structures course
• An Examination of Data Structures from .NET perspective
• Schaffer, C. Data Structures and Algorithm Analysis
• UC Berkeley video course on data structures
• Data structures Programs Examples in c,java

1.3 Analysis of algorithms

In computer science, the analysis of algorithms is the determination of the amount of resources (such as time and storage) necessary to execute them. Most algorithms are designed to work with inputs of arbitrary length. Usually, the efficiency or running time of an algorithm is stated as a function relating the input length to the number of steps (time complexity) or storage locations (space complexity).

Algorithm analysis is an important part of a broader computational complexity theory, which provides theoretical estimates for the resources needed by any algorithm which solves a given computational problem. These estimates provide an insight into reasonable directions of search for efficient algorithms.

In theoretical analysis of algorithms it is common to estimate their complexity in the asymptotic sense, i.e., to estimate the complexity function for arbitrarily large input. Big O notation, Big-omega notation and Big-theta notation are used to this end. For instance, binary search is said to run in a number of steps proportional to the logarithm of the length of the list being searched, or in O(log(n)), colloquially "in logarithmic time". Usually asymptotic estimates are used because different implementations of the same algorithm may differ in efficiency. However, the efficiencies of any two "reasonable" implementations of a given algorithm are related by a constant multiplicative factor called a hidden constant.

Exact (not asymptotic) measures of efficiency can sometimes be computed, but they usually require certain assumptions concerning the particular implementation of the algorithm, called a model of computation. A model of computation may be defined in terms of an abstract computer, e.g., a Turing machine, and/or by postulating that certain operations are executed in unit time. For example, if the sorted list to which we apply binary search has n elements, and we can guarantee that each lookup of an element in the list can be done in unit time, then at most log2(n) + 1 time units are needed to return an answer.

A key point which is often overlooked is that published lower bounds for problems are often given for a model of computation that is more restricted than the set of operations that you could use in practice, and therefore there are algorithms that are faster than what would naively be thought possible.[6]

1.3.1 Cost models

Time efficiency estimates depend on what we define to be a step. For the analysis to correspond usefully to the actual execution time, the time required to perform a step must be guaranteed to be bounded above by a constant. One must be careful here; for instance, some analyses count an addition of two numbers as one step. This assumption may not be warranted in certain contexts. For example, if the numbers involved in a computation may be arbitrarily large, the time required by a single addition can no longer be assumed to be constant.

Two cost models are generally used:[1][2][3][4][5]

• the uniform cost model, also called uniform-cost measurement (and similar variations), assigns a constant cost to every machine operation, regardless of the size of the numbers involved;
• the logarithmic cost model, also called logarithmic-cost measurement (and variations thereof), assigns a cost to every machine operation proportional to the number of bits involved.

The latter is more cumbersome to use, so it is only employed when necessary, for example in the analysis of arbitrary-precision arithmetic algorithms, like those used in cryptography.
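
To make the distinction concrete, the sketch below (ours, not from the original text) adds two arbitrary-precision numbers stored as arrays of 32-bit words. Under the uniform cost model each machine operation has constant cost, yet the addition as a whole clearly takes time proportional to the number of words n; charging for the bits involved, as the logarithmic cost model does, captures this.

#include <stdint.h>
#include <stddef.h>

/* Adds two little-endian n-word numbers: cost grows with the operand length. */
void bignum_add(const uint32_t *a, const uint32_t *b, uint32_t *out, size_t n) {
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t s = (uint64_t)a[i] + b[i] + carry;
        out[i] = (uint32_t)s;   /* low 32 bits of the digit sum */
        carry = s >> 32;        /* carry into the next word */
    }
    /* A final carry out of the top word is ignored in this sketch. */
}
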

1.3.2 Run-time analysis

Run-time analysis is a theoretical classification that estimates and anticipates the increase in running time (or run-time) of an algorithm as its input size (usually denoted as n) increases. Run-time efficiency is a topic of great interest in computer science: a program can take seconds, hours or even years to finish executing, depending on which algorithm it implements (see also performance analysis, which is the analysis of an algorithm's run-time in practice).

Shortcomings of empirical metrics

Since algorithms are platform-independent (i.e. a given algorithm can be implemented in an arbitrary programming language on an arbitrary computer running an arbitrary operating system), there are significant drawbacks to using an empirical approach to gauge the comparative performance of a given set of algorithms.

Take as an example a program that looks up a specific entry in a sorted list of size n. Suppose this program were implemented on Computer A, a state-of-the-art machine, using a linear search algorithm, and on Computer B, a much slower machine, using a binary search algorithm. Benchmark testing on the two computers running their respective programs might look something like the following:

[Table: benchmark run-times of the two programs at small list sizes; not recoverable from this copy.]

Based on these metrics, it would be easy to jump to the conclusion that Computer A is running an algorithm that is far superior in efficiency to that of Computer B. However, if the size of the input list is increased to a sufficient number, that conclusion is dramatically demonstrated to be in error:

[Table: the same benchmarks at much larger list sizes; not recoverable from this copy.]

Computer A, running the linear search program, exhibits a linear growth rate. The program's run-time is directly proportional to its input size. Doubling the input size doubles the run-time, quadrupling the input size quadruples the run-time, and so forth. On the other hand, Computer B, running the binary search program, exhibits a logarithmic growth rate. Doubling the input size only increases the run-time by a constant amount (in this example, 50,000 ns). Even though Computer A is ostensibly a faster machine, Computer B will inevitably surpass Computer A in run-time because it's running an algorithm with a much slower growth rate.
Orders of growth

Main article: Big O notation

Informally, an algorithm can be said to exhibit a growth rate on the order of a mathematical function if beyond a certain input size n, the function f(n) times a positive constant provides an upper bound or limit for the run-time of that algorithm. In other words, for a given input size n greater than some n0 and a constant c, the running time of that algorithm will never be larger than c·f(n). This concept is frequently expressed using Big O notation. For example, since the run-time of insertion sort grows quadratically as its input size increases, insertion sort can be said to be of order O(n²).

Big O notation is a convenient way to express the worst-case scenario for a given algorithm, although it can also be used to express the average-case; for example, the worst-case scenario for quicksort is O(n²), but the average-case run-time is O(n log n).[7]

Empirical orders of growth

Assuming the execution time follows the power rule, t ≈ k·n^a, the coefficient a can be found[8] by taking empirical measurements of run time {t1, t2} at some problem-size points {n1, n2}, and calculating t2/t1 = (n2/n1)^a, so that a = log(t2/t1)/log(n2/n1). If the order of growth indeed follows the power rule, the empirical value of a will stay constant at different ranges, and if not, it will change; but it could still serve for comparison of any two given algorithms as to their empirical local orders of growth behaviour. Applied to the above table:

[Table: empirical local orders of growth for the two programs, computed from the benchmark data; not recoverable from this copy.]

It is clearly seen that the first algorithm exhibits a linear order of growth, indeed following the power rule. The empirical values for the second one are diminishing rapidly, suggesting it follows another rule of growth and in any case has much lower local orders of growth (and improving further still), empirically, than the first one.
Evaluating run-time complexity

The run-time complexity for the worst-case scenario of a given algorithm can sometimes be evaluated by examining the structure of the algorithm and making some simplifying assumptions. Consider the following pseudocode:

1    get a positive integer n from input
2    if n > 10
3        print "This might take a while..."
4    for i = 1 to n
5        for j = 1 to i
6            print i * j
7    print "Done!"
A given computer will take a discrete amount of time to execute each of the instructions involved with carrying out this algorithm. The specific amount of time to carry out a given instruction will vary depending on which instruction is being executed and which computer is executing it, but on a conventional computer, this amount will be deterministic.[9] Say that the actions carried out in step 1 are considered to consume time T1, step 2 uses time T2, and so forth.

In the algorithm above, steps 1, 2 and 7 will only be run once. For a worst-case evaluation, it should be assumed that step 3 will be run as well. Thus the total amount of time to run steps 1-3 and step 7 is:

    T1 + T2 + T3 + T7.

The loops in steps 4, 5 and 6 are trickier to evaluate. The outer loop test in step 4 will execute (n + 1) times (note that an extra step is required to terminate the for loop, hence n + 1 and not n executions), which will consume T4(n + 1) time. The inner loop, on the other hand, is governed by the value of i: on each pass through the outer loop, j iterates from 1 to i. On the first pass through the outer loop, j iterates from 1 to 1: the inner loop makes one pass, so running the inner loop body (step 6) consumes T6 time, and the inner loop test (step 5) consumes 2T5 time. During the next pass through the outer loop, j iterates from 1 to 2: the inner loop makes two passes, so running the inner loop body (step 6) consumes 2T6 time, and the inner loop test (step 5) consumes 3T5 time.

Altogether, the total time required to run the inner loop body can be expressed as an arithmetic progression:

    T6 + 2T6 + 3T6 + ... + (n−1)T6 + nT6

which can be factored[10] as

    T6 · [1 + 2 + 3 + ... + (n−1) + n] = T6 · (n² + n)/2

The total time required to run the outer loop test can be evaluated similarly:

    2T5 + 3T5 + 4T5 + ... + (n−1)T5 + nT5 + (n+1)T5
    = T5 + 2T5 + 3T5 + 4T5 + ... + (n−1)T5 + nT5 + (n+1)T5 − T5

which can be factored as

    T5 · [1 + 2 + 3 + ... + (n−1) + n + (n+1)] − T5
    = T5 · (n² + n)/2 + (n+1)T5 − T5
    = T5 · (n² + n)/2 + nT5
    = T5 · (n² + 3n)/2

Therefore the total running time for this algorithm is:

    f(n) = T1 + T2 + T3 + T7 + (n+1)T4 + T6 · (n² + n)/2 + T5 · (n² + 3n)/2

which reduces to

    f(n) = T6 · (n² + n)/2 + T5 · (n² + 3n)/2 + (n+1)T4 + T1 + T2 + T3 + T7

As a rule-of-thumb, one can assume that the highest-order term in any given function dominates its rate of growth and thus defines its run-time order. In this example, n² is the highest-order term, so one can conclude that f(n) = O(n²). Formally this can be proven as follows:

    Prove that T6 · (n² + n)/2 + T5 · (n² + 3n)/2 + (n+1)T4 + T1 + T2 + T3 + T7 ≤ cn² for all n ≥ n0.

    T6 · (n² + n)/2 + T5 · (n² + 3n)/2 + (n+1)T4 + T1 + T2 + T3 + T7
    ≤ (n² + n)T6 + (n² + 3n)T5 + (n+1)T4 + T1 + T2 + T3 + T7   (for n ≥ 0)

    Let k be a constant greater than or equal to [T1..T7]. Then

    T6(n² + n) + T5(n² + 3n) + (n+1)T4 + T1 + T2 + T3 + T7
    ≤ k(n² + n) + k(n² + 3n) + kn + 5k
    = 2kn² + 5kn + 5k
    ≤ 2kn² + 5kn² + 5kn²   (for n ≥ 1)
    = 12kn²

    Therefore f(n) ≤ cn² for all n ≥ n0, with c = 12k and n0 = 1.

A more elegant approach to analyzing this algorithm would be to declare that [T1..T7] are all equal to one unit of time, in a system of units chosen so that one unit is greater than or equal to the actual times for these steps. This would mean that the algorithm's running time breaks down as follows:[11]

    4 + Σ(i=1..n) i ≤ 4 + Σ(i=1..n) n = 4 + n² ≤ 5n²   (for n ≥ 1)   = O(n²)

Growth rate analysis of other resources

The methodology of run-time analysis can also be utilized for predicting other growth rates, such as consumption of memory space. As an example, consider the following pseudocode, which manages and reallocates memory usage by a program based on the size of a file which that program manages:

    while file still open:
        let n = size of file
        for every 100,000 kilobytes of increase in file size
            double the amount of memory reserved

In this instance, as the file size n increases, memory will be consumed at an exponential growth rate, which is order O(2ⁿ). This is an extremely rapid and most likely unmanageable growth rate for consumption of memory resources.

1.3.3 Relevance

Algorithm analysis is important in practice because the accidental or unintentional use of an inefficient algorithm can significantly impact system performance. In time-sensitive applications, an algorithm taking too long to run can render its results outdated or useless. An inefficient algorithm can also end up requiring an uneconomical amount of computing power or storage in order to run, again rendering it practically useless.

1.3.4 Constant factors

Analysis of algorithms typically focuses on the asymptotic performance, particularly at the elementary level, but in practical applications constant factors are important, and real-world data is in practice always limited in size. The limit is typically the size of addressable memory, so on 32-bit machines 2³² = 4 GiB (greater if segmented memory is used) and on 64-bit machines 2⁶⁴ = 16 EiB. Thus given a limited size, an order of growth (time or space) can be replaced by a constant factor, and in this sense all practical algorithms are O(1) for a large enough constant, or for small enough data.

This interpretation is primarily useful for functions that grow extremely slowly: (binary) iterated logarithm (log*) is less than 5 for all practical data (2⁶⁵⁵³⁶ bits); (binary) log-log (log log n) is less than 6 for virtually all practical data (2⁶⁴ bits); and binary log (log n) is less than 64 for virtually all practical data (2⁶⁴ bits). An algorithm with non-constant complexity may nonetheless be more efficient than an algorithm with constant complexity on practical data if the overhead of the constant-time algorithm results in a larger constant factor, e.g., one may have K > k log log n so long as K/k > 6 and n < 2^(2⁶) = 2⁶⁴.

For large data linear or quadratic factors cannot be ignored, but for small data an asymptotically inefficient algorithm may be more efficient. This is particularly used in hybrid algorithms, like Timsort, which use an asymptotically efficient algorithm (here merge sort, with time complexity n log n), but switch to an asymptotically inefficient algorithm (here insertion sort, with time complexity n²) for small data, as the simpler algorithm is faster on small data.

1.3.5 See also

• Amortized analysis
• Asymptotic computational complexity
• Best, worst and average case
• Big O notation
• Computational complexity theory
• Master theorem
• NP-Complete
• Numerical analysis
• Polynomial time
• Program optimization
• Profiling (computer programming)
• Scalability
• Smoothed analysis
• Termination analysis: the subproblem of checking whether a program will terminate at all
• Time complexity: includes table of orders of growth for common algorithms

1.3.6 Notes

[1] Alfred V. Aho; John E. Hopcroft; Jeffrey D. Ullman (1974). The design and analysis of computer algorithms. Addison-Wesley Pub. Co., section 1.3.
[2] Juraj Hromkovič (2004). Theoretical computer science: introduction to Automata, computability, complexity, algorithmics, randomization, communication, and cryptography. Springer. pp. 177-178. ISBN 978-3-540-14015-3.
[3] Giorgio Ausiello (1999). Complexity and approximation: combinatorial optimization problems and their approximability properties. Springer. pp. 3-8. ISBN 978-3-540-65431-5.
[4] Wegener, Ingo (2005), Complexity theory: exploring the limits of efficient algorithms, Berlin, New York: Springer-Verlag, p. 20, ISBN 978-3-540-21045-0.
[5] Robert Endre Tarjan (1983). Data structures and network algorithms. SIAM. pp. 3-7. ISBN 978-0-89871-187-5.
[6] Examples of the price of abstraction?, cstheory.stackexchange.com
[7] The term lg is often used as shorthand for log2.
[8] How To Avoid O-Abuse and Bribes, at the blog "Gödel's Lost Letter and P=NP" by R. J. Lipton, professor of Computer Science at Georgia Tech, recounting an idea by Robert Sedgewick.
[9] However, this is not the case with a quantum computer.
[10] It can be proven by induction that 1 + 2 + 3 + ... + (n−1) + n = n(n+1)/2.
[11] This approach, unlike the above approach, neglects the constant time consumed by the loop tests which terminate their respective loops, but it is trivial to prove that such omission does not affect the final result.

1.3.7 References

• Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L. & Stein, Clifford (2001). Introduction to Algorithms. Chapter 1: Foundations (Second ed.). Cambridge, MA: MIT Press and McGraw-Hill. pp. 3-122. ISBN 0-262-03293-7.
• Sedgewick, Robert (1998). Algorithms in C, Parts 1-4: Fundamentals, Data Structures, Sorting, Searching (3rd ed.). Reading, MA: Addison-Wesley Professional. ISBN 978-0-201-31452-6.
• Knuth, Donald. The Art of Computer Programming. Addison-Wesley.
• Greene, Daniel A.; Knuth, Donald E. (1982). Mathematics for the Analysis of Algorithms (Second ed.). Birkhäuser. ISBN 3-7643-3102-X.
• Goldreich, Oded (2010). Computational Complexity: A Conceptual Perspective. Cambridge University Press. ISBN 978-0-521-88473-0.

Chapter 2

Sequences
2.1 Array data type

selected by indices computed at run-time.

Depending on the language, array types may overlap (or


be identied with) other data types that describe aggregates of values, such as lists and strings. Array types are
often implemented by array data structures, but someIn computer science, an array type is a data type that times by other means, such as hash tables, linked lists, or
is meant to describe a collection of elements (values or search trees.
variables), each selected by one or more indices (identifying keys) that can be computed at run time by the program. Such a collection is usually called an array vari- 2.1.1 History
able, array value, or simply array.[1] By analogy with
the mathematical concepts of vector and matrix, array Assembly languages and low-level languages like
types with one and two indices are often called vector BCPL[3] generally have no syntactic support for arrays.
type and matrix type, respectively.
Because of the importance of array structures for efLanguage support for array types may include certain cient computation, the earliest high-level programbuilt-in array data types, some syntactic constructions ming languages, including FORTRAN (1957), COBOL
(array type constructors) that the programmer may use to (1960), and Algol 60 (1960), provided support for multidene such types and declare array variables, and spe- dimensional arrays.
cial notation for indexing array elements.[1] For example, in the Pascal programming language, the declaration type MyTable = array [1..4,1..2] of integer, denes 2.1.2 Abstract arrays
a new array data type called MyTable. The declaration
var A: MyTable then denes a variable A of that type, An array data structure can be mathematically modeled
which is an aggregate of eight elements, each being an as an abstract data structure (an abstract array) with two
integer variable identied by two indices. In the Pas- operations
cal program, those elements are denoted A[1,1], A[1,2],
A[2,1], A[4,2].[2] Special array types are often dened
get(A, I): the data stored in the element of the
by the languages standard libraries.
array A whose indices are the integer tuple I.
Not to be confused with Array data structure.

Arrays are distinguished from lists in that arrays allow


set(A,I,V): the array that results by setting the
random access, while lists only allow sequential access.
value of that element to V.
Dynamic lists are also more common and easier to implement than dynamic arrays. Array types are distin- These operations are required to satisfy the axioms[4]
guished from record types mainly because they allow the
element indices to be computed at run time, as in the Pasget(set(A,I, V), I) = V
cal assignment A[I,J] := A[N-I,2*J]. Among other things,
get(set(A,I, V), J) = get(A, J) if I J
this feature allows a single iterative statement to process
arbitrarily many elements of an array variable.
In more theoretical contexts, especially in type theory and for any array state A, any value V, and any tuples I, J for
in the description of abstract algorithms, the terms ar- which the operations are dened.
ray and array type sometimes refer to an abstract data
type (ADT) also called abstract array or may refer to an
associative array, a mathematical model with the basic
operations and behavior of a typical array type in most
languages basically, a collection of elements that are

The rst axiom means that each element behaves like a


variable. The second axiom means that elements with
distinct indices behave as disjoint variables, so that storing a value in one element does not aect the value of any
other element.

13

14

CHAPTER 2. SEQUENCES

These axioms do not place any constraints on the set of


valid index tuples I, therefore this abstract model can be
used for triangular matrices and other oddly-shaped arrays.

2.1.3

Implementations

In order to eectively implement variables of such types


as array structures (with indexing done by pointer arithmetic), many languages restrict the indices to integer data
types (or other types that can be interpreted as integers,
such as bytes and enumerated types), and require that
all elements have the same data type and storage size.
Most of those languages also restrict each index to a nite
interval of integers, that remains xed throughout the lifetime of the array variable. In some compiled languages,
in fact, the index ranges may have to be known at compile
time.
On the other hand, some programming languages provide
more liberal array types, that allow indexing by arbitrary
values, such as oating-point numbers, strings, objects,
references, etc.. Such index values cannot be restricted to
an interval, much less a xed interval. So, these languages
usually allow arbitrary new elements to be created at any
time. This choice precludes the implementation of array
types as array data structures. That is, those languages use
array-like syntax to implement a more general associative
array semantics, and must therefore be implemented by a
hash table or some other search data structure.

2.1.4

Language support

A two-dimensional array stored as a one-dimensional array of


one-dimensional arrays (rows).

or, in general, where the valid range of each index depends on the values of all preceding indices.
This representation for multi-dimensional arrays is quite
prevalent in C and C++ software. However, C and C++
will use a linear indexing formula for multi-dimensional
arrays that are declared as such, e.g. by int A[10][20] or
int A[m][n], instead of the traditional int **A.[6]:p.81
Indexing notation
Most programming languages that support arrays support the store and select operations, and have special syntax for indexing. Early languages used parentheses, e.g.
A(i,j), as in FORTRAN; others choose square brackets,
e.g. A[i,j] or A[i][j], as in Algol 60 and Pascal.

Multi-dimensional arrays

Index types

The number of indices needed to specify an element is


called the dimension, dimensionality, or rank of the array
type. (This nomenclature conicts with the concept of
dimension in linear algebra,[5] where it is the number of
elements. Thus, an array of numbers with 5 rows and 4
columns, hence 20 elements, is said to have dimension 2
in computing contexts, but represents a matrix with dimension 4-by-5 or 20 in mathematics. Also, the computer science meaning of rank is similar to its meaning
in tensor algebra but not to the linear algebra concept of
rank of a matrix.)

Array data types are most often implemented as array


structures: with the indices restricted to integer (or totally
ordered) values, index ranges xed at array creation time,
and multilinear element addressing. This was the case in
most third generation languages, and is still the case
of most systems programming languages such as Ada, C,
and C++. In some languages, however, array data types
have the semantics of associative arrays, with indices of
arbitrary type and dynamic element creation. This is the
case in some scripting languages such as Awk and Lua,
and of some array types provided by standard C++ libraries.

Many languages support only one-dimensional arrays. In


those languages, a multi-dimensional array is typically
represented by an Ilie vector, a one-dimensional array
of references to arrays of one dimension less. A twodimensional array, in particular, would be implemented
as a vector of pointers to its rows. Thus an element in
row i and column j of an array A would be accessed by
double indexing (A[i][j] in typical notation). This way of
emulating multi-dimensional arrays allows the creation of
jagged arrays, where each row may have a dierent size

Bounds checking
Some languages (like Pascal and Modula) perform
bounds checking on every access, raising an exception
or aborting the program when any index is out of its
valid range. Compilers may allow these checks to be
turned o to trade safety for speed. Other languages (like
FORTRAN and C) trust the programmer and perform no

2.1. ARRAY DATA TYPE

15

checks. Good compilers may also analyze the program provided via standard extension libraries for other general
to determine the range of possible values that the index purpose programming languages (such as the widely used
may have, and this analysis may lead to bounds-checking NumPy library for Python).
elimination.
String types and arrays
Index origin
Some languages, such as C, provide only zero-based array
types, for which the minimum valid value for any index
is 0. This choice is convenient for array implementation
and address computations. With a language such as C,
a pointer to the interior of any array can be dened that
will symbolically act as a pseudo-array that accommodates negative indices. This works only because C does
not check an index against bounds when used.
Other languages provide only one-based array types,
where each index starts at 1; this is the traditional convention in mathematics for matrices and mathematical
sequences. A few languages, such as Pascal, support
n-based array types, whose minimum legal indices are
chosen by the programmer. The relative merits of each
choice have been the subject of heated debate. Zerobased indexing has a natural advantage to one-based indexing in avoiding o-by-one or fencepost errors.[7]

Many languages provide a built-in string data type, with


specialized notation ("string literals") to build values of
that type. In some languages (such as C), a string is just
an array of characters, or is handled in much the same
way. Other languages, like Pascal, may provide vastly
dierent operations for strings and arrays.
Array index range queries
Some programming languages provide operations that return the size (number of elements) of a vector, or, more
generally, range of each index of an array. In C and C++
arrays do not support the size function, so programmers
often have to declare separate variable to hold the size,
and pass it to procedures as a separate parameter.
Elements of a newly created array may have undened
values (as in C), or may be dened to have a specic default value such as 0 or a null pointer (as in Java).

See comparison of programming languages (array) for


In C++ a std::vector object supports the store, select, and
the base indices used by various languages.
append operations with the performance characteristics
discussed above. Vectors can be queried for their size
Highest index
and can be resized. Slower operations like inserting an
element in the middle are also supported.
The relation between numbers appearing in an array declaration and the index of that arrays last element also
varies by language. In many languages (such as C), one Slicing
should specify the number of elements contained in the
array; whereas in others (such as Pascal and Visual Basic An array slicing operation takes a subset of the elements
.NET) one should specify the numeric value of the index of an array-typed entity (value or variable) and then asof the last element. Needless to say, this distinction is sembles them as another array-typed entity, possibly with
other indices. If array types are implemented as array
immaterial in languages where the indices start at 1.
structures, many useful slicing operations (such as selecting a sub-array, swapping indices, or reversing the direcArray algebra
tion of the indices) can be performed very eciently by
manipulating the dope vector of the structure. The posSome programming languages support array program- sible slicings depend on the implementation details: for
ming, where operations and functions dened for certain example, FORTRAN allows slicing o one column of a
data types are implicitly extended to arrays of elements matrix variable, but not a row, and treat it as a vector;
of those types. Thus one can write A+B to add corre- whereas C allow slicing o a row from a matrix, but not
sponding elements of two arrays A and B. Usually these a column.
languages provide both the element-by-element multiplication and the standard matrix product of linear algebra, On the other hand, other slicing operations are possible
and which of these is represented by the * operator varies when array types are implemented in other ways.
by language.
Languages providing array programming capabilities
have proliferated since the innovations in this area of
APL. These are core capabilities of domain-specic languages such as GAUSS, IDL, Matlab, and Mathematica.
They are a core facility in newer languages, such as Julia
and recent versions of Fortran. These capabilities are also

Resizing
Some languages allow dynamic arrays (also called resizable, growable, or extensible): array variables whose index ranges may be expanded at any time after creation,
without changing the values of its current elements.

16

CHAPTER 2. SEQUENCES

For one-dimensional arrays, this facility may be provided [3] John Mitchell, Concepts of Programming Languages.
Cambridge University Press.
as an operation append(A,x)" that increases the size of
the array A by one and then sets the value of the last el[4] Lukham, Suzuki (1979), Verication of array, record,
ement to x. Other array types (such as Pascal strings)
and pointer operations in Pascal. ACM Transactions on
provide a concatenation operator, which can be used toProgramming Languages and Systems 1(2), 226244.
gether with slicing to achieve that eect and more. In
some languages, assigning a value to an element of an [5] see the denition of a matrix
array automatically extends the array, if necessary, to include that element. In other array types, a slice can be [6] Brian W. Kernighan and Dennis M. Ritchie (1988), The
C programming Language. Prentice-Hall, 205 pages.
replaced by an array of dierent size with subsequent elements being renumbered accordingly as in Pythons [7] Edsger W. Dijkstra, Why numbering should start at zero
list assignment "A[5:5] = [10,20,30]", that inserts three
new elements (10,20, and 30) before element "A[5]". Resizable arrays are conceptually similar to lists, and the two 2.1.7 External links
concepts are synonymous in some languages.
NISTs Dictionary of Algorithms and Data StrucAn extensible array can be implemented as a xed-size
tures: Array
array, with a counter that records how many elements are
actually in use. The append operation merely increments
the counter; until the whole array is used, when the append operation may be dened to fail. This is an imple- 2.2 Array data structure
mentation of a dynamic array with a xed capacity, as in
the string type of Pascal. Alternatively, the append op- Not to be confused with Array data type.
eration may re-allocate the underlying array with a larger
size, and copy the old elements to the new area.
In computer science, an array data structure or simply
an array is a data structure consisting of a collection of elements (values or variables), each identied by at least one
2.1.5 See also
array index or key. An array is stored so that the position
of each element can be computed from its index tuple by
Array access analysis
a mathematical formula.[1][2][3] The simplest type of data
Array programming
structure is a linear array, also called one-dimensional array.
Array slicing
For example, an array of 10 32-bit integer variables, with
Bounds checking and index checking
indices 0 through 9, may be stored as 10 words at memory
addresses 2000, 2004, 2008, 2036, so that the element
Bounds checking elimination
with index i has the address 2000 + 4 i.[4]
Delimiter-separated values
Because the mathematical concept of a matrix can be represented as a two-dimensional grid, two-dimensional ar Comparison of programming languages (array)
rays are also sometimes called matrices. In some cases
the term vector is used in computing to refer to an ar Parallel array
ray, although tuples rather than vectors are more correctly
the mathematical equivalent. Arrays are often used to imRelated types
plement tables, especially lookup tables; the word table is
sometimes used as a synonym of array.
Variable-length array
Arrays are among the oldest and most important data
Dynamic array
structures, and are used by almost every program. They
are also used to implement many other data structures,
Sparse array
such as lists and strings. They eectively exploit the addressing logic of computers. In most modern computers and many external storage devices, the memory is a
2.1.6 References
one-dimensional array of words, whose indices are their
[1] Robert W. Sebesta (2001) Concepts of Programming addresses. Processors, especially vector processors, are
Languages. Addison-Wesley. 4th edition (1998), 5th edi- often optimized for array operations.
tion (2001), ISBN 9780201385960
[2] K. Jensen and Niklaus Wirth, PASCAL User Manual and
Report. Springer. Paperback edition (2007) 184 pages,
ISBN 978-3540069508

Arrays are useful mostly because the element indices can


be computed at run time. Among other things, this feature allows a single iterative statement to process arbitrarily many elements of an array. For that reason, the

2.2. ARRAY DATA STRUCTURE


elements of an array data structure are required to have
the same size and should use the same data representation. The set of valid index tuples and the addresses of
the elements (and hence the element addressing formula)
are usually,[3][5] but not always,[2] xed while the array is
in use.

17
portably.
Arrays can be used to determine partial or complete
control ow in programs, as a compact alternative to
(otherwise repetitive) multiple IF statements. They are
known in this context as control tables and are used
in conjunction with a purpose built interpreter whose
control ow is altered according to values contained in
the array. The array may contain subroutine pointers (or
relative subroutine numbers that can be acted upon by
SWITCH statements) that direct the path of the execution.

The term array is often used to mean array data type, a


kind of data type provided by most high-level programming languages that consists of a collection of values or
variables that can be selected by one or more indices computed at run-time. Array types are often implemented by
array structures; however, in some languages they may be
implemented by hash tables, linked lists, search trees, or
2.2.3
other data structures.

Element identier and addressing


formulas

The term is also used, especially in the description of


algorithms, to mean associative array or abstract array,
a theoretical computer science model (an abstract data When data objects are stored in an array, individual
type or ADT) intended to capture the essential properties objects are selected by an index that is usually a nonnegative scalar integer. Indices are also called subscripts.
of arrays.
An index maps the array value to a stored object.

2.2.1

History

The rst digital computers used machine-language programming to set up and access array structures for data
tables, vector and matrix computations, and for many
other purposes. Von Neumann wrote the rst arraysorting program (merge sort) in 1945, during the building of the rst stored-program computer.[6]p. 159 Array indexing was originally done by self-modifying code, and
later using index registers and indirect addressing. Some
mainframes designed in the 1960s, such as the Burroughs
B5000 and its successors, used memory segmentation to
perform index-bounds checking in hardware.[7]
Assembly languages generally have no special support
for arrays, other than what the machine itself provides.
The earliest high-level programming languages, including FORTRAN (1957), COBOL (1960), and ALGOL
60 (1960), had support for multi-dimensional arrays, and
so has C (1972). In C++ (1983), class templates exist
for multi-dimensional arrays whose dimension is xed at
runtime[3][5] as well as for runtime-exible arrays.[2]

2.2.2

Applications

Arrays are used to implement mathematical vectors and


matrices, as well as other kinds of rectangular tables.
Many databases, small and large, consist of (or include)
one-dimensional arrays whose elements are records.

There are three ways in which the elements of an array


can be indexed:
0 (zero-based indexing): The rst element of the array is indexed by subscript of 0.[8]
1 (one-based indexing): The rst element of the array is indexed by subscript of 1.[9]
n (n-based indexing): The base index of an array can
be freely chosen. Usually programming languages
allowing n-based indexing also allow negative index
values and other scalar data types like enumerations,
or characters may be used as an array index.
Arrays can have multiple dimensions, thus it is not uncommon to access an array using multiple indices. For
example a two-dimensional array A with three rows and
four columns might provide access to the element at the
2nd row and 4th column by the expression A[1, 3] (in a
row major language) or A[3, 1] (in a column major language) in the case of a zero-based indexing system. Thus
two indices are used for a two-dimensional array, three
for a three-dimensional array, and n for an n-dimensional
array.
The number of indices needed to specify an element is
called the dimension, dimensionality, or rank of the array.

In standard arrays, each index is restricted to a certain


range of consecutive integers (or consecutive values of
Arrays are used to implement other data structures, such some enumerated type), and the address of an element is
as heaps, hash tables, deques, queues, stacks, strings, and computed by a linear formula on the indices.
VLists.
One or more large arrays are sometimes used to emu- One-dimensional arrays
late in-program dynamic memory allocation, particularly
memory pool allocation. Historically, this has some- A one-dimensional array (or single dimension array) is a
times been the only way to allocate dynamic memory type of linear array. Accessing its elements involves a sin-

18

CHAPTER 2. SEQUENCES

gle subscript which can either represent a row or column changed by changing the base address B. Thus, if a twoindex.
dimensional array has rows and columns indexed from 1
As an example consider the C declaration int anArray- to 10 and 1 to 20, respectively, then replacing B by B + c1 3 c1 will cause them to be renumbered from 0 through
Name[10];
9 and 4 through 23, respectively. Taking advantage of
Syntax : datatype anArrayname[sizeofArray];
this feature, some languages (like FORTRAN 77) specify
In the given example the array can contain 10 elements of that array indices begin at 1, as in mathematical tradition;
any value available to the int type. In C, the array element while other languages (like Fortran 90, Pascal and Algol)
indices are 0-9 inclusive in this case. For example, the ex- let the user choose the minimum value for each index.
pressions anArrayName[0] and anArrayName[9] are the
rst and last elements respectively.
Dope vectors
For a vector with linear addressing, the element with index i is located at the address B + c i, where B is a xed The addressing formula is completely dened by the dibase address and c a xed constant, sometimes called the mension d, the base address B, and the increments c1 , c2 ,
, ck. It is often useful to pack these parameters into a
address increment or stride.
record called the arrays descriptor or stride vector or dope
If the valid element indices begin at 0, the constant B is
vector.[2][3] The size of each element, and the minimum
simply the address of the rst element of the array. For
and maximum values allowed for each index may also be
this reason, the C programming language species that
included in the dope vector. The dope vector is a comarray indices always begin at 0; and many programmers
plete handle for the array, and is a convenient way to pass
will call that element "zeroth" rather than rst.
arrays as arguments to procedures. Many useful array
However, one can choose the index of the rst element by slicing operations (such as selecting a sub-array, swapan appropriate choice of the base address B. For example, ping indices, or reversing the direction of the indices) can
if the array has ve elements, indexed 1 through 5, and the be performed very eciently by manipulating the dope
base address B is replaced by B + 30c, then the indices of vector.[2]
those same elements will be 31 to 35. If the numbering
does not start at 0, the constant B may not be the address
Compact layouts
of any element.
Often the coecients are chosen so that the elements occupy a contiguous area of memory. However, that is not
necessary. Even if arrays are always created with conFor a two-dimensional array, the element with indices i,j tiguous elements, some array slicing operations may crewould have address B + c i + d j, where the coe- ate non-contiguous sub-arrays from them.
cients c and d are the row and column address increments,
There are two systematic compact layouts for a tworespectively.
dimensional array. For example, consider the matrix
More generally, in a k-dimensional array, the address of
an element with indices i1 , i2 , , ik is

1 2 3
A = 4 5 6.
B + c1 i1 + c2 i2 + + ck ik.
7 8 9
Multidimensional arrays

For example: int a[3][2];


This means that array a has 3 rows and 2 columns, and
the array is of integer type. Here we can store 6 elements
they are stored linearly but starting from rst row linear
then continuing with second row. The above array will be
stored as a11 , a12 , a13 , a21 , a22 , a23 .
This formula requires only k multiplications and k additions, for any array that can t in memory. Moreover, if
any coecient is a xed power of 2, the multiplication
can be replaced by bit shifting.

In the row-major order layout (adopted by C for statically


declared arrays), the elements in each row are stored in
consecutive positions and all of the elements of a row have
a lower address than any of the elements of a consecutive
row:
In column-major order (traditionally used by Fortran), the
elements in each column are consecutive in memory and
all of the elements of a column have a lower address than
any of the elements of a consecutive column:

For arrays with three or more indices, row major order


The coecients ck must be chosen so that every valid in- puts in consecutive positions any two elements whose index tuple maps to the address of a distinct element.
dex tuples dier only by one in the last index. Column
major
order is analogous with respect to the rst index.
If the minimum legal value for every index is 0, then B is

the address of the element whose indices are all zero. As In systems which use processor cache or virtual memory,
in the one-dimensional case, the element indices may be scanning an array is much faster if successive elements

2.2. ARRAY DATA STRUCTURE


are stored in consecutive positions in memory, rather than
sparsely scattered. Many algorithms that use multidimensional arrays will scan them in a predictable order. A programmer (or a sophisticated compiler) may use this information to choose between row- or column-major layout
for each array. For example, when computing the product AB of two matrices, it would be best to have A stored
in row-major order, and B in column-major order.

19
through individual element access. The speedup of such
optimized routines varies by array element size, architecture, and implementation.

Memory-wise, arrays are compact data structures with no


per-element overhead. There may be a per-array overhead, e.g. to store index bounds, but this is languagedependent. It can also happen that elements stored in an
array require less memory than the same elements stored
in individual variables, because several array elements
can be stored in a single word; such arrays are often called
Resizing
packed arrays. An extreme (but commonly used) case is
the bit array, where every bit represents a single element.
Main article: Dynamic array
A single octet can thus hold up to 256 dierent combinations of up to 8 dierent conditions, in the most compact
Static arrays have a size that is xed when they are created form.
and consequently do not allow elements to be inserted or
removed. However, by allocating a new array and copy- Array accesses with statically predictable access patterns
ing the contents of the old array to it, it is possible to are a major source of data parallelism.
eectively implement a dynamic version of an array; see
dynamic array. If this operation is done infrequently, insertions at the end of the array require only amortized Comparison with other data structures
constant time.
Growable arrays are similar to arrays but add the ability
Some array data structures do not reallocate storage, but to insert and delete elements; adding and deleting at the
do store a count of the number of elements of the array end is particularly ecient. However, they reserve linear
in use, called the count or size. This eectively makes ((n)) additional storage, whereas arrays do not reserve
the array a dynamic array with a xed maximum size or additional storage.
capacity; Pascal strings are examples of this.
Associative arrays provide a mechanism for array-like
functionality without huge storage overheads when the inNon-linear formulas
dex values are sparse. For example, an array that contains
values only at indexes 1 and 2 billion may benet from usMore complicated (non-linear) formulas are occasionally ing such a structure. Specialized associative arrays with
used. For a compact two-dimensional triangular array, integer keys include Patricia tries, Judy arrays, and van
for instance, the addressing formula is a polynomial of Emde Boas trees.
degree 2.
Balanced trees require O(log n) time for indexed access,
but also permit inserting or deleting elements in O(log
n) time,[14] whereas growable arrays require linear ((n))
2.2.4 Eciency
time to insert or delete elements at an arbitrary position.
Both store and select take (deterministic worst case) Linked lists allow constant time removal and insertion in
constant time. Arrays take linear (O(n)) space in the the middle but take linear time for indexed access. Their
memory use is typically worse than arrays, but is still linnumber of elements n that they hold.
In an array with element size k and on a machine with a ear.

cache line size of B bytes, iterating through an array of


n elements requires the minimum of ceiling(nk/B) cache
misses, because its elements occupy contiguous memory
locations. This is roughly a factor of B/k better than the
number of cache misses needed to access n elements at
random memory locations. As a consequence, sequential iteration over an array is noticeably faster in practice
than iteration over many other data structures, a property called locality of reference (this does not mean however, that using a perfect hash or trivial hash within the
same (local) array, will not be even faster - and achievable in constant time). Libraries provide low-level optimized facilities for copying ranges of memory (such as
memcpy) which can be used to move contiguous blocks
of array elements signicantly faster than can be achieved

An Ilie vector is an alternative to a multidimensional


array structure. It uses a one-dimensional array of
references to arrays of one dimension less. For two dimensions, in particular, this alternative structure would
be a vector of pointers to vectors, one for each row. Thus
an element in row i and column j of an array A would
be accessed by double indexing (A[i][j] in typical notation). This alternative structure allows jagged arrays,
where each row may have a dierent size or, in general, where the valid range of each index depends on the
values of all preceding indices. It also saves one multiplication (by the column address increment) replacing it by
a bit shift (to index the vector of row pointers) and one
extra memory access (fetching the row address), which
may be worthwhile in some architectures.

20

CHAPTER 2. SEQUENCES

1
4
7

2
5
8

[2] Bjoern Andres; Ullrich Koethe; Thorben Kroeger; Hamprecht (2010). Runtime-Flexible Multi-dimensional Arrays and Views for C++98 and C++0x. arXiv:1008.2909
[cs.DS].

[3] Garcia, Ronald; Lumsdaine, Andrew (2005). MultiArray: a C++ library for generic programming with arrays. Software: Practice and Experience 35 (2): 159
188. doi:10.1002/spe.630. ISSN 0038-0644.

[4] David R. Richardson (2002), The Book on Data Structures. iUniverse, 112 pages. ISBN 0-595-24039-9, ISBN
978-0-595-24039-5.

A two-dimensional array stored as a one-dimensional array of


one-dimensional arrays (rows).

[5] T. Veldhuizen. Arrays in Blitz++. In Proc. of the 2nd Int.


Conf. on Scientic Computing in Object-Oriented Parallel Environments (ISCOPE), LNCS 1505, pages 223-220.
Springer, 1998.

2.2.5

[6] Donald Knuth, The Art of Computer Programming, vol. 3.


Addison-Wesley

Dimension

The dimension of an array is the number of indices


needed to select an element. Thus, if the array is seen
as a function on a set of possible index combinations, it
is the dimension of the space of which its domain is a
discrete subset. Thus a one-dimensional array is a list of
data, a two-dimensional array a rectangle of data, a threedimensional array a block of data, etc.
This should not be confused with the dimension of the
set of all matrices with a given domain, that is, the
number of elements in the array. For example, an array with 5 rows and 4 columns is two-dimensional, but
such matrices form a 20-dimensional space. Similarly,
a three-dimensional vector can be represented by a onedimensional array of size three.

2.2.6

See also

Dynamic array
Parallel array
Variable-length array
Bit array

[7] Levy, Henry M. (1984), Capability-based Computer Systems, Digital Press, p. 22, ISBN 9780932376220.
[8] Array Code Examples - PHP Array Functions - PHP
code. http://www.configure-all.com/: Computer Programming Web programming Tips. Retrieved 8 April
2011. In most computer languages array index (counting)
starts from 0, not from 1. Index of the rst element of the
array is 0, index of the second element of the array is 1,
and so on. In array of names below you can see indexes
and values.
[9] Chapter 6 - Arrays, Types, and Constants. Modula-2
Tutorial. http://www.modula2.org/tutor/index.php. Retrieved 8 April 2011. The names of the twelve variables
are given by Automobiles[1], Automobiles[2], ... Automobiles[12]. The variable name is Automobiles and the
array subscripts are the numbers 1 through 12. [i.e. in
Modula-2, the index starts by one!]
[10] Gerald Kruse. CS 240 Lecture Notes: Linked Lists Plus:
Complexity Trade-os. Juniata College. Spring 2008.
[11] Day 1 Keynote - Bjarne Stroustrup: C++11 Style at GoingNative 2012 on channel9.msdn.com from minute 45 or
foil 44
[12] Number crunching: Why you should never, ever, EVER use
linked-list in your code again at kjellkod.wordpress.com

Oset (computer science)

[13] Brodnik, Andrej; Carlsson, Svante; Sedgewick, Robert;


Munro, JI; Demaine, ED (1999), Resizable Arrays in Optimal Time and Space (Technical Report CS-99-09), Department of Computer Science, University of Waterloo

Row-major order

[14] Counted B-Tree

Array slicing

Stride of an array

2.2.7

References

[1] Black, Paul E. (13 November 2008). array. Dictionary


of Algorithms and Data Structures. National Institute of
Standards and Technology. Retrieved 22 August 2010.

2.2.8 External links

2.3 Dynamic array


In computer science, a dynamic array, growable array,
resizable array, dynamic table, mutable array, or ar-

2.3. DYNAMIC ARRAY

21

then return to using xed-size arrays during program optimization. Resizing the underlying array is an expensive
task, typically involving copying the entire contents of the
array.

27
271

2.3.2 Geometric expansion and amortized


cost

2713

To avoid incurring the cost of resizing many times, dynamic arrays resize by a large amount, such as doubling
in size, and use the reserved space for future expansion.
The operation of adding an element to the end might work
as follows:

27138
271384
Logical size
Capacity

function insertEnd(dynarray a, element e) if (a.size


= a.capacity) // resize a to twice its current capacity:
a.capacity a.capacity * 2 // (copy the contents to the
new memory location here) a[a.size] e a.size a.size
+1

As n elements are inserted, the capacities form a


geometric progression. Expanding the array by any constant proportion ensures that inserting n elements takes
O(n) time overall, meaning that each insertion takes
amortized constant time. The value of this proportion
a leads to a time-space tradeo: the average time per insertion operation is about a/(a1), while the number of
ray list is a random access, variable-size list data struc- wasted cells is bounded above by (a1)n. The choice of
ture that allows elements to be added or removed. It is a depends on the library or application: some textbooks
[3][4]
but Javas ArrayList implementation uses
supplied with standard libraries in many modern main- use a = 2,
[1]
a
=
3/2
and
the
C implementation of Python's list data
stream programming languages.
structure uses a = 9/8.[5]
A dynamic array is not the same thing as a dynamically
allocated array, which is a xed-size array whose size is Many dynamic arrays also deallocate some of the unxed when the array is allocated, although a dynamic ar- derlying storage if its size drops below a certain threshold, such as 30% of the capacity. This threshold must
ray may use such a xed-size array as a back end.[1]
be strictly smaller than 1/a in order to provide hysteresis
(provide a stable band to avoiding repeatedly growing and
shrinking) and support mixed sequences of insertions and
2.3.1 Bounded-size dynamic arrays and removals with amortized constant cost.
Several values are inserted at the end of a dynamic array using
geometric expansion. Grey cells indicate space reserved for expansion. Most insertions are fast (constant time), while some are
slow due to the need for reallocation ((n) time, labelled with turtles). The logical size and capacity of the nal array are shown.

capacity

Dynamic arrays are a common example when teaching


[3][4]
The simplest dynamic array is constructed by allocating amortized analysis.
a xed-size array and then dividing it into two parts: the
rst stores the elements of the dynamic array and the sec2.3.3 Performance
ond is reserved, or unused. We can then add or remove
elements at the end of the dynamic array in constant time
The dynamic array has performance similar to an array,
by using the reserved space, until this space is completely
with the addition of new operations to add and remove
consumed. The number of elements used by the dynamic
elements:
array contents is its logical size or size, while the size of
the underlying array is called the dynamic arrays capac Getting or setting the value at a particular index
ity or physical size, which is the maximum possible size
[2]
(constant time)
without relocating data.
In applications where the logical size is bounded, the
xed-size data structure suces. This may be shortsighted, as more space may be needed later. A
philosophical programmer may prefer to write the code
to make every array capable of resizing from the outset,

Iterating over the elements in order (linear time,


good cache performance)
Inserting or deleting an element in the middle of the
array (linear time)

22

CHAPTER 2. SEQUENCES

Inserting or deleting an element at the end of the namic array data structure, which wastes only n1/2 space
array (constant amortized time)
for n elements at any point in time, and they prove a lower
bound showing that any dynamic array must waste this
Dynamic arrays benet from many of the advantages much space if the operations are to remain amortized
of arrays, including good locality of reference and data constant time. Additionally, they present a variant where
cache utilization, compactness (low memory use), and growing and shrinking the buer has not only amortized
random access. They usually have only a small xed addi- but worst-case constant time.
tional overhead for storing information about the size and
capacity. This makes dynamic arrays an attractive tool
for building cache-friendly data structures. However, in
languages like Python or Java that enforce reference semantics, the dynamic array generally will not store the
actual data, but rather it will store references to the data
that resides in other areas of memory. In this case, accessing items in the array sequentially will actually involve accessing multiple non-contiguous areas of memory, so the many advantages of the cache-friendliness of
this data structure are lost.
Compared to linked lists, dynamic arrays have faster indexing (constant time versus linear time) and typically
faster iteration due to improved locality of reference;
however, dynamic arrays require linear time to insert or
delete at an arbitrary location, since all following elements must be moved, while linked lists can do this in
constant time. This disadvantage is mitigated by the gap
buer and tiered vector variants discussed under Variants
below. Also, in a highly fragmented memory region, it
may be expensive or impossible to nd contiguous space
for a large dynamic array, whereas linked lists do not require the whole data structure to be stored contiguously.

Bagwell (2002)[12] presented the VList algorithm, which


can be adapted to implement a dynamic array.

2.3.5 Language support


C++'s std::vector is an implementation of dynamic arrays, as are the ArrayList[13] classes supplied with the
Java API and the .NET Framework. The generic List<>
class supplied with version 2.0 of the .NET Framework is
also implemented with dynamic arrays. Smalltalk's OrderedCollection is a dynamic array with dynamic start
and end-index, making the removal of the rst element
also O(1). Python's list datatype implementation is a dynamic array. Delphi and D implement dynamic arrays
at the languages core. Ada's Ada.Containers.Vectors
generic package provides dynamic array implementation
for a given subtype. Many scripting languages such as
Perl and Ruby oer dynamic arrays as a built-in primitive
data type. Several cross-platform frameworks provide
dynamic array implementations for C: CFArray and CFMutableArray in Core Foundation; GArray and GPtrArray in GLib.

A balanced tree can store a list while providing all operations of both dynamic arrays and linked lists reasonably 2.3.6 References
eciently, but both insertion at the end and iteration over
the list are slower than for a dynamic array, in theory and [1] See, for example, the source code of java.util.ArrayList
class from OpenJDK 6.
in practice, due to non-contiguous storage and tree traversal/manipulation overhead.

2.3.4

Variants

Gap buers are similar to dynamic arrays but allow ecient insertion and deletion operations clustered near the
same arbitrary location. Some deque implementations
use array deques, which allow amortized constant time
insertion/removal at both ends, instead of just one end.
Goodrich[10] presented a dynamic array algorithm called
Tiered Vectors that provided O(n1/2 ) performance for order preserving insertions or deletions from the middle of
the array.
Hashed Array Tree (HAT) is a dynamic array algorithm
published by Sitarski in 1996.[11] Hashed Array Tree
wastes order n1/2 amount of storage space, where n is the
number of elements in the array. The algorithm has O(1)
amortized performance when appending a series of objects to the end of a Hashed Array Tree.
In a 1999 paper,[9] Brodnik et al. describe a tiered dy-

[2] Lambert, Kenneth Alfred (2009), Physical size and logical size, Fundamentals of Python: From First Programs
Through Data Structures (Cengage Learning): 510, ISBN
1423902181
[3] Goodrich, Michael T.; Tamassia, Roberto (2002), 1.5.2
Analyzing an Extendable Array Implementation, Algorithm Design: Foundations, Analysis and Internet Examples, Wiley, pp. 3941.
[4] Cormen, Thomas H.; Leiserson, Charles E., Rivest,
Ronald L., Stein, Cliord (2001) [1990]. 17.4 Dynamic
tables. Introduction to Algorithms (2nd ed.). MIT Press
and McGraw-Hill. pp. 416424. ISBN 0-262-03293-7.
[5] List object implementation from python.org, retrieved
2011-09-27.
[6] Gerald Kruse. CS 240 Lecture Notes: Linked Lists Plus:
Complexity Trade-os. Juniata College. Spring 2008.
[7] Day 1 Keynote - Bjarne Stroustrup: C++11 Style at GoingNative 2012 on channel9.msdn.com from minute 45 or
foil 44

2.3. DYNAMIC ARRAY

[8] Number crunching: Why you should never, ever, EVER use
linked-list in your code again at kjellkod.wordpress.com
[9] Brodnik, Andrej; Carlsson, Svante; Sedgewick, Robert;
Munro, JI; Demaine, ED (1999), Resizable Arrays in Optimal Time and Space (Technical Report CS-99-09), Department of Computer Science, University of Waterloo
[10] Goodrich, Michael T.; Kloss II, John G. (1999), Tiered
Vectors: Ecient Dynamic Arrays for Rank-Based
Sequences, Workshop on Algorithms and Data Structures, Lecture Notes in Computer Science 1663: 205
216, doi:10.1007/3-540-48447-7_21, ISBN 978-3-54066279-2
[11] Sitarski, Edward (September 1996), HATs: Hashed array trees, Dr. Dobbs Journal 21 (11) |chapter= ignored
(help)
[12] Bagwell, Phil (2002), Fast Functional Lists, Hash-Lists,
Deques and Variable Length Arrays, EPFL
[13] Javadoc on ArrayList

2.3.7

External links

NIST Dictionary of Algorithms and Data Structures:


Dynamic array
VPOOL - C language implementation of dynamic
array.
CollectionSpy A Java proler with explicit support for debugging ArrayList- and Vector-related issues.
Open Data Structures - Chapter 2 - Array-Based
Lists

23

Chapter 3

Dictionaries
3.1 Associative array
Dictionary (data structure)" redirects here. It is not to
be confused with data dictionary.

Add or insert: add a new (key, value) pair to the


collection, binding the new key to its new value.
The arguments to this operation are the key and the
value.

In computer science, an associative array, map, symbol


table, or dictionary is an abstract data type composed of
a collection of (key, value) pairs, such that each possible
key appears just once in the collection.

Reassign: replace the value in one of the


(key, value) pairs that are already in the collection,
binding an old key to a new value. As with an insertion, the arguments to this operation are the key and
the value.

Operations associated with this data type allow:[1][2]

Remove or delete: remove a (key, value) pair from


the collection, unbinding a given key from its value.
The argument to this operation is the key.

the addition of pairs to the collection


the removal of pairs from the collection

Lookup: nd the value (if any) that is bound to a


given key. The argument to this operation is the key,
and the value is returned from the operation. If no
value is found, some associative array implementations raise an exception.

the modication of the values of existing pairs


the lookup of the value associated with a particular
key
The dictionary problem is a classic computer science
problem: the task of designing a data structure that maintains a set of data during 'search' 'delete' and 'insert'
operations.[3] A standard solution to the dictionary problem is a hash table; in some cases it is also possible to
solve the problem using directly addressed arrays, binary
search trees, or other more specialized structures.[1][2][4]
Many programming languages include associative arrays
as primitive data types, and they are available in software
libraries for many others. Content-addressable memory
is a form of direct hardware-level support for associative
arrays.

In addition, associative arrays may also include other operations such as determining the number of bindings or
constructing an iterator to loop over all the bindings. Usually, for such an operation, the order in which the bindings
are returned may be arbitrary.
A multimap generalizes an associative array by allowing
multiple values to be associated with a single key.[6] A
bidirectional map is a related abstract data type in which
the bindings operate in both directions: each value must
be associated with a unique key, and a second lookup operation takes a value as argument and looks up the key
associated with that value.

Associative arrays have many applications including such


fundamental programming patterns as memoization and
the decorator pattern.[5]
3.1.2

3.1.1

Operations

In an associative array, the association between a key and


a value is often known as a binding, and the same word
binding may also be used to refer to the process of creating a new association.
The operations that are usually dened for an associative
array are:[1][2]

Example

Suppose that the set of loans made by a library is to be


represented in a data structure. Each book in a library
may be checked out only by a single library patron at a
time. However, a single patron may be able to check out
multiple books. Therefore, the information about which
books are checked out to which patrons may be represented by an associative array, in which the books are the
keys and the patrons are the values. For instance (using notation from Python, or JSON (JavaScript Object

24

3.1. ASSOCIATIVE ARRAY

25

Notation), in which a binding is represented by placing a cell).[1][2][4]


colon between the key and the value), the current check- Dictionaries may also be stored in binary search trees or
outs may be represented by an associative array
in data structures specialized to a particular type of keys
{ Great Expectations": John, Pride and Prejudice": such as radix trees, tries, Judy arrays, or van Emde Boas
Alice, Wuthering Heights": Alice }
trees, but these implementation methods are less ecient
than hash tables as well as placing greater restrictions on
the types of data that they can handle. The advantages
A lookup operation with the key Great Expectations
in this array would return the name of the person who of these alternative structures come from their ability to
checked out that book, John. If John returns his book, handle operations beyond the basic ones of an associative
that would cause a deletion operation in the associative ar- array, such as nding the binding whose key is the closest
ray, and if Pat checks out another book, that would cause to a queried key, when the query is not itself present in
the set of bindings.
an insertion operation, leading to a dierent state:
{ Pride and Prejudice": Alice, The Brothers Kara3.1.4
mazov": Pat, Wuthering Heights": Alice }

Language support

Main article: Comparison of programming languages


In this new state, the same lookup as before, with the key
(mapping)
Great Expectations, would raise an exception, because
this key is no longer present in the array.
Associative arrays can be implemented in any programming language as a package and many language systems
provide them as part of their standard library. In some
3.1.3 Implementation
languages, they are not only built into the standard sysFor dictionaries with very small numbers of bindings, it tem, but have special syntax, often using array-like submay make sense to implement the dictionary using an scripting.
association list, a linked list of bindings. With this im- Built-in syntactic support for associative arrays was introplementation, the time to perform the basic dictionary duced by SNOBOL4, under the name table. MUMPS
operations is linear in the total number of bindings; how- made multi-dimensional associative arrays, optionally
ever, it is easy to implement and the constant factors in persistent, its key data structure. SETL supported them
its running time are small.[1][7]
as one possible implementation of sets and maps. Most
Another very simple implementation technique, usable
when the keys are restricted to a narrow range of integers, is direct addressing into an array: the value for a
given key k is stored at the array cell A[k], or if there is
no binding for k then the cell stores a special sentinel value
that indicates the absence of a binding. As well as being
simple, this technique is fast: each dictionary operation
takes constant time. However, the space requirement for
this structure is the size of the entire keyspace, making it
impractical unless the keyspace is small.[4]

modern scripting languages, starting with AWK and including Rexx, Perl, Tcl, JavaScript, Python, Ruby, and
Lua, support associative arrays as a primary container
type. In many more languages, they are available as library functions without special syntax.
In Smalltalk, Objective-C, .NET,[8] Python, REALbasic,
and Swift they are called dictionaries; in Perl, Ruby and
Seed7 they are called hashes; in C++, Java, Go, Clojure,
Scala, OCaml, Haskell they are called maps (see map
(C++), unordered_map (C++), and Map); in Common
Lisp and Windows PowerShell, they are called hash tables (since both typically use this implementation). In
PHP, all arrays can be associative, except that the keys
are limited to integers and strings. In JavaScript (see also
JSON), all objects behave as associative arrays. In Lua,
they are called tables, and are used as the primitive building block for all data structures. In Visual FoxPro, they
are called Collections. The D language also has support
for associative arrays [9]

The most frequently used general purpose implementation of an associative array is with a hash table: an array
of bindings, together with a hash function that maps each
possible key into an array index. The basic idea of a hash
table is that the binding for a given key is stored at the position given by applying the hash function to that key, and
that lookup operations are performed by looking at that
cell of the array and using the binding found there. However, hash table based dictionaries must be prepared to
handle collisions that occur when two keys are mapped
by the hash function to the same index, and many different collision resolution strategies have been developed 3.1.5 See also
for dealing with this situation, often based either on open
Tuple
addressing (looking at a sequence of hash table indices instead of a single index, until nding either the given key
Function (mathematics)
or an empty cell) or on hash chaining (storing a small association list instead of a single binding in each hash table
Key-value data store

26

CHAPTER 3. DICTIONARIES

JSON

3.1.6

References

[1] Goodrich, Michael T.; Tamassia, Roberto (2006), "9.1 The Map Abstract Data Type", Data Structures & Algorithms in Java (4th ed.), Wiley, pp. 368–371.
[2] Mehlhorn, Kurt; Sanders, Peter (2008), "4 Hash Tables and Associative Arrays", Algorithms and Data Structures: The Basic Toolbox, Springer, pp. 81–98.
[3] Anderson, Arne (1989). "Optimal Bounds on the Dictionary Problem". Proc. Symposium on Optimal Algorithms (Springer Verlag): 106–114.
[4] Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford (2001), "11 Hash Tables", Introduction to Algorithms (2nd ed.), MIT Press and McGraw-Hill, pp. 221–252, ISBN 0-262-03293-7.
[5] Goodrich & Tamassia (2006), pp. 597–599.
[6] Goodrich & Tamassia (2006), pp. 389–397.
[7] "When should I use a hash table instead of an association list?". lisp-faq/part2. 1996-02-20.
[8] "Dictionary<TKey, TValue> Class". MSDN.
[9] "Associative Arrays, the D programming language". Digital Mars.

3.1.7 External links

• NIST's Dictionary of Algorithms and Data Structures: Associative Array

3.2 Association list


In computer programming and particularly in Lisp, an association list, often referred to as an alist, is a linked list in which each list element (or node) comprises a key and a value. The association list is said to associate the value with the key. In order to find the value associated with a given key, each element of the list is searched in turn, starting at the head, until the key is found. Duplicate keys that appear later in the list are ignored. It is a simple way of implementing an associative array.

The disadvantage of association lists is that the time to search is O(n), where n is the length of the list. And unless the list is regularly pruned to remove elements with duplicate keys, multiple values associated with the same key will increase the size of the list, and thus the time to search, without providing any compensatory advantage. One advantage is that a new element can be added to the list at its head, which can be done in constant time. For quite small values of n it is more efficient in terms of time and space than more sophisticated strategies such as hash tables and trees.
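To make the lookup procedure concrete, here is a minimal sketch in Python; the function and variable names are illustrative, not from any standard library:

def alist_lookup(alist, key, default=None):
    # Scan from the head; the first matching key wins, so duplicate
    # keys that appear later in the list are ignored, as described above.
    for k, v in alist:
        if k == key:
            return v
    return default

phone_book = [("John Smith", "521-1234"), ("Lisa Smith", "521-8976")]
alist_lookup(phone_book, "Lisa Smith")  # returns "521-8976"

Note that adding at the head is constant time only in a genuine linked list, as in Lisp; the Python list above is used purely for brevity.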

In the early development of Lisp, association lists were used to resolve references to free variables in procedures.[1]

Many programming languages, including Lisp, Scheme, OCaml, and Haskell, have functions for handling association lists in their standard library.

3.2.1 References

[1] McCarthy, John; Abrahams, Paul W.; Edwards, Daniel J.; Hart, Timothy P.; Levin, Michael I. (1985). LISP 1.5 Programmer's Manual. MIT Press. ISBN 0-262-13011-4.

3.3 Hash table

Not to be confused with Hash list or Hash tree.
"Rehash" redirects here. For the South Park episode, see Rehash (South Park). For the IRC command, see List of Internet Relay Chat commands § REHASH.

In computing, a hash table (hash map) is a data structure used to implement an associative array, a structure that can map keys to values. A hash table uses a hash function to compute an index into an array of buckets or slots, from which the correct value can be found.

[Figure: A small phone book as a hash table.]

Ideally, the hash function will assign each key to a unique bucket, but this situation is rarely achievable in practice (usually some keys will hash to the same bucket). Instead, most hash table designs assume that hash collisions (different keys that are assigned by the hash function to the same bucket) will occur and must be accommodated in some way.

In a well-dimensioned hash table, the average cost (number of instructions) for each lookup is independent of the number of elements stored in the table. Many hash table designs also allow arbitrary insertions and deletions of key-value pairs, at (amortized[2]) constant average cost per operation.[3][4]

In many situations, hash tables turn out to be more efficient than search trees or any other table lookup structure. For this reason, they are widely used in many kinds of computer software, particularly for associative arrays, database indexing, caches, and sets.

3.3.1 Hashing

Main article: Hash function

The idea of hashing is to distribute the entries (key/value pairs) across an array of buckets. Given a key, the algorithm computes an index that suggests where the entry can be found:

index = f(key, array_size)

Often this is done in two steps:

hash = hashfunc(key)
index = hash % array_size

In this method, the hash is independent of the array size, and it is then reduced to an index (a number between 0 and array_size − 1) using the modulo operator (%).

In the case that the array size is a power of two, the remainder operation is reduced to masking, which improves speed, but can increase problems with a poor hash function.
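As a small illustration of the two-step scheme, the following Python sketch uses the built-in hash as a stand-in for hashfunc (an assumption made for brevity):

def bucket_index(key, array_size):
    h = hash(key)            # step 1: hash value, independent of the array size
    return h % array_size    # step 2: reduce to an index in [0, array_size - 1];
                             # Python's % is non-negative for a positive modulus

def bucket_index_pow2(key, array_size):
    # When array_size is a power of two, the remainder reduces to a mask.
    return hash(key) & (array_size - 1)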
Choosing a good hash function

A good hash function and implementation algorithm are essential for good hash table performance, but may be difficult to achieve. A basic requirement is that the function should provide a uniform distribution of hash values. A non-uniform distribution increases the number of collisions and the cost of resolving them. Uniformity is sometimes difficult to ensure by design, but may be evaluated empirically using statistical tests, e.g., a Pearson's chi-squared test for discrete uniform distributions.[5][6]

The distribution needs to be uniform only for table sizes that occur in the application. In particular, if one uses dynamic resizing with exact doubling and halving of the table size s, then the hash function needs to be uniform only when s is a power of two. On the other hand, some hashing algorithms provide uniform hashes only when s is a prime number.[7]

For open addressing schemes, the hash function should also avoid clustering, the mapping of two or more keys to consecutive slots. Such clustering may cause the lookup cost to skyrocket, even if the load factor is low and collisions are infrequent. The popular multiplicative hash[3] is claimed to have particularly poor clustering behavior.[7]

Cryptographic hash functions are believed to provide good hash functions for any table size s, either by modulo reduction or by bit masking. They may also be appropriate if there is a risk of malicious users trying to sabotage a network service by submitting requests designed to generate a large number of collisions in the server's hash tables. However, the risk of sabotage can also be avoided by cheaper methods (such as applying a secret salt to the data, or using a universal hash function).

Perfect hash function

If all keys are known ahead of time, a perfect hash function can be used to create a perfect hash table that has no collisions. If minimal perfect hashing is used, every location in the hash table can be used as well.

Perfect hashing allows for constant time lookups in the worst case. This is in contrast to most chaining and open addressing methods, where the time for lookup is low on average, but may be very large (proportional to the number of entries) for some sets of keys.

3.3.2 Key statistics

A critical statistic for a hash table is called the load factor. This is simply the number of entries divided by the number of buckets, that is, n/k where n is the number of entries and k is the number of buckets.

If the load factor is kept reasonable, the hash table should perform well, provided the hashing is good. If the load factor grows too large, the hash table will become slow, or it may fail to work (depending on the method used). The expected constant time property of a hash table assumes that the load factor is kept below some bound. For a fixed number of buckets, the time for a lookup grows with the number of entries and so does not achieve the desired constant time.

Second to that, one can examine the variance of the number of entries per bucket. For example, two tables both have 1000 entries and 1000 buckets; one has exactly one entry in each bucket, the other has all entries in the same bucket. Clearly the hashing is not working in the second one.

A low load factor is not especially beneficial. As the load factor approaches 0, the proportion of unused areas in the hash table increases, but there is not necessarily any reduction in search cost. This results in wasted memory.

3.3.3 Collision resolution

Hash collisions are practically unavoidable when hashing a random subset of a large set of possible keys. For example, if 2,450 keys are hashed into a million buckets, even with a perfectly uniform random distribution, according to the birthday problem there is approximately a 95% chance of at least two of the keys being hashed to the same slot.

Therefore, most hash table implementations have some collision resolution strategy to handle such events. Some common strategies are described below. All these methods require that the keys (or pointers to them) be stored in the table, together with the associated values.
Separate chaining

[Figure: Hash collision resolved by separate chaining.]

In the method known as separate chaining, each bucket is independent, and has some sort of list of entries with the same index. The time for hash table operations is the time to find the bucket (which is constant) plus the time for the list operation. (The technique is also called open hashing or closed addressing.)

In a good hash table, each bucket has zero or one entries, and sometimes two or three, but rarely more than that. Therefore, structures that are efficient in time and space for these cases are preferred. Structures that are efficient for a fairly large number of entries per bucket are not needed or desirable. If these cases happen often, the hashing is not working well, and this needs to be fixed.
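The bucket-list structure just described can be sketched in a few lines of Python; this is an illustrative toy, not production code:

class ChainedHashTable:
    def __init__(self, num_buckets=256):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        # Constant-time step: locate the bucket for this key.
        return self.buckets[hash(key) % len(self.buckets)]

    def set(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:                  # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))       # new key (or collision): grow the chain

    def get(self, key):
        for k, v in self._bucket(key):    # list-operation step: scan the chain
            if k == key:
                return v
        raise KeyError(key)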

Separate chaining with linked lists

Chained hash tables with linked lists are popular because they require only basic data structures with simple algorithms, and can use simple hash functions that are unsuitable for other methods.

The cost of a table operation is that of scanning the entries of the selected bucket for the desired key. If the distribution of keys is sufficiently uniform, the average cost of a lookup depends only on the average number of keys per bucket, that is, on the load factor.

Chained hash tables remain effective even when the number of table entries n is much higher than the number of slots. Their performance degrades more gracefully (linearly) with the load factor. For example, a chained hash table with 1000 slots and 10,000 stored keys (load factor 10) is five to ten times slower than a 10,000-slot table (load factor 1); but still 1000 times faster than a plain sequential list.

For separate-chaining, the worst-case scenario is when all entries are inserted into the same bucket, in which case the hash table is ineffective and the cost is that of searching the bucket data structure. If the latter is a linear list, the lookup procedure may have to scan all its entries, so the worst-case cost is proportional to the number n of entries in the table.

The bucket chains are often implemented as ordered lists, sorted by the key field; this choice approximately halves the average cost of unsuccessful lookups, compared to an unordered list. However, if some keys are much more likely to come up than others, an unordered list with a move-to-front heuristic may be more effective. More sophisticated data structures, such as balanced search trees, are worth considering only if the load factor is large (about 10 or more), or if the hash distribution is likely to be very non-uniform, or if one must guarantee good performance even in a worst-case scenario. However, using a larger table and/or a better hash function may be even more effective in those cases.

Chained hash tables also inherit the disadvantages of linked lists. When storing small keys and values, the space overhead of the next pointer in each entry record can be significant. An additional disadvantage is that traversing a linked list has poor cache performance, making the processor cache ineffective.

[Figure: Hash collision by separate chaining with head records in the bucket array.]

Separate chaining with list head cells

Some chaining implementations store the first record of each chain in the slot array itself.[4] The number of pointer traversals is decreased by one for most cases. The purpose is to increase cache efficiency of hash table access.

The disadvantage is that an empty bucket takes the same space as a bucket with one entry. To save space, such hash tables often have about as many slots as stored entries, meaning that many slots have two or more entries.

Separate chaining with other structures

Instead of a list, one can use any other data structure that supports the required operations. For example, by using a self-balancing tree, the theoretical worst-case time of common hash table operations (insertion, deletion, lookup) can be brought down to O(log n) rather than O(n). However, this approach is only worth the trouble and extra memory cost if long delays must be avoided at all costs (e.g., in a real-time application), or if one must guard against many entries hashed to the same slot (e.g., if one expects extremely non-uniform distributions, or in the case of web sites or other publicly accessible services, which are vulnerable to malicious key distributions in requests).

The variant called array hash table uses a dynamic array to store all the entries that hash to the same slot.[8][9][10] Each newly inserted entry gets appended to the end of the dynamic array that is assigned to the slot. The dynamic array is resized in an exact-fit manner, meaning it is grown only by as many bytes as needed. Alternative techniques such as growing the array by block sizes or pages were found to improve insertion performance, but at a cost in space. This variation makes more efficient use of CPU caching and the translation lookaside buffer (TLB), because slot entries are stored in sequential memory positions. It also dispenses with the next pointers that are required by linked lists, which saves space. Despite frequent array resizing, space overheads incurred by the operating system, such as memory fragmentation, were found to be small.

An elaboration on this approach is the so-called dynamic perfect hashing,[11] where a bucket that contains k entries is organized as a perfect hash table with k² slots. While it uses more memory (n² slots for n entries, in the worst case and n·k slots in the average case), this variant has guaranteed constant worst-case lookup time, and low amortized time for insertion.
Open addressing

[Figure: Hash collision resolved by open addressing with linear probing (interval=1). Note that "Ted Baker" has a unique hash, but nevertheless collided with "Sandra Dee", which had previously collided with "John Smith".]

In another strategy, called open addressing, all entry records are stored in the bucket array itself. When a new entry has to be inserted, the buckets are examined, starting with the hashed-to slot and proceeding in some probe sequence, until an unoccupied slot is found. When searching for an entry, the buckets are scanned in the same sequence, until either the target record is found, or an unused array slot is found, which indicates that there is no such key in the table.[12] The name "open addressing" refers to the fact that the location ("address") of the item is not determined by its hash value. (This method is also called closed hashing; it should not be confused with "open hashing" or "closed addressing", which usually mean separate chaining.)

Well-known probe sequences include:

• Linear probing, in which the interval between probes is fixed (usually 1)
• Quadratic probing, in which the interval between probes is increased by adding the successive outputs of a quadratic polynomial to the starting value given by the original hash computation
• Double hashing, in which the interval between probes is computed by another hash function
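For illustration, the three probe sequences can be written as index generators in Python; h1 and h2 stand for the primary and secondary hash functions and are assumptions of this sketch:

def linear_probes(h1, key, size):
    start = h1(key) % size
    for i in range(size):
        yield (start + i) % size        # fixed interval of 1

def quadratic_probes(h1, key, size):
    start = h1(key) % size
    for i in range(size):
        yield (start + i * i) % size    # offsets 0, 1, 4, 9, ... from a quadratic polynomial

def double_hash_probes(h1, h2, key, size):
    start = h1(key) % size
    step = h2(key) % size or 1          # interval from a second hash; never zero
    for i in range(size):
        yield (start + i * step) % size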

A drawback of all these open addressing schemes is that the number of stored entries cannot exceed the number of slots in the bucket array. In fact, even with good hash functions, their performance dramatically degrades when the load factor grows beyond 0.7 or so. For many applications, these restrictions mandate the use of dynamic resizing, with its attendant costs.

Open addressing schemes also put more stringent requirements on the hash function: besides distributing the keys more uniformly over the buckets, the function must also minimize the clustering of hash values that are consecutive in the probe order. Using separate chaining, the only concern is that too many objects map to the same hash value; whether they are adjacent or nearby is completely irrelevant.

Open addressing only saves memory if the entries are small (less than four times the size of a pointer) and the load factor is not too small. If the load factor is close to zero (that is, there are far more buckets than stored entries), open addressing is wasteful even if each entry is just two words.

Open addressing avoids the time overhead of allocating each new entry record, and can be implemented even in the absence of a memory allocator. It also avoids the extra indirection required to access the first entry of each bucket (that is, usually the only one). It also has better locality of reference, particularly with linear probing. With small record sizes, these factors can yield better performance than chaining, particularly for lookups. Hash tables with open addressing are also easier to serialize, because they do not use pointers.

On the other hand, normal open addressing is a poor choice for large elements, because these elements fill entire CPU cache lines (negating the cache advantage), and a large amount of space is wasted on large empty table slots. If the open addressing table only stores references to elements (external storage), it uses space comparable to chaining even for large records but loses its speed advantage.

Generally speaking, open addressing is better used for hash tables with small records that can be stored within the table (internal storage) and fit in a cache line. They are particularly suitable for elements of one word or less. If the table is expected to have a high load factor, the records are large, or the data is variable-sized, chained hash tables often perform as well or better.

Ultimately, used sensibly, any kind of hash table algorithm is usually fast enough; and the percentage of a calculation spent in hash table code is low. Memory usage is rarely considered excessive. Therefore, in most cases the differences between these algorithms are marginal, and other considerations typically come into play.

[Figure: This graph compares the average number of cache misses required to look up elements in tables with chaining and linear probing. As the table passes the 80%-full mark, linear probing's performance drastically degrades.]

Coalesced hashing

A hybrid of chaining and open addressing, coalesced hashing links together chains of nodes within the table itself.[12] Like open addressing, it achieves space usage and (somewhat diminished) cache advantages over chaining. Like chaining, it does not exhibit clustering effects; in fact, the table can be efficiently filled to a high density. Unlike chaining, it cannot have more elements than table slots.

Cuckoo hashing

Another alternative open-addressing solution is cuckoo hashing, which ensures constant lookup time in the worst case, and constant amortized time for insertions and deletions. It uses two or more hash functions, which means any key/value pair could be in two or more locations. For lookup, the first hash function is used; if the key/value is not found, then the second hash function is used, and so on. If a collision happens during insertion, then the key is re-hashed with the second hash function to map it to another bucket. If all hash functions are used and there is still a collision, then the key it collided with is removed to make space for the new key, and the old key is re-hashed with one of the other hash functions, which maps it to another bucket. If that location also results in a collision, then the process repeats until there is no collision or the process traverses all the buckets, at which point the table is resized. By combining multiple hash functions with multiple cells per bucket, very high space utilisation can be achieved.
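Lookup under cuckoo hashing probes only the candidate cells, which is what yields the constant worst-case bound. A rough Python sketch of two-table lookup, assuming tables of (key, value) pairs or None and caller-supplied hash functions h1 and h2:

def cuckoo_lookup(key, table1, table2, h1, h2):
    # A key can only live in one of its candidate cells, so lookup
    # inspects at most two locations regardless of table contents.
    entry = table1[h1(key) % len(table1)]
    if entry is not None and entry[0] == key:
        return entry[1]
    entry = table2[h2(key) % len(table2)]
    if entry is not None and entry[0] == key:
        return entry[1]
    return None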
addressing, coalesced hashing links together chains of
nodes within the table itself.[12] Like open addressing, it
achieves space usage and (somewhat diminished) cache Robin Hood hashing
advantages over chaining. Like chaining, it does not exhibit clustering eects; in fact, the table can be eciently One interesting variation on double-hashing collision reslled to a high density. Unlike chaining, it cannot have olution is Robin Hood hashing.[14][15] The idea is that a
new key may displace a key already inserted, if its probe
more elements than table slots.
count is larger than that of the key at the current position. The net eect of this is that it reduces worst case
Cuckoo hashing Another alternative open-addressing search times in the table. This is similar to ordered hash
solution is cuckoo hashing, which ensures constant tables[16] except that the criterion for bumping a key does
lookup time in the worst case, and constant amortized not depend on a direct relationship between the keys.
time for insertions and deletions. It uses two or more hash Since both the worst case and the variation in the number
functions, which means any key/value pair could be in two of probes is reduced dramatically, an interesting variation
or more locations. For lookup, the rst hash function is is to probe the table starting at the expected successful
used; if the key/value is not found, then the second hash probe value and then expand from that position in both

Robin Hood hashing

One interesting variation on double-hashing collision resolution is Robin Hood hashing.[14][15] The idea is that a new key may displace a key already inserted, if its probe count is larger than that of the key at the current position. The net effect of this is that it reduces worst case search times in the table. This is similar to ordered hash tables[16] except that the criterion for bumping a key does not depend on a direct relationship between the keys. Since both the worst case and the variation in the number of probes is reduced dramatically, an interesting variation is to probe the table starting at the expected successful probe value and then expand from that position in both directions.[17] External Robin Hood hashing is an extension of this algorithm where the table is stored in an external file and each table position corresponds to a fixed-sized page or bucket with B records.[18]

2-choice hashing

2-choice hashing employs two different hash functions, h1(x) and h2(x), for the hash table. Both hash functions are used to compute two table locations. When an object is inserted in the table, it is placed in the table location that contains fewer objects (with the default being the h1(x) table location if there is equality in bucket size). 2-choice hashing employs the principle of the power of two choices.[19]

3.3.4 Dynamic resizing

The good functioning of a hash table depends on the fact that the table size is proportional to the number of entries. With a fixed size, and the common structures, it is similar to linear search, except with a better constant factor. In some cases, the number of entries may be definitely known in advance, for example keywords in a language. More commonly, this is not known for sure, if only due to later changes in code and data. It is one serious, although common, mistake to not provide any way for the table to resize. A general-purpose hash table class will almost always have some way to resize, and it is good practice even for simple custom tables. An implementation should check the load factor, and do something if it becomes too large (this needs to be done only on inserts, since that is the only thing that would increase it).

To keep the load factor under a certain limit, e.g., under 3/4, many table implementations expand the table when items are inserted. For example, in Java's HashMap class the default load factor threshold for table expansion is 0.75 and in Python's dict, table size is resized when the load factor is greater than 2/3.

Since buckets are usually implemented on top of a dynamic array and any constant proportion for resizing greater than 1 will keep the load factor under the desired limit, the exact choice of the constant is determined by the same space-time tradeoff as for dynamic arrays.

Resizing is accompanied by a full or incremental table rehash whereby existing items are mapped to new bucket locations.

To limit the proportion of memory wasted due to empty buckets, some implementations also shrink the size of the table (followed by a rehash) when items are deleted. From the point of space-time tradeoffs, this operation is similar to the deallocation in dynamic arrays.

Resizing by copying all entries

A common approach is to automatically trigger a complete resizing when the load factor exceeds some threshold r_max. Then a new larger table is allocated, all the entries of the old table are removed and inserted into this new table, and the old table is returned to the free storage pool. Symmetrically, when the load factor falls below a second threshold r_min, all entries are moved to a new smaller table.

For hash tables that shrink and grow frequently, the resizing downward can be skipped entirely. In this case, the table size is proportional to the number of entries that ever were in the hash table, rather than the current number. The disadvantage is that memory usage will be higher, and thus cache behavior may be worse. For best control, a "shrink-to-fit" operation can be provided that does this only on request.

If the table size increases or decreases by a fixed percentage at each expansion, the total cost of these resizings, amortized over all insert and delete operations, is still a constant, independent of the number of entries n and of the number m of operations performed.

For example, consider a table that was created with the minimum possible size and is doubled each time the load ratio exceeds some threshold. If m elements are inserted into that table, the total number of extra re-insertions that occur in all dynamic resizings of the table is at most m − 1. In other words, dynamic resizing roughly doubles the cost of each insert or delete operation.

Incremental resizing

Some hash table implementations, notably in real-time systems, cannot pay the price of enlarging the hash table all at once, because it may interrupt time-critical operations. If one cannot avoid dynamic resizing, a solution is to perform the resizing gradually:

• During the resize, allocate the new hash table, but keep the old table unchanged.
• In each lookup or delete operation, check both tables.
• Perform insertion operations only in the new table.
• At each insertion also move r elements from the old table to the new table.
• When all elements are removed from the old table, deallocate it.

To ensure that the old table is completely copied over before the new table itself needs to be enlarged, it is necessary to increase the size of the table by a factor of at least (r + 1)/r during resizing.
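A rough Python sketch of the copy-all-entries strategy, written against the toy ChainedHashTable sketched earlier; the 0.75 threshold is an illustrative choice echoing the Java default mentioned above:

def resize_if_needed(table, max_load=0.75):
    n = sum(len(bucket) for bucket in table.buckets)   # current entry count
    if n / len(table.buckets) <= max_load:
        return
    old_buckets = table.buckets
    table.buckets = [[] for _ in range(2 * len(old_buckets))]  # allocate larger table
    for bucket in old_buckets:        # re-insert every old entry; keys land in new
        for k, v in bucket:           # positions because the modulus has changed
            table.set(k, v)

Calling this after each insertion keeps the load factor below the threshold; as noted above, the re-insertions add only amortized constant cost per operation.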

Monotonic keys
If it is known that key values will always increase (or decrease) monotonically, then a variation of consistent hashing can be achieved by keeping a list of the single most recent key value at each hash table resize operation. Upon lookup, keys that fall in the ranges defined by these list entries are directed to the appropriate hash function (and indeed hash table), both of which can be different for each range. Since it is common to grow the overall number of entries by doubling, there will only be O(lg(N)) ranges to check, and binary search time for the redirection would be O(lg(lg(N))). As with consistent hashing, this approach guarantees that any key's hash, once issued, will never change, even when the hash table is later grown.
Other solutions

Linear hashing[20] is a hash table algorithm that permits incremental hash table expansion. It is implemented using a single hash table, but with two possible look-up functions.

Another way to decrease the cost of table resizing is to choose a hash function in such a way that the hashes of most values do not change when the table is resized. This approach, called consistent hashing, is prevalent in disk-based and distributed hashes, where rehashing is prohibitively costly.

3.3.5 Performance analysis

In the simplest model, the hash function is completely unspecified and the table does not resize. For the best possible choice of hash function, a table of size k with open addressing has no collisions and holds up to k elements, with a single comparison for successful lookup, and a table of size k with chaining and n keys has the minimum max(0, n − k) collisions and O(1 + n/k) comparisons for lookup. For the worst choice of hash function, every insertion causes a collision, and hash tables degenerate to linear search, with Θ(n) amortized comparisons per insertion and up to n comparisons for a successful lookup.

Adding rehashing to this model is straightforward. As in a dynamic array, geometric resizing by a factor of b implies that only n/b^i keys are inserted i or more times, so that the total number of insertions is bounded above by bn/(b − 1), which is O(n). By using rehashing to maintain n < k, tables using both chaining and open addressing can have unlimited elements and perform successful lookup in a single comparison for the best choice of hash function.

In more realistic models, the hash function is a random variable over a probability distribution of hash functions, and performance is computed on average over the choice of hash function. When this distribution is uniform, the assumption is called "simple uniform hashing" and it can be shown that hashing with chaining requires Θ(1 + n/k) comparisons on average for an unsuccessful lookup, and hashing with open addressing requires Θ(1/(1 − n/k)).[21] Both these bounds are constant, if we maintain n/k < c using table resizing, where c is a fixed constant less than 1.
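These bounds are easy to evaluate numerically; a short Python check of the expected comparisons per unsuccessful lookup at a few load factors n/k:

for load in (0.25, 0.50, 0.75, 0.90):
    chaining = 1 + load                 # Theta(1 + n/k), separate chaining
    open_addressing = 1 / (1 - load)    # Theta(1/(1 - n/k)), open addressing
    print(f"load {load:.2f}: chaining {chaining:.2f}, open addressing {open_addressing:.2f}")

At a load factor of 0.90 this gives about 1.9 comparisons for chaining but 10 for open addressing, mirroring the earlier observation that open addressing degrades sharply as the table fills.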

3.3.6 Features

Advantages

The main advantage of hash tables over other table data structures is speed. This advantage is more apparent when the number of entries is large. Hash tables are particularly efficient when the maximum number of entries can be predicted in advance, so that the bucket array can be allocated once with the optimum size and never resized.

If the set of key-value pairs is fixed and known ahead of time (so insertions and deletions are not allowed), one may reduce the average lookup cost by a careful choice of the hash function, bucket table size, and internal data structures. In particular, one may be able to devise a hash function that is collision-free, or even perfect (see below). In this case the keys need not be stored in the table.

Drawbacks

Although operations on a hash table take constant time on average, the cost of a good hash function can be significantly higher than the inner loop of the lookup algorithm for a sequential list or search tree. Thus hash tables are not effective when the number of entries is very small. (However, in some cases the high cost of computing the hash function can be mitigated by saving the hash value together with the key.)

For certain string processing applications, such as spell-checking, hash tables may be less efficient than tries, finite automata, or Judy arrays. Also, if each key is represented by a small enough number of bits, then, instead of a hash table, one may use the key directly as the index into an array of values. Note that there are no collisions in this case.

The entries stored in a hash table can be enumerated efficiently (at constant cost per entry), but only in some pseudo-random order. Therefore, there is no efficient way to locate an entry whose key is nearest to a given key. Listing all n entries in some specific order generally requires a separate sorting step, whose cost is proportional to log(n) per entry. In comparison, ordered search trees have lookup and insertion cost proportional to log(n), but allow finding the nearest key at about the same cost, and ordered enumeration of all entries at constant cost per entry.

If the keys are not stored (because the hash function is collision-free), there may be no easy way to enumerate the keys that are present in the table at any given moment.

Although the average cost per operation is constant and fairly small, the cost of a single operation may be quite high. In particular, if the hash table uses dynamic resizing, an insertion or deletion operation may occasionally take time proportional to the number of entries. This may be a serious drawback in real-time or interactive applications.

Hash tables in general exhibit poor locality of reference, that is, the data to be accessed is distributed seemingly at random in memory. Because hash tables cause access patterns that jump around, this can trigger microprocessor cache misses that cause long delays. Compact data structures such as arrays searched with linear search may be faster, if the table is relatively small and keys are compact. The optimal performance point varies from system to system.

Hash tables become quite inefficient when there are many collisions. While extremely uneven hash distributions are extremely unlikely to arise by chance, a malicious adversary with knowledge of the hash function may be able to supply information to a hash that creates worst-case behavior by causing excessive collisions, resulting in very poor performance, e.g., a denial of service attack.[22] In critical applications, universal hashing can be used; a data structure with better worst-case guarantees may be preferable.[23]

3.3.7 Uses

Associative arrays

Hash tables are commonly used to implement many types of in-memory tables. They are used to implement associative arrays (arrays whose indices are arbitrary strings or other complicated objects), especially in interpreted programming languages like Perl, Ruby, Python, and PHP.

When storing a new item into a multimap and a hash collision occurs, the multimap unconditionally stores both items.

When storing a new item into a typical associative array and a hash collision occurs, but the actual keys themselves are different, the associative array likewise stores both items. However, if the key of the new item exactly matches the key of an old item, the associative array typically erases the old item and overwrites it with the new item, so every item in the table has a unique key.

Database indexing

Hash tables may also be used as disk-based data structures and database indices (such as in dbm) although B-trees are more popular in these applications.

Caches

Hash tables can be used to implement caches, auxiliary data tables that are used to speed up the access to data that is primarily stored in slower media. In this application, hash collisions can be handled by discarding one of the two colliding entries, usually erasing the old item that is currently stored in the table and overwriting it with the new item, so every item in the table has a unique hash value.

Sets

Besides recovering the entry that has a given key, many hash table implementations can also tell whether such an entry exists or not.

Those structures can therefore be used to implement a set data structure, which merely records whether a given key belongs to a specified set of keys. In this case, the structure can be simplified by eliminating all parts that have to do with the entry values. Hashing can be used to implement both static and dynamic sets.

Object representation

Several dynamic languages, such as Perl, Python, JavaScript, and Ruby, use hash tables to implement objects. In this representation, the keys are the names of the members and methods of the object, and the values are pointers to the corresponding member or method.

Unique data representation

Hash tables can be used by some programs to avoid creating multiple character strings with the same contents. For that purpose, all strings in use by the program are stored in a single string pool implemented as a hash table, which is checked whenever a new string has to be created. This technique was introduced in Lisp interpreters under the name hash consing, and can be used with many other kinds of data (expression trees in a symbolic algebra system, records in a database, files in a file system, binary decision diagrams, etc.)

String interning

Main article: String interning

3.3.8 Implementations

In programming languages

Many programming languages provide hash table functionality, either as built-in associative arrays or as standard library modules. In C++11, for example, the unordered_map class provides hash tables for keys and values of arbitrary type.

In PHP 5, the Zend 2 engine uses one of the hash functions from Daniel J. Bernstein to generate the hash values used in managing the mappings of data pointers stored in a hash table. In the PHP source code, it is labelled as DJBX33A (Daniel J. Bernstein, Times 33 with Addition).

Python's built-in hash table implementation, in the form of the dict type, as well as Perl's hash type (%), are used internally to implement namespaces and therefore need to pay more attention to security, i.e., collision attacks. Python sets also use hashes internally, for fast lookup (though they store only keys, not values).[24]

In the .NET Framework, support for hash tables is provided via the non-generic Hashtable and generic Dictionary classes, which store key-value pairs, and the generic HashSet class, which stores only values.

Independent packages

• SparseHash (formerly Google SparseHash): an extremely memory-efficient hash_map implementation, with only 2 bits/entry of overhead. The SparseHash library has several C++ hash map implementations with different performance characteristics, including one that optimizes for memory use and another that optimizes for speed.
• SunriseDD: an open source C library for hash table storage of arbitrary data objects with lock-free lookups, built-in reference counting and guaranteed order iteration. The library can participate in external reference counting systems or use its own built-in reference counting. It comes with a variety of hash functions and allows the use of runtime-supplied hash functions via a callback mechanism. Source code is well documented.
• uthash: an easy-to-use hash table for C structures.

3.3.9 History

The idea of hashing arose independently in different places. In January 1953, H. P. Luhn wrote an internal IBM memorandum that used hashing with chaining.[25] G. N. Amdahl, E. M. Boehme, N. Rochester, and Arthur Samuel implemented a program using hashing at about the same time. Open addressing with linear probing (relatively prime stepping) is credited to Amdahl, but Ershov (in Russia) had the same idea.[25]

3.3.10 See also

• Rabin–Karp string search algorithm
• Stable hashing
• Extendible hashing
• Consistent hashing
• Lazy deletion
• Pearson hashing
• PhotoDNA

Related data structures

There are several data structures that use hash functions but cannot be considered special cases of hash tables:

• Bloom filter, a memory-efficient data structure designed for constant-time approximate lookups; uses hash function(s) and can be seen as an approximate hash table.
• Distributed hash table (DHT), a resilient dynamic table spread over several nodes of a network.
• Hash array mapped trie, a trie structure, similar to the array mapped trie, but where each key is hashed first.

3.3.11 References

[1] Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford (2009). Introduction to Algorithms (3rd ed.). Massachusetts Institute of Technology. pp. 253–280. ISBN 978-0-262-03384-8.
[2] Charles E. Leiserson, "Amortized Algorithms, Table Doubling, Potential Method". Lecture 13, course MIT 6.046J/18.410J, Introduction to Algorithms, Fall 2005.
[3] Knuth, Donald (1998). The Art of Computer Programming. 3: Sorting and Searching (2nd ed.). Addison-Wesley. pp. 513–558. ISBN 0-201-89685-0.
[4] Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford (2001). Introduction to Algorithms (2nd ed.). MIT Press and McGraw-Hill. pp. 221–252. ISBN 978-0-262-53196-2.
[5] Pearson, Karl (1900). "On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling". Philosophical Magazine, Series 5 50 (302). pp. 157–175. doi:10.1080/14786440009463897.
[6] Plackett, Robin (1983). "Karl Pearson and the Chi-Squared Test". International Statistical Review (International Statistical Institute (ISI)) 51 (1). pp. 59–72. doi:10.2307/1402731.
[7] Thomas Wang (1997), Prime Double Hash Table. Retrieved April 27, 2012.
[8] Askitis, Nikolas; Zobel, Justin (October 2005). "Cache-conscious Collision Resolution in String Hash Tables". Proceedings of the 12th International Conference, String Processing and Information Retrieval (SPIRE 2005). 3772/2005. pp. 91–102. doi:10.1007/11575832_11. ISBN 978-3-540-29740-6.
[9] Askitis, Nikolas; Sinha, Ranjan (2010). "Engineering scalable, cache and space efficient tries for strings". The VLDB Journal 17 (5): 633–660. doi:10.1007/s00778-010-0183-9. ISSN 1066-8888.
[10] Askitis, Nikolas (2009). "Fast and Compact Hash Tables for Integer Keys". Proceedings of the 32nd Australasian Computer Science Conference (ACSC 2009) 91. pp. 113–122. ISBN 978-1-920682-72-9.
[11] Erik Demaine, Jeff Lind. 6.897: Advanced Data Structures. MIT Computer Science and Artificial Intelligence Laboratory. Spring 2003. http://courses.csail.mit.edu/6.897/spring03/scribe_notes/L2/lecture2.pdf
[12] Tenenbaum, Aaron M.; Langsam, Yedidyah; Augenstein, Moshe J. (1990). Data Structures Using C. Prentice Hall. pp. 456–461, p. 472. ISBN 0-13-199746-7.
[13] Herlihy, Maurice; Shavit, Nir; Tzafrir, Moran (2008). "Hopscotch Hashing". DISC '08: Proceedings of the 22nd International Symposium on Distributed Computing. Berlin, Heidelberg: Springer-Verlag. pp. 350–364.
[14] Celis, Pedro (1986). Robin Hood hashing (Technical report). Computer Science Department, University of Waterloo. CS-86-14.
[15] Goossaert, Emmanuel (2013). "Robin Hood hashing".
[16] Amble, Ole; Knuth, Don (1974). "Ordered hash tables". Computer Journal 17 (2): 135. doi:10.1093/comjnl/17.2.135.
[17] Viola, Alfredo (October 2005). "Exact distribution of individual displacements in linear probing hashing". Transactions on Algorithms (TALG) (ACM) 1 (2): 214–242. doi:10.1145/1103963.1103965.
[18] Celis, Pedro (March 1988). External Robin Hood Hashing (Technical report). Computer Science Department, Indiana University. TR246.
[19] http://www.eecs.harvard.edu/~michaelm/postscripts/handbook2001.pdf
[20] Litwin, Witold (1980). "Linear hashing: A new tool for file and table addressing". Proc. 6th Conference on Very Large Databases. pp. 212–223.
[21] Doug Dunham. CS 4521 Lecture Notes. University of Minnesota Duluth. Theorems 11.2, 11.6. Last modified April 21, 2009.
[22] Alexander Klink and Julian Wälde's Efficient Denial of Service Attacks on Web Application Platforms, December 28, 2011, 28th Chaos Communication Congress, Berlin, Germany.
[23] Crosby and Wallach's Denial of Service via Algorithmic Complexity Attacks.
[24] http://stackoverflow.com/questions/513882/python-list-vs-dict-for-look-up-table
[25] Mehta, Dinesh P.; Sahni, Sartaj. Handbook of Data Structures and Applications. pp. 9–15. ISBN 1-58488-435-5.

3.3.12 Further reading

• Tamassia, Roberto; Goodrich, Michael T. (2006). "Chapter Nine: Maps and Dictionaries". Data Structures and Algorithms in Java: [updated for Java 5.0] (4th ed.). Hoboken, NJ: Wiley. pp. 369–418. ISBN 0-471-73884-0.
• McKenzie, B. J.; Harries, R.; Bell, T. (Feb 1990). "Selecting a hashing algorithm". Software Practice & Experience 20 (2): 209–224. doi:10.1002/spe.4380200207.

3.3.13 External links

• A Hash Function for Hash Table Lookup by Bob Jenkins.
• Hash Tables by SparkNotes: explanation using C
• Hash functions by Paul Hsieh
• Design of Compact and Efficient Hash Tables for Java (link not working)
• Libhashish hash library
• NIST entry on hash tables
• Open addressing hash table removal algorithm from ICI programming language, ici_set_unassign in set.c (and other occurrences, with permission).
• A basic explanation of how the hash table works by Reliable Software
• Lecture on Hash Tables
• Hash-tables in C: two simple and clear examples of hash tables implementation in C with linear probing and chaining
• Open Data Structures, Chapter 5, Hash Tables
• MIT's Introduction to Algorithms: Hashing 1, MIT OCW lecture video
• MIT's Introduction to Algorithms: Hashing 2, MIT OCW lecture video
• How to sort a HashMap (Java) and keep the duplicate entries
• How python dictionary works


3.4 Linear probing

strictly less than one.[2] This analysis makes the (unrealistic) assumption that the hash function is completely
Linear probing is a scheme in computer programming random, but can be extended also to 5-independent hash
[3]
for resolving hash collisions of values of hash functions by functions. Weaker properties, such as universal hashsequentially searching the hash table for a free location.[1] ing, are not strong enough to ensure the constant-time operation of linear probing,[4] but one practical method of
hash function generation, tabulation hashing, again leads
to a guaranteed constant expected time performance de3.4.1 Algorithm
spite not being 5-independent.[5]
Linear probing is accomplished using two values: one as a starting value and one as an interval between successive values in modular arithmetic. The second value, which is the same for all keys and known as the stepsize, is repeatedly added to the starting value until a free space is found, or the entire table is traversed. (In order to traverse the entire table the stepsize should be relatively prime to the arraysize, which is why the array size is often chosen to be a prime number.)

newLocation = (startingValue + stepSize) % arraySize

Given an ordinary hash function H(x), a linear probing function H(x, i) would be:

H(x, i) = (H(x) + i) mod n

Here H(x) is the starting value, n the size of the hash table, and i the probe number, i.e., the offset added on the i-th probe.

Often, the step size is one; that is, the array cells that are probed are consecutive in the hash table. Double hashing is a variant of the same method in which the step size is itself computed by a hash function.
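A compact Python sketch of insertion and search with stepsize 1, following the newLocation formula above; the names are illustrative:

def lp_insert(table, key, value):
    # table is a fixed-size list holding None or (key, value) pairs.
    n = len(table)
    i = hash(key) % n                  # startingValue
    for _ in range(n):                 # traverse at most the entire table
        if table[i] is None or table[i][0] == key:
            table[i] = (key, value)    # free cell found, or key overwritten
            return
        i = (i + 1) % n                # newLocation with stepSize = 1
    raise RuntimeError("hash table is full")

def lp_search(table, key):
    n = len(table)
    i = hash(key) % n
    for _ in range(n):
        if table[i] is None:           # empty cell: the key is not present
            return None
        if table[i][0] == key:
            return table[i][1]
        i = (i + 1) % n
    return None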

3.4.2 Properties

This algorithm, which is used in open-addressed hash tables, provides good memory caching (if the stepsize is equal to one), through good locality of reference, but also results in clustering, an unfortunately high probability that where there has been one collision there will be more. The performance of linear probing is also more sensitive to input distribution when compared to double hashing, where the stepsize is determined by another hash function applied to the value instead of a fixed stepsize as in linear probing.

3.4.3 Dictionary operation in constant time

Using linear probing, dictionary operations can be implemented in constant time. In other words, insert, remove and search operations can be implemented in O(1), as long as the load factor of the hash table is a constant strictly less than one.[2] This analysis makes the (unrealistic) assumption that the hash function is completely random, but can be extended also to 5-independent hash functions.[3] Weaker properties, such as universal hashing, are not strong enough to ensure the constant-time operation of linear probing,[4] but one practical method of hash function generation, tabulation hashing, again leads to a guaranteed constant expected time performance despite not being 5-independent.[5]

3.4.4 See also

• Quadratic probing
• Collision resolution

3.4.5 References

[1] Dale, Nell (2003). C++ Plus Data Structures. Sudbury, MA: Jones and Bartlett Computer Science. ISBN 0-7637-0481-4.
[2] Knuth, Donald (1963), Notes on "Open" Addressing
[3] Pagh, Anna; Pagh, Rasmus; Ružić, Milan (2009), "Linear probing with constant independence", SIAM Journal on Computing 39 (3): 1107–1120, doi:10.1137/070702278, MR 2538852
[4] Pătrașcu, Mihai; Thorup, Mikkel (2010), "On the k-independence required by linear probing and minwise independence", Automata, Languages and Programming, 37th International Colloquium, ICALP 2010, Bordeaux, France, July 6–10, 2010, Proceedings, Part I, Lecture Notes in Computer Science 6198, Springer, pp. 715–726, doi:10.1007/978-3-642-14165-2_60
[5] Pătrașcu, Mihai; Thorup, Mikkel (2011), "The power of simple tabulation hashing", Proceedings of the 43rd Annual ACM Symposium on Theory of Computing (STOC '11), pp. 1–10, arXiv:1011.5200, doi:10.1145/1993636.1993638

3.4.6 External links

• How Caching Affects Hashing by Gregory L. Heileman and Wenbin Luo 2005.
• Open Data Structures, Section 5.2, LinearHashTable: Linear Probing

Chapter 4

Text and image sources, contributors, and


licenses
4.1 Text
Abstract data type Source: http://en.wikipedia.org/wiki/Abstract%20data%20type?oldid=644974826 Contributors: SolKarma, Merphant, Ark, B4hand, Michael Hardy, Wapcaplet, Skysmith, Haakon, Silvonen, Populus, Wernher, W7cook, Aqualung, BenRG, Noldoaran,
Tobias Bergemann, Giftlite, WiseWoman, Jonathan.mark.lingard, Jorge Stol, Daniel Brockman, Knutux, Dunks58, Andreas Kaufmann,
Corti, Mike Rosoft, Rich Farmbrough, Wrp103, Pink18, RJHall, Leif, Spoon!, R. S. Shaw, Alansohn, Diego Moya, Mr Adequate,
Kendrick Hang, Japanese Searobin, Miaow Miaow, Ruud Koot, Marudubshinki, Graham87, Qwertyus, Kbdank71, MZMcBride, Everton137, Chobot, YurikBot, Wavelength, SAE1962, Fang Aili, Sean Whitton, Petri Krohn, DGaw, TuukkaH, SmackBot, Brick Thrower,
Jpvinall, Chris the speller, SchftyThree, Cybercobra, Dreadstar, A5b, Breno, Antonielly, MTSbot, Phuzion, Only2sea, Blaisorblade,
Gnfnrf, Thijs!bot, Sagaciousuk, Ideogram, Widefox, JAnDbot, Magioladitis, David Eppstein, Zacchiro, Felipe1982, Javawizard, SpallettaAC1041, AntiSpamBot, Cobi, Funandtrvl, Lights, Sector7agent, Anna Lincoln, Don4of4, Kbrose, Arjun024, Yerpo, Svick, Fishnet37222, Denisarona, ClueBot, The Thing That Should Not Be, Garyzx, Adrianwn, Mild Bill Hiccup, PeterV1510, Boing! said Zebedee,
Armin Rigo, Cacadril, BOTarate, Thehelpfulone, Aitias, Appicharlask, Baudway, Addbot, Ghettoblaster, Capouch, Daniel.Burckhardt,
Chamal N, Debresser, Bluebusy, Jarble, Yobot, Legobot II, Pcap, Vanished user rt41as76lk, ArthurBot, Nhey24, RibotBOT, FrescoBot,
Mark Renier, Chevymontecarlo, The Arbiter, RedBot, Tanayseven, Reconsider the static, Babayagagypsies, Dismantle101, Liztanp, Efphf,
Dinamik-bot, Thecheesykid, Ebrambot, Demonkoryu, ChuispastonBot, Double Dribble, Rocketrod1960, Hoorayforturtles, Frietjes, Widr,
BG19bot, Ameyenn, ChrisGualtieri, GoShow, Hower64, JYBot, Epicgenius, Cpt Wise and Anonymous: 168
Data structure Source: http://en.wikipedia.org/wiki/Data%20structure?oldid=643795357 Contributors: LC, Ap, -- April, Andre Engels,
Karl E. V. Palmen, XJaM, Arvindn, Ghyll, Michael Hardy, TakuyaMurata, Minesweeper, Ahoerstemeier, Nanshu, Kingturtle, Glenn,
UserGoogol, Jiang, Edaelon, Nikola Smolenski, Dcoetzee, Chris Lundberg, Populus, Traroth, Mrje, Robbot, Noldoaran, Craig Stuntz,
Altenmann, Babbage, Mushroom, Seth Ilys, GreatWhiteNortherner, Tobias Bergemann, Giftlite, DavidCary, Esap, Jorge Stol, Siroxo,
Pgan002, Kjetil r, Lancekt, Jacob grace, Pale blue dot, Andreas Kaufmann, Corti, Wrp103, MisterSheik, Lycurgus, Shanes, Viriditas, Vortexrealm, Obradovic Goran, Helix84, Mdd, Jumbuck, Alansohn, Liao, Tablizer, Yamla, PaePae, ReyBrujo, Derbeth, Forderud, Mahanga,
Bushytails, Mindmatrix, Carlette, Ruud Koot, Easyas12c, TreveX, Bluemoose, Abd, Palica, Mandarax, Yoric, Qwertyus, Koavf, Ligulem,
GeorgeBills, Margosbot, Fragglet, RexNL, Fresheneesz, Butros, Chobot, Tas50, Banaticus, YurikBot, RobotE, Hairy Dude, Piet Delport,
Mipadi, Grafen, Dmoss, Tony1, Googl, Ripper234, Closedmouth, Vicarious, JLaTondre, GrinBot, TuukkaH, SmackBot, Reedy, DCDuring, Thunderboltz, BurntSky, Gilliam, Ohnoitsjamie, EncMstr, MalafayaBot, Nick Levine, Frap, Allan McInnes, Khukri, Ryan Roos,
Sethwoodworth, Er Komandante, SashatoBot, Demicx, Soumyasch, Antonielly, SpyMagician, Loadmaster, Noah Salzman, Mr Stephen,
Alhoori, Sharcho, Caiaa, Iridescent, CRGreathouse, Ahy1, FinalMinuet, Requestion, Nnp, Peterdjones, GPhilip, Pascal.Tesson, Qwyrxian, MTA, Thadius856, AntiVandalBot, Widefox, Seaphoto, Jirka6, Dougher, Tom 99, Lanov, MER-C, Wikilolo, Wmbolle, Rhwawn,
Nyq, Wwmbes, David Eppstein, User A1, Cpl Syx, Oicumayberight, Gwern, MasterRadius, Rettetast, Lithui, Sanjay742, Rrwright,
Marcin Suwalczan, Jimmytharpe, TXiKiBoT, Eve Hall, Vipinhari, Coldre82, DragonLord, Rozmichelle, Falcon8765, Spinningspark, Spitre8520, Haiviet, SieBot, Caltas, Eurooppa, Ham Pastrami, Jerryobject, Strife911, Ramkumaran7, Nskillen, DancingPhilosopher, Digisus,
Tanvir Ahmmed, ClueBot, Spellsinger180, Justin W Smith, The Thing That Should Not Be, Rodhullandemu, Sundar sando, Garyzx, Adrianwn, Abhishek.kumar.ak, Excirial, Alexbot, Erebus Morgaine, Arjayay, Morel, DumZiBoT, XLinkBot, Paushali, Pgallert, Galzigler,
Alexius08, MystBot, Dsimic, Jncraton, MrOllie, EconoPhysicist, Publichealthguru, Tide rolls, , Teles, , Legobot, Luckas-bot,
Yobot, Fraggle81, SteelPangolin, Jim1138, Kingpin13, Materialscientist, ArthurBot, Xqbot, Pur3r4ngelw, Miym, DAndC, RibotBOT,
Shadowjams, Methcub, Prari, FrescoBot, Liridon, Mark Renier, Hypersonic12, Rameshngbot, MertyWiki, Thompsonb24, Profvalente,
FoxBot, Laureniu Dasclu, Lotje, Bharatshettybarkur, Tbhotch, Thinktdub, Kh31311, Vineetzone, Uriah123, DRAGON BOOSTER,
EmausBot, Apoctyliptic, Thecheesykid, ZroBot, MithrandirAgain, EdEColbert, IGeMiNix, Mentibot, BioPupil, Chandraguptamaurya,
Rocketrod1960, Raveendra Lakpriya, ClueBot NG, Aks1521, Widr, Danim, Jorgenev, Orzechowskid, Gmharhar, HMSSolent, Wbm1058,
Walk&check, Panchobook, Richfaber, SoniyaR, Yashykt, Cncmaster, Sallupandit, , Sgord512, Anderson, Vishnu0919,
Varma rockzz, Frosty, Hernan mvs, Forgot to put name, I am One of Many, Bereajan, Gauravxpress, Haeynzen, Gambhir.jagmeet, Richard
Yin, Jrachiele, Guturu Bhuvanamitra, TranquilHope, Iliazm and Anonymous: 346
Analysis of algorithms Source: http://en.wikipedia.org/wiki/Analysis%20of%20algorithms?oldid=635159205 Contributors: Bryan Derksen, Seb, Arvindn, Hfastedge, Edward, Nixdorf, Kku, McKay, Pakaran, Murray Langton, Altenmann, MathMartin, Bkell, Tobias Bergemann, David Gerard, Giftlite, Jao, Brona, Manuel Anastcio, Beland, Andreas Kaufmann, Liberlogos, Mani1, Bender235, Ashewmaker,
MCiura, Gary, Terrycojones, Pinar, Ruud Koot, Ilya, Miserlou, DVdm, YurikBot, PrologFan, SmackBot, Vald, Nbarth, Mhym, GRuban,
Radagast83, Cybercobra, Kendrick7, Spiritia, Lee Carre, Amakuru, CRGreathouse, ShelfSkewed, Hermel, Magioladitis, VoABot II, David

37

38

CHAPTER 4. TEXT AND IMAGE SOURCES, CONTRIBUTORS, AND LICENSES

Eppstein, User A1, Maju wiki, 2help, Cometstyles, The Wilschon, BotKung, Groupthink, Keilana, Xe7al, Ykhwong, Alastair Carnegie,
Ivan Akira, Roux, Jarble, Legobot, Yobot, Fraggle81, Pcap, GateKeeper, AnomieBOT, Materialscientist, Miym, Charvest, Fortdj33,
124Nick, RobinK, WillNess, RjwilmsiBot, Uploader4u, Jmencisom, Wikipelli, The Nut, Tirab, Tijfo098, ClueBot NG, Ifarzana, Satellizer,
Tvguide1234, Helpful Pixie Bot, Intr199, Manuel.mas12, Liam987, AlexanderZoul, Jochen Burghardt, Cubism44, Vieque and Anonymous:
74
Array data type Source: http://en.wikipedia.org/wiki/Array%20data%20type?oldid=634958546 Contributors: Edward, Michael Hardy,
Jorge Stol, Beland, D6, Spoon!, Bgwhite, RussBot, KGasso, SmackBot, Canthusus, Nbarth, Cybercobra, Lambiam, Korval, Hvn0413,
Mike Fikes, JohnCaron, Hroulf, Vigyani, Cerberus0, Kbrose, ClueBot, Mxaza, Garyzx, Staticshakedown, Addbot, Yobot, Denispir, Pcap,
AnomieBOT, Jim1138, Praba230890, Nhantdn, Termininja, Akhilan, Thecheesykid, Cgtdk, ClueBot NG, Mariuskempe, Helpful Pixie
Bot, Airatique, Pratyya Ghosh, Soni, Yamaha5 and Anonymous: 39
Array data structure Source: http://en.wikipedia.org/wiki/Array%20data%20structure?oldid=644965427 Contributors: The Anome, Ed
Poor, Andre Engels, Tsja, B4hand, Patrick, RTC, Michael Hardy, Norm, Nixdorf, Graue, TakuyaMurata, Alo, Ellywa, Julesd, Cgs, Poor
Yorick, Rossami, Dcoetzee, Dysprosia, Jogloran, Wernher, Fvw, Sewing, Robbot, Josh Cherry, Fredrik, Lowellian, Wikibot, Jleedev,
Giftlite, DavidCary, Massysett, BenFrantzDale, Lardarse, Ssd, Jorge Stol, Macrakis, Jonathan Grynspan, Lockeownzj00, Beland, Vanished user 1234567890, Karol Langner, Icairns, Simoneau, Jh51681, Andreas Kaufmann, Mattb90, Corti, Jkl, Rich Farmbrough, Guanabot,
Qutezuce, ESkog, ZeroOne, Danakil, MisterSheik, G worroll, Spoon!, Army1987, Func, Rbj, Mdd, Jumbuck, Mr Adequate, Atanamir,
Krischik, Tauwasser, ReyBrujo, Suruena, Rgrig, Forderud, Beliavsky, Beej71, Mindmatrix, Jimbryho, Ruud Koot, Je3000, Grika, Palica,
Gerbrant, Graham87, Kbdank71, Zzedar, Ketiltrout, Bill37212, Ligulem, Yamamoto Ichiro, Mike Van Emmerik, Quuxplusone, Intgr,
Visor, Sharkface217, Bgwhite, YurikBot, Wavelength, Borgx, RobotE, RussBot, Fabartus, Splash, Piet Delport, Stephenb, Pseudomonas,
Kimchi.sg, Dmason, JulesH, Mikeblas, Bota47, JLaTondre, Hide&Reason, Heavyrain2408, SmackBot, Princeatapi, Blue520, Trojo, Brick
Thrower, Alksub, Apers0n, Betacommand, GwydionM, Anwar saadat, Keegan, Timneu22, Mcaruso, Tsca.bot, Tamfang, Berland, Cybercobra, Mwtoews, Masterdriverz, Kukini, Smremde, SashatoBot, Derek farn, John, 16@r, Slakr, Beetstra, Dreftymac, Courcelles,
George100, Ahy1, Engelec, Wws, Neelix, Simeon, Kaldosh, Travelbird, Mrstonky, Skittleys, Strangelv, Christian75, Narayanese, Epbr123,
Sagaciousuk, Trevyn, Escarbot, Thadius856, AntiVandalBot, AbstractClass, JAnDbot, JaK81600, Cameltrader, PhiLho, SiobhanHansa,
Magioladitis, VoABot II, Ling.Nut, DAGwyn, David Eppstein, User A1, Squidonius, Gwern, Highegg, Themania, Patar knight, J.delanoy,
Slogsweep, Darkspots, Jayden54, Mfb52, Funandtrvl, VolkovBot, TXiKiBoT, Anonymous Dissident, Don4of4, Amog, Redacteur, Nicvaroce, Kbrose, SieBot, Caltas, Garde, Tiptoety, Paolo.dL, Oxymoron83, Svick, Anchor Link Bot, Jlmerrill, ClueBot, LAX, Jackollie,
The Thing That Should Not Be, Alksentrs, Rilak, Supertouch, R000t, Liempt, Excirial, Immortal Wowbagger, Bargomm, Thingg, Footballfan190, Johnuniq, SoxBot III, Awbell, Chris glenne, Staticshakedown, SilvonenBot, Henrry513414, Dsimic, Gaydudes, Btx40, EconoPhysicist, SamatBot, Zorrobot, Legobot, Luckas-bot, Yobot, Ptbotgourou, Fraggle81, Peter Flass, Tempodivalse, Obersachsebot, Xqbot,
SPTWriter, FrescoBot, Citation bot 2, I dream of horses, HRoestBot, MastiBot, Jandalhandler, Laurențiu Dascălu, Dinamik-bot, TylerWilliamRoss, Merlinsorca, Jfmantis, The Utahraptor, EmausBot, Mfaheem007, Donner60, Ipsign, Ieee andy, EdoBot, Muzadded, Mikhail
Ryazanov, ClueBot NG, Widr, Helpful Pixie Bot, Roger Wellington-Oguri, Wbm1058, 111008066it, Mark Arsten, Simba2331, Insidiae,
ChrisGualtieri, A'bad group, Makecat-bot, Chetan chopade, Ginsuloft and Anonymous: 238
Dynamic array Source: http://en.wikipedia.org/wiki/Dynamic%20array?oldid=635160619 Contributors: Damian Yerrick, Edward,
Ixfd64, Phoe6, Dcoetzee, Furrykef, Wdscxsj, Jorge Stolfi, Karol Langner, Andreas Kaufmann, Moxfyre, Dpm64, Wrp103, Forbsey, ZeroOne, MisterSheik, Spoon!, Ryk, Fresheneesz, Wavelength, SmackBot, Rnin, Bluebot, Octahedron80, Nbarth, Cybercobra, Decltype,
MegaHasher, Beetstra, Green caterpillar, Icep, Wikilolo, David Eppstein, Gwern, Cobi, Spinningspark, Arbor to SJ, Ctxppc, ClueBot, Simonykill, Garyzx, Alex.vatchenko, Addbot, AndersBot, , Aekton, Didz93, Tartarus, Luckas-bot, Yobot, , Rubinbot, Citation bot,
SPTWriter, FrescoBot, Mutinus, Patmorin, WillNess, EmausBot, Card Zero, François Robere and Anonymous: 38
Associative array Source: http://en.wikipedia.org/wiki/Associative%20array?oldid=644061854 Contributors: Damian Yerrick, Robert
Merkel, Fubar Obfusco, Maury Markowitz, Hirzel, B4hand, Paul Ebermann, Edward, Patrick, Michael Hardy, Shellreef, Graue,
Minesweeper, Brianiac, Samuelsen, Bart Massey, Hashar, Dcoetzee, Dysprosia, Silvonen, Bevo, Robbot, Noldoaran, Fredrik, Altenmann,
Wlievens, Catbar, Wikibot, Ruakh, EvanED, Jleedev, Tobias Bergemann, Ancheta Wis, Jpo, DavidCary, Mintleaf, Inter, Wolfkeeper,
Jorge Stolfi, Macrakis, Pne, Neilc, Kusunose, Karol Langner, Bosmon, Int19h, Andreas Kaufmann, RevRagnarok, Ericamick, LeeHunter,
PP Jewel, Kwamikagami, James b crocker, Spoon!, Bobo192, TommyG, Minghong, Alansohn, Mt, Krischik, Sligocki, Kdau, Tony Sidaway, RainbowOfLight, Forderud, TShilo12, Boothy443, Mindmatrix, RzR, Apokrif, Kglavin, Bluemoose, ObsidianOrder, Pfunk42,
Yurik, Swmcd, Scandum, Koavf, Agorf, Jeff02, RexNL, Alvin-cs, Wavelength, Fdb, Maerk, Dggoldst, Cedar101, JLaTondre, Ffangs,
TuukkaH, SmackBot, KnowledgeOfSelf, MeiStone, Mirzabah, TheDoctor10, Sam Pointon, Brianski, Hugo-cs, Jdh30, Zven, Cfallin,
CheesyPuffs144, Malbrain, Nick Levine, Vegard, Radagast83, Cybercobra, Decltype, Paddy3118, AvramYU, Doug Bell, AmiDaniel, Antonielly, EdC, Tobe2199, Hans Bauer, Dreftymac, Pimlottc, George100, JForget, Jokes Free4Me, Pgr94, MrSteve, Countchoc, Ajo Mama,
WinBot, Oddity-, Alphachimpbot, Maslin, JonathanCross, Pfast, PhiLho, Wmbolle, Magioladitis, David Eppstein, Gwern, Doc aberdeen,
Signalhead, VolkovBot, Chaos5023, Kyle the bot, TXiKiBoT, Anna Lincoln, BotKung, Comet--berkeley, Jesdisciple, PanagosTheOther,
Nemo20000, Jerryobject, CultureDrone, Anchor Link Bot, ClueBot, Irishjugg, XLinkBot, Orbnauticus, Frostus, Dsimic, Deineka, Addbot,
Debresser, Jarble, Bartledan, Davidwhite544, Margin1522, Legobot, Luckas-bot, Yobot, TaBOT-zerem, Pcap, Peter Flass, RibotBOT, January2009, Sae1962, Efadae, Neil Schipper, Floatingdecimal, Tushar858, EmausBot, WikitanvirBot, Marcos canbeiro, AvicBot, ClueBot
NG, JannuBl22t, Helpful Pixie Bot, Mithrasgregoriae, JYBot, Dcsaba70, Alonsoguillenv and Anonymous: 187
Association list Source: http://en.wikipedia.org/wiki/Association%20list?oldid=547284302 Contributors: SJK, Dcoetzee, Dremora, Tony
Sidaway, Pmcjones, Yobot, Helpful Pixie Bot and Anonymous: 2
Hash table Source: http://en.wikipedia.org/wiki/Hash%20table?oldid=645193336 Contributors: Damian Yerrick, AxelBoldt, Zundark,
The Anome, BlckKnght, Sandos, Rgamble, LapoLuchini, AdamRetchless, Imran, Mrwojo, Frecklefoot, Michael Hardy, Nixdorf, Pnm,
Axlrosen, TakuyaMurata, Ahoerstemeier, Nanshu, Dcoetzee, Dysprosia, Furrykef, Omegatron, Wernher, Bevo, Tjdw, Pakaran, Secretlondon, Robbot, Fredrik, Tomchiukc, R3m0t, Altenmann, Ashwin, UtherSRG, Miles, Giftlite, DavidCary, Wolfkeeper, BenFrantzDale,
Everyking, Waltpohl, Jorge Stolfi, Wmahan, Neilc, Pgan002, CryptoDerk, Knutux, Bug, Sonjaaa, Teacup, Beland, Watcher, DNewhall,
ReiniUrban, Sam Hocevar, Derek Parnell, Askewchan, Kogorman, Andreas Kaufmann, Kaustuv, Shuchung, T Long, Hydrox, Cfailde,
Luqui, Wrp103, Antaeus Feldspar, Khalid, Raph Levien, JustinWick, CanisRufus, Shanes, Iron Wallaby, Krakhan, Bobo192, Davidgothberg, Larry V, Sleske, Helix84, Mdd, Varuna, Baka toroi, Anthony Appleyard, Sligocki, Drbreznjev, DSatz, Akuchling, TShilo12, Nuno
Tavares, Woohookitty, LOL, Linguica, Paul Mackay, Davidfstr, GregorB, Meneth, Kbdank71, Tostie14, Rjwilmsi, Scandum, Koavf, Kinu,
Filu, Nneonneo, FlaBot, Ecb29, Fragglet, Intgr, Fresheneesz, Antaeus FeIdspar, YurikBot, Wavelength, RobotE, Mongol, RussBot, Me
and, CesarB's unpriviledged account, Lavenderbunny, Gustavb, Mipadi, Cryptoid, Mike.aizatsky, Gareth Jones, Piccolomomo, CecilWard,
Nethgirb, Gadget850, Bota47, Sebleblanc, Deeday-UK, Sycomonkey, Ninly, Gulliveig, Th1rt3en, CWenger, JLaTondre, ASchmoo, Kungfuadam, SmackBot, Apanag, Obakeneko, PizzaMargherita, Alksub, Eskimbot, RobotJcb, C4chandu, Arpitm, Neurodivergent, EncMstr,
Cribe, Deshraj, Tackline, Frap, Mayrel, Radagast83, Cybercobra, Decltype, HFuruseth, Rich.lewis, Esb, Acdx, MegaHasher, Doug Bell,
Derek farn, IronGargoyle, Josephsieh, Peter Horn, Pagh, Saxton, Tawkerbot2, Ouishoebean, CRGreathouse, Ahy1, MaxEnt, Seizethedave, Cgma, Not-just-yeti, Thermon, OtterSmith, Ajo Mama, Stannered, AntiVandalBot, Hosamaly, Thailyn, Pixor, JAnDbot, MER-C,
Epeefleche, Dmbstudio, SiobhanHansa, Wikilolo, Bongwarrior, QrczakMK, Josephskeller, Tedickey, Schwarzbichler, Cic, Allstarecho,
David Eppstein, Oravec, Gwern, Magnus Bakken, Glrx, Narendrak, Tikiwont, Mike.lifeguard, Luxem, NewEnglandYankee, Cobi, Cometstyles, Winecellar, VolkovBot, Simulationelson, Floodyberry, Anurmi, BotKung, Collin Stocks, JimJJewett, Nightkhaos, Spinningspark,
Abatishchev, Helios2k6, Kehrbykid, Kbrose, PeterCanthropus, Gerakibot, Triwbe, Digwuren, Svick, JL-Bot, ObfuscatePenguin, ClueBot,
Justin W Smith, Adrianwn, Mild Bill Hiccup, Niceguyedc, JJuran, Groxx, Berean Hunter, Eddof13, Johnuniq, Arlolra, XLinkBot, Hetori,
Pichpich, Paulsheer, TheTraveler3, MystBot, Karuthedam, Wolkykim, Addbot, Gremel123, CanadianLinuxUser, MrOllie, Numbo3-bot,
Om Sao, Zorrobot, Jarble, Frehley, Legobot, Luckas-bot, Yobot, Denispir, KamikazeBot, Dmcomer, AnomieBOT, Erel Segal, Jim1138,
Sz-iwbot, Citation bot, ArthurBot, Baliame, Drilnoth, Arbalest Mike, Ched, Shadowjams, Kracekumar, FrescoBot, W Nowicki, X7q,
Sae1962, Citation bot 1, Velociostrich, Simonsarris, Iekpo, Trappist the monk, SchreyP, Grapesoda22, Patmorin, Cutelyaware, JeepdaySock, Shagoldwasser, Kastchei, DuineSidhe, EmausBot, Super48paul, Ibbn, DanielWaterworth, Mousehousemd, ZéroBot, Purplie,
Ticklemepink42, Paul Kube, Demonkoryu, Donner60, Carmichael, Pheartheceal, Aberdeen01, Teapeat, Rememberway, ClueBot NG, AznBurger, Incompetence, Rawafmail, Cntras, Rezabot, Jk2q3jrklse, Helpful Pixie Bot, Jan Spousta, MusikAnimal, SanAnMan, Pbruneau,
AdventurousSquirrel, Triston J. Taylor, CitationCleanerBot, Happyuk, FeralOink, Spacemanaki, Aloksukhwani, Emimull, Deveedutta,
Shmageggy, IgushevEdward, AlecTaylor, Mcom320, Razibot, Djszapi, QuantifiedElf, Chip Wildon Forster, Tmferrara, Tuketu7, Monkbot,
Iokevins, Oleaster and Anonymous: 423
Linear probing Source: http://en.wikipedia.org/wiki/Linear%20probing?oldid=636374192 Contributors: Ubiquity, Bearcat, Enochlau,
Andreas Kaufmann, Gazpacho, Discospinster, RJFJR, Linas, Tas50, CesarB's unpriviledged account, SpuriousQ, Chris the speller, JonHarder, MichaelBillington, Sbluen, Jeberle, Negrulio, Jngnyc, Alaibot, Thijs!bot, A3nm, David Eppstein, STBot, Themania, OliviaGuest,
C. A. Russell, Addbot, Tedzdog, Patmorin, Infinity ive, Dixtosa, Danmoberly and Anonymous: 15

4.2 Images
File:8bit-dynamiclist.gif Source: http://upload.wikimedia.org/wikipedia/commons/1/1d/8bit-dynamiclist.gif License: CC-BY-SA-3.0 Contributors: Own work Original artist: Seahen
File:Array_of_array_storage.svg Source: http://upload.wikimedia.org/wikipedia/commons/0/01/Array_of_array_storage.svg License: Public domain Contributors: ? Original artist: ?
File:Commons-logo.svg Source: http://upload.wikimedia.org/wikipedia/en/4/4a/Commons-logo.svg License: ? Contributors: ? Original artist: ?
File:Dynamic_array.svg Source: http://upload.wikimedia.org/wikipedia/commons/3/31/Dynamic_array.svg License: CC0 Contributors: Own work Original artist: Dcoetzee
File:Hash_table_3_1_1_0_1_0_0_SP.svg Source: http://upload.wikimedia.org/wikipedia/commons/7/7d/Hash_table_3_1_1_0_1_0_0_SP.svg License: CC BY-SA 3.0 Contributors: Own work Original artist: Jorge Stolfi
File:Hash_table_5_0_1_1_1_1_0_LL.svg Source: http://upload.wikimedia.org/wikipedia/commons/5/5a/Hash_table_5_0_1_1_1_1_0_LL.svg License: CC BY-SA 3.0 Contributors: Own work Original artist: Jorge Stolfi
File:Hash_table_5_0_1_1_1_1_0_SP.svg Source: http://upload.wikimedia.org/wikipedia/commons/b/bf/Hash_table_5_0_1_1_1_1_0_SP.svg License: CC BY-SA 3.0 Contributors: Own work Original artist: Jorge Stolfi
File:Hash_table_5_0_1_1_1_1_1_LL.svg Source: http://upload.wikimedia.org/wikipedia/commons/d/d0/Hash_table_5_0_1_1_1_1_1_LL.svg License: CC BY-SA 3.0 Contributors: Own work Original artist: Jorge Stolfi
File:Hash_table_average_insertion_time.png Source: http://upload.wikimedia.org/wikipedia/commons/1/1c/Hash_table_average_insertion_time.png License: Public domain Contributors: Author's Own Work. Original artist: Derrick Coetzee (User:Dcoetzee)
File:LampFlowchart.svg Source: http://upload.wikimedia.org/wikipedia/commons/9/91/LampFlowchart.svg License: CC-BY-SA-3.0 Contributors: vector version of Image:LampFlowchart.png Original artist: svg by Booyabazooka
File:Office-book.svg Source: http://upload.wikimedia.org/wikipedia/commons/a/a8/Office-book.svg License: Public domain Contributors: This and myself. Original artist: Chris Down/Tango project
File:Question_book-new.svg Source: http://upload.wikimedia.org/wikipedia/en/9/99/Question_book-new.svg License: Cc-by-sa-3.0 Contributors: Created from scratch in Adobe Illustrator. Based on Image:Question book.png created by User:Equazcion Original artist: Tkgd2007
File:Text_document_with_red_question_mark.svg Source: http://upload.wikimedia.org/wikipedia/commons/a/a4/Text_document_with_red_question_mark.svg License: Public domain Contributors: Created by bdesham with Inkscape; based upon Text-x-generic.svg from the Tango project. Original artist: Benjamin D. Esham (bdesham)
File:Wiki_letter_w_cropped.svg Source: http://upload.wikimedia.org/wikipedia/commons/1/1c/Wiki_letter_w_cropped.svg License: CC-BY-SA-3.0 Contributors: Wiki_letter_w.svg Original artist: Wiki_letter_w.svg: Jarkko Piiroinen
File:Wikibooks-logo-en-noslogan.svg Source: http://upload.wikimedia.org/wikipedia/commons/d/df/Wikibooks-logo-en-noslogan.svg License: CC BY-SA 3.0 Contributors: Own work Original artist: User:Bastique, User:Ramac et al.
File:Wikibooks-logo.svg Source: http://upload.wikimedia.org/wikipedia/commons/f/fa/Wikibooks-logo.svg License: CC BY-SA 3.0 Contributors: Own work Original artist: User:Bastique, User:Ramac et al.
File:Wikiquote-logo.svg Source: http://upload.wikimedia.org/wikipedia/commons/f/fa/Wikiquote-logo.svg License: Public domain Contributors: ? Original artist: ?
File:Wikisource-logo.svg Source: http://upload.wikimedia.org/wikipedia/commons/4/4c/Wikisource-logo.svg License: CC BY-SA 3.0 Contributors: Rei-artur Original artist: Nicholas Moreau
File:Wikiversity-logo-Snorky.svg Source: http://upload.wikimedia.org/wikipedia/commons/1/1b/Wikiversity-logo-en.svg License: CC BY-SA 3.0 Contributors: Own work Original artist: Snorky
File:Wiktionary-logo-en.svg Source: http://upload.wikimedia.org/wikipedia/commons/f/f8/Wiktionary-logo-en.svg License: Public domain Contributors: Vector version of Image:Wiktionary-logo-en.png. Original artist: Vectorized by Fvasconcellos (talk · contribs), based on original logo tossed together by Brion Vibber

4.3 Content license


Creative Commons Attribution-Share Alike 3.0
