Data Structures — Algorithm Design 1 documentation

Data Structures
A data structure represents a particular way of organizing data in computer memory together with a set of operations on this data. The
problem of designing efficient data structures is tightly linked with algorithm design. While some data structures are very specific
to particular algorithms, a number of data structures are so common that it is important to know them, and to
understand their pros and cons.

In this chapter we recall the most common data structures, which you should have seen in introductory programming
courses, and their runtime complexity for common operations. We will see that for a given set of operations, there are usually
several possible implementations, each with its own benefits and limitations.

Lists
A list is a very common data structure used to store values where each value is associated with an index between 0 and n − 1, where
n is the number of elements in the list (indices could of course start at 1 instead of 0; this is just a convention).

Common operations on lists are the following:

Operation Description

at(k) or [k] get the element at index k

previous(e) get the element before the given element e

next(e) get the element after the given element e

insert_front(ne) insert the new element ne at the beginning of the list

insert_back(ne) insert the new element ne at the end of the list

insert_after(e, ne) insert the new element ne after the given element e

remove_front() remove the element at the beginning of the list

remove_back() remove the element at the end of the list

remove(e) remove the given element e

find(e) find whether the given element e is in the list

extend(l) insert all the elements of the list l at the end of the current list

The most common ways to implement those operations are dynamic arrays and linked lists.

Note
In object-oriented programming languages, the List data structure is often represented by an interface or an abstract class.
Dynamic arrays and linked lists are then two different classes that implement this interface or inherit from this abstract
class. For example, in Java, List is the interface, ArrayList is a class implementing this interface as a dynamic array, and
LinkedList is a class implementing this interface as a linked list.

Dynamic array
The dynamic array is a very common implementation of lists which is readily available in the standard library of many programming
languages: ArrayList in Java, std::vector in C++, list in Python, List in C#, Array in JavaScript…
In this case, the list elements are simply stored in an array. The size of the underlying array is called the capacity; it is greater
than or equal to the number of elements n in the list, and the additional free space is used when new elements are added to the list.

Fig. 7 A dynamic array, with 3 elements (3, 5, 2) and a capacity of 6.

The declaration of a dynamic array will usually look like this:

class List:
    def __init__(self):
        self.array = [None]  # underlying array; the current capacity is the length of this array
        self.size = 0        # number of elements in the list

Elements in a dynamic array can be accessed in constant time O(1) ( at , previous , next ). Adding or removing an element at the
front or somewhere in the middle of the array requires moving all the elements after the position of the insertion or deletion,
leading to a linear worst-case runtime complexity Θ(n).

Fig. 8 Inserting an element in the middle of a dynamic array requires moving all the elements after the position of insertion.

For example, the pseudocode for removing the front element can be:

class List:
    ...

    def remove_front(self):
        # shift all the elements one position to the left; the first element is discarded
        for i in range(1, self.size):
            self.array[i - 1] = self.array[i]
        self.size = self.size - 1

Because the elements of a dynamic array are packed at the beginning, removing and adding elements at the end of the array is
simpler. However, a difficulty arises when the size of the list reaches its capacity: inserting a new element then requires
increasing the size of the underlying array. The pseudocode of the insert_back function can be:

class List:
    ...

    def insert_back(self, new_element):
        if self.size == len(self.array):  # underlying array is full
            self.increase_array_size()    # reallocate a new array: complexity O(self.size), defined below
        self.array[self.size] = new_element  # put the new element at the end
        self.size = self.size + 1            # increase the size

In practice, these two operations, insert_back and remove_back , are done in constant amortized time Θ(1). The term amortized
here means that over a sequence of k insertions at the back of the array, the average complexity per insertion is constant, but a
single insertion might indeed be worse (linear in this case).

Resizing dynamic arrays


In practice, “resizing the underlying array” means that we must allocate a new array and copy the old one into the new one, so
we must copy the n elements of the array, which has a linear time complexity Θ(n). The question is then, when we
allocate a new array: what should its new capacity be? If the new capacity is too small, we might have to reallocate a new one
quickly if several new elements are inserted (and thus recopy all the elements). If the new capacity is too large, we will
waste memory.
A nice result shows that, if the capacity is multiplied by a constant k > 1 called the growth factor, then array reallocations
happen rarely enough that they become negligible on average. The increase_array_size function of the dynamic array class
could then be:

import math

class List:
    ...

    def increase_array_size(self):
        # allocate a new array; growth_factor is a constant > 1
        newCapacity = math.ceil(len(self.array) * growth_factor)
        newArray = [None] * newCapacity  # new array of size newCapacity
        # copy the old array into the new array
        for i in range(self.size):
            newArray[i] = self.array[i]
        # replace the old array by the new one
        self.array = newArray

Let’s consider a simple example, where we start with an empty list of capacity 1 and a growth factor of 2. Assume that we
have a sequence of 2^p insertions, with p > 1. Reallocations occur when the capacity is equal to 1, 2, 4, 8, 16, …, 2^(p−1), and at
each reallocation we have to copy all the elements of the array (so we have to copy 1, 2, 4, 8, 16 elements…). Thus, to insert
our 2^p elements, the total number of element copies will be:

1 + 2 + 4 + 8 + … + 2^(p−1) = 2^p − 1.

As we have made 2^p insertions, on average, we made

(2^p − 1) / 2^p = 1 − 1/2^p = O(1)

copies per insertion. Therefore, on average, the number of copies per insertion is constant. Some insertions still take a linear time Θ(n), but
those costly insertions are rare and their cost is amortized by all the insertions that are done in constant time Θ(1). This result
can be generalized to any sequence of insertions and removals.

The growth factor is a trade-off between memory and runtime: a large growth factor leads to fewer reallocations but
wastes more memory.

Finding an element in a dynamic array can be done in linear time Θ(n) with the linear search algorithm that was described in the
introductory example Algorithm complexity.

Merging two dynamic arrays ( extend ) is also done in linear time with respect to the sizes of the two lists.

Dynamic arrays are extremely common in practice because they are simple to implement and efficient for most operations. It is
thus essential to be comfortable with using them correctly in algorithms.

Exercise 1: Unique elements in a list


Assume we have a list of numbers A. The list A can contain duplicate elements, and the goal is to find a list B, denoted
unique(A) , containing all the unique elements of A.

For example: a possible solution of unique([4,3,22,3,11,4,-1,3,-1]) is [-1,3,4,11,22] . Note that the order of the elements in
unique(A) does not matter.

A simple strategy to do this is to sort the elements of A and then browse the elements of the sorted list: whenever an element
is different from the previous one, we have found a new unique element.
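
One possible sketch of this strategy, using Python’s built-in sorted function (the exact interface expected by the notebook may differ):

def unique(A):
    S = sorted(A)  # sort a copy of A so that duplicate elements become adjacent: Θ(n log(n))
    B = []
    for i in range(len(S)):
        # keep an element only if it differs from its predecessor in the sorted list
        if i == 0 or S[i] != S[i - 1]:
            B.append(S[i])
    return B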

Programming exercise: Implement unique in the python notebook

Give the worst-case runtime complexity of your algorithm.

Exercise 2: Intersection of two lists


Consider the following problem: given two lists of integers A and B, we want to compute the intersection of the two lists, i.e.
the list of integers that are present in both lists. The intersection list should not contain duplicates.

For example, if we have the two lists [3, 1, 5, 2, 4] and [2, 7, 8, 5, 3] , the intersection list, denoted
inter([3, 1, 5, 2, 4], [2, 7, 8, 5, 3]) , should be [3, 5, 2] . Note that the order of the elements in the intersection list is not important.

A simple strategy to do this is to sort the elements of A and B, and then to browse the elements of the two sorted arrays in
parallel (a sketch follows the list):

if the current elements in A and B are the same, we have found a new common element; if this element is different from
the last found element, we add it to the intersection list;
if the current element of A (resp. B) is smaller than the current element of B (resp. A), we move to the next element
in A (resp. B).
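
One possible sketch of this strategy, again using Python’s built-in sorted function (the exact interface expected by the notebook may differ):

def inter(A, B):
    SA, SB = sorted(A), sorted(B)
    result = []
    i = j = 0
    while i < len(SA) and j < len(SB):
        if SA[i] == SB[j]:
            # new common element: add it only if it was not already added
            if not result or result[-1] != SA[i]:
                result.append(SA[i])
            i += 1
            j += 1
        elif SA[i] < SB[j]:
            i += 1  # advance in A
        else:
            j += 1  # advance in B
    return result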

Programming exercise: Implement inter in the python notebook

Give the worst-case runtime complexity of your algorithm in terms of the size of the two lists n and m.

Linked list
Linked lists do not rely on a contiguous array to store elements; instead, they store each element in an individual node. In order to
keep track of all the nodes in the list, each node also stores a link (pointer or reference) toward the next node in the list, and
possibly toward the previous one. A linked list that only stores a link toward the next node is called a singly linked list; a list where each
node contains a link toward both the previous and the next node is called a doubly linked list.

Fig. 9 A singly and a doubly linked list with 3 elements.

The first node of a linked list is called the head and the last one the tail. We assume that it is possible to know the size of singly
and doubly linked lists in constant time Θ(1) (in practice, the linked list data structure stores an integer which is
incremented/decremented whenever an element is inserted/removed). Contrary to dynamic arrays, it is possible to insert an
element anywhere in a linked list in constant time Θ(1), because there is no need to reorganize the other elements of the list: only
a few links need to be updated. Removing an element can be done in constant time anywhere in a doubly linked list; however, in a
singly linked list, we need to know the previous node in order to update its reference to the next node: if this node is known, the
removal can be done in constant time, otherwise it takes linear time.

Fig. 10 Inserting an element in the middle of a linked list requires only local changes to the next (and, in the case of doubly linked lists, previous)
nodes.
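
To make the node structure concrete, here is a minimal sketch of a doubly linked list node together with a constant-time insert_after operation (the class and attribute names are illustrative, not imposed by the text):

class Node:
    def __init__(self, value):
        self.value = value
        self.prev = None  # link toward the previous node (absent in a singly linked list)
        self.next = None  # link toward the next node

def insert_after(node, new_node):
    # constant-time insertion in a doubly linked list: only a few links change
    new_node.prev = node
    new_node.next = node.next
    if node.next is not None:
        node.next.prev = new_node
    node.next = new_node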

However, with linked lists, we lose efficient random access: to get the k-th element of a linked list, one must browse the first k
nodes of the list from its head, leading to a linear time Θ(n) access. Accessing the next element is done in constant time by
following the link to the next node. With a doubly linked list, accessing the previous element is also done in constant time;
however, with a singly linked list, as we don’t have a link to the previous node, one has to browse the list from its head to find the
previous element, leading to a linear time operation.

Finding an element in a linked list can be done in linear time Θ(n) with the linear search algorithm that was described in the
introductory example Algorithm complexity.

Extending a linked list can be done in constant time by simply linking the tail of the first list to the head of the second list.

Summary
In summary, the runtime complexities of the operations for these three possible implementations of lists are:

list\operation      at    previous  next  insert/remove_front  insert/remove_back  insert_after  remove  find  extend

Dynamic array       Θ(1)  Θ(1)      Θ(1)  Θ(n)                 Θ(1)*               Θ(n)          Θ(n)    Θ(n)  Θ(m)*

Singly linked list  Θ(n)  Θ(n)      Θ(1)  Θ(1)                 Θ(n)                Θ(1)          Θ(n)    Θ(n)  Θ(1)

Doubly linked list  Θ(n)  Θ(1)      Θ(1)  Θ(1)                 Θ(1)                Θ(1)          Θ(1)    Θ(n)  Θ(1)

with n the size of the list (and m the size of the second list in extend ), where ∗ denotes amortized complexity.
Exercise 3: Choosing the right implementation
The following algorithms take a list as parameter:

def algorithm_1(list):
    for e in list:
        if list.previous(e) is defined:
            list.previous(e) = e

def algorithm_2(list):
    for e in list:
        if list.next(e) is defined:
            list.next(e) = e/2

def algorithm_3(list):
    for i in range(len(list) // 2):
        list[i] = list[list.size() - 1 - i]

def algorithm_4(list):
    for e in list:
        if e < 0:
            list.remove(e)

def algorithm_5(list):
    x = list[0]
    while x is defined:
        if x > 0:
            list.insert_front(x)
        x = list.next(x)

Give the worst case complexity of each of these algorithms, for each of the three list implementations described above:

Algorithm Dynamic array Singly linked list Doubly linked list

algorithm_1

algorithm_2

algorithm_3

algorithm_4

algorithm_5


Containers based on lists


Lists are very generic data structures with many allowed operations. In algorithms, we often prefer to use more specialized
containers, like stacks or queues, whose stronger semantics better convey the intent of the data structure.

Stacks
A stack, or LIFO (Last In First Out) data structure, is a container with only 2 operations:

push(ne) : insert a new element at the top of the stack


pop() : remove and return the element at the top of the stack

A third operation called top() , which returns the top element without removing it, is also often provided for efficiency (note that it
can be obtained by combining pop and push ). A stack can therefore only be accessed from one end.
Fig. 11 Representation of a stack: the elements can only be retrieved in the reverse order of the order in which they were placed on the
stack (image source: stack wikipedia ).
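
Since push and pop only touch one end, a stack is easily implemented on top of a dynamic array. A minimal sketch in Python, using a built-in list as the underlying dynamic array (the class name is illustrative):

class Stack:
    def __init__(self):
        self.elements = []  # built-in list used as the underlying dynamic array

    def push(self, new_element):
        self.elements.append(new_element)  # insert at the top: amortized Θ(1)

    def pop(self):
        return self.elements.pop()  # remove and return the top element: amortized Θ(1)

    def top(self):
        return self.elements[-1]  # return the top element without removing it: Θ(1)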

Exercise 4: Reverse Polish notation


Consider the following problem: you are given an expression in reverse Polish notation (RPN), also known as postfix
notation. In RPN, operators follow their operands: for example, 3 4 + instead of 3 + 4 . The goal is to evaluate the expression
and return the final value. You can assume that the expression is valid and contains only single-digit integers and the
operators + and * .

For example, given the expression 3 4 + 5 * , you should return 35 (since 3 + 4 = 7 and 7 * 5 = 35 ).

A simple algorithm can solve this problem in linear time Θ(n) with a stack. The idea is to browse the expression from left to
right (a sketch follows the list), then:

each time you encounter a number, push it on the stack;


each time you encounter an operator, pop the last two numbers from the stack, perform the operation on these two
numbers and push the result back on the stack;
at the end, pop and return the last value remaining on the stack.
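
One possible sketch of this algorithm (assuming, as in the statement, single-digit integers and the operators + and * , with tokens separated by spaces):

def reverse_polish_evaluation(expression):
    stack = []
    for token in expression.split():
        if token == '+' or token == '*':
            # pop the two operands, apply the operator, push the result back
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b if token == '+' else a * b)
        else:
            stack.append(int(token))  # a number: push it on the stack
    return stack.pop()  # the last remaining value is the result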

Programming exercise: Implement reverse_polish_evaluation in the python notebook

Exercise 5: Syntax error


Consider the following problem: we have as input an array of characters (a string) of size n, consisting of the characters ( , ) ,
[ and ] , and the goal is to determine whether its parentheses () and square brackets [] are “balanced” or not.

For example:

“[()[]]” is balanced
“[()” is not balanced (the first left square bracket has no corresponding right square bracket)
“[(])” is not balanced (although the numbers of left and right parentheses and square brackets match, they are interleaved)

A simple algorithm can solve this problem with a stack. The idea is the following: we scan the string from left to right, and
each time we encounter a left parenthesis or a left square bracket, we push it on the stack. Each time we encounter a right
parenthesis or a right square bracket, we pop the stack and check that the popped element is the corresponding left
parenthesis or left square bracket.
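
One possible sketch of this algorithm, using a built-in Python list as the stack:

def is_balanced(s):
    stack = []
    matching = {')': '(', ']': '['}
    for c in s:
        if c == '(' or c == '[':
            stack.append(c)  # push every opening symbol
        else:  # c is ')' or ']'
            # fail if there is nothing to pop, or if the popped symbol does not match
            if not stack or stack.pop() != matching[c]:
                return False
    return len(stack) == 0  # every opening symbol must have been closed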

Programming exercise: Implement the function is_balanced in the python notebook

Give the worst-case runtime complexity of your algorithm.

Queues
A queue, or FIFO (First In First Out) data structure, is a container with only 3 operations:

enqueue(ne) : insert a new element at the back of the queue


peek() : return the element at the front of the queue
dequeue() : remove the element at the front of the queue

Elements are thus inserted at one end of the queue and retrieved at the other end.
Fig. 12 Representation of a queue: the elements are retrieved in their order of insertion.
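
A queue can be implemented, for example, with a singly linked list that keeps a link to both its head and its tail, so that all three operations run in constant time Θ(1). A minimal sketch (the class and attribute names are illustrative):

class Queue:
    class _Node:
        def __init__(self, value):
            self.value = value
            self.next = None  # link toward the next node in the queue

    def __init__(self):
        self.head = None  # front of the queue, where elements are retrieved
        self.tail = None  # back of the queue, where elements are inserted

    def enqueue(self, new_element):
        node = Queue._Node(new_element)
        if self.tail is None:  # the queue was empty
            self.head = self.tail = node
        else:
            self.tail.next = node
            self.tail = node

    def peek(self):
        return self.head.value  # element at the front, without removing it

    def dequeue(self):
        value = self.head.value
        self.head = self.head.next
        if self.head is None:  # the queue became empty
            self.tail = None
        return value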

Exercise 6: Print numbers in binary


Consider the following problem: given a positive integer n, print all the binary numbers between 1 and n in increasing order.

For example, for n = 10 , the algorithm should print: 1, 10, 11, 100, 101, 110, 111, 1000, 1001, 1010

A simple algorithm can solve this problem with a queue. The idea is the following: we start with the number 1 and we
enqueue it. Then we dequeue the first number x in the queue, print it, and enqueue its two successors x0 and x1
(for example, if x = 10, we enqueue 100 and 101). We repeat this operation until we have printed n numbers.

Write this algorithm in pseudocode.

Dequeues
A dequeue (double-ended queue), pronounced deck, like “cheque”, is a container with 6 operations:

insert_front(ne) : insert a new element at the front of the queue


remove_front() : remove the element at the front of the queue
peek_front() : return the element at the front of the queue
insert_back(ne) : insert a new element at the back of the queue
remove_back() : remove the element at the back of the queue
peek_back() : return the element at the back of the queue

Elements can therefore be accessed from both ends of the dequeue.

Dequeues can be implemented with dynamic arrays or linked lists; an efficient implementation is proposed in an optional exercise
below.
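
In Python, the standard library already provides such a container: collections.deque supports insertion, removal and access at both ends in constant (amortized) time. For example:

from collections import deque

d = deque([2, 3, 5])
d.appendleft(7)  # insert_front(7) -> deque([7, 2, 3, 5])
d.append(1)      # insert_back(1)  -> deque([7, 2, 3, 5, 1])
front = d[0]     # peek_front()    -> 7
back = d[-1]     # peek_back()     -> 1
d.popleft()      # remove_front()  -> removes 7
d.pop()          # remove_back()   -> removes 1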

Priority queues
In a priority queue, each element is associated with a priority, i.e. a number which indicates the importance of the element. Instead
of being retrieved in the order of insertion, as in a classical queue, the elements are retrieved in order of priority: the most
important element is served first. By convention, we assume that the lower the priority value, the more important the element
(for example, an element with priority 2 comes before an element with priority 8).

Common operations on priority queues are the following:

Operation Description

find-min() get the element of minimum priority

delete-min() remove the element of minimum priority

insert(ne, p) insert the new element ne with priority p

decrease_key(e, p) reduce the priority of the element e to the value p

merge(pq) merge the priority queue pq with the current priority queue

Priority queues are usually implemented with heaps, which can themselves rely on a partially sorted contiguous array (they are like
a dynamic array with an additional heap property) or on a generalization of linked lists where nodes can have several successors.
The most common variants of heaps and the worst-case runtime complexities of their associated priority queue operations are the
following:
heap\operation  find-min  delete-min  insert     decrease_key  merge

Binary          Θ(1)      Θ(log(n))   O(log(n))  O(log(n))     Θ(n)

Binomial        Θ(1)      Θ(log(n))   Θ(1)*      Θ(log(n))     O(log(n))

Fibonacci       Θ(1)      O(log(n))*  Θ(1)       Θ(1)*         Θ(1)

Where n is the size of the heap and * indicates amortized time complexity. While Fibonacci heaps have the best runtime
complexity, they are also the most complex to implement, with a large runtime and memory overhead. That’s why, in practice, it
may be more interesting to use simpler heap implementations, especially the binary heap, which only requires a dynamic array, in
particular when the merge operation is not needed.
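
For example, in Python, the heapq module implements binary heap operations directly on a list; a common convention to use it as a priority queue is to store (priority, element) pairs:

import heapq

pq = []                       # the heap is stored in a plain list
heapq.heappush(pq, (2, 'b'))  # insert element 'b' with priority 2: O(log(n))
heapq.heappush(pq, (8, 'c'))
heapq.heappush(pq, (1, 'a'))
print(pq[0])                  # find-min: (1, 'a'), in Θ(1)
print(heapq.heappop(pq))      # delete-min: removes and returns (1, 'a'), in O(log(n))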

Exercise 7: Heap sort


In the previous chapter we have seen two sorting algorithms with quadratic worst-case runtime complexity. With the heap
data structure we can define a new, very simple sorting algorithm (assuming that the heap data structure is already
implemented):

1. Insert all the elements of the unsorted array in a heap

2. Retrieve all the elements from the heap one by one and append them, in their order of retrieval, to a new list

Programming exercise: Implement heap sort in the python notebook and verify that the experimental runtime matches
the theoretical complexity.

Assume that we use a binary heap in the above algorithm; what is the worst-case runtime complexity of this algorithm?
(n is the size of the array)

Can we improve the worst-case runtime complexity of heap sort by using binomial or Fibonacci heaps? (true/false)


Exercise 8: K smallest elements


Given an array of numbers of size n, print the k ≤ n smallest elements of the array in increasing order (we assume that
printing is done in constant time).

1. Propose a first algorithm in pseudocode with a linearithmic runtime Θ(n log(n)) using heap sort.
2. Propose a second algorithm in pseudocode with a runtime Θ(n + k log(n)) using a binomial heap.
3. Propose a third algorithm in pseudocode with a runtime Θ(n log(k)) using a binary heap (you can consider that the
heap gives access to the maximal element instead of the minimal one). Hint: the heap doesn’t need to hold more than k
elements.

Dictionaries
A dictionary is a data structure where each stored element is associated with a unique key (no two elements can have the same key).
In a sense, dictionaries are a generalization of lists: in a list, each element is associated with a unique integer, its index, between 0
and the length of the list minus 1; in a dictionary, indices can be (almost) anything and are called keys. Dictionaries are also called
associative arrays or maps.

Common operations on dictionaries are:

Operation Description

put(k, ne) insert the new element ne associated with the key k

get(k) or [k] get the element associated to the key k

remove(k) remove the element associated to the key k
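
In Python, the built-in dict provides these operations directly (it is implemented as a hash map, see below):

d = {}               # empty dictionary
d['alice'] = 42      # put('alice', 42)
print(d['alice'])    # get('alice') -> 42
del d['alice']       # remove('alice')
print('alice' in d)  # check whether a key exists -> False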


There are mainly two efficient ways to implement dictionaries (with many variants): hash maps or search trees.

Hash maps rely on an auxiliary hash function that associates an index to any key, which allows storing each element in a classical
array at the index given by the hash of its associated key. However, it is possible for two different keys to have the same hash
value, and hence be associated to the same index: in this case, we say that there is a collision between the two keys, and special
techniques must be used to handle this problem. If we assume that the hash function is good (that is, it produces few collisions),
then the three operations put , get and remove have a constant average runtime Θ(1). However, the worst case is linear O(n)
with respect to the size n of the hash map.

Search trees, on the other hand, assume that the keys are totally ordered: for any two keys k1 and k2, we have either k1 ≤ k2 or
k2 ≤ k1. The three operations put , get and remove have a logarithmic average and worst-case runtime O(log(n)). Search trees

are thus less efficient than hash maps on average, but they behave better in the worst case.

Warning
Dictionaries have become very popular as general purpose containers with high-level interpreted languages such as Python
and JavaScript. Existing implementations of dictionaries manage to have good average runtime complexities, but they are
complex data structures with a significant constant overhead, both in terms of runtime and memory. While this overhead can
be ignored in high-level interpreted scripts, it becomes important when we consider the efficient implementation of
algorithms; it is thus a good idea to avoid dictionaries when simpler data structures can be used.

Check your understanding


Exercise 9: Basic knowledge on data structures
In the following, efficient must be understood as anything with an (amortized) runtime complexity strictly better than linear
( o(n) ). Decide whether each of the following statements is true or false:

A queue is a FIFO container

A stack is a FIFO container

A doubly linked list is a good container to store a static (that does not change over time) collection of items

A dynamic array is a good container to store a static (that does not change over time) collection of items

Any element can be efficiently accessed in a doubly linked list

Finding the smallest element of a dictionary is efficient

Finding the smallest element of a heap is efficient

Finding the smallest element of a list is efficient

We can efficiently access an element of a given priority in a priority queue

The insertion of new elements in a dictionary is efficient

Searching if a key exists in a dictionary is efficient

The insertion of new elements in the middle of a dynamic array is efficient

The insertion of new elements at the front of a dynamic array is efficient

The insertion of new elements at the back of a dynamic array is efficient

In a queue, we can insert elements at the back and at the front

We can insert/remove elements efficiently at the front and at the back of a dequeue

Searching if an element exists in a list is efficient

In a dynamic array, swapping two elements is efficient


Going further
Exercise 10: Dynamic array insertions
Compare the effect of two different resizing strategies for dynamic arrays:

1. Linear: the capacity is increased by a constant k = 5 each time the array is full.
2. Geometric: the capacity is multiplied by a constant k = 2 each time the array is full.

Programming exercise: Implement the two functions new_size and insert_back of the dynamic array class in the
python notebook and compare the runtimes of the two strategies for a sequence of insertions.

Exercise 11: Dequeue based on dynamic arrays


We have seen that dequeues are containers where elements can be inserted and removed at the front and at the back. Because
the elements in classical dynamic arrays are packed at the front, they are not suitable for a dequeue implementation: insertion
and deletion at the front are not efficient. A simple way to alleviate this issue is to put the elements in the middle of the array
instead of at the front. It is then possible to insert new elements in the free space at the front and at the back.
Fig. 13 A dequeue with 3 elements (2, 3, 5) and a capacity of 5. The elements in the array are not packed at the front; the
first value is located at the index start (1 here).

If the space becomes insufficient at the front or at the back, a larger array is allocated and the elements of the old array are
copied into the middle of the new one. If the new capacity is c and the number of elements in the dequeue is n, the new start
index is simply ⌊(c − n)/2⌋.

Fig. 14 Insertion at the front of a dequeue with no space left at the front: (1) insert_front is called, (2) the array size is
increased and the elements are copied to the middle of the new array, (3) the new element is inserted.

Programming exercise: Implement a dequeue container based on a dynamic array in the python notebook; all operations
must have a constant (amortized) runtime.

For the array container, we will use a Numpy array (to avoid using Python lists, which are already dynamic arrays). We will
use a growth factor of 2. For simplicity, we never shrink the array when elements are removed.

Exercise 12: Reverse singly linked list


Singly linked lists are elegant and light data structures, but the algorithms for common operations can become tricky with
them!

Propose a non-recursive algorithm that reverses a singly linked list of n elements in linear time Θ(n) using only a
constant amount of memory O(1) (you cannot allocate an array with the same size as the list, for example).

Programming exercise: Implement the singly linked list reverse function in the python notebook
