Uol Algorithms
CIS226
Software engineering, algorithm
design and analysis (vol.2)
Subject guide
Publisher:
University of London Press
Senate House
Malet Street
London
WC1E 7HU
Preface
1 Algorithm analysis
1.1 Essential reading
1.2 Learning outcomes
1.3 Problems and algorithms
1.3.1 Implementation
1.4 Pseudo-code for algorithm description
1.5 Efficiency
1.6 Measures of performance
1.7 Algorithm analysis
1.8 Model of computation
1.8.1 Counting steps
1.8.2 Implementation
1.8.3 Characteristic operations
1.9 Asymptotic behaviour
1.9.1 Big O notations
1.9.2 Comparing orders of two functions
1.10 The worst and average cases
1.10.1 Implementation
1.10.2 Typical growth rates
1.11 Verification of an analysis
2.8.2 Implementation
2.8.3 Applications
2.9 Queues
2.9.1 Operations on queues
2.9.2 Implementation of queues
2.9.3 Variation of queues
2.10 Hashing
2.10.1 Collision
2.10.2 Collision resolving
2.10.3 Extra work for retrieval process
2.10.4 Observation
5.7 Binary search trees
6 Sorting
6.1 Essential reading
6.2 Learning outcomes
6.3 Introduction
6.4 Motivation
6.5 Insertion Sort
6.5.1 Algorithm analysis
6.6 Selection sort
6.7 Shellsort
6.8 Mergesort
6.9 Quicksort
6.10 General lower bounds for sorting
6.11 Bucket sort
6.12 Sorting large records
6.13 Heapsort
Preface
Introduction
Textbooks
The materials covered in the books may overlap and not every
chapter of a book is required for the examination. Hence you are
not expected to read all the books, nor all the chapters of a single
book on the list. You do, however, need to get hold of at least ONE
book on algorithms and data structures for frequent reference and
for studying individual topics in depth. Your book does not have to
be in Java but it should cover at least 80 percent of the examinable
topics below (you will, of course, still need access to the other 20
percent of the material from other sources, such as the library):
1. Algorithms and efficiency analysis
2. Abstract data types and data structures
3. Lists, stacks, queues and sets
4. Recursions, divide and conquer
5. Trees, graphs and maps
Essential reading
Desirable reading
However, you are not required to study all the topics in these texts,
for no single text can entirely meet the requirements of this course
unit. The essential reading chapters are listed at the beginning of
each chapter of the subject guide.
List A
Michael T. Goodrich and Roberto Tamassia, Data Structures and Algorithms
in Java. (John Wiley & Sons, Inc., 2005, fifth edition)
[ISBN10 0-471-73884-0], [ISBN13 978-471-73884-8].
Duane A. Bailey Java Structures: Data Structures in Java for the principled
programmer. (McGraw-Hill Companies, Inc. 1999, McGraw-Hill
International editions) [ISBN 0-13-489428-6].
Jurg Nievergelt and Klaus H Hinrichs Algorithms & Data Structures.
(Prentice Hall, Inc., 1993) [ISBN 0-13-489428-6].
Anany Levitin Introduction to the design and analysis of algorithms.
(Addison-Wesley Professional, 2003) [ISBN 0-201-743957].
These books are of interest and are recommended, but you will be
fully prepared for the examination should you have studied only
those essential texts above. These are provided for completeness
and to allow the interested reader to pursue some topics in more
depth. You will not be examined on the content of those books listed
in the references other than where the material appears in the texts
listed above.
List B
Additional reading
Mark Allen Weiss Data Structures and Problem Solving Using Java.
(Addison Wesley Longman Inc., 1998) [ISBN 0-201-54991-3].
This subject guide outlines the main topics in the syllabus. It can be
used as a reference which summarises, highlights and draws
attention to some important points of the subject. It cannot,
however, replace a textbook although it is fairly self-contained. The
guide sets out a sequence which helps you to study the topics in the
module within limited hours. The guide provides some additional
background material including examples, lab exercises and sample
examination questions. It also provides guidance for further
reading in the recommended textbooks.
One thing you should always bear in mind is the fact that the
algorithm subject, like subjects in any other area of computer
science, has kept evolving and has been updated frequently. You
should therefore not be surprised if you find different approaches,
explanations or results in the books you read including this guide.
explain the limit of computations and the complexity classes for
decision problems.
Prerequisites
Most importantly, you must also have easy access to a Java platform
or have a Java platform installed on a computer at home.
Installing Java
There are lots of public domain versions of Java among which the
most popular one is called JDK (free). It is at
http://www.javasoft.com/ or http://java.sun.com/. A great
amount of information is provided on these sites and you can
download the software.
If you are using Linux, then the free software package normally
already includes a free Java platform.
java -version
For example:
CmapTool
Study time
You would, however, normally have to double the study hours if you
could not attend lectures. For example, if you self study at home,
you would expect to spend six hours on intensive study of the
materials and two hours for the lab exercises, plus a similar amount
of additional homework time, every week for ten weeks or the
equivalent.
and record the time it took you to meet the requirements, and adjust
your plan accordingly.
Study methods
As experts have predicted that more and more people in future will
write programs without being programmers, you are recommended
to learn the important principles and apply them in your
programming practice as much as possible. The experience could be
very useful for your future career whatever you do.
More specifically:
Laboratory exercises
Examination
Important The information and advice given in the following section are
based on the examination structure used at the time this guide was
written. However, the university can alter the format, style or
requirements of an examination paper without notice. Because of this we
strongly advise you to check the rubric/instructions on the paper you
actually sit.
Week 1
Lecture 1-3
The aim, objectives and plan of the course
Problems and algorithms
Big-O notation
Pseudocode
Cmaptools (http://cmap.ihmc.us)
Ex 1 Time efficiency
Week 2
Lecture 4-6
Abstract Data Types
array, lists, stacks, queues, sets, (trees,
graphs, hash tables, heaps)
Specialised data structures
Algorithms Design and Implementation
Ex 2 Abstract Data Types
Lab 1 Estimating time efficiency
Week 3
Lecture 7-9
Algorithm Design Techniques (1)
Sorting, selection, searching and traversal.
Ex 3 Sorting, searching, traversal and selection
Lab 2 Implementation of ADT list, or binary tree
Week 4
Lecture 10-12
Algorithm Design Techniques (2)
Divide and conquer, Recursion
Ex 4 Divide and conquer, Recursion
Lab 3 Implementation of searching a sorted list, or
traversal of a connected graph.
Week 5
Lecture 13-15
Algorithm Design Techniques (3)
Dynamic programming
trees, graphs
Ex 5 Dynamic programming
Lab 4 Implementation of a Recursion programme
Study week
no lectures/labs
Week 7
Lecture 16-18
Algorithm Design Techniques (4)
Greedy approach and heuristics
hash tables, heaps
Ex 7 Greedy approach and heuristics
Lab 5 Implementation of dynamic programming
Week 8
Lecture 19-21
Limits of Computing
Intractable problems and approximation
Introduction to NP-completeness
Ex 8 Intractability and approximation
Lab 7 Greedy approach and heuristics
Week 9
Lecture 22-24
Some well known problems and algorithms (1)
String matching problems
Ex 9 String matching problems
Lab 8 Intractability and approximation
Week 10
Lecture 25-27
Some well known problems and algorithms (2)
Computational geometry problems
Ex 10 Computational geometry problems
Lab 9 String matching problems
Week 11
Lecture 28-30
Revision
Ex 11 Sample examination questions
Lab 11 Computational geometry problems
Activity 0.0
Chapter 1
Algorithm analysis
Instance:
What is the minimum element of (2,5,8,3) entered on a single line
from the keyboard?
1.3.1 Implementation
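import java.util.Scanner;

class MinFinder {
    // Reads integers until the sentinel 999 is entered and returns the
    // smallest value read (Integer.MAX_VALUE if no data were entered).
    static int min(Scanner input) {
        int min = Integer.MAX_VALUE;
        System.out.println("x=? (999 to end) ");
        int x = input.nextInt();
        while (x != 999) {          // the sentinel itself is not data
            if (x < min) {
                min = x;
            }
            System.out.println("x=? (999 to end) ");
            x = input.nextInt();
        }
        return min;
    }
}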
Measures of performance
1.5 Efficiency
the results.
Observation
Solution
Model of computation
The assumptions help keep analysis feasible and focused, for certain
hardware details are ignorable from an algorithmic point of view.
For example, standard operations such as addition, subtraction,
multiplication, division, comparison, assignment and conditional
control would require different amounts of time to run on a real
computer. The storage of real computers is limited and the memory
for integers and reals may be of a different size. However, taking
these hardware details into consideration has little impact on the
comparison of different algorithms, because these standard operations
are required for almost every algorithm. Taking too many details into
consideration can, if anything, make an analysis too complicated to
be carried out.
Example 1.3
Problem: Compute $\sum_{k=1}^{n} k$, where n is an integer.
From the fact $\sum_{k=1}^{n} k = \frac{n(n+1)}{2}$, we have
1.8.2 Implementation
How about the space efficiency? Let a simple variable require one
unit of storage. Algorithm 1.2 needs three units since it involves
three variables sum, k and n, and Algorithm 1.3 only needs one unit
since it involves only one variable n. We can therefore conclude that
Algorithm 1.3 is more efficient in terms of storage, too.
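The comparison can be tried out directly. The following is a minimal Java sketch, assuming that Algorithm 1.2 is the loop-based summation and Algorithm 1.3 the closed-form one; the class and method names are illustrative and do not come from the guide.

```java
// Illustrative sketch: two ways of computing 1 + 2 + ... + n.
class SumDemo {
    // Loop version: uses the variables sum, k and n, and about n additions.
    static long sumByLoop(int n) {
        long sum = 0;
        for (int k = 1; k <= n; k++) {
            sum += k;
        }
        return sum;
    }

    // Closed-form version: uses only n and a constant number of operations.
    static long sumByFormula(int n) {
        return (long) n * (n + 1) / 2;
    }

    public static void main(String[] args) {
        System.out.println(sumByLoop(100));
        System.out.println(sumByFormula(100));
    }
}
```

Both methods return the same value for every n ≥ 0; only the number of steps and variables differs.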
For loops:
Consecutive statements:
If-then-else:
Logarithmic
Exponential.
We are often interested in the rate of growth of the time required for
an algorithm when the input size gets larger. So the lower order
terms of the time complexity T (n) could be ignored, where n is a
positive integer. In other words, we only need to master the
asymptotic behaviour of T (n). Here the term asymptotic means
approximate in a specific way.
Here n and n1 are both simpler than T (n) and it is easier to handle
their behaviour in an analysis.
The worst and average cases
Example 1.7 Given $T_{A1}(n) = 1000n$ and $T_{A2}(n) = n^3$, which function
grows faster when n → ∞? What is the relationship between the two
functions?
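One way to compare the two functions, sketched here as a worked calculation rather than as the guide's own solution, is to look at the limit of their ratio:

```latex
\lim_{n \to \infty} \frac{T_{A1}(n)}{T_{A2}(n)}
  = \lim_{n \to \infty} \frac{1000n}{n^{3}}
  = \lim_{n \to \infty} \frac{1000}{n^{2}}
  = 0
```

Since the ratio tends to 0, $T_{A2}(n)$ grows faster and $T_{A1}(n) = O(T_{A2}(n))$. The two functions are equal when $n^2 = 1000$, so $T_{A1}(n)$ is the larger of the two only for $n \le 31$.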
Example 1.8
The worst case analysis could help to provide an estimate for a time
limit for a particular implementation of an algorithm. It is
particularly useful in real time applications. The average case
analysis is more meaningful in providing an overall picture because
it computes the number of steps performed for each possible input
instance of size n and then takes the (probability-weighted) average.
In this course, we shall consider only the worst case analysis if not
specified otherwise.
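The difference between the cases can be observed by counting a characteristic operation. The following is a minimal Java sketch of a sequential search that counts key comparisons; the class, the method names and the sample data are assumptions made for illustration.

```java
// Illustrative sketch: counting key comparisons in a sequential search.
class SearchCount {
    static int comparisons;   // number of key comparisons in the last search

    // Returns the index of the first occurrence of x in a, or -1 if absent.
    static int indexOf(int x, int[] a) {
        comparisons = 0;
        for (int i = 0; i < a.length; i++) {
            comparisons++;
            if (a[i] == x) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] a = {7, 3, 9, 2, 5};
        indexOf(7, a);   // best case: x is the first element
        System.out.println("best case:  " + comparisons);
        indexOf(4, a);   // worst case: x is not in the array
        System.out.println("worst case: " + comparisons);
    }
}
```

For an array of size n the worst case costs n comparisons. If x is present and equally likely to be at any position, the average is (n + 1)/2 comparisons.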
1.10.1 Implementation
found = ( x==Y[i] );
i++;
}
return found;
}
...
Step 1 and 2: 2
Step 1--6: 21
All steps: 22
...
...
Step 1--2: 2
Step 1--6: 6
All steps: 7
...
class test {
acount.printArray(A);
System.out.println(acount.foundFirstX(2, A));
System.out.println();
acount.printArray(B);
System.out.println(acount.foundFirstX(2, B));
}
}
Some functions are commonly seen with typical growth rates in the
algorithm analysis. We list some common ones here.
Note All logarithms in this subject guide are of base 2 if not stated
otherwise.
Function   Name
c          constant
log n      logarithmic
log^2 n    log-squared
n          linear
n log n    n-log-n
n^2        quadratic
n^3        cubic
2^n        exponential
Verification of an analysis
Activity 1.11
TIME COMPLEXITY
1. Discuss briefly the time complexity in the worst case for the
algorithm below. Indicate the input, output of the algorithm and
the main comparison you have counted.
k ← 1
repeat
    k ← 2 × k
until k ≥ N
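One way to verify your analysis of this loop is to run it and count how often the body is executed. The following Java sketch does this; the class and method names are illustrative. Compare the counts it prints with the result of your own analysis.

```java
// Illustrative sketch: counting the iterations of the repeat-until loop.
class DoublingLoop {
    // Runs the loop for a given N and returns how many times
    // the body (k <- 2 * k) was executed.
    static int iterations(long n) {
        long k = 1;
        int count = 0;
        do {
            k = 2 * k;
            count++;
        } while (k < n);   // "until k >= N" means: repeat while k < N
        return count;
    }

    public static void main(String[] args) {
        long[] sizes = {1, 2, 8, 9, 1000};
        for (long n : sizes) {
            System.out.println("N = " + n + ": " + iterations(n) + " iterations");
        }
    }
}
```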
Chapter 2
From Abstraction to Implementation
For example, the word ‘algorithm’ contains 9 characters and the first
character is ‘a’ and the second is ‘l’ and so on in that order. An
appropriate data structure for a string can be linear, such as an array
or a linked list, or hierarchical, such as a trie. (We shall study the
characteristics of these individually later.)
We give each operation a name which is in a similar format to a
class name in Java. It consists of a string of characters followed by a
pair of brackets enclosing the arguments, which can be empty. The
arguments represent the input data required by the operation.
Motivation of abstract data types
From the above examples, it is clear that the sets of operations for
various abstract data types overlap. An object oriented
computer language makes it convenient to define the common
abstract data types and, more importantly, to reuse certain
parts of the programs.
i 1 2 3 4 5
E[i] 3 4 6 2 5
We shall show you more examples later on how the abstract data
types allow us to separate design and implementation issues.
Activity 2.5
2.6 Arrays
2.6.1 Applications
Let i be the row number indexing student, and j the column number
indexing subject in the matrix below.
0 23 30 40 90 20
1 45 55 11 40 30
2 ...
3
4
5 .
6 .
7 .
8
9 44 55 12 48 39
number of subjects.
1 Tom 54.5 O
2 Alex 78.0 H
3 Anna 90.5 O
studentX.name = ‘Peter’;
studentX.mark = 98;
studentX.overseas = false;
Example 2.3 The assignment of a value for each field could be:
student[1].name = ‘Peter’;
student[1].mark = 98;
student[1].overseas = false;
student[3].name = ‘John’;
student[3].mark = 61.5;
student[3].overseas = true.
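In Java, such an array of records can be written as an array of objects. The sketch below is one possible rendering of Example 2.3, assuming a class Student with the three fields used above; the class StudentRecords and the array size are hypothetical.

```java
// Illustrative sketch: an array of student records as in Example 2.3.
class Student {
    String name;
    double mark;
    boolean overseas;
}

class StudentRecords {
    static Student[] build() {
        Student[] student = new Student[4];   // cells 0..3; cell 0 is not used here
        for (int i = 0; i < student.length; i++) {
            student[i] = new Student();       // each cell must hold an object first
        }
        student[1].name = "Peter";
        student[1].mark = 98;
        student[1].overseas = false;
        student[3].name = "John";
        student[3].mark = 61.5;
        student[3].overseas = true;
        return student;
    }

    public static void main(String[] args) {
        Student[] student = build();
        System.out.println(student[3].name + " " + student[3].mark);
    }
}
```

Note that Java strings are written with double quotes, and that array indices start from 0.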
Activity 2.6
ARRAYS
1. Define an array to store student marks. Suppose each student
has only one mark and there are at most one thousand students.
2. Following the above, write a method that displays the marks of
the students.
3. Write a boolean type method which takes (1) an array of
integers and (2) the size of the array as parameters and
determines if all the integers in the array are between 10 and 50
inclusive.
2.7 Lists
Arrays are powerful for storing data, but they are after all only a data
holder. Data is stored in contiguous locations (called storage cells)
and the size of the array has to be fixed before any usage of the
array. This makes some updating operations, such as insertion or
deletion, extremely inefficient. For example, in order to
insert a datum at some place in an array of integers, all the data
from that location on have to be moved one cell towards the end to
make room for the new datum.
Example 2.4 We would like to store four integers: (12, 10, 3, 4). We
first store the four integers into the array A in a normal way, i.e. in a
contiguous way (Figure 2.2 (a)).
Suppose that we need to insert one integer before 10. We shall then
have to first move data 10, 3, 4 one location to the right. If the array
is a long one of size n, the worst case would be that we have to shift
n data items, one at a time.
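The shifting described above can be sketched in Java as follows. This is an illustration only: the method name insertAt and the inserted value 7 are assumptions, and Java arrays are indexed from 0 rather than from 1.

```java
// Illustrative sketch: inserting into an array by shifting data to the right.
class ArrayInsert {
    // Inserts x at position pos of a[0..size-1] and returns the new size.
    // Assumes that size < a.length, i.e. there is a spare cell at the end.
    static int insertAt(int[] a, int size, int pos, int x) {
        for (int i = size; i > pos; i--) {
            a[i] = a[i - 1];          // move one datum one cell to the right
        }
        a[pos] = x;
        return size + 1;
    }

    public static void main(String[] args) {
        int[] a = new int[7];
        a[0] = 12; a[1] = 10; a[2] = 3; a[3] = 4;
        int size = insertAt(a, 4, 1, 7);      // insert 7 before 10
        for (int i = 0; i < size; i++) {
            System.out.print(a[i] + " ");
        }
        System.out.println();
    }
}
```

Inserting at position 0 of an array holding n data moves all n of them, which is the worst case mentioned above.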
Figure 2.2 (c) suggests a method to form a logical link among the
data in an array. Instead of an array of integers, let A be an array of
objects with two fields: one contains the data and the other is used
to store the address of the next datum (an index of array A). Let the
first field be data and the other, next.
[Figure 2.2: storing the list (12, 10, 3, 4).
(a) A, cells 1 to 7: 12, 10, 3, 4 in contiguous cells.
(b) A, cells 1 to 14: 12, 10, 3, 4.
(c) A, cells 1 to 6, as {data, next} pairs: cell 1 {12, 2}, cell 2 {10, 3},
cell 3 {3, 4}, cell 4 {4, /}.
(d) A, cells 1 to 6, as {data, next} pairs: cell 1 {12, 4}, cell 2 {4, /},
cell 3 {3, 2}, cell 4 {10, 3}.
(e) Head → 12 → 10 → 3 → 4.]
For each datum, the field next points to the following datum, where
the sign “/” represents the end of the linked set (i.e. the special
value null in Java). For example, the next object after {12,2}
should be {10,3}.
Figure 2.2 (d) shows that the physical location of each datum does
not really matter as long as the data link gives the correct order.
Figure 2.2 (c) and (d) both have the same logical link (i.e. data
order) as 2.2 (e). Now it is an easy matter to insert an element
before 10. We could store the new datum at the first available
physical location and modify the links of two nodes.
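The idea of Figure 2.2 (c) can be tried out with two parallel arrays, one for each field. The following is a minimal sketch under several assumptions: the class and method names are hypothetical, the value -1 plays the role of the sign "/", cell 0 is left unused so that the cell numbers match the figure, and the inserted value 7 is chosen for illustration.

```java
// Illustrative sketch: a linked set of data held inside arrays.
class ArrayLinkedList {
    static final int END = -1;      // plays the role of "/" (end of the links)
    int[] data = new int[7];        // cells 1..6 are used; cell 0 is ignored
    int[] next = new int[7];
    int head = END;
    int free = 1;                   // first cell that has not been used yet

    // Stores x in the first available cell and returns the cell index.
    int newCell(int x, int nextIndex) {
        data[free] = x;
        next[free] = nextIndex;
        return free++;
    }

    // Inserts x after cell p by modifying two links; no data are moved.
    void insertAfter(int p, int x) {
        int c = newCell(x, next[p]);
        next[p] = c;
    }

    // Returns the data in logical (link) order.
    String inOrder() {
        StringBuilder sb = new StringBuilder();
        for (int i = head; i != END; i = next[i]) {
            sb.append(data[i]).append(' ');
        }
        return sb.toString().trim();
    }

    // Builds the situation of Figure 2.2 (c): {12,2}, {10,3}, {3,4}, {4,/}.
    static ArrayLinkedList figureC() {
        ArrayLinkedList list = new ArrayLinkedList();
        list.head = list.newCell(12, 2);
        list.newCell(10, 3);
        list.newCell(3, 4);
        list.newCell(4, END);
        return list;
    }

    public static void main(String[] args) {
        ArrayLinkedList list = figureC();
        list.insertAfter(1, 7);     // insert 7 before 10, i.e. after cell 1
        System.out.println(list.inOrder());
    }
}
```

The new datum is stored in cell 5, the first free cell, yet it appears second in the logical order.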
Example 2.5 Here simple variables p and q are pointers and their
values are 105 and 205.
[Figure: the pointers p and q, shown in panels (a) and (b)]
Example 2.6 In Figure 2.5, each object consists of two fields. The first
field is the datum itself, and the second is a pointer which indicates the
address of the following object.
1 2 3 4 5 6
10 5 32 4 11 6 34 65 3 21 2
Example 2.7 The following Java source code can be used to define
such a node type and some operations:
import java.io.*;
next = nextNode;
} // Constructor
Example 2.8 (12, 10, 3, 4) and (‘A’, ‘B’, ‘C’, ‘D’) are two simple lists.
The first one is a list of integers and the second one is a list of
characters. Figures 2.6 and 2.7 show their data structures.
Head 12 10 3 4
Head A B C D
Following the link of a node, we can access its next node. Similarly,
following the link of the next node we can access the next node of
the next node. In this way, all the nodes can be accessed one after
another.
Two nodes are special in such a linked list: the first node and
the last node. The first node of a linked list is called the
head of the list, and it cannot be reached by following the link of any
list node. The head of a list has to be initialised. The last node of a
linked list points to no other node and has a null value for its link field.
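The following Java sketch shows how all the nodes can be visited by starting from the head and following the links until the null link of the last node is reached. It assumes a node type with the fields item and next; the class Traverse and its method names are hypothetical.

```java
// Illustrative sketch: visiting every node of a linked list from its head.
class Node {
    int item;
    Node next;

    Node(int newItem) {
        item = newItem;
        next = null;
    }

    Node(int newItem, Node nextNode) {
        item = newItem;
        next = nextNode;
    }
}

class Traverse {
    // Counts the nodes reachable from head; an empty list has head == null.
    static int length(Node head) {
        int n = 0;
        for (Node p = head; p != null; p = p.next) {
            n++;
        }
        return n;
    }

    // Returns the items in list order, separated by spaces.
    static String toText(Node head) {
        StringBuilder sb = new StringBuilder();
        for (Node p = head; p != null; p = p.next) {
            sb.append(p.item);
            if (p.next != null) {
                sb.append(' ');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Node head = new Node(12, new Node(10, new Node(3, new Node(4))));
        System.out.println(toText(head) + " (" + length(head) + " nodes)");
    }
}
```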
E F G H I J N
C D K L M
A B
The next step is to look at each operation closely and design the
algorithm for each of the above operations.
Suppose that p points to the current node, and newNode points to a
new node to be inserted. There are 2 fields for each node, namely
data and next. This can be seen from Figure 2.9 and Algorithm 2.3.
[Figure 2.9: the list (12, 10, 3, 4) with the current node p and the
new node newNode (7): (a) before and (b) after the insertion]
If we want to add one node before the current node, the address of
the node immediately before the current one needs to be known.
There are two cases depending on whether or not the current node is
the head of the list.
[Figure 2.10: Add one node before the non-head current node. The new
node newNode (7) is linked between previous and p in the list
(12, 10, 3, 4): (a) before; (b) after]
[Figure 2.11: adding newNode (7) before the head of the list
(12, 10, 3, 4): (a) before; (b) after, with head pointing to the new node]
Suppose that p points to the current node. The deletion process can
be seen from Figure 2.12, and Algorithm 2.6.
[Figure 2.12: deleting the current node p from the list (12, 10, 3, 4):
(a) before; (b) after]
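The insertion and deletion operations discussed in this section can be sketched in Java as below. This is an illustration under stated assumptions, not the guide's Algorithms 2.3 to 2.6: the helper previousOf and the method deleteNode are hypothetical, addOneNodeBefore here finds the previous node itself, and every node passed in is assumed to belong to the list.

```java
// Illustrative sketch: inserting and deleting nodes by changing links.
class Node {
    int item;
    Node next;

    Node(int newItem) {
        item = newItem;
        next = null;
    }
}

class LinkedListOps {
    Node head;

    // Adds newNode immediately after the current node p.
    void addOneNodeAfter(Node p, Node newNode) {
        newNode.next = p.next;
        p.next = newNode;
    }

    // Returns the node immediately before p, or null if p is the head.
    Node previousOf(Node p) {
        if (p == head) {
            return null;
        }
        Node q = head;
        while (q != null && q.next != p) {
            q = q.next;
        }
        return q;
    }

    // Adds newNode immediately before p; the head is a special case.
    void addOneNodeBefore(Node p, Node newNode) {
        newNode.next = p;
        if (p == head) {
            head = newNode;
        } else {
            previousOf(p).next = newNode;
        }
    }

    // Removes p from the list by linking its neighbours together.
    void deleteNode(Node p) {
        if (p == head) {
            head = head.next;
        } else {
            previousOf(p).next = p.next;
        }
    }

    String toText() {
        StringBuilder sb = new StringBuilder();
        for (Node q = head; q != null; q = q.next) {
            sb.append(q.item);
            if (q.next != null) {
                sb.append(' ');
            }
        }
        return sb.toString();
    }
}
```

Finding the previous node requires a walk from the head, so adding before or deleting a node costs a number of steps proportional to the length of the list in the worst case, whereas adding after a node takes a constant number of steps.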
2.7.7 Implementation
class List {
Node head;
// unnecessary
public void addOneNodeBefore(
Node previous, Node p, Node newNode) {
previous.next = newNode;
newNode.next = p;
}
Figure 2.13 shows how to construct the list in Figure 2.2(e) starting
from the last node, adding a new node each time into the list by
insertion at the front of the list.
head null
head 4
head 3 4
head 10 3 4
head 12 10 3 4
tail
head 12 tail
head 12 10 tail
head 12 10 3 tail
head 12 10 3 4
2.7.9 Implementation
Example 2.11
void constructListFront() {
Scanner input = new Scanner( System.in );
int x = input.nextInt();
Node p = new Node( x );
while ( x != 999 ) {
addOneNodeBeforeHead( p );
x = input.nextInt();
p = new Node( x );
}
}
Node tail() {
Node p = head;
while ( p.next != null ) {
p = p.next;
}
return p;
}
void constructListTail() {
Scanner input = new Scanner( System.in );
int x = input.nextInt();
Node p = new Node( x );
Node t = tail();
while ( x != 999 ) {
addOneNodeAfter( t,p );
t = t.next;
x = input.nextInt();
p = new Node( x );
}
}
void printList() {
Node p = head;
while (p != null) {
System.out.print( " " + p.item );
p = p.next;
}
System.out.println();
}
You need to write a main class to test the methods that we have
seen so far. Example 2.12 shows such a main class that can be used
to test the methods.
Example 2.12
public class linkList {
public static void main( String argv[] ) {
l.initilise();
System.out.println("A empty list? "+l.empty());
l.head = n;
System.out.println("A empty list? "+l.empty());
Activity 2.7
item = newItem;
next = null;
} // Constructor
2.8 Stacks
Similarly to the special pointer variable head for a linked list, top is a
pointer variable that always points to the top of a stack. Figure 2.15
shows a stack of integers with a ‘1’ on the top.
1 top
6
2
4
Like linked lists, stacks are dynamic data structures. That is, the size
of a stack may change and the data come and go. We can define
certain standard operations to, for example, initialise, add an
element to, delete a datum from and check the content in a stack:
Stacks are sometimes called LIFO queues (see Section 2.9 for
queues).
2.8.2 Implementation
1. By an array
A stack can be implemented by an array. We define, for
example, a class stack that consists of an array
stackArray[0..max-1] (where max is some predefined upper
index limit), a special pointer variable top, and the following
operations (Algorithms 2.7–2.11):
2. By a linked list
Here we define a class stack that consists of a linked list with a
special pointer variable top that points to the head of the list
(Figure 2.16), and the following operations:
top 1 6 2 4
Figure 2.16:
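Both implementations can be sketched in Java as follows. The names stackArray, max and top follow the description above; the operation names empty, full, push, pop and peek are assumptions, since Algorithms 2.7 to 2.11 are not reproduced here.

```java
// Illustrative sketch: a stack of integers implemented in two ways.
class ArrayStack {
    private final int[] stackArray;
    private int top = -1;                  // index of the top item; -1 when empty

    ArrayStack(int max) {
        stackArray = new int[max];         // cells 0..max-1
    }

    boolean empty() { return top == -1; }

    boolean full() { return top == stackArray.length - 1; }

    void push(int x) {
        if (full()) throw new IllegalStateException("stack is full");
        stackArray[++top] = x;
    }

    int pop() {
        if (empty()) throw new IllegalStateException("stack is empty");
        return stackArray[top--];
    }

    int peek() {
        if (empty()) throw new IllegalStateException("stack is empty");
        return stackArray[top];
    }
}

class LinkedStack {
    private static class Node {
        int item;
        Node next;
        Node(int item, Node next) { this.item = item; this.next = next; }
    }

    private Node top;                      // points to the head of the list

    boolean empty() { return top == null; }

    void push(int x) {
        top = new Node(x, top);            // the new node becomes the head
    }

    int pop() {
        if (empty()) throw new IllegalStateException("stack is empty");
        int x = top.item;
        top = top.next;
        return x;
    }
}
```

Pushing 4, 2, 6 and 1 in that order gives the stack of Figure 2.15, with 1 on the top. The array version can become full, whereas the linked version grows as long as memory is available.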
2.8.3 Applications
[Figure: the calls among procedures a, b, c, d, e and f, with the
leaving points 102, 21, 278, 100 and 189]
Figure 2.18 shows that the leaving and return points of each
program/procedure can be stored and retrieved using a stack. For
example, the leaving point 102 is pushed onto the stack when
subprocedure b is called (1). Next, 21 is pushed onto the stack when
subprocedure c is called (2), and it is popped and used as the return
address when procedure c is completed (3). Panels (4) to (10)
show the content of the stack at each of the remaining stages.
In a real operating system not only the leaving and return points of
the procedure calls, but also the entire environment, including the
values of local variables, is stored in and retrieved from stacks.
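The first stages of this process can be imitated with a stack from the Java library. The sketch below covers only the calls of b and c described above; the final step, in which b completes directly after c, is an assumption, and the class name is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sketch: return addresses kept on a stack.
class ReturnAddressDemo {
    // Returns the sequence of return addresses used, in order.
    static String run() {
        Deque<Integer> stack = new ArrayDeque<>();
        StringBuilder used = new StringBuilder();
        stack.push(102);                        // a calls b: leaving point 102
        stack.push(21);                         // b calls c: leaving point 21
        used.append(stack.pop()).append(' ');   // c completes: return to 21
        used.append(stack.pop());               // b completes (assumed): return to 102
        return used.toString();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The most recently stored leaving point is always the first one needed, which is why a LIFO structure fits procedure calls.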
21
102 102 102 278
(1) (2) (3) (4) (5)
100 189
278 278 278 278
(6) (7) (8) (9) (10)
2.9 Queues
A queue is a list that restricts insertions to one end called rear and
deletions from the other end called front of a list structure.
A queue is a data structure that follows the first in first out (FIFO,
pronounced ‘fie-foe’) principle. Objects are added to the rear of the
queue, and are removed from the front of the queue. Queues are used
in computing just as they are in real life.
Example 2.15
Two addresses are ‘remembered’ for a queue, the rear (R) and the
front (F ).
[Figure: a buffer used as a queue holding A, B, C, D and E; data go in
at the Rear and come out at the Front]
40
Queues
R
1 2 3 4 5 6 7 9 10
(a)
5 2 7 1
8
F R
1 2 3 4 5 6 7 8 10
(b)
5 2 7 1 4
9
F
41
CIS226 Software engineering, algorithm design and analysis (vol.2)
(a)
5 2 7 1
F
R
(b)
5 2 7 1 4
Figure 2.21: R has reached max and is cycled to the beginning of the queue array
F
R
42
Queues
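An array implementation in which the rear is cycled back to the beginning of the array can be sketched as follows. This is a minimal illustration with assumed names; here rear is the index of the next free cell and a counter records the number of stored data, which may differ from the conventions used for R and F in the figures.

```java
// Illustrative sketch: a queue of integers in a circular array.
class CircularQueue {
    private final int[] q;
    private int front = 0;     // index of the datum at the front
    private int rear = 0;      // index of the next free cell at the rear
    private int count = 0;     // number of data currently stored

    CircularQueue(int max) {
        q = new int[max];
    }

    boolean empty() { return count == 0; }

    boolean full() { return count == q.length; }

    // Adds x at the rear; the index wraps round to 0 after the last cell.
    void enqueue(int x) {
        if (full()) throw new IllegalStateException("queue is full");
        q[rear] = x;
        rear = (rear + 1) % q.length;
        count++;
    }

    // Removes and returns the datum at the front.
    int dequeue() {
        if (empty()) throw new IllegalStateException("queue is empty");
        int x = q[front];
        front = (front + 1) % q.length;
        count--;
        return x;
    }
}
```

With the modulo operation, cells freed at the front are reused by later additions, so no data need to be shifted.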
There are useful data structures derived from the basic structures.
Note the difference between the term dequeue and deque: Dequeue
is an operation on a queue, and deque is a special queue that allows
the addition or deletion of an element from both ends.
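The Java library provides a deque in the class java.util.ArrayDeque. The short sketch below, with a hypothetical class name and sample data, shows additions and deletions at both ends.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sketch: a deque allows additions and deletions at both ends.
class DequeDemo {
    static String run() {
        Deque<Character> d = new ArrayDeque<>();
        d.addLast('B');
        d.addLast('C');
        d.addFirst('A');                 // the deque is now A B C
        d.addLast('D');                  // the deque is now A B C D
        char first = d.removeFirst();    // removes A from the front
        char last = d.removeLast();      // removes D from the rear
        return "" + first + last + d.size();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Using only addLast and removeFirst gives an ordinary queue, and using only addFirst and removeFirst gives a stack.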
Activity 2.9
2.10 Hashing
table. As we can see from Example 2.18 below, a hash code can be
calculated easily given the hash function and the key.
Solution
We first compute the hash code for each datum using the hash function:
h(7) = 7 mod 11 = 7
Similarly, we have
h(31) = 31 mod 11 = 9
h(159) = 159 mod 11 = 5
h(189) = 189 mod 11 = 2
h(23) = 23 mod 11 = 1
h(6) = 6 mod 11 = 6.
Since the hash codes are the indices (i) for the corresponding data in
the hashtable, the content of the hashtable (H) is, therefore,
i   0    1    2    3    4    5    6    7    8    9    10   11
H   -    23   189  -    -    159  6    7    -    31   -    -
Example 2.19 Suppose that we search for key k = 23. We check the
location and find h(23) = 23 mod 11 = 1, and compare the key k and
the Hashtable[1]. Since k = Hashtable[1] = 23, we know that 23 is in
the hashtable and can return its location 1.
Example 2.20 Suppose that we search for key k = 50. We compute the
location h(50) = 50 mod 11 = 6, and compare the key k with
Hashtable[6]. Since k = 50 but Hashtable[6] = 6, there is no
match, which means that 50 is not in the hashtable. (Note: this is
true only if there is no collision; see Section 2.10.1.)
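Storing and searching in this collision-free setting can be sketched in Java as below. The class and method names are assumptions, keys are assumed to be non-negative integers, and no collision handling is attempted.

```java
// Illustrative sketch: hashing with h(k) = k mod 11 and no collisions.
class SimpleHashTable {
    static final int SIZE = 11;
    final Integer[] table = new Integer[SIZE];    // null marks an empty cell

    static int h(int k) {
        return k % SIZE;                          // assumes k >= 0
    }

    // Stores k at its hash code; assumes that the cell is free.
    void insert(int k) {
        table[h(k)] = k;
    }

    // Returns the location of k, or -1 if k is not in the table.
    int search(int k) {
        int i = h(k);
        return (table[i] != null && table[i] == k) ? i : -1;
    }

    public static void main(String[] args) {
        SimpleHashTable t = new SimpleHashTable();
        int[] keys = {7, 31, 159, 189, 23, 6};
        for (int k : keys) {
            t.insert(k);
        }
        System.out.println(t.search(23));
        System.out.println(t.search(50));
    }
}
```

Both operations take a constant number of steps, whatever the number of data stored.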
Observation
3. Hashing can be very efficient in terms of both the time and the
space complexity.
2.10.1 Collision
We say that ‘data 93 and 159 have collided’, meaning that they are
both mapped to the same address (5 in this example). This is the
so-called collision problem in hashing.
Collisions are caused by the attempt to map a large key
space to a limited hashtable range. A natural solution is hence to
re-allocate the collided data elsewhere.
The first approach is called closed address hashing, for it does not
consume any extra addresses of the hashtable. The number of
addresses of the hashtable will remain the same.
The second approach is called open address hashing, for the number
of addresses of the hashtable may be increased.
Example 2.22 Suppose the hash function is h(k) = k mod 11, and the
rehash function is then h(k) = (k + 1) mod 11. The hash table is empty
initially. Show the content of the hash table after inserting the data
(29, 93, 31, 159, 51, 189, 27, 23, 17, 9).
Solution Again we compute the hash code(s) first and get: (7, 5, 9, 5,
7, 2, 5, 1, 6, 9).
Collisions occur since 159 and 27 have the same hash code as for 93;
51 has the same hash code as that for 29, and 9 has the same hash
code as for 31. We link, therefore, 93, 159 and 27 together, link 29
and 51 together, and link 31 and 9 together as follows.
i     0    1    2    3    4    5    6    7    8    9    10
H[i]  -    23   189  -    -    93   17   29   -    31   -
                               ↓         ↓         ↓
                               159       51        9
                               ↓
                               27
During the retrieval process, not only the hash cell but also the
linked list attached to it will be searched.
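The linking of collided data in Example 2.22 can be sketched in Java with one linked list per hash cell. The class and method names are assumptions, and the library class LinkedList stands in for the links drawn in the table.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Illustrative sketch: resolving collisions by linking collided data.
class ChainedHashTable {
    static final int SIZE = 11;
    final List<LinkedList<Integer>> cells = new ArrayList<>();

    ChainedHashTable() {
        for (int i = 0; i < SIZE; i++) {
            cells.add(new LinkedList<>());     // one (initially empty) list per cell
        }
    }

    static int h(int k) {
        return k % SIZE;                       // assumes k >= 0
    }

    // Appends k to the list of its hash cell.
    void insert(int k) {
        cells.get(h(k)).addLast(k);
    }

    // Searches the hash cell of k and the list linked to it.
    boolean contains(int k) {
        return cells.get(h(k)).contains(k);
    }

    static ChainedHashTable example() {
        ChainedHashTable t = new ChainedHashTable();
        int[] keys = {29, 93, 31, 159, 51, 189, 27, 23, 17, 9};
        for (int k : keys) {
            t.insert(k);
        }
        return t;
    }

    public static void main(String[] args) {
        ChainedHashTable t = example();
        for (int i = 0; i < SIZE; i++) {
            System.out.println(i + ": " + t.cells.get(i));
        }
    }
}
```

The number of hash cells stays at 11, but a search may have to walk along a list, so its cost depends on how long the lists become.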
Linear probing
rh(k) = (k + 1) mod h
Example 2.23 Suppose the hash function is h(k) = k mod 11, and the
rehash function is rh(k) = (k + 1) mod 11. The hash table is empty
initially. Show the content of the hash table after inserting the data
(29, 93, 31, 159, 51, 189, 27, 23, 17, 9).
Solution Again we compute the hash code(s) first, using the hash
function h(k) = k mod 11 and get: (7, 5, 9, 5, 7, 2, 5, 1, 6, 9).
Having found collisions for 159, 51, 27, 17 and 9 at hash cell H[i], we
rehash each of them by probing the next hash cell H[i + 1]. This
process continues until each of them can be placed in a free hash cell.
For example, since cell H[5] is occupied, we probe cell H[6], find it
available, and so place 159 in cell H[6].
There are 5 probes at cell location 6–10 before placing 17 at cell H[0].
Finally, there are 5 probes at cell location 10–3 before placing 9 at cell
H[3].
The process can be described precisely as to rehash 159, 51, 27, 17 and
9 as follows:
The content of the hashtable is as follows (159, 51, 27, 17 and 9 are
the data that were rehashed to their locations):
i   0    1    2    3    4    5    6    7    8    9    10
H   17   23   189  9    -    93   159  29   51   31   27
As we can see from the table below (where symbol ‘’ represents a
rehash), while 29, 93, 31, 189 and 23 can be stored in the hashtable
immediately, 159 and 51 have to be rehashed to the next locations
H[6] and H[8] after one probe. In contrast, 27, 17 and 9 have to be
rehashed to the location H[10], H[0] and H[3] after 5 probes:
k       29   93   31   159  51   189  27   23   17   9
h(k)    7    5    9    5    7    2    5    1    6    9
rh1(k)  -    -    -    6    8    -    6    -    7    10
rh2(k)  -    -    -    -    -    -    7    -    8    0
rh3(k)  -    -    -    -    -    -    8    -    9    1
rh4(k)  -    -    -    -    -    -    9    -    10   2
rh5(k)  -    -    -    -    -    -    10   -    0    3
Note the overlaps between the rehashing sequences for 159, 51, 27
and 17, especially 27 and 17. They hash and rehash to the same
locations H[5 · · · 10]. This is called clustering. Primary clustering
occurs when a number of different valued keys hash to the same
location and rehash to locations with collisions with the same set of
keys, such as for 159 and 27. Secondary clustering occurs when keys
that initially hash to different locations eventually rehash to the
same sequence of locations, such as for 27 and 17.
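Linear probing as used in Example 2.23 can be sketched in Java as below. The class and method names are assumptions; the rehash step is applied to the current location, that is, the next cell probed is (i + 1) mod 11, and keys are assumed to be non-negative integers.

```java
import java.util.Arrays;

// Illustrative sketch: linear probing in a hashtable with 11 cells.
class LinearProbingTable {
    static final int SIZE = 11;
    final Integer[] table = new Integer[SIZE];    // null marks a free cell

    static int h(int k) {
        return k % SIZE;                          // assumes k >= 0
    }

    // Inserts k and returns the number of rehashes that were needed.
    int insert(int k) {
        int i = h(k);
        int probes = 0;
        while (table[i] != null) {
            if (probes == SIZE) throw new IllegalStateException("table is full");
            i = (i + 1) % SIZE;                   // probe the next cell, cyclically
            probes++;
        }
        table[i] = k;
        return probes;
    }

    // Returns the location of k, or -1 if a free cell is met first.
    int search(int k) {
        int i = h(k);
        for (int probes = 0; probes < SIZE && table[i] != null; probes++) {
            if (table[i] == k) {
                return i;
            }
            i = (i + 1) % SIZE;
        }
        return -1;
    }

    public static void main(String[] args) {
        LinearProbingTable t = new LinearProbingTable();
        int[] keys = {29, 93, 31, 159, 51, 189, 27, 23, 17, 9};
        for (int k : keys) {
            System.out.println(k + ": " + t.insert(k) + " rehash(es)");
        }
        System.out.println(Arrays.toString(t.table));
    }
}
```

The search sketch stops at the first free cell, which is only valid as long as no data have been deleted from the table.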
Double hashing
Example 2.24 Suppose the hash function is h(k) = k mod 11, and the
rehash function is rh(k) = k mod 13. The hash table is empty initially.
Show the content of the hash table after inserting the data (29, 93, 31,
159, 51, 189, 27, 23, 17, 9).
Solution Again we compute the hash code(s) first and get: (7, 5, 9, 5,
7, 2, 5, 1, 6, 9).
rh(159) = 159 mod 13 = 3
rh(51) = 51 mod 13 = 12
rh(27) = 27 mod 13 = 1
rh(23) = 23 mod 13 = 10
rh(9) = 9 mod 13 = 9
The content of the hashtable becomes (the data in bold which are
rehashed to the location.):
i 0 1 2 3 4 5 6 7 8 9 10 11 12
H 27 189 159 93 17 29 31 23 51
9
49
CIS226 Software engineering, algorithm design and analysis (vol.2)
We first find the location for 159: h(159) = 159 mod 11 = 5. Since the
state of this cell is rehash, we proceed by rehashing and probe
location 6. This time, the state of the cell is occupied, so we know
that no further rehashing is required and the key is found in H[6],
since H[6] = 159.
2.10.4 Observation
Both linear probing and double hashing are called open addressed
hashing because the original hash table may grow in size after
hashing. The extra hash cells may be required as a consequence of
rehashing.
Note the use of the word ‘may’. It does not say that the size of the
hash table will definitely grow with these approaches. It depends on
the data, the occupation state of the hash table, and the hash
functions. It is possible that the original hash table is not extended
after rehashing. For example, if the data in a hash table is
sparse, a perfect hashing may be possible. Similarly, all the collisions
may be resolved within the address range of the original hash table,
so that no extra cell is required for certain data and hash functions.
This, however, is different from closed address hashing, where the
size of the hash table is fixed and the number of hash cells
definitely does not change.
Activity 2.10
HASHING
1. What is meant by hashing?
2. What are the hash code and a hash function? How are they
used in the hashing technique? Give an example of a hash code
and a hash function.
3. In the context of hash addressing, what is a collision? Why are
collisions undesirable and why are they usually unavoidable?
4. Describe, with an example, the methods of closed address
hashing, of linear probing and of double hashing.
Chapter 3
Having read this chapter and consulted the relevant material you
should be able to:
Explain the concept of recursion and the advantage of the
recursive approach
Describe the concept of dynamic programming and the greedy
approach
Develop and implement simple recursive programs
Explain the dynamic programming and the greedy approach
with an example
3.3 Recursion
f (0) = f (1) = 1
Solution
0! = 1;
1! = 1;
for n > 1, n! = n ∗ (n − 1)!
5! = 5 ∗ 4!
= 5 ∗ 4 ∗ 3!
= 5 ∗ 4 ∗ 3 ∗ 2!
= 5 ∗ 4 ∗ 3 ∗ 2 ∗ 1!
= 5 ∗ 4 ∗ 3 ∗ 2 ∗ 1
The solution here shows that once some factorial has been found, it
can be used to compute the next (bigger) factorial. In other words,
n! is known as soon as (n − 1)! is known.
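The factorial definition above translates almost directly into a recursive method. The following is a minimal sketch; the class name FactorialDemo and the use of long (which overflows for n > 20) are assumptions made for illustration.

```java
class FactorialDemo {
    // Computes n! for n >= 0. The result overflows a long for n > 20.
    static long factorial(int n) {
        if (n < 0) {
            throw new IllegalArgumentException("n must be non-negative");
        }
        if (n <= 1) {
            return 1;                 // base cases: 0! = 1! = 1
        }
        return n * factorial(n - 1);  // recursive step: n! = n * (n-1)!
    }

    public static void main(String[] args) {
        System.out.println(factorial(5)); // 5 * 4 * 3 * 2 * 1
    }
}
```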
Solution
n = 0, x^n = 1
n > 0, x^n = x ∗ x^{n−1}
3^0 = 1;
3^5 = 3 × 3^4, 3^4 = 3 × 3^3, 3^3 = 3 × 3^2, 3^2 = 3 × 3^1, 3^1 = 3 × 3^0 = 3 × 1.
1. Base case
This must be a well defined termination
2. Inductive or recursive steps
This consists of well defined inductive (or recursive) steps that
must lead to a termination state.
Question: In previous examples, which are the base cases and which
are the inductive steps?
Recursive programs
3.3.1 Implementation
Solution
\begin{align*}
\sum_{j=1}^{5} j &= 5 + \sum_{j=1}^{4} j \\
&= 5 + 4 + \sum_{j=1}^{3} j \\
&= 5 + 4 + 3 + \sum_{j=1}^{2} j \\
&= 5 + 4 + 3 + 2 + \sum_{j=1}^{1} j \\
&= 5 + 4 + 3 + 2 + 1 + \sum_{j=1}^{0} j \\
&= 5 + 4 + 3 + 2 + 1 + 0 \\
&= 5 + 4 + 3 + \underbrace{2 + 1}_{3} \\
&= 5 + 4 + \underbrace{3 + 3}_{6} \\
&= 5 + \underbrace{4 + 6}_{10} \\
&= \underbrace{5 + 10}_{15} \\
&= 15
\end{align*}
Example 3.5
1. Base case:
The solution for 1-disk problem (n = 1):
Move the disk from peg A to peg C (Figure 3.2).
2. Inductive steps:
(a) The solution for 2-disk problem (n = 2):
i. Use the one-disk solution to move disk 1 to peg B,
ii. then move disk 2 to peg C and
iii. use the solution to the one-disk problem to move disk 1 to
peg C (Figure 3.3).
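The base case and the 2-disk step above generalise to n disks: move the top n − 1 disks to the spare peg, move disk n to the target, then move the n − 1 disks on top of it. The sketch below records the moves in a list; the class name HanoiDemo, the method signature and the text format of each move are assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

class HanoiDemo {
    // Appends to 'moves' the steps that transfer disks 1..n
    // from peg 'from' to peg 'to', using peg 'via' as the spare.
    static void solve(int n, char from, char via, char to, List<String> moves) {
        if (n == 1) {                        // base case: the 1-disk problem
            moves.add("disk 1: " + from + " -> " + to);
            return;
        }
        solve(n - 1, from, to, via, moves);  // clear the way for disk n
        moves.add("disk " + n + ": " + from + " -> " + to);
        solve(n - 1, via, from, to, moves);  // put the smaller disks back on top
    }

    public static void main(String[] args) {
        List<String> moves = new ArrayList<>();
        solve(3, 'A', 'B', 'C', moves);
        for (String m : moves) {
            System.out.println(m);
        }
    }
}
```

For n = 2 the recorded moves are exactly the three steps listed above; in general the method records 2^n − 1 moves.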
Example 3.8 Given the declarations for the linked list structure as
below, write a recursive procedure to search the list for a particular
item x and return a pointer to the item in the list if it is found. If the
item is not found in the list, a null pointer should be returned.
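Since the declarations referred to in Example 3.8 are not reproduced here, the following sketch assumes a hypothetical Node class with an integer data field and a next link. It shows one possible shape of the recursive search, with two base cases (empty list, item found) and one recursive step.

```java
class ListSearchDemo {
    // Hypothetical list node; the field names are assumptions.
    static class Node {
        int data;
        Node next;

        Node(int data, Node next) {
            this.data = data;
            this.next = next;
        }
    }

    // Returns the first node holding x, or null if x is not in the list.
    static Node search(Node list, int x) {
        if (list == null) {
            return null;              // base case: end of list, not found
        }
        if (list.data == x) {
            return list;              // base case: found
        }
        return search(list.next, x);  // recursive step: search the rest
    }

    public static void main(String[] args) {
        Node list = new Node(3, new Node(7, new Node(11, null)));
        System.out.println(search(list, 7) != null ? "found" : "not found");
        System.out.println(search(list, 5) != null ? "found" : "not found");
    }
}
```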
It is very important for you to make sure that both base cases and
inductive steps are correctly presented in recursive programs. We
show some common errors in recursive programs written by
students.
Correction
The procedure below is meant to display the data in a linked list one by
one.
Correction
Activity 3.3
RECURSION
1. Implement one of the recursive algorithms discussed in
Section 3.3 that:
Classical algorithms such as the binary search, merge sort and quick
sort are good examples of the divide and conquer approach.
Binary search
Let the sorted list be L[0..n − 1] and the key be X. The idea is to
check if X is the middle element of the list L[mid], where mid is the
index of the middle element. If not, L[mid] divides the list into two
halves, and only one half needs to be checked.
Let l, r be the indices of the first (leftmost) and the last
(rightmost) elements of a list respectively. The middle index can then
be defined as mid = ⌊(l + r)/2⌋ (here ⌊x⌋ reads ‘floor of x’, which
is the largest integer ≤ x; for example, ⌊2.96⌋ = 2).
Merge sort
l                r          Result in c
8 34 51 64       21 32
34 51 64         21 32      8
34 51 64         32         8 21
34 51 64                    8 21 32
                            8 21 32 34 51 64
In each iteration, the front elements of the two queues are compared;
the smaller one is then dequeued and enqueued to the result list c.
When one queue (r in this example) is empty, the rest of the other
(l = (34, 51, 64) in the example) is appended to the result list c.
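The merging step described above can be sketched with two queues as follows. The class name MergeDemo and the choice of java.util.ArrayDeque for the queues are assumptions of this example, not part of the guide.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

class MergeDemo {
    // Merges two sorted queues l and r into a sorted result list c.
    // Elements are removed from l and r while both are non-empty.
    static List<Integer> merge(Queue<Integer> l, Queue<Integer> r) {
        List<Integer> c = new ArrayList<>();
        while (!l.isEmpty() && !r.isEmpty()) {
            if (l.peek() <= r.peek()) {
                c.add(l.remove());   // front of l is the smaller
            } else {
                c.add(r.remove());   // front of r is the smaller
            }
        }
        c.addAll(l);                 // at most one of these is non-empty
        c.addAll(r);
        return c;
    }

    public static void main(String[] args) {
        Queue<Integer> l = new ArrayDeque<>(Arrays.asList(8, 34, 51, 64));
        Queue<Integer> r = new ArrayDeque<>(Arrays.asList(21, 32));
        System.out.println(merge(l, r));
    }
}
```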
Quick sort
Similar to Mergesort, Quicksort also splits a list into two parts,
but according to the value of a so-called pivot element. The pivot
can be any element of the array, for example a randomly selected
element or simply the first element. The pivot value is the guide item
with which all the other values are compared. On each recursive round,
we identify the correct location of the pivot such that all the
elements on its left are of smaller value than the pivot, and all the
elements on its right are of larger value than the pivot. In this way,
the pivot divides the given list into two parts.
Let the original list be list, the left part be lList and the right
part be rList. Quicksort is then applied recursively to the two parts,
lList and rList. The idea is that if both lList and rList are
sorted, the whole list, i.e. [lList] pivot [rList], is sorted.
Each line in the following table shows the content of Pivot, lList,
rList, Result on each recursive level in each iteration. We
select the first element as a pivot (underlined) each time.
We first select 33, the first element, as the pivot. The original list
is then divided into two sublists: lList = (26 29 19 12 22) and
rList = (35). Next the first element 26 of lList is selected as the
pivot and divides lList into lList = (19 12 22) and rList = (29).
Next 19 is the pivot and divides lList into lList = (12) and
rList = (22), each of which contains a single element and is therefore
sorted. Now the ‘bottom-up’ process begins: lList and rList are put
together to get a sorted sublist (12 19 22) in the format
[lList] pivot [rList]. The process continues until the whole list is
sorted.
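The following sketch follows the [lList] pivot [rList] description, always taking the first element as the pivot. It builds new lists rather than partitioning in place, which is a simplification chosen for readability. The class name QuickSortDemo is an assumption, and the input order (33, 26, 29, 19, 12, 22, 35) used in main is a guess that is merely consistent with the sublists quoted above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class QuickSortDemo {
    // Returns a new sorted list; the first element is used as the pivot.
    static List<Integer> quickSort(List<Integer> list) {
        if (list.size() <= 1) {
            return new ArrayList<>(list);   // base case: already sorted
        }
        int pivot = list.get(0);
        List<Integer> lList = new ArrayList<>();
        List<Integer> rList = new ArrayList<>();
        for (int x : list.subList(1, list.size())) {
            if (x < pivot) {
                lList.add(x);               // smaller values go left
            } else {
                rList.add(x);               // larger (or equal) values go right
            }
        }
        List<Integer> result = quickSort(lList);
        result.add(pivot);                  // [lList] pivot [rList]
        result.addAll(quickSort(rList));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(quickSort(Arrays.asList(33, 26, 29, 19, 12, 22, 35)));
    }
}
```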
We should therefore avoid the divide and conquer approach for the
following cases in general:
1. An instance of size n is divided into two or more instances of
similar size to n. Such a partition may lead to an exponential
time algorithm.
2. An instance of size n is divided into about n instances of size
n/c where c is a constant.
Note: Some problems have exponential time algorithms only (see
Chapter 8).
Dynamic programming
i 1 2 3 4 5 6 7 8 9 ···
fibonacciItem[i] 1 1 2 3 5 8 13 21 34 ···
Now for each i > 2, only one addition is required because the partial
results for both previous terms, fibonacciItem[i − 1] and
fibonacciItem[i − 2], are already available from the ‘program’ table.
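The table-filling idea can be sketched as below. The array name fibonacciItem follows the table above; the class name FibonacciDemo and the 1-based indexing with an unused cell 0 are assumptions made for this illustration.

```java
class FibonacciDemo {
    // Returns the table fibonacciItem[1..n] (index 0 is unused).
    static long[] fibonacciTable(int n) {
        long[] fibonacciItem = new long[Math.max(n, 2) + 1];
        fibonacciItem[1] = 1;
        fibonacciItem[2] = 1;
        for (int i = 3; i <= n; i++) {
            // Both previous terms are already in the table.
            fibonacciItem[i] = fibonacciItem[i - 1] + fibonacciItem[i - 2];
        }
        return fibonacciItem;
    }

    public static void main(String[] args) {
        long[] table = fibonacciTable(9);
        for (int i = 1; i <= 9; i++) {
            System.out.print(table[i] + " ");
        }
        System.out.println();
    }
}
```

Each entry is computed once with a single addition, so filling the table up to n takes O(n) time.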
First, we use the same recurrence as before, and divide the problem
of size n into two smaller subproblems of size n − 1 and n − 2.
Example 3.17 For the divide and conquer approach (top down):
5! = 5 ∗ 4!
= 5 ∗ 4 ∗ 3!
= 5 ∗ 4 ∗ 3 ∗ 2!
= 5 ∗ 4 ∗ 3 ∗ 2 ∗ 1!
For the dynamic programming approach (bottom up):
1! = 1
2! = 2 ∗ 1!
3! = 3 ∗ 2!
4! = 4 ∗ 3!
5! = 5 ∗ 4!
3.5.5 Observation
1. The dynamic programming approach is effective when the
problem can be reduced to several slightly smaller subproblems.
2. All the subproblems are computed and the results are stored in
a table to avoid repeating computation.
3. Dynamic programming often requires a large space.
Activity 3.5
Chapter 4
Having read this chapter and consulted the relevant material you
should be able to:
explain the importance of binary trees, graphs and heaps
implement the binary tree data structure in Java or other
languages
describe some applications of binary trees.
demonstrate how to represent a graph in computers
describe the two main algorithms of graph traversal
outline the algorithms for solving some classical graph
problems.
4.3 Trees
Trees are a very important and widely used abstract data structure
in computer science. Recall arrays, linked lists, stacks and queues.
Each of these data structures represents a relationship between data.
A tree represents a hierarchical relationship. Each node in a tree
spawns one or more branches that each leads to the top node of a
subtree. Almost all operating systems store sets of files in trees or
tree-like structures.
/usr
    /ida
        /research
            /paper
                soda06
                bctcs05
            sorting
        /mail
        /teaching
            cis206
            cis208
    /bill
    /martin
    ....
Trees are very rich in properties. We shall find out later that there
are actually many ways to define a tree.
root The top most node is called the ‘root’. For example, ‘usr’ is the
root of the (free) tree.
leaf A node that has no children is called a leaf. For example,
‘soda06’ is a leaf.
parent The predecessor node that every node has (except the root).
For example, ‘ida’ is the parent of ‘research’ and ‘research’ is
the parent of ‘sorting’.
child A successor node that each node has (except leaves). For
example, ‘sorting’ is a child of ‘research’.
siblings Successor nodes that share a common parent. For example,
‘sorting’ and ‘paper’ are siblings.
subtrees A subtree is a substructure of a tree. Each node in a tree
may be thought of as the root of a subtree. For example, ‘paper’
is the root of a three-node subtree consisting of ‘paper’,
‘soda06’ and ‘bctcs05’.
degree of a tree node The number of children (or subtrees) of a
node. For example, the node ‘ida’ has a degree of 3.
degree of the tree The maximum of the degrees of the nodes of the
tree, which is 3 in the example.
ancestors of a node All the nodes along the path from the root to
that node. For example, the ancestors of node ‘research’ are
‘usr’ and ‘ida’ (or ‘usr/ida’).
In addition,
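[Figure: an expression tree whose root ‘+’ has the subtree ‘∗’ (with
children A and B) on the left and the leaf C on the right.]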
An implementation view
There are actually many kinds of trees. We are only concerned with
rooted, labelled and binary trees here.
import java.io.*;
Example 4.4 In this example, we first display the data field value of
the root, then move to the left child of the root and display the data
field value, finally move to the right child of the left child of the root,
and display the value.
print(tree.data)
tree ← tree.left
print(tree.data)
tree ← tree.right
print(tree.data)
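[Figure: the expression tree referenced by tree. The root is ‘/’; its
left child is ‘∗’, whose two children are ‘+’ nodes with leaves a and
b; its right child is ‘−’ with leaves a and b.]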
[Figure: a tree viewed as a treeNode whose two links lead to subtree1
and subtree2.]
The three traversals, i.e. the preorder, inorder and postorder traversal
on an expression tree can result in three forms of arithmetic
expression, namely, prefix, infix and postfix form respectively.
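A short sketch can make the correspondence between the three traversals and the three notations concrete. The Node class, the method names and the sample tree for (a + b) ∗ c are assumptions made for illustration; the code also assumes that every internal node is a binary operator with exactly two children, and it adds parentheses to the infix form so that the grouping is not lost.

```java
class ExpressionTraversalDemo {
    // Hypothetical expression tree node: operators inside, operands at leaves.
    static class Node {
        String data;
        Node left, right;

        Node(String data) {
            this.data = data;
        }

        Node(String data, Node left, Node right) {
            this.data = data;
            this.left = left;
            this.right = right;
        }
    }

    // Preorder: node, left subtree, right subtree.
    static String prefix(Node t) {
        if (t.left == null) return t.data;
        return t.data + " " + prefix(t.left) + " " + prefix(t.right);
    }

    // Inorder: left subtree, node, right subtree (with parentheses).
    static String infix(Node t) {
        if (t.left == null) return t.data;
        return "(" + infix(t.left) + " " + t.data + " " + infix(t.right) + ")";
    }

    // Postorder: left subtree, right subtree, node.
    static String postfix(Node t) {
        if (t.left == null) return t.data;
        return postfix(t.left) + " " + postfix(t.right) + " " + t.data;
    }

    public static void main(String[] args) {
        Node tree = new Node("*", new Node("+", new Node("a"), new Node("b")), new Node("c"));
        System.out.println(prefix(tree));
        System.out.println(infix(tree));
        System.out.println(postfix(tree));
    }
}
```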
We use the standard procedures and functions defined earlier for trees
(Section 4.3.4), and stacks (Section 2.8.1), where simple variables l, r
point to the left and the right subtree respectively, T is the root of the
expression tree to be built.
[Figure: panels (a) step 1, (b) step 2, (c) step 3 and the later
steps, showing the stack contents as an expression tree is built from
the operands a, b, c, d, e and the operators + and ∗.]
Applications
There are too many tree applications to list completely here. We will
look at some of them in later chapters. The reader is encouraged to
study more examples in the text books.
Activity 4.3
TREES
1. What is the difference between a tree and a binary tree?
2. Draw an expression tree for each of the following expressions:
(a) 5
(b) (5+6*4)/2
(c) (5+6*4)/2-3/7
(d) 1+9*((5+6*4)/2-3/7)
(e) A × B − (C + D) × (P/Q)
3. Hand draw binary expression trees that correspond to the
expressions for which
Priority queues and heaps
Example 4.7 The binary tree in Figure 4.8 is a complete binary tree,
and a full binary tree in which all its leaves are at the same level.
12
6 10
14 1 5 8
i 1 2 3 4 5 6 7
A[i] 12 6 10 14 1 5 8
Example 4.8 Figure 4.9 is another complete binary tree in which all
the levels are full except the last level where the leaves are stored from
the left to the right without gap.
12
6 10
14 1 5 8
2 7
i 1 2 3 4 5 6 7 8 9
A[i] 12 6 10 14 1 5 8 2 7
Example 4.9 Figure 4.10 is a binary tree but not a complete binary
tree because the left most node is missing from the last level. An
incomplete binary tree leaves gaps in an array structure. The tree
structure can, of course, still be stored in an array but dummy
elements such as “%” are required.
i 1 2 3 4 5 6 7 8 9
A[i] 12 6 10 14 1 5 % % 7
12
6 10
14 1 5
The order property requires, for every node in the structure, the
value of both its child nodes must be smaller (or bigger) than the
value at the current node.
Example 4.10 Figure 4.11 shows a heap with the minimum value at
the root. The value at each node is smaller than either child.
[Figure 4.11: a min-heap drawn as a complete binary tree, with the
nodes numbered 1–8 in level order.]
i    1    2    3    4    5    6    7    8
A    1    6    5    14   12   10   8    15
or
A heap with the maximum value at the root, in which the value at each
node is larger than either child (Figure 4.12).
[Figure 4.12: a max-heap drawn as a complete binary tree, with the
nodes numbered 1–8 in level order.]
i    1    2    3    4    5    6    7    8
A    15   14   10   8    12   5    6    1
Note the complete binary tree in Figure 4.9 is not a heap because it
does not have the order property required.
4.4.2.1 Deletion
The smallest (or largest) element will always be at the root node.
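[Figure: deleting the minimum from a min-heap. (a)–(c) The root 1 is
removed from the heap (1; 6, 5; 14, 12, 8, 10; 15). (d) The last leaf
15 is moved to the root. (e) 15 is swapped with its smaller child 5.
(f) 15 is swapped with its smaller child 8, giving the heap
(5; 6, 8; 14, 12, 15, 10).]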
1. Remove the root element 15, and move element 1, the rightmost
leaf at the bottom level, to the root (Figure 4.14(a)).
2. Restore the order property by repeatedly swapping element 1 with
its larger child: first swap 1 at the root with its left child 14,
and then swap 1 with its right child 12, at which point the order
property is restored (Figure 4.14(b)).
[Figure 4.14: (a) the root 15 is removed and the last leaf 1 is moved
to the root, so that A[1..7] = (1, 14, 10, 8, 12, 5, 6); (b) 1 is
swapped first with 14, giving (14, 1, 10, 8, 12, 5, 6), and then with
12, giving (14, 12, 10, 8, 1, 5, 6).]
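The two deletion steps for a max-heap can be sketched on a 1-based array as follows. The class and method names (MaxHeapDeleteDemo, deleteMax, siftDown) are assumptions for this example; the code relies on the usual array layout of a complete binary tree, in which the children of A[i] are A[2i] and A[2i + 1].

```java
import java.util.Arrays;

class MaxHeapDeleteDemo {
    // A[1..n] is a max-heap (A[0] is unused). Removes and returns the
    // maximum; afterwards A[1..n-1] is a max-heap again.
    static int deleteMax(int[] A, int n) {
        int max = A[1];
        A[1] = A[n];              // move the last leaf to the root
        siftDown(A, 1, n - 1);    // restore the order property
        return max;
    }

    // Swaps A[i] downwards with its larger child until neither child is larger.
    static void siftDown(int[] A, int i, int n) {
        while (2 * i <= n) {
            int child = 2 * i;
            if (child < n && A[child + 1] > A[child]) {
                child++;          // the right child is the larger one
            }
            if (A[i] >= A[child]) {
                break;            // order property holds
            }
            int tmp = A[i];
            A[i] = A[child];
            A[child] = tmp;
            i = child;
        }
    }

    public static void main(String[] args) {
        int[] A = {0, 15, 14, 10, 8, 12, 5, 6, 1};
        System.out.println("removed " + deleteMax(A, 8));
        System.out.println(Arrays.toString(Arrays.copyOfRange(A, 1, 8)));
    }
}
```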
4.4.2.2 Insertion
Since heaps are complete trees, a new node can easily be added to
the first available location from the left at the bottom level. We then
check the order property and adjust internal nodes. This is done in a
‘bottom-up’ fashion. We first compare the value of the new node
with its parent. If it satisfies the order property, the addition
process is completed. If not, we swap it with its parent. The checking
process is then repeated on the new node on the level above. This
process continues until the order property is satisfied (or the new
element reaches the root position).
Example 4.13 Figure 4.15 shows (a) a binary heap and how the
order property is maintained after (b) a new element 2 is inserted at
the leftmost available location on the bottom level of the heap. (c)
2 is to be swapped with its parent 5 since 2 is smaller than 5, (d) 2
is to be swapped with its parent 4 because 2 is smaller than 4. (e) The
process ends because 2 is at the root position and the order property is
now restored.
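[Figure 4.15: (a) the heap (4; 6, 5; 14, 12, 8); (b) 2 is added;
(c) 2 on the bottom level, A = (4, 6, 5, 14, 12, 8, 2); (d) after
swapping 2 with 5, A = (4, 6, 2, 14, 12, 8, 5); (e) after swapping 2
with 4, A = (2, 6, 4, 14, 12, 8, 5).]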
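The bottom-up insertion procedure can be sketched for a min-heap stored in a 1-based array. The names MinHeapInsertDemo and insert are assumptions, and the caller is assumed to have left room for the new element at A[n + 1].

```java
import java.util.Arrays;

class MinHeapInsertDemo {
    // A[1..n] is a min-heap (A[0] is unused) and A[n+1] is free.
    // Inserts key and returns the new heap size n + 1.
    static int insert(int[] A, int n, int key) {
        int i = n + 1;
        A[i] = key;                          // first free location on the bottom level
        while (i > 1 && A[i] < A[i / 2]) {   // parent of A[i] is A[i / 2]
            int tmp = A[i];
            A[i] = A[i / 2];
            A[i / 2] = tmp;
            i = i / 2;                       // continue one level up
        }
        return n + 1;
    }

    public static void main(String[] args) {
        int[] A = {0, 4, 6, 5, 14, 12, 8, 0};
        int n = insert(A, 6, 2);
        System.out.println(Arrays.toString(Arrays.copyOfRange(A, 1, n + 1)));
    }
}
```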
[Figure 4.16: the heap after each step: (a) 10; (b) (10; 8);
(c)i (10; 8, 14); (c)ii (14; 8, 10); (d)i (14; 8, 10; 15);
(d)ii (14; 15, 10; 8); (d)iii (15; 14, 10; 8); followed by the
insertions of 12, 5, 6 and 1 on the bottom levels.]
Figure 4.16 shows the construction process from the initial state.
Starting from the root, a new element is inserted, one by one, to the
left most available position at the bottom level of the tree. (a) The
first element 10 is the root, (b) 8 is added as its left child, (c)i 14 is
10 8 14 15 12 5 6 1
10 8 14 15 12 5 6 1
10 8 14 15 12 5 6 1
10 8 14 15 12 5 6 1
14 8 10 15 12 5 6 1
14 8 10 15 12 5 6 1
14 15 10 8 12 5 6 1
15 14 10 8 12 5 6 1
15 14 10 8 12 5 6 1
15 14 10 8 12 5 6 1
15 14 10 8 12 5 6 1
Applications
Example 4.15 Sort a list of integers A[1..8] = (10, 8, 14, 15, 12, 5, 6, 1)
using a max-heap.
During the sorting process, the list is divided into two sections: A[1..k],
the heap and A[k + 1..n], the sorted part (in shade in
Figures 4.17–4.19). We shall each time remove the root element at
A[1], the largest integer from the current heap and insert it to location
k + 1, and k ← k − 1.
1. Figure 4.17(a).
2. Figure 4.17(b)
3. Figure 4.18(a).
4. Figure 4.18(b).
5. Figure 4.19(a).
[Figure 4.17: (a) the heap (15, 14, 10, 8, 12, 5, 6, 1), followed by
the array (1, 14, 10, 8, 12, 5, 6, 15); (b) the heap
(14, 8, 12, 6, 1, 10, 5), followed by the array
(5, 8, 12, 6, 1, 10, 14, 15).]
[Figure 4.18: (a) the heap (12, 8, 10, 6, 1, 5), followed by the array
(5, 8, 10, 6, 1, 12, 14, 15); (b) the heap (10, 8, 5, 6, 1), followed
by the array (1, 8, 5, 6, 10, 12, 14, 15).]
6. Figure 4.19(b).
7. Figure 4.19(c).
[Figure 4.19: (a) the heap (8, 6, 5, 1), followed by the array
(1, 6, 5, 8, 10, 12, 14, 15); (b) the heap (6, 1, 5), followed by the
array (5, 1, 6, 8, 10, 12, 14, 15); (c) the heap (5, 1), followed by
the sorted array (1, 5, 6, 8, 10, 12, 14, 15).]
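The sorting process of Example 4.15 can be sketched as a compact in-place heap sort. Note two assumptions: the initial max-heap is built here by the standard bottom-up method rather than by the one-by-one insertions of the construction example, so the intermediate heaps may differ from those in the figures, and the names HeapSortDemo, heapSort and siftDown are invented for this illustration. The final sorted array is the same.

```java
import java.util.Arrays;

class HeapSortDemo {
    // Sorts A[1..n] into ascending order, where n = A.length - 1 (A[0] unused).
    static void heapSort(int[] A) {
        int n = A.length - 1;
        for (int i = n / 2; i >= 1; i--) {
            siftDown(A, i, n);               // build a max-heap bottom-up
        }
        for (int k = n; k > 1; k--) {
            int tmp = A[1];                  // largest element of the heap A[1..k]
            A[1] = A[k];
            A[k] = tmp;                      // it joins the sorted part A[k..n]
            siftDown(A, 1, k - 1);           // restore the heap A[1..k-1]
        }
    }

    // Swaps A[i] downwards with its larger child within A[1..n].
    static void siftDown(int[] A, int i, int n) {
        while (2 * i <= n) {
            int child = 2 * i;
            if (child < n && A[child + 1] > A[child]) {
                child++;
            }
            if (A[i] >= A[child]) {
                break;
            }
            int tmp = A[i];
            A[i] = A[child];
            A[child] = tmp;
            i = child;
        }
    }

    public static void main(String[] args) {
        int[] A = {0, 10, 8, 14, 15, 12, 5, 6, 1};
        heapSort(A);
        System.out.println(Arrays.toString(Arrays.copyOfRange(A, 1, A.length)));
    }
}
```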
Activity 4.4
HEAPS
1. What is the difference between a binary tree and a tree in which
each node has at most two children?
2. What is the approximate number of comparisons of keys
required to find a target in a complete binary tree of size n?
3. What is a heap?
4. What is a priority queue?
5. Demonstrate, step by step, how to construct a max-heap for the
list of integers A[1..8] = (12, 8, 15, 5, 6, 14, 1, 10) (in the given
order), following Example 4.14.
6. Demonstrate, step by step, how to sort a list of integers (2, 4, 1,
5, 6, 7) using a max-heap, following Example 4.15.
4.5 Graphs
[Figure: the four sites Computing Services (c), Whitehead Building
(w), Library (l) and 25 St James (m), the cable costs between them
(5K, 7K, 9K, 10K, 20K and 30K, with 30K between l and m), and the
candidate cable layouts.]
We could then make a decision to install the cable for the route that
consists of direct connections (c,l), (c,m) and (c,w). The total cost
would be: 5 + 7 + 9 = 21K.
Definitions
There are two main classes of graphs: (undirected) graphs and directed
graphs. For graphs, each edge is a non-ordered pair of vertices,
e.g. (1,2) = (2,1), (2,3) = (3,2), etc. For directed graphs (digraphs),
each edge is an ordered pair of vertices, e.g. (1,2) ≠ (2,1),
(3,2) ≠ (2,3), etc. (Figure 4.23).
[Figure 4.23: a graph and a digraph, each with vertices 1–4 and edges
e1–e5.]
As for trees, some terms and concepts are introduced for discussions
on graphs:
[Figure: (a) a simple graph and (b) a non-simple graph on the
vertices 1–4.]
[Figure: (a) a connected graph and (b) a disconnected graph on the
vertices 1–8.]
Example 4.18 Figure 4.26(a) shows two labelled graphs and they are
different, but the two unlabelled graphs in Figure 4.26(b) are the
same. The graphs are connected and unweighted in both figures.
[Figure 4.26: (a) two different labelled graphs; (b) two identical
unlabelled graphs.]
Representation of graphs
1. Adjacency matrices
A graph G = (V, E) can be represented by a 0-1 matrix showing
the relationship between each pair of vertices of the graph. We
assign 1 or 0 depending on whether the two vertices are
connected by an edge or not.
Given a graph G = (V, E), let n be the number of vertices. The
adjacency matrix of the graph is a n × n matrix.
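$$A = \begin{pmatrix}
a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n,1} & a_{n,2} & \cdots & a_{n,n}
\end{pmatrix}$$
where
$$a_{i,j} = \begin{cases} 1 & \text{if } (i, j) \in E \\ 0 & \text{otherwise} \end{cases}$$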
[Figure 4.27: a graph and a digraph on the vertices c, w, l and m,
each with the edges e1, e2 and e3.]
Example 4.19 The adjacency matrix for the graph in Figure 4.27
is,
     c  w  m  l
c    0  0  0  1
w    0  0  1  1
m    0  1  0  0
l    1  1  0  0
and for the digraph in Figure 4.27 is,
     c  w  m  l
c    0  0  0  1
w    0  0  1  1
m    0  0  0  0
l    0  0  0  0
For a digraph,
$$b_{i,j} = \begin{cases} -1 & \text{if edge } j \text{ leaves vertex } i \\ 1 & \text{if edge } j \text{ enters vertex } i \\ 0 & \text{otherwise} \end{cases}$$
Example 4.20 The incidence matrix for the graph in Figure 4.27
is
     e1  e2  e3
c    1   0   0
w    0   1   1
m    0   0   1
l    1   1   0
and for the digraph in Figure 4.27 is,
     e1  e2  e3
c    −1  0   0
w    0   −1  −1
m    0   0   1
l    1   1   0
Note Incidence matrices are not suitable for any digraph with a
self-loop (‘loop’ for short).
3. Adjacency lists
In an adjacency list representation, a graph G = (V, E) is
represented by an array of lists, one for each vertex in V . For
each vertex u in V , the list contains all the vertices adjacent to u
in an arbitrary order, usually in increasing or decreasing order
for convenience.
Why an adjacency list?
It is space efficient for sparse graphs, where the number of edges
is much less than the square of the number of vertices.
Example 4.21 Suppose that we need to store the digraph with
few edges in Figure 4.27. Suggest a data structure for the digraph.
Solution An adjacency list can save space for a sparse graph
(Figure 4.29).
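To make the two representations concrete, the sketch below stores the digraph of Figure 4.27 as a 0-1 adjacency matrix and derives adjacency lists from it. The class name DigraphDemo, the method name and the numbering of the vertices c, w, m, l as 0, 1, 2, 3 are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

class DigraphDemo {
    // Converts an n x n 0-1 adjacency matrix into adjacency lists.
    // Vertices are numbered 0..n-1; each list is in increasing order.
    static List<List<Integer>> toAdjacencyLists(int[][] A) {
        List<List<Integer>> adj = new ArrayList<>();
        for (int i = 0; i < A.length; i++) {
            List<Integer> neighbours = new ArrayList<>();
            for (int j = 0; j < A.length; j++) {
                if (A[i][j] == 1) {
                    neighbours.add(j);   // there is an edge from i to j
                }
            }
            adj.add(neighbours);
        }
        return adj;
    }

    public static void main(String[] args) {
        // Rows and columns in the order c, w, m, l (numbered 0, 1, 2, 3).
        int[][] A = {
            {0, 0, 0, 1},
            {0, 0, 1, 1},
            {0, 0, 0, 0},
            {0, 0, 0, 0}
        };
        System.out.println(toAdjacencyLists(A));
    }
}
```

The matrix always uses n × n cells, whereas the lists hold one entry per edge, which is where the space saving for sparse graphs comes from.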
Implementations
     1  2  3  4
1    0  0  0  1
2    0  0  1  1
3    0  0  0  0
4    0  0  0  0

1 (c) → l
2 (w) → m → l
3 (m)
4 (l)
numerals. For our example (see the digraph in Figure 4.27), the
labels for c, w, m, l can be replaced by 1, 2, 3, 4 respectively. Hence
the data structure may be represented as follows:
With labels:            With numerals:
index  vertex  next     index  vertex  next
1      c       5        1              5
2      w       6        2              6
3      m                3
4      l                4
5      l                5      4
6      m       7        6      3       7
7      l                7      4
Graph algorithms
Activity 4.5
GRAPHS
1. Consider the adjacency matrix of a graph below:
0 1 1 0 0
0 0 1 0 0
A= 0 0 0 1 0
0 0 0 0 0
1 0 1 1 0
(a) Draw the graph
(b) Write the adjacency list for the graph
(c) Discuss the suitability of using an adjacency matrix and the
adjacency list for the graph. Justify your answer.
2. Using the adjacency matrix approach, write a program to store a
simple graph and display the graph.
Example 4.23
Store and Display
a simple graph
-------------------------
1. Store a graph
2. Display a graph
0. Quit
Please input your choice (0-2) >
Hint An easy approach may be:
(a) define a data structure for the graph
(b) decide a means to input the graph, for example, you may
i. type the entries of the adjacency matrix on the keyboard
ii. read the entries of the adjacency matrix from a text file
iii. generate a random adjacency matrix by a program. (Use a
random generator to generate a 0 or 1 uniformly at random for
each entry of the matrix.)
(c) write the main program or method with interfaces of the
sub-methods or procedures
(d) develop each part of the program.
Chapter 5
5.3 Traversal
for i ← 1; i ≤ 10; i++ do
    S[i] ← 0
end for
The graph traversal problem can be broadly divided into two types:
one is to visit every node of a graph and the other is to traverse
every edge of a graph. We consider only simple graphs, i.e. graphs
which contain no self-loops or parallel edges, in this subject guide.
[Figure: a graph on the vertices A, B, C, D, E, F, G and H.]
Solution
Note there may be other correct traversals and the result depends
on the implementation.
5.4 Searching
They focus on different issues although the main concerns are the
same. In this course unit, however, we are only concerned with
internal searching problems due to the time constraints.
Sequential search
The simplest way to search for a key in a list is to scan the whole
list, from the start to the end, until the key is found or the finish end
is reached. This is called sequential search.
We compare the key “7” with each element in the list from left to right.
List: 12 34 2 9 7 5
Step   Element compared with 7   Result
1      12                        not equal
2      34                        not equal
3      2                         not equal
4      9                         not equal
5      7                         found!
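A sequential search takes only a few lines. The sketch below is one possible version; the class name SequentialSearchDemo, the 0-based indices and the convention of returning −1 for ‘not found’ are assumptions of this example.

```java
class SequentialSearchDemo {
    // Returns the index of the first occurrence of key in list, or -1.
    static int search(int[] list, int key) {
        for (int i = 0; i < list.length; i++) {
            if (list[i] == key) {
                return i;       // found: stop scanning
            }
        }
        return -1;              // reached the end without finding key
    }

    public static void main(String[] args) {
        int[] list = {12, 34, 2, 9, 7, 5};
        System.out.println(search(list, 7));
    }
}
```

In the worst case (key absent or in the last position) all n elements are compared.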
Algorithm outline
Complexity analysis
Example 5.4 L = (3, 7, 11, 12, 15, 19, 24, 33, 41, 55), X = 20.
i      1   2   3   4   5   6   7   8   9   10    centre element   key
L[i]   3   7   11  12  15  19  24  33  41  55    15               20
L[i]                       19  24  33  41  55    33               20
L[i]                       19  24                19               20
L[i]                           24                24               20
20 is not found!
Algorithm outline
respectively, of the list at each stage, and mid as the index of the
centre element of the list. The main idea of binary search is outlined
in Algorithm 5.6, which does not work yet (why?):
Example 5.5 Suppose the key value to be searched is 20, again. Let us
demonstrate how the algorithm works by tracing the index values for
the locations low, high, mid and the flag found in the binary search
algorithm. Let low be 1 and high be some integer n initially. Suppose
mid = (low + high) DIV 2.
The algorithm ends when low > high. Since the flag found is still
False, the algorithm will report the searching result “not found”.
We can now modify the draft Algorithm 5.6, and derive the working
version in Algorithm 5.7.
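As a hedged illustration (not a reproduction of Algorithm 5.7, which is not shown here), an iterative binary search using the variables low, high, mid and found might look as follows. The class name BinarySearchDemo, the 1-based array with an unused cell 0, and the convention of returning 0 for ‘not found’ are assumptions made for this sketch.

```java
class BinarySearchDemo {
    // L[1..n] is sorted in ascending order (L[0] is unused).
    // Returns the index of X in L, or 0 if X is not found.
    static int search(int[] L, int n, int X) {
        int low = 1;
        int high = n;
        int mid = 0;
        boolean found = false;
        while (low <= high && !found) {
            mid = (low + high) / 2;      // integer division, i.e. DIV
            if (L[mid] == X) {
                found = true;
            } else if (L[mid] < X) {
                low = mid + 1;           // continue in the right half
            } else {
                high = mid - 1;          // continue in the left half
            }
        }
        return found ? mid : 0;
    }

    public static void main(String[] args) {
        int[] L = {0, 3, 7, 11, 12, 15, 19, 24, 33, 41, 55};
        System.out.println(search(L, 10, 20) == 0 ? "not found" : "found");
    }
}
```

For X = 20 the loop inspects L[5] = 15, L[8] = 33, L[6] = 19 and L[7] = 24 before low exceeds high, which matches the centre elements in Example 5.4.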
Complexity analysis
Binary search trees
Binary search trees (BST) are binary trees with an order property.
They are particularly useful for searching. Each node in a binary
search tree contains at least one key field of some rankable value.
For every node Y in the tree, the values of all the keys in the left
subtree are smaller than the key value of Y , and the values of all the
keys in the right subtree are larger than the key value of Y .
5
/ \
2 7
/ \
6 9
5 5
/ \ / \
6 2 2 6
/ \ / \
7 9 4 9
5
/ \
2 6
/ | \
7 9 10
Next, let us look at how to construct a binary tree for a given set of
data.
10 10 10 10 10 10 10
/ / / \ / \ / \ / \
2 2 2 12 2 12 2 12 2 12
\ \ \ \ / \
3 3 3 3 1 3
\ \ \
5 5 5
Solution
(a)         12          (b)  1
           /                  \
         10                    2
         /                      \
        5                        3
       /                          \
      3                            5
     /                              \
    2                                10
   /                                   \
  1                                     12
As you can see, the two BSTs constructed grow in one direction only,
the first one (a) to the left and the second one (b) to the
right. Such a tree is called a skewed (or degenerate) tree.
      5
     / \
    3   8
   /   / \
  1   6   9
We write down the nodes visited in each of the following tree traversals:
1. pre-order traversal: 5 3 1 8 6 9
2. post-order traversal: 1 3 6 9 8 5
3. in-order traversal: 1 3 5 6 8 9.
This means that given a list of integers, we can first store the
integers in a binary search tree, which takes O(n log n) time on
average (more if the tree degenerates), and then conduct an in-order
traversal to print out the integers in sorted order.
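The sorting idea can be sketched as follows: insert the integers one by one as leaves, then collect them with an in-order traversal. The TreeNode fields left, data and right follow the names used in this section; the class name BstDemo, the method signatures and the decision to ignore duplicate keys are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class BstDemo {
    static class TreeNode {
        TreeNode left;
        int data;
        TreeNode right;

        TreeNode(int data) {
            this.data = data;
        }
    }

    // Inserts key as a new leaf and returns the (possibly new) root.
    static TreeNode insert(TreeNode t, int key) {
        if (t == null) {
            return new TreeNode(key);
        }
        if (key < t.data) {
            t.left = insert(t.left, key);
        } else if (key > t.data) {
            t.right = insert(t.right, key);
        }
        return t;                        // duplicates are ignored
    }

    // In-order traversal: left subtree, node, right subtree.
    static void inorder(TreeNode t, List<Integer> out) {
        if (t == null) {
            return;
        }
        inorder(t.left, out);
        out.add(t.data);
        inorder(t.right, out);
    }

    public static void main(String[] args) {
        TreeNode t = null;
        for (int key : new int[]{5, 3, 8, 1, 6, 9}) {
            t = insert(t, key);
        }
        List<Integer> sorted = new ArrayList<>();
        inorder(t, sorted);
        System.out.println(sorted);
    }
}
```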
Implementation
Suppose that there are three fields of each treeNode, namely left,
data and right.
An easy way to construct a binary search tree for a list of data, e.g.
integers, is to store them one by one in the order of entry and let
each element be the left or right child of a leaf according to the
value of the key. In this way the shape of the binary tree will depend
on the order of entry of the integers as we discussed earlier.
[Figure: panels (a) and (b), each showing the binary search tree with
root 12, children 2 and 34, and the chain 9, 7, 5 below 2.]
We can define a new type treeNode for a binary search tree node. A
binary search tree can be identified simply by a variable, e.g. T of
the treeNode type.
Algorithm analysis
However, a binary search tree can be so “bad” that it does not have
any branching. This happens, for example, if the integers are inserted
in ascending order using the procedure insert (Algorithm 5.10),
starting with an empty tree. In this case, the height of the binary
search tree is O(n). Hence the function findNode needs O(n) time in
the worst case.
Activity 5.7
Chapter 6
Sorting
Having read this chapter and consulted the relevant material you
should be able to:
6.3 Introduction
6.4 Motivation
The idea behind Insertion Sort is natural and general. The analyses
for the worst case and average case are straightforward. We first
look at an example and then derive the algorithm. We then analyse
the algorithm for the worst case and average case. The
implementations of the algorithms for different data structures are
also discussed.
Example 6.2 Sort the integers (34, 8, 64, 51, 32, 21) into ascending
order.
The idea is to first look at the elements one by one and build up the
sorted list by inserting each element to the correct location.
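One common array-based form of this idea is sketched below; the class name InsertionSortDemo and the 0-based array are assumptions of this example. The prefix A[0..i−1] is always sorted, and A[i] is inserted into it by shifting larger elements one position to the right.

```java
import java.util.Arrays;

class InsertionSortDemo {
    // Sorts A into ascending order in place.
    static void insertionSort(int[] A) {
        for (int i = 1; i < A.length; i++) {
            int key = A[i];              // next element to insert
            int j = i - 1;
            while (j >= 0 && A[j] > key) {
                A[j + 1] = A[j];         // shift larger elements right
                j--;
            }
            A[j + 1] = key;              // correct location found
        }
    }

    public static void main(String[] args) {
        int[] A = {34, 8, 64, 51, 32, 21};
        insertionSort(A);
        System.out.println(Arrays.toString(A));
    }
}
```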
Implementation
Selection sort
Example 6.3 Sort the integers (34, 8, 64, 51, 32, 21) into ascending
order.
The idea is, for each element A[i], to find the MinKey (the minimum
key among A[i] and the elements after it) and its location m, and to
swap MinKey and A[i].
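A minimal array-based sketch of this idea follows. The variable m plays the role of the location of MinKey; the class name SelectionSortDemo and the 0-based array are assumptions.

```java
import java.util.Arrays;

class SelectionSortDemo {
    // Sorts A into ascending order in place.
    static void selectionSort(int[] A) {
        for (int i = 0; i < A.length - 1; i++) {
            int m = i;                       // location of the minimum so far
            for (int j = i + 1; j < A.length; j++) {
                if (A[j] < A[m]) {
                    m = j;
                }
            }
            int minKey = A[m];               // swap MinKey and A[i]
            A[m] = A[i];
            A[i] = minKey;
        }
    }

    public static void main(String[] args) {
        int[] A = {34, 8, 64, 51, 32, 21};
        selectionSort(A);
        System.out.println(Arrays.toString(A));
    }
}
```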
Implementation
6.7 Shellsort
The reason that Insertion sort can move items only one position at a
time is that it compares only adjacent elements. Shellsort compares
and sorts elements that are far apart. Instead of comparing only
adjacent keys, Shellsort uses an increment sequence h_1, h_2, ..., h_t,
where h_t = 1, and compares the items h_k apart at pass k, where
k = 1, ..., t.
Shell sort is named after its inventor Donald Shell. It is one of the
first algorithms to break the quadratic time barrier. However, it was
not until several years later that this breakthrough was proved.
Shell sort is sometimes called diminishing increment sort.
h_k ← h_{k−1} div 3 + 1
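The increment rule above can be turned into a short sketch. Starting the sequence from h = n (the length of the array) is an assumption made for this example, as is the class name ShellSortDemo; each pass is an insertion sort on elements that are h positions apart, and the last pass uses h = 1.

```java
import java.util.Arrays;

class ShellSortDemo {
    // Sorts A into ascending order using increments h <- h div 3 + 1.
    static void shellSort(int[] A) {
        int h = A.length;                    // assumed starting value
        do {
            h = h / 3 + 1;                   // next (smaller) increment
            for (int i = h; i < A.length; i++) {
                int key = A[i];
                int j = i - h;
                while (j >= 0 && A[j] > key) {
                    A[j + h] = A[j];         // shift within the h-sublist
                    j -= h;
                }
                A[j + h] = key;
            }
        } while (h > 1);                     // the final pass has h = 1
    }

    public static void main(String[] args) {
        int[] A = {34, 8, 64, 51, 32, 21};
        shellSort(A);
        System.out.println(Arrays.toString(A));
    }
}
```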
Algorithm analysis
Shellsort is simple to code but the analysis of its running time turns
out to be exceedingly difficult. This is mainly because the running
time of Shellsort depends on the choice of increment sequence. The
average case analysis of Shellsort is unknown, except for very trivial
increment sequences.
Summary of Shellsort
1. Shellsort sorts a list of n keys by successively sorting sublists
whose elements are intermingled in the whole list.
2. The sublists are determined by an increment sequence, h1 , . . . , ht .
3. Some choices of increment sequence are better than others.
Empirical studies show: for large n, the number of moves is in
the range of n^1.25 to 1.6n^1.25.
4. Shellsort is a substantial improvement over insertion sort in
general.
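A hedged Java sketch of Shellsort, assuming the increment rule
h ← h div 3 + 1 quoted earlier and assuming that the first increment is
derived from h = n (that starting point is our assumption, not the
guide's):

```java
class ShellSortSketch {
    // Sorts a in place using diminishing increments ending with h = 1.
    static void shellSort(int[] a) {
        int h = a.length;
        do {
            h = h / 3 + 1;                   // next (smaller) increment
            // h-sort: insertion sort on the elements that are h apart
            for (int i = h; i < a.length; i++) {
                int key = a[i];
                int j = i - h;
                while (j >= 0 && a[j] > key) {
                    a[j + h] = a[j];         // move an item h positions at once
                    j -= h;
                }
                a[j + h] = key;
            }
        } while (h > 1);                     // the last pass is plain insertion sort
    }
}
```

The final pass with h = 1 guarantees a sorted result; the earlier passes
only reduce the work left for it.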
6.8 Mergesort
The name Mergesort comes from the idea of taking two sorted lists
of elements and merging them into a single sorted list. This is used
for external sorting.
When merging two sorted lists l and r, the first entry of the result
will be the smaller of the first elements of the two lists. After it is
taken, the second element of the result list will be the smaller of
what remains. This process repeats until the end of one list is
reached. Finally the remains of the other list will be appended to the
result list (See example 3.12).
Implementation
We use l, r to trace the ‘current’ node on the two sorted lists lList
and rList respectively for comparison, and c points to the head of
the merged list and s traces the last node of the combined list.
(a) l 2 5 8 9
r 3 4
c 2 3 4 5 8 9
(b) l 5 8 9
r 3 4
c 2
s
Algorithm 6.9 rearranges two sorted lists l and r to a new sorted list
c. We assume that neither lList nor rList is empty. (Of course, if one
of them is empty, the combined list is simply the other one.)
Mergesort a list
The idea of Mergesort is to divide the list into halves (or two as
equal ones as possible), then sort the two sublists recursively.
Finally, the two sorted sublists are merged into one sorted list.
Mergesort is a good example of a recursive algorithm and divide and
conquer approach (see Example 3.13).
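The merging step and the recursive sort can be sketched in Java as
below. This is an array-based illustration under our own naming; the
guide's version works on linked lists with the pointers l, r, c and s.

```java
import java.util.Arrays;

class MergeSortSketch {
    // Merges two sorted arrays into a new sorted array.
    static int[] merge(int[] l, int[] r) {
        int[] c = new int[l.length + r.length];
        int i = 0, j = 0, k = 0;
        while (i < l.length && j < r.length) {
            c[k++] = (l[i] <= r[j]) ? l[i++] : r[j++];   // take the smaller front element
        }
        while (i < l.length) c[k++] = l[i++];            // append what remains of l
        while (j < r.length) c[k++] = r[j++];            // append what remains of r
        return c;
    }

    // Returns a sorted array with the elements of a.
    static int[] mergeSort(int[] a) {
        if (a.length <= 1) return a;                     // nothing to sort
        int mid = a.length / 2;                          // split into (nearly) equal halves
        int[] left = mergeSort(Arrays.copyOfRange(a, 0, mid));
        int[] right = mergeSort(Arrays.copyOfRange(a, mid, a.length));
        return merge(left, right);
    }
}
```

The copies made by copyOfRange and merge are the linear extra memory
that the method requires.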
Algorithm analysis
the list n.
A so-called recursion tree can be drawn to demonstrate that
the depth of the tree is log n.
The total number of comparisons done on each level ≤ n.
Therefore, the total number of comparisons is ≤ n log n, i.e.
O(n log n).
Summary of Mergesort
Although its running time is O(n log n), Mergesort is hardly ever
used for main memory sorts, for two reasons. Firstly, the method
requires linear extra memory.
Secondly, copying elements to the temporary array and back
throughout the algorithm slows the sort down considerably.
6.9 Quicksort
Instead of splitting the list by length, Quicksort splits the lists each
time by the key value of each item compared with a “standard”.
Such a standard element is called the pivot of the list and can be
randomly selected. All the items whose keys are smaller than pivot
will be relocated to its left sublist and all the items with keys that
are greater than the pivot will be relocated to its right (See
Example 3.15).
This idea works, but how do we decide the final location of the pivot?
i 1 2 3 4 5 6 7
(1) A[i] 33 26 35 29 19 12 22 A[l]>A[r]
l r
22 26 35 29 19 12 33
l r
22 26 35 29 19 12 33
l r
(2) 22 26 35 29 19 12 33 A[l]>A[r]
l r
22 26 33 29 19 12 35
l r
22 26 33 29 19 12 35
l r
(3) 22 26 33 29 19 12 35 A[l]>A[r]
l r
22 26 12 29 19 33 35
l r
22 26 12 29 19 33 35
l r
(4) 22 26 12 29 19 33 35
l r
In stage (2), l is moved toward the right until A[l] > A[r], and we
swap them. Now A[p] = A[l], so we move r one position to the left, i.e.
r ← r − 1.
In stage (3), r is moved toward the left, but failed because A[l] > A[r];
so we swap them. Now A[p] = A[r], and we move l one position to the
right, i.e. l ← l + 1.
Algorithms
Algorithm analysis
T (n) = T (i) + T (n − i − 1) + cn
Implementation
}
if (l<r) {
swap(A, p, r);
p=r; l++;
}
else p=l;
}
return p;
}
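Since only the tail of the partition routine survives above, the
following is a hedged reconstruction of the scheme traced in stages
(1) to (4): the pivot starts at the left end and swaps sides each time
an out-of-place key is found. The method names and the 0-based indexing
are our assumptions.

```java
class QuickSortSketch {
    static void swap(int[] a, int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }

    // Partitions a[l..r]; returns the final index of the pivot.
    static int partition(int[] a, int l, int r) {
        int p = l;                                   // the pivot starts at the left end
        while (l < r) {
            if (p == l) {
                // pivot on the left: move r leftwards past keys >= pivot
                while (l < r && a[p] <= a[r]) r--;
                if (l < r) { swap(a, p, r); p = r; l++; }
            } else {
                // pivot on the right: move l rightwards past keys <= pivot
                while (l < r && a[l] <= a[p]) l++;
                if (l < r) { swap(a, p, l); p = l; r--; }
            }
        }
        return p;
    }

    // Sorts a[l..r] in place.
    static void quickSort(int[] a, int l, int r) {
        if (l < r) {
            int p = partition(a, l, r);
            quickSort(a, l, p - 1);                  // keys no larger than the pivot
            quickSort(a, p + 1, r);                  // keys no smaller than the pivot
        }
    }
}
```

On the list 33 26 35 29 19 12 22 the partition leaves the pivot 33 in the
sixth position, as in stage (4) of the trace.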
Summary of Quicksort
1. Quicksort is another example of a recursive algorithm using the
divide and conquer strategy.
2. Quicksort is the fastest known general sorting algorithm in
practice on average.
3. The worst case is as bad as the worst case of selection sort
(O(n^2)).
General lower bounds for sorting
introduction).
Decision trees
A decision tree for sorting three keys a, b and c (T = true, F = false):
a ≤ b?
  T: b ≤ c?
    T: a ≤ b ≤ c
    F: a ≤ c?
      T: a ≤ c < b
      F: c < a ≤ b
  F: a ≤ c?
    T: b < a ≤ c
    F: b ≤ c?
      T: b ≤ c < a
      F: c < b < a
Path from the root to a leaf: the actions of the sorting algorithm
on a particular input.
Number of nodes on a path from the root to a leaf: the number
of comparisons on a particular input.
Depth of the tree: the number of comparisons for the worst case.
Average length of all the paths from the root to a leaf: the
number of comparisons for the average case.
The following theorem states the general lower bound for sorting
algorithms by comparisons.
Example 6.7 Suppose we know in advance that there are 100 items with
distinct integer keys within the range [15, 120]. The best
way to sort the 100 items is to define an array indexed 15..120. If a
particular item has key i, then place ‘1’ in location i. At the end of the
process, the table contains 100 ‘1’s. We can then display, in index
order, all the indices where a ‘1’ is stored.
A 1 0 1 1 ··· 0 1 0
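The table of ‘1’s can be sketched in Java with a boolean array. Java
arrays start at index 0, so the sketch stores key i at offset i − lo;
the method name and parameters are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;

class BitmapSortSketch {
    // Sorts distinct integer keys that are known to lie in [lo, hi].
    static List<Integer> sortDistinct(int[] keys, int lo, int hi) {
        boolean[] present = new boolean[hi - lo + 1];   // one flag per possible key
        for (int key : keys) present[key - lo] = true;  // place a '1' at location key
        List<Integer> sorted = new ArrayList<>();
        for (int i = 0; i < present.length; i++) {
            if (present[i]) sorted.add(i + lo);         // read the indices in order
        }
        return sorted;
    }
}
```

No key comparisons are made, which is how this method escapes the lower
bound for comparison-based sorting.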
Radix sort
The basic idea of radix sort is to make one pass through the entries,
placing each entry at the back of the x_i-th linked list, where x_i is the
ith digit of the entry’s key.
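A hedged Java sketch of this pass, repeated for every digit starting
from the least significant one. We assume non-negative decimal keys and
ten lists, one per digit value; these choices are ours.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class RadixSortSketch {
    // Sorts non-negative integers that have at most `digits` decimal digits.
    static void radixSort(int[] a, int digits) {
        int divisor = 1;
        for (int d = 0; d < digits; d++) {           // least significant digit first
            List<LinkedList<Integer>> lists = new ArrayList<>();
            for (int b = 0; b < 10; b++) lists.add(new LinkedList<>());
            for (int key : a) {
                int x = (key / divisor) % 10;        // current digit of the key
                lists.get(x).addLast(key);           // back of the x-th linked list
            }
            int k = 0;
            for (LinkedList<Integer> list : lists) { // collect lists 0..9 in order
                for (int key : list) a[k++] = key;
            }
            divisor *= 10;
        }
    }
}
```

Adding at the back of each list keeps entries with equal digits in their
previous order, which is what makes the later passes correct.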
Sorting large records
A practical solution is to use indirect sort, i.e. to have the input array
contain pointers to the records. We sort by comparing the keys the
pointers point to, and only swap pointers if necessary.
Records in A (fields field1, field2, key, field4) and the pointer
array p before sorting:
index  key  p[index]
1      23   1
2      12   2
3      28   3
4      30   4
5      7    5
5 1
5 1 3
5 1 3 4
5 2 3 4 1
5 2 3
5 2 3 4
5 2 1 4 3
5 2 1 4
5 2 1 3 4
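Indirect sorting can be sketched in Java with an array of indices
standing in for the pointers. The use of insertion sort on the pointers
is an assumption made for illustration; any comparison sort would do.

```java
class IndirectSortSketch {
    // Returns p such that key[p[0]] <= key[p[1]] <= ...; the records are not moved.
    static int[] indirectSort(int[] key) {
        int[] p = new int[key.length];
        for (int i = 0; i < p.length; i++) p[i] = i;   // p[i] initially points to record i
        for (int i = 1; i < p.length; i++) {           // insertion sort on the pointers
            int cur = p[i];
            int j = i - 1;
            while (j >= 0 && key[p[j]] > key[cur]) {
                p[j + 1] = p[j];                       // move a pointer, not a record
                j--;
            }
            p[j + 1] = cur;
        }
        return p;
    }
}
```

For the keys 23, 12, 28, 30, 7 the result in 0-based form is 4 1 0 2 3,
i.e. the pointer order 5 2 1 3 4 when records are numbered from 1.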
6.13 Heapsort
Recall that a binary heap (Section 4.4) is a binary tree with some
special properties including the structure property and order
property:
Ideas of Heapsort
If the keys can be stored in a heap, then we could get a sorted list by
repeatedly removing the key from the root (the minimum remaining
key), copying it to the output array, and rearranging the keys left in
the heap to reestablish the heap properties.
[Figure 6.5: a binary heap whose keys, in level order, are 1; 6, 5;
14, 12, 10, 8; 15.]
Example 6.10 We show how elements in the heap in Figure 6.5 can
be sorted. The heap is written in level order after each move or swap.
1. Remove the root 1 and copy it to Output. Move the leftmost leaf
15 at the bottom level (i.e. the last leaf in the table) to the root
and maintain the order property by swapping with its child with
the smaller data value (smaller child) if it exists, i.e. swap(15,5),
swap(15,8) (Figure 6.6).
Heap: 15 6 5 14 12 10 8 → 5 6 15 14 12 10 8 → 5 6 8 14 12 10 15
Output: 1
Figure 6.6: Remove 1 from the heap and reestablish the heap properties.
2. Remove the root 5 and copy to Output. Move the last leaf 15 to the
root and maintain the order property by swapping with its smaller
child if it exists, i.e. swap(15,6), swap(15,12) (Figure 6.7).
Heap: 15 6 8 14 12 10 → 6 15 8 14 12 10 → 6 12 8 14 15 10
Output: 1 5
3. Remove the root 6 and copy to Output. Move the last leaf 10 to
the root and maintain the order property by swapping with its
smaller child if it exists, i.e. swap(10,8) (Figure 6.8).
Heap: 10 12 8 14 15 → 8 12 10 14 15
Output: 1 5 6
4. Remove the root 8 and copy to Output. Move the last leaf 15 to
the root and maintain the order property by swapping with its
smaller child if it exists, i.e. swap(15,10) (Figure 6.9).
Heap: 15 12 10 14 → 10 12 15 14
Output: 1 5 6 8
5. Remove the root 10 and copy to Output. Move the last leaf 14 to
the root and maintain the order property by swapping with its
smaller child if it exists, i.e. swap(14,12) (Figure 6.10).
Heap: 14 12 15 → 12 14 15
Output: 1 5 6 8 10
6. Remove the root 12 and copy to Output. Move the last leaf 15 to
the root and maintain the order property by swapping with its
smaller child if it exists, i.e. swap(15,14) (Figure 6.11).
Heap: 15 14 → 14 15
Output: 1 5 6 8 10 12
7. Remove the root 14 and copy to Output. Move the last leaf 15 to
the root. Since 15 is the only element in the heap, we copy it to the
output (Figure 6.12).
Heap: 15
Output: 1 5 6 8 10 12 14, then 1 5 6 8 10 12 14 15
Implementation
There are ways other than the usual linked structures to implement
binary trees. For example, we could store a complete binary tree in
a one-dimensional array A by first labelling the tree nodes,
beginning with the root, from left to right on each level, then storing
each node in the position shown by its label.
Note: the parent of an element A[i] is A[i div 2] if i > 1; the left
child of A[i] is A[2i] and the right child is A[2i + 1] if they exist.
[Figure: the heap of Figure 6.5 with its nodes labelled 1 to 8 level by
level, and the array that stores it.]
index 1 2 3 4  5  6  7 8
A     1 6 5 14 12 10 8 15
[Figure: a heap with the maximum key at the root, labelled in the same
way, and the array that stores it.]
index 1  2  3  4 5  6 7 8
A     15 14 10 8 12 5 6 1
addRoot: insert an element into the heap as a root and restore the
heap properties.
buildHeap: construct the initial heap from a list of items (keys) in
arbitrary order.
We copy Algorithm 4.8 and 4.9 here (Algorithm 6.16 and 6.17) for
convenience of discussion:
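The algorithms themselves are not reproduced at this point, so the
following Java sketch is only our reading of addRoot, buildHeap and
heapSort. It assumes a heap with the maximum key at the root, stored in
A[1..n] with A[0] unused, which is the arrangement the trace in
Example 6.12 follows; the parameter lists are assumptions.

```java
class HeapSortSketch {
    // Sifts a[i] down within a[1..n] so that the subtree rooted at i is a max-heap.
    static void addRoot(int[] a, int i, int n) {
        int root = a[i];
        int j = 2 * i;                               // left child of i
        while (j <= n) {
            if (j < n && a[j + 1] > a[j]) j++;       // pick the larger child
            if (root >= a[j]) break;                 // order property holds
            a[j / 2] = a[j];                         // move the child up
            j = 2 * j;
        }
        a[j / 2] = root;
    }

    // Builds a max-heap from a[1..n] in arbitrary order.
    static void buildHeap(int[] a, int n) {
        for (int k = n / 2; k >= 1; k--) addRoot(a, k, n);
    }

    // Sorts a[1..n] into ascending order; a[0] is unused.
    static void heapSort(int[] a, int n) {
        buildHeap(a, n);
        for (int k = n; k >= 2; k--) {
            int t = a[1]; a[1] = a[k]; a[k] = t;     // move the maximum to its final place
            addRoot(a, 1, k - 1);                    // restore the heap on a[1..k-1]
        }
    }
}
```

Because the sorted keys accumulate at the end of the same array, no
separate output array is needed.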
Example 6.12 Suppose an integer list L = (1, 6, 5, 14, 8, 10, 12, 15) is
stored in array A. Trace the content of the array A on execution of
each iteration of the algorithm heapSort(L).
The content of array A and the values of k, i, j are listed below. You
should attempt the task of tracing the content first and the following
answer is best used for checking your own solution.
1. Call buildHeap():
index 1 2 3 4 5 6 7 8 k i j
k
A[index] 1 6 5 14 8 10 12 15
i j 4 4 8
A[index] 1 6 5 15 8 10 12 14 8 16 >n=8
k
A[index] 1 6 5 15 8 10 12 14
i j 3 3 6
j 7
A[index] 1 6 12 15 8 10 5 14 7 14 >n
k
A[index] 1 6 12 15 8 10 5 14
i j 2 2 4
A[index] 1 15 12 6 8 10 5 14
i j 4 8
A[index] 1 15 12 14 8 10 5 6 8 16 >n
k
A[index] 1 15 12 14 8 10 5 6
i j 1 1 2
A[index] 15 1 12 14 8 10 5 6
i j 2 4
A[index] 15 14 12 1 8 10 5 6
i j 4 8
A[index] 15 14 12 6 8 10 5 1 8 16 >n
k
A[index] 14 8 12 6 1 10 5 15 7
A[index] 5 8 12 6 1 10 14 15
i j 1 2
j 3
A[index] 12 8 5 6 1 10 14 15
i j 3 6
A[index] 12 8 10 6 1 5 14 15 6 12 >k-1
A[index] 12 8 10 6 1 5 14 15 6
A[index] 5 8 10 6 1 12 14 15
i j 1 2
j 3
A[index] 10 8 5 6 1 12 14 15 3 6 >k-1
k
A[index] 10 8 5 6 1 12 14 15 5
A[index] 1 8 5 6 10 12 14 15
i j 1 2
A[index] 8 1 5 6 10 12 14 15
i j 2 4
A[index] 8 6 5 1 10 12 14 15 4 8 >k-1
k
A[index] 8 6 5 1 10 12 14 15 4
A[index] 1 6 5 8 10 12 14 15
i j 1 2
A[index] 6 1 5 8 10 12 14 15 2 4 >k-1
k
A[index] 6 1 5 8 10 12 14 15 3
A[index] 5 1 6 8 10 12 14 15
i j 1 2
A[index] 5 1 6 8 10 12 14 15 2 4 >k-1
k
A[index] 1 5 6 8 10 12 14 15 2 1 2 >k-1
Activity 6.13
S ORTING
1. Trace through the steps, by hand, or implementation of a
program, that each of the sorting algorithms in Chapter 6 will
use on each of the following lists. In each case, count the
number of comparisons that will be made and the number of
times an item will be moved.
Chapter 7
Optimisation problems
Having read this chapter and consulted the relevant material you
should be able to:
have (the rows of a matrix are separated by ‘;’):
Matrices    Operation                                          Num of ×   Num of +
AB          [2; 3] [2 1 3 1 3]                                 2×1×5      2×0×5
(AB)C       [4 2 6 2 6; 6 3 9 3 9] [1 2; 2 3; 1 2; 2 3; 1 2]   2×5×2      2×4×2
((AB)C)D    [24 44; 36 66] [2 2 1; 3 2 1]                      2×2×3      2×1×3

Matrices                                 Num of ×                 =
((A_{2×1} B_{1×5}) C_{5×2}) D_{2×3}      2×1×5 + 2×5×2 + 2×2×3    42
(A_{2×1} (B_{1×5} C_{5×2})) D_{2×3}      1×5×2 + 2×1×2 + 2×2×3    26
A_{2×1} ((B_{1×5} C_{5×2}) D_{2×3})      1×5×2 + 1×2×3 + 2×1×3    22
A_{2×1} (B_{1×5} (C_{5×2} D_{2×3}))      5×2×3 + 1×5×3 + 2×1×3    51
(A_{2×1} B_{1×5}) (C_{5×2} D_{2×3})      2×1×5 + 5×2×3 + 2×5×3    70
We now can conclude that A((BC)D) is the best order for the matrix
multiplication because it requires the minimum number (22) of
multiplications.
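The comparison of all orders can be automated. The sketch below simply
tries every way of splitting the product at the top level, recursively;
it is an exhaustive illustration under our own naming, not an efficient
method.

```java
class MatrixChainSketch {
    // Matrix i has shape dims[i] x dims[i + 1]. Returns the minimum number of
    // scalar multiplications needed to compute the product of matrices i..j.
    static int minMultiplications(int[] dims, int i, int j) {
        if (i == j) return 0;                       // a single matrix needs no work
        int best = Integer.MAX_VALUE;
        for (int k = i; k < j; k++) {               // try every top-level split point
            int cost = minMultiplications(dims, i, k)
                     + minMultiplications(dims, k + 1, j)
                     + dims[i] * dims[k + 1] * dims[j + 1];
            best = Math.min(best, cost);
        }
        return best;
    }
}
```

For the four matrices above the shapes are encoded as
dims = {2, 1, 5, 2, 3}, and minMultiplications(dims, 0, 3) evaluates to
22, in agreement with the table.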
We want to have a nice holiday but are only allowed to take one
knapsack. Suppose that we have n objects of different sizes and
values. We now can only take some of them with us. So here is the
question:
example, the most favourite objects, the most useful tools, or simply
the most expensive items. The goal is, however, to select a subset of
the objects with the maximum total value to fit in the knapsack. The
total size of the objects must not exceed the size limit of the
knapsack.
An easy way to find the most valuable set is to list all the
possibilities. This approach is called exhaustive-search. We start from
the most expensive item. For each object, we will see if any other
object can fit in the remaining room. For example, we first take
object 5, the most valuable single item and put it in the knapsack.
We then take object 4, the next most valuable single item, but found
it cannot fit in. So, we try object 3, the next most valuable item, and
then object 2, and 1, etc.
We have:
Now we can conclude that we can take two objects 1 and 5, or two
objects 3 and 4, or three objects 1, 2 and 3.
Table 7.1:
We have
Subset Total size Total value
5,4 16 not fit
5,3 14 not fit
5,2 13 not fit
5,1 12 5 + 80 = 85
4,3 12 40 + 10 = 50
4,2 11 60
4,1 10 40 + 80 = 120
3,2 9 30
3,1 8 90
2,1 7 100
4,2,1 14 not fit
3,2,1 12 10 + 20 + 80 = 110
3,1,4 15 not fit
2,1,5 16 not fit
It is interesting to see that the most valuable subset of the objects does
not have to be the objects that fill exactly the knapsack any more.
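Exhaustive search is easy to sketch in Java by letting the bits of an
integer stand for the chosen subset. The sizes 3, 4, 5, 7, 9, the values
80, 20, 10, 40, 5 and the capacity 12 used in the usage note are
inferred from the totals in Table 7.1 and should be treated as
assumptions.

```java
class KnapsackSketch {
    // Tries every subset (as a bit mask) and returns the best total value that fits.
    static int bestValue(int[] size, int[] value, int capacity) {
        int n = size.length;
        int best = 0;
        for (int mask = 0; mask < (1 << n); mask++) {
            int totalSize = 0, totalValue = 0;
            for (int i = 0; i < n; i++) {
                if ((mask & (1 << i)) != 0) {       // object i + 1 is in this subset
                    totalSize += size[i];
                    totalValue += value[i];
                }
            }
            if (totalSize <= capacity) best = Math.max(best, totalValue);
        }
        return best;
    }
}
```

With those inferred figures the best value is 120 (objects 4 and 1,
total size 10). The loop runs 2^n times, so the method is only practical
for small n.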
The knapsack problem has many versions because the size can be
interpreted as anything that can be a limit, for example, the weight,
or the size in one or more dimensions. The values can be
interpreted as anything that can be measured quantitatively. The
question to be asked can be to fit in (≤), to fit or fill exactly (=),
or to fill at least (≥) the capacity limit of literally anything.
You go shopping and you need to pay, say £2.54. Handing over £10
cash, you would receive change of £7.46. This consists of, for
example, coins: 7 × £1, 4 × 10p, 1 × 5p and 1 × 1p, a total of
7 + 4 + 1 + 1 = 13 coins. Alternatively, you may receive: 14 × 50p,
2 × 20p and 6 × 1p coins, a total of 14 + 2 + 6 = 22 coins.
Suppose that you hate carrying coins around, hence you want to
have change consisting of as few coins as possible.
The question for the optimisation problem is: what choice of the
change consists of the minimum number of coins?
We will consider the changes for each digit from the highest value digit
to the lowest one. The problem is therefore divided as follows:
1. Make change of £5
2. Make change of 60p
3. Make change of 3p.
Using recursion
Let the coins be of values 200p, 100p, 50p, 20p, 10p, 5p, 2p and 1p.
What is the minimum number of coins needed to make Xp of
change?
i 0 1 2 3 4 5 6 7
coins[i] 200 100 50 20 10 5 2 1
Base case:
1. If a change of 0p is required, the number of coins for the
change is 0;
2. Otherwise, if a change of a coin value, i.e. 200p, 100p, 50p,
20p, 10p, 5p, 2p or 1p, is required, the number of coins for
the change is 1.
Induction:
Otherwise,
1. If a change is smaller than a coin value (e.g. coins[i]),
meaning the coin is too large to be included in the change,
then the number of coins for the change is the number of
coins with values ranging from the next coin value
coins[i+1], coins[i+2] · · · coins[7]=1p.
2. Otherwise, this is the case when a change is larger than a
coin value coins[i]. So the minimum number of coins
required for the change is K plus the number of coins
required for the difference between the change and
K × coins[i], where K = change div coins[i], is the
maximum multiple of the current coin value for the change.
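The base case and induction above translate almost line by line into a
recursive Java method. The class and method names are illustrative; the
second base case needs no separate branch because it is covered by the
general step with K = 1.

```java
class ChangeSketch {
    static final int[] COINS = {200, 100, 50, 20, 10, 5, 2, 1};   // values in pence

    // Number of coins used for `change` pence, considering COINS[i..7] only.
    static int numCoins(int change, int i) {
        if (change == 0) return 0;                               // base case
        if (change < COINS[i]) return numCoins(change, i + 1);   // coin too large
        int k = change / COINS[i];                               // K = change div coins[i]
        return k + numCoins(change - k * COINS[i], i + 1);
    }
}
```

For example, numCoins(746, 0) gives 8 (3 × 200p, 100p, 2 × 20p, 5p and
1p). The recursion always ends because the last coin is 1p.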
import java.util.Scanner;
class changeTest {
Greedy approach
The greedy approach can be seen from the coin change problem in
Example 7.5. When we selected coins in each step, we followed a
strategy of choosing the best possible partial solution for that step.
For example, in step 1, we chose three coins (2 £2-coins and 1
£1-coin) instead of five coins (5 £1-coins), etc. Although we did not
know the final result, we made the effort towards the final goal of
choosing as few coins as possible in each step.
Frequency table
Count each symbol in the text and we have the frequency table for the
text:
Character Frequency
B 3
I 1
L 2
E 2
A 1
T 1
S 1
N 1
(space) 2
. 1
Total 15
Sort by frequency:
Character Frequency
B 3
L 2
E 2
(space) 2
I 1
A 1
T 1
S 1
N 1
. 1
Total 15
Huffman’s ideas
In step (1), symbols S and N, the two symbols with the lowest
frequency are combined to form a combination SN with a frequency
of 2, the total frequencies of the two single symbols.
This process continues until all the symbols are combined to one
symbol ‘(((SN)L)EI)(B(AT))’ in step (8).
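The combining process can be sketched in Java with a priority queue that
always yields the two lowest-frequency nodes. The class layout is our
assumption, and ties between equal frequencies may be broken differently
from the figure, so individual code words can differ while the total
encoded length stays the same.

```java
import java.util.Map;
import java.util.PriorityQueue;

class HuffmanSketch {
    static class Node {
        final int weight; final char symbol; final Node left, right;
        Node(int weight, char symbol, Node left, Node right) {
            this.weight = weight; this.symbol = symbol; this.left = left; this.right = right;
        }
        boolean isLeaf() { return left == null; }
    }

    // Repeatedly combines the two lowest-frequency nodes until one tree remains.
    static Node buildTree(Map<Character, Integer> freq) {
        PriorityQueue<Node> queue =
            new PriorityQueue<>((x, y) -> Integer.compare(x.weight, y.weight));
        for (Map.Entry<Character, Integer> e : freq.entrySet()) {
            queue.add(new Node(e.getValue(), e.getKey(), null, null));
        }
        while (queue.size() > 1) {
            Node a = queue.poll(), b = queue.poll();
            queue.add(new Node(a.weight + b.weight, '\0', a, b));  // combined symbol
        }
        return queue.poll();
    }

    // Walks the tree, appending 0 for a left edge and 1 for a right edge.
    static void assignCodes(Node node, String prefix, Map<Character, String> codes) {
        if (node.isLeaf()) { codes.put(node.symbol, prefix); return; }
        assignCodes(node.left, prefix + "0", codes);
        assignCodes(node.right, prefix + "1", codes);
    }
}
```

For the frequencies B 3, L 2, E 2, I 1, A 1, T 1, S 1, N 1 the root has
weight 12 and the encoded text needs 35 bits in total, one bit fewer
than a 3-bit fixed-length code.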
[Figure: the eight steps of building the Huffman tree for the symbols
B, L, E, I, A, T, S, N with frequencies 3, 2, 2, 1, 1, 1, 1, 1. The
final tree (((SN)L)EI)(B(AT)) has total weight 12; 0 labels each left
edge and 1 each right edge. The resulting codes in step (8) are:]
Symbol B  L   E   I   A   T   S    N
Code   10 001 010 011 110 111 0000 0001
Figure 7.3 shows the first seven steps of decoding the symbols S and
E. The decoder reads the 0s or 1s bit by bit. The ‘current’ bit is
highlighted by shading in the sequence to be decompressed on each
step. The edge chosen by the decompression algorithm is marked as
a bold line. For example, in step (1), starting from the root of the
Huffman tree, we move along the left branch one edge down to the
left child since a bit 0 is read. In step (2), we move along the left
branch again to the left child since a bit 0 is read, and so on. When
we reach a leaf, for example, in step (4), the symbol (the bold ‘S’) at
the leaf is output. This process starts from the root again (5) until
step (7) when another leaf is reached and the symbol ‘E’ is output.
The decoding process ends when EOF is reached for the entire string.
[Figure 7.3: decoding the bit sequence 00000100001 with the Huffman
tree; step (7) is shown.]
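Decoding can be sketched in Java without an explicit tree: because no
code word is a prefix of another, collecting bits until they equal a
code word is equivalent to walking from the root to a leaf. The code
table below is the one read off the final Huffman tree (B 10, L 001,
E 010, I 011, A 110, T 111, S 0000, N 0001); the map-based lookup is our
simplification.

```java
import java.util.HashMap;
import java.util.Map;

class HuffmanDecodeSketch {
    // Decodes a bit string with a prefix code given as (code word -> symbol).
    static String decode(String bits, Map<String, Character> symbolOf) {
        StringBuilder out = new StringBuilder();
        StringBuilder path = new StringBuilder();      // edges followed since the root
        for (char bit : bits.toCharArray()) {
            path.append(bit);
            Character symbol = symbolOf.get(path.toString());
            if (symbol != null) {                      // a leaf has been reached
                out.append(symbol);
                path.setLength(0);                     // start again from the root
            }
        }
        return out.toString();
    }

    static Map<String, Character> exampleCodes() {
        Map<String, Character> m = new HashMap<>();
        m.put("10", 'B');  m.put("001", 'L'); m.put("010", 'E'); m.put("011", 'I');
        m.put("110", 'A'); m.put("111", 'T'); m.put("0000", 'S'); m.put("0001", 'N');
        return m;
    }
}
```

With this table the first seven bits of 00000100001 decode to S and E,
and the remaining four bits to N.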
Observation
For such a small alphabet, it would be much easier to use 3-bit fixed
length code instead of Huffman codes. However, in a real
application such as text in extended (8-bit) ASCII, the alphabet would
contain 256 symbols.
Activity 7.4
G REEDY APPROACH
1. Describe the main characteristics of optimisation problems.
2. What is the greedy approach? Outline the ideas of the approach.
3. Give an example of an optimisation problem and describe it.
4. Study the greedy approach for the change-making problem and
describe it. Describe an instance of the optimisation problem for
which the greedy algorithm does not yield an optimal solution.
5. Design and implement an algorithm for the knapsack problem
in Section 7.3.2.
6. Implement compression and decompression parts of the
Huffman algorithm.
Chapter 8
Limits of computing
Having read this chapter and consulted the relevant material you
should be able to:
If you have not studied the Turing machine, you only need to know
the following in order to follow the rest of the discussion in this
chapter:
A Turing machine is a theoretical computation model. It consists of a
limitless memory for both data and algorithms, and a moving head
that can read from and write to each cell of the memory following the
instructions in the algorithm. It can simulate arbitrary algorithms or
programs without consequential loss of efficiency.
Decision problems
Example 8.2
Input: k, a number
Output: 1 if k is a positive number; 0 otherwise.
1. Given a key and a list of elements, search for the key in the list of
the elements. Return the index of the key if it is found in the list
and return a null otherwise.
2. Given a key and a list of elements, search for the key in the list of
the elements. Return a ‘yes’ if the key is found in the list and
return a no, otherwise.
Example 8.4
Problem: Given a graph G, find the shortest tour that visits
every vertex precisely once.
Decision version: Given a graph G and a positive integer k,
is there a ‘tour’ of all vertices of G with
total distance at most k?
The difference between the two versions does not affect the problem
complexity. The decision version requires an input k as a constant,
but the length of tour for the original problem is not required.
Example 8.5
Problem: Given a graph G, determine the smallest number
of colours needed to colour the vertices of G.
Decision version: Given a graph G and a positive integer k,
is there a colouring of G using at most k colours?
Again the difference between the two versions does not affect the
problem complexity. Similarly, the decision version requires an input
k as a constant, but the number of colours for the original problem
is not required.
1. It is possible that the algorithm for a problem has not yet been
found, or there is simply no algorithm for the problem
2. An algorithm behaves differently in various cases, typically, the
best, worst and average case
3. An algorithm may be known but much less efficient than the
optimal solution.
Problem classes
We can now divide problems into two broad classes, namely, open
problems and closed problems, according to the gap between the
upper and lower bounds of the problems.
Note:
8.8 Class P
Why polynomials
1. Polynomials have nice so-called closure properties under
composition and addition. If a polynomial time algorithm
(feasible algorithm) calls another feasible algorithm as a
component, the composed algorithm is again a polynomial
algorithm. If two feasible algorithms run one after another, the
whole algorithm is again a polynomial algorithm.
2. All sequential digital computers are polynomially related. If a
problem is solvable in polynomial time on one conventional
computer, it can then be solved in polynomial time on another
computer.
3. We can say that if a problem is not in P, it will be extremely
expensive and probably impossible to solve in practice. In
general, if an algorithm is of exponential complexity (or worse)
then it is feasible for small inputs only.
Note: the last observation is an empirical one and it does not hold in
all cases. An algorithm may take a few minutes in practice in the
average case but exponential time in the worst case. Conversely, most
polynomial algorithms whose running time grows faster than cubic are
almost useless for a large input size in practice.
8.9 Class N P
1. Composite
Example 8.6
CHECK(f )
Input: Input instance I for f , possible witness W that f (I) = 1.
Output: 1 if W is a genuine witness that f (I) = 1, 0 otherwise.
Now
CHECK(Composite)
CHECK(SAT )
CHECK(HC)
CHECK(CLIQUE)
CHECK(3-COL)
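Two of these checkers can be sketched in Java to show why checking a
witness is easy even when finding one may be hard. The choice of
witness (a divisor for Composite, a colour per vertex for 3-COL) and the
method signatures are our assumptions.

```java
class CheckSketch {
    // CHECK(Composite): the witness w is a claimed non-trivial divisor of n.
    static int checkComposite(long n, long w) {
        return (w > 1 && w < n && n % w == 0) ? 1 : 0;
    }

    // CHECK(3-COL): the witness gives a colour (0, 1 or 2) for each vertex;
    // edges[k] = {u, v} lists the edges of the graph.
    static int check3Col(int[][] edges, int[] colour) {
        for (int c : colour) {
            if (c < 0 || c > 2) return 0;                 // not one of the three colours
        }
        for (int[] e : edges) {
            if (colour[e[0]] == colour[e[1]]) return 0;   // adjacent vertices clash
        }
        return 1;
    }
}
```

Both checks run in time polynomial in the size of the instance and the
witness: one division for Composite, one pass over the edges for 3-COL.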
8.10 P and N P
Hamiltonian cycle
Satisfiability
3-Colouring
Clique
Travelling salesman problem
Integer knapsack problem
Longest path in a graph
Edge colouring
Optimal scheduling
Minimising boolean expressions
Minimum graph partition
Composite number
Primality
NP-complete problems
If Phd is easy then P is also easy. If Phd is hard, can we say that P is
also hard? We cannot because there may be some other way to solve
P . What we can say is that Phd is at least as hard as P , or that P is
no harder than Phd .
Bin packing
Knapsack
Graph colouring
Clique
Hamiltonian cycle
Travelling salesman problem
Set partitioning
Longest path
Activity 8.11
C OMPLEXITY THEORY
1. Explain, with an example, what the upper bound and lower
bound of a problem are.
2. What is class P? What is class N P?
3. What is a NP-complete problem?
4. What is important about polynomial complexity in classifying
problems?
5. Describe a decision problem with an example.
6. Suppose algorithms A1 and A2 have worst-case time complexity
bounded by t1 and t2 respectively. If algorithm A3 consists of
applying A2 to the output of A1 , i.e. the input for A3 is the input
of A1 , what is the worst-case time bound for A3 ?
Chapter 9
Consider two strings, one called the text and the other the pattern.
The text is usually longer than the pattern, and the pattern is never
longer than the text. The goal of the algorithm is to find an
occurrence of the pattern P of length m within the text T of length n,
and to report the position of P in T.
T a b c a a b b c a b
P a a b b
1. Naive approach
2. Boyer-Moore algorithm
3. KMP matching algorithm.
Terminology
The string ADT
int indexOf(String str) returns the index within this string of the
first occurrence of the specified substring
char charAt(int i) returns the character at index i
boolean equals(Object o) compares this string to object o
boolean startsWith(String prefix) checks if this string starts with
string prefix
boolean endsWith(String suffix) checks if this string ends with
string suffix
String substring(int beginIndex) returns the substring of this
string beginning with the character at beginIndex
String substring(int beginIndex, int endIndex) returns the
substring of this string beginning with the character at
beginIndex and ending with the character at endIndex − 1
String concat(String s) concatenates the specified string to the end
of this string.
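These operations carry the same names as methods of java.lang.String,
so a quick way to explore them is to call them on a Java string. The
sample string below is only an illustration. Note that Java counts
positions from 0 and that substring(beginIndex, endIndex) stops just
before endIndex.

```java
class StringAdtDemo {
    public static void main(String[] args) {
        String s = "gihhdkkkkdhhhhhk";
        System.out.println(s.length());                 // number of characters
        System.out.println(s.indexOf("dkkkk"));         // 0-based index of the first match
        System.out.println(s.charAt(5));                // character at 0-based index 5
        System.out.println(s.equals("gihhdkkkkdhhhhhk"));
        System.out.println(s.startsWith("gihh"));
        System.out.println(s.endsWith("algorithm"));
        System.out.println(s.substring(5, 10));         // characters at indices 5..9
        System.out.println(s.concat("yyy"));
    }
}
```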
length() 16
indexOf("dkkkk") 4
charAt(5) k
equals("gihhdkkkkdhhhhhk") true
startsWith("gihh") true
endsWith("algorithm") false
substring(5,10) kkkkd
concat("yyy") gihhdkkkkdhhhhhkyyy
String matching
comparison in the text and the pattern can be specified using the
index values i and j. For instance, in step (6) of Example 9.4, i = 6
and j = 1 point to the first pair of comparison symbols T[6] and
P[1] respectively. While they are identical (marked by a ‘!’), let
i = i + 1; j = j + 1. The next two symbols for comparison are T[i]
and P[j] for the updated i and j, i.e. T[7] and P[2]. This process
continues until T[i] and P[j] are mismatched. When they are
different (marked by an ‘x’), P is shifted one position to the right
(marked by a ‘.’): j is reset to 1 and i is reset to the text position
now aligned with P[1]. A new round of comparisons then starts.
As we can see, there is a shift by merely one position for each step
for the naive algorithm.
(1) (m)
j 1 2 3 4
P[j] k k d h
T[i] g g h h d k k k k d h h h h h k
i 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6
x (n)
(2) P[j] . k k d h
T[i] g g h h d k k k k d h h h h h k
x
(3) P[j] . k k d h
T[i] g g h h d k k k k d h h h h h k
x
(4) P[j] . k k d h
T[i] g g h h d k k k k d h h h h h k
x
(5) P[j] . k k d h
T[i] g g h h d k k k k d h h h h h k
x
(6) P[j] . k k d h
T[i] g g h h d k k k k d h h h h h k
! ! x
(7) P[j] . k k d h
T[i] g g h h d k k k k d h h h h h k
! ! x
(8) P[j] . k k d h
T[i] g g h h d k k k k d h h h h h k
! ! ! !
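The naive approach traced above can be sketched in Java as follows. The
method name is illustrative, and positions are 0-based, so the match
found in step (8) at text position 8 is reported as index 7.

```java
class NaiveMatchSketch {
    // Returns the 0-based index of the first occurrence of p in t, or -1.
    static int naiveMatch(String t, String p) {
        int n = t.length(), m = p.length();
        for (int s = 0; s + m <= n; s++) {           // shift the pattern one place at a time
            int j = 0;
            while (j < m && t.charAt(s + j) == p.charAt(j)) j++;
            if (j == m) return s;                    // all m symbols matched
        }
        return -1;
    }
}
```

In the worst case every shift costs up to m comparisons, giving O(nm)
comparisons overall.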
9.5.2 Observation
1 i 1 2 3 4 5 ...
2 T[i] g g h h d k k k k d h h h h h k
3 P[j] . g k h d
4 j 1 2 3 4
5 x ! !
Secondly, the BM algorithm shifts the pattern from the left to right
in each outer iteration. However, the shift may skip certain
positions. The shift distance depends on whether the mismatched
symbol in the text appears in the not-yet-compared part of the
pattern and, if yes, on its position in the pattern.
!
1 T[i] g g h h d k k k k d h h h h h k
2 P[j] . g d k k (before shift)
3 ! x
4 P[j] . . g d k k (after shift)
5 !
Example 9.7 Consider the example below. The mismatched symbol ‘h’
in T is not in the not-yet-compared part of P, i.e. not in (g). Therefore,
the pattern P can be shifted safely to line-up with the first location
after the mismatched position. As we can see, the pattern jumps from
the location (see line 2) to the location (line 4) below, and shifted by 2
index positions.
1 T[i] g g h h d k k k k d h h h h h k
2 P[j] . g k h d (before shift)
3 x ! !
4 P[j] . g k h d (after shift)
1 T[i] g g h h d k k k k d h h h h h k
2 P[j] . g k k k (before shift)
3 x
4 P[j] . . . . g k k k (after shift)
Example 9.9 Consider the question in Example 9.4 again. The steps
now become (the previous step numbers are marked in double quotes
for comparison):
(1) 1 1 1 1 1 1 1
i 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6
T[i] g g h h d k k k k d h h h h h k
(m) (n)
j 1 2 3 4
P[j] k k d h
(2) T[i] g g h h d k k k k d h h h h h k
P[j] . . . k k d h :"(4)"
(3) T[i] g g h h d k k k k d h h h h h k
P[j] . . k k d h :"(6)"
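A simplified Boyer-Moore matcher can be sketched in Java using only a
‘last occurrence’ table: the pattern is compared from right to left, and
on a mismatch it is shifted according to the last position of the
mismatched text symbol in the pattern. This is a common textbook
simplification and may not shift exactly as in the steps shown above;
the names and 0-based indexing are our assumptions.

```java
import java.util.HashMap;
import java.util.Map;

class BadCharacterSketch {
    // Returns the 0-based index of the first occurrence of p in t, or -1.
    static int match(String t, String p) {
        int n = t.length(), m = p.length();
        if (m == 0) return 0;
        Map<Character, Integer> last = new HashMap<>();
        for (int k = 0; k < m; k++) last.put(p.charAt(k), k);   // last index of each symbol
        int i = m - 1, j = m - 1;                    // compare from the right end of p
        while (i <= n - 1) {
            if (p.charAt(j) == t.charAt(i)) {
                if (j == 0) return i;                // whole pattern matched
                i--; j--;
            } else {
                int l = last.getOrDefault(t.charAt(i), -1);
                i += m - Math.min(j, 1 + l);         // jump; may skip several positions
                j = m - 1;
            }
        }
        return -1;
    }
}
```

When the mismatched text symbol does not occur in the pattern at all,
the pattern jumps completely past it, which is where the savings over
the naive approach come from.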
9.5.4 Observation
1. Main inefficiency: Despite many comparisons before a
conclusion of nomatch, we did not use any information gained
by the comparisons.
2. The KMP algorithm in the next section improves on this.
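KMP Algorithm
[Figure 9.1: the matched part of T (from position s up to, but not
including, position i) aligned with P[1..j−1], with the prefix P[1..k]
and the suffix P[j−k..j−1] of the matched part marked.]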
i 1 2 3 4 5 6 7 8
T[i] b a a b a c a a b a c c a b a c a b a a b b
P[j] a b a c a b a
j 1 2 3 4 5 6
Consider the matched part of the pattern P[1..5]: abaca. Here ‘a’ in
abacaba is a prefix of P that is also a suffix of abaca, the first j − 1
characters. ‘The length of the longest prefix of P that is a suffix of
the first j − 1 characters’ in this example is therefore 1, i.e.
P [1..k] = P [j − k..j − 1], where k = 1.
i 1 2 3 4 5 6 7 8
T[i] b a a b a c a a b a c c a b a c a b a a b b
P[j] . . . . a b a c a b a
j 1 2
Note, in the next step, there is no need to compare the first pattern
symbol P[1] with T[7] because we know that, from the previous
step, P[5]=T[7]=P[1]. The next comparison can start from i = 8 in
T and j = 2 in P. The longer the matched prefix, the more
comparisons may be saved.
(1) i 1 2 3 4 5 6 7 8 9 10 · · ·
T[i] b a a c a b a c a b c b a b ···
P[j] a c a b a c a c c a
j 1 2 3 4 5 6 7 8 ···
The failure function F(j) is defined as the length of the longest
proper prefix of P[1..j] that is also a suffix of P[1..j], where
j = 1..m, and F (j) = 0 if there is no such prefix. In particular,
F (1) = 0. The importance of F(j) is that it encodes repeated
substrings inside P itself.
P[j] a c a b a c a c c a
j 1 2 3 4 5 6 7 8 9 10
F[j] 0
P[j] a c a b a c a c c a
j 1 2 3 4 5 6 7 8 9 10
F[j] 0 0
P[j] a c a b a c a c c a
j 1 2 3 4 5 6 7 8 9 10
F[j] 0 0 1
P[j] a c a b a c a c c a
j 1 2 3 4 5 6 7 8 9 10
F[j] 0 0 1 0 1 2 3 2 0 1
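The table above can be computed in a single left-to-right pass over the
pattern. The Java sketch below uses 0-based arrays, so f[j − 1]
corresponds to F(j) in the text; the names are illustrative.

```java
class FailureFunctionSketch {
    // f[j] = length of the longest proper prefix of p[0..j]
    // that is also a suffix of p[0..j].
    static int[] failure(String p) {
        int m = p.length();
        int[] f = new int[m];
        int k = 0;                                   // length of the currently matched prefix
        for (int j = 1; j < m; j++) {
            while (k > 0 && p.charAt(j) != p.charAt(k)) k = f[k - 1];   // fall back
            if (p.charAt(j) == p.charAt(k)) k++;     // extend the matched prefix
            f[j] = k;
        }
        return f;
    }
}
```

For the pattern acabacacca this yields 0 0 1 0 1 2 3 2 0 1, the values
in the last table.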
We now look at the string matching problem again and see how the
F[j] can be used to find a pattern.
[Figure: three cases (a), (b) and (c) showing the pattern P[1..m]
aligned against the text T from position s, the compared positions i
and j, and the shift of P determined by F[j] after a mismatch.]
Example 9.11
1 1 1 1 1 1 1 1 1 1 1 2
i 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0
T[i] a b a c a a b a c c a b a c a b a a b b
(m) (n)
j 1 2 3 4 5 7
P[j] a b a c a b
! ! ! ! ! x
2 T[i] a b a c a a b a c c a b a c a b a a b b
P[j] . . . . a b a c a b
x
3 T[i] a b a c a a b a c c a b a c a b a a b b
P[j] . a b a c a b
! ! ! ! x
4 T[i] a b a c a a b a c c a b a c a b a a b b
P[j] . . . . a b a c a b
x
5 T[i]a b a c a a b a c c a b a c a b a a b b
P[j] . a b a c a b
! ! ! ! ! !
‘!’ or ‘x’ represent a comparison,
where ‘!’ represents a match, and ‘x’ a mismatch.
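The trace in Example 9.11 can be reproduced with a short program. The following Java sketch is one possible formulation under stated assumptions, not the guide's own code: the class name KmpSearch, the method names, and the choice to return both the 1-based match position and the number of character comparisons are all introduced here for illustration.

```java
public class KmpSearch {
    // f[j] = length of the longest proper prefix of p[1..j] that is also a
    // suffix of p[1..j]; index 0 is unused.
    public static int[] failure(String p) {
        int[] f = new int[p.length() + 1];
        int k = 0;
        for (int j = 2; j <= p.length(); j++) {
            while (k > 0 && p.charAt(k) != p.charAt(j - 1)) {
                k = f[k];
            }
            if (p.charAt(k) == p.charAt(j - 1)) {
                k++;
            }
            f[j] = k;
        }
        return f;
    }

    // Returns {position, comparisons}: the 1-based start of the first
    // occurrence of p in t (0 if there is none) and the number of character
    // comparisons made.
    public static int[] search(String t, String p) {
        int n = t.length();
        int m = p.length();
        if (m == 0) {
            return new int[] {1, 0}; // assumption: the empty pattern matches at once
        }
        int[] f = failure(p);
        int i = 0;           // 0-based position in t
        int j = 0;           // number of pattern symbols currently matched
        int comparisons = 0;
        while (i < n) {
            comparisons++;
            if (t.charAt(i) == p.charAt(j)) {
                i++;
                j++;
                if (j == m) {
                    return new int[] {i - m + 1, comparisons};
                }
            } else if (j > 0) {
                j = f[j];    // shift the pattern, keeping the matched prefix
            } else {
                i++;         // nothing matched: move on in the text
            }
        }
        return new int[] {0, comparisons};
    }

    public static void main(String[] args) {
        int[] r = search("abacaabaccabacabaabb", "abacab");
        System.out.println("position " + r[0] + ", comparisons " + r[1]);
        // prints position 11, comparisons 19
    }
}
```

The 19 comparisons correspond to the 6 + 1 + 5 + 1 + 6 marks in the five steps of the example. The text index i never moves backwards, which is where the saving over the naive algorithm comes from.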
Implementation
Activity 9.6
STRING MATCHING
1. Trace by hand the operations of the following algorithms step by
step and calculate the number of comparisons made for the case
where P=‘ABAA’ for T=‘AAABBABABBAABABABAAABB’.
(a) Naive algorithm
9.7 Tries
Example 9.12 A trie for the strings (allex, ally, alone, along, arm, mixer,
mixture, more) over the alphabet (a, l, e, x, y, o, n, g, r, m, i, t, u).
                 ( )
              /       \
            a           m
          /   \        /  \
         l     r      i    o
        / \     \     |    |
       l   o     m    x    r
      / \   \        / \   |
     e   y   n      e   t  e
     |      / \     |   |
     x     e   g    r   u
                        |
                        r
                        |
                        e
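A standard trie such as the one in Example 9.12 can be sketched in Java as follows. This is an illustrative sketch only: the class name StandardTrie and its methods are hypothetical, and it marks the end of a stored string with a boolean flag on the node, which is an assumption made here rather than the guide's convention.

```java
import java.util.Map;
import java.util.TreeMap;

public class StandardTrie {
    private static final class Node {
        // One child per symbol; TreeMap keeps the children in alphabetical order.
        final Map<Character, Node> children = new TreeMap<>();
        // Assumption: a flag marks the end of a stored string.
        boolean endOfWord;
    }

    private final Node root = new Node();
    private int nodeCount = 0; // number of nodes, not counting the root

    public void insert(String word) {
        Node current = root;
        for (char c : word.toCharArray()) {
            Node next = current.children.get(c);
            if (next == null) {
                next = new Node();
                current.children.put(c, next);
                nodeCount++;
            }
            current = next;
        }
        current.endOfWord = true;
    }

    public boolean contains(String word) {
        Node current = root;
        for (char c : word.toCharArray()) {
            current = current.children.get(c);
            if (current == null) {
                return false;
            }
        }
        return current.endOfWord;
    }

    public int size() {
        return nodeCount;
    }

    public static void main(String[] args) {
        StandardTrie trie = new StandardTrie();
        String[] words = {"allex", "ally", "alone", "along", "arm", "mixer", "mixture", "more"};
        for (String w : words) {
            trie.insert(w);
        }
        System.out.println(trie.contains("mixture")); // prints true
        System.out.println(trie.contains("mix"));     // prints false
    }
}
```

Both insertion and lookup follow one edge per symbol, so their cost depends on the length of the string and not on how many strings are stored.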
First convert it to
(all%,ally,alone,along,arm,mix%,mixer,mixture,more)
Example 9.14 A compressed trie for the strings (allex, ally, alone, along,
arm, mixer, mixture, more) over the alphabet (a, l, e, x, y, o, n, g, r, m, i, t, u).
              ( )
            /     \
          a         m
         / \       /  \
        l   rm   ix    ore
       / \       /  \
      l   on   er    ture
     / \  / \
   ex   y e   g
Activity 9.7
TRIES
1. Derive a standard trie, and a compressed trie, that store the
following names: (aim, guy, jon, ann, jim, eva, amy, tim, ron,
kim, tom, roy, kay, dot).
2. The Hamming distance between two strings X and Y of the same
length is the number of positions where the corresponding
symbols are different. What is the Hamming distance between
‘101100’ and ‘001101’?
3. The edit distance (also called ‘Levenshtein distance’) between
two strings X and Y is defined as the minimum number of edit
operations (insertion, deletion, or substitution of a single
character) needed to transform string X into string Y.
What is the edit distance between ‘algorithm’ and ‘rhythm’?
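The two distances defined in this activity can be computed as in the sketch below, which may help when checking a hand calculation. It is a minimal illustration, not part of the guide: the class name StringDistances and its method names are hypothetical, the edit distance uses the usual dynamic programming table, and the example strings are deliberately different from those in the activity.

```java
public class StringDistances {
    // Number of positions at which two equal-length strings differ.
    public static int hamming(String x, String y) {
        if (x.length() != y.length()) {
            throw new IllegalArgumentException("strings must have the same length");
        }
        int distance = 0;
        for (int i = 0; i < x.length(); i++) {
            if (x.charAt(i) != y.charAt(i)) {
                distance++;
            }
        }
        return distance;
    }

    // Minimum number of single-character insertions, deletions and
    // substitutions needed to transform x into y.
    public static int editDistance(String x, String y) {
        int n = x.length();
        int m = y.length();
        // d[i][j] = edit distance between the first i symbols of x
        // and the first j symbols of y.
        int[][] d = new int[n + 1][m + 1];
        for (int i = 0; i <= n; i++) {
            d[i][0] = i; // delete all i symbols
        }
        for (int j = 0; j <= m; j++) {
            d[0][j] = j; // insert all j symbols
        }
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int cost = (x.charAt(i - 1) == y.charAt(j - 1)) ? 0 : 1;
                int deletion = d[i - 1][j] + 1;
                int insertion = d[i][j - 1] + 1;
                int substitution = d[i - 1][j - 1] + cost;
                d[i][j] = Math.min(Math.min(deletion, insertion), substitution);
            }
        }
        return d[n][m];
    }

    public static void main(String[] args) {
        System.out.println(hamming("karolin", "kathrin"));     // prints 3
        System.out.println(editDistance("kitten", "sitting")); // prints 3
    }
}
```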
Chapter 10
Having read this chapter and consulted the relevant material you
should be able to:
10.3 Quadtrees
A quadtree is a tree data structure often used for partitioning and
storing images in a two-dimensional space by recursively
subdividing the space into four quadrants or regions. In a quadtree,
each internal node has at most 4 children. Figure 10.1 shows an example
of a quadtree.
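The recursive subdivision can be sketched in Java as follows. This is an illustrative sketch under several assumptions that are not taken from the guide: the image is a square array of 0s and 1s whose side is a power of two, a region that is uniform in value becomes a leaf, and the four children are stored in the order top-right, top-left, bottom-left, bottom-right. The class name RegionQuadtree and its members are hypothetical.

```java
public class RegionQuadtree {
    // A node is either a uniform leaf or an internal node with four children.
    public static final class Node {
        public final boolean leaf;
        public final int value;       // pixel value of a leaf; -1 for an internal node
        public final Node[] children; // null for a leaf

        Node(int value) {
            this.leaf = true;
            this.value = value;
            this.children = null;
        }

        Node(Node[] children) {
            this.leaf = false;
            this.value = -1;
            this.children = children;
        }
    }

    // Assumption: image is square and its side is a power of two.
    public static Node build(int[][] image) {
        return build(image, 0, 0, image.length);
    }

    private static Node build(int[][] image, int row, int col, int size) {
        if (isUniform(image, row, col, size)) {
            return new Node(image[row][col]);
        }
        int half = size / 2;
        return new Node(new Node[] {
            build(image, row, col + half, half),        // 0: top-right
            build(image, row, col, half),               // 1: top-left
            build(image, row + half, col, half),        // 2: bottom-left
            build(image, row + half, col + half, half)  // 3: bottom-right
        });
    }

    private static boolean isUniform(int[][] image, int row, int col, int size) {
        int first = image[row][col];
        for (int r = row; r < row + size; r++) {
            for (int c = col; c < col + size; c++) {
                if (image[r][c] != first) {
                    return false;
                }
            }
        }
        return true;
    }

    public static int countLeaves(Node node) {
        if (node.leaf) {
            return 1;
        }
        int total = 0;
        for (Node child : node.children) {
            total += countLeaves(child);
        }
        return total;
    }

    public static void main(String[] args) {
        int[][] image = {
            {1, 1, 0, 0},
            {1, 1, 0, 0},
            {0, 0, 0, 1},
            {0, 0, 1, 1}
        };
        System.out.println(countLeaves(build(image))); // prints 7
    }
}
```

In the 4 × 4 example, three quadrants are uniform and become leaves directly, while the bottom-right quadrant is subdivided once more into four single-pixel leaves.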
[Figure: a square divided into four quadrants, labelled 1 and 0 (top row) and 2 and 3 (bottom row).]
10.4 Octtrees
10.6 Operations
Parameter spaces
points
lines
polylines
rectangles
circles
ellipses.
Graphic objects are much less constrained than text. Text usually
fills pages with characters drawn from a restricted number of
fonts and sizes. In contrast, pictures can be placed in different
places, with thousands of colours, shapes and dimensions to choose
from. The objects in a graphical environment can also be viewed from
almost any angle, and light and shade differ from one viewpoint to
another. Graphics therefore require a much larger parameter
space, and almost every operation involves some geometric parameters.
The most common parameters required are as follows:
Activity 10.8
[Figure: a shaded image divided into four quadrants, labelled 1 and 0 (top row) and 2 and 3 (bottom row).]
Chapter 11
11.1 Examination
11.5 Recommendation
Good luck!
Hashing (maps)
Recursion as a problem-solving technique
Dynamic programming
Greedy approach
Intractability
Well done if you have made it this far to prepare for the
examination!
I hope you find the materials in this course module stimulating and
useful, and that you enjoy your studies as much as I have enjoyed
writing this subject guide.
Appendix A
The full marks for each question are 25. The marks for each
subquestion are displayed in brackets (e.g. [3]).
Section B
Question 4
1. Discuss briefly the time complexity in the worst case for the
algorithm below. Indicate the input, output of the algorithm and
the main comparison you have counted. [6]
insertionsort(int array[0..n-1])
input: ________________
output: ________________
for i=1 to n-1
    current=array[i]
    position=i-1
    while position>=0 and current<array[position]
        array[position+1]=array[position]
        position=position-1
    endwhile
    array[position+1]=current
endfor
2. Draw the binary search tree for the sequence of values 15, 13, 17, 14,
11, 16, 19, where the values are inserted in this order. [3]
3. What is a (binary) heap? What are the two main properties of a
heap? [3]
4. Consider the list of integers in the array below, where i is the index
of the array. Suppose that the search target is 9. Discuss briefly
the difference between the behaviour of the sequential search
algorithm and the binary search algorithm. [6]
i 1 2 3 4 5 6 7
A[i] 7 4 8 3 9 1 2
5. Derive and draw a diagram of the compressed trie for the set of
strings below. [3]
[Figure: a square divided into four quadrants, labelled 1 and 0 (top row) and 2 and 3 (bottom row).]
APPENDIX A. SAMPLE EXAMINATION PAPER
Question 5
Question 6
A C F G
B E D H
2. Consider the following algorithm StackAndQueue. Explain briefly what
it does. Suppose that the stack S is initially empty and
the queue Q contains 3, 4, 5, 3, 2, 1, with 3 at the front. What
are the elements in S and Q on completion of the execution of the
algorithm? [4]
StackAndQueue(queue: Q);
    stack S
    integer item
begin
    initialise(S)
    while not Empty(Q) do
    begin
        dequeue(Q,item)
        push(S,item)
    end
    while not Empty(S) do
    begin
        pop(S,item)
        enqueue(Q,item)
    end
end
Appendix B
Sample solutions
Please be aware that these solutions are given in brief note form for the purpose of this
section. Candidates for the examination are usually expected to provide coherent
answers in full form to gain full marks. Also, there are other correct solutions which are
completely acceptable but which are not included here.
Section B
Question 4
root
“e” “y”
[Figure: a square divided into four quadrants, labelled 1 and 0 (top row) and 2 and 3 (bottom row).]
APPENDIX B. SAMPLE SOLUTIONS
Question 5
1. [5]
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
23 3 4 5 26 6 7 55 16 39 17 22
2. The numbers visited: postorder: 4 8 5 2 6 9 7 3 1 [3]
3. public class TreeNode {
       private Object treeItem;
       private TreeNode leftChild, rightChild;
       [1/9]
       public TreeNode(Object newItem) {
           treeItem = newItem;
           leftChild = null;
           rightChild = null;
       } // Constructor
4. Figure: [2/4]
B C
A D
E
List: [2/4]
   A  B  C  D  E
A  0  0  0  0  1
B  1  0  0  0  0
C  0  0  0  0  0
D  0  0  0  0  0
E  0  1  1  1  0
5. [0.5x6+1]
  a  b  a  c  a  a  b  a  d  c  a  b  a  c  a  b  a  a  b  b
  a  b  a  c  a  b
                 1  (comparison)
  a  b  a  c  a  a  b  a  d  c  a  b  a  c  a  b  a  a  b  b
     a  b  a  c  a  b
              4  3  2
  a  b  a  c  a  a  b  a  d  c  a  b  a  c  a  b  a  a  b  b
        a  b  a  c  a  b
                       5
  a  b  a  c  a  a  b  a  d  c  a  b  a  c  a  b  a  a  b  b
           a  b  a  c  a  b
                          6
  a  b  a  c  a  a  b  a  d  c  a  b  a  c  a  b  a  a  b  b
                             a  b  a  c  a  b
                                            7
  a  b  a  c  a  a  b  a  d  c  a  b  a  c  a  b  a  a  b  b
                                a  b  a  c  a  b
                               13 12 11 10  9  8
Question 6
1 2 5 6 9 7 8 9
1 2 5 6 8 7 9 7
1 2 5 6 7 8 9 8
Appendix C
Pseudocode notation
C.1 Values
integers or digits: 0, 1, · · ·, 9
fraction, rational numbers
true, false
English characters or strings.
C.2 Types
C.3 Operations
← (assignment)
(), [], {}
+, −, ×, /
and (&&), or (||), xor (XOR)
<, >, ≤, ≥, =, ≠.
C.4 Priority
function
procedure
method
(typed) method.
if – end if
for – end for
repeat – until
while – end while
return.
Examples of sequential structures
2. If-then-else
if condition then
    other statements
end if

if condition then
    other statements
else
    other statements
end if

if condition then
    other statements
else if condition then
    other statements
else
    other statements
end if

while condition do
    other statements
end while

repeat
    other statements
until condition
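For readers who implement the pseudocode in Java, the structures above map onto Java statements roughly as in the sketch below. The class and method names are hypothetical and serve only as illustration. The one point that needs care is repeat – until: Java's do – while loop repeats while its condition is true, so the until condition has to be negated.

```java
public class ControlStructures {
    // if – else if – else
    public static String sign(int x) {
        if (x > 0) {
            return "positive";
        } else if (x < 0) {
            return "negative";
        } else {
            return "zero";
        }
    }

    // while condition do ... end while
    public static int sumTo(int n) {
        int sum = 0;
        int i = 1;
        while (i <= n) {
            sum = sum + i;
            i = i + 1;
        }
        return sum;
    }

    // repeat ... until n = 0, written as do { ... } while (!(n == 0))
    public static int countDigits(int n) {
        int count = 0;
        do {
            count = count + 1;
            n = n / 10;
        } while (!(n == 0));
        return count;
    }

    public static void main(String[] args) {
        System.out.println(sign(-4));           // prints negative
        System.out.println(sumTo(4));           // prints 10
        System.out.println(countDigits(12345)); // prints 5
    }
}
```

As with repeat – until, the body of the do – while loop always runs at least once, which is why countDigits(0) returns 1.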
Acknowledgements
The author wishes to thank Dr David Brownwigg and editor
Christina Loziol-Webster for proofreading and for their valuable comments.