
Algorithmic Problem Solving

Johan Sannemo

2020
This version of the book is a preliminary draft. Expect to
find typos and other mistakes. If you do, please report them
to [email protected]. A number of sections and
chapters are also unfinished, and a number of problems are
not yet uploaded to the judge – these are known issues.
Note: the linked problems are sometimes available on
Kattis (https://open.kattis.com/problems/PROBLEMID)
and sometimes on Kodsport.dev
(https://kodsport.dev/problems/PROBLEMID). In this
particular version, you should try the first one for most
chapters.

Contents

Preface

Reading this Book

I Preliminaries

1 Algorithms and Problems
  1.1 Computational Problems
  1.2 Algorithms
  1.3 Programming Languages
  1.4 Pseudo Code
  1.5 The KS.Dev Online Judge

2 Programming in C++
  2.1 Development Environments
  2.2 Hello World!
  2.3 Variables and Types
  2.4 Input and Output
  2.5 Operators
  2.6 If Statements
  2.7 For Loops
  2.8 While Loops
  2.9 Functions
  2.10 Structures
  2.11 Arrays
  2.12 Lambdas
  2.13 The Preprocessor
  2.14 Template

3 The C++ Standard Library
  3.1 vector
  3.2 Iterators
  3.3 queue
  3.4 stack
  3.5 priority_queue
  3.6 set and map
  3.7 Math
  3.8 Algorithms
  3.9 Strings
  3.10 Input/Output

4 Implementation Problems

5 Time Complexity
  5.1 The Complexity of Insertion Sort
  5.2 Asymptotic Notation
  5.3 NP-complete problems
  5.4 Other Types of Complexities
  5.5 The Importance of Constant Factors
  5.6 Additional Exercises
  5.7 Chapter Notes

6 Data Structures
  6.1 Dynamic Arrays
  6.2 Stacks
  6.3 Queues
  6.4 Priority Queues
  6.5 Bitsets
  6.6 Hash Tables

7 Recursion
  7.1 Recursive Definitions
  7.2 The Time Complexity of Recursive Functions
  7.3 Choice
  7.4 Multidimensional Recursion
  7.5 Recursion vs. Iteration

8 Graph Theory
  8.1 Graphs
  8.2 Representing Graphs
  8.3 Breadth-First Search
  8.4 Depth-First Search
  8.5 Trees

II Basics

9 Brute Force
  9.1 Optimization Problems
  9.2 Generate and Test
  9.3 Backtracking
  9.4 Fixing Parameters
  9.5 Meet in the Middle
  9.6 Chapter Notes

10 Greedy Algorithms
  10.1 Change-making Problem
  10.2 Optimal Substructure
  10.3 Locally Optimal Choices
  10.4 Scheduling
  10.5 Huffman Coding

11 Dynamic Programming
  11.1 Best Path in a DAG
  11.2 Dynamic Programming
  11.3 Multidimensional DP
  11.4 Subset DP
  11.5 Digit DP
  11.6 Standard Problems

12 Divide and Conquer
  12.1 Inductive Constructions
  12.2 Merge Sort
  12.3 Binary Search
  12.4 Karatsuba’s algorithm
  12.5 Chapter Notes

13 Data Structures
  13.1 Disjoint Sets
  13.2 Range Queries
  13.3 Chapter Notes

14 Graph Algorithms
  14.1 Breadth-First Search
  14.2 Depth-First Search
  14.3 Weighted Shortest Path
  14.4 Minimum Spanning Tree
  14.5 Chapter Notes

15 Maximum Flows
  15.1 Flow Networks
  15.2 Edmonds-Karp
  15.3 Applications of Flows
  15.4 Chapter Notes

16 Strings
  16.1 Tries
  16.2 String Matching
  16.3 Chapter Notes

17 Combinatorics
  17.1 The Addition and Multiplication Principles
  17.2 Permutations
  17.3 Ordered Subsets
  17.4 Binomial Coefficients
  17.5 The Principle of Inclusion and Exclusion
  17.6 The Pigeon Hole Principle
  17.7 Invariants
  17.8 Monovariants
  17.9 Chapter Notes

18 Game Theory
  18.1 Mathematical Techniques
  18.2 The Graph Game

19 Number Theory
  19.1 Divisibility
  19.2 Prime Numbers
  19.3 The Euclidean Algorithm
  19.4 Modular Arithmetic
  19.5 Chinese Remainder Theorem
  19.6 Euler’s totient function

20 Competitive Programming Strategy
  20.1 IOI
  20.2 ICPC

21 Papers
  21.1 Paper 3

III Advanced Topics

22 Data Structures
  22.1 Self-Balancing Trees
  22.2 Persistent Data Structures
  22.3 Heavy-Light Decomposition

23 Combinatorics
  23.1 Convolutions

24 Strings
  24.1 Hashing
  24.2 Dynamic Hashing

A Discrete Mathematics
  A.1 Logic
  A.2 Sets and Sequences
  A.3 Sums and Products

Hints

Solutions

Bibliography

Index

Preface
Algorithmic problem solving is the art of formulating efficient methods that
solve problems of a mathematical nature. From the many numerical algorithms
developed by the ancient Babylonians to the founding of graph theory by Euler,
algorithmic problem solving has been a popular intellectual pursuit during
the last few thousand years. For a long time, it was a purely mathematical
endeavor with algorithms meant to be executed by hand. During recent
decades, algorithmic problem solving has evolved. What was mainly a topic of
research became a mind sport known as competitive programming. As a sport,
algorithmic problem solving rose in popularity, with the largest competitions
attracting tens of thousands of programmers. While its mathematical counterpart
has a rich literature, there are only a few books on algorithms with a strong
problem solving focus.
The purpose of this book is to contribute to the literature of algorithmic
problem solving in two ways. First of all, it tries to fill in some holes in
existing books. Many topics in algorithmic problem solving lack any treatment
at all in the literature – at least in English books. Much of the content is
instead documented only in blog posts and solutions to problems from various
competitions. While this book attempts to rectify this, that is not to detract from
those sources. Many of the best treatments of an algorithmic topic I have seen
are as part of a well-written solution to a problem. However, there is value in
completeness and coherence when treating such a large area. Secondly, I hope
to provide another way of learning the basics of algorithmic problem solving by
helping the reader build an intuition for problem solving. A large part of this
book describes techniques using worked-through examples of problems. These
examples attempt not only to describe the manner in which a problem is solved,
but to give an insight into how a thought process might be guided to yield the
insights necessary to arrive at a solution.
This book is different from pure programming books and most other
algorithm textbooks. Programming books are mostly either in-depth studies of
a specific programming language or describe various programming paradigms.
A single language is used in this book – C++. The text on C++ exists for the
sole purpose of enabling those readers without prior programming experience to
implement the solutions to algorithm problems. Such a treatment is necessarily
minimal and teaches neither good coding style nor advanced programming concepts.
Algorithm textbooks teach primarily algorithm analysis, basic algorithm design,
and some standard algorithms and data structures. They seldom include as much
problem solving as this book does. The book also falls somewhere between
the practical nature of a programming book and the heavy theory of algorithm
textbooks. This is in part due to the book’s dual nature of being not only about
algorithmic problem solving, but also competitive programming to some extent.
As such there is more real code and efficient C++ implementations of algorithms
included compared to most algorithm books.

Acknowledgments. First and foremost, thanks to Per Austrin who provided
much valuable advice and feedback during the writing of this book. Thanks to
Simon and Mårten who have competed with me for several years as Omogen
Heap. A lot of the knowledge in this book has its roots in you. Finally, thanks to
several others who have read through drafts and caught numerous mistakes of
my own.

Reading this Book
This book consists of three parts. The first part contains some preliminary
background, such as algorithm analysis and programming in C++. With an
undergraduate education in computer science most of these chapters are probably
familiar to you. It is recommended that you at least skim through the first part
since the remainder of the book assumes you know the contents of the preliminary
chapters.
The second part makes up most of the material in the book. Some of it
should be familiar if you have taken a course in algorithms and data structures.
The take on those topics is a bit different compared to an algorithms course. We
therefore recommend that you read through even the parts you feel familiar with
– in particular those on the basic problem solving paradigms, i.e. brute force,
greedy algorithms, dynamic programming and divide & conquer. The chapters
in this part are structured so that a chapter builds upon only the preliminaries
and previous chapters to the largest extent possible.
In the third part you will find the advanced topics. These are extensions of
the topics from the second part. This part is less cohesive, with few dependencies
between chapters. You can to a larger degree choose what topics you wish to
study, though most of them depend on several of the chapters from the basics.
At the end of the book you can find an appendix with some mathematical
background, together with hints and solutions for selected exercises.
When reading this book, know that every problem and technique was chosen
with care; every step on the way in a solution added to provide value. Sometimes,
this can make the book feel boring – a solution can take a long time tracing out
the intuition behind some small step, or show partial solutions that are unused
in the final result. At other times, missing a single sentence can leave you with a
crucial gap in your knowledge. I have tried to make sure that every sentence
written is important; when the book is long-winded, trust that it is useful, and
when difficult, endure to make sure you attain the deep understanding I hope
this book will be able to provide.
Similarly, the exercises are meant as attempts for you to construct some
crucial knowledge on your own. There may be fewer end-of-chapter exercises
than you might be used to in a textbook, and more exercises inlined in chapters.
This is because we expect you to solve all exercises as part of the reading of the
book. Sometimes, the text after an exercise will assume that you read and solved
the exercise. The lecture analogue would be the lecturer pausing to ask the class
a question; only giving an answer if none is provided by the class. Since this is
a book, you are blessed with unlimited time to think in contrast to the lecture
setting, where you typically get on the order of minutes. Some exercises took
the author on the order of hours to solve at first, so do not feel disparaged if you
find them difficult. At the back of the book, you find hints and solutions for
selected exercises. If you fail to solve an exercise, first check if it has a hint, and
give it another attempt.
This book can also be used to improve your competitive programming
skills. Some parts are unique to competitive programming (in particular
Chapter 20 on contest strategy). This knowledge is extracted into competitive
tips:
Competitive Tip
A competitive tip contains some information specific to competitive programming.
These can be safely ignored if you are interested only in the problem solving aspect
and not the competitions.

The book often refers to problems from the Kodsport.dev online judge:

Problem 0.1
Problem Name – problemid

The URL of such a problem is https://kodsport.dev/problems/problemid.


The C++ code in this book makes use of some preprocessor directives from
a template. Even if you are familiar with C++ (or do not wish to learn it) we
still recommend that you read through this template (section 2.14) to better
understand the C++ code in the book.

Part I

Preliminaries

1 Algorithms and Problems
The greatest technical invention of the last century was probably the digital
general purpose computer. It was the start of the revolution which provided us
with the Internet, smartphones, tablets, and the computerization of society.
To harness the power of computers we use programming. Programming is
the art of developing a solution to a computational problem, in the form of a set
of instructions that a computer can execute. These instructions are what we call
code, and the language in which they are written a programming language. The
abstract method that such code describes is what we call an algorithm.
The aim of algorithmic problem solving is thus to, given a computational
problem, devise an algorithm that solves it. One does not necessarily need to
complete the full programming process (i.e. write code that implements the
algorithm in a programming language) to enjoy solving algorithmic problems.
However, it often provides more insight and trains you at finding simpler
algorithms to problems.
In this chapter, we begin our journey into algorithmic problem solving by
taking a closer look at these concepts and showing a solution to a common
problem.

1.1 Computational Problems


A computational problem generally consists of two parts. First, it needs an
input description, such as “a sequence of integers”, “a text string”, or some other
kind of mathematical object. Using this input, we have a goal which we want
to accomplish, defined by an output description. For example, a computational
problem might require us to sort a given sequence of integers. This particular
problem is called the Sorting Problem:


Sorting
Your task is to sort a sequence of integers in ascending order, i.e. from the
lowest to the highest.
Input
The input is a sequence of N integers a_0, a_1, ..., a_{N-1}.
Output
Output a permutation a' of the sequence a, such that a'_0 ≤ a'_1 ≤ ... ≤ a'_{N-1}.

A particular input to a computational problem is called an instance of the
problem. To the sorting problem, the sequence 3, 6, 1, −1, 2, 2 would be an
instance. The correct output for this particular problem would be −1, 1, 2, 2, 3, 6.

Exercise 1.1. If you were given cards with 5 different integers between 1 and 1 000 000
written on them, how would you sort them in ascending order? How would your
approach change if you had 30 integers? 1000? 1 000 000?

Some variations of this problem format appear later (such as problems
without inputs) but in general this is what the problems look like.
Competitive Tip
Problem statements sometimes contain huge amounts of text. Skimming through the
input and output sections before any other text in a problem can often give you a quick
idea about its topic and difficulty. This helps in determining what problems to solve
first when posed with a large number of problems and little time.

Exercise 1.2. What are the input and output descriptions for the following
computational problems?
1) Compute the greatest common divisor (see Def. 19.5, page 319 if you
are not familiar with the concept) of two numbers.
2) Find a root (i.e. a zero) of a polynomial.
3) Multiply two numbers.

Exercise 1.3. Consider the following problem. I am thinking of an integer 𝑥
between 1 and 100. Your task is to find this number by giving me integers, one
at a time. I will tell you whether the given integer is higher, lower or equal to 𝑥.
This is an interactive, or online, computational problem. How would you
describe the input and output to it? Why do you think it is called interactive?


1.2 Algorithms
Algorithms are solutions to computational problems. They define methods
that use the input to a problem in order to produce the correct output. A
computational problem can have many solutions. Efficient algorithms to solve
the sorting problem form an entire research area! Let us look at one possible
sorting algorithm, called selection sort, as an example.

Selection Sort
We construct the answer, the sorted sequence, iteratively one element at a
time, starting with the smallest.
Assume that we have chosen and sorted the 𝐾 smallest elements of the
original sequence. Then, the smallest unchosen element remaining in that
sequence must be the (𝐾 + 1)’st smallest element of the original sequence.
Thus, by finding the smallest element among those that remain we know what
the (𝐾 + 1)’st element of the sorted sequence is. By appending this element
to the already sorted 𝐾 smallest elements we get the sorted 𝐾 + 1 smallest
elements of the output.
If we repeat this process 𝑁 times, the result is the 𝑁 numbers of the
original sequence, but sorted.

You can see this algorithm performed on our previous example instance (the
sequence 3, 6, 1, −1, 2, 2) in Figures 1.1a-1.1f.
So far, we have been vague about what exactly an algorithm is. Looking
at our Selection Sort example, we do not have any particular structure or rigor
in the description of our method. There is nothing inherently wrong with
describing algorithms this way. It is easy to understand and gives the writer an
opportunity to provide context as to why certain actions are performed, making
the correctness of the algorithm more obvious. The main downsides of such a
description are ambiguity and a lack of detail.
Until an algorithm is described in sufficient detail, it is possible to accidentally
abstract away operations we may not know how to perform behind a few English
words. As a somewhat contrived example, our plain text description of selection
sort includes actions such as “choosing the smallest number of a sequence”.
While such an operation may seem very simple to us humans, algorithms are
generally constructed with regards to some kind of computer. Unfortunately,
computers can not map such English expressions to their code counterparts yet.
Instructing a computer to execute an algorithm thus requires us to formulate our


3 6 1 −1 2 2

(a) Originally, we start out with the unsorted sequence (3, 6, 1, −1, 2, 2).

−1 3 6 1 2 2

(b) The smallest element of the sequence is −1, so this is the first element of the sorted sequence.

−1 1 3 6 2 2

(c) We find the next element of the output by removing the −1 and finding the smallest remaining
element – in this case 1.

−1 1 2 3 6 2

(d) Here, there is no unique smallest element. We can choose any of the two 2’s in this case.

−1 1 2 2 3 6
−1 1 2 2 3 6

(e) The next two elements chosen will be a 2 and a 3.

−1 1 2 2 3 6

(f) Finally, we choose the last remaining element of the input sequence – the 6. This concludes
the sorting of our sequence.

Figure 1.1: An example execution of selection sort.

algorithm in steps small enough that even a computer knows how to perform
them. In this sense, a computer is rather stupid.
The English language is also ambiguous. We are sloppy with references
to “this variable” and “that set”, relying on context to clarify meaning for us.
We use confusing terminology and frequently misunderstand each other. Real
code does not have this problem. It forces us to be specific with what we mean.
However, as all programmers know, we often manage to construct highly specific
algorithms that do the wrong thing due to our own erroneous thought processes.
We will generally describe our algorithms in a representation called pseudo
code (Section 1.4), accompanied by an online exercise to implement the code.
Sometimes, we will instead give explicit code that solves a problem. This will
be the case whenever an algorithm is very complex, or care must be taken to
make the implementation efficient. The goal is that you should get to practice
understanding pseudo code, while still ending up with correct implementations
of the algorithms (thus the online exercises).

Exercise 1.4. Do you know any algorithms, for example from school? (Hint:
you use many algorithms to solve certain arithmetic and algebraic problems,
such as those in Exercise 1.2.)

Exercise 1.5. In Exercise 1.1, you were asked to come up with your own
approaches to the sorting problem. Attempt to write them down formally as
descriptions of algorithms.

Exercise 1.6. Construct an algorithm that solves the guessing problem in
exercise 1.3 using as few questions as possible. How many questions does it
use?

Correctness
One subtle, albeit important, point that we glossed over is what it means for an
algorithm to actually be correct.
There are two common notions of correctness – partial correctness and total
correctness. Partial correctness requires an algorithm to, upon termination, have
produced an output that fulfills all the criteria laid out in the output description.
Total correctness additionally requires an algorithm to finish within finite time.
When we talk about correctness of our algorithms later on, we generally focus on
the partial correctness. Termination is instead proved implicitly, as we consider
a more granular measure of efficiency (called time complexity, in Chapter 5) than
just finite termination. This measure implies the termination of the algorithm,
completing the proof of total correctness.
Proving that the selection sort algorithm finishes in finite time is quite easy.
It performs one iteration of the selection step for each element in the original
sequence (which is finite). Furthermore, each such iteration can be performed in
finite time by looking at each remaining element of the selection when finding
the smallest one. The remaining sequence is a subsequence of the original one
and is therefore also finite.
That the algorithm produces the correct output is a bit more difficult
to prove formally. The main idea behind a formal proof is contained within our
description of the algorithm itself.
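One way to make this precise – a sketch of such an argument rather than a
complete formal proof – is to state the key property as an invariant and prove
it by induction on K:

    After K selection steps, the output sequence built so far consists of the
    K smallest elements of the original sequence, in ascending order.

The case K = 0 is trivial, the reasoning in the algorithm description is exactly
the step from K to K + 1, and the case K = N says that the output is the sorted
sequence.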
While this definition seems clear enough – our algorithm should simply do
what the problem asks of it! – we will compromise on both conditions at later
points in the book. Generally, we are satisfied with an algorithm terminating in


expected finite time or answering correctly with, say, probability 0.75 for every
input. Similarly, we are sometimes happy to find an approximate solution to a
problem. What this means more concretely will become clear in due time when
we study such algorithms.
Competitive Tip
Proving your algorithm correct is sometimes quite difficult. In a competition, a correct
algorithm is correct even if you cannot prove it. If you have an idea you think is correct
it may be worth testing. This is not a strategy without problems though, since it makes
distinguishing between an incorrect algorithm and an incorrect implementation even
harder.

Exercise 1.7. Prove the correctness of your algorithm to the guessing problem
from Exercise 1.6 and your sorting algorithms from Exercise 1.5.

Exercise 1.8. Why would an algorithm that is correct with e.g. probability 0.75
still be very useful to us?
Why is it important that such an algorithm is correct with probability 0.75
on every problem instance, instead of always being correct for 75% of all cases?

1.3 Programming Languages


The purpose of programming languages is to formulate methods at a level of
detail where a computer could perform them. While we in textual descriptions
of methods are often satisfied with describing what we wish to do, programming
languages require considerably more constructive descriptions. Computers are
quite basic creatures compared to us humans. They only understand a very
limited set of instructions such as adding numbers, multiplying numbers, or
moving data around within their memory. The syntax of programming languages
often seems a bit arcane at first, but it grows on you with coding experience.
To complicate matters further, programming languages themselves define
a spectrum of expressiveness. On the lowest level, programming deals with
electrical current in your processor. Current above or below a certain threshold is
used to represent the binary digits 0 and 1. Above these circuit-level electronics
lies a processor’s own programming, often called microcode. Using this, a
processor implements machine code, such as the x86 instruction set. Machine
code is often written using a higher-level syntax called Assembly. While some
code is written in this rather low-level language, we mostly abstract away details
of them in high-level languages such as C++ (this book’s language of choice).


This knowledge is somewhat useless from a problem solving standpoint,
but intimate knowledge of how a computer works is of high importance in
software engineering, and is occasionally helpful in programming competitions.
Therefore, you should not be surprised about certain remarks relating to these
low-level concepts.
These facts also provide some motivation for why we use something called
compilers. When programming in C++ we can not immediately tell a computer
to run our code. As you now know, C++ is code at a higher level than what
the processor of a computer can run. A compiler takes care of this problem by
translating our C++ code into machine code that the processor knows how to
handle. It is a program of its own and takes the code files we write as input and
produces executable files that we can run on the computer. The process and
purpose of a compiler is somewhat like what we do ourselves when translating
a method from English sentences or our own thoughts into the lower level
language of C++.

1.4 Pseudo Code


Somewhere in between describing algorithms in English text and in a programming
language we find something called pseudo code. As hinted by its
name it is not quite real code. The instructions we write are not part of the
programming language of any particular computer. The point of pseudo code
is to be independent of the computer it is implemented on. Instead, it tries
to convey the main points of an algorithm in a detailed manner so that it can
easily be translated into any particular programming language. Secondly, we
sometimes fall back to the liberties of the English language. At some point, we
may decide that “choose the smallest number of a sequence” is clear enough for
our audience.
With an explanation of this distinction in hand, let us look at a concrete
example of pseudo code. The honor of being an example again falls upon
selection sort, now described in pseudo code:

1: procedure SelectionSort(sequence A)
2:     Let A' be an empty sequence
3:     while A is not empty do
4:         minIndex ← 0
5:         for every element A_i in A do
6:             if A_i < A_minIndex then
7:                 minIndex ← i
8:         Append A_minIndex to A'
9:         Remove A_minIndex from A
10:    return the sequence A'

Pseudo code reads somewhat like our English language variant of the
algorithm, except the actions are broken down into much smaller pieces. Most
of the constructs of our pseudo code are more or less obvious. The notation
variable ← 𝑣𝑎𝑙𝑢𝑒 is how we denote an assignment in pseudo code. For those
without programming experience, this means that the variable named variable
now takes the value 𝑣𝑎𝑙𝑢𝑒. Pseudo code appears when we try to explain some
part of a solution in great detail but programming language specific aspects
would draw attention away from the algorithm itself.
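For comparison, the pseudo code above could be translated into C++ roughly as
follows. This is only a sketch – it uses the vector type and other constructs
that are introduced over the next two chapters – but note how every numbered
line of the pseudo code has a direct counterpart:

#include <iostream>
#include <vector>
using namespace std;

// A possible C++ translation of the SelectionSort pseudo code.
vector<int> selectionSort(vector<int> A) {
    vector<int> sorted; // the sequence A'
    while (!A.empty()) {
        int minIndex = 0;
        // Find the index of the smallest remaining element of A.
        for (int i = 0; i < (int)A.size(); i++) {
            if (A[i] < A[minIndex]) minIndex = i;
        }
        sorted.push_back(A[minIndex]); // append A_minIndex to A'
        A.erase(A.begin() + minIndex); // remove A_minIndex from A
    }
    return sorted;
}

int main() {
    vector<int> result = selectionSort({3, 6, 1, -1, 2, 2});
    for (int x : result) cout << x << " "; // prints -1 1 2 2 3 6
    cout << endl;
}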
Competitive Tip
In team competitions where a team only has a single computer, a team will often
have solved problems waiting to be coded. Writing pseudo code of the solution to one
of these problems while waiting for computer time is an efficient way to parallelize
your work. This can be practiced by writing pseudo code on paper even when you are
solving problems by yourself.

Exercise 1.9. Write pseudo code for your algorithm to the guessing problem
from Exercise 1.6.

1.5 The KS.Dev Online Judge


Most of the exercises in this book exist as problems on the KS.Dev web system.
You can find it at https://kodsport.dev. KS.Dev is a so-called online judge. It
contains a large collection of computational problems, and allows you to submit
a program you have written that purports to solve a problem. KS.Dev will then
run your program on a large number of predetermined instances of the problem
called the problem’s test data.
Problems on an online judge include some additional information compared
to our example problem. Since actual computers only have a finite amount of
time and memory, the amount of these resources available to our programs is
limited when solving an instance of a problem. This also means that the size of
inputs to a problem needs to be constrained as well, or else the resource limits for
a given problem would not be obtainable – an arbitrarily large input generally
takes arbitrarily large time to process, even for a computer. A more complete
version of the sorting problem as given in a competition could look like this:

Sorting
Time: 1s, memory: 1MB
Your task is to sort a sequence of integers in ascending order, i.e. from the
lowest to the highest.
Input
The input is a sequence of N integers (1 ≤ N ≤ 1000) a_0, a_1, ..., a_{N-1}
(|a_i| ≤ 10^9).
Output
Output a permutation a' of the sequence a, such that a'_0 ≤ a'_1 ≤ ... ≤ a'_{N-1}.

If your program exceeds the allowed resource limits (i.e. takes too much
time or memory), crashes, or gives an invalid output, KS.Dev will tell you so
with a rejected judgment. There are many kinds of rejected judgments, such as
Wrong Answer, Time Limit Exceeded, and Run-time Error. These mean your
program gave an incorrect output, took too much time, and crashed, respectively.
Assuming your program passes all the instances, it will be given the Accepted
judgment.
Note that getting a program accepted by KS.Dev is not the same as having a
correct program – it is a necessary but not sufficient criterion for correctness.
This is also a fact that can sometimes be exploited during competitions by
writing a knowingly incorrect solution that one thinks will pass all test cases
that the judges of the competitions designed.
We strongly recommend that you get a (free) account on KS.Dev so that you
can follow along with the book’s exercises.

Exercise 1.10. Register an account on KS.Dev.

Many other online judges exist, such as:

• Kattis (https://open.kattis.com)

• Codeforces (http://codeforces.com)

• CSAcademy (https://csacademy.com)

• AtCoder (https://atcoder.jp)

• TopCoder (https://topcoder.com)

• HackerRank (https://hackerrank.com)

Chapter Exercises
Exercise 1.11. Pick two sorting algorithms from Wikipedia’s list of sorting
algorithms: https://en.wikipedia.org/wiki/Category:Sorting_algorithms. Try
to understand them and their proof of correctness. Use them by hand to sort the
integers 5, 1, 2, 7, 5, 6, 2, 9.

Exercise 1.12. Consider the following problems:

Palindrome
A word is a palindrome if it reads the same forwards and backwards, for example
tacocat, madam, or abba. Determine if a word is a palindrome.

Input
The input consists of a single word, containing only lowercase letters a-z.
Output
Output yes if the word is a palindrome and no otherwise.

Primality
We call an integer 𝑛 > 1 a prime if its only positive divisors are 1 and 𝑛.
Determine if a particular integer is a prime.
Input
The input consists of a single integer 𝑛 > 1.
Output
Output yes if the number 𝑛 is a prime and no otherwise.
For each of them,

1. devise an algorithm to solve it,

2. formalize the algorithm and write it down in pseudo code, and

3. prove the correctness of the algorithm.


Chapter Notes
The introductions given in this chapter are very bare, mostly stripped down to
what you need to get by when solving algorithmic problems.
Many other books delve deeper into the theoretical study of algorithms
than we do, in particular regarding subjects not relevant to algorithmic problem
solving. Introduction to Algorithms [7] is a rigorous introductory textbook on
algorithms with both depth and breadth.
For a gentle introduction to the technology that underlies computers, CODE
[23] is a well-written journey from the basics of bits and bytes all the way up to
assembly code and operating systems. It requires no knowledge of programming
to read.

2 Programming in C++
In this chapter we learn the basics of the C++ programming language. This
language is the most common programming language within the competitive
programming community for a few reasons (aside from C++ being a popular
language in general). Programs coded in C++ are generally somewhat faster
than those written in most other competitive programming languages. There are
also many routines in the accompanying standard code libraries that are useful
when implementing algorithms.
Of course, no language is without downsides. C++ is a bit difficult to learn
as your first programming language to say the least. Its error management is
unforgiving, often causing erratic behavior in programs instead of crashing with
an error. Programming certain things becomes quite verbose, compared to many
other languages.
After bashing the difficulty of C++, you might ask if it really is the best
language in order to get started with algorithmic problem solving. While
there certainly are simpler languages we believe that the benefits outweigh the
disadvantages in the long term even though it demands more from you as a
reader. Either way, it is definitely the language we have the most experience of
teaching problem solving with.
When you study this chapter, you will see a lot of example code. Type
this code and run it. We can not really stress this point enough. Learning
programming from scratch – in particular a complicated language such as C++ –
is not possible unless you try the concepts yourself. Additionally, we strongly
recommend that you do every exercise in this chapter, even more so than in the
other chapters.
Finally, know that our treatment of C++ is minimal. We do not explain all
the details behind the language, nor do we teach good coding style or general
software engineering principles. In fact, we frequently make use of bad coding
practices. If you want to delve deeper, you can find more resources in the chapter
notes.


2.1 Development Environments


Before we get to the juicy parts of C++ you need to install a compiler for C++
and (optionally) a code editor.
We recommend the editor Visual Studio Code. The installation procedure
varies depending on what operating system you use. We provide instructions for
Windows, Ubuntu and macOS. If you choose to use some other editor, compiler
or operating system you must find out how to perform the corresponding actions
(such as compiling and running code) yourself.
Note that instructions like these tend to rot, with applications disappearing
from the web, operating systems changing names, and so on. In that case, you
are on your own and have to find instructions by yourself.

Windows

Installing a C++ compiler is somewhat complicated in Windows. We recommend
installing the Mingw-w64 compiler from http://www.mingw-w64.org/.
After installing the compiler, you can download the installer for Visual
Studio Code from https://code.visualstudio.com/.

Ubuntu

On Ubuntu, or similar Linux-based operating systems, you need to install
the GCC C++ compiler, which is the most popular compiler for Linux-based
systems. It is called g++ in most package managers and can be downloaded with
the command sudo apt-get install g++. After installing the compiler, you can
download the installer for Visual Studio Code from https://code.visualstudio.com/.
Choose the deb installer.

macOS

When using macOS, you first need to install the Clang compiler by installing
Xcode from the Mac App Store. This is also a code editor, but the compiler is
bundled with it.
After installing the compiler, you can download the installer for Visual
Studio Code from https://code.visualstudio.com/. It is available as a normal
macOS package for installation.


Installing the C++ tools


Now that you have installed the compiler and Visual Studio Code, you need
to install the C++ plugin for Visual Studio Code. You can do this by opening
the program, launching Quick Open (using Ctrl+P), typing ext install
ms-vscode.cpptools, and pressing Enter. Then, launch Quick Open again, but
this time type ext install formulahendry.code-runner instead.
The tools need to be configured a bit. Press Ctrl+Shift+P and search for
Open Settings. Select Open Settings (JSON) in the list. Here, enter the following
configuration:
{
    "code-runner.runInTerminal": true,
    "code-runner.saveAllFilesBeforeRun": true,
    "code-runner.executorMap": {
        "cpp": "cd $dir && g++ $fileName -fsanitize=undefined,address -o $fileNameWithoutExt -std=c++17 -Wall && $dir$fileNameWithoutExt"
    }
}
Then, restart your editor again.

2.2 Hello World!


Now that you have a compiler and editor ready, it is time to learn the basic
structure of a C++ program. The classical example of a program when learning
a new language is to print the text Hello World!. We also solve our first KS.Dev
problem in this section.

Exercise 2.1. Throughout this chapter, you will learn many concepts within
C++. We recommend that you create a notebook (for example in a file on your
computer) where you write down how the different constructs are used when
programming to keep as a reference for later.

Start by opening Visual Studio Code and create a new file by going to File
⇒ New File. Save the file as hello.cpp by pressing Ctrl+S. Make sure to save it
somewhere you can find it.
Now, type the code from Listing 2.1 into your editor.


Listing 2.1 Hello World!

1 #include <iostream>
2
3 using namespace std;
4
5 int main() {
6 // Print Hello World!
7 cout << "Hello World!" << endl;
8 }

To run the program in Visual Studio Code, you press Ctrl+Alt+N. A tab
below your code named TERMINAL containing the text Hello World! should appear.
If no window appears, you probably mistyped the program.
Coincidentally, KS.Dev happens to have a problem whose output description
dictates that your program should print the text Hello World!. How convenient.
This is a great opportunity to get familiar with KS.Dev.
Problem 2.1
Hello World! – hello

When you submit your solution, KS.Dev grades it and gives you its judgment.
If you typed everything correctly, KS.Dev tells you it got Accepted. Otherwise,
you probably got Wrong Answer, meaning your program output the wrong text
(and you mistyped the code).
Now that you have managed to solve the problem, it is time to talk a bit
about the code you typed.
The first line of the code,
#include <iostream>

is used to include the iostream – input and output stream – file from the so-called
standard library of C++. The standard library is a large collection of ready-to-use
algorithms, data structures, and other routines which you can use when coding.
For example, there are sorting routines in the C++ standard library, meaning you
do not need to implement your own sorting algorithm when coding solutions.
Later on, we will see other useful examples of the standard library and
include many more files. The iostream file contains routines for reading and
writing data to your screen. Your program used code from this file when it
printed Hello World! upon execution.


On some platforms, there is a special include file called bits/stdc++.h. This
file includes the entire standard library. You can check if it is available on your
platform by including it using
#include <bits/stdc++.h>

in the beginning of your code. If your program still compiles, you can use this
and not include anything else. By using this line you do not have to care about
including any other files from the standard library which you wish to use.
The third line,
using namespace std;

tells the compiler that we wish to use code from the standard library. If we
did not use it, we would have to specify this every time we used code from the
standard library later in our program by prefixing what we use from the library
by std:: (for example std::cout).
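For example, without that line the printing statement in the Hello World!
program would have to be written with explicit prefixes:

#include <iostream>

int main() {
    // Both cout and endl come from the standard library,
    // so both need the std:: prefix here.
    std::cout << "Hello World!" << std::endl;
}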
The fifth line defines our main function. When we instruct the computer to
run our program the computer starts looking at this point for code to execute.
The first line of the main function is thus where the program starts to run with
further lines in the function executed sequentially. Later on we learn how to
define and use additional functions as a way of structuring our code. Note that
the code in a function – its body – must be enclosed by curly brackets. Without
them, we would not know which lines belonged to the function.
On line 6, we wrote a comment
// Print Hello World!

Comments are explanatory lines which are not executed by the computer. The
purpose of a comment is to explain what the code around it does and why. They
begin with two slashes // and continue until the end of the current line.
It is not until the seventh line that things start happening in the program. We
use the standard library utility cout to print text to the screen. This is done by
writing e.g.:
cout << "this is text you want to print. ";
cout << "you can " << "also print " << "multiple things. ";
cout << "to print a new line" << endl << "you print endl" << endl;
cout << "without any quotes" << endl;

Lines that do things in C++ are called statements. Note the semicolon at
the end of the line! Semicolons are used to specify the end of a statement, and
are mandatory.
Exercise 2.2. Must the main function be named main? What happens if you
changed main to something else and try to run your program?


Exercise 2.3. Play around with cout a bit, printing various things. For example,
you can print a pretty haiku.

2.3 Variables and Types


The Hello World! program is boring. It only prints text – seldom the only
necessary component of an algorithm (aside from the Hello World! problem on
KS.Dev). We now move on to a new but hopefully familiar concept.
When we solve mathematical problems, it often proves useful to introduce
all kinds of names for known and unknown values. Math problems often deal
with classes of 𝑁 students, ice cream trucks with velocity v_car km/h, and candy
prices of p_candy $/kg.
This concept naturally translates into C++ but with a twist. In most
programming languages, we first need to say what type a variable has! We do
not bother with this in mathematics. We say “let 𝑥 = 5”, and that is that. In
C++, we need to be a bit more verbose. We must write that “I want to introduce
a variable 𝑥 now. It is going to be an integer – more specifically, 5”. Once we
have decided what kind of value 𝑥 will be (in this case integer) it will always be
an integer. We cannot just go ahead and say “oh, I’ve changed my mind. 𝑥 = 2.5
now!” since 2.5 is of the wrong type (a decimal number rather than an integer).
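As an aside, a C++ compiler will in fact accept such an assignment, but it
silently truncates the decimal part – which is usually not what you want. A
small sketch you can try yourself:

#include <iostream>
using namespace std;

int main() {
    int x = 5;
    x = 2.5; // accepted by the compiler, but truncated to an integer
    cout << x << endl; // prints 2
}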

Listing 2.2 Variables

1 #include <iostream>
2 using namespace std;
3
4 int main() {
5 int five = 5;
6 cout << five << endl;
7 int seven = 7;
8 cout << seven << endl;
9 five = seven + 2; // = 7 + 2 = 9
10 cout << five << endl;
11 seven = 0;
12 cout << five << endl; // five is still 9
13 cout << 5 << endl; // we print the integer 5 directly
14 }

Another major difference is that variables in C++ are not tied to a single
value for the entirety of their lifespans. Instead, we are able to modify the value
which our variables have using something called assignment. Some languages
do not permit this, preferring their variables to be immutable.


In Listing 2.2 we demonstrate how variables are used in C++. Type this
program into your editor and run it. What is the output? What did you expect
the output to be?
The first time we use a variable in C++ we must decide what kind of values
it may contain. This is called declaring the variable of a certain type. For
example the statement
int five = 5;

declares an integer variable five and assigns the value 5 to it. The int part is
C++ for integer and is what we call a type. After the type, we write the name of
the variable – in this case five. Finally, we may assign a value to the variable.
Note that further use of the variable never include the int part. We declare the
type of a variable once and only once.
Later on in Listing 2.2 we decide that 5 is a somewhat small value for
a variable called five. We can change the value of a variable by using the
assignment operator – the equality sign =. The assignment
five = seven + 2;

states that from now on the variable five should take the value given by the
expression seven + 2. Since (at least for the moment) seven has the value 7 the
expression evaluates to 7 + 2 = 9. Thus five will actually be 9, explaining the
output we get from line 10.
On line 11 we change the value of the variable seven. Note that line 12
still prints the value of five as 9. Some people find this model of assignment
confusing. We first performed the assignment five = seven + 2;, but the value
of five did not change with the value of seven. This is mostly an unfortunate
consequence of the choice of = as operator for assignment. One could think that
“once an equality, always an equality” – that the value of five should always be
the same as the value of seven + 2. This is not the case. An assignment sets
the value of the variable on the left hand side to the value of the expression on
the right hand side at a particular moment in time, nothing more.
The snippet also demonstrates how to print the value of a variable on the
screen – we cout it the same way as with text. This also clarifies why text needs
to be enquoted. Without quotes, we can not distinguish between the text string
"hi" and the variable hi.


Note that it is possible to declare a variable without assigning a value to
it. When this is done, the variable may receive a random value instead. This is
useful when you immediately want to assign a value to a variable that the user can
input (see the next Section 2.4).
Exercise 2.4. What values will the variables 𝑎, 𝑏, and 𝑐 have after executing the
following code:
int a = 4;
int b = 2;
int c = 7;
b = a + c;
c = b - 2;
a = a + a;
b = b * 2;
c = c - c;

Here, the operator - denotes subtraction and * represents multiplication. Once
you have arrived at an answer, type this code into the main function of a new
program and print the values of the variables. Did you get it right?
Exercise 2.5. What happens when an integer is divided by another integer? Try
printing the result of the following divisions: 3/5, 15/5, 2/2, 7/2, −7/2, and 7/−2.

Exercise 2.6. C++ allows declarations of immutable (constant) variables, using
the keyword const. For example
const int FIVE = 5;

What happens if you try to perform an assignment to such a variable?


There are many other types than int. We have seen one (although without
its correct name), the type for text. You can see some of the most common types
in Listing 2.3.
The text data type is called string. Values of this type must be enclosed
with double quotes. If we want to include an actual quote character in a string,
we type \".
There exists a data type containing one single letter, the char. Such a value
is surrounded by single quotes. The char value containing the single quote is
written '\'', similarly to how we included double quotes in strings.
Then comes the int, which we discussed earlier. The long long type contains
integers just like the int type. They differ in how large integers they can contain.
An int can only contain integers between −2^31 and 2^31 − 1 while a long long
extends this range to −2^63 to 2^63 − 1.


Listing 2.3 Types

1 string text = "Johan said: \"heya!\" ";
2 cout << text << endl;
3
4 char letter = '@';
5 cout << letter << endl;
6
7 int number = 7;
8 cout << number << endl;
9
10 long long largeNumber = 888888888888LL;
11 cout << largeNumber << endl;
12
13 double decimalNumber = 513.23;
14 cout << decimalNumber << endl;
15
16 bool thisisfalse = false;
17 bool thisistrue = true;
18 cout << thisistrue << " and " << thisisfalse << endl;

Exercise 2.7. Since \" is used to include a double quote in a string, we can not
include backslashes in a string like any other character. For example, how would
you output the verbatim string \"? Find out how to include a literal backslash in
a string (for example by searching the web or thinking about how we included
the different quote characters).

Exercise 2.8. Write a program that assigns the minimum and maximum values
of an int to an int variable x. What happens if you increment or decrement this
value using x = x + 1; or x = x - 1; respectively and print its new value?

Competitive Tip
One of the most common sources for errors in code is trying to store an integer value
outside the range of the type. Always make sure your values fit inside the range of an
int if you use it. Otherwise, use long longs!
One of the reasons for why we do not simply use long long all the time is that some
operations involving long longs can be slower than those using ints under certain conditions.
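As an illustration – a sketch, since the exact output of the overflowing version
may vary (signed overflow is undefined behavior in C++):

#include <iostream>
using namespace std;

int main() {
    int a = 100000;
    // 100000 * 100000 = 10^10 does not fit in an int,
    // so this multiplication overflows.
    cout << a * a << endl; // prints a nonsense value
    // Casting one operand makes the multiplication happen in 64 bits.
    long long b = (long long)a * a;
    cout << b << endl; // prints 10000000000
}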

Next comes the double type. This type represents decimal numbers. Note
that the decimal sign in C++ is a dot, not a comma. There is also another similar
type called the float. The difference between these types is similar to that of
the int and long long. A double can represent “more” decimal numbers than
a float. This may sound weird considering that there is an infinite number of
decimal numbers even between 0 and 1. However, a computer clearly cannot
represent every decimal number – not even those between 0 and 1. To do this,
it would need infinite memory to distinguish between these numbers. Instead,
they represent a limited set of numbers – with about 15 significant digits, and
about 308 zeroes to the left or right of those digits. Floats have fewer significant
digits and a smaller range.
The last of our common types is the bool (short for boolean). This type can
only contain one of two values – it is either true or false. While this may look
useless at a first glance, the importance of the boolean becomes apparent later.

Exercise 2.9. In the same way the integer types had a valid range of values, a
double cannot represent arbitrarily large values. Find out what the minimum and
maximum values a double can store are.

C++ has a construct called the typedef, or type definition. It allows us to


give certain types new names. Since typing long long for every large integer
variable is very annoying, we could use a type definition to alias it with the
much shorter ll instead. Such a typedef statement looks like this:

typedef long long ll;

On every line after this statement, we can use ll just as if it were a long long:

ll largeNumber = 888888888888LL;

Sometimes we use types with very long names but do not want to shorten
them using type definitions. This could be the case when we use many different
such types and typedefing them would take an unnecessarily long time. We then
resort to using the auto “type” instead. If a variable is declared as auto and
assigned a value at the same time, its type is inferred from that of the value. This
means we could write

auto str = 123;

instead of

int str = 123;


2.4 Input and Output


In previous sections we occasionally printed things onto our screen. To spice
our code up a bit we are now going to learn how to do the reverse – reading
values which we type on our keyboards into a running program! When we run a
program we may type things in the window that appears. Pressing the Enter key
allows the program to read what we have written so far.
Reading input data is done just as you would expect, almost entirely symmetric
to printing output. Instead of cout we use cin, and instead of << variable
we use >> variable, i.e.
cin >> variable;

Type in the program from Listing 2.4 to see how it works.

Listing 2.4 Input

1 #include <iostream>
2 using namespace std;
3
4 int main() {
5 string name;
6 cout << "What's your first name?" << endl;
7 cin >> name;
8 int age;
9 cout << "How old are you?" << endl;
10 cin >> age;
11 cout << "Hi, " << name << "!" << endl;
12 cout << "You are " << age << " years old." << endl;
13 }

Exercise 2.10. What happens if you type an invalid input, such as your first
name instead of your age?

When the program reads input into a string variable it only reads the text
until the first whitespace.
We revisit more advanced input and output concepts in Section 3.10 about
the standard library. For example, we learn how to read entire lines of text and
not only single words.


Problem 2.2
Echo – echo
Note: only solve part 1, receiving 1/2 points

2.5 Operators
Earlier we saw examples of what is called operators, such as the assignment
operator =, and the arithmetic operators + - * /, which stand for addition,
subtraction, multiplication and division. They work almost like they do in
mathematics, and allow us to write code such as the one in Listing 2.5.

Exercise 2.11. Type in Listing 2.5 and test it on a few different values. Most
importantly, test:

• 𝑏=0

• Negative values for a and/or b

• Values where the expected result is outside the valid range of an int

As you probably noticed, the division operator of C++ performs so-called


integer division. This means the answer is rounded to an integer (towards 0).
Hence 7 / 3 = 2, with remainder 1, and -7 / 3 = -2.

Exercise 2.12. If division rounds towards zero, how do you compute 𝑥/𝑦
rounded to an integer away from zero?

The snippet also introduces the modulo operator, %. It computes the


remainder of the first operand when divided by the second. As an example, 7 %
3 = 1. Different programming languages have different behaviours regarding
modulo operations on negative integers. In particular, the value of a modulo
operation can be negative when including negative operands.
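To make this concrete, the following sketch of our own shows what C++ itself does (since C++11 the result of % has the same sign as the first operand):

cout << (7 % 3) << endl;   // prints 1
cout << (-7 % 3) << endl;  // prints -1 - the sign follows the first operand
cout << (7 % -3) << endl;  // prints 1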
In case we want the answer to be a decimal number instead of performing
integer division one of the operands must be a double (Listing 2.6).
We end this section with some shorthand operators. Check out Listing 2.7
for some examples. Each arithmetic operator has a corresponding combined
assignment operator. Such an operator, e.g. a += 5;, is equivalent to a = a + 5;
They act as if the variable on the left hand side is also the left hand side of the
corresponding arithmetic operator and assign the result of this computation to
said variable. Hence, the above statement increases the variable a with 5.


Listing 2.5 Operators

1 #include <iostream>
2 using namespace std;
3
4 int main() {
5 int a = 0;
6 int b = 0;
7 cin >> a >> b;
8 cout << "Sum: " << (a + b) << endl;
9 cout << "Difference: " << (a - b) << endl;
10 cout << "Product: " << (a * b) << endl;
11 cout << "Quotient: " << (a / b) << endl;
12 cout << "Remainder: " << (a % b) << endl;
13 }

Listing 2.6 Division Operators

1 int a = 6;
2 int b = 4;
3 cout << (a / b) << endl;
4
5 double aa = 6.0;
6 double bb = 4.0;
7 cout << (aa / bb) << endl;

It turns out that addition and subtraction with 1 is a fairly common operation.
So common, in fact, that additional operators were introduced into C++ for
the purpose of saving an entire character compared to the highly verbose +=1
operator. These operators consist of two plus signs or two minus signs. For
instance, a++ increments the variable by 1.
We sometimes use the fact that these expressions also evaluate to a value.
Which value this is depends on whether we put the operator before or after the
variable name. By putting ++ before the variable, the value of the expression
will be the incremented value. If we put it afterwards we get the original value.
To get a better understanding of how this works it is best if you type the code in
Listing 2.7 in yourself and analyze the results.
We end the discussion on operators by saying something about operator
precedence, i.e. the order in which operators are evaluated in expressions.
In mathematics, there is a well-defined precedence: brackets go first, then


Listing 2.7 Shorthand Operators

1 int num = 0;
2 num += 1;
3 cout << num << endl;
4 num *= 2;
5 cout << num << endl;
6 num -= 3;
7 cout << num << endl;
8 cout << num++ << endl;
9 cout << num << endl;
10 cout << ++num << endl;
11 cout << num << endl;
12 cout << num-- << endl;
13 cout << num << endl;

exponents, followed by division, multiplication, addition, and subtraction.


Furthermore, most operations (exponents being a notable exception) have left-
to-right associativity so that 5 − 3 − 1 equals ((5 − 3) − 1) = 1 rather than
(5 − (3 − 1)) = 3. In C++, there are a lot of operators, and knowing precedence
rules can easily save you from bugs in your future code.

Exercise 2.13. Research online C++ documentation on operator precedence to


determine what the expression
2 * 4 - 7 * 2 % 4 / 2

evaluates to in C++. Run it as a program to see if you got it correct.

Problem 2.3
Two-sum – twosum
Triangle Area – triarea
Bijele – bijele
Digit Swap – digitswap
Pizza Crust – pizzacrust
R2 – r2

2.6 If Statements
In addition to assignment and arithmetic there are a large number of comparison
operators. These compare two values and evaluate to a bool value depending
on the result of the comparison (see Listing 2.8).


Listing 2.8 Comparison Operators

1 a == b // check if a equals b
2 a != b // check if a and b are different
3 a > b // check if a is greater than b
4 a < b // check if a is less than b
5 a <= b // check if a is less than or equal to b
6 a >= b // check if a is greater than or equal to b

A bool can also be negated using the ! operator. So the expression !false
(which we read as “not false”) has the value true and vice versa !true evaluates
to false. The operator works on any boolean expressions, so that if b would be
a boolean variable with the value true, then the expression !b evaluates to false.
There are two more important boolean operators. The and operator && takes
two boolean values and evaluates to true if and only if both values are true.
Similarly, the or operator || evaluates to true if and only if at least one of its
operands is true.
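A minimal illustration of our own of these operators in action (recall that a bool prints as 1 or 0):

bool sunny = true;
bool warm = false;
cout << (sunny && warm) << endl; // prints 0 - both must be true
cout << (sunny || warm) << endl; // prints 1 - one of them is true
cout << !warm << endl;           // prints 1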

Exercise 2.14. Write a program that reads two integers as input, and prints the
result of the different comparison operators from Listing 2.8, e.g.
cout << (a == b) << endl;

Note the parentheses used due to operator precedence!

A major use of boolean variables is in conjunction with if statements (also


called conditional statements). They come from the necessity of executing
certain lines of code if (and only if) some condition is true. Let us write a
program that takes an integer as input, and tells us whether it is odd or even.
We can do this by computing the remainder of the input when divided by 2
(using the modulo operator) and checking if it is 0 (even number), 1 (positive
odd number), or -1 (negative odd number). An implementation of this can be
seen in Listing 2.9.
An if statement consists of two parts – a condition, given inside brackets
after the if keyword, followed by a body – some lines of code surrounded by
curly brackets. The code inside the body will be executed in case the condition
evaluates to true.
Our odd or even example contains a certain redundancy. If a number is
not even we already know it is odd. Checking this explicitly using the modulo


Listing 2.9 Odd or Even

1 int input;
2 cin >> input;
3 if (input % 2 == 0) {
4 cout << input << " is even!" << endl;
5 }
6 if (input % 2 == 1 || input % 2 == -1) {
7 cout << input << " is odd!" << endl;
8 }

operator seems to be a bit unnecessary. Indeed, there is a construct that saves


us from this verbosity – the else statement. It is used after an if statement and
contains code that should be run if the condition of the if statement is false.
We can use this to simplify our odd and even program to
the one in Listing 2.10.

Listing 2.10 Odd or Even 2

1 int input;
2 cin >> input;
3 if (input % 2 == 0) {
4 cout << input << " is even!" << endl;
5 } else {
6 cout << input << " is odd!" << endl;
7 }

There is one last if-related construct – the else if. Since code is worth a
thousand words, we demonstrate how it works in Listing 2.11 by implementing
a helper for the children’s game FizzBuzz. In FizzBuzz, one goes through the
natural numbers in increasing order and says them out loud. When the number is
divisible by 3 you instead say Fizz. If it is divisible by 5 you say Buzz, and if it
is divisible by both you say FizzBuzz.

Exercise 2.15. Run the program in Listing 2.11 with the values 30, 10, 6, 4.
Explain the output you get.

Problem 2.4
Expected Earnings – casino


Grading – grading
Three-Sort – threesort
Spavanac – spavanac
Cetvrta – cetvrta

2.7 For Loops


Another rudimentary building block of programs is the for loop. A for loop is
used to execute a block of code multiple times. The most basic loop repeats
code a fixed number of times as in the example from Listing 2.12.
A for loop is built up from four parts. The first three parts are the semi-colon
separated statements immediately after the for keyword. In the first of these
parts you write some expression, such as a variable declaration. In the second
part you write an expression that evaluates to a bool, such as a comparison
between two values. In the third part you write another expression.
The first part will be executed only once – it is the first thing that happens in
a loop. In this case, we declare a new variable i and set it to 0. The loop will
then be repeated until the condition in the second part is false. Our example loop
will repeat until i is no longer less than repetitions. The third part executes
after each execution of the loop. Since we use the variable i to count how many
times the loop has executed, we want to increment this by 1 after each iteration.
Together, these three parts make sure our loop will run exactly repetitions
times. The final part of the loop is the statements within curly brackets. Just as
with the if statements, this is called the body of the loop and contains the code
that will be executed in each repetition of the loop. A repetition of a loop is in
algorithm language more commonly referred to as an iteration.

Exercise 2.16. What happens if you enter a negative value as the number of
loop repetitions?

Exercise 2.17. Design a loop that instead counts backwards, from repetitions − 1
to 0.

Problem 2.5
N-Sum – nsum
Building Pyramids – pyramids
Echo – echo
Note: solve both parts now, receiving 2/2 points


Cinema Crowds – cinema


Refridgerator Transports – refridgerator

Within a loop, two useful keywords can be used to modify the loop – continue
and break. Using continue; inside a loop exits the current iteration and starts the
next one. break; on the other hand, exits the loop altogether. For an example,
consider Listing 2.13.

Exercise 2.18. What will the following code snippet output?

1 for (int i = 0; false; i++) {


2 cout << i << endl;
3 }
4
5 for (int i = 0; i >= -10; --i) {
6 cout << i << endl;
7 }
8
9 for (int i = 0; i <= 10; ++i) {
10 if (i % 2 == 0) continue;
11 if (i == 8) break;
12 cout << i << endl;
13 }

Problem 2.6
Cinema Crowds 2 – cinema2
Lamps – lamps

2.8 While Loops


There is a second kind of loop, which is simpler than the for loop. It is called a
while loop, and works like a for loop where the initial statement and the update
statement are removed, leaving only the condition and the body. It can be
used when you want to loop over something until a certain condition is false
(Listing 2.14).
The break; and continue; statements work the same way as they do in a for
loop.
Problem 2.7
3n+1 – 3nplus1
Soda Slurper – sodaslurper


Listing 2.11 Else If

1 int input;
2 cin >> input;
3 if (input % 15 == 0) {
4 cout << "FizzBuzz" << endl;
5 } else if (input % 5 == 0) {
6 cout << "Buzz" << endl;
7 } else if (input % 3 == 0) {
8 cout << "Fizz" << endl;
9 } else {
10 cout << input << endl;
11 }

Listing 2.12 For Loops

1 int repetitions = 0;
2 cin >> repetitions;
3 for (int i = 0; i < repetitions; i++) {
4 cout << "This is repetition " << i << endl;
5 }

2.9 Functions
In mathematics a function is something that takes one or more arguments and
computes some value based on them. Common functions include the squaring
function square(𝑥) = 𝑥², the addition function add(𝑥, 𝑦) = 𝑥 + 𝑦, or the minimum
function min(𝑎, 𝑏), which evaluates to the smallest of its arguments.
Functions exist in programming as well but work a bit differently. Indeed,
we have already seen a function – the main() function. We have implemented
the example functions in Listing 2.15.
In the same way that a variable declaration starts by proclaiming what
data type the variable contains a function declaration states what data type the
function evaluates to. Afterwards, we write the name of the function followed
by its arguments (which is a comma-separated list of variable declarations).
Finally, we give it a body of code wrapped in curly brackets.
All of these functions contain a statement with the return keyword, unlike our
main function. A return statement says “stop executing this function, and return
the following value!”. Thus, when we call the squaring function by square(x),


Listing 2.13 Break and Continue

1 int check = 36;


2
3 for (int divisor = 2; divisor * divisor <= check; ++divisor) {
4 if (check % divisor == 0) {
5 cout << check << " is not prime!" << endl;
6 cout << "It equals " << divisor << " x " << (check / divisor) << endl;
7 break;
8 }
9 }
10
11 for (int divisor = 1; divisor <= check; ++divisor) {
12 if (check % divisor == 0) {
13 continue;
14 }
15 cout << divisor << " does not divide " << check << endl;
16 }

Listing 2.14 While

1 int num = 9;
2 while (num != 1) {
3 if (num % 2 == 0) {
4 num /= 2;
5 } else {
6 num = 3 * num + 1;
7 }
8 cout << num << endl;
9 }

the function will compute the value x * x and make sure that square(x) evaluates
to just that.
Why have we left a return statement out of the main function? In main(), the
compiler inserts an implicit return 0; statement at the end of the function.
Exercise 2.19. What will the following function calls evaluate to?
min(square(10), add(square(9), 23));

Exercise 2.20. We declared all of the new arithmetic functions above our main
function in the example. Why did we do this? What happens if you move one
below the main function instead? (Hint: what happens if you try to use a variable
before declaring it?)


Listing 2.15 Functions

1 #include <iostream>
2
3 using namespace std;
4
5 int square(int x) {
6 return x * x;
7 }
8
9 int min(int x, int y) {
10 if (x < y) {
11 return x;
12 } else {
13 return y;
14 }
15 }
16
17 int add(int x, int y) {
18 return x + y;
19 }
20
21 int main() {
22 int x, y;
23 cin >> x >> y;
24 cout << x << "^2 = " << square(x) << endl;
25 cout << x << " + " << y << " = " << add(x, y) << endl;
26 cout << "min(" << x << ", " << y << ") = " << min(x, y) << endl;
27 }

Exercise 2.21. Research online what a forward declaration of a function is,


and how it resolves the problem from Exercise 2.20.

Problem 2.8
Arithmetic Functions – arithmeticfunctions

An important caveat to note when calling functions is that the arguments


we send along are copied. If we try to change them by assigning values to our
arguments, we will not change the original variables in the calling function (see
Listing 2.16 for an example).
We can also choose to not return anything by using the void return type. This
may seem useless since nothing ought to happen if we call a function but do
not get anything in return. However, there are ways we can affect the program


Listing 2.16 Argument Copying

1 void change(int val) {


2 val = 0;
3 }
4
5 int main() {
6 int variable = 100;
7 change(variable);
8 cout << "Variable is " << variable << endl;
9 }

without returning.
The first one is by using global variables. It turns out that variables may
be declared outside of a function. It is then available to every function in your
program. Changes to a global variable by one function are also seen by other
functions (try out Listing 2.17 to see them in action).

Listing 2.17 Global Variables

1 int currentMoney = 0;
2
3 void deposit(int newMoney) {
4 currentMoney += newMoney;
5 }
6 void withdraw(int withdrawal) {
7 currentMoney -= withdrawal;
8 }
9
10 int main() {
11 cout << "Currently, you have " << currentMoney << " money" << endl;
12 deposit(1000);
13 withdraw(2000);
14 cout << "Oh-oh! Your balance is " << currentMoney << " :(" << endl;
15 }

Problem 2.9
Counting Days – countingdays

Secondly, we may actually change the variables given to us as arguments by


declaring them as references. Such an argument is written by adding a & before


the variable name, for example int &x. If we perform assignments to the variable
x within the function we change the variable used for this argument in the calling
function instead. Listing 2.18 contains an example of using references.

Listing 2.18 References

1 // Note &val instead of val


2 void change(int &val) {
3 val = 0;
4 }
5
6 int main() {
7 int variable = 100;
8 cout << "Variable is " << variable << endl;
9 change(variable);
10 cout << "Variable is " << variable << endl;
11 }

Problem 2.10
Logic Functions – logicfunctions

Exercise 2.22. Why is the function call change(4) not valid C++? (Hint: what
exactly are we changing when we assign to the reference in change?)

2.10 Structures
Algorithms operate on data, usually lots of it. Programming language designers
therefore came up with many ways of organizing the data our programs use.
One of these constructs is the structure (also called a record, and in C++ almost
equivalent to something called a class). Structures are a special kind of data
type that can contain member variables – variables inside them – and member
functions – functions which can operate on member variables.
The basic syntax used to define a structure looks like this:
struct Point {
double x;
double y;
};
This particular structure contains two member variables, x and y, representing
the coordinates of a point in 2D Euclidean space.
Once we have defined a structure we can create instances of it. Every
instance has its own copy of the member variables of the structure. Structs


essentially encapsulate concepts – like books – while instances of the struct
represent individual, particular books (like this one!).
To create an instance of a struct, use the same syntax as with other variables.
We can get the value of a member variable of a structure using the syntax
instance.variable:

Point origin; // create an instance of the Point structure

// set the coordinates to (0, 0)


origin.x = 0;
origin.y = 0;

cout << "The origin is (" << origin.x << ", "
<< origin.y << ")." << endl;

As you can see structures allow us to group certain kinds of data together in a
logical fashion. Later on, this will simplify the coding of certain algorithms and
data structures immensely.
There is an alternate way of constructing instances called constructors. A
constructor looks like a function inside our structure and allows us to pass
arguments when we create a new instance of a struct. The constructor receives
these arguments to help set up the instance.
Let us add a constructor to our point structure, to more easily create instances:

struct Point {
double x;
double y;

Point(double theX, double theY) {


x = theX;
y = theY;
}
};

The newly added constructor lets us pass two arguments when constructing
the instance to set the coordinates correctly. With it, we avoid the two extra
statements to set the member variables.
Point p(4, 2.1);
cout << "The point is (" << p.x << ", " << p.y << ")." << endl;

Structure values can also be constructed outside of a variable declaration using


the syntax
Point(1, 2);

so that we can reassign a previously declared variable with


p = Point(1, 2);

We can also define functions inside the structure. These functions work
just like any other functions except they can also access the member variables
of the instance that the member function is called on. For example, we might
want a convenient way to mirror a certain point in the x-axis. This could be
accomplished by adding a member function:
struct Point {
double x;
double y;

Point(double theX, double theY) {


x = theX;
y = theY;
}

Point mirror() {
return Point(x, -y);
}
};

To call the member function mirror() on the point p, we write p.mirror(),


for example:
Point p(1, 2);
Point mirrored = p.mirror();
cout << "(" << mirrored.x << ", " << mirrored.y << ")" << endl;

Member functions can also use the void return type. Even though such a
function returns nothing, it can still modify the member variables of the instance
it belongs to.

Exercise 2.23. Add a translate member function to the point structure. It should
take two double values x and y as arguments, returning a new point which is the
instance point translated by (𝑥, 𝑦).

Similarly to the const modifier that could be added to a variable declaration,


one can also declare a member function to be const:
Point mirror() const {
return Point(x, -y);
}

The keyword is added after the parameter list, right before the opening curly
brace. Such a function is unable to modify any of the member variables. It
cannot call other member functions that are not declared as const either.
Generally, you will never have to worry about declaring functions to be const.


Exercise 2.24. What happens if we try to change a member variable in a const


member function?

Finally, C++ has a powerful mechanism called operator overloading. It


allows us to define how various operators such as + should behave if we apply
them to instances of a struct. For example, we could define what happens
when we write
a + b

where a and b are Points. The syntax for the binary operators looks like this:
Point operator+(Point other) {
double newX = x + other.x;
double newY = y + other.y;
return Point(newX, newY);
}

Try this function out by defining two points and computing their sum.
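Such a test might look something like this (a sketch of our own, using the Point structure defined above):

Point a(1, 2);
Point b(3, 4.5);
Point sum = a + b;
cout << "(" << sum.x << ", " << sum.y << ")" << endl; // prints (4, 6.5)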

Exercise 2.25. One can use operator overloading for binary operators where the
types are different as well. For example,
Point operator*(double m) { ... }

would define what happens if you multiply a point by a double. Add such a
function to your point, that returns a point with its coordinates scaled by the
given double.

Exercise 2.26. Fill in the remaining code to implement this structure:


1 struct Quotient {
2 // .. member variables?
3 // Construct a new Quotient with the given numerator and denominator
4 Quotient(int n, int d) { }
5 // Return a new Quotient, this instance plus the "other" instance
6 Quotient add(const Quotient &other) const { }
7 // Return a new Quotient, this instance times the "other" instance
8 Quotient multiply(const Quotient &other) const { }
9 // Output the value on the screen in the format n/d
10 void print() const { }
11 };

2.11 Arrays
In the Sorting Problem from Chapter 1 we often spoke of the data type “sequence
of integers”. Until now, none of the data types we have seen in C++ represents
this kind of data. We present the array. It is a special type of variable, which


can contain a large number of variables of the same type. For example, it could
be used to represent the recurring data type “sequence of integers” from the
Sorting Problem in Chapter 1. When declaring an array, we specify the type of
variable it should contain, its name, and its size using the syntax:
type name[size];
For example, an integer array of size 10 named seq would be declared with
int seq[10];
This creates 10 integer “variables” which we can refer to using the syntax
seq[index], starting from zero (they are zero-indexed). Thus we can use seq[0],
seq[1], etc., all the way up to seq[9]. The values are called the elements of the
array.

Figure 2.1: A 10-element array called seq, with elements seq[0] through seq[9].

Be aware that using an index outside the valid range for a particular array
(i.e. below 0 or above the size − 1) can cause erratic behavior in the program
without crashing it.
If you declare a global array all elements get a default value. For numeric
types this is 0, for booleans this is false, for strings this is the empty string and
so on. If, on the other hand, the array is declared in the body of a function, that
guarantee does not apply. Instead of being zero-initialized, the elements can
have random values. For this reason, arrays are mostly declared globally in
competitive programming.
You can see an example of arrays in action in Listing 2.19, which computes
a few of the possible scores of a roll in the dice game Yatzee.
Later on (Section 3.1) we transition from using arrays to a much more
powerful structure from the standard library which serves the same purpose –
the vector.
Problem 2.11
Reversal – reverse
N-Back – nback


Modulo – modulo
I’ve Been Everywhere, Man – everywhere

2.12 Lambdas
We will now briefly discuss a somewhat complex language construct – lambdas.
They are very seldom necessary for solving problems, but we occasionally use
them in code throughout the book.
A lambda expression is essentially an unnamed function that can be defined
within another function and assigned to a variable of the function type:
function<int(int, int)> op = [](int a, int b) -> int {
    return a * b + a + b;
};
cout << op(5, op(1, 2)) << endl;

Here, we have defined a function that takes two values a and b, and returns
the value a * b + a + b. We have assigned the function to the variable op, and
can invoke it as if it was a regular function with that name.
Generally, definitions look simpler than this – if the function is “simple
enough”, we can ignore the -> int part, which we use to specify the return value
of the lambda. We also tend to use the auto type instead of the more convoluted
function<...> type, as long as the lambda does not call itself through the name
of the variable to which it is assigned.
Thus, the declaration may also look like this:
auto op = [](int a, int b) {
return a * b + a + b;
};

What is the point of doing this rather than simply using regular functions?
Lambdas can also be given access to variables of the enclosing function:
int x = 5;
auto addToX = [&](int y) {
x += y;
};

Here, note the added ampersand in [&]. This means that all variables defined
before the lambda in the function should be accessible within the lambda as
references.
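Continuing the example, calling the lambda really does change x in the enclosing function:

addToX(3);
cout << x << endl; // prints 8 - the lambda modified x through the reference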

Exercise 2.27. Use the internet to figure out:

• how to only make a single variable from the enclosing function available
in a lambda.


Listing 2.19 Arrays

1 #include <iostream>
2
3 using namespace std;
4
5 int rolls[7];
6
7 int main() {
8 cout << "Enter 5 dice rolls between 1 and 6: " << endl;
9 for (int i = 0; i < 5; i++) {
10 int roll;
11 cin >> roll;
12 rolls[roll]++;
13 }
14 cout << "Yatzee scores: " << endl;
15 for (int i = 1; i <= 6; i++) {
16 cout << i << "'s: " << (i * rolls[i]) << endl;
17 }
18 }

• how to make variables within the enclosing function available as copies


rather than as references.

• how lambdas can be passed as arguments to other functions.

2.13 The Preprocessor


C++ has a powerful tool called the preprocessor. This utility is able to read and
modify your code using certain rules during compilation. The commonly used
#include is a preprocessor directive that includes a certain file in your code.
Besides file inclusion, we mostly use the #define directive. It allows us to
replace certain tokens in our code with other ones. The most basic usage is
#define TOREPLACE REPLACEWITH

which replaces the token TOREPLACE in our program with REPLACEWITH. The true
power of the define comes when using define directives with parameters. These
look similar to functions and allow us to replace certain expressions with
other ones, additionally inserting certain values into them. We call these macros.
For example the macro
#define rep(i,a,b) for (int i = a; i < b; i++)

means that the expression


rep(i,0,5) {
cout << i << endl;
}
is expanded to
for (int i = 0; i < 5; i++) {
cout << i << endl;
}
You can probably get by without ever using macros in your code. The reason
we discuss them is that we use them in code throughout the book, so it is a
good idea to at least be familiar with their meaning. They are also used in
competitive programming in general.
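One caveat worth knowing about (illustrated with a hypothetical square macro of our own): since macros perform plain textual replacement, arguments that are not parenthesized can expand in surprising ways.

#define square(x) x * x

cout << square(5) << endl;     // expands to 5 * 5 - prints 25
cout << square(1 + 2) << endl; // expands to 1 + 2 * 1 + 2 - prints 5, not 9!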

2.14 Template
In competitive programming, one often uses a template, with some shorthand
typedefs and preprocessor directives. In Listing 2.20, we give an example of
the template used in some of the C++ code in this book.

Listing 2.20 Coding Template

1 #include <bits/stdc++.h>
2 using namespace std;
3
4 #define rep(i, a, b) for(int i = a; i < (b); ++i)
5 #define trav(a, x) for(auto& a : x)
6 #define all(x) x.begin(), x.end()
7 #define sz(x) (int)(x).size()
8 typedef long long ll;
9 typedef pair<int, int> pii;
10 typedef vector<int> vi;
11
12 int main() {
13 }

The rep(i,a,b) macro is the one we saw in the previous section – it can be
used to write a simple counting loop in a compact way.
The trav(a, x) macro is used to iterate through all members of a data structure
from the standard library such as the vector – the first topic of Chapter 3.
The all(x) macro is used together with certain operations from the standard
library – we’ll see concrete examples in the next chapter.
The sz(x) macro is used to get the size of a data structure from the standard
library.
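As a brief sketch of our own showing the shorthands together (assuming the template above is in effect; the vector itself is covered properly in the next chapter):

int main() {
    vi numbers = {5, 3, 8};
    rep(i, 0, sz(numbers)) cout << numbers[i] << " "; // prints 5 3 8
    cout << endl;
    trav(x, numbers) cout << x * 2 << " ";            // prints 10 6 16
    cout << endl;
}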


Chapter Exercises
Problem 2.12
Cubes – kuber
Islands – oar
Grading – betygsattning
Faroffistanian Personal Numbers – checksum
Mini Golf – minigolf
Booking – booking
Tomatoes – tomater
Will Roger’s Phenomena – willrogers
Yatzee – yatzee
Memory – memory

Chapter Notes
C++ was invented by Danish computer scientist Bjarne Stroustrup. Bjarne has
also published a book on the language, The C++ Programming Language[27],
that contains a more in-depth treatment of the language. It is rather accessible to
C++ beginners but is better read by someone who has some prior programming
experience (in any programming language).
C++ is standardized by the International Organization for Standardization
(ISO). These standards are the authoritative source on what C++ is. The final
drafts of the standards can be downloaded at the homepage of the Standard C++
Foundation1.
There are many online references of the language and its standard library.
The two we use most are:

• http://en.cppreference.com/w/

• http://www.cplusplus.com/reference/

1 https://isocpp.org/

3 The C++ Standard Library
In this chapter we study parts of the C++ standard library – that is, data structures,
algorithms and utilities that are already provided for us without having to code
them ourselves.
We start by examining a number of basic data structures. Data structures
help us organize the data we work with in the hope of making processing
both easier and more efficient. Different data structures serve widely different
purposes and solve different problems. Whether a data structure fits our needs
depends on what operations we wish to perform on the data. We consider
neither the efficiency of the various operations in this chapter nor how they are
implemented. These concerns are postponed until Chapter 6, when we have the
tools to analyze the efficiency of data structures.
The standard library also contains many useful algorithms such as sorting and
various mathematical functions. These are discussed after the data structures.
In the end, we take a deeper look at string handling in C++ and some more
input/output routines.

3.1 vector

One of the last things discussed in the C++ chapter was the fixed-size array.
As you might remember, the array is a special kind of data type that allows us to
store multiple values of the same data type inside what appeared to us as a single
variable. Arrays are a bit awkward to work with in practice. When passing them
as parameters we must also pass along the size of the array. We are also unable
to change the size of arrays once declared nor can we easily remove or insert
elements, or copy arrays.
The dynamic array is a special type of array that can change size (hence
the name dynamic). It also supports operations such as removing and inserting
elements at any position in the list.
The C++ standard library includes a dynamic array called a vector, which is
an alternative name for dynamic arrays in some languages. To use it you must
include the vector file by adding the line


#include <vector>

among your other includes at the top of your program.


When declaring vectors, they need to know what type of data they should
store, just like ordinary arrays. This is done using a somewhat peculiar syntax.
To create a vector containing strings named words we write
vector<string> words;

This angled bracket syntax appears again later when using other C++
structures from the standard library.
Once a vector is created elements can be appended to it using the push_back
member function. The following four statements would add the words Simon is
a fish as separate elements to the vector:

words.push_back("Simon");
words.push_back("is");
words.push_back("a");
words.push_back("fish");

To refer to a specific element in a vector you can use the same operator [] as
for arrays. Thus, words[i] refers to the 𝑖’th value in the vector (starting at 0).
cout << words[0] << " " << words[1] << " "; // Prints Simon is
cout << words[2] << " " << words[3] << " "; // Prints a fish

Like arrays, accessing indices outside the valid range of the vector can cause
weird behaviour in your program.
We can get the current size of a vector using the size() member function:
cout << "The vector contains " << words.size() << " words" << endl;

There is also an empty() function that can be used to check if the vector contains
no elements. These two functions are part of basically every standard library
data structure.
Problem 3.1
Vector Functions – vectorfunctions

You can also create dynamic arrays that already contain a number of elements.
This is done by passing an integer argument when first declaring the vector.
They are filled with the same default value as (global) arrays are when created:
vector<int> vec(5); // creates a vector containing 5 zeroes


The value that such an array is filled with can also be set explicitly by using
a two-argument constructor; the second argument is the value to fill the array
with:
vector<int> vec(5, -1); // creates a vector containing 5 -1's

Exercise 3.1. What happens when you create vectors of a struct?


Try using structures:

• without a constructor,

• with a zero argument constructor,

• with only non-zero argument constructors and

• with both zero argument and non-zero argument constructors.

We can create vectors that also contain other vectors, to make multidimen-
sional vectors. For example, we could make a 2-dimensional vector (i.e. a grid
of values) in the following way:
vector<vector<int>> grid(7, vector<int>(5));

Since we filled the vector with 7 vectors of length 5, we get a 7 × 5 grid of integers.
The values in the grid are then referred to using grid[a][b] where 0 ≤ 𝑎 < 7
and 0 ≤ 𝑏 < 5.
Similarly, one can create 𝑁 -dimensional vectors by creating vectors of
vectors of ... and so on.
Problem 3.2
Cinema Seating – cinemaseating

Other occasionally useful functions that a vector supports are:

• pop_back(): remove the last element of a vector

• clear(): remove all elements of a vector

• front(): get the first element of a vector

• back(): get the last element of a vector

• assign(n, val): replace the contents of the vector with 𝑛 copies of val.
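A small sketch of our own showing these functions in action:

vector<int> v(3, 7);      // contains 7 7 7
v.push_back(5);           // contains 7 7 7 5
cout << v.front() << " " << v.back() << endl; // prints 7 5
v.pop_back();             // contains 7 7 7
v.assign(2, 1);           // contains 1 1
v.clear();                // contains nothing
cout << v.size() << " " << v.empty() << endl; // prints 0 1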


3.2 Iterators
A concept central to the standard library is the iterator. An iterator is an object
which “points to” an element in some kind of data structure (such as a vector).
Essentially, they are a generalization of the role played by an integer representing
an index of a vector. The reason we could not simply eliminate their use and use
integer indexes directly wherever iterators appear is that some data structures
do not support accessing values directly by their index. Not all data structures
support iterators either.
The type of an iterator for a data structure of type t is t::iterator. An
iterator of a vector<string> thus has the type vector<string>::iterator. Most
of the time we instead use the auto type since this is very long to type.
To get an iterator to the first element of a vector, we use begin():
auto first = words.begin();

We can get the value that an iterator points at using the * operator:
cout << "The first word is " << *first << endl;

If we have an iterator it pointing at the 𝑖’th element of a vector we can get


a new iterator pointing to another value in one of two ways. For iterators of a
vector, we add or subtract an integer value to the iterator. For example, it + 4
points to the (𝑖 + 4)’th element of the vector, and it - 1 is the iterator pointing
to the (𝑖 − 1)’st element.
For those structures that do not support access by indexes, the iterators can
instead be moved forwards and backwards using the ++ and -- operators, i.e. by
writing it++ and it--.
There is a special kind of iterator which points to the first position after the
last element. We get this iterator by using the function end(). It allows us to
iterate through a vector in the following way:
for (auto it = words.begin(); it != words.end(); it++) {
string value = *it;
cout << value << endl;
}

In this loop we start by creating an iterator which points to the first element of
the vector. Our update condition will repeatedly move the iterator to the next
element in the vector. The loop condition ensures that the loop breaks when the
iterator first points to the element past the end of the vector.
In modern C++ language versions, there is a shorter construct that is
equivalent to this loop:


for (auto value : words) {


cout << value << endl;
}

In addition to the begin() and end() pair of iterators, there is also rbegin()
and rend(). They work similarly, except that they are reverse iterators - they
iterate in the other direction. Thus, rbegin() actually points to the last element
of the vector, and rend() to an imaginary element before the first element of the
vector. If we move a reverse iterator in a positive direction, we will actually
move it in the opposite direction (i.e. adding 1 to a reverse iterator makes it
point to the element before it in the vector).
Exercise 3.2. Use the rbegin()/rend() iterators to code a loop that iterates
through a vector in the reverse order.
Certain operations on a vector require the use of vector iterators. For example,
the insert and erase member functions, used to insert and erase elements at
arbitrary positions, take iterators to describe positions. When removing the
second element, we write
words.erase(words.begin() + 1);

The insert() function uses an iterator to know at what position an element


should be inserted. If it is passed the begin() iterator the new element will be
inserted at the start of the array. Similarly, as an alternative to push_back() we
could have written
words.insert(words.end(), "food");

to insert an element at the end of the vector.


Exercise 3.3. After adding these two lines, what would the loop printing every
element of the vector words output?

Problem 3.3
Cut in Line – cutinline

3.3 queue

The queue structure corresponds to a plain, real-life queue. It supports mainly


two operations: appending an element to the back of the queue, and extracting
the first element of the queue. The structure is in the queue file so it must be
included using


#include<queue>

As with all standard library data structures declaring a queue requires us to


provide the data type which we wish to store in it. To create a queue storing
ints we would use the statement

queue<int> q;

We use mainly five functions when dealing with queues:

• push(x): add the element x to the back of the queue

• pop(): remove the element from the front of the queue

• front(): return the element from the front of the queue

• empty(): return true if and only if the queue is empty

• size(): return the number of elements in the queue
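A minimal usage sketch (our own example):

queue<int> q;
q.push(1);
q.push(2);
q.push(3);
cout << q.front() << endl; // prints 1 - the first element added
q.pop();
cout << q.front() << endl; // prints 2
cout << q.size() << endl;  // prints 2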

Exercise 3.4. There is a similar data structure called a dequeue. The standard
library version is named after the abbreviation deque instead. Use one of the
C++ references from the C++ chapter notes (Section 2.14) to find out what this
structure does and what its member functions are called.

3.4 stack

A stack is a structure very similar to a queue. The difference is that when
push()ing something to a stack, it is added first to the stack instead of last. To
use it you need to include
#include<stack>

Creating a stack containing e.g. ints is as easy as creating a queue:


stack<int> q;

We use mainly five functions when dealing with stacks:

• push(x): add the element x to the top of the stack

• pop(): remove the element from the top of the stack

• top(): return the element from the top of the stack


• empty(): return true if and only if the stack is empty

• size(): return the number of elements in the stack
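A minimal usage sketch (our own example), highlighting the last-in, first-out behaviour:

stack<int> s;
s.push(1);
s.push(2);
s.push(3);
cout << s.top() << endl; // prints 3 - the most recently added element
s.pop();
cout << s.top() << endl; // prints 2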

Note the change in terminology. Instead of the first and last elements being
called front and back as for the queue they are instead called top and bottom in a
stack.

3.5 priority_queue

The queue and stack structures are arguably unnecessary, since they can be
emulated using a vector (see Sections 6.2, 6.3). This is not the case for the next
structure, the priority_queue.
The structure is similar to a queue or a stack, but instead of insertions and
extractions happening at one of the endpoints of the structure, the greatest
element is always returned during the extraction.
The structure is located in the same file as the queue structure, so add
#include<queue>

to use it.
To initialize a priority queue, use the same syntax as for the other structures:
priority_queue<int> pq;

This time there is one more way to create the structure that is important to
remember. It is not uncommon to prefer the sorting to be done according to
some other order than descending. For this reason there is another way of
creating a priority queue. One can specify a comparison function that takes
two arguments of the type stored in the queue and returns true if the first one
should be considered less than the second. The type of this function is given as
a template argument, and the function itself is passed to the constructor:
bool cmp(int a, int b) {
    return a > b;
}

priority_queue<int, vector<int>, bool(*)(int, int)> pq(cmp);

// or equivalently
priority_queue<int, vector<int>, greater<int>> pq2;

Note that a priority queue by default returns the greatest element. If we want to
make it return the smallest element, the comparison function needs to instead


say that the smaller of the two elements actually is the greater, somewhat
counter-intuitively.
Interactions with the queue are similar to those of the other structures:
• push(x): add the element x to the priority queue

• pop(): remove the greatest element from the priority queue

• top(): return the greatest element from the priority queue

• empty(): return true if and only if the priority queue is empty

• size(): return the number of elements in the priority queue
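A minimal usage sketch (our own example):

priority_queue<int> pq;
pq.push(3);
pq.push(10);
pq.push(7);
cout << pq.top() << endl; // prints 10 - the greatest element
pq.pop();
cout << pq.top() << endl; // prints 7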

Problem 3.4
I Can Guess the Data Structure! – guessthedatastructure
Akcija – akcija
Cookie Selection – cookieselection
Pivot – pivot

3.6 set and map


The final data structures to be studied in this chapter are also the most powerful:
the set and the map.
The set structure is similar to a mathematical set (Section A.2), in that it
contains a collection of unique elements. Unlike the vector, particular positions
in the structure cannot be accessed using the [] operator. This may make
sets seem worse than vectors. The advantage of sets is twofold. First, we can
determine membership of elements in a set much more efficiently compared to
when using vectors (in Chapters 5 and 6, what this means will become clear).
Secondly, sets are also sorted. This means we can quickly find the smallest and
greatest values of the set.
Elements are instead accessed only through iterators, obtained using the
begin(), end() and find() member functions. These iterators can be moved
using the ++ and -- operators, allowing us to navigate through the set in sorted
(ascending) order (with begin() referring to the smallest element).
Elements are inserted using the insert function and removed using the erase
function. A concrete example usage is found in Listing 3.1.
A structure similar to the set is the map. It is essentially the same as a set,
except the elements are called keys and have associated values. When declaring


Listing 3.1 Sets

1 set<int> s;
2 s.insert(4);
3 s.insert(7);
4 s.insert(1);
5
6 // find returns an iterator to the element if it exists
7 auto it = s.find(4);
8 // ++ moves the iterator to the next element in order
9 ++it;
10 cout << *it << endl;
11
12 // if nonexistant, find returns end()
13 if (s.find(7) == s.end()) {
14 cout << "7 is not in the set" << endl;
15 }
16
17 // erase removes the specific element
18 s.erase(7);
19
20 if (s.find(7) == s.end()) {
21 cout << "7 is not in the set" << endl;
22 }
23
24 cout << "The smallest element of s is " << *s.begin() << endl;

a map two types need to be provided – that of the key and that of the value. To
declare a map with string keys and int values you write
map<string, int> m;

Accessing the value associated with a key x is done using the [] operator, for
example, m["Johan"] would access the value associated with the "Johan" key.
Problem 3.5
Secure Doors – securedoors
Babelfish – babelfish

3.7 Math
Many algorithmic problems require mathematical functions. In particular, there
is heavy use of square roots and trigonometric functions in geometry problems.
Most of these functions are found in the


Listing 3.2 Maps

1 map<string, int> age;


2 age["Johan"] = 22;
3 age["Simon"] = 23;
4
5 if (age.find("Aron") == age.end()) {
6 cout << "No record of Aron's age" << endl;
7 }
8
9 cout << "Johan is " << age["Johan"] << " years old" << endl;
10 cout << "Anton is " << age["Anton"] << " years old" << endl;
11
12 age.erase("Johan");
13 cout << "Johan is " << age["Johan"] << " years old" << endl;
14
15 auto last = --age.end();
16 cout << (*last).first << " is " << (*last).second << " years old" << endl;

#include <cmath>

library.
We list some of the most common such functions here:

• abs(x): computes |𝑥| (𝑥 if 𝑥 ≥ 0, otherwise −𝑥)

• sqrt(x): computes √𝑥

• pow(x, y): computes 𝑥ʸ

• exp(x): computes 𝑒ˣ

• log(x): computes ln(𝑥)

• cos(x) / acos(x): computes cos(𝑥) and arccos(𝑥) respectively

• sin(x) / asin(x): computes sin(𝑥) and arcsin(𝑥) respectively

• tan(x) / atan(x): computes tan(𝑥) and arctan(𝑥) respectively

• ceil(x) / floor(x): computes ⌈𝑥⌉ and ⌊𝑥⌋ respectively


There are also min(x, y) and max(x, y) functions which compute the minimum
and maximum of the values 𝑥 and 𝑦 respectively. These are not in the cmath
library however. Instead, they are in algorithm.
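A few of these in action (a small sketch of our own):

cout << sqrt(2.0) << endl;                      // prints 1.41421
cout << pow(2.0, 10.0) << endl;                 // prints 1024
cout << ceil(2.3) << " " << floor(2.3) << endl; // prints 3 2
cout << max(min(3, 7), 5) << endl;              // prints 5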


Problem 3.6
Vacuumba – vacuumba
Half a Cookie – halfacookie
Ladder – ladder
A1 Paper – a1paper

3.8 Algorithms
A majority of the algorithms we regularly use from the standard library operate
on sequences. To use algorithms, you need to include
#include <algorithm>

Sorting
Sorting a sequence is very easy in C++. The function for doing so is named
sort. It takes two iterators marking the beginning and end of the interval to be
sorted and sorts it in-place in ascending order. For example, to sort the first 10
elements of a vector named v you would use
sort(v.begin(), v.begin() + 10);

Note that the right endpoint of the interval is exclusive – it is not included in
the interval itself. This means that you can provide v.end() as the end of the
interval if you want to sort the vector until the end.
As with priority_queues or sets, the sorting algorithm can take a custom
comparator if you want to sort according to some other order than that defined
by the < operator. For example,
sort(v.begin(), v.end(), greater<int>());

would sort the vector v in descending order. You can provide other sorting
functions as well. For example, you can sort numbers by their absolute value by
passing in the following comparator:
bool cmp(int a, int b) {
return abs(a) < abs(b);
}
sort(v.begin(), v.end(), cmp);

What happens if two values have the same absolute value when sorted with
the above comparator? With sort, this behaviour is not specified: they can
be ordered in any way. Occasionally you want values that your comparison
function considers equal to be sorted in the same order as they were given


in the input. This is called a stable sort, and is implemented in C++ with the
function stable_sort.
To check if a vector is sorted, the is_sorted function can be used. It takes
the same arguments as the sort function.
Problem 3.7
Shopaholic – shopaholic
Busy Schedule – busyschedule
Sort of Sorting – sortofsorting

Searching
The most basic search operation is the find function. It takes two iterators
representing an interval and a value. If one of the elements in the interval equals
the value, an iterator to the element is returned. In case of multiple matches the
first one is returned. Otherwise, the iterator provided as the end of the interval
is returned. The common usage is
find(v.begin(), v.end(), 5);

which would return an iterator to the first instance of 5 in the vector.


To find out how many times an element appears in a vector, the count function
takes the same arguments as the find function and returns the total number of
matches.
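For instance (an illustration of our own):

vector<int> v = {2, 5, 3, 5};
auto it = find(v.begin(), v.end(), 5);
if (it != v.end()) {
    // subtracting begin() turns the iterator into an index
    cout << "first 5 at index " << (it - v.begin()) << endl; // index 1
}
cout << count(v.begin(), v.end(), 5) << endl; // prints 2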
If the array is sorted, you can use the much faster binary search operations
instead. The binary_search function takes as arguments a sorted interval
given by two iterators and a value. It returns true if the interval contains the
value. The lower_bound and upper_bound functions take the same arguments as
binary_search, but instead return an iterator to the first element that is not less
than, respectively greater than, the specified value. For more details on how these
are implemented, read Section 12.3.
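A sketch of our own showing the three functions on a small sorted vector:

vector<int> v = {1, 3, 3, 7}; // must already be sorted
cout << binary_search(v.begin(), v.end(), 3) << endl; // prints 1 (true)
// index of the first element not less than 3:
cout << (lower_bound(v.begin(), v.end(), 3) - v.begin()) << endl; // prints 1
// index of the first element greater than 3:
cout << (upper_bound(v.begin(), v.end(), 3) - v.begin()) << endl; // prints 3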

Permutations
In some problems, the solution involves iterating through all permutations
(Section ??) of a vector. As one of few languages, C++ has a built-in function
for this purpose: next_permutation. The function takes two iterators as
arguments and rearranges the interval they specify to be the next permutation
in lexicographical order. If there is no such permutation, the interval instead
becomes sorted and the function returns false. This suggests the following


common pattern to iterate through all permutations of a vector v:


sort(v.begin(), v.end());
do {
// do something with v
} while (next_permutation(v.begin(), v.end()));

This do-while-syntax is similar to the while loop, except the condition is checked
after each iteration instead of before. It is equivalent to
sort(v.begin(), v.end());
while (true) {
// do something with v
if (!next_permutation(v.begin(), v.end())) {
break;
}
}
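For example (our own sketch), running the pattern on a three-element vector prints all six permutations in lexicographical order:

vector<int> v = {1, 2, 3};
sort(v.begin(), v.end());
do {
    cout << v[0] << " " << v[1] << " " << v[2] << endl;
} while (next_permutation(v.begin(), v.end()));
// prints: 1 2 3, 1 3 2, 2 1 3, 2 3 1, 3 1 2, 3 2 1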

Problem 3.8
Veci – veci

3.9 Strings
We have already used the string type many times before. Until now one of the
essential features of a string has been omitted – a string is to a large extent like
a vector of chars. This is especially true in that you can access the individual
characters of a string using the [] operator. For a string
string thecowsays = "boo";

the expression thecowsays[0] is the character 'b'. Furthermore, you can push_back
new characters to the end of a string.
thecowsays.push_back('p');

would instead make the string boop.


Problem 3.9
Detailed Differences – detaileddifferences
Autori – autori
Skener – skener

Conversions
In some languages, the barrier between strings and e.g. integers is more fuzzy
than in C++. In Java, for example, the code "4" + 2 would append the character


'2' to the string "4", yielding the string "42". This is not the case in C++ (what
errors do you get if you try to do this?).
Instead, there are other ways to convert between strings and other types. The
easiest way is to use the stringstream class. A stringstream essentially
works as a combined cin and cout. An empty stream is declared by
stringstream ss;

Values can be written to the stream using the << operator and read from it using
the >> operator. This can be exploited to convert strings to and from e.g. numeric
types like this:
stringstream numToString;
numToString << 5;
string val;
numToString >> val; // val is now the string "5"

stringstream stringToNum;
stringToNum << "5";
int num;
stringToNum >> num; // num is now the integer 5
Just as with cin, you can use a stringstream to determine what type the next
word is. If you try to read from a stringstream into an int but the next word is
not an integer, the expression will evaluate to false:
stringstream ss;
ss << "notaninteger";
int val;
if (ss >> val) {
cout << "read an integer!" << endl;
} else {
cout << "next word was not an integer" << endl;
}

Problem 3.10
Filip – filip
Stacking Cups – cups

3.10 Input/Output
Input and output are primarily handled by the cin and cout objects, as previously
seen. While they are very easy to use, adjustments are sometimes necessary.

Detecting End of File


The first advanced usage is reading input until we run out of input (often called
reading until the end-of-file). Normally, input formats are constructed so that


you always know beforehand how many tokens of input you need to read. For
example, lists of integers are often either prefixed by the size of the list or
terminated by some special sentinel value. For those few times when we need
to read input until the end we use the fact that cin >> x is an expression that
evaluates to false if the input reading failed. This is also the case if you try to
read an int but the next word is not actually an integer. This kind of input loop
thus looks something like the following:
int num;
while (cin >> num) {
    // do something with num
}
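For instance, a complete program that sums integers until the end of the input might look like this (a minimal sketch):

#include <iostream>
using namespace std;

int main() {
    long long sum = 0;
    int num;
    // cin >> num evaluates to false once no more integers can be read.
    while (cin >> num) sum += num;
    cout << sum << endl;
}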

Problem 3.11
A Different Problem – different
Statistics – statistics

Input Line by Line


As we stated briefly in the C++ chapter, cin only reads a single word when used
as input to a string. This is a problem if the input format requires us to read
input line by line. The solution to this is the getline function, which reads text
until the next newline:
getline(cin, str);

Be warned that if you use cin to read a single word that is the last on its line,
the final newline is not consumed. That means that for an input such as

word
blah blah

the code
string word;
cin >> word;
string line;
getline(cin, line);

would produce an empty line! After cin >> word the newline of the line word still
remains, meaning that getline only reads the (zero) remaining characters until
the newline. To avoid this problem, you need to use cin.ignore(); to ignore the
extra newline before your getline.
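In code, the fix for the input above might look like this (a small sketch):

string word;
cin >> word;        // reads "word", but leaves the newline in the stream
cin.ignore();       // skips the leftover newline
string line;
getline(cin, line); // now reads "blah blah" as expected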
Once a line has been read we often need to process all the words on the line
one by one. For this, we can use the stringstream:


stringstream line(str);
string word;
while (line >> word) {
    // do something with word
}

The stringstream takes an argument that is the string you want to process. After
this, it works just like cin does, except reading input from the string instead of
the terminal. To use stringstream, add the include
#include <sstream>
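Putting the pieces together, a small program of our own that counts the words on every line of the input might look like this (a minimal sketch):

#include <iostream>
#include <sstream>
#include <string>
using namespace std;

int main() {
    string str;
    while (getline(cin, str)) {
        stringstream line(str); // process the words of this line one by one
        string word;
        int words = 0;
        while (line >> word) words++;
        cout << words << " words" << endl;
    }
}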

Problem 3.12
Bacon Eggs and Spam – baconeggsandspam
Compound Words – compoundwords

Output Decimal Precision


Another common problem is that outputting decimal values with cout produces
numbers with too few decimals. Many problems stipulate that an answer is
considered correct if it is within some specified relative or absolute precision of
the judges’ answer. The default precision of cout is 10⁻⁶. If a problem requires
higher precision, it must be set manually using e.g.
cout << setprecision(10);

If the function argument is x, the precision is set to 10⁻ˣ. This means that the
above statement would set the precision of cout to 10⁻¹⁰. This precision is
normally the relative precision of the output (i.e. the total number of digits to
print). If you want the precision to be absolute (i.e. specify the number of digits
after the decimal point) you write
cout << fixed;
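Both manipulators live in the <iomanip> header. As a small sketch of the difference between the two modes:

double pi = 3.141592653589793;
cout << setprecision(10) << pi << endl;          // prints 3.141592654 (10 significant digits)
cout << fixed << setprecision(10) << pi << endl; // prints 3.1415926536 (10 digits after the point)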

Problem 3.13
A Real Challenge – areal

Chapter Exercises
Problem 3.14
Apaxiaaaaaaaaaaaans! – apaxiaaans
Different Distances – differentdistances
Odd Man Out – oddmanout


Timebomb – timebomb
Missing Gnomes – missinggnomes

Chapter Notes
In this chapter, we extracted only the parts of the standard library we deemed most important
to problem solving. The standard library is much larger than this, of course.
While you will almost always get by using only what we discussed,
additional knowledge of the library can make you a faster, more effective coder.
For a good overview of the library, cppreference.com¹ contains lists of the
library contents categorized by topic.

¹ http://en.cppreference.com/w/cpp

4 Implementation Problems
The “simplest” kind of problems we solve are those where the statement of a
problem is so detailed that the difficult part is not figuring out the solution, but
implementing it in code. This type of problem is mostly given in the form of
performing some calculation or simulating some process based on a list of rules
stated in the problem.

The Recipe
Swedish Olympiad in Informatics 2011, School Qualifiers (CC BY-SA 3.0)
You have decided to cook some food. The dish you are going to make requires
𝑁 different ingredients. For every ingredient, you know the amount you have at
home, how much you need for the dish, and how much it costs to buy (per unit).
If you do not have a sufficient amount of some ingredient you need to buy
the remainder from the store. Your task is to compute the cost of buying the
remaining ingredients.
Input
The first line of input is an integer 𝑁 ≤ 10, the number of ingredients in the
dish.
The next 𝑁 lines contain the information about the ingredients, one per line.
An ingredient is given by three space-separated integers 0 ≤ ℎ, 𝑛, 𝑐 ≤ 200 – the
amount you have, the amount you need, and the cost per unit for this ingredient.
Output
Output a single integer – the cost for purchasing the remaining ingredients
needed to make the dish.
This problem is not particularly hard. For every ingredient we need to
calculate the amount which we need to purchase. The only gotcha in the problem
is the mistake of computing this as 𝑛 − ℎ. The correct formula is max(0, 𝑛 − ℎ),
required in case of the luxury problem of having more than we need. We then
multiply this number by the ingredient cost and sum the costs up for all the
ingredients. A solution would look something like the following.


1: procedure Recipe(N, int[] h, int[] n, int[] c)
2:   ans ← 0
3:   for i ← 0 to N − 1 do
4:     ans ← ans + max(0, n[i] − h[i]) · c[i]
5:   return ans
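A direct C++ translation of this pseudo code, reading the input format described above, might look as follows (a minimal sketch of our own, not the official solution):

#include <algorithm>
#include <iostream>
using namespace std;

int main() {
    int N;
    cin >> N;
    int ans = 0;
    for (int i = 0; i < N; i++) {
        int h, n, c;
        cin >> h >> n >> c;
        // Buy only what is missing, and never a negative amount.
        ans += max(0, n - h) * c;
    }
    cout << ans << endl;
}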

Generally, the implementation problems are the easiest type of problems in
a contest. They do not require much algorithmic knowledge, so more contestants
are able to solve them. However, not every implementation problem is easy to
code. Just because implementation problems are easy to spot, understand, and
formulate a solution to, you should not underestimate the difficulty of coding them.
Contestants usually fail implementation problems either because the algorithm
they are supposed to implement is very complicated with many easy-to-miss
details, or because the amount of code is very large. In the latter case, you are
more prone to bugs because more lines of code tend to include more bugs.
Let us study a straightforward implementation problem that turned out to be
rather difficult to code.

Game Rank
Nordic Collegiate Programming Contest 2016 – Jimmy Mårdell (CC BY-SA 3.0)
The gaming company Sandstorm is developing an online two player game. You
have been asked to implement the ranking system. All players have a rank
determining their playing strength which gets updated after every game played.
There are 25 regular ranks, and an extra rank, “Legend”, above that. The ranks
are numbered in decreasing order, 25 being the lowest rank, 1 the second highest
rank, and Legend the highest rank.
Each rank has a certain number of “stars” that one needs to gain before
advancing to the next rank. If a player wins a game, she gains a star. If before
the game the player was on rank 6-25, and this was the third or more consecutive
win, she gains an additional bonus star for that win. When she has all the stars
for her rank (see list below) and gains another star, she will instead gain one
rank and have one star on the new rank.
For instance, if before a winning game the player had all the stars on her
current rank, she will after the game have gained one rank and have 1 or 2 stars
(depending on whether she got a bonus star) on the new rank. If on the other
hand she had all stars except one on a rank, and won a game that also gave her a

66
bonus star, she would gain one rank and have 1 star on the new rank.
If a player on rank 1-20 loses a game, she loses a star. If a player has zero
stars on a rank and loses a star, she will lose a rank and have all stars minus one
on the rank below. However, one can never drop below rank 20 (losing a game
at rank 20 with no stars will have no effect).
If a player reaches the Legend rank, she will stay legend no matter how many
losses she incurs afterwards.
The number of stars on each rank are as follows:

• Rank 25-21: 2 stars

• Rank 20-16: 3 stars

• Rank 15-11: 4 stars

• Rank 10-1: 5 stars

A player starts at rank 25 with no stars. Given the match history of a player,
what is her rank at the end of the sequence of matches?
Input
The input consists of a single line describing the sequence of matches. Each
character corresponds to one game; ‘W’ represents a win and ‘L’ a loss. The
length of the line is between 1 and 10 000 characters (inclusive).
Output
Output a single line containing a rank after having played the given sequence of
games; either an integer between 1 and 25 or “Legend”.

A very long problem statement! The first hurdle is finding the energy to
read it from start to finish without skipping any details. Not much creativity
is needed here – indeed, the algorithm to implement is given in the statement.
Despite this, it is not as easy as one would think. Although it was the second
most solved problem at the contest where it was used, it was also the one
with the worst success ratio. On average, a team needed 3.59 attempts before
getting a correct solution, compared to the runner-up problem at 2.92 attempts.
None of the top 6 teams in the contest got the problem accepted on their first
attempt. Failed attempts cost a lot – not only in absolute time, but many forms
of competition also include additional penalties for submitting incorrect solutions.
Implementation problems get much easier when you know your programming
language well and can use it to write good, structured code. Split your code into
functions, use structures, and give your variables good names, and implementation
problems become easier to code. A solution to the Game Rank problem which
attempts to use this approach is given here:

1 #include <bits/stdc++.h>
2
3 using namespace std;
4
5 int curRank = 25, curStars = 0, conseqWins = 0;
6
7 int starsOfRank() {
8     if (curRank >= 21) return 2;
9     if (curRank >= 16) return 3;
10     if (curRank >= 11) return 4;
11     if (curRank >= 1) return 5;
12     assert(false);
13 }
14
15 void addStar() {
16     if (curStars == starsOfRank()) {
17         --curRank;
18         curStars = 0;
19     }
20     ++curStars;
21 }
22
23 void addWin() {
24     int curStarsWon = 1;
25     ++conseqWins;
26     if (conseqWins >= 3 && curRank >= 6) curStarsWon++;
27
28     for (int i = 0; i < curStarsWon; i++) {
29         addStar();
30     }
31 }
32
33 void loseStar() {
34     if (curStars == 0) {
35         if (curRank == 20) return;
36         ++curRank;
37         curStars = starsOfRank();
38     }
39     --curStars;
40 }
41
42 void addLoss() {
43     conseqWins = 0;
44     if (curRank <= 20) loseStar();
45 }
46
47 int main() {
48     string seq;
49     cin >> seq;
50     for (char res : seq) {
51         if (res == 'W') addWin();
52         else addLoss();
53         if (curRank == 0) break;
54         assert(1 <= curRank && curRank <= 25);
55         assert(0 <= curStars && curStars <= starsOfRank());
56     }
57     if (curRank == 0) cout << "Legend" << endl;
58     else cout << curRank << endl;
59 }

Note the use of the assert() function. The function takes a single boolean
parameter and crashes the program with an assertion failure if the parameter
evaluated to false. This is helpful when solving problems since it allows us to
verify that assumptions we make regarding the internal state of the program
indeed hold. In fact, when the above solution was written, the assertions in it
actually managed to catch some bugs before the solution was submitted!
Problem 4.1
Game Rank – gamerank

Next, we work through a complex implementation problem, starting with a


long, hard-to-read solution with a few bugs. Then, we refactor it a few times
until it is correct and easy to read.

Mate in One
Introduction to Algorithms at Danderyds Gymnasium
"White to move, mate in one."

When you are looking back in old editions of the New in Chess magazine,
you find loads of chess puzzles. Unfortunately, you realize that it was way too
long since you played chess. Even trivial puzzles such as finding a mate in one
now far exceed your ability.
But, perseverance is the key to success. You realize that you can instead use
your new-found algorithmic skills to solve the problem by coding a program to
find the winning move.

69
C HAPTER 4. I MPLEMENTATION P ROBLEMS

You will be given a chess board, which satisfies:

• No player may castle.

• No player can perform an en passant¹.

• The board is a valid chess position.

• White can mate black in a single, unique move.

Write a program to output the move white should play to mate black.
Input
The board is given as an 8 × 8 grid of letters. The letter . represents an empty
space, the characters pbnrqk represent a white pawn, bishop, knight, rook, queen
and king, and the characters PBNRQK represent a black pawn, bishop, knight,
rook, queen and king.
Output
Output a move of the form a1b2, where a1 is the square to move a piece from
(written as the column, a-h, followed by the row, 1-8) and b2 is the square to
move the piece to.

¹ If you are not aware of this special pawn rule, do not worry – knowledge of it is irrelevant with regard to the problem.
Our first solution attempt clocks in at about 300 lines.
1 #include <bits/stdc++.h>
2 using namespace std;
3
4 #define rep(i,a,b) for (int i = (a); i < (b); ++i)
5 #define trav(it, v) for (auto& it : v)
6 #define all(v) (v).begin(), (v).end()
7 typedef pair<int, int> ii;
8 typedef vector<ii> vii;
9 template <class T> int size(T &x) { return x.size(); }
10
11 char board[8][8];
12
13 bool iz_empty(int x, int y) {
14     return board[x][y] == '.';
15 }
16
17 bool is_white(int x, int y) {
18     return board[x][y] >= 'A' && board[x][y] <= 'Z';
19 }
20
21 bool is_valid(int x, int y) {
22     return x >= 0 && x < 8 && y >= 0 && y < 8;
23 }
24
25 int rook[8][2] = {
26     {1, 2},
27     {1, -2},
28     {-1, 2},
29     {-1, -2},
30
31     {2, 1},
32     {-2, 1},
33     {2, -1},
34     {-2, -1}
35 };
36
37 void display(int x, int y) {
38     printf("%c%d", y + 'a', 7 - x + 1);
39 }
40
41 vii next(int x, int y) {
42     vii res;
43
44     if (board[x][y] == 'P' || board[x][y] == 'p') {
45         // pawn
46
47         int dx = is_white(x, y) ? -1 : 1;
48
49         if (is_valid(x + dx, y) && iz_empty(x + dx, y)) {
50             res.push_back(ii(x + dx, y));
51         }
52
53         if (is_valid(x + dx, y - 1)
54             && is_white(x, y) != is_white(x + dx, y - 1)) {
55             res.push_back(ii(x + dx, y - 1));
56         }
57
58         if (is_valid(x + dx, y + 1)
59             && is_white(x, y) != is_white(x + dx, y + 1)) {
60             res.push_back(ii(x + dx, y + 1));
61         }
62
63     } else if (board[x][y] == 'N' || board[x][y] == 'n') {
64         // knight
65
66         for (int i = 0; i < 8; i++) {
67             int nx = x + rook[i][0],
68                 ny = y + rook[i][1];
69
70             if (is_valid(nx, ny) && (iz_empty(nx, ny) ||
71                     is_white(x, y) != is_white(nx, ny))) {
72                 res.push_back(ii(nx, ny));
73             }
74         }
75
76     } else if (board[x][y] == 'B' || board[x][y] == 'b') {
77         // bishop
78
79         for (int dx = -1; dx <= 1; dx++) {
80             for (int dy = -1; dy <= 1; dy++) {
81                 if (dx == 0 && dy == 0)
82                     continue;
83
84                 if ((dx == 0) != (dy == 0))
85                     continue;
86
87                 for (int k = 1; ; k++) {
88                     int nx = x + dx * k,
89                         ny = y + dy * k;
90
91                     if (!is_valid(nx, ny)) {
92                         break;
93                     }
94
95                     if (iz_empty(nx, ny) || is_white(x, y) != is_white(nx, ny)) {
96                         res.push_back(ii(nx, ny));
97                     }
98
99                     if (!iz_empty(nx, ny)) {
100                         break;
101                     }
102                 }
103             }
104         }
105
106     } else if (board[x][y] == 'R' || board[x][y] == 'r') {
107         // rook
108
109         for (int dx = -1; dx <= 1; dx++) {
110             for (int dy = -1; dy <= 1; dy++) {
111                 if ((dx == 0) == (dy == 0))
112                     continue;
113
114                 for (int k = 1; ; k++) {
115                     int nx = x + dx * k,
116                         ny = y + dy * k;
117
118                     if (!is_valid(nx, ny)) {
119                         break;
120                     }
121
122                     if (iz_empty(nx, ny) || is_white(x, y) != is_white(nx, ny)) {
123                         res.push_back(ii(nx, ny));
124                     }
125
126                     if (!iz_empty(nx, ny)) {
127                         break;
128                     }
129                 }
130             }
131         }
132
133     } else if (board[x][y] == 'Q' || board[x][y] == 'q') {
134         // queen
135
136         for (int dx = -1; dx <= 1; dx++) {
137             for (int dy = -1; dy <= 1; dy++) {
138                 if (dx == 0 && dy == 0)
139                     continue;
140
141                 for (int k = 1; ; k++) {
142                     int nx = x + dx * k,
143                         ny = y + dy * k;
144
145                     if (!is_valid(nx, ny)) {
146                         break;
147                     }
148
149                     if (iz_empty(nx, ny) || is_white(x, y) != is_white(nx, ny)) {
150                         res.push_back(ii(nx, ny));
151                     }
152
153                     if (!iz_empty(nx, ny)) {
154                         break;
155                     }
156                 }
157             }
158         }
159
160
161     } else if (board[x][y] == 'K' || board[x][y] == 'k') {
162         // king
163
164         for (int dx = -1; dx <= 1; dx++) {
165             for (int dy = -1; dy <= 1; dy++) {
166                 if (dx == 0 && dy == 0)
167                     continue;
168
169                 int nx = x + dx,
170                     ny = y + dy;
171
172                 if (is_valid(nx, ny) && (iz_empty(nx, ny) ||
173                         is_white(x, y) != is_white(nx, ny))) {
174                     res.push_back(ii(nx, ny));
175                 }
176             }
177         }
178     } else {
179         assert(false);
180     }
181
182     return res;
183 }
184
185 bool is_mate() {
186
187     bool can_escape = false;
188
189     char new_board[8][8];
190
191     for (int x = 0; !can_escape && x < 8; x++) {
192         for (int y = 0; !can_escape && y < 8; y++) {
193             if (!iz_empty(x, y) && !is_white(x, y)) {
194
195                 vii moves = next(x, y);
196                 for (int i = 0; i < size(moves); i++) {
197                     for (int j = 0; j < 8; j++)
198                         for (int k = 0; k < 8; k++)
199                             new_board[j][k] = board[j][k];
200
201                     new_board[moves[i].first][moves[i].second] = board[x][y];
202                     new_board[x][y] = '.';
203
204                     swap(new_board, board);
205
206
207                     bool is_killed = false;
208                     for (int j = 0; !is_killed && j < 8; j++) {
209                         for (int k = 0; !is_killed && k < 8; k++) {
210                             if (!iz_empty(j, k) && is_white(j, k)) {
211                                 vii nxts = next(j, k);
212
213                                 for (int l = 0; l < size(nxts); l++) {
214                                     if (board[nxts[l].first][nxts[l].second] == 'k') {
215                                         is_killed = true;
216                                         break;
217                                     }
218                                 }
219                             }
220                         }
221                     }
222
223                     swap(new_board, board);
224
225                     if (!is_killed) {
226                         can_escape = true;
227                         break;
228                     }
229                 }
230
231             }
232         }
233     }
234
235     return !can_escape;
236 }
237
238 int main()
239 {
240     for (int i = 0; i < 8; i++) {
241         for (int j = 0; j < 8; j++) {
242             scanf("%c", &board[i][j]);
243         }
244
245         scanf("\n");
246     }
247
248     char new_board[8][8];
249     for (int x = 0; x < 8; x++) {
250         for (int y = 0; y < 8; y++) {
251             if (!iz_empty(x, y) && is_white(x, y)) {
252
253                 vii moves = next(x, y);
254
255                 for (int i = 0; i < size(moves); i++) {
256
257                     for (int j = 0; j < 8; j++)
258                         for (int k = 0; k < 8; k++)
259                             new_board[j][k] = board[j][k];
260
261                     new_board[moves[i].first][moves[i].second] = board[x][y];
262                     new_board[x][y] = '.';
263
264                     swap(new_board, board);
265
266
267                     if (board[moves[i].first][moves[i].second] == 'P' &&
268                         moves[i].first == 0) {
269
270                         board[moves[i].first][moves[i].second] = 'Q';
271                         if (is_mate()) {
272                             printf("%c%d%c%d\n", y + 'a', 7 - x + 1,
273                                 moves[i].second + 'a', 7 - moves[i].first + 1);
274                             return 0;
275                         }
276
277                         board[moves[i].first][moves[i].second] = 'N';
278                         if (is_mate()) {
279                             printf("%c%d%c%d\n", y + 'a', 7 - x + 1,
280                                 moves[i].second + 'a', 7 - moves[i].first + 1);
281                             return 0;
282                         }
283
284                     } else {
285                         if (is_mate()) {
286                             printf("%c%d%c%d\n", y + 'a', 7 - x + 1,
287                                 moves[i].second + 'a', 7 - moves[i].first + 1);
288                             return 0;
289                         }
290                     }
291
292                     swap(new_board, board);
293                 }
294             }
295         }
296     }
297
298     assert(false);
299
300     return 0;
301 }

That is a lot of code! Note how there are a few obvious mistakes which
make the code harder to read, such as the typo iz_empty instead of is_empty, or
how the list of moves for the knight is called rook. Our final solution reduces
this to less than half the size.
Exercise 4.1. Read through the above code carefully and consider if there are
better ways to solve the problem. Furthermore, it has a bug – can you find it?
First, let us clean up the move generation a bit. Currently, it is implemented
as the function next, together with some auxiliary data (lines 25-179). It is not
particularly abstract, and plagued by a lot of code duplication.
The move generation does not need a lot of code. Almost all the moves of
the pieces can be described in the same way: “pick a direction out of a list
D and move at most L steps along this direction, stopping either before exiting
the board or taking your own piece, or when taking another piece.” For the
king and queen, D is all 8 directions one step away, with L = 1 for the king and
L = ∞ for the queen.
Implementing this abstraction is done with little code.
Implementing this abstraction is done with little code.
1 const vii DIAGONAL = {{-1, -1}, {-1, 1}, {1, -1}, {1, 1}};
2 const vii CROSS = {{0, -1}, {0, 1}, {-1, 0}, {1, 0}};
3 const vii ALL_MOVES = {{-1, -1}, {-1, 1}, {1, -1}, {1, 1},
4                        {0, -1}, {0, 1}, {-1, 0}, {1, 0}};
5 const vii KNIGHT = {{-1, -2}, {-1, 2}, {1, -2}, {1, 2},
6                     {-2, -1}, {-2, 1}, {2, -1}, {2, 1}};
7 vii directionMoves(const vii& D, int L, int x, int y) {
8     vii moves;
9     trav(dir, D) {
10         rep(i,1,L+1) {
11             int nx = x + dir.first * i, ny = y + dir.second * i;
12             if (!isValid(nx, ny)) break;
13             if (isEmpty(nx, ny)) moves.emplace_back(nx, ny);
14             else {
15                 if (isWhite(x, y) != isWhite(nx, ny)) moves.emplace_back(nx, ny);
16                 break;
17             }
18         }
19     }
20     return moves;
21 }

A short and sweet abstraction that will prove very useful. It handles all
possible moves, except for pawns. These have a few special cases.
1 vii pawnMoves(int x, int y) {
2     vii moves;
3     if (x == 0 || x == 7) {
4         vii queenMoves = directionMoves(ALL_MOVES, 16, x, y);
5         vii knightMoves = directionMoves(KNIGHT, 1, x, y);
6         queenMoves.insert(queenMoves.begin(), all(knightMoves));
7         return queenMoves;
8     }
9     int mv = (isWhite(x, y) ? -1 : 1);
10     if (isValid(x + mv, y) && isEmpty(x + mv, y)) {
11         moves.emplace_back(x + mv, y);
12         bool canMoveTwice = (isWhite(x, y) ? x == 6 : x == 1);
13         if (canMoveTwice && isValid(x + 2 * mv, y) && isEmpty(x + 2 * mv, y)) {
14             moves.emplace_back(x + 2 * mv, y);
15         }
16     }
17     auto take = [&](int nx, int ny) {
18         if (isValid(nx, ny) && !isEmpty(nx, ny)
19             && isWhite(x, y) != isWhite(nx, ny))
20             moves.emplace_back(nx, ny);
21     };
22     take(x + mv, y - 1);
23     take(x + mv, y + 1);
24     return moves;
25 }

This pawn implementation also takes care of promotion, rendering the logic
previously implementing this obsolete.
The remainder of the move generation is now implemented as:
1 vii next(int x, int y) {
2     vii moves;
3     switch(toupper(board[x][y])) {
4         case 'Q': return directionMoves(ALL_MOVES, 16, x, y);
5         case 'R': return directionMoves(CROSS, 16, x, y);
6         case 'B': return directionMoves(DIAGONAL, 16, x, y);
7         case 'N': return directionMoves(KNIGHT, 1, x, y);
8         case 'K': return directionMoves(ALL_MOVES, 1, x, y);
9         case 'P': return pawnMoves(x, y);
10     }
11     return moves;
12 }

These make up a total of about 50 lines – a reduction to a third of the size of
the previous move generation. The trick was to rework all the code
duplication into a much cleaner abstraction.
We also have a lot of code duplication in the main (lines 234-296) and is_mate
(lines 181-232) functions. Both functions loop over all possible moves, with
lots of duplication. First of all, let us further abstract the move generation to not
only generate the moves a certain piece can make, but all the moves a player can
make. This is done in both functions, so we should be able to extract this logic
into only one place:
1 vector<pair<ii, ii>> getMoves(bool white) {
2     vector<pair<ii, ii>> allMoves;
3     rep(x,0,8) rep(y,0,8) if (!isEmpty(x, y) && isWhite(x, y) == white) {
4         vii moves = next(x, y);
5         trav(it, moves) allMoves.emplace_back(ii{x, y}, it);
6     }
7     return allMoves;
8 }

We also have some duplication in the code making the moves. Before
extracting this logic, we will change the structure used to represent the board. A
char[8][8] is a tedious structure to work with. It is not easily copied or sent as a
parameter. Instead, we use a vector<string>, typedef’d as Board:
typedef vector<string> Board;

We then add a function to make a move, returning a new board:


1 Board doMove(pair<ii, ii> mv) {
2     Board newBoard = board;
3     ii from = mv.first, to = mv.second;
4     newBoard[to.first][to.second] = newBoard[from.first][from.second];
5     newBoard[from.first][from.second] = '.';
6     return newBoard;
7 }

Hmm... there should be one more thing in common between the main and
is_mate functions – namely, checking whether the current player is in check after a move.
However, it seems this is not done in the main function – a bug. Since we do
need to do this twice, it should probably be its own function:
1 bool inCheck(bool white) {
2     trav(mv, getMoves(!white)) {
3         ii to = mv.second;
4         if (!isEmpty(to.first, to.second)
5             && isWhite(to.first, to.second) == white
6             && toupper(board[to.first][to.second]) == 'K') {
7             return true;
8         }
9     }
10     return false;
11 }

Now, the long is_mate function is much shorter and more readable, thanks to our
refactoring (note that we restore the board before returning, so that the caller
always gets it back unchanged):
1 bool isMate() {
2     if (!inCheck(false)) return false;
3     Board oldBoard = board;
4     trav(mv, getMoves(false)) {
5         board = doMove(mv);
6         bool escaped = !inCheck(false);
7         board = oldBoard;
8         if (escaped) return false;
9     }
10     return true;
11 }

A similar transformation is now possible for the main function, which loops
over all moves white can make and checks if black is then mated (restoring the
board after each tested move):
1 int main() {
2     rep(i,0,8) {
3         string row;
4         cin >> row;
5         board.push_back(row);
6     }
7     Board oldBoard = board;
8     trav(mv, getMoves(true)) {
9         board = doMove(mv);
10         if (!inCheck(true) && isMate()) {
11             outputSquare(mv.first.first, mv.first.second);
12             outputSquare(mv.second.first, mv.second.second);
13             cout << endl;
14             break;
15         }
16         board = oldBoard;
17     }
18     return 0;
19 }

Now, we have rewritten the entire solution. From the 300-line
behemoth with gigantic functions, we have refactored the solution into a few
short functions which are easy to follow. The rewritten solution is less than half
the size, clocking in at less than 140 lines (the author’s own solution is 120
lines). Learning to code such structured solutions comes to a large extent from
experience. During a competition, we might not spend time thinking about
how to structure our solutions, instead focusing on getting them done as soon as
possible. However, spending 1-2 minutes thinking about how to best implement
a complex solution could pay off not only in faster implementation times (such
as halving the size of the program) but also in fewer bugs.
Problem 4.2
Mate in One – mateinone

To sum up: implementation problems should not be underestimated in terms
of implementation complexity. Work on your coding best practices and spend
time practicing coding complex solutions, and you will see your implementation
performance improve.

Chapter Exercises
Problem 4.3
Flexible Spaces – flexiblespaces

Permutation Encryption – permutationencryption
Jury Jeopardy – juryjeopardy
Fun House – funhouse
Settlers of Catan – settlers2
Cross – cross
Basic Interpreter – basicinterpreter
Cat Coat Colors – catcoat

Chapter Notes
Many good sources exist to become more proficient at writing readable and
simple code. Clean Code[17] describes many principles that help in writing
better code. It includes good walk-throughs on refactoring, and shows in a very
tangible fashion how coding cleanly also makes coding easier.
Code Complete[18] is a huge tome on improving your programming skills.
While much of the content is not particularly relevant to coding algorithmic
problems, chapters 5-19 give many suggestions on coding style.
Different languages have different best practices. Some resources on
improving your skills in whatever language you code in are:

C++: Effective C++[20], Effective Modern C++[21], and Effective STL[19] by Scott Meyers,

Java: Effective Java[5] by Joshua Bloch,

Python: Effective Python[25] by Brett Slatkin, and Python Cookbook[4] by David

5 Time Complexity
How do you know if your algorithm is fast enough before you have coded it? In
this chapter we examine this question from the perspective of time complexity, a
common tool of algorithm analysis to determine roughly how fast an algorithm
is.
We start our study of complexity by looking at a new sorting algorithm
– insertion sort. Just like selection sort (studied in Chapter 1), insertion sort
works by iteratively sorting a sequence.

5.1 The Complexity of Insertion Sort


The insertion sort algorithm works by ensuring that all of the first 𝑖 elements of
the input sequence are sorted. First for 𝑖 = 1, then for 𝑖 = 2, etc, up to 𝑖 = 𝑛, at
which point the entire sequence is sorted.
Insertion Sort
Assume we wish to sort the list a_0, a_1, ..., a_{N−1} of N integers. If we know
that the first K numbers a_0, ..., a_{K−1} are sorted, we can make the
list a_0, ..., a_K sorted by taking the element a_K and inserting it into the correct
position of the already-sorted prefix a_0, ..., a_{K−1}.
For example, we know that a list of a single element is always sorted,
so we can use that a_0 is sorted as a base case. We can then sort a_0, a_1 by
checking whether a_1 should be to the left or to the right of a_0. In the first case,
we swap the two numbers.
Once we have sorted a_0, a_1, we insert a_2 into the sorted list. If it is larger
than a_1, it is already in the correct place. Otherwise, we swap a_1 and a_2,
and keep going until we either find the correct location, or determine that
the number was the smallest one, in which case the correct location is in the
beginning.
This procedure is then repeated for every remaining element.

In this section we determine how long insertion sort takes to run. When
analyzing an algorithm we do not attempt to compute the actual wall clock time
it takes. Indeed, this would be nearly impossible a priori – modern
computers are complex beasts with often unpredictable behavior. Instead, we
try to approximate the growth of the running time, as a function of the size of
the input.
Competitive Tip
While it is difficult to measure exact wall-clock time of your algorithm just by analyzing
the algorithm and code, it is sometimes a good idea to benchmark your solution before
you submit it to the judge. This way you trade a few minutes of time (constructing the
worst-case input) for avoiding many time limit exceeded verdicts. If you are unsure of
your solution and the competition format penalizes you for rejected submissions, this
tradeoff can have good value.

When sorting fixed-size integers the size of the input would be the number of
elements we are sorting, 𝑁 . We denote the time the algorithm takes in relation
to 𝑁 as 𝑇 (𝑁 ). Since an algorithm often has different behaviours depending on
how an instance is constructed, this time is taken to be the worst-case time, over
every instance of 𝑁 elements.

5 2 4 1 3 0    t_0 = 0

5 2 4 1 3 0    t_1 = 1

2 5 4 1 3 0    t_2 = 1

2 4 5 1 3 0    t_3 = 3

1 2 4 5 3 0    t_4 = 2

1 2 3 4 5 0    t_5 = 5

0 1 2 3 4 5

Figure 5.1: Insertion Sort sorting the sequence 5, 2, 4, 1, 3, 0. Each row shows the list at the start of an iteration, together with the number of inner-loop iterations t_i that iteration performs.


To properly analyze an algorithm, we need to be more precise about exactly


what it does. We give the following pseudo code for insertion sort:

1: procedure InsertionSort(A)    ⊲ Sorts the sequence A containing N elements
2:   for i ← 0 to N − 1 do
3:     j ← i
4:     while j > 0 and A[j] < A[j − 1] do
5:       Swap A[j] and A[j − 1]
6:       j ← j − 1
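In C++, a direct translation of this pseudo code might look like this (a sketch; the function name insertionSort is our own):

#include <utility>
#include <vector>
using namespace std;

void insertionSort(vector<int>& A) {
    int N = A.size();
    for (int i = 0; i < N; i++) {
        int j = i;
        // Move A[i] backwards until the prefix A[0..i] is sorted.
        while (j > 0 && A[j] < A[j - 1]) {
            swap(A[j], A[j - 1]);
            j--;
        }
    }
}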

To analyze the running time of the algorithm, we make the assumption that
any “sufficiently small” operation takes the same amount of time – exactly 1
(of some undefined unit). We have to be careful in what assumptions we make
regarding what a sufficiently small operation means. For example, sorting 𝑁
numbers is not a small operation, while adding or multiplying two fixed-size
numbers is. Multiplication of integers of arbitrary size is not a small operation
(see the Karatsuba algorithm, Section 12.4).
In our program every line happens to represent a small operation. However,
the two loops may cause some lines to execute more than once. The outer for
loop will execute 𝑁 times. The number of times the inner loop runs depends
on how the input looks. We introduce the notation 𝑡𝑖 to mean the number of
iterations the inner loop runs during the 𝑖’th iteration of the outer loop. These
are included in figure 5.1 for every iteration.
Now we can annotate our pseudo code with the number of times each line
executes.

1: procedure InsertionSort(A)    ⊲ Sorts the sequence A containing N elements
2:   for i ← 0 to N − 1 do                  ⊲ Runs N times, cost 1
3:     j ← i                                 ⊲ Runs N times, cost 1
4:     while j > 0 and A[j] < A[j − 1] do    ⊲ Runs Σ_{i=0}^{N−1} t_i times, cost 1
5:       Swap A[j] and A[j − 1]              ⊲ Runs Σ_{i=0}^{N−1} t_i times, cost 1
6:       j ← j − 1                           ⊲ Runs Σ_{i=0}^{N−1} t_i times, cost 1

We can now express T(N) as

T(N) = N + N + (Σ_{i=0}^{N−1} t_i) + (Σ_{i=0}^{N−1} t_i) + (Σ_{i=0}^{N−1} t_i)

     = 3(Σ_{i=0}^{N−1} t_i) + 2N
We still have some t_i variables left, so we do not truly have a function of N.
We can eliminate this by realizing that in the worst case t_i = i. This occurs when
the list we are sorting is in descending order. Each element must then be moved
to the front, requiring i swaps for the i’th element.
With this substitution we can simplify the expression:
T(N) = 3(Σ_{i=0}^{N−1} i) + 2N

     = 3 · (N − 1)N/2 + 2N

     = (3/2)(N² − N) + 2N

     = (3/2)N² + N/2
This function grows quadratically with the number of elements N. Since
the approximate growth of the time a function takes is assigned such importance,
a notation was developed for it.

5.2 Asymptotic Notation


Almost always when we express the running time of an algorithm we use what
is called asymptotic notation. The notation captures the behavior of a function
as its arguments grow. For example, the function T(N) = (3/2)N² + N/2 which
described the running time of insertion sort is bounded by c · N² for large N,
for some constant c. We write

T(N) = O(N²)

to state this fact.


Similarly, the linear function 2N + 15 is bounded by c · N for large N, with
c = 3, so 2N + 15 = O(N). The notation only captures upper bounds though (and
not the actual rate of growth). We could therefore say that 2N + 15 = O(N²),
even though this particular upper bound is very lax. However, N² is not bounded
by c · N for any constant c when N is large, so N² ≠ O(N). This also corresponds
to intuition – quadratic functions grow much faster than linear functions.

86
5.2. A SYMPTOTIC N OTATION

Definition 5.1 — O-notation

Let f and g be non-negative functions from ℝ≥0 to ℝ≥0. If there exist
positive constants n₀ and c such that f(n) ≤ c·g(n) whenever n ≥ n₀, we say
that f(n) = O(g(n)).
(Figure: a plot of the linear function 100N + 1337 together with N²; the quadratic curve eventually overtakes the line.)

Figure 5.2: All linear functions are eventually outgrown by N², so an + b = O(n²).

Intuitively, the notation means that f(n) grows slower than or as fast as
g(n), within a constant factor. Any quadratic function an² + bn + c = O(n²).
Similarly, any linear function an + b = O(n²) as well. This definition implies
that for two functions f and g which are always within a constant factor of each
other, we have that both f(n) = O(g(n)) and g(n) = O(f(n)).
We can use this definition to prove that the running time of insertion sort is
O(N²), even in the worst case.

Example 5.1 Prove that (3/2)N² + N/2 = O(N²).

Proof. When N ≥ 1 we have N² ≥ N (by multiplying both sides with N).
This means that

(3/2)N² + N/2 ≤ (3/2)N² + N²/2 = (4/2)N² = 2N²

for N ≥ 1. Using the constants c = 2 and n₀ = 1 we fulfill the condition
from the definition. □


For constants k we say that k = O(1). This is a slight abuse of notation,
since neither k nor 1 are functions, but it is a well-established abuse. If you
prefer, you can instead assume we are talking about the functions k(N) = k and
1(N) = 1.
Competitive Tip
The following table describes approximately how large an input size n you can
handle with an algorithm of a given time complexity when the time limit is
about 1 second.

Complexity      n
O(log n)        2^(10⁷)
O(√n)           10¹⁴
O(n)            10⁷
O(n log n)      10⁶
O(n√n)          10⁵
O(n²)           5 · 10³
O(n² log n)     2 · 10³
O(n³)           300
O(2ⁿ)           24
O(n · 2ⁿ)       20
O(n² · 2ⁿ)      17
O(n!)           11

Table 5.1: Approximations of needed time complexities

Note that this is in no way a general rule – while complexity does not bother about
constant factors, wall clock time does!

Complexity analysis can also be used to determine lower bounds of the time
an algorithm takes. To reason about lower bounds we use Ω-notation. It is
similar to 𝑂-notation except it describes the reverse relation.

Definition 5.2 — Ω-notation

Let f and g be non-negative functions from ℝ≥0 to ℝ≥0. If there exist
positive constants n₀ and c such that c·g(n) ≤ f(n) for every n ≥ n₀, we say
that f(n) = Ω(g(n)).

We know that the complexity of insertion sort has an upper bound of O(N²)
in the worst case, but does it have a lower bound? It actually has the same lower
bound as upper bound, i.e. T(N) = Ω(N²).

Example 5.2 Prove that (3/2)N² + N/2 = Ω(N²).

Proof. When N ≥ 1, we have (3/2)N² + N/2 ≥ N². Using the constants c = 1
and n₀ = 1 we fulfill the condition from the definition. □

In this case, both the lower and the upper bound of the worst-case running
time of insertion sort coincided (asymptotically). We have another notation for
when this is the case:

Definition 5.3 — Θ-notation
If f(n) = O(g(n)) and f(n) = Ω(g(n)), we say that f(n) = Θ(g(n)).

Thus, the worst-case running time for insertion sort is Θ(n²).
There are many ways of computing the time complexity of an algorithm. The
most common case is when a program has K nested loops, each of which performs
O(M) iterations. The complexity of these loops is then O(M^K · f(N)) if the
innermost operation takes O(f(N)) time. In Chapter 12, you also see some
ways of computing the time complexity of a particular type of recursive solution,
called Divide and Conquer algorithms.
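As a small illustration of the nested-loop case (a fragment of our own, not from any particular problem), the following code has K = 2 loops over n elements with an O(1) body, and thus runs in Θ(n²) time:

int n = 1000;
long long pairs = 0;
// Count ordered pairs (i, j) with i < j; the innermost statement
// executes n(n - 1)/2 = Θ(n²) times.
for (int i = 0; i < n; i++)
    for (int j = i + 1; j < n; j++)
        pairs++;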

Exercise 5.1. Find a lower and an upper bound that coincide for the best-case
running time for insertion sort.

Exercise 5.2. Give a Θ(𝑛) algorithm and a Θ(1) algorithm to compute the sum
of the 𝑛 first integers.

Exercise 5.3. Prove, using the definition, that 10n² + 7n − 5 + log₂ n = O(n²).
What constants c, n₀ did you get?

Exercise 5.4. Prove that 𝑓 (𝑛) + 𝑔(𝑛) = Θ(max{𝑓 (𝑛), 𝑔(𝑛)}) for non-negative
functions 𝑓 and 𝑔.

Exercise 5.5. Determine whether, with proof:

1) Is 2ⁿ⁺¹ = O(2ⁿ)?
2) Is 2²ⁿ = O(2ⁿ)?

Exercise 5.6. Prove that (n + a)ᵇ = Θ(nᵇ) for positive constants a, b.

89
C HAPTER 5. T IME C OMPLEXITY

Amortized Complexity
Consider the following algorithm:

1: procedure CountOccurrences(A, v)  ⊲ Count the number of occurrences of the value v in the sequence A
2:   N ← length of A
3:   i ← 0
4:   ans ← 0
5:   while i ≠ N do
6:     while i < N and A[i] ≠ v do
7:       i ← i + 1
8:     if i ≠ N then
9:       ans ← ans + 1
10:      i ← i + 1

It computes the number of occurrences of a certain value in a sequence (in a
somewhat cumbersome way). We repeatedly scan through the sequence A to
find the next occurrence of v. Whenever we find one, we increase the answer by
1 and resume our scanning. This procedure is continued until we have scanned
through the entire sequence.
What is the time complexity of this procedure? How do we handle the fact
that neither the outer nor the inner loop of the algorithm repeats a predictable
number of iterations each time? For example, the outer loop would iterate N
times if every element equals v, but only once if the sequence does not contain the value
at all. Similarly, the number of iterations of the inner loop depends on the
positions of the elements in A which equal v.
We find help in a technique called amortized complexity. The concept of
amortization is best known from loans. It represents the concept of periodically
paying off some kind of debt until it has been paid off in full.
When analyzing algorithms, the “debt” is the running time the algorithm
has. We try to prove that the algorithm has a given running time by looking at
many, possibly uneven parts that together, over the long run, sum up to the total
running time, similarly to how a loan can be paid off by amortization.
A quick analysis of the counting algorithm gives us a good guess that the
algorithm runs in time Θ(N). It should be clear that it is the inner loop at lines
6-7 that dominates the running time of the algorithm. The question is how we
compute the number of times it executes even though we know neither how
many times it executes nor how many iterations each execution takes? We try
the amortization trick by looking at how many iterations it performs over all
those executions, no matter how many they are. Assume that the loop is run
k times (including the final time when the condition first is false) and that the
i’th run iterates b_i times (1 ≤ i ≤ k). We claim that

b_1 + b_2 + · · · + b_k = Θ(N)

Our reasoning is as follows. There are two ways the variable i can increase.
It can either be increased inside the inner loop at line 7, or at line 10. If the inner loop
executes N times in total, it will certainly complete and never be executed again,
since the loop at line 5 then completes too. This gives us Σ_{i=1}^{k} b_i = O(N).
On the other hand, we get one iteration for every time i is increased. If i is
increased on line 7, it was done within an iteration of the inner loop. If i is increased on line
10, we instead count the final condition check of the inner loop just before it. Each increase
of i thus happens together with an iteration of the inner loop, so Σ_{i=1}^{k} b_i = Ω(N).
Together, these two results prove our claim.


This particular application of amortized complexity is called the aggregate
method.

Exercise 5.7. Consider the following method of adding 1 to an
integer represented by its binary digits.

1: procedure BinaryIncrement(D)
2:   i ← 0
3:   while D[i] = 1 do  ⊲ Add 1 to the i’th digit
4:     D[i] = 0         ⊲ We add 1 to a 1 digit, resulting in a 0 digit plus a carry
5:     i ← i + 1
6:   D[i] = 1           ⊲ We add 1 to a digit not resulting in a carry

The algorithm is the binary version of the normal addition algorithm where the
two addends are written above each other and each resulting digit is computed
one at a time, possibly with a carry digit.
What is the amortized complexity of this procedure over 2ⁿ calls, if D starts
out as 0?

91
C HAPTER 5. T IME C OMPLEXITY

5.3 NP-complete problems


Of particular importance in computer science are the problems that can be solved
by algorithms running in polynomial time (often considered to be the “tractable”
problems). For some problems we do not yet know if there is an algorithm
whose time complexity is bounded by a polynomial. One particular class of
these are the NP-complete problems. They have the property that they are all
reducible to one another, in the sense that a polynomial-time algorithm to any
one of them yields a polynomial-time algorithm to all the others. Many of these
NP-complete problems (or problems to which such a problem can be reduced, a property
called NP-hardness) appear in algorithmic problem solving. It is good to know
that they exist and that it is unlikely that you can find a polynomial-time solution.
During the course of this book, you will occasionally see such problems with
their NP-completeness mentioned.

5.4 Other Types of Complexities


There are several other types of complexities aside from the time complexity.
For example, the memory complexity of an algorithm measures the amount of
memory it uses. We use the same asymptotic notation when analyzing memory
complexity. In most modern programming competitions, the allowed memory
usage is high enough for the memory complexity not to be a problem – if you get
memory limit problems you also tend to have time limit problems. However, it
is still of interest in computer science (and thus algorithmic problem solving)
and computer engineering in general.
Another common type of complexity is the query complexity. In some
problems (like the Guessing Problem from chapter 1), we are given access to
some kind of external procedure (called an oracle) that computes some value
given parameters that we provide. A procedure call of this kind is called a query.
The number of queries that an algorithm makes to the oracle is called its query
complexity. Problems where the algorithm is allowed access to an oracle often
bound the number of queries the algorithm may make. In these problems the
query complexity of the algorithm is of interest.

5.5 The Importance of Constant Factors


In this chapter, we have essentially told you not to worry about the constant
factors that the Θ notation hides from you. While this is true when you are
solving problems in theory, only attempting to get a good asymptotic time
complexity, constant factors can unfortunately be of large importance when
implementing problems subject to time limits.
Speeding up a program that has the correct time complexity but still gets
time limit exceeded when submitted to an online judge is half art and half
engineering. The first trick is usually to generate the worst-case test instance. It
is often enough to create a test case where the input matches the input limits
of the problem, but sometimes your program behaves differently depending on
how the input looks. In these cases, more complex reasoning may be required.
Once the worst-case instance has been generated, what remains is to improve
your code until you have gained a satisfactory decrease in how much time your
program uses. When doing this, focus should be placed on those segments of
your code that take the longest wall-clock time. Decreasing time usage by 10%
in a segment that takes 1 second is clearly a larger win than decreasing time
usage by 100% in a segment that takes 0.01 seconds.
There are many tricks to improving your constant factors, such as:

• using symmetry to perform fewer calculations

• precomputing oft-repeated expressions, especially those involving trigonometric functions

• passing very large data structures by reference instead of by copying

• avoiding repeatedly allocating large amounts of memory

• using SIMD (single instruction, multiple data) instructions

5.6 Additional Exercises


Exercise 5.8. Prove that if 𝑎(𝑥) = 𝑂 (𝑏 (𝑥)) and 𝑏 (𝑥) = 𝑂 (𝑐 (𝑥)), then 𝑎(𝑥) =
𝑂 (𝑐 (𝑥)), i.e. asymptotic notation is a transitive property. This means functions
can be ordered by their asymptotic growth.

Exercise 5.9. Order the following functions by their asymptotic growth, with
proof!

• x

• √x

• x²

• 2ˣ

• eˣ

• x!

• log x

• 1/x

• x log x

• x³

Exercise 5.10. 1) Prove that if 𝑎(𝑥) = 𝑂 (𝑏 (𝑥)) and 𝑐 (𝑥) = 𝑂 (𝑑 (𝑥)), then
𝑎(𝑥) + 𝑐 (𝑥) = 𝑂 (𝑏 (𝑥) + 𝑑 (𝑥)).
2) Prove that if 𝑎(𝑥) = 𝑂 (𝑏 (𝑥)) and 𝑐 (𝑥) = 𝑂 (𝑑 (𝑥)), then 𝑎(𝑥) · 𝑐 (𝑥) =
𝑂 (𝑏 (𝑥) · 𝑑 (𝑥)).

5.7 Chapter Notes


Advanced algorithm analysis sometimes uses complicated discrete mathematics,
including number theoretical (as in Chapter 19) or combinatorial (as in
Chapter 17) facts. Concrete Mathematics [15] by Donald Knuth, et al, does a
thorough job on both accounts.
An Introduction to the Analysis of Algorithms [11] by Sedgewick and Flajolet
has a more explicit focus on the analysis of algorithms, mainly discussing
combinatorial analysis.
The study of various kinds of complexities constitute a research area
called computational complexity theory. Computational Complexity [22] by
Papadimitriou is a classical introduction to computational complexity, although
Computational Complexity: A Modern Approach [3] by Arora and Barak is a
more modern textbook, with recent results that the book by Papadimitriou lack.
While complexity theory is mainly concerned with the limits of specific
computational models on problems that can be solved within those models,
what cannot be done by computers is also interesting. This is somewhat out
of scope for an algorithmic problem solving book (since we are interested in
those problems which can be solved), but is still of general interest. A book
on e.g. automata theory (such as Introduction to Automata Theory, Languages,
and Computation [28] by Ullman et al) can be a good compromise, mixing both
some foundations of the theory of computation with topics more applicable to
algorithms (such as automatons and languages).

6 Data Structures
Solutions to algorithmic problems consist of two constructs – algorithms and
data structures. Data structures are used to organize the data that the algorithms
operate on. For example, the array is one such data structure.
Many data structures have been developed to handle particular common
operations we need to perform on data quickly. In this chapter we discuss some
of the basic data structures used in programming. We have chosen an approach
that is perhaps more theoretical than that of many other problem solving texts when
it comes to the basic structures. In particular, we have chosen to explicitly
discuss not only the data structures themselves and their complexities, but
also their implementations. We do this mainly because we believe that their
implementations show useful algorithmic techniques. While you may feel that
you can simply skip this chapter if you are familiar with all the data structures,
we advise that you still read through the sections for the structures for which
you lack confidence in their implementations.

6.1 Dynamic Arrays


The most basic data structure is the fixed-size array. It consists of a fixed-size
block of memory and can be viewed as a sequence of N variables of the same
type T. It supports the operations:

• T* arr = new T[size]: creating a new array, with a given size.
  Complexity: Θ(1)¹

• delete[] arr: deleting an existing array.
  Complexity: Θ(1)

• arr[index]: accessing the value at a certain index.
  Complexity: Θ(1)

¹ This complexity is debatable, and highly dependent on what computational model one uses. In practice, this is roughly “constant time” in most memory management libraries used in C++. In all Java and Python implementations we tried, it is instead linear in the size.


In Chapter 2 we saw how to create fixed-size arrays where we knew the size
beforehand. In C++ we can also create fixed-size arrays using an expression as
the size. This is done using the syntax above, for example:
int size = 5;
int* arr = new int[size];
arr[2] = 5;
cout << arr[2] << endl;
delete[] arr;

Exercise 6.1. What happens if you try to create an array with a negative size?

The fixed-size array can be used to implement a more useful data structure,
the dynamic array. This is an array that can change size when needed. For
example, we may want to repeatedly insert values in the array without knowing
the total number of values beforehand. This is a very common requirement
in programming problems. In particular, we want to support two additional
operations in addition to the operations supported by the fixed-size array.

• insert(𝑝𝑜𝑠, 𝑣𝑎𝑙): inserting a value in the array at a given position.


Amortized complexity: Θ(size − pos)
Worst case complexity: Θ(size)

• remove(𝑝𝑜𝑠): erase a position in an array.


Complexity: Θ(size − pos)

The complexities we list above are a result of the usual implementation of


the dynamic array. A key consequence of these complexities is that addition and
removal of elements to the end of the dynamic array takes Θ(1) amortized time.
A dynamic array can be implemented in numerous ways, and underlies
the implementation of essentially every other data structure that we use. A
naive implementation of a dynamic array is using a fixed-size array to store the
data and creating a new one with the correct size whenever we add or remove
an element, copying the elements from the old array to the new array. The
complexity for this approach is linear in the size of the array for every operation
that changes the size since we need to copy Θ(𝑠𝑖𝑧𝑒) elements during every
update. We need to do better than this.
To achieve the targeted complexity, we can instead modify this naive approach
slightly by not creating a new array every time we have to change the size of the
dynamic array. Whenever we need to increase the size of the dynamic array, we

98
6.1. D YNAMIC A RRAYS

create a fixed-size array that is larger than we actually need it to be. For example,
if we create a fixed-size array with 𝑛 more elements than our dynamic array
needs to store, we will not have to increase the size of the backing fixed-size
array until we have added 𝑛 more elements to the dynamic array. This means that
a dynamic array does not only have a size, the number of elements we currently
store in it, but also a capacity, the number of elements we could store in it. See
Figure 6.1 for a concrete example of what happens when we add elements to a
dynamic array that is both within its capacity and when we exceed it.

size = 4, cap = 5:    0 1 2 3 ·
size = 5, cap = 5:    0 1 2 3 4
size = 6, cap = 10:   0 1 2 3 4 5 · · · ·

Figure 6.1: The resizing of an array when it overflows its capacity.

To implement a dynamic array in C++, we could use a structure storing as


members the capacity, size and backing fixed-size array of the dynamic array.
An example of how such a structure could look can be found in Listing 6.1.

Listing 6.1 The dynamic array structure

1 struct DynamicArray {
2     int capacity;
3     int size;
4     int* backing;
5
6     DynamicArray() {
7         capacity = 10;
8         size = 0;
9         backing = new int[10];
10     }
11 };

We are almost ready to add and remove elements to our array now. First, we
need to handle the case where inserting a new element would make the size
of the dynamic array exceed its capacity, that is, when size = capacity.
Our previous suggestion was to allocate a new, bigger array, but just how big?
If we always add, say, 10 new elements to the capacity, we have to perform
the copying of the old elements with every 10’th addition. This still results in
additions to the end of the array taking linear time on average. There is a neat
trick that avoids this problem: creating the new backing array with double the
current capacity.
This ensures that all the copying needed to grow an array up to
some certain capacity cap has an amortized complexity of Θ(cap). Assume that
we have just increased the capacity of our array to cap, which required us to
copy cap/2 elements. Then, the previous increase will have happened at around
capacity cap/2 and took time cap/4. The one before that occurred at capacity cap/4,
and so on.
We can sum up all of this copying:

cap/2 + cap/4 + · · · ≤ cap

using the formula for the sum of a geometric series.
Since each copy is assumed to take Θ(1) time, the total time to create this
array was Θ(cap). As cap/2 ≤ size ≤ cap, this is also Θ(size), meaning that
adding size elements to the end of the dynamic array takes amortized Θ(size)
time.
When implementing this in code, we use a function that takes as argument
the capacity we require the dynamic array to have and ensures that the backing
array have at least this size, possibly by creating a new one double in size until it
is sufficiently large. Example code for this can be found in Listing 6.2.
With this method in hand, insertion and removal of elements is actually
pretty simple. Whenever we remove an element, we simply need to move the
elements coming after it in the dynamic array forward one step. See Figure 6.2
for an illustration of an element being removed.
When adding an element, we reverse this process by moving the elements
coming after the position we wish to insert a new element at one step towards
the back. An example of this is shown in Figure 6.3.

size = 5, cap = 10:   0 1 3 4 5
size = 5, cap = 10:   0 1 ∗ →3 →4 →5
size = 6, cap = 10:   0 1 2 3 4 5

Figure 6.3: The insertion of the element 2 into index 2.
Exercise 6.2. Implement insertion and removal of elements in a dynamic array.
Dynamic arrays are called vectors in C++ (Section 3.1). They have the same
complexities as the ones described at the beginning of this section.


Listing 6.2 Ensuring that a dynamic array has sufficient capacity

1 void ensureCapacity(int need) {
2     while (capacity < need) {
3         int* newBacking = new int[2 * capacity];
4         for (int i = 0; i < size; i++)
5             newBacking[i] = backing[i];
6         delete[] backing;
7         backing = newBacking;
8         capacity *= 2;
9     }
10 }

size = 6, cap = 10:   0 1 2 3 4 5
size = 6, cap = 10:   0 1 × ←3 ←4 ←5
size = 5, cap = 10:   0 1 3 4 5

Figure 6.2: The removal of the element 2 at index 2.

Exercise 6.3. How can any element be removed in Θ(1) if we ignore the ordering
of values in the array?

6.2 Stacks
The stack is a data structure that contains an ordered list of values and supports
the following operations:

• push(𝑣𝑎𝑙): inserting a value at the top of the stack.


Amortized complexity: Θ(1)

• pop(): remove the value at the top of the stack.


Complexity: Θ(1)

• top(): get the value at the top of the stack.


Complexity: Θ(1)

The structure is easily implemented with the above time complexities using


a dynamic vector. After all, the vector supports exactly the same operations that
a stack requires. In C++, the stack is called a stack (Section 3.4).
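A minimal sketch of this idea, using std::vector as the backing structure, could look as follows; the three operations map directly onto vector operations:

#include <vector>

struct Stack {
    std::vector<int> data;
    void push(int val) { data.push_back(val); } // amortized Θ(1)
    void pop() { data.pop_back(); }             // Θ(1)
    int top() { return data.back(); }           // Θ(1)
};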
Exercise 6.4. Implement a stack using a dynamic vector.

6.3 Queues
The queue is, like the vector and the stack, an ordered list of values. Instead of removing and getting values from the end like the stack does, it removes and gets the value at the front. The supported operations are thus:

• push(𝑣𝑎𝑙): insert a value at the end of the queue.
Amortized complexity: Θ(1)

• pop(): remove the value at the front of the queue.
Amortized complexity: Θ(1)

• front(): get the value at the front of the queue.
Complexity: Θ(1)
As previously seen, C++ has an implementation of the queue called queue
(Section 3.3).
Implementing a queue can also be done using a vector. After all, the operations and complexities are nearly the same; only removing the value at the front is wrong. To fix this, one can simply keep a pointer to where the front of the queue is in the vector. Removing the front element is then equivalent to moving the pointer forward one step. To see how this would work in practice, see an example push and pop operation in Figure 6.4.
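A minimal sketch of this front-pointer idea, again using std::vector, could look as follows. Note that it never reclaims the memory of popped elements – exactly the issue raised in Exercise 6.7 below:

#include <vector>

struct Queue {
    std::vector<int> data;
    size_t frontIdx = 0; // index of the current front element
    void push(int val) { data.push_back(val); } // amortized Θ(1)
    void pop() { frontIdx++; }                  // Θ(1)
    int front() { return data[frontIdx]; }      // Θ(1)
};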
Exercise 6.5. Implement a queue using a vector.


Figure 6.4: Pushing and popping elements in a queue.

Exercise 6.6. A queue can also be implemented using two stacks. How?
In a similar manner, a stack can also be implemented using two queues.
How?

Exercise 6.7. This naive suggestion of a queue implementation suffers a slight problem. After pushing and popping 𝑘 elements, the backing dynamic array has size at least 𝑘 even though none of its elements are in use, thus occupying memory unnecessarily. Devise a strategy that ensures the backing dynamic array never uses more than 𝑐𝑛 elements (where 𝑛 is the current size of the queue) for some constant 𝑐.

6.4 Priority Queues


Now, let us look at our first more complicated data structure. The priority queue is an unordered bag of elements, from which we can quickly get and remove the largest one. It supports the operations:

• push(𝑣𝑎𝑙): insert a value into the priority queue.
Complexity: 𝑂 (log 𝑛)

• pop(): remove the largest value in the priority queue.
Complexity: 𝑂 (log 𝑛)

• getMax(): get the largest value in the priority queue.
Complexity: Θ(1)


This is implemented as priority_queue in C++ (Section 3.5), although one often instead uses a set, which not only supports the same operations with the same complexities, but also supports erasing elements.
The backing implementation of the priority queue structure we study is
called a heap. The heap itself will be implemented using another data structure
called a binary tree, which we need to describe first.

Binary Trees
A binary tree is a rooted tree where every vertex has either 0, 1 or 2 children. In Figure 6.5a, you can see an example of a binary tree.

Figure 6.5: Examples of binary trees: (a) a non-complete binary tree; (b) a complete binary tree.

We call a binary tree complete if every level of the tree is completely filled,
except possibly the bottom one. If the bottom level is not filled, all the vertices
need to be as far as possible to the left. In a complete binary tree, we can order
every vertex as we do in Figure 6.5b, i.e. from the top down, left to right at each
layer.
The beauty of this numbering is that we can use it to store a binary tree in a
vector. Since each vertex is given a number, we can map the number of each
vertex into a position in a vector. The 𝑛 vertices of a complete binary tree then
occupy all the indices [1, 𝑛]. An important property of this numbering is that
it is easy to compute the number of the parent, left child and right child of a
vertex. If a vertex has number 𝑖, the parent simply has number ⌊𝑖/2⌋, the left child has number 2𝑖 and the right child has number 2𝑖 + 1.

Exercise 6.8. Prove that the above properties of the numbering of a complete
binary tree hold.

A heap is implemented as a complete binary tree, which we in turn implement by a backing vector in the manner described. In the implementation, we use the following convenience functions:
1: function Parent(𝑖) return 𝑖/2
2: function Left(𝑖) return 2𝑖
3: function Right(𝑖) return 2𝑖 + 1
Note: if you use a vector to represent a complete binary tree in this manner
it needs to have the size 𝑛 + 1 where 𝑛 is the number of vertices, since the tree
numbering is 1-indexed and the vector is 0-indexed!

Heaps
A heap is a special kind of complete binary tree. More specifically, it should always satisfy the following property: a vertex always has a value at least as large as those of its immediate children. Note that this condition acts transitively, which means that a vertex also has a value at least as large as those of its grand-children, their children, and so on. In particular, a consequence of this property is that the root of the tree always holds the largest value in the heap. As it happens, this property is exactly what we need to quickly get the maximum value of the heap. You can see an example of a heap in Figure 6.6.

Figure 6.6: A heap of the elements 1, 4, 4, 5, 7, 10.

We thus start our description of a heap somewhat backwards, with the function needed to get the largest element:
1: procedure Get-Max(tree) return tree[1]
The complicated operations on the heap are adding and removing elements while ensuring that the heap keeps satisfying this property. We start with the simplest one, adding a new element. Since we represent the heap using a vector, adding a new element to a heap can be done by appending the element to the vector. In this manner, we ensure that the underlying binary tree is still complete. However, it may be that the value we added is now larger than its parent. If this is the case, we can fix the violation of our heap property by swapping the value with its parent. This still does not guarantee that the value is not larger than its new parent. In fact, if the newly added element is the largest in the heap, it has to be repeatedly swapped all the way up to the top! This procedure, of moving the newly added element up in the tree until it is no longer larger than its parent (or it becomes the root), is called bubbling up:

1: procedure Bubble-Up(idx, tree)
2:     while 𝑖𝑑𝑥 > 1 do
3:         if tree[idx] > tree[Parent(idx)] then
4:             Swap tree[idx] and tree[Parent(idx)]
5:             idx ← Parent(idx)
6:         else
7:             break

Pushing a value now reduces to appending it to the tree and bubbling it up. You can see this procedure in action in Figure 6.7.

1: procedure Push(𝑥, tree)
2:     tree.push_back(𝑥)
3:     Bubble-Up(tree.size() − 1, tree)

Figure 6.7: Adding a new value and bubbling it up.


Removing a value is slightly harder. First off, the tree will no longer be a binary tree – it is missing its root! To rectify this, we can take the last element of the tree and put it as root instead. This keeps the binary tree complete, but may cause it to violate the heap property, since our new root may be smaller than either or both of its children.
The solution to this problem is similar to that of adding an element. Instead of bubbling up, we bubble the new root down by repeatedly swapping it with one of its children until it is no longer smaller than any of its children. The only question mark is which of its children we should swap with, in case the element is smaller than both of its children. The answer is clearly the larger of the two children. If we take the smaller of the two, we will again violate the heap property. Just as with pushing, popping a value is done by a combination of removing the value and fixing the heap to satisfy the heap property again.

1: procedure Remove-Max(tree)
2:     tree[1] ← tree[tree.size() − 1]
3:     tree.pop_back()
4:     Bubble-Down(1, tree)
5: procedure Bubble-Down(idx, tree)
6:     while true do
7:         largest ← idx
8:         if Left(idx) < tree.size() and tree[Left(idx)] > tree[largest] then
9:             largest ← Left(idx)
10:        if Right(idx) < tree.size() and tree[Right(idx)] > tree[largest] then
11:            largest ← Right(idx)
12:        if largest = idx then
13:            break
14:        else
15:            Swap tree[idx] and tree[largest]
16:            idx ← largest

A final piece of our analysis is missing. It is not yet proven that the time complexities of adding and removing elements are indeed 𝑂 (log 𝑛). To do this, we first need to state a basic fact of complete binary trees: their height is at most $\log_2 n$. This is easily proven by contradiction. Assume that the height of the tree is at least $\log_2 n + 1$. We claim that any such tree must have strictly more than 𝑛 vertices. Since all but the last layer of the tree must be complete, it must have at least $1 + 2 + \cdots + 2^{\log_2 n} = 2^{\log_2 n + 1} - 1$ vertices. But $2^{\log_2 n + 1} - 1 = 2n - 1 > n$ for positive 𝑛 – the tree has more than 𝑛 vertices. This means that a tree with 𝑛 vertices cannot have a height of more than $\log_2 n$.
The next piece of the puzzle is analyzing just how many iterations the loops
in the bubble up and bubble down procedures can perform. In the bubble up
procedure, we keep an index to a vertex that, for every iteration, moves up in
the tree. This can only happen as many times as there are levels in the tree.
Similarly, the bubble down procedure tracks a vertex that moves down in the
tree for every iteration. Again, this is bounded by the number of levels in the
tree. We are forced to conclude that since the complexity of each iteration is
Θ(1) as they only perform simple operations, the complexities of the procedures
as a whole are 𝑂 (log 𝑛).
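To make the pseudocode concrete, here is one possible C++ translation of the heap – a sketch, storing the 1-indexed tree in a std::vector as described (a dummy value at index 0 keeps the root at index 1):

#include <bits/stdc++.h>
using namespace std;

struct Heap {
    vector<int> tree{0}; // dummy value at index 0; the root lives at tree[1]

    int getMax() { return tree[1]; } // assumes the heap is non-empty

    void push(int x) {
        tree.push_back(x);
        // Bubble up: swap with the parent while the new value is larger.
        for (size_t idx = tree.size() - 1;
             idx > 1 && tree[idx] > tree[idx / 2]; idx /= 2)
            swap(tree[idx], tree[idx / 2]);
    }

    void pop() { // assumes the heap is non-empty
        tree[1] = tree.back(); // move the last element to the root...
        tree.pop_back();
        size_t idx = 1;        // ...and bubble it down
        while (true) {
            size_t largest = idx;
            if (2 * idx < tree.size() && tree[2 * idx] > tree[largest])
                largest = 2 * idx;
            if (2 * idx + 1 < tree.size() && tree[2 * idx + 1] > tree[largest])
                largest = 2 * idx + 1;
            if (largest == idx) break;
            swap(tree[idx], tree[largest]);
            idx = largest;
        }
    }
};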
Problem 6.1
Binary Heap – heap

Exercise 6.9. Prove that adding an element using Push never violates the heap
property.

Exercise 6.10. To construct a heap with 𝑛 elements by repeatedly adding one at a time takes 𝑂 (𝑛 log 𝑛) time, since the add function takes 𝑂 (log 𝑛) time in the worst case.
One can also construct it in Θ(𝑛) time in the following way: arbitrarily construct a complete binary tree with all the 𝑛 elements, and then call Bubble-Down on each of the elements in reverse order 𝑛, 𝑛 − 1, . . . , 2, 1.
Prove that this correctly constructs a heap, and that it takes Θ(𝑛) time.

6.5 Bitsets
We move on to the simplest data structure of the chapter. The bitset can be
viewed as a specialization of a static-length array for the case where the values
stored are booleans, i.e. supporting only the operations of setting and getting
values in the array.
The idea behind it is simple. Booleans are essentially the values 0 (false) or 1 (true), i.e. equivalent to a binary digit. Another data type also consists of binary digits – integers. In the chapter on programming, we noted that the memory of a computer is just a long sequence of binary digits. Any interpretation of what these digits actually mean is up to us. For example, your typical 64-bit integer could just as well represent an array of 64 booleans, indexed from 0 to 63. That is the bit part.
An array of booleans of size 𝑁 can also be interpreted as a subset of the
integers {0, 1, . . . , 𝑁 − 1}. If the 𝑖’th value of the array is true, we say that 𝑖
is in the subset, otherwise it is not. This is the set part. An example of this
equivalence between integers and subsets is visualized in Figure 6.8.

Figure 6.8: The equivalence between 90 (binary 01011010) and the set of elements {1, 3, 4, 6}.

Interpreting 𝑁-bit integers as subsets of {0, 1, . . . , 𝑁 − 1} is surprisingly useful. It allows us to easily pass subsets as parameters to functions, and even use them as indices into a vector rather than using a map!
To operate on a bitset, we use bitwise operators in C++. To construct the representation of the set {𝑖}, the expression 1 << i is used. This operator is called the left-shift operator. It works by taking the binary representation of the left operand and moving it 𝑖 steps to the left, adding 𝑖 zeroes to the right. Since the binary representation of 1 is one, the expression results in a binary number with a 1 in its 𝑖’th place, which is the correct representation of the set.
To take the union of two bitsets x and y, we use the bitwise or operator: x |
y. This operator takes the two integers and gives us a new one, with a 1 as the
𝑖’th digit if either x or y had a 1 as their 𝑖’th digit.
Similarly, the bitwise and operator x & y computes the intersection of the
two sets.
This allows us to check for set membership of 𝑖 in the bitset x using the
expression x & (1 << i) which is 0 if 𝑖 was not a member of the set, and 1 << i
otherwise.
One can also compute the symmetric difference of two sets with the bitwise exclusive or operator, x ^ y. An element is a member of the result if it was a member of exactly one of x and y. Thus, one can toggle the presence of an element 𝑖 in a bitset using x ^ (1 << i).
Finally, the bitwise negation operator ~x computes the complement of a set.


To compute the size of a bitset, the most common compilers support the builtin __builtin_popcount(x), which returns the number of 1 digits in x.
There are several neat tricks involving bitsets, demonstrated in code after the list. Some worth mentioning are:

• computing the representation of {0, 1, . . . , 𝑁 − 1} with the expression (1 << N) - 1,

• removing the lowest-numbered element of the set with x & (x - 1),

• retrieving the lowest-numbered element using x & -x, and

• iterating through all (non-empty) subsets of a bitset 𝑥 using the loop
  for (int sub = x; sub != 0; sub = (sub - 1) & x)
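As promised, a short demonstration of these operations in code, using the set {1, 3, 4, 6} from Figure 6.8:

#include <cstdio>

int main() {
    int x = (1 << 1) | (1 << 3) | (1 << 4) | (1 << 6); // x = 90 = {1, 3, 4, 6}
    printf("%d\n", x & (1 << 3) ? 1 : 0);  // membership test: prints 1
    x ^= (1 << 3);                         // toggle 3: x is now {1, 4, 6}
    printf("%d\n", __builtin_popcount(x)); // size of the set: prints 3
    printf("%d\n", x & -x);                // lowest element as a set: 2 = {1}
    for (int sub = x; sub != 0; sub = (sub - 1) & x)
        printf("%d ", sub);                // all non-empty subsets of x
}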

Exercise 6.11. Given a bitset, use bitwise operators to compute the next higher
bitset with the same number of elements.

6.6 Hash Tables


In Section 3.6, we looked at a data structure called map, which stored a mapping
from a set of keys to their corresponding values. The underlying implementation
of this data structure in STL is called a self-balancing tree, something we explore
further in Chapter 22.1. If one is willing to forego having the structure being
sorted by keys, a hash table can be used instead. Hash tables are an easier
implementation of the map, and can, depending on read and write patterns,
be faster than the self-balancing tree implementation. It exists in STL as well,
called unordered_map.
The operations the hash table supports are:

• set(x, y): set the value of key x to y.
Complexity: expected Θ(1)

• get(x): return the value of key x.
Complexity: expected Θ(1)

• erase(x): remove the key x.
Complexity: expected Θ(1)

• contains(x): returns whether the table contains the key x or not.
Complexity: expected Θ(1)


The main idea behind supporting these operations quickly is essentially the same as that of the dynamic array. Assume that the set of all possible keys, the universe, were the integers 0, 1, . . . , 𝑁 for some fixed 𝑁. If so, we could easily implement the above operations by storing the values in a dynamic array of this size. What do we do if this is not the case?
We apply the concept of hashing. Imagine your universe consists of all possible integers that fit in an int. Clearly, we could not use a dynamic array of that size ($2^{32}$) to store the table – at least not in competitions. Instead, the goal of hashing is to shrink this huge universe into a small one that we can store in an array. For example, we could take the last 𝐾 digits of the key for some small 𝐾 (i.e. take key mod $10^K$) and use them as the index into an array for the value of that key. There are only $10^K$ such indices, which can fit in an array for a small 𝐾.
Given such a mapping, we can store the key and value as a pair at the corresponding index in the array. This is illustrated for 𝐾 = 1 in Figure 6.9.

Figure 6.9: An example of where the hash table would store the values of a few keys when 𝐾 = 1.

This kind of transformation, which takes an arbitrarily large value (any integer) and maps it into a set of constant size, is called a hash function.²
Unfortunately, we are bitten by one of the fundamental limitations of mathematics – if a function maps a set to a smaller one, there must be at least two values which map to the same value (the pigeonhole principle). Our table can thus be subject to a collision: two keys that map to the same index in the array. Resolving this situation is actually straightforward. Instead of using the table to store a single key-value pair, we can store a dynamic array of key-value pairs.³
This complicates an implementation only slightly. To retrieve a value from
the hash table, we look up the correct index of the array backing the table and
check all key-value pairs stored in the array at that index. If we find a pair with
² You might have heard about a version of this often used in cryptography, the cryptographic hash function, which aims to provide stronger guarantees than we care about.
³ Traditionally, most introductions to the hash table decide to instead use a data structure called a linked list for pairs that collide. We elect not to, since it’s mostly a question of real-world performance, and expect you to mostly use the implementation from STL if you use C++.


Figure 6.10: When several keys with the same hash value are stored, we save all pairs in a sub-array.

the correct key, we return its associated value.
Setting the value of a key becomes a two-step process. First, we need to check the correct sub-array to see if it already contains a key-value pair with the same key. If it does, we update the value of the pair to the new value. If it does not, we insert the new key and value as a pair in that list.
We are not done yet. If the observant reader attempts to reason about the
time complexity of the above solution, they would notice a problem. When all
the keys inserted into the table have the same hash value – in the case of our
simple function, the same last 𝐾 digits – they map into the same position in
the backing array. Searching through the sub-array to determine if it already
contains the key would then become a linear time operation as our hash table
degrades into a single array! Similarly, if we pick a 𝐾 that is much smaller than
the number of keys, we will have to scan through very large arrays. For example,
when 𝐾 = 1 and we have $10^6$ keys, every operation would need to scan through on average $10^5$ values. This leaves us with two issues to resolve – how do we
choose a reasonable hash function, and how big must our hash table be?
As for the second question, the goal is essentially to always keep the size a multiple of the number of keys. When the number of keys grows too large for the table, the size is increased, and the hashes of all keys are recomputed with regard to the new table size. Typically, one would double the size of the hash table when it grows too dense, similar to how dynamic arrays were implemented. This makes sure inserting keys keeps the time complexity amortized constant over resizing. Furthermore, if one starts with a table size that is a power of two, doubling it keeps the size a power of two. In the remainder of the section, we will assume it is equal to $2^N$ at the time of hashing.
The first question depends on the context. For the sake of programming problems where the hash table keys are 32-bit integers, one can usually use something taking the upper 𝑁 bits of 𝐴𝑥 as the hash, where 𝐴 is a large odd constant, i.e.:

(A * x) >> (64 - N)

Using an odd constant makes sure 𝐴 is relatively prime to $2^N$, which gives us more randomness. Otherwise, 𝐴𝑥 in binary would just have a few extra zeroes at the end.

Exercise 6.12. What happens if one takes the lower 𝑁 bits of the product 𝐴𝑥 as
the hash instead?

Exercise 6.13. The above hash function is easily broken when 𝑥 can be a 64-bit
integer – how?

When we now move on to trying to compute the complexity of a hash table, we assume that a randomly chosen key has the same probability of being mapped to any of the possible hash values. By the resizing trick above, we can also assume that the table size is always within a constant factor of the number of keys.
Assume that 𝐾 operations are performed on a table of some size 𝑀. The complexity of the 𝑖’th operation is exactly that of the length of the sub-array to which the key involved in the operation maps. The expected complexity of the operation is then equal to the expected length of the sub-array. Let $a_j$ be 1 if the 𝑗’th operation inserted a key into this sub-array, and 0 otherwise (for 1 ≤ 𝑗 < 𝑖). The expected length of the sub-array is then $E[\sum_{j=1}^{i-1} a_j] = \sum_{j=1}^{i-1} E[a_j]$ by the linearity of expectation. By the assumption that keys map randomly into hash values, $E[a_j] \le \frac{1}{M}$, so that the sum above is bounded by $\frac{i}{M} \le \frac{K}{M}$. Since $\frac{K}{M} \le c$ for some constant 𝑐 by the dynamic resizing, the expected length is also bounded by a constant, meaning the complexity is as well.
Note that this analysis says nothing about the worst-case complexity of an operation (which can be linear if all keys map to the same hash value) or the expected length of the longest sub-array (which is $\frac{\log K}{\log \log K}$).

Universal Hashing
Certain competition forms include a stage where contestants may challenge the
solutions of others for correctness, by providing a test case they believe the
solution would fail. In this case, the hash function above is not good enough.
Another contestant can easily generate values of 𝑥 that all map to the same hash
value, by generating a large number of values and evaluating your hash function


on them, picking a large set of collisions from them. This also applies when using unordered_map from STL.⁴ To resolve this, one picks a hash function from a family of functions at random at every invocation of your program, a concept called universal hashing. In practice, the randomness tends to come from reading the current time at a sufficiently granular level to be hard to predict.
The hash function we will look at is again 𝐴𝑥 (as an unsigned 64-bit integer), but this time 𝐴 is a random (odd) 64-bit integer. We claim that taking the top 𝐾 bits of 𝐴𝑥 makes a good hash function from 64-bit integers to 𝐾-bit integers, i.e.
$$h(x) = \left\lfloor \frac{Ax}{2^{64-K}} \right\rfloor.$$
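As a sketch of how this could look in practice, the following functor picks a random odd 𝐴 seeded from the clock, and can be plugged in as the hash of an unordered_map. The name UniversalHash and the choice 𝐾 = 32 are our own, for illustration:

#include <bits/stdc++.h>
using namespace std;

struct UniversalHash {
    uint64_t A;
    UniversalHash() {
        // Seed from the current time, so A differs between invocations.
        mt19937_64 rng(chrono::steady_clock::now().time_since_epoch().count());
        A = rng() | 1; // force A to be odd
    }
    size_t operator()(uint64_t x) const {
        const int K = 32;            // number of output bits
        return (A * x) >> (64 - K);  // top K bits of A*x (mod 2^64)
    }
};

// usage: unordered_map<long long, int, UniversalHash> table;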
To prove that we are not making you use a weak hash when you compete against the authors, we provide a somewhat technical and uninteresting proof.

Theorem 6.1
For any two fixed 64-bit integers 𝑥, 𝑦, their hashes ℎ(𝑥) and ℎ(𝑦) are equal with probability $\frac{2}{2^K}$ over all choices of the hash function parameter 𝐴.

Proof. This proof uses some basic number theoretic facts – if you are not familiar with modular inverses, you might need to work through Chapter 19.
Assume that ℎ(𝑥) = ℎ(𝑦). This means that the top 𝐾 bits of 𝐴𝑥 and 𝐴𝑦 are equal. Thus, the top 𝐾 bits of $Ax - Ay = A(x - y)$ must either be all zeroes (if 𝐴𝑥 ≥ 𝐴𝑦) or all ones (if 𝐴𝑥 < 𝐴𝑦, causing a carry bit from the top 𝐾 bits).
Now, we introduce the following variables. Let 𝑧 be the odd part of 𝑥 − 𝑦, such that $x - y = z \cdot 2^i$ for some 𝑖. Also, let 𝐵 be the top 63 bits of 𝐴, so that $A = 2B + 1$. Since 𝐴 is a uniformly random odd 64-bit integer, 𝐵 is uniformly random. We can now perform the rewrite $A(x - y) = (2B + 1)(x - y) = (2B + 1)z2^i = Bz2^{i+1} + z2^i$. Since 𝐵 is uniformly random modulo $2^{63}$ and 𝑧 is odd, $Bz \bmod 2^{63}$ is uniformly random modulo $2^{63}$ (this follows from the fact that 𝑧, as an odd number, is relatively prime to $2^{63}$). Thus, the integer $A(x - y) = Bz2^{i+1} + z2^i$ is uniformly random in its top $63 - i$ bits and contains only zeroes in its lower 𝑖 bits.
Note that $Ax = Ay + A(x - y)$. Since $A(x - y)$ has zeroes in the lower 𝑖 bits and a 1 in the 𝑖’th bit, the 𝑖’th bit of 𝐴𝑦 will change when adding $A(x - y)$ to it, so that it will differ from the 𝑖’th bit of 𝐴𝑥. By assumption the top 𝐾 bits of 𝐴𝑥 and 𝐴𝑦 are equal, which thus forces $i \le 63 - K$.
Since the top $63 - i \ge 63 - (63 - K) = K$ bits are thus uniformly random, we get that they are all ones or all zeroes with probability $\frac{2}{2^K}$. □

⁴ Using map is fine however, since it is not backed by a hash table.

Exercise 6.14. Is it a problem that the hash has pairwise collisions with probability $\frac{2}{2^K}$ rather than $\frac{1}{2^K}$ with regard to hash table complexity?

Chapter Exercises
Exercise 6.15. Assume that you want to implement shrinking of a dynamic
array (or a hash table) where many elements were deleted so that the capacity is
unnecessarily large. This will be implemented by calling a particular function
after any removal, to see if the array should be shrunk. What is the problem
with the following implementation?

1: procedure ShrinkVector(𝑉 )
2:     while 2 · V.capacity > V.size do
3:         arr ← new 𝑇 [V.capacity/2]
4:         copy the elements of V.backing to arr
5:         V.backing ← arr
6:         V.capacity ← V.capacity/2

Chapter Notes
For a more rigorous treatment of the basic data structures, we again refer to Introduction to Algorithms [7]. In particular it goes through other techniques regarding hash tables more thoroughly, something we skipped since it is the hashing technique and general knowledge of the structure we deemed important here – an efficient implementation is something your language standard library will provide.
If you want to dive deeper into proper implementations of the algorithms in C++, Data Structures and Algorithm Analysis in C++ [29] covers what we brought up in this chapter and a bit more.

116
7 Recursion
This chapter introduces the first proper algorithimc technique of the book, that
of recursion. The first four chapters of the next part – brute force, greedy
algorithms, dynamic programming and divide and conquer – are all based on
this concept. Recursion is perhaps the first truly creatively tricky (rather than
technically difficult) technique faced by the fresh programmer, so we have chosen
to dedicate an entire chapter for a primer on the topic.
The remainder of this book, and computer science as a whole, strongly
depends on a solid understanding of recursion. You are therefore urged to read
it more carefully than the previous chapters. Even better; once you have read it,
read it again.

7.1 Recursive Definitions


The first example of recursion most people become acquainted with is the definition of a particular mathematical sequence, the Fibonacci numbers. The infinite sequence starts with the numbers 0, 1, 1, 2, 3, 5, 8, 13, . . . . Except for the first two, each number is computed by taking the sum of the two previous ones. A formal mathematical definition of the 𝑖’th Fibonacci number $F_i$ can look like this:

$$F_i = \begin{cases} 0 & \text{if } i = 0 \\ 1 & \text{if } i = 1 \\ F_{i-1} + F_{i-2} & \text{for } i \ge 2. \end{cases} \tag{7.1}$$

By the definition, we would have e.g. $F_6 = F_5 + F_4 = 5 + 3 = 8$, which holds.

Exercise 7.1. Use the definition to compute the 15 first Fibonacci numbers.

This is a so-called recursive definition, meaning that it refers back to itself – the definition of a Fibonacci number depends on the definition of (earlier) Fibonacci numbers. A program to directly implement the recursion looks very similar to the mathematical definition in Eq. 7.1.


Computing Fibonacci Numbers

int F(int n) {
    if (n == 0) return 0;
    if (n == 1) return 1;
    return F(n - 1) + F(n - 2);
}

Note that this function, just like the recursive definition, computes its result
𝐹 (𝑛) by calling itself to compute the (smaller) Fibonacci numbers 𝐹 (𝑛 − 1) and
𝐹 (𝑛 − 2). A knee-jerk reaction might be that such a function could never finish.
After all, in order to compute a single Fibonacci number, the function calls itself,
not just one, but two times! The solution is one of the key ideas of recursion,
namely that there is some base case where the self-referential – recursive –
computation eventually bottoms out, so that the definition does not refer back to
itself forever and ever. In the case of Fibonacci, once you try computing $F_0$ or $F_1$, the definition gives us the values immediately without having to apply the recursive case. One can visualize the computation as in Figure 7.1.

Figure 7.1: A visualization of the computation of 𝐹 (5): (a) the recursive calls; (b) the return values.

Another possible application of the recursive principle would be computing $a^n$ where 𝑛 is a non-negative integer. Since $a^n$ is defined as the product of 𝑎 with itself 𝑛 times, we can base a recursion around first computing $a^{n-1}$ and then multiplying it with 𝑎:

$$a^n = \begin{cases} 1 & \text{if } n = 0 \\ a \cdot a^{n-1} & \text{for } n \ge 1. \end{cases} \tag{7.2}$$
The implementation is similarly straightforward.


Recursive Exponentiation

int power(int a, int n) {
    if (n == 0) return 1;
    return a * power(a, n - 1);
}

Even though recursive definitions are the simplest examples of recursion, they often come up in practice.
One can also solve problems traditionally programmed using loops by formulating them recursively.¹ Consider the problem of summing an array of integers. Normally, you’d use loops for such a task. However, there is nothing preventing you from using the following recursive definition. Let $A = (a_0, \dots, a_{n-1})$ be an array of integers. If 𝑆 (𝑘) is the sum of the first 𝑘 elements of 𝐴, we have that

$$S(k) = \begin{cases} 0 & \text{if } k = 0 \\ a_{k-1} + S(k-1) & \text{otherwise}. \end{cases} \tag{7.3}$$

To compute the sum of the entire array, we would call 𝑆 (𝑛). Even though we
now have to deal with a vector, the implementation is similar:
// Invoked with sum(A, A.size())
int sum(const vi& A, int k) {
    if (k == 0) return 0;
    return A[k - 1] + sum(A, k - 1);
}

Equation 7.3 is a recursive definition too: it reduces the problem of summing


the entire array 𝐴 to summing a smaller part of 𝐴. This is the essence of
recursion – it was the common factor in all three examples. A recursive
definition is meant to express the solution to a problem in terms of other
instances of the same problem. The goal is that the new instances should be
smaller than the first, in order to make progress on the problem.
Exercise 7.2. Write recursive functions to compute:
• the product of all integers in an array,

• the largest element of an array, and

• the greatest sum of two consecutive elements of an array.


¹ In fact, some programming languages do not have loops. Instead, you must formulate them recursively.


7.2 The Time Complexity of Recursive Functions


How fast is a recursive program? It is not as easy to compute as most of the
algorithms we have seen so far. The work is distributed over some number of
recursive calls, and we need to sum all of it up.
In all problems of this chapter, the time complexity of each recursive call
is the same. This is true of all of our examples so far – they only performed
constant-time work plus some recursive calls. Computing the complexity then
boils down to two factors: the number of function calls in total, and the time
complexity of a single function call (excluding the recursive calls). Summing
all the work is as simple as taking the product of these two things.
In the example of summing a vector of size 𝑚, there are a total of 𝑚 + 1 = Θ(𝑚) function calls. One call is made for each element, and one final call for the base case of the recursion. A single call performs only constant-time operations, so it has complexity Θ(1). This means the time complexity is Θ(𝑚) · Θ(1) = Θ(𝑚).
The number of function calls made in total can be considerably harder to
compute, such as for the Fibonacci recursion.

Exercise 7.3. Write a program that uses the recursive function to compute Fibonacci numbers. Try computing all the Fibonacci numbers starting from $F_{30}$ and upwards until the execution takes over 30 seconds. Take note of how long your program takes. What complexity does the function seem to have?

From the above exercise, one thing should be clear – the running time is not a linear function. In fact, it turns out to be exponential!
A simple lower bound is $2^{n/2}$ function calls, which we can prove by induction. Let 𝑇 (𝑛) be the time taken to compute $F_n$. If $T(n) \ge 2^{n/2}$ for all $1 \le n \le n_0 - 1$, then

$$T(n_0) \ge T(n_0 - 2) + T(n_0 - 1) = 2^{\frac{n_0-2}{2}} + 2^{\frac{n_0-1}{2}} \ge 2^{\frac{n_0-2}{2}} + 2^{\frac{n_0-2}{2}} = 2^{1 + \frac{n_0-2}{2}} = 2^{\frac{n_0}{2}}$$

using the fact that $T(n) = T(n-2) + T(n-1) + \Theta(1)$, so the statement holds for $n = n_0$ too. By induction, it holds for all $n \ge 1$.


This lower bound is quite lax though – we can do better. Assume that $T(n) \ge x^n$ for some real value 𝑥. As $T(n) \ge T(n-2) + T(n-1) = x^{n-1} + x^{n-2}$ (within a constant term), we get the inequality
$$x^n \ge x^{n-1} + x^{n-2}$$
which, after dividing by $x^{n-2}$, results in $x^2 - x - 1 \ge 0$. Solving $x^2 - x - 1 = 0$, we find that the inequality holds when 𝑥 is at least the so-called golden ratio, $\phi = \frac{1 + \sqrt{5}}{2} \approx 1.618$.
Exercise 7.4. Use the same inductive technique as before to prove that $T(n) = \Omega(1.61^n)$ and $T(n) = O(1.62^n)$.

7.3 Choice
While all recursion is based on reducing a problem instance to a smaller instance
of the same problem, there are many different conceptual ways to do this. This
time, we look at problems involving choices of different kinds.

Stairs
Tasha the Kitty loves playing with the stairs at home while her caretakers are
at work. Her favorite game involves jumping up to the top of the stairs by
repeatedly skipping either 1 or 2 stairs at a time. She doesn’t like jumping on
the exact same sequence of stairs during two different climbs.

Figure 7.2: One way Tasha could climb a staircase of 6 stairs.

If the staircase has 1 ≤ 𝑛 ≤ 20 steps (including the top), in how many different ways can she climb the stairs?

Solution. With such a small 𝑛, computing this efficiently is not the main issue;
computing it at all is. The trick lies in formulating Tasha’s jumping up the stairs


as a sequence of choices. After Tasha has jumped 𝑘 steps, she has two choices – should her next jump be up a single stair to 𝑘 + 1, or two stairs to 𝑘 + 2? When faced with such a problem, always ask yourself: what was Tasha’s last choice, just before she climbed up to the top of the stairs? Consider these two options in Figure 7.3.

Figure 7.3: The two stairs leading to the top.

If there are a total of 𝑛 stairs and Tasha’s last jump was a single step, then
she came from step 𝑛 − 1. Similarly, if she took two steps, she came from step
𝑛 − 2. These two options are exhaustive – there is no other way she could have
come to step 𝑛. They are also exclusive – we assumed that this was Tasha’s last
jump, so there is no overlap between these possibilities. This means that the
number of ways Tasha can get to the 𝑛’th step must be equal to the number of
ways she could get to the (𝑛 − 1)’st step, plus the number of ways she could get
to the (𝑛 − 2)’nd step.
A recursive procedure based on this insight is then straightforward:

1: procedure Stairs(𝑛)
2: if 𝑛 = 0 then
3: return 1
4: if 𝑛 = 1 then
5: return 1
6: return Stairs(𝑛 − 1) + Stairs(𝑛 − 2)

Note the base cases we added, for the case of an empty staircase or a single stair. The time complexity of the solution is the same as that for Fibonacci, since the recursion is the same. 


We solve another problem from an early Swedish high school qualifier in the same way.

The Plank
Swedish Olympiad in Informatics, School Qualifiers 2001
You want to construct a long plank using smaller wooden pieces. There are three
kinds of pieces of lengths 1, 2 and 3 meters respectively, which you have an
unlimited number each of. You can glue together several of the smaller pieces
to create a longer plank.

Figure 7.4: There are 7 ways to glue a 4 meter plank.

If the plank should have length 𝑛 (1 ≤ 𝑛 ≤ 24) meters, in how many different
ways can you glue pieces together to get a plank of the right length?

Solution. The idea here is the same as in the Stairs problem. To compute the
size of the set of all possible planks, we find a recursive definition that reduces
the problem into counting the number of ways one can build some smaller
planks. For any given plank of length 𝑛, the rightmost piece of the plank has size
either 1, 2 or 3. This means that the number of ways in which we can construct the plank is equal to the number of ways in which planks of sizes 𝑛 − 1, 𝑛 − 2 and 𝑛 − 3 can be constructed. While this isn’t easier to compute directly, we can apply the same reduction recursively to these smaller planks, ending up with a very similar solution:

1: procedure PlankWays(𝑛)
2:     if 𝑛 < 0 then
3:         return 0
4:     if 𝑛 = 0 then
5:         return 1
6:     return PlankWays(𝑛 − 1) + PlankWays(𝑛 − 2) + PlankWays(𝑛 − 3)

Again, we had to add a few base cases to give the recursion somewhere to stop.
The two base cases we picked here may be slightly less intuitive. We say that
there is a single way to construct a plank of length 0, and no ways to construct
negative-length planks. 
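A direct C++ translation of the procedure could look as follows; with 𝑛 ≤ 24, the plain recursion is easily fast enough:

// A sketch of the PlankWays recursion from above.
long long plankWays(int n) {
    if (n < 0) return 0;  // no ways to build a negative-length plank
    if (n == 0) return 1; // a single way to build the empty plank
    return plankWays(n - 1) + plankWays(n - 2) + plankWays(n - 3);
}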

Exercise 7.5. The PlankWays algorithm has a time complexity of $\Omega(1.83^n)$ and $O(1.84^n)$. Prove this.

Problem 7.1
The Plank – plankan
Note: solve the first subtask for 50 points.

While these two problems in particular are much alike, many other recursive
problems also follow this template:

• formulate the problem as a sequence of choices,

• look at what the last choice was, and

• find out if “backtracking” along that choice reduces the problem to smaller
instances of the same problem.

Now that we have warmed up, we are going to look at a slightly harder
recursive problem, where it is less obvious to figure out how to reduce the
problem to a smaller one.

Dominoes
In how many ways can a 2 × 𝑛 (1 ≤ 𝑛 ≤ 20) grid be tiled by 𝑛 dominoes, i.e.
bricks of size 1 × 2 or 2 × 1 such that no dominoes overlap?

Figure 7.5: An example tiling of a 2 × 7 grid.


Solution. Looking at the example tiling in Figure 7.5 might help us. Let us denote the number of tilings of a 2 × 𝑛 grid with 𝑆 (𝑛). In general, a recursion
would somehow reduce the problem of computing 𝑆 (𝑛) to computing smaller
values of this function. By considering the rightmost domino of the example, a
partial solution idea should form. If the rightmost tile is placed vertically, the
remaining grid has size 2 × (𝑛 − 1), so there are 𝑆 (𝑛 − 1) such tilings. If it is
not placed vertically, the two rightmost squares must instead be occupied by two
horizontal tiles. In this case, the remaining grid would have size 2 × (𝑛 − 2),
meaning there would be 𝑆 (𝑛 − 2) ways to complete the remainder of the tiling
(see Figure 7.6).

Figure 7.6: The two resulting subproblems after covering the rightmost column.

Since these are the only two options, the total number of tilings must be
𝑆 (𝑛) = 𝑆 (𝑛 − 1) + 𝑆 (𝑛 − 2), and thus we get our recursive solution. Here
too we got the same recursion as the one for Fibonacci, with the same time
complexity. 

Exercise 7.6. 1) Write a recursion to compute the number of strings of length 𝑛 consisting of only letters A and B, with no two A’s next to each other.
2) Write a recursion to compute the number of subsets of {1, 2, . . . , 𝑛}, where at least one of 𝑖, 𝑖 + 1 must be in the subset for all 1 ≤ 𝑖 < 𝑛.

In the next chapter on brute force, we will revisit this way of thinking in a
new light as we use recursion to solve optimization problems rather than simply
counting ways.


7.4 Multidimensional Recursion


So far, every recursive solution we produced was about a single sequence of numbers – the input was an integer 𝑛, and we computed the 𝑛’th value of the sequence by a recursive definition.
This is far from the only area where recursion can be applied. Problem instances come in all forms and shapes. Recursing on more advanced problems can sometimes give us several recursive sequences that not only refer to themselves, but also to smaller values of the other sequences.

Varied Amusements
Marika and Lisa love going to amusement parks. This time, they have their eyes set on a park with lots of exciting rides of three different types: tilt-a-whirls, roller coasters and drop towers. There are 1 ≤ 𝑎 ≤ 10 different tilt-a-whirls, 1 ≤ 𝑏 ≤ 10 roller coasters and 1 ≤ 𝑐 ≤ 10 drop towers. They want to ride 1 ≤ 𝑛 ≤ 10 different rides in sequence, but never two rides of the same type in a row. In how many ways can they choose such a sequence of 𝑛 rides?

Solution. On the surface, the problem is a prime candidate for the choice strategy. There are 𝑛 choices – what ride to go on each time. However, once we choose the last ride the girls took, we are faced with a problem. If we chose a roller coaster, the first 𝑛 − 1 rides may not end with a roller coaster. This is not a smaller instance of the same problem, where the last ride could be any one we wanted. Instead, depending on the type of ride we choose as the last one, we get three different problems: How many sequences of 𝑛 − 1 rides are there that do not end with A) a tilt-a-whirl? B) a roller coaster? or C) a drop tower?
What happens if we apply the same strategy to these three new problems? Well, in the problem where we have to choose an (𝑛 − 1)-ride sequence that does not end with a tilt-a-whirl, there are two options for the last ride. If it was a roller coaster, we have to choose the remaining 𝑛 − 2 rides such that they do not end with a roller coaster. If it was a drop tower, the remaining 𝑛 − 2 rides may not end with a drop tower. Either way, both cases reduce to a smaller problem of the other two types!
By introducing the three new problems 𝐴(𝑛), 𝐵(𝑛) and 𝐶 (𝑛), defined as the
number of ride sequences of length 𝑛 not ending in a tilt-a-whirl, roller coaster
or drop tower respectively, we can produce recursive definitions that refer only


to these recursions:

𝐴(𝑛) = 𝑏 · 𝐵(𝑛 − 1) + 𝑐 · 𝐶 (𝑛 − 1)
𝐵(𝑛) = 𝑎 · 𝐴(𝑛 − 1) + 𝑐 · 𝐶 (𝑛 − 1)
𝐶 (𝑛) = 𝑎 · 𝐴(𝑛 − 1) + 𝑏 · 𝐵(𝑛 − 1)

with the only required base case of 𝐴(0) = 𝐵(0) = 𝐶 (0) = 1. The answer then
becomes 𝑎 · 𝐴(𝑛 − 1) + 𝑏 · 𝐵(𝑛 − 1) + 𝑐 · 𝐶 (𝑛 − 1).
When implementing the solution in C++, don’t forget the results from Exercises 2.20–2.21 to resolve the circular dependencies between the functions calling each other; a minimal sketch follows. 
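For concreteness, here is a sketch of such an implementation, where B and C are forward declared so that the three functions can call each other:

int a, b, c; // the number of rides of each type, read from the input

long long B(int n);
long long C(int n);

long long A(int n) { // sequences of length n not ending with a tilt-a-whirl
    if (n == 0) return 1;
    return b * B(n - 1) + c * C(n - 1);
}
long long B(int n) { // ... not ending with a roller coaster
    if (n == 0) return 1;
    return a * A(n - 1) + c * C(n - 1);
}
long long C(int n) { // ... not ending with a drop tower
    if (n == 0) return 1;
    return a * A(n - 1) + b * B(n - 1);
}
// The answer is then a * A(n - 1) + b * B(n - 1) + c * C(n - 1).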

Exercise 7.7. Determine, with proof, the time complexity of the solution to
Varied Amusements.

Problem 7.2
Varied Amusements – variedamusements
Note: solve the first subtask for 1 point.

7.5 Recursion vs. Iteration


After looking through the problems we solved in this chapter, you might wonder
if recursion really is needed. When computing the Fibonacci numbers by hand,
we tend to not use a method nearly as complicated as the recursive function.
Instead, we write them down one by one after each other, taking the sum of
the last two ones to compute the next. This approach could be implemented
iteratively in code, looking something like this:
int F(int n) {
    if (n == 0) return 0;
    int secondLast = 0; // stores F_(i-2)
    int last = 1;       // stores F_(i-1)
    for (int i = 2; i <= n; i++) {
        int current = last + secondLast; // Compute F_i = F_(i-2) + F_(i-1)
        // Since i will be increased by 1, the old F_(i-1) becomes
        // F_(i-2), and the old F_i becomes F_(i-1)
        secondLast = last;
        last = current;
    }
    return last;
}

Algorithmically, recursion – in the sense of functions calling themselves – is not needed. As a programming construct it doesn’t bring any additional computational power. Recursive functions can even be simulated with a single loop and a stack, by storing the current chain of recursive calls in the stack and processing them one at a time in the loop. This is actually what your computer does behind the scenes.²
The reason behind the strong focus on recursion is another, namely that it is an incredibly powerful mode of thinking. To us, it is unclear how one would find a natural solution to the Dominoes problem without going in with a recursive mindset and looking for that reduction to a smaller instance. That being said, once a recursive formulation has been deduced, an iterative implementation can many times be simpler or faster. For example, try to compute $F_{46}$ using both the recursive and the iterative approach. You’ll see that one of the two versions finishes, and one does not.
Problem 7.3
Varied Amusements – variedamusements
Note: solve the first two subtasks for 2 points.
The Plank – plankan
Note: solve both subtasks for 100 points.

Chapter Exercises

Exercise 7.8. There are 𝑛 lines drawn in the plane, no three lines intersecting in the same point. What is the number of connected regions they split the plane into?

Problem 7.4
3 × 𝑛 Dominoes – 3xndominoes
Note: solve the first subtask for 2 points.
3-close Sets – 3close
Note: solve the first two subtasks for 2 points.
Even A’s, Odd B’s – evenaoddb
Note: solve the first two subtasks for 2 points.

² At a low level, modern processors are essentially a single execution loop with a stack for function-related memory.


Chapter Notes
Recursion as a problem solving technique is a common one both in mathematics and algorithmics. In mathematics, there tends to be a larger focus on finding closed forms for the recursions though, while we are happy with any kind of efficient computation. There is a rich combinatorial theory behind finding such closed forms. As previously mentioned, Concrete Mathematics [?] is one of the mathematics books that really excel at teaching these techniques.

8 Graph Theory
We finish this foundational part with an introduction to graph theory, the study of mathematical objects known as graphs. As a mathematical area of study, it dates back to the early 1700s, when Euler first studied the famous Seven Bridges of Königsberg problem. It is one of the most well-studied areas in algorithmic problem solving, being one of only two topics (together with data structures) to make an appearance in all three parts of this book. In almost every programming contest you can find a problem relating to graphs.

8.1 Graphs
A graph is an abstract way of representing various types of relations, such as roads between cities, friendships between people, network links between computers and so on. Graphs are essentially a set of objects where certain pairs of objects are connected. Formally, graphs are defined in the following way.
Definition 8.1 A simple graph 𝐺 = (𝑉 , 𝐸) consists of a set 𝑉 of vertices, and a set 𝐸 of edges. An edge consists of a pair of vertices {𝑢, 𝑣 } called the endpoints of the edge.
A graph lends itself naturally to a graphical representation, where vertices
are represented by points in the plane with lines drawn between the two
vertices of an edge. For example, the graph given by 𝑉 = {1, 2, 3, 4, 5} and
𝐸 = {{1, 2}, {2, 3}, {3, 4}, {2, 4}} can be drawn as in Figure 8.1.

Figure 8.1: An example graph with 5 vertices and 4 edges.

Exercise 8.1. Draw the graphical representation of the graph with vertices {𝑎, 𝑏, 𝑐, 𝑑 } and edges {{𝑎, 𝑏}, {𝑏, 𝑐}, {𝑐, 𝑑 }, {𝑎, 𝑑 }, {𝑏, 𝑑 }}.

Exercise 8.2. The graph on 𝑛 vertices containing all possible edges is called the
complete graph, or 𝐾𝑛 . How many edges does 𝐾𝑛 have?

Trip Planning
Lars is planning to do a backpacking tour by train throughout 𝑁 cities in Europe.
He has a list of the 𝑀 train lines that go back and forth between pairs of these
cities. He wants to visit the cities in the order 1, 2, . . . , 𝑁 , finally returning back
to his home in city 1.
Since Lars has limited vacation days, he only has time to take exactly 𝑁
direct trains during his trip. Can you determine if this is possible, and tell Lars
which trains to take?
Input
The first line contains the number of cities $N \le 10^6$ that Lars wants to visit, and $M \le 10^6$, the number of direct trains.
The next 𝑀 lines each contain two integers 1 ≤ 𝑎 ≠ 𝑏 ≤ 𝑁 , indicating that
there is a train line traveling between cities 𝑎 and 𝑏. No two train lines will have
the same two integers.
Output
If Lars cannot perform his trip taking only 𝑁 trains, output no trip. Otherwise, output the numbers of the 𝑁 train lines that Lars should take (in order of travel), where the train lines are numbered from 1 to 𝑀 in the order they appear in the input.

Solution. The problem essentially asks if there are direct train lines between
cities (1, 2), (2, 3), . . . , (𝑁 − 1, 𝑁 ), (𝑁 , 1). If there are, we want to find the
indices of these lines in the input. This is a typical problem that can be modelled
as a graph. In those terms, we have a graph on 𝑁 vertices with its 𝑀 edges given
in a list. We are asked if the graph contains a certain list of edges.
A possible solution would be to keep a vector of the indices of these particular edges while we read the list of edges. If we find the edge {𝑘, 𝑘 + 1} for a given 𝑘, we can store the index of the edge in the 𝑘’th position in the vector. Only if we managed to find every edge should we reply with their indices. Otherwise, we would output no trip. 
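A sketch of this solution in C++ could look as follows (we assume the line numbers are printed one per line, and treat the edge {𝑁, 1} as the 𝑁’th position of the vector):

#include <bits/stdc++.h>
using namespace std;

int main() {
    int N, M;
    scanf("%d %d", &N, &M);
    vector<int> lineOf(N + 1, -1); // lineOf[k]: line between k and k+1 ({N, 1} for k = N)
    for (int i = 1; i <= M; i++) {
        int a, b;
        scanf("%d %d", &a, &b);
        if (a > b) swap(a, b);
        if (b == a + 1) lineOf[a] = i;       // the edge {k, k+1}
        if (a == 1 && b == N) lineOf[N] = i; // the edge {N, 1}
    }
    for (int k = 1; k <= N; k++)
        if (lineOf[k] == -1) { printf("no trip\n"); return 0; }
    for (int k = 1; k <= N; k++)
        printf("%d\n", lineOf[k]);
}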


Problem 8.1
Trip Planning – tripplanning

Of particular importance is the set of vertices that are connected to a given vertex by edges.

Definition 8.2 For a graph 𝐺 = (𝑉 , 𝐸), if {𝑢, 𝑣 } ∈ 𝐸, then 𝑢 and 𝑣 are called neighbours.
The set of neighbours of a vertex 𝑣 is denoted 𝑁 (𝑣), called the neighbourhood of 𝑣.
The size of the neighbourhood of a vertex 𝑣 is called the degree of 𝑣. It is denoted deg(𝑣).
The degrees of the vertices in a graph fulfill many useful properties.

Theorem 8.1
The sum of degrees of a graph 𝐺 = (𝑉 , 𝐸) is even. Specifically,
$$\sum_{v \in V} \deg(v) = 2|E|.$$

Proof. We prove the statement by induction. If the graph contains no edges, the degree of each vertex is clearly 0, and |𝐸| = 0, so the equality holds.
Now, assume that it holds for all graphs of 𝑘 edges. If we add a single edge {𝑢, 𝑣 } to this graph, we add one more vertex to the neighbourhoods of 𝑢 and 𝑣. This increases deg(𝑢) and deg(𝑣) by 1, increasing the left hand side by 2. It also increases |𝐸| by 1, so the right hand side is increased by 2 as well, thus the identity holds for all graphs on 𝑘 + 1 edges too.
By the principle of induction, the statement must hold for all graphs. 

Example 8.1 In a graph 𝐺, the degrees of the vertices are 3, 5, 4, 4, 4, 6, 6 respectively. Prove that there is a sequence of neighbouring vertices starting and ending with the vertices of degree 3 and 5.

Solution. Let 𝑆 be the set of vertices 𝑥 for which there is a sequence of neighbouring vertices starting at the vertex of degree 3 and ending at 𝑥. Let 𝐻 be a new graph with 𝑆 as vertices, containing as edges only those that have vertices of 𝑆 as endpoints. Note that 𝑆 must contain every neighbour of every vertex 𝑥 in 𝑆 – if there is a sequence of neighbouring vertices from the degree 3 vertex to 𝑥, such a sequence to a neighbour of 𝑥 can be constructed by appending the neighbour to the sequence of vertices ending at 𝑥. This means that all vertices in 𝐻 have the same degree as they had in 𝐺.
Now, the sum of the degrees of all vertices in 𝑆 is even by Theorem 8.1. Since 𝑆 contains a vertex of degree 3, it must contain another vertex of odd degree (or the sum would be odd). The only such vertex is the one of degree 5. By the definition of 𝑆, this means the requested sequence of neighbouring vertices between the two vertices exists. 

Exercise 8.3. Prove that in a simple graph of at least 2 vertices, there must exist
2 vertices of the same degree.

Problem 8.2
Given a graph, print all the vertices with the highest degrees.

While the simple graph is able to represent many kinds of relations, we sometimes need variations to capture all the information we are interested in. Consider a graph representing roads between cities, where the vertices represent the cities and the edges correspond to the roads between them. In this case, we might be interested in also capturing the lengths of the roads between all the cities as well: a situation depicted in Figure 8.2.

Figure 8.2: A road network on 5 cities.

We can modify our definition of an edge to include such a number, giving us a new type of graph.


Definition 8.3 A weighted graph is a graph (𝑉 , 𝐸) together with a weight function 𝑤 associating each edge 𝑒 ∈ 𝐸 with a real-valued weight 𝑤 (𝑒).

Weighted graphs often appear when there is a natural measure of an edge, such as the distance between two cities, the latency between two computers, or the cost of a train ticket between two airports.
Problem 8.3
Compute the shortest triangle in a weighted graph, where |𝐸| ≤ 100.

Finally, not all relations we model are symmetric in the way indicated by
graphs. In many situations, we would instead prefer if an edge could have a
certain direction, going from a vertex to another vertex. For example, when
modelling all the car roads in a city, certain roads may be one-way, a nuance the
simple graph would miss. We fix this by making edges ordered pairs rather than
sets:
Definition 8.4 A directed graph is a graph (𝑉 , 𝐸) where 𝐸 consists of directed
edges, i.e. ordered pairs 𝑒 = (𝑢, 𝑣) of vertices. The edge 𝑒 is called an
out-edge of 𝑢 and in-edge of 𝑣.
When representing directed graphs graphically, edges will be arrows, with the
arrowhead pointing from 𝑢 to 𝑣 (Figure 8.3).

Figure 8.3: The graph given by 𝑉 = {1, 2, 3, 4} and 𝐸 = { (1, 2), (3, 1), (4, 2), (4, 1) }.

Problem 8.4
Determine if there exists a directed triangle in a graph, where |𝐸| ≤ 100.

8.2 Representing Graphs


When dealing with graphs in algorithms, there are three common data structures
used to represent them: adjacency matrices, adjacency lists and adjacency
maps. Occasionally we also represent graphs implicitly – for example by a
function that takes a vertex 𝑣 and provides us with all edges it’s an endpoint of.


This latter representation is common when dealing with searches in the graph
corresponding to the positions in a game such as chess.
In the following sections, we present the representation of the directed,
unweighted graph in Figure 8.4.

4 3

1 2 5

Figure 8.4: An example graph with 6 verices and 5 edges.

Adjacency Matrices
An adjacency matrix represents a graph 𝐺 = (𝑉 , 𝐸) as a 2D |𝑉 | × |𝑉 | matrix in the following way:

Definition 8.5 If 𝐺 = (𝑉 , 𝐸, 𝑤) is a directed, weighted graph and 𝑉 = {𝑣 1, ..., 𝑣𝑛 }, the graph’s adjacency matrix is the |𝑉 | × |𝑉 |-matrix 𝐴 = ($a_{i,j}$) where $a_{i,j} = w((v_i, v_j))$.
For undirected graphs, we set $a_{i,j} = a_{j,i} = w(\{v_i, v_j\})$.
For unweighted graphs, we set $a_{i,j} = 1$ for each edge.
This representation uses $\Theta(|V|^2)$ memory, and takes 𝑂 (1) time for adding, modifying and removing edges. To iterate through the neighbours of a vertex, you need Θ(|𝑉 |) time, independent of the number of neighbours of the vertex itself.
Adjacency matrices are best to use when $|V|^2 \approx |E|$, i.e. when the graph is dense.
The adjacency matrix for the directed, unweighted graph in Figure 8.4 is:

$$\begin{pmatrix}
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0
\end{pmatrix}$$
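In code, building such a matrix is straightforward; a sketch for a directed, unweighted graph:

#include <bits/stdc++.h>
using namespace std;

int main() {
    int V, E;
    cin >> V >> E;
    vector<vector<int>> A(V + 1, vector<int>(V + 1, 0)); // 1-indexed vertices
    for (int i = 0; i < E; i++) {
        int u, v;
        cin >> u >> v;
        A[u][v] = 1; // for an undirected graph, also set A[v][u] = 1
    }
    // An edge lookup is now a single O(1) array access, e.g. A[1][2].
}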


Adjacency Lists
Another way to represent graphs is by storing lists of neighbours for every vertex. This approach is called adjacency lists. This only requires Θ(|𝐸| + |𝑉 |) memory, which is better when your graph has few edges, i.e. it is sparse. If you use a vector to represent each list of neighbours, you also get Θ(1) addition and removal (if you know the index of the edge and ignore their order) of edges, but it takes 𝑂 (|𝑉 |) time to determine if an edge exists. On the upside, iterating through the neighbours of a vertex takes time proportional to the number of neighbours instead of the number of vertices in the graph. This means that iterating through all the neighbours of all vertices takes time Θ(|𝐸| + |𝑉 |) instead of $\Theta(|V|^2)$ as for the adjacency matrix. For large, sparse graphs this is clearly better!
When representing weighted graphs, the list usually stores the edges as pairs of (neighbour, weight) instead. For undirected graphs, both endpoints of an edge contain the other in their adjacency lists.
This representation is common in many graph search algorithms, to be studied in Chapter 14.
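A sketch of reading a directed, unweighted graph into adjacency lists:

#include <bits/stdc++.h>
using namespace std;

int main() {
    int V, E;
    cin >> V >> E;
    vector<vector<int>> adj(V + 1); // adj[u] holds the out-neighbours of u
    for (int i = 0; i < E; i++) {
        int u, v;
        cin >> u >> v;
        adj[u].push_back(v);
        // For an undirected graph, also do: adj[v].push_back(u);
    }
    // Iterating over all neighbours of all vertices takes Θ(|V| + |E|):
    for (int u = 1; u <= V; u++)
        for (int v : adj[u])
            cout << u << " -> " << v << "\n";
}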

Adjacency Maps
An adjacency map combines the adjacency matrix with the adjacency list to
get the benefits of both the matrix (Θ(1) time to check if an edge exists) and
the lists (low memory usage and fast neighbourhood iteration). Instead of using
lists of neighbours for each vertex, we can use a hash table for each vertex.
This has the same time and memory complexities as the adjacency lists, but
it also allows for checking if an edge is present in Θ(1) time. The downsides are
that hash tables have a higher constant factor than the adjacency list, and that
you lose the ordering you have of your neighbours (if this is important). The
adjacency map also inherits another sometimes important property from the
matrix: you can remove arbitrary edges in Θ(1) time!
This representation is mostly used when one is dynamically modifying a
graph.
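
A small sketch of an adjacency map using the standard library’s hash sets (the vertex count is a placeholder):

#include <iostream>
#include <unordered_set>
#include <vector>
using namespace std;

int main() {
  int V = 5;
  // One hash set of neighbours per vertex.
  vector<unordered_set<int>> adj(V);
  adj[0].insert(1);                 // add the edge (0, 1)
  cout << adj[0].count(1) << endl;  // check if (0, 1) exists: prints 1
  adj[0].erase(1);                  // remove the edge (0, 1)
  cout << adj[0].count(1) << endl;  // prints 0
}

Both the membership check and the removal run in expected constant time, matching the complexities claimed above.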

Exercise 8.4. Given a graph, which representation or representations are suitable if

a) |𝑉 | = 1000 and |𝐸| = 499500

b) |𝑉 | = 10000 and |𝐸| = 20000


c) |𝑉 | = 1000 and |𝐸| = 10000


Exercise 8.5. Implement programs that take unweighted, directed graphs as
input and:
• output the degree of each vertex,
• quickly determine whether a certain edge exists or not, and
• find a triangle – i.e. a cycle of length 3 – in the graph.

8.3 Breadth-First Search


A path is a sequence of distinct vertices 𝑝 0 , 𝑝 1 , ..., 𝑝𝑙−1 , 𝑝𝑙 such that {𝑝𝑖 , 𝑝𝑖+1 } ∈ 𝐸 (𝑖 = 0, 1, ..., 𝑙 − 1). This means any two consecutive vertices on a path must be connected by an edge. We say that this path has length 𝑙, since it consists of 𝑙 edges. In Figure 8.3, the sequence 3, 1, 4, 2 is a path of length 3.
One of the most common basic graph algorithms is the breadth-first search.
It is used to find the distances from a certain vertex in an unweighted graph.

Single-Source Shortest Path, Unweighted Edges


Given an unweighted graph 𝐺 = (𝑉 , 𝐸) and a source vertex 𝑠, compute the
shortest distances 𝑑 (𝑠, 𝑣) for all 𝑣 ∈ 𝑉 .
For simplicity, we first consider the problem on a grid graph, where the unit squares constitute vertices, and squares which share a side are connected by an edge. Additionally, some squares are blocked (and don’t have a corresponding vertex).
An example can be seen in Figure 8.5.

Figure 8.5: An example grid graph, with source marked 𝑠.

Let us solve this problem inductively. First of all, what vertices have distance
0? Clearly, this is only the source vertex 𝑠 itself. This seems like a reasonable

138
8.3. B READTH -F IRST S EARCH

base case, since the problem is about shortest paths from 𝑠. Then, what vertices
have distance 1? These are exactly those with a path consisting of a single edge
from 𝑠, meaning they are the neighbors of 𝑠 (marked in Figure 8.6).

Figure 8.6: The squares with distance 1 from the source.

If a vertex 𝑣 has distance 2, it must be a neighbor of a vertex 𝑢 with distance 1 (and not be the source or one of its neighbors). This is also a sufficient condition, since we can construct a path of length 2 simply by extending the path of a neighbor 𝑢 with distance 1 with the edge (𝑢, 𝑣).

Figure 8.7: The squares with distance 2, 3 and 4.

In fact, this reasoning generalizes to any particular distance: the vertices with distance exactly 𝑘 are those that have a neighbor of distance 𝑘 − 1 but have not already been assigned a smaller distance themselves. Using this, we can construct an algorithm to solve the problem. Initially, we set the distance of 𝑠 to 0. Then, for every dist = 1, 2, . . . , we mark all unmarked vertices that have a neighbor with distance dist − 1 as having distance dist. This algorithm is called the breadth-first search.
Exercise 8.6. Use the BFS algorithm to compute the distance to every square in
the following grid:


A simple implementation of this would be to iteratively construct the lists of the vertices which have distance 0, 1, . . . , and so on.

1: procedure BreadthFirstSearch(vertices 𝑉 , vertex 𝑠)
2:     distances ← new int[|𝑉 |]
3:     fill distances with ∞
4:     curDist ← 0
5:     curVertices ← new vector
6:     curVertices.add(𝑠)
7:     distances[𝑠] ← curDist
8:     while curVertices ≠ ∅ do
9:         nextVertices ← new vector
10:         for from ∈ curVertices do
11:             for 𝑣 ∈ from.neighbours do
12:                 if distances[𝑣] = ∞ then
13:                     nextVertices.add(𝑣)
14:                     distances[𝑣] ← curDist + 1
15:         curDist ← curDist + 1
16:         curVertices ← nextVertices
17:     return distances

Each vertex is added to nextVertices at most once, since it is only pushed if distances[𝑣] = ∞, whereupon it is immediately set to something else. We then iterate through every neighbor of all these vertices. In total, the number of all neighbours is 2𝐸 (each edge is counted from both of its endpoints), so the algorithm in total uses Θ(𝑉 + 𝐸) time.
The outer loop is often coded in another way. Instead of maintaining two separate vectors, we can merge them into a single queue:

1: while curVertices ≠ ∅ do
2:     from ← curVertices.front()
3:     curVertices.pop()
4:     for 𝑣 ∈ from.neighbours do
5:         if distances[𝑣] = ∞ then
6:             curVertices.add(𝑣)
7:             distances[𝑣] ← distances[from] + 1

The order in which vertices are visited is equivalent to that of the original version.
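
Translated into C++, the queue-based version might look like the following sketch, where the graph is assumed to be given as adjacency lists and −1 plays the role of ∞:

#include <queue>
#include <vector>
using namespace std;

// Distances from s to every vertex in an unweighted graph;
// unreachable vertices keep the distance -1.
vector<int> bfs(int s, const vector<vector<int>>& adj) {
  vector<int> distances(adj.size(), -1);
  queue<int> q;
  distances[s] = 0;
  q.push(s);
  while (!q.empty()) {
    int from = q.front();
    q.pop();
    for (int v : adj[from]) {
      if (distances[v] == -1) {
        distances[v] = distances[from] + 1;
        q.push(v);
      }
    }
  }
  return distances;
}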


Exercise 8.7. Prove that the shorter, queue-based way of coding the BFS loop is equivalent to the longer version with two vectors.
Exercise 8.8. Implement the BFS algorithm.
In many problems the task is to find a shortest path between some pair of
vertices where the graph is given implicitly.

8-puzzle
In the 8-puzzle, 8 tiles are arranged in a 3 × 3 grid, with one square left empty.
A move in the puzzle consists of sliding a tile into the empty square. The goal
of the puzzle is to perform some moves to reach the target configuration. The
target configuration has the empty square in the bottom right corner, with the
numbers in order 1, 2, 3, 4, 5, 6, 7, 8 on the three rows.

Figure 8.8: An example 8-puzzle, with a valid move. The rightmost puzzle shows the target configuration.

Given a puzzle, determine how many moves are required to solve it, or if it
cannot be solved.
This is a typical BFS problem, characterized by a starting state (the initial
puzzle), some transitions (the moves we can make), and the task of finding a
short sequence of transitions to some goal state. We can model this kind of
problem using a graph. The vertices represent the possible arrangements of

141
C HAPTER 8. G RAPH T HEORY

the tiles in the grid, and an edge connects two states if they differ by a single
move. A sequence of moves from the starting state to the target configuration
then represents a path in this graph. The minimum number of moves required is
the same as the distance between those vertices in the graph, meaning we can
use a BFS.
In such a problem, most of the code usually deals with the representation of a state as a vertex, and generating the edges that a certain vertex is adjacent to.
When an implicit graph is given, we generally do not compute the entire graph
explicitly. Instead, we use the states from the problems as-is, and generate the
edges of a vertex only when it is being visited in the breadth-first search. In the
8-puzzle, we can represent each state as a 3 × 3 2D-vector. The difficult part is
generating all the states that we can reach from a certain state.

Generating 8-puzzle Moves


typedef vector<vi> Puzzle;

vector<Puzzle> edges(const Puzzle& v) {
  int emptyRow, emptyCol;
  rep(row,0,3)
    rep(col,0,3)
      if (v[row][col] == 0) {
        emptyRow = row;
        emptyCol = col;
      }
  vector<Puzzle> possibleMoves;
  auto makeMove = [&](int rowMove, int colMove) {
    int newRow = emptyRow + rowMove;
    int newCol = emptyCol + colMove;
    if (newRow >= 0 && newCol >= 0 && newRow < 3 && newCol < 3) {
      Puzzle newPuzzle = v;
      swap(newPuzzle[emptyRow][emptyCol], newPuzzle[newRow][newCol]);
      possibleMoves.push_back(newPuzzle);
    }
  };
  makeMove(-1, 0);
  makeMove(1, 0);
  makeMove(0, -1);
  makeMove(0, 1);
  return possibleMoves;
}

With the edge generation in hand, the rest of the solution is a normal BFS,
slightly modified to account for the fact that our vertices are no longer numbered
0, . . . , 𝑉 − 1. We can solve this by using e.g. maps instead.


8-puzzle BFS
int puzzle(const Puzzle& S, const Puzzle& target) {
  map<Puzzle, int> distances;
  distances[S] = 0;
  queue<Puzzle> q;
  q.push(S);
  while (!q.empty()) {
    // Copy the front element; a reference would dangle after q.pop().
    Puzzle cur = q.front(); q.pop();
    int dist = distances[cur];
    if (cur == target) return dist;
    for (const Puzzle& move : edges(cur)) {
      if (distances.find(move) != distances.end()) continue;
      distances[move] = dist + 1;
      q.push(move);
    }
  }
  return -1;
}

Besides these kinds of search problems that can be solved using a BFS, some problems require modifications of a BFS, or use the distances generated only as an intermediary result.

Shortest Cycle
Compute the length of the shortest simple cycle in a graph.

Problem 8.5
Button Bashing – buttonbashing

8.4 Depth-First Search


If a path additionally satisfies {𝑝 0, 𝑝𝑙 } ∈ 𝐸, we may append this edge to make the path cyclical. This is called a cycle. Similarly, a walk which starts and ends at the same vertex is called a closed walk. If a trail starts and ends at the same vertex, we call it a closed trail.
A graph where any pair of vertices have a path between them is called a connected graph. The (maximal) subsets of a graph which are connected form the connected components of the graph. In Figure ??, the graph consists of two components, {1, 2, 3, 4} and {5}.
The depth-first search is an analogue to the breadth-first search that visits
vertices in another order. Similarly to how the BFS grows the set of visited
vertices using a wide frontier around the source vertex, the depth-first search

143
C HAPTER 8. G RAPH T HEORY

proceeds its search by, at every step, trying to plunge deeper into the graph.
This order is called the depth-first order. More precisely, the search starts at
some source vertex 𝑠. Then, any neighbor of 𝑠 is chosen to be the next vertex 𝑣.
Before visiting any other neighbor of 𝑠, we first visit any of the neighbours of 𝑣,
and so on.
Implementing the depth-first search is usually done with a recursive function,
using a vector seen to keep track of visited vertices:

1: procedure Dfs(vertex at, adjacency list 𝐺)
2:     if seen[at] then
3:         return
4:     seen[at] ← true
5:     for neighbour ∈ 𝐺 [at] do
6:         Dfs(neighbour, 𝐺)

In languages with limited stack space, it is possible to implement the DFS iteratively using a stack instead, keeping the vertices which are currently open in it.
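
A direct C++ translation of the recursive pseudo code might look like this sketch, assuming the graph is given as adjacency lists and that seen is sized to the number of vertices before the first call:

#include <vector>
using namespace std;

vector<bool> seen; // resize with seen.assign(V, false) before the first call

// Visit everything reachable from at, marking vertices as seen.
void dfs(int at, const vector<vector<int>>& adj) {
  if (seen[at]) return;
  seen[at] = true;
  for (int neighbour : adj[at])
    dfs(neighbour, adj);
}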
Due to the simplicity of coding the DFS compared to a BFS, it is usually the algorithm of choice in problems where we want to visit all the vertices.

Coast Length
KTH Challenge 2011 – Ulf Lundström
The residents of Soteholm value their coast highly and therefore want to maximize
its total length. For them to be able to make an informed decision on their
position in the issue of global warming, you have to help them find out whether
their coastal line will shrink or expand if the sea level rises. From height maps
they have figured out what parts of their islands will be covered by water, under
the different scenarios described in the latest IPCC report on climate change,
but they need your help to calculate the length of the coastal lines.


Figure 8.9: Gray squares are land and white squares are water. The thick black line is the sea
coast.

You will be given a map of Soteholm as an 𝑁 × 𝑀 grid. Each square in the grid has a side length of 1 km and is either water or land. Your goal is to
compute the total length of sea coast of all islands. Sea coast is all borders
between land and sea, and sea is any water connected to an edge of the map
only through water. Two squares are connected if they share an edge. You may
assume that the map is surrounded by sea. Lakes and islands in lakes do not contribute to the sea coast.

Solution. We can consider the grid as a graph, where all the water squares are vertices, and two squares have an edge between them if they share an edge. If we surround the entire grid with water tiles (a useful trick to avoid special cases in this kind of grid problem), the sea consists exactly of those vertices that are connected to these surrounding water tiles. This means we need to compute the vertices which lie in the same connected component as the sea – a typical DFS task1. After computing this component, we can determine the coast length by looking at all the squares which belong to the sea. If such a square shares an edge with a land tile, that edge contributes 1 km to the coast length.
const vpi moves = {pii(-1, 0), pii(1, 0), pii(0, -1), pii(0, 1)};

int coastLength(const vector<vector<bool>>& G) {
  int H = sz(G) + 4;
  int W = sz(G[0]) + 4;
  // Pad the map with a border of water (true = water), two squares thick.
  vector<vector<bool>> G2(H, vector<bool>(W, true));
  rep(i,0,sz(G)) rep(j,0,sz(G[i])) G2[i+2][j+2] = G[i][j];
  vector<vector<bool>> sea(H, vector<bool>(W));

  // Flood fill the sea, i.e. the water connected to the border.
  function<void(int, int)> floodFill = [&](int row, int col) {
    if (row < 0 || row >= H || col < 0 || col >= W) return;
    if (!G2[row][col]) return; // Land is not part of the sea.
    if (sea[row][col]) return;
    sea[row][col] = true;
    trav(move, moves) floodFill(row + move.first, col + move.second);
  };
  floodFill(0, 0);

  // Every edge between a sea square and a land square is 1 km of coast.
  int coast = 0;
  rep(i,1,H-1) rep(j,1,W-1) {
    if (!sea[i][j]) continue;
    trav(move, moves)
      if (!G2[i + move.first][j + move.second]) coast++;
  }
  return coast;
}

1This particular application of DFS, i.e. computing a connected area in a 2D grid, is called a flood fill.

Problem 8.6
Mårten’s DFS – martensdfs

8.5 Trees
A tree is a special kind of graph – a connected graph which does not contain any cycle. The graph in Figure 8.3 is not a tree, since it contains the cycle 1, 2, 4, 1. The graph in Figure 8.10 on the other hand, contains no cycle.

Figure 8.10: The tree given by 𝑉 = {1, 2, 3, 4} and 𝐸 = { {1, 2}, {3, 1}, {4, 1} }.

Exercise 8.9. Prove that a tree of 𝑛 vertices has exactly 𝑛 − 1 edges.

Chapter Exercises
Given a graph, determine how many edges must be added to make it regular.

Chapter Notes

Part II

Basics

9 Brute Force
Many problems are solved by testing a large number of possibilities. For example,
chess engines work by testing countless variations of moves and choosing the
ones resulting in the “best” positions. This approach is called brute force. Brute
force algorithms exploit that computers are fast, resulting in you having to be less
smart. Just as with chess engines, brute force solutions might still require some
ingenuity. A brute force problem might have a simple algorithm which requires
a computer to evaluate 2⁴⁰ options, while some deeper analysis might be able to reduce this to 2²⁰. This would be a huge reduction in running time. Different
approaches to brute force may be the key factor in reaching the latter case instead
of the former. In this chapter, we look at four different techniques used to solve
brute force problems, ranging from the simplicity of just evaluating every single
option to an advanced memory-time tradeoff called meet-in-the-middle.

9.1 Optimization Problems


In an optimization problem, we have some solution set 𝑆 and a value function 𝑓 . The goal is to find an 𝑥 ∈ 𝑆 which maximizes 𝑓 (𝑥), i.e., optimizing the function.
Optimization problems constitute a large class of the problems that we solve
in algorithmic problem solving, such as the Max Clique problem and the Buying
Books problems we examine in this chapter. One of the most famous optimization
problems is the NP-complete Travelling Salesman Problem. The problem asks
for the shortest cycle that visits all vertices of a weighted graph. The practical
applications of this problem are many. A logistics company that must perform
a number of deliveries probably wants to minimize the distance traveled when
visiting all the points of delivery. When planning your backpacking vacation,
you may prefer to minimize the cost of travelling between all your destinations.
In this problem, the solution set 𝑆 would consist of all cycles in the graph that
visit all the vertices, with 𝑓 (𝑥) being the sum of all edges in the cycle 𝑥.
The brute force technique essentially consists of evaluating 𝑓 (𝑥) for a large
number (sometimes even all) of 𝑥 ∈ 𝑆. For large 𝑆, this is slow.


The focus of this chapter and the chapters on Greedy Algorithms (Chapter 10)
and Dynamic Programming (Chapter 11) is to develop techniques that exploit
particular structures of optimization problems to avoid evaluating the entire set
𝑆.

9.2 Generate and Test


Our first brute force method is the generate and test method. This particular
brute force strategy consists of generating solutions – naively constructing
candidate solutions to a problem – and then testing them – evaluating the value
function of them and at the same time removing invalid solutions. It is applicable
whenever the number of candidate solutions is quite small.

Max Clique
In a graph, a subset of the vertices form a clique if each pair of vertices is
connected by an edge.

Figure 9.1: A graph with a clique of size 4 highlighted.

Given a graph on 𝑉 vertices and 𝐸 edges, determine the size of the largest
clique.

This problem is one of the so-called NP-complete problems we mentioned in Chapter 5. Thus, a polynomial-time solution is currently out of reach. We solve the problem for 𝑉 ≤ 15.
Is a generate and test approach suitable? To find out, we must first define what our candidate solutions are. In this problem, only one object comes naturally: subsets of vertices. For every such subset candidate, we must test


whether it is a clique. If it is, its size must be computed and the largest clique chosen.
In the Max Clique problem, there are only 2^𝑉 subsets of vertices (and candidate solutions) – a quite small number. Given such a set, we can verify whether it is a clique in 𝑂 (𝑉 ²) time by checking if every pair of vertices in the candidate set has an edge between them. To perform this check in Θ(1) time, we keep a 2D vector 𝑎𝑑 𝑗 such that 𝑎𝑑 𝑗 [𝑖] [ 𝑗] is true if and only if vertices 𝑖 and 𝑗 are adjacent to each other. This gives us a total complexity of Θ(2^𝑉 · 𝑉 ²) in the worst case. According to our table of approximate allowed input sizes for various complexities (p. 88), this should be fast enough for 𝑉 = 15.
Max Clique
int V, E;
cin >> V >> E;
vector<vector<bool>> adj(V, vector<bool>(V));
rep(i,0,E) {
  int a, b;
  cin >> a >> b;
  adj[a][b] = adj[b][a] = true;
}
rep(i,0,V) adj[i][i] = true;

int ans = 0;
rep(subset,0,1<<V) {
  bool isClique = true;
  rep(i,0,V) {
    // Skip if the subset does not contain i
    if ((subset & (1 << i)) == 0) continue;
    rep(j,0,V) {
      // Skip if the subset does not contain j
      if ((subset & (1 << j)) == 0) continue;
      if (!adj[i][j]) {
        // Subset contained both i and j, which are not neighbors.
        isClique = false;
      }
    }
  }
  if (isClique) {
    ans = max(ans, __builtin_popcount(subset));
  }
}
cout << ans << endl;

Note the nifty use of integers interpreted as bitsets to easily iterate over every possible subset of a 𝑉 -element set, a common technique in generate and test solutions based on subsets.


Problem 9.1
Max Clique – maxclique

This kind of brute force problem is often easy to spot. There will be a very
small input limit on the parameter you are to brute force over. The solution will
often be subsets of some larger base set (such as the vertices of a graph), or
combinations of several small sets.

Exercise 9.1. The solution above can be made to run in Θ(2^𝑁 · 𝑁 ) by using bitsets to represent the neighbourhoods. Figure out how to do this.

Problem 9.2
4 thought – 4thought
Lifting Walls – walls

Let us look at another example of this technique, where the answer is not
just a subset.

The Clock
Swedish Olympiad in Informatics 2004, School Qualifiers (CC BY-SA 3.0)
When someone asks you what time it is, most people respond “a quarter past
five”, “15:29” or something similar. If you want to make things a bit harder, you
can answer with the angle from the minute hand to the hour hand, since this
uniquely determines the time. However, most people are not used to this way of
specifying the time, so it would be nice to have a program which translates this
to a more common format.


Figure 9.2: The angle between the hands at time :30.

We assume that our clock has no seconds hand, and only displays the time at whole minutes (i.e., both hands only move forward once a minute). The angle is determined by starting at the hour hand and measuring the number of degrees clockwise to the minute hand. To avoid decimals, this angle is specified in tenths of a degree.
Input
The first and only line of input contains a single integer 0 ≤ 𝐴 < 3600, the angle
specified in tenths of a degree.
Output
Output the time in the format hh:mm between 00:00 and 11:59.
It is difficult to come up with a formula that gives the correct times as a function
of the angles between the hands on a clock. Instead, we can turn the problem
around. If we know what the time is, can we compute the angle between the two
hands of the clock?
Assume that the time is currently ℎ hours and 𝑚 minutes. The minute hand is then at angle (360/60) · 𝑚 = 6𝑚 degrees clockwise from straight up. Similarly, the hour hand moves (360/12) · ℎ = 30ℎ degrees clockwise after ℎ whole hours, with an extra (360/12) · (𝑚/60) = 0.5𝑚 degrees added due to the minutes. While computing the current time directly from the angle is difficult, computing the angle from the current time is easy.
The brute force solution is to test the 60 · 12 = 720 different times, and pick the one which matches the given angle:

1: procedure Clock(𝐴)
2:     for ℎ ← 0 to 11 do
3:         for 𝑚 ← 0 to 59 do
4:             hourAng ← 300ℎ + 5𝑚 ⊲ Angles in tenths of degrees, to avoid storing halves of degrees
5:             minuteAng ← 60𝑚
6:             angBetween ← (minuteAng − hourAng + 3600) mod 3600
7:             if angBetween = 𝐴 then
8:                 return ℎ:𝑚
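
In C++, the same brute force could be written as the following sketch (the zero-padded hh:mm output follows the problem statement):

#include <iomanip>
#include <iostream>
using namespace std;

int main() {
  int A;
  cin >> A;
  for (int h = 0; h < 12; h++) {
    for (int m = 0; m < 60; m++) {
      // All angles in tenths of a degree.
      int hourAng = 300 * h + 5 * m;
      int minuteAng = 60 * m;
      int angBetween = ((minuteAng - hourAng) % 3600 + 3600) % 3600;
      if (angBetween == A)
        cout << setfill('0') << setw(2) << h << ":" << setw(2) << m << endl;
    }
  }
}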

Exercise 9.2. Can there be two times that produce the same angle? If yes,
produce such an example. If no, prove that there are no two such times.

Competitive Tip
Competitions sometimes pose problems which are solvable quite fast, but where a brute force algorithm suffices. Code the simplest correct solution that is fast enough, even if you see a faster one.

Problem 9.3
The Clock – theclock
All about that base – allaboutthatbase
Perket – perket

9.3 Backtracking
Backtracking is a variation of the generate and test method. It can be much faster than a generate and test solution, but it is not always applicable and is sometimes more difficult to code (in particular when the solutions are subsets).
Consider our solution to the Max Clique problem. In our solution, we generated all the candidate solutions (i.e., subsets of vertices) by using bitsets. When solving problems where the candidate solutions are other objects than subsets, or the number of subsets is too large to iterate through, we need to construct the solutions in another way. Generally, we do this recursively.
When generating subsets, we would go through every element one at a time,
deciding whether to include it or not in a recursive fashion. Backtracking extends


Figure 9.3: Recursively backtracking over subsets.

generate and test to testing not only the complete candidate solutions, but also these partial candidates. Once a partial candidate is identified as being infeasible – for the clique example, once we include two non-neighbouring vertices – the backtracking can stop early. If there are much fewer valid partial candidates than total candidates checked by a generate and test approach, this saves time.
In Figure 9.3 an example of this approach is demonstrated. The example
illustrates the beginning of a backtracking recursion that generates subsets.
Subsets are recursively created by either including or excluding every element, one at a time. One subset, {1, 2}, was identified as not being permissible – for example, because the vertices it represents in a clique problem were not neighbours. Thus, no further backtracking was performed.
As a concrete example, consider a variant of the clique problem, where we
are interested in computing the number of cliques with at most 6 vertices. In
this case, we can solve the problem for larger instances than what generating
all possible subsets and testing them allows us to. The trick is that while there
are 2^𝑉 subsets of 𝑉 vertices, there are only 𝑂 (𝑉 ⁷) subsets containing up to 6
vertices (see Chapter 17 on why this is). Thus, if we can restrict ourselves to
never generating any other kind of subset, the solution ought to be much faster.
This generation is easily implemented using backtracking:


6-cliques, Recursive Variant


// Recursively construct cliques of at most 6 vertices. We keep track
// of what vertices are included so far in the bitset included, and
// have yet to make a decision of whether vertices [0, 1, ..., at]
// should be included or not.
int clique6(int at, bitset<40> included,
            const vector<bitset<40>>& neighbours) {
  if (at < 0) return 1;
  int answer = 0;
  // Case 1: Not including the current vertex.
  answer += clique6(at - 1, included, neighbours);
  // Case 2: All the previously included vertices are neighbours,
  // and we have only chosen at most 5 vertices previously.
  if ((included & neighbours[at]) == included
      && included.count() <= 5) {
    included.set(at);
    answer += clique6(at - 1, included, neighbours);
  }
  return answer;
}

int main() {
  int V, E;
  cin >> V >> E;
  // Bitsets containing all the neighbours of all vertices
  vector<bitset<40>> neighbours(V);
  rep(i,0,E) {
    int A, B;
    cin >> A >> B;
    neighbours[A].set(B);
    neighbours[B].set(A);
  }
  cout << clique6(V - 1, bitset<40>(), neighbours) << endl;
}

How fast is this solution? Analysis is slightly tricky, but we can give an upper bound that is small enough. If we have about 40 vertices, there are about 7.6 · 10⁵ subsets with at most 5 vertices. Since no subset requires more than 40 recursive calls to construct it (one for each vertex in the set), constructing these subsets requires no more than about 3.0 · 10⁷ recursive calls. Each subset of 6 elements is constructed from one of these subsets and results in only a single additional recursive call, so in total the function is invoked at most 6.0 · 10⁷ times. The function performs only a few constant-time operations, so this should be fine. Compare this with a non-backtracking generate and test solution, which would need to construct about 1.1 · 10¹² subsets instead – clearly too much.
Backtracking in principle works if we can:

156
9.3. B ACKTRACKING

• construct candidate solutions recursively,
• quickly determine whether a partial candidate solution can possibly be completed to a candidate solution, and
• keep the number of valid partial solutions sufficiently small.

Problem 9.4
6-cliques – 6clique
Class Picture – classpicture
Boggle – boggle
Geppetto – geppetto
Map Colouring – mapcolouring
Picking Apples – apples

As a general principle, backtracking seems simple enough. Some backtracking solutions require a bit more ingenuity, as in the next problem.

Basin City Surveillance


Nordic Collegiate Programming Contest 2014
Authors: Pål G. Drange and Markus S. Dregi (CC BY-SA 3.0))
Basin City is known for her incredibly high crime rates. The police see no
option but to tighten security. They want to install traffic drones at different
intersections to observe who’s running on a red light. If a car runs a red light,
the drone will chase and stop the car to give the driver an appropriate ticket.
The drones are quite stupid, however, and a drone will stop before it comes to
the next intersection as it might otherwise lose its way home, its home being
the traffic light to which it is assigned. The drones are not able to detect the
presence of other drones, so the police’s R&D department found out that if a
drone was placed at some intersection, then it was best not to put any drones at
any of the neighbouring intersections. As is usual in many cities, there are no
intersections in Basin City with more than four other neighbouring intersections.
The drones are government funded, so the police force would like to buy as
many drones as they are allowed to. Being the programmer-go-to for the Basin
City Police Department, they ask you to decide, for a given number of drones,
whether it is feasible to position exactly this number of drones.
Input


The first line contains an integer 𝑘 (0 ≤ 𝑘 ≤ 15), giving the number of drones
to position. Then follows one line with 1 ≤ 𝑛 ≤ 100 000, the total number
of intersections in Basin City. Finally follow 𝑛 lines describing consecutive
intersections. The 𝑖’th line describes the 𝑖’th intersection in the following
format: The line starts with one integer 𝑑 (0 ≤ 𝑑 ≤ 4) describing the number
of intersections neighbouring the 𝑖’th one. Then follow 𝑑 integers denoting
the indices of these neighbouring intersections. They will be all distinct and
different from 𝑖. The intersections are numbered from 1 to 𝑛.
Output
If it is possible to position 𝑘 drones such that no two neighbouring intersections
have been assigned a drone, output a single line containing possible. Otherwise,
output a single line containing impossible.
At first glance, it is not even obvious whether the problem is a brute force problem, or if some smarter principle should be applied. After all, 100 000 intersections is a huge number of intersections! We can make the problem a bit more reasonable with our first insight. If we have a large number of intersections, and every intersection is adjacent to very few other intersections, it is probably very easy to place the drones at appropriate intersections. To formalize this insight, consider what happens when we place a drone at an intersection.

Figure 9.4: The intersections affected by placing a drone at an intersection.

By placing a drone at the intersection marked in black in Figure 9.4, at most five intersections are affected – the intersection we placed the drone at, along with its neighbouring intersections marked in gray. If we would remove these

five intersections, we would be left with a new city where we need to place 𝑘 − 1 drones. This simple fact – which is the basis of a recursive solution to the problem – tells us that if we have 𝑁 ≥ 5𝑘 − 4 intersections, we immediately know the answer is possible. The −4 term comes from the fact that when placing the final drone, we no longer care about removing its neighbourhood, since no further placements will take place.
Therefore, we can assume that the number of intersections is less than 5 · 15 − 4 = 71, i.e., 𝑛 ≤ 70. This certainly makes the problem seem much more tractable. Now, let us start developing solutions to the problem.
First of all, we can attempt to use the same algorithm as we used for the
Max Clique problem. We could recursively construct the set of our 𝑘 drones
by, for each intersection, trying to either place a drone there or not. If placing a
drone at an intersection, we would forbid placing drones at any neighbouring
intersection.
Unfortunately, this means that we test every intersection when placing a certain drone somewhere. This would give us a complexity of 𝑂 (𝑛^𝑘 ). More specifically, the execution time 𝑇 (𝑛, 𝑘) would satisfy 𝑇 (𝑛, 𝑘) ≈ 𝑇 (𝑛 − 1, 𝑘) + 𝑇 (𝑛 − 1, 𝑘 − 1), which implies 𝑇 (𝑛, 𝑘) ≈ 𝑛^𝑘 = Ω(𝑛^𝑘 ) (see Section ?? for more


details). For 𝑛 = 70, 𝑘 = 15, this is too high. The values of 𝑛 and 𝑘 do suggest that an exponential complexity is in order, just not of this kind. Instead, something similar to 𝑂 (𝑐^𝑘 ) where 𝑐 is a small constant would be a better fit. One way of achieving such a complexity would be to limit the number of intersections we must test to place a drone at before trying one that definitely works. If we could manage to test only 𝑐 such intersections, we would get a complexity of 𝑂 (𝑐^𝑘 ).


Competitive Tip

In this problem, we tried to use the sizes of the parameters 𝑛 and 𝑘 together with the time limit to guide the kind of solution we need to design. While this works most of the time, note that this can sometimes be severely misleading – as this problem was, before we realized that having 100 000 intersections was a red herring.

The trick, yet again, comes from Figure 9.4. Assume that we choose to include the black intersection in our solution, but still cannot construct a solution. The only reason this can happen is (aside from bad previous choices) that no optimal solution includes this intersection.
intersection from being included in an optimal solution? It must be because
one of its gray neighbours is included in every optimal solution. If this was not
the case, then we could just pick an optimal solution where none of the gray
intersections were included and instead include the black vertex. Fortunately for
us, this gives us just what we need to improve our algorithm – either a given
intersection, or one of its neighbours, must be included in any optimal solution.
We have accomplished our goal of reducing the number of intersections to
test for each drone to a mere 5, which will give us a complexity of about 𝑂 (5^𝑘 ) (possibly with an additional polynomial factor in 𝑛 depending on implementation). This is still too much, unless, as the jury noted, some “clever heuristics” are applied. Fortunately for us, applying a common principle will speed things up dramatically (even giving us a better time complexity).
Our next trick is to assume that the graph we are working with is connected.
In many problems the connected components of the graph are all independent of
each other. This is also the case in this problem. Clearly placing drones in one
component does not affect how we place drones in any other component, so we
can solve them separately by computing the maximal number of drones we can
place in every such set, until we have processed enough sets to place 𝑘 drones.
How does the connectedness help us? Consider what happens when we have
placed our first drone on the black intersection as in Figure 9.4. By removing
it and the gray neighbours, the white intersections must now have at most 3
neighbours instead. Recursing on one of the white intersections would then
leave us with only 4 choices, placing a drone on either the white vertex or one
of its (at most 3) neighbours. In fact, we can extend this reasoning to show
that there must always be an intersection with at most 3 neighbours! Proving
this is a straightforward proof by contradiction. Assume that we in a connected
graph have placed at least one drone, but all remaining intersections have four


neighbours. Then, none of these intersections can be neighbours with any vertex we have removed so far. This means that the set of intersections removed and the set of intersections remaining are actually disconnected, contrary to our assumption of connectedness. Taking this insight to its conclusion, we achieve a complexity of 𝑂 (4^𝑘 ) by always branching on the intersection with the fewest neighbours.
While such an algorithm is significantly faster than 𝑂 (5^𝑘 ), further improvements are possible. Again, let us consider under what circumstances a certain intersection is excluded from any optimal solution. We have already concluded that if this is the case, then one of its neighbours must be included in any optimal solution. Can it ever be the case that only one of its neighbours is included in an optimal solution, as in Figure 9.5, where we chose to place a drone on the black vertex but none of the white intersections?

Figure 9.5: Placing a drone at a single neighbour of an intersection.

This is actually never the case. We can always move the drone from the black intersection to its white neighbour, since none of the other white intersections contain a drone. Now, we are basically done; for any intersection, there will either be an optimal solution including it, or (at least) two of its neighbours. Since an intersection has at most 4 neighbours, it has at most 6 pairs of neighbours. This means our recursion will take time 𝑇 (𝑘) = 𝑇 (𝑘 − 1) + 6𝑇 (𝑘 − 2) in the worst case. This recurrence has the solution 3^𝑘 , since 3^(𝑘−1) + 6 · 3^(𝑘−2) = 3^(𝑘−1) + 2 · 3^(𝑘−1) = 3 · 3^(𝑘−1) = 3^𝑘 . A final improvement would be to combine this insight with the independence of the connected subsets of intersections. The second term of the time recurrence would then be a 3


instead of a 6 (as 3 neighbours make 3 pairs). Solving this recurrence would give us the complexity 𝑂 (2.31^𝑘 ) instead.
Problem 9.5
Basin City – basincity

The general version of this problem (without the bounded degree) is called
Independent Set, and is also one of the NP-complete problems.
So, what is the take-away regarding backtracking? Start by finding a way to
construct candidate solutions iteratively. Then, try to integrate the process of
testing the validity of a complete solution with the iterative construction, in the
hope of significantly reducing the number of candidate solutions which need
evaluating. Finally, we might need to use some additional insights, such as what
to branch on (which can be something complicated like the neighborhood of a
vertex), deciding whether to backtrack or not (i.e., improving the testing part) or
reducing the number of branches necessary (speeding up the generation part).
Problem 9.6
Domino – domino
Fruit Baskets – fruitbaskets
Infiltration – infiltration
Vase Collection – vase

9.4 Fixing Parameters


The parameter fixing technique is also similar to the generate and test method, but it is not used to test the solution set itself. Instead, you perform brute force to fix some parameter of the problem by trying all the possible values it can assume.
Hopefully, fixing the correct parameters allows you to solve the remaining
problem easily. The intuition is that while any one choice of parameter may be
wrong, testing every choice allows us to assume that we at some point used the
correct parameter.
The path to a solution usually starts in the other end. Let us illustrate the
technique with a problem.


Buying Books
Swedish Olympiad in Informatics 2010, Finals
You are going to buy 𝑁 books, and are currently checking the different 𝑀
internet book stores for prices. Each book is sold by at least one book store,
and can vary in price between the different stores. Furthermore, each book store incurs a postage fee if you order from it. Postage may vary between the
various book stores, but it is always the same for a book store no matter how
many books you decide to order. You may order any number of books from any
number of the book stores. Compute the smallest amount of money you need to
pay for all the books.
Input
The first line contains two integers 1 ≤ 𝑁 ≤ 100 – the number of books, and
1 ≤ 𝑀 ≤ 15 – the number of book stores.
Then, 𝑀 descriptions of the book stores follow. The description of the 𝑖’th
store starts with a line containing two integers 0 ≤ 𝑃𝑖 ≤ 1 000 (the postage for
this book store), and 1 ≤ 𝐿𝑖 ≤ 𝑁 (the number of books this store sells). The
next 𝐿𝑖 lines contain the books sold by the store. The 𝑗’th book is described by
two integers 0 ≤ 𝐵𝑖,𝑗 < 𝑁 – the (zero-indexed) number of a book being sold
here, and 1 ≤ 𝐶𝑖,𝑗 ≤ 1000 – the price of this book at the current book store.
Output
Output a single integer – the smallest amount of money you need to pay for the
books.
If we performed naive generate and test on this problem, we would probably
get something like 15¹⁰⁰ solutions, by testing every book store for every book.
This is infeasible. So, why can we do better than this? There must be some hidden
structure in the problem that makes testing all those possibilities unnecessary.
To find this structure, we will analyze a candidate solution as given by the naive
generate and test method, i.e. an assignment from each book to a book store
where we should purchase it from.
For the sake of example, let’s assume that in this candidate solution, we
purchased books from the book stores 1, 4 and 5. If we then purchased a book
from store 4, but it was actually cheaper from store 1, we should have picked it
from there instead. Thus there seems to be quite a bit of redundancy in this set
of candidate solutions – a strong hint that we might have found some crucial
insight. We could decide to use this fact to turn our generate and test into a


backtracking algorithm, by pruning away any solution where we at some point decide to purchase a book from a book store that contains a book that we already purchased at a cheaper price. Unfortunately, this is easily defeated by giving most of the books equal prices at most of the book stores.
Instead, let us use the insight differently. Our observation hints that making
a choice for every book is not especially good, since choices elsewhere affect
the optimality of an individual book choice greatly. Digging deeper, we find
that we do not really have any choice in where we buy the books. During the
course of the suggested backtracking, what matters is never what particular store
we purchased a book from. Rather, it is only of interest what stores we have
decided to use so far – this will uniquely determine where we purchase every
book from, since we are forced to buy the book from the cheapest store that we
buy books from.
At this point, we are basically done. We have reduced the amount of information we need in order to easily find a solution to a single question: “which book stores will we purchase from?”. This parameter has much fewer possibilities – only 2¹⁵. By testing each possible choice and fixing it, we can immediately find the best candidate solution using the “cheapest store for each book” rule. Since we test every option for this parameter in the full set of candidate solutions, we must also test the one that is given by the optimal solution. This results in the following pseudo code:

1: procedure BuyingBooks(books 𝑁 , stores 𝑀, costs 𝐶, postages 𝑃)
2:     answer ← ∞
3:     for every 𝑆 ⊆ [𝑀] do
4:         cost ← 0
5:         for every 𝑠 ∈ 𝑆 do
6:             cost ← cost + 𝑃𝑠
7:         for every 𝑏 ∈ [𝑁 ] do
8:             cost ← cost + min𝑖 ∈𝑆 𝐶𝑖,𝑏
9:         answer ← min(answer, cost)
10:     return answer
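
A C++ version of this pseudo code could look like the sketch below, where a subset of stores that misses some book is treated as invalid:

#include <algorithm>
#include <iostream>
#include <vector>
using namespace std;

int main() {
  int N, M;
  cin >> N >> M;
  const long long INF = 1LL << 60;
  vector<int> postage(M);
  // cost[i][b] = price of book b at store i, or INF if not sold there.
  vector<vector<long long>> cost(M, vector<long long>(N, INF));
  for (int i = 0; i < M; i++) {
    int L;
    cin >> postage[i] >> L;
    for (int j = 0; j < L; j++) {
      int b; long long c;
      cin >> b >> c;
      cost[i][b] = min(cost[i][b], c);
    }
  }
  long long answer = INF;
  // Fix the subset S of book stores that we order from.
  for (int S = 1; S < (1 << M); S++) {
    long long total = 0;
    for (int i = 0; i < M; i++)
      if (S & (1 << i)) total += postage[i];
    bool ok = true;
    // Buy each book at the cheapest store among the chosen ones.
    for (int b = 0; b < N && ok; b++) {
      long long cheapest = INF;
      for (int i = 0; i < M; i++)
        if (S & (1 << i)) cheapest = min(cheapest, cost[i][b]);
      if (cheapest == INF) ok = false; // some book is not sold by any chosen store
      else total += cheapest;
    }
    if (ok) answer = min(answer, total);
  }
  cout << answer << endl;
}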

Alternatively, we could have come to the same insight by asking ourselves


“can we brute force over the set of book stores?”. Whenever you have a parameter of about this size in a problem, this is a question worth asking. However,


the parameter to brute force over is not always this explicit, as in the following
problem, which asks us to find all integer solutions to an equation in a certain
interval.
Problem 9.7
Buying Books – buyingbooks

Integer Equation
Codeforces Round #262, Problem B
Given integers 𝑎 (|𝑎| ≤ 10 000), 𝑏 (1 ≤ 𝑏 ≤ 5), and 𝑐 (|𝑐 | ≤ 10 000), determine
the integers 𝑥 (1 ≤ 𝑥 ≤ 10⁹) satisfying

𝑥 = 𝑎 · 𝑠 (𝑥)^𝑏 + 𝑐,

where 𝑠 (𝑥) is the digit sum of 𝑥.

In this problem, the only explicit object we have is 𝑥. Unfortunately, 10⁹ is a tad too many possibilities. If we study the equation a bit closer, we see that 𝑠 (𝑥) also varies as a function of 𝑥. This is helpful, since 𝑠 (𝑥) has far fewer options than 𝑥. In fact, a number bounded by 10⁹ has at most 9 digits, meaning it has a maximum digit sum of 9 · 9 = 81. Thus, we can solve the problem by looping over all the possible values of 𝑠 (𝑥). This uniquely fixes our right hand side,
which equals 𝑥. Given 𝑥, we verify that 𝑠 (𝑥) has the correct value. Whenever
we have such a function, i.e., one with a large domain (like 𝑥 in the problem)
but a small image (like 𝑠 (𝑥) in the problem), this technique can be used by brute
forcing over the image instead. This is actually what we did in the book store
problem as well. Our function was then from a given candidate solution to the
book stores used. Since the image of this function (all the possible subsets of
book stores) was small, the problem could be attacked by brute forcing over the
image of the function rather than the domain.
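
A sketch of this solution in C++ follows; the output format (the number of solutions followed by the solutions in increasing order) is our assumption about the original problem:

#include <algorithm>
#include <iostream>
#include <vector>
using namespace std;

// Digit sum of x.
int digitSum(long long x) {
  int sum = 0;
  for (; x > 0; x /= 10) sum += x % 10;
  return sum;
}

int main() {
  long long a, b, c;
  cin >> a >> b >> c;
  vector<long long> answers;
  // Brute force over the image: s(x) is between 1 and 81.
  for (long long d = 1; d <= 81; d++) {
    long long p = 1;
    for (int i = 0; i < b; i++) p *= d; // d^b
    long long x = a * p + c;            // the equation fixes x
    // Verify that the fixed right hand side is consistent with s(x).
    if (1 <= x && x <= 1000000000 && digitSum(x) == d)
      answers.push_back(x);
  }
  sort(answers.begin(), answers.end());
  cout << answers.size() << endl;
  for (long long x : answers) cout << x << endl;
}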
Problem 9.8
Shopping Plan – shoppingplan

9.5 Meet in the Middle


The meet in the middle technique is a special case of the parameter fixing
technique. The general theme will be to fix half of the parameter space and


build some fast data structure such that when testing the other half of the
parameter space, we can quickly find the best parameters for the first half. It is a
space-time tradeoff, in the sense that we improve the time usage (testing half of
the parameter space much faster), by paying with increased memory usage (to
save the pre-computed structures).

Subset Sum
Given a set of integers 𝑆, is there some subset 𝐴 ⊆ 𝑆 with a sum equal to 𝑇 ?
Input
The first line contains two integers: 𝑁 , the size of 𝑆, and 𝑇 . The next line contains 𝑁 integers 𝑠 1, 𝑠 2, ..., 𝑠 𝑁 , separated by spaces – the elements of 𝑆. It is guaranteed
that 𝑠𝑖 ≠ 𝑠 𝑗 for 𝑖 ≠ 𝑗.
Output
Output possible if such a subset exists, and impossible otherwise.

In this problem, a simple generate and test solution would have 𝑁 parameters to brute force over. For each element of 𝑆, we either choose to include it in 𝐴 or not – a total of two choices for each parameter. This naive attempt at solving the problem (which amounts to computing the sum of every subset) gives us a complexity of 𝑂 (2^𝑁 ). While sufficient for e.g. 𝑁 = 20, we can make an improvement that makes the problem tractable even for 𝑁 = 40.
To figure out if a meet in the middle solution is applicable, the two halves of
the parameter space must to a large extent be independent. Individual choices in
one half should have little effect on the other. This is for example not the case
in the max clique problem. Deciding what vertices to include from, say, the
lower-numbered half will put very complicated constraints on what vertices we
could pick from the other half. Such a situation should discourage you from
attempting to meet in the middle.
In the subset sum problem, our parameters are to a large extent independent. When fixing the first 𝑁 /2 parameters, which may mean we include elements with a sum of 𝑈 , a single constraint is placed on the remaining 𝑁 /2 parameters; they must sum to 𝑇 − 𝑈 , to together make the correct sum.
Thus, if we could quickly answer the question “can we choose the latter half of the integers such that they have a given sum?” we could solve the problem by fixing the first half of the parameters. Individually, this constraint takes 𝑂 (2^(𝑁 /2)) time to check if we use brute force. However, we can compute the answer for all such questions in one go by computing the sum of every subset of the latter half of elements in Θ(𝑁 · 2^(𝑁 /2)). The resulting sums can be inserted into a hash set, letting us determine if a sum can be formed by those elements in Θ(1) instead. Here is the space-time tradeoff – we sacrifice an exponential amount of memory (a set of 2^(𝑁 /2) elements) to win an exponential speedup of 2^(𝑁 /2). The complexity is thus Θ(𝑁 · 2^(𝑁 /2)), dominated by the precomputation. In pseudo code, a meet in the middle solution looks something like this:

1: procedure SubsetSum(set 𝑆, target 𝑇 )
2:     𝑁 ← |𝑆 |
3:     left ← ⌊𝑁 /2⌋
4:     right ← 𝑁 − left
5:     Lset ← the left first elements of 𝑆
6:     Rset ← 𝑆 \ Lset
7:     Lsums ← new set
8:     for each 𝐿 ⊆ Lset do
9:         Lsums.insert(Σ𝑙 ∈𝐿 𝑙)
10:     for each 𝑅 ⊆ Rset do
11:         sum ← Σ𝑟 ∈𝑅 𝑟
12:         if Lsums.contains(𝑇 − sum) then
13:             output true
14:             return
15:     output false
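
In C++, the same solution could be written as the following sketch, using an unordered_set for the sums of the left half:

#include <iostream>
#include <unordered_set>
#include <vector>
using namespace std;

int main() {
  int N;
  long long T;
  cin >> N >> T;
  vector<long long> s(N);
  for (auto& x : s) cin >> x;

  int left = N / 2, right = N - left;
  // Precompute the sum of every subset of the left half.
  unordered_set<long long> lsums;
  for (int mask = 0; mask < (1 << left); mask++) {
    long long sum = 0;
    for (int i = 0; i < left; i++)
      if (mask & (1 << i)) sum += s[i];
    lsums.insert(sum);
  }
  // For every subset of the right half, look up the complementary sum.
  for (int mask = 0; mask < (1 << right); mask++) {
    long long sum = 0;
    for (int i = 0; i < right; i++)
      if (mask & (1 << i)) sum += s[left + i];
    if (lsums.count(T - sum)) {
      cout << "possible" << endl;
      return 0;
    }
  }
  cout << "impossible" << endl;
}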

Problem 9.9
Maximum Loot – maxloot

We will end the chapter by solving a brute force problem combining two
techniques.

Limited Correspondence
Greg Hamerly
Emil, a Polish mathematician, sent a simple puzzle by post to his British friend,
Alan. Alan sent a reply saying he didn’t have an infinite amount of time he could
spend on such non-essential things. Emil modified his puzzle (making it a bit


more restricted) and sent it back to Alan. Alan then solved the puzzle.
Here is the original puzzle Emil sent: given a sequence of pairs of strings
(𝑎 1, 𝑏 1 ), (𝑎 2, 𝑏 2 ), . . ., (𝑎𝑘 , 𝑏𝑘 ), find a non-empty sequence 𝑠 1 , 𝑠 2 , . . ., 𝑠𝑚 such
that the following is true:

𝑎𝑠1 𝑎𝑠2 . . . 𝑎𝑠𝑚 = 𝑏𝑠1 𝑏𝑠2 . . . 𝑏𝑠𝑚

where 𝑎𝑠1 𝑎𝑠2 . . . indicates string concatenation. The modified puzzle that Emil
sent added the following restriction: for all 𝑖 ≠ 𝑗, 𝑠𝑖 ≠ 𝑠 𝑗 .
You don’t have enough time to solve Emil’s original puzzle. Can you solve
the modified version?
Input
The input starts with a line containing an integer 1 ≤ 𝑘 ≤ 11, followed by 𝑘 lines.
Each of the 𝑘 lines contains two lowercase alphabetic strings which represent
a pair of strings. Each individual string will be non-empty and at most 100
characters long.
Output
Output the sequence found (if it is possible to form one) or IMPOSSIBLE (if it is not
possible to solve the problem). If it is possible but there are multiple sequences,
you should prefer the shortest one (in terms of the number of characters output).
If there are multiple shortest sequences, choose the one that is lexicographically
first.
The original problem as posed by Emil1 is called the Post correspondence
problem and is an undecidable problem, i.e. there is no algorithm in the familiar
sense that can solve the problem in finite time.
Alan’s added restriction to the problem, that if 𝑖 ≠ 𝑗, then 𝑠𝑖 ≠ 𝑠 𝑗 , means that each pair of strings may be used in the sequence at most once. This clearly allows us to solve the problem in finite time – we could simply test all 𝑘! permutations of the pairs, and walk through each of them to see if the two strings formed by the respective strings of some prefix of the permutation match. This would require around 𝑘! · 𝑘 · 100 operations, which is about 4 · 10¹⁰ for the maximum 𝑘 = 11, which is clearly too slow.
The big difference in this problem compared to the one where we could
perform meet in the middle is that our selection of strings is highly order
dependent. We can’t arbitrarily split up our word pairs into a “first half” and a
1Emil Post, an American mathematician


“second half” and attempt to combine them. After all, the correct solution might
involve constructing a string where words from the two halves are intertwined.
Of course, we have previously seen how to deal with such a problem. If an
arbitrary choice is not good enough, we try all the choices. We simply fix the
parameter that is the subset of word pairs constituting the first half. There are
only 462 ways in which one can pick the first 5 word pairs out of a maximum
eleven – a small price to pay.
Fixing parameters thus allows us to split up the words into two halves. This begs the question – if we attempt all possible permutations of the words in the first half, what constraint do they put on the second half? Let’s check. Assume that a given permutation of the (currently fixed) first half of the pairs gives a concatenation of the 𝑎’s equal to the string 𝑆, which is, without loss of generality, shorter than the concatenation of the 𝑏’s. First of all, it should be clear that 𝑆 must be a prefix of the concatenation of the 𝑏’s – otherwise, concatenating the strings of the second half can’t make the two sides equal. Thus, we assume that the concatenation of the 𝑏’s is 𝑆𝑇 .
Symmetrically, this tells us that if the concatenation of all 𝑏’s in the second
half is the string 𝑈 , the concatenation of all the 𝑎’s must be 𝑇𝑈 , to together
make the strings 𝑆𝑇𝑈 . Thus, the question is – can we order the words of the
second part so that they create strings of the form 𝑈 and 𝑇𝑈 for any 𝑈 ?
This is precisely the kind of simple question that makes meet in the middle possible. To answer it quickly, we check all orderings of the (at most) 6 word pairs in the second half. If they produce strings of the form 𝑈 and 𝑋𝑈 for some 𝑋 , we store the result in a hash map from 𝑋 to the lexicographically smallest 𝑈 that we found. With this map in hand, we can determine if there is a way to complete the strings formed by the first half in constant time.
In total, the cost is somewhere around 462 · (5! · 5 + 6! · 6) ≈ 2 · 10⁶ hash map operations, and 462 · (5! · 5 + 6! · 6) · 600 ≈ 1.3 · 10⁹ individual character operations depending on implementation. While the latter seems like a lot, they are very fast – solving the worst case on the author’s computer 5 times takes less than a second.
Problem 9.10
Closest Sums – closestsums
Celebrity Split – celebritysplit
Circuit Counting – countcircuits


Indoorienteering – indoorienteering
Key to Knowledge – keytoknowledge
Knights in Fen – knightsfen
Rubrik’s Revenge in ... 2D!? 3D? – rubriksrevenge

9.6 Chapter Notes


Brute force problems come in many varieties. While the basis of most solutions often involves some kind of backtracking with pruning, finding the right one can often require a lot of creativity.
A deeper dive into brute force can be found in Exact Exponential Algorithms[1]. The book also discusses several techniques that are common within algorithmic problem solving.
A concept that often comes up in research, and occasionally in programming competitions, is that of fixed-parameter tractability. A problem that might (as of now) only have exponential time algorithms, such as the max clique problem, may have a polynomial solution when fixing some parameter, such as the size of the maximum clique. Indeed, we saw that the problem of finding the largest clique of size at most 6 admitted a polynomial solution in the number of vertices. Parameterized Algorithms[8] provides a comprehensive toolbox for such problems.
Another neat technique with elements of brute force is color-coding[2] to
solve certain graph theoretical problems.

10 Greedy Algorithms
In this chapter, we are going to look at another standard technique to solve some
categories of search and optimization problems faster than naive bruteforce, by
exploiting properties of local optimality.

10.1 Change-making Problem


Let us start by looking at a well-known algorithmic problem, the change-making
problem.

Change-making Problem, Denominations 1, 2, 5


Given an infinite number of coins of denominations 1, 2, 5, determine the smallest number of coins needed to sum up to 𝑇 (1 ≤ 𝑇 ≤ 10⁹).

We can use the tools from the previous chapter on brute force to formulate a
plain backtracking solution. Using a recursive function that takes in 𝑇 , we can
attempt to add a single coin of each type, and keep searching:

1: procedure MakeChange(integer 𝑇 )
2:     if 𝑇 = 0 then
3:         return 0
4:     answer ← 1 + MakeChange(𝑇 − 1)
5:     if 𝑇 ≥ 2 then
6:         answer ← min(answer, 1 + MakeChange(𝑇 − 2))
7:     if 𝑇 ≥ 5 then
8:         answer ← min(answer, 1 + MakeChange(𝑇 − 5))
9:     return answer

However, this solution is exponential in time. It branches three times per call, and each call decreases 𝑇 by at most 5, so the recursion will have a depth of at least 𝑇 /5. It thus has a lower bound of Ω(3^(𝑇 /5)).

We can phrase this problem using the kind of graph we previously discussed.
Let the graph have vertices labeled 0, 1, 2, ...,𝑇 , representing the amount of

171
C HAPTER 10. G REEDY A LGORITHMS

money we wish to sum up to. For a vertex labeled 𝑥, we add edges to vertices
labeled 𝑥 − 1, 𝑥 − 2, 𝑥 − 5 (if those vertices exist), weighted 1. Traversing
such an edge represents adding a coin of denomination 1, 2 or 5. Then, the
Change-making Problem can be phrased as computing the shortest path from
the vertex 𝑇 to 0. The corresponding graph for 𝑇 = 5 can be seen in Figure 10.1.

Figure 10.1: The Change-making Problem, formulated as finding the shortest path in a DAG, for 𝑇 = 5.

So, how does this graph formulation help us? Solving the problem on the
graph as before using simple recursion would be very slow (with an exponential
complexity, even). In Chapter 11 on Dynamic Programming, we will see how to
solve such problems in polynomial time. For now, we will settle for solving
problems exhibiting yet another property besides having optimal substructure –
that of local optimality.

Exercise 10.1. Compute the shortest path from each vertex in Figure 10.1 to 0 using the optimal substructure property.

10.2 Optimal Substructure


Most optimization problems we study can be formulated as having to make
sequential choices. For example, we managed to formulate the max clique
problem as a sequence of 𝑉 choices – for each vertex, should we include it or
not?
A useful formulation of this process is the following: Given a weighted,
directed acyclic graph (DAG) on 𝑁 vertices, what is the “best” path from a
vertex 𝑆 to another vertex 𝑇? This graph is almost never given explicitly.
Instead, it hides behind the problem as a set of states (the vertices). At each state,
we are to make some choice that takes us to another state (traversing an edge).
In the case of backtracking, you may recognize elements of this formulation in


Figure 9.3 on page 155, since we essentially explore a big kind of graph in such
algorithms.
If the path consists of edges 𝑒 1, 𝑒 2, . . . , 𝑒𝑘 , the function we are to maximize
will be of the form

𝐺 (𝑒 1, 𝑒 2, . . . , 𝑒𝑘 ) = 𝑔(𝑒 1, 𝑔(𝑒 2, 𝑔(. . . , 𝑔(𝑒𝑘 , 0))))

where 𝑔 is a function from 𝐸 × ℝ to ℝ. We will denote by 𝐵(𝑣) the maximum
value of 𝐺(𝑒_1, 𝑒_2, . . . , 𝑒_𝑘) over all paths from 𝑣 to 𝑇. Often, 𝑔 will be the negative
sum of the edge weights, meaning we look for the shortest path from 𝑆 to 𝑇.
If 𝑔(𝑒, 𝑥) is increasing in 𝑥, we say that problems exhibiting this property have
optimal substructure.
One way to solve the problem would be to evaluate 𝐺 for every path in
the graph. Unfortunately, the number of paths in a general graph can be
huge (growing exponentially with the number of vertices). However, the
optimal substructure property allows us to make a simplification. Assume
that 𝑆 has neighbors 𝑣_1, 𝑣_2, ..., 𝑣_𝑚. Then by the definition of 𝐵 and 𝑔,
𝐵(𝑆) = max_𝑖 𝑔((𝑆, 𝑣_𝑖), 𝐵(𝑣_𝑖)). Thus, we can solve the problem by first solving the same
problem on all vertices 𝑣_𝑖 instead.

10.3 Locally Optimal Choices


Greedy algorithms solve this kind of problem by making what is called a locally
optimal choice. We generally construct our shortest path iteratively, one edge at
a time. We start out at the first vertex, 𝑆, and need to choose an edge to traverse.
A greedy algorithm chooses the edge which locally looks like a good choice,
without any particular thought about future edges.
For example, consider how you would solve the Change-making Problem
yourself. You would probably attempt to add the 5 coin as many times as
possible, until you need to add less than 5. This makes sense locally, since 5 is
the largest amount we can add at a time. When we need to add less than 5,
we would probably instead add coins worth 2, until we need to add either 0 or 1.
In the latter case, we would add a final 1 coin.
Intuitively, this makes sense. Adding the highest amount every time ought
to be the best possible. For this problem, this is actually true. However, if the
denominations were different, this could fail (Exercise 10.2).
Exercise 10.2. Prove that the greedy choice may fail if the coins have denomi-
nations 1, 6 and 7.


Assume that the optimal solution uses the 1, 2 and 5 coins 𝑎, 𝑏 and 𝑐 times
respectively. We then have that either:

𝑎 = 1, 𝑏 ≤ 1: If 𝑏 ≥ 2, we could exchange one 1 coin and two 2 coins for one 5
coin.

𝑎 = 0, 𝑏 ≤ 2: If 𝑏 ≥ 3, we could exchange three 2 coins for one 1 coin and one 5
coin.

If 𝑎 ≥ 2, we could instead add a single 2 coin instead of two 1 coins.
This means that the possibilities for 𝑎 and 𝑏 are few:

• 𝑎 = 0, 𝑏 = 0: value 0

• 𝑎 = 1, 𝑏 = 0: value 1

• 𝑎 = 0, 𝑏 = 1: value 2

• 𝑎 = 1, 𝑏 = 1: value 3

• 𝑎 = 0, 𝑏 = 2: value 4

Now, assume that we use exactly 𝑐 coins of value 5, so that 𝑇 = 5𝑐 + 𝑟. We
know that 0 ≤ 𝑟 < 5. Otherwise, the 1 and 2 coins would sum to an amount larger
than 4, which is impossible based on the above list.
This means 𝑐 must be as large as possible, i.e. it is optimal to add as many 5
coins as possible – which is exactly what the greedy choice does. Then, only the
remainders 0, 1, 2, 3 and 4 remain. Looking at the list of those cases, we see that
their optimal solutions correspond to how the greedy algorithm works. Thus, the
greedy algorithm will always give the optimal answer.
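As a concrete illustration, here is a minimal C++ sketch of the greedy strategy for the denominations 1, 2 and 5 (the function name and input handling are our own, not given by the problem):

#include <iostream>
using namespace std;

// Greedily use as many 5 coins as possible, then 2 coins, then a 1 coin.
long long makeChange(long long T) {
    long long coins = T / 5; // as many 5 coins as possible
    T %= 5;
    coins += T / 2;          // then coins worth 2
    T %= 2;
    coins += T;              // a final 1 coin, if needed
    return coins;
}

int main() {
    long long T;
    cin >> T;
    cout << makeChange(T) << endl;
}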
Competitive Tip
If you have the choice between a greedy algorithm and another algorithm (such as
one based on brute force or dynamic programming), use the other algorithm unless
you are certain the greedy choice works.

Proving the correctness of a locally optimal choice is sometimes very


cumbersome. In the remainder of the chapter, we are going to look at a few
standard problems that are solvable using a greedy algorithm. Take note of the
kind of arguments we are going to use – a few common types of proofs appear
again and again when reasoning about greedy algorithms.


10.4 Scheduling
Scheduling problems are a class of problems which deal with constructing large
subsets of non-overlapping intervals from some given set of intervals.
The classical Scheduling Problem is the following.

Scheduling Problem
Given is a set 𝑆 of half-open intervals (open on the right). Determine the
largest subset 𝐴 ⊆ 𝑆 of non-overlapping intervals.
Input
The input contains the set of intervals 𝑆.
Output
The output should contain the subset 𝐴.


Figure 10.2: An instance of the scheduling problem, with the optimal solution at the bottom.

We will construct the solution iteratively, adding one interval at a time.


When looking for greedy choices to perform, extremal cases are often the first
ones you should consider. Hopefully, one of these extremal cases can be proved


to be included in an optimal solution. For intervals, some extremal cases would


be:

• a shortest interval,

• a longest interval,

• an interval overlapping as few other intervals as possible,

• an interval with the leftmost left endpoint (and symmetrically, the rightmost
right endpoint),

• an interval with the leftmost right endpoint (and symmetrically, the


rightmost left endpoint).

As it turns out, we can always select an interval satisfying the fifth case.
In the example instance in Figure 10.2, this results in four intervals. First, the
interval with the leftmost right endpoint is the interval [1, 2). If we include this
in the subset 𝐴, intervals [0, 3) and [1, 6) must be removed since they overlap
[1, 2). Then, the interval [3, 4) would be the one with the leftmost right endpoint
of the remaining intervals. This interval overlaps no other interval, so it should
obviously be included. Next, we would choose [4, 6) (overlapping with [4, 7)),
and finally [7, 8). Thus, the answer would be 𝐴 = {[1, 2), [3, 4), [4, 6), [7, 8)}.

1: procedure Scheduling(set 𝑆)
2: ans ← new set
3: Sort 𝑆 by right endpoint
4: highest ← −∞
5: for each interval [𝑙, 𝑟) ∈ 𝑆 do
6: if 𝑙 ≥ highest then
7: ans.insert([𝑙, 𝑟))
8: highest ← 𝑟
9: return ans

We can prove that this strategy is optimal using a swapping argument, one
of the main greedy proof techniques. In a swapping argument, we attempt to
prove that given any solution, we can always modify it in such a way that our
greedy choice is no worse. This is what we did in the Change-making Problem,


where we argued that an optimal solution had to conform to a small set of


possibilities, or else we could swap some set of coins for another (such as two
coins worth 1 for a single coin worth 2).
Assume that the optimal solution does not contain the interval [𝑙, 𝑟), an
interval with the leftmost right endpoint. The interval [𝑙′, 𝑟′) that has the leftmost
right endpoint of the intervals in the solution must then have 𝑟′ ≥ 𝑟. In particular,
this means any other interval [𝑎, 𝑏) in the solution must have 𝑎 ≥ 𝑟′ (or else the
intervals would overlap). However, this means that the interval [𝑙, 𝑟) does not
overlap any other interval either, since 𝑎 ≥ 𝑟′ ≥ 𝑟 so that 𝑎 ≥ 𝑟. Then, swapping
the interval [𝑙′, 𝑟′) for [𝑙, 𝑟) still constitutes a valid solution of the same size.
This proves that we could have included the interval [𝑙, 𝑟) in the optimal solution.
Note that the argument in no way say that the interval [𝑙, 𝑟 ) must be in an optimal
solution. It is possible for a scheduling problem to have many distinct solutions.
For example, in the example in Figure 10.2, we might just as well switch [4, 6)
for [4, 7) and still get an optimal solution.
This solution can be implemented in Θ(|𝑆| log |𝑆|) time. In the Scheduling
algorithm, this is accomplished by first performing a sort (in Θ(|𝑆| log |𝑆|)),
followed by a loop over the intervals, where each iteration takes Θ(1) time.
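As a reference, the algorithm can be implemented in C++ as follows (a sketch of our own, assuming the intervals are given as (𝑙, 𝑟) pairs):

#include <algorithm>
#include <climits>
#include <vector>
using namespace std;

typedef pair<int, int> Interval; // (l, r), half-open

// Returns a largest subset of non-overlapping intervals.
vector<Interval> scheduling(vector<Interval> S) {
    // sort the intervals by their right endpoints
    sort(S.begin(), S.end(),
         [](const Interval& a, const Interval& b) { return a.second < b.second; });
    vector<Interval> ans;
    int highest = INT_MIN; // right endpoint of the last chosen interval
    for (const Interval& iv : S) {
        if (iv.first >= highest) { // iv overlaps no chosen interval
            ans.push_back(iv);
            highest = iv.second;
        }
    }
    return ans;
}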

Exercise 10.3. For each of the first four strategies, find a set of intervals where
they fail to find an optimal solution.

We can extend the problem to choosing exactly 𝐾 disjoint subsets of non-


overlapping intervals instead, maximizing the sum of their sizes. The solution is
similar to the original problem, in that we always wish to include the interval
with the leftmost right endpoint in one of the subsets if possible. The question
is then, what subset? Intuitively, we wish to choose a subset where the addition
of our new interval causes as little damage as possible. This subset is the one
with the rightmost right endpoint that we can place the interval in. Proving this
is similar to the argument we used when deciding what interval to choose, and
is a good exercise to practice the swapping argument.

Exercise 10.4. Prove that choosing to place the interval in the subset with the
rightmost right endpoint is optimal.
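This strategy can be implemented efficiently with a multiset holding the current right endpoint of each of the 𝐾 subsets. The sketch below is our own illustration of the idea, again assuming intervals given as (𝑙, 𝑟) pairs:

#include <algorithm>
#include <climits>
#include <set>
#include <vector>
using namespace std;

// Maximum total number of intervals when choosing K disjoint subsets
// of non-overlapping, half-open intervals.
int schedulingK(int K, vector<pair<int, int>> S) {
    sort(S.begin(), S.end(), [](const pair<int, int>& a, const pair<int, int>& b) {
        return a.second < b.second;
    });
    multiset<int> ends; // the right endpoint of each subset's last interval
    for (int i = 0; i < K; i++) ends.insert(INT_MIN);
    int count = 0;
    for (const auto& iv : S) {
        // the subset ending rightmost, but no later than iv's left endpoint
        auto it = ends.upper_bound(iv.first);
        if (it == ends.begin()) continue; // the interval fits in no subset
        ends.erase(--it);
        ends.insert(iv.second);
        count++;
    }
    return count;
}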

Problem 10.1
Entertainment Box – entertainmentbox
Disastrous Downtime – downtime


10.5 Huffman Coding


When storing binary data with a non-negligible degree of regularity, it is possible
to encode this data using a representation that uses fewer bits than the data itself.
For example, we could store the string “0101010101010101010101010101” as
“01, 14 times”. One such encoding scheme uses a lossless type of coding called
Huffman Coding.

Definition 10.1 — Prefix-free code


Given a set of symbols 𝑆, a binary prefix-free code of 𝑆 is a function
𝑃 : 𝑆 → {0, 1}∗ such that 𝑃 (𝑥) is a prefix of 𝑃 (𝑦) only if 𝑥 = 𝑦.
A prefix-free code is useful since it allows us to encode a string of symbols
using a sequence of bits, such that the original string of symbols can be uniquely
determined from the bit sequence. The decoding can be done greedily. If the binary
encoding is 𝐵 = 𝑏_1 𝑏_2 . . . 𝑏_𝑛, we find any symbol 𝑐 such that 𝑃(𝑐) is some prefix
𝑏_1 𝑏_2 . . . 𝑏_𝑘 of the encoding. Only one such symbol can exist: if the encodings
𝑃(𝑐) and 𝑃(𝑑) of two symbols 𝑐 and 𝑑 were both prefixes of
𝐵, the shorter of them would also be a prefix of the other, and 𝑃 would not
be a prefix-free code. We thus choose the unique symbol whose encoding is
a prefix of 𝐵 as the first symbol that was encoded, remove the corresponding
bits from 𝐵 and continue until 𝐵 is empty.

Example 10.1 Let 𝑆 = {𝑎, 𝑏, 𝑐, 𝑑 }. Then, the code defined by 𝑃 (𝑎) = 0,


𝑃 (𝑏) = 1, 𝑃 (𝑐) = 10, 𝑃 (𝑑) = 11 is not prefix-free, since 𝑃 (𝑏) = 1 is a prefix
of 𝑃 (𝑐) = 10.
On the other hand, the code defined by 𝑃 (𝑎) = 0, 𝑃 (𝑏) = 11, 𝑃 (𝑐) = 100,
𝑃 (𝑑) = 101 is prefix-free. If 0110101100 is an encoding of a string of
symbols, the code given by 𝑃 gives us that the first symbol must be 𝑎 (0),
the second 𝑏 (11), the third 𝑎 (0), followed by 𝑑 (101) and 𝑐 (100).
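The greedy decoding procedure is straightforward to implement. The following sketch is our own illustration, representing the code as a map from symbols to code words:

#include <map>
#include <string>
using namespace std;

// Decode the bit string B using the prefix-free code P.
string decode(const string& B, const map<char, string>& P) {
    string result;
    size_t pos = 0;
    while (pos < B.size()) {
        bool matched = false;
        // find the unique symbol whose code word is a prefix of the rest of B
        for (const auto& sc : P) {
            const string& code = sc.second;
            if (B.compare(pos, code.size(), code) == 0) {
                result += sc.first;
                pos += code.size();
                matched = true;
                break;
            }
        }
        if (!matched) return result; // B was not a valid encoding
    }
    return result;
}

With the prefix-free code from Example 10.1, decode("0110101100", P) returns "abadc".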

There exists a straightforward fixed-length construction of a prefix-free code.


If 2^{𝑛−1} < |𝑆| ≤ 2^𝑛, we can choose |𝑆| strings of exactly 𝑛 bits as encodings for
the symbols. For compression purposes, we instead want a short encoding.
The Huffman code is a special kind of prefix-free code that takes into account
the frequency of every symbol. For example, if the string we are to encode
consists to 99% of the letter 𝑎, we probably want 𝑎 to have a very short encoding.
On the other hand, a symbol which only appears with a frequency of 1% does
not contribute much to the length of the encoding.


Problem 10.2
Whether Report – whether

Chapter Notes
Determining whether coins of denominations 𝐷 can even be used to construct
an amount 𝑇 is an NP-complete problem in the general case[16]. It is, however,
possible to determine in polynomial time whether a given set of denominations
can be handled by the greedy algorithm described[6]. Such a set of denominations
is called a canonical coin system.
Introduction to Algorithms[7] also treats the scheduling problem in its
chapter on greedy algorithms. It also brings up the connection between greedy
problems and a concept known as matroids, which is well worth studying.

179
C HAPTER 10. G REEDY A LGORITHMS

180
11 Dynamic Programming
This chapter will study a technique called dynamic programming (often abbrevi-
ated DP). In one sense, it is simply a technique to solve the general case of the
best path in a directed acyclic graph problem (Section 10.2) in cases where the
graph does not admit locally optimal choices, in time approximately equal to
the number of edges in the graph. For graphs which are essentially trees with a
unique path to each vertex, dynamic programming is no better than brute force.
In more interconnected graphs, where many paths lead to the same vertex, the
power of dynamic programming shines through. It can also be seen as a way
to speed up recursive functions (called memoization), which will be our first
application.
First, we will see a familiar example – the Change-making problem, with a
different set of denominations. Then, we will discuss a little bit of theory, and
finally round off with a few concrete examples and standard problems.

11.1 Best Path in a DAG


We promised a generalization that can find the best path in a DAG that exhibits
optimal substructure even when locally optimal choices do not lead to an
optimal solution. In fact, we already have such a problem – the Change-
making Problem from Section 10.1, but with certain other sets of denominations.
Exercise 10.2 even asked you to prove that the case with coins worth 1, 6 and 7
could not be solved in the same greedy fashion.
So, how can we adapt our solution to this case? The secret lies in the graph
formulation of the problem which we constructed (Figure 10.1). For the greedy
solution, we essentially performed a recursive search on this graph, except we
always knew which edge to pick. When we do not know, the solution ought to
be obvious – let us test all the edges.

1: procedure ChangeMaking(denominations 𝐷, target 𝑇 )


2: if 𝑇 = 0 then
3: return 0


4: ans ← ∞
5: for denomination 𝑑 ∈ 𝐷 do
6: if 𝑇 ≥ 𝑑 then
7: ans ← min(ans, 1 + ChangeMaking(𝐷, 𝑇 − 𝑑))
8: return ans

This solution as written is actually exponential in 𝑇 (it is Ω(3^{𝑇/7})). The
recursion tree for the case 𝑇 = 10 can be seen in Figure 11.1.


Figure 11.1: The recursion tree for the Change-making problem with 𝑇 = 10.

The key behind the optimal substructure property is that the answer for any
particular call in this graph with the same parameter 𝑇 is the same, independently
of the previous calls in the recursion. Right now, we perform calls with the same
parameters multiple times. Instead, we can save the result of a call the first time
we perform it:

1: memo = new 𝑖𝑛𝑡 [𝑇 + 1]


2: set memo[𝑖] = −1 for all 𝑖
3: procedure ChangeMaking(denominations 𝐷, target 𝑇 )
4: if 𝑇 = 0 then
5: return 0
6: if memo[𝑇 ] ≠ −1 then


7: return memo[𝑇 ]
8: ans ← ∞
9: for denomination 𝑑 ∈ 𝐷 do
10: if 𝑇 ≥ 𝑑 then
11: ans ← min(ans, 1 + ChangeMaking(𝐷, 𝑇 − 𝑑))
12: memo[𝑇 ] ← ans
13: return ans

This new algorithm is actually linear in 𝑇 instead of exponential. The


call graph now looks very different (Figure 11.2), since all calls with the same
parameter will be merged (as such calls are only evaluated once).


Figure 11.2: The recursion tree for the Change-making problem with 𝑇 = 10, with duplicate calls
merged.

Note the similarity between this graph and our previous DAG formulation
of the Change-making problem (Figure 10.1).

11.2 Dynamic Programming


With these examples in hand, we are ready to give a more concrete character-
ization of dynamic programming. In principle, it can be seen as solving the
kind of “sequence of choices” problems that we used brute force to solve, but
where different choices can result in the same situation. For example, in the
Change-making Problem, after adding two coins worth 1 and 6, we might just
as well have added a 7 coin instead. After we have performed a sequence of


choices, how we got to the resulting state is no longer relevant – only where
we can go from there. Basically, we throw away the information (what exact
coins we used) that is no longer needed. This view of dynamic programming
problems as having a “forgetful” property, that the exact choices we have made
do not affect the future, is useful in most dynamic programming problems.
Another, more naive view, is that dynamic programming solutions are simple
recursions, where we happen to solve the same recursive subproblem a large
number of times. In this view, a DP solution is basically nothing more than a
recursive solution – find the correct base cases, a fast enough recursion, and
memoize the results.
More pragmatically, DP consists of two important parts – the states of the
DP, and the computation of a state. Both of these parts are equally important.
Fewer states generally mean fewer computations to make, and a better complexity
per state gives a better complexity overall.

Bottom-Up Computation
When applied to a dynamic programming problem, memoization is sometimes
called top-down dynamic programming instead. The name is inspired from the
way we compute the solution to our problem by starting at the largest piece at
the top of the recursion tree, and recursively breaking it down to smaller and
smaller pieces.
There is an alternative way of implementing a dynamic programming
solution, which (not particularly surprisingly) is called bottom-up dynamic
programming. This method instead constructs the solutions to our sub-problems
in the other order, starting with the base case and iteratively computing solutions
to larger sub-problems.
For example, we might just as well compute the solution to the Change-
making problem the following way:

1: procedure ChangeMaking(denominations 𝐷, target 𝑇 )


2: ans = new 𝑖𝑛𝑡 [𝑇 + 1]
3: set ans[𝑖] = ∞ for all 𝑖
4: set ans[0] = 0
5: for 𝑖 = 1 to 𝑇 do
6: for denomination 𝑑 ∈ 𝐷 do
7: if 𝑖 ≥ 𝑑 then


8: ans[𝑖] ← min(ans[𝑖], 1 + ans[𝑖 − 𝑑])


9: return ans[𝑇 ]

How do you choose between bottom-up and top-down? Mostly, it comes


down to personal choice. A dynamic programming solution will almost always
be fast enough whether you code it recursively or iteratively. There are
some performance concerns, both ways. A recursive solution is affected by the
overhead of recursive function calls. This problem is not as bad in C++ as in
many other languages, but it is still noticeable. When you notice that the number
of states in your DP solution is running a bit high, you might want to consider
coding it iteratively. Top-down DP, on the other hand, has the upside that only
the states reachable from the starting state will be visited. In some DP solutions,
the number of unreachable states which are still in some sense “valid” enough
to be computed bottom-up is significant enough that excluding them makes up
for the function call overhead. In extreme cases, it might turn out that an entire
parameter is uniquely given by other parameters (such as the Ferry Loading
problem in Section 11.3). While we probably would notice when this is the case,
the top-down DP saves us when we do not.

Order of Computation and Memory Usage


For top-down DP, the memory usage is often quite clear and unavoidable. If a
DP solution has 𝑁 states, it will have an Ω(𝑁 ) memory usage. For a bottom-up
solution, the situation is quite different.
Firstly, let us consider one of the downsides of bottom-up DP. When coding
a top-down DP, you do not need to bother with the structure of the graph you are
solving your problem on. For a bottom-up DP, you need to ensure that whenever
you solve a subproblem, you have already solved its subproblems too. This
requires you to define an order of computation, such that if the subproblem 𝑎 is
used in solving subproblem 𝑏, 𝑎 is computed before 𝑏.
In most cases, such an order is quite easy to find. Most parameters can simply
be increasing or decreasing, using a nested loop for each parameter. When a
DP is done over intervals, the order of computation is often over increasing or
decreasing length of the interval. DP over trees usually requires a post-order
traversal of the tree. In terms of the recurring graph representation we often use
for DP problems, the order of computation must be a topological ordering of the
graph.
While this order of computation business may seem to be nothing but a


nuisance that bottom-up users have to deal with, it is related to one of the perks
of bottom-up computation. If the order of computation is chosen in a clever
way, we need not save every state during our computation. Consider e.g. the
Change-making Problem again, which had the following recursion:
change(𝑛) = 0 if 𝑛 = 0, and otherwise
change(𝑛) = 1 + min(change(𝑛 − 1), change(𝑛 − 6), change(𝑛 − 7)) if 𝑛 > 0

It should be clear that using the order of computation 0, 1, 2, 3, ..., once we have
computed e.g. 𝑐ℎ𝑎𝑛𝑔𝑒 (𝑘), the subproblems 𝑐ℎ𝑎𝑛𝑔𝑒 (𝑘 − 7), 𝑐ℎ𝑎𝑛𝑔𝑒 (𝑘 − 8), ... etc.
are never used again.
Thus, we only need to save the value of 7 subproblems at a time. This Θ(1)
memory usage is pretty neat compared to the Θ(𝐾) usage needed to compute
𝑐ℎ𝑎𝑛𝑔𝑒 (𝐾) otherwise.
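For concreteness, here is a C++ sketch of this optimization for the denominations 1, 6 and 7 (our own illustration; a ring buffer of 8 entries holds the most recent values):

#include <algorithm>
#include <vector>
using namespace std;

const int INF = 1 << 29;

// Computes change(T) for denominations {1, 6, 7} using O(1) memory.
int change(int T) {
    vector<int> last(8, INF); // ring buffer of the 8 most recent values
    last[0] = 0;              // change(0) = 0
    for (int n = 1; n <= T; n++) {
        int best = last[(n - 1) % 8]; // change(n - 1) is always available
        if (n >= 6) best = min(best, last[(n - 6) % 8]);
        if (n >= 7) best = min(best, last[(n - 7) % 8]);
        last[n % 8] = best + 1; // overwrites change(n - 8), no longer needed
    }
    return last[T % 8];
}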
Competitive Tip
Generally, memory limits are very generous nowadays, somewhat diminishing the art
of optimizing memory in DP solutions. It can still be a good exercise to think about
improving the memory complexity of the solutions we will look at, for the few cases
where these limits are still relevant.

11.3 Multidimensional DP
Now, we are going to look at a DP problem where our state consists of more
than one variable. The example will demonstrate the importance of carefully
choosing your DP parameters.

Ferry Loading
Swedish Olympiad in Informatics 2013, Online Qualifiers
A ferry is to be loaded with cars of different lengths, with a long line of cars
currently queued up for a place. The ferry consists of four lanes, each of the
same length. When the next car in the line enters the ferry, it picks one of the
lanes and parks behind the last car in that line. There must be safety margin of 1
meter between any two parked cars.
Given the length of the ferry and the length of the cars in the queue, compute
the maximal number of cars that can park if they choose the lanes optimally.



Figure 11.3: An optimal placement on a ferry of length 5 meters, of the cars with lengths
2, 1, 2, 5, 1, 1, 2, 1, 1, 2 meters. Only the first 8 cars could fit on the ferry.

Input
The first line contains the number of cars 0 ≤ 𝑁 ≤ 200 and the length of the
ferry 1 ≤ 𝐿 ≤ 60. The second line contains 𝑁 integers, the length of the cars
1 ≤ 𝑎𝑖 ≤ 𝐿.
Output
Output a single integer – the maximal number of cars that can be loaded on the
ferry.
The ferry problem looks like a classical DP problem. It consists of a large
number of similar choices. Each car has 4 choices – one of the lanes. If a car of
length 𝑚 chooses a lane, the remaining length of the chosen lane is reduced by
𝑚 + 1 (due to the safety margin). After the first 𝑐 cars have parked on the ferry,
the only thing that has changed is the remaining lengths of the lanes. As a simplification,
we increase the initial length of the ferry by 1, to accommodate an imaginary
safety margin for the last car in a lane in case it is completely filled.
This suggests a DP solution with 𝑛𝐿^4 states, each state representing the
number of cars so far placed and the lengths of the four lanes:
Ferry Loading
int dp[201][62][62][62][62]; // assumed initialized to -1

int ferry(int c, vi used, const vi& A) {
    if (c == sz(A)) return 0;
    int& ans = dp[c][used[0]][used[1]][used[2]][used[3]];
    if (ans != -1) return ans;
    ans = 0;
    rep(i,0,4) {
        if (used[i] + A[c] + 1 > L + 1) continue;
        used[i] += A[c] + 1;
        ans = max(ans, ferry(c + 1, used, A) + 1);
        used[i] -= A[c] + 1;
    }
    return ans;
}

Unfortunately, memoizing this procedure would not be sufficient. The size of


our memoization array is 200 · 60^4 ≈ 2.6 · 10^9, which needs many gigabytes of
memory.
The trick in improving this is basically the same as the fundamental principle
of DP. In DP, we reduce a set of choices to a smaller set of information, which
represent the effects of those choices. This removes information which turned
out to be redundant. In our case, we do not care about what lanes cars chose,
only their remaining lengths. Our suggested solution still has some lingering
redundancy though.
In Figure 11.3 from the problem statement, we have an example assignment of
the cars 2, 1, 2, 5, 1, 1, 2, 1. These must use a total of 3+2+3+6+2+2+3+2 = 23
meters of space on the ferry. We define 𝑈 (𝑐) to be the total usage (i.e., lengths
plus the safety margins) of the first 𝑐 cars. Note that 𝑈(𝑐) is a strictly increasing
function in 𝑐, so 𝑐 is uniquely determined by 𝑈(𝑐). Let 𝑢_1(𝑐), 𝑢_2(𝑐), 𝑢_3(𝑐), 𝑢_4(𝑐) be
the usage of the four lanes individually in some given assignment. Then, we
have that 𝑈 (𝑐) = 𝑢 1 (𝑐) + 𝑢 2 (𝑐) + 𝑢 3 (𝑐) + 𝑢 4 (𝑐). The four terms on the right are
four parameters in our memoization. The left term is not, but it has a bijection
with 𝑐, which is a parameter in the memoization. Thus, we actually have a
redundancy in our parameters. We can eliminate the parameter 𝑐, since it is
uniquely defined given the values 𝑢 1 (𝑐), 𝑢 2 (𝑐), 𝑢 3 (𝑐), 𝑢 4 (𝑐). This simplification
leaves us with 60^4 ≈ 13 000 000 states, which is well within reason.
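A sketch of the reduced memoization (our own; it reuses the helpers from the listing above, keeps 𝑐 as a function argument, but drops it from the table index since it is determined by the four lane usages):

int dp[62][62][62][62]; // assumed initialized to -1

int ferry(int c, vi used, const vi& A) {
    if (c == sz(A)) return 0;
    // c is uniquely determined by used[0..3], so it is left out of the index
    int& ans = dp[used[0]][used[1]][used[2]][used[3]];
    if (ans != -1) return ans;
    ans = 0;
    rep(i,0,4) {
        if (used[i] + A[c] + 1 > L + 1) continue;
        used[i] += A[c] + 1;
        ans = max(ans, ferry(c + 1, used, A) + 1);
        used[i] -= A[c] + 1;
    }
    return ans;
}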

11.4 Subset DP
Another common theme of DP is subsets, where the state represents a subset
of something. The subset DP is used in many different ways. Sometimes (as
in Subsection 11.6.4), the problem itself is about sets and subsets. Another
common usage is to reduce a solution which requires us to test permutations
of something into instead constructing permutations iteratively, using DP to
remember only which elements have been used so far in the permutation, and not the exact
assignment.


Amusement Park
Swedish Olympiad in Informatics 2012, Online Qualifiers
Lisa has just arrived at an amusement park, and wants to visit each of the 𝑁
attractions exactly once. For each attraction, there are two identical facilities at
different locations in the park. Given the locations of all the facilities, determine
which facility Lisa should choose for each attraction, in order to minimize the
total distance she must walk. Originally, Lisa is at the entrance at coordinates
(0, 0). Lisa must return to the entrance once she has visited every attraction.
Input
The first line contains the integer 1 ≤ 𝑁 ≤ 15, the number of attractions Lisa
wants to visit. Then, 𝑁 lines follow. The 𝑖’th of these lines contains four integers
−10^6 ≤ 𝑥_1, 𝑦_1, 𝑥_2, 𝑦_2 ≤ 10^6. These are the coordinates (𝑥_1, 𝑦_1) and (𝑥_2, 𝑦_2) for
the two facilities of the 𝑖’th attraction.
Output
First, output the smallest distance Lisa must walk. Then, output 𝑁 lines, one
for each attraction. The 𝑖’th line should contain two numbers 𝑎 and 𝑓 – the 𝑖’th
attraction Lisa visited (a number between 1 and 𝑁 ), and the facility she visited
(1 or 2).
Consider a partial walk, where we have visited a set 𝑆 of attractions and
currently stand at coordinates (𝑥, 𝑦). Then, any choice up to this point is
irrelevant for the remainder of the problem, which suggests that the parameters
𝑆, 𝑥, 𝑦 form a good DP state. Note that (𝑥, 𝑦) has at most 31 possibilities –
two for each attraction, plus the entrance at (0, 0). Since we have at most 15
attractions, the set 𝑆 of visited attractions has 2^15 possibilities. This gives us
31 · 2^15 ≈ 10^6 states. Each state can be computed in Θ(𝑁) time, by choosing
what attraction to visit next. All in all, we get a complexity of Θ(𝑁^2 2^𝑁).
When coding DP over subsets, we generally use bitsets to represent the subset,
since these map very cleanly to integers (and therefore indices into a vector):


Amusement Park
double best(int at, int visited) {
    // 2N is the number given to the entrance point
    if (visited == (1<<N) - 1) return dist(at, 2*N);
    double ans = inf;
    rep(i,0,N) {
        if (visited&(1<<i)) continue;
        rep(j,0,2) {
            // 2i + j is the number given to the j'th facility
            // of the i'th attraction
            int nat = 2 * i + j;
            ans = min(ans, dist(at, nat) + best(nat, visited | (1<<i)));
        }
    }
    return ans;
}

11.5 Digit DP
Digit DP is a class of problems where we count numbers with certain properties,
up to some limit containing a large number of digits. These properties are
characterized by having the classical properties of DP problems, i.e. being easily
computable if we construct the numbers digit by digit, remembering only
very little information about what those numbers actually were.


Palindrome-Free Numbers
Baltic Olympiad in Informatics 2013 – Antti Laaksonen
A string is a palindrome if it remains the same when it is read backwards. A
number is palindrome-free if it does not contain a palindrome with a length
greater than 1 as a substring. For example, the number 16276 is palindrome-free
whereas the number 17276 is not because it contains the palindrome 727. The
number 10102 is not valid either, since it has 010 as a substring (even though
010 is not a number itself).
Your task is to calculate the total number of palindrome-free numbers in a
given range.
Input
The input contains two numbers 0 ≤ 𝑎 ≤ 𝑏 ≤ 10^18.
Output
Your output should contain one integer: the total number of palindrome-free
numbers in the range 𝑎, 𝑎 + 1, ..., 𝑏 − 1, 𝑏 (including 𝑎 and 𝑏).
First, we apply a common simplification used when solving counting problems on intervals.
Instead of computing the answer for the range 𝑎, 𝑎 + 1, ..., 𝑏 − 1, 𝑏, we will solve
the problem for the intervals [0, 𝑎) and [0, 𝑏 + 1). The answer is then the answer
for the second interval with the answer for the first interval removed. Our lower
limit will then be 0 rather than 𝑎, which simplifies the solution.
Next up, we need an essential observation to turn the problem into a standard
application of digit DP. Palindromes as general objects are very unwieldy in
our situation. Any kind of iterative construction of numbers would have to
bother with digits far back in the number since any of them could be the edge
of a palindrome. Fortunately, it turns out that any palindrome of length greater
than 1 must contain a rather short palindromic substring at its center, namely one
of length 2 (for even-length palindromes) or length 3 (for odd-length palindromes). This means that when
constructing the answer recursively, we only need to care about the last two
digits. When adding a digit to a partially constructed number, it may not be
equal to either of the last two digits.
Before arriving at the general solution, we will solve the problem when the
upper limit is 999...999 – the sequence consisting of 𝑛 nines. In this case, a
simple recursive function will do the trick:


Palindrome-Free Numbers
ll sol(int at, int len, int b1, int b2) {
    if (at == len) return 1; // we have successfully constructed a number
    ll ans = 0;
    rep(d,0,10) {
        // this digit would create a palindrome
        if (d == b2 || d == b1) continue;
        // let -1 represent a leading 0, to avoid the palindrome check
        bool leadingZero = b2 == -1 && d == 0;
        ans += sol(at + 1, len, b2, leadingZero ? -1 : d);
    }
    return ans;
}

// we start with an empty number with leading zeroes
sol(0, n, -1, -1);

We fix all numbers to have length 𝑛, by giving shorter numbers
leading zeroes. Since leading zeroes in a number are not subject to the palindrome
restriction, they must be treated differently. In our case, they are given the
special digit −1 instead, resulting in 11 possible “digits”. Once this function is
memoized, it will have 𝑛 · 2 · 11 · 11 different states, with each state using a loop
iterating only 10 times. Thus, it uses on the order of 1000𝑛 operations. In our
problem, the upper limit has at most 19 digits. Thus, the solution only requires
about 20 000 operations.
Once a solution has been formulated for this simple upper limit, extending
it to a general upper limit is quite natural. First, we will save the upper limit
as a sequence of digits 𝐿. Then, we need to differentiate between two cases in
our recursive function. The partially constructed number is either equal to the
corresponding partial upper limit, or it is less than the limit. In the first case, we
are still constrained by the upper limit – the next digit of our number can not
exceed the next digit of the upper limit. In the other case, the upper limit is
no longer relevant. If a prefix of our number is strictly lower than the prefix of
the upper limit, our number can never exceed the upper limit.
This gives us our final solution:


Palindrome-Free Numbers, General Case


vector<int> L;

ll sol(int at, int len, bool limEq, int b1, int b2) {
    if (at == len) return 1;
    ll ans = 0;
    // we may not exceed the limit for this digit
    // if equal to the prefix of the limit
    rep(d,0,(limEq ? L[at] + 1 : 10)) {
        if (d == b2 || d == b1) continue;
        // the next step will be equal to the prefix if it was now,
        // and our added digit was exactly the limit
        bool limEqNew = limEq && d == L[at];
        bool leadingZero = b2 == -1 && d == 0;
        ans += sol(at + 1, len, limEqNew, b2, leadingZero ? -1 : d);
    }
    return ans;
}

// initially, the number is equal to the
// prefix limit (both being the empty number)
sol(0, n, true, -1, -1);

11.6 Standard Problems


Many problems are variations of known DP problems, or have them as parts of
their solutions. This section will walk you through a few of them.

Knapsack
The knapsack problem is one of the most common standard DP problems. The
problem itself has countless variations. We will look at the “original” knapsack
problem, with constraints making it suitable for a dynamic programming
approach.

Knapsack
Given is a knapsack with an integer capacity 𝐶, and 𝑛 different objects, each
with an integer weight and value. Your task is to select a subset of the items with
maximal value, such that the sum of their weights does not exceed the capacity
of the knapsack.
Input


The integer 𝐶 giving the capacity of the knapsack, and an integer 𝑛, giving the
number of objects. This is followed by the 𝑛 objects, given by their value 𝑣𝑖 and
weight 𝑤𝑖 .
Output
Output the indices of the chosen items.
We are now going to attempt to formulate an 𝑂 (𝑛𝐶) solution. As is often
the case when the solution is a subset of something in DP solutions, we solve
the problem by looking at the subset as a sequence of choices – to either include
an item in the answer or not. In this particular problem, our DP state is rather
minimalistic. Indeed, after including a few items, we are left only with the
remaining items and a smaller knapsack to solve the problem for.
Letting 𝐾 (𝑐, 𝑖) be the maximum value using at most weight 𝑐 and the 𝑖 first
items, we get the recursion
𝐾(𝑐, 𝑖) = max { 𝐾(𝑐, 𝑖 − 1),  𝐾(𝑐 − 𝑤_𝑖, 𝑖 − 1) + 𝑣_𝑖 if 𝑤_𝑖 ≤ 𝑐 }

Translating this recursion into a bottom-up solution gives a rather compact


algorithm:

1: procedure Knapsack(capacity 𝐶, items 𝑛, values 𝑉, weights 𝑊)
2: best ← new int[𝑛 + 1][𝐶 + 1]
3: fill best with −∞
4: best[0][0] ← 0
5: for 𝑖 from 0 to 𝑛 − 1 do
6: for 𝑗 from 0 to 𝐶 do
7: best[𝑖 + 1][𝑗] ← best[𝑖][𝑗]
8: if 𝑗 ≥ 𝑊[𝑖] then
9: best[𝑖 + 1][𝑗] ← max(best[𝑖 + 1][𝑗], best[𝑖][𝑗 − 𝑊[𝑖]] + 𝑉[𝑖])
10: return best

However, this only helps us compute the answer. The problem asks us to
explicitly construct the subset. This step, i.e., tracing what choices we made to
arrive at an optimal solution is called backtracking.
For this particular problem, the backtracking is relatively simple. One
usually proceeds by starting at the optimal state, and then consider all transitions
that lead to this state. Among these, the “best” one is picked. In our case, the
transitions correspond to either choosing the current item, or not choosing it.


Both lead to states which are simple to compute. In the first case,
the state we arrived from must have the same value and capacity, while in the
second case the value should differ by 𝑉[𝑖] and the weight by 𝑊[𝑖].
Make sure to study the implementation closely; this kind of reconstruction
is a bit tricky to get right, but most reconstructions look something like it:

1: procedure KnapsackConstruct(capacity 𝐶, items 𝑛, values 𝑉, weights 𝑊)
2: best ← Knapsack(𝐶, 𝑛, 𝑉, 𝑊)
3: bestCap ← 𝐶
4: for 𝑖 from 𝐶 to 0 do
5: if best[𝑛][𝑖] > best[𝑛][bestCap] then
6: bestCap ← 𝑖
7: ans ← new list
8: for 𝑖 from 𝑛 to 1 do
9: if 𝑊[𝑖 − 1] ≤ bestCap then
10: newVal ← best[𝑖 − 1][bestCap − 𝑊[𝑖 − 1]]
11: if newVal + 𝑉[𝑖 − 1] = best[𝑖][bestCap] then
12: ans.add(𝑖 − 1)
13: bestCap ← bestCap − 𝑊[𝑖 − 1]
14: output ans

Problem 11.1
Knapsack – knapsack
Walrus Weights – walrusweights

A common knapsack variation is a version of the coin change problem. In


this problem, we are instead allowed to use each item an unlimited number of
times, and ask ourselves how many different ways we can choose items in order
to fill the knapsack. We formulate our DP solution in a similar way to ordinary
knapsack. As subproblem, we count the number of ways to fill a knapsack
of capacity 𝑐 using items 𝑖, 𝑖 + 1, .... In such a situation, we have two choices.
We can either use item 𝑖 once more, leaving us with the same set of items
but capacity 𝑐 − 𝑤𝑖 , or decide that we have finished using item 𝑖. In the latter
case, our knapsack will have the same capacity, but we will instead use items
𝑖 + 1, 𝑖 + 2, ....
In this case, the recursion is slightly altered:

𝐾 (𝑐, 𝑖) = 𝐾 (𝑐, 𝑖 + 1) + 𝐾 (𝑐 − 𝑤𝑖 , 𝑖)
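Translated into a bottom-up computation, this counting variant becomes very short. The sketch below is our own; fixing the item order in the outer loop ensures that each combination of items is counted only once:

#include <vector>
using namespace std;

// Count the ways of filling a knapsack of capacity C exactly,
// where each item may be used an unlimited number of times.
long long countWays(int C, const vector<int>& W) {
    vector<long long> ways(C + 1, 0);
    ways[0] = 1; // the empty selection fills capacity 0
    for (int w : W)                   // decide how many of each item ...
        for (int c = w; c <= C; c++)  // ... in a fixed item order
            ways[c] += ways[c - w];
    return ways[C];
}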


Longest Common Subsequence

Longest Common Subsequence


A sequence 𝑎_1, 𝑎_2, ..., 𝑎_𝑛 has 𝑐_1, 𝑐_2, ..., 𝑐_𝑘 as a subsequence if there exist
indices 𝑝_1 < 𝑝_2 < ... < 𝑝_𝑘 such that 𝑎_{𝑝_𝑖} = 𝑐_𝑖. For example, the sequence
⟨1, 1, 5, 3, 3, 7, 5⟩ has ⟨1, 3, 3, 5⟩ as one of its subsequences.
Given two sequences 𝐴 = ⟨𝑎_1, 𝑎_2, ..., 𝑎_𝑛⟩ and 𝐵 = ⟨𝑏_1, 𝑏_2, ..., 𝑏_𝑚⟩, find the
longest sequence 𝑐_1, ..., 𝑐_𝑘 that is a subsequence of both 𝐴 and 𝐵.

When dealing with DP problems on pairs of sequences, a natural subproblem


is to solve the problem for all prefixes of 𝐴 and 𝐵. For the subsequence problem,
some reasoning about what a common subsequence is leads to a recurrence
expressed in this way. Basically, we can do a case analysis on the last letter of
the strings 𝐴 and 𝐵. If the last letter of 𝐴 is not part of a longest common
subsequence, we can simply ignore it, and solve the problem on the two strings
where the last letter of 𝐴 is removed. A similar case is applicable when the
last letter of 𝐵 is not part of a longest common subsequence. A single case
remains – when both the last letter of 𝐴 and the last letter of 𝐵 are part of a
longest common subsequence. In this case, we argue that these two letters
must correspond to the same letter 𝑐𝑖 in the common subsequence. In particular,
they must correspond to the final character of the subsequence (by the definition
of a subsequence). Thus, whenever the two final letters are equal, we may have
the case that they are the last letter of the subsequence, and that the remainder of
the subsequence is the longest common subsequence of 𝐴 and 𝐵 with the final
letter removed.
This yields a simple recursive formulation, which takes Θ(|𝐴||𝐵|) to evaluate
(since each state takes Θ(1) to evaluate).



lcs(𝐴, 𝐵, 𝑛, 𝑚) = max of:
  0                                  if 𝑛 = 0 or 𝑚 = 0
  lcs(𝐴, 𝐵, 𝑛 − 1, 𝑚)               if 𝑛 > 0
  lcs(𝐴, 𝐵, 𝑛, 𝑚 − 1)               if 𝑚 > 0
  lcs(𝐴, 𝐵, 𝑛 − 1, 𝑚 − 1) + 1       if 𝑛, 𝑚 > 0 and 𝑎_𝑛 = 𝑏_𝑚
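A direct memoization of this recursion might look as follows (a sketch of our own; the strings and the memo table are globals for brevity):

#include <algorithm>
#include <string>
#include <vector>
using namespace std;

string A, B;
vector<vector<int>> memo; // memo[n][m] = -1 if not yet computed

// The longest common subsequence of the prefixes A[0..n) and B[0..m).
int lcs(int n, int m) {
    if (n == 0 || m == 0) return 0;
    int& ans = memo[n][m];
    if (ans != -1) return ans;
    ans = max(lcs(n - 1, m), lcs(n, m - 1));
    if (A[n - 1] == B[m - 1]) ans = max(ans, lcs(n - 1, m - 1) + 1);
    return ans;
}

// usage: memo.assign(A.size() + 1, vector<int>(B.size() + 1, -1));
//        int len = lcs((int)A.size(), (int)B.size());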

TODO
Problem 11.2
Longest Common Subsequence – longcommonsubseq


Longest Increasing Subsequence


Problem 11.3
Longest Increasing Subsequence – longincsubseq

Set Cover

Set Cover
You are given a family of subsets 𝑆 1, 𝑆 2, ..., 𝑆𝑘 of some larger set 𝑆 of size 𝑛. Find
a minimum number of subsets 𝑆_{𝑎_1}, 𝑆_{𝑎_2}, ..., 𝑆_{𝑎_𝑙} such that

⋃_{𝑖=1}^{𝑙} 𝑆_{𝑎_𝑖} = 𝑆

i.e., cover the set 𝑆 by taking the union of as few of the subsets 𝑆𝑖 as possible.
For small 𝑘 and large 𝑛, we can solve the problem in Θ(𝑛2^𝑘), by simply
testing each of the 2^𝑘 covers. In the case where we have a small 𝑛 but 𝑘 can be
large, this becomes intractable. Instead, let us apply the principle of dynamic
programming. In a brute force approach, we would perform 𝑘 choices. For each
subset, we would try including it or excluding it. After deciding which of the
first 𝑚 subsets to include, what information is relevant? Well, if we consider
what the goal of the problem is – covering 𝑆 – it would make sense to record
what elements have been included so far. This little trick leaves us with a DP of
Θ(𝑘2𝑛 ) states, one for each subset of 𝑆 we might have reached, plus counting
how many of the subsets we have tried to use so far. Computing a state takes
Θ(𝑛) time, by computing the union of the current cover with the set we might
potentially add. The recursion thus looks like:
cover(𝐶, 𝑘) = 0 if 𝐶 = 𝑆, and otherwise
cover(𝐶, 𝑘) = min(cover(𝐶, 𝑘 + 1), 1 + cover(𝐶 ∪ 𝑆_𝑘, 𝑘 + 1))

This is a fairly standard DP solution. The interesting case occurs when 𝑛 is


small, but 𝑘 is really large, say, 𝑘 = Θ(2^𝑛). In this case, our previous complexity
Θ(𝑛𝑘2^𝑛) turns into Θ(𝑛4^𝑛). Such a complexity is unacceptable for anything but
very small 𝑛. To avoid this, we must rethink our DP a bit.
The second term of the recursive case of cover(𝐶, 𝑘), i.e. 1 + cover(𝐶 ∪ 𝑆_𝑘, 𝑘 + 1),
degenerates to 1 + cover(𝐶, 𝑘 + 1) if 𝑆_𝑘 ⊆ 𝐶, which is never better than the first
term. When 𝑘 is large, this means
many states are essentially useless. In fact, at most 𝑛 of our 𝑘 choices will


actually result in us adding something, since we can only add a new element at
most 𝑛 times.

We have been in a similar situation before, when solving the backtracking


problem Basin City Surveillance in Section 9.3. We were plagued with having
many choices at each state, where a large number of them would fail. Our
solution was to limit our choices to a set where we knew an optimal solution
would be found.

Applying the same change to our set cover solution, we should instead do
DP over our current cover, and only try including sets which are not subsets of
the current cover. So, does this help? How many subsets are there, for a given
cover 𝐶, which are not its subsets? If the size of 𝐶 is 𝑚, there are 2^𝑚 subsets of
𝐶, meaning 2^𝑛 − 2^𝑚 subsets can add a new element to our cover.

To find out how much time this needs, we will use two facts. First of all, there
are \binom{𝑛}{𝑚} subsets of size 𝑚 of a size-𝑛 set. Secondly, the sum
∑_{𝑚=0}^{𝑛} \binom{𝑛}{𝑚} 2^𝑚 = 3^𝑛.

If you are not familiar with this notation or this fact, you probably want to take a
look at Section ?? on binomial coefficients.

So, summing over all possible extending subsets for each possible partial 𝐶,
we get:

∑_{𝑚=0}^{𝑛} \binom{𝑛}{𝑚} (2^𝑛 − 2^𝑚) = 2^𝑛 · 2^𝑛 − 3^𝑛 = 4^𝑛 − 3^𝑛

Closer, but no cigar. Intuitively, we still have a large number of redundant


choices. If our cover contains, say, 𝑛 − 1 elements, there are 2^{𝑛−1} sets which can
extend our cover, but they all extend it in the same way. This sounds wasteful,
and avoiding it is probably the key to getting an asymptotic speedup.

It seems that we are missing some key function which, given a set 𝐴, can
respond to the question: “is there some subset 𝑆_𝑖 that could extend our cover
with exactly the elements of 𝐴?”. If we had such a function, computing all possible
extensions of a cover of size 𝑚 would instead take time 2^{𝑛−𝑚} – the number of
possible extensions to the cover. Last time we managed to extend a cover in time
2^𝑛 − 2^𝑚, but this is exponentially better!


In fact, if we do our summing this time, we get:


∑_{𝑚=0}^{𝑛} \binom{𝑛}{𝑚} 2^{𝑛−𝑚} = ∑_{𝑚=0}^{𝑛} \binom{𝑛}{𝑛−𝑚} 2^{𝑛−𝑚}
                               = ∑_{𝑚=0}^{𝑛} \binom{𝑛}{𝑚} 2^𝑚
                               = 3^𝑛

It turns out our exponential speedup in extending a cover translated into an


exponential speedup of the entire DP.
We are not done yet – this entire algorithm depended on the assumption of
our magical “can we extend a cover with a subset 𝐴?” function. Sometimes,
this function may be quick to compute. For example, if 𝑆 = {1, 2, ..., 𝑛} and the
family 𝑆𝑖 consists of all sets whose sum is less than 𝑛, an extension is possible if
and only if its sum is also less than 𝑛. In the general case, our 𝑆𝑖 are not this
nice. Naively, one might think that in the general case, an answer to this query
would take Θ(𝑛𝑘) time to compute, by checking if 𝐴 is a subset of each of our 𝑘
sets. Yet again, the same clever trick comes to the rescue.
If we have a set 𝑆_𝑖 of size 𝑚 available for use in our cover, just how many
possible extensions could this subset provide? Well, 𝑆_𝑖 itself only has 2^𝑚
subsets. Thus, if we for each 𝑆_𝑖 mark each of its subsets as a possible
extension to a cover, this precomputation only takes 3^𝑛 time (by the same sum
as above).
Since both steps are 𝑂(3^𝑛), this is also our final complexity.
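Putting everything together, here is a C++ sketch of the full 𝑂(3^𝑛) algorithm (our own illustration, with sets represented as bitmasks over the 𝑛 elements):

#include <algorithm>
#include <vector>
using namespace std;

const int INF = 1 << 29;

// Minimum number of the given sets (bitmasks) needed to cover all
// n elements, or INF if no cover exists.
int setCover(int n, const vector<int>& sets) {
    int full = (1 << n) - 1;
    // extendable[A] is true if A is a subset of some S_i
    vector<bool> extendable(1 << n, false);
    for (int s : sets) {
        // mark every subset of s; this is the 3^n precomputation
        for (int sub = s;; sub = (sub - 1) & s) {
            extendable[sub] = true;
            if (sub == 0) break;
        }
    }
    vector<int> cover(1 << n, INF); // cover[C] = sets needed to extend C to full
    cover[full] = 0;
    for (int C = full - 1; C >= 0; C--) {
        int rest = full ^ C; // the elements still missing from C
        // enumerate every non-empty subset A of the complement of C
        for (int A = rest; A > 0; A = (A - 1) & rest) {
            if (extendable[A] && cover[C | A] != INF)
                cover[C] = min(cover[C], 1 + cover[C | A]);
        }
    }
    return cover[0];
}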
Problem 11.4
Square Fields (Easy) – squarefieldseasy
Square Fields (Hard) – squarefieldshard

Chapter Notes

12 Divide and Conquer
A recursive algorithm solves a problem by reducing it to smaller subproblems,
hoping that their solutions can be used to solve the larger problem. So far, the
subproblems we have considered have been “almost the same” as the problem at
hand. We have usually recursed on a series of choices, where each recursive
step made one choice, as in the change-making problem. In particular, our
subproblems often overlapped – solving two different subproblems required
solving a common, third subproblem. In this chapter, we will take another
approach altogether, by splitting our instance into large, disjoint (or almost
disjoint) parts – dividing it – and combining their solutions – conquering
it.

12.1 Inductive Constructions


Inductive construction problems comprise a large class of divide and conquer
problems. The goal is often to construct something, such as a tiling of a grid,
cycles in a graph and so on. In these cases, divide and conquer algorithms aim
to reduce the construction of the whole object to instead constructing smaller
parts which can be combined into the final answer. Such constructions are
often by-products of mathematical induction proofs of the existence of such a
construction. In the following example problems, it is not initially clear that the
object we are asked to construct even exists.

Grid Tiling
In a square grid of side length 2^𝑛, one unit square is blocked (represented by
coloring it black). Your task is to cover the remaining 4^𝑛 − 1 squares with
triominos, 𝐿-shaped tiles consisting of three squares in the following fashion.
The triominos can be rotated by any multiple of 90° (Figure 12.1).


Figure 12.1: The four rotations of a triomino.

The triominos may not overlap each other, nor cover anything outside the
grid. A valid tiling for 𝑛 = 2 would be

Figure 12.2: A possible tiling for 𝑛 = 2.

Input
The input consists of three integers 1 ≤ 𝑛 ≤ 8, 0 ≤ 𝑥 < 2^𝑛 and 0 ≤ 𝑦 < 2^𝑛. The
black square has coordinates (𝑥, 𝑦).
Output
Output the positions and rotations of any valid tiling of the grid.
When tiling a 2^𝑛 × 2^𝑛 grid, it is not immediately clear how the divide and conquer
principle can be used. To be applicable, we must be able to reduce the problem
into smaller instances of the same problem and combine them. The peculiar
side length 2^𝑛 does hint at a possible solution. Aside from the property
that 2^𝑛 · 2^𝑛 − 1 is evenly divisible by 3 (a necessary condition for a tiling to be
possible), it also gives us a natural way of splitting an instance, namely into its 4
quadrants.

Figure 12.3: Splitting the 𝑛 = 3 case into its four quadrants.


Each of these has the size 2^{𝑛−1} × 2^{𝑛−1}, which is also of the form we require
of grids in the problem. The crux lies in that these four new grids do not comply
with the input specification of the problem. While smaller and disjoint, three of
them contain no black square, a requirement of the input. Indeed, a grid of this
size without any black squares can not be tiled using triominos.
The solution lies in the trivial solution to the 𝑛 = 1 case, where we can easily
reduce the problem to four instances of the 𝑛 = 0 case:

Figure 12.4: A solution to the 𝑛 = 1 case.

In the solution, we used a single triomino which blocked a single square in
each of the three quadrants that did not already contain the black square. This
gives us four trivial subproblems of size 1 × 1, where each grid has one blocked
square. We can actually place such a triomino in any grid by placing it in the
center (the only place where a triomino may cover three quadrants at once).

Figure 12.5: Placing a triomino in the corners of the quadrants without a black square.

After this transformation, we can now apply the divide and conquer principle.
We split the grid into its four quadrants, each of which now contain one black
square. This allows us to recursively solve four new subproblems. At some
point, this recursion will finally reach the base case of a 1 × 1 square, which
must already be filled.


1: procedure Tile(𝑁, (𝐵_𝑥, 𝐵_𝑦), (𝑇_𝑥, 𝑇_𝑦))
2: if 𝑁 = 0 then
3: return
4: mid ← 2^{𝑁−1}
5: blocked ← {(0, 0), (mid − 1, 0), (mid − 1, mid − 1), (0, mid − 1)}
6: if 𝐵_𝑥 ≥ mid and 𝐵_𝑦 ≥ mid then
7: blockedQuad ← TOP_RIGHT (= 0)
8: if 𝐵_𝑥 < mid and 𝐵_𝑦 ≥ mid then
9: blockedQuad ← TOP_LEFT (= 1)
10: if 𝐵_𝑥 < mid and 𝐵_𝑦 < mid then
11: blockedQuad ← BOTTOM_LEFT (= 2)
12: if 𝐵_𝑥 ≥ mid and 𝐵_𝑦 < mid then
13: blockedQuad ← BOTTOM_RIGHT (= 3)
14: blocked[blockedQuad] ← (𝐵_𝑥 mod mid, 𝐵_𝑦 mod mid)
15: place(𝑇_𝑥 + mid, 𝑇_𝑦 + mid, blockedQuad)
16: Tile(𝑁 − 1, blocked[0], (𝑇_𝑥 + mid, 𝑇_𝑦 + mid))
17: Tile(𝑁 − 1, blocked[1], (𝑇_𝑥, 𝑇_𝑦 + mid))
18: Tile(𝑁 − 1, blocked[2], (𝑇_𝑥, 𝑇_𝑦))
19: Tile(𝑁 − 1, blocked[3], (𝑇_𝑥 + mid, 𝑇_𝑦))

The time complexity of the algorithm can be computed easily if we use the
fact that each call to tile only takes Θ(1) time except for the four recursive calls.
Furthermore, each call places exactly one tile on the board. Since there are
(4^𝑛 − 1)/3 tiles to be placed, the time complexity must be Θ(4^𝑛). This is
asymptotically optimal, since this is also the size of the output.

Exercise 12.1. It is possible to tile such a grid with triominos colored red, blue
and green such that no two triominos sharing an edge have the same color. Prove
this fact, and give an algorithm to generate such a coloring.

Divisible Subset
Let 𝑛 = 2^𝑘. Given a set 𝐴 of 2𝑛 − 1 integers, find a subset 𝑆 of size exactly 𝑛
such that

∑_{𝑥 ∈ 𝑆} 𝑥

is a multiple of 𝑛.


Input
The input contains an integer 1 ≤ 𝑛 ≤ 2^15 that is a power of two, followed by
the 2𝑛 − 1 elements of 𝐴.
Output
Output the 𝑛 elements of 𝑆.
When given a problem, it is often a good idea to solve a few small cases
by hand. This applies especially to this kind of construction problem, where
constructions for small inputs often shows some pattern or insight into how to
solve larger instances. The case 𝑛 = 1 is not particularly meaningful, since it
is trivially true (any integer is a multiple of 1). When 𝑛 = 2, we get an insight
which might not seem particularly interesting, but is key to the problem. We are
given 2 · 2 − 1 = 3 numbers, and seek two numbers whose sum is even. Given
three numbers, it must have either two numbers which both are even, or two odd
numbers. Both of these cases yield a pair with an even sum.
It turns out that this construction generalizes to larger instances. Generally,
it is easier to do the “divide” part of a divide and conquer solution first, but in
this problem we will do it the other way around. The recursion will follow quite
naturally after we attempt to find a way of combining solutions of smaller
instances into a larger one.
We will lay the groundwork for a reduction of the case 2𝑛 to 𝑛. First,
assume that we could solve the problem for a given 𝑛. The larger instance then
contains 2 · 2𝑛 − 1 = 4𝑛 − 1 numbers, of which we seek 2𝑛 numbers whose sum
is a multiple of 2𝑛. This situation is essentially the same as for the case 𝑛 = 2,
except everything is scaled up by 𝑛. Can we scale our solution up as well?
If we have three sets of 𝑛 numbers whose respective sums are all multiples
of 𝑛, we can find two sets of 𝑛 numbers whose total sum is divisible by 2𝑛.
This construction essentially uses the same argument as for 𝑛 = 2. If the three
subsets have sums 𝑎𝑛, 𝑏𝑛, 𝑐𝑛 and we wish to find two whose sum is a multiple of
2𝑛, this is the same as finding two of the numbers 𝑎, 𝑏, 𝑐 whose sum is a multiple of
2. This is possible, according to the case 𝑛 = 2.
A beautiful generalization indeed, but we still have some remnants of wishful
thinking we need to take care of. The construction assumes that, given 4𝑛 − 1
numbers, we can find three sets of 𝑛 numbers whose sums are divisible by 𝑛.
We have now come to the recursive aspect of the problem. By assumption, we
could solve the problem for 𝑛. This means we can pick any 2𝑛 − 1 of our 4𝑛 − 1
numbers to get our first subset. The subset uses up 𝑛 of our 4𝑛 − 1 numbers,


leaving us with only 3𝑛 − 1 numbers. We keep going, and pick any 2𝑛 − 1 of


these numbers and recursively get a second subset. After this, 2𝑛 − 1 numbers
are left, exactly how many we need to construct our third subset.
The division of the problem was thus into four parts. Three subsets of 𝑛
numbers, and one set of 𝑛 − 1 numbers which we throw away. Coming up with
such a division essentially required us to solve the combination part first, by
generalizing the case 𝑛 = 2:
// Move elements between A and the reserve until A has exactly size elements.
void fillWith(hashset<int>& reserve, hashset<int>& A, int size) {
    while (sz(A) < size) {
        A.insert(*reserve.begin());
        reserve.erase(reserve.begin());
    }
    while (sz(A) > size) {
        reserve.insert(*A.begin());
        A.erase(A.begin());
    }
}

// Requires |A| = 2k - 1; returns a size-k subset whose sum is a multiple of k.
hashset<int> divisibleSubset(int k, hashset<int> A) {
    if (k == 1) return A; // any single number is a multiple of 1

    hashset<int> reserve;
    // Find three subsets of size k/2 with sums divisible by k/2
    fillWith(reserve, A, k - 1);
    hashset<int> pa = divisibleSubset(k / 2, A);
    for (int x : pa) A.erase(x);

    fillWith(reserve, A, k - 1);
    hashset<int> pb = divisibleSubset(k / 2, A);
    for (int x : pb) A.erase(x);

    fillWith(reserve, A, k - 1);
    hashset<int> pc = divisibleSubset(k / 2, A);

    // Choose the two subsets whose total sum is a multiple of k
    ll as = accumulate(all(pa), 0LL);
    ll bs = accumulate(all(pb), 0LL);
    ll cs = accumulate(all(pc), 0LL);

    hashset<int> ans;
    if ((as + bs) % k == 0) {
        ans.insert(all(pa));
        ans.insert(all(pb));
    } else if ((as + cs) % k == 0) {
        ans.insert(all(pa));
        ans.insert(all(pc));
    } else {
        ans.insert(all(pb));
        ans.insert(all(pc));
    }
    return ans;
}

This complexity is somewhat more difficult to analyze. Now, each call to


divisibleSubset takes time linear in 𝑛, and makes 3 recursive calls with 𝑛/2.
Thus, the complexity obeys the recurrence 𝑇(𝑛) = 3𝑇(𝑛/2) + Θ(𝑛). By the
master theorem, this has complexity Θ(𝑛^{log_2 3}) ≈ Θ(𝑛^{1.6}).

Exercise 12.2. What happens if we, when solving the problem for some 𝑘,
construct 𝑘 − 1 pairs of integers whose sum are even, throw away the remaining
element, and scale the problem down by 2 instead? What is the complexity
then?

Exercise 12.3. The problem can be solved using a similar divide and conquer
algorithm for any 𝑘, not just those which are powers of two¹. In this case, those
𝑘 which are prime numbers can be treated as base cases. How is this done for
composite 𝑘? What is the complexity?

Exercise 12.4. The knight piece in chess can move in 8 possible ways (moving 2
steps in any one direction, and 1 step in one of the two perpendicular directions).
A closed tour is a sequence of knight moves that visits every square of the grid
exactly once and returns to the starting square. A closed tour exists for an 8 × 8
grid.

Figure 12.6: A closed tour on an 8 × 8 grid.

Give an algorithm to construct a tour for any 2^𝑛 × 2^𝑛 grid with 𝑛 ≥ 3.

Exercise 12.5. An 𝑛-bit Gray code is a sequence of all 2^𝑛 bit strings of length
𝑛, such that two adjacent bit strings differ in only one position. The first and
¹This result is known as the Erdős–Ginzburg–Ziv theorem.


last strings of the sequence are considered adjacent. Possible Gray codes for the
first few 𝑛 are
𝑛 = 1: 0 1
𝑛 = 2: 00 01 11 10
𝑛 = 3: 000 010 110 100 101 111 011 001
Give an algorithm to construct an 𝑛-bit Gray code for any 𝑛.

Problem 12.1
Bell Ringing – bells
Hamiltonian Hypercube – hypercube

12.2 Merge Sort


Merge sort is a sorting algorithm which uses divide and conquer. It is rather
straightforward, and works by recursively sorting smaller and smaller parts of
the array. When sorting an array by dividing it into parts and combining their
solutions, there is an obvious candidate for how to perform this partitioning.
Namely, splitting the array into two halves and sorting them. When splitting
an array in half repeatedly, we will eventually reach a rather simple base case.
An array containing a single element is already sorted, so it is trivially solved.
If we do so recursively, we get the recursion tree in Figure 12.7. Coding this
recursive split is easy.


Figure 12.7: The recursion tree given when performing a recursive split of the array
[5, 1, 6, 3, 7, 2, 0, 4].

When we have sorted the two halves, we need to combine them to get a sorted
version of the entire array. The procedure to do this is based on a simple insight.
If an array 𝐴 is partitioned into two smaller arrays 𝑃 1 and 𝑃2 , the smallest value


of 𝐴 must be either the smallest value of 𝑃1 or the smallest value of 𝑃2 . This


insight gives rise to a simple iterative procedure, where we repeatedly compare
the smallest values of 𝑃1 and 𝑃2 , extract the smaller one of them, and append it
to our sorted array.
void sortVector(vector<int>& a, int l, int r) {
    if (r - l <= 1) return; // base case: a single element is already sorted
    int m = (l + r) / 2;
    sortVector(a, l, m); // sort left half
    sortVector(a, m, r); // sort right half

    // combine the answers
    vector<int> answer;
    int x = l;
    int y = m;
    while (x < m || y < r) {
        if (x == m || (y < r && a[y] < a[x])) {
            answer.push_back(a[y]);
            y++;
        } else {
            answer.push_back(a[x]);
            x++;
        }
    }
    for (int i = l; i < r; ++i) {
        a[i] = answer[i - l];
    }
}

To compute the complexity, consider the recursion tree in Figure 12.7. We


make one call with 8 elements, two calls with 4 elements, and so on. Further, the
combining procedure takes Θ(𝑙) time for a call with 𝑙 elements. In the general
case of 𝑛 = 2^𝑘 elements, this means merge sort takes time

Σ_{𝑖=0}^{𝑘} 2^𝑖 · Θ(2^(𝑘−𝑖)) = Θ(𝑘 · 2^𝑘)

Since 𝑘 = log₂ 𝑛, this means the complexity is Θ(𝑛 log 𝑛).

Exercise 12.6. Our complexity analysis assumed that the length of the array is a
power of 2. The complexity is the same in the general case. Prove this fact.

Exercise 12.7. Given an array 𝐴 of size 𝑛, we call the pair 𝑖 < 𝑗 an inversion
of 𝐴 if 𝐴[𝑖] > 𝐴[ 𝑗]. Adapt the merge sort algorithm to count the number of
inversions of an array in Θ(𝑛 log 𝑛).


12.3 Binary Search


The binary search is a common component of other solutions. Given a number
𝐿 and a non-decreasing function 𝑓 : R → R, we wish to find the greatest 𝑥
such that 𝑓 (𝑥) ≤ 𝐿. Additionally, we need two numbers lo and hi, such that
𝑓 (lo) ≤ 𝐿 < 𝑓 (hi).
It is not difficult to see how we solve this problem. Consider the number
mid = (lo + hi)/2. If 𝑓 (mid) ≤ 𝐿, then we know that the answer must lie somewhere
in the interval [mid, hi). On the other hand, 𝐿 < 𝑓 (mid) gives us a better upper
bound on the answer, which must then be contained in [lo, mid). Computing 𝑓 (mid)
thus allowed us to halve the interval in which the answer can be.
Figure 12.8: Two iterations of binary search.

We can repeat this step until we get close to 𝑥.

1 const double precision = 1e-7;


2
3 double binarySearch(double lo, double hi, double lim) {
4 while (hi - lo > precision) {
5 double mid = (lo + hi) / 2;
6 if (lim < f(mid)) hi = mid;
7 else lo = mid;
8 }
9 return lo;
10 }

210
12.3. B INARY S EARCH

Notice that we are actually computing an approximation of 𝑥 within some
given precision (10^-7 in the example implementation), but this is often useful
enough.
Competitive Tip
Remember that the double-precision floating point type only has about 15 significant
digits of precision. If the limits in your binary search are on the order of 10^𝑥, this means
that using a binary search precision smaller than 10^(𝑥−15) may cause an infinite
loop. This happens because the difference between lo and the next representable double is
then larger than your precision.
As an example, the following parameters cause our binary search with precision
10^-7 to fail.
double f(double x) {
    return 0;
}

double lo = 1e12;
double hi = nextafter(lo, 1e100);
binarySearch(lo, hi, 0);

An alternative when dealing with limited precision is to perform the binary search a
fixed number of iterations:
double binarySearch(double lo, double hi, double lim) {
    rep(i,0,60) {
        double mid = (lo + hi) / 2;
        if (lim < f(mid)) hi = mid;
        else lo = mid;
    }
    return lo;
}

The complexity of binary search depends on how good of an approximation
we want. Originally, the interval we are searching in has length hi − lo. After
halving the interval 𝑐 times, it has size (hi − lo)/2^𝑐. If we binary search until our
interval has some size 𝑝, this means we must choose 𝑐 such that

(hi − lo)/2^𝑐 ≤ 𝑝
(hi − lo)/(𝑝 · 2^𝑐) ≤ 1
(hi − lo)/𝑝 ≤ 2^𝑐
log₂((hi − lo)/𝑝) ≤ 𝑐


For example, if we have an interval of size 10^9 which we wish to binary search
down to 10^-7, this would require ⌈log₂ 10^16⌉ = 54 iterations of binary search.
Now, let us study some applications of binary search.

Optimization Problems

Cutting Hot Dogs


The finals of the Swedish Olympiad in Informatics is soon to be arranged. During
the competition the participants must have a steady supply of hot dogs to get
enough energy to solve the problems. Being foodies, the organizing committee
couldn’t just buy ready-to-cook hot dogs – they have to make them on their own!
The organizing committee has prepared 𝑁 long rods of hot dog. These rods
have length 𝑎 1, 𝑎 2, ..., 𝑎 𝑁 centimeters. It is now time to cut up these rods into
actual hot dogs. A rod can be cut into any number 𝑘 of hot dogs using 𝑘 − 1
cuts. Note that parts of one rod cannot be combined with parts of another rod,
due to the difference in taste. It is allowed to leave leftovers from a rod that will
not become a hot dog.
In total, the contest has 𝑀 participants. Each contestant should receive a
single hot dog, and all of their hot dogs should be of the same length. What is
the maximal hot dog length 𝐿 the committee can use to cut 𝑀 hot dogs of length
𝐿?
Input
The first line contains two integers 1 ≤ 𝑁 ≤ 10 000 and 1 ≤ 𝑀 ≤ 10^9.
The next line contains 𝑁 real numbers 0 < 𝑎_1, 𝑎_2, ..., 𝑎_𝑁 ≤ 10^6, the lengths of the
hot dog rods in centimeters.
Output
Output a single real number – the maximal hot dog length possible in centimeters.
Any answer with a relative or absolute error of at most 10^-6 is acceptable.

Solution. This problem is not only an optimization problem, but a monotone


one. The monotonicity lies in that while we only ask for a certain maximal hot
dog length, all lengths below it would also work (in the sense of being able
to cut 𝑀 hot dogs of this length), while all lengths above the maximal
length produce fewer than 𝑀 hot dogs. Monotone optimization problems make
it possible to remove the optimization aspect by inverting the problem. Instead
of asking ourselves what the maximum length is, we can instead ask how many


hot dogs 𝑓 (𝑥) can be constructed from a given length 𝑥. After this inversion,
the problem is now on the form which binary search solves: we wish to find
the greatest 𝑥 such that 𝑓 (𝑥) = 𝑀 (replacing ≤ with = is equivalent whenever
we know that 𝑓 (𝑥) assumes the value we are looking for). We know that
this length is at most max_𝑖 𝑎_𝑖 ≤ 10^6, which gives us the interval (0, 10^6] to
search in.
What remains is to actually compute the function 𝑓 (𝑥). In our case, this can
be done by considering just a single rod. If we want to construct hot dogs of
length 𝑥, we can get at most ⌊𝑎_𝑖/𝑥⌋ hot dogs from a rod of length 𝑎_𝑖. Summing
this for every rod gives us our solution.

1: procedure CountRods(lengths 𝐴, hot dog length 𝑥)
2: dogs ← 0
3: for each 𝑙 ∈ 𝐴 do
4: dogs ← dogs + ⌊𝑙/𝑥⌋
5: return dogs
6: procedure HotDogs(lengths 𝐴, participants 𝑀)
7: 𝐿 ← 0, 𝐻 ← 10^6
8: while 𝐻 − 𝐿 > 10^-7 do
9: mid ← (𝐿 + 𝐻 )/2
10: if CountRods(𝐴, mid) < 𝑀 then
11: 𝐻 ← mid
12: else
13: 𝐿 ← mid
14: output 𝐿

The key to our problem was that the number of hot dogs constructible with a
length 𝑥 was monotonically decreasing with 𝑥. It allowed us to perform binary
search on the answer, a powerful technique which is a component of many
optimization problems. In general, it is often easier to determine if an answer is
acceptable, rather than computing a maximal answer.
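
As a concrete illustration, a C++ sketch of this solution could look as follows (the function and helper names are our own, and we use the fixed-iteration binary search from earlier):

double maxHotDogLength(const vector<double>& rods, long long M) {
    // f(x): the number of hot dogs of length x we can cut in total
    auto countDogs = [&](double x) {
        long long dogs = 0;
        for (double rod : rods) dogs += (long long)(rod / x);
        return dogs;
    };
    double lo = 0, hi = 1e6;
    for (int i = 0; i < 60; ++i) {
        double mid = (lo + hi) / 2;
        if (countDogs(mid) >= M) lo = mid; // mid is still feasible
        else hi = mid;
    }
    return lo;
}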

Searching in a Sorted Array


The classical application of binary search is to find the position of an element 𝑥
in a sorted array 𝐴 of length 𝑛. Applying binary search to this is straightforward.


At first, we know nothing about the location of the element – its position could be
any one of [0, 𝑛). So, we consider the middle element, mid = ⌊𝑛/2⌋, and compare
𝐴[mid] to 𝑥. Since 𝐴 is sorted, this leaves us with three cases:

• 𝐴[𝑚𝑖𝑑] = 𝑥 – and we are done

• 𝑥 < 𝐴[𝑚𝑖𝑑] – since the array is sorted, any occurrence of 𝑥 must be to


the left of 𝑚𝑖𝑑

• 𝐴[𝑚𝑖𝑑] < 𝑥 – by the same reasoning, 𝑥 can only lie to the right of 𝑚𝑖𝑑.

The last two cases both halve the size of the sub-array which 𝑥 could be inside.
Thus, after doing this halving log2 𝑛 times, we have either found 𝑥 or can
conclude that it is not present in the array.

1: procedure Search(array 𝐴, target 𝑥)


2: 𝐿 ← 0, 𝐻 ← |𝐴|
3: while 𝐻 − 𝐿 > 0 do
4: mid ← b(𝐿 + 𝐻 )/2c
5: if 𝐴[mid] = 𝑥 then
6: return mid
7: else if 𝑥 < 𝐴[mid] then
8: 𝐻 = mid
9: else
10: 𝐿 = mid + 1
11: return −1
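
In C++, this primitive is provided by the standard library: lower_bound returns an iterator to the first element that is not less than the target. A short sketch:

// Returns the index of x in the sorted vector A, or -1 if x is not present.
int search(const vector<int>& A, int x) {
    auto it = lower_bound(A.begin(), A.end(), x);
    if (it != A.end() && *it == x) return it - A.begin();
    return -1;
}

Using the library version also sidesteps the endpoint bugs discussed in the tip below.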

Competitive Tip
When binary searching over discrete domains, care must be taken. Many bugs have
been caused by improper binary searches2 .
The most common class of bugs is related to the endpoints of your interval (i.e.
whether they are inclusive or exclusive). Be explicit regarding this, and take care that
each part of your binary search (termination condition, midpoint selection, endpoint
updates) use the same interval endpoints.

Exercise 12.8. Ballot Boxes – ballotboxes


2In fact, for many years the binary search in the standard Java run-time had a bug:
http://bugs.java.com/bugdatabase/view_bug.do?bug_id=6412541


Generalized Binary Search


Binary search can also be used to find all points where a monotone function
changes value (or equivalently, all the intervals on which a monotone function
is constant). Often, this is used in problems on large sequences (often with
𝑛 = 100 000 elements), where the naive solution of iterating through all
Θ(𝑛^2) contiguous sub-sequences is too slow.

Or Max
Petrozavodsk Winter Training Camp 2015
Given is an array 𝐴 of integers. Let

𝐵(𝑖, 𝑘) = 𝐴[𝑖] | 𝐴[𝑖 + 1] | ... | 𝐴[𝑖 + 𝑘 − 1]

i.e. the bitwise or of the 𝑘 consecutive numbers starting with the 𝑖’th,

𝑀 (𝑖, 𝑘) = max{𝐴[𝑖], 𝐴[𝑖 + 1], ..., 𝐴[𝑖 + 𝑘 − 1]}

i.e. the maximum of the 𝑘 consecutive numbers starting with the 𝑖’th, and

𝑆 (𝑖, 𝑘) = 𝐵(𝑖, 𝑘) + 𝑀 (𝑖, 𝑘)

For each 1 ≤ 𝑘 ≤ 𝑛, find the maximum of 𝑆 (𝑖, 𝑘).


Input
The first line contains the length 1 ≤ 𝑛 ≤ 105 of 𝐴.
The next and last line contains the 𝑛 values of 𝐴 (0 ≤ 𝐴[𝑖] < 2^16), separated
by spaces.
Output
Output 𝑛 integers, the maximum values of 𝑆 (𝑖, 𝑘) for 𝑘 = 1, 2, ..., 𝑛.

As an example, consider the array in Figure 12.9. The best answer for 𝑘 = 1
would be 𝑆 (0, 1), with both maximal element and bitwise or 5, totaling 10. For
𝑘 = 2, we have 𝑆 (6, 2) = 7 + 4 = 11.
This problem can easily be solved in Θ(𝑛^2), by computing every 𝑆 (𝑖, 𝑘)
iteratively. We can compute all the 𝐵(𝑖, 𝑘) and 𝑀 (𝑖, 𝑘) using the recursions

𝐵(𝑖, 𝑘) = 0 if 𝑘 = 0, and 𝐵(𝑖, 𝑘) = 𝐵(𝑖, 𝑘 − 1) | 𝐴[𝑖 + 𝑘 − 1] if 𝑘 > 0


i=0 i=1 i=2 i=3 i=4 i=5 i=6 i=7 i=8 i=9

5 1 4 2 2 0 4 3 1 2

101 001 100 010 010 000 100 011 001 010

Figure 12.9: Example array, with the numbers additionally written in binary.

𝑀 (𝑖, 𝑘) = 0 if 𝑘 = 0, and 𝑀 (𝑖, 𝑘) = max{𝑀 (𝑖, 𝑘 − 1), 𝐴[𝑖 + 𝑘 − 1]} if 𝑘 > 0

by looping over 𝑘, once we fix an 𝑖. With 𝑛 = 100 000, this approach is too slow.
The difficulty of the problem lies in 𝑆 (𝑖, 𝑘) consisting of two basically
unrelated parts – the maximal element and the bitwise or of a segment. When
maximizing sums of unrelated quantities that put constraints on each other,
brute force often seems like a good idea. This is basically what we did in the
Buying Books problem (Section 9.4), where we minimized the sum of two parts
(postage and book costs) which constrained each other (buying a book forced us
to pay postage to its store) by brute forcing over one of the parts (the set of stores
to buy from). Since the bitwise or is much more complicated than the maximal
element – it is decided by an entire interval rather than a single element – we
are probably better of doing brute force over the maximal element. Our brute
force will consist of fixing which element is our maximal element, by assuming
that 𝐴[𝑚] is the maximal element.
With this simplification in hand, only the bitwise or remains. We could
now solve the problem by looping over all the left endpoints of the interval and
all the right endpoints of the interval. At a first glance, this seems to actually
worsen the complexity. Indeed, this takes quadratic time for each 𝑚 (on average),
resulting in a cubic complexity.
This is where we use our new technique. It turns out that, once we fix 𝑚, there
are only a few possible values for the bitwise or of the intervals containing the 𝑚’th
element. Any such interval 𝐴[𝑙], 𝐴[𝑙 + 1], ..., 𝐴[𝑚 − 1], 𝐴[𝑚], 𝐴[𝑚 + 1], ..., 𝐴[𝑟 −
1], 𝐴[𝑟] can be split into two parts: one to the left, 𝐴[𝑙], 𝐴[𝑙 + 1], ..., 𝐴[𝑚 − 1], 𝐴[𝑚],
and one to the right, 𝐴[𝑚], 𝐴[𝑚 + 1], ..., 𝐴[𝑟 − 1], 𝐴[𝑟]. The bitwise or of either
of these two parts is a monotone function (in the part’s length), and can only
assume at most 16 different values!
Studying Figure 12.10 gives a hint about why. The first row shows the binary


i=0 i=1 i=2 i=3 i=4 i=5 i=6 i=7 i=8 i=9

101 001 100 010 010 000 100 011 001 010

111 111 110 110 110 100 100 111 111 111

7 7 6 6 6 4 4 7 7 7

Figure 12.10: The bitwise or of the left and right parts, with an endpoint in 𝑚 = 6

values of the array, with 𝑚 = 6 (our presumed maximal element) marked. The
second row shows the binary values of the bitwise or of the interval [𝑖, 𝑚] or
[𝑚, 𝑖] (depending on whether 𝑚 is the right or left endpoint). The third line
shows the decimal values of the second row.
For example, when extending the interval [2, 6] (with bitwise or 110) to
the left, the new bitwise or will be 110|001. This is the only way the bitwise
or can change – when the new value includes bits which so far have not been
set. Obviously, this can only happen at most 16 times, since the values in 𝐴 are
bounded by 2^16.
For a given 𝑚, this gives us a partition of all the elements, by the bitwise or
of the interval [𝑚, 𝑖]. In Figure 12.10, the left elements will be partitioned into
[0, 1], [2, 4], [5, 6]. The right elements will be partitioned into [6, 6], [7, 9].
These partitions are everything we need to compute the final answer.
For example, if we pick the left endpoint from the part [2, 4] and the right
endpoint from the part [7, 9], we would get a bitwise or that is 6 | 7 = 7, of a
length between 4 and 8, together with the 4 as the presumed maximal element.
For each maximal element, we get at most 16 · 16 such choices, totaling less
than 256𝑁 such choices. From these, we can compute the final answer using a
simple sweep line algorithm.
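
As a sketch of the key step (the helper name leftOrs and the use of the book's typedefs are our own choices), the partition of the left elements by the bitwise or of [𝑖, 𝑚] can be computed in linear time; the right side is symmetric:

// For a fixed m, compute the distinct bitwise ors of the intervals [i, m]
// for i = m, m-1, ..., 0, each together with the leftmost index attaining it.
// Since 0 <= A[i] < 2^16, only around 16 distinct values can appear.
vector<pii> leftOrs(const vi& A, int m) {
    vector<pii> parts; // (or of [i, m], leftmost such i)
    int cur = 0;
    for (int i = m; i >= 0; --i) {
        cur |= A[i];
        if (parts.empty() || parts.back().first != cur)
            parts.push_back(pii(cur, i));
        else
            parts.back().second = i;
    }
    return parts;
}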

12.4 Karatsuba’s algorithm


Karatsuba’s algorithm was developed by the Russian mathematician Anatoly
Karatsuba and published in the early 1960s. It is one of the earliest examples of a
divide and conquer algorithm, and is used to quickly multiply large numbers.
While multiplying small numbers (i.e. those that fit in the integer types of your
favorite programming language) is considered to be a Θ(1) operation, this is


not the case for arbitrarily large integers. We will look at Karatsuba as a way of
multiplying polynomials, but this can easily be extended to multiplying integers.

Polynomial Multiplication
Given two 𝑛-degree polynomials (where 𝑛 can be large) 𝑝(𝑥) = Σ_{𝑖=0}^{𝑛} 𝑎_𝑖 𝑥^𝑖 and
𝑞(𝑥) = Σ_{𝑖=0}^{𝑛} 𝑏_𝑖 𝑥^𝑖, compute their product

(𝑝𝑞)(𝑥) = Σ_{𝑖=0}^{2𝑛} 𝑥^𝑖 ( Σ_{𝑗=0}^{𝑖} 𝑎_𝑗 𝑏_{𝑖−𝑗} )

The naive multiplication algorithm evaluates this using Θ(𝑛^2) multiplications
(e.g. by two nested loops).
It turns out we can do this faster, using a recursive transformation. If we split
the polynomials 𝑝 and 𝑞 into their upper and lower 𝑘 = 𝑛/2 coefficients (if 𝑛 is odd,
we pad the polynomials with a leading zero), so that 𝑝(𝑥) = 𝑝𝑙(𝑥)𝑥^𝑘 + 𝑝𝑟(𝑥)
and 𝑞(𝑥) = 𝑞𝑙(𝑥)𝑥^𝑘 + 𝑞𝑟(𝑥), their product is equal to

(𝑝𝑞)(𝑥) = (𝑝𝑙(𝑥)𝑥^𝑘 + 𝑝𝑟(𝑥))(𝑞𝑙(𝑥)𝑥^𝑘 + 𝑞𝑟(𝑥))
= 𝑝𝑙(𝑥)𝑞𝑙(𝑥)𝑥^𝑛 + (𝑝𝑙(𝑥)𝑞𝑟(𝑥) + 𝑝𝑟(𝑥)𝑞𝑙(𝑥))𝑥^𝑘 + 𝑝𝑟(𝑥)𝑞𝑟(𝑥)

This formula requires multiplying 4 pairs of 𝑘-degree polynomials instead, which
we can recursively compute, resulting in the time complexity recurrence 𝑇(𝑛) =
4𝑇(𝑛/2) + Θ(𝑛). Using the master theorem gives us the solution 𝑇(𝑛) = Θ(𝑛^2),
which is no faster than the naive multiplication.
However, we can compute 𝑝𝑙(𝑥)𝑞𝑟(𝑥) + 𝑝𝑟(𝑥)𝑞𝑙(𝑥) using only one multi-
plication instead of two. Both of these terms appear in the expansion of the single
product (𝑝𝑙(𝑥) + 𝑝𝑟(𝑥))(𝑞𝑙(𝑥) + 𝑞𝑟(𝑥)). This is a product of 𝑘-degree polynomials,
obtained by simply throwing away the multiplicative factors 𝑥^𝑘 that made
𝑝𝑙(𝑥)𝑥^𝑘 + 𝑝𝑟(𝑥) and 𝑞𝑙(𝑥)𝑥^𝑘 + 𝑞𝑟(𝑥) 𝑛-degree polynomials:

(𝑝𝑙(𝑥) + 𝑝𝑟(𝑥))(𝑞𝑙(𝑥) + 𝑞𝑟(𝑥)) = 𝑝𝑙(𝑥)𝑞𝑙(𝑥) + 𝑝𝑙(𝑥)𝑞𝑟(𝑥) + 𝑝𝑟(𝑥)𝑞𝑙(𝑥) + 𝑝𝑟(𝑥)𝑞𝑟(𝑥)

so that

𝑝𝑙(𝑥)𝑞𝑟(𝑥) + 𝑝𝑟(𝑥)𝑞𝑙(𝑥) = (𝑝𝑙(𝑥) + 𝑝𝑟(𝑥))(𝑞𝑙(𝑥) + 𝑞𝑟(𝑥)) − 𝑝𝑙(𝑥)𝑞𝑙(𝑥) − 𝑝𝑟(𝑥)𝑞𝑟(𝑥)

This means we only need to compute three 𝑘-degree multiplications: (𝑝𝑙(𝑥) +
𝑝𝑟(𝑥))(𝑞𝑙(𝑥) + 𝑞𝑟(𝑥)), 𝑝𝑙(𝑥)𝑞𝑙(𝑥) and 𝑝𝑟(𝑥)𝑞𝑟(𝑥). Our time complexity recurrence


is then reduced to 𝑇(𝑛) = 3𝑇(𝑛/2) + Θ(𝑛), which by the master theorem is
Θ(𝑛^(log₂ 3)) ≈ Θ(𝑛^1.585).
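
A C++ sketch of this recursion on coefficient vectors could look as follows (a simplified, unoptimized version assuming non-empty inputs; real implementations typically use a larger naive base case):

// Multiply two polynomials given as coefficient vectors (p[i] is the
// coefficient of x^i). Returns the 2n - 1 coefficients of the product.
vector<long long> karatsuba(vector<long long> p, vector<long long> q) {
    int n = max(sz(p), sz(q));
    p.resize(n); q.resize(n);
    if (n <= 4) { // base case: naive multiplication
        vector<long long> res(2 * n - 1);
        rep(i,0,n) rep(j,0,n) res[i + j] += p[i] * q[j];
        return res;
    }
    int k = (n + 1) / 2;
    // lower k coefficients (pr, qr) and upper n - k coefficients (pl, ql)
    vector<long long> pr(p.begin(), p.begin() + k), pl(p.begin() + k, p.end());
    vector<long long> qr(q.begin(), q.begin() + k), ql(q.begin() + k, q.end());
    vector<long long> low = karatsuba(pr, qr);  // p_r q_r
    vector<long long> high = karatsuba(pl, ql); // p_l q_l
    vector<long long> psum(k), qsum(k);         // p_l + p_r and q_l + q_r
    rep(i,0,k) psum[i] = pr[i] + (i < sz(pl) ? pl[i] : 0);
    rep(i,0,k) qsum[i] = qr[i] + (i < sz(ql) ? ql[i] : 0);
    vector<long long> mid = karatsuba(psum, qsum);
    rep(i,0,sz(low)) mid[i] -= low[i];   // subtract p_r q_r ...
    rep(i,0,sz(high)) mid[i] -= high[i]; // ... and p_l q_l
    vector<long long> res(2 * n - 1);
    rep(i,0,sz(low)) res[i] += low[i];
    rep(i,0,sz(mid)) res[i + k] += mid[i];
    rep(i,0,sz(high)) res[i + 2 * k] += high[i];
    return res;
}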

Exercise 12.9. Polynomial Multiplication 2 – polymul2

12.5 Chapter Notes

13 Data Structures
This chapter extends Chapter 6 further by showing the two most common
advanced data structures that make an appearance in algorithmic problem
solving. We do not take the approach of simply presenting the data structure
and the problem that it solves. Instead, we take the same approach as we would
for any example problem or algorithm in this book, by gradually improving an
initial naive solution using additional insights. This is particularly valuable in
this case, since certain problems may require variations of these structures in
which only some of the optimizations we show are applicable. Some of the
techniques we show during this journey can also occasionally be applied
to other problems, so make sure to digest not only the final results, but every
individual step on the journey to it.

13.1 Disjoint Sets


In the Connectivity Problem, we want to determine whether two vertices in
the graph are in the same connected component. This problem can be solved
using a Depth-First Search (Section 14.2). For now, we will instead focus on an
extension of this problem called the Dynamic Connectivity Problem, Additions
only1, where we may also alter the graph by adding edges.

Dynamic Connectivity, Additions Only


Given a graph 𝐺 which initially consists of 𝑉 disconnected vertices, you will
receive 𝑄 queries of two types:

1. take two vertices 𝑣 and 𝑤, and add an edge between them

2. determine whether vertices 𝑣 and 𝑤 are currently in the same component


This problem can easily be solved in 𝑂 (𝑄^2), by performing a DFS after each query
to partition the graph into its connected components. Since the graph has
at most 𝑄 edges, a DFS takes 𝑂 (𝑄) time.
1The Dynamic Connectivity problem where edges may also be removed can be solved by a data
structure called a Link-Cut tree, which is not discussed in this book.


We can improve this by explicitly computing the connected components at


each step. Originally, we have 𝑉 components of a single vertex each. When
adding an edge between two vertices that are in different components, these
components will now merge. Note that it is irrelevant which vertices we are
adding an edge between – it is only important what components the vertices
belong to. When merging two connected components to one, we iterate through
all the vertices of one component and add them to the other. Since we make
at most 𝑉 joins of different components, and the components involved contain
at most 𝑉 vertices, these queries only take 𝑂 (𝑉 2 ) time. Determining whether
two vertices are in the same component can then be done in 𝑂 (𝑉 ), leaving us
with a total complexity of 𝑂 (𝑉 (𝑉 + 𝑄)). We can speed this up further by using
a simple look-up table 𝑐𝑜𝑚𝑝 [𝑣] for the vertices, which stores some identifier for
the component a vertex is in. We can then respond to a connectivity query in
Θ(1) by comparing the value of 𝑐𝑜𝑚𝑝 [𝑣] and 𝑐𝑜𝑚𝑝 [𝑤]. Our complexity is then
𝑂 (𝑉 2 + 𝑄) instead.
Finally, we will use a common trick to improve the complexity to 𝑂 (𝑉 log 𝑉 +
𝑄) instead. Whenever we merge two components of size 𝑎 and 𝑏, we can do this
in 𝑂 (min(𝑎, 𝑏)) instead of 𝑂 (𝑎 + 𝑏) by merging the smaller component into the
larger component. Any individual vertex can be part of the smaller component
at most 𝑂 (log 𝑉 ) times. Since the total size of a component is always at least
twice the size of the smaller component, this means that if a vertex is merged as
part of the smaller component 𝑘 times, the new component size must be at least
2^𝑘. This cannot exceed 𝑉, meaning 𝑘 ≤ log₂ 𝑉. If we sum this up for every
vertex, we arrive at the 𝑂 (𝑉 log 𝑉 + 𝑄) complexity.

Disjoint Set
struct DisjointSets {
    vector<vector<int>> components;
    vector<int> comp;

    DisjointSets(int elements) : components(elements), comp(elements) {
        iota(all(comp), 0);
        for (int i = 0; i < elements; ++i) components[i].push_back(i);
    }

    void unionSets(int a, int b) {
        a = comp[a]; b = comp[b];
        if (a == b) return;
        if (components[a].size() < components[b].size()) swap(a, b);
        for (int it : components[b]) {
            comp[it] = a;
            components[a].push_back(it);
        }
    }
};

A somewhat faster2 version of this structure instead performs the merges


of components lazily. When merging two components, we do not update
comp[𝑣] for every vertex in the smaller component. Instead, if 𝑎 and 𝑏 are
the representative vertices of the smaller and the larger component, we only
merge 𝑎 by setting comp[𝑎] = 𝑏. However, we still need to perform the merges.
Whenever we try to find the component a vertex lies in, we perform all the
merges we have stored so far.
Improved Disjoint Set
struct DisjointSets {
    vector<int> comp;

    DisjointSets(int elements) : comp(elements, -1) {}

    void unionSets(int a, int b) {
        a = repr(a); b = repr(b);
        if (a == b) return;
        if (-comp[a] < -comp[b]) swap(a, b);
        comp[a] += comp[b];
        comp[b] = a;
    }

    bool is_repr(int x) { return comp[x] < 0; }

    int repr(int x) {
        if (is_repr(x)) return x;
        while (!is_repr(comp[x])) {
            comp[x] = comp[comp[x]];
        }
        return comp[x];
    }
};

However, it turns out we can sometimes perform many merges at once.


Consider the case where we have 𝑘 merges lazily stored for some vertex 𝑣. Then,
we can perform all the merges of 𝑣, comp[𝑣], comp[comp[𝑣]], . . . at the same
time since they all have the same representative: the representative of 𝑣.
2With regards to actual time, not asymptotically.


Performing Many Merges


int repr(int x) {
    if (comp[x] < 0) return x;
    int par = comp[x];
    comp[x] = repr(par);
    return comp[x];
}

Intuitively, this should be faster. After all, we would be performing at least
𝑘 + (𝑘 − 1) + (𝑘 − 2) + · · · = Θ(𝑘^2) merges in 𝑂 (𝑘) time, counting only
those vertices. If more vertices were part of the merged components, this
number grows even more.
It turns out that this change (called path compression) improves the complexity asymptotically.
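
With this structure, the dynamic connectivity problem from the beginning of the section becomes short to solve. A sketch, assuming each query is given as a type followed by two vertices:

int main() {
    int V, Q;
    cin >> V >> Q;
    DisjointSets ds(V);
    while (Q--) {
        int type, v, w;
        cin >> type >> v >> w;
        if (type == 1) ds.unionSets(v, w); // add an edge between v and w
        else cout << (ds.repr(v) == ds.repr(w) ? "yes" : "no") << "\n";
    }
}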

13.2 Range Queries


Problems often ask us to compute some expression based on some interval of
an array. The expressions are usually easy to compute in Θ(len) where len is
the length of the interval. We will now study some techniques that trade much
faster query responses for a bit of memory and precomputation time.

Prefix Precomputation

Interval Sum
Given a sequence of integers 𝑎 0, 𝑎 1, . . . , 𝑎 𝑁 −1 , you will be given 𝑄 queries of
the form [𝐿, 𝑅). For each query, compute 𝑆 (𝐿, 𝑅) = 𝑎𝐿 + 𝑎𝐿+1 + · · · + 𝑎𝑅−1 .
Computing the sums naively would require Θ(𝑁 ) worst-case time per query
if the intervals are large, for a total complexity of Θ(𝑁 𝑄). If 𝑄 = Ω(𝑁 ) we
can improve this to Θ(𝑁 2 + 𝑄) by precomputing all the answers. To do this in
quadratic time, we use the recurrence
𝑆 (𝐿, 𝑅) = 0 if 𝐿 = 𝑅, and 𝑆 (𝐿, 𝑅) = 𝑆 (𝐿, 𝑅 − 1) + 𝑎_{𝑅−1} otherwise

Using this recurrence we can compute the sequence 𝑆 (𝐿, 𝐿), 𝑆 (𝐿, 𝐿+1), 𝑆 (𝐿, 𝐿+
2), . . . , 𝑆 (𝐿, 𝑁 ) in Θ(𝑁 ) time on average for every 𝐿. This gives us the Θ(𝑁^2 + 𝑄)
complexity.
If the function we are computing has an inverse, we can speed this precomputation
up a bit. Assume that we have computed the values 𝑃 (𝑅) = 𝑎_0 + 𝑎_1 + · · · + 𝑎_{𝑅−1},


i.e. the prefix sums of 𝑎𝑖 . Since this function is invertible (with inverse −𝑃 (𝑅)),
we can compute 𝑆 (𝐿, 𝑅) = 𝑃 (𝑅) − 𝑃 (𝐿). Basically, the interval [𝐿, 𝑅) consists
of the prefix [0, 𝑅) with the prefix [0, 𝐿) removed. As addition is invertible, we
could simply remove the latter prefix 𝑃 (𝐿) from the prefix 𝑃 (𝑅) using subtraction.
Indeed, expanding this expression shows us that

𝑃 (𝑅) − 𝑃 (𝐿) = (𝑎 0 + 𝑎 1 + · · · + 𝑎𝑅−1 ) − (𝑎 0 + 𝑎 1 + · · · + 𝑎𝐿−1 )


= 𝑎𝐿 + · · · + 𝑎𝑅−1 = 𝑆 (𝐿, 𝑅)

This leads to the following algorithm:

1: procedure Prefixes(sequence 𝐴)
2: P ← new int[|𝐴| + 1]
3: for 𝑖 = 0 to |𝐴| − 1 do
4: 𝑃 [𝑖 + 1] ← 𝑃 [𝑖] + 𝐴[𝑖]
5: return 𝑃

for the precomputation, and then

1: procedure IntervalSum(interval [𝐿, 𝑅), prefix table 𝑃)


2: return 𝑃 [𝑅] − 𝑃 [𝐿]

to query for values.


This same technique works for any invertible operation.
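
A C++ sketch of the two procedures for sums (xor queries work the same way, since xor is its own inverse):

// P[i] = a[0] + a[1] + ... + a[i-1]
vector<long long> prefixes(const vector<int>& A) {
    vector<long long> P(A.size() + 1);
    for (size_t i = 0; i < A.size(); ++i) P[i + 1] = P[i] + A[i];
    return P;
}

// The sum of the interval [L, R), in constant time.
long long intervalSum(const vector<long long>& P, int L, int R) {
    return P[R] - P[L];
}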

Exercise 13.1. The above technique does not work straight-off for non-commutative
operations. How can it be adapted to this case?

Sparse Tables
The case where a function does not have an inverse is a bit more difficult.


Interval Minimum
Given a sequence of integers 𝑎 0, 𝑎 1, . . . , 𝑎 𝑁 −1 , you will be given 𝑄 queries of
the form [𝐿, 𝑅). For each query, compute the value

𝑀 (𝐿, 𝑅) = 𝑚𝑖𝑛(𝑎𝐿 , 𝑎𝐿+1, . . . , 𝑎𝑅−1 )


Generally, you cannot compute the minimum of an interval based only on a
constant number of prefix minimums of a sequence. We need to modify our
approach. If we consider the naive approach, where we simply answer the
queries by computing it explicitly, by looping over all the 𝑅 − 𝐿 numbers in the
interval, this is Θ(len). A simple idea will improve the time used to answer
queries by a factor 2 compared to this. If we precompute the minimum of every
pair of adjacent elements, we cut down the number of elements we need to check
in half. We can take it one step further, by using this information to precompute
the minimum of all subarrays of four elements, by taking the minimum of two
pairs. By repeating this procedure for very power of two, we will end up with a
table 𝑚[𝑙] [𝑖] containing the minimum of the interval [𝑙, 𝑙 + 2𝑖 ), computable in
Θ(𝑁 log 𝑁 ).

Sparse Table
vector<vi> ST(const vi& A) {
    int n = sz(A);
    int levels = 1;
    while ((1 << levels) <= n) levels++; // number of power-of-two lengths
    vector<vi> ST(levels, vi(n));
    ST[0] = A;
    rep(len,1,levels) {
        rep(i,0,n - (1 << len) + 1) {
            ST[len][i] = min(ST[len - 1][i], ST[len - 1][i + (1 << (len - 1))]);
        }
    }
    return ST;
}

Given this, we can compute the minimum of an entire interval in logarithmic
time. Consider the binary expansion of the length len = 2^𝑘₁ + 2^𝑘₂ + · · · + 2^𝑘ₗ.
This consists of at most log₂ len terms. However, this means that the intervals

[𝐿, 𝐿 + 2^𝑘₁)
[𝐿 + 2^𝑘₁, 𝐿 + 2^𝑘₁ + 2^𝑘₂)
...
[𝐿 + 2^𝑘₁ + · · · + 2^𝑘ₗ₋₁, 𝐿 + len)

together cover [𝐿, 𝐿 + len). Thus we can compute the minimum of [𝐿, 𝐿 + len)
as the minimum of log₂ len intervals.

Sparse Table Querying


int rangeMinimum(const vector<vi>& table, int L, int R) {
    int len = R - L;
    int ans = std::numeric_limits<int>::max();
    for (int i = sz(table) - 1; i >= 0; --i) {
        if (len & (1 << i)) {
            ans = min(ans, table[i][L]);
            L += 1 << i;
        }
    }
    return ans;
}

This is Θ((𝑁 + 𝑄) log 𝑁 ) time, since the preprocessing uses Θ(𝑁 log 𝑁 )
time and each query requires Θ(log 𝑁 ) time. This structure is called a Sparse
Table, or sometimes just the Range Minimum Query data structure.
We can improve the query time to Θ(1) by using that the min operation is
idempotent, meaning that min(𝑎, 𝑎) = 𝑎. Whenever this is the case (and the
operation at hand is commutative), we can use just two intervals to cover the
entire interval. If 2^𝑘 is the largest power of two that is at most 𝑅 − 𝐿, then

[𝐿, 𝐿 + 2^𝑘)
[𝑅 − 2^𝑘, 𝑅)

covers the entire interval.
int rangeMinimum(const vector<vi>& table, int L, int R) {
    int maxLen = 31 - __builtin_clz(R - L);
    return min(table[maxLen][L], table[maxLen][R - (1 << maxLen)]);
}

While most functions either have inverses (so that we can use the prefix
precomputation) or are idempotent (so that we can use the Θ(1) sparse table),
some functions are neither. In such cases (for example matrix multiplication), we
must use the logarithmic querying of the sparse table.

Segment Trees
The most interesting range queries occur on dynamic sequences, where values
can change.


Dynamic Interval Sum


Given a sequence of integers 𝑎 0, 𝑎 1, . . . , 𝑎 𝑁 −1 , you will be given 𝑄 queries. The
queries are of two types:

1. Given an interval [𝐿, 𝑅), compute 𝑆 (𝐿, 𝑅) = 𝑎𝐿 + 𝑎𝐿+1 + · · · + 𝑎𝑅−1.

2. Given an index 𝑖 and an integer 𝑣, set 𝑎𝑖 := 𝑣.


To solve the dynamic interval problem, we will use a similar approach as
the general sparse table. Using a sparse table as-is for the dynamic version,
we would need to update Θ(𝑁 ) intervals, meaning the complexity would be
Θ(log 𝑁 ) for interval queries and Θ(𝑁 ) for updates. It turns out the sparse table
as we formulated it contains an unnecessary redundancy.
If we accept using 2 log 𝑁 intervals to cover each query instead of log 𝑁 ,
we can reduce memory usage (and precomputation time!) to Θ(𝑁 ) instead of
Θ(𝑁 log 𝑁 ). We will use the same decomposition as in merge sort (Section 12.2).
In Figure 13.1, you can see this decomposition, with an example of how a certain
interval can be covered. In this context, the decomposition is called a segment
tree.


Figure 13.1: The 2𝑁 − 1 intervals to precompute.

Usually, this construction is represented as a flat, 1-indexed array of length
2 · 2^⌈log₂ 𝑁⌉. The extraneous elements are set to some sentinel value that does not affect
queries (i.e. 0 in the case of sum queries). From this point, we assume 𝑁 to be
a power of two, with the array padded by these sentinel values.

1: procedure MakeTree(sequence 𝐴 of length 𝑁 )
2: tree ← new int[2𝑁 ]
3: for 𝑖 = 𝑁 to 2𝑁 − 1 do
4: tree[𝑖] ← 𝐴[𝑖 − 𝑁 ]
5: for 𝑖 = 𝑁 − 1 down to 1 do
6: tree[𝑖] ← tree[2 · 𝑖] + tree[2 · 𝑖 + 1]
7: return tree

In the construction, we label each interval 1, 2, 3, . . . in order, meaning the entire


interval will have index 1, the two halves indices 2, 3 and so on. This means that
the two halves of the interval numbered 𝑖 will have indices 2𝑖 and 2𝑖 + 1, which
explains the precomputation loop.
We can compute the sum of each of these intervals in Θ(1), assuming the
sum of all the smaller intervals have already been computed, since each interval
is composed by exactly two smaller intervals (except the length 1 leaves). The
height of this tree is logarithmic in 𝑁 .
Note that any particular element of the array is included in log 𝑁 intervals
– one for each size. This means that updating an element requires only log 𝑁
intervals to be updated, which means the update time is Θ(log 𝑁 ) instead of
Θ(𝑁 ) which was the case for sparse tables.

1: procedure UpdateTree(tree 𝑇 , index 𝑖, value 𝑣)
2: index ← 𝑖 + 𝑁
3: 𝑇 [index] ← 𝑣
4: while index > 1 do
5: index ← index/2
6: 𝑇 [index] ← 𝑇 [2 · index] + 𝑇 [2 · index + 1]

It is more difficult to construct an appropriate cover for the interval whose sum
we want to compute. A recursive rule can be used. We start at the interval
[0, 𝑁 ). One of three cases must now apply:

• We are querying the entire interval [0, 𝑁 )

• We are querying an interval that lies entirely in either [0, 𝑁 /2) or [𝑁 /2, 𝑁 )

• We are querying an interval that lies in both [0, 𝑁 /2) and [𝑁 /2, 𝑁 )

In the first case, we are done (and respond with the sum of the current
interval). In the second case, we perform a recursive call on the half of the
interval that the query lies in. In the third case, we make the same recursive
construction for both the left and the right interval.
Since there is a possibility we perform two recursive calls, we might think
that the worst-case complexity of this query would be Θ(𝑁 ) time. However, the
calls that the third case results in will have a very specific form – they will always
have one endpoint in common with the interval in the tree. In this case, the only
time the recursion will branch is into one interval that is entirely contained in the
query, and one that is not. The first call will not make any further calls. All in
all, this means that there will be at most two branches of logarithmic height, so
that queries are 𝑂 (log 𝑁 ).

1: procedure QueryTree(tree 𝑇 , index 𝑖, query [𝐿, 𝑅), tree interval [𝐿′, 𝑅′))
2: if 𝑅 ≤ 𝐿′ or 𝐿 ≥ 𝑅′ then
3: return 0
4: if 𝐿 = 𝐿′ and 𝑅 = 𝑅′ then
5: return 𝑇 [𝑖]
6: 𝑀 ← (𝐿′ + 𝑅′)/2
7: lsum ← QueryTree(𝑇 , 2𝑖, [𝐿, min(𝑅, 𝑀)), [𝐿′, 𝑀))
8: rsum ← QueryTree(𝑇 , 2𝑖 + 1, [max(𝐿, 𝑀), 𝑅), [𝑀, 𝑅′))
9: return lsum + rsum
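
A compact C++ sketch tying the three procedures together (assuming, as above, that 𝑁 is a power of two; this version checks containment instead of explicitly clipping the query, which is equivalent):

struct SegmentTree {
    int N;
    vector<long long> tree; // 1-indexed; the children of i are 2i and 2i+1

    SegmentTree(int n) : N(n), tree(2 * n) {} // n must be a power of two

    void update(int i, long long v) {
        i += N;
        tree[i] = v;
        while (i > 1) {
            i /= 2;
            tree[i] = tree[2 * i] + tree[2 * i + 1];
        }
    }

    // The sum of [L, R); called as query(1, L, R, 0, N).
    long long query(int i, int L, int R, int lo, int hi) {
        if (R <= lo || L >= hi) return 0;
        if (L <= lo && hi <= R) return tree[i];
        int mid = (lo + hi) / 2;
        return query(2 * i, L, R, lo, mid) + query(2 * i + 1, L, R, mid, hi);
    }
};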

13.3 Chapter Notes

14 Graph Algorithms
Graph theory is probably the richest of all algorithmic areas. You are almost
guaranteed to see at least one graph problem in any given contest, so it is
important to be well versed in the common algorithms that operate on graphs.
The most important graph algorithms are used to find shortest paths from some
vertex. It is these algorithms that we study first.

14.1 Breadth-First Search


One of the most common basic graph algorithms is the breadth-first search. It
is used to find the distances from a certain vertex in an unweighted graph.

Single-Source Shortest Path, Unweighted Edges


Given an unweighted graph 𝐺 = (𝑉 , 𝐸) and a source vertex 𝑠, compute the
shortest distances 𝑑 (𝑠, 𝑣) for all 𝑣 ∈ 𝑉 .
For simplicity, we first consider the problem on a grid graph, where the
unit squares constitute vertices, and vertices which share an edge are connected.
Additionally, some squares are blocked (and don’t have a corresponding vertex).
An example can be seen in Figure 14.1.

Figure 14.1: An example grid graph, with source marked 𝑠.

Let us solve this problem inductively. First of all, what vertices have distance
0? Clearly, this is only the source vertex 𝑠 itself. This seems like a reasonable
base case, since the problem is about shortest paths from 𝑠. Then, what vertices


have distance 1? These are exactly those with a path consisting of a single edge
from 𝑠, meaning they are the neighbors of 𝑠 (marked in Figure 14.2).


Figure 14.2: The squares with distance 1 from the source.

If a vertex 𝑣 has distance 2, it must be a neighbor of a vertex 𝑢 with distance


1 (except for the starting vertex). This is also a sufficient condition, since we can
construct a path of length 2 simply by extending the path of any neighbor of
distance 1 with the edge (𝑢, 𝑣).


Figure 14.3: The squares with distance 2, 3 and 4.

In fact, this reasoning generalizes to any particular distance: the vertices
at distance exactly 𝑘 are those that have a neighbor at distance 𝑘 − 1, but no
neighbor at a distance smaller than 𝑘 − 1. Using this, we can
construct an algorithm to solve the problem. Initially, we set the distance of 𝑠
to 0. Then, for every dist = 1, 2, . . . , we mark all vertices that have a neighbor
with distance dist − 1 as having distance dist. This algorithm is called the breadth-first
search.

Exercise 14.1. Use the BFS algorithm to compute the distance to every square
in the following grid:


A simple implementation of this would be to iteratively construct the lists of


the vertices which have distance 0, 1, . . . , and so on.

1: procedure BreadthFirstSearch(vertices 𝑉 , vertex 𝑠)


2: distances ← new int[|𝑉 |]
3: fill distances with ∞
4: curDist ← 0
5: curVertices ← new vector
6: curVertices. add(𝑠)
7: distances[𝑠] ← curDist
8: while curVertices ≠ ∅ do
9: nextVertices ← new vector
10: for from ∈ curVertices do
11: for 𝑣 ∈ from. neighbours do
12: if distances[𝑣] = ∞ then
13: nextVertices. add(𝑣)
14: distances[𝑣] ← curDist + 1
15: curDist ← curDist + 1
16: curVertices = nextVertices
17: return distances

Each vertex is added to nextVertices at most once, since it is only pushed if


distances[v] = ∞ whereupon it is immediately set to something else. We then
iterate through every neighbor of all these vertices. In total, the number of all
neighbours is 2𝐸, so the algorithm in total uses Θ(𝑉 + 𝐸) time.
Usually, the outer loop is coded in another way. Instead of maintaining
two separate vectors, we can merge them into a single queue:

1: while curVertices ≠ ∅ do
2: from ← curVertices. front()
3: curVertices. pop()
4: for 𝑣 ∈ from. neighbours do
5: if distances[𝑣] = ∞ then
6: curVertices. add(𝑣)
7: distances[𝑣] ← distances[from] + 1

The order of iteration is equivalent to the original order.
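
In C++, the queue-based version could look like this for a graph given as an adjacency list (a sketch):

// Computes the distances from s in an unweighted graph.
vector<int> bfs(const vector<vector<int>>& G, int s) {
    vector<int> distances(G.size(), -1); // -1 plays the role of infinity
    queue<int> q;
    distances[s] = 0;
    q.push(s);
    while (!q.empty()) {
        int from = q.front(); q.pop();
        for (int v : G[from]) {
            if (distances[v] == -1) {
                distances[v] = distances[from] + 1;
                q.push(v);
            }
        }
    }
    return distances;
}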


Exercise 14.2. Prove that the shorter way of coding the BFS loop (Algorithm ??)
is equivalent to the longer version (Algorithm ??).
Exercise 14.3. Implement BFS problem
In many problems the task is to find a shortest path between some pair of
vertices where the graph is given implicitly.

8-puzzle
In the 8-puzzle, 8 tiles are arranged in a 3 × 3 grid, with one square left empty.
A move in the puzzle consists of sliding a tile into the empty square. The goal
of the puzzle is to perform some moves to reach the target configuration. The
target configuration has the empty square in the bottom right corner, with the
numbers in order 1, 2, 3, 4, 5, 6, 7, 8 on the three lines.


Figure 14.4: An example 8-puzzle, with a valid move. The rightmost puzzle shows the target
configuration.

Given a puzzle, determine how many moves are required to solve it, or if it
cannot be solved.
This is a typical BFS problem, characterized by a starting state (the initial
puzzle), some transitions (the moves we can make), and the task of finding a
short sequence of transitions to some goal state. We can model this kind of
problem using a graph. The vertices represent the possible arrangements of


the tiles in the grid, and an edge connects two states if they differ by a single
move. A sequence of moves from the starting state to the target configuration
then represents a path in this graph. The minimum number of moves required is
the same as the distance between those vertices in the graph, meaning we can
use a BFS.
In such a problem, most of the code usually deals with the representation
of a state as a vertex, and with generating the edges that a certain vertex is adjacent to.
When an implicit graph is given, we generally do not compute the entire graph
explicitly. Instead, we use the states from the problems as-is, and generate the
edges of a vertex only when it is being visited in the breadth-first search. In the
8-puzzle, we can represent each state as a 3 × 3 2D-vector. The difficult part is
generating all the states that we can reach from a certain state.

Generating 8-puzzle Moves


typedef vector<vi> Puzzle;

vector<Puzzle> edges(const Puzzle& v) {
    int emptyRow = -1, emptyCol = -1;
    rep(row,0,3)
        rep(col,0,3)
            if (v[row][col] == 0) {
                emptyRow = row;
                emptyCol = col;
            }
    vector<Puzzle> possibleMoves;
    auto makeMove = [&](int rowMove, int colMove) {
        int newRow = emptyRow + rowMove;
        int newCol = emptyCol + colMove;
        if (newRow >= 0 && newCol >= 0 && newRow < 3 && newCol < 3) {
            Puzzle newPuzzle = v;
            swap(newPuzzle[emptyRow][emptyCol], newPuzzle[newRow][newCol]);
            possibleMoves.push_back(newPuzzle);
        }
    };
    makeMove(-1, 0);
    makeMove(1, 0);
    makeMove(0, -1);
    makeMove(0, 1);
    return possibleMoves;
}

With the edge generation in hand, the rest of the solution is a normal BFS,
slightly modified to account for the fact that our vertices are no longer numbered
0, . . . , 𝑉 − 1. We can solve this by using e.g. maps instead.


8-puzzle BFS
int puzzle(const Puzzle& S, const Puzzle& target) {
    map<Puzzle, int> distances;
    distances[S] = 0;
    queue<Puzzle> q;
    q.push(S);
    while (!q.empty()) {
        Puzzle cur = q.front(); q.pop(); // copy; a reference would dangle
        int dist = distances[cur];
        if (cur == target) return dist;
        for (const Puzzle& move : edges(cur)) {
            if (distances.find(move) != distances.end()) continue;
            distances[move] = dist + 1;
            q.push(move);
        }
    }
    return -1;
}

Besides this kind of search problem, which can be solved using a BFS directly,
some problems require modifications of a BFS, or use the distances generated only as
an intermediary result.

Shortest Cycle
Compute the length of the shortest simple cycle in a graph.

Problem 14.1
Button Bashing – buttonbashing

14.2 Depth-First Search


The depth-first search is an analogue to the breadth-first search that visits vertices
in another order. Similarly to how the BFS grows the set of visited vertices
using a wide frontier around the source vertex, the depth-first search proceeds
its search by, at every step, trying to plunge deeper into the graph. This order
is called the depth-first order. More precisely, the search starts at some source
vertex 𝑠. Then, any neighbor of 𝑠 is chosen to be the next vertex 𝑣. Before
visiting any other neighbor of 𝑠, we first visit any of the neighbours of 𝑣, and so
on.
Implementing the depth-first search is usually done with a recursive function,
using a vector seen to keep track of visited vertices:


1: procedure Depth-First Search(vertex at, adjacency list 𝐺)


2: if seen[at] then
3: return
4: seen[at] = true
5: for neighbour ∈ 𝐺 [at] do
6: dfs(neighbour, 𝐺)

In languages with limited stack space, it is possible to implement the DFS


iteratively using a stack instead, keeping the vertices which are currently open
in it.
Due to the simplicity of coding the DFS compared to a BFS, it is usually
the algorithm of choice in problems where we want to visit all the vertices.

Coast Length
KTH Challenge 2011 – Ulf Lundström
The residents of Soteholm value their coast highly and therefore want to maximize
its total length. For them to be able to make an informed decision on their
position in the issue of global warming, you have to help them find out whether
their coastal line will shrink or expand if the sea level rises. From height maps
they have figured out what parts of their islands will be covered by water, under
the different scenarios described in the latest IPCC report on climate change,
but they need your help to calculate the length of the coastal lines.

Figure 14.5: Gray squares are land and white squares are water. The thick black line is the sea
coast.

You will be given a map of Soteholm as an 𝑁 × 𝑀 grid. Each square in


the grid has a side length of 1 km and is either water or land. Your goal is to
compute the total length of sea coast of all islands. Sea coast is all borders
between land and sea, and sea is any water connected to an edge of the map
only through water. Two squares are connected if they share an edge. You may
assume that the map is surrounded by sea. Lakes and islands in lakes do not
contribute to the sea coast.
Solution. We can consider the grid as a graph, where all the water squares are
vertices, and two squares have an edge between them if they share a side. If
we surround the entire grid with water tiles (a useful trick to avoid special cases
in this kind of grid problem), the sea consists exactly of those vertices that are
connected to these surrounding water tiles. This means we need to compute the
vertices which lie in the same connected component as the sea – a typical DFS
task1. After computing this component, we can determine the coast length by
looking at all the squares which belong to the sea. If such a square shares an edge
with a land tile, that edge contributes 1 km to the coast length.
const vpi moves = {pii(-1, 0), pii(1, 0), pii(0, -1), pii(0, 1)};

int coastLength(const vector<vector<bool>>& G) {
    // Pad the map with two layers of sea so that the flood fill can reach
    // all sea squares from the corner (true = water, false = land).
    int H = sz(G) + 4;
    int W = sz(G[0]) + 4;
    vector<vector<bool>> G2(H, vector<bool>(W, true));
    rep(i,0,sz(G)) rep(j,0,sz(G[i])) G2[i+2][j+2] = G[i][j];
    vector<vector<bool>> sea(H, vector<bool>(W));

    function<void(int, int)> floodFill = [&](int row, int col) {
        if (row < 0 || row >= H || col < 0 || col >= W) return;
        if (!G2[row][col] || sea[row][col]) return; // stop at land or visited
        sea[row][col] = true;
        trav(move, moves) floodFill(row + move.first, col + move.second);
    };
    floodFill(0, 0);

    int coast = 0;
    rep(i,1,H-1) rep(j,1,W-1) {
        if (!sea[i][j]) continue;
        // each edge shared between a sea square and a land square is coast
        trav(move, moves) if (!G2[i + move.first][j + move.second]) coast++;
    }
    return coast;
}


1This particular application of DFS, i.e. computing a connected area in a 2D grid, is called a
flood fill.


Problem 14.2
Mårten’s DFS – martensdfs

Cutvertices and Bridges


Strongly Connected Components
14.3 Weighted Shortest Path
The theory of computing shortest paths in weighted graphs is a bit richer
than in the unweighted case. Which algorithm to use depends on three
factors:

• The number of vertices.

• Whether edge weights are non-negative or not.

• If we seek shortest paths only from a single vertex or between all pairs of
vertices.

There are mainly three algorithms used: Dijkstra’s Algorithm, the Bellman-Ford
algorithm, and the Floyd-Warshall algorithm.

Dijkstra’s Algorithm
Dijkstra’s Algorithm can be seen as an extension of the breadth-first search that
works for weighted graphs as well.

Single-Source Shortest Path, non-negative weights


Given a weighted graph 𝐺 = (𝑉 , 𝐸) where all weights are non-negative and a
source vertex 𝑠, compute the shortest distances 𝑑 (𝑠, 𝑣) for all 𝑣 ∈ 𝑉 (𝐺).
It has a similar inductive approach, where we iteratively compute the shortest
distances to all vertices in order of distance. The difference lies in that we do
not immediately know when we have found the shortest path to a vertex. For
example, the shortest path from a neighbour of the source vertex may now use
several other vertices in its shortest path to 𝑠 (Figure TODO).
However, can this be the case for every vertex adjacent to 𝑠? In particular,
can the neighbour of 𝑠 connected by the edge of smallest weight 𝑊 have a distance
smaller than 𝑊 ? This is never the case in Figure TODO, and it actually holds in
general. Assume that this is not the case. In Figure TODO, this hypothetical
scenario is shown. Any such path must still pass through some neighbour 𝑢 of 𝑠.
By assumption, the weight of the edge (𝑠, 𝑢) must be at least 𝑊 (which was
the minimal weight of the edges adjacent to 𝑠), and since the remaining edges have
non-negative weights, the path cannot be shorter than 𝑊 . This reasoning at least allows us
to find the shortest distance to one other vertex.
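
A typical C++ implementation sketch keeps a priority queue of (tentative distance, vertex) pairs and always extracts the unfinished vertex with the smallest tentative distance (the graph is assumed to be an adjacency list of (neighbour, weight) pairs):

typedef pair<long long, int> pli;

vector<long long> dijkstra(const vector<vector<pair<int,int>>>& G, int s) {
    vector<long long> dist(G.size(), LLONG_MAX);
    priority_queue<pli, vector<pli>, greater<pli>> pq; // min-heap
    dist[s] = 0;
    pq.push(pli(0, s));
    while (!pq.empty()) {
        long long d = pq.top().first;
        int at = pq.top().second;
        pq.pop();
        if (d > dist[at]) continue; // outdated entry; at was already finished
        for (auto& e : G[at]) {     // e = (neighbour, edge weight)
            if (d + e.second < dist[e.first]) {
                dist[e.first] = d + e.second;
                pq.push(pli(dist[e.first], e.first));
            }
        }
    }
    return dist;
}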

Bellman-Ford

Single-Source Shortest Path, arbitrary weights

Given a weighted graph 𝐺 = (𝑉 , 𝐸) where weights may be negative and a
source vertex 𝑠, compute the shortest distances 𝑑 (𝑠, 𝑣) for all 𝑣 ∈ 𝑉 (𝐺).
When edges can have negative weights, the idea behind Dijkstra’s algorithm
no longer works. It is very much possible that a negative weight edge somewhere
else in the graph could be used to construct a shorter path to a vertex which was
already marked as completed. However, the concept of relaxing an edge is still
very much applicable and allows us to construct a slower, inductive solution.
Initially, we know the shortest distances to each vertex, assuming we are
allowed to traverse 0 edges from the source. Assuming that we know these
values when allowed to traverse up to 𝑘 edges, can we find the shortest paths
that traverse up to 𝑘 + 1 edges? This way of thinking is similar to how to solved
the BFS problem. Using a BFS, we computed the vertices at distance 𝑑 + 1 by
taking the neighbours of the vertices at distance 𝑑 Similarly, we can find the
shortest path to a vertex 𝑣 that traverse up to 𝑘 + 1 edges, by attempting to extend
the shortest paths using 𝑘 edges from the neighbours of 𝑣. Letting 𝐷 (𝑘, 𝑣) be
the shorter distance to 𝑣 by traversing up to 𝑘 edges, we arrive at the following
recursion:

𝐷 (𝑘, 𝑣) = 0 if 𝑘 = 0 and 𝑣 = 𝑠
𝐷 (𝑘, 𝑣) = ∞ if 𝑘 = 0 and 𝑣 ≠ 𝑠
𝐷 (𝑘, 𝑣) = min( 𝐷 (𝑘 − 1, 𝑣), min over 𝑒 = (𝑢, 𝑣) ∈ 𝐸 of 𝐷 (𝑘 − 1, 𝑢) + 𝑊 (𝑒) ) if 𝑘 > 0
The implementation is straightforward:

1: procedure BellmanFord(vertices 𝑉 , edges 𝐸, vertex 𝑠)
2: 𝐷 ← new int[|𝑉 | + 1] [|𝑉 |]
3: fill 𝐷 with ∞
4: 𝐷 [0] [𝑠] ← 0
5: for 𝑘 = 1 to |𝑉 | do
6: 𝐷 [𝑘] ← 𝐷 [𝑘 − 1]
7: for 𝑒 = (𝑢, 𝑣) ∈ 𝐸 do
8: 𝐷 [𝑘] [𝑣] ← min(𝐷 [𝑘] [𝑣], 𝐷 [𝑘 − 1] [𝑢] + 𝑊 (𝑒))
9: return 𝐷

All in all, the states 𝐷 (𝑘, 𝑣) for a particular 𝑘 take Θ(|𝐸|) time to evaluate. To
compute the distance 𝑑 (𝑠, 𝑣), we still need to know how large 𝑘 may need to be
for the shortest path to appear. It turns out that this could
potentially be infinite, in the case where the graph contains a negative-weight
cycle. Such a cycle can be exploited to construct arbitrarily short paths.
However, if no such cycle exists, 𝑘 = |𝑉 | will be sufficient. If a shortest
path uses more than |𝑉 | edges, it must contain a cycle. If this cycle is not of
negative weight, we may simply remove it to obtain a path of at most the same
length. Thus, the algorithm takes Θ(|𝑉 ||𝐸|) time.
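
A direct C++ translation of the recursion (edges given as (u, v, w) triples; a sketch):

struct Edge { int u, v, w; };

// D[k][v] = the shortest distance from s to v using at most k edges.
vector<vector<long long>> bellmanFord(int V, const vector<Edge>& E, int s) {
    const long long INF = numeric_limits<long long>::max() / 2;
    vector<vector<long long>> D(V + 1, vector<long long>(V, INF));
    D[0][s] = 0;
    for (int k = 1; k <= V; ++k) {
        D[k] = D[k - 1];
        for (const Edge& e : E)
            if (D[k - 1][e.u] < INF)
                D[k][e.v] = min(D[k][e.v], D[k - 1][e.u] + e.w);
    }
    return D;
}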

Exercise 14.4. Bellman-Ford can be adapted to instead use only Θ(𝑉 ) memory,
by only keeping the currently known shortest paths and repeatedly relaxing every edge.
Sketch out the pseudo code for such an approach, and prove its correctness.

Exercise 14.5. We may terminate Bellman-Ford earlier without loss of correctness,
in case 𝐷 [𝑘] = 𝐷 [𝑘 − 1]. How can this fact be used to determine whether
the graph in question contains a negative-weight cycle?

Floyd-Warshall

All-Pairs Shortest Paths


Given a weighted graph 𝐺 = (𝑉 , 𝐸), compute the shortest distance 𝑑 (𝑢, 𝑣) for
every pair of vertices 𝑢, 𝑣.

The Floyd-Warshall algorithm is a remarkably short method to solve the all-pairs


shortest paths problem. It basically consists of three nested loops containing a
single statement, one of the shortest highly useful algorithms there is:

1: procedure Floyd-Warshall(distance matrix 𝐷)


2: for 𝑘 = 0 to |𝑉 | − 1 do
3: for 𝑖 = 0 to |𝑉 | − 1 do
4: for 𝑗 = 0 to |𝑉 | − 1 do
5: 𝐷 [𝑖] [ 𝑗] = min(𝐷 [𝑖] [ 𝑗], 𝐷 [𝑖] [𝑘] + 𝐷 [𝑘] [ 𝑗])
6: return 𝐷


Initially, the distance matrix 𝐷 contains the distances of all the edges in 𝐸, so
that 𝐷 [𝑖] [ 𝑗] is the weight of the edge (𝑖, 𝑗) if such an edge exists, ∞ if there is
no edge between 𝑖 and 𝑗 or 0 if 𝑖 = 𝑗. Note that if multiple edges exists between
𝑖 and 𝑗, 𝐷 [𝑖] [ 𝑗] must be given the minimum weight of them all. Additionally, if
there is a self-loop (i.e. an edge from 𝑖 to 𝑖 itself) of negative weight, 𝐷 [𝑖] [𝑖]
must be set to this value.
To see why this approach works, we can use the following invariant proven
by induction. After the 𝑘’th iteration of the loop, 𝐷 [𝑖] [ 𝑗] will be at most the
minimum distance between 𝑖 and 𝑗 that uses vertices 0, 1, . . . , 𝑘 − 1. Assume
that this is true for a particular 𝑘. After the next iteration, there are two cases
for 𝐷 [𝑖] [ 𝑗]. Either there is no shorter path using vertex 𝑘 than those using only
vertices 0, 1, . . . , 𝑘 − 1. In this case, 𝐷 [𝑖] [ 𝑗] will fulfill the condition by the
induction assumption. If there is a shorter path between 𝑖 and 𝑗 when we may use the
vertex 𝑘, this must have length 𝐷 [𝑖] [𝑘] + 𝐷 [𝑘] [ 𝑗], since 𝐷 [𝑖] [𝑘] and 𝐷 [𝑘] [ 𝑗]
both contain the shortest paths between 𝑖 and 𝑘, and 𝑘 and 𝑗 using vertices
0, 1, . . . , 𝑘 − 1. Since we set 𝐷 [𝑖] [ 𝑗] = min(𝐷 [𝑖] [ 𝑗], 𝐷 [𝑖] [𝑘] + 𝐷 [𝑘] [ 𝑗]) in the
inner loop, we will surely find this path too in this iteration. Thus, the statement
is true after the 𝑘 + 1’th iteration too. By induction, it is true for 𝑘 = |𝑉 |, meaning
𝐷 [𝑖] [ 𝑗] contains at most the minimum distance between 𝑖 and 𝑗 using any vertex
in the graph.
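
In C++, the algorithm is as short as the pseudocode suggests (a sketch; choosing ∞ as e.g. half the maximum value of the type avoids overflow in the addition):

// D[i][j] initially holds the edge weight, 0 on the diagonal,
// and a large INF value where no edge exists.
void floydWarshall(vector<vector<long long>>& D) {
    int n = sz(D);
    rep(k,0,n) rep(i,0,n) rep(j,0,n)
        D[i][j] = min(D[i][j], D[i][k] + D[k][j]);
}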

14.4 Minimum Spanning Tree


Using the depth-first search, we were able to find a subtree of a connected graph
which spanned the entire set of vertices. A particularly important such spanning
tree is the minimum spanning tree.

Minimum Spanning Tree


We say that the weight of a spanning tree is the sum of the weights of its edges.
A minimum spanning tree is a spanning tree whose weight is minimal.



Figure 14.6: A graph with a corresponding minimum spanning tree.

Given a weighted graph, find a minimum spanning tree.


Mainly two algorithms are used to solve the Minimum Spanning Tree (MST)
problem: either Kruskal’s Algorithm which is based on the Union-Find data
structure, or Prim’s Algorithm which is an extension of Dijkstra’s Algorithm.
We will demonstrate Kruskal’s, since it is by far the most common of the two.
Kruskal’s algorithm is based on a greedy, incremental approach that reminds
us of the Scheduling problem (Section 10.4). In the Scheduling problem, we
tried to find any interval that we could prove to be part of an optimal solution,
by considering various extremal cases. Can we do the same thing when finding
a minimum spanning tree?
First of all, if we can always find such an edge, we are essentially done.
Given an edge (𝑎, 𝑏) that we know is part of a minimum spanning tree, we can
contract the two vertices 𝑎 and 𝑏 to a single vertex 𝑎𝑏, with all edges adjacent
to the two vertices. Any edges that go between contracted vertices are ignored.
An example of this process in action can be seen in Figure 14.7. Note how the
contraction of an edge reduces the problem to finding an MST in a smaller
graph.
A natural extremal case to consider is the edge with the minimum weight.
After all, we are trying to minimize the edge sum. Our proof is similar in
structure to the Scheduling problem as well, using a swapping argument.
Assume that a minimum-weight edge {𝑎, 𝑏} with weight 𝑤 is not part of
any minimum spanning tree. Then, consider a minimum spanning tree with this edge
appended. The graph will then contain exactly one cycle. In a cycle, any edge
can be removed while maintaining connectivity. This means that if any edge
{𝑐, 𝑑 } on this cycle has a weight 𝑤′ larger than 𝑤, we can erase it. We
will thus have replaced the edge {𝑐, 𝑑 } by {𝑎, 𝑏}, while changing the weight
of the tree by 𝑤 − 𝑤′ < 0, reducing the sum of weights. Thus, the tree was
improved by using the minimum-weight edge, proving that it could have been
part of a minimum spanning tree.

Exercise 14.6. What happens if all edges on the cycle that appears have weight
𝑤? Is this a problem for the proof?

When implementing the algorithm, the contraction of the edge added to the
minimum spanning tree is generally not performed explicitly. Instead, a disjoint
set data structure is used to keep track of which subsets of vertices have been
contracted. Then, all the original edges are iterated through in increasing order
of weight. An edge is added to the spanning tree if and only if the two endpoints
of the edge are not already connected (as in Figure 14.7).

1: procedure MinimumSpanningTree(vertices 𝑉 , edges 𝐸)


2: sort 𝐸 by increasing weight
3: uf ← new DisjointSet(𝑉 )
4: mst ← new Graph
5: for each edge {𝑎, 𝑏} ∈ 𝐸 do
6: if not uf . sameSet(𝑎, 𝑏) then
7: mst. append(𝑎, 𝑏)
8: uf . join(𝑎, 𝑏)
9: return mst

The complexity of this algorithm is dominated by the sorting, which is
$O(E \log V)$, since the operations on the disjoint set structure are $O(\log V)$.
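To make the algorithm concrete, here is a minimal C++ sketch of Kruskal’s
algorithm (a sketch under assumed interfaces, not code from the book’s library).
The DisjointSet below is a bare-bones union-find with path compression only,
which suffices for illustration:

#include <algorithm>
#include <numeric>
#include <tuple>
#include <vector>
using namespace std;

struct DisjointSet {
    vector<int> parent;
    DisjointSet(int n) : parent(n) { iota(parent.begin(), parent.end(), 0); }
    int find(int x) { return parent[x] == x ? x : parent[x] = find(parent[x]); }
    bool sameSet(int a, int b) { return find(a) == find(b); }
    void join(int a, int b) { parent[find(a)] = find(b); }
};

// Edges are (weight, a, b) tuples; returns the edge set of a minimum spanning tree.
vector<pair<int, int>> minimumSpanningTree(int V, vector<tuple<int, int, int>> E) {
    sort(E.begin(), E.end()); // sort by increasing weight
    DisjointSet uf(V);
    vector<pair<int, int>> mst;
    for (auto& [w, a, b] : E)
        if (!uf.sameSet(a, b)) { // the endpoints are not yet connected
            mst.push_back({a, b});
            uf.join(a, b);
        }
    return mst;
}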

14.5 Chapter Notes


Figure 14.7: Incrementally constructing a minimum spanning tree by merging vertices.

15 Maximum Flows
This chapter studies so-called flow networks, and the algorithms we use to solve the
maximum flow and minimum cut problems on such networks. Flow
problems are common algorithmic problems, particularly in ICPC competitions
(while they are out of scope for IOI contests). They are often hidden behind
statements which seem unrelated to graphs and flows, especially the minimum
cut problem.
Finally, we will end with a specialization of maximum flow to the case of
bipartite graphs, called bipartite matching.

15.1 Flow Networks


Informally, a flow network is a directed graph that models any kind of network
where paths have a fixed capacity, or throughput. For example, in a road network,
each road might have a limited throughput, proportional to the number of lanes
on the road. A computer network may have different speeds along different
connections due to e.g. the type of material. These natural models are often used
when describing a problem that is related to flows. A more formal definition is
the following.

Definition 15.1 — Flow Network


A flow network is a special kind of directed graph (𝑉 , 𝐸, 𝑐), where each
edge 𝑒 is given a non-negative capacity 𝑐 (𝑒). Two vertices are designated
the source and the sink, which we will often abbreviate to 𝑆 and 𝑇 .
In Figure 15.1, you can see an example of a flow network.
In such a network, we can assign another value to each edge, that models
the current throughput (which generally does not need to match the capacity).
These values are what we call flows.
Definition 15.2 — Flow
A flow is a function $f : E \to \mathbb{R}_{\ge 0}$, associated with a particular flow network
(𝑉 , 𝐸, 𝑐). We call a flow 𝑓 admissible if:


Figure 15.1: An example flow network.

• 0 ≤ 𝑓 (𝑒) ≤ 𝑐 (𝑒) – the flow does not exceed the capacity

• For every $v \in V \setminus \{S, T\}$, $\sum_{e \in \text{in}(v)} f(e) = \sum_{e \in \text{out}(v)} f(e)$ – flow is conserved
for each vertex, possibly except the source and sink.

The size of a flow is defined to be the value
$$\sum_{e \in \text{out}(S)} f(e) - \sum_{e \in \text{in}(S)} f(e)$$

In a computer network, the flows could e.g. represent the current rate of
transfer through each connection.

Exercise 15.1. Prove that the size of a given flow also equals
$$\sum_{e \in \text{in}(T)} f(e) - \sum_{e \in \text{out}(T)} f(e)$$
i.e. the excess flow out of 𝑆 must be equal to the excess flow into 𝑇 .

In Figure 15.2, flows have been added to the network from Figure 15.1.
Given such a flow, we are generally interested in determining the flow of the
largest size. This is what we call the maximum flow problem. The problem is
not only interesting on its own. Many problems which we study might initially
seem unrelated to maximum flow, but will turn out to be reducible to finding a
maximum flow.



Figure 15.2: An example flow network, where each edge has an assigned flow. The size of the
flow is 8.

Maximum Flow
Given a flow network (𝑉 , 𝐸, 𝑐, 𝑆,𝑇 ), construct a maximum flow from 𝑆 to 𝑇 .
Input
A flow network.
Output
Output the maximal size of a flow, and the flow assigned to each edge in one
such flow.

Exercise 15.2. The flow of the network in Figure 15.2 is not maximal – there is
a flow of size 9. Find such a flow.

Before we study problems and applications of maximum flow, we will first
discuss algorithms to solve the problem. We can actually solve the problem
greedily, using a rather subtle insight that is hard to prove but essentially gives
us the algorithm we will use. It is probably one of the more complex standard
algorithms in common use.

15.2 Edmonds-Karp
There are plenty of algorithms which solve the maximum flow problem. Most
of these are too complicated to implement to be practical. We are going
to study two very similar classical algorithms that compute a maximum flow.
We will start with proving the correctness of the Ford-Fulkerson algorithm.
Afterwards, a modification known as Edmonds-Karp will be analyzed (and
found to have a better worst-case complexity).


Augmenting Paths
For each edge, we define a residual flow 𝑟 (𝑒) on the edge, to be 𝑐 (𝑒) − 𝑓 (𝑒). The
residual flow represents the additional amount of flow we may push along an
edge.
In Ford-Fulkerson, we associate every edge 𝑒 with an additional back edge
𝑏 (𝑒) which points in the reverse direction. Each back edge is originally given a flow
and capacity 0. If 𝑒 has a certain flow, we assign the flow of the back edge
𝑏 (𝑒) to be its negation (i.e. 𝑓 (𝑏 (𝑒)) = −𝑓 (𝑒)). Since the back edge 𝑏 (𝑒) of 𝑒 has capacity
0, its residual capacity is 𝑟 (𝑏 (𝑒)) = 𝑐 (𝑏 (𝑒)) − 𝑓 (𝑏 (𝑒)) = 0 − (−𝑓 (𝑒)) = 𝑓 (𝑒).
Intuitively, the residual flow represents the amount of flow we can add to a
certain edge. Having a back edge thus represents “undoing” flow we have added
to a normal edge, since increasing the flow along a back edge will decrease the
flow of its associated edge.


Figure 15.3: The residual flows from the network in Figure 15.2.

The basis of the Ford-Fulkerson family of algorithms is the augmenting path.
An augmenting path is a path from 𝑆 to 𝑇 in the network consisting of edges
𝑒 1, 𝑒 2, ..., 𝑒𝑙 such that 𝑟 (𝑒𝑖 ) > 0, i.e. every edge along the path has positive residual
flow. Letting 𝑚 be the minimum residual flow among all edges on the path, we
can increase the flow of every such edge by 𝑚.
In Figure 15.3, the path 𝑆, 𝑐, 𝑑, 𝑏,𝑇 is an augmenting path, with minimum
residual flow 1. This means we can increase the flow by 1 in the network, by:

• Increasing the flow from 𝑆 to 𝑐 by 1

• Increasing the flow from 𝑐 to 𝑑 by 1


• Decreasing the flow from 𝑏 to 𝑑 by 1 (since (𝑑, 𝑏) is a back edge, augmenting
the flow along this edge represents removing flow from the original edge)

• Increasing the flow from 𝑑 to 𝑇


The algorithm for augmenting a flow using an augmenting path is simple:

1: procedure Augment(path 𝑃)
2: inc ← ∞
3: for 𝑒 ∈ 𝑃 do
4: inc ← min(inc, 𝑐 (𝑒) − 𝑓 (𝑒))
5: for 𝑒 ∈ 𝑃 do
6: f (e) ← 𝑓 (𝑒) + inc
7: f (b(e)) ← 𝑓 (𝑏 (𝑒)) − inc
8: return inc

Performing this kind of augmentation on an admissible flow will keep the
flow admissible. A path must have either zero or two edges adjacent to any vertex
(aside from the source and sink). One of these will be an incoming edge, and
one an outgoing edge. Increasing the flow of these edges by the same amount
conserves the equality of flows between in-edges and out-edges, meaning the
flow is still admissible.
This means that a flow can be maximum only if it contains no augmenting
paths. It turns out this is also a sufficient condition, i.e. a flow is maximum if it
contains no augmenting path. Thus, we can solve the maximum flow problem
by repeatedly finding augmenting paths, until no more exist.

Finding Augmenting Paths


The most basic algorithm based on augmenting paths is the Ford-Fulkerson
algorithm. It uses a simple DFS to find the augmenting paths:

1: procedure AugmentingPath(flow network (𝑉 , 𝐸, 𝑐, 𝑓 , 𝑆,𝑇 ))
2: seen ← new bool[|𝑉 |]
3: path ← new Stack
4: found ← DFS(𝑆,𝑇 , 𝑓 , 𝑐, seen, path)
5: if found then
6: return path
7: return Nil
8: procedure DFS(vertex 𝑎𝑡, sink 𝑇 , flow 𝑓 , capacity 𝑐, seen, path 𝑝)
9: 𝑝.push(𝑎𝑡)
10: seen[𝑎𝑡] ← true
11: if 𝑎𝑡 = 𝑇 then
12: return true
13: for every out-edge 𝑒 = (𝑎𝑡, 𝑣) from 𝑎𝑡 do
14: if not seen[𝑣] and 𝑓 (𝑒) < 𝑐 (𝑒) then
15: if DFS(𝑣,𝑇 , 𝑓 , 𝑐, seen, 𝑝) then
16: return true
17: 𝑝.pop()
18: return false

For integer flows, where the maximum flow has size 𝑚, Ford-Fulkerson may
require up to 𝑂 (𝐸𝑚) time. In the worst case, a DFS takes Θ(𝐸) time to find a
path from 𝑆 to 𝑇 , and one augmenting path may contribute only a single unit of
flow. For non-integral flows, there are instances where Ford-Fulkerson may not
even terminate (nor converge to the maximum flow).
An improvement to this approach is simply to use a BFS instead. This is
what is called the Edmonds-Karp algorithm. The BFS looks similar to the
Ford-Fulkerson DFS, and is modified in the same way (i.e. only traversing
those edges where the flow 𝑓 (𝑒) is smaller than the capacity 𝑐 (𝑒)). The resulting
complexity is instead $O(VE^2)$ (which is tight in the worst case).
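To make this concrete, here is a minimal C++ sketch of Edmonds-Karp (a
sketch, not code from the book’s library). It stores capacities in an adjacency
matrix, which is simple but uses Θ(V²) memory; since cap[u][v] holds the
residual capacity, augmenting along a back edge is just an update of cap[v][u]:

#include <climits>
#include <queue>
#include <vector>
using namespace std;

// cap[u][v] is the capacity of edge (u, v); adj must contain every edge in
// both directions so that back edges can be traversed. Returns the size of
// a maximum flow; cap is updated in place to the residual capacities.
long long maxFlow(vector<vector<long long>>& cap, vector<vector<int>>& adj, int S, int T) {
    long long flow = 0;
    while (true) {
        // BFS for a shortest augmenting path in the residual graph.
        vector<int> prev(cap.size(), -1);
        queue<int> q;
        q.push(S);
        prev[S] = S;
        while (!q.empty() && prev[T] == -1) {
            int at = q.front(); q.pop();
            for (int v : adj[at])
                if (prev[v] == -1 && cap[at][v] > 0) {
                    prev[v] = at;
                    q.push(v);
                }
        }
        if (prev[T] == -1) break; // no augmenting path: the flow is maximum
        // Find the bottleneck residual capacity along the path, then augment.
        long long inc = LLONG_MAX;
        for (int v = T; v != S; v = prev[v]) inc = min(inc, cap[prev[v]][v]);
        for (int v = T; v != S; v = prev[v]) {
            cap[prev[v]][v] -= inc; // push flow along the edge...
            cap[v][prev[v]] += inc; // ...and give back capacity on the back edge
        }
        flow += inc;
    }
    return flow;
}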

15.3 Applications of Flows


We will now study a number of problems which are reducible to finding a
maximum flow in a network. Some of these problems are themselves considered
to be standard problems.

Maximum-Flow with Vertex Capacities


In a flow network, each vertex 𝑣 additionally has a limit 𝐶 𝑣 on the amount of
flow that can go through it, i.e.
$$\sum_{e \in \text{in}(v)} f(e) \le C_v$$
Find the maximum flow subject to this additional constraint.


This is nearly the standard maximum flow problem, with the addition of
vertex capacities. We are still going to use the normal algorithms for maximum
flow. Instead, we will make some minor modifications to the network. The
additional constraint given is similar to the constraint placed on an edge. An
edge has a certain amount of flow passing through it, implying that the same
amount must enter and exit the edge. For this reason, it seems like a reasonable
approach to reduce the vertex capacity constraint to an ordinary edge capacity,
by forcing all the flow that passes through a vertex 𝑣 with capacity 𝐶 𝑣 through a
particular edge.
If we partition all the edges adjacent to 𝑣 into incoming and outgoing edges,
it becomes clear how to do this. We can split up 𝑣 into two vertices 𝑣𝑖𝑛 and
𝑣𝑜𝑢𝑡 , where all the incoming edges to 𝑣 are now incoming edges to 𝑣𝑖𝑛 and the
outgoing edges instead become outgoing edges from 𝑣𝑜𝑢𝑡 . If we then add an
edge of infinite capacity from 𝑣𝑖𝑛 to 𝑣𝑜𝑢𝑡 , we claim that the maximum flow
of the network does not change. All the flow that passes through this vertex
must now pass through this edge between 𝑣𝑖𝑛 and 𝑣𝑜𝑢𝑡 . This construction thus
accomplish our goal of forcing the vertex flow through a particular edge. We
can now enforce the vertex capacity by changing the capacity of this edge to 𝐶 𝑣 .
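As an illustration, a minimal sketch of the construction in C++ (a hypothetical
representation chosen for this example, not code from the book). Vertex 𝑣
becomes the in-vertex 2𝑣 and the out-vertex 2𝑣 + 1:

#include <tuple>
#include <vector>
using namespace std;

// Hypothetical edge type for illustration: (from, to, capacity).
using Edge = tuple<int, int, long long>;

// Returns the edges of the transformed network, in which vertex v has been
// split into in-vertex 2v and out-vertex 2v + 1, joined by an edge of capacity C[v].
vector<Edge> splitVertices(int n, const vector<long long>& C, const vector<Edge>& edges) {
    vector<Edge> result;
    for (int v = 0; v < n; v++)
        result.push_back({2 * v, 2 * v + 1, C[v]}); // all flow through v uses this edge
    for (auto& [u, v, c] : edges)
        result.push_back({2 * u + 1, 2 * v, c}); // out-vertex of u to in-vertex of v
    return result;
}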

Maximum Bipartite Matching


Given a bipartite graph, a bipartite matching is a subset of edges in the graph,
such that no two edges share an endpoint. Determine the matching containing
the maximum number of edges.

The maximum bipartite matching problem is probably the most common
reduction to maximum flow in use. Some standard problems additionally reduce
to bipartite matching, making maximum flow even more important. Although
there are other ways of solving maximum bipartite matching than a reduction
to flow, this is how we are going to solve it.
How can we find such a reduction? In general, we try to find some kind of
graph structure in the problem, and model what it “means” for an edge to have
flow pushed through it. In the bipartite matching problem, we are already given
a graph. We also have a target we wish to maximize – the size of the matching –
and an action that is already associated with edges – including it in the matching.
It does not seem unreasonable that this is how we wish to model the flow, i.e.
that we want to construct a network based on this graph where pushing flow
along one of the edges means that we include the edge in the matching. No two


selected edges may share an endpoint, which brings only a minor complication.
After all, this condition is equivalent to each of the vertices in the graph having
a vertex capacity of 1. We already know how to enforce vertex capacities from
the previous problem, where we split each such vertex into two, one for in-edges
and one for out-edges. Then, we added an edge between them with the required
capacity. After performing this modification on the given graph, we are still
missing one important part of a flow network. The network does not yet have
a source and sink. Since we want flow to go along the edges, from one of the
parts to another part of the graph, we should place the source at one side of the
graph and the sink at the other, connecting the source to all vertices on one side
and all the vertices on the other side to the sink.
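A minimal sketch of this construction follows (with a hypothetical node
numbering, not code from the book). As a side note, since every edge here is
given capacity 1, the source and sink edges already enforce the unit vertex
capacities, so the explicit vertex splitting can be skipped in this special case:

#include <tuple>
#include <vector>
using namespace std;

using Edge = tuple<int, int, long long>; // (from, to, capacity)

// L and R are the sizes of the two sides; graphEdges holds pairs (i, j) with
// i on the left side and j on the right. Numbering: source = 0, left i = 1 + i,
// right j = 1 + L + j, sink = 1 + L + R.
vector<Edge> matchingNetwork(int L, int R, const vector<pair<int, int>>& graphEdges) {
    vector<Edge> net;
    int S = 0, T = 1 + L + R;
    for (int i = 0; i < L; i++) net.push_back({S, 1 + i, 1});         // source to left
    for (int j = 0; j < R; j++) net.push_back({1 + L + j, T, 1});     // right to sink
    for (auto& [i, j] : graphEdges) net.push_back({1 + i, 1 + L + j, 1});
    return net;
}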

Minimum Path Cover


In a directed, acyclic graph, find a minimum set of vertex-disjoint paths that
together include every vertex.
It is difficult to derive a flow reduction for this problem. It reduces to
bipartite matching in a rather unnatural way. First of all, a common technique
must be used to introduce a bipartite structure into the graph. We split each
vertex into two vertices, one in-vertex and one out-vertex, and let every original
edge (𝑢, 𝑣) become an edge from the out-vertex of 𝑢 to the in-vertex of 𝑣. Note
that this graph still has the same minimum path covers as the original graph.
Now, consider any path cover of the graph. Each in-vertex and each out-vertex
is then adjacent to at most a single chosen edge, since paths are
vertex-disjoint. Additionally, the number of paths is equal to the number of
in-vertices that do not lie on any path in the cover (since these vertices are the
origins of the paths). Minimizing the number of paths thus means selecting a
maximum subset of the original edges in which no two edges share an in-vertex
or an out-vertex. Since the subgraph containing only these edges is bipartite, the
problem reduces to bipartite matching.
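A sketch of the resulting computation, assuming a hypothetical maxMatching
helper that returns the size of a maximum matching in the described bipartite
graph (for instance via the flow reduction above):

#include <vector>
using namespace std;

// Hypothetical helper assumed: the size of a maximum bipartite matching,
// where edges go from out-vertices (left side) to in-vertices (right side).
int maxMatching(int left, int right, const vector<pair<int, int>>& edges);

// Every matched edge joins two paths into one, so each matched edge removes
// exactly one path from the cover.
int minPathCover(int n, const vector<pair<int, int>>& dagEdges) {
    return n - maxMatching(n, n, dagEdges);
}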

Exercise 15.3. The minimum path cover reduction can be modified slightly
to find a minimum cycle cover in a directed graph instead. Construct such a
reduction.

15.4 Chapter Notes


The Edmonds-Karp algorithm was originally published in 1970 by Yefim Dinitz,
and later independently by Jack Edmonds and Richard Karp, whose paper
appeared in 1972.

16 Strings
In computing, much of the information we process is text. Therefore, it should
not come as a surprise that many common algorithms and problems concern
text strings. In this chapter, we will study some of the common string
algorithms and data structures.

16.1 Tries
The trie (also called a prefix tree) is the most common string-related data
structure. It represents a set of words as a rooted tree, where every prefix of
every word is a vertex, with children from a prefix 𝑃 to all strings 𝑃𝑐 which are
also prefixes of a word. If two words have a common prefix, the prefix only
appears once as a vertex. The root of the tree is the empty prefix. The trie is
very useful when we want to associate some information with prefixes of strings
and quickly get the information from neighboring strings.
The most basic operation of the trie is the insertion of strings, which may be
implemented as follows.

Trie
struct Trie {
    map<char, Trie> children;
    bool isWord = false;

    // Inserts the suffix of s starting at position pos into the trie.
    void insert(const string& s, int pos) {
        if (pos != sz(s)) children[s[pos]].insert(s, pos + 1);
        else isWord = true;
    }
};

We mark those vertices which correspond to inserted words using a boolean
flag isWord. Many problems can essentially be solved by very simple usage of a
trie, such as the following IOI problem.


Type Printer
International Olympiad in Informatics 2008
You need to print 𝑁 words on a movable type printer. Movable type printers are
those old printers that require you to place small metal pieces (each containing a
letter) in order to form words. A piece of paper is then pressed against them
to print the word. The printer you have allows you to do any of the following
operations:

• Add a letter to the end of the word currently in the printer.

• Remove the last letter from the end of the word currently in the printer.
You are only allowed to do this if there is at least one letter currently in
the printer.

• Print the word currently in the printer.

Initially, the printer is empty; it contains no metal pieces with letters. At the
end of printing, you are allowed to leave some letters in the printer. Also, you
are allowed to print the words in any order you like. As every operation requires
time, you want to minimize the total number of operations.
Your task is to output a sequence of operations that prints all the words using
the minimum number of operations needed.
Input
The first line contains the number of words 1 ≤ 𝑁 ≤ 25 000. The next 𝑁 lines
contain the words to be printed, one per line. Each word is at most 20 letters
long and consists only of lower case letters a-z. All words will be distinct.
Output
Output a sequence of operations that prints all the words. The operations should
be given in order, one per line, starting with the first. Adding a letter c is
represented by outputting c on a line. Removing the last letter of the current
word is represented by a -. Printing the current word is done by outputting P.

Let us start by solving a variation of the problem, where we are not allowed
to leave letters in the printer at the end. First of all, are there actions that never
make sense? For example, what sequences of letters will ever appear in the
printer during an optimal sequence of operations? Clearly we never wish to input
a sequence that is not a prefix of a word we wish to type. For example, if we
input 𝑎𝑏𝑐𝑑𝑒𝑓 and this is not a prefix of any word, we must at some point erase
the last letter 𝑓 , without having printed any words. But then we can erase the
entire sequence of operations between inputting the 𝑓 and erasing the 𝑓 , without
changing what words we print.
On the other hand, every prefix of a word we wish to print must at some
point appear in the printer. Otherwise, we would not be able to reach the
word we wish to print. Therefore, the partial words to ever appear in the
printer are exactly the prefixes of the words we wish to print – strongly hinting
at a trie-based solution.
If we build the trie of all words we wish to print, it contains as vertices exactly
those strings which will appear as partial words on the printer. Furthermore, the
additions and removals of letters form a sequence of vertices that are connected
by edges in this trie. We can move either from a prefix 𝑃 to a prefix 𝑃𝑐, or from
a prefix 𝑃𝑐 to a prefix 𝑃, which are exactly the edges of a trie. The goal is then
to construct the shortest possible tour starting at the root of the trie and passing
through all the vertices of the trie.
Since a trie is a tree, any such trail must pass through every edge of the trie
at least twice. If we only passed through an edge once, we can never get back
to the root since every edge disconnects the root from the endpoint of the edge
further away from the root. It is actually possible to construct a trail which
passes through every edge exactly twice (which is not particularly difficult if
you attempt this task by hand). As it happens, the depth-first search of a tree
passes through an edge exactly twice – once when first traversing the edge to an
unvisited vertex, and once when backtracking.
The problem is subtly different once we are allowed to leave some letters in
the printer at the end. Clearly, the only difference between an optimal sequence
when letters may remain and an optimal sequence when we must leave the
printer empty is that we are allowed to skip some trailing removal operations. If
the last word we print is 𝑆, the difference will be exactly |𝑆 | “-” operations. An
optimal solution will therefore print the longest word last, in order to “win” as
many “-” operations as possible. We would like this last word to be the longest
word of all the ones we print if possible. In fact, we can order our DFS such that
this is possible. First of all, our DFS starts from the root and the longest word is
𝑠 1𝑠 2 . . . 𝑠𝑛 . When selecting which order the DFS should visit the children of the
root in, we can select the child 𝑠 1 last. Thus, all words starting with the letter 𝑠 1
will be printed last. When visiting 𝑠 1 , we use the same trick and visit the child


𝑠 1𝑠 2 last of the children of 𝑠 1 , and so on. This guarantees 𝑆 to be the last word to
be printed.
Note that the solution requires no additional data to be stored in the trie –
the only modification to our basic trie is the DFS.
Typewriter
struct Trie {
    ...

    // onLongest tells whether the current vertex is a prefix of the longest
    // word; along that path the trailing "-" operations are skipped.
    // The traversal is started as root.dfs(0, longest, true).
    void dfs(int depth, const string& longest, bool onLongest) {
        char last = (onLongest && depth < sz(longest)) ? longest[depth] : 0;
        trav(it, children)
            if (it->first != last)
                dfs2(depth, longest, it->first, false);
        if (last) dfs2(depth, longest, last, true);
    }

    void dfs2(int depth, const string& longest, char output, bool onLongest) {
        cout << output << endl;
        Trie& child = children[output];
        if (child.isWord) cout << "P" << endl;
        child.dfs(depth + 1, longest, onLongest);
        if (!onLongest) cout << "-" << endl;
    }
};

Generally, the uses of tries are not this simple, where we only need to
construct the trie and fetch the answer through a simple traversal. We often need
to augment tries with additional information about the prefixes we insert. This
is when tries start to become really powerful. The next problem requires only a
small augmentation of a trie, to solve a problem which looks complex.

Rareville
In Rareville, everyone must have a distinct name. When a new-born baby is to
be given a name, its parents must first visit NAME, the Naming Authority under
the Ministry of Epithets, to have its name approved. The authority has a long
list of all names assigned to the citizens of Rareville. When deciding whether to
approve a name or not, a case officer uses the following procedure. They start
at the first name in the list, and read the first letter of it. If this letter matches
the first letter of the proposed name, they proceed to read the next letter in the
word. This is repeated for every letter of the name in the list. After reading a
letter from the word, the case officer can sometimes determine that this could not
possibly be the same name as the proposed one. This happens if either

• the next letter in the proposed name did not match the name in the list

• there was no next letter in the proposed name

• there was no next letter in the name in the list

When this happens, the case officer starts over with the next name in the list,
until exhausting all names in the list. For each letter the case officer reads (or
attempts to read) from a name in the list, one second passes.
Currently, there are 𝑁 people in line waiting to apply for a name. Can you
determine how long the decision process will take for each person?
Input
The first line contains integers 1 ≤ 𝐷 ≤ 200 000 and 1 ≤ 𝑁 ≤ 200 000, the size
of the dictionary and the number of people waiting in line. The next 𝐷 lines
contain one lowercase name each, the contents of the dictionary. The next 𝑁
lines contain one lowercase name each, the names the people in line wish to
apply with. The total size of the lists is at most $10^6$ letters.
Output
For each of the 𝑁 names, output the time (in seconds) the case officer needs to
decide on the application.
The problem clearly relates to prefixes in some way. Given a dictionary
word 𝐴 and an application for a name 𝐵, the case officer needs to read letters
from 𝐴 corresponding to the longest common prefix of 𝐴 and 𝐵, plus 1. Hence,
our solution will probably be to consider all the prefixes of each proposed name,
which is exactly what tries are good at.
Instead of thinking about this process one name at a time, we use a common trie
technique and look at the transpose of this problem, i.e. for every 𝑖, how many
names 𝐶𝑖 have a longest common prefix of length at least 𝑖 when handling the
application for a name 𝑆? This way, we have transformed the problem from being
about 𝐷 individual processes to |𝑆 | smaller problems which treat the dictionary
as a unified group of strings. Then, we will have to read 𝐶 0 + 𝐶 1 + · · · + 𝐶 |𝑆 | letters.
Now, the solution should be clear. We augment the trie vertex for a particular
prefix 𝑝 with the number of strings $P_p$ in the list that start with this prefix.
Initially, an empty trie has $P_p = 0$ for every 𝑝. Whenever we insert a new word
$W = w_1 w_2 \ldots$ in the trie, we need to increment $P_{w_1}, P_{w_1 w_2}, \ldots$, to keep all the $P_p$
correct, since we have added a new string which has those prefixes. Then, we
have that $C_i = P_{s_1 s_2 \ldots s_i}$, so we can compute all the numbers $C_i$ by following
the word 𝑆 in the trie. The construction of the trie is linear in the number of
characters we insert, and responding to a query is linear in the length of the
proposed name.

Rareville
struct Trie {
    map<char, Trie> children;
    int P = 0; // the number of inserted words having this vertex as a prefix

    void insert(const string& s, int pos) {
        P++;
        if (pos != sz(s)) children[s[pos]].insert(s, pos + 1);
    }

    // Sums P over all vertices on the path spelled out by s.
    int query(const string& s, int pos) {
        int ans = P;
        if (pos != sz(s)) {
            auto it = children.find(s[pos]);
            if (it != children.end()) ans += it->second.query(s, pos + 1);
        }
        return ans;
    }
};

16.2 String Matching


A common problem on strings – both in problem solving and real life – is that
of searching. Not only do we need to check whether e.g. a set of strings contains
some particular string, but also if one string contains another one as a substring.
This operation is ubiquitous; operating systems allow us to search the contents
of our files, and our text editors, web browsers and email clients all support
substring searching in documents. It should come as no surprise that string
matching is part of many string problems.


String Matching
Find all occurrences of the pattern 𝑃 as a substring in the string 𝑊 .

We can solve this problem naively in 𝑂 (|𝑊 | · |𝑃 |). If we assume that
an occurrence of 𝑃 starts at position 𝑖 in 𝑊 , we can compare the substring
𝑊 [𝑖...𝑖 + |𝑃 | − 1] to 𝑃 in 𝑂 (|𝑃 |) time by looping through both strings, one
character at a time:

1: procedure StringMatching(pattern 𝑃, string 𝑊 )


2: answer ← new vector
3: for outer: 𝑖 from 0 to |𝑊 | − |𝑃 | do
4: for 𝑗 from 0 to |𝑃 | − 1 do
5: if 𝑃 [ 𝑗] ≠ 𝑊 [𝑖 + 𝑗] then
6: start next iteration of outer
7: answer. append(𝑖)
8: return answer

Intuitively, we should be able to do better. With the naive matching, our
problem is basically that we can perform long stretches of partial matches for
every position. Searching for the string $a^{n/2}$ in the string $a^n$ takes $O(n^2)$ time,
since each of the $n/2$ positions where the pattern can appear requires us to look
ahead for $n/2$ characters to realize we made a match. On the other hand, if we
manage to find a long partial match of length 𝑙 starting at 𝑖, we know what the
next 𝑙 letters of 𝑊 are – they are the 𝑙 first letters of 𝑃. With some cleverness,
we should be able to exploit this fact, hopefully avoiding the need to scan them
again when we attempt to find a match starting at 𝑖 + 1.
For example, assume we have 𝑃 = 𝑏𝑎𝑛𝑎𝑛𝑎𝑟𝑎𝑚𝑎. Then, if we have performed
a partial match of 𝑏𝑎𝑛𝑎𝑛𝑎 at some position 𝑖 in 𝑊 but the next character
is a mismatch (i.e., it is not an 𝑟 ), we know that no match can begin at
the next 5 characters. Since we have matched 𝑏𝑎𝑛𝑎𝑛𝑎 at 𝑖, we have that
𝑊 [𝑖 + 1...𝑖 + 5] = 𝑎𝑛𝑎𝑛𝑎, which does not contain a 𝑏.
As a more interesting example, take 𝑃 = 𝑎𝑏𝑏𝑎𝑎𝑏𝑏𝑜𝑟𝑟𝑒. This pattern has the
property that the partial match 𝑎𝑏𝑏𝑎𝑎𝑏𝑏 actually contains a prefix of 𝑃 itself
as a suffix, namely 𝑎𝑏𝑏. This means that if at some position 𝑖 we get this partial match
but the next character is a mismatch, we can not immediately skip the next 6
characters. It is possible that the entire string could have been 𝑎𝑏𝑏𝑎𝑎𝑏𝑏𝑎𝑎𝑏𝑏𝑜𝑟𝑟𝑒.
Then, an actual match (starting at the fifth character) overlaps our partial match.

It seems that if we find a partial match of length 7 (i.e. 𝑎𝑏𝑏𝑎𝑎𝑏𝑏), we can only
skip the first 4 characters of the partial match.
For every possible partial match of the pattern 𝑃, how many characters are
we able to skip if we fail a 𝑘-length partial match? If we could precompute such
a table, we should be able to perform matching in linear time, since we would
only have to investigate every character of 𝑊 once. Assume the next possible
match is 𝑙 letters forward. Then the new partial match must consist of the last
𝑘 − 𝑙 letters of the old partial match, i.e. 𝑃 [𝑙 . . . 𝑘 − 1]. But a partial match is just a
prefix of 𝑃, so we must have 𝑃 [𝑙 . . . 𝑘 − 1] = 𝑃 [0 . . . 𝑘 − 𝑙 − 1]. In other words, for
every given 𝑘, we must find the longest suffix of 𝑃 [0 . . . 𝑘 − 1] that is also a
prefix of 𝑃 (besides 𝑃 [0 . . . 𝑘 − 1] itself, of course).
We can compute these suffixes rather easily in $O(n^2)$. For each possible
position of the next possible match 𝑙, we perform a string matching to find all
occurrences of prefixes of 𝑃 within 𝑃:

1: procedure LongestSuffixes(pattern 𝑃)
2: 𝑇 ← new int[|𝑃 | + 1]
3: for 𝑙 from 1 to |𝑃 | − 1 do
4: matchLen ← 0
5: while 𝑙 + matchLen < |𝑃 | do
6: if 𝑃 [𝑙 + matchLen] ≠ 𝑃 [matchLen] then
7: break
8: matchLen ← matchLen + 1
9: 𝑇 [𝑙 + matchLen] ← max(𝑇 [𝑙 + matchLen], matchLen)
10: return 𝑇

For a string such as 𝑃 = 𝑏𝑎𝑛𝑎𝑛𝑎𝑟𝑎𝑚𝑎, where no partial match could possibly
contain a new potential match, this table simply contains zeroes:

𝑃 𝑏 𝑎 𝑛 𝑎 𝑛 𝑎 𝑟 𝑎 𝑚 𝑎
𝑇 0 0 0 0 0 0 0 0 0 0
When 𝑃 = 𝑎𝑏𝑏𝑎𝑎𝑏𝑏𝑜𝑟𝑟𝑒, the table instead becomes:
𝑃 𝑎 𝑏 𝑏 𝑎 𝑎 𝑏 𝑏 𝑜 𝑟 𝑟 𝑒
𝑇 0 0 0 1 1 2 3 0 0 0 0
With this precomputation, we can now perform matching in linear time.
The matching is similar to the naive matching, except we can now use this
precomputed table to determine whether there is a new possible match somewhere
within the partial match.

1: procedure StringMatching(pattern 𝑃, text 𝑊 )


2: matches ← new vector
3: 𝑇 ← LongestSuffixes(𝑃)
4: pos ← 0, match ← 0
5: while pos + match < |𝑊 | do
6: if match < |𝑃 | and 𝑊 [pos + match] = 𝑃 [match] then
7: match ← match + 1
8: else if match = 0 then
9: pos ← pos + 1
10: else
11: pos ← pos + match − 𝑇 [match]
12: match ← 𝑇 [match]
13: if match = |𝑃 | then
14: matches.append(pos)
15: return matches

In each iteration of the loop, we see that either match is increased by one,
or match is decreased by match − 𝑇 [match] and pos is increased by the same
amount. Since match is bounded by |𝑃 | and pos is bounded by |𝑊 |, this can
happen at most |𝑊 | + |𝑃 | times. Each iteration takes constant time, meaning our
matching takes Θ(|𝑊 | + |𝑃 |) time.
While this is certainly better than the naive string matching, it is not
particularly helpful when |𝑃 | = Θ(|𝑊 |), since we need an $O(|P|^2)$ preprocessing.
The solution lies in how we computed the table of suffix matches, or rather, the
fact that it is entirely based on string matching itself. We just learned how to use
this table to perform string matching in linear time. Maybe we can use this table
to extend itself and get the precomputation down to 𝑂 (|𝑃 |)? After all, we are
looking for occurrences of prefixes of 𝑃 in 𝑃 itself, which is exactly what string
matching does. If we modify the string matching algorithm for this purpose, we
get what we need:

1: procedure LongestSuffixes(pattern 𝑃)
2: 𝑇 ← new int[|𝑃 | + 1]
3: pos ← 1, match ← 0
4: while pos + match < |𝑃 | do
5: if 𝑃 [pos + match] = 𝑃 [match] then
6: match ← match + 1
7: 𝑇 [pos + match] ← match
8: else if match = 0 then
9: pos ← pos + 1
10: else
11: pos ← pos + match − 𝑇 [match]
12: match ← 𝑇 [match]
13: return 𝑇

This string matching algorithm is called the Knuth-Morris-Pratt (KMP) algorithm.
Using the same analysis as for the improved string matching, this precomputation
is instead Θ(|𝑃 |). The resulting string matching then takes Θ(|𝑃 | + |𝑊 |).
Competitive Tip
Most programming languages have functions to find occurrences of a certain string in
another. However, they mostly use the naive 𝑂 (|𝑊 ||𝑃 |) procedure. Be aware of this
and code your own string matching if you need it to perform in linear time.
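A direct C++ translation of the two procedures above might look as follows (a
sketch, not code from the book’s library):

#include <string>
#include <vector>
using namespace std;

// T[k] = length of the longest proper suffix of P[0..k-1] that is a prefix of P.
vector<int> longestSuffixes(const string& P) {
    vector<int> T(P.size() + 1);
    int pos = 1, match = 0;
    while (pos + match < (int)P.size()) {
        if (P[pos + match] == P[match]) {
            match++;
            T[pos + match] = match;
        } else if (match == 0) {
            pos++;
        } else {
            pos += match - T[match];
            match = T[match];
        }
    }
    return T;
}

// Returns the starting positions of all occurrences of P in W.
vector<int> stringMatching(const string& P, const string& W) {
    vector<int> T = longestSuffixes(P), matches;
    int pos = 0, match = 0;
    while (pos + match < (int)W.size()) {
        if (match < (int)P.size() && W[pos + match] == P[match]) {
            match++;
        } else if (match == 0) {
            pos++;
        } else {
            pos += match - T[match];
            match = T[match];
        }
        if (match == (int)P.size()) matches.push_back(pos);
    }
    return matches;
}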

Clock Pictures
Nordic Collegiate Programming Contest 2014
You have two pictures of an unusual kind of clock. The clock has 𝑛 hands, each
having the same length and no kind of marking whatsoever. Also, the numbers
on the clock are so faded that you can’t even tell anymore what direction is up in
the picture. So the only thing that you see on the pictures, are 𝑛 shades of the 𝑛
hands, and nothing else.
You’d like to know if both images might have been taken at exactly the same
time of the day, possibly with the camera rotated at different angles.
Given the description of the two images, determine whether it is possible
that these two pictures could be showing the same clock displaying the same
time.
Input
The first line contains a single integer 𝑛 (2 ≤ 𝑛 ≤ 200000), the number of hands
on the clock.
Each of the next two lines contains 𝑛 integers 𝑎𝑖 (0 ≤ 𝑎𝑖 ≤ 360000),
representing the angles of the hands of the clock on one of the images, in
thousandths of a degree. The first line represents the position of the hands on
the first image, whereas the second line corresponds to the second image. The
number 𝑎𝑖 denotes the angle between the recorded position of some hand and
the upward direction in the image, measured clockwise. Angles of the same
clock are distinct and are not given in any specific order.
Output
Output one line containing one word: possible if the clocks could be showing
the same time, impossible otherwise.

16.3 Chapter Notes


The Knuth-Morris-Pratt algorithm was published in 1977 by Knuth, Morris
and Pratt. A popular alternative to KMP is the Rabin-Karp algorithm, which is
instead based on hashing; it was published by Karp and Rabin in 1987.

17 Combinatorics
Combinatorics deals with various discrete structures, such as graphs and
permutations. In this chapter, we will mainly study the branch of combinatorics
known as enumerative combinatorics – the art of counting. We will count the
number of ways to choose 𝐾 different candies from 𝑁 different candies, the
number of distinct seating arrangements around a circular table, the sum of sizes
of all subsets of a set and many more objects. Many combinatorial counting
problems are based on a few standard techniques which we will learn in this
chapter.

17.1 The Addition and Multiplication Principles


The addition principle states that, given a finite collection of disjoint sets
𝑆 1, 𝑆 2, . . . , 𝑆𝑛 , we can compute the size of the union of all sets by simply adding
up the sizes of our sets, i.e.

|𝑆 1 ∪ 𝑆 2 ∪ · · · ∪ 𝑆𝑛 | = |𝑆 1 | + |𝑆 2 | + · · · + |𝑆𝑛 |

Example 17.1 Assume we have 5 different types of chocolate bars (the set 𝐶),
3 different types of bubble gum (the set 𝐺), and 4 different types of lollipops
(the set 𝐿). These form three disjoint sets, meaning we can compute the
total number of snacks by summing up the number of snacks of the different
types. Thus, we have |𝐶 | + |𝐺 | + |𝐿| = 5 + 3 + 4 = 12 different snacks.

Later on, we will see a generalization of the addition principle that handles
cases where our sets are not disjoint.
The multiplication principle, on the other hand, states that the size of the
Cartesian product 𝑆 1 × 𝑆 2 × · · · × 𝑆𝑛 equals the product of the individual sizes of
these sets, i.e.
|𝑆 1 × 𝑆 2 × · · · × 𝑆𝑛 | = |𝑆 1 | · |𝑆 2 | · · · |𝑆𝑛 |


Example 17.2 Assume that we have the same sets of candies 𝐶, 𝐺 and 𝐿
as in Example 17.1. We want to compose an entire dinner out of snacks,
by choosing one chocolate bar, one bubble gum and a lollipop. The
multiplication principle tells us that, modeling a snack dinner as a tuple
(𝑐, 𝑔, 𝑙) ∈ 𝐶 × 𝐺 × 𝐿, we can form our dinner in 5 · 3 · 4 = 60 ways.

The addition principle is often useful when we solve counting problems by
case analysis.

Example 17.3 How many four letter words consisting of the letters 𝑎, 𝑏, 𝑐
and 𝑑 contain exactly two letters 𝑎?
There are six possible ways to place the two letters 𝑎:

𝑎𝑎__
𝑎_𝑎_
𝑎__𝑎
_𝑎𝑎_
_𝑎_𝑎
__𝑎𝑎

For each of these ways, there are four ways of choosing the other two letters
(𝑏𝑏, 𝑏𝑐, 𝑐𝑏, 𝑐𝑐). Thus, there are 4 + 4 + 4 + 4 + 4 + 4 = 6 · 4 = 24 such words.

Let us now apply these basic principles to solve the following problem:

Kitchen Combinatorics
Northwestern Europe Regional Contest 2015 – Per Austrin
The world-renowned Swedish Chef is planning a gourmet three-course dinner
for some muppets: a starter course, a main course, and a dessert. His famous
Swedish cook-book offers a wide variety of choices for each of these three
courses, though some of them do not go well together (for instance, you of
course cannot serve chocolate moose and sooted shreemp at the same dinner).
Each potential dish has a list of ingredients. Each ingredient is in turn
available from a few different brands. Each brand is of course unique in its own
special way, so using a particular brand of an ingredient will always result in a


completely different dinner experience than using another brand of the same
ingredient.
Some common ingredients such as pølårber may appear in two of the three
chosen dishes, or in all three of them. When an ingredient is used in more than
one of the three selected dishes, Swedish Chef will use the same brand of the
ingredient in all of them.
While waiting for the meecaroo, Swedish Chef starts wondering: how many
different dinner experiences are there that he could make, by different choices
of dishes and brands for the ingredients?
Input
The input consists of:

• five integers 𝑟 , 𝑠, 𝑚, 𝑑, 𝑛, where 1 ≤ 𝑟 ≤ 1 000 is the number of different


ingredients that exist, 1 ≤ 𝑠, 𝑚, 𝑑 ≤ 25 are the number of available starter
dishes, main dishes, and desserts, respectively, and 0 ≤ 𝑛 ≤ 2 000 is the
number of pairs of dishes that do not go well together.

• 𝑟 integers 𝑏 1, . . . , 𝑏𝑟 , where 1 ≤ 𝑏𝑖 ≤ 100 is the number of different brands


of ingredient 𝑖.

• 𝑠 + 𝑚 + 𝑑 dishes – the 𝑠 starter dishes, then the 𝑚 main dishes, then the 𝑑
desserts. Each dish starts with an integer 1 ≤ 𝑘 ≤ 20 denoting the number
of ingredients of the dish, and is followed by 𝑘 distinct integers 𝑖 1, . . . , 𝑖𝑘 ,
where for each 1 ≤ 𝑗 ≤ 𝑘, 1 ≤ 𝑖 𝑗 ≤ 𝑟 is an ingredient.

• 𝑛 pairs of incompatible dishes.

Output
If the number of different dinner experiences Swedish Chef can make is at most
$10^{18}$, then output that number. Otherwise, output “too many”.
The solution is a similar addition-multiplication principle combo as used
in Example 17.3. First off, we can simplify the problem considerably by brute
forcing over the coarsest component of a dinner experience, namely the courses
included. Since there are at most 25 dishes of every type, we need to check up
to $25^3 = 15\,625$ choices of dishes. By the addition principle, we can compute
the number of dinner experiences for each such three-course dinner, and then
sum them up to get the answer. Some pairs of dishes do not go well together. At
this stage in the process we exclude any triple of dishes that includes such a pair.


We can perform this check in Θ(1) time if we save the incompatible dishes in
2D boolean vectors, so that e.g. badStarterMain[𝑖] [ 𝑗] determines if starter 𝑖 is
incompatible with main dish 𝑗.
For a given dinner consisting of starter 𝑎, main dish 𝑏 and dessert
𝑐, only the set of ingredients of the three dishes matters, since the chef will use the
same brand for an ingredient even if it is part of two dishes. The next step is
thus to compute this set by taking the union of ingredients for the three included
dishes. This step takes Θ(𝑘𝑎 + 𝑘𝑏 + 𝑘𝑐 ). Once this set is computed, the only
remaining task is to choose a brand for each ingredient. Assigning brands is
an ordinary application of the multiplication principle, where we multiply the
number of brands available for each ingredient together.
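One detail is that the multiplication must be capped to detect the “too many”
case without overflowing a 64-bit integer. A minimal sketch of this final counting
step (hypothetical names, not code from the book):

#include <vector>
using namespace std;

const long long LIMIT = 1000000000000000000LL; // 10^18

// brands[i] is the number of brands of ingredient i; used holds the union of
// the ingredients of the three chosen dishes. Returns the number of dinner
// experiences for this triple, capped at LIMIT + 1 to signal "too many".
long long countBrandChoices(const vector<int>& used, const vector<long long>& brands) {
    long long result = 1;
    for (int i : used) {
        if (result > LIMIT / brands[i]) return LIMIT + 1; // cap before overflowing
        result *= brands[i];
    }
    return result;
}

The counts for all dish triples are then summed, with the sum capped in the
same way.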

17.2 Permutations
A permutation of a set 𝑆 is an ordering of all the elements in the set. For
example, the set {1, 2, 3} has 6 permutations:

123 132
213 231
312 321

Our first “real” combinatorial problem will be to count the number of per-
mutations of an 𝑛-element set 𝑆. When counting permutations, we use the
multiplication principle. We will show a procedure that can be used to construct
permutations one element at a time. Assume that the permutation is the sequence
⟨𝑎 1, 𝑎 2, . . . , 𝑎𝑛 ⟩. The first element of the permutation, 𝑎 1 , can be assigned any
of the 𝑛 elements of 𝑆. Once this assignment has been made, we have 𝑛 − 1
elements we can choose to be 𝑎 2 (any element of 𝑆 except 𝑎 1 ). In general, when
we are to select the (𝑖 + 1)’th value 𝑎𝑖+1 of the permutation, 𝑖 elements have
already been included in the permutation, leaving 𝑛 − 𝑖 options for 𝑎𝑖+1 . Using
this argument for all 𝑛 elements of the sequence, we can construct a permutation
in 𝑛 · (𝑛 − 1) · · · 2 · 1 ways (by the multiplication principle).
This number is so useful that it has its own name and notation.
Definition 17.1 — Factorial
The factorial of 𝑛, where 𝑛 is a non-negative integer, denoted 𝑛!, is defined


as the product of the first 𝑛 positive integers, i.e.
$$n! = 1 \cdot 2 \cdots n = \prod_{i=1}^{n} i$$

For 𝑛 = 0, we use the convention that the empty product is 1.


This sequence of numbers thus begins 1, 1, 2, 6, 24, 120, 720, 5 040, 40 320,
362 880, 3 628 800, 39 916 800 for 𝑛 = 0, 1, 2, . . . , 11. It is good to know the magnitudes
of these numbers, since they are frequent in time complexities when doing brute
force over permutations. Asymptotically, they grow as $n^{\Theta(n)}$. More precisely, the
well-used Stirling’s formula¹ gives the approximation
$$n! = \sqrt{2\pi n}\left(\frac{n}{e}\right)^n \left(1 + O\left(\frac{1}{n}\right)\right)$$

Exercise 17.1. In how many ways can 8 persons be seated around a round table,
if we consider cyclic rotations of a seating to be different? What if we consider
cyclic rotations to be equivalent?

Problem 17.1
𝑛’th permutation – nthpermutation
Name That Permutation – namethatpermutation

Permutations as Bijections
The word permutation has roots in Latin, meaning “to change completely”. We
are now going to look at permutations in a very different light, which gives some
justification to the etymology of the word.
Given a set such as [5], we can fix some ordering of its elements such as
⟨1, 2, 3, 4, 5⟩. A permutation 𝜋 = ⟨1, 3, 4, 5, 2⟩ of this set can then be seen as a
movement of these elements. Of course, this same movement can be applied
to any other 5-element set with a fixed ordering, such as ⟨𝑎, 𝑏, 𝑐, 𝑑, 𝑒⟩ being
transformed to ⟨𝑎, 𝑐, 𝑑, 𝑒, 𝑏⟩. This suggests that we can consider a permutation as
a “rule” which describes how to move – permute – the elements.
Such a movement rule can also be described as a function 𝜋 : [𝑛] → [𝑛],
where 𝜋 (𝑖) describes what element should be placed at position 𝑖. Thus, the

¹ Named after James Stirling (who has other important combinatorial objects named after him
too), but stated already by his contemporary Abraham de Moivre.


permutation ⟨1, 3, 4, 5, 2⟩ would have 𝜋 (1) = 1, 𝜋 (2) = 3, 𝜋 (3) = 4, 𝜋 (4) = 5,
𝜋 (5) = 2.
𝑖 1 2 3 4 5
↓ ↓ ↓ ↓ ↓
𝜋 (𝑖) 1 3 4 5 2
Since each element is mapped to a different element, the function induced by
a permutation is actually a bijection. By interpreting permutations as functions,
all the theory about functions applies to permutations too.
We call ⟨1, 2, 3, 4, . . . , 𝑛⟩ the identity permutation, since the function given
by the identity permutation is actually the identity function. As functions, we
can also consider the composition of two permutations. Given two permutations,
𝛼 and 𝛽, their composition 𝛼𝛽 is also a permutation, given by 𝛼𝛽 (𝑘) = 𝛼 (𝛽 (𝑘)).
If we let 𝜎 = ⟨5, 4, 3, 2, 1⟩, the composition with 𝜋 = ⟨1, 3, 4, 5, 2⟩ from above
would then be
𝑖 1 2 3 4 5
↓ ↓ ↓ ↓ ↓
𝜋 (𝑖) 1 3 4 5 2
↓ ↓ ↓ ↓ ↓
𝜎𝜋 (𝑖) 5 3 2 1 4
This is called multiplying permutations, i.e. 𝜎𝜋 is the product of 𝜎 and 𝜋. If we
multiply a permutation 𝜋 by itself 𝑛 times, we call the resulting product 𝜋 𝑛 .
An important property regarding the multiplication of permutations follows
from their functional properties, namely their associativity. We have that
the permutation (𝛼𝛽)𝛾 = 𝛼 (𝛽𝛾), so we will take the liberty of dropping the
parentheses and writing 𝛼𝛽𝛾.
Problem 17.2
Permutation Product – permutationproduct

Permutations also have inverses, which are just the inverses of their functions.
The permutation 𝜋 = ⟨1, 3, 4, 5, 2⟩ which we looked at in the beginning thus
has the inverse given by
𝜋 −1 (1) = 1, 𝜋 −1 (3) = 2, 𝜋 −1 (4) = 3, 𝜋 −1 (5) = 4, 𝜋 −1 (2) = 5
written in permutation notation as ⟨1, 5, 2, 3, 4⟩. Since this is the functional
inverse, we expect 𝜋 −1 𝜋 = id.


𝑖 1 2 3 4 5
↓ ↓ ↓ ↓ ↓
𝜋 (𝑖) 1 3 4 5 2
↓ ↓ ↓ ↓ ↓
𝜋 −1 𝜋 (𝑖) 1 2 3 4 5

Problem 17.3
Permutation Inverse – permutationinverse

A related concept is that of the cycle decomposition of a permutation. If
we start with an element 𝑖 and repeatedly apply a permutation on this element
(i.e. take 𝑖, 𝜋 (𝑖), 𝜋 (𝜋 (𝑖)), . . . ) we will at some point find that $\pi^k(i) = i$, at which
point we will start repeating ourselves. Consider, for example, the permutation
𝜋 = ⟨2, 1, 4, 5, 3⟩:

𝑖 1 2 3 4 5
↓ ↓ ↓ ↓ ↓
𝜋 (𝑖) 2 1 4 5 3
↓ ↓ ↓ ↓ ↓
𝜋 2 (𝑖) 1 2 5 3 4
↓ ↓ ↓ ↓ ↓
𝜋 3 (𝑖) 2 1 3 4 5
↓ ↓ ↓ ↓ ↓
𝜋 4 (𝑖) 1 2 4 5 3
↓ ↓ ↓ ↓ ↓
𝜋 5 (𝑖) 2 1 5 3 4
↓ ↓ ↓ ↓ ↓
𝜋 6 (𝑖) 1 2 3 4 5

We call the 𝑘 distinct numbers of this sequence the cycle of 𝑖. For 𝜋, we
have two cycles: (1, 2) and (3, 4, 5). Note how 𝜋 (1) = 2 and 𝜋 (2) = 1 for the
first cycle, and 𝜋 (3) = 4, 𝜋 (4) = 5, 𝜋 (5) = 3 for the second. This gives us an
alternative way of writing a permutation, namely as the concatenation of its
cycles: (1, 2) (3, 4, 5).
To compute the cycle decomposition of a permutation 𝜋, we repeatedly
pick any element of the permutation which is currently not a part of a cycle,
and compute the cycle it is in using the method described above. Since we
will consider every element exactly once, this procedure is Θ(𝑛) for 𝑛-element
permutations.
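A minimal C++ sketch of this procedure (not code from the book), with the
permutation given 0-indexed as a vector pi where pi[i] is the image of i:

#include <vector>
using namespace std;

vector<vector<int>> cycleDecomposition(const vector<int>& pi) {
    int n = pi.size();
    vector<bool> inCycle(n, false);
    vector<vector<int>> cycles;
    for (int i = 0; i < n; i++) {
        if (inCycle[i]) continue;
        // Follow i -> pi[i] -> pi[pi[i]] -> ... until we return to i.
        vector<int> cycle;
        for (int at = i; !inCycle[at]; at = pi[at]) {
            inCycle[at] = true;
            cycle.push_back(at);
        }
        cycles.push_back(cycle);
    }
    return cycles;
}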


Problem 17.4
Cycle Decomposition – cycledecomposition

Given a permutation 𝜋, we define its order, denoted ord 𝜋, as the size of
the set $\{\pi, \pi^2, \pi^3, \ldots\}$. For all permutations except for the identity permutation,
this is the smallest integer 𝑘 > 0 such that $\pi^k$ is the identity permutation. In
our example, we have that ord 𝜋 = 6, since $\pi^6$ was the first power of 𝜋 that was
equal to the identity permutation. How can we quickly compute the order of 𝜋?
The maximum possible order of a permutation happens to grow rather quickly
(it is $e^{(1+o(1))\sqrt{n \log n}}$ in the number of elements 𝑛). Thus, trying to compute the
order by computing 𝜋 𝑘 for every 𝑘 until 𝜋 𝑘 is the identity permutation is too
slow. Instead, we can use the cycle decomposition. If a permutation has a cycle
(𝑐 1, 𝑐 2, . . . 𝑐𝑙 ), we know that

𝜋 𝑙 (𝑐 1 ) = 𝑐 1, 𝜋 𝑙 (𝑐 2 ) = 𝑐 2, . . . , 𝜋 𝑙 (𝑐𝑙 ) = 𝑐𝑙

by the definition of the cycle decomposition. Additionally, this means that
$(\pi^l)^k(c_1) = \pi^{lk}(c_1) = c_1$. Hence, any power of 𝜋 that is a multiple of 𝑙 will
act as the identity permutation on this particular cycle.
This fact gives us an upper bound on the order of 𝜋. If its cycle decomposition
has cycles of length 𝑙 1, 𝑙 2, . . . , 𝑙𝑚 , the smallest positive number that is a multiple
of every 𝑙𝑖 is lcm(𝑙 1, 𝑙 2, . . . , 𝑙𝑚 ). The permutation 𝜋 = ⟨2, 1, 4, 5, 3⟩ had two cycles,
one of length 2 and one of length 3. Its order was lcm(2, 3) = 2 · 3 = 6. This is
also a lower bound on the order, which follows from the next fact, left as an exercise:
Exercise 17.2. Prove that if 𝜋 has a cycle of length 𝑙, we must have 𝑙 | ord 𝜋.

Problem 17.5
Order of a Permutation – permutationorder
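Combined with the cycle decomposition sketch above, the order computation
becomes a few lines (again a sketch, not code from the book). Note that the
order itself can overflow 64 bits for large 𝑛, in which case big integers or a
factorized representation of the lcm are needed:

#include <numeric> // std::lcm (C++17)
#include <vector>
using namespace std;

long long permutationOrder(const vector<int>& pi) {
    long long order = 1;
    for (auto& cycle : cycleDecomposition(pi))
        order = lcm(order, (long long)cycle.size());
    return order;
}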

Dance Reconstruction
Nordic Collegiate Programming Contest 2013 – Lukáš Poláček
Marek loves dancing, and he got really excited when he heard about the coming wedding
of his best friend Miroslav. For a whole month he worked on a special dance for
the wedding. The dance was performed by 𝑁 people and there were 𝑁 marks
on the floor. There was an arrow from each mark to another mark and every
mark had exactly one incoming arrow. The arrow could also be pointing back to

the same mark.


At the wedding, every person first picked a mark on the floor and no 2
persons picked the same one. Every 10 seconds, there was a loud signal when
all dancers had to move along the arrow on the floor to another mark. If an
arrow was pointing back to the same mark, the person at the mark just stayed
there and maybe did some improvised dance moves on the spot.
Another wedding is now coming up a year later, and Marek would like to
do a similar dance. He found two photos from exactly when the dance started
and when it ended. Marek also remembers that the signal was triggered 𝐾 times
during the time the song was played, so people moved 𝐾 times along the arrows.
Given the two photos, can you help Marek reconstruct the arrows on the
floor? On the two photos it can be seen for every person to which position he or
she moved. Marek numbered the people in the first photo from 1 to 𝑁 and then
wrote the number of the person whose place they took in the second photo.
Marek’s time is running out, so he is interested in any placement of arrows
that could produce the two photos.
Input
Two integers 2 ≤ 𝑁 ≤ 10 000 and 1 ≤ 𝐾 ≤ 109 . Then, 𝑁 integers 1 ≤
𝑎 1, . . . , 𝑎 𝑁 ≤ 𝑁 , denoting that dancer number 𝑖 ended up at the place of dancer
number 𝑎𝑖 . Every number between 1 and 𝑁 appears exactly once in the sequence
𝑎𝑖 .
Output
If it is impossible to find a placement of arrows such that the dance performed
𝐾 times would produce the two photos, print “Impossible”. Otherwise print 𝑁
numbers on a line, the 𝑖’th number denoting to which person the arrow leads
from person number 𝑖.

The problem can be rephrased in terms of permutations. First of all, the
dance corresponds to some permutation 𝜋 of the dancers, given by where the
arrows pointed. This is the permutation we seek in the problem. We are given
the permutation 𝑎, so we seek a permutation 𝜋 such that $\pi^K = a$.
When given permutation problems of this kind, we should probably attack
it using cycle decompositions in some way. Since the cycles of 𝜋 are all
independent of each other under multiplication, it is a good guess that the
decomposition can simplify the problem. The important question is then how a
cycle of 𝜋 is affected when taking powers. For example, a cycle of 10 elements


in 𝜋 would decompose into two cycles of length 5 in $\pi^2$, and five cycles of
length 2 in $\pi^5$. The general case involves the divisors of 𝑙 and 𝐾:

Exercise 17.3. Prove that a cycle of length 𝑙 in a permutation 𝜋 decomposes
into $\gcd(l, K)$ cycles of length $\frac{l}{\gcd(l, K)}$ in $\pi^K$.

This suggests our first simplification of the problem: to consider all cycles
of 𝜋 𝐾 partitioned by their lengths. By Exercise 17.3, cycles of different lengths
are completely unrelated in the cycle decomposition of 𝜋 𝐾 .

The result also gives us a way to “reverse” the decomposition that happens
to the cycles of 𝜋. Given 𝑙/𝑚 cycles of length 𝑚 in $\pi^K$, we can combine them
into an 𝑙-cycle in 𝜋 in the case where $m \cdot \gcd(l, K) = l$. By looping over every
possible cycle length 𝑙 (from 1 to 𝑁 ), we can then find all possible ways to
combine cycles of $\pi^K$ into larger cycles of 𝜋. This step takes Θ(𝑁 log(𝑁 + 𝐾))
due to the GCD computation.

Given all the ways to combine cycles, a knapsack problem remains for each
cycle length of $\pi^K$. If we have 𝑎 cycles of length 𝑙 in $\pi^K$, we want to partition
them into sets of certain sizes (given by the previous computation). This step
takes Θ(𝑎 · 𝑐) time, if there are 𝑐 ways to combine 𝑙-length cycles.

Once it has been decided what cycles are to be combined, only the act of
computing a combination of them remains. This is not difficult on a conceptual
level, but is a good practice to do on your own (the solution to Exercise 17.3
basically outlines the reverse procedure).

17.3 Ordered Subsets

A variation of the permutation counting problem is to count the number of
ordered sequences containing exactly 𝑘 distinct elements, from a set of 𝑛. We
can compute this by first considering the permutations of the entire set of 𝑛 elements,
and then grouping together those whose 𝑘 first elements are the same. Taking the

set {𝑎, 𝑏, 𝑐, 𝑑 } as an example, it has the permutations:

abcd bacd cabd dabc


abdc badc cadb dacb

acbd bcad cbad dbac


acdb bcda cbda dbca

adbc bdac cdab dcab


adcb bdca cdba dcba

Once we have chosen the first 𝑘 elements of a permutation, there are (𝑛 − 𝑘)!
ways to order the remaining 𝑛 − 𝑘 elements. Thus, we must have divided our
𝑛! permutations into one group for each ordered 𝑘-length sequence, with each
group containing (𝑛 − 𝑘)! elements. To get the correct total, this means there
must be $\frac{n!}{(n-k)!}$ such groups – and 𝑘-length sequences.
We call these objects ordered 𝑘-subsets of an 𝑛-element set, and denote the
number of such ordered sets by
$$P(n, k) = \frac{n!}{(n-k)!}$$
Note that this number can also be written as 𝑛 · (𝑛 − 1) · · · (𝑛 − 𝑘 + 1), which
hints at an alternative way of computing these numbers. We can perform the
ordering and choosing of elements at the same time. The first element of our
sequence can be any of the 𝑛 elements of the set. The next element can be any but
the first, leaving us with 𝑛 − 1 choices, and so on. The difference to the permutation
is that we stop after choosing the 𝑘’th element, which we can do in (𝑛 − 𝑘 + 1)
ways.

17.4 Binomial Coefficients


Finally, we are going to do away with the “ordered” part of the ordered 𝑘-subsets,
and count the number of subsets of size 𝑘 of an 𝑛-element set. This number is
called the binomial coefficient, and is probably the most important combinatorial
number there is.
To compute the number of 𝑘-subsets of a set of size 𝑛, we start with all the
𝑃 (𝑛, 𝑘) ordered subsets. Any particular unordered 𝑘-subset can be ordered in

exactly 𝑘! different ways. Hence, there must be $\frac{P(n,k)}{k!}$ unordered subsets, by the
same grouping argument we used when determining 𝑃 (𝑛, 𝑘) itself.
For example, consider again the ordered 2-subsets of the set {𝑎, 𝑏, 𝑐, 𝑑 }, of
which there are 12.
For example, consider again the ordered 2-subsets of the set {𝑎, 𝑏, 𝑐, 𝑑 }, of
which there are 12.

ab ba ca da

ac bc cb db

ad bd cd dc

The subset {𝑎, 𝑏} can be ordered in 2! ways - the ordered subsets 𝑎𝑏 and 𝑏𝑎.
Since each unordered subset is responsible for the same number of ordered
subsets, we get the number of unordered subsets by dividing 12 with 2!, giving
us the 6 different 2-subsets of {𝑎, 𝑏, 𝑐, 𝑑 }.

ab

ac bc

ad bd cd

Definition 17.2 — Binomial Coefficient
The number of 𝑘-subsets of an 𝑛-set is called the binomial coefficient
$$\binom{n}{k} = \frac{n!}{k!(n-k)!}$$
This is generally read as “𝑛 choose 𝑘”.


Note that
$$\binom{n}{k} = \frac{(n-k+1) \cdot (n-k+2) \cdots (n-1) \cdot n}{1 \cdot 2 \cdots (k-1) \cdot k}$$

They are thus the product of 𝑘 numbers, divided by another 𝑘 numbers. With
this fact in mind, it does not seem unreasonable that they should be computable


in 𝑂 (𝑘) time. Naively, one might try to compute them by first multiplying the 𝑘
numbers in the numerator, then the 𝑘 numbers in the denominator, and finally
dividing them.
Unfortunately, both of these numbers grow quickly. Indeed, already at 21!
we have outgrown a 64-bit integer. Instead, we will compute the binomial
coefficient by alternating multiplications and divisions. We start by
storing $1 = \frac{1}{1}$. Then, we multiply with 𝑛 − 𝑟 + 1 and divide with 1, leaving
us with $\frac{n-r+1}{1}$. In the next step we multiply with 𝑛 − 𝑟 + 2 and divide with 2,
having computed $\frac{(n-r+1) \cdot (n-r+2)}{1 \cdot 2}$. After doing this 𝑟 times, we will be left with
our binomial coefficient.
There is one big question mark from performing this procedure - why must
our intermediate result always be integer? This must be true if our procedure is
correct, or we will at some point perform an inexact integer division, leaving
us with an incorrect intermediate quotient. If we study the partial results more
closely, we see that they are binomial coefficients themselves, namely 𝑛−𝑟1 +1 ,

𝑛−𝑟 +2
, . . . , 𝑛−1
𝑟 −1 , 𝑟 . Certainly, these numbers must be integers. As we just
  𝑛 
2
showed, the binomial coefficients count things, and counting things tend to
result in integers.
As a bonus, we discovered another useful identity in computing binomial
coefficients:
$$\binom{n}{r} = \frac{n}{r}\binom{n-1}{r-1}$$

Exercise 17.4. Prove this identity combinatorially, by first multiplying both sides
with 𝑟 . (Hint: both sides count the number of ways to do the same two-choice
process, but in different order.)

We have one more useful trick up our sleeves. Currently, if we want to
compute e.g. $\binom{10^9}{10^9 - 1}$, we have to perform $10^9 - 1$ operations. To avoid this,
we exploit a symmetry of the binomial coefficient. Assume we are working
with subsets of some 𝑛-element set 𝑆. Then, we can define a bijection from
the subsets of 𝑆 onto itself by taking complements. Since a subset 𝑇 and its
complement 𝑆 \ 𝑇 are disjoint, we have |𝑆 \ 𝑇 | = |𝑆 | − |𝑇 |. This means that
every 0-subset is mapped bijectively to every 𝑛-subset, every 1-subset to every
(𝑛 − 1)-subset, and every 𝑟 -subset to every (𝑛 − 𝑟 )-subset.
However, if we can bijectively map 𝑟 -subsets to (𝑛 − 𝑟 )-subsets, there must
be equally many such subsets. Since there are $\binom{n}{r}$ subsets of the first kind and
$\binom{n}{n-r}$ subsets of the second kind, they must be equal:
$$\binom{n}{r} = \binom{n}{n-r}$$

More intuitively, our reasoning is basically “choosing what 𝑟 elements
to include in a set is the same as choosing what 𝑛 − 𝑟 elements to exclude”.
This is very useful in our example of computing $\binom{10^9}{10^9-1}$, since this equals
$\binom{10^9}{1} = 10^9$. More generally, this enables us to compute binomial coefficients in
𝑂(min{𝑟, 𝑛 − 𝑟}) instead of 𝑂(𝑟).
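A minimal C++ sketch of this procedure (our own illustration; the function name is an assumption, not from the text), combining the alternating multiplication/division with the symmetry above. Note that the intermediate product can be up to 𝑟 times larger than the final coefficient, so we assume it still fits in an unsigned 64-bit integer:

#include <algorithm>
#include <cstdint>

// Computes C(n, r) by alternating multiplications and exact divisions.
uint64_t binomial(uint64_t n, uint64_t r) {
    r = std::min(r, n - r);  // symmetry: C(n, r) = C(n, n - r)
    uint64_t result = 1;
    for (uint64_t i = 1; i <= r; i++) {
        // After this step, result = C(n - r + i, i), always an integer,
        // so the division is exact.
        result = result * (n - r + i) / i;
    }
    return result;
}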
Problem 17.6
Binomial Coefficients – binomial

Sjecista
Croatian Olympiad in Informatics 2006/2007, Contest #2
In a convex polygon with 𝑁 sides, line segments are drawn between all pairs
of vertices in the polygon, so that no three line segments intersect in the same
point. Some pairs of these inner segments intersect, however.
For 𝑁 = 6, this number is 15.

Figure 17.1: A polygon with 4 vertices.

Given 𝑁 , determine how many pairs of segments intersect.


Input
The integer 3 ≤ 𝑁 ≤ 100.
Output
The number of pairs of segments that intersect.
The problem is a classical counting problem. If we compute the answer by


hand starting at 𝑁 = 0, we get 0, 0, 0, 0, 1, 5, 15, 35. A quick lookup on OEIS²
suggests that the answer is the binomial coefficient $\binom{N}{4}$. While this certainly is a
legitimate strategy when solving problems on your own, this approach is usually not
applicable at contests, where access to the Internet tends to be restricted.

Figure 17.2: Four points taken from Figure 17.1.

Instead, let us find some kind of bijection between the objects we count
(intersections of line segments) and something easier to count. This strategy is
one of the basic principles of combinatorial counting. An intersection is defined
by two line segments, of which there are $\binom{N}{2}$. Does every pair of segments
intersect? In Figure 17.2, two segments (the solid segments) do not intersect.
However, two other segments which together have the same four endpoints
do intersect with each other. This suggests that line segments were the wrong
level of abstraction when finding a bijection. On the other hand, if we choose
a set of four points, the segments formed by the two diagonals in the convex
quadrilateral given by those four points will intersect at some point (the dashed
segments in Figure 17.2).
Conversely, any intersection of two segments gives rise to such a quadrilateral
– the one given by the four endpoints of the segments that intersect. Thus there
exists a bijection between intersections and quadrilaterals, meaning that there
must be an equal number of both. There are $\binom{N}{4}$ such choices of quadrilaterals,
meaning there are also $\binom{N}{4}$ points of intersection.
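Since 𝑁 ≤ 100, the answer fits comfortably in a 64-bit integer, and a full solution is just one binomial evaluation. A sketch (our own, under the input format above):

#include <iostream>

int main() {
    long long N;
    std::cin >> N;
    // C(N, 4), computed with exact intermediate divisions
    long long ans = N * (N - 1) / 2 * (N - 2) / 3 * (N - 3) / 4;
    std::cout << ans << std::endl;
}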




Exercise 17.5. Prove that

1) $\binom{n}{k} = \binom{n-1}{k-1} + \binom{n-1}{k}$

2) $\sum_{k=0}^{n} \binom{n}{k} = 2^n$

3) $\sum_{k=0}^{n} (-1)^k \binom{n}{k} = 0$

4) $\sum_{k=0}^{n} \binom{n}{k} 2^k = 3^n$

5) $\sum_{k=0}^{n} \binom{n}{k} \left[ \sum_{l=0}^{k} \binom{k}{l} 2^l \right] = 4^n$

²https://oeis.org/A000332


Dyck Paths
In a grid of width 𝑊 and height 𝐻, we stand in the lower left corner at coordinates
(0, 0), wanting to venture to the upper right corner at (𝑊 , 𝐻 ). To do this, we
are only allowed two different moves – we can either move one unit north, from
(𝑥, 𝑦) to (𝑥, 𝑦 + 1) or one unit east, to (𝑥 + 1, 𝑦). Such a path is called a Dyck
path.

Figure 17.3: A Dyck path on a grid of width 8 and height 5.

As is the spirit of this chapter, we ask how many Dyck paths there are in
a grid of size 𝑊 × 𝐻 . The solution is based on two facts: a Dyck path consists of
exactly 𝐻 + 𝑊 moves, and exactly 𝐻 of those should be northbound moves, and
𝑊 eastbound. Conversely, any path consisting of exactly 𝐻 + 𝑊 moves where
exactly 𝐻 of those are northbound moves is a Dyck path.
If we consider e.g. the Dyck path in Figure 17.3, we can write down the
sequence of moves we made, with the symbol 𝑁 for northbound moves and 𝐸
for eastbound moves:
𝐸𝐸𝑁𝐸𝑁𝑁𝐸𝐸𝐸𝑁𝐸𝐸𝑁
Such a sequence must consist of all 𝐻 + 𝑊 moves, with exactly 𝐻 “𝑁”-moves.
There are exactly $\binom{H+W}{H}$ such sequences, since this is the number of ways we
can choose the subset of positions which should contain the 𝑁 moves.

Figure 17.4: The two options for the last possible move in a Dyck path.

If we look at Figure 17.3, we can find another way to arrive at the same


answer. Letting 𝐷 (𝑊 , 𝐻 ) be the number of Dyck paths in a 𝑊 × 𝐻 grid, some


case work on the last move gives us the recurrence
𝐷 (𝑊 , 𝐻 ) = 𝐷 (𝑊 − 1, 𝐻 ) + 𝐷 (𝑊 , 𝐻 − 1)
with base cases
𝐷 (0, 𝐻 ) = 𝐷 (𝑊 , 0) = 1
We introduce a new function 𝐷′, defined by 𝐷′(𝑊 + 𝐻, 𝑊) = 𝐷(𝑊, 𝐻).
This gives us the recurrence
$$D'(W + H, W) = D'(W + H - 1, W - 1) + D'(W + H - 1, W)$$
with base cases
$$D'(H, 0) = D'(W, W) = 1$$
These are exactly the relations satisfied by the binomial coefficients (Exercise 17.5), so $D(W, H) = \binom{W+H}{W}$.
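The recurrence also translates directly into a bottom-up DP; a minimal sketch (our own, with an assumed function name), useful when a whole table of path counts is needed. Note that the counts grow quickly, so this only works for small grids with 64-bit integers:

#include <cstdint>
#include <vector>

// Number of Dyck paths in a W x H grid via D(W, H) = D(W-1, H) + D(W, H-1).
uint64_t dyckPaths(int W, int H) {
    // Base cases D(0, h) = D(w, 0) = 1 are covered by initializing to 1.
    std::vector<std::vector<uint64_t>> D(W + 1, std::vector<uint64_t>(H + 1, 1));
    for (int w = 1; w <= W; w++)
        for (int h = 1; h <= H; h++)
            D[w][h] = D[w - 1][h] + D[w][h - 1];
    return D[W][H];
}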
Exercise 17.6. Prove that $\sum_{i=0}^{n} \binom{n}{i}\binom{n}{n-i} = \binom{2n}{n}$.

While Dyck paths sometimes do appear directly in problems, they are also a
useful tool to find bijections to other objects.

Sums
In how many ways can the integers $0 \le a_1, a_2, \ldots, a_k$ be chosen such that
$$\sum_{i=1}^{k} a_i = n$$

Input
The integers 0 ≤ 𝑛 ≤ 10⁶ and 0 ≤ 𝑘 ≤ 10⁶.
Output
Output the number of ways modulo 10⁹ + 7.
Given a Dyck path such as the one in Figure 17.3, what happens if we count
the number of northbound steps we take at each 𝑥-coordinate? There are a total
of 𝑊 + 1 coordinates and 𝐻 northbound steps, so we expect this to be a sum of
𝑊 + 1 non-negative variables with total 𝐻. This is indeed similar to what
we are counting, and Figure 17.5 shows this connection explicitly.
This gives us a bijective mapping between sums of 𝑘 terms with a sum of
𝑛 and Dyck paths on a grid of size (𝑘 − 1) × 𝑛. We already know how
many such Dyck paths there are: $\binom{n+k-1}{n}$.
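Since the answer is required modulo 10⁹ + 7 (a prime) and 𝑛 + 𝑘 − 1 can be large, a common way to evaluate the binomial is to precompute factorials and use modular inverses via Fermat's little theorem (modular arithmetic is the topic of the number theory chapter). A sketch under these assumptions, with our own function names:

#include <cstdint>
#include <vector>
using namespace std;

const int64_t MOD = 1'000'000'007;

// Computes b^e mod m by repeated squaring.
int64_t modpow(int64_t b, int64_t e, int64_t m) {
    int64_t r = 1; b %= m;
    for (; e > 0; e /= 2) {
        if (e % 2 == 1) r = r * b % m;
        b = b * b % m;
    }
    return r;
}

// C(n, r) mod the prime MOD, using n!, r!, (n-r)! and Fermat inverses.
int64_t binomMod(int64_t n, int64_t r) {
    vector<int64_t> fact(n + 1, 1);
    for (int64_t i = 2; i <= n; i++) fact[i] = fact[i - 1] * i % MOD;
    int64_t inv = modpow(fact[r] * fact[n - r] % MOD, MOD - 2, MOD);
    return fact[n] * inv % MOD;
}

// For the Sums problem (assuming k >= 1), the answer is binomMod(n + k - 1, n).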


Figure 17.5: A nine-term sum (0 + 0 + 1 + 2 + 0 + 0 + 1 + 0 + 1 = 5) as a Dyck path.

Catalan Numbers
A special case of the Dyck paths are the paths on a square grid that do not cross
the diagonal of the grid. See Figure 17.6 for an example.

Figure 17.6: A valid path (left) and an invalid path (right).

We are now going to count the number of such paths, the most complex
counting problem we have encountered so far. It turns out there is a straightforward
bijection between the invalid Dyck paths, i.e. those that do cross
the diagonal of the grid, and Dyck paths in a grid of different dimensions. In
Figure 17.6, the right grid contains a path that crosses the diagonal. If we take
the part of the grid just after the first segment that crossed the diagonal and
mirror it in the diagonal translated one unit upwards, we get the situation in
Figure 17.7.
We claim that when mirroring the remainder of the path in this translated
diagonal, we will get a new Dyck path on the grid of size (𝑛 − 1) × (𝑛 + 1).
Assume that the first crossing is at the point (𝑐, 𝑐). Then, after taking one step up
in order to cross the diagonal, the remaining path goes from (𝑐, 𝑐 + 1) to (𝑛, 𝑛).
This needs 𝑛 − 𝑐 steps to the right and 𝑛 − 𝑐 − 1 steps up. When mirroring, this
instead turns into 𝑛 − 𝑐 steps up and 𝑛 − 𝑐 − 1 steps right. Continuing from (𝑐, 𝑐 + 1),


Figure 17.7: Mirroring the part of the Dyck path after its first diagonal crossing.

the new path must thus end at (𝑐 + (𝑛 − 𝑐 − 1), 𝑐 + 1 + (𝑛 − 𝑐)) = (𝑛 − 1, 𝑛 + 1).
This mapping is also bijective.
This bijection lets us count the number of paths that do cross the diagonal:
there are $\binom{2n}{n+1}$ of them. The number of paths that do not cross the diagonal is then
$\binom{2n}{n} - \binom{2n}{n+1}$.

Definition 17.3 — Catalan Numbers


The number of Dyck paths in an 𝑛 × 𝑛 grid that do not cross the diagonal is called the 𝑛'th Catalan number

$$C_n = \binom{2n}{n} - \binom{2n}{n+1} = \binom{2n}{n} - \frac{n}{n+1}\binom{2n}{n} = \frac{1}{n+1}\binom{2n}{n}$$

The first few Catalan numbers³ are 1, 1, 2, 5, 14, 42, 132, 429, 1430.
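Given the closed form, computing Catalan numbers is a one-liner on top of the binomial sketch from earlier in this chapter (the division by 𝑛 + 1 is exact since 𝐶𝑛 is an integer):

// The n'th Catalan number, reusing the binomial() sketch from above.
uint64_t catalan(uint64_t n) { return binomial(2 * n, n) / (n + 1); }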
Problem 17.7
Catalan Numbers – catalan

Catalan numbers count many other objects, most notably the number of
balanced parentheses expressions. A balanced parentheses expression is a string
$s_1 s_2 \ldots s_{2n}$ of 2𝑛 letters ( and ), such that every prefix $s_1 s_2 \ldots s_k$
contains at least as many letters ( as ). Given such a string, like (()())(()), we
can interpret it as a Dyck path, where ( is a step to the right, and ) is a step
upwards. Then, the condition that the string is balanced is that, for every partial
Dyck path, we have taken at least as many right steps as we have taken up steps.
This is equivalent to the Dyck path never crossing the diagonal, giving us a
bijection between parentheses expressions and Dyck paths. The number of such
parentheses expressions is thus also $C_n$.

³https://oeis.org/A000108


17.5 The Principle of Inclusion and Exclusion


Often, we wish to compute the size of the union of a collection of sets
𝑆 1, 𝑆 2, . . . , 𝑆𝑛 , where these sets are not pairwise disjoint. For this, the prin-
ciple of inclusion and exclusion was developed.

Figure 17.8: The union of two sets 𝐴 and 𝐵.

Let us consider the most basic case of the principle, using two sets 𝐴 and
𝐵. If we wish to compute the size of their union |𝐴 ∪ 𝐵|, we at least need to
count every element in 𝐴 and every element in 𝐵, i.e. |𝐴| + |𝐵|. The problem with
this formula is that whenever an element is in both 𝐴 and 𝐵, we count it twice.
Fortunately, this is easily mitigated: the number of elements in both sets equals
|𝐴 ∩ 𝐵| (Figure 17.8). Thus, we see that |𝐴 ∪ 𝐵| = |𝐴| + |𝐵| − |𝐴 ∩ 𝐵|.
Similarly, we can determine a formula for the union of three sets |𝐴 ∪ 𝐵 ∪ 𝐶 |.
We begin by including every element: |𝐴| + |𝐵| + |𝐶 |. Again, we have included
the pairwise intersections too many times, so we remove those and get

|𝐴| + |𝐵| + |𝐶 | − |𝐴 ∩ 𝐵| − |𝐴 ∩ 𝐶 | − |𝐵 ∩ 𝐶 |

This time, however, we are not done. While we have counted the elements
which are in exactly one of the sets correctly (using the first three terms), and
the elements which are in exactly two of the sets correctly (by removing the
double-counting using the three latter terms), we currently do not count the
elements which are in all three sets at all! Thus, we need to add them back,
which gives us the final formula:

|𝐴 ∪ 𝐵 ∪ 𝐶 | = |𝐴| + |𝐵| + |𝐶 | − |𝐴 ∩ 𝐵| − |𝐴 ∩ 𝐶 | − |𝐵 ∩ 𝐶 | + |𝐴 ∩ 𝐵 ∩ 𝐶 |

Exercise 17.7. Compute the number of integers between 1 and 1000 that are
divisible by 2, 3 or 5.


From the two examples, you can probably guess the formula in the general case,
which we write in the following way:

$$\left|\bigcup_{i=1}^{n} S_i\right| = \sum_{i} |S_i| - \sum_{i<j} |S_i \cap S_j| + \sum_{i<j<k} |S_i \cap S_j \cap S_k| - \cdots + (-1)^{n+1}|S_1 \cap S_2 \cap \cdots \cap S_n|$$

From this formula, we see the reason behind the naming of the principle.
We include every element, exclude the ones we double-counted, include the
ones we removed too many times, and so on. The principle is based on a very
important assumption – that it is easier to compute intersections of sets than their
unions. Whenever this is the case, you might want to consider if the principle is
applicable.
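As a concrete illustration (our own, not from the text), Exercise 17.7 can be solved mechanically with the principle: iterate over all non-empty subsets of {2, 3, 5}; the intersection "divisible by all numbers in the subset" contains ⌊𝑛/lcm⌋ elements, added or subtracted depending on the subset's parity. A sketch assuming C++17 for std::lcm and a GCC/Clang builtin for popcount:

#include <cstdint>
#include <numeric>
#include <vector>
using namespace std;

// Counts integers in [1, n] divisible by at least one element of ds.
int64_t divisibleByAny(int64_t n, const vector<int64_t>& ds) {
    int64_t total = 0;
    for (int mask = 1; mask < (1 << ds.size()); mask++) {
        int64_t l = 1;
        for (size_t i = 0; i < ds.size(); i++)
            if (mask & (1 << i)) l = lcm(l, ds[i]);
        int bits = __builtin_popcount(mask);  // subset size decides the sign
        total += (bits % 2 == 1 ? 1 : -1) * (n / l);
    }
    return total;
}

// divisibleByAny(1000, {2, 3, 5}) gives the answer to Exercise 17.7.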

Derangements
Compute the number of permutations 𝜋 of length 𝑁 such that 𝜋 (𝑖) ≠ 𝑖 for every
𝑖 = 1 . . . 𝑁.
This is a typical application of the principle. We will apply it to the sets
of permutations where the condition is false for a particular index
𝑖. If we let these sets be 𝐷𝑖, the set of all permutations where the condition is
false somewhere is 𝐷1 ∪ 𝐷2 ∪ · · · ∪ 𝐷𝑁. This means we seek 𝑁! − |𝐷1 ∪ · · · ∪ 𝐷𝑁|. To
apply the inclusion and exclusion formula, we must be able to compute the size
of intersections of the subsets of 𝐷𝑖. This task is simplified greatly since the
intersection of 𝑘 such subsets is entirely symmetrical (it does not matter for
which elements the condition is false, only the number).
If we want to compute the intersection of 𝑘 such subsets, this means that
there are 𝑘 indices 𝑖 where 𝜋(𝑖) = 𝑖. There are 𝑁 − 𝑘 other elements, which
can be arranged in (𝑁 − 𝑘)! ways, so the intersection of these sets has size
(𝑁 − 𝑘)!. Since we can choose which 𝑘 elements should be fixed in $\binom{N}{k}$
ways, the term in the formula where we compute all 𝑘-way intersections will
evaluate to $\binom{N}{k}(N-k)! = \frac{N!}{k!}$. Thus, the formula can be simplified to

$$\frac{N!}{1!} - \frac{N!}{2!} + \frac{N!}{3!} - \ldots$$
Subtracting this from 𝑁! means that there are
$$N!\left(1 - 1 + \frac{1}{2!} - \frac{1}{3!} + \ldots\right)$$
derangements. This gives us a Θ(𝑁) algorithm to compute the answer.
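A sketch of that Θ(𝑁) computation (our own; it assumes 𝑁 ≤ 20 so that 𝑁! fits in a 64-bit integer – for larger 𝑁 one would work modulo a prime instead):

#include <cstdint>

// Number of derangements: D(N) = sum over k = 0..N of (-1)^k * N!/k!.
int64_t derangements(int n) {
    int64_t factorial = 1;
    for (int i = 2; i <= n; i++) factorial *= i;
    int64_t term = factorial, sum = factorial;  // the k = 0 term is n!/0! = n!
    for (int k = 1; k <= n; k++) {
        term /= k;  // term is now n!/k!, an exact division
        sum += (k % 2 == 1) ? -term : term;
    }
    return sum;
}

For example, derangements(4) evaluates to 9: of the 24 permutations of four elements, exactly 9 leave no element in place.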


It is possible to simplify this further, using some insights from calculus. We
have that
$$e^{-1} = 1 - 1 + \frac{1}{2!} - \frac{1}{3!} + \ldots$$
Then, we expect that the answer should converge to $\frac{N!}{e}$. As it happens, the
answer will always be $\frac{N!}{e}$ rounded to the nearest integer.

Exercise 17.8. 8 persons are to be seated around a circular table. The company
is made up of 4 married couples, where the two members of a couple prefer
not to be seated next to each other. How many seating arrangements
are possible, assuming that cyclic rotations of an arrangement are considered
equivalent?

17.6 The Pigeon Hole Principle


The Pigeon Hole Principle is a principle as intuitive as it is useful. Its statement
is short and simple: if we place 𝑁 + 1 different objects into 𝑁 boxes, some box
must contain at least 2 objects.

17.7 Invariants
Many problems deal with processes which consist of many steps. During such
processes, we are often interested in certain properties that never change. We
call such a property an invariant. For example, consider the binary search
algorithm to find a value in a sorted array. During the execution of the algorithm,
we maintain the invariant that the value we are searching for must be contained
in some given segment of the array indexed by [lo, hi) at any time. The fact that
this property is invariant basically constitutes the entire proof of correctness
of binary search. Invariants are tightly attached to greedy algorithms, and are a
common tool used in proving correctness of various greedy algorithms. They
are also one of the main tools in proving impossibility results (for example when
answering NO in decision problems).

Permutation Swaps
Given is a permutation 𝑎𝑖 of ⟨1, 2, . . . , 𝑁⟩. Can you perform exactly 𝐾 swaps,
i.e. exchanging pairs of elements of the permutation, to obtain the identity
permutation ⟨1, 2, . . . , 𝑁⟩?


Input
The first line of input contains the size of the permutation 1 ≤ 𝑁 ≤ 100 000.
The next line contains 𝑁 space-separated integers, the permutation 𝑎1, 𝑎2, . . . , 𝑎𝑁.
Output
Output YES if it is possible, and NO if it is impossible.
First, we need to compute the minimum number of swaps needed.
Assume the cycle decomposition of the permutation consists of 𝐶 cycles
(see 17.2.1 for a reminder of this concept), with lengths 𝑏 1, 𝑏 2, .., 𝑏𝐶 . Then, we
need at least
$$S = \sum_{i=1}^{C} (b_i - 1)$$

swaps to return it to the identity permutation, a fact you will be asked to prove
in the next section on monovariants. This gives us one necessary condition:
𝐾 ≥ 𝑆. However, this is not sufficient. A single additional condition is needed –
that 𝑆 and 𝐾 have the same parity! To prove this, we will look at the number
of inversions of a permutation, one of the common invariant properties of
permutations.
Given a permutation 𝑎𝑖, we say that the pair (𝑖, 𝑗) is an inversion if 𝑖 < 𝑗
but 𝑎𝑖 > 𝑎𝑗. Intuitively, the number of inversions is the number of pairs of elements that are “out of
place” in relation to each other.

3 5 2 1 4 – 6 inversions
1 5 2 3 4 – 3 inversions
1 5 3 2 4 – 4 inversions
1 5 3 4 2 – 5 inversions
1 2 3 4 5 – 0 inversions

Figure 17.9: The number of inversions for permutations differing only by a single swap.

If we look at Figure 17.9, where we started out with a permutation and


performed a number of swaps (transforming it to the identity permutation),
we can spot a simple invariant. The parity of the number of swaps and the
number of inversions seems to always be the same. This characterization of
a permutation is called odd and even permutations depending on whether the
number of inversions is odd or even. Let us prove that this invariant actually

holds. Indeed, swapping the elements at positions 𝑖 < 𝑗 flips the inversion
status of the pair (𝑖, 𝑗) itself, while for every position 𝑘 strictly between them,
the pairs (𝑖, 𝑘) and (𝑘, 𝑗) either both flip or both stay unchanged. The number
of inversions therefore changes by an odd amount, so its parity flips with every swap.
If this is the case, it is obvious why 𝑆 and 𝐾 must have the same parity. Since
𝑆 is the number of swaps needed to transform the identity permutation to the
given permutation, it must have the same parity as the number of inversions.
Likewise, if performing 𝐾 swaps takes us to the identity permutation, 𝐾 must
have the same parity as the number of inversions. As 𝐾 and 𝑆 both have the
same parity as the number of inversions, they must have the same parity as each other.
To see why these two conditions are sufficient, we can, after performing 𝑆
swaps to obtain the identity permutation, simply swap the same two numbers
back and forth for the remaining swaps. This can be done since 𝐾 − 𝑆 will be an even number
due to their equal parity.
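Putting the pieces together gives a short solution; a sketch (our own, and since the draft input specification omits 𝐾, we pass it as a parameter):

#include <vector>
using namespace std;

// a holds the 1-indexed permutation values; the answer is YES iff
// K >= S and K has the same parity as S, with S = sum of (cycle length - 1).
bool canReachIdentity(const vector<int>& a, long long K) {
    int n = a.size();
    vector<bool> seen(n, false);
    long long S = 0;
    for (int i = 0; i < n; i++) {
        if (seen[i]) continue;
        long long len = 0;  // length of the cycle containing i
        for (int j = i; !seen[j]; j = a[j] - 1) { seen[j] = true; len++; }
        S += len - 1;
    }
    return K >= S && (K - S) % 2 == 0;
}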

17.8 Monovariants
Another similar tool (sometimes called a monovariant) instead assigns some kind
of value 𝑝(𝑣) to the state 𝑣 of each step of the process. We choose 𝑝 such that it
is strictly increasing or decreasing. Monovariants are mainly used to prove the finiteness
of a process, in which either:

• The value function assumes e.g. integer values, and is easily bounded in
the direction of monotonicity (e.g. an increasing function would have an
upper bound).

• The value function can assume any real value, but there are only finitely
many states the process can be in. In this case, the monovariant is used
to prove that the process will never return to a previous state, since this
would contradict the monotonicity of 𝑝.
Let us begin with a famous problem of the first kind.

Majority Graph Bipartitioning


Given is a graph 𝐺. Find a bipartition of this graph into parts 𝑈 and 𝑉, such
that every vertex 𝑣 has at most $\frac{|N(v)|}{2}$ neighbors in the same part as 𝑣 itself.
Input
The first line of input contains integers 1 ≤ 𝑉 ≤ 100 and 0 ≤ 𝐸 ≤ $\frac{V(V-1)}{2}$ – the
number of vertices and edges respectively. The next 𝐸 lines contain two integers
0 ≤ 𝑎 ≠ 𝑏 < 𝑉, the zero-indexed numbers of two vertices that are connected by
an edge. No pair of vertices will have two edges.


Output
Output 𝑉 integers, one for each vertex. The 𝑖'th integer should be 1 or 2 if the
𝑖'th vertex is in the first or the second part of the partition, respectively.
As an example, consider the valid and invalid partitionings in Figure 17.10.
The vertices which do not fulfill the neighbor condition are marked in gray.

Figure 17.10: An invalid bipartitioning, where vertices 𝐵, 𝐷, 𝐺 break the condition.

Problems generally considered greedy problems and pure monovariant
problems usually differ in that the choice of the next action has less thought
behind it in the monovariant problems. We will often focus not on optimally
short sequences of choices as we do with greedy algorithms, but merely on finding
any valid configuration. For example, in the problem above, one might try to
construct a greedy algorithm based on for example the degrees of the vertices,
which seems reasonable. However, it turns out there is not enough structure in
the problem to find any simple greedy algorithm to solve the problem.
Instead, we will attempt to use the most common monovariant attack.
Roughly, the process follows these steps:
1. Start with any arbitrary state 𝑠.

2. Look for some kind of modification to this state, which is possible if and
only if the state is not admissible. Generally, the goal of this modification
is to “fix” whatever makes the state inadmissible.

3. Prove that there is some value 𝑝 (𝑠) that must decrease whenever such a
modification is done.

4. Prove that this value cannot decrease infinitely many times.


Using these four rules, we prove the existence of an admissible state. If (and
only if) 𝑠 is not admissible, by step 2 we can perform some specified action on


it, which by step 3 will decrease the value 𝑝 (𝑠). Step 4 usually follows from one
of the two value functions discussed previously. Hence, by performing finitely
many such actions, we must (by rule 4) reach a state where no such action is
possible. This happens only when the state is admissible, meaning such a state
must exist. The process might seem a bit abstract, but will become clear once
we walk you through the bipartitioning step.
Our algorithm will work as follows. First, consider any bipartition of the
graph. Assume that this graph does not fulfill the neighbor condition. Then,
there must exist a vertex 𝑣 which has more than $\frac{|N(v)|}{2}$ vertices in the same part
as 𝑣 itself. Whenever such a vertex exists, we move any such vertex to the other side
of the partition. See Figure 17.11 for an illustration of this process.

Figure 17.11: Two iterations of the algorithm, which brings the graph to a valid state.

One question remains – why does this move guarantee a finite process? We
now have a general framework to prove such things, which suggests that perhaps
we should look for a value function 𝑝 (𝑠) which is either strictly increasing or
decreasing as we perform an action. By studying the algorithm in action in
Figure 17.11 we might notice that more and more edges tend to go between the


two parts. In fact, this number never decreased in our example, and it turns out
this is always the case.
If a vertex 𝑣 has 𝑎 neighbors in the same part, 𝑏 neighbors in the other part,
and violates the neighbor condition, this means that 𝑎 > 𝑏. When we move 𝑣 to
the other part, the 𝑏 edges from 𝑣 to its neighbors in the other part will no longer
be between the two parts, while the 𝑎 edges to its neighbors in the same part will.
This means the number of edges between the parts will change by 𝑎 − 𝑏 > 0.
Thus, we can choose this as our value function. Since this is an integer function
with the obvious upper bound of 𝐸, we complete step 4 of our proof technique
and can thus conclude the final state must be admissible.
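A sketch of the resulting algorithm (our own implementation of the process described above; with 𝑉 ≤ 100 the repeated scanning is easily fast enough, since at most 𝐸 moves can ever be made):

#include <vector>
using namespace std;

// Returns a part number (1 or 2) per vertex such that no vertex has
// more than half of its neighbors in its own part.
vector<int> bipartition(int V, const vector<vector<int>>& adj) {
    vector<int> part(V, 1);  // any starting bipartition works
    bool changed = true;
    while (changed) {
        changed = false;
        for (int v = 0; v < V; v++) {
            int same = 0;
            for (int u : adj[v]) if (part[u] == part[v]) same++;
            if (2 * same > (int)adj[v].size()) {  // v violates the condition
                part[v] = 3 - part[v];  // move v; the cut size increases
                changed = true;
            }
        }
    }
    return part;
}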
In mathematical problem solving, monovariants are usually used to prove
that an admissible state exists. However, such problems are really algorithmic
problems in disguise, since they actually provide an algorithm to construct such
an admissible state.
Let us complete our study of monovariants by also showing a problem using
the second value function rule.

Water Pistols
𝑁 girls and 𝑁 boys stand on a large field, with no line going through three
different children.
Each girl is equipped with a water pistol, and wants to pick a boy to fire at.
While the boys probably will not appreciate being drenched in water, at least the
girls are a fair menace – they will only fire at a single boy each. Unfortunately, it
may be the case that two girls choose which boys to fire at in such a way that
the water from their pistols will cross at some point. If this happens, they will
cancel each other out, never hitting their targets.
Help the girls choose which boys to fire at, in such a way that no two girls
fire at the same boy, and the water fired by two girls will not cross.


Figure 17.12: An assignment where some beams intersect (left), and an assignment where no
beams intersect (right).

Input
The first line contains the integer 𝑁 ≤ 200. The next 𝑁 lines contain two real
numbers −10⁶ ≤ 𝑥, 𝑦 ≤ 10⁶, separated by a space. Each line is the coordinate
(𝑥, 𝑦) of a girl. The next and final 𝑁 lines contain the coordinates of the boys,
in the same format.
Output
Output 𝑁 lines. The 𝑖’th line should contain the zero-indexed number of the boy
which the 𝑖’th girl should fire at.
After seeing the solution to the previous problem, this solution should not
come as a surprise. We start by randomly assigning the girls to one boy each,
with no two girls shooting at the same boy. If this assignment contains two girls
firing water beams which cross, we simply swap their targets.
Unless you are geometrically minded, it may be hard to figure out an
appropriate value function. The naive value function of counting the current
number of water beams crossing unfortunately fails – and might even increase
after a move.
Instead, let us look closer at what happens when we switch the targets of
two girls. In Figure 17.13, we see the before and after of such an example, as
well as the two situations interposed. If we consider the sum of the two lengths
of the water beams before the swap ((𝐶 + 𝐷) + (𝐸 + 𝐹)) versus the lengths after
the swap (𝐴 + 𝐵), we see that the latter must be less than the former. Indeed, we
have 𝐴 < 𝐶 + 𝐷 and 𝐵 < 𝐸 + 𝐹 by the triangle inequality, which by summing the
two inequalities gives the desired result. Thus the sum of all water beam lengths


Figure 17.13: Swapping the targets of two intersecting beams.

will decrease whenever we perform such a move. As students of algorithmics,
we can make the additional note that this means the minimum-cost matching
of the complete bipartite graph of girls and boys, where the cost of an edge is the
distance between a particular girl and boy, is a valid assignment. If this was
not the case, we would be able to swap two targets and decrease the cost of the
matching, contradicting the assumption that it was minimum-cost. Thus, this
rather mathematical proof actually ended up giving us a very simple reduction
to min-cost matching.
Problem 17.8
Army Division – armydivision
Bread Sorting – breadsorting

17.9 Chapter Notes

18 Game Theory

In ordinary life, most of us are familiar with the concept of a game. We play video
games, sports, card games, board games or any of the other many kinds of games.
As algorithmists, we primarily focus on a subset of competitive games with strategic
aspects and well-defined rules, where determining who won is simple. Games
such as chess, poker, tic-tac-toe or Yahtzee belong to this category, unlike
soccer (running humans and the behaviour of rolling balls are not sufficiently
well-defined) or most real-time video games where reaction speed counts. The
mathematical area analyzing this kind of game is called game theory.
The games we deal with in algorithmic problem solving are a small subset of
this category. We are often given a position in some kind of abstract turn-based
game, tasked to determine if the player at turn wins if both players play as well
as possible. Players never make mistakes in the games we analyze. For example,
the game of tic-tac-toe is considered to be a drawn game, since perfect play in
the game always results in a draw.
In this chapter, we learn some basic techniques for determining who wins
a certain game. Occasionally a problem also asks us to construct an optimal
strategy (for example by making the problem interactive and playing against us).
This is often the case when it is “obvious” who wins the game, or at least when
it's very easy to guess a winner but harder to prove why. Proofs often present us
with an optimal strategy, so we will generally aim to prove who wins even when
one, in for example a contest situation, would just guess.

18.1 Mathematical Techniques

Before we dive into analysis of games that will require programming, we start
with some of the more basic techniques that one might use to solve games
given in mathematical rather than algorithmic problems. They are useful in
algorithmic problem solving as well, while also serving as an introduction to
the kind of games we try to solve.


Symmetry
When children first learn to play chess, an early attempted strategy is that
of playing symmetrically. White moves its E pawn two squares forward, the
child responds with the same, and so on. Of course, this is not a very good
strategy – once white plays a winning move, it is very difficult for black
to reply with the symmetric move. In some variants of chess, this strategy works
better.

Knight Packing
On a 𝑛 × 𝑛 chess board, two players alternate placing a knight on the board. A
knight can only be placed if there is no other knight which would be either 1
row and 2 columns or 2 rows and 1 column away from it. The first player who
cannot place a knight on the board loses. Given 𝑛 (1 ≤ 𝑛 ≤ 109 ), determine if
the first or second player to move wins.

Solution. In all problems of this kind, where one is supposed to make a move
by e.g. choosing a square to place something at, the first question should
be – if my opponent makes the first move, can I pick a symmetric move that
is always possible? This symmetry can manifest in several ways, such as
mirroring a move along an axis, rotating it 180◦ around a center, or even taking
the complement of a subset, where a move consists of choosing a subset of
something. Games on grids are great candidates for a mirroring strategy using
the first two transformations.


Coloring Game 1
A graph consists of 𝑛 vertices and 𝑛 edges, connecting the vertices into a single
cycle of length 𝑛. Two players play the following game on this graph. Initially,
all vertices are colored white. A move consists of coloring one of the white
vertices red or blue, where player 1 colors vertices red, and player 2 colors them
blue. A vertex can only be colored red or blue if neither it nor its two neighbours
have that same color.
For a given 𝑛, if both players play optimally, who wins?

Solution. The first step in most games should be to try to solve it for a few
small instances, to see if a pattern emerges. 


18.2 The Graph Game


Most of the games we study can be reduced to the following abstraction, which
we will call the Graph Game. Given a directed graph 𝐺, a game piece is placed
on one of the vertices (generally called positions) in the graph. Two players
(Q and q) alternate in making moves, where a move consists of taking the
piece from its current position 𝑣 and moving it along an outgoing edge from 𝑣.
Whenever a player is unable to make such a move, i.e. when the outdegree of 𝑣
is 0, that player loses the game (called a normal game)1. We call these states
terminal positions.

Figure 18.1: An example of a graph game with 6 positions. Player Q starts and has three possible
moves 𝐴, 𝐵 and 𝐶. Q chooses to move to 𝐴, whereupon q responds with the only available
move 𝐷. Finally, Q ends the game with the move 𝐸, leaving q with no possible moves, who
therefore loses the game.

We will study three variants of the game:

• The Graph Game on acyclic graphs.

• The Graph Game on general graphs.

• The Graph Game where a position may only be visited once.

¹The opposite kind of game, where the player unable to make a move wins, is called a Misère
game.


Problems relating to games often ask us whether a certain position 𝑣 is a
winning position. This means that, if the game piece is currently placed on
𝑣, the current player can always play in such a way that the other player will
eventually be unable to move (thus losing). Conversely, a losing position is one
where the current player will eventually lose if the other player moves optimally.

Acyclic Games
An acyclic graph game admits a simple classification of winning and losing
positions.

Figure 18.2: The winning and losing positions of the game in Figure 18.1

A position is winning if and only if at least one of its neighbouring positions


is a losing position, and losing if and only if all of its neighbouring positions
are winning positions. This can be proved by a short inductive argument. If
a position has a losing position as neighbour, we can ensure our opponent has a
losing position. This means the opponent will lose if we play optimally. On the
other hand, if all the neighbours are winning positions, we are forced to give
our opponent a winning position, meaning we will lose if the opponent plays
optimally.
bool isWinning(graph& G, vertex current) {
    // A position is winning iff some move leads to a losing position
    vector<vertex> moves = G.adj(current);
    for (vertex pos : moves) {
        if (!isWinning(G, pos)) return true;
    }
    return false;
}


If we memoize this algorithm, it is linear in the number of edges of the


graph.
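A memoized sketch (ours; the three-state encoding is an assumption) could look as follows. Each position is computed once, and each edge inspected once:

#include <vector>
using namespace std;

enum State { UNKNOWN, WIN, LOSE };

// adj is the game graph; memo must be initialized to UNKNOWN.
bool isWinning(const vector<vector<int>>& adj, vector<State>& memo, int v) {
    if (memo[v] != UNKNOWN) return memo[v] == WIN;
    State res = LOSE;  // a position with no losing neighbour is losing
    for (int u : adj[v])
        if (!isWinning(adj, memo, u)) { res = WIN; break; }
    memo[v] = res;
    return res == WIN;
}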

General Games
Non-Repetitive Games

19 Number Theory
Number theory is the study of certain properties of integers. It makes an
occasional appearance within algorithmic problem solving, in the form of its
subfield computational number theory. It is within number theory that topics such
as divisibility and prime numbers belong.
In competitions, number theory problems range from simple applications
of the main theorems you learn in this chapter, to trickier tasks where you must
combine hard number theoretical insights with other algorithmic techniques.
The latter can involve required insights that are difficult mathematical problems
themselves. This should not come as a surprise. Most content in this chapter is
essentially about learning efficient methods of computing the standard number
theoretical objects, such as primes, modular inverses and divisors, and becoming
well acquainted with time complexities and other asymptotic approximations that
tend to arise in number theoretical problems.

19.1 Divisibility
All of the number theory in this chapter relates to a single property of integers:
divisibility.

Definition 19.1 — Divisibility


An integer 𝑛 is divisible by an integer 𝑑 if there exists an integer 𝑞 such that
𝑛 = 𝑑𝑞. We then call 𝑑 a divisor of 𝑛.
We use the notation 𝑑 | 𝑛 to state that 𝑑 is a divisor of 𝑛, and 𝑑 ∤ 𝑛 when
it is not.
Dividing both sides of the equality 𝑛 = 𝑑𝑞 by 𝑑 gives us an almost
equivalent definition, namely that $\frac{n}{d}$ is an integer. The difference is that the
first definition admits the divisibility of 0 by 0, while the second one does not
(zero division is undefined). When we speak of the divisors of a number in
most contexts (as in Example 19.1), we generally consider only the non-negative
divisors. Since 𝑑 is a divisor of 𝑛 if and only if −𝑑 is a divisor of 𝑛, this
sloppiness loses little information.


Example 19.1 — Divisors of 12


The number 12 has 6 divisors – 1 (1 · 12 = 12), 2 (2 · 6 = 12), 3 (3 · 4 = 12),
4 (4 · 3 = 12), 6 (6 · 2 = 12) and 12 (12 · 1 = 12).
12 is not divisible by e.g. 5 – we have $\frac{12}{5} = 2 + \frac{2}{5}$, which is clearly not
an integer.

Exercise 19.1. Compute the divisors of 7, 18 and 39.


The concept of divisibility raises many questions. First and foremost – how
do we check if a number is divisible by another? This question has one short
and one long answer. For small numbers, i.e. those that fit inside the native
integer types of a language, checking for divisibility is as simple as using the
modulo operator (%) of your favorite programming language. An integer 𝑛 is
divisible by 𝑑 if and only if 𝑛 % 𝑑 = 0, since this means $\frac{n}{d}$ has no remainder and
is therefore an integer.
For large numbers, checking divisibility is more difficult. Some programming
languages, such as Java and Python, have built-in support for dealing with large
integers, but e.g. C++ does not. In Section 19.4 on modular arithmetic, we
discuss the implementation of the modulo operator on large integers.

Dual Divisibility
Given two positive integers 𝑎 and 𝑏 with the same number of digits (1 ≤ 𝑏 ≤
𝑎 ≤ 10¹⁸), compute the number of divisors of 𝑎 that have 𝑏 as a divisor.
For example, with 𝑎 = 96 and 𝑏 = 12, there are 4 such numbers: 12, 24, 48
and 96.

Solution. Assume that 𝑐 is such a number. The solution falls out from some
applications of the definition of divisibility. We have 𝑎 = 𝑐𝑞 and 𝑐 = 𝑏𝑞′ for
some positive integers 𝑞, 𝑞′.
The value of 𝑞′ is at most 9 by the following argument. If 𝑞′ ≥ 10, we have
𝑎 = 𝑐𝑞 ≥ 𝑐 ≥ 10𝑏, but then 𝑎 has more digits than 𝑏, a contradiction. Thus, we
can simply test all the values of 𝑐 by letting 𝑞′ = 1, 2, . . . , 9 and verifying that
the two conditions hold using the modulo operator. 
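A direct sketch of this (ours): for each 𝑞′ we form the candidate 𝑐 = 𝑏𝑞′ and check that it divides 𝑎.

#include <cstdint>

// Counts divisors of a that are themselves divisible by b, using the
// observation that any such c equals b * q' with 1 <= q' <= 9.
int dualDivisibility(int64_t a, int64_t b) {
    int count = 0;
    for (int64_t q = 1; q <= 9; q++) {
        int64_t c = b * q;
        if (c <= a && a % c == 0) count++;
    }
    return count;
}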

Problem 19.1
Dual Divisibility – dualdivisibility
Evening Out 1 – eveningout1


Multiplication Table – multtable


Note: Solve it for 1/2 points.
Divisor Shuffle – divisorshuffle

Next, how do we compute all the divisors of an integer 𝑛?

Divisors
Given an integer 𝑛, compute all the positive divisors of 𝑛.
Every integer has at least two particular divisors called the trivial divisors,
namely 1 and 𝑛 itself. If we exclude the divisor 𝑛, we get the proper divisors.
To find the remaining divisors, we can use the fact that any divisor 𝑑 of 𝑛 must
satisfy |𝑑 | ≤ |𝑛| (why?). This means that we can limit ourselves to testing
whether the integers between 1 and 𝑛 are divisors of 𝑛, a Θ(𝑛) algorithm. We
can do a bit better though, by exploiting a nice symmetry.
Hidden in Example 19.1 lies the key insight to speeding this up. It seems
that whenever we had a divisor 𝑑, we were immediately given another divisor 𝑞.
For example, when claiming 3 was a divisor of 12 since 3 · 4 = 12, we found
another divisor, 4. This is not a surprise, given that the definition of divisibility
(Definition 19.1) – the existence of the integer 𝑞 in 𝑛 = 𝑑𝑞 – is symmetric in 𝑑
and 𝑞, meaning divisors come in pairs $(d, \frac{n}{d})$.

Exercise 19.2. Prove that a positive integer has an odd number of divisors if and
only if it is a perfect square.

Since divisors come in pairs, we can limit ourselves to finding one member
of each such pair. Furthermore, one of the elements in each such pair must be
bounded by $\sqrt{n}$. Otherwise, we would have that $n = d \cdot \frac{n}{d} > \sqrt{n} \cdot \sqrt{n} = n$, a
contradiction (again, 0 is a special case here, where we always have $\frac{0}{d} = 0$). This
limit helps us reduce the time it takes to find the divisors of a number to $\Theta(\sqrt{n})$,
which allows us to solve the problem sufficiently fast.

1: procedure Divisors(𝑁)
2:   divisors ← new list
3:   for 𝑖 from 1 up to 𝑖² ≤ 𝑁 do
4:     if 𝑁 mod 𝑖 = 0 then
5:       divisors.add(𝑖)
6:       if 𝑖 ≠ 𝑁/𝑖 then
7:         divisors.add(𝑁/𝑖)
8: return divisors
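In C++, a direct translation of the pseudocode (our sketch):

#include <cstdint>
#include <vector>
using namespace std;

vector<int64_t> divisors(int64_t N) {
    vector<int64_t> divs;
    for (int64_t i = 1; i * i <= N; i++) {
        if (N % i == 0) {
            divs.push_back(i);
            if (i != N / i) divs.push_back(N / i);  // the paired divisor
        }
    }
    return divs;
}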

Problem 19.2
Divisors – divisors

Let us look at an application of this algorithm.

Subcommittees
In a parliament of 𝑃 ≤ 10¹⁶ people, the speaker wants to divide the parliament
into (at least two) disjoint subcommittees of equal size. Of course, the chair of
such a subcommittee furthermore wants to divide their subcommittee into (at
least two) subsubcommittees of equal size, and so on, until no further divisions
can be performed.
What is the maximum number of levels of subcommittees that can be created?

Solution. What different sizes may the first level of subcommittees have? Well,
if we perform a split into groups of size 𝑘, we get 𝑃/𝑘 such groups. Of course, this
must be an integer – i.e. 𝑘 must be a divisor of 𝑃. This means we are looking
for a sequence of numbers 𝑐0, 𝑐1, 𝑐2, . . . , 𝑐𝑛 such that 𝑐0 = 𝑃, 𝑐𝑖+1 | 𝑐𝑖 and 𝑐𝑛 = 1.
A simple solution would be to generate all divisors of 𝑃 (the possible values
of 𝑐1), attempt a split into those group sizes, and then recursively solve the
problem for them. However, this would be too slow. As an example, if we
take 𝑃 = 8 086 598 962 041 600, the sum of the square roots of its divisors is
6 636 882 083, so even finding only the ways to split the parliament into 2-level
committees would be too expensive.
Instead, we will use the following lemma:

Exercise 19.3. Prove that divisibility is a transitive relationship; if 𝑏 | 𝑎 and


𝑐 | 𝑏, then 𝑐 | 𝑎.

This means that the possible values of 𝑐𝑖 , i.e. the transitive closure of
divisibility of 𝑃, are the divisors of 𝑃. Essentially, we are looking for the longest
sequence of divisors of 𝑃 such that each divisor is also a divisor of the previous
divisor. By constructing the directed graph of all the divisors 𝑎 with edges from
𝑎 to its own divisors, we reduce the problem to finding the longest path in a
DAG. Unfortunately, this too is slow – the above 𝑃 has 41 472 divisors, leaving
us with about $\binom{41\,472}{2} = 859\,942\,656$ modulo operations to construct the graph.


What if we instead try to look at the entire sequence at once? We have
𝑃 = 𝑐0, 𝑐0 = 𝑐1𝑞1, 𝑐1 = 𝑐2𝑞2, . . . , 𝑐𝑛−1 = 𝑐𝑛𝑞𝑛, 𝑐𝑛 = 1 where 𝑞𝑖 > 1. Inserting
every equation into the previous one gives us 𝑃 = 𝑞1𝑞2 · · · 𝑞𝑛. Conversely, for
each such choice of 𝑞𝑖, we can construct a valid sequence of 𝑐𝑖. Note that all
𝑞𝑖 should have only trivial divisors – otherwise we could replace such a 𝑞𝑖 with two
numbers and still preserve the product, yielding a longer sequence.
Now comes the key insight – we can WLOG choose some 𝑞𝑖 to be the
smallest non-trivial divisor of 𝑃 – let us call this 𝑘. Pick an 𝑖 such that 𝑘 ∤ 𝑄 = 𝑞1𝑞2 · · · 𝑞𝑖−1
and 𝑘 | 𝑄𝑞𝑖 (such an 𝑖 exists since 𝑘 | 𝑃 and 𝑘 ∤ 1). By the following theorem,
𝑞𝑖 = 𝑘.

Theorem 19.1 — Euclid’s Lemma

If 𝑝 > 1 and 𝑏 > 1 have only trivial divisors, 𝑝 | 𝑎𝑏 and 𝑝 ∤ 𝑎, then
𝑝 = 𝑏.

Proof. We prove a slightly stronger statement (called Euclid's Lemma)
instead; if 𝑝 > 1 has only trivial divisors and 𝑝 | 𝑎𝑏, then either 𝑝 | 𝑎 or
𝑝 | 𝑏. This implies the original statement, since if 𝑝 | 𝑏 and 𝑏 only has
trivial divisors, 𝑝 = 1 or 𝑝 = 𝑏 (but 𝑝 > 1).
Consider the smallest 𝑝 for which there exists a (smallest) 𝑎 for which
there exists a 𝑏 where the theorem is false. We now prove that this minimal
counterexample gives rise to an even smaller counterexample.
First, 𝑎 lacks non-trivial divisors. Otherwise, we can pick 𝑛, 𝑚 such that
𝑎 = 𝑛𝑚 where 0 < 𝑛 ≤ 𝑚 < 𝑎. Substitution gives us 𝑝 | (𝑛𝑚)𝑏 = 𝑛(𝑚𝑏).
Since 𝑛 < 𝑎, we have either 𝑝 | 𝑛 or 𝑝 | 𝑚𝑏, since we would otherwise have a smaller
counterexample. We know that 𝑝 ∤ 𝑛, since otherwise, as 𝑛 | 𝑎, we would get 𝑝 | 𝑎.
Therefore, 𝑝 | 𝑚𝑏. Because 𝑚 < 𝑎, we again find that 𝑝 | 𝑚 or 𝑝 | 𝑏 (or we
have a smaller counterexample). As 𝑝 ∤ 𝑏 (by assumption), we get 𝑝 | 𝑚.
Again, as 𝑚 | 𝑎 we get the corresponding contradiction 𝑝 | 𝑎.
Next, we WLOG assume 𝑎 < 𝑝. Otherwise, consider 𝑐 = 𝑎 − 𝑝 > 0.
Since 𝑝 | 𝑎𝑏, we have 𝑝 | 𝑎𝑏 − 𝑝𝑏 = 𝑐𝑏 (see Exercise 19.4). Thus, 𝑝 | 𝑐 or
𝑝 | 𝑏 (𝑐 would otherwise be a smaller counterexample). We have assumed
𝑝 ∤ 𝑏, so 𝑝 | 𝑐, i.e. 𝑝 | 𝑎 − 𝑝. By the same exercise, this implies 𝑝 | 𝑎, contrary
to our assumption.
Finally, let 𝑛 be such that 𝑝𝑛 = 𝑎𝑏. Since 𝑎 is smaller than 𝑝 and has
only trivial divisors, we have that 𝑎 | 𝑝𝑛 implies 𝑎 | 𝑝 or 𝑎 | 𝑛. As 𝑝 lacks
non-trivial divisors, the latter must be true. This means there exists 𝑚 such
that 𝑛 = 𝑚𝑎. Inserting this gives us that 𝑎𝑏 = 𝑝𝑚𝑎, or 𝑏 = 𝑝𝑚. But this
means 𝑝 | 𝑏, a contradiction.
Since the assumption of a smallest counterexample only leads to contradictions,
we find that no such counterexample exists, meaning the theorem
is true. 

Exercise 19.4. Prove that if 𝑎 | 𝑏 and 𝑎 | 𝑐 then 𝑎 | 𝑏 + 𝑐.

With this, we are nearly there. The sequence 𝑞𝑖 is independent of order –
we can let 𝑞1 = 𝑘. This means choosing the largest possible divisor of 𝑃 as
𝑐1. How do we choose the remaining ones? Well, for 𝑐2, the same argument
says we should choose the greatest possible divisor of 𝑐1, and so on. How do we
find it? Since it must also be a divisor of 𝑃, we can iterate through the smaller
divisors (in descending order) and pick the first one that is also a divisor of 𝑐1.
Eventually, we reach 𝑐𝑛 = 1.
A solution could look something like the following.

1: procedure Subcommittees(𝑃)
2:   divisors ← Divisors(𝑃)
3:   sort divisors in descending order
4:   ans ← 0
5:   for each 𝑑 in divisors do
6:     if 𝑑 divides 𝑃 then
7:       ans ← ans + 1
8:       𝑃 ← 𝑑
9: return ans

This solution only requires computing and iterating through all divisors of 𝑃,
giving us a $\Theta(\sqrt{P})$ solution. 

Problem 19.3
Subcommittees – subcommittees
Evening Out 2 – eveningout2
Multiplication Table – multtable
Note: Solve for 2 points.


This result that divisors come in pairs happens to give us some help in
answering our next question, regarding the plurality of divisors. The above
result gives us an upper bound of $2\sqrt{n}$ divisors of an integer 𝑛. We can do a
little better, with $\approx n^{1/3}$ being a commonly used approximation for the number of
divisors when dealing with integers which fit in the native integer types.¹ For
example, the maximal number of divisors of a number less than 10³ is 32, for 10⁶
it is 240, for 10⁹ it is 1 344, and for 10¹⁸ it is 103 680.²
A bound we will find more useful when solving problems concerns the
average number of divisors of the integers between 1 and 𝑛.

Theorem 19.2 — Average Number of Divisors

Let 𝑑(𝑖) be the number of divisors of 𝑖. Then,

$$\sum_{i=1}^{n} d(i) = \Theta(n \ln n)$$

Proof. There are between $\frac{n-i+1}{i}$ and $\frac{n}{i}$ integers between 1 and 𝑛 divisible by
𝑖, since every 𝑖'th integer is divisible by 𝑖. Thus, the number of divisors of all
those integers is bounded by

$$\sum_{j=1}^{n} \frac{n}{j} = n \sum_{j=1}^{n} \frac{1}{j} = O(n \ln n)$$

from above and

$$\sum_{j=1}^{n} \frac{n-j+1}{j} = n \sum_{j=1}^{n} \frac{1}{j} - n + \sum_{j=1}^{n} \frac{1}{j} \ge n \ln n - n + \ln n = \Omega(n \ln n)$$

from below. 

This proof also suggests a way to compute the divisors of all the integers
1, 2, . . . , 𝑁.

¹In reality, the maximal number of divisors on the interval [1, 𝑛] grows sub-polynomially, i.e.,
as 𝑂(𝑛^𝜖) for every 𝜖 > 0.
²Sequence A066150 from OEIS: http://oeis.org/A066150.


Divisor Counts
For every integer between 1 and 𝑁 , compute the number of positive divisors it
has.
Solving the problem with the previous algorithm, computing the divisors for
every single integer, would yield a $\Theta(N\sqrt{N})$ algorithm. Instead, we invert the
problem. For each integer 𝑖, we find all the numbers divisible by 𝑖 (in $\Theta(\frac{N}{i})$
time), which are $0i, 1i, 2i, \ldots, \lfloor\frac{N}{i}\rfloor i$. In total, this takes Θ(𝑁 ln 𝑁) time, a quite
decent improvement.
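The sieving loop is tiny in code (our sketch):

#include <vector>
using namespace std;

// cnt[x] ends up holding the number of divisors of x, in Θ(N ln N) total.
vector<int> divisorCounts(int N) {
    vector<int> cnt(N + 1, 0);
    for (int i = 1; i <= N; i++)
        for (int j = i; j <= N; j += i)  // every multiple of i gains the divisor i
            cnt[j]++;
    return cnt;
}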
Problem 19.4
Divisor Counts – divisorcounts
Organizator – organizator

This technique, called sieving, is a quite common number theoretical method
– we revisit it in the next section.

19.2 Prime Numbers


When talking about divisibility, we regularly used it as a tool to describe
factorizations of an integer in various ways. For example, given the number
12, we could factor it as 2 · 6, or 3 · 4, or even further into 2 · 2 · 3. This last
factorization is special, in that no matter how hard we try, it cannot be factored
further since 2 and 3 lack non-trivial divisors. It consists only of factors that are
prime numbers.

Definition 19.2 — Prime Number


An integer 𝑝 ≥ 2 is called a prime number if its only positive divisors are 1
and 𝑝.
The integers 𝑛 ≥ 2 that are not prime numbers are called composite
numbers.
Note that 1 is neither prime nor composite.

Example 19.2 The first 10 prime numbers are 2, 3, 5, 7, 11, 13, 17, 19, 23, 29.
Which is the next one?

Problem 19.5


Longest Prime Sum – longprimesum


Shortest Composite Sum – shortcompositesum

Let us start with the simple questions – how do we determine if a number
is a prime? Using the knowledge from the previous section, this is simple. A
number is prime if it has only trivial divisors, so we can use the same approach
that we used when computing all the divisors of a number to instead see if it has
any non-trivial divisor, in $O(\sqrt{N})$ time.
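A sketch of this trial-division test (ours):

#include <cstdint>

// Primality by testing all potential divisors up to sqrt(N).
bool isPrime(int64_t N) {
    if (N < 2) return false;
    for (int64_t d = 2; d * d <= N; d++)
        if (N % d == 0) return false;  // found a non-trivial divisor
    return true;
}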
Problem 19.6
Primality – primality
Note: Solve for 1 point.

In a similar manner, we can extend the algorithm used to count all divisors
of all numbers up to some limit to counting primes up to a limit. Instead of
counting divisors, we simply mark those numbers that had non-trivial divisors
as non-prime. The running time is the same, Θ(𝑁 ln 𝑁 ).
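In code (our sketch):

#include <vector>
using namespace std;

// isPrime[x] is true iff x is prime, computed in Θ(N ln N).
vector<bool> primeSieve(int N) {
    vector<bool> isPrime(N + 1, true);
    isPrime[0] = isPrime[1] = false;
    for (int i = 2; i <= N; i++)
        for (int j = 2 * i; j <= N; j += i)
            isPrime[j] = false;  // j has the non-trivial divisor i
    return isPrime;
}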
Problem 19.7
Prime Count – primecount
Note: Solve for 2 points.

Exercise 19.5. Let 𝜋(𝑁) be the number of primes up to 𝑁. Given a list of
those primes, show how to determine the primality of any integer up to 𝑁² in
𝑂(𝜋(𝑁)) time.

One might wonder if the algorithm in Exercise 19.5 is faster than testing all
possible divisors. To answer this, we need to know more about the number of
primes.
There are an infinite number of primes. This can be proven by a simple proof
by contradiction. If 𝑝 1, 𝑝 2, ..., 𝑝𝑞 are the only primes, then 𝑃 = 𝑝 1𝑝 2 . . . 𝑝𝑞 + 1
is not divisible by any prime number (and by extension has no divisors but the
trivial ones), so it is not composite. However, 𝑃 is larger than any prime, so it is
not a prime number either, a contradiction.
More relevant is instead the density of primes, since that is what determines
how 𝜋 (𝑁 ) relates to 𝑁 .


Theorem 19.3 — Prime Number Theorem

The density of prime numbers in the interval [1, 𝑁] is $\approx \frac{1}{\ln N}$ for large
𝑁.

The proof requires a lot of deep number theory, so we will not show it here. This
means that precomputing primes and then using that list to check primality only
gains you a logarithmic factor. More specifically, the number of primes below
10³ is 168, below 10⁶ it is 78 498, and below 10⁹ it is approximately 51 · 10⁶.
Based on the Prime Number Theorem, one might have the reasonable
thought that prime numbers shouldn't be that far apart. The Prime Number
Theorem states that the average distance between primes below 𝑁 is ln 𝑁, but of course
the maximum gap may be longer. For all integers up to 10⁹, the maximum gap
is 282, and for 10¹⁸ it is 1442.
A very trivial upper bound on the gaps that is occasionally useful is the
following one.

Theorem 19.4 — Bertrand’s Postulate

For all 𝑛 ≥ 2, there exists a prime 𝑝 where 𝑛 < 𝑝 < 2𝑛.

Prime Time
Jon Marius Venstad, Nordic Collegiate Programming Contest 2011, CC BY-SA 3.0
Odd, Even and Ingmariay are playing a game. They start with an arbitrary positive
integer and take turns either adding 1 or dividing by a prime (assuming the result
is still an integer). Once they reach 1, they each get points corresponding to the
smallest of the numbers their moves resulted in. If a player could make no move,
their score is instead equal to the starting integer. They all play such that they
minimize their own score. If several possible moves would result in the same
score for the player, they choose to produce the lowest number that they
can. They play in the order Odd → Even → Ingmariay → . . . , but alternate
who starts the round.
Given a list of starting integers of the rounds they played of the game,
determine the final scores of the players.


Input
The first line contains 𝑛 (1 ≤ 𝑛 ≤ 1000), the number of rounds of the game. The
next 𝑛 lines contains the name of the starting player of the round and the starting
integer (between 1 and 10 000).
Output
Output the final scores of the three players.
Note: The problem statement has been shortened.
Solution. The game theoretic solution of the problem would be to construct the
graph of all the integers with edges between possible transitions. One could
then compute the score a player would get for moving to a certain integer in
the graph in the way described in Section 18.2. Unfortunately, the game graph
contains possible loops such as 2 → 3 → 4 → 2. We can eliminate those loops
with some additional insights using the tie-breaking rules the players use when
picking moves.
Let us investigate the behaviour of the players more closely. If a player is
presented with a prime number, they clearly will divide by it to end the game
and get 1 point. In other cases, they may either add 1 or divide away a prime.
This means we never want to move from a prime 𝑝 to 𝑝 + 1.
It turns out that removing these moves makes the game acyclic. Assume
to the contrary that we currently are at a number 𝑎 between two consecutive
primes 𝑝𝑘 and 𝑝𝑘+1 and have a sequence of moves that takes us back to 𝑎. The
next sequence of moves will be to add 1 until we hit an integer 𝑝𝑘 < 𝑏 ≤ 𝑝𝑘+1
(remember we never want to go to 𝑝𝑘+1 + 1) and then divide some prime 𝑝𝑖 away.
But $\frac{b}{p_i} \le \frac{b}{2} \le \frac{p_{k+1}}{2}$. By Bertrand's Postulate, $\frac{p_{k+1}}{2} < p_k$, so the new result
will be less than 𝑝𝑘. However, since we are not allowed to make the transition
𝑝𝑘 → 𝑝𝑘 + 1, we can never reach 𝑎 again.
What remains is to compute the transitions from each integer 1 ≤ 𝑎 ≤ 𝑁.
By precomputing all the primes up to 10 000, we can afford to test whether all of
them are divisors of each 𝑎. In total, this is on the order of $\frac{10^4 \cdot 10^4}{\ln 10^4} \approx 10^7$ edges,
which is a reasonable number. 

Since the prime numbers have no other divisors besides the trivial ones, a
factorization consisting only of prime numbers is special.

Definition 19.3 — Prime Factorization

The prime factorization of a positive integer 𝑛 is a factorization of the form

$$p_1^{e_1} \cdot p_2^{e_2} \cdots p_k^{e_k}$$

where 𝑝𝑖 are all distinct primes.


We define
$$\omega(n) = k$$
to be the number of primes dividing 𝑛, and
$$\Omega(n) = \sum_{i=1}^{k} e_i$$
to be the number of primes with multiplicity in the product of 𝑛.

Example 19.3 The prime factorization of 228 is 2 · 2 · 3 · 19.

Note that in the definition, we spoke of the prime factorization. It turns out
that this factorization is indeed unique, except for a reordering of the 𝑝𝑖 . It may
be “intuitively obvious” that this is the case, but that is misguided. A proof can
be constructed using Euclid’s Lemma from the previous section (p. 307).

Theorem 19.5 — Existence and Uniqueness of Prime Factorizations

There exists a unique factorization of a positive integer 𝑁 into prime numbers.

Proof. First, the existence part, through a simple proof by induction. Assume
that all integers up to 𝑁 − 1 have a prime factorization. If 𝑁 is a prime, then
𝑁 is a prime factorization of itself. Otherwise, it has a non-trivial divisor, so
we can write 𝑁 = 𝑎𝑏 with 1 < 𝑎 ≤ 𝑏 < 𝑁. By the induction hypothesis, 𝑎
and 𝑏 have prime factorizations. Concatenating the two factorizations gives
us a prime factorization of 𝑁. Thus, by induction, all positive integers have
prime factorizations.
Next, the uniqueness. We prove this too by induction. Our base case is
𝑁 = 1, which has the empty product as its unique prime factorization. Assume
that 𝑁 has two distinct prime factorizations 𝑁 = 𝑝1𝑝2 · · · 𝑝𝑘 = 𝑞1𝑞2 · · · 𝑞𝑙,
but all integers up to 𝑁 − 1 have only one. Consider the prime 𝑝1. Since it
divides the left side, it must also divide the right side. Let 𝑖 be such that
𝑄 = 𝑞1 · · · 𝑞𝑖−1 is not divisible by 𝑝1, but 𝑄𝑞𝑖 is. Such an 𝑖 exists since with
𝑖 = 1 the product 𝑄 is 1 (the empty product), which is not divisible by 𝑝1, and
with 𝑖 = 𝑙 we get 𝑄𝑞𝑙 = 𝑁, which is divisible by 𝑝1.
Then, by our version of Euclid's Lemma, since 𝑝1 and 𝑞𝑖 are primes,
𝑝1 | 𝑄𝑞𝑖 but 𝑝1 ∤ 𝑄, we have 𝑝1 = 𝑞𝑖. WLOG we can reorder the factors 𝑞 and
assume 𝑞𝑖 = 𝑞1. If we divide away this factor, we get that 𝑝2 · · · 𝑝𝑘 = 𝑞2 · · · 𝑞𝑙
are both prime factorizations of $\frac{N}{p_1} < N$, so by the induction hypothesis
they are the same. That means the original prime factorizations were also
the same, a contradiction. Thus, 𝑁 too has a unique prime factorization,
completing the proof. 

Factorization
Given an integer 𝑁 , compute its prime factorization.

The simplest solution is to extend the method used to test primality.
An integer 𝑁 can have at most one prime in its factorization that exceeds $\sqrt{N}$,
since their product would otherwise exceed 𝑁. Looping over all possible prime
divisors up to $\sqrt{N}$ and factoring them out from 𝑁 is thus sure to find all prime
factors, except for possibly a single one that was larger than $\sqrt{N}$. The algorithm
is called trial division.

1: procedure Factor(𝑁)
2:   primes ← new list
3:   for 𝑖 from 2 up to 𝑖² ≤ 𝑁 do
4:     while 𝑁 mod 𝑖 = 0 do
5:       primes.add(𝑖)
6:       𝑁 ← 𝑁/𝑖
7:   if 𝑁 ≠ 1 then
8:     primes.add(𝑁)   ⊲ 𝑁 may have had a single prime factor > √𝑁
9: return primes
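In C++ (our sketch):

#include <cstdint>
#include <vector>
using namespace std;

// Trial division: returns the prime factors of N with multiplicity.
vector<int64_t> factor(int64_t N) {
    vector<int64_t> primes;
    for (int64_t i = 2; i * i <= N; i++) {
        while (N % i == 0) { primes.push_back(i); N /= i; }
    }
    if (N != 1) primes.push_back(N);  // a single prime factor > sqrt(N) may remain
    return primes;
}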

Exercise 19.6. In the Factor procedure above, 𝑁 is being modified in the loop when a new
prime is found. Is it a problem to use the new, updated 𝑁 in the 𝑖² ≤ 𝑁 check in
the loop? Is the time complexity the same?


Problem 19.8
Factorization – factorization
Note: Solve for 1 point.

For the next problem, we define some useful notation:

Definition 19.4 For integers 𝑎, 𝑏, 𝑘, we say that $a^k$ divides 𝑏 exactly if $a^k \mid b$
but $a^{k+1} \nmid b$. We use the notation $a^k \parallel b$ for this.

Factorial Power
Given integers 𝑛 and 𝑚 (1 ≤ 𝑛, 𝑚 ≤ 10¹⁴), determine the 𝑘 for which $n^k \parallel m!$.

Solution. To start, we must first connect the prime factorization with divisibility.
If 𝑛 has the prime factorization $n = p_1^{e_1} \cdot p_2^{e_2} \cdots p_l^{e_l}$, a divisor of 𝑛 must be of
the form $d = p_1^{e'_1} \cdot p_2^{e'_2} \cdots p_l^{e'_l}$, where $0 \le e'_i \le e_i$. This can be proven using the
uniqueness of the prime factorization, and the fact that 𝑛 = 𝑑𝑞 for some integer
𝑞. Any number of this form is also a divisor of 𝑛.
The exponent laws give us that $n^k = p_1^{ke_1} \cdot p_2^{ke_2} \cdots p_l^{ke_l}$, so we are looking
for the largest 𝑘 such that 𝑘𝑒𝑖 does not exceed the power of 𝑝𝑖 in 𝑚!. Thus, after
factoring 𝑛 the problem is reduced to determining how many times each 𝑝𝑖
divides 𝑚!. This equals

$$\left\lfloor \frac{m}{p_i} \right\rfloor + \left\lfloor \frac{m}{p_i^2} \right\rfloor + \left\lfloor \frac{m}{p_i^3} \right\rfloor + \ldots$$

Exercise 19.7. Prove the above formula.


Since 𝑛 can have at most log2 (𝑛) prime factors, and all terms after the log2 (𝑚) first

ones are zero, the complexity of the computation is 𝑂 ( 𝑛 + log(𝑛) log(𝑚)). 

Problem 19.9
Factorial Power – factorialpower
Perfect Pth Powers – perfectpowers
Divisor Guessing Game – divisorguessing

For the next problem, we will show a slightly faster version of the prime
sieve we’ve previously seen, using it to factor all integers in an interval.


Product Divisors
Given a sequence of integers 𝑎1, 𝑎2, . . . , 𝑎𝑛, compute the number of divisors of
𝐴 = 𝑎1𝑎2 · · · 𝑎𝑛.

Input
The length of the sequence 1 ≤ 𝑛 ≤ 1 000 000, and the sequence 1 ≤ 𝑎1, . . . , 𝑎𝑛 ≤
10^6.
Output
The number of divisors of 𝐴, modulo 10^9 + 7.

Solution. Again, we use the prime factorization interpretation of divisors.
Let 𝐴 = 𝑝1^𝑒1 · · · 𝑝𝑘^𝑒𝑘 where the 𝑝𝑖 are distinct primes. We can now use a simple
combinatorial argument to count the number of divisors of 𝐴. A divisor of 𝐴 is
of the form 𝑝1^𝑒1′ · · · 𝑝𝑘^𝑒𝑘′ where 𝑒𝑖′ ≤ 𝑒𝑖. Each 𝑒𝑖′ can take any integer value between
0 and 𝑒𝑖 to fulfill this condition. This gives us 𝑒𝑖 + 1 choices for the value of
𝑒𝑖′. Since each 𝑒𝑖′ is independent, there are a total of (𝑒1 + 1)(𝑒2 + 1) · · · (𝑒𝑘 + 1)
numbers of this form, and thus divisors of 𝐴, by the multiplication principle.
We are left with the problem of determining the prime factorization of 𝐴.
Essentially, this is tantamount to computing the prime factorization of every
integer between 1 and 10^6, since we could have 𝑎𝑖 = 𝑖 for 𝑖 = 1 . . . 10^6. Once
this is done, we can go through the sequence 𝑎𝑖 and tally up all primes in their
factorizations. Since an integer 𝑚 has at most log2 𝑚 prime factors, this step
is bounded by approximately 𝑛 log2 10^6 operations. Then, how do we factor
all integers in [1..10^6]? We could obviously adapt the algorithm we used to
count primes before, but we will now improve it a bit. The general idea was
to loop over multiples of all numbers, and mark them as non-prime. When
dealing with primes however, we only care about doing this for primes. After
all, any non-prime must have a prime divisor, so we don’t lose correctness by
only sieving on primes. With this improvement, the sieve is called the Sieve of
Eratosthenes.
This results in the following solution:

1: procedure ProductDivisors(sequence 𝐴)
2:   counts ← new list[10^6 + 1] of 0
3:   for each 𝑎 in 𝐴 do
4:     counts[𝑎] ← counts[𝑎] + 1
5:   isPrime ← new list[10^6 + 1] of 𝑡𝑟𝑢𝑒
6:   tmp ← new list[10^6 + 1] with tmp[𝑗] = 𝑗
7:   exponents ← new map
8:   for 𝑖 from 2 while 𝑖² ≤ 10^6 do
9:     if isPrime[𝑖] then
10:      for 𝑗 ∈ {2𝑖, 3𝑖, . . . } up to 10^6 do
11:        isPrime[𝑗] ← false
12:        while tmp[𝑗] mod 𝑖 = 0 do
13:          exponents[𝑖] ← exponents[𝑖] + counts[𝑗]
14:          tmp[𝑗] ← tmp[𝑗]/𝑖
15:  for 𝑗 from 2 up to 10^6 do    ⊲ a single prime factor > √10^6 may remain
16:    if tmp[𝑗] > 1 then
17:      exponents[tmp[𝑗]] ← exponents[tmp[𝑗]] + counts[𝑗]
18:  ans ← 1
19:  for each 𝑣𝑎𝑙𝑢𝑒 in exponents do
20:    ans ← (ans · (𝑣𝑎𝑙𝑢𝑒 + 1)) mod (10^9 + 7)
21:  return ans
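A C++ sketch of the same algorithm (the tmp array holds what remains of each number after the primes found so far have been divided out; the final pass catches the single prime factor larger than √10^6 that a number may retain):

#include <map>
#include <numeric>
#include <vector>
using namespace std;
typedef long long ll;

const int MAXV = 1000000;
const ll MOD = 1000000007;

ll productDivisors(const vector<int>& a) {
    vector<ll> counts(MAXV + 1, 0);
    for (int x : a) counts[x]++;
    vector<int> tmp(MAXV + 1);
    iota(tmp.begin(), tmp.end(), 0);      // tmp[j] = j, to be factored
    vector<bool> isPrime(MAXV + 1, true);
    map<int, ll> exponents;               // prime -> its exponent in A
    for (int i = 2; i * i <= MAXV; i++) {
        if (!isPrime[i]) continue;
        for (int j = 2 * i; j <= MAXV; j += i) {
            isPrime[j] = false;
            while (tmp[j] % i == 0) {     // divide out the prime i from j
                exponents[i] += counts[j];
                tmp[j] /= i;
            }
        }
    }
    for (int j = 2; j <= MAXV; j++)       // leftover prime > sqrt(MAXV)
        if (tmp[j] > 1) exponents[tmp[j]] += counts[j];
    ll ans = 1;
    for (auto& [p, e] : exponents)
        ans = ans * ((e + 1) % MOD) % MOD;
    return ans;
}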

The complexity of this solution is a bit tricky to analyze. The important part
of the sieve is the inner loop, which computes the actual factors. Let us count
the number of iterations a prime 𝑝 contributes to this loop. First of all, every 𝑝’th
integer is divisible by 𝑝, which totals 𝑛/𝑝 iterations. However, every 𝑝²’th integer
is divisible by 𝑝 yet again, contributing an additional 𝑛/𝑝² iterations, and
so on. Summing this over every 𝑝 which is used in the sieve gives us the bound

∑_{𝑝 ≤ √𝑛} (𝑛/𝑝 + 𝑛/𝑝² + 𝑛/𝑝³ + · · ·) = 𝑛 ∑_{𝑝 ≤ √𝑛} (1/𝑝 + 1/𝑝² + 1/𝑝³ + · · ·)

Using the formula for the sum of a geometric series (1/𝑝 + 1/𝑝² + · · · = 1/(𝑝 − 1)) gives us
the simplification

𝑛 ∑_{𝑝 ≤ √𝑛} 1/(𝑝 − 1) = Θ(𝑛 ∑_{𝑝 ≤ √𝑛} 1/𝑝)

It is known that ∑_{𝑝 ≤ 𝑛} 1/𝑝 = 𝑂(ln ln 𝑛). With this, the final complexity becomes a
simple 𝑂(𝑛 ln ln √𝑛) = 𝑂(𝑛 ln ln 𝑛). □


Competitive Tip
When using the Sieve of Eratosthenes, we can save quite a bit of memory by using a
bitset, since we only store a boolean state per number (whether it is prime or
not). This also gives us slightly better cache behaviour, improving the performance in real
terms.
Problem 19.10
Prime Count – primecount
Note: Solve for 3 points.

Problem 19.11

19.3 The Euclidean Algorithm


The Euclidean algorithm is one of the oldest known algorithms, dating back
to the Greek mathematician Euclid, who described it in his mathematical treatise
Elements. It concerns the numbers that are divisors of two given integers,
and extends to integer equations of the form 𝑎𝑥 + 𝑏𝑦 = 𝑐.

Definition 19.5 — Greatest Common Divisor


We call an integer 𝑑 dividing both of the integers 𝑎 and 𝑏 a common divisor
of 𝑎 and 𝑏.
The greatest such integer is called the greatest common divisor, or GCD
of 𝑎 and 𝑏. This number is denoted gcd(𝑎, 𝑏), or with the shorthand (𝑎, 𝑏)
when clear from the context.

Example 19.4 12 and 42 have divisors 1, 2, 3, 4, 6, 12 and 1, 2, 3, 6, 7, 14,


21, 42. Their shared divisors are 1, 2, 3, 6, so their greatest common divisor
is 6.

We warm up with some simple properties of the GCD.

Theorem 19.6 — Properties of the GCD


Let 𝑎, 𝑏, 𝑐 be non-negative integers. Then

(𝑎, 0) = 𝑎 (19.1)
(𝑎, 𝑎) = 𝑎 (19.2)
(𝑎, 𝑏) ≤ max(𝑎, 𝑏) (19.3)
(𝑎𝑐, 𝑏𝑐) = 𝑐 · (𝑎, 𝑏) (19.4)
(𝑎, 𝑏) | (𝑎, 𝑏𝑐) (19.5)
If (𝑎, 𝑐) = 1, then
(𝑎, 𝑏𝑐) = (𝑎, 𝑏) (19.6)

Proof. We give a proof for the last equation – the others are good exercises
to get acquainted with the GCD.
We can WLOG assume that 𝑐 is prime. If it is not, we can consider a
smallest counterexample, perform the substitution 𝑐 = 𝑝𝑐′ where 𝑝 is
prime, and prove the claim using (𝑎, (𝑏𝑐′)𝑝) = (𝑎, 𝑏𝑐′) instead. As (𝑎, 𝑐′) | (𝑎, 𝑐) = 1,
the premises still hold for showing that (𝑎, 𝑏𝑐′) = (𝑎, 𝑏).
First, let 𝑑 = (𝑎, 𝑏). Then, (𝑎, 𝑏𝑐) = 𝑑 (𝑎/𝑑, 𝑏𝑐/𝑑) by Equation 19.4. We are
then trying to prove that (𝑎/𝑑, 𝑐 · 𝑏/𝑑) = 1. Assume otherwise. Then there must
be some prime 𝑝 that divides both 𝑎/𝑑 and 𝑐 · 𝑏/𝑑. The first condition implies 𝑝 | 𝑎.
However, since (𝑎, 𝑐) = 1 we can not have 𝑝 | 𝑐, or else 𝑎 and 𝑐 would share a
divisor greater than 1. This means 𝑝 | 𝑏/𝑑. But then 𝑝 | (𝑎/𝑑, 𝑏/𝑑) = 1, which is
impossible. Thus, we must have that (𝑎/𝑑, 𝑐 · 𝑏/𝑑) = 1 and (𝑎, 𝑏𝑐) = 𝑑 = (𝑎, 𝑏). □

Exercise 19.8. Prove Equations 19.1-19.6.

Before we start looking into how to actually compute the greatest common
divisor, we take a detour into the land of number theoretic sums to also get some
practice and understanding of what the GCD actually means.


GCD Sum 1
Compute
∑_{𝑖 | 𝑁} ∑_{𝑗 | 𝑁} gcd(𝑖, 𝑗)

where 𝑁 (1 ≤ 𝑁 ≤ 10^14) is a given integer.

Solution. Now and then, problems consist of computing some number theoretic
sum. There are a number of different techniques involved in this, so we will show
two different solutions.
Let us first try to transform the sum into something simpler. In our case,
we don’t even know how to compute the gcd of two numbers quickly yet, so it
makes sense to attempt to simplify that term. This approach is also supported by
gcd(𝑖, 𝑗) being a non-trivial term that requires computation to figure out, while
we know all the values it will assume. Just from the definition, we understand
that gcd(𝑖, 𝑗) | 𝑖, and 𝑖 | 𝑁, so gcd(𝑖, 𝑗) is a divisor of 𝑁 too. Picking 𝑖 = 𝑗 = 𝑑
gives us gcd(𝑖, 𝑗) = 𝑑 for any divisor 𝑑.
By fixing the values of gcd(𝑖, 𝑗) one at a time, the problem is instead
transformed to the following: for each 𝑑 | 𝑁, compute the number of pairs 𝑖 | 𝑁,
𝑗 | 𝑁 such that gcd(𝑖, 𝑗) = 𝑑. If we let this number be 𝑘(𝑑), the sum in the
problem simplifies to ∑_{𝑑 | 𝑁} 𝑘(𝑑) · 𝑑. Evaluating 𝑘(𝑑) is slightly tricky. We first
need the following fact: if 𝑎 | 𝑏 and 𝑎 and 𝑏 have a common divisor 𝑑, then 𝑎/𝑑 | 𝑏/𝑑.


This can be proved by looking at the prime factorizations of the three numbers.
A consequence of this fact is that the condition 𝑖, 𝑗 | 𝑁 reduces to 𝑖/𝑑, 𝑗/𝑑 | 𝑁/𝑑.
Thus, we can instead count the pairs of divisors (𝑖′, 𝑗′) of 𝑁/𝑑.
Can we choose any such pair of divisors? No – 𝑖′ and 𝑗′ cannot share any divisor
𝑑′ > 1. If they did, 𝑖 and 𝑗 would share the divisor 𝑑𝑑′, meaning 𝑑 was not the
greatest common divisor of 𝑖 and 𝑗.
The remainder of the proof is now a combinatorial argument. Let 𝑁/𝑑 =
∏ 𝑝𝑖^𝑒𝑖′ be the prime factorization of 𝑁/𝑑. Since 𝑖′ and 𝑗′ are divisors of 𝑁/𝑑
but themselves share no factor, there are three cases for each prime factor: it
divides 𝑖′ to some power, 𝑗′ to some power, or neither (a consequence of Euclid’s
Lemma). In the first two cases there are 𝑒𝑖′ possibilities each, and in the
third exactly one. Thus, there are in total 2𝑒𝑖′ + 1 choices for each prime factor, resulting in the
product ∏ (2𝑒𝑖′ + 1) for the total number of such pairs (𝑖′, 𝑗′).
Î

A note on implementation: when computing all the 𝑑 | 𝑁 and their prime


factorizations, we should not do this in an explicit manner. It is sufficient to


prime factor 𝑁 and then use a recursive procedure to compute all divisors, one
prime factor at a time.
The second approach we show gives us a quite different way of computing
the sum. It involves first looking at the function we are computing, and figuring
out how it is affected if we isolate one of the prime powers dividing the argument
of the function (𝑁 ). This is a common theme in number theory, where many
sums and functions are easy to compute for prime powers, and hopefully easy to
combine!
Let 𝑝 be a prime where 𝑝^𝑘 || 𝑁, and let 𝑁′ = 𝑁/𝑝^𝑘. Then, we can rewrite our sum
as

∑_{𝑎=0}^{𝑘} ∑_{𝑏=0}^{𝑘} ( ∑_{𝑖 | 𝑁′} ∑_{𝑗 | 𝑁′} gcd(𝑖 · 𝑝^𝑎, 𝑗 · 𝑝^𝑏) )

By Equation 19.4, we can factor out min(𝑝^𝑎, 𝑝^𝑏) from the innermost term.
Assume for the purpose of demonstration that 𝑎 ≤ 𝑏. Then, the sum simplifies to

∑_{𝑎=0}^{𝑘} ∑_{𝑏=0}^{𝑘} ( ∑_{𝑖 | 𝑁′} ∑_{𝑗 | 𝑁′} 𝑝^𝑎 gcd(𝑖, 𝑗 · 𝑝^(𝑏−𝑎)) )

However, since 𝑝 ∤ 𝑖 (𝑖 is a divisor of the 𝑝-less 𝑁′), Equation 19.6 gives us
gcd(𝑖, 𝑗 · 𝑝^(𝑏−𝑎)) = gcd(𝑖, 𝑗). Further simplification becomes possible:

( ∑_{𝑎=0}^{𝑘} ∑_{𝑏=0}^{𝑘} 𝑝^min(𝑎,𝑏) ) ( ∑_{𝑖 | 𝑁′} ∑_{𝑗 | 𝑁′} gcd(𝑖, 𝑗) )

The observant reader may notice that the left factor in the above product happens
to be the same sum you’d get if 𝑁 = 𝑝^𝑘, since gcd(𝑝^𝑎, 𝑝^𝑏) = 𝑝^min(𝑎,𝑏)! It’s apparently
enough to compute the sum for each prime power 𝑝^𝑘 || 𝑁 and multiply the answers
together. This is not uncommon – in Section 19.6 we study more functions like
this.
Now, the only thing that remains is to evaluate this particular sum for each
prime power. Luckily, both 𝑘 and the number of distinct primes in a number are very
small (at most log2(𝑁)), so this can be computed with nested loops. □

Finally, we get to the big question. How do we compute the greatest common
divisor of two integers?


Greatest Common Divisor

Given two integers 𝑎 and 𝑏, compute (𝑎, 𝑏).

We already know of a Θ(√𝑎 + √𝑏) algorithm to compute (𝑎, 𝑏), namely to
enumerate all divisors of 𝑎 and 𝑏. A new identity is the key to the much faster
Euclidean algorithm:

(𝑎, 𝑏) = (𝑎, 𝑏 − 𝑎) (19.7)


We can prove the equality by proving an even stronger result – that all
common divisors of 𝑎 and 𝑏 are also common divisors of 𝑎 and 𝑏 − 𝑎. Assume 𝑑
is a common divisor of 𝑎 and 𝑏, so that 𝑎 = 𝑑𝑎′ and 𝑏 = 𝑑𝑏′ for integers 𝑎′, 𝑏′.
Then 𝑏 − 𝑎 = 𝑑𝑏′ − 𝑑𝑎′ = 𝑑(𝑏′ − 𝑎′), with 𝑏′ − 𝑎′ an integer, so 𝑑 is a divisor
of 𝑏 − 𝑎 as well. The converse is shown in a similar way.
Hence the common divisors of 𝑎 and 𝑏 are also common divisors of 𝑎 and 𝑏 − 𝑎. In particular,
their largest common divisor is the same. The application of this identity
yields a recursive solution to the problem. If we wish to compute (𝑎, 𝑏) where
𝑎, 𝑏 are positive and 𝑎 ≤ 𝑏, we reduce the problem to a smaller one: instead
of computing (𝑎, 𝑏), we compute (𝑎, 𝑏 − 𝑎). This gives us a smaller problem, in
the sense that 𝑎 + 𝑏 decreases. Since both 𝑎 and 𝑏 are non-negative, this means
we must at some point arrive at the situation where 𝑏 = 0. Equation 19.1 then tells us
the GCD is 𝑎.
One simple but important step remains before the algorithm is useful. Note
how computing (1, 10^9) requires about 10^9 steps right now, since we will do the
reductions (1, 10^9 − 1), (1, 10^9 − 2), (1, 10^9 − 3), . . . The fix is easy – the repeated
subtraction of a number 𝑎 from 𝑏 while 𝑏 ≥ 𝑎 is exactly the
modulo operation, meaning

(𝑎, 𝑏) = (𝑎, 𝑏 mod 𝑎)

This last piece of our Euclidean puzzle completes our algorithm, and gives us
a remarkably short procedure, as seen below. Note the recursive
invocation GCD(𝐵 mod 𝐴, 𝐴), which keeps the first argument the smaller one.

1: procedure GCD(𝐴, 𝐵)
2: if 𝐵 = 0 then
3: return 𝐴
4: return 𝐺𝐶𝐷 (𝐵 mod 𝐴, 𝐴)


Competitive Tip
The Euclidean algorithm exists as the built-in function __gcd(a, b) in C++ (and,
since C++17, as std::gcd in the <numeric> header).

Whenever dealing with divisors in a problem, the greatest common divisor


may be useful. This is the case in the next problem, where we also look closer
at the prime factorization of the GCD.

Granica
Croatian Open Competition in Informatics 2007/2008, Contest #6
Given integers 𝑎1, 𝑎2, . . . , 𝑎𝑛, find all those numbers 𝑑 such that upon division by
𝑑, all of the numbers 𝑎𝑖 leave the same remainder.
Input
The first line contains the integer 2 ≤ 𝑛 ≤ 100, the length of the sequence 𝑎𝑖.
The second line contains the 𝑛 integers 1 ≤ 𝑎1, 𝑎2, . . . , 𝑎𝑛 ≤ 10^9.
Output
Output all such integers 𝑑, separated by spaces.

Solution. What does it mean for two numbers 𝑎𝑖 and 𝑎𝑗 to have the same
remainder when dividing by 𝑑? Letting this remainder be 𝑟, we can write
𝑎𝑖 = 𝑑𝑛 + 𝑟 and 𝑎𝑗 = 𝑑𝑚 + 𝑟 for integers 𝑛 and 𝑚. Thus, 𝑎𝑖 − 𝑎𝑗 = 𝑑(𝑛 − 𝑚), so
𝑑 is a divisor of 𝑎𝑖 − 𝑎𝑗! This gives us a necessary condition on our numbers 𝑑. Is
it sufficient? If 𝑎𝑖 = 𝑑𝑛 + 𝑟 and 𝑎𝑗 = 𝑑𝑚 + 𝑟′, we have 𝑎𝑖 − 𝑎𝑗 = 𝑑(𝑛 − 𝑚) + (𝑟 − 𝑟′).
Since 𝑑 is a divisor of 𝑎𝑖 − 𝑎𝑗 it must be a divisor of 𝑑(𝑛 − 𝑚) + (𝑟 − 𝑟′) too,
meaning 𝑑 | 𝑟 − 𝑟′. As 0 ≤ 𝑟, 𝑟′ < 𝑑, we have that −𝑑 < 𝑟 − 𝑟′ < 𝑑, implying
𝑟 − 𝑟′ = 0, so that 𝑟 = 𝑟′ and both remainders were the same after all.
The question then is how we compute the set of common divisors of all the
numbers 𝑎𝑖 − 𝑎𝑗. We claim that this set is (even for the case of only two numbers)
the set of divisors of their greatest common divisor. This is intuitively true for some, but to
prove it we take aid in the prime factorizations of divisors. A divisor of some
integer
𝑛 = 𝑝1^𝑒1 · · · 𝑝𝑘^𝑒𝑘

is of the form
𝑑 = 𝑝1^𝑒1′ · · · 𝑝𝑘^𝑒𝑘′


where 0 ≤ 𝑒𝑖′ ≤ 𝑒𝑖. Then, the requirement for 𝑑 to be a common divisor of 𝑛
and another number
𝑚 = 𝑝1^𝑓1 · · · 𝑝𝑘^𝑓𝑘

is that 0 ≤ 𝑒𝑖′ ≤ min(𝑓𝑖, 𝑒𝑖). It should be clear that a number with this property
is indeed a common divisor of 𝑛 and 𝑚.
The largest such number is attained when 𝑒𝑖′ = min(𝑓𝑖, 𝑒𝑖), giving us the GCD.
This also explains why all common divisors must be divisors of the GCD.
Using this interpretation of the GCD, we can extend the result to finding
the GCD 𝑑 of a sequence 𝑏1, 𝑏2, . . . . Consider any prime 𝑝, such that 𝑝^𝑞𝑖 || 𝑏𝑖.
Then, we must have 𝑝^min(𝑞1,𝑞2,...) || 𝑑. This operation is exactly what the GCD
algorithm computes for two numbers. Since min(𝑞1, 𝑞2, . . .) = min(𝑞1, min(𝑞2, . . .)),
we can use the recursion formula 𝑑 = gcd(𝑏1, 𝑏2, . . .) = gcd(𝑏1, gcd(𝑏2, . . .)),
most simply implemented in a loop:

1: procedure MultiGCD(sequence 𝐴)
2: gcd ← 0
3: for each 𝑎 ∈ 𝐴 do
4: gcd ← 𝐺𝐶𝐷 (gcd, 𝑎)
5: return gcd

Finally, we need to find all the divisors of the GCD to arrive at the answer. 

A complementary concept is the least common multiple.

Definition 19.6 — Least Common Multiple


The least common multiple of integers 𝑎 and 𝑏 is the smallest positive integer
𝑚 such that 𝑎 | 𝑚 and 𝑏 | 𝑚.

Example 19.5 The multiples of 12 are 12, 24, 36, 48, 60, . . . . The multiples
of 10 are 10, 20, 30, 40, 50, 60, . . . . The least common multiple of the
numbers is thus 60.

Given 𝑎 and 𝑏, 𝑎𝑏 is clearly a common multiple, but it doesn’t have to be the
smallest one. Since 𝑎 | 𝑚, we have that 𝑚 = 𝑎𝑘. The question is basically what extra
factors we must add to 𝑎, in the form of 𝑘, in order to have 𝑏 | 𝑚. Previously, we
determined the condition for divisibility: if 𝑝^𝑒 || 𝑏 for one of the
primes 𝑝 dividing 𝑏, then 𝑝^𝑒 | 𝑚 has to hold. This basically means


that we have to add whatever factors to 𝑎 that are additionally present in 𝑏. For
example, since 10 = 2 · 5 and 12 = 2 · 2 · 3, we need to add an additional factor
2 and 3 to 10 to make a common multiple – 2 · 2 · 3 · 5 = 60.
To compute the LCM easily, note that a multiple 𝑚 of an integer 𝑎 with
prime factorization
𝑎 = 𝑝1^𝑒1 · · · 𝑝𝑘^𝑒𝑘
must be of the form
𝑚 = 𝑝1^𝑒1′ · · · 𝑝𝑘^𝑒𝑘′
where 𝑒𝑖 ≤ 𝑒𝑖′. Thus, if 𝑚 is to be a common multiple of 𝑎 and another integer

𝑏 = 𝑝1^𝑓1 · · · 𝑝𝑘^𝑓𝑘

it must hold that max(𝑓𝑖, 𝑒𝑖) ≤ 𝑒𝑖′, with 𝑒𝑖′ = max(𝑓𝑖, 𝑒𝑖) giving us the smallest such
multiple. Since max(𝑒𝑖, 𝑓𝑖) + min(𝑒𝑖, 𝑓𝑖) = 𝑒𝑖 + 𝑓𝑖, we get that lcm(𝑎, 𝑏) · gcd(𝑎, 𝑏) =
𝑎𝑏. This gives us the formula lcm(𝑎, 𝑏) = 𝑎/gcd(𝑎, 𝑏) · 𝑏 to compute the LCM. The
order of operations is chosen to avoid overflows in computing the product 𝑎𝑏.
As for the GCD of multiple integers, it holds that

lcm(𝑎, 𝑏, 𝑐, . . . ) = lcm(𝑎, lcm(𝑏, lcm(𝑐, . . . )))
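In C++, the function is short; note how the division by the GCD happens before the multiplication (a sketch):

#include <algorithm> // for the GNU built-in __gcd
typedef long long ll;

// lcm via the identity lcm(a, b) * gcd(a, b) = a * b.
// Dividing by the GCD first avoids overflow in the product a * b.
ll lcm(ll a, ll b) {
    return a / __gcd(a, b) * b;
}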

GCD and LCM

Given that gcd(𝑎, 𝑏) = 𝑥 and lcm(𝑎, 𝑏) = 𝑦, where 1 ≤ 𝑥, 𝑦 ≤ 10^14, determine
the possible pairs (𝑎, 𝑏).

Solution. Consider a prime 𝑝 with 𝑝^𝑘 || 𝑦. By the above results, we get that 𝑝^𝑘 || 𝑎 or
𝑝^𝑘 || 𝑏, since the exponent of each prime in the LCM is the maximum of the
corresponding exponents in the two numbers. Similarly, if 𝑝^𝑙 || 𝑥, either 𝑝^𝑙 || 𝑎 or 𝑝^𝑙 || 𝑏.
There are then two cases; either 𝑝^𝑘 || 𝑎 and 𝑝^𝑙 || 𝑏, or the other way around.
The possible (𝑎, 𝑏) can thus be generated by checking both options for each prime
dividing 𝑦. □

The Extended Euclidean Algorithm


Next up is the extended Euclidean algorithm. It is a way to solve certain integer
equations.


Linear Diophantine Equation


Given integers 𝑎, 𝑏, find an integer solution 𝑥, 𝑦 to

𝑎𝑥 + 𝑏𝑦 = (𝑎, 𝑏)
It is not obvious that a solution exists. Let 𝑆 = {𝑎𝑥 + 𝑏𝑦 | 𝑥, 𝑦 integers}. These
numbers are called the linear combinations of 𝑎 and 𝑏. 𝑆 is closed under addition
and negation (and thus also subtraction and multiplication by integers). All numbers of
the form 𝑎𝑥 + 𝑏𝑦 are multiples of (𝑎, 𝑏), and we claim that 𝑆 contains (𝑎, 𝑏) itself
(and thus all its multiples). Let 𝑑 be the smallest positive member
of 𝑆. Then 𝑎 − 𝑑⌊𝑎/𝑑⌋ = 𝑎 mod 𝑑 ∈ 𝑆, since 𝑆 is closed under subtraction and
multiplication. Similarly, 𝑏 mod 𝑑 ∈ 𝑆. As 0 ≤ 𝑎 mod 𝑑 < 𝑑, we must have
𝑎 mod 𝑑 = 0, and likewise 𝑏 mod 𝑑 = 0, as 𝑑 was the smallest positive element of 𝑆. However, this
is equivalent to 𝑑 | 𝑎 and 𝑑 | 𝑏, so 𝑑 | (𝑎, 𝑏), and 𝑑 = (𝑎, 𝑏) since 𝑑 was itself a multiple
of (𝑎, 𝑏).
This proof might remind you somewhat of the Euclidean algorithm. The
proof and the algorithm hide within them a method to write (𝑎, 𝑏) as a linear
combination of 𝑎 and 𝑏. Remember that during the computation of the GCD, we
repeatedly used that (𝑎, 𝑏) = (𝑏, 𝑎 mod 𝑏). Since 𝑎 mod 𝑏 is a linear combination
of 𝑎 and 𝑏, it seems as if the numbers appearing during the computation of the GCD
always are linear combinations of 𝑎 and 𝑏. The algorithm concludes at (𝑑, 0), at
which point 𝑑 = (𝑎, 𝑏). If we keep track of which linear combination
is equal to 𝑑, we can construct a solution to 𝑎𝑥 + 𝑏𝑦 = (𝑎, 𝑏). Let
us try this with an example, where we use [𝑥, 𝑦] to denote the number 𝑎𝑥 + 𝑏𝑦.

Example 19.6 — Extended Euclidean algorithm


Consider the equation 15𝑥 + 11𝑦 = (15, 11) = 1.
Performing the Euclidean algorithm on these numbers we find that

(15, 11) = ( [1, 0], [0, 1]) = (11, 15 mod 11) =

(11, 15 − 1 · 11) = ( [0, 1], [1, 0] − 1 · [0, 1]) =

(11, 4) = ( [0, 1], [1, −1]) = (4, 11 mod 4) =


(4, 11 − 2 · 4) = ( [1, −1], [0, 1] − 2[1, −1]) =


(4, 3) = ( [1, −1], [−2, 3]) = (3, 4 mod 3) =


(3, 4 − 1 · 3) = ( [−2, 3], [1, −1] − [−2, 3]) =

(3, 1) = ( [−2, 3], [3, −4]) = (1, 3 mod 1) =


(1, 3 − 3 · 1) = ( [3, −4], [−2, 3] − 3[3, −4]) =

(1, 0) = ( [3, −4], [−11, 15])


Verifying the results, 15 · 3 + 11 · (−4) = 45 − 44 = 1.

Exercise 19.9. Find an integer solution to the equation 24𝑥 + 52𝑦 = 4.

A short solution can be written recursively:

1: procedure ExtendedEuclidean(𝑎, 𝑏)
2:   if 𝑏 = 0 then
3:     return (1, 0)
4:   (𝑥, 𝑦) ← ExtendedEuclidean(𝑏, 𝑎 mod 𝑏)
5:   return (𝑦, 𝑥 − 𝑦 · ⌊𝑎/𝑏⌋)
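The same recursion in C++ (a sketch; the coefficients are returned through reference parameters):

typedef long long ll;

// Returns g = gcd(a, b) and sets x, y so that a*x + b*y = g.
ll egcd(ll a, ll b, ll& x, ll& y) {
    if (b == 0) { x = 1; y = 0; return a; }
    ll x1, y1;
    ll g = egcd(b, a % b, x1, y1);
    // b*x1 + (a mod b)*y1 = g, and a mod b = a - b*floor(a/b)
    x = y1;
    y = x1 - (a / b) * y1;
    return g;
}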

This gives us a single solution. Finding the others is not much harder. Let
𝑎′ = 𝑎/(𝑎, 𝑏) and 𝑏′ = 𝑏/(𝑎, 𝑏). Given two solutions

𝑎𝑥1 + 𝑏𝑦1 = (𝑎, 𝑏)
𝑎𝑥2 + 𝑏𝑦2 = (𝑎, 𝑏)

we can first factor out (𝑎, 𝑏) to get

𝑎′𝑥1 + 𝑏′𝑦1 = 1
𝑎′𝑥2 + 𝑏′𝑦2 = 1

A simple subtraction gives us that

𝑎′(𝑥1 − 𝑥2) + 𝑏′(𝑦1 − 𝑦2) = 0
𝑎′(𝑥1 − 𝑥2) = 𝑏′(𝑦2 − 𝑦1)

Because (𝑎′, 𝑏′) = 1, 𝑏′ | 𝑥1 − 𝑥2. Then there exists 𝑘 such that 𝑥1 − 𝑥2 = 𝑘𝑏′, so
𝑥1 = 𝑥2 + 𝑘𝑏′. Inserting this gives us

𝑎′(𝑥2 + 𝑘𝑏′ − 𝑥2) = 𝑏′(𝑦2 − 𝑦1)


𝑎′𝑘𝑏′ = 𝑏′(𝑦2 − 𝑦1)
𝑎′𝑘 = 𝑦2 − 𝑦1
𝑦1 = 𝑦2 − 𝑘𝑎′
Thus, any solution must be of the form

(𝑥1 + 𝑘 · 𝑏/(𝑎, 𝑏), 𝑦1 − 𝑘 · 𝑎/(𝑎, 𝑏)) for 𝑘 ∈ Z

It is easily verified that any such 𝑘 also gives us a solution. This result is called
Bezout’s identity.

Generalized Knights
A generalized knight is a special chess piece. It moves by first choosing one of
the four cardinal directions and moves 𝑎 steps, and then chooses one of the two
orthogonal cardinal directions and moves 𝑏 steps (for example first up and then
left or right, or first left and then up or down). Compute the minimum number
of moves the knight needs to move from (0, 0) to (𝑥, 𝑦).
Input
The four integers 1 ≤ 𝑎, 𝑏, 𝑥, 𝑦 ≤ 10^18, where 𝑎 ≠ 𝑏.

Solution. Let us split up all the 8 moves the knight can make into the following
subcomponents: □

19.4 Modular Arithmetic


When first learning division, one is often introduced to the concept of remainders.
For example, when dividing 7 by 3, you would get “2 with a remainder of 1”. In
general, when dividing a number 𝑎 by a number 𝑛, you get a quotient 𝑞
and a remainder 𝑟. These numbers satisfy the identity 𝑎 = 𝑛𝑞 + 𝑟, with
0 ≤ 𝑟 < 𝑛.

Example 19.7 — Division with remainders

Consider division (with remainders) by 4 of the numbers 0, . . . , 6. We have
that
0/4 = 0, remainder 0
1/4 = 0, remainder 1
2/4 = 0, remainder 2
3/4 = 0, remainder 3
4/4 = 1, remainder 0
5/4 = 1, remainder 1
6/4 = 1, remainder 2

Note how the remainder increases by 1 each time the numerator increases, wrapping
around to 0 upon reaching 4. As
you might remember from Chapter 2 on C++ (or from your favorite programming
language), there is an operator which computes this remainder, called the modulo
operator. Modular arithmetic is then computation on numbers where
every number is taken modulo some integer 𝑛. Under such a scheme, we
have that e.g. 3 and 7 are basically the same if computing modulo 4, since
3 mod 4 = 3 = 7 mod 4. This concept, where numbers with the same remainder
are treated as if they were equal, is called congruence.

Definition 19.7 — Congruence


If 𝑎 and 𝑏 have the same remainder when divided by 𝑛, we say that 𝑎 and 𝑏
are congruent modulo 𝑛, written

𝑎 ≡𝑏 (mod 𝑛)

An equivalent and in certain applications more useful definition is that


𝑎 ≡ 𝑏 (mod 𝑛) if and only if 𝑛 | 𝑎 − 𝑏.

Exercise 19.10. What does it mean for a number 𝑎 to be congruent to 0 modulo


𝑛?

When counting modulo something, the laws of addition and multiplication


are somewhat altered:

+ 0 1 2
0 0 1 2
1 1 2 3≡0
2 2 3≡0 4≡1


* 0 1 2
0 0 0 0
1 0 1 2
2 0 2 4≡1

When we wish to perform arithmetic of this form, we use the integers
modulo 𝑛 rather than the ordinary integers. These have a special set notation as
well: Z𝑛.
While addition and multiplication are quite natural (i.e. performing the
operation as usual and then taking the result modulo 𝑛), division is a more
complicated story. For real numbers, the inverse 𝑥⁻¹ of a number 𝑥 is defined
as the number which satisfies the equation 𝑥𝑥⁻¹ = 1. For example, the inverse of
4 is 0.25, since 4 · 0.25 = 1. The division 𝑎/𝑏 is then simply 𝑎 multiplied by the
inverse of 𝑏. The same definition is applicable to modular arithmetic:

Definition 19.8 — Modular Inverse


The modular inverse of 𝑎 modulo 𝑛 is the integer 𝑎 −1 such that 𝑎𝑎 −1 ≡ 1
(mod 𝑛), if such an integer exists.
Considering our multiplication table of Z3 , we see that 0 has no inverse and
1 is its own inverse (just as with the real numbers). However, since 2 · 2 = 4 ≡ 1
(mod 3), 2 is actually its own inverse. If we instead consider multiplication in
Z4 , the situation is quite different.

* 0 1 2 3
0 0 0 0 0
1 0 1 2 3
2 0 2 0 2
3 0 3 2 1

Now, 2 does not even have an inverse! To determine when an inverse exists –
and if so, to compute the inverse – we will make use of the extended Euclidean
algorithm. If 𝑎𝑎⁻¹ ≡ 1 (mod 𝑛), we have 𝑛 | 𝑎𝑎⁻¹ − 1, meaning 𝑎𝑎⁻¹ − 1 = 𝑛𝑥
for some integer 𝑥. Rearranging this equation gives us 𝑎𝑎⁻¹ − 𝑛𝑥 = 1. We know
from Section 19.3 that this has a solution if and only if (𝑎, 𝑛) = 1. In this case,
we can use the extended Euclidean algorithm to compute 𝑎⁻¹. Note that by
Bezout’s identity, 𝑎⁻¹ is actually unique modulo 𝑛.
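Using the egcd sketch from Section 19.3, the inverse is computed as follows:

typedef long long ll;

// Modular inverse of a modulo n; assumes gcd(a, n) = 1.
ll modinv(ll a, ll n) {
    ll x, y;
    egcd(a, n, x, y);       // a*x + n*y = 1, so a*x ≡ 1 (mod n)
    return ((x % n) + n) % n;
}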
Just like the reals, modular arithmetic has a cancellation law for multiplication:


Theorem 19.7
Assume 𝑎⊥𝑛. Then 𝑎𝑏 ≡ 𝑎𝑐 (mod 𝑛) implies 𝑏 ≡ 𝑐 (mod 𝑛).

Proof. Since 𝑎⊥𝑛, there is a number 𝑎⁻¹ such that 𝑎𝑎⁻¹ ≡ 1 (mod 𝑛).
Multiplying both sides of

𝑎𝑏 ≡ 𝑎𝑐 (mod 𝑛)

with 𝑎⁻¹ results in
𝑎𝑎⁻¹𝑏 ≡ 𝑎𝑎⁻¹𝑐 (mod 𝑛)
Simplifying 𝑎𝑎⁻¹ gives us

𝑏 ≡ 𝑐 (mod 𝑛) □

Another common modular operation is exponentiation, i.e. computing 𝑎^𝑚
(mod 𝑛). While this can be computed easily in Θ(𝑚), we can actually do better
using a method called exponentiation by squaring. It is essentially based on
the recursion

𝑎^𝑚 mod 𝑛 =
  1 mod 𝑛                        if 𝑚 = 0
  𝑎 · (𝑎^(𝑚−1) mod 𝑛) mod 𝑛      if 𝑚 odd
  (𝑎^(𝑚/2) mod 𝑛)² mod 𝑛         if 𝑚 even

This procedure is clearly Θ(log2 𝑚), since applying the recursive formula
to an even number halves 𝑚, while applying it to an odd number
will first make it even and then halve it in the next iteration. It is very important
that 𝑎^(𝑚/2) mod 𝑛 is computed only once, even though it is squared! Computing it
twice causes the complexity to degrade to Θ(𝑚) again.
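An iterative C++ version of exponentiation by squaring (a sketch; it assumes 𝑛 is small enough that the product of two values below 𝑛 fits in a 64-bit integer):

typedef long long ll;

// Computes a^m mod n using O(log m) multiplications.
ll modpow(ll a, ll m, ll n) {
    ll result = 1 % n;
    a %= n;
    while (m > 0) {
        if (m & 1) result = result * a % n; // odd: multiply one factor a in
        a = a * a % n;                      // square exactly once per step
        m >>= 1;
    }
    return result;
}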

19.5 Chinese Remainder Theorem


The Chinese Remainder Theorem is probably the most useful theorem in
algorithmic problem solving. It gives us a way of solving certain systems of
linear equations.

Theorem 19.8 — Chinese Remainder Theorem


Given a system of equations

𝑥 ≡ 𝑎1 (mod 𝑚1)
𝑥 ≡ 𝑎2 (mod 𝑚2)
. . .
𝑥 ≡ 𝑎𝑛 (mod 𝑚𝑛)

where the numbers 𝑚1, . . . , 𝑚𝑛 are pairwise relatively prime, there is a unique
integer 𝑥 (mod 𝑚1𝑚2 · · · 𝑚𝑛) that satisfies the system.

Proof. We will prove the theorem inductively. The theorem is clearly true
for 𝑛 = 1, with the unique solution 𝑥 = 𝑎1. Now, consider the two equations

𝑥 ≡ 𝑎1 (mod 𝑚1)
𝑥 ≡ 𝑎2 (mod 𝑚2)

Let 𝑥 = 𝑎1 · 𝑚2 · (𝑚2⁻¹ mod 𝑚1) + 𝑎2 · 𝑚1 · (𝑚1⁻¹ mod 𝑚2), where 𝑚1⁻¹ mod 𝑚2
is taken to be a modular inverse of 𝑚1 modulo 𝑚2. These inverses exist, since
𝑚1⊥𝑚2 by assumption. We then have that 𝑥 ≡ 𝑎1 · 𝑚2 · (𝑚2⁻¹ mod 𝑚1) ≡ 𝑎1
(mod 𝑚1), and analogously 𝑥 ≡ 𝑎2 (mod 𝑚2).
Since a solution exists for every pair 𝑎1, 𝑎2, this solution must be unique by
the pigeonhole principle – there are 𝑚1 · 𝑚2 possible pairs 𝑎1, 𝑎2, and
𝑚1 · 𝑚2 possible values for 𝑥. Thus, the theorem is also true for 𝑛 = 2.
Assume the theorem is true for 𝑘 − 1 equations. Then, we can replace the
equations

𝑥 ≡ 𝑎1 (mod 𝑚1)
𝑥 ≡ 𝑎2 (mod 𝑚2)

with another equation

𝑥 ≡ 𝑥∗ (mod 𝑚1𝑚2)
Assume the theorem is true for 𝑘 − 1 equations. Then, we can replace the
equations

𝑥 ≡ 𝑎1 (mod 𝑚 1 )
𝑥 ≡ 𝑎2 (mod 𝑚 2 )

with another equation

𝑥 ≡𝑥∗ (mod 𝑚 1𝑚 2 )


where 𝑥∗ is the solution to the first two equations. We just proved those
two equations are equivalent with regards to 𝑥. This reduces the number of
equations to 𝑘 − 1, which by assumption the theorem holds for. Thus, it also
holds for 𝑘 equations. 

Note that the theorem used an explicit construction of the solution, allowing
us to find what the unique solution to such a system is.
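A sketch of the two-equation construction in C++, relying on the modinv helper from Section 19.4 and assuming the moduli are small enough (say, at most 10^6, as in the problem below) that the intermediate products fit in 64-bit integers:

// Solves x ≡ a1 (mod m1), x ≡ a2 (mod m2) for coprime m1, m2,
// with 0 <= a1 < m1 and 0 <= a2 < m2.
// Returns the unique solution in [0, m1*m2).
ll crt(ll a1, ll m1, ll a2, ll m2) {
    ll M = m1 * m2;
    ll inv2 = modinv(m2 % m1, m1); // m2^{-1} mod m1
    ll inv1 = modinv(m1 % m2, m2); // m1^{-1} mod m2
    return (a1 * m2 % M * inv2 + a2 * m1 % M * inv1) % M;
}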

Radar
KTH Challenge 2014
We say that an integer 𝑧 is within distance 𝑦 of an integer 𝑥 modulo an integer 𝑚
if
𝑧 ≡ 𝑥 + 𝑡 (mod 𝑚)
where |𝑡 | ≤ 𝑦.
Find the smallest non-negative integer 𝑧 such that it is:

• within distance 𝑦1 of 𝑥 1 modulo 𝑚 1

• within distance 𝑦2 of 𝑥 2 modulo 𝑚 2

• within distance 𝑦3 of 𝑥 3 modulo 𝑚 3

Input
The integers 0 ≤ 𝑚1, 𝑚2, 𝑚3 ≤ 10^6. The integers 0 ≤ 𝑥1, 𝑥2, 𝑥3 ≤ 10^6. The
integers 0 ≤ 𝑦1, 𝑦2, 𝑦3 ≤ 300.
Output
The integer 𝑧.
The problem gives rise to three linear equations of the form

𝑧 ≡ 𝑥 𝑖 + 𝑡𝑖 (mod 𝑚𝑖 )

where −𝑦𝑖 ≤ 𝑡𝑖 ≤ 𝑦𝑖. If we fix all the variables 𝑡𝑖, the problem reduces to solving
the system of equations using CRT. We could then find all possible values of
𝑧, and choose the minimum one. This requires applying the CRT construction
about 2 · 600³ = 432 000 000 times. Since the modulo operations involved are
quite expensive, this approach would use too much time. Instead, let us exploit
a useful greedy principle in finding minimal solutions.
Assume that 𝑧 is the minimal answer to an instance. There are only two


situations where 𝑧 − 1 cannot be a solution as well:

• 𝑧 = 0 – since 𝑧 must be non-negative, this is the smallest possible answer

• 𝑧 ≡ 𝑥𝑖 − 𝑦𝑖 (mod 𝑚𝑖) for some 𝑖 – then, decreasing 𝑧 would violate one of the constraints

In the first case, we only need to verify whether 𝑧 = 0 is a solution to the three
inequalities. In the second case, we managed to change an inequality to a linear
equation. By testing each of the 𝑖 this equation could hold for, we only need to test
the values of 𝑡𝑖 for the two other equations. This reduces the number of times we
need to use the CRT to 600² = 360 000, a modest amount well within the
time limit.

19.6 Euler’s totient function


Now that we have talked about modular arithmetic, we can give the numbers
which are not divisors to some integer 𝑛 their well-deserved attention. This
discussion will start with the 𝜙-function.

Definition 19.9 Two integers 𝑎 and 𝑏 are said to be relatively prime if their
only (and thus greatest) common divisor is 1. If 𝑎 and 𝑏 are relatively prime,
we write that 𝑎⊥𝑏.

Example 19.8 The numbers 74 and 22 are not relatively prime, since they
are both divisible by 2.
The numbers 72 and 65 are relatively prime. The prime factorization
of 72 is 2 · 2 · 2 · 3 · 3, and the factorization of 65 is 5 · 13. Since these
numbers have no prime factors in common, they have no divisors other than
1 in common.

Given an integer 𝑛, we ask ourselves how many of the integers 1, 2, . . . , 𝑛
are relatively prime to 𝑛.

Definition 19.10 — Euler’s totient function

Euler’s totient function 𝜙(𝑛) is defined as the number of integers 𝑘 ∈ [1, 𝑛]
such that (𝑘, 𝑛) = 1, i.e. those positive integers at most 𝑛 which are
co-prime to 𝑛.


Example 19.9 What is 𝜙 (12)? The numbers 2, 4, 6, 8, 10 all have the factor 2
in common with 12 and the numbers 3, 6, 9 all have the factor 3 in common
with 12.
This leaves us with the integers 1, 5, 7, 11 which are relatively prime to
12. Thus, 𝜙 (12) = 4.

For prime powers, 𝜙(𝑝^𝑘) is easy to compute. The only integers in [1, 𝑝^𝑘] which are
not relatively prime to 𝑝^𝑘 are the multiples of 𝑝, of which there are 𝑝^𝑘/𝑝 = 𝑝^(𝑘−1),
meaning
𝜙(𝑝^𝑘) = 𝑝^𝑘 − 𝑝^(𝑘−1) = 𝑝^(𝑘−1)(𝑝 − 1)
It turns out 𝜙(𝑛) has a property which is highly useful in computing certain
number theoretical functions – it is multiplicative, meaning

𝜙(𝑎𝑏) = 𝜙(𝑎)𝜙(𝑏) if 𝑎⊥𝑏

For multiplicative functions, we can reduce the problem of computing arbitrary
values of the function to finding a formula only for prime powers. The reasoning
behind the multiplicativity of 𝜙 is quite simple. Let 𝑎′ = 𝑎 − 𝜙(𝑎), i.e. the number
of integers in [1, 𝑎] which do share a factor with 𝑎, and similarly 𝑏′ = 𝑏 − 𝜙(𝑏). Then, there
will be 𝑎𝑏′ numbers between 1 and 𝑎𝑏 which share a factor with 𝑏. If 𝑥 is one of
the 𝑏′ numbers sharing a factor with 𝑏, then so are 𝑥, 𝑥 + 𝑏, 𝑥 + 2𝑏, . . . , 𝑥 + (𝑎 − 1)𝑏.
Similarly, there will be 𝑎′𝑏 numbers between 1 and 𝑎𝑏 sharing a factor with
𝑎. However, some numbers share a factor with both 𝑎 and 𝑏.
Consider two such numbers 𝑥 + 𝑖𝑏 = 𝑦 + 𝑗𝑎, which gives 𝑖𝑏 − 𝑗𝑎 = 𝑦 − 𝑥. By
Bezout’s identity, this has a single solution (𝑖, 𝑗) modulo 𝑎𝑏, meaning every
number 𝑥 + 𝑖𝑏 equals exactly one number 𝑦 + 𝑗𝑎. Thus, there are 𝑎′𝑏′ numbers
sharing a factor with both 𝑎 and 𝑏. This means there are 𝑎𝑏′ + 𝑎′𝑏 − 𝑎′𝑏′ numbers
sharing a factor with either 𝑎 or 𝑏, so

𝜙(𝑎𝑏) = 𝑎𝑏 − 𝑎𝑏′ − 𝑎′𝑏 + 𝑎′𝑏′ = (𝑎 − 𝑎′)(𝑏 − 𝑏′) = 𝜙(𝑎)𝜙(𝑏)

and we are done.


Using the multiplicativity of 𝜙, we get the simple formula

𝜙(𝑝1^𝑒1 · · · 𝑝𝑘^𝑒𝑘) = 𝜙(𝑝1^𝑒1) · · · 𝜙(𝑝𝑘^𝑒𝑘) = 𝑝1^(𝑒1−1)(𝑝1 − 1) · · · 𝑝𝑘^(𝑒𝑘−1)(𝑝𝑘 − 1)

Computing 𝜙 for a single value can thus be done as quickly as factoring the
number. If we wish to compute 𝜙 for an interval [1, 𝑛], we can use the Sieve of
Eratosthenes.
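For a single value, a trial division sketch in C++:

typedef long long ll;

// Euler's phi: start from n and multiply by (1 - 1/p)
// for each distinct prime p dividing n.
ll phi(ll n) {
    ll result = n;
    for (ll p = 2; p * p <= n; p++) {
        if (n % p != 0) continue;
        result -= result / p;        // result *= (1 - 1/p)
        while (n % p == 0) n /= p;
    }
    if (n > 1) result -= result / n; // leftover prime factor > sqrt(n)
    return result;
}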


This seemingly convoluted function might appear useless, but it is of great
importance via the following theorem:

Theorem 19.9 — Euler’s theorem

If 𝑎 and 𝑛 are relatively prime and 𝑛 ≥ 1,

𝑎^𝜙(𝑛) ≡ 1 (mod 𝑛)

Proof. The proof of this theorem isn’t trivial, but it is number theoretically
interesting and helps to build some intuition for modular arithmetic. The idea
behind the proof will be to consider the product of the 𝜙 (𝑛) positive integers
less than 𝑛 which are relatively prime to 𝑛. We will call these 𝑥 1, 𝑥 2, . . . , 𝑥𝜙 (𝑛) .
Since these are all distinct integers between 1 and 𝑛, they are incongruent
modulo 𝑛. We call such a set of 𝜙(𝑛) numbers, all relatively prime to 𝑛 and
incongruent modulo 𝑛, a complete residue system (CRS) modulo 𝑛.
Next, we will prove that 𝑎𝑥1, 𝑎𝑥2, . . . , 𝑎𝑥𝜙(𝑛) also form a CRS modulo 𝑛. We
need to show two properties for this:

1. All numbers are relatively prime to 𝑛

2. All numbers are incongruent modulo 𝑛

We will start with the first property. Since both 𝑎 and 𝑥𝑖 are relatively
prime to 𝑛, neither number has a prime factor in common with 𝑛. This
means 𝑎𝑥𝑖 has no prime factor in common with 𝑛 either, meaning the two
numbers are relatively prime. The second property requires us to make use of
the cancellation property of modular arithmetic (Theorem 19.7). If 𝑎𝑥𝑖 ≡ 𝑎𝑥𝑗
(mod 𝑛), the cancellation law gives us 𝑥𝑖 ≡ 𝑥𝑗 (mod 𝑛). Since all 𝑥𝑖 are
incongruent modulo 𝑛, we must have 𝑖 = 𝑗, meaning all the numbers 𝑎𝑥𝑖
are incongruent as well. Thus, these numbers did indeed form a complete
residue system modulo 𝑛.
residue system modulo 𝑛.
If 𝑎𝑥 1, . . . , 𝑎𝑥𝜙 (𝑛) form a CRS, we know every 𝑎𝑥𝑖 must be congruent to
some 𝑥 𝑗 , meaning

𝑎𝑥 1 · · · 𝑎𝑥𝜙 (𝑛) ≡ 𝑥 1 · · · 𝑥𝜙 (𝑛) (mod 𝑛)


Factoring the left hand side turns this into

𝑎^𝜙(𝑛) 𝑥1 · · · 𝑥𝜙(𝑛) ≡ 𝑥1 · · · 𝑥𝜙(𝑛) (mod 𝑛)

Since all the 𝑥𝑖 are relatively prime to 𝑛, we can again use the cancellation
law, leaving
𝑎^𝜙(𝑛) ≡ 1 (mod 𝑛)
completing our proof of Euler’s theorem. □

For primes 𝑝 we get a special case of Euler’s theorem, since 𝜙(𝑝) = 𝑝 − 1.

Corollary 19.1 — Fermat’s Theorem
For a prime 𝑝 and an integer 𝑎⊥𝑝, we have

𝑎^(𝑝−1) ≡ 1 (mod 𝑝)

Competitive Tip

By Fermat’s Theorem, we also have 𝑎^(𝑝−2) ≡ 𝑎⁻¹ (mod 𝑝) when 𝑝 is prime (and
𝑎⊥𝑝). This is often an easier way of computing modular inverses modulo primes than
using the extended Euclidean algorithm, in particular if you have already coded modular
exponentiation.
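For example, with the modpow sketch from Section 19.4:

// Inverse of a modulo a prime p, by Fermat's theorem.
ll modinvPrime(ll a, ll p) {
    return modpow(a, p - 2, p);
}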

Exponial
Nordic Collegiate Programming Contest 2016
Define the exponial of 𝑛 as the function

exponial(𝑛) = 𝑛^((𝑛−1)^((𝑛−2)^(···^(2^1))))

Compute exponial(𝑛) (mod 𝑚).


Input
The input contains the integers 1 ≤ 𝑛, 𝑚 ≤ 10^9.
Output
Output a single integer, the value of exponial(𝑛) (mod 𝑚).
Solution. Euler’s theorem suggests a recursive approach. Since 𝑛^𝑒 (mod 𝑚) is periodic


(in 𝑒), with a period of 𝜙(𝑚), maybe when computing 𝑛^((𝑛−1)^(···)) we could compute
the exponent 𝑒 = (𝑛 − 1)^(···) modulo 𝜙(𝑚) and only then compute 𝑛^𝑒 (mod 𝑚)? Alas, this
is only valid when 𝑛⊥𝑚, since this is a necessary precondition for Euler’s
theorem. When working modulo some integer 𝑚 with a prime factorization of
𝑝1^𝑒1 · · · 𝑝𝑘^𝑒𝑘, a helpful approach is to instead work modulo its prime powers 𝑝𝑖^𝑒𝑖
and then combine the results using the Chinese remainder theorem. Since the
prime powers of a prime factorization are pairwise relatively prime, the remainder
theorem applies.
Let us apply this principle to Euler’s theorem. When computing 𝑛^𝑒 mod 𝑝^𝑘
we have two cases. Either 𝑝 | 𝑛, in which case 𝑛^𝑒 ≡ 0 (mod 𝑝^𝑘) whenever
𝑒 ≥ 𝑘. Otherwise, 𝑝⊥𝑛, and 𝑛^𝑒 ≡ 𝑛^(𝑒 mod 𝜙(𝑝^𝑘)) (mod 𝑝^𝑘) by Euler’s theorem.
This suggests that if 𝑒 ≥ max(𝑒1, . . . , 𝑒𝑘), 𝑛^𝑒 is indeed periodic in 𝑒, with
period 𝜙(𝑚).


Furthermore, each 𝑒𝑖 is bounded by log2(𝑚), since

𝑝𝑖^𝑒𝑖 ≤ 𝑚 ⇒ 𝑒𝑖 log2(𝑝𝑖) ≤ log2(𝑚) ⇒ 𝑒𝑖 ≤ log2(𝑚)/log2(𝑝𝑖) ≤ log2(𝑚)

As 𝑝𝑖 ≥ 2, we know that log2(𝑝𝑖) ≥ 1, which we used in the final inequality.
Since log2(10^9) ≈ 30 and exponial(4) = 4^(3^(2^1)) = 262144 ≥ 30, we can use the
periodicity of 𝑛^𝑒 whenever 𝑛 ≥ 5, computing exponial(𝑛) mod 𝑚 as

𝑛^(𝜙(𝑚) + exponial(𝑛−1) mod 𝜙(𝑚)) mod 𝑚

For 𝑛 ≤ 4, the exponial is at most 262144, so we can compute it
immediately.
One final insight remains. If we use the recursive formula, i.e. first
computing 𝑒 = exponial(𝑛 − 1) mod 𝜙(𝑚) and then 𝑛^(𝜙(𝑚)+𝑒) mod 𝑚, we
still have the problem that 𝑛 can be up to 10^9. We would need to perform a
number of exponentiations that is linear in 𝑛, which is slow for such large 𝑛.
However, the modulus will actually very quickly converge to 1. While the final
result is taken modulo 𝑚, the first recursive call is taken modulo 𝜙(𝑚). The
recursive call performed at the next level will thus be modulo 𝜙(𝜙(𝑚)), and so
on. That this sequence decreases very quickly is based on two facts. For even 𝑚,
𝜙(𝑚) ≤ 𝑚/2, since the 𝑚/2 even numbers in [1, 𝑚] all share a factor with 𝑚. For odd
𝑚 > 1, 𝜙(𝑚) is even: such an 𝑚 consists only of odd prime factors, and since
𝜙(𝑝^𝑘) = 𝑝^(𝑘−1)(𝑝 − 1) is even for odd primes 𝑝 and 𝜙 is multiplicative, 𝜙(𝑚)
must be even. Thus 𝜙(𝜙(𝑚)) ≤ 𝑚/2 for


𝑚 > 1 (1 is neither even nor contains an odd prime factor). This means the
modulus will become 1 in a logarithmic number of iterations, completing our
algorithm.
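The following C++ sketch puts the pieces together, relying on the phi and modpow sketches from earlier sections (the base cases hard-code the small exponials; the 𝑚 = 1 check is what terminates the recursion):

// exponial(n) mod m. For n >= 5 the true exponent exponial(n-1)
// is at least 262144 >= log2(m), so the Euler-style reduction applies.
ll exponial(ll n, ll m) {
    if (m == 1) return 0;
    if (n == 1) return 1 % m;
    if (n == 2) return 2 % m;
    if (n == 3) return 9 % m;
    if (n == 4) return modpow(4, 9, m); // 4^(3^(2^1)) = 262144
    ll ph = phi(m);
    ll e = ph + exponial(n - 1, ph);    // ≡ exponial(n-1) (mod ph)
    return modpow(n, e, m);
}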


Chapter Exercises
Problem 19.12
Longest Composite Sum – longcompositesum
Inheritance – inheritance
Evening Out 3 – eveningout3
Indivisible Sequence – indivisibleseq
Let 𝑆 be the set of all positive integer divisors of 𝑘. How many numbers are
the product of two distinct elements of 𝑆?
Happy Happy Prime Prime – happyprime
Prime Path – primepathc

Chapter Notes
A highly theoretical introduction to classical number theory can be found in
An Introduction to the Theory of Numbers [13]. While devoid of exercises and
examples, it is very comprehensive.
A Computational Introduction to Number Theory and Algebra [24] instead
takes a more applied approach, and is freely available under a Creative Commons
license at the author’s home page³.

³ http://www.shoup.net/ntb/

20 Competitive Programming Strategy
Competitive programming is what we call the mind sport of solving algorithmic
problems and coding their solutions, often under the pressure of time. Most
programming competitions are performed online, at your own computer through
some kind of online judge system. For students of either high school or
university, there are two main competitions. High school students compete in
the International Olympiad in Informatics (IOI), and university students go for
the International Collegiate Programming Contest (ICPC).
Different competition styles have different difficulty, problem types and
strategies. In this chapter, we will discuss some basic strategy of programming
competitions, and give tips on how to improve your competitive skills.

20.1 IOI
The IOI is an international event where a large number of countries send teams
of up to 4 high school students to compete individually against each other during
two days of competition. Every participating country has its own national
selection olympiad first.
During a standard IOI contest, contestants are given 5 hours to solve 3
problems, each worth at most 100 points. These problems are not given in
any particular order, and the scores of the other contestants are hidden until
the end of the contest. Generally none of the problems are “easy” in the sense
that it is immediately obvious how to solve the problem in the same way the
first 1-2 problems of most other competitions are. This poses a large problem,
in particular for the amateur. Without any trivial problems nor guidance from
other contestants on what problems to focus on, how does an IOI competitor
prioritize? The problem is further exacerbated by problems not having a simple
binary scoring, with a submission being either accepted or rejected. Instead,
IOI problems contain many so-called subtasks. These subtasks give partial
credit for the problem, and contain additional restrictions and limits on either


input or output. Some problems do not even use discrete subtasks. In these
tasks, scoring is done on some scale which determines how “good” the output
produced by your program is.

Strategy
Very few contestants manage to solve every problem fully during an IOI contest.
There is a very high probability you are not one of them, which leaves you
with two options – you either skip a problem entirely, or you solve some of its
subtasks. At the start of the competition, you should read through every problem
and all of the subtasks. In the IOI you do not get extra points for submitting faster.
Thus, it does not matter if you read the problems at the beginning instead of
rushing to solve the first problem you read. Once you have read all the subtasks,
you will often see the solutions to some of the subtasks immediately. Take note
of the subtasks which you know how to solve!
Deciding on which order you should solve subtasks in is probably one of
the most difficult parts of the IOI for contestants at or below the silver medal
level. In IOI 2016, the difference between receiving a gold medal and a silver
medal was a mere 3 points. On one of the problems, with subtasks worth 11,
23, 30 and 36 points, the first silver medalist solved the third subtask, worth 30
points (a submission that possibly was a failed attempt at 100 points). Most
competitors instead solved the first two subtasks, together worth 34 points. If
the contestant had solved the first two subtasks instead, he would have gotten a
gold medal.
The problem basically boils down to the question when should I solve
subtasks instead of focusing on a 100 point solution? There is no easy answer
to this question, due to the lack of information about the other contestants’
performances. First of all, you need to get a good sense of how difficult a
solution will be to implement correctly before you attempt it. If you only have
30 minutes left of a competition, it might not be a great idea to go for a 100
point solution on a very tricky problem. Instead, you might want to focus on
some of the easier subtasks you have left on this or other problems. If you fail
your 100 point solution which took over an hour to code, it is nice to know you
did not have some easy subtasks worth 30-60 points which could have given
you a medal.
Problems without discrete scoring (often called heuristic problems) are
almost always the hardest ones to get a full score on. These problems tend to


be very fun, and some contestants often spend way too much time on these
problems. They are treacherous in that it is often easy to increase your score by
something. However, those 30 minutes you spent to gain one additional point
may have been better spent coding a 15 point subtask on another problem. As a
general rule, go for the heuristic problem last during a competition. This does
not mean to skip the problem unless you completely solve the other two, just to
focus on them until you decide that the heuristic problem is worth more points
if given the remaining time.
In IOI, you are allowed to submit solution attempts a large number of times,
without any penalty. Use this opportunity! When submitting a solution, you
will generally be told the results of your submission on each of the secret test
cases. This provides you with much details. For example, you can get a sense of
how correct or wrong your algorithm is. If you only fail 1-2 cases, you probably
just have a minor bug, but your algorithm in general is probably correct. You
can also see if your algorithm is fast enough, since you will be told the execution
time of your program on the test cases. Whenever you make a change to your
code which you think affect correctness or speed – submit it again! This gives
you a sense of your progress, and also works as a good regression test. If your
change introduced more problems, you will know.
Whenever your solution should pass a subtask, submit it. These subtask
results will help you catch bugs earlier when you have less code to debug.

Getting Better
The IOI usually tend to have pretty hard problems. Some areas get rather little
attention. For example, there are basically no pure implementation tasks and
very little geometry.
First and foremost, make sure you are familiar with all the content in the IOI
syllabus1. This is an official document which details what areas are allowed in
IOI tasks. This book deals with most, if not all of the topics in the IOI syllabus.
In the Swedish IOI team, most of the top performers tend to also be good
mathematical problem solvers (also getting IMO medals). Combinatorial
problems from mathematical competitions tend to be somewhat similar to
the algorithmic frame of mind, and can be good practice for the difficult IOI
problems.
When selecting problems to practice on, there are a large number of national

1 https://people.ksp.sk/~misof/ioi-syllabus/


olympiads with great problems. The Croatian Open Competition in Informatics2


is a good source. Their competitions are generally a bit easier than solving IOI
with full marks, but are good practice. Additionally, they have a final round
(the Croatian Olympiad in Informatics) which are of high quality and difficulty.
COCI publishes solutions for all of their contests. These solutions help a lot in
training.
One step up in difficulty from COCI is the Polish Olympiad in Informatics3.
This is one of the most difficult European national olympiads published in English,
but unfortunately they do not publish solutions in English for their competitions.
There are also many regional olympiads, such as the Baltic, Balkan, Central
European, Asia-Pacific Olympiads in Informatics. Their difficulty is often higher
than that of national olympiads, and of the same format as an IOI contest (3
problems, 5 hours). These, and old IOI problems, are probably the best sources
of practice.

20.2 ICPC
In ICPC, you compete in teams of three to solve about 10-12 problems during 5
hours. A twist in ICPC-style competitions is that the team shares a single
computer. This makes it a bit harder to prioritize tasks in ICPC competitions
than in IOI competitions. You will often have multiple problems ready to be
coded, and wait for the computer. In ICPC, you see the progress of every other
team as well, which gives you some suggestions on what to solve. As a beginner
or medium-level team, this means you will generally have a good idea on what to
solve next, since many better teams will have prioritized tasks correctly for you.
ICPC scoring is based on two factors. First, teams are ranked by the number
of solved problems. As a tie breaker, the penalty time of the teams are used.
The penalty time of a single problem is the number of minutes into the contest
when your first successful attempt was submitted, plus a 20 minute penalty for
any rejected attempts. Your total penalty time is the sum of penalties for every
problem.

Strategy
In general, teams will be subject to the penalty tie-breaking. In the 2016 ICPC
World Finals, both the winners and the team in second place solved 11 problems.
2 http://hsin.hr/coci/
3 http://main.edu.pl/en/archive/oi


Their penalty time differed by a mere 7 minutes! While such a small penalty
difference in the very top is rather unusual, it shows the importance of taking
your penalty into account.
Minimizing penalties generally comes down to a few basic strategic points:

• Solving the problems in the right order.

• Solving each problem quickly.

• Minimizing the number of rejected attempts.

In the very beginning of an ICPC contest, the first few problems will be
solved quickly. In 2016, the first accepted submissions to five of the problems
came in after 11, 15, 18, 32, 44 minutes. On the other hand, after 44 minutes
no team had solved all of those problems. Why does not every team solve the
problems in the same order? Teams are of different skill in different areas, make
different judgment calls regarding difficulty or (especially early in the contest)
simply read the problem in a different order. The better you get, the harder it is
to distinguish between the “easy” problems of a contest – they are all “trivial”
and will take less than 10-15 minutes to solve and code.
Unless you are a very good team or have very significant variations in skill
among different areas (e.g., you are graph theory experts but do not know how
to compute the area of a triangle), you should probably follow the order the
other teams choose in solving the problems. In this case, you will generally
always be a few problems behind the top teams.
The better you get, the harder it is to exploit the scoreboard. You will more
often be tied in the top with teams who have solved the exact same problems.
The problems which teams above you have solved but you have not may only be
solved by 1-2 teams, which is not a particularly significant indicator in terms
of difficulty. Teams who are very strong at math might prioritize a hard maths
problem before an easier (on average for most teams) dynamic programming
problem. This can risk confusing you into solving the wrong problems for the
particular situation of your team.
The amount of cooperation during a contest is difficult to decide upon. The
optimal amount varies a lot between different teams. In general, the amount of
cooperation should increase within a single contest from the start to the end.
In the beginning, you should work in parallel as much as possible, to quickly
read all the problems, pick out the easy-medium problems and start solving


them. Once you have competed in a few contests, you will generally know the
approximate difficulty of the simplest tasks, so you can skim the problem set for
problems of this difficulty. Sometimes, you find an even easier problem in the
beginning than the one the team decided to start coding.
If you run out of problems to code, you waste computer time. Generally, this
should not happen. If it does, you need to become faster at solving problems.
Towards the end of the contest, it is a common mistake to parallelize on
several of the hard problems at the same time. This carries a risk of not solving
any of the problems in the end, due to none of the problems getting sufficient
attention. Just as with subtasks in IOI, this is the hardest part of prioritizing
tasks. During the last hour of an ICPC contest, the previously public scoreboard
becomes frozen. You can still see the number of attempts other teams make, but
not whether they were successful. Hence, you can not really know how many
problems you have to solve to get the position that you want. Learning your
own limits and practicing a lot as a team – especially on difficult contests – will
help you get a feeling for how likely you are to get in all of your problems if you
parallelize.
Read all the problems! You do not want to be in a situation where you run
out of time during a competition, just to discover there was some easy problem
you knew how to solve but never read the statement of. ICPC contests are made
more complex by the fact that you are three different persons, with different
skills and knowledge. Just because you can not solve a problem does not mean
your team mates will not find the problem trivial, have seen something similar
before or are just better at solving this kind of problem.
The scoreboard also displays failed attempts. If you see a problem where
many teams require extra attempts, be more careful in your coding. Maybe you
can perform some extra tests before submitting, or make a final read-through of
the problem and solution to make sure you did not miss any details.
If you get Wrong Answer, you may want to spend a few minutes to code up
your own test case generators. Prefer generators which create cases where you
already know the answers. Learning e.g. Python for this helps, since it usually
takes under a minute to code a reasonable complex input generator.
If you get Time Limit Exceeded, or even suspect time might be an issue
– code a test case generator. Losing a minute on testing your program on the
worst case, versus a risk of losing 20 minutes to penalty, is a trade-off worth
considering on some problems.


You are allowed to ask questions to the judges about ambiguities in the
problems. Do this the moment you think something is ambiguous (judges
generally take a few valuable minutes in answering). Most of the time they give
you a “No comment” response, in which case the perceived ambiguity probably
was not one.
If neither you nor your team mates can find a bug in a rejected solution,
consider coding it again from scratch. Often, this can be done rather quickly
when you have already coded a solution.

Getting Better
• Practice a lot with your team. Having a good team dynamic and learning
what problems the other team members excel at can be the difference that
helps you to solve an extra problem during a contest.

• Learn to debug on paper. Wasting computer time for debugging means


not writing code! Whenever you submit a problem, print the code. This
can save you a few minutes in getting your print-outs when judging is
slow (in case your submission will need debugging). If your attempt was
rejected, you can study your code on paper to find bugs. If you fail on
the sample test cases and it takes more than a few minutes to fix, add a
few lines of debug output and print it as well (or display it on half the
computer screen).

• Learn to write code on paper while waiting for the computer. In particular,
tricky subroutines and formulas are great to hammer out on paper before
occupying valuable computer time.

• Focus your practice on your weak areas. If you write buggy code, learn
your programming language better and code many complex solutions.
If your team is bad at geometry, practice geometry problems. If you
get stressed during contests, make sure you practice under time pressure.
For example, Codeforces (http://codeforces.com) has an excellent gym
feature, where you can compete retroactively in a contest using the same
amount of time as in the original contest. The scoreboard will then show
the corresponding scoreboard from the original contest during any given
time.

21 Papers
This chapter contains a series of problems not associated with any particular topic.
They are a way for you to practice problem solving without being primed with
the techniques that will come up, and a way for us to show techniques, tricks and
combinations that do not fit naturally into the text of any particular chapter.
The problems are divided into shorter “papers”. Each paper includes a
suggested time that could be used if the problems were posed in a real contest.
Some of the papers are actual contests that were given in the past.
After each contest, you can find solution descriptions of the problems.


21.1 Paper 3

Divisor Solitaire
Nicolaas likes playing a solitaire game about divisors. First, he picks an integer
𝑁 (1 ≤ 𝑁 ≤ 10^14). Then, in each round of the game he picks a divisor of
𝑁, such that it neither divides nor is divisible by any previously picked number.
Given 𝑁, determine the maximum number of rounds he can play.

Problem 21.1
Divisor Solitaire – divisorsolitaire


Solutions
Divisor Solitaire A reasonable guess guided by combinatorial intuition is to
pick as solution exactly those divisors where Ω(𝑑) = 𝑙 for some 𝑙. Two such
number cannot be divisors of each other (why?). It is also difficult to come
up with something better in the case where all 𝑒𝑖 = 1 (i.e. 𝑁 is the product
Í
of
𝑘
distinct primes). In particular, we prove that it is optimal to let 𝑙 = b 𝑖=1
𝑒 𝑖
2 c.
Roughly, we show that all divisors of 𝑁 can be partitioned into chains of
divisors of the form 𝑑1, 𝑑2, . . . , 𝑑𝑚 such that 𝑑𝑖 | 𝑑𝑖+1 and Ω(𝑑𝑖) = 𝑙 for some
𝑖. All the integers in a chain divide each other, so we can pick only a single
divisor from each chain, making the number of such chains an upper bound on
the answer. As each chain contains a 𝑑𝑖 with Ω(𝑑𝑖) = 𝑙 (there can be only one
per chain), the number of divisors of this kind must also be an upper bound.
Since that upper bound is attainable (by picking the set of all such divisors), it is
the largest possible such set.
More specifically, every chain 𝑑1, . . . , 𝑑ℎ will be such that 𝑑𝑖+1 = 𝑑𝑖𝑝
for some prime 𝑝, and Ω(𝑑1) + Ω(𝑑ℎ) = Ω(𝑁 ). The first condition implies
Ω(𝑑𝑖+1) = Ω(𝑑𝑖) + 1 and the second implies Ω(𝑑1) ≤ 𝑙 and Ω(𝑑ℎ) ≥ 𝑙, so that
Ω(𝑑𝑖) = 𝑙 for some 𝑖.
Such a partition exists by the following construction. Assume that all
divisors of 𝑁 ′ = ∏_{𝑖=2}^{𝑘} 𝑝𝑖^{𝑒𝑖} can be partitioned into such chains. If
𝑑1, . . . , 𝑑ℎ is such a chain, we can also partition the ℎ(𝑒1 + 1) integers
𝑑𝑖 · 𝑝1^{𝛼} (with 1 ≤ 𝑖 ≤ ℎ and 0 ≤ 𝛼 ≤ 𝑒1) into chains. First,
take 𝑑1, 𝑑1𝑝1, . . . , 𝑑1𝑝1^{𝑒1−1}, 𝑑1𝑝1^{𝑒1}, 𝑑2𝑝1^{𝑒1}, . . . , 𝑑ℎ𝑝1^{𝑒1}. This is a valid chain, since
Ω(𝑑1) + Ω(𝑑ℎ𝑝1^{𝑒1}) = Ω(𝑑1) + Ω(𝑑ℎ) + 𝑒1 = Ω(𝑁 ′) + 𝑒1 = Ω(𝑁 ′𝑝1^{𝑒1}) = Ω(𝑁 ).
Similarly, 𝑑2, 𝑑2𝑝1, . . . , 𝑑2𝑝1^{𝑒1−1}, 𝑑3𝑝1^{𝑒1−1}, . . . , 𝑑ℎ𝑝1^{𝑒1−1} is a chain, and so on.
Eventually, every number 𝑑𝑖𝑝1^{𝛼} ends up in one such chain. If we repeat this
for all chains of 𝑁 ′, every divisor of 𝑁 will belong to a chain.
Constructing a partition for 𝑁 ′ can be done in the exact same manner. The
base case, where we partition the divisors of a single prime power 𝑝^{𝑘}, is
straightforward – 1, 𝑝, . . . , 𝑝^{𝑘} is exactly a chain.
Finally, computing the number of such divisors is a straightforward brute-force
exercise after factoring 𝑁 to compute the exponents 𝑒1, . . . , 𝑒𝑘.
The partitioning proof is due to de Bruijn et al. [9].
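A sketch of a full implementation is given below. This is our own illustrative sketch rather than a reference solution; it factors 𝑁 by plain trial division (fast enough since 𝑁 ≤ 10^14) and then counts the divisors with Ω(𝑑) = ⌊Ω(𝑁 )/2⌋ using a simple dynamic programming over the exponent choices.

Divisor Solitaire
#include <bits/stdc++.h>
using namespace std;
typedef long long ll;

int main() {
    ll N;
    cin >> N;
    // Factor N by trial division; sqrt(10^14) = 10^7 trial divisors.
    vector<ll> exps;
    for (ll p = 2; p * p <= N; ++p) {
        if (N % p) continue;
        ll e = 0;
        while (N % p == 0) { N /= p; ++e; }
        exps.push_back(e);
    }
    if (N > 1) exps.push_back(1); // a single prime factor remains
    ll omega = accumulate(exps.begin(), exps.end(), 0LL);
    // dp[s] = number of divisors whose exponents sum to s, i.e. with
    // Omega(d) = s; built one prime at a time.
    vector<ll> dp(omega + 1, 0);
    dp[0] = 1;
    for (ll e : exps) {
        vector<ll> ndp(omega + 1, 0);
        for (ll s = 0; s <= omega; ++s)
            for (ll a = 0; a <= e && s + a <= omega; ++a)
                ndp[s + a] += dp[s];
        dp = ndp;
    }
    cout << dp[omega / 2] << endl;
}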

Part III

Advanced Topics

22 Data Structures
22.1 Self-Balancing Trees
22.2 Persistent Data Structures
22.3 Heavy-Light Decomposition

23 Combinatorics
23.1 Convolutions
Fast Fourier Transform
Number Theoretic Transform

24 Strings
24.1 Hashing
Hashing is a concept most familiar from the hash table data structure. The idea
behind the structure is to compress the elements of a set 𝑆, drawn from a large
universe, into a smaller set, in order to quickly determine membership of 𝑆 by
direct indexing into an array (which has Θ(1) look-ups). In this
section, we are going to look at hashing in a different light, as a way of speeding
up comparisons of data. When comparing two pieces of data 𝑎 and 𝑏 of size
𝑛 for equality, we need to use Θ(𝑛) time in the worst case, since every bit of
the data must be compared. This is fine if we perform only a single comparison. If
we instead wish to compare many pieces of data, this becomes an unnecessary
bottleneck. We can use the same kind of hashing as with hash tables, by defining
a “random” function 𝐻 : 𝑆 → Z𝑛 such that 𝑥 ≠ 𝑦 implies 𝐻 (𝑥) ≠ 𝐻 (𝑦) with
high probability. Such a function allows us to perform comparisons in Θ(1) time
(with linear preprocessing), by reducing the comparison of arbitrary data to small
integers (we often choose 𝑛 to be on the order of 2^32 or 2^64 to get constant-time
comparisons). The trade-off lies in correctness, which is compromised in the
unfortunate event that we perform a comparison 𝐻 (𝑥) = 𝐻 (𝑦) even though
𝑥 ≠ 𝑦.

FriendBook
Swedish Olympiad in Informatics 2011, Finals
FriendBook is a web site where you can chat with your friends. For a long time,
they have used a simple “friend system” where each user has a list of which other
users are their “friends”. Recently, a somewhat controversial feature was added,
namely a list of your “enemies”. While the friend relation will always be mutual
(two users must confirm that they wish to be friends), enmity is sometimes
one-way – a person 𝐴 can have an enemy 𝐵, who – by plain animosity – refuses
to accept 𝐴 as an enemy.
Being a poet, you have lately been pondering the following quote.


A friend is someone who dislikes the same people as yourself.

Given a FriendBook network, you wonder to what extent this quote applies.
More specifically, for how many pairs of users is it the case that they are either
friends with identical enemy lists, or are not friends and do not have identical
enemy lists?
Input
The first line contains an integer 2 ≤ 𝑁 ≤ 5000, the number of users on
FriendBook. 𝑁 lines follow, each containing 𝑁 characters. The 𝑐’th character on
the 𝑟’th line, 𝑆𝑟𝑐, specifies what relation person 𝑟 has to person 𝑐. This character is
either
either

V – in case they are friends.

F – if 𝑟 thinks of 𝑐 as an enemy.

. – 𝑟 has a neutral attitude towards 𝑐.

𝑆𝑖𝑖 is always ., and 𝑆𝑖 𝑗 is V if and only if 𝑆 𝑗𝑖 is V.


Output
Output a single integer, the number of pairs of persons for which the quote
holds.
This problem lends itself very well to hashing. It is clear that the problem is
about comparisons – indeed, we are to count the number of pairs of persons who
are either friends with equal enemy lists, or non-friends with unequal
enemy lists. The first step is to extract the enemy list 𝐸𝑖 for each person 𝑖. This
will be an 𝑁 -length string, where the 𝑗’th character is F if person 𝑗 is an enemy of
person 𝑖, and . otherwise. Basically, we remove all the friendships from the
input matrix. Performing naive comparisons on these strings would only give us
an 𝑂 (𝑁^3) time bound, since we need to perform Θ(𝑁^2) comparisons of enemy
lists, each of which takes 𝑂 (𝑁 ) time in the worst case. Here, hashing comes to
our aid. By instead computing ℎ𝑖 = 𝐻 (𝐸𝑖) for every 𝑖, comparisons of enemy
lists become comparisons of the integers ℎ𝑖 – a Θ(1) operation – thereby
reducing the complexity to Θ(𝑁^2).
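A sketch of this solution is shown below. It is only illustrative: we use the standard library's std::hash over the extracted enemy lists as the hash function 𝐻, so the count is correct only up to hash collisions.

FriendBook
#include <bits/stdc++.h>
using namespace std;

// S is the N-by-N relation matrix from the input.
long long countQuotePairs(const vector<string>& S) {
    int N = S.size();
    vector<size_t> h(N);
    for (int i = 0; i < N; i++) {
        string enemies = S[i];
        for (char& c : enemies)
            if (c == 'V') c = '.'; // erase friendships, keeping enemies
        h[i] = hash<string>{}(enemies); // h_i = H(E_i)
    }
    long long ans = 0;
    for (int i = 0; i < N; i++)
        for (int j = i + 1; j < N; j++) {
            bool friends = S[i][j] == 'V';
            bool sameEnemies = h[i] == h[j];
            // friends with equal lists, or non-friends with unequal lists
            if (friends == sameEnemies) ans++;
        }
    return ans;
}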
Alternative solutions exist. For example, we could instead have sorted all
the enemy lists, after which we can partition the lists by equality
in Θ(𝑁^2) time. However, the sorting takes 𝑂 (𝑁^2 log 𝑁 ) time with naive
comparison-based sorting (or 𝑂 (𝑁^2) if radix sort is used, but that is more
complex) and is definitely more complicated to code than the hashing approach.
Another option is to insert all the strings into a trie, simplifying the partitioning
and avoiding the sorting altogether. This is better, but still more complex. While
it would have the same complexity, the constant factor would be significantly
worse compared to the hashing approach.
This is a common theme among string problems. While most string problems
can be solved without hashes, solutions using them tend to be simpler.
The true power of string hashing is not this basic preprocessing step, where
we can only compare two strings in their entirety. Another hashing technique
allows us to compare arbitrary substrings of a string in constant time.

Definition 24.1 — Polynomial Hash

Let 𝑆 = 𝑠1𝑠2 . . . 𝑠𝑛 be a string. The polynomial hash 𝐻 (𝑆) of 𝑆 is the number

𝐻 (𝑆) = (𝑠1𝑝^{𝑛−1} + 𝑠2𝑝^{𝑛−2} + · · · + 𝑠𝑛−1𝑝 + 𝑠𝑛) mod 𝑀

As usual when dealing with strings in arithmetic expressions, we take 𝑠𝑖 to
be some numeric representation of the character, like its ASCII encoding. In
C++, char is actually a numeric type and is thus usable as a number when using
polynomial hashes.
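Concretely, a direct implementation of this definition might look like the following sketch (assuming the competitive template from Chapter 2). The constants 𝑀 and 𝑝 are placeholder choices, discussed under “The Parameters of Polynomial Hashes” below; H is the hash type we use in the code of this chapter.

typedef unsigned long long H; // the hash type used in this chapter
const H M = 2147483647; // the prime 2^31 - 1, so that (M-1)^2 < 2^64
const H p = 1000003;    // a base coprime to M, larger than the alphabet

// Computes H(S) by Horner's rule:
// H(S) = ((...(s_1 * p + s_2) * p + ...) * p + s_n) mod M.
H polyHash(const string& S) {
    H h = 0;
    for (char c : S) h = (h * p + c) % M;
    return h;
}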
Polynomial hashes have many useful properties.

Theorem 24.1 — Properties of the Polynomial Hash

If 𝑆 = 𝑠1 . . . 𝑠𝑛 is a string and 𝑐 is a single character, we have that

1. 𝐻 (𝑆 ||𝑐) = (𝑝𝐻 (𝑆) + 𝐻 (𝑐)) mod 𝑀

2. 𝐻 (𝑐 ||𝑆) = (𝐻 (𝑆) + 𝐻 (𝑐)𝑝^{𝑛}) mod 𝑀

3. 𝐻 (𝑠2 . . . 𝑠𝑛) = (𝐻 (𝑆) − 𝐻 (𝑠1)𝑝^{𝑛−1}) mod 𝑀

4. 𝐻 (𝑠1 . . . 𝑠𝑛−1) = (𝐻 (𝑆) − 𝐻 (𝑠𝑛))𝑝^{−1} mod 𝑀

5. 𝐻 (𝑠𝐿 . . . 𝑠𝑅−1) = (𝐻 (𝑠1 . . . 𝑠𝑅−1) − 𝐻 (𝑠1 . . . 𝑠𝐿−1)𝑝^{𝑅−𝐿}) mod 𝑀

Exercise 24.1. Prove the properties of Theorem 24.1.


Exercise 24.2. How can we compute the hash of 𝑆 ||𝑇 in 𝑂 (1) given the hashes
of the strings 𝑆 and 𝑇 ?
Properties 1-4 alone allow us to append and remove characters from the


beginning and end of a hash in constant time. We refer to this property as
polynomial hashes being rolling. This property allows us to solve the String Matching
problem with a single pattern (Section 16.2) with the same complexity as KMP,
by computing the hash of the pattern 𝑃 and then rolling a |𝑃 |-length hash
through the string we are searching in. This algorithm is called the Rabin-Karp
algorithm.
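A sketch of Rabin-Karp, using polyHash's constants M and p and the hash type H from above (again only a sketch; it reports matches purely on hash equality, without verification):

// Returns the positions in T where the window hash equals the hash
// of P, i.e. the (probable) occurrences of P in T.
vector<int> rabinKarp(const string& P, const string& T) {
    int m = P.size(), n = T.size();
    vector<int> matches;
    if (m == 0 || m > n) return matches;
    H hp = 0, ht = 0, pw = 1; // pw becomes p^m mod M
    for (int i = 0; i < m; i++) {
        hp = (hp * p + P[i]) % M;
        ht = (ht * p + T[i]) % M;
        pw = pw * p % M;
    }
    if (hp == ht) matches.push_back(0);
    for (int i = m; i < n; i++) {
        ht = (ht * p + T[i]) % M;                  // append T[i] (property 1)
        ht = (ht + M * M - T[i - m] * pw % M) % M; // drop T[i - m] (property 3)
        if (hp == ht) matches.push_back(i - m + 1);
    }
    return matches;
}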
Property 5 allows us to compute the hash of any substring of a string in constant
time, provided we have first computed the prefix hashes 𝐻 (𝑠1), 𝐻 (𝑠1𝑠2), . . . ,
𝐻 (𝑠1𝑠2 . . . 𝑠𝑛). Naively this computation would take Θ(𝑛^2), but property 1
allows us to compute the prefix hashes recursively, resulting in Θ(𝑛) precomputation.
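Packaged as code, the precomputation and property 5 might look like this sketch (0-indexed, with get(L, R) returning the hash of the half-open substring s_L . . . s_(R-1); same placeholder M, p and H as before):

struct Hasher {
    vector<H> pre, pw; // pre[i] = H(s_0 ... s_{i-1}), pw[i] = p^i mod M
    Hasher(const string& s) : pre(s.size() + 1, 0), pw(s.size() + 1, 1) {
        for (size_t i = 0; i < s.size(); i++) {
            pre[i + 1] = (pre[i] * p + s[i]) % M; // property 1
            pw[i + 1] = pw[i] * p % M;
        }
    }
    H get(int L, int R) { // hash of s_L ... s_{R-1}, by property 5
        return (pre[R] + M * M - pre[L] * pw[R - L] % M) % M;
    }
};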

Radio Transmission
Baltic Olympiad in Informatics 2009
Given is a string 𝑆. Find the shortest string 𝐿, such that 𝑆 is a substring of the
infinite string 𝑇 = . . . 𝐿𝐿𝐿𝐿𝐿 . . . .
Input
The first and only line of the input contains the string 𝑆, with 1 ≤ |𝑆 | ≤ 10^6.
Output
Output the string 𝐿. If there are multiple strings 𝐿 of the shortest length, you
can output any of them.
Assume that 𝐿 has a particular length 𝑙. Then, since 𝑇 is periodic with length
𝑙, 𝑆 must be too (since it is a substring of 𝑇 ). Conversely, if 𝑆 is periodic with
some length 𝑙, we can choose 𝐿 = 𝑠1𝑠2 . . . 𝑠𝑙. Thus, we are actually seeking
the smallest 𝑙 such that 𝑆 is periodic with length 𝑙. The constraints this puts on 𝑆
are simple. We must have that

𝑠1 = 𝑠𝑙+1 = 𝑠2𝑙+1 = · · ·

𝑠2 = 𝑠𝑙+2 = 𝑠2𝑙+2 = · · ·

. . .

𝑠𝑙 = 𝑠2𝑙 = 𝑠3𝑙 = · · ·

Using this insight as-is gives us an 𝑂 (|𝑆 |^2) algorithm, where we first fix 𝑙 and then
verify whether those constraints hold. The idea is sound, but a bit slow. Again, the
problematic step is that we need to perform many slow, linear-time comparisons.


If we look at what comparisons we actually perform, we see that we are comparing
two substrings of 𝑆 with each other:

𝑠1𝑠2 . . . 𝑠𝑛−𝑙 = 𝑠𝑙+1𝑠𝑙+2 . . . 𝑠𝑛

Thus we perform a linear number of substring comparisons, which
we now know are constant-time operations after linear preprocessing.
Hashes thus give us a Θ(𝑛) algorithm.

Radio Transmission
H Lh = 0, Rh = 0, pw = 1;        // pw = p^(i-1) mod M
int l = 0;
for (int i = 1; i < n; ++i) {    // i = n would just compare S to itself
    Lh = (Lh * p + S[i]) % M;               // hash of the prefix s_1..s_i
    Rh = (S[n - i + 1] * pw % M + Rh) % M;  // hash of the suffix s_(n-i+1)..s_n
    pw = pw * p % M;
    if (Lh == Rh) l = i;  // prefix of length i equals suffix of length i
}
cout << n - l << endl;

Polynomial hashes are also a powerful tool for comparing one string against
a large number of strings, using hash sets. For example, we could actually use
hashing as a replacement for Aho-Corasick. However, we would have to perform
one pass of rolling hashes for each distinct pattern length. If the string we are
searching in has length 𝑁 and the sum of the pattern lengths is 𝑃, this is still not
𝑂 (𝑁 + 𝑃): if we have 𝑘 distinct pattern lengths, their sum must be at least
1 + 2 + · · · + 𝑘 = Θ(𝑘^2), so 𝑘 = 𝑂 (√𝑃) and the rolling passes take 𝑂 (𝑁 √𝑃)
time in total.

Substring Range Matching


Petrozavodsk Winter Training Camp 2015
Given 𝑁 strings 𝑠 1, 𝑠 2, . . . , 𝑠 𝑁 and a list of queries of the form 𝐿, 𝑅, 𝑆, answer for
each such query the number of strings in 𝑠𝐿 , 𝑠𝐿+1, . . . , 𝑠𝑅 which contain 𝑆 as a
substring.
Input
The first line contains 1 ≤ 𝑁 ≤ 50 000 and the number of queries 0 ≤ 𝑄 ≤
100 000. The next 𝑁 lines contain the strings 𝑠1, 𝑠2, . . . , 𝑠𝑁 , one per line.
The next 𝑄 lines contain one query each. A query is given by the integers
1 ≤ 𝐿 ≤ 𝑅 ≤ 𝑁 and a string 𝑆.
The sum of |𝑆 | over all queries is at most 20 000. The sum of the lengths
|𝑠1 | + |𝑠2 | + · · · + |𝑠𝑁 | is at most 50 000.
Output
For each query 𝐿, 𝑅, 𝑆, output a line with the answer to the query.
Let us first focus on solving the problem when every query has the same
string 𝑆. In this case, we would first find which of the strings 𝑠𝑖 contain 𝑆,
using polynomial hashing. To respond to a query, we could for example keep
a set of all the indices 𝑖 where 𝑠𝑖 was an occurrence, together with how many smaller
𝑠𝑖 contained the string (i.e. some kind of partial sum). This allows us
to respond to a query where 𝐿 = 1 using an upper-bound look-up in our set. Solving
queries of the form [1, 𝑅] is equivalent to general intervals however, since the
interval [𝐿, 𝑅] is simply the interval [1, 𝑅] with the interval [1, 𝐿 − 1] removed.
This procedure takes Θ(∑ |𝑠𝑖 |) time to find the occurrences of 𝑆, and
𝑂 (𝑄 log 𝑁 ) time to answer the queries.


When extending this to the general case where our queries may contain
different 𝑆, we do the same thing, but instead find the occurrences of all the
patterns of the same length 𝑝 simultaneously. This can be done by keeping the
hashes of those patterns in a map, to allow for fast look-ups of our rolling hash.
Since there can be at most √20 000 ≈ 140 different pattern lengths, we
must perform about 140 · 50 000 ≈ 7 000 000 set look-ups, which is feasible.

Substring Range Matching

// polyHash(S) computes the polynomial hash of a string S, and
// rollHash(S, l) the hashes of all l-length substrings of S.
int countInterval(int upTo, const set<pii>& s) {
    auto it = s.lower_bound(pii(upTo + 1, 0));
    if (it == s.begin()) return 0;
    return (--it)->second;
}

int main() {
    int N, Q;
    cin >> N >> Q;
    vector<string> s(N);
    rep(i,0,N) cin >> s[i];

    map<int, set<H>> patterns; // pattern hashes, grouped by length

    vector<tuple<int, int, string>> queries;
    rep(i,0,Q) {
        int L, R;
        string S;
        cin >> L >> R >> S;
        queries.emplace_back(L, R, S);
        patterns[sz(S)].insert(polyHash(S));
    }

    map<H, set<pii>> hits; // hash -> (1-based string index, partial count)
    trav(pat, patterns) {
        rep(i,0,N) {
            vector<H> hashes = rollHash(s[i], pat.first);
            trav(h, hashes)
                if (pat.second.count(h)) {
                    set<pii>& hs = hits[h];
                    // record each string at most once per hash
                    auto it = hs.lower_bound(pii(i + 1, 0));
                    if (it == hs.end() || it->first != i + 1)
                        hs.emplace(i + 1, sz(hs) + 1);
                }
        }
    }

    trav(query, queries) {
        int L = get<0>(query), R = get<1>(query);
        H h = polyHash(get<2>(query));
        cout << countInterval(R, hits[h]) - countInterval(L - 1, hits[h]) << endl;
    }
}

Exercise 24.3. Hashing can be used to determine which of two substrings
is the lexicographically smaller one. How? Extend this result to a simple
Θ(𝑛 log 𝑆 + 𝑆) construction of a suffix array, where 𝑛 is the number of strings
and 𝑆 is the length of the string.

The Parameters of Polynomial Hashes


Until now, we have glossed over the choice of 𝑀 and 𝑝 in our polynomial
hashing. These choices happen to be important. First of all, we want 𝑀 and 𝑝
to be relatively prime. This ensures that 𝑝 has an inverse modulo 𝑀, which we use
when erasing characters from the end of a hash. Additionally, 𝑝^𝑖 mod 𝑀 has a
smaller period when 𝑝 and 𝑀 share a factor.
We wish 𝑀 to be sufficiently large to avoid hash collisions. If we compare
the hashes of 𝑐 strings, we want 𝑀 = Ω(𝑐^2) to get a reasonable chance of
avoiding collisions. However, this depends on how we use hashing. 𝑝 must be
somewhat large as well. If 𝑝 is smaller than the alphabet, we get trivial collisions:
the two-character string (1, 0) and the one-character string (𝑝) both hash to 𝑝.
Whenever we perform rolling hashes, we must have (𝑀 − 1)𝑝 < 2^64 if we
use 64-bit unsigned integers to implement hashes. Otherwise, the addition of a
character would overflow. If we perform substring hashes, we instead need that
(𝑀 − 1)^2 < 2^64, since we multiply a hash by an arbitrary power
of 𝑝. When using 32-bit or 128-bit hashes, these limits change correspondingly.
Note that the choice of hash size determines how large an 𝑀 we can choose,
which affects the collision rate.
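To make these limits concrete, one possible set of choices with 64-bit unsigned hashes is sketched below (example values, the same as in the earlier sketches, not canonical constants):

// Substring hashing: need (M - 1)^2 < 2^64, so M must stay below 2^32.
const H M = 2147483647; // the Mersenne prime 2^31 - 1 satisfies this
// p must be coprime to M (automatic here, since M is prime) and
// larger than the alphabet.
const H p = 1000003;
// Rolling-only hashes need just (M - 1) * p < 2^64, so with this p we
// could instead afford an M of roughly 2^44, lowering the collision rate.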


One might be tempted to choose 𝑀 = 2^64 and use the overflow of 64-bit
integers as a cheap way of computing hashes modulo 2^64. This is a bad idea, since it
is possible to construct strings which are highly prone to collisions.

Definition 24.2 — Thue-Morse Sequence

Let 𝜏̄ denote the binary complement of a sequence 𝜏 (every 0 exchanged for
a 1 and vice versa). The binary sequences 𝜏𝑖 are defined by

𝜏𝑖 = 0 if 𝑖 = 0, and 𝜏𝑖 = 𝜏𝑖−1 𝜏̄𝑖−1 (the two concatenated) if 𝑖 > 0.

The Thue-Morse sequence is the infinite sequence 𝜏𝑖 as 𝑖 → ∞.

This sequence is well-defined since 𝜏𝑖−1 is a prefix of 𝜏𝑖, meaning each
recursive step only appends to the sequence. It starts 0, 01, 0110,
01101001, 0110100110010110.

Exercise 24.4. Prove that 𝜏2𝑖 is a palindrome.

Theorem 24.2

For a polynomial hash 𝐻 with an odd 𝑝, 2^{𝑛(𝑛+1)/2} | 𝐻 (𝜏𝑛) − 𝐻 (𝜏̄𝑛).

Proof. We prove this by induction on 𝑛. For 𝑛 = 0, we have 2^0 = 1 |
𝐻 (𝜏0) − 𝐻 (𝜏̄0), which is vacuously true.
In our inductive step, we have that

𝐻 (𝜏𝑛) = 𝐻 (𝜏𝑛−1 || 𝜏̄𝑛−1) = 𝑝^{2^{𝑛−1}} · 𝐻 (𝜏𝑛−1) + 𝐻 (𝜏̄𝑛−1)

and

𝐻 (𝜏̄𝑛) = 𝐻 (𝜏̄𝑛−1 || 𝜏𝑛−1) = 𝑝^{2^{𝑛−1}} · 𝐻 (𝜏̄𝑛−1) + 𝐻 (𝜏𝑛−1)

Then,

𝐻 (𝜏𝑛) − 𝐻 (𝜏̄𝑛) = 𝑝^{2^{𝑛−1}} (𝐻 (𝜏𝑛−1) − 𝐻 (𝜏̄𝑛−1)) − (𝐻 (𝜏𝑛−1) − 𝐻 (𝜏̄𝑛−1))
              = (𝑝^{2^{𝑛−1}} − 1)(𝐻 (𝜏𝑛−1) − 𝐻 (𝜏̄𝑛−1))

Note that 𝑝^{2^{𝑛−1}} − 1 = (𝑝^{2^{𝑛−2}} − 1)(𝑝^{2^{𝑛−2}} + 1). If 𝑝 is odd, the second factor is
divisible by 2. By expanding 𝑝^{2^{𝑛−2}} − 1 in the same way, we can prove that
𝑝^{2^{𝑛−1}} − 1 is divisible by 2^𝑛.
Using our induction assumption, we have that

2^𝑛 · 2^{(𝑛−1)𝑛/2} | (𝑝^{2^{𝑛−1}} − 1)(𝐻 (𝜏𝑛−1) − 𝐻 (𝜏̄𝑛−1))

But 2^𝑛 · 2^{(𝑛−1)𝑛/2} = 2^{𝑛(𝑛+1)/2}, proving our statement. □

This means that if we choose 𝑀 as a power of 2, we can construct short strings
that are guaranteed to collide, explaining why it is a bad choice. For example,
with 𝑀 = 2^64 the strings 𝜏11 and 𝜏̄11, of length 2^11 = 2048, already collide,
since their hashes differ by a multiple of 2^{11·12/2} = 2^66.
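The following self-contained sketch demonstrates the collision: with 𝑀 = 2^64 (plain unsigned overflow) and any odd base, 𝜏11 and its complement 𝜏̄11 hash identically.

Thue-Morse Collision
#include <bits/stdc++.h>
using namespace std;
typedef unsigned long long ull;

ull overflowHash(const string& s, ull p) {
    ull h = 0;
    for (char c : s) h = h * p + c; // implicitly mod 2^64
    return h;
}

int main() {
    string a = "0", b = "1"; // b is the complement of a
    for (int i = 0; i < 11; i++) { // build tau_11, of length 2^11
        string na = a + b, nb = b + a;
        a = na; b = nb;
    }
    ull p = 1000003; // any odd base exhibits the collision
    cout << (overflowHash(a, p) == overflowHash(b, p)) << endl; // prints 1
}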

2D Polynomial Hashing
Polynomial hashing can also be applied to pattern matching in grids, by first
performing polynomial hashing along all rows of the grid and then along the
columns of the resulting hashes, reducing the two-dimensional problem to the
one-dimensional case.

Surveillance
Swedish Olympiad in Informatics 2016, IOI Qualifiers
Given a matrix of integers 𝐴 = (𝑎𝑟,𝑐), find all occurrences of another matrix
𝑃 = (𝑝𝑟,𝑐) in 𝐴 which may differ by a constant. An occurrence (𝑖, 𝑗) means
that 𝑎𝑖+𝑟,𝑗+𝑐 = 𝑝𝑟,𝑐 + 𝐶 for all (𝑟, 𝑐), where 𝐶 is a constant.

If we assume that 𝐶 = 0, the problem is reduced to simple 2D pattern matching,
which is easily solved by hashing. The requirement that a match should
be invariant to addition by a constant is a bit more complicated.
How would we solve this problem in one dimension, i.e. when 𝑟 = 1? In
this case, a match on column 𝑗 would imply

𝑎1,𝑗 − 𝑝1,1 = 𝐶

. . .

𝑎1,𝑗+𝑛−1 − 𝑝1,𝑛 = 𝐶

Since 𝐶 is arbitrary, this means the only condition is that

𝑎1,𝑗 − 𝑝1,1 = · · · = 𝑎1,𝑗+𝑛−1 − 𝑝1,𝑛

Rearranging this gives us that

𝑎1,𝑗 − 𝑎1,𝑗+1 = 𝑝1,1 − 𝑝1,2


𝑎 1,𝑗+1 − 𝑎 1,𝑗+2 = 𝑝 1,2 − 𝑝 1,3


...
By computing these two sequences of adjacent differences (of the elements 𝑎1,𝑖
and of 𝑝1,𝑖), we have reduced the problem to substring matching, and can apply
hashing. In 2D, we can do something similar. For a match (𝑖, 𝑗), it is sufficient
that this property holds for every row and every column of the match. We can
then find matches using two 2D hashes.
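As a sketch of the 2D hashing step itself (ignoring the difference transform above), the following computes the hashes of all ℎ × 𝑤 submatrices: one rolling pass along the rows with a base p1, then one pass down the columns of the row hashes with a second base p2. The constants are again placeholder choices, and the matrix entries are assumed to lie in [0, M).

2D Polynomial Hashing
#include <bits/stdc++.h>
using namespace std;
typedef unsigned long long ull;

const ull M = 2147483647;             // prime, (M-1)^2 < 2^64
const ull p1 = 1000003, p2 = 999983;  // row and column bases

ull power(ull b, ull e) {
    ull r = 1;
    while (e) { if (e & 1) r = r * b % M; b = b * b % M; e >>= 1; }
    return r;
}

// res[i][j] = hash of the h-by-w submatrix with upper-left corner (i, j)
vector<vector<ull>> blockHashes(const vector<vector<int>>& A, int h, int w) {
    int R = A.size(), C = A[0].size();
    ull pw1 = power(p1, w), pw2 = power(p2, h);
    // Pass 1: rolling hashes of all w-wide windows along each row.
    vector<vector<ull>> rowH(R, vector<ull>(C - w + 1));
    for (int i = 0; i < R; i++) {
        ull x = 0;
        for (int j = 0; j < C; j++) {
            x = (x * p1 + A[i][j]) % M;
            if (j >= w) x = (x + M * M - A[i][j - w] * pw1 % M) % M;
            if (j >= w - 1) rowH[i][j - w + 1] = x;
        }
    }
    // Pass 2: roll an h-long hash down each column of rowH.
    vector<vector<ull>> res(R - h + 1, vector<ull>(C - w + 1));
    for (int j = 0; j + w <= C; j++) {
        ull x = 0;
        for (int i = 0; i < R; i++) {
            x = (x * p2 + rowH[i][j]) % M;
            if (i >= h) x = (x + M * M - rowH[i - h][j] * pw2 % M) % M;
            if (i >= h - 1) res[i - h + 1][j] = x;
        }
    }
    return res;
}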
Problem 24.1
Chasing Subs – chasingsubs

24.2 Dynamic Hashing

A Discrete Mathematics
This appendix reviews some basic discrete mathematics. Without a good grasp
of the foundations of mathematics, algorithmic problem solving is basically
impossible. When we analyze the efficiency of algorithms, we use sums,
recurrence relations and a bit of algebra. Some basic topics, such as set theory,
are essential to even understand some of the proofs and problems in this book.
This mathematical preliminary touches lightly upon these topics and is
meant to complement a high school education in mathematics in preparation for
the remaining text. While you can probably get by with the mathematics from
this chapter, we highly recommend that you (at some point) delve deeper into
discrete mathematics.
We do assume that you are familiar with proofs by induction and contradiction,
and with the mathematics that is part of a pre-calculus course (trigonometry,
polynomials, etc.). Some more mathematically advanced parts of this book go
beyond these assumptions, but this is only the case in very few places.


A.1 Logic
In mathematics, we often deal with truths and falsehoods in the form of theorems,
proofs, counter-examples and so on. Mathematical logic is a very exact discipline,
and a precise language has been developed to help us deal with logical statements.
For example, consider the statements

1. an integer is either odd or even,

2. programming is more fun than mathematics,

3. 𝑥 is negative if and only if 𝑥^3 is negative,

4. every apple is blue,

5. there exists an integer,

6. if there exists an odd number divisible by 6, every integer is even.

The first statement uses the logical connective or. It connects two statements,
and requires only one of them to be true in order for the whole statement to be
true. Since any integer is either odd or even, the statement is true.
The second statement is not really a logical statement. While we might have
a personal conviction regarding the entertainment value of programming and
maths, it is hard to consider the statement as having a truth value.
The third statement tells us that two statements are equivalent – one is true
exactly when the other is. This is also a true statement by some simple algebraic
manipulations.
The fourth statement concerns every object of some kind. It is a false
statement, a fact that can be proved by exhibiting e.g., a green apple.
The fifth statement claims that something exists, which we can prove by
presenting an integer such as 42. It is therefore true.
The sixth and last statement complicates matters by introducing an implica-
tion. It is a two-part statement, which only makes a claim regarding the second
part if the first part is true. Since no odd number divisible by 6 exists, it makes
no statement about the evenness of every integer. Thus, this implication is true.
To express such statements, a language has been developed where all these
logical operations such as existence, implication and so on have symbols
assigned to them. This enables us to remove the ambiguity inherent in the

372
A.1. L OGIC

English language, which is of utmost importance when dealing with the exactness
required by logic.
The disjunction (𝑎 is true or 𝑏 is true) is a common logical connective. It is
given the symbol ∨, so that this statement is written 𝑎 ∨ 𝑏. Another
common connective, the conjunction (𝑎 is true and 𝑏 is true), is assigned the
symbol ∧. For example, we write 𝑎 ∧ 𝑏 for the statement that both 𝑎 and 𝑏
are true.
The third statement introduced the equivalence, a statement of the form “𝑎
is true if, and only if, 𝑏 is true”. This is the same as 𝑎 → 𝑏 (the only if part)
and 𝑏 → 𝑎 (the if part). We use the symbol ↔, which follows naturally for this
reason. The statement would then be written as
𝑥 < 0 ↔ 𝑥^3 < 0
Logic also contains quantifiers. The fourth statement, that every apple is
blue, actually makes a large number of statements – one for each apple. This
concept is captured using the universal quantifier ∀, read as “for every”. For
example, we could write the statement as
∀ apple 𝑎 : 𝑎 is blue
In the fifth statement, another quantifier was used, which speaks of the
existence of something; the existential quantifier ∃, which we read as “there
exists”. We would write the second statement as
∃𝑥 : 𝑥 is an integer
An implication is a statement of the form “if 𝑎 is true, then 𝑏 must also be
true”. This is a statement on its own, which is true whenever 𝑎 is false (meaning
it does not say anything about 𝑏), or when both 𝑎 and 𝑏 are true. We use the symbol
→ for this, writing the statement as 𝑎 → 𝑏. The sixth statement would hence be
written as
(∃𝑛 : 𝑛 is odd ∧ 𝑛 is divisible by 6) → ∀ integer 𝑛 : 𝑛 is even
The negation operator ¬ inverts a statement. The statement “no penguin
can fly” would thus be written as
¬(∃ penguin 𝑝 : 𝑝 can fly)
or, equivalently,
∀ penguin 𝑝 : ¬(𝑝 can fly)


Exercise A.1. Write the following statements using the logical symbols, and
determine whether they are true or false:
1) If 𝑎 and 𝑏 are odd integers, 𝑎 + 𝑏 is an even integer,
2) 𝑎 and 𝑏 are odd integers if and only if 𝑎 + 𝑏 is an even integer,
3) Whenever it rains, the sun does not shine,
4) 𝑎𝑏 is 0 if and only if 𝑎 or 𝑏 is 0
Our treatment of logic ends here. Note that much is left unsaid – it is a
most rudimentary walk-through. This section is mainly meant to give you some
familiarity with the basic symbols used in logic, since they will appear later. If
you wish to gain a better understanding of logic, you can follow the references
in the chapter notes.

A.2 Sets and Sequences


A set is an unordered collection of distinct objects, such as numbers, letters,
other sets, and so on. The objects contained within a set are called its elements,
or members. A set can be written as a comma-separated list of its elements, enclosed
by curly brackets:
𝐴 = {2, 3, 5, 7}
In this example, 𝐴 contains four elements: the integers 2, 3, 5 and 7.
Because a set is unordered and only contains distinct objects, the set
{1, 2, 2, 3} is the exact same set as {3, 2, 1, 1} and {1, 2, 3}.
If 𝑥 is an element in a set 𝑆, we write that 𝑥 ∈ 𝑆. For example, we have that
2 ∈ 𝐴 (referring to our example 𝐴 above). Conversely, we use the notation 𝑥 ∉ 𝑆
when the opposite holds. We have e.g., that 11 ∉ 𝐴.
Another way of describing the elements of a set uses the set builder notation,
in which a set is constructed by explaining what properties its elements should
have. The general syntax is

{element | properties that the element must have}

To construct the set of all even integers, we would use the syntax

{2𝑖 | 𝑖 is an integer}

which is read as “the set containing all numbers of the form 2𝑖 where 𝑖 is an
integer”. To construct the set of all primes, we would write

{𝑝 | 𝑝 is prime}


Certain sets are used often enough to be assigned their own symbols:

• Z – the set of integers {. . . , −2, −1, 0, 1, 2, . . . },

• Z+ – the set of positive integers {1, 2, 3, . . . },

• N – the set of non-negative integers {0, 1, 2, . . . },

• Q – the set of all rational numbers {𝑝/𝑞 | 𝑝, 𝑞 integers where 𝑞 ≠ 0},

• R – the set of all real numbers,

• [𝑛] – the set of the first 𝑛 positive integers {1, 2, . . . , 𝑛},

• ∅ – the empty set.


Exercise A.2. 1) Use the set builder notation to describe the set of all odd
integers.
2) Use the set builder notation to describe the set of all negative integers.
3) Compute the elements of the set {𝑘 | 𝑘 is prime and 𝑘^2 ≤ 30}.
A set 𝐴 is a subset of a set 𝑆 if, for every 𝑥 ∈ 𝐴, we also have 𝑥 ∈ 𝑆 (i.e.,
every member of 𝐴 is a member of 𝑆). We denote this with 𝐴 ⊆ 𝑆. For example

{2, 3} ⊆ {2, 3, 5, 7}

and

{2/4, 2, −1/7} ⊆ Q

For any set 𝑆, we have that ∅ ⊆ 𝑆 and 𝑆 ⊆ 𝑆. Whenever a set 𝐴 is not a subset of
another set 𝐵, we write 𝐴 ⊈ 𝐵. For example,

{2, 𝜋} ⊈ Q

since 𝜋 is not a rational number.


We say that sets 𝐴 and 𝐵 are equal whenever 𝑥 ∈ 𝐴 if and only if 𝑥 ∈ 𝐵. This
is equivalent to 𝐴 ⊆ 𝐵 and 𝐵 ⊆ 𝐴. Sometimes, we will use the latter condition
when proving set equality, i.e., first proving that every element of 𝐴 must also
be an element of 𝐵 and then the other way round.
Exercise A.3. 1) List all subsets of the set {1, 2, 3}.
2) How many subsets does a set containing 𝑛 elements have?
3) Determine which of the following sets are subsets of each other:

375
A PPENDIX A. D ISCRETE M ATHEMATICS

• ∅

• Z

• Z+

• {2𝑘 | 𝑘 ∈ Z}

• {2, 16, 12}

Sets also have many useful operations defined on them. The intersection
𝐴 ∩ 𝐵 of two sets 𝐴 and 𝐵 is the set containing all the elements which are
members of both sets, i.e.,

𝑥 ∈ 𝐴 ∩ 𝐵 ⇔ 𝑥 ∈ 𝐴 ∧ 𝑥 ∈ 𝐵

If the intersection of two sets is the empty set, we call the sets disjoint. A
similar concept is the union 𝐴 ∪ 𝐵 of 𝐴 and 𝐵, defined as the set containing
those elements which are members of either set.
For example, if

𝑋 = {1, 2, 3, 4}, 𝑌 = {4, 5, 6, 7}, 𝑍 = {1, 2, 6, 7}

Then,
𝑋 ∩ 𝑌 = {4}
𝑋 ∩𝑌 ∩𝑍 = ∅
𝑋 ∪ 𝑌 = {1, 2, 3, 4, 5, 6, 7}
𝑋 ∪ 𝑍 = {1, 2, 3, 4, 6, 7}

Exercise A.4. Compute the intersection and union of:


1) 𝐴 = {1, 4, 2}, 𝐵 = {4, 5, 6}
2) 𝐴 = {𝑎, 𝑏, 𝑐}, 𝐵 = {𝑑, 𝑒, 𝑓 }
3) 𝐴 = {apple, orange}, 𝐵 = {pear, orange}

A sequence is an ordered collection of values (predominantly numbers)
such as 1, 2, 1, 3, 1, 4, . . . . Sequences will mostly be given as a list of
subscripted variables, such as 𝑎1, 𝑎2, . . . , 𝑎𝑛. A shorthand for this is
(𝑎𝑖)_{𝑖=1}^{𝑛}, denoting the sequence of variables 𝑎𝑖 where 𝑖 ranges from 1
to 𝑛. An infinite sequence is given ∞ as its upper bound: (𝑎𝑖)_{𝑖=1}^{∞}.


A.3 Sums and Products


The most common mathematical expressions we deal with are sums of sequences
of numbers, such as 1 + 2 + · · · + 𝑛. Such sums often have a variable number of
terms and complex summands, such as 1·3·5 + 3·5·7 + · · · + (2𝑛+1)(2𝑛+3)(2𝑛+5).
In these cases, giving sums in the form of a few leading and trailing terms, with
the remaining part hidden by . . . , is too imprecise. Instead, we use a special
syntax for writing sums in a formal way – the sum operator:

∑_{𝑖=𝑗}^{𝑘} 𝑎𝑖

The symbol denotes the sum of the 𝑘 − 𝑗 + 1 terms 𝑎𝑗 + 𝑎𝑗+1 + 𝑎𝑗+2 + · · · + 𝑎𝑘,
which we read as “the sum of 𝑎𝑖 from 𝑗 to 𝑘”.
For example, we can express the sum 2 + 4 + 6 + · · · + 12 of the 6 first even
numbers as

∑_{𝑖=1}^{6} 2𝑖

Exercise A.5. Compute the sum

∑_{𝑖=−2}^{4} (2 · 𝑖 − 1)

Many useful sums have closed forms – expressions in which we do not need
sums of a variable number of terms.
Exercise A.6. Prove the following identities:

∑_{𝑖=1}^{𝑛} 𝑐 = 𝑐𝑛

∑_{𝑖=1}^{𝑛} 𝑖 = 𝑛(𝑛 + 1)/2

∑_{𝑖=1}^{𝑛} 𝑖^2 = 𝑛(𝑛 + 1/2)(𝑛 + 1)/3

∑_{𝑖=0}^{𝑛} 2^𝑖 = 2^{𝑛+1} − 1


The sum of the inverses of the first 𝑛 positive integers happens to have a very
neat approximation, which we will occasionally make use of later on:

∑_{𝑖=1}^{𝑛} 1/𝑖 ≈ ln 𝑛

This is a reasonable approximation, since ∫_1^𝑙 (1/𝑥) 𝑑𝑥 = ln 𝑙.

There is an analogous notation for products, using the product operator ∏:

∏_{𝑖=𝑗}^{𝑘} 𝑎𝑖

denotes the product of the 𝑘 − 𝑗 + 1 terms 𝑎𝑗 · 𝑎𝑗+1 · 𝑎𝑗+2 · · · · · 𝑎𝑘, which we read
as “the product of 𝑎𝑖 from 𝑗 to 𝑘”.
In this way, the product 1 · 3 · 5 · · · · · (2𝑛 − 1) of the first 𝑛 odd integers can
be written as

∏_{𝑖=1}^{𝑛} (2𝑖 − 1)

Exercise A.7. Prove that

(𝑛 + 2) ∏_{𝑖=1}^{𝑛} 𝑖 + (𝑛/(𝑛 + 1)) ∏_{𝑖=1}^{𝑛+2} 𝑖 = ∏_{𝑖=1}^{𝑛+2} 𝑖

Chapter Notes
If you need a refresher on some more basic mathematics, such as single-variable
calculus, Calculus [26] by Michael Spivak is a solid textbook. It is not the
easiest book, but it is one of the best undergraduate texts on single-variable
calculus if you take the time to work through it.
For a gentle introduction to discrete mathematics, Discrete and Combinato-
rial Mathematics: An Applied Introduction [12] by Ralph Grimaldi is a nice
book with a lot of breadth.
Logic in Computer Science [14] is an introduction to formal logic, with many
interesting computational applications. The first chapter, on propositional logic,
is sufficient for most algorithmic problem solving, but the remaining chapters
show many non-obvious applications that make logic relevant to computer
science.


One of the best works on discrete mathematics ever produced for the aspiring
algorithmic problem solver is Concrete Mathematics [15], co-authored by the famous
computer scientist Donald Knuth. It is rather heavy-weight, and probably serves
better as a more in-depth study of the foundations of discrete mathematics than
as an introductory text.
Graph Theory [10] by Reinhard Diestel is widely acknowledged as the go-to
book on more advanced graph theory concepts. The book is freely available
for viewing at its home page1.

1 http://diestel-graph-theory.com/

Hints
1.1 Try dividing cards into smaller piles that can be sorted separately.
1.6 The optimal number of questions is 6.
2.12 Try solving it for the special case 𝑦 = 2 first.
5.1 In the best case, line 4 of the insertion sort pseudo code never executes.
5.3 When is log2 𝑛 < 𝑛?
5.4 𝑐 = 2 for the upper bound.
5.5
1) Yes.
2) No.
5.6 Binomial expansion.
5.7
6.10

1. Sum the maximum number of steps each element can move.

2. What is the limit of a geometric series?

7.4 Use that 1.61 + 1 > 1.61^2 and 1.62 + 1 < 1.62^2.
7.5 The positive root of the equation 𝑥^3 = 𝑥^2 + 𝑥 + 1 lies between 1.83 and
1.84.
7.6
3) The 𝑛 choices are which of the two letters to put on each position in the
string.
4) The 𝑛 choices are whether to include each element or not.
7.7 Since the three recursions are structurally identical, they will have the same
time complexity 𝑇 (𝑛).

Solutions
1.1 One possible solution is to first divide the cards into separate piles by value:
1 − 100 000, 100 001 − 200 000, and so on. If we sort each such pile, the entire stack
of cards is sorted by putting the piles together. Each such pile can be sorted in the
same way, by instead dividing the cards up into even smaller ranges of values, and so
on.
1.2
5) The input consists of two integers 𝑎 and 𝑏, not both 0. The output should
be the greatest common divisor of 𝑎 and 𝑏.
6) The input consists of a sequence of real numbers, the coefficients 𝑥𝑖 of a
polynomial. The output should be a real number that is a root of the polynomial.
7) The input consists of two integers 𝑎 and 𝑏. The output should be the
product 𝑎𝑏.
1.6 One can achieve 6 questions by always asking about the midpoint of the
range of possible numbers. For example, by asking about the number 50, one
knows whether the correct number lies in 1 − 49 or in 51 − 100.
1.8 Given an algorithm that is correct with probability 0.5 + 𝛼 for some 𝛼 > 0,
we can find the correct answer by running it many times and choosing the answer
that was most common.
1.12

Palindrome If we let the input 𝑛-letter word 𝑆 have the letters 𝑠 0, 𝑠 1, . . . , 𝑠𝑛−1 ,
it reads the same backwards and forwards if 𝑠 0𝑠 1 . . . 𝑠𝑛−1 = 𝑠𝑛−1𝑠𝑛−2 . . . 𝑠 0 . We
thus need to check all the letter pairs (𝑠 0 , 𝑠𝑛−1 ), (𝑠 1 , 𝑠𝑛−2 ) and so forth for equality.

1: procedure Palindrome(string 𝑆)
2: for 𝑖 from 0 to 𝑛 − 1 do
3: if 𝑆𝑖 ≠ 𝑆𝑛−1−𝑖 then
4: return false
5: return true


Primality The problem can be solved by checking all the numbers between 2
and 𝑛 − 1 to see if any of them are divisors of 𝑛. If not, then it is prime. This
follows from the definition, and the fact that a (positive) divisor of a positive
integer can not be greater than the integer itself.

1: procedure Primality(integer 𝑛)
2: for 𝑖 from 2 to 𝑛 − 1 do
3: if 𝑖 divides 𝑛 then
4: return false
5: return true

2.12 We only analyze the case where 𝑥 and 𝑦 are positive. Assume that
0 ≤ 𝑎 < 𝑥/𝑦 ≤ 𝑎 + 1, so that the result of rounding 𝑥/𝑦 to an integer away from zero
is 𝑎 + 1. Multiplying by 𝑦 gives us 𝑎𝑦 < 𝑥 ≤ 𝑎𝑦 + 𝑦, so that 𝑎𝑦 ≤ 𝑥 − 1 < 𝑎𝑦 + 𝑦
(since all values are now integers). Finally, adding 𝑦 to both inequalities gives us
𝑎𝑦 + 𝑦 ≤ 𝑥 − 1 + 𝑦 < 𝑎𝑦 + 2𝑦. After dividing by 𝑦, we get 𝑎 + 1 ≤ (𝑥 − 1 + 𝑦)/𝑦 < 𝑎 + 2.
This means that the result of (𝑥 − 1 + 𝑦)/𝑦 rounded towards zero is 𝑎 + 1, which is what
we wanted.
The analysis for negative 𝑥 and 𝑦 is similar.
5.1 Consider the case when the array is already sorted. In this case, the inner
loop on line 4 never executes, since 𝐴[ 𝑗] ≥ 𝐴[ 𝑗 − 1] for all 𝑗. Thus, only the
lines that take linear time in total are executed, making 𝑂 (𝑛) an upper bound on
the best case. On the other hand, the loop on line 2 always executes a linear
number of times no matter the input, so Ω(𝑛) is also a lower bound. Thus, the
algorithm has a Θ(𝑛) best-case running time.
5.2 To compute the sum in Θ(𝑛) time, we can add all the variables to a counter
using a for loop, one at a time.
To solve the problem in constant time, the formula 1 + 2 + · · · + 𝑛 = 𝑛(𝑛 + 1)/2
can be used.
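As an illustration, both approaches as C++ sketches:

long long linearSum(long long n) { // Theta(n)
    long long sum = 0;
    for (long long i = 1; i <= n; i++) sum += i;
    return sum;
}
long long closedFormSum(long long n) { // Theta(1)
    return n * (n + 1) / 2;
}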
5.3 Let 𝑛0 = 7. For any 𝑛 ≥ 1, we have log2 𝑛 < 𝑛 since 𝑛 < 2^𝑛 (which can be
proved using either induction or simple calculus). In this case, 10𝑛^2 + 7𝑛 −
5 + log2 𝑛 ≤ 10𝑛^2 + 𝑛^2 + 𝑛^2 = 12𝑛^2. Thus, with 𝑐 = 12 we get the required
statement.
5.4 Clearly max{𝑓 (𝑛), 𝑔(𝑛)} ≤ 𝑓 (𝑛) + 𝑔(𝑛) since the maximum of the two
functions is always equal to one of the functions. This means that 𝑓 (𝑛) + 𝑔(𝑛) =
Ω(max{𝑓 (𝑛), 𝑔(𝑛)}) with 𝑐 = 1. Similarly, 𝑓 (𝑛) + 𝑔(𝑛) ≤ 2 max{𝑓 (𝑛), 𝑔(𝑛)}


by the fact that each function individually is smaller than their maximum.
Thus 𝑓 (𝑛) + 𝑔(𝑛) = 𝑂 (max{𝑓 (𝑛), 𝑔(𝑛)}) with 𝑐 = 2. Together this proves the
statement.
5.5
1) This is clear with 𝑐 = 2.
2) For any 𝑐, picking 𝑛 such that 2^𝑛 > 𝑐 gives us 2^{2𝑛} = 2^𝑛 · 2^𝑛 > 𝑐 · 2^𝑛, so no
𝑐 can satisfy the definition.
5.6 First, note that polynomials of higher powers eventually dominate
polynomials of lower powers:

𝑎𝑛^𝑘 < 𝑛^{𝑘+1}

is true whenever 𝑛 > 𝑎.
Next, we can write (𝑛 + 𝑎)^𝑏 as the sum of 𝑛^𝑏 plus a number of terms of
lower powers of 𝑛, using the formula for the binomial expansion. This
means that max{𝑛^𝑏, (𝑛 + 𝑎)^𝑏 − 𝑛^𝑏} = 𝑛^𝑏 for sufficiently large 𝑛. Thus,
(𝑛 + 𝑎)^𝑏 = 𝑛^𝑏 + ((𝑛 + 𝑎)^𝑏 − 𝑛^𝑏) = Θ(max{𝑛^𝑏, (𝑛 + 𝑎)^𝑏 − 𝑛^𝑏}) = Θ(𝑛^𝑏) by a previous
result.
5.7
6.10 Assume that the bottom layer of the tree contains 𝑛 = 2^𝑘 elements. These
will move 0 steps. The 𝑛/2 elements of the next layer move at most 1 step. The
𝑛/4 elements of the layer above move at most 2 steps, and so on. In total, this means
that there are at most

𝑛/2 + 2 · 𝑛/4 + 3 · 𝑛/8 + · · · + 𝑘 · 𝑛/2^𝑘

movements. Note that

𝑛/2 + 𝑛/4 + · · · ≤ 𝑛
𝑛/4 + 𝑛/8 + · · · ≤ 𝑛/2
𝑛/8 + 𝑛/16 + · · · ≤ 𝑛/4

and so on. If we sum up all of these inequalities, we get that the original sum

𝑛/2 + 2 · 𝑛/4 + 3 · 𝑛/8 + · · · + 𝑘 · 𝑛/2^𝑘 ≤ 𝑛 + 𝑛/2 + 𝑛/4 + · · · ≤ 2𝑛

proving that the elements move at most a linear number of steps in total.


7.1 They are 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377.
7.4 Assume that 𝑇 (𝑛) ≥ 1.61^𝑛 for all 𝑛 < 𝑛0, and for 𝑛 = 1. Then 𝑇 (𝑛0) ≥
1.61^{𝑛0−1} + 1.61^{𝑛0−2} = 1.61^{𝑛0−2}(1.61 + 1) ≥ 1.61^{𝑛0}, so that 𝑇 (𝑛) ≥ 1.61^𝑛 also holds for
𝑛 = 𝑛0. By the principle of induction, 𝑇 (𝑛) ≥ 1.61^𝑛 holds for all 𝑛 ≥ 0.
The proof is similar for the upper bound.
7.5 We have that the time function satisfies 𝑇 (𝑛) ≥ 𝑇 (𝑛 − 1) + 𝑇 (𝑛 − 2) + 𝑇 (𝑛 − 3). If,
by induction, 𝑇 (𝑘) > 1.83^𝑘 for all 𝑘 < 𝑛, we get

𝑇 (𝑛) ≥ 1.83^{𝑛−1} + 1.83^{𝑛−2} + 1.83^{𝑛−3}
     = 1.83^{𝑛−3}(1 + 1.83 + 1.83^2)
     ≥ 1.83^{𝑛−3} · 1.83^3
     = 1.83^𝑛

so the claim holds for 𝑇 (𝑛) too. Proving the upper bound is similar.
7.6
3) Let 𝐴(𝑛) be the number of such strings. If the last character of the string
is a B, the remaining string can be formed in 𝐴(𝑛 − 1) ways. If the last character
of the string is an A, the second to last character must be a B (to avoid two
consecutive A’s). There are 𝐴(𝑛 − 2) ways in which the remaining string can be
formed after fixing these two letters, so that 𝐴(𝑛) = 𝐴(𝑛 − 1) + 𝐴(𝑛 − 2). The
base cases are 𝐴(0) = 1 and 𝐴(1) = 2.
4) Let 𝐵(𝑛) be the number of such subsets. If the element 𝑛 is to be
included in the subset, we can choose the remaining 𝑛 − 1 elements in 𝐵(𝑛 − 1)
ways. If the element 𝑛 is to be excluded from the subset, the element 𝑛 − 1 must
be included according to the problem. The remaining 𝑛 − 2 elements can then be
chosen in 𝐵(𝑛 − 2) ways, giving the recursion 𝐵(𝑛) = 𝐵(𝑛 − 1) + 𝐵(𝑛 − 2). The
base cases are 𝐵(0) = 1 and 𝐵(1) = 2.
7.7 The time complexity fulfills 𝑇 (𝑛) = 2𝑇 (𝑛 − 1) + 𝑂 (1). By induction, we
get 𝑇 (𝑛) = Θ(2^𝑛).

Bibliography
[1] Fedor V. Fomin and Dieter Kratsch. Exact Exponential Algorithms. Springer, 2010.

[2] Noga Alon, Raphy Yuster, and Uri Zwick. Color-coding: A new method for finding simple paths, cycles and other small subgraphs within large graphs. In Proceedings of the Twenty-Sixth Annual ACM Symposium on Theory of Computing, STOC ’94, pages 326–335, New York, NY, USA, 1994. Association for Computing Machinery.

[3] Sanjeev Arora and Boaz Barak. Computational Complexity: A Modern Approach. Cambridge University Press, 2009.

[4] David Beazley and Brian K. Jones. Python Cookbook. O’Reilly, 2013.

[5] Joshua Bloch. Effective Java. Pearson Education, 2008.

[6] Xuan Cai. Canonical coin systems for change-making problems. In 2009 Ninth International Conference on Hybrid Intelligent Systems, volume 1, pages 499–504, Aug 2009.

[7] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms. The MIT Press, 3rd edition, 2009.

[8] Marek Cygan, Fedor V. Fomin, Łukasz Kowalik, Daniel Lokshtanov, Dániel Marx, Marcin Pilipczuk, Michał Pilipczuk, and Saket Saurabh. Parameterized Algorithms. Springer, 2015.

[9] Nicolaas G. de Bruijn, Cornelia van Ebbenhorst Tengbergen, and Dirk Kruyswijk. On the set of divisors of a number. Nieuw Archief voor Wiskunde, serie 2, 23:191–193, 1951.

[10] Reinhard Diestel. Graph Theory. Springer, 2016.

[11] Philippe Flajolet and Robert Sedgewick. An Introduction to the Analysis of Algorithms. Addison-Wesley, 2013.

[12] Ralph P. Grimaldi. Discrete and Combinatorial Mathematics: An Applied Introduction. Pearson Education, 2003.

[13] Godfrey H. Hardy and Edward M. Wright. An Introduction to the Theory of Numbers. Oxford University Press, 2008.

[14] Michael Huth. Logic in Computer Science. Cambridge University Press, 2004.

[15] Donald E. Knuth, Oren Patashnik, and Ronald Graham. Concrete Mathematics: A Foundation for Computer Science. Addison-Wesley, 1994.

[16] George S. Lueker. Two NP-complete Problems in Nonnegative Integer Programming. Princeton University, Department of Electrical Engineering, 1975.

[17] Robert C. Martin. Clean Code: A Handbook of Agile Software Craftsmanship. Pearson Education, 2009.

[18] Steve McConnell. Code Complete: A Practical Handbook of Software Construction. Microsoft Press, 2004.

[19] Scott Meyers. Effective STL. O’Reilly, 2001.

[20] Scott Meyers. Effective C++. O’Reilly, 2005.

[21] Scott Meyers. Effective Modern C++. O’Reilly, 2014.

[22] Christos Papadimitriou. Computational Complexity. Addison-Wesley, 1994.

[23] Charles Petzold. CODE. Microsoft Press, 2000.

[24] Victor Shoup. A Computational Introduction to Number Theory and Algebra. Cambridge University Press, 2008.

[25] Brett Slatkin. Effective Python. Addison-Wesley, 2015.

[26] Michael Spivak. Calculus. Springer, 1994.

[27] Bjarne Stroustrup. The C++ Programming Language. Addison-Wesley, 2013.

[28] Jeffrey Ullman and John Hopcroft. Introduction to Automata Theory, Languages, and Computation. Pearson Education, 2014.

[29] Mark A. Weiss. Data Structures and Algorithm Analysis in C++. Pearson, 2013.

Index
𝐾𝑛 , 132 composite number, 310
computational problem, 3
addition principle, 267 conjunction, 373
adjacency lists, 137 connected, 143
adjacency matrix, 136 connected component, 143
algorithm, 5 continue statement, 32
amortized complexity, 90 correctness, 7
and, 373 cycle, 143
and operator, 29 cycle decomposition, 273
array, 40
assignment operator, 21 data structure, 97
auto, 24 degree, of vertex, 133
Dijkstra’s Algorithm, 239
BFS, 139, 232 directed graph, 135
bijection, 271 disjoint sets, 376
binary search, 210 disjunction, 373
binary tree, 104 divide and conquer, 201
binomial coefficient, 278 divides exactly, 316
bipartite matching, 253 divisibility, 303
boolean, 24 divisor, 303
breadth-first search, 139, 232 double, 23
break statement, 32 Dyck path, 282
char, 22 dynamic array, 98
closed trail, 143 edge, 131
closed walk, 143 element, 374
combinatorics, 267 equivalence, 373
comment, 19 existential quantifier, 373
comparison operators, 28
compiler, 16 factorial, 270
complete graph, 132 fixed-size array, 97
component, 143 float, 23


flow network, 247 modulo, 26


for loop, 31 multiplication principle, 267

generate and test, 150 negation, 29, 373


graph, 131 neighbour, 133
graph game, 299 next_permutation, 58
NP-complete, 92
heap, 105
online judge, 10
identity permutation, 272 operator, 26
if statements, 29 optimization problem, 149
implication, 373 or, 373
independent set, 162 or operator, 29
input description, 3 oracle, 92
insertion sort, 83 order
instance, 4 of a permutation, 274
int, 22 output description, 3
intersection
of sets, 376 partial correctness, 7
path, 138
judgment, 11 permutation, 270, 271
cycles, 273
KMP, 264 identity, 272
Knuth-Morris-Pratt, 264 inverse of, 272
KS.Dev, 10 multiplication, 272
order, 274
lambda, 42 position
length of game, 299
of path, 138 prime number, 310
logic, 372 priority queue, 103
long long, 22 problem, 3
losing position, 300 product operator, 378
programming language, 8
main function, 19 pseudo code, 9
maximum matching, 253
member, 374 quantifier, 373
memory complexity, 92 query complexity, 92
modular inverse, 331 queue, 51, 102


quotient, 329

Rabin-Karp, 364
recursion, 117
recursive definition, 117
remainder, 329

sequence, 376
set, 374
simple graph, 131
stable sort, 58
stack, 52, 101
string, 22
structure, 37
subset, 375
sum operator, 377

test data, 10
time complexity, 83
total correctness, 7
travelling salesman problem, 149
tree, 146
trial division, 315
typedef, 24

union, 376
universal quantifier, 373

variable declaration, 21
vertex, 131
Visual Studio Code, 16

weighted graph, 135


winning position, 300
