Algorithmic Problem Solving
Johan Sannemo
2020
This version of the book is a preliminary draft. Expect to
find typos and other mistakes. If you do, please report them
to [email protected]. A number of sections and
chapters are also unfinished, and a number of problems are
not yet uploaded to the judge – these are known issues.
Note: the linked problems are sometimes available on
Kattis (https://open.kattis.com/problems/PROBLEMID)
and sometimes on Kodsport.dev
(https://kodsport.dev/problems/PROBLEMID). In this
particular version, you should try the first one for most
chapters.
Contents
Preface ix
I Preliminaries 1
2 Programming in C++ 15
2.1 Development Environments . . . . . . . . . . . . . . . . . . . 16
2.2 Hello World! . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.3 Variables and Types . . . . . . . . . . . . . . . . . . . . . . . 20
2.4 Input and Output . . . . . . . . . . . . . . . . . . . . . . . . 25
2.5 Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.6 If Statements . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.7 For Loops . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.8 While Loops . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.9 Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.10 Structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
2.11 Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
2.12 Lambdas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
2.13 The Preprocessor . . . . . . . . . . . . . . . . . . . . . . . . 43
2.14 Template . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4 Implementation Problems 65
5 Time Complexity 83
5.1 The Complexity of Insertion Sort . . . . . . . . . . . . . . . . 83
5.2 Asymptotic Notation . . . . . . . . . . . . . . . . . . . . . . 86
5.3 NP-complete problems . . . . . . . . . . . . . . . . . . . . . 92
5.4 Other Types of Complexities . . . . . . . . . . . . . . . . . . 92
5.5 The Importance of Constant Factors . . . . . . . . . . . . . . 92
5.6 Additional Exercises . . . . . . . . . . . . . . . . . . . . . . 93
5.7 Chapter Notes . . . . . . . . . . . . . . . . . . . . . . . . . . 94
6 Data Structures 97
6.1 Dynamic Arrays . . . . . . . . . . . . . . . . . . . . . . . . . 97
6.2 Stacks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
6.3 Queues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
6.4 Priority Queues . . . . . . . . . . . . . . . . . . . . . . . . . 103
6.5 Bitsets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
6.6 Hash Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
7 Recursion 117
7.1 Recursive Definitions . . . . . . . . . . . . . . . . . . . . . . 117
7.2 The Time Complexity of Recursive Functions . . . . . . . . . 120
7.3 Choice . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
7.4 Multidimensional Recursion . . . . . . . . . . . . . . . . . . 126
7.5 Recursion vs. Iteration . . . . . . . . . . . . . . . . . . . . . 127
II Basics 147
9 Brute Force 149
9.1 Optimization Problems . . . . . . . . . . . . . . . . . . . . . 149
9.2 Generate and Test . . . . . . . . . . . . . . . . . . . . . . . . 150
9.3 Backtracking . . . . . . . . . . . . . . . . . . . . . . . . . . 154
9.4 Fixing Parameters . . . . . . . . . . . . . . . . . . . . . . . . 162
9.5 Meet in the Middle . . . . . . . . . . . . . . . . . . . . . . . 165
9.6 Chapter Notes . . . . . . . . . . . . . . . . . . . . . . . . . . 170
16 Strings 255
16.1 Tries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255
16.2 String Matching . . . . . . . . . . . . . . . . . . . . . . . . . 260
16.3 Chapter Notes . . . . . . . . . . . . . . . . . . . . . . . . . . 265
17 Combinatorics 267
17.1 The Addition and Multiplication Principles . . . . . . . . . . 267
17.2 Permutations . . . . . . . . . . . . . . . . . . . . . . . . . . 270
17.3 Ordered Subsets . . . . . . . . . . . . . . . . . . . . . . . . . 276
17.4 Binomial Coefficients . . . . . . . . . . . . . . . . . . . . . . 277
17.5 The Principle of Inclusion and Exclusion . . . . . . . . . . . . 286
17.6 The Pigeon Hole Principle . . . . . . . . . . . . . . . . . . . 288
17.7 Invariants . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288
17.8 Monovariants . . . . . . . . . . . . . . . . . . . . . . . . . . 290
17.9 Chapter Notes . . . . . . . . . . . . . . . . . . . . . . . . . . 295
21 Papers 351
21.1 Paper 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 352
23 Combinatorics 359
23.1 Convolutions . . . . . . . . . . . . . . . . . . . . . . . . . . 359
24 Strings 361
24.1 Hashing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 361
24.2 Dynamic Hashing . . . . . . . . . . . . . . . . . . . . . . . . 370
Hints 381
Solutions 383
Bibliography 387
Index 390
Preface
Algorithmic problem solving is the art of formulating efficient methods that
solve problems of a mathematical nature. From the many numerical algorithms
developed by the ancient Babylonians to the founding of graph theory by Euler,
algorithmic problem solving has been a popular intellectual pursuit during
the last few thousand years. For a long time, it was a purely mathematical
endeavor with algorithms meant to be executed by hand. During the recent
decades algorithmic problem solving has evolved. What was mainly a topic of
research became a mind sport known as competitive programming. As a sport
algorithmic problem solving rose in popularity with the largest competitions
attracting tens of thousands of programmers. While its mathematical counterpart
has a rich literature, there are only a few books on algorithms with a strong
problem solving focus.
The purpose of this book is to contribute to the literature of algorithmic
problem solving in two ways. First of all, it tries to fill in some holes in
existing books. Many topics in algorithmic problem solving lack any treatment
at all in the literature – at least in English books. Much of the content is
instead documented only in blog posts and solutions to problems from various
competitions. While this book attempts to rectify this, that is not to detract from
those sources. Many of the best treatments of an algorithmic topic I have seen
are as part of a well-written solution to a problem. However, there is value in
completeness and coherence when treating such a large area. Secondly, I hope
to provide another way of learning the basics of algorithmic problem solving by
helping the reader build an intuition for problem solving. A large part of this
book describes techniques using worked-through examples of problems. These
examples attempt not only to describe the manner in which a problem is solved,
but to give an insight into how a thought process might be guided to yield the
insights necessary to arrive at a solution.
This book is different from pure programming books and most other
algorithm textbooks. Programming books are mostly either in-depth studies of
a specific programming language or describe various programming paradigms.
A single language is used in this book – C++. The text on C++ exists for the
Reading this Book
This book consists of three parts. The first part contains some preliminary
background, such as algorithm analysis and programming in C++. With an
undergraduate education in computer science most of these chapters are probably
familiar to you. It is recommended that you at least skim through the first part
since the remainder of the book assumes you know the contents of the preliminary
chapters.
The second part makes up most of the material in the book. Some of it
should be familiar if you have taken a course in algorithms and data structures.
The take on those topics is a bit different compared to an algorithms course. We
therefore recommend that you read through even the parts you feel familiar with
– in particular those on the basic problem solving paradigms, i.e. brute force,
greedy algorithms, dynamic programming and divide & conquer. The chapters
in this part are structured so that a chapter builds upon only the preliminaries
and previous chapters to the largest extent possible.
In the third part you will find the advanced topics. These are extensions of
the topics from the second part. This part is less cohesive, with few dependencies
between chapters. You can to a larger degree choose what topics you wish to
study, though most of them depend on several of the chapters from the basics.
At the end of the book you can find an appendix with some mathematical
background, together with hints and solutions for selected exercises.
When reading this book, know that every problem and technique was chosen
with care; every step on the way in a solution added to provide value. Sometimes,
this can make the book feel boring – a solution can take a long time tracing out
the intuition behind some small step, or show partial solutions that are unused
in the final result. At other times, missing a single sentence can leave you with a
crucial gap in your knowledge. I have tried to make sure that every sentence
written is important; when the book is long-winded, trust that it is useful, and
when difficult, endure to make sure you attain the deep understanding I hope
this book will be able to provide.
Similarly, the exercises are meant as attempts for you to construct some
crucial knowledge on your own. There may be fewer end-of-chapter exercises
than you might be used to in a textbook, and more exercises inlined in chapters.
This is because we expect you to solve all exercises as part of the reading of the
book. Sometimes, the text after an exercise will assume that you read and solved
the exercise. The lecture analogue would be the lecturer pausing to ask the class
a question, giving an answer only if none is provided by the class. Since this is
a book, you are blessed with unlimited time to think, in contrast to the lecture
setting, where you typically get on the order of minutes. Some exercises took
the author on the order of hours to solve at first, so do not feel disparaged if you
find them difficult. At the back of the book, you find hints and solutions for
selected exercises. If you fail to solve an exercise, first check if it has a hint, and
give it another attempt.
This book can also be used to improve your competitive programming
skills. Some parts are unique to competitive programming (in particular
Chapter 20 on contest strategy). This knowledge is extracted into competitive
tips:
Competitive Tip
A competitive tip contains some information specific to competitive programming.
These can be safely ignored if you are interested only in the problem solving aspect
and not the competitions.
The book often refers to exercises from the Kodsport.dev online judge:
Problem 0.1
Problem Name – problemid
Part I
Preliminaries
1 Algorithms and Problems
The greatest technical invention of the last century was probably the digital
general purpose computer. It was the start of the revolution which provided us
with the Internet, smartphones, tablets, and the computerization of society.
To harness the power of computers we use programming. Programming is
the art of developing a solution to a computational problem, in the form of a set
of instructions that a computer can execute. These instructions are what we call
code, and the language in which they are written a programming language. The
abstract method that such code describes is what we call an algorithm.
The aim of algorithmic problem solving is thus to, given a computational
problem, devise an algorithm that solves it. One does not necessarily need to
complete the full programming process (i.e. write code that implements the
algorithm in a programming language) to enjoy solving algorithmic problems.
However, it often provides more insight and trains you at finding simpler
algorithms to problems.
In this chapter, we begin our journey into algorithmic problem solving by
taking a closer look at these concepts and showing a solution to a common
problem.
Sorting
Your task is to sort a sequence of integers in ascending order, i.e. from the
lowest to the highest.
Input
The input is a sequence of N integers a_0, a_1, ..., a_{N−1}.
Output
Output a permutation a′ of the sequence a, such that a′_0 ≤ a′_1 ≤ ... ≤ a′_{N−1}.
Exercise 1.1. If you were given cards with 5 different integers between 1 and 1 000 000
written on them, how would you sort them in ascending order? How would your
approach change if you had 30 integers? 1000? 1 000 000?
Exercise 1.2. What are the input and output descriptions for the following
computational problems?
1) Compute the greatest common divisor (see Def. 19.5, page 319 if you
are not familiar with the concept) of two numbers.
2) Find a root (i.e. a zero) of a polynomial.
3) Multiply two numbers.
1.2 Algorithms
Algorithms are solutions to computational problems. They define methods
that use the input to a problem in order to produce the correct output. A
computational problem can have many solutions. Efficient algorithms to solve
the sorting problem form an entire research area! Let us look at one possible
sorting algorithm, called selection sort, as an example.
Selection Sort
We construct the answer, the sorted sequence, iteratively one element at a
time, starting with the smallest.
Assume that we have chosen and sorted the 𝐾 smallest elements of the
original sequence. Then, the smallest unchosen element remaining in that
sequence must be the (𝐾 + 1)’st smallest element of the original sequence.
Thus, by finding the smallest element among those that remain we know what
the (𝐾 + 1)’st element of the sorted sequence is. By appending this element
to the already sorted 𝐾 smallest elements we get the sorted 𝐾 + 1 smallest
elements of the output.
If we repeat this process 𝑁 times, the result is the 𝑁 numbers of the
original sequence, but sorted.
You can see this algorithm performed on our previous example instance (the
sequence 3, 6, 1, −1, 2, 2) in Figures 1.1a-1.1f.
So far, we have been vague about what exactly an algorithm is. Looking
at our Selection Sort example, we do not have any particular structure or rigor
in the description of our method. There is nothing inherently wrong with
describing algorithms this way. It is easy to understand and gives the writer an
opportunity to provide context as to why certain actions are performed, making
the correctness of the algorithm more obvious. The main downsides of such a
description are ambiguity and a lack of detail.
Until an algorithm is described in sufficient detail, it is possible to accidentally
abstract away operations we may not know how to perform behind a few English
words. As a somewhat contrived example, our plain text description of selection
sort includes actions such as “choosing the smallest number of a sequence”.
While such an operation may seem very simple to us humans, algorithms are
generally constructed with regards to some kind of computer. Unfortunately,
computers can not map such English expressions to their code counterparts yet.
Instructing a computer to execute an algorithm thus requires us to formulate our
(a) 3 6 1 −1 2 2 — Originally, we start out with the unsorted sequence (3, 6, 1, −1, 2, 2).
(b) −1 3 6 1 2 2 — The smallest element of the sequence is −1, so this is the first element of the sorted sequence.
(c) −1 1 3 6 2 2 — We find the next element of the output by removing the −1 and finding the smallest remaining element – in this case 1.
(d) −1 1 2 3 6 2 — Here, there is no unique smallest element. We can choose either of the two 2’s.
(e) −1 1 2 2 3 6 — The remaining 2 and then the 3 are chosen next.
(f) −1 1 2 2 3 6 — Finally, we choose the last remaining element of the input sequence – the 6. This concludes the sorting of our sequence.
algorithm in steps small enough that even a computer knows how to perform
them. In this sense, a computer is rather stupid.
The English language is also ambiguous. We are sloppy with references
to “this variable” and “that set”, relying on context to clarify meaning for us.
We use confusing terminology and frequently misunderstand each other. Real
code does not have this problem. It forces us to be specific with what we mean.
However, as all programmers know, we often manage to construct highly specific
algorithms that do the wrong thing due to our own erroneous thought processes.
We will generally describe our algorithms in a representation called pseudo
code (Section 1.4), accompanied by an online exercise to implement the code.
Sometimes, we will instead give explicit code that solves a problem. This will
be the case whenever an algorithm is very complex, or care must be taken to
make the implementation efficient. The goal is that you should get to practice
understanding pseudo code, while still ending up with correct implementations.
Exercise 1.4. Do you know any algorithms, for example from school? (Hint:
you use many algorithms to solve certain arithmetic and algebraic problems,
such as those in Exercise 1.2.)
Exercise 1.5. In Exercise 1.1, you were asked to come up with your own
approaches to the sorting problem. Attempt to write them down formally as
descriptions of algorithms.
Correctness
One subtle, albeit important, point that we glossed over is what it means for an
algorithm to actually be correct.
There are two common notions of correctness – partial correctness and total
correctness. Partial correctness requires an algorithm to, upon termination, have
produced an output that fulfills all the criteria laid out in the output description.
Total correctness additionally requires an algorithm to finish within finite time.
When we talk about correctness of our algorithms later on, we generally focus on
the partial correctness. Termination is instead proved implicitly, as we consider
a more granular measure of efficiency (called time complexity, in Chapter 5) than
just finite termination. This measure implies the termination of the algorithm,
completing the proof of total correctness.
Proving that the selection sort algorithm finishes in finite time is quite easy.
It performs one iteration of the selection step for each element in the original
sequence (which is finite). Furthermore, each such iteration can be performed in
finite time by looking at each remaining element of the selection when finding
the smallest one. The remaining sequence is a subsequence of the original one
and is therefore also finite.
That the algorithm produces the correct output is somewhat more difficult to
prove formally. The main idea behind a formal proof is contained within our
description of the algorithm itself.
While this definition seems clear enough – our algorithm should simply do
what the problem asks of it! – we will compromise on both conditions at later
points in the book. Generally, we are satisfied with an algorithm terminating in
expected finite time or answering correctly with, say, probability 0.75 for every
input. Similarly, we are sometimes happy to find an approximate solution to a
problem. What this means more concretely will become clear in due time when
we study such algorithms.
Competitive Tip
Proving your algorithm correct is sometimes quite difficult. In a competition, a correct
algorithm is correct even if you cannot prove it. If you have an idea you think is correct
it may be worth testing. This is not a strategy without problems though, since it makes
distinguishing between an incorrect algorithm and an incorrect implementation even
harder.
Exercise 1.7. Prove the correctness of your algorithm to the guessing problem
from Exercise 1.6 and your sorting algorithms from Exercise 1.5.
Exercise 1.8. Why would an algorithm that is correct with e.g. probability 0.75
still be very useful to us?
Why is it important that such an algorithm is correct with probability 0.75
on every problem instance, instead of always being correct for 75% of all cases?
1.4 Pseudo Code
1: procedure SelectionSort(sequence A)
2:     Let A′ be an empty sequence
3:     while A is not empty do
4:         minIndex ← 0
5:         for every element A_i in A do
6:             if A_i < A_minIndex then
7:                 minIndex ← i
8:         Append A_minIndex to A′
9:         Remove A_minIndex from A
10:    return A′
Pseudo code reads somewhat like our English language variant of the
algorithm, except the actions are broken down into much smaller pieces. Most
of the constructs of our pseudo code are more or less obvious. The notation
variable ← value is how we denote an assignment in pseudo code. For those
without programming experience, this means that the variable named variable
now takes the value value. Pseudo code appears when we try to explain some
part of a solution in great detail but programming language specific aspects
would draw attention away from the algorithm itself.
Competitive Tip
In team competitions where a team has only a single computer, there will often
be solved problems waiting to be coded. Writing pseudo code of the solution to one
of these problems while waiting for computer time is an efficient way to parallelize
your work. This can be practiced by writing pseudo code on paper even when you are
solving problems by yourself.
Exercise 1.9. Write pseudo code for your algorithm to the guessing problem
from Exercise 1.6.
1.5 The KS.Dev Online Judge
Sorting
Time: 1s, memory: 1MB
Your task is to sort a sequence of integers in ascending order, i.e. from the
lowest to the highest.
Input
The input is a sequence of N integers (1 ≤ N ≤ 1000) a_0, a_1, ..., a_{N−1} (|a_i| ≤ 10^9).
Output
Output a permutation a′ of the sequence a, such that a′_0 ≤ a′_1 ≤ ... ≤ a′_{N−1}.
If your program exceeds the allowed resource limits (i.e. takes too much
time or memory), crashes, or gives an invalid output, KS.Dev will tell you so
with a rejected judgment. There are many kinds of rejected judgments, such as
Wrong Answer, Time Limit Exceeded, and Run-time Error. These mean your
program gave an incorrect output, took too much time, and crashed, respectively.
Assuming your program passes all the instances, it will be given the Accepted
judgment.
Note that getting a program accepted by KS.Dev is not the same as having a
correct program – it is a necessary but not sufficient criterion for correctness.
This fact can sometimes be exploited during competitions by submitting a
knowingly incorrect solution that one believes will pass all the test cases
designed by the judges.
We strongly recommend that you get a (free) account on KS.Dev so that you
can follow along with the book’s exercises.
• Kattis (https://open.kattis.com)
• Codeforces (http://codeforces.com)
• CSAcademy (https://csacademy.com)
• AtCoder (https://atcoder.jp)
• TopCoder (https://topcoder.com)
• HackerRank (https://hackerrank.com)
Chapter Exercises
Exercise 1.11. Pick two sorting algorithms from Wikipedia’s list of sorting
algorithms: https://en.wikipedia.org/wiki/Category:Sorting_algorithms. Try
to understand them and their proof of correctness. Use them by hand to sort the
integers 5, 1, 2, 7, 5, 6, 2, 9.
Palindrome
A word is a palindrome if it reads the same forwards and backwards, for example
tacocat, madam, or abba. Determine if a word is a palindrome.
Input
The input consists of a single word, containing only lowercase letters a-z.
Output
Output yes if the word is a palindrome and no otherwise.
Primality
We call an integer 𝑛 > 1 a prime if its only positive divisors are 1 and 𝑛.
Determine if a particular integer is a prime.
Input
The input consists of a single integer 𝑛 > 1.
Output
Output yes if the number 𝑛 was a prime and no otherwise.
For each of them,
Chapter Notes
The introductions given in this chapter are very bare, mostly stripped down to
what you need to get by when solving algorithmic problems.
Many other books delve deeper into the theoretical study of algorithms
than we do, in particular regarding subjects not relevant to algorithmic problem
solving. Introduction to Algorithms [7] is a rigorous introductory text book on
algorithms with both depth and breadth.
For a gentle introduction to the technology that underlies computers, CODE
[23] is a well-written journey from the basics of bits and bytes all the way up to
assembly code and operating systems. It requires no knowledge of programming
to read.
2 Programming in C++
In this chapter we learn the basics of the C++ programming language. This
language is the most common programming language within the competitive
programming community for a few reasons (aside from C++ being a popular
language in general). Programs coded in C++ are generally somewhat faster
than those written in most other competitive programming languages. There are
also many routines in the accompanying standard code libraries that are useful
when implementing algorithms.
Of course, no language is without downsides. C++ is a bit difficult to learn
as your first programming language to say the least. Its error management is
unforgiving, often causing erratic behavior in programs instead of crashing with
an error. Programming certain things becomes quite verbose compared to many
other languages.
After bashing the difficulty of C++, you might ask if it really is the best
language in order to get started with algorithmic problem solving. While
there certainly are simpler languages, we believe that the benefits outweigh the
disadvantages in the long term even though it demands more from you as a
reader. Either way, it is definitely the language we have the most experience of
teaching problem solving with.
When you study this chapter, you will see a lot of example code. Type
this code and run it. We can not really stress this point enough. Learning
programming from scratch – in particular a complicated language such as C++ –
is not possible unless you try the concepts yourself. Additionally, we strongly
recommend that you do every exercise in this chapter, even more so than in the
other chapters.
Finally, know that our treatment of C++ is minimal. We do not explain all
the details behind the language, nor do we teach good coding style or general
software engineering principles. In fact, we frequently make use of bad coding
practices. If you want to delve deeper, you can find more resources in the chapter
notes.
Windows
Ubuntu
macOS
When using macOS, you first need to install the Clang compiler by installing
Xcode from the Mac App Store. This is also a code editor, but the compiler is
bundled with it.
After installing the compiler, you can download the installer for Visual
Studio Code from https://code.visualstudio.com/. It is available as a normal
macOS package for installation.
Note that a line ending with ↪ denotes that the text on the following line
should be part of the same line.
Then, restart your editor again.
Exercise 2.1. Throughout this chapter, you will learn many concepts within
C++. We recommend that you create a notebook (for example in a file on your
computer) where you write down how the different constructs are used when
programming to keep as a reference for later.
Start by opening Visual Studio Code and create a new file by going to File
⇒ New File. Save the file as hello.cpp by pressing Ctrl+S. Make sure to save it
somewhere you can find it.
Now, type the code from Listing 2.1 into your editor.
1 #include <iostream>
2
3 using namespace std;
4
5 int main() {
6 // Print Hello World!
7 cout << "Hello World!" << endl;
8 }
To run the program in Visual Studio Code, you press Ctrl+Alt+N. A tab
below your code named TERMINAL containing the text Hello World! should appear.
If no window appears, you probably mistyped the program.
Coincidentally, KS.Dev happens to have a problem whose output description
dictates that your program should print the text Hello World!. How convenient.
This is a great opportunity to get familiar with KS.Dev.
Problem 2.1
Hello World! – hello
When you submit your solution, KS.Dev grades it and gives you its judgment.
If you typed everything correctly, KS.Dev tells you it got Accepted. Otherwise,
you probably got Wrong Answer, meaning your program output the wrong text
(and you mistyped the code).
Now that you have managed to solve the problem, it is time to talk a bit
about the code you typed.
The first line of the code,
#include <iostream>
is used to include the iostream – input and output stream – file from the so-called
standard library of C++. The standard library is a large collection of ready-to-use
algorithms, data structures, and other routines which you can use when coding.
For example, there are sorting routines in the C++ standard library, meaning you
do not need to implement your own sorting algorithm when coding solutions.
Later on, we will see other useful examples of the standard library and
include many more files. The iostream file contains routines for reading and
writing data to your screen. Your program used code from this file when it
printed Hello World! upon execution.
in the beginning of your code. If your program still compiles, you can use this
and not include anything else. By using this line you do not have to care about
including any other files from the standard library which you wish to use.
The third line,
using namespace std;
tells the compiler that we wish to use code from the standard library. If we
did not use it, we would have to specify this every time we used code from the
standard library later in our program by prefixing what we use from the library
by std:: (for example std::cout).
The fifth line defines our main function. When we instruct the computer to
run our program the computer starts looking at this point for code to execute.
The first line of the main function is thus where the program starts to run with
further lines in the function executed sequentially. Later on we learn how to
define and use additional functions as a way of structuring our code. Note that
the code in a function – its body – must be enclosed by curly brackets. Without
them, we would not know which lines belonged to the function.
On line 6, we wrote a comment
// Print Hello World!
Comments are explanatory lines which are not executed by the computer. The
purpose of a comment is to explain what the code around it does and why. They
begin with two slashes // and continue until the end of the current line.
It is not until the seventh line that things start happening in the program. We
use the standard library utility cout to print text to the screen. This is done by
writing e.g.:
cout << "this is text you want to print. ";
cout << "you can " << "also print " << "multiple things. ";
cout << "to print a new line" << endl << "you print endl" << endl;
cout << "without any quotes" << endl;
Lines that do things in C++ are called statements. Note the semicolon at
the end of the line! Semicolons are used to specify the end of a statement, and
are mandatory.
Exercise 2.2. Must the main function be named main? What happens if you
changed main to something else and try to run your program?
Exercise 2.3. Play around with cout a bit, printing various things. For example,
you can print a pretty haiku.
1 #include <iostream>
2 using namespace std;
3
4 int main() {
5 int five = 5;
6 cout << five << endl;
7 int seven = 7;
8 cout << seven << endl;
9 five = seven + 2; // = 7 + 2 = 9
10 cout << five << endl;
11 seven = 0;
12 cout << five << endl; // five is still 9
13 cout << 5 << endl; // we print the integer 5 directly
14 }
Another major difference is that variables in C++ are not tied to a single
value for the entirety of their lifespans. Instead, we are able to modify the value
which our variables have using something called assignment. The statement on
line 5 of the listing declares an integer variable five and assigns the value 5 to it.
The int part is C++ for integer and is what we call a type. After the type, we
write the name of the variable – in this case five. Finally, we may assign a value
to the variable. Note that further uses of the variable never include the int part.
We declare the type of a variable once and only once.
Later on in Listing 2.2 we decide that 5 is a somewhat small value for
a variable called five. We can change the value of a variable by using the
assignment operator – the equality sign =. The assignment
five = seven + 2;
states that from now on the variable five should take the value given by the
expression seven + 2. Since (at least for the moment) seven has the value 7 the
expression evaluates to 7 + 2 = 9. Thus five will actually be 9, explaining the
output we get from line 10.
On line 11 we change the value of the variable seven. Note that line 12
still prints the value of five as 9. Some people find this model of assignment
confusing. We first performed the assignment five = seven + 2;, but the value
of five did not change with the value of seven. This is mostly an unfortunate
consequence of the choice of = as operator for assignment. One could think that
“once an equality, always an equality” – that the value of five should always be
the same as the value of seven + 2. This is not the case. An assignment sets
the value of the variable on the left hand side to the value of the expression on
the right hand side at a particular moment in time, nothing more.
The snippet also demonstrates how to print the value of a variable on the
screen – we cout it the same way as with text. This also clarifies why text needs
to be enquoted. Without quotes, we can not distinguish between the text string
"hi" and the variable hi.
Exercise 2.7. Since \" is used to include a double quote in a string, we can not
include backslashes in a string like any other character. For example, how would
you output the verbatim string \"? Find out how to include a literal backslash in
a string (for example by searching the web or thinking about how we included
the different quote characters).
Exercise 2.8. Write a program that assigns the minimum and maximum values
of an int to an int variable x. What happens if you increment or decrement this
value using x = x + 1; or x = x - 1; respectively and print its new value?
Competitive Tip
One of the most common sources for errors in code is trying to store an integer value
outside the range of the type. Always make sure your values fit inside the range of an
int if you use it. Otherwise, use long longs!
One of the reasons why we do not simply use long long all the time is that some
operations involving long longs can be slower than the same operations on ints under
certain conditions.
Next comes the double type. This type represents decimal numbers. Note
that the decimal sign in C++ is a dot, not a comma. There is also another similar
type called the float. The difference between these types is similar to that of
the int and long long. A double can represent “more” decimal numbers than
a float. This may sound weird considering that there is an infinite number of
decimal numbers even between 0 and 1. However, a computer can clearly not
represent every decimal number – not even those between 0 and 1. To do this,
it would need infinite memory to distinguish between these numbers. Instead,
they represent a limited set of numbers – with about 15 significant digits, and
about 308 zeroes to the left or right of those digits. Floats have fewer significant
digits, and can only represent smaller numbers.
The last of our common types is the bool (short for boolean). This type can
only contain one of two values – it is either true or false. While this may look
useless at first glance, the importance of the boolean becomes apparent later.
Exercise 2.9. In the same way the integer types had a valid range of values, a
double cannot represent arbitrarily large values. Find out what the minimum and
maximum values a double can store are.
On every line after this statement, we can use ll just as if it were a long long:
ll largeNumber = 888888888888LL;
Sometimes we use types with very long names but do not want to shorten
them using type definitions. This could be the case when we use many different
such types and typedefing them all would take an unnecessarily long time. We then
resort to using the auto “type” instead. If a variable is declared as auto and
assigned a value at the same time, its type is inferred from that of the value. This
means we could write, for example,
auto largeNumber = 888888888888LL;
instead of
long long largeNumber = 888888888888LL;
2.4 Input and Output
1 #include <iostream>
2 using namespace std;
3
4 int main() {
5 string name;
6 cout << "What's your first name?" << endl;
7 cin >> name;
8 int age;
9 cout << "How old are you?" << endl;
10 cin >> age;
11 cout << "Hi, " << name << "!" << endl;
12 cout << "You are " << age << " years old." << endl;
13 }
Exercise 2.10. What happens if you type an invalid input, such as your first
name instead of your age?
When the program reads input into a string variable it only reads the text
until the first whitespace.
We revisit more advanced input and output concepts in Section 3.10 about
the standard library. For example, we learn how to read entire lines of text and
not only single words.
Problem 2.2
Echo – echo
Note: only solve part 1, receiving 1/2 points
2.5 Operators
Earlier we saw examples of what is called operators, such as the assignment
operator =, and the arithmetic operators + - * /, which stand for addition,
subtraction, multiplication and division. They work almost like they do in
mathematics, and allow us to write code such as the one in Listing 2.5.
Exercise 2.11. Type in Listing 2.5 and test it on a few different values. Most
importantly, test:
• 𝑏=0
• Values where the expected result is outside the valid range of an int
Exercise 2.12. If division rounds towards zero, how do you compute 𝑥/𝑦
rounded to an integer away from zero?
1 #include <iostream>
2 using namespace std;
3
4 int main() {
5 int a = 0;
6 int b = 0;
7 cin >> a >> b;
8 cout << "Sum: " << (a + b) << endl;
9 cout << "Difference: " << (a - b) << endl;
10 cout << "Product: " << (a * b) << endl;
11 cout << "Quotient: " << (a / b) << endl;
12 cout << "Remainder: " << (a % b) << endl;
13 }
1 int a = 6;
2 int b = 4;
3 cout << (a / b) << endl;
4
5 double aa = 6.0;
6 double bb = 4.0;
7 cout << (aa / bb) << endl;
It turns out that addition and subtraction with 1 is a fairly common operation.
So common, in fact, that additional operators were introduced into C++ for
the purpose of saving an entire character compared to the highly verbose += 1
operator. These operators consist of two plus signs or two minus signs. For
instance, a++ increments the variable by 1.
We sometimes use the fact that these expressions also evaluate to a value.
Which value this is depends on whether we put the operator before or after the
variable name. By putting ++ before the variable, the value of the expression
will be the incremented value. If we put it afterwards we get the original value.
To get a better understanding of how this works it is best if you type the code in
Listing 2.7 in yourself and analyze the results.
We end the discussion on operators by saying something about operator
precedence, i.e. the order in which operators are evaluated in expressions.
In mathematics, there is a well-defined precedence: brackets go first, then
1 int num = 0;
2 num += 1;
3 cout << num << endl;
4 num *= 2;
5 cout << num << endl;
6 num -= 3;
7 cout << num << endl;
8 cout << num++ << endl;
9 cout << num << endl;
10 cout << ++num << endl;
11 cout << num << endl;
12 cout << num-- << endl;
13 cout << num << endl;
Problem 2.3
Two-sum – twosum
Triangle Area – triarea
Bijele – bijele
Digit Swap – digitswap
Pizza Crust – pizzacrust
R2 – r2
2.6 If Statements
In addition to assignment and arithmetic there are a large number of comparison
operators. These compare two values and evaluate to a bool value depending
on the result of the comparison (see Listing 2.8).
1 a == b // check if a equals b
2 a != b // check if a and b are different
3 a > b // check if a is greater than b
4 a < b // check if a is less than b
5 a <= b // check if a is less than or equal to b
6 a >= b // check if a is greater than or equal to b
A bool can also be negated using the ! operator. So the expression !false
(which we read as “not false”) has the value true and vice versa !true evaluates
to false. The operator works on any boolean expressions, so that if b would be
a boolean variable with the value true, then the expression !b evaluates to false.
There are two more important boolean operators. The and operator && takes
two boolean values and evaluates to true if and only if both values are true.
Similarly, the or operator || evaluates to true if and only if at least one of its
operands is true.
Exercise 2.14. Write a program that reads two integers as input, and prints the
result of the different comparison operators from Listing 2.8, e.g.
cout << (a == b) << endl;
1 int input;
2 cin >> input;
3 if (input % 2 == 0) {
4 cout << input << " is even!" << endl;
5 }
6 if (input % 2 == 1 || input % 2 == -1) {
7 cout << input << " is odd!" << endl;
8 }
1 int input;
2 cin >> input;
3 if (input % 2 == 0) {
4 cout << input << " is even!" << endl;
5 } else {
6 cout << input << " is odd!" << endl;
7 }
There is one last if-related construct – the else if. Since code is worth a
thousand words, we demonstrate how it works in Listing 2.11 by implementing
a helper for the children’s game FizzBuzz. In FizzBuzz, one goes through the
natural numbers in increasing order and say them out loud. When the number is
divisible by 3 you instead say Fizz. If it is divisible by 5 you say Buzz, and if it
is divisible by both you say FizzBuzz.
Exercise 2.15. Run the program in Listing 2.11 with the values 30, 10, 6, 4.
Explain the output you get.
Problem 2.4
Expected Earnings – casino
Grading – grading
Three-Sort – threesort
Spavanac – spavanac
Cetvrta – cetvrta
Exercise 2.16. What happens if you enter a negative value as the number of
loop repetitions?
Exercise 2.17. Design a loop that instead counts backwards, from repetitions − 1
to 0.
Problem 2.5
N-Sum – nsum
Building Pyramids – pyramids
Echo – echo
Note: solve both parts now, receiving 2/2 points
Within a loop, two useful keywords can be used to modify the loop – continue
and break. Using continue; inside a loop exits the current iteration and starts the
next one. break; on the other hand, exits the loop altogether. For an example,
consider Listing 2.13.
Problem 2.6
Cinema Crowds 2 – cinema2
Lamps – lamps
1 int input;
2 cin >> input;
3 if (input % 15 == 0) {
4 cout << "FizzBuzz" << endl;
5 } else if (input % 5 == 0) {
6 cout << "Buzz" << endl;
7 } else if (input % 3 == 0) {
8 cout << "Fizz" << endl;
9 } else {
10 cout << input << endl;
11 }
1 int repetitions = 0;
2 cin >> repetitions;
3 for (int i = 0; i < repetitions; i++) {
4 cout << "This is repetition " << i << endl;
5 }
2.9 Functions
In mathematics a function is something that takes one or more arguments and
computes some value based on them. Common functions include the squaring
function square(𝑥) = 𝑥², the addition function add(𝑥, 𝑦) = 𝑥 + 𝑦 or the minimum
function min(𝑎, 𝑏), which evaluates to the smallest of its arguments.
Functions exist in programming as well but work a bit differently. Indeed,
we have already seen a function – the main() function. We have implemented
the example functions in Listing 2.15.
In the same way that a variable declaration starts by proclaiming what
data type the variable contains a function declaration states what data type the
function evaluates to. Afterwards, we write the name of the function followed
by its arguments (which is a comma-separated list of variable declarations).
Finally, we give it a body of code wrapped in curly brackets.
All of these functions contain a statement with the return keyword, unlike our
main function. A return statement says “stop executing this function, and return
the following value!”. Thus, when we call the squaring function by square(x),
1 int num = 9;
2 while (num != 1) {
3 if (num % 2 == 0) {
4 num /= 2;
5 } else {
6 num = 3 * num + 1;
7 }
8 cout << num << endl;
9 }
the function will compute the value x * x and make sure that square(x) evaluates
to just that.
Why have we left a return statement out of the main function? In main(), the
compiler inserts an implicit return 0; statement at the end of the function.
Exercise 2.19. What will the following function calls evaluate to?
min(square(10), add(square(9), 23));
Exercise 2.20. We declared all of the new arithmetic functions above our main
function in the example. Why did we do this? What happens if you move one
below the main function instead? (Hint: what happens if you try to use a variable
before declaring it?)
1 #include <iostream>
2
3 using namespace std;
4
5 int square(int x) {
6 return x * x;
7 }
8
9 int min(int x, int y) {
10 if (x < y) {
11 return x;
12 } else {
13 return y;
14 }
15 }
16
17 int add(int x, int y) {
18 return x + y;
19 }
20
21 int main() {
22 int x, y;
23 cin >> x >> y;
24 cout << x << "^2 = " << square(x) << endl;
25 cout << x << " + " << y << " = " << add(x, y) << endl;
26 cout << "min(" << x << ", " << y << ") = " << min(x, y) << endl;
27 }
Problem 2.8
Arithmetic Functions – arithmeticfunctions
without returning.
The first one is by using global variables. It turns out that variables may
be declared outside of a function. It is then available to every function in your
program. Changes to a global variable by one function are also seen by other
functions (try out Listing 2.17 to see them in action).
1 int currentMoney = 0;
2
3 void deposit(int newMoney) {
4 currentMoney += newMoney;
5 }
6 void withdraw(int withdrawal) {
7 currentMoney -= withdrawal;
8 }
9
10 int main() {
11 cout << "Currently, you have " << currentMoney << " money" << endl;
12 deposit(1000);
13 withdraw(2000);
14 cout << "Oh-oh! Your balance is " << currentMoney << " :(" << endl;
15 }
Problem 2.9
Counting Days – countingdays
the variable name, for example int &x. If we perform assignments to the variable
x within the function we change the variable used for this argument in the calling
function instead. Listing 2.18 contains an example of using references.
Problem 2.10
Logic Functions – logicfunctions
Exercise 2.22. Why is the function call change(4) not valid C++? (Hint: what
exactly are we changing when we assign to the reference in func?)
2.10 Structures
Algorithms operate on data, usually lots of it. Programming language designers
therefore came up with many ways of organizing the data our programs use.
One of these constructs is the structure (also called a record, and in C++ almost
equivalent to something called a class). Structures are a special kind of data
type that can contain member variables – variables inside them – and member
functions – functions which can operate on member variables.
The basic syntax used to define a structure looks like this:
struct Point {
double x;
double y;
};
This particular structure contains two member variables, x and y, representing
the coordinates of a point in 2D Euclidean space.
Once we have defined a structure we can create instances of it. Every
instance has its own copy of the member variables of the structure. Structs
cout << "The origin is (" << origin.x << ", "
<< origin.y << ")." << endl;
As you can see structures allow us to group certain kinds of data together in a
logical fashion. Later on, this will simplify the coding of certain algorithms and
data structures immensely.
There is an alternate way of constructing instances called constructors. A
constructor looks like a function inside our structure and allows us to pass
arguments when we create a new instance of a struct. The constructor receives
these arguments to help set up the instance.
Let us add a constructor to our point structure, to more easily create instances:
struct Point {
double x;
double y;

Point(double newX, double newY) {
x = newX;
y = newY;
}
};
The newly added constructor lets us pass two arguments when constructing
the instance to set the coordinates correctly. With it, we avoid the two extra
statements to set the member variables.
Point p(4, 2.1);
cout << "The point is (" << p.x << ", " << p.y << ")." << endl;
p = Point(1, 2);
We can also define functions inside the structure. These functions work
just like any other functions except they can also access the member variables
of the instance that the member function is called on. For example, we might
want a convenient way to mirror a certain point in the x-axis. This could be
accomplished by adding a member function:
struct Point {
double x;
double y;
Point mirror() {
return Point(x, -y);
}
};
In this example we see yet another use of a void function. Such member
functions can still modify the member variables of the struct they belong to.
Exercise 2.23. Add a translate member function to the point structure. It should
take two double values x and y as arguments, returning a new point which is the
instance point translated by (𝑥, 𝑦).
The const keyword must be added right after the argument list, before the opening
brace of the function body. Such a function is unable to modify any of the member
variables. It can not call other member functions that are not declared as const
either. Generally, you will never have to worry about declaring functions to be const.
where a and b are Points. The syntax for the binary operators looks like this:
Point operator+(Point other) {
double newX = x + other.x;
double newY = y + other.y;
return Point(newX, newY);
}
Try this function out by defining two points and computing their sum.
Exercise 2.25. One can use operator overloading for binary operators where the
types are different as well. For example,
Point operator*(double m) { ... }
would define what happens if you multiply a point by a double. Add such a
function to your point, that returns a point with its coordinates scaled by the
given double.
2.11 Arrays
In the Sorting Problem from Chapter 1 we often spoke of the data type “sequence
of integers”. Until now, none of the data types we have seen in C++ represents
this kind of data. We present the array. It is a special type of variable, which
can contain a large number of variables of the same type. For example, it could
be used to represent the recurring data type “sequence of integers” from the
Sorting Problem in Chapter 1. When declaring an array, we specify the type of
variable it should contain, its name, and its size using the syntax:
type name[size];
For example, an integer array of size 10 named seq would be declared with
int seq[10];
This creates 10 integer “variables” which we can refer to using the syntax
seq[index], starting from zero (they are zero-indexed). Thus we can use seq[0],
seq[1], etc., all the way up to seq[9]. The values are called the elements of the
array.
(Figure: an array named seq of size 10, consisting of the elements seq[0] through seq[9].)
Be aware that using an index outside the valid range for a particular array
(i.e. below 0 or above the size − 1) can cause erratic behavior in the program
without crashing it.
If you declare a global array all elements get a default value. For numeric
types this is 0, for booleans this is false, for strings this is the empty string and
so on. If, on the other hand, the array is declared in the body of a function that
guarantee does not apply. Instead of being zero-initialized, the elements can
have random values. For this reason, arrays are mostly declared globally in
competitive programming.
You can see an example of arrays in action in Listing 2.19, which computes
a few of the possible scores of a roll in the dice game Yatzee.
Later on (Section 3.1) we transition from using arrays to a much more
powerful structure from the standard library which serves the same purpose –
the vector.
Problem 2.11
Reversal – reverse
N-Back – nback
Modulo – modulo
I’ve Been Everywhere, Man – everywhere
2.12 Lambdas
We will now briefly discuss a somewhat complex language construct – lambdas.
They are very seldom necessary for solving problems, but we occasionally use
them in code throughout the book.
A lambda expression is essentially an unnamed function that can be defined
within another function and assigned to a variable of the function type:
function<int(int, int)> op = [](int a, int b) -> int {
return a * b + a + b;
};
cout << op(5, op(1, 2)) << endl;
Here, we have defined a function that takes two values a and b, and returns
the value a * b + a + b. We have assigned the function to the variable op, and
can invoke it as if it was a regular function with that name.
Generally, definitions look simpler than this – if the function is “simple
enough”, we can leave out the -> int part, which we use to specify the return type
of the lambda. We also tend to use the auto type instead of the more convoluted
function<...> type, as long as the lambda does not call itself through the name
of the variable to which it is assigned.
Thus, the declaration may also look like this:
auto op = [](int a, int b) {
return a * b + a + b;
};
What is the point of doing this rather than simply using regular functions?
Lambdas can also be given access to variables of the enclosing function:
int x = 5;
auto addToX = [&](int y) {
x += y;
};
Here, note the added ampersand in [&]. This means that all variables defined
before the lambda in the function should be accessible within the lambda as
references.
• how to only make a single variable from the enclosing function available
in a lambda.
2.13 The Preprocessor
1 #include <iostream>
2
3 using namespace std;
4
5 int rolls[7];
6
7 int main() {
8 cout << "Enter 5 dice rolls between 1 and 6: " << endl;
9 for (int i = 0; i < 5; i++) {
10 int roll;
11 cin >> roll;
12 rolls[roll]++;
13 }
14 cout << "Yatzee scores: " << endl;
15 for (int i = 1; i <= 6; i++) {
16 cout << i << "'s: " << (i * rolls[i]) << endl;
17 }
18 }
which replaces the token TOREPLACE in our program with REPLACEWITH. The true
power of the define comes when using define directives with parameters. These
look similar to functions and allow us to replace certain expressions with
another one, additionally inserting certain values into it. We call these macros.
For example the macro
#define rep(i,a,b) for (int i = a; i < b; i++)
rep(i,0,5) {
cout << i << endl;
}
is expanded to
for (int i = 0; i < 5; i++) {
cout << i << endl;
}
You can probably get by without ever using macros in your code. The reason
we discuss them is that we are going to use them in code in the book, so it
is a good idea to at least be familiar with their meaning. They are also used in
competitive programming in general.
2.14 Template
In competitive programming, one often uses a template, with some shorthand
typedef’s and preprocessor directives. In Listing 2.20, we give an example of
the template used in some of the C++ code in this book.
1 #include <bits/stdc++.h>
2 using namespace std;
3
4 #define rep(i, a, b) for(int i = a; i < (b); ++i)
5 #define trav(a, x) for(auto& a : x)
6 #define all(x) x.begin(), x.end()
7 #define sz(x) (int)(x).size()
8 typedef long long ll;
9 typedef pair<int, int> pii;
10 typedef vector<int> vi;
11
12 int main() {
13 }
The rep(i,a,b) macro is the one we saw in the previous section – it can be
used to write a simple counting loop in a compact way.
The trav(a, x) macro is used to iterate through all members of a data structure
from the standard library such as the vector – the first topic of Chapter 3.
The all(x) macro is used together with certain operations from the standard
library – we’ll see concrete examples in the next chapter.
The sz(x) macro is used to get the size of a data structure from the standard
library.
Chapter Exercises
Problem 2.12
Cubes – kuber
Islands – oar
Grading – betygsattning
Faroffistanian Personal Numbers – checksum
Mini Golf – minigolf
Booking – booking
Tomatoes – tomater
Will Roger’s Phenomena – willrogers
Yatzee – yatzee
Memory – memory
Chapter Notes
C++ was invented by Danish computer scientist Bjarne Stroustrup. Bjarne has
also published a book on the language, The C++ Programming Language [27],
that contains a more in-depth treatment of the language. It is rather accessible to
C++ beginners but is better read by someone who has some prior programming
experience (in any programming language).
C++ is standardized by the International Organization for Standardization
(ISO). These standards are the authoritative source on what C++ is. The final
drafts of the standards can be downloaded at the homepage of the Standard C++
Foundation1.
There are many online references of the language and its standard library.
The two we use most are:
• http://en.cppreference.com/w/
• http://www.cplusplus.com/reference/
1 https://isocpp.org/
3 The C++ Standard Library
In this chapter we study parts of the C++ standard library – that is, data structures,
algorithms and utilities that are already provided for us without having to code
them ourselves.
We start by examining a number of basic data structures. Data structures
help us organize the data we work with in the hope of making processing
both easier and more efficient. Different data structures serve widely different
purposes and solve different problems. Whether a data structure fits our needs
depends on what operations we wish to perform on the data. We consider
neither the efficiency of the various operations in this chapter nor how they are
implemented. These concerns are postponed until Chapter 6, when we have the
tools to analyze the efficiency of data structures.
The standard library also contains many useful algorithms such as sorting and
various mathematical functions. These are discussed after the data structures.
In the end, we take a deeper look at string handling in C++ and some more
input/output routines.
3.1 vector
One of the last things discussed in the C++ chapter was the fixed-size array.
As you might remember, the array is a special kind of data type that allows us to
store multiple values of the same data type inside what appears to us as a single
variable. Arrays are a bit awkward to work with in practice. When passing them
as parameters we must also pass along the size of the array. We are also unable
to change the size of arrays once declared, nor can we easily remove or insert
elements, or copy arrays.
The dynamic array is a special type of array that can change size (hence
the name dynamic). It also supports operations such as removing and inserting
elements at any position in the list.
The C++ standard library includes a dynamic array called a vector, which is
an alternative name for dynamic arrays in some languages. To use it you must
include the vector file by adding the line
#include <vector>
This angled bracket syntax appears again later when using other C++
structures from the standard library.
Once a vector is created elements can be appended to it using the push_back
member function. The following four statements would add the words Simon is
a fish as separate elements to the vector:
words.push_back("Simon");
words.push_back("is");
words.push_back("a");
words.push_back("fish");
To refer to a specific element in a vector you can use the same operator [] as
for arrays. Thus, words[i] refers to the 𝑖’th value in the vector (starting at 0).
cout << words[0] << " " << words[1] << " "; // Prints Simon is
cout << words[2] << " " << words[3] << " "; // Prints a fish
Like arrays, accessing indices outside the valid range of the vector can cause
weird behaviour in your program.
We can get the current size of a vector using the size() member function:
cout << "The vector contains " << words.size() << " words" << endl;
There is also an empty() function that can be used to check if the vector contains
no elements. These two functions are part of basically every standard library
data structure.
Problem 3.1
Vector Functions – vectorfunctions
You can also create dynamic arrays that already contain a number of elements.
This is done by passing an integer argument when first declaring the vector.
They are filled with the same default value as (global) arrays are when created:
vector<int> vec(5); // creates a vector containing 5 zeroes
The value that such an array is filled with can also be set explicitly by using
a two-argument constructor; the second argument is the value to fill the array
with:
vector<int> vec(5, -1); // creates a vector containing 5 -1's
• without a constructor,
We can create vectors that also contain other vectors, to make multidimensional
vectors. For example, we could make a 2-dimensional vector (i.e. a grid
of values) in the following way:
vector<vector<int>> grid(7, vector<int>(5));
Since we filled the vector with 7 vectors of length 5, we get a 7 × 5 grid of integers.
The values in the grid are then referred to using grid[a][b] where 0 ≤ 𝑎 < 7
and 0 ≤ 𝑏 < 5.
Similarly, one can create 𝑁 -dimensional vectors by creating vectors of
vectors of ... and so on.
Problem 3.2
Cinema Seating – cinemaseating
• assign(n, val): replace the contents of the vector with 𝑛 copies of val.
3.2 Iterators
A concept central to the standard library is the iterator. An iterator is an object
which “points to” an element in some kind of data structure (such as a vector).
Essentially, they are a generalization of the role played by an integer representing
an index of a vector. The reason we could not simply eliminate their use and use
integer indexes directly wherever iterators appear is that some data structures
do not support accessing values directly by their index. Not all data structures
support iterators either.
The type of an iterator for a data structure of type t is t::iterator. An
iterator of a vector<string> thus has the type vector<string>::iterator. Most
of the time we instead use the auto type, since this type is very long to write.
To get an iterator to the first element of a vector, we use begin():
auto first = words.begin();
We can get the value that an iterator points at using the * operator:
cout << "The first word is " << *first << endl;
In this loop we start by creating an iterator which points to the first element of
the vector. Our update condition will repeatedly move the iterator to the next
element in the vector. The loop condition ensures that the loop breaks when the
iterator first points to the element past the end of the vector.
In modern C++ language versions, there is a shorter construct that is
equivalent to this loop:
In addition to the begin() and end() pair of iterators, there is also rbegin()
and rend(). They work similarly, except that they are reverse iterators - they
iterate in the other direction. Thus, rbegin() actually points to the last element
of the vector, and rend() to an imaginary element before the first element of the
vector. If we move a reverse iterator in a positive direction, we will actually
move it in the opposite direction (i.e. adding 1 to a reverse iterator makes it
point to the element before it in the vector).
Exercise 3.2. Use the rbegin()/rend() iterators to code a loop that iterates
through a vector in the reverse order.
Certain operations on a vector require the use of vector iterators. For example,
the insert and erase member functions, used to insert and erase elements at
arbitrary positions, take iterators to describe positions. When removing the
second element, we write
words.erase(words.begin() + 1);
Problem 3.3
Cut in Line – cutinline
3.3 queue
#include<queue>
queue<int> q;
Exercise 3.4. There is a similar data structure called a double-ended queue. The
standard library version is named after the abbreviation deque instead. Use one of the
C++ references from the C++ chapter notes (Section 2.14) to find out what this
structure does and what its member functions are called.
3.4 stack
Note the change in terminology. Instead of the first and last elements being
called front and back as for the queue, they are instead called top and bottom in a
stack.
3.5 priority_queue
The queue and stack structures are arguably unnecessary, since they can be
emulated using a vector (see Sections 6.2, 6.3). This is not the case for the next
structure, the priority_queue.
The structure is similar to a queue or a stack, but instead of insertions and
extractions happening at one of the endpoints of the structure, the greatest
element is always returned during the extraction.
The structure is located in the same file as the queue structure, so add
#include<queue>
to use it.
To initialize a priority queue, use the same syntax as for the other structures:
priority_queue<int> pq;
This time there is one more way to create the structure that is important to
remember. It is not uncommon to prefer the sorting to be done according to
some other order than descending. For this reason there is another way of
creating a priority queue. One can specify a comparison function that takes
two arguments of the type stored in the queue and returns true if the first one
should be considered less than the second. This function can be given as an
argument to the type in the following way:
bool cmp(int a, int b) {
    return a > b;
}
priority_queue<int, vector<int>, decltype(&cmp)> pq(cmp);
// or equivalently
priority_queue<int, vector<int>, greater<int>> pq;
Note that a priority queue by default returns the greatest element. If we want to
make it return the smallest element, the comparison function needs to instead
say that the smaller of the two elements actually is the greater, somewhat
counter-intuitively.
Interaction with the priority queue is similar to that of the other structures:
• push(x): add the element x to the priority queue
Problem 3.4
I Can Guess the Data Structure! – guessthedatastructure
Akcija – akcija
Cookie Selection – cookieselection
Pivot – pivot
3.6 set and map
set<int> s;
s.insert(4);
s.insert(7);
s.insert(1);

// find returns an iterator to the element if it exists
auto it = s.find(4);
// ++ moves the iterator to the next element in order
++it;
cout << *it << endl;

// if nonexistent, find returns end()
if (s.find(5) == s.end()) {
    cout << "5 is not in the set" << endl;
}

// erase removes the specified element
s.erase(7);

if (s.find(7) == s.end()) {
    cout << "7 is not in the set" << endl;
}

cout << "The smallest element of s is " << *s.begin() << endl;
For a map, two types need to be provided – that of the key and that of the value. To
declare a map with string keys and int values you write
map<string, int> m;
Accessing the value associated with a key x is done using the [] operator. For
example, m["Johan"] would access the value associated with the "Johan" key.
Problem 3.5
Secure Doors – securedoors
Babelfish – babelfish
3.7 Math
Many algorithmic problems require mathematical functions. In particular,
there is heavy use of square roots and trigonometric functions in geometry
problems. Most of these functions can be found in the
#include <cmath>
library.
We list some of the most common such functions here:
• abs(x): computes |𝑥| (𝑥 if 𝑥 ≥ 0, otherwise −𝑥)
• sqrt(x): computes √𝑥
• exp(x): computes 𝑒^𝑥
Problem 3.6
Vacuumba – vacuumba
Half a Cookie – halfacookie
Ladder – ladder
A1 Paper – a1paper
3.8 Algorithms
A majority of the algorithms we regularly use from the standard library operate
on sequences. To use algorithms, you need to include
#include <algorithm>
Sorting
Sorting a sequence is very easy in C++. The function for doing so is named
sort. It takes two iterators marking the beginning and end of the interval to be
sorted and sorts it in-place in ascending order. For example, to sort the first 10
elements of a vector named v you would use
sort(v.begin(), v.begin() + 10);
Note that the right endpoint of the interval is exclusive – it is not included in
the interval itself. This means that you can provide v.end() as the end of the
interval if you want to sort the vector until the end.
As with priority_queues or sets, the sorting algorithm can take a custom
comparator if you want to sort according to some other order than that defined
by the < operator. For example,
sort(v.begin(), v.end(), greater<int>());
would sort the vector v in descending order. You can provide other sorting
functions as well. For example, you can sort numbers by their absolute value by
passing in the following comparator:
bool cmp(int a, int b) {
return abs(a) < abs(b);
}
sort(v.begin(), v.end(), cmp);
What happens if two values have the same absolute value when sorted with
the above comparator? With sort, this behaviour is not specified: they can
be ordered in any way. Occasionally you want values that your comparison
function considers equal to keep the same relative order as they were given
in the input. This is called a stable sort, and is implemented in C++ with the
function stable_sort.
To check if a vector is sorted, the is_sorted function can be used. It takes
the same arguments as the sort function.
Problem 3.7
Shopaholic – shopaholic
Busy Schedule – busyschedule
Sort of Sorting – sortofsorting
Searching
The most basic search operation is the find function. It takes two iterators
representing an interval and a value. If one of the elements in the interval equals
the value, an iterator to the element is returned. In case of multiple matches the
first one is returned. Otherwise, the iterator provided as the end of the interval
is returned. The common usage is
find(v.begin(), v.end(), 5);
Permutations
In some problems, the solution involves iterating through all permutations
(Section ??) of a vector. As one of few languages, C++ has a built-in function
for this purpose: next_permutation. The function takes two iterators as
arguments and rearranges the interval they specify to be the next permutation
in lexicographical order. If there is no such permutation, the interval instead
becomes sorted and the function returns false. This suggests the following
loop for visiting every permutation of a vector:

sort(v.begin(), v.end());
do {
    // do something with v
} while (next_permutation(v.begin(), v.end()));

This do-while syntax is similar to the while loop, except the condition is checked
after each iteration instead of before. It is equivalent to
sort(v.begin(), v.end());
while (true) {
// do something with v
if (!next_permutation(v.begin(), v.end())) {
break;
}
}
Problem 3.8
Veci – veci
3.9 Strings
We have already used the string type many times before. Until now one of the
essential features of a string has been omitted – a string is to a large extent like
a vector of chars. This is especially true in that you can access the individual
characters of a string using the [] operator. For a string
string thecowsays = "boo";
the expression thecowsays[0] is the character ’b’. Furthermore, you can push_back
new characters to the end of a string.
thecowsays.push_back('p');
Conversions
In some languages, the barrier between strings and e.g. integers is more fuzzy
than in C++. In Java, for example, the code "4" + 2 would append the character
’2’ to the string "4", yielding the string "42". This is not the case in C++ (what
errors do you get if you try to do this?).
Instead, there are other ways to convert between strings and other types. The
easiest way is through using the stringstream class. A stringstream essentially
works as a combined cin and cout. An empty stream is declared by
stringstream ss;
Values can be written to the stream using the << operator and read from it using
the >> operator. This can be exploited to convert strings to and from e.g. numeric
types like this:
stringstream numToString;
numToString << 5;
string val;
numToString >> val; // val is now the string "5"
stringstream stringToNum;
stringToNum << "5";
int val;
stringToNum >> val; // val is now the integer 5
Just as with cin, you can use a stringstream to determine what type the next
word is. If you try to read from a stringstream into an int but the next word is
not an integer, the expression will evaluate to false:
stringstream ss;
ss << "notaninteger";
int val;
if (ss >> val) {
cout << "read an integer!" << endl;
} else {
cout << "next word was not an integer" << endl;
}
Problem 3.10
Filip – filip
Stacking Cups – cups
3.10 Input/Output
Input and output is primarily handled by the cin and cout objects, as previously
witnessed. While they are very easy to use, adjustments are sometimes necessary.
When solving algorithmic problems, you almost always know beforehand how many tokens of input you need to read. For
example, lists of integers are often either prefixed by the size of the list or
terminated by some special sentinel value. For those few times when we need
to read input until the end we use the fact that cin >> x is an expression that
evaluates to false if the input reading failed. This is also the case if you try to
read an int but the next word is not actually an integer. This kind of input loop
thus looks something like the following:
int num;
while (cin >> num) {
// do something with num
}
Problem 3.11
A Different Problem – different
Statistics – statistics
Be warned that if you use cin to read a single word that is the last on its line,
the final newline is not consumed. That means that for an input such as
word
blah blah
the code
string word;
cin >> word;
string line;
getline(cin, line);
would produce an empty line! After cin >> word the newline of the line word still
remains, meaning that getline only reads the (zero) remaining characters until
the newline. To avoid this problem, you need to use cin.ignore(); to ignore the
extra newline before your getline.
Once a line has been read we often need to process all the words on the line
one by one. For this, we can use the stringstream:
stringstream line(str);
string word;
while (line >> word) {
// do something with word
}
The stringstream takes an argument that is the string you want to process. After
this, it works just like cin does, except reading input from the string instead of
the terminal. To use stringstream, add the include
#include <sstream>
Problem 3.12
Bacon Eggs and Spam – baconeggsandspam
Compound Words – compoundwords
If the argument given to the setprecision function is 𝑥, the precision is set to 10^−𝑥. This means that
the statement cout << setprecision(10); would set the precision of cout to 10^−10. This precision is
normally the relative precision of the output (i.e. the total number of digits to
print). If you want the precision to be absolute (i.e. specify the number of digits
after the decimal point) you write
cout << fixed;
Problem 3.13
A Real Challenge – areal
Chapter Exercises
Problem 3.14
Apaxiaaaaaaaaaaaans! – apaxiaaans
Different Distances – differentdistances
Odd Man Out – oddmanout
Timebomb – timebomb
Missing Gnomes – missinggnomes
Chapter Notes
In this chapter, only the parts from the standard library we deemed most important
to problem solving were extracted. The standard library is much larger than this,
of course. While you will almost always get by using only what we discussed,
additional knowledge of the library can make you a faster, more effective coder.
For a good overview of the library, cppreference.com 1 contains lists of the
library contents categorized by topic.
1 http://en.cppreference.com/w/cpp
4 Implementation Problems
The “simplest” kind of problems we solve are those where the statement of a
problem is so detailed that the difficult part is not figuring out the solution, but
implementing it in code. This type of problem is mostly given in the form of
performing some calculation or simulating some process based on a list of rules
stated in the problem.
The Recipe
Swedish Olympiad in Informatics 2011, School Qualifiers (CC BY-SA 3.0)
You have decided to cook some food. The dish you are going to make requires
𝑁 different ingredients. For every ingredient, you know the amount you have at
home, how much you need for the dish, and how much it costs to buy (per unit).
If you do not have a sufficient amount of some ingredient you need to buy
the remainder from the store. Your task is to compute the cost of buying the
remaining ingredients.
Input
The first line of input is an integer 𝑁 ≤ 10, the number of ingredients in the
dish.
The next 𝑁 lines contain the information about the ingredients, one per line.
An ingredient is given by three space-separated integers 0 ≤ ℎ, 𝑛, 𝑐 ≤ 200 – the
amount you have, the amount you need, and the cost per unit for this ingredient.
Output
Output a single integer – the cost for purchasing the remaining ingredients
needed to make the dish.
This problem is not particularly hard. For every ingredient we need to
calculate the amount which we need to purchase. The only gotcha in the problem
is the mistake of computing this as 𝑛 − ℎ. The correct formula is max(0, 𝑛 − ℎ),
required in case of the luxury problem of having more than we need. We then
multiply this number by the ingredient cost and sum the costs up for all the
ingredients. A solution would look something like the following.
Game Rank
Nordic Collegiate Programming Contest 2016 – Jimmy Mårdell (CC BY-SA 3.0)
The gaming company Sandstorm is developing an online two player game. You
have been asked to implement the ranking system. All players have a rank
determining their playing strength which gets updated after every game played.
There are 25 regular ranks, and an extra rank, “Legend”, above that. The ranks
are numbered in decreasing order, 25 being the lowest rank, 1 the second highest
rank, and Legend the highest rank.
Each rank has a certain number of “stars” that one needs to gain before
advancing to the next rank. If a player wins a game, she gains a star. If before
the game the player was on rank 6-25, and this was the third or more consecutive
win, she gains an additional bonus star for that win. When she has all the stars
for her rank (see list below) and gains another star, she will instead gain one
rank and have one star on the new rank.
For instance, if before a winning game the player had all the stars on her
current rank, she will after the game have gained one rank and have 1 or 2 stars
(depending on whether she got a bonus star) on the new rank. If on the other
hand she had all stars except one on a rank, and won a game that also gave her a
bonus star, she would gain one rank and have 1 star on the new rank.
If a player on rank 1-20 loses a game, she loses a star. If a player has zero
stars on a rank and loses a star, she will lose a rank and have all stars minus one
on the rank below. However, one can never drop below rank 20 (losing a game
at rank 20 with no stars will have no effect).
If a player reaches the Legend rank, she will stay legend no matter how many
losses she incurs afterwards.
The number of stars on each rank is as follows:
• Ranks 25–21: 2 stars
• Ranks 20–16: 3 stars
• Ranks 15–11: 4 stars
• Ranks 10–1: 5 stars
A player starts at rank 25 with no stars. Given the match history of a player,
what is her rank at the end of the sequence of matches?
Input
The input consists of a single line describing the sequence of matches. Each
character corresponds to one game; ‘W’ represents a win and ‘L’ a loss. The
length of the line is between 1 and 10 000 characters (inclusive).
Output
Output a single line containing a rank after having played the given sequence of
games; either an integer between 1 and 25 or “Legend”.
A very long problem statement! The first hurdle is finding the energy to
read it from start to finish without skipping any details. Not much creativity
is needed here – indeed, the algorithm to implement is given in the statement.
Despite this, it is not as easy as one would think. Although it was the second
most solved problem at the contest where it was used, it was also the one
with the worst success ratio. On average, a team needed 3.59 attempts before
getting a correct solution, compared to the runner-up problem at 2.92 attempts.
None of the top 6 teams in the contest got the problem accepted on their first
attempt. Failed attempts cost a lot. Not only in absolute time, but many forms
of competition include additional penalties for submitting incorrect solutions.
Implementation problems get much easier when you know your programming
language well and can use it to write good, structured code. Split your code into
functions, use structures, and give your variables good names, and implementation
problems become easier to code. A solution to the Game Rank problem which
attempts to use this approach is given here:
attempts to use this approach is given here:
#include <bits/stdc++.h>

using namespace std;

int curRank = 25, curStars = 0, conseqWins = 0;

int starsOfRank() {
    if (curRank >= 21) return 2;
    if (curRank >= 16) return 3;
    if (curRank >= 11) return 4;
    if (curRank >= 1) return 5;
    assert(false);
}

void addStar() {
    if (curStars == starsOfRank()) {
        --curRank;
        curStars = 0;
    }
    ++curStars;
}

void addWin() {
    int curStarsWon = 1;
    ++conseqWins;
    if (conseqWins >= 3 && curRank >= 6) curStarsWon++;

    for (int i = 0; i < curStarsWon; i++) {
        addStar();
    }
}

void loseStar() {
    if (curStars == 0) {
        if (curRank == 20) return;
        ++curRank;
        curStars = starsOfRank();
    }
    --curStars;
}

void addLoss() {
    conseqWins = 0;
    if (curRank <= 20) loseStar();
}

int main() {
    string seq;
    cin >> seq;
    for (char res : seq) {
        if (res == 'W') addWin();
        else addLoss();
        if (curRank == 0) break;
        assert(1 <= curRank && curRank <= 25);
        assert(0 <= curStars && curStars <= starsOfRank());
    }
    if (curRank == 0) cout << "Legend" << endl;
    else cout << curRank << endl;
}
Note the use of the assert() function. The function takes a single boolean
parameter and crashes the program with an assertion failure if the parameter
evaluated to false. This is helpful when solving problems since it allows us to
verify that the assumptions we make regarding the internal state of the program
indeed hold. In fact, when the above solution was written, the assertions in it
actually managed to catch some bugs before the solution was submitted!
Problem 4.1
Game Rank – gamerank
Mate in One
Introduction to Algorithms at Danderyds Gymnasium
"White to move, mate in one."
When you are looking back in old editions of the New in Chess magazine,
you find loads of chess puzzles. Unfortunately, you realize that it was way too
long since you played chess. Even trivial puzzles such as finding a mate in one
now far exceed your ability.
But, perseverance is the key to success. You realize that you can instead use
your new-found algorithmic skills to solve the problem by coding a program to
find the winning move.
Write a program to output the move white should play to mate black.
Input
The board is given as an 8 × 8 grid of letters. The character . represents an empty
square, the characters pbnrqk represent a white pawn, bishop, knight, rook, queen
and king, and the characters PBNRQK represent a black pawn, bishop, knight,
rook, queen and king.
Output
Output a move of the form a1b2, where a1 is the square to move a piece from
(written as the column, a-h, followed by the row, 1-8) and b2 is the square to
move the piece to.
Our first solution attempt clocks in at about 300 lines.
1 #include <bits/stdc++.h>
2 using namespace std;
3
4 #define rep(i,a,b) for (int i = (a); i < (b); ++i)
5 #define trav(it, v) for (auto& it : v)
6 #define all(v) (v).begin(), (v).end()
7 typedef pair<int, int> ii;
8 typedef vector<ii> vii;
9 template <class T> int size(T &x) { return x.size(); }
10
11 char board[8][8];
12
13 bool iz_empty(int x, int y) {
14 return board[x][y] == '.';
15 }
16
17 bool is_white(int x, int y) {
18 return board[x][y] >= 'A' && board[x][y] <= 'Z';
19 }
1If you are not aware of this special pawn rule, do not worry – knowledge of it is irrelevant with
regard to the problem.
20
21 bool is_valid(int x, int y) {
22 return x >= 0 && x < 8 && y >= 0 && y < 8;
23 }
24
25 int rook[8][2] = {
26 {1, 2},
27 {1, -2},
28 {-1, 2},
29 {-1, -2},
30
31 {2, 1},
32 {-2, 1},
33 {2, -1},
34 {-2, -1}
35 };
36
37 void display(int x, int y) {
38 printf("%c%d", y + 'a', 7 - x + 1);
39 }
40
41 vii next(int x, int y) {
42 vii res;
43
44 if (board[x][y] == 'P' || board[x][y] == 'p') {
45 // pawn
46
47 int dx = is_white(x, y) ? -1 : 1;
48
49 if (is_valid(x + dx, y) && iz_empty(x + dx, y)) {
50 res.push_back(ii(x + dx, y));
51 }
52
53 if (is_valid(x + dx, y - 1)
54 && is_white(x, y) != is_white(x + dx, y - 1)) {
55 res.push_back(ii(x + dx, y - 1));
56 }
57
58 if (is_valid(x + dx, y + 1)
59 && is_white(x, y) != is_white(x + dx, y + 1)) {
60 res.push_back(ii(x + dx, y + 1));
61 }
62
63 } else if (board[x][y] == 'N' || board[x][y] == 'n') {
64 // knight
65
66 for (int i = 0; i < 8; i++) {
67 int nx = x + rook[i][0],
68 ny = y + rook[i][1];
69
120 }
121
122 if (iz_empty(nx, ny) || is_white(x, y) != is_white(nx, ny)) {
123 res.push_back(ii(nx, ny));
124 }
125
126 if (!iz_empty(nx, ny)) {
127 break;
128 }
129 }
130 }
131 }
132
133 } else if (board[x][y] == 'Q' || board[x][y] == 'q') {
134 // queen
135
136 for (int dx = -1; dx <= 1; dx++) {
137 for (int dy = -1; dy <= 1; dy++) {
138 if (dx == 0 && dy == 0)
139 continue;
140
141 for (int k = 1; ; k++) {
142 int nx = x + dx * k,
143 ny = y + dy * k;
144
145 if (!is_valid(nx, ny)) {
146 break;
147 }
148
149 if (iz_empty(nx, ny) || is_white(x, y) != is_white(nx, ny)) {
150 res.push_back(ii(nx, ny));
151 }
152
153 if (!iz_empty(nx, ny)) {
154 break;
155 }
156 }
157 }
158 }
159
160
161 } else if (board[x][y] == 'K' || board[x][y] == 'k') {
162 // king
163
164 for (int dx = -1; dx <= 1; dx++) {
165 for (int dy = -1; dy <= 1; dy++) {
166 if (dx == 0 && dy == 0)
167 continue;
168
169 int nx = x + dx,
170 ny = y + dy;
171
172 if (is_valid(nx, ny) && (iz_empty(nx, ny) ||
173 is_white(x, y) != is_white(nx, ny))) {
174 res.push_back(ii(nx, ny));
175 }
176 }
177 }
178 } else {
179 assert(false);
180 }
181
182 return res;
183 }
184
185 bool is_mate() {
186
187 bool can_escape = false;
188
189 char new_board[8][8];
190
191 for (int x = 0; !can_escape && x < 8; x++) {
192 for (int y = 0; !can_escape && y < 8; y++) {
193 if (!iz_empty(x, y) && !is_white(x, y)) {
194
195 vii moves = next(x, y);
196 for (int i = 0; i < size(moves); i++) {
197 for (int j = 0; j < 8; j++)
198 for (int k = 0; k < 8; k++)
199 new_board[j][k] = board[j][k];
200
201 new_board[moves[i].first][moves[i].second] = board[x][y];
202 new_board[x][y] = '.';
203
204 swap(new_board, board);
205
206
207 bool is_killed = false;
208 for (int j = 0; !is_killed && j < 8; j++) {
209 for (int k = 0; !is_killed && k < 8; k++) {
210 if (!iz_empty(j, k) && is_white(j, k)) {
211 vii nxts = next(j, k);
212
213 for (int l = 0; l < size(nxts); l++) {
214 if (board[nxts[l].first][nxts[l].second] == 'k') {
215 is_killed = true;
216 break;
217 }
218 }
219 }
220 }
221 }
222
223 swap(new_board, board);
224
225 if (!is_killed) {
226 can_escape = true;
227 break;
228 }
229 }
230
231 }
232 }
233 }
234
235 return !can_escape;
236 }
237
238 int main()
239 {
240 for (int i = 0; i < 8; i++) {
241 for (int j = 0; j < 8; j++) {
242 scanf("%c", &board[i][j]);
243 }
244
245 scanf("\n");
246 }
247
248 char new_board[8][8];
249 for (int x = 0; x < 8; x++) {
250 for (int y = 0; y < 8; y++) {
251 if (!iz_empty(x, y) && is_white(x, y)) {
252
253 vii moves = next(x, y);
254
255 for (int i = 0; i < size(moves); i++) {
256
257 for (int j = 0; j < 8; j++)
258 for (int k = 0; k < 8; k++)
259 new_board[j][k] = board[j][k];
260
261 new_board[moves[i].first][moves[i].second] = board[x][y];
262 new_board[x][y] = '.';
263
264 swap(new_board, board);
265
266
267 if (board[moves[i].first][moves[i].second] == 'P' &&
268 moves[i].first == 0) {
269
That is a lot of code! Note how there are a few obvious mistakes which
make the code harder to read, such as the typo iz_empty instead of is_empty, or
how the list of moves for the knight is called rook. Our final solution reduces
this to less than half the size.
Exercise 4.1. Read through the above code carefully and consider if there are
better ways to solve the problem. Furthermore, it has a bug – can you find it?
First, let us clean up the move generation a bit. Currently, it is implemented
as the function next, together with some auxiliary data (lines 25-179). It is not
particularly abstract, and plagued by a lot of code duplication.
The move generation does not need a lot of code. Almost all the moves of
the pieces can be described in the same way: “pick a direction out of a list
𝐷 and move at most 𝐿 steps along this direction, stopping either before exiting
the board or taking your own piece, or when taking another piece.” For the
king and queen, 𝐷 is all 8 directions one step away, with 𝐿 = 1 for the king and
𝐿 = ∞ for the queen.
Implementing this abstraction is done with little code.
const vii DIAGONAL = {{-1, -1}, {-1, 1}, {1, -1}, {1, 1}};
const vii CROSS = {{0, -1}, {0, 1}, {-1, 0}, {1, 0}};
const vii ALL_MOVES = {{-1, -1}, {-1, 1}, {1, -1}, {1, 1},
                       {0, -1}, {0, 1}, {-1, 0}, {1, 0}};
const vii KNIGHT = {{-1, -2}, {-1, 2}, {1, -2}, {1, 2},
                    {-2, -1}, {-2, 1}, {2, -1}, {2, 1}};
vii directionMoves(const vii& D, int L, int x, int y) {
    vii moves;
    trav(dir, D) {
        rep(i,1,L+1) {
            int nx = x + dir.first * i, ny = y + dir.second * i;
            if (!isValid(nx, ny)) break;
            if (isEmpty(nx, ny)) moves.emplace_back(nx, ny);
            else {
                if (isWhite(x, y) != isWhite(nx, ny)) moves.emplace_back(nx, ny);
                break;
            }
        }
    }
    return moves;
}
A short and sweet abstraction that will prove very useful. It handles all
possible moves, except for pawns. These have a few special cases.
vii pawnMoves(int x, int y) {
    vii moves;
    if (x == 0 || x == 7) {
        vii queenMoves = directionMoves(ALL_MOVES, 16, x, y);
        vii knightMoves = directionMoves(KNIGHT, 1, x, y);
        queenMoves.insert(queenMoves.begin(), all(knightMoves));
        return queenMoves;
    }
    int mv = (isWhite(x, y) ? -1 : 1);
    if (isValid(x + mv, y) && isEmpty(x + mv, y)) {
        moves.emplace_back(x + mv, y);
        bool canMoveTwice = (isWhite(x, y) ? x == 6 : x == 1);
        if (canMoveTwice && isValid(x + 2 * mv, y) && isEmpty(x + 2 * mv, y)) {
            moves.emplace_back(x + 2 * mv, y);
        }
    }
    auto take = [&](int nx, int ny) {
        if (isValid(nx, ny) && !isEmpty(nx, ny)
            && isWhite(x, y) != isWhite(nx, ny))
            moves.emplace_back(nx, ny);
    };
    take(x + mv, y - 1);
    take(x + mv, y + 1);
    return moves;
}
This pawn implementation also takes care of promotion, rendering the logic
that previously implemented promotion obsolete.
The remainder of the move generation is now implemented as:
vii next(int x, int y) {
    vii moves;
    switch(toupper(board[x][y])) {
        case 'Q': return directionMoves(ALL_MOVES, 16, x, y);
        case 'R': return directionMoves(CROSS, 16, x, y);
        case 'B': return directionMoves(DIAGONAL, 16, x, y);
        case 'N': return directionMoves(KNIGHT, 1, x, y);
        case 'K': return directionMoves(ALL_MOVES, 1, x, y);
        case 'P': return pawnMoves(x, y);
    }
    return moves;
}
We also have some duplication in the code making the moves. Before
extracting this logic, we will change the structure used to represent the board. A
char[8][8] is a tedious structure to work with. It is not easily copied or sent as a
parameter. Instead, we use a vector<string>, typedef’d as Board:
typedef vector<string> Board;
Hmm... there should be one more thing in common between the main and
is_mate functions. Namely, to check if the current player is in check after a move.
However, it seems this is not done in the main function – a bug. Since we do
need to do this twice, it should probably be its own function:
bool inCheck(bool white) {
    trav(mv, getMoves(!white)) {
        ii to = mv.second;
        if (!isEmpty(to.first, to.second)
            && isWhite(to.first, to.second) == white
            && toupper(board[to.first][to.second]) == 'K') {
            return true;
        }
    }
    return false;
}
Now, the long is_mate function is much shorter and readable, thanks to our
refactoring:
bool isMate() {
    if (!inCheck(false)) return false;
    Board oldBoard = board;
    trav(mv, getMoves(false)) {
        board = doMove(mv);
        bool escaped = !inCheck(false);
        board = oldBoard;  // restore the position before returning or continuing
        if (escaped) return false;
    }
    return true;
}
int main() {
    rep(i,0,8) {
        string row;
        cin >> row;
        board.push_back(row);
    }
    Board oldBoard = board;
    trav(mv, getMoves(true)) {
        board = doMove(mv);
        if (!inCheck(true) && isMate()) {
            outputSquare(mv.first.first, mv.first.second);
            outputSquare(mv.second.first, mv.second.second);
            cout << endl;
            break;
        }
        board = oldBoard;  // undo the tried move before testing the next one
    }
    return 0;
}
Now, we have actually rewritten the entire solution. From the 300-line
behemoth with gigantic functions, we have refactored the solution into a few
short functions which are easy to follow. The rewritten solution is less than half
the size, clocking in at less than 140 lines (the author’s own solution is 120
lines). Learning to code such structured solutions comes to a large extent from
experience. During a competition, we might not spend time thinking about
how to structure our solutions, instead focusing on getting it done as soon as
possible. However, spending 1-2 minutes thinking about how to best implement
a complex solution could pay off not only in faster implementation times (such
as by halving the size of the program) but also in fewer bugs.
Problem 4.2
Mate in One – mateinone
Chapter Exercises
Problem 4.3
Flexible Spaces – flexiblespaces
Permutation Encryption – permutationencryption
Jury Jeopardy – juryjeopardy
Fun House – funhouse
Settlers of Catan – settlers2
Cross – cross
Basic Interpreter – basicinterpreter
Cat Coat Colors – catcoat
Chapter Notes
Many good sources exist to become more proficient at writing readable and
simple code. Clean Code[17] describes many principles that help in writing
better code. It includes good walk-throughs on refactoring, and shows in a very
tangible fashion how coding cleanly also makes coding easier.
Code Complete[18] is a huge tome on improving your programming skills.
While much of the content is not particularly relevant to coding algorithmic
problems, chapters 5-19 give many suggestions on coding style.
Different languages have different best practices. Some resources on
improving your skills in whatever language you code in are:
5 Time Complexity
How do you know if your algorithm is fast enough before you have coded it? In
this chapter we examine this question from the perspective of time complexity, a
common tool of algorithm analysis to determine roughly how fast an algorithm
is.
We start our study of complexity by looking at a new sorting algorithm
– insertion sort. Just like selection sort (studied in Chapter 1), insertion sort
works by iteratively sorting a sequence.
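For concreteness, insertion sort might be implemented in C++ as follows (a sketch of the usual formulation, not necessarily the exact listing the analysis in this section refers to):

```cpp
#include <utility>
#include <vector>
using namespace std;

// Sorts v in-place. The outer loop runs once per element; the inner
// loop moves element i backwards by repeated swaps until it is no
// smaller than the element before it.
void insertionSort(vector<int>& v) {
    for (size_t i = 1; i < v.size(); i++) {
        for (size_t j = i; j > 0 && v[j - 1] > v[j]; j--) {
            swap(v[j - 1], v[j]);
        }
    }
}
```

The number of inner-loop swaps performed for element 𝑖 is exactly the quantity 𝑡𝑖 analyzed below.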
In this section we determine how long insertion sort takes to run. When
analyzing an algorithm we do not attempt to compute the actual wall-clock time
an algorithm takes. Indeed, this would be nearly impossible a priori – modern
When sorting fixed-size integers the size of the input would be the number of
elements we are sorting, 𝑁 . We denote the time the algorithm takes in relation
to 𝑁 as 𝑇 (𝑁 ). Since an algorithm often has different behaviours depending on
how an instance is constructed, this time is taken to be the worst-case time, over
every instance of 𝑁 elements.
Figure 5.1: The state of the array at the start of each iteration of insertion sort on the sequence 5, 2, 4, 1, 3, 0, together with the number of inner-loop iterations 𝑡𝑖:

5 2 4 1 3 0   𝑡0 = 0
5 2 4 1 3 0   𝑡1 = 1
2 5 4 1 3 0   𝑡2 = 1
2 4 5 1 3 0   𝑡3 = 3
1 2 4 5 3 0   𝑡4 = 2
1 2 3 4 5 0   𝑡5 = 5
0 1 2 3 4 5
To analyze the running time of the algorithm, we make the assumption that
any “sufficiently small” operation takes the same amount of time – exactly 1
(of some undefined unit). We have to be careful in what assumptions we make
regarding what a sufficiently small operation means. For example, sorting 𝑁
numbers is not a small operation, while adding or multiplying two fixed-size
numbers is. Multiplication of integers of arbitrary size is not a small operation
(see the Karatsuba algorithm, Section 12.4).
In our program every line happens to represent a small operation. However,
the two loops may cause some lines to execute more than once. The outer for
loop will execute 𝑁 times. The number of times the inner loop runs depends
on how the input looks. We introduce the notation 𝑡𝑖 to mean the number of
iterations the inner loop runs during the 𝑖’th iteration of the outer loop. These
are included in Figure 5.1 for every iteration.
Now we can annotate our pseudo code with the number of times each line
executes.
T(N) = N + N + \sum_{i=0}^{N-1} t_i + \sum_{i=0}^{N-1} t_i + \sum_{i=0}^{N-1} t_i
= 3 \sum_{i=0}^{N-1} t_i + 2N
We still have some 𝑡𝑖 variables left so we do not truly have a function of 𝑁 .
We can eliminate this by realizing that in the worst case 𝑡𝑖 = 𝑖. This occurs when
the list we are sorting is in descending order. Each element must then be moved
to the front, requiring 𝑖 swaps for the 𝑖’th element.
With this substitution we can simplify the expression:
T(N) = 3 \sum_{i=0}^{N-1} i + 2N
     = 3 \cdot \frac{(N-1)N}{2} + 2N
     = \frac{3}{2}(N^2 - N) + 2N
     = \frac{3}{2} N^2 + \frac{N}{2}
This function grows quadratically with the number of elements 𝑁. Since
the approximate growth of an algorithm’s running time is assigned such importance,
a notation has been developed for it.
𝑇 (𝑁 ) = 𝑂 (𝑁 2 )
Figure 5.2: The growth of 100𝑁 + 1337 compared to 𝑁^2.
Intuitively, the notation means that 𝑓(𝑛) grows slower than or as fast as
𝑔(𝑛), within a constant factor. Any quadratic function 𝑎𝑛^2 + 𝑏𝑛 + 𝑐 = 𝑂(𝑛^2).
Similarly, any linear function 𝑎𝑛 + 𝑏 = 𝑂(𝑛^2). This definition implies
that for two functions 𝑓 and 𝑔 which are always within a constant factor of each
other, we have both 𝑓(𝑛) = 𝑂(𝑔(𝑛)) and 𝑔(𝑛) = 𝑂(𝑓(𝑛)).
We can use this definition to prove that the running time of insertion sort is
𝑂 (𝑁 2 ), even in the worst case.
Complexity        𝑛
𝑂(log 𝑛)          2^(10^7)
𝑂(√𝑛)             10^14
𝑂(𝑛)              10^7
𝑂(𝑛 log 𝑛)        10^6
𝑂(𝑛√𝑛)            10^5
𝑂(𝑛^2)            5 · 10^3
𝑂(𝑛^2 log 𝑛)      2 · 10^3
𝑂(𝑛^3)            300
𝑂(2^𝑛)            24
𝑂(𝑛 · 2^𝑛)        20
𝑂(𝑛^2 · 2^𝑛)      17
𝑂(𝑛!)             11
Note that this is in no way a general rule – while complexity does not bother about
constant factors, wall clock time does!
Complexity analysis can also be used to determine lower bounds of the time
an algorithm takes. To reason about lower bounds we use Ω-notation. It is
similar to 𝑂-notation except it describes the reverse relation.
In this case, both the lower and the upper bound of the worst-case running
time of insertion sort coincided (asymptotically). We have another notation for
when this is the case:
Definition 5.3 — Θ-notation
If 𝑓 (𝑛) = 𝑂 (𝑔(𝑛)) and 𝑓 (𝑛) = Ω(𝑔(𝑛)), we say that 𝑓 (𝑛) = Θ(𝑔(𝑛)).
Thus, the worst-case running time for insertion sort is Θ(𝑛 2 ).
There are many ways of computing the time complexity of an algorithm. The
most common case is when a program has 𝐾 nested loops, each of which performs
𝑂(𝑀) iterations. The complexity of these loops is then 𝑂(𝑀^𝐾 · 𝑓(𝑁)) if the
innermost operation takes 𝑂(𝑓(𝑁)) time. In Chapter 12, you will also see some
ways of computing the time complexity of a particular type of recursive solution,
called Divide and Conquer algorithms.
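As a small illustration of the nested-loop rule, we can count the operations of two nested loops directly (the function below is our own example, not from the text):

```cpp
// K = 2 nested loops, each performing M iterations, with a
// constant-time innermost operation: Theta(M^2) operations in total.
long long countOperations(int M) {
    long long ops = 0;
    for (int i = 0; i < M; i++)
        for (int j = 0; j < M; j++)
            ops++;  // the innermost O(1) operation
    return ops;
}
```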
Exercise 5.1. Find a lower and an upper bound that coincide for the best-case
running time for insertion sort.
Exercise 5.2. Give a Θ(𝑛) algorithm and a Θ(1) algorithm to compute the sum
of the 𝑛 first integers.
Exercise 5.4. Prove that 𝑓 (𝑛) + 𝑔(𝑛) = Θ(max{𝑓 (𝑛), 𝑔(𝑛)}) for non-negative
functions 𝑓 and 𝑔.
Amortized Complexity
Consider the following algorithm:
compute the number of times it executes even though we do not know how
many times it executes nor how many iterations each execution takes? We try
the amortization trick by looking at how many iterations it performs over all
those executions, no matter how many they are. Assume that the loop is run
𝑘 times (including the final time when the condition first is false) and each
run iterates 𝑏𝑖 times (1 ≤ 𝑖 ≤ 𝑘). We claim that
𝑏 1 + 𝑏 2 + · · · + 𝑏𝑘 = Θ(𝑁 )
Our reasoning is as follows. There are two ways the variable 𝑖 can increase.
It can either be increased inside the loop at line 7, or at line 10. If the loop
executes 𝑁 times in total, it will certainly complete and never be executed again
since the loop at line 5 completes too. This gives us \sum_{i=1}^{k} b_i = O(N).
On the other hand, we get one iteration for every time 𝑖 is increased. If 𝑖 is
increased on line 7, it was done within a loop iteration. If 𝑖 is increased on line
9, we instead count the final check of the loop just before it once. Each addition
to 𝑖 thus happens together with an iteration of the loop, so \sum_{i=1}^{k} b_i = Ω(N).
1: procedure BinaryIncrement(𝐷)
2: 𝑖←0
3: while 𝐷 [𝑖] = 1 do ⊲ Add 1 to the 𝑖’th digit
4: 𝐷 [𝑖] = 0 ⊲ We add 1 to a 1 digit, resulting in a 0 digit plus a carry
5: 𝑖 ←𝑖 +1
6: 𝐷 [𝑖] = 1 ⊲ We add 1 to a digit not resulting in a carry
The algorithm is the binary version of the normal addition algorithm where the
two addends are written above each other and each resulting digit is computed
one at a time, possibly with a carry digit.
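A C++ sketch of the procedure above, assuming that 𝐷 has room for the final carry (e.g. a leading zero digit):

```cpp
#include <vector>
using namespace std;

// D holds the binary digits of a number, least significant first.
// Returns the number of loop iterations performed, i.e. the b_i of
// the amortized analysis. Assumes D has at least one 0 digit left.
int binaryIncrement(vector<int>& D) {
    int i = 0;
    while (D[i] == 1) {
        D[i] = 0;  // adding 1 to a 1 digit yields 0 plus a carry
        i++;
    }
    D[i] = 1;      // adding 1 to a 0 digit: no further carry
    return i;
}
```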
What is the amortized complexity of this procedure over 2^𝑛 calls, if 𝐷 starts
out as 0?
5.6 Additional Exercises
Exercise 5.9. Order the following functions by their asymptotic growth, with
proof!
• 𝑥
• √𝑥
• 𝑥^2
• 2^𝑥
• 𝑒^𝑥
• 𝑥!
• log 𝑥
• 1/𝑥
• 𝑥 log 𝑥
• 𝑥^3
Exercise 5.10. 1) Prove that if 𝑎(𝑥) = 𝑂 (𝑏 (𝑥)) and 𝑐 (𝑥) = 𝑂 (𝑑 (𝑥)), then
𝑎(𝑥) + 𝑐 (𝑥) = 𝑂 (𝑏 (𝑥) + 𝑑 (𝑥)).
2) Prove that if 𝑎(𝑥) = 𝑂 (𝑏 (𝑥)) and 𝑐 (𝑥) = 𝑂 (𝑑 (𝑥)), then 𝑎(𝑥) · 𝑐 (𝑥) =
𝑂 (𝑏 (𝑥) · 𝑑 (𝑥)).
6 Data Structures
Solutions to algorithmic problems consist of two constructs – algorithms and
data structures. Data structures are used to organize the data that the algorithms
operate on. For example, the array is such a data structure.
Many data structures have been developed to handle particular common
operations we need to perform on data quickly. In this chapter we discuss some
of the basic data structures used in programming. We have chosen an approach
that is perhaps more theoretical than that of many other problem solving texts
when it comes to the basic structures. In particular, we have chosen to explicitly
discuss not only the data structures themselves and their complexities, but
also their implementations. We do this mainly because we believe that their
implementations demonstrate useful algorithmic techniques. While you may feel that
you can simply skip this chapter if you are familiar with all the data structures,
we advise that you still read through the sections for the structures whose
implementations you lack confidence in.
1This complexity is debatable, and highly dependent on what computational model one uses. In
practice, this is roughly “constant time” in most memory management libraries used in C++. In all
Java and Python implementations we tried, it is instead linear in the size.
In Chapter 2 we saw how to create fixed-size arrays where we knew the size
beforehand. In C++ we can also create fixed-size arrays using an expression as
the size. This is done using the new[] syntax, for example:
int size = 5;
int* arr = new int[size];
arr[2] = 5;
cout << arr[2] << endl;
delete[] arr;
Exercise 6.1. What happens if you try to create an array with a negative size?
The fixed-size array can be used to implement a more useful data structure,
the dynamic array. This is an array that can change size when needed. For
example, we may want to repeatedly insert values in the array without knowing
the total number of values beforehand. This is a very common requirement
in programming problems. In particular, we want to support two additional
operations in addition to the operations supported by the fixed-size array.
create a fixed-size array that is larger than we actually need it to be. For example,
if we create a fixed-size array with 𝑛 more elements than our dynamic array
needs to store, we will not have to increase the size of the backing fixed-size
array until we have added 𝑛 more elements to the dynamic array. This means that
a dynamic array does not only have a size, the number of elements we currently
store in it, but also a capacity, the number of elements we could store in it. See
Figure 6.1 for a concrete example of what happens when we add elements to a
dynamic array that is both within its capacity and when we exceed it.
Figure 6.1: Adding elements to a dynamic array: while within the capacity, only the size grows (here from 4 to 5, at capacity 5); when the capacity is exceeded, a larger backing array is allocated (here size 6, capacity 10).
struct DynamicArray {
    int capacity;
    int size;
    int* backing;

    DynamicArray() {
        capacity = 10;
        size = 0;
        backing = new int[10];
    }
};
We are almost ready to add and remove elements to our array now. First, we
need to handle the case where insertion of a new element would cause the size
of the dynamic array to exceed its capacity, that is, when 𝑠𝑖𝑧𝑒 = 𝑐𝑎𝑝𝑎𝑐𝑖𝑡𝑦.
Our previous suggestion was to allocate a new, bigger backing array – but just how big?
If we always add, say, 10 new elements to the capacity, we have to copy
the old elements on every 10’th addition. This still results in
additions to the end of the array taking linear time on average. There is a neat
trick that avoids this problem: creating the new backing array with double the
current capacity.
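A sketch of the doubling strategy applied to the structure above (the function names here are our own, not necessarily those used in the book's listings):

```cpp
#include <algorithm>

// A sketch of capacity doubling for a dynamic array of ints.
struct DynArr {
    int capacity = 10;
    int size = 0;
    int* backing = new int[10];

    // Repeatedly double the backing array until it can hold at
    // least cap elements, copying the old contents over each time.
    void ensureCapacity(int cap) {
        while (capacity < cap) {
            int* larger = new int[2 * capacity];
            std::copy(backing, backing + size, larger);
            delete[] backing;
            backing = larger;
            capacity *= 2;
        }
    }

    // Append to the end: amortized Theta(1) by the doubling
    // argument in the text.
    void add(int value) {
        ensureCapacity(size + 1);
        backing[size++] = value;
    }
};
```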
This ensures that all the copying needed to grow an array to some certain
capacity has an amortized complexity of Θ(cap). Assume that
we have just increased the capacity of our array to cap, which required us to
copy cap/2 elements. Then, the previous increase will have happened at around
capacity cap/2 and took time cap/4. The one before that occurred at capacity
cap/4, and so on.
We can sum up all of this copying:

cap/2 + cap/4 + · · · ≤ cap

using the formula for the sum of a geometric series.
Since each copy is assumed to take Θ(1) time, the total time to create this
array was Θ(cap). As cap/2 ≤ size ≤ cap, this is also Θ(size), meaning that
adding size elements to the end of the dynamic array takes amortized Θ(size)
time.
When implementing this in code, we use a function that takes as argument
the capacity we require the dynamic array to have and ensures that the backing
array has at least this size, possibly by repeatedly doubling its size until it
is sufficiently large. Example code for this can be found in Listing 6.2.
With this method in hand, insertion and removal of elements is actually
pretty simple. Whenever we remove an element, we simply need to move the
elements coming after it in the dynamic array forward one step. See Figure 6.2
for an illustration of an element being removed.
When adding an element, we reverse this process by moving the elements
coming after the position we wish to insert a new element at one step towards
the back. An example of this is shown in Figure 6.3.
Exercise 6.2. Implement insertion and removal of elements in a dynamic array.
Dynamic arrays are called vectors in C++ (Section 3.1). They have the same
complexities as the one described at the beginning of this section.
Figure 6.2: Removing an element from a dynamic array: the elements coming after it are moved one step forward, and the size decreases from 6 to 5.
Exercise 6.3. How can any element be removed in Θ(1) if we ignore the ordering
of values in the array?
6.2 Stacks
The stack is a data structure that contains an ordered list of values and supports
the following operations:
The structure is easily implemented with the above time complexities using
Figure 6.3: Inserting an element into a dynamic array: the elements coming after the insertion point are moved one step towards the back, and the size increases from 5 to 6.
a dynamic vector. After all, the vector supports exactly the same operations that
a stack requires. In C++, the stack is called a stack (Section 3.4).
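For reference, typical use of the STL stack looks like this (a usage sketch, not an implementation of the operations above):

```cpp
#include <stack>
using namespace std;

// All three stack operations (push, top, pop) run in amortized
// constant time.
stack<int> buildStack() {
    stack<int> s;
    s.push(1);
    s.push(2);
    s.push(3);  // the stack now contains 1, 2, 3 with 3 on top
    return s;
}
```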
Exercise 6.4. Implement a stack using a dynamic vector.
6.3 Queues
The queue is, like the vector and the stack, an ordered list of values. Instead of
removing and getting values from the end like the stack, it gets the value from
the front. The supported operations are thus:
• push(𝑣𝑎𝑙): inserting a value at the end of the queue.
Amortized complexity: Θ(1)
Figure 6.4: Pushing and popping values in a queue backed by an array: pushing appends to the back, while popping advances the index of the front element.
Exercise 6.6. A queue can also be implemented using two stacks. How?
In a similar manner, a stack can also be implemented using two queues.
How?
Binary Trees
A binary tree is a rooted tree where every vertex has either 0, 1 or 2 children.
In Figure 6.5a, you can see an example of a binary tree.
Figure 6.5: (a) A non-complete binary tree. (b) A complete binary tree, with its vertices numbered from the top down, left to right at each layer.
We call a binary tree complete if every level of the tree is completely filled,
except possibly the bottom one. If the bottom level is not filled, all the vertices
need to be as far as possible to the left. In a complete binary tree, we can order
every vertex as we do in Figure 6.5b, i.e. from the top down, left to right at each
layer.
The beauty of this numbering is that we can use it to store a binary tree in a
vector. Since each vertex is given a number, we can map the number of each
vertex into a position in a vector. The 𝑛 vertices of a complete binary tree then
occupy all the indices [1, 𝑛]. An important property of this numbering is that
it is easy to compute the numbers of the parent, left child and right child of a
vertex. If a vertex has number 𝑖, the parent has number ⌊𝑖/2⌋, the left
child has number 2𝑖 and the right child has number 2𝑖 + 1.
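In code, these index computations are one-liners (a sketch using the one-indexed numbering above):

```cpp
// Parent/child indices in the numbering of a complete binary tree
// where the root has number 1.
int parent(int i) { return i / 2; }  // integer division = floor
int leftChild(int i) { return 2 * i; }
int rightChild(int i) { return 2 * i + 1; }
```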
Exercise 6.8. Prove that the above properties of the numbering of a complete
binary tree hold.
Heaps
A heap is a special kind of complete binary tree. More specifically, it should
always satisfy the following property: a vertex always has a higher value than
its immediate children. Note that this condition acts transitively, which means
that a vertex also has a higher value than its grandchildren, and their children,
and so on. In particular, a consequence of this property is that the root of the
tree is always the largest value in the heap. As it happens, this property is
exactly what we need to quickly get the maximum value of the heap. You can
see an example of a heap in Figure 6.6.
Figure 6.6: A heap of the elements 1, 4, 4, 5, 7, 10.
it may be that the value we added is now larger than its parent. If this is the case,
we can fix the violation of our heap property by swapping the value with its
parent. A single swap does not guarantee that the value is no longer larger than
its (new) parent. In fact, if the newly added element is the largest in the heap, it
would have to be repeatedly swapped all the way up to the top! This procedure, of
moving the newly added element up in the tree until it is no longer larger than
its parent (or it becomes the root), is called bubbling up:
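A sketch of bubbling up, for a heap stored one-indexed in a vector using the numbering from the previous section (the function names here are our own):

```cpp
#include <utility>
#include <vector>
using namespace std;

// The heap is stored in a vector using the complete binary tree
// numbering: the parent of tree[i] is tree[i / 2]. Index 0 is unused.
// Swap the value at index i upwards until it is no longer larger
// than its parent, or it becomes the root.
void bubbleUp(vector<int>& tree, int i) {
    while (i > 1 && tree[i] > tree[i / 2]) {
        swap(tree[i], tree[i / 2]);
        i /= 2;
    }
}

// Pushing reduces to appending the value and bubbling it up.
void push(vector<int>& tree, int value) {
    tree.push_back(value);
    bubbleUp(tree, tree.size() - 1);
}
```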
Pushing a value now reduces to appending it to the tree and bubbling it up.
You can see this procedure in action in Figure 6.7.
Figure 6.7: Pushing the value 13 onto a heap: it is appended at the bottom of the tree and then bubbled up past its smaller ancestors until it becomes the root.
Removing a value is slightly harder. First of all, the tree will no longer be a
binary tree – it is missing its root! To rectify this, we can take the last element
of the tree and put it as the root instead. This keeps the binary tree complete, but
may cause it to violate the heap property, since our new root may be smaller than
either or both of its children.
The solution to this problem is similar as to that of adding an element.
Instead of bubbling up, we bubble it down by repeatedly swapping it with one
of its children until it no longer is greater than any of its children. The only
question mark is which of its children we should bubble down to, in case the
element is smaller than both of its children. The answer is clearly the largest
of the two children. If we take the smaller of the two children, we will again
violate the heap property. Just as with pushing, popping a value is done by
a combination of removing the value and fixing the heap to satisfy the heap
property again.
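Correspondingly, a sketch of popping with bubbling down, in the same one-indexed vector representation (again with our own function names):

```cpp
#include <utility>
#include <vector>
using namespace std;

// Move the value at index i down: repeatedly swap it with its
// largest child while some child is greater. One-indexed storage,
// so the heap contains tree.size() - 1 elements.
void bubbleDown(vector<int>& tree, int i) {
    int n = tree.size() - 1;
    while (true) {
        int largest = i;
        if (2 * i <= n && tree[2 * i] > tree[largest]) largest = 2 * i;
        if (2 * i + 1 <= n && tree[2 * i + 1] > tree[largest]) largest = 2 * i + 1;
        if (largest == i) break;  // heap property restored
        swap(tree[i], tree[largest]);
        i = largest;
    }
}

// Popping the maximum: move the last element to the root, shrink
// the tree, and restore the heap property by bubbling down.
int pop(vector<int>& tree) {
    int result = tree[1];
    tree[1] = tree.back();
    tree.pop_back();
    bubbleDown(tree, 1);
    return result;
}
```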
A final piece of our analysis is missing. It is not yet proven that the time
complexities of adding and removing elements are indeed 𝑂(log 𝑛). To do this,
we first need to state a basic fact about complete binary trees: their height is at most
log₂ 𝑛. This is easily proven by contradiction. Assume that the height of the tree
is at least log₂ 𝑛 + 1. We claim that any such tree must have strictly more than 𝑛
vertices. Since all but the last layer of the tree must be complete, it must have at
least 1 + 2 + · · · + 2^{log₂ 𝑛} = 2^{log₂ 𝑛 + 1} − 1 vertices. But 2^{log₂ 𝑛 + 1} − 1 = 2𝑛 − 1 > 𝑛
for positive 𝑛 – the tree has more than 𝑛 vertices. This means that a tree with 𝑛
vertices cannot have height more than log₂ 𝑛.
The next piece of the puzzle is analyzing just how many iterations the loops
in the bubble up and bubble down procedures can perform. In the bubble up
procedure, we keep an index to a vertex that, for every iteration, moves up in
the tree. This can only happen as many times as there are levels in the tree.
Similarly, the bubble down procedure tracks a vertex that moves down in the
tree for every iteration. Again, this is bounded by the number of levels in the
tree. We are forced to conclude that since the complexity of each iteration is
Θ(1) as they only perform simple operations, the complexities of the procedures
as a whole are 𝑂 (log 𝑛).
Problem 6.1
Binary Heap – heap
Exercise 6.9. Prove that adding an element using Push never violates the heap
property.
6.5 Bitsets
We move on to the simplest data structure of the chapter. The bitset can be
viewed as a specialization of a static-length array for the case where the values
stored are booleans, i.e. supporting only the operations of setting and getting
values in the array.
The idea behind it is simple. Booleans are essentially values 0 (false) or 1
(true), i.e. equivalent to a binary digit. Another data type also consists of binary
digits – integers. In the chapter on programming, we noted that the memory of
7 6 5 4 3 2 1 0
90 = 0 1 0 1 1 0 1 0 = {1, 3, 4, 6}
To compute the size of a bitset, the most common compilers support the
macro __builtin_popcount(x) which returns the number of 1 digits in x.
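For example, a small set of integers can be stored in the bits of a single integer like this (a sketch; the helper names are our own):

```cpp
#include <cstdint>

// Representing a subset of {0, ..., 31} as the bits of an integer:
// bit i is 1 exactly when i is in the set.
uint32_t setBit(uint32_t s, int i) { return s | (1u << i); }
bool contains(uint32_t s, int i) { return (s >> i) & 1; }
```

Building the set {1, 3, 4, 6} this way yields the integer 90, matching the figure above.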
There are several neat tricks involving bitsets. Some worth mentioning are:
Exercise 6.11. Given a bitset, use bitwise operators to compute the next higher
bitset with the same number of elements.
6.6 Hash Tables
The main idea behind supporting these operations quickly is essentially the
same as that of the dynamic array. Assume that the set of all possible keys, the
universe, were the integers 0, 1, . . . , 𝑁 for some fixed 𝑁 . If so, we could easily
implement the above operations by storing the values in a dynamic array of this
size. What do we do if this is not the case?
We apply the concept of hashing. Imagine that your universe consists of all
possible integers that fit in an int. Clearly, we could not use a dynamic array of
that size (2^32) to store the table – at least not in competitions. Instead, the goal
of hashing is to shrink this huge universe into a small one that we can store in
an array. For example, we could take the last 𝐾 digits of the key for some small
𝐾 (i.e. take key mod 10^𝐾) and use them as the index into an array for the value of
that key. There are only 10^𝐾 such indices, which fit in an array for a small 𝐾.
Given such a mapping, we can store the key and value as a pair on the
corresponding index in the array. This is illustrated for 𝐾 = 1 in Figure 6.9.
Figure 6.9: An example of where the hash table would store the values of a few keys when 𝐾 = 1.
This kind of transformation, that takes an arbitrarily large value (any integer)
and maps it into a set of constant size, is called a hash function2.
Unfortunately, we are bitten by one of the fundamental limitations of
mathematics – if a function maps a set to a smaller one, there must be at least two
values which map to the same value. Our table can thus be subject to a collision,
two keys that map to the same index in the array. Resolving this situation is
actually straightforward. Instead of using the table to store a single key-value
pair, we can store a dynamic array of key-value pairs3.
This complicates an implementation only slightly. To retrieve a value from
the hash table, we look up the correct index of the array backing the table and
check all key-value pairs stored in the array at that index. If we find a pair with
2You might have heard about a version of this often used in cryptography, the cryptographic
hash function, which aims to provide stronger guarantees than we care about.
3Traditionally, most introductions to the hash table instead use a data structure called
a linked list to store pairs that collide. We elect not to, since it is mostly a question of real-world
performance, and we expect you to mostly use the implementation from the STL if you use C++.
Figure 6.10: When several keys with the same hash value are stored, we save all pairs in a sub-array.
something taking the upper 𝑁 bits of 𝐴𝑥 as the hash, where 𝐴 is a large
odd constant, i.e.:
(A * x) >> (64 - N)
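As a sketch, with an arbitrary odd constant of our own choosing:

```cpp
#include <cstdint>

// The multiplicative hash from the text: the upper N bits of A * x,
// where A is a large odd constant. This particular constant is our
// own arbitrary choice, not prescribed by the text.
const uint64_t A = 0x9e3779b97f4a7c15ull;

uint64_t hashTopBits(uint64_t x, int N) {
    return (A * x) >> (64 - N);
}
```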
Exercise 6.12. What happens if one takes the lower 𝑁 bits of the product 𝐴𝑥 as
the hash instead?
Exercise 6.13. The above hash function is easily broken when 𝑥 can be a 64-bit
integer – how?
linearity of expectation. By the assumption that keys map randomly into hash
values, 𝐸[𝑎ⱼ] ≤ 1/𝑀, so that the sum above is bounded by 𝐾/𝑀. Since 𝐾/𝑀 ≤ 𝑐
for some constant 𝑐 by the dynamic resizing, the expected length is also bounded
by a constant, meaning the complexity is as well.
Note that this analysis says nothing about the worst-case complexity of an
operation (which can be linear if all keys map to the same hash value) or the
expected length of the longest sub-array (which is log 𝐾 / log log 𝐾).
Universal Hashing
Certain competition forms include a stage where contestants may challenge the
solutions of others for correctness, by providing a test case they believe the
solution would fail. In this case, the hash function above is not good enough.
Another contestant can easily generate values of 𝑥 that all map to the same hash
value, by generating a large number of values and evaluating your hash function
on them and picking a large set of collisions from them. This also applies when
using unordered_map from the STL4. To resolve this, one picks a hash function at
random from a family of functions at every invocation of the program, a concept
called universal hashing. In practice, the randomness tends to come from reading
the current time at a sufficiently granular level to be hard to predict.
The hash function we will look at is again 𝐴𝑥 (as an unsigned 64-bit integer),
but this time 𝐴 is a random (odd) 64-bit integer. We claim that taking the top 𝐾
bits of 𝐴𝑥 makes a good hash function from 64-bit integers to 𝐾-bit integers, i.e.

ℎ(𝑥) = ⌊𝐴𝑥 / 2^{64−𝐾}⌋
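A sketch of picking the random odd constant at program start (the seeding strategy here is one common choice, not prescribed by the text):

```cpp
#include <chrono>
#include <cstdint>
#include <random>

// Universal hashing sketch: A is drawn at random (and made odd)
// each time the program starts, so an adversary cannot precompute
// collisions for h(x) = (A * x) >> (64 - K).
struct UniversalHash {
    uint64_t A;
    int K;
    UniversalHash(int K) : K(K) {
        std::mt19937_64 rng(
            std::chrono::high_resolution_clock::now().time_since_epoch().count());
        A = rng() | 1;  // a random odd 64-bit integer
    }
    uint64_t operator()(uint64_t x) const { return (A * x) >> (64 - K); }
};
```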
To prove that we are not making you use a weak hash when you compete
against the authors, we provide a somewhat technical and uninteresting
proof.
Theorem 6.1
For any two fixed 64-bit integers 𝑥, 𝑦, their hashes ℎ(𝑥) and ℎ(𝑦) are equal
with probability 2/2^𝐾 over all choices of the hash function parameter 𝐴.
Proof. This proof uses some basic number theoretic facts – if you are not
familiar with modular inverses, you might need to work through Chapter 19.
Assume that ℎ(𝑥) = ℎ(𝑦). This means that the top 𝐾 bits of 𝐴𝑥 and 𝐴𝑦
are equal. Thus, the top 𝐾 bits of 𝐴𝑥 − 𝐴𝑦 = 𝐴(𝑥 − 𝑦) must either be all
zeroes (if 𝐴𝑥 ≥ 𝐴𝑦) or all ones (if 𝐴𝑥 < 𝐴𝑦, causing a carry bit from the top
𝐾 bits).
Now, we introduce the following variables. Let 𝑧 be the odd part
of 𝑥 − 𝑦, so that 𝑥 − 𝑦 = 𝑧 · 2^𝑖 for some 𝑖. Also, let 𝐵 be the top 63
bits of 𝐴, so that 𝐴 = 2𝐵 + 1. Since 𝐴 is a uniformly random odd 64-
bit integer, 𝐵 is uniformly random. We can now perform the rewrite
𝐴(𝑥 − 𝑦) = (2𝐵 + 1)(𝑥 − 𝑦) = (2𝐵 + 1)𝑧 · 2^𝑖 = 𝐵𝑧 · 2^{𝑖+1} + 𝑧 · 2^𝑖. Since 𝐵 is
uniformly random modulo 2^63 and 𝑧 is odd, 𝐵𝑧 mod 2^63 is uniformly random
modulo 2^63 (this follows from the fact that 𝑧, as an odd number, is relatively prime
to 2^63). Thus, the integer 𝐴(𝑥 − 𝑦) = 𝐵𝑧 · 2^{𝑖+1} + 𝑧 · 2^𝑖 is uniformly random in
its top 63 − 𝑖 bits and contains only zeroes in its lower 𝑖 bits.
Note that 𝐴𝑥 = 𝐴𝑦 + 𝐴(𝑥 − 𝑦). Since 𝐴(𝑥 − 𝑦) has zeroes in the lower 𝑖
bits and a 1 in the 𝑖’th bit, the 𝑖’th bit of 𝐴𝑦 will change when adding 𝐴(𝑥 − 𝑦)
to it, so that it will differ from the 𝑖’th bit of 𝐴𝑥. By assumption the top 𝐾
bits of 𝐴𝑥 and 𝐴𝑦 are equal, which thus forces 𝑖 ≤ 64 − 𝐾.
Since the top 63 − 𝑖 ≥ 63 − (64 − 𝐾) = 𝐾 bits are thus uniformly random, we
get that they are all ones or all zeroes with probability 2/2^𝐾.
Exercise 6.14. Is it a problem that the hash has pairwise collisions with
probability 2/2^𝐾 rather than 1/2^𝐾 with regard to hash table complexity?
Chapter Exercises
Exercise 6.15. Assume that you want to implement shrinking of a dynamic
array (or a hash table) where many elements were deleted so that the capacity is
unnecessarily large. This will be implemented by calling a particular function
after any removal, to see if the array should be shrunk. What is the problem
with the following implementation?
1: procedure ShrinkVector(𝑉 )
2: while 2 · V.capacity > V.size do
3: arr ← 𝑛𝑒𝑤 𝑇 [V.capacity/2]
4: copy the elements of V.backing to arr
5: V.backing ← arr
6: V.capacity ← V.capacity/2
Chapter Notes
For a more rigorous treatment of the basic data structures, we again refer to
Introduction to Algorithms [7]. In particular it goes through other techniques
regarding hash tables more thoroughly, something we skipped since it is the
hashing technique and general knowledge of the structure we deemed important
here – an efficient implementation is something your language’s standard library
will provide.
If you want to dive deeper into proper implementations of the algorithms
in C++, Data Structures and Algorithm Analysis in C++[29] covers what we
brought up in this chapter and a bit more.
7 Recursion
This chapter introduces the first proper algorithmic technique of the book:
recursion. The first four chapters of the next part – brute force, greedy
algorithms, dynamic programming and divide and conquer – are all based on
this concept. Recursion is perhaps the first truly creatively tricky (rather than
technically difficult) technique faced by the fresh programmer, so we have chosen
to dedicate an entire chapter to a primer on the topic.
The remainder of this book, and computer science as a whole, strongly
depends on a solid understanding of recursion. You are therefore urged to read
it more carefully than the previous chapters. Even better; once you have read it,
read it again.
Exercise 7.1. Use the definition to compute the 15 first Fibonacci numbers.
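A direct translation of the recursive definition into C++ might look as follows (a sketch; the book's own listing may differ slightly):

```cpp
// The recursive definition of the Fibonacci numbers:
// F(0) = 0, F(1) = 1, F(n) = F(n - 1) + F(n - 2).
int fib(int n) {
    if (n == 0) return 0;
    if (n == 1) return 1;
    return fib(n - 1) + fib(n - 2);
}
```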
Note that this function, just like the recursive definition, computes its result
𝐹 (𝑛) by calling itself to compute the (smaller) Fibonacci numbers 𝐹 (𝑛 − 1) and
𝐹 (𝑛 − 2). A knee-jerk reaction might be that such a function could never finish.
After all, in order to compute a single Fibonacci number, the function calls itself,
not just one, but two times! The solution is one of the key ideas of recursion,
namely that there is some base case where the self-referential – recursive –
computation eventually bottoms out, so that the definition does not refer back to
itself forever and ever. In the case of Fibonacci, once you try computing 𝐹 0 or
𝐹 1 , the definition gives us the values immediately without having to apply the
recursive case. One can visualize the computation as in Figure 7.1.
Figure 7.1: The tree of recursive calls performed when computing a small Fibonacci number; the recursion bottoms out at the base cases 𝐹(1) = 1 and 𝐹(0) = 0.
Recursive Exponentiation
int power(int a, int n) {
    if (n == 0) return 1;
    return a * power(a, n - 1);
}
To compute the sum of the entire array, we would call 𝑆 (𝑛). Even though we
now have to deal with a vector, the implementation is similar:
// Invoked with sum(A, A.size())
int sum(const vi& A, int k) {
    if (k == 0) return 0;
    return A[k - 1] + sum(A, k - 1);
}
Exercise 7.3. Write a program that uses the recursive function to compute
Fibonacci numbers. Try computing all the Fibonacci numbers starting from 𝐹30
and upwards until the execution takes over 30 seconds. Take note of how long
your program takes. What complexity does the function seem to have?
From the above exercise, one thing should be clear – the running time is not a
linear function. In fact, it turns out to be exponential!
A simple lower bound is 2^{n/2} function calls, which we can prove by induction.
Let T(n) be the time taken to compute F_n. If T(n) ≥ 2^{n/2} for all n up to some
n_0 − 1 (with n_0 > 1), then

T(n_0) ≥ T(n_0 − 2) + T(n_0 − 1)
       = 2^{(n_0−2)/2} + 2^{(n_0−1)/2}
       ≥ 2^{(n_0−2)/2} + 2^{(n_0−2)/2}
       = 2^{1 + (n_0−2)/2}
       = 2^{n_0/2}
This lower bound is quite lax though – we can do better. Assume that
T(n) ≥ x^n for some real value x. As T(n) ≥ T(n − 2) + T(n − 1) = x^{n−1} + x^{n−2}
(within a constant term), we get the inequality

x^n ≥ x^{n−1} + x^{n−2}

which, after dividing by x^{n−2}, results in x^2 − x − 1 > 0. Solving the equation
x^2 − x − 1 = 0, we find that the inequality holds when x is greater than the
so-called golden ratio, 𝜙 = (1 + √5)/2 ≈ 1.618.
Exercise 7.4. Use the same inductive technique as before to prove that 𝑇 (𝑛) =
Ω(1.61𝑛 ) and 𝑇 (𝑛) = 𝑂 (1.62𝑛 ).
7.3 Choice
While all recursion is based on reducing a problem instance to a smaller instance
of the same problem, there are many different conceptual ways to do this. This
time, we look at problems involving choices of different kinds.
Stairs
Tasha the Kitty loves playing with the stairs at home while her caretakers are
at work. Her favorite game involves jumping up to the top of the stairs by
repeatedly skipping either 1 or 2 stairs at a time. She doesn’t like jumping on
the exact same sequence of stairs during two different climbs.
If the staircase has 1 ≤ 𝑛 ≤ 20 steps (including the top), in how many different
ways can she climb the stairs?
Solution. With such a small 𝑛, computing this efficiently is not the main issue;
computing it at all is. The trick lies in formulating Tasha’s jumping up the stairs
as a sequence of choices. After Tasha has jumped 𝑘 steps, she has two choices –
should her next jump be up a single stair to 𝑘 + 1, or two stairs to 𝑘 + 2? When
faced with such a problem, always ask yourself: what was Tasha's last choice, just
before she climbed up to the top of the stairs? Consider these two options in
Figure 7.3.
Figure 7.3: Tasha's two options for the last jump: a single step from stair 𝑛 − 1, or a double step from stair 𝑛 − 2.
If there are a total of 𝑛 stairs and Tasha’s last jump was a single step, then
she came from step 𝑛 − 1. Similarly, if she took two steps, she came from step
𝑛 − 2. These two options are exhaustive – there is no other way she could have
come to step 𝑛. They are also exclusive – we assumed that this was Tasha’s last
jump, so there is no overlap between these possibilities. This means that the
number of ways Tasha can get to the 𝑛’th step must be equal to the number of
ways she could get to the (𝑛 − 1)’st step, plus the number of ways she could get
to the (𝑛 − 2)’nd step.
A recursive procedure based on this insight is then straightforward:
1: procedure Stairs(𝑛)
2: if 𝑛 = 0 then
3: return 1
4: if 𝑛 = 1 then
5: return 1
6: return Stairs(𝑛 − 1) + Stairs(𝑛 − 2)
Note the base cases we added, for the case of an empty staircase or a single
stair. The time complexity of the solution is the same as that for Fibonacci, since
the recursion is the same.
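A direct C++ translation of the procedure might look as follows (a sketch; the naive exponential recursion is fine for 𝑛 ≤ 20):

```cpp
// Number of distinct ways to climb a staircase with n steps,
// jumping 1 or 2 steps at a time.
long long stairs(int n) {
    if (n == 0) return 1; // one way: the empty sequence of jumps
    if (n == 1) return 1;
    return stairs(n - 1) + stairs(n - 2);
}
```

Just as the analysis suggests, the values produced are the Fibonacci numbers shifted by one.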
The Plank
Swedish Olympiad in Informatics, School Qualifiers 2001
You want to construct a long plank using smaller wooden pieces. There are three
kinds of pieces, of lengths 1, 2 and 3 meters respectively, of which you have an
unlimited number each. You can glue together several of the smaller pieces
to create a longer plank.
If the plank should have length 𝑛 (1 ≤ 𝑛 ≤ 24) meters, in how many different
ways can you glue pieces together to get a plank of the right length?
Solution. The idea here is the same as in the Stairs problem. To compute the
size of the set of all possible planks, we find a recursive definition that reduces
the problem into counting the number of ways one can build some smaller
planks. For any given plank of length 𝑛, the rightmost piece of the plank has size
either 1, 2 or 3. This means that the number of ways in which we can construct
the plank is equal to the number of ways in which planks of sizes 𝑛 − 1, 𝑛 − 2
and 𝑛 − 3 can be constructed. While this isn't easier to compute directly, we can
apply the same reduction recursively to these smaller planks, ending up with a
very similar solution:
1: procedure PlankWays(𝑛)
2: if 𝑛 < 0 then
3: return 0
4: if 𝑛 = 0 then
5: return 1
6: return PlankWays(𝑛 − 1) + PlankWays(𝑛 − 2) + PlankWays(𝑛 − 3)
Again, we had to add a few base cases to give the recursion somewhere to stop.
The two base cases we picked here may be slightly less intuitive. We say that
there is a single way to construct a plank of length 0, and no ways to construct
negative-length planks.
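A C++ sketch of the same procedure might be:

```cpp
// Number of ways to build a plank of length n from pieces of lengths 1, 2 and 3.
long long plankWays(int n) {
    if (n < 0) return 0;  // no ways to build negative-length planks
    if (n == 0) return 1; // a single way: the empty plank
    return plankWays(n - 1) + plankWays(n - 2) + plankWays(n - 3);
}
```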
Exercise 7.5. The PlankWays algorithm has time complexity Ω(1.83^𝑛) and
𝑂 (1.84^𝑛). Prove this.
Problem 7.1
The Plank – plankan
Note: solve the first subtask for 50 points.
While these two problems in particular are much alike, many other recursive
problems also follow this template:
• formulate the solution as a sequence of choices,
• consider the last choice that was made, and
• find out if “backtracking” along that choice reduces the problem to smaller
instances of the same problem.
Now that we have warmed up, we are going to look at a slightly harder
recursive problem, where it is less obvious to figure out how to reduce the
problem to a smaller one.
Dominoes
In how many ways can a 2 × 𝑛 (1 ≤ 𝑛 ≤ 20) grid be tiled by 𝑛 dominoes, i.e.
bricks of size 1 × 2 or 2 × 1 such that no dominoes overlap?
Solution. Looking at the example tiling in Figure 7.5 might help us. Let us
denote the number of tilings of a 2 × 𝑛 grid by 𝑆 (𝑛). In general, a recursion
would somehow reduce the problem of computing 𝑆 (𝑛) to computing smaller
values of this function. By considering the rightmost domino of the example, a
partial solution idea should form. If the rightmost tile is placed vertically, the
remaining grid has size 2 × (𝑛 − 1), so there are 𝑆 (𝑛 − 1) such tilings. If it is
not placed vertically, the two rightmost squares must instead be occupied by two
horizontal tiles. In this case, the remaining grid would have size 2 × (𝑛 − 2),
meaning there would be 𝑆 (𝑛 − 2) ways to complete the remainder of the tiling
(see Figure 7.6).
Figure 7.6: The two resulting subproblems after covering the rightmost column.
Since these are the only two options, the total number of tilings must be
𝑆 (𝑛) = 𝑆 (𝑛 − 1) + 𝑆 (𝑛 − 2), and thus we get our recursive solution. Here
too we got the same recursion as the one for Fibonacci, with the same time
complexity.
In the next chapter on brute force, we will revisit this way of thinking in a
new light as we use recursion to solve optimization problems rather than simply
counting ways.
Varied Amusements
Marika and Lisa love going to amusement parks. This time, they have their
eyes set on a park with lots of exciting rides of three different types: tilt-a-whirls,
roller coasters and drop towers. There are 1 ≤ 𝑎 ≤ 10 different tilt-a-whirls,
1 ≤ 𝑏 ≤ 10 roller coasters and 1 ≤ 𝑐 ≤ 10 drop towers. They want to ride
1 ≤ 𝑛 ≤ 10 different rides in sequence, but never two rides of the same type in a
row. In how many ways can they choose such a sequence of 𝑛 rides?
Solution. On the surface, the problem is a prime candidate for the choice-strategy.
There are 𝑛 choices – what ride to go on each time. However, once we choose
the last ride the girls took, we are faced with a problem. If we chose a roller
coaster, the first 𝑛 − 1 rides may not end with a roller coaster. This is not a
smaller instance of the same problem, where the last ride could be any ride we
wanted. Instead, depending on the type of ride we choose as the last one, we get
three different problems: how many sequences of 𝑛 − 1 rides are there that do
not end with A) a tilt-a-whirl? B) a roller coaster? or C) a drop tower?
What happens if we apply the same strategy to these three new problems?
Well, in the problem where we have to choose an 𝑛 − 1 ride sequence that does
not end with a tilt-a-whirl, there are two options for the last ride. If it was a
roller coaster, we have to choose the remaining 𝑛 − 2 rides such that they do not
end with a roller coaster. If it was a drop tower, the remaining 𝑛 − 2 rides may
not end with a drop tower. Either way, both cases reduce to a smaller problem
of the other two types!
By introducing the three new problems 𝐴(𝑛), 𝐵(𝑛) and 𝐶 (𝑛), defined as the
number of ride sequences of length 𝑛 not ending in a tilt-a-whirl, roller coaster
or drop tower respectively, we can produce recursive definitions that refer only
to these recursions:
𝐴(𝑛) = 𝑏 · 𝐵(𝑛 − 1) + 𝑐 · 𝐶 (𝑛 − 1)
𝐵(𝑛) = 𝑎 · 𝐴(𝑛 − 1) + 𝑐 · 𝐶 (𝑛 − 1)
𝐶 (𝑛) = 𝑎 · 𝐴(𝑛 − 1) + 𝑏 · 𝐵(𝑛 − 1)
with the only required base case of 𝐴(0) = 𝐵(0) = 𝐶 (0) = 1. The answer then
becomes 𝑎 · 𝐴(𝑛 − 1) + 𝑏 · 𝐵(𝑛 − 1) + 𝑐 · 𝐶 (𝑛 − 1).
When implementing the solution in C++, don’t forget results from Exercises
2.20-2.21 to resolve the circular dependencies between functions calling each
other.
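With forward declarations, a sketch of the three mutually recursive functions could look like this (𝑎, 𝑏 and 𝑐 are assumed to be globals read from the input):

```cpp
int a, b, c; // number of tilt-a-whirls, roller coasters and drop towers

long long B(int n); // forward declarations resolve the circular dependencies
long long C(int n);

// Sequences of n rides not ending with a tilt-a-whirl:
long long A(int n) {
    if (n == 0) return 1;
    return b * B(n - 1) + c * C(n - 1);
}
// ... not ending with a roller coaster:
long long B(int n) {
    if (n == 0) return 1;
    return a * A(n - 1) + c * C(n - 1);
}
// ... not ending with a drop tower:
long long C(int n) {
    if (n == 0) return 1;
    return a * A(n - 1) + b * B(n - 1);
}

// The answer: choose the type of the last ride and recurse.
long long rideSequences(int n) {
    return a * A(n - 1) + b * B(n - 1) + c * C(n - 1);
}
```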
Exercise 7.7. Determine, with proof, the time complexity of the solution to
Varied Amusements.
Problem 7.2
Varied Amusements – variedamusements
Note: solve the first subtask for 1 point.
Chapter Exercises
Exercise 7.8. There are 𝑛 lines drawn in the plane, no three lines intersecting in
the same point. What is the number of connected regions they split the plane
into?
Problem 7.4
3 × 𝑛 Dominoes – 3xndominoes
Note: solve the first subtask for 2 points.
3-close Sets – 3close
Note: solve the first two subtasks for 2 points.
Even A’s, Odd B’s – evenaoddb
Note: solve the first two subtasks for 2 points.
2At a low level, modern processors are essentially a single execution loop with a stack for function-related memory.
Chapter Notes
Recursion as a problem solving technique is a common one both in mathematics
and algorithmics. In mathematics, there tends to be a larger focus on finding
closed forms for the recursions though, while we are happy with any kind of
efficient computation. There is a rich combinatorial theory behind finding such
closed forms. As previously mentioned, Concrete Mathematics[?] is one of the
mathematics books that really excels at teaching these techniques.
8 Graph Theory
We finish this foundational part with an introduction to graph theory, the study
of mathematical objects known as graphs. As a mathematical area of study, it
dates back to the early 1700s, when Euler first studied the famous Seven Bridges
of Königsberg problem. It is one of the most well-studied areas in algorithmic
problem solving, being one of only two topics (together with data structures) to
make an appearance in all three parts of this book. In almost every programming
contest you can find a problem relating to graphs.
8.1 Graphs
A graph is an abstract way of representing various types of relations, such
as roads between cities, friendships between people, network links between
computers and so on. Graphs are essentially a set of objects where certain
pairs of objects are connected. Formally, graphs are defined in the following
way.
Definition 8.1 A graph is a pair 𝐺 = (𝑉 , 𝐸) of a set 𝑉 of vertices and a set 𝐸
of edges, where each edge is a set {𝑢, 𝑣} of two vertices.

Figure 8.1: A graph with vertices 1, 2, 3, 4, 5.
Exercise 8.1. Draw the graphical representation of the graph with vertices
{𝑎, 𝑏, 𝑐, 𝑑 } and edges {{𝑎, 𝑏}, {𝑏, 𝑐}, {𝑐, 𝑑 }, {𝑎, 𝑑 }, {𝑏, 𝑑 }}.
Exercise 8.2. The graph on 𝑛 vertices containing all possible edges is called the
complete graph, or 𝐾𝑛 . How many edges does 𝐾𝑛 have?
Trip Planning
Lars is planning to do a backpacking tour by train throughout 𝑁 cities in Europe.
He has a list of the 𝑀 train lines that go back and forth between pairs of these
cities. He wants to visit the cities in the order 1, 2, . . . , 𝑁 , finally returning back
to his home in city 1.
Since Lars has limited vacation days, he only has time to take exactly 𝑁
direct trains during his trip. Can you determine if this is possible, and tell Lars
which trains to take?
Input
The first line contains the number of cities 𝑁 ≤ 106 that Lars wants to visit, and
𝑀 ≤ 106 , the number of direct trains.
The next 𝑀 lines each contain two integers 1 ≤ 𝑎 ≠ 𝑏 ≤ 𝑁 , indicating that
there is a train line traveling between cities 𝑎 and 𝑏. No two train lines will have
the same two integers.
Output
If Lars cannot perform his trip taking only 𝑁 trains, output no trip. Otherwise,
output the numbers of the 𝑁 train lines that Lars should take (in order of
travel), where the train lines are numbered from 1 to 𝑀 in the order they appear
in the input.
Solution. The problem essentially asks if there are direct train lines between
cities (1, 2), (2, 3), . . . , (𝑁 − 1, 𝑁 ), (𝑁 , 1). If there are, we want to find the
indices of these lines in the input. This is a typical problem that can be modelled
as a graph. In those terms, we have a graph on 𝑁 vertices with its 𝑀 edges given
in a list. We are asked if the graph contains a certain list of edges.
A possible solution would be to keep a vector of the indices of these particular
edges while we read the list of edges. If we find the edge {𝑘, 𝑘 + 1} for a given
𝑘, we can store the index of the edge in the 𝑘’th position in the vector. Only if
we managed to find every edge should we reply with their indices. Otherwise,
we would output no trip.
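A sketch of this solution, written as a function over the edge list (the names are ours):

```cpp
#include <vector>
#include <utility>
using namespace std;

// Returns the 1-based indices of the lines for the legs 1->2, ..., N-1->N, N->1,
// in travel order, or an empty vector if some leg lacks a direct train.
vector<int> tripPlan(int N, const vector<pair<int, int>>& lines) {
    vector<int> idx(N + 1, -1); // idx[k]: line covering the k'th leg of the trip
    for (int i = 0; i < (int)lines.size(); i++) {
        int a = lines[i].first, b = lines[i].second;
        if (a > b) swap(a, b);                // normalize so that a < b
        if (b == a + 1) idx[a] = i + 1;       // the edge {k, k + 1}
        if (a == 1 && b == N) idx[N] = i + 1; // the final edge {N, 1}
    }
    for (int k = 1; k <= N; k++)
        if (idx[k] == -1) return {}; // some leg is missing: "no trip"
    return vector<int>(idx.begin() + 1, idx.end());
}
```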
Problem 8.1
Trip Planning – tripplanning
Theorem 8.1
The sum of degrees of a graph 𝐺 = (𝑉 , 𝐸) is even. Specifically,
∑𝑣∈𝑉 deg(𝑣) = 2|𝐸|.
Exercise 8.3. Prove that in a simple graph of at least 2 vertices, there must exist
2 vertices of the same degree.
Problem 8.2
Given a graph, print all the vertices with the highest degrees.
Figure 8.2: A weighted graph: places such as F-field and G-grad, with edges labeled by distances in km (82, 37, 46, 50 and 15 km).
Finally, not all relations we model are symmetric in the way indicated by
graphs. In many situations, we would instead prefer if an edge could have a
certain direction, going from a vertex to another vertex. For example, when
modelling all the car roads in a city, certain roads may be one-way, a nuance the
simple graph would miss. We fix this by making edges ordered pairs rather than
sets:
Definition 8.4 A directed graph is a graph (𝑉 , 𝐸) where 𝐸 consists of directed
edges, i.e. ordered pairs 𝑒 = (𝑢, 𝑣) of vertices. The edge 𝑒 is called an
out-edge of 𝑢 and in-edge of 𝑣.
When representing directed graphs graphically, edges will be arrows, with the
arrowhead pointing from 𝑢 to 𝑣 (Figure 8.3).
4 3
1 2
Figure 8.3: The graph given by 𝑉 = {1, 2, 3, 4} and 𝐸 = { (1, 2), (3, 1), (4, 2), (4, 1) }.
Problem 8.4
Determine if there exists a directed triangle in a graph, where |𝐸| ≤ 100.
This latter representation is common when dealing with searches in the graph
corresponding to the positions in a game such as chess.
In the following sections, we present the representation of the directed,
unweighted graph in Figure 8.4.
4 3
1 2 5
Adjacency Matrices
An adjacency matrix represents a graph 𝐺 = (𝑉 , 𝐸) with |𝑉 | vertices as a 2D
|𝑉 | × |𝑉 | matrix, where the entry in row 𝑢, column 𝑣 is 1 if and only if the
edge (𝑢, 𝑣) exists. For the example graph, it looks like this:
0 1 0 0 0 0
0 0 1 0 0 0
0 0 0 1 0 0
0 1 0 0 0 0
0 0 0 0 1 0
0 0 0 0 0 0
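As a sketch, building such a matrix for a directed graph with 0-indexed vertices might look like:

```cpp
#include <vector>
#include <utility>
using namespace std;

// Build the adjacency matrix of a directed, unweighted graph on vertices 0..V-1.
vector<vector<bool>> adjacencyMatrix(int V, const vector<pair<int, int>>& edges) {
    vector<vector<bool>> adj(V, vector<bool>(V, false));
    for (const auto& e : edges)
        adj[e.first][e.second] = true; // the edge (u, v) sets row u, column v
    return adj;
}
```

Checking whether an edge exists is then a single Θ(1) lookup, at the cost of Θ(|𝑉|²) memory.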
Adjacency Lists
Another way to represent graphs is by storing lists of neighbours for every vertex.
This approach is called adjacency lists. This only requires Θ(|𝐸| + |𝑉 |) memory,
which is better when your graph has few edges, i.e. it is sparse. If you use
a vector to represent each list of neighbours, you also get Θ(1) addition and
removal (if you know the index of the edge and ignore their order) of edges,
but it takes 𝑂 (|𝑉 |) time to determine if an edge exists. On the upside, iterating
through the neighbours of a vertex takes time proportional to the number of
neighbours instead of the number of vertices in the graph. This means that
iterating through all the neighbours of all vertices takes time Θ(|𝐸| + |𝑉 |) instead
of Θ(|𝑉 |²) as for the adjacency matrix. For large, sparse graphs this is clearly
better!
When representing weighted graphs, the list usually stores the edges as pairs
of (neighbour, weight) instead. For undirected graphs, both endpoints of an
edge contain the other in their adjacency lists.
This representation is common in many graph search algorithms to be studied
in Chapter 14.
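A sketch of building adjacency lists for an undirected graph:

```cpp
#include <vector>
#include <utility>
using namespace std;

// Build adjacency lists of an undirected graph on vertices 0..V-1,
// using Θ(|E| + |V|) memory.
vector<vector<int>> adjacencyLists(int V, const vector<pair<int, int>>& edges) {
    vector<vector<int>> adj(V);
    for (const auto& e : edges) {
        adj[e.first].push_back(e.second); // each endpoint stores the other
        adj[e.second].push_back(e.first);
    }
    return adj;
}
```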
Adjacency Maps
An adjacency map combines the adjacency matrix with the adjacency list to
get the benefits of both the matrix (Θ(1) time to check if an edge exists) and
the lists (low memory usage and fast neighbourhood iteration). Instead of using
lists of neighbours for each vertex, we can use a hash table for each vertex.
This has the same time and memory complexities as the adjacency lists, but
it also allows for checking if an edge is present in Θ(1) time. The downsides are
that hash tables have a higher constant factor than the adjacency list, and that
you lose the ordering you have of your neighbours (if this is important). The
adjacency map also inherits another sometimes important property from the
matrix: you can remove arbitrary edges in Θ(1) time!
This representation is mostly used when one is dynamically modifying a
graph.
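Using the standard library, one hash set per vertex gives this representation (a sketch):

```cpp
#include <vector>
#include <unordered_set>
#include <utility>
using namespace std;

// One hash set of neighbours per vertex: Θ(1) average-time edge queries
// and removals, with adjacency-list-like memory usage.
vector<unordered_set<int>> adjacencyMap(int V, const vector<pair<int, int>>& edges) {
    vector<unordered_set<int>> adj(V);
    for (const auto& e : edges) {
        adj[e.first].insert(e.second);
        adj[e.second].insert(e.first);
    }
    return adj;
}
```

Note that iterating over an `unordered_set` visits the neighbours in no particular order, which is the loss of ordering mentioned above.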
Let us solve this problem inductively. First of all, what vertices have distance
0? Clearly, this is only the source vertex 𝑠 itself. This seems like a reasonable
base case, since the problem is about shortest paths from 𝑠. Then, what vertices
have distance 1? These are exactly those with a path consisting of a single edge
from 𝑠, meaning they are the neighbors of 𝑠 (marked in Figure 14.2).
Figure 14.2: The distances from 𝑠, computed one step at a time.
In fact, this reasoning generalizes to any particular distance: all the
vertices at distance exactly 𝑘 are those that have a neighbor of distance
𝑘 − 1 but have not themselves been assigned a smaller distance. Using this, we can
construct an algorithm to solve the problem. Initially, we set the distance of 𝑠
to 0. Then, for every dist = 1, 2, . . . , we mark all vertices that have a neighbor
with distance dist − 1 as having distance dist. This algorithm is called the
breadth-first search.
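In practice the distance layers are usually managed implicitly with a queue; a C++ sketch over adjacency lists might be:

```cpp
#include <vector>
#include <queue>
using namespace std;

// Distances from s to every vertex of the graph, or -1 for unreachable ones.
vector<int> bfs(const vector<vector<int>>& adj, int s) {
    vector<int> dist(adj.size(), -1);
    dist[s] = 0;
    queue<int> q;
    q.push(s);
    while (!q.empty()) {
        int v = q.front(); q.pop();
        for (int u : adj[v])
            if (dist[u] == -1) { // u gets its distance the first time it is seen
                dist[u] = dist[v] + 1;
                q.push(u);
            }
    }
    return dist;
}
```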
Exercise 8.6. Use the BFS algorithm to compute the distance to every square in
the following grid:
1: procedure BFS(𝐺, 𝑠)
2:   dist(𝑠) ← 0
3:   curVertices ← {𝑠}
4:   curDist ← 0
5:   while curVertices ≠ ∅ do
6:     nextVertices ← ∅
7:     for every 𝑣 ∈ curVertices do
8:       for every neighbour 𝑢 of 𝑣 without a distance do
9:         dist(𝑢) ← curDist + 1
10:        add 𝑢 to nextVertices
11:    curVertices ← nextVertices
12:    curDist ← curDist + 1
8-puzzle
In the 8-puzzle, 8 tiles are arranged in a 3 × 3 grid, with one square left empty.
A move in the puzzle consists of sliding a tile into the empty square. The goal
of the puzzle is to perform some moves to reach the target configuration. The
target configuration has the empty square in the bottom right corner, with the
numbers in order 1, 2, 3, 4, 5, 6, 7, 8 along the three rows.
8 · 6   8 6 ·   1 2 3
7 1 4   7 1 4   4 5 6
2 5 3   2 5 3   7 8 ·
Figure 8.8: An example 8-puzzle, with a valid move. The rightmost puzzle shows the target
configuration.
Given a puzzle, determine how many moves are required to solve it, or if it
cannot be solved.
This is a typical BFS problem, characterized by a starting state (the initial
puzzle), some transitions (the moves we can make), and the task of finding a
short sequence of transitions to some goal state. We can model this kind of
problem using a graph. The vertices represent the possible arrangements of
the tiles in the grid, and an edge connects two states if they differ by a single
move. A sequence of moves from the starting state to the target configuration
then represents a path in this graph. The minimum number of moves required is
the same as the distance between those vertices in the graph, meaning we can
use a BFS.
In such a problem, most of the code usually deals with the representation
of a state as a vertex, and generating the edges that a certain vertex is adjacent to.
When an implicit graph is given, we generally do not compute the entire graph
explicitly. Instead, we use the states from the problems as-is, and generate the
edges of a vertex only when it is being visited in the breadth-first search. In the
8-puzzle, we can represent each state as a 3 × 3 2D-vector. The difficult part is
generating all the states that we can reach from a certain state.
With the edge generation in hand, the rest of the solution is a normal BFS,
slightly modified to account for the fact that our vertices are no longer numbered
0, . . . , 𝑉 − 1. We can solve this by using e.g. maps instead.
8-puzzle BFS
int puzzle(const Puzzle& S, const Puzzle& target) {
    map<Puzzle, int> distances;
    distances[S] = 0;
    queue<Puzzle> q;
    q.push(S);
    while (!q.empty()) {
        Puzzle cur = q.front(); q.pop(); // copy, since pop() invalidates q.front()
        int dist = distances[cur];
        if (cur == target) return dist;
        for (const Puzzle& move : edges(cur)) {
            if (distances.find(move) != distances.end()) continue;
            distances[move] = dist + 1;
            q.push(move);
        }
    }
    return -1;
}
Besides this kind of search problem, which can be solved using a BFS directly,
some problems require modifications of a BFS, or use the distances generated
only as an intermediary result.
Shortest Cycle
Compute the length of the shortest simple cycle in a graph.
Problem 8.5
Button Bashing – buttonbashing
The depth-first search instead proceeds by, at every step, trying to plunge deeper into the graph.
This order is called the depth-first order. More precisely, the search starts at
some source vertex 𝑠. Then, any neighbor of 𝑠 is chosen to be the next vertex 𝑣.
Before visiting any other neighbor of 𝑠, we first visit any of the neighbours of 𝑣,
and so on.
Implementing the depth-first search is usually done with a recursive function,
using a vector seen to keep track of visited vertices:
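A minimal sketch of such a recursive implementation, with the graph and the seen vector as globals:

```cpp
#include <vector>
using namespace std;

vector<vector<int>> adj; // adjacency lists of the graph
vector<bool> seen;       // seen[v]: whether the search has visited v yet

void dfs(int v) {
    if (seen[v]) return;
    seen[v] = true;
    for (int u : adj[v])
        dfs(u); // plunge deeper before considering v's other neighbours
}
```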
Coast Length
KTH Challenge 2011 – Ulf Lundström
The residents of Soteholm value their coast highly and therefore want to maximize
its total length. For them to be able to make an informed decision on their
position in the issue of global warming, you have to help them find out whether
their coastal line will shrink or expand if the sea level rises. From height maps
they have figured out what parts of their islands will be covered by water, under
the different scenarios described in the latest IPCC report on climate change,
but they need your help to calculate the length of the coastal lines.
Figure 8.9: Gray squares are land and white squares are water. The thick black line is the sea
coast.
Solution. We can consider the grid as a graph, where all the water squares are
vertices, and two squares have an edge between them if they share an edge. If
we surround the entire grid by water tiles (a useful trick to avoid special cases
in this kind of grid problem), the sea consists exactly of those vertices that are
connected to these surrounding water tiles. This means we need to compute the
vertices which lie in the same connected component as the sea – a typical DFS
task1. After computing this component, we can determine the coast length by
looking at all the squares which belong to the sea. If such a square shares an edge
with a land tile, that edge contributes 1 km to the coast length.
const vpi moves = {pii(-1, 0), pii(1, 0), pii(0, -1), pii(0, 1)};

int coastLength(const vector<vector<bool>>& G) {
    int H = sz(G) + 4;
    int W = sz(G[0]) + 4;
    vector<vector<bool>> G2(H, vector<bool>(W, true));
    rep(i,0,sz(G)) rep(j,0,sz(G[i])) G2[i+2][j+2] = G[i][j];
    vector<vector<bool>> sea(H, vector<bool>(W));
1This particular application of DFS, i.e. computing a connected area in a 2D grid, is called a flood fill.
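A complete, self-contained variant of the same idea might be sketched as follows (ours, not the draft's listing; here true marks water, and the water border is built in):

```cpp
#include <vector>
#include <utility>
using namespace std;

// Coast length (in grid edges) of a map where true marks water and false land;
// only water connected to the outside counts as sea.
int coastLength(const vector<vector<bool>>& grid) {
    int H = grid.size() + 2, W = grid[0].size() + 2;
    // Surround the map with water to avoid special cases at the border.
    vector<vector<bool>> water(H, vector<bool>(W, true));
    for (int i = 0; i < (int)grid.size(); i++)
        for (int j = 0; j < (int)grid[i].size(); j++)
            water[i + 1][j + 1] = grid[i][j];
    int dx[] = {-1, 1, 0, 0}, dy[] = {0, 0, -1, 1};
    // Flood fill the sea from the border, iteratively to avoid deep recursion.
    vector<vector<bool>> sea(H, vector<bool>(W, false));
    vector<pair<int, int>> stk = {{0, 0}};
    sea[0][0] = true;
    while (!stk.empty()) {
        auto [x, y] = stk.back();
        stk.pop_back();
        for (int d = 0; d < 4; d++) {
            int nx = x + dx[d], ny = y + dy[d];
            if (nx < 0 || nx >= H || ny < 0 || ny >= W) continue;
            if (!water[nx][ny] || sea[nx][ny]) continue;
            sea[nx][ny] = true;
            stk.push_back({nx, ny});
        }
    }
    // Every edge shared by a sea square and a land square is coast.
    int coast = 0;
    for (int x = 0; x < H; x++)
        for (int y = 0; y < W; y++)
            if (sea[x][y])
                for (int d = 0; d < 4; d++) {
                    int nx = x + dx[d], ny = y + dy[d];
                    if (0 <= nx && nx < H && 0 <= ny && ny < W && !water[nx][ny])
                        coast++;
                }
    return coast;
}
```

Because lakes are never reached by the flood fill from the border, their shores correctly contribute nothing.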
Problem 8.6
Mårten’s DFS – martensdfs
8.5 Trees
A tree is a special kind of graph – a connected graph which does not contain any
cycle. The graph in Figure ?? is not a tree, since it contains the cycle 1, 2, 4, 1.
The graph in Figure 8.10 on the other hand, contains no cycle.
Figure 8.10: The tree given by 𝑉 = {1, 2, 3, 4} and 𝐸 = { {1, 2}, {3, 1}, {4, 1} }.
Chapter Exercises
Given a graph, determine how many edges must be added to make it regular.
Chapter Notes
Part II
Basics
9 Brute Force
Many problems are solved by testing a large number of possibilities. For example,
chess engines work by testing countless variations of moves and choosing the
ones resulting in the “best” positions. This approach is called brute force. Brute
force algorithms exploit that computers are fast, resulting in you having to be less
smart. Just as with chess engines, brute force solutions might still require some
ingenuity. A brute force problem might have a simple algorithm which requires
a computer to evaluate 240 options, while some deeper analysis might be able to
reduce this to 220 . This would be a huge reduction in running time. Different
approaches to brute force may be the key factor in reaching the latter case instead
of the former. In this chapter, we look at four different techniques used to solve
brute force problems, ranging from the simplicity of just evaluating every single
option to an advanced memory-time tradeoff called meet-in-the-middle.
The focus of this chapter and the chapters on Greedy Algorithms (Chapter 10)
and Dynamic Programming (Chapter 11) is to develop techniques that exploit
particular structures of optimization problems to avoid evaluating the entire set
𝑆.
Max Clique
In a graph, a subset of the vertices forms a clique if each pair of vertices is
connected by an edge.
Figure 9.1: An example graph.
Given a graph on 𝑉 vertices and 𝐸 edges, determine the size of the largest
clique.
Solution. The candidate solutions are the subsets of vertices; we can generate
every subset and test whether it is a clique. If it is, its size must be computed
and the largest clique chosen.
In the Max Clique problem, there are only 2^𝑉 subsets of vertices (and
candidate solutions) – a quite small number. Given such a set, we can verify
whether it is a clique in 𝑂 (𝑉²) time by checking if every pair of vertices in the
candidate set has an edge between them. To perform this check in Θ(1) time, we
keep a 2D vector 𝑎𝑑𝑗 such that 𝑎𝑑𝑗 [𝑖] [ 𝑗] is true if and only if vertices 𝑖 and 𝑗
are adjacent to each other. This gives us a total complexity of Θ(2^𝑉 · 𝑉²) in
the worst case. According to our table of approximate allowed input sizes for
various complexities (p. 88), this should be fast enough for 𝑉 = 15.
Max Clique
int V, E;
cin >> V >> E;
vector<vector<bool>> adj(V, vector<bool>(V));
rep(i,0,E) {
    int a, b;
    cin >> a >> b;
    adj[a][b] = adj[b][a] = true;
}
rep(i,0,V) adj[i][i] = true;

int ans = 0;
rep(subset,0,1<<V) {
    bool isClique = true;
    rep(i,0,V) {
        // Skip if the subset does not contain i
        if ((subset & (1 << i)) == 0) continue;
        rep(j,0,V) {
            // Skip if the subset does not contain j
            if ((subset & (1 << j)) == 0) continue;
            if (!adj[i][j]) {
                // The subset contains both i and j, but they are not neighbours.
                isClique = false;
            }
        }
    }
    if (isClique) {
        ans = max(ans, __builtin_popcount(subset));
    }
}
cout << ans << endl;
Note the nifty use of integers interpreted as bitsets to easily iterate over every
possible subset of a 𝑉 -element set, a common technique in generate and test
solutions based on subsets.
Problem 9.1
Max Clique – maxclique
This kind of brute force problem is often easy to spot. There will be a very
small input limit on the parameter you are to brute force over. The solution will
often be subsets of some larger base set (such as the vertices of a graph), or
combinations of several small sets.
Problem 9.2
4 thought – 4thought
Lifting Walls – walls
Let us look at another example of this technique, where the answer is not
just a subset.
The Clock
Swedish Olympiad in Informatics 2004, School Qualifiers (CC BY-SA 3.0)
When someone asks you what time it is, most people respond “a quarter past
five”, “15:29” or something similar. If you want to make things a bit harder, you
can answer with the angle from the minute hand to the hour hand, since this
uniquely determines the time. However, most people are not used to this way of
specifying the time, so it would be nice to have a program which translates this
to a more common format.
Figure 9.2: The angle between the hands at different times; the marked angles are 75◦, 105◦ and 180◦.
We assume that our clock has no seconds hand, and only displays the time
at whole minutes (i.e., both hands only move forward once a minute). The angle
is determined by starting at the hour hand and measuring the number of degrees
clockwise to the minute hand. To avoid decimals, this angle is specified in tenths
of a degree.
Input
The first and only line of input contains a single integer 0 ≤ 𝐴 < 3600, the angle
specified in tenths of a degree.
Output
Output the time in the format hh:mm between 00:00 and 11:59.
It is difficult to come up with a formula that gives the correct times as a function
of the angles between the hands on a clock. Instead, we can turn the problem
around. If we know what the time is, can we compute the angle between the two
hands of the clock?
Assume that the time is currently ℎ hours and 𝑚 minutes. The minute hand
is then at angle (360/60)𝑚 = 6𝑚 degrees clockwise from straight up. Similarly, the
hour hand moves (360/12)ℎ = 30ℎ degrees clockwise after ℎ whole hours, with an
extra (360/12)(1/60)𝑚 = 0.5𝑚 degrees added due to the minutes. While computing the
current time directly from the angle is difficult, computing the angle from the
current time is easy.
The brute force solution is to test the 60 · 12 = 720 different times, and pick
one that yields the given angle:
1: procedure Clock(𝐴)
2:   for ℎ ← 0 to 11 do
3:     for 𝑚 ← 0 to 59 do
4:       hourAng ← 300ℎ + 5𝑚 ⊲ Angles are in tenths of degrees, to avoid half degrees
5:       minuteAng ← 60𝑚
6:       angBetween ← (minuteAng − hourAng + 3600) mod 3600
7:       if angBetween = 𝐴 then
8:         return ℎ:𝑚
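A C++ sketch of this brute force, returning the time packed as ℎ · 100 + 𝑚 (the packing is our choice, made to keep the example short):

```cpp
// Given the angle A from the hour hand to the minute hand, in tenths of
// degrees, find the matching time by trying all 720 possibilities.
int clockTime(int A) {
    for (int h = 0; h < 12; h++)
        for (int m = 0; m < 60; m++) {
            int hourAng = 300 * h + 5 * m; // tenths of degrees
            int minuteAng = 60 * m;
            int between = (minuteAng - hourAng + 3600) % 3600;
            if (between == A) return h * 100 + m;
        }
    return -1; // no time matches (cannot happen for valid input)
}
```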
Exercise 9.2. Can there be two times that produce the same angle? If yes,
produce such an example. If no, prove that there are no two such times.
Competitive Tip
Competitions sometimes pose problems which can be solved quite fast, but where a
brute force algorithm suffices. Code the simplest correct solution that is fast enough,
even if you see a faster one.
Problem 9.3
The Clock – theclock
All about that base – allaboutthatbase
Perket – perket
9.3 Backtracking
Backtracking is a variation of the generate and test method. It can be much faster
than a generate and test solution, but it is not always applicable and is sometimes
more difficult to code (in particular when the solutions are subsets).
Consider our solution to the Max Clique problem. In our solution, we
generated all the candidate solutions (i.e., subsets of vertices) by using bitsets.
When solving problems where the candidate solutions are other objects than
subsets, or the number of subsets is too large to iterate through, we need to
construct the solutions in another way. Generally, we do this recursively.
When generating subsets, we would go through every element one at a time,
deciding whether to include it or not in a recursive fashion. Backtracking extends
Figure 9.3: The beginning of a backtracking tree that generates subsets of {1, 2} by including or excluding one element at a time. The partial candidate {1, 2} failed, so it was not extended further.
generate and test by testing not only the complete candidate solutions, but also
partial candidates. Once a partial candidate is identified as being infeasible –
for the clique example, once we include two non-neighbouring vertices – the
backtracking can stop early. If there are much fewer valid partial candidates
than total candidates checked by a generate and test approach, this saves time.
In Figure 9.3 an example of this approach is demonstrated. The example
illustrates the beginning of a backtracking recursion that generates subsets.
Subsets are recursively created by either including or excluding every element,
one at a time. One subset, {1, 2}, was identified as not being permissible – for
example, because the vertices it represents in a clique problem were not
neighbours. Thus, no further backtracking was performed.
As a concrete example, consider a variant of the clique problem, where we
are interested in computing the number of cliques with at most 6 vertices. In
this case, we can solve the problem for larger instances than what generating
all possible subsets and testing them allows us to. The trick is that while there
are 2^𝑉 subsets of 𝑉 vertices, there are only 𝑂 (𝑉^6) subsets containing up to 6
vertices (see Chapter 17 on why this is). Thus, if we can restrict ourselves to
never generating any other kind of subset, the solution ought to be much faster.
This generation is easily implemented using backtracking:
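A sketch of such a backtracking generator (ours; it also counts the empty set as a clique):

```cpp
#include <vector>
using namespace std;

int V;
vector<vector<bool>> adj; // adjacency matrix of the graph
long long cliques = 0;    // number of cliques with at most 6 vertices

// Extend the current clique `cur` with vertices from at..V-1.
void countCliques(vector<int>& cur, int at) {
    cliques++; // cur itself is a clique (the empty set is counted once)
    if ((int)cur.size() == 6) return; // never grow a candidate beyond 6 vertices
    for (int v = at; v < V; v++) {
        bool ok = true;
        for (int u : cur)
            if (!adj[u][v]) { ok = false; break; } // v must neighbour all of cur
        if (ok) { // only feasible partial candidates are extended
            cur.push_back(v);
            countCliques(cur, v + 1);
            cur.pop_back();
        }
    }
}
```

Passing `at` ensures each subset is generated exactly once, in increasing vertex order.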
How fast is this solution? Analysis is slightly tricky, but we can give an
upper bound that is small enough. If we have about 40 vertices, there are about
7.6 · 10^5 subsets with at most 5 vertices. Since no subset requires more than 40
recursive calls to construct it (one for each vertex in the set), constructing these
subsets requires no more than about 3.0 · 10^7 recursive calls. Each subset of 6
elements is constructed from one of these subsets and results in only a single
additional recursive call, so in total the function is invoked at most 6.0 · 10^7
times. The function performs only a few constant-time operations, so this should
be fine. Compare this with a non-backtracking generate and test solution, which
would need to construct about 1.1 · 10^12 subsets instead – clearly too much.
Backtracking in principle works if we can:
Problem 9.4
6-cliques – 6clique
Class Picture – classpicture
Boggle – boggle
Geppetto – geppetto
Map Colouring – mapcolouring
Picking Apples – apples
Input
The first line contains an integer 𝑘 (0 ≤ 𝑘 ≤ 15), giving the number of drones
to position. Then follows one line with 1 ≤ 𝑛 ≤ 100 000, the total number
of intersections in Basin City. Finally follow 𝑛 lines describing consecutive
intersections. The 𝑖’th line describes the 𝑖’th intersection in the following
format: The line starts with one integer 𝑑 (0 ≤ 𝑑 ≤ 4) describing the number
of intersections neighbouring the 𝑖’th one. Then follow 𝑑 integers denoting
the indices of these neighbouring intersections. They will all be distinct and
different from 𝑖. The intersections are numbered from 1 to 𝑛.
Output
If it is possible to position 𝑘 drones such that no two neighbouring intersections
have been assigned a drone, output a single line containing possible. Otherwise,
output a single line containing impossible.
At first glance, it is not even obvious whether the problem is a brute force
problem, or if some smarter principle should be applied. After all, 100 000
intersections is a huge number of intersections! We can make the problem a bit
more reasonable with our first insight. If we have a large number of intersections,
and every intersection is adjacent to very few other intersections, it is probably
very easy to place the drones at appropriate intersections. To formalize this
insight, consider what happens when we place a drone at an intersection.
Placing a drone at an intersection removes that intersection and its at most four
neighbours from further consideration. Having removed these at most
five intersections, we would be left with a new city where we need to place
𝑘 − 1 drones. This simple fact – which is the basis of a recursive solution to
the problem – tells us that if we have 𝑛 ≥ 5𝑘 − 4 intersections, we immediately
know the answer is possible. The −4 term comes from the fact that when
placing the final drone, we no longer care about removing its neighbourhood,
since no further placements will take place.
Therefore, we can assume that the number of intersections is less than
5 · 15 − 4 = 71, i.e., 𝑛 ≤ 70. This certainly makes the problem seem much more
tractable. Now, let us start developing solutions to the problem.
First of all, we can attempt to use the same algorithm as we used for the
Max Clique problem. We could recursively construct the set of our 𝑘 drones
by, for each intersection, trying to either place a drone there or not. If placing a
drone at an intersection, we would forbid placing drones at any neighbouring
intersection.
Unfortunately, this means that we test every intersection when placing a
certain drone somewhere. This would give us a complexity of 𝑂(𝑛^𝑘). More
specifically, the execution time 𝑇(𝑛, 𝑘) would satisfy 𝑇(𝑛, 𝑘) ≈ 𝑇(𝑛 − 1, 𝑘) +
𝑇(𝑛 − 1, 𝑘 − 1), which implies 𝑇(𝑛, 𝑘) ≈ 𝑛^𝑘 = Ω(𝑛^𝑘) (see Section ?? for more
details). For 𝑛 = 70, 𝑘 = 15, this is too high. The values of 𝑛 and 𝑘 do
suggest that an exponential complexity is in order, just not of this kind. Instead,
something similar to 𝑂(𝑐^𝑘) where 𝑐 is a small constant would be a better fit. One
way of achieving such a complexity would be to limit the number of intersections
we must test to place a drone at before trying one that definitely works. If we
could manage to test only 𝑐 such intersections, we would get a complexity of
𝑂(𝑐^𝑘).
Competitive Tip
In this problem, we tried to use the sizes of the parameters 𝑛 and 𝑘 together with the
time limit to guide the kind of solution we need to design. While this works most of
the time, note that it can sometimes be severely misleading – as this problem was,
before we realized that having 100 000 intersections was a red herring.
The trick, yet again, comes from Figure 9.4. Assume that we choose to
include the black intersection in our solution, but still cannot construct a solution.
The only reason this can happen (aside from bad previous choices) is that
no optimal solution includes this intersection. What could possibly stop this
intersection from being included in an optimal solution? It must be because
one of its gray neighbours is included in every optimal solution. If this was not
the case, then we could just pick an optimal solution where none of the gray
intersections were included and instead include the black vertex. Fortunately for
us, this gives us just what we need to improve our algorithm – either a given
intersection, or one of its neighbours, must be included in any optimal solution.
We have accomplished our goal of reducing the number of intersections to
test for each drone to a mere 5, which will give us a complexity of about 𝑂(5^𝑘)
(possibly with an additional polynomial factor in 𝑛 depending on implementation).
This is still too much unless, as the jury noted, some “clever heuristics” are
applied. Fortunately for us, applying a common principle will speed things up
dramatically (even giving us a better time complexity).
Our next trick is to assume that the graph we are working with is connected.
In many problems the connected components of the graph are all independent of
each other. This is also the case in this problem. Clearly placing drones in one
component does not affect how we place drones in any other component, so we
can solve them separately by computing the maximal number of drones we can
place in every such set, until we have processed enough sets to place 𝑘 drones.
How does the connectedness help us? Consider what happens when we have
placed our first drone on the black intersection as in Figure 9.4. By removing
it and the gray neighbours, the white intersections must now have at most 3
neighbours instead. Recursing on one of the white intersections would then
leave us with only 4 choices, placing a drone on either the white vertex or one
of its (at most 3) neighbours. In fact, we can extend this reasoning to show
that there must always be an intersection with at most 3 neighbours! Proving
this is a straightforward proof by contradiction. Assume that we in a connected
graph have placed at least one drone, but all remaining intersections have four
neighbours. Then, none of these intersections can be neighbours with any vertex
we have removed so far. This means that the set of intersections removed and the
set of intersections remaining are actually disconnected, contrary to our
assumption of connectedness. Taking this insight to its conclusion, we achieve
a complexity of 𝑂(4^𝑘) by always branching on the intersection with the fewest
neighbours.
While such an algorithm is significantly faster than 𝑂(5^𝑘), further improvements
are possible. Again, let us consider under what circumstances a certain
intersection is excluded from any optimal solution. We have already concluded
that if this is the case, then one of its neighbours must be included in any optimal
solution. Can it ever be the case that only one of its neighbours is included in
an optimal solution, as in Figure 9.5, where we chose to place a drone on the
black vertex but on none of the white intersections?
This is actually never the case. We can always move the drone from the
black intersection to its white neighbour, since none of the other white
intersections contain a drone. Now, we are basically done; for
any intersection, there will either be an optimal solution including it, or (at
least) two of its neighbours. Since an intersection has at most 4 neighbours,
it has at most 6 pairs of neighbours. This means our recursion will take time
𝑇(𝑘) = 𝑇(𝑘 − 1) + 6𝑇(𝑘 − 2) in the worst case. This recurrence has the solution
3^𝑘, since 3^(𝑘−1) + 6 · 3^(𝑘−2) = 3^(𝑘−1) + 2 · 3^(𝑘−1) = 3 · 3^(𝑘−1) = 3^𝑘. A final improvement
would be to combine this insight with the independence of the connected subsets
of intersections. The second term of the time recurrence would then be a 3
The general version of this problem (without the bounded degree) is called
Independent Set, and is also one of the NP-complete problems.
So, what is the take-away regarding backtracking? Start by finding a way to
construct candidate solutions iteratively. Then, try to integrate the process of
testing the validity of a complete solution with the iterative construction, in the
hope of significantly reducing the number of candidate solutions which need
evaluating. Finally, we might need to use some additional insights, such as what
to branch on (which can be something complicated like the neighbourhood of a
vertex), deciding whether to backtrack or not (i.e., improving the testing part) or
reducing the number of branches necessary (speeding up the generation part).
Problem 9.6
Domino – domino
Fruit Baskets – fruitbaskets
Infiltration – infiltration
Vase Collection – vase
9.4 Fixing Parameters
Buying Books
Swedish Olympiad in Informatics 2010, Finals
You are going to buy 𝑁 books, and are currently checking the prices at the 𝑀
different internet book stores. Each book is sold by at least one book store,
and its price may vary between the different stores. Furthermore, each book
store incurs a postage fee if you order from it. Postage may vary between the
various book stores, but for a given book store it is always the same, no matter
how many books you decide to order. You may order any number of books from any
number of the book stores. Compute the smallest amount of money you need to
pay for all the books.
Input
The first line contains two integers 1 ≤ 𝑁 ≤ 100 – the number of books, and
1 ≤ 𝑀 ≤ 15 – the number of book stores.
Then, 𝑀 descriptions of the book stores follow. The description of the 𝑖’th
store starts with a line containing two integers 0 ≤ 𝑃𝑖 ≤ 1 000 (the postage for
this book store), and 1 ≤ 𝐿𝑖 ≤ 𝑁 (the number of books this store sells). The
next 𝐿𝑖 lines contain the books sold by the store. The 𝑗’th book is described by
two integers 0 ≤ 𝐵𝑖,𝑗 < 𝑁 – the (zero-indexed) number of a book being sold
here, and 1 ≤ 𝐶𝑖,𝑗 ≤ 1000 – the price of this book at the current book store.
Output
Output a single integer – the smallest amount of money you need to pay for the
books.
If we performed naive generate and test on this problem, we would probably
get something like 15^100 candidate solutions, by testing every book store for every book.
This is infeasible. So, why can we do better than this? There must be some hidden
structure in the problem that makes testing all those possibilities unnecessary.
To find this structure, we will analyze a candidate solution as given by the naive
generate and test method, i.e. an assignment of each book to the book store
where we should purchase it.
For the sake of example, let’s assume that in this candidate solution, we
purchased books from the book stores 1, 4 and 5. If we then purchased a book
from store 4, but it was actually cheaper from store 1, we should have picked it
from there instead. Thus there seems to be quite a bit of redundancy in this set
of candidate solutions – a strong hint that we might have found some crucial
insight. We could decide to use this fact to turn our generate and test into a
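Concretely, once we fix the set of stores to order from, each book should simply be bought from the cheapest chosen store that sells it, so only the 2^𝑀 ≤ 32 768 store sets need to be generated. A sketch of this idea, where postage and price (with a large INF marking books a store does not sell) are assumed to have been parsed from the input:

```cpp
#include <bits/stdc++.h>
using namespace std;

const long long INF = 1e18;

// Brute force over the fixed parameter: the set of stores we order from.
// For each such set, every book is bought at the cheapest chosen store.
long long cheapest(int N, int M, const vector<long long>& postage,
                   const vector<vector<long long>>& price) {
    long long best = INF;
    for (int mask = 1; mask < (1 << M); mask++) {  // every non-empty set of stores
        long long cost = 0;
        for (int i = 0; i < M; i++)
            if (mask & (1 << i)) cost += postage[i];
        for (int b = 0; b < N && cost < INF; b++) {
            long long cheapestPrice = INF;
            for (int i = 0; i < M; i++)
                if (mask & (1 << i))
                    cheapestPrice = min(cheapestPrice, price[i][b]);
            if (cheapestPrice == INF) { cost = INF; break; }  // book unavailable
            cost += cheapestPrice;
        }
        best = min(best, cost);
    }
    return best;
}
```

With 𝑁 ≤ 100 and 𝑀 ≤ 15 this performs on the order of 2^15 · 100 · 15 ≈ 5 · 10^7 cheap operations.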
9.5 Meet in the Middle
The parameter to brute force over is not always this explicit, as in the following
problem, which asks us to find all integer solutions to an equation in a certain
interval.
Problem 9.7
Buying Books – buyingbooks
Integer Equation
Codeforces Round #262, Problem B
Given integers 𝑎 (|𝑎| ≤ 10 000), 𝑏 (1 ≤ 𝑏 ≤ 5), and 𝑐 (|𝑐 | ≤ 10 000), determine
the integers 𝑥 (1 ≤ 𝑥 ≤ 10^9) satisfying

𝑥 = 𝑎 · 𝑠(𝑥)^𝑏 + 𝑐,

where 𝑠(𝑥) denotes the digit sum of 𝑥.
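Here the parameter to brute force over turns out to be 𝑠(𝑥) itself: the digit sum of a number up to 10^9 is at most 81, and fixing 𝑠(𝑥) = 𝑠 forces 𝑥 = 𝑎 · 𝑠^𝑏 + 𝑐. A sketch of this idea (the function name and output format are illustrative):

```cpp
#include <bits/stdc++.h>
using namespace std;

// For every possible digit sum s (at most 81 for x <= 1e9), the equation
// forces x = a * s^b + c; we then only need to verify that this x really
// has digit sum s and lies in the allowed interval.
vector<long long> integerEquation(long long a, long long b, long long c) {
    vector<long long> answers;
    for (long long s = 1; s <= 81; s++) {
        long long p = 1;
        for (int i = 0; i < b; i++) p *= s;   // s^b, with b <= 5
        long long x = a * p + c;
        if (x < 1 || x > 1000000000) continue;
        long long digitSum = 0;
        for (long long t = x; t > 0; t /= 10) digitSum += t % 10;
        if (digitSum == s) answers.push_back(x);
    }
    sort(answers.begin(), answers.end());
    answers.erase(unique(answers.begin(), answers.end()), answers.end());
    return answers;
}
```

Only 81 candidates are ever tested, instead of the 10^9 values of 𝑥.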
build some fast data structure such that when testing the other half of the
parameter space, we can quickly find the best parameters for the first half. It is a
space-time tradeoff, in the sense that we improve the time usage (testing half of
the parameter space much faster), by paying with increased memory usage (to
save the pre-computed structures).
Subset Sum
Given a set of integers 𝑆, is there some subset 𝐴 ⊆ 𝑆 with a sum equal to 𝑇 ?
Input
The first line contains two integers: 𝑁 , the size of 𝑆, and 𝑇 . The next line contains
𝑁 integers 𝑠 1, 𝑠 2, ..., 𝑠 𝑁 , separated by spaces – the elements of 𝑆. It is guaranteed
that 𝑠𝑖 ≠ 𝑠 𝑗 for 𝑖 ≠ 𝑗.
Output
Output possible if such a subset exists, and impossible otherwise.
In this problem, a simple generate and test solution would have 𝑁 parameters
to brute force over. For each element of 𝑆, we either choose to include it in 𝐴 or
not – a total of two choices for each parameter. This naive attempt at solving
the problem (which amounts to computing the sum of every subset) gives us
a complexity of 𝑂(2^𝑁). While sufficient for e.g. 𝑁 = 20, we can make an
improvement that makes the problem tractable even for 𝑁 = 40.
To figure out if a meet in the middle solution is applicable, the two halves of
the parameter space must to a large extent be independent. Individual choices in
one half should have little effect on the other. This is for example not the case
in the max clique problem. Deciding what vertices to include from, say, the
lower-numbered half will put very complicated constraints on what vertices we
could pick from the other half. Such a situation should discourage you from
attempting to meet in the middle.
In the subset sum problem, our parameters are to a large extent independent.
When fixing the first 𝑁/2 parameters, which may mean we include elements with
a sum of 𝑈, a single constraint is placed on the remaining 𝑁/2 parameters; they
must sum to 𝑇 − 𝑈, to together make the correct sum.
Thus, if we could quickly answer the question “can we choose the latter half
of the integers such that they have a given sum?” we could solve the problem by
fixing the first half of the parameters. Individually, each such question takes
𝑂(2^(𝑁/2)) time to answer if we use brute force. However, we can compute the answer for
all such questions in one go by computing the sum of every subset of the latter
half of elements in Θ(𝑁 · 2^(𝑁/2)). The resulting sums can be inserted into a hash
set, which lets us look up whether the required sum 𝑇 − 𝑈 exists among them in
(expected) constant time.
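A sketch of the resulting meet in the middle solution:

```cpp
#include <bits/stdc++.h>
using namespace std;

// Meet in the middle for subset sum: enumerate all subset sums of the
// latter half into a hash set, then for every subset sum U of the first
// half look up whether T - U appears among them.
bool subsetSum(const vector<long long>& s, long long T) {
    int n = s.size(), h = n / 2;
    unordered_set<long long> sums;
    for (int m = 0; m < (1 << (n - h)); m++) {  // subsets of the latter half
        long long sum = 0;
        for (int i = 0; i < n - h; i++)
            if (m & (1 << i)) sum += s[h + i];
        sums.insert(sum);
    }
    for (int m = 0; m < (1 << h); m++) {        // subsets of the first half
        long long U = 0;
        for (int i = 0; i < h; i++)
            if (m & (1 << i)) U += s[i];
        if (sums.count(T - U)) return true;
    }
    return false;
}
```

Note that the empty subset is included on both sides; if the problem requires a non-empty 𝐴, the case 𝑇 = 0 needs special handling.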
Problem 9.9
Maximum Loot – maxloot
We will end the chapter by solving a brute force problem combining two
techniques.
Limited Correspondence
Greg Hamerly
Emil, a Polish mathematician, sent a simple puzzle by post to his British friend,
Alan. Alan sent a reply saying he didn’t have an infinite amount of time he could
spend on such non-essential things. Emil modified his puzzle (making it a bit
more restricted) and sent it back to Alan. Alan then solved the puzzle.
Here is the original puzzle Emil sent: given a sequence of pairs of strings
(𝑎 1, 𝑏 1 ), (𝑎 2, 𝑏 2 ), . . ., (𝑎𝑘 , 𝑏𝑘 ), find a non-empty sequence 𝑠 1 , 𝑠 2 , . . ., 𝑠𝑚 such
that the following is true:

𝑎𝑠1 𝑎𝑠2 . . . 𝑎𝑠𝑚 = 𝑏𝑠1 𝑏𝑠2 . . . 𝑏𝑠𝑚

where 𝑎𝑠1 𝑎𝑠2 . . . indicates string concatenation. The modified puzzle that Emil
sent added the following restriction: for all 𝑖 ≠ 𝑗, 𝑠𝑖 ≠ 𝑠 𝑗 .
You don’t have enough time to solve Emil’s original puzzle. Can you solve
the modified version?
Input
The input starts with a line containing an integer 1 ≤ 𝑘 ≤ 11, followed by 𝑘 lines.
Each of the 𝑘 lines contains two lowercase alphabetic strings which represent
a pair of strings. Each individual string will be non-empty and at most 100
characters long.
Output
Output the sequence found (if it is possible to form one) or IMPOSSIBLE (if it is not
possible to solve the problem). If it is possible but there are multiple sequences,
you should prefer the shortest one (in terms of the number of characters output).
If there are multiple shortest sequences, choose the one that is lexicographically
first.
The original problem as posed by Emil¹ is called the Post correspondence
problem and is an undecidable problem, i.e. there is no algorithm in the familiar
sense that can solve the problem in finite time.
Emil's added restriction to the problem – that if 𝑖 ≠ 𝑗, then 𝑠𝑖 ≠ 𝑠 𝑗 – means that
each pair of strings may be used in the sequence at most once. This clearly allows
us to solve the problem in finite time: we could simply test all 𝑘! permutations
of the pairs, and walk through them to see if the two strings formed by the
respective strings of all the pairs match. This would require around 𝑘! · 𝑘 · 100
operations, which is about 4 · 10^10 for the maximum 𝑘 = 11 – clearly
too slow.
The big difference in this problem compared to the one where we could
perform meet in the middle is that our selection of strings is highly order
dependent. We can’t arbitrarily split up our word pairs into a “first half” and a
¹Emil Post, an American mathematician.
168
9.5. M EET IN THE M IDDLE
“second half” and attempt to combine them. After all, the correct solution might
involve constructing a string where words from the two halves are intertwined.
Of course, we have previously seen how to deal with such a problem. If an
arbitrary choice is not good enough, we try all the choices. We simply fix the
parameter that is the subset of word pairs constituting the first half. There are
only 462 ways in which one can pick the first 5 word pairs out of a maximum
eleven – a small price to pay.
Fixing parameters thus allows us to split the words into two halves. This
begs the question – if we attempt all possible permutations of the words in the
first half, what constraint do they put on the second half? Let's check. Assume
that a given permutation of the (currently fixed) first half of the pairs gives
a concatenation of the 𝑎’s equal to the string 𝑆, which is, without loss of
generality, shorter than the concatenation of the 𝑏’s. First of all, it should be
clear that 𝑆 must be a prefix of the concatenation of the 𝑏’s – otherwise,
concatenating the strings of the second half can't make them equal. Thus, we
assume that the concatenation of the 𝑏’s is 𝑆𝑇 .
Symmetrically, this tells us that if the concatenation of all 𝑏’s in the second
half is the string 𝑈 , the concatenation of all the 𝑎’s must be 𝑇𝑈 , to together
make the strings 𝑆𝑇𝑈 . Thus, the question is – can we order the words of the
second part so that they create strings of the form 𝑈 and 𝑇𝑈 for some 𝑈 ?
This is precisely the kind of simple question that makes meet in the middle
possible. To answer it quickly, we try all orderings of the (at most 6) pairs in
the second half. If the two concatenations are of the form 𝑈 and 𝑋𝑈 for some 𝑋 ,
we store them in a hash map from 𝑋 to the lexicographically smallest 𝑈 that we
found. With this map in hand, we can determine in constant time whether there is
a way to complete the strings formed by the first half.
In total, the cost is somewhere around 462 · (5! · 5 + 6! · 6) ≈ 2 · 10^6
hash map operations, and 462 · (5! · 5 + 6! · 6) · 600 ≈ 1.3 · 10^9 individual
character operations, depending on implementation. While the latter seems like
a lot, these operations are very fast – solving the worst case five times on the
author's computer takes less than a second.
Problem 9.10
Closest Sums – closestsums
Celebrity Split – celebritysplit
Circuit Counting – countcircuits
Indoorienteering – indoorienteering
Key to Knowledge – keytoknowledge
Knights in Fen – knightsfen
Rubrik’s Revenge in ... 2D!? 3D? – rubriksrevenge
10 Greedy Algorithms
In this chapter, we are going to look at another standard technique to solve some
categories of search and optimization problems faster than naive brute force, by
exploiting properties of local optimality.
We can use the tools from the previous chapter on brute force to formulate a
plain backtracking solution. Using a recursive function that takes in 𝑇 , we can
attempt to add a single coin of each type, and keep searching:
1: procedure MakeChange(integer 𝑇 )
2: if T = 0 then
3: return 0
4: answer ← 1 + MakeChange(T − 1)
5: if T ≥ 2 then
6: answer ← min(answer, 1 + MakeChange(T − 2))
7: if T ≥ 5 then
8: answer ← min(answer, 1 + MakeChange(T − 5))
9: return answer
We can phrase this problem using the kind of graph we previously discussed.
Let the graph have vertices labeled 0, 1, 2, ...,𝑇 , representing the amount of
money we wish to sum up to. For a vertex labeled 𝑥, we add edges to vertices
labeled 𝑥 − 1, 𝑥 − 2, 𝑥 − 5 (if those vertices exist), weighted 1. Traversing
such an edge represents adding a coin of denomination 1, 2 or 5. Then, the
Change-making Problem can be phrased as computing the shortest path from
the vertex 𝑇 to 0. The corresponding graph for 𝑇 = 5 can be seen in Figure 10.1.
Figure 10.1: The Change-making Problem, formulated as finding the shortest path in a DAG, for
𝑇 = 5.
So, how does this graph formulation help us? Solving the problem on the
graph as before using simple recursion would be very slow (with an exponential
complexity, even). In Chapter 11 on Dynamic Programming, we will see how to
solve such problems in polynomial time. For now, we will settle for solving
problems exhibiting yet another property besides having optimal substructure –
that of local optimality.
Exercise 10.1. Compute the shortest path from each vertex in Figure 10.1 to 0,
using the optimal substructure property.
10.3 Locally Optimal Choices
Figure 9.3 on page 155, since we essentially explore a big kind of graph in such
algorithms.
If the path consists of edges 𝑒 1, 𝑒 2, . . . , 𝑒𝑘 , the function we are to maximize
will be of the form

𝑤 (𝑒 1 ) + 𝑤 (𝑒 2 ) + · · · + 𝑤 (𝑒𝑘 )
Assume that the optimal solution uses the 1, 2 and 5 coins 𝑎, 𝑏 and 𝑐 times
respectively. We then have that either:
• 𝑎 = 0, 𝑏 = 0: value 0
• 𝑎 = 1, 𝑏 = 0: value 1
• 𝑎 = 0, 𝑏 = 1: value 2
• 𝑎 = 1, 𝑏 = 1: value 3
• 𝑎 = 0, 𝑏 = 2: value 4
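This case analysis suggests a locally optimal strategy: pay the remainder 𝑇 mod 5 according to the table above, and everything else with 5-coins. A sketch:

```cpp
// Minimum number of coins with denominations {1, 2, 5}: the remainder
// T mod 5 is paid according to the case analysis above (0, 1, 1, 2 or 2
// coins), and everything else with 5-coins.
int makeChange(int T) {
    int rest[] = {0, 1, 1, 2, 2};
    return T / 5 + rest[T % 5];
}
```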
10.4 Scheduling
Scheduling problems are a class of problems dealing with constructing large
subsets of non-overlapping intervals from some given set of intervals.
The classical Scheduling Problem is the following.
Scheduling Problem
Given a set 𝑆 of half-open intervals (open on the right), determine the
largest subset 𝐴 ⊆ 𝑆 of non-overlapping intervals.
Input
The input contains the set of intervals 𝑆, along with its size |𝑆 |.
Output
The output should contain the subset 𝐴.
Figure 10.2: An instance of the scheduling problem, with the optimal solution at the bottom.
175
C HAPTER 10. G REEDY A LGORITHMS
• a shortest interval,
• a longest interval,
• an interval with the leftmost left endpoint (and symmetrically, the rightmost
right endpoint),
As it turns out, we can always select an interval satisfying the fifth case:
an interval with the leftmost right endpoint.
In the example instance in Figure 10.2, this results in four intervals. First, the
interval with the leftmost right endpoint is the interval [1, 2). If we include this
in the subset 𝐴, intervals [0, 3) and [1, 6) must be removed since they overlap
[1, 2). Then, the interval [3, 4) would be the one with the leftmost right endpoint
of the remaining intervals. This interval overlaps no other interval, so it should
obviously be included. Next, we would choose [4, 6) (overlapping with [4, 7)),
and finally [7, 8). Thus, the answer would be 𝐴 = {[1, 2), [3, 4), [4, 6), [7, 8)}.
1: procedure Scheduling(set 𝑆)
2: ans ← new set
3: sort 𝑆 by right endpoint
4: highest ← −∞
5: for each interval [𝑙, 𝑟 ) ∈ 𝑆 do
6: if 𝑙 ≥ highest then
7: ans.insert([𝑙, 𝑟 ))
8: highest ← 𝑟
9: return ans
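The same algorithm could be sketched in C++ as follows, with intervals represented as (𝑙, 𝑟) pairs:

```cpp
#include <bits/stdc++.h>
using namespace std;

// Greedy scheduling: repeatedly pick the interval with the leftmost
// right endpoint among those that do not overlap what was already picked.
vector<pair<int,int>> schedule(vector<pair<int,int>> s) {
    // sort the half-open intervals [l, r) by right endpoint
    sort(s.begin(), s.end(),
         [](const pair<int,int>& a, const pair<int,int>& b) {
             return a.second < b.second;
         });
    vector<pair<int,int>> ans;
    int highest = INT_MIN;   // right endpoint of the last chosen interval
    for (auto& [l, r] : s)
        if (l >= highest) {  // [l, r) does not overlap any chosen interval
            ans.push_back({l, r});
            highest = r;
        }
    return ans;
}
```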
We can prove that this strategy is optimal using a swapping argument, one
of the main greedy proof techniques. In a swapping argument, we attempt to
prove that given any solution, we can always modify it in such a way that our
greedy choice is no worse. This is what we did in the Change-making Problem.
Exercise 10.3. For each of the first four strategies, find a set of intervals where
they fail to find an optimal solution.
Exercise 10.4. Prove that always choosing an interval with the leftmost right
endpoint is optimal.
Problem 10.1
Entertainment Box – entertainmentbox
Disastrous Downtime – downtime
10.5 Huffman Coding
Problem 10.2
Whether Report – whether
Chapter Notes
Determining whether coins of denominations 𝐷 can even be used to construct
an amount 𝑇 is an NP-complete problem in the general case[16]. It is, however,
possible to determine in polynomial time which denomination sets are solved
correctly by the greedy algorithm described[6]. Such a set of denominations is called a canonical
coin system.
Introduction to Algorithms[7] also treats the scheduling problem in its
chapter on greedy algorithms. It also brings up the connection between greedy
problems and a concept known as matroids, which is well worth studying.
11 Dynamic Programming
This chapter will study a technique called dynamic programming (often abbrevi-
ated DP). In one sense, it is simply a technique to solve the general case of the
best path in a directed acyclic graph problem (Section 10.2) in cases where the
graph does not admit locally optimal choices, in time approximately equal to
the number of edges in the graph. For graphs which are essentially trees with a
unique path to each vertex, dynamic programming is no better than brute force.
In more interconnected graphs, where many paths lead to the same vertex, the
power of dynamic programming shines through. It can also be seen as a way
to speed up recursive functions (called memoization), which will be our first
application.
First, we will see a familiar example – the Change-making problem, with a
different set of denominations. Then, we will discuss a little bit of theory, and
finally round off with a few concrete examples and standard problems.
1: procedure ChangeMaking(set 𝐷, integer 𝑇 )
2: if 𝑇 = 0 then
3: return 0
4: ans ← ∞
5: for denomination 𝑑 ∈ {1, 6, 7} do
6: if 𝑇 ≥ 𝑑 then
7: ans ← min(ans, 1 + ChangeMaking(𝐷,𝑇 − 𝑑))
8: return ans
Figure 11.1: The recursion tree for the Change-making problem with 𝑇 = 10.
The key behind the optimal substructure property is that the answer to any
particular call in this graph depends only on its parameter, independently
of the previous calls in the recursion. Right now, we perform calls with the same
parameter multiple times. Instead, we can save the result of a call the first time
we perform it:
7: return memo[𝑇 ]
8: ans ← ∞
9: for denomination 𝑑 ∈ 𝐷 do
10: if 𝑇 ≥ 𝑑 then
11: ans ← min(ans, 1 + ChangeMaking(𝐷,𝑇 − 𝑑))
12: memo[𝑇 ] ← ans
13: return ans
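In C++, the memoized recursion could be sketched like this (memo must be filled with −1 before the first call):

```cpp
#include <bits/stdc++.h>
using namespace std;

// Memoized change-making for the denominations {1, 6, 7}.
// memo[t] caches the minimum number of coins summing to t; -1 means
// "not computed yet", so memo must be initialized to -1 before use.
vector<int> memo;

int changeMaking(int T) {
    if (T == 0) return 0;
    if (memo[T] != -1) return memo[T];
    int ans = INT_MAX;
    for (int d : {1, 6, 7})
        if (T >= d) ans = min(ans, 1 + changeMaking(T - d));
    return memo[T] = ans;
}
```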
Figure 11.2: The recursion tree for the Change-making problem with 𝑇 = 10, with duplicate calls
merged.
Note the similarity between this graph and our previous DAG formulation
of the Change-making problem (Figure 10.1).
choices, how we got to the resulting state is no longer relevant – only where
we can go from there. Basically, we throw away the information (what exact
coins we used) that is no longer needed. This view of dynamic programming
problems as having a “forgetful” property, that the exact choices we have made
do not affect the future, is useful in most dynamic programming problems.
Another, more naive view, is that dynamic programming solutions are simple
recursions, where we happen to solve the same recursive subproblem a large
number of times. In this view, a DP solution is basically nothing more than a
recursive solution – find the correct base cases, a fast enough recursion, and
memoize the results.
More pragmatically, DP consists of two important parts – the states of the
DP, and the computation of a state. Both of these parts are equally important.
Fewer states generally mean fewer computations to make, and a better complexity
per state gives a better complexity overall.
Bottom-Up Computation
When applied to a dynamic programming problem, memoization is sometimes
called top-down dynamic programming instead. The name is inspired by the
way we compute the solution to our problem by starting at the largest piece at
the top of the recursion tree, and recursively breaking it down to smaller and
smaller pieces.
There is an alternative way of implementing a dynamic programming
solution, which (not particularly surprisingly) is called bottom-up dynamic
programming. This method instead constructs the solutions to our sub-problems
in the other order, starting with the base case and iteratively computing solutions
to larger sub-problems.
For example, we might just as well compute the solution to the Change-
making problem the following way:
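A sketch of such a bottom-up computation for the {1, 6, 7} instance: the subproblems are solved in the order 0, 1, . . . , 𝑇, so every value is ready when it is needed.

```cpp
#include <bits/stdc++.h>
using namespace std;

// Bottom-up change-making for denominations {1, 6, 7}: change[t] is the
// minimum number of coins summing to t, computed in increasing order of t.
vector<int> bottomUpChange(int T) {
    vector<int> change(T + 1, INT_MAX);
    change[0] = 0;
    for (int t = 1; t <= T; t++)
        for (int d : {1, 6, 7})
            if (t >= d && change[t - d] != INT_MAX)
                change[t] = min(change[t], 1 + change[t - d]);
    return change;
}
```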
nuisance that bottom-up users have to deal with, it is related to one of the perks
of bottom-up computation. If the order of computation is chosen in a clever
way, we need not save every state during our computation. Consider e.g. the
Change-making Problem again, which had the following recursion:
change(𝑛) = 0 if 𝑛 = 0
change(𝑛) = 1 + min(change(𝑛 − 1), change(𝑛 − 6), change(𝑛 − 7)) if 𝑛 > 0

(terms whose argument would be negative are omitted from the minimum)
It should be clear that using the order of computation 0, 1, 2, 3, ..., once we have
computed e.g. 𝑐ℎ𝑎𝑛𝑔𝑒 (𝑘), the subproblems 𝑐ℎ𝑎𝑛𝑔𝑒 (𝑘 − 7), 𝑐ℎ𝑎𝑛𝑔𝑒 (𝑘 − 8), ... etc.
are never used again.
Thus, we only need to save the value of 7 subproblems at a time. This Θ(1)
memory usage is pretty neat compared to the Θ(𝐾) usage needed to compute
𝑐ℎ𝑎𝑛𝑔𝑒 (𝐾) otherwise.
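A sketch of this trick for the change recursion: a circular buffer holding the 7 most recent values replaces the full table.

```cpp
#include <bits/stdc++.h>
using namespace std;

// Bottom-up change-making for {1, 6, 7} in constant memory: window[t % 7]
// holds change(t) for the 7 most recently computed subproblems. Note that
// change(t - 7) is read from slot t % 7 just before being overwritten.
int changeConstantMemory(int T) {
    array<int, 7> window{};  // window[0] = change(0) = 0
    for (int t = 1; t <= T; t++) {
        int best = INT_MAX;
        for (int d : {1, 6, 7})
            if (t >= d) best = min(best, 1 + window[(t - d) % 7]);
        window[t % 7] = best;
    }
    return window[T % 7];
}
```

Since the denomination 1 is always usable, every stored value is finite, so no "unset" sentinel is needed inside the window.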
Competitive Tip
Generally, memory limits are very generous nowadays, somewhat diminishing the art
of optimizing memory in DP solutions. It can still be a good exercise to think about
improving the memory complexity of the solutions we will look at, for the few cases
where these limits are still relevant.
11.3 Multidimensional DP
Now, we are going to look at a DP problem where our state consists of more
than one variable. The example will demonstrate the importance of carefully
choosing your DP parameters.
Ferry Loading
Swedish Olympiad in Informatics 2013, Online Qualifiers
A ferry is to be loaded with cars of different lengths, with a long line of cars
currently queued up for a place. The ferry consists of four lanes, each of the
same length. When the next car in the line enters the ferry, it picks one of the
lanes and parks behind the last car in that lane. There must be a safety margin of 1
meter between any two parked cars.
Given the length of the ferry and the lengths of the cars in the queue, compute
the maximal number of cars that can park if they choose the lanes optimally.
Figure 11.3: An optimal placement on a ferry of length 5 meters, of the cars with lengths
2, 1, 2, 5, 1, 1, 2, 1, 1, 2 meters. Only the first 8 cars could fit on the ferry.
Input
The first line contains the number of cars 0 ≤ 𝑁 ≤ 200 and the length of the
ferry 1 ≤ 𝐿 ≤ 60. The second line contains 𝑁 integers, the length of the cars
1 ≤ 𝑎𝑖 ≤ 𝐿.
Output
Output a single integer – the maximal number of cars that can be loaded on the
ferry.
The ferry problem looks like a classical DP problem. It consists of a large
number of similar choices. Each car has 4 choices – one of the lanes. If a car of
length 𝑚 chooses a lane, the remaining length of the chosen lane is reduced by
𝑚 + 1 (due to the safety margin). After the first 𝑐 cars have parked on the ferry,
the only things that have changed are the remaining lengths of the lanes. As a simplification,
we increase the initial length of the ferry by 1, to accommodate an imaginary
safety margin for the last car in a lane in case it is completely filled.
This suggests a DP solution with 𝑛𝐿^4 states, each state representing the
number of cars placed so far and the used lengths of the four lanes:
Ferry Loading
int dp[201][62][62][62][62]; // must be filled with -1 (e.g. memset in main) before use

int ferry(int c, vi used, const vi& A) {
  if (c == sz(A)) return 0;
  int& ans = dp[c][used[0]][used[1]][used[2]][used[3]];
  if (ans != -1) return ans;
  ans = 0; // if no lane fits car c, loading stops here
  rep(i,0,4) {
    if (used[i] + A[c] + 1 > L + 1) continue;
    used[i] += A[c] + 1;
    ans = max(ans, ferry(c + 1, used, A) + 1);
    used[i] -= A[c] + 1;
  }
  return ans;
}
11.4 Subset DP
Another common theme of DP is subsets, where the state represents a subset
of something. The subset DP is used in many different ways. Sometimes (as
in Subsection 11.6.4), the problem itself is about sets and subsets. Another
common usage is to reduce a solution that requires us to test all permutations
of something into one that constructs permutations iteratively, using DP to
remember only which elements have been used so far in the permutation, and not
their exact order.
Amusement Park
Swedish Olympiad in Informatics 2012, Online Qualifiers
Lisa has just arrived at an amusement park, and wants to visit each of the 𝑁
attractions exactly once. For each attraction, there are two identical facilities at
different locations in the park. Given the locations of all the facilities, determine
which facility Lisa should choose for each attraction, in order to minimize the
total distance she must walk. Initially, Lisa is at the entrance at coordinates
(0, 0). Lisa must return to the entrance once she has visited every attraction.
Input
The first line contains the integer 1 ≤ 𝑁 ≤ 15, the number of attractions Lisa
wants to visit. Then, 𝑁 lines follow. The 𝑖’th of these lines contains four integers
−10⁶ ≤ 𝑥₁, 𝑦₁, 𝑥₂, 𝑦₂ ≤ 10⁶. These are the coordinates (𝑥₁, 𝑦₁) and (𝑥₂, 𝑦₂) for
the two facilities of the 𝑖’th attraction.
Output
First, output the smallest distance Lisa must walk. Then, output 𝑁 lines, one
for each attraction. The 𝑖’th line should contain two numbers 𝑎 and 𝑓 – the 𝑖’th
attraction Lisa visited (a number between 1 and 𝑁 ), and the facility she visited
(1 or 2).
Consider a partial walk, where we have visited a set 𝑆 of attractions and
currently stand at coordinates (𝑥, 𝑦). Then, any choice up to this point is
irrelevant for the remainder of the problem, which suggests that the parameters
𝑆, 𝑥, 𝑦 form a good DP state. Note that (𝑥, 𝑦) has at most 31 possibilities –
two for each attraction, plus the entrance at (0, 0). Since we have at most 15
attractions, the set 𝑆 of visited attractions has 2¹⁵ possibilities. This gives us
31 · 2¹⁵ ≈ 10⁶ states. Each state can be computed in Θ(𝑁) time, by choosing
what attraction to visit next. All in all, we get a complexity of Θ(𝑁² · 2ᴺ).
When coding DP over subsets, we generally use bitsets to represent the subset,
since these map very cleanly to integers (and therefore indices into a vector):
Amusement Park

double best(int at, int visited) {
    // 2N is the number given to the entrance point
    if (visited == (1 << N) - 1) return dist(at, 2 * N);
    double ans = inf;
    rep(i,0,N) {
        if (visited & (1 << i)) continue; // attraction i already visited
        rep(j,0,2) {
            // 2i + j is the number given to the j'th facility
            // of the i'th attraction
            int nat = 2 * i + j;
            ans = min(ans, dist(at, nat) + best(nat, visited | (1 << i)));
        }
    }
    return ans; // memoize on (at, visited) to get the stated complexity
}
11.5 Digit DP
Digit DP is a class of problems where we count numbers with certain properties
that contain a large number of digits, up to a certain limit. These properties
have the classical hallmarks of DP problems, i.e. they are easily computable if we
construct the numbers digit by digit, remembering very little information about
what those digits actually were.
Palindrome-Free Numbers
Baltic Olympiad in Informatics 2013 – Antti Laaksonen
A string is a palindrome if it remains the same when it is read backwards. A
number is palindrome-free if it does not contain a palindrome with a length
greater than 1 as a substring. For example, the number 16276 is palindrome-free
whereas the number 17276 is not because it contains the palindrome 727. The
number 10102 is not valid either, since it has 010 as a substring (even though
010 is not a number itself).
Your task is to calculate the total number of palindrome-free numbers in a
given range.
Input
The input contains two numbers 0 ≤ 𝑎 ≤ 𝑏 ≤ 10¹⁸.
Output
Your output should contain one integer: the total number of palindrome-free
numbers in the range 𝑎, 𝑎 + 1, ..., 𝑏 − 1, 𝑏 (including 𝑎 and 𝑏).
First, we use a common simplification for counting problems on intervals.
Instead of computing the answer for the range 𝑎, 𝑎 + 1, ..., 𝑏 − 1, 𝑏, we solve
the problem for the intervals [0, 𝑎) and [0, 𝑏 + 1). The answer is then the answer
for the second interval with the answer for the first interval removed. Our lower
limit is then 0 rather than 𝑎, which simplifies the solution.
Next up, we need an essential observation to turn the problem into a standard
application of digit DP. Palindromes as general objects are very unwieldy in
our situation. Any kind of iterative construction of numbers would have to
bother with digits far back in the number, since any of them could be the edge
of a palindrome. Fortunately, it turns out that any palindrome must contain a
rather short palindromic substring, namely one of length 2 (for even-length
palindromes) or length 3 (for odd-length palindromes). This means that when
constructing the answer recursively, we only need to care about the last two
digits. When adding a digit to a partially constructed number, it may not be
equal to either of the last two digits.
Before arriving at the general solution, we solve the problem when the
upper limit is 999...999 – the number consisting of 𝑛 nines. In this case, a
simple recursive function will do the trick:
Palindrome-Free Numbers

ll sol(int at, int len, int b1, int b2) {
    if (at == len) return 1; // we have successfully constructed a number
    ll ans = 0;
    rep(d,0,10) {
        // this digit would create a palindrome
        if (d == b2 || d == b1) continue;
        // let -1 represent a leading 0, to avoid the palindrome check
        bool leadingZero = b2 == -1 && d == 0;
        ans += sol(at + 1, len, b2, leadingZero ? -1 : d);
    }
    return ans;
}

// we start with an empty number with leading zeroes
sol(0, n, -1, -1);
We fix the length of all numbers to 𝑛 by giving shorter numbers leading
zeroes. Since leading zeroes in a number are not subject to the palindrome
restriction, they must be treated differently. In our case, they are given the
special digit −1 instead, resulting in 11 possible “digits”. Once this function is
memoized, it has 𝑛 · 11 · 11 different states, with each state using a loop
iterating only 10 times. Thus, it uses on the order of 1000𝑛 operations. In our
problem, the upper limit has at most 19 digits, so the solution requires only
about 20 000 operations.
Once a solution has been formulated for this simple upper limit, extending
it to a general upper limit is quite natural. First, we save the upper limit
as a sequence of digits 𝐿. Then, we need to differentiate between two cases in
our recursive function. The partially constructed number is either equal to the
corresponding prefix of the upper limit, or it is less. In the first case, we
are still constrained by the upper limit – the next digit of our number cannot
exceed the next digit of the upper limit. In the second case, the upper limit is
no longer relevant. If a prefix of our number is strictly lower than the prefix of
the upper limit, our number can never exceed the upper limit.
This gives us our final solution:
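The listing itself is missing from this draft. The following is a sketch of what it might look like, with the prefix-equality case analysis stored in an extra flag `eq`; the wrapper name `countUpTo` and the global digit string `L` are our own conventions:

```cpp
#include <cstring>
#include <string>
typedef long long ll;

std::string L;              // the upper limit, as a string of digits
ll memo[20][2][11][11];     // indexed by at, eq, b1 + 1, b2 + 1
bool seen[20][2][11][11];

// Counts palindrome-free numbers completable from position at, where eq says
// whether the prefix built so far equals the prefix of the limit, and b1, b2
// are the two most recent digits (-1 representing a leading zero).
ll sol(int at, int eq, int b1, int b2) {
    if (at == (int)L.size()) return 1;
    ll& ans = memo[at][eq][b1 + 1][b2 + 1];
    if (seen[at][eq][b1 + 1][b2 + 1]) return ans;
    seen[at][eq][b1 + 1][b2 + 1] = true;
    ans = 0;
    int hi = eq ? L[at] - '0' : 9; // constrained only while equal to the limit
    for (int d = 0; d <= hi; d++) {
        if (d == b1 || d == b2) continue; // would create a palindrome
        bool leadingZero = b2 == -1 && d == 0;
        ans += sol(at + 1, eq && d == hi, b2, leadingZero ? -1 : d);
    }
    return ans;
}

// Palindrome-free numbers in [0, n]; the answer for a query [a, b] is then
// countUpTo(b) - countUpTo(a - 1).
ll countUpTo(ll n) {
    if (n < 0) return 0;
    L = std::to_string(n);
    std::memset(seen, 0, sizeof seen);
    return sol(0, 1, -1, -1);
}
```

The `eq` flag doubles the number of states compared with the simple version, which does not change the order of magnitude.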
11.6 Standard Problems
Knapsack
The knapsack problem is one of the most common standard DP problems. The
problem itself has countless variations. We will look at the “original” knapsack
problem, with constraints making it suitable for a dynamic programming
approach.
Knapsack
Given is a knapsack with an integer capacity 𝐶, and 𝑛 different objects, each
with an integer weight and value. Your task is to select a subset of the items with
maximal value, such that the sum of their weights does not exceed the capacity
of the knapsack.
Input
The integer 𝐶 giving the capacity of the knapsack, and an integer 𝑛, giving the
number of objects. This is followed by the 𝑛 objects, given by their value 𝑣𝑖 and
weight 𝑤𝑖 .
Output
Output the indices of the chosen items.
We are now going to attempt to formulate an 𝑂 (𝑛𝐶) solution. As is often
the case when the solution is a subset of something in DP solutions, we solve
the problem by looking at the subset as a sequence of choices – to either include
an item in the answer or not. In this particular problem, our DP state is rather
minimalistic. Indeed, after including a few items, we are left only with the
remaining items and a smaller knapsack to solve the problem for.
Letting 𝐾 (𝑐, 𝑖) be the maximum value achievable using at most weight 𝑐 and the
𝑖 first items, we get the recursion (with 𝐾 (𝑐, 0) = 0)

𝐾 (𝑐, 𝑖) = max{ 𝐾 (𝑐, 𝑖 − 1),  𝐾 (𝑐 − 𝑤𝑖 , 𝑖 − 1) + 𝑣𝑖 if 𝑤𝑖 ≤ 𝑐 }
However, this only helps us compute the answer. The problem asks us to
explicitly construct the subset. This step, i.e., tracing what choices we made to
arrive at an optimal solution is called backtracking.
For this particular problem, the backtracking is relatively simple. One
usually proceeds by starting at the optimal state, and then consider all transitions
that lead to this state. Among these, the “best” one is picked. In our case, the
transitions correspond to either choosing the current item, or not choosing it.
Both lead to states which are simple to check: in the first case,
the state we arrived from must have the same value and capacity, while in the
second case the value should differ by 𝑉 [𝑖] and the weight by 𝑊 [𝑖]:
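The listing referred to here is absent from this extract. A sketch under the same idea follows; the function name, the 0-indexed output, and the value/weight vectors 𝑉, 𝑊 are our own conventions:

```cpp
#include <algorithm>
#include <vector>
using namespace std;

// Computes K bottom-up, then backtracks from the optimal state K[n][C] by
// checking which of the two transitions could have produced each state.
vector<int> knapsack(int C, const vector<int>& V, const vector<int>& W) {
    int n = V.size();
    vector<vector<int>> K(n + 1, vector<int>(C + 1, 0));
    for (int i = 1; i <= n; i++)
        for (int c = 0; c <= C; c++) {
            K[i][c] = K[i - 1][c]; // item i-1 not taken
            if (W[i - 1] <= c)
                K[i][c] = max(K[i][c], K[i - 1][c - W[i - 1]] + V[i - 1]);
        }
    vector<int> chosen;
    int c = C;
    for (int i = n; i >= 1; i--) {
        if (K[i][c] != K[i - 1][c]) { // item i-1 must have been taken
            chosen.push_back(i - 1);
            c -= W[i - 1];
        }
    }
    return chosen; // 0-indexed items of one optimal subset
}
```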
Make sure to study the implementation closely; this kind of reconstruction
is a bit tricky to get right, but most reconstructions look something like it.
Problem 11.1
Knapsack – knapsack
Walrus Weights – walrusweights
lcs(𝐴, 𝐵, 𝑛, 𝑚) = max of:
    0                                  if 𝑛 = 0 or 𝑚 = 0
    lcs(𝐴, 𝐵, 𝑛 − 1, 𝑚)                if 𝑛 > 0
    lcs(𝐴, 𝐵, 𝑛, 𝑚 − 1)                if 𝑚 > 0
    lcs(𝐴, 𝐵, 𝑛 − 1, 𝑚 − 1) + 1        if 𝑎𝑛 = 𝑏𝑚
TODO
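As a sketch, the recursion above can be evaluated bottom-up; the function name here is our own:

```cpp
#include <algorithm>
#include <string>
#include <vector>
using namespace std;

// lcs[n][m] = length of the longest common subsequence of the first n
// characters of A and the first m characters of B, per the recursion above.
int longestCommonSubsequence(const string& A, const string& B) {
    int n = A.size(), m = B.size();
    vector<vector<int>> lcs(n + 1, vector<int>(m + 1, 0));
    for (int i = 1; i <= n; i++)
        for (int j = 1; j <= m; j++) {
            lcs[i][j] = max(lcs[i - 1][j], lcs[i][j - 1]);
            if (A[i - 1] == B[j - 1])
                lcs[i][j] = max(lcs[i][j], lcs[i - 1][j - 1] + 1);
        }
    return lcs[n][m];
}
```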
Problem 11.2
Longest Common Subsequence – longcommonsubseq
Set Cover
Set Cover
You are given a family of subsets 𝑆1, 𝑆2, ..., 𝑆𝑘 of some larger set 𝑆 of size 𝑛. Find
a minimum number of subsets 𝑆𝑎1 , 𝑆𝑎2 , ..., 𝑆𝑎𝑙 such that

𝑆𝑎1 ∪ 𝑆𝑎2 ∪ · · · ∪ 𝑆𝑎𝑙 = 𝑆

i.e., cover the set 𝑆 by taking the union of as few of the subsets 𝑆𝑖 as possible.
For small 𝑘 and large 𝑛, we can solve the problem in Θ(𝑛2ᵏ), by simply
testing each of the 2ᵏ covers. In the case where we have a small 𝑛 but 𝑘 can be
large, this becomes intractable. Instead, let us apply the principle of dynamic
programming. In a brute force approach, we would perform 𝑘 choices. For each
subset, we would try including it or excluding it. After deciding which of the
first 𝑚 subsets to include, what information is relevant? Well, if we consider
what the goal of the problem is – covering 𝑆 – it would make sense to record
what elements have been included so far. This little trick leaves us with a DP of
Θ(𝑘2ⁿ) states, one for each subset of 𝑆 we might have reached, plus counting
how many of the subsets we have tried to use so far. Computing a state takes
Θ(𝑛) time, by computing the union of the current cover with the set we might
potentially add. The recursion thus looks like:
cover(𝐶, 𝑗) =
    0                                                    if 𝐶 = 𝑆
    ∞                                                    if 𝑗 > 𝑘
    min(cover(𝐶, 𝑗 + 1), cover(𝐶 ∪ 𝑆𝑗 , 𝑗 + 1) + 1)      otherwise

where the +1 counts the subset 𝑆𝑗 that we chose to include.
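A sketch of this DP, with covers encoded as bitmasks; the iterative formulation and names are our own, relaxing every subset from every reachable cover in Θ(𝑘2ⁿ) word operations:

```cpp
#include <algorithm>
#include <vector>

// best[cover] = minimum number of subsets whose union is `cover`.
// n is |S| (elements 0..n-1); sets[i] is the subset S_i as a bitmask.
int setCover(int n, const std::vector<int>& sets) {
    int full = (1 << n) - 1;
    const int INF = 1 << 29;
    std::vector<int> best(full + 1, INF);
    best[0] = 0;
    // cover | s >= cover numerically, so ascending order relaxes correctly
    for (int cover = 0; cover <= full; cover++) {
        if (best[cover] == INF) continue;
        for (int s : sets) // try taking subset s next
            best[cover | s] = std::min(best[cover | s], best[cover] + 1);
    }
    return best[full]; // INF if S cannot be covered at all
}
```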
actually result in us adding something, since we can only add a new element at
most 𝑛 times.
Applying the same change to our set cover solution, we should instead do
DP over our current cover, and only try including sets which are not subsets of
the current cover. So, does this help? How many subsets are there, for a given
cover 𝐶, which are not its subsets? If the size of 𝐶 is 𝑚, there are 2ᵐ subsets of
𝐶, meaning 2ⁿ − 2ᵐ subsets can add a new element to our cover.
To find out how much time this needs, we will use two facts. First of all, there
are (𝑛 choose 𝑚) subsets of size 𝑚 of a size-𝑛 set. Secondly, the sum
∑ₘ₌₀ⁿ (𝑛 choose 𝑚) 2ᵐ = 3ⁿ.
If you are not familiar with this notation or this fact, you probably want to take a
look at Section ?? on binomial coefficients.
So, summing over all possible extending subsets for each possible partial 𝐶,
we get:

∑ₘ₌₀ⁿ (𝑛 choose 𝑚) (2ⁿ − 2ᵐ) = 2ⁿ · 2ⁿ − 3ⁿ = 4ⁿ − 3ⁿ
It seems that we are missing some key function which, given a set 𝐴, can
answer the question: “is there some subset 𝑆𝑖 that could extend our cover
by exactly the elements 𝐴?”. If we had such a function, computing all possible
extensions of a cover of size 𝑚 would instead take time 2ⁿ⁻ᵐ – the number of
possible extensions to the cover. Last time we managed to extend a cover in time
2ⁿ − 2ᵐ, but this is exponentially better!
Chapter Notes
12 Divide and Conquer
A recursive algorithm solves a problem by reducing it to smaller subproblems,
hoping that their solutions can be used to solve the larger problem. So far, the
subproblems we have considered have been “almost the same” as the problem at
hand. We have usually recursed on a series of choices, where each recursive
step made one choice, as in the change-making problem. In particular, our
subproblems often overlapped – solving two different subproblems required
solving a common, third subproblem. In this chapter, we will take another
approach altogether, by splitting our instance into large, disjoint (or almost
disjoint) parts – dividing it – and combining their solutions – conquering
it.
12.1 Inductive Constructions
Grid Tiling
In a square grid of side length 2𝑛 , one unit square is blocked (represented by
coloring it black). Your task is to cover the remaining 4ⁿ − 1 squares with
triominos, 𝐿-shaped tiles consisting of three squares, in the fashion shown in
Figure 12.1. The triominos can be rotated by any multiple of 90°.
The triominos may not overlap each other, nor cover anything outside the
grid. A valid tiling for 𝑛 = 2 would be
Input
The input consists of three integers 1 ≤ 𝑛 ≤ 8, 0 ≤ 𝑥 < 2ⁿ and 0 ≤ 𝑦 < 2ⁿ. The
black square has coordinates (𝑥, 𝑦).
Output
Output the positions and rotations of any valid tiling of the grid.
When tiling a 2ⁿ × 2ⁿ grid, it is not immediately clear how the divide and conquer
principle can be used. To be applicable, we must be able to reduce the problem
into smaller instances of the same problem and combine them. The peculiar
side length 2ⁿ does hint at a possible solution. Aside from the property
that 2ⁿ · 2ⁿ − 1 is evenly divisible by 3 (a necessary condition for a tiling to be
possible), it also gives us a natural way of splitting an instance, namely into its 4
quadrants.
Each of these has size 2ⁿ⁻¹ × 2ⁿ⁻¹, which is also of the form we require
of grids in the problem. The crux lies in that these four new grids do not comply
with the input specification of the problem. While smaller and disjoint, three of
them contain no black square, a requirement of the input. Indeed, a grid of this
size without any black squares cannot be tiled using triominos.
The solution lies in the trivial solution to the 𝑛 = 1 case, where we can easily
reduce the problem to four instances of the 𝑛 = 0 case:
Figure 12.5: Placing a triomino in the corners of the quadrants without a black square.
After this transformation, we can now apply the divide and conquer principle.
We split the grid into its four quadrants, each of which now contain one black
square. This allows us to recursively solve four new subproblems. At some
point, this recursion will finally reach the base case of a 1 × 1 square, which
must already be filled.
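A sketch of this recursion follows; the representation, with each triomino getting a numeric id in `grid`, is our own:

```cpp
#include <vector>
using namespace std;

vector<vector<int>> grid; // grid[r][c] = id of the triomino covering (r, c)
int tiles = 0;

// Tiles the size x size square with top-left corner (r, c), in which exactly
// one square (br, bc) is already covered (blocked or previously tiled).
void tile(int size, int r, int c, int br, int bc) {
    if (size == 1) return;
    int h = size / 2, id = ++tiles;
    int midr = r + h, midc = c + h;
    // The center-most square of each quadrant, and each quadrant's corner:
    int nr[4] = {midr - 1, midr - 1, midr, midr};
    int nc[4] = {midc - 1, midc, midc - 1, midc};
    int qr[4] = {r, r, midr, midr};
    int qc[4] = {c, midc, c, midc};
    for (int q = 0; q < 4; q++) {
        int cr = br, cc = bc;
        bool hasCovered = qr[q] <= br && br < qr[q] + h &&
                          qc[q] <= bc && bc < qc[q] + h;
        if (!hasCovered) {
            // cover this quadrant's center-most square with the new triomino
            grid[nr[q]][nc[q]] = id;
            cr = nr[q]; cc = nc[q];
        }
        tile(h, qr[q], qc[q], cr, cc);
    }
}
```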
The time complexity of the algorithm can be computed easily if we use the
fact that each call to tile only takes Θ(1) time apart from the four recursive calls.
Furthermore, each call places exactly one tile on the board. Since there are
(4ⁿ − 1)/3 tiles to place, the algorithm runs in Θ(4ⁿ) time.
Exercise 12.1. It is possible to tile such a grid with triominos colored red, blue
and green such that no two triominos sharing an edge have the same color. Prove
this fact, and give an algorithm to generate such a coloring.
Divisible Subset
Let 𝑛 = 2ᵏ. Given a set 𝐴 of 2𝑛 − 1 integers, find a subset 𝑆 of size exactly 𝑛
such that the sum of the elements of 𝑆 is a multiple of 𝑛.
Input
The input contains an integer 1 ≤ 𝑛 ≤ 2¹⁵ that is a power of two, followed by
the 2𝑛 − 1 elements of 𝐴.
Output
Output the 𝑛 elements of 𝑆.
When given a problem, it is often a good idea to solve a few small cases
by hand. This applies especially to construction problems like this one, where
constructions for small inputs often show some pattern or insight into how to
solve larger instances. The case 𝑛 = 1 is not particularly meaningful, since it
is trivially true (any integer is a multiple of 1). When 𝑛 = 2, we get an insight
which might not seem particularly interesting, but is key to the problem. We are
given 2 · 2 − 1 = 3 numbers, and seek two numbers whose sum is even. Among
three numbers, there must either be two even numbers or two odd numbers.
Both of these cases yield a pair with an even sum.
It turns out that this construction generalizes to larger instances. Generally,
it is easier to do the “divide” part of a divide and conquer solution first, but in
this problem we will do it the other way around. The recursion will follow quite
naturally once we find a way of combining solutions of the smaller instances
into a larger one.
We will lay the groundwork for a reduction of the case 2𝑛 to 𝑛. First,
assume that we could solve the problem for a given 𝑛. The larger instance then
contains 2(2𝑛 − 1) = 4𝑛 − 1 numbers, of which we seek 2𝑛 numbers whose sum
is a multiple of 2𝑛. This situation is essentially the same as for the case 𝑛 = 2,
except everything is scaled up by 𝑛. Can we scale our solution up as well?
If we have three sets of 𝑛 numbers whose respective sums are all multiples
of 𝑛, we can find two sets of 𝑛 numbers whose total sum is divisible by 2𝑛.
This construction essentially uses the same argument as for 𝑛 = 2. If the three
subsets have sums 𝑎𝑛, 𝑏𝑛, 𝑐𝑛 and we wish to find two whose sum is a multiple of
2𝑛, this is the same as finding two of the numbers 𝑎, 𝑏, 𝑐 whose sum is a multiple
of 2. This is possible, by the case 𝑛 = 2.
A beautiful generalization indeed, but we still have some remnants of wishful
thinking we need to take care of. The construction assumes that, given 4𝑛 − 1
numbers, we can find three sets of 𝑛 numbers whose sums are divisible by 𝑛.
We have now come to the recursive aspect of the problem. By assumption, we
could solve the problem for 𝑛. This means we can pick any 2𝑛 − 1 of our 4𝑛 − 1
numbers to get our first subset. The subset uses up 𝑛 of our 4𝑛 − 1 numbers,
Exercise 12.2. What happens if we, when solving the problem for some 𝑘,
construct 𝑘 − 1 pairs of integers whose sum are even, throw away the remaining
element, and scale the problem down by 2 instead? What is the complexity
then?
Exercise 12.3. The problem can be solved using a similar divide and conquer
algorithm for any 𝑘, not just those which are powers of two¹. In this case, those
𝑘 which are prime numbers can be treated as base cases. How is this done for
composite 𝑘? What is the complexity?
Exercise 12.4. The knight piece in chess can move in 8 possible ways (moving 2
steps in any one direction, and 1 step in one of the two perpendicular directions).
A closed tour exists for an 8 × 8 grid.
Exercise 12.5. An 𝑛-bit Gray code is a sequence of all 2𝑛 bit strings of length
𝑛, such that two adjacent bit strings differ in only one position. The first and
¹This result is known as the Erdős–Ginzburg–Ziv theorem.
last strings of the sequence are considered adjacent. Possible Gray codes for the
first few 𝑛 are
𝑛 = 1: 0 1
𝑛 = 2: 00 01 11 10
𝑛 = 3: 000 010 110 100 101 111 011 001
Give an algorithm to construct an 𝑛-bit Gray code for any 𝑛.
Problem 12.1
Bell Ringing – bells
Hamiltonian Hypercube – hypercube
12.2 Merge Sort

Figure 12.7: The recursion tree given when performing a recursive split of the array
[5, 1, 6, 3, 7, 2, 0, 4].
When we have sorted the two halves, we need to combine them to get a sorted
version of the entire array. The procedure to do this is based on a simple insight.
If an array 𝐴 is partitioned into two smaller arrays 𝑃 1 and 𝑃2 , the smallest value
∑ᵢ₌₀ᵏ 2ⁱ · Θ(2ᵏ⁻ⁱ) = Θ(𝑘2ᵏ)
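As a hedged sketch, the whole procedure – splitting as in Figure 12.7, then merging – might look like:

```cpp
#include <vector>
using namespace std;

// Sorts A[lo, hi) by sorting each half recursively, then merging the two
// sorted halves by repeatedly taking the smaller front element.
void mergeSort(vector<int>& A, int lo, int hi) {
    if (hi - lo <= 1) return;
    int mid = lo + (hi - lo) / 2;
    mergeSort(A, lo, mid);
    mergeSort(A, mid, hi);
    vector<int> merged;
    int i = lo, j = mid;
    while (i < mid || j < hi) {
        if (j == hi || (i < mid && A[i] <= A[j])) merged.push_back(A[i++]);
        else merged.push_back(A[j++]);
    }
    for (int k = 0; k < (int)merged.size(); k++) A[lo + k] = merged[k];
}
```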
Exercise 12.6. Our complexity analysis assumed that the length of the array is a
power of 2. The complexity is the same in the general case. Prove this fact.
Exercise 12.7. Given an array 𝐴 of size 𝑛, we call the pair 𝑖 < 𝑗 an inversion
of 𝐴 if 𝐴[𝑖] > 𝐴[ 𝑗]. Adapt the merge sort algorithm to count the number of
inversions of an array in Θ(𝑛 log 𝑛).
12.3 Binary Search
For example, if we have an interval of size 10⁹ which we wish to binary search
down to 10⁻⁷, this would require log₂ 10¹⁶ ≈ 54 iterations of binary search.
Now, let us study some applications of binary search.
Optimization Problems
hot dogs 𝑓 (𝑥) can be constructed from a given length 𝑥. After this inversion,
the problem is now in the form which binary search solves: we wish to find
the greatest 𝑥 such that 𝑓 (𝑥) = 𝑀 (replacing ≤ with = is equivalent in the cases
where we know that 𝑓 (𝑥) assumes the value we are looking for). We know that
this length is at most maxᵢ 𝑎ᵢ ≤ 10⁶, which gives us the interval (0, 10⁶] to
search in.
What remains is to actually compute the function 𝑓 (𝑥). In our case, this can
be done by considering one rod at a time. If we want to construct hot dogs of
length 𝑥, we can get at most ⌊𝑎𝑖 /𝑥⌋ hot dogs from a rod of length 𝑎𝑖. Summing
this over every rod gives us our answer.
The key to our problem was that the number of hot dogs constructible with a
length 𝑥 was monotonically decreasing with 𝑥. It allowed us to perform binary
search on the answer, a powerful technique which is a component of many
optimization problems. In general, it is often easier to determine if an answer is
acceptable, rather than computing a maximal answer.
At first, we know nothing about the location of the element – its position could be
any one of [0, 𝑛). So, we consider the middle element, 𝑚𝑖𝑑 = ⌊𝑛/2⌋, and compare
𝐴[𝑚𝑖𝑑] to 𝑥. Since 𝐴 is sorted, this leaves us with three cases:
• 𝐴[𝑚𝑖𝑑] = 𝑥 – we have found 𝑥 and are done.
• 𝐴[𝑚𝑖𝑑] > 𝑥 – since the array is sorted, 𝑥 can only lie to the left of 𝑚𝑖𝑑.
• 𝐴[𝑚𝑖𝑑] < 𝑥 – by the same reasoning, 𝑥 can only lie to the right of 𝑚𝑖𝑑.
The last two cases both halve the size of the sub-array which 𝑥 could be inside.
Thus, after doing this halving log₂ 𝑛 times, we have either found 𝑥 or can
conclude that it is not present in the array.
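In code, this search might look as follows (returning −1 when 𝑥 is absent is our own convention):

```cpp
#include <vector>
using namespace std;

// Returns an index i with A[i] == x in the sorted array A, or -1 if x is
// not present. The candidate range [lo, hi) is halved in every step.
int binarySearch(const vector<int>& A, int x) {
    int lo = 0, hi = A.size();
    while (lo < hi) {
        int mid = (lo + hi) / 2;
        if (A[mid] == x) return mid;        // found x
        else if (A[mid] < x) lo = mid + 1;  // x can only lie to the right
        else hi = mid;                      // x can only lie to the left
    }
    return -1;
}
```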
Competitive Tip
When binary searching over discrete domains, care must be taken. Many bugs have
been caused by improper binary searches².
The most common class of bugs is related to the endpoints of your interval (i.e.
whether they are inclusive or exclusive). Be explicit regarding this, and take care that
each part of your binary search (termination condition, midpoint selection, endpoint
updates) uses the same interval convention.
Or Max
Petrozavodsk Winter Training Camp 2015
Given is an array 𝐴 of 𝑛 integers. Let

𝐵(𝑖, 𝑘) = 𝐴[𝑖] | 𝐴[𝑖 + 1] | · · · | 𝐴[𝑖 + 𝑘 − 1]

i.e. the bitwise or of the 𝑘 consecutive numbers starting with the 𝑖’th,

𝑀 (𝑖, 𝑘) = max{𝐴[𝑖], 𝐴[𝑖 + 1], ..., 𝐴[𝑖 + 𝑘 − 1]}

i.e. the maximum of the 𝑘 consecutive numbers starting with the 𝑖’th, and let
𝑆 (𝑖, 𝑘) = 𝐵(𝑖, 𝑘) + 𝑀 (𝑖, 𝑘). For each 𝑘 = 1, 2, ..., 𝑛, compute the maximum of
𝑆 (𝑖, 𝑘) over all 𝑖.
As an example, consider the array in Figure 12.9. The best answer for 𝑘 = 1
would be 𝑆 (0, 1), with both maximal element and bitwise or 5, totaling 10. For
𝑘 = 2, we have 𝑆 (6, 2) = 7 + 4 = 11.
This problem can easily be solved in Θ(𝑛²), by computing every 𝑆 (𝑖, 𝑘)
iteratively. We can compute all the 𝐵(𝑖, 𝑘) and 𝑀 (𝑖, 𝑘) using the recursions

𝐵(𝑖, 𝑘) = 0 if 𝑘 = 0,  𝐵(𝑖, 𝑘 − 1) | 𝐴[𝑖 + 𝑘 − 1] if 𝑘 > 0
i=0 i=1 i=2 i=3 i=4 i=5 i=6 i=7 i=8 i=9
5 1 4 2 2 0 4 3 1 2
101 001 100 010 010 000 100 011 001 010
Figure 12.9: Example array, with the numbers additionally written in binary.
𝑀 (𝑖, 𝑘) = 0 if 𝑘 = 0,  max{𝑀 (𝑖, 𝑘 − 1), 𝐴[𝑖 + 𝑘 − 1]} if 𝑘 > 0
by looping over 𝑘, once we fix an 𝑖. With 𝑛 = 100 000, this approach is too slow.
The difficulty of the problem lies in 𝑆 (𝑖, 𝑘) consisting of two basically
unrelated parts – the maximal element and the bitwise or of a segment. When
maximizing sums of unrelated quantities that put constraints on each other,
brute force often seems like a good idea. This is basically what we did in the
Buying Books problem (Section 9.4), where we minimized the sum of two parts
(postage and book costs) which constrained each other (buying a book forced us
to pay postage to its store) by brute forcing over one of the parts (the set of stores
to buy from). Since the bitwise or is much more complicated than the maximal
element – it is decided by an entire interval rather than a single element – we
are probably better off doing brute force over the maximal element. Our brute
force will consist of fixing which element is the maximal one, by assuming
that it is 𝐴[𝑚].
With this simplification in hand, only the bitwise or remains. We could
now solve the problem by looping over all the left endpoints of the interval and
all the right endpoints of the interval. At a first glance, this seems to actually
worsen the complexity. Indeed, this takes quadratic time for each 𝑚 (on average),
resulting in a cubic complexity.
This is where we use our new technique. It turns out that, once we fix 𝑚, there
are only a few possible values for the bitwise or of the intervals containing the
𝑚’th element. Any such interval 𝐴[𝑙], 𝐴[𝑙 + 1], ..., 𝐴[𝑟 − 1], 𝐴[𝑟] can be split
into two parts: one to the left, 𝐴[𝑙], 𝐴[𝑙 + 1], ..., 𝐴[𝑚 − 1], 𝐴[𝑚], and one to
the right, 𝐴[𝑚], 𝐴[𝑚 + 1], ..., 𝐴[𝑟 − 1], 𝐴[𝑟]. The bitwise or of either of these
two parts is a monotone function of its length, and can assume at most 16
different values!
Studying Figure 12.10 gives a hint about why. The first row shows the binary
i=0 i=1 i=2 i=3 i=4 i=5 i=6 i=7 i=8 i=9
101 001 100 010 010 000 100 011 001 010
111 111 110 110 110 100 100 111 111 111
7 7 6 6 6 4 4 7 7 7
Figure 12.10: The bitwise or of the left and right parts, with one endpoint at 𝑚 = 6.
values of the array, with 𝑚 = 6 (our presumed maximal element) marked. The
second row shows the binary values of the bitwise or of the interval [𝑖, 𝑚] or
[𝑚, 𝑖] (depending on whether 𝑚 is the right or left endpoint). The third line
shows the decimal values of the second row.
For example, when extending the interval [2, 6] (with bitwise or 110) one step
to the left, the new bitwise or becomes 110 | 001 = 111. This is the only way
the bitwise or can change – when the new value includes bits which so far have
not been set. Obviously, this can happen at most 16 times, since the values in 𝐴
are bounded by 2¹⁶.
For a given 𝑚, this gives us a partition of all the elements, by the bitwise or
of the interval [𝑚, 𝑖]. In Figure 12.10, the left elements will be partitioned into
[0, 1], [2, 4], [5, 6]. The right elements will be partitioned into [6, 6], [7, 9].
These partitions are everything we need to compute the final answer.
For example, if we pick the left endpoint from the part [2, 4] and the right
endpoint from the part [7, 9], we would get a bitwise or that is 6 | 7 = 7, of a
length between 4 and 8, together with the 4 as the presumed maximal element.
For each maximal element, we get at most 16 · 16 such choices, totaling less
than 256𝑁 such choices. From these, we can compute the final answer using a
simple sweep line algorithm.
12.4 Karatsuba's Algorithm
not the case for arbitrarily large integers. We will look at Karatsuba as a way of
multiplying polynomials, but this can easily be extended to multiplying integers.
Polynomial Multiplication
Given two 𝑛-degree polynomials (where 𝑛 can be large) 𝑝 (𝑥) = ∑ᵢ₌₀ⁿ 𝑎𝑖 𝑥ⁱ and
𝑞(𝑥) = ∑ᵢ₌₀ⁿ 𝑏𝑖 𝑥ⁱ, compute their product

(𝑝𝑞)(𝑥) = ∑ᵢ₌₀²ⁿ ( ∑ⱼ₌₀ⁱ 𝑎𝑗 𝑏𝑖−𝑗 ) 𝑥ⁱ
(𝑝𝑙 (𝑥)+𝑝𝑟 (𝑥)) (𝑞𝑙 (𝑥)+𝑞𝑟 (𝑥)) = 𝑝𝑙 (𝑥)𝑞𝑙 (𝑥)+𝑝𝑙 (𝑥)𝑞𝑟 (𝑥)+𝑝𝑟 (𝑥)𝑞𝑙 (𝑥)+𝑝𝑟 (𝑥)𝑞𝑟 (𝑥)
so that
𝑝𝑙 (𝑥)𝑞𝑟 (𝑥) + 𝑝𝑟 (𝑥)𝑞𝑙 (𝑥) = (𝑝𝑙 (𝑥) + 𝑝𝑟 (𝑥))(𝑞𝑙 (𝑥) + 𝑞𝑟 (𝑥)) − 𝑝𝑙 (𝑥)𝑞𝑙 (𝑥) − 𝑝𝑟 (𝑥)𝑞𝑟 (𝑥)
This means we only need to compute three 𝑘-degree multiplications:
(𝑝𝑙 (𝑥) + 𝑝𝑟 (𝑥))(𝑞𝑙 (𝑥) + 𝑞𝑟 (𝑥)), 𝑝𝑙 (𝑥)𝑞𝑙 (𝑥) and 𝑝𝑟 (𝑥)𝑞𝑟 (𝑥). Our time complexity
recurrence then becomes 𝑇 (𝑛) = 3𝑇 (𝑛/2) + Θ(𝑛), which solves to Θ(𝑛^(log₂ 3)) ≈ Θ(𝑛¹·⁵⁸⁵).
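A sketch of the resulting algorithm on coefficient vectors; power-of-two sizes are assumed for simplicity, and the naive cutoff of 32 is an arbitrary choice of ours:

```cpp
#include <vector>
using namespace std;
typedef long long ll;

// Multiplies the polynomials with coefficient vectors p and q (equal,
// power-of-two sizes); res[i] is the coefficient of x^i in the product.
vector<ll> karatsuba(const vector<ll>& p, const vector<ll>& q) {
    int n = p.size();
    vector<ll> res(2 * n - 1, 0);
    if (n <= 32) { // naive base case
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) res[i + j] += p[i] * q[j];
        return res;
    }
    int k = n / 2; // p = pl + x^k * pr, q = ql + x^k * qr
    vector<ll> pl(p.begin(), p.begin() + k), pr(p.begin() + k, p.end());
    vector<ll> ql(q.begin(), q.begin() + k), qr(q.begin() + k, q.end());
    vector<ll> psum(k), qsum(k);
    for (int i = 0; i < k; i++) { psum[i] = pl[i] + pr[i]; qsum[i] = ql[i] + qr[i]; }
    vector<ll> low = karatsuba(pl, ql);      // pl * ql
    vector<ll> high = karatsuba(pr, qr);     // pr * qr
    vector<ll> mid = karatsuba(psum, qsum);  // (pl + pr) * (ql + qr)
    // mid - low - high = pl * qr + pr * ql
    for (size_t i = 0; i < mid.size(); i++) mid[i] -= low[i] + high[i];
    for (size_t i = 0; i < low.size(); i++) res[i] += low[i];
    for (size_t i = 0; i < mid.size(); i++) res[i + k] += mid[i];
    for (size_t i = 0; i < high.size(); i++) res[i + 2 * k] += high[i];
    return res;
}
```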
12.5 Chapter Notes
13 Data Structures
This chapter extends Chapter 6 further by showing the two most common
advanced data structures that make an appearance in algorithmic problem
solving. We do not take the approach of simply presenting each data structure
and the problem that it solves. Instead, we take the same approach as we would
for any example problem or algorithm in this book, gradually improving an
initial naive solution using additional insights. This is particularly valuable in
this case, since certain problems may require variations of these structures in
which only some of the optimizations we show are applicable. Some of the
techniques we show during this journey can also occasionally be applied
to other problems, so make sure to digest not only the final results, but every
individual step on the journey to them.
13.1 Disjoint Sets
Disjoint Set

struct DisjointSets {
    vector<vector<int>> components;
    vector<int> comp;

    DisjointSets(int elements) : components(elements), comp(elements) {
        iota(all(comp), 0);
        for (int i = 0; i < elements; ++i) components[i].push_back(i);
    }

    void unionSets(int a, int b) {
        a = comp[a]; b = comp[b];
        if (a == b) return;
        if (components[a].size() < components[b].size()) swap(a, b);
        for (int it : components[b]) {
            comp[it] = a;
            components[a].push_back(it);
        }
    }
};
13.2 Range Queries

Prefix Precomputation
Interval Sum
Given a sequence of integers 𝑎 0, 𝑎 1, . . . , 𝑎 𝑁 −1 , you will be given 𝑄 queries of
the form [𝐿, 𝑅). For each query, compute 𝑆 (𝐿, 𝑅) = 𝑎𝐿 + 𝑎𝐿+1 + · · · + 𝑎𝑅−1 .
Computing the sums naively would require Θ(𝑁 ) worst-case time per query
if the intervals are large, for a total complexity of Θ(𝑁𝑄). If 𝑄 = Ω(𝑁 ) we
can improve this to Θ(𝑁² + 𝑄) by precomputing all the answers. To do this in
quadratic time, we use the recurrence
𝑆 (𝐿, 𝑅) = 0 if 𝐿 = 𝑅,  𝑆 (𝐿, 𝑅 − 1) + 𝑎𝑅−1 otherwise
Using this recurrence we can compute the sequence 𝑆 (𝐿, 𝐿), 𝑆 (𝐿, 𝐿 + 1), 𝑆 (𝐿, 𝐿 +
2), . . . , 𝑆 (𝐿, 𝑁 ) in Θ(𝑁 ) time for each 𝐿. This gives us the Θ(𝑁² + 𝑄)
complexity.
If the function we are computing has an inverse, we can speed this precomputa-
tion up a bit. Assume that we have computed the values 𝑃 (𝑅) = 𝑎 0 +𝑎 1 +· · ·+𝑎𝑅−1 ,
i.e. the prefix sums of 𝑎𝑖 . Since this function is invertible (with inverse −𝑃 (𝑅)),
we can compute 𝑆 (𝐿, 𝑅) = 𝑃 (𝑅) − 𝑃 (𝐿). Basically, the interval [𝐿, 𝑅) consists
of the prefix [0, 𝑅) with the prefix [0, 𝐿) removed. As addition is invertible, we
could simply remove the latter prefix 𝑃 (𝐿) from the prefix 𝑃 (𝑅) using subtraction.
Indeed, expanding this expression shows us that
𝑃 (𝑅) − 𝑃 (𝐿) = (𝑎0 + · · · + 𝑎𝑅−1 ) − (𝑎0 + · · · + 𝑎𝐿−1 ) = 𝑎𝐿 + · · · + 𝑎𝑅−1 = 𝑆 (𝐿, 𝑅).
1: procedure Prefixes(sequence 𝐴)
2: P ← 𝑛𝑒𝑤 𝑖𝑛𝑡 [|𝐴| + 1]
3: for 𝑖 = 0 to |𝐴| − 1 do
4: 𝑃 [𝑖 + 1] ← 𝑃 [𝑖] + 𝐴[𝑖]
5: return 𝑃
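The same precomputation in C++:

```cpp
#include <vector>
using namespace std;
typedef long long ll;

// P[R] = a_0 + a_1 + ... + a_{R-1}, so that S(L, R) = P[R] - P[L].
vector<ll> prefixes(const vector<ll>& A) {
    vector<ll> P(A.size() + 1, 0);
    for (size_t i = 0; i < A.size(); i++) P[i + 1] = P[i] + A[i];
    return P;
}
```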
Exercise 13.1. The above technique does not work directly for non-commutative
operations. How can it be adapted to this case?
Sparse Tables
The case where a function does not have an inverse is a bit more difficult.
Interval Minimum
Given a sequence of integers 𝑎0, 𝑎1, . . . , 𝑎𝑁−1 , you will be given 𝑄 queries of
the form [𝐿, 𝑅). For each query, compute the value min(𝑎𝐿 , 𝑎𝐿+1 , . . . , 𝑎𝑅−1 ).
Sparse Table

vector<vi> ST(const vi& A) {
    vector<vi> table(32 - __builtin_clz(sz(A)), vi(sz(A)));
    table[0] = A;
    rep(len,1,sz(table)) {
        rep(i,0,sz(A) - (1 << len) + 1) {
            table[len][i] = min(table[len - 1][i],
                                table[len - 1][i + (1 << (len - 1))]);
        }
    }
    return table;
}
[𝐿, 𝐿 + 2^𝑘1 ), [𝐿 + 2^𝑘1 , 𝐿 + 2^𝑘1 + 2^𝑘2 ), . . . , where len = 2^𝑘1 + 2^𝑘2 + · · · with 𝑘 1 > 𝑘 2 > · · · ,
together cover [𝐿, 𝐿 + len). Thus we can compute the minimum of [𝐿, 𝐿 + len)
as the minimum of log2 len intervals.
This is Θ((𝑁 + 𝑄) log 𝑁 ) time, since the preprocessing uses Θ(𝑁 log 𝑁 )
time and each query requires Θ(log 𝑁 ) time. This structure is called a Sparse
Table, or sometimes just the Range Minimum Query data structure.
We can improve the query time to Θ(1) by using that the min operation is
idempotent, meaning that min(𝑎, 𝑎) = 𝑎. Whenever this is the case (and the
operation at hand is commutative), we can use just two intervals to cover the
entire interval. If 2^𝑘 is the largest power of two that is at most 𝑅 − 𝐿, then
the two intervals
[𝐿, 𝐿 + 2^𝑘 ) and [𝑅 − 2^𝑘 , 𝑅)
together cover the entire interval [𝐿, 𝑅).
int rangeMinimum(const vector<vi>& table, int L, int R) {
  int maxLen = 31 - __builtin_clz(R - L);
  return min(table[maxLen][L], table[maxLen][R - (1 << maxLen)]);
}
While most functions either have inverses (so that we can use the prefix
precomputation) or are idempotent (so that we can use the Θ(1) sparse table),
some functions do not. In such cases (for example matrix multiplication), we
must use the logarithmic querying of the sparse table.
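To make the two-interval trick concrete, here is a self-contained sketch of the whole structure in plain C++, without the book's macros; `buildSparseTable` and `rangeMin` are names chosen here.

```cpp
#include <vector>
#include <algorithm>
using namespace std;

// table[k][i] holds the minimum of the interval [i, i + 2^k).
vector<vector<int>> buildSparseTable(const vector<int>& A) {
    int n = A.size(), levels = 1;
    while ((1 << levels) <= n) levels++;
    vector<vector<int>> table(levels, A); // level 0 is A itself
    for (int k = 1; k < levels; k++)
        for (int i = 0; i + (1 << k) <= n; i++)
            table[k][i] = min(table[k - 1][i], table[k - 1][i + (1 << (k - 1))]);
    return table;
}

// Minimum of [L, R): cover it with two overlapping power-of-two intervals.
int rangeMin(const vector<vector<int>>& table, int L, int R) {
    int k = 31 - __builtin_clz(R - L); // largest k with 2^k <= R - L
    return min(table[k][L], table[k][R - (1 << k)]);
}
```

Because min is idempotent, the overlap between the two intervals does not affect the answer.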
Segment Trees
The most interesting range queries occur on dynamic sequences, where values
can change.
[Figure: the sequence 5 1 6 3 7 2 0 4 shown on four levels, split into the successively smaller intervals of a segment tree.]
1: procedure MakeTree(sequence 𝐴 of length 𝑁 )
2: tree ← 𝑛𝑒𝑤 𝑖𝑛𝑡 [2𝑁 ]
3: for 𝑖 = 𝑁 to 2𝑁 − 1 do
4: tree[𝑖] ← 𝐴[𝑖 − 𝑁 ]
5: for 𝑖 = 𝑁 − 1 downto 1 do
6: tree[𝑖] ← tree[2 · 𝑖] + tree[2 · 𝑖 + 1]
7: return tree
In the first case, we are done (and respond with the sum of the current
interval). In the second case, we perform a recursive call on the half of the
interval that the query lies in. In the third case, we make the same recursive
construction for both the left and the right interval.
Since there is a possibility we perform two recursive calls, we might think
that the worst-case complexity of this query would be Θ(𝑁 ) time. However, the
calls made in the third case have a very specific form – they always have one
endpoint in common with the interval in the tree. In this case, the only time
the recursion branches is into one interval that is entirely contained in the
query, and one that is not. The first call will not make any further calls. All in
all, this means that there will be at most two branches of logarithmic height, so
that queries are 𝑂 (log 𝑁 ).
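A common way to implement this structure is the iterative, bottom-up formulation below. This is a sketch restricted to point updates and interval sums; the recursion described above is replaced by an equivalent loop that walks from the leaves toward the root.

```cpp
#include <vector>
using namespace std;

// Bottom-up segment tree over N values.
// tree[N..2N-1] are the leaves; tree[i] = tree[2i] + tree[2i+1].
struct SegmentTree {
    int N;
    vector<long long> tree;
    SegmentTree(const vector<int>& A) : N(A.size()), tree(2 * A.size()) {
        for (int i = 0; i < N; i++) tree[N + i] = A[i];
        for (int i = N - 1; i >= 1; i--) tree[i] = tree[2 * i] + tree[2 * i + 1];
    }
    void update(int pos, int value) { // set A[pos] = value in O(log N)
        pos += N;
        tree[pos] = value;
        for (pos /= 2; pos >= 1; pos /= 2)
            tree[pos] = tree[2 * pos] + tree[2 * pos + 1];
    }
    long long query(int L, int R) { // sum of [L, R) in O(log N)
        long long sum = 0;
        for (L += N, R += N; L < R; L /= 2, R /= 2) {
            if (L & 1) sum += tree[L++]; // L is a right child: take it, move right
            if (R & 1) sum += tree[--R]; // R is a right child: take the one before
        }
        return sum;
    }
};
```

The query loop accumulates exactly the 𝑂 (log 𝑁 ) tree intervals that the recursive formulation would sum over.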
14 Graph Algorithms
Graph theory is probably the richest of all algorithmic areas. You are almost
guaranteed to see at least one graph problem in any given contest, so it is
important to be well versed in the common algorithms that operate on graphs.
The most important graph algorithms are used to find shortest paths from some
vertex. It is these algorithms that we study first.
Let us solve this problem inductively. First of all, what vertices have distance
0? Clearly, this is only the source vertex 𝑠 itself. This seems like a reasonable
base case, since the problem is about shortest paths from 𝑠. Then, what vertices
have distance 1? These are exactly those with a path consisting of a single edge
from 𝑠, meaning they are the neighbors of 𝑠 (marked in Figure 14.2).
[Figure 14.2: the source vertex 𝑠 with nearby vertices labeled by their distances 1, 2, 3, 4, as computed step by step by the search.]
In fact, this reasoning generalizes to any particular distance: the vertices at
distance exactly 𝑘 are those that have a neighbor at distance 𝑘 − 1 but no
neighbor at a smaller distance. Using this, we can construct an algorithm to
solve the problem. Initially, we set the distance of 𝑠 to 0. Then, for every
dist = 1, 2, . . . , we mark all hitherto unmarked vertices that have a neighbor at
distance dist − 1 as having distance dist. This algorithm is called the breadth-first
search.
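In code, the breadth-first search is usually written with a queue rather than explicit distance layers; the queue holds vertices in order of increasing distance, which produces the same distances. A sketch in plain C++:

```cpp
#include <vector>
#include <queue>
using namespace std;

// Distances from s to every vertex of an unweighted graph given as
// adjacency lists; unreachable vertices get distance -1.
vector<int> bfs(const vector<vector<int>>& adj, int s) {
    vector<int> dist(adj.size(), -1);
    queue<int> q;
    dist[s] = 0;
    q.push(s);
    while (!q.empty()) {
        int at = q.front(); q.pop();
        for (int next : adj[at]) {
            if (dist[next] == -1) { // first visit = shortest distance
                dist[next] = dist[at] + 1;
                q.push(next);
            }
        }
    }
    return dist;
}
```

Every vertex and edge is processed at most once, so the search runs in Θ(𝑉 + 𝐸) time.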
Exercise 14.1. Use the BFS algorithm to compute the distance to every square
in the following grid:
1: procedure Distances(graph 𝐺 = (𝑉 , 𝐸), source 𝑠)
2: dist ← 𝑛𝑒𝑤 𝑖𝑛𝑡 [|𝑉 |], filled with ∞
3: dist[𝑠] ← 0
4: curDist ← 0
5: curVertices ← {𝑠}
6: while curVertices ≠ ∅ do
7: curDist ← curDist + 1
8: nextVertices ← all 𝑣 with dist[𝑣] = ∞ adjacent to a vertex in curVertices
9: for 𝑣 ∈ nextVertices do
10: dist[𝑣] ← curDist
11: curVertices ← nextVertices
12: return dist
8-puzzle
In the 8-puzzle, 8 tiles are arranged in a 3 × 3 grid, with one square left empty.
A move in the puzzle consists of sliding a tile into the empty square. The goal
of the puzzle is to perform some moves to reach the target configuration. The
target configuration has the empty square in the bottom right corner, with the
numbers in order 1, 2, 3, 4, 5, 6, 7, 8 on the three lines.
Figure 14.4: An example 8-puzzle, with a valid move. The rightmost puzzle shows the target
configuration.
Given a puzzle, determine how many moves are required to solve it, or if it
cannot be solved.
This is a typical BFS problem, characterized by a starting state (the initial
puzzle), some transitions (the moves we can make), and the task of finding a
short sequence of transitions to some goal state. We can model this kind of
problem using a graph. The vertices represent the possible arrangements of
the tiles in the grid, and an edge connects two states if they differ by a single
move. A sequence of moves from the starting state to the target configuration
then represents a path in this graph. The minimum number of moves required is
the same as the distance between those vertices in the graph, meaning we can
use a BFS.
In such a problem, most of the code usually deals with the representation
of a state as a vertex, and generating the edges that a certain vertex is adjacent to.
When an implicit graph is given, we generally do not compute the entire graph
explicitly. Instead, we use the states from the problems as-is, and generate the
edges of a vertex only when it is being visited in the breadth-first search. In the
8-puzzle, we can represent each state as a 3 × 3 2D-vector. The difficult part is
generating all the states that we can reach from a certain state.
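One possible way to generate them, assuming the Puzzle type is a 3 × 3 grid of integers with 0 marking the empty square (an assumption made by this sketch, not fixed by the problem statement):

```cpp
#include <vector>
#include <utility>
using namespace std;
typedef vector<vector<int>> Puzzle; // 3x3 grid; 0 marks the empty square

// All states reachable in one move: find the empty square and swap it
// with each of its (up to four) neighbors.
vector<Puzzle> edges(const Puzzle& p) {
    int er = 0, ec = 0;
    for (int r = 0; r < 3; r++)
        for (int c = 0; c < 3; c++)
            if (p[r][c] == 0) { er = r; ec = c; }
    int dr[] = {-1, 1, 0, 0}, dc[] = {0, 0, -1, 1};
    vector<Puzzle> result;
    for (int d = 0; d < 4; d++) {
        int nr = er + dr[d], nc = ec + dc[d];
        if (nr < 0 || nr >= 3 || nc < 0 || nc >= 3) continue;
        Puzzle next = p;
        swap(next[er][ec], next[nr][nc]); // slide the neighboring tile
        result.push_back(next);
    }
    return result;
}
```

A corner position of the empty square yields two moves, an edge position three, and the center four.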
With the edge generation in hand, the rest of the solution is a normal BFS,
slightly modified to account for the fact that our vertices are no longer numbered
0, . . . , 𝑉 − 1. We can solve this by using e.g. maps instead.
8-puzzle BFS
int puzzle(const Puzzle& S, const Puzzle& target) {
  map<Puzzle, int> distances;
  distances[S] = 0;
  queue<Puzzle> q;
  q.push(S);
  while (!q.empty()) {
    Puzzle cur = q.front(); q.pop(); // copy: a reference would dangle after pop()
    int dist = distances[cur];
    if (cur == target) return dist;
    for (const Puzzle& move : edges(cur)) {
      if (distances.find(move) != distances.end()) continue;
      distances[move] = dist + 1;
      q.push(move);
    }
  }
  return -1;
}
Besides this kind of search problems that can be solved using a BFS, some
problems require modifications of a BFS, or use the distances generated only as
an intermediary result.
Shortest Cycle
Compute the length of the shortest simple cycle in a graph.
Problem 14.1
Button Bashing – buttonbashing
14.2 Depth-First Search
Coast Length
KTH Challenge 2011 – Ulf Lundström
The residents of Soteholm value their coast highly and therefore want to maximize
its total length. For them to be able to make an informed decision on their
position in the issue of global warming, you have to help them find out whether
their coastal line will shrink or expand if the sea level rises. From height maps
they have figured out what parts of their islands will be covered by water, under
the different scenarios described in the latest IPCC report on climate change,
but they need your help to calculate the length of the coastal lines.
Figure 14.5: Gray squares are land and white squares are water. The thick black line is the sea
coast.
Each square of the grid has a side length of 1 km and is either water or land. Your goal is to
compute the total length of sea coast of all islands. Sea coast is all borders
between land and sea, and sea is any water connected to an edge of the map
only through water. Two squares are connected if they share an edge. You may
assume that the map is surrounded by sea. Lakes and islands in lakes do not
contribute to the sea coast.
Solution. We can consider the grid as a graph, where all the water squares are
vertices, and two squares have an edge between them if they share an edge. If
we surround the entire grid by water tiles (a useful trick to avoid special cases
in this kind of grid problem), the sea consists exactly of those vertices that are
connected to these surrounding water tiles. This means we need to compute the
vertices which lie in the same connected component as the sea – a typical DFS
task¹. After computing this component, we can determine the coast length by
looking at all the squares which belong to the sea. If such a square shares an edge
with a land tile, that edge contributes 1 km to the coast length.
const vpi moves = {pii(-1, 0), pii(1, 0), pii(0, -1), pii(0, 1)};

int coastLength(const vector<vector<bool>>& G) {
  int H = sz(G) + 4, W = sz(G[0]) + 4;
  // Pad the map with two layers of water; true marks land.
  vector<vector<bool>> G2(H, vector<bool>(W, false));
  rep(i,0,sz(G)) rep(j,0,sz(G[i])) G2[i+2][j+2] = G[i][j];
  vector<vector<bool>> sea(H, vector<bool>(W));

  function<void(int, int)> floodFill = [&](int row, int col) {
    if (row < 0 || row >= H || col < 0 || col >= W) return;
    if (G2[row][col] || sea[row][col]) return; // stop at land or visited water
    sea[row][col] = true;
    trav(move, moves) floodFill(row + move.first, col + move.second);
  };
  floodFill(0, 0);

  int coast = 0;
  rep(i,2,sz(G)+2) rep(j,2,sz(G[0])+2) {
    if (!G2[i][j]) continue; // only land squares border the coast
    trav(move, moves) if (sea[i + move.first][j + move.second]) coast++;
  }
  return coast;
}
¹This particular application of DFS, i.e. computing a connected area in a 2D grid, is called a
flood fill.
Problem 14.2
Mårten’s DFS – martensdfs
14.3 Weighted Shortest Path
• Whether we seek shortest paths from a single vertex only, or between all
pairs of vertices.
There are mainly three algorithms used: Dijkstra’s Algorithm, the Bellman-Ford
algorithm, and the Floyd-Warshall algorithm.
Dijkstra’s Algorithm
Dijkstra’s Algorithm can be seen as an extension of the breadth-first search that
works for weighted graphs as well.
By assumption, the weight of the edge (𝑠, 𝑢) must be larger than 𝑊 (which was
the minimal weight of the edges adjacent to 𝑠). This reasoning at least allows us
to find the shortest distance to one other vertex.
Bellman-Ford
𝐷 (𝑘, 𝑣) = 0 if 𝑣 = 𝑠
𝐷 (𝑘, 𝑣) = min ( 𝐷 (𝑘 − 1, 𝑣), min𝑒=(𝑢,𝑣) ∈𝐸 𝐷 (𝑘 − 1, 𝑢) + 𝑊 (𝑒) ) if 𝑘 > 0
The implementation is straightforward:
7: for 𝑒 = (𝑢, 𝑣) ∈ 𝐸 do
8: 𝐷 [𝑘] [𝑣] = min(𝐷 [𝑘] [𝑣], 𝐷 [𝑘 − 1] [𝑢] + 𝑊 (𝑒))
9: return 𝐷
All in all, the states 𝐷 (𝑘, 𝑣) for a particular 𝑘 take Θ(|𝐸|) time to evaluate. To
compute the distance 𝑑 (𝑠, 𝑣), we still need to know what the maximum possible
𝑘 needed to arrive at this shortest path could be. It turns out that this could
potentially be infinite, in the case where the graph contains a negative-weight
cycle. Such a cycle can be exploited to construct arbitrarily short paths.
However, if no such cycle exists, 𝑘 = |𝑉 | will be sufficient. If a shortest
path uses more than |𝑉 | edges, it must contain a cycle. If this cycle is not of
negative weight, we may simply remove it to obtain a path of at most the same
length. Thus, the algorithm takes Θ(|𝑉 ||𝐸|) time.
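The table-based pseudocode might be implemented as follows. This is a sketch in plain C++; representing the graph as a list of (𝑢, 𝑣, 𝑤) triples is our choice here.

```cpp
#include <vector>
#include <tuple>
#include <algorithm>
using namespace std;
const long long INF = 1LL << 60;

// D[k][v]: shortest distance from s to v using at most k edges.
// Runs in Theta(|V||E|) time; negative-cycle detection is omitted.
vector<long long> bellmanFord(int V, const vector<tuple<int,int,long long>>& E, int s) {
    vector<vector<long long>> D(V + 1, vector<long long>(V, INF));
    D[0][s] = 0;
    for (int k = 1; k <= V; k++) {
        D[k] = D[k - 1]; // a path with at most k-1 edges also has at most k
        for (auto& [u, v, w] : E)
            if (D[k - 1][u] < INF)
                D[k][v] = min(D[k][v], D[k - 1][u] + w);
    }
    return D[V];
}
```

The Θ(𝑉 ²) table is what Exercise 14.4 below asks you to shrink to Θ(𝑉 ).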
Exercise 14.4. Bellman-Ford can be adapted to instead use only Θ(𝑉 ) memory,
by keeping only the currently known shortest distances and repeatedly relaxing every edge.
Sketch out the pseudo code for such an approach, and prove its correctness.
Floyd-Warshall
Initially, the distance matrix 𝐷 contains the distances of all the edges in 𝐸, so
that 𝐷 [𝑖] [ 𝑗] is the weight of the edge (𝑖, 𝑗) if such an edge exists, ∞ if there is
no edge between 𝑖 and 𝑗, or 0 if 𝑖 = 𝑗. Note that if multiple edges exist between
𝑖 and 𝑗, 𝐷 [𝑖] [ 𝑗] must be given the minimum weight of them all. Additionally, if
there is a self-loop (i.e. an edge from 𝑖 to 𝑖 itself) of negative weight, 𝐷 [𝑖] [𝑖]
must be set to this value.
To see why this approach works, we can use the following invariant proven
by induction. After the 𝑘’th iteration of the loop, 𝐷 [𝑖] [ 𝑗] will be at most the
minimum distance between 𝑖 and 𝑗 that uses vertices 0, 1, . . . , 𝑘 − 1. Assume
that this is true for a particular 𝑘. After the next iteration, there are two cases
for 𝐷 [𝑖] [ 𝑗]. Either there is no shorter path using vertex 𝑘 than those using only
vertices 0, 1, . . . , 𝑘 − 1. In this case, 𝐷 [𝑖] [ 𝑗] will fulfill the condition by the
induction assumption. If there is a shorter path between 𝑖 and 𝑗 if we use the
vertex 𝑘, this must have length 𝐷 [𝑖] [𝑘] + 𝐷 [𝑘] [ 𝑗], since 𝐷 [𝑖] [𝑘] and 𝐷 [𝑘] [ 𝑗]
both contain the shortest paths between 𝑖 and 𝑘, and 𝑘 and 𝑗 using vertices
0, 1, . . . , 𝑘 − 1. Since we set 𝐷 [𝑖] [ 𝑗] = min(𝐷 [𝑖] [ 𝑗], 𝐷 [𝑖] [𝑘] + 𝐷 [𝑘] [ 𝑗]) in the
inner loop, we will surely find this path too in this iteration. Thus, the statement
is true after the 𝑘 + 1’th iteration too. By induction, it is true for 𝑘 = |𝑉 |, meaning
𝐷 [𝑖] [ 𝑗] contains at most the minimum distance between 𝑖 and 𝑗 using any vertex
in the graph.
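The algorithm itself is only three nested loops over this matrix. A sketch in plain C++; the guard against INF is there to avoid overflowing the sum for unreachable pairs.

```cpp
#include <vector>
#include <algorithm>
using namespace std;
const long long INF = 1LL << 60;

// All-pairs shortest paths in Theta(|V|^3). D must be initialized as
// described above: edge weights, 0 on the diagonal, INF where no edge exists.
void floydWarshall(vector<vector<long long>>& D) {
    int V = D.size();
    for (int k = 0; k < V; k++)
        for (int i = 0; i < V; i++)
            for (int j = 0; j < V; j++)
                if (D[i][k] < INF && D[k][j] < INF)
                    D[i][j] = min(D[i][j], D[i][k] + D[k][j]);
}
```

Note that 𝑘 must be the outermost loop: the invariant proven above is about which intermediate vertices have been considered so far.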
14.4 Minimum Spanning Tree
[Figure: a weighted graph on the vertices 𝐴–𝐹 , shown together with a minimum spanning tree of it.]
will thus have replaced the edge {𝑐, 𝑑 } by {𝑎, 𝑑 }, while changing the weight
of the tree by 𝑤 − 𝑤 ′ < 0, reducing the sum of weights. Thus, the tree was
improved by using the minimum-weight edge, proving that it could have been
part of the tree.
Exercise 14.6. What happens if all edges on the cycle that appears have weight
𝑤? Is this a problem for the proof?
When implementing the algorithm, the contraction of the edge added to the
minimum spanning tree is generally not performed explicitly. Instead, a disjoint
set data structure is used to keep track of which subsets of vertices have been
contracted. Then, all the original edges are iterated through in increasing order
of weight. An edge is added to the spanning tree if and only if the two endpoints
of the edge are not already connected (as in Figure 14.7).
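A sketch of this implementation in plain C++, with a minimal disjoint set structure (union by rank is omitted for brevity; path compression alone is fast enough in practice):

```cpp
#include <vector>
#include <tuple>
#include <algorithm>
#include <numeric>
using namespace std;

// Disjoint set structure: find with path compression, unite by re-rooting.
struct DSU {
    vector<int> parent;
    DSU(int n) : parent(n) { iota(parent.begin(), parent.end(), 0); }
    int find(int x) { return parent[x] == x ? x : parent[x] = find(parent[x]); }
    bool unite(int a, int b) {
        a = find(a); b = find(b);
        if (a == b) return false; // already in the same component
        parent[a] = b;
        return true;
    }
};

// Kruskal's algorithm: consider edges in increasing order of weight and add
// an edge exactly when its endpoints are in different components.
long long mst(int V, vector<tuple<long long,int,int>> edges) {
    sort(edges.begin(), edges.end()); // weight is the first tuple element
    DSU dsu(V);
    long long total = 0;
    for (auto& [w, u, v] : edges)
        if (dsu.unite(u, v)) total += w;
    return total;
}
```

Sorting dominates, so the total running time is 𝑂 (𝐸 log 𝐸).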
[Figure 14.7: the steps of Kruskal’s algorithm on an example graph, with edges added in increasing order of weight whenever they connect two different components.]
15 Maximum Flows
This chapter studies so-called flow networks, and algorithms we use to solve the
so-called maximum flow and minimum cut problems on such networks. Flow
problems are common algorithmic problems, particularly in ICPC competitions
(while they are out-of-scope for IOI contests). They are often hidden behind
statements which seem unrelated to graphs and flows, especially the minimum
cut problem.
Finally, we will end with a specialization of maximum flow on the case of
bipartite graphs (called bipartite matching).
[Figure 15.1: an example flow network with source 𝑆, sink 𝑇 and edge capacities.]
• For every 𝑣 ∈ 𝑉 \ {𝑆,𝑇 }, Σ𝑒 ∈in(𝑣) 𝑓 (𝑒) = Σ𝑒 ∈out(𝑣) 𝑓 (𝑒) – flow is conserved
In a computer network, the flows could e.g. represent the current rate of
transfer through each connection.
Exercise 15.1. Prove that the size of a given flow also equals
Σ𝑒 ∈in(𝑇 ) 𝑓 (𝑒) − Σ𝑒 ∈out(𝑇 ) 𝑓 (𝑒)
i.e. the excess flow out from 𝑆 must be equal to the excess flow in to 𝑇 .
In Figure 15.2, flows have been added to the network from Figure 15.1.
Given such a flow, we are generally interested in determining the flow of the
largest size. This is what we call the maximum flow problem. The problem is
not only interesting on its own. Many problems which we study might initially
seem unrelated to maximum flow, but will turn out to be reducible to finding a
maximum flow.
Figure 15.2: An example flow network, where each edge has an assigned flow. The size of the
flow is 8.
Maximum Flow
Given a flow network (𝑉 , 𝐸, 𝑐, 𝑆,𝑇 ), construct a maximum flow from 𝑆 to 𝑇 .
Input
A flow network.
Output
Output the maximal size of a flow, and the flow assigned to each edge in one
such flow.
Exercise 15.2. The flow of the network in Figure 15.2 is not maximal – there is
a flow of size 9. Find such a flow.
15.2 Edmonds-Karp
There are plenty of algorithms which solve the maximum flow problem. Most
of these are too complicated to be practical to implement. We are going
to study two very similar classical algorithms that compute a maximum flow.
We will start with proving the correctness of the Ford-Fulkerson algorithm.
Afterwards, a modification known as Edmonds-Karp will be analyzed (and
found to have a better worst-case complexity).
Augmenting Paths
For each edge, we define a residual flow 𝑟 (𝑒) on the edge, to be 𝑐 (𝑒) − 𝑓 (𝑒). The
residual flow represents the additional amount of flow we may push along an
edge.
In Ford-Fulkerson, we associate every edge 𝑒 with an additional back edge
𝑏 (𝑒) which points in the reverse direction. Each back edge is originally given a flow
and capacity 0. If 𝑒 has a certain flow, we assign the flow of the back-edge
𝑏 (𝑒) to be its negation (i.e. 𝑓 (𝑏 (𝑒)) = −𝑓 (𝑒)). Since the back-edge 𝑏 (𝑒) of 𝑒 has capacity
0, its residual capacity is 𝑟 (𝑏 (𝑒)) = 𝑐 (𝑏 (𝑒)) − 𝑓 (𝑏 (𝑒)) = 0 − (−𝑓 (𝑒)) = 𝑓 (𝑒).
Intuitively, the residual flow represents the amount of flow we can add to a
certain edge. Having a back-edge thus represents “undoing” flows we have added
to a normal edge, since increasing the flow along a back-edge will decrease the
flow of its associated edge.
Figure 15.3: The residual flows from the network in Figure 15.2.
1: procedure Augment(path 𝑃)
2: inc ← ∞
3: for 𝑒 ∈ 𝑃 do
4: inc ← min(inc, 𝑐 (𝑒) − 𝑓 (𝑒))
5: for 𝑒 ∈ 𝑃 do
6: f (e) ← 𝑓 (𝑒) + inc
7: f (b(e)) ← 𝑓 (𝑏 (𝑒)) − inc
8: return inc
7: return 𝑁 𝑖𝑙
8: procedure Dfs(vertex 𝑎𝑡, sink 𝑇 , flow 𝑓 , capacity 𝑐, path p)
9: 𝑝.𝑝𝑢𝑠ℎ(𝑎𝑡)
10: if 𝑎𝑡 = 𝑇 then
11: return 𝑡𝑟𝑢𝑒
12: for every out-edge 𝑒 = (𝑎𝑡, 𝑣) from 𝑎𝑡 do
13: if 𝑓 (𝑒) < 𝑐 (𝑒) then
14: if 𝐷𝐹𝑆 (𝑣,𝑇 , 𝑓 , 𝑐, 𝑝) then
15: return true
16: 𝑝.𝑝𝑜𝑝 ()
17: return 𝑓 𝑎𝑙𝑠𝑒
For integer flows, where the maximum flow has size 𝑚, Ford-Fulkerson may
require up to 𝑂 (𝐸𝑚) time. In the worst case, a DFS takes Θ(𝐸) time to find a
path from 𝑆 to 𝑇 , and one augmenting path may contribute only a single unit of
flow. For non-integral flows, there are instances where Ford-Fulkerson may not
even terminate (nor converge to the maximum flow).
An improvement to this approach is simply to use a BFS instead. This is
what is called the Edmonds-Karp algorithm. The BFS looks similar to the
Ford-Fulkerson DFS, and is modified in the same way (i.e. only traversing
those edges where the flow 𝑓 (𝑒) is smaller than the capacity 𝑐 (𝑒)). The resulting
complexity is instead 𝑂 (𝑉 𝐸 2 ) (which is tight in the worst case).
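A sketch of Edmonds-Karp in plain C++. Storing each back edge right next to its forward edge, so that edge 𝑖 ⊕ 1 is always the back edge of edge 𝑖, is a common representation (one choice among several), and we track only the residual capacities.

```cpp
#include <vector>
#include <queue>
#include <algorithm>
using namespace std;

struct MaxFlow {
    struct Edge { int to; long long cap; }; // cap = remaining residual capacity
    vector<Edge> edges;
    vector<vector<int>> adj; // indices into edges; edge i^1 is the back edge of i

    MaxFlow(int n) : adj(n) {}
    void addEdge(int u, int v, long long cap) {
        adj[u].push_back(edges.size()); edges.push_back({v, cap});
        adj[v].push_back(edges.size()); edges.push_back({u, 0});
    }
    long long run(int S, int T) {
        long long flow = 0;
        while (true) {
            // BFS for a shortest augmenting path; prev[v] is the edge into v.
            vector<int> prev(adj.size(), -1);
            queue<int> q;
            q.push(S);
            while (!q.empty() && prev[T] == -1) {
                int at = q.front(); q.pop();
                for (int id : adj[at]) {
                    int to = edges[id].to;
                    if (edges[id].cap > 0 && to != S && prev[to] == -1) {
                        prev[to] = id;
                        q.push(to);
                    }
                }
            }
            if (prev[T] == -1) break; // no augmenting path remains
            long long inc = 1LL << 60;
            for (int v = T; v != S; v = edges[prev[v] ^ 1].to)
                inc = min(inc, edges[prev[v]].cap);
            for (int v = T; v != S; v = edges[prev[v] ^ 1].to) {
                edges[prev[v]].cap -= inc;     // use up forward capacity
                edges[prev[v] ^ 1].cap += inc; // allow undoing via the back edge
            }
            flow += inc;
        }
        return flow;
    }
};
```

The flow pushed along each original edge can be recovered afterwards as the residual capacity of its back edge.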
15.3 Applications of Flows
This is nearly the standard maximum flow problem, with the addition of
vertex capacities. We are still going to use the normal algorithms for maximum
flow. Instead, we will make some minor modifications to the network. The
additional constraint given is similar to the constraint placed on an edge. An
edge has a certain amount of flow passing through it, implying that the same
amount must enter and exit the edge. For this reason, it seems like a reasonable
approach to reduce the vertex capacity constraint to an ordinary edge capacity,
by forcing all the flow that passes through a vertex 𝑣 with capacity 𝐶 𝑣 through a
particular edge.
If we partition all the edges adjacent to 𝑣 into incoming and outgoing edges,
it becomes clear how to do this. We can split up 𝑣 into two vertices 𝑣𝑖𝑛 and
𝑣𝑜𝑢𝑡 , where all the incoming edges to 𝑣 are now incoming edges to 𝑣𝑖𝑛 and the
outgoing edges instead become outgoing edges from 𝑣𝑜𝑢𝑡 . If we then add an
edge of infinite capacity from 𝑣𝑖𝑛 to 𝑣𝑜𝑢𝑡 , we claim that the maximum flow
of the network does not change. All the flow that passes through this vertex
must now pass through this edge between 𝑣𝑖𝑛 and 𝑣𝑜𝑢𝑡 . This construction thus
accomplishes our goal of forcing the vertex flow through a particular edge. We
can now enforce the vertex capacity by changing the capacity of this edge to 𝐶 𝑣 .
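The rewiring can be done mechanically. A sketch in plain C++ that outputs the edge list of the transformed network; the vertex numbering 𝑣 ↦ 2𝑣, 2𝑣 + 1 is a choice made here.

```cpp
#include <vector>
#include <tuple>
using namespace std;

// Vertex splitting: vertex v becomes v_in = 2v and v_out = 2v + 1, joined by
// an edge of capacity C[v]. Every original edge (u, v, c) is rewired from
// u_out to v_in. The result is an ordinary flow network without vertex
// capacities, represented as (from, to, capacity) triples.
vector<tuple<int,int,long long>> splitVertices(
        int V, const vector<long long>& C,
        const vector<tuple<int,int,long long>>& edges) {
    vector<tuple<int,int,long long>> result;
    for (int v = 0; v < V; v++)
        result.push_back({2 * v, 2 * v + 1, C[v]});
    for (auto& [u, v, c] : edges)
        result.push_back({2 * u + 1, 2 * v, c});
    return result;
}
```

Any maximum flow algorithm can then be run on the resulting network with source 𝑆_out and sink 𝑇_in.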
No two selected edges may share an endpoint, which brings only a minor complication.
After all, this condition is equivalent to each of the vertices in the graph having
a vertex capacity of 1. We already know how to enforce vertex capacities from
the previous problem, where we split each such vertex into two, one for in-edges
and one for out-edges. Then, we added an edge between them with the required
capacity. After performing this modification on the given graph, we are still
missing one important part of a flow network. The network does not yet have
a source and sink. Since we want flow to go along the edges, from one of the
parts to another part of the graph, we should place the source at one side of the
graph and the sink at the other, connecting the source to all vertices on one side
and all the vertices on the other side to the sink.
Exercise 15.3. The minimum path cover reduction can be modified slightly
to find a minimum cycle cover in a directed graph instead. Construct such a
reduction.
16 Strings
In computing, much of the information we process is text. Therefore, it should
not come as a surprise that many common algorithms and problems concern
text strings. In this chapter, we will study some of the common string
algorithms and data structures.
16.1 Tries
The trie (also called a prefix tree) is the most common string-related data
structure. It represents a set of words as a rooted tree, where every prefix of
every word is a vertex, with children from a prefix 𝑃 to all strings 𝑃𝑐 which are
also prefixes of a word. If two words have a common prefix, the prefix only
appears once as a vertex. The root of the tree is the empty prefix. The trie is
very useful when we want to associate some information with prefixes of strings
and quickly get the information from neighboring strings.
The most basic operation of the trie is the insertion of strings, which may be
implemented as follows.
Trie
struct Trie {
  map<char, Trie> children;
  bool isWord = false;

  void insert(const string& s, int pos) {
    if (pos != sz(s)) children[s[pos]].insert(s, pos + 1);
    else isWord = true;
  }
};
We mark those vertices which correspond to an inserted word using a boolean
flag isWord. Many problems essentially can be solved by very simple usage of a
trie, such as the following IOI problem.
Type Printer
International Olympiad in Informatics 2008
You need to print 𝑁 words on a movable type printer. Movable type printers are
those old printers that require you to place small metal pieces (each containing a
letter) in order to form words. A piece of paper is then pressed against them
to print the word. The printer you have allows you to do any of the following
operations:
• Add a letter to the end of the word currently in the printer.
• Remove the last letter from the end of the word currently in the printer.
You are only allowed to do this if there is at least one letter currently in
the printer.
• Print the word currently in the printer.
Initially, the printer is empty; it contains no metal pieces with letters. At the
end of printing, you are allowed to leave some letters in the printer. Also, you
are allowed to print the words in any order you like. As every operation requires
time, you want to minimize the total number of operations.
Your task is to output a sequence of operations that prints all the words using
the minimum number of operations needed.
Input
The first line contains the number of words 1 ≤ 𝑁 ≤ 25 000. The next 𝑁 lines
contain the words to be printed, one per line. Each word is at most 20 letters
long and consist only of lower case letters a-z. All words will be distinct.
Output
Output a sequence of operations that prints all the words. The operations should
be given in order, one per line, starting with the first. Adding a letter c is
represented by outputting c on a line. Removing the last letter of the current
word is represented by a -. Printing the current word is done by outputting P.
Let us start by solving a variation of the problem, where we are not allowed
to leave letters in the printer at the end. First of all, are there actions that never
make sense? For example, what sequences of letters will ever appear in the type
printer during an optimal sequence of operations? Clearly we never wish to input
a sequence that is not a prefix of a word we wish to type. For example, if we
input 𝑎𝑏𝑐𝑑𝑒 𝑓 and this is not a prefix of any word, we must at some point erase
the last letter 𝑓 , without having printed any words. But then we can erase the
entire sequence of operations between inputting the 𝑓 and erasing the 𝑓 , without
changing what words we print.
On the other hand, every prefix of a word we wish to print must at some
point appear on the type printer. Otherwise, we would not be able to reach the
word we wish to print. Therefore, the partial words to ever appear on the type
printer are exactly the prefixes of the words we wish to print – strongly hinting
at a trie-based solution.
If we build the trie of all words we wish to print, it contains as vertices exactly
those strings which will appear as partial words on the printer. Furthermore, the
additions and removals of letters form a sequence of vertices that are connected
by edges in this trie. We can move either from a prefix 𝑃 to a prefix 𝑃𝑐, or from
a prefix 𝑃𝑐 to a prefix 𝑃, which are exactly the edges of a trie. The goal is then
to construct the shortest possible tour starting at the root of the trie and passing
through all the vertices of the trie.
Since a trie is a tree, any such trail must pass through every edge of the trie
at least twice. If we only passed through an edge once, we can never get back
to the root since every edge disconnects the root from the endpoint of the edge
further away from the root. It is actually possible to construct a trail which
passes through every edge exactly twice (which is not particularly difficult if
you attempt this task by hand). As it happens, the depth-first search of a tree
passes through an edge exactly twice – once when first traversing the edge to an
unvisited vertex, and once when backtracking.
The problem is subtly different once we are allowed to leave some letters in
the printer at the end. Clearly, the only difference between an optimal sequence
when letters may remain and an optimal sequence when we must leave the
printer empty is that we are allowed to skip some trailing removal operations. If
the last word we print is 𝑆, the difference will be exactly |𝑆 | “-” operations. An
optimal solution will therefore print the longest word last, in order to “win” as
many “-” operations as possible. We would like this last word to be the longest
word of all the ones we print if possible. In fact, we can order our DFS such that
this is possible. First of all, our DFS starts from the root and the longest word is
𝑠 1𝑠 2 . . . 𝑠𝑛 . When selecting which order the DFS should visit the children of the
root in, we can select the child 𝑠 1 last. Thus, all words starting with the letter 𝑠 1
will be printed last. When visiting 𝑠 1 , we use the same trick and visit the child
𝑠 1𝑠 2 last of the children of 𝑠 1 , and so on. This guarantees 𝑆 to be the last word to
be printed.
Note that the solution requires no additional data to be stored in the trie –
the only modification to our basic trie is the DFS.
Typewriter
struct Trie {
  ...

  void dfs(int depth, const string& longest, bool onPath) {
    // Visit the children in any order, except that along the path of the
    // longest word we visit the child continuing that word last.
    char last = (onPath && depth < sz(longest)) ? longest[depth] : 0;
    trav(it, children)
      if (it.first != last)
        dfs2(depth, longest, it.first, false);
    if (children.count(last))
      dfs2(depth, longest, last, true);
  }

  void dfs2(int depth, const string& longest, char output, bool onPath) {
    cout << output << endl;
    if (children[output].isWord) cout << "P" << endl;
    children[output].dfs(depth + 1, longest, onPath);
    if (!onPath) cout << "-" << endl;
  }
};
Generally, the uses of tries are not this simple, where we only need to
construct the trie and fetch the answer through a simple traversal. We often need
to augment tries with additional information about the prefixes we insert. This
is when tries start to become really powerful. The next problem requires only a
small augmentation of a trie, to solve a problem which looks complex.
Rareville
In Rareville, everyone must have a distinct name. When a new-born baby is to
be given a name, its parents must first visit NAME, the Naming Authority under
the Ministry of Epithets, to have its name approved. The authority has a long
list of all names assigned to the citizens of Rareville. When deciding whether to
approve a name or not, a case officer uses the following procedure. They start
at the first name in the list, and read the first letter of it. If this letter matches
the first letter of the proposed name, they proceed to read the next letter in the
word. This is repeated for every letter of the name in the list. After reading a
letter from the word, the case officer can sometimes determine that this could not
possibly be the same name as the proposed one. This happens if either
• the next letter in the proposed name did not match the name in the list, or
• one of the two names ended while the other had letters remaining.
When this happens, the case officer starts over with the next name in the list,
until exhausting all names in the list. For each letter the case officer reads (or
attempts to read) from a name in the list, one second passes.
Currently, there are 𝑁 people in line waiting to apply for a name. Can you
determine how long the decision process will take for each person?
Input
The first line contains integers 1 ≤ 𝐷 ≤ 200 000 and 1 ≤ 𝑁 ≤ 200 000, the size
of the dictionary and the number of people waiting in line. The next 𝐷 lines
contain one lowercase name each, the contents of the dictionary. The next 𝑁
lines contain one lowercase name each, the names the people in line wish to
apply with. The total size of the lists is at most 10⁶ letters.
Output
For each of the 𝑁 names, output the time (in seconds) the case officer needs to
decide on the application.
The problem clearly relates to prefixes in some way. Given a dictionary
word 𝐴 and an application for a name 𝐵, the case officer needs to read letters
from 𝐴 corresponding to the longest common prefix of 𝐴 and 𝐵, plus 1. Hence,
our solution will probably be to consider all the prefixes of each proposed name,
which is exactly what tries are good at.
Instead of thinking about this process one name at a time, we use a common trie
technique and look at the transpose of this problem, i.e. for every 𝑖, how many
names 𝐶𝑖 have a common prefix of length at least 𝑖 with the proposed name 𝑆?
This way, we have transformed the problem from being about 𝐷 individual
processes to |𝑆 | smaller problems which treat the dictionary as a unified group
of strings. Then, the case officer will have to read 𝐶 0 + 𝐶 1 + · · · + 𝐶 |𝑆 | letters.
Now, the solution should be clear. We augment the trie vertex for a particular
prefix 𝑝 with the number of strings 𝑃𝑝 in the list that start with this prefix.
Initially, an empty trie has 𝑃𝑝 = 0 for every 𝑝. Whenever we insert a new word
𝑊 = 𝑤 1𝑤 2 . . . in the trie, we need to increment 𝑃 𝑤1 , 𝑃 𝑤1 𝑤2 , . . . , to keep all the 𝑃𝑝
correct, since we have added a new string which has those prefixes. Then, we
have that 𝐶𝑖 = 𝑃𝑠1𝑠2 ...𝑠𝑖 , so that we can compute all the numbers 𝐶𝑖 by following
the word 𝑆 in the trie. The construction of the trie is linear in the number of
characters we insert, and responding to a query is linear in the length of the
proposed name.
Rareville
struct Trie {
    map<char, Trie> children;
    int P = 0; // the number of inserted words having this node's prefix

    void insert(const string& s, int pos) {
        P++;
        if (pos != sz(s)) children[s[pos]].insert(s, pos + 1);
    }

    int query(const string& s, int pos) {
        int ans = P;
        if (pos != sz(s)) {
            auto it = children.find(s[pos]);
            if (it != children.end()) ans += it->second.query(s, pos + 1);
        }
        return ans;
    }
};
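As a sketch of how the trie could be used to solve the whole problem (the struct is repeated to keep the snippet self-contained, and the function name rareville is my own), the answer for a proposed name 𝑆 is exactly the sum 𝐶₀ + 𝐶₁ + · · · + 𝐶|𝑆| that query computes:

```cpp
#include <map>
#include <string>
#include <vector>
using namespace std;
#define sz(x) ((int)(x).size())

struct Trie {
    map<char, Trie> children;
    int P = 0; // the number of inserted words having this node's prefix

    void insert(const string& s, int pos) {
        P++;
        if (pos != sz(s)) children[s[pos]].insert(s, pos + 1);
    }

    int query(const string& s, int pos) {
        int ans = P;
        if (pos != sz(s)) {
            auto it = children.find(s[pos]);
            if (it != children.end()) ans += it->second.query(s, pos + 1);
        }
        return ans;
    }
};

// For each proposed name S, the officer's total reading time is
// C_0 + C_1 + ... + C_|S|, which is exactly what query sums up.
vector<int> rareville(const vector<string>& dict, const vector<string>& names) {
    Trie trie;
    for (const string& w : dict) trie.insert(w, 0);
    vector<int> ans;
    for (const string& s : names) ans.push_back(trie.query(s, 0));
    return ans;
}
```

For the dictionary {anna, annika, bob} and the proposed name anna, the officer reads 3 + 2 + 2 + 2 + 1 = 10 letters.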
16.2 String Matching
String Matching
Find all occurrences of the pattern 𝑃 as a substring in the string 𝑊 .
since each of the 𝑛/2 positions where the pattern can appear requires us to look
ahead for 𝑛/2 characters to realize we made a match. On the other hand, if we
manage to find a long partial match of length 𝑙 starting at 𝑖, we know what the
next 𝑙 letters of 𝑊 are – they are the 𝑙 first letters of 𝑃. With some cleverness,
we should be able to exploit this fact, hopefully avoiding the need to scan them
again when we attempt to find a match starting at 𝑖 + 1.
For example, assume we have 𝑃 = 𝑏𝑎𝑛𝑎𝑛𝑎𝑟𝑎𝑚𝑎. Then, if we have performed
a partial match of 𝑏𝑎𝑛𝑎𝑛𝑎 at some position 𝑖 in 𝑊 but the next character
is a mismatch (i.e., it is not an 𝑟 ), we know that no match can begin at
the next 5 characters. Since we have matched 𝑏𝑎𝑛𝑎𝑛𝑎 at 𝑖, we have that
𝑊 [𝑖 + 1...𝑖 + 5] = 𝑎𝑛𝑎𝑛𝑎, which does not contain a 𝑏.
As a more interesting example, take 𝑃 = 𝑎𝑏𝑏𝑎𝑎𝑏𝑏𝑜𝑟𝑟𝑒. This pattern has the
property that the partial match 𝑎𝑏𝑏𝑎𝑎𝑏𝑏 contains a prefix of 𝑃
as a suffix, namely 𝑎𝑏𝑏. This means that if at some position 𝑖 we get this partial match
but the next character is a mismatch, we cannot immediately skip the next 6
characters. It is possible that the entire string could have been 𝑎𝑏𝑏𝑎𝑎𝑏𝑏𝑎𝑎𝑏𝑏𝑜𝑟𝑟𝑒.
Then, an actual match (starting at the fifth character) overlaps our partial match.
It seems that if we find a partial match of length 7 (i.e. 𝑎𝑏𝑏𝑎𝑎𝑏𝑏), we can only
skip the first 4 characters of the partial match.
For every possible partial match of the pattern 𝑃, how many characters are
we able to skip if we fail a 𝑘-length partial match? If we could precompute such
a table, we should be able to perform matching in linear time, since we would
only have to investigate every character of 𝑊 once. Assume the next possible
match is 𝑙 letters forward. Then the new partial match must consist of the last
𝑘 − 𝑙 letters of the old partial match, i.e. 𝑃 [𝑙 . . . 𝑘 − 1]. But a partial match is just a
prefix of 𝑃, so we must have 𝑃 [𝑙 . . . 𝑘 − 1] = 𝑃 [0 . . . 𝑘 − 𝑙 − 1]. In other words, for
every given 𝑘, we must find the longest suffix of 𝑃 [0 . . . 𝑘 − 1] that is also a
prefix of 𝑃 (besides 𝑃 [0 . . . 𝑘 − 1] itself, of course).
We can compute these suffixes rather easily in 𝑂(𝑛²). For each possible
position for the next possible match 𝑙, we perform a string matching to find all
occurrences of prefixes of 𝑃 within 𝑃:
1: procedure LongestSuffixes(pattern 𝑃)
2:   𝑇 ← new int[|𝑃| + 1]
3:   for 𝑙 from 1 to |𝑃| − 1 do
4:     matchLen ← 0
5:     while 𝑙 + matchLen < |𝑃| do
6:       if 𝑃 [𝑙 + matchLen] ≠ 𝑃 [matchLen] then
7:         break
8:       matchLen ← matchLen + 1
9:       𝑇 [𝑙 + matchLen] ← max(𝑇 [𝑙 + matchLen], matchLen)
10:  return 𝑇
For 𝑃 = 𝑏𝑎𝑛𝑎𝑛𝑎𝑟𝑎𝑚𝑎, no prefix of 𝑃 reappears anywhere else in 𝑃, so the table
contains only zeros:

𝑃  𝑏 𝑎 𝑛 𝑎 𝑛 𝑎 𝑟 𝑎 𝑚 𝑎
𝑇  0 0 0 0 0 0 0 0 0 0
When 𝑃 = 𝑎𝑏𝑏𝑎𝑎𝑏𝑏𝑜𝑟𝑟𝑒, the table instead becomes:
𝑃 𝑎 𝑏 𝑏 𝑎 𝑎 𝑏 𝑏 𝑜 𝑟 𝑟 𝑒
𝑇 0 0 0 1 1 2 3 0 0 0 0
With this precomputation, we can now perform matching in linear time.
The matching is similar to the naive matching, except that on a mismatch we
can now use this table to skip ahead, instead of restarting from the next position.
In each iteration of the loop, we see that either match is increased by one,
or match is decreased by match − 𝑇 [match] and pos is increased by the same
amount. Since match is bounded by |𝑃| and pos is bounded by |𝑊 |, this can
happen at most |𝑊 | + |𝑃 | times. Each iteration takes constant time, meaning our
matching takes Θ(|𝑊 | + |𝑃 |) time.
While this is certainly better than the naive string matching, it is not
particularly helpful when |𝑃 | = Θ(|𝑊 |), since we still need 𝑂(|𝑃|²) preprocessing.
The solution lies in how we computed the table of suffix matches, or rather, the
fact that it is entirely based on string matching itself. We just learned how to use
this table to perform string matching in linear time. Maybe we can use this table
to extend itself and get the precomputation down to 𝑂 (|𝑃 |)? After all, we are
looking for occurrences of prefixes of 𝑃 in 𝑃 itself, which is exactly what string
matching does. If we modify the string matching algorithm for this purpose, we
get what we need:
1: procedure LongestSuffixes(pattern 𝑃)
2:   𝑇 ← new int[|𝑃 | + 1]
3:   pos ← 1, match ← 0
4:   while pos + match < |𝑃 | do
5:     if 𝑃 [pos + match] = 𝑃 [match] then
6:       𝑇 [pos + match + 1] ← match + 1
7:       match ← match + 1
8:     else if match = 0 then
9:       pos ← pos + 1
10:    else
11:      pos ← pos + match − 𝑇 [match]
12:      match ← 𝑇 [match]
13:  return 𝑇
Clock Pictures
Nordic Collegiate Programming Contest 2014
You have two pictures of an unusual kind of clock. The clock has 𝑛 hands, each
having the same length and no kind of marking whatsoever. Also, the numbers
on the clock are so faded that you can’t even tell anymore what direction is up in
the picture. So the only things that you see in the pictures are the 𝑛 shades of the 𝑛
hands, and nothing else.
You’d like to know if both images might have been taken at exactly the same
time of the day, possibly with the camera rotated at different angles.
Given the description of the two images, determine whether it is possible
that these two pictures could be showing the same clock displaying the same
time.
Input
The first line contains a single integer 𝑛 (2 ≤ 𝑛 ≤ 200000), the number of hands
on the clock.
Each of the next two lines contains 𝑛 integers 𝑎𝑖 (0 ≤ 𝑎𝑖 ≤ 360000),
representing the angles of the hands of the clock on one of the images, in
thousandths of a degree. The first line represents the position of the hands on
the first image, whereas the second line corresponds to the second image. The
number 𝑎𝑖 denotes the angle between the recorded position of some hand and
the upward direction in the image, measured clockwise. Angles of the same
clock are distinct and are not given in any specific order.
Output
Output one line containing one word: possible if the clocks could be showing
the same time, impossible otherwise.
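This problem can be attacked with the string matching machinery of this section. One possible approach (a sketch, not necessarily the intended solution): sort the angles, form the cyclic sequence of gaps between consecutive hands, and check whether one gap sequence is a cyclic rotation of the other by searching for it in the other sequence concatenated with itself. For brevity the sketch uses std::search; the linear-time matcher above could be substituted for a guaranteed Θ(𝑛) search.

```cpp
#include <algorithm>
#include <vector>
using namespace std;

// Gaps between consecutive hands (in thousandths of a degree), in sorted
// cyclic order. Two clocks show the same time exactly when one gap
// sequence is a cyclic rotation of the other.
vector<int> gaps(vector<int> a) {
    sort(a.begin(), a.end());
    vector<int> g;
    for (size_t i = 0; i < a.size(); i++)
        g.push_back(((a[(i + 1) % a.size()] - a[i]) % 360000 + 360000) % 360000);
    return g;
}

bool sameTime(const vector<int>& a, const vector<int>& b) {
    vector<int> ga = gaps(a), gb = gaps(b);
    vector<int> doubled = ga;
    doubled.insert(doubled.end(), ga.begin(), ga.end());
    // gb is a cyclic rotation of ga iff gb occurs as a substring of ga + ga.
    return search(doubled.begin(), doubled.end(), gb.begin(), gb.end()) != doubled.end();
}
```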
17 Combinatorics
Combinatorics deals with various discrete structures, such as graphs and
permutations. In this chapter, we will mainly study the branch of combinatorics
known as enumerative combinatorics – the art of counting. We will count the
number of ways to choose 𝐾 different candies from 𝑁 different candies, the
number of distinct seating arrangements around a circular table, the sum of sizes
of all subsets of a set and many more objects. Many combinatorial counting
problems are based on a few standard techniques which we will learn in this
chapter.
17.1 The Addition and Multiplication Principles

The addition principle states that the size of the union of 𝑛 pairwise disjoint
sets equals the sum of their individual sizes:

|𝑆 1 ∪ 𝑆 2 ∪ · · · ∪ 𝑆𝑛 | = |𝑆 1 | + |𝑆 2 | + · · · + |𝑆𝑛 |
Example 17.1 Assume we have 5 different types of chocolate bars (the set 𝐶),
3 different types of bubble gum (the set 𝐺), and 4 different types of lollipops
(the set 𝐿). These form three disjoint sets, meaning we can compute the
total number of snacks by summing up the number of snacks of the different
types. Thus, we have |𝐶 | + |𝐺 | + |𝐿| = 5 + 3 + 4 = 12 different snacks.
Later on, we will see a generalization of the addition principle that handles
cases where our sets are not disjoint.
The multiplication principle, on the other hand, states that the size of the
Cartesian product 𝑆 1 × 𝑆 2 × · · · × 𝑆𝑛 equals the product of the individual sizes of
these sets, i.e.
|𝑆 1 × 𝑆 2 × · · · × 𝑆𝑛 | = |𝑆 1 | · |𝑆 2 | · · · |𝑆𝑛 |
Example 17.2 Assume that we have the same sets of candies 𝐶, 𝐺 and 𝐿
as in Example 17.1. We want to compose an entire dinner out of snacks,
by choosing one chocolate bar, one bubble gum and one lollipop. The
multiplication principle tells us that, modeling a snack dinner as a tuple
(𝑐, 𝑔, 𝑙) ∈ 𝐶 × 𝐺 × 𝐿, we can form our dinner in 5 · 3 · 4 = 60 ways.
Example 17.3 How many four letter words consisting of the letters 𝑎, 𝑏
and 𝑐 contain exactly two letters 𝑎?
There are six possible ways to place the two letters 𝑎:
𝑎𝑎__
𝑎_𝑎_
𝑎__𝑎
_𝑎𝑎_
_𝑎_𝑎
__𝑎𝑎
For each of these ways, there are four ways of choosing the other two letters
(𝑏𝑏, 𝑏𝑐, 𝑐𝑏, 𝑐𝑐). Thus, there are 4 + 4 + 4 + 4 + 4 + 4 = 6 · 4 = 24 such words.
Let us now apply these basic principles to solve the following problem:
Kitchen Combinatorics
Northwestern Europe Regional Contest 2015 – Per Austrin
The world-renowned Swedish Chef is planning a gourmet three-course dinner
for some muppets: a starter course, a main course, and a dessert. His famous
Swedish cook-book offers a wide variety of choices for each of these three
courses, though some of them do not go well together (for instance, you of
course cannot serve chocolate moose and sooted shreemp at the same dinner).
Each potential dish has a list of ingredients. Each ingredient is in turn
available from a few different brands. Each brand is of course unique in its own
special way, so using a particular brand of an ingredient will always result in a
completely different dinner experience than using another brand of the same
ingredient.
Some common ingredients such as pølårber may appear in two of the three
chosen dishes, or in all three of them. When an ingredient is used in more than
one of the three selected dishes, Swedish Chef will use the same brand of the
ingredient in all of them.
While waiting for the meecaroo, Swedish Chef starts wondering: how many
different dinner experiences are there that he could make, by different choices
of dishes and brands for the ingredients?
Input
The input consists of:
• 𝑠 + 𝑚 + 𝑑 dishes – the 𝑠 starter dishes, then the 𝑚 main dishes, then the 𝑑
desserts. Each dish starts with an integer 1 ≤ 𝑘 ≤ 20 denoting the number
of ingredients of the dish, and is followed by 𝑘 distinct integers 𝑖₁, . . . , 𝑖ₖ,
where for each 1 ≤ 𝑗 ≤ 𝑘, 1 ≤ 𝑖ⱼ ≤ 𝑟 is an ingredient.
Output
If the number of different dinner experiences Swedish Chef can make is at most
10¹⁸, then output that number. Otherwise, output “too many”.
The solution is a similar addition-multiplication principle combo as used
in Example 17.3. First off, we can simplify the problem considerably by brute
forcing over the coarsest component of a dinner experience, namely the courses
included. Since there are at most 25 dishes of every type, we need to check up
to 25³ = 15 625 choices of dishes. By the addition principle, we can compute
the number of dinner experiences for each such three-course dinner, and then
sum them up to get the answer. Some pairs of dishes do not go well together. At
this stage in the process we exclude any triple of dishes that include such a pair.
We can perform this check in Θ(1) time if we save the incompatible dishes in
2D boolean vectors, so that e.g. badStarterMain[𝑖] [ 𝑗] determines if starter 𝑖 is
incompatible with main dish 𝑗.
For a given dinner consisting of starter 𝑎, main dish 𝑏 and dessert
𝑐, only the set of ingredients of the three dishes matters, since the chef will use the
same brand for an ingredient even if it is part of two dishes. The next step is
thus to compute this set by taking the union of the ingredients of the three included
dishes. This step takes Θ(𝑘𝑎 + 𝑘𝑏 + 𝑘𝑐 ). Once this set is computed, the only
remaining task is to choose a brand for each ingredient. Assigning brands is
an ordinary application of the multiplication principle, where we multiply the
number of brands available for each ingredient together.
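Since this product may exceed 10¹⁸, the multiplication has to be capped against the output limit to avoid 64-bit overflow. A minimal sketch of such a capped multiplication (the helper name is my own):

```cpp
// The problem's output limit: answers above 10^18 are reported as "too many".
const long long LIMIT = 1000000000000000000LL; // 10^18

// Multiply a and b, but clamp the result just above LIMIT so that
// repeated multiplication can never overflow a signed 64-bit integer.
long long mulCap(long long a, long long b) {
    if (b != 0 && a > LIMIT / b) return LIMIT + 1; // definitely "too many"
    return a * b;
}
```

Multiplying the brand counts of all ingredients with mulCap, a final result above LIMIT means "too many" should be printed.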
17.2 Permutations
A permutation of a set 𝑆 is an ordering of all the elements in the set. For
example, the set {1, 2, 3} has 6 permutations:
123 132
213 231
312 321
Our first “real” combinatorial problem will be to count the number of per-
mutations of an 𝑛-element set 𝑆. When counting permutations, we use the
multiplication principle. We will show a procedure that can be used to construct
permutations one element at a time. Assume that the permutation is the sequence
⟨𝑎₁, 𝑎₂, . . . , 𝑎ₙ⟩. The first element of the permutation, 𝑎₁, can be assigned any
of the 𝑛 elements of 𝑆. Once this assignment has been made, we have 𝑛 − 1
elements we can choose to be 𝑎 2 (any element of 𝑆 except 𝑎 1 ). In general, when
we are to select the (𝑖 + 1)’th value 𝑎𝑖+1 of the permutation, 𝑖 elements have
already been included in the permutation, leaving 𝑛 − 𝑖 options for 𝑎𝑖+1 . Using
this argument for all 𝑛 elements of the sequence, we can construct a permutation
in 𝑛 · (𝑛 − 1) · · · 2 · 1 ways (by the multiplication principle).
This number is so useful that it has its own name and notation.
Definition 17.1 — Factorial
The factorial of 𝑛, where 𝑛 is a non-negative integer, denoted 𝑛!, is defined
as the product 𝑛! = 𝑛 · (𝑛 − 1) · · · 2 · 1, with 0! = 1.

The asymptotic growth of the factorial is given by Stirling’s formula¹:

𝑛! = √(2𝜋𝑛) · (𝑛/𝑒)ⁿ · (1 + 𝑂(1/𝑛))
Exercise 17.1. In how many ways can 8 persons be seated around a round table,
if we consider cyclic rotations of a seating to be different? What if we consider
cyclic rotations to be equivalent?
Problem 17.1
𝑛’th permutation – nthpermutation
Name That Permutation – namethatpermutation
Permutations as Bijections
The word permutation has roots in Latin, meaning “to change completely”. We
are now going look at permutations in a very different light, which gives some
justification to the etymology of the word.
Given a set such as [5], we can fix some ordering of its elements such as
⟨1, 2, 3, 4, 5⟩. A permutation 𝜋 = ⟨1, 3, 4, 5, 2⟩ of this set can then be seen as a
movement of these elements. Of course, this same movement can be applied
to any other 5-element set with a fixed ordering, such as ⟨𝑎, 𝑏, 𝑐, 𝑑, 𝑒⟩ being
transformed to ⟨𝑎, 𝑐, 𝑑, 𝑒, 𝑏⟩. This suggests that we can consider a permutation as
a “rule” which describes how to move – permute – the elements.
Such a movement rule can also be described as a function 𝜋 : [𝑛] → [𝑛],
where 𝜋(𝑖) describes what element should be placed at position 𝑖.
¹Named after James Stirling (who has other important combinatorial objects named after him
too), but stated already by his contemporary Abraham de Moivre.
Permutations also have inverses, which are simply the inverses of the corresponding
functions. The permutation 𝜋 = ⟨1, 3, 4, 5, 2⟩ which we looked at in the beginning thus
has the inverse given by

𝜋⁻¹(1) = 1, 𝜋⁻¹(3) = 2, 𝜋⁻¹(4) = 3, 𝜋⁻¹(5) = 4, 𝜋⁻¹(2) = 5,

written in permutation notation as ⟨1, 5, 2, 3, 4⟩. Since this is the functional
inverse, we expect 𝜋⁻¹𝜋 = id.
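Computing the inverse is a single pass: if 𝜋 places value 𝜋(𝑖) at position 𝑖, the inverse sends that value back to 𝑖. A sketch (1-indexed permutation values stored in a 0-indexed vector; the function name is my own):

```cpp
#include <vector>
using namespace std;

// inv[pi(i) - 1] = i, so that inv, read as a function, is the
// functional inverse of pi.
vector<int> inverse(const vector<int>& pi) {
    vector<int> inv(pi.size());
    for (int i = 0; i < (int)pi.size(); i++)
        inv[pi[i] - 1] = i + 1;
    return inv;
}
```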
𝑖 1 2 3 4 5
↓ ↓ ↓ ↓ ↓
𝜋 (𝑖) 1 3 4 5 2
↓ ↓ ↓ ↓ ↓
𝜋 −1 𝜋 (𝑖) 1 2 3 4 5
Problem 17.3
Permutation Inverse – permutationinverse
𝑖 1 2 3 4 5
↓ ↓ ↓ ↓ ↓
𝜋 (𝑖) 2 1 4 5 3
↓ ↓ ↓ ↓ ↓
𝜋 2 (𝑖) 1 2 5 3 4
↓ ↓ ↓ ↓ ↓
𝜋 3 (𝑖) 2 1 3 4 5
↓ ↓ ↓ ↓ ↓
𝜋 4 (𝑖) 1 2 4 5 3
↓ ↓ ↓ ↓ ↓
𝜋 5 (𝑖) 2 1 5 3 4
↓ ↓ ↓ ↓ ↓
𝜋 6 (𝑖) 1 2 3 4 5
Problem 17.4
Cycle Decomposition – cycledecomposition
𝜋^𝑙(𝑐₁) = 𝑐₁, 𝜋^𝑙(𝑐₂) = 𝑐₂, . . . , 𝜋^𝑙(𝑐ₗ) = 𝑐ₗ
Problem 17.5
Order of a Permutation – permutationorder
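The power table above suggests how such a problem can be solved: the order of a permutation is the least common multiple of its cycle lengths. A sketch (assuming C++17 for std::lcm):

```cpp
#include <numeric>
#include <vector>
using namespace std;

// The order of a permutation (1-indexed values) is the lcm of the
// lengths of the cycles in its cycle decomposition.
long long permutationOrder(const vector<int>& pi) {
    int n = pi.size();
    vector<bool> seen(n, false);
    long long order = 1;
    for (int i = 0; i < n; i++) {
        if (seen[i]) continue;
        long long len = 0;
        // Walk the cycle containing position i, marking and counting it.
        for (int j = i; !seen[j]; j = pi[j] - 1) { seen[j] = true; len++; }
        order = lcm(order, len);
    }
    return order;
}
```

For 𝜋 = ⟨2, 1, 4, 5, 3⟩ (cycles of lengths 2 and 3) the order is lcm(2, 3) = 6, matching the table where 𝜋⁶ is the first power equal to the identity.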
Dance Reconstruction
Nordic Collegiate Programming Contest 2013 – Lukáš Poláček
Marek loves dancing, and he got really excited when he heard about the coming wedding
of his best friend Miroslav. For a whole month he worked on a special dance for
the wedding. The dance was performed by 𝑁 people and there were 𝑁 marks
on the floor. There was an arrow from each mark to another mark and every
mark had exactly one incoming arrow. The arrow could also be pointing back to
the mark itself.
This suggests our first simplification of the problem: to consider all cycles
of 𝜋^𝐾 partitioned by their lengths. By Exercise 17.3, cycles of different lengths
are completely unrelated in the cycle decomposition of 𝜋^𝐾.
The result also gives us a way to “reverse” the decomposition that happens
to the cycles of 𝜋. Given 𝑙/𝑚 cycles of length 𝑚 in 𝜋^𝐾, we can combine them
into an 𝑙-cycle in 𝜋 in the case where 𝑚 · gcd(𝑙, 𝐾) = 𝑙. By looping over every
possible cycle length 𝑙 (from 1 to 𝑁), we can then find all possible ways to
combine cycles of 𝜋^𝐾 into larger cycles of 𝜋. This step takes Θ(𝑁 log(𝑁 + 𝐾))
due to the GCD computation.
Given all the ways to combine cycles, a knapsack problem remains for each
cycle length of 𝜋^𝐾. If we have 𝑎 cycles of length 𝑙 in 𝜋^𝐾, we want to partition
them into sets of certain sizes (given by the previous computation). This step
takes Θ(𝑎 · 𝑐) time, if there are 𝑐 ways to combine the 𝑎 cycles of a given length.
Once it has been decided what cycles are to be combined, only the act of
computing a combination of them remains. This is not difficult on a conceptual
level, but is a good practice to do on your own (the solution to Exercise 17.3
basically outlines the reverse procedure).
17.4 Binomial Coefficients
Once we have chosen the first 𝑘 elements of a permutation, there are (𝑛 − 𝑘)!
ways to order the remaining 𝑛 − 𝑘 elements. Thus, we must have divided our
𝑛! permutations into one group for each ordered 𝑘-length sequence, with each
group containing (𝑛 − 𝑘)! elements. To get the correct total, this means there
must be 𝑛!/(𝑛 − 𝑘)! such groups – and 𝑘-length sequences.
We call these objects ordered 𝑘-subsets of an 𝑛-element set, and denote the
number of such ordered sets by

𝑃(𝑛, 𝑘) = 𝑛!/(𝑛 − 𝑘)!
Note that this number can also be written as 𝑛 · (𝑛 − 1) · · · (𝑛 − 𝑘 + 1), which
hints at an alternative way of computing these numbers. We can perform the
ordering and choosing of elements at the same time. The first element of our
sequence can be any of the 𝑛 elements of the set. The next element can be any but the
first, leaving us with 𝑛 − 1 choices, and so on. The difference to the permutation
is that we stop after choosing the 𝑘’th element, which we can do in (𝑛 − 𝑘 + 1)
ways.
ab ba ca da
ac bc cb db
ad bd cd dc
The subset {𝑎, 𝑏} can be ordered in 2! ways – the ordered subsets 𝑎𝑏 and 𝑏𝑎.
Since each unordered subset is responsible for the same number of ordered
subsets, we get the number of unordered subsets by dividing 12 with 2!, giving
us the 6 different 2-subsets of {𝑎, 𝑏, 𝑐, 𝑑 }.
ab
ac bc
ad bd cd
They are thus the product of 𝑘 numbers, divided by another 𝑘 numbers. With
this fact in mind, it does not seem unreasonable that they should be computable
in 𝑂 (𝑘) time. Naively, one might try to compute them by first multiplying the 𝑘
numbers in the numerator, then the 𝑘 numbers in the denominator, and finally
dividing the former by the latter.
Unfortunately, both of these numbers grow quickly. Indeed, already at 21!
we have outgrown a 64-bit integer. Instead, we will compute the binomial
coefficient by alternating multiplications and divisions. We start by
storing the number 1. Then, we multiply with 𝑛 − 𝑟 + 1 and divide with 1, leaving
us with (𝑛 − 𝑟 + 1)/1. In the next step we multiply with 𝑛 − 𝑟 + 2 and divide with 2,
having computed ((𝑛 − 𝑟 + 1) · (𝑛 − 𝑟 + 2))/(1 · 2). After doing this 𝑟 times, we will be left with
our binomial coefficient.
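The alternating procedure translates directly into code; as argued next, every intermediate division is exact:

```cpp
#include <algorithm>
using namespace std;

// Computes binom(n, r) by alternating multiplications and divisions
// (assumes 0 <= r <= n and that the result fits in a 64-bit integer).
// After step i the value equals binom(n - r + i, i), so each division
// by i is exact.
long long binom(long long n, long long r) {
    r = min(r, n - r); // use the symmetry binom(n, r) = binom(n, n - r)
    long long ans = 1;
    for (long long i = 1; i <= r; i++)
        ans = ans * (n - r + i) / i;
    return ans;
}
```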
There is one big question mark in this procedure – why must the
intermediate results always be integers? They must be, if the procedure is
to be correct: otherwise we would at some point perform an inexact integer division, leaving
us with an incorrect intermediate quotient. If we study the partial results more
closely, we see that they are binomial coefficients themselves, namely

\binom{n-r+1}{1}, \binom{n-r+2}{2}, . . . , \binom{n-1}{r-1}, \binom{n}{r}

Certainly, these numbers must be integers. As we just
showed, the binomial coefficients count things, and counting things tends to
result in integers.
As a bonus, we discovered another useful identity for computing binomial
coefficients:

\binom{n}{r} = \frac{n}{r} \binom{n-1}{r-1}
Exercise 17.4. Prove this identity combinatorially, by first multiplying both sides
with 𝑟 . (Hint: both sides count the number of ways to do the same two-choice
process, but in different order.)
Since there are \binom{n}{r} subsets of the first kind and
\binom{n}{n-r} subsets of the second kind, they must be equal:

\binom{n}{r} = \binom{n}{n-r}
Sjecista
Croatian Olympiad in Informatics 2006/2007, Contest #2
In a convex polygon with 𝑁 sides, line segments are drawn between all pairs
of vertices in the polygon, so that no three line segments intersect in the same
point. Some pairs of these inner segments intersect, however. How many pairs
of segments intersect? For 𝑁 = 6, this number is 15.
While looking up the answers for small 𝑁 in an encyclopedia of integer sequences is a
legit strategy when solving problems on your own, this approach is usually not
applicable at contests, where access to the Internet tends to be restricted.
Instead, let us find some kind of bijection between the objects we count
(intersections of line segments) with something easier to count. This strategy is
one of the basic principles of combinatorial counting. An intersection is defined
by two line segments, of which there are \binom{N}{2}. Does every pair of segments
intersect? In Figure 17.2, two segments (the solid segments) do not intersect.
However, two other segments which together have the same four endpoints
do intersect each other. This suggests that line segments were the wrong
level of abstraction for finding a bijection. On the other hand, if we choose
a set of four points, the segments formed by the two diagonals of the convex
quadrilateral given by those four points will intersect at some point (the dashed
segments in Figure 17.2).
Conversely, any intersection of two segments gives rise to such a quadrilateral
– the one given by the four endpoints of the segments that intersect. Thus there
exists a bijection between intersections and quadrilaterals, meaning that there
must be an equal number of both. There are \binom{N}{4} ways to choose such a
quadrilateral, so this is also the number of intersections.
3) \sum_{k=0}^{n} (-1)^k \binom{n}{k} = 0

4) \sum_{k=0}^{n} \binom{n}{k} 2^k = 3^n

5) \sum_{k=0}^{n} \binom{n}{k} \left[ \sum_{l=0}^{k} \binom{k}{l} 2^l \right] = 4^n
²https://oeis.org/A000332
Dyck Paths
In a grid of width 𝑊 and height 𝐻, we stand in the lower left corner at coordinates
(0, 0), wanting to venture to the upper right corner at (𝑊 , 𝐻 ). To do this, we
are only allowed two different moves – we can either move one unit north, from
(𝑥, 𝑦) to (𝑥, 𝑦 + 1) or one unit east, to (𝑥 + 1, 𝑦). Such a path is called a Dyck
path.
As is the spirit of this chapter, we ask how many Dyck paths there are in
a grid of size 𝑊 × 𝐻. The solution is based on two facts: a Dyck path consists of
exactly 𝐻 + 𝑊 moves, of which exactly 𝐻 are northbound moves and
𝑊 eastbound. Conversely, any path consisting of exactly 𝐻 + 𝑊 moves where
exactly 𝐻 of those are northbound moves is a Dyck path.
If we consider e.g. the Dyck path in Figure 17.3, we can write down the
sequence of moves we made, with the symbol 𝑁 for northbound moves and 𝐸
for eastbound moves:
𝐸𝐸𝑁𝐸𝑁𝑁𝐸𝐸𝐸𝑁𝐸𝐸𝑁
Such a sequence must consist of all 𝐻 + 𝑊 moves, with exactly 𝐻 “𝑁”-moves.
There are exactly \binom{H+W}{H} such sequences, since this is the number of ways we
can choose the subset of positions which should contain the 𝑁 moves.
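We can sanity-check the closed form against a direct dynamic programming count of the paths, where each cell is reached either from the west or from the south (a sketch, with my own function name):

```cpp
#include <vector>
using namespace std;

// Number of monotone grid paths from (0, 0) to (W, H), counted by the
// recurrence paths(x, y) = paths(x - 1, y) + paths(x, y - 1), with a
// single path along each edge of the grid as the base case.
long long countPaths(int W, int H) {
    vector<vector<long long>> dp(W + 1, vector<long long>(H + 1, 1));
    for (int x = 1; x <= W; x++)
        for (int y = 1; y <= H; y++)
            dp[x][y] = dp[x - 1][y] + dp[x][y - 1];
    return dp[W][H];
}
```

For the 8 × 5 grid of the move sequence above, this gives 1287 paths, agreeing with \binom{13}{5} = 1287.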
Figure 17.4: The two options for the last possible move in a Dyck path.
If we look at Figure 17.3, we can find another way to arrive at the same
While Dyck paths sometimes do appear directly in problems, they are also a
useful tool to find bijections to other objects.
Sums
In how many ways can the numbers 0 ≤ 𝑎₁, 𝑎₂, . . . , 𝑎ₖ be chosen such that

\sum_{i=1}^{k} a_i = n
Input
The integers 0 ≤ 𝑛 ≤ 10⁶ and 0 ≤ 𝑘 ≤ 10⁶.
Output
Output the number of ways modulo 10⁹ + 7.
Given a Dyck path such as the one in Figure 17.3, what happens if we count
the number of northbound steps we take at each 𝑥-coordinate? There are a total
of 𝑊 + 1 coordinates and 𝐻 northbound steps, so we expect this to be a sum of
𝑊 + 1 non-negative variables with total 𝐻. This is indeed similar to what
we are counting, and Figure 17.5 shows this connection explicitly.
This mapping gives us a bijection between sums of 𝑘 terms with a
total of 𝑛, and Dyck paths on a grid of size (𝑘 − 1) × 𝑛. We already know how
many such Dyck paths there are: \binom{n+k-1}{n}.
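With 𝑛 and 𝑘 up to 10⁶, the binomial must be computed modulo 10⁹ + 7, for example with precomputed factorials and Fermat inverses. A sketch (assuming 𝑘 ≥ 1; the edge case 𝑘 = 0 would need separate handling):

```cpp
#include <vector>
using namespace std;

const long long MOD = 1000000007;

// Fast modular exponentiation: b^e mod MOD.
long long modpow(long long b, long long e) {
    long long r = 1; b %= MOD;
    for (; e > 0; e /= 2, b = b * b % MOD)
        if (e % 2) r = r * b % MOD;
    return r;
}

// Number of ways to write n as an ordered sum of k non-negative
// integers: binom(n + k - 1, n) mod 10^9 + 7 (assumes k >= 1).
long long countSums(long long n, long long k) {
    long long N = n + k - 1;
    vector<long long> fact(N + 1, 1);
    for (long long i = 1; i <= N; i++) fact[i] = fact[i - 1] * i % MOD;
    // By Fermat's little theorem, a^(MOD-2) is the inverse of a mod MOD.
    return fact[N] * modpow(fact[n], MOD - 2) % MOD
                   * modpow(fact[k - 1], MOD - 2) % MOD;
}
```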
Figure 17.5: A Dyck path corresponding to the sum 𝑎₁ + 𝑎₂ + · · · + 𝑎₉ = 0 + 0 + 1 + 2 + 0 + 0 + 1 + 0 + 1 = 5.
Catalan Numbers
A special case of the Dyck paths are the paths on a square grid that do not cross
the diagonal of the grid. See Figure 17.6 for an example.
We are now going to count the number of such paths – the most complex
counting problem we have encountered so far. It turns out that there is a
straightforward bijection between the invalid Dyck paths, i.e. those that do cross
the diagonal of the grid, and Dyck paths in a grid of different dimensions. In
Figure 17.6, the right grid contains a path that crosses the diagonal. If we take
the part of the path just after the first segment that crossed the diagonal and
mirror it in the diagonal translated one unit upwards, we get the situation in
Figure 17.7.
We claim that when mirroring the remainder of the path in this translated
diagonal, we will get a new Dyck path on the grid of size (𝑛 − 1) × (𝑛 + 1).
Assume that the first crossing is at the point (𝑐, 𝑐). Then, after taking one step up
in order to cross the diagonal, the remaining path goes from (𝑐, 𝑐 + 1) to (𝑛, 𝑛).
This needs 𝑛 − 𝑐 steps to the right and 𝑛 − 𝑐 − 1 steps up. When mirroring, this
instead turns into 𝑛 −𝑐 −1 steps up and 𝑛 −𝑐 steps right. Continuing from (𝑐, 𝑐 +1),
Figure 17.7: Mirroring the part of the Dyck path after its first diagonal crossing.
C_n = \binom{2n}{n} - \binom{2n}{n+1} = \binom{2n}{n} - \frac{n}{n+1}\binom{2n}{n} = \frac{1}{n+1}\binom{2n}{n}
The first few Catalan numbers³ are 1, 1, 2, 5, 14, 42, 132, 429, 1430.
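The closed form translates directly into code; one convenient variant uses the recurrence 𝐶ₙ₊₁ = 𝐶ₙ · 2(2𝑛 + 1)/(𝑛 + 2), which follows from the closed form and keeps the intermediate values small:

```cpp
// C(0) = 1 and C(n+1) = C(n) * 2(2n+1) / (n+2). The division is exact,
// since the quotient is again a Catalan number.
long long catalan(int n) {
    long long c = 1;
    for (int i = 0; i < n; i++)
        c = c * 2 * (2 * i + 1) / (i + 2);
    return c;
}
```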
Problem 17.7
Catalan Numbers – catalan
Catalan numbers count many other objects, most notably the number of
balanced parentheses expressions. A balanced parentheses expression is a string
of 2𝑛 characters 𝑠₁𝑠₂ . . . 𝑠₂ₙ over the letters ( and ), such that every prefix 𝑠₁𝑠₂ . . . 𝑠ₖ
contains at least as many letters ( as ). Given such a string, like (()())(()), we
can interpret it as a Dyck path, where ( is a step to the right, and ) is a step
upwards. Then, the condition that the string is balanced is that, for every partial
Dyck path, we have taken at least as many right steps as we have taken up steps.
This is equivalent to the Dyck path never crossing the diagonal, giving us a
bijection between parentheses expressions and Dyck paths. The number of such
parentheses expressions is thus also 𝐶𝑛.
³https://oeis.org/A000108
17.5 The Principle of Inclusion and Exclusion

Figure 17.8: A Venn diagram of two sets 𝐴 and 𝐵, their intersection 𝐴 ∩ 𝐵 and their union 𝐴 ∪ 𝐵.
Let us consider the most basic case of the principle, using two sets 𝐴 and
𝐵. If we wish to compute the size of their union |𝐴 ∪ 𝐵|, we at least need to
count every element in 𝐴 and every element in 𝐵, i.e. |𝐴| + |𝐵|. The problem with
this formula is that whenever an element is in both 𝐴 and 𝐵, we count it twice.
Fortunately, this is easily mitigated: the number of elements counted twice equals
|𝐴 ∩ 𝐵| (Figure 17.8). Thus, we see that |𝐴 ∪ 𝐵| = |𝐴| + |𝐵| − |𝐴 ∩ 𝐵|.
Similarly, we can determine a formula for the union of three sets |𝐴 ∪ 𝐵 ∪ 𝐶 |.
We begin by including every element: |𝐴| + |𝐵| + |𝐶 |. Again, we have included
the pairwise intersections too many times, so we remove those and get
|𝐴| + |𝐵| + |𝐶 | − |𝐴 ∩ 𝐵| − |𝐴 ∩ 𝐶 | − |𝐵 ∩ 𝐶 |
This time, however, we are not done. While we have counted the elements
which are in exactly one of the sets correctly (using the first three terms), and
the elements which are in exactly two of the sets correctly (by removing the
double-counting using the three latter terms), we currently do not count the
elements which are in all three sets at all! Thus, we need to add them back,
which gives us the final formula:
|𝐴 ∪ 𝐵 ∪ 𝐶 | = |𝐴| + |𝐵| + |𝐶 | − |𝐴 ∩ 𝐵| − |𝐴 ∩ 𝐶 | − |𝐵 ∩ 𝐶 | + |𝐴 ∩ 𝐵 ∩ 𝐶 |
Exercise 17.7. Compute the number of integers between 1 and 1000 that are
divisible by 2, 3 or 5.
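For exercises of this kind, the inclusion-exclusion sum can be evaluated mechanically by iterating over all non-empty subsets of the divisors, since the size of an intersection is ⌊𝑛/lcm⌋. A sketch which lets you verify your answer (assuming C++17 for std::lcm):

```cpp
#include <numeric>
#include <vector>
using namespace std;

// Counts the integers in [1, n] divisible by at least one of the given
// divisors, using inclusion-exclusion over all non-empty divisor subsets.
long long countDivisible(long long n, const vector<long long>& divisors) {
    int k = divisors.size();
    long long total = 0;
    for (int mask = 1; mask < (1 << k); mask++) {
        long long l = 1;
        int bits = 0;
        for (int i = 0; i < k; i++)
            if (mask & (1 << i)) { l = lcm(l, divisors[i]); bits++; }
        // The intersection of the chosen sets has size floor(n / lcm);
        // its sign in the formula alternates with the subset size.
        total += (bits % 2 == 1 ? 1 : -1) * (n / l);
    }
    return total;
}
```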
From the two examples, you can probably guess the formula in the general case,
which we write in the following way:

\left| \bigcup_{i=1}^{n} S_i \right| = \sum_{i} |S_i| - \sum_{i<j} |S_i \cap S_j| + \sum_{i<j<k} |S_i \cap S_j \cap S_k| - \cdots + (-1)^{n+1} |S_1 \cap S_2 \cap \cdots \cap S_n|
From this formula, we see the reason behind the naming of the principle.
We include every element, exclude the ones we double-counted, include the
ones we removed too many times, and so on. The principle is based on a very
important assumption – that it is easier to compute intersections of sets than their
unions. Whenever this is the case, you might want to consider if the principle is
applicable.
Derangements
Compute the number of permutations 𝜋 of length 𝑁 such that 𝜋 (𝑖) ≠ 𝑖 for every
𝑖 = 1 . . . 𝑁.
This is a typical application of the principle. We will apply it to the sets
of permutations where the condition fails for a particular index 𝑖.
If we let 𝐷𝑖 be the set of permutations with 𝜋(𝑖) = 𝑖, the set of all permutations where the condition is
false somewhere is 𝐷₁ ∪ 𝐷₂ ∪ · · · ∪ 𝐷𝑁. This means we seek 𝑁! − |𝐷₁ ∪ · · · ∪ 𝐷𝑁|. To
apply the inclusion and exclusion formula, we must be able to compute the sizes
of intersections of the sets 𝐷𝑖. This task is simplified greatly by symmetry: the size of an
intersection of 𝑘 such sets does not depend on which indices the condition is false for,
only on their number.
If we want to compute the intersection of 𝑘 such sets, there are 𝑘 indices 𝑖 where 𝜋(𝑖) = 𝑖.
The 𝑁 − 𝑘 other elements can be arranged in (𝑁 − 𝑘)! ways, so the intersection of these sets has size
(𝑁 − 𝑘)!. Since we can choose which 𝑘 elements should be fixed in \binom{N}{k}
ways, the term in the formula where we compute all 𝑘-way intersections
evaluates to \binom{N}{k} (N - k)! = N!/k!. Thus, the formula can be simplified to
𝑁!/1! − 𝑁!/2! + 𝑁!/3! − · · ·
Subtracting this from 𝑁! means that there are

𝑁! (1 − 1 + 1/2! − 1/3! + · · ·)

derangements.
This gives us a Θ(𝑁 ) algorithm to compute the answer.
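The Θ(𝑁) computation is conveniently done with the equivalent recurrence 𝐷ₙ = 𝑛 · 𝐷ₙ₋₁ + (−1)ⁿ, which follows from the alternating sum above:

```cpp
// Number of derangements of N elements, computed via the recurrence
// D(n) = n * D(n-1) + (-1)^n with D(0) = 1, equivalent to
// N! * (1 - 1 + 1/2! - 1/3! + ...).
long long derangements(int n) {
    long long d = 1; // D(0)
    for (int i = 1; i <= n; i++)
        d = i * d + (i % 2 == 0 ? 1 : -1);
    return d;
}
```

The first few values are 𝐷₁ = 0, 𝐷₂ = 1, 𝐷₃ = 2, 𝐷₄ = 9, 𝐷₅ = 44.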
Exercise 17.8. 8 persons are to be seated around a circular table. The company
is made up of 4 married couples, where the two members of a couple prefer
not to be seated next to each other. How many possible seating arrangements
are possible, assuming the cyclic rotations of an arrangement are considered
equivalent?
17.7 Invariants
Many problems deal with processes which consist of many steps. During such
processes, we are often interested in certain properties that never change. We
call such a property an invariant. For example, consider the binary search
algorithm to find a value in a sorted array. During the execution of the algorithm,
we maintain the invariant that the value we are searching for must be contained
in some given segment of the array indexed by [lo, hi) at any time. The fact that
this property is invariant basically constitutes the entire proof of correctness
of binary search. Invariants are tightly attached to greedy algorithms, and is a
common tool used in proving correctness of various greedy algorithms. They
are also one of the main tools in proving impossibility results (for example when
to answer NO in decision problems).
Permutation Swaps
Given is a permutation 𝑎𝑖 of ⟨1, 2, . . . , 𝑁⟩. Can you perform exactly 𝐾 swaps,
i.e. exchanging pairs of elements of the permutation, to obtain the identity
permutation ⟨1, 2, . . . , 𝑁⟩?
Input
The first line of input contains the size of the permutation 1 ≤ 𝑁 ≤ 100 000.
The next line contains 𝑁 integers separated, the permutation 𝑎 1, 𝑎 2, ..., 𝑎 𝑁 .
Output
Output YES if it is possible, and NO if it is impossible.
First, we need to compute the minimum number of swaps needed.
Assume the cycle decomposition of the permutation consists of 𝐶 cycles
(see 17.2.1 for a reminder of this concept), with lengths 𝑏₁, 𝑏₂, . . . , 𝑏𝐶. Then, we
need at least

S = \sum_{i=1}^{C} (b_i - 1)
swaps to return it to the identity permutation, a fact you will be asked to prove
in the next section on monovariants. This gives us one necessary condition:
𝐾 ≥ 𝑆. However, this is not sufficient. A single additional condition is needed –
that 𝑆 and 𝐾 have the same parity! To prove this, we will look at the number
of inversions of a permutation, one of the common invariant properties of
permutations.
Given a permutation 𝑎ᵢ, we say that the pair (𝑖, 𝑗) is an inversion if 𝑖 < 𝑗
but 𝑎ᵢ > 𝑎ⱼ. Intuitively, the number of inversions is the number of pairs of
elements that are “out of place” in relation to each other.
3 5 2 1 4: 6 inversions
1 5 2 3 4: 3 inversions
1 5 3 2 4: 4 inversions
1 5 3 4 2: 5 inversions
1 2 3 4 5: 0 inversions
Figure 17.9: The number of inversions for permutations differing only by a single swap.
CHAPTER 17. COMBINATORICS
The key fact is that a single swap always changes the number of inversions by an odd amount, so the parity of the number of inversions flips with every swap.
If this is the case, it is obvious why 𝑆 and 𝐾 must have the same parity. Since
𝑆 is the number of swaps needed to transform the identity permutation into the
given permutation, 𝑆 must have the same parity as the number of inversions.
Similarly, since 𝐾 swaps take the permutation back to the identity, 𝐾 must have
the same parity as the number of inversions. As 𝐾 and 𝑆 both have the same
parity as the number of inversions, they must have the same parity as each other.
To see why these two conditions are sufficient, we can, after performing 𝑆
swaps to obtain the identity permutation, simply spend the remaining swaps
exchanging the same two numbers back and forth. This can be done since 𝐾 − 𝑆
will be an even number due to their equal parity.
17.8 Monovariants
Another similar tool (sometimes called a monovariant) instead assigns some kind
of value 𝑝(𝑣) to the state 𝑣 at each step of the process. We choose 𝑝 such that it
is strictly increasing or decreasing. Monovariants are mainly used to prove the
finiteness of a process, in which either:
• The value function assumes e.g. integer values, and is easily bounded in
the direction of monotonicity (e.g. an increasing function would have an
upper bound).
• The value function can assume any real value, but there are only finitely
many states the process can be in. In this case, the monovariant is used
to prove that the process never returns to a previous state, since this
would contradict the monotonicity of 𝑝.
Let us begin with a famous problem of the first kind.
Output
Output 𝑁 integers, one for each vertex. The 𝑖’th integer should be 1 or 2 if the
𝑖’th vertex is in the first or the second part of the partition, respectively.
As an example, consider the valid and invalid partitionings in Figure 17.10.
The vertices which do not fulfill the neighbor condition are marked in gray.
Figure 17.10: Valid and invalid partitionings of a graph on the vertices A to G.
2. Look for some kind of modification to this state, which is possible if and
only if the state is not admissible. Generally, the goal of this modification
is to “fix” whatever makes the state inadmissible.
3. Prove that there is some value 𝑝 (𝑠) that must decrease whenever such a
modification is done.
it, which by step 3 will decrease the value 𝑝(𝑠). Step 4 usually follows from one
of the two value function rules discussed previously. Hence, by performing finitely
many such actions, we must (by step 4) reach a state where no such action is
possible. This happens only when the state is admissible, meaning such a state
must exist. The process might seem a bit abstract, but it will become clear once
we walk through the bipartitioning problem.
Our algorithm will work as follows. First, consider any bipartition of the
graph. Assume that this bipartition does not fulfill the neighbor condition. Then,
there must exist a vertex 𝑣 which has more than |𝑁(𝑣)|/2 vertices in the same part
as 𝑣 itself. Whenever such a vertex exists, we move it to the other side
of the partition. See Figure 17.11 for an illustration of this process.
Figure 17.11: Two iterations of the algorithm, which brings the graph to a valid state.
One question remains – why does this move guarantee a finite process? We
now have a general framework to prove such things, which suggests that perhaps
we should look for a value function 𝑝 (𝑠) which is either strictly increasing or
decreasing as we perform an action. By studying the algorithm in action in
Figure 17.11 we might notice that more and more edges tend to go between the
two parts. In fact, this number never decreased in our example, and it turns out
this is always the case.
If a vertex 𝑣 has 𝑎 neighbors in the same part and 𝑏 neighbors in the other part,
and violates the neighbor condition, this means that 𝑎 > 𝑏. When we move 𝑣 to
the other part, the 𝑏 edges from 𝑣 to its neighbors in the other part will no longer
go between the two parts, while the 𝑎 edges to its neighbors in the old part now will.
This means the number of edges between the parts changes by 𝑎 − 𝑏 > 0.
Thus, we can choose this number as our value function. Since it is an integer function
with the obvious upper bound |𝐸|, we complete step 4 of our proof technique
and can thus conclude the final state must be admissible.
In mathematical problem solving, monovariants are usually used to prove
that an admissible state exists. However, such problems are really algorithmic
problems in disguise, since the proofs actually provide an algorithm to construct
such an admissible state.
Let us complete our study of monovariants by also showing a problem using
the second value function rule.
Water Pistols
𝑁 girls and 𝑁 boys stand on a large field, with no line going through three
different children.
Each girl is equipped with a water pistol and wants to pick a boy to fire at.
While the boys probably will not appreciate being drenched in water, at least the
girls are a fair menace – they will only fire at a single boy each. Unfortunately, it
may be the case that two girls choose which boys to fire at in such a way that
the water from their pistols will cross at some point. If this happens, the beams will
cancel each other out, never hitting their targets.
Help the girls choose which boys to fire at, in such a way that no two girls
fire at the same boy, and the water fired by two girls will not cross.
Figure 17.12: An assignment where some beams intersect (left), and an assignment where no
beams intersect (right).
Input
The first line contains the integer 𝑁 ≤ 200. The next 𝑁 lines each contain two real
numbers −10⁶ ≤ 𝑥, 𝑦 ≤ 10⁶, separated by a space. Each line is the coordinate
(𝑥, 𝑦) of a girl. The next and final 𝑁 lines contain the coordinates of the boys,
in the same format.
Output
Output 𝑁 lines. The 𝑖’th line should contain the zero-indexed number of the boy
which the 𝑖’th girl should fire at.
After seeing the solution to the previous problem, this solution should not
come as a surprise. We start by arbitrarily assigning the girls to one boy each,
with no two girls shooting at the same boy. If this assignment contains two girls
firing water beams which cross, we simply swap their targets.
Unless you are geometrically minded, it may be hard to figure out an
appropriate value function. The naive value function of counting the current
number of crossing water beams unfortunately fails – it might even increase
after a move.
Instead, let us look closer at what happens when we switch the targets of
two girls. In Figure 17.13, we see the before and after of such a swap, as
well as the two situations superimposed. If we compare the sum of the lengths
of the two water beams before the swap ((𝐶 + 𝐷) + (𝐸 + 𝐹)) with the lengths after
the swap (𝐴 + 𝐵), we see that the latter must be less than the former. Indeed, we
have 𝐴 < 𝐶 + 𝐷 and 𝐵 < 𝐸 + 𝐹 by the triangle inequality, and summing the
two inequalities gives the desired result. Thus the sum of all water beam lengths
strictly decreases with every swap. Since there are only finitely many possible
assignments, the second value function rule tells us that the process must terminate,
and it can only terminate in an assignment where no two beams cross.
17.9 Chapter Notes
Figure 17.13: Swapping the targets of two crossing water beams. The new beams 𝐴 and 𝐵 replace the old beams 𝐶 + 𝐷 and 𝐸 + 𝐹.
18 Game Theory
In ordinary life, most of us are familiar with the concept of a game. We play video
games, sports, card games, board games or any of the many other kinds of games.
As algorithmists, we primarily focus on a subset of competitive games with strategic
aspects and well-defined rules, where determining who won is simple. Games
such as chess, poker, tic-tac-toe or Yatzy belong to this category, unlike
soccer (running humans and the behaviour of rolling balls are not sufficiently
well-defined) or most real-time video games where reaction speed counts. The
mathematical area analyzing this kind of game is called game theory.
The games we deal with in algorithmic problem solving are a small subset of
this category. We are often given a position in some kind of abstract turn-based
game and tasked to determine if the player to move wins if both players play as well
as possible. Players never make mistakes in the games we analyze. For example,
the game of tic-tac-toe is considered a drawn game, since perfect play by both
players always results in a draw.
In this chapter, we learn some basic techniques for determining who wins
a certain game. Occasionally a problem also asks us to construct an optimal
strategy (for example by making the problem interactive and playing against us).
This is often the case when it is “obvious” who wins the game, or at least when
it is very easy to guess a winner but harder to prove why. The proofs often present us
with an optimal strategy, so we will generally aim to prove who wins even when,
in for example a contest situation, one would just guess.
Before we dive into the analysis of games that requires programming, we start
with some of the more basic techniques that one might use to solve games
given as mathematical rather than algorithmic problems. They are useful in
algorithmic problem solving as well, while also serving as an introduction to
the kind of games we try to solve.
CHAPTER 18. GAME THEORY
Symmetry
When children first learn to play chess, an early attempted strategy is that
of playing symmetrically. White moves its E pawn two squares forward, the
child responds with the same, and so on. Of course, this is not a very good
strategy – once white plays the winning move, it is very difficult for black
to reply with the symmetric move. In variants of chess, this strategy works
better.
Knight Packing
On an 𝑛 × 𝑛 chess board, two players alternate placing a knight on the board. A
knight can only be placed if there is no other knight which would be either 1
row and 2 columns or 2 rows and 1 column away from it. The first player who
cannot place a knight on the board loses. Given 𝑛 (1 ≤ 𝑛 ≤ 109 ), determine if
the first or second player to move wins.
Solution. In all problems of this kind, where one is supposed to make a move
by e.g. choosing a square to place something at, the first question should
be: if my opponent makes the first move, can I pick a symmetric move that
is always possible? This symmetry can manifest in several ways, such as
mirroring a move along an axis, rotating it 180° around a center, or even taking
the complement of a subset when a move consists of choosing a subset of
something. Games on grids are great candidates for a mirroring strategy using
the first two transformations.
Coloring Game 1
A graph consists of 𝑛 vertices and 𝑛 edges, connecting the vertices into a single
cycle of length 𝑛. Two players play the following game on this graph. Initially,
all vertices are colored white. A move consists of coloring one of the white
vertices red or blue, where player 1 colors vertices red, and player 2 colors them
blue. A vertex can only be colored red or blue if neither it nor its two neighbours
have that same color.
For a given 𝑛, if both players play optimally, who wins?
Solution. The first step in most games should be to try to solve the game for a few
smaller instances, to see if a pattern emerges.
18.2 The Graph Game
Figure 18.1: An example of a graph game with 6 positions. Player Q starts and has three possible
moves 𝐴, 𝐵 and 𝐶. Q chooses to move to 𝐴, whereupon q responds with the only available
move 𝐷. Finally, Q ends the game with the move 𝐸, leaving q with no possible moves; q
therefore loses the game.
¹ The opposite kind of game, where the player unable to make a move wins, is called a Misère
game.
Acyclic Games
An acyclic graph game admits a simple classification of winning and losing
positions.
Figure 18.2: The winning and losing positions of the game in Figure 18.1
General Games
Non-Repetitive Games
19 Number Theory
Number theory is the study of certain properties of integers. It makes an
occasional appearance within algorithmic problem solving, in the form of its
subfield computational number theory. It is within number theory that topics such
as divisibility and prime numbers belong.
In competitions, number theory problems range from simple applications
of the main theorems you learn in this chapter, to trickier tasks where you must
combine hard number theoretical insights with other algorithmic techniques.
The latter can require insights that are difficult mathematical problems in
themselves. This should not come as a surprise. Most content in this chapter is
essentially about learning efficient methods of computing the standard number
theoretical objects, such as primes, modular inverses and divisors, and becoming
well acquainted with the time complexities and other asymptotic approximations that
tend to arise in number theoretical problems.
19.1 Divisibility
All of the number theory in this chapter relates to a single property of integers:
divisibility.
CHAPTER 19. NUMBER THEORY
Dual Divisibility
Given two positive integers 𝑎 and 𝑏 with the same number of digits (1 ≤ 𝑏 ≤
𝑎 ≤ 1018 ), compute the number of divisors of 𝑎 that have 𝑏 as a divisor.
For example, with 𝑎 = 96 and 𝑏 = 12, there are 4 such numbers: 12, 24,
48 and 96.
Solution. Assume that 𝑐 is such a number. The solution falls out from some
applications of the definition of divisibility. We have 𝑎 = 𝑐𝑞 and 𝑐 = 𝑏𝑞′ for
some positive integers 𝑞, 𝑞′.
The value of 𝑞′ is at most 9 by the following argument. If 𝑞′ ≥ 10, we have
𝑎 = 𝑐𝑞 ≥ 𝑐 ≥ 10𝑏, but then 𝑎 has more digits than 𝑏, a contradiction. Thus, we
can simply test all the values of 𝑐 by letting 𝑞′ = 1, 2, ..., 9 and verifying that
the two conditions hold using the modulo operator.
Problem 19.1
Dual Divisibility – dualdivisibility
Evening Out 1 – eveningout1
304
Divisors
Given an integer 𝑛, compute all the positive divisors of 𝑛.
Every integer has at least two particular divisors called the trivial divisors,
namely 1 and 𝑛 itself. If we exclude the divisor 𝑛, we get the proper divisors.
To find the remaining divisors, we can use the fact that any divisor 𝑑 of 𝑛 must
satisfy |𝑑 | ≤ |𝑛| (why?). This means that we can limit ourselves to testing
whether the integers between 1 and 𝑛 are divisors of 𝑛, a Θ(𝑛) algorithm. We
can do a bit better though, by exploiting a nice symmetry.
Hidden in Example 19.1 lies the key insight to speeding this up. It seems
that whenever we had a divisor 𝑑, we were immediately given another divisor 𝑞.
For example, when claiming 3 was a divisor of 12 since 3 · 4 = 12, we found
another divisor, 4. This is not a surprise, given that the definition of divisibility
(Definition 19.1) – the existence of the integer 𝑞 in 𝑛 = 𝑑𝑞 – is symmetric in 𝑑
and 𝑞, meaning divisors come in pairs (𝑑, 𝑛/𝑑).
Exercise 19.2. Prove that a positive integer has an odd number of divisors if and
only if it is a perfect square.
Since divisors come in pairs, we can limit ourselves to finding one member
of each such pair. Furthermore, one of the elements in each such pair must be
bounded by √𝑛. Otherwise, we would have that 𝑛 = 𝑑 · (𝑛/𝑑) > √𝑛 · √𝑛 = 𝑛, a
contradiction (again, 0 is a special case here, where we always have 0/𝑑 = 0). This
limit helps us reduce the time it takes to find the divisors of a number to Θ(√𝑛),
which allows us to solve the problem sufficiently fast.
1: procedure Divisors(𝑁)
2:   divisors ← new list
3:   for 𝑖 from 1 up to 𝑖² ≤ 𝑁 do
4:     if 𝑁 mod 𝑖 = 0 then
5:       divisors.add(𝑖)
6:       if 𝑖 ≠ 𝑁/𝑖 then
7:         divisors.add(𝑁/𝑖)
8: return divisors
Problem 19.2
Divisors – divisors
Subcommittees
In a parliament of 𝑃 ≤ 10¹⁶ people, the speaker wants to divide the parliament
into (at least two) disjoint subcommittees of equal size. Of course, the chair of
such a subcommittee furthermore wants to divide their subcommittee into (at
least two) subsubcommittees of equal size, and so on, until no further divisions
can be performed.
What is the maximum number of levels of subcommittees that can be created?
Solution. What different sizes may the first level of subcommittees have? Well,
if we perform a split into groups of size 𝑘, we get 𝑃/𝑘 such groups. Of course, this
must be an integer – i.e. 𝑘 must be a divisor of 𝑃. This means we are looking
for a longest sequence of numbers 𝑐₀, 𝑐₁, 𝑐₂, ..., 𝑐ₙ such that 𝑐₀ = 𝑃, 𝑐ᵢ₊₁ | 𝑐ᵢ and 𝑐ₙ = 1.
A simple solution would be to generate all divisors of 𝑃 (the possible values
of 𝑐₁), attempt a split into those group sizes, and then recursively solve the
problem for them. However, this would be too slow. As an example, if we
take 𝑃 = 8 086 598 962 041 600, the sum of the square roots of its divisors is
6 636 882 083, so even finding only the ways to split the parliament into 2-level
committees would be too expensive.
Instead, we will use the following lemma: divisibility is transitive, i.e. if 𝑎 | 𝑏
and 𝑏 | 𝑐, then 𝑎 | 𝑐.
This means that the possible values of 𝑐ᵢ, i.e. the transitive closure of
divisibility of 𝑃, are the divisors of 𝑃. Essentially, we are looking for the longest
sequence of divisors of 𝑃 such that each divisor is also a divisor of the previous
divisor. By constructing the directed graph of all the divisors 𝑎 with edges from
𝑎 to its own divisors, we reduce the problem to finding the longest path in a
DAG. Unfortunately, this too is slow – the above 𝑃 has 41 472 divisors, leaving
us with about 41472 = 859 942 656 modulo operations to construct the graph.
2
1: procedure Subcommittees(𝑃)
2:   divisors ← Divisors(𝑃)
3:   sort divisors in descending order
4:   ans ← 0
5:   for each 𝑑 in divisors do
6:     if 𝑑 divides 𝑃 then
7:       ans ← ans + 1
8:       𝑃 ← 𝑑
9: return ans
Problem 19.3
Subcommittees – subcommittees
Evening Out 2 – eveningout2
Multiplication Table – multtable
Note: Solve for 2 points.
308
This result that divisors come in pairs happens to give us some help in
answering our next question, regarding the number of divisors. The above
result gives us an upper bound of 2√𝑛 on the number of divisors of an integer 𝑛. We can do a
little better, with ≈ 𝑛^(1/3) being a commonly used approximation for the number of
divisors when dealing with integers which fit in the native integer types.¹ For
example, the maximal number of divisors of a number less than 10³ is 32, for 10⁶
it is 240, for 10⁹ it is 1 344, and for 10¹⁸ it is 103 680.²
A bound we will find more useful when solving problems concerns the
average number of divisors of the integers between 1 and 𝑛.
Proof. There are between (𝑛 − 𝑖 + 1)/𝑖 and 𝑛/𝑖 integers between 1 and 𝑛 divisible by
𝑖, since every 𝑖'th integer is divisible by 𝑖. Thus, the number of divisors of all
those integers is bounded from above by

  ∑_{𝑗=1}^{𝑛} 𝑛/𝑗 = 𝑛 ∑_{𝑗=1}^{𝑛} 1/𝑗 = 𝑂(𝑛 ln 𝑛)

and from below by

  ∑_{𝑗=1}^{𝑛} (𝑛 − 𝑗 + 1)/𝑗 = 𝑛 ∑_{𝑗=1}^{𝑛} 1/𝑗 − 𝑛 + ∑_{𝑗=1}^{𝑛} 1/𝑗 ≥ 𝑛 ln 𝑛 − 𝑛 + ln 𝑛 = Ω(𝑛 ln 𝑛)
This proof also suggests a way to compute the divisors of all the integers
1, 2, ..., 𝑁.
¹ In reality, the maximal number of divisors on the interval [1, 𝑛] grows sub-polynomially, i.e.,
as 𝑂(𝑛^𝜖) for every 𝜖 > 0.
² Sequence A066150 from OEIS: http://oeis.org/A066150.
Divisor Counts
For every integer between 1 and 𝑁 , compute the number of positive divisors it
has.
Solving the problem with the previous algorithm, computing the divisors for
every single integer, would yield a Θ(𝑁√𝑁) algorithm. Instead, we invert the
problem. For each integer 𝑖, we find all the numbers divisible by 𝑖 (in Θ(𝑁/𝑖)
time), which are 0𝑖, 1𝑖, 2𝑖, ..., ⌊𝑁/𝑖⌋𝑖. In total, this takes Θ(𝑁 ln 𝑁) time, a quite
decent improvement.
Problem 19.4
Divisor Counts – divisorcounts
Organizator – organizator
Example 19.2 The first 10 prime numbers are 2, 3, 5, 7, 11, 13, 17, 19, 23, 29.
Which is the next one?
Problem 19.5
19.2 Prime Numbers
In a similar manner, we can extend the algorithm used to count all divisors
of all numbers up to some limit to counting primes up to a limit. Instead of
counting divisors, we simply mark those numbers that have non-trivial divisors
as non-prime. The running time is the same, Θ(𝑁 ln 𝑁).
Problem 19.7
Prime Count – primecount
Note: Solve for 2 points.
One might wonder if the algorithm in Exercise 19.5 is faster than testing all
possible divisors. To answer this, we need to know more about the number of
primes.
There are an infinite number of primes. This can be proven by a simple proof
by contradiction. If 𝑝 1, 𝑝 2, ..., 𝑝𝑞 are the only primes, then 𝑃 = 𝑝 1𝑝 2 . . . 𝑝𝑞 + 1
is not divisible by any prime number (and by extension has no divisors but the
trivial ones), so it is not composite. However, 𝑃 is larger than any prime, so it is
not a prime number either, a contradiction.
More relevant is instead the density of primes, since that is what determines
how 𝜋 (𝑁 ) relates to 𝑁 .
The density of prime numbers in the interval [1, 𝑁] is ≈ 1/ln 𝑁 for large
𝑁.
The proof requires a lot of deep number theory, so we will not show it here. This
means that precomputing primes and then using that list to check primality only
gains you a logarithmic factor. More specifically, the number of primes below
10³ is 168, below 10⁶ it is 78 498, and below 10⁹ it is approximately 51 · 10⁶.
Based on the Prime Number Theorem, one might reasonably think that
prime numbers shouldn't be that far apart. The theorem states that the average
distance between primes up to 𝑁 is about ln 𝑁, but of course the maximum gap
may be longer. For all integers up to 10⁹, the maximum gap is 282, and for 10¹⁸
it is 1442.
A very trivial upper bound on the gaps that is occasionally useful is the
following one.
Prime Time
Jon Marius Venstad, Nordic Collegiate Programming Contest 2011, CC BY-SA 3.0
Odd, Even and Ingmariay are playing a game. They start with an arbitrary positive
integer and take turns either adding 1 or dividing by a prime (assuming the result
is still an integer). Once they reach 1, each player gets points corresponding to the
smallest of the numbers their moves resulted in. If a player could make no move,
their score is instead equal to the starting integer. They all play such that they
minimize their own score. If several possible moves would result in the same
score for the player, they choose the move producing the lowest number that they
can. They play in the order Odd → Even → Ingmariay → ..., but alternate
who starts the round.
Given a list of starting integers of the rounds they played of the game,
determine the final scores of the players.
312
19.2. P RIME N UMBERS
Input
The first line contains 𝑛 (1 ≤ 𝑛 ≤ 1000), the number of rounds of the game. The
next 𝑛 lines each contain the name of the starting player of the round and the starting
integer (between 1 and 10 000).
Output
Output the final scores of the three players.
Note: The problem statement has been shortened.
Solution. The game theoretic solution of the problem would be to construct the
graph of all the integers with edges between possible transitions. One could
then compute the score a player would get for moving to a certain integer in
the graph in the way described in Section 18.2. Unfortunately, the game graph
contains possible loops such as 2 → 3 → 4 → 2. We can eliminate those loops
with some additional insights using the tie-breaking rules the players use when
picking moves.
Let us investigate the behaviour of the players more closely. If a player is
presented with a prime number, they will clearly divide by it to end the game
and get 1 point. In other cases, they may either add 1 or divide away a prime.
This means we never want to move from a prime 𝑝 to 𝑝 + 1.
It turns out that removing these moves makes the game acyclic. Assume
to the contrary that we currently are at a number 𝑎 between two consecutive
primes 𝑝ₖ and 𝑝ₖ₊₁ and have a sequence of moves that takes us back to 𝑎. The
next sequence of moves must be to add 1 until we hit an integer 𝑝ₖ < 𝑏 ≤ 𝑝ₖ₊₁
(remember we never want to go to 𝑝ₖ₊₁ + 1) and then divide some prime 𝑝ᵢ away.
But 𝑏/𝑝ᵢ ≤ 𝑏/2 ≤ 𝑝ₖ₊₁/2. By Bertrand's Postulate, 𝑝ₖ₊₁/2 < 𝑝ₖ, so the new result
will be less than 𝑝ₖ. However, since we are not allowed to make the transition
𝑝ₖ → 𝑝ₖ + 1, we can never reach 𝑎 again.
What remains is to compute the transitions from each integer 1 ≤ 𝑎 ≤ 𝑁.
By precomputing all the primes up to 10 000, we can afford to test whether all of
them are divisors of each 𝑎. In total, this is on the order of 10⁴ · 10⁴/ln 10⁴ ≈ 10⁷ edges.
Since the prime numbers have no other divisors besides the trivial ones, a
factorization consisting only of prime numbers is special.
Note that in the definition, we spoke of the prime factorization. It turns out
that this factorization is indeed unique, except for a reordering of the 𝑝ᵢ. It may
seem “intuitively obvious” that this is the case, but relying on that intuition is
misguided. A proof can be constructed using Euclid's Lemma from the previous
section (p. 307).
Proof. First, the existence part, through a simple proof by induction. Assume
that all integers up to 𝑁 − 1 have prime factorizations. If 𝑁 is a prime, then
𝑁 is a prime factorization of itself. Otherwise, it has a non-trivial divisor, so
we can write 𝑁 = 𝑎𝑏 with 1 < 𝑎 ≤ 𝑏 < 𝑁. By the induction hypothesis, 𝑎
and 𝑏 have prime factorizations. Concatenating the two factorizations gives
us a prime factorization of 𝑁. Thus, by induction, all positive integers have
prime factorizations.
Next, the uniqueness, which we also prove by induction. Our base case is
𝑁 = 1, which has the empty product as its unique prime factorization. Assume
that 𝑁 has two distinct prime factorizations 𝑁 = 𝑝₁𝑝₂ ··· 𝑝ₖ = 𝑞₁𝑞₂ ··· 𝑞ₗ,
but all integers up to 𝑁 − 1 have only one. Consider the prime 𝑝₁. Since it
divides the left side, it must also divide the right side. Let 𝑖 be such that
𝑄 = 𝑞₁ ··· 𝑞ᵢ₋₁ is not divisible by 𝑝₁, but 𝑄𝑞ᵢ is. Such an 𝑖 exists since with
𝑖 = 1 the product 𝑄 is 1 (the empty product), which is not divisible by 𝑝₁, and
with 𝑖 = 𝑙 we get 𝑄𝑞ₗ = 𝑁, which is divisible by 𝑝₁.
Then, by our version of Euclid's Lemma, since 𝑝₁ and 𝑞ᵢ are primes,
𝑝₁ | 𝑄𝑞ᵢ but 𝑝₁ ∤ 𝑄, we have 𝑝₁ = 𝑞ᵢ. WLOG we can reorder the factors 𝑞 and
assume 𝑞ᵢ = 𝑞₁. If we divide away this factor, we get that 𝑝₂ ··· 𝑝ₖ = 𝑞₂ ··· 𝑞ₗ
are both prime factorizations of 𝑁/𝑝₁ < 𝑁, so by the induction hypothesis
they are the same. That means the original prime factorizations were also
the same, a contradiction. Thus, 𝑁 too has a unique prime factorization,
completing the proof.
Factorization
Given an integer 𝑁 , compute its prime factorization.
The simplest solution is to extend the method used to test primality.
An integer 𝑁 can have at most one prime factor that exceeds √𝑁,
since the product of two such factors would exceed 𝑁. Looping over all possible prime
divisors up to √𝑁 and factoring them out from 𝑁 is thus sure to find all prime
factors, except for possibly a single one larger than √𝑁. This algorithm
is called trial division.
1: procedure Factor(𝑁)
2:   primes ← new list
3:   for 𝑖 from 2 up to 𝑖² ≤ 𝑁 do
4:     while 𝑁 mod 𝑖 = 0 do
5:       primes.add(𝑖)
6:       𝑁 ← 𝑁/𝑖
7:   if 𝑁 ≠ 1 then
8:     primes.add(𝑁)  ⊲ 𝑁 may have had a single prime factor > √𝑁
9: return primes
Exercise 19.6. In Algorithm ??, 𝑁 is being modified in the loop when a new
prime is found. Is it a problem to use the new, updated 𝑁 in the 𝑖 2 ≤ 𝑁 check in
the loop? Is the time complexity the same?
Problem 19.8
Factorization – factorization
Note: Solve for 1 point.
Factorial Power
Given integers 𝑛 and 𝑚 (1 ≤ 𝑛, 𝑚 ≤ 1014 ), determine the 𝑘 for which 𝑛𝑘 || 𝑚!.
Solution. To start, we must first connect the prime factorization with divisibility.
If 𝑛 has the prime factorization 𝑛 = 𝑝₁^𝑒₁ · 𝑝₂^𝑒₂ ··· 𝑝ₗ^𝑒ₗ, a divisor of 𝑛 must be of
the form 𝑑 = 𝑝₁^𝑒₁′ · 𝑝₂^𝑒₂′ ··· 𝑝ₗ^𝑒ₗ′, where 0 ≤ 𝑒ᵢ′ ≤ 𝑒ᵢ. This can be proven using the
uniqueness of the prime factorization, and the fact that 𝑛 = 𝑑𝑞 for some integer
𝑞. Any number of this form is also a divisor of 𝑛.
The exponent laws give us that 𝑛^𝑘 = 𝑝₁^(𝑘𝑒₁) · 𝑝₂^(𝑘𝑒₂) ··· 𝑝ₗ^(𝑘𝑒ₗ), so we are looking
for the largest 𝑘 such that 𝑘𝑒ᵢ does not exceed the power of 𝑝ᵢ in 𝑚!. Thus, after
factoring 𝑛, the problem is reduced to determining how many times each 𝑝ᵢ
divides 𝑚!. This equals

  ⌊𝑚/𝑝ᵢ⌋ + ⌊𝑚/𝑝ᵢ²⌋ + ⌊𝑚/𝑝ᵢ³⌋ + ...
Problem 19.9
Factorial Power – factorialpower
Perfect Pth Powers – perfectpowers
Divisor Guessing Game – divisorguessing
For the next problem, we will show a slightly faster version of the prime
sieve we’ve previously seen, using it to factor all integers in an interval.
Product Divisors
Given a sequence of integers 𝑎₁, 𝑎₂, ..., 𝑎ₙ, compute the number of divisors of
𝐴 = ∏_{𝑖=1}^{𝑛} 𝑎ᵢ.
Input
The length of the sequence 1 ≤ 𝑛 ≤ 1 000 000, and the sequence 1 ≤ 𝑎₁, ..., 𝑎ₙ ≤
10⁶.
Output
The number of divisors of 𝐴 modulo 10⁹ + 7.
Recall that a divisor of 𝐴 = 𝑝₁^𝑒₁ ··· 𝑝ₖ^𝑒ₖ must be of the form 𝑝₁^𝑒₁′ ··· 𝑝ₖ^𝑒ₖ′,
where each 𝑒ᵢ′ lies between 0 and 𝑒ᵢ. This gives us 𝑒ᵢ + 1 choices for the value of
𝑒ᵢ′. Since each 𝑒ᵢ′ is chosen independently, there are a total of (𝑒₁ + 1)(𝑒₂ + 1) ··· (𝑒ₖ + 1)
numbers of this form, and thus divisors of 𝐴, by the multiplication principle.
We are left with the problem of determining the prime factorization of 𝐴.
Essentially, this is tantamount to computing the prime factorization of every
integer between 1 and 10⁶, since we could have 𝑎ᵢ = 𝑖 for 𝑖 = 1 ... 10⁶. Once
this is done, we can go through the sequence 𝑎ᵢ and tally up all primes in their
factorizations. Since an integer 𝑚 has at most log₂ 𝑚 prime factors, this step
is bounded by approximately 𝑛 log₂ 10⁶ operations. Then, how do we factor
all integers in [1..10⁶]? We could obviously adapt the algorithm we used to
count primes before, but we will now improve it a bit. The general idea was
to loop over multiples of all numbers and mark them as non-prime. When
dealing with primes, however, we only need to do this for primes. After
all, any non-prime has a prime divisor, so we lose no correctness by
only sieving on primes. With this improvement, the sieve is called the Sieve of
Eratosthenes.
This results in the following solution:
1: procedure ProductDivisors(sequence 𝐴)
2: counts ← new list[106 + 1]
3: for each 𝑎 in 𝐴 do
4: counts[a] ← counts[a] + 1
The complexity of this solution is a bit tricky to analyze. The important part
of the sieve is the inner loop, which computes the actual factors. Let us count
the number of times a prime 𝑝 is pushed in this loop. First of all, every 𝑝'th
integer is divisible by 𝑝, which totals 𝑛/𝑝 iterations. However, every 𝑝²'th integer
is divisible by 𝑝 yet again, contributing an additional 𝑛/𝑝² iterations, and
so on. Summing this over every 𝑝 which is used in the sieve gives us the bound

  ∑_{𝑝 ≤ √𝑛} (𝑛/𝑝 + 𝑛/𝑝² + 𝑛/𝑝³ + ...) = 𝑛 ∑_{𝑝 ≤ √𝑛} (1/𝑝 + 1/𝑝² + 1/𝑝³ + ...)

Using the formula for the sum of a geometric series (1/𝑝 + 1/𝑝² + ... = 1/(𝑝 − 1)) gives us
the simplification

  𝑛 ∑_{𝑝 ≤ √𝑛} 1/(𝑝 − 1) = Θ(𝑛 ∑_{𝑝 ≤ √𝑛} 1/𝑝)

It is known that ∑_{𝑝 ≤ 𝑛} 1/𝑝 = 𝑂(ln ln 𝑛). With this, the final complexity becomes a
simple 𝑂(𝑛 ln ln √𝑛) = 𝑂(𝑛 ln ln 𝑛).
19.3 The Euclidean Algorithm
Competitive Tip
When using the Sieve of Eratosthenes, we can save quite a bit of memory by using a
bitset instead since we only store a boolean state per number (whether it is prime or
not). This gives us slightly better cache behaviour, improving the performance in real
terms.
Competitive Tip
Problem 19.10
Prime Count – primecount
Note: Solve for 3 points.
Problem 19.11
(𝑎, 0) = 𝑎 (19.1)
(𝑎, 𝑎) = 𝑎 (19.2)
(𝑎, 𝑏) ≤ max(𝑎, 𝑏) (19.3)
(𝑎𝑐, 𝑏𝑐) = 𝑐 · (𝑎, 𝑏) (19.4)
(𝑎, 𝑏) | (𝑎, 𝑏𝑐) (19.5)
If (𝑎, 𝑐) = 1, then
(𝑎, 𝑏𝑐) = (𝑎, 𝑏) (19.6)
Proof. We give a proof for the last equation – the others are good exercises
to get acquainted with the GCD.
We can WLOG assume that 𝑐 is prime. If it is not, we can assume we
have a smallest counterexample, perform the substitution 𝑐 = 𝑝𝑐′ where 𝑝 is
prime, and prove it using (𝑎, (𝑏𝑐′)𝑝) = (𝑎, 𝑏𝑐′) instead. As (𝑎, 𝑐′) | (𝑎, 𝑐) = 1,
the premises still hold for showing that (𝑎, 𝑏𝑐′) = (𝑎, 𝑏).
First, let 𝑑 = (𝑎, 𝑏). Then, (𝑎, 𝑏𝑐) = 𝑑 · (𝑎/𝑑, 𝑏𝑐/𝑑) by Equation 19.4. We are
then trying to prove that (𝑎/𝑑, 𝑐 · 𝑏/𝑑) = 1. Assume otherwise. Then there must
be some prime 𝑝 that divides both 𝑎/𝑑 and 𝑐 · 𝑏/𝑑. The first condition implies 𝑝 | 𝑎.
However, since (𝑎, 𝑐) = 1 we cannot have 𝑝 | 𝑐, or else 𝑎 and 𝑐 would share a
divisor greater than 1. This means 𝑝 | 𝑏/𝑑. But then 𝑝 | (𝑎/𝑑, 𝑏/𝑑) = 1, which is
impossible. Thus, we must have (𝑎/𝑑, 𝑐 · 𝑏/𝑑) = 1 and (𝑎, 𝑏𝑐) = 𝑑 = (𝑎, 𝑏).
Before we start looking into how to actually compute the greatest common
divisor, we take a detour into the land of number theoretic sums to also get some
practice and understanding of what the GCD actually means.
GCD Sum
Compute
∑_{𝑖 | 𝑁} ∑_{𝑗 | 𝑁} gcd(𝑖, 𝑗)
Solution. Now and then, problems consist of computing some number theoretic
sum. There are a number of different techniques involved in this, so we will show
two different solutions.
Let us first try to transform the sum into something simpler. In our case,
we don’t even know how to compute the gcd of two numbers quickly yet, so it
makes sense to attempt to simplify that term. This approach is also supported by
the gcd(𝑖, 𝑗) being a non-trivial term that requires computation to figure out, but
we know all the values it will assume. Just from the definition, we understand
that gcd(𝑖, 𝑗) | 𝑖, and 𝑖 | 𝑁, so gcd(𝑖, 𝑗) is a divisor of 𝑁 too. Picking 𝑖 = 𝑗 = 𝑑
gives us gcd(𝑖, 𝑗) = 𝑑 for any divisor 𝑑.
By fixing the values of gcd(𝑖, 𝑗) one at a time, the problem is instead
transformed to the following: for each 𝑑 | 𝑁, compute the number of pairs 𝑖 | 𝑁,
𝑗 | 𝑁 such that gcd(𝑖, 𝑗) = 𝑑. If we let this number be 𝑘(𝑑), the sum in the
problem simplifies to ∑_{𝑑 | 𝑁} 𝑘(𝑑) · 𝑑. Evaluating 𝑘(𝑑) is slightly tricky. We first
write 𝑖 = 𝑑𝑖′ and 𝑗 = 𝑑𝑗′, so that 𝑖′ and 𝑗′ must be divisors of 𝑁/𝑑 that
themselves share no factor. If 𝑒𝑖 is the exponent of the 𝑖'th prime in 𝑁/𝑑, there are three cases for each prime factor: it
divides 𝑖′ to some power, 𝑗′ to some power, or neither (a consequence of Euclid's
Lemma). In the first two cases there are 𝑒𝑖 possibilities respectively, and in the
third a single one. Thus, there are in total 2𝑒𝑖 + 1 choices for each factor, resulting in the
product ∏(2𝑒𝑖 + 1) for the total number of such pairs (𝑖′, 𝑗′).
To enumerate the divisors 𝑑 of 𝑁, we can first
prime factor 𝑁 and then use a recursive procedure to compute all divisors, one
prime factor at a time.
The second approach we show gives us a quite different way of computing
the sum. It involves first looking at the function we are computing, and figuring
out how it is affected if we isolate one of the prime powers dividing the argument
of the function (𝑁 ). This is a common theme in number theory, where many
sums and functions are easy to compute for prime powers, and hopefully easy to
combine!
Let 𝑝 be a prime where 𝑝^𝑘 || 𝑁 and 𝑁′ = 𝑁/𝑝^𝑘. Then, we can rewrite our sum
as

∑_{𝑎=0}^{𝑘} ∑_{𝑏=0}^{𝑘} ( ∑_{𝑖 | 𝑁′} ∑_{𝑗 | 𝑁′} gcd(𝑖 · 𝑝^𝑎, 𝑗 · 𝑝^𝑏) )
By Equation 19.4, we can factor out min(𝑝 𝑎 , 𝑝 𝑏 ) from the innermost term.
Assume for the purpose of demonstration that 𝑎 ≤ 𝑏. Then, the sum simplifies to
∑_{𝑎=0}^{𝑘} ∑_{𝑏=0}^{𝑘} ( ∑_{𝑖 | 𝑁′} ∑_{𝑗 | 𝑁′} 𝑝^𝑎 gcd(𝑖, 𝑗 · 𝑝^{𝑏−𝑎}) )

Since 𝑖 | 𝑁′ means 𝑖 ⊥ 𝑝, Equation 19.6 lets us drop the factor 𝑝^{𝑏−𝑎}, and in general the sum equals

( ∑_{𝑎=0}^{𝑘} ∑_{𝑏=0}^{𝑘} 𝑝^{min(𝑎,𝑏)} ) · ( ∑_{𝑖 | 𝑁′} ∑_{𝑗 | 𝑁′} gcd(𝑖, 𝑗) )
The observant reader may notice that the left factor in the above product happens
to be the same sum you'd get if 𝑁 = 𝑝^𝑘, since gcd(𝑝^𝑎, 𝑝^𝑏) = 𝑝^{min(𝑎,𝑏)}! It's apparently
enough to compute the sum for all prime powers of 𝑁 and multiply the answers
together. This is not uncommon – in Section 19.6 we study more functions like
this.
this.
Now, the only thing that remains is to evaluate this particular sum for each
prime divisor. Luckily, both 𝑘 and the number of distinct primes in a number are
very small (at most log₂(𝑁)), so it can be computed with nested loops.
Finally, we get to the big question. How do we compute the greatest common
divisor of two integers?
This last piece of our Euclidean puzzle completes our algorithm, and gives us
a remarkably short procedure, seen below. Note the recursive
invocation GCD(𝐵 mod 𝐴, 𝐴), which ensures that 𝐴 ≤ 𝐵 in every recursive call.
1: procedure GCD(𝐴, 𝐵)
2: if 𝐴 = 0 then
3: return 𝐵
4: return 𝐺𝐶𝐷 (𝐵 mod 𝐴, 𝐴)
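Translated directly into C++ (the type and function name are our own choices):

```cpp
// Recursive Euclidean algorithm. gcd(0, b) = b, and otherwise the
// pair (a, b) is replaced by (b mod a, a), which preserves the GCD
// while strictly shrinking the first argument.
long long euclid(long long a, long long b) {
    if (a == 0) return b;
    return euclid(b % a, a);
}
```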
Competitive Tip
The Euclidean algorithm exists as the built-in function __gcd(a, b) in C++.
Granica
Croatian Open Competition in Informatics 2007/2008, Contest #6
Given integers 𝑎 1, 𝑎 2, ..., 𝑎𝑛 , find all those numbers 𝑑 such that upon division by
𝑑, all of the numbers 𝑎𝑖 leave the same remainder.
Input
The first line contains the integer 2 ≤ 𝑛 ≤ 100, the length of the sequence 𝑎𝑖.
The second line contains 𝑛 integers 1 ≤ 𝑎₁, 𝑎₂, . . . , 𝑎𝑛 ≤ 10⁹.
Output
Output all such integers 𝑑, separated by spaces.
Solution. What does it mean for two numbers 𝑎𝑖 and 𝑎𝑗 to have the same
remainder when dividing by 𝑑? Letting this remainder be 𝑟, we can write
𝑎𝑖 = 𝑑𝑛 + 𝑟 and 𝑎𝑗 = 𝑑𝑚 + 𝑟 for integers 𝑛 and 𝑚. Thus, 𝑎𝑖 − 𝑎𝑗 = 𝑑(𝑛 − 𝑚), so that
𝑑 is a divisor of 𝑎𝑖 − 𝑎𝑗! This gives us a necessary condition for our numbers 𝑑. Is
it sufficient? If 𝑎𝑖 = 𝑑𝑛 + 𝑟 and 𝑎𝑗 = 𝑑𝑚 + 𝑟′, we have 𝑎𝑖 − 𝑎𝑗 = 𝑑(𝑛 − 𝑚) + (𝑟 − 𝑟′).
Since 𝑑 is a divisor of 𝑎𝑖 − 𝑎𝑗 and of 𝑑(𝑛 − 𝑚), it must be a divisor of 𝑟 − 𝑟′ too,
meaning 𝑑 | 𝑟 − 𝑟′. As 0 ≤ 𝑟, 𝑟′ < 𝑑, we have that −𝑑 < 𝑟 − 𝑟′ < 𝑑, implying
𝑟 − 𝑟′ = 0, so that 𝑟 = 𝑟′ and both remainders were the same after all.
The question then is how we compute the set of common divisors of all
numbers 𝑎𝑖 − 𝑎𝑗. We claim that this set is (even for the case of only two numbers)
the set of divisors of their greatest common divisor. This is intuitively clear to some,
but to prove it we take aid in the prime factorizations of the divisors. A divisor of some
integer
𝑛 = 𝑝₁^𝑒₁ · · · 𝑝𝑘^𝑒𝑘
is of the form
𝑑 = 𝑝₁^𝑒′₁ · · · 𝑝𝑘^𝑒′𝑘
where 0 ≤ 𝑒′𝑖 ≤ 𝑒𝑖. The condition for 𝑑 to additionally divide another integer
𝑚 = 𝑝₁^𝑓₁ · · · 𝑝𝑘^𝑓𝑘
is that 0 ≤ 𝑒′𝑖 ≤ min(𝑓𝑖, 𝑒𝑖). It should be clear that a number with this property
is indeed a common divisor of 𝑛 and 𝑚.
The largest such number is attained when 𝑒′𝑖 = min(𝑓𝑖, 𝑒𝑖), giving us the GCD.
This also explains why all common divisors must be divisors of the GCD.
Using this interpretation of the GCD, we can extend the result to finding
the GCD 𝑑 of a sequence 𝑏₁, 𝑏₂, . . . . Consider any prime 𝑝 such that 𝑝^𝑞𝑖 || 𝑏𝑖.
Then, we must have 𝑝^min(𝑞₁,𝑞₂,...) || 𝑑. This operation is exactly what the GCD
algorithm does for two numbers. Since min(𝑞₁, 𝑞₂, . . . ) = min(𝑞₁, min(𝑞₂, . . . )),
we can use the recursion formula 𝑑 = gcd(𝑏₁, 𝑏₂, . . . ) = gcd(𝑏₁, gcd(𝑏₂, . . . )),
most simply implemented in a loop:
1: procedure MultiGCD(sequence 𝐴)
2: gcd ← 0
3: for each 𝑎 ∈ 𝐴 do
4: gcd ← 𝐺𝐶𝐷 (gcd, 𝑎)
5: return gcd
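The same loop in C++ could look as follows (a sketch using C++17's std::gcd for the two-argument GCD):

```cpp
#include <numeric> // std::gcd (C++17)
#include <vector>

// GCD of a whole sequence: fold gcd over the elements,
// starting from 0 since gcd(0, a) = a.
long long multiGcd(const std::vector<long long>& A) {
    long long g = 0;
    for (long long a : A) g = std::gcd(g, a);
    return g;
}
```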
Finally, we need to find all the divisors of the GCD to arrive at the answer.
Example 19.5 The multiples of 12 are 12, 24, 36, 48, 60, . . . . The multiples
of 10 are 10, 20, 30, 40, 50, 60, . . . . The least common multiple of the
numbers is thus 60.
To construct a common multiple of 𝑎 and 𝑏, we have to add whatever factors to 𝑎 that are additionally present in 𝑏. For
example, since 10 = 2 · 5 and 12 = 2 · 2 · 3, we need to add an additional factor
2 and 3 to 10 to make a common multiple – 2 · 2 · 3 · 5 = 60.
To compute the LCM easily, note that a multiple 𝑚 of an integer 𝑎 with
prime factorization
𝑎 = 𝑝₁^𝑒₁ · · · 𝑝𝑘^𝑒𝑘
must be of the form
𝑚 = 𝑝₁^𝑒′₁ · · · 𝑝𝑘^𝑒′𝑘
where 𝑒𝑖 ≤ 𝑒′𝑖. Thus, if 𝑚 is to be a common multiple of 𝑎 and another integer
𝑏 = 𝑝₁^𝑓₁ · · · 𝑝𝑘^𝑓𝑘
it must hold that max(𝑓𝑖, 𝑒𝑖) ≤ 𝑒′𝑖, with 𝑒′𝑖 = max(𝑓𝑖, 𝑒𝑖) giving us the smallest such
multiple. Since max(𝑒𝑖, 𝑓𝑖) + min(𝑒𝑖, 𝑓𝑖) = 𝑒𝑖 + 𝑓𝑖, we get that lcm(𝑎, 𝑏) · gcd(𝑎, 𝑏) =
𝑎𝑏. This gives us the formula lcm(𝑎, 𝑏) = (𝑎 / gcd(𝑎, 𝑏)) · 𝑏 to compute the LCM. The
order of operations is chosen to avoid overflows in computing the product 𝑎𝑏.
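A sketch in C++ (the function name is ours), dividing before multiplying exactly to avoid the overflow mentioned above:

```cpp
#include <numeric> // std::gcd (C++17)

// lcm(a, b) = (a / gcd(a, b)) * b. Dividing a by the GCD first keeps
// the intermediate value at most lcm(a, b), instead of the possibly
// overflowing product a * b.
long long lcm64(long long a, long long b) {
    return a / std::gcd(a, b) * b;
}
```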
As for the GCD of multiple integers, it holds that lcm(𝑏₁, 𝑏₂, . . . ) =
lcm(𝑏₁, lcm(𝑏₂, . . . )), so the LCM of a sequence can be computed with the
same kind of loop.
We now look for integer solutions (𝑥, 𝑦) to the equation
𝑎𝑥 + 𝑏𝑦 = (𝑎, 𝑏)
It is not obvious that a solution exists. Let 𝑆 = {𝑎𝑥 + 𝑏𝑦 | 𝑥, 𝑦 integers}. These
numbers are called the linear combinations of 𝑎 and 𝑏. 𝑆 is closed under addition
and negation (and thus also subtraction, and under multiplication by integers). All numbers of
the form 𝑎𝑥 + 𝑏𝑦 are multiples of (𝑎, 𝑏), and we claim that 𝑆 contains (𝑎, 𝑏) itself
(and thus all its multiples). Let 𝑑 be the smallest positive member
of 𝑆. Then 𝑎 − 𝑑⌊𝑎/𝑑⌋ = 𝑎 mod 𝑑 ∈ 𝑆, since 𝑆 is closed under subtraction and
multiplication. Similarly, 𝑏 mod 𝑑 ∈ 𝑆. As 0 ≤ 𝑎 mod 𝑑 < 𝑑, we must have
𝑎 mod 𝑑 = 0, and likewise 𝑏 mod 𝑑 = 0, as 𝑑 was the smallest positive element of 𝑆. However, this
is equivalent to 𝑑 | 𝑎 and 𝑑 | 𝑏, so 𝑑 | (𝑎, 𝑏), and 𝑑 = (𝑎, 𝑏) since 𝑑 was a multiple
of (𝑎, 𝑏).
This proof might remind you somewhat of the Euclidean algorithm. The
proof and the algorithm hide within them a method to write (𝑎, 𝑏) as a linear
combination of 𝑎 and 𝑏. Remember that during the computation of the GCD, we
repeatedly used that (𝑎, 𝑏) = (𝑏, 𝑎 mod 𝑏). Since 𝑎 mod 𝑏 is a linear combination
of 𝑎 and 𝑏, the numbers appearing during the computation of the GCD
are always linear combinations of 𝑎 and 𝑏. The algorithm concludes at (𝑑, 0), at
which point 𝑑 = (𝑎, 𝑏). If we only kept track of which linear combination
was equal to 𝑑, we would be able to construct a solution to 𝑎𝑥 + 𝑏𝑦 = (𝑎, 𝑏). Let
us try this with an example, where we use [𝑥, 𝑦] to denote the number 𝑎𝑥 + 𝑏𝑦.
1: procedure ExtendedEuclidean(𝑎, 𝑏)
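As a sketch in C++ (our own recursive formulation), the extended Euclidean algorithm returns a triple (𝑔, 𝑥, 𝑦) with 𝑎𝑥 + 𝑏𝑦 = 𝑔 = (𝑎, 𝑏):

```cpp
#include <tuple>

// Extended Euclidean algorithm. In the base case, b*1 = b = gcd(0, b).
// Otherwise, if (b mod a)*x + a*y = g, substituting
// b mod a = b - (b/a)*a gives a*(y - (b/a)*x) + b*x = g.
std::tuple<long long, long long, long long>
extendedEuclidean(long long a, long long b) {
    if (a == 0) return {b, 0, 1};
    auto [g, x, y] = extendedEuclidean(b % a, a);
    return {g, y - (b / a) * x, x};
}
```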
This gives us a single solution. Finding the others is not much harder. Let
𝑎′ = 𝑎/(𝑎, 𝑏) and 𝑏′ = 𝑏/(𝑎, 𝑏). Given two solutions
𝑎𝑥₁ + 𝑏𝑦₁ = (𝑎, 𝑏)
𝑎𝑥₂ + 𝑏𝑦₂ = (𝑎, 𝑏)
we can first factor out (𝑎, 𝑏) to get
𝑎′𝑥₁ + 𝑏′𝑦₁ = 1
𝑎′𝑥₂ + 𝑏′𝑦₂ = 1
A simple subtraction gives us that
𝑎′(𝑥₁ − 𝑥₂) + 𝑏′(𝑦₁ − 𝑦₂) = 0
𝑎′(𝑥₁ − 𝑥₂) = 𝑏′(𝑦₂ − 𝑦₁)
Because (𝑎′, 𝑏′) = 1, we must have 𝑏′ | 𝑥₁ − 𝑥₂. Then there exists 𝑘 such that 𝑥₁ − 𝑥₂ = 𝑘𝑏′, so
𝑥₁ = 𝑥₂ + 𝑘𝑏′. Inserting this gives us
𝑎′(𝑥₂ + 𝑘𝑏′ − 𝑥₂) = 𝑏′(𝑦₂ − 𝑦₁)
𝑎′𝑘𝑏′ = 𝑏′(𝑦₂ − 𝑦₁)
𝑎′𝑘 = 𝑦₂ − 𝑦₁
𝑦₁ = 𝑦₂ − 𝑘𝑎′
Thus, any solution must be of the form
(𝑥₁ + 𝑘 · 𝑏/(𝑎, 𝑏), 𝑦₁ − 𝑘 · 𝑎/(𝑎, 𝑏)) for 𝑘 ∈ ℤ
It is easily verified that any such 𝑘 gives us a solution. This result is called
Bézout's identity.
Generalized Knights
A generalized knight is a special chess piece. It moves by first choosing one of
the four cardinal directions and moving 𝑎 steps, and then choosing one of the two
orthogonal cardinal directions and moving 𝑏 steps (for example first up and then
left or right, or first left and then up or down). Compute the minimum number
of moves the knight needs to move from (0, 0) to (𝑥, 𝑦).
Input
The four integers 1 ≤ 𝑎, 𝑏, 𝑥, 𝑦 ≤ 10¹⁸, where 𝑎 ≠ 𝑏.
Solution. Let us split the 8 moves the knight can make into the following
subcomponents:
19.4 Modular Arithmetic
Consider the quotient and remainder as we divide a few successive integers by 4:
2/4 = 0, remainder 2
3/4 = 0, remainder 3
4/4 = 1, remainder 0
5/4 = 1, remainder 1
6/4 = 1, remainder 2
Note how the remainder increases by 1 each time the numerator increases, wrapping
around to 0 upon reaching 4. As you might remember from Chapter 2 on C++ (or from your favorite programming
language), there is an operator which computes this remainder, called the modulo
operator. Modular arithmetic is then computation on numbers where
every number is taken modulo some integer 𝑛. Under such a scheme, we
have that e.g. 3 and 7 are basically the same if computing modulo 4, since
3 mod 4 = 3 = 7 mod 4. This concept, where numbers with the same remainder
are treated as if they are equal, is called congruence.
When 𝑎 and 𝑏 have the same remainder modulo 𝑛, we write
𝑎 ≡ 𝑏 (mod 𝑛)
Addition modulo 3 then works as follows:
+ 0 1 2
0 0 1 2
1 1 2 3 ≡ 0
2 2 3 ≡ 0 4 ≡ 1
Multiplication modulo 3 looks like this:
* 0 1 2
0 0 0 0
1 0 1 2
2 0 2 4 ≡ 1
Here, every nonzero element has a multiplicative inverse – a number 𝑎⁻¹ with 𝑎𝑎⁻¹ ≡ 1. Compare with multiplication modulo 4:
* 0 1 2 3
0 0 0 0 0
1 0 1 2 3
2 0 2 0 2
3 0 3 2 1
Now, 2 does not even have an inverse! To determine when an inverse exists –
and if so, to compute it – we will make use of the extended Euclidean
algorithm. If 𝑎𝑎⁻¹ ≡ 1 (mod 𝑛), we have 𝑛 | 𝑎𝑎⁻¹ − 1, meaning 𝑎𝑎⁻¹ − 1 = 𝑛𝑥
for some integer 𝑥. Rearranging this equation gives us 𝑎𝑎⁻¹ − 𝑛𝑥 = 1. We know
from Section 19.3 that this has a solution if and only if (𝑎, 𝑛) = 1. In this case,
we can use the extended Euclidean algorithm to compute 𝑎⁻¹. Note that by
Bézout's identity, 𝑎⁻¹ is actually unique modulo 𝑛.
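A sketch of this computation in C++ (names are our own), reusing the extended Euclidean algorithm from Section 19.3 and returning −1 when no inverse exists:

```cpp
#include <tuple>

// Extended Euclidean algorithm: a*x + b*y = g = gcd(a, b).
std::tuple<long long, long long, long long>
egcd(long long a, long long b) {
    if (a == 0) return {b, 0, 1};
    auto [g, x, y] = egcd(b % a, a);
    return {g, y - (b / a) * x, x};
}

// Inverse of a modulo n: solve a*x + n*y = 1. If gcd(a, n) != 1,
// no inverse exists and -1 is returned.
long long modinv(long long a, long long n) {
    auto [g, x, y] = egcd(((a % n) + n) % n, n);
    if (g != 1) return -1;
    return ((x % n) + n) % n;
}
```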
Just like the reals, modular arithmetic has a cancellation law for multiplication.
Theorem 19.7
Assume 𝑎⊥𝑛. Then 𝑎𝑏 ≡ 𝑎𝑐 (mod 𝑛) implies 𝑏 ≡ 𝑐 (mod 𝑛).
Proof. Multiplying both sides of
𝑎𝑏 ≡ 𝑎𝑐 (mod 𝑛)
with 𝑎⁻¹ results in
𝑎𝑎⁻¹𝑏 ≡ 𝑎𝑎⁻¹𝑐 (mod 𝑛)
Simplifying 𝑎𝑎⁻¹ gives us
𝑏 ≡𝑐 (mod 𝑛)
A common operation in modular arithmetic is computing 𝑎^𝑚 mod 𝑛, which can be done quickly using the recursion

𝑎^𝑚 mod 𝑛 =
  1 mod 𝑛                       if 𝑚 = 0
  𝑎 · (𝑎^(𝑚−1) mod 𝑛) mod 𝑛     if 𝑚 odd
  (𝑎^(𝑚/2) mod 𝑛)² mod 𝑛        if 𝑚 even
This procedure is clearly Θ(log₂ 𝑚): applying the recursive formula
to an even number halves 𝑚, while applying it to an odd number
first makes it even, after which it is halved in the next step. It is very important
that 𝑎^(𝑚/2) mod 𝑛 is computed only once, even though it is squared! Computing it
twice would instead make the running time linear in 𝑚.
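In C++, the recursion becomes the following sketch (assuming 𝑛 is small enough that 𝑛² fits in a long long):

```cpp
// Fast modular exponentiation following the recursion above.
// O(log m) multiplications; note that in the even case the half
// power is computed only once and then squared.
long long modpow(long long a, long long m, long long n) {
    if (m == 0) return 1 % n;
    if (m % 2 == 1) return (a % n) * modpow(a, m - 1, n) % n;
    long long half = modpow(a, m / 2, n);
    return half * half % n;
}
```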
19.5 Chinese Remainder Theorem
The Chinese remainder theorem states that if 𝑚₁, 𝑚₂, . . . , 𝑚𝑛 are pairwise
relatively prime, the system of equations
𝑥 ≡ 𝑎₁ (mod 𝑚₁)
𝑥 ≡ 𝑎₂ (mod 𝑚₂)
...
𝑥 ≡ 𝑎𝑛 (mod 𝑚𝑛)
has a solution that is unique modulo 𝑚₁𝑚₂ · · · 𝑚𝑛.
Proof. We will prove the theorem inductively. The theorem is clearly true
for 𝑛 = 1, with the unique solution 𝑥 = 𝑎 1 . Now, consider the two equations
𝑥 ≡ 𝑎1 (mod 𝑚 1 )
𝑥 ≡ 𝑎2 (mod 𝑚 2 )
Let 𝑥∗ = 𝑎₁ · 𝑚₂ · (𝑚₂⁻¹ mod 𝑚₁) + 𝑎₂ · 𝑚₁ · (𝑚₁⁻¹ mod 𝑚₂), where the inverses
exist since 𝑚₁ ⊥ 𝑚₂. Modulo 𝑚₁, the second term vanishes and the first reduces
to 𝑎₁ – and symmetrically modulo 𝑚₂ – so 𝑥∗ solves both equations. In a larger
system we can therefore replace the pair of equations
𝑥 ≡ 𝑎1 (mod 𝑚 1 )
𝑥 ≡ 𝑎2 (mod 𝑚 2 )
by the single equation
𝑥 ≡ 𝑥∗ (mod 𝑚₁𝑚₂)
where 𝑥∗ is the solution to the first two equations – we just proved the two
formulations are equivalent with regard to 𝑥. This reduces the number of
equations to 𝑘 − 1, for which the theorem holds by assumption. Thus, it also
holds for 𝑘 equations.
Note that the theorem used an explicit construction of the solution, allowing
us to find what the unique solution to such a system is.
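The construction from the proof can be sketched in C++ for two equations (names are our own; we assume 𝑚₁ ⊥ 𝑚₂ and moduli small enough, say up to 10⁶, that all intermediate products fit in 64 bits):

```cpp
#include <tuple>

// Extended Euclidean algorithm: a*x + b*y = g = gcd(a, b).
std::tuple<long long, long long, long long>
egcd(long long a, long long b) {
    if (a == 0) return {b, 0, 1};
    auto [g, x, y] = egcd(b % a, a);
    return {g, y - (b / a) * x, x};
}

long long inverse(long long a, long long n) { // assumes gcd(a, n) = 1
    auto [g, x, y] = egcd(((a % n) + n) % n, n);
    return ((x % n) + n) % n;
}

// Unique x modulo m1*m2 with x = a1 (mod m1) and x = a2 (mod m2),
// following x = a1*m2*(m2^-1 mod m1) + a2*m1*(m1^-1 mod m2).
long long crt(long long a1, long long m1, long long a2, long long m2) {
    long long M = m1 * m2;
    long long x = a1 % M * m2 % M * inverse(m2, m1) % M
                + a2 % M * m1 % M * inverse(m1, m2) % M;
    return x % M;
}
```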
Radar
KTH Challenge 2014
We say that an integer 𝑧 is within distance 𝑦 of an integer 𝑥 modulo an integer 𝑚
if
𝑧 ≡ 𝑥 + 𝑡 (mod 𝑚)
where |𝑡 | ≤ 𝑦.
Find the smallest non-negative integer 𝑧 that is within distance 𝑦𝑖 of 𝑥𝑖 modulo 𝑚𝑖 for every 𝑖.
Input
The integers 0 ≤ 𝑚 1, 𝑚 2, 𝑚 3 ≤ 106 . The integers 0 ≤ 𝑥 1, 𝑥 2, 𝑥 3 ≤ 106 . The
integers 0 ≤ 𝑦1, 𝑦2, 𝑦3 ≤ 300.
Output
The integer 𝑧.
Solution. The problem gives rise to three linear equations of the form
𝑧 ≡ 𝑥 𝑖 + 𝑡𝑖 (mod 𝑚𝑖 )
where −𝑦𝑖 ≤ 𝑡𝑖 ≤ 𝑦𝑖 . If we fix all the variables 𝑡𝑖 , the problem reduces to solving
the system of equations using CRT. We could then find all possible values of
𝑧, and choose the minimum one. This requires applying the CRT construction
about 2 · 600³ = 432 000 000 times. Since the modulo operation involved is
quite expensive, this approach would use too much time. Instead, let us exploit
a useful greedy principle in finding minimal solutions.
Assume that 𝑧 is the minimal answer to an instance. There are only two
possibilities: either 𝑧 = 0, or decreasing 𝑧 by 1 would violate one of the
constraints, meaning that 𝑧 ≡ 𝑥𝑖 − 𝑦𝑖 (mod 𝑚𝑖) for some 𝑖.
In the first case, we only need to verify whether 𝑧 = 0 is a solution to the three
inequalities. In the second case, we managed to change an inequality into a linear
equation. By testing each of the three possible 𝑖 for this equation, we only need to test
the values of 𝑡𝑗 for the two other equations. This reduces the number of times we
need to use the CRT to 600² = 360 000, a modest amount well within the
time limit.
19.6 Euler's Totient Function
Definition 19.9 Two integers 𝑎 and 𝑏 are said to be relatively prime if their
only (and thus greatest) common divisor is 1. If 𝑎 and 𝑏 are relatively prime,
we write that 𝑎⊥𝑏.
Example 19.8 The numbers 74 and 22 are not relatively prime, since they
are both divisible by 2.
The numbers 72 and 65 are relatively prime. The prime factorization
of 72 is 2 · 2 · 2 · 3 · 3, and the factorization of 65 is 5 · 13. Since these
numbers have no prime factors in common, they have no divisors other than
1 in common.
Example 19.9 What is 𝜙 (12)? The numbers 2, 4, 6, 8, 10 all have the factor 2
in common with 12 and the numbers 3, 6, 9 all have the factor 3 in common
with 12.
This leaves us with the integers 1, 5, 7, 11 which are relatively prime to
12. Thus, 𝜙 (12) = 4.
For prime powers, 𝜙(𝑝^𝑘) is easy to compute. The only integers which are
not relatively prime to 𝑝^𝑘 are the multiples of 𝑝, of which there are 𝑝^𝑘/𝑝 = 𝑝^(𝑘−1),
meaning
𝜙(𝑝^𝑘) = 𝑝^𝑘 − 𝑝^(𝑘−1) = 𝑝^(𝑘−1)(𝑝 − 1)
It turns out 𝜙(𝑛) has a property which is highly useful in computing certain
number theoretical functions – it is multiplicative, meaning that 𝜙(𝑎𝑏) = 𝜙(𝑎)𝜙(𝑏)
whenever 𝑎 ⊥ 𝑏.
Computing 𝜙 for a single value can thus be done as quickly as factoring the
number. If we wish to compute 𝜙 for an entire interval [1, 𝑛], we can use a sieve
similar to the Sieve of Eratosthenes.
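One way such a sieve could look in C++ (our own formulation): start with 𝜙(𝑖) = 𝑖 and, for each prime 𝑝, remove the fraction 1/𝑝 from every multiple of 𝑝:

```cpp
#include <vector>

// Compute phi(0..n). phi[p] still equals p exactly when no smaller
// prime divides p, i.e. when p is prime; every multiple j of a prime
// p is then multiplied by (1 - 1/p) via phi[j] -= phi[j] / p.
std::vector<long long> phiSieve(int n) {
    std::vector<long long> phi(n + 1);
    for (int i = 0; i <= n; i++) phi[i] = i;
    for (int p = 2; p <= n; p++) {
        if (phi[p] != p) continue; // p is composite
        for (int j = p; j <= n; j += p)
            phi[j] -= phi[j] / p;
    }
    return phi;
}
```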
Euler's theorem states that if 𝑎 ⊥ 𝑛, then
𝑎^𝜙(𝑛) ≡ 1 (mod 𝑛)
Proof. The proof of this theorem isn’t trivial, but it is number theoretically
interesting and helps to build some intuition for modular arithmetic. The idea
behind the proof will be to consider the product of the 𝜙 (𝑛) positive integers
less than 𝑛 which are relatively prime to 𝑛. We will call these 𝑥 1, 𝑥 2, . . . , 𝑥𝜙 (𝑛) .
Since these are all distinct integers between 1 and 𝑛, they are incongruent
modulo 𝑛. We call such a set of 𝜙(𝑛) numbers, all incongruent modulo 𝑛, a
complete residue system (CRS) modulo 𝑛.
Next, we will prove that 𝑎𝑥₁, 𝑎𝑥₂, . . . , 𝑎𝑥_𝜙(𝑛) also form a CRS modulo 𝑛. We
need to show two properties for this: that every 𝑎𝑥𝑖 is relatively prime to 𝑛, and
that the 𝑎𝑥𝑖 are pairwise incongruent modulo 𝑛.
We will start with the first property. Since both 𝑎 and 𝑥𝑖 are relatively
prime to 𝑛, neither number has a prime factor in common with 𝑛. This
means 𝑎𝑥𝑖 has no prime factor in common with 𝑛 either, meaning the two
numbers are relatively prime.
numbers are relatively prime. The second property requires us to make use of
the cancellation property of modular arithmetic (Theorem 19.7). If 𝑎𝑥𝑖 ≡ 𝑎𝑥 𝑗
(mod 𝑛), the cancellation law gives us 𝑥𝑖 ≡ 𝑥 𝑗 (mod 𝑛). Since all 𝑥𝑖 are
incongruent modulo 𝑛, we must have 𝑖 = 𝑗, meaning all the numbers 𝑎𝑥𝑖
are incongruent as well. Thus, these numbers did indeed form a complete
residue system modulo 𝑛.
If 𝑎𝑥₁, . . . , 𝑎𝑥_𝜙(𝑛) form a CRS, we know every 𝑎𝑥𝑖 must be congruent to
some 𝑥𝑗, meaning that
(𝑎𝑥₁)(𝑎𝑥₂) · · · (𝑎𝑥_𝜙(𝑛)) ≡ 𝑥₁𝑥₂ · · · 𝑥_𝜙(𝑛) (mod 𝑛)
or, rearranged,
𝑎^𝜙(𝑛) · 𝑥₁𝑥₂ · · · 𝑥_𝜙(𝑛) ≡ 𝑥₁𝑥₂ · · · 𝑥_𝜙(𝑛) (mod 𝑛)
Since all the 𝑥𝑖 are relatively prime to 𝑛, we can again use the cancellation
law, leaving
𝑎𝜙 (𝑛) ≡ 1 (mod 𝑛)
completing our proof of Euler’s theorem.
For primes 𝑝 we get a special case of Euler's theorem, since 𝜙(𝑝) = 𝑝 − 1.
Corollary 19.1 (Fermat's theorem)
For a prime 𝑝 and an integer 𝑎⊥𝑝, we have
𝑎𝑝−1 ≡ 1 (mod 𝑝)
Exponial
Nordic Collegiate Programming Contest 2016
Define the exponial of 𝑛 as
exponial(𝑛) = 𝑛^((𝑛−1)^((𝑛−2)^(⋯^(2^1))))
Compute exponial(𝑛) mod 𝑚.
Solution. Since 𝑎^𝑒 mod 𝑚 is periodic in 𝑒, with a period of 𝜙(𝑚), maybe when
computing 𝑛^((𝑛−1)^⋯) we could compute 𝑒 = (𝑛 − 1)^⋯ modulo 𝜙(𝑚) and only
then compute 𝑛^𝑒 (mod 𝑚)? Alas, this
is only useful when 𝑛⊥𝑚, since this is a necessary precondition for Euler’s
theorem. When working modulo some integer 𝑚 with a prime factorization of
𝑝₁^𝑒₁ · · · 𝑝𝑘^𝑒𝑘, a helpful approach is to instead work modulo its prime powers 𝑝𝑖^𝑒𝑖
and then combine the results using the Chinese remainder theorem. Since the
prime powers of a prime factorization are pairwise relatively prime, the remainder
theorem applies.
Let us apply this principle to Euler's theorem. When computing 𝑛^𝑒 mod 𝑝^𝑘
we have two cases. Either 𝑝 | 𝑛, in which case 𝑛^𝑒 ≡ 0 (mod 𝑝^𝑘) whenever
𝑒 ≥ 𝑘. Otherwise, 𝑝 ⊥ 𝑛, and 𝑛^𝑒 ≡ 𝑛^(𝑒 mod 𝜙(𝑝^𝑘)) (mod 𝑝^𝑘) by Euler's theorem.
We still have the problem that 𝑛 can be up to 10⁹: the exponent tower has height 𝑛,
so we would need to perform a
number of exponentiations that is linear in 𝑛, which is slow for such large 𝑛.
However, our modulus will actually very quickly converge to 1. While the final
result is taken modulo 𝑚, the first recursive call is taken modulo 𝜙(𝑚). The
recursive call performed at the next level will thus be modulo 𝜙(𝜙(𝑚)), and so
on. That this sequence decreases very quickly is based on two facts. For even 𝑚,
only the 𝑚/2 odd numbers below 𝑚 can be relatively prime to 𝑚, so 𝜙(𝑚) ≤ 𝑚/2.
For odd 𝑚 > 1, 𝜙(𝑚) is even: any such 𝑚 has an odd prime factor 𝑝, and since
𝜙(𝑝^𝑘) = 𝑝^(𝑘−1)(𝑝 − 1) is even and 𝜙 is multiplicative, 𝜙(𝑚) must be even. Thus 𝜙(𝜙(𝑚)) ≤ 𝑚/2 for
𝑚 > 1 (1 is neither even nor contains an odd prime factor). This means the
modulo will become 1 in a logarithmic number of iterations, completing our
algorithm.
Chapter Exercises
Problem 19.12
Longest Composite Sum – longcompositesum
Inheritance – inheritance
Evening Out 3 – eveningout3
Indivisible Sequence – indivisibleseq
Let 𝑆 be the set of all positive integer divisors of 𝑘. How many numbers are
the product of two distinct elements of 𝑆?
Happy Happy Prime Prime – happyprime
Prime Path – primepathc
Chapter Notes
A highly theoretical introduction to classical number theory can be found in
An Introduction to the Theory of Numbers [13]. While devoid of exercises and
examples, it is very comprehensive.
A Computational Introduction to Number Theory and Algebra [24] instead
takes a more applied approach, and is freely available under a Creative Commons
license at the author's home page.³
³ http://www.shoup.net/ntb/
20 Competitive Programming Strategy
Competitive programming is what we call the mind sport of solving algorithmic
problems and coding their solutions, often under the pressure of time. Most
programming competitions are performed online, at your own computer through
some kind of online judge system. For students of either high school or
university, there are two main competitions. High school students compete in
the International Olympiad in Informatics (IOI), and university students go for
the International Collegiate Programming Contest (ICPC).
Different competition styles have different difficulty, problem types and
strategies. In this chapter, we will discuss some basic strategy of programming
competitions, and give tips on how to improve your competitive skills.
20.1 IOI
The IOI is an international event where a large number of countries send teams
of up to 4 high school students to compete individually against each other during
two days of competition. Every participating country has its own national
selection olympiad first.
During a standard IOI contest, contestants are given 5 hours to solve 3
problems, each worth at most 100 points. These problems are not given in
any particular order, and the scores of the other contestants are hidden until
the end of the contest. Generally none of the problems are “easy” in the sense
that it is immediately obvious how to solve the problem in the same way the
first 1-2 problems of most other competitions are. This poses a large problem,
in particular for the amateur. Without any trivial problems nor guidance from
other contestants on what problems to focus on, how does an IOI competitor
prioritize? The problem is further exacerbated by problems not having a simple
binary scoring, with a submission being either accepted or rejected. Instead,
IOI problems contain many so-called subtasks. These subtasks give partial
credit for the problem, and contain additional restrictions and limits on either
input or output. Some problems do not even use discrete subtasks. In these
tasks, scoring is done on some scale which determines how “good” the output
produced by your program is.
Strategy
Very few contestants manage to solve every problem fully during an IOI contest.
There is a very high probability you are not one of them, which leaves you
with two options – you either skip a problem entirely, or you solve some of its
subtasks. At the start of the competition, you should read through every problem
and all of the subtasks. In the IOI you do not get extra points for submitting faster.
Thus, it does not matter if you read the problems at the beginning instead of
rushing to solve the first problem you read. Once you have read all the subtasks,
you will often see the solutions to some of the subtasks immediately. Take note
of the subtasks which you know how to solve!
Deciding on which order you should solve subtasks in is probably one of
the most difficult parts of the IOI for contestants at or below the silver medal
level. In IOI 2016, the difference between receiving a gold medal and a silver
medal was a mere 3 points. On one of the problems, with subtasks worth 11,
23, 30 and 36 points, the first silver medalist solved the third subtask, worth 30
points (a submission that possibly was a failed attempt at 100 points). Most
competitors instead solved the first two subtasks, together worth 34 points. If
the contestant had solved the first two subtasks instead, he would have gotten a
gold medal.
The problem basically boils down to the question: when should I solve
subtasks instead of focusing on a 100 point solution? There is no easy answer
to this question, due to the lack of information about the other contestants’
performances. First of all, you need to get a good sense of how difficult a
solution will be to implement correctly before you attempt it. If you only have
30 minutes left of a competition, it might not be a great idea to go for a 100
point solution on a very tricky problem. Instead, you might want to focus on
some of the easier subtasks you have left on this or other problems. If you fail
your 100 point solution which took over an hour to code, it is nice to know you
did not have some easy subtasks worth 30-60 points which could have given
you a medal.
Problems without discrete scoring (often called heuristic problems) are
almost always the hardest ones to get a full score on. These problems tend to
be very fun, and some contestants often spend way too much time on these
problems. They are treacherous in that it is often easy to increase your score by
something. However, those 30 minutes you spent to gain one additional point
may have been better spent coding a 15 point subtask on another problem. As a
general rule, go for the heuristic problem last during a competition. This does
not mean to skip the problem unless you completely solve the other two, just to
focus on them until you decide that the heuristic problem is worth more points
if given the remaining time.
In IOI, you are allowed to submit solution attempts a large number of times,
without any penalty. Use this opportunity! When submitting a solution, you
will generally be told the results of your submission on each of the secret test
cases. This provides you with a lot of detail. For example, you can get a sense of
how correct or wrong your algorithm is. If you only fail 1-2 cases, you probably
just have a minor bug, but your algorithm in general is probably correct. You
can also see if your algorithm is fast enough, since you will be told the execution
time of your program on the test cases. Whenever you make a change to your
code which you think affects correctness or speed – submit it again! This gives
you a sense of your progress, and also works as a good regression test. If your
change introduced more problems, you will know.
Whenever your solution should pass a subtask, submit it. These subtask
results will help you catch bugs earlier when you have less code to debug.
Getting Better
The IOI usually tend to have pretty hard problems. Some areas get rather little
attention. For example, there are basically no pure implementation tasks and
very little geometry.
First and foremost, make sure you are familiar with all the content in the IOI
syllabus¹. This is an official document which details what areas are allowed in
IOI tasks. This book deals with most, if not all of the topics in the IOI syllabus.
In the Swedish IOI team, most of the top performers tend to also be good
mathematical problem solvers (also getting IMO medals). Combinatorial
problems from mathematical competitions tend to be somewhat similar to
the algorithmic frame of mind, and can be good practice for the difficult IOI
problems.
When selecting problems to practice on, there are a large number of national
olympiad archives to choose from, such as the Croatian COCI² and the Polish OI³.
¹ https://people.ksp.sk/~misof/ioi-syllabus/
20.2 ICPC
In ICPC, you compete in teams of three to solve about 10-12 problems during 5
hours. A twist in ICPC-style competitions is that the team shares a single
computer. This makes it a bit harder to prioritize tasks in ICPC competitions
than in IOI competitions. You will often have multiple problems ready to be
coded, and wait for the computer. In ICPC, you see the progress of every other
team as well, which gives you some suggestions on what to solve. As a beginner
or medium-level team, this means you will generally have a good idea on what to
solve next, since many better teams will have prioritized tasks correctly for you.
ICPC scoring is based on two factors. First, teams are ranked by the number
of solved problems. As a tie breaker, the penalty time of the teams is used.
The penalty time of a single problem is the number of minutes into the contest
when your first successful attempt was submitted, plus a 20 minute penalty for
any rejected attempts. Your total penalty time is the sum of penalties for every
problem.
Strategy
In general, teams will be subject to the penalty tie-breaking. In the 2016 ICPC
World Finals, both the winners and the team in second place solved 11 problems.
² http://hsin.hr/coci/
³ http://main.edu.pl/en/archive/oi
Their penalty time differed by a mere 7 minutes! While such a small penalty
difference in the very top is rather unusual, it shows the importance of taking
your penalty into account.
Minimizing penalties generally comes down to a few basic strategic points:
In the very beginning of an ICPC contest, the first few problems will be
solved quickly. In 2016, the first accepted submissions to five of the problems
came in after 11, 15, 18, 32, 44 minutes. On the other hand, after 44 minutes
no team had solved all of those problems. Why does not every team solve the
problems in the same order? Teams have different skills in different areas, make
different judgment calls regarding difficulty, or (especially early in the contest)
simply read the problems in a
simply read the problem in a different order. The better you get, the harder it is
to distinguish between the “easy” problems of a contest – they are all “trivial”
and will take less than 10-15 minutes to solve and code.
Unless you are a very good team or have very significant variations in skill
among different areas (e.g., you are graph theory experts but do not know how
to compute the area of a triangle), you should probably follow the order the
other teams choose in solving the problems. In this case, you will generally
always be a few problems behind the top teams.
The better you get, the harder it is to exploit the scoreboard. You will more
often be tied in the top with teams who have solved the exact same problems.
The problems which teams above you have solved but you have not may only be
solved by 1-2 teams, which is not a particularly significant indicator in terms
of difficulty. Teams who are very strong at math might prioritize a hard maths
problem before an easier (on average for most teams) dynamic programming
problem. This can risk confusing you into solving the wrong problems for the
particular situation of your team.
The amount of cooperation during a contest is difficult to decide upon. The
optimal amount varies a lot between different teams. In general, the amount of
cooperation should increase within a single contest from the start to the end.
In the beginning, you should work in parallel as much as possible, to quickly
read all the problems, pick out the easy-medium problems and start solving
them. Once you have competed in a few contests, you will generally know the
approximate difficulty of the simplest tasks, so you can skim the problem set for
problems of this difficulty. Sometimes, you find an even easier problem in the
beginning than the one the team decided to start coding.
If you run out of problems to code, you waste computer time. Generally, this
should not happen. If it does, you need to become faster at solving problems.
Towards the end of the contest, it is a common mistake to parallelize on
several of the hard problems at the same time. This carries a risk of not solving
any of the problems in the end, due to none of the problems getting sufficient
attention. Just as with subtasks in IOI, this is the hardest part of prioritizing
tasks. During the last hour of an ICPC contest, the previously public scoreboard
becomes frozen. You can still see the number of attempts other teams make, but
not whether they were successful. Hence, you can not really know how many
problems you have to solve to get the position that you want. Learning your
own limits and practicing a lot as a team – especially on difficult contests – will
help you get a feeling for how likely you are to get in all of your problems if you
parallelize.
Read all the problems! You do not want to run out of time during a competition
only to discover there was some easy problem you knew how to solve but never
read the statement of. ICPC contests are further complicated by the fact that a
team consists of three different people, with different skills and knowledge.
Just because you cannot solve a problem does not mean your teammates will not
find it trivial, have seen something similar before, or simply be better at
solving that kind of problem.
The scoreboard also displays failed attempts. If you see a problem where
many teams require extra attempts, be more careful in your coding. Maybe you
can perform some extra tests before submitting, or make a final read-through of
the problem and solution to make sure you did not miss any details.
If you get Wrong Answer, you may want to spend a few minutes coding your
own test case generators. Prefer generators that create cases where you
already know the answers. Learning e.g. Python for this helps, since it usually
takes under a minute to code a reasonably complex input generator.
If you get Time Limit Exceeded – or even suspect time might be an issue –
code a test case generator. Losing a minute on testing your program on the
worst case, versus risking a 20-minute penalty, is a trade-off worth
considering on some problems.
20.2. ICPC
You are allowed to ask questions to the judges about ambiguities in the
problems. Do this the moment you think something is ambiguous (judges
generally take a few valuable minutes to answer). Most of the time they give
you a “No comment” response, in which case the perceived ambiguity probably
was not one.
If neither you nor your team mates can find a bug in a rejected solution,
consider coding it again from scratch. Often, this can be done rather quickly
when you have already coded a solution.
Getting Better
• Practice a lot with your team. Having a good team dynamic and learning
what problems the other team members excel at can be the difference that
helps you to solve an extra problem during a contest.
• Learn to write code on paper while waiting for the computer. In particular,
tricky subroutines and formulas are great to hammer out on paper before
occupying valuable computer time.
• Focus your practice on your weak areas. If you write buggy code, learn
your programming language better and code many complex solutions.
If your team is bad at geometry, practice geometry problems. If you
get stressed during contests, make sure you practice under time pressure.
For example, Codeforces (http://codeforces.com) has an excellent gym
feature, where you can compete retroactively in a contest using the same
amount of time as in the original contest. The scoreboard will then show the
standings from the original contest as they were at the corresponding point in
time.
21 Papers
This chapter contains a series of problems not associated with any particular topic.
They are a way for you to practice problem solving without being primed with
the techniques that will come up, and a way for us to show techniques, tricks and
combinations that do not fit naturally into the text of any particular chapter.
The problems are divided into shorter “papers”. Each paper includes a
suggested time that could be used if the problems were posed in a real contest.
Some of the papers are actual contests that were given in the past.
After each paper, you can find solution descriptions of its problems.
21.1 Paper 3
Divisor Solitaire
Nicolaas likes playing a solitaire game about divisors. First, he picks an integer
N (1 ≤ N ≤ 10^14). Then, in each round of the game he picks a divisor of
N, such that it neither divides nor is divisible by any previously picked number.
Given N, determine the maximum number of rounds he can play.
Problem 21.1
Divisor Solitaire – divisorsolitaire
Solutions
Divisor Solitaire  Write N = p_1^{e_1} · · · p_k^{e_k} as a product of prime powers, and let Ω(d) denote the number of prime factors of d counted with multiplicity. A reasonable guess guided by combinatorial intuition is to
pick as solution exactly those divisors d where Ω(d) = l for some fixed l. Two such
numbers cannot be divisors of each other (why?). It is also difficult to come
up with something better in the case where all e_i = 1 (i.e. N is the product
of k distinct primes). In particular, we prove that it is optimal to let
l = ⌊(e_1 + · · · + e_k)/2⌋ = ⌊Ω(N)/2⌋.
Roughly, we show that all divisors of 𝑁 can be partitioned into chains of
divisors of the form 𝑑 1, 𝑑 2, . . . , 𝑑𝑚 such that 𝑑𝑖 | 𝑑𝑖+1 and Ω(𝑑𝑖 ) = 𝑙 for some
𝑖. All the integers in a chain divide each other, so we can only pick a single
divisor from each chain, making the number of such chains an upper bound to
the answer. As each chain contains a 𝑑𝑖 with Ω(𝑑𝑖 ) = 𝑙 (there can only be one
per chain), then the number of divisors of this kind must also be an upper bound.
Since that upper bound is attainable (by picking the set of all such divisors), it is
the largest possible such set.
More specifically, every chain d_1, . . . , d_h will be such that d_{i+1} = d_i · p
for some prime p, and Ω(d_1) + Ω(d_h) = Ω(N). The first condition implies
Ω(d_{i+1}) = Ω(d_i) + 1 and the second implies Ω(d_1) ≤ l and Ω(d_h) ≥ l, so that
Ω(d_i) = l for some i.
Such a partition exists by the following construction. Assume that all
divisors of N′ = p_2^{e_2} · · · p_k^{e_k} can be partitioned into such chains. If d_1, . . . , d_h is
such a chain, we can also partition the h(e_1 + 1) integers d_i · p_1^α into chains. First,
take d_1, d_1 p_1, . . . , d_1 p_1^{e_1−1}, d_1 p_1^{e_1}, d_2 p_1^{e_1}, . . . , d_h p_1^{e_1}. This is a valid chain, since
Ω(d_1) + Ω(d_h p_1^{e_1}) = Ω(d_1) + Ω(d_h) + e_1 = Ω(N′) + e_1 = Ω(N′ p_1^{e_1}) = Ω(N).
Similarly, d_2, . . . , d_2 p_1^{e_1−1}, . . . , d_h p_1^{e_1−1} is a chain, and so on. Eventually, all
numbers will be in one such chain. If we repeat this for all chains of N′, every
divisor of N will also belong to a chain.
Constructing a partition for N′ can be done in the exact same manner. The
base case, where we partition the divisors of a single prime power p^k, is
straightforward – 1, p, . . . , p^k is exactly a chain.
Finally, computing the number of such divisors is a straightforward
exercise in brute force after factoring N to compute the exponents e_1, . . . , e_k.
The proof is due to de Bruijn et al. [9].
Part III
Advanced Topics
22 Data Structures
22.1 Self-Balancing Trees
22.2 Persistent Data Structures
22.3 Heavy-Light Decomposition
23 Combinatorics
23.1 Convolutions
Fast Fourier Transform
Number Theoretic Transform
24 Strings
24.1 Hashing
Hashing is a concept most familiar from the hash table data structure. The idea
behind the structure is to compress a set 𝑆 of elements from a large set to a
smaller set, in order to quickly determine memberships of 𝑆 by having a direct
indexing of the smaller set into an array (which has Θ(1) look-ups). In this
section, we are going to look at hashing in a different light, as a way of speeding
up comparisons of data. When comparing two pieces of data 𝑎 and 𝑏 of size
𝑛 for equality, we need to use Θ(𝑛) time in the worst case since every bit of
data must be compared. This is fine if we perform only a single comparison. If
we instead wish to compare many pieces of data, this becomes an unnecessary
bottleneck. We can use the same kind of hashing as with hash tables, by defining
a “random” function 𝐻 (𝑥) : 𝑆 → Z𝑛 such that 𝑥 ≠ 𝑦 implies 𝐻 (𝑥) ≠ 𝐻 (𝑦) with
high probability. Such a function allows us to perform comparisons in Θ(1) time
(with linear preprocessing), by reducing the comparison of arbitrary data to small
integers (we often choose 𝑛 to be on the order of 232 or 264 to get constant-time
comparisons). The trade-off lies in correctness, which is compromised in the
unfortunate event that we perform a comparison 𝐻 (𝑥) = 𝐻 (𝑦) even though
𝑥 ≠ 𝑦.
FriendBook
Swedish Olympiad in Informatics 2011, Finals
FriendBook is a web site where you can chat with your friends. For a long time,
they have used a simple “friend system” where each user has a list of which other
users are their “friends”. Recently, a somewhat controversial feature was added,
namely a list of your “enemies”. While the friend relation will always be mutual
(two users must confirm that they wish to be friends), enmity is sometimes
one-way – a person A can have an enemy B, who – by plain animosity – refuses
to accept A as an enemy.
Being a poet, you have lately been pondering the following quote.
Given a FriendBook network, you wonder to what extent this quote applies.
More specifically, for how many pairs of users is it the case that they are either
friends with identical enemy lists, or are not friends and do not have identical
enemy lists?
Input
The first line contains an integer 2 ≤ N ≤ 5000, the number of friends on
FriendBook. N lines follow, each containing N characters. The c'th character on
the r'th line, S_rc, specifies what relation person r has to person c. This character is
either
F – if r thinks of c as an enemy.
Radio Transmission
Baltic Olympiad in Informatics 2009
Given is a string 𝑆. Find the shortest string 𝐿, such that 𝑆 is a substring of the
infinite string 𝑇 = . . . 𝐿𝐿𝐿𝐿𝐿 . . . .
Input
The first and only line of the input contains the string S, with 1 ≤ |S| ≤ 10^6.
Output
Output the string 𝐿. If there are multiple strings 𝐿 of the shortest length, you
can output any of them.
Assume that L has a particular length l. Then, since T is periodic with length
l, S must be too (since it is a substring of T). Conversely, if S is periodic with
some length l, we can choose L = s_1 s_2 . . . s_l. Thus, we are actually seeking
the smallest l such that S is periodic with length l. The constraints this puts on S
are simple. We must have that
s_1 = s_{l+1} = s_{2l+1} = · · ·
s_2 = s_{l+2} = s_{2l+2} = · · ·
· · ·
s_l = s_{2l} = s_{3l} = · · ·
Using this insight as-is gives us an O(|S|²) algorithm, where we first fix l and then
verify whether those constraints hold. The idea is sound, but a bit slow. Again, the
problematic step is that we need to perform many slow, linear-time comparisons.
s_1 s_2 · · · s_{n−l} = s_{l+1} s_{l+2} · · · s_n
Radio Transmission
H Lh = 0, Rh = 0, pw = 1;
int l = 0;
// Lh: hash of the prefix S[1..i], Rh: hash of the suffix S[n-i+1..n]
for (int i = 1; i < n; ++i) {
    Lh = (Lh * p + S[i]) % M;
    Rh = (S[n - i + 1] * pw % M + Rh) % M;
    pw = pw * p % M;
    if (Lh == Rh) l = i; // S has a border of length i, i.e. period n - i
}
cout << n - l << endl;
· · · + |𝑠 𝑁 | is at most 50 000.
Output
For each query 𝐿, 𝑅, 𝑆, output a line with the answer to the query.
Let us focus on how to solve the problem where every query has the same
string S. In this case, we would first find which of the strings s_i contain S,
using polynomial hashing. To respond to a query, we could for example keep
a set of all the i where s_i was an occurrence, together with how many smaller
s_i contained the string (i.e. some kind of partial sum). This would allow us
to respond to a query where L = 1 using an upper bound in our set. Solving
queries of the form [1, R] is equivalent to general intervals however, since the
interval [L, R] is simply the interval [1, R] with the interval [1, L − 1] removed.
This procedure would take Θ(∑ |s_i|) time to find the occurrences of S, and
    patterns[sz(s)].insert(S);
}

map<H, set<pii>> hits;
trav(pat, patterns) {
    rep(i,0,N) {
        vector<H> hashes = rollHash(s[i], pat.first);
        trav(h, hashes)
            if (pat.second.count(h))
                hits[h].emplace(i, sz(hits[h]) + 1);
    }
}

trav(query, queries) {
    H h = polyHash(get<2>(query));
    cout << countInterval(R, hits[h]) - countInterval(L-1, hits[h]) << endl;
}
One might be tempted to choose M = 2^64 and use the overflow of 64-bit
integers as a cheap way of computing hashes modulo 2^64. This is a bad idea, since it
is possible to construct strings which are highly prone to collisions.
Theorem 24.2
Let τ_0 and τ̄_0 be two distinct characters, and define τ_n = τ_{n−1} || τ̄_{n−1} and τ̄_n = τ̄_{n−1} || τ_{n−1}. For a polynomial hash H with an odd p, 2^{n(n+1)/2} | H(τ_n) − H(τ̄_n).

We have

H(τ̄_n) = H(τ̄_{n−1} || τ_{n−1}) = p^{2^{n−1}} · H(τ̄_{n−1}) + H(τ_{n−1})

and

H(τ_n) = H(τ_{n−1} || τ̄_{n−1}) = p^{2^{n−1}} · H(τ_{n−1}) + H(τ̄_{n−1})

Then,

H(τ_n) − H(τ̄_n) = p^{2^{n−1}} (H(τ_{n−1}) − H(τ̄_{n−1})) + (H(τ̄_{n−1}) − H(τ_{n−1}))
                = (p^{2^{n−1}} − 1)(H(τ_{n−1}) − H(τ̄_{n−1}))
But 2^n · 2^{(n−1)n/2} = 2^{n(n+1)/2}, proving our statement.
This means that we can construct a string of length linear in the bit size of M
that causes hash collisions if we choose M as a power of 2, explaining why it is
a bad choice.
2D Polynomial Hashing
Polynomial hashing can also be applied to pattern matching in grids, by first
performing polynomial hashing on all rows of the grid (thus reducing the grid
to a sequence) and then on the columns.
Surveillance
Swedish Olympiad in Informatics 2016, IOI Qualifiers
Given a matrix of integers 𝐴 = (𝑎𝑟,𝑐 ) find all occurrences of another matrix
𝑃 = (𝑝𝑟,𝑐 ) in 𝐴 which may differ by a constant 𝐶. An occurrence (𝑖, 𝑗) means
that 𝑎𝑖+𝑟,𝑗+𝑐 = 𝑝𝑟,𝑐 + 𝐶 where 𝐶 is a constant.
𝑎 1,𝑗 − 𝑝 1,1 = 𝑐
...
𝑎 1,𝑗+𝑛−1 − 𝑝 1,𝑛 = 𝑐
Since 𝑐 is arbitrary, this means the only condition is that
A Discrete Mathematics
This appendix reviews some basic discrete mathematics. Without a good grasp
of the foundations of mathematics, algorithmic problem solving is basically
impossible. When we analyze the efficiency of algorithms, we use sums,
recurrence relations and a bit of algebra. Some basic topics, such as set theory,
are essential to even understand some of the proofs and problems in this book.
This mathematical preliminary touches lightly upon these topics and is
meant to complement a high school education in mathematics in preparation for
the remaining text. While you can probably get by with the mathematics from
this chapter, we highly recommend that you (at some point) delve deeper into
discrete mathematics.
We do assume that you are familiar with proofs by induction and by contradiction,
and with mathematics that is part of a pre-calculus course (trigonometry,
polynomials, etc.). Some more mathematically advanced parts of this book go beyond
these assumptions, but this is only the case in very few places.
A.1 Logic
In mathematics, we often deal with truths and falsehoods in the form of theorems,
proofs, counter-examples and so on. Mathematical logic is a very exact discipline,
and a precise language has been developed to help us deal with logical statements.
For example, consider the statements
The first statement uses the logical connective or. It connects two statements,
and requires only one of them to be true in order for the whole statement to be
true. Since any integer is either odd or even, the statement is true.
The second statement is not really a logical statement. While we might have
a personal conviction regarding the entertainment value of programming and
maths, it is hard to consider the statements as having a truth value.
The third statement tells us that two statements are equivalent – one is true
exactly when the other is. This is also a true statement by some simple algebraic
manipulations.
The fourth statement concerns every object of some kind. It is a false
statement, a fact that can be proved by exhibiting, e.g., a green apple.
The fifth statement is true. It asserts that something exists, a statement we
can prove by presenting an integer such as 42.
The sixth and last statement complicates matters by introducing an implication.
It is a two-part statement, which only makes a claim regarding the second
part if the first part is true. Since no prime divisible by 6 exists, it makes
no statement about the evenness of every prime. Thus, this implication is true.
To express such statements, a language has been developed where all these
logical operations such as existence, implication and so on have symbols
assigned to them. This enables us to remove the ambiguity inherent in the
English language, which is of utmost importance when dealing with the exactness
required by logic.
The disjunction (a is true or b is true) is a common logical connective. It is
given the symbol ∨, so that the above statement is written as a ∨ b. Another
common connective, the conjunction (a is true and b is true), is assigned the
symbol ∧. For example, we write a ∧ b for the statement that both a and b
are true.
The third statement introduced the equivalence, a statement of the form “𝑎
is true if, and only if, 𝑏 is true”. This is the same as 𝑎 → 𝑏 (the only if part)
and 𝑏 → 𝑎 (the if part). We use the symbol ↔, which follows naturally for this
reason. The statement would then be written as
x < 0 ↔ x³ < 0
Logic also contains quantifiers. The fourth statement, that every apple is
blue, actually makes a large number of statements – one for each apple. This
concept is captured using the universal quantifier ∀, read as “for every”. For
example, we could write the statement as
∀ apple 𝑎 : 𝑎 is blue
In the fifth statement, another quantifier was used, which speaks of the
existence of something; the existential quantifier ∃, which we read as “there
exists”. We would write the statement as

∃x : x is an integer
An implication is a statement of the form “if 𝑎 is true, then 𝑏 must also be
true”. This is a statement on its own, which is true whenever 𝑎 is false (meaning
it does not say anything of 𝑏), or when 𝑎 is true and 𝑏 is true. We use the symbol
→ for this, writing the statement as 𝑎 → 𝑏. The sixth statement would hence be
written as
(∃𝑝 : 𝑝 is prime ∧ 𝑝 is divisible by 6) → ∀ prime 𝑝 : 𝑝 is even
The negation operator ¬ inverts a statement. The statement “no penguin
can fly” would thus be written as
¬(∃ penguin 𝑝 : 𝑝 can fly)
or, equivalently
∀ penguin 𝑝 : ¬𝑝 can fly
Exercise A.1. Write the following statements using the logical symbols, and
determine whether they are true or false:
1) If 𝑎 and 𝑏 are odd integers, 𝑎 + 𝑏 is an even integer,
2) 𝑎 and 𝑏 are odd integers if and only if 𝑎 + 𝑏 is an even integer,
3) Whenever it rains, the sun does not shine,
4) 𝑎𝑏 is 0 if and only if 𝑎 or 𝑏 is 0
Our treatment of logic ends here. Note that much is left unsaid – it is a
most rudimentary walk-through. This section is mainly meant to give you some
familiarity with the basic symbols used in logic, since they will appear later. If
you wish to gain a better understanding of logic, you can follow the references
in the chapter notes.
To construct the set of all even integers, we would use the syntax

{2i | i is an integer}

which is read as “the set containing all numbers of the form 2i, where i is an
integer”. To construct the set of all primes, we would write

{p | p is prime}
A.2. SETS AND SEQUENCES
Certain sets are used often enough to be assigned their own symbols:
• Z – the set of integers {. . . , −2, −1, 0, 1, 2, . . . },
{2, 3} ⊆ {2, 3, 5, 7}
and

{2/4, 2, −1/7} ⊆ Q
For any set S, we have that ∅ ⊆ S and S ⊆ S. Whenever a set A is not a subset of
another set B, we write A ⊈ B. For example,

{2, π} ⊈ Q
• ∅
• Z
• Z+
• {2𝑘 | 𝑘 ∈ Z}
Sets also have many useful operations defined on them. The intersection
𝐴 ∩ 𝐵 of two sets 𝐴 and 𝐵 is the set containing all the elements which are
members of both sets, i.e.,
𝑥 ∈ 𝐴∩𝐵 ⇔𝑥 ∈ 𝐴∧𝑥 ∈ 𝐵
If the intersection of two sets is the empty set, we call the sets disjoint. A
similar concept is the union 𝐴 ∪ 𝐵 of 𝐴 and 𝐵, defined as the set containing
those elements which are members of either set.
For example, if
Then,
𝑋 ∩ 𝑌 = {4}
𝑋 ∩𝑌 ∩𝑍 = ∅
𝑋 ∪ 𝑌 = {1, 2, 3, 4, 5, 6, 7}
𝑋 ∪ 𝑍 = {1, 2, 3, 4, 6, 7}
A.3. SUMS AND PRODUCTS
Many useful sums have closed forms – expressions in which we do not need
sums of a variable number of terms.
Exercise A.6. Prove the following identities:

∑_{i=1}^{n} c = cn

∑_{i=1}^{n} i = n(n + 1)/2

∑_{i=1}^{n} i² = n(n + 1/2)(n + 1)/3

∑_{i=0}^{n} 2^i = 2^{n+1} − 1
The sum of the inverses of the first n natural numbers happens to have a very
neat approximation, which we will occasionally make use of later on:

∑_{i=1}^{n} 1/i ≈ ln n
Products are written analogously, using the product operator:

∏_{i=j}^{k} a_i
Chapter Notes
If you need a refresher on some more basic mathematics, such as single-variable
calculus, Calculus [26] by Michael Spivak is a solid textbook. It is not the
easiest book, but one of the best undergraduate texts on single-variable calculus if
you take the time to work through it.
For a gentle introduction to discrete mathematics, Discrete and Combinato-
rial Mathematics: An Applied Introduction [12] by Ralph Grimaldi is a nice
book with a lot of breadth.
Logic in Computer Science [14] is an introduction to formal logic, with many
interesting computational applications. The first chapter, on propositional logic,
is sufficient for most algorithmic problem solving, but the remaining chapters
show many non-obvious applications that make logic relevant to computer
science.
One of the best works on discrete mathematics ever produced for the aspiring
algorithmic problem solver is Concrete Mathematics [15], co-authored by famous
computer scientist Donald Knuth. It is rather heavy-weight, and probably serves
better as a more in-depth study of the foundations of discrete mathematics rather
than an introductory text.
Graph Theory [10] by Reinhard Diestel is widely acknowledged as the go-to
book on more advanced graph theory concepts. The book is freely available
for viewing at its home page1.
1 http://diestel-graph-theory.com/
Hints
1.1 Try dividing cards into smaller piles that can be sorted separately.
1.6 The optimal number of questions is 6.
2.12 Try solving it for the special case 𝑦 = 2 first.
5.1 In the best case, line 4 of the insertion sort pseudo code never executes.
5.3 When is log2 𝑛 < 𝑛?
5.4 𝑐 = 2 for the upper bound.
5.5
1) Yes.
2) No.
5.6 Binomial expansion.
5.7
6.10
7.4 Use that 1.61 + 1 > 1.61² and 1.62 + 1 < 1.62².
7.5 The positive root of the equation x³ = x² + x + 1 lies between 1.83 and
1.84.
7.6
3) The 𝑛 choices are which of the two letters to put on each position in the
string.
4) The 𝑛 choices are whether to include each element or not.
7.7 Since the three recursions are structurally identical, they will have the same
time complexity 𝑇 (𝑛).
Solutions
1.1 One possible solution is to first divide the cards into separate piles by values
1 − 100 000, 100 001 − 200 000, and so on. If we sort each such pile, the entire stack
of cards is sorted by putting the piles together. Each such pile can be sorted the
same way, by instead dividing the cards up based on smaller ranges of values, and so
on.
1.2
5) The input consists of two integers 𝑎 and 𝑏, not both 0. The output should
be the greatest common divisor of 𝑎 and 𝑏.
6) The input consists of a sequence of real numbers, the coefficients 𝑥𝑖 of a
polynomial. The output should be a real number that is a root of the polynomial.
7) The input consists of two integers 𝑎 and 𝑏. The output should be the
product 𝑎𝑏.
1.6 One can achieve 6 questions by always asking about the midpoint of the
range of possible numbers. For example, by asking about the number 50, one
learns whether the correct number lies in 1 − 49 or in 51 − 100.
1.8 Given an algorithm that is correct with a probability 0.5 + α for some α > 0,
we can find the correct answer by running it many times and choosing the answer
that was most common.
1.12
Palindrome If we let the input 𝑛-letter word 𝑆 have the letters 𝑠 0, 𝑠 1, . . . , 𝑠𝑛−1 ,
it reads the same backwards and forwards if 𝑠 0𝑠 1 . . . 𝑠𝑛−1 = 𝑠𝑛−1𝑠𝑛−2 . . . 𝑠 0 . We
thus need to check all the letter pairs (𝑠 0 , 𝑠𝑛−1 ), (𝑠 1 , 𝑠𝑛−2 ) and so forth for equality.
1: procedure Palindrome(string 𝑆)
2: for 𝑖 from 0 to 𝑛 − 1 do
3: if 𝑆𝑖 ≠ 𝑆𝑛−1−𝑖 then
4: return false
5: return true
Primality The problem can be solved by checking all the numbers between 2
and 𝑛 − 1 to see if any of them are divisors of 𝑛. If not, then it is prime. This
follows from the definition, and the fact that a (positive) divisor of a positive
integer can not be greater than the integer itself.
1: procedure Primality(integer 𝑛)
2: for 𝑖 from 2 to 𝑛 − 1 do
3: if 𝑖 divides 𝑛 then
4: return false
5: return true
2.12 We only analyze the case where x and y are positive. Assume that
0 ≤ a < x/y ≤ a + 1, so that the result when rounded to an integer away from zero
is a + 1. Multiplying by y gives us ay < x ≤ ay + y, so that ay ≤ x − 1 < ay + y
(since all values are now integers). Finally, adding y to both inequalities gives us
ay + y ≤ x − 1 + y < ay + 2y. After dividing by y, we get a + 1 ≤ (x − 1 + y)/y < a + 2.
This means that the result of (x − 1 + y)/y rounded towards zero is a + 1, which is
what we wanted.
The analysis for negative x and y is similar.
5.1 Consider the case when the array is already sorted. In this case, the inner
loop on line 4 never executes, since 𝐴[ 𝑗] ≥ 𝐴[ 𝑗 − 1] for all 𝑗. Thus, only the
lines that take linear time in total are executed, making O(n) an upper bound on
the best case. On the other hand, the loop on line 2 always executes a linear
number of times no matter the case, so Ω(𝑛) is also a lower bound. Thus, the
algorithm has a Θ(𝑛) best-case running time.
5.2 To compute the sum in Θ(𝑛) time, we can add all the variables to a counter
using a for loop, one at a time.
To solve the problem in constant time, the formula 1 + 2 + · · · + n = n(n + 1)/2
can be used.
5.3 Let n₀ = 7. For any n ≥ 1, we have log₂ n < n as n < 2^n (which can be
proved using either induction or simple calculus). In this case, 10n² + 7n −
5 + log₂ n ≤ 10n² + n² + n² = 12n². Thus, with c = 12 we get the required
statement.
5.4 Clearly max{𝑓 (𝑛), 𝑔(𝑛)} ≤ 𝑓 (𝑛) + 𝑔(𝑛) since the maximum of the two
functions is always equal to one of the functions. This means that 𝑓 (𝑛) + 𝑔(𝑛) =
Ω(max{𝑓 (𝑛), 𝑔(𝑛)}) with 𝑐 = 1. Similarly, 𝑓 (𝑛) + 𝑔(𝑛) ≤ 2 max{𝑓 (𝑛), 𝑔(𝑛)}
by the fact that each function individually is smaller than their maximum.
Thus 𝑓 (𝑛) + 𝑔(𝑛) = 𝑂 (max{𝑓 (𝑛), 𝑔(𝑛)}) with 𝑐 = 2. Together this proves the
statement.
5.5
8) This is clear with 𝑐 = 2.
9) For any c, picking n such that 2^n > c gives us 2^{2n} = 2^n · 2^n > c · 2^n, so no
c can satisfy the definition.
5.6 First, note that polynomials of higher degree are eventually always greater than
polynomials of lower degree:
7.1 They are 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377.
7.4 Assume that T(n) ≥ 1.61^n for all n < n′, and for n = 1. Then
T(n′) ≥ 1.61^{n′−1} + 1.61^{n′−2} = 1.61^{n′−2}(1.61 + 1) ≥ 1.61^{n′−2} · 1.61² = 1.61^{n′},
so that T(n) ≥ 1.61^n also for n = n′. By the principle of induction, T(n) ≥ 1.61^n
holds for all n ≥ 0.
The proof is similar for the upper bound.
7.5 We have that the time function T(n) ≥ T(n − 1) + T(n − 2) + T(n − 3). If,
by induction, T(k) > 1.83^k for all k < n, we get
T(n) > 1.83^{n−1} + 1.83^{n−2} + 1.83^{n−3} = 1.83^{n−3}(1.83² + 1.83 + 1) > 1.83^{n−3} · 1.83³ = 1.83^n,
so the claim holds for T(n) too. Proving the upper bound is similar.
7.6
10) Let A(n) be the number of such strings. If the last character of the string
is a B, the remaining string can be formed in A(n − 1) ways. If the last character
of the string is an A, the second to last character must be a B (to avoid two
consecutive A's). There are A(n − 2) ways in which the remaining string can be
formed after fixing these two letters, so that A(n) = A(n − 1) + A(n − 2). The
base cases are A(0) = 1 and A(1) = 2.
11) Let B(n) be the number of such subsets. If the element n is to be
included in the subset, we can choose the remaining n − 1 elements in B(n − 1)
ways. If the element n is to be excluded from the subset, the element n − 1 must
be included according to the problem. The remaining n − 2 elements can then be
chosen in B(n − 2) ways, for the recursion B(n) = B(n − 1) + B(n − 2). The base
cases are B(0) = 1 and B(1) = 2.
7.7 The time complexity fulfills T(n) = 2T(n − 1) + O(1). By induction, we
get T(n) = Θ(2^n).
Bibliography
[1] Fedor V. Fomin and Dieter Kratsch. Exact Exponential Algorithms.
Springer, 2010.
[2] Noga Alon, Raphy Yuster, and Uri Zwick. Color-coding: A new method
for finding simple paths, cycles and other small subgraphs within large
graphs. In Proceedings of the Twenty-Sixth Annual ACM Symposium on
Theory of Computing, STOC ’94, page 326–335, New York, NY, USA,
1994. Association for Computing Machinery.
[4] David Beazley and Brian K. Jones. Python Cookbook. O’Reilly, 2013.
[6] Xuan Cai. Canonical coin systems for change-making problems. In 2009
Ninth International Conference on Hybrid Intelligent Systems, volume 1,
pages 499–504, Aug 2009.
[15] Ronald L. Graham, Donald E. Knuth, and Oren Patashnik. Concrete Mathe-
matics: A Foundation for Computer Science. Addison-Wesley, 1994.
[29] Mark A. Weiss. Data Structures and Algorithm Analysis in C++. Pearson,
2013.
Index
𝐾𝑛 , 132 composite number, 310
computational problem, 3
addition principle, 267 conjunction, 373
adjacency lists, 137 connected, 143
adjacency matrix, 136 connected component, 143
algorithm, 5 continue statement, 32
amortized complexity, 90 correctness, 7
and, 373 cycle, 143
and operator, 29 cycle decomposition, 273
array, 40
assignment operator, 21 data structure, 97
auto, 24 degree, of vertex, 133
Dijkstra’s Algorithm, 239
BFS, 139, 232 directed graph, 135
bijection, 271 disjoint sets, 376
binary search, 210 disjunction, 373
binary tree, 104 divide and conquer, 201
binomial coefficient, 278 divides exactly, 316
bipartite matching, 253 divisibility, 303
boolean, 24 divisor, 303
breadth-first search, 139, 232 double, 23
break statement, 32 Dyck path, 282
char, 22 dynamic array, 98
closed trail, 143 edge, 131
closed walk, 143 element, 374
combinatorics, 267 equivalence, 373
comment, 19 existential quantifier, 373
comparison operators, 28
compiler, 16 factorial, 270
complete graph, 132 fixed-size array, 97
component, 143 float, 23
quotient, 329
Rabin-Karp, 364
recursion, 117
recursive definition, 117
remainder, 329
sequence, 376
set, 374
simple graph, 131
stable sort, 58
stack, 52, 101
string, 22
structure, 37
subset, 375
sum operator, 377
test data, 10
time complexity, 83
total correctness, 7
travelling salesman problem, 149
tree, 146
trial division, 315
typedef, 24
union, 376
universal quantifier, 373
variable declaration, 21
vertex, 131
Visual Studio Code, 16