
Data Structures and

Algorithms Using Python


Lecture 2: Complexity, search, sorting

Heikki Peura
[email protected]
Last time

Algorithms and functions
- Recipes
- Abstraction

Plan for today:
- Search algorithms
- How to analyse algorithm complexity
- "How does my code slow down as my data grows?"
- Sorting algorithms

2 / 27
Announcements

I Check that you see the DSAP module on https://okpy.org


I No office hours this week — send me any questions by email
I Homework 1 due next Monday
I Homework 2 posted this Friday, due 30 September
I Tutorial 3 deadline extended from 16 September to 23
September
I Quiz on Friday

3 / 27
The quiz

Friday's tutorial: around 7 multiple-choice questions, 25 minutes at the beginning of the session
- Python
- Complexity analysis
- Algorithms: high-level ideas
- On your laptop, so you can use Spyder to try things
- This may mean some questions will be difficult to solve in Spyder...
- No communication allowed
- Sample questions available on the Hub

4 / 27
Who has studied search algorithms before?

- How many guesses will we need?
- Go to www.menti.com

5 / 27
Search algorithms

Search for the word “swagger”?

6 / 27
What is the worst we could do?

If the answers were "yes" and "no"?
- Linear search - worst case: go through everything

If the answers are "too low" and "too high"?
- Each time, discard half of the remaining numbers
- Binary search - worst case?

7 / 27
Logarithms

Exponentials: 2^4 = 2 × 2 × 2 × 2 = 16

Logarithm flips the exponential:
- log2 16 = 4
- "How many 2s do we multiply to get 16?"
- "How many times do we divide 16 by 2 to get to 1?"

For numbers up to 128, we need to make at most log2 128 = 7 guesses
- What about for numbers up to 1024? 2048? 1 000 000?

Guess the number from 1 to n:
- Linear search: at most n guesses
- Binary search: at most log2(n) guesses

8 / 27
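The guess counts above can be checked directly in Python; a small sketch using the standard library (`max_guesses` is a hypothetical helper name, not from the slides):

```python
import math

def max_guesses(n):
    # Binary search halves the range each time, so at most
    # ceil(log2(n)) guesses are needed for numbers 1..n.
    return math.ceil(math.log2(n))

print(max_guesses(128))        # 7
print(max_guesses(1024))       # 10
print(max_guesses(2048))       # 11
print(max_guesses(1_000_000))  # 20
```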
Algorithm design: searching a list

Suppose we have a list L. We want to check whether it contains the number 13.

9 / 27
What are computers good at?

1. Performing simple calculations
   - Arithmetic operations
   - Comparisons
   - Assignments
   - Accessing memory
2. Remembering the results

10 / 27
Goals in designing algorithms

1. Correctness — finds the correct answer for any input
2. Efficiency — finds the answer quickly
   - It is important to understand both: think of airplane software, Uber or algorithmic trading...

Efficiency:
- How much time will our computation take?
- How much memory will it need?

11 / 27
Example: linear search

Is x in list A?

def linear_search(A, x):
    for elem in A:
        if elem == x:
            return True
    return False

Efficiency:
- How much time will our computation take?
- How much memory will it need?

12 / 27
How much time will it take?

Simple: run and time it? But time depends on
1. Speed of computer
2. Specifics of implementation
3. Value of input

We can avoid 1 and 2 by measuring time in the number of basic steps executed
- Step: constant-time computer operation
  - Arithmetic operations
  - Comparisons
  - Assignments
  - Accessing memory

For 3, measure number of steps depending on the size of input

13 / 27
Complexity and input

Searching for an item in a list?

def linear_search(A, x):
    # A is a list of length n
    for elem in A:
        if elem == x:
            return True
    return False

- x could be the first element of A
- x might not be in A at all
- How to give a general complexity measure?

14 / 27
Complexity cases

Cases for given input size (length of A):
- Best case — minimum time
- Worst case — maximum time
- Average case — average or expected time over all possible inputs

Principle: focus on worst-case analysis
- Upper bound on running time
- Bonus: usually easier to analyze

15 / 27
Example

def sum_up_to(n):
    result = 0               # 1 step
    while n > 0:             # 1 step, n times
        result = result + n  # 2 steps, n times
        n = n - 1            # 2 steps, n times
    return result            # 1 step

Total: 5n + 2 steps
- As n gets large, 2 is irrelevant
- Arguably, so is 5
- It's the size of the problem that matters

Principle: ignore constant factors and lower-order terms
- These depend on computer and program implementation
- They do not matter for large inputs
- Simplifies comparisons

16 / 27
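As an aside (not on the slides), the loop can be compared with a constant-time alternative, the closed-form sum n(n+1)/2, which is one concrete illustration of why order of growth matters:

```python
def sum_up_to(n):
    # Loop version from the slide: roughly 5n + 2 basic steps.
    result = 0
    while n > 0:
        result = result + n
        n = n - 1
    return result

def sum_up_to_fast(n):
    # Gauss's closed form: the same answer in a constant number of steps.
    return n * (n + 1) // 2

print(sum_up_to(100), sum_up_to_fast(100))  # 5050 5050
```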
Example

def f(x):  # x integer
    ans = 0                 # 1 step
    for i in range(100):
        ans += 1            # 200 steps
    for i in range(x):
        ans += 1            # 2x steps
    for i in range(x):
        for j in range(x):
            ans -= 1        # 2x^2 steps
    return ans              # 1 step

Steps: 202 + 2x + 2x^2
- x small -> first loop dominates (x = 3)
- x large -> last loop dominates (x = 10^6)
- Only need to consider last (nested) loop for large x
- Does the 2 in 2x^2 matter? For large x, order of growth much more important

17 / 27
Asymptotic analysis

Principle 0: measure number of basic operations as function of input size

Principle 1: focus on worst-case analysis

Principle 2: ignore constant factors and lower-order terms

Principle 3: only care about large inputs
- Only large problems are interesting
- What happens when size gets very large?

Formal way to describe this approach:
- Big-O notation: upper bound on worst-case running time

18 / 27
Big-O: bound on runtime growth rate

Let T(n) be the number of steps taken for input size n:
- Example: T(n) = 202 + 2n + 2n^2

Definition: T(n) is O(f(n)) if for all sufficiently large n, T(n) is bounded above by a constant multiple of f(n)

Example: T(n) is O(n^2) if for all large enough n, T(n) is bounded above by a constant multiple of f(n) = n^2
- Gist: for high values of n, does f(n) "grow at least as quickly"?
- cf(n) = cn^2? (for any constant c)
- T(n) = 202 + 2n + 2n^2?
- What if f(n) = n, that is O(n)?

19 / 27
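The definition can be sanity-checked numerically for this T(n): the constant multiple c = 3 works for all n ≥ 16 (a sketch; this particular c and cutoff are just one valid choice, not the only one):

```python
def T(n):
    # The step count from the earlier example.
    return 202 + 2 * n + 2 * n ** 2

# T(n) is O(n^2): with c = 3, T(n) <= c * n^2 for every n >= 16.
assert all(T(n) <= 3 * n ** 2 for n in range(16, 10_000))

# T(n) is NOT O(n): even a large constant multiple of n falls behind.
c = 1000
assert T(10 ** 6) > c * 10 ** 6
```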
Big O tells us how fast the algorithm is

Fast algorithm: worst-case running time grows slowly with input size
- O(1): constant running time — primitive operations
- O(log n): logarithmic running time
- O(n): linear running time — linear search
- O(n log n): log-linear time
- O(n^c): polynomial running time
- O(c^n): exponential running time
- O(n!): factorial running time

20 / 27
Binary search on sorted list

Algorithm for finding x in sorted list L:
- Pick an index i roughly dividing L in half
- If L[i] == x, return True (if nothing left to search, return False)
- If not:
  - If L[i] > x, repeat search on left half of L
  - Otherwise repeat search on right half

Find number 24 in the list L = [9, 24, 32, 56, 57, 59, 61, 99]

First iteration: i = 3, L[i] = 56 > 24 → discard the right half and search the left half [9, 24, 32]

Second iteration: i = 1, L[i] = 24 → return True

21 / 27
Binary search complexity

Algorithm for finding x in list L:
- Pick an index i roughly dividing L in half
- If L[i] == x, return True (if nothing left to search, return False)
- If not:
  - If L[i] > x, repeat search in left half of L
  - Otherwise repeat search in right half

Complexity = # of iterations × constant time per iteration

But how many iterations?
- How many times can you split n items in half?
- log2(n) (but base of logarithm does not matter for big-O)
- Complexity O(log n)!

22 / 27
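The algorithm above can be written as a short loop; a sketch (iterative with two indices rather than the recursive description on the slide, but equivalent):

```python
def binary_search(L, x):
    # L must be sorted. Keep a window of candidate indices [lo, hi]
    # and halve it each iteration, as the slide describes.
    lo, hi = 0, len(L) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if L[mid] == x:
            return True
        elif L[mid] > x:
            hi = mid - 1   # continue in the left half
        else:
            lo = mid + 1   # continue in the right half
    return False           # nothing left to search

print(binary_search([9, 24, 32, 56, 57, 59, 61, 99], 24))  # True
```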
Sorting algorithms

So if we have an unsorted list, should we sort it first?
- Suppose sorting has complexity O(sort(n))
- Is it less work to sort and do a binary search than to do a linear search?
- In other words: is sort(n) + log(n) < n?
- No...

In practice: what if we need to search repeatedly, say k times?
- Is sort(n) + k log(n) < kn?
- Depends on k...

23 / 27
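The break-even question can be explored with a stylised cost model; a sketch assuming an O(n log n) sort and taking the step counts literally (the constants are ignored, as elsewhere in the lecture):

```python
import math

def linear_cost(n, k):
    # k independent linear searches, each up to n steps.
    return k * n

def sort_then_binary_cost(n, k):
    # Stylised counts: n*log2(n) steps to sort once,
    # then log2(n) steps per binary-search lookup.
    return n * math.log2(n) + k * math.log2(n)

n = 1_000_000
# A single search: sorting first is far more work.
assert sort_then_binary_cost(n, 1) > linear_cost(n, 1)
# Many repeated searches: sorting first pays off.
assert sort_then_binary_cost(n, 100) < linear_cost(n, 100)
```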
How would you sort a list?

56 24 99 32  9 61 57 79
 9 24 99 32 56 61 57 79
 9 24 99 32 56 61 57 79
 9 24 32 99 56 61 57 79
 9 24 32 56 99 61 57 79
 9 24 32 56 57 61 99 79
 9 24 32 56 57 61 99 79
 9 24 32 56 57 61 79 99

In words: Find the smallest item and move it to the front (swap with the first unsorted item). Repeat with the remaining unsorted items.

24 / 27
Selection sort algorithm

Selection sort list L of length n:
- Repeat n times:
  - Find smallest unsorted element
  - Swap its position with the first unsorted element

Python:

def selection_sort(L):
    M = L[:]  # make a copy of the list to preserve the original
    n = len(M)
    for index in range(n):
        min_index = find_min_index(M, index)  # index of smallest unsorted element
        M[index], M[min_index] = M[min_index], M[index]  # swap positions
    return M

25 / 27
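The helper find_min_index is called but not shown on the slide; a minimal version consistent with that call might look like this (the helper's body is an assumption, only its name comes from the slide):

```python
def find_min_index(M, start):
    # Index of the smallest element in M[start:], found by a linear scan.
    min_index = start
    for i in range(start + 1, len(M)):
        if M[i] < M[min_index]:
            min_index = i
    return min_index

def selection_sort(L):
    M = L[:]  # copy so the original list is preserved
    for index in range(len(M)):
        min_index = find_min_index(M, index)
        M[index], M[min_index] = M[min_index], M[index]  # swap
    return M

print(selection_sort([56, 24, 99, 32, 9, 61, 57, 79]))
# [9, 24, 32, 56, 57, 61, 79, 99]
```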
Selection sort complexity

Correctness (for those into math): can be proved by induction

Complexity:
- O(n) passes of main loop
- Each pass: search for the smallest element in O(n)
- Total O(n^2)

Can we do better?
- Yes! Merge sort is O(n log n)
- But you can't do any better than that...

26 / 27
Complexity matters

You're planning a trip around the world visiting 10 cities. What's the cheapest route?

Check all alternative routes?
- There are 10 × 9 × 8 × · · · × 2 × 1 = 3 628 800 possible routes
- Factorial complexity O(n!)
- Travelling salesperson problem

27 / 27
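The factorial growth can be checked with the standard library, and it shows why brute force stops being an option almost immediately:

```python
import math

# 10 cities: the route count from the slide.
assert math.factorial(10) == 3_628_800

# Doubling the trip to 20 cities makes brute force hopeless.
print(math.factorial(20))  # 2432902008176640000
```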
Review

Measuring algorithm time complexity:
- Number of basic steps taken
- Worst-case analysis
- Focus on large inputs

Searching and sorting are canonical algorithm problems

Workshop after the break
- Big O practice
- Search algorithms + optional exercises on sorting

28 / 27
Workshop

Workshop zip file on the Hub
- HTML instructions
- At some point, you'll need the .py file with skeleton code (open in Spyder)
Appendix: Merge sort

These extra slides are for the merge sort algorithm in the optional exercises
How to merge two sorted lists into one?

Loop through both lists simultaneously, copy the smaller item to a new list z
- Compare items at indices i1 = i2 = 0, and update the indices with every copy operation

x = 24 32 56
y = 19 57 61

step  i1  i2  z
1     0   1   19
2     1   1   19 24
3     2   1   19 24 32
4     3   1   19 24 32 56
5     3   2   19 24 32 56 57
6     3   3   19 24 32 56 57 61

What is the complexity of this operation?
- Two lists of lengths n1 and n2: O(n1 + n2) copy operations (need to copy each item)
- No more comparisons than copy operations
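The walkthrough above translates to a short function; a sketch (the function name merge and the early-exit handling of leftovers are assumptions, not from the slides):

```python
def merge(x, y):
    # Merge two sorted lists into one sorted list z: compare the
    # items at indices i1 and i2, copy the smaller, advance that index.
    z = []
    i1 = i2 = 0
    while i1 < len(x) and i2 < len(y):
        if x[i1] <= y[i2]:
            z.append(x[i1])
            i1 += 1
        else:
            z.append(y[i2])
            i2 += 1
    # One list is exhausted; copy whatever remains of the other.
    z.extend(x[i1:])
    z.extend(y[i2:])
    return z

print(merge([24, 32, 56], [19, 57, 61]))  # [19, 24, 32, 56, 57, 61]
```

Each item is copied exactly once, which is where the O(n1 + n2) bound comes from.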
Sidebar: recursion

The factorial of n is the product of integers 1, ..., n.
- As a function: fact(n) = n × (n − 1) × (n − 2) × · · · × 2 × 1
- By convention, fact(0) = 1

def fact(n):
    result = 1  # start from 1, not 0, or the product is always 0
    for i in range(1, n+1):
        result = result * i
    return result
print(fact(4))

But we can also write the factorial as follows:

fact(n) = 1, for n = 0
fact(n) = n × fact(n − 1), for n > 0

28 / 27
Sidebar: recursion

We can also write the factorial as follows:

fact(n) = 1, for n = 0
fact(n) = n × fact(n − 1), for n > 0

Factorial can be expressed as a smaller version of itself:

def fact_rec(n):
    if n == 0:
        return 1
    else:
        return n * fact_rec(n-1)
print(fact_rec(4))

This is called recursion
- Function calls itself
- Can make some problems easier to define -> merge sort!

28 / 27
Merge sort idea

Divide and conquer:
- Identify smallest possible "base case" subproblems that are easy to solve
- Divide large problem and solve smaller subproblems
- Find a way to combine subproblem solutions to solve larger problems

Merge sort:
- Base case: if list length n < 2, the list is sorted
- Divide: if list length n ≥ 2, split into two lists and merge sort each
- Combine (merge) the results of the two smaller merge sorts
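The three steps can be sketched in Python; a minimal self-contained version for the optional exercises (not the official solution, and the merge helper here is condensed):

```python
def merge(x, y):
    # Combine step: merge two sorted lists into one sorted list.
    z, i1, i2 = [], 0, 0
    while i1 < len(x) and i2 < len(y):
        if x[i1] <= y[i2]:
            z.append(x[i1]); i1 += 1
        else:
            z.append(y[i2]); i2 += 1
    return z + x[i1:] + y[i2:]  # append whatever remains

def merge_sort(L):
    if len(L) < 2:       # base case: short lists are already sorted
        return L[:]
    mid = len(L) // 2    # divide: split in half and sort each part
    left = merge_sort(L[:mid])
    right = merge_sort(L[mid:])
    return merge(left, right)  # combine the sorted halves

print(merge_sort([56, 24, 99, 32, 9, 61, 57, 79]))
# [9, 24, 32, 56, 57, 61, 79, 99]
```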
Merge sort

Dividing
[56 24 99 32 9 61 57 79]
[56 24 99 32] [9 61 57 79]
[56 24] [99 32] [9 61] [57 79]
[56] [24] [99] [32] [9] [61] [57] [79]

Merging
[24 56] [32 99] [9 61] [57 79]
[24 32 56 99] [9 57 61 79]
[9 24 32 56 57 61 79 99]
Merge sort complexity

What is the complexity of merge? Two lists of lengths n1, n2:
- O(n1 + n2) copy operations (need to copy each item)
- No more comparisons than copy operations
- If original list length is n, total O(n) work for each round of merging

Merge sort complexity = merging × number of divisions
- Number of division levels O(log n) (like binary search)
- Log-linear: O(n log n)
- Big improvement over selection sort!
- Does need some more space due to copying lists
Complexity classes

Fast algorithm: worst-case running time grows slowly with input size
- O(1): constant running time — primitive operations
- O(log n): logarithmic running time — binary search
- O(n): linear running time — linear search
- O(n log n): log-linear time — merge sort
- O(n^c): polynomial running time — selection sort
- O(c^n): exponential running time — ??
Sorting is a canonical algorithms problem

Many algorithms exist: bubble sort, insertion sort, quick sort, radix sort, heap sort, ...
- Useful for developing algorithmic thinking – eg randomized algorithms

Theoretical bound for worst-case performance of comparison sorts is O(n log n) – we can't do better than merge sort

But other algorithms are better on average
- Python uses timsort (created by Tim Peters in 2002, out of frustration with existing algorithms)
- Exploits the fact that lists tend to be partly sorted already