Notes 2 Handout

Review of Basic Data Structures

CompSci 161 — Spring 2021 — © M. B. Dillencourt — University of California, Irvine


Outline of these notes

- Review of basic data structures
- Searching in a sorted array / binary search: the algorithm, analysis,
  proof of optimality
- Sorting, part 1: insertion sort, quicksort, mergesort


Basic Data Structures

Prerequisite material. (Review [GT Chapters 2–4, 6] as necessary.)

- Arrays, dynamic arrays
- Linked lists
- Stacks, queues
- Dictionaries, hash tables
- Binary trees


Arrays, Dynamic arrays, Linked lists


- Arrays:
  - Numbered collection of cells or entries
  - Numbering usually starts at 0
  - Fixed number of entries
  - Each cell has an index that uniquely identifies it
  - Accessing or modifying the contents of a cell given its index: O(1) time
  - Inserting or deleting an item in the middle of an array is slow
- Dynamic arrays:
  - Similar to arrays, but the size can be increased or decreased
  - ArrayList in Java, list in Python
- Linked lists:
  - Collection of nodes that form a linear ordering
  - The list has a first node and a last node
  - Each node has a next node and a previous node (possibly null)
  - Inserting or deleting an item in the middle of a linked list is fast
  - Accessing a cell given its index (i.e., finding the kth item in the
    list) is slow

Stacks and Queues

- Stacks:
  - Container of objects that are inserted and removed according to the
    Last-In First-Out (LIFO) principle: only the most recently inserted
    object can be removed.
  - Insert and remove are usually called push and pop.
- Queues (often called FIFO queues):
  - Container of objects that are inserted and removed according to the
    First-In First-Out (FIFO) principle: only the element that has been in
    the queue the longest can be removed.
  - Insert and remove are usually called enqueue and dequeue.
  - Elements are inserted at the rear of the queue and removed from the
    front.


Dictionaries/Maps

- A Dictionary (or Map) stores <key, value> pairs, which are often referred
  to as items.
- There can be at most one item with a given key.
- Examples:
  1. <Student ID, Student data>
  2. <Object ID, Object data>


Hashing
An efficient method for implementing a dictionary. Uses:

- A hash table: an array of size N.
- A hash function, which maps any key from the set of possible keys to an
  integer in the range [0, N − 1].
- A collision strategy, which determines what to do when two keys are
  mapped to the same table location by the hash function. Commonly used
  collision strategies are:
  - Chaining
  - Open addressing: linear probing, quadratic probing, double hashing
  - Cuckoo hashing

Hashing is fast:
- O(1) expected time for access and insertion.
- Cuckoo hashing improves the access time to O(1) worst-case time;
  insertion time remains O(1) expected time.

Disadvantages on next slide.
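
As a concrete illustration, here is a minimal sketch of a chained hash
table in Python. The class and method names are my own (not from [GT]),
and Python's built-in hash plays the role of the hash function:

    class ChainedHashTable:
        """A minimal dictionary implemented by hashing with chaining."""

        def __init__(self, N=101):
            self.N = N
            self.table = [[] for _ in range(N)]   # hash table: N buckets

        def _bucket(self, key):
            # hash function: maps any key to a bucket index in [0, N-1]
            return self.table[hash(key) % self.N]

        def insert(self, key, value):
            bucket = self._bucket(key)
            for i, (k, _) in enumerate(bucket):
                if k == key:                      # at most one item per key
                    bucket[i] = (key, value)
                    return
            bucket.append((key, value))           # collision strategy: chain

        def find(self, key):
            for k, v in self._bucket(key):
                if k == key:
                    return v
            return None

Access and insertion scan only one bucket, which is why the expected time
is O(1) when the load factor is kept small.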

Hashing: Disadvantages

- Access time (except for cuckoo hashing) and insertion time (for all
  strategies) is expected time, not worst-case time, so there is no
  absolute guarantee on performance.
- Performance depends on the load factor α = n/N, where n is the number of
  items stored and N is the table size. As α gets larger, performance
  deteriorates.
- Hashing can tell us whether a given key is in the dictionary. It cannot
  tell us whether nearby keys are in the dictionary.
  - Is the word cas stored in the dictionary? (Exact-match query)
  - What is the first word in the dictionary that comes after cas?
    (Successor query)


Binary Trees: a quick review

We will use binary trees both as a data structure and as a tool for
analyzing algorithms.

[Figure: a binary tree drawn with its levels labeled, from level 0 (the
root) down to level 3.]

The depth of a binary tree is the maximum of the levels of all its leaves.


Traversing binary trees


         A
       /   \
      B     C
     /     / \
    D     E   F
   / \
  G   H

- Preorder: root, left subtree (in preorder), right subtree (in preorder):
  A B D G H C E F
- Inorder: left subtree (in inorder), root, right subtree (in inorder):
  G D H B A E C F
- Postorder: left subtree (in postorder), right subtree (in postorder),
  root: G H D B E F C A
- Breadth-first order (level order): level 0 left-to-right, then level 1
  left-to-right, ...: A B C D E F G H
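
These orders translate directly into code. Below is a minimal Python
sketch (the Node class and function names are my own, not from [GT]) that
reproduces all four orders for the tree above:

    from collections import deque

    class Node:
        def __init__(self, label, left=None, right=None):
            self.label, self.left, self.right = label, left, right

    def preorder(t):
        if t is None: return []
        return [t.label] + preorder(t.left) + preorder(t.right)    # root, left, right

    def inorder(t):
        if t is None: return []
        return inorder(t.left) + [t.label] + inorder(t.right)      # left, root, right

    def postorder(t):
        if t is None: return []
        return postorder(t.left) + postorder(t.right) + [t.label]  # left, right, root

    def levelorder(t):
        order, queue = [], deque([t])    # breadth-first, left-to-right per level
        while queue:
            node = queue.popleft()
            if node is not None:
                order.append(node.label)
                queue.append(node.left)
                queue.append(node.right)
        return order

    tree = Node('A',
                Node('B', Node('D', Node('G'), Node('H'))),
                Node('C', Node('E'), Node('F')))
    assert ''.join(preorder(tree))   == 'ABDGHCEF'
    assert ''.join(inorder(tree))    == 'GDHBAECF'
    assert ''.join(postorder(tree))  == 'GHDBEFCA'
    assert ''.join(levelorder(tree)) == 'ABCDEFGH'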

Facts about binary trees

1. There are at most 2^k nodes at level k.
2. A binary tree with depth d has:
   - At most 2^d leaves.
   - At most 2^(d+1) − 1 nodes.
3. A binary tree with n leaves has depth ≥ ⌈lg n⌉.
4. A binary tree with n nodes has depth ≥ ⌊lg n⌋.


Binary search trees


          47
         /  \
       36    65
      /     /  \
    25    52    79
   /  \
  9    32

- Functions as an ordered dictionary (can find successors and
  predecessors).
- find, insert, and remove can all be done in O(h) time (h = tree height).
- AVL trees, Red-Black trees, Weak AVL trees: h = O(log n), so find,
  insert, and remove can all be done in O(log n) time.
- Splay trees and Skip Lists: alternatives to balanced trees.
- Can traverse the tree and list all items in O(n) time.
- See [GT] Chapters 3–4 for details.
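
For instance, find follows a single root-to-leaf path, which is where the
O(h) bound comes from. A minimal sketch, assuming nodes carry key, left,
and right fields:

    def bst_find(node, key):
        # Follow one root-to-leaf path: O(h) time, h = tree height.
        while node is not None:
            if key == node.key:
                return node
            node = node.left if key < node.key else node.right
        return None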

Binary Search: Searching in a sorted array

- Input is a sorted array A and an item x.
- The problem is to locate x in the array.
- There are several variants of the problem, for example:
  1. Determine whether x is stored in the array.
  2. Find the largest i such that A[i] ≤ x (with a reasonable convention
     if x < A[0]).
  We will focus on the first variant.
- We will show that binary search is an optimal algorithm for solving this
  problem.


Binary Search: Searching in a sorted array


Input:  A: sorted array with n entries A[0..n−1]
        x: item we are seeking
Output: location of x, if x is found
        −1, if x is not found

    def binarySearch(A, x, first, last):
        if first > last:
            return -1
        mid = (first + last) // 2      # floor of the midpoint
        if x == A[mid]:
            return mid
        elif x < A[mid]:
            return binarySearch(A, x, first, mid - 1)
        else:
            return binarySearch(A, x, mid + 1, last)

    binarySearch(A, x, 0, n - 1)
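
A usage sketch on a small sorted array (the values are chosen purely for
illustration):

    A = [9, 17, 19, 23, 38, 42, 85, 91]
    print(binarySearch(A, 38, 0, len(A) - 1))   # 4: found at index 4
    print(binarySearch(A, 40, 0, len(A) - 1))   # -1: not present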

Correctness of Binary Search

We need to prove two things:

1. If x is in the array, its location (its index) is between first and
   last, inclusive. Note that this is equivalent to: either x is not in
   the array, or its location is between first and last, inclusive.
2. On each recursive call, the difference last − first gets strictly
   smaller.

[Figure: an array with the current search range marked by first and last.]

Correctness of Binary Search


To prove that the invariant continues to hold, we need to consider three
cases (recall mid = ⌊(first + last)/2⌋):

1. last ≥ first + 2 (mid falls strictly between first and last)
2. last = first + 1 (mid = first)
3. last = first (mid = first = last)

[Figure: the three configurations of first, mid, and last.]

Binary Search: Analysis of Running Time

- We will count the number of 3-way comparisons of x against elements of A
  (also known as decisions).
- Rationale:
  1. This is essentially the same as the number of recursive calls: every
     recursive call, except possibly the very last one, results in a 3-way
     comparison.
  2. It gives us a way to compare binary search against other algorithms
     that solve the same problem: searching for an item in an array by
     comparing the item against array entries.

Binary Search: Analysis of Running Time (continued)


- Binary search in an array of size 1: 1 decision.
- Binary search in an array of size n > 1: after 1 decision, either we are
  done, or the problem is reduced to binary search in a subarray with a
  worst-case size of ⌊n/2⌋.
- So the worst-case time to do binary search on an array of size n is
  T(n), where T(n) satisfies the recurrence

      T(n) = 1                  if n = 1
      T(n) = 1 + T(⌊n/2⌋)       otherwise

- The solution to this recurrence is

      T(n) = ⌊lg n⌋ + 1

  This can be proved by induction.
- So binary search does ⌊lg n⌋ + 1 3-way comparisons on an array of size
  n, in the worst case.
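
A quick empirical check of this solution (a sketch: it simulates the worst
case, in which every comparison sends the search into the larger, right
half, e.g., when x exceeds every entry of A):

    import math

    def worst_case_comparisons(n):
        # Binary search when every 3-way comparison recurses right.
        first, last, count = 0, n - 1, 0
        while first <= last:
            count += 1                        # one 3-way comparison
            first = (first + last) // 2 + 1   # continue in the right subarray
        return count

    for n in [1, 2, 7, 8, 100, 1000, 4096]:
        assert worst_case_comparisons(n) == math.floor(math.log2(n)) + 1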

Optimality of binary search


- We will establish a lower bound on the worst-case number of decisions
  required to find an item in an array, using only 3-way comparisons of
  the item against array entries.
- The lower bound we will establish is ⌊lg n⌋ + 1 3-way comparisons.
- Since binary search performs within this bound, it is optimal.
- Our lower bound is established using a decision tree model.
- Note that the bound is exact (not just asymptotic).
- Our lower bound is on the worst case.
  - It says: for every algorithm for finding an item in an array of size
    n, there is some input that forces it to perform ⌊lg n⌋ + 1
    comparisons.
  - It does not say: for every algorithm for finding an item in an array
    of size n, every input forces it to perform ⌊lg n⌋ + 1 comparisons.


The decision tree model for searching in an array


Consider any algorithm that searches for an item x in an array A of size n
by comparing entries of A against x. Any such algorithm can be modeled as
a decision tree:

- Each node is labeled with an integer in {0, ..., n − 1}.
- A node labeled i represents a 3-way comparison between x and A[i].
- The left subtree of a node labeled i describes the decision tree for
  what happens if x < A[i].
- The right subtree of a node labeled i describes the decision tree for
  what happens if x > A[i].

Example: decision tree for binary search with n = 13:

                6
            /       \
           2         9
         /   \     /   \
        0     4   7     11
         \   / \   \   /  \
          1 3   5   8 10   12


Lower bound on locating an item in an array of size n

1. Any algorithm for searching an array of size n can be modeled by a
   decision tree with at least n nodes.
2. Since the decision tree is a binary tree with n nodes, its depth is at
   least ⌊lg n⌋.
3. The worst-case number of comparisons for the algorithm is the depth of
   the decision tree + 1. (Remember, the root has depth 0.)

Hence any algorithm for locating an item in an array of size n using only
comparisons must perform at least ⌊lg n⌋ + 1 comparisons in the worst
case. So binary search is optimal with respect to worst-case performance.

Sorting

- Rearranging a list of items into nondescending order.
- A useful preprocessing step (e.g., for binary search).
- An important step in other algorithms.
- Illustrates more general algorithmic techniques.

We will discuss:
- Comparison-based sorting algorithms (insertion sort, selection sort,
  quicksort, mergesort, heapsort)
- Bucket-based sorting methods

Comparison-based sorting
- Basic operation: compare two items.
- An abstract model.
  - Advantage: doesn't use specific properties of the data items, so the
    same algorithm can be used for sorting integers, strings, etc.
  - Disadvantage: under certain circumstances, specific properties of the
    data items can speed up the sorting process.
- Measure of time: number of comparisons.
  - Consistent with the philosophy of counting basic operations, discussed
    earlier.
  - Misleading if other operations dominate (e.g., if we sort by moving
    items around without comparing them).
- Comparison-based sorting has a lower bound of Ω(n log n) comparisons.
  (We will prove this.)


Θ(n log n) work vs. quadratic (Θ(n²)) work

[Figure: plot of y = n²/2 against y = 10·n·lg n for n from 0 to 1000. The
quadratic curve dominates: it reaches 500,000 at n = 1000, while the
n log n curve stays near 100,000.]


Some terminology

- A permutation of a sequence of items is a reordering of the sequence.
  A sequence of n items has n! distinct permutations.
  - Note: sorting is the problem of finding a particular distinguished
    permutation of a list.
- An inversion in a sequence or list is a pair of items such that the
  larger one precedes the smaller one.

  Example: The list

      18 29 12 15 32 10

  has 9 inversions:

      (18,12), (18,15), (18,10), (29,12), (29,15),
      (29,10), (12,10), (15,10), (32,10)


Insertion sort
- Work from left to right across the array.
- Insert each item in the correct position with respect to the (sorted)
  elements to its left.

[Figure: the array A[0..n−1] evolves from entirely unsorted, to a sorted
prefix followed by the current item x and the unsorted remainder, to
entirely sorted.]


Insertion sort pseudocode

[Figure: elements ≤ x, then the elements > x shifted one cell to the
right, with x about to be dropped into the gap.]

    def insertionSort(n, A):
        for k in range(1, n):
            x = A[k]
            j = k - 1
            while j >= 0 and A[j] > x:
                A[j + 1] = A[j]    # shift larger elements one cell right
                j = j - 1
            A[j + 1] = x           # drop x into the gap


Insertion sort example


23 19 42 17 85 38    (initially, A[0] = 23 alone is sorted)
19 23 42 17 85 38    (insert 19)
19 23 42 17 85 38    (insert 42: already in place)
17 19 23 42 85 38    (insert 17)
17 19 23 42 85 38    (insert 85: already in place)
17 19 23 38 42 85    (insert 38)


Analysis of Insertion Sort


- Worst-case running time:
  - On the kth iteration of the outer loop, element A[k] is compared with
    at most k elements: A[k−1], A[k−2], ..., A[0].
  - The total number of comparisons over all iterations is at most

        Σ_{k=1}^{n−1} k = n(n−1)/2 = O(n²).

- Insertion sort is a bad choice when n is large (O(n²) vs. O(n log n)).
- Insertion sort is a good choice when n is small (the constant hidden in
  the "big-oh" is small).
- Insertion sort is efficient if the input is "almost sorted":

      Time ≤ n − 1 + (# of inversions)

- Storage: in place, O(1) extra storage.



Selection Sort

- Two variants:
  1. Repeatedly (for i from 0 to n − 1) find the minimum value, output it,
     and delete it.
     - Values are output in sorted order.
  2. Repeatedly (for i from n − 1 down to 1):
     - Find the maximum of A[0], A[1], ..., A[i].
     - Swap this value with A[i] (a no-op if it is already at A[i]).
- Both variants run in O(n²) time if we use the straightforward approach
  to finding the maximum/minimum. (A sketch of the second variant appears
  below.)
- They can be improved by treating the items A[0], A[1], ..., A[i] as
  items in an appropriately designed priority queue. (Next set of notes.)
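
A Python sketch of the second variant, using the straightforward O(n²)
maximum-finding (the function name is mine):

    def selectionSort(A):
        n = len(A)
        for i in range(n - 1, 0, -1):
            # find the index of the maximum of A[0..i]
            maxIndex = 0
            for j in range(1, i + 1):
                if A[j] > A[maxIndex]:
                    maxIndex = j
            A[i], A[maxIndex] = A[maxIndex], A[i]   # harmless self-swap if
                                                    # already in place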


Sorting algorithms based on Divide and Conquer

The divide-and-conquer paradigm:

1. Split the problem into subproblem(s).
2. Solve each subproblem (usually via a recursive call).
3. Combine the solution(s) of the subproblem(s) into a solution of the
   original problem.

We will discuss two sorting algorithms based on this paradigm:
- Quicksort
- Mergesort


Quicksort

Basic idea:
- Classify keys as small keys or large keys. All small keys are less than
  all large keys.
- Rearrange the keys so that all small keys precede all large keys.
- Recursively sort the small keys; recursively sort the large keys.

[Figure: the array of keys is partitioned into small keys followed by
large keys.]

Quicksort: One specific implementation


- Let the first item in the array be the pivot value x (also called the
  split value).
- Small keys are the keys < x.
- Large keys are the keys ≥ x.

[Figure: the array A[first..last] starts as x followed by unprocessed
keys; after the split step it becomes: keys < x, then x at position
splitpoint, then keys ≥ x.]


Pseudocode for Quicksort

def quickSort(A,first,last):
if first < last:
splitpoint = split(A,first,last)
quickSort(A,first,splitpoint-1)
quickSort(A,splitpoint+1,last)

[Figure: after split, A[first..splitpoint−1] holds keys < x,
A[splitpoint] holds x, and A[splitpoint+1..last] holds keys ≥ x.]
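
A usage sketch, assuming the split routine defined on the next slide:

    A = [27, 83, 23, 36, 15, 79, 22, 18]
    quickSort(A, 0, len(A) - 1)
    print(A)   # [15, 18, 22, 23, 27, 36, 79, 83]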


The split step

    def split(A, first, last):
        splitpoint = first
        x = A[first]
        for k in range(first + 1, last + 1):
            if A[k] < x:
                A[splitpoint + 1], A[k] = A[k], A[splitpoint + 1]   # swap
                splitpoint = splitpoint + 1
        A[first], A[splitpoint] = A[splitpoint], A[first]   # place the pivot
        return splitpoint

Loop invariants:
- A[first+1..splitpoint] contains keys < x.
- A[splitpoint+1..k−1] contains keys ≥ x.
- A[k..last] contains unprocessed keys.


The split step


At start:    [ x | unprocessed keys ]                 (splitpoint = first)

In middle:   [ x | < x | ≥ x | unprocessed keys ]     (boundaries at
                                                       splitpoint and k)

At end:      [ x | < x | ≥ x ]                        (then x is swapped
                                                       into splitpoint)


Example of split step

27 83 23 36 15 79 22 18    s=0, k=1: 83 ≥ 27, no swap
27 83 23 36 15 79 22 18    k=2: 23 < 27, swap 83 ↔ 23; s=1
27 23 83 36 15 79 22 18    k=3: 36 ≥ 27, no swap
27 23 83 36 15 79 22 18    k=4: 15 < 27, swap 83 ↔ 15; s=2
27 23 15 36 83 79 22 18    k=5: 79 ≥ 27, no swap
27 23 15 36 83 79 22 18    k=6: 22 < 27, swap 36 ↔ 22; s=3
27 23 15 22 83 79 36 18    k=7: 18 < 27, swap 83 ↔ 18; s=4
27 23 15 22 18 79 36 83    loop done; final swap 27 ↔ A[s]
18 23 15 22 27 79 36 83    pivot 27 is now at splitpoint s=4


Analysis of Quicksort
We can visualize the lists sorted by quicksort as a binary tree:
- The root is the top-level list (of all items to be sorted).
- The children of a node are the two sublists to be sorted.
- Identify each list with its split value.

[27 83 23 36 15 79 22 18]        split value 27
├── [18 23 15 22]                split value 18
│   ├── [15]
│   └── [23 22]                  split value 23
│       └── [22]
└── [79 36 83]                   split value 79
    ├── [36]
    └── [83]


Worst-case Analysis of Quicksort

- Any pair of values x and y gets compared at most once during the entire
  run of Quicksort.
- The number of possible comparisons is

      C(n, 2) = n(n−1)/2 = O(n²)

- Hence the worst-case number of comparisons performed by Quicksort when
  sorting n items is O(n²).
- Question: Is there a better bound? Is it o(n²)? Or is it Θ(n²)?
- Answer: The bound is tight; it is Θ(n²). We will see why on the next
  slide.


A bad case for Quicksort: 1, 2, 3, ..., n − 1, n

With an already-sorted input, every split is maximally unbalanced:

[1 2 3 ... n−1 n]
└── [2 3 ... n−1 n]
    └── [3 ... n−1 n]
        └── ...
            └── [n−1 n]
                └── [n]

The list sizes are n, n − 1, ..., 2, 1, so

    (n−1) + (n−2) + ... + 1 = C(n, 2)

comparisons are required. So the worst-case running time for Quicksort is
Θ(n²). But what about the average case?

Average-case analysis of Quicksort:

Our approach:
1. Use the binary tree of sorted lists
2. Number the items in sorted order
3. Calculate the probability that two items get compared
4. Use this to compute the expected number of comparisons
performed by Quicksort.


Average-case analysis of Quicksort:

[27 83 23 36 15 79 22 18]        split value 27
├── [18 23 15 22]                split value 18
│   ├── [15]
│   └── [23 22]                  split value 23
│       └── [22]
└── [79 36 83]                   split value 79
    ├── [36]
    └── [83]

Sorted order: 15 18 22 23 27 36 79 83


Average-case analysis of Quicksort

- Number the keys in sorted order: S1 < S2 < · · · < Sn.
- Fact about comparisons: during the run of Quicksort, two keys Si and Sj
  get compared if and only if the first key from the set
  {Si, Si+1, ..., Sj} to be chosen as a pivot is either Si or Sj.
  - If some key Sk with Si < Sk < Sj is chosen first, then Si goes in the
    left half, Sj goes in the right half, and Si and Sj never get
    compared.
  - If Si is chosen first, it is compared against all the other keys in
    the set during the split step (including Sj).
  - Similarly if Sj is chosen first.

Examples (from the tree on the previous slide):
- 23 and 22 (both statements true)
- 36 and 83 (both statements false)


Average-case analysis of Quicksort


Assume:
- All n keys are distinct.
- All permutations are equally likely.
- The keys in sorted order are S1 < S2 < · · · < Sn.

Let Pi,j = the probability that keys Si and Sj are compared with each
other during the invocation of quicksort. Then, by the fact about
comparisons on the previous slide:

    Pi,j = the probability that the first key from {Si, Si+1, ..., Sj}
           to be chosen as a pivot value is either Si or Sj
         = 2 / (j − i + 1)

(The set has j − i + 1 keys, each equally likely to be chosen first, and
exactly 2 of the choices lead to a comparison.)


Average-case analysis of Quicksort


Define indicator random variables {Xi,j : 1 ≤ i < j ≤ n}:

    Xi,j = 1 if keys Si and Sj get compared
    Xi,j = 0 if keys Si and Sj do not get compared

1. The total number of comparisons is

       Σ_{i=1}^{n} Σ_{j=i+1}^{n} Xi,j

2. By linearity of expectation, the expected (average) total number of
   comparisons is

       E( Σ_{i=1}^{n} Σ_{j=i+1}^{n} Xi,j ) = Σ_{i=1}^{n} Σ_{j=i+1}^{n} E(Xi,j)

3. The expected value of Xi,j is

       E(Xi,j) = Pi,j = 2 / (j − i + 1)

Average-case analysis of Quicksort


Hence the expected number of comparisons is

    Σ_{i=1}^{n} Σ_{j=i+1}^{n} E(Xi,j)
        = Σ_{i=1}^{n} Σ_{j=i+1}^{n} 2/(j − i + 1)
        = Σ_{i=1}^{n} Σ_{k=2}^{n−i+1} 2/k        (substituting k = j − i + 1)
        < Σ_{i=1}^{n} Σ_{k=1}^{n} 2/k
        = Σ_{i=1}^{n} 2·Hn
        = 2n·Hn ∈ O(n lg n),

where Hn = Σ_{k=1}^{n} 1/k is the nth harmonic number (Hn = Θ(lg n)).

So the average time for Quicksort is O(n lg n).
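
A Monte Carlo sanity check of this bound (a sketch; it re-implements the
split step with list comprehensions purely to count comparisons, since
every split compares the pivot against each of the other items once):

    import random

    def quicksort_comparisons(A):
        # The split step makes len(A) - 1 comparisons against the pivot.
        if len(A) <= 1:
            return 0
        pivot, rest = A[0], A[1:]
        small = [y for y in rest if y < pivot]
        large = [y for y in rest if y >= pivot]
        return len(rest) + quicksort_comparisons(small) + quicksort_comparisons(large)

    n = 200
    H_n = sum(1 / k for k in range(1, n + 1))
    avg = sum(quicksort_comparisons(random.sample(range(10 * n), n))
              for _ in range(200)) / 200
    print(f"average comparisons: {avg:.0f}, bound 2nHn: {2 * n * H_n:.0f}")

The measured average lands below 2nHn, as the analysis predicts.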



Implementation tricks for improving Quicksort

1. Better choice of "pivot" item:
   - Instead of a single item, choose the median of 3 (or 5, or 7, ...).
   - Choose a random item (or randomly reorder the list as a preprocessing
     step).
   - Combine the two.
2. Reduce procedure-call overhead:
   - For small lists, use some other nonrecursive sort (e.g., insertion
     sort, selection sort, or a minimum-comparison sort).
   - Explicitly manipulate the stack in the program (rather than making
     recursive calls).
3. Reduce stack space:
   - Push the larger sublist (the one with more items) and immediately
     work on the smaller sublist.
   - This reduces worst-case stack usage from O(n) to O(lg n).


MergeSort
- Split the array into two (nearly) equal subarrays.
- Sort both subarrays (recursively).
- Merge the two sorted subarrays.

    def mergeSort(A, first, last):
        if first < last:
            mid = (first + last) // 2      # floor of the midpoint
            mergeSort(A, first, mid)
            mergeSort(A, mid + 1, last)
            merge(A, first, mid, mid + 1, last)


The merge step


[Figure: two sorted runs A[first1..last1] and A[first2..last2] are merged
into a temp array.]

    19 26 42 71 | 14 24 31 39
    →  14 19 24 26 31 39 42 71

Merging two lists of total size n requires at most n − 1 comparisons.

Code for the merge step

    def merge(A, first1, last1, first2, last2):
        index1, index2 = first1, first2
        temp = []
        # Merge into the temp array until one input run is exhausted
        while index1 <= last1 and index2 <= last2:
            if A[index1] <= A[index2]:
                temp.append(A[index1]); index1 += 1
            else:
                temp.append(A[index2]); index2 += 1
        # Copy the appropriate trailer portion
        temp.extend(A[index1:last1 + 1])
        temp.extend(A[index2:last2 + 1])
        # Copy the temp array back to A
        A[first1:last2 + 1] = temp


Analysis of Mergesort

Let T(n) = the number of comparisons required to sort n items in the worst
case:

    T(n) = T(⌈n/2⌉) + T(⌊n/2⌋) + n − 1,    n > 1
    T(n) = 0,                               n = 1

The asymptotic solution of this recurrence is

    T(n) = Θ(n log n)

The exact solution of this recurrence is

    T(n) = n⌈lg n⌉ − 2^⌈lg n⌉ + 1
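
The recurrence and the claimed exact solution are easy to check against
each other numerically (a sketch):

    import math
    from functools import lru_cache

    @lru_cache(maxsize=None)
    def T(n):
        # Worst-case comparison count, straight from the recurrence.
        if n == 1:
            return 0
        return T(math.ceil(n / 2)) + T(n // 2) + n - 1

    for n in range(1, 2049):
        k = math.ceil(math.log2(n))      # k = ceil(lg n); k = 0 when n = 1
        assert T(n) == n * k - 2 ** k + 1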


Geometrical Application: Counting line intersections


- Input: n lines in the plane, none of which are vertical; two vertical
  lines x = a and x = b (with a < b).
- Problem: count/report all pairs of lines that intersect between the two
  vertical lines x = a and x = b.

[Figure: n = 6 lines crossing the strip between x = a and x = b, with 8
pairwise intersections inside the strip.]

Checking every pair of lines takes Θ(n²) time. We can do better.



Geometrical Application: Counting line intersections


1. Sort the lines according to the y-coordinate of their intersection with
   the line x = a. Number the lines in sorted order. [O(n log n) time]
2. Produce the sequence of line numbers sorted according to the
   y-coordinate of their intersection with the line x = b.
   [O(n log n) time]
3. Count/report the inversions in the sequence produced in step 2.

[Figure: the lines numbered by their order at x = a appear in a different
order at x = b; each crossing inside the strip corresponds to an inversion
in that sequence.]

So the problem reduces to counting/reporting inversions.
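
Assuming each line is represented as a (slope, intercept) pair, the
reduction is only a few lines of Python; count_inversions below stands for
the mergesort-based inversion counter developed on the following slides:

    def intersections_between(lines, a, b):
        # Count pairs of lines y = m*x + c crossing strictly between
        # x = a and x = b.
        # Step 1: number the lines by their y-coordinate at x = a.
        by_a = sorted(lines, key=lambda mc: mc[0] * a + mc[1])
        # Step 2: list those numbers in order of y-coordinate at x = b.
        seq = sorted(range(len(by_a)),
                     key=lambda i: by_a[i][0] * b + by_a[i][1])
        # Step 3: each crossing in the strip swaps a pair's order,
        # i.e., is an inversion in seq.
        return count_inversions(seq)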



Counting Inversions: An Application of Mergesort

An inversion in a sequence or list is a pair of items such that the larger
one precedes the smaller one.

Example: The list [18, 29, 12, 15, 32, 10] has 9 inversions:
(18,12), (18,15), (18,10), (29,12), (29,15), (29,10), (12,10), (15,10),
(32,10)

In a list of size n, there can be as many as C(n, 2) = n(n−1)/2
inversions.

Problem: Given a list, compute the number of inversions.

Brute-force solution: check each pair i, j with i < j to see if
L[i] > L[j]. This gives a Θ(n²) algorithm. We can do better.


Inversion Counting

Sorting is the process of removing inversions. So, to count inversions:
- Run a sorting algorithm.
- Every time data is rearranged, keep track of how many inversions are
  being removed.

In principle, we can use any sorting algorithm to count inversions.
Mergesort works particularly nicely.


Inversion Counting with MergeSort


In Mergesort, the only time we rearrange data is during the merge step.

[Figure: when A[index2], from the second run, is copied to temp, it jumps
past all of A[index1..last1], each of which is larger.]

The number of inversions removed at that moment is

    last1 − index1 + 1


Example

    19 26 42 71 | 14 24 31 39
    temp so far: 14 19 24 26 31

When 31 (from the second run) is copied while 42 and 71 still remain in
the first run, 2 inversions are removed: (42, 31) and (71, 31).


Pseudocode for the merge step with inversion counting


    def merge(A, first1, last1, first2, last2):
        index1, index2 = first1, first2
        temp = []
        invCount = 0
        # Merge into the temp array until one input run is exhausted
        while index1 <= last1 and index2 <= last2:
            if A[index1] <= A[index2]:
                temp.append(A[index1]); index1 += 1
            else:
                temp.append(A[index2]); index2 += 1
                # A[index2] jumps past everything left in the first run
                invCount += last1 - index1 + 1
        # Copy the appropriate trailer portion
        temp.extend(A[index1:last1 + 1])
        temp.extend(A[index2:last2 + 1])
        # Copy the temp array back to A
        A[first1:last2 + 1] = temp
        return invCount


Pseudocode for MergeSort with inversion counting

    def mergeSort(A, first, last):
        invCount = 0
        if first < last:
            mid = (first + last) // 2
            invCount += mergeSort(A, first, mid)
            invCount += mergeSort(A, mid + 1, last)
            invCount += merge(A, first, mid, mid + 1, last)
        return invCount

The running time is the same as standard mergeSort: O(n log n).
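
A usage sketch, on the list from the inversion example earlier:

    A = [18, 29, 12, 15, 32, 10]
    print(mergeSort(A, 0, len(A) - 1))   # 9
    print(A)                             # [10, 12, 15, 18, 29, 32]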


Listing inversions
We have just seen that we can count inversions without increasing the
asymptotic running time of Mergesort. Suppose we want to list the
inversions. Whenever we remove inversions in the merge step, we list all
the inversions removed:

    (A[index1], A[index2]), (A[index1+1], A[index2]), ...,
    (A[last1], A[index2])

The extra work to do the reporting is proportional to the number of
inversions reported.

Inversion counting summary

Using a slight modification of Mergesort, we can:
- Count inversions in O(n log n) time.
- Report inversions in O(n log n + k) time, where k is the number of
  inversions.

The same results hold for the line-intersection counting problem.

The reporting algorithm is an example of an output-sensitive algorithm:
the performance of the algorithm depends on the size of the output as
well as the size of the input.
