MIT6_006S20_ps3-solutions
Massachusetts Institute of Technology
Instructors: Erik Demaine, Jason Ku, and Justin Solomon
Problem Set 3
Please write your solutions in the LaTeX and Python templates provided. Aim for concise
solutions; convoluted and obtuse descriptions might receive low marks, even when they are
correct.
(a) [2 points] Insert integer keys A = [47, 61, 36, 52, 56, 33, 92] in order into
a hash table of size 7 using the hash function h(k) = (10k + 4) mod 7. Each slot of
the hash table stores a linked list of the keys hashing to that slot, with later insertions
being appended to the end of the list. Draw a picture of the hash table after all keys
have been inserted.
Solution:
slot 0: 36 → 92
slot 1: (empty)
slot 2: (empty)
slot 3: (empty)
slot 4: 56
slot 5: 47 → 61 → 33
slot 6: 52
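The chains can be checked with a short sketch. The helper name `build_chained_table` is hypothetical, and Python lists stand in for the linked lists described in the problem.

```python
def build_chained_table(keys, size, h):
    """Insert keys in order into `size` chains, appending each key to its chain."""
    table = [[] for _ in range(size)]
    for k in keys:
        table[h(k)].append(k)  # later insertions go to the end of the chain
    return table

A = [47, 61, 36, 52, 56, 33, 92]
table = build_chained_table(A, 7, lambda k: (10 * k + 4) % 7)
for slot, chain in enumerate(table):
    print(slot, chain)
```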
k                           47   61   36   52   56   33   92
10k + 4                    474  614  364  524  564  334  924
((10k + 4) mod 7) mod 7      5    5    0    6    4    5    0
((10k + 4) mod 8) mod 7      2    6    4    4    4    6    4
((10k + 4) mod 9) mod 7      6    2    4    2    6    1    6
((10k + 4) mod 10) mod 7     4    4    4    4    4    4    4
((10k + 4) mod 11) mod 7     1    2    1    0    3    4    0
((10k + 4) mod 12) mod 7     6    2    4    1    0    3    0
((10k + 4) mod 13) mod 7     6    3    0    4    5    2    1
The table above was generated using this python code:
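A minimal sketch of such a script, written as a reconstruction under the assumption that c ranges over 7 to 13 as in the rows above; the helper name `hashes` is hypothetical.

```python
A = [47, 61, 36, 52, 56, 33, 92]

def hashes(c):
    """Return ((10k + 4) mod c) mod 7 for every key k in A."""
    return [((10 * k + 4) % c) % 7 for k in A]

rows = {c: hashes(c) for c in range(7, 14)}
for c, row in rows.items():
    collision_free = len(set(row)) == len(row)
    print(c, row, "no collisions" if collision_free else "")
```

Only the row for c = 13 maps the seven keys to seven distinct slots, which matches the value named in the rubric.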
Rubric:
• 3 points for c = 13
• Partial credit may be awarded if there is work shown of a correct approach that
does not yield the correct solution.
• Rony and Tiri can choose IDs k1 and k2 so as to guarantee that they’ll be roommates, or
• prove that no such choice is possible and compute the highest probability they could possibly
achieve of being roommates.
Rubric:
(a) [5 points] Every ice core is given a unique core identifier for bookkeeping, which is
a string of exactly $16\lceil \log_4(\sqrt{n}) \rceil$ ASCII characters.² Sort the slices by core identifier.
Solution: Each string is stored in memory as a contiguous sequence of at most
$16\lceil \log_4(\sqrt{n}) \rceil \times 8 = O(\log n)$ bits. In the word-RAM, these bits can then be
interpreted as an integer stored in a constant number of machine words (since $w \ge \lg n$),
upper bounded by $2^{16\lceil \log_4(\sqrt{n}) \rceil \times 8} < n^{33}$, so we can sort them in
$\Theta(n + n \log_n n^{33}) = \Theta(n)$ time via radix sort.
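The reinterpretation step can be illustrated with a small sketch: for equal-length ASCII strings, lexicographic order agrees with the order of the big-endian integers formed from their bytes. The identifiers below are made up, `core_id_key` is a hypothetical helper, and Python's built-in sort is used only as a stand-in for the radix sort.

```python
def core_id_key(core_id):
    """Interpret a fixed-length ASCII string as one big-endian integer."""
    return int.from_bytes(core_id.encode("ascii"), "big")

# Made-up identifiers, all of the same length.
ids = ["core-B7", "core-A9", "core-A1", "core-C0"]
by_integer = sorted(ids, key=core_id_key)
print(by_integer)
```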
(b) [5 points] The deepest ice cores in the database are up to 800,000 years old. Sort the
slices by their age: the integer number of years since the slice was formed.
Solution: The ages form a constant-bounded range $[0, 8 \cdot 10^5]$, so we can use counting
sort to order them in ascending order in worst-case $\Theta(8 \cdot 10^5 + n) = \Theta(n)$ time. (Radix
sort may also be used.)
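A minimal counting-sort sketch for this part, assuming each slice is represented as a hypothetical `(age, label)` pair with an integer age in the stated range:

```python
def counting_sort_by_age(slices, max_age=800_000):
    """Stable counting sort of (age, label) pairs with ages in [0, max_age]."""
    counts = [0] * (max_age + 1)
    for age, _ in slices:
        counts[age] += 1
    # Turn counts into starting output positions (exclusive prefix sums).
    position = 0
    for age in range(max_age + 1):
        counts[age], position = position, position + counts[age]
    out = [None] * len(slices)
    for item in slices:
        out[counts[item[0]]] = item
        counts[item[0]] += 1
    return out

sample = [(500, "a"), (3, "b"), (800_000, "c"), (3, "d"), (0, "e")]
print(counting_sort_by_age(sample))
```

The array of counts has constant size, so the two passes over the slices dominate the running time.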
(c) [5 points] Variation in the amount of snowfall each year will cause a glacier to
accumulate at different rates over time. Sort the slices by thickness, a rational number of
centimeters of the form $m/n^3$ between 0 and 4, where m is an integer.
Solution: Multiplying by $n^3$, these are integers m in a polynomially-bounded range
$[0, 4n^3]$, so sort them using radix sort in worst-case $\Theta(n + n \log_n n^3) = \Theta(n)$ time.
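As a sketch of the radix sort used here, assume each slice is represented only by its integer numerator m (a hypothetical representation). Sorting in base n needs a constant number of stable bucket passes because m is at most $4n^3$.

```python
def radix_sort(keys, base):
    """LSD radix sort of non-negative integers using stable bucket passes."""
    if not keys:
        return []
    largest = max(keys)
    place = 1
    while place <= largest:
        buckets = [[] for _ in range(base)]
        for key in keys:
            buckets[(key // place) % base].append(key)
        keys = [key for bucket in buckets for key in bucket]
        place *= base
    return keys

def sort_thickness_numerators(numerators):
    """Sort numerators m of thicknesses m / n**3 using base max(n, 2)."""
    n = len(numerators)
    return radix_sort(list(numerators), max(n, 2))

sample = [500, 0, 37, 124, 37]  # n = 5, so numerators lie in [0, 4 * 5**3]
print(sort_thickness_numerators(sample))
```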
(d) [5 points] Elna of Northendelle has discovered that water has memory, but is unable
to quantify the memory of a given slice. Luckily, given two slices, she can distinguish
which has more memory in O(1) time using her “two-finger algorithm” (touching the
slices with her two index fingers). Sort the slices by memory.
Solution: The only way to discern order information from the slices is via comparisons,
so we choose merge sort, which runs in Θ(n log n) time, which is optimal in the
comparison model.
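A sketch of merge sort driven only by a pairwise comparison; the callback `has_more(a, b)` is a hypothetical stand-in for the O(1) "two-finger" test, and integers are used below purely for illustration.

```python
def merge_sort(items, has_more):
    """Sort ascending using only the comparison has_more(a, b)."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid], has_more)
    right = merge_sort(items[mid:], has_more)
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if has_more(left[i], right[j]):
            merged.append(right[j])  # right[j] has less memory, so it comes first
            j += 1
        else:
            merged.append(left[i])
            i += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

print(merge_sort([5, 2, 9, 1, 5, 6], lambda a, b: a > b))
```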
Rubric:
• 1 point for a correct choice of algorithm
• 1 point if chosen algorithm is efficient
• 3 points for a correct justification
• Partial credit may be awarded
¹ By “efficient”, we mean that asymptotically faster correct algorithms will receive more points than slower ones.
² You may assume a string of k ASCII characters is a pointer to a contiguous sequence of k bytes in memory, where
each byte stores an integer from 0 to 127 inclusive representing an ASCII character.
https://en.wikipedia.org/wiki/ASCII
(a) [10 points] Given B and r, describe an expected O(n)-time algorithm to determine
whether B contains a close pair that fulfills order r.
Solution: It suffices to check, for each $b_i$, whether $r - b_i = b_j$ for some $b_j \in B$, and
then check whether $|i - j| < n/10$. Since each box has a unique number of reams,
if there is a match with $b_i$ it is a unique $b_j$. Naively, we could perform this check by
comparing $r - b_i$ against all $b_j \in B - \{b_i\}$, which would take O(n) time for each
$b_i$, leading to $O(n^2)$ running time. We can speed up this algorithm by first storing the
elements of B in a hash table H along with their index, e.g., $(b_i, i)$, so that looking
up each $r - b_i$ can be done quickly. For each $b_i \in B$, insert $b_i$ into H mapped to i in
expected amortized O(1) time. Now all unique values that occur in B appear in H, so
for each $b_i$, check whether $r - b_i$ appears in H in expected O(1) time. Then, if it does,
H will return a j for which we can test for closeness with i in O(1) time. Building
the hash table and then checking for matches each take expected O(n) time, so this
algorithm runs in expected O(n) time. This brute-force algorithm is correct because
we check every $b_i$ for its only possible fulfilling partner, and check directly whether it
is close.
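The hash-table approach can be sketched as follows. This is an illustration rather than the reference solution: the function name `has_close_pair` is hypothetical, a Python dict plays the role of H, and the sketch assumes that a box may not be paired with itself.

```python
def has_close_pair(B, r):
    """Return True if some i != j has B[i] + B[j] == r and |i - j| < n/10."""
    n = len(B)
    index_of = {b: i for i, b in enumerate(B)}  # ream counts are unique
    for i, b in enumerate(B):
        j = index_of.get(r - b)
        if j is not None and j != i and abs(i - j) < n / 10:
            return True
    return False

# 20 boxes, so a pair is close only when its indices differ by less than 2.
B = [5, 100, 7, 3, 50, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 2]
print(has_close_pair(B, 10), has_close_pair(B, 12))
```

Here 7 + 3 = 10 sit at adjacent indices, while the pairs summing to 12 (5 + 7 and 3 + 9) are each two positions apart.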
Rubric:
• 3 points for a description of a correct algorithm
• 2 points for analysis of correctness
• 2 points for analysis of running time
• 3 points if correct algorithm is efficient
• Partial credit may be awarded
(b) [10 points] Now suppose that $r < n^2$. Describe a worst-case O(n)-time algorithm to
determine whether B contains a close pair that fulfills order r.
Solution: Replace each $b_i$ in B with the tuple $(b_i, i)$, to keep track of the index of the box in the
original order. We do not know whether every $b_i$ is polynomially bounded in n, but we do know
that r is. If some $b_i \ge r$, it certainly cannot be part of a pair from B that fulfills order r. So
perform a linear scan of B and remove all $(b_i, i)$ for which $b_i \ge r$, to construct set $B'$. Now the
ream count integers $b_i$ in $B'$ are each upper bounded by $O(n^2)$, so we can sort the tuples in $B'$ by
their ream counts $b_i$ in worst-case $O(n + n \log_n n^2) = O(n)$ time using radix sort, and store the
output in an array A.
Now we can sweep the sorted list using a two-finger algorithm similar to the merge step in merge
sort to find a pair that sums to r, if such a pair exists. Specifically, initialize indices i = 0 and
j = |A| − 1, and repeat the following procedure until i = j. If $A[i] = (b_k, k)$, let $A[i].b = b_k$ and
A[i].x = k. There are three cases:
• A[i].b + A[j].b = r: a pair that fulfills the order has been found.
Check whether |A[i].x − A[j].x| < n/10 and return True if so; otherwise, since ream counts are
unique, A[i].b has no other partner, so increase i; or
• A[i].b + A[j].b < r: A[i].b cannot be part of a pair that fulfills the order with any A[k].b for
k ∈ {i + 1, . . . , j}, so increase i; or
• A[i].b + A[j].b > r: A[j].b cannot be part of a pair that fulfills the order with any A[k].b for
k ∈ {i, . . . , j − 1}, so decrease j.
This loop maintains the invariant that at the start of each iteration, we have confirmed that no pair
$(A[k].b, A[\ell].b)$ is close and fulfills the order, for all $k \le i \le j \le \ell$, so if we reach the end without
returning a valid pair, the algorithm will correctly conclude that there is none. Since each iteration
of the loop takes O(1) time and decreases j − i by one, and j − i starts at $|B'| - 1$ and the loop
ends once j − i ≤ 0, this procedure takes at most O(n) time in the worst case.
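The filter, radix sort, and two-finger sweep can be sketched together. This is an illustrative sketch, not the reference solution: the names `radix_sort_pairs` and `has_close_pair_worst_case` are hypothetical, ream counts are assumed to be non-negative integers, and a sum that matches r without being close is handled by advancing i, which is safe because ream counts are unique.

```python
def radix_sort_pairs(pairs, base):
    """LSD radix sort of (ream_count, index) pairs by ream_count."""
    largest = max(b for b, _ in pairs)
    place = 1
    while place <= largest:
        buckets = [[] for _ in range(base)]
        for pair in pairs:
            buckets[(pair[0] // place) % base].append(pair)
        pairs = [pair for bucket in buckets for pair in bucket]
        place *= base
    return pairs

def has_close_pair_worst_case(B, r):
    """Assumes r < n**2; returns True if a close pair sums to r."""
    n = len(B)
    kept = [(b, i) for i, b in enumerate(B) if b < r]  # larger boxes cannot be in a pair
    if len(kept) < 2:
        return False
    A = radix_sort_pairs(kept, max(n, 2))  # keys are below n**2, so two passes in base n
    i, j = 0, len(A) - 1
    while i < j:
        total = A[i][0] + A[j][0]
        if total == r:
            if abs(A[i][1] - A[j][1]) < n / 10:
                return True
            i += 1  # unique ream counts: A[i] has no other partner
        elif total < r:
            i += 1
        else:
            j -= 1
    return False

B = [5, 100, 7, 3, 50, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 2]
print(has_close_pair_worst_case(B, 10), has_close_pair_worst_case(B, 12))
```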
Rubric:
• 3 points for a description of a correct algorithm
• 2 points for analysis of correctness
• 2 points for analysis of running time
• 3 points if correct algorithm is efficient
• Partial credit may be awarded
(a) [12 points] Given string A and a positive integer k, describe a data structure that can
be built in O(|A|) time, which will then support a single operation: given a different
string B with |B| = k, return the anagram substring count of B in A in O(k) time.
Solution: For this data structure, we need a way to find how many substrings of
A are anagrams of an input B in a running time that does not depend on |A|. The
idea is to construct a constant-sized canonicalization of each string and store it in
a hash table, where anagrams of a string have the same canonicalization; in
particular, we will construct a frequency table with 26 entries, one for each lowercase
English letter, where each entry stores the number of occurrences of that letter in the
string. Two strings will have the same frequency table if and only if they are anagrams
of each other.
Let $S = (S_0, \ldots, S_{|A|-k})$ be the $|A| - k + 1$ contiguous length-k substrings of A, where
substring $S_i$ starts at character A[i]. Constructing a frequency table naively for each
$S_i \in S$ would take O(|A|k) time. However, after computing the frequency table of $S_0$
naively in O(k) time, we can construct the frequency table for $S_{i+1}$ from the frequency
table for $S_i$ in O(1) time by subtracting the letter at A[i] from the frequency table of
$S_i$ and adding in the letter at A[i + k]. In this way, we can compute the constant-sized
frequency table for each $S_i \in S$ in $O(k) + (|A| - k) \cdot O(1) = O(|A|)$ time. Then insert
each of these frequency tables into a hash table H, mapped to the number of $S_i \in S$
having that frequency table, in expected O(|A|) time (each frequency is at most $n = |A|$, so
each frequency table can be thought of as a $26\lceil \lg n \rceil$-bit integer, which fits within a
constant number of machine words and can be used as a hash key).
Then given our data structure H, we can support the requested operation by first
computing the frequency table f of the input string B naively in O(k) time, and then
looking it up in H in expected O(1) time, for a total of expected O(k) time. Since
H(f) stores the number of $S_i \in S$ with frequency table f, if f is in H, the stored
value will be the anagram substring count of B in A. Otherwise, if f is not in H, B is
not an anagram of any substring of A, so return zero.
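A minimal sketch of this data structure, assuming lowercase English input. The class name `AnagramCounter` and its method names are hypothetical, and a 26-entry tuple is used as the dictionary key in place of the packed integer described above.

```python
class AnagramCounter:
    """Counts length-k substrings of A by letter-frequency table."""

    def __init__(self, A, k):
        self.k = k
        self.table = {}  # frequency tuple -> number of substrings with that table
        if k > len(A):
            return
        freq = [0] * 26
        for c in A[:k]:
            freq[ord(c) - ord("a")] += 1
        self.table[tuple(freq)] = 1
        for i in range(len(A) - k):
            freq[ord(A[i]) - ord("a")] -= 1       # letter leaving the window
            freq[ord(A[i + k]) - ord("a")] += 1   # letter entering the window
            key = tuple(freq)
            self.table[key] = self.table.get(key, 0) + 1

    def count(self, B):
        """Return the anagram substring count of B (with len(B) == k)."""
        freq = [0] * 26
        for c in B:
            freq[ord(c) - ord("a")] += 1
        return self.table.get(tuple(freq), 0)

counter = AnagramCounter("esleastealaslatet", 5)
print(counter.count("tesla"))
```

In this example the matching substrings are 'least', 'steal', and 'slate'.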
Rubric:
• 4 points for a description of a correct data structure
• 2 points for analysis of correctness
• 2 points for analysis of running time
• 4 points if correct algorithm is efficient
• Partial credit may be awarded
(b) [3 points] Given string T and an array of n length-k strings $S = (S_0, \ldots, S_{n-1})$
satisfying 0 < k < |T|, describe an O(|T| + nk)-time algorithm to return an array
$A = (a_0, \ldots, a_{n-1})$ for which $a_i$ is the anagram substring count of $S_i$ in T for all
$i \in \{0, \ldots, n-1\}$.
Solution: Construct the data structure in part (a), substituting string T for A and
using k, in expected O(|T|) time. Then for each $S_i \in S$, we can perform the operation
supported by the data structure in expected O(k) time, storing the outputs in an array
A, for a total of expected O(|T| + nk) time. This algorithm is correct based on the
correctness of the data structure from (a).
Rubric:
• 2 points for a description of a correct algorithm
• 1 point for analysis of running time and correctness
• Partial credit may be awarded
(c) [25 points] Write a Python function count_anagram_substrings(T, S) that
implements your algorithm from part (b). Note the built-in Python function ord(c)
returns the ASCII integer corresponding to ASCII character c in O(1) time. You can
download a code template containing some test cases from the website.
Solution:
ORD_A = ord('a')
def lower_ord(c):  # map a lowercase letter to range(26)
    return ord(c) - ORD_A
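The listing above stops after the helper. One way the rest of the function could look is sketched below; this is an assumption-laden reconstruction based on parts (a) and (b), not the official solution. It assumes lowercase input with 0 < k < |T|, uses a 26-entry tuple as the hash key, and returns a list, which may differ from what the course template expects.

```python
ORD_A = ord('a')

def lower_ord(c):  # map a lowercase letter to range(26)
    return ord(c) - ORD_A

def count_anagram_substrings(T, S):
    """Return the anagram substring count in T of each string in S."""
    if not S:
        return []
    k = len(S[0])
    table = {}  # frequency tuple -> number of length-k substrings of T
    freq = [0] * 26
    for c in T[:k]:
        freq[lower_ord(c)] += 1
    table[tuple(freq)] = 1
    for i in range(len(T) - k):  # slide the window one character at a time
        freq[lower_ord(T[i])] -= 1
        freq[lower_ord(T[i + k])] += 1
        key = tuple(freq)
        table[key] = table.get(key, 0) + 1
    counts = []
    for s in S:
        f = [0] * 26
        for c in s:
            f[lower_ord(c)] += 1
        counts.append(table.get(tuple(f), 0))
    return counts

print(count_anagram_substrings("esleastealaslatet", ["tesla", "aaaaa", "least"]))
```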
For information about citing these materials or our Terms of Use, visit: https://ocw.mit.edu/terms