04hashtables Java Good PDF
Objectives
Thinking about how to write iterators for data structures that you implement
You will be in a position to do the DIY Grep problem after lecture on Friday. We will discuss
hashtable implementation in detail on Monday.
How to Hand In
The source code files should comprise the hw04.src package, and your solution code files, the
hw04.sol package.
Begin by copying the source code from the course directory to your own personal directory.
That is, copy the following files from /course/cs0180/src/hw04/src/*.java to
~/course/cs0180/workspace/javaproject/src/hw04/src:
The purpose of these files is mainly to ensure your code compiles with our test suite. In addition,
they should serve as a guide for what test cases should look like. Note that your real testing files (as
required by the homework) should be MUCH more exhaustive.
After completing this assignment, the following solution files should all be part of the hw04.sol
package when you hand in:
DIY Grep
Chaining
– Iterator.tex, the LaTeX file describing your solution to the Hash Table
Iterator questions. We will also accept Iterator.pdf, but highly encourage trying out
LaTeX.
Only hand in the files that you have under the hw04.sol package, as well as the pdf
file that has your answers to Problem 3. Handing in the src files may break the testsuite.
There is no compatibility check for this assignment. Make a private post on Piazza with a
link to your submission if the autograder on Gradescope fails.
To hand in your files, submit them to Gradescope. Once you have handed in your homework, you
should receive an email, more or less immediately, confirming that fact. If you don’t receive this
email, try handing in again, or ask the TAs what went wrong.
CS18 Homework 4: Hash Tables Due: 5:00 PM, Mar 7, 2020
In this homework, you will write an application (DIY Grep) using Java’s built-in hash tables,
and complete an implementation of hash tables from scratch (via chaining). Java has two styles of
built-in hash tables: HashMap and HashSet.
These classes differ in that the former uses a hash table to represent a dictionary (a mapping from
some key to values), while the latter uses a hash table to represent a set, which is a special case
of a dictionary; the keys are the elements of the set, and there are no values.1 Consequently, it is
straightforward to use a hash table to represent a set.
You can use Java’s HashMap class by importing java.util.HashMap. Documentation on how to
use Java’s HashMap can be found here. Likewise, you can use Java’s HashSet class by importing
java.util.HashSet. Documentation on how to use Java’s HashSet can be found here. You may
use these only in the Grep problem.
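To make the difference concrete, here is a minimal sketch of both classes in use. The class name BuiltInHashTablesDemo and its two helper methods are made up for illustration and are not part of the assignment's source files.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical demo class; not part of the hw04 source files.
class BuiltInHashTablesDemo {
    // Dictionary: every key (a word) is associated with a value (its length).
    static Map<String, Integer> wordLengths(String[] words) {
        Map<String, Integer> lengths = new HashMap<>();
        for (String word : words) {
            lengths.put(word, word.length()); // a repeated key overwrites its old value
        }
        return lengths;
    }

    // Set: keys only, no values. Adding a duplicate has no effect.
    static Set<String> distinct(String[] words) {
        Set<String> seen = new HashSet<>();
        for (String word : words) {
            seen.add(word);
        }
        return seen;
    }
}
```

Both helpers ignore repeated words in the same way: a HashMap keeps one entry per key, and a HashSet keeps one copy of each element.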
Problems
The lights have only been mysteriously cut out for 30 seconds, but the CIT formal has descended
into disarray. Suddenly, a loud sound detonates above and the lights flicker back to life. Shouts
ring out as everyone looks up, just in time for a huge cloud of glitter to descend in their eyes — a
glitter bomb! Out of the corner of your eye, you spot a distinctly non-glittery figure on the balcony,
reveling in the chaos unfolding downstairs. You sprint to the stairs — it’s them, and you’re finally
going to catch them red-handed (or is it glitter-handed?). When you throw open the doors to the
second floor, though, there’s no one there. The culprit must have already disappeared into the maze
of CIT hallways. Heart pounding, you plunge into the labyrinth, desperate to find any trace of the
mysterious figure.
By the time you re-emerge from the winding halls, you’re exhausted and have nothing. Dejected,
you lean against the wall by CIT 201 to catch your breath — you can’t believe that they slipped
away right through your fingers, leaving not even a glitter trail in their wake. But when you look
up, you see something that definitely wasn’t there before. The painting that usually overlooks the
whole lobby is gone. In its place, there’s a giant, hastily painted symbol: a compass rose, like one
you would see on a nautical map.
The next day, all anyone can talk about is the eventful night and the compass rose that now overlooks
the CIT. Your suspicions have been confirmed once and for all, and you are more determined than
ever to solve this mystery. You survey everyone to find out who was seen on the dance floor right
before the blackout — and even more importantly, who wasn’t. You know your limits as the resident
CIT detective though, and there’s way too many names to check by hand. As you’re working, your
CS 18 HTA Evan sees you and, being the helpful TA he is, suggests you write a program to search
a file for specific guests using special data structures to make it faster (this sort of step is called
“preprocessing”).
1 Recall the invariant that dictionaries do not allow duplicate keys. That is why it makes sense to view sets as
dictionaries.
For example, you might have a file like this, where each line is a different guest’s list of people that
they saw at the formal:
Then a search on Evan Velasquez would identify lines 1 and 3 (the lines of the file). A concrete
example of the output format we want is further down in this section.
This same problem arises in other settings, like finding which lines of a play involve a given character,
or generally searching in files for where specific information might lie. In fact, this operation is so
common that operating systems such as Unix have a built-in command for it!
The UNIX command grep is an extremely powerful and useful tool. You can use grep to search a
file for a given pattern, and report where that pattern appears in the file, as follows:
The -n is an optional parameter that tells grep that we want it to report line numbers. It prints
out each line of text next to the line number. This is but one of many, many grep features, which
are fully documented on its man page (which you can access, if you want to learn more, by typing
‘man grep’ into a terminal).
Here’s a couple of examples of how you’d use grep with this line number option:
Your output format will look different (see below), but this is the essence of what you
are trying to replicate with your code.
Now we’re going to implement (a limited version of) grep for ourselves!
Task: Explain how a hashtable could be used to make it easy to look up the line numbers associated
with a given word in a multi-line file. Your answer to this part is just in prose. Write your answer to
this question in a comment at the top of the Grep class, which the next task asks you to write.
Task: Write a class Grep with a constructor and a single method, lookup. This class should
implement the IGrep interface in the source files for this assignment. Your constructor should take
as input a filename and perform any necessary preprocessing. The lookup method should take as
input a word and return a set of the line numbers on which that word appears in the file. It should
operate in expected constant time.
As noted above, you can (and should) make use of Java’s built-in hash table data structure, which
is called HashMap, to solve this problem.
Notes:
If the same word appears more than once on the same line, you should include the line number
only once in your output.
You should treat words as sequences of characters separated by whitespace; so "glitter" and
"glitter!" are distinct words. Also, you can assume words are case sensitive; so "mystery"
and "Mystery" are distinct words.
Hints:
As part of the preprocessing step, you may want to use the split method in the String
class, which splits up a string into pieces each time it encounters a specific character, and
stores those pieces in an array of strings.
You might find the LineNumberReader class, which extends BufferedReader, useful. It has
a method getLineNumber that gets the current line number.
You may need to catch and handle relevant exceptions! Think about exceptions like FileNotFoundException
and/or IOException.
The Java syntax to create a HashMap that maps, for example, from a String to a Set of
Integers is new HashMap<String, Set<Integer>>().
Likewise, the syntax to create a HashSet of Integers is new HashSet<Integer>().
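The hints above can be combined roughly as follows. This is only a sketch under stated assumptions, not the expected solution: the class LineIndexSketch and its methods are hypothetical names, it reads from an in-memory string instead of a file, and it does not implement the IGrep interface. It splits on runs of whitespace with the regular expression "\\s+".

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch class; not part of the hw04 source files.
class LineIndexSketch {
    // Builds a map from each word to the set of line numbers it appears on.
    static Map<String, Set<Integer>> index(Reader source) throws IOException {
        Map<String, Set<Integer>> index = new HashMap<String, Set<Integer>>();
        try (LineNumberReader reader = new LineNumberReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // After readLine, getLineNumber is the number of the line just read (starting at 1).
                int lineNumber = reader.getLineNumber();
                for (String word : line.trim().split("\\s+")) {
                    if (word.isEmpty()) {
                        continue; // a blank line yields one empty piece
                    }
                    // A Set records each line number at most once per word.
                    index.computeIfAbsent(word, w -> new HashSet<Integer>()).add(lineNumber);
                }
            }
        }
        return index;
    }

    // Convenience wrapper for in-memory text; shows one way to handle IOException.
    static Map<String, Set<Integer>> indexText(String text) {
        try {
            return index(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the words are stored as keys of a HashMap, looking up one word afterwards takes expected constant time, however long the input was.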
Task: Write a main method for the Grep class. The String[] args should correspond to a file
name, and then at least 1 other word to look for, in that order.
For example, running:
‘java hw04.sol.Grep /course/cs0180/src/poems/iliad tree water cats’
from the bin directory, should print something like:
Note: To take in arguments using IntelliJ, press the drop-down menu near the green “run” button
and select edit configurations. In the “Program Arguments” field, put your program arguments
(space separated). These arguments will appear in the String[] args variable in your main
method. To pass arguments on the command line, list them (space-separated) after the program
name, and they will likewise appear in String[] args. This would look like:
‘java hw04.sol.Grep filename’
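As a small illustration of how those arguments arrive, the sketch below separates the file name from the search words. ArgsSketch and its helpers are hypothetical names, and the messages it prints are not the output format required for this assignment.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch class; not part of the hw04 source files.
class ArgsSketch {
    // The first argument is the file name.
    static String fileName(String[] args) {
        return args[0];
    }

    // Every argument after the first is a word to look for.
    static List<String> searchWords(String[] args) {
        return Arrays.asList(args).subList(1, args.length);
    }

    public static void main(String[] args) {
        if (args.length < 2) {
            System.err.println("Expected a file name followed by at least one word.");
            return;
        }
        System.out.println("file: " + fileName(args));
        System.out.println("words: " + searchWords(args));
    }
}
```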
Task: Write a class GrepTest.java that tests your lookup function. For each test, include a
comment above the test explaining what scenario that particular test was trying to check. Be sure
to test edge cases and a variety of scenarios. You do not need to test for exceptions that might be
thrown on I/O errors (as you are catching those and printing a message, which can’t be tested).
Note: We are testing lookup here, not the main grep program, because the latter simply prints
output to the screen. In such cases, you test the inner logic, then manually inspect the output built
from the results of the inner logic.
Hint: We’ve included several test files (thankfully, none of which is the file shown above) in
/course/cs0180/src/poems/. But do not be afraid to create your own text files for testing, especially to
try to catch edge cases! If you do this, make sure to hand in these files with the rest of your code in
your sol/hw04/sol/ directory.
Now that you’ve preprocessed the list of people who were at the party, you can figure out which
TAs were missing! First you try the Head TAs, but then you remember that they were definitely
there, managing the formal’s SignMeUp queue for refreshments the entire night. You then start to
search the guest list for the UTAs, and notice that there’s exactly one who was missing right before
the blackout...
We will finish everything you need for this problem in class on Monday. You should have a sense of
what a hashtable looks like under the hood (an array), but we haven’t yet talked about how to deal
with collisions. For those who want to start thinking ahead, we handle collisions by putting a list in
each cell of the array, and storing all keys/values that map to the same array index in the list at
that index.
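To see what a collision looks like, consider the sketch below. It is an illustration only: SlotSketch, slotFor, and chain are hypothetical names, and reducing hashCode modulo the table size is an assumed way of picking a slot, not necessarily the one presented in lecture.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical sketch class; not part of the hw04 source files.
class SlotSketch {
    // Maps a key to a slot in [0, tableSize).
    // floorMod keeps the result non-negative even when hashCode() is negative.
    static int slotFor(Object key, int tableSize) {
        return Math.floorMod(key.hashCode(), tableSize);
    }

    // Places each key in the list at its slot; colliding keys share a list.
    // A List of lists stands in for the array so the sketch needs no casts.
    static List<LinkedList<Integer>> chain(int[] keys, int tableSize) {
        List<LinkedList<Integer>> slots = new ArrayList<>();
        for (int i = 0; i < tableSize; i++) {
            slots.add(new LinkedList<>());
        }
        for (int key : keys) {
            slots.get(slotFor(key, tableSize)).add(key);
        }
        return slots;
    }
}
```

With a table of size 10, the keys 3 and 13 both land in slot 3 (an Integer's hashCode is its own value), so they end up in the same list.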
In this problem, you will implement a hash table using chaining. Recall that the internal data
structure of a hash table implemented using chaining is an array of lists. You will use Java’s
LinkedList class in particular for the list implementation.
The point of this problem is for you to see how hashtables are built when they don’t already exist
in a language. Therefore, you may not use Java’s hashmaps or hashsets in your implementation.
The starter files contain an abstract class named AbsHashTable.java. It defines a class for
combining keys and values (as shown in Monday’s lecture). Look at the KVPair<K, V> class
stored inside. AbsHashTable.java implements an interface IDictionary<K, V>, which has the
definitions of the methods we need to implement.
Task: Extend AbsHashTable<K,V> with a class Chaining<K, V>, which will be your hashtable
implementation. The Chaining constructor should take as input a size variable, which specifies
the size (number of slots) of the table.
Your Chaining class needs to implement each of the abstract methods in AbsHashTable (abstract
methods have headers, but not yet bodies). You are welcome to implement additional (helper)
This situation is one of a few exceptions (equals is another) when casting is acceptable; in
general, however, it’s still discouraged.
This line of code will generate a warning that there is an unchecked cast. To get rid of the warning,
above any methods that cast like this, you should write:
@SuppressWarnings("unchecked")
Typically, warnings such as these indicate a problem with your code, so you should not suppress
them; you should pay attention to them! In this specific instance, however, we justify its use as
we are trying to get around a Java limitation.
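For reference, here is one way such a cast can arise. This is a sketch under assumptions: Java does not allow creating an array of a generic type directly, so a common workaround is to create a raw array and cast it. The class BucketArraySketch and its nested Pair class are hypothetical stand-ins, not the AbsHashTable or KVPair classes from the starter files.

```java
import java.util.LinkedList;

// Hypothetical sketch class; not part of the hw04 source files.
class BucketArraySketch<K, V> {
    // Stand-in for a key-value pair class.
    static final class Pair<K, V> {
        final K key;
        V value;

        Pair(K key, V value) {
            this.key = key;
            this.value = value;
        }
    }

    private final LinkedList<Pair<K, V>>[] buckets;

    // The annotation silences the unchecked-cast warning for this constructor only.
    @SuppressWarnings("unchecked")
    BucketArraySketch(int size) {
        // "new LinkedList<Pair<K, V>>[size]" does not compile, hence the raw array and cast.
        this.buckets = (LinkedList<Pair<K, V>>[]) new LinkedList[size];
        for (int i = 0; i < size; i++) {
            this.buckets[i] = new LinkedList<>();
        }
    }

    int size() {
        return this.buckets.length;
    }
}
```

Placing the annotation on the one constructor that needs it, instead of on the whole class, keeps other warnings visible.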
Note: For this problem, you need not submit any Java code. A high-level description of each
algorithm is enough. We recommend doing this portion of the homework in LaTeX, although
we will also accept a PDF. You can find a description of LaTeX at the bottom of the previous
homework. And our template is linked here.
Whenever you implement a collection (like a list or a hashtable) you should override iterator. We
did this successfully for doubly linked lists, and in this problem, you will think about how to do this
for hash tables. You are not being asked to implement the iterator, just to explain how
it would work. (You may assume that the hash table does not change while you are iterating; no
items are inserted or deleted.)
Note: While not part of the assignment, take a moment to think about how you would use an
iterator to write equals and toString for hash tables.
First, let’s think about how an iterator for a chaining hash table might work. As a first attempt,
you could try iterating over all the slots in the hash table: if a slot is empty, you can skip right over
it; but if it is not empty, you would then iterate over the bucket stored at that slot.
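This first attempt could be sketched as follows. It is only an illustration of the naive approach: NaiveSlotIterator is a hypothetical name, the table is represented as a List of buckets in place of an array, and an empty slot may be either null or an empty list.

```java
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch class; not part of the hw04 source files.
class NaiveSlotIterator<T> implements Iterator<T> {
    private final List<LinkedList<T>> slots;
    private int slotIndex = 0;          // next slot to examine
    private Iterator<T> current = null; // iterator over the bucket being visited

    NaiveSlotIterator(List<LinkedList<T>> slots) {
        this.slots = slots;
    }

    @Override
    public boolean hasNext() {
        // Advance slot by slot until a bucket with a remaining item is found.
        while (current == null || !current.hasNext()) {
            if (slotIndex >= slots.size()) {
                return false; // every slot has been examined
            }
            LinkedList<T> bucket = slots.get(slotIndex++);
            current = (bucket == null) ? null : bucket.iterator();
        }
        return true;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current.next();
    }
}
```

Note that hasNext visits every slot, empty or not, so a full pass costs time proportional to the number of slots plus the number of stored items.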
Task: The iterator we just described would examine all n slots in the chaining hash table, even
though there might not be data stored at all, or even most, of them. Explain how to implement a
more efficient chaining iterator that only examines slots which store data.
Your iterator should not affect the run time of the hash table’s basic operations; that is, the run
times of lookup, insert, update, and delete should not change.
Note: You cannot simply store the keys in Java’s HashSet and iterate through this set. Your
solution should not involve iterating through another HashSet or HashTable (note that these two
data structures would have similar iterators) as this would be using the solution to the problem to
solve the problem!
Hint: Consider augmenting the key-value pairs stored in your dictionary with additional fields.
Task: Discuss the trade-offs between the naive iterator we proposed, and your iterator design.
Please let us know if you find any mistakes, inconsistencies, or confusing language in this or any
other CS18 document by filling out the anonymous feedback form:
https://cs.brown.edu/courses/cs018/feedback.