Lab 9
The rationale
This lab is designed to show you another way to run Python scripts: from the command line in a terminal window.
Running scripts from the command line is actually much more common than using a Jupyter notebook environment;
the notebook is simply a much easier place to start programming. You will also use the text editor built into Jupyter
rather than a notebook. This lab also offers a peek at how computer languages are “equivalent” in the sense that
they can solve the same class of problems. However, not all tools are equally easy to use for a given problem. In this
case, the shell is a more natural tool for solving the problem.
Getting started
This lab closely follows the tasks that you worked on during last week’s shell lab. This week, you will do the same
work using Python, in a script that you run from the command line. To get started, follow the steps below. You only
need to follow these steps when you first start this lab; after that you can just edit the existing Python script and run it
in the terminal. The script is a file that is saved just like a notebook, only in a different format (remember file formats?).
Commands in blue are terminal commands.
● Log in to Jupyter
● Create a new directory for this lab (New -> Folder). Rename the directory “Lab9-Command-S19”, then click on
that subdirectory to open it
● Create a new text file (New -> Text File) in the directory you just created (Lab9-Command-S19). This opens a
Jupyter text editor window
● Rename the file from Untitled.txt to lab9_myloginid.py (From the File menu, select Rename.) The .py is
required. myloginid should be your login id, like dmbarbe04.
● Put the following Python code in your file (copy and paste should work):
def printString(whatToPrint):
    print(whatToPrint)
    return

if __name__ == '__main__':
    printString('hello world')
    exit()
● Save the file (File -> Save)
● Open a new terminal window (New -> Terminal)
● Use the cd command to navigate to the directory you created for this lab
● Use the ls -l command to confirm that your script is there
● Run your script with the command
$ python lab9_myloginid.py
The next time you work on it, you can just click on the file to open it in the text editor, then open a terminal
window, cd to the directory, and run your script. You will again need to use the cp command to make copies of
~charliep/courses/cs128/{first,second,third}.dat. These should be copied into the directory you
created for this lab so you can use them with your Python script later.
The work
The specific tasks your Python script needs to accomplish are listed below. They closely resemble those of the shell
lab from last week:
1. Extract all the lines from first.dat and second.dat that contain the string ‘AACCTTNN’. All of those sequences
should end up together in one file called fourth.dat
a. Do not just use the find() function for this - do it by writing code that checks the lines one at a time.
2. Combine the contents of fourth.dat and third.dat to create fifth.dat
a. Inside fifth.dat, the material from fourth.dat should appear before the material from third.dat.
3. Your program should display the number of lines and words that are in fifth.dat (words do not span lines).
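As a starting point for task 1, here is a minimal sketch of checking lines one at a time. It assumes that the `in` operator counts as an acceptable per-line test (ask your instructor if you are unsure), and it uses made-up in-memory data in place of first.dat so that it runs anywhere. The function name follows the findStrings idea from this handout; the variable names and the return value are illustrative only.

```python
import io

def findStrings(inputFileHandle, outputFileHandle, targetString):
    """Copy each line that contains targetString to the output; return how many matched."""
    matches = 0
    for line in inputFileHandle:        # the loop hands us one line at a time
        if targetString in line:        # substring test on this single line
            outputFileHandle.write(line)
            matches += 1
    return matches

# Demo with in-memory "files" (hypothetical data, not the real first.dat)
fakeInput = io.StringIO("GGAACCTTNNGG\nACGTACGT\nAACCTTNN\n")
fakeOutput = io.StringIO()
count = findStrings(fakeInput, fakeOutput, "AACCTTNN")
print(count)                            # prints 2
print(fakeOutput.getvalue(), end="")    # prints the two matching lines
```

Because the function takes file handles rather than file names, the same function can be called once for first.dat and once for second.dat with the same output handle, so both sets of matches land in fourth.dat.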
You should start with a piece of paper and design the functions you will need and how you will use them. Look
through the list of tasks and see how many of them are the same work being done on a different input. If there are
similar things being done multiple times, that task may be something that should be a function.
Break the problem down into logical chunks, each of which can be solved independently and then assembled
together into a whole. One approach would be to have functions like findStrings(inputFileHandle, outputFileHandle,
targetString), appendFiles(inputFileHandle1, inputFileHandle2, outputFileHandle), and
countContents(inputFileHandle). There are other plans that work too. Think about when you will need to open files
and when you can close them.
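One possible shape for the other two functions named above is sketched here. This is only one plan among several, not the required solution. The data is made up and held in memory so the sketch runs without the .dat files, and the choice to return a (lines, words) pair from countContents is an assumption.

```python
import io

def appendFiles(inputFileHandle1, inputFileHandle2, outputFileHandle):
    """Write everything from the first input, then everything from the second."""
    for handle in (inputFileHandle1, inputFileHandle2):
        for line in handle:
            outputFileHandle.write(line)

def countContents(inputFileHandle):
    """Return (number of lines, number of words); words never span lines."""
    lineCount = 0
    wordCount = 0
    for line in inputFileHandle:
        lineCount += 1
        wordCount += len(line.split())  # split() breaks on any whitespace
    return lineCount, wordCount

# Demo with in-memory "files" (hypothetical data)
first = io.StringIO("AA CC\nTT\n")
second = io.StringIO("NN GG TT\n")
combined = io.StringIO()
appendFiles(first, second, combined)
combined.seek(0)                        # rewind before reading what was just written
print(countContents(combined))          # prints (3, 6)
```

With real files there is no rewinding step like seek(0) in the usual plan: a file opened for writing must be closed before it is reopened for reading. Opening each file in a `with open(...) as handle:` block closes it automatically at the end of the block.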
Writing this code in Python will be more work than it was using the terminal and the tools available there. Thinking
through the problem on paper and then building small, testable chunks will make it much more tractable. When you
ask for help from the TAs, make sure you have your design document(s) handy so they can understand your
approach.
Since we don’t have autograding for this lab, we are going to use the diff command (from the terminal) and a
canon to test if your solution is correct. The diff command compares two files line by line and displays the lines that
differ. There are two canons, one for fourth.dat and one for fifth.dat. Here are examples of using diff and
redirecting the results to files:
diff fourth.dat ~charliep/courses/cs128/fourth-canon.dat > fourth-diff.txt
diff fifth.dat ~charliep/courses/cs128/fifth-canon.dat > fifth-diff.txt
If there are no differences between the two files, diff will not write anything to the file. If there are differences, diff
will write them to the file. Your goal is to have diff report no differences for each of the two output files your Python script
creates. Both canons are in the same directory. Make sure that the two files generated using the diff command are
empty.
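You can confirm that a diff result file is empty with ls -l (its size will be 0), or from Python. The sketch below is optional and not part of the assignment. The helper name isEmptyFile is made up for illustration, and the demo creates its own temporary file instead of relying on your real fourth-diff.txt.

```python
import os
import tempfile

def isEmptyFile(path):
    """Return True when the file at path contains zero bytes."""
    return os.path.getsize(path) == 0

# Demo: create an empty stand-in for fourth-diff.txt in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    demoPath = os.path.join(tmp, "fourth-diff.txt")
    open(demoPath, "w").close()         # creates a zero-byte file
    print(isEmptyFile(demoPath))        # prints True
```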
Important details
● You may find this Python file handling reference useful:
http://www.pythonforbeginners.com/cheatsheet/python-file-handling
● You can use the Up arrow key to efficiently re-run commands and the history command to look at what
you’ve run before.
Submit it
Upload your correctly named .py file to the assignment on Moodle.