
Lab 9

This document provides instructions for CS128 Lab 9, which involves writing a Python script to analyze DNA sequence data files. Students are asked to write a Python script that extracts lines from two files containing a target string and combines them into a new file. The script then combines this file with a third file and counts the lines and words. The document provides details on setting up the files, designing the script using functions, and testing the output against canon files.

Uploaded by

HuyIdol

CS128 - Programming and Problem Solving 

Lab 9 - ​$ python mySolutionToTheLab.py 


 
Due: 9:00 AM, Monday, April 8, 2019. Your lab9_myloginid.py file should be uploaded to Moodle.

The rationale 
This lab is designed to show you another way to run Python scripts: from the command line in a terminal window.
Running scripts from the command line is actually much more common than using a Jupyter notebook environment;
the notebook is simply an easier place to start programming. You will also use the text editor built into Jupyter
rather than a notebook. This lab also offers a peek at how computer languages are “equivalent” in the sense that
they can solve the same class of problems. However, not all tools are equally easy to use for a given problem. In this
case, the shell is a more natural tool for solving the problem.

Getting started 
This lab closely follows the tasks that you worked on during last week’s shell lab. This week, you will do the same 
work using Python, in a script that you run from the command line. To get started, follow the steps below. These 
steps should be followed when you first start this lab. After that you can just edit the existing Python script and run it 
in the terminal. It is a file that is saved just like a notebook is, just in a different format (remember file formats?) 
Commands in blue are terminal commands. 
 
● Log in to Jupyter
● Create a new directory for this lab (New -> Folder). Rename the directory “Lab9-Command-S19”. Click on
that subdirectory
● Create a new text file (New -> Text File) in the directory you just created (Lab9-Command-S19). This opens a 
Jupyter text editor window 
● Rename the file from Untitled.txt to ​lab9_myloginid.py​ (From the File menu, select Rename.) The ​.py​ is 
required. ​myloginid ​should be your login id, like dmbarbe04. 
● Put the following Python code in your file (copy and paste should work): 
 
def printString(whatToPrint):
    print(whatToPrint)
    return

if __name__ == '__main__':
    printString('hello world')
    exit()
 
● Save the file (File -> Save) 
● Open a new terminal window (New -> Terminal) 
● Use the ​cd​ command to navigate to the directory you created for this lab 
● Use the ​ls -l​ command to confirm that your script is there  
● Run your script with the command  
$ python lab9_myloginid.py  
 
The next time you work on it you can just click on the file to open it in the text editor, then you could open a terminal 
window, cd to the directory, and run your script. You will again need to use the ​cp​ command to make copies of 
~charliep/courses/cs128/{first,second,third}.dat.​ These should be copied into the directory you 
created for this lab so you can use them with your Python script later.

The work
The specific tasks your Python script needs to accomplish are listed below. They closely resemble those of the shell 
lab from last week:  

1. Extract all the lines from first.dat and second.dat that contain the string ‘AACCTTNN’. All of those sequences
should end up together in one file called fourth.dat
a. Do not just use the find() function for this - do it by writing code that checks the lines one at a time.
2. Combine the contents of fourth.dat and third.dat to create fifth.dat
a. Inside fifth.dat, the material from fourth.dat should appear before the material from third.dat.
3. Your program should display the number of lines and words that are in fifth.dat (words do not span lines) 
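
As an illustration of task 3 only, here is a minimal sketch of one way to count lines and words. The function name count_lines_and_words and the sample data are made up for this example and are not part of the lab. It assumes that a "word" is any run of non-whitespace characters, which is what "words do not span lines" suggests, so each line can be counted separately.

```python
def count_lines_and_words(lines):
    """Return (line_count, word_count) for an iterable of text lines.

    An open file handle is an iterable of lines, so it can be passed
    in directly.
    """
    line_count = 0
    word_count = 0
    for line in lines:
        line_count += 1
        # split() with no arguments breaks on any run of whitespace,
        # so each line's words are counted on their own
        word_count += len(line.split())
    return line_count, word_count


if __name__ == '__main__':
    # Made-up sample lines standing in for the contents of a data file
    sample = ['AACCTTNN GGTT\n', 'ACGT\n', '\n']
    print(count_lines_and_words(sample))
```

With a real file, the same function could be called as count_lines_and_words(handle) inside a with open('fifth.dat') as handle: block.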
 
You should start with a piece of paper and design the functions you will need and how you will use them. Look 
through the list of tasks and see how many of them are the same work being done on a different input. If there are 
similar things being done multiple times, that task may be something that should be a function. 
 
Break the problem down into logical chunks, each of which can be solved independently and then assembled 
together into a whole. One approach would be to have functions like findStrings(inputFileHandle, outputFileHandle, 
targetString), appendFiles(inputFileHandle1, inputFileHandle2, outputFileHandle), and 
countContents(inputFileHandle). There are other plans that work too. Think about when you will need to open files 
and when you can close them.  
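
The plan above could be sketched roughly as follows. This is only one possible arrangement, not the required design. The function names findStrings and appendFiles come from the paragraph above; run_demo, the temporary directory, and the sample data are assumptions added so the sketch can run on its own. It tests each line with Python's in operator; whether that satisfies the restriction in task 1a is for you to confirm with the instructor or a TA.

```python
import os
import tempfile


def findStrings(inputFileHandle, outputFileHandle, targetString):
    """Copy each line that contains targetString from input to output."""
    for line in inputFileHandle:  # examine one line at a time
        if targetString in line:
            outputFileHandle.write(line)


def appendFiles(inputFileHandle1, inputFileHandle2, outputFileHandle):
    """Write all of input 1, then all of input 2, to output."""
    for handle in (inputFileHandle1, inputFileHandle2):
        for line in handle:
            outputFileHandle.write(line)


def run_demo(workdir):
    """Hypothetical driver: build tiny stand-in files and run both steps."""
    def path(name):
        return os.path.join(workdir, name)

    # Made-up sample data; the real .dat files are not reproduced here
    samples = {
        'first.dat': 'AACCTTNN one\nGGGG two\n',
        'second.dat': 'TTTT three\nxxAACCTTNNxx four\n',
        'third.dat': 'CCCC five\n',
    }
    for name, text in samples.items():
        with open(path(name), 'w') as f:
            f.write(text)

    # Step 1: open fourth.dat once for writing and reuse it for both searches
    with open(path('fourth.dat'), 'w') as out:
        for name in ('first.dat', 'second.dat'):
            with open(path(name)) as src:
                findStrings(src, out, 'AACCTTNN')
    # Leaving the with-block closes fourth.dat, so it can be reopened to read

    # Step 2: fourth.dat first, then third.dat
    with open(path('fourth.dat')) as a, open(path('third.dat')) as b, \
            open(path('fifth.dat'), 'w') as out:
        appendFiles(a, b, out)

    with open(path('fifth.dat')) as f:
        return f.read()


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        print(run_demo(tmp), end='')
```

The point to notice is the file lifetime: fourth.dat must be closed after writing before it is opened again for reading, which the with statement handles automatically.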
 
Writing this code in Python will be more work than it was using the terminal and the tools available there. Thinking 
through the problem on paper and then building small, testable chunks will make it much more tractable. ​When you 
ask for help from the TAs, make sure you have your design document(s) handy so they can understand your 
approach. 
 
Since we don’t have autograding for this lab, we are going to use the diff command (from the terminal) and a
canon to test if your solution is correct. The diff command compares two files line by line and displays the lines that
differ. There are two canons, one for fourth.dat and one for fifth.dat. Here are examples of using diff and
redirecting the results to files:
 
diff fourth.dat ~charliep/courses/cs128/fourth-canon.dat > fourth-diff.txt 
diff fifth.dat ~charliep/courses/cs128/fifth-canon.dat > fifth-diff.txt 
 
If there are no differences between the two files, diff will not write anything to the file. If there are differences, diff
will write them to the file. Your goal is to have diff report no differences for each of the two output files your Python
script creates. Both canons are in the same directory. Make sure that the two files generated by the diff commands
are empty.
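
If you would rather check the two diff output files from Python than open them by hand, a small sketch like the one below could do it. It is an optional convenience, not part of the assignment. The helper name is_empty_file is made up for this example; the file names are the ones produced by the diff commands above.

```python
import os


def is_empty_file(path):
    """Return True if the file at path exists and contains zero bytes."""
    return os.path.isfile(path) and os.path.getsize(path) == 0


if __name__ == '__main__':
    for name in ('fourth-diff.txt', 'fifth-diff.txt'):
        if not os.path.isfile(name):
            print(name, 'not found; run the diff commands first')
        elif is_empty_file(name):
            print(name, 'is empty: no differences')
        else:
            print(name, 'is not empty: the files differ')
```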

Important details 
● You may find this Python file handling reference useful:
http://www.pythonforbeginners.com/cheatsheet/python-file-handling
 
● You can use the Up arrow key to efficiently re-run commands and the history command to look at what 
you’ve run before. 

Submit it 
Upload your correctly named ​.py​ file to the assignment on Moodle. 
 
