BIS501

Uploaded by

Jingyu Tang
Why Program?

Chapter 1

Python for Everybody


www.py4e.com
Computers Want to be Helpful...
• Computers are built for one purpose - to do things for us

• But we need to speak their language to describe what we want done

• Users have it easy - someone already put many different programs
(instructions) into the computer and users just pick the ones they want to use

[Figure: computers asking "What Next?"]
Programmers Anticipate Needs
• iPhone applications are a market

• iPhone applications have over 3 billion downloads

• Programmers have left their jobs to be full-time iPhone developers

• Programmers know the ways of the program

[Figure: applications calling "Pick Me!" and "Pay Me!"]
Users vs. Programmers
• Users see computers as a set of tools - word processor, spreadsheet, map,
to-do list, etc.

• Programmers learn the computer “ways” and the computer language

• Programmers have some tools that allow them to build new tools

• Programmers sometimes write tools for lots of users and sometimes
programmers write little “helpers” for themselves to automate a task
Why be a Programmer?
• To get some task done - we are the user and programmer

- Clean up survey data

• To produce something for others to use - a programming job

- Fix a performance problem in the Sakai software

- Add a guestbook to a web site


[Figure: User, Programmer, Computer (Hardware + Software), Data, Information, Networks]

From a software creator’s point of view, we build the software. The end
users (stakeholders/actors) are our masters - who we want to please -
often they pay us money when they are pleased. But the data,
information, and networks are our problem to solve on their behalf.
The hardware and software are our friends and allies in this quest.
What is Code? Software? A Program?

• A sequence of stored instructions

- It is a little piece of our intelligence in the computer

- We figure something out and then we encode it and then give it to
someone else to save them the time and energy of figuring it out

• A piece of creative art - particularly when we do a good job on user
experience
Programs for Humans...

https://www.youtube.com/watch?v=XiBYM6g8Tck
Programs for Humans...
while music is playing:
Left hand out and up
Right hand out and up
Flip Left hand
Flip Right hand
Left hand to right shoulder
Right hand to left shoulder
Left hand to back of head
Right ham to back of head
Left hand to right hit
Right hand to left hit
Left hand on left bottom
Right hand on right bottom
Wiggle
Wiggle
Jump
https://www.youtube.com/watch?v=XiBYM6g8Tck
Programs for Humans...
while music is playing:
Left hand out and up
Right hand out and up
Flip Left hand
Flip Right hand
Left hand to right shoulder
Right hand to left shoulder
Left hand to back of head
Right hand to back of head
Left hand to right hip
Right hand to left hip
Left hand on left bottom
Right hand on right bottom
Wiggle
Wiggle
Jump
https://www.youtube.com/watch?v=XiBYM6g8Tck
text = input('Enter text:')

counts = dict()
for line in text:
    words = line.split()
    for word in words:
        counts[word] = counts.get(word,0) + 1

bigcount = None
bigword = None
for word,count in counts.items():
    if bigcount is None or count > bigcount:
        bigword = word
        bigcount = count

print(bigword, bigcount)
Hardware Architecture
http://upload.wikimedia.org/wikipedia/commons/3/3d/RaspberryPi.jpg
[Figure: generic computer - Software, Input and Output Devices, Central Processing Unit ("What Next?"), Main Memory, Secondary Memory]
Definitions
• Central Processing Unit: Runs the Program - The CPU is always wondering
“what to do next”. Not the brains exactly - very dumb but very very fast

• Input Devices: Keyboard, Mouse, Touch Screen

• Output Devices: Screen, Speakers, Printer, DVD Burner

• Main Memory: Fast small temporary storage - lost on reboot - aka RAM

• Secondary Memory: Slower large permanent storage - lasts until deleted - disk
drive / memory stick
[Figure: the same generic computer, shown with the program fragment "if x< 3: print"]
[Figure: the same generic computer, shown with Machine Language (01001001 00111001)]
Totally Hot CPU

http://www.youtube.com/watch?v=y39D4529FM4
Hard Disk in Action

http://www.youtube.com/watch?v=9eMWG3fwiEU
Python as a Language
Python is the language of the Python
Interpreter and those who can converse with
it. An individual who can speak Python is
known as a Pythonista. It is a very uncommon
skill, and may be hereditary. Nearly all known
Pythonistas use software initially developed
by Guido van Rossum.
Early Learner: Syntax Errors
• We need to learn the Python language so we can communicate our instructions to
Python. In the beginning we will make lots of mistakes and speak gibberish like
small children.

• When you make a mistake, the computer does not think you are “cute”. It says
“syntax error” - given that it knows the language and you are just learning it. It
seems like Python is cruel and unfeeling.

• You must remember that you are intelligent and can learn. The computer is
simple and very fast, but cannot learn. So it is easier for you to learn Python than
for the computer to learn English...
Talking to Python
csev$ python3
Python 3.5.1 (v3.5.1:37a07cee5969, Dec 5 2015, 21:12:44)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> x = 1
>>> print(x)
1
>>> x = x + 1
>>> print(x)
2
>>> exit()

The >>> prompt is Python asking "What next?"

This is a good test to make sure that you have Python correctly installed.
Note that quit() also works to end the interactive session.
What Do We Say?
Elements of Python
• Vocabulary / Words - Variables and Reserved words (Chapter 2)

• Sentence structure - valid syntax patterns (Chapters 3-5)

• Story structure - constructing a program for a purpose


A short “story” about how to count characters in Python:

text = input('Enter text:')

counts = dict()
for line in text:
    words = line.split()
    for word in words:
        counts[word] = counts.get(word,0) + 1

bigcount = None
bigword = None
for word,count in counts.items():
    if bigcount is None or count > bigcount:
        bigword = word
        bigcount = count

print(bigword, bigcount)
Reserved Words
You cannot use reserved words as variable names / identifiers

False    class    return   is       finally
None     if       for      lambda   continue
True     def      from     while    nonlocal
and      del      global   not      with
as       elif     try      or       yield
assert   else     import   pass
break    except   in       raise
Sentences or Lines

x = 2 Assignment statement
x = x + 2 Assignment with expression
print(x) Print statement

Variable Operator Constant Function


Programming Paragraphs
Python Scripts
• Interactive Python is good for experiments and programs 3-4 lines long.

• Most programs are much longer, so we type them into a file and tell
Python to run the commands in the file.

• In a sense, we are “giving Python a script”.

• As a convention, we add “.py” as the suffix on the end of these files to
indicate they contain Python.
Interactive versus Script
• Interactive

- You type directly to Python one line at a time and it responds

• Script

- You enter a sequence of statements (lines) into a file using a text
editor and tell Python to execute the statements in the file
Program Steps or Program Flow
• Like a recipe or installation instructions, a program is a sequence of
steps to be done in order.

• Some steps are conditional - they may be skipped.

• Sometimes a step or group of steps is to be repeated.

• Sometimes we store a set of steps to be used over and over as
needed several places throughout the program (Chapter 4).
Sequential Steps
Program:
x = 2
print(x)
x = x + 2
print(x)

Output:
2
4

When a program is running, it flows from one step to the next. As
programmers, we set up “paths” for the program to follow.
Conditional Steps
Program:
x = 5
if x < 10:
    print('Smaller')
if x > 20:
    print('Bigger')
print('Finis')

Output:
Smaller
Finis
Repeated Steps
Program:
n = 5
while n > 0 :
    print(n)
    n = n - 1
print('Blastoff!')

Output:
5
4
3
2
1
Blastoff!

Loops (repeated steps) have iteration variables that
change each time through a loop.
text = input('Enter text:')                         # Sequential

counts = dict()
for line in text:                                   # Repeated
    words = line.split()
    for word in words:
        counts[word] = counts.get(word,0) + 1

bigcount = None
bigword = None
for word,count in counts.items():
    if bigcount is None or count > bigcount:        # Conditional
        bigword = word
        bigcount = count

print(bigword, bigcount)
A short Python “Story” about how to count characters

# A word used to read data from a user
text = input('Enter text:')

counts = dict()
for line in text:
    words = line.split()
    for word in words:
        # A sentence about updating one of the many counts
        counts[word] = counts.get(word,0) + 1

# A paragraph about how to find the largest item in a list
bigcount = None
bigword = None
for word,count in counts.items():
    if bigcount is None or count > bigcount:
        bigword = word
        bigcount = count

print(bigword, bigcount)
Summary

• This is a quick overview of Chapter 1

• We will revisit these concepts throughout the course

• Focus on the big picture


Acknowledgements / Contributions
These slides are Copyright 2010- Charles R. Severance
(www.dr-chuck.com) of the University of Michigan School of
Information and made available under a Creative Commons
Attribution 4.0 License. Please maintain this last slide in all
copies of the document to comply with the attribution
requirements of the license. If you make a change, feel free to
add your name and organization to the list of contributors on this
page as you republish the materials.

Initial Development: Charles Severance, University of Michigan
School of Information

… Insert new Contributors and Translators here


Variables, Expressions, and
Statements
Chapter 2

Python for Everybody


www.py4e.com
Constants
• Fixed values such as numbers, letters, and strings are called
“constants” because their value does not change
• Numeric constants are as you expect
• String constants use single quotes (') or double quotes (")

>>> print(123)
123
>>> print(98.6)
98.6
>>> print('Hello world')
Hello world
Reserved Words
You cannot use reserved words as variable names / identifiers

False    class    return   is       finally
None     if       for      lambda   continue
True     def      from     while    nonlocal
and      del      global   not      with
as       elif     try      or       yield
assert   else     import   pass
break    except   in       raise
Variables
• A variable is a named place in the memory where a programmer can store
data and later retrieve the data using the variable “name”

• Programmers get to choose the names of the variables

• You can change the contents of a variable in a later statement

x = 12.2        x holds 12.2
y = 14          y holds 14
x = 100         x now holds 100, replacing 12.2
Python Variable Name Rules
• Must start with a letter or underscore _

• Must consist of letters, numbers, and underscores

• Case Sensitive

Good: spam eggs spam23 _speed


Bad: 23spam #sign var.12
Different: spam Spam SPAM
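
The naming rules above can be checked mechanically. This is a minimal sketch, assuming the standard-library `str.isidentifier()` method and the `keyword` module; the helper name `is_valid_name` is made up for illustration. Note that `isidentifier()` also accepts non-ASCII letters, which the slide does not cover.

```python
import keyword

def is_valid_name(name):
    """Return True if name could be used as a Python variable name."""
    # isidentifier() checks the character rules (letters, digits, underscore,
    # no leading digit); iskeyword() rejects reserved words such as 'class'.
    return name.isidentifier() and not keyword.iskeyword(name)

for candidate in ['spam', 'eggs', 'spam23', '_speed',
                  '23spam', '#sign', 'var.12', 'class']:
    print(candidate, is_valid_name(candidate))

# Names are case sensitive, so these are three different variables.
print(len({'spam', 'Spam', 'SPAM'}))
```

The names from the "Good" row should come back True and those from the "Bad" row False; `class` is rejected because it is a reserved word.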
Mnemonic Variable Names
• Since we programmers are given a choice in how we choose our
variable names, there is a bit of “best practice”
• We name variables to help us remember what we intend to store
in them (“mnemonic” = “memory aid”)
• This can confuse beginning students because well-named
variables often “sound” so good that they must be keywords

http://en.wikipedia.org/wiki/Mnemonic
x1q3z9ocd = 35.0
x1q3z9afd = 12.50
x1q3p9afd = x1q3z9ocd * x1q3z9afd
print(x1q3p9afd)

What is this bit of code doing?
x1q3z9ocd = 35.0
x1q3z9afd = 12.50
x1q3p9afd = x1q3z9ocd * x1q3z9afd
print(x1q3p9afd)

a = 35.0
b = 12.50
c = a * b
print(c)

hours = 35.0
rate = 12.50
pay = hours * rate
print(pay)

What are these bits of code doing?
Sentences or Lines

x = 2 Assignment statement
x = x + 2 Assignment with expression
print(x) Print statement

Variable Operator Constant Function


Assignment Statements
• We assign a value to a variable using the assignment statement (=)

• An assignment statement consists of an expression on the
right-hand side and a variable to store the result

x = 3.9 * x * ( 1 - x )

A variable is a memory location used to store a value (here x holds 0.6).

The right side is an expression. With x = 0.6 it evaluates as
3.9 * 0.6 * ( 1 - 0.6 ) = 3.9 * 0.6 * 0.4 = 0.936.

Once the expression is evaluated, the result is placed in (assigned to)
the variable on the left side (i.e., x). The value stored in a variable
can be updated by replacing the old value (0.6) with a new value (0.936).
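
The update described above can be tried directly. This is a small sketch using the slide's starting value of 0.6; the call to `round()` is added here only to hide floating-point noise in the printed result.

```python
x = 0.6
print('old value:', x)

# The right side is evaluated first, using the old value of x,
# and only then is the result stored back into x.
x = 3.9 * x * (1 - x)
print('new value:', round(x, 3))
```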
Expressions…
Numeric Expressions
• Because of the lack of mathematical symbols on computer keyboards - we
use “computer-speak” to express the classic math operations

• Asterisk is multiplication

• Exponentiation (raise to a power) looks different than in math

Operator   Operation
+          Addition
-          Subtraction
*          Multiplication
/          Division
**         Power
%          Remainder
Numeric Expressions
>>> xx = 2
>>> xx = xx + 2
>>> print(xx)
4
>>> yy = 440 * 12
>>> print(yy)
5280
>>> zz = yy / 1000
>>> print(zz)
5.28
>>> jj = 23
>>> kk = jj % 5
>>> print(kk)
3
>>> print(4 ** 3)
64

Long division: 23 divided by 5 is 4 remainder 3 (5 * 4 = 20, 23 - 20 = 3)
Order of Evaluation
• When we string operators together - Python must know which one
to do first

• This is called “operator precedence”

• Which operator “takes precedence” over the others?

x = 1 + 2 * 3 - 4 / 5 ** 6
Operator Precedence Rules
Highest precedence rule to lowest precedence rule:

• Parentheses are always respected

• Exponentiation (raise to a power)

• Multiplication, Division, and Remainder

• Addition and Subtraction

• Left to right
>>> x = 1 + 2 ** 3 / 4 * 5
>>> print(x)
11.0
>>>

1 + 2 ** 3 / 4 * 5
1 + 8 / 4 * 5
1 + 2 * 5
1 + 10
11
Operator Precedence
• Remember the rules top to bottom: Parentheses, Power, Multiplication,
Addition, Left to Right

• When writing code - use parentheses

• When writing code - keep mathematical expressions simple enough
that they are easy to understand

• Break long series of mathematical operations up to make them
more clear
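
As an illustration of this advice, the sketch below computes the slide's expression three ways: relying on precedence, breaking it into named steps, and with explicit parentheses. The variable names (`step_power` and so on) are made up for this example.

```python
# Relying on operator precedence alone
x = 1 + 2 ** 3 / 4 * 5

# The same computation broken into small, named steps
step_power = 2 ** 3              # power is done first
step_divide = step_power / 4     # then division (left to right)
step_multiply = step_divide * 5  # then multiplication
y = 1 + step_multiply            # addition is done last

# The same computation with explicit parentheses
grouped = 1 + (((2 ** 3) / 4) * 5)

print(x, y, grouped)
```

All three values are equal; the later forms just make the order visible to the reader.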
What Does “Type” Mean?
• In Python variables, literals, and constants have a “type”

• Python knows the difference between an integer number and a string

• For example “+” means “addition” if something is a number and
“concatenate” if something is a string

>>> ddd = 1 + 4
>>> print(ddd)
5
>>> eee = 'hello ' + 'there'
>>> print(eee)
hello there

concatenate = put together
Type Matters
• Python knows what “type” everything is

• Some operations are prohibited

• You cannot “add 1” to a string

• We can ask Python what type something is by using the type() function

>>> eee = 'hello ' + 'there'
>>> eee = eee + 1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Can't convert 'int' object to str implicitly
>>> type(eee)
<class 'str'>
>>> type('hello')
<class 'str'>
>>> type(1)
<class 'int'>
>>>
Several Types of Numbers
• Numbers have two main types

- Integers are whole numbers: -14, -2, 0, 1, 100, 401233

- Floating Point Numbers have decimal parts: -2.5 , 0.0, 98.6, 14.0

• There are other number types - they are variations on float and integer

>>> xx = 1
>>> type (xx)
<class 'int'>
>>> temp = 98.6
>>> type(temp)
<class 'float'>
>>> type(1)
<class 'int'>
>>> type(1.0)
<class 'float'>
>>>
Type Conversions
• When you put an integer and floating point in an expression, the
integer is implicitly converted to a float

• You can control this with the built-in functions int() and float()

>>> print(float(99) + 100)
199.0
>>> i = 42
>>> type(i)
<class 'int'>
>>> f = float(i)
>>> print(f)
42.0
>>> type(f)
<class 'float'>
>>>
Integer Division
Integer division produces a floating point result

>>> print(10 / 2)
5.0
>>> print(9 / 2)
4.5
>>> print(99 / 100)
0.99
>>> print(10.0 / 2.0)
5.0
>>> print(99.0 / 100.0)
0.99

This was different in Python 2.x
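
For comparison, Python 3 also has a floor-division operator `//` that keeps an integer result, which is roughly what `/` did for two integers in Python 2.x. The operator and the built-in `divmod()` are standard Python but are not covered on the slide above, so treat this as a supplementary sketch.

```python
true_quotient = 9 / 2     # true division: always a float in Python 3
floor_quotient = 9 // 2   # floor division: whole-number part only
remainder = 9 % 2         # remainder left over after floor division

print(true_quotient, floor_quotient, remainder)

# divmod() returns the floor quotient and the remainder together
quotient, rest = divmod(23, 5)
print(quotient, rest)
```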
String Conversions
• You can also use int() and float() to convert between strings and integers

• You will get an error if the string does not contain numeric characters

>>> sval = '123'
>>> type(sval)
<class 'str'>
>>> print(sval + 1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Can't convert 'int' object to str implicitly
>>> ival = int(sval)
>>> type(ival)
<class 'int'>
>>> print(ival + 1)
124
>>> nsv = 'hello bob'
>>> niv = int(nsv)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: 'hello bob'
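
The conversions can also be run in both directions. This is a brief sketch; the variable names `fval`, `back`, and `truncated` are made up for illustration, and `str()` is the standard built-in for converting a number back to a string.

```python
sval = '123'
ival = int(sval)         # string -> integer
fval = float('98.6')     # string -> floating point
back = str(ival + 1)     # number -> string

print(ival + 1, fval, back)
print(type(ival), type(fval), type(back))

# int() applied to a float drops the fractional part rather than rounding
truncated = int(98.6)
print(truncated)
```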
User Input
• We can instruct Python to pause and read data from the user using
the input() function
• The input() function returns a string

nam = input('Who are you? ')
print('Welcome', nam)

Who are you? Chuck
Welcome Chuck
Converting User Input
• If we want to read a number from the user, we must convert it from
a string to a number using a type conversion function
• Later we will deal with bad input data

inp = input('Europe floor?')
usf = int(inp) + 1
print('US floor', usf)

Europe floor? 0
US floor 1
Comments in Python
• Anything after a # is ignored by Python

• Why comment?

- Describe what is going to happen in a sequence of code

- Document who wrote the code or other ancillary information

- Turn off a line of code - perhaps temporarily
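
The three uses listed above can appear together in one short script. This sketch reuses the hours / rate example from earlier in the chapter; the author line and the debug line are placeholders made up for illustration.

```python
# Compute gross pay from hours worked and hourly rate.
# (Describes what is going to happen in the code below.)
# Written by: (your name here)

hours = 35.0
rate = 12.50
pay = hours * rate

# print('debug:', hours, rate)   # turned off - perhaps temporarily

print(pay)  # a comment can also follow code on the same line
```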


Summary
• Type • Integer Division

• Reserved words • Conversion between types

• Variables (mnemonic) • User input

• Operators • Comments (#)

• Operator precedence
Exercise

Write a program to prompt the user for hours
and rate per hour to compute gross pay.

Enter Hours: 35
Enter Rate: 2.75

Pay: 96.25


Conditional Execution
Chapter 3

Python for Everybody


www.py4e.com
Conditional Steps
Program:
x = 5
if x < 10:
    print('Smaller')
if x > 20:
    print('Bigger')
print('Finis')

Output:
Smaller
Finis
Comparison Operators
• Boolean expressions ask a question and produce a Yes or No
result which we use to control program flow

• Boolean expressions using comparison operators evaluate to
True / False or Yes / No

• Comparison operators look at variables but do not change the
variables

Python   Meaning
<        Less than
<=       Less than or Equal to
==       Equal to
>=       Greater than or Equal to
>        Greater than
!=       Not equal

Remember: “=” is used for assignment.

http://en.wikipedia.org/wiki/George_Boole
Comparison Operators
x = 5
if x == 5 :
    print('Equals 5')
if x > 4 :
    print('Greater than 4')
if x >= 5 :
    print('Greater than or Equals 5')
if x < 6 : print('Less than 6')
if x <= 5 :
    print('Less than or Equals 5')
if x != 6 :
    print('Not equal 6')

Output:
Equals 5
Greater than 4
Greater than or Equals 5
Less than 6
Less than or Equals 5
Not equal 6
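
A comparison can also be stored and printed like any other value, which makes the True / False result visible. This is a small sketch; the variable name `result` is made up for illustration.

```python
x = 5

result = x < 10          # the comparison produces a Boolean value
print(result, type(result))

print(x == 5, x != 6, x >= 5, x > 20)

# Comparisons look at x but do not change it
print(x)
```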
One-Way Decisions
Program:
x = 5
print('Before 5')
if x == 5 :
    print('Is 5')
    print('Is Still 5')
    print('Third 5')
print('Afterwards 5')
print('Before 6')
if x == 6 :
    print('Is 6')
    print('Is Still 6')
    print('Third 6')
print('Afterwards 6')

Output:
Before 5
Is 5
Is Still 5
Third 5
Afterwards 5
Before 6
Afterwards 6
Indentation
• Increase indent after an if statement or for statement (after : )

• Maintain indent to indicate the scope of the block (which lines are affected
by the if/for)

• Reduce indent back to the level of the if statement or for statement to
indicate the end of the block

• Blank lines are ignored - they do not affect indentation

• Comments on a line by themselves are ignored with regard to indentation


Warning: Turn Off Tabs!!
• Atom automatically uses spaces for files with ".py" extension (nice!)

• Most text editors can turn tabs into spaces - make sure to enable this
feature

- NotePad++: Settings -> Preferences -> Language Menu/Tab Settings

- TextWrangler: TextWrangler -> Preferences -> Editor Defaults

• Python cares a *lot* about how far a line is indented. If you mix tabs and
spaces, you may get “indentation errors” even if everything looks fine
This will save you much unnecessary pain.
increase / maintain after if or for
decrease to indicate end of block
x = 5
if x > 2 :
    print('Bigger than 2')
    print('Still bigger')
print('Done with 2')

for i in range(5) :
    print(i)
    if i > 2 :
        print('Bigger than 2')
    print('Done with i', i)
print('All Done')
Think About begin/end Blocks
x = 5
if x > 2 :
    print('Bigger than 2')
    print('Still bigger')
print('Done with 2')

for i in range(5) :
    print(i)
    if i > 2 :
        print('Bigger than 2')
    print('Done with i', i)
print('All Done')
Nested Decisions
x = 42
if x > 1 :
    print('More than one')
    if x < 100 :
        print('Less than 100')
print('All done')
Two-way Decisions
• Sometimes we want to do one thing if a logical expression is true and
something else if the expression is false

• It is like a fork in the road - we must choose one or the other path but
not both

[Figure: flowchart - x = 4; if x > 2 print('Bigger'), otherwise print('Not bigger'); then print('All Done')]
Two-way Decisions with else:
x = 4

if x > 2 :
    print('Bigger')
else :
    print('Smaller')

print('All done')
Visualize Blocks
x = 4

if x > 2 :
    print('Bigger')
else :
    print('Smaller')

print('All done')
More Conditional Structures…
Multi-way
if x < 2 :
    print('small')
elif x < 10 :
    print('Medium')
else :
    print('LARGE')
print('All done')
Multi-way
x = 0
if x < 2 :
    print('small')
elif x < 10 :
    print('Medium')
else :
    print('LARGE')
print('All done')

Multi-way
x = 5
if x < 2 :
    print('small')
elif x < 10 :
    print('Medium')
else :
    print('LARGE')
print('All done')

Multi-way
x = 20
if x < 2 :
    print('small')
elif x < 10 :
    print('Medium')
else :
    print('LARGE')
print('All done')
Multi-way
# No Else
x = 5
if x < 2 :
    print('Small')
elif x < 10 :
    print('Medium')

print('All done')

if x < 2 :
    print('Small')
elif x < 10 :
    print('Medium')
elif x < 20 :
    print('Big')
elif x < 40 :
    print('Large')
elif x < 100:
    print('Huge')
else :
    print('Ginormous')
Multi-way Puzzles
Which will never print regardless of the value for x?

if x < 2 :
    print('Below 2')
elif x >= 2 :
    print('Two or more')
else :
    print('Something else')

if x < 2 :
    print('Below 2')
elif x < 20 :
    print('Below 20')
elif x < 10 :
    print('Below 10')
else :
    print('Something else')
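
One way to check an answer to the second puzzle is to run the chain over many values and collect which messages ever appear. This is a rough sketch: the function name `classify` and the test range are assumptions made for illustration, and trying a range of integers is evidence rather than proof.

```python
def classify(x):
    """Return the message the second puzzle's chain would print for x."""
    if x < 2:
        return 'Below 2'
    elif x < 20:
        return 'Below 20'
    elif x < 10:
        # Any x below 10 is also below 20, so the branch above wins first.
        return 'Below 10'
    else:
        return 'Something else'

# Collect every message produced for the integers -50 through 50
seen = {classify(x) for x in range(-50, 51)}
print(sorted(seen))
```

Because branches are tested top to bottom and only the first true one runs, the order of the tests matters.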
The try / except Structure

• You surround a dangerous section of code with try and except

• If the code in the try works - the except is skipped

• If the code in the try fails - it jumps to the except section


$ cat notry.py
astr = 'Hello Bob'
istr = int(astr)
print('First', istr)
astr = '123'
istr = int(astr)
print('Second', istr)

$ python3 notry.py
Traceback (most recent call last):
  File "notry.py", line 2, in <module>
    istr = int(astr)
ValueError: invalid literal for int() with base 10: 'Hello Bob'

The program stops here (at line 2); the remaining lines never run.
[Figure: generic computer - Software, Input Devices, Output Devices, Central Processing Unit, Main Memory, Secondary Memory]
astr = 'Hello Bob'
try:
    istr = int(astr)
except:
    istr = -1

print('First', istr)

astr = '123'
try:
    istr = int(astr)
except:
    istr = -1

print('Second', istr)

$ python tryexcept.py
First -1
Second 123

When the first conversion fails - it just drops into the except: clause
and the program continues.

When the second conversion succeeds - it just skips the except: clause
and the program continues.
try / except
astr = 'Bob'
try:
    print('Hello')
    istr = int(astr)
    print('There')
except:
    istr = -1

print('Done', istr)

[Figure: flowchart of the code above, with the except: block labeled "Safety net"]


Sample try / except
rawstr = input('Enter a number:')
try:
    ival = int(rawstr)
except:
    ival = -1

if ival > 0 :
    print('Nice work')
else:
    print('Not a number')

$ python3 trynum.py
Enter a number:42
Nice work
$ python3 trynum.py
Enter a number:forty-two
Not a number
$
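
The slides use a bare `except:`, which catches every kind of error. A common refinement, shown here as a sketch rather than as the course's method, is to name the error you expect (`ValueError` for a failed `int()` conversion). The helper name `to_int` and its `default` parameter are made up for illustration.

```python
def to_int(rawstr, default=-1):
    """Convert rawstr to an integer, or return default if it is not numeric."""
    try:
        return int(rawstr)
    except ValueError:
        # Only a failed conversion lands here; other errors are not hidden.
        return default

for rawstr in ['42', 'forty-two']:
    ival = to_int(rawstr)
    if ival > 0:
        print('Nice work')
    else:
        print('Not a number')
```

As in the slide's version, using -1 as the failure value means zero and negative numbers are also reported as "Not a number".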
Summary
• Comparison operators == <= >= > < !=

• Indentation

• One-way Decisions

• Two-way decisions: if: and else:

• Nested Decisions

• Multi-way decisions using elif

• try / except to compensate for errors
Exercise

Rewrite your pay computation to give the
employee 1.5 times the hourly rate for hours
worked above 40 hours.

Enter Hours: 45
Enter Rate: 10

Pay: 475.0
475 = 40 * 10 + 5 * 15
Exercise

Rewrite your pay program using try and except so
that your program handles non-numeric input
gracefully.

Enter Hours: 20
Enter Rate: nine
Error, please enter numeric input

Enter Hours: forty
Error, please enter numeric input


Loops and Iteration
Chapter 5

Python for Everybody


www.py4e.com
Repeated Steps
Program:
n = 5
while n > 0 :
    print(n)
    n = n - 1
print('Blastoff!')
print(n)

Output:
5
4
3
2
1
Blastoff!
0

Loops (repeated steps) have iteration variables that
change each time through a loop. Often these iteration
variables go through a sequence of numbers.
An Infinite Loop
n = 5
while n > 0 :
    print('Lather')
    print('Rinse')
print('Dry off!')

What is wrong with this loop?

Another Loop
n = 0
while n > 0 :
    print('Lather')
    print('Rinse')
print('Dry off!')

What is this loop doing?

Breaking Out of a Loop
• The break statement ends the current loop and jumps to the
statement immediately following the loop

• It is like a loop test that can happen anywhere in the body of the
loop
while True:
    line = input('> ')
    if line == 'done' :
        break
    print(line)
print('Done!')

> hello there
hello there
> finished
finished
> done
Done!
[Figure: flowchart of the loop above, with break jumping out of the loop to print('Done')]
http://en.wikipedia.org/wiki/Transporter_(Star_Trek)
Finishing an Iteration with continue
The continue statement ends the current iteration and jumps to the
top of the loop and starts the next iteration

while True:
    line = input('> ')
    if line[0] == '#' :
        continue
    if line == 'done' :
        break
    print(line)
print('Done!')

> hello there
hello there
> # don't print this
> print this!
print this!
> done
Done!
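
The same continue / break logic can be exercised without typing at a prompt by feeding it a list of lines. This is a sketch under assumptions: the function name `echo_lines` and the sample lines are made up for illustration, and `startswith('#')` is used in place of `line[0] == '#'` because `line[0]` raises an IndexError when the line is empty.

```python
def echo_lines(lines):
    """Return the lines the loop would print, stopping at 'done'."""
    printed = []
    for line in lines:
        if line.startswith('#'):   # safe even when line is an empty string
            continue               # skip the rest of this iteration
        if line == 'done':
            break                  # leave the loop entirely
        printed.append(line)
    return printed

sample = ['hello there', "# don't print this", '', 'print this!', 'done', 'ignored']
print(echo_lines(sample))
```

Lines after 'done' are never examined, and the comment line is skipped without ending the loop.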
[Figure: flowchart of the loop above, with continue jumping back to the top of the loop]
Indefinite Loops

• While loops are called “indefinite loops” because they keep
going until a logical condition becomes False

• The loops we have seen so far are pretty easy to examine to see
if they will terminate or if they will be “infinite loops”

• Sometimes it is a little harder to be sure if a loop will terminate
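
As an illustration of a loop whose termination is hard to see by inspection, the sketch below uses the well-known "3n + 1" (Collatz) rule. This example is not from the slides: the function name `collatz_steps` and the `max_steps` safety limit are assumptions added here.

```python
def collatz_steps(n, max_steps=1000):
    """Count steps until n reaches 1, or return None if max_steps is hit."""
    steps = 0
    while n != 1:
        if steps >= max_steps:
            return None        # give up rather than risk looping forever
        if n % 2 == 0:
            n = n // 2         # even: halve it
        else:
            n = 3 * n + 1      # odd: triple it and add one
        steps = steps + 1
    return steps

print(collatz_steps(6))
```

Nothing in the loop test itself makes it obvious that n will ever reach 1, which is why a step limit is a reasonable guard when you are unsure.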


Definite Loops
Iterating over a set of items…
Definite Loops
• Quite often we have a list of items or the lines in a file -
effectively a finite set of things

• We can write a loop to run the loop once for each of the items in
a set using the Python for construct

• These loops are called “definite loops” because they execute an


exact number of times

• We say that “definite loops iterate through the members of a set”


A Simple Definite Loop
for i in [5, 4, 3, 2, 1] :
    print(i)
print('Blastoff!')

5
4
3
2
1
Blastoff!
A Definite Loop with Strings
friends = ['Joseph', 'Glenn', 'Sally']
for friend in friends :
    print('Happy New Year:', friend)
print('Done!')

Happy New Year: Joseph
Happy New Year: Glenn
Happy New Year: Sally
Done!
A Simple Definite Loop
[Flowchart: a "Done?" test; if No, move i ahead and run print(i); if Yes, run print('Blastoff!')]

Definite loops (for loops) have explicit iteration variables
that change each time through a loop. These iteration
variables move through the sequence or set.
Looking at in...
• The iteration variable “iterates” through the sequence (ordered set)

• The block (body) of code is executed once for each value in the sequence

• The iteration variable moves through all of the values in the sequence

for i in [5, 4, 3, 2, 1] :
    print(i)

Here i is the iteration variable and [5, 4, 3, 2, 1] is a five-element sequence.
[Flowchart: a "Done?" test; if No, move i ahead and run print(i)]

for i in [5, 4, 3, 2, 1] :
    print(i)

The loop behaves as if we had written:

i = 5
print(i)
i = 4
print(i)
i = 3
print(i)
i = 2
print(i)
i = 1
print(i)
Loop Idioms: What We Do in Loops

Note: Even though these examples are simple,
the patterns apply to all kinds of loops

Making “smart” loops
The trick is “knowing” something about the whole loop when you
are stuck writing code that only sees one entry at a time

Set some variables to initial values
for thing in data:
    Look for something or do something to each entry separately,
    updating a variable
Look at the variables

Looping Through a Set
print('Before')
for thing in [9, 41, 12, 3, 74, 15] :
    print(thing)
print('After')

$ python basicloop.py
Before
9
41
12
3
74
15
After
What is the Largest Number?
[Animation: the numbers are revealed one at a time while largest_so_far is updated]

3 41 12 9 74 15

Number seen    largest_so_far
(start)        -1
3              3
41             41
12             41
9              41
74             74
15             74

The largest number is 74.
Finding the Largest Value
largest_so_far = -1
print('Before', largest_so_far)
for the_num in [9, 41, 12, 3, 74, 15] :
    if the_num > largest_so_far :
        largest_so_far = the_num
    print(largest_so_far, the_num)

print('After', largest_so_far)

$ python largest.py
Before -1
9 9
41 41
41 12
41 3
74 74
74 15
After 74

We make a variable that contains the largest value we have seen so far. If the current
number we are looking at is larger, it is the new largest value we have seen so far.
More Loop Patterns…
Counting in a Loop
zork = 0
print('Before', zork)
for thing in [9, 41, 12, 3, 74, 15] :
    zork = zork + 1
    print(zork, thing)
print('After', zork)

$ python countloop.py
Before 0
1 9
2 41
3 12
4 3
5 74
6 15
After 6

To count how many times we execute a loop, we introduce a counter variable
that starts at 0 and we add one to it each time through the loop.
Summing in a Loop
zork = 0
print('Before', zork)
for thing in [9, 41, 12, 3, 74, 15] :
    zork = zork + thing
    print(zork, thing)
print('After', zork)

$ python countloop.py
Before 0
9 9
50 41
62 12
65 3
139 74
154 15
After 154

To add up the values we encounter in a loop, we introduce a sum variable that
starts at 0 and we add each value to the sum each time through the loop.
Finding the Average in a Loop
count = 0
sum = 0
print('Before', count, sum)
for value in [9, 41, 12, 3, 74, 15] :
    count = count + 1
    sum = sum + value
    print(count, sum, value)
print('After', count, sum, sum / count)

$ python averageloop.py
Before 0 0
1 9 9
2 50 41
3 62 12
4 65 3
5 139 74
6 154 15
After 6 154 25.666

An average just combines the counting and sum patterns and
divides when the loop is done.
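One detail the averaging pattern leaves open is what happens when the loop never runs, since dividing by a count of 0 fails. A minimal sketch of one way to guard the division follows. The name total (instead of sum) and the choice of None for "no average" are assumptions made for this illustration.

```python
count = 0
total = 0          # total instead of sum, so the built-in sum() stays usable
for value in [9, 41, 12, 3, 74, 15]:
    count = count + 1
    total = total + value

if count > 0:
    average = total / count
else:
    average = None  # assumption: no values means no average
print('After', count, total, average)
```

With an empty list in place of the numbers, the else branch runs and the program finishes without a ZeroDivisionError.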
Filtering in a Loop
print('Before')
for value in [9, 41, 12, 3, 74, 15] :
    if value > 20:
        print('Large number',value)
print('After')

$ python search1.py
Before
Large number 41
Large number 74
After

We use an if statement in the loop to catch / filter the
values we are looking for.
Search Using a Boolean Variable
found = False
print('Before', found)
for value in [9, 41, 12, 3, 74, 15] :
    if value == 3 :
        found = True
    print(found, value)
print('After', found)

$ python search1.py
Before False
False 9
False 41
False 12
True 3
True 74
True 15
After True

If we just want to search and know if a value was found, we use a variable that
starts at False and is set to True as soon as we find what we are looking for.
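The search pattern combines naturally with break: once the value is found there is no need to look at the remaining entries. The sketch below assumes we only care whether 3 is present. The checked counter is a hypothetical addition, there only to show how many entries were examined.

```python
found = False
checked = 0   # hypothetical counter: how many entries we looked at
for value in [9, 41, 12, 3, 74, 15]:
    checked = checked + 1
    if value == 3:
        found = True
        break   # stop searching as soon as we have our answer
print('After', found, 'checked', checked)
```

Here the loop stops after the fourth entry instead of visiting all six.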
How to Find the Smallest Value
largest_so_far = -1
print('Before', largest_so_far)
for the_num in [9, 41, 12, 3, 74, 15] :
    if the_num > largest_so_far :
        largest_so_far = the_num
    print(largest_so_far, the_num)

print('After', largest_so_far)

$ python largest.py
Before -1
9 9
41 41
41 12
41 3
74 74
74 15
After 74

How would we change this to make it find the smallest value in the list?
Finding the Smallest Value
smallest_so_far = -1
print('Before', smallest_so_far)
for the_num in [9, 41, 12, 3, 74, 15] :
    if the_num < smallest_so_far :
        smallest_so_far = the_num
    print(smallest_so_far, the_num)

print('After', smallest_so_far)

$ python smallbad.py
Before -1
-1 9
-1 41
-1 12
-1 3
-1 74
-1 15
After -1

We switched the variable name to smallest_so_far and switched the > to <.
The result is wrong: no number in the list is smaller than the initial
value of -1, so smallest_so_far never changes.
Finding the Smallest Value
smallest = None
print('Before')
for value in [9, 41, 12, 3, 74, 15] :
    if smallest is None :
        smallest = value
    elif value < smallest :
        smallest = value
    print(smallest, value)
print('After', smallest)

$ python smallest.py
Before
9 9
9 41
9 12
3 3
3 74
3 15
After 3

We still have a variable that is the smallest so far. The first time through the loop
smallest is None, so we take the first value to be the smallest.
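The same None starting point also repairs the largest-value loop, which quietly depends on every number being greater than -1. A short sketch with hypothetical all-negative data shows the idea:

```python
largest = None
for value in [-9, -41, -12, -3, -74, -15]:   # hypothetical all-negative data
    if largest is None:
        largest = value          # first value seen becomes the starting point
    elif value > largest:
        largest = value
print('Largest', largest)
```

Starting from -1 would have reported -1 here, a number that is not in the list at all.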
The is and is not Operators
• Python has an is operator that can be used in logical expressions

• Implies “is the same as”

• Similar to, but stronger than ==

• is not is also a logical operator

smallest = None
print('Before')
for value in [3, 41, 12, 9, 74, 15] :
    if smallest is None :
        smallest = value
    elif value < smallest :
        smallest = value
    print(smallest, value)
print('After', smallest)
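To make "stronger than ==" concrete, here is a small sketch comparing the two operators. The names a, b and c are made up for the illustration. == asks whether two values are equal, while is asks whether two names refer to the very same object.

```python
a = [1, 2, 3]
b = [1, 2, 3]   # equal contents, but a separate list object
c = a           # a second name for the very same object

print(a == b)       # True: same contents
print(a is b)       # False: two different objects
print(a is c)       # True: one object with two names

smallest = None
print(smallest is None)      # True
print(smallest is not None)  # False
```

In practice is and is not are mostly used for checks against None, True and False, as in the smallest-value loop.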
Summary
• While loops (indefinite)
• Infinite loops
• Using break
• Using continue
• None constants and variables
• For loops (definite)
• Iteration variables
• Loop idioms
• Largest or smallest
Acknowledgements / Contributions
These slides are Copyright 2010- Charles R. Severance ...
(www.dr-chuck.com) of the University of Michigan School of
Information and open.umich.edu and made available under a
Creative Commons Attribution 4.0 License. Please maintain this
last slide in all copies of the document to comply with the
attribution requirements of the license. If you make a change,
feel free to add your name and organization to the list of
contributors on this page as you republish the materials.

Initial Development: Charles Severance, University of Michigan
School of Information

… Insert new Contributors and Translators here


Functions
Chapter 4

Python for Everybody


www.py4e.com
Stored (and reused) Steps
Program:

def thing():
    print('Hello')
    print('Fun')

thing()
print('Zip')
thing()

Output:

Hello
Fun
Zip
Hello
Fun

We call these reusable pieces of code “functions”
Python Functions
• There are two kinds of functions in Python.

- Built-in functions that are provided as part of Python - print(),
input(), type(), float(), int() ...

- Functions that we define ourselves and then use

• We treat built-in function names as “new” reserved words
(i.e., we avoid them as variable names)
Function Definition
• In Python a function is some reusable code that takes
argument(s) as input, does some computation, and then returns
a result or results

• We define a function using the def reserved word

• We call/invoke the function by using the function name,
parentheses, and arguments in an expression
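big = max('Hello world')

Here 'Hello world' is the argument, big = is the assignment, and 'w' is the result.

>>> big = max('Hello world')
>>> print(big)
w
>>> tiny = min('Hello world')
>>> print(tiny)

>>>

The smallest character in 'Hello world' is the space, so print(tiny) shows what looks like a blank line.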
Max Function
>>> big = max('Hello world')
>>> print(big)
w

A function is some stored code that we use. A function takes
some input and produces an output.

'Hello world' (a string) -> max() function -> 'w' (a string)

def max(inp):
    blah
    blah
    for x in inp:
        blah
        blah

Guido wrote this code
Type Conversions
• When you put an integer and floating point in an expression, the integer
is implicitly converted to a float

• You can control this with the built-in functions int() and float()

>>> print(float(99) / 100)
0.99
>>> i = 42
>>> type(i)
<class 'int'>
>>> f = float(i)
>>> print(f)
42.0
>>> type(f)
<class 'float'>
>>> print(1 + 2 * float(3) / 4 - 5)
-2.5
>>>
String Conversions
• You can also use int() and float() to convert between strings and integers

• You will get an error if the string does not contain numeric characters

>>> sval = '123'
>>> type(sval)
<class 'str'>
>>> print(sval + 1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can only concatenate str (not "int") to str
>>> ival = int(sval)
>>> type(ival)
<class 'int'>
>>> print(ival + 1)
124
>>> nsv = 'hello bob'
>>> niv = int(nsv)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: 'hello bob'
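When the string comes from a user, the failed conversion can be caught with try and except instead of stopping the program. The following is a sketch only: the list of sample strings and the use of -1 as a "not a number" marker are assumptions chosen for the illustration.

```python
results = []
for sval in ['123', 'hello bob']:
    try:
        ival = int(sval)
    except ValueError:
        ival = -1   # assumption: -1 marks "could not convert" in this sketch
    results.append(ival)
print(results)      # [123, -1]
```

A marker like -1 only works when -1 is not itself valid input. None is a common alternative.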
Functions of Our Own…
Building our Own Functions
• We create a new function using the def keyword followed by
the function name and optional parameters in parentheses

• We indent the body of the function

• This defines the function but does not execute the body of the
function

def print_lyrics():
    print("I'm a lumberjack, and I'm okay.")
    print('I sleep all night and I work all day.')

x = 5
print('Hello')

def print_lyrics():
    print("I'm a lumberjack, and I'm okay.")
    print('I sleep all night and I work all day.')

print('Yo')
x = x + 2
print(x)

Hello
Yo
7
Definitions and Uses
• Once we have defined a function, we can call (or invoke) it
as many times as we like

• This is the store and reuse pattern

x = 5
print('Hello')

def print_lyrics():
    print("I'm a lumberjack, and I'm okay.")
    print('I sleep all night and I work all day.')

print('Yo')
print_lyrics()
x = x + 2
print(x)

Hello
Yo
I'm a lumberjack, and I'm okay.
I sleep all night and I work all day.
7
Arguments
• An argument is a value we pass into the function as its input
when we call the function

• We use arguments so we can direct the function to do different
kinds of work when we call it at different times

• We put the arguments in parentheses after the name of the
function

big = max('Hello world')

Here 'Hello world' is the argument.
Parameters
A parameter is a variable which we use in the function definition.
It is a “handle” that allows the code in the function to access
the arguments for a particular function invocation.

>>> def greet(lang):
...     if lang == 'es':
...         print('Hola')
...     elif lang == 'fr':
...         print('Bonjour')
...     else:
...         print('Hello')
...
>>> greet('en')
Hello
>>> greet('es')
Hola
>>> greet('fr')
Bonjour
>>>
Return Values
Often a function will take its arguments, do some computation, and
return a value to be used as the value of the function call in the
calling expression. The return keyword is used for this.

def greet():
    return "Hello"

print(greet(), "Glenn")
print(greet(), "Sally")

Hello Glenn
Hello Sally
Return Value
• A “fruitful” function is one that produces a result (or return value)

• The return statement ends the function execution and
“sends back” the result of the function

>>> def greet(lang):
...     if lang == 'es':
...         return 'Hola'
...     elif lang == 'fr':
...         return 'Bonjour'
...     else:
...         return 'Hello'
...
>>> print(greet('en'),'Glenn')
Hello Glenn
>>> print(greet('es'),'Sally')
Hola Sally
>>> print(greet('fr'),'Michael')
Bonjour Michael
>>>
Arguments, Parameters, and Results
>>> big = max('Hello world')
>>> print(big)
w

def max(inp):
    blah
    blah
    for x in inp:
        blah
        blah
    return 'w'

Here 'Hello world' is the argument, inp is the parameter, and 'w' is the result.
Multiple Parameters / Arguments
• We can define more than one parameter in the function definition

• We simply add more arguments when we call the function

• We match the number and order of arguments and parameters

def addtwo(a, b):
    added = a + b
    return added

x = addtwo(3, 5)
print(x)

8
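With addition the order of the arguments cannot be seen in the result, so here is a small sketch where it can. The function subtract is a hypothetical example, not part of Python or of the original slides.

```python
def subtract(a, b):
    # a receives the first argument, b receives the second
    return a - b

print(subtract(10, 3))   # 7
print(subtract(3, 10))   # -7
```

The arguments are matched to the parameters purely by position, which is why swapping them changes the answer.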
Void (non-fruitful) Functions

• When a function does not return a value, we call it a “void”
function

• Functions that return values are “fruitful” functions

• Void functions are “not fruitful”
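A quick way to see the difference is to capture what each kind of function hands back. In this sketch shout and louder are hypothetical names. A function with no return statement gives back the special value None.

```python
def shout(text):
    # Void: prints something but has no return statement
    print(text.upper() + '!')

def louder(text):
    # Fruitful: hands a value back to the caller
    return text.upper() + '!'

result = shout('hello')    # prints HELLO!
print(result)              # None
print(louder('hello'))     # HELLO!
```

Assigning the result of a void function is a common source of unexpected None values.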


To function or not to function...
• Organize your code into “paragraphs” - capture a complete
thought and “name it”

• Don’t repeat yourself - make it work once and then reuse it

• If something gets too long or complex, break it up into logical
chunks and put those chunks in functions

• Make a library of common stuff that you do over and over -
perhaps share this with your friends...
Summary
• Functions
• Built-In Functions
• Type conversion (int, float)
• String conversions
• Parameters
• Arguments
• Results (fruitful functions)
• Void (non-fruitful) functions
• Why use functions?
Exercise

Rewrite your pay computation with time-and-a-half for overtime and
create a function called computepay which takes two parameters
(hours and rate).

Enter Hours: 45
Enter Rate: 10

Pay: 475.0

475 = 40 * 10 + 5 * 15

Reading Files
Chapter 7

Python for Everybody


www.py4e.com
It is time to go find some Data to mess with!

[Diagram: Software, Input and Output Devices, Central Processing Unit, Main Memory, and Secondary Memory (labelled "Files R Us"), which holds a mail file]
File Processing
A text file can be thought of as a sequence of lines
From [email protected] Sat Jan 5 09:14:16 2008
Return-Path: <[email protected]>
Date: Sat, 5 Jan 2008 09:12:18 -0500
To: [email protected]
From: [email protected]
Subject: [sakai] svn commit: r39772 - content/branches/

Details: http://source.sakaiproject.org/viewsvn/?view=rev&rev=39772

http://www.py4e.com/code/mbox-short.txt
Opening a File
• Before we can read the contents of the file, we must tell Python
which file we are going to work with and what we will be doing
with the file

• This is done with the open() function

• open() returns a “file handle” - a variable used to perform
operations on the file

• Similar to “File -> Open” in a Word Processor


Using open()
handle = open(filename, mode)

fhand = open('mbox.txt', 'r')

open() returns a handle used to manipulate the file

filename is a string

mode is optional and should be 'r' if we are planning to read
the file and 'w' if we are going to write to the file
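The two modes can be seen side by side in a short sketch that writes a small file and reads it back. It assumes a throwaway directory from the standard tempfile module so that no real file is touched. The name demo.txt is made up, and write() and close() are file-handle methods not covered above.

```python
import os
import tempfile

# Hypothetical file name inside a throwaway directory, so nothing real is overwritten.
folder = tempfile.mkdtemp()
fname = os.path.join(folder, 'demo.txt')

fout = open(fname, 'w')        # 'w' creates the file (or empties an existing one)
fout.write('Hello\nWorld!\n')
fout.close()

fhand = open(fname, 'r')       # 'r' is also what we get if mode is left out
text = fhand.read()
fhand.close()

print(len(text))               # 13 characters, counting both newlines
```

Because 'w' empties an existing file as soon as it is opened, it is worth double-checking the file name before using that mode.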
What is a Handle?
>>> fhand = open('mbox.txt')
>>> print(fhand)
<_io.TextIOWrapper name='mbox.txt' mode='r' encoding='UTF-8'>
When Files are Missing
>>> fhand = open('stuff.txt')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory: 'stuff.txt'
The newline Character
• We use a special character called the “newline” to indicate
when a line ends

• We represent it as \n in strings

• Newline is still one character - not two

>>> stuff = 'Hello\nWorld!'
>>> stuff
'Hello\nWorld!'
>>> print(stuff)
Hello
World!
>>> stuff = 'X\nY'
>>> print(stuff)
X
Y
>>> len(stuff)
3
File Processing
A text file has newlines at the end of each line

From [email protected] Sat Jan 5 09:14:16 2008\n
Return-Path: <[email protected]>\n
Date: Sat, 5 Jan 2008 09:12:18 -0500\n
To: [email protected]\n
From: [email protected]\n
Subject: [sakai] svn commit: r39772 - content/branches/\n
\n
Details: http://source.sakaiproject.org/viewsvn/?view=rev&rev=39772\n
Reading Files in Python
File Handle as a Sequence
• A file handle open for read can be treated as a sequence of
strings where each line in the file is a string in the sequence

• We can use the for statement to iterate through a sequence

• Remember - a sequence is an ordered set

xfile = open('mbox.txt')
for cheese in xfile:
    print(cheese)
Counting Lines in a File
• Open a file read-only

• Use a for loop to read each line

• Count the lines and print out the number of lines

fhand = open('mbox.txt')
count = 0
for line in fhand:
    count = count + 1
print('Line Count:', count)

$ python open.py
Line Count: 132045
Reading the *Whole* File
We can read the whole file (newlines and all) into a single string

>>> fhand = open('mbox-short.txt')
>>> inp = fhand.read()
>>> print(len(inp))
94626
>>> print(inp[:20])
From stephen.marquar
Searching Through a File
We can put an if statement in our for loop to only print lines
that meet some criteria

fhand = open('mbox-short.txt')
for line in fhand:
    if line.startswith('From:') :
        print(line)
OOPS!
What are all these blank lines doing here?

From: [email protected]

From: [email protected]

From: [email protected]

From: [email protected]

...

Each line from the file has a newline at the end

The print statement adds a newline to each line

From: [email protected]\n
\n
From: [email protected]\n
\n
From: [email protected]\n
\n
From: [email protected]\n
\n
...
Searching Through a File (fixed)
We can strip the whitespace from the right-hand side of
the string using rstrip() from the string library

The newline is considered “white space” and is stripped

fhand = open('mbox-short.txt')
for line in fhand:
    line = line.rstrip()
    if line.startswith('From:') :
        print(line)

From: [email protected]
From: [email protected]
From: [email protected]
From: [email protected]
....
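Stripping the line is one fix for the doubled newlines. Another is to leave the line alone and ask print() not to add its own newline, using its end parameter. The sketch below shows both on a hypothetical list of lines shaped like what a loop over a file handle produces. The names lines and kept are made up.

```python
# Hypothetical lines, each ending in a newline as lines read from a file do.
lines = ['From: first\n', 'Subject: hello\n', 'From: second\n']

kept = []
for line in lines:
    if not line.startswith('From:'):
        continue
    print(line, end='')          # the line's own newline ends the output line
    kept.append(line.rstrip())   # rstrip() removes the trailing newline
print(kept)                      # ['From: first', 'From: second']
```

rstrip() is usually the better habit when the line will be split or compared later, since the stray newline would otherwise stay attached to the last piece.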
Skipping with continue
We can conveniently skip a line by using the continue statement

fhand = open('mbox-short.txt')
for line in fhand:
    line = line.rstrip()
    if not line.startswith('From:') :
        continue
    print(line)
Using in to Select Lines
We can look for a string anywhere in a line as our selection criteria

fhand = open('mbox-short.txt')
for line in fhand:
    line = line.rstrip()
    if not '@uct.ac.za' in line :
        continue
    print(line)

From [email protected] Sat Jan 5 09:14:16 2008
X-Authentication-Warning: set sender to [email protected] using -f
From: [email protected]
Author: [email protected]
From [email protected] Fri Jan 4 07:02:32 2008
X-Authentication-Warning: set sender to [email protected] using -f...
Prompt for File Name
fname = input('Enter the file name: ')
fhand = open(fname)
count = 0
for line in fhand:
    if line.startswith('Subject:') :
        count = count + 1
print('There were', count, 'subject lines in', fname)

Enter the file name: mbox.txt
There were 1797 subject lines in mbox.txt

Enter the file name: mbox-short.txt
There were 27 subject lines in mbox-short.txt
Bad File Names
fname = input('Enter the file name: ')
try:
    fhand = open(fname)
except:
    print('File cannot be opened:', fname)
    quit()

count = 0
for line in fhand:
    if line.startswith('Subject:') :
        count = count + 1
print('There were', count, 'subject lines in', fname)

Enter the file name: mbox.txt
There were 1797 subject lines in mbox.txt

Enter the file name: na na boo boo
File cannot be opened: na na boo boo
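The same idea can be packaged as a function that catches only the missing-file error, so that other problems are not silently hidden. This is a sketch under assumptions: the function name count_subject_lines and the choice to return None for a missing file are made up for the illustration.

```python
def count_subject_lines(fname):
    """Return the number of 'Subject:' lines, or None if fname does not exist."""
    try:
        fhand = open(fname)
    except FileNotFoundError:
        # Catch only the missing-file case; other errors still surface
        return None
    count = 0
    for line in fhand:
        if line.startswith('Subject:'):
            count = count + 1
    fhand.close()
    return count

print(count_subject_lines('na na boo boo'))   # None, assuming no such file exists
```

Returning None instead of calling quit() leaves the decision about what to do next to the caller.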
Summary
• Secondary storage
• Opening a file - file handle
• File structure - newline character
• Reading a file line by line with a for loop
• Searching for lines
• Reading file names
• Dealing with bad files

Python Lists
Chapter 8

Python for Everybody


www.py4e.com
Programming
• Algorithm
- A set of rules or steps used to solve a problem

• Data Structure
- A particular way of organizing data in a computer

https://en.wikipedia.org/wiki/Algorithm
https://en.wikipedia.org/wiki/Data_structure
What is Not a “Collection”?
Most of our variables have one value in them - when we put a new
value in the variable, the old value is overwritten

$ python
>>> x = 2
>>> x = 4
>>> print(x)
4
A List is a Kind of Collection
• A collection allows us to put many values in a single “variable”

• A collection is nice because we can carry many values
around in one convenient package.

friends = [ 'Joseph', 'Glenn', 'Sally' ]

carryon = [ 'socks', 'shirt', 'perfume' ]


List Constants
• List constants are surrounded by square brackets and the elements
in the list are separated by commas

• A list element can be any Python object - even another list

• A list can be empty

>>> print([1, 24, 76])
[1, 24, 76]
>>> print(['red', 'yellow', 'blue'])
['red', 'yellow', 'blue']
>>> print(['red', 24, 98.6])
['red', 24, 98.6]
>>> print([ 1, [5, 6], 7])
[1, [5, 6], 7]
>>> print([])
[]
We Already Use Lists!
for i in [5, 4, 3, 2, 1] :
    print(i)
print('Blastoff!')

5
4
3
2
1
Blastoff!
Lists and Definite Loops - Best Pals

friends = ['Joseph', 'Glenn', 'Sally']
for friend in friends :
    print('Happy New Year:', friend)
print('Done!')

z = ['Joseph', 'Glenn', 'Sally']
for x in z:
    print('Happy New Year:', x)
print('Done!')

Happy New Year: Joseph
Happy New Year: Glenn
Happy New Year: Sally
Done!
Looking Inside Lists

Just like strings, we can get at any single element in a list using an
index specified in square brackets

>>> friends = [ 'Joseph', 'Glenn', 'Sally' ]
>>> print(friends[1])
Glenn
>>>

Index:    0       1      2
Element:  Joseph  Glenn  Sally
Lists are Mutable
• Strings are “immutable” - we cannot change the contents of a
string - we must make a new string to make any change

• Lists are “mutable” - we can change an element of a list using
the index operator

>>> fruit = 'Banana'
>>> fruit[0] = 'b'
Traceback
TypeError: 'str' object does not support item assignment
>>> x = fruit.lower()
>>> print(x)
banana
>>> lotto = [2, 14, 26, 41, 63]
>>> print(lotto)
[2, 14, 26, 41, 63]
>>> lotto[2] = 28
>>> print(lotto)
[2, 14, 28, 41, 63]
How Long is a List?

• The len() function takes a list as a parameter and returns the number
of elements in the list

• Actually len() tells us the number of elements of any set or sequence
(such as a string...)

>>> greet = 'Hello Bob'
>>> print(len(greet))
9
>>> x = [ 1, 2, 'joe', 99]
>>> print(len(x))
4
>>>
Using the range Function
• The range function returns a sequence of numbers that range
from zero to one less than the parameter

• We can construct an index loop using for and an integer iterator

In Python 3, print(range(4)) shows range(0, 4), so we wrap the range
in list() to see the numbers it produces.

>>> print(list(range(4)))
[0, 1, 2, 3]
>>> friends = ['Joseph', 'Glenn', 'Sally']
>>> print(len(friends))
3
>>> print(list(range(len(friends))))
[0, 1, 2]
>>>
A Tale of Two Loops...
friends = ['Joseph', 'Glenn', 'Sally']

for friend in friends :
    print('Happy New Year:', friend)

for i in range(len(friends)) :
    friend = friends[i]
    print('Happy New Year:', friend)

>>> friends = ['Joseph', 'Glenn', 'Sally']
>>> print(len(friends))
3
>>> print(list(range(len(friends))))
[0, 1, 2]
>>>

Both loops print the same thing:

Happy New Year: Joseph
Happy New Year: Glenn
Happy New Year: Sally
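The two loops print the same thing, so it is fair to ask when the index version is worth the extra typing. One case, sketched here with a hypothetical prices list, is when we want to change the elements of the list while looping:

```python
prices = [10, 20, 30]   # hypothetical data

# Changing the loop variable does not change the list
for price in prices:
    price = price * 2
print(prices)            # [10, 20, 30]

# Assigning through the index does
for i in range(len(prices)):
    prices[i] = prices[i] * 2
print(prices)            # [20, 40, 60]
```

When we only need to read the values, the first form is shorter and harder to get wrong.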
Concatenating Lists Using +
We can create a new list by adding two existing lists together

>>> a = [1, 2, 3]
>>> b = [4, 5, 6]
>>> c = a + b
>>> print(c)
[1, 2, 3, 4, 5, 6]
>>> print(a)
[1, 2, 3]
Lists Can Be Sliced Using :
>>> t = [9, 41, 12, 3, 74, 15]
>>> t[1:3]
[41, 12]
>>> t[:4]
[9, 41, 12, 3]
>>> t[3:]
[3, 74, 15]
>>> t[:]
[9, 41, 12, 3, 74, 15]

Remember: Just like in strings, the second number is “up to but not including”
List Methods
>>> x = list()
>>> type(x)
<class 'list'>
>>> dir(x)
['append', 'count', 'extend', 'index', 'insert',
'pop', 'remove', 'reverse', 'sort']
>>>

(The dir() output is abbreviated to the main list methods.)

http://docs.python.org/tutorial/datastructures.html
Building a List from Scratch
• We can create an empty list and then add elements using
the append method

• The list stays in order and new elements are added at
the end of the list

>>> stuff = list()
>>> stuff.append('book')
>>> stuff.append(99)
>>> print(stuff)
['book', 99]
>>> stuff.append('cookie')
>>> print(stuff)
['book', 99, 'cookie']
Is Something in a List?
• Python provides two operators that let you check if an item is
in a list

• These are logical operators that return True or False

• They do not modify the list

>>> some = [1, 9, 21, 10, 16]
>>> 9 in some
True
>>> 15 in some
False
>>> 20 not in some
True
>>>
Lists are in Order
• A list can hold many items and keeps those items in the
order until we do something to change the order

• A list can be sorted (i.e., change its order)

• The sort method (unlike in strings) means “sort yourself”

>>> friends = [ 'Joseph', 'Glenn', 'Sally' ]
>>> friends.sort()
>>> print(friends)
['Glenn', 'Joseph', 'Sally']
>>> print(friends[1])
Joseph
>>>
Built-in Functions and Lists
• There are a number of functions built into Python
that take lists as parameters

• Remember the loops we built? These are much simpler.

>>> nums = [3, 41, 12, 9, 74, 15]
>>> print(len(nums))
6
>>> print(max(nums))
74
>>> print(min(nums))
3
>>> print(sum(nums))
154
>>> print(sum(nums)/len(nums))
25.6
total = 0
count = 0
while True :
    inp = input('Enter a number: ')
    if inp == 'done' : break
    value = float(inp)
    total = total + value
    count = count + 1

average = total / count
print('Average:', average)

numlist = list()
while True :
    inp = input('Enter a number: ')
    if inp == 'done' : break
    value = float(inp)
    numlist.append(value)

average = sum(numlist) / len(numlist)
print('Average:', average)

Enter a number: 3
Enter a number: 9
Enter a number: 5
Enter a number: done
Average: 5.66666666667
Best Friends: Strings and Lists
>>> abc = 'With three words'
>>> stuff = abc.split()
>>> print(stuff)
['With', 'three', 'words']
>>> print(len(stuff))
3
>>> print(stuff[0])
With
>>> print(stuff)
['With', 'three', 'words']
>>> for w in stuff :
...     print(w)
...
With
three
words
>>>

Split breaks a string into parts and produces a list of strings. We think of these
as words. We can access a particular word or loop through all the words.

● When you do not specify a delimiter, multiple spaces are
treated like one delimiter

● You can specify what delimiter character to use in the splitting

>>> line = 'A lot               of spaces'
>>> etc = line.split()
>>> print(etc)
['A', 'lot', 'of', 'spaces']
>>>
>>> line = 'first;second;third'
>>> thing = line.split()
>>> print(thing)
['first;second;third']
>>> print(len(thing))
1
>>> thing = line.split(';')
>>> print(thing)
['first', 'second', 'third']
>>> print(len(thing))
3
>>>
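One difference between the two ways of calling split is easy to miss: runs of spaces collapse when no delimiter is given, but repeated delimiters do not collapse when one is specified. A short sketch with made-up data:

```python
row = 'Joseph,Glenn,,Sally'      # hypothetical comma-separated line with an empty field

fields = row.split(',')
print(fields)         # ['Joseph', 'Glenn', '', 'Sally']
print(len(fields))    # 4

spaced = 'Joseph   Glenn     Sally'
print(spaced.split())            # ['Joseph', 'Glenn', 'Sally']
```

The empty string in fields is often what we want for data files, because it keeps the remaining columns in their expected positions.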
From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008

fhand = open('mbox-short.txt')
for line in fhand:
    line = line.rstrip()
    if not line.startswith('From ') : continue
    words = line.split()
    print(words[2])

Sat
Fri
Fri
Fri
...

>>> line = 'From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008'
>>> words = line.split()
>>> print(words)
['From', 'stephen.marquard@uct.ac.za', 'Sat', 'Jan', '5', '09:14:16', '2008']
>>>
The Double Split Pattern
Sometimes we split a line one way, and then grab one of the pieces
of the line and split that piece again

From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008

words = line.split()
email = words[1]             stephen.marquard@uct.ac.za
pieces = email.split('@')    ['stephen.marquard', 'uct.ac.za']
print(pieces[1])             uct.ac.za
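The double split can be wrapped in a small function with a couple of guards, since words[1] and pieces[1] both fail on lines that do not have the expected shape. This is a sketch only: sender_domain is a hypothetical name, and returning None for unexpected lines is an assumption.

```python
def sender_domain(line):
    """Return the part after '@' in the second word of a 'From ' line, else None."""
    words = line.split()
    # Guard: need at least two words and the line must start with 'From'
    if len(words) < 2 or words[0] != 'From':
        return None
    pieces = words[1].split('@')
    if len(pieces) != 2:
        return None
    return pieces[1]

print(sender_domain('From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008'))
```

Blank lines and lines without an address return None instead of raising an IndexError.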
List Summary
• Concept of a collection • Slicing lists

• Lists and definite loops • List methods: append, remove

• Indexing and lookup • Sorting lists

• List mutability • Splitting strings into lists of words

• Functions: len, min, max, sum • Using split to parse strings



Tuples
Chapter 10

Python for Everybody


www.py4e.com
Tuples Are Like Lists
Tuples are another kind of sequence that functions much like a list
- they have elements which are indexed starting at 0

>>> x = ('Glenn', 'Sally', 'Joseph')
>>> print(x[2])
Joseph
>>> y = ( 1, 9, 2 )
>>> print(y)
(1, 9, 2)
>>> print(max(y))
9
>>> for iter in y:
...     print(iter)
...
1
9
2
>>>
but... Tuples are “immutable”
Unlike a list, once you create a tuple, you cannot alter its
contents - similar to a string

>>> x = [9, 8, 7]
>>> x[2] = 6
>>> print(x)
[9, 8, 6]
>>>

>>> y = 'ABC'
>>> y[2] = 'D'
Traceback: 'str' object does not support item assignment
>>>

>>> z = (5, 4, 3)
>>> z[2] = 0
Traceback: 'tuple' object does not support item assignment
>>>
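The same comparison can be run as a script. The sketch below catches the error instead of letting the traceback stop the program; the names changed and z2 are illustrative only.

```python
x = [9, 8, 7]
x[2] = 6                      # lists are mutable, so this works

z = (5, 4, 3)
try:
    z[2] = 0                  # tuples are immutable, so this raises TypeError
    changed = True
except TypeError as err:
    changed = False
    print('Cannot change tuple:', err)

# To get a "changed" tuple we build a new one from the parts we want to keep
z2 = z[:2] + (0,)
print(x, z, z2)
```

The original tuple z is untouched; z2 is a separate object.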
Things not to do With Tuples
>>> x = (3, 2, 1)
>>> x.sort()
Traceback:
AttributeError: 'tuple' object has no attribute 'sort'
>>> x.append(5)
Traceback:
AttributeError: 'tuple' object has no attribute 'append'
>>> x.reverse()
Traceback:
AttributeError: 'tuple' object has no attribute 'reverse'
>>>
A Tale of Two Sequences
>>> l = list()
>>> dir(l)
['append', 'count', 'extend', 'index', 'insert', 'pop',
'remove', 'reverse', 'sort']

>>> t = tuple()
>>> dir(t)
['count', 'index']
Tuples are More Efficient
• Since Python does not have to build tuple structures to be
modifiable, they are simpler and more efficient in terms of
memory use and performance than lists
• So in our program when we are making “temporary variables”
we prefer tuples over lists
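One rough way to see the memory difference is sys.getsizeof from the standard library. This sketch assumes CPython; the exact byte counts vary by version and platform, and getsizeof reports only the container itself, not the elements it refers to.

```python
import sys

as_list = [1, 9, 2]
as_tuple = (1, 9, 2)

list_bytes = sys.getsizeof(as_list)     # size of the list object itself
tuple_bytes = sys.getsizeof(as_tuple)   # size of the tuple object itself

print('list :', list_bytes, 'bytes')
print('tuple:', tuple_bytes, 'bytes')
```

For a handful of small temporary values the saving is minor; the more practical benefit is that a tuple cannot be modified by accident.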
Tuples and Assignment
• We can also put a tuple on the left-hand side of an assignment
statement
• We can even omit the parentheses

>>> (x, y) = (4, 'fred')


>>> print(y)
fred
>>> (a, b) = (99, 98)
>>> print(a)
99
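Two common uses of tuple assignment are swapping variables and unpacking the result of split(). A short sketch, reusing the email address from the earlier list slides:

```python
a, b = 99, 98
a, b = b, a                   # the right side is built first, then unpacked
print(a, b)

# Unpacking works with any sequence of the right length, including a list
email = 'stephen.marquard@uct.ac.za'
user, host = email.split('@')
print(user)
print(host)
```

The number of variables on the left must match the number of values on the right, otherwise Python raises a ValueError.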
Tuples and Dictionaries
The items() method in dictionaries returns a view of (key, value)
tuples that we can loop through

>>> d = dict()
>>> d['csev'] = 2
>>> d['cwen'] = 4
>>> for (k,v) in d.items():
...     print(k, v)
...
csev 2
cwen 4
>>> tups = d.items()
>>> print(tups)
dict_items([('csev', 2), ('cwen', 4)])
Tuples are Comparable
The comparison operators work with tuples and other
sequences. If the first item is equal, Python goes on to the next
element, and so on, until it finds elements that differ.
>>> (0, 1, 2) < (5, 1, 2)
True
>>> (0, 1, 2000000) < (0, 3, 4)
True
>>> ( 'Jones', 'Sally' ) < ('Jones', 'Sam')
True
>>> ( 'Jones', 'Sally') > ('Adams', 'Sam')
True
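Because tuples compare element by element, sorted() and max() work on lists of tuples without any extra code. A small sketch using made-up (last name, first name) records in the style of the slide:

```python
records = [('Jones', 'Sam'), ('Adams', 'Sam'), ('Jones', 'Sally')]

in_order = sorted(records)    # last name first, then first name breaks ties
largest = max(records)        # the tuple that sorts last

print(in_order)
print(largest)
```

The two 'Jones' records tie on the first element, so Python moves on to the second element to decide their order.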
Sorting Lists of Tuples
• We can take advantage of the ability to sort a list of tuples to
get a sorted version of a dictionary
• First we sort the dictionary by the key using the items() method
and sorted() function

>>> d = {'a':10, 'b':1, 'c':22}


>>> d.items()
dict_items([('a', 10), ('c', 22), ('b', 1)])
>>> sorted(d.items())
[('a', 10), ('b', 1), ('c', 22)]
Using sorted()
We can do this even more directly using the built-in function sorted
that takes a sequence as a parameter and returns a sorted sequence

>>> d = {'a':10, 'b':1, 'c':22}
>>> t = sorted(d.items())
>>> t
[('a', 10), ('b', 1), ('c', 22)]
>>> for k, v in sorted(d.items()):
...     print(k, v)
...
a 10
b 1
c 22
Sort by Values Instead of Key
• If we could construct a list of tuples of the form (value, key) we
could sort by value
• We do this with a for loop that creates a list of tuples

>>> c = {'a':10, 'b':1, 'c':22}
>>> tmp = list()
>>> for k, v in c.items() :
...     tmp.append( (v, k) )
...
>>> print(tmp)
[(10, 'a'), (22, 'c'), (1, 'b')]
>>> tmp = sorted(tmp, reverse=True)
>>> print(tmp)
[(22, 'c'), (10, 'a'), (1, 'b')]
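An alternative that avoids building the reversed tuples is the key parameter of sorted(). This is a different technique from the one on the slide; the helper name by_value is hypothetical.

```python
c = {'a': 10, 'b': 1, 'c': 22}

def by_value(item):
    """Given a (key, value) tuple, return the value to sort on."""
    key, value = item
    return value

# The tuples stay in (key, value) form; only the sort order uses the value
pairs = sorted(c.items(), key=by_value, reverse=True)
print(pairs)
```

The result keeps each pair as (key, value), which can be easier to read than (value, key) when printing.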
The top 10 most common words
fhand = open('romeo.txt')
counts = {}
for line in fhand:
    words = line.split()
    for word in words:
        counts[word] = counts.get(word, 0 ) + 1

lst = []
for key, val in counts.items():
    newtup = (val, key)
    lst.append(newtup)

lst = sorted(lst, reverse=True)

for val, key in lst[:10] :
    print(key, val)
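For comparison, the standard library's collections.Counter does the counting and the ranking in one object. This swaps in a different technique from the hand-built list of tuples above. To keep the sketch self-contained it counts a short inline text (the clown sentence used in the dictionary slides) rather than romeo.txt.

```python
from collections import Counter

text = """the clown ran after the car and the car ran into the tent
and the tent fell down on the clown and the car"""

counts = Counter(text.split())      # maps each word to how often it appears
top3 = counts.most_common(3)        # list of (word, count), highest count first

for word, count in top3:
    print(word, count)
```

Writing the loop by hand, as on the slide, is still the better way to learn how the pattern works.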
Even Shorter Version
>>> c = {'a':10, 'b':1, 'c':22}

>>> print( sorted( [ (v,k) for k,v in c.items() ] ) )

[(1, 'b'), (10, 'a'), (22, 'c')]

List comprehension creates a dynamic list. In this case, we
make a list of reversed tuples and then sort it.
http://wiki.python.org/moin/HowTo/Sorting
Summary
• Tuple syntax
• Immutability
• Comparability
• Sorting
• Tuples in assignment statements
• Sorting dictionaries by either key or value


Python Dictionaries
Chapter 9

Python for Everybody


www.py4e.com
What is a Collection?
• A collection is nice because we can put more than one value in it
and carry them all around in one convenient package

• We have a bunch of values in a single “variable”

• We do this by having more than one place “in” the variable

• We have ways of finding the different places in the variable


What is Not a “Collection”?
Most of our variables have one value in them - when we put a new
value in the variable - the old value is overwritten

$ python
>>> x = 2
>>> x = 4
>>> print(x)
4
A Story of Two Collections..
• List

- A linear collection of values that stay in order

• Dictionary

- A “bag” of values, each with its own label


Dictionaries
[Figure: a bag of labeled items: tissue, calculator, perfume, money, candy]

http://en.wikipedia.org/wiki/Associative_array
Dictionaries
• Dictionaries are Python’s most powerful data collection

• Dictionaries allow us to do fast database-like operations in Python

• Dictionaries have different names in different languages

- Associative Arrays - Perl / PHP

- Properties or Map or HashMap - Java

- Property Bag - C# / .Net


Dictionaries
• Lists index their entries based on the position in the list
• Dictionaries are like bags - no order
• So we index the things we put in the dictionary with a “lookup tag”

>>> purse = dict()
>>> purse['money'] = 12
>>> purse['candy'] = 3
>>> purse['tissues'] = 75
>>> print(purse)
{'money': 12, 'tissues': 75, 'candy': 3}
>>> print(purse['candy'])
3
>>> purse['candy'] = purse['candy'] + 2
>>> print(purse)
{'money': 12, 'tissues': 75, 'candy': 5}
Comparing Lists and Dictionaries
Dictionaries are like lists except that they use keys instead of
numbers to look up values

>>> lst = list()
>>> lst.append(21)
>>> lst.append(183)
>>> print(lst)
[21, 183]
>>> lst[0] = 23
>>> print(lst)
[23, 183]

>>> ddd = dict()
>>> ddd['age'] = 21
>>> ddd['course'] = 182
>>> print(ddd)
{'course': 182, 'age': 21}
>>> ddd['age'] = 23
>>> print(ddd)
{'course': 182, 'age': 23}
List (lst)
Key    Value
[0]    21
[1]    183

Dictionary (ddd)
Key         Value
['course']  182
['age']     21
Dictionary Literals (Constants)
• Dictionary literals use curly braces and have a list of key : value pairs

• You can make an empty dictionary using empty curly braces

>>> jjj = { 'chuck' : 1 , 'fred' : 42, 'jan': 100}


>>> print(jjj)
{'jan': 100, 'chuck': 1, 'fred': 42}
>>> ooo = { }
>>> print(ooo)
{}
>>>
Most Common Name?
marquard cwen cwen
zhen marquard zhen
csev
csev zhen
marquard
zhen csev zhen
Many Counters with a Dictionary
One common use of dictionaries is
counting how often we “see” something
>>> ccc = dict()
>>> ccc['csev'] = 1
>>> ccc['cwen'] = 1
>>> print(ccc)
{'csev': 1, 'cwen': 1}
>>> ccc['cwen'] = ccc['cwen'] + 1
>>> print(ccc)
{'csev': 1, 'cwen': 2}
Dictionary Tracebacks
• It is an error to reference a key which is not in the dictionary

• We can use the in operator to see if a key is in the dictionary

>>> ccc = dict()


>>> print(ccc['csev'])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: 'csev'
>>> 'csev' in ccc
False
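In a script there are two usual ways to avoid the traceback: check with in before the lookup, or catch the KeyError afterwards. A minimal sketch; the variable names value and missing are illustrative.

```python
ccc = dict()
ccc['csev'] = 1

# Option 1: look before you leap
if 'cwen' in ccc:
    print(ccc['cwen'])
else:
    print('cwen not found')

# Option 2: try the lookup and handle the error
try:
    value = ccc['cwen']
except KeyError as err:
    value = None
    missing = err.args[0]     # the key that was not found
    print('No entry for', missing)
```

Both options leave the dictionary unchanged; neither one adds the missing key.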
When We See a New Name
When we encounter a new name, we need to add a new entry in the
dictionary. If this is the second or later time we have seen the name,
we simply add one to the count in the dictionary under that name.

counts = dict()
names = ['csev', 'cwen', 'csev', 'zqian', 'cwen']
for name in names :
    if name not in counts:
        counts[name] = 1
    else :
        counts[name] = counts[name] + 1
print(counts)

{'csev': 2, 'zqian': 1, 'cwen': 2}
The get Method for Dictionaries
The pattern of checking to see if a key is already in a dictionary and
assuming a default value if the key is not there is so common that there
is a method called get() that does this for us

if name in counts:
    x = counts[name]
else :
    x = 0

x = counts.get(name, 0)

Default value if key does not exist (and no Traceback).

{'csev': 2, 'zqian': 1, 'cwen': 2}
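A short sketch of how get() behaves with and without a default, using the counts shown above. The name 'marquard' is just an example of a key that is not in the dictionary.

```python
counts = {'csev': 2, 'zqian': 1, 'cwen': 2}

found = counts.get('csev', 0)            # key exists: we get its value
not_found = counts.get('marquard', 0)    # key missing: we get the default
no_default = counts.get('marquard')      # no default given: we get None

print(found, not_found, no_default)
print('marquard' in counts)              # get() never adds the key
```

Because get() only reads, the counting idiom still needs the assignment on the left-hand side to store the new count.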
Simplified Counting with get()
We can use get() and provide a default value of zero when the key
is not yet in the dictionary - and then just add one

counts = dict()
names = ['csev', 'cwen', 'csev', 'zqian', 'cwen']
for name in names :
    counts[name] = counts.get(name, 0) + 1    # 0 is the default
print(counts)

{'csev': 2, 'zqian': 1, 'cwen': 2}

http://www.youtube.com/watch?v=EHJ9uYx5L58
Counting Words in Text
Writing programs (or programming) is a very creative and rewarding activity. You can write
programs for many reasons ranging from making your living to solving a difficult data analysis
problem to having fun to helping someone else solve a problem. This book assumes that
everyone needs to know how to program and that once you know how to program, you will figure
out what you want to do with your newfound skills.

We are surrounded in our daily lives with computers ranging from laptops to cell phones. We
can think of these computers as our “personal assistants” who can take care of many things on
our behalf. The hardware in our current-day computers is essentially built to continuously ask us
the question, “What would you like me to do next?”

Our computers are fast and have vast amounts of memory and could be very helpful to us if we
only knew the language to speak to explain to the computer what we would like it to do next. If
we knew this language we could tell the computer to do tasks on our behalf that were repetitive.
Interestingly, the kinds of things computers can do best are often the kinds of things that we
humans find boring and mind-numbing.
Counting Pattern
The general pattern to count the words in a line of text is to split
the line into words, then loop through the words and use a
dictionary to track the count of each word independently.

counts = dict()
print('Enter a line of text:')
line = input('')

words = line.split()

print('Words:', words)

print('Counting...')
for word in words:
    counts[word] = counts.get(word,0) + 1
print('Counts', counts)
python wordcount.py
Enter a line of text:
the clown ran after the car and the car ran into the tent
and the tent fell down on the clown and the car

Words: ['the', 'clown', 'ran', 'after', 'the', 'car',
'and', 'the', 'car', 'ran', 'into', 'the', 'tent', 'and',
'the', 'tent', 'fell', 'down', 'on', 'the', 'clown',
'and', 'the', 'car']
Counting...

Counts {'and': 3, 'on': 1, 'ran': 2, 'car': 3, 'into': 1,
'after': 1, 'clown': 2, 'down': 1, 'fell': 1, 'the': 7,
'tent': 2}

http://www.flickr.com/photos/71502646@N00/2526007974/
counts = dict()
line = input('Enter a line of text:')
words = line.split()

print('Words:', words)
print('Counting...')

for word in words:
    counts[word] = counts.get(word,0) + 1
print('Counts', counts)

python wordcount.py
Enter a line of text:
the clown ran after the car and the car ran
into the tent and the tent fell down on the
clown and the car

Words: ['the', 'clown', 'ran', 'after', 'the', 'car',
'and', 'the', 'car', 'ran', 'into', 'the', 'tent', 'and',
'the', 'tent', 'fell', 'down', 'on', 'the', 'clown',
'and', 'the', 'car']
Counting...

Counts {'and': 3, 'on': 1, 'ran': 2, 'car': 3,
'into': 1, 'after': 1, 'clown': 2, 'down': 1, 'fell':
1, 'the': 7, 'tent': 2}
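Real text usually has capital letters and punctuation, so 'The' and 'the.' would be counted separately from 'the'. The sketch below extends the counting pattern with a simple clean-up step; the function name count_words and the sample sentence are made up for illustration, and the clean-up rule (strip punctuation at the ends, lowercase) is one possible choice, not the only one.

```python
import string

def count_words(line):
    """Count words in a line, ignoring case and surrounding punctuation."""
    counts = dict()
    for word in line.split():
        # Remove punctuation at either end of the word and lowercase it
        word = word.strip(string.punctuation).lower()
        if word == '':
            continue                      # the "word" was only punctuation
        counts[word] = counts.get(word, 0) + 1
    return counts

counts = count_words('The clown ran. The car ran, too!')
print(counts)
```

Punctuation inside a word, such as the apostrophe in "don't", is left alone by strip().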
Definite Loops and Dictionaries
Even though dictionaries are not indexed by position, we can write a for
loop that goes through all the entries in a dictionary - actually it
goes through all of the keys in the dictionary and looks up the
values
>>> counts = { 'chuck' : 1 , 'fred' : 42, 'jan': 100}
>>> for key in counts:
...     print(key, counts[key])
...
jan 100
chuck 1
fred 42
>>>
Retrieving Lists of Keys and Values
You can get a list of keys, values, or items (both) from a dictionary

>>> jjj = { 'chuck' : 1 , 'fred' : 42, 'jan': 100}
>>> print(list(jjj))
['jan', 'chuck', 'fred']
>>> print(list(jjj.keys()))
['jan', 'chuck', 'fred']
>>> print(list(jjj.values()))
[100, 1, 42]
>>> print(list(jjj.items()))
[('jan', 100), ('chuck', 1), ('fred', 42)]
>>>

What is a “tuple”? - coming soon...


Bonus: Two Iteration Variables!
• We loop through the key-value pairs in a dictionary using *two*
iteration variables
• Each iteration, the first variable is the key and the second variable is
the corresponding value for the key

jjj = { 'chuck' : 1 , 'fred' : 42, 'jan': 100}
for aaa,bbb in jjj.items() :
    print(aaa, bbb)

jan 100
chuck 1
fred 42

aaa       bbb
[jan]     100
[chuck]   1
[fred]    42
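Putting the counting pattern and the two-variable loop together answers the earlier "Most Common Name?" question. In this sketch the list of names is typed in from that slide, and the variable names bigname and bigcount are illustrative.

```python
names = ['marquard', 'cwen', 'cwen', 'zhen', 'marquard', 'zhen', 'csev',
         'csev', 'zhen', 'marquard', 'zhen', 'csev', 'zhen']

# Step 1: count how often each name appears
counts = dict()
for name in names:
    counts[name] = counts.get(name, 0) + 1

# Step 2: loop through the key-value pairs and remember the largest count
bigname = None
bigcount = None
for name, count in counts.items():
    if bigcount is None or count > bigcount:
        bigname = name
        bigcount = count

print(bigname, bigcount)
```

If two names tie for the largest count, this loop keeps the one it meets first.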
Summary


Python Objects
Charles Severance

Python for Everybody


www.py4e.com
Warning
• This lecture is very much about definitions and
mechanics for objects
• This lecture is a lot more about “how it works” and less
about “how you use it”
• You won’t get the entire picture until this is all looked at
in the context of a real problem
• So please suspend disbelief and learn technique for the
next 40 or so slides…
https://docs.python.org/3/tutorial/datastructures.html
https://docs.python.org/3/library/sqlite3.html
Let's Start with Programs
inp = input('Europe floor?')
usf = int(inp) + 1
print('US floor', usf)

Europe floor? 0
US floor 1

Input → Process → Output


Object Oriented
• A program is made up of many cooperating objects
• Instead of being the “whole program” - each object is a
little “island” within the program that works cooperatively
with other objects
• A program is made up of one or more objects working
together - objects make use of each other’s capabilities
Object
• An Object is a bit of self-contained Code and Data
• A key aspect of the Object approach is to break the
problem into smaller understandable parts (divide and
conquer)
• Objects have boundaries that allow us to ignore un-needed
detail
• We have been using objects all along: String Objects,
Integer Objects, Dictionary Objects, List Objects...
[Diagram: Input → String Object, Dictionary Object → Output]
Objects get created and used

[Diagram: Input → several Code/Data blocks → Output]
Objects are bits of code and data

Objects hide detail - they allow us to ignore the detail of the
“rest of the program”.

Objects hide detail - they allow the “rest of the program” to
ignore the detail about “us”.
Definitions
• Class - a template
• Method or Message - A defined capability of a class
• Field or Attribute - A bit of data in a class
• Object or Instance - A particular instance of a class
Terminology: Class
Defines the abstract characteristics of a thing (object), including the
thing's characteristics (its attributes, fields or properties) and the
thing's behaviors (the things it can do, or methods, operations or
features). One might say that a class is a blueprint or factory that
describes the nature of something. For example, the class Dog would
consist of traits shared by all dogs, such as breed and fur color
(characteristics), and the ability to bark and sit (behaviors).

http://en.wikipedia.org/wiki/Object-oriented_programming
Terminology: Instance
One can have an instance of a class or a particular object.
The instance is the actual object created at runtime. In
programmer jargon, the Lassie object is an instance of the
Dog class. The set of values of the attributes of a particular
object is called its state. The object consists of state and the
behavior that's defined in the object's class.
Object and Instance are often used interchangeably.

http://en.wikipedia.org/wiki/Object-oriented_programming
Terminology: Method
An object's abilities. In language, methods are verbs. Lassie, being a
Dog, has the ability to bark. So bark() is one of Lassie's methods. She
may have other methods as well, for example sit() or eat() or walk() or
save_timmy(). Within the program, using a method usually affects
only one particular object; all Dogs can bark, but you need only one
particular dog to do the barking.

Method and Message are often used interchangeably.

http://en.wikipedia.org/wiki/Object-oriented_programming
Some Python Objects
>>> x = 'abc'
>>> type(x)
<class 'str'>
>>> type(2.5)
<class 'float'>
>>> type(2)
<class 'int'>
>>> y = list()
>>> type(y)
<class 'list'>
>>> z = dict()
>>> type(z)
<class 'dict'>

>>> dir(x)
[ … 'capitalize', 'casefold', 'center', 'count',
'encode', 'endswith', 'expandtabs', 'find',
'format', … 'lower', 'lstrip', 'maketrans',
'partition', 'replace', 'rfind', 'rindex', 'rjust',
'rpartition', 'rsplit', 'rstrip', 'split',
'splitlines', 'startswith', 'strip', 'swapcase',
'title', 'translate', 'upper', 'zfill']
>>> dir(y)
[… 'append', 'clear', 'copy', 'count', 'extend',
'index', 'insert', 'pop', 'remove', 'reverse',
'sort']
>>> dir(z)
[…, 'clear', 'copy', 'fromkeys', 'get', 'items',
'keys', 'pop', 'popitem', 'setdefault', 'update',
'values']
A Sample Class
# class is a reserved word. This is the template for making
# PartyAnimal objects.
class PartyAnimal:
    # Each PartyAnimal object has a bit of data
    x = 0

    # Each PartyAnimal object has a bit of code
    def party(self) :
        self.x = self.x + 1
        print("So far",self.x)

# Construct a PartyAnimal object and store in an
an = PartyAnimal()

# Tell the an object to run the party() code within it
an.party()
an.party()
an.party()    # same as PartyAnimal.party(an)
class PartyAnimal:
    x = 0

    def party(self) :
        self.x = self.x + 1
        print("So far",self.x)

an = PartyAnimal()

an.party()
an.party()
an.party()

$ python party1.py
So far 1
So far 2
So far 3

[Diagram: an refers to a PartyAnimal object holding x (initially 0) and
party(). Inside party(), self refers to an, so an.party() is the same as
PartyAnimal.party(an).]
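The two call forms shown in the diagram can be tried side by side. In this sketch both calls update the same object:

```python
class PartyAnimal:
    x = 0

    def party(self):
        self.x = self.x + 1
        print("So far", self.x)

an = PartyAnimal()

an.party()               # usual form: Python passes an in as self
PartyAnimal.party(an)    # explicit form: we pass an as self ourselves
```

The first form is the one used in practice; the second is shown only to make clear what self is.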
Playing with dir() and type()
A Nerdy Way to Find Capabilities
• The dir() command lists capabilities
• Ignore the ones with underscores - these are used by Python itself
• The rest are real operations that the object can perform
• It is like type() - it tells us something *about* a variable

>>> y = list()
>>> type(y)
<class 'list'>
>>> dir(y)
['__add__', '__class__',
'__contains__', '__delattr__',
'__delitem__', '__delslice__',
'__doc__', … '__setitem__',
'__setslice__', '__str__',
'append', 'clear', 'copy',
'count', 'extend', 'index',
'insert', 'pop', 'remove',
'reverse', 'sort']
>>>
We can use dir() to find the “capabilities” of our newly created class.

class PartyAnimal:
    x = 0

    def party(self) :
        self.x = self.x + 1
        print("So far",self.x)

an = PartyAnimal()

print("Type", type(an))
print("Dir ", dir(an))

$ python party3.py
Type <class '__main__.PartyAnimal'>
Dir ['__class__', ... 'party', 'x']
Try dir() with a String
>>> x = 'Hello there'
>>> dir(x)
['__add__', '__class__', '__contains__', '__delattr__',
'__doc__', '__eq__', '__ge__', '__getattribute__',
'__getitem__', '__getnewargs__', '__getslice__', '__gt__',
'__hash__', '__init__', '__le__', '__len__', '__lt__',
'__repr__', '__rmod__', '__rmul__', '__setattr__', '__str__',
'capitalize', 'center', 'count', 'decode', 'encode', 'endswith',
'expandtabs', 'find', 'index', 'isalnum', 'isalpha', 'isdigit',
'islower', 'isspace', 'istitle', 'isupper', 'join', 'ljust',
'lower', 'lstrip', 'partition', 'replace', 'rfind', 'rindex',
'rjust', 'rpartition', 'rsplit', 'rstrip', 'split',
'splitlines', 'startswith', 'strip', 'swapcase', 'title',
'translate', 'upper', 'zfill']
Object Lifecycle
http://en.wikipedia.org/wiki/Constructor_(computer_science)
Object Lifecycle
• Objects are created, used, and discarded
• We have special blocks of code (methods) that get called
- At the moment of creation (constructor)
- At the moment of destruction (destructor)
• Constructors are used a lot
• Destructors are seldom used
Constructor
The primary purpose of the constructor is to set up some
instance variables to have the proper initial values when
the object is created
class PartyAnimal:
    x = 0

    def __init__(self):
        print('I am constructed')

    def party(self) :
        self.x = self.x + 1
        print('So far',self.x)

    def __del__(self):
        print('I am destructed', self.x)

an = PartyAnimal()
an.party()
an.party()
an = 42
print('an contains',an)

$ python party4.py
I am constructed
So far 1
So far 2
I am destructed 2
an contains 42

The constructor and destructor are optional. The constructor is
typically used to set up variables. The destructor is seldom used.
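The slide's constructor only prints a message. A sketch of the more typical use described above, where the constructor sets the initial value of an instance variable, could look like this (a variation on the slide's class, not the author's code):

```python
class PartyAnimal:
    def __init__(self):
        self.x = 0                    # every new object starts its own count at 0
        print('I am constructed')

    def party(self):
        self.x = self.x + 1
        print('So far', self.x)

an = PartyAnimal()
an.party()
an.party()
```

Here x lives on each object rather than on the class, so there is no class-level x at all.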
Constructor
In object oriented programming, a constructor in a class
is a special block of statements called when an object is
created

http://en.wikipedia.org/wiki/Constructor_(computer_science)
Many Instances
• We can create lots of objects - the class is the template
for the object

• We can store each distinct object in its own variable


• We call this having multiple instances of the same class
• Each instance has its own copy of the instance variables
Constructors can have additional parameters. These can be used to set up
instance variables for the particular instance of the class (i.e., for the
particular object).

class PartyAnimal:
    x = 0
    name = ""
    def __init__(self, z):
        self.name = z
        print(self.name,"constructed")

    def party(self) :
        self.x = self.x + 1
        print(self.name,"party count",self.x)

s = PartyAnimal("Sally")
j = PartyAnimal("Jim")

s.party()
j.party()
s.party()

party5.py
[Diagram: s holds x: 0 and name: Sally; j holds x: 0 and name: Jim]

We have two independent instances

$ python party5.py
Sally constructed
Jim constructed
Sally party count 1
Jim party count 1
Sally party count 2
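To check that the two instances really are independent, we can look at their variables after the calls. This sketch repeats the class from the slide without the print statements. The final line relies on a detail of Python: x = 0 in the class body is a class-level default, and the first self.x = self.x + 1 creates a separate x on that particular object.

```python
class PartyAnimal:
    x = 0                         # class-level default shared until overwritten
    name = ""

    def __init__(self, z):
        self.name = z             # stored on this particular object

    def party(self):
        self.x = self.x + 1       # creates or updates x on this object only

s = PartyAnimal("Sally")
j = PartyAnimal("Jim")

s.party()
j.party()
s.party()

print(s.name, s.x)
print(j.name, j.x)
print(PartyAnimal.x)              # the class-level default is unchanged
```

Each object keeps its own count, which is why Sally's second party does not affect Jim.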
Inheritance
http://www.ibiblio.org/g2swap/byteofpython/read/inheritance.html
Inheritance
• When we make a new class - we can reuse an existing
class and inherit all the capabilities of an existing class
and then add our own little bit to make our new class
• Another form of store and reuse
• Write once - reuse many times
• The new class (child) has all the capabilities of the old
class (parent) - and then some more
Terminology: Inheritance

‘Subclasses’ are more specialized versions of a class, which
inherit attributes and behaviors from their parent classes, and
can introduce their own.

http://en.wikipedia.org/wiki/Object-oriented_programming
FootballFan is a class which extends PartyAnimal. It has all the
capabilities of PartyAnimal and more.

class PartyAnimal:
    x = 0
    name = ""
    def __init__(self, nam):
        self.name = nam
        print(self.name,"constructed")

    def party(self) :
        self.x = self.x + 1
        print(self.name,"party count",self.x)

class FootballFan(PartyAnimal):
    points = 0
    def touchdown(self):
        self.points = self.points + 7
        self.party()
        print(self.name,"points",self.points)

s = PartyAnimal("Sally")
s.party()

j = FootballFan("Jim")
j.party()
j.touchdown()

[Diagram: s holds x and name: Sally; j holds x, name: Jim, and points]
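When a child class needs its own constructor, it can call the parent's constructor with super() and then add its own variables. The sketch below is a variation on the slide's classes under that assumption (the slide itself keeps points as a class-level default and has no FootballFan constructor).

```python
class PartyAnimal:
    def __init__(self, nam):
        self.x = 0
        self.name = nam

    def party(self):
        self.x = self.x + 1
        print(self.name, "party count", self.x)

class FootballFan(PartyAnimal):
    def __init__(self, nam):
        super().__init__(nam)     # run the parent constructor first
        self.points = 0           # then add what is new in the child

    def touchdown(self):
        self.points = self.points + 7
        self.party()              # inherited from PartyAnimal
        print(self.name, "points", self.points)

s = PartyAnimal("Sally")
j = FootballFan("Jim")
j.party()
j.touchdown()

print(isinstance(j, PartyAnimal))     # a FootballFan is also a PartyAnimal
print(hasattr(s, 'touchdown'))        # but a plain PartyAnimal cannot touchdown
```

Inheritance only goes one way: the child gains the parent's capabilities, while the parent is unchanged.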
Definitions
• Class - a template
• Attribute – A variable within a class
• Method - A function within a class
• Object - A particular instance of a class
• Constructor – Code that runs when an object is created
• Inheritance - The ability to extend a class to make a new class.
Summary
• Object Oriented programming is a very structured
approach to code reuse

• We can group data and functionality together and create
many independent instances of a class
Acknowledgements / Contributions
These slides are Copyright 2010- Charles R. Severance
(www.dr-chuck.com) of the University of Michigan School of Information
and made available under a Creative Commons Attribution 4.0
License. Please maintain this last slide in all copies of the
document to comply with the attribution requirements of the
license. If you make a change, feel free to add your name and
organization to the list of contributors on this page as you
republish the materials.

Initial Development: Charles Severance, University of Michigan


School of Information

… Insert new Contributors here


Additional Source Information
• "Snowman Cookie Cutter" by Didriks is licensed under CC BY
https://www.flickr.com/photos/dinnerseries/23570475099

• Photo from the television program Lassie. Lassie watches as Jeff (Tommy Rettig) works on his bike is Public Domain
https://en.wikipedia.org/wiki/Lassie#/media/File:Lassie_and_Tommy_Rettig_1956.JPG
Introducing DataFrames
DATA MANIPULATION WITH PANDAS
pandas is built on NumPy and Matplotlib
pandas is popular

https://pypistats.org/packages/pandas
Rectangular data
Name     Breed        Color  Height (cm)  Weight (kg)  Date of Birth
Bella    Labrador     Brown  56           25           2013-07-01
Charlie  Poodle       Black  43           23           2016-09-16
Lucy     Chow Chow    Brown  46           22           2014-08-25
Cooper   Schnauzer    Gray   49           17           2011-12-11
Max      Labrador     Black  59           29           2017-01-20
Stella   Chihuahua    Tan    18           2            2015-04-20
Bernie   St. Bernard  White  77           74           2018-02-27

pandas DataFrames
print(dogs)

name breed color height_cm weight_kg date_of_birth


0 Bella Labrador Brown 56 24 2013-07-01
1 Charlie Poodle Black 43 24 2016-09-16
2 Lucy Chow Chow Brown 46 24 2014-08-25
3 Cooper Schnauzer Gray 49 17 2011-12-11
4 Max Labrador Black 59 29 2017-01-20
5 Stella Chihuahua Tan 18 2 2015-04-20
6 Bernie St. Bernard White 77 74 2018-02-27
Exploring a DataFrame: .head()
dogs.head()

name breed color height_cm weight_kg date_of_birth


0 Bella Labrador Brown 56 24 2013-07-01
1 Charlie Poodle Black 43 24 2016-09-16
2 Lucy Chow Chow Brown 46 24 2014-08-25
3 Cooper Schnauzer Gray 49 17 2011-12-11
4 Max Labrador Black 59 29 2017-01-20
Exploring a DataFrame: .info()
dogs.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7 entries, 0 to 6
Data columns (total 6 columns):
name 7 non-null object
breed 7 non-null object
color 7 non-null object
height_cm 7 non-null int64
weight_kg 7 non-null int64
date_of_birth 7 non-null object
dtypes: int64(2), object(4)
Exploring a DataFrame: .shape
dogs.shape

(7, 6)
Exploring a DataFrame: .describe()
dogs.describe()

height_cm weight_kg
count 7.000000 7.000000
mean 49.714286 27.428571
std 17.960274 22.292429
min 18.000000 2.000000
25% 44.500000 19.500000
50% 49.000000 23.000000
75% 57.500000 27.000000
max 77.000000 74.000000
Components of a DataFrame: .values
dogs.values

array([['Bella', 'Labrador', 'Brown', 56, 24, '2013-07-01'],
       ['Charlie', 'Poodle', 'Black', 43, 24, '2016-09-16'],
       ['Lucy', 'Chow Chow', 'Brown', 46, 24, '2014-08-25'],
       ['Cooper', 'Schnauzer', 'Gray', 49, 17, '2011-12-11'],
       ['Max', 'Labrador', 'Black', 59, 29, '2017-01-20'],
       ['Stella', 'Chihuahua', 'Tan', 18, 2, '2015-04-20'],
       ['Bernie', 'St. Bernard', 'White', 77, 74, '2018-02-27']],
      dtype=object)
Components of a DataFrame: .columns and .index
dogs.columns

Index(['name', 'breed', 'color', 'height_cm', 'weight_kg',
       'date_of_birth'], dtype='object')

dogs.index

RangeIndex(start=0, stop=7, step=1)


pandas Philosophy
There should be one -- and preferably only one -- obvious way to do it.

- The Zen of Python by Tim Peters, Item 13

https://peps.python.org/pep-0020/
Sorting and subsetting
DATA MANIPULATION WITH PANDAS

Richie Cotton
Curriculum Architect at DataCamp
Sorting
dogs.sort_values("weight_kg")

name breed color height_cm weight_kg date_of_birth


5 Stella Chihuahua Tan 18 2 2015-04-20
3 Cooper Schnauzer Gray 49 17 2011-12-11
0 Bella Labrador Brown 56 24 2013-07-01
1 Charlie Poodle Black 43 24 2016-09-16
2 Lucy Chow Chow Brown 46 24 2014-08-25
4 Max Labrador Black 59 29 2017-01-20
6 Bernie St. Bernard White 77 74 2018-02-27
Sorting in descending order
dogs.sort_values("weight_kg", ascending=False)

      name        breed  color  height_cm  weight_kg  date_of_birth
6   Bernie  St. Bernard  White         77         74     2018-02-27
4      Max     Labrador  Black         59         29     2017-01-20
0    Bella     Labrador  Brown         56         24     2013-07-01
1  Charlie       Poodle  Black         43         24     2016-09-16
2     Lucy    Chow Chow  Brown         46         24     2014-08-25
3   Cooper    Schnauzer   Gray         49         17     2011-12-11
5   Stella    Chihuahua    Tan         18          2     2015-04-20
Sorting by multiple variables
dogs.sort_values(["weight_kg", "height_cm"])

name breed color height_cm weight_kg date_of_birth


5 Stella Chihuahua Tan 18 2 2015-04-20
3 Cooper Schnauzer Gray 49 17 2011-12-11
1 Charlie Poodle Black 43 24 2016-09-16
2 Lucy Chow Chow Brown 46 24 2014-08-25
0 Bella Labrador Brown 56 24 2013-07-01
4 Max Labrador Black 59 29 2017-01-20
6 Bernie St. Bernard White 77 74 2018-02-27
Sorting by multiple variables
dogs.sort_values(["weight_kg", "height_cm"], ascending=[True, False])

name breed color height_cm weight_kg date_of_birth


5 Stella Chihuahua Tan 18 2 2015-04-20
3 Cooper Schnauzer Gray 49 17 2011-12-11
0 Bella Labrador Brown 56 24 2013-07-01
2 Lucy Chow Chow Brown 46 24 2014-08-25
1 Charlie Poodle Black 43 24 2016-09-16
4 Max Labrador Black 59 29 2017-01-20
6 Bernie St. Bernard White 77 74 2018-02-27
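The multi-column sort above can be reproduced with a small stand-in table. This is only a sketch: the DataFrame below is rebuilt by hand from the rows shown on the slides, and the variable name by_weight_then_height is an illustrative choice.

```python
import pandas as pd

# Hand-built stand-in for the slides' dogs table (only the columns needed here).
dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Lucy", "Cooper", "Max", "Stella", "Bernie"],
    "height_cm": [56, 43, 46, 49, 59, 18, 77],
    "weight_kg": [24, 24, 24, 17, 29, 2, 74],
})

# Sort by weight ascending; ties in weight are broken by height descending.
by_weight_then_height = dogs.sort_values(
    ["weight_kg", "height_cm"], ascending=[True, False]
)
print(by_weight_then_height["name"].tolist())
```

The three 24 kg dogs come out tallest first, which is the only difference from the all-ascending version.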
Subsetting columns
dogs["name"]

0 Bella
1 Charlie
2 Lucy
3 Cooper
4 Max
5 Stella
6 Bernie
Name: name, dtype: object
Subsetting multiple columns
dogs[["breed", "height_cm"]]

breed height_cm
0 Labrador 56
1 Poodle 43
2 Chow Chow 46
3 Schnauzer 49
4 Labrador 59
5 Chihuahua 18
6 St. Bernard 77

cols_to_subset = ["breed", "height_cm"]
dogs[cols_to_subset]

breed height_cm
0 Labrador 56
1 Poodle 43
2 Chow Chow 46
3 Schnauzer 49
4 Labrador 59
5 Chihuahua 18
6 St. Bernard 77
Subsetting rows
dogs["height_cm"] > 50

0 True
1 False
2 False
3 False
4 True
5 False
6 True
Name: height_cm, dtype: bool
Subsetting rows
dogs[dogs["height_cm"] > 50]

name breed color height_cm weight_kg date_of_birth


0 Bella Labrador Brown 56 24 2013-07-01
4 Max Labrador Black 59 29 2017-01-20
6 Bernie St. Bernard White 77 74 2018-02-27
Subsetting based on text data
dogs[dogs["breed"] == "Labrador"]

name breed color height_cm weight_kg date_of_birth


0 Bella Labrador Brown 56 24 2013-07-01
4 Max Labrador Black 59 29 2017-01-20
Subsetting based on dates
dogs[dogs["date_of_birth"] > "2015-01-01"]

name breed color height_cm weight_kg date_of_birth


1 Charlie Poodle Black 43 24 2016-09-16
4 Max Labrador Black 59 29 2017-01-20
5 Stella Chihuahua Tan 18 2 2015-04-20
6 Bernie St. Bernard White 77 74 2018-02-27
Subsetting based on multiple conditions
is_lab = dogs["breed"] == "Labrador"
is_brown = dogs["color"] == "Brown"

dogs[is_lab & is_brown]

name breed color height_cm weight_kg date_of_birth


0 Bella Labrador Brown 56 24 2013-07-01

dogs[ (dogs["breed"] == "Labrador") & (dogs["color"] == "Brown") ]
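As a hedged sketch of how the boolean masks combine, the example below uses a hand-built copy of three columns from the dogs table. The names both, either and not_lab are illustrative; the operators &, | and ~ are the element-wise "and", "or" and "not" for pandas Series.

```python
import pandas as pd

# Hand-built stand-in for the slides' dogs table.
dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Lucy", "Cooper", "Max", "Stella", "Bernie"],
    "breed": ["Labrador", "Poodle", "Chow Chow", "Schnauzer",
              "Labrador", "Chihuahua", "St. Bernard"],
    "color": ["Brown", "Black", "Brown", "Gray", "Black", "Tan", "White"],
})

is_lab = dogs["breed"] == "Labrador"
is_brown = dogs["color"] == "Brown"

both = dogs[is_lab & is_brown]     # rows where both conditions hold
either = dogs[is_lab | is_brown]   # rows where at least one holds
not_lab = dogs[~is_lab]            # rows where the first condition is false

print(both["name"].tolist())
print(either["name"].tolist())
```

When the conditions are written inline, each comparison needs its own parentheses because & and | bind more tightly than == in Python.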


Subsetting using .isin()
is_black_or_brown = dogs["color"].isin(["Black", "Brown"])
dogs[is_black_or_brown]

name breed color height_cm weight_kg date_of_birth


0 Bella Labrador Brown 56 24 2013-07-01
1 Charlie Poodle Black 43 24 2016-09-16
2 Lucy Chow Chow Brown 46 24 2014-08-25
4 Max Labrador Black 59 29 2017-01-20
New columns
DATA MANIPULATION WITH PANDAS

Richie Cotton
Curriculum Architect at DataCamp
Adding a new column
dogs["height_m"] = dogs["height_cm"] / 100

print(dogs)

name breed color height_cm weight_kg date_of_birth height_m


0 Bella Labrador Brown 56 24 2013-07-01 0.56
1 Charlie Poodle Black 43 24 2016-09-16 0.43
2 Lucy Chow Chow Brown 46 24 2014-08-25 0.46
3 Cooper Schnauzer Gray 49 17 2011-12-11 0.49
4 Max Labrador Black 59 29 2017-01-20 0.59
5 Stella Chihuahua Tan 18 2 2015-04-20 0.18
6 Bernie St. Bernard White 77 74 2018-02-27 0.77
Doggy mass index
BMI = weight in kg / (height in m)^2

dogs["bmi"] = dogs["weight_kg"] / dogs["height_m"] ** 2


print(dogs.head())

name breed color height_cm weight_kg date_of_birth height_m bmi


0 Bella Labrador Brown 56 24 2013-07-01 0.56 76.530612
1 Charlie Poodle Black 43 24 2016-09-16 0.43 129.799892
2 Lucy Chow Chow Brown 46 24 2014-08-25 0.46 113.421550
3 Cooper Schnauzer Gray 49 17 2011-12-11 0.49 70.803832
4 Max Labrador Black 59 29 2017-01-20 0.59 83.309394
Multiple manipulations
bmi_lt_100 = dogs[dogs["bmi"] < 100]
bmi_lt_100_height = bmi_lt_100.sort_values("height_cm", ascending=False)

bmi_lt_100_height[["name", "height_cm", "bmi"]]

name height_cm bmi


4 Max 59 83.309394
0 Bella 56 76.530612
3 Cooper 49 70.803832
5 Stella 18 61.728395
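The same sequence of steps can also be written as one chained expression. This is a sketch of an alternative style, not the slides' own approach; it assumes a hand-built copy of the dogs table and uses DataFrame.assign() so the original table is left unchanged.

```python
import pandas as pd

# Hand-built stand-in for the slides' dogs table.
dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Lucy", "Cooper", "Max", "Stella", "Bernie"],
    "height_cm": [56, 43, 46, 49, 59, 18, 77],
    "weight_kg": [24, 24, 24, 17, 29, 2, 74],
})

result = (
    dogs
    .assign(height_m=lambda d: d["height_cm"] / 100)             # add height in metres
    .assign(bmi=lambda d: d["weight_kg"] / d["height_m"] ** 2)   # add BMI
    .loc[lambda d: d["bmi"] < 100]                               # keep rows with BMI under 100
    .sort_values("height_cm", ascending=False)                   # tallest first
    [["name", "height_cm", "bmi"]]                               # keep three columns
)
print(result)
```

Chaining avoids the intermediate variables, at the cost of being harder to inspect step by step.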
Summary statistics
DATA MANIPULATION WITH PANDAS

Maggie Matsui
Content Developer at DataCamp
Summarizing numerical data
dogs["height_cm"].mean()

49.714285714285715

Other summary methods:
.median(), .mode()
.min(), .max()
.var(), .std()
.sum()
.quantile()
Summarizing dates
Oldest dog:

dogs["date_of_birth"].min()

'2011-12-11'

Youngest dog:

dogs["date_of_birth"].max()

'2018-02-27'
The .agg() method
def pct30(column):
    return column.quantile(0.3)

dogs["weight_kg"].agg(pct30)

22.599999999999998
Summaries on multiple columns
dogs[["weight_kg", "height_cm"]].agg(pct30)

weight_kg 22.6
height_cm 45.4
dtype: float64
Multiple summaries
def pct40(column):
    return column.quantile(0.4)

dogs["weight_kg"].agg([pct30, pct40])

pct30 22.6
pct40 24.0
Name: weight_kg, dtype: float64
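A custom function can be mixed with built-in summaries in a single .agg() call by passing the built-ins as strings. The sketch below assumes the seven weights shown on the slides; the variable name summary is illustrative.

```python
import pandas as pd

# Weights copied by hand from the slides' dogs table.
weights = pd.Series([24, 24, 24, 17, 29, 2, 74], name="weight_kg")

def pct30(column):
    # 30th percentile, using pandas' default linear interpolation
    return column.quantile(0.3)

# Custom function plus two built-in summaries named by string.
summary = weights.agg([pct30, "median", "max"])
print(summary)
```

The result is a Series labelled by function name, so individual values can be read back with summary["pct30"] and so on.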
Cumulative sum
dogs["height_cm"]

0 56
1 43
2 46
3 49
4 59
5 18
6 77
Name: height_cm, dtype: int64

dogs["height_cm"].cumsum()

0 56
1 99
2 145
3 194
4 253
5 271
6 348
Name: height_cm, dtype: int64
Cumulative statistics
.cummax()

.cummin()

.cumprod()
Walmart
sales.head()

store type dept date weekly_sales is_holiday temp_c fuel_price unemp


0 1 A 1 2010-02-05 24924.50 False 5.73 0.679 8.106
1 1 A 2 2010-02-05 50605.27 False 5.73 0.679 8.106
2 1 A 3 2010-02-05 13740.12 False 5.73 0.679 8.106
3 1 A 4 2010-02-05 39954.04 False 5.73 0.679 8.106
4 1 A 5 2010-02-05 32229.38 False 5.73 0.679 8.106
Counting
DATA MANIPULATION WITH PANDAS

Maggie Matsui
Content Developer at DataCamp
Avoiding double counting
Vet visits
print(vet_visits)

date name breed weight_kg


0 2018-09-02 Bella Labrador 24.87
1 2019-06-07 Max Labrador 28.35
2 2018-01-17 Stella Chihuahua 1.51
3 2019-10-19 Lucy Chow Chow 24.07
.. ... ... ... ...
71 2018-01-20 Stella Chihuahua 2.83
72 2019-06-07 Max Chow Chow 24.01
73 2018-08-20 Lucy Chow Chow 24.40
74 2019-04-22 Max Labrador 28.54
Dropping duplicate names
vet_visits.drop_duplicates(subset="name")

date name breed weight_kg


0 2018-09-02 Bella Labrador 24.87
1 2019-06-07 Max Chow Chow 24.01
2 2019-03-19 Charlie Poodle 24.95
3 2018-01-17 Stella Chihuahua 1.51
4 2019-10-19 Lucy Chow Chow 24.07
7 2019-03-30 Cooper Schnauzer 16.91
10 2019-01-04 Bernie St. Bernard 74.98
(6 2019-06-07 Max Labrador 28.35)
Dropping duplicate pairs
unique_dogs = vet_visits.drop_duplicates(subset=["name", "breed"])
print(unique_dogs)

date name breed weight_kg


0 2018-09-02 Bella Labrador 24.87
1 2019-03-13 Max Chow Chow 24.13
2 2019-03-19 Charlie Poodle 24.95
3 2018-01-17 Stella Chihuahua 1.51
4 2019-10-19 Lucy Chow Chow 24.07
6 2019-06-07 Max Labrador 28.35
7 2019-03-30 Cooper Schnauzer 16.91
10 2019-01-04 Bernie St. Bernard 74.98
Easy as 1, 2, 3
unique_dogs["breed"].value_counts()

Labrador 2
Schnauzer 1
St. Bernard 1
Chow Chow 2
Poodle 1
Chihuahua 1
Name: breed, dtype: int64

unique_dogs["breed"].value_counts(sort=True)

Labrador 2
Chow Chow 2
Schnauzer 1
St. Bernard 1
Poodle 1
Chihuahua 1
Name: breed, dtype: int64
Proportions
unique_dogs["breed"].value_counts(normalize=True)

Labrador 0.250
Chow Chow 0.250
Schnauzer 0.125
St. Bernard 0.125
Poodle 0.125
Chihuahua 0.125
Name: breed, dtype: float64
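The whole counting workflow (drop repeat visits, then count) can be sketched on a tiny made-up table. The five rows below are hypothetical and are not the slides' vet_visits data; they only include two different dogs named Max to show why the name/breed pair is needed.

```python
import pandas as pd

# Hypothetical visit log: Bella appears twice, and there are two different dogs called Max.
vet_visits = pd.DataFrame({
    "name": ["Bella", "Max", "Bella", "Max", "Stella"],
    "breed": ["Labrador", "Labrador", "Labrador", "Chow Chow", "Chihuahua"],
})

# Keep one row per (name, breed) pair so each dog is counted once.
unique_dogs = vet_visits.drop_duplicates(subset=["name", "breed"])

counts = unique_dogs["breed"].value_counts()
proportions = unique_dogs["breed"].value_counts(normalize=True)
print(counts)
print(proportions)
```

Dropping duplicates on "name" alone would merge the two Maxes into one dog and undercount the Chow Chows.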
Grouped summary statistics
DATA MANIPULATION WITH PANDAS

Maggie Matsui
Content Developer at DataCamp
Summaries by group
dogs[dogs["color"] == "Black"]["weight_kg"].mean()
dogs[dogs["color"] == "Brown"]["weight_kg"].mean()
dogs[dogs["color"] == "White"]["weight_kg"].mean()
dogs[dogs["color"] == "Gray"]["weight_kg"].mean()
dogs[dogs["color"] == "Tan"]["weight_kg"].mean()

26.5
24.0
74.0
17.0
2.0
Grouped summaries
dogs.groupby("color")["weight_kg"].mean()

color
Black 26.5
Brown 24.0
Gray 17.0
Tan 2.0
White 74.0
Name: weight_kg, dtype: float64
Multiple grouped summaries
dogs.groupby("color")["weight_kg"].agg([min, max, sum])

min max sum


color
Black 24 29 53
Brown 24 24 48
Gray 17 17 17
Tan 2 2 2
White 74 74 74
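The grouped summary can also be requested with string names instead of the Python built-ins min, max and sum. This is a sketch on a hand-built copy of the dogs table; the string form is an assumption about style, chosen because some recent pandas releases emit a FutureWarning when built-in functions are passed to .agg().

```python
import pandas as pd

# Hand-built stand-in for the slides' dogs table.
dogs = pd.DataFrame({
    "color": ["Brown", "Black", "Brown", "Gray", "Black", "Tan", "White"],
    "weight_kg": [24, 24, 24, 17, 29, 2, 74],
})

# One row per color, one column per summary statistic.
summary = dogs.groupby("color")["weight_kg"].agg(["min", "max", "sum", "mean"])
print(summary)
```

The "mean" column reproduces the per-color averages computed by hand on the earlier slide, including 26.5 for the two black dogs.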
Grouping by multiple variables
dogs.groupby(["color", "breed"])["weight_kg"].mean()

color breed
Black Labrador 29.0
Poodle 24.0
Brown Chow Chow 24.0
Labrador 24.0
Gray Schnauzer 17.0
Tan Chihuahua 2.0
White St. Bernard 74.0
Name: weight_kg, dtype: float64
Many groups, many summaries
dogs.groupby(["color", "breed"])[["weight_kg", "height_cm"]].mean()

weight_kg height_cm
color breed
Black Labrador 29 59
Poodle 24 43
Brown Chow Chow 24 46
Labrador 24 56
Gray Schnauzer 17 49
Tan Chihuahua 2 18
White St. Bernard 74 77
Pivot tables
DATA MANIPULATION WITH PANDAS

Maggie Matsui
Content Developer at DataCamp
Group by to pivot table
dogs.groupby("color")["weight_kg"].mean()

color
Black 26.5
Brown 24.0
Gray 17.0
Tan 2.0
White 74.0
Name: weight_kg, dtype: float64

dogs.pivot_table(values="weight_kg", index="color")

weight_kg
color
Black 26.5
Brown 24.0
Gray 17.0
Tan 2.0
White 74.0
Different statistics
import numpy as np
dogs.pivot_table(values="weight_kg", index="color", aggfunc=np.median)

weight_kg
color
Black 26.5
Brown 24.0
Gray 17.0
Tan 2.0
White 74.0
Multiple statistics
dogs.pivot_table(values="weight_kg", index="color", aggfunc=[np.mean, np.median])

mean median
weight_kg weight_kg
color
Black 26.5 26.5
Brown 24.0 24.0
Gray 17.0 17.0
Tan 2.0 2.0
White 74.0 74.0
Pivot on two variables
dogs.groupby(["color", "breed"])["weight_kg"].mean()

dogs.pivot_table(values="weight_kg", index="color", columns="breed")

breed Chihuahua Chow Chow Labrador Poodle Schnauzer St. Bernard


color
Black NaN NaN 29.0 24.0 NaN NaN
Brown NaN 24.0 24.0 NaN NaN NaN
Gray NaN NaN NaN NaN 17.0 NaN
Tan 2.0 NaN NaN NaN NaN NaN
White NaN NaN NaN NaN NaN 74.0
Filling missing values in pivot tables
dogs.pivot_table(values="weight_kg", index="color", columns="breed", fill_value=0)

breed Chihuahua Chow Chow Labrador Poodle Schnauzer St. Bernard


color
Black 0 0 29 24 0 0
Brown 0 24 24 0 0 0
Gray 0 0 0 0 17 0
Tan 2 0 0 0 0 0
White 0 0 0 0 0 74
Summing with pivot tables
dogs.pivot_table(values="weight_kg", index="color",
                 columns="breed", fill_value=0, margins=True)

breed Chihuahua Chow Chow Labrador Poodle Schnauzer St. Bernard All
color
Black 0 0 29 24 0 0 26.500000
Brown 0 24 24 0 0 0 24.000000
Gray 0 0 0 0 17 0 17.000000
Tan 2 0 0 0 0 0 2.000000
White 0 0 0 0 0 74 74.000000
All 2 24 26 24 17 74 27.714286
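The pivot-table options from the last few slides can be tried together on a hand-built copy of the dogs table. This is a sketch under that assumption; the names pivot and with_totals are illustrative.

```python
import pandas as pd

# Hand-built stand-in for the slides' dogs table.
dogs = pd.DataFrame({
    "breed": ["Labrador", "Poodle", "Chow Chow", "Schnauzer",
              "Labrador", "Chihuahua", "St. Bernard"],
    "color": ["Brown", "Black", "Brown", "Gray", "Black", "Tan", "White"],
    "weight_kg": [24, 24, 24, 17, 29, 2, 74],
})

# Mean weight for each color/breed combination; missing combinations are NaN.
pivot = dogs.pivot_table(values="weight_kg", index="color", columns="breed")

# Same table with NaN replaced by 0 and an "All" row and column of overall means.
with_totals = dogs.pivot_table(values="weight_kg", index="color",
                               columns="breed", fill_value=0, margins=True)
print(with_totals)
```

The "All" entries are computed from the original rows rather than from the filled-in zeros, which is why the bottom-right cell equals the mean weight of all seven dogs.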
Explicit indexes
DATA MANIPULATION WITH PANDAS

Richie Cotton
Curriculum Architect at DataCamp
The dog dataset, revisited
print(dogs)

name breed color height_cm weight_kg


0 Bella Labrador Brown 56 25
1 Charlie Poodle Black 43 23
2 Lucy Chow Chow Brown 46 22
3 Cooper Schnauzer Grey 49 17
4 Max Labrador Black 59 29
5 Stella Chihuahua Tan 18 2
6 Bernie St. Bernard White 77 74
.columns and .index
dogs.columns

Index(['name', 'breed', 'color', 'height_cm', 'weight_kg'],
      dtype='object')

dogs.index

RangeIndex(start=0, stop=7, step=1)


Setting a column as the index
dogs_ind = dogs.set_index("name")

print(dogs_ind)

breed color height_cm weight_kg


name
Bella Labrador Brown 56 25
Charlie Poodle Black 43 23
Lucy Chow Chow Brown 46 22
Cooper Schnauzer Grey 49 17
Max Labrador Black 59 29
Stella Chihuahua Tan 18 2
Bernie St. Bernard White 77 74
Removing an index
dogs_ind.reset_index()

name breed color height_cm weight_kg


0 Bella Labrador Brown 56 25
1 Charlie Poodle Black 43 23
2 Lucy Chow Chow Brown 46 22
3 Cooper Schnauzer Grey 49 17
4 Max Labrador Black 59 29
5 Stella Chihuahua Tan 18 2
6 Bernie St. Bernard White 77 74
Dropping an index
dogs_ind.reset_index(drop=True)

breed color height_cm weight_kg


0 Labrador Brown 56 25
1 Poodle Black 43 23
2 Chow Chow Brown 46 22
3 Schnauzer Grey 49 17
4 Labrador Black 59 29
5 Chihuahua Tan 18 2
6 St. Bernard White 77 74
Indexes make subsetting simpler
dogs[dogs["name"].isin(["Bella", "Stella"])]

name breed color height_cm weight_kg


0 Bella Labrador Brown 56 25
5 Stella Chihuahua Tan 18 2

dogs_ind.loc[["Bella", "Stella"]]

breed color height_cm weight_kg


name
Bella Labrador Brown 56 25
Stella Chihuahua Tan 18 2
Index values don't need to be unique
dogs_ind2 = dogs.set_index("breed")
print(dogs_ind2)

name color height_cm weight_kg


breed
Labrador Bella Brown 56 25
Poodle Charlie Black 43 23
Chow Chow Lucy Brown 46 22
Schnauzer Cooper Grey 49 17
Labrador Max Black 59 29
Chihuahua Stella Tan 18 2
St. Bernard Bernie White 77 74
Subsetting on duplicated index values
dogs_ind2.loc["Labrador"]

name color height_cm weight_kg


breed
Labrador Bella Brown 56 25
Labrador Max Black 59 29
Multi-level indexes a.k.a. hierarchical indexes
dogs_ind3 = dogs.set_index(["breed", "color"])
print(dogs_ind3)

name height_cm weight_kg


breed color
Labrador Brown Bella 56 25
Poodle Black Charlie 43 23
Chow Chow Brown Lucy 46 22
Schnauzer Grey Cooper 49 17
Labrador Black Max 59 29
Chihuahua Tan Stella 18 2
St. Bernard White Bernie 77 74
Subset the outer level with a list
dogs_ind3.loc[["Labrador", "Chihuahua"]]

name height_cm weight_kg


breed color
Labrador Brown Bella 56 25
Black Max 59 29
Chihuahua Tan Stella 18 2
Subset inner levels with a list of tuples
dogs_ind3.loc[[("Labrador", "Brown"), ("Chihuahua", "Tan")]]

name height_cm weight_kg


breed color
Labrador Brown Bella 56 25
Chihuahua Tan Stella 18 2
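The two ways of subsetting a multi-level index can be compared side by side. The sketch below rebuilds part of the dogs table by hand and sorts the index first; the variable names outer and inner are illustrative.

```python
import pandas as pd

# Hand-built stand-in for the slides' dogs table.
dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Lucy", "Cooper", "Max", "Stella", "Bernie"],
    "breed": ["Labrador", "Poodle", "Chow Chow", "Schnauzer",
              "Labrador", "Chihuahua", "St. Bernard"],
    "color": ["Brown", "Black", "Brown", "Grey", "Black", "Tan", "White"],
})
dogs_ind3 = dogs.set_index(["breed", "color"]).sort_index()

# A list of plain labels selects on the outer level (breed) only.
outer = dogs_ind3.loc[["Labrador", "Chihuahua"]]

# A list of tuples selects exact (breed, color) combinations.
inner = dogs_ind3.loc[[("Labrador", "Brown"), ("Chihuahua", "Tan")]]

print(outer)
print(inner)
```

The outer-level selection returns both Labradors, while the tuple selection returns only the brown one.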
Sorting by index values
dogs_ind3.sort_index()

name height_cm weight_kg


breed color
Chihuahua Tan Stella 18 2
Chow Chow Brown Lucy 46 22
Labrador Black Max 59 29
Brown Bella 56 25
Poodle Black Charlie 43 23
Schnauzer Grey Cooper 49 17
St. Bernard White Bernie 77 74
Controlling sort_index
dogs_ind3.sort_index(level=["color", "breed"], ascending=[True, False])

name height_cm weight_kg


breed color
Poodle Black Charlie 43 23
Labrador Black Max 59 29
Brown Bella 56 25
Chow Chow Brown Lucy 46 22
Schnauzer Grey Cooper 49 17
Chihuahua Tan Stella 18 2
St. Bernard White Bernie 77 74
Now you have two problems
Index values are just data
Indexes violate "tidy data" principles

You need to learn two syntaxes


Temperature dataset
date city country avg_temp_c

0 2000-01-01 Abidjan Côte D'Ivoire 27.293

1 2000-02-01 Abidjan Côte D'Ivoire 27.685

2 2000-03-01 Abidjan Côte D'Ivoire 29.061

3 2000-04-01 Abidjan Côte D'Ivoire 28.162

4 2000-05-01 Abidjan Côte D'Ivoire 27.547


Slicing and subsetting with .loc and .iloc

DATA MANIPULATION WITH PANDAS

Richie Cotton
Curriculum Architect at DataCamp
Slicing lists
breeds = ["Labrador", "Poodle",
          "Chow Chow", "Schnauzer",
          "Labrador", "Chihuahua",
          "St. Bernard"]

['Labrador',
 'Poodle',
 'Chow Chow',
 'Schnauzer',
 'Labrador',
 'Chihuahua',
 'St. Bernard']

breeds[2:5]

['Chow Chow', 'Schnauzer', 'Labrador']

breeds[:3]

['Labrador', 'Poodle', 'Chow Chow']

breeds[:]

['Labrador','Poodle','Chow Chow','Schnauzer',
 'Labrador','Chihuahua','St. Bernard']
Sort the index before you slice
dogs_srt = dogs.set_index(["breed", "color"]).sort_index()
print(dogs_srt)

name height_cm weight_kg


breed color
Chihuahua Tan Stella 18 2
Chow Chow Brown Lucy 46 22
Labrador Black Max 59 29
Brown Bella 56 25
Poodle Black Charlie 43 23
Schnauzer Grey Cooper 49 17
St. Bernard White Bernie 77 74
Slicing the outer index level
dogs_srt.loc["Chow Chow":"Poodle"]

name height_cm weight_kg

breed color
Chow Chow Brown Lucy 46 22
Labrador Black Max 59 29
Brown Bella 56 25
Poodle Black Charlie 43 23
Slicing the inner index levels badly
dogs_srt.loc["Tan":"Grey"]

Empty DataFrame
Columns: [name, height_cm, weight_kg]
Index: []
Slicing the inner index levels correctly
dogs_srt.loc[("Labrador", "Brown"):("Schnauzer", "Grey")]

name height_cm weight_kg
breed color
Labrador Brown Bella 56 25
Poodle Black Charlie 43 23
Schnauzer Grey Cooper 49 17
Slicing columns
dogs_srt.loc[:, "name":"height_cm"]

name height_cm

breed color
Chihuahua Tan Stella 18
Chow Chow Brown Lucy 46
Labrador Black Max 59
Brown Bella 56
Poodle Black Charlie 43
Schnauzer Grey Cooper 49
St. Bernard White Bernie 77
Slice twice
dogs_srt.loc[
    ("Labrador", "Brown"):("Schnauzer", "Grey"),
    "name":"height_cm"]

name height_cm
breed color
Labrador Brown Bella 56
Poodle Black Charlie 43
Schnauzer Grey Cooper 49
Dog days
dogs = dogs.set_index("date_of_birth").sort_index()
print(dogs)

name breed color height_cm weight_kg


date_of_birth
2011-12-11 Cooper Schnauzer Grey 49 17
2013-07-01 Bella Labrador Brown 56 25
2014-08-25 Lucy Chow Chow Brown 46 22
2015-04-20 Stella Chihuahua Tan 18 2
2016-09-16 Charlie Poodle Black 43 23
2017-01-20 Max Labrador Black 59 29
2018-02-27 Bernie St. Bernard White 77 74
Slicing by dates
dogs.loc["2014-08-25":"2016-09-16"]

name breed color height_cm weight_kg


date_of_birth
2014-08-25 Lucy Chow Chow Brown 46 22
2015-04-20 Stella Chihuahua Tan 18 2
2016-09-16 Charlie Poodle Black 43 23
Slicing by partial dates
dogs.loc["2014":"2016"]

name breed color height_cm weight_kg


date_of_birth
2014-08-25 Lucy Chow Chow Brown 46 22
2015-04-20 Stella Chihuahua Tan 18 2
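Whether a partial-date slice includes the final year depends on how the index is stored. The sketch below is an illustration under the assumption that the dates are plain strings, as the quoted min/max output earlier in the slides suggests; the Series births and the names as_text and as_dates are made up for the example.

```python
import pandas as pd

# Names indexed by date of birth, with the dates stored as plain strings.
births = pd.Series(
    ["Cooper", "Bella", "Lucy", "Stella", "Charlie", "Max", "Bernie"],
    index=["2011-12-11", "2013-07-01", "2014-08-25", "2015-04-20",
           "2016-09-16", "2017-01-20", "2018-02-27"],
)

# String index: labels are compared alphabetically, and "2016-09-16" sorts
# after "2016", so the 2016 row falls outside the slice.
as_text = births.loc["2014":"2016"]

# Datetime index: "2016" is read as the whole year, so the 2016 row is kept.
births_dt = births.copy()
births_dt.index = pd.to_datetime(births_dt.index)
as_dates = births_dt.loc["2014":"2016"]

print(as_text.tolist())
print(as_dates.tolist())
```

Converting the column with pd.to_datetime() before setting it as the index is one way to get the calendar-aware behaviour.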
Subsetting by row/column number
print(dogs.iloc[2:5, 1:4])

breed color height_cm

2 Chow Chow Brown 46
3 Schnauzer Grey 49
4 Labrador Black 59
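One detail worth checking by hand is that .iloc slices exclude the end position, like list slices, while .loc slices include the end label. A minimal sketch on a hand-built copy of the dogs table (the names by_position and by_label are illustrative):

```python
import pandas as pd

# Hand-built stand-in for the slides' dogs table, with the default 0..6 index.
dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Lucy", "Cooper", "Max", "Stella", "Bernie"],
    "breed": ["Labrador", "Poodle", "Chow Chow", "Schnauzer",
              "Labrador", "Chihuahua", "St. Bernard"],
    "color": ["Brown", "Black", "Brown", "Grey", "Black", "Tan", "White"],
    "height_cm": [56, 43, 46, 49, 59, 18, 77],
})

# Positions 2, 3, 4 and columns at positions 1, 2 (end positions excluded).
by_position = dogs.iloc[2:5, 1:3]

# Labels 2 through 5 and columns "breed" through "color" (end labels included).
by_label = dogs.loc[2:5, "breed":"color"]

print(by_position.shape, by_label.shape)
```

Both calls use the same numbers for the rows, yet the label-based slice returns one more row.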
Working with pivot tables
DATA MANIPULATION WITH PANDAS

Richie Cotton
Curriculum Architect at DataCamp
A bigger dog dataset
print(dog_pack)

breed color height_cm weight_kg


0 Boxer Brown 62.64 30.4
1 Poodle Black 46.41 20.4
2 Beagle Brown 36.39 12.4
3 Chihuahua Tan 19.70 1.6
4 Labrador Tan 54.44 36.1
.. ... ... ... ...
87 Boxer Gray 58.13 29.9
88 St. Bernard White 70.13 69.4
89 Poodle Gray 51.30 20.4
90 Beagle White 38.81 8.8
91 Beagle Black 33.40 13.5
Pivoting the dog pack
dogs_height_by_breed_vs_color = dog_pack.pivot_table(
"height_cm", index="breed", columns="color")
print(dogs_height_by_breed_vs_color)

color Black Brown Gray Tan White


breed
Beagle 34.500000 36.4500 36.313333 35.740000 38.810000
Boxer 57.203333 62.6400 58.280000 62.310000 56.360000
Chihuahua 18.555000 NaN 21.660000 20.096667 17.933333
Chow Chow 51.262500 50.4800 NaN 53.497500 54.413333
Dachshund 21.186667 19.7250 NaN 19.375000 20.660000
Labrador 57.125000 NaN NaN 55.190000 55.310000
Poodle 48.036000 57.1300 56.645000 NaN 44.740000
St. Bernard 63.920000 65.8825 67.640000 68.334000 67.495000
.loc[] + slicing is a power combo
dogs_height_by_breed_vs_color.loc["Chow Chow":"Poodle"]

color Black Brown Gray Tan White


breed
Chow Chow 51.262500 50.480 NaN 53.4975 54.413333
Dachshund 21.186667 19.725 NaN 19.3750 20.660000
Labrador 57.125000 NaN NaN 55.1900 55.310000
Poodle 48.036000 57.130 56.645 NaN 44.740000
The axis argument
dogs_height_by_breed_vs_color.mean(axis="index")

color
Black 43.973563
Brown 48.717917
Gray 48.107667
Tan 44.934738
White 44.465208
dtype: float64
Calculating summary stats across columns
dogs_height_by_breed_vs_color.mean(axis="columns")

breed
Beagle 36.362667
Boxer 59.358667
Chihuahua 19.561250
Chow Chow 52.413333
Dachshund 20.236667
Labrador 55.875000
Poodle 51.637750
St. Bernard 66.654300
dtype: float64
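The effect of the axis argument is easiest to see on a very small table. The numbers below are made-up, rounded heights for illustration only, not values from dog_pack; NaN entries are skipped by default.

```python
import pandas as pd

# Tiny hypothetical pivot table: rows are breeds, columns are colors.
heights = pd.DataFrame(
    {"Black": [34.5, 57.2, float("nan")], "Brown": [36.4, 62.6, 50.5]},
    index=["Beagle", "Boxer", "Chow Chow"],
)

# axis="index": collapse the rows, giving one mean per column (color).
by_color = heights.mean(axis="index")

# axis="columns": collapse the columns, giving one mean per row (breed).
by_breed = heights.mean(axis="columns")

print(by_color)
print(by_breed)
```

The axis names the dimension that disappears from the result, not the one that remains.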
Intro. to Data Visualization
Simple Graphs in Python using matplotlib and pyplot

Adapted from Dr. Ziad Al-Sharif


What is data visualization?
• Data visualization is the graphical representation of information
and data.
– Can be achieved using visual elements like figures, charts, graphs, maps,
and more.
• Data visualization tools provide a way to present these figures and
graphs.
• Analyzing massive amounts of information and making data-driven
decisions often depends on visualization:
– it converts complex data into an easy-to-understand representation.
Matplotlib
• Matplotlib is one of the most powerful tools for data
visualization in Python.
• Matplotlib is an incredibly powerful (and beautiful!) 2-D
plotting library.
– It is easy to use and provides a huge number of examples for tackling
unique problems
• To use matplotlib in your script,
– first you need to import it, for example:
import matplotlib.pyplot as plt

• However, if it is not installed, you may need to install it:
– The easiest way to install matplotlib is using pip.
– Type the following command in the command prompt (cmd) or your
Linux shell:
• pip install matplotlib
• Note that you may need to run the above command as an administrator
matplotlib
• Strives to emulate MATLAB
– matplotlib.pyplot is a collection of command style functions that make
matplotlib work like MATLAB.
• Each pyplot function makes some change to the figure:
– e.g.,
• creates a figure,
• creates a plotting area in the figure,
• plots some lines in the plotting area,
• decorates the plot with labels, etc.
• Note that various states are preserved across function calls
• Whenever you plot with matplotlib, two main lines of code are needed:
– Type of graph
• this is where you define a bar chart, line chart, etc.
– Show the graph
• this is to display the graph
E.g. Matplotlib
• Matplotlib aims to make easy things easy and hard things possible
• You can generate plots, histograms, power
spectra, bar charts, error charts, scatter plots,
etc., with just a few lines of code.
Line Graphs
import matplotlib.pyplot as plt

#create data for plotting


x_values = [0, 1, 2, 3, 4, 5 ]
y_values = [0, 1, 4, 9, 16,25]

#the default graph style for plot is a line


plt.plot(x_values, y_values)

#display the graph


plt.show()
More on Line Graph
• Note: if you provide a single list or array to the plot() command,
– then matplotlib assumes it is a sequence of y values, and
– automatically generates the x values for you.
• Since python ranges start with 0, the default x vector has the same
length as y but starts with 0.
– Hence the x data are [0, 1, 2, 3].

import matplotlib.pyplot as plt
plt.plot([1, 2, 3, 4])
plt.ylabel('some numbers')
plt.show()


pyplot
• text() : adds text in an arbitrary location
• xlabel(): adds text to the x-axis
• ylabel(): adds text to the y-axis
• title() : adds title to the plot
• cla() : removes all plots from the current axes (the axes method is clear())
• savefig(): saves your figure to a file
• legend() : shows a legend on the plot
Most of these have counterparts on the axes instance, often with a set_
prefix (for example set_xlabel(), set_ylabel(), set_title()).
import matplotlib.pyplot as plt
y1 = []
y2 = []
x = range(-100, 100, 10)
for i in x: y1.append(i**2)
for i in x: y2.append(-i**2)

# Incrementally modify the figure.
plt.plot(x, y1)
plt.plot(x, y2)
plt.xlabel("x")
plt.ylabel("y")
plt.ylim(-2000, 2000)
plt.axhline(0) # horizontal line
plt.axvline(0) # vertical line

# Save your figure to a file
plt.savefig("quad.png")

# Show it on the screen
plt.show()
Plot
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [1, 4, 9, 16]

plt.plot(x, y)

no return value?

• We are operating on a “hidden” variable representing the figure.


• This is a terrible, terrible trick.
• Its only purpose is to pander to MATLAB users.
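For comparison, the same plot can be written against explicit figure and axes objects, so nothing depends on hidden state. This is a sketch of matplotlib's object-oriented style rather than part of the original slides; the "Agg" backend line is an assumption made so the example runs without a display and can be dropped in an interactive session.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [1, 4, 9, 16]

# Create the figure and axes explicitly instead of relying on the "current" ones.
fig, ax = plt.subplots()

# plot() does return something: a list with one Line2D object per line drawn.
lines = ax.plot(x, y)

ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Squares")

print(type(lines[0]).__name__)
```

Calls such as plt.plot(x, y) act on these same objects behind the scenes; holding fig and ax directly makes it clearer which plot each call changes when a script has several figures.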
Simple line
# importing the required module
import matplotlib.pyplot as plt

# x axis values
x = [1,2,3]
# corresponding y axis values
y = [2,4,1]

# plotting the points
plt.plot(x, y)

# naming the x axis
plt.xlabel('x - axis')
# naming the y axis
plt.ylabel('y - axis')

# giving a title to my graph
plt.title('My first graph!')

# function to show the plot
plt.show()

• Define the x-axis and corresponding y-axis values as lists.
• Plot them on canvas using .plot() function.
• Give a name to x-axis and y-axis using .xlabel() and .ylabel() functions.
• Give a title to your plot using .title() function.
• Finally, to view your plot, we use .show() function.
Simple 2 lines
import matplotlib.pyplot as plt

# line 1 points
x1 = [1,2,3]
y1 = [2,4,1]
# plotting the line 1 points
plt.plot(x1, y1, label="line 1")

# line 2 points
x2 = [1,2,3]
y2 = [4,1,3]
# plotting the line 2 points
plt.plot(x2, y2, label = "line 2")

# naming the x axis
plt.xlabel('x - axis')
# naming the y axis
plt.ylabel('y - axis')

# giving a title to my graph
plt.title('Two lines on same graph!')

# show a legend on the plot
plt.legend()

# function to show the plot
plt.show()

• Here, we plot two lines on same graph. We differentiate between them by
giving them a name (label) which is passed as an argument of .plot() function.
• The small rectangular box giving information about type of line and its
color is called legend. We can add a legend to our plot using .legend() function.
Customization of Plots
import matplotlib.pyplot as plt

# x axis values
x = [1,2,3,4,5,6]
# corresponding y axis values
y = [2,4,1,5,2,6]

# plotting the points
plt.plot(x, y, color='green', linestyle='dashed', linewidth = 3,
         marker='o', markerfacecolor='blue', markersize=12)

# setting x and y axis range
plt.ylim(1,8)
plt.xlim(1,8)

# naming the x axis
plt.xlabel('x - axis')
# naming the y axis
plt.ylabel('y - axis')

# giving a title to my graph
plt.title('Some cool customizations!')

# function to show the plot
plt.show()
Bar graphs
import matplotlib.pyplot as plt

#Create data for plotting


values = [5, 6, 3, 7, 2]
names = ["A", "B", "C", "D", "E"]

plt.bar(names, values, color="green")


plt.show()

• To draw a bar chart instead of a line graph, change plt.plot() to plt.bar().
Bar graphs
We can also flip the bar graph horizontally with the following:

import matplotlib.pyplot as plt

#Create data for plotting


values = [5,6,3,7,2]
names = ["A", "B", "C", "D", "E"]

# Adding an "h" after bar will flip the graph


plt.barh(names, values, color="yellowgreen")
plt.show()
Bar Chart
import matplotlib.pyplot as plt

# x-coordinates of the bars
left = [1, 2, 3, 4, 5]
# heights of bars
height = [10, 24, 36, 40, 5]
# labels for bars
names = ['one','two','three','four','five']

# plotting a bar chart
c1 =['red', 'green']
c2 =['b', 'g'] # we can use this for color
plt.bar(left, height, tick_label=names, width=0.8, color=c1)

# naming the x-axis
plt.xlabel('x - axis')
# naming the y-axis
plt.ylabel('y - axis')
# plot title
plt.title('My bar chart!')

# function to show the plot
plt.show()

• Here, we use plt.bar() function to plot a bar chart.
• You can also give names to the x-axis coordinates by passing tick_label.
Histogram
import matplotlib.pyplot as plt

# ages to count in each interval
ages=[2,5,70,40,30,45,50,45,43,40,44,60,7,13,57,18,90,77,32,21,20,40]

# setting the range and no. of intervals
# (named age_range so the built-in range() is not overwritten)
age_range = (0, 100)
bins = 10

# plotting a histogram
plt.hist(ages, bins=bins, range=age_range, color='green', histtype='bar', rwidth=0.8)

# x-axis label
plt.xlabel('age')
# frequency label
plt.ylabel('No. of people')
# plot title
plt.title('My histogram')

# function to show the plot
plt.show()
Histograms
import matplotlib.pyplot as plt

#generate fake data


x = [2,1,6,4,2,4,8,9,4,2,4,10,6,4,5,7,7,3,2,7,5,3,5,9,2,1]

#plot for a histogram


plt.hist(x, bins = 10, color='blue', alpha=0.5)
plt.show()

• The code snippet uses two new arguments:
– bins: an argument specific to a histogram that lets the user choose
how many bins they want.
– alpha: an argument that sets the level of transparency of the bars.
Scatter Plots
import matplotlib.pyplot as plt

#create data for plotting

x_values = [0,1,2,3,4,5]
y_values = [0,1,4,9,16,25]

plt.scatter(x_values, y_values, s=30, color="blue")


plt.show()

• Can you see the pattern? Now the code changed from plt.bar() to
plt.scatter().
Scatter plot
import matplotlib.pyplot as plt

# x-axis values
x = [1,2,3,4,5,6,7,8,9,10]
# y-axis values
y = [2,4,5,7,6,8,9,11,12,12]

# plotting points as a scatter plot


plt.scatter(x, y, label= "stars", color="green", marker="*", s=30)

# x-axis label
plt.xlabel('x - axis')
# y-axis label
plt.ylabel('y - axis')
# plot title
plt.title('My scatter plot!')
# showing legend
plt.legend()

# function to show the plot


plt.show()
Pie-chart
import matplotlib.pyplot as plt

# defining labels
activities = ['eat', 'sleep', 'work', 'play']

# portion covered by each label


slices = [3, 7, 8, 6]

# color for each label


colors = ['r', 'y', 'g', 'b']

# plotting the pie chart


plt.pie(slices, labels = activities, colors=colors,
startangle=90, shadow = True, explode = (0, 0, 0.1, 0),
radius = 1.2, autopct = '%1.1f%%')

# plotting legend
plt.legend()

# showing the plot


plt.show()
Plotting curves of given equation
# importing the required modules
import matplotlib.pyplot as plt
import numpy as np

# setting the x - coordinates


x = np.arange(0, 2*(np.pi), 0.1)
# setting the corresponding y - coordinates
y = np.sin(x)

# plotting the points


plt.plot(x, y)

# function to show the plot


plt.show()

Examples taken from:


Graph Plotting in Python | Set 1
Visualizing your data
DATA MANIPULATION WITH PANDAS

Maggie Matsui
Content Developer at DataCamp
Histograms
import matplotlib.pyplot as plt

dog_pack["height_cm"].hist()

plt.show()
Histograms
dog_pack["height_cm"].hist(bins=20)
plt.show()

dog_pack["height_cm"].hist(bins=5)
plt.show()
Bar plots
avg_weight_by_breed = dog_pack.groupby("breed")["weight_kg"].mean()
print(avg_weight_by_breed)

breed
Beagle 10.636364
Boxer 30.620000
Chihuahua 1.491667
Chow Chow 22.535714
Dachshund 9.975000
Labrador 31.850000
Poodle 20.400000
St. Bernard 71.576923
Name: weight_kg, dtype: float64
Bar plots
avg_weight_by_breed.plot(kind="bar")
plt.show()

avg_weight_by_breed.plot(kind="bar", title="Mean Weight by Dog Breed")
plt.show()
Line plots
sully.head()

date weight_kg
0 2019-01-31 36.1
1 2019-02-28 35.3
2 2019-03-31 32.0
3 2019-04-30 32.9
4 2019-05-31 32.0

sully.plot(x="date", y="weight_kg", kind="line")
plt.show()
Rotating axis labels
sully.plot(x="date", y="weight_kg", kind="line", rot=45)
plt.show()
Scatter plots
dog_pack.plot(x="height_cm", y="weight_kg", kind="scatter")
plt.show()
Layering plots
dog_pack[dog_pack["sex"]=="F"]["height_cm"].hist()
dog_pack[dog_pack["sex"]=="M"]["height_cm"].hist()

plt.show()
Add a legend
dog_pack[dog_pack["sex"]=="F"]["height_cm"].hist()
dog_pack[dog_pack["sex"]=="M"]["height_cm"].hist()
plt.legend(["F", "M"])
plt.show()
Transparency
dog_pack[dog_pack["sex"]=="F"]["height_cm"].hist(alpha=0.7)
dog_pack[dog_pack["sex"]=="M"]["height_cm"].hist(alpha=0.7)
plt.legend(["F", "M"])
plt.show()
Missing values
DATA MANIPULATION WITH PANDAS

Maggie Matsui
Content Developer at DataCamp
What's a missing value?
Name Breed Color Height (cm) Weight (kg) Date of Birth

Bella Labrador Brown 56 25 2013-07-01

Charlie Poodle Black 43 23 2016-09-16

Lucy Chow Chow Brown 46 22 2014-08-25

Cooper Schnauzer Gray 49 17 2011-12-11

Max Labrador Black 59 29 2017-01-20

Stella Chihuahua Tan 18 2 2015-04-20

Bernie St. Bernard White 77 74 2018-02-27


What's a missing value?
Name Breed Color Height (cm) Weight (kg) Date of Birth

Bella Labrador Brown 56 ? 2013-07-01

Charlie Poodle Black 43 23 2016-09-16

Lucy Chow Chow Brown 46 22 2014-08-25

Cooper Schnauzer Gray 49 ? 2011-12-11

Max Labrador Black 59 29 2017-01-20

Stella Chihuahua Tan 18 2 2015-04-20

Bernie St. Bernard White 77 74 2018-02-27


Missing values in pandas DataFrames
print(dogs)

name breed color height_cm weight_kg date_of_birth


0 Bella Labrador Brown 56 NaN 2013-07-01
1 Charlie Poodle Black 43 24.0 2016-09-16
2 Lucy Chow Chow Brown 46 24.0 2014-08-25
3 Cooper Schnauzer Gray 49 NaN 2011-12-11
4 Max Labrador Black 59 29.0 2017-01-20
5 Stella Chihuahua Tan 18 2.0 2015-04-20
6 Bernie St. Bernard White 77 74.0 2018-02-27
Detecting missing values
dogs.isna()

name breed color height_cm weight_kg date_of_birth


0 False False False False True False
1 False False False False False False
2 False False False False False False
3 False False False False True False
4 False False False False False False
5 False False False False False False
6 False False False False False False
Detecting any missing values
dogs.isna().any()

name False
breed False
color False
height_cm False
weight_kg True
date_of_birth False
dtype: bool
Counting missing values
dogs.isna().sum()

name 0
breed 0
color 0
height_cm 0
weight_kg 2
date_of_birth 0
dtype: int64
Plotting missing values
import matplotlib.pyplot as plt

dogs.isna().sum().plot(kind="bar")
plt.show()
Removing missing values
dogs.dropna()

name breed color height_cm weight_kg date_of_birth


1 Charlie Poodle Black 43 24.0 2016-09-16
2 Lucy Chow Chow Brown 46 24.0 2014-08-25
4 Max Labrador Black 59 29.0 2017-01-20
5 Stella Chihuahua Tan 18 2.0 2015-04-20
6 Bernie St. Bernard White 77 74.0 2018-02-27
Replacing missing values
dogs.fillna(0)

name breed color height_cm weight_kg date_of_birth


0 Bella Labrador Brown 56 0.0 2013-07-01
1 Charlie Poodle Black 43 24.0 2016-09-16
2 Lucy Chow Chow Brown 46 24.0 2014-08-25
3 Cooper Schnauzer Gray 49 0.0 2011-12-11
4 Max Labrador Black 59 29.0 2017-01-20
5 Stella Chihuahua Tan 18 2.0 2015-04-20
6 Bernie St. Bernard White 77 74.0 2018-02-27
Creating DataFrames
Dictionaries
my_dict = {
    "key1": value1,
    "key2": value2,
    "key3": value3
}

my_dict["key1"]

value1

my_dict = {
    "title": "Charlotte's Web",
    "author": "E.B. White",
    "published": 1952
}

my_dict["title"]

Charlotte's Web


Creating DataFrames
• From a list of dictionaries: constructed row by row
• From a dictionary of lists: constructed column by column


List of dictionaries - by row
name breed height (cm) weight (kg) date of birth

Ginger Dachshund 22 10 2019-03-14

Scout Dalmatian 59 25 2019-05-09

list_of_dicts = [
    {"name": "Ginger", "breed": "Dachshund", "height_cm": 22,
     "weight_kg": 10, "date_of_birth": "2019-03-14"},
    {"name": "Scout", "breed": "Dalmatian", "height_cm": 59,
     "weight_kg": 25, "date_of_birth": "2019-05-09"}
]
List of dictionaries - by row
name breed height (cm) weight (kg) date of birth

Ginger Dachshund 22 10 2019-03-14

Scout Dalmatian 59 25 2019-05-09

new_dogs = pd.DataFrame(list_of_dicts)
print(new_dogs)

name breed height_cm weight_kg date_of_birth


0 Ginger Dachshund 22 10 2019-03-14
1 Scout Dalmatian 59 25 2019-05-09
Dictionary of lists - by column
dict_of_lists = {
    "name": ["Ginger", "Scout"],
    "breed": ["Dachshund", "Dalmatian"],
    "height_cm": [22, 59],
    "weight_kg": [10, 25],
    "date_of_birth": ["2019-03-14",
                      "2019-05-09"]
}

Key = column name
Value = list of column values

new_dogs = pd.DataFrame(dict_of_lists)
Dictionary of lists - by column
name breed height (cm) weight (kg) date of birth

Ginger Dachshund 22 10 2019-03-14

Scout Dalmatian 59 25 2019-05-09

print(new_dogs)

name breed height_cm weight_kg date_of_birth


0 Ginger Dachshund 22 10 2019-03-14
1 Scout Dalmatian 59 25 2019-05-09
Reading and writing CSVs

What's a CSV file?
CSV = comma-separated values

Designed for DataFrame-like data

Most database and spreadsheet programs can use them or create them
Example CSV file
new_dogs.csv

name,breed,height_cm,weight_kg,date_of_birth
Ginger,Dachshund,22,10,2019-03-14
Scout,Dalmatian,59,25,2019-05-09
CSV to DataFrame
import pandas as pd
new_dogs = pd.read_csv("new_dogs.csv")

print(new_dogs)

name breed height_cm weight_kg date_of_birth


0 Ginger Dachshund 22 10 2019-03-14
1 Scout Dalmatian 59 25 2019-05-09
DataFrame manipulation
new_dogs["bmi"] = new_dogs["weight_kg"] / (new_dogs["height_cm"] / 100) ** 2

print(new_dogs)

name breed height_cm weight_kg date_of_birth bmi


0 Ginger Dachshund 22 10 2019-03-14 206.611570
1 Scout Dalmatian 59 25 2019-05-09 71.818443
DataFrame to CSV
new_dogs.to_csv("new_dogs_with_bmi.csv", index=False)

new_dogs_with_bmi.csv

name,breed,height_cm,weight_kg,date_of_birth,bmi
Ginger,Dachshund,22,10,2019-03-14,206.611570
Scout,Dalmatian,59,25,2019-05-09,71.818443
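By default `to_csv` also writes the row index as an extra first column; passing `index=False` leaves it out. The following is a minimal sketch of that difference, assuming pandas is installed. It writes to a string instead of a file on disk, and the variable names `with_index`, `without_index` and `round_trip` are made up for illustration.

```python
import io
import pandas as pd

# Same two dogs as in the slides, built from a dictionary of lists.
new_dogs = pd.DataFrame({
    "name": ["Ginger", "Scout"],
    "breed": ["Dachshund", "Dalmatian"],
    "height_cm": [22, 59],
    "weight_kg": [10, 25],
    "date_of_birth": ["2019-03-14", "2019-05-09"],
})

# With no path argument, to_csv returns the CSV text as a string.
with_index = new_dogs.to_csv()
without_index = new_dogs.to_csv(index=False)

print(with_index.splitlines()[0])     # header begins with an empty index column
print(without_index.splitlines()[0])  # header begins with "name"

# Reading the text back gives a DataFrame with the same five columns.
round_trip = pd.read_csv(io.StringIO(without_index))
print(round_trip.columns.tolist())
```

If the index carries no information (as with the default 0, 1, 2, ... labels), leaving it out keeps the file identical in shape to the one that was read in.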
https://www.monkeyuser.com/2019/bug-fixing-ways/

Lecture Overview
• Debugging
• Exception Handling
• Testing

Disclaimer: Much of the material and slides for this lecture were borrowed from
R. Anderson, M. Ernst and B. Howe, University of Washington CSE 140

https://www.reddit.com/r/ProgrammerHumor/comments/1r0cw7/the_5_stages_of_debugging/
The Problem
“Computers are good at following instructions, but not at reading your mind.” - Donald Knuth

What you want your program to do
What your program does

Not the same! There is a bug!
What is Debugging?
• Grace Hopper was one of the U.S.’s first programmers.
• In 1947 her team found a moth in the Harvard Mark II computer, which was
causing errors, and logged it as a computer “bug”
• The story helped popularize the word “debugging” :-)
Debugging Tools
• Python error message
• assert
• print
• Python interpreter
• Python Tutor (http://pythontutor.com)
• Python debugger
• Best tool:
Two Key Ideas

1. The scientific method


2. Divide and conquer

If you master those, you will find debugging
easy, and possibly enjoyable ;-)
The Scientific Method

1. Create a hypothesis
2. Design an experiment to test that hypothesis
– Ensure that it yields insight
3. Understand the result of your experiment
– If you don’t understand, then possibly suspend
your main line of work to understand that
The Scientific Method

Tips:
• Be systematic
– Never do anything if you don't have a reason
– Don’t just flail
• Random guessing is likely to dig you into a deeper hole

• Don’t make assumptions (verify them)


Example Experiments
1. An alternate implementation of a function
– Run all your test cases afterward

2. A new, simpler test case


– Examples: smaller input, or test a function in isolation
– Can help you understand the reason for a failure
Your Scientific Notebook
Record everything you do
• Specific inputs and outputs (both expected and actual)
• Specific versions of the program
– If you get stuck, you can return to something that works
– You can write multiple implementations of a function
• What you have already tried
• What you are in the middle of doing now
– This may look like a stack!
• What you are sure of, and why

Your notebook also helps if you need to get help or reproduce
your results.
Read the Error Message
Traceback (most recent call last):
  File "nx_error.py", line 41, in <module>
    print(friends_of_friends(rj, myval))
  File "nx_error.py", line 30, in friends_of_friends
    f = friends(graph, user)
  File "nx_error.py", line 25, in friends
    return set(graph.neighbors(user))#
  File "/Library/Frameworks/…/graph.py", line 978, in neighbors
    return list(self.adj[n])
TypeError: unhashable type: 'list'

• The whole listing is the call stack, or traceback
• First function that was called: <module> (means the interpreter)
• Second function that was called: friends_of_friends
• Last function that was called: neighbors (this one suffered an error)

The error message: daunting but useful. You need to understand:
• the literal meaning of the error
• the underlying problems certain errors tend to suggest

List of all exceptions (errors):
http://docs.python.org/3/library/exceptions.html#bltin-exceptions
Two other resources, with more details about a few of the errors:
http://inventwithpython.com/appendixd.html
http://www.cs.arizona.edu/people/mccann/errors-python
Common Error Types
• AssertionError
– Raised when an assert statement fails.

• IndexError
– Raised when a sequence subscript is out of range.

• KeyError
– Raised when a mapping (dictionary) key is not found in the set of
existing keys.

• KeyboardInterrupt
– Raised when the user hits the interrupt key (normally Control-C or
Delete).
Common Error Types
• NameError
– Raised when a local or global name is not found.

• SyntaxError
– Raised when the parser encounters a syntax error.

• IndentationError
– Base class for syntax errors related to incorrect indentation.

• TypeError
– Raised when an operation or function is applied to an object of
inappropriate type.
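Several of the error types above are easy to provoke on purpose, which helps when learning to read their messages. The sketch below is only an illustration: the `trigger` helper and the `observed` dictionary are hypothetical names, and each branch contains a deliberate mistake.

```python
def trigger(kind):
    """Deliberately raise one of the exceptions listed above."""
    if kind == "index":
        return [1, 2, 3][3]        # IndexError: valid indices are 0, 1, 2
    if kind == "key":
        return {"a": 1}["b"]       # KeyError: "b" is not a key
    if kind == "name":
        return undefined_name      # NameError: this name is never defined
    if kind == "type":
        return "abc" + 1           # TypeError: cannot add str and int
    if kind == "assert":
        assert 1 + 1 == 3          # AssertionError: the condition is False

observed = {}
for kind in ["index", "key", "name", "type", "assert"]:
    try:
        trigger(kind)
    except Exception as error:
        # Record the class name of whatever was raised.
        observed[kind] = type(error).__name__
        print(kind, "->", type(error).__name__, ":", error)
```

SyntaxError and IndentationError are not included because they are reported when the file is parsed, before any of its statements run. The AssertionError case assumes Python is run without the `-O` flag, which disables assert statements.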
Divide and Conquer

• Where is the defect (or “bug”)?


• Your goal is to find the one place that it is
• Finding a defect is often harder than fixing it

• Initially, the defect might be anywhere in your program


– It is impractical to find it if you have to look everywhere
• Idea: bit by bit reduce the scope of your search
• Eventually, the defect is localized to a few lines or one
line
– Then you can understand and fix it
Divide and Conquer

• 4 ways to divide and conquer:


– In the program code
– In test cases
– During the program execution
– During the development history
Divide and Conquer in the Program Code

• Localize the defect to part of the program


– e.g., one function, or one part of a function
• Code that isn’t executed cannot contain the defect
Divide and Conquer in the Program Code

Three approaches:
1. Test one function at a time
2. Add assertions or print statements
– The defect is executed before the failing assertion
(and maybe after a succeeding assertion)
3. Split complex expressions into simpler ones
Example: Failure in
    result = set({graph.neighbors(user)})
Change it to
    nbors = graph.neighbors(user)
    nbors_set = {nbors}
    result = set(nbors_set)
The error occurs on the "nbors_set = {nbors}" line
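The splitting technique can be tried without any graph library. In the sketch below a plain dictionary stands in for the graph, and `neighbors` is a hypothetical helper that returns a list, which is the situation the TypeError in the earlier traceback suggests.

```python
# Hypothetical stand-in for the graph: maps each user to a list of neighbors.
graph = {"rj": ["juliet", "mercutio"]}

def neighbors(graph, user):
    return graph[user]              # returns a list

# One-liner: the traceback only says that this whole line failed.
try:
    result = set({neighbors(graph, "rj")})
except TypeError as error:
    print("one-liner failed:", error)

# Split version: each step has its own line, so the failing step is obvious.
nbors = neighbors(graph, "rj")
try:
    nbors_set = {nbors}             # a list cannot be an element of a set
except TypeError as error:
    failing_step = "nbors_set = {nbors}"
    print(failing_step, "failed:", error)

# Probable intent: build the set from the elements of the list.
result = set(nbors)
print(result)
```

Once the failing step is isolated, the fix is usually small; here the braces were wrapping the whole list as a single set element instead of converting its contents.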
Divide and Conquer in Test Cases

• Your program fails when run on some large input


– It is hard to comprehend the error message
– The log of print statement output is overwhelming

• Try a smaller input


– Choose an input with some but not all characteristics of
the large input
– Example: duplicates, zeroes in data, …
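Shrinking a failing input can itself be done systematically. The sketch below keeps whichever half of the input still fails, under the assumption that the failure is caused by a single element; `process`, `fails` and `shrink` are hypothetical names, and the bug (division by a zero in the data) is planted for illustration.

```python
def process(values):
    """Hypothetical function under test: breaks when the data contains a zero."""
    return [100 / v for v in values]

def fails(values):
    """Return True if process() raises on this input."""
    try:
        process(values)
        return False
    except ZeroDivisionError:
        return True

def shrink(values):
    """Repeatedly keep a half that still fails, until neither half fails alone."""
    while len(values) > 1:
        mid = len(values) // 2
        left, right = values[:mid], values[mid:]
        if fails(left):
            values = left
        elif fails(right):
            values = right
        else:
            break   # the failure needs elements from both halves
    return values

large_input = [7, 3, 9, 4, 0, 8, 2, 6, 5, 1]
small_input = shrink(large_input)
print(small_input)
```

The smaller input is much easier to reason about than the original, and the error message and print output become short enough to read in full.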
Divide and Conquer in Execution Time
via Print (or “logging”) Statements
• A sequence of print statements is a record of the execution
of your program
• The print statements let you see and search multiple
moments in time
• The print statements are a useful technique,
in moderation
• Be disciplined
– Too much output is overwhelming rather than informative
– Remember the scientific method: have a reason
(a hypothesis to be tested) for each print statement
– Don’t only use print statements
Divide and Conquer in Development History

• The code used to work (for some test case)


• The code now fails
• The defect is related to some line you changed

• This is useful only if you kept a version of the
code that worked (use good names!)
• This is most useful if you have made few changes
• Moral: test often!
– Fewer lines to compare
– You remember what you were thinking/doing recently
A Metaphor About Debugging
If your code doesn’t work as expected, then by definition you
don’t understand what is going on.

• You’re lost in the woods.
• You’re behind enemy lines.
• All bets are off.
• Don’t trust anyone or anything.

Don’t press on into unexplored territory -- go back the way you
came! (and leave breadcrumbs!)

You’re trying to “advance the front lines,” not “trailblaze”


Time-Saving Trick: Make Sure You are
Debugging the Right Problem
• The game is to go from “working to working”
• When something doesn’t work, STOP!
– It’s wild out there!
• FIRST: Go back to the last situation that worked properly.
– Roll back your recent changes and verify that everything still works as
expected.
– Don’t make assumptions – by definition, you don’t understand the
code when something goes wrong, so you can’t trust your
assumptions.
– You may find that even what previously worked now doesn’t.
– Perhaps you forgot to consider some “innocent” or unintentional
change, and now even tested code is broken.
A Bad Timeline

• A works, so celebrate a little


• Now try B
• B doesn’t work
• Change B and try again
• Change B and try again
• Change B and try again

https://xkcd.com/1739/
A Better Timeline
• A works, so celebrate a little
• Now try B
• B doesn’t work
• Roll back to A
• Does A still work?
– Yes: Find A’ that is somewhere between A and B
– No: You have unintentionally changed something else, and there’s no
point futzing with B at all!

These “innocent” and unnoticed changes happen more than you would think!
• You add a comment, and the indentation changes.
• You add a print statement, and a function is evaluated twice.
• You move a file, and the wrong one is being read
• You are on a different computer, and the library is a different version
Once You are on Solid Ground You can
Set Out Again
• Once you have something that works and something that
doesn’t work, it is only a matter of time

• You just need to incrementally change the working code into
the non-working code, and the problem will reveal itself.

• Variation: Perhaps your code works with one input, but fails
with another. Incrementally change the good input into the
bad input to expose the problem.
Simple Debugging Tools
print
– shows what is happening whether there is a problem or
not
– does not stop execution

assert
– Raises an exception if some condition is not met
– Does nothing if everything works
– Example: assert len(rj.edges()) == 16
– Use this liberally! Not just for debugging!
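The difference between the two tools shows up in a few lines. This is a minimal sketch with a hypothetical `average` function: the print statement reports every time, while the assert stays silent unless its condition is violated.

```python
def average(numbers):
    # Precondition check: silent when it holds, raises AssertionError otherwise.
    assert len(numbers) > 0, "average() needs at least one number"
    total = sum(numbers)
    # Debug output: shown on every call, whether or not anything is wrong.
    print("debug: total =", total, "count =", len(numbers))
    return total / len(numbers)

print(average([2, 4, 6]))

try:
    average([])
except AssertionError as error:
    message = str(error)
    print("assertion failed:", message)
```

Note that assert statements are skipped when Python is run with the `-O` option, so they should not be the only guard for conditions that must hold in production.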
Lecture Overview
• Debugging
• Exception Handling
• Testing
What is an Exception?
• An exception is an abnormal condition (and thus
rare) that arises in a code sequence at runtime.
• For instance:
– Dividing a number by zero
– Accessing an element that is out of bounds of an array
– Attempting to open a file which does not exist
What is an Exception?
• When an exceptional condition arises, an object
representing that exception is created and thrown in
the code that caused the error

• An exception can be caught to handle it or pass it on

• Exceptions can be generated by the run-time system,
or they can be manually generated by your code
What is an Exception?
test = [1,2,3]
test[3]

IndexError: list index out of range


What is an Exception?
successFailureRatio = numSuccesses/numFailures
print('The success/failure ratio is', successFailureRatio)
print('Now here')

ZeroDivisionError: division by zero
What is an Exception?
val = int(input('Enter an integer: '))
print('The square of the number', val**2)

> Enter an integer: asd

ValueError: invalid literal for int() with base 10: 'asd'
Handling Exceptions
• The exception mechanism gives the programmer a chance
to respond to an abnormal condition.
• Exception handling is performing an action in response
to an exception.
• This action may be:
– Exiting the program
– Retrying the action with or without alternative data
– Displaying an error message and warning user to do
something
– ....
Handling Exceptions
try:
    successFailureRatio = numSuccesses/numFailures
    print('The S/F ratio is', successFailureRatio)
except ZeroDivisionError:
    print('No failures, so the S/F is undefined.')
print('Now here')

• Upon entering the try block, the interpreter attempts to evaluate
the expression numSuccesses/numFailures.
• If expression evaluation is successful, the assignment is done and
the result is printed.
• If, however, a ZeroDivisionError exception is raised, the print
statement in the except block is executed.
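The same pattern can be wrapped in a function so both outcomes are easy to try. This is a sketch only; the function name `sf_ratio` and the choice to return `None` for an undefined ratio are assumptions made for the example.

```python
def sf_ratio(num_successes, num_failures):
    """Return successes/failures, or None when the ratio is undefined."""
    try:
        ratio = num_successes / num_failures
    except ZeroDivisionError:
        print('No failures, so the S/F is undefined.')
        return None
    print('The S/F ratio is', ratio)
    return ratio

sf_ratio(6, 3)      # normal path: the except block is skipped
sf_ratio(6, 0)      # division fails: the except block runs instead
print('Now here')   # reached in both cases
```

In both calls the program keeps running after the try statement, which is the practical benefit of handling the exception rather than letting it stop the program.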
Handling Exceptions
while True:
    val = input('Enter an integer: ')
    try:
        val = int(val)
        print('The square of the number', val**2)
        break # to exit the while loop
    except ValueError:
        print(val, 'is not an integer')

Keeps asking until int(val) succeeds, i.e. until no ValueError is raised


Keywords of Exception Handling
• There are five keywords in Python to deal with
exceptions: try, except, else, raise and finally.

• try: Creates a block to monitor if any exception occurs.

• except: Follows the try block and catches any
exception which is thrown within it.
Are There Many Exceptions in Python?
• Yes, some of them are…
– Exception
– ArithmeticError
– OverflowError
– ZeroDivisionError
– EOFError
– NameError
– IOError
– SyntaxError

List of all exceptions (errors):
http://docs.python.org/3/library/exceptions.html#bltin-exceptions
Multiple except Statements
• It is possible that more than one exception can be
thrown in a code block.
– We can use multiple except clauses

• When an exception is thrown, each except statement
is inspected in order, and the first one whose type
matches that of the exception is executed.
– Type matching means that the exception thrown must be an
object of the same class or a sub-class of the declared class
in the except statement

• After one except statement executes, the others are bypassed.
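The matching rule can be observed with built-in exceptions, since `KeyError` and `IndexError` are both subclasses of `LookupError`. In this sketch the `classify` helper is a hypothetical name; it reports which except clause handled the exception raised by the function passed in.

```python
def classify(action):
    """Call action() and report which except clause (if any) handled it."""
    try:
        action()
    except KeyError:
        return "KeyError handler"
    except LookupError:          # superclass of KeyError and IndexError
        return "LookupError handler"
    except Exception:
        return "generic handler"
    return "no exception"

print(classify(lambda: {}["missing"]))   # KeyError: first clause matches
print(classify(lambda: [][0]))           # IndexError: matched as a LookupError
print(classify(lambda: int("asd")))      # ValueError: only Exception matches
print(classify(lambda: None))            # nothing is raised
```

Because clauses are tried in order, the most specific classes should come first; if `except LookupError` were listed before `except KeyError`, the `KeyError` clause could never run.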
Multiple except Statements
try:
    You do your operations here;
except Exception-1:
    Execute this block.
except Exception-2:
    Execute this block.
except (Exception-3[, Exception-4[, ... Exception-N]]):
    If there is any exception from the given exception list,
    then execute this block.

Example: except (ValueError, TypeError):

The except block will be entered if any of the listed
exceptions is raised within the try block
Multiple except Statements
try:
    f = open('outfile.dat', 'w')
    dividend = 5
    divisor = 0
    division = dividend / divisor
    f.write(str(division))
except IOError:
    print("I can't open the file!")
except ZeroDivisionError:
    print("You can't divide by zero!")

You can't divide by zero!


Multiple except Statements
try:
    f = open('outfile.dat', 'w')
    dividend = 5
    divisor = 0
    division = dividend / divisor
    f.write(str(division))
except Exception:
    print("Exception occurred and handled!")
except IOError:
    print("I can't open the file!")
except ZeroDivisionError:
    print("You can't divide by zero!")

Exception occurred and handled!


Multiple except Statements
try:
    f = open('outfile.dat', 'w')
    dividend = 5
    divisor = 0
    division = dividend / divisor
    f.write(str(division))
except:
    print("Exception occurred and handled!")
except IOError:
    print("I can't open the file!")
except ZeroDivisionError:
    print("You can't divide by zero!")

SyntaxError: default 'except:' must be last


except-else Statements
try:
    You do your operations here
except:
    Execute this block.
else:
    If there is no exception, execute this block.

try:
    f = open(arg, 'r')
except IOError:
    print('cannot open', arg)
else:
    print(arg, 'has', len(f.readlines()), 'lines')
finally Statement

• finally creates a block of code that will be executed after
a try/except block has completed and before the code
following the try/except block

• The finally block is executed whether or not an exception is thrown

• The finally block is executed whether or not an exception is caught

• It is used to guarantee that a code block will be executed under any
condition.
finally Statement
You can use it to clean up files, database connections, etc.

try:
    You do your operations here
except:
    Execute this block.
finally:
    This block will definitely be executed.

import os

try:
    file = open('out.txt', 'w')
    # do something…
finally:
    file.close()
    os.remove('out.txt')
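The order in which the try, except, else and finally blocks run can be recorded directly. The sketch below uses hypothetical helpers (`run`, `uncaught`) that append a label each time a block is entered.

```python
def run(divisor):
    """Return the names of the blocks that executed, in order."""
    steps = []
    try:
        steps.append("try")
        10 / divisor
    except ZeroDivisionError:
        steps.append("except")
    else:
        steps.append("else")      # only when the try block raised nothing
    finally:
        steps.append("finally")   # always
    return steps

print(run(2))
print(run(0))

# finally also runs when the exception is NOT caught by this try statement.
log = []

def uncaught():
    try:
        int("asd")                # ValueError, and there is no except clause here
    finally:
        log.append("finally ran")

try:
    uncaught()
except ValueError:
    log.append("caught by caller")

print(log)
```

The last part shows why finally is suited to cleanup: the block runs before the exception continues on to the caller.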
Nested try Blocks
• When an exception occurs inside a try block:
– If a matching except is found on that try block, that except block is executed
– If the try block does not have a matching except, then the outer
try statement’s except clauses are inspected for a match
– If no matching except exists there either, execution flow continues to look for a
matching except by inspecting the further outer try statements
– If a matching except cannot be found at all, the exception is
handled by Python’s default exception handler, which prints the traceback and stops the program.

• Execution flow never returns to the line where the exception was
thrown.
– This means that once an exception is caught and its except block is executed,
the flow continues with the lines following this except block
Let’s clarify it on various scenarios

Information: Exception1 and Exception2 are subclasses of Exception3

try:
    statement1
    try:
        statement2
    except Exception1:
        statement3
    except Exception2:
        statement4;
    try:
        statement5
    except Exception3:
        statement6
    statement7;
except Exception3:
    statement8
statement9;

Question: Which statements are executed if
1 statement1 throws Exception1
2 statement2 throws Exception1
3 statement2 throws Exception3
4 statement2 throws Exception1 and statement3 throws Exception2
Scenario: statement1 throws Exception1
(Same nested try code and exception hierarchy as on the question slide.)

Step1: statement1 throws Exception1

Step2: except clauses of the enclosing (outer) try block are inspected for a
matching except statement. Exception3 is a superclass of Exception1, so it matches.

Step3: statement8 is executed, the exception is handled, and execution
flow continues, bypassing any following except clauses

Step4: statement9 is executed


Scenario: statement2 throws Exception1
(Same nested try code and exception hierarchy as on the question slide.)

Step1: statement2 throws Exception1

Step2: except clauses of the inner try block are inspected for a matching
except statement. The first clause catches the exception

Step3: statement3 is executed, the exception is handled

Step4: execution flow continues, bypassing the following except clauses.
statement5 is executed.

Step5: Assuming no exception is thrown by statement5, the program continues
with statement7 and statement9.
Scenario: statement2 throws Exception3
(Same nested try code and exception hierarchy as on the question slide.)

Step1: statement2 throws Exception3

Step2: except clauses of the inner try block are inspected for a matching
except statement. None of these except clauses match Exception3

Step3: except clauses of the outer try statement are inspected for a
matching except. Exception3 is caught and statement8 is executed

Step4: statement9 is executed
Scenario: statement2 throws Exception1
and statement3 throws Exception2
(Same nested try code and exception hierarchy as on the question slide.)

Step1: statement2 throws Exception1

Step2: The exception is caught and statement3 is executed.

Step3: statement3 throws a new exception, Exception2

Step4: except clauses of the outer try statement are inspected for a
matching except. Exception2 is caught (it is a subclass of Exception3) and
statement8 is executed

Step5: statement9 is executed
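The four scenarios can be checked by running them. In the sketch below the three exception classes mirror the hierarchy from the slides, while `run_scenario` and its `throws` argument are hypothetical helpers: `throws` maps a statement number to the exception class that statement should raise, and the function returns the statements that were reached (including the ones that raised).

```python
class Exception3(Exception):
    pass

class Exception1(Exception3):
    pass

class Exception2(Exception3):
    pass

def run_scenario(throws):
    """Run the nested try code and return the list of statements reached."""
    reached = []

    def statement(number):
        reached.append(number)
        if number in throws:
            raise throws[number]()

    try:
        statement(1)
        try:
            statement(2)
        except Exception1:
            statement(3)
        except Exception2:
            statement(4)
        try:
            statement(5)
        except Exception3:
            statement(6)
        statement(7)
    except Exception3:
        statement(8)
    statement(9)
    return reached

print(run_scenario({1: Exception1}))                  # scenario 1
print(run_scenario({2: Exception1}))                  # scenario 2
print(run_scenario({2: Exception3}))                  # scenario 3
print(run_scenario({2: Exception1, 3: Exception2}))   # scenario 4
```

In the fourth scenario the `except Exception2` clause next to statement3 is never considered: an exception raised inside an except block is not matched against the sibling clauses of the same try statement, so it propagates to the outer one.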
raise Statement
• You can raise exceptions by using the raise
statement.

• The syntax is as follows:

raise exceptionName(arguments)
raise Statement
def getRatios(vect1, vect2):
    ratios = []
    for index in range(len(vect1)):
        try:
            ratios.append(vect1[index]/vect2[index])
        except ZeroDivisionError:
            ratios.append(float('nan')) # nan = Not a Number
        except:
            raise ValueError('getRatios called with bad arguments')
    return ratios

try:
    print(getRatios([1.0, 2.0, 7.0, 6.0], [1.0,2.0,0.0,3.0]))
    print(getRatios([], []))
    print(getRatios([1.0, 2.0], [3.0]))
except ValueError as msg:
    print(msg)

[1.0, 1.0, nan, 2.0]
[]
getRatios called with bad arguments
raise Statement
• Avoid raising a generic Exception! To catch it, you'll have
to catch all other more specific exceptions that subclass it.

def demo_bad_catch():
    try:
        raise ValueError('a hidden bug, do not catch this')
        raise Exception('This is the exception you expect to handle')
    except Exception as error:
        print('caught this error: ' + repr(error))

>>> demo_bad_catch()
caught this error: ValueError('a hidden bug, do not catch this')
raise Statement
• And more specific catches won't catch the general exception:

def demo_no_catch():
    try:
        raise Exception('general exceptions not caught by specific handling')
    except ValueError as e:
        print('we will not catch e')

>>> demo_no_catch()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in demo_no_catch
Exception: general exceptions not caught by specific handling
Custom Exceptions
• Users can define their own exception by creating a
new class in Python.

• This exception class has to be derived, either directly
or indirectly, from the Exception class.

• Most of the built-in exceptions are also derived from
this class.
Custom Exceptions
class ValueTooSmallError(Exception):
    """Raised when the input value is too small"""
    pass

class ValueTooLargeError(Exception):
    """Raised when the input value is too large"""
    pass
Custom Exceptions
number = 10 # you need to guess this number

while True:
    try:
        i_num = int(input("Enter a number: "))
        if i_num < number:
            raise ValueTooSmallError
        elif i_num > number:
            raise ValueTooLargeError
        break
    except ValueTooSmallError:
        print("This value is too small, try again!")
    except ValueTooLargeError:
        print("This value is too large, try again!")

print("Congratulations! You guessed it correctly.")


Lecture Overview
• Debugging
• Exception Handling
• Testing
Testing
• Programming to analyze data is powerful
• It is useless if the results are not correct
• Correctness is far more important than speed
Famous Examples
• Ariane 5 rocket
– On June 4, 1996, the maiden flight
of the European Ariane 5 launcher
crashed about 40 seconds after takeoff.
– Media reports indicated that the amount lost was half
a billion dollars
– The explosion was the result of a software error

• Therac-25 radiation therapy machine


– In 1985 a Canadian-built radiation-treatment device
began blasting holes through patients' bodies.
Testing does not Prove Correctness

“Program testing can be used to show the
presence of bugs, but never to show their
absence!” - Edsger Dijkstra
Testing = Double-Checking Results
• How do you know your program is right?
– Compare its output to a correct output

• How do you know a correct output?


– Real data is big
– You wrote a computer program because it is not
convenient to compute it by hand

• Use small inputs so you can compute by hand

• Example: standard deviation


– What are good tests for std_dev?
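One possible answer is to pick inputs small enough to work out by hand. The sketch below assumes `std_dev` means the population standard deviation (dividing by n); the slides do not say which definition is intended, so the implementation here is an assumption made only to have something to test.

```python
import math

def std_dev(numbers):
    """Population standard deviation (assumed definition: divide by n)."""
    mean = sum(numbers) / len(numbers)
    variance = sum((x - mean) ** 2 for x in numbers) / len(numbers)
    return math.sqrt(variance)

# Hand-computed oracles:
# identical values have no spread
assert std_dev([5, 5, 5]) == 0.0
# a single value has no spread either
assert std_dev([1]) == 0.0
# mean 2, squared deviations 1 and 1, variance 1, square root 1
assert abs(std_dev([1, 3]) - 1.0) < 1e-9
# mean 5, squared deviations sum to 32, variance 4, square root 2
assert abs(std_dev([2, 4, 4, 4, 5, 5, 7, 9]) - 2.0) < 1e-9

print("all std_dev tests passed")
```

Each expected value was derived independently of the code, which is what makes it a usable oracle.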
Testing ≠ Debugging
• Testing: Determining whether your program
is correct
– Doesn’t say where or how your program is
incorrect

• Debugging: Locating the specific defect in
your program, and fixing it
2 key ideas:
– divide and conquer
– the scientific method
When are you ready to test?
• Ensure that code will actually run
– Remove syntax errors
– Remove static semantic errors
– Both of these are typically handled by the Python
interpreter
• Have a set of expected results (i.e. input-output
pairings) ready
What is a Test?

• A test consists of:


– an input: sometimes called “test data”
– an oracle: a predicate (boolean expression) of the
output
What is a Test?

• Example test for sum:


– input: [1, 2, 3]
– oracle: result is 6
– write the test as: sum([1, 2, 3]) == 6

• Example test for sqrt:
– input: 3.14
– oracle: result is within 0.00001 of 1.772
– ways to write the test:
• -0.00001 < sqrt(3.14) - 1.772 < 0.00001
• abs(sqrt(3.14) - 1.772) < 0.00001
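Both ways of writing the tolerance check run as ordinary Python once `sqrt` is taken from the `math` module. The sketch also shows `math.isclose` from the standard library as a third option; the variable names `test_a`, `test_b` and `test_c` are illustrative only.

```python
import math

result = math.sqrt(3.14)

# Chained comparison: the difference lies inside the open interval.
test_a = -0.00001 < result - 1.772 < 0.00001

# Built-in abs(): the magnitude of the difference is below the tolerance.
test_b = abs(result - 1.772) < 0.00001

# math.isclose with an absolute tolerance expresses the same idea.
test_c = math.isclose(result, 1.772, abs_tol=0.00001)

print(test_a, test_b, test_c)
```

An exact `==` comparison would fail here, because 1.772 is only a rounded value of the true square root.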
Test Results
• The test passes if the boolean expression evaluates
to True

• The test fails if the boolean expression evaluates to
False

• Use the assert statement:


– assert sum([1, 2, 3]) == 6
– assert True does nothing
– assert False crashes the program and prints a message
Where to Write Test Cases
• At the top level: is run every time you load your program
def hypotenuse(a, b):
    ...  # body omitted on the slide

assert hypotenuse(3, 4) == 5
assert hypotenuse(5, 12) == 13

• In a test function: is run when you invoke the function

def hypotenuse(a, b):
    ...  # body omitted on the slide

def test_hypotenuse():
    assert hypotenuse(3, 4) == 5
    assert hypotenuse(5, 12) == 13
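A complete version of the test-function variant might look like the sketch below. The body of `hypotenuse` is not given in the slides, so the implementation here is an assumption; the third assertion is an extra case added to show a tolerance-based check.

```python
import math

def hypotenuse(a, b):
    # Assumed implementation: Pythagorean theorem.
    return math.sqrt(a * a + b * b)

def test_hypotenuse():
    assert hypotenuse(3, 4) == 5
    assert hypotenuse(5, 12) == 13
    # Results that are not whole numbers are safer to compare with a tolerance.
    assert math.isclose(hypotenuse(1, 1), 1.41421356, abs_tol=1e-8)

# The tests only run when the test function is invoked.
test_hypotenuse()
print("all hypotenuse tests passed")
```

Keeping the assertions in a function means that loading the module stays fast and side-effect free, while the tests can still be run on demand.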
Assertions are not Just for Test Cases
• Use assertions throughout your code

• Documents what you think is true about your
algorithm

• Lets you know immediately when something goes
wrong
– The longer between a code mistake and the programmer
noticing, the harder it is to debug
Assertions Make Debugging Easier
• Common, but unfortunate, course of events:
– Code contains a mistake (incorrect assumption or algorithm)
– Intermediate value (e.g., result of a function call) is incorrect
– That value is used in other computations, or copied into other
variables
– Eventually, the user notices that the overall program produces
a wrong result
– Where is the mistake in the program? It could be anywhere.

• Suppose you had 10 assertions evenly distributed in your


code
– When one fails, you can localize the mistake to 1/10 of your
code (the part between the last assertion that passes and the
first one that fails)
Where to Write Assertions
• Function entry: Are arguments legal?
– Place blame on the caller before the function fails

• Function exit: Is result correct?

• Places with tricky or interesting code

• Assertions are ordinary statements; e.g., can


appear within a loop:
for n in myNumbers:
    assert type(n) == int or type(n) == float
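Entry and exit assertions can be combined in one function. The `normalize` function below is a hypothetical example chosen because its result is easy to check: the returned values should sum to 1.

```python
def normalize(weights):
    """Scale weights so that they sum to 1 (hypothetical example function)."""
    # Function entry: are the arguments legal?
    assert len(weights) > 0, "weights must be non-empty"
    for w in weights:
        assert type(w) == int or type(w) == float
        assert w >= 0, "weights must be non-negative"
    total = sum(weights)
    assert total > 0, "at least one weight must be positive"

    result = [w / total for w in weights]

    # Function exit: is the result correct?
    assert len(result) == len(weights)
    assert abs(sum(result) - 1.0) < 1e-9
    return result

print(normalize([1, 1, 2]))

try:
    normalize([])
except AssertionError as error:
    entry_failure = str(error)
    print("caller error:", entry_failure)
```

The entry assertions place the blame on the caller before any computation happens, and the exit assertions catch a mistake inside the function before the wrong value spreads to other code.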
Where not to Write Assertions
• Don’t clutter the code
– Same rule as for comments

• Don’t write assertions that are certain to succeed


– The existence of an assertion tells a programmer that it
might possibly fail

• Don’t write an assertion if the following code would fail


informatively
assert type(name) == str
print("Hello, " + name)

• Write assertions where they may be useful for


debugging
What to Write Assertions About
• Results of computations

• Correctly-formed data structures

assert 0 <= index < len(mylist)


assert len(list1) == len(list2)
When to Write Tests
• Two possibilities:
– Write code first, then write tests
– Write tests first, then write code
When to Write Tests
• If you write the code first, you remember the
implementation while writing the tests
– You are likely to make the same mistakes in the
tests as in the implementation
When to Write Tests
• If you write the tests first, you will think more
about the functionality than about a particular
implementation
– You might notice some aspect of behavior that
you would have made a mistake about
– This is the better choice
Write the Whole Test
• A common mistake:
1. Write the function
2. Make up test inputs
3. Run the function
4. Use the result as the oracle

• You didn’t write a test, but only half of a test


– Created the test inputs, but not the oracle

• The test does not determine whether the function is


correct
– Only determines that it continues to be as correct (or
incorrect) as it was before
Testing Approaches
• Black box testing - Choose test data without
looking at implementation

• Glass box (white box, clear box) testing -
Choose test data with knowledge of
implementation
Inside Knowledge might be Nice
• Assume the code below:

c = a + b
if c > 100:
    print("Tested")
print("Passed")

• Creating a test case with a=40 and b=70 is not enough


– Although every line of the code will be executed

• Another test case with a=40 and b=30 would complete


the test
Tests might not Reveal an Error Sometimes
def mean(numbers):
    """Returns the average of the argument list.
    The argument must be a non-empty number list."""
    return sum(numbers)//len(numbers)

# Tests
assert mean([1, 2, 3, 4, 5]) == 3
assert mean([1, 2, 3]) == 2

This implementation is elegant, but wrong!

mean([1,2,3,4]) ➔ returns 2, but should return 2.5!!!
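The gap in the test suite becomes visible when the floor-division version is run next to a true-division version. The names `mean_floor` and `mean_true` are hypothetical, chosen to keep the two variants apart.

```python
def mean_floor(numbers):
    return sum(numbers) // len(numbers)   # the flawed version: floor division

def mean_true(numbers):
    return sum(numbers) / len(numbers)    # true division

# The original tests pass for BOTH versions, so they cannot tell them apart.
for mean in (mean_floor, mean_true):
    assert mean([1, 2, 3, 4, 5]) == 3
    assert mean([1, 2, 3]) == 2

# An input whose average is not a whole number exposes the defect.
print(mean_floor([1, 2, 3, 4]))
print(mean_true([1, 2, 3, 4]))
```

The existing tests only used inputs whose average is an integer, which is exactly the class of inputs where the two operators agree.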
Last but not Least, Don’t Write Meaningless Tests

def mean(numbers):
    """Returns the average of the argument list.
    The argument must be a non-empty number list."""
    return sum(numbers)//len(numbers)

Unnecessary tests. Don’t write these:

mean([1, 2, "hello"])
mean("hello")
mean([])
Test suite
• Want to find a collection of inputs that has high
likelihood of revealing bugs, yet is efficient
– Partition space of inputs into subsets that provide
equivalent information about correctness
• Partition divides a set into group of subsets such that each
element of set is in exactly one subset
• Construct test suite that contains one input from
each element of partition
• Run test suite
Example of partition
def bigger(x,y):
""" Assumes x and y are ints returns 1
if x is less than y else returns 0 """

• Input space is all pairs of integers


• Possible partition
– x positive, y positive
– x negative, y negative
– x positive, y negative
– x negative, y positive
– x=0,y=0
– x=0,y!=0
– x!=0,y=0
Why this partition?
• Lots of other choices
– E.g., x prime, y not; y prime, x not; both prime; both not
• Space of inputs often has natural boundaries
– Integers are positive, negative or zero
– From this perspective, have 9 subsets
• Split x = 0, y != 0 into x = 0, y positive and x =0, y negative
• Same for x != 0, y = 0
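A sketch of a test suite with one input per subset of the nine-way partition. The slide gives only the docstring of bigger, so the one-line body here is an assumption written to match that docstring; the expected values were filled in by hand.

```python
def bigger(x, y):
    """Assumes x and y are ints; returns 1 if x is less than y, else 0."""
    return 1 if x < y else 0  # assumed body, derived from the docstring

# One representative per subset: sign of x combined with sign of y.
suite = {
    (-5, -3): 1, (-5, 0): 1, (-5, 4): 1,   # x negative
    (0, -3): 0,  (0, 0): 0,  (0, 4): 1,    # x zero
    (4, -3): 0,  (4, 0): 0,  (4, 7): 1,    # x positive
}
for (x, y), expected in suite.items():
    assert bigger(x, y) == expected, (x, y)
print(len(suite), "cases passed")
```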
Partitioning
• What if no natural partition to input space?
– Random testing – probability that code is correct increases
with number of trials; but should be able to use code to do
better
– Use heuristics based on exploring paths through the
specifications – black-box testing
– Use heuristics based on exploring paths through the code
– glass-box testing
Black-box testing
• Test suite designed without looking at code
– Can be done by someone other than implementer
– Will avoid inherent biases of implementer, exposing
potential bugs more easily
– Testing designed without knowledge of implementation,
thus can be reused even if implementation changed
Paths through a specification
def sqrt_f(x, eps):
    """ Assumes x, eps floats
    x >= 0
    eps > 0
    returns res such that
    x-eps <= res*res <= x+eps """

• Paths through specification:
– x=0
– x>0
• But clearly not enough
Paths through a specification
• Also good to consider boundary cases
– For numbers, very small, very large, “typical”
• For our sqrt_f case, try these:

x                      eps
0.0                    0.0001
25.0                   0.0001
.05                    0.0001
2.0                    0.0001
2.0                    1.0 / (2.0 ** 64.0)
1.0 / (2.0 ** 64.0)    1.0 / (2.0 ** 64.0)
2.0 ** 64.0            1.0 / (2.0 ** 64.0)
1.0 / (2.0 ** 64.0)    2.0 ** 64.0
2.0 ** 64.0            2.0 ** 64.0

– First four are typical
• Perfect square (25.0)
• Example less than 1 (.05)
• Irrational square root (2.0)
– Last five test extremes
• If bug, might be code, or might be specification
(e.g. don’t try to find root if eps tiny)
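The specification can be turned into an executable check. This is a sketch only: satisfies_spec is a hypothetical helper, and math.sqrt stands in for an implementation of sqrt_f, which the slides do not show.

```python
import math

def satisfies_spec(x, eps, res):
    """True when res meets the documented contract of sqrt_f."""
    return x - eps <= res * res <= x + eps

# The four typical rows of the table.
typical = [(0.0, 0.0001), (25.0, 0.0001), (0.05, 0.0001), (2.0, 0.0001)]
for x, eps in typical:
    assert satisfies_spec(x, eps, math.sqrt(x))

# An extreme row: eps is far smaller than floating-point precision near 2.0.
tiny = 1.0 / (2.0 ** 64.0)
print(satisfies_spec(2.0, tiny, math.sqrt(2.0)))
```

With the tiny eps the check fails even for the best available floating-point answer, which is the kind of case where the problem lies in the specification and not in the code.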
Glass-box Testing
• Use code directly to guide design of test cases
• Glass-box test suite is path-complete if every
potential path through the code is tested at least
once
– Not always possible if loop can be exercised arbitrary
times, or recursion can be arbitrarily deep
• Even path-complete suite can miss a bug, depending
on choice of examples
Example
def abs(x):
    """ Assumes x is an int returns x if x>=0
    and -x otherwise """

    if x<-1:
        return -x
    else:
        return x

• Test suite of {-2, 2} will be path complete
• But will miss abs(-1) which incorrectly returns -1
• Testing boundary cases and typical cases {-2, -1, 2} would catch
this
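The two suites can be run side by side. In this sketch the function is renamed buggy_abs (a made-up name) so that it does not shadow the built-in abs, and failures() is a small hypothetical helper that lists the inputs whose result differs from the hand-written expected value.

```python
def buggy_abs(x):
    """Copy of the function above, bug included."""
    if x < -1:
        return -x
    else:
        return x

path_complete = {-2: 2, 2: 2}          # input -> expected result
with_boundary = {-2: 2, -1: 1, 2: 2}

def failures(suite):
    """Inputs for which buggy_abs disagrees with the expected value."""
    return [x for x, expected in suite.items() if buggy_abs(x) != expected]

print(failures(path_complete))  # []
print(failures(with_boundary))  # [-1]
```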
Rules of thumb for glass-box testing
• Exercise both branches of all if statements
• Ensure each except clause is executed
• For each for loop, have tests where:
– Loop is not entered
– Body of loop executed exactly once
– Body of loop executed more than once
• For each while loop,
– Same cases as for loops
– Cases that catch all ways to exit loop
• For recursive functions, test with no recursive calls,
one recursive call, and more than one recursive call
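A short sketch of these rules applied to a loop and to a recursive function. Both total and countdown are hypothetical examples chosen only because they are small.

```python
def total(values):
    """Sum a list with an explicit for loop."""
    result = 0
    for value in values:
        result += value
    return result

def countdown(n):
    """Recursive: list from n down to 1; no recursive call when n <= 0."""
    if n <= 0:
        return []
    return [n] + countdown(n - 1)

# for loop: not entered, body executed once, body executed more than once
assert total([]) == 0
assert total([5]) == 5
assert total([1, 2, 3]) == 6

# recursion: no recursive call, one recursive call, more than one
assert countdown(0) == []
assert countdown(1) == [1]
assert countdown(3) == [3, 2, 1]
```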
Conducting tests
• Start with unit testing
– Check that each module (e.g. function) works correctly
• Move to integration testing
– Check that system as whole works correctly
• Cycle between these phases
Good testing practice
• Start with unit testing
• Move to integration testing
• After code is corrected, be sure to do regression
testing:
• Check that program still passes all the tests it used to
pass, i.e., that your code fix hasn’t broken something
that used to work
Testing - Summary
• Goal:
– Show that bugs exist
– Would be great to prove code is bug free, but
generally hard
• Usually can’t run on all possible inputs to check
• Formal methods sometimes help, but usually only on
simpler code

“Program testing can be used to show the presence of bugs,
but never to show their absence!” --Edsger Dijkstra
Importing flat files
from the web
INTERMEDIATE IMPORTING DATA IN PYTHON
You’re already great at importing!
Flat files such as .txt and .csv

What if your data is online?


Can you import web data?

You can: go to URL and click to download files

BUT: not reproducible, not scalable


The urllib package
Provides interface for fetching data across the web
urlopen() - accepts URLs instead of file names
How to automate file download in Python
from urllib.request import urlretrieve
url = 'http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv'
urlretrieve(url, 'winequality-white.csv')
HTTP requests to
import files from the
web
INTERMEDIATE IMPORTING DATA IN PYTHON

Hugo Bowne-Anderson
Data Scientist at DataCamp
URL
Uniform/Universal Resource Locator

References to web resources

Focus: web addresses

Ingredients:
Protocol identifier - http:
Resource name - datacamp.com

These specify web addresses uniquely


HTTP
HyperText Transfer Protocol
Foundation of data communication for the web

HTTPS - more secure form of HTTP

Going to a website = sending HTTP request


GET request

urlretrieve() performs a GET request

HTML - HyperText Markup Language


GET requests using urllib
from urllib.request import urlopen, Request
url = "https://www.wikipedia.org/"
request = Request(url)
response = urlopen(request)
html = response.read()
response.close()
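The same GET request can be written with a with block, so the response is closed even if reading fails. This is a sketch: fetch_text is a hypothetical helper name, and the demonstration uses a temporary local file through a file:// URL (which urlopen also accepts) so that it runs without network access.

```python
import tempfile
from pathlib import Path
from urllib.request import Request, urlopen

def fetch_text(url, encoding="utf-8"):
    """GET the resource at url and return its body as text."""
    request = Request(url)
    with urlopen(request) as response:  # closed automatically on exit
        return response.read().decode(encoding)

# Offline demonstration with a temporary file standing in for a web page.
with tempfile.TemporaryDirectory() as folder:
    page = Path(folder) / "page.html"
    page.write_text("<h1>Hello</h1>", encoding="utf-8")
    html = fetch_text(page.as_uri())
print(html)
```

Passing an http or https address instead of the file:// URL performs the request over the network in the same way.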
Regular Expressions
Chapter 11

Python for Everybody


www.py4e.com
Regular Expressions
In computing, a regular expression, also referred to
as “regex” or “regexp”, provides a concise and
flexible means for matching strings of text, such as
particular characters, words, or patterns of
characters. A regular expression is written in a
formal language that can be interpreted by a regular
expression processor.
http://en.wikipedia.org/wiki/Regular_expression
Regular Expressions
Really clever “wild card” expressions for matching
and parsing strings

http://en.wikipedia.org/wiki/Regular_expression
Really smart “Find” or “Search”
Understanding Regular Expressions
• Very powerful and quite cryptic
• Fun once you understand them
• Regular expressions are a language unto themselves
• A language of “marker characters” - programming with
characters
• It is kind of an “old school” language - compact
http://xkcd.com/208/
Regular Expression Quick Guide
^ Matches the beginning of a line
$ Matches the end of the line
. Matches any character
\s Matches whitespace
\S Matches any non-whitespace character
* Repeats a character zero or more times
*? Repeats a character zero or more times (non-greedy)
+ Repeats a character one or more times
+? Repeats a character one or more times (non-greedy)
[aeiou] Matches a single character in the listed set
[^XYZ] Matches a single character not in the listed set
[a-z0-9] The set of characters can include a range
( Indicates where string extraction is to start
) Indicates where string extraction is to end

https://www.py4e.com/lectures3/Pythonlearn-11-Regex-Handout.txt
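A few entries from the quick guide can be tried on a single line. This is a minimal sketch; the sample line is one of the mail headers used later in these slides, and the patterns are written as raw strings (r'...') so that the backslashes reach the regular expression engine unchanged.

```python
import re

line = "X-DSPAM-Confidence: 0.8475"

print(bool(re.search(r"^X", line)))      # ^ anchors at the start of the line
print(re.findall(r"[0-9]+", line))       # runs of one or more digits
print(re.findall(r"\S+", line))          # runs of non-whitespace characters
print(re.findall(r"[aeiou]", "regex"))   # single characters from a listed set
print(re.findall(r"^X-(\S+):", line))    # parentheses select what is extracted
```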
The Regular Expression Module
• Before you can use regular expressions in your program, you
must import the library using “import re”
• You can use re.search() to see if a string matches a regular
expression, similar to using the find() method for strings
• You can use re.findall() to extract portions of a string that match
your regular expression, similar to a combination of find() and
slicing: var[5:10]
Using re.search() Like find()

hand = open('mbox-short.txt')
for line in hand:
    line = line.rstrip()
    if line.find('From:') >= 0:
        print(line)

import re
hand = open('mbox-short.txt')
for line in hand:
    line = line.rstrip()
    if re.search('From:', line) :
        print(line)
Using re.search() Like startswith()
hand = open('mbox-short.txt')
for line in hand:
    line = line.rstrip()
    if line.startswith('From:') :
        print(line)

import re
hand = open('mbox-short.txt')
for line in hand:
    line = line.rstrip()
    if re.search('^From:', line) :
        print(line)

We fine-tune what is matched by adding special characters to the string


Wild-Card Characters
• The dot character matches any character
• If you add the asterisk character, the character is “any number of
times”

X-Sieve: CMU Sieve 2.3
X-DSPAM-Result: Innocent
X-DSPAM-Confidence: 0.8475
X-Content-Type-Message-Body: text/plain

^X.*:
^ Match the start of the line
. Match any character
* Many times
Fine-Tuning Your Match
Depending on how “clean” your data is and the purpose of your
application, you may want to narrow your match down a bit

X-Sieve: CMU Sieve 2.3
X-DSPAM-Result: Innocent
X-Plane is behind schedule: two weeks
X-: Very short

^X.*:
^ Match the start of the line
. Match any character
* Many times
Fine-Tuning Your Match
Depending on how “clean” your data is and the purpose of your
application, you may want to narrow your match down a bit

X-Sieve: CMU Sieve 2.3
X-DSPAM-Result: Innocent
X-: Very Short
X-Plane is behind schedule: two weeks

^X-\S+:
^ Match the start of the line
\S Match any non-whitespace character
+ One or more times
Matching and Extracting Data
• re.search() returns a match object if the string matches the
regular expression and None otherwise, so its result can be used
as a True/False test
• If we actually want the matching strings to be extracted, we use
re.findall()

>>> import re
>>> x = 'My 2 favorite numbers are 19 and 42'
>>> y = re.findall('[0-9]+',x)
>>> print(y)
['2', '19', '42']

[0-9]+
One or more digits
Matching and Extracting Data
When we use re.findall(), it returns a list of zero or more sub-strings
that match the regular expression

>>> import re
>>> x = 'My 2 favorite numbers are 19 and 42'
>>> y = re.findall('[0-9]+',x)
>>> print(y)
['2', '19', '42']
>>> y = re.findall('[AEIOU]+',x)
>>> print(y)
[]
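The difference between the two calls can be seen on the same string. A small sketch: re.search() gives back one match object (or None), while re.findall() gives back a list that may be empty.

```python
import re

x = 'My 2 favorite numbers are 19 and 42'

first = re.search(r'[0-9]+', x)
print(first.group())   # text of the first match
print(first.span())    # start and end positions of that match

print(re.search(r'[AEIOU]+', x))    # None: no upper-case vowels in x
print(re.findall(r'[AEIOU]+', x))   # an empty list for the same pattern
```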
Warning: Greedy Matching
The repeat characters (* and +) push outward in both directions
(greedy) to match the largest possible string
>>> import re
>>> x = 'From: Using the : character'
>>> y = re.findall('^F.+:', x)
>>> print(y)
['From: Using the :']

Why not 'From:' ?

^F.+:
^F First character in the match is an F
.+ One or more characters
: Last character in the match is a :
Non-Greedy Matching
Not all regular expression repeat codes are greedy!
If you add a ? character, the + and * chill out a bit...

>>> import re
>>> x = 'From: Using the : character'
>>> y = re.findall('^F.+?:', x)
>>> print(y)
['From:']

^F.+?:
^F First character in the match is an F
.+? One or more characters but not greedy
: Last character in the match is a :
Fine-Tuning String Extraction
You can refine the match for re.findall() and separately determine which
portion of the match is to be extracted by using parentheses

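From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008

>>> import re
>>> x = 'From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008'
>>> y = re.findall('\S+@\S+',x)
>>> print(y)
['stephen.marquard@uct.ac.za']

\S+@\S+
\S+ At least one non-whitespace character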
Fine-Tuning String Extraction
Parentheses are not part of the match - but they mark where the
extracted string starts and stops

From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008

>>> y = re.findall('\S+@\S+',x)
>>> print(y)
['stephen.marquard@uct.ac.za']
>>> y = re.findall('^From (\S+@\S+)',x)
>>> print(y)
['stephen.marquard@uct.ac.za']

^From (\S+@\S+)
String Parsing Examples…
From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008
(the @ is at position 21 and the space after the host name at position 31)

Extracting a host name - using find and string slicing

>>> data = 'From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008'
>>> atpos = data.find('@')
>>> print(atpos)
21
>>> sppos = data.find(' ',atpos)
>>> print(sppos)
31
>>> host = data[atpos+1 : sppos]
>>> print(host)
uct.ac.za
The Double Split Pattern
Sometimes we split a line one way, and then grab one of the pieces
of the line and split that piece again

From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008

words = line.split()
email = words[1]            # stephen.marquard@uct.ac.za
pieces = email.split('@')   # ['stephen.marquard', 'uct.ac.za']
print(pieces[1])            # uct.ac.za
The Regex Version
From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008
import re
lin = 'From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008'
y = re.findall('@([^ ]*)',lin)
print(y)

['uct.ac.za']

'@([^ ]*)'
@ Look through the string until you find an at sign
[^ ] Match non-blank character
* Match many of them
( ) Extract the non-blank characters


Even Cooler Regex Version
From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008
import re
lin = 'From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008'
y = re.findall('^From .*@([^ ]*)',lin)
print(y)

['uct.ac.za']

'^From .*@([^ ]*)'
^From Starting at the beginning of the line, look for the string 'From '
.*@ Skip a bunch of characters, looking for an at sign
( Start extracting
[^ ] Match non-blank character
* Match many of them
) Stop extracting
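The three ways of extracting the host name shown in these slides can be compared on the same line. This is a sketch; the function names host_by_find, host_by_split and host_by_regex are made up for the comparison, and the regex version returns None when the line does not match.

```python
import re

line = 'From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008'

def host_by_find(text):
    """find() and slicing."""
    atpos = text.find('@')
    sppos = text.find(' ', atpos)
    return text[atpos + 1:sppos]

def host_by_split(text):
    """The double split pattern."""
    return text.split()[1].split('@')[1]

def host_by_regex(text):
    """The regular expression version."""
    found = re.findall(r'^From .*@([^ ]*)', text)
    return found[0] if found else None

for extract in (host_by_find, host_by_split, host_by_regex):
    print(extract.__name__, extract(line))
```

All three give the same host for this line; they differ in how they behave on lines that do not have the expected shape.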
Escape Character
If you want a special regular expression character to just behave
normally (most of the time) you prefix it with '\'

>>> import re
>>> x = 'We just received $10.00 for cookies.'
>>> y = re.findall('\$[0-9.]+',x)
>>> print(y)
['$10.00']

\$[0-9.]+
\$ A real dollar sign
[0-9.] A digit or period
+ One or more
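Two related tools from the standard library are worth a quick sketch here: raw strings, which keep the backslash in the pattern as typed, and re.escape(), which adds the backslashes for you when the text to match is meant literally.

```python
import re

x = 'We just received $10.00 for cookies.'

# Raw string: the backslash is passed to the regex engine unchanged.
print(re.findall(r'\$[0-9.]+', x))

# re.escape() builds a pattern that matches the given text literally.
literal = re.escape('$10.00')
print(literal)
print(re.findall(literal, x))
```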


Summary
• Regular expressions are a cryptic but powerful language for
matching strings and extracting elements from those strings
• Regular expressions have special characters that indicate
intent
Acknowledgements / Contributions
These slides are Copyright 2010- Charles R. Severance
(www.dr-chuck.com) of the University of Michigan School of
Information and open.umich.edu and made available under a
Creative Commons Attribution 4.0 License. Please maintain this
last slide in all copies of the document to comply with the
attribution requirements of the license. If you make a change,
feel free to add your name and organization to the list of
contributors on this page as you republish the materials.

Initial Development: Charles Severance, University of Michigan
School of Information

… Insert new Contributors and Translations here

Networked Programs
Chapter 12

Python for Everybody


www.py4e.com
Using urllib in Python
Since HTTP is so common, we have a library that does all the
socket work for us and makes web pages look like a file

import urllib.request, urllib.parse, urllib.error

fhand = urllib.request.urlopen('http://data.pr4e.org/romeo.txt')
for line in fhand:
    print(line.decode().strip())

But soft what light through yonder window breaks
It is the east and Juliet is the sun
Arise fair sun and kill the envious moon
Who is already sick and pale with grief

urllib1.py
Reading Web Pages
import urllib.request, urllib.parse, urllib.error

fhand = urllib.request.urlopen('http://www.dr-chuck.com/page1.htm')
for line in fhand:
    print(line.decode().strip())

<h1>The First Page</h1>
<p>If you like, you can switch to the <a
href="http://www.dr-chuck.com/page2.htm">Second
Page</a>.
</p>
urllib2.py
Following Links
The page retrieved by urllib2.py links to a second page through the
href attribute of its anchor tag, so a program can read that URL and
retrieve the next page in turn:

<a href="http://www.dr-chuck.com/page2.htm">Second Page</a>
urllib2.py
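Pulling the link out of the retrieved page can be sketched with the regular expressions from Chapter 11. The HTML below is the output shown on the slide, pasted in as a string so the example runs offline; the pattern is an assumption that only handles double-quoted http links, not HTML in general.

```python
import re

html = '''<h1>The First Page</h1>
<p>If you like, you can switch to the <a
href="http://www.dr-chuck.com/page2.htm">Second
Page</a>.
</p>'''

# Capture everything between href=" and the closing quote.
links = re.findall(r'href="(http[^"]*)"', html)
print(links)
```

Each URL in the list could then be passed to urllib.request.urlopen() to follow the link.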

Using Web Services
Chapter 13

Python for Everybody


www.py4e.com
Data on the Web
• With the HTTP Request/Response well understood and well
supported, there was a natural move toward exchanging data
between programs using these protocols
• We needed to come up with an agreed way to represent data
going between applications and across networks
• There are two commonly used formats: XML and JSON


Sending Data Across the “Net”
[Diagram: a PHP Array, a JavaScript Object, a Python Dictionary and a Java HashMap all exchange the same data]
{
  "name" : "Chuck",
  "phone" : "303-4456"
}
a.k.a. “Wire Protocol” - What we send on the “wire”


Agreeing on a “Wire Format”
[Diagram: a Python Dictionary is serialized to XML, which is de-serialized into a Java HashMap]
<person>
  <name>
    Chuck
  </name>
  <phone>
    303 4456
  </phone>
</person>
XML
Agreeing on a “Wire Format”
[Diagram: a Python Dictionary is serialized to JSON, which is de-serialized into a Java HashMap]
{
  "name" : "Chuck",
  "phone" : "303-4456"
}
JSON
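The serialize and de-serialize steps on the Python side can be sketched with the standard json module. The record is the one from the slide; the variable names are made up for the illustration.

```python
import json

record = {"name": "Chuck", "phone": "303-4456"}

wire = json.dumps(record)      # serialize: dictionary -> text for the wire
restored = json.loads(wire)    # de-serialize: text -> dictionary

print(wire)
print(restored == record)
```

A program in another language would run its own JSON parser on the same text to build its own structure, such as a Java HashMap.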
XML
Marking up data to send across the network...

http://en.wikipedia.org/wiki/XML
XML “Elements” (or Nodes)
• Simple Element
• Complex Element

<people>
  <person>
    <name>Chuck</name>
    <phone>303 4456</phone>
  </person>
  <person>
    <name>Noah</name>
    <phone>622 7421</phone>
  </person>
</people>
eXtensible Markup Language
• Primary purpose is to help information systems share structured
data
• It started as a simplified subset of the Standard Generalized
Markup Language (SGML), and is designed to be relatively
human-legible

http://en.wikipedia.org/wiki/XML
XML Basics
• Start Tag
• End Tag
• Text Content
• Attribute
• Self Closing Tag

<person>
  <name>Chuck</name>
  <phone type="intl">
    +1 734 303 4456
  </phone>
  <email hide="yes" />
</person>
White Space
Line ends do not matter. White space is generally discarded on text
elements. We indent only to be readable.

<person>
  <name>Chuck</name>
  <phone type="intl">
    +1 734 303 4456
  </phone>
  <email hide="yes" />
</person>

<person>
<name>Chuck</name>
<phone type="intl">+1 734 303 4456</phone>
<email hide="yes" />
</person>
XML Terminology
• Tags indicate the beginning and ending of elements
• Attributes - Keyword/value pairs on the opening tag of XML
• Serialize / De-Serialize - Convert data in one program into a
common format that can be stored and/or transmitted between
systems in a programming language-independent manner

http://en.wikipedia.org/wiki/Serialization
XML as a Tree
<a>
  <b>X</b>
  <c>
    <d>Y</d>
    <e>Z</e>
  </c>
</a>
[Tree diagram: element a has child elements b and c; b holds the text X; c has child elements d and e, which hold the text Y and Z]
XML Text and Attributes
<a>
  <b w="5">X</b>
  <c>
    <d>Y</d>
    <e>Z</e>
  </c>
</a>
[Tree diagram: as above, with b carrying an attribute node w whose value is 5 alongside its text node X]
XML as Paths
<a>
  <b>X</b>
  <c>
    <d>Y</d>
    <e>Z</e>
  </c>
</a>

/a/b X
/a/c/d Y
/a/c/e Z
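The path view maps directly onto ElementTree's find(). A small sketch using the example document with the attribute included; note that paths given to find() are relative to the element it is called on, so /a/c/d becomes 'c/d' when starting from the root element a.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring('<a><b w="5">X</b><c><d>Y</d><e>Z</e></c></a>')

print(root.tag)                 # the root element itself
print(root.find('b').text)      # /a/b
print(root.find('c/d').text)    # /a/c/d
print(root.find('c/e').text)    # /a/c/e
print(root.find('b').get('w'))  # attribute w on element b
```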
XML Schema
Describing a “contract” as to what is acceptable XML

http://en.wikipedia.org/wiki/Xml_schema
http://en.wikibooks.org/wiki/XML_Schema
XML Schema
• Description of the legal format of an XML document
• Expressed in terms of constraints on the structure and content of
documents
• Often used to specify a “contract” between systems - “My system
will only accept XML that conforms to this particular Schema.”
• If a particular piece of XML meets the specification of the Schema
- it is said to “validate”
http://en.wikipedia.org/wiki/Xml_schema
XML Validation
[Diagram: an XML Document and an XML Schema Contract are both fed to a Validator]
XML Validation
XML Document
<person>
  <lastname>Severance</lastname>
  <age>17</age>
  <dateborn>2001-04-17</dateborn>
</person>

XML Schema Contract
<xs:complexType name="person">
  <xs:sequence>
    <xs:element name="lastname" type="xs:string"/>
    <xs:element name="age" type="xs:integer"/>
    <xs:element name="dateborn" type="xs:date"/>
  </xs:sequence>
</xs:complexType>

[Both are passed to the Validator]
Many XML Schema Languages
• Document Type Definition (DTD)
- http://en.wikipedia.org/wiki/Document_Type_Definition
• Standard Generalized Markup Language (ISO 8879:1986 SGML)
- http://en.wikipedia.org/wiki/SGML
• XML Schema from W3C - (XSD)
- http://en.wikipedia.org/wiki/XML_Schema_(W3C)
http://en.wikipedia.org/wiki/Xml_schema
xml1.py
import xml.etree.ElementTree as ET
data = '''<person>
<name>Chuck</name>
<phone type="intl">
+1 734 303 4456
</phone>
<email hide="yes"/>
</person>'''

tree = ET.fromstring(data)
print('Name:',tree.find('name').text)
print('Attr:',tree.find('email').get('hide'))
xml2.py
import xml.etree.ElementTree as ET
input = '''<stuff>
  <users>
    <user x="2">
      <id>001</id>
      <name>Chuck</name>
    </user>
    <user x="7">
      <id>009</id>
      <name>Brent</name>
    </user>
  </users>
</stuff>'''

stuff = ET.fromstring(input)
lst = stuff.findall('users/user')
print('User count:', len(lst))
for item in lst:
    print('Name', item.find('name').text)
    print('Id', item.find('id').text)
    print('Attribute', item.get("x"))
JavaScript Object Notation
JSON represents data as nested “lists” and “dictionaries”

json1.py
import json
data = '''{
  "name" : "Chuck",
  "phone" : {
    "type" : "intl",
    "number" : "+1 734 303 4456"
  },
  "email" : {
    "hide" : "yes"
  }
}'''

info = json.loads(data)
print('Name:',info["name"])
print('Hide:',info["email"]["hide"])
JSON represents data as nested “lists” and “dictionaries”

json2.py
import json
input = '''[
  { "id" : "001",
    "x" : "2",
    "name" : "Chuck"
  } ,
  { "id" : "009",
    "x" : "7",
    "name" : "Chuck"
  }
]'''

info = json.loads(input)
print('User count:', len(info))
for item in info:
    print('Name', item['name'])
    print('Id', item['id'])
    print('Attribute', item['x'])
Service Oriented Approach

http://en.wikipedia.org/wiki/Service-oriented_architecture
Service Oriented Approach
• Most non-trivial web applications use services
• They use services from other applications
- Credit Card Charge
- Hotel Reservation systems
• Services publish the “rules” applications must
follow to make use of the service (API)
[Diagram: an Application calls two Services through their APIs]
Application Program Interface
The API itself is largely abstract in that it specifies an
interface and controls the behavior of the objects specified
in that interface. The software that provides the functionality
described by an API is said to be an “implementation” of the
API. An API is typically defined in terms of the
programming language used to build an application.

http://en.wikipedia.org/wiki/API
import urllib.request, urllib.parse, urllib.error
import json

while True:
    address = input('Enter location: ')
    if len(address) < 1: break

    url = 'https://nominatim.openstreetmap.org/search/' + urllib.parse.quote(address) + '?format=json'

    print('Retrieving', url)
    uh = urllib.request.urlopen(url)
    data = uh.read().decode()
    print('Retrieved', len(data), 'characters')

    try:
        js = json.loads(data)
    except json.JSONDecodeError:
        js = None

    # Skip this address if the reply was not JSON or held no results
    if not js:
        print('==== Failure To Retrieve ====')
        continue

    print(js[0]['lat'])
    print(js[0]['lon'])
    print(js[0]['display_name'])

geojson.py
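The parsing step of geojson.py can be tried without a network connection. This is a sketch under stated assumptions: first_place is a hypothetical helper, and the sample text is a made-up payload that only mimics the three fields the program prints; it is not a real reply from the service.

```python
import json

def first_place(data):
    """Return (lat, lon, display_name) of the first result, or None."""
    try:
        js = json.loads(data)
    except json.JSONDecodeError:
        return None          # the reply was not valid JSON
    if not isinstance(js, list) or not js:
        return None          # valid JSON, but no results
    top = js[0]
    return top['lat'], top['lon'], top['display_name']

# Made-up payload shaped like the fields geojson.py prints.
sample = '[{"lat": "1.0", "lon": "2.0", "display_name": "Example Place"}]'
print(first_place(sample))
print(first_place('not json'))
print(first_place('[]'))
```

Returning None for both failure cases lets the caller handle a bad reply and an empty result list in one place.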