Python Standard Library - Fredrik Lundh
The Python 2.0 distribution comes with an extensive standard library, comprising over 200 modules.
This book provides a brief description of each module, plus one or more sample scripts showing how to
use it. All in all, this book contains 360 sample scripts.
Since I first stumbled upon Python some five years ago, I've spent hundreds of hours answering
questions on the comp.lang.python newsgroup. Maybe someone found a module that might be
exactly what he wanted, but he couldn't really figure out how to use it. Maybe someone had picked the
wrong module for the task. Or maybe someone tried to reinvent the wheel. Often, a short sample script
could be much more helpful than a pointer to the reference documentation.
After posting a couple of scripts each week, for a number of years, you end up with a rather large
collection of potentially useful scripts. What you'll find in this book are the best parts from over 3,000
newsgroup messages. You'll also find hundreds of new scripts added to make sure every little nook and
cranny of the standard library has been fully covered.
I've worked hard to make the scripts easy to understand and adaptable. I've intentionally kept the
annotations as short as possible. If you want more background, there's plenty of reference material
shipped with most Python distributions. In this book, the emphasis is on the code.
Comments, suggestions, and bug reports are welcome. Send them to [email protected]. I read
all mail as soon as it arrives, but it might take a while until I get around to answering.
For updates, addenda, and other information related to this book, point your favorite web browser to
http://www.pythonware.com/people/fredrik/librarybook.htm
This book covers the entire standard library, except the (optional) Tkinter user interface library. There
are several reasons for this, mostly related to time, space, and the fact that I'm working on several
other Tkinter documentation projects.
For current status on these projects, see http://www.pythonware.com/people/fredrik/tkinterbook.htm
Production details
This book was written in DocBook SGML. I used a variety of tools, including Secret Labs'
PythonWorks, Excosoft Documentor, James Clark's Jade DSSSL processor, and Norm Walsh's
DocBook stylesheets. And a bunch of Python scripts, of course.
Thanks to my referees: Tim Peters, Guido van Rossum, David Ascher, Mark Lutz, and Rael Dornfest,
and the PythonWare crew: Matthew Ellis, Håkan Karlsson, and Rune Uhlin.
Thanks to Lenny Muellner, who turned my SGML files into the book you see before you, and to
Christien Shanreaw, who pulled all the different text and code files together for the book and the CD-
ROM.
Core Modules
"Since the functions in the C runtime library are not part of the Win32
API, we believe the number of applications that will be affected by this
bug to be very limited"
Overview
Python's standard library covers a wide range of modules, from modules that are as much a
part of the Python language as the types and statements defined by the language specification, to
obscure modules that are probably useful only to a small number of programs.
This section describes a number of fundamental standard library modules. Any larger Python program
is likely to use most of these modules, either directly or indirectly.
Two modules are even more basic than all other modules combined: the __builtin__ module defines
built-in functions (like len, int, and range), and the exceptions module defines all built-in
exceptions.
Python imports both modules when it starts up, and makes their content available for all programs.
Several built-in types have support modules in the standard library. The string module implements
commonly used string operations, the math module provides math operations and constants, and the
cmath module does the same for complex numbers.
Regular Expressions
The re module provides regular expressions support for Python. Regular expressions are string
patterns written in a special syntax, which can be used to match strings, and extract substrings.
sys gives you access to various interpreter variables, such as the module search path, and the
interpreter version. operator provides functional equivalents to many built-in operators. copy allows
you to copy objects. And finally, gc gives you more control over the garbage collector facilities in
Python 2.0.
Python Standard Library: Core Modules 1-3
whither canada?
15
To pass keyword arguments to a function, you can use a dictionary as the third argument to apply:
crunchy frog
crunchy frog
crunchy frog
One common use for apply is to pass constructor arguments from a subclass on to the base class,
especially if the constructor takes a lot of arguments.
class Rectangle:
    def __init__(self, color="white", width=10, height=10):
        print "create a", color, self, "sized", width, "x", height

class RoundedRectangle(Rectangle):
    def __init__(self, **kw):
        apply(Rectangle.__init__, (self,), kw)
Python 2.0 provides an alternate syntax. Instead of apply, you can use an ordinary function call, and
use * to mark the tuple, and ** to mark the dictionary.
The following two statements are equivalent:
result = function(*args, **kwargs)
result = apply(function, args, kwargs)
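The * and ** call syntax is the form that survives in current Python, where apply is no longer available. The following is a minimal sketch for a Python 3 interpreter (an assumption; the listings in this book target Python 2.0). The describe function and its arguments are hypothetical names used only for illustration.

```python
def describe(name, *toppings, **options):
    # Build a description from positional and keyword arguments.
    parts = [name] + list(toppings)
    extras = ["%s=%s" % (key, options[key]) for key in sorted(options)]
    return " ".join(parts + extras)

args = ("crunchy", "frog")
kwargs = {"price": 5}

# Unpack the tuple with * and the dictionary with **.
result = describe(*args, **kwargs)
print(result)
```

The call is equivalent to writing describe("crunchy", "frog", price=5) by hand.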
The trick is that you can actually call this function directly. This can be handy if you have the module
name in a string variable, like in the following example, which imports all modules whose names end
with "-plugin":
import glob, os
modules = []
Note that the plugin modules have hyphens in the name. This means that you cannot import such a
module using the ordinary import command, since you cannot have hyphens in Python identifiers.
Here's the plugin used in this example:
def hello():
print "example-plugin says hello"
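As a self-contained illustration of importing modules by name, the following sketch writes a hyphenated plugin file into a temporary directory and loads it with __import__. It assumes a Python 3 interpreter; the temporary directory, the plugin contents, and the use of a return value instead of a print statement are all choices made for this sketch.

```python
import glob
import os
import sys
import tempfile

# Hypothetical setup: a plugin file whose name contains a hyphen.
plugin_dir = tempfile.mkdtemp()
with open(os.path.join(plugin_dir, "example-plugin.py"), "w") as fp:
    fp.write("def hello():\n    return 'example-plugin says hello'\n")

sys.path.insert(0, plugin_dir)

# "import example-plugin" would be a syntax error, but the name
# works fine when passed to __import__ as a string.
modules = []
for path in glob.glob(os.path.join(plugin_dir, "*-plugin.py")):
    name = os.path.splitext(os.path.basename(path))[0]
    modules.append(__import__(name))

messages = [module.hello() for module in modules]
print(messages)
```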
The following example shows how to get a function object, given that you have the module and
function name as strings:
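One way to do this is to combine __import__ with getattr. This is a minimal sketch for a Python 3 interpreter; get_function is a hypothetical helper name, and the math module is used only because it is always available.

```python
def get_function(module_name, function_name):
    # __import__ returns the top-level module, which is what we
    # want for plain (non-dotted) module names.
    module = __import__(module_name)
    return getattr(module, function_name)

sqrt = get_function("math", "sqrt")
print(sqrt(16.0))
```

For dotted names such as "os.path", __import__ returns the top-level package, so this sketch would need an extra getattr step per component.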
You can also use this function to implement lazy loading of modules. In the following example, the
string module is imported when it is first used:
class LazyImport:
    def __init__(self, module_name):
        self.module_name = module_name
        self.module = None
    def __getattr__(self, name):
        if self.module is None:
            self.module = __import__(self.module_name)
        return getattr(self.module, name)

string = LazyImport("string")

print string.lowercase
abcdefghijklmnopqrstuvwxyz
Python provides some basic support for reloading modules that you've already imported. The following
example loads the hello.py file three times:
import hello
reload(hello)
reload(hello)
reload uses the module name associated with the module object, not the variable name. This means
that even if you've renamed the module, reload will still be able to find the original module.
Note that when you reload a module, it is recompiled, and the new module replaces the old one in the
module dictionary. However, if you have created instances of classes defined in that module, those
instances will still use the old implementation.
Likewise, if you've used from-import to create references to module members in other modules,
those references will not be updated.
Looking in namespaces
The dir function returns a list of all members of a given module, class, instance, or other type. It's
probably most useful when you're working with an interactive Python interpreter, but can also come in
handy in other situations.
def dump(value):
    print value, "=>", dir(value)

import sys
dump(0)
dump(1.0)
dump(0.0j) # complex number
dump([]) # list
dump({}) # dictionary
dump("string")
dump(len) # function
dump(sys) # module
0 => []
1.0 => []
0j => ['conjugate', 'imag', 'real']
[] => ['append', 'count', 'extend', 'index', 'insert',
'pop', 'remove', 'reverse', 'sort']
{} => ['clear', 'copy', 'get', 'has_key', 'items',
'keys', 'update', 'values']
string => []
<built-in function len> => ['__doc__', '__name__', '__self__']
<module 'sys' (built-in)> => ['__doc__', '__name__',
'__stderr__', '__stdin__', '__stdout__', 'argv',
'builtin_module_names', 'copyright', 'dllhandle',
'exc_info', 'exc_type', 'exec_prefix', 'executable',
...
In the following example, the getmembers function returns all class-level attributes and methods
defined by a given class:
class A:
    def a(self):
        pass
    def b(self):
        pass

class B(A):
    def c(self):
        pass
    def d(self):
        pass

print getmembers(A)
print getmembers(B)
print getmembers(IOError)
Note that the getmembers function returns an ordered list. The earlier a name appears in the list, the
higher up in the class hierarchy it's defined. If order doesn't matter, you can use a dictionary to collect
the names instead of a list.
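A helper with the ordering described above could look roughly like this. It is a sketch for a Python 3 interpreter, not the book's own listing: it walks __bases__ first so inherited names come earlier, and the final filter that hides underscore names is an addition made here because Python 3 classes carry many internal attributes.

```python
def getmembers(klass, members=None):
    # Return an ordered list of names; names defined higher up
    # in the class hierarchy come first.
    if members is None:
        members = []
    for base in klass.__bases__:
        getmembers(base, members)
    for name in vars(klass):
        if name not in members:
            members.append(name)
    return members

class A:
    def a(self):
        pass
    def b(self):
        pass

class B(A):
    def c(self):
        pass
    def d(self):
        pass

# Hide internal names to keep the output short (an addition for this sketch).
names = [name for name in getmembers(B) if not name.startswith("_")]
print(names)
```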
The vars function is similar, but it returns a dictionary containing the current value for each member.
If you use it without an argument, it returns a dictionary containing what's visible in the current local
namespace:
book = "library2"
pages = 250
scripts = 350
print "the %(book)s book contains more than %(scripts)s scripts" % vars()
Python is a dynamically typed language, which means that a given variable can be bound to values of
different types at different occasions. In the following example, the same function is called with an
integer, a floating point value, and a string:
def function(value):
    print value

function(1)
function(1.0)
function("one")
The type function allows you to check what type a variable has. This function returns a type
descriptor, which is a unique object for each type provided by the Python interpreter.
def dump(value):
    print type(value), value

dump(1)
dump(1.0)
dump("one")
<type 'int'> 1
<type 'float'> 1.0
<type 'string'> one
Each type has a single corresponding type object, which means that you can use the is operator (object
identity) to do type testing:
Example: Using the type function to distinguish between file names and file objects
# File:builtin-type-example-2.py
def load(file):
    if isinstance(file, type("")):
        file = open(file, "rb")
    return file.read()
4672 bytes
4672 bytes
The callable function checks if an object can be called (either directly or via apply). It returns true for
functions, methods, lambda expressions, classes, and class instances which define the __call__
method.
def dump(function):
    if callable(function):
        print function, "is callable"
    else:
        print function, "is *not* callable"

class A:
    def method(self, value):
        return value

class B(A):
    def __call__(self, value):
        return value

a = A()
b = B()

dump(0) # simple objects
dump("string")
dump(callable)
dump(dump) # function

dump(A) # classes
dump(B)
dump(B.method)

dump(a) # instances
dump(b)
dump(b.method)
0 is *not* callable
string is *not* callable
<built-in function callable> is callable
<function dump at 8ca320> is callable
A is callable
B is callable
<unbound method A.method> is callable
<A instance at 8caa10> is *not* callable
<B instance at 8cab00> is callable
<method A.method of B instance at 8cab00> is callable
Note that the class objects (A and B) are both callable; if you call them, they create new objects.
However, instances of class A are not callable, since that class doesn't have a __call__ method.
You'll find functions to check if an object is of any of the built-in number, sequence, or dictionary types
in the operator module. However, since it's easy to create a class that implements e.g. the basic
sequence methods, it's usually a bad idea to use explicit type testing on such objects.
Things get even more complicated when it comes to classes and instances. Python doesn't treat classes
as types per se. Instead, all classes belong to a special class type, and all class instances belong to a
special instance type.
This means that you cannot use type to test if an instance belongs to a given class; all instances have
the same type! To solve this, you can use the isinstance function, which checks if an object is an
instance of a given class (or of a subclass to it).
class A:
    pass

class B:
    pass

class C(A):
    pass

class D(A, B):
    pass

def dump(object):
    print object, "=>",
    if isinstance(object, A):
        print "A",
    if isinstance(object, B):
        print "B",
    if isinstance(object, C):
        print "C",
    if isinstance(object, D):
        print "D",
    print

a = A()
b = B()
c = C()
d = D()

dump(a)
dump(b)
dump(c)
dump(d)
dump(0)
dump("string")
The issubclass function is similar, but checks whether a class object is the same as a given class, or is
a subclass of it.
Note that while isinstance accepts any kind of object, the issubclass function raises a TypeError
exception if you use it on something that is not a class object.
class A:
    pass

class B:
    pass

class C(A):
    pass

class D(A, B):
    pass

def dump(object):
    print object, "=>",
    if issubclass(object, A):
        print "A",
    if issubclass(object, B):
        print "B",
    if issubclass(object, C):
        print "C",
    if issubclass(object, D):
        print "D",
    print

dump(A)
dump(B)
dump(C)
dump(D)
dump(0)
dump("string")
A => A
B => B
C => A C
D => A B D
0 =>
Traceback (innermost last):
File "builtin-issubclass-example-1.py", line 29, in ?
File "builtin-issubclass-example-1.py", line 15, in dump
TypeError: arguments must be classes
Python provides several ways to interact with the interpreter from within a program. For example, the
eval function evaluates a string as if it were a Python expression. You can pass it a literal, simple
expressions, or even use built-in functions:
def dump(expression):
    result = eval(expression)
    print expression, "=>", result, type(result)

dump("1")
dump("1.0")
dump("'string'")
dump("1.0 + 2.0")
dump("'*' * 10")
dump("len('world')")
A problem with eval is that if you cannot trust the source from which you got the string, you may get
into trouble. For example, someone might use the built-in __import__ function to load the os
module, and then remove files on your disk:
print eval("__import__('os').getcwd()")
print eval("__import__('os').remove('file')")
/home/fredrik/librarybook
Traceback (innermost last):
File "builtin-eval-example-2", line 2, in ?
File "<string>", line 0, in ?
os.error: (2, 'No such file or directory')
Note that you get an os.error exception, which means that Python actually tried to remove the file!
Luckily, there's a way around this problem. You can pass a second argument to eval, which should
contain a dictionary defining the namespace in which the expression is evaluated. Let's pass in an
empty namespace:
If you print the contents of the namespace variable, you'll find that it contains the full set of built-in
functions.
The solution to this little dilemma isn't far away: since Python doesn't add this item if it is already
there, you just have to add a dummy item called __builtins__ to the namespace before calling eval:
/home/fredrik/librarybook
Traceback (innermost last):
File "builtin-eval-example-3.py", line 2, in ?
File "<string>", line 0, in ?
NameError: __import__
Note that this doesn't protect you from CPU or memory resource attacks (for example, something like
eval("'*'*1000000*2*2*2*2*2*2*2*2*2") will most likely cause your program to run out of
memory after a while).
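Both namespace effects described above can be observed with a few lines of code. This is a sketch for a Python 3 interpreter (an assumption), where eval behaves the same way with respect to the __builtins__ item. The variable names are illustrative.

```python
# Evaluating in an empty namespace: eval adds __builtins__ by itself.
namespace = {}
eval("1 + 1", namespace)
print(sorted(namespace.keys()))

# Adding a dummy __builtins__ item first hides the built-in functions.
restricted = {"__builtins__": {}}
try:
    eval("__import__('os').getcwd()", restricted)
    blocked = False
except NameError:
    blocked = True
print("blocked:", blocked)
```

Treat this as a way to keep honest mistakes contained, not as a security boundary: in current Python versions, a determined attacker can still reach dangerous objects through attribute access on ordinary values.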
The eval function only works for simple expressions. To handle larger blocks of code, use the compile
and exec functions:
NAME = "script.py"
BODY = """
prnt 'owl-stretching time'
"""
try:
    compile(BODY, NAME, "exec")
except SyntaxError, v:
    print "syntax error:", v, "in", NAME
When successful, the compile function returns a code object, which you can execute with the exec
statement:
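BODY = """
print 'the ant, an introduction'
"""

# compile the script into a code object before executing it
code = compile(BODY, "<script>", "exec")

print code

exec code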
To generate code on the fly, you can use the class shown in the following example. Use the write
method to add statements, and indent and dedent to add structure, and this class takes care of the
rest.
import string

class CodeGeneratorBackend:
    "Simple code generator for Python"

    def begin(self, tab="\t"):
        # start with an empty program at indentation level 0
        self.code = []
        self.tab = tab
        self.level = 0

    def end(self):
        self.code.append("") # make sure there's a newline at the end
        return compile(string.join(self.code, "\n"), "<code>", "exec")

    def write(self, line):
        # add one statement at the current indentation level
        self.code.append(self.tab * self.level + line)

    def indent(self):
        self.level += 1
        # in Python 1.5.2 and earlier, use this instead:
        # self.level = self.level + 1

    def dedent(self):
        if self.level == 0:
            raise SyntaxError, "internal error in code generator"
        self.level -= 1
        # in Python 1.5.2 and earlier, use this instead:
        # self.level = self.level - 1

#
# try it out!

c = CodeGeneratorBackend()
c.begin()
c.write("for i in range(5):")
c.indent()
c.write("print 'code generation made easy!'")
c.dedent()
exec c.end()
Python also provides a function called execfile. It's simply a shortcut for loading code from a file,
compiling it, and executing it. The following example shows how to use and emulate this function.
execfile("hello.py")
EXECFILE("hello.py")
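The emulation boils down to three steps: read the file, compile the source, and execute the code object. The sketch below shows those steps for a Python 3 interpreter, where execfile is no longer a built-in. The helper name execfile_sketch and the temporary script are assumptions made for this illustration.

```python
import os
import tempfile

def execfile_sketch(filename, namespace=None):
    # Read, compile, and execute a script; return the namespace it ran in.
    if namespace is None:
        namespace = {}
    with open(filename) as fp:
        source = fp.read()
    code = compile(source, filename, "exec")
    exec(code, namespace)
    return namespace

# Hypothetical stand-in for a small script file.
script = os.path.join(tempfile.mkdtemp(), "hello.py")
with open(script, "w") as fp:
    fp.write("greeting = 'hello again, and welcome to the show'\n")

result = execfile_sketch(script)
print(result["greeting"])
```

Passing the real filename to compile means that tracebacks from the script point at the right file.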
The hello.py file used in this example has the following contents:
Since Python looks among the built-in functions after it has checked the local and module namespace,
there may be situations when you need to explicitly refer to the __builtin__ module. For example,
the following script overloads the open function with a version that opens an ordinary file and checks
that it starts with a "magic" string. To be able to use the original open function, it explicitly refers to it
using the module name.
fp = open("samples/sample.gif")
print len(fp.read()), "bytes"
fp = open("samples/sample.jpg")
print len(fp.read()), "bytes"
3565 bytes
Traceback (innermost last):
File "builtin-open-example-1.py", line 12, in ?
File "builtin-open-example-1.py", line 5, in open
IOError: not a GIF file
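A wrapper of the kind described above might be sketched as follows for a Python 3 interpreter, where the __builtin__ module is named builtins. The three-byte "GIF" check, the binary default mode, and the temporary sample files are assumptions made for this illustration.

```python
import builtins
import os
import tempfile

def open(filename, mode="rb"):
    # Shadow the built-in open; the original is still reachable
    # through the builtins module.
    fp = builtins.open(filename, mode)
    if fp.read(3) != b"GIF":
        fp.close()
        raise IOError("not a GIF file")
    fp.seek(0)  # rewind so callers see the whole file
    return fp

# Hypothetical sample files, created only for this demonstration.
folder = tempfile.mkdtemp()
gif_name = os.path.join(folder, "sample.gif")
jpg_name = os.path.join(folder, "sample.jpg")
with builtins.open(gif_name, "wb") as out:
    out.write(b"GIF89a" + b"\0" * 10)
with builtins.open(jpg_name, "wb") as out:
    out.write(b"\xff\xd8\xff\xe0" + b"\0" * 10)

fp = open(gif_name)
size = len(fp.read())
fp.close()
print(size, "bytes")
```

Calling open(jpg_name) with this wrapper in place raises IOError, since the file does not start with the magic string.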
You can create your own exception classes. Just inherit from the built-in Exception class (or a proper
standard exception), and override the constructor and/or __str__ method as necessary.
class HTTPError(Exception):
    # indicates an HTTP protocol error
    def __init__(self, url, errcode, errmsg):
        self.url = url
        self.errcode = errcode
        self.errmsg = errmsg
    def __str__(self):
        return (
            "<HTTPError for %s: %s %s>" %
            (self.url, self.errcode, self.errmsg)
            )

try:
    raise HTTPError("http://www.python.org/foo", 200, "Not Found")
except HTTPError, error:
    print "url", "=>", error.url
    print "errcode", "=>", error.errcode
    print "errmsg", "=>", error.errmsg
    raise # reraise exception
The os module
This module provides a unified interface to a number of operating system functions.
Most of the functions in this module are implemented by platform specific modules, such as posix and
nt. The os module automatically loads the right implementation module when it is first imported.
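import os
import string

def replace(file, search_for, replace_with):
    # replace strings in a text file

    # assumed names for the backup and temporary files
    back = os.path.splitext(file)[0] + ".bak"
    temp = os.path.splitext(file)[0] + ".tmp"

    try:
        # remove old temp file, if any
        os.remove(temp)
    except os.error:
        pass

    fi = open(file)
    fo = open(temp, "w")

    for s in fi.readlines():
        fo.write(string.replace(s, search_for, replace_with))

    fi.close()
    fo.close()

    try:
        # remove old backup file, if any
        os.remove(back)
    except os.error:
        pass

    # rename original to backup...
    os.rename(file, back)

    # ...and temporary to original
    os.rename(temp, file)
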
#
# try it out!
file = "samples/sample.txt"
import os
sample.au
sample.jpg
sample.wav
...
The getcwd and chdir functions are used to get and set the current directory:
import os

# where are we?
print "1", os.getcwd()

# go down
os.chdir("samples")
print "2", os.getcwd()

# go back up
os.chdir(os.pardir)
print "3", os.getcwd()

1 /ematter/librarybook
2 /ematter/librarybook/samples
3 /ematter/librarybook
The makedirs and removedirs functions are used to create and remove directory hierarchies.
Example: Using the os module to create and remove multiple directory levels
# File:os-example-6.py
import os
os.makedirs("test/multiple/levels")
fp = open("test/multiple/levels/file", "w")
fp.write("inspector praline")
fp.close()
Note that removedirs removes all empty directories along the given path, starting with the last
directory in the given path name. In contrast, the mkdir and rmdir functions can only handle a single
directory level.
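To see how removedirs works its way up the path, here is a sketch for a Python 3 interpreter. It runs inside a temporary directory, and a marker file (a choice made for this sketch) keeps that directory non-empty so the removal stops there.

```python
import os
import tempfile

base = tempfile.mkdtemp()

# A marker file keeps the base directory non-empty.
with open(os.path.join(base, "keep"), "w") as fp:
    fp.write("marker")

path = os.path.join(base, "test", "multiple", "levels")
os.makedirs(path)

filename = os.path.join(path, "file")
with open(filename, "w") as fp:
    fp.write("inspector praline")

# Remove the file, then every empty directory above it.
os.remove(filename)
os.removedirs(path)

print(os.listdir(base))
```

After the call, "levels", "multiple", and "test" are all gone, while the base directory survives because it still contains the marker file.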
import os
os.mkdir("test")
os.rmdir("test")
To remove non-empty directories, you can use the rmtree function in the shutil module.
The stat function fetches information about an existing file. It returns a 10-tuple which contains,
among other things, the size, inode change timestamp, modification timestamp, and access privileges.
import os
import time

file = "samples/sample.jpg"

def dump(st):
    mode, ino, dev, nlink, uid, gid, size, atime, mtime, ctime = st
    print "- size:", size, "bytes"
    print "- owner:", uid, gid
    print "- created:", time.ctime(ctime)
    print "- last accessed:", time.ctime(atime)
    print "- last modified:", time.ctime(mtime)
    print "- mode:", oct(mode)
    print "- inode/dev:", ino, dev

#
# get stats for a filename

st = os.stat(file)

print "stat", file
dump(st)
print

#
# get stats for an open file

fp = open(file)

st = os.fstat(fp.fileno())

print "fstat", file
dump(st)
stat samples/sample.jpg
- size: 4762 bytes
- owner: 0 0
- created: Tue Sep 07 22:45:58 1999
- last accessed: Sun Sep 19 00:00:00 1999
- last modified: Sun May 19 01:42:16 1996
- mode: 0100666
- inode/dev: 0 2
fstat samples/sample.jpg
- size: 4762 bytes
- owner: 0 0
- created: Tue Sep 07 22:45:58 1999
- last accessed: Sun Sep 19 00:00:00 1999
- last modified: Sun May 19 01:42:16 1996
- mode: 0100666
- inode/dev: 0 0
Some fields don't make sense on non-Unix platforms; for example, the (inode, dev) tuple provides a
unique identity for each file on Unix, but can contain arbitrary data on other platforms.
The stat module contains a number of useful constants and helper functions for dealing with the
members of the stat tuple. Some of these are shown in the examples below.
You can modify the mode and time fields using the chmod and utime functions:
import os
import stat, time
infile = "samples/sample.jpg"
outfile = "out.jpg"
# copy contents
fi = open(infile, "rb")
fo = open(outfile, "wb")

while 1:
    s = fi.read(10000)
    if not s:
        break
    fo.write(s)

fi.close()
fo.close()
original =>
mode 0666
atime Thu Oct 14 15:15:50 1999
mtime Mon Nov 13 15:42:36 1995
copy =>
mode 0666
atime Thu Oct 14 15:15:50 1999
mtime Mon Nov 13 15:42:36 1995
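Copying the mode and time fields from one file to another takes one stat call followed by chmod and utime. The sketch below assumes a Python 3 interpreter and uses temporary files and arbitrary timestamps chosen for illustration; the ST_* constants and S_IMODE come from the stat module.

```python
import os
import stat
import tempfile

folder = tempfile.mkdtemp()
infile = os.path.join(folder, "original.txt")
outfile = os.path.join(folder, "copy.txt")
for name in (infile, outfile):
    with open(name, "w") as fp:
        fp.write("sample")

# Give the original an old access and modification time.
os.utime(infile, (1000000000, 900000000))

# Copy mode and timestamps from the original to the copy.
st = os.stat(infile)
os.chmod(outfile, stat.S_IMODE(st[stat.ST_MODE]))
os.utime(outfile, (st[stat.ST_ATIME], st[stat.ST_MTIME]))

copied = os.stat(outfile)
print(oct(stat.S_IMODE(copied[stat.ST_MODE])), copied[stat.ST_MTIME])
```

S_IMODE strips the file type bits from the mode field, leaving only the permission bits that chmod expects.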
The system function runs a new command under the current process, and waits for it to finish.
import os
if os.name == "nt":
    command = "dir"
else:
    command = "ls -l"

os.system(command)
The command is run via the operating system's standard shell, and returns the shell's exit status.
Under Windows 95/98, the shell is usually command.com whose exit status is always 0.
Warning:
The exec function starts a new process, replacing the current one ("go to process", in other words). In
the following example, note that the "goodbye" message is never printed:
import os
import sys
program = "python"
arguments = ["hello.py"]
Python provides a whole bunch of exec functions, with slightly varying behavior. The above example
uses execvp, which searches for the program along the standard path, passes the contents of the
second argument tuple as individual arguments to that program, and runs it with the current set of
environment variables. See the Python Library Reference for more information on the other seven
ways to call this function.
Under Unix, you can call other programs from the current one by combining exec with two other
functions, fork and wait. The former makes a copy of the current process, the latter waits for a child
process to finish.
import os
import sys
run("python", "hello.py")
print "goodbye"
The fork function returns zero in the new process (the return from fork is the first thing that happens
in that process!), and a non-zero process identifier in the original process. In other words, "not pid" is
true only if we're in the new process.
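The fork, exec, and wait combination can be sketched as a small run helper. This version assumes a Unix system and a Python 3 interpreter. It uses waitpid and WEXITSTATUS to report the child's exit code, and the os._exit(127) fallback for a failed exec is a choice made for this sketch.

```python
import os
import sys

def run(program, *args):
    # Unix only: fork a child, replace it with the program, and wait for it.
    pid = os.fork()
    if not pid:
        # Child process: execvp only comes back (by raising) on failure.
        try:
            os.execvp(program, (program,) + args)
        finally:
            os._exit(127)
    # Parent process: wait for this child and return its exit code.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)

exit_code = run(sys.executable, "-c", "raise SystemExit(3)")
print("exit code:", exit_code)
```

The fallback matters: without it, a child whose exec failed would continue running the rest of the parent's script.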
fork and wait are not available on Windows, but you can use the spawn function instead.
Unfortunately, there's no standard version of spawn that searches for an executable along the path, so
you have to do that yourself:
import os
import string
run("python", "hello.py")
print "goodbye"
You can also use spawn to run other programs in the background. The following example adds an
optional mode argument to the run function; when set to os.P_NOWAIT, the script doesn't wait for
the other program to finish.
The default flag value os.P_WAIT tells spawn to wait until the new process is finished. Other flags
include os.P_OVERLAY which makes spawn behave like exec, and os.P_DETACH which runs
the new process in the background, detached from both console and keyboard.
Example: Using the os module to run another program in the background (Windows)
# File:os-spawn-example-2.py
import os
import string
goodbye
hello again, and welcome to the show
The following example provides a spawn method that works on either platform:
import os
import string
#
# try it out!
spawn("python", "hello.py")
print "goodbye"
The above example first attempts to call a function named spawnvp. If that doesn't exist (it doesn't, in
2.0 and earlier), the function looks for a function named spawnv and searches the path all by itself. As
a last resort, it falls back on exec and fork.
On Unix, fork can also be used to turn the current process into a background process (a "daemon").
Basically, all you need to do is to fork off a copy of the current process, and terminate the original
process:
import os
import time
pid = os.fork()
if pid:
    os._exit(0) # kill original
However, it takes a bit more work to create a real daemon. First, call setpgrp to make the new process
a "process group leader". Otherwise, signals sent to a (by that time) unrelated process group might
cause problems in your daemon:
os.setpgrp()
It's also a good idea to remove the user mode mask, to make sure files created by the daemon actually
get the mode flags specified by the program:
os.umask(0)
Then, you should redirect the stdout/stderr files, instead of just closing them. If you don't do this, you
may get unexpected exceptions the day some of your code tries to write something to the console via
stdout or stderr.
class NullDevice:
    def write(self, s):
        pass

sys.stdin.close()
sys.stdout = NullDevice()
sys.stderr = NullDevice()
In other words, while Python's print and C's printf/fprintf won't crash your program if the devices
have been disconnected, sys.stdout.write() happily throws an IOError exception when the
application runs as a daemon. But your program works just fine when running in the foreground...
By the way, the _exit function used in the examples above terminates the current process. In contrast
to sys.exit, this works also if the caller happens to catch the SystemExit exception:
import os
import sys
try:
    sys.exit(1)
except SystemExit, value:
    print "caught exit(%s)" % value

try:
    os._exit(2)
except SystemExit, value:
    print "caught exit(%s)" % value

print "bye!"
caught exit(1)
import os
filename = "my/little/pony"
using nt ...
split => ('my/little', 'pony')
splitext => ('my/little/pony', '')
dirname => my/little
basename => pony
join => my/little\pony
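The calls behind output like the listing above can be sketched as follows, assuming a Python 3 interpreter. Only the results that do not depend on the platform's path separator are stored in variables here.

```python
import os

filename = "my/little/pony"

# Split the name into directory and file parts.
head, tail = os.path.split(filename)

# Split off the extension (empty here, since there is none).
root, extension = os.path.splitext(filename)

print("split =>", (head, tail))
print("splitext =>", (root, extension))
print("dirname =>", os.path.dirname(filename))
print("basename =>", os.path.basename(filename))
```

The join function is left out because its result uses the separator of the platform the script runs on.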
This module also contains a number of functions that allow you to quickly figure out what a filename
represents:
import os
FILES = (
    os.curdir,
    "/",
    "file",
    "/file",
    "samples",
    "samples/sample.jpg",
    "directory/file",
    "../directory/file",
    "/directory/file"
    )
The expanduser function treats a user name shortcut in the same way as most modern Unix shells (it
doesn't work well on Windows).
Example: Using the os.path module to insert the user name into a filename
# File:os-path-expanduser-example-1.py
import os
print os.path.expanduser("~/.pythonrc")
/home/effbot/.pythonrc
import os
os.environ["USER"] = "user"
print os.path.expandvars("/home/$USER/config")
print os.path.expandvars("$USER/folders")
/home/user/config
user/folders
The walk function helps you find all files in a directory tree. It takes a directory name, a callback
function, and a data object that is passed on to the callback.
import os
The walk function has a somewhat obscure user interface (maybe it's just me, but I can never
remember the order of the arguments). The index function in the next example returns a list of
filenames instead, which lets you use a straightforward for-in loop to process the files:
import os
def index(directory):
    # like os.listdir, but traverses directory trees
    stack = [directory]
    files = []
    while stack:
        directory = stack.pop()
        for file in os.listdir(directory):
            fullname = os.path.join(directory, file)
            files.append(fullname)
            if os.path.isdir(fullname) and not os.path.islink(fullname):
                stack.append(fullname)
    return files
.\aifc-example-1.py
.\anydbm-example-1.py
.\array-example-1.py
...
If you don't want to list all files (for performance or memory reasons), the following example uses a
different approach. Here, the DirectoryWalker class behaves like a sequence object, returning one
file at a time:
import os
class DirectoryWalker:
# a forward iterator that traverses a directory tree
.\aifc-example-1.py
.\anydbm-example-1.py
.\array-example-1.py
...
Note that this class doesn't check the index passed to the __getitem__ method. This means that it
won't do the right thing if you access the sequence members out of order.
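A sequence-style walker along these lines can be sketched as follows for a Python 3 interpreter, which still supports for-in loops over objects that only define __getitem__. The internal attribute names and the temporary directory tree are assumptions made for this sketch. As noted above, the index argument is ignored.

```python
import os
import tempfile

class DirectoryWalker:
    # a forward iterator that traverses a directory tree

    def __init__(self, directory):
        self.stack = [directory]
        self.files = []
        self.index = 0

    def __getitem__(self, index):
        while True:
            try:
                file = self.files[self.index]
                self.index = self.index + 1
            except IndexError:
                # Current listing is exhausted; an empty stack raises
                # IndexError here, which ends the for-in loop.
                self.directory = self.stack.pop()
                self.files = os.listdir(self.directory)
                self.index = 0
            else:
                fullname = os.path.join(self.directory, file)
                if os.path.isdir(fullname) and not os.path.islink(fullname):
                    self.stack.append(fullname)
                return fullname

# Hypothetical directory tree for the demonstration.
root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, "sub"))
for name in ("a.txt", os.path.join("sub", "b.txt")):
    with open(os.path.join(root, name), "w") as fp:
        fp.write("x")

found = sorted(os.path.relpath(f, root) for f in DirectoryWalker(root))
print(found)
```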
Finally, if you're interested in the file sizes or timestamps, here's a version of the class that returns both
the filename and the tuple returned from os.stat. This version saves one or two stat calls for each file
(both os.path.isdir and os.path.islink use stat), and runs quite a bit faster on some platforms.
Example: Using a directory walker to traverse a file system, returning both the filename
and additional file information
# File:os-path-walk-example-4.py
class DirectoryStatWalker:
# a forward iterator that traverses a directory tree, and
# returns the filename and additional file information
.\aifc-example-1.py 336
.\anydbm-example-1.py 244
.\array-example-1.py 526
import stat
import os, time
st = os.stat("samples/sample.txt")
import string
In Python 1.5.2 and earlier, this module uses functions from the strop implementation module where
possible.
In Python 1.6 and later, most string operations are made available as string methods as well, and many
functions in the string module are simply wrapper functions that call the corresponding string
method.
Example: Using string methods instead of string module functions (Python 1.6 and later)
# File:string-example-2.py
In addition to the string manipulation stuff, the string module also contains a number of functions
which convert strings to other types:
import string
print int("4711"),
print string.atoi("4711"),
print string.atoi("11147", 8), # octal
print string.atoi("1267", 16), # hexadecimal
print string.atoi("3mv", 36) # whatever...
print float("4711"),
print string.atof("1"),
print string.atof("1.23e5")
In most cases (especially if you're using 1.6 or later), you can use the int and float functions instead of
their string module counterparts.
The atoi function takes an optional second argument, which specifies the number base. If the base is
zero, the function looks at the first few characters before attempting to interpret the value: if "0x", the
base is set to 16 (hexadecimal), and if "0", the base is set to 8 (octal). The default is base 10 (decimal),
just as if you hadn't provided an extra argument.
In 1.6 and later, the int function also accepts a second argument, just like atoi. But unlike the string
versions, int and float also accept Unicode strings.
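The base argument of int can be tried out directly. The sketch below assumes a Python 3 interpreter, where the string module conversion functions are gone and where base 0 expects the "0o" prefix for octal values (a bare leading "0" is rejected).

```python
values = [
    int("4711"),          # decimal
    int("11147", 8),      # octal
    int("1267", 16),      # hexadecimal
    int("3mv", 36),       # base 36
    int("0x1267", 0),     # base taken from the "0x" prefix
    int("0o11147", 0),    # base taken from the "0o" prefix
]
ratio = float("1.23e5")

print(values)
print(ratio)
```

All six conversions yield the same number, 4711.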
The re module
"Some people, when confronted with a problem, think "I know, I'll use
regular expressions." Now they have two problems"
This module provides a set of powerful regular expression facilities. A regular expression is a string
pattern written in a compact (and quite cryptic) syntax, and this module allows you to quickly check
whether a given string matches a given pattern (using the match function), or contains such a pattern
(using the search function).
The match function attempts to match a pattern against the beginning of the given string. If the
pattern matches anything at all (including an empty string, if the pattern allows that!), match returns
a match object. The group method can be used to find out what matched.
import re
# a single character
m = re.match(".", text)
if m: print repr("."), "=>", repr(m.group(0))
# a string of digits
m = re.match("\d+", text)
if m: print repr("\d+"), "=>", repr(m.group(0))
You can use parentheses to mark regions in the pattern. If the pattern matched, the group method can
be used to extract the contents of these regions. group(1) returns the contents of the first group,
group(2) the contents of the second, etc. If you pass several group numbers to the group function, it
returns a tuple.
import re
text ="10/15/99"
m = re.match("(\d{2})/(\d{2})/(\d{2,4})", text)
if m:
    print m.group(1, 2, 3)
The search function searches for the pattern inside the string. It basically tries the pattern at every
possible character position, starting from the left, and returns a match object as soon as it has found a
match. If the pattern doesn't match anywhere, it returns None.
import re
m = re.search("(\d{1,2})/(\d{1,2})/(\d{2,4})", text)
date = m.group(0)
print date
10 25 95
10 25 95
10/25/95
The sub function can be used to replace patterns with another string.
import re
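A short sketch of what replacing patterns with sub can look like. The sample strings are made up for illustration, and the snippet assumes a modern Python 3 interpreter.

```python
import re

# Collapse any run of whitespace into a single space.
collapsed = re.sub(r"\s+", " ", "spam   and\teggs")

# The replacement string may refer back to groups in the pattern.
swapped = re.sub(r"(\d+)/(\d+)/(\d+)", r"\3-\1-\2", "10/25/95")

print(collapsed)
print(swapped)
```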
You can also use sub to replace patterns via a callback function. The following example also shows how
to pre-compile patterns.
import re
import string
def octal(match):
    # replace octal code with corresponding ASCII character
    return chr(string.atoi(match.group(1), 8))
octal_pattern = re.compile(r"\\(\d\d\d)")
print text
print octal_pattern.sub(octal, text)
If you don't compile, the re module caches compiled versions for you, so you usually don't have to
compile regular expressions in small scripts. In Python 1.5.2, the cache holds 20 patterns. In 2.0, the
cache size has been increased to 100 patterns.
Finally, here's an example that shows you how to match a string against a list of patterns. The list of
patterns is combined into a single pattern, and pre-compiled to save time.
import re
import string
def combined_pattern(patterns):
    # wrap each pattern in a group, and join them with "|"
    p = re.compile(
        string.join(map(lambda x: "("+x+")", patterns), "|")
        )
    def fixup(v, m=p.match, r=range(0,len(patterns))):
        try:
            regs = m(v).regs
        except AttributeError:
            return None # no match, so m.regs will fail
        else:
            # return the index of the first group that matched
            for i in r:
                if regs[i+1] != (-1, -1):
                    return i
    return fixup
#
# try it out!
patterns = [
r"\d+",
r"abc\d{2,4}",
r"p\w+"
]
p = combined_pattern(patterns)
print p("129391")
print p("abc800")
print p("abc1600")
print p("python")
print p("perl")
print p("tcl")
0
1
1
2
2
None
import math
e => 2.71828182846
pi => 3.14159265359
hypot => 5.0
import cmath
pi => 3.14159265359
sqrt(-1) => 1j
import operator
sequence = 1, 2, 4
add => 7
sub => -5
mul => 8
concat => spamegg
repeat => spamspamspamspamspam
getitem => 4
indexOf => 1
sequenceIncludes => 0
The module also contains a few functions which can be used to check object types:
import operator
import UserList
def dump(data):
    print type(data), "=>",
    if operator.isCallable(data):
        print "CALLABLE",
    if operator.isMappingType(data):
        print "MAPPING",
    if operator.isNumberType(data):
        print "NUMBER",
    if operator.isSequenceType(data):
        print "SEQUENCE",
    print
dump(0)
dump("string")
dump("string"[0])
dump([1, 2, 3])
dump((1, 2, 3))
dump({"a": 1})
dump(len) # function
dump(UserList) # module
dump(UserList.UserList) # class
dump(UserList.UserList()) # instance
Note that the operator module doesn't handle object instances in a sane fashion. In other words, be
careful when you use the isNumberType, isMappingType, and isSequenceType functions. It's
easy to make your code less flexible than it has to be.
Also note that a string sequence member (a character) is also a sequence. If you're writing a recursive
function that uses isSequenceType to traverse an object tree, you'd better not pass it an ordinary string
(or anything containing one).
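One way around the string problem is to test for strings before testing for sequences. The sketch below assumes a modern Python 3 interpreter, where operator.isSequenceType is no longer available and collections.abc.Sequence is used instead; the flatten helper is a hypothetical name introduced for illustration.

```python
from collections.abc import Sequence

def flatten(obj):
    # Treat strings as leaves. Otherwise each character, itself a
    # one-character string, would be traversed again without end.
    if isinstance(obj, str) or not isinstance(obj, Sequence):
        return [obj]
    result = []
    for item in obj:
        result.extend(flatten(item))
    return result

flat = flatten([1, ["spam", (2, 3)], "egg"])
print(flat)
```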
import copy
a = [[1],[2],[3]]
b = copy.copy(a)
# modify original
a[0][0] = 0
a[1] = None
before =>
[[1], [2], [3]]
[[1], [2], [3]]
after =>
[[0], None, [3]]
[[0], [2], [3]]
Note that you can make shallow copies of lists using the [:] syntax (a full slice), and make copies of
dictionaries using the copy method.
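A brief sketch of those two shortcuts, assuming a modern Python 3 interpreter. As with copy.copy, only the container itself is copied; the members are shared between the original and the copy.

```python
a = [[1], [2], [3]]
b = a[:]            # shallow copy of the list

a[0][0] = 0         # modifies a member shared by both lists
a[1] = None         # rebinds a slot in the original only

d = {"key": [1]}
e = d.copy()        # shallow copy of the dictionary
d["key"].append(2)  # the shared member changes in both
d["other"] = 0      # the new key exists in the original only

print(a, b)
print(d, e)
```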
In contrast, deepcopy(object) -> object creates a "deep" copy of the given object. If the object is a
container, all members are copied as well, recursively.
import copy
a = [[1],[2],[3]]
b = copy.deepcopy(a)
# modify original
a[0][0] = 0
a[1] = None
before =>
[[1], [2], [3]]
[[1], [2], [3]]
after =>
[[0], None, [3]]
[[1], [2], [3]]
import sys
if len(sys.argv) > 1:
    print "there are", len(sys.argv)-1, "arguments:"
    for arg in sys.argv[1:]:
        print arg
else:
    print "there are no arguments!"
If you read the script from standard input (like "python < sys-argv-example-1.py"), the script
name is set to an empty string. If you pass in the program as a string (using the -c option), the script
name is set to "-c".
The path list contains a list of directory names where Python looks for extension modules (Python
source modules, compiled modules, or binary extensions). When you start Python, this list is initialized
from a mixture of built-in rules, the contents of the PYTHONPATH environment variable, and the
registry contents (on Windows). But since it's an ordinary list, you can also manipulate it from within
the program:
Example: Using the sys module to manipulate the module search path
# File:sys-path-example-1.py
import sys
The builtin_module_names list contains the names of all modules built into the Python
interpreter.
import sys
def dump(module):
    print module, "=>",
    if module in sys.builtin_module_names:
        print "<BUILTIN>"
    else:
        module = __import__(module)
        print module.__file__
dump("os")
dump("sys")
dump("string")
dump("strop")
dump("zlib")
os => C:\python\lib\os.pyc
sys => <BUILTIN>
string => C:\python\lib\string.pyc
strop => <BUILTIN>
zlib => C:\python\zlib.pyd
The modules dictionary contains all loaded modules. The import statement checks this dictionary
before it actually loads something from disk.
As you can see from the following example, Python loads quite a bunch of modules before it hands
control over to your script.
import sys
print sys.modules.keys()
The getrefcount function returns the reference count for a given object — that is, the number of
places where this variable is used. Python keeps track of this value, and when it drops to zero, the
object is destroyed.
import sys
variable = 1234
print sys.getrefcount(0)
print sys.getrefcount(variable)
print sys.getrefcount(None)
50
3
192
Note that this value is always larger than the actual count, since the function itself hangs on to the
object while determining the value.
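Since the absolute numbers depend on the interpreter, it's often more useful to look at how the count changes. The following is a small sketch, assuming CPython 3; the variable names are made up for illustration.

```python
import sys

obj = ["spam"]
before = sys.getrefcount(obj)

alias = obj                      # add one more reference
after = sys.getrefcount(obj)

del alias                        # and remove it again
restored = sys.getrefcount(obj)

print(after - before, restored - before)
```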
import sys
#
# emulate "import os.path" (sort of)...
if sys.platform == "win32":
    import ntpath
    pathmodule = ntpath
elif sys.platform == "mac":
    import macpath
    pathmodule = macpath
else:
    # assume it's a posix platform
    import posixpath
    pathmodule = posixpath
print pathmodule
Typical platform names are win32 for Windows 9X/NT and mac for Macintosh. For Unix systems,
the platform name is usually derived from the output of the "uname -r" command, such as irix6,
linux2, or sunos5 (Solaris).
The setprofile function allows you to install a profiling function. This is called every time a function
or method is called, at every return (explicit or implied), and for each exception:
import sys
def test(n):
    j=0
    for i in range(n):
        j=j+i
    return n
# disable profiler
sys.setprofile(None)
The profile module provides a complete profiler framework, based on this function.
The settrace function is similar, but the trace function is called for each new line:
import sys
def test(n):
    j=0
    for i in range(n):
        j=j+i
    return n
# disable tracing
sys.settrace(None)
The pdb module provides a complete debugger framework, based on the tracing facilities offered by
this function.
The stdin, stdout and stderr variables contain stream objects corresponding to the standard I/O
streams. You can access them directly if you need better control over the output than print can give
you. You can also replace them, if you want to redirect output and input to some other device, or
process them in some non-standard way:
import sys
import string
class Redirect:
print "MÅÅÅÅL!"
All it takes to redirect output is an object that implements the write method.
(Unless it's a C type instance, that is: Python uses an integer attribute called softspace to control
spacing, and adds it to the object if it isn't there. You don't have to bother if you're using Python
objects, but if you need to redirect to a C type, you should make sure that type supports the softspace
attribute.)
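The following sketch shows how small such an object can be. It assumes a modern Python 3 interpreter, and the Capture class is a hypothetical name used for illustration; the flush method is included only because other code may call it on sys.stdout.

```python
import sys

class Capture:
    # A minimal file-like object: anything with a write method will do.
    def __init__(self):
        self.chunks = []
    def write(self, s):
        self.chunks.append(s)
    def flush(self):
        pass

old_stdout = sys.stdout
capture = Capture()
sys.stdout = capture
try:
    print("hello")
finally:
    # always restore the original stream
    sys.stdout = old_stdout

captured = "".join(capture.chunks)
print(repr(captured))
```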
When you reach the end of the main program, the interpreter is automatically terminated. If you need
to exit in midflight, you can call the sys.exit function instead. This function takes an optional integer
value, which is returned to the calling program.
import sys
print "hello"
sys.exit(1)
print "there"
hello
It may not be obvious, but sys.exit doesn't exit at once. Instead, it raises a SystemExit exception.
This means that you can trap calls to sys.exit in your main program:
import sys
print "hello"
try:
    sys.exit(1)
except SystemExit:
    pass
print "there"
hello
there
If you want to clean up after yourself, you can install an "exit handler", which is a function that is
automatically called on the way out.
import sys
def exitfunc():
    print "world"
sys.exitfunc = exitfunc
print "hello"
sys.exit(1)
print "there" # never printed
hello
world
In Python 2.0, you can use the atexit module to register more than one exit handler.
import atexit
def exit(*args):
    print "exit", args
import time
now = time.time()
or in other words:
- local time: (1999, 9, 19, 18, 25, 59, 6, 262, 1)
- utc: (1999, 9, 19, 16, 25, 59, 6, 262, 0)
The tuple returned by localtime and gmtime contains (year, month, day, hour, minute, second, day
of week, day of year, daylight savings flag), where the year number is four digits, the day of week begins
with 0 for Monday, and January 1st is day number 1.
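The field layout is easy to check against a known date. This sketch assumes a modern Python 3 interpreter, where the functions return a struct_time object that can still be used as a plain tuple; it looks at the start of the Unix epoch, which was a Thursday.

```python
import time

epoch = time.gmtime(0)
fields = tuple(epoch)

# (year, month, day, hour, minute, second, weekday, yearday, dst flag)
print(fields)

year, month, day, hour, minute, second, weekday, yearday, daylight = fields
print(("MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN")[weekday], yearday)
```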
You can of course use standard string formatting operators to convert a time tuple to a string, but the
time module also provides a number of standard conversion functions:
Example: Using the time module to format dates and times
# File:time-example-2.py
import time
now = time.localtime(time.time())
print time.asctime(now)
print time.strftime("%y/%m/%d %H:%M", now)
print time.strftime("%a %b %d", now)
print time.strftime("%c", now)
print time.strftime("%I %p", now)
print time.strftime("%Y-%m-%d %H:%M:%S %Z", now)
# do it by hand...
year, month, day, hour, minute, second, weekday, yearday, daylight = now
print "%04d-%02d-%02d" % (year, month, day)
print "%02d:%02d:%02d" % (hour, minute, second)
print ("MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN")[weekday], yearday
On some platforms, the time module contains a strptime function, which is pretty much the opposite
of strftime. Given a string and a pattern, it returns the corresponding time tuple:
import time
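A minimal sketch of the round trip between strptime and strftime, assuming a modern Python 3 interpreter (where strptime is available on all platforms). The date string is the one used elsewhere in this section.

```python
import time

FORMAT = "%Y-%m-%d %H:%M:%S"
stamp = "2000-12-20 01:02:03"

# string -> time tuple
parsed = time.strptime(stamp, FORMAT)
print(tuple(parsed)[:6])

# time tuple -> string
formatted = time.strftime(FORMAT, parsed)
print(formatted)
```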
The time.strptime function is currently only made available by Python if it's provided by the
platform's C libraries. For platforms that don't have a standard implementation (this includes
Windows), here's a partial replacement:
import re
import string
SPEC = {
# map formatting code to a regular expression fragment
"%a": "(?P<weekday>[a-z]+)",
"%A": "(?P<weekday>[a-z]+)",
"%b": "(?P<month>[a-z]+)",
"%B": "(?P<month>[a-z]+)",
"%C": "(?P<century>\d\d?)",
"%d": "(?P<day>\d\d?)",
"%D": "(?P<month>\d\d?)/(?P<day>\d\d?)/(?P<year>\d\d)",
"%e": "(?P<day>\d\d?)",
"%h": "(?P<month>[a-z]+)",
"%H": "(?P<hour>\d\d?)",
"%I": "(?P<hour12>\d\d?)",
"%j": "(?P<yearday>\d\d?\d?)",
"%m": "(?P<month>\d\d?)",
"%M": "(?P<minute>\d\d?)",
"%p": "(?P<ampm12>am|pm)",
"%R": "(?P<hour>\d\d?):(?P<minute>\d\d?)",
"%S": "(?P<second>\d\d?)",
"%T": "(?P<hour>\d\d?):(?P<minute>\d\d?):(?P<second>\d\d?)",
"%U": "(?P<week>\d\d)",
"%w": "(?P<weekday>\d)",
"%W": "(?P<weekday>\d\d)",
"%y": "(?P<year>\d\d)",
"%Y": "(?P<year>\d\d\d\d)",
"%%": "%"
}
class TimeParser:
if __name__ == "__main__":
    # try it out
    import time
    print strptime("2000-12-20 01:02:03", "%Y-%m-%d %H:%M:%S")
    print strptime(time.ctime(time.time()))
Converting a time tuple back to a time value is pretty easy, at least as long as we're talking about local
time. Just pass the time tuple to the mktime function:
Example: Using the time module to convert a local time tuple to a time integer
# File:time-example-3.py
import time
t0 = time.time()
tm = time.localtime(t0)
print tm
print t0
print time.mktime(tm)
Unfortunately, there's no function in the 1.5.2 standard library that converts UTC time tuples back to
time values (neither in Python nor in the underlying C libraries). The following example provides a
Python implementation of such a function, called timegm:
import time
t0 = time.time()
tm = time.gmtime(t0)
print tm
print t0
print timegm(tm)
In 1.6 and later, a similar function is available in the calendar module, as calendar.timegm.
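A quick sketch of calendar.timegm in use, assuming a modern Python 3 interpreter. It converts a time value to a UTC tuple and back again; the value is truncated to whole seconds, since the time tuple has no room for fractions.

```python
import calendar
import time

t0 = int(time.time())
tm = time.gmtime(t0)             # time value -> UTC time tuple
roundtrip = calendar.timegm(tm)  # UTC time tuple -> time value

print(t0, roundtrip)

# one day after the epoch is 86400 seconds
one_day = calendar.timegm((1970, 1, 2, 0, 0, 0, 0, 0, 0))
print(one_day)
```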
Timing things
The time module can be used to time the execution of a Python program. You can measure either "wall
time" (real world time), or "process time" (the amount of CPU time the process has consumed so far).
import time
def procedure():
    time.sleep(2.5)
Not all systems can measure the true process time. On such systems (including Windows), clock
usually measures the wall time since the program was started.
Also see the timing module, which measures the wall time between two events.
The process time has limited precision. On many systems, it wraps around after just over 30 minutes.
import types
def check(object):
    print object,
    if type(object) is types.IntType:
        print "INTEGER",
    if type(object) is types.FloatType:
        print "FLOAT",
    if type(object) is types.StringType:
        print "STRING",
    if type(object) is types.ClassType:
        print "CLASS",
    if type(object) is types.InstanceType:
        print "INSTANCE",
    print
check(0)
check(0.0)
check("0")
class A:
    pass
class B:
    pass
check(A)
check(B)
a = A()
b = B()
check(a)
check(b)
0 INTEGER
0.0 FLOAT
0 STRING
A CLASS
B CLASS
<A instance at 796960> INSTANCE
<B instance at 796990> INSTANCE
Note that all classes have the same type, and so do all instances. To test what class hierarchy a class or
an instance belongs to, use the built-in issubclass and isinstance functions.
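The following sketch shows the two built-in functions at work. It assumes a modern Python 3 interpreter, and the three classes are placeholders introduced for illustration.

```python
class A:
    pass

class B(A):        # B inherits from A
    pass

class C:
    pass

b = B()

print(isinstance(b, B))   # an instance of its own class
print(isinstance(b, A))   # and of every base class
print(isinstance(b, C))   # but not of an unrelated class
print(issubclass(B, A))
print(issubclass(A, B))
```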
The types module destroys the current exception state when it is first imported. In other words, don't
import it (or any module that imports it!) from within an exception handler.
The gc module
(Optional, 2.0 and later) This module provides an interface to the built-in cyclic garbage collector.
Python uses reference counting to keep track of when to get rid of objects; as soon as the last reference
to an object goes away, the object is destroyed.
Starting with version 2.0, Python also provides a cyclic garbage collector, which runs at regular
intervals. This collector looks for data structures that point to themselves, and does what it can to
break the cycles.
You can use the gc.collect function to force full collection. This function returns the number of objects
destroyed by the collector.
Example: Using the gc module to collect cyclic garbage
# File:gc-example-1.py
import gc
def __repr__(self):
return "<Node %s at %x>" % (repr(self.name), id(self))
root.addchild(Node("eric"))
root.addchild(Node("john"))
root.addchild(Node("michael"))
12 unreachable objects
0 unreachable objects
If you're sure that your program doesn't create any self-referencing data structures, you can use the
gc.disable function to disable collection. After calling this function, Python 2.0 works exactly like 1.5.2
and earlier.
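Here's a minimal sketch of how a reference cycle ends up with the collector, assuming CPython 3. The Node class below, with its parent and children attributes, is an assumption made for illustration and not necessarily how the sample script defines it. The exact number returned by gc.collect depends on the interpreter version.

```python
import gc

class Node:
    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []
    def addchild(self, node):
        # parent and child refer to each other, forming a cycle
        node.parent = self
        self.children.append(node)

gc.collect()                 # start from a clean slate

root = Node("monty")
root.addchild(Node("eric"))
root.addchild(Node("john"))

del root                     # the cycle keeps the nodes alive
found = gc.collect()         # until the collector breaks it

print(found, "unreachable objects")
```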
Python Standard Library: More Standard Modules 2-1
Overview
This chapter describes a number of modules that are used in many Python programs. It's perfectly
possible to write large Python programs without using them, but they can help you save a lot of time
and effort.
The fileinput module makes it easy to write different kinds of text filters. This module provides a
wrapper class, which lets you use a simple for-in statement to loop over the contents of one or more
text files.
The StringIO module (and the cStringIO variant) implements an in-memory file object. You can use
StringIO objects in many places where Python expects an ordinary file object.
Type wrappers
UserDict, UserList, and UserString are thin wrappers on top of the corresponding built-in types.
But unlike the built-in types, these wrappers can be subclassed. This can come in handy if you need a
class that works almost like a built-in type, but has one or more extra methods.
Random numbers
The random module provides a number of different random number generators. The whrandom
module is similar, but it also allows you to create multiple generator objects.
The md5 and sha modules are used to calculate cryptographically strong message signatures
(so-called "message digests").
The crypt module implements a DES-style one-way encryption. This module is usually only available
on Unix systems.
The rotor module provides simple two-way encryption.
import fileinput
import sys
The module also allows you to get metainformation about the current line. This includes isfirstline,
filename, and lineno:
import fileinput
import glob
import string, sys
-- reading samples\sample.txt --
1 WE WILL PERHAPS EVENTUALLY BE WRITING ONLY SMALL
2 MODULES WHICH ARE IDENTIFIED BY NAME AS THEY ARE
3 USED TO BUILD LARGER ONES, SO THAT DEVICES LIKE
4 INDENTATION, RATHER THAN DELIMITERS, MIGHT BECOME
5 FEASIBLE FOR EXPRESSING LOCAL STRUCTURE IN THE
6 SOURCE LANGUAGE.
7 -- DONALD E. KNUTH, DECEMBER 1974
Processing text files in place is also easy. Just call the input function with the inplace keyword
argument set to 1, and the module takes care of the rest.
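A small sketch of in-place processing, assuming a modern Python 3 interpreter. To avoid touching real files, it first writes a throwaway file using tempfile.mkstemp; the file contents are made up. While the loop runs, standard output is redirected into the file being processed.

```python
import fileinput
import os
import tempfile

# create a scratch file to work on
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as f:
    f.write("spam\neggs\n")

# anything printed inside the loop replaces the original line
for line in fileinput.input(files=[path], inplace=True):
    print(line.upper(), end="")
fileinput.close()

with open(path) as f:
    result = f.read()
os.remove(path)

print(repr(result))
```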
import shutil
import os
aifc-example-1.py
anydbm-example-1.py
array-example-1.py
...
The copytree function copies an entire directory tree (same as cp -r), and rmtree removes an entire
tree (same as rm -r).
Example: Using the shutil module to copy and remove directory trees
# File:shutil-example-2.py
import shutil
import os
SOURCE = "samples"
BACKUP = "samples-bak"
print os.listdir(BACKUP)
# remove it
shutil.rmtree(BACKUP)
print os.listdir(BACKUP)
Example: Using the tempfile module to create filenames for temporary files
# File:tempfile-example-1.py
import tempfile
import os
tempfile = tempfile.mktemp()
try:
    # must remove file when done
    os.remove(tempfile)
except OSError:
    pass
The TemporaryFile function picks a suitable name, and opens the file. It also makes sure that the file
is removed when it's closed. (Under Unix, you can remove an open file and have it disappear when the
file is closed. On other platforms, this is done via a special wrapper class.)
import tempfile
file = tempfile.TemporaryFile()
for i in range(100):
    file.write("*" * 100)
import StringIO
file = StringIO.StringIO(MESSAGE)
print file.read()
The StringIO class implements memory file versions of all methods available for built-in file objects,
plus a getvalue method that returns the internal string value.
import StringIO
file = StringIO.StringIO()
file.write("This man is no ordinary man. ")
file.write("This is Mr. F. G. Superman.")
print file.getvalue()
StringIO can be used to capture redirected output from the Python interpreter:
import StringIO
import string, sys
stdout = sys.stdout
print """
According to Gbaya folktales, trickery and guile
are the best ways to defeat the python, king of
snakes, which was hatched from a dragon at the
world's start. -- National Geographic, May 1997
"""
sys.stdout = stdout
print string.upper(file.getvalue())
import cStringIO
file = cStringIO.StringIO(MESSAGE)
print file.read()
To make your code as fast as possible, but also robust enough to run on older Python installations, you
can fall back on the StringIO module if cStringIO is not available:
try:
    import cStringIO
    StringIO = cStringIO
except ImportError:
    import StringIO
print StringIO
import mmap
import os
filename = "samples/sample.txt"
# basics
print data
print len(data), size
Under Windows, the file must currently be opened for both reading and writing (r+, or w+), or the
mmap call will fail.
Memory mapped regions can be used instead of ordinary strings in many places, including regular
expressions and many string operations:
import mmap
import os, string, re
def mapfile(filename):
    file = open(filename, "r+")
    size = os.path.getsize(filename)
    return mmap.mmap(file.fileno(), size)
data = mapfile("samples/sample.txt")
# search
index = data.find("small")
print index, repr(data[index-5:index+15])
import UserDict
class FancyDict(UserDict.UserDict):
a = FancyDict(a = 1)
b = FancyDict(b = 2)
print a + b
{'b': 2, 'a': 1}
import UserList
class AutoList(UserList.UserList):
list = AutoList()
for i in range(10):
    list[i] = i
print list
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
import UserString
class MyString(UserString.MutableString):
file = open("samples/book.txt")
text = file.read()
file.close()
book = MyString(text)
print book
...
C: The one without the !
P: The one without the -!!! They've ALL got the !! It's a
Standard British Bird, the , it's in all the books!!!
...
try:
    raise SyntaxError, "example"
except:
    traceback.print_exc()
import traceback
import StringIO
try:
    raise IOError, "an i/o error occurred"
except:
    fp = StringIO.StringIO()
    traceback.print_exc(file=fp)
    message = fp.getvalue()
If you wish to format the traceback in a non-standard way, you can use the extract_tb function to
convert a traceback object to a list of stack entries:
import traceback
import sys
def function():
    raise IOError, "an i/o error occurred"
try:
    function()
except:
    info = sys.exc_info()
    for file, lineno, function, text in traceback.extract_tb(info[2]):
        print file, "line", lineno, "in", function
        print "=>", repr(text)
    print "** %s: %s" % info[:2]
traceback-example-3.py line 8 in ?
=> 'function()'
traceback-example-3.py line 5 in function
=> 'raise IOError, "an i/o error occurred"'
** exceptions.IOError: an i/o error occurred
import errno
try:
    fp = open("no.such.file")
except IOError, (error, message):
    if error == errno.ENOENT:
        print "no such file"
    elif error == errno.EPERM:
        print "permission denied"
    else:
        print message
no such file
The following example is a bit contrived, but it shows how to use the errorcode dictionary to map
from a numerical error code to the symbolic name.
import errno
try:
    fp = open("no.such.file")
except IOError, (error, message):
    print error, repr(message)
    print errno.errorcode[error]
import getopt
import sys
# process options
opts, args = getopt.getopt(sys.argv[1:], "ld:")
long = 0
directory = None
for o, v in opts:
    if o == "-l":
        long = 1
    elif o == "-d":
        directory = v
long = 1
directory = directory
arguments = ['filename']
To make it look for long options, pass a list of option descriptors as the third argument. If an option
name ends with an equal sign (=), that option must have an additional argument.
import getopt
import sys
# process options
echo = 0
printer = None
for o, v in opts:
    if o in ("-e", "--echo"):
        echo = 1
    elif o in ("-p", "--printer"):
        printer = v
echo = 1
printer = lp01
arguments = ['message']
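The call to getopt itself might look something like the following sketch, which assumes a modern Python 3 interpreter. The option names match the loop above; the argument list is hard-coded here, as an illustration, instead of being taken from sys.argv.

```python
import getopt

# hypothetical command line, standing in for sys.argv[1:]
argv = ["--echo", "--printer", "lp01", "message"]

# "ep:" describes the short options; the list describes the long ones.
# A trailing "=" (or ":" for short options) means "takes an argument".
opts, args = getopt.getopt(argv, "ep:", ["echo", "printer="])

echo = 0
printer = None
for o, v in opts:
    if o in ("-e", "--echo"):
        echo = 1
    elif o in ("-p", "--printer"):
        printer = v

print("echo =", echo)
print("printer =", printer)
print("arguments =", args)
```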
import getpass
usr = getpass.getuser()
import glob
samples/sample.jpg
Note that glob returns full path names, unlike the os.listdir function. glob uses the fnmatch module
to do the actual pattern matching.
import fnmatch
import os
sample.jpg
import fnmatch
import os, re
pattern = fnmatch.translate("*.jpg")
sample.jpg
(pattern was .*\.jpg$)
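Both approaches can be sketched without touching the file system, by matching against a list of names. This assumes a modern Python 3 interpreter, where the translated pattern looks different from the one shown above but matches the same names. The file names are made up, and fnmatchcase is used so that the comparison is case-sensitive on all platforms.

```python
import fnmatch
import re

names = ["sample.jpg", "sample.txt", "SAMPLE.JPG"]

# match directly against the file pattern
matched = [n for n in names if fnmatch.fnmatchcase(n, "*.jpg")]

# or translate the file pattern to a regular expression first
regex = re.compile(fnmatch.translate("*.jpg"))
matched_re = [n for n in names if regex.match(n)]

print(matched)
print(matched_re)
```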
import random
for i in range(5):
Note that the randint function can return the upper limit, while the other functions always return values
smaller than the upper limit.
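The difference is easy to see by drawing a large number of values and looking at the range they cover. The sketch below assumes a modern Python 3 interpreter and uses randrange and random as examples of functions that exclude the upper limit.

```python
import random

# randint includes both end points
ints = [random.randint(1, 3) for i in range(1000)]

# randrange and random exclude the upper limit
ranges = [random.randrange(1, 3) for i in range(1000)]
floats = [random.random() for i in range(1000)]

print(min(ints), max(ints))
print(min(ranges), max(ranges))
print(max(floats) < 1.0)
```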
The choice function picks a random item from a sequence. It can be used with lists, tuples, or any
other sequence (provided it can be accessed in random order, of course):
Example: Using the random module to choose random items from a sequence
# File:random-example-2.py
import random
2
3
1
9
1
In 2.0 and later, the shuffle function can be used to shuffle the contents of a list (that is, generate a
random permutation of a list in-place). The following example also shows how to implement that
function under 1.5.2 and earlier:
import random
try:
    # available in Python 2.0 and later
    shuffle = random.shuffle
except AttributeError:
    def shuffle(x):
        for i in xrange(len(x)-1, 0, -1):
            # pick an element in x[:i+1] with which to exchange x[i]
            j = int(random.random() * (i+1))
            x[i], x[j] = x[j], x[i]
cards = range(52)
shuffle(cards)
myhand = cards[:5]
print myhand
This module also contains a number of random generators with non-uniform distribution. For
example, the gauss function generates random numbers with a gaussian distribution:
import random
histogram = [0] * 20
****
**********
*************************
***********************************
************************************************
**************************************************
*************************************
***************************
*************
***
*
See the Python Library Reference for more information on the non-uniform generators.
Warning:
import whrandom
# same as random
print whrandom.random()
print whrandom.choice([1, 2, 3, 5, 9])
print whrandom.uniform(10, 20)
print whrandom.randint(100, 1000)
0.113412062346
1
16.8778954689
799
import whrandom
for i in range(5):
    print rand1.random(), rand2.random(), rand3.random()
import md5
hash = md5.new()
hash.update("spam, spam, and eggs")
print repr(hash.digest())
'L\005J\243\266\355\243u`\305r\203\267\020F\303'
Note that the checksum is returned as a binary string. Getting a hexadecimal or base64-encoded string
is quite easy, though:
Example: Using the md5 module to get a hexadecimal or base64-encoded md5 value
# File:md5-example-2.py
import md5
import string
import base64
hash = md5.new()
hash.update("spam, spam, and eggs")
value = hash.digest()
print hash.hexdigest()
print base64.encodestring(value)
4c054aa3b6eda37560c57283b71046c3
TAVKo7bto3VgxXKDtxBGww==
Among other things, the MD5 checksum can be used for challenge-response authentication (but see the
note on random numbers below):
import md5
import string, random
def getchallenge():
    # generate a 16-byte long random string. (note that the built-
    # in pseudo-random generator uses a 24-bit seed, so this is not
    # as good as it may seem...)
    challenge = map(lambda i: chr(random.randint(0, 255)), range(16))
    return string.join(challenge, "")
#
# server/client communication
challenge = getchallenge()
# 3. server does the same, and compares the result with the
# client response. the result is a safe login in which the
# password is never sent across the communication channel.
if server_response == client_response:
    print "server:", "login ok"
client: connect
server: '\334\352\227Z#\272\273\212KG\330\265\032>\311o'
client: "l'\305\240-x\245\237\035\225A\254\233\337\225\001"
server: login ok
A variation of this can be used to sign messages sent over a public network, so that their integrity can
be verified at the receiving end.
import md5
import array
class HMAC_MD5:
# keyed MD5 message authentication
#
# simulate server end
signature = HMAC_MD5(key).digest(message)
#
# simulate client end
client_signature = HMAC_MD5(key).digest(message)
if client_signature == signature:
    print "this is the original message:"
    print
    print message
else:
    print "someone has modified the message!!!"
The copy method takes a snapshot of the internal object state. This allows you to precalculate partial
digests (such as the padded key, in this example).
For details on this algorithm, see HMAC-MD5: Keyed-MD5 for Message Authentication by Krawczyk
et al.
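Later Python releases ship an hmac module that implements this algorithm. The following is a sketch of the same sign-and-verify idea, assuming a modern Python 3 interpreter with the hmac and hashlib modules (neither is part of the 2.0 library described here). The key and message are made-up values.

```python
import hashlib
import hmac

key = b"this should be a well-kept secret"
message = b"spam, spam, and eggs"

# sender: calculate the signature for the message
signature = hmac.new(key, message, hashlib.md5).digest()

# receiver: recalculate the signature and compare
expected = hmac.new(key, message, hashlib.md5).digest()
intact = hmac.compare_digest(signature, expected)

# a modified message gives a different signature
forged = hmac.new(key, message + b"!", hashlib.md5).digest()
tampered_ok = hmac.compare_digest(signature, forged)

print(intact, tampered_ok)
```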
Warning:
Don't forget that the built-in pseudo random number generator isn't
really good enough for encryption purposes. Be careful.
import sha
hash = sha.new()
hash.update("spam, spam, and eggs")
print repr(hash.digest())
print hash.hexdigest()
'\321\333\003\026I\331\272-j\303\247\240\345\343Tvq\364\346\311'
d1db031649d9ba2d6ac3a7a0e5e3547671f4e6c9
See the md5 examples for more ways to use SHA signatures.
import crypt
'py8UGrijma1j6'
To verify a given password, encrypt it using the first two characters of the encrypted string as the salt.
If the result matches the encrypted string, the password is valid. The following example uses the pwd
module to fetch the encrypted password for a given user.
user = raw_input("username:")
password = raw_input("password:")
if login(user, password):
    print "welcome", user
else:
    print "login failed"
For other ways to implement authentication, see the description of the md5 module.
import rotor
SECRET_KEY = "spam"
MESSAGE = "the holy grail"
r = rotor.newrotor(SECRET_KEY)
encoded_message = r.encrypt(MESSAGE)
decoded_message = r.decrypt(encoded_message)
import zlib
compressed_message = zlib.compress(MESSAGE)
decompressed_message = zlib.decompress(compressed_message)
The compression rate varies a lot, depending on the contents of the file.
import zlib
import glob
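To illustrate how much the compression rate can vary, this sketch compresses two in-memory byte strings instead of files. It assumes a modern Python 3 interpreter; the sample data is made up, with os.urandom standing in for data that is hard to compress.

```python
import os
import zlib

repetitive = b"spam " * 1000     # 5000 bytes, highly repetitive
noise = os.urandom(5000)         # 5000 bytes, essentially random

small = zlib.compress(repetitive)
large = zlib.compress(noise)

for name, original, compressed in (
    ("repetitive", repetitive, small),
    ("random", noise, large),
    ):
    rate = 100.0 * len(compressed) / len(original)
    print(name, len(original), "=>", len(compressed), "(%d%%)" % rate)
```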
import zlib
encoder = zlib.compressobj()
data = encoder.compress("life")
data = data + encoder.compress(" of ")
data = data + encoder.compress("brian")
data = data + encoder.flush()
print repr(data)
print repr(zlib.decompress(data))
'x\234\313\311LKU\310OSH*\312L\314\003\000!\010\004\302'
'life of brian'
To make it a bit more convenient to read a compressed file, you can wrap a decoder object in a file-like
wrapper:
import zlib
import string, StringIO
class ZipInputStream:
    def __rewind(self):
        self.zip = zlib.decompressobj()
        self.pos = 0 # position in zipped stream
        self.offset = 0 # position in unzipped stream
        self.data = ""
    def tell(self):
        return self.offset
    def readline(self):
        # make sure we have an entire line
        while self.zip and "\n" not in self.data:
            self.__fill(len(self.data) + 512)
        i = string.find(self.data, "\n") + 1
        if i <= 0:
            return self.read()
        return self.read(i)
    def readlines(self):
        lines = []
        while 1:
            s = self.readline()
            if not s:
                break
            lines.append(s)
        return lines
#
# try it out
data = open("samples/sample.txt").read()
data = zlib.compress(data)
file = ZipInputStream(StringIO.StringIO(data))
for line in file.readlines():
print line[:-1]
Note that the tuple assignment cannot be properly compiled until we've reached the closing
parenthesis.
import code
import string
#
SCRIPT = [
    "a = (",
    "  1,",
    "  2,",
    "  3 ",
    ")",
    "print a"
]
script = ""
----------------------------------------
a=(
1,
2,
3
)
----------------------------------------
----------------------------------------
print a
----------------------------------------
(1, 2, 3)
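The line-by-line compilation shown above can be sketched in current Python 3 as follows. This is an illustration, not the original script: it assumes the code.compile_command function, which returns None as long as the source is an incomplete statement, and the names namespace, executed and pending are my own.

```python
import code

SCRIPT = [
    "a = (",
    "  1,",
    "  2,",
    "  3 ",
    ")",
    "total = sum(a)",
]

namespace = {}
executed = []   # source chunks that compiled to complete statements
pending = ""

for line in SCRIPT:
    pending += line + "\n"
    # compile_command returns None while the statement is incomplete
    compiled = code.compile_command(pending, "<script>", "exec")
    if compiled is not None:
        exec(compiled, namespace)
        executed.append(pending)
        pending = ""

print(len(executed), namespace["a"], namespace["total"])
```

The first five lines are collected into a single chunk, since nothing can be compiled until the parenthesis is closed; the last line forms a chunk of its own.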
The InteractiveConsole class implements an interactive console, much like the one you get when
you fire up the Python interpreter in interactive mode.
The console can be either active (it calls a function to get the next line) or passive (you call the push
method when you have new data). The default is to use the built-in raw_input function. Override the
method with the same name if you prefer to use another input function.
import code
console = code.InteractiveConsole()
console.interact()
Python 1.5.2
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
(InteractiveConsole)
>>> a = (
... 1,
... 2,
... 3
... )
>>> print a
(1, 2, 3)
The following script defines a function called keyboard. It allows you to hand control over to the
interactive interpreter at any point in your program.
def keyboard(banner=None):
    import code, sys
    code.interact(banner=banner, local=namespace)

def func():
    print "START"
    a = 10
    keyboard()
    print "END"

func()
START
Python 1.5.2
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
(InteractiveConsole)
>>> print a
10
>>> print keyboard
<function keyboard at 9032c8>
^Z
END
Python Standard Library: Threads and Processes 3-1
Overview
This chapter describes the thread support modules provided with the standard Python interpreter.
Note that thread support is optional, and may not be available in your Python interpreter.
This chapter also covers some modules that allow you to run external processes on Unix and Windows
systems.
Threads
When you run a Python program, execution starts at the top of the main module, and proceeds
downwards. Loops can be used to repeat portions of the program, and function and method calls
transfer control to a different part of the program (but only temporarily).
With threads, your program can do several things at one time. Each thread has its own flow of control.
While one thread might be reading data from a file, another thread can keep the screen updated.
To keep two threads from accessing the same internal data structure at the same time, Python uses a
global interpreter lock. Only one thread can execute Python code at a time; Python
automatically switches to the next thread after a short period of time, or when a thread does something
that may take a while (like waiting for the next byte to arrive over a network socket, or reading data
from a file).
The global lock isn't enough to avoid problems in your own programs, though. If multiple threads
attempt to access the same data object, it may end up in an inconsistent state. Consider a simple cache:
def getitem(key):
    item = cache.get(key)
    if item is None:
        # not in cache; create a new one
        item = create_new_item(key)
        cache[key] = item
    return item
If two threads call the getitem function just after each other with the same missing key, they're likely
to end up calling create_new_item twice with the same argument. While this may be okay in many
cases, it can cause serious problems in others.
To avoid problems like this, you can use lock objects to synchronize threads. A lock object can only be
owned by one thread at a time, and can thus be used to make sure that only one thread is executing the
code in the getitem body at any time.
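Here's a minimal sketch of the lock-protected cache described above, written for current Python 3. The create_new_item function is a hypothetical stand-in for whatever expensive operation you need, and the created list is only there so you can see how often it was called.

```python
import threading

cache = {}
cache_lock = threading.Lock()
created = []   # records every call to create_new_item


def create_new_item(key):
    # hypothetical factory; stands in for an expensive operation
    created.append(key)
    return "item for %s" % key


def getitem(key):
    # only one thread at a time may run the lookup-and-create step
    with cache_lock:
        item = cache.get(key)
        if item is None:
            # not in cache; create a new one
            item = create_new_item(key)
            cache[key] = item
        return item


threads = [threading.Thread(target=getitem, args=("spam",)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(created)
```

Since the lookup and the update happen while the lock is held, the item is created only once, no matter how many threads ask for the same missing key. The price is that all callers are serialized, including those that only need to read the cache.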
Processes
On most modern operating systems, each program runs in its own process. You usually start a new
program/process by entering a command in the shell, or by selecting it from a menu. Python also allows
you to start new programs from inside a Python program.
Most process-related functions are defined by the os module. See the Working with Processes section
for the full story.
import threading
import time, random

class Counter:
    def __init__(self):
        self.lock = threading.Lock()
        self.value = 0

    def increment(self):
        self.lock.acquire() # critical section
        self.value = value = self.value + 1
        self.lock.release()
        return value

counter = Counter()

class Worker(threading.Thread):

    def run(self):
        for i in range(10):
            # pretend we're doing something that takes 10-100 ms
            value = counter.increment() # increment global counter
            time.sleep(random.randint(10, 100) / 1000.0)
            print self.getName(), "-- task", i, "finished", value

#
# try it

for i in range(10):
    Worker().start() # start a worker
This example also uses Lock objects to create a critical section inside the global counter object. If you
remove the calls to acquire and release, it's pretty likely that the counter won't reach 100.
import threading
import Queue
import time, random

WORKERS = 2

class Worker(threading.Thread):

    def run(self):
        while 1:
            item = self.__queue.get()
            if item is None:
                break # reached end of queue

#
# try it

queue = Queue.Queue(0)

for i in range(WORKERS):
    Worker(queue).start() # start a worker

for i in range(10):
    queue.put(i)

for i in range(WORKERS):
    queue.put(None) # add end-of-queue markers
task 1 finished
task 0 finished
task 3 finished
task 2 finished
task 4 finished
task 5 finished
task 7 finished
task 6 finished
task 9 finished
task 8 finished
You can limit the size of the queue. If the producer threads fill the queue, they will block until items are
popped off the queue.
import threading
import Queue

WORKERS = 2

class Worker(threading.Thread):

    def run(self):
        while 1:
            item = self.__queue.get()
            if item is None:
                break # reached end of queue

#
# run with limited queue

queue = Queue.Queue(3)

for i in range(WORKERS):
    Worker(queue).start() # start a worker

for i in range(WORKERS):
    queue.put(None) # add end-of-queue markers
push 0
push 1
push 2
push 3
push 4
push 5
task 0 finished
push 6
task 1 finished
push 7
task 2 finished
push 8
task 3 finished
push 9
task 4 finished
task 6 finished
task 5 finished
task 7 finished
task 9 finished
task 8 finished
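The producer/consumer pattern with a size-limited queue can be sketched in current Python 3 like this. Note that the module is called queue (lowercase) in Python 3, and that this sketch uses plain worker functions and a results list of my own instead of the Worker class used above.

```python
import queue
import threading

WORKERS = 2
results = []
results_lock = threading.Lock()


def worker(work_queue):
    while True:
        item = work_queue.get()
        if item is None:
            break  # reached end-of-queue marker
        with results_lock:
            results.append(item)


work_queue = queue.Queue(3)  # at most three items waiting at any time
threads = [threading.Thread(target=worker, args=(work_queue,))
           for _ in range(WORKERS)]
for t in threads:
    t.start()

for i in range(10):
    work_queue.put(i)  # blocks while the queue is full

for _ in range(WORKERS):
    work_queue.put(None)  # one end-of-queue marker per worker

for t in threads:
    t.join()

print(sorted(results))
```

The order in which the tasks finish depends on thread scheduling, so the sketch sorts the results before printing them.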
You can modify the behavior through subclassing. The following class provides a simple priority queue.
It expects all items added to the queue to be tuples, where the first member contains the priority (lower
value means higher priority):
import Queue
import bisect
Empty = Queue.Empty
class PriorityQueue(Queue.Queue):
    "Thread-safe priority queue"
#
# try it
queue = PriorityQueue(0)
third
second
first
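The subclassing approach can be sketched in current Python 3 as follows. It relies on the _init, _put and _get hooks that queue.Queue calls internally (while holding its lock), and uses bisect.insort to keep the items sorted. This is my own minimal version, not the class from the example above; Python 3 also ships a heap-based queue.PriorityQueue, which is usually the better choice.

```python
import bisect
import queue


class PriorityQueue(queue.Queue):
    "Thread-safe priority queue (sketch)"

    def _init(self, maxsize):
        # keep the items in a sorted list instead of a deque
        self.queue = []

    def _put(self, item):
        # insert in priority order; lower values sort first
        bisect.insort(self.queue, item)

    def _get(self):
        return self.queue.pop(0)


pq = PriorityQueue(0)

# add items out of order
pq.put((20, "second"))
pq.put((10, "first"))
pq.put((30, "third"))

order = []
try:
    while True:
        order.append(pq.get_nowait()[1])
except queue.Empty:
    pass

print(order)
```

Since the items are tuples, they are compared member by member, so the first member decides the order and the lowest value is returned first.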
And here's a simple stack implementation (last-in first-out, instead of first-in, first-out):
import Queue
Empty = Queue.Empty
class Stack(Queue.Queue):
    "Thread-safe stack"

    # method aliases
    push = Queue.Queue.put
    pop = Queue.Queue.get
    pop_nowait = Queue.Queue.get_nowait
#
# try it
stack = Stack(0)
third
second
first
import thread
import time, random
def worker():
    for i in range(50):
        # pretend we're doing something that takes 10-100 ms
        time.sleep(random.randint(10, 100) / 1000.0)
        print thread.get_ident(), "-- task", i, "finished"

#
# try it out!

for i in range(2):
    thread.start_new_thread(worker, ())

time.sleep(1)

print "goodbye!"
Note that when the main program exits, all threads are killed. The threading module doesn't have
that problem.
import commands
status => 0
output => 171046 bytes
import pipes
t = pipes.Template()
# create a pipeline
t.append("sort", "--")
t.append("uniq", "--")
fout.write("foo\n")
fout.write("bar\n")
fout.close()
print fin.readline(),
print fin.readline(),
fin.close()
bar
foo
The following example shows how you can use this module to control an existing application.
import popen2
import string
class Chess:
    "Interface class for chesstool-compatible programs"

    def quit(self):
        self.fout.write("quit\n")
        self.fout.flush()
#
# play a few moves
g = Chess()
print g.move("a2a4")
print g.move("b2b3")
g.quit()
b8c6
e7e5
import signal
import time
signal.signal(signal.SIGALRM, handler)
now = time.time()
time.sleep(200)
got signal 14
slept for 1.99262607098 seconds
Python Standard Library: Data Representation 4-1
Data Representation
"PALO ALTO, Calif. — Intel says its Pentium Pro and new Pentium II
chips have a flaw that can cause computers to sometimes make
mistakes but said the problems could be fixed easily with rewritten
software"
Overview
This chapter describes a number of modules that can be used to convert between Python objects and
other data representations. They are often used to read and write foreign file formats, and to store or
transfer Python variables.
Binary data
Python provides several support modules that help you decode and encode binary data formats. The
struct module can convert between binary data structures (like C structs) and Python tuples. The
array module wraps binary arrays of data (C arrays) into a Python sequence object.
Self-describing formats
To pass data between different Python programs, you can marshal or pickle your data.
The marshal module uses a simple self-describing format which supports most built-in data types,
including code objects. Python uses this format itself, to store compiled code on disk (in PYC files).
The pickle module provides a more sophisticated format, which supports user-defined classes, self-
referencing data structures, and more. This module is available in two versions; the basic pickle
module is written in Python, and is relatively slow, while cPickle is written in C, and is usually as fast
as marshal.
Output formatting
This group of modules supplements built-in formatting functions like repr, and the % string formatting
operator.
The pprint module can print almost any Python data structure in a nice, readable way (well, as
readable as it can make things, that is).
The repr module provides a replacement for the built-in function with the same name. The version in
this module applies tight limits on most things; it doesn't print more than 30 characters from each
string, it doesn't print more than a few levels of deeply nested data structures, etc.
import array
print a
print repr(a.tostring())
print b
print repr(b.tostring())
The array objects can be treated as ordinary lists, to some extent. You cannot concatenate arrays if
they have different type codes, though.
import array
a.append(4)
a = a + a
a = a[2:-2]
print a
print repr(a.tostring())
for i in a:
    print i,
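Here's a small sketch of the list-like operations in current Python 3, including what happens if you try to concatenate arrays with different type codes. The variable names are mine. Note that Python 3 uses tobytes instead of tostring.

```python
import array

a = array.array("i", [1, 2, 3])
a.append(4)
a = a + a          # same type code: concatenation works
a = a[2:-2]        # slicing returns a new array

b = array.array("b", [1, 2])
try:
    a + b          # different type codes
    mixed_ok = True
except TypeError:
    mixed_ok = False

print(a.tolist(), len(a.tobytes()), mixed_ok)
```

If you need to combine arrays of different types, convert one of them first, for example with array.array("i", b.tolist()).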
This module also provides a very efficient way to turn raw binary data into a sequence of integers (or
floating point values, for that matter):
import array
print a
print repr(a.tostring())
print a.tolist()
Finally, here's how to use this module to determine the endianness of the current platform:
import array

def little_endian():
    return ord(array.array("i",[1]).tostring()[0])

if little_endian():
    print "little-endian platform (intel, alpha)"
else:
    print "big-endian platform (motorola, sparc)"
Python 2.0 and later provides a sys.byteorder attribute, which is set to either "little" or "big":
import sys
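As a quick cross-check, here's a sketch for current Python 3 that compares the array trick with the sys.byteorder attribute. It assumes the tobytes method, which replaced tostring in Python 3.

```python
import array
import sys


def little_endian():
    # the first byte of the integer 1 is 1 only on little-endian platforms
    return array.array("i", [1]).tobytes()[0] == 1


print(sys.byteorder, little_endian())
```

Both approaches should agree on any platform; sys.byteorder is simply the more direct way to ask.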
import struct

# native byteorder
buffer = struct.pack("ihb", 1, 2, 3)
print repr(buffer)
print struct.unpack("ihb", buffer)

# network byteorder
buffer = struct.pack("!ihb", 1, 2, 3)
print repr(buffer)
print struct.unpack("!ihb", buffer)
'\001\000\000\000\002\000\003'
(1, 2, 3)
'\000\000\000\001\000\002\003'
(1, 2, 3)
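The difference between the two byte orders is easier to see if you ask for them explicitly. This sketch for current Python 3 uses the "<" prefix (little-endian, standard sizes) instead of the native format, so the result is the same on every platform; the variable names are mine.

```python
import struct

values = (1, 2, 3)

little = struct.pack("<ihb", *values)   # explicit little-endian, no padding
network = struct.pack("!ihb", *values)  # network (big-endian) byte order

print(repr(little))
print(repr(network))
print(struct.unpack("<ihb", little))
print(struct.unpack("!ihb", network))
```

In both cases the buffer is seven bytes long: four for the integer, two for the short, and one for the signed byte. Only the order of the bytes within each field differs.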
import xdrlib
#
# create a packer and add some data to it
p = xdrlib.Packer()
p.pack_uint(1)
p.pack_string("spam")
data = p.get_buffer()
#
# create an unpacker and use it to decode the data
u = xdrlib.Unpacker(data)
u.done()
packed: '\000\000\000\001\000\000\000\004spam'
unpacked: 1 'spam'
The XDR format is used by Sun's remote procedure call (RPC) protocol. Here's an incomplete (and
rather contrived) example showing how to build an RPC request packet:
import xdrlib
AUTH_NULL = 0
transaction = 1
p = xdrlib.Packer()
print repr(p.get_buffer())
'\000\000\000\001\000\000\000\001\000\000\000\002\000\000\004\322
\000\000\003\350\000\000\'\017\000\000\000\000\000\000\000\000\000
\000\000\000\000\000\000\000'
import marshal
value = (
    "this is a string",
    [1, 2, 3, 4],
    ("more tuples", 1.0, 2.3, 4.5),
    "this is yet another string"
    )
data = marshal.dumps(value)
# intermediate format
print type(data), len(data)
print "-"*50
print repr(data)
print "-"*50
print marshal.loads(data)
The marshal module can also handle code objects (it's used to store precompiled Python modules).
import marshal
script = """
print 'hello'
"""
code = compile(script, "<script>", "exec")
data = marshal.dumps(code)
# intermediate format
print type(data), len(data)
print "-"*50
print repr(data)
print "-"*50
exec marshal.loads(data)
<type 'string'> 81
--------------------------------------------------
'c\000\000\000\000\001\000\000\000s\017\000\000\00
0\177\000\000\177\002\000d\000\000GHd\001\000S(\00
2\000\000\000s\005\000\000\000helloN(\000\000\000\
000(\000\000\000\000s\010\000\000\000<script>s\001
\000\000\000?\002\000s\000\000\000\000'
--------------------------------------------------
hello
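The same round trip can be sketched in current Python 3, where marshal.dumps returns a bytes object and exec is a function. The script and the namespace dictionary are my own; the script stores a value instead of printing, to make the result easy to check.

```python
import marshal

script = "result = 6 * 7\n"

code_object = compile(script, "<script>", "exec")
data = marshal.dumps(code_object)   # intermediate format (bytes)

restored = marshal.loads(data)
namespace = {}
exec(restored, namespace)

print(type(data), len(data), namespace["result"])
```

Keep in mind that the marshal format is tied to the Python version that wrote it, so it isn't suitable for exchanging code between different interpreters.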
import pickle
value = (
    "this is a string",
    [1, 2, 3, 4],
    ("more tuples", 1.0, 2.3, 4.5),
    "this is yet another string"
    )
data = pickle.dumps(value)
# intermediate format
print type(data), len(data)
print "-"*50
print data
print "-"*50
print pickle.loads(data)
On the other hand, pickle cannot handle code objects (but see the copy_reg module for a way to fix
this).
By default, pickle uses a text-based format. You can also use a binary format, in which numbers and
binary strings are stored in a compact binary format. The binary format usually results in smaller files.
import pickle
import math
value = (
    "this is a long string" * 100,
    [1.2345678, 2.3456789, 3.4567890] * 100
    )
# text mode
data = pickle.dumps(value)
print type(data), len(data), pickle.loads(data) == value
# binary mode
data = pickle.dumps(value, 1)
print type(data), len(data), pickle.loads(data) == value
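In current Python 3, the format is selected with the protocol argument. As a sketch, assuming protocol 0 as the closest match to the text format described here and protocol 2 as a representative binary format, the comparison looks like this:

```python
import pickle

value = (
    "this is a long string" * 100,
    [1.2345678, 2.3456789, 3.4567890] * 100,
)

# text mode (protocol 0, the oldest format)
text_data = pickle.dumps(value, protocol=0)
print(type(text_data), len(text_data), pickle.loads(text_data) == value)

# binary mode (protocol 2)
binary_data = pickle.dumps(value, protocol=2)
print(type(binary_data), len(binary_data), pickle.loads(binary_data) == value)
```

Python 3 uses a binary protocol by default, so you only get the text format if you ask for it.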
try:
    import cPickle
    pickle = cPickle
except ImportError:
    import pickle

class Sample:
    def __init__(self, value):
        self.value = value
sample = Sample(1)
data = pickle.dumps(sample)
print pickle
print repr(data)
# File:copy-reg-example-1.py
import pickle
CODE = """
print 'good evening'
"""
exec code
exec pickle.loads(pickle.dumps(code))
good evening
Traceback (innermost last):
...
pickle.PicklingError: can't pickle 'code' objects
We can work around this by registering a code object handler. Such a handler consists of two parts; a
pickler which takes the code object and returns a tuple that can only contain simple data types, and an
unpickler which takes the contents of such a tuple as its arguments:
import copy_reg
import pickle, marshal, types
#
# register a pickle handler for code objects
def code_unpickler(data):
    return marshal.loads(data)

def code_pickler(code):
    return code_unpickler, (marshal.dumps(code),)
#
# try it out
CODE = """
print "suppose he's got a pointed stick"
"""
exec code
exec pickle.loads(pickle.dumps(code))
If you're transferring the pickled data across a network, or to another program, the custom unpickler
must of course be available at the receiving end as well.
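Here's how the same idea might look in current Python 3, where the module is called copyreg. This is a sketch under a few assumptions: it registers the handler with copyreg.pickle, and it uses marshal.loads directly as the unpickler, so the receiving end only needs the standard library. The script and the namespace dictionary are my own.

```python
import copyreg
import marshal
import pickle
import types


def code_pickler(code_object):
    # return (callable, arguments); pickle calls marshal.loads(data)
    # on the receiving end to rebuild the code object
    return marshal.loads, (marshal.dumps(code_object),)


# register a pickle handler for code objects
copyreg.pickle(types.CodeType, code_pickler)

code_object = compile("result = 'pointed stick'\n", "<string>", "exec")
restored = pickle.loads(pickle.dumps(code_object))

namespace = {}
exec(restored, namespace)
print(namespace["result"])
```

The registration is global for the running interpreter, so every pickler created afterwards handles code objects the same way.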
For the really adventurous, here's a version that allows you to pickle open file objects:
import copy_reg
import pickle, types
import StringIO
#
# register a pickle handler for file objects
def file_pickler(file):
    # save the current position and the full contents of the file
    position = file.tell()
    file.seek(0)
    data = file.read()
    file.seek(position)
    return file_unpickler, (position, data)
#
# try it out
print file.read(120),
print "<here>",
print pickle.loads(pickle.dumps(file)).read()
import pprint
data = (
"this is a string", [1, 2, 3, 4], ("more tuples",
1.0, 2.3, 4.5), "this is yet another string"
)
pprint.pprint(data)
('this is a string',
 [1, 2, 3, 4],
 ('more tuples', 1.0, 2.3, 4.5),
 'this is yet another string')
print repr(data)
[('XXXXXXXXXXXX...XXXXXXXXXXXXX',), [('XXXXXXXXXXXX...XXXXXXXXXX
XXX',), [('XXXXXXXXXXXX...XXXXXXXXXXXXX',), [('XXXXXXXXXXXX...XX
XXXXXXXXXXX',), [('XXXXXXXXXXXX...XXXXXXXXXXXXX',), [(...), [...
]]]]]]]
In addition, the = character is used for padding at the end of the data stream.
The encode and decode functions work on file objects:
import base64
The encodestring and decodestring functions convert between strings instead. They're currently
implemented as wrappers on top of encode and decode, using StringIO objects for input and
output.
import base64
data = base64.encodestring(MESSAGE)
original_data = base64.decodestring(data)
Here's how to convert a user name and a password to an HTTP basic authentication string. Note that
you don't really have to work for the NSA to be able to decode this format...
import base64
'QWxhZGRpbjpvcGVuIHNlc2FtZQ=='
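In current Python 3, the conversion might be sketched like this. The basic_auth helper is my own, and the user name and password are the ones from the example in the HTTP authentication specification, which give the string shown above. The sketch assumes that the credentials can be encoded as ISO Latin 1.

```python
import base64


def basic_auth(user, password):
    # join with a colon, then base64-encode the resulting bytes
    token = ("%s:%s" % (user, password)).encode("latin-1")
    return base64.b64encode(token).decode("ascii")


header_value = "Basic " + basic_auth("Aladdin", "open sesame")
print(header_value)

# decoding is just as easy
print(base64.b64decode("QWxhZGRpbjpvcGVuIHNlc2FtZQ=="))
```

As the last line shows, this is an encoding and not encryption; anyone who sees the header can recover the password.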
Finally, here's a small utility that converts a GIF image to a Python script, for use with the Tkinter
library:
Example: Using the base64 module to wrap GIF images for Tkinter
# File:base64-example-4.py
if not sys.argv[1:]:
    print "Usage: gif2tk.py giffile >pyfile"
    sys.exit(1)

if data[:4] != "GIF8":
    print sys.argv[1], "is not a GIF file"
    sys.exit(1)
image = PhotoImage(data="""
R0lGODlhoAB4APcAAAAAAIAAAACAAICAAAAAgIAAgACAgICAgAQEBIwEBIyMBJRUlISE/L
RUBAQE
...
AjmQBFmQBnmQCJmQCrmQDNmQDvmQEBmREnkRAQEAOw==
""")
import binhex
import sys
infile = "samples/sample.jpg"
binhex.binhex(infile, sys.stdout)
:#R0KEA"XC5jUF'F!2j!)!*!%%TS!N!4RdrrBrq!!%%T'58B!!3%!!!%!!3!!rpX
!3`!)"JB("J8)"`F(#3N)#J`8$3`,#``C%K-2&"dD(aiG'K`F)#3Z*b!L,#-F(#J
h+5``-63d0"mR16di-M`Z-c3brpX!3`%*#3N-#``B$3dB-L%F)6+3-[r!!"%)!)!
!J!-")J!#%3%$%3(ra!!I!!!""3'3"J#3#!%#!`3&"JF)#3S,rm3!Y4!!!J%$!`)
%!`8&"!3!!!&p!3)$!!34"4)K-8%'%e&K"b*a&$+"ND%))d+a`495dI!N-f*bJJN
import quopri
import StringIO
def decodestring(instring):
    outfile = StringIO.StringIO()
    quopri.decode(StringIO.StringIO(instring), outfile)
    return outfile.getvalue()
#
# try it out
encoded_message = encodestring(MESSAGE)
decoded_message = decodestring(encoded_message)
original: å i åa ä e ö!
encoded message: '=E5 i =E5a =E4 e =F6!\012'
decoded message: å i åa ä e ö!
As this example shows, non-US characters are mapped to an '=' followed by two hexadecimal digits. So
is the '=' character itself ("=3D"), as well as whitespace at the end of lines ("=20"). Everything else
looks just like before. So provided you don't use too many weird characters, the encoded string is
nearly as readable as the original.
(Europeans generally hate this encoding, and strongly believe that certain US programmers deserve to
be slapped in the head with a huge great fish to the jolly music of Edward German...)
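In current Python 3, the quopri module provides encodestring and decodestring functions that work directly on bytes, so the StringIO wrappers are no longer needed. Here's a minimal sketch, assuming the message is encoded as ISO Latin 1 before it's passed in:

```python
import quopri

# "å i åa ä e ö!" as ISO Latin 1 bytes
MESSAGE = b"\xe5 i \xe5a \xe4 e \xf6!"

encoded_message = quopri.encodestring(MESSAGE)
decoded_message = quopri.decodestring(encoded_message)

print("encoded message:", repr(encoded_message))
print("decoded message:", decoded_message.decode("iso-8859-1"))

# the '=' character itself is also escaped
print(repr(quopri.encodestring(b"a=b")))
```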
The uu module
The UU encoding scheme is used to convert arbitrary binary data to plain text. This format is quite
popular on Usenet, but is slowly being superseded by base64 encoding.
A UU encoder takes groups of three bytes (24 bits), and converts each group to a sequence of four
printable characters (6 bits per character), using characters from chr(32) (space) to chr(95). Including
the length marker and line feed characters, UU encoding typically expands data by 40%.
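To see how three bytes become four characters, here's a sketch for current Python 3 that encodes a single group by hand. The uu_group function is my own illustration; the result is compared with binascii.b2a_uu, which also adds the length marker and the line feed.

```python
import binascii


def uu_group(three_bytes):
    # pack 3 bytes into a 24-bit integer, then split into four 6-bit values
    n = (three_bytes[0] << 16) | (three_bytes[1] << 8) | three_bytes[2]
    sixbits = [(n >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    # add 32 so that each value becomes a printable character
    return "".join(chr(32 + v) for v in sixbits)


print(uu_group(b"Cat"))
print(repr(binascii.b2a_uu(b"Cat")))
```

The leading "#" in the binascii output is the length marker: chr(32 + 3), for a line with three bytes of data.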
An encoded data stream starts with a begin line, which also includes the file privileges (the Unix mode
field, as an octal number) and the filename, and ends with an end line:
begin 666 sample.jpg
M_]C_X 02D9)1@ ! 0 0 ! #_VP!# @&!@<&!0@'!P<)'0@*#!0-# L+
...more lines like this...
end
import uu
import os, sys
infile = "samples/sample.jpg"
decode(infile, outfile) decodes uu-encoded data from the input text file, and writes it to the output
file. Again, both arguments can be either filenames or file objects.
import uu
import StringIO
infile = "samples/sample.uue"
outfile = "samples/sample.jpg"
#
# decode
fi = open(infile)
fo = StringIO.StringIO()
uu.decode(fi, fo)
#
# compare with original data file
if fo.getvalue() == data:
    print len(data), "bytes ok"
import binascii
data = binascii.b2a_base64(text)
text = binascii.a2b_base64(data)
print text, "<=>", repr(data)
data = binascii.b2a_uu(text)
text = binascii.a2b_uu(data)
print text, "<=>", repr(data)
data = binascii.b2a_hqx(text)
text = binascii.a2b_hqx(data)[0]
print text, "<=>", repr(data)
File Formats
Overview
This chapter describes a number of modules that are used to parse different file formats.
Markup Languages
Python comes with extensive support for the Extensible Markup Language (XML) and Hypertext
Markup Language (HTML) file formats. Python also provides basic support for Standard Generalized
Markup Language (SGML).
All these formats share the same basic structure (this isn't so strange, since both HTML and XML are
derived from SGML). Each document contains a mix of start tags, end tags, plain text (also called
character data), and entity references.
<document name="sample.xml">
    <header>This is a header</header>
    <body>This is the body text. The text can contain
    plain text (&quot;character data&quot;), tags, and
    entities.
    </body>
</document>
In the above example, <document>, <header>, and <body> are start tags. For each start tag,
there's a corresponding end tag which looks similar, but has a slash before the tag name. The start tag
can also contain one or more attributes, like the name attribute in this example.
Everything between a start tag and its matching end tag is called an element. In the above example, the
document element contains two other elements, header and body.
Finally, &quot; is a character entity. Entities are used to represent reserved characters in the text
sections; in this case, the entity stands for a double quotation mark ("). Other common entities include
&amp; for the ampersand (&), which is reserved because it starts the entity itself, &lt; for "less than" (<),
and &gt; for "greater than" (>).
While XML, HTML, and SGML all share the same building blocks, there are important differences
between them. In XML, all elements must have both start tags and end tags, and the tags must be
properly nested (if they are, the document is said to be well-formed). In addition, XML is case-
sensitive, so <document> and <Document> are two different element types.
HTML, in contrast, is much more flexible. The HTML parser can often fill in missing tags; for example,
if you open a new paragraph in HTML using the <P> tag without closing the previous paragraph, the
parser automatically adds a </P> end tag. HTML is also case-insensitive. On the other hand, XML
allows you to define your own elements, while HTML uses a fixed element set, as defined by the HTML
specifications.
SGML is even more flexible. In its full incarnation, you can use a custom declaration to define how to
translate the source text into an element structure, and a document type definition (DTD) to validate
the structure, and fill in missing tags. Technically, both HTML and XML are SGML applications; they
both have their own SGML declaration, and HTML also has a standard DTD.
Python comes with parsers for all markup flavors. While SGML is the most flexible of the formats,
Python's sgmllib parser is actually pretty simple. It avoids most of the problems by only
understanding enough of the SGML standard to be able to deal with HTML. It doesn't handle
document type definitions either; instead, you can customize the parser via subclassing.
The HTML support is built on top of the SGML parser. The htmllib parser delegates the actual
rendering to a formatter object. The formatter module contains a couple of standard formatters.
The XML support is the most complex. In Python 1.5.2, the built-in support was limited to the xmllib
parser, which is pretty similar to the sgmllib module (with one important difference; xmllib actually
tries to support the entire XML standard).
Python 2.0 comes with more advanced XML tools, based on the optional expat parser.
Configuration Files
The ConfigParser module reads and writes a simple configuration file format, similar to Windows
INI files.
The netrc module reads .netrc configuration files, and the shlex module can be used to read any
configuration file using a shell script-like syntax.
Archive Formats
Python's standard library also provides support for the popular GZIP and ZIP (2.0 only) formats. The
gzip module can read and write GZIP files, and the zipfile module reads and writes ZIP files. Both
modules depend on the zlib data compression module.
Python Standard Library: File Formats 5-3
import xmllib
class Parser(xmllib.XMLParser):
# get quotation number
try:
    c = Parser()
    c.load(open("samples/sample.xml"))
except EOFError:
    pass
id => 031
The second example contains a simple (and incomplete) rendering engine. The parser maintains an
element stack (__tags), which it passes to the renderer, together with text fragments. The renderer
looks the current tag hierarchy up in a style dictionary, and if it isn't already there, it creates a new style
descriptor by combining bits and pieces from the style sheet.
Example: Using the xmllib module
# File:xmllib-example-2.py
import xmllib
import string, sys
STYLESHEET = {
    # each element can contribute one or more style elements
    "quotation": {"style": "italic"},
    "lang": {"weight": "bold"},
    "name": {"weight": "medium"},
}
class Parser(xmllib.XMLParser):
# a simple styling engine
class DumbRenderer:

    def __init__(self):
        self.cache = {}
#
# try it out
r = DumbRenderer()
c = Parser(r)
c.load(open("samples/sample.xml"))
class Parser:

    def __init__(self):
        self._parser = expat.ParserCreate()
        self._parser.StartElementHandler = self.start
        self._parser.EndElementHandler = self.end
        self._parser.CharacterDataHandler = self.data

    def close(self):
        self._parser.Parse("", 1) # end of data
        del self._parser # get rid of circular references
p = Parser()
p.feed("<tag>data</tag>")
p.close()
START u'tag' {}
DATA u'data'
END u'tag'
Note that the parser returns Unicode strings, even if you pass it ordinary text. By default, the parser
interprets the source text as UTF-8 (as per the XML standard). To use other encodings, make sure the
XML file contains an encoding directive.
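For comparison, here's a complete version of the wrapper class for current Python 3, where the parser lives in the xml.parsers.expat module and all strings are Unicode strings. The feed, start, end and data methods are my own assumptions about what such a class needs; this sketch collects the events in a list instead of printing them.

```python
from xml.parsers import expat

events = []


class Parser:

    def __init__(self):
        self._parser = expat.ParserCreate()
        self._parser.StartElementHandler = self.start
        self._parser.EndElementHandler = self.end
        self._parser.CharacterDataHandler = self.data

    def feed(self, data):
        self._parser.Parse(data, False)

    def close(self):
        self._parser.Parse("", True)  # end of data
        del self._parser  # get rid of circular references

    def start(self, tag, attrs):
        events.append(("START", tag, attrs))

    def end(self, tag):
        events.append(("END", tag))

    def data(self, data):
        events.append(("DATA", data))


p = Parser()
p.feed("<tag>data</tag>")
p.close()

for event in events:
    print(event)
```

Note that the parser is free to deliver character data in several pieces, so code that processes larger documents should be prepared to join consecutive DATA events.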
class Parser:

    def __init__(self):
        self._parser = expat.ParserCreate()
        self._parser.StartElementHandler = self.start
        self._parser.EndElementHandler = self.end
        self._parser.CharacterDataHandler = self.data

    def close(self):
        self._parser.Parse("", 1) # end of data
        del self._parser # get rid of circular references
p = Parser()
p.feed("""\
<?xml version='1.0' encoding='iso-8859-1'?>
<author>
<name>fredrik lundh</name>
<city>linköping</city>
</author>
"""
)
p.close()
START u'author' {}
DATA u'\012'
START u'name' {}
DATA u'fredrik lundh'
END u'name'
DATA u'\012'
START u'city' {}
DATA u'link\366ping'
END u'city'
DATA u'\012'
END u'author'
import sgmllib
import string
class FoundTitle(Exception):
    pass

class ExtractTitle(sgmllib.SGMLParser):

    def end_title(self):
        self.title = string.join(self.data, "")
        raise FoundTitle # abort parsing!

def extract(file):
    # extract title from an HTML/SGML stream
    p = ExtractTitle()
    try:
        while 1:
            # read small chunks
            s = file.read(512)
            if not s:
                break
            p.feed(s)
        p.close()
    except FoundTitle:
        return p.title
    return None
#
# try it out
To handle all tags, override the unknown_starttag and unknown_endtag methods instead:
Example: Using the sgmllib module to format an SGML document
# File:sgmllib-example-2.py
import sgmllib
import cgi, sys
class PrettyPrinter(sgmllib.SGMLParser):
    # A simple SGML pretty printer

    def __init__(self):
        # initialize base class
        sgmllib.SGMLParser.__init__(self)
        self.flag = 0

    def newline(self):
        # force newline, if necessary
        if self.flag:
            sys.stdout.write("\n")
        self.flag = 0
self.newline()
sys.stdout.write("<%s%s>\n" % (tag, text))
#
# try it out
file = open("samples/sample.sgm")
p = PrettyPrinter()
p.feed(file.read())
p.close()
<chapter>
<title>
Quotations
<title>
<epigraph>
<attribution>
eff-bot, June 1997
<attribution>
<para>
<quote>
Nobody expects the Spanish Inquisition! Amongst
our weaponry are such diverse elements as fear, surprise,
ruthless efficiency, and an almost fanatical devotion to
Guido, and nice red uniforms — oh, damn!
<quote>
<para>
<epigraph>
<chapter>
The following example checks if an SGML document is "well-formed", in the XML sense. In a well-
formed document, all elements are properly nested, and there's one end tag for each start tag.
To check this, we simply keep a list of open tags, and check that each end tag closes a matching start
tag, and that there are no open tags when we reach the end of the document.
Example: Using the sgmllib module to check if an SGML document is well-formed
# File:sgmllib-example-3.py
import sgmllib
class WellFormednessChecker(sgmllib.SGMLParser):
    # check that an SGML document is 'well formed'
    # (in the XML sense).

    def close(self):
        sgmllib.SGMLParser.close(self)
        if self.tags:
            raise SyntaxError, "start tag %s not closed" % self.tags[-1]

try:
    c = WellFormednessChecker()
    c.load(open("samples/sample.htm"))
except SyntaxError:
    raise # report error
else:
    print "document is wellformed"
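The sgmllib module is not available in current Python 3, but the open-tag list technique works just as well with the html.parser module. The following sketch is my own substitute, not a port of the class above: it uses HTMLParser as the tokenizer, and the is_well_formed helper is a hypothetical convenience function.

```python
from html.parser import HTMLParser


class WellFormednessChecker(HTMLParser):
    # keeps a list of open tags, and checks each end tag against it

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_endtag(self, tag):
        if not self.tags:
            raise SyntaxError("end tag %s does not match any open tag" % tag)
        open_tag = self.tags.pop()
        if open_tag != tag:
            raise SyntaxError("end tag %s does not match start tag %s"
                              % (tag, open_tag))

    def close(self):
        super().close()
        if self.tags:
            raise SyntaxError("start tag %s not closed" % self.tags[-1])


def is_well_formed(text):
    checker = WellFormednessChecker()
    try:
        checker.feed(text)
        checker.close()
    except SyntaxError:
        return False
    return True


print(is_well_formed("<a><b>text</b></a>"))
print(is_well_formed("<a><b>text</a></b>"))
print(is_well_formed("<a><b>text</b>"))
```

Note that HTMLParser converts tag names to lowercase, so this check is case-insensitive; a real XML parser would be stricter.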
Finally, here's a class that allows you to filter HTML and SGML documents. To use this class, create
your own subclass, and implement the start and end methods.
Example: Using the sgmllib module to filter SGML documents
# File:sgmllib-example-4.py
import sgmllib
import cgi, string, sys
class SGMLFilter(sgmllib.SGMLParser):
# sgml filter. override start/end to manipulate
# document elements
class Filter(SGMLFilter):
c = Filter()
c.load(open("samples/sample.htm"))
import htmllib
import formatter
import string
class Parser(htmllib.HTMLParser):
    # return a dictionary mapping anchor texts to lists
    # of associated hyperlinks

    def anchor_end(self):
        text = string.strip(self.save_end())
        if self.anchor and text:
            self.anchors[text] = self.anchors.get(text, []) + [self.anchor]
file = open("samples/sample.htm")
html = file.read()
file.close()
p = Parser()
p.feed(html)
p.close()
for k, v in p.anchors.items():
    print k, "=>", v
If you're only out to parse an HTML file, and not render it to an output device, it's usually easier to use
the sgmllib module instead.
import htmlentitydefs
entities = htmlentitydefs.entitydefs
amp = &
quot = "
copy = ©
yen = ¥
The following example shows how to combine regular expressions with this dictionary to translate
entities in a string (the opposite of cgi.escape):
Example: Using the htmlentitydefs module to translate entities
# File:htmlentitydefs-example-2.py
import htmlentitydefs
import re
import cgi
pattern = re.compile("&(\w+?);")
def descape(string):
    return pattern.sub(descape_entity, string)

print descape("&lt;spam&amp;eggs&gt;")
print descape(cgi.escape("<spam&eggs>"))
<spam&eggs>
<spam&eggs>
Finally, the following example shows how to translate reserved XML characters and ISO Latin 1
characters to an XML string. This is similar to cgi.escape, but it also replaces non-ASCII characters.
Example: Escaping ISO Latin 1 entities
# File:htmlentitydefs-example-3.py
import htmlentitydefs
import re, string
for i in range(256):
    entity_map[chr(i)] = "&#%d;" % i

def escape(string):
    return pattern.sub(escape_entity, string)
print escape("<spam&eggs>")
print escape("å i åa ä e ö")
&lt;spam&amp;eggs&gt;
&aring; i &aring;a &auml; e &ouml;
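As a rough illustration of the escaping idea, here is a sketch for Python 3. It is not the script above: it assumes the html.entities module and its codepoint2name table, walks the string character by character instead of using a precompiled pattern, and falls back to numeric character references where no entity name is known.

```python
from html.entities import codepoint2name

RESERVED = '&<>"'

def escape(text):
    # replace reserved and non-ASCII characters with entities
    out = []
    for ch in text:
        code = ord(ch)
        if ch in RESERVED or code >= 0x80:
            name = codepoint2name.get(code)
            if name:
                out.append("&%s;" % name)
            else:
                out.append("&#%d;" % code)  # no named entity available
        else:
            out.append(ch)
    return "".join(out)

print(escape("<spam&eggs>"))
print(escape("\xe5 i \xe5a \xe4 e \xf6"))
```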
import formatter
import htmllib
w = formatter.AbstractWriter()
f = formatter.AbstractFormatter(w)
file = open("samples/sample.htm")
p = htmllib.HTMLParser(f)
p.feed(file.read())
p.close()
file.close()
send_paragraph(1)
new_font(('h1', 0, 1, 0))
send_flowing_data('A Chapter.')
send_line_break()
send_paragraph(1)
new_font(None)
send_flowing_data('Some text. Some more text. Some')
send_flowing_data(' ')
new_font((None, 1, None, None))
send_flowing_data('emphasised')
new_font(None)
send_flowing_data(' text. A')
send_flowing_data(' link')
send_flowing_data('[1]')
send_flowing_data('.')
In addition to the AbstractWriter class, the formatter module provides a NullWriter class,
which ignores all events passed to it, and a DumbWriter class that converts the event stream to a
plain text document:
Example: Using the formatter module to convert HTML to plain text
# File:formatter-example-2.py
import formatter
import htmllib
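w = formatter.DumbWriter() # plain text
f = formatter.AbstractFormatter(w)

file = open("samples/sample.htm")

# print html body as plain text
p = htmllib.HTMLParser(f)
p.feed(file.read())
p.close()

file.close()

# print links
print
print
i = 1
for link in p.anchorlist:
    print i, "=>", link
    i = i + 1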
A Chapter.
1 => http://www.python.org
The following example provides a custom Writer, which in this case is subclassed from the
DumbWriter class. This version keeps track of the current font style, and tweaks the output
somewhat depending on the font.
Example: Using the formatter module with a custom writer
# File:formatter-example-3.py
import formatter
import htmllib, string
class Writer(formatter.DumbWriter):

    def __init__(self):
        formatter.DumbWriter.__init__(self)
        self.tag = ""
        self.bold = self.italic = 0
        self.fonts = []
w = Writer()
f = formatter.AbstractFormatter(w)
file = open("samples/sample.htm")
_A_ _CHAPTER._
[ematter]
pages: 250
[hardcopy]
pages: 350
import ConfigParser
import string
config = ConfigParser.ConfigParser()
config.read("samples/sample.ini")
# print summary
print
print string.upper(config.get("book", "title"))
print "by", config.get("book", "author"),
print "(" + config.get("book", "email") + ")"
print
print config.get("ematter", "pages"), "pages"
print
250 pages
book
title = Python Standard Library
email = [email protected]
author = Fredrik Lundh
version = 2.0-010504
__name__ = book
ematter
__name__ = ematter
pages = 250
hardcopy
__name__ = hardcopy
pages = 350
In Python 2.0, this module also allows you to write configuration data to a file.
Example: Using the ConfigParser module to write configuration data
# File:configparser-example-2.py
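import ConfigParser
import sys

config = ConfigParser.ConfigParser()

# set a number of parameters
config.add_section("book")
config.set("book", "title", "the python standard library")
config.set("book", "author", "fredrik lundh")

config.add_section("ematter")
config.set("ematter", "pages", 250)

# write to screen
config.write(sys.stdout)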
[book]
title = the python standard library
author = fredrik lundh
[ematter]
pages = 250
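For readers on a current interpreter, the write-then-read round trip might look like the following sketch. It assumes Python 3, where the module is spelled configparser and option values must be strings; the in-memory buffer stands in for a real file, and the section and option names are taken from the sample above.

```python
import configparser
import io

config = configparser.ConfigParser()

# set a number of parameters
config.add_section("book")
config.set("book", "title", "the python standard library")
config.add_section("ematter")
config.set("ematter", "pages", "250")  # Python 3 requires string values

# write to an in-memory buffer instead of a file
buffer = io.StringIO()
config.write(buffer)
text = buffer.getvalue()
print(text)

# read the data back and convert the page count to an integer
reread = configparser.ConfigParser()
reread.read_string(text)
pages = reread.getint("ematter", "pages")
print(pages, "pages")
```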
import netrc
# default is $HOME/.netrc
info = netrc.netrc("samples/sample.netrc")
import shlex

lexer = shlex.shlex(open("samples/sample.netrc", "r"))
lexer.wordchars = lexer.wordchars + "._"

while 1:
    token = lexer.get_token()
    if not token:
        break
    print repr(token)
'machine'
'secret.fbi'
'login'
'mulder'
'password'
'trustno1'
'machine'
'non.secret.fbi'
'login'
'scully'
'password'
'noway'
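The token loop can be tried without a sample file. The sketch below assumes Python 3, where shlex.shlex also accepts a plain string as input; the SAMPLE text is a made-up line in the same style as the output above, and adding "." to wordchars is what keeps a host name like secret.fbi in one token.

```python
import shlex

# hypothetical netrc-style input, inlined so no sample file is needed
SAMPLE = "machine secret.fbi login mulder password trustno1\n"

lexer = shlex.shlex(SAMPLE)
lexer.wordchars = lexer.wordchars + "._"  # treat dots as word characters

tokens = []
while True:
    token = lexer.get_token()
    if not token:
        break  # an empty token signals end of input
    tokens.append(token)

for token in tokens:
    print(repr(token))
```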
To list the contents of an existing archive, you can use the namelist and infolist methods. The former
returns a list of filenames, the latter a list of ZipInfo instances.
Example: Using the zipfile module to list files in a ZIP file
# File:zipfile-example-1.py
import zipfile
# list filenames
for name in file.namelist():
    print name,
print
sample.txt sample.jpg
sample.txt (1999, 9, 11, 20, 11, 8) 302
sample.jpg (1999, 9, 18, 16, 9, 44) 4762
import zipfile
Adding files to an archive is easy. Just pass the file name, and the name you want that file to have in
the archive, to the write method.
The following script creates a ZIP file containing all files in the samples directory.
Example: Using the zipfile module to store files in a ZIP file
# File:zipfile-example-3.py
import zipfile
import glob, os
file.close()
The third, optional argument to the write method controls what compression method to use. Or
rather, it controls whether data should be compressed at all. The default is zipfile.ZIP_STORED,
which stores the data in the archive without any compression at all. If the zlib module is installed, you
can also use zipfile.ZIP_DEFLATED, which gives you "deflate" compression.
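To see what the compression argument does, the following sketch stores the same file twice and compares the sizes recorded in the archive. It assumes Python 3 and an installed zlib module; the helper names store_files and sizes, and the temporary file names, are made up for this illustration.

```python
import os
import tempfile
import zipfile

def store_files(archive_path, paths, method=zipfile.ZIP_STORED):
    # the third argument to write selects the compression method
    archive = zipfile.ZipFile(archive_path, "w")
    try:
        for path in paths:
            archive.write(path, os.path.basename(path), method)
    finally:
        archive.close()

def sizes(archive_path):
    # map archive names to (uncompressed size, compressed size)
    archive = zipfile.ZipFile(archive_path)
    try:
        return dict((info.filename, (info.file_size, info.compress_size))
                    for info in archive.infolist())
    finally:
        archive.close()

with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.txt")
    with open(sample, "wb") as f:
        f.write(b"spam and eggs\n" * 200)
    stored = os.path.join(tmp, "stored.zip")
    deflated = os.path.join(tmp, "deflated.zip")
    store_files(stored, [sample])
    store_files(deflated, [sample], zipfile.ZIP_DEFLATED)
    stored_sizes = sizes(stored)
    deflated_sizes = sizes(deflated)

print("stored:  ", stored_sizes)
print("deflated:", deflated_sizes)
```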
The zipfile module also allows you to add strings to the archive. However, adding data from a string is
a bit tricky; instead of just passing in the archive name and the data, you have to create a ZipInfo
instance and configure it correctly. Here's a simple example:
Example: Using the zipfile module to store strings in a ZIP file
# File:zipfile-example-4.py
import zipfile
import glob, os, time
now = time.localtime(time.time())[:6]
file.close()
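Here is a small self-contained sketch of the ZipInfo approach. It assumes Python 3 and an installed zlib module, and it writes to an in-memory buffer so nothing touches the disk; the archive name hello.txt is just a placeholder.

```python
import io
import time
import zipfile

now = time.localtime(time.time())[:6]

buffer = io.BytesIO()

archive = zipfile.ZipFile(buffer, "w")
info = zipfile.ZipInfo("hello.txt")        # name inside the archive
info.date_time = now                       # (year, month, day, hour, min, sec)
info.compress_type = zipfile.ZIP_DEFLATED  # per-member compression method
archive.writestr(info, "hello, world\n")
archive.close()

# read the archive back to check the result
archive = zipfile.ZipFile(buffer)
names = archive.namelist()
data = archive.read("hello.txt")
archive.close()

print(names, repr(data))
```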
import gzip
file = gzip.GzipFile("samples/sample.gz")
print file.read()
The standard implementation doesn't support the seek and tell methods. The following example
shows how to add forward seeking:
Example: Extending the gzip module to support seek/tell
# File:gzip-example-2.py
import gzip
class gzipFile(gzip.GzipFile):
    # adds seek/tell support to GzipFile

    offset = 0

    def tell(self):
        return self.offset
#
# try it
file = gzipFile("samples/sample.gz")
file.seek(80)
print file.read()
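The forward-seeking idea (read and throw away data until you reach the target offset) can be sketched as a small wrapper around any readable stream. This is my own illustration for Python 3, not the class from the script above, and note that current versions of gzip.GzipFile already provide seek and tell themselves. The name ForwardSeekReader is hypothetical.

```python
import gzip
import io

class ForwardSeekReader:
    # adds tell and forward-only seek to a read-only stream

    def __init__(self, stream):
        self.stream = stream
        self.offset = 0

    def read(self, size=-1):
        data = self.stream.read(size)
        self.offset = self.offset + len(data)
        return data

    def tell(self):
        return self.offset

    def seek(self, offset, whence=0):
        if whence == 1:
            offset = self.offset + offset  # relative to current position
        elif whence != 0:
            raise ValueError("illegal whence argument")
        if offset < self.offset:
            raise IOError("cannot seek backwards")
        # skip forward by reading and discarding data
        while self.offset < offset:
            chunk = self.read(min(16384, offset - self.offset))
            if not chunk:
                break  # hit end of file

# try it
packed = io.BytesIO(gzip.compress(b"0123456789" * 10))
file = ForwardSeekReader(gzip.GzipFile(fileobj=packed))
file.seek(80)
tail = file.read()
print(tail, file.tell())
```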
Overview
Python comes with a rich set of modules for processing mail and news messages, as well as some
common mail archive (mailbox) formats.
Message-Id: <[email protected]>
Date: Tue, 14 Nov 2000 14:55:07 -0500
To: "Fredrik Lundh" <[email protected]>
From: Frank
Subject: Re: python library book!
Where is it?
The message parser reads the headers, and returns a dictionary-like object, with the message headers
as keys.
import rfc822
file = open("samples/sample.eml")
message = rfc822.Message(file)
for k, v in message.items():
    print k, "=", v
The message object also provides a couple of convenience methods, which parse address fields and
dates for you:
import rfc822
file = open("samples/sample.eml")
message = rfc822.Message(file)
print message.getdate("date")
print message.getaddr("from")
print message.getaddrlist("to")
The address fields are parsed into (real name, mail address) tuples. The date field is parsed into a 9-element
time tuple, ready for use with the time module.
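The shape of these return values is easy to check on a current interpreter. The sketch below is an illustration only: it assumes Python 3, where the rfc822 module no longer exists and the email.utils helpers do comparable parsing, and the mail addresses are placeholders I made up (the real ones are not shown in the sample message).

```python
from email.utils import getaddresses, parseaddr, parsedate

# header values modelled on the sample message; addresses are placeholders
headers = {
    "date": "Tue, 14 Nov 2000 14:55:07 -0500",
    "from": "Frank <frank@example.com>",
    "to": '"Fredrik Lundh" <fredrik@example.com>',
}

sender = parseaddr(headers["from"])          # (real name, address)
recipients = getaddresses([headers["to"]])   # list of such tuples
timestamp = parsedate(headers["date"])       # 9-element time tuple

print(sender)
print(recipients)
print(timestamp)
```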
Python Standard Library: Mail and News Processing 6-4
import mimetools
file = open("samples/sample.msg")
msg = mimetools.Message(file)
import MimeWriter
# data encoders
import quopri
import base64
import StringIO
import sys
TEXT = """
here comes the image you asked for. hope
it's what you expected.
</F>"""
FILE = "samples/sample.jpg"
file = sys.stdout
#
# create a mime multipart writer instance
mime = MimeWriter.MimeWriter(file)
mime.addheader("Mime-Version", "1.0")
mime.startmultipartbody("mixed")
# add a text message
part = mime.nextpart()
part.addheader("Content-Transfer-Encoding", "quoted-printable")
part.startbody("text/plain")
quopri.encode(StringIO.StringIO(TEXT), file, 0)

# add an image
part = mime.nextpart()
part.addheader("Content-Transfer-Encoding", "base64")
part.startbody("image/jpeg")
base64.encode(open(FILE, "rb"), file)

mime.lastpart()
--host.1.-852461.936831373.130.24813
Content-Type: text/plain
Content-Transfer-Encoding: quoted-printable
</F>
--host.1.-852461.936831373.130.24813
Content-Type: image/jpeg
Content-Transfer-Encoding: base64
/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAgGBgcGBQgHBwcJCQgKDBQNDAsLDBkSEw8UHRofHh0a
HBwgJC4nICIsIxwcKDcpLDAxNDQ0Hyc5PTgyPC4zNDL/2wBDAQkJCQwLDBgNDRgyIRwhMjIyMjIy
...
1e5vLrSYbJnEVpEgjCLx5mPU0qsVK0UaxjdNlS+1U6pfzTR8IzEhj2HrVG6m8m18xc8cIKSCAysl
tCuFyC746j/Cq2pTia4WztfmKjGBXTCmo6IUptn/2Q==
--host.1.-852461.936831373.130.24813--
Here's a larger example, which uses a helper class that stores each subpart in the most suitable way:
import MimeWriter
import string, StringIO, sys
import re, quopri, base64
#
# encoders
class Writer:
file = self.mime.startmultipartbody("mixed")
if blurb:
file.write(blurb)
def close(self):
"End of message"
self.mime.lastpart()
self.mime = self.file = None
part = self.mime.nextpart()
if typ == "text":
# text data
encoding = "quoted-printable"
encoder = lambda i, o: quopri.encode(i, o, 0)
else:
#
# write part headers
if encoding:
part.addheader("Content-Transfer-Encoding", encoding)
part.startbody(mimetype)
#
# write part body
if encoder:
encoder(file, self.file)
elif data:
self.file.write(data)
else:
while 1:
data = infile.read(16384)
if not data:
break
outfile.write(data)
#
# try it out
BLURB = "if you can read this, your mailer is not MIME-aware\n"
# add an image
mime.write(open("samples/sample.jpg", "rb"), "image/jpeg")
mime.close()
import mailbox
mb = mailbox.UnixMailbox(open("/var/spool/mail/effbot"))
while 1:
    msg = mb.next()
    if not msg:
        break
    for k, v in msg.items():
        print k, "=", v
    body = msg.fp.read()
    print len(body), "bytes in body"
import mailcap
caps = mailcap.getcaps()
for k, v in caps.items():
    print k, "=", v
In the above example, the system uses pilview for all kinds of images, and ghostscript viewer for
PostScript documents.
import mailcap

caps = mailcap.getcaps()

command, info = mailcap.findmatch(
    caps, "image/jpeg", "view", "samples/sample.jpg"
    )

print command
pilview samples/sample.jpg
import mimetypes
import glob, urllib
import packmail
import sys
echo sample.txt
sed "s/^X//" >sample.txt <<"!"
XWe will perhaps eventually be writing only small
Xmodules which are identified by name as they are
Xused to build larger ones, so that devices like
Xindentation, rather than delimiters, might become
Xfeasible for expressing local structure in the
Xsource language.
X -- Donald E. Knuth, December 1974
!
import packmail
import sys
packmail.packtree(sys.stdout, "samples")
Note that this module cannot handle binary files, such as images and sound snippets.
import mimify
import sys
mimify.unmimify("samples/sample.msg", sys.stdout, 1)
Here's a MIME message containing two parts, one encoded as quoted-printable, and the other as
base64. The third argument to unmimify controls whether base64-encoded parts should be decoded
or not.
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary='boundary'
--boundary
Content-Type: text/plain
Content-Transfer-Encoding: quoted-printable
--boundary
Content-Type: text/plain
Content-Transfer-Encoding: base64
a29tIG5lciBiYXJhLCBvbSBkdSB09nJzIQ==
--boundary--
And here's the decoded result. Much more readable, at least if you know the language.
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary='boundary'
--boundary
Content-Type: text/plain
--boundary
Content-Type: text/plain
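If you're curious what the base64 part of the sample message says, you can decode it by hand. This sketch assumes Python 3 and assumes the text is ISO Latin 1, since the sample headers don't name a character set.

```python
import base64

# the base64 body from the sample message
encoded = "a29tIG5lciBiYXJhLCBvbSBkdSB09nJzIQ=="

raw = base64.b64decode(encoded)
text = raw.decode("iso-8859-1")  # assumed character set

print(text)
```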
import mimify
import StringIO, sys
#
# decode message into a string buffer
file = StringIO.StringIO()
mimify.unmimify("samples/sample.msg", file, 1)
#
# encode message from string buffer
file.seek(0) # rewind
mimify.mimify(file, sys.stdout)
import multifile
import cgi, rfc822

infile = open("samples/sample.msg")

message = rfc822.Message(infile)

# use cgi support function to parse content-type header
type, params = cgi.parse_header(message["content-type"])

if type[:10] == "multipart/":
    # multipart message
    boundary = params["boundary"]
    file = multifile.MultiFile(infile)
    file.push(boundary)
    while file.next():
        submessage = rfc822.Message(file)
        # print submessage
        print "-" * 68
        for k, v in submessage.items():
            print k, "=", v
        print
        print file.read()
    file.pop()
else:
    # plain message
    print infile.read()
Python Standard Library: Network Protocols 7-1
Network Protocols
"Increasingly, people seem to misinterpret complexity as
sophistication, which is baffling — the incomprehensible should cause
suspicion rather than admiration. Possibly this trend results from a
mistaken belief that using a somewhat mysterious device confers an
aura of power on the user"
Niklaus Wirth
Overview
This chapter describes Python's socket protocol support, and the networking modules built on top of
the socket module. This includes client handlers for most popular Internet protocols, as well as several
frameworks that can be used to implement Internet servers.
For the low-level examples in this chapter I'll use two protocols for illustration: the Internet Time
Protocol, and the Hypertext Transfer Protocol.
The Internet Time Protocol (RFC 868, Postel and Harrenstien 1983) is a simple protocol which allows
a network client to get the current time from a server.
Since this protocol is relatively lightweight, many (but far from all) Unix systems provide this service.
It's also about as easy to implement as a network protocol can possibly be. The server simply waits for
a connection request, and immediately returns the current time as a 4-byte integer, containing the
number of seconds since January 1st, 1900.
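The only arithmetic involved is the offset between the protocol's 1900 base and the 1970 base used by the time module. Here's a minimal sketch of the conversion, assuming Python 3; the helper names rfc868_to_unix and unix_to_rfc868 are made up for this illustration.

```python
import struct
import time

# seconds between 1900-01-01 and 1970-01-01 (the base used by time.time)
TIME1970 = 2208988800

def rfc868_to_unix(packet):
    # the server sends a 4-byte unsigned integer in network byte order
    (seconds_since_1900,) = struct.unpack("!I", packet)
    return seconds_since_1900 - TIME1970

def unix_to_rfc868(timestamp):
    # mask to 32 bits; the counter wraps around in 2036
    return struct.pack("!I", (int(timestamp) + TIME1970) & 0xFFFFFFFF)

packet = unix_to_rfc868(0)
print(repr(packet))
print(time.asctime(time.gmtime(rfc868_to_unix(packet))))
```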
In fact, the protocol is so simple that I can include the entire specification:
Time Protocol
This RFC specifies a standard for the ARPA Internet community. Hosts on
the ARPA Internet that choose to implement a Time Protocol are expected
to adopt and implement this standard.
One motivation arises from the fact that not all systems have a
date/time clock, and all are subject to occasional human or machine
error. The use of time-servers makes it possible to quickly confirm or
correct a system's idea of the time, by making a brief poll of several
independent sites on the network.
This protocol may be used either above the Transmission Control Protocol
(TCP) or above the User Datagram Protocol (UDP).
The server listens for a connection on port 37. When the connection
is established, the server returns a 32-bit time value and closes the
connection. If the server is unable to determine the time at its
site, it should either refuse the connection or close it without
sending anything.
The Time
The time is the number of seconds since 00:00 (midnight) 1 January 1900
GMT, such that the time 1 is 12:00:01 am on 1 January 1900 GMT; this
base will serve until the year 2036.
For example:
The Hypertext Transfer Protocol (HTTP, Fielding et al., RFC 2616) is something completely different.
The most recent specification (version 1.1) is over 100 pages.
However, in its simplest form, this protocol is very straightforward. To fetch a document, the client
connects to the server, and sends a request like:
GET /hello.txt HTTP/1.0
Host: hostname
User-Agent: name
Hello
Both the request and response headers usually contain more fields, but the Host field in the request
header is the only one that must always be present.
The header lines are separated by "\r\n", and the header must be followed by an empty line, even if
there is no body (this applies to both the request and the response).
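To make the line-ending rules concrete, here's a small sketch that assembles such a request as a byte string. It assumes Python 3; the function name build_request and the default User-Agent value are placeholders of my own.

```python
def build_request(path, host, agent="sketch/0.1"):
    # header lines are separated by CRLF ...
    lines = [
        "GET %s HTTP/1.0" % path,
        "Host: %s" % host,
        "User-Agent: %s" % agent,
    ]
    # ... and the header is terminated by an empty line
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")

request = build_request("/hello.txt", "hostname")
print(repr(request))
```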
The rest of the HTTP specification deals with stuff like content negotiation, cache mechanics, persistent
connections, and much more. For the full story, see Hypertext Transfer Protocol — HTTP/1.1.
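import socket
import struct, time

# server
HOST = "www.python.org"
PORT = 37

# reference time (in seconds since 1900-01-01 00:00:00)
TIME1970 = 2208988800L # 1970-01-01 00:00:00

# connect to server
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((HOST, PORT))

# read 4 bytes, and convert to time value
t = s.recv(4)
t = struct.unpack("!I", t)[0]
t = int(t - TIME1970)

s.close()

# print results
print "server time is", time.ctime(t)
print "local clock is", int(time.time()) - t, "seconds off"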
The socket factory function creates a new socket of the given type (in this case, an Internet stream
socket, also known as a TCP socket). The connect method attempts to connect this socket to the given
server. Once that has succeeded, the recv method is used to read data.
Creating a server socket is done in a similar fashion. But instead of connecting to a server, you bind
the socket to a port on the local machine, tell it to listen for incoming connection requests, and
process each request as fast as possible.
The following example creates a time server, bound to port 8037 on the local machine (port numbers
below 1024 are reserved for system services, and you need root privileges to use them to
implement services on a Unix system):
import socket
import struct, time
# user-accessible port
PORT = 8037
# reference time
TIME1970 = 2208988800L
# establish server
service = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
service.bind(("", PORT))
service.listen(1)
while 1:
    # serve forever
    channel, info = service.accept()
    print "connection from", info
    t = int(time.time()) + TIME1970
    t = struct.pack("!I", t)
    channel.send(t) # send timestamp
    channel.close() # disconnect
The listen call tells the socket that we're willing to accept incoming connections. The argument gives
the size of the connection queue (which holds connection requests that our program hasn't gotten
around to processing yet). Finally, the accept loop returns the current time to any client bold enough
to connect.
Note that the accept function returns a new socket object, which is directly connected to the client.
The original socket is only used to establish the connection; all further traffic goes via the new socket.
To test this server, we can use the following generalized version of our first example:
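import socket
import struct, sys, time

# default server
host = "localhost"
port = 8037

# reference time (in seconds since 1900-01-01 00:00:00)
TIME1970 = 2208988800L # 1970-01-01 00:00:00

def gettime(host, port):
    # fetch time buffer from stream server
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect((host, port))
    t = s.recv(4)
    s.close()
    t = struct.unpack("!I", t)[0]
    return int(t - TIME1970)

if __name__ == "__main__":
    # command line utility
    if sys.argv[1:]:
        host = sys.argv[1]
        if sys.argv[2:]:
            port = int(sys.argv[2])
        else:
            port = 37 # default for public servers
    t = gettime(host, port)
    print "server time is", time.ctime(t)
    print "local clock is", int(time.time()) - t, "seconds off"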
This sample script can also be used as a module; to get the current time from a server, import the
timeclient module, and call the gettime function.
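If you want to see the stream server and client talk to each other without opening two terminals, something like the following sketch can be used. It's written for Python 3 under my own assumptions: the server handles a single request in a background thread, binds to the loopback interface only, and lets the operating system pick a free port (port 0) instead of using 8037.

```python
import socket
import struct
import threading
import time

TIME1970 = 2208988800  # seconds from 1900-01-01 to 1970-01-01

def serve_once(service):
    # accept returns a new socket, connected to the client
    channel, info = service.accept()
    t = (int(time.time()) + TIME1970) & 0xFFFFFFFF
    channel.sendall(struct.pack("!I", t))
    channel.close()

def gettime(host, port):
    s = socket.create_connection((host, port), timeout=5)
    data = b""
    while len(data) < 4:
        chunk = s.recv(4 - len(data))  # recv may return fewer bytes
        if not chunk:
            break
        data = data + chunk
    s.close()
    return struct.unpack("!I", data)[0] - TIME1970

# establish server on a free local port
service = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
service.bind(("127.0.0.1", 0))
service.listen(1)
port = service.getsockname()[1]

worker = threading.Thread(target=serve_once, args=(service,))
worker.start()

server_time = gettime("127.0.0.1", port)
worker.join()
service.close()

offset = int(time.time()) - server_time
print("server time is", time.ctime(server_time))
print("local clock is", offset, "seconds off")
```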
So far, we've used stream (or TCP) sockets. The time protocol specification also mentions UDP
sockets, or datagrams. Stream sockets work pretty much like a phone line; you'll know if someone at
the remote end picks up the receiver, and you'll notice when she hangs up. In contrast, sending
datagrams is more like shouting into a dark room. There might be someone there, but you won't know
unless she replies.
You don't need to connect to send data over a datagram socket. Instead, you use the sendto method,
which takes both the data and the address of the receiver. To read incoming datagrams, use the
recvfrom method.
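The exchange can be tried in a single process, since the operating system queues datagrams until someone reads them. The sketch below assumes Python 3 and a working loopback interface; both sockets get a timeout so the script can't hang if a datagram is lost, and the port is chosen by the system rather than fixed.

```python
import socket
import struct
import time

TIME1970 = 2208988800  # seconds from 1900-01-01 to 1970-01-01

# "server" socket, bound to a free port on the loopback interface
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))
server.settimeout(5)
address = server.getsockname()

# "client" socket; no connect call is needed
client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.settimeout(5)

# the client sends an empty datagram to ask for the time
client.sendto(b"", address)

# the server learns the sender's address from recvfrom ...
data, sender = server.recvfrom(16)
t = (int(time.time()) + TIME1970) & 0xFFFFFFFF

# ... and uses it to send the reply
server.sendto(struct.pack("!I", t), sender)

packet, source = client.recvfrom(4)
server_time = struct.unpack("!I", packet)[0] - TIME1970

client.close()
server.close()

print("reply from", source, "=>", time.ctime(server_time))
```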
import socket
import struct, time
# server
HOST = "localhost"
PORT = 8037
# connect to server
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.close()
Note that recvfrom returns two values; the actual data, and the address of the sender. Use the latter if
you need to reply.
Here's the corresponding server:
Example: Using the socket module to implement a datagram time server
# File:socket-example-5.py
import socket
import struct, time
# user-accessible port
PORT = 8037
# reference time
TIME1970 = 2208988800L
# establish server
service = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
service.bind(("", PORT))
while 1:
    # serve forever
    data, client = service.recvfrom(0)
    print "connection from", client
    t = int(time.time()) + TIME1970
    t = struct.pack("!I", t)
    service.sendto(t, client) # send timestamp
The main difference is that the server uses bind to assign a known port number to the socket, and
sends data back to the client address returned by recvfrom.
Example: Using the select module to wait for data arriving over sockets
# File:select-example-1.py
import select
import socket
import time

PORT = 8037

TIME1970 = 2208988800L

service = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
service.bind(("", PORT))
service.listen(1)

while 1:
    is_readable = [service]
    is_writable = []
    is_error = []
    r, w, e = select.select(is_readable, is_writable, is_error, 1.0)
    if r:
        channel, info = service.accept()
        print "connection from", info
        t = int(time.time()) + TIME1970
        t = chr(t>>24&255) + chr(t>>16&255) + chr(t>>8&255) + chr(t&255)
        channel.send(t) # send timestamp
        channel.close() # disconnect
    else:
        print "still waiting"
In this example, we wait for the listening socket to become readable, which indicates that a connection
request has arrived. We treat the channel socket as usual, since it's not very likely that writing the four
bytes will fill the network buffers. If you need to send larger amounts of data to the client, you should
add it to the is_writable list at the top of the loop, and write only when select tells you to.
If you set the socket in non-blocking mode (by calling the setblocking method), you can also use select
to wait for a socket to become connected. But the asyncore module (see the next section)
provides a powerful framework which handles all this for you, so I won't go into further details here.
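The "readable" test is easy to observe with a pair of connected sockets. This is only a sketch, assuming Python 3 and a platform where socket.socketpair is available: select reports nothing until the other end has written some data.

```python
import select
import socket

# two connected sockets in the same process
a, b = socket.socketpair()

# nothing has been sent yet, so select times out with empty lists
r, w, e = select.select([a], [], [], 0.1)
ready_before = bool(r)

b.sendall(b"ping")

# now the socket shows up in the list of readable objects
r, w, e = select.select([a], [], [], 1.0)
ready_after = bool(r)
data = a.recv(4)

a.close()
b.close()

print(ready_before, ready_after, repr(data))
```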
The first example shows a time client, similar to the one for the socket module:
Example: Using the asyncore module to get the time from a time server
# File:asyncore-example-1.py
import asyncore
import socket, time
class TimeRequest(asyncore.dispatcher):
    # time requestor (as defined in RFC 868)

    def writable(self):
        return 0 # don't have anything to write

    def handle_connect(self):
        pass # connection succeeded

    def handle_expt(self):
        self.close() # connection failed, shutdown

    def handle_read(self):
        # get local time
        here = int(time.time()) + TIME1970
        self.adjust_time(int(here - there))

    def handle_close(self):
        self.close()
#
# try it out
request = TimeRequest("www.python.org")
asyncore.loop()
If you don't want the log messages, override the log method in your dispatcher subclass.
Here's the corresponding time server. Note that it uses two dispatcher subclasses, one for the
listening socket, and one for the client channel.
import asyncore
import socket, time
# reference time
TIME1970 = 2208988800L
class TimeChannel(asyncore.dispatcher):

    def handle_write(self):
        t = int(time.time()) + TIME1970
        t = chr(t>>24&255) + chr(t>>16&255) + chr(t>>8&255) + chr(t&255)
        self.send(t)
        self.close()

class TimeServer(asyncore.dispatcher):

    def handle_accept(self):
        channel, addr = self.accept()
        TimeChannel(channel)
server = TimeServer(8037)
asyncore.loop()
In addition to the plain dispatcher, this module also includes a dispatcher_with_send class. This
class allows you to send larger amounts of data, without clogging up the network transport buffers.
The following module defines an AsyncHTTP class based on the dispatcher_with_send class.
When you create an instance of this class, it issues an HTTP GET request, and sends the incoming data
to a "consumer" target object.
import asyncore
import string, socket
import StringIO
import mimetools, urlparse
class AsyncHTTP(asyncore.dispatcher_with_send):
    # HTTP requestor

        self.uri = uri
        self.consumer = consumer
        self.host = host
        self.port = port
        self.status = None
        self.header = None
        self.data = ""

    def handle_connect(self):
        # connection succeeded
        self.send(self.request)

    def handle_expt(self):
        # connection failed; notify consumer (status is None)
        self.close()
        try:
            http_header = self.consumer.http_header
        except AttributeError:
            pass
        else:
            http_header(self)

    def handle_read(self):
        data = self.recv(2048)
        if not self.header:
            self.data = self.data + data
            try:
                i = string.index(self.data, "\r\n\r\n")
            except ValueError:
                return # continue
            else:
                # parse header
                fp = StringIO.StringIO(self.data[:i+4])
                # status line is "HTTP/version status message"
                status = fp.readline()
                self.status = string.split(status, " ", 2)
                # followed by a rfc822-style message header
                self.header = mimetools.Message(fp)
                # followed by a newline, and the payload (if any)
                data = self.data[i+4:]
                self.data = ""
                # notify consumer (status is non-zero)
                try:
                    http_header = self.consumer.http_header
                except AttributeError:
                    pass
                else:
                    http_header(self)
                if not self.connected:
                    return # channel was closed by consumer

        self.consumer.feed(data)

    def handle_close(self):
        self.consumer.close()
        self.close()
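The header handling in handle_read boils down to three steps: wait until the buffer contains an empty line, split off the status line, and parse the remaining lines as a message header. Here's a stand-alone sketch of those steps, assuming Python 3; the function name split_response is my own, and a plain dictionary stands in for the mimetools.Message object used by the module.

```python
def split_response(buffer):
    # returns (status, headers, payload), or None if the header
    # hasn't been completely received yet
    i = buffer.find(b"\r\n\r\n")
    if i < 0:
        return None  # continue reading
    lines = buffer[:i].decode("iso-8859-1").split("\r\n")
    # status line is "HTTP/version status message"
    status = lines[0].split(" ", 2)
    # followed by "name: value" header lines
    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    # followed by an empty line, and the payload (if any)
    return status, headers, buffer[i+4:]

partial = b"HTTP/1.0 200 OK\r\nContent-Type: text/plain\r\n"
complete = partial + b"Content-Length: 7\r\n\r\nHello\r\n"

print(split_response(partial))
print(split_response(complete))
```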
import SimpleAsyncHTTP
import asyncore
class DummyConsumer:
    size = 0

    def close(self):
        # end of data
        print self.size, "bytes in body"

#
# try it out
consumer = DummyConsumer()
request = SimpleAsyncHTTP.AsyncHTTP(
    "http://www.pythonware.com",
    consumer
    )
asyncore.loop()
Note that the consumer interface is designed to be compatible with the htmllib and xmllib parsers.
This allows you to parse HTML or XML data on the fly. Note that the http_header method is
optional; if it isn't defined, it's simply ignored.
A problem with the above example is that it doesn't work for redirected resources. The following
example adds an extra consumer layer, which handles the redirection:
import SimpleAsyncHTTP
import asyncore
class DummyConsumer:
    size = 0

    def close(self):
        # end of data
        print self.size, "bytes in body"

class RedirectingConsumer:

    def close(self):
        self.consumer.close()

#
# try it out
consumer = RedirectingConsumer(DummyConsumer())
request = SimpleAsyncHTTP.AsyncHTTP(
    "http://www.pythonware.com/library",
    consumer
    )
asyncore.loop()
If the server returns status 301 (permanent redirection) or 302 (temporary redirection), the redirecting
consumer closes the current request, and issues a new one for the new address. All other calls to the
consumer are delegated to the original consumer.
PORT = 8000
class HTTPChannel(asynchat.async_chat):

    def found_terminator(self):
        if not self.request:
            # got the request line
            self.request = string.split(self.data, None, 2)
            if len(self.request) != 3:
                self.shutdown = 1
            else:
                self.push("HTTP/1.0 200 OK\r\n")
                self.push("Content-type: text/html\r\n")
                self.push("\r\n")
            self.data = self.data + "\r\n"
            self.set_terminator("\r\n\r\n") # look for end of headers
        else:
            # return payload.
            self.push("<html><body><pre>\r\n")
            self.push(self.data)
            self.push("</pre></body></html>\r\n")
            self.close_when_done()

class HTTPServer(asyncore.dispatcher):

    def handle_accept(self):
        conn, addr = self.accept()
        HTTPChannel(self, conn, addr)
#
# try it out
s = HTTPServer(PORT)
print "serving at port", PORT, "..."
asyncore.loop()
GET / HTTP/1.1
Accept: */*
Accept-Language: en, sv
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; Bruce/1.0)
Host: localhost:8000
Connection: Keep-Alive
The producer interface allows you to "push" objects that are too large to store in memory. asynchat
calls the producer's more method whenever it needs more data. To signal end of file, just return an
empty string.
The following example implements a very simple file-based HTTP server, using a simple
FileProducer class that reads data from a file, a few kilobytes at a time.
ROOT = "."
PORT = 8000
class HTTPChannel(asynchat.async_chat):

    def found_terminator(self):
        if not self.header:
            # parse http header
            fp = StringIO.StringIO(self.data)
            request = string.split(fp.readline(), None, 2)
            if len(request) != 3:
                # badly formed request; just shut down
                self.shutdown = 1
            else:
                # parse message header
                self.header = mimetools.Message(fp)
                self.set_terminator("\r\n")
                self.server.handle_request(
                    self, request[0], request[1], self.header
                    )
                self.close_when_done()
            self.data = ""
        else:
            pass # ignore body data, for now

class FileProducer:
    # a producer which reads data from a file object

    def __init__(self, file):
        self.file = file

    def more(self):
        if self.file:
            data = self.file.read(2048)
            if data:
                return data
            self.file = None
        return ""

class HTTPServer(asyncore.dispatcher):

    def handle_accept(self):
        conn, addr = self.accept()
        HTTPChannel(self, conn, addr)
#
# try it out
s = HTTPServer(PORT)
print "serving at port", PORT
asyncore.loop()
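The producer protocol itself doesn't depend on the network code, so it can be tried in isolation. In this sketch (Python 3, with an in-memory byte buffer standing in for a file on disk) the drain function plays the role of the framework, calling more until the producer returns an empty string. The drain helper and the chunk_size argument are additions of my own.

```python
import io

class FileProducer:
    # a producer which reads data from a file object

    def __init__(self, file, chunk_size=2048):
        self.file = file
        self.chunk_size = chunk_size

    def more(self):
        if self.file is not None:
            data = self.file.read(self.chunk_size)
            if data:
                return data
            self.file = None  # exhausted; drop the reference
        return b""  # an empty string signals end of file

def drain(producer):
    # stand-in for the framework: pull data until the producer is empty
    chunks = []
    while True:
        data = producer.more()
        if not data:
            break
        chunks.append(data)
    return chunks

chunks = drain(FileProducer(io.BytesIO(b"x" * 5000)))
print([len(chunk) for chunk in chunks])
```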
import urllib

fp = urllib.urlopen("http://www.python.org")

op = open("out.html", "wb")

n = 0

while 1:
    s = fp.read(8192)
    if not s:
        break
    op.write(s)
    n = n + len(s)

fp.close()
op.close()

for k, v in fp.headers.items():
    print k, "=", v
Note that the stream object provides some non-standard attributes. headers is a Message object (as
defined by the mimetools module), and url contains the actual URL. The latter is updated if the
server redirects the client to a new URL.
The urlopen function is actually a helper function, which creates an instance of the
FancyURLopener class, and calls its open method. To get special behavior, you can subclass that
class. For example, the following class automatically logs in to the server, when necessary:
import urllib
class myURLOpener(urllib.FancyURLopener):
# read an URL, with automatic HTTP authentication
urlopener = myURLOpener()
urlopener.setpasswd("mulder", "trustno1")
fp = urlopener.open("http://www.secretlabs.com")
print fp.read()
import urlparse
print urlparse.urlparse("http://host/path;params?query#fragment")
A common use is to split an HTTP URL into host and path components (an HTTP request involves
asking the host to return data identified by the path):
Example: Using the urlparse module to parse HTTP locators
# File:urlparse-example-2.py
import urlparse
if scheme == "http":
    print "host", "=>", host
    if params:
        path = path + ";" + params
    if query:
        path = path + "?" + query
    print "path", "=>", path
Alternatively, you can use the urlunparse function to put the URL back together again:
Example: Using the urlparse module to parse HTTP locators
# File:urlparse-example-3.py
import urlparse
if scheme == "http":
    print "host", "=>", host
    print "path", "=>", urlparse.urlunparse((None, None, path, params, query, None))
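For comparison, here's how the split-and-reassemble step might look on a current interpreter. The sketch assumes Python 3, where these functions live in urllib.parse, and it passes empty strings rather than None for the components that should be left out.

```python
from urllib.parse import urlparse, urlunparse

parts = urlparse("http://host/path;params?query#fragment")
scheme, host, path, params, query, fragment = parts

if scheme == "http":
    # leave out scheme, host, and fragment to get the request path
    selector = urlunparse(("", "", path, params, query, ""))
    print("host", "=>", host)
    print("path", "=>", selector)
```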
The urljoin function is used to combine an absolute URL with a second, possibly relative URL:
Example: Using the urlparse module to combine relative locators
# File:urlparse-example-4.py
import urlparse
base = "http://spam.egg/my/little/pony"
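A few typical combinations, as a sketch for Python 3 (where the function is found in urllib.parse). The relative locators are examples I picked to show the different cases: a sibling document, a parent directory reference, a server-relative path, and an absolute URL that replaces the base altogether.

```python
from urllib.parse import urljoin

base = "http://spam.egg/my/little/pony"

relative = [
    "badger.html",             # sibling of the base document
    "../moldy/bread",          # relative to the parent directory
    "/moldy/bread",            # relative to the server root
    "http://other.host/pony",  # absolute; the base is ignored
]

joined = {}
for uri in relative:
    joined[uri] = urljoin(base, uri)
    print(uri, "=>", joined[uri])
```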
import Cookie
import os, time
cookie = Cookie.SimpleCookie()
cookie["user"] = "Mimi"
cookie["timestamp"] = time.time()
print cookie
cookie = Cookie.SmartCookie()
cookie.load(os.environ["HTTP_COOKIE"])
Set-Cookie: timestamp=736513200;
Set-Cookie: user=Mimi;
user 'Mimi'
timestamp '736513200'
import robotparser

r = robotparser.RobotFileParser()
r.set_url("http://www.python.org/robots.txt")
r.read()

if r.can_fetch("*", "/index.html"):
    print "may fetch the home page"

if r.can_fetch("*", "/tim_one/index.html"):
    print "may fetch the tim peters archive"
import ftplib
ftp = ftplib.FTP("www.python.org")
ftp.login("anonymous", "ftplib-example-1")
print ftp.dir()
ftp.quit()
total 34
drwxrwxr-x 11 root 4127 512 Sep 14 14:18 .
drwxrwxr-x 11 root 4127 512 Sep 14 14:18 ..
drwxrwxr-x 2 root 4127 512 Sep 13 15:18 RCS
lrwxrwxrwx 1 root bin 11 Jun 29 14:34 README -> welcome.msg
drwxr-xr-x 3 root wheel 512 May 19 1998 bin
drwxr-sr-x 3 root 1400 512 Jun 9 1997 dev
drwxrwxr-- 2 root 4127 512 Feb 8 1998 dup
drwxr-xr-x 3 root wheel 512 May 19 1998 etc
...
Downloading files is easy; just use the appropriate retr function. Note that when you download a text
file, you have to add line endings yourself. The following function uses a lambda expression to do that
on the fly.
Example: Using the ftplib module to retrieve files
# File:ftplib-example-2.py
import ftplib
import sys
ftp = ftplib.FTP("www.python.org")
ftp.login("anonymous", "ftplib-example-2")
gettext(ftp, "README")
getbinary(ftp, "welcome.msg")
CONFUSED FTP CLIENT? Try begining your login password with '-' dash.
This turns off continuation messages that may be confusing your client.
...
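The two helper functions called in the script above are not shown in this listing. A minimal sketch of what they might look like follows; the outfile parameter is an assumption added here so the output destination can be chosen by the caller.

```python
import sys

def gettext(ftp, filename, outfile=None):
    # fetch a text file; retrlines strips the line endings,
    # so the lambda adds a newline back to each line on the fly
    if outfile is None:
        outfile = sys.stdout
    ftp.retrlines("RETR " + filename, lambda s, w=outfile.write: w(s + "\n"))

def getbinary(ftp, filename, outfile):
    # fetch a binary file; data blocks are written unchanged,
    # so outfile should be a file opened in binary mode
    ftp.retrbinary("RETR " + filename, outfile.write)
```

Both functions expect a connected ftplib.FTP instance that has already logged in.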
Finally, here's a simple example that copies files to the FTP server. This script uses the file extension to
figure out if the file is a text file or a binary file:
Example: Using the ftplib module to store files
# File:ftplib-example-3.py
import ftplib
import os
ftp = ftplib.FTP("ftp.fbi.gov")
ftp.login("mulder", "trustno1")
upload(ftp, "trixie.zip")
upload(ftp, "file.txt")
upload(ftp, "sightings.jpg")
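The upload function used above is not shown in this listing. One way to write it is sketched below; the list of text extensions and the is_text helper are assumptions made for illustration.

```python
import os

# assumption: extensions treated as text files (illustrative list)
TEXT_EXTENSIONS = (".txt", ".htm", ".html")

def is_text(filename):
    # decide on the transfer mode from the file extension alone
    ext = os.path.splitext(filename)[1]
    return ext.lower() in TEXT_EXTENSIONS

def upload(ftp, filename):
    # the file is opened in binary mode in both cases
    fp = open(filename, "rb")
    try:
        if is_text(filename):
            ftp.storlines("STOR " + filename, fp)           # line by line
        else:
            ftp.storbinary("STOR " + filename, fp, 1024)    # in 1024-byte blocks
    finally:
        fp.close()
```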
import gopherlib
host = "gopher.spam.egg"
f = gopherlib.send_selector("1/", host)
for item in gopherlib.get_directory(f):
    print item
import httplib
USER_AGENT = "httplib-example-1.py"
class Error:
    # indicates an HTTP error
    def __init__(self, url, errcode, errmsg, headers):
        self.url = url
        self.errcode = errcode
        self.errmsg = errmsg
        self.headers = headers
    def __repr__(self):
        return (
            "<Error for %s: %s %s>" %
            (self.url, self.errcode, self.errmsg)
            )
class Server:
# write header
http.putrequest("GET", path)
http.putheader("User-Agent", USER_AGENT)
http.putheader("Host", self.host)
http.putheader("Accept", "*/*")
http.endheaders()
# get response
errcode, errmsg, headers = http.getreply()
if errcode != 200:
    # Error takes the URL as its first argument
    raise Error(path, errcode, errmsg, headers)
file = http.getfile()
return file.read()
if __name__ == "__main__":
    server = Server("www.pythonware.com")
    print server.fetch("/index.htm")
Note that the HTTP client provided by this module blocks while waiting for the server to respond. For
an asynchronous solution, which among other things allows you to issue multiple requests in parallel,
see the examples for the asyncore module.
The httplib module also allows you to send other HTTP commands, such as POST.
Example: Using the httplib module to post data
# File:httplib-example-2.py
import httplib
USER_AGENT = "httplib-example-2.py"
http = httplib.HTTP(host)
# write header
http.putrequest("POST", path)
http.putheader("User-Agent", USER_AGENT)
http.putheader("Host", host)
if type:
    http.putheader("Content-Type", type)
http.putheader("Content-Length", str(len(data)))
http.endheaders()
# write body
http.send(data)
# get response
errcode, errmsg, headers = http.getreply()
if errcode != 200:
    # Error takes the URL as its first argument
    raise Error(path, errcode, errmsg, headers)
file = http.getfile()
return file.read()
if __name__ == "__main__":
import poplib
import string, random
import StringIO, rfc822
SERVER = "pop.spam.egg"
USER = "mulder"
PASSWORD = "trustno1"
# connect to server
server = poplib.POP3(SERVER)
# login
server.user(USER)
server.pass_(PASSWORD)
message = rfc822.Message(file)
for k, v in message.items():
    print k, "=", v
print message.fp.read()
subject = ANN: (the eff-bot guide to) The Standard Python Library
message-id = <[email protected]>
received = (from [email protected])
by spam.egg (8.8.7/8.8.5) id KAA09206
for mulder; Tue, 12 Oct 1999 10:08:47 +0200
from = Fredrik Lundh <[email protected]>
date = Tue, 12 Oct 1999 10:08:47 +0200
to = [email protected]
...
import imaplib
import string, random
import StringIO, rfc822
SERVER = "imap.spam.egg"
USER = "mulder"
PASSWORD = "trustno1"
# connect to server
server = imaplib.IMAP4(SERVER)
# login
server.login(USER, PASSWORD)
server.select()
file = StringIO.StringIO(text)
message = rfc822.Message(file)
for k, v in message.items():
    print k, "=", v
print message.fp.read()
server.logout()
subject = ANN: (the eff-bot guide to) The Standard Python Library
message-id = <[email protected]>
to = [email protected]
date = Tue, 12 Oct 1999 10:16:19 +0200 (MET DST)
from = <[email protected]>
received = ([email protected]) by imap.algonet.se (8.8.8+Sun/8.6.12)
id KAA12177 for [email protected]; Tue, 12 Oct 1999 10:16:19 +0200
(MET DST)
import smtplib
import string, sys
HOST = "localhost"
FROM = "[email protected]"
TO = "[email protected]"
body = string.join((
"From: %s" % FROM,
"To: %s" % TO,
"Subject: %s" % SUBJECT,
"",
BODY), "\r\n")
print body
server = smtplib.SMTP(HOST)
server.sendmail(FROM, [TO], body)
server.quit()
From: [email protected]
To: [email protected]
Subject: for your information!
import telnetlib
import sys
HOST = "spam.egg"
USER = "mulder"
PASSWORD = "trustno1"
telnet = telnetlib.Telnet(HOST)
telnet.read_until("login: ")
telnet.write(USER + "\n")
telnet.read_until("Password: ")
telnet.write(PASSWORD + "\n")
telnet.write("ls librarybook\n")
telnet.write("exit\n")
print telnet.read_all()
[spam.egg mulder]$ ls
README os-path-isabs-example-1.py
SimpleAsyncHTTP.py os-path-isdir-example-1.py
aifc-example-1.py os-path-isfile-example-1.py
anydbm-example-1.py os-path-islink-example-1.py
array-example-1.py os-path-ismount-example-1.py
...
Listing messages
Prior to reading messages from a news server, you have to connect to the server, and then select a
newsgroup. The following script also downloads a complete list of all messages on the server, and
extracts some more or less interesting statistics from that list:
import nntplib
import string
SERVER = "news.spam.egg"
GROUP = "comp.lang.python"
AUTHOR = "[email protected]" # eff-bots human alias
# connect to server
server = nntplib.NNTP(SERVER)
# choose a newsgroup
resp, count, first, last, name = server.group(GROUP)
print "count", "=>", count
print "range", "=>", first, last
Downloading messages
Downloading a message is easy. Just call the article method, as shown in this script:
import nntplib
import string
SERVER = "news.spam.egg"
GROUP = "comp.lang.python"
KEYWORD = "tkinter"
# connect to server
server = nntplib.NNTP(SERVER)
To further manipulate a message, you can wrap it in a Message object (using the rfc822
module):
import nntplib
import string, random
import StringIO, rfc822
SERVER = "news.spam.egg"
GROUP = "comp.lang.python"
# connect to server
server = nntplib.NNTP(SERVER)
message = rfc822.Message(file)
for k, v in message.items():
    print k, "=", v
print message.fp.read()
mime-version = 1.0
content-type = text/plain; charset="iso-8859-1"
message-id = <[email protected]>
lines = 22
...
from = "Fredrik Lundh" <[email protected]>
nntp-posting-host = parrot.python.org
subject = ANN: (the eff-bot guide to) The Standard Python Library
...
</F>
Once you've gotten this far, you can use modules like htmllib, uu, and base64 to further process the
messages.
import SocketServer
import time
# user-accessible port
PORT = 8037
# reference time
TIME1970 = 2208988800L
class TimeRequestHandler(SocketServer.StreamRequestHandler):
    def handle(self):
        print "connection from", self.client_address
        t = int(time.time()) + TIME1970
        b = chr(t>>24&255) + chr(t>>16&255) + chr(t>>8&255) + chr(t&255)
        self.wfile.write(b)
import BaseHTTPServer
import cgi, random, sys
MESSAGES = [
    "That's as maybe, it's still a frog.",
    "Albatross! Albatross! Albatross!",
    "A pink form from Reading.",
    "Hello people, and welcome to 'It's a Tree'",
    "I simply stare at the brick and it goes to sleep.",
]
class Handler(BaseHTTPServer.BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/":
            self.send_error(404, "File not found")
            return
        self.send_response(200)
        self.send_header("Content-type", "text/html")
        self.end_headers()
        try:
            # redirect stdout to client
            stdout = sys.stdout
            sys.stdout = self.wfile
            self.makepage()
        finally:
            sys.stdout = stdout # restore
    def makepage(self):
        # generate a random message
        tagline = random.choice(MESSAGES)
        print "<html>"
        print "<body>"
        print "<p>Today's quote: "
        print "<i>%s</i>" % cgi.escape(tagline)
        print "</body>"
        print "</html>"
PORT = 8000
See the SimpleHTTPServer and CGIHTTPServer modules for more extensive HTTP frameworks.
import SimpleHTTPServer
import SocketServer
PORT = 8000
Handler = SimpleHTTPServer.SimpleHTTPRequestHandler
The server ignores drive letters and relative path names (such as '..'). However, it does not implement
any other access control mechanisms, so be careful how you use it.
The second example implements a truly minimal web proxy. When sent to a proxy, the HTTP requests
should include the full URI for the target server. This server uses urllib to fetch data from the target.
Example: Using the SimpleHTTPServer module as a proxy
# File:simplehttpserver-example-2.py
import SocketServer
import SimpleHTTPServer
import urllib
PORT = 1234
class Proxy(SimpleHTTPServer.SimpleHTTPRequestHandler):
    def do_GET(self):
        self.copyfile(urllib.urlopen(self.path), self.wfile)
import CGIHTTPServer
import BaseHTTPServer
class Handler(CGIHTTPServer.CGIHTTPRequestHandler):
    cgi_directories = ["/cgi"]
PORT = 8000
import cgi
import os, urllib
ROOT = "samples"
# header
print "text/html"
print
query = os.environ.get("QUERY_STRING")
if not query:
query = "."
print "<html>"
print "<head>"
print "<title>file listing</title>"
print "</head>"
print "</html>"
print "<body>"
try:
    files = os.listdir(os.path.join(ROOT, query))
except os.error:
    files = []
print "</body>"
print "</html>"
text/html
<html>
<head>
<title>file listing</title>
</head>
</html>
<body>
<p>sample.gif
<p>sample.gz
<p>sample.netrc
...
<p>sample.txt
<p>sample.xml
<p>sample~
<p><a href='cgi-example-1.py?web'>web</a>
</body>
</html>
import webbrowser
import time
webbrowser.open("http://www.pythonware.com")
On Unix, this module supports lynx, Netscape, Mosaic, Konqueror, and Grail. On Windows and
Macintosh, it uses the standard browser (as defined in the registry or the Internet configuration panel).
Python Standard Library: Internationalization 8-1
Internationalization
import locale
# integer formatting
value = 4711
print locale.format("%d", value, 1), "==",
print locale.atoi(locale.format("%d", value, 1))
# floating point
value = 47.11
print locale.format("%f", value, 1), "==",
print locale.atof(locale.format("%f", value, 1))
info = locale.localeconv()
print info["int_curr_symbol"]
import locale
language sv_SE
encoding cp1252
import unicodedata
for char in [u"A", u"-", u"1", u"\N{LATIN CAPITAL LETTER O WITH DIAERESIS}"]:
    print repr(char),
    print unicodedata.category(char),
    print repr(unicodedata.decomposition(char)),
    print unicodedata.decimal(char, None),
    print unicodedata.numeric(char, None)
Note that in Python 2.0, properties for CJK ideographs and Hangul syllables are missing. This affects
characters in the ranges 0x3400-0x4DB5, 0x4E00-0x9FA5, and 0xAC00-0xD7A3. The first character in
each range has correct properties, so you can work around this problem by simply mapping each
character to the beginning of its range:
def remap(char):
    # fix for broken unicode property database in Python 2.0
    c = ord(char)
    if 0x3400 <= c <= 0x4DB5:
        return unichr(0x3400)
    if 0x4E00 <= c <= 0x9FA5:
        return unichr(0x4E00)
    if 0xAC00 <= c <= 0xD7A3:
        return unichr(0xAC00)
    return char
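As a usage sketch, the workaround can be wrapped around a property lookup. The category helper below is a hypothetical name introduced for illustration, the remap function is repeated so that the sketch is self-contained, and the unichr fallback assumes a newer interpreter where chr handles Unicode.

```python
import unicodedata

try:
    unichr                  # Python 2 built-in
except NameError:
    unichr = chr            # assumption: Python 3, where chr covers Unicode

def remap(char):
    # map each character in an affected range to the start of that range
    c = ord(char)
    if 0x3400 <= c <= 0x4DB5:
        return unichr(0x3400)
    if 0x4E00 <= c <= 0x9FA5:
        return unichr(0x4E00)
    if 0xAC00 <= c <= 0xD7A3:
        return unichr(0xAC00)
    return char

def category(char):
    # hypothetical helper: look up the category via the remapped character
    return unicodedata.category(remap(char))

print(category(u"\u4e2d"))    # a CJK ideograph
print(category(u"A"))         # unaffected characters pass through unchanged
```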
print repr(u"\N{FROWN}")
print repr(u"\N{SMILE}")
print repr(u"\N{SKULL AND CROSSBONES}")
u'\u2322'
u'\u2323'
u'\u2620'
Python Standard Library: Multimedia Modules 9-1
Multimedia Modules
"Wot? No quote?"
Overview
Python comes with a small set of modules for dealing with image files and audio files.
Also see Python Imaging Library (PIL) (http://www.pythonware.com/products/pil) and Snack
(http://www.speech.kth.se/snack/), among others.
import imghdr
result = imghdr.what("samples/sample.jpg")
if result:
    print "file format:", result
else:
    print "cannot identify file"
import sndhdr
result = sndhdr.what("samples/sample.wav")
if result:
    print "file format:", result
else:
    print "cannot identify file"
result = whatsound.what("samples/sample.wav")
if result:
    print "file format:", result
else:
    print "cannot identify file"
import aifc
a = aifc.open("samples/sample.aiff", "r")
if a.getnchannels() == 1:
    print "mono,",
else:
    print "stereo,",
data = a.readframes(a.getnframes())
import sunau
w = sunau.open("samples/sample.au", "r")
if w.getnchannels() == 1:
    print "mono,",
else:
    print "stereo,",
import sunaudio
file = "samples/sample.au"
import wave
w = wave.open("samples/sample.wav", "r")
if w.getnchannels() == 1:
    print "mono,",
else:
    print "stereo,",
import audiodev
import aifc
player = audiodev.AudioDev()
player.setoutrate(sound.getframerate())
player.setsampwidth(sound.getsampwidth())
player.setnchannels(sound.getnchannels())
while 1:
    data = sound.readframes(bytes_per_second)
    if not data:
        break
    player.writeframes(data)
player.wait()
import winsound
file = "samples/sample.wav"
winsound.PlaySound(
file,
winsound.SND_FILENAME|winsound.SND_NOWAIT,
)
import colorsys
# gold
r, g, b = 1.00, 0.84, 0.00
y, i, q = colorsys.rgb_to_yiq(r, g, b)
print "YIQ", (y, i, q), "=>", colorsys.yiq_to_rgb(y, i, q)
h, l, s = colorsys.rgb_to_hls(r, g, b)
print "HLS", (h, l, s), "=>", colorsys.hls_to_rgb(h, l, s)
h, s, v = colorsys.rgb_to_hsv(r, g, b)
print "HSV", (h, s, v), "=>", colorsys.hsv_to_rgb(h, s, v)
Data Storage
"Unlike mainstream component programming, scripts usually do not
introduce new components but simply "wire" existing ones. Scripts can
be seen as introducing behavior but no new state. /.../ Of course, there
is nothing to stop a "scripting" language from introducing persistent
state — it then simply turns into a normal programming language"
Overview
Python comes with drivers for a number of very similar database managers, all modeled after Unix's
dbm library. These databases behave like ordinary dictionaries, with the exception that you can only
use strings for keys and values (the shelve module can handle any kind of value).
import anydbm
db = anydbm.open("database", "c")
db["1"] = "one"
db["2"] = "two"
db["3"] = "three"
db.close()
db = anydbm.open("database", "r")
for key in db.keys():
    print repr(key), repr(db[key])
'2' 'two'
'3' 'three'
'1' 'one'
Python Standard Library: Data Storage 10-3
import whichdb
filename = "database"
result = whichdb.whichdb(filename)
if result:
    print "file created by", result
    handler = __import__(result)
    db = handler.open(filename, "r")
    print db.keys()
else:
    # cannot identify data base
    if result is None:
        print "cannot read database file", filename
    else:
        print "cannot identify database file", filename
    db = None
This example used the __import__ function to import a module with the given name.
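For reference, here is a small sketch of that technique on its own. The module name colorsys is just a convenient stand-in for a name that is only known at run time.

```python
# hypothetical module name, held in a string rather than written in an import statement
name = "colorsys"

# __import__ returns the module object; it can then be used like any imported module
module = __import__(name)

print(module.__name__)
print(module.rgb_to_hsv(1.0, 0.0, 0.0))
```

For dotted names such as "os.path", __import__ returns the top-level package, not the submodule.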
import shelve
db = shelve.open("database", "c")
db["one"] = 1
db["two"] = 2
db["three"] = 3
db.close()
db = shelve.open("database", "r")
for key in db.keys():
    print repr(key), repr(db[key])
'one' 1
'three' 3
'two' 2
The following example shows how to use the shelve module with a given database driver.
import shelve
import gdbm
db = gdbm_shelve("dbfile")
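The gdbm_shelve helper is not shown in this listing. The general idea is to open the database with the driver of your choice and pass it to shelve.Shelf. The sketch below swaps in the pure-Python dumbdbm driver instead of gdbm, since gdbm is not available everywhere; dumb_shelve and the temporary file name are names made up for this illustration.

```python
import os
import shelve
import tempfile

try:
    import dumbdbm                  # Python 2 name, as used in this book
except ImportError:
    import dbm.dumb as dumbdbm      # assumption: Python 3 location

def dumb_shelve(filename, flag="c"):
    # wrap an explicitly chosen dbm-style database in a shelf
    return shelve.Shelf(dumbdbm.open(filename, flag))

filename = os.path.join(tempfile.mkdtemp(), "dbfile")

db = dumb_shelve(filename)
db["numbers"] = [1, 2, 3]           # values can be any picklable object
db.close()

db = dumb_shelve(filename, "r")
numbers = db["numbers"]
db.close()

print(numbers)
```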
import dbhash
db = dbhash.open("dbhash", "c")
db["one"] = "the foot"
db["two"] = "the shoulder"
db["three"] = "the other foot"
db["four"] = "the bridge of the nose"
db["five"] = "the naughty bits"
db["six"] = "just above the elbow"
db["seven"] = "two inches to the right of a very naughty bit indeed"
db["eight"] = "the kneecap"
db.close()
db = dbhash.open("dbhash", "r")
for key in db.keys():
    print repr(key), repr(db[key])
import dbm
db = dbm.open("dbm", "c")
db["first"] = "bruce"
db["second"] = "bruce"
db["third"] = "bruce"
db["fourth"] = "bruce"
db["fifth"] = "michael"
db["fifth"] = "bruce" # overwrite
db.close()
db = dbm.open("dbm", "r")
for key in db.keys():
    print repr(key), repr(db[key])
'first' 'bruce'
'second' 'bruce'
'fourth' 'bruce'
'third' 'bruce'
'fifth' 'bruce'
import dumbdbm
db = dumbdbm.open("dumbdbm", "c")
db["first"] = "fear"
db["second"] = "surprise"
db["third"] = "ruthless efficiency"
db["fourth"] = "an almost fanatical devotion to the Pope"
db["fifth"] = "nice red uniforms"
db.close()
db = dumbdbm.open("dumbdbm", "r")
for key in db.keys():
    print repr(key), repr(db[key])
'first' 'fear'
'third' 'ruthless efficiency'
'fifth' 'nice red uniforms'
'second' 'surprise'
'fourth' 'an almost fanatical devotion to the Pope'
import gdbm
db = gdbm.open("gdbm", "c")
db["1"] = "call"
db["2"] = "the"
db["3"] = "next"
db["4"] = "defendant"
db.close()
db = gdbm.open("gdbm", "r")
keys = db.keys()
keys.sort()
for key in keys:
    print db[key],
0 SET_LINENO 0
3 SET_LINENO 1
6 LOAD_CONST 0 ('hello again, and welcome to the show')
9 PRINT_ITEM
10 PRINT_NEWLINE
11 LOAD_CONST 1 (None)
14 RETURN_VALUE
You can also use dis as a module. The dis function takes a class, method, function, or code object as its
single argument.
import dis
def procedure():
    print 'hello'
dis.dis(procedure)
0 SET_LINENO 3
3 SET_LINENO 4
6 LOAD_CONST 1 ('hello')
9 PRINT_ITEM
10 PRINT_NEWLINE
11 LOAD_CONST 0 (None)
14 RETURN_VALUE
Python Standard Library: Tools and Utilities 11-3
import pdb
def test(n):
    j=0
    for i in range(n):
        j=j+i
    return n
db = pdb.Pdb()
db.runcall(test, 1)
> pdb-example-1.py(3)test()
-> def test(n):
(Pdb) s
> pdb-example-1.py(4)test()
-> j = 0
(Pdb) s
> pdb-example-1.py(5)test()
-> for i in range(n):
...
import bdb
import time
def spam(n):
    j=0
    for i in range(n):
        j=j+i
    return n
def egg(n):
    spam(n)
    spam(n)
    spam(n)
    spam(n)
def test(n):
    egg(n)
class myDebugger(bdb.Bdb):
    run = 0
db = myDebugger()
db.run = 1
db.set_break("bdb-example-1.py", 7)
db.runcall(test, 1)
continue...
call egg None
call spam None
break at C:\ematter\librarybook\bdb-example-1.py 7 in spam
continue...
call spam None
break at C:\ematter\librarybook\bdb-example-1.py 7 in spam
continue...
call spam None
break at C:\ematter\librarybook\bdb-example-1.py 7 in spam
continue...
call spam None
break at C:\ematter\librarybook\bdb-example-1.py 7 in spam
continue...
import profile
def func1():
    for i in range(1000):
        pass
def func2():
    for i in range(1000):
        func1()
profile.run("func2()")
You can modify the report to suit your needs, via the pstats module (see next page).
import pstats
import profile
def func1():
    for i in range(1000):
        pass
def func2():
    for i in range(1000):
        func1()
p = profile.Profile()
p.run("func2()")
s = pstats.Stats(p)
s.sort_stats("time", "name").print_stats()
Since the Python interpreter interprets a tab as eight spaces, the script will run correctly. It will also
display correctly, in any editor that assumes that a tab is either eight or four spaces. That's not enough
to fool the tab nanny, of course...
You can also use tabnanny from within a program.
import tabnanny
FILE = "samples/badtabs.py"
file = open(FILE)
for line in file.readlines():
    print repr(line)
'if 1:\012'
' \011print "hello"\012'
' print "world"\012'
samples/badtabs.py 3 ' print "world"\012'
(To capture the output, you can redirect sys.stdout to a StringIO object.)
Python Standard Library: Platform Specific Modules 12-1
Overview
This chapter describes some platform specific modules. I've emphasized modules that are available on
entire families of platforms (such as Unix, or the Windows family).
FILE = "counter.txt"
if not os.path.exists(FILE):
    # create the counter file if it doesn't exist
    file = open(FILE, "w")
    file.write("0")
    file.close()
for i in range(20):
    # increment the counter
    file = open(FILE, "r+")
    fcntl.flock(file.fileno(), FCNTL.LOCK_EX)
    counter = int(file.readline()) + 1
    file.seek(0)
    file.write(str(counter))
    file.close() # unlocks the file
    print os.getpid(), "=>", counter
    time.sleep(0.1)
30940 => 1
30942 => 2
30941 => 3
30940 => 4
30941 => 5
30942 => 6
import pwd
import os
print pwd.getpwuid(os.getuid())
print pwd.getpwnam("root")
The getpwall function returns a list of database entries for all available users. This can be useful if you
want to search for a user.
If you have to look up many names, you can use getpwall to preload a dictionary:
import pwd
import os
def userinfo(uid):
    # name or uid integer
    return _pwd[uid]
print userinfo(os.getuid())
print userinfo("root")
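The preloading step itself is not shown in this listing. A minimal sketch, assuming a Unix system where the pwd module is available, could build the _pwd dictionary like this (entries are indexed by position: 0 is the user name, 2 is the user id):

```python
import pwd

# preload the dictionary, keyed on both user name and user id
_pwd = {}
for entry in pwd.getpwall():
    name, uid = entry[0], entry[2]
    _pwd[name] = _pwd[uid] = entry

def userinfo(uid):
    # name or uid integer
    return _pwd[uid]
```

Since the dictionary is a snapshot, users added after it was built will not be found.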
import grp
import os
print grp.getgrgid(os.getgid())
print grp.getgrnam("wheel")
The getgrall function returns a list of database entries for all available groups.
If you're going to do a lot of group queries, you can save some time by using getgrall to copy all the
(current) groups into a dictionary. The groupinfo function in the following example returns the
information for either a group identifier (an integer) or a group name (a string):
import grp
import os
def groupinfo(gid):
    # name or gid integer
    return _grp[gid]
print groupinfo(os.getgid())
print groupinfo("wheel")
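The code that fills the _grp dictionary is not shown in this listing. A minimal sketch, assuming a Unix system where the grp module is available, might look like this (entries are indexed by position: 0 is the group name, 2 is the group id):

```python
import grp

# copy all current groups into a dictionary, keyed on both name and group id
_grp = {}
for entry in grp.getgrall():
    name, gid = entry[0], entry[2]
    _grp[name] = _grp[gid] = entry

def groupinfo(gid):
    # name or gid integer
    return _grp[gid]
```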
import nis
import string
print nis.cat("ypservers")
print string.split(nis.match("bacon", "hosts.byname"))
{'bacon.spam.egg': 'bacon.spam.egg'}
['194.18.155.250', 'bacon.spam.egg', 'bacon', 'spam-010']
import curses
text = [
"a very simple curses demo",
"",
"(press any key to exit)"
]
# setup keyboard
curses.noecho() # no keyboard echo
curses.cbreak() # don't wait for newline
# screen size
rows, columns = screen.getmaxyx()
screen.getch()
curses.endwin()
fileno = sys.stdin.fileno()
attr = termios.tcgetattr(fileno)
orig = attr[:]
try:
    termios.tcsetattr(fileno, TERMIOS.TCSADRAIN, attr)
    message = raw_input("enter secret message: ")
    print
finally:
    # restore terminal settings
    termios.tcsetattr(fileno, TERMIOS.TCSADRAIN, orig)
import tty
import os, sys
fileno = sys.stdin.fileno()
tty.setraw(fileno)
print raw_input("raw input: ")
tty.setcbreak(fileno)
print raw_input("cbreak input: ")
import resource
import resource
import syslog
import sys
syslog.openlog(sys.argv[0])
syslog.closelog()
import msvcrt
while 1:
    char = msvcrt.getch()
    if char == chr(27):
        break
    print char,
    if char == chr(13):
        print
The kbhit function returns true if a key has been pressed (which means that getch won't block).
import msvcrt
import time
The locking function can be used to implement cross-process file locking under Windows:
import msvcrt
import os
FILE = "counter.txt"
if not os.path.exists(FILE):
    file = open(FILE, "w")
    file.write("0")
    file.close()
for i in range(20):
    file = open(FILE, "r+")
    # lock from current position (0) to end of file
    msvcrt.locking(file.fileno(), LK_LOCK, os.path.getsize(FILE))
    counter = int(file.readline()) + 1
    file.seek(0)
    file.write(str(counter))
    file.close() # unlocks the file
    print os.getpid(), "=>", counter
    time.sleep(0.1)
208 => 21
208 => 22
208 => 23
208 => 24
208 => 25
208 => 26
The nt module
(Implementation, Windows only). This module is an implementation module used by the os module
on Windows platforms. There's hardly any reason to use this module directly; use os instead.
import nt
aifc-example-1.py 314
anydbm-example-1.py 259
array-example-1.py 48
import _winreg
explorer = _winreg.OpenKey(
_winreg.HKEY_CURRENT_USER,
"Software\\Microsoft\\Windows\\CurrentVersion\\Explorer"
)
print
print "user is", repr(value)
user is u'Effbot'
import posix
aifc-example-1.py 314
anydbm-example-1.py 259
array-example-1.py 48
Python Standard Library: Implementation Support Modules 13-1
import dospath
file = "/my/little/pony"
isabs => 1
dirname => /my/little
basename => pony
normpath => \my\little\pony
split => ('/my/little', 'pony')
join => /my/little/pony\zorba
Note that Python's DOS support can use both forward slashes (/) and backward slashes (\) as directory
separators.
import macpath
file = "my:little:pony"
isabs => 1
dirname => my:little
basename => pony
normpath => my:little:pony
split => ('my:little', 'pony')
join => my:little:pony:zorba
import ntpath
file = "/my/little/pony"
isabs => 1
dirname => /my/little
basename => pony
normpath => \my\little\pony
split => ('/my/little', 'pony')
join => /my/little/pony\zorba
Note that this module treats both forward slashes (/) and backward slashes (\) as directory separators.
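A short sketch of that behaviour, using a hypothetical path that mixes both separator styles. The ntpath module can be imported on any platform, so this runs outside Windows as well.

```python
import ntpath

# hypothetical path mixing forward and backward slashes
mixed = "/my/little\\pony"

head, tail = ntpath.split(mixed)      # splits at the last separator of either kind
normalized = ntpath.normpath(mixed)   # rewrites all separators as backslashes

print(head)
print(tail)
print(normalized)
```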
import posixpath
file = "/my/little/pony"
isabs => 1
dirname => /my/little
basename => pony
normpath => /my/little/pony
split => ('/my/little', 'pony')
join => /my/little/pony/zorba
import strop
import sys
if strop.lower(sys.executable)[-4:] == ".exe":
    extra = sys.executable[:-4] # windows
else:
    extra = sys.executable
import mymodule
In Python 2.0 and later, you should use string methods instead of strop. In the above example, replace
"strop.lower(sys.executable)" with "sys.executable.lower()"
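With string methods, the extension test from the strop example might read as follows. The executable value is a hypothetical stand-in for sys.executable, so that the sketch behaves the same on every platform.

```python
# hypothetical value; a real script would use sys.executable here
executable = "C:\\Python20\\PYTHON.EXE"

# the lower() string method replaces the strop.lower() call
if executable.lower()[-4:] == ".exe":
    extra = executable[:-4]     # windows
else:
    extra = executable

print(extra)
```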
import imp
import sys
import __builtin__
__builtin__.__import__ = my_import
import xmllib
Note that the alternative version shown here doesn't support packages. For a more extensive example,
see the sources for the knee module.
import new
class Sample:
    a = "default"
    def __init__(self):
        self.a = "initialised"
    def __repr__(self):
        return self.a
#
# create instances
a = Sample()
print "normal", "=>", a
b = new.instance(Sample, {})
print "new.instance", "=>", b
b.__init__()
print "after __init__", "=>", b
# File:pre-example-1.py
import pre
p = pre.compile("[Python]+")
import sre
# a single character
m = sre.match(".", text)
if m: print repr("."), "=>", repr(m.group(0))
import py_compile
The compileall module can be used to compile all Python files in an entire directory tree.
import compileall
compileall.compile_dir(".", force=1)
def import_from(filename):
    "Import module from a named file"
    loader = ihooks.BasicModuleLoader()
    path, file = os.path.split(filename)
    name, ext = os.path.splitext(file)
    m = loader.find_module_in_dir(name, path)
    if not m:
        raise ImportError, name
    m = loader.load_module(name, m)
    return m
colorsys = import_from("/python/lib/colorsys.py")
print colorsys
import linecache
print linecache.getline("linecache-example-1.py", 5)
print linecache.getline("linecache-example-1.py", 5)
import macurl2path
file = ":my:little:pony"
print macurl2path.pathname2url(file)
print macurl2path.url2pathname(macurl2path.pathname2url(file))
my/little/pony
:my:little:pony
import nturl2path
file = r"c:\my\little\pony"
print nturl2path.pathname2url(file)
print nturl2path.url2pathname(nturl2path.pathname2url(file))
///C|/my/little/pony
C:\my\little\pony
This module should not be used directly; for portability, access these functions via the urllib module
instead:
import urllib
file = r"c:\my\little\pony"
print urllib.pathname2url(file)
print urllib.url2pathname(urllib.pathname2url(file))
///C|/my/little/pony
C:\my\little\pony
import tokenize
file = open("tokenize-example-1.py")
tokenize.tokenize(
file.readline,
handle_token
)
Note that the tokenize function takes two callable objects; the first argument is called repeatedly to
fetch new code lines, and the second argument is called for each token.
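The handle_token callback is not shown in this listing. The sketch below shows one possible shape for it. Note the swapped-in technique: instead of passing the callback to tokenize.tokenize as above, it loops over tokenize.generate_tokens, which takes the same readline callable and is available in newer interpreters as well. The source string and the seen list are made up for illustration.

```python
import token
import tokenize

try:
    from StringIO import StringIO   # Python 2
except ImportError:
    from io import StringIO         # assumption: Python 3 location

seen = []

def handle_token(type, text, start, end, line):
    # called once per token: type code, token text,
    # (row, column) of start and end, and the full source line
    seen.append((token.tok_name[type], text))

# hypothetical one-line source, fed to the tokenizer through readline
source = StringIO("x = 1 + 2\n")

for tok in tokenize.generate_tokens(source.readline):
    handle_token(*tok)

for name, text in seen:
    print("%s %r" % (name, text))
```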
import keyword
if keyword.iskeyword(name):
    print name, "is a reserved word."
print "here's a complete list of reserved words:"
print keyword.kwlist
import parser
import symbol, token
def dump_and_modify(node):
    name = symbol.sym_name.get(node[0])
    if name is None:
        name = token.tok_name.get(node[0])
    print name,
    for i in range(1, len(node)):
        item = node[i]
        if type(item) is type([]):
            dump_and_modify(item)
        else:
            print repr(item)
            if name == "NUMBER":
                # increment all numbers!
                node[i] = repr(int(item)+1)
list = ast.tolist()
dump_and_modify(list)
ast = parser.sequence2ast(list)
print eval(parser.compileast(ast))
import symbol
print 268
return 274
import token
NUMBER 2
PLUS 16
STRING 3
Python Standard Library: Other Modules 14-1
Other Modules
Overview
This chapter describes a number of less common modules. Some are useful, others are quite obscure,
and some are just plain obsolete.
import pyclbr
mod = pyclbr.readmodule("cgi")
for k, v in mod.items():
    print k, v
In 2.0 and later, there's also an alternative interface, readmodule_ex, which returns global functions
as well.
import pyclbr
for k, v in mod.items():
    print k, v
To get more information about each class, use the various attributes in the Class instances:
import pyclbr
import string
mod = pyclbr.readmodule("cgi")
def dump(c):
    # print class header
    s = "class " + c.name
    if c.super:
        s = s + "(" + string.join(map(lambda v: v.name, c.super), ", ") + ")"
    print s + ":"
    # print method names, sorted by line number
    methods = c.methods.items()
    methods.sort(lambda a, b: cmp(a[1], b[1]))
    for method, lineno in methods:
        print " def " + method
    print
for k, v in mod.items():
    dump(v)
class MiniFieldStorage:
def __init__
def __repr__
class InterpFormContentDict(SvFormContentDict):
def __getitem__
def values
def items
...
import filecmp
if filecmp.cmp("samples/sample.au", "samples/sample.wav"):
    print "files are identical"
else:
    print "files differ!"
files differ!
In 1.5.2 and earlier, you can use the cmp and dircmp modules instead.
import cmd
import string, sys
class CLI(cmd.Cmd):
    def __init__(self):
        cmd.Cmd.__init__(self)
        self.prompt = '> '
    def help_hello(self):
        print "syntax: hello [message]",
        print "-- prints a hello message"
    def help_quit(self):
        print "syntax: quit",
        print "-- terminates the application"
    # shortcuts
    do_q = do_quit
#
# try it out
cli = CLI()
cli.cmdloop()
> help
Undocumented commands:
======================
help q
import rexec
r = rexec.RExec()
print r.r_eval("1+2+3")
print r.r_eval("__import__('os').remove('file')")
6
Traceback (innermost last):
File "rexec-example-1.py", line 5, in ?
print r.r_eval("__import__('os').remove('file')")
File "/usr/local/lib/python1.5/rexec.py", line 257, in r_eval
return eval(code, m.__dict__)
File "<string>", line 0, in ?
AttributeError: remove
import Bastion
class Sample:
    value = 0
    def getvalue(self):
        return self.value
#
# try it
s = Sample()
s._set(100) # cheat
print s.getvalue()
s = Bastion.Bastion(Sample())
s._set(100) # attempt to cheat
print s.getvalue()
100
Traceback (innermost last):
...
AttributeError: _set
You can control which functions to publish. In the following example, the internal method can be
called from outside, but the getvalue no longer works:
import Bastion
class Sample:
    value = 0
    def getvalue(self):
        return self.value
#
# try it
def is_public(name):
    return name[:3] != "get"
s = Bastion.Bastion(Sample(), is_public)
s._set(100) # this works
print s.getvalue() # but not this
100
Traceback (innermost last):
...
AttributeError: getvalue
The following script shows how to use the completion functions from within a program.
import rlcompleter
import sys
completer = rlcompleter.Completer()
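The rest of that script is not shown in this listing. As a rough sketch of the idea: the complete method takes a text prefix and a state counter, and returns one candidate per call until it returns None. The namespace dictionary below, and passing it to Completer, are assumptions made to keep the result predictable; the script above uses the default namespace.

```python
import rlcompleter

# hypothetical names to complete against
namespace = {"spam": 1, "sparrow": 2}
completer = rlcompleter.Completer(namespace)

# ask for candidate number 0, 1, 2, ... until the completer runs out
matches = []
state = 0
while 1:
    match = completer.complete("spa", state)
    if match is None:
        break
    matches.append(match)
    state = state + 1

matches.sort()
print(matches)
```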
import statvfs
import os
st = os.statvfs(".")
import calendar
calendar.prmonth(1999, 12)
   December 1999
Mo Tu We Th Fr Sa Su
       1  2  3  4  5
 6  7  8  9 10 11 12
13 14 15 16 17 18 19
20 21 22 23 24 25 26
27 28 29 30 31
import calendar
calendar.prcal(2000)
2000
Note that the calendars are printed using European conventions; in other words, Monday is the first
day of the week.
This module contains a number of support functions which can be useful if you want to output
calendars in other formats. It's probably easiest to copy the entire file, and tweak it to suit your needs.
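In current Python 3 releases, the support functions can often be used directly instead of copying the file. The sketch below assumes the calendar.Calendar class, which is available in Python 3, and builds the week rows for December 1999 with either Monday or Sunday as the first day. The format_weeks helper is a hypothetical name used only for illustration.

```python
import calendar

# Monday-first weeks (the European convention used in the printed output);
# days that fall outside the month are reported as 0
monday_first = calendar.Calendar(firstweekday=calendar.MONDAY)
weeks = monday_first.monthdayscalendar(1999, 12)

# the same month with Sunday as the first day of the week
sunday_first = calendar.Calendar(firstweekday=calendar.SUNDAY)
us_weeks = sunday_first.monthdayscalendar(1999, 12)


def format_weeks(rows):
    """Render week rows as text, leaving padding days blank."""
    lines = []
    for week in rows:
        lines.append(" ".join("%2s" % (day or "") for day in week))
    return "\n".join(lines)


if __name__ == "__main__":
    print("Su Mo Tu We Th Fr Sa")
    print(format_weeks(us_weeks))
```

Using a Calendar instance keeps the first-weekday choice local to that object, so other code that relies on the module default is not affected.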
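import sched
import time, sys

scheduler = sched.scheduler(time.time, time.sleep)

for delay, text in (0.5, "one\n"), (1.0, "two\n"), (1.5, "three\n"):
    scheduler.enter(delay, 1, sys.stdout.write, (text,)) # illustrative delays

scheduler.run()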
one
two
three
import statcache
import os, stat, time

now = time.time()
for i in range(1000):
    st = os.stat("samples/sample.txt")
print "os.stat", "=>", time.time() - now

now = time.time()
for i in range(1000):
    st = statcache.stat("samples/sample.txt")
print "statcache.stat", "=>", time.time() - now
import grep
import glob
grep.grep("\<rather\>", glob.glob("samples/*.txt"))
import dircache
import os, time

#
# test cached version

t0 = time.clock()
for i in range(100):
    dircache.listdir(os.sep)
print "cached", time.clock() - t0

#
# test standard version

t0 = time.clock()
for i in range(100):
    os.listdir(os.sep)
print "standard", time.clock() - t0
cached 0.0664509964968
standard 0.5560845807
import dircmp
d = dircmp.dircmp()
d.new("samples", "oldsamples")
d.run()
d.report()
In Python 2.0 and later, this module has been replaced by the filecmp module.
import cmp

if cmp.cmp("samples/sample.au", "samples/sample.wav"):
    print "files are identical"
else:
    print "files differ!"
files differ!
In Python 2.0 and later, this module has been replaced by the filecmp module.
import cmpcache

if cmpcache.cmp("samples/sample.au", "samples/sample.wav"):
    print "files are identical"
else:
    print "files differ!"
files differ!
In Python 2.0 and later, this module has been replaced by the filecmp module.
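For comparison, here is a minimal sketch of the same check using filecmp, written for a current Python 3 interpreter. The helper names files_identical and write_sample, and the temporary file names, are assumptions made for this illustration. The sample files are created on the fly so the script does not depend on the samples directory.

```python
import filecmp
import os
import tempfile


def files_identical(path_a, path_b):
    """Return True if both files have the same contents."""
    # shallow=False forces a content comparison instead of trusting
    # the os.stat() signature (file type, size, modification time)
    return filecmp.cmp(path_a, path_b, shallow=False)


def write_sample(directory, name, data):
    """Create a small binary file and return its path."""
    path = os.path.join(directory, name)
    with open(path, "wb") as handle:
        handle.write(data)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        first = write_sample(directory, "first.bin", b"spam")
        second = write_sample(directory, "second.bin", b"eggs")
        if files_identical(first, second):
            print("files are identical")
        else:
            print("files differ!")
```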
readfile(filename) => string. Reads the contents of a text file as a single string.

def readfile(filename):
    file = open(filename, "r")
    return file.read()

readopenfile(file) => string. Returns the contents of an open file (or other file object).

def readopenfile(file):
    return file.read()
import soundex

a = "fredrik"
b = "friedrich"

print soundex.get_soundex(a), soundex.get_soundex(b)
print soundex.sound_similar(a, b)
F63620 F63620
1
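import timing
import time

def procedure():
    time.sleep(1.234)

timing.start()
procedure()
timing.finish()

print "seconds:", timing.seconds()
print "milliseconds:", timing.milli()
print "microseconds:", timing.micro()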
seconds: 1
milliseconds: 1239
microseconds: 1239999
The following script shows how you can emulate this module using functions in the standard time
module.
Example: Emulating the timing module
# File: timing-example-2.py

import time

t0 = t1 = 0

def start():
    global t0
    t0 = time.time()

def finish():
    global t1
    t1 = time.time()

def seconds():
    return int(t1 - t0)

def milli():
    return int((t1 - t0) * 1000)

def micro():
    return int((t1 - t0) * 1000000)
You can use time.clock() instead of time.time() to get CPU time, where supported.
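The choice of clock can also be made a parameter. The following sketch, written for a current Python 3 interpreter (where time.clock() is no longer available), wraps the same emulation in a class. The Stopwatch name and its clock argument are assumptions made for this illustration. Passing time.process_time measures CPU time, while the default time.perf_counter measures elapsed time.

```python
import time


class Stopwatch:
    """Emulates the start/finish/seconds/milli/micro interface."""

    def __init__(self, clock=time.perf_counter):
        # clock is any zero-argument callable returning seconds as a float
        self.clock = clock
        self.t0 = self.t1 = 0.0

    def start(self):
        self.t0 = self.clock()

    def finish(self):
        self.t1 = self.clock()

    def seconds(self):
        return int(self.t1 - self.t0)

    def milli(self):
        return int((self.t1 - self.t0) * 1000)

    def micro(self):
        return int((self.t1 - self.t0) * 1000000)


if __name__ == "__main__":
    watch = Stopwatch(time.process_time)  # CPU time instead of wall time
    watch.start()
    sum(range(100000))  # some work to measure
    watch.finish()
    print("microseconds:", watch.micro())
```

Injecting the clock also makes the class easy to check with a fake clock that returns fixed values.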
import posixfile
import string

filename = "counter.txt"

try:
    # open for update
    file = posixfile.open(filename, "r+")
    counter = int(file.read(6)) + 1
except IOError:
    # create it
    file = posixfile.open(filename, "w")
    counter = 0

file.lock("w|", 6)

file.seek(0) # rewind
file.write("%06d" % counter)

file.close() # releases the lock
import bisect

list = [10, 20, 30] # a sorted sample list

bisect.insort(list, 25)
bisect.insort(list, 15)

print list
bisect(sequence, item) => index. Returns the index where the item should be inserted. The
sequence is not modified.
import bisect

list = [10, 20, 30] # a sorted sample list

print list
print bisect.bisect(list, 25)
print bisect.bisect(list, 15)
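The difference between looking up an insertion point and actually inserting can be sketched as follows for a current Python 3 interpreter. The sample values and the variable names are assumptions chosen for this illustration.

```python
import bisect

values = [10, 20, 30]  # must already be sorted

# bisect only reports where an item would go; the list is left untouched
index_25 = bisect.bisect(values, 25)
index_15 = bisect.bisect(values, 15)
unchanged = list(values)

# insort finds the position and inserts in one step, keeping the list sorted
bisect.insort(values, 25)
bisect.insort(values, 15)

if __name__ == "__main__":
    print(index_25, index_15)
    print(unchanged)
    print(values)
```

For an item that is already present, bisect.bisect returns the position after the existing entries, and bisect.bisect_left returns the position before them.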
import knee
import os

if not os.environ.has_key("TZ"):
    # set it to something...
    os.environ["TZ"] = "EST+5EDT;100/2,300/2"
In addition to the variables shown in this example, this module contains a number of time
manipulation functions that use the defined time zone.
import regex
text = "Man's crisis of identity in the latter half of the 20th century"
p = regex.compile("latter") # literal
print p.match(text)
print p.search(text), repr(p.group(0))
p = regex.compile("[0-9]+") # number
print p.search(text), repr(p.group(0))
-1
32 'latter'
51 '20'
13 'of'
56 'century'
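The regex module no longer ships with current Python releases. As a rough sketch, the same searches can be written with the re module in Python 3. Note that re returns match objects (or None) rather than integer positions, and that the two-letter-word and end-of-text patterns below are assumptions chosen to reproduce the positions listed above. The find helper is a hypothetical name.

```python
import re

text = "Man's crisis of identity in the latter half of the 20th century"


def find(pattern):
    """Return (position, matched text) for the first match, or None."""
    match = re.search(pattern, text)
    if match is None:
        return None
    return match.start(), match.group(0)


# match() only succeeds at the start of the string, so this yields None
at_start = re.match("latter", text)

literal = find("latter")         # literal text
number = find("[0-9]+")          # run of digits
two_letter = find(r"\b\w\w\b")   # assumed pattern: two-letter word
last_word = find(r"\w+$")        # assumed pattern: word at the end

if __name__ == "__main__":
    print(at_start)
    for result in (literal, number, two_letter, last_word):
        print(result)
```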
import regsub
import reconvert
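import regex_syntax
import regex

def compile(pattern, syntax):
    # compile using the given syntax, then restore the previous one
    syntax = regex.set_syntax(syntax)
    try:
        pattern = regex.compile(pattern)
    finally:
        regex.set_syntax(syntax)
    return pattern

def compile_awk(pattern):
    return compile(pattern, regex_syntax.RE_SYNTAX_AWK)

def compile_grep(pattern):
    return compile(pattern, regex_syntax.RE_SYNTAX_GREP)

def compile_emacs(pattern):
    return compile(pattern, regex_syntax.RE_SYNTAX_EMACS)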
import find
.\samples\sample.jpg