Python for Data Analysis
Wes McKinney
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.
O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions
are also available for most titles (http://my.safaribooksonline.com). For more information, contact our
corporate/institutional sales department: 800-998-9938 or [email protected].
Editors: Julie Steele and Meghan Blanchette
Production Editor: Melanie Yarbrough
Copyeditor: Teresa Exley
Proofreader: BIM Publishing Services
Indexer: BIM Publishing Services
Cover Designer: Karen Montgomery
Interior Designer: David Futato
Illustrator: Rebecca Demarest
Nutshell Handbook, the Nutshell Handbook logo, and the O’Reilly logo are registered trademarks of
O’Reilly Media, Inc. Python for Data Analysis, the cover image of a golden-tailed tree shrew, and related
trade dress are trademarks of O’Reilly Media, Inc.
Many of the designations used by manufacturers and sellers to distinguish their products are claimed as
trademarks. Where those designations appear in this book, and O’Reilly Media, Inc., was aware of a
trademark claim, the designations have been printed in caps or initial caps.
While every precaution has been taken in the preparation of this book, the publisher and author assume
no responsibility for errors or omissions, or for damages resulting from the use of the information con-
tained herein.
ISBN: 978-1-449-31979-3
[LSI]
1349356084
Table of Contents
Preface xi
1. Preliminaries 1
What Is This Book About? 1
Why Python for Data Analysis? 2
Python as Glue 2
Solving the “Two-Language” Problem 2
Why Not Python? 3
Essential Python Libraries 3
NumPy 4
pandas 4
matplotlib 5
IPython 5
SciPy 6
Installation and Setup 6
Windows 7
Apple OS X 9
GNU/Linux 10
Python 2 and Python 3 11
Integrated Development Environments (IDEs) 11
Community and Conferences 12
Navigating This Book 12
Code Examples 13
Data for Examples 13
Import Conventions 13
Jargon 13
Acknowledgements 14
2. Introductory Examples 17
1.usa.gov data from bit.ly 17
Counting Time Zones in Pure Python 19
Counting Time Zones with pandas 21
MovieLens 1M Data Set 26
Measuring rating disagreement 30
US Baby Names 1880-2010 32
Analyzing Naming Trends 36
Conclusions and The Path Ahead 43
Operations between Arrays and Scalars 85
Basic Indexing and Slicing 86
Boolean Indexing 89
Fancy Indexing 92
Transposing Arrays and Swapping Axes 93
Universal Functions: Fast Element-wise Array Functions 95
Data Processing Using Arrays 97
Expressing Conditional Logic as Array Operations 98
Mathematical and Statistical Methods 100
Methods for Boolean Arrays 101
Sorting 101
Unique and Other Set Logic 102
File Input and Output with Arrays 103
Storing Arrays on Disk in Binary Format 103
Saving and Loading Text Files 104
Linear Algebra 105
Random Number Generation 106
Example: Random Walks 108
Simulating Many Random Walks at Once 109
Other pandas Topics 151
Integer Indexing 151
Panel Data 152
8. Plotting and Visualization 219
A Brief matplotlib API Primer 219
Figures and Subplots 220
Colors, Markers, and Line Styles 224
Ticks, Labels, and Legends 225
Annotations and Drawing on a Subplot 228
Saving Plots to File 231
matplotlib Configuration 231
Plotting Functions in pandas 232
Line Plots 232
Bar Plots 235
Histograms and Density Plots 238
Scatter Plots 239
Plotting Maps: Visualizing Haiti Earthquake Crisis Data 241
Python Visualization Tool Ecosystem 247
Chaco 248
mayavi 248
Other Packages 248
The Future of Visualization Tools? 249
Index 433
Preface
The scientific Python ecosystem of open source libraries has grown substantially over
the last 10 years. By late 2011, I had long felt that the lack of centralized learning
resources for data analysis and statistical applications was a stumbling block for new
Python programmers engaged in such work. Key projects for data analysis (especially
NumPy, IPython, matplotlib, and pandas) had also matured enough that a book written
about them would likely not go out-of-date very quickly. Thus, I mustered the nerve
to embark on this writing project. This is the book that I wish existed when I started
using Python for data analysis in 2007. I hope you find it useful and are able to apply
these tools productively in your work.
This icon indicates a warning or caution.
How to Contact Us
Please address comments and questions concerning this book to the publisher:
O’Reilly Media, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
800-998-9938 (in the United States or Canada)
707-829-0515 (international or local)
707-829-0104 (fax)
We have a web page for this book, where we list errata, examples, and any additional
information. You can access this page at http://oreil.ly/python_for_data_analysis.
To comment or ask technical questions about this book, send email to
[email protected].
For more information about our books, courses, conferences, and news, see our website
at http://www.oreilly.com.
Find us on Facebook: http://facebook.com/oreilly
Follow us on Twitter: http://twitter.com/oreillymedia
Watch us on YouTube: http://www.youtube.com/oreillymedia
CHAPTER 1
Preliminaries
Why Python for Data Analysis?
For many people (myself among them), the Python language is easy to fall in love with.
Since its first appearance in 1991, Python has become one of the most popular dynamic
programming languages, along with Perl, Ruby, and others. Python and Ruby have
become especially popular in recent years for building websites using their numerous
web frameworks, like Rails (Ruby) and Django (Python). Such languages are often
called scripting languages as they can be used to write quick-and-dirty small programs,
or scripts. I don’t like the term “scripting language” as it carries a connotation that they
cannot be used for building mission-critical software. Among interpreted languages
Python is distinguished by its large and active scientific computing community. Adop-
tion of Python for scientific computing in both industry applications and academic
research has increased significantly since the early 2000s.
For data analysis and interactive, exploratory computing and data visualization, Python
will inevitably draw comparisons with the many other domain-specific open source
and commercial programming languages and tools in wide use, such as R, MATLAB,
SAS, Stata, and others. In recent years, Python’s improved library support (primarily
pandas) has made it a strong alternative for data manipulation tasks. Combined with
Python’s strength in general purpose programming, it is an excellent choice as a single
language for building data-centric applications.
Python as Glue
Part of Python’s success as a scientific computing platform is the ease of integrating C,
C++, and FORTRAN code. Most modern computing environments share a similar set
of legacy FORTRAN and C libraries for doing linear algebra, optimization, integration,
fast Fourier transforms, and other such algorithms. The same story has held true for
many companies and national labs that have used Python to glue together 30 years’
worth of legacy software.
Most programs consist of small portions of code where most of the time is spent, with
large amounts of “glue code” that doesn’t run often. In many cases, the execution time
of the glue code is insignificant; effort is most fruitfully invested in optimizing the
computational bottlenecks, sometimes by moving the code to a lower-level language
like C.
In the last few years, the Cython project (http://cython.org) has become one of the
preferred ways of both creating fast compiled extensions for Python and also interfacing
with C and C++ code.
ideas to be part of a larger production system written in, say, Java, C#, or C++. What
people are increasingly finding is that Python is a suitable language not only for doing
research and prototyping but also for building the production systems. I believe that
more and more companies will go down this path as there are often significant organ-
izational benefits to having both scientists and technologists using the same set of pro-
grammatic tools.
pandas
pandas provides rich data structures and functions designed to make working with
structured data fast, easy, and expressive. It is, as you will see, one of the critical in-
gredients enabling Python to be a powerful and productive data analysis environment.
The primary object in pandas that will be used in this book is the DataFrame, a two-
dimensional tabular, column-oriented data structure with both row and column labels:
>>> frame
    total_bill  tip   sex     smoker  day  time    size
1   16.99       1.01  Female  No      Sun  Dinner  2
2   10.34       1.66  Male    No      Sun  Dinner  3
3   21.01       3.5   Male    No      Sun  Dinner  3
4   23.68       3.31  Male    No      Sun  Dinner  2
5   24.59       3.61  Female  No      Sun  Dinner  4
6   25.29       4.71  Male    No      Sun  Dinner  4
7   8.77        2     Male    No      Sun  Dinner  2
8   26.88       3.12  Male    No      Sun  Dinner  4
9   15.04       1.96  Male    No      Sun  Dinner  2
10  14.78       3.23  Male    No      Sun  Dinner  2
pandas combines the high performance array-computing features of NumPy with the
flexible data manipulation capabilities of spreadsheets and relational databases (such
as SQL). It provides sophisticated indexing functionality to make it easy to reshape,
slice and dice, perform aggregations, and select subsets of data. pandas is the primary
tool that we will use in this book.
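As a minimal sketch of the kind of object described here, the following builds a small DataFrame from the first three rows of the table above, then selects a subset of rows and computes a group aggregate. It assumes a recent pandas release running under Python 3; the variable names `big_parties` and `mean_tip_by_sex` are illustrative and do not come from the book.

```python
import pandas as pd

# First three rows of the table shown above, keyed by column name.
frame = pd.DataFrame(
    {
        "total_bill": [16.99, 10.34, 21.01],
        "tip": [1.01, 1.66, 3.50],
        "sex": ["Female", "Male", "Male"],
        "size": [2, 3, 3],
    },
    index=[1, 2, 3],  # row labels, matching the listing
)

# Select a subset of rows with a boolean condition on one column.
big_parties = frame[frame["size"] >= 3]

# Aggregate: mean tip within each group of the "sex" column.
mean_tip_by_sex = frame.groupby("sex")["tip"].mean()

print(big_parties)
print(mean_tip_by_sex)
```

Both results keep their labels: `big_parties` is still indexed by the original row labels, and `mean_tip_by_sex` is indexed by the group values.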
For financial users, pandas features rich, high-performance time series functionality
and tools well-suited for working with financial data. In fact, I initially designed pandas
as an ideal tool for financial data analysis applications.
For users of the R language for statistical computing, the DataFrame name will be
familiar, as the object was named after the similar R data.frame object. They are not
the same, however; the functionality provided by data.frame in R is essentially a strict
subset of that provided by the pandas DataFrame. While this is a book about Python, I
will occasionally draw comparisons with R as it is one of the most widely-used open
source data analysis environments and will be familiar to many readers.
The pandas name itself is derived from panel data, an econometrics term for multidi-
mensional structured data sets, and Python data analysis itself.
matplotlib
matplotlib is the most popular Python library for producing plots and other 2D data
visualizations. It was originally created by John D. Hunter (JDH) and is now maintained
by a large team of developers. It is well-suited for creating plots suitable for publication.
It integrates well with IPython (see below), thus providing a comfortable interactive
environment for plotting and exploring data. The plots are also interactive; you can
zoom in on a section of the plot and pan around the plot using the toolbar in the plot
window.
IPython
IPython is the component in the standard scientific Python toolset that ties everything
together. It provides a robust and productive environment for interactive and explor-
atory computing. It is an enhanced Python shell designed to accelerate the writing,
testing, and debugging of Python code. It is particularly useful for interactively working
with data and visualizing data with matplotlib. IPython is usually involved with the
majority of my Python work, including running, debugging, and testing code.
Aside from the standard terminal-based IPython shell, the project also provides
• A Mathematica-like HTML notebook for connecting to IPython through a web
browser (more on this later).
• A Qt framework-based GUI console with inline plotting, multiline editing, and
syntax highlighting
• An infrastructure for interactive parallel and distributed computing
I will devote a chapter to IPython and how to get the most out of its features. I strongly
recommend using it while working through this book.
• Scientific Python base: NumPy, SciPy, matplotlib, and IPython. These are all in-
cluded in EPDFree.
• IPython Notebook dependencies: tornado and pyzmq. These are included in EPD-
Free.
• pandas (version 0.8.2 or higher).
At some point while reading you may wish to install one or more of the following
packages: statsmodels, PyTables, PyQt (or equivalently, PySide), xlrd, lxml, basemap,
pymongo, and requests. These are used in various examples. Installing these optional
libraries is not necessary, and I would suggest waiting until you need them. For
example, installing PyQt or PyTables from source on OS X or Linux can be rather
arduous. For now, it’s most important to get up and running with the bare minimum:
EPDFree and pandas.
For information on each Python package and links to binary installers or other help,
see the Python Package Index (PyPI, http://pypi.python.org). This is also an excellent
resource for finding new Python packages.
Windows
To get started on Windows, download the EPDFree installer from http://www.enthought.com,
which should be an MSI installer named like epd_free-7.3-1-win-x86.msi. Run the installer
and accept the default installation location C:\Python27. If
you had previously installed Python in this location, you may want to delete it manually
first (or using Add/Remove Programs).
Next, you need to verify that Python has been successfully added to the system path
and that there are no conflicts with any prior-installed Python versions. First, open a
command prompt by going to the Start Menu and starting the Command Prompt ap-
plication, also known as cmd.exe. Try starting the Python interpreter by typing
python. You should see a message that matches the version of EPDFree you installed:
C:\Users\Wes>python
Python 2.7.3 |EPD_free 7.3-1 (32-bit)| (default, Apr 12 2012, 14:30:37) on win32
Type "credits", "demo" or "enthought" for more information.
>>>
If you installed other versions of Python, be sure to delete any other Python-related
directories from both the system and user Path variables. After making a path alteration,
you have to restart the command prompt for the changes to take effect.
Once you can launch Python successfully from the command prompt, you need to
install pandas. The easiest way is to download the appropriate binary installer from
http://pypi.python.org/pypi/pandas. For EPDFree, this should be pandas-0.9.0.win32-py2.7.exe.
After you run this, let’s launch IPython and check that things are installed
correctly by importing pandas and making a simple matplotlib plot:
C:\Users\Wes>ipython --pylab
Python 2.7.3 |EPD_free 7.3-1 (32-bit)|
Type "copyright", "credits" or "license" for more information.
In [1]: import pandas
In [2]: plot(arange(10))
If successful, there should be no error messages and a plot window will appear. You
can also check that the IPython HTML notebook can be successfully run by typing:
$ ipython notebook --pylab=inline
EPDFree on Windows contains only 32-bit executables. If you want or need a 64-bit
setup on Windows, using EPD Full is the most painless way to accomplish that. If you
would rather install from scratch and not pay for an EPD subscription, Christoph
Gohlke at the University of California, Irvine, publishes unofficial binary installers for
all of the book’s necessary packages (http://www.lfd.uci.edu/~gohlke/pythonlibs/) for 32-
and 64-bit Windows.
Apple OS X
To get started on OS X, you must first install Xcode, which includes Apple’s suite of
software development tools. The necessary component for our purposes is the gcc C
and C++ compiler suite. The Xcode installer can be found on the OS X install DVD
that came with your computer or downloaded from Apple directly.
Once you’ve installed Xcode, launch the terminal (Terminal.app) by navigating to
Applications > Utilities. Type gcc and press enter. You should hopefully see some-
thing like:
$ gcc
i686-apple-darwin10-gcc-4.2.1: no input files
Now you need to install EPDFree. Download the installer which should be a disk image
named something like epd_free-7.3-1-macosx-i386.dmg. Double-click the .dmg file to
mount it, then double-click the .mpkg file inside to run the installer.
When the installer runs, it automatically appends the EPDFree executable path to
your .bash_profile file. This is located at /Users/your_uname/.bash_profile:
# Setting PATH for EPD_free-7.3-1
PATH="/Library/Frameworks/Python.framework/Versions/Current/bin:${PATH}"
export PATH
Should you encounter any problems in the following steps, you’ll want to inspect
your .bash_profile and potentially add the above directory to your path.
Now, it’s time to install pandas. Execute this command in the terminal:
$ sudo easy_install pandas
Searching for pandas
Reading http://pypi.python.org/simple/pandas/
Reading http://pandas.pydata.org
Reading http://pandas.sourceforge.net
Best match: pandas 0.9.0
Downloading http://pypi.python.org/packages/source/p/pandas/pandas-0.9.0.zip
Processing pandas-0.9.0.zip
Writing /tmp/easy_install-H5mIX6/pandas-0.9.0/setup.cfg
Running pandas-0.9.0/setup.py -q bdist_egg --dist-dir /tmp/easy_install-H5mIX6/
pandas-0.9.0/egg-dist-tmp-RhLG0z
Adding pandas 0.9.0 to easy-install.pth file
Installed /Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/
site-packages/pandas-0.9.0-py2.7-macosx-10.5-i386.egg
Processing dependencies for pandas
Finished processing dependencies for pandas
To verify everything is working, launch IPython in Pylab mode and test importing pan-
das then making a plot interactively:
In [1]: import pandas
In [2]: plot(arange(10))
If this succeeds, a plot window with a straight line should pop up.
GNU/Linux
Linux details will vary a bit depending on your Linux flavor, but here I give details for
Debian-based GNU/Linux systems like Ubuntu and Mint. Setup is similar to OS X with
the exception of how EPDFree is installed. The installer is a shell script that must be
executed in the terminal. Depending on whether you have a 32-bit or 64-bit system,
you will either need to install the x86 (32-bit) or x86_64 (64-bit) installer. You will then
have a file named something similar to epd_free-7.3-1-rh5-x86_64.sh. To install it,
execute this script with bash:
$ bash epd_free-7.3-1-rh5-x86_64.sh
After accepting the license, you will be presented with a choice of where to put the
EPDFree files. I recommend installing the files in your home directory, say /home/wesm/
epd (substituting your own username for wesm).
Once the installer has finished, you need to add EPDFree’s bin directory to your
$PATH variable. If you are using the bash shell (the default in Ubuntu, for example), this
means adding the following path addition in your .bashrc:
export PATH=/home/wesm/epd/bin:$PATH
Obviously, substitute the installation directory you used for /home/wesm/epd/. After
doing this you can either start a new terminal process or execute your .bashrc again
with source ~/.bashrc.
You need a C compiler such as gcc to move forward; many Linux distributions include
gcc, but others may not. On Debian systems, you can install gcc by executing:
sudo apt-get install gcc
If you type gcc on the command line it should say something like:
$ gcc
gcc: no input files
If you installed EPDFree as root, you may need to add sudo to the command and enter
the sudo or root password. To verify things are working, perform the same checks as
in the OS X section.
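A scriptable complement to these interactive checks is sketched below. It reports which of the required distributions are installed and at what version, without importing them. It assumes Python 3.8 or later (for `importlib.metadata`), which postdates the setup described here, and `installed_versions` is a hypothetical helper name.

```python
from importlib import metadata

def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            # Looks up installed package metadata; the package is not imported.
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

# Distribution names, which may differ from import names for some packages.
report = installed_versions(["numpy", "pandas", "matplotlib", "ipython"])
for name, version in report.items():
    print(name, version if version is not None else "NOT INSTALLED")
```

This only confirms that a package is installed; importing it and drawing a plot, as described above, remains the real test that the environment works.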
I encourage you to download the data and use it to replicate the book’s code examples
and experiment with the tools presented in each chapter. I will happily accept contri-
butions, scripts, IPython notebooks, or any other materials you wish to contribute to
the book's repository for all to enjoy.
Code Examples
Most of the code examples in the book are shown with input and output as it would
appear executed in the IPython shell.
In [5]: code
Out[5]: output
At times, for clarity, multiple code examples will be shown side by side. These should
be read left to right and executed separately.
In [5]: code         In [6]: code2
Out[5]: output       Out[6]: output2
Import Conventions
The Python community has adopted a number of naming conventions for commonly-
used modules:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
This means that when you see np.arange, this is a reference to the arange function in
NumPy. This is done as it’s considered bad practice in Python software development
to import everything (from numpy import *) from a large package like NumPy.
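A small illustration of the convention, assuming NumPy is installed; the variable name `values` is illustrative:

```python
import numpy as np

# np.arange is NumPy's array-valued counterpart to the built-in range.
values = np.arange(5)

print(values)      # a NumPy array holding 0 through 4
print(values * 2)  # arithmetic is applied element-wise
```

Because every NumPy name is reached through the `np.` prefix, it is always clear which package a function comes from.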
Jargon
I’ll use some terms common both to programming and data science that you may not
be familiar with. Thus, here are some brief definitions:
Munge/Munging/Wrangling
Describes the overall process of manipulating unstructured and/or messy data into
a structured or clean form. The word has snuck its way into the jargon of many
modern day data hackers. Munge rhymes with “lunge”.
Acknowledgements
It would have been difficult for me to write this book without the support of a large
number of people.
On the O’Reilly staff, I’m very grateful for my editors Meghan Blanchette and Julie
Steele who guided me through the process. Mike Loukides also worked with me in the
proposal stages and helped make the book a reality.
I received a wealth of technical review from a large cast of characters. In particular,
Martin Blais and Hugh White were incredibly helpful in improving the book’s exam-
ples, clarity, and organization from cover to cover. James Long, Drew Conway, Fer-
nando Pérez, Brian Granger, Thomas Kluyver, Adam Klein, Josh Klein, Chang She, and
Stéfan van der Walt each reviewed one or more chapters, providing pointed feedback
from many different perspectives.
I got many great ideas for examples and data sets from friends and colleagues in the
data community, among them: Mike Dewar, Jeff Hammerbacher, James Johndrow,
Kristian Lum, Adam Klein, Hilary Mason, Chang She, and Ashley Williams.
I am of course indebted to the many leaders in the open source scientific Python com-
munity who’ve built the foundation for my development work and gave encouragement
while I was writing this book: the IPython core team (Fernando Pérez, Brian Granger,
Min Ragan-Kelly, Thomas Kluyver, and others), John Hunter, Skipper Seabold, Travis
Oliphant, Peter Wang, Eric Jones, Robert Kern, Josef Perktold, Francesc Alted, Chris
Fonnesbeck, and too many others to mention. Several other people provided a great
deal of support, ideas, and encouragement along the way: Drew Conway, Sean Taylor,
Giuseppe Paleologo, Jared Lander, David Epstein, John Krowas, Joshua Bloom, Den
Pilsworth, John Myles-White, and many others I’ve forgotten.
I’d also like to thank a number of people from my formative years. First, my former
AQR colleagues who’ve cheered me on in my pandas work over the years: Alex Reyf-
man, Michael Wong, Tim Sargen, Oktay Kurbanov, Matthew Tschantz, Roni Israelov,
Michael Katz, Chris Uga, Prasad Ramanan, Ted Square, and Hoon Kim. Lastly, my
academic advisors Haynes Miller (MIT) and Mike West (Duke).
On the personal side, Casey Dinkin provided invaluable day-to-day support during the
writing process, tolerating my highs and lows as I hacked together the final draft on
top of an already overcommitted schedule. Lastly, my parents, Bill and Kim, taught me
to always follow my dreams and to never settle for less.
CHAPTER 2
Introductory Examples
This book teaches you the Python tools to work productively with data. While readers
may have many different end goals for their work, the tasks required generally fall into
a number of different broad groups:
Interacting with the outside world
Reading and writing with a variety of file formats and databases.
Preparation
Cleaning, munging, combining, normalizing, reshaping, slicing and dicing, and
transforming data for analysis.
Transformation
Applying mathematical and statistical operations to groups of data sets to derive
new data sets. For example, aggregating a large table by group variables.
Modeling and computation
Connecting your data to statistical models, machine learning algorithms, or other
computational tools.
Presentation
Creating interactive or static graphical visualizations or textual summaries.
In this chapter I will show you a few data sets and some things we can do with them.
These examples are just intended to pique your interest and thus will only be explained
at a high level. Don’t worry if you have no experience with any of these tools; they will
be discussed in great detail throughout the rest of the book. In the code examples you’ll
see input and output prompts like In [15]:; these are from the IPython shell.
In the case of the hourly snapshots, each line in each file contains a common form of
web data known as JSON, which stands for JavaScript Object Notation. For example,
if we read just the first line of a file, we may see something like:
In [15]: path = 'ch02/usagov_bitly_data2012-03-16-1331923249.txt'
In [16]: open(path).readline()
Out[16]: '{ "a": "Mozilla\\/5.0 (Windows NT 6.1; WOW64) AppleWebKit\\/535.11
(KHTML, like Gecko) Chrome\\/17.0.963.78 Safari\\/535.11", "c": "US", "nk": 1,
"tz": "America\\/New_York", "gr": "MA", "g": "A6qOVH", "h": "wfLQtf", "l":
"orofrog", "al": "en-US,en;q=0.8", "hh": "1.usa.gov", "r":
"http:\\/\\/www.facebook.com\\/l\\/7AQEFzjSi\\/1.usa.gov\\/wfLQtf", "u":
"http:\\/\\/www.ncbi.nlm.nih.gov\\/pubmed\\/22415991", "t": 1331923247, "hc":
1331822918, "cy": "Danvers", "ll": [ 42.576698, -70.954903 ] }\n'
Python has numerous built-in and third-party modules for converting a JSON string into
a Python dictionary object. Here I’ll use the json module and its loads function invoked
on each line in the sample file I downloaded:
import json
path = 'ch02/usagov_bitly_data2012-03-16-1331923249.txt'
records = [json.loads(line) for line in open(path)]
If you’ve never programmed in Python before, the last expression here is called a list
comprehension, which is a concise way of applying an operation (like json.loads) to a
collection of strings or other objects. Conveniently, iterating over an open file handle
gives you a sequence of its lines. The resulting object records is now a list of Python
dicts:
In [18]: records[0]
Out[18]:
{u'a': u'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.11 (KHTML, like
Gecko) Chrome/17.0.963.78 Safari/535.11',
u'al': u'en-US,en;q=0.8',
u'c': u'US',
u'cy': u'Danvers',
u'g': u'A6qOVH',
u'gr': u'MA',
u'h': u'wfLQtf',
u'hc': 1331822918,
u'hh': u'1.usa.gov',
u'l': u'orofrog',
u'll': [42.576698, -70.954903],
u'nk': 1,
u'r': u'http://www.facebook.com/l/7AQEFzjSi/1.usa.gov/wfLQtf',
u't': 1331923247,
u'tz': u'America/New_York',
u'u': u'http://www.ncbi.nlm.nih.gov/pubmed/22415991'}
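The parsing step can be tried without the downloaded file. The sketch below uses an in-memory file object holding two made-up lines in the same one-JSON-object-per-line layout. It is written for Python 3, and the sample values are invented for illustration.

```python
import io
import json

# Two made-up lines, one JSON object per line, standing in for the data file.
sample = io.StringIO(
    '{"tz": "America/New_York", "c": "US"}\n'
    '{"tz": "", "c": "BR"}\n'
)

# Iterating over a file-like object yields one line at a time,
# so the comprehension parses each line into a dict.
records = [json.loads(line) for line in sample]

print(len(records))
print(records[0]["tz"])
```

For a file on disk, wrapping the read in `with open(path) as f:` closes the file handle once the comprehension finishes. Under Python 3 the parsed strings display without a `u` prefix.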
1. http://www.usa.gov/About/developer-resources/1usagov.shtml
The u here in front of the quotation stands for Unicode, a standard form of string encoding.
Note that IPython shows the time zone string object representation here rather
than its print equivalent:
In [20]: print records[0]['tz']
America/New_York
KeyError: 'tz'
Oops! Turns out that not all of the records have a time zone field. This is easy to handle
as we can add the check if 'tz' in rec at the end of the list comprehension:
In [26]: time_zones = [rec['tz'] for rec in records if 'tz' in rec]
In [27]: time_zones[:10]
Out[27]:
[u'America/New_York',
u'America/Denver',
u'America/New_York',
u'America/Sao_Paulo',
u'America/New_York',
u'America/New_York',
u'Europe/Warsaw',
u'',
u'',
u'']
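The same filter can be tried on a few hand-made records. This sketch, written for Python 3, also shows one way to drop the empty strings in the same pass. The sample records, including the `_heartbeat_` value, are invented for illustration, and `known_zones` is a hypothetical name.

```python
records = [
    {"tz": "America/New_York"},
    {"_heartbeat_": 1331923261},  # no 'tz' key at all
    {"tz": ""},                   # key present, but the value is unknown
    {"tz": "Europe/Warsaw"},
]

# Keep only the records that have a 'tz' key.
time_zones = [rec["tz"] for rec in records if "tz" in rec]

# Also drop empty strings: rec.get returns None when the key is absent,
# and both None and '' are falsy.
known_zones = [rec["tz"] for rec in records if rec.get("tz")]

print(time_zones)
print(known_zones)
```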
Just looking at the first 10 time zones we see that some of them are unknown (empty).
You can filter these out also but I’ll leave them in for now. Now, to produce counts by
time zone I’ll show two approaches: the harder way (using just the Python standard
library) and the easier way (using pandas). One way to do the counting is to use a dict
to store counts while we iterate through the time zones:
def get_counts(sequence):
    counts = {}
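A complete dict-based counter along these lines might look like the following sketch. It is written for Python 3, and `count_items` is a hypothetical name chosen so that it does not stand in for the listing above.

```python
def count_items(sequence):
    """Count occurrences of each item using a plain dict."""
    counts = {}
    for x in sequence:
        if x in counts:
            counts[x] += 1   # seen before: increment
        else:
            counts[x] = 1    # first occurrence: start at 1
    return counts

counts = count_items(["America/New_York", "", "America/New_York", "Europe/Warsaw"])
print(counts["America/New_York"])
```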
If you know a bit more about the Python standard library, you might prefer to write
the same thing more briefly:
from collections import defaultdict

def get_counts2(sequence):
    counts = defaultdict(int) # values will initialize to 0
    for x in sequence:
        counts[x] += 1
    return counts
I put this logic in a function just to make it more reusable. To use it on the time zones,
just pass the time_zones list:
In [31]: counts = get_counts(time_zones)
In [32]: counts['America/New_York']
Out[32]: 1251
In [33]: len(time_zones)
Out[33]: 3440
If we wanted the top 10 time zones and their counts, we have to do a little bit of dic-
tionary acrobatics:
def top_counts(count_dict, n=10):
    value_key_pairs = [(count, tz) for tz, count in count_dict.items()]
    value_key_pairs.sort()
    return value_key_pairs[-n:]
We have then:
In [35]: top_counts(counts)
Out[35]:
[(33, u'America/Sao_Paulo'),
(35, u'Europe/Madrid'),
(36, u'Pacific/Honolulu'),
(37, u'Asia/Tokyo'),
(74, u'Europe/London'),
(191, u'America/Denver'),
(382, u'America/Los_Angeles'),
(400, u'America/Chicago'),
(521, u''),
(1251, u'America/New_York')]
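The collections.Counter class in the standard library makes this task easier. With counts built as a Counter over time_zones:
In [51]: counts.most_common(10)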
Out[51]:
[(u'America/New_York', 1251),
(u'', 521),
(u'America/Chicago', 400),
(u'America/Los_Angeles', 382),
(u'America/Denver', 191),
(u'Europe/London', 74),
(u'Asia/Tokyo', 37),
(u'Pacific/Honolulu', 36),
(u'Europe/Madrid', 35),
(u'America/Sao_Paulo', 33)]
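The most_common method used here belongs to collections.Counter. A small Python 3 sketch of building such a counter, using a hypothetical stand-in for the time_zones list:

```python
from collections import Counter

# Hypothetical sample standing in for the time_zones list.
time_zones = ["America/New_York", "", "America/Chicago",
              "America/New_York", "", "America/New_York"]

# Counter tallies the items of any iterable of hashable values.
counts = Counter(time_zones)

# most_common(n) returns (item, count) pairs, highest count first.
top_two = counts.most_common(2)
print(top_two)
```

Unlike the sorted list returned by top_counts, the pairs come back as (item, count) and in descending order of count.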
With pandas, the records are loaded into a DataFrame, here named frame:
In [292]: frame
Out[292]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 3560 entries, 0 to 3559
Data columns:
_heartbeat_ 120 non-null values
a 3440 non-null values
al 3094 non-null values
c 2919 non-null values
cy 2919 non-null values
g 3440 non-null values
gr 2919 non-null values
h 3440 non-null values
hc 3440 non-null values
hh 3440 non-null values
kw 93 non-null values
l 3440 non-null values
ll 2919 non-null values
nk 3440 non-null values
r 3440 non-null values
t 3440 non-null values
tz 3440 non-null values
In [293]: frame['tz'][:10]
Out[293]:
0 America/New_York
1 America/Denver
2 America/New_York
3 America/Sao_Paulo
4 America/New_York
5 America/New_York
6 Europe/Warsaw
7
8
9
Name: tz
The output shown for the frame is the summary view, shown for large DataFrame
objects. The Series object returned by frame['tz'] has a method value_counts that gives
us what we’re looking for:
In [294]: tz_counts = frame['tz'].value_counts()
In [295]: tz_counts[:10]
Out[295]:
America/New_York       1251
                        521
America/Chicago         400
America/Los_Angeles     382
America/Denver          191
Europe/London            74
Asia/Tokyo               37
Pacific/Honolulu         36
Europe/Madrid            35
America/Sao_Paulo        33
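The same steps can be sketched end to end on a handful of records. This assumes a recent pandas on Python 3 and uses hypothetical sample records; the _heartbeat_ value is made up for illustration.

```python
import pandas as pd

# Hypothetical records; the last one has no 'tz' key, so pandas fills
# that cell with a missing value (NaN) when the frame is built.
records = [
    {"tz": "America/New_York"},
    {"tz": ""},
    {"tz": "America/New_York"},
    {"_heartbeat_": 1331923247},
]

# A list of dicts becomes one row per dict and one column per key.
frame = pd.DataFrame(records)

# value_counts tallies each distinct value, most frequent first,
# and leaves missing values out by default.
tz_counts = frame["tz"].value_counts()
print(tz_counts)
```

Note that the empty string is counted as an ordinary value, while the record without a tz key does not appear in the counts at all.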
Then, we might want to make a plot of this data using the plotting library, matplotlib. You
can do a bit of munging to fill in a substitute value for unknown and missing time zone
data in the records. The fillna function can replace missing (NA) values, and unknown
values (empty strings) can be replaced by boolean array indexing:
In [296]: clean_tz = frame['tz'].fillna('Missing')
In [299]: tz_counts[:10]
Out[299]:
America/New_York       1251
Unknown                 521
America/Chicago         400
America/Los_Angeles     382
America/Denver          191
Missing                 120
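The listing above shows only the fillna step before jumping to the result. A minimal sketch of the whole cleanup on a small hypothetical Series follows, assuming a recent pandas on Python 3; the boolean-indexing line is an illustration of the technique named in the text, not a recovered line from the book.

```python
import numpy as np
import pandas as pd

# Hypothetical time zone column: two empty strings and one missing value.
tz = pd.Series(["America/New_York", "", np.nan, "America/New_York", ""])

# Step 1: replace missing (NA) values with a label.
clean_tz = tz.fillna("Missing")

# Step 2: boolean array indexing. The comparison yields a True/False mask,
# and assigning through the mask overwrites only the matching entries.
clean_tz[clean_tz == ""] = "Unknown"

tz_counts = clean_tz.value_counts()
print(tz_counts)
```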
Making a horizontal bar plot can be accomplished using the plot method on the
counts objects:
In [301]: tz_counts[:10].plot(kind='barh', rot=0)
See Figure 2-1 for the resulting figure. We’ll explore more tools for working with this
kind of data. For example, the a field contains information about the browser, device,
or application used to perform the URL shortening:
In [302]: frame['a'][1]
Out[302]: u'GoogleMaps/RochesterNY'
In [303]: frame['a'][50]
Out[303]: u'Mozilla/5.0 (Windows NT 5.1; rv:10.0.2) Gecko/20100101 Firefox/10.0.2'
In [304]: frame['a'][51]
Out[304]: u'Mozilla/5.0 (Linux; U; Android 2.2.2; en-us; LG-P925/V10e Build/FRG83G) AppleWebKit/533.1 (K
Parsing all of the interesting information in these “agent” strings may seem like a
daunting task. Luckily, once you have mastered Python’s built-in string functions and
regular expression capabilities, it is really not so bad. For example, we could split off
the first token in the string (corresponding roughly to the browser capability) and make
another summary of the user behavior:
In [305]: results = Series([x.split()[0] for x in frame.a.dropna()])
In [306]: results[:5]
Out[306]:
0 Mozilla/5.0
1 GoogleMaps/RochesterNY
2 Mozilla/4.0
3 Mozilla/5.0
4 Mozilla/5.0
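The token-splitting step does not depend on pandas. A plain Python 3 sketch of the same idea, with collections.Counter standing in for the summary; the third agent string is hypothetical, and None marks a missing agent:

```python
from collections import Counter

agents = [
    "Mozilla/5.0 (Windows NT 5.1; rv:10.0.2) Gecko/20100101 Firefox/10.0.2",
    "GoogleMaps/RochesterNY",
    "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1)",  # hypothetical
    None,  # a record with no agent field
]

# str.split() with no arguments splits on runs of whitespace, so element 0
# is everything up to the first space. Missing agents are skipped.
first_tokens = [a.split()[0] for a in agents if a is not None]

# Tally the tokens to get a rough browser summary.
token_counts = Counter(first_tokens)
print(token_counts.most_common())
```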
Now, suppose you wanted to decompose the top time zones into Windows and non-Windows
users. As a simplification, let’s say that a user is on Windows if the string
'Windows' is in the agent string. Since some of the agents are missing, I’ll exclude these
from the data:
In [308]: cframe = frame[frame.a.notnull()]
In [310]: operating_system[:5]
Out[310]:
0 Windows
1 Not Windows
2 Windows
3 Not Windows
4 Windows
Name: a
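The step that derives operating_system is not shown in the listing above. One way to build such a labeling is numpy.where over a string-containment test, sketched here under that assumption on a small hypothetical frame. The result below is a NumPy array rather than the named Series shown in the output above.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one Windows agent, one non-Windows agent, one missing.
frame = pd.DataFrame({
    "a": ["Mozilla/5.0 (Windows NT 5.1; rv:10.0.2)", "GoogleMaps/RochesterNY", None],
    "tz": ["America/New_York", "America/Denver", "America/New_York"],
})

# Exclude the rows whose agent is missing.
cframe = frame[frame["a"].notnull()]

# Element-wise substring test, giving a boolean Series.
is_windows = cframe["a"].str.contains("Windows")

# np.where picks the first label where the mask is True, else the second.
operating_system = np.where(is_windows, "Windows", "Not Windows")
print(operating_system)
```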
Then, you can group the data by its time zone column and this new list of operating
systems:
In [311]: by_tz_os = cframe.groupby(['tz', operating_system])
The group counts, analogous to the value_counts function above, can be computed
using size. This result is then reshaped into a table with unstack:
In [312]: agg_counts = by_tz_os.size().unstack().fillna(0)
In [313]: agg_counts[:10]
Out[313]:
a                               Not Windows  Windows
tz
                                        245      276
Africa/Cairo                              0        3
Africa/Casablanca                         0        1
Africa/Ceuta                              0        2
Africa/Johannesburg                       0        1
Africa/Lusaka                             0        1
America/Anchorage                         4        1
America/Argentina/Buenos_Aires            1        0
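The group, size, and unstack steps can be followed on a tiny frame. This sketch assumes a recent pandas on Python 3 and, for simplicity, stores the operating system labels in a hypothetical column named os instead of passing a separate array to groupby.

```python
import pandas as pd

# Hypothetical data: three New York rows and one Cairo row.
cframe = pd.DataFrame({
    "tz": ["America/New_York", "America/New_York", "Africa/Cairo", "America/New_York"],
    "os": ["Windows", "Not Windows", "Windows", "Windows"],
})

# Group by both keys; size() counts the rows in each (tz, os) pair and
# returns a Series with a two-level index.
by_tz_os = cframe.groupby(["tz", "os"])
sizes = by_tz_os.size()

# unstack() pivots the inner index level (os) into columns. Pairs that
# never occur come out as NaN, which fillna(0) turns into zero counts.
agg_counts = sizes.unstack().fillna(0)
print(agg_counts)
```

Here Africa/Cairo has no "Not Windows" rows, so that cell is the one filled in with 0.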
CHAPTER IV
A TOY MOTHER
CHAPTER V
GREEN BAY-LEAVES
Lady Currey was not at all pleased with her son’s engagement,
and she said so. She came to town for this purpose, and made
Gilbert give her lunch while she strongly disapproved, from the hors
d’œuvres to the coffee. She had the soulless good looks which Time,
as if contemptuous, neglects to touch. And because she could afford
to do so, she purposely dressed in a middle-aged, sober fashion
which she considered dignified. She had a great sense of her own
importance, and the modern grandmother of fifty in ninon and
picture-hats was to her extreme anathema. She and Circe were
much the same age. Sybil Daunton-Pole had flashed into society like
a brilliant comet, a trail of admirers behind her, when Gilbert’s
mother, the amiable daughter of the then Home Secretary, had been
one of the small and unremarked stars that dot the social firmament.
Lady Currey had brought her husband a considerable sum of
money, but the only thing for which she needed money was to
gratify her craze for old china. If she had any heart or soul it was
given to her specimens of priceless Ming and old Chelsea. She spent
hours every day dusting her cabinets. Her only idea of travel was the
opportunity it gave her for visiting museums and picking up bargains
in rare porcelain.
For Gilbert she had a pleasant feeling of proprietorship—much the
same as she felt for the wonderful famille rose-jar of the Kien-Lung
period which she had herself unearthed in a visit to the East. Gilbert
was an only child, and he had been little or no trouble. This was the
first time he had disappointed her. When other mothers complained
of their sons, of escapades at Eton and Oxford, or premature and
undesirable love affairs, of monumental debts and lack of family
pride, Lady Currey’s lips always took on an added shade of
complacency as she thought of Gilbert and the even and admirable
tenour of his way. It was entirely becoming that Gilbert should be so
satisfactory and in some way reflected well on herself, just as did the
discovery of the famille rose-jar. Lady Currey liked everything around
her to be comme il faut, not the elastic comme il faut of fashion, but
rather the correctness of the copybook and the ten commandments.
Curiously enough, engrossed in herself and her china, she had never
until quite recently speculated, as do most mothers, on her son’s
probable choice of a wife. When she had thought of it, she had
dismissed the idea with the assurance that Gilbert would choose
wisely and soberly and to his advantage. It was not in her to feel
any jealousy of the woman Gilbert should love.
“I am grieved,” she said, sitting very upright—she rarely used the
back of a chair—“I am grieved to think that you intend to marry into
the Iverson family. The Iversons are not a family of which I—or any
right-thinking people—approve.”
“But, mother,” said Gilbert, rather taken aback, for he had become
used to her invariable approval, “I am not marrying the family. I am
marrying Claudia.”
“Ah! that’s what you think—the usual reply. For Geoffrey Iverson I
have no particular dislike—he has been the cat’s-paw of a clever and
unscrupulous woman. His family is a very good one. She would have
spoilt any man who had the misfortune to be married to her. Why,
Sybil Iverson is notorious!”
“Claudia is quite unlike her in every way. Why, she is not even like
her in appearance.”
Lady Currey lifted her thin, fair eyebrows. It was unbecoming that
she should tell him the scandalous rumours that floated about
respecting Claudia’s parentage: Such things could only be told by a
father to a son. She vehemently disapproved of any plain speaking
between the sexes. Such a crime could never be laid to her charge;
not even in the marital chamber had she ever discussed any such
thing.
“She is the daughter of her mother, Gilbert, and the mother—I say
it deliberately—is a bad woman, a woman who has trailed the glory
and purity of the flower of womanhood in the dust.” Lady Currey
occasionally indulged in such flights of rhetoric. She had rehearsed
this in the train.
“I don’t think the two women see much of one another.” Gilbert
was a little nettled. “Claudia told me herself that she hardly knew
her mother at all in her young days. She was left entirely to her
governesses. She can hardly have imbibed any—any idea from her
mother.”
The pathos of such an admission did not strike Lady Currey, it only
helped to justify her present attitude.
“It is, of course, very painful for me to have to mention such
matters to you, but why has she seen so little of her mother?
Because Sybil was—I blush to say it—so surrounded by lovers that
she neglected her maternal duties. I say again, she is notorious for
her lax life and morals. Don’t you believe in heredity, Gilbert? Think
of the blood that runs in that girl’s veins.”
Gilbert frowned. “Heredity is a curious thing. Not worth worrying
over, I think. I don’t profess to understand it.”
“I have studied the question.” She had read one book that was
quite out of date. “I firmly believe in heredity. The vices or the
virtues of the father and mother are surely transmitted to the
children.” It was pleasing to think that only virtues could be
transmitted to Gilbert, but it was all the more annoying that those
inherited virtues should be linked with the vices of Sybil Iverson’s
child.
Gilbert was becoming annoyed, and made no reply. After all, his
mother was only a woman, and women never could argue. It jarred
on his manhood that she should take him to task, and his voice was
a little cold as he inquired what she would take to drink.
“You know I always take one glass of claret.” The tone somehow
implied that a woman like Sybil Iverson might reprehensibly vary her
drink with lunch, but she had regular habits. Then she returned to
the attack.
“Claudia is not the woman that we—your dear father and I—would
have chosen for you.”
“Doesn’t every mother say that about her son’s choice?”
His mother sighed and waited while Gilbert ordered the wine.
“What sort of bringing-up has she had? What sort of a wife and
mother will such a girl make? Her mother’s only god was pleasure,
her only commandment ‘Enjoy the fleeting hour.’ Do you mean to tell
me that the daughter of such a woman has proper ideas about life?
Would you care to be the complaisant husband of a Circe?”
But here Gilbert put his foot down. His mother must be made to
see that he knew quite well what he was about, that he had not run
haphazard into this engagement. Not on any account would he let
her see that curious mixture of surprise and annoyance at the back
of his mind when he thought of the proposal scene. He had an
undefined feeling that he had been hurried into it, though how he
had been hurried, by whom or by what, he did not seek to explain
even to himself. To Gilbert’s cast of mind vague feelings were best
ignored as symptoms of a weak and illogical brain, much the same
as vague symptoms may denote an illness of the body. Still the
feeling was there, behind many stacks of docketed and pigeonholed
pieces of information. Yet he had almost made up his mind to
propose to Claudia—oh! yes—only—that particular night?
“Mother, I cannot hear you say such ridiculous things about
Claudia. You do not know her. You might as well say that the
children of murderers will all grow up murderers.”
“You might commit murder in a sudden fit of passion, but such a
warped, degraded nature as Sybil Iverson’s is another story. Besides
—the sons of a murderer have probably seen him hanged or
punished—the law steps in; but who punishes a woman like Sybil
Iverson? Society, nowadays, is too lax to such creatures, and
virtuous women have to mix with them and take them by the hand,
or else be dubbed ridiculous or old-fashioned. Well,” with a sudden
little gust of passion like a disturbance in a tea-cup, “thank God, I
am old-fashioned and absurd. I can say my prayers every night and
lie down in peace.... No, Gilbert, you know I only take one glass of
claret.”
“They say Mrs. Iverson has given up her wicked, siren-like ways
and gone in for spiritualism.” He wished his mother realized that she
was keeping him from his work and would hurry up with her lunch.
The leisurely ways of the country were not those of town. But Lady
Currey was doing her duty.
“Such women never give up their wicked ways, they take them to
the grave with them.” Both Gilbert and his mother had very little
sense of humour, with the distinction that Gilbert knew when things
were ridiculous. “I know Sybil’s mother died of a broken heart.” This
was quite untrue, she had died of fatty degeneration of the liver.
“But there, the Psalms say that the wicked flourish like green bay-
trees, and if they did in King David’s time there is no doubt they do
now. But their punishment awaits them, Gilbert; always remember
that.”
Gilbert nodded absently. Life after death was one of the vague
things, like psychology, that he did not consider as practical politics.
But he did not tell his mother this. If she liked to imagine him
striving for a golden harp with humility of soul, she might.
“I confess I am disappointed in you, Gilbert. I had looked forward
to your choosing some nice girl I could take to my heart, someone
like Maud Curtice, for example.”
Maud Curtice was a colourless girl who agreed with Lady Currey in
being shocked at the modern scanty fashion of dressing—she was
painfully thin and had ungainly hands and feet—and who devoted
herself to the mothers of eligible sons. She also had a large income.
“Wait till you know Claudia, mother. You are sure to like her.”
“I have heard she is very handsome and a great favourite in
Society,” returned his mother gloomily. “It is a bad report to my way
of thinking. That’s how her mother started.”
Just then, to his great relief, Gilbert caught sight of Colin Paton
wending his way out of the restaurant. He hailed him with joy, and
Paton came to a standstill beside their table.
Lady Currey approved of Colin Paton. His manners were respectful
and he showed an intelligent interest in china. She never noticed the
quizzical gravity with which he received her views on life, nor the
humorous twinkle in his eyes at her criticisms. She thought him “a
very nice young man.”
“Colin, old man, come and have some coffee with us.”
“Just had some. I hope you are quite well, Lady Currey?”
Gilbert made a business of looking at his watch and starting with
alarm. “By jove, I didn’t know it was so late. I must just swallow my
coffee and run. May I leave the mater with you to finish her coffee
at her leisure?”
Colin caught the appeal in Gilbert’s eyes and guessed the cause.
“Certainly, if Lady Currey will accept me as a poor substitute for
you.”
Lady Currey smiled a gracious assent. “I hope your dear mother is
better, Mr. Paton?”
“Yes, thank you.... Busy as usual, Gilbert? I hear the proverbial
busy bee is quite out of it.”
“Well, I am tearingly busy. Don’t get a minute to myself.”
Paton slipped into his chair. “And yet you’ve found time to get
engaged, I hear? I wrote my congratulation this morning.”
“Thanks, old chap. Oh! getting engaged doesn’t take very long.”
Gilbert laughed pleasantly and displayed his firm white teeth.
“Doesn’t it?” returned Paton, smiling. “I think it would take me no
end of a time. But there, we shall soon be born in the morning,
married at midday, and buried in the evening!” He saw Lady Currey
looking at him rather doubtfully. “A man like your son, Lady Currey,
takes a woman and the world by storm. Veni, vidi, vici is not for me.
Women have to know me quite a long time before they remember
me.”
“I am sure you have a great many friends,” she said
encouragingly.
“Yes, that’s why I expect I shall never get a wife.... Really must
go, Gilbert? I had tea with Claudia and the long-legged Patricia
yesterday. We wished you could have been with us.”
“Teas are not in my line. I suppose I shall see you again soon?”
“Well, I’m going away, you know.”
Gilbert turned back in surprise.
“What, at the beginning of the season!” exclaimed Lady Currey.
“Going out to the Argentine for a while. A friend of mine is going
out on a political mission and wants an assistant. I’ve decided to
accompany him. Never been there, and it must be an interesting
country.”
Gilbert raised his eyebrows. Why on earth didn’t Paton stop in one
place and make a name for himself? He had often advised him to do
so.
“Sudden, isn’t it? I thought you said the other night that you were
remaining in town until the end of July.”
Paton nodded. “I’ve changed my mind. I think I want a change. I
shall only be away six months or so, perhaps a year.”
Gilbert’s thoughts had raced ahead. “Then if we’re married at the
end of July, as is probable, you’ll be away? That’s too bad. I had
relied on you for being best man.”
“You’ll be married so soon? No, I am afraid I can’t assist to give
you away.”
Gilbert again expressed his regrets, which were quite genuine, and
left his mother with Paton. Colin did not make the mistake of rushing
in where angels fear to tread, but waited for Lady Currey’s
comments.
“What do you think of this engagement, Mr. Paton? I know I can
speak to you quite frankly. I think it is a great mistake. Weren’t you
surprised?”
“Yes,” returned Paton truthfully, “I was very surprised. Gilbert did
not confide his hopes in me. I didn’t see any wooing going on, and
he never talked about her to me. He must have made the running
quickly.” Then he added, half to himself, “He can’t have seen a great
deal of her.”
“Of course not, or he wouldn’t have done it. Gilbert, for once in his
life, has lost his head over a pretty woman. Why, you are much more
of a friend than Gilbert.”
A slight shadow crossed her companion’s face and he dropped his
eyelids. “Well, I thought I was. But then friend—oh! it’s the veni,
vidi, vici trick. She’s a charming girl, Lady Currey, with all sorts of
possibilities.”
Lady Currey pursed up her thin lips that had never bestowed or
received a kiss of passion. “She is handsome, certainly. But is she
the wife for Gilbert? I have lived long enough to know that looks are
a poor foundation for matrimony.”
“She has quite a good deal of character,” said her companion
quietly, without any annoying enthusiasm. “I am sure she will
develop into a splendid woman with the man she loves. She isn’t the
usual pretty society doll, you know.”
“Does it strike you that Gilbert wants a woman of character?”
asked his mother with unexpected acuteness. “Clever men are
usually better mated to stupid wives. Look at Carlyle and Jane
Welsh! Much too clever for one another.” Then irrelevantly, “There
are too many clever girls nowadays. I don’t believe they make any
the better wives and mothers for being so clever. I am sure I never
wanted such a daughter-in-law.”
Paton found himself at a loss for conversation. He knew he could
do Claudia no good by praising her warmly to her future mother-in-
law, he might even make matters worse. Yet to hear Claudia belittled
made something leap within him into fierce flame. It seemed disloyal
to listen to Lady Currey’s sneers. Yet he knew that Claudia must
storm the citadel of Lady Currey’s heart herself. As an advance agent
his labours would be wasted. But Paton, looking across the table into
the light, offended eyes of the woman, was sorry for the girl. It was
rather odd. His mother, a confirmed invalid, and Lady Currey had
been close friends in their youth. Yet his mother had warmly liked
Claudia when she had once met her for a few minutes. He was
startled to find that his current of thought had communicated itself
to Lady Currey.
“Your mother always did like pretty things—I know she admires
Claudia—but she was always unduly swayed by good looks, even at
school. I know how deceptive they are. A man told me the other day
that his wife had left him and been through the Divorce Court, and
he attributed it entirely to her good looks. ‘A very pretty woman is
difficult to live with,’ he said; ‘she gets a great deal of adulation and
flattery in Society, and naturally the husband at home falls rather
flat.’ There is a lot of truth in that, Mr. Paton.”
“Perhaps he was the typical English husband who, as soon as he
has won a wife, forgets to be her lover,” replied Paton. “You are very
careful and precious of your rare china, Lady Currey.”
His vis-à-vis stared. She wondered that Paton, who was usually so
smooth in conversation, should make such a sudden jump. But it
served to divert her mind from Claudia.
“I had such luck last week. I was walking along the High Street in
Moulton and I caught sight of a pair of vases. I thought that powder
blue could be nothing less than Chinese. They had blue and white
reserves on them. You know what that means. I got them for a mere
song, and they’re beauties. Since I last saw you I have bought....”
Still talking china, Paton saw her into a taxi.
He strolled away from the restaurant. It was warm and sunny, and
the pedestrians seemed all in a good humour. Paton often wandered
for hours through the streets of London, finding in that wonderful
panorama food for eyes and brain and heart. He loved the feeling
that he was part of the crowd, and his mind was stored with many
observations and memories. The romance of the streets was no idle
journalistic phrase to him. He felt it around him on all sides, plucking
at him with alluring fingers leading him into the land of dreams.
Often at night he would give himself wholly up to its enchantment,
wandering along mile after mile through quaint byways and on misty
commons, through silent Suburbia and the noisy, restless East-end
slums. London was to him a book of unending pages with countless
illustrations.
This afternoon he mingled with the crowd, but he did not heed it,
so that he did not see a woman in a motor energetically waving her
hand to him and directing the chauffeur to stop.
“Mr. Paton—oh! Mr. Paton, what a day-dream!”
It was Claudia herself, looking altogether charming in light
summer attire. There were waving, greeny-blue ostrich feathers in
her Leghorn hat and around her neck. The softness of the feathers
and the peculiar shade of blue accentuated the creamy tint of her
skin and the brightness of her eyes. Her happiness shone through
the envelope of the flesh like a flame through clear glass. A heavy-
eyed woman of the lower classes who was passing marked her and
muttered, “She has a good time, I’ll be bound,” then, wrapped in her
own bad one, passed on.
Paton went up to the car and held out his hand.
“Mr. Paton, you’re just the man I want. Do come and see some
pictures with me. Jujubes hates pictures, don’t you, Jujubes?” She
turned to the faded, amiable woman beside her in the car.
“I don’t hate them, but they all look so alike,” said Jujubes mildly.
“When you’ve seen one, it seems to me you’ve seen the lot.”
“There, listen to this awful heathen who rejoices in her darkness!
Leave me not to her tender mercies. Jujubes can do some shopping
for me.” She looked entreatingly at him with her fresh young mouth
smiling at herself, Jujubes, Paton and the whole world.
He hesitated for the fraction of a second. Then he said cheerily:
“Of course I’ll come, if only out of kindness to Miss Jujubes. And I
shan’t be seeing any more English pictures for a long time, I
suppose.” Then he told her of his intended visit to the Argentine.
“Oh!” said Claudia blankly. “Oh! I wish you weren’t going away. I
shall miss you so much—we shall all miss you.” She said it quite
naturally as the thought came to her mind. One could always do that
with Colin Paton.
“Thank you,” he said smilingly, as he helped Jujubes to alight. “It’s
very good of you to say so.” He seated himself beside Claudia.
“Don’t. You needn’t be formal and polite. Why are you going? Is it
the wanderlust again? Or is it to help you in your career?”
Gilbert had taught her to think of careers.
“Oh! I shall never have a career,” said Paton lightly, aware of the
soft, dark eyes on his face questioning him. But he did not meet
them. Somehow they held a look in them to-day that he could not
bear. “I don’t concentrate, you know. I’m just ‘a blooming amateur.’
Gilbert was reading me a solemn lecture the other day, but—I go on
the same old way. I’m glad, however, that Gilbert is getting on so
well. But then, he does concentrate.”
“He works very hard,” said Claudia thoughtfully, “I had no idea
how hard. He does too much, I think.” Then she looked at the rather
fine lines of the face beside her. “But I don’t believe you are afraid of
hard work. I remember how hard you worked when you were on
that Hospital Committee.”
“No, I don’t think it’s that,” said Paton quietly. “Let’s say it’s lack of
ambition and driving power.”
Was there something in his tone that sent a vague shadow of
distrust over Claudia’s expression, or was it the echo of some secret
misgiving in herself?
“Does that mean you think ambition—the ordinary get-to-the-top-
of-the-tree ambition—rather commonplace?”
“Not a bit,” he said heartily. “After all, we live on a commonplace
earth. Gilbert is right and I am wrong, and when Gilbert is Lord Chief
Justice and I’m an obscure old bore of a bachelor, I shall, no doubt,
fully realize my wrongness. But do ask me to dinner sometimes.”
“But you mustn’t remain a bachelor,” said Claudia, with all the
enthusiasm of the newly-engaged woman, “because your life will be
incomplete. That sounds like sex conceit, but you said it yourself to
me, and then I began to believe it. And now——” she completed the
sentence with a charming blush.
“Can you imagine any modern woman wanting a man without
worldly ambition, a man she will never be proud of, a man who is
nothing and does nothing?” The tone was light enough, and the girl,
engrossed in her own happiness, did not detect an unusual note of
bitterness. For Colin Paton was never bitter. He could be sarcastic
and even scathing when roused, but he never indulged in the refuge
of cowardly souls.
Claudia took him quite seriously, for happiness, just as sorrow,
may temporarily obscure a sense of humour. “I forbid you to say
such things of yourself,” she said, with an engaging air of
motherliness. “You’re awfully clever—awfully clever. Why, you are
one of the best-read and best-informed men in London.” Suddenly
she realized how often she had turned to him for information or
advice. And she could never remember an occasion on which he had
failed her, or an opinion that her critical faculty on reflection deemed
unsound.
“No market value, dear lady.”
She paused a moment thoughtfully. “Is that true?” she said slowly.
“Gilbert said that the other day when I asked him if he had read
something. He says he has no time for books, it’s as much as he can
do to read the newspapers.... Somehow it seems all wrong.” She
looked away with a puzzled expression at the trees of the Park.
He cast a quick glance at her profile and the beautiful lines of her
throat. He seemed about to say something with unusual impetuosity,
and then he resolutely locked his lips. He allowed her to go on
speaking.
“Ambition gets in the way of—of a lot of other things, doesn’t it? It
seems a voracious dragon, swallowing up everything: friends, books,
pictures—all the beautiful, graceful things of life. Isn’t it a pity?”
“I think so; but then I’m in the minority.”
“And that’s why you are not ambitious,” she flashed out with
sudden insight. “Yes, I see. I wonder if you are right.” Her voice was
a little wistful.
“No,” he said, with resolute reassurance. “No. I’m wrong, and
Gilbert is right. Wife of the Lord Chief Justice—what greater honour
could you wish?”
“Now you are making fun of me,” she replied, with a tiny frown,
“and I was quite serious. It’s difficult to explain. But—well, I hate the
usual sort of man who does nothing except wear his clothes well,
don’t you? Look at Jack. He sets off his uniform beautifully, but he
just footles his life away. There doesn’t seem anything between that
and great strenuosity—except you. I can’t place you. Somehow you
always make me see things in a different perspective from anyone
else. I wonder why it is. Sometimes you make things seem better
and sometimes you make them seem worse.”
He drew in his breath a little and his hand in its thin suède
covering clenched itself on his knee. “Claudia, you mustn’t let me
make things seem worse or any different from—what they are. I’d
be content if my mission in life were to make things better, not
worse, for you. Not that you want that now,” he added hastily,
pulling himself in. “I know, from things you have left unsaid, that
your home life hasn’t been all you wanted and ought to have had,
but now—now you are going to be very happy. Gilbert is a splendid
fellow.”
She turned to him, her face glowing, her eyes deep and dark with
emotion.
“Yes, I think I am going to be very happy. Somehow you have
always understood. I have never had to tell you things. You see,
nobody ever wanted me very much, and I—I wanted somebody to
want me and to rely on me and care for my companionship. It is so
wonderful to think that our interests are one, that what interests me
interests him, that I can tell him my good news and bad news and
be always sure that I don’t bore him. I’ve always had to bottle up
things. I’ve had one or two girl friends, but it isn’t the same. And
even then they get engaged and married and you fall in the
background. But when I’ve got a husband of my own it will be
different, won’t it?”
He hesitated the fraction of a second. “Yes, Claudia, it will be
different. You know how glad I am that you have found happiness,
don’t you? I wanted that so much for my—friend.”
“And isn’t it nice that I am marrying your friend?” she exclaimed
joyfully. “Because you might not have liked my husband, or my
husband might not have liked you. Oh, I know,” sagely. “I have
heard from my friends who got married, that it is sometimes very