Think Stats: Exploratory Data Analysis, Second Edition, by Allen B. Downey

Think Stats, Second Edition by Allen B. Downey is a comprehensive introduction to exploratory data analysis using Python. The book covers statistical concepts and methods, including probability, distributions, and hypothesis testing, with a focus on computational techniques. It is designed for those who know how to program and want to apply statistical analysis to real-world data.

Author(s): Allen B. Downey
ISBN(s): 9781491907375, 1491907371
Edition: Second edition
Year: 2015
Language: English
Think Stats
SECOND EDITION
EXPLORATORY DATA ANALYSIS

If you know how to program, you have the skills to turn data into knowledge
using tools of probability and statistics. This concise introduction shows
you how to perform statistical analysis computationally, rather than
mathematically, with programs written in Python.
By working with a single case study throughout this thoroughly revised
book, you’ll learn the entire process of exploratory data analysis, from
collecting data and generating statistics to identifying patterns and testing
hypotheses. You’ll explore distributions, rules of probability, visualization,
and many other tools and concepts.
New chapters on regression, time series analysis, survival analysis, and
analytic methods will enrich your discoveries.

• Develop an understanding of probability and statistics by
writing and testing code
• Run experiments to test statistical behavior, such as generating
samples from several distributions
• Use simulations to understand concepts that are hard to grasp
mathematically
• Import data from most sources with Python, rather than rely
on data that’s cleaned and formatted for statistics tools
• Use statistical inference to answer questions about real-world
data

“This is the most comprehensive introduction to the Python data
analysis stack on the market. Practitioners who want to brush up
on their technical skills by learning about the tools available for a
modern programming language will also benefit from this book.
This is an excellent modern statistics textbook.”
Skipper Seabold, author of StatsModels

Allen Downey is a Professor of Computer Science at Olin College of Engineering.
He has taught computer science at Wellesley College, Colby College, and UC
Berkeley. He earned a PhD in Computer Science from UC Berkeley, and master’s
and bachelor’s degrees from MIT.

STATISTICS / PROGRAMMING

Twitter: @oreillymedia
facebook.com/oreilly
US $34.99 CAN $36.99
ISBN: 978-1-491-90733-7

Allen B. Downey
SECOND EDITION

Think Stats

Allen B. Downey
Think Stats, Second Edition
by Allen B. Downey
Copyright © 2015 Allen B. Downey. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.
O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are
also available for most titles (http://safaribooksonline.com). For more information, contact our corporate/
institutional sales department: 800-998-9938 or [email protected].
Editors: Mike Loukides and Meghan Blanchette
Production Editor: Melanie Yarbrough
Copyeditor: Marta Justak
Proofreader: Amanda Kersey
Indexer: Allen B. Downey
Cover Designer: Karen Montgomery
Interior Designer: David Futato
Illustrator: Rebecca Demarest

October 2014: Second Edition

Revision History for the Second Edition:


2014-10-09: First release

See http://oreilly.com/catalog/errata.csp?isbn=9781491907337 for release details.

The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Think Stats, second edition, the cover
image of an archerfish, and related trade dress are trademarks of O’Reilly Media, Inc.
Many of the designations used by manufacturers and sellers to distinguish their products are claimed as
trademarks. Where those designations appear in this book, and O’Reilly Media, Inc. was aware of a trademark
claim, the designations have been printed in caps or initial caps.
While the publisher and the author have used good faith efforts to ensure that the information and
instructions contained in this work are accurate, the publisher and the author disclaim all responsibility for errors
or omissions, including without limitation responsibility for damages resulting from the use of or reliance
on this work. Use of the information and instructions contained in this work is at your own risk. If any code
samples or other technology this work contains or describes is subject to open source licenses or the
intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such
licenses and/or rights.
Think Stats is available under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported
License. The author maintains an online version at http://thinkstats2.com.

ISBN: 978-1-491-90733-7
[LSI]
Table of Contents

Preface

1. Exploratory Data Analysis
A Statistical Approach
The National Survey of Family Growth
Importing the Data
DataFrames
Variables
Transformation
Validation
Interpretation
Exercises
Glossary

2. Distributions
Representing Histograms
Plotting Histograms
NSFG Variables
Outliers
First Babies
Summarizing Distributions
Variance
Effect Size
Reporting Results
Exercises
Glossary

3. Probability Mass Functions
Pmfs
Plotting PMFs
Other Visualizations
The Class Size Paradox
DataFrame Indexing
Exercises
Glossary

4. Cumulative Distribution Functions
The Limits of PMFs
Percentiles
CDFs
Representing CDFs
Comparing CDFs
Percentile-Based Statistics
Random Numbers
Comparing Percentile Ranks
Exercises
Glossary

5. Modeling Distributions
The Exponential Distribution
The Normal Distribution
Normal Probability Plot
The lognormal Distribution
The Pareto Distribution
Generating Random Numbers
Why Model?
Exercises
Glossary

6. Probability Density Functions
PDFs
Kernel Density Estimation
The Distribution Framework
Hist Implementation
Pmf Implementation
Cdf Implementation
Moments
Skewness
Exercises
Glossary

7. Relationships Between Variables
Scatter Plots
Characterizing Relationships
Correlation
Covariance
Pearson’s Correlation
Nonlinear Relationships
Spearman’s Rank Correlation
Correlation and Causation
Exercises
Glossary

8. Estimation
The Estimation Game
Guess the Variance
Sampling Distributions
Sampling Bias
Exponential Distributions
Exercises
Glossary

9. Hypothesis Testing
Classical Hypothesis Testing
HypothesisTest
Testing a Difference in Means
Other Test Statistics
Testing a Correlation
Testing Proportions
Chi-Squared Tests
First Babies Again
Errors
Power
Replication
Exercises
Glossary

10. Linear Least Squares
Least Squares Fit
Implementation
Residuals
Estimation
Goodness of Fit
Testing a Linear Model
Weighted Resampling
Exercises
Glossary

11. Regression
StatsModels
Multiple Regression
Nonlinear Relationships
Data Mining
Prediction
Logistic Regression
Estimating Parameters
Implementation
Accuracy
Exercises
Glossary

12. Time Series Analysis
Importing and Cleaning
Plotting
Linear Regression
Moving Averages
Missing Values
Serial Correlation
Autocorrelation
Prediction
Further Reading
Exercises
Glossary

13. Survival Analysis
Survival Curves
Hazard Function
Estimating Survival Curves
Kaplan-Meier Estimation
The Marriage Curve
Estimating the Survival Function
Confidence Intervals
Cohort Effects
Extrapolation
Expected Remaining Lifetime
Exercises
Glossary

14. Analytic Methods
Normal Distributions
Sampling Distributions
Representing Normal Distributions
Central Limit Theorem
Testing the CLT
Applying the CLT
Correlation Test
Chi-Squared Test
Discussion
Exercises

Index


Preface

This book is an introduction to the practical tools of exploratory data analysis. The
organization of the book follows the process I use when I start working with a dataset:

• Importing and cleaning: Whatever format the data is in, it usually takes some time
and effort to read the data, clean and transform it, and check that everything made
it through the translation process intact.
• Single variable explorations: I usually start by examining one variable at a time,
finding out what the variables mean, looking at distributions of the values, and
choosing appropriate summary statistics.
• Pair-wise explorations: To identify possible relationships between variables, I look
at tables and scatter plots, and compute correlations and linear fits.
• Multivariate analysis: If there are apparent relationships between variables, I use
multiple regression to add control variables and investigate more complex
relationships.
• Estimation and hypothesis testing: When reporting statistical results, it is important
to answer three questions: How big is the effect? How much variability should we
expect if we run the same measurement again? Is it possible that the apparent effect
is due to chance?
• Visualization: During exploration, visualization is an important tool for finding
possible relationships and effects. Then if an apparent effect holds up to scrutiny,
visualization is an effective way to communicate results.

This book takes a computational approach, which has several advantages over
mathematical approaches:

• I present most ideas using Python code, rather than mathematical notation. In
general, Python code is more readable; also, because it is executable, readers can
download it, run it, and modify it.
• Each chapter includes exercises readers can do to develop and solidify their
learning. When you write programs, you express your understanding in code; while you
are debugging the program, you are also correcting your understanding.
• Some exercises involve experiments to test statistical behavior. For example, you
can explore the Central Limit Theorem (CLT) by generating random samples and
computing their sums. The resulting visualizations demonstrate why the CLT works
and when it doesn’t.
• Some ideas that are hard to grasp mathematically are easy to understand by
simulation. For example, we approximate p-values by running random simulations,
which reinforces the meaning of the p-value.
• Because the book is based on a general-purpose programming language (Python),
readers can import data from almost any source. They are not limited to datasets
that have been cleaned and formatted for a particular statistics tool.
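
The Central Limit Theorem experiment mentioned in the list above can be sketched in a few lines. This is a minimal illustration using only the standard library, not the book's own code; the function names, the choice of an exponential distribution with mean 1, and the sample sizes are assumptions made for the example. It generates sums of n random values and reports their skewness, which should shrink toward 0 (the value for a symmetric, normal-looking distribution) as n grows.

```python
import random
import statistics


def sample_sums(n, iters=2000, seed=17):
    """Return `iters` sums, each of n draws from an exponential distribution (mean 1)."""
    rng = random.Random(seed)
    return [sum(rng.expovariate(1.0) for _ in range(n)) for _ in range(iters)]


def skewness(xs):
    """Sample skewness: the mean cubed standardized deviation (0 if symmetric)."""
    mean = statistics.fmean(xs)
    std = statistics.pstdev(xs)
    return statistics.fmean([((x - mean) / std) ** 3 for x in xs])


if __name__ == "__main__":
    # As n grows, the mean of the sums tracks n and the skewness moves toward 0.
    for n in (1, 10, 100):
        sums = sample_sums(n)
        print(n, round(statistics.fmean(sums), 2), round(skewness(sums), 2))
```

Replacing the exponential draws with a heavy-tailed distribution is one way to explore the cases where the convergence is slow or fails.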

The book lends itself to a project-based approach. In my class, students work on a
semester-long project that requires them to pose a statistical question, find a dataset
that can address it, and apply each of the techniques they learn to their own data.
To demonstrate my approach to statistical analysis, the book presents a case study that
runs through all of the chapters. It uses data from two sources:

• The National Survey of Family Growth (NSFG), conducted by the U.S. Centers for
Disease Control and Prevention (CDC) to gather “information on family life,
marriage and divorce, pregnancy, infertility, use of contraception, and men’s and
women’s health.” (See http://cdc.gov/nchs/nsfg.htm.)
• The Behavioral Risk Factor Surveillance System (BRFSS), conducted by the
National Center for Chronic Disease Prevention and Health Promotion to “track
health conditions and risk behaviors in the United States.” (See http://cdc.gov/BRFSS/.)

Other examples use data from the IRS, the U.S. Census, and the Boston Marathon.
This second edition of Think Stats includes the chapters from the first edition, many of
them substantially revised, and new chapters on regression, time series analysis, survival
analysis, and analytic methods. The previous edition did not use pandas, SciPy, or
StatsModels, so all of that material is new.

How I Wrote This Book


When people write a new textbook, they usually start by reading a stack of old textbooks.
As a result, most books contain the same material in pretty much the same order.

I did not do that. In fact, I used almost no printed material while I was writing this book,
for several reasons:

• My goal was to explore a new approach to this material, so I didn’t want much
exposure to existing approaches.
• Since I am making this book available under a free license, I wanted to make sure
that no part of it was encumbered by copyright restrictions.
• Many readers of my books don’t have access to libraries of printed material, so I
tried to make references to resources that are freely available on the Internet.
• Some proponents of old media think that the exclusive use of electronic resources
is lazy and unreliable. They might be right about the first part, but I think they are
wrong about the second, so I wanted to test my theory.

The resource I used more than any other is Wikipedia. In general, the articles I read on
statistical topics were very good (although I made a few small changes along the way).
I include references to Wikipedia pages throughout the book and I encourage you to
follow those links; in many cases, the Wikipedia page picks up where my description
leaves off. The vocabulary and notation in this book are generally consistent with
Wikipedia, unless I had a good reason to deviate. Other resources I found useful were
Wolfram MathWorld and the Reddit statistics forum.

Using the Code


The code and data used in this book are available from GitHub. Git is a version control
system that allows you to keep track of the files that make up a project. A collection of
files under Git’s control is called a repository. GitHub is a hosting service that provides
storage for Git repositories and a convenient web interface.
The GitHub homepage for my repository provides several ways to work with the code:

• You can create a copy of my repository on GitHub by pressing the Fork button. If
you don’t already have a GitHub account, you’ll need to create one. After forking,
you’ll have your own repository on GitHub that you can use to keep track of code
you write while working on this book. Then you can clone the repo, which means
that you make a copy of the files on your computer.
• Or you could clone my repository. You don’t need a GitHub account to do this, but
you won’t be able to write your changes back to GitHub.
• If you don’t want to use Git at all, you can download the files in a Zip file using the
button in the lower-right corner of the GitHub page.

All of the code is written to work in both Python 2 and Python 3 with no translation.

I developed this book using Anaconda from Continuum Analytics, which is a free
Python distribution that includes all the packages you’ll need to run the code (and lots
more). I found Anaconda easy to install. By default it does a user-level installation, not
system-level, so you don’t need administrative privileges. And it supports both Python
2 and Python 3. You can download Anaconda from Continuum.
If you don’t want to use Anaconda, you will need the following packages:

• pandas for representing and analyzing data
• NumPy for basic numerical computation
• SciPy for scientific computation including statistics
• StatsModels for regression and other statistical analysis
• matplotlib for visualization

Although these are commonly used packages, they are not included with all Python
installations, and they can be hard to install in some environments. If you have trouble
installing them, I strongly recommend using Anaconda or one of the other Python
distributions that include these packages.
After you clone the repository or unzip the zip file, you should have a file called
ThinkStats2/code/nsfg.py. If you run it, it should read a data file, run some tests, and
print a message like, “All tests passed.” If you get import errors, it probably means there
are packages you need to install.
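
Before running nsfg.py, a quick check like the following can show which of the packages listed above are missing. This is a hypothetical helper, not part of the book's repository; the only assumption beyond the package list is the standard import names (for example, StatsModels is imported as `statsmodels`).

```python
import importlib.util

# Import names for the packages the preface lists (assumed standard names).
REQUIRED = ["pandas", "numpy", "scipy", "statsmodels", "matplotlib"]


def missing_packages(names=REQUIRED):
    """Return the names that cannot be found in the current Python environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages found.")
```

The check only confirms that each package can be located; it does not import it or verify its version.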
Most exercises use Python scripts, but some also use the IPython notebook. If you have
not used IPython notebook before, I suggest you start with the documentation.
I wrote this book assuming that the reader is familiar with core Python, including
object-oriented features, but not pandas, NumPy, and SciPy. If you are already familiar with
these modules, you can skip a few sections.
I assume that the reader knows basic mathematics, including logarithms, for example,
and summations. I refer to calculus concepts in a few places, but you don’t have to do
any calculus.
If you have never studied statistics, I think this book is a good place to start. And if you
have taken a traditional statistics class, I hope this book will help repair the damage.

Allen B. Downey is a Professor of Computer Science at the Franklin W. Olin College of
Engineering in Needham, MA.

Contributor List
If you have a suggestion or correction, please send email to downey@allendowney.com.
If I make a change based on your feedback, I will add you to the contributor
list (unless you ask to be omitted).
If you include at least part of the sentence the error appears in, that makes it easy for
me to search. Page and section numbers are fine, too, but not quite as easy to work with.
Thanks!

• Lisa Downey and June Downey read an early draft and made many corrections and
suggestions.
• Steven Zhang found several errors.
• Andy Pethan and Molly Farison helped debug some of the solutions, and Molly
spotted several typos.
• Andrew Heine found an error in my error function.
• Dr. Nikolas Akerblom knows how big a Hyracotherium is.
• Alex Morrow clarified one of the code examples.
• Jonathan Street caught an error in the nick of time.
• Gábor Lipták found a typo in the book and the relay race solution.
• Many thanks to Kevin Smith and Tim Arnold for their work on plasTeX, which I
used to convert this book to DocBook.
• George Caplan sent several suggestions for improving clarity.
• Julian Ceipek found an error and a number of typos.
• Stijn Debrouwere, Leo Marihart III, Jonathan Hammler, and Kent Johnson found
errors in the first print edition.
• Dan Kearney found a typo.
• Jeff Pickhardt found a broken link and a typo.
• Jörg Beyer found typos in the book and made many corrections in the docstrings
of the accompanying code.
• Tommie Gannert sent a patch file with a number of corrections.
• Alexander Gryzlov suggested a clarification in an exercise.
• Martin Veillette reported an error in one of the formulas for Pearson’s correlation.
• Christoph Lendenmann submitted several errata.
• Haitao Ma noticed a typo and sent me a note.
• Michael Kearney sent me many excellent suggestions.
• Alex Birch made a number of helpful suggestions.
• Lindsey Vanderlyn, Griffin Tschurwald, and Ben Small read an early version of this
book and found many errors.
• John Roth, Carol Willing, and Carol Novitsky performed technical reviews of the
book. They found many errors and made many helpful suggestions.
• Rohit Deshpande found a typesetting error.
• David Palmer sent many helpful suggestions and corrections.
• Erik Kulyk found many typos.

Safari® Books Online


Safari Books Online (www.safaribooksonline.com) is an
on-demand digital library that delivers expert content in
both book and video form from the world’s leading
authors in technology and business.
Technology professionals, software developers, web designers, and business and
creative professionals use Safari Books Online as their primary resource for research,
problem solving, learning, and certification training.
Safari Books Online offers a range of plans and pricing for enterprise, government,
education, and individuals.
Members have access to thousands of books, training videos, and prepublication
manuscripts in one fully searchable database from publishers like O’Reilly Media, Prentice
Hall Professional, Addison-Wesley Professional, Microsoft Press, Sams, Que, Peachpit
Press, Focal Press, Cisco Press, John Wiley & Sons, Syngress, Morgan Kaufmann, IBM
Redbooks, Packt, Adobe Press, FT Press, Apress, Manning, New Riders, McGraw-Hill,
Jones & Bartlett, Course Technology, and hundreds more. For more information about
Safari Books Online, please visit us online.

How to Contact Us
Please address comments and questions concerning this book to the publisher:

O’Reilly Media, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
800-998-9938 (in the United States or Canada)
707-829-0515 (international or local)
707-829-0104 (fax)

We have a web page for this book, where we list errata, examples, and any additional
information. You can access this page at http://bit.ly/think_stats_2e.
To comment or ask technical questions about this book, send email to bookques
[email protected].
For more information about our books, courses, conferences, and news, see our website
at http://www.oreilly.com.
Find us on Facebook: http://facebook.com/oreilly
Follow us on Twitter: http://twitter.com/oreillymedia
Watch us on YouTube: http://www.youtube.com/oreillymedia

CHAPTER 1
Exploratory Data Analysis

The thesis of this book is that data combined with practical methods can answer
questions and guide decisions under uncertainty.
As an example, I present a case study motivated by a question I heard when my wife
and I were expecting our first child: do first babies tend to arrive late?
If you Google this question, you will find plenty of discussion. Some people claim it’s
true, others say it’s a myth, and some people say it’s the other way around: first babies
come early.
In many of these discussions, people provide data to support their claims. I found many
examples like these:
“My two friends that have given birth recently to their first babies, BOTH went almost 2
weeks overdue before going into labour or being induced.”
“My first one came 2 weeks late and now I think the second one is going to come out two
weeks early!!”
“I don’t think that can be true because my sister was my mother’s first and she was early,
as with many of my cousins.”
Reports like these are called anecdotal evidence because they are based on data that is
unpublished and usually personal. In casual conversation, there is nothing wrong with
anecdotes, so I don’t mean to pick on the people I quoted.
But we might want evidence that is more persuasive and an answer that is more reliable.
By those standards, anecdotal evidence usually fails, because:
Small number of observations
If pregnancy length is longer for first babies, the difference is probably small
compared to natural variation. In that case, we might have to compare a large number
of pregnancies to be sure that a difference exists.
Selection bias
People who join a discussion of this question might be interested because their first
babies were late. In that case the process of selecting data would bias the results.
Confirmation bias
People who believe the claim might be more likely to contribute examples that
confirm it. People who doubt the claim are more likely to cite counterexamples.
Inaccuracy
Anecdotes are often personal stories, and often misremembered, misrepresented,
repeated inaccurately, etc.
So how can we do better?
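
The point about small numbers of observations can be made concrete with a simulation. The sketch below is an illustration only: the 39-week baseline, the 0.1-week true difference, and the 2.7-week standard deviation are made-up assumptions, not NSFG results, and the function name is hypothetical. It estimates how often a comparison of n pregnancies per group points in the wrong direction even though a real difference exists.

```python
import random
import statistics


def wrong_sign_rate(n, diff=0.1, sigma=2.7, trials=400, seed=1):
    """Fraction of simulated comparisons in which the group with the larger
    true mean ends up with the smaller sample mean.

    All parameter defaults are illustrative assumptions, not survey values.
    """
    rng = random.Random(seed)
    wrong = 0
    for _ in range(trials):
        firsts = [rng.gauss(39.0 + diff, sigma) for _ in range(n)]
        others = [rng.gauss(39.0, sigma) for _ in range(n)]
        if statistics.fmean(firsts) < statistics.fmean(others):
            wrong += 1
    return wrong / trials


if __name__ == "__main__":
    # With a handful of observations the comparison is close to a coin flip;
    # it becomes more reliable only as n grows large.
    for n in (5, 50, 2000):
        print(n, wrong_sign_rate(n))
```

Under these assumptions, a few anecdotes carry almost no information about the direction of the effect, which is why the large survey sample matters.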

A Statistical Approach
To address the limitations of anecdotes, we will use the tools of statistics, which include:
Data collection
We will use data from a large national survey that was designed explicitly with the
goal of generating statistically valid inferences about the U.S. population.
Descriptive statistics
We will generate statistics that summarize the data concisely, and evaluate different
ways to visualize data.
Exploratory data analysis
We will look for patterns, differences, and other features that address the questions
we are interested in. At the same time we will check for inconsistencies and identify
limitations.
Estimation
We will use data from a sample to estimate characteristics of the general population.
Hypothesis testing
Where we see apparent effects, like a difference between two groups, we will evaluate
whether the effect might have happened by chance.
By performing these steps with care to avoid pitfalls, we can reach conclusions that are
more justifiable and more likely to be correct.
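As a preview of the last step, one common way to ask whether a difference between two groups might have happened by chance is a permutation test. The sketch below is an illustration under assumptions, not the survey analysis itself: the function names and the toy data are hypothetical. It shuffles the pooled observations many times and counts how often a random split produces a difference in means at least as large as the one observed.

```python
import random
import statistics


def diff_in_means(a, b):
    """Difference between the means of two groups."""
    return statistics.mean(a) - statistics.mean(b)


def permutation_p_value(group_a, group_b, iters=1000, seed=0):
    """Estimate how often chance alone yields an absolute difference
    in means at least as large as the observed one."""
    rng = random.Random(seed)
    observed = abs(diff_in_means(group_a, group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    count = 0
    for _ in range(iters):
        # Shuffling breaks any real link between group label and value.
        rng.shuffle(pooled)
        if abs(diff_in_means(pooled[:n_a], pooled[n_a:])) >= observed:
            count += 1
    return count / iters


if __name__ == "__main__":
    # Toy pregnancy lengths in weeks, made up for illustration only.
    firsts = [39, 41, 40, 38, 42, 40, 39, 41]
    others = [39, 40, 38, 39, 41, 40, 38, 39]
    print(permutation_p_value(firsts, others))
```

A small returned fraction suggests the observed difference would be unusual if the group labels did not matter; a large one means chance is a plausible explanation.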

The National Survey of Family Growth


Since 1973, the US Centers for Disease Control and Prevention (CDC) have conducted
the National Survey of Family Growth (NSFG), which is intended to gather “information
on family life, marriage and divorce, pregnancy, infertility, use of contraception, and
men’s and women’s health. The survey results are used…to plan health services and



Hanka’s knees shook with terror and he dropped his
candle on the mosaic floor. The candle went out, yet the
room was full of light, a still, blue light, like the
reflections in a block of ice. The four great men did not
move. But King Winter raised his hand and beckoned to
the lad.

“Come here,” he said quite kindly. Hanka had fallen to
his knees, and crept upon them across the floor, holding
up his hands for mercy.

“Who art thou?” demanded the King.

“Have pity, O most wonderful King! I am only Hanka the
Fool.”

“Have ye come back from the great forests, thou and
thy friends?”

“Sir,” replied Hanka, kissing his feet, “It is only I who
have come. I did not mean to chop through the roof of
the palace. I meant to chop a hole in the ice of the
White Sea to catch some fish, for the Tsar and his
people are starving.”

“Thou hast come alone?” asked King Winter in surprise,
“How is it that wolves and bears and all the wild beasts
have spared thee, and robbers have not beaten thee to
death?”

“Because, O Most Wonderful Majesty, I traveled with the
Tsar’s blessing on my head.”

King Winter sat up, and even the four Spirits looked
startled.

“But since when,” asked the King, “doth the proud
Merciless Tsar stoop to give his blessing to such a
beggar-lad as thee?”

“Oh,” cried Hanka, “he is not proud, indeed he isn’t! He
is as humble as I am, even I, Hanka the Fool. We call
him the Merciful Tsar, for he has turned from all his
wickedness, and given his wealth away. I was a
wayfarer, and he had no other gift for me, so he gave
me his blessing, Most Wonderful King!”

At these words King Winter arose, the four spirits lifted
their heads, there was a murmur of many voices and
then fairy music everywhere.

“Rise up, Hanka,” said the Ruler of the North. “My reign
in the City is over, for the Merciless Tsar has repented
and become as humble as thou. Go back to the great
forests where thy Tsar and his people are, and tell them
to return hither, for King Winter and his forces have left
the city, and it belongs to the Tsar once more! In token
of this, in case thou shouldst forget what to say, take
that bag of snow-stars behind the Throne, and carry it
to the Tsar.”

While he spoke, a whole army of spirits, snow-fairies
and wind-fairies and genie, crowned with frost-flowers,
gathered from all parts of the palace. Some came from
the bedrooms, where they had been asleep in the
bureau-drawers, some from the kitchen where they had
been hiding under cups and mixing-bowls, some peeped
down over the pictures on the parlor wall, or between
the curtains, or even out of the empty hall-stove. They
all joined hands in a ring and danced around Hanka,
who sat bewildered on the floor with his axe and
fishing-rod, wondering where all these creatures had
been while he had explored the palace.
“Joy, joy,” sang the spirits, “we are going home again,
home to the North Pole, to our friends, the seals and
polar bears, the long waiting-time is over, for the
Merciless Tsar has repented—joy, joy, joy!”

Then there was a tinkle of icicles outside the door, as
King Winter’s sleigh with the three white horses came
jingling up. The palace doors flew open and Hanka saw
that the snow had already melted down almost to the
turquoise terraces. The king leaped into his chariot,
waved his hand to the humble fool who had followed
him to the door, and away went the royal horses, over
the frozen White Sea to the distant North Pole, with all
the fairy train holding on and running behind as swiftly
as the wind.

Hanka turned back and looked at the empty throne
chamber. The four great Spirits had vanished, though he
had not seen them running away with the fairies. But
where they had stood, the floor was cracked a little, and
four yellow crocus-flowers had sprung up through the
stone.

Hanka felt very lonesome and frightened in the big,
splendid palace. He picked up his axe and rod and the
bag of snow-stars King Winter had ordered him to take,
and ran as fast as he could through the open door, over
the terraces, through the town and gates to the open
country outside. Everywhere the snow had gone away
so quickly that the second stories of all the houses were
quite free and the first stories just appearing. Beyond
the gates, he came upon great streams of water that
were running down to the White Sea, where the ice was
melting and wiping out the track of King Winter’s sleigh.
Hanka turned toward the South, to the great forests
where the Tsar’s people had built their wooden village.
He sang aloud as he walked, because the warm sun was
shining on his back, and his stomach was full of
crackers and jam, so he felt very happy despite the
heavy bag of snow-stars on his shoulder. If he had not
been a fool he would certainly have wondered why they
were so heavy; but he was a fool so he just carried
them and did not wonder at anything. Above him in the
treetops the birds were singing as happily as he, the air
smelled sweet and warm, and in some places Mother
Mir’s flowers were peeping through the thin, wet snow.

“Why, I believe it’s going to be Spring!” said Hanka.

In the village, the Tsar was still sitting on the bench
beside the gate. The villagers came to offer him food,
but he refused it, saying “You have not enough for
yourselves. I will not eat your food. Give it to your
children, good people!”

“But you will starve!” they cried. “Oh no,” replied the
Tsar. “Some good Saint will take care of me.”

And in the night, when the village was quiet and dark,
the crows in the forest flew to him and brought him
some frozen berries, the squirrels brought nuts to
appease his hunger, and the fairies from the great
Forest brought partridge-eggs and reindeer milk.

It was a beautiful sunny morning, when the villagers
who stood about the gate talking to the Tsar saw Hanka
returning, with his axe and rod and a bag over his
shoulder. “Look, look,” they cried, “he is bringing a
whole bagful of fish!”

“But where do you suppose he got the bag?” said the
Tsar. “He didn’t have it when he left.”

They were not kept guessing very long. Hanka came
running and shouting:

“Greetings, O Merciful Tsar, greetings from King Winter!
He gave me a message to thee, but I have forgotten it,
but here is a bag of snow-stars for thee, and thy city is
all thawed out, King Winter has gone back to the North
Pole. And I went through thy palace and found lots of
crackers and jam, which I ate. I didn’t mean to steal,
but there was nobody to ask for them so I had to take
them.”

The Tsar smiled and nodded.

“Thou art welcome to my crackers and jam, dear
Hanka,” he said, as he opened the bag of snow-stars,
took it by the lower corners, and turned it upside down.

Out of the bag rolled thousands and thousands of
sparkling, flashing diamonds! The people stood open-mouthed,
and Hanka sat down with surprise when he
saw what he had been carrying.

“That means we may return, for King Winter’s war is
over,” said the Tsar. So all the people went back to their
city on the shores of the White Sea, where the streets
were paved with silver, the walls were shining marble,
and the church steeples were topped with gold. The
Tsar sat on his throne again, but he ruled his people
now with mercy and justice, so everyone liked to be
brought before him to see his mild fatherly face.

Hanka was allowed to live in the palace all his life, and
had a silver fishing-rod, a silken line and a diamond
sinker, and was permitted to cast for gold-fish in the
royal pond. King Winter came for a visit once every year
with a little snow just to remind people of his past
reign; but he always found the people ready to joke and
laugh at the bad weather he brought, for they were all
happy and contented who lived in the city of the
Merciful Tsar.
THE END
Transcriber’s Notes

Copyright notice provided as in the original; this e-text
is public domain in the country of publication.
Silently corrected palpable typos; left non-standard (or
amusing) spellings and dialect unchanged.
In the text versions, delimited italics text in
_underscores_ (the HTML version reproduces the font
form of the printed book.)
*** END OF THE PROJECT GUTENBERG EBOOK THE CRUISE OF
THE LITTLE DIPPER, AND OTHER FAIRY TALES ***
