An Introduction to Statistics
with Python
With Applications in the Life Sciences
Second Edition
Thomas Haslwanter
School of Medical Engineering and Applied
Social Sciences
University of Applied Sciences Upper Austria
Linz, Austria
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
To my two-, three-, and four-legged
household companions: Jean, Felix, and his
sister Jessica: Thank you so much for all the
support you have provided over the years!
Preface
In the data analysis for my own research work, I was often slowed down by two
things: (1) I did not know enough statistics, and (2) the books available would provide
a theoretical background, but no real practical help. The book you are holding in your
hands (or on your tablet or laptop) is intended to be the book that will solve this very
problem. It is designed to provide enough basic understanding so that you know what
you are doing, and it should equip you with the tools you need. I believe that the
Python solutions provided in this book for the most basic statistical problems address
at least 90% of the problems that most physicists, biologists, and medical doctors
encounter in their work. So if you are the typical graduate student working on your
degree, or a medical researcher analyzing your latest experiments, chances are that
you will find the tools you require here—explanation and source-code included.
This is the reason I have focused on statistical basics and hypothesis tests in
this book, and refer only briefly to other statistical approaches. I am well aware
that most of the tests presented in this book can also be carried out using statistical
modeling. But in many cases, this is not the methodology used in life science
journals. Advanced statistical analysis goes beyond the scope of this book and, to
be frank, exceeds my own knowledge of statistics.
My motivation for providing the solutions in Python is based on two considerations.
One is that I would like them to be available to everyone. While commercial
solutions like Matlab, SPSS, Minitab, etc. offer powerful tools, most people can use
them legally only in an academic setting. In contrast, Python is completely free ("as in
free beer" is often heard in the Python community). The second reason is that Python is the
most beautiful coding language that I have yet encountered; and around 2010 Python
and its documentation matured to the point where one can use it without being a
serious coder. Together, this book, Python, and the tools that the Python ecosystem
offers today provide a beautiful, free package that covers all the statistics that most
researchers will need in their lifetime.
Since the publication of the first edition, Python has continuously gained popularity
and become firmly established as one of the foremost programming languages for
statistical data analysis. All the core packages have matured. And thanks to the
stunning development of Jupyter as an interactive programming environment, Python
has become even more accessible for people with little programming background.
To reflect these developments, and to incorporate the suggestions I have received for
improving the presentation of the material, Springer has given me the opportunity to
bring out a new edition of Introduction to Statistics with Python.
Compared to the first edition, the following changes have been made:
• The package pandas and its DataFrames have become an integral part of
scientific Python, as has the Jupyter framework for interactive data environments.
Correspondingly, more space has been dedicated to their
introduction.
• A new package, pingouin, promises a simplified and more powerful interface
for many common statistics functions. This package is introduced, and many
application examples have been added.
• The visualization of data has been expanded, including the preparation of
publication-ready graphics.
• The design of experiments and power analyses are discussed in more detail.
• A new section has been added on the confidence intervals of frequently used
statistical parameters.
• A new chapter has been added on finding patterns in data, including an introduction
to the correlation coefficient, cross- and autocorrelation. For an application of
these concepts, a short introduction is given to time series analysis.
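To give a first feeling for the correlation concepts named in the last item, here is a minimal sketch in plain Python. The function names `pearson_r` and `autocorrelation` and the sample data are made up for illustration and are not taken from the book's code. The autocorrelation is computed here simply as the Pearson correlation between a series and a shifted copy of itself, which is one of several common definitions.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of products of deviations from the means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def autocorrelation(x, lag):
    """Correlation of a series with a copy of itself shifted by `lag` samples."""
    if lag == 0:
        return 1.0
    return pearson_r(x[:-lag], x[lag:])


# Made-up signal that repeats every 4 samples
signal = [0, 1, 0, -1] * 5

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
print(autocorrelation(signal, 4))
print(autocorrelation(signal, 2))
```

Because the signal repeats every four samples, a shift by four samples reproduces it (correlation close to 1), while a shift by two samples inverts it (correlation close to -1).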
As for the first edition, all examples and solutions from this book are again available
online. This includes code samples and example programs, Jupyter Notebooks
with additional or extended information, as well as the data and Python code used to
generate most of the figures. They can be downloaded from
https://github.com/thomas-haslwanter/statsintro-python-2e.
I hope this book will help you with the statistical analysis of your data, and
convey some of the often really simple ideas behind the sometimes awkwardly named
statistical analysis procedures.
much all at once. However, solutions provided to the exercises at the end of most
chapters should help you to get up to speed with Python.
• you are not a statistics expert: If you have advanced statistics experience, the
online help in Python and the Python packages may be sufficient to allow you to
do most of your data analysis right away. This book may still help you to get started
with Python. However, the book concentrates on the basic ideas of statistics and
on hypothesis tests, and only the last part introduces linear regression modeling
and Bayesian statistics.
This book is designed to give you all (or at least most of) the tools that you
will need for statistical data analysis. I attempt to provide the background you need
to understand what you are doing. I do not prove any theorems, and do not apply
mathematics unless necessary. For all tests, a working Python program is provided.
In principle, you just have to define your problem, select the corresponding program,
and adapt it to your needs. This should allow you to get going quickly, even if you
have little Python experience. This is also the reason why I have not provided the
software as one single Python package; I expect that you will have to tailor each
program to your specific setup (data format, etc.).
This book is organized into three parts:
Part I gives an introduction to Python: how to set it up, simple programs to get
started, and tips on how to avoid some common mistakes. It also shows how to
read data from different sources into Python, and how to visualize statistical data.
Part II provides an introduction to statistical analysis: how to design a study,
power analysis, and how best to analyze data; probability distributions; and an
overview of the most important hypothesis tests. Even though modern statistics
is firmly based in statistical modeling, hypothesis tests still seem to dominate the
life sciences. For each test, a Python program is provided that shows how the test
can be implemented.
Part III provides an introduction to correlation and regression analysis, time
series analysis, and statistical modeling, and a look at advanced statistical analysis
procedures. I have also included tests on discrete data in this section, such as
logistic regression, as they utilize “generalized linear models” which I regard
as advanced. This part ends with a presentation of the basic ideas of Bayesian
statistics.
To achieve all those goals as quickly as possible, Appendix A of the book
provides hints on how to efficiently develop correct and working code. This should
get you to the point where you can get things done quickly.
Acknowledgments
Python is built on the contributions from the user community, and some of the sections
in this book are based on some of the excellent information available on the web.
(Permission has been granted by the authors to reprint their contributions here.)
I especially want to thank the following people:
• Christiane Takacs helped me enormously by polishing the introductory statistics
sections.
• Connor Johnson wrote a very nice blog explaining the results of the statsmodels
OLS command, which provided the basis for the section on Statistical Models.
• Cam Davidson Pilon wrote the excellent open-source e-book Probabilistic-
Programming-and-Bayesian-Methods-for-Hackers. From there, I took the
example of the Challenger disaster to demonstrate Bayesian statistics.
• Fabian Pedregosa’s blog on ordinal logistic regression allowed me to include this
topic, which otherwise would be admittedly beyond my own skills.
I also want to thank Springer Publishing for the chance to bring out the second
edition of this book, and to base the three introductory chapters (Python, Data Import,
and Data Display) in significant part on the corresponding chapters of my book
Hands-on Signal Analysis with Python.
If you have a suggestion or correction, please send an email to my work address
[email protected]. If I make a change based on your feedback, I will add
you to the list of contributors unless advised otherwise. If you include at least part
of the sentence the error appears in, that makes it easy for me to search. Page and
section numbers are fine, too, but not as easy to work with. Thanks!
The first part of the book presents an introduction to statistics based on Python. It
is impossible to cover the whole language in thirty or forty pages, so if you are
a beginner, please see one of the excellent Python introductions available on the
Internet for details. Links are given below. This part is a kick-start for Python; it
shows how to install Python under Windows, Linux, or MacOS, and walks step-by-
step through documented programming examples. Tips are included to help avoid
some of the problems frequently encountered while learning Python.
Because the data for statistical analysis are commonly obtained from text
files, Excel files, or data preprocessed by Matlab, the third chapter presents simple
ways to import these types of data into Python.
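As a small taste of what importing text data involves, the following sketch reads comma-separated values using only the Python standard library. The column names, the values, and the helper `read_table` are invented for this illustration. The in-memory string stands in for a file on disk, and the book's own examples may well use pandas for the same task.

```python
import csv
import io

# Made-up measurements standing in for a text file on disk
raw_text = """subject,group,weight_kg
1,control,70.5
2,control,65.2
3,treated,80.1
"""


def read_table(text_stream):
    """Read comma-separated rows into a list of dicts, converting numeric columns."""
    rows = []
    for row in csv.DictReader(text_stream):
        rows.append({
            "subject": int(row["subject"]),
            "group": row["group"],
            "weight_kg": float(row["weight_kg"]),
        })
    return rows


data = read_table(io.StringIO(raw_text))
weights = [row["weight_kg"] for row in data]
print(len(data), "rows; mean weight:", sum(weights) / len(weights))
```

To read a real file instead, the same function can be given an open file object, for example `open("data.csv", newline="")`, where the file name is again only a placeholder.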
The last chapter of Part I illustrates various ways of visualizing data in Python.
Since the flexibility of Python for interactive data analysis has led to a certain
complexity that can frustrate new Python programmers, code samples for various types
of interactive plots should help future Pythonistas to avoid these problems.