Bayesian Analysis with Python
Osvaldo Martin
BIRMINGHAM - MUMBAI
Bayesian Analysis with Python
All rights reserved. No part of this book may be reproduced, stored in a retrieval
system, or transmitted in any form or by any means, without the prior written
permission of the publisher, except in the case of brief quotations embedded in
critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy
of the information presented. However, the information contained in this book
is sold without warranty, either express or implied. Neither the author nor Packt
Publishing, and its dealers and distributors will be held liable for any damages
caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the
companies and products mentioned in this book by the appropriate use of capitals.
However, Packt Publishing cannot guarantee the accuracy of this information.
ISBN 978-1-78588-380-4
www.packtpub.com
Credits
Reviewer: Austin Rochford
Copy Editor: Safis Editing, Vikrant Phadke
Proofreader: Safis Editing
About the Author
I would like to thank my wife, Romina, for her support while writing
this book and in general for her support in all my projects, especially
the unreasonable ones. I also want to thank Walter Lapadula,
Juan Manuel Alonso, and Romina Torres-Astorga for providing
invaluable feedback and suggestions on my drafts.
A special thanks goes to the core developers of PyMC3. This book
was possible only because of the dedication, love, and hard work
they have put into PyMC3. I hope this book contributes to the spread
and adoption of this great library.
About the Reviewer
At www.PacktPub.com, you can also read a collection of free technical articles, sign
up for a range of free newsletters and receive exclusive discounts and offers on Packt
books and eBooks.
https://www.packtpub.com/mapt
Get the most in-demand software skills with Mapt. Mapt gives you full access to all
Packt books and video courses, as well as industry-leading tools to help you plan
your personal development and advance your career.
Why subscribe?
• Fully searchable across every book published by Packt
• Copy and paste, print, and bookmark content
• On demand and accessible via a web browser
Table of Contents
Preface
Chapter 1: Thinking Probabilistically - A Bayesian Inference Primer
Statistics as a form of modeling
Exploratory data analysis
Inferential statistics
Probabilities and uncertainty
Probability distributions
Bayes' theorem and statistical inference
Single parameter inference
The coin-flipping problem
The general model
Choosing the likelihood
Choosing the prior
Getting the posterior
Computing and plotting the posterior
Influence of the prior and how to choose one
Communicating a Bayesian analysis
Model notation and visualization
Summarizing the posterior
Highest posterior density
Posterior predictive checks
Installing the necessary Python packages
Summary
Exercises
Chapter 2: Programming Probabilistically – A PyMC3 Primer
Probabilistic programming
Inference engines
Non-Markovian methods
Markovian methods
PyMC3 introduction
Coin-flipping, the computational approach
Model specification
Pushing the inference button
Diagnosing the sampling process
Summarizing the posterior
Posterior-based decisions
ROPE
Loss functions
Summary
Keep reading
Exercises
Chapter 3: Juggling with Multi-Parametric and Hierarchical Models
Nuisance parameters and marginalized distributions
Gaussians, Gaussians, Gaussians everywhere
Gaussian inferences
Robust inferences
Student's t-distribution
Comparing groups
The tips dataset
Cohen's d
Probability of superiority
Hierarchical models
Shrinkage
Summary
Keep reading
Exercises
Chapter 4: Understanding and Predicting Data with Linear Regression Models
Simple linear regression
The machine learning connection
The core of linear regression models
Linear models and high autocorrelation
Modifying the data before running
Changing the sampling method
Interpreting and visualizing the posterior
Pearson correlation coefficient
Pearson coefficient from a multivariate Gaussian
Robust linear regression
Hierarchical linear regression
Correlation, causation, and the messiness of life
Preface
Bayesian statistics has been around for more than 250 years now. During this time
it has enjoyed as much recognition and appreciation as disdain and contempt.
Through the last few decades it has gained more and more attention from people in
statistics and almost all other sciences, engineering, and even outside the walls of the
academic world. This revival has been possible due to theoretical and computational
developments. Modern Bayesian statistics is mostly computational statistics. The
necessity for flexible and transparent models and a more intuitive interpretation of
statistical analysis has only contributed to this trend.
Here, we will adopt a pragmatic approach to Bayesian statistics and we will not
care too much about other statistical paradigms and their relationship to Bayesian
statistics. The aim of this book is to learn about Bayesian data analysis with the help
of Python. Philosophical discussions are interesting but they have already been
undertaken elsewhere in a richer way than we can discuss in these pages.
Chapter 3, Juggling with Multi-Parametric and Hierarchical Models, builds on the very
basics of Bayesian modeling and starts adding complexity to the mix. We learn how
to build and analyze models with more than one parameter and how to put structure
into models, taking advantage of hierarchical models.
Chapter 4, Understanding and Predicting Data with Linear Regression Models, explains
how linear regression is a very widely used model in itself and a building block
of more complex models. In this chapter, we apply linear models to solve regression
problems and learn how to adapt them to deal with outliers and multiple variables.
Chapter 5, Classifying Outcomes with Logistic Regression, generalizes the linear
model from the previous chapter to solve classification problems, including problems
with multiple input and output variables.
Chapter 7, Mixture Models, discusses how to mix simpler models to build more
complex ones. This leads us to new models and also to reinterpret models learned
in previous chapters. Problems such as data clustering and dealing with count data
are discussed.
Chapter 8, Gaussian Processes, closes the book by briefly discussing some more
advanced concepts related to non-parametric statistics. What kernels are, how to use
kernelized linear regression, and how to use Gaussian processes for regression are
the central themes of this chapter.
Maybe the easiest way to install Python and Python libraries is using Anaconda,
a scientific computing distribution. You can read more about Anaconda and
download it from https://www.continuum.io/downloads. Once Anaconda is in
our system, we can install new Python packages with this command: conda install
NamePackage.
The examples in this book were written using the following library versions:
• IPython 5.0
• NumPy 1.11.1
• SciPy 0.18.1
• Pandas 0.18.1
• Matplotlib 1.5.3
• Seaborn 0.7.1
• PyMC3 3.0
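As a quick, optional check (this snippet is not part of the book's code bundle), you can print the installed version of each library from Python:

import IPython, numpy, scipy, pandas, matplotlib, seaborn, pymc3

# Print the installed version of each required library
for module in (IPython, numpy, scipy, pandas, matplotlib, seaborn, pymc3):
    print(module.__name__, module.__version__)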
Conventions
In this book, you will find a number of text styles that distinguish between different
kinds of information. Here are some examples of these styles and an explanation of
their meaning.
Code words in text, database table names, folder names, filenames, file extensions,
pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "To
compute the HPD in the correct way we will use the function plot_post."
Reader feedback
Feedback from our readers is always welcome. Let us know what you think about
this book—what you liked or disliked. Reader feedback is important for us as it helps
us develop titles that you will really get the most out of.
If there is a topic that you have expertise in and you are interested in either writing
or contributing to a book, see our author guide at www.packtpub.com/authors.
Customer support
Now that you are the proud owner of a Packt book, we have a number of things to
help you to get the most from your purchase.
You can download the example code files by following these steps:
1. Log in or register to our website using your e-mail address and password.
2. Hover the mouse pointer on the SUPPORT tab at the top.
3. Click on Code Downloads & Errata.
4. Enter the name of the book in the Search box.
5. Select the book for which you're looking to download the code files.
6. Choose from the drop-down menu where you purchased this book from.
7. Click on Code Download.
You can also download the code files by clicking on the Code Files button on the
book's webpage at the Packt Publishing website. This page can be accessed by
entering the book's name in the Search box. Please note that you need to be logged in
to your Packt account.
Once the file is downloaded, please make sure that you unzip or extract the folder
using the latest version of your preferred archive extraction tool.
The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Bayesian-Analysis-with-Python. We also have other code
bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
Errata
Although we have taken every care to ensure the accuracy of our content, mistakes
do happen. If you find a mistake in one of our books—maybe a mistake in the text or
the code—we would be grateful if you could report this to us. By doing so, you can
save other readers from frustration and help us improve subsequent versions of this
book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form
link, and entering the details of your errata. Once your errata are verified, your
submission will be accepted and the errata will be uploaded to our website or added
to any list of existing errata under the Errata section of that title.
Piracy
Piracy of copyrighted material on the Internet is an ongoing problem across all
media. At Packt, we take the protection of our copyright and licenses very seriously.
If you come across any illegal copies of our works in any form on the Internet, please
provide us with the location address or website name immediately so that we can
pursue a remedy.
We appreciate your help in protecting our authors and our ability to bring you
valuable content.
Questions
If you have a problem with any aspect of this book, you can contact us at
[email protected], and we will do our best to address the problem.
Thinking Probabilistically -
A Bayesian Inference Primer
Probability theory is nothing but common sense reduced to calculation.
-Pierre-Simon Laplace
In this chapter, we will learn the core concepts of Bayesian statistics and some of the
instruments in the Bayesian toolbox. We will use some Python code in this chapter,
but this chapter will be mostly theoretical; most of the concepts in this chapter will be
revisited many times through the rest of the book. This chapter, being intense on the
theoretical side, may be a little anxiogenic for the coder in you, but I think it will ease
the path to effectively applying Bayesian statistics to your problems. In this chapter,
we will cover the following topics:
• Statistical modeling
• Probabilities and uncertainty
• Bayes' theorem and statistical inference
• Single parameter inference and the classic coin-flip problem
• Choosing priors and why people often don't like them, but should
• Communicating a Bayesian analysis
• Installing all Python packages
OK, so let's assume we have our dataset; usually, a good idea is to explore and
visualize it in order to get some intuition about what we have in our hands. This can
be achieved through what is known as Exploratory Data Analysis (EDA), which
basically consists of the following:
• Descriptive statistics
• Data visualization
The first one, descriptive statistics, is about how to use some measures (or statistics)
to summarize or characterize the data in a quantitative manner. You probably
already know that you can describe data using the mean, mode, standard deviation,
interquartile ranges, and so forth. The second one, data visualization, is about
visually inspecting the data; you probably are familiar with representations such
as histograms, scatter plots, and others. While EDA was originally thought of as
something you apply to data before doing any complex analysis or even as an
alternative to complex model-based analysis, throughout the book we will learn that
EDA is also applicable to understanding, interpreting, checking, summarizing, and
communicating the results of Bayesian analysis.
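As a minimal illustration of these two ingredients of EDA (the data below is simulated purely for the example; this is not code from the book), we can compute a few descriptive statistics and plot a histogram:

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)
data = np.random.normal(loc=170, scale=8, size=200)  # simulated sample, for illustration only

# Descriptive statistics: summarize the data with a few numbers
print("mean:", data.mean())
print("standard deviation:", data.std())
print("interquartile range:", np.percentile(data, 75) - np.percentile(data, 25))

# Data visualization: inspect the distribution of the data
plt.hist(data, bins=20)
plt.xlabel("value")
plt.ylabel("count")
plt.show()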
Inferential statistics
Sometimes, plotting our data and computing simple numbers, such as the average
of our data, is all we need. Other times, we want to make a generalization based
on our data. We may want to understand the underlying mechanism that could
have generated the data, or maybe we want to make predictions for future
(yet unobserved) data points, or we need to choose among several competing
explanations for the same observations. That's the job of inferential statistics. To do
inferential statistics we will rely on probabilistic models. There are many types of
models, and most of science (and, I will add, all of our understanding of the real world)
is built through models. The brain is just a machine that models reality (whatever
reality might be); see this TEDx talk about the machine that builds reality:
http://www.tedxriodelaplata.org/videos/m%C3%A1quina-construye-realidad.
What are models? Models are simplified descriptions of a given system (or process).
Those descriptions are purposely designed to capture only the most relevant aspects
of the system, and hence, most models do not pretend they are able to explain
everything; on the contrary, if we have a simple and a complex model and both
models explain the data more or less equally well, we will generally prefer the
simpler one. This heuristic for simple models is known as Occam's razor, and we will
discuss how it is related to Bayesian analysis in Chapter 6, Model Comparison.
Model building, no matter which type of model you are building, is an iterative
process following more or less the same basic rules. We can summarize the Bayesian
modeling process using three steps:
1. Given some data and some assumptions on how this data could have been
generated, we will build models. Most of the time, models will be crude
approximations, but most of the time this is all we need.
2. Then we will use Bayes' theorem to add data to our models and derive the
logical consequences of mixing the data and our assumptions. We say we are
conditioning the model on our data.
3. Lastly, we will check that the model makes sense according to different
criteria, including our data and our expertise on the subject we are studying.
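As a rough preview of these three steps (the data and the grid-based computation below are illustrative assumptions, not the book's own example code), here is how they might look for the coin-flipping problem introduced later in this chapter:

import numpy as np

# Step 1: data plus assumptions define a model. Assume 3 heads in 10 tosses,
# a uniform prior for the bias theta, and a binomial likelihood.
heads, tosses = 3, 10
theta = np.linspace(0, 1, 101)   # grid of candidate values for the bias
prior = np.ones_like(theta)      # uniform prior over the grid

# Step 2: condition the model on the data using Bayes' theorem.
likelihood = theta**heads * (1 - theta)**(tosses - heads)
posterior = likelihood * prior
posterior /= posterior.sum()     # normalize so the posterior sums to 1

# Step 3: check that the result makes sense, for example by looking at
# the most plausible value of the bias under the posterior.
print("most plausible bias:", theta[np.argmax(posterior)])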
Bayesian models are also known as probabilistic models because they are built
using probabilities. Why probabilities? Because probabilities are the correct
mathematical tool to model the uncertainty in our data, so let's take a walk through
the garden of forking paths.
Logic is about thinking without making mistakes. Under the Aristotelian or classical
logic, we can only have statements taking the values true or false. Under the
Bayesian definition of probability, certainty is just a special case: a true statement has
a probability of 1, a false one has probability 0. We would assign a probability of 1
to the statement that there is life on Mars only after having conclusive data indicating something is growing
and reproducing and doing other activities we associate with living organisms.
Notice, however, that assigning a probability of 0 is harder because we can always
think that there is some Martian spot that is unexplored, or that we have made
mistakes with some experiment, or several other reasons that could lead us to falsely
believe life is absent on Mars when it is not. Related to this point is Cromwell's rule,
stating that we should reserve the use of the prior probabilities of 0 or 1 to logically
true or false statements. Interestingly enough, Cox mathematically proved that if we
want to extend logic to include uncertainty we must use probabilities and probability
theory. Bayes' theorem is just a logical consequence of the rules of probability as
we will see soon. Hence, another way of thinking about Bayesian statistics is as an
extension of logic when dealing with uncertainty, something that clearly has nothing
to do with subjective reasoning in the pejorative sense. Now that we know the
Bayesian interpretation of probability, let's see some of the mathematical properties
of probabilities. For a more detailed study of probability theory, you can read
Introduction to Probability by Joseph K. Blitzstein and Jessica Hwang.
Probabilities are numbers in the interval [0, 1], that is, numbers between 0 and 1,
including both extremes. Probabilities follow some rules; one of these rules is the
product rule:
p(A, B) = p(A|B) p(B)
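As a quick numerical check of the product rule (the joint probabilities below are made-up values chosen only for this illustration):

import numpy as np

# A made-up joint distribution p(A, B) over two binary events;
# rows index A (false, true) and columns index B (false, true).
joint = np.array([[0.30, 0.10],
                  [0.20, 0.40]])

p_B = joint.sum(axis=0)        # marginal p(B), obtained by summing over A
p_A_given_B = joint / p_B      # conditional p(A | B), one column per value of B

# The product rule recovers the joint: p(A, B) = p(A | B) p(B)
print(np.allclose(p_A_given_B * p_B, joint))   # prints True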