
Bayesian Basics

Ryan P. Adams

These are notes to help clarify things and create context. Please note that they are not a replacement for the readings.
There are a lot of ideas that seem to carry the name Bayesian, and so it can be unclear sometimes what this word actually means. At a high level, however, it is about being willing to use probability distributions to represent unknown quantities that are not necessarily random. That is, using probability to capture degrees of belief in which the uncertainty may be entirely in one's own head or the state of an algorithm. For example, we might have some noisy astronomical information about the question of how many rings Saturn has. We might have some evidence supporting some number of rings, but it is noisy and incomplete, so we are uncertain. A Bayesian is willing to place a probability distribution on this quantity and represent it as a random variable, because she is uncertain. A frequentist might assert that there is no possibility of repeating a random event that produces a new Saturn with a different number of rings, and so it is inappropriate to consider this a random variable with a probability distribution: there is an unknown truth and we must estimate it. This is a deep philosophical question that has been debated for a long time. In this class, we're going to take it as a given that some kinds of machine learning problems are noisy and uncertain and that it can be useful to reason about these using the calculus of probabilities.
The Bayesian model for machine learning is appealing for a few reasons. First, as I've said, it allows one to represent beliefs in the presence of noise. However, it also allows you to integrate out that uncertainty and account for it when making decisions and predictions from data. It provides a coherent way to balance old data against new data and accumulate more information as it arrives. It also enables one to separate out modeling assumptions from fitting (inference) procedures and separate algorithmic concerns from our inductive biases. Finally, it enables us to handle difficult tasks like model selection in a clear and rigorous way. Being Bayesian is not the only approach to machine learning and statistics, but it can be a nice one for many problems due to these and other properties.
Personally, I like to think of Bayesian inference as a kind of hypothesis processing machine. Imagine that there is a space of possible (unobserved) states of the world θ and we'd like to reason about them. Let's imagine that our a priori beliefs about the world are captured by a prior distribution p(θ). Now, we see some data and those data are coupled to these hypotheses via a likelihood function p(data | θ). This likelihood function is a distribution over data, given a state of the world. The environment gives the data to us and we're stuck with it, so we think of likelihood functions as being functions over their parameters; this can be somewhat confusing because those parameters appear behind the vertical bars.¹ In any case, this hypothesis processing machine has two steps: first, multiply these two functions together pointwise as a function of θ; second, normalize so that you get a probability density back on θ. This multiplication penalizes values of θ that assign low probability to the data, and upweights those that assign high probability to the data.

prior p(θ)  →  multiply by p(data | θ)  →  divide by ∫ p(data | θ) p(θ) dθ  →  posterior p(θ | data)

¹ Hard-core frequentists might say that you can't condition on something that isn't a random variable, and so therefore the likelihood function should be written as a parameterized family of densities like f_θ(data).
Bayes' theorem really is that simple:

p(θ | data) = p(θ) p(data | θ) / ∫ p(θ′) p(data | θ′) dθ′ .

Here I'm using θ′ instead of θ to make it clearer that this denominator doesn't depend on a specific value of θ, but is an integral over all values. It is the normalization constant for this distribution over θ, often called the marginal likelihood:

p(data) = ∫ p(θ) p(data | θ) dθ .
Conjugacy
It is often the case that your prior might have a simple form that you get to choose, but after you multiply it by one or more likelihood functions, it starts to become complicated. This typically means that you can evaluate it pointwise, but only up to a constant, because the marginal likelihood integral becomes intractable. There are a variety of methods out there for dealing with this (common) situation, with the two most popular ones being Markov chain Monte Carlo and variational inference. These more advanced techniques are out of the scope of this course, so we will instead focus on the important situations in which the posterior distribution has the same form as the prior distribution. Likelihoods that have this form are generally exponential family distributions, and the priors that are closed under the corresponding Bayesian updates are called conjugate priors. We won't go into too much detail about exponential family distributions in this course, but the basic idea is that these distributions have the form

p(data | θ) ∝ exp{ θ^T T(data) } ,

that is, the log densities are linear in the parameters. Here I'm imagining that θ is now a real-valued vector. The vector function T(data) provides the sufficient statistics. The book goes into more detail about exponential families and why they are interesting, and CS281 discusses these topics further.
In any case, the reason these kinds of likelihoods are convenient is because if we have a prior that looks like

p(θ) ∝ exp{ θ^T ν } ,

then when you multiply it by one of those likelihoods, things inside the exponential function just add:

p(θ | data) ∝ exp{ θ^T (ν + T(data)) } .

I'm skipping a ton of details here about the conjugate prior setup. Bishop 2.4 has a more rigorous and thorough treatment in which these equations have all the other terms to make the math work out correctly with normalization constants and such. Nevertheless, this is the high-level idea: make things multiply nicely.

Example: Beta-Binomial
The Bernoulli distribution is just the coin flip distribution with some bias θ ∈ (0, 1). We'll say that a heads is 1 and a tails is 0. The probability mass function for the Bernoulli is

Pr(X = x | θ) = θ^x (1 − θ)^{1−x} .

You can write this as an exponential family using the natural parameterization η = ln{θ/(1 − θ)}:

θ^x (1 − θ)^{1−x} = exp{ x ln θ + (1 − x) ln(1 − θ) }
                  ∝ exp{ x ln θ − x ln(1 − θ) }
                  = exp{ x ln( θ/(1 − θ) ) } .
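To make the natural parameterization concrete, here is a quick numerical check (a sketch, not part of the original notes) that the Bernoulli pmf θ^x (1 − θ)^{1−x} equals (1 − θ) · exp{x · ln(θ/(1 − θ))}, i.e., the exponential-family form up to a factor that does not depend on x:

```python
import numpy as np

def bernoulli_pmf(x, theta):
    """Standard Bernoulli pmf: theta^x * (1 - theta)^(1 - x)."""
    return theta**x * (1.0 - theta)**(1 - x)

def bernoulli_expfam(x, theta):
    """Exponential-family form: (1 - theta) * exp{x * eta},
    where eta = ln(theta / (1 - theta)) is the natural parameter."""
    eta = np.log(theta / (1.0 - theta))
    return (1.0 - theta) * np.exp(x * eta)

for theta in [0.2, 0.5, 0.9]:
    for x in [0, 1]:
        assert np.isclose(bernoulli_pmf(x, theta), bernoulli_expfam(x, theta))
print("Bernoulli pmf matches its exponential-family form.")
```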
The conjugate prior for the Bernoulli distribution is the beta distribution, which is a density on the interval (0, 1) given by

p(θ | α, β) = [ Γ(α + β) / (Γ(α) Γ(β)) ] θ^{α−1} (1 − θ)^{β−1} .
Bishop 2.1.1 has nice pictures of the beta distribution and discusses some further properties. Now, in class I sort of threw around "let's imagine you see J heads and K tails," but if you were paying close attention, you know I was skipping over some important pieces. If you have J heads out of J + K tosses, then what you really have is a binomial distribution and there is a binomial coefficient out there in the likelihood:

Pr(J heads, K tails | θ) = (J + K choose J) θ^J (1 − θ)^K .

The binomial coefficient doesn't affect the math we do for the Bayesian update, however, since it just gets sucked into the normalization constant. To see how this works, we first denote our prior parameters for the beta distribution as α_0 and β_0.
p(θ) = Beta(θ | α_0, β_0)

p(θ | J heads, K tails, α_0, β_0) ∝ [ Γ(α_0 + β_0)/(Γ(α_0) Γ(β_0)) θ^{α_0 − 1} (1 − θ)^{β_0 − 1} ] × [ (J + K choose J) θ^J (1 − θ)^K ]
                                  ∝ θ^{α_0 + J − 1} (1 − θ)^{β_0 + K − 1}
                                  = Beta(θ | α = α_0 + J, β = β_0 + K) .

The first bracketed factor is the prior and the second is the likelihood. So after seeing these flips we get a new beta distribution back out!
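As a small illustration (a sketch with made-up counts, not taken from the notes), the update is literally just adding the observed counts to the prior parameters; scipy's beta distribution can then be used to inspect the posterior:

```python
from scipy.stats import beta

# Hypothetical prior pseudo-counts and observed coin flips.
alpha0, beta0 = 2.0, 2.0   # prior Beta(alpha0, beta0)
J, K = 7, 3                # J heads, K tails observed

# Conjugate update: the posterior is Beta(alpha0 + J, beta0 + K).
alpha_N, beta_N = alpha0 + J, beta0 + K
posterior = beta(alpha_N, beta_N)

print("posterior mean of theta:", posterior.mean())
print("95% credible interval:", posterior.interval(0.95))
```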

Bayesian Updates for Gaussians with Known Covariance


Next, let's consider a very slightly more complex model: multivariate Gaussian data with known covariance. That is, we're imagining that we have N data in R^D that have been drawn from a D-dimensional Gaussian distribution with unknown mean μ but known covariance matrix Σ. Gaussian distributions are very fundamental and I am going to assume that you've seen them before and are comfortable with them and what covariance matrices are, etc. For a review, see Bishop 2.3. The probability density function for a Gaussian with this parameterization is

N(x | μ, Σ) = (2π)^{−D/2} |Σ|^{−1/2} exp{ −(1/2) (x − μ)^T Σ^{−1} (x − μ) } .
This is the likelihood function, in terms of μ. The conjugate prior for μ in this case is another Gaussian. We'll denote the prior parameters for that Gaussian as m_0 and S_0. We encounter N data {x_n}_{n=1}^N and now we would like the posterior distribution on μ:

p(μ | m_0, S_0, Σ, {x_n}_{n=1}^N) ∝ N(μ | m_0, S_0) ∏_{n=1}^N N(x_n | μ, Σ)

                                  ∝ exp{ −(1/2) (μ − m_0)^T S_0^{−1} (μ − m_0) } ∏_{n=1}^N exp{ −(1/2) (x_n − μ)^T Σ^{−1} (x_n − μ) } .

Here the first factor is the prior and the product over n is the likelihood.
I've thrown out all of the factors that did not involve μ. It's usually convenient to write this all in log space:

ln p(μ | m_0, S_0, Σ, {x_n}_{n=1}^N) = const − (1/2) [ (μ − m_0)^T S_0^{−1} (μ − m_0) + ∑_{n=1}^N (x_n − μ)^T Σ^{−1} (x_n − μ) ]

                                     = const − (1/2) [ μ^T S_0^{−1} μ − 2 μ^T S_0^{−1} m_0 − 2 μ^T Σ^{−1} ∑_{n=1}^N x_n + N μ^T Σ^{−1} μ ] .

Here I just expanded the two quadratic forms and stuck terms that don't depend on μ into the constant out front. I also observed that the x_n only participate in one of the terms. Now we collapse the like terms and write x̄_N = (1/N) ∑_n x_n for the sample mean of the data:

ln p(μ | m_0, S_0, Σ, {x_n}_{n=1}^N) = const − (1/2) [ μ^T (S_0^{−1} + N Σ^{−1}) μ − 2 μ^T (S_0^{−1} m_0 + N Σ^{−1} x̄_N) ] .
We now complete the square and write this as a quadratic form, which results in some other things getting baked into the constant. We don't care about any of these constants because we have to normalize later anyway. Remember that the (log) normalization constant is just a number we subtract in log space.

ln p(μ | m_0, S_0, Σ, {x_n}_{n=1}^N) = const − (1/2) (μ − m_N)^T S_N^{−1} (μ − m_N)

S_N = (S_0^{−1} + N Σ^{−1})^{−1}

m_N = S_N (S_0^{−1} m_0 + N Σ^{−1} x̄_N) .
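Here is a minimal numpy sketch of this update, using synthetic data and made-up prior parameters (none of these particular numbers come from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)

D, N = 2, 50
Sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])        # known covariance
mu_true = np.array([1.0, -2.0])        # unknown mean we are trying to infer
X = rng.multivariate_normal(mu_true, Sigma, size=N)

m0 = np.zeros(D)                       # prior mean
S0 = 10.0 * np.eye(D)                  # broad prior covariance

Sigma_inv = np.linalg.inv(Sigma)
S0_inv = np.linalg.inv(S0)

# Posterior covariance and mean from completing the square:
#   S_N = (S0^{-1} + N Sigma^{-1})^{-1}
#   m_N = S_N (S0^{-1} m0 + N Sigma^{-1} x_bar)
x_bar = X.mean(axis=0)
S_N = np.linalg.inv(S0_inv + N * Sigma_inv)
m_N = S_N @ (S0_inv @ m0 + N * Sigma_inv @ x_bar)

print("posterior mean m_N:", m_N)      # close to x_bar since the prior is weak
print("posterior covariance S_N:\n", S_N)
```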

So now, in log space, we have a quadratic form, and so we can see we'll wind up with a Gaussian in μ if we exponentiate and normalize! We can compute the posterior mean and covariance, denoted m_N and S_N, respectively, using a little bit of linear algebra.² A few things to think about to help get some intuition:

The more data you get, the bigger N will be and the more relative effect Σ^{−1} and Σ^{−1} x̄_N will have on S_N and m_N, respectively. This feels right because more data means that the prior should be overwhelmed.

Consider what a strong prior would look like for μ, in the case where the dimensions were independent a priori. The matrix S_0 would have small positive numbers on the diagonal and zeros off of it. When we took its inverse, it would still be a diagonal matrix, but now the values on the diagonal would be big. These would be competing with N Σ^{−1} to center the posterior mean at m_0 instead of x̄_N. This also feels right because a strong prior should compete more with the data.

Note that the posterior covariance depends on the data only through N and not on the actual values of x_n. That means that our uncertainty in this case is entirely a function of the number of data. Note that this wouldn't be the case if Σ was unknown.

It is a useful exercise to work through this same math where both μ and Σ are unknown. In this case, there is still a conjugate prior, but now it is a more complicated distribution called a Normal-Inverse-Wishart distribution. The Wishart distribution is a distribution over positive definite matrices that is conjugate to Gaussian likelihoods with unknown covariances. Bishop goes through this in 2.3.6.

Bayesian Linear Regression


We now have the tools to revisit linear regression in a Bayesian setting. Recall that our data are now pairs, {x_n, t_n}_{n=1}^N. We'll assume that there are some basis functions φ and that our inputs become a design matrix Φ, which has N rows and J columns. The targets are real-valued and we stack them into a column vector t ∈ R^N. Our regression model assumes independent zero-mean Gaussian noise with precision β. Our weight parameter is a J-dimensional w, and so we're saying the labels arise as

t_n = φ(x_n)^T w + ε_n ,    ε_n ~ N(0, β^{−1}) .

This becomes a likelihood function for the nth datum via

p(t_n | x_n, w, β) = N(t_n | φ(x_n)^T w, β^{−1}) .

The noise is independent, so we can write the likelihood for all N data as

p(t | {x_n}_{n=1}^N, w, β) = ∏_{n=1}^N N(t_n | φ(x_n)^T w, β^{−1})

                           = N(t | Φ w, β^{−1} I_N) .
² It's pretty common to use the subscript N to denote posterior parameters. This is a kind of reflection of using the subscript 0 for prior parameters; in the beginning you have zero data and afterward you have N data.

Here we're writing this likelihood as a big multivariate Gaussian rather than a product of univariate ones, but it is exactly the same. I'm using the notation I_N to indicate an N × N identity matrix. If you find this switch to matrix notation confusing, it might be worth working through it to convince yourself that it is correct; the sketch below checks it numerically.
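A minimal check, with made-up basis matrix, weights, and noise precision (all hypothetical, just to verify the equivalence of the two forms):

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

rng = np.random.default_rng(1)

N, J = 5, 3
Phi = rng.normal(size=(N, J))      # design matrix (rows are phi(x_n)^T)
w = rng.normal(size=J)             # some weight vector
beta = 4.0                         # noise precision
t = Phi @ w + rng.normal(scale=beta**-0.5, size=N)

# Product of N univariate Gaussians, in log space.
log_lik_univariate = norm.logpdf(t, loc=Phi @ w, scale=beta**-0.5).sum()

# One big N-dimensional Gaussian with covariance beta^{-1} I_N.
log_lik_multivariate = multivariate_normal.logpdf(t, mean=Phi @ w,
                                                  cov=np.eye(N) / beta)

assert np.isclose(log_lik_univariate, log_lik_multivariate)
print("Univariate product and multivariate forms agree.")
```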
To do Bayesian linear regression, we'll need to put a prior on the weights w. The convenient conjugate prior is a Gaussian and, as before, we'll use prior parameters m_0 and S_0; Bishop does the same in Equation (3.48). We proceed exactly as in the simple Gaussian case and write down the prior and likelihood to get the posterior:

p(w | t, Φ, β, m_0, S_0) ∝ N(w | m_0, S_0) N(t | Φ w, β^{−1} I_N) .
We move to log space and collapse constants, as before:

ln p(w | t, Φ, β, m_0, S_0) = const − (1/2) [ (w − m_0)^T S_0^{−1} (w − m_0) + β (t − Φ w)^T (t − Φ w) ]

                            = const − (1/2) [ w^T S_0^{−1} w − 2 w^T S_0^{−1} m_0 − 2β w^T Φ^T t + β w^T Φ^T Φ w ] .

Collect the quadratic and linear terms:

ln p(w | t, Φ, β, m_0, S_0) = const − (1/2) [ w^T (S_0^{−1} + β Φ^T Φ) w − 2 w^T (S_0^{−1} m_0 + β Φ^T t) ] .

Complete the square:

ln p(w | t, Φ, β, m_0, S_0) = const − (1/2) (w − m_N)^T S_N^{−1} (w − m_N)

S_N = (S_0^{−1} + β Φ^T Φ)^{−1}

m_N = S_N (S_0^{−1} m_0 + β Φ^T t) .
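A short numpy sketch of these updates, using a hypothetical polynomial basis and synthetic data (none of it from the notes):

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic 1-D regression data.
N = 30
x = rng.uniform(-1, 1, size=N)
beta = 25.0                                   # known noise precision
t = 0.5 - 1.5 * x + 2.0 * x**2 + rng.normal(scale=beta**-0.5, size=N)

# Polynomial basis functions; Phi has N rows and J columns.
def make_phi(x, degree=2):
    return np.vander(x, degree + 1, increasing=True)

Phi = make_phi(x)
J = Phi.shape[1]

# Prior on the weights.
m0 = np.zeros(J)
S0_inv = np.linalg.inv(5.0 * np.eye(J))

# Posterior from completing the square:
#   S_N = (S0^{-1} + beta Phi^T Phi)^{-1},  m_N = S_N (S0^{-1} m0 + beta Phi^T t)
S_N = np.linalg.inv(S0_inv + beta * Phi.T @ Phi)
m_N = S_N @ (S0_inv @ m0 + beta * Phi.T @ t)

print("posterior mean of w:", m_N)   # should be near [0.5, -1.5, 2.0]
```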
So, it turns out that with Gaussian noise we have a Gaussian posterior on the weights. Now, think about what these parameters would look like if we made the prior very weak and zero mean. A very weak independent prior would mean that S_0 was zero off of the diagonal and had large positive values on the diagonal, i.e., large variances and large a priori uncertainty about w. When this matrix is inverted, S_0^{−1} will have zeros off the diagonal and values that are nearly zero on the diagonal. That means that

S_N ≈ β^{−1} (Φ^T Φ)^{−1}

and S_0^{−1} m_0 will be the zero vector. So now the posterior mean will be

m_N ≈ β^{−1} (Φ^T Φ)^{−1} β Φ^T t = (Φ^T Φ)^{−1} Φ^T t ,

where the first factor is just S_N, and which we recognize as both the ordinary least squares and maximum likelihood estimates for w. Bishop 3.3 has some very nice figures showing this posterior for simple data. Note that all through this we have assumed that β is known. It is of course also possible to infer β using an appropriate prior.
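One can check this limit numerically; the sketch below (a hypothetical example, not from the notes) uses a very broad zero-mean prior and confirms that the posterior mean essentially reproduces the ordinary least squares solution:

```python
import numpy as np

rng = np.random.default_rng(3)
N, J, beta = 50, 3, 25.0
Phi = rng.normal(size=(N, J))                        # hypothetical design matrix
t = Phi @ np.array([0.5, -1.5, 2.0]) + rng.normal(scale=beta**-0.5, size=N)

# Very weak zero-mean prior: huge variances, so S0^{-1} is nearly zero.
S0_inv = np.linalg.inv(1e8 * np.eye(J))

S_N = np.linalg.inv(S0_inv + beta * Phi.T @ Phi)
m_N = S_N @ (beta * Phi.T @ t)                       # the S0^{-1} m0 term is zero for m0 = 0

w_ols = np.linalg.solve(Phi.T @ Phi, Phi.T @ t)      # ordinary least squares / maximum likelihood

print(np.allclose(m_N, w_ols, atol=1e-4))            # True: the weak prior recovers OLS
```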

Bayesian Linear Regression Posterior Predictive


We can also compute the posterior predictive in this case. Recall that the posterior predictive is what the model predicts about new data, integrating out the parameters. In this case, that means making a prediction of a new output t at a new input location x, taking into account all possible values of w:

p(t | x, {x_n, t_n}_{n=1}^N, m_0, S_0, β) = ∫ p(t | x, w, β) p(w | {x_n, t_n}_{n=1}^N, m_0, S_0, β) dw

                                          = ∫ N(t | φ(x)^T w, β^{−1}) N(w | m_N, S_N) dw ,

where the first factor in the integrand is the predictive distribution given w and the second is the posterior.

There are different ways to do this integral, including the kind of brute-force algebra we've been using. Personally, I like to think it through using some basic properties of the Gaussian distribution. In particular, if you have a Gaussian random variable z ~ N(μ, Σ), and you apply the linear transformation y = Az + b, the resulting distribution on y is also Gaussian with a simple form: y ~ N(Aμ + b, A Σ A^T). This is just saying: if I have a Gaussian random variable and I perform a linear transformation of it, what is the resulting distribution? That is relevant here because this is exactly what the integral is computing: draw a random w from the posterior and then linearly transform it with φ(x). There's just one more piece of information we need: when we add two independent Gaussian random variables of the same dimension, the covariance of their sum is the sum of their covariance matrices. With these two pieces of knowledge, we see that:

p(t | x, {x_n, t_n}_{n=1}^N, m_0, S_0, β) = N(t | φ(x)^T m_N, φ(x)^T S_N φ(x) + β^{−1}) .
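A short sketch of evaluating this predictive distribution at new inputs, assuming a polynomial basis like the earlier regression sketch (the specific basis and numbers are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(4)

# Training data and polynomial basis (hypothetical example).
N, beta = 30, 25.0
x = rng.uniform(-1, 1, size=N)
t = 0.5 - 1.5 * x + 2.0 * x**2 + rng.normal(scale=beta**-0.5, size=N)

def phi(x):
    """Polynomial features [1, x, x^2]; rows are phi(x_n)^T."""
    return np.vander(np.atleast_1d(x), 3, increasing=True)

Phi = phi(x)

# Posterior on w (same update as before).
m0, S0_inv = np.zeros(3), np.linalg.inv(5.0 * np.eye(3))
S_N = np.linalg.inv(S0_inv + beta * Phi.T @ Phi)
m_N = S_N @ (S0_inv @ m0 + beta * Phi.T @ t)

# Posterior predictive at new inputs:
#   mean phi(x)^T m_N, variance phi(x)^T S_N phi(x) + 1/beta.
x_star = np.linspace(-1, 1, 5)
Phi_star = phi(x_star)
pred_mean = Phi_star @ m_N
pred_var = np.einsum("ij,jk,ik->i", Phi_star, S_N, Phi_star) + 1.0 / beta

for xs, m, v in zip(x_star, pred_mean, pred_var):
    print(f"x = {xs:+.2f}: predictive mean {m:+.3f}, std {np.sqrt(v):.3f}")
```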
So the predictive distribution is nice and Gaussian. Bishop 3.3.2 has some nice figures of what this
looks like with polynomial basis functions.
