MLPR w0f - Machine Learning and Pattern Recognition
Examples:
P(x) = ⎧ p      x = 1,
       ⎨ 1 − p  x = 0,        (1)
       ⎩ 0      otherwise,
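For concreteness, here is a minimal Python sketch of this probability mass function; the
function name and the default value of p are my own choices, not part of the notes:

    def bernoulli_pmf(x, p=0.3):
        """P(x) for the distribution in (1): p if x == 1, 1 - p if x == 0, 0 otherwise."""
        if x == 1:
            return p
        if x == 0:
            return 1 - p
        return 0.0

    # The probabilities of the two possible outcomes sum to one:
    assert abs(bernoulli_pmf(1) + bernoulli_pmf(0) - 1.0) < 1e-12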
2 Expectations
An expectation is a property of a probability distribution, defined by a probability-weighted
sum. The expectation of some function, f , of an outcome, x , is:
𝔼_P(x)[f(x)] = ∑ᵢ pᵢ f(aᵢ).        (2)
Often the subscript P(x) is dropped from the notation because the reader knows under which
distribution the expectation is being taken. Notation can vary considerably, and details are
often dropped. You might also see E[f], 𝔼[f], or ⟨f⟩, which all mean the same thing.
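As an illustration (a sketch I have added, not part of the notes; the outcomes, probabilities,
and functions are made up), the probability-weighted sum in (2) can be computed directly
with NumPy:

    import numpy as np

    # A small made-up discrete distribution: outcomes a_i with probabilities p_i.
    a = np.array([0.0, 1.0, 2.0])   # possible outcomes a_i
    p = np.array([0.5, 0.3, 0.2])   # their probabilities p_i, summing to one

    def expectation(f, a, p):
        """E_P(x)[f(x)] = sum_i p_i f(a_i), a probability-weighted sum."""
        return np.sum(p * f(a))

    print(expectation(lambda x: x, a, p))     # E[x]   = 0.7
    print(expectation(lambda x: x**2, a, p))  # E[x^2] = 1.1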
The expectation is sometimes a useful representative value of a random function value. The
expectation of the identity function, f (x) = x , is the ‘mean’, which is one measure of the centre
of a distribution.
Expectations are linear, meaning that
𝔼[f(x) + g(x)] = 𝔼[f(x)] + 𝔼[g(x)]   and   𝔼[cf(x)] = c𝔼[f(x)]. (3)
These properties are apparent if you explicitly write out the summations.
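For instance, spelling out the first property as a sum over the outcomes (my own working of
the step the text leaves to the reader, written in LaTeX notation):

    \[
    \mathbb{E}[f(x) + g(x)]
      = \sum_i p_i \big( f(a_i) + g(a_i) \big)
      = \sum_i p_i f(a_i) + \sum_i p_i g(a_i)
      = \mathbb{E}[f(x)] + \mathbb{E}[g(x)].
    \]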
Similarly, the expectation of a constant c is just that constant, because the probabilities sum
to one:
𝔼[c] = c ∑ᵢ pᵢ = c.        (4)
3 The mean
The mean of a distribution over a number is simply the ‘expected’ value of the numerical
outcome.
‘Expected value’ = ‘mean’ = μ = 𝔼[x] = ∑ᵢ pᵢ aᵢ.        (6)
For example, the expected value of a roll of a fair six-sided die is
𝔼[x] = (1/6)×1 + (1/6)×2 + (1/6)×3 + (1/6)×4 + (1/6)×5 + (1/6)×6 = 3.5.        (7)
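A quick numerical sanity check (my own sketch; the sample size and random seed are arbitrary
choices): the average of many simulated fair-die rolls approaches this value.

    import numpy as np

    rng = np.random.default_rng(0)              # seeded generator, arbitrary choice
    rolls = rng.integers(1, 7, size=100_000)    # simulated rolls of a fair six-sided die
    print(rolls.mean())                         # close to 3.5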
In everyday language I wouldn’t say that I ‘expect’ to see 3.5 as the outcome of throwing a
die… I expect to see an integer! However, 3.5 is the ‘expected value’ as it is commonly
defined. Similarly, a single Bernoulli outcome will be a zero or a one, but its ‘expected’ value
is a fraction: 𝔼[x] = p×1 + (1−p)×0 = p.
Change of units: I might have a distribution over heights measured in metres, for which I
have computed the mean. If I multiply the heights by 100 to obtain heights in centimetres, the
mean in centimetres can be obtained by multiplying the mean in metres by 100. Formally:
𝔼[100 x] = 100 𝔼[x] .
4 The variance
The variance is also an expectation, measuring the average squared distance from the mean:

var[x] = 𝔼[(x − μ)²] = 𝔼[x²] − 2μ𝔼[x] + μ² = 𝔼[x²] − 𝔼[x]².

A numerical check of this identity follows the exercises below.
Exercise 3: show that var[cx] = c² var[x], for a constant c.
Exercise 4: show that var[x + y] = var[x] + var[y], for independent outcomes x and y.
Exercise 5: Given outcomes distributed with mean μ and variance σ², how could you shift
and scale them to have mean zero and variance one?
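A small numerical check of the two equivalent forms of the variance given above (my own
sketch, with an arbitrary sample):

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.normal(loc=2.0, scale=3.0, size=1_000_000)  # arbitrary samples, true variance 9

    mu = x.mean()
    print(np.mean((x - mu)**2))         # E[(x - mu)^2]
    print(np.mean(x**2) - mu**2)        # E[x^2] - E[x]^2: the same number
    print(x.var())                      # NumPy's variance, for comparison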
Change of units: If the outcome x is a height measured in metres, then x² has units of m²;
x² is an area. The variance also has units of m², so it cannot be represented on the same scale
as the outcome, because it has different units. If you multiply all heights by 100 to convert to
centimetres, the variance is multiplied by 100². Therefore, the relative size of the mean and
the variance depends on the units you use, and so often isn’t meaningful.
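A minimal NumPy sketch of these unit changes (my own illustration, with made-up heights); it
also checks the earlier claim that 𝔼[100x] = 100𝔼[x]:

    import numpy as np

    heights_m = np.array([1.55, 1.62, 1.70, 1.80, 1.93])  # made-up heights in metres
    heights_cm = 100 * heights_m                          # the same heights in centimetres

    print(heights_cm.mean(), 100 * heights_m.mean())      # means agree: scaled by 100
    print(heights_cm.var(), 100**2 * heights_m.var())     # variances agree: scaled by 100^2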
Standard deviation: The standard deviation σ , the square root of the variance, does have the
same units as the mean. The standard deviation is often used as a measure of the typical
distance from the mean. Often variances are used in intermediate calculations because they are
easier to deal with: it is variances that add (as in Exercise 4 above), not standard deviations.
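As a numerical illustration that it is variances, not standard deviations, that add (my own
sketch with arbitrary independent Gaussian samples; the proof is Exercise 4):

    import numpy as np

    rng = np.random.default_rng(2)
    x = rng.normal(0.0, 2.0, size=1_000_000)   # independent samples with variance 4
    y = rng.normal(0.0, 3.0, size=1_000_000)   # independent samples with variance 9

    print((x + y).var(), x.var() + y.var())    # both close to 4 + 9 = 13
    print((x + y).std(), x.std() + y.std())    # about 3.6 vs 5: std devs do not add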
Consider a drunkard taking a sequence of independent random steps x_1, x_2, …, x_N along an
alleyway, each step with mean zero and variance σ², so that his position after N steps is
k_N = ∑_n x_n. If the drunkard started in the centre of the alleyway, will he ever escape? If so,
roughly how long will it take? (If you don’t already know, have a think…)
The expected, or mean, position after N steps is 𝔼[k_N] = N𝔼[x_n] = 0. This doesn’t mean we
don’t think the drunkard will escape. There are ways of escaping both left and right; it’s just
‘on average’ that he’ll stay in the middle.
The variance of the drunkard’s position is var[k_N] = N var[x_n] = Nσ². The standard deviation
of the position is then std[k_N] = √N σ, which is a measure of the width of the distribution over
the displacement from the centre of the alleyway. If we double the length of the alley, then it
will typically take four times the number of random steps to escape.
Corollary: the typical magnitude of the mean of N independent zero-mean variables with
finite variance scales with 1/√N.
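A short simulation of both results (my own sketch; the unit ±1 steps, so that σ = 1, and the
number of simulated walkers are arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(3)
    n_walkers = 10_000                                        # number of simulated drunkards
    for N in [100, 400, 1600]:                                # numbers of steps
        steps = rng.choice([-1.0, 1.0], size=(n_walkers, N))  # unit steps: mean 0, variance 1
        k_N = steps.sum(axis=1)                               # each walker's position after N steps
        # std of the position grows like sqrt(N); std of the average step shrinks like 1/sqrt(N)
        print(N, k_N.std(), np.sqrt(N), steps.mean(axis=1).std(), 1 / np.sqrt(N))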
6 Solutions
As always, you are strongly recommended to work hard on a problem yourself before looking
at the solutions. As you transition into doing research, there won’t be any answers, and you
will have to build confidence in getting and checking your own answers.
Exercise 3:
var[cx] = 𝔼[(cx)²] − 𝔼[cx]² = 𝔼[c²x²] − (c𝔼[x])² = c²(𝔼[x²] − 𝔼[x]²) = c² var[x].
Exercise 4:
var[x + y] = 𝔼[(x + y)²] − 𝔼[x + y]² = 𝔼[x²] + 𝔼[y²] + 2𝔼[xy] − (𝔼[x]² + 𝔼[y]² + 2𝔼[x]𝔼[y]) = var[x] + var[y],
if 𝔼[xy] = 𝔼[x]𝔼[y], which is true if x and y are independent variables.
Exercise 5: z = (x − μ)/σ has mean 0 and variance 1. The division is by the standard
deviation, not the variance. You should now be able to prove this result for yourself.
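A quick check of this standardization (my own sketch, with arbitrary non-Gaussian samples):

    import numpy as np

    rng = np.random.default_rng(4)
    x = rng.gamma(shape=2.0, scale=1.5, size=1_000_000)  # arbitrary skewed samples

    z = (x - x.mean()) / x.std()   # shift by the mean, scale by the standard deviation
    print(z.mean(), z.var())       # close to 0 and 1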
What to remember: using the expectation notation where possible, rather than writing out the
summations or integrals explicitly, makes the mathematics concise.