Formula Sheet

01 - Probability, Statistics, Data Analysis

Term and Symbol Notes Equation

Sample Mean, x̄ (the average): x̄ = (x_1 + x_2 + ⋯ + x_n) / n

Sample Range: range = x_max − x_min

Sample Variance, s²: s² = (1/(n−1)) ∑_{i=1}^{n} (x_i − x̄)²

Standard Deviation, s (square root of variance): s = √(s²)
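The four statistics above can be computed in a few lines of Python; this sketch (the function name is illustrative) uses the n − 1 divisor from the variance formula:

```python
import math

def sample_stats(xs):
    """Return (mean, range, variance, std dev) for a list of observations."""
    n = len(xs)
    mean = sum(xs) / n                                  # x-bar
    rng = max(xs) - min(xs)                             # x_max - x_min
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)    # s^2 with n-1 divisor
    return mean, rng, var, math.sqrt(var)               # s = sqrt(s^2)

print(sample_stats([2.0, 4.0, 4.0, 6.0]))  # mean 4.0, range 4.0
```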

02 - Probability

Term and Symbol Notes Equation

Sample Space, 𝑺 Set of all possible outcomes, consists of elements, can be described by rules
or statements.

Event, 𝑨 Subset of a sample space, usually the part we’re interested in

Complement, 𝑨′ The ‘inverse’ of the event. Consists of all elements not in A

Intersection, A ∩ B: All elements belonging to both A and B. Events A and B are mutually exclusive when their intersection is empty (A ∩ B = ∅).

Union, 𝑨 ∪ 𝑩 Represents all elements belonging to A or B or both

Multiplication Rule: For two separate operations with n_1 and n_2 outcomes respectively, the total number of outcomes is n_1 n_2.

Permutations (arrangements of objects):
Permutations of n objects: n!
Permutations of n objects in a circle: (n − 1)!
Permutations of n distinct objects, taking r at a time: nPr = n! / (n − r)!
Distinct permutations of n things where n_1 are one kind and n_2 are another: n! / (n_1! n_2!)

Partitioning: Ways to partition n objects into r cells with n_1 elements in the first, n_2 in the second, etc.:
(n; n_1, n_2, …, n_r) = n! / (n_1! n_2! ⋯ n_r!)

Combinations: Number of combinations (selections) of n distinct objects taken r at a time:
nCr = (n choose r) = n! / (r! (n − r)!)
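Python's standard math module implements these counting formulas directly (math.perm and math.comb require Python 3.8+); a quick check of the values above, with an illustrative multiset example:

```python
import math

n, r = 5, 2
print(math.factorial(n))        # permutations of n objects: n! = 120
print(math.factorial(n - 1))    # circular permutations: (n-1)! = 24
print(math.perm(n, r))          # nPr = n!/(n-r)! = 20
print(math.comb(n, r))          # nCr = n!/(r!(n-r)!) = 10

# Distinct permutations of a multiset: n!/(n1! n2! ...)
counts = [1, 4, 4, 2]           # letter counts M, I, S, P in "MISSISSIPPI"
multiset = math.factorial(sum(counts))
for c in counts:
    multiset //= math.factorial(c)
print(multiset)                 # 34650
```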

Probability of Mutually Exclusive Events: The probability of the union of mutually exclusive events is the sum of their individual probabilities:
P(A_1 ∪ A_2 ∪ A_3 ∪ …) = P(A_1) + P(A_2) + P(A_3) + ⋯

Additive Rules: If A and B are any two events (not necessarily mutually exclusive):
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
If A and B are mutually exclusive:
P(A ∪ B) = P(A) + P(B)

Conditional Probability: The conditional probability of B, given A:
P(B|A) = P(A ∩ B) / P(A)

Independence: Two events A and B are independent if and only if P(A ∩ B) = P(A)P(B).

Multiplicative Rules: If both events A and B can occur:
P(A ∩ B) = P(A)P(B|A) = P(B)P(A|B)
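For a finite sample space, these rules can be verified by direct enumeration; a sketch using two fair dice (the events A and B are arbitrary examples, not from the sheet):

```python
from itertools import product

space = list(product(range(1, 7), repeat=2))   # two fair dice: 36 outcomes
A = {s for s in space if s[0] == 6}            # first die shows 6
B = {s for s in space if sum(s) >= 10}         # total is at least 10

def P(event):
    return len(event) / len(space)

p_b_given_a = P(A & B) / P(A)                  # P(B|A) = P(A ∩ B)/P(A)
print(round(p_b_given_a, 3))                   # 0.5
print(abs(P(A & B) - P(A) * P(B)) < 1e-12)     # independent? False here
```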

03 - Random Variables and Probability Distribution

Term and Symbol Notes Equation

Discrete Probability (Mass) Function: All probabilities are nonnegative and sum to 1:
f(x) ≥ 0,  ∑_x f(x) = 1

Cumulative Distribution Function (discrete): Accumulates the probabilities of every value up to and including x:
F(x) = P(X ≤ x) = ∑_{t ≤ x} f(t)

Continuous Probability Density Function: Probability between two values:
P(a < X < b) = ∫_a^b f(x) dx
Cumulative probability up to a value:
F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(t) dt
Probability after a value:
P(X ≥ a) = ∫_a^{∞} f(x) dx

Joint Probability Distribution / Mass Function: For discrete random variables:
f(x, y) ≥ 0,  ∑_x ∑_y f(x, y) = 1,  P(X = x, Y = y) = f(x, y)

Joint Density Function: For continuous random variables:
f(x, y) ≥ 0,  ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} f(x, y) dx dy = 1,  P[(X, Y) ∈ A] = ∬_A f(x, y) dx dy

Marginal Distributions: The distribution of X or Y alone.
Discrete: g(x) = ∑_y f(x, y)
Continuous: g(x) = ∫_{−∞}^{+∞} f(x, y) dy

Conditional Distribution: The conditional distribution of the random variable Y given X = x:
f(y|x) = f(x, y) / g(x)

Statistical Independence: Two random variables are statistically independent if and only if f(x, y) = g(x)h(y).
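A discrete joint pmf and its marginals can be tabulated directly; a minimal sketch with a made-up f(x, y):

```python
# Joint pmf f(x, y) stored as a dict; marginals and a conditional distribution.
f = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}
assert abs(sum(f.values()) - 1.0) < 1e-12       # probabilities sum to 1

def g(x):                                        # marginal of X: sum over y
    return sum(p for (xi, yi), p in f.items() if xi == x)

def h(y):                                        # marginal of Y: sum over x
    return sum(p for (xi, yi), p in f.items() if yi == y)

def cond_y_given_x(y, x):                        # f(y|x) = f(x, y)/g(x)
    return f[(x, y)] / g(x)

print(round(g(0), 3))                            # 0.3
print(round(cond_y_given_x(1, 0), 3))            # 0.667
# independence check: does f(x, y) = g(x)h(y) hold for every pair?
print(all(abs(f[(x, y)] - g(x) * h(y)) < 1e-12 for (x, y) in f))  # False
```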

04 – Mathematical Expectations

Term and Symbol Notes Equation

Mathematical Expectation: The mean or expected value of a random variable X with probability distribution f(x).
Discrete: μ_X = E(X) = ∑_x x·f(x) = x_1 f(x_1) + x_2 f(x_2) + ⋯
Continuous: μ_X = E(X) = ∫_{−∞}^{+∞} x·f(x) dx

The mean or expected value of a random variable g(X):
Discrete: μ_{g(X)} = E[g(X)] = ∑_x g(x)·f(x)
Continuous: μ_{g(X)} = E[g(X)] = ∫_{−∞}^{+∞} g(x)·f(x) dx

Joint Distribution Mathematical Expectation: The mean or expected value of a random variable g(X, Y):
Discrete: μ_{g(X,Y)} = E[g(X, Y)] = ∑_x ∑_y g(x, y)·f(x, y)
Continuous: μ_{g(X,Y)} = E[g(X, Y)] = ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} g(x, y) f(x, y) dx dy

Variance of a Random Variable: The variance of X, or of the probability distribution of X:
Discrete: σ_X² = E[(X − μ_X)²] = ∑_x (x − μ_X)² f(x)
Continuous: σ_X² = E[(X − μ_X)²] = ∫_{−∞}^{+∞} (x − μ_X)² f(x) dx

Alternate formula using terms from above:
σ_X² = E(X²) − μ_X²,  where E(X²) = ∫_{−∞}^{+∞} x²·f(x) dx (or the corresponding sum)

Variance of the random variable g(X):
Discrete: σ_{g(X)}² = E[(g(X) − μ_{g(X)})²] = ∑_x (g(x) − μ_{g(X)})² f(x)
Continuous: σ_{g(X)}² = E[(g(X) − μ_{g(X)})²] = ∫_{−∞}^{+∞} (g(x) − μ_{g(X)})² f(x) dx

Covariance: The covariance of random variables X, Y with joint probability distribution f(x, y):
Discrete: σ_XY = E[(X − μ_X)(Y − μ_Y)] = ∑_x ∑_y (x − μ_X)(y − μ_Y) f(x, y)
Continuous: σ_XY = E[(X − μ_X)(Y − μ_Y)] = ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} (x − μ_X)(y − μ_Y) f(x, y) dx dy
Alternate formula: σ_XY = E(XY) − μ_X μ_Y

Correlation Coefficient: The correlation coefficient of X and Y shows the strength of their relationship:
ρ_XY = σ_XY / (σ_X σ_Y)

Linear Combinations of Random Variables: Assuming constants a, b:
E(aX + b) = a·E(X) + b
σ_{aX+b}² = a² σ_X²
If X, Y are random variables with joint probability function f(x, y):
σ_{aX+bY}² = a² σ_X² + b² σ_Y² + 2ab σ_XY

Expected value of the sum or difference of two (or more) functions of a random variable X:
E[g(X) ± h(X)] = E[g(X)] ± E[h(X)]

Expected value of the sum or difference of two (or more) functions of random variables X, Y:
E[g(X, Y) ± h(X, Y)] = E[g(X, Y)] ± E[h(X, Y)]
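For a discrete joint distribution, the expectation, variance, covariance, and correlation formulas reduce to sums over the joint table; a sketch with an illustrative distribution:

```python
import math

# Illustrative discrete joint distribution over (x, y) pairs.
f = {(1, 1): 0.4, (1, 2): 0.1, (2, 1): 0.1, (2, 2): 0.4}

def E(g):
    """E[g(X, Y)] = sum over all (x, y) of g(x, y) * f(x, y)."""
    return sum(g(x, y) * p for (x, y), p in f.items())

mu_x, mu_y = E(lambda x, y: x), E(lambda x, y: y)
var_x = E(lambda x, y: (x - mu_x) ** 2)
var_y = E(lambda x, y: (y - mu_y) ** 2)
cov = E(lambda x, y: x * y) - mu_x * mu_y        # σ_XY = E(XY) − μ_X μ_Y
rho = cov / math.sqrt(var_x * var_y)             # correlation coefficient
print(round(mu_x, 3), round(var_x, 3), round(cov, 3), round(rho, 3))
# 1.5 0.25 0.15 0.6
```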
05 – Common Discrete Probability Distributions

Term and Symbol Notes Equation

Discrete Uniform Distribution: Used when a random variable X takes k values with equal probabilities:
f(x; k) = 1/k,  x = x_1, x_2, …, x_k

Mean: μ_X = (1/k) ∑_{i=1}^{k} x_i

Variance: σ_X² = (1/k) ∑_{i=1}^{k} (x_i − μ_X)²

Binomial Distribution: Used for a binomial random variable X (number of successes in n independent trials) with probability of success p and failure q = 1 − p:
P(X = x) = f(x) = b(x; n, p) = (n choose x) p^x q^(n−x) = nCx p^x q^(n−x)

Mean: μ_X = np
Variance: σ_X² = npq

When using the tables, n is the number of trials and r is x (the number of successes). The table values are cumulative, so subtraction is necessary to get the probability at one point.
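The binomial pmf and its cumulative (table-style) form can be sketched with math.comb; the subtraction trick from the note recovers a point probability from cumulative values:

```python
import math

def binom_pmf(x, n, p):
    """b(x; n, p) = C(n, x) p^x (1 - p)^(n - x)"""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def binom_cdf(r, n, p):
    """Cumulative P(X <= r), like the binomial tables."""
    return sum(binom_pmf(x, n, p) for x in range(r + 1))

n, p = 10, 0.5
print(binom_pmf(5, n, p))                        # 252/1024 ≈ 0.2461
print(binom_cdf(5, n, p) - binom_cdf(4, n, p))   # same value, via subtraction
print(n * p, n * p * (1 - p))                    # mean np = 5.0, variance npq = 2.5
```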

Multinomial Distribution: Used when a given trial can result in k outcomes with probabilities p_1, p_2, …, p_k, over n independent trials:
f(x_1, x_2, …, x_k; p_1, p_2, …, p_k, n) = (n; x_1, x_2, …, x_k) p_1^{x_1} p_2^{x_2} ⋯ p_k^{x_k}
= [n! / (x_1! x_2! ⋯ x_k!)] p_1^{x_1} p_2^{x_2} ⋯ p_k^{x_k}
with ∑_{i=1}^{k} x_i = n and ∑_{i=1}^{k} p_i = 1

Hypergeometric Distribution: Used for the hypergeometric random variable X (number of successes) in a random sample of size n taken from N items, where k are successes and N − k are failures:
h(x; N, n, k) = (k choose x)(N−k choose n−x) / (N choose n) = [kCx · (N−k)C(n−x)] / NCn

Mean: μ_X = nk/N
Variance: σ_X² = [(N − n)/(N − 1)] · n · (k/N) · (1 − k/N)

Multivariate Hypergeometric Distribution: Used when N items are partitioned into k cells A_1, A_2, …, A_k with a_1, a_2, …, a_k elements each. The probability distribution of X_1, X_2, …, X_k, representing the number of elements selected from A_1, A_2, …, A_k in a sample of size n, is:
f(x_1, x_2, …, x_k; a_1, a_2, …, a_k, N, n) = (a_1 choose x_1)(a_2 choose x_2) ⋯ (a_k choose x_k) / (N choose n)
with ∑_{i=1}^{k} x_i = n and ∑_{i=1}^{k} a_i = N

Geometric Distribution: Used for a sequence of trials with probability of success p, where X is the number of trials up to and including the first success:
p(x) = P(X = x) = p(1 − p)^{x−1} for x = 1, 2, …; 0 otherwise

Mean: μ_X = 1/p
Variance: σ_X² = (1 − p)/p²

Poisson Distribution: Used for a Poisson random variable X (number of outcomes) occurring in a time interval or region t, with an average of λ outcomes per unit time:
p(x; λt) = e^{−λt} (λt)^x / x!,  x = 0, 1, 2, …

Mean, Variance: μ_X = σ_X² = λt

When using the tables, μ = λt and r is the number of outcomes (x). The table is cumulative, so subtraction is needed for the probability at an exact r value.
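A sketch of the Poisson pmf, again recovering a point probability from the cumulative (table-style) form as the note describes:

```python
import math

def poisson_pmf(x, lam_t):
    """p(x; λt) = e^(−λt) (λt)^x / x!"""
    return math.exp(-lam_t) * lam_t ** x / math.factorial(x)

lam_t = 4.0                                  # λt is both the mean and variance
print(poisson_pmf(2, lam_t))                 # 8 e^(−4) ≈ 0.1465

# Cumulative form, so P(X = r) = F(r) − F(r − 1):
def F(r):
    return sum(poisson_pmf(x, lam_t) for x in range(r + 1))

print(abs((F(2) - F(1)) - poisson_pmf(2, lam_t)) < 1e-12)   # True
```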
06 – Common Continuous Probability Distributions

Term and Symbol Notes Equation

Continuous Uniform Distribution: Density function of the continuous uniform random variable X on the interval [A, B]. Solving will require integration across an interval:
f(x; A, B) = 1/(B − A) for A ≤ x ≤ B; 0 elsewhere

Mean: μ_X = (A + B)/2
Variance: σ_X² = (B − A)²/12

Normal Distribution: The density of the normal random variable X with mean μ and variance σ² is:
n(x; μ, σ) = [1/(σ√(2π))] e^{−(x−μ)²/(2σ²)}

Mean: μ = E(X) = ∫_{−∞}^{+∞} x f(x) dx
Variance: σ² = E[(X − μ)²] = ∫_{−∞}^{+∞} (x − μ)² f(x) dx

Standard Normal Distribution: To simplify calculations, we have a table for the standard normal curve. To make use of it, we must convert our values to a function of Z.

X to Z formula: Z = (X − μ)/σ

Interval conversion:
z_a = (a − μ)/σ,  z_b = (b − μ)/σ
P(a < X < b) = P(z_a < Z < z_b) = P(Z < z_b) − P(Z < z_a)
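In code, no table is needed: the standard normal CDF can be written with the error function, Φ(z) = ½(1 + erf(z/√2)). A sketch of the interval conversion above:

```python
import math

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_interval(a, b, mu, sigma):
    """P(a < X < b) = Φ(z_b) − Φ(z_a) after standardizing."""
    za, zb = (a - mu) / sigma, (b - mu) / sigma
    return phi(zb) - phi(za)

print(normal_interval(90, 110, 100, 10))   # within ±1σ of the mean, ≈ 0.6827
```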

Gamma Distribution: The continuous random variable X has a gamma distribution with parameters α and β if its density function is given by:
f(x; α, β) = x^{α−1} e^{−x/β} / (β^α Γ(α)) for x > 0; 0 elsewhere
Γ(n) = (n − 1)! for a positive integer n

Mean: μ = αβ
Variance: σ² = αβ²

Exponential Distribution: Special case of the gamma distribution when α = 1:
f(x; β) = (1/β) e^{−x/β} for x > 0; 0 elsewhere

Mean: μ = β
Variance: σ² = β²

08 - Fundamental Sampling Distributions and Data Descriptions

Term and Symbol Notes Equation

Sampling Distribution (SD) of Means / Central Limit Theorem: The SD of the sample mean X̄, the average of a size-n sample taken from a population with mean μ and standard deviation σ. With large n, we approximate a normal distribution:
Z = (X̄ − μ)/σ_X̄ = (X̄ − μ)/(σ/√n)

The mean μ_X̄ and variance σ²_X̄ of the sampling distribution of X̄:
μ_X̄ = μ,  σ²_X̄ = σ²/n
Sampling Distribution (SD) of the Difference Between Two Means: For two independent samples of sizes n_1, n_2 drawn randomly from two populations with means μ_1, μ_2 and standard deviations σ_1, σ_2 respectively, the SD of the difference of the means X̄_1 − X̄_2 is approximately normally distributed:
μ_{X̄_1−X̄_2} = μ_{X̄_1} − μ_{X̄_2} = μ_1 − μ_2
σ²_{X̄_1−X̄_2} = σ²_{X̄_1} + σ²_{X̄_2} = σ_1²/n_1 + σ_2²/n_2
Z = [(X̄_1 − X̄_2) − μ_{X̄_1−X̄_2}] / σ_{X̄_1−X̄_2} = [(X̄_1 − X̄_2) − (μ_1 − μ_2)] / √(σ_1²/n_1 + σ_2²/n_2)

Chi-Squared Distribution: Special case of the gamma distribution where α = v/2 and β = 2, where v is a positive integer/parameter called the degrees of freedom (df):
f(x; v) = x^{v/2−1} e^{−x/2} / (2^{v/2} Γ(v/2)) for x > 0; 0 elsewhere

Mean: μ = v
Variance: σ² = 2v

Sampling Distribution (SD) of Variances: The sample variance S² of a size-n sample from a normal population with variance σ² has distribution χ², a chi-squared density function with v = n − 1:
χ² = (n − 1)S²/σ²

When using the tables, v is the degrees of freedom, α is the area under the curve measured from the right, and the table values are the χ² scores.
Applications of the Central Limit Theorem: The central limit theorem should only be applied to random samples drawn from a population with finite mean and variance, and with a sample size greater than 30.
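A quick simulation illustrates the theorem: sample means drawn from a non-normal (uniform) population cluster around μ with variance close to σ²/n. The population and sizes here are illustrative:

```python
import random
import statistics

random.seed(0)
# Population: uniform on [0, 1); μ = 0.5, σ² = 1/12.
n, reps = 36, 2000
means = [statistics.fmean(random.random() for _ in range(n)) for _ in range(reps)]

print(statistics.fmean(means))       # close to μ = 0.5
print(statistics.pvariance(means))   # close to σ²/n = (1/12)/36 ≈ 0.00231
```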
Student's t-Distribution: Used in situations where the population is normal and σ is not known. Follows the same format as the CLT, with a df term (df = v = n − 1):
T = (X̄ − μ)/(S/√n)

Alternate formula (V = χ²): T = Z / √(V/(n − 1))

When using the tables, df = v = n − 1, p is the area under the curve measured from the right, and the table values are the t-scores.

Fisher-Snedecor F-Distribution: Used to compare two variances using the ratio between two chi-squared variables U, V and their dfs:
F = (U/v_1) / (V/v_2)

Alternate formula using sample variances S_1², S_2² taken from samples of sizes n_1, n_2 with population variances σ_1², σ_2². This has an F-distribution with v_1 = n_1 − 1 and v_2 = n_2 − 1:
F = (S_1²/σ_1²) / (S_2²/σ_2²)

When using the tables, v_1 and v_2 are the df values, and the table values are the f-scores. The tables are separated by area (α), so find the table that corresponds to the f-score given the df values.
09 - One & Two-Sample Estimation Problems

Term and Symbol Notes Equation

Estimating the Mean of a Single Sample: The 100(1 − α)% confidence interval (CI) for the true mean μ, given a size-n sample with mean x̄ and known variance σ²:
x̄ − z_{α/2} σ/√n < μ < x̄ + z_{α/2} σ/√n

Commonly used z_{α/2} values:
90% CI: z_{0.05} = 1.645
95% CI: z_{0.025} = 1.96
99% CI: z_{0.005} = 2.58

We are 100(1 − α)% confident the error e = |x̄ − μ| will not exceed z_{α/2} σ/√n.
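A sketch of the known-σ interval and the matching sample-size formula (the numbers are illustrative):

```python
import math

def mean_ci_known_sigma(xbar, sigma, n, z_half):
    """100(1 − α)% CI for μ: x̄ ± z_{α/2} σ/√n."""
    e = z_half * sigma / math.sqrt(n)
    return xbar - e, xbar + e

lo, hi = mean_ci_known_sigma(xbar=50.0, sigma=4.0, n=64, z_half=1.96)
print(round(lo, 2), round(hi, 2))         # 49.02 50.98 (95% CI)

# Sample size for a desired error e: n = (z_{α/2} σ / e)², rounded up.
e = 0.5
print(math.ceil((1.96 * 4.0 / e) ** 2))   # 246
```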

Getting the number of samples n for a desired error e (round up):
n = (z_{α/2} σ / e)²

The 100(1 − α)% CI for the true mean μ given a size n < 30 sample with mean x̄ and unknown variance. The t-score is used:
x̄ − t_{α/2} s/√n < μ < x̄ + t_{α/2} s/√n,  df = v = n − 1

The 100(1 − α)% CI for the true mean μ given a size n ≥ 30 sample with mean x̄ and unknown population variance:
x̄ − z_{α/2} s/√n < μ < x̄ + z_{α/2} s/√n
Estimating the Difference of Two Means: The 100(1 − α)% CI for the difference between means μ_1 − μ_2, given samples of sizes n_1, n_2 with means x̄_1, x̄_2 and known population variances σ_1², σ_2²:
(x̄_1 − x̄_2) − z_{α/2} √(σ_1²/n_1 + σ_2²/n_2) < μ_1 − μ_2 < (x̄_1 − x̄_2) + z_{α/2} √(σ_1²/n_1 + σ_2²/n_2)

The 100(1 − α)% CI for μ_1 − μ_2 given samples of sizes n_1, n_2 with means x̄_1, x̄_2 and unknown but equal population variances. The t-score is used; s_p is the pooled standard deviation, taken from the sample standard deviations s_1, s_2:
(x̄_1 − x̄_2) − t_{α/2} s_p √(1/n_1 + 1/n_2) < μ_1 − μ_2 < (x̄_1 − x̄_2) + t_{α/2} s_p √(1/n_1 + 1/n_2)
s_p² = [(n_1 − 1)s_1² + (n_2 − 1)s_2²] / (n_1 + n_2 − 2)
df = v = n_1 + n_2 − 2

The 100(1 − α)% CI for μ_1 − μ_2 given samples of sizes n_1, n_2 with means x̄_1, x̄_2 and unknown, unequal population variances. The t-score is used; df is rounded down:
(x̄_1 − x̄_2) − t_{α/2} √(s_1²/n_1 + s_2²/n_2) < μ_1 − μ_2 < (x̄_1 − x̄_2) + t_{α/2} √(s_1²/n_1 + s_2²/n_2)
df = v = (s_1²/n_1 + s_2²/n_2)² / [ (s_1²/n_1)²/(n_1 − 1) + (s_2²/n_2)²/(n_2 − 1) ]

Two samples that are not independent (for example, repeated measurements on the same subjects) are treated as paired observations.

The 100(1 − α)% CI for the n-pair observation difference μ_D = μ_1 − μ_2, with mean difference d̄ and standard deviation s_d:
d̄ − t_{α/2} s_d/√n < μ_D < d̄ + t_{α/2} s_d/√n,  df = n − 1
Estimating a Population Proportion (Single Sample): The 100(1 − α)% CI for the binomial parameter p, where p̂ is the proportion of successes in an n-size sample and q̂ = 1 − p̂:
p̂ − z_{α/2} √(p̂q̂/n) < p < p̂ + z_{α/2} √(p̂q̂/n)

We are 100(1 − α)% confident the error e will not exceed z_{α/2} √(p̂q̂/n).

Getting the number of samples n for a desired error e (round up):
n = z_{α/2}² p̂q̂ / e²
Estimating Variance (Single Sample): The 100(1 − α)% CI for the true variance σ², given an n-sized sample with variance s². Uses chi-squared:
(n − 1)s² / χ²_{α/2} < σ² < (n − 1)s² / χ²_{1−α/2},  df = n − 1

When solving, keep in mind that the chi-squared values are non-symmetric and that the areas are measured from the right.

Estimating the Ratio of Two Variances: The 100(1 − α)% CI for the ratio of two variances σ_1²/σ_2², given variances s_1², s_2² taken from samples of sizes n_1, n_2. Uses the f-score:
(s_1²/s_2²) · 1/f_{α/2}(df_1, df_2) < σ_1²/σ_2² < (s_1²/s_2²) · f_{α/2}(df_2, df_1)
df_1 = n_1 − 1,  df_2 = n_2 − 1
10 - One & Two-Sample Tests of Hypothesis

Term and Symbol Notes Equation

Hypothesis and Significance Tests: To solve any problem, break it down into a few steps.
1. State the null (H_0) and alternative (H_1) hypotheses.
2. Determine whether a one- or two-tailed test is most suitable.
3. Calculate the sample statistic.
4. Reject, or fail to reject, the null hypothesis.

Significance Test Results: There are two types of error in significance tests. Type I error (α) occurs when H_0 is true and is rejected. Type II error (β) occurs when H_0 is false and is not rejected.
Hypothesis Testing on the Population Mean (Cases 1, 2): Used when σ is known, or σ is unknown with n ≥ 30, at significance level α.

One-Tailed Hypothesis and Rejection Region:
H_0: μ = μ_0
H_1: μ > μ_0 or μ < μ_0
Reject when Z > z_α or Z < −z_α

Two-Tailed Hypothesis and Rejection Region:
H_0: μ = μ_0
H_1: μ ≠ μ_0
Reject when |Z| > z_{α/2}

Test Statistic (σ known vs unknown):
Z = (X̄ − μ_0)/(σ/√n)  or  Z = (X̄ − μ_0)/(s/√n)

When testing a hypothesis with known σ² and significance α against a specific alternative μ = μ_0 + δ, the power of the test is 1 − β. For defined α, β we can solve for n. Note that the notation z_β is mathematically z_{1−β} = z_power.

One-Tailed n (round up): n = (z_α + z_β)² σ² / δ²
Two-Tailed n (round up): n ≈ (z_{α/2} + z_β)² σ² / δ²
Hypothesis Testing on the Population Mean (Case 3): Used when σ is unknown with n < 30, at significance level α. We use a T-test with df = n − 1.

One-Tailed Hypothesis and Rejection Region:
H_0: μ = μ_0
H_1: μ > μ_0 or μ < μ_0
Reject when T > t_α or T < −t_α

Two-Tailed Hypothesis and Rejection Region:
H_0: μ = μ_0
H_1: μ ≠ μ_0
Reject when |T| > t_{α/2}

Test Statistic: T = (X̄ − μ_0)/(s/√n)
Hypothesis Testing on the Difference Between Two Means (Case 1): Used when σ_1, σ_2 are known, with two samples n_1, n_2 chosen independently. Significance level α.

One-Tailed: H_0: μ_1 − μ_2 = D_0; H_1: μ_1 − μ_2 > D_0 or μ_1 − μ_2 < D_0; reject when Z > z_α or Z < −z_α
Two-Tailed: H_0: μ_1 − μ_2 = D_0; H_1: μ_1 − μ_2 ≠ D_0; reject when |Z| > z_{α/2}

Test Statistic:
Z = [(X̄_1 − X̄_2) − D_0] / σ_{X̄_1−X̄_2} = [(X̄_1 − X̄_2) − D_0] / √(σ_1²/n_1 + σ_2²/n_2)

Hypothesis Testing on the Difference Between Two Means (Case 2): Used when σ_1, σ_2 are unknown, with two large (n ≥ 30) samples n_1, n_2 chosen independently. Significance level α.

One-Tailed: H_0: μ_1 − μ_2 = D_0; H_1: μ_1 − μ_2 > D_0 or μ_1 − μ_2 < D_0; reject when Z > z_α or Z < −z_α
Two-Tailed: H_0: μ_1 − μ_2 = D_0; H_1: μ_1 − μ_2 ≠ D_0; reject when |Z| > z_{α/2}

Test Statistic:
Z = [(X̄_1 − X̄_2) − D_0] / √(s_1²/n_1 + s_2²/n_2)

Hypothesis Testing on the Difference Between Two Means (Case 3): Used when σ_1, σ_2 are unknown but equal, with two samples n_1, n_2 chosen independently. Significance level α. T-score with df = n_1 + n_2 − 2.

One-Tailed: H_0: μ_1 − μ_2 = D_0; H_1: μ_1 − μ_2 > D_0 or μ_1 − μ_2 < D_0; reject when T > t_α or T < −t_α
Two-Tailed: H_0: μ_1 − μ_2 = D_0; H_1: μ_1 − μ_2 ≠ D_0; reject when |T| > t_{α/2}

Test Statistic:
T = [(X̄_1 − X̄_2) − D_0] / [s_p √(1/n_1 + 1/n_2)]
s_p² = [(n_1 − 1)s_1² + (n_2 − 1)s_2²] / (n_1 + n_2 − 2)

Hypothesis Testing on the Difference Between Two Means (Case 4): Used when σ_1, σ_2 are unknown and unequal, with two samples n_1, n_2 chosen independently. Significance level α.

One-Tailed: H_0: μ_1 − μ_2 = D_0; H_1: μ_1 − μ_2 > D_0 or μ_1 − μ_2 < D_0; reject when T > t_α or T < −t_α
Two-Tailed: H_0: μ_1 − μ_2 = D_0; H_1: μ_1 − μ_2 ≠ D_0; reject when |T| > t_{α/2}

Test Statistic (df rounds down!):
T = [(X̄_1 − X̄_2) − D_0] / √(s_1²/n_1 + s_2²/n_2)
df = v = (s_1²/n_1 + s_2²/n_2)² / [ (s_1²/n_1)²/(n_1 − 1) + (s_2²/n_2)²/(n_2 − 1) ]
Hypothesis Testing on the Difference Between Two Means (Case 5): Used with a sample of paired differences of size n with mean d̄. Significance level α.

One-Tailed: H_0: μ_1 − μ_2 = D_0; H_1: μ_1 − μ_2 > D_0 or μ_1 − μ_2 < D_0; reject when T > t_α or T < −t_α
Two-Tailed: H_0: μ_1 − μ_2 = D_0; H_1: μ_1 − μ_2 ≠ D_0; reject when |T| > t_{α/2}

Test Statistic:
T = (d̄ − D_0)/(σ_d/√n) ≈ (d̄ − D_0)/(s_d/√n),  df = n − 1

Hypothesis Testing on a Population Variance: Used for an n-sized sample selected randomly.

One-Tailed: H_0: σ² = σ_0²; H_1: σ² > σ_0² or σ² < σ_0²; reject when χ² > χ²_α or χ² < χ²_{1−α}
Two-Tailed: H_0: σ² = σ_0²; H_1: σ² ≠ σ_0²; reject when χ² > χ²_{α/2} or χ² < χ²_{1−α/2}

Test Statistic:
χ² = (n − 1)s²/σ_0²,  df = n − 1

Hypothesis Testing on the Ratio of Two Population Variances: Used for samples of sizes n_1, n_2 selected independently and randomly. Significance level α.

One-Tailed: H_0: σ_1²/σ_2² = 1; H_1: σ_1²/σ_2² > 1 or σ_1²/σ_2² < 1; reject when F > f_α(df_1, df_2)
Two-Tailed: H_0: σ_1²/σ_2² = 1; H_1: σ_1²/σ_2² ≠ 1; reject when F > f_{α/2}(df_1, df_2)

Test Statistic:
F = (larger sample variance)/(smaller sample variance) = s_1²/s_2² or s_2²/s_1²
df_1 = n_1 − 1,  df_2 = n_2 − 1
Goodness-of-Fit Test: Tests how good a fit there is between the observed frequencies o_i in a k-class sample and the expected frequencies e_i. Significance level α. Note that when an expected frequency is less than 5, collapse that class into an adjacent one.

Rejection Region: χ² > χ²_α

Test Statistic:
χ² = ∑_{i=1}^{k} (o_i − e_i)²/e_i,  df = k − 1
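The goodness-of-fit statistic is a one-liner; a sketch with illustrative die-roll counts (each expected count is 16 ≥ 5, so no classes need collapsing):

```python
# χ² goodness-of-fit: observed vs expected frequencies over k classes.
observed = [16, 20, 16, 12, 12, 20]   # illustrative counts from 96 die rolls
expected = [16.0] * 6                 # fair die: 96/6 expected per face

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1                # k − 1 = 5
print(chi2, df)                       # 4.0 5
```

The statistic is then compared against the χ²_α critical value for df = 5 from the tables.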
11 - Simple Linear Regression and Correlation

Term and Symbol Notes Equation

General Regression Model: For a response (dependent) variable Y and independent variable x. The model is given as a function of α and β, the intercept and slope parameters, plus a random error term ϵ:
Y = α + βx + ϵ,  E(ϵ) = 0,  Var(ϵ) = σ²
Least Squares Estimate: In order to estimate the regression model, we use the Sum of the Squares of the Errors (SSE):
SSE = ∑_{i=1}^{n} e_i² = ∑_{i=1}^{n} (y_i − ŷ_i)² = ∑_{i=1}^{n} (y_i − a − b x_i)²

By differentiating and minimizing the function, we get the following equations for the least squares estimates a and b:
b = ∑_{i=1}^{n} (x_i − x̄)(y_i − ȳ) / ∑_{i=1}^{n} (x_i − x̄)²
a = ȳ − b x̄

Shorthand Notations:
S_xx = ∑(x_i − x̄)² = ∑x_i² − (1/n)(∑x_i)²
S_yy = ∑(y_i − ȳ)² = ∑y_i² − (1/n)(∑y_i)²
S_xy = ∑(x_i − x̄)(y_i − ȳ) = ∑x_i y_i − (1/n)(∑x_i)(∑y_i)

Shorthand least squares estimates:
a = ȳ − b x̄,  b = S_xy/S_xx

Shorthand SSE:
SSE = ∑ e_i² = S_yy − b S_xy

When solving, it is recommended to solve for the sums of 𝑥𝑖 , 𝑦𝑖 , 𝑥𝑖2 , 𝑦𝑖2, and
𝑥𝑖 𝑦𝑖 beforehand to simplify the calculations of 𝑆𝑥𝑥 , 𝑆𝑥𝑦
Coefficient of Correlation: The coefficient of correlation r measures the strength of the linear relationship between two variables x, y in the sample:
r = S_xy / √(S_xx S_yy)
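The shorthand sums translate directly into code; a sketch computing a, b, and r from raw data (the data values and function name are illustrative):

```python
import math

def least_squares(xs, ys):
    """Return (a, b, r): intercept, slope, correlation via Sxx, Syy, Sxy."""
    n = len(xs)
    sxx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    syy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    b = sxy / sxx                            # slope b = Sxy/Sxx
    a = sum(ys) / n - b * sum(xs) / n        # intercept a = ȳ − b x̄
    r = sxy / math.sqrt(sxx * syy)           # correlation coefficient
    return a, b, r

a, b, r = least_squares([1, 2, 3, 4], [2.1, 4.0, 6.2, 7.9])
print(round(a, 3), round(b, 3), round(r, 4))   # ≈ 0.15 1.96 0.9989
```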
Total Corrected Sum of Squares (SST): Total variation of the data relative to the naïve model ŷ = ȳ:
SST = S_yy = ∑_{i=1}^{n} (y_i − ȳ)²

Total variation of the data relative to the least squares line:
SSE = ∑_{i=1}^{n} (y_i − ŷ_i)² = S_yy − S_xy²/S_xx

Coefficient of Determination: Describes how well the least-squares line fits the data; it is the proportional reduction in variation from using the least-squares model instead of the naïve one:
r² = (S_yy − SSE)/S_yy = (SST − SSE)/SST = S_xy²/(S_xx S_yy)

Values range from 0 to 1, where 1 means all points lie on the least-squares line and 0 means the least-squares line offers no information about Y.
Variance Estimation: An unbiased estimate of the common variance of the residuals, based on the sample, is:
s² = SSE/(n − 2) = (S_yy − b S_xy)/(n − 2)

Confidence Interval: The 100(1 − α)% CI for the mean response μ_{Y|x_0} at a given x-value x_0, where ŷ_0 is the least-squares prediction at x_0:
ŷ_0 − t_{α/2} s √(1/n + (x_0 − x̄)²/S_xx) < μ_{Y|x_0} < ŷ_0 + t_{α/2} s √(1/n + (x_0 − x̄)²/S_xx)
df = n − 2
