
Summary

This document summarizes key formulas and concepts in probability and statistics. It includes:
- Addition and multiplication rules of probability, as well as the total probability, Bayes, and binomial formulas.
- Definitions and formulas for the mean, variance, and probability mass/density functions of discrete and continuous random variables, including the binomial, Poisson, hypergeometric, and normal distributions.
- Formulas and procedures for constructing confidence intervals for a population mean and a population proportion.
- Formulas and procedures for hypothesis testing of a population mean or proportion using z-tests and t-tests, for one sample and for two samples.

Summary of fundamental formulas and exercises

HCM University of Technology, VNU


Probability and Statistics
Faculty of Applied Science

1 Probability

1.1 Addition and product rules


Addition formula and multiplication formula

• $P(A + B) = P(A) + P(B) - P(AB)$, and

  $$P\left(\sum_{i=1}^{n} A_i\right) = \sum_{i} P(A_i) - \sum_{i<j} P(A_i A_j) + \sum_{i<j<k} P(A_i A_j A_k) - \dots$$

• $P(AB) = P(A)\,P(B \mid A)$, and

  $$P(A_1 A_2 \dots A_n) = P(A_1)\,P(A_2 \mid A_1)\,P(A_3 \mid A_1 A_2) \dots P(A_n \mid A_1 A_2 \dots A_{n-1})$$

Let $A_1, \dots, A_n$ be mutually exclusive and exhaustive events.

• Total probability formula: $P(F) = P(A_1)P(F \mid A_1) + P(A_2)P(F \mid A_2) + \dots + P(A_n)P(F \mid A_n)$.

• Bayes' formula: $P(A_k \mid F) = \dfrac{P(A_k)\,P(F \mid A_k)}{P(F)}$.
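As a small illustration of the total probability and Bayes formulas above, here is a minimal Python sketch. The function names and the machine/defect numbers are hypothetical and chosen only for the example.

```python
def total_probability(priors, likelihoods):
    """P(F) = sum over k of P(A_k) * P(F | A_k) for a partition A_1..A_n."""
    return sum(p * l for p, l in zip(priors, likelihoods))


def bayes_posteriors(priors, likelihoods):
    """P(A_k | F) = P(A_k) * P(F | A_k) / P(F), returned for every k."""
    p_f = total_probability(priors, likelihoods)
    return [p * l / p_f for p, l in zip(priors, likelihoods)]


# Hypothetical data: three machines make 50%, 30%, 20% of the output,
# with defect rates of 1%, 2%, 3% (F = "a randomly chosen item is defective").
priors = [0.5, 0.3, 0.2]
likelihoods = [0.01, 0.02, 0.03]

p_defect = total_probability(priors, likelihoods)
posteriors = bayes_posteriors(priors, likelihoods)
print(p_defect, posteriors)
```

The posteriors always sum to 1, which is a quick sanity check that the events really form a partition.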

1.2 Random variable (R.v):

• Discrete r.v. $X$: $\mu = E(X) = \sum_i x_i P(X = x_i)$, and $\sigma^2 = V(X) = \sum_i (x_i - \mu)^2 P(X = x_i) = \sum_i x_i^2 P(X = x_i) - \mu^2$.

• Continuous r.v. $X$: $\mu = E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$, and $V(X) = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx = \int_{-\infty}^{\infty} x^2 f(x)\,dx - \mu^2$.
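The discrete-case formulas can be checked numerically. The sketch below is only illustrative: the helper name `mean_and_variance` is an assumption, and a fair six-sided die is used as the example distribution.

```python
def mean_and_variance(values, probs):
    """Return (E(X), V(X)) using V(X) = sum x_i^2 P(X = x_i) - mu^2."""
    mu = sum(x * p for x, p in zip(values, probs))
    second_moment = sum(x * x * p for x, p in zip(values, probs))
    return mu, second_moment - mu ** 2


# Example distribution (assumed for illustration): a fair six-sided die.
values = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6

mu, var = mean_and_variance(values, probs)
print(mu, var)
```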

1.3 Some special distributions


Binomial r.v., $X \sim B(n, p)$: $P(X = k) = C_n^k\, p^k q^{n-k}$, $k = 0, 1, \dots, n$, where $q = 1 - p$, and $E(X) = np$, $V(X) = npq$.

Poisson r.v., $X \sim P(\lambda)$: $P(X = k) = \dfrac{e^{-\lambda} \lambda^k}{k!}$, $k = 0, 1, 2, \dots$, and $E(X) = V(X) = \lambda$.

Hypergeometric r.v., $X \sim H(N, K, n)$: $P(X = k) = \dfrac{C_K^k\, C_{N-K}^{n-k}}{C_N^n}$, and $E(X) = np$, $V(X) = np(1 - p)\dfrac{N - n}{N - 1}$, where $p = \dfrac{K}{N}$.

Normal r.v., $X \sim N(\mu, \sigma^2)$: $f(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$, and $E(X) = \mu$, $V(X) = \sigma^2$.
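The three discrete probability mass functions above translate directly into code. This is a minimal sketch using only the Python standard library; the function names and the parameter values in the check are assumptions made for illustration.

```python
from math import comb, exp, factorial


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(X = k) for X ~ P(lambda)."""
    return exp(-lam) * lam ** k / factorial(k)


def hypergeometric_pmf(k, N, K, n):
    """P(X = k) for X ~ H(N, K, n): K marked items among N, n drawn."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


# Numerical check of E(X) = np and V(X) = np(1 - p)(N - n)/(N - 1)
# for a hypothetical hypergeometric case N = 20, K = 7, n = 5.
N, K, n = 20, 7, 5
p = K / N
support = range(0, n + 1)
hyper_mean = sum(k * hypergeometric_pmf(k, N, K, n) for k in support)
hyper_var = sum(k * k * hypergeometric_pmf(k, N, K, n) for k in support) - hyper_mean ** 2
print(hyper_mean, n * p)
print(hyper_var, n * p * (1 - p) * (N - n) / (N - 1))
```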

Central limit theorem: If $X_1, \dots, X_n$ are $n$ independent random variables with the same distribution, mean $E(X_k) = \mu$, and finite variance $V(X_k) = \sigma^2$, and $\bar{X} = \dfrac{\sum_{k=1}^{n} X_k}{n}$, then when $n$ is large enough, $\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ approximately follows $N(0, 1)$.
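A typical use of the central limit theorem is to approximate a probability about the sample mean. The sketch below assumes a hypothetical population with $\mu = 50$, $\sigma = 10$ and a sample of size $n = 25$; the function name is illustrative, and `statistics.NormalDist` from the standard library supplies $\Phi$.

```python
from math import sqrt
from statistics import NormalDist


def approx_prob_mean_at_most(a, mu, sigma, n):
    """Approximate P(X_bar <= a) by standardizing X_bar and applying Phi."""
    z = (a - mu) / (sigma / sqrt(n))
    return NormalDist().cdf(z)


# Hypothetical numbers: here z = (52 - 50) / (10 / 5) = 1, so the result is Phi(1).
prob = approx_prob_mean_at_most(52, mu=50, sigma=10, n=25)
print(prob)
```

The approximation is only as good as the "n large enough" condition in the theorem; for strongly skewed populations a larger sample may be needed.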

2 Statistics

2.1 Confidence interval (CI)


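CI for $\mu$:

1. $\sigma^2$ is known and $X$ follows a normal distribution: $\bar{x} - z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$

2. $\sigma$ is unknown and $X$ follows a normal distribution: $\bar{x} - t^{n-1}_{\alpha/2}\dfrac{s}{\sqrt{n}} \le \mu \le \bar{x} + t^{n-1}_{\alpha/2}\dfrac{s}{\sqrt{n}}$

3. The sample size $n$ is large and the population distribution is unknown or not normal: $\bar{x} - z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$ (if $\sigma$ is unknown, replace $\sigma$ by $s$).

CI for $P$, $n$ large: $\hat{P} - z_{\alpha/2}\sqrt{\dfrac{\hat{P}(1 - \hat{P})}{n}} \le P \le \hat{P} + z_{\alpha/2}\sqrt{\dfrac{\hat{P}(1 - \hat{P})}{n}}$, where $\hat{P} = \dfrac{X}{n}$ and $X$ is the number of observations showing the event of interest $A$ in a sample of size $n$.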
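The z-based intervals above can be computed with the standard library alone, as in the following sketch. The function names and the sample figures are assumptions for illustration. The t-based interval (case 2) is left out because the Python standard library has no Student t quantile function; the critical value $t^{n-1}_{\alpha/2}$ would come from a t table or a statistics package.

```python
from math import sqrt
from statistics import NormalDist


def z_critical(alpha):
    """z_{alpha/2}: the upper alpha/2 quantile of N(0, 1)."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def mean_ci_z(xbar, sd, n, alpha=0.05):
    """CI for mu with known sigma (or large n with s in place of sigma)."""
    half_width = z_critical(alpha) * sd / sqrt(n)
    return xbar - half_width, xbar + half_width


def proportion_ci(x, n, alpha=0.05):
    """Large-sample CI for a proportion, with P_hat = x / n."""
    p_hat = x / n
    half_width = z_critical(alpha) * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Hypothetical samples: x_bar = 100, sigma = 15, n = 36; and 40 successes in 100 trials.
mean_lo, mean_hi = mean_ci_z(100, 15, 36)
prop_lo, prop_hi = proportion_ci(40, 100)
print(mean_lo, mean_hi)
print(prop_lo, prop_hi)
```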

2.2 Hypothesis test with one sample


Hypothesis test for $\mu$:

1. $\sigma$ is known and $X$ follows a normal distribution: $z_0 = \dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$ ==> use Table 1 (z-test).

2. $\sigma$ is unknown and $X$ follows a normal distribution: $t_0 = \dfrac{\bar{X} - \mu_0}{s/\sqrt{n}}$ ==> use Table 2 (t-test).

Supervisor: Phan Thị Hường Email: [email protected]


3. The sample size $n$ is large and the population distribution is unknown or not normal: $z_0 = \dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$ (if $\sigma$ is known) or $z_0 = \dfrac{\bar{X} - \mu_0}{s/\sqrt{n}}$ (if $\sigma$ is unknown) ==> use Table 1 (z-test).

Hypothesis test for $P$, $n$ large: $z_0 = \dfrac{\hat{P} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}}$ ==> use Table 1 (z-test), where $\hat{P} = \dfrac{X}{n}$ and $X$ is the number of observations showing the event of interest $A$ in a sample of size $n$.
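The one-sample z statistics can be sketched in Python as follows, with p-values taken from the formulas of the z-test table (Table 1). Function names, the `alternative` labels, and the sample numbers are assumptions for illustration; the t-test p-value is omitted because it needs the Student t distribution, which the standard library does not provide.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF


def mean_z_statistic(xbar, mu0, sd, n):
    """z0 = (x_bar - mu0) / (sd / sqrt(n)); sd is sigma, or s when n is large."""
    return (xbar - mu0) / (sd / sqrt(n))


def proportion_z_statistic(x, n, p0):
    """z0 = (P_hat - p0) / sqrt(p0 (1 - p0) / n), with P_hat = x / n."""
    return (x / n - p0) / sqrt(p0 * (1 - p0) / n)


def z_p_value(z0, alternative="two-sided"):
    """p-value of a z-test for a two-sided, right-sided, or left-sided H1."""
    if alternative == "two-sided":
        return 2 * (1 - PHI(abs(z0)))
    if alternative == "right":
        return 1 - PHI(z0)
    if alternative == "left":
        return PHI(z0)
    raise ValueError("alternative must be 'two-sided', 'right', or 'left'")


# Hypothetical data: x_bar = 52 against mu0 = 50 with sigma = 10, n = 25,
# and 60 successes out of 100 against p0 = 0.5.
z_mean = mean_z_statistic(52, 50, 10, 25)
z_prop = proportion_z_statistic(60, 100, 0.5)
print(z_mean, z_p_value(z_mean, "two-sided"))
print(z_prop, z_p_value(z_prop, "right"))
```

Rejecting $H_0$ when the p-value is below $\alpha$ is equivalent to $z_0$ falling in the corresponding rejection region.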

2.3 Hypothesis test with two samples


Hypothesis test for means, $\mu_1 - \mu_2$, with dependent samples: $t_0 = \dfrac{\bar{D} - \Delta_0}{S_D/\sqrt{n}}$, where $D_i = x_i - y_i$ ==> use Table 2 (t-test) with $df = n - 1$.

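Hypothesis test for means, $\mu_1 - \mu_2$, with independent samples:

1. $\sigma_1, \sigma_2$ are known and the populations follow normal distributions: $z_0 = \dfrac{\bar{X} - \bar{Y}}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$ ==> use Table 1 (z-test).

2. $\sigma_1, \sigma_2$ are unknown, the populations follow normal distributions, and $\sigma_1 = \sigma_2$:
   $S_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$, $t_0 = \dfrac{\bar{X} - \bar{Y}}{S_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$ ==> use Table 2 (t-test) with $df = n_1 + n_2 - 2$.

3. $\sigma_1, \sigma_2$ are unknown, the populations follow normal distributions, and $\sigma_1 \ne \sigma_2$:
   $t_0 = \dfrac{\bar{X} - \bar{Y}}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$ and $df = \dfrac{\left[(s_1^2/n_1) + (s_2^2/n_2)\right]^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$ ==> use Table 2 (t-test).
   Note: A rule of thumb to check $\sigma_1 = \sigma_2$: if $\dfrac{s_1}{s_2} \in [0.5, 2]$, assume $\sigma_1 = \sigma_2$; otherwise assume $\sigma_1 \ne \sigma_2$.

4. The sample sizes are large and the population distributions are unknown or not normal: $z_0 = \dfrac{\bar{X} - \bar{Y}}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$ (if $\sigma_1, \sigma_2$ are known), or $z_0 = \dfrac{\bar{X} - \bar{Y}}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$ (if $\sigma_1, \sigma_2$ are unknown) ==> use Table 1 (z-test).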
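Cases 2 and 3 above, together with the rule of thumb on $s_1/s_2$, can be sketched as below. The function names and the summary statistics are hypothetical; the code returns only the statistic and the degrees of freedom, and the critical value or p-value still has to come from a t table.

```python
from math import sqrt


def pooled_t(xbar, ybar, s1, s2, n1, n2):
    """Equal-variance case: returns (t0, df) with df = n1 + n2 - 2."""
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    t0 = (xbar - ybar) / (sqrt(sp2) * sqrt(1 / n1 + 1 / n2))
    return t0, n1 + n2 - 2


def unequal_variance_t(xbar, ybar, s1, s2, n1, n2):
    """Unequal-variance case: returns (t0, df) with the approximate df formula."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    t0 = (xbar - ybar) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t0, df


def two_sample_t(xbar, ybar, s1, s2, n1, n2):
    """Pick the case with the rule of thumb: s1/s2 in [0.5, 2] means equal variances."""
    if 0.5 <= s1 / s2 <= 2:
        return pooled_t(xbar, ybar, s1, s2, n1, n2)
    return unequal_variance_t(xbar, ybar, s1, s2, n1, n2)


# Hypothetical summaries: similar spreads use the pooled test, very different ones do not.
t_equal, df_equal = two_sample_t(10, 8, 2, 2, 8, 8)
t_unequal, df_unequal = two_sample_t(10, 8, 1, 3, 10, 10)
print(t_equal, df_equal)
print(t_unequal, df_unequal)
```

The degrees of freedom in the unequal-variance case are usually not an integer; a common convention when reading a t table is to round down.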

Hypothesis test for population proportions, sample sizes large: $z_0 = \dfrac{\hat{P}_1 - \hat{P}_2}{\sqrt{\hat{P}(1 - \hat{P})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$ ==> use Table 1 (z-test), where $\hat{P} = \dfrac{X + Y}{n_1 + n_2}$, $\hat{P}_1 = \dfrac{X}{n_1}$, $\hat{P}_2 = \dfrac{Y}{n_2}$, and $X$ and $Y$ are the numbers of observations showing the event of interest $A$ in samples of sizes $n_1$ and $n_2$ respectively.
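The two-proportion statistic uses the pooled estimate $\hat{P}$ in the standard error. A minimal sketch follows; the function name and the counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x, n1, y, n2):
    """z0 for H0: P1 = P2, using the pooled proportion (x + y) / (n1 + n2)."""
    p1_hat, p2_hat = x / n1, y / n2
    p_pooled = (x + y) / (n1 + n2)
    std_err = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / std_err


# Hypothetical counts: 60 of 100 in the first sample, 40 of 100 in the second.
z0 = two_proportion_z(60, 100, 40, 100)
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z0)))
print(z0, p_two_sided)
```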

H1            Rejection region                                   p-value
Two-sided     $W_\alpha = \{z_0 : |z_0| > z_{\alpha/2}\}$        $2[1 - \Phi(|z_0|)]$
Right-sided   $W_\alpha = \{z_0 : z_0 > z_\alpha\}$              $1 - \Phi(z_0)$
Left-sided    $W_\alpha = \{z_0 : z_0 < -z_\alpha\}$             $\Phi(z_0)$
Table 1: z-test

H1            Rejection region (one sample)                      Rejection region (general df)                     p-value
Two-sided     $W_\alpha = \{t_0 : |t_0| > t_{\alpha/2,n-1}\}$    $W_\alpha = \{t_0 : |t_0| > t_{\alpha/2,df}\}$    $2P(T > |t_0|)$
Right-sided   $W_\alpha = \{t_0 : t_0 > t_{\alpha,n-1}\}$        $W_\alpha = \{t_0 : t_0 > t_{\alpha,df}\}$        $P(T > t_0)$
Left-sided    $W_\alpha = \{t_0 : t_0 < -t_{\alpha,n-1}\}$       $W_\alpha = \{t_0 : t_0 < -t_{\alpha,df}\}$       $P(T < t_0)$
Table 2: t-test

2.4 One-way ANOVA, balanced sizes


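Consider a sample of $N = kn$ observations, where $k$ is the number of levels of a factor and there are $n$ observations per level.
A hypothesis test:
$H_0: \tau_1 = \tau_2 = \dots = \tau_k = 0$ (changing the levels of the factor has no effect on the mean response).
$H_1: \tau_i \ne 0$ for at least one $i$ (there is a difference between the levels of the factor).
ANOVA table:

Source of variation   SS     df           MS                             F
Treatments            SSB    $k - 1$      $MSB = \dfrac{SSB}{k - 1}$     $F = \dfrac{MSB}{MSW}$
Error                 SSW    $k(n - 1)$   $MSW = \dfrac{SSW}{k(n - 1)}$
Total                 SST    $kn - 1$

Here $y_{i\cdot}$ and $y_{\cdot\cdot}$ are the level totals and the grand total, $\bar{y}_{i\cdot}$ and $\bar{y}_{\cdot\cdot}$ are the corresponding means, and:

• $SSB = n\sum_{i=1}^{k} (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2 = \sum_{i=1}^{k} \dfrac{y_{i\cdot}^2}{n} - \dfrac{y_{\cdot\cdot}^2}{N}$

• $SSW = \sum_{i=1}^{k}\sum_{j=1}^{n} (y_{ij} - \bar{y}_{i\cdot})^2 = SST - SSB$

• $SST = \sum_{i=1}^{k}\sum_{j=1}^{n} (y_{ij} - \bar{y}_{\cdot\cdot})^2 = \sum_{i=1}^{k}\sum_{j=1}^{n} y_{ij}^2 - \dfrac{y_{\cdot\cdot}^2}{N}$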

Reject $H_0$ when $F = \dfrac{MSB}{MSW} > F_{\alpha;\,k-1,\,k(n-1)}$ ==> $F$ has a Fisher distribution (F-test).
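The balanced one-way ANOVA quantities can be computed with the shortcut (totals-based) formulas, as in this sketch. The function name, the dictionary keys, and the data are assumptions for illustration; the critical value $F_{\alpha;\,k-1,\,k(n-1)}$ still has to be read from an F table.

```python
def one_way_anova(groups):
    """Balanced one-way ANOVA: groups is a list of k lists, each of length n."""
    k = len(groups)
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("balanced design required: all levels need n observations")
    N = k * n
    grand_total = sum(sum(g) for g in groups)
    correction = grand_total ** 2 / N                           # y..^2 / N
    sst = sum(y * y for g in groups for y in g) - correction    # total
    ssb = sum(sum(g) ** 2 for g in groups) / n - correction     # treatments
    ssw = sst - ssb                                             # error
    msb = ssb / (k - 1)
    msw = ssw / (k * (n - 1))
    return {"SSB": ssb, "SSW": ssw, "SST": sst,
            "MSB": msb, "MSW": msw, "F": msb / msw,
            "df": (k - 1, k * (n - 1))}


# Hypothetical data: k = 3 levels with n = 3 observations each.
table = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(table)
```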

2.5 Simple linear regression model
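The fitted regression line: $y = \hat\beta_0 + \hat\beta_1 x$, where $\hat\beta_1 = \dfrac{\sum_{i=1}^{n} x_i y_i - \dfrac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}}{\sum_{i=1}^{n} x_i^2 - \dfrac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}} = \dfrac{S_{xy}}{S_{xx}}$, and $\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}$.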
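Sums:

• $S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \dfrac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}$

• $S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - \dfrac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}$

• $S_{yy} = SST = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} y_i^2 - \dfrac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}$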
Estimate of $\sigma^2$: $\hat\sigma^2 = \dfrac{SSE}{n - 2}$, where $SSE = SST - SSR = SST - \hat\beta_1 S_{xy}$.
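Hypothesis testing for coefficients:

1. $H_0: \beta_1 = b_1$
   The test statistic: $T_{\beta_1} = \dfrac{\hat\beta_1 - b_1}{SE(\hat\beta_1)} \sim t(n - 2)$, with $SE(\hat\beta_1) = \sqrt{\dfrac{\hat\sigma^2}{S_{xx}}}$ ==> use Table 2 (t-test).

2. $H_0: \beta_0 = b_0$
   The test statistic: $T_{\beta_0} = \dfrac{\hat\beta_0 - b_0}{SE(\hat\beta_0)} \sim t(n - 2)$, with $SE(\hat\beta_0) = \sqrt{\hat\sigma^2\left(\dfrac{1}{n} + \dfrac{\bar{x}^2}{S_{xx}}\right)}$ ==> use Table 2 (t-test).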

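CIs for coefficients:

1. $\hat\beta_1 - t^{n-2}_{\alpha/2}\sqrt{\dfrac{\hat\sigma^2}{S_{xx}}} \le \beta_1 \le \hat\beta_1 + t^{n-2}_{\alpha/2}\sqrt{\dfrac{\hat\sigma^2}{S_{xx}}}$

2. $\hat\beta_0 - t^{n-2}_{\alpha/2}\sqrt{\hat\sigma^2\left(\dfrac{1}{n} + \dfrac{\bar{x}^2}{S_{xx}}\right)} \le \beta_0 \le \hat\beta_0 + t^{n-2}_{\alpha/2}\sqrt{\hat\sigma^2\left(\dfrac{1}{n} + \dfrac{\bar{x}^2}{S_{xx}}\right)}$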

Coefficient of determination: $R^2 = \dfrac{SSR}{SST} = 1 - \dfrac{SSE}{SST} = \hat\beta_1^2\,\dfrac{S_{xx}}{SST} = \hat\beta_1\,\dfrac{S_{xy}}{SST}$.
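The regression formulas of this section can be combined into one short routine. The sketch below is illustrative only: the function name, the returned keys, and the five data points are assumptions, and the t critical values needed for the tests and CIs would come from a t table.

```python
from math import sqrt


def simple_linear_regression(x, y):
    """Least-squares fit of y = b0 + b1 * x using the shortcut sums Sxx, Sxy, Syy."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sxx = sum(v * v for v in x) - sum_x ** 2 / n
    sxy = sum(a * b for a, b in zip(x, y)) - sum_x * sum_y / n
    syy = sum(v * v for v in y) - sum_y ** 2 / n    # SST
    b1 = sxy / sxx
    b0 = sum_y / n - b1 * sum_x / n
    sse = syy - b1 * sxy                            # SSE = SST - SSR
    sigma2_hat = sse / (n - 2)
    return {
        "b0": b0,
        "b1": b1,
        "sigma2_hat": sigma2_hat,
        "se_b1": sqrt(sigma2_hat / sxx),
        "se_b0": sqrt(sigma2_hat * (1 / n + (sum_x / n) ** 2 / sxx)),
        "r2": 1 - sse / syy,
    }


# Hypothetical data set with n = 5 points.
fit = simple_linear_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
t_slope = fit["b1"] / fit["se_b1"]   # test statistic for H0: beta1 = 0, df = n - 2
print(fit, t_slope)
```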

3 Exercises
