Summary
1 Probability
$P(AB) = P(A)\,P(B \mid A)$ and $P(A_1 A_2 \cdots A_n) = P(A_1)\,P(A_2 \mid A_1)\,P(A_3 \mid A_1 A_2) \cdots P(A_n \mid A_1 A_2 \cdots A_{n-1})$.
Bayes formula: $P(A_k \mid F) = \dfrac{P(A_k)\,P(F \mid A_k)}{P(F)}$.
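A minimal sketch of the Bayes formula in Python. It assumes the events $A_1, \ldots, A_n$ are mutually exclusive and cover the whole sample space, so that $P(F)$ can be obtained from the law of total probability; the function name and the numbers are hypothetical.

```python
def bayes_posterior(priors, likelihoods):
    """Return P(A_k | F) for every k, given P(A_k) and P(F | A_k)."""
    # Assumption: the A_k form a partition, so P(F) = sum_k P(A_k) P(F | A_k).
    p_f = sum(p * l for p, l in zip(priors, likelihoods))
    return [p * l / p_f for p, l in zip(priors, likelihoods)]

# Hypothetical data: three machines make 50%, 30%, 20% of all items,
# with defect rates 1%, 2%, 3%. F = "the item is defective".
priors = [0.5, 0.3, 0.2]
likelihoods = [0.01, 0.02, 0.03]
posteriors = bayes_posterior(priors, likelihoods)
print(posteriors)  # P(A_1 | F) is about 0.294
```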
Poisson r.v., $X \sim P(\lambda)$: $P(X = k) = \dfrac{e^{-\lambda}\lambda^k}{k!}$, $k = 0, 1, 2, \ldots$, and $E(X) = V(X) = \lambda$.
Hypergeometric r.v., $X \sim H(N, K, n)$: $P(X = k) = \dfrac{C_K^k\, C_{N-K}^{n-k}}{C_N^n}$ and $E(X) = np$, $\mathrm{Var}(X) = np(1 - p)\dfrac{N - n}{N - 1}$, $p = \dfrac{K}{N}$.
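The Poisson and hypergeometric formulas can be checked numerically with the standard library. This is a small sketch; the function names and the parameter values are illustrative assumptions.

```python
from math import comb, exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for X ~ P(lam)."""
    return exp(-lam) * lam**k / factorial(k)

def hypergeom_pmf(k, N, K, n):
    """P(X = k) for X ~ H(N, K, n)."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def hypergeom_mean_var(N, K, n):
    """E(X) = np and Var(X) = np(1 - p)(N - n)/(N - 1), with p = K/N."""
    p = K / N
    return n * p, n * p * (1 - p) * (N - n) / (N - 1)

# Hypothetical urn: N = 20 items, K = 5 marked, draw n = 4 without replacement.
N, K, n = 20, 5, 4
pmf = [hypergeom_pmf(k, N, K, n) for k in range(n + 1)]
mean_from_pmf = sum(k * p for k, p in enumerate(pmf))
mean_formula, var_formula = hypergeom_mean_var(N, K, n)
print(mean_from_pmf, mean_formula, var_formula)
```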
Normal r.v., $X \sim N(\mu, \sigma^2)$: $f(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$ and $E(X) = \mu$, $V(X) = \sigma^2$.
Central limit theorem: If $X_1, \ldots, X_n$ are $n$ independent random variables with the same distribution, mean $E(X_k) = \mu$ and finite variance $V(X_k) = \sigma^2$, and $\bar{X} = \dfrac{1}{n}\sum_{k=1}^{n} X_k$, then when $n$ is large enough, $\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ is approximately $N(0, 1)$.
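A rough simulation can illustrate the theorem. The sketch below assumes exponential data with rate 1 (mean 1, variance 1), a fixed seed, and arbitrary choices of $n$ and the number of replications; it only checks that the standardized sample means have mean near 0 and standard deviation near 1.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(0)          # fixed seed so the run is repeatable
mu, sigma = 1.0, 1.0    # Exp(1) has mean 1 and variance 1
n, replications = 50, 2000

def standardized_mean(n):
    """Draw n Exp(1) values and return (x_bar - mu) / (sigma / sqrt(n))."""
    xs = [random.expovariate(1.0) for _ in range(n)]
    return (mean(xs) - mu) / (sigma / sqrt(n))

z_values = [standardized_mean(n) for _ in range(replications)]
print(round(mean(z_values), 3), round(stdev(z_values), 3))
```

The original data are strongly skewed, yet the standardized means should already look close to standard normal at this sample size.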
2 Statistics
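CI for the mean $\mu$:
1. Given $\sigma^2$, $X$ follows a normal distribution: $\bar{x} - z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$
2. $\sigma$ is unknown, and $X$ follows a normal distribution: $\bar{x} - t_{\alpha/2,\,n-1}\dfrac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2,\,n-1}\dfrac{s}{\sqrt{n}}$
3. Sample size $n$ is large, and the population distribution is unknown or not normal: $\bar{x} - z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$ (if $\sigma$ is unknown, replace $\sigma$ by $s$).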
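As an illustration of case 1, the interval with known $\sigma$ can be computed with the standard library's `statistics.NormalDist`, which supplies $z_{\alpha/2}$. The function name and the sample are assumptions made for this sketch; the $t$ quantile needed for case 2 is not available in the standard library and would still come from a table.

```python
from math import sqrt
from statistics import NormalDist, mean

def z_confidence_interval(sample, sigma, alpha=0.05):
    """(1 - alpha) CI for mu when sigma is known (or n is large)."""
    n = len(sample)
    x_bar = mean(sample)
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    half_width = z * sigma / sqrt(n)
    return x_bar - half_width, x_bar + half_width

# Hypothetical measurements with known sigma = 0.5.
sample = [9.8, 10.2, 10.4, 9.6, 10.0]
low, high = z_confidence_interval(sample, sigma=0.5)
print(low, high)
```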
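CI for $P$, $n$ large: $\hat{P} - z_{\alpha/2}\sqrt{\dfrac{\hat{P}(1 - \hat{P})}{n}} \le P \le \hat{P} + z_{\alpha/2}\sqrt{\dfrac{\hat{P}(1 - \hat{P})}{n}}$, where $\hat{P} = \dfrac{X}{n}$ and $X$ is the number of observations in which the event of interest $A$ occurs in a sample of size $n$.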
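The proportion interval can be sketched the same way. The counts below are hypothetical and the function name is an assumption of this example.

```python
from math import sqrt
from statistics import NormalDist

def proportion_confidence_interval(x, n, alpha=0.05):
    """Large-sample (1 - alpha) CI for P, with P_hat = x / n."""
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

# Hypothetical survey: event A observed 56 times in a sample of 200.
p_low, p_high = proportion_confidence_interval(56, 200)
print(p_low, p_high)
```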
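Hypothesis test for the mean $\mu$:
1. Given $\sigma$, $X$ follows a normal distribution: $z_0 = \dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$; use Table 1 (z-test).
2. $\sigma$ is unknown and $X$ has a normal distribution: $t_0 = \dfrac{\bar{X} - \mu_0}{s/\sqrt{n}}$; use Table 2 (t-test).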
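The two one-sample statistics can be computed as follows. This is a sketch with a made-up sample; `statistics.stdev` is the sample standard deviation $s$ (divisor $n - 1$), and the critical value for $t_0$ still has to be read from a $t$ table.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_z(sample, mu0, sigma):
    """z0 = (x_bar - mu0) / (sigma / sqrt(n)), sigma known."""
    return (mean(sample) - mu0) / (sigma / sqrt(len(sample)))

def one_sample_t(sample, mu0):
    """Return (t0, df) with t0 = (x_bar - mu0) / (s / sqrt(n)), df = n - 1."""
    n = len(sample)
    return (mean(sample) - mu0) / (stdev(sample) / sqrt(n)), n - 1

# Hypothetical data, testing H0: mu = 5.0.
sample = [5.1, 4.9, 5.3, 5.2, 5.0]
z0 = one_sample_z(sample, mu0=5.0, sigma=0.2)
t0, df = one_sample_t(sample, mu0=5.0)
print(z0, t0, df)  # t0 is about 1.414 with df = 4
```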
Hypothesis test for $P$, $n$ is large: $z_0 = \dfrac{\hat{P} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}}$; use Table 1 (z-test). Here $\hat{P} = \dfrac{X}{n}$ and $X$ is the number of observations in which the event of interest $A$ occurs in a sample of size $n$.
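Hypothesis test for two population means:
1. $\sigma_1, \sigma_2$ are known, populations follow normal distributions: $z_0 = \dfrac{\bar{X} - \bar{Y}}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$; use Table 1 (z-test).
2. $\sigma_1, \sigma_2$ are unknown, populations follow normal distributions and $\sigma_1 = \sigma_2$:
$S_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$, $t_0 = \dfrac{\bar{X} - \bar{Y}}{S_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$; use Table 2 (t-test) with $df = n_1 + n_2 - 2$.
3. $\sigma_1, \sigma_2$ are unknown, populations follow normal distributions and $\sigma_1 \ne \sigma_2$:
$t_0 = \dfrac{\bar{X} - \bar{Y}}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$ and $df = \dfrac{\left[(s_1^2/n_1) + (s_2^2/n_2)\right]^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$; use Table 2 (t-test).
Note: A rule of thumb to check $\sigma_1 = \sigma_2$: if $\dfrac{s_1}{s_2} \in [0.5, 2]$, assume $\sigma_1 = \sigma_2$; otherwise assume $\sigma_1 \ne \sigma_2$.
4. Sample sizes are large, population distributions are unknown or not normal: $z_0 = \dfrac{\bar{X} - \bar{Y}}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$ ($\sigma_1, \sigma_2$ are known), $z_0 = \dfrac{\bar{X} - \bar{Y}}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$ ($\sigma_1, \sigma_2$ are unknown); use Table 1 (z-test).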
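The two $t$-based cases and the rule of thumb can be sketched in Python. Function names and data are assumptions of this example; `statistics.variance` is the sample variance $s^2$. In the example both samples have the same variance, so the pooled and unequal-variance statistics coincide.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(x, y):
    """Case sigma1 = sigma2: return (t0, df) using the pooled variance Sp^2."""
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    t0 = (mean(x) - mean(y)) / (sqrt(sp2) * sqrt(1 / n1 + 1 / n2))
    return t0, n1 + n2 - 2

def unequal_variance_t(x, y):
    """Case sigma1 != sigma2: return (t0, df) with the approximate df."""
    n1, n2 = len(x), len(y)
    v1, v2 = variance(x) / n1, variance(y) / n2   # s1^2/n1 and s2^2/n2
    t0 = (mean(x) - mean(y)) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t0, df

def equal_variance_rule(x, y):
    """Rule of thumb: treat sigma1 = sigma2 when s1/s2 lies in [0.5, 2]."""
    ratio = sqrt(variance(x) / variance(y))
    return 0.5 <= ratio <= 2

# Hypothetical samples.
x = [1, 2, 3, 4, 5]
y = [3, 4, 5, 6, 7]
print(equal_variance_rule(x, y), pooled_t(x, y), unequal_variance_t(x, y))
```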
Hypothesis test for population proportions, sample sizes are large: $z_0 = \dfrac{\hat{P}_1 - \hat{P}_2}{\sqrt{\hat{P}(1 - \hat{P})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$; use Table 1, where $\hat{P} = \dfrac{X + Y}{n_1 + n_2}$, $\hat{P}_1 = \dfrac{X}{n_1}$, $\hat{P}_2 = \dfrac{Y}{n_2}$, and $X$ and $Y$ are the numbers of observations in which the event of interest $A$ occurs in samples of sizes $n_1$ and $n_2$ respectively.
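A short sketch of the two-proportion statistic, using hypothetical counts chosen so the arithmetic is easy to follow by hand.

```python
from math import sqrt

def two_proportion_z(x, n1, y, n2):
    """z0 for comparing two proportions with the pooled estimate P_hat."""
    p1, p2 = x / n1, y / n2
    p_pooled = (x + y) / (n1 + n2)
    se = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Hypothetical counts: 45 of 150 versus 30 of 150.
# P1_hat = 0.30, P2_hat = 0.20, pooled P_hat = 0.25, standard error = 0.05.
z0 = two_proportion_z(45, 150, 30, 150)
print(z0)  # about 2.0
```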
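Table 1: z-test

| $H_1$ | Rejection region | p-value |
|---|---|---|
| Two-sided | $W_\alpha = \{z_0 : \lvert z_0 \rvert > z_{\alpha/2}\}$ | $2\,[1 - \Phi(\lvert z_0 \rvert)]$ |
| Right-sided | $W_\alpha = \{z_0 : z_0 > z_\alpha\}$ | $1 - \Phi(z_0)$ |
| Left-sided | $W_\alpha = \{z_0 : z_0 < -z_\alpha\}$ | $\Phi(z_0)$ |

Table 2: t-test

| $H_1$ | Rejection region (one sample) | Rejection region (two samples) | p-value |
|---|---|---|---|
| Two-sided | $W_\alpha = \{t_0 : \lvert t_0 \rvert > t_{\alpha/2,\,n-1}\}$ | $W_\alpha = \{t_0 : \lvert t_0 \rvert > t_{\alpha/2,\,df}\}$ | $2P(T > \lvert t_0 \rvert)$ |
| Right-sided | $W_\alpha = \{t_0 : t_0 > t_{\alpha,\,n-1}\}$ | $W_\alpha = \{t_0 : t_0 > t_{\alpha,\,df}\}$ | $P(T > t_0)$ |
| Left-sided | $W_\alpha = \{t_0 : t_0 < -t_{\alpha,\,n-1}\}$ | $W_\alpha = \{t_0 : t_0 < -t_{\alpha,\,df}\}$ | $P(T < t_0)$ |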
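The z-test p-values can be evaluated with `statistics.NormalDist().cdf` as $\Phi$. This is a sketch; the function name and the labels for the alternative hypothesis are assumptions of this example.

```python
from statistics import NormalDist

def z_test_p_value(z0, alternative="two-sided"):
    """p-value of a z-test for the three forms of H1."""
    phi = NormalDist().cdf   # standard normal CDF
    if alternative == "two-sided":
        return 2 * (1 - phi(abs(z0)))
    if alternative == "right":
        return 1 - phi(z0)
    if alternative == "left":
        return phi(z0)
    raise ValueError("alternative must be 'two-sided', 'right' or 'left'")

alpha = 0.05
p = z_test_p_value(2.0, "two-sided")
print(p, "reject H0" if p < alpha else "do not reject H0")
```

Rejecting $H_0$ when the p-value is below $\alpha$ gives the same decision as checking whether $z_0$ falls in the rejection region $W_\alpha$.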
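| Source of variation | SS | df | MS | F |
|---|---|---|---|---|
| Treatments (SSB) | $SSB = n\sum_{i=1}^{k}(\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2 = \sum_{i=1}^{k}\dfrac{y_{i\cdot}^2}{n} - \dfrac{y_{\cdot\cdot}^2}{N}$ | $k - 1$ | $MSB = \dfrac{SSB}{k - 1}$ | |
| Error (SSW) | $SSW = \sum_{i=1}^{k}\sum_{j=1}^{n}(y_{ij} - \bar{y}_{i\cdot})^2 = SST - SSB$ | $k(n - 1)$ | $MSW = \dfrac{SSW}{k(n - 1)}$ | $F = \dfrac{MSB}{MSW}$ |
| Total (SST) | $SST = \sum_{i=1}^{k}\sum_{j=1}^{n}(y_{ij} - \bar{y}_{\cdot\cdot})^2 = \sum_{i=1}^{k}\sum_{j=1}^{n} y_{ij}^2 - \dfrac{y_{\cdot\cdot}^2}{N}$ | $kn - 1$ | | |

Reject $H_0$ when $F = \dfrac{MSB}{MSW} > F_{\alpha;\,k-1,\,k(n-1)}$, where $F$ follows the Fisher distribution (F-test).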
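The ANOVA table can be filled in with the computational formulas. The sketch below assumes a balanced design ($k$ treatments with $n$ observations each, so $N = kn$) and uses made-up data; the critical value $F_{\alpha;\,k-1,\,k(n-1)}$ still has to be taken from an F table.

```python
def one_way_anova(groups):
    """Balanced one-way ANOVA: return SSB, SSW, SST, MSB, MSW and F."""
    k = len(groups)          # number of treatments
    n = len(groups[0])       # observations per treatment (assumed equal)
    N = k * n
    grand_total = sum(sum(g) for g in groups)                  # y..
    correction = grand_total**2 / N                            # y..^2 / N
    sst = sum(y * y for g in groups for y in g) - correction
    ssb = sum(sum(g) ** 2 for g in groups) / n - correction    # sum y_i.^2 / n
    ssw = sst - ssb
    msb = ssb / (k - 1)
    msw = ssw / (k * (n - 1))
    return {"SSB": ssb, "SSW": ssw, "SST": sst,
            "MSB": msb, "MSW": msw, "F": msb / msw}

# Hypothetical data: k = 3 treatments, n = 3 observations each.
table = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(table)  # SSB = 26, SSW = 6, SST = 32, F = 13
```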
2.5 Simple linear regression model
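The fitted regression line: $y = \hat{\beta}_0 + \hat{\beta}_1 x$, where $\hat{\beta}_1 = \dfrac{\sum_{i=1}^{n} x_i y_i - \dfrac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}}{\sum_{i=1}^{n} x_i^2 - \dfrac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}} = \dfrac{S_{xy}}{S_{xx}}$, and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$.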
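Sums:
$S_{xx} = \sum_{i=1}^{n}(x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \dfrac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}$
$S_{xy} = \sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - \dfrac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}$
$S_{yy} = SST = \sum_{i=1}^{n}(y_i - \bar{y})^2 = \sum_{i=1}^{n} y_i^2 - \dfrac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}$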
Estimate $\sigma^2$: $\hat{\sigma}^2 = \dfrac{SSE}{n - 2}$, $SSE = SST - SSR = SST - \hat{\beta}_1 S_{xy}$
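Hypothesis testing for coefficients:
1. $H_0: \beta_1 = b_1$
The test statistic: $T_{\beta_1} = \dfrac{\hat{\beta}_1 - b_1}{SE(\hat{\beta}_1)} \sim t(n - 2)$, $SE(\hat{\beta}_1) = \sqrt{\dfrac{\hat{\sigma}^2}{S_{xx}}}$; use Table 2 (t-test).
2. $H_0: \beta_0 = b_0$
The test statistic: $T_{\beta_0} = \dfrac{\hat{\beta}_0 - b_0}{SE(\hat{\beta}_0)} \sim t(n - 2)$, $SE(\hat{\beta}_0) = \sqrt{\hat{\sigma}^2\left(\dfrac{1}{n} + \dfrac{\bar{x}^2}{S_{xx}}\right)}$; use Table 2 (t-test).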
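The regression formulas of this section can be combined in one short function. This is a sketch with made-up data; the function name and the returned dictionary keys are assumptions of this example, and the test statistics are computed for $b_1 = 0$ and $b_0 = 0$.

```python
from math import sqrt

def simple_linear_regression(x, y):
    """Least-squares fit with sigma^2 estimate and standard errors."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sxx = sum(v * v for v in x) - sum_x**2 / n
    sxy = sum(a * b for a, b in zip(x, y)) - sum_x * sum_y / n
    syy = sum(v * v for v in y) - sum_y**2 / n      # Syy = SST
    b1 = sxy / sxx
    b0 = sum_y / n - b1 * sum_x / n
    sse = syy - b1 * sxy                            # SSE = SST - b1 * Sxy
    sigma2 = sse / (n - 2)
    se_b1 = sqrt(sigma2 / sxx)
    se_b0 = sqrt(sigma2 * (1 / n + (sum_x / n) ** 2 / sxx))
    return {"b0": b0, "b1": b1, "sigma2": sigma2,
            "se_b0": se_b0, "se_b1": se_b1, "df": n - 2}

# Hypothetical data.
fit = simple_linear_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
t_b1 = (fit["b1"] - 0) / fit["se_b1"]   # statistic for H0: beta1 = 0
t_b0 = (fit["b0"] - 0) / fit["se_b0"]   # statistic for H0: beta0 = 0
print(fit, t_b1, t_b0)  # b1 = 0.6, b0 = 2.2, sigma2 = 0.8
```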
3 Exercises