Formuleblad-statistiek
1 Standard Formulas
First we treat the calculations on sample data and afterwards population data.
Formula Overview
1.1.3 Covariance and Correlation (sample data)
• ${}_nC_r = \binom{n}{r} = \dfrac{n!}{r!\,(n-r)!}$
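As a quick check of the combination formula, here is a minimal Python sketch. The function name `n_choose_r` is only an illustrative choice; `math.comb` is the standard-library equivalent.

```python
import math

def n_choose_r(n: int, r: int) -> int:
    """Number of ways to choose r items out of n, ignoring order."""
    # Direct translation of n! / (r! (n - r)!); integer division is exact here.
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

# The factorial formula agrees with the built-in helper.
print(n_choose_r(5, 2), math.comb(5, 2))
```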
3 Formulas about random variables / distributions
For the formulas of all the distributions (Bernoulli, normal etc.), we refer to AthenaDocs.
3.2 Approximations
3.2.1 Approximating X̄
Note: The following is only true if the conditions in Table 1 hold!
Let X be a RV with mean $\mu$ and variance $\sigma^2$. Let $\bar{X}$ be the average of $n$ X's. Then
$$\bar{X} \sim N\!\left(\mu, \frac{\sigma^2}{n}\right).$$
The Standard Error of $\bar{X}$ is $\sigma/\sqrt{n}$.
Hence,
$$P(\bar{X} < k) = P\!\left(Z < \frac{k-\mu}{\sigma/\sqrt{n}}\right).$$
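The probability above can be evaluated numerically instead of with a z-table. The sketch below uses `statistics.NormalDist` from the Python standard library; the function name and the numbers (µ = 100, σ = 15, n = 36, k = 105) are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def prob_mean_below(k: float, mu: float, sigma: float, n: int) -> float:
    """P(X-bar < k) under the approximation X-bar ~ N(mu, sigma^2 / n)."""
    standard_error = sigma / sqrt(n)   # sigma / sqrt(n)
    z = (k - mu) / standard_error      # standardise k
    return NormalDist().cdf(z)         # P(Z < z) for a standard normal Z

# Hypothetical example: z = (105 - 100) / (15 / 6) = 2
print(prob_mean_below(105, 100, 15, 36))
```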
| Sample size | What we must know about the population |
|---|---|
| n < 15 | The population is normally distributed. |
| 15 ≤ n < 30 | The population is symmetrically distributed. |
| n ≥ 30 | The population may be skewed, but not severely. |

Table 1: Assumptions
• If BOTH nπ ≥ 5 and nπ(1 − π) ≥ 5:
We approximate X ∼ N(nπ, nπ(1 − π)). Do not forget the continuity correction:

| Original | Approximation |
|---|---|
| P(X > k) | P(X > k + 0.5) |
| P(X ≥ k) | P(X ≥ k − 0.5) |
| P(X < k) | P(X < k − 0.5) |
| P(X ≤ k) | P(X ≤ k + 0.5) |
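To see what the continuity correction does, the following sketch compares an exact binomial probability P(X ≤ k) with its normal approximation P(Y ≤ k + 0.5). It assumes X is binomial with parameters n and π; the helper names and the values n = 50, π = 0.4, k = 22 are hypothetical.

```python
from math import comb, sqrt
from statistics import NormalDist

def binom_cdf(k: int, n: int, pi: float) -> float:
    """Exact P(X <= k) for X ~ Binomial(n, pi)."""
    return sum(comb(n, i) * pi**i * (1 - pi)**(n - i) for i in range(k + 1))

def normal_approx_cdf(k: int, n: int, pi: float) -> float:
    """Approximate P(X <= k) by P(Y <= k + 0.5), Y ~ N(n*pi, n*pi*(1 - pi))."""
    mean = n * pi
    sd = sqrt(n * pi * (1 - pi))  # NormalDist takes the standard deviation
    return NormalDist(mean, sd).cdf(k + 0.5)

# Hypothetical values: n*pi = 20 and n*pi*(1 - pi) = 12, so both conditions hold.
n, pi, k = 50, 0.4, 22
print(binom_cdf(k, n, pi), normal_approx_cdf(k, n, pi))
```

The other three rows of the correction table follow the same pattern, using complements or k − 0.5 as appropriate.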
4 Confidence Interval Formulas
4.1 One-Sample C.I.
Here, C.I. stands for “Confidence Interval”. For C.I. about regression, see Section 6.2.1.
In the case of “C.I. for µ1, µ2, σ1 ≠ σ2 unknown”, the $df$ is usually given. A little note:
• In the odd case that you need to calculate the $df$ after all, use the formula
$$df_{\text{special}} = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1-1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2-1}}$$
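The degrees-of-freedom formula is easy to get wrong by hand, so here is a small Python sketch of it. The function name `welch_df` is an illustrative choice; the inputs are the two sample standard deviations and sample sizes.

```python
def welch_df(s1: float, n1: int, s2: float, n2: int) -> float:
    """Degrees of freedom for two samples with unequal, unknown variances."""
    v1 = s1**2 / n1  # s1^2 / n1
    v2 = s2**2 / n2  # s2^2 / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# Hypothetical samples: s1 = 3, n1 = 8 and s2 = 1, n2 = 20.
print(welch_df(3, 8, 1, 20))
```

The result is generally not an integer. With equal standard deviations and equal sample sizes it reduces to n1 + n2 − 2.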
5 Hypothesis Testing
5.1 One-Sample Hypothesis Testing
Use this table when H0 and H1 concern only one parameter:
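| When to Use | Test statistic | Distribution | df | Requirements |
|---|---|---|---|---|
| Test for µ, σ known | $z_{\text{calc}} = \dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$ | t (or z) | ∞ | See Table 1 |
| Test for µ, σ unknown | $t_{\text{calc}} = \dfrac{\bar{X} - \mu_0}{s/\sqrt{n}}$ | t | n − 1 | See Table 1 |
| Test for π | $z_{\text{calc}} = \dfrac{p - \pi_0}{\sqrt{\pi_0(1-\pi_0)/n}}$ | t (or z) | ∞ | nπ0 ≥ 10 and n(1 − π0) ≥ 10 |
| Test for σ² | $\chi^2_{\text{calc}} = \dfrac{(n-1)s^2}{\sigma_0^2}$ | χ² | n − 1 | Normal Population |
| Test for ρ | $t_{\text{calc}} = r\sqrt{\dfrac{n-2}{1-r^2}}$ | t | n − 2 | See Table 1 |
| Test for β_i (regression) | $t_{\text{calc}} = \dfrac{b_i - c}{s_{b_i}}$ | t | n − k − 1 | Normal Populations, Equal variances |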
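As an illustration of how one row of this table is used, the sketch below computes the test statistic for a proportion and a two-sided p-value from the standard normal distribution. The function name and the data (116 successes out of n = 200, π0 = 0.5) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z_test(successes: int, n: int, pi0: float) -> tuple[float, float]:
    """Return (z_calc, two-sided p-value) for H0: pi = pi0."""
    p = successes / n                       # sample proportion
    z = (p - pi0) / sqrt(pi0 * (1 - pi0) / n)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Requirement check: n*pi0 = 100 >= 10 and n*(1 - pi0) = 100 >= 10.
z, p_value = proportion_z_test(116, 200, 0.5)
print(z, p_value)
```

For a one-sided alternative, use a single tail of the normal distribution instead of doubling it.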
In the case of “Test for µ1 − µ2, σ1 ≠ σ2 unknown”, the $df$ is usually given. If you have two samples,
and none of these formulas work, do not forget about the paired difference procedure! A little note:
• In the odd case that you need to calculate the $df$ after all, use the formula
$$df_{\text{special}} = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1-1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2-1}}$$
6 Linear Regression
6.1 Simple Linear Regression
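Let’s say you have a theoretical regression model of the form $Y = \beta_0 + \beta_1 X + \varepsilon$, and you test
$H_0: \beta_1 = c$, $H_0: \beta_1 \le c$, or $H_0: \beta_1 \ge c$,
then use
$$t = \frac{b_1 - c}{s_{b_1}} \sim t_{n-k-1}$$
where $s_{b_1}$ is the standard error of $b_1$ and $k$ is the number of predictor variables (here $k = 1$).
You can also use this formula for other $b_i$'s if you have the model $Y = \beta_0 + \beta_1 X_1 + \ldots + \beta_k X_k + \varepsilon$.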
• Requirements: Normal Populations & Equal Variances
then use
$$F = \frac{SSR/k}{SSE/(n-k-1)} = \frac{n-k-1}{k} \cdot \frac{R^2}{1-R^2} \sim F_{k,\,n-k-1}.$$
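The two expressions for F give the same value because R² = SSR/(SSR + SSE). A short sketch with made-up numbers (SSR = 80, SSE = 20, n = 25, k = 2) confirms this; both function names are illustrative.

```python
def f_from_sums(ssr: float, sse: float, n: int, k: int) -> float:
    """F statistic from the regression and error sums of squares."""
    return (ssr / k) / (sse / (n - k - 1))

def f_from_r2(r2: float, n: int, k: int) -> float:
    """The same F statistic written in terms of R^2."""
    return (n - k - 1) / k * r2 / (1 - r2)

# Hypothetical values; R^2 = SSR / (SSR + SSE) = 0.8
ssr, sse, n, k = 80.0, 20.0, 25, 2
r2 = ssr / (ssr + sse)
print(f_from_sums(ssr, sse, n, k), f_from_r2(r2, n, k))
```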
Usually a table of this form is given:
$$R^2_{\text{adj}} = 1 - (1 - R^2)\,\frac{n-1}{n-k-1}$$
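A minimal sketch of the adjusted R² computation, with hypothetical values R² = 0.8, n = 25 and k = 2 (the function name is an illustrative choice):

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 for a model with k predictors fitted on n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical values: the adjustment pulls 0.8 down slightly.
print(adjusted_r2(0.8, 25, 2))
```

The adjusted value is never larger than R² itself, and the gap grows as k increases relative to n.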
6.2.1 Confidence Intervals
If we have a regression model $Y = \beta_0 + \beta_1 X_1 + \ldots + \beta_k X_k$, then a $(1-\alpha)100\%$ confidence interval for
a slope $\beta_i$ is given by
$$b_i \pm t_{\alpha/2;\,n-k-1}\, s_{b_i}$$
where $s_{b_i}$ is the standard error of the regression coefficient $b_i$ and $k$ is the number of predictor variables.
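A sketch of the interval computation, assuming the critical value is looked up in a t-table and passed in by hand. The function name and the example values (b = 2.5, standard error 0.4, n = 25, k = 2, so df = 22 and t ≈ 2.074 for a 95% interval) are hypothetical.

```python
def slope_confidence_interval(b: float, se_b: float, t_crit: float) -> tuple[float, float]:
    """Confidence interval b ± t_crit * se_b for a regression slope."""
    margin = t_crit * se_b
    return b - margin, b + margin

# t_crit is read from a t-table for alpha/2 = 0.025 and df = n - k - 1 = 22.
lower, upper = slope_confidence_interval(2.5, 0.4, 2.074)
print(lower, upper)
```

If the interval does not contain 0, the slope differs significantly from 0 at the corresponding α.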
7.3 Contingency Tables
Let’s say you have two qualitative variables for which you want to check whether they are
independent (like Education Level and Place of Residence), with a hypothesis:
H0 : The variables are independent.
For this test you use the statistic
$$\chi^2_{\text{calc}} = \sum_{j,k} \frac{(f_{jk} - e_{jk})^2}{e_{jk}} \sim \chi^2_{(r-1)(c-1)},$$
with
$$e_{jk} = \frac{R_j \cdot C_k}{n}.$$
Here
- $f_{jk}$ is the observed frequency in a cell,
- $e_{jk}$ is the expected frequency of a cell,
- $R_j$ is the total number of observations in a row,
- $C_k$ is the total number of observations in a column,
- $n$ is the total number of observations,
- $r$ and $c$ are the number of rows and columns respectively.
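The statistic can be computed directly from a table of observed frequencies. This sketch assumes the table is given as a list of rows; the function name and the 2×2 example are made up for illustration.

```python
def chi_square_independence(observed: list[list[float]]) -> tuple[float, int]:
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in observed]        # R_j
    col_totals = [sum(col) for col in zip(*observed)]  # C_k
    n = sum(row_totals)                                # total number of observations
    chi2 = 0.0
    for j, row in enumerate(observed):
        for k, f_jk in enumerate(row):
            e_jk = row_totals[j] * col_totals[k] / n   # expected frequency
            chi2 += (f_jk - e_jk) ** 2 / e_jk
    df = (len(observed) - 1) * (len(observed[0]) - 1)  # (r - 1)(c - 1)
    return chi2, df

# Hypothetical 2x2 table: every expected frequency is 15.
print(chi_square_independence([[10, 20], [20, 10]]))
```

Compare the statistic with the critical value of the χ² distribution with the returned degrees of freedom.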
7.4.3 Wilcoxon Signed-Rank Test
You need to assume symmetry to use this test. The statistic is
$$W = \sum_i R_i^+$$
Step 2 Number each observation in this list from 1 to n. These numbers we call ‘Ranks’.
Step 3 Add the rank numbers for all observations for which the observation was above M0 .
This equals the test statistic W.
Step 4 If n ≤ 20, go to the Wilcoxon signed-rank test table. Find the corresponding interval (a, b) in the
table.
- If our W lies in the interval: keep H0.
- If W lies outside the interval: reject H0.
Step 5 If n > 20, we use a z-test, with the test statistic
$$Z = \frac{W - \dfrac{n(n+1)}{4}}{\sqrt{\dfrac{n(n+1)(2n+1)}{24}}}$$
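The steps can be sketched in Python as follows. Step 1 is not shown above, so the code assumes the usual procedure: take the differences from M0, drop zero differences, and sort by absolute difference. Ties in absolute difference are not averaged in this sketch, and the function names and data are hypothetical.

```python
from math import sqrt

def wilcoxon_w(data: list[float], m0: float) -> tuple[int, int]:
    """Return (W, n): the sum of ranks of observations above m0, and the sample size used."""
    # Assumed Step 1: differences from m0, zeros dropped, sorted by absolute size.
    diffs = sorted((x - m0 for x in data if x != m0), key=abs)
    n = len(diffs)
    # Step 2: ranks 1..n. Step 3: add the ranks of observations above m0.
    w = sum(rank for rank, d in enumerate(diffs, start=1) if d > 0)
    return w, n

def wilcoxon_z(w: float, n: int) -> float:
    """Step 5: z statistic for W, intended for n > 20."""
    mean_w = n * (n + 1) / 4
    sd_w = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (w - mean_w) / sd_w

# Hypothetical small sample with M0 = 10 (the value equal to 10 is dropped).
print(wilcoxon_w([12.5, 15, 9, 20, 11.5, 10], 10))
```

With a small sample like this one, W would be compared with the table interval from Step 4 instead of the z statistic.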