STATS Shortcut Formula
VARIANCE
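$$\sigma^2_{\text{population}} = \frac{\sum (x - \mu)^2}{N} \qquad s^2_{\text{sample}} = \frac{\sum (x - \bar{x})^2}{n - 1}$$
Expanding the numerator:
$$\begin{aligned}
\text{numerator} &= \sum (x - \mu)^2 \\
&= \sum (x^2 - 2x\mu + \mu^2) \\
&= \sum x^2 - 2\mu \sum x + \sum \mu^2 \\
&= \sum x^2 - 2\mu \cdot N \cdot \mu + N\mu^2 \\
&= \sum x^2 - N\mu^2
\end{aligned}$$
Variance is the sum of squared deviations of every observation from its mean, divided by the total number of observations (N for a population; n − 1 for a sample). Its units are the square of the units of the original variable.
Computational Formulae (useful for manual calculations):
$$\sigma^2_{\text{population}} = \frac{\sum_{i=1}^{N} x_i^2 - N\mu^2}{N} \qquad s^2_{\text{sample}} = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n - 1}$$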
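A minimal sketch that checks the shortcut numerically: it computes the population and sample variance both from the definition (sum of squared deviations) and from the computational form (Σx² − n·mean²) on a small made-up data set. The function names are illustrative only, not from any library; the standard `statistics` module is used just as a cross-check.

```python
import math
import statistics


def variance_definitional(xs, sample=False):
    """Sum of squared deviations from the mean, over N (population) or n - 1 (sample)."""
    n = len(xs)
    mean = sum(xs) / n
    ss = sum((x - mean) ** 2 for x in xs)
    return ss / (n - 1 if sample else n)


def variance_computational(xs, sample=False):
    """Shortcut form: (sum of x^2 minus n times the squared mean), over N or n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    ss = sum(x * x for x in xs) - n * mean ** 2
    return ss / (n - 1 if sample else n)


data = [4, 8, 6, 5, 3, 7]  # made-up observations

for sample in (False, True):
    a = variance_definitional(data, sample)
    b = variance_computational(data, sample)
    print("sample" if sample else "population", a, b)
```

The shortcut is convenient by hand, but in floating point it can lose precision when the mean is large relative to the spread (the two large terms nearly cancel), which is why software usually prefers the definitional or a streaming form.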
STANDARD DEVIATION
Square root of the variance, denoted σ (population) or s (sample). Its units are the same as those of the original variable.
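As a small illustration (made-up data), the sketch below takes the square root of the population and sample variances and cross-checks the results against `statistics.pstdev` and `statistics.stdev` from the Python standard library.

```python
import math
import statistics

data = [4, 8, 6, 5, 3, 7]  # made-up observations
n = len(data)
mean = sum(data) / n
ss = sum((x - mean) ** 2 for x in data)  # sum of squared deviations

sigma = math.sqrt(ss / n)      # population standard deviation
s = math.sqrt(ss / (n - 1))    # sample standard deviation

# Cross-check against the standard library
print(sigma, statistics.pstdev(data))
print(s, statistics.stdev(data))
```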
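PERCENTILES
To compute the p-th percentile: arrange the data in ascending order and compute np (n = number of observations, p written as a fraction). If np is a decimal, round up to the next position and take the value there; if np is an integer, take the average of the values at positions np and np + 1.
A percentile need not be one of the values in the data set. Q1 = 25th percentile, Q2 = median = 50th percentile, Q3 = 75th percentile.
5-number summary: Min, Q1, Q2, Q3, Max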
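A minimal sketch of the percentile rule above (1-based positions, p between 0 and 100 exclusive) and the 5-number summary built from it. The helper names and data are hypothetical.

```python
import math


def percentile(xs, p):
    """p-th percentile: ceil(np)-th value if np is a decimal, else average of np-th and (np+1)-th."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    data = sorted(xs)               # arrange in ascending order
    pos = len(data) * p / 100       # np, with p as a fraction
    if pos.is_integer():
        k = int(pos)
        return (data[k - 1] + data[k]) / 2  # average of positions np and np + 1
    return data[math.ceil(pos) - 1]         # next position up


def five_number_summary(xs):
    data = sorted(xs)
    return (data[0], percentile(data, 25), percentile(data, 50), percentile(data, 75), data[-1])


data = [12, 7, 3, 15, 9, 10, 5, 8, 11, 6]  # made-up observations
print(five_number_summary(data))
```

Other tools (spreadsheets, NumPy's default) may interpolate between positions instead, so their quartiles can differ slightly from this rule.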
OUTLIERS
outlier < Q1 − 1.5 × IQR; outlier > Q3 + 1.5 × IQR, where IQR = Q3 − Q1 (interquartile range)
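A sketch of the 1.5 × IQR fences, assuming quartiles are computed with the ceil(np) / average-of-np-and-np+1 percentile rule from this sheet (repeated here so the block runs on its own). Function names and data are illustrative.

```python
import math


def percentile(xs, p):
    """Percentile rule: ceil(np)-th value if np is a decimal, else average of np-th and (np+1)-th."""
    data = sorted(xs)
    pos = len(data) * p / 100
    if pos.is_integer():
        k = int(pos)
        return (data[k - 1] + data[k]) / 2
    return data[math.ceil(pos) - 1]


def iqr_outliers(xs):
    """Values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR."""
    q1, q3 = percentile(xs, 25), percentile(xs, 75)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in xs if x < lower or x > upper]


data = [3, 5, 6, 7, 8, 9, 10, 11, 12, 40]  # made-up data with one extreme value
print(iqr_outliers(data))
```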
TWO WAY CONTINGENCY TABLE
Shows the association between 2 categorical variables (bivariate categorical data) using relative frequencies. Compute the Row Total and Column Total, then the Row Relative Frequencies (cell frequency / row total) and Column Relative Frequencies (cell frequency / column total). If the row (or column) relative frequencies are the same for all rows (or columns), the 2 variables are not associated.
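A minimal sketch of the row-relative-frequency check, using hypothetical counts stored as nested dictionaries ({row: {column: count}}). It applies the rule exactly as stated (identical row profiles, up to floating-point tolerance, mean no association); with real sample data the rows are rarely exactly equal.

```python
import math


def row_relative_frequencies(table):
    """Divide every cell by its row total. table: {row: {column: count}}."""
    return {
        row: {col: count / sum(cells.values()) for col, count in cells.items()}
        for row, cells in table.items()
    }


def appears_associated(table):
    """False when every row has the same row relative frequencies."""
    rows = list(row_relative_frequencies(table).values())
    first = rows[0]
    return any(
        not math.isclose(r[col], first[col]) for r in rows[1:] for col in first
    )


# Hypothetical counts: gender (rows) vs. answer to a yes/no question (columns)
same_profile = {"Male": {"Yes": 20, "No": 30}, "Female": {"Yes": 40, "No": 60}}
different_profile = {"Male": {"Yes": 30, "No": 20}, "Female": {"Yes": 40, "No": 60}}

print(row_relative_frequencies(same_profile))
print(appears_associated(same_profile), appears_associated(different_profile))
```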
SCATTER PLOT
y-axis : response variable ; x-axis : explanatory variable
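$$\text{cov}(x, y)_{\text{population}} = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{N} = \frac{\sum_{i=1}^{N} x_i y_i - \frac{\sum_{i=1}^{N} x_i \cdot \sum_{i=1}^{N} y_i}{N}}{N}$$
Expanding the numerator:
$$\begin{aligned}
\text{numerator} &= \sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y}) \\
&= \sum_{i=1}^{N} (x_i y_i - x_i \bar{y} - \bar{x} y_i + \bar{x}\bar{y}) \\
&= \sum_{i=1}^{N} x_i y_i - \sum_{i=1}^{N} x_i \bar{y} - \sum_{i=1}^{N} \bar{x} y_i + \sum_{i=1}^{N} \bar{x}\bar{y} \\
&= \sum_{i=1}^{N} x_i y_i - \bar{y} \sum_{i=1}^{N} x_i - \bar{x} \sum_{i=1}^{N} y_i + \sum_{i=1}^{N} \bar{x}\bar{y} \\
&= \sum_{i=1}^{N} x_i y_i - \bar{y} \cdot N \cdot \bar{x} - \bar{x} \cdot N \cdot \bar{y} + N \cdot \bar{x} \cdot \bar{y} \\
&= \sum_{i=1}^{N} x_i y_i - N \cdot \bar{x} \cdot \bar{y} \\
&= \sum_{i=1}^{N} x_i y_i - \frac{\sum_{i=1}^{N} x_i \cdot \sum_{i=1}^{N} y_i}{N}
\end{aligned}$$
For a sample, divide the same numerator by n − 1 instead of N:
$$\text{cov}(x, y)_{\text{sample}} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$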
Covariance quantifies the linear association between 2 numerical variables. Its units are (units of x) × (units of y), e.g. years·cm or years·Rs, which makes its magnitude difficult to interpret.
When both variables move in the same direction the covariance is positive; when they move in opposite directions it is negative.
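A sketch confirming that the definitional and shortcut numerators agree, with the population (N) and sample (n − 1) divisors. The "years" and "height" values are made up for illustration; reversing the heights shows the sign change for variables moving in opposite directions.

```python
import math


def cov_definitional(xs, ys, sample=False):
    """Sum of cross products of deviations, over N (population) or n - 1 (sample)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return num / (n - 1 if sample else n)


def cov_computational(xs, ys, sample=False):
    """Shortcut numerator: sum(x*y) - sum(x) * sum(y) / n."""
    n = len(xs)
    num = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    return num / (n - 1 if sample else n)


years = [1, 2, 3, 4, 5]          # made-up x values (e.g. age in years)
height = [50, 60, 65, 72, 80]    # made-up y values (e.g. height in cm)

print(cov_definitional(years, height), cov_computational(years, height))
print(cov_computational(years, height, sample=True))
print(cov_computational(years, height[::-1]))  # opposite directions: negative
```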
CORRELATION COEFFICIENT
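$$r = \frac{\text{cov}(x, y)}{S_x \cdot S_y} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \cdot \sum_{i=1}^{n} (y_i - \bar{y})^2}} = \frac{\sum_{i=1}^{n} x_i y_i - \frac{\sum_{i=1}^{n} x_i \cdot \sum_{i=1}^{n} y_i}{n}}{\sqrt{\left(\sum x^2 - n\bar{x}^2\right) \cdot \left(\sum y^2 - n\bar{y}^2\right)}}$$
$$-1 \le r \le +1$$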
The Pearson correlation coefficient is derived from covariance: divide the covariance of x and y by the product of the standard deviations of x and y. The units of the standard deviations cancel the units of the covariance, so r has no units.
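A minimal sketch showing that the deviation form and the cov(x, y) / (Sx · Sy) form give the same r, as long as the same divisor (here n − 1) is used for the covariance and both standard deviations. Data and function names are illustrative.

```python
import math
import statistics


def pearson_r(xs, ys):
    """Deviation form: sum of cross products over sqrt of the product of sums of squares."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def r_from_cov(xs, ys):
    """cov(x, y) / (Sx * Sy), using sample (n - 1) versions throughout."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    return cov / (statistics.stdev(xs) * statistics.stdev(ys))


years = [1, 2, 3, 4, 5]          # made-up data
height = [50, 60, 65, 72, 80]
print(pearson_r(years, height), r_from_cov(years, height))
```

Because the n − 1 factors cancel, the unitless r does not depend on whether population or sample divisors are used, provided they are used consistently.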
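Point-biserial correlation coefficient (one numerical variable and one dichotomous variable with categories 0 and 1):
$$r_{pb} = \left(\frac{\bar{Y}_0 - \bar{Y}_1}{S_x}\right) \cdot \sqrt{p_0 \cdot p_1}, \qquad p_0 = \frac{n_0}{n}, \quad p_1 = \frac{n_1}{n}$$
where n_0 and n_1 are the numbers of observations in categories 0 and 1, and n = n_0 + n_1.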
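A sketch of the point-biserial formula under stated assumptions: Ȳ0 and Ȳ1 are read as the means of the numerical variable in categories 0 and 1, and S_x as the population standard deviation (divisor n) of that numerical variable; the sheet does not define these symbols, so this is an interpretation. Under these assumptions the result matches ordinary Pearson r when category 0 is coded as 1 and category 1 as 0 (because the numerator is Ȳ0 − Ȳ1). Data and names are hypothetical.

```python
import math


def point_biserial(values, groups):
    """r_pb = ((Ybar0 - Ybar1) / S) * sqrt(p0 * p1).

    values: the numerical variable; groups: matching 0/1 category labels.
    Assumption: S is the population standard deviation (divisor n) of all values.
    """
    n = len(values)
    g0 = [v for v, g in zip(values, groups) if g == 0]
    g1 = [v for v, g in zip(values, groups) if g == 1]
    p0, p1 = len(g0) / n, len(g1) / n
    mean_all = sum(values) / n
    s = math.sqrt(sum((v - mean_all) ** 2 for v in values) / n)
    return (sum(g0) / len(g0) - sum(g1) / len(g1)) / s * math.sqrt(p0 * p1)


def pearson_r(xs, ys):
    """Ordinary Pearson r, used only to cross-check the shortcut."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


scores = [10, 12, 14, 20, 22, 24]  # made-up numerical values
groups = [0, 0, 0, 1, 1, 1]         # made-up category labels

# With Ybar0 - Ybar1 in the numerator, r_pb equals Pearson r
# when category 0 is coded as 1 and category 1 as 0.
flipped = [1 - g for g in groups]
print(point_biserial(scores, groups), pearson_r(flipped, scores))
```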
GOODNESS OF FIT
$$R^2 = r^2$$
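A sketch checking R² = r² for simple linear regression fitted by least squares with an intercept (the setting in which this identity holds): R² is computed as 1 − SSE/SST from the fitted line and compared with the square of Pearson r. The data are made up.

```python
import math


def r_squared_two_ways(xs, ys):
    """Return (R^2 from a least-squares line, r^2 from Pearson r)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)  # total sum of squares (SST)

    r = sxy / math.sqrt(sxx * syy)

    slope = sxy / sxx                 # least-squares slope
    intercept = my - slope * mx       # least-squares intercept
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - sse / syy, r ** 2


years = [1, 2, 3, 4, 5]          # made-up data
height = [50, 60, 65, 72, 80]
big_r2, small_r2 = r_squared_two_ways(years, height)
print(big_r2, small_r2)
```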
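Effect of adding a constant C to x (x + C) or multiplying x by C (C × x):
Mean: x + C adds C to the mean; C × x multiplies the mean by C
Covariance: x + C leaves it unchanged; C × x multiplies it by C
Correlation Coefficient: x + C leaves it unchanged; C × x leaves it unchanged (the sign flips if C < 0)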
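A numerical check of these effects on made-up data, applying the transformation to x only and using population (divisor N) covariance and standard deviations. C = 3 is an arbitrary illustrative constant; a negative multiplier is included to show the sign flip of r.

```python
import math
import statistics


def pcov(xs, ys):
    """Population covariance (divisor N)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n


def corr(xs, ys):
    """Pearson r as population covariance over the product of population SDs."""
    return pcov(xs, ys) / (statistics.pstdev(xs) * statistics.pstdev(ys))


x = [1, 2, 3, 4, 5]             # made-up data
y = [50, 60, 65, 72, 80]
C = 3                           # arbitrary constant for illustration

shifted = [v + C for v in x]    # x + C
scaled = [C * v for v in x]     # C * x
negated = [-C * v for v in x]   # multiplying by a negative constant

print("mean:", statistics.mean(x), statistics.mean(shifted), statistics.mean(scaled))
print("cov: ", pcov(x, y), pcov(shifted, y), pcov(scaled, y))
print("r:   ", corr(x, y), corr(shifted, y), corr(scaled, y), corr(negated, y))
```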
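Columns to tabulate for manual calculation, with the sample formula that uses their totals:
Variance (sample): x, x²
$$s^2_{\text{sample}} = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n - 1}$$
Covariance (sample): x, y, xy
$$\text{cov}(x, y)_{\text{sample}} = \frac{\sum_{i=1}^{n} x_i y_i - \frac{\sum_{i=1}^{n} x_i \cdot \sum_{i=1}^{n} y_i}{n}}{n - 1}$$
Correlation Coefficient: x, y, xy, x², y²
$$r = \frac{\sum_{i=1}^{n} x_i y_i - \frac{\sum_{i=1}^{n} x_i \cdot \sum_{i=1}^{n} y_i}{n}}{\sqrt{\left(\sum x^2 - n\bar{x}^2\right) \cdot \left(\sum y^2 - n\bar{y}^2\right)}}$$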
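A sketch of the hand-calculation workflow: total the x, y, xy, x² and y² columns once, then plug only those totals into the sample variance, sample covariance and correlation formulas. The helper names and data are hypothetical.

```python
import math


def column_sums(xs, ys):
    """Totals of the x, y, xy, x^2 and y^2 columns used for hand calculation."""
    return {
        "x": sum(xs),
        "y": sum(ys),
        "xy": sum(x * y for x, y in zip(xs, ys)),
        "x2": sum(x * x for x in xs),
        "y2": sum(y * y for y in ys),
    }


def summary_from_sums(t, n):
    """Sample variance of x, sample covariance and r, computed from column totals only."""
    xbar, ybar = t["x"] / n, t["y"] / n
    var_x = (t["x2"] - n * xbar ** 2) / (n - 1)
    sxy = t["xy"] - t["x"] * t["y"] / n
    cov = sxy / (n - 1)
    r = sxy / math.sqrt((t["x2"] - n * xbar ** 2) * (t["y2"] - n * ybar ** 2))
    return var_x, cov, r


years = [1, 2, 3, 4, 5]          # made-up data
height = [50, 60, 65, 72, 80]
totals = column_sums(years, height)
var_x, cov_xy, r = summary_from_sums(totals, len(years))
print(totals)
print(var_x, cov_xy, r)
```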