STATS Shortcut Formula

1) Variance is a measure of how far data points are spread from their average value. It is calculated by taking the average of squared deviations from the mean.
2) Standard deviation is the square root of variance and provides a measure of data spread in the same units as the original data.
3) Percentiles, quartiles, and the interquartile range (IQR) are used to describe data distribution and identify outliers. The IQR is calculated as Q3-Q1.

Uploaded by

jeet sigh

CG ver:1.0 date:20201116 Statistics I Page 1 of 3

VARIANCE
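$$\sigma^2_{population} = \frac{\sum (x - \mu)^2}{N} \qquad S^2_{sample} = \frac{\sum (x - \bar{x})^2}{n - 1}$$

Sum of squared deviations of every observation from its mean over total number of observations. Units are square units of the original variable.

Expanding the numerator for n observations:

$$\begin{aligned}
\text{numerator} &= \sum (x - \mu)^2 \\
&= \sum (x^2 - 2x\mu + \mu^2) \\
&= \sum (x^2) - 2\mu \sum x + \sum \mu^2 \\
&= \sum (x^2) - 2\mu \cdot n \cdot \mu + n\mu^2 \\
&= \sum_{i=1}^{n} (x_i^2) - n\mu^2
\end{aligned}$$

Computational Formulae (useful for manual calculations):

$$\sigma^2_{population} = \frac{\sum_{i=1}^{N} (x_i^2) - N\mu^2}{N} \qquad S^2_{sample} = \frac{\sum_{i=1}^{n} (x_i^2) - n\bar{x}^2}{n - 1}$$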
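The definitional and computational forms of the variance should agree on any data set. The sketch below checks this in Python; the function names and the sample data are illustrative assumptions, not part of the original sheet.

```python
def variance_definitional(xs, sample=True):
    """Sum of squared deviations from the mean, over n - 1 (sample) or N (population)."""
    n = len(xs)
    mean = sum(xs) / n
    squared_deviations = sum((x - mean) ** 2 for x in xs)
    return squared_deviations / (n - 1 if sample else n)


def variance_computational(xs, sample=True):
    """Shortcut form: (sum of x^2 - n * mean^2), over n - 1 (sample) or N (population)."""
    n = len(xs)
    mean = sum(xs) / n
    numerator = sum(x * x for x in xs) - n * mean ** 2
    return numerator / (n - 1 if sample else n)


# Hypothetical data: mean is 5, sum of squared deviations is 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance_definitional(data, sample=False))  # population variance
print(variance_computational(data, sample=True))  # sample variance
```

The shortcut form needs only the running totals of x and x^2, which is why it suits manual calculation. In floating point it can lose precision when the mean is large relative to the spread.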
STANDARD DEVIATION
Square root of Variance. Units are the same as those of the original variable. Denoted σ (population) or s (sample).

PERCENTILES, QUARTILES, IQR,Box Plot


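The 100p-th percentile is a value such that at least 100p percent of the data are ≤ it and at least 100(1-p) percent of the data are ≥ it.

Compute: arrange the data in ascending order and compute np. If np is a decimal, take the value at the next position. If np is an integer, take the average of the values at positions np and np+1.
A percentile need not be part of the data set. Q1 = 25th percentile, Q2 = median = 50th percentile, Q3 = 75th percentile.
5 number summary: Min, Q1, Q2, Q3, Max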

IQR = Q3-Q1 ; Range = Max-Min
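The percentile rule above can be sketched in Python as follows. The function names and the data are illustrative assumptions. The percentile is passed as a whole number of percent (25 for Q1) so that the "is np an integer?" test uses exact integer arithmetic. Other textbooks and libraries use different interpolation rules, so their results may differ slightly.

```python
def percentile(xs, k):
    """k-th percentile (0 < k < 100) using the np rule from this sheet."""
    if not 0 < k < 100:
        raise ValueError("k must be strictly between 0 and 100")
    s = sorted(xs)
    # n * p with p = k / 100, split into whole part q and remainder r.
    q, r = divmod(len(s) * k, 100)
    if r == 0:
        # np is an integer: average the values at positions np and np + 1 (1-based).
        return (s[q - 1] + s[q]) / 2
    # np is a decimal: take the value at the next position (1-based q + 1).
    return s[q]


def five_number_summary(xs):
    """Min, Q1, Q2 (median), Q3, Max."""
    return (min(xs), percentile(xs, 25), percentile(xs, 50), percentile(xs, 75), max(xs))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
minimum, q1, q2, q3, maximum = five_number_summary(data)
iqr = q3 - q1
data_range = maximum - minimum
print(five_number_summary(data), iqr, data_range)
```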

OUTLIERS
outlier < Q1 − 1.5 * IQR; outlier > Q3 + 1.5 * IQR
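A minimal sketch of the 1.5 * IQR rule, assuming Q1 and Q3 have already been computed. The function names and the data are hypothetical.

```python
def outlier_fences(q1, q3):
    """Lower and upper fences: Q1 - 1.5 * IQR and Q3 + 1.5 * IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(xs, q1, q3):
    """Values strictly below the lower fence or strictly above the upper fence."""
    lower, upper = outlier_fences(q1, q3)
    return [x for x in xs if x < lower or x > upper]


# Hypothetical data whose quartiles are assumed to be Q1 = 4 and Q3 = 6.
data = [2, 4, 4, 4, 5, 5, 7, 15]
print(outlier_fences(4, 6))
print(find_outliers(data, 4, 6))
```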
TWO WAY CONTINGENCY TABLE
Association between 2 categorical variables - relative frequencies. Bivariate Categorical Data.

Row Total and Column Total. Row Relative Frequencies (cell frequency/row total) & Column
Relative Frequencies (cell frequency/column total). If the row or column relative frequencies are
the same for all rows/columns then the 2 variables are not associated.
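The row and column relative frequencies can be computed from a table of counts as sketched below. The counts are made up for illustration, and the function names are assumptions. In this example every row has the same relative frequencies, so by the rule above the two variables are not associated.

```python
def row_relative_frequencies(table):
    """Each cell divided by its row total."""
    return [[cell / sum(row) for cell in row] for row in table]


def column_relative_frequencies(table):
    """Each cell divided by its column total."""
    column_totals = [sum(column) for column in zip(*table)]
    return [[cell / total for cell, total in zip(row, column_totals)] for row in table]


# Hypothetical counts: rows are one categorical variable, columns the other.
counts = [
    [30, 20],
    [60, 40],
]
row_freqs = row_relative_frequencies(counts)
col_freqs = column_relative_frequencies(counts)
print(row_freqs)
print(col_freqs)
```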

STACKED/SEGMENTED BAR CHART


Each bar represents the count of one category, and its segments represent the frequencies of a second category within it. A 100 percent stacked bar chart is useful for part-to-whole relationships (e.g. Gender vs Phone ownership).

SCATTER PLOT
y-axis : response variable ; x-axis : explanatory variable

e.g. x-axis : age and y-axis : height



COVARIANCE

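$$cov(x, y)_{population} = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{N} = \frac{\sum_{i=1}^{N} (x_i \cdot y_i) - \frac{\sum_{i=1}^{N} x_i \cdot \sum_{i=1}^{N} y_i}{N}}{N}$$

Expanding the numerator:

$$\begin{aligned}
\text{numerator} &= \sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y}) \\
&= \sum_{i=1}^{N} (x_i \cdot y_i - x_i \cdot \bar{y} - \bar{x} \cdot y_i + \bar{x} \cdot \bar{y}) \\
&= \sum_{i=1}^{N} x_i \cdot y_i - \sum_{i=1}^{N} x_i \cdot \bar{y} - \sum_{i=1}^{N} \bar{x} \cdot y_i + \sum_{i=1}^{N} \bar{x} \cdot \bar{y} \\
&= \sum_{i=1}^{N} x_i \cdot y_i - \bar{y} \cdot \sum_{i=1}^{N} x_i - \bar{x} \cdot \sum_{i=1}^{N} y_i + \sum_{i=1}^{N} \bar{x} \cdot \bar{y} \\
&= \sum_{i=1}^{N} x_i \cdot y_i - \bar{y} \cdot N \cdot \bar{x} - \bar{x} \cdot N \cdot \bar{y} + N \cdot \bar{x} \cdot \bar{y} \\
&= \sum_{i=1}^{N} x_i \cdot y_i - N \cdot \bar{x} \cdot \bar{y} \\
&= \sum_{i=1}^{N} x_i \cdot y_i - \frac{\sum_{i=1}^{N} x_i \cdot \sum_{i=1}^{N} y_i}{N}
\end{aligned}$$

$$cov(x, y)_{population} = \frac{\sum_{i=1}^{N} x_i \cdot y_i - \frac{\sum_{i=1}^{N} x_i \cdot \sum_{i=1}^{N} y_i}{N}}{N} \qquad cov(x, y)_{sample} = \frac{\sum_{i=1}^{n} x_i \cdot y_i - \frac{\sum_{i=1}^{n} x_i \cdot \sum_{i=1}^{n} y_i}{n}}{n - 1}$$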
Covariance quantifies the strength of the linear association between 2 numerical variables. Units
of Covariance: x-variable X y-variable (e.g. years.cm or years.Rs); hence difficult to interpret.
When both variables are moving in the same direction covariance is positive.
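As a quick check that the definitional and computational forms of the covariance agree, here is a small Python sketch. The function names and the paired data are illustrative assumptions.

```python
def covariance_definitional(xs, ys, sample=True):
    """Sum of products of deviations, over n - 1 (sample) or N (population)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return numerator / (n - 1 if sample else n)


def covariance_computational(xs, ys, sample=True):
    """Shortcut form: sum(x*y) - sum(x)*sum(y)/n, over n - 1 or N."""
    n = len(xs)
    numerator = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    return numerator / (n - 1 if sample else n)


# Hypothetical paired data: both variables tend to rise together.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(covariance_definitional(x, y))                 # sample covariance
print(covariance_computational(x, y, sample=False))  # population covariance
```

Both variables move in the same direction here, so the covariance comes out positive.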

CORRELATION COEFFICIENT

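$$r = \frac{cov(x, y)}{S_x \cdot S_y} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \cdot \sum_{i=1}^{n} (y_i - \bar{y})^2}} = \frac{\sum_{i=1}^{n} x_i \cdot y_i - \frac{\sum_{i=1}^{n} x_i \cdot \sum_{i=1}^{n} y_i}{n}}{\sqrt{(\sum x^2 - n \cdot \bar{x}^2) \cdot (\sum y^2 - n \cdot \bar{y}^2)}}$$

$$-1 \le r \le +1$$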
The Pearson correlation coefficient is derived from covariance. Divide covariance of x and y by
product of standard deviations of x and y. Units of standard deviations cancel out the units of
covariance.
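The two equivalent forms of the Pearson coefficient (deviation form and shortcut form) can be compared numerically. This is a sketch with assumed function names and made-up data.

```python
import math


def pearson_r_deviation(xs, ys):
    """Sum of products of deviations over the root of the product of squared-deviation sums."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def pearson_r_shortcut(xs, ys):
    """Shortcut form using only the column totals of x, y, xy, x^2 and y^2."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    numerator = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    sxx = sum(x * x for x in xs) - n * x_bar ** 2
    syy = sum(y * y for y in ys) - n * y_bar ** 2
    return numerator / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]  # hypothetical data
y = [2, 4, 5, 4, 5]
print(pearson_r_deviation(x, y), pearson_r_shortcut(x, y))
```

The divisor (n or n - 1) cancels between the covariance and the two standard deviations, so r is the same for the sample and population versions.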


POINT BI-SERIAL CORRELATION COEFFICIENT

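Used when one variable is dichotomous (a categorical variable with 2 categories). X is a numerical variable and Y is a categorical variable.

e.g.: Gender (Y0 and Y1) and Marks (X)

$$r_{pb} = \left( \frac{\bar{Y}_0 - \bar{Y}_1}{S_x} \right) \cdot \sqrt{p_0 \cdot p_1} \qquad p_0 = \frac{n_0}{n}; \quad p_1 = \frac{n_1}{n}$$

Here $\bar{Y}_0$ and $\bar{Y}_1$ are the means of X within categories Y0 and Y1, and $n_0$, $n_1$ are the numbers of observations in each category.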
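A sketch of the point bi-serial formula in Python. Two assumptions are made that the sheet does not state: S_x is taken as the standard deviation of all X values with divisor n, and the group labels are coded 0 and 1. The function name and the marks are hypothetical.

```python
import math


def point_biserial(xs, groups):
    """r_pb = ((mean of X in group 0 - mean of X in group 1) / S_x) * sqrt(p0 * p1).

    Assumption: S_x is the standard deviation of all X values with divisor n.
    """
    n = len(xs)
    group0 = [x for x, g in zip(xs, groups) if g == 0]
    group1 = [x for x, g in zip(xs, groups) if g == 1]
    mean0 = sum(group0) / len(group0)
    mean1 = sum(group1) / len(group1)
    overall_mean = sum(xs) / n
    s_x = math.sqrt(sum((x - overall_mean) ** 2 for x in xs) / n)
    p0, p1 = len(group0) / n, len(group1) / n
    return (mean0 - mean1) / s_x * math.sqrt(p0 * p1)


# Hypothetical marks (X) for two categories coded 0 and 1 (Y).
marks = [50, 60, 70, 80]
category = [0, 0, 1, 1]
print(point_biserial(marks, category))
```

With the difference taken as group 0 minus group 1, the sign is negative when group 1 has the higher mean. Under the divisor-n assumption, the magnitude equals the Pearson r between X and the 0/1 coding.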

GOODNESS OF FIT

$$R^2 = r^2$$

$R^2$ is a measure of the proportion of variance in the data-set explained by the explanatory variable. The measure is closer to 1 when r is closer to -1 or +1.
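One way to see the relation R^2 = r^2 numerically is to fit a least-squares straight line and compare the proportion of explained variance with the squared correlation. The least-squares fit is an assumption introduced here for illustration, since the sheet does not name a model, and the function names and data are hypothetical.

```python
import math


def pearson_r(xs, ys):
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def r_squared_from_line_fit(xs, ys):
    """1 - (unexplained variation / total variation) for a least-squares line."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residual_ss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    total_ss = sum((y - y_bar) ** 2 for y in ys)
    return 1 - residual_ss / total_ss


x = [1, 2, 3, 4, 5]  # hypothetical explanatory variable
y = [2, 4, 5, 4, 5]  # hypothetical response variable
print(r_squared_from_line_fit(x, y), pearson_r(x, y) ** 2)
```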

EFFECT OF MANIPULATING DATA WITH CONSTANT


                          Add Constant (+C)   Multiply Constant (xC)   Outliers

Mean                      +C                  *C                       Affected
Median                    +C                  *C                       Not Affected
Mode                      +C                  *C                       Not Affected
Variance                  no effect           *C^2                     Affected
Standard Deviation        no effect           *C                       Affected
Covariance                ?                   ?                        ?
Correlation Coefficient   ?                   ?                        ?
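The filled-in rows of the table can be verified with Python's standard `statistics` module. The data set and the constant C = 3 are arbitrary choices for illustration, and population variance and standard deviation are used. The rows marked "?" are left out.

```python
import statistics as st

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
C = 3                            # arbitrary positive constant


def summary(xs):
    """Mean, median, mode, population variance and population standard deviation."""
    return {
        "mean": st.mean(xs),
        "median": st.median(xs),
        "mode": st.mode(xs),
        "variance": st.pvariance(xs),
        "sd": st.pstdev(xs),
    }


base = summary(data)
plus_c = summary([x + C for x in data])   # add the constant to every value
times_c = summary([x * C for x in data])  # multiply every value by the constant

for name in base:
    print(name, base[name], plus_c[name], times_c[name])

# Effect of an outlier: replace the largest value with a much larger one.
with_outlier = summary(data[:-1] + [90])
print(with_outlier["mean"], with_outlier["median"])
```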

Calculation Tables for Computational Formulae (Shortcuts)

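Variance (sample). Table columns: x, x^2

$$S^2_{sample} = \frac{\sum_{i=1}^{n} (x_i^2) - n\bar{x}^2}{n - 1}$$

Covariance (sample). Table columns: x, y, xy

$$cov(x, y)_{sample} = \frac{\sum_{i=1}^{n} x_i \cdot y_i - \frac{\sum_{i=1}^{n} x_i \cdot \sum_{i=1}^{n} y_i}{n}}{n - 1}$$

Correlation Coefficient. Table columns: x, y, xy, x^2, y^2

$$r = \frac{\sum_{i=1}^{n} x_i \cdot y_i - \frac{\sum_{i=1}^{n} x_i \cdot \sum_{i=1}^{n} y_i}{n}}{\sqrt{(\sum x^2 - n \cdot \bar{x}^2) \cdot (\sum y^2 - n \cdot \bar{y}^2)}}$$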
