PARAMETRIC-TEST
Variance
• Gives information on the variability of a single variable.

$$ s_x^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1} $$

Covariance
• Gives information on the degree to which two variables vary together.
• Note how similar the covariance is to the variance: the equation simply multiplies x's error scores by y's error scores, as opposed to squaring x's error scores.

$$ \operatorname{cov}(x, y) = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{n-1} $$
When X ↑ and Y ↑: cov(x, y) = positive.
When X ↑ and Y ↓: cov(x, y) = negative.
When there is no constant relationship: cov(x, y) = 0.
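As a quick check on these formulas, here is a minimal Python sketch of the sample variance and covariance computed from scratch; the data values are invented purely for illustration.

```python
# Sample variance and covariance, straight from the formulas above.
def variance(x):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(x)
    x_bar = sum(x) / n
    return sum((xi - x_bar) ** 2 for xi in x) / (n - 1)

def covariance(x, y):
    """Cross-products of x and y deviations, divided by n - 1."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]        # tends to rise with x
print(variance(x))          # 2.5
print(covariance(x, y))     # 2.0 (positive: x and y rise together)
```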
Problem with Covariance:
The value of the covariance depends on the size of the data's standard deviations: if they are large, the covariance will be larger than if they are small, even when the relationship between x and y is exactly the same in both datasets.
Example of how the covariance value relies on variance
[Scatter plots: high-variance data vs. low-variance data, showing the same x–y relationship]
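A small demonstration of this scale dependence, reusing the covariance helper and the invented data from the sketch above: rescaling both variables by 10 leaves the relationship untouched but inflates the covariance by a factor of 100.

```python
scale = 10
x_big = [xi * scale for xi in x]
y_big = [yi * scale for yi in y]
print(covariance(x, y))          # 2.0
print(covariance(x_big, y_big))  # 200.0: same relationship, larger value
```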
Pearson's R
The solution is to standardise the covariance by the two standard deviations, giving Pearson's correlation coefficient:

$$ r_{xy} = \frac{\operatorname{cov}(x, y)}{s_x s_y} $$
Pearson's R continued

$$ \operatorname{cov}(x, y) = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{n-1} \qquad r_{xy} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{(n-1)\, s_x s_y} $$

$$ r_{xy} = \frac{\sum_{i=1}^{n} Z_{x_i} Z_{y_i}}{n-1} $$
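The two forms can be checked against each other with a short sketch; it reuses the covariance helper and the invented data from earlier, plus the standard library's statistics module for the sample standard deviations.

```python
import statistics as st

def pearson_r(x, y):
    """Covariance standardised by both standard deviations."""
    return covariance(x, y) / (st.stdev(x) * st.stdev(y))

def pearson_r_zscores(x, y):
    """Equivalent form: sum of products of z-scores over n - 1."""
    n = len(x)
    zx = [(xi - st.mean(x)) / st.stdev(x) for xi in x]
    zy = [(yi - st.mean(y)) / st.stdev(y) for yi in y]
    return sum(a * b for a, b in zip(zx, zy)) / (n - 1)

print(pearson_r(x, y))           # ~0.853
print(pearson_r_zscores(x, y))   # same value
print(pearson_r(x_big, y_big))   # unchanged by rescaling, unlike covariance
```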
Limitations of r
When r = 1 or r = -1:
we can predict y from x with certainty;
all data points lie on a straight line: y = ax + b.
r is actually r̂:
r = the true correlation of the whole population;
r̂ = the estimate of r based on the sample data.
r is very sensitive to extreme values: a single extreme point can change r dramatically.
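To illustrate the sensitivity to extreme values, the hypothetical sketch below adds one outlier to otherwise uncorrelated noise and recomputes r with the helper defined earlier.

```python
import random

random.seed(0)
x0 = [random.gauss(0, 1) for _ in range(20)]
y0 = [random.gauss(0, 1) for _ in range(20)]
print(pearson_r(x0, y0))                 # near 0: independent noise
print(pearson_r(x0 + [10], y0 + [10]))   # one extreme point pulls r toward 1
```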
Regression
Correlation tells you if there is an association
between x and y but it doesn’t describe the
relationship or allow you to predict one
variable from the other.
[Regression plot legend:]
ŷ = predicted value
y_i = true (observed) value
ε = residual error
Least Squares Regression
To find the best line we must minimise the sum of the squares of the residuals (the vertical distances from the data points to our line).

Model line: ŷ = ax + b    (a = slope, b = intercept)
Residual: ε = y − ŷ
Sum of squares of residuals: Σ(y − ŷ)²

The least-squares line passes through the point of means (x̄, ȳ), so ȳ = a·x̄ + b, which gives b = ȳ − a·x̄.
Substituting the equation for the slope, a = r·s_y / s_x, into this gives:

$$ b = \bar{y} - r\,\frac{s_y}{s_x}\,\bar{x} $$

where r = correlation coefficient of x and y, s_y = standard deviation of y, and s_x = standard deviation of x.
Back to the model

$$ \hat{y} = ax + b = r\,\frac{s_y}{s_x}\,x + \bar{y} - r\,\frac{s_y}{s_x}\,\bar{x} $$

Rearranges to:

$$ \hat{y} = r\,\frac{s_y}{s_x}\,(x - \bar{x}) + \bar{y} $$
If the correlation is zero, we will simply predict the mean of y for every value of x, and our regression line is just a flat horizontal line at height ȳ (crossing the y-axis at ȳ).
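A minimal sketch of the whole fit, using the slope and intercept formulas derived above and reusing pearson_r and the invented data from earlier:

```python
import statistics as st

def fit_line(x, y):
    r = pearson_r(x, y)
    a = r * st.stdev(y) / st.stdev(x)   # slope: a = r * sy / sx
    b = st.mean(y) - a * st.mean(x)     # intercept: line passes through the means
    return a, b

a, b = fit_line(x, y)
y_hat = [a * xi + b for xi in x]                   # predicted values
residuals = [yi - yh for yi, yh in zip(y, y_hat)]  # epsilon = y - y_hat
print(a, b)                                        # 0.8, 1.8
print(sum(e ** 2 for e in residuals))              # the quantity being minimised
```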
We can calculate the regression line for any data, but the important question is how well this line fits the data, or how good it is at predicting y from x.
How good is our model?

Total variance of y:

$$ s_y^2 = \frac{\sum (y_i - \bar{y})^2}{n-1} = \frac{SS_y}{df_y} $$
Confidence Interval
A range of values of a sample statistic that is likely (at a given level of probability, i.e. the confidence level) to contain a population parameter.
Equivalently, the interval that will include the population parameter a certain percentage of the time (= the confidence level).
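As a hedged sketch, assuming scipy is available: a 95% confidence interval for a sample mean built from the t-distribution (the sample values are invented).

```python
import statistics as st
from scipy import stats

sample = [52, 48, 55, 51, 49, 53, 50, 54]   # hypothetical measurements
n = len(sample)
mean = st.mean(sample)
sem = st.stdev(sample) / n ** 0.5           # standard error of the mean
t_crit = stats.t.ppf(0.975, df=n - 1)       # two-sided 95% critical value
print(mean - t_crit * sem, mean + t_crit * sem)
```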
Confidence Interval for difference
and Hypothesis Test
When the value 0 is not included in the interval,
that means 0 (no difference) is not a plausible
population value.
It appears unlikely that the true difference
between Company A’s salary average and the
national salary average is 0.
Therefore, Company A’s salary average is
significantly different from the national salary
average.
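A sketch of the salary example, assuming scipy; the salaries and the national average are invented numbers. A one-sample t-test asks whether the difference from the national average could plausibly be 0.

```python
from scipy import stats

company_a = [62, 58, 65, 61, 59, 63, 60, 64]   # hypothetical salaries (k$)
national_avg = 55
result = stats.ttest_1samp(company_a, popmean=national_avg)
print(result.statistic, result.pvalue)  # small p-value: a 0 difference is implausible
```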
Independent-Sample T test
Evaluates the difference between the means of two independent groups.
Also called "Between Groups T test".
H0: μ1 = μ2
H1: μ1 ≠ μ2
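A minimal scipy sketch with two invented groups:

```python
from scipy import stats

group1 = [23, 25, 28, 22, 26]
group2 = [30, 29, 33, 31, 28]
t_stat, p_val = stats.ttest_ind(group1, group2)
print(t_stat, p_val)   # p < .05 would reject H0: mu1 = mu2
```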
Paired-Sample T test
Evaluates whether the mean of the differences between the paired variables is significantly different from zero.
Applicable to 1) repeated measures and 2) matched subjects.
Also called "Within Subject T test" or "Repeated Measures T test".
H0: μd = 0
H1: μd ≠ 0
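A minimal scipy sketch of a hypothetical before/after (repeated measures) design:

```python
from scipy import stats

before = [10, 12, 9, 11, 13, 10]
after  = [12, 14, 9, 13, 15, 11]
t_stat, p_val = stats.ttest_rel(before, after)
print(t_stat, p_val)   # tests H0: mean of the paired differences = 0
```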
Analysis of Variance (ANOVA)
ANOVA:
Compares more than two groups.
Tests the null hypothesis that all of the populations have the same average.
ANOVA example
Example: Curricula A, B, C.
You want to know:
what the average score on the test of computer operations would have been if the entire population of 4th graders in the school system had been taught using Curriculum A;
what the population average would have been had they been taught using Curriculum B;
what the population average would have been had they been taught using Curriculum C.
Null Hypothesis: The population averages would have been identical regardless of the curriculum used.
Alternative Hypothesis: The population averages differ for at least one pair of the populations.
ANOVA: F-ratio
The variation in the averages of these samples, from one sample to the next, will be compared to the variation among individual observations within each of the samples.
A statistic termed the F-ratio will be computed. It summarises the variation among sample averages relative to the variation among individual observations within samples.
This F-statistic will be compared to tabulated critical values that correspond to selected alpha levels.
If the computed value of the F-statistic is larger than the critical value, the null hypothesis of equal population averages will be rejected in favor of the alternative that the population averages differ.
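A hedged sketch of the curriculum example with scipy's one-way ANOVA; all scores are invented, and the critical value is looked up for alpha = .05.

```python
from scipy import stats

curriculum_a = [78, 82, 75, 80, 79]
curriculum_b = [85, 88, 84, 87, 86]
curriculum_c = [70, 74, 72, 71, 73]
f_stat, p_val = stats.f_oneway(curriculum_a, curriculum_b, curriculum_c)

df_between, df_within = 2, 12                 # k - 1 and N - k
f_crit = stats.f.ppf(0.95, df_between, df_within)
print(f_stat, f_crit, p_val)                  # f_stat > f_crit: reject H0
```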
Interpreting Significance
p<.05
The probability of observing an F-statistic at
least this large, given that the null hypothesis
was true, is less than .05.
Logic of ANOVA
If 2 or more populations have identical averages, the averages of random samples selected from those populations ought to be fairly similar as well.

For the two-group case, t² = F.

$$ MS_{tr} = \frac{SS_{tr}}{k-1} \qquad MSE = \frac{SS_{error}}{(n-1)(k-1)} \qquad F = \frac{MS_{tr}}{MSE} $$

(k = number of groups, n = number of observations per group)
Within-Subject (Repeated Measures) ANOVA
Interaction Effect
When the relationship between the dependent variable and one independent variable differs according to the level of a second independent variable.
Equivalently: when the effect of one independent variable on the dependent variable differs at different levels of the second independent variable.
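One way to test an interaction is a two-way ANOVA; the sketch below uses statsmodels with an invented dataset and made-up factor names (method, dose), purely as an assumption-laden illustration.

```python
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

df = pd.DataFrame({
    "score":  [5, 6, 7, 6, 9, 10, 11, 10, 6, 5, 6, 7, 12, 13, 14, 13],
    "method": ["A"] * 8 + ["B"] * 8,
    "dose":   ["low"] * 4 + ["high"] * 4 + ["low"] * 4 + ["high"] * 4,
})
model = smf.ols("score ~ C(method) * C(dose)", data=df).fit()
print(anova_lm(model, typ=2))   # the C(method):C(dose) row tests the interaction
```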
T-distribution
A family of theoretical probability distributions used in hypothesis testing.
The larger the sample, the more closely the t-distribution approximates the normal distribution. For samples larger than 120, the two are practically equivalent.
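The convergence can be seen in the two-sided 5% critical values, which approach the normal value of 1.96 as the degrees of freedom grow (scipy assumed):

```python
from scipy import stats

for df in [5, 30, 120, 1000]:
    print(df, round(stats.t.ppf(0.975, df), 3))   # 2.571, 2.042, 1.98, 1.962
print("normal", round(stats.norm.ppf(0.975), 3))  # 1.96
```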