
PARAMETRIC TEST

The relationship between x and y

 Correlation: is there a relationship between 2 variables?
 Regression: how well does a certain independent variable predict the dependent variable?
 CORRELATION ≠ CAUSATION
 In order to infer causality: manipulate the independent variable and observe the effect on the dependent variable
Scattergrams

[Three scatterplots of y against x: positive correlation, negative correlation, no correlation]


Variance vs Covariance
 Do two variables change together?

Variance:
•Gives information on variability of a single variable.

   s²x = Σᵢ (xᵢ - x̄)² / (n - 1)

Covariance:
•Gives information on the degree to which two variables vary together.
•Note how similar the covariance is to variance: the equation simply multiplies x's error scores by y's error scores as opposed to squaring x's error scores.

   cov(x, y) = Σᵢ (xᵢ - x̄)(yᵢ - ȳ) / (n - 1)
Covariance

   cov(x, y) = Σᵢ (xᵢ - x̄)(yᵢ - ȳ) / (n - 1)

 When X and Y increase together: cov(x, y) is positive.
 When X increases while Y decreases: cov(x, y) is negative.
 When there is no constant relationship: cov(x, y) = 0
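As a quick illustration (not from the slides), the sketch below computes the sample variance and covariance with the formulas above, using the high-variance x and y columns from the example table further down; NumPy's built-ins are shown as a cross-check.

```python
import numpy as np

# High-variance example data (x and y columns from the example table below)
x = np.array([101, 81, 61, 51, 41, 21, 1], dtype=float)
y = np.array([100, 80, 60, 50, 40, 20, 0], dtype=float)
n = len(x)

# Sample variance of x: sum of squared deviations divided by n - 1
var_x = np.sum((x - x.mean()) ** 2) / (n - 1)

# Sample covariance: products of x's and y's deviations, divided by n - 1
cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)

print(var_x, cov_xy)                           # cov_xy is 1166.67, as on the slide
print(np.var(x, ddof=1), np.cov(x, y)[0, 1])   # NumPy gives the same values
```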
Problem with Covariance:
 The value obtained by covariance is dependent on the size of
the data’s standard deviations: if large, the value will be
greater than if small… even if the relationship between x and y
is exactly the same in the large versus small standard
deviation datasets.
Example of how covariance value
relies on variance
High variance data                         Low variance data

Subject    x      y    x error * y error      x     y    x error * y error
1        101    100         2500             54    53          9
2         81     80          900             53    52          4
3         61     60          100             52    51          1
4         51     50            0             51    50          0
5         41     40          100             50    49          1
6         21     20          900             49    48          4
7          1      0         2500             48    47          9
Mean      51     50                          51    50

Sum of x error * y error: 7000             Sum of x error * y error: 28

Covariance: 1166.67                        Covariance: 4.67


Solution: Pearson’s r
 On its own, the covariance value does not tell us how strong the relationship is, because it depends on the scale of the data.

 Solution: standardise this measure

 Pearson's r: standardises the covariance value.

 Divides the covariance by the multiplied standard deviations of X and Y:

   rxy = cov(x, y) / (sx · sy)
Pearson's r continued

   cov(x, y) = Σᵢ (xᵢ - x̄)(yᵢ - ȳ) / (n - 1)

   rxy = Σᵢ (xᵢ - x̄)(yᵢ - ȳ) / [(n - 1) sx sy]

   rxy = Σᵢ Zxᵢ · Zyᵢ / (n - 1)
Limitations of r
 When r = 1 or r = -1:
 We can predict y from x with certainty
 all data points are on a straight line: y = ax + b
 The r we compute is actually r̂, an estimate:
 r = true r of the whole population
 r̂ = estimate of r based on the sample data

 r is very sensitive to extreme values: a single outlier can change its value dramatically.
Regression
 Correlation tells you if there is an association
between x and y but it doesn’t describe the
relationship or allow you to predict one
variable from the other.

 To do this we need REGRESSION!


Best-fit Line
 Aim of linear regression is to fit a straight line, ŷ = ax + b, to data that
gives the best prediction of y for any value of x
 This will be the line that minimises the distance between the data and the
fitted line, i.e. the residuals

   ŷ = ax + b      (a = slope, b = intercept)

   ŷ = predicted value
   yᵢ = true value
   ε = residual error
Least Squares Regression
 To find the best line we must minimise the sum of
the squares of the residuals (the vertical distances
from the data points to our line)
Model line: ŷ = ax + b   (a = slope, b = intercept)

Residual (ε) = y - ŷ
Sum of squares of residuals = Σ (y – ŷ)²

 we must find the values of a and b that minimise Σ (y – ŷ)²
The solution
 Doing this gives the following equations for a and b:

   a = r · sy / sx        r = correlation coefficient of x and y
                          sy = standard deviation of y
                          sx = standard deviation of x

 From this you can see that:

 A low correlation coefficient gives a flatter slope (small value of a)
 A large spread of y, i.e. high standard deviation, results in a steeper slope (high value of a)
 A large spread of x, i.e. high standard deviation, results in a flatter slope (small value of a)
The solution cont.
 Our model equation is ŷ = ax + b
 This line must pass through the mean, so:

   ȳ = a·x̄ + b, which gives b = ȳ – a·x̄

 We can put our equation for a into this, giving:

   b = ȳ – (r · sy / sx) · x̄       r = correlation coefficient of x and y
                                    sy = standard deviation of y
                                    sx = standard deviation of x
Back to the model
 Substituting a and b:

   ŷ = ax + b = (r · sy / sx) · x + ȳ – (r · sy / sx) · x̄

 Rearranges to:

   ŷ = (r · sy / sx) · (x – x̄) + ȳ

 If the correlation is zero, we will simply predict the mean of y for every
value of x, and our regression line is just a flat, horizontal line at height ȳ

 But this isn’t very useful.

 We can calculate the regression line for any data, but the important
question is how well does this line fit the data, or how good is it at
predicting y from x
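A minimal sketch, with made-up x and y values, of the slope and intercept formulas above (a = r·sy/sx, b = ȳ – a·x̄), cross-checked against NumPy's least-squares fit:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2])

r = np.corrcoef(x, y)[0, 1]
a = r * y.std(ddof=1) / x.std(ddof=1)   # slope
b = y.mean() - a * x.mean()             # intercept (line passes through the means)

a_np, b_np = np.polyfit(x, y, deg=1)    # NumPy's least-squares fit
print(a, b, a_np, b_np)                 # the two pairs agree
```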
How good is our model?
 Total variance of y:

   sy² = Σ(y – ȳ)² / (n – 1) = SSy / dfy

 Variance of the predicted y values (ŷ):

   sŷ² = Σ(ŷ – ȳ)² / (n – 1) = SSpred / dfŷ
   This is the variance explained by our regression model

 Error variance:

   serror² = Σ(y – ŷ)² / (n – 2) = SSer / dfer
   This is the variance of the error between our predicted y values and the
   actual y values, and thus is the variance in y that is NOT explained by
   the regression model
How good is our model cont.
 Since sŷ² = r² · sy², insert this into sy² = sŷ² + ser² and rearrange to get:

   ser² = sy² – r² · sy²
        = sy² (1 – r²)

 From this we can see that the greater the correlation, the smaller the error
variance, so the better our prediction
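A minimal sketch, on made-up data, showing numerically that the variance left unexplained by the regression equals sy²(1 – r²). The same n – 1 denominator is used for every term here; the slide's error variance uses n – 2 degrees of freedom, which changes the scaling but not the idea.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2])
n = len(y)

a, b = np.polyfit(x, y, deg=1)     # fitted slope and intercept
y_hat = a * x + b                  # predicted values

s_y2 = np.sum((y - y.mean()) ** 2) / (n - 1)         # total variance of y
s_pred2 = np.sum((y_hat - y.mean()) ** 2) / (n - 1)  # variance explained by the model
r2 = np.corrcoef(x, y)[0, 1] ** 2

print(s_pred2, r2 * s_y2)                  # explained variance = r^2 * s_y^2
print(s_y2 - s_pred2, s_y2 * (1 - r2))     # unexplained variance = s_y^2 * (1 - r^2)
```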
General Linear Model
 Linear regression is actually a form of the
General Linear Model where the parameters
are a, the slope of the line, and b, the intercept.
y = ax + b + ε
 A General Linear Model is just any model that
describes the data in terms of a straight line
COMPARATIVE
t(ea) for Two:
Test between the Means of Different Groups
 When you want to know if there is a ‘difference’
between the two groups in the mean
 Use “t-test”.
 Why can’t we just use the “difference” in score?
 Because we have to take the ‘variability’ into
account.
 t = difference between group means / sampling variability
One-Sample T Test
 Evaluates whether the mean on a test
variable is significantly different from a
constant (test value).
 Test value typically represents a neutral
point. (e.g. midpoint on the test variable,
the average value of the test variable
based on past research)
Example of One-sample T-test
 Is the starting salary of company A
($17,016.09) the same as the average of
the starting salary of the national average
($20,000)?
 Null Hypothesis:
Starting salary of company A = National average

 Alternative Hypothesis:
Starting salary of company A ≠ National average
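A minimal sketch of this test in SciPy, assuming Company A's individual starting salaries are available in an array; the numbers below are made up, and only the $20,000 test value comes from the example.

```python
import numpy as np
from scipy import stats

# Hypothetical starting salaries for a sample of Company A employees
salaries = np.array([16500, 17200, 15800, 18100, 16900, 17600, 17000, 16800])

t_stat, p_value = stats.ttest_1samp(salaries, popmean=20000)
print(t_stat, p_value)   # a small p-value suggests the mean differs from $20,000
```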
 Review:
Standard deviation: Measure of dispersion or
spread of scores in a distribution of scores.
Standard error of the mean: Standard deviation
of sampling distribution. How much the mean
would be expected to vary if the differences
were due only to error variance.
Significance test: Statistical test to determine how likely it is that the
observed characteristics of the samples would have occurred by chance alone
in the population from which the samples were selected; the result is often
reported as a p value.
z and t
 Z score : standardized scores
 Z distribution : normal curve with mean value
z=0
 95% of the people in the given sample (or
population) have
z-scores between –1.96 and 1.96.
 The t distribution is an adjustment of the z distribution
for sample size (the sampling distribution is flatter, with
heavier tails, for small samples).
 t = difference between group means / sampling variability
Confidence Interval
 A range of values of a sample statistic that is
likely (at a given level of probability, i.e.
confidence level) to contain a population
parameter.
 The interval that will include that population
parameter a certain percentage (= confidence
level) of the time.
Confidence Interval for difference
and Hypothesis Test
 When the value 0 is not included in the interval,
that means 0 (no difference) is not a plausible
population value.
 It appears unlikely that the true difference
between Company A’s salary average and the
national salary average is 0.
 Therefore, Company A’s salary average is
significantly different from the national salary
average.
Independent-Sample T test
 Evaluates the difference between the
means of two independent groups.
 Also called “Between Groups T test”

 H₀: μ₁ = μ₂
 H₁: μ₁ ≠ μ₂
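A minimal sketch of a between-groups t-test with SciPy, using made-up scores for the two groups:

```python
import numpy as np
from scipy import stats

group1 = np.array([23, 25, 28, 30, 26, 24, 27], dtype=float)
group2 = np.array([20, 22, 19, 24, 21, 23, 18], dtype=float)

t_stat, p_value = stats.ttest_ind(group1, group2)   # assumes equal variances
print(t_stat, p_value)
# stats.ttest_ind(group1, group2, equal_var=False) gives Welch's t-test instead
```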
Paired-Sample T test
 Evaluates whether the mean of the difference
between the paired variables is significantly
different than zero.
 Applicable to 1) repeated measures and 2)
matched subjects.
 Also called “Within Subject T test” “Repeated
Measures T test”.
 Ho: d= 0
H1: d= 0
Analysis of Variance (ANOVA)

 An inferential statistical procedure used to


test the null hypothesis that the means of
two or more populations are equal to each
other.

 The test statistic for ANOVA is the F-test


(named for R. A. Fisher, the creator of the
statistic).
T test vs. ANOVA
 T-test
 Compare two groups
 Test the null hypothesis that the two populations
have the same average.

 ANOVA:
 Compare more than two groups
 Test the null hypothesis that all of the populations
being compared have the same average.
ANOVA example
 Example: Curricula A, B, C.
 You want to know what the average score on the
test of computer operations would have been
 if the entire population of the 4th graders in the school
system had been taught using Curriculum A;
 What the population average would have been had
they been taught using Curriculum B;
 What the population average would have been had they
been taught using Curriculum C.
 Null Hypothesis: The population averages would have
been identical regardless of the curriculum used.
 Alternative Hypothesis: The population averages differ for
at least one pair of the populations.
ANOVA: F-ratio
 The variation in the averages of these samples, from one sample
to the next, will be compared to the variation among individual
observations within each of the samples.
 Statistic termed an F-ratio will be computed. It will summarize the
variation among sample averages, compared to the variation
among individual observations within samples.
 This F-statistic will be compared to tabulated critical values that
correspond to selected alpha levels.
 If the computed value of the F-statistic is larger than the critical
value, the null hypothesis of equal population averages will be
rejected in favor of the alternative that the population averages
differ.
Interpreting Significance
 p<.05
 The probability of observing an F-statistic at
least this large, given that the null hypothesis
was true, is less than .05.
Logic of ANOVA
 If 2 or more populations have identical averages, the
averages of random samples selected from those
populations ought to be fairly similar as well.

 Sample statistics vary from one sample to the next;
however, large differences among the sample averages
would cause us to question the hypothesis that the
samples were selected from populations with identical
averages.
Logic of ANOVA cont.
 How much should the sample averages differ before we
conclude that the null hypothesis of equal population
averages should be rejected?
 In ANOVA, the answer to this question is obtained by
comparing the variation among the sample averages to
the variation among observations within each of the
samples.
 Only if variation among sample averages is substantially
larger than the variation within the samples, do we
conclude that the populations must have had different
averages.
Three types of ANOVA
 One-way ANOVA

 Within-subjects ANOVA (Repeated measures,


randomized complete block)

 Factorial ANOVA (Two-way ANOVA)


Sources of Variation
 Three sources of variation:
1) Total, 2) Between groups, 3) Within groups
 Sum of Squares (SS): Reflects variation. Depends on sample size.
 Degrees of freedom (df): Based on the number of groups (population
averages) being compared and the sample size.
 Mean Square (MS): SS adjusted by df (SS/df). Mean squares can be compared
with each other.
 F statistic: used to determine whether the population averages are
significantly different. If the computed F statistic is larger than the
critical value that corresponds to a selected alpha level, the null
hypothesis is rejected.
Computing F-ratio
SS total: Total variation in the data
df total: Total sample size (N) - 1
MS total: SS total / df total

SS between: Variation among the groups compared.
df between: Number of groups - 1
MS between: SS between / df between

SS within: Variation among the scores that are in the same group.
df within: Total sample size - number of groups
MS within: SS within / df within

F ratio = MS between / MS within
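A minimal sketch, with made-up scores for three groups, that computes the F-ratio exactly as laid out above and checks it against SciPy's one-way ANOVA:

```python
import numpy as np
from scipy import stats

groups = [np.array([4.0, 5.0, 6.0, 5.0]),
          np.array([7.0, 8.0, 6.0, 7.0]),
          np.array([9.0, 10.0, 11.0, 10.0])]

all_scores = np.concatenate(groups)
grand_mean = all_scores.mean()
n_total, k = len(all_scores), len(groups)

# Variation of group means around the grand mean, weighted by group size
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Variation of scores around their own group mean
ss_within = sum(np.sum((g - g.mean()) ** 2) for g in groups)

ms_between = ss_between / (k - 1)       # df between = number of groups - 1
ms_within = ss_within / (n_total - k)   # df within  = total N - number of groups
f_manual = ms_between / ms_within

f_scipy, p_value = stats.f_oneway(*groups)
print(f_manual, f_scipy, p_value)       # f_manual matches f_scipy
```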
Formula for One-way ANOVA

Formula Name             How To
Sum of Squares Total     Subtract the mean of the entire sample from each
                         score, square each of those deviations, add them up
                         for each group, then add the groups together.
Sum of Squares Among     Each group mean is subtracted from the overall
                         sample mean, squared, multiplied by how many are in
                         that group, then those are summed up. For two
                         groups, we just sum together two numbers.
Sum of Squares Within    Here's a shortcut: just find the SST and the SSA and
                         take the difference. What's left over is the SSW.
Alpha inflation
 Conducting multiple ANOVAs incurs a large risk
that at least one of them will be statistically
significant just by chance.
 The risk of committing a Type I error is very large for
the entire set of ANOVAs.
 Example: 2 tests at .05 alpha
 Probability of no Type I error on either test:
.95 x .95 = .9025
 Probability of at least one Type I error:
1 - .9025 = .0975, close to 10%.
 Use more stringent criteria, e.g. .001
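The arithmetic above generalises to any number of independent tests: the chance of at least one Type I error is 1 - (1 - α)ᵐ. A minimal sketch:

```python
# Family-wise error rate for m independent tests at a given alpha
alpha = 0.05
for m in (1, 2, 5, 10):
    print(m, 1 - (1 - alpha) ** m)   # m = 2 gives 0.0975, close to 10%
```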
Relation between t-test and F-test

 When two groups are compared both t-test and


F-test will lead to the same answer.

 t² = F.

 So by squaring t you'll get F
(or, equivalently, the square root of F is t).
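A minimal sketch, on made-up data for two groups, confirming that the squared t statistic equals the F statistic:

```python
import numpy as np
from scipy import stats

group1 = np.array([23.0, 25.0, 28.0, 30.0, 26.0, 24.0, 27.0])
group2 = np.array([20.0, 22.0, 19.0, 24.0, 21.0, 23.0, 18.0])

t_stat, _ = stats.ttest_ind(group1, group2)
f_stat, _ = stats.f_oneway(group1, group2)
print(t_stat ** 2, f_stat)   # the two values match
```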
Follow-up test
 Conducted to see specifically which means are
different from which other means.

 Instead of repeating t-test for each combination (which


can lead to an alpha inflation) there are some modified
versions of t-test that adjusts for the alpha inflation.

 Most recommended: Tukey HSD test


 Other popular tests: Bonferroni test, Scheffé test
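A minimal sketch of a Tukey HSD follow-up using statsmodels (assuming statsmodels is installed; the scores and group labels below are made up):

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

scores = np.array([4, 5, 6, 5, 7, 8, 6, 7, 9, 10, 11, 10], dtype=float)
labels = np.array(["A"] * 4 + ["B"] * 4 + ["C"] * 4)

result = pairwise_tukeyhsd(endog=scores, groups=labels, alpha=0.05)
print(result)   # table of pairwise mean differences with adjusted decisions
```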
Within-Subject (Repeated
Measures) ANOVA
 SS tr: Sum of Squares Treatment
 SS block: Sum of Squares Block
 SS error = SS total - SS block - SS tr

 MS tr = SS tr / (k - 1)
 MSE = SS error / [(n - 1)(k - 1)]

 F = MS tr / MSE
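A minimal sketch, using made-up scores for five subjects measured under three conditions, that computes the repeated-measures F-ratio from these sums of squares:

```python
import numpy as np

# rows = subjects (blocks), columns = treatments / time points
scores = np.array([[5.0, 7.0, 9.0],
                   [4.0, 6.0, 8.0],
                   [6.0, 7.0, 10.0],
                   [5.0, 8.0, 9.0],
                   [4.0, 6.0, 9.0]])
n, k = scores.shape          # n subjects, k treatments
grand_mean = scores.mean()

ss_total = np.sum((scores - grand_mean) ** 2)
ss_tr = n * np.sum((scores.mean(axis=0) - grand_mean) ** 2)     # treatment SS
ss_block = k * np.sum((scores.mean(axis=1) - grand_mean) ** 2)  # block (subject) SS
ss_error = ss_total - ss_block - ss_tr

ms_tr = ss_tr / (k - 1)
mse = ss_error / ((n - 1) * (k - 1))
print(ms_tr / mse)   # F-ratio for the treatment effect
```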
Within-Subject (Repeated
Measures) ANOVA

 Examine differences on a dependent


variable that has been measured at more
than two time points for one or more
independent categorical variables.
Within-Subject (Repeated
Measures) ANOVA
Formula Name               Description
Sum of Squares Treatment   Represents variation due to treatment effect
Sum of Squares Block       Represents variation within an individual (within block)
Sum of Squares Error       Represents error variation
Sum of Squares Total       Represents total variation
Factorial ANOVA
T-test and One way ANOVA
 1 independent variable (e.g. Gender), 1
dependent variable (e.g. Test score)

Two-way ANOVA (Factorial ANOVA)


 2 (or more) independent variables (e.g. Gender
and Academic Standing), 1 dependent variable
(e.g. Test score)
Main Effects and
Interaction Effects
Main Effects
 The effects for each independent variable on the dependent variable.
 Differences between the group means for each independent
variable on the dependent variable.

Interaction Effect
 When the relationship between the dependent variable and one
independent variable differs according to the level of a second
independent variable.
 When the effect of one independent variable on the dependent
variable differs at various levels of a second independent variable.
T-distribution
 A family of theoretical probability distributions used in hypothesis
testing.

 As with normal distributions (or z-distributions), t distributions are


unimodal, symmetrical and bell shaped.

 Important for interpreting data gathered from small samples when the
population variance is unknown.

 The larger the sample, the more closely the t distribution approximates the
normal distribution. For samples larger than 120, they are practically
equivalent.
