
Three Segments: - Overview - Calculation of R - Assumptions

The document discusses correlation, beginning with an overview that defines correlation, explains what correlations are used for, and notes the need for caution when interpreting correlations. It then covers calculating the correlation coefficient r using either raw scores or z-scores and the relationship between covariance, variance, and r. The final segment provides an example calculating statistics for Jeremy Lin's points per game in 10 games.

Uploaded by

Ricardo Silva
Copyright
© All Rights Reserved

10/12/13

Statistics One
Lecture 5
Correlation

Three segments
•  Overview
•  Calculation of r
•  Assumptions

Lecture 5 ~ Segment 1
Correlation: Overview

Correlation: Overview
•  Important concepts & topics
–  What is a correlation?
–  What are they used for?
–  Scatterplots
–  CAUTION!
–  Types of correlations

Correlation: Overview
•  Correlation
–  A statistical procedure used to measure and describe the relationship between two variables
–  Correlations can range between +1 and -1
•  +1 is a perfect positive correlation
•  0 is no linear correlation (independent variables have r = 0, but r = 0 alone does not guarantee independence)
•  -1 is a perfect negative correlation

Correlation: Overview
•  When two variables, let’s call them X and Y, are correlated, then one variable can be used to predict the other variable
–  More precisely, a person’s score on X can be used to predict his or her score on Y
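A minimal sketch of the three reference values +1, 0 and -1, using the raw score formula that Segment 2 introduces. The helper name `pearson_r` and the toy scores are illustrative assumptions, not lecture material. The last case is worth a look: r comes out as 0 even though Y is completely determined by X, because the relationship is curved rather than linear.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r via the raw score formula: SP / sqrt(SSx * SSy)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ssx = sum((x - mx) ** 2 for x in xs)
    ssy = sum((y - my) ** 2 for y in ys)
    return sp / sqrt(ssx * ssy)

xs = [-2, -1, 0, 1, 2]                            # made-up scores

r_pos = pearson_r(xs, [2 * x + 1 for x in xs])    # points on a rising line
r_neg = pearson_r(xs, [-x for x in xs])           # points on a falling line
r_zero = pearson_r(xs, [x ** 2 for x in xs])      # U-shaped: no linear trend

print(r_pos, r_neg, r_zero)
```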

Correlation: Overview
•  Example:
–  Working memory capacity is strongly correlated with intelligence, or IQ, in healthy young adults
–  So if we know a person’s IQ then we can predict how they will do on a test of working memory

Correlation: Overview
•  CAUTION!
–  Correlation does not imply causation

Correlation: Overview
•  CAUTION!
–  The magnitude of a correlation depends upon many factors, including:
•  Sampling (random and representative?)

Correlation: Overview
•  CAUTION!
–  The magnitude of a correlation is also influenced by:
•  Measurement of X & Y (See Lecture 6)
•  Several other assumptions (See Segment 3)

Correlation: Overview
•  For now, consider just one assumption:
–  Random and representative sampling
–  There is a strong correlation between IQ and working memory among all healthy young adults.
•  What is the correlation between IQ and working memory among college graduates?
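One way to explore the college-graduates question is a small simulation. Everything in it is an assumption made for illustration: the population correlation of 0.7, the IQ cutoff of 115 standing in for "college graduates", the seed, and the helper name `pearson_r`. None of it is lecture data. In a sketch like this the restricted sample tends to show a weaker correlation than the full sample, because narrowing the range of X leaves less shared variation to detect.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r via the raw score formula: SP / sqrt(SSx * SSy)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ssx = sum((x - mx) ** 2 for x in xs)
    ssy = sum((y - my) ** 2 for y in ys)
    return sp / sqrt(ssx * ssy)

rng = random.Random(5)   # fixed seed so the sketch is repeatable
RHO = 0.7                # assumed population correlation (illustrative only)

iq, wm = [], []
for _ in range(5000):
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    iq.append(100 + 15 * z1)                          # simulated IQ score
    wm.append(RHO * z1 + sqrt(1 - RHO ** 2) * z2)     # simulated working memory (z units)

r_all = pearson_r(iq, wm)

# Restricted sample: keep only the high-IQ cases (hypothetical cutoff)
high = [(x, y) for x, y in zip(iq, wm) if x > 115]
r_high = pearson_r([x for x, _ in high], [y for _, y in high])

print(round(r_all, 2), round(r_high, 2))
```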


Correlation: Overview
•  CAUTION!
•  Finally & perhaps most important:
–  The correlation coefficient is a sample statistic, just like the mean
•  It may not be representative of ALL individuals
–  For example, in school I scored very high on Math and Science but below average on Language and History

Correlation: Overview
•  Note: there are several types of correlation coefficients, for different variable types
–  Pearson product-moment correlation coefficient (r)
•  When both variables, X & Y, are continuous
–  Point bi-serial correlation
•  When 1 variable is continuous and 1 is dichotomous

Correlation: Overview
•  Note: there are several types of correlation coefficients
–  Phi coefficient
•  When both variables are dichotomous
–  Spearman rank correlation
•  When both variables are ordinal (ranked data)

Segment summary
•  Important concepts/topics
–  What is a correlation?
–  What are they used for?
–  Scatterplots
–  CAUTION!
–  Types of correlations
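Among the coefficient types listed in this segment, the Spearman rank correlation can be read as Pearson's r computed on ranks instead of raw scores. The sketch below assumes that reading; the helper names (`pearson_r`, `average_ranks`, `spearman_r`) and the data are made up for illustration. With a monotonic but curved relationship, the rank-based coefficient reaches 1 while Pearson's r stays below 1.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r via the raw score formula: SP / sqrt(SSx * SSy)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ssx = sum((x - mx) ** 2 for x in xs)
    ssy = sum((y - my) ** 2 for y in ys)
    return sp / sqrt(ssx * ssy)

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman_r(xs, ys):
    """Spearman rank correlation as Pearson r on the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

X = [1, 2, 3, 4, 5]            # made-up scores
Y = [x ** 3 for x in X]        # monotonic but not linear in X

print(round(pearson_r(X, Y), 3), spearman_r(X, Y))
```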

END SEGMENT

Lecture 5 ~ Segment 2
Calculation of r

Calculation of r
•  Important topics
–  r
•  Pearson product-moment correlation coefficient
–  Raw score formula
–  Z-score formula
–  Sum of cross products (SP) & Covariance

Calculation of r
•  r = the degree to which X and Y vary together, relative to the degree to which X and Y vary independently
•  r = (Covariance of X & Y) / SQRT(Variance of X × Variance of Y)

Calculation of r
•  Two ways to calculate r
–  Raw score formula
–  Z-score formula

Calculation of r
•  Let’s quickly review calculations from Lecture 4 on summary statistics
•  Variance = SD² = MS = (SS/N)

Linsanity!

Jeremy Lin (10 games)

Points per game   (X - M)   (X - M)²
28                   5.3      28.09
26                   3.3      10.89
10                 -12.7     161.29
27                   4.3      18.49
20                  -2.7       7.29
38                  15.3     234.09
23                   0.3       0.09
28                   5.3      28.09
25                   2.3       5.29
2                  -20.7     428.49
M = 227/10 = 22.7   M = 0/10 = 0   M = 922.1/10 = 92.21
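The column totals in the Jeremy Lin table can be reproduced with a few lines of Python. This is only a sketch of the Lecture 4 review steps (mean, deviation scores, SS, variance, SD) using the ten point totals from the table. The variable names are my own, and the variance divides by N, as in the slides.

```python
from math import sqrt

points = [28, 26, 10, 27, 20, 38, 23, 28, 25, 2]   # points per game, from the table

n = len(points)
mean = sum(points) / n                    # M = 227 / 10
deviations = [x - mean for x in points]   # (X - M) for each game
ss = sum(d ** 2 for d in deviations)      # SS = sum of squared deviations
variance = ss / n                         # SD^2 = MS = SS / N (descriptive)
sd = sqrt(variance)                       # SD

print(round(mean, 1), round(ss, 2), round(variance, 2), round(sd, 1))
```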

Results
•  M = Mean = 22.7
•  SD² = Variance = MS = SS/N = 92.21
•  SD = Standard Deviation = 9.6

Just one new concept!
•  SP = Sum of cross Products

Just one new concept!
•  Review: To calculate SS
–  For each row, calculate the deviation score
•  (X - Mx)
–  Square the deviation scores
•  (X - Mx)²
–  Sum the squared deviation scores
•  SSx = Σ[(X - Mx)²] = Σ[(X - Mx) × (X - Mx)]

Just one new concept!
•  To calculate SP
–  For each row, calculate the deviation score on X
•  (X - Mx)
–  For each row, calculate the deviation score on Y
•  (Y - My)

Just one new concept!
•  To calculate SP
–  Then, for each row, multiply the deviation score on X by the deviation score on Y
•  (X - Mx) × (Y - My)
–  Then, sum the “cross products”
•  SP = Σ[(X - Mx) × (Y - My)]

Calculation of r
Raw score formula:
r = SPxy / SQRT(SSx × SSy)
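The SP steps and the raw score formula can be sketched row by row as follows. The paired scores are made up for illustration (the lecture gives no Y values here), and the variable names are my own.

```python
from math import sqrt

# Made-up paired scores, used only to illustrate the steps
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]

mx = sum(X) / len(X)
my = sum(Y) / len(Y)

dev_x = [x - mx for x in X]                          # (X - Mx) for each row
dev_y = [y - my for y in Y]                          # (Y - My) for each row
cross = [dx * dy for dx, dy in zip(dev_x, dev_y)]    # cross product for each row

sp = sum(cross)                       # SP  = sum of cross products
ssx = sum(dx * dx for dx in dev_x)    # SSx = sum of squared deviations on X
ssy = sum(dy * dy for dy in dev_y)    # SSy = sum of squared deviations on Y

r = sp / sqrt(ssx * ssy)              # raw score formula
print(sp, ssx, ssy, round(r, 4))
```

For these toy scores SP = 6, SSx = 10 and SSy = 6, so r = 6 / SQRT(60), roughly 0.77.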


Calculation of r
SPxy = Σ[(X - Mx) × (Y - My)]
SSx = Σ(X - Mx)² = Σ[(X - Mx) × (X - Mx)]
SSy = Σ(Y - My)² = Σ[(Y - My) × (Y - My)]

Formulae to calculate r
r = SPxy / SQRT(SSx × SSy)
r = Σ[(X - Mx) × (Y - My)] / SQRT(Σ(X - Mx)² × Σ(Y - My)²)

Formulae to calculate r
Z-score formula:
r = Σ(Zx × Zy) / N

Formulae to calculate r
Zx = (X - Mx) / SDx
Zy = (Y - My) / SDy
SDx = SQRT(Σ(X - Mx)² / N)
SDy = SQRT(Σ(Y - My)² / N)
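A short sketch of the z-score formula, checked against the raw score formula on the same data. The scores are made up and the helper name `z_scores` is my own. The SDs divide by N, matching the definitions of SDx and SDy in the slides.

```python
from math import sqrt

# Made-up paired scores, used only to illustrate the formula
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
N = len(X)

def z_scores(values):
    """Convert raw scores to z-scores using SD = sqrt(SS / N)."""
    m = sum(values) / len(values)
    sd = sqrt(sum((v - m) ** 2 for v in values) / len(values))
    return [(v - m) / sd for v in values]

zx = z_scores(X)
zy = z_scores(Y)
r_z = sum(a * b for a, b in zip(zx, zy)) / N          # r = sum(Zx * Zy) / N

# Raw score formula on the same data, for comparison
mx, my = sum(X) / N, sum(Y) / N
sp = sum((x - mx) * (y - my) for x, y in zip(X, Y))
ssx = sum((x - mx) ** 2 for x in X)
ssy = sum((y - my) ** 2 for y in Y)
r_raw = sp / sqrt(ssx * ssy)

print(round(r_z, 4), round(r_raw, 4))
```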


Formulae to calculate r
Proof of equivalence:
Zx = (X - Mx) / SQRT(Σ(X - Mx)² / N)
Zy = (Y - My) / SQRT(Σ(Y - My)² / N)

Formulae to calculate r
r = Σ{ [(X - Mx) / SQRT(Σ(X - Mx)² / N)] × [(Y - My) / SQRT(Σ(Y - My)² / N)] } / N
Formulae to calculate r
r = Σ{ [(X - Mx) / SQRT(Σ(X - Mx)² / N)] × [(Y - My) / SQRT(Σ(Y - My)² / N)] } / N
r = Σ[(X - Mx) × (Y - My)] / SQRT(Σ(X - Mx)² × Σ(Y - My)²)
r = SPxy / SQRT(SSx × SSy) ← The raw score formula!

Variance and covariance
•  Variance = MS = SS / N
•  Covariance = COV = SP / N
•  Correlation is standardized COV
–  Standardized so the value is in the range -1 to 1
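The proof of equivalence moves from the z-score expression to the raw score formula in one jump. The intermediate algebra can be written out as below, using the slides' own symbols (SSx, SSy, SPxy, COV, SDx, SDy). The only step not spelled out in the slides is pulling the two square roots out of the sum so that the factors of N cancel.

```latex
\begin{aligned}
r &= \frac{1}{N}\sum\left[\frac{X - M_x}{\sqrt{SS_x/N}}\cdot\frac{Y - M_y}{\sqrt{SS_y/N}}\right] \\
  &= \frac{1}{N}\cdot\frac{\sum (X - M_x)(Y - M_y)}{\sqrt{SS_x\,SS_y}\,/\,N} \\
  &= \frac{SP_{xy}}{\sqrt{SS_x \times SS_y}} \\
  &= \frac{SP_{xy}/N}{\sqrt{SS_x/N}\,\sqrt{SS_y/N}}
   = \frac{COV_{xy}}{SD_x \times SD_y}
\end{aligned}
```

The last line is what "correlation is standardized COV" means: the covariance divided by the product of the two standard deviations.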


Note on the denominators
•  Correlation for descriptive statistics
–  Divide by N
•  Correlation for inferential statistics
–  Divide by N – 1

Segment summary
•  Important topics
–  r
•  Pearson product-moment correlation coefficient
–  Raw score formula
–  Z-score formula
–  Sum of cross Products (SP) & Covariance
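On the note about denominators: the choice between N and N – 1 changes the covariance and the standard deviations, but it cancels in r as long as the same denominator is used for all three. The sketch below checks this on made-up scores; the function name `r_with_denominator` is my own.

```python
from math import sqrt

# Made-up paired scores, used only to illustrate the point
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]

def r_with_denominator(xs, ys, denom):
    """r = COV / (SDx * SDy), with COV and both SDs using the same denominator."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ssx = sum((x - mx) ** 2 for x in xs)
    ssy = sum((y - my) ** 2 for y in ys)
    cov = sp / denom
    sdx = sqrt(ssx / denom)
    sdy = sqrt(ssy / denom)
    return cov / (sdx * sdy)

n = len(X)
r_descriptive = r_with_denominator(X, Y, n)        # divide by N
r_inferential = r_with_denominator(X, Y, n - 1)    # divide by N - 1

print(round(r_descriptive, 4), round(r_inferential, 4))
```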

END SEGMENT

Lecture 5 ~ Segment 3
Assumptions

Assumptions
•  Assumptions when interpreting r
–  Normal distributions for X and Y
–  Linear relationship between X and Y
–  Homoscedasticity

Assumptions
•  Assumptions when interpreting r
–  Reliability of X and Y
–  Validity of X and Y
–  Random and representative sampling

Assumptions
•  Assumptions when interpreting r
–  Normal distributions for X and Y
•  How to detect violations?
–  Plot histograms and examine summary statistics

Assumptions
•  Assumptions when interpreting r
–  Linear relationship between X and Y
•  How to detect violations?
–  Examine scatterplots (see following examples)

Assumptions
•  Assumptions when interpreting r
–  Homoscedasticity
•  How to detect violations?
–  Examine scatterplots (see following examples)

Homoscedasticity
•  In a scatterplot the vertical distance between a dot and the regression line reflects the amount of prediction error (known as the “residual”)

Homoscedasticity
•  Homoscedasticity means that the distances (the residuals) are not related to the variable plotted on the X axis (they are not a function of X)
•  This is best illustrated with scatterplots

Anscombe’s quartet
•  In 1973, statistician Dr. Frank Anscombe developed a classic example to illustrate several of the assumptions underlying correlation and regression
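A sketch of why Anscombe's example is so often cited: the four data sets give almost the same Pearson r even though their scatterplots look very different. The values below are the quartet as it is commonly reproduced. They are not taken from these slides, so treat them as an assumption and check them against Anscombe's published table before relying on them. The helper name `pearson_r` is my own.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r via the raw score formula: SP / sqrt(SSx * SSy)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ssx = sum((x - mx) ** 2 for x in xs)
    ssy = sum((y - my) ** 2 for y in ys)
    return sp / sqrt(ssx * ssy)

# Data sets I-III share the same X values; set IV has its own
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]

quartet = {
    "I":   (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  (x4,   [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

r_values = {name: pearson_r(xs, ys) for name, (xs, ys) in quartet.items()}
for name, r in r_values.items():
    print(name, round(r, 3))
```

Because r alone cannot distinguish these four cases, the scatterplot check recommended in the slides is what reveals curvature, outliers, or uneven spread.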

Anscombe’s quartet
[Scatterplots of the four Anscombe data sets; figures not recovered]

Segment summary
•  Assumptions when interpreting r
–  Normal distributions for X and Y
–  Linear relationship between X and Y
–  Homoscedasticity

Segment summary
•  Assumptions when interpreting r
–  Reliability of X and Y
–  Validity of X and Y
–  Random and representative sampling

END SEGMENT

END LECTURE 5