Chapt 2 Slides

This document discusses quantitative relationships between two numerical variables. It distinguishes between deterministic relationships, where a formula precisely defines the relationship, and non-deterministic relationships, where multiple y values can exist for each x value. For non-deterministic relationships, simple linear regression finds the line of best fit to the data but can only predict average y values, not individual points. The correlation coefficient r measures the strength of linear association between variables on a scale from -1 to 1. Ecological correlations using aggregated group data are prone to ecological and atomistic fallacies in making inferences about individuals. Attenuation effect occurs when range restriction in one variable understates the true correlation.

Uploaded by

erickhadinata
Copyright
© © All Rights Reserved

Quantitative Reasoning

Association
Key Concepts

Sudarshan Narasimhan
(Dash)

[email protected]
Tutor
Provost office, QR team

Outline

Relationship between 2 numerical variables
• Deterministic
• Non-deterministic
• Scatter plot
• Association between x & y variables
• Linear regression
• Correlation coefficient
• Ecological correlation
• Attenuation effect
• Ecological fallacy
• Atomistic fallacy


Deterministic relationships
• E.g., degrees Celsius to Fahrenheit

• We have a formula: given an x value, you can compute the exact value of y, and vice versa.

• The story ends there.
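
A minimal sketch of the Celsius/Fahrenheit example as code, to show what "deterministic" means in practice: each input gives exactly one output, and the inverse formula recovers the input. The function names are chosen here for illustration.

```python
def celsius_to_fahrenheit(c: float) -> float:
    """Deterministic rule: each Celsius value maps to exactly one Fahrenheit value."""
    return c * 9 / 5 + 32


def fahrenheit_to_celsius(f: float) -> float:
    """Inverse rule: recovers the Celsius value from a Fahrenheit value."""
    return (f - 32) * 5 / 9


print(celsius_to_fahrenheit(100))  # 212.0
print(fahrenheit_to_celsius(212))  # 100.0
```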


Non-deterministic relationships
• For each x value, there can exist multiple y values and vice versa

• We WANT to arrive at SOME kind of formula.

• But whatever formula we arrive at, we must understand what it IS and what it is NOT.

Simple Linear Regression
• Regression line (or line of best fit to data)

[Figure: two scatter plots of Son's Height vs Father's Height. x-axis: Father's height; y-axis: Son's Height; both axes run from 55 to 80.]


What does the regression equation mean?
• Suppose the line in the previous slide has the regression equation y = 1.01x + 1.02.

• So what does it mean if I input a value of x = 60?

• Does the value obtained for y correspond to the son’s height, assuming the father’s height is 60 inches?

• What if I have a father whose height is 80 inches? Can I use the regression equation to predict what the son’s height will be?

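
The arithmetic behind these questions can be sketched as below, using the slide's example equation y = 1.01x + 1.02. The function name is hypothetical. The code only evaluates the formula; it will return a number for any input, which is why the interpretation of that number is a separate question.

```python
SLOPE = 1.01      # gradient from the slide's example equation
INTERCEPT = 1.02  # intercept from the slide's example equation


def regression_value(father_height: float) -> float:
    """Evaluate y = 1.01x + 1.02 for a father's height in inches."""
    return SLOPE * father_height + INTERCEPT


# The formula happily produces a value for both inputs;
# it does not know whether 80 inches lies inside the data used to fit the line.
print(round(regression_value(60), 2))  # 61.62
print(round(regression_value(80), 2))  # 81.82
```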
Simple Linear Regression (Exam Point!)

• Regression line (or line of best fit to data): 𝑌 = 𝑚𝑋 + 𝑐

[Figure: two scatter plots of Son's Height vs Father's Height (x-axis: Father's height; y-axis: Son's Height), annotated "𝑚 ≠ 𝑟 (in general)".]

• Can only predict the son’s AVERAGE height!

• CANNOT predict the son’s average height if the father’s height is beyond the range used in the data set! (i.e., can’t simply extrapolate)

• GRADIENT is not the same as the r value in general.
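
The three points above can be sketched in code. This is an illustration under stated assumptions, not the course's own method: the data are invented, the function names are hypothetical, and the line is fitted by ordinary least squares. The sketch refuses to predict outside the observed range and prints the gradient next to r to show that they differ.

```python
import math

# Hypothetical (father, son) heights in inches, invented for illustration.
fathers = [62, 64, 66, 68, 70, 72]
sons = [65, 66, 66, 69, 69, 71]


def fit_line(xs, ys):
    """Return (gradient, intercept, r) for the least-squares line of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    gradient = sxy / sxx
    intercept = my - gradient * mx
    r = sxy / math.sqrt(sxx * syy)
    return gradient, intercept, r


def predict_average(x, xs, ys):
    """Predict the AVERAGE y at x; raise instead of extrapolating."""
    if not min(xs) <= x <= max(xs):
        raise ValueError(f"x={x} is outside the observed range; cannot extrapolate")
    gradient, intercept, _ = fit_line(xs, ys)
    return gradient * x + intercept


gradient, intercept, r = fit_line(fathers, sons)
print(f"gradient = {gradient:.3f}, r = {r:.3f}")
print(f"average son's height at 67 in: {predict_average(67, fathers, sons):.2f}")
```

For a least-squares line the gradient equals r multiplied by (SD of y / SD of x), so the two coincide only when both variables have the same spread.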


Correlation Coefficient, 𝑟 (Exam Point!)
1. Measures linear association between 2 variables (NOT causation!)

2. Ranges between -1 and 1 (no units)

3. 𝑟 > 0 → positive linear association
   𝑟 < 0 → negative linear association
   𝑟 = 0 → no linear association

4. What can you say about the graphs below?

Exam Point!
5. Computing 𝑟

[Figure: scatter plot of Son's Height vs Father's Height (x-axis: Father's height; y-axis: Son's Height; both 55 to 80), split into four regions numbered 1 to 4, with the point (65,71) labelled and the annotation "e.g. -1.1 × 0.71".]

6. 𝑟 is not affected by change of scale

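
A minimal sketch of points 5 and 6. It assumes that the slide's annotation "-1.1 × 0.71" is the product of one point's x and y values in standard units, and that r is the average of such products over all points; the slide does not spell the method out, so treat this as an assumption. The data and function names are invented for illustration.

```python
import math


def standard_units(values):
    """Convert values to standard units: (value - mean) / SD."""
    n = len(values)
    m = sum(values) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / n)
    return [(v - m) / sd for v in values]


def correlation(xs, ys):
    """r as the average of the products of x and y in standard units."""
    zx, zy = standard_units(xs), standard_units(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)


# Hypothetical heights in inches, invented for illustration.
fathers_in = [62, 64, 66, 68, 70, 72]
sons_in = [65, 66, 66, 69, 69, 71]

# Change of scale: express fathers' heights in centimetres instead.
fathers_cm = [2.54 * h for h in fathers_in]

r_inches = correlation(fathers_in, sons_in)
r_rescaled = correlation(fathers_cm, sons_in)
print(round(r_inches, 3), round(r_rescaled, 3))  # the two values agree
```

Converting to standard units removes the original units, which is why rescaling one variable leaves r unchanged.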
Ecological Correlation
• Correlation computed from aggregated data, e.g., group averages (Exam Point!)

• E.g. of groups: school, business organization, country, race, etc.

• Why use it?

But beware:
• Ecological fallacy: inferring the correlation between individuals from aggregated data (Exam Point!)

• Atomistic fallacy: generalizing a correlation observed among individuals to the aggregate level (Exam Point!)

Another way to think of ecological fallacy and atomistic fallacy

[Figure: two scatter plots side by side. Left, "Ecological Correlation": Average Math score vs Average Chemistry score; each point represents a school. Right, "Correlation": Math score vs Chemistry score; each point represents a student. Direction of conclusion: the ecological fallacy goes from the school-level plot to the student-level plot; the atomistic fallacy goes the other way.]

Neither correlation can prove that the other one is true. If someone draws the wrong conclusion, he has committed a fallacy.
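
A small numerical sketch of why neither level proves the other. The scores are invented purely for illustration, and the names are hypothetical. In this made-up data the school averages are perfectly positively correlated while the student-level correlation is negative, so a conclusion carried in either direction would be wrong.

```python
import math


def correlation(xs, ys):
    """Pearson correlation coefficient r."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


# Hypothetical (chemistry, math) scores per student, grouped by school.
schools = {
    "A": [(50, 70), (70, 50)],
    "B": [(55, 75), (75, 55)],
    "C": [(60, 80), (80, 60)],
}

# Student level: each point is one student.
students = [pair for pairs in schools.values() for pair in pairs]
individual_r = correlation([c for c, _ in students], [m for _, m in students])

# School level: each point is one school's pair of averages.
averages = [
    (sum(c for c, _ in pairs) / len(pairs), sum(m for _, m in pairs) / len(pairs))
    for pairs in schools.values()
]
ecological_r = correlation([c for c, _ in averages], [m for _, m in averages])

print(f"ecological r = {ecological_r:.3f}, individual r = {individual_r:.3f}")
```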

Is this attenuation effect?
Attenuation Effect

• Attenuation Effect: due to range restriction in one variable, the correlation coefficient obtained understates the strength of association between the two variables (Exam Point!)

[Figure: two scatter plots of Son's Height vs Father's Height, both axes 55 to 80. Left x-axis: Father's height; right x-axis: Father's height (66-70 in).]
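
A minimal sketch of the attenuation effect with invented data (the heights and names below are hypothetical). The same set of points is used twice: once in full, and once keeping only fathers between 66 and 70 inches, as in the right-hand plot. The restricted subset gives a noticeably smaller r even though the underlying relationship has not changed.

```python
import math


def correlation(xs, ys):
    """Pearson correlation coefficient r."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


# Hypothetical (father, son) heights in inches, invented for illustration.
pairs = [
    (60, 62), (62, 60), (64, 66), (66, 64), (67, 69), (68, 66),
    (69, 71), (70, 68), (72, 74), (74, 72), (76, 78),
]

# Full range of fathers' heights.
full_r = correlation([f for f, _ in pairs], [s for _, s in pairs])

# Range restriction: keep only fathers between 66 and 70 inches.
restricted = [(f, s) for f, s in pairs if 66 <= f <= 70]
restricted_r = correlation([f for f, _ in restricted], [s for _, s in restricted])

print(f"full range r = {full_r:.3f}, restricted r = {restricted_r:.3f}")
```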
