
Statistics Overview

Part II
Outline
• Covariance
• Correlation
• Simple Linear Regression Model
Measures of the Relationship Between Two Numerical Variables

• Scatter plots allow you to visually examine the relationship between two numerical variables; we now discuss two quantitative measures of such relationships:
• The Covariance
• The Coefficient of Correlation
The Covariance
• The covariance measures the direction of the linear relationship between two numerical variables (X and Y)

• The sample covariance:

$$\operatorname{cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$$
• Concerned only with the direction of the linear relationship
• No causal effect is implied
Interpreting Covariance

• Covariance between two variables:

• cov(X, Y) > 0: X and Y tend to move in the same direction
• cov(X, Y) < 0: X and Y tend to move in opposite directions
• cov(X, Y) = 0: X and Y have no linear relationship (zero covariance does not imply independence)

• The covariance has a major flaw:
• Because its size depends on the units of X and Y, it is not possible to determine the relative strength of the relationship from the size of the covariance (it only tells the direction)
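
As a quick illustration of the formula and the sign interpretation, here is a minimal Python sketch; the hours/score data are invented purely for the example:

```python
import numpy as np

def sample_covariance(x, y):
    """Sample covariance: cross-deviations summed and divided by n - 1."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)

# Invented example data: hours studied vs. exam score
hours = [1, 2, 3, 4, 5]
score = [52, 60, 61, 70, 75]

print(sample_covariance(hours, score))     # > 0: the variables move together
print(np.cov(hours, score, ddof=1)[0, 1])  # NumPy's built-in agrees
```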
Coefficient of Correlation
• Measures the relative strength of the linear
relationship between two numerical variables
• Sample coefficient of correlation:

$$r = \frac{\operatorname{cov}(X, Y)}{S_X S_Y}$$

where

$$\operatorname{cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$$

$$S_X = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}} \qquad S_Y = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1}}$$
Features of the Coefficient of Correlation
• The population coefficient of correlation is referred to as ρ.
• The sample coefficient of correlation is referred to as r.
• Both ρ and r have the following features:
• Unit free
• Ranges between –1 and 1
• The closer to –1, the stronger the negative linear relationship
• The closer to 1, the stronger the positive linear relationship
• The closer to 0, the weaker the linear relationship
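
A short Python check, using the same invented data as above, that r = cov(X, Y) / (SX·SY) matches NumPy's built-in correlation and stays within [-1, 1]:

```python
import numpy as np

hours = np.array([1, 2, 3, 4, 5], dtype=float)
score = np.array([52, 60, 61, 70, 75], dtype=float)

cov_xy = np.sum((hours - hours.mean()) * (score - score.mean())) / (len(hours) - 1)
r = cov_xy / (hours.std(ddof=1) * score.std(ddof=1))  # ddof=1: sample std. dev.

print(r)                                # unit free, between -1 and 1
print(np.corrcoef(hours, score)[0, 1])  # same value from NumPy
```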
Scatter Plots of Sample Data with
Various Coefficients of Correlation
[Five scatter plots illustrating different coefficients of correlation: r = -1, r = -0.6, r = +1, r = +0.3, and r = 0]
Introduction to Regression Analysis
• Regression analysis is used to:
• Predict the value of a dependent variable based on the value of at least one
independent variable
• Explain the impact of changes in an independent variable on the dependent
variable
• Dependent variable: the variable we wish to predict or explain (Y)
• Independent variable: the variable used to predict or explain the dependent variable (X)
Simple Linear Regression Model

• Only one independent variable, X


• Relationship between X and Y is described
by a linear function
• Changes in Y are assumed to be related to
changes in X
Types of Relationships

[Four scatter plots: linear relationships (positive slope, negative slope) vs. nonlinear relationships (quadratic/parabolic, exponential)]
Types of Relationships (continued)
[Four scatter plots: strong relationships (points tightly clustered around a line) vs. weak relationships (points loosely scattered around a line)]
Types of Relationships (continued)

[Scatter plot showing no relationship between X and Y]
Simple Linear Regression Model

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$

where:
• Yi = dependent variable
• β0 = population Y intercept
• β1 = population slope coefficient
• Xi = independent variable
• εi = random error term
• β0 + β1Xi is the linear component; εi is the random error component
Simple Linear Regression Model
(continued)

[Plot of the regression line Yi = β0 + β1Xi + εi: the observed value of Y for Xi lies a vertical distance εi (the random error for this Xi value) from the line, which has intercept β0 and slope β1; the point on the line is the predicted value of Y for Xi]
Simple Linear Regression Equation (Prediction Line)

The simple linear regression equation provides an estimate of the population regression line:

$$\hat{Y}_i = b_0 + b_1 X_i$$

where:
• Ŷi = estimated (or predicted) Y value for observation i
• b0 = estimate of the regression intercept
• b1 = estimate of the regression slope
• Xi = value of X for observation i
Interpretation of the Slope and the Intercept

• b0 is the estimated average value of Y when the value of X is zero

• b1 is the estimated change in the average value of Y as a result of a one-unit increase in X
The Least Squares Method
b0 and b1 are obtained by finding the values that minimize the sum of the squared differences between Yi and Ŷi:

$$\min \sum (Y_i - \hat{Y}_i)^2 = \min \sum \left(Y_i - (b_0 + b_1 X_i)\right)^2$$
The Least Squares Estimates
$$\text{Slope: } b_1 = \frac{SS_{XY}}{SS_X} \qquad \text{Intercept: } b_0 = \bar{Y} - b_1 \bar{X}$$

where

$$SS_{XY} = \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) = \sum_{i=1}^{n} X_i Y_i - n\bar{X}\bar{Y}$$

$$SS_X = \sum_{i=1}^{n} (X_i - \bar{X})^2 = \sum_{i=1}^{n} X_i^2 - n\bar{X}^2$$
Inferences About the Slope

• The standard error of the regression slope coefficient (b1) is estimated by

$$S_{b_1} = \frac{S_{YX}}{\sqrt{SS_X}} = \frac{S_{YX}}{\sqrt{\sum (X_i - \bar{X})^2}}$$

where:
• Sb1 = estimate of the standard error of the slope
• $S_{YX} = \sqrt{SSE / (n - 2)}$ = standard error of the estimate
Inferences About the Slope: t Test
• t test for a population slope
• Is there a linear relationship between X and Y?
• Null and alternative hypotheses
• H0: β1 = 0 (no linear relationship)
• H1: β1 ≠ 0 (linear relationship does exist)
• Test statistic:

$$t_{STAT} = \frac{b_1 - \beta_1}{S_{b_1}} \qquad d.f. = n - 2$$

where:
• b1 = regression slope coefficient
• β1 = hypothesized slope (0 under H0)
• Sb1 = standard error of the slope
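
Continuing the sketch above, the slope's standard error and t statistic can be computed directly; scipy is assumed to be available for the p-value:

```python
import numpy as np
from scipy import stats

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([52, 60, 61, 70, 75], dtype=float)
n = len(x)

b1, b0 = np.polyfit(x, y, 1)
sse = np.sum((y - (b0 + b1 * x)) ** 2)              # error sum of squares
s_yx = np.sqrt(sse / (n - 2))                       # standard error of the estimate
s_b1 = s_yx / np.sqrt(np.sum((x - x.mean()) ** 2))  # standard error of the slope

t_stat = (b1 - 0) / s_b1                            # H0: beta1 = 0
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)     # two-sided p-value
print(t_stat, p_value)
```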
Measures of Variation

• Total variation is made up of two parts:

$$SST = SSR + SSE$$

(Total Sum of Squares = Regression Sum of Squares + Error Sum of Squares)

$$SST = \sum (Y_i - \bar{Y})^2 \qquad SSR = \sum (\hat{Y}_i - \bar{Y})^2 \qquad SSE = \sum (Y_i - \hat{Y}_i)^2$$

where:
• Ȳ = mean value of the dependent variable
• Yi = observed value of the dependent variable
• Ŷi = predicted value of Y for the given Xi value
Measures of Variation (continued)

• SST = total sum of squares (Total Variation)
• Measures the variation of the Yi values around their mean Ȳ
• SSR = regression sum of squares (Explained Variation)
• Variation attributable to the relationship between X and Y
• SSE = error sum of squares (Unexplained Variation)
• Variation in Y attributable to factors other than X
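
A quick numeric check of the decomposition, using the same invented data as the earlier sketches; SSR + SSE reproduces SST:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([52, 60, 61, 70, 75], dtype=float)

b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x

sst = np.sum((y - y.mean()) ** 2)      # total variation
ssr = np.sum((y_hat - y.mean()) ** 2)  # explained variation
sse = np.sum((y - y_hat) ** 2)         # unexplained variation

print(sst, ssr + sse)  # the two values agree: SST = SSR + SSE
```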
Measures of Variation
(continued)

[Plot of a fitted regression line illustrating the decomposition at a point Xi: SSE = Σ(Yi - Ŷi)² measures the distance of each observed Yi from the line, SSR = Σ(Ŷi - Ȳ)² the distance of the line from Ȳ, and SST = Σ(Yi - Ȳ)² the total distance of Yi from Ȳ]
Coefficient of Determination, r²

• The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable

• The coefficient of determination is also called r-squared and is denoted as r²

$$r^2 = \frac{SSR}{SST} = \frac{\text{regression sum of squares}}{\text{total sum of squares}} \qquad 0 \le r^2 \le 1$$

• In simple linear regression, r² is the square of the sample correlation coefficient r
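
One more check with the same invented data: r² computed as SSR/SST matches the squared sample correlation, as the last bullet states:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([52, 60, 61, 70, 75], dtype=float)

b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x

r2 = np.sum((y_hat - y.mean()) ** 2) / np.sum((y - y.mean()) ** 2)  # SSR / SST
print(r2)
print(np.corrcoef(x, y)[0, 1] ** 2)  # squared correlation gives the same value
```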
Examples of Approximate r² Values

[Two scatter plots with every point exactly on a line, one with positive slope and one with negative slope: r² = 1]

Perfect linear relationship between X and Y: 100% of the variation in Y is explained by variation in X
Examples of Approximate r² Values

[Two scatter plots with points loosely scattered around a sloping line: 0 < r² < 1]

Weaker linear relationships between X and Y: some but not all of the variation in Y is explained by variation in X
Examples of Approximate r² Values

[Scatter plot with points showing no pattern: r² = 0]

No linear relationship between X and Y: the value of Y does not depend on X (none of the variation in Y is explained by variation in X)
Empirical Time
• Collect data from investing.com:
• Monthly returns of Bitcoin from Jan 2018 to Oct 2024
• Monthly returns of the stock market, proxied by the S&P 500, from Jan 2018 to Oct 2024
• Can you find the covariance and correlation between the monthly returns of Bitcoin and the stock market?
• Can you regress the Bitcoin monthly return on the S&P 500 monthly return? (A sketch of one possible workflow follows below.)
• Is the beta estimate significant? If yes, how do you interpret it?
• You can use Excel or any software you like (R, Python, SAS, or Stata)
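
A possible Python workflow for the exercise, as referenced above. This is only a sketch: the file name monthly_returns.csv and the column names btc_ret and sp500_ret are hypothetical placeholders for whatever you export from investing.com.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical CSV with columns "btc_ret" and "sp500_ret":
# monthly returns, Jan 2018 - Oct 2024, exported from investing.com.
df = pd.read_csv("monthly_returns.csv")
btc = df["btc_ret"].to_numpy()
sp500 = df["sp500_ret"].to_numpy()

# Covariance and correlation between the two return series
print("cov: ", np.cov(btc, sp500, ddof=1)[0, 1])
print("corr:", np.corrcoef(btc, sp500)[0, 1])

# Regress Bitcoin monthly returns on S&P 500 monthly returns (beta = slope)
res = stats.linregress(sp500, btc)
print(f"beta = {res.slope:.3f}, p-value = {res.pvalue:.4f}")
# If the p-value is below 0.05, beta is significant: a 1-percentage-point
# move in the S&P 500 monthly return is associated, on average, with a
# beta-percentage-point move in the Bitcoin monthly return.
```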
