Linear Regression Analysis_1
Lecture 1
Variable
Qualitative and Quantitative variables
Association between Quantitative Variables
Correlation
Regression
Measure of association
• In statistics, any measure used to quantify a relationship between two
or more variables is a measure of association.
• Measures of association are used in various fields of research. For
example, in the areas of epidemiology and psychology, measures of
association are frequently used to quantify relationships between
exposures and diseases or behaviors.
• Data may be measured on an interval/ratio scale, an ordinal/rank
scale, or a nominal/categorical scale.
• These three scale types can be thought of as continuous, integer (ranked),
and qualitative categories, respectively.
• The method used to determine the strength of an association
depends on the characteristics of the data for each variable.
Pearson’s correlation coefficient
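For two quantitative variables x and y, the sample Pearson correlation coefficient is the sum of products of deviations from the means, divided by the square root of the product of the two sums of squared deviations. A minimal sketch of that calculation, assuming plain Python lists as input; the function name `pearson_r` and the data are hypothetical and made up for illustration:

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of the deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data, invented for this example only.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
score = [2.0, 4.1, 5.9, 8.2, 9.8]
r = pearson_r(hours, score)
print(round(r, 4))
```

The coefficient lies between -1 and 1. Values near either end indicate a strong linear association, and values near 0 indicate little linear association.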
$Y_i = f(X_i, \beta) + e_i$, where $\beta$ denotes the unknown parameters and $e_i$ is an error term.
The researchers' goal is to estimate the function f that most closely fits the data. To carry out regression
analysis, the form of the function f must be specified.
Linear Regression
In linear regression, the model specification is that the dependent variable $y_i$ is a
linear combination of the parameters (but need not be linear in the independent
variables).
For example, in simple linear regression for modelling $n$ data points there is one
independent variable, $x_i$, and two parameters, $\beta_0$ and $\beta_1$:
Straight line: $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, $i = 1, 2, \dots, n$.
In multiple linear regression, there are several independent variables or functions
of independent variables. Adding a term in $x_i^2$ to the preceding regression gives:
Parabola: $y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \varepsilon_i$, $i = 1, 2, \dots, n$.
This is still a linear regression: although the expression on the right-hand side is
quadratic in the independent variable $x_i$, it is linear in the parameters $\beta_0$, $\beta_1$ and $\beta_2$.
In both cases, $\varepsilon_i$ is an error term and the subscript $i$ indexes a particular
observation.
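The point that a parabola is still a linear regression can be checked numerically: the same fitting routine handles both models, because each is a linear combination of parameters multiplying known functions of x. The sketch below is one possible illustration, not a recommended implementation. It solves the normal equations with a small hand-written Gaussian elimination so that it needs no third-party library; in practice a numerical library's least-squares solver is usually preferable. The names `solve_linear_system`, `fit_linear_model`, `line_basis` and `parabola_basis` are hypothetical, and the data are generated without noise purely for illustration.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix columns are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def fit_linear_model(xs, ys, basis):
    """Least-squares fit of y = sum_j beta_j * basis[j](x).

    The model is linear in the parameters beta_j even when the basis
    functions themselves are non-linear in x.
    """
    design = [[f(x) for f in basis] for x in xs]
    p = len(basis)
    # Normal equations: (D^T D) beta = D^T y.
    dtd = [[sum(row[i] * row[j] for row in design) for j in range(p)]
           for i in range(p)]
    dty = [sum(row[i] * y for row, y in zip(design, ys)) for i in range(p)]
    return solve_linear_system(dtd, dty)


line_basis = [lambda x: 1.0, lambda x: x]                      # beta0 + beta1*x
parabola_basis = [lambda x: 1.0, lambda x: x, lambda x: x * x]  # ... + beta2*x^2

# Hypothetical noise-free data generated from y = 1 + 2x + 3x^2.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [1.0 + 2.0 * x + 3.0 * x * x for x in xs]

beta_parabola = fit_linear_model(xs, ys, parabola_basis)
beta_line = fit_linear_model(xs, ys, line_basis)
print(beta_parabola, beta_line)
```

Only the list of basis functions changes between the two fits. The estimation step is identical, which is what "linear in the parameters" means in practice.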
Given a random sample from the population, we estimate the
population parameters and obtain the sample linear regression model
$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$.
The residual $e_i = y_i - \hat{y}_i$ is the difference between the observed value
of the dependent variable and the value predicted by the model above.
One method of estimation is ordinary least squares, which obtains parameter
estimates that minimize the sum of squared residuals, SSR $= \sum_{i=1}^{n} e_i^2$.
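For the straight-line model, minimizing the SSR has a well-known closed-form solution: the slope estimate is the sum of cross-products of deviations divided by the sum of squared deviations of x, and the intercept estimate is the mean of y minus the slope times the mean of x. A minimal sketch under those formulas follows; the function names `ols_simple` and `sum_squared_residuals` and the data are hypothetical, chosen only for illustration.

```python
def ols_simple(xs, ys):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("slope is undefined when all x values are equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx               # slope estimate
    b0 = mean_y - b1 * mean_x    # intercept estimate
    return b0, b1


def sum_squared_residuals(xs, ys, b0, b1):
    """SSR: sum over observations of (observed y - fitted y) squared."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, invented for this example only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]

b0, b1 = ols_simple(xs, ys)
ssr = sum_squared_residuals(xs, ys, b0, b1)
print(b0, b1, ssr)
```

A quick sanity check is to evaluate `sum_squared_residuals` at slightly different values of the intercept or slope: any other straight line through the same data should give a larger SSR.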
• What are the formulas for the least squares estimates in simple linear
regression?
• What is the estimate of the error variance?
• What is MSE?
• What are the assumptions of a simple linear regression model?