Lecture SLR (1)

The document discusses correlation and regression analysis, focusing on determining relationships between numerical variables. It explains concepts such as correlation coefficients, scatter plots, and the significance of correlation, as well as the methodology for simple linear regression. The document also highlights the importance of understanding the assumptions and limitations of these statistical methods.

Uploaded by

k214525
Correlation & Regression

Slides 16, 18, 41, 48, 49, and 53 are optional


Introduction
• In previous chapters we introduced two areas of inferential statistics: hypothesis testing and confidence intervals.
• Another area of inferential statistics involves determining whether a relationship exists between two or more numerical (quantitative) variables. For example:

• A businessperson may want to know whether the volume of sales for a given month
is related to the amount of advertising the firm does that month.
• Educators are interested in determining whether the number of hours a student
studies is related to the student’s score on a particular exam.
• Medical researchers are interested in questions such as “Is caffeine related to heart
damage?” or “Is there a relationship between a person’s age and his or her blood
pressure?”
• A zoologist may want to know whether the birth weight of a certain animal is related
to its life span.
Introduction (Contd.)

These are only a few of the many questions that can be answered by
using the techniques of correlation and regression analysis.
• Correlation is a statistical method used to determine whether a
relationship between variables exists.

• The purpose of this chapter is to answer the following questions statistically:
1. Are two or more variables related?
2. If so, what is the strength of the relationship?
3. What type of relationship exists?
4. What kind of predictions can be made from the relationship?

Introduction
• In simple correlation and regression studies, the researcher collects data
on two numerical or quantitative variables to see whether a relationship
exists between the variables. For example:

• A researcher may wish to see whether there is a relationship between the number of hours studied and test scores on an exam.

Introduction
• The independent variable is the variable in regression that can be controlled or manipulated.
• The dependent variable is the variable in regression that cannot be controlled or manipulated.

Introduction
Scatter Plot

• The independent and dependent variables can be plotted on a graph called a scatter plot. The independent variable, x, is plotted on the horizontal axis and the dependent variable, y, is plotted on the vertical axis.
Scatter plot (Example 01)
• Construct a scatter plot for the data obtained in a study of age and
systolic blood pressure of six randomly selected subjects. The data are
shown in the following table.
Scatter plot (Example # 02)
• Construct a scatter plot for the data obtained in a study on the
number of absences and the final grades of seven randomly selected
students from a statistics class.
Correlation Coefficient

• The correlation coefficient is used to determine the strength of the relationship between two variables. There are several types of correlation coefficients. The one explained in this section is the Pearson product moment correlation coefficient (PPMC), named after the statistician Karl Pearson, who pioneered the research in this area.

• The correlation coefficient computed from the sample data measures the strength and direction of a linear relationship between two variables. The symbol for the sample correlation coefficient is r. The symbol for the population correlation coefficient is ρ (Greek letter rho).
Correlation
Properties of Correlation Coefficient

• The correlation coefficient is symmetric: r_xy = r_yx.

• The correlation coefficient does not depend on the units of measurement.

• The correlation coefficient lies between −1 and +1, i.e. −1 ≤ r ≤ +1.
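The three properties above can be checked numerically. The following is a minimal sketch, not the lecture's own code: the helper name `pearson_r` and the height/weight numbers are hypothetical and chosen only for illustration.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation coefficient (sums-based computing formula)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


# Hypothetical data (not from the lecture): heights in cm, weights in kg.
height_cm = [150, 160, 165, 172, 180, 185]
weight_kg = [52, 58, 63, 70, 79, 84]

r_xy = pearson_r(height_cm, weight_kg)   # x against y
r_yx = pearson_r(weight_kg, height_cm)   # y against x (symmetry)

# Changing units (cm to inches) rescales x but should leave r unchanged.
height_in = [h / 2.54 for h in height_cm]
r_units = pearson_r(height_in, weight_kg)

print(r_xy, r_yx, r_units)
```

All three printed values should agree up to floating-point rounding, and each lies in the interval from −1 to +1.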

Correlation
Formula for Correlation Coefficient
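The formula on this slide did not survive in the text. For reference, the standard computing formula for the sample Pearson correlation coefficient is given below; it is assumed, not confirmed, that the slide used this form. Here n is the number of data pairs.

```latex
r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}
         {\sqrt{\left[\,n\sum x^{2} - \left(\sum x\right)^{2}\right]
                \left[\,n\sum y^{2} - \left(\sum y\right)^{2}\right]}}
```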

Correlation
Example # 03
• Compute the value of the correlation coefficient for the data obtained
in the study of age and blood pressure:

Correlation
Example # 04
• Compute the value of the correlation coefficient for the data obtained
in the study of the number of absences and the final grade of the
seven students in the statistics class:

Correlation
The Significance of Correlation
• Since the value of r is computed from data obtained from samples,
there are two possibilities when r is not equal to zero: either the value
of r is high enough to conclude that there is a significant linear
relationship between the variables, or the value of r is due to chance.

• To make this decision, one uses a hypothesis-testing procedure.

Correlation
Hypothesis testing for Correlation
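The content of this slide is not in the extracted text. A common procedure, assumed here to be the one intended, tests H0: ρ = 0 against H1: ρ ≠ 0 with the statistic t = r·sqrt((n − 2)/(1 − r²)), which has n − 2 degrees of freedom. The sketch below only computes the statistic; the function name and the values r = 0.9, n = 6 are hypothetical.

```python
import math


def correlation_t_statistic(r, n):
    """Return (t, df) for testing H0: rho = 0 from a sample correlation r."""
    if n < 3:
        raise ValueError("need at least 3 data pairs")
    if abs(r) >= 1:
        raise ValueError("t is undefined when |r| = 1")
    df = n - 2
    t = r * math.sqrt(df / (1 - r ** 2))
    return t, df


# Hypothetical inputs, not the lecture's example values.
t, df = correlation_t_statistic(0.9, 6)
print(round(t, 3), df)
```

The null hypothesis is rejected when |t| exceeds the critical value read from a t table at the chosen significance level with df = n − 2.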

Correlation
Example # 05
• Test the significance of the correlation coefficient found in Example # 03 (the age and blood pressure data of Example # 01).

Correlation
Example # 05 (Contd.)

Correlation
Possible relationship b/w variables
When the null hypothesis has been rejected for a specific value, any of the following
possibilities can exist.

i. There is a direct cause-and-effect relationship between the variables. That is, x causes y. For example, water causes plants to grow, poison causes death, and heat causes ice to melt.

ii. There is a reverse cause-and-effect relationship between the variables. That is, y
causes x. For example, suppose a researcher believes excessive coffee consumption
causes nervousness, but the researcher fails to consider that the reverse situation
may occur. That is, it may be that an extremely nervous person craves coffee to calm
his or her nerves.

iii. The relationship between the variables may be caused by a third variable.
Remember, correlation does not necessarily imply causation.
Curvilinear relationship
Do the following for next 3 slides
Practice Questions for Correlation

Correlation
Practice Questions for correlation (Contd.)

Correlation
Simple Linear Regression
• In studying relationships between two variables:
• collect the data and then construct a scatter plot.
• compute the value of the correlation coefficient.
• test the significance of the relationship.

• If the value of the correlation coefficient is significant, the next step is to determine the equation of the regression line, which is the data’s line of best fit.
• Determining the regression line when r is not significant and then making predictions using the regression line is meaningless.
Simple Linear Regression (Contd.)

• The dependence of one variable on another variable is called “regression”.

• The statistical method that helps us estimate the value of the dependent variable from a known value of the independent variable is called regression.

Simple Linear Regression


Simple Linear Regression Model
• The statistical model for simple linear regression is given below. The response Y is related to the independent variable x through the equation:
Y = β0 + β1x + ϵ
where β0 is the intercept, β1 is the slope, and ϵ is the random error.

Simple Linear Regression


• The quantity Y is a random variable since ϵ is random.

• The value x of the regressor variable is not random and, in fact, is measured with negligible error.

• The quantity ϵ, often called a random error or random disturbance, has constant variance (homogeneous variance).

• The presence of this random error, ϵ, keeps the model from becoming simply a deterministic equation.
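A short simulation can make the role of the random error concrete. This is a sketch under stated assumptions: the errors are drawn from a normal distribution with mean 0 and constant standard deviation, and the parameter values (intercept 2.0, slope 0.5, sigma 1.0) are hypothetical.

```python
import random


def simulate_responses(x_values, beta0, beta1, sigma, rng):
    """Draw Y = beta0 + beta1 * x + eps, with eps ~ Normal(0, sigma)."""
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in x_values]


rng = random.Random(42)          # fixed seed so the run is repeatable
x_values = [1, 2, 3, 4, 5]       # x is fixed, not random

# Deterministic part only: the same output every time.
deterministic = [2.0 + 0.5 * x for x in x_values]

# Two samples at the same x values differ because eps is random.
sample_a = simulate_responses(x_values, 2.0, 0.5, 1.0, rng)
sample_b = simulate_responses(x_values, 2.0, 0.5, 1.0, rng)

# With sigma = 0 the model collapses to the deterministic equation.
no_noise = simulate_responses(x_values, 2.0, 0.5, 0.0, rng)

print(deterministic)
print(sample_a)
print(sample_b)
```

Repeating the experiment at the same x values gives different Y values each time, which is why Y is treated as a random variable while x is not.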
Simple Linear Regression
• We must keep in mind that:

• In practice, β0 and β1 are not known and must be estimated from data.
• We never observe the actual ϵ values in practice, and thus we can never draw the true regression line.

• We can only draw an estimated line.

Simple Linear Regression


Simple Regression Model (Contd.)
Deterministic Vs. Statistical Relationship
• In regression analysis we are concerned with what is known as the statistical, not deterministic, dependence among variables.

• In statistical relationships among variables we essentially deal with random or stochastic variables, i.e. variables that have probability distributions.

Simple Linear Regression


Deterministic Vs. Statistical (Example)

• The dependence of crop yield on temperature, rainfall, sunshine, and fertilizer, for example, is statistical in nature in the sense that the explanatory variables, although certainly important, will not enable the agronomist to predict crop yield exactly.

• In deterministic phenomena, on the other hand, we deal with relationships of the type, say, exhibited by Newton’s law of gravity, which states: Every particle in the universe attracts every other particle with a force directly proportional to the product of their masses and inversely proportional to the square of the distance between them.

Simple Linear Regression
Relationship B/w Regression & Correlation
The value of the correlation coefficient can also be found by using the formula:
Determining Regression Equation

• There are several methods for estimating the regression parameters. Here we will use the method of least squares to estimate the parameters.
• The formula for parameter estimation by least squares is given below:

Simple Linear Regression


The Method of Least Squares
• We shall find b0 and b1, so that the sum of the squares of the residuals is a
minimum.

• The residual sum of squares is often called the sum of squares of the errors
about the regression line and is denoted by SSE.

• This minimization procedure for estimating the parameters is called the method of least squares.
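Written out, the quantity being minimized and the resulting estimates take the standard form below. The notation (residual e_i, fitted value ŷ_i, sample means x̄ and ȳ) is assumed rather than taken from the slides.

```latex
SSE = \sum_{i=1}^{n} e_i^{2}
    = \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^{2}
    = \sum_{i=1}^{n} \left(y_i - b_0 - b_1 x_i\right)^{2}

% Setting \partial SSE / \partial b_0 = 0 and \partial SSE / \partial b_1 = 0 gives
b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
           {\sum_{i=1}^{n} (x_i - \bar{x})^{2}},
\qquad
b_0 = \bar{y} - b_1 \bar{x}
```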

Simple Linear Regression


The Method of Least Squares (Contd.)

We have taken b0 = a and b1 = b (see slide 38).
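As a sketch of how the intercept a and slope b can be computed from the usual sums, the following uses the standard least-squares computing formulas. It is assumed, not confirmed, that these match the formulas on the slides; the function name and the five data points are hypothetical.

```python
def least_squares_line(x, y):
    """Return (a, b) for the fitted line y' = a + b * x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator   # intercept
    b = (n * sum_xy - sum_x * sum_y) / denominator        # slope
    return a, b


# Hypothetical data, not from the lecture examples.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = least_squares_line(x, y)
print(round(a, 4), round(b, 4))   # intercept 2.2, slope 0.6
```

For these points the sums are Σx = 15, Σy = 20, Σxy = 66 and Σx² = 55, so a = (20·55 − 15·66)/50 = 2.2 and b = (5·66 − 15·20)/50 = 0.6.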


Simple Linear Regression
Example # 06
• Find the equation of the regression line for the data in Example # 01
(Slide 15), and graph the line on the scatter plot of the data.
Example # 06 (Contd.)

The sign of the correlation coefficient and the sign of the slope of the regression line will
always be the same. The reason is that the numerators of the formulas are the same.
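In symbols, using the standard computing formulas (assumed to be the ones the slides refer to), let N denote the shared numerator. Both denominators are positive whenever the x values and the y values are not all equal, so r and b take the sign of N.

```latex
N = n\sum xy - \left(\sum x\right)\left(\sum y\right), \qquad
D_x = n\sum x^{2} - \left(\sum x\right)^{2}, \qquad
D_y = n\sum y^{2} - \left(\sum y\right)^{2}

r = \frac{N}{\sqrt{D_x D_y}}, \qquad b = \frac{N}{D_x}
```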

Simple Linear Regression
Prediction Using Regression Eq.
• Using the equation of the regression line found in Example # 06,
predict the blood pressure for a person who is 50 years old.

Simple Linear Regression


Assumptions for valid prediction

• For any specific value of the independent variable x, the value of the
dependent variable y must be normally distributed about the
regression line.

• The standard deviation of each of the dependent variables must be the same for each value of the independent variable.

Simple Linear Regression


Assumptions for Prediction

Simple Linear Regression


Residual Analysis
To interpret a residual plot, you need to determine whether the residuals form a pattern.
[Figure: residual plots, all showing a pattern; the model is not good.]
The residual plot shows that the regression line y = 4.8 + 2.8x is somewhat questionable for making predictions due to a small sample size.
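A residual is the observed y minus the value predicted by the line. The sketch below computes residuals for the line y = 4.8 + 2.8x quoted above. The data behind the slide's plot are not in the text, so the five points here are hypothetical and the helper name is an assumption.

```python
def residuals(x, y, intercept, slope):
    """Observed y minus the value predicted by y' = intercept + slope * x."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Line quoted on the slide: y = 4.8 + 2.8x.
# Hypothetical observations (the slide's data are not available).
x = [1, 2, 3, 4, 5]
y = [8.0, 10.0, 13.5, 16.0, 18.5]

res = residuals(x, y, 4.8, 2.8)

# Residuals are plotted against x; here they are simply listed.
for xi, ri in zip(x, res):
    print(xi, round(ri, 2))
```

Residuals that scatter around zero with no visible trend or curve support the linear model; a systematic pattern suggests the line is not an adequate description of the data.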
Coefficient of Determination (r²)
The square of the correlation coefficient is called the coefficient of determination.
The formula given on this slide is optional.
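The slide's formula is not reproduced in this text. A standard interpretation is that r² equals the proportion of the total variation in y that is explained by the regression line, that is, 1 − SSE/SST. The sketch below checks this on hypothetical data; the function name and data are assumptions.

```python
import math


def r_squared_two_ways(x, y):
    """Return (r**2, 1 - SSE/SST) for a simple linear regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)   # total variation (SST)

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

    r = sxy / math.sqrt(sxx * syy)
    return r ** 2, 1 - sse / syy


# Hypothetical data, not from the lecture examples.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

from_r, from_variation = r_squared_two_ways(x, y)
print(round(from_r, 4), round(from_variation, 4))   # both 0.6
```

Here r² = 0.6, so 60% of the variation in y is explained by the linear relationship with x and the remaining 40% is unexplained.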
Short Q/A
Example # 07

• Consider the experimental data given below, which were obtained from 33 samples of chemically treated waste in a study conducted at Virginia Tech. Readings on x, the percent reduction in total solids, and y, the percent reduction in chemical oxygen demand, were recorded. Estimate the regression line for the pollution data.

Simple Linear Regression


Example # 07 (Contd.)

Simple Linear Regression


Example # 07 (Contd.)

Simple Linear Regression


Practice questions
• Do a complete regression analysis by performing the following steps
for Q1 & Q2. (See Next Slide)

Simple Linear Regression
Practice Questions

Simple Linear Regression


Practice Questions

Simple Linear Regression
