
LESSON 8: CORRELATION AND REGRESSION

I. REGRESSION.
Regression is the measure of the average relationship between two or more variables in
terms of the original units of the data.
The regression study which confines itself to a study of only two variables is called simple
regression. The regression analysis which studies more than two variables at a time is
called a multiple regression.
In a simple regression analysis, there are two variables. One of them is known as the
‘independent variable’ (also called the regressor, predictor or explanatory variable). The
other variable, whose values are predicted, is called the ‘dependent’ (regressed or
explained) variable.

With the help of regression studies, we can also calculate the coefficient of correlation.
The coefficient of determination ($r^2$), which measures the proportion of the variation in
the dependent variable that is explained by the independent variable, indicates the
predictive value of the regression equation. We will use the method of least squares to
obtain the regression equation.

Method of least squares


This method establishes a mathematical relationship between the movements of X and Y
series and algebraic equations are obtained to represent the relative movements of X and
Y series.
In this method we minimise the Sum of Squares of the deviations between the given values
of a variable and its estimated values given by the line of best fit.
Line of Regression of Y on X is the line which gives the best estimate for the value
of Y for a specified value of X.
Similarly, the line of regression of X on Y is the line which gives the best estimate
for the value of X for a specified value of Y.
The line of best fit is given by the equation of a straight line, $Y = a + bX$. In the
method of least squares, this line is obtained with the help of the following two normal
equations:

$$na + b\sum X = \sum Y$$

$$a\sum X + b\sum X^2 = \sum XY$$

If the values of the X and Y variables are substituted into the above equations and the
equations are solved simultaneously, we get the values of $a$ and $b$, and thus the
regression line of Y on X.
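Solving the two normal equations simultaneously (eliminate $a$ using the first equation, then substitute into the second) gives the standard closed-form expressions below. This worked derivation is added for convenience and uses only the symbols defined above:

```latex
\begin{aligned}
a &= \frac{\sum Y - b\sum X}{n} \\
\frac{\left(\sum Y - b\sum X\right)\sum X}{n} + b\sum X^2 &= \sum XY \\
b &= \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2} \\
a &= \bar{Y} - b\bar{X}
\end{aligned}
```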

Page 1 of 12 Regression & Correlation


To get the regression line of X on Y, we treat X as the dependent variable and Y as the
independent variable, so that the line is $X = a + bY$ and the two normal equations are

$$na + b\sum Y = \sum X$$

$$a\sum Y + b\sum Y^2 = \sum XY$$
Example
Given that $Y = a + bX$, find the estimates of $a$ and $b$, and hence find the value of Y
when $X = 12$.

X 8 10 6 11 8 7 10 9 10

Y 145 150 124 157 130 127 140 122 132
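A minimal Python sketch of the method, using only the standard library: it solves the two normal equations for $a$ and $b$ (eliminating $a$ first) and applies the result to the data above. The function name `fit_least_squares` is a hypothetical helper, not something from the notes.

```python
def fit_least_squares(xs, ys):
    """Return (a, b) for the least-squares line Y = a + bX."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    # Eliminate a from the two normal equations and solve for b first.
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Back-substitute into n*a + b*sum(X) = sum(Y) to get a.
    a = (sum_y - b * sum_x) / n
    return a, b


X = [8, 10, 6, 11, 8, 7, 10, 9, 10]
Y = [145, 150, 124, 157, 130, 127, 140, 122, 132]

a, b = fit_least_squares(X, Y)
y_at_12 = a + b * 12  # estimate of Y when X = 12
print(f"a = {a:.4f}, b = {b:.4f}, Y(12) = {y_at_12:.2f}")
# prints: a = 92.4897, b = 4.9948, Y(12) = 152.43
```

Substituting $X = 12$ into the fitted line gives the requested estimate of Y.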

Example

The table below shows the number of absences, X, in a Quantitative methods course and
the final exam grade, Y, for 7 students. Find the line that best fits the data.

X 1 0 2 6 4 3 3

Y 85 80 70 55 90 90 95

Example

From the following data obtain the two regression equations using the method of Least
Squares.

X 2 4 6 8 10

Y 5 7 9 8 11
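The sketch below (standard-library Python, with a hypothetical helper `normal_equation_fit`) obtains both regression equations for the data above. The regression of X on Y is found by the same routine with the roles of the two variables swapped, exactly as the normal equations for X on Y suggest.

```python
def normal_equation_fit(u, v):
    """Least-squares line v = a + b*u, solved from the two normal equations."""
    n = len(u)
    su, sv = sum(u), sum(v)
    suu = sum(x * x for x in u)
    suv = sum(x * y for x, y in zip(u, v))
    b = (n * suv - su * sv) / (n * suu - su ** 2)
    a = (sv - b * su) / n
    return a, b


X = [2, 4, 6, 8, 10]
Y = [5, 7, 9, 8, 11]

a_yx, b_yx = normal_equation_fit(X, Y)  # Y on X: Y is the dependent variable
a_xy, b_xy = normal_equation_fit(Y, X)  # X on Y: the roles are swapped
print(f"Y on X: Y = {a_yx:.2f} + {b_yx:.2f}X")  # Y = 4.10 + 0.65X
print(f"X on Y: X = {a_xy:.2f} + {b_xy:.2f}Y")  # X = -4.40 + 1.30Y
```

A useful check: the product of the two slopes equals $r^2$ (here $0.65 \times 1.3 = 0.845$), so $r = \sqrt{0.845} \approx 0.92$, taking the sign of the slopes.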



II. CORRELATION.

If two variables X and Y vary in such a way that changes in one are accompanied by
changes in the other, these variables are said to be correlated.

If an increase in one variable is associated with an increase in the other variable, the
correlation is said to be positive. If an increase in one is associated with a decrease
in the other, the correlation is said to be negative.

Thus, correlation is a measure of the association between the values of two variables
X and Y. Correlation techniques are used, together with regression, in predicting the
values of one variable from the values of another variable. If the values of the two
variables satisfy a linear equation exactly, we say that the variables are perfectly
correlated. If there is no (linear) relationship between the two variables, they are said
to be uncorrelated.

Scatter Diagrams

A useful method of investigating whether there is any correlation between two variables is
to draw a scatter diagram. In this method, the given data are plotted on graph paper: for
each observation, the value of the variable Y is measured along the y-axis and the
corresponding value of the other variable X is plotted along the x-axis.

If all points on a scatter diagram seem to lie near a straight line, the correlation is called
linear. If Y tends to increase as X increases, the correlation is positive; and if Y tends to
decrease as X increases, the correlation is negative. If all the points seem to lie near some
curve, the correlation is called non-linear.

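The correlation coefficient, denoted by $r$, is given by the formula

$$r = \frac{\operatorname{cov}(x, y)}{\sqrt{\operatorname{var}(x)\,\operatorname{var}(y)}}$$

where

$$\operatorname{cov}(x, y) = \frac{\sum xy}{n} - \bar{x}\bar{y}, \qquad \operatorname{var}(x) = \frac{\sum x^2}{n} - \bar{x}^2, \qquad \operatorname{var}(y) = \frac{\sum y^2}{n} - \bar{y}^2$$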
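The covariance formula can be checked with a short Python sketch. It follows the definitions above literally, dividing every sum by $n$; the helper name `pearson_r` is hypothetical, and the absences and grades data from the examples are used as an illustrative input.

```python
import math


def pearson_r(xs, ys):
    """r = cov(x, y) / sqrt(var(x) * var(y)), with every moment divided by n."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cov_xy = sum(x * y for x, y in zip(xs, ys)) / n - x_bar * y_bar
    var_x = sum(x * x for x in xs) / n - x_bar ** 2
    var_y = sum(y * y for y in ys) / n - y_bar ** 2
    return cov_xy / math.sqrt(var_x * var_y)


absences = [1, 0, 2, 6, 4, 3, 3]
grades = [85, 80, 70, 55, 90, 90, 95]
r = pearson_r(absences, grades)
print(round(r, 4))  # -0.3837
```

A value of about $-0.38$ indicates a weak negative correlation between absences and grades.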



Example

1. A researcher studied the connection between x (the age in years of a licensed driver)
and y (the percentage of fatal accidents for drivers of that age which are caused by
speeding). The collected data are shown below. Calculate the coefficient of
correlation.

x 17 27 37 47 57 67 77

y 36 25 20 12 10 7 5

2. The table below shows the number of absences, x, in a Quantitative methods
course and the final exam grade, y, for 7 students. Find the correlation coefficient for
the data.

x 1 0 2 6 4 3 3

y 85 80 70 55 90 90 95

Coefficient of determination

This measure, denoted by $r^2$, is used to determine the goodness of fit of the regression
equation. It indicates the proportion of the variance in the dependent variable that can be
explained by the independent variable(s).

A researcher studied the connection between x (the age in years of a licensed driver) and
y (the percentage of fatal accidents for drivers of that age which are caused by
speeding). The collected data are shown below. Calculate the coefficient of determination.

X 17 27 37 47 57 67 77

Y 36 25 20 12 10 7 5
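A sketch of the same calculation in Python, squaring $r$ to get the coefficient of determination for the driver data. It works with the corrected sums $S_{xy}$, $S_{xx}$ and $S_{yy}$ (each raw sum minus its correction term), which gives the same $r$ as the covariance form of the formula.

```python
import math

age = [17, 27, 37, 47, 57, 67, 77]         # x
speeding_pct = [36, 25, 20, 12, 10, 7, 5]  # y

n = len(age)
# Corrected sums: S_xy = Σxy - (Σx)(Σy)/n, S_xx = Σx² - (Σx)²/n, S_yy likewise.
s_xy = sum(x * y for x, y in zip(age, speeding_pct)) - sum(age) * sum(speeding_pct) / n
s_xx = sum(x * x for x in age) - sum(age) ** 2 / n
s_yy = sum(y * y for y in speeding_pct) - sum(speeding_pct) ** 2 / n

r = s_xy / math.sqrt(s_xx * s_yy)
r_squared = r ** 2  # coefficient of determination
print(f"r = {r:.4f}, r^2 = {r_squared:.4f}")  # r = -0.9594, r^2 = 0.9204
```

Here $r \approx -0.96$ and $r^2 \approx 0.92$: roughly 92% of the variation in the percentage of speeding-related fatal accidents is explained by its linear relationship with the driver's age.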



Karl Pearson’s Correlation Coefficient

To compute Karl Pearson’s Correlation Coefficient, let’s consider the data given
below.
Age (years) 1 2 3 4 5
Weight (kg) 7 4 6 7 10
Let’s calculate Karl Pearson’s Correlation Coefficient between the age and the
weight of these children at a health centre.
Solution
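Karl Pearson’s Correlation Coefficient (r) is given by the formula

$$r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left\{n\sum x^2 - \left(\sum x\right)^2\right\}\left\{n\sum y^2 - \left(\sum y\right)^2\right\}}}$$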

Now, let us solve the problem above
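A minimal Python sketch of Karl Pearson's formula in its raw-sum form, applied to the age and weight data above; `pearson_r_raw_sums` is a hypothetical helper name.

```python
import math


def pearson_r_raw_sums(xs, ys):
    """Karl Pearson's r from n, Σx, Σy, Σxy, Σx² and Σy²."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2 = sum(x * x for x in xs)
    sy2 = sum(y * y for y in ys)
    numerator = n * sxy - sx * sy
    denominator = math.sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))
    return numerator / denominator


age = [1, 2, 3, 4, 5]       # years
weight = [7, 4, 6, 7, 10]   # kg
r = pearson_r_raw_sums(age, weight)
print(f"r = {r:.4f}")  # r = 0.6564
```

With $n = 5$, $\sum x = 15$, $\sum y = 34$, $\sum xy = 111$, $\sum x^2 = 55$ and $\sum y^2 = 250$, the formula gives $r = 45/\sqrt{50 \times 94} \approx 0.66$, a moderate positive correlation between age and weight.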


Note: The correlation coefficient always lies between −1 and +1. A value of −1 indicates
perfect negative correlation, +1 indicates perfect positive correlation, and 0 indicates
no (linear) correlation.

Spearman’s Rank Correlation Coefficient ($r_k$)


This is another technique for computing a correlation coefficient, using the ranks of the
values instead of the values themselves.
For example, given the data below, compute the Spearman’s Rank Correlation Coefficient.
X 10 6 9 12 8
Y 8 7 5 6 9

Here, the formula that we use is

$$r_k = 1 - \frac{6\sum D^2}{N(N^2 - 1)}$$

where $D$ is the difference between the ranks of X and Y for each pair of values and $N$ is
the number of pairs.

Now, let us solve the problem given above



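Ranks are assigned in descending order (the largest value gets rank 1), and $D = R_X - R_Y$:

X    Y    R_X   R_Y   D    D²
10   8    2     2     0    0
6    7    5     3     2    4
9    5    3     5    −2    4
12   6    1     4    −3    9
8    9    4     1     3    9

$\sum D^2 = 26$ and $N = 5$.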
Now substitute into the formula and obtain the answer.
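The ranking and substitution steps can be sketched in Python as follows. Ranks are assigned in descending order (largest value = rank 1), as in the table above; tied values, which do not occur in this example, would receive the average of their positions, in which case the formula is only an approximation. Both helper names are hypothetical.

```python
def ranks_descending(values):
    """Rank 1 = largest value; tied values share the average of their positions."""
    ordered = sorted(values, reverse=True)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1            # first (1-based) position of v
        count = ordered.count(v)                # number of times v occurs
        ranks.append(first + (count - 1) / 2)   # average of the tied positions
    return ranks


def spearman_rk(xs, ys):
    """Return (r_k, ΣD²) using r_k = 1 - 6ΣD² / (N(N² - 1))."""
    rx, ry = ranks_descending(xs), ranks_descending(ys)
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1)), sum_d2


X = [10, 6, 9, 12, 8]
Y = [8, 7, 5, 6, 9]
r_k, sum_d2 = spearman_rk(X, Y)
print(f"sum D^2 = {sum_d2}, r_k = {r_k:.2f}")  # sum D^2 = 26.0, r_k = -0.30
```

Substituting gives $r_k = 1 - \frac{6 \times 26}{5(5^2 - 1)} = 1 - 1.3 = -0.3$, a weak negative rank correlation.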

Example 2

Sometimes the ranks are already provided, which makes the task much easier.
For example:

The table below shows the ranks of 10 students in a Maths test and a Physics test.
Calculate the rank correlation coefficient.

R_M (Maths)   8 5 9 2 4 1 7 3 10 6
R_P (Physics) 7 10 6 3 2 5 9 1 8 4

Your task is to obtain D and $D^2$ for each student, then substitute your values into the
formula

$$r_k = 1 - \frac{6\sum D^2}{N(N^2 - 1)}$$

Let’s do the work.
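When the ranks are given, only $D$ and $D^2$ are needed. A short Python sketch for the Maths and Physics ranks:

```python
R_M = [8, 5, 9, 2, 4, 1, 7, 3, 10, 6]   # ranks in Maths
R_P = [7, 10, 6, 3, 2, 5, 9, 1, 8, 4]   # ranks in Physics

D = [m - p for m, p in zip(R_M, R_P)]   # rank differences
sum_d2 = sum(d * d for d in D)
N = len(D)
r_k = 1 - 6 * sum_d2 / (N * (N ** 2 - 1))
print(D)                                        # [1, -5, 3, -1, 2, -4, -2, 2, 2, 2]
print(f"sum D^2 = {sum_d2}, r_k = {r_k:.4f}")   # sum D^2 = 72, r_k = 0.5636
```

The differences always sum to zero ($\sum D = 0$), which is a quick arithmetic check. Here $r_k = 1 - \frac{6 \times 72}{10 \times 99} \approx 0.56$, a moderate positive agreement between the two rankings.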



QUIZ

1. Given that the covariance between the lengths and weights of 5 items is 6 and their
standard deviations are 2.45 and 2.61 respectively, find the coefficient of correlation
between the lengths and the weights.

Solution
$\operatorname{Cov}(X, Y) = 6$, $\sigma_X = 2.45$ and $\sigma_Y = 2.61$, and we know that

$$r = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \times \sigma_Y}$$
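The substitution is a one-line calculation; a Python sketch:

```python
cov_xy = 6
sigma_x, sigma_y = 2.45, 2.61   # standard deviations of lengths and weights

r = cov_xy / (sigma_x * sigma_y)
print(f"r = {r:.4f}")  # r = 0.9383
```

Since $2.45 \times 2.61 = 6.3945$, $r = 6/6.3945 \approx 0.94$, a strong positive correlation between length and weight.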

2. The coefficient of correlation between two variables X and Y is 0.49 and their
covariance is 36. If the variance of X is 16, find the standard deviation of Y.

Remember, $\sigma^2 = \text{variance}$.
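Rearranging $r = \operatorname{Cov}(X, Y)/(\sigma_X \sigma_Y)$ gives $\sigma_Y = \operatorname{Cov}(X, Y)/(r\,\sigma_X)$, with $\sigma_X = \sqrt{16} = 4$. A Python sketch of the arithmetic:

```python
import math

r = 0.49
cov_xy = 36
var_x = 16

sigma_x = math.sqrt(var_x)          # standard deviation = √variance = 4
sigma_y = cov_xy / (r * sigma_x)    # rearranged from r = Cov / (σ_X σ_Y)
print(f"sigma_Y = {sigma_y:.2f}")   # sigma_Y = 18.37
```

So $\sigma_Y = 36/1.96 \approx 18.37$.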

Exercise

1. The table below reports the ages (in years) and the number of hours of sleep in
one night by seven adults.

Ages, X 35 20 59 42 38 68 75

Hours of sleep, Y 7 9 5 6 8 5 4

i. Find the correlation coefficient between X and Y.
ii. Find the equation of the regression line of Y on X.

2. The following table shows the prices of a commodity and the amount demanded at
each respective price.

Price (X): 10 12 13 12 16 15

Amount demanded (Y): 40 38 43 45 37 43

i. Find the regression equation based on the above data.



ii. Estimate the likely amount to be demanded if the price is 14.
iii. Obtain the coefficient of correlation and determination between price and
quantity demanded and hence interpret their meaning.
3. Calculate the coefficient of correlation for the following data:

X 63 52 59 57 64 65 55 56 59

Y 126 125 117 113 130 129 111 113 116

4. Given the bivariate data:

X 1 5 3 2 1 1 7 3

Y 6 1 0 0 1 2 1 5

a. Fit a regression line of Y on X and hence predict Y when X = 10.
b. Fit a regression line of X on Y and predict X when Y = 2.5.
c. Calculate the correlation coefficient
d. Find the coefficient of determination and interpret it.

5. The following data give the ages and blood pressures of 10 women:

Age (X) 56 42 36 47 49 42 60 72 63 55

Blood pressure (Y) 147 125 118 128 145 140 155 160 149 150

a. Determine the correlation coefficient between X and Y.
b. Determine the least-squares regression equation of Y on X.
c. Estimate the blood pressure of a woman whose age is 45 years.

6. The following data have been collected regarding sales and advertising
expenditure

Sales (Sh. M) 8 9 7 8 9 10
Advertising expenditure (Sh. M) 21 25 29 33 37 41
Calculate the Karl Pearson’s correlation coefficient between sales revenue and
advertising expenditure. Comment on the results. (10 Marks)



EXTRAS

7. The following data give the test scores and sales made by 9 salesmen
during the last year:

Test scores 14 19 24 21 26 22 15 20 19
Sales (millions) 31 36 48 37 50 45 33 41 39
Obtain

i. The regression equation of test scores on sales (5 Marks)
ii. The regression equation of sales on test scores (5 Marks)

8. A firm examined eight applicants for a clerical post. From the marks obtained by
the applicants in the Accountancy and Statistics papers, compute the rank coefficient
of correlation.

Applicant A B C D E F G H
Marks in Accountancy 15 20 28 12 40 60 24 80
Marks in Statistics 40 32 50 35 20 10 30 60

9. The following data have been collected regarding sales and advertising
expenditure

Sales (Sh. M) 8.5 9.2 7.9 8.6 9.4 10.1

Advertising expenditure (Sh. M) 210 250 290 330 370 410
Calculate the Karl Pearson’s correlation coefficient between sales revenue and
advertising expenditure. Comment on the results. (12 marks)

10. Two soccer judges were assigned the task of assessing 10 players of a
team, and each judge awarded each player points as shown in the table below.



Player No. 1 2 3 4 5 6 7 8 9 10

Judge A 34 30 44 8 12 41 38 18 26 28

Judge B 26 22 42 10 18 32 46 17 12 30

Compute Spearman’s rank coefficient of correlation between the points
awarded by the two soccer judges. (8 Marks)

11. Find the regression equation for predicting Y from X given the data below. (7 marks)

X 0 2 8 6
Y 4 1 10 9

EXERCISES

QUESTION TWO (20 MARKS)

a. Use the method of least squares to determine the equation of the straight
line that best fits the following data.
(10 Marks)
X 11 13 14 17 18 21 26
Y 20 23 25 28 30 34 38
b. Using the data in (a) above, find the coefficients of correlation and
determination and interpret your result.
(10 Marks)

c. The following table gives the profits, in tens of thousands of shillings, of two
supermarkets. Compute the coefficient of variation for each supermarket
and indicate which one has the higher variability of profits.
(8 Marks)
Supermarket A 48 15 28 41 59 41
Supermarket B 33 20 23 69 45 53



d. A firm examined eight applicants for a clerical post. From the marks obtained by the
applicants in the Accountancy and Statistics papers, compute the rank coefficient of
correlation.
Applicant A B C D E F G H
Marks in Accountancy 15 20 28 12 40 60 24 80
Marks in Statistics 40 32 50 35 20 10 30 60

(8 Marks)

QUESTION THREE (20 MARKS)


I. The following table gives the various values of two variables.
(X): 42 44 58 55 89 98 66
(Y): 56 49 53 58 65 76 58
a. Determine the regression equation which may be associated with these
values and hence use it to estimate the value of y when x = 60.
(8 Marks)

b. Calculate the coefficient of correlation; hence interpret the results.


(8 Marks)

II. The table shows the turnover and profit before taxation of a supermarket from
1982 to 1987.
Year Turnover (‘0000 ksh) Profit before Taxation
1982 106 10
1983 125 12
1984 147 16
1985 167 17
1986 187 18
1987 220 22
(i) Plot a scatter diagram showing the relationship between profit before taxation and
turnover. (4 marks)
(ii) Calculate the regression line of profit before taxation on turnover
( 8 marks)
(iii) Forecast the profit if the turnover is 2,800,000. (2 Marks)



QUESTION FOUR (20 MARKS)
(a) The following data have been collected regarding sales and advertising expenditure
Sales (Sh. M) 8.5 9.2 7.9 8.6 9.4 10.1
Advertising expenditure (Sh. M) 210 250 290 330 370 410
Calculate the Karl Pearson’s correlation coefficient between sales revenue and
advertising expenditure. Comment on the results. (12 marks)

QUESTION FIVE (20 MARKS)


1. The following data have been collected regarding sales and advertising expenditure
Sales (Sh. M) 8 9 7 8 9 10
Advertising expenditure (Sh. M) 21 25 29 33 37 41
Calculate the Karl Pearson’s correlation coefficient between sales revenue and
advertising expenditure. Comment on the results. (10 Marks)

2. A researcher studied the connection between x (the age in years of a licensed
driver) and y (the percentage of fatal accidents for drivers of that age which
are caused by speeding). The collected data are shown below.

x 17 27 37 47 57 67 77
y 36 25 20 12 10 7 5
Use these data to:

i. Calculate the coefficient of correlation. (5 Marks)
ii. Find the regression equation that adequately represents the data. (5 Marks)
