Sps 2291 Lesson 4
SEPTEMBER 2023
MUSEMBI N.S, MSC. | STA 2470/SPM 2291: PROBABILITY AND STATISTICS LECTURE NOTES [2023]
LESSON FOUR:
CORRELATION AND
REGRESSION ANALYSIS
CORRELATION AND REGRESSION ANALYSIS
Lesson Objectives
By the end of the lesson learners should be able to:
▶ Interpret scatter diagrams for bivariate data.
▶ Fit the equation of the least squares regression line and use
it to estimate values.
▶ Calculate and interpret the value of the product-moment
correlation coefficient.
▶ Calculate and interpret the value of Spearman’s rank
correlation coefficient.
Introduction
Correlation
Correlation is a statistical measure that indicates the extent to which two
or more variables fluctuate together. A positive correlation indicates the
extent to which those variables increase or decrease in parallel; a negative
correlation indicates the extent to which one variable increases as the
other decreases.
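As a minimal sketch of how this measure can be computed, the snippet below assumes the usual product-moment form r = SSxy / sqrt(SSxx · SSyy). The function name `pearson_r` and the data values are hypothetical and chosen only for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Product-moment correlation, assuming r = SSxy / sqrt(SSxx * SSyy)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products about the means
    ss_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_xx = sum((a - mean_x) ** 2 for a in x)
    ss_yy = sum((b - mean_y) ** 2 for b in y)
    return ss_xy / sqrt(ss_xx * ss_yy)

# Hypothetical illustration data (not taken from the notes)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 4))
```

A value near +1 suggests the two variables rise together, a value near -1 suggests one falls as the other rises, and a value near 0 suggests little linear association.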
Scatter Diagrams
A scatter diagram is a tool for analyzing relationships between two
variables. One variable is plotted on the horizontal axis and the other is
plotted on the vertical axis. The pattern of their intersecting points can
graphically show relationship patterns.
Interpreting a Scatter Plot
Assumptions of Correlation
Example
Solution
Other solution
Example
Calculate the Spearman rank-order correlation coefficient using
the following data:
Solution
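A rank-order calculation of this kind can be sketched as follows. The sketch assumes the common no-ties formula r_s = 1 − 6Σd² / (n(n² − 1)), where d is the difference between paired ranks. The helper names `ranks` and `spearman_rs` and the data values are hypothetical and are not the data of the example above.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman_rs(x, y):
    """Spearman rank correlation via 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    if len(set(x)) != n or len(set(y)) != n:
        # The shortcut formula is only exact when no values are tied
        raise ValueError("this sketch does not handle tied values")
    rank_x, rank_y = ranks(x), ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

# Hypothetical illustration data (not taken from the notes)
x = [10, 20, 30, 40, 50]
y = [1, 3, 2, 5, 4]
rs = spearman_rs(x, y)
print(round(rs, 4))
```

When ties are present, one option is to assign average ranks and apply the product-moment formula to the ranks instead of using the shortcut.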
Exercise
A study is conducted involving 14 infants to investigate the association between gestational age at birth, measured in weeks, and birth
weight, measured in grams.
Use this data to calculate the Pearson correlation coefficient and Spearman’s rank correlation coefficient, and comment on your
answers.
REGRESSION ANALYSIS
Example
Solution
Coefficient of Determination
▶ We may ask: how good is the regression model? In other
words, how well does the independent variable explain the dependent
variable in the regression model? The coefficient of determination is
one concept that answers this question. In the absence of a regression
model, we use ȳ to perform estimation or prediction. Consequently, the
error of prediction is the difference between the actual observed value
and the mean of the observed values. If we calculate these errors in
the sample, square them, and add them up, the resulting sum is called
the total sum of squares and is denoted by SST.
▶ The coefficient of determination, denoted by $R^2$, represents the
proportion of SST that is explained by use of the regression model and
is given by:
$$R^2 = \beta_1 \frac{SS_{xy}}{SS_{yy}} = \frac{SSR}{SST}$$
where $SSR = \sum(\hat{Y} - \bar{Y})^2$ and $SST = \sum(Y - \bar{Y})^2$, with
$0 \le R^2 \le 1$.
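The calculation can be sketched in a few lines. The sketch assumes the usual least squares estimates, slope β1 = SSxy / SSxx and intercept β0 = ȳ − β1·x̄. The function names and data values are hypothetical and chosen only for illustration.

```python
def fit_line(x, y):
    """Least squares intercept b0 and slope b1, assuming b1 = SSxy / SSxx."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    ss_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_xx = sum((a - mean_x) ** 2 for a in x)
    b1 = ss_xy / ss_xx
    b0 = mean_y - b1 * mean_x
    return b0, b1

def r_squared(x, y):
    """Coefficient of determination computed as SSR / SST."""
    b0, b1 = fit_line(x, y)
    mean_y = sum(y) / len(y)
    # SSR: variation of the fitted values about the mean of y
    ssr = sum((b0 + b1 * a - mean_y) ** 2 for a in x)
    # SST: variation of the observed values about the mean of y
    sst = sum((b - mean_y) ** 2 for b in y)
    return ssr / sst

# Hypothetical illustration data (not taken from the notes)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = fit_line(x, y)
r2 = r_squared(x, y)
print(round(b0, 4), round(b1, 4), round(r2, 4))
```

For a simple linear regression with one independent variable, this value equals the square of the product-moment correlation coefficient between x and y.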
Example
A study is conducted involving 10 patients to investigate the relationship between patients’ age and their blood pressure. Use the
data below to obtain the coefficient of determination for the fitted line of best fit.
Exercise
1. The cetane number is a critical property in specifying the ignition quality of a fuel used in a diesel engine. Determination of
this number for a biodiesel fuel is expensive and time-consuming. A study was conducted that included the following data on
x = iodine value (g) and y = cetane number for a sample of 14 biofuels.
x 132.0 129.0 120.0 113.2 105.0 92.0 84.0 83.2 88.4 59.0 80.0 81.5 71.
y 46.0 48.0 51.0 52.1 54.0 52.0 59.0 58.7 61.6 64.0 61.4 54.6 58.