Correlation and Regression Analysis
Univariate, Bivariate and Multivariate Distribution
• Univariate Distribution:
• This type of data consists of only one variable.
• The analysis of univariate data is the simplest form of analysis, since the
information deals with only one quantity that changes.
• It does not deal with causes or relationships; the main purpose of the
analysis is to describe the data and find patterns within it. Examples of
univariate data are height and marks.
Perfect Positive Correlation
X 1 2 3 4 5
Y 1 2 3 4 5
r = +1
Scatter Diagram Example
Perfect Negative Correlation
X 1 2 3 4 5
Y 5 4 3 2 1
r = −1
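The two small tables above can be checked numerically. The sketch below is one possible way to do it, writing Pearson's r as the average product of standardized (z) scores; the helper name `pearson_r` is an illustrative choice, not something defined in these slides.

```python
from statistics import mean, pstdev

def pearson_r(xs, ys):
    """Pearson's r as the average product of z-scores (population form)."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    n = len(xs)
    return sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)) / n

x = [1, 2, 3, 4, 5]
r_pos = pearson_r(x, [1, 2, 3, 4, 5])  # Y rises in step with X
r_neg = pearson_r(x, [5, 4, 3, 2, 1])  # Y falls in step with X

# Rounded to absorb floating-point noise in the last digits.
print(round(r_pos, 6), round(r_neg, 6))
```

Up to floating-point rounding, the first pair gives r = +1 and the second gives r = −1, matching the values stated on the slides.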
Scatter Diagram Example
[Figure panels: high degree of positive correlation; high degree of negative correlation; low degree of positive correlation; low degree of negative correlation; no correlation (r = 0)]
Example
Students 1 2 3 4 5 6 7 8 9 10
Management aptitude score 400 675 475 350 425 600 550 325 675 450
Grade point average 1.8 3.8 2.8 1.7 2.8 3.1 2.6 1.9 3.2 2.3
Interpretation: From the scatter diagram shown in Fig., it appears that there is a
high degree of association between the two variables, because the data points lie
very close to a straight line. The upward pattern of the points also indicates a
high degree of positive linear correlation.
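The visual impression can be backed by a number. The following is a minimal sketch that computes r for the ten students from sums of deviations about the means; the variable names are illustrative only.

```python
from math import sqrt

aptitude = [400, 675, 475, 350, 425, 600, 550, 325, 675, 450]
gpa = [1.8, 3.8, 2.8, 1.7, 2.8, 3.1, 2.6, 1.9, 3.2, 2.3]

n = len(aptitude)
mean_x = sum(aptitude) / n
mean_y = sum(gpa) / n

# Sums of squared deviations and of cross-products about the means.
sxx = sum((x - mean_x) ** 2 for x in aptitude)
syy = sum((y - mean_y) ** 2 for y in gpa)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(aptitude, gpa))

r = sxy / sqrt(sxx * syy)
print(f"r = {r:.2f}")
```

The result is roughly 0.9, which is consistent with the high positive correlation read off the scatter diagram.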
Methods of Correlation Analysis
• The correlation between two ratio-scaled (numeric) variables is represented by
the letter r, which takes values between −1 and +1 only.
• This measure is sometimes called the ‘Pearson product moment correlation’ or
the correlation coefficient.
• The correlation coefficient is scale free and therefore its interpretation is
independent of the units of measurement of two variables, say x and y.
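[Figure: scale of r from −1 to +1 with tick marks at −1, −0.5, 0, 0.5 and 1; regions labelled, from left to right, strong negative correlation, weak negative correlation, weak positive correlation, strong positive correlation]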
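• The correlation coefficient is computed as
$r = \dfrac{\operatorname{Cov}(x, y)}{\sigma_x \, \sigma_y}$
• Where
Covariance: $\operatorname{Cov}(x, y) = \dfrac{1}{n} \sum (x - \bar{x})(y - \bar{y})$
$\sigma_x = \sqrt{\dfrac{1}{n} \sum (x - \bar{x})^2}$ and $\sigma_y = \sqrt{\dfrac{1}{n} \sum (y - \bar{y})^2}$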
• Suppose and
Example
• The following table gives indices of industrial production and the number of registered
unemployed people (in lakh). Calculate the value of the correlation coefficient using
Karl Pearson’s correlation coefficient method.
Year Index of production Number unemployed (lakh)
2000 100 15
2001 102 12
2002 104 13
2003 107 11
2004 105 12
2005 112 12
2006 103 19
2007 99 26
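One way to work this example is through the covariance form of Pearson's coefficient, r = Cov(x, y) / (σx σy), with population (divide by n) covariance and standard deviations. The sketch below assumes that convention; the variable names are illustrative.

```python
from statistics import mean, pstdev

production = [100, 102, 104, 107, 105, 112, 103, 99]
unemployed = [15, 12, 13, 11, 12, 12, 19, 26]

n = len(production)
mean_x, mean_y = mean(production), mean(unemployed)

# Population covariance: average product of deviations from the means.
cov_xy = sum(
    (x - mean_x) * (y - mean_y) for x, y in zip(production, unemployed)
) / n

# pstdev is the population standard deviation (divides by n).
r = cov_xy / (pstdev(production) * pstdev(unemployed))
print(f"Cov(x, y) = {cov_xy:.2f}, r = {r:.3f}")
# prints: Cov(x, y) = -11.50, r = -0.619
```

The negative sign indicates that, in this data, higher production indices tend to go with fewer registered unemployed.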
Example
• Calculate Karl Pearson’s correlation coefficient between
expenditure on advertising and sales from the data
during the IPL tournament.
Advertising expenditure (‘000 Rs) Sales (lakh Rs)
39 47
65 53
62 58
90 86
82 62
75 68
25 60
98 91
36 51
78 84
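The same coefficient can also be obtained without computing deviations, using the algebraically equivalent raw-sums form r = (nΣxy − ΣxΣy) / √[(nΣx² − (Σx)²)(nΣy² − (Σy)²)]. This form is not given on the slides; the sketch below uses it as an alternative, with illustrative variable names.

```python
from math import sqrt

advertising = [39, 65, 62, 90, 82, 75, 25, 98, 36, 78]  # '000 Rs
sales = [47, 53, 58, 86, 62, 68, 60, 91, 51, 84]        # lakh Rs

n = len(advertising)
sum_x = sum(advertising)
sum_y = sum(sales)
sum_xy = sum(x * y for x, y in zip(advertising, sales))
sum_x2 = sum(x * x for x in advertising)
sum_y2 = sum(y * y for y in sales)

# Raw-sums (computational) form of Pearson's r.
numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator
print(f"r = {r:.2f}")
# prints: r = 0.78
```

Since r does not depend on units, mixing thousands of rupees for advertising with lakhs of rupees for sales does not affect the result.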
Example
• The following table gives the distribution of items produced and the number of
defective items among them, by size group. Find the correlation coefficient
between size and defect in quality using the covariance method.
Size group No. of items No. of defective items
15-16 200 150
16-17 270 162
17-18 340 170
18-19 360 180
19-20 400 180
20-21 300 114
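The slides do not say how "size" and "defect in quality" are to be measured. The sketch below assumes size is represented by the class midpoint and defect by the percentage of defective items in each group, with each group counted once (unweighted). Under those assumptions the covariance method can be applied as follows.

```python
from math import sqrt

# (lower limit, upper limit, items produced, defective items) per size group
groups = [
    (15, 16, 200, 150),
    (16, 17, 270, 162),
    (17, 18, 340, 170),
    (18, 19, 360, 180),
    (19, 20, 400, 180),
    (20, 21, 300, 114),
]

# Assumption: x = class midpoint, y = percentage of defective items.
size = [(low + high) / 2 for low, high, _, _ in groups]
pct_defective = [100 * bad / total for _, _, total, bad in groups]

n = len(groups)
mean_x = sum(size) / n
mean_y = sum(pct_defective) / n

cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(size, pct_defective)) / n
sd_x = sqrt(sum((x - mean_x) ** 2 for x in size) / n)
sd_y = sqrt(sum((y - mean_y) ** 2 for y in pct_defective) / n)

r = cov_xy / (sd_x * sd_y)
print(f"r = {r:.3f}")
```

With these choices r comes out close to −0.95, suggesting that the share of defective items falls as size increases. A different treatment of the groups (for example, weighting by the number of items) would give a different value.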
Regression Analysis
Independent and dependent variables
• Regression and correlation analyses are based on the relationship, or association,
between two (or more) variables.
• The variable we are trying to predict is the dependent variable; the variable
used to predict or explain it is the independent variable.
Introduction to Regression Analysis
Predict the value of the dependent variable based on the value of at least one
independent variable.
Explain the impact of changes in an independent variable on the dependent variable.
The statistical technique of estimating the unknown value of one variable (i.e., the
dependent variable) from the known value of another variable (i.e., the independent
variable) is called regression analysis.
It shows how the typical value of the dependent variable changes when any one of the
independent variables is varied, while the other independent variables are held fixed.
Example:
Estimation using the Regression Line
• The equation for a straight line where the dependent variable Y is determined by
the independent variable X is:
$Y = a + bX$
• Using this equation, we can take a given value of X and compute the value of Y.
• The a is called the Y-intercept because its value is the point at which the
regression line crosses the Y-axis, that is, the vertical axis.
• The b in the equation is the slope of the line.
• It represents how much the dependent variable Y changes for each unit change
in the independent variable X.
• Both a and b are numerical constants because for any given straight line, their
values do not change.
Example 1
The following table shows the number of two-wheelers sold in
Hyderabad over a period of 8 years and the sale of tyres by MRF in
Hyderabad for the same period.
Year 2015 2016 2017 2018 2019 2020 2021 2022
No. of two-wheelers sold (in ‘000) 60 63 72 75 65 66 67 68
No. of tyres sold by MRF (in ‘000) 125 110 130 135 125 126 112 113
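The example does not state which line is to be fitted. The sketch below assumes tyre sales (Y) are to be estimated from two-wheeler sales (X) with a least-squares line Y = a + bX, using the standard estimates b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and a = ȳ − b x̄. The `predict` helper and the trial value X = 70 are hypothetical, for illustration only.

```python
two_wheelers = [60, 63, 72, 75, 65, 66, 67, 68]    # X, in thousands
tyres = [125, 110, 130, 135, 125, 126, 112, 113]   # Y, in thousands

n = len(two_wheelers)
mean_x = sum(two_wheelers) / n
mean_y = sum(tyres) / n

# Least-squares slope and intercept for the regression of Y on X.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(two_wheelers, tyres))
sxx = sum((x - mean_x) ** 2 for x in two_wheelers)
b = sxy / sxx
a = mean_y - b * mean_x

def predict(x):
    """Estimated tyre sales (thousands) for x thousand two-wheelers sold."""
    return a + b * x

print(f"Y = {a:.2f} + {b:.2f} X")
print(f"Estimated tyres at X = 70: {predict(70):.2f}")
```

Under these assumptions the fitted line is approximately Y = 58.35 + 0.95X, so each additional thousand two-wheelers sold is associated with about 0.95 thousand more tyres sold.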