Correlation and Regression Analysis

Uploaded by patnampanchami00
Copyright © All Rights Reserved

Topic 3

Correlation and Regression Analysis

Univariate, Bivariate and Multivariate Distribution
• Univariate Distribution:
• This type of data consists of only one variable.
• The analysis of univariate data is thus the simplest form of analysis, since the information deals with only one quantity that changes.
• It does not deal with causes or relationships; the main purpose of the analysis is to describe the data and find patterns that exist within it. Examples of univariate data are height and marks.

Table: Quiz Marks


Students 1 2 3 4 5 6 7
Marks Obtained 9 7 7 6 4 4 2
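Describing univariate data usually starts with a few summary measures. A minimal sketch using Python's standard `statistics` module on the quiz marks above; the choice of mean, median and mode(s) as the summary is ours, not the slide's.

```python
import statistics

# Quiz marks of the seven students from the table above
marks = [9, 7, 7, 6, 4, 4, 2]

mean_mark = statistics.mean(marks)        # arithmetic average of the marks
median_mark = statistics.median(marks)    # middle value of the sorted marks
modes = statistics.multimode(marks)       # most frequent value(s), in first-seen order

print(f"mean = {mean_mark:.2f}, median = {median_mark}, modes = {modes}")
```

The data has two modes (7 and 4), which is why `multimode` is used instead of `mode`.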
• Bivariate Distribution:
• This type of data consists of two different variables.
• Bivariate analysis examines the relationship between the two variables; knowing this relationship is important for making inferences in a given situation.
• A few instances where knowledge of an association or relationship between two variables would prove vital for decision-making are:
• Sales revenue and expenses incurred on advertising.
• Frequency of smoking and lung damage.
• Weight and height of individuals.
• Age and sign legibility distance.
• Age and hours of TV viewing per day.
• Family income and expenditure on luxury items.
• Multivariate Distribution:
• When the data involves three or more variables, it is categorized as multivariate.
• For example, suppose an advertiser wants to compare the popularity of four advertisements on a website. Their click rates could be measured for both men and women, and the relationships between the variables can then be examined.
Correlation Analysis
• A statistical technique used to analyse the strength and direction of the relationship between two quantitative variables is called correlation analysis.

• "An analysis of the relationship of two or more variables is usually called correlation." (A. M. Tuttle)
Types of Correlations
• There are three broad types of correlations:
1. Positive and negative,
2. Linear and non-linear,
Types of Correlations
• Positive and negative correlation
Types of Correlations
• Linear and non-linear
• A correlation is referred to as a linear correlation when the amount of change in one variable bears a constant ratio to the amount of change in the other variable.

• A correlation is referred to as a non-linear correlation when the amount of change in the values of one variable does not bear a constant ratio to the amount of change in the corresponding values of another variable.
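To make the "constant ratio" idea concrete, the sketch below compares how much y changes for each one-unit step in x for a linear and a non-linear relationship. The functions y = 3x + 2 and y = x² are hypothetical examples chosen only for illustration.

```python
# Equally spaced x values (one-unit steps)
xs = [1, 2, 3, 4, 5]

def changes(ys):
    """Return the change in y between consecutive points."""
    return [b - a for a, b in zip(ys, ys[1:])]

linear_ys = [3 * x + 2 for x in xs]   # hypothetical linear relationship
nonlinear_ys = [x ** 2 for x in xs]   # hypothetical non-linear relationship

# Each x step is 1, so the change in y equals the ratio (change in y) / (change in x)
linear_ratios = changes(linear_ys)
nonlinear_ratios = changes(nonlinear_ys)

print("linear:", linear_ratios)         # the same ratio at every step
print("non-linear:", nonlinear_ratios)  # the ratio keeps growing
```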
Methods of Correlation Analysis
1. Scatter Diagram method
2. Karl Pearson’s Coefficient of Correlation method
3. Spearman’s Rank Correlation method
Scatter Diagram
• The scatter diagram method is a quick, at-a-glance method of determining whether there is an apparent relationship between two variables.
• A scatter diagram (or graph) can be obtained on graph paper by plotting observed (or known) pairs of values of variables x and y, taking the independent variable values on the x-axis and the dependent variable values on the y-axis.
• It is common to try to draw a straight line through the data points so that an equal number of points lie on either side of the line.
• This straight line then describes the relationship between the two variables x and y.
• In a scatter diagram, the horizontal and vertical axes are scaled in units corresponding to the variables x and y, respectively.
Scatter Diagram Example
Perfect Positive Correlation

X 1 2 3 4 5
Y 1 2 3 4 5

r = +1
Scatter Diagram Example
Perfect Negative Correlation

X 1 2 3 4 5
Y 5 4 3 2 1

r = -1
Scatter Diagram Example
[Figure panels: high degree of positive correlation; high degree of negative correlation]
Scatter Diagram Example
[Figure panels: low degree of positive correlation; low degree of negative correlation]
Scatter Diagram Example
No Correlation (r = 0)
Example

• Draw a scatter diagram for the following data and interpret it:

Students 1 2 3 4 5 6 7 8 9 10
Management aptitude score 400 675 475 350 425 600 550 325 675 450
Grade point average 1.8 3.8 2.8 1.7 2.8 3.1 2.6 1.9 3.2 2.3

Interpretation: From the scatter diagram shown in Fig., it appears that there is a high degree of association between the two variables, because the data points lie very close to a straight line passing through them. This pattern of points also indicates a high degree of positive linear correlation.
Methods of Correlation Analysis
• The correlation between two ratio-scaled (numeric) variables is represented by
the letter r which takes on values between –1 and +1 only.
• Sometimes this measure is called the 'Pearson product moment correlation' or the correlation coefficient.
• The correlation coefficient is scale free and therefore its interpretation is
independent of the units of measurement of two variables, say x and y.

[Figure: scale of r from -1 to +1, labelled perfect negative correlation (-1), moderate negative correlation (-0.5), no correlation (0), moderate positive correlation (+0.5) and perfect positive correlation (+1); values near -1 or +1 indicate strong correlation and values near 0 weak correlation.]
Methods of Correlation Analysis
1. Scatter Diagram method
2. Karl Pearson’s Coefficient of Correlation method (Covariance Method)
3. Spearman’s Rank Correlation method
Karl Pearson’s Coefficient of Correlation method
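• Karl Pearson’s correlation coefficient measures quantitatively the extent to which two variables x and y are correlated. For a set of n pairs of values of x and y, Pearson’s correlation coefficient r is given by

$$r = \frac{\mathrm{Cov}(x, y)}{\sigma_x \, \sigma_y}$$

• where
$\mathrm{Cov}(x, y) = \frac{1}{n}\sum (x - \bar{x})(y - \bar{y})$ = covariance of x and y
$\sigma_x = \sqrt{\frac{1}{n}\sum (x - \bar{x})^2}$ = standard deviation of sample data on variable x
$\sigma_y = \sqrt{\frac{1}{n}\sum (y - \bar{y})^2}$ = standard deviation of sample data on variable y

• Substituting the mathematical formulas for Cov(x, y), $\sigma_x$ and $\sigma_y$, we have

$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2}\,\sqrt{\sum (y - \bar{y})^2}}$$

• Suppose $d_x = x - \bar{x}$ and $d_y = y - \bar{y}$; then

$$r = \frac{\sum d_x d_y}{\sqrt{\sum d_x^2 \, \sum d_y^2}}$$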
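The covariance form r = Cov(x, y) / (σx σy) and the deviation form r = Σ dx dy / √(Σ dx² Σ dy²) should give the same number, because the 1/n factors in the covariance and the two standard deviations cancel. A minimal sketch of both forms; the function names and the small data set are hypothetical, used only to check that the two agree.

```python
import math

def pearson_cov_form(x, y):
    """r = Cov(x, y) / (sigma_x * sigma_y), using divisor n throughout."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)

def pearson_deviation_form(x, y):
    """r = sum(dx*dy) / sqrt(sum(dx^2) * sum(dy^2)); no 1/n factors needed."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    num = sum(p * q for p, q in zip(dx, dy))
    den = math.sqrt(sum(p * p for p in dx) * sum(q * q for q in dy))
    return num / den

# Small hypothetical data set
x = [2, 4, 5, 7, 9]
y = [3, 5, 4, 8, 10]
print(pearson_cov_form(x, y), pearson_deviation_form(x, y))
```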
Example
• The following table gives indices of industrial production and the number of registered unemployed people (in lakh). Calculate the value of the correlation coefficient using Karl Pearson’s correlation coefficient method.
Year Production index Number unemployed (lakh)
2000 100 15
2001 102 12
2002 104 13
2003 107 11
2004 105 12
2005 112 12
2006 103 19
2007 99 26
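A sketch of the hand calculation for this example using the deviation form, with dx = x - x̄ and dy = y - ȳ. It prints the column totals one would compute in a table, then r; the code is our own illustration, not the slide's worked solution.

```python
import math

production = [100, 102, 104, 107, 105, 112, 103, 99]
unemployed = [15, 12, 13, 11, 12, 12, 19, 26]   # in lakh

n = len(production)
mean_x = sum(production) / n    # 104
mean_y = sum(unemployed) / n    # 15

dx = [x - mean_x for x in production]
dy = [y - mean_y for y in unemployed]

sum_dxdy = sum(a * b for a, b in zip(dx, dy))
sum_dx2 = sum(a * a for a in dx)
sum_dy2 = sum(b * b for b in dy)

r = sum_dxdy / math.sqrt(sum_dx2 * sum_dy2)
print(sum_dxdy, sum_dx2, sum_dy2, round(r, 2))
```

With Σ dx dy = -92, Σ dx² = 120 and Σ dy² = 184, r ≈ -0.62: higher production tends to go with fewer registered unemployed.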
Example
• Calculate Karl Pearson’s correlation coefficient between expenditure on advertising and sales from the following data for the IPL tournament.
Advertising expenditure ('000 Rs) Sales (lakh Rs)
39 47
65 53
62 58
90 86
82 62
75 68
25 60
98 91
36 51
78 84
Example
• The following table gives the number of items produced and the number of defective items among them, according to size group. Find the correlation coefficient between size and defect in quality using the covariance method.
Size group No. of items No. of defective items
15-16 200 150
16-17 270 162
17-18 340 170
18-19 360 180
19-20 400 180
20-21 300 114
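The slide does not say how to turn the grouped table into x and y values. One common approach, which we assume here, is to take x as the mid-value of each size group and y as the percentage of defective items in that group, then apply r = Cov(x, y) / (σx σy).

```python
import math

size_groups = [(15, 16), (16, 17), (17, 18), (18, 19), (19, 20), (20, 21)]
items = [200, 270, 340, 360, 400, 300]
defective = [150, 162, 170, 180, 180, 114]

# Assumption: x = mid-value of the size group, y = percentage of defective items
x = [(low + high) / 2 for low, high in size_groups]
y = [100 * d / total for d, total in zip(defective, items)]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)

r = cov / (sd_x * sd_y)
print([round(v, 1) for v in y], round(r, 2))
```

Under this setup r ≈ -0.95: larger sizes are associated with a markedly lower percentage of defective items.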
Regression Analysis
Independent and dependent variables
• Regression and correlation analyses are based on the relationship, or association,
between two (or more) variables.

• The known variable (or variables) is called the independent variable(s).

• The variable we are trying to predict is called the dependent variable; the independent variable is the one used to predict or explain it.
Introduction to Regression Analysis
• Predict the value of a dependent variable based on the value of at least one independent variable.
• Explain the impact of changes in an independent variable on the dependent variable.
• The statistical technique of estimating the unknown value of one variable (i.e., the dependent variable) from the known value of another variable (i.e., the independent variable) is called regression analysis.
• Regression analysis shows how the typical value of the dependent variable changes when any one of the independent variables is varied, while the other independent variables are held fixed.

• Example:
Estimation using the Regression Line
• The equation for a straight line where the dependent variable Y is determined by the independent variable X is:

$$Y = a + bX$$
• Using this equation, we can take a given value of X and compute the value of Y.
• a is called the Y-intercept because its value is the point at which the regression line crosses the Y-axis (the vertical axis).
• b is the slope of the line.
• It represents how much each unit change of the independent variable X
changes the dependent variable Y.
• Both a and b are numerical constants because for any given straight line, their
values do not change.
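The slides do not say how a and b are found. The usual choice is the method of least squares, which gives b = Σ dx dy / Σ dx² and a = ȳ - b·x̄, with dx = x - x̄ and dy = y - ȳ. A minimal sketch of that calculation; `fit_line` and `predict` are hypothetical helper names of our own.

```python
def fit_line(x, y):
    """Least-squares estimates of a (intercept) and b (slope) for Y = a + bX."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    b = sum(p * q for p, q in zip(dx, dy)) / sum(p * p for p in dx)
    a = mean_y - b * mean_x
    return a, b

def predict(a, b, x_value):
    """Estimate Y for a given X from the fitted line."""
    return a + b * x_value

# Hypothetical points lying exactly on Y = 1 + 2X, so the fit should recover a = 1, b = 2
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b, predict(a, b, 10))
```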
Example 1
The following table shows the number of two-wheelers sold in Hyderabad over a period of 8 years and the number of tyres sold by MRF in Hyderabad over the same period.
Year 2015 2016 2017 2018 2019 2020 2021 2022
No. of two-wheelers sold (in 000) 60 63 72 75 65 66 67 68
No. of tyres sold by MRF (in 000) 125 110 130 135 125 126 112 113

a) Find the regression equation to estimate the sale of tyres when the number of two-wheelers sold is known.
b) Estimate the sale of tyres when the number of two-wheelers sold is 850.
Example 2
• The monthly salaries of ten programmers and the lines of code each writes per day are given in the table below.
(a) Construct a regression line to predict the lines of code
written per day by a programmer based on the monthly
salary.
(b) What would be the lines of code written by a programmer
whose monthly salary is 48,000?
Monthly salary (in 000s of rupees) 32 33 34 37 38 40 40 41 42 43
Lines of code written per day 110 111 114 145 156 160 162 162 169 171
Example 3
• From the given data:
(a) Construct a regression line to predict the purchase based on the sales.
(b) What would be the purchase if sales is 120?

Sales Purchase
91 71
97 75
108 69
121 97
67 70
124 91
51 39
73 61
111 80
57 47
