Correlation and Regression Analysis

Correlation

To find the association between two non-metric variables, we do a crosstab with a chi-square test. When we want to examine the change in two related metric variables, we do a paired T-test; the SPSS output of a paired T-test includes correlation values by default. Correlation measures the relationship between two metric variables, just as the chi-square crosstab does for two non-metric variables. Both tests tell us whether the two variables are associated.
When we test the relationship between two variables, there are two possibilities: they may or may not be significantly related. Our null hypothesis in correlation is similar to that of the chi-square crosstab: it states that the two metric variables are not related. When this hypothesis is rejected, we conclude that the two variables are related.

After confirming the relationship, the next thing is to examine the direction of the relationship. Who says hate is not a relationship? It is a negative relationship. A positive relationship means both variables increase or decrease together. For example, height and weight may be positively related; in marketing, the advertising budget and brand recall may be positively related.
The two variables may also be negatively related. The most memorable example is the law of demand: ceteris paribus, when price increases, quantity demanded decreases.
The Karl Pearson correlation coefficient measures the relationship between two metric variables, such as height and weight, or sales promotion and sales. The value of this coefficient varies between -1 and +1. A value of zero indicates no LINEAR relationship: although the two variables have no linear relationship, they may still have a nonlinear one, such as quadratic or exponential.
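For reference (the notes do not give the formula), Pearson's coefficient for paired observations (x_i, y_i) is the standard

r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \sum_i (y_i - \bar{y})^2}}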

Once we know a relationship exists and whether it is positive or negative, the next step is to predict one variable when we know the other. The square of R, written R2, is called the coefficient of determination. R2 tells us how much of the variation in one variable can be explained by the other variable.
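For example, if the correlation between height and weight is r = 0.8, then R2 = 0.64: 64% of the variation in weight is accounted for by its linear relationship with height, and the remaining 36% is left unexplained.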
There is another coefficient, Spearman's Rho, used to measure the relationship between two ordinal (rank) variables. Let us take an example: suppose the MMI course is taught to the same students by two professors, Professor A and Professor B. Professor A teaches pre-midterm, and Professor B teaches post-midterm. The performance of students is likely to be correlated across the two halves.
Our null hypothesis is no linear correlation, i.e. R = 0.
The data are metric, i.e. marks (ratio scale), so we can compute the Karl Pearson correlation coefficient.
In SPSS, go to Analyse/Correlate/Bivariate, put the two variables in the test box, and click OK.
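The same test can also be run outside SPSS. Below is a minimal sketch in Python using scipy; the marks for the ten students are made up purely for illustration:

import numpy as np
from scipy.stats import pearsonr, spearmanr

# Hypothetical pre-midterm and post-midterm marks for ten students
pre_mid = np.array([62, 71, 55, 80, 67, 73, 58, 90, 66, 75])
post_mid = np.array([60, 74, 52, 85, 70, 71, 55, 92, 63, 78])

r, p = pearsonr(pre_mid, post_mid)          # for metric (ratio) data
rho, p_rank = spearmanr(pre_mid, post_mid)  # use if the data were ranks

print(f"Pearson r = {r:.3f}, p-value = {p:.4f}")
print(f"Spearman rho = {rho:.3f}, p-value = {p_rank:.4f}")

The p-values printed here are interpreted exactly as described next.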

First, we have to see whether we can say with more than 95% confidence that there is a correlation between the two variables. If the p-value is less than .05, the correlation is significant at the 95% level; if the p-value is less than .01, it is significant at the 99% level. Conveniently, SPSS flags the significant correlations: if the correlation is significant at 95%, one star appears next to the Pearson correlation coefficient, and if it is significant at 99%, two stars appear.

If no stars appear, then the null hypothesis of NO correlation has to be accepted, and no further inference can be drawn, no matter what the R-value is. If the correlation is significant (shown by stars), the next thing is to look at the sign of the correlation coefficient. A positive value means a direct relationship, and a negative sign means an inverse relationship. The third thing to look at is the R2 value. R2 is always positive and, as mentioned earlier, it tells us the amount of variation in one variable explained by the other; it is an indication of the strength of the relationship.

However, if the data are ordinal, meaning the two faculty gave the students ranks instead of marks, then we calculate Spearman's Rho, and the interpretation is precisely like Pearson's.

Regression Analysis
Basic Concept
Regression analysis is a dependence technique used to check the impact of independent variable(s) on a dependent variable and/or to predict the dependent variable given known values of the independent variables. Correlation is a necessary condition for regression: only if two variables are correlated can one be regressed on the other.

Application of regression
Regression can be used to determine:
1. Whether a relationship exists: whether the independent variables explain a significant variation in the dependent variable
2. The strength of the relationship: how much of the variation in the dependent variable can be explained by the independent variables
3. The structure or form of the relationship: the mathematical equation relating the independent and dependent variables
4. Predicted values of the dependent variable: when values for the independent variables are known
5. Control for other independent variables: when evaluating the contributions of a specific variable or set of variables

Types of regression

Types                       | No. and nature of DV | No. and nature of IV
Bivariate/simple regression | One, metric          | One, metric
Multiple regression         | One, metric          | Two or more, metric

Statistics associated with bivariate/simple regression


Coefficient of determination (R2): The strength of association is measured by the coefficient of determination, R2. It varies between 0 and 1 and signifies the proportion of the total variation in the DV that is accounted for by the variation in the IV.

Regression coefficients: Regression outputs generally give two regression coefficients, discussed as follows:
1. Standardized regression coefficient: Also termed the beta coefficient or beta weight, this is the slope obtained by the regression of Y on X when the data are standardized. The units of beta values are standardized, so this coefficient is used for comparing the relative importance of different IVs (see Things to do with regression below).
2. Unstandardized regression coefficient: The estimated parameter b is usually referred to as the non-standardized regression coefficient. The b values of the respective IVs are in their original units. This coefficient is used for prediction (purpose 4 in the Application of regression section). The regression equation is required in this case: Y = a + β1X1 (bivariate) OR Y = a + β1X1 + β2X2 + β3X3 + ... + βkXk (multiple)
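A minimal Python sketch (statsmodels, with made-up advertising and sales figures) shows how the two kinds of coefficients are obtained; in SPSS both appear side by side in the Coefficients table:

import numpy as np
import statsmodels.api as sm

# Hypothetical data: advertising budget and sales
ad = np.array([10, 12, 15, 18, 20, 24, 27, 30], dtype=float)
sales = np.array([110, 119, 135, 150, 158, 172, 180, 195], dtype=float)

# Unstandardized: Y = a + bX, where b is in the original units
fit = sm.OLS(sales, sm.add_constant(ad)).fit()
print(fit.params)  # [a, b]

# Standardized (beta): z-score both variables, then refit
ad_z = (ad - ad.mean()) / ad.std()
sales_z = (sales - sales.mean()) / sales.std()
fit_z = sm.OLS(sales_z, sm.add_constant(ad_z)).fit()
print(fit_z.params[1])  # beta weight; equals Pearson r in the bivariate case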

Statistics associated with multiple regression

Adjusted R2: R2, the coefficient of multiple determination, adjusted for the number of independent variables and the sample size to account for diminishing returns. After the first few variables, additional independent variables do not contribute much.

F test: The F test is used to test the null hypothesis that the coefficient of multiple determination in the population, R2pop, is zero. This is equivalent to testing the null hypothesis that all the regression coefficients are zero, i.e. β1 = β2 = ... = βk = 0.

VIF: The variance inflation factor is a measure of multicollinearity. Multicollinearity is a statistical phenomenon that occurs when two or more independent variables in a regression model are highly correlated with each other. Ideally, the VIF should be less than 5.

Durbin-Watson statistic: This measures autocorrelation. Autocorrelation refers to the degree of correlation of the same variable between two successive time intervals. The ideal range is between 1.5 and 2.5.
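A sketch of how these diagnostics can be computed in Python (statsmodels, on simulated data; the variable names are illustrative):

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(scale=0.5, size=n)  # deliberately correlated with x1
y = 2 + 1.5 * x1 - 0.7 * x2 + rng.normal(size=n)

X = sm.add_constant(np.column_stack([x1, x2]))
model = sm.OLS(y, X).fit()

print("Adjusted R2:", model.rsquared_adj)
print("F statistic:", model.fvalue, "p-value:", model.f_pvalue)
for i in (1, 2):  # skip column 0, the constant
    print(f"VIF for IV {i}:", variance_inflation_factor(X, i))  # want < 5
print("Durbin-Watson:", durbin_watson(model.resid))  # ideal range 1.5-2.5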

Case to use regression:

Recall that once we know a relationship exists, we would like to predict the behaviour of one variable once we know the other. I had told you about this incident in class. I was at IIM R, and a lady Accounts lecturer was taking attendance; I happened to be present there. She called a girl's name, and the girl was absent. The lecturer stopped taking attendance, looked at the class, smiled, and said that then a boy, named X, would also be absent, and everyone started laughing. She was applying regression after knowing correlation. Once there is a relationship between two variables, the behaviour of one variable can be predicted if we know the other. Simply put, regression is the average relationship between the two variables. Regression is meaningful only when the two variables are related (meaning the correlation is significant); hence correlation is the necessary condition for regression. It is for this reason that correlation and regression are always spoken of together.
In marketing, if we find that the advertising budget affects sales, the manager would like to know how much of an increase in advertising will lead to how much of an increase in sales. Another example is the negative relationship between price and demand. The objective of a manager is not to reduce or increase price, nor to reduce or increase demand; the objective often is to maximize profit. So regression can answer: if there is a relationship between price and quantity demanded, how much change in price will lead to how much change in demand, and vice versa.
The most important objective in regression, then, is to predict the value of one variable when we know the other. First, we will take the case of only two variables, called bivariate (bi = two): one dependent variable and one independent variable. The objective is to predict the dependent variable with the help of the independent variable.
Since we will try to fit a straight line to capture the relationship (which is why this is called linear regression), we should first see whether a linear relationship exists before trying to find one. The best way to SEE this is to plot a scatterplot. If the dots seem to form a line, only then should linear regression be done. However, if they seem to form a curve, exponential etc., we should not fit a linear regression but instead fit the curve equation that we think best represents the relationship.
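A quick way to do this visual check in Python (matplotlib; the price and demand figures are made up):

import numpy as np
import matplotlib.pyplot as plt

price = np.array([5, 6, 7, 8, 9, 10, 11, 12], dtype=float)
demand = np.array([98, 90, 84, 77, 70, 65, 58, 52], dtype=float)

plt.scatter(price, demand)  # do the dots fall roughly on a line?
plt.xlabel("Price")
plt.ylabel("Quantity demanded")
plt.title("Linearity check before fitting a regression line")
plt.show()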
The linear regression equation can be written as:
Y = a + bX + e
Y = dependent variable
X = independent variable
a = constant (intercept)
b = regression coefficient (slope)
e = error term
When we predict the value of the dependent variable Y from known values of X, we do not expect to determine the dependent variable fully from the independent one. The error term signifies the amount of variation not explained by the independent variable. In bivariate regression, the standardized coefficient of X is nothing but the correlation coefficient itself.
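Putting the pieces together, a short Python sketch (statsmodels, using the same made-up price and demand data as above) fits the bivariate equation and predicts demand at a new price:

import numpy as np
import statsmodels.api as sm

price = np.array([5, 6, 7, 8, 9, 10, 11, 12], dtype=float)
demand = np.array([98, 90, 84, 77, 70, 65, 58, 52], dtype=float)

fit = sm.OLS(demand, sm.add_constant(price)).fit()
a, b = fit.params
print(f"demand_hat = {a:.2f} + ({b:.2f}) * price")

# Predict demand at a hypothetical new price of 9.5
print("Predicted demand at price 9.5:", a + b * 9.5)

# The residuals are the error term e: variation the IV does not explain
print(fit.resid)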

Things to do with regression:


• Prediction (unstandardized beta)
• Comparing different IVs (standardized beta)

Restricted circulation. Do not share without explicit permission of Prof. Shivan Sanjay Patel
