Correlation Coefficient

Uploaded by

nishtha saxena

2. Linear Correlation (Pearson’s coefficient of correlation and Spearman’s rank order correlation)


Meaning: Correlation is a measure of the relation between two or more variables. In
other words, it shows how changes in one variable are associated with changes in the
other. If two variables are highly correlated, then when one changes by a certain
amount, the other changes, on average, by a corresponding amount.

In education and psychology, there are times when one needs to know whether there
exists any relationship among the different attributes of the individual or they are
independent of each other. Consequently, there are numerous questions like the
following which need to be answered by computing correlation:
1. Does scholastic achievement depend upon the general intelligence of a child?
2. Is it true that the height of the children increases with the increase in their ages?

Distributions that show the relation between two variables are called bivariate
distributions. An example is the joint frequency distribution of two variables, age
and marks, obtained for 50 students on a skill test.

Types of correlation

There are many types of correlation, such as linear and curvilinear, that are computed
in statistics. We shall confine ourselves here to linear correlation. When the
relationship between two sets of scores or variables can be represented graphically
by a straight line, it is known as linear correlation. Two common measures of linear
correlation are Pearson’s coefficient of correlation and Spearman’s rank order
correlation.

Pearson’s Coefficient of correlation (rxy)


Correlation is a measure of the relation between two or more variables; it shows how
changes in one variable are associated with changes in the other. For example,
Pearson’s coefficient can describe the relation between IQ and academic achievement.

Range: Correlation coefficients can range from -1.00 to +1.00. The value of -1.00
represents a perfect negative correlation while a value of +1.00 represents a perfect
positive correlation. A value of 0.00 represents a lack of correlation.

Strength and direction of relationship:

● Correlation is a matter of degree (intensity): Intensity refers to the strength of
the relationship and is expressed as a number between zero (meaning no
correlation) and one (meaning a perfect correlation). The sign does not matter here.

● Correlation is a matter of direction: Direction refers to how one variable moves
in relation to the other. A positive correlation (or direct relationship) means
that two variables move in the same direction, either both moving up or both
moving down. For instance, high school grades and college grades are often
positively correlated in that students who earn high grades in high school tend
to also earn high grades in college. A negative correlation (or inverse
relationship) means that the two variables move in opposite directions; as one
goes up, the other tends to go down. For instance, depression and self-esteem
tend to be inversely related: the more depressed an individual is, the lower his
or her self-esteem tends to be. The sign in front of the ‘r’ represents the
direction of a correlation.

Ans: 0, 0.62, -0.84, -0.90, 0.93,

Scatter Plot

What is a scatter diagram? With the help of scatter diagrams, show positive,
negative and zero correlation (or explain the meaning of direction with the help of a
scatter diagram).

Correlations are graphed on a special type of graph called a scatter plot (or
scattergram). On a scatter plot, one variable (typically called the X variable) is
placed on the horizontal axis (abscissa) and the other (the Y variable) is placed on
the vertical axis (ordinate). Each point represents one individual's pair of scores.
If the variables are highly correlated, the points will hug the line of best fit. For
a perfect correlation every point falls exactly on the line, while for low
correlations the points scatter widely around it. Thus, the more closely the points
hug the line of best fit, the higher the correlation. This is referred to as the
“hugging principle” (Minium).

Scatter diagrams are applied in several ways; their most important benefit is showing
the correlation between two variables. A scatter diagram makes it easy to see whether
the data points are positively correlated, negatively correlated, or not correlated
at all.

What can scatter plots tell us?

If the variables are highly correlated, the points will hug the line of best fit.

Types of Scatter Plots

According to the correlation, scatter plots are divided into the following three
categories.
1. Positive Correlation: The scatter plot with a positive correlation is also known
as a "Scatter Diagram with Positive Slant." In this case, as the value of X
increases, the value of Y tends to increase too, which means that the correlation
between the two variables is positive. If you draw a straight line through the
data points, the line slopes upward. For example, as the weather gets hotter,
cold drink sales tend to go up.
2. Negative Correlation: The scatter plot with a negative correlation is also
known as a "Scatter Diagram with a Negative Slant." In this case, as the value
of X increases, the value of Y tends to decrease. If you draw a straight line
through the data points, the line slopes downward. For example, as the number
of practice trials goes up, the number of errors tends to go down.
3. No Correlation: The scatter plot with no correlation is also known as the
"Scatter Diagram with Zero Degree of Correlation." In this case, the data points
are scattered so randomly that no sloping line fits them better than any other. You
can conclude that the two variables have no correlation, or a zero degree of
correlation. For example, if the weather gets hotter, we cannot conclude that the
sales of wooden chairs will go up or down, because there is no correlation
between the two variables.

Computation of Pearson’s product moment correlation

Raw score Formula
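
The formula itself is missing from this copy. The standard raw score form of Pearson's r, which works directly from the scores without first computing deviations, is r = (N ΣXY − ΣX ΣY) / sqrt([N ΣX² − (ΣX)²] [N ΣY² − (ΣY)²]). A minimal sketch of that computation follows; the function name pearson_r_raw and the five pairs of scores are made up for illustration and are not taken from the original worked example.

```python
from math import sqrt

def pearson_r_raw(x, y):
    """Pearson's r from the raw score formula (no deviations needed)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical scores of five students on two tests (not from the source).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r_raw(x, y)
print(round(r, 3))  # prints 0.775
```

Here N = 5, ΣX = 15, ΣY = 20, ΣXY = 66, ΣX² = 55 and ΣY² = 86, so r = 30 / sqrt(50 × 30), which is about 0.77.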

Rank difference method (rho)
To compute the coefficient of correlation by this method, we require the ranks of the
individuals on the two sets of scores, i.e. their positions of merit with respect to
the characteristics being measured.

Rank 1 is the highest rank

Example: Six traits (A, B, C, D, E, F) believed to be important in the work of an
executive have been ranked in order of merit by Judges X and Y as given below.
Find an appropriate correlation.
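
The judges' rankings are not reproduced in this copy, so the ranks in the sketch below are hypothetical. It applies the usual rank difference formula, rho = 1 − 6 ΣD² / (N (N² − 1)), where D is the difference between the two ranks given to the same trait; the function name spearman_rho is an assumption made for illustration.

```python
def spearman_rho(ranks_x, ranks_y):
    """Rank difference formula: rho = 1 - 6*sum(D^2) / (N*(N^2 - 1))."""
    n = len(ranks_x)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks_x, ranks_y))
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))

# Hypothetical ranks for traits A..F (rank 1 = highest); not the source's data.
judge_x = [2, 1, 4, 3, 6, 5]
judge_y = [1, 3, 2, 5, 4, 6]
rho = spearman_rho(judge_x, judge_y)
print(round(rho, 3))  # prints 0.486
```

For these made-up ranks ΣD² = 18, so rho = 1 − 108/210, a moderate positive agreement between the two judges.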

Example 2: When there are tied ranks
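
The worked example is missing from this copy. The usual convention for ties is that each tied score receives the average of the ranks the tied scores would otherwise have occupied, and these averaged ranks are then used in the rank difference computation. The sketch below shows only the ranking step; the function name average_ranks and the scores are hypothetical.

```python
def average_ranks(scores):
    """Rank scores with rank 1 as the highest; ties share the mean of their ranks."""
    ordered = sorted(scores, reverse=True)
    rank_of = {}
    for value in set(scores):
        first = ordered.index(value) + 1          # first position held by this score
        count = ordered.count(value)              # number of individuals with this score
        rank_of[value] = first + (count - 1) / 2  # mean of positions first..first+count-1
    return [rank_of[s] for s in scores]

# Hypothetical scores (not from the source); the score 40 occurs twice.
scores = [50, 40, 40, 30]
ranks = average_ranks(scores)
print(ranks)  # prints [1.0, 2.5, 2.5, 4.0]
```

The two individuals scoring 40 would have held ranks 2 and 3, so each receives 2.5, and the next individual is ranked 4.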

Factors affecting correlation coefficient

1. Pearson’s and Spearman’s correlation coefficients are appropriate only
for linear relationships: Pearson's r measures the strength of the linear
relationship between two variables: for a given increase (or decrease) in
variable X, there should be a constant corresponding increase (or decrease) in
variable Y. Spearman’s coefficient applies the same idea to ranks, so it
requires that Y rise (or fall) consistently as X rises.

When the data are curvilinear, Pearson’s correlation will be low compared with
a properly calculated curvilinear correlation. The reason lies in the hugging
principle, which states that the closer the scores are to the straight line
of best fit, the higher the value of ‘r’. When the data are curvilinear but a
straight line has nonetheless been fitted, the scores do not hug the straight
line very well, resulting in a low correlation. On the other hand, if a proper
curved line has been fitted to curvilinear data, the scores will hug this line
closely, reflecting a higher correlation.
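
A small check of this point, using made-up data: Y is completely determined by X through a U-shaped curve, yet a straight line fits so poorly that Pearson's r comes out at zero. The helper pearson_r is an illustrative deviation-score implementation, not something defined in these notes.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's r computed from deviations about the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: a perfect curvilinear (U-shaped) relation, Y = X squared.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]
r_curved = pearson_r(x, y)
print(round(r_curved, 3))
```

The positive and negative products of deviations cancel exactly, so the linear coefficient reports no relationship even though the curve describes the data perfectly.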

2. The correlation coefficient is sensitive to the range of talent (the
variability) characterizing the measurements of the two variables: The
smaller the amount of variability in X and/or Y, the lower the apparent
correlation. Conversely, if the variability of the scores is increased, the
correlation will tend to increase. Thus, for example, if a test is given only to
a low-ability group, the scores will huddle around the low end, while if it is
given only to a high-ability group, the scores will huddle around the high end.
In both cases variability is low, and so the correlation will be low. On the
other hand, if the test is given to a group covering the full range of ability
(for example, a normally distributed one), there will be more variability and
therefore a higher correlation.
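
The effect of restricting the range can be seen with a deliberately artificial data set: Y equals X plus a fixed alternating error of +3 or −3, so the "noise" is the same everywhere. All names and values below are assumptions for illustration only.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's r computed from deviations about the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical full group: X runs from 0 to 19.
x_full = list(range(20))
y_full = [x + (3 if x % 2 == 0 else -3) for x in x_full]

# Restricted group: only the six lowest scorers on X.
x_low, y_low = x_full[:6], y_full[:6]

r_full = pearson_r(x_full, y_full)
r_low = pearson_r(x_low, y_low)
print(round(r_full, 2), round(r_low, 2))
```

The underlying relation and the size of the error are identical in both groups; only the spread of X differs, and the coefficient for the restricted group is much lower (roughly 0.3 against roughly 0.9 for the full group).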

3. When samples are pooled, the correlation for the combined data depends
on where the sample values lie relative to one another in both the X and Y
dimensions: The correlation calculated for each sample separately and that
calculated after pooling the samples will generally differ. The correlation for
the combined data depends on how the two samples are positioned with respect to
X and Y. For example, the correlation will be higher for the pooled samples than
for the separate samples when one sample scores higher than the other on both
X and Y.
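
A sketch of the pooling effect with two made-up groups: within each group X and Y are unrelated, but one group scores higher on both variables, so the pooled data show a strong positive correlation. The group values and the helper pearson_r are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's r computed from deviations about the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical low-scoring and high-scoring groups (not from the source).
xa, ya = [1, 1, 3, 3], [1, 3, 1, 3]
xb, yb = [11, 11, 13, 13], [11, 13, 11, 13]

r_a = pearson_r(xa, ya)                 # within group A
r_b = pearson_r(xb, yb)                 # within group B
r_pooled = pearson_r(xa + xb, ya + yb)  # both groups combined
print(round(r_a, 2), round(r_b, 2), round(r_pooled, 2))
```

Each group alone gives r = 0, while the pooled coefficient is 200/208, about 0.96; it reflects the difference between the groups rather than any relation within them.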

4. The correlation coefficient tends to be an overestimate in discontinuous
distributions: If the distribution is discontinuous (there are gaps), then in
most cases the correlation coefficient will be higher than it would be if the
distribution were continuous. When the distribution is discontinuous (middle
values are missing), we have no information at all about what happens when X
and Y have intermediate values: we only know what happens when the values of X
and Y are either low or high. Since r is an estimate of the average strength of
the relationship, it would be misleading to use it in this case.
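
To illustrate, the sketch below builds an artificial continuous data set and then deletes the middle values of X, keeping only the low and high extremes. The data, the cut-off points and the helper pearson_r are all assumptions chosen for the demonstration.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's r computed from deviations about the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical continuous data: Y = X plus an alternating error of +3 / -3.
pairs = [(x, x + (3 if x % 2 == 0 else -3)) for x in range(20)]

# Discontinuous version: drop the middle values, keep X <= 4 and X >= 15.
extremes = [(x, y) for x, y in pairs if x <= 4 or x >= 15]

r_continuous = pearson_r([p[0] for p in pairs], [p[1] for p in pairs])
r_gap = pearson_r([p[0] for p in extremes], [p[1] for p in extremes])
print(round(r_continuous, 2), round(r_gap, 2))
```

With the middle of the distribution removed, the coefficient rises (from roughly 0.88 to roughly 0.92 here) although nothing about the relation between X and Y has changed.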

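5. Score transformation: Adding a constant to, or subtracting a constant from,
each score in the distribution of X or Y or both has no effect on the
correlation coefficient, and neither does multiplying or dividing each score by
a positive constant. (Multiplying or dividing by a negative constant reverses
the sign of r but leaves its size unchanged.)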
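
This invariance is easy to verify numerically. The scores below are hypothetical; the check covers shifting by a constant and rescaling by positive constants, and also shows that a negative multiplier reverses the sign of r while leaving its size unchanged.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's r computed from deviations about the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical scores (not from the source).
x = [2, 4, 5, 7, 9]
y = [1, 3, 2, 6, 8]

r = pearson_r(x, y)
r_shift = pearson_r([v + 10 for v in x], [v - 3 for v in y])   # add / subtract
r_scale = pearson_r([v * 2.5 for v in x], [v / 4 for v in y])  # multiply / divide
r_neg = pearson_r([v * -1 for v in x], y)                      # negative multiplier

print(round(r, 4), round(r_shift, 4), round(r_scale, 4), round(r_neg, 4))
```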

6. Homoscedasticity (equal variability): Homoscedasticity (pronounced
"ho-mo-skee-das-ti-ci-ty") basically means that the variances along the line of
best fit remain similar as you move along the line. The data should show
homoscedasticity when a Pearson product-moment correlation is calculated.

In the first diagram there is homoscedasticity, while in the second there is a
very low level of homoscedasticity.

Pearson's r describes the average strength of the relationship between X and
Y. Hence scores should have a constant amount of variability at all points in
their distribution. In the following graph, the data lack homoscedasticity.

● For small values of X and Y, there is a strong relationship: the points
are all very close to the regression line.
● However, for large values of X and Y, the relationship is much weaker.

This makes r pretty meaningless as a summary of the relationship between
the two variables: for small values of X and Y, r underestimates the
strength of the relationship, whereas for large values of X and Y, r
overestimates it.

7. The correlation coefficient, like other statistics, is subject to sampling
variation: When samples of paired scores are drawn repeatedly from the same
population and the correlation coefficient is calculated for each, a sampling
distribution of correlation coefficients is obtained. These correlation
coefficients differ from one another. The larger the samples, the less the
fluctuation and the more representative each coefficient is of the population.
Conversely, if the samples are small, the correlation coefficients will differ
more from one another and be less representative of the population.

8. Other factors affecting correlation: The correlation coefficient depends on
several factors. For example, if a correlation is found between intelligence and
maths score, then the correlation will depend on the specific tests used for
measuring intelligence and maths; it will differ among 4th class as compared
to 8th class students, and it may also differ according to socioeconomic status
(SES). Thus correlation depends on the specific measures used for assessing the
two variables, the kinds of subjects, and the context of the study.

Significance of measuring correlation


1. The objective of much research in psychology is to identify the extent to
which one variable relates to another variable. For example:
● Is there a relationship between a person's education level and their
health?
● Is pet ownership associated with living longer?
● Did a company's marketing campaign increase their product sales?
● Is intelligence related to better school performance?

2. Correlation helps in predicting one value from the other. The relationship can be
represented by a simple equation called the regression equation. For example, if we
know that the correlation between job performance and experience is high, then we can
predict future job performance from years of experience. The higher the correlation,
the better the prediction.
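
A minimal sketch of such a prediction, assuming the usual least-squares regression line Y' = a + bX, where b = ΣxΣy-deviation products divided by the sum of squared X deviations (equivalently, r times the ratio of the standard deviations of Y and X) and a = mean of Y − b × mean of X. The function name fit_line and the experience and rating figures are hypothetical.

```python
def fit_line(x, y):
    """Least-squares line for predicting Y from X: Y' = a + b*X."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((p - mean_x) * (q - mean_y) for p, q in zip(x, y))
    sxx = sum((p - mean_x) ** 2 for p in x)
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept
    return a, b

# Hypothetical data: years of experience and a job performance rating.
years = [1, 2, 3, 4, 5]
rating = [52, 55, 61, 64, 68]

a, b = fit_line(years, rating)
predicted = a + b * 6          # predicted rating after six years
print(round(a, 1), round(b, 1), round(predicted, 1))  # prints 47.7 4.1 72.3
```

For these made-up figures the line is Y' = 47.7 + 4.1X, so six years of experience predicts a rating of about 72.3. How far actual ratings stray from such predictions depends on the size of the correlation.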

3. Correlation and causation: Correlation by itself does not tell us about causation.

Example 1:

Example 2:

Until the late 19th century, it was believed by scientists and laypeople alike that bad
odors caused disease. The sick and dying tended to smell unpleasant, so the two
phenomena were correlated. However, it was only in 1880 that it became clear that,
while bad smells and disease often appeared together, both were caused by a third
variable: germs.

Figure 9.6 shows four of the possibilities that may underlie a correlation between
two variables:

(a) The condition of X determines (in part, at least) the condition of Y.
(b) The opposite is true: Y is a cause of X.
(c) Some third variable influences both X and Y.
(d) A complex of interrelated variables influences X and Y.

Moreover, two or more of these situations may occur simultaneously. Thus, an
investigator must be alert to the several possible interpretations of a
correlation.

Although a correlation by itself does not prove causation, this does not
mean that correlations are of no importance in research. The finding of a strong
correlation between two variables is often the starting point for additional research. In
1955, for example, a researcher found a correlation of +.70 between the mean
number of cigarettes smoked and the incidence of lung cancer in 11 countries (Doll,
1955). The correlation alone did not prove causation, for the countries also differed in
levels of air pollution and other factors that might cause lung cancer, but the results
were nevertheless suggestive and stimulated a great deal of additional research. On
the basis of this other research, the U.S. Surgeon General later ordered that all
packages of cigarettes sold in the United States carry the warning that smoking can be
hazardous to your health.

Difference between Pearson’s correlation and Spearman’s rank order correlation
