Correlation Coefficient
In education and psychology, we often need to know whether a relationship exists
among the different attributes of an individual or whether those attributes are
independent of each other. Consequently, there are numerous questions, like the
following, that need to be answered by computing correlation:
1. Does scholastic achievement depend upon the general intelligence of a child?
2. Is it true that the height of the children increases with the increase in their ages?
Such distributions that show the relation between two variables are called bivariate
distributions. The frequency distribution of two variables, age and marks, obtained
by 50 students in a skill test, is shown.
Types of correlation
There are many types of correlation, like linear and curvilinear, that are computed in
statistics. We shall confine ourselves here, to the method of linear correlation. When
the relationship between two sets of scores or variables can be represented graphically
by a straight line, it is known as linear correlation. Linear correlation is also of many
types such as Pearson’s coefficient of correlation and Spearman Rank Order
correlation.
Range: Correlation coefficients can range from -1.00 to +1.00. The value of -1.00
represents a perfect negative correlation while a value of +1.00 represents a perfect
positive correlation. A value of 0.00 represents a lack of correlation.
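The endpoints of this range can be checked numerically. Below is a minimal sketch of Pearson's r using only the Python standard library; the function name `pearson_r` and the sample data are our own illustration, not part of the original text.

```python
import math

def pearson_r(x, y):
    """Pearson's product-moment correlation (illustrative helper)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

ages = [10, 11, 12, 13, 14]
print(pearson_r(ages, [20, 25, 30, 35, 40]))  # perfectly increasing: +1.0
print(pearson_r(ages, [40, 35, 30, 25, 20]))  # perfectly decreasing: -1.0
```

Any data that fall exactly on a rising straight line give +1.00, any data on a falling line give -1.00, and values in between indicate weaker relationships.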
Scatter Plot
What is a scatter diagram? With the help of scatter diagrams, show positive,
negative and zero correlation. (Or: explain the meaning of direction with the help
of a scatter diagram.)
Correlations are graphed on a special type of graph called a scatter plot (or
scattergram). On a scatter plot, one variable (typically called the X variable) is placed on the
horizontal axis (abscissa) and the Y variable is placed on the vertical axis
(ordinate). If the variables are highly correlated, the points will hug the line of best fit.
For perfect correlations the scores hug the line of best fit very closely while for low
correlations the scores scatter widely from the line of best fit. Thus, the closer the
scores hug the line of best fit, the higher the correlation; Minium refers to this as the
"hugging principle".
Scatter diagrams are applied in several ways, the most important of which is
showing the correlation between two variables. A scatter diagram makes it easy to
observe whether the data points are positively correlated, negatively correlated, or
uncorrelated.
According to the correlation, scatter plots are divided into the following three
categories.
1. Positive Correlation: The scatter plot with a positive correlation is also known
as a "Scatter Diagram with Positive Slant." In this case, as the value of X
increases, the value of Y will increase too, which means that the correlation
between the two variables is positive. If you draw a straight line along the data
points, the slope of the line will go up. For example, if the weather gets colder,
hot drink sales will go up.
2. Negative Correlation: The scatter plot with a negative correlation is also
known as a "Scatter Diagram with a Negative Slant." In this case, as the value
of X increases, the value of Y will decrease. If you draw a straight line along
the data points, the slope of the line will go down. For example, if the number
of practice trials goes up, the number of errors will go down.
3. No Correlation: The scatter plot with no correlation is also known as the
"Scatter Diagram with Zero Degree of Correlation." In this case, the data point
4
spreads so randomly that you can’t draw a line through the data points. You
can conclude that these two variables have no correlation or zero degrees of
correlation. For example, if the weather gets hotter, we can’t conclude that the
sales of wooden chairs will go up or down because there is no correlation
between the two variables.
Raw score Formula
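The standard raw-score formula for Pearson's r is
r = (N ΣXY - ΣX ΣY) / sqrt[(N ΣX² - (ΣX)²)(N ΣY² - (ΣY)²)].
A minimal sketch of this formula in Python follows; the function name is our own and the data are hypothetical.

```python
import math

def pearson_r_raw(x, y):
    # r = (N*ΣXY - ΣX*ΣY) / sqrt((N*ΣX² - (ΣX)²) * (N*ΣY² - (ΣY)²))
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    num = n * sum_xy - sum_x * sum_y
    den = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return num / den

print(pearson_r_raw([1, 2, 3, 4], [2, 4, 6, 8]))  # perfect linear data: 1.0
```

The raw-score form avoids computing deviations from the mean, which made it convenient for hand calculation.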
Rank difference method rho
For computing this coefficient of correlation between two sets of scores, we require
ranks, i.e., the positions of merit of the individuals with respect to a certain
characteristic.
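Spearman's rank-difference formula is rho = 1 - (6 ΣD²) / (N(N² - 1)), where D is the difference between a pair's ranks. A short sketch, with hypothetical judges' rankings as data:

```python
def spearman_rho(rank_x, rank_y):
    # rho = 1 - (6 * ΣD²) / (N * (N² - 1)), D = difference between paired ranks
    n = len(rank_x)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - (6 * d2) / (n * (n * n - 1))

# Hypothetical: two judges rank five essays (rank 1 = best)
print(spearman_rho([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))  # 0.8
```

Identical rankings give rho = +1.00, exactly reversed rankings give rho = -1.00.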
Example 2: When there are tied ranks
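When scores are tied, each tied score is assigned the mean of the rank positions the tied group jointly occupies. A sketch of this rule in Python (the helper name `average_ranks` is ours):

```python
def average_ranks(scores):
    """Assign ranks (1 = highest score); tied scores share the mean of
    the rank positions they jointly occupy."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # extend j over all scores tied with the score at position i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j correspond to ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

# The two scores of 75 occupy rank positions 2 and 3, so each gets (2 + 3)/2 = 2.5
print(average_ranks([80, 75, 75, 60]))  # [1.0, 2.5, 2.5, 4.0]
```

The rank-difference formula is then applied to these averaged ranks; with many ties, a correction factor is usually added.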
1. Pearson’s and Spearman’s correlation coefficients are appropriate only
for linear relationships. A: Pearson's r and Spearman’s correlation measure
the strength of the linear relationship between two variables: for a given
increase (or decrease) in variable X, there should be a constant corresponding
increase (or decrease) in variable Y.
3. When samples are pooled, the correlation for the combined data depends
on where the sample values lie relative to one another in both the X and Y
dimensions: The correlation computed for each sample separately and the
correlation computed after pooling the samples will generally differ. The correlation
for the combined data depends on the nature of the two samples with respect to X
and Y. The pooled correlation will be higher than the separate correlations when one
sample has higher ability than the other.
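A hypothetical numerical illustration of this effect: within each of the two samples below, X and Y are unrelated (r = 0), but because one group scores higher on both X and Y, pooling them produces a high correlation. The data and helper function are our own construction.

```python
import math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Within each sample, X and Y are unrelated (r = 0) ...
group_a = ([1, 2, 3], [5, 6, 5])      # lower-ability group
group_b = ([11, 12, 13], [15, 16, 15])  # higher-ability group
print(pearson_r(*group_a), pearson_r(*group_b))  # 0.0 0.0
# ... but pooling the two groups yields a high correlation
pooled_x = group_a[0] + group_b[0]
pooled_y = group_a[1] + group_b[1]
print(round(pearson_r(pooled_x, pooled_y), 2))
```

The pooled r is driven almost entirely by the between-group difference, not by any within-group relationship.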
In the first diagram there is homoscedasticity, while in the second there is a
very low level of homoscedasticity.
The larger the sample, the more representative it is of the population. Conversely,
if the samples are small, the correlation coefficients will differ from each other
and will not be representative of the population.
2. Correlation helps in predicting one value from the other. The relationship can be
represented by a simple equation called the regression equation. For example, if we
know that the correlation between job performance and experience is high, then we
can predict future job performance based on years of experience. The higher the
correlation, the better the prediction.
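This prediction uses the regression line Y' = mean_Y + r * (sd_Y / sd_X) * (X - mean_X). A minimal sketch with hypothetical experience/performance data (the function name and numbers are ours):

```python
import math

def predict_y(x_new, x, y):
    """Predict Y from X via the regression line
    Y' = mean_y + r * (sd_y / sd_x) * (x_new - mean_x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sdx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sdy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    r = cov / (sdx * sdy)
    return my + r * (sdy / sdx) * (x_new - mx)

# Hypothetical data: years of experience vs. performance rating
experience = [1, 2, 3, 4, 5]
performance = [52, 58, 61, 65, 74]
print(round(predict_y(6, experience, performance), 1))  # predicted rating at 6 years
```

The slope term r * (sd_Y / sd_X) is what makes a higher correlation translate into a more reliable prediction: when r is near zero, the prediction collapses to the mean of Y.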
Example 1:
Example 2:
Until the late 19th century, scientists and laypeople alike believed that bad odors
caused disease. The sick and dying tended to smell unpleasant, so the two phenomena
were correlated. However, it was only in 1880 that it became clear that, while bad
smells and disease often appeared together, both were caused by a third variable:
germs.
Figure 9.6 shows four of the possibilities that may underlie a correlation between
two variables:
Although a correlation by itself does not prove causation, this does not
mean that correlations are of no importance in research. The finding of a strong
correlation between two variables is often the starting point for additional research. In
1955, for example, a researcher found a correlation of +.70 between the mean
number of cigarettes smoked and the incidence of lung cancer in 11 countries (Doll,
1955). The correlation alone did not prove causation, for the countries also differed in
levels of air pollution and other factors that might cause lung cancer, but the results
were nevertheless suggestive and stimulated a great deal of additional research. On
the basis of this other research, the U.S. Surgeon General later ordered that all
packages of cigarettes sold in the United States carry the warning that smoking can be
hazardous to your health.