CORRELATION
Dr. Doaa Mohamed Mahmoud
Assistant professor of Public Health and Community
Medicine
Intended Learning Outcomes (ILOs)
By the end of the session the student will be able to:
explore a linear relation between two continuous
variables
interpret parametric and non-parametric
correlation coefficients
Correlation
Correlation analysis is concerned with measuring the
degree of association between two variables, x
and y. Initially, we assume that both x and y are
numerical.
The correlation coefficient describes how closely two
variables are related
(its square gives the proportion of variability in one
measurement that is explained by the other measurement).
Suppose we have a pair of values, (x, y), measured on
each of the n individuals in our sample.
We can mark the point corresponding to each individual's
pair of values on a two-dimensional scatter diagram.
Conventionally, we put the x variable on the horizontal
axis, and the y variable on the vertical axis in this
diagram.
Plotting the points for all n individuals, we obtain a scatter
of points that may suggest a relationship between the
two variables.
Pearson’s correlation coefficient (r)
We measure how close the observations are to the
straight line that best describes their linear
relationship by calculating the Pearson product
moment correlation coefficient, usually simply called
the correlation coefficient.
Its true value in the population, ρ (the Greek letter
rho), is estimated in the sample by r.
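For reference, the sample coefficient r is conventionally computed with the formula below. The notation is an assumption, since it is not written out in the text: x̄ and ȳ denote the sample means of x and y, and the sums run over the n individuals.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}
```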
The sign indicates whether one variable increases as
the other variable increases (positive r) or whether
one variable decreases as the other increases
(negative r).
The magnitude indicates how close the points are to
the straight line. The closer r is to the extremes (−1 or +1), the
greater the degree of linear association.
The range of a correlation coefficient is from −1 to +1,
where +1 and −1 indicate a perfect linear association
between the two variables (this is most unusual in
practice).
A correlation coefficient of zero indicates absence of a
linear association.
A positive coefficient value indicates that both variables
increase in value together.
A negative coefficient value indicates that one variable
decreases in value as the other variable increases.
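The range and sign of r can be illustrated with a minimal Python sketch. The function name pearson_r and the three small data sets are made up for illustration; the calculation is the usual product moment formula.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares about the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Illustrative (made-up) data
x = [1, 2, 3, 4, 5]
r_up = pearson_r(x, [2, 4, 6, 8, 10])    # y rises with x along a straight line
r_down = pearson_r(x, [10, 8, 6, 4, 2])  # y falls as x rises along a straight line
r_curve = pearson_r(x, [4, 1, 0, 1, 4])  # symmetric curve, no linear trend

print(r_up, r_down, r_curve)
```

The third data set follows an exact curve, yet r is zero: a coefficient of zero rules out a linear association only, not every kind of relationship.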
Further properties of r
It is dimensionless, i.e. it has no units of
measurement.
Its value is valid only within the range of values
of x and y in the sample. You cannot infer that it
will have the same value when considering
values of x or y that are more extreme than the
sample values.
x and y can be interchanged without affecting
the value of r.
A correlation between x and y does not
necessarily imply a 'cause and effect' relationship.
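Two of the properties above (no units, and symmetry in x and y) can be checked numerically. The sketch below uses hypothetical height and weight values invented for illustration; pearson_r is a helper defined here, not a library function.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical measurements on five individuals (not real data)
height_cm = [150, 160, 165, 172, 180]
weight_kg = [52, 58, 63, 70, 79]

r_original = pearson_r(height_cm, weight_kg)

# Changing the units (cm to m, kg to g) leaves r unchanged: r is dimensionless
r_rescaled = pearson_r([h / 100 for h in height_cm],
                       [w * 1000 for w in weight_kg])

# Swapping the roles of x and y leaves r unchanged
r_swapped = pearson_r(weight_kg, height_cm)

print(r_original, r_rescaled, r_swapped)
```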
Pearson’s correlation coefficient (r) is a parametric
correlation coefficient that is used to measure the
association between two continuous variables that
are both normally distributed.
To test whether there is a linear association in the
population, define the null and alternative hypotheses
under study:
H0: ρ = 0
H1: ρ ≠ 0
It is important to note that a significant association
between two variables does not imply that they
have a causal relationship.
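The text does not show the test statistic. A commonly used one for H0: ρ = 0 is t = r √((n − 2) / (1 − r²)), which is referred to the t distribution with n − 2 degrees of freedom. The sketch below assumes that standard test. The function name and the values r = 0.6 and n = 27 are hypothetical.

```python
import math

def correlation_t_statistic(r, n):
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    if n < 3:
        raise ValueError("at least 3 pairs of observations are needed")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and +1")
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Hypothetical example: r = 0.6 observed in a sample of n = 27 individuals
r, n = 0.6, 27
t = correlation_t_statistic(r, n)   # approximately 3.75
df = n - 2                          # 25 degrees of freedom

print(t, df)
```

In this made-up example t is about 3.75, which exceeds the two-sided 5% critical value of roughly 2.06 for 25 degrees of freedom, so H0 would be rejected. The P-value itself would normally be taken from tables or statistical software.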
Spearman’s rank correlation coefficient
Spearman’s ρ (rho) is a rank correlation coefficient
that is used for two ordinal variables, or when one
variable has a continuous normal distribution and
the other variable is ordered categorical or non-normally
distributed. To compute this statistic, the values of each
variable are ranked, that is, sorted into ascending order and
numbered sequentially (tied values share the average of
their ranks), and Pearson’s r is then calculated on the
ranks.
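The ranking procedure can be sketched in Python as follows. This is a minimal illustration that ranks both variables, gives tied values the average of their ranks, and applies the Pearson formula to the ranks. The function names and the severity grade and hospital stay data are hypothetical.

```python
import math

def average_ranks(values):
    """Rank values in ascending order (1 = smallest); ties share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value is tied with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r calculated on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical data: ordinal severity grade and length of hospital stay (days)
grade = [1, 2, 2, 3, 4, 4]
days = [2, 3, 5, 4, 9, 8]

print(average_ranks(grade))
print(spearman_rho(grade, days))
```

Because only the ordering of the values enters the calculation, the result does not depend on the distribution of the original measurements.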
We calculate Spearman's rank correlation coefficient,
the non-parametric equivalent to Pearson's
correlation coefficient, if one or more of the
following points is true:
at least one of the variables, x or y, is measured on
an ordinal scale;
neither x nor y is Normally distributed;
the sample size is small;