Correlation DU Final
Examples
The Number of “Daily Hassles” (X) that people experience and the number of Physical and Psychological Symptoms (Y)
Bullying (X) and Mental Health Outcomes (Y) in Adolescents
Video Game Usage (X) and Cognitive Skills (Y) among School Children
Amount of Training Employees Receive (X) and their Job Performance (Y)
Correlation and Causation
The above data show a perfect positive relationship (with a fixed ratio) between income and weight, i.e., as
income increases, weight increases, and the rate of change between the two variables stays the same.
The rate of change can also vary (Variable Ratio, Non-linear correlations).
Correlation does not prove Causation
A Correlation Coefficient is a quantitative determination of the degree of relationship between two variables X
and Y. But r alone gives no information as to the character of the association.
We cannot assume a causal sequence unless we have evidence beyond the correlation coefficient itself.
Causation implies an invariable sequence (A always leads to B), whereas correlation is simply a measure of
mutual association between two variables.
Two cases arise in which the direction of the cause-effect relation may be inferred. In the correlation between
X and Y
(1) X may be in part at least a cause of Y
(2) X and Y may have the same basic cause or causes
Athletic prowess is known to depend upon physical strength, dexterity and muscular coordination. The r
between sensorimotor tests and athletic performance will be positive and high, and the direction of the cause-
effect relation is clear.
However, the r between executive ability and emotional stability is determined (besides selection) by
overlapping personality dimensions.
Causal relations are sometimes revealed or suggested by partial correlation where the influence of a given
variable, for example, age, can be controlled and its effects upon variability in other traits held constant. The r
between intelligence and educational achievement over a wide age range is often drastically reduced when
the effect of the age variable is removed.
3. To Test New Measurement Tools
Correlation Research can be used to assess whether a tool consistently or accurately
captures the concept it aims to measure.
Suppose that you want to test a new instrument for measuring your variable, you
need to test its Reliability and Validity.
Examples
A researcher might evaluate the validity of a brief extraversion test by administering it
to a large group of participants along with a longer extraversion test that has already
been shown to be valid.
This researcher might then check to see whether participants’ scores on the brief test
are strongly correlated with their scores on the longer one.
Neither test score is thought to cause the other, so there is no independent variable to
manipulate.
A Hypothetical Study on
Whether People Who Make ‘Daily To-Do Lists’ experience ‘Less Stress’ than People
Who ‘Do Not Make Such Lists’
Question: Do you think this study is Correlational or Experimental in nature?
Answer
The crucial point is that what defines a study as experimental or correlational is not
the variables being studied, nor whether the variables are quantitative or categorical,
nor the type of graph or statistics used to analyze the data.
It is how the study is conducted.
How to Conduct Correlational Research?
Correlational Studies
Examples
Social Anxiety (X) and Avoidance Behaviours (Y) in Adolescents
Observing adolescents in social situations such as school or social events could reveal patterns of avoidance
behaviours and their relationship with self-reported social anxiety levels.
Peer Modelling (X) and Prosocial Behavior in Children (Y)
Observing children’s interactions on the playground to see if those who observe their peers engaging in prosocial
behaviours are more likely to exhibit similar behavior themselves.
Leadership Styles (X) and Employee Behaviours (Y) in an Organisation
Observing how different leadership styles, such as transformational or autocratic, are associated with employee
engagement, communication patterns and task completion within a real work environment.
ii. Survey Method
In Survey Research, you can use Questionnaires to measure your variables of interest.
You can conduct surveys online, by mail, by phone, or in person.
Surveys are a quick, flexible way to collect standardized data from many participants, but
it’s important to ensure that your questions are worded in an unbiased way and capture
relevant insights.
Examples
Sleep Patterns (X) and Depression Symptoms (Y)
Participants complete questionnaires assessing their sleep quality, duration and depression symptoms.
Different Parenting Styles (X) and Delinquency among Adolescents (Y)
Surveys ask adolescents about parenting styles (such as authoritative, authoritarian
and permissive) and about delinquent behavioral tendencies, which helps identify patterns and associations.
Workplace Stress (X) and Job Satisfaction (Y)
Employees would respond to surveys assessing their stress level, job satisfaction and various stressors in
their work environment.
iii. Archival Data
Archival Data are data that have already been collected for some other purpose, such as official
records, polls, or previous studies.
Using secondary data is inexpensive and fast, because data collection is complete. However, the
data may be unreliable, incomplete or not entirely relevant, and you have no control over the
reliability and validity of the data collection procedures.
Example
Crime Rates (X) and Economic Factors (Y)
Social Psychologists might use archival data to examine the correlations between crime rates and
economic factors such as unemployment rates and poverty levels across different regions.
Researchers may analyze existing patient records and treatment outcome data to investigate the
correlation between specific patient characteristics such as age, gender and comorbidities and treatment
success or relapse rates.
One can use official national statistics and scientific studies from several different countries to combine
data on average working hours and rates of mental illness.
How to Measure Correlation?
The Measure of Correlation is called the Correlation Coefficient (r).
The Degree of Relationship expressed by a Correlation Coefficient (r) ranges from -1 to +1.
The Direction of Change is indicated by the Sign, + or -.
Positive and Negative Correlation
Positive Correlation: As the value of one variable increases, the value of
the other variable also increases. Similarly, as the value of one variable
decreases, the value of the other variable also decreases.
Negative Correlation: As the value of one variable increases, the value of
the other variable decreases.
No Correlation: Changes in one variable are not systematically associated with changes in the other (r is close to 0).
Strength of Correlation between Variables
Limitations of Correlation Method
i. Causality Problem
If two variables are correlated, it could be because one of them is a cause and
the other is an effect. But the correlational research design doesn’t allow you
to infer which is which. To err on the side of caution, researchers don’t
conclude causality from correlational studies.
Example
Lack of Perceived Social Support (X) and Depression (Y)
People with low Perceived Social Support are more likely to have Depression. But
you can’t be certain about whether having low perceived social support levels
causes depression, or whether having depression causes perception of lack of
social support due to negative thinking.
ii. A Third Variable Problem
A Confounding Variable is a third variable that influences other variables to make them
seem causally related even though they are not. Instead, there are separate causal links
between the confounder and each variable.
In correlational research, there is limited or no researcher control over such Confounding
or Extraneous Variables. Even if you statistically control for some potential
confounders, there may still be other hidden variables that disguise the relationship between
your study variables.
Example
Working hours (X) and Work-related Stress (Y); Meaning of Work as a Confounding Variable
People with lower working hours report lower levels of work-related stress. However, this
doesn’t prove that lower working hours causes a reduction in stress. There are many other
variables that may influence both variables, such as meaning of work, working conditions, job
insecurity, compensation, etc. You might statistically control for these variables, but you can’t
say for certain that lower working hours reduce stress because other variables may complicate
the relationship.
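The third-variable problem can be illustrated with a small simulation. This is only a sketch under stated assumptions: the data are randomly generated (not taken from any study), the names `z`, `x` and `y` are hypothetical, and the statistical control uses the standard first-order partial correlation formula. Both `x` and `y` depend only on the confounder `z`, yet they correlate strongly with each other until `z` is partialled out.

```python
import math
import random

def pearson_r(x, y):
    """Pearson r computed from deviation scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

random.seed(0)  # fixed seed so the illustration is repeatable
n = 2000
z = [random.gauss(0, 1) for _ in range(n)]    # hypothetical confounder
x = [zi + random.gauss(0, 0.5) for zi in z]   # x depends on z only
y = [zi + random.gauss(0, 0.5) for zi in z]   # y depends on z only

r_xy = pearson_r(x, y)
r_xy_z = partial_r(r_xy, pearson_r(x, z), pearson_r(y, z))

print(f"r_xy   = {r_xy:.2f}")    # sizeable, although x never influences y
print(f"r_xy.z = {r_xy_z:.2f}")  # close to zero once z is controlled
```

In real correlational research the confounder is often unmeasured, so this kind of control is only possible for the variables the researcher thought to record.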
Types of Correlations
Based on Direction of the Relationship: Positive Correlation; Negative Correlation
Based on Constancy of the Ratio of Change between Variables: Linear Correlation; Curvilinear / Non-Linear Correlation
Based on Number of Variables: Simple Correlation (r); Partial Correlation; Multiple Correlation
Based on Type of Variables Used / Special Correlations: Point-Biserial Correlation; Biserial Correlation
Non-Parametric: Spearman Rank Order Correlation (ρ); Kendall’s Tau
Linear and Non-Linear / Curvilinear Correlation
1. Simple Linear Correlations
Karl Pearson’s Product Moment Correlation
If the Quantitative Scores of both the Variables (which are assumed to be Normally distributed) are
available then the usual method for Correlation is Pearson Product Moment Correlation.
This method gives a precise numerical value of the degree of linear relationship between two Variables X
and Y.
Underlying Assumptions for Karl Pearson Correlations:
(a) Both Variables are Normally Distributed
(b) Both Variables are Continuous
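Formula to calculate the Karl Pearson Correlation Coefficient from Raw Scores:
$r = \dfrac{\sum xy}{\sqrt{\sum x^2 \cdot \sum y^2}}$ OR $r = \dfrac{\sum xy}{N \sigma_x \sigma_y}$ OR $r = \dfrac{N\sum XY - \sum X \sum Y}{\sqrt{[N\sum X^2 - (\sum X)^2]\,[N\sum Y^2 - (\sum Y)^2]}}$
where,
$\sum xy$ is the Sum of Cross-products of Deviation Scores ($x = X - \bar{X}$, $y = Y - \bar{Y}$) and can be expressed in terms of Raw Scores as
$\sum xy = \sum XY - \dfrac{\sum X \sum Y}{N}$
$\sum x^2$ is the Sum of Squares of the Deviation Scores for X
$\sum y^2$ is the Sum of Squares of the Deviation Scores for Y
($\sigma_x$ and $\sigma_y$ are the standard deviations of X and Y computed with N in the denominator)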
Karl Pearson’s Product Moment Correlation
$r = \dfrac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{SS_X \cdot SS_Y}}$
where $SS_X = \sum (X - \bar{X})^2$ and $SS_Y = \sum (Y - \bar{Y})^2$
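As a hedged illustration of the deviation-score and raw-score routes to r, the sketch below uses a small hypothetical data set. The values in `x` and `y` are invented for illustration and are not taken from the lecture, and the function names are assumptions. It checks that both formulas give the same coefficient.

```python
import math

def pearson_deviation(x, y):
    """r = sum(xy) / sqrt(sum(x^2) * sum(y^2)), using deviation scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sum_xy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return sum_xy / math.sqrt(ss_x * ss_y)

def pearson_raw(x, y):
    """Same coefficient computed directly from raw-score sums."""
    n = len(x)
    s_x, s_y = sum(x), sum(y)
    s_xy = sum(a * b for a, b in zip(x, y))
    s_xx = sum(a * a for a in x)
    s_yy = sum(b * b for b in y)
    numerator = n * s_xy - s_x * s_y
    denominator = math.sqrt((n * s_xx - s_x ** 2) * (n * s_yy - s_y ** 2))
    return numerator / denominator

# Hypothetical scores for five subjects
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r_dev = pearson_deviation(x, y)
r_raw = pearson_raw(x, y)
print(round(r_dev, 4), round(r_raw, 4))  # both 0.7746
```

For these values the sum of cross-products is 6 and the sums of squares are 10 and 6, so r = 6 / sqrt(60), which is about 0.77.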
Calculation of Karl Pearson’s Product Moment Correlation
Dichotomous Variables
True Dichotomous Variable (Discrete)
Forced / Artificial Dichotomous Variable (Continuous)
No Outliers
There should be No Outliers for the Continuous Variables for each category of Dichotomous Variable (can be tested
using Box-Plots).
Formula for Point-biserial r (rpbis)
Point Biserial r is a Product-Moment r
$r_{pbis} = \dfrac{N\sum XY - \sum X \sum Y}{\sqrt{[N\sum X^2 - (\sum X)^2]\,[N\sum Y^2 - (\sum Y)^2]}}$
Where,
X = Continuous Variable
Y = Dichotomous Variable
N = Total Sample
OR
$r_{pbis} = \dfrac{M_p - M_q}{\sigma_T}\sqrt{pq}$
Where,
Mp is the Mean of the First Group
Mq is the Mean of the Second Group
σT is the Standard Deviation of the Total Group
p is the proportion of the First Group
q is the proportion of the Second Group
Numerical Problem for Point-biserial r (rpbis)
Problem: The data are for 20 subjects, of whom 9 are Males and 11 are Females. Their marks
in the final examination are also provided. Calculate the appropriate correlation and interpret the
results.
Method I and Method II
rpbis = 0.386
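The two computational routes for the point-biserial coefficient can be sketched as follows. This is an illustration only: the marks below are hypothetical (the original data table is not reproduced here), Females are assumed to be coded 1 and Males 0, and the function names are invented. The group-means formula is evaluated with the total-group standard deviation computed over N, in which case it agrees exactly with the ordinary product-moment r.

```python
import math
from statistics import mean, pstdev

def point_biserial(scores, codes):
    """Group-means formula: (Mp - Mq) / sigma_T * sqrt(p * q)."""
    g1 = [s for s, c in zip(scores, codes) if c == 1]
    g0 = [s for s, c in zip(scores, codes) if c == 0]
    p = len(g1) / len(scores)
    q = 1 - p
    return (mean(g1) - mean(g0)) / pstdev(scores) * math.sqrt(p * q)

def pearson_r(x, y):
    """Ordinary product-moment r from deviation scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical marks: 9 males (coded 0) followed by 11 females (coded 1)
marks = [45, 52, 38, 60, 55, 47, 50, 42, 58,
         62, 48, 70, 55, 66, 59, 73, 51, 64, 68, 57]
gender = [0] * 9 + [1] * 11

r_means = point_biserial(marks, gender)
r_product = pearson_r(marks, gender)

# Reversing the coding (Male = 1, Female = 0) flips only the sign
r_recoded = point_biserial(marks, [1 - g for g in gender])

print(round(r_means, 3), round(r_product, 3), round(r_recoded, 3))
```

The sign change under recoding is the reason the sign must always be read against the coding of the dichotomous groups.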
Formulation of Hypothesis
Null Hypothesis
A hypothesis about a population parameter (e.g. mean) that a researcher tests.
 There will be no significant relationship between the marks in the final examination and the Gender of
the subjects.
Alternate Hypothesis
A hypothesis about a population parameter that contradicts the Null Hypothesis
There will be a significant relationship between the marks in the final examination and Gender of the
subjects.
OR
Female subjects will be associated with higher marks in the final examination compared to male
subjects.
OR
 Male subjects will be associated with higher marks in the final examination compared to female
subjects.
Interpretation of Point Biserial r (rpbis)
The Point-biserial correlation between Gender and Marks obtained is 0.386. The Sign is
Positive (+).
The Sign is ‘arbitrary’ and needs to be interpreted depending on the ‘coding of the
dichotomous group’. A positive sign means that the Group coded as 1 has a
higher Mean than the Group coded as 0.
Since Females are coded as 1, the Correlation indicates that Females
were associated with higher Marks in the Final Examination, and Males with
lower Marks.
The Result
“There was a Moderate Positive Correlation between the Gender of the Subjects and Marks
in their Final Examination (rpbis = 0.386).”
Suppose 1 is Male and 0 is Female, then how do we interpret?
rpbis = -0.386
The Negative Sign indicates that Males were associated with lower Marks and Females
with higher Marks in the Final Examination.
(b) Biserial r (rbis)
It is a Correlation Index that estimates the Strength of a Relationship between a True Continuous
Variable and a Forced or Artificial Dichotomous Variable.
It is used when both the Variables are assumed to be Continuous and Normally Distributed, but One
of the Variables is obtainable in Scores and the Other Variable is classified into 2 categories
(Forced Dichotomy).
Under these conditions the Biserial r is an estimate of the Product Moment r.
It can be used as a measure of the Discrimination Index of an Item.
Examples
Social Media Usage [Frequent (1) and Non-frequent (0)] and the Severity of Clinical
Depression Symptoms or Self-esteem in individuals with a specific medical
condition.
Social Adjustment [Socially Adjusted (1) and Socially Not Adjusted (0)], Academic
Achievement [Pass (1) and Fail (0)], Creativity [Creative (1) or Non-creative (0)],
and Intelligence in School Children
Supervisory Feedback [Positive (1) and Negative (0)] and Job Performance of
Employees
Parental Attachment [Secure (1) and Insecure (0)], Body Shape [Athletic (1) and Non-athletic
(0)], Belief Systems [Radical (1) or Conservative (0)], or Cognitive
Processing [Socially minded (1) or Mechanically minded (0)], etc.
Assumptions of Biserial r
Interval Level of Measurement for both the Variables
Both the Variables should be measured on a Continuous Scale, although one of the variables is
dichotomized based on a cut-off criterion.
Normal Distribution of Continuous Variable
Both the Continuous Variables should be approximately Normally Distributed, including within each category
of the Dichotomous Variable (can use the Shapiro-Wilk Test of Normality).
Homogeneity of Variance
The Continuous Variable should have Equal Variances across the categories of the
Dichotomous Variable (can use Levene’s test of equality of variances).
No Outliers
There should be No Outliers in the Continuous Variable within each category of the Dichotomous
Variable (can be tested using Box-Plots).
Formula for Biserial r (rbis)
$r_{bis} = \dfrac{M_p - M_q}{\sigma_T} \times \dfrac{pq}{u}$ OR $r_{bis} = \dfrac{M_p - M_T}{\sigma_T} \times \dfrac{p}{u}$
Where,
Mp is the Mean of the First Group
Mq is the Mean of the Second Group
MT is the Mean of the Total Group
σT is the Standard Deviation of the Total Group
p is the Proportion of the First Group
q is the Proportion of the Second Group
(preferably neither p nor q should be less than 0.01; values closer to 0.50 are better)
u is the Height of the Normal Curve ordinate dividing the two parts, p and q.
Numerical Problem for Biserial r (rbis)
Problem
A teacher wants to determine whether there is a
relationship between the Results of the Students
(Pass or Fail) and the Number of Hours per week that
they devoted to their studies.
The data of 14 students is given.
Calculate Biserial r correlation.
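A minimal sketch of the biserial calculation for a problem of this shape is given below. The study hours and results are hypothetical (the 14 students' actual data are not reproduced here), Pass is assumed to be coded 1, and the function names are invented. The ordinate u is obtained from the standard normal curve with Python's `statistics.NormalDist`. The point-biserial value is computed alongside for comparison, since the two differ only by the factor sqrt(pq)/u.

```python
import math
from statistics import NormalDist, mean, pstdev

# Hypothetical weekly study hours for 14 students; 1 = Pass, 0 = Fail
hours = [12, 9, 10, 18, 14, 7, 11, 13, 6, 9, 8, 12, 10, 7]
passed = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

def group_stats(scores, codes):
    """Return Mp, Mq, sigma_T, p and q for a 0/1 coded grouping."""
    g_p = [s for s, c in zip(scores, codes) if c == 1]
    g_q = [s for s, c in zip(scores, codes) if c == 0]
    p = len(g_p) / len(scores)
    return mean(g_p), mean(g_q), pstdev(scores), p, 1 - p

def normal_ordinate(p):
    """Height of the standard normal curve at the point cutting off area p."""
    z = NormalDist().inv_cdf(p)
    return NormalDist().pdf(z)

def biserial_r(scores, codes):
    m_p, m_q, sigma_t, p, q = group_stats(scores, codes)
    return (m_p - m_q) / sigma_t * (p * q / normal_ordinate(p))

def point_biserial_r(scores, codes):
    m_p, m_q, sigma_t, p, q = group_stats(scores, codes)
    return (m_p - m_q) / sigma_t * math.sqrt(p * q)

r_bis = biserial_r(hours, passed)
r_pbis = point_biserial_r(hours, passed)
print(round(r_bis, 2), round(r_pbis, 2))
```

Because sqrt(pq)/u is always greater than 1, the biserial value is larger in magnitude than the point-biserial value for the same data, and with well-separated groups it can exceed 1.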
Limitations:
Biserial r cannot be used in a Regression equation.
This coefficient has no Standard Error of Estimate, and the score predicted for all the members of a
group is simply the mean of that category. (Note: SErbis is larger than SEr and becomes increasingly
larger as the difference between p and q widens. In the case of Biserial r, only an approximate measure of the
stability of rbis is obtained with the available formula, as the exact sampling distribution of Biserial r is not
known.)
Unlike r, it is not limited to the range ±1.00, which makes comparisons with other coefficients of correlation
difficult.
Comparison of Biserial r (rbis) and Point Biserial r (rpbis)
In most cases Pt. Biserial r is a more dependable statistic than Biserial r.
The Pt. Biserial r makes No Assumption regarding the Form of the Distribution in the Dichotomized Variable.
Point Biserial r is always Lower than Biserial r (and closer to Product Moment r) for the same two variables, but this
characteristic is of little importance, as both coefficients are rarely computed from the same data.
Point Biserial r is a Product-Moment r and can be checked against r. This is usually not possible for Biserial r.
In favor of Biserial r it can be said that a Normal Distribution in the Split Variable is often a more plausible hypothesis than
the dubious assumption of a Genuine Dichotomy.*
Point Biserial r and Biserial r are both useful in Item Analysis. But Biserial r is neither as valid a procedure nor as defensible as Point
Biserial r.
* The only distinct advantage Biserial r has over Point Biserial r is that Tables are available from which we can read values of rbis quickly
and with sufficient accuracy for most purposes. All one needs to know is the percentage passing a given Item in selected Upper and
Lower Groups.
their common dependence upon a Third Variable.
E.g. Creativity and Intelligence, mediated by Verbal Comprehension
[Figure: variables x, y and z with their pairwise correlations r_xy, r_xz and r_yz]
(b) Statistically, by holding Variability of the Extraneous Factor Constant through
Partial Correlation.
Partial Correlation is a measure of the strength and direction of a Linear
Relationship between Two Continuous Variables (x and y) whilst controlling for
the Effect of One or More Continuous Variables (z), also known as ‘Covariates’,
‘Extraneous’ or ‘Control’ Variables.
i.e. Partial Correlation measures the Strength of Correlation between
Variable X and Variable Y when the Variable Z is kept constant.
Partial Correlations
When we are interested in finding out the Correlation of Variables of Interest, keeping Other Variables
Constant, then Partial Correlation is used.
Partial Correlation statistically removes the effects of the unwanted variables in the Study.
These are useful in Multiple Regression Analysis as they allow us to look at the relationship between two
variables when the effect of the third variable is held constant.
A significant positive partial correlation implies that as the values of one variable increase, the values of the
second variable also tend to increase, while holding the values of the control variable(s) constant.
r12.345 is the Correlation between Variable 1 and Variable 2, in which Variables 3, 4 and 5 have been partialled out.
Likewise, r12.3, r13.2, and r23.1
Zero Order, First Order, Second Order … Partial Correlations?
The Order indicates the number of Variables Partialled Out: a zero-order correlation (e.g. r12) controls for none, a first-order partial (e.g. r12.3) for one, and a second-order partial (e.g. r12.34) for two.
Examples
Exercise Frequency (X) and Sleep Quality (Y), while controlling for Medication Use (Z)
Body Shaming (X) and Body Dissatisfaction (Y), while controlling for Self-esteem (Z)
Training Completion (X) and Job Performance (Y), while controlling for Supervisor Feedback (Z)
Assumptions of Partial Correlation
All the Variables should be Continuous and measured at Interval Level
All three variables should be measured on a Continuous Scale (i.e. an Interval or Ratio
Scale), e.g. age, height, weight, test scores.
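Calculations of Partial Correlation
With 1 = milk intake, 2 = body weight and 3 = age, the zero-order correlations are:
$r_{12} = 362/\ldots = 0.53$
$r_{13} = 656/\ldots = 0.59$
$r_{23} = 452/\ldots = 0.80$
The first-order partial correlation is
$r_{12.3} = \dfrac{r_{12} - r_{13}\, r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}}$
$r_{12.3} = 0.13$ (substituting the two-decimal values above gives about 0.12; the reported figure presumably reflects unrounded coefficients)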
Interpretation
The analysis revealed that there is a positive
correlation between milk intake and body
weight while controlling for age.
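The arithmetic behind this result can be checked with a short sketch. It assumes the standard first-order partial correlation formula and the rounded zero-order coefficients 0.53, 0.59 and 0.80 that appear in the calculation, with 0.53 taken as the milk intake and body weight correlation. The function name is hypothetical.

```python
import math

def partial_r(r12, r13, r23):
    """First-order partial correlation r12.3 (variable 3 held constant)."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))

# Rounded zero-order correlations as they appear in the worked example
r12, r13, r23 = 0.53, 0.59, 0.80

r12_3 = partial_r(r12, r13, r23)
print(round(r12_3, 2))  # 0.12 with these rounded inputs
```

With the rounded inputs the result is about 0.12, slightly below the reported 0.13. Either way, the zero-order correlation of 0.53 shrinks to a weak positive value once age is held constant, which is the substantive point of the example.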
Multiple Correlation (R)
Multiple Correlation is a statistical technique to assess the relationship between One Variable and the ‘Joint
Effect’ of the Rest of the Variables (e.g. the relationship of Variable 1 with Variables 2 and 3 taken together).
It is a Correlation among More than Two Variables, and these Variables should be Continuous.
The closer R is to 1, the Greater the Linear Relationship. 1 is Perfect Correlation and 0 is No Correlation.
R1(23) is the Multiple Correlation of Variable 1 with the Combined or Joint Effect of Variables 2 and 3. Likewise,
R2(13) and R3(12).
Examples
Childhood Trauma, Coping Strategies, Social Support and Post-Traumatic Stress Disorder (PTSD). Among individuals with a
history of trauma, one can investigate how childhood trauma is correlated with the use of Coping Strategies, the availability of social
support and the presence of PTSD Symptoms in Adults.
Social Media Usage, Self-esteem, Loneliness and Depression. The study could examine whether high levels of Social
Media Usage are correlated with lower Self-esteem, increased feelings of Loneliness, and higher levels of Depression.
Academic Motivation, Peer Relationships, and Academic Achievement. A study might examine the relationships between
Academic Motivation, Peer Relationship Quality, and Academic Achievement in adolescents.
Leadership Style, Employee Engagement and Job Satisfaction. Exploring how leadership impacts employee attitudes and
turnover intentions by studying the relationships among leadership style, employee engagement and job satisfaction within a
company.
Numerical Problem for Multiple Correlation (R)
Problem: A researcher wants to find out the correlation between IQ (Variable 1), Study
Hours (Variable 2) and Annual Grades (Variable 3).
Calculations of Multiple Correlation
$r_{12} = 362/\ldots = 0.53$
$r_{13} = 656/\ldots = 0.59$
$r_{23} = 452/\ldots = 0.80$
$R_{3(12)} = \sqrt{\dfrac{r_{13}^2 + r_{23}^2 - 2\, r_{12}\, r_{13}\, r_{23}}{1 - r_{12}^2}} = 0.82$
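The multiple R reported here can be reproduced from the three zero-order correlations. The sketch below assumes the standard three-variable formula, with Annual Grades (Variable 3) as the variable being related to the joint effect of IQ and Study Hours. The function name is hypothetical.

```python
import math

def multiple_r(r_c1, r_c2, r_12):
    """Multiple correlation of a criterion with two other variables.

    r_c1, r_c2: correlations of the criterion with variables 1 and 2
    r_12:       correlation between variables 1 and 2
    """
    r_squared = (r_c1 ** 2 + r_c2 ** 2 - 2 * r_c1 * r_c2 * r_12) / (1 - r_12 ** 2)
    return math.sqrt(r_squared)

# Rounded zero-order correlations from the worked example
r12, r13, r23 = 0.53, 0.59, 0.80

R_3_12 = multiple_r(r13, r23, r12)
print(round(R_3_12, 2))  # 0.82
```

The joint correlation (0.82) is only slightly above the larger of the two single correlations (0.80), because IQ and Study Hours overlap with each other (r = 0.53) and so share part of what they explain.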
Correcting Multiple R for Inflation
A multiple R computed from a sample always tends to be somewhat
“inflated” with respect to the population R, owing to the accumulation of
chance errors, which tend to pile up since R is always taken as positive.
The boosting of a multiple R is most pronounced when N is small and the
number of variables in the problem is quite large.
An obtained R can be corrected or “shrunken” to give a better measure of the
population R by use of the following formula:
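The formula itself is not shown in the text above. As a sketch only, the code below assumes one widely used shrinkage correction, R_c² = 1 − (1 − R²)(N − 1)/(N − m), where N is the sample size and m is the total number of variables. This is an assumption and may not be the exact formula the lecture intended. The sample sizes and the function name are hypothetical.

```python
import math

def shrunken_r(r, n, m):
    """Shrunken multiple R under the assumed correction.

    r: obtained multiple R
    n: sample size
    m: total number of variables in the problem
    """
    r_squared = 1 - (1 - r ** 2) * (n - 1) / (n - m)
    # A negative corrected R^2 is conventionally treated as zero
    return math.sqrt(max(r_squared, 0.0))

R = 0.82  # obtained multiple R
m = 3     # three variables in the problem

small_sample = shrunken_r(R, n=20, m=m)   # hypothetical N = 20
large_sample = shrunken_r(R, n=200, m=m)  # hypothetical N = 200
print(round(small_sample, 3), round(large_sample, 3))
```

The correction bites harder for the small sample, in line with the point that inflation is most pronounced when N is small relative to the number of variables.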
Limitations of Multiple and Partial Correlation
The number of cases in a multiple correlation problem should be large, especially if there
are a number of variables; otherwise the coefficients calculated from the data will have
little significance.
The question of accuracy of computation is also involved. A general rule is that results
should be carried to as many decimals as there are variables in the problem. How strictly
this rule is to be followed must depend upon the accuracy of the original measures.
A serious limitation to a clear-cut interpretation of a partial r arises from the fact that most
of the tests employed by psychologists probably depend upon a large number of
“determiners.”
It would be fallacious to interpret the partial correlation between reading comprehension
and arithmetic, with the influence of “general intelligence” partialed out, as giving the net
relationship between these two variables for a constant degree of intelligence. Both reading
and arithmetic enter with heavy, but unknown, weight into most general intelligence tests;
hence the partial correlation between these two, for general intelligence constant, cannot
be interpreted in a clear-cut and meaningful way.
Since the coefficient of multiple correlation is always positive, variable errors of sampling tend
to accumulate and thus make the coefficient too large. A correction should be applied to R
when the sample is small and the number of variables is large.