PBH7003 Tests of relationships
REMINDER: Levels of measurement
Tests of Relationships
We're not always interested in determining differences! Sometimes the major objective of a study is to determine relationships.
Widely used methods to examine association and relationships between variables are correlation and regression analysis.
Correlation Analysis
A numerical measure of the strength of a relationship between two variables = correlation coefficient = r
[Figure: r runs along a scale from -1.0 (perfect negative) through 0 (no relationship) to +1.0 (perfect positive); example scatterplots show negative correlations of r = -0.7, r = -0.4 and r = -0.2.]
Guidelines for interpreting the correlation coefficient (r)
When we report our results, we need to comment on both the strength and direction of the relationship:
Whether the relationship is positive or negative (direction)
Whether the relationship is weak/moderate/strong (strength)
The example below is annotated for direction, strength and significance:
"A Pearson's correlation was conducted to identify the relationship between height and standing long jump performance. There was a very strong, significant, positive correlation between height and jump performance (r = 0.740, p < 0.05)."
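For readers working outside SPSS, a minimal Python sketch of the same Pearson calculation and report; the height and jump values below are invented for illustration:

```python
# A minimal sketch of computing and reporting Pearson's r in Python.
# The height/jump data are made up for illustration only.
from scipy.stats import pearsonr

height_cm = [168, 172, 175, 180, 183, 165, 178, 190, 171, 185]
jump_cm   = [195, 205, 210, 228, 232, 188, 215, 245, 200, 238]

r, p = pearsonr(height_cm, jump_cm)

direction = "positive" if r > 0 else "negative"
print(f"r = {r:.3f} ({direction}), p = {p:.3f}")
```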
Significance
A relationship can be strong and yet not significant
A relationship can be weak but significant
Depends on:
Size of coefficient (r)
Size of sample
Standard error of the correlation coefficient
[Figure: a classic spurious correlation, the number of snake bites vs. the volume of ice cream eaten, is explained by a 3rd factor: warm weather.]
Convention: the IV goes on the x-axis and the DV goes on the y-axis.
Parametric or non-parametric
Correlations can be either parametric or non-parametric in design.
Bivariate correlation: a linear relationship between TWO variables.
Partial correlation: a linear relationship between TWO variables whilst controlling for the effects of one or more other variables.
Assumptions of Pearson's correlation
Variables must be either interval or ratio measurements.
Data must be approximately normally distributed.
There should be a linear relationship between the two variables.
Outliers should be kept to a minimum or be removed entirely.
Linearity
Pearson’s correlation measures strength and
direction of LINEAR relationships only
Examining Linearity
To test whether your two variables form a linear relationship, plot them on a graph (e.g., a scatterplot) and visually inspect the graph's shape.
How to get a scatterplot in SPSS
Go to Graphs → Legacy Dialogs → Scatter/Dot, and a new window will appear.
Select 'Simple Scatter' and then click 'Define'.
Let's say you want to graph the association between the variables 'time spent revising' and 'exam performance'. Click OK to get the scatterplot.
The scatterplot in this example does not show a curvilinear association. Therefore, in these data, the assumption that the relationship between the variables must be linear has been met.
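As a non-SPSS alternative, the same visual check can be sketched in a few lines of Python; the revision and exam numbers below are invented:

```python
# Minimal matplotlib sketch of the visual linearity check.
import matplotlib.pyplot as plt

revision_hours = [2, 5, 8, 10, 12, 15, 18, 20]    # IV -> x axis
exam_score     = [35, 42, 50, 55, 58, 64, 70, 75]  # DV -> y axis

plt.scatter(revision_hours, exam_score)
plt.xlabel("Time spent revising (hours)")
plt.ylabel("Exam performance (%)")
plt.title("Checking linearity by eye")
plt.show()
```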
Outliers
Outliers will impact your correlation coefficient. The left graph shows a correlation coefficient of +0.4, but if the outlying data point were removed, the correlation coefficient would increase to +0.7 (right graph).
Examining Outliers
Again, assessed by visual inspection of a graph
Genuine Outliers
Transform your data, but then re-check all the assumptions!
Keep it, but highlight it in the write-up:
Run the correlation with and without the outlier to see if there is an appreciable difference (see the sketch below)
Remove it (preferably with justification for doing so):
e.g., you aimed to take a cross-section of individuals and you have one participant with a BMI of 55 kg·m⁻²; the individual doesn't represent the population you want to generalise to
Provide information about that data point in the write-up
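A minimal sketch of the with/without comparison mentioned above; the data, and the assumption that the final point is the outlier, are invented for illustration:

```python
# Sensitivity check: recompute r with and without a suspected outlier.
from scipy.stats import pearsonr

x = [1, 2, 3, 4, 5, 6, 7, 20]   # final point is the suspected outlier
y = [2, 4, 5, 7, 9, 11, 13, 5]

r_all, p_all = pearsonr(x, y)
r_trim, p_trim = pearsonr(x[:-1], y[:-1])   # drop the final point

print(f"With outlier:    r = {r_all:.3f}, p = {p_all:.3f}")
print(f"Without outlier: r = {r_trim:.3f}, p = {p_trim:.3f}")
```

If the two coefficients differ appreciably, report both in the write-up, as the slide suggests.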
Assumptions of Spearman's Correlation
Ordinal data (or interval/ratio data that don't meet parametric assumptions)
All you need is a monotonic relationship between the two variables: one that either never increases or never decreases as its IV increases.
[Figure: three example plots:]
1) As x increases, y never decreases (monotonic)
2) As x increases, y never increases (monotonic)
3) As x increases, y sometimes increases and sometimes decreases (not monotonic)
Spearman’s Rho in SPSS
Spearman's Rho output
When reporting the results, degrees of freedom (df) = N − 2.
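For readers outside SPSS, a minimal scipy sketch; the ordinal ratings and variable names are invented, and the df = N − 2 convention from above is used in the report line:

```python
# Minimal Spearman's rho sketch with invented Likert-style ordinal data.
from scipy.stats import spearmanr

satisfaction = [1, 2, 2, 3, 4, 4, 5]
usage        = [1, 1, 2, 3, 3, 5, 5]

rho, p = spearmanr(satisfaction, usage)
n = len(satisfaction)
print(f"r_s({n - 2}) = {rho:.3f}, p = {p:.3f}")   # df = N - 2
```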
Partial Correlation in SPSS
1. Move 'exam performance' and 'exam anxiety' across into the 'Variables' box.
2. Move 'time spent revising' into the 'Controlling for' box.
3. Click OK to run the analysis.
How to report the results
"A partial correlation was conducted to examine the association between exam anxiety and exam performance while controlling for the effect of revision time. There was a moderate, negative correlation between exam anxiety and exam performance, which was statistically significant, r(100) = -.247, p = .012."
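A partial correlation can also be computed by hand as a correlation of residuals: regress each variable on the control, then correlate what is left over. A minimal sketch with invented numbers mirroring the exam example (note the p-value caveat in the comments):

```python
# Partial correlation via residuals; all data invented for illustration.
import numpy as np
from scipy.stats import pearsonr

anxiety  = np.array([85, 70, 74, 60, 65, 52, 55, 43, 48, 30], float)
exam     = np.array([40, 48, 45, 57, 52, 64, 60, 72, 68, 80], float)
revision = np.array([5, 8, 10, 12, 15, 18, 20, 22, 25, 28], float)

def residuals(y, control):
    """Residuals of y after removing the linear effect of the control."""
    slope, intercept = np.polyfit(control, y, 1)
    return y - (intercept + slope * control)

r, p = pearsonr(residuals(anxiety, revision), residuals(exam, revision))
# Caveat: pearsonr's p assumes df = N - 2; the exact partial-correlation
# test uses df = N - 3 for one covariate, so treat this p as approximate.
print(f"partial r = {r:.3f}, p = {p:.3f}")
```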
Once you've established a correlation/relationship
It is not uncommon within health research to try to establish a true cause-and-effect relationship between one variable and another. It is possible to predict the influence of one variable's outcome on another variable's outcome. This is particularly important when practitioners and researchers try to predict the outcome on a variable that is difficult to measure by looking at performance on a measurable variable.
REGRESSION ANALYSIS
Regression Analysis
In regression analysis you have a DV and one or more IVs that are related to it. You can express the relationship as a linear equation:
y = a + bx
where y = DV, x = IV, a = constant (intercept), and b = slope of the line: for every increase of 1 unit in x, y changes by an amount equal to b.
Linear Regression
Some relationships are perfectly linear and fit this equation exactly. E.g., a mobile phone bill may be: total charge = monthly rental fee + (rate per minute × minutes used). If you know the monthly rental fee and the number of minutes used, you can predict the total charges exactly.
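For relationships that are not perfectly deterministic, a and b can be estimated from data. A minimal Python sketch with invented phone-bill-style numbers (roughly a fee of 20 plus 0.05 per minute, plus noise):

```python
# Fit y = a + bx and use it to predict; data are invented for illustration.
from scipy.stats import linregress

minutes = [100, 150, 200, 250, 300, 350, 400]
bill    = [25.1, 27.4, 30.2, 32.3, 35.0, 37.6, 39.9]

fit = linregress(minutes, bill)
print(f"a (constant) = {fit.intercept:.2f}, b (slope) = {fit.slope:.3f}")

# Predict the bill for 500 minutes: y = a + b * x
print(f"Predicted bill for 500 min: {fit.intercept + fit.slope * 500:.2f}")
```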
Multiple Regression
What if there are several factors affecting the dependent variable?
e.g., think of the price of a home as a DV. Several factors contribute to the price of a home.
So, it is common to use more than one variable to predict performance on another variable. This is known as Multiple Regression.
Multiple Regression
Each of the factors will have a separate relationship with the price of a home. The equation that describes a multiple regression relationship is:
y = a + b₁x₁ + b₂x₂ + ... + bₙxₙ
How good is the model? (R²)
R² indicates the proportion of variance in the DV that is explained by the model's IVs.
Things to consider (1)
Multicollinearity
Occurs when one or more of your IVs are related to one another. If two predictors are highly correlated, and each accounts for similar variance in the outcome, then we should consider only one.
Collinearity diagnostics (see the sketch after this list):
Examine the correlation matrix of all predictor variables to see if any are highly correlated; correlation up to as much as 0.8 is acceptable.
Variance Inflation Factor (VIF): indicates whether a predictor has a strong linear relationship with the other predictors; a value above 10 gives cause for concern (VIF should be less than 10).
Tolerance: another measure reported by SPSS relating to collinearity; should be investigated if less than 0.1 (tolerance should be > 0.1).
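The same diagnostics can be checked outside SPSS. A minimal statsmodels sketch with simulated predictors; 'age' and 'weight' are deliberately made to correlate so the inflation is visible, and all names are hypothetical:

```python
# VIF and tolerance check with statsmodels; data simulated for illustration.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)
n = 100
age = rng.uniform(20, 60, n)
weight = 50 + 0.5 * age + rng.normal(0, 5, n)   # correlated with age
heart_rate = rng.uniform(60, 100, n)

X = sm.add_constant(pd.DataFrame({"age": age, "weight": weight,
                                  "heart_rate": heart_rate}))
for i, name in enumerate(X.columns):
    if name == "const":
        continue
    vif = variance_inflation_factor(X.values, i)
    # Rule of thumb from above: VIF < 10, tolerance (= 1 / VIF) > 0.1
    print(f"{name}: VIF = {vif:.2f}, tolerance = {1 / vif:.2f}")
```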
Basic requirements of MR
You have one dependent variable that is measured at the continuous level.
SPSS output
[Screenshot: SPSS output for an example multiple regression measuring a person's individual aerobic capacity.]
Interpreting and reporting MR
When interpreting and reporting your results from a multiple regression, work through three stages:
(a) determine whether the multiple regression model is a good fit for the data,
(b) understand the coefficients of the regression model, and
(c) make predictions of the dependent variable based on values of the independent variables.
Presenting the results of MR
"A multiple regression was run to predict VO2max from gender, age, weight and heart rate. These variables statistically significantly predicted VO2max, F(4, 95) = 32.393, p < .0005, R² = .577. Age was negatively associated with VO2max (b = -0.165, t = -2.633, p < 0.05), etc."
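A minimal statsmodels sketch of fitting and summarising a model like the one reported above; all data are simulated, so the coefficients will not reproduce the slide's values:

```python
# Multiple regression with statsmodels; VO2max-style data simulated.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n = 100
df = pd.DataFrame({
    "age": rng.uniform(20, 60, n),
    "weight": rng.uniform(55, 95, n),
    "heart_rate": rng.uniform(60, 100, n),
    "gender": rng.integers(0, 2, n),   # 0/1 coded for simplicity
})
df["vo2max"] = (60 - 0.2 * df["age"] - 0.1 * df["weight"]
                - 0.05 * df["heart_rate"] + 4 * df["gender"]
                + rng.normal(0, 3, n))

model = smf.ols("vo2max ~ age + weight + heart_rate + gender", data=df).fit()
print(model.summary())   # F and its df, R-squared, and b/t/p per predictor
```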
Multiple Regression – Different Methods
Some predictor variables will be more important than others; some may have no influence at all.
We can choose to enter all our variables into the regression calculation using the enter method.
We can decide which variables to include (forward method) or exclude (backward method) from the regression equation, or do both (stepwise method).
The aim is to end up with a model that includes the predictors we believe are important in the prediction and excludes ones that have only a trivial effect on the DV.
Hierarchical Regression
A special form of multiple linear regression analysis in which more variables are added to the model in separate steps called 'blocks' (see the sketch below).
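Outside SPSS, a hierarchical analysis can be sketched as a comparison of nested models: fit the block-1 model, add the block-2 predictors, then F-test the change in R². A minimal statsmodels sketch with simulated data and hypothetical variable names:

```python
# Hierarchical (blockwise) regression as a nested-model comparison.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(0)
n = 100
df = pd.DataFrame({"age": rng.uniform(20, 60, n),
                   "weight": rng.uniform(55, 95, n)})
df["vo2max"] = 60 - 0.2 * df["age"] - 0.1 * df["weight"] + rng.normal(0, 3, n)

block1 = smf.ols("vo2max ~ age", data=df).fit()            # block 1
block2 = smf.ols("vo2max ~ age + weight", data=df).fit()   # blocks 1 + 2

print(f"R-squared: {block1.rsquared:.3f} -> {block2.rsquared:.3f}")
print(anova_lm(block1, block2))   # F test on the R-squared change
```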
Ordinal Logistic Regression
What about when our DV is ordinal? Ordinal logistic regression is used to predict an ordinal DV given one or more IVs.
e.g., the effects on taste (Likert-scale ratings ranging from strong dislike (1) to excellent taste (9)) of various cheese additives.
Requirements of Ordinal Logistic Regression
One DV that is measured at the ordinal level (e.g., Likert scales; PA level: sedentary, low, moderate and high).
One or more IVs that are continuous (e.g., age), ordinal (as above) or categorical (e.g., gender, ethnicity). However, ordinal IVs must be treated as being either continuous or categorical.
You still don't want outliers or multicollinearity!
You should have proportional odds: each IV has an identical effect at each cumulative split of the ordinal DV.
Example
Taxes have the ability to elicit strong responses in many people (too high vs. too low). A researcher presented participants with the statement 'Tax is too high in this country'. Participants gave a level of agreement: 'strongly disagree', 'disagree', 'agree' or 'strongly agree'; these are the ordered response categories of the DV (tax view). Other questions asked included whether they owned their own business, their age (age) and which political party they last voted for. An ordinal logistic regression was run to determine whether tax view could be predicted by the IVs (see the sketch below).
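A minimal Python sketch of this example using statsmodels' OrderedModel (available from statsmodels 0.13); all data are simulated, and 'business_owner' and 'tax_view' are hypothetical codings of the variables above:

```python
# Ordinal logistic regression with statsmodels; data simulated.
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(7)
n = 200
age = rng.uniform(18, 80, n)
business_owner = rng.integers(0, 2, n)

# Simulated ordinal DV: 0 = strongly disagree ... 3 = strongly agree
latent = 0.03 * age + 0.8 * business_owner + rng.logistic(size=n)
tax_view = pd.cut(latent, bins=[-np.inf, 1, 2, 3, np.inf],
                  labels=False).astype(int)

X = pd.DataFrame({"age": age, "business_owner": business_owner})
res = OrderedModel(tax_view, X, distr="logit").fit(method="bfgs", disp=False)
# One coefficient per IV, assumed identical at every cumulative split
# of the DV (the proportional odds assumption noted above).
print(res.summary())
```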
Only an experimental design can establish whether there is a causal relationship between variables.
Reading