PBH7003 Tests of relationships

The document discusses the levels of measurement and various statistical methods for analyzing relationships between variables, focusing on correlation and regression analysis. It explains correlation coefficients, their interpretation, and the differences between parametric and non-parametric methods, as well as the importance of controlling for confounding variables in partial correlation. Additionally, it covers the assumptions for using Pearson’s and Spearman’s correlation, the significance of regression models, and the considerations for multiple regression analysis.


2

REMINDER: Levels of measurement

3
REMINDER: Levels of measurement

4
Tests of Relationships
We're not always interested in determining differences!
Sometimes the major objective of a study is to determine relationships
Widely used methods to examine associations and relationships between variables include correlation and regression analysis

5
Correlation Analysis
Numerical measure of strength of a relationship between
two variables = Correlation coefficient = r

 r = −1.0 – perfect negative correlation
 r = 0 – no correlation (null hypothesis)
 r = +1.0 – perfect positive correlation

Parametric – Pearson's correlation coefficient (r)

Non-parametric – Spearman's correlation coefficient (rs)


6
Correlation

 Positive (value > 0) – data show an upward trend
 Negative (value < 0) – data show a downward trend
 No correlation (value of 0) – data show no trend
7
Strength of correlation
The closer the points lie to the line, the stronger the correlation

[Scatterplot panels – positive correlations: r = 0.7, r = 0.4, r = 0.2; negative correlations: r = −0.7, r = −0.4, r = −0.2]

8
Guidelines for interpreting
correlation coefficient (r)
When we report our results, we need to comment on both the
strength and direction of the relationship
 Whether the relationship is positive or negative (direction)
 Whether the relationship is weak/moderate/strong (strength)

0.0 < r < 0.1   Trivial correlation
0.1 < r < 0.3   Weak correlation
0.3 < r < 0.5   Moderate correlation
0.5 < r < 0.7   Strong correlation
0.7 < r < 0.9   Very strong correlation
0.9 < r < 1.0   Nearly perfect correlation
r = 1.0         Perfect correlation

(The same bands apply to negative correlations, using the absolute value of r.)
9
Example write-up: the 4-step process

0.0 < r < 0.1   Trivial correlation
0.1 < r < 0.3   Weak correlation
0.3 < r < 0.5   Moderate correlation
0.5 < r < 0.7   Strong correlation
0.7 < r < 0.9   Very strong correlation
0.9 < r < 1.0   Nearly perfect correlation
r = 1.0         Perfect correlation

The write-up names the test and comments on the direction, strength, and significance of the relationship:

 A Pearson's correlation was conducted to identify the relationship between height and standing long jump performance. There was a very strong, significant, positive correlation between height and jump performance (r = 0.740, p < 0.05).
10
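For readers working outside SPSS, the same analysis is a one-liner in Python with scipy; the height and jump values below are invented for illustration:

```python
from scipy import stats

# Hypothetical data: height (cm) and standing long jump distance (cm)
height = [165, 170, 172, 175, 178, 180, 183, 185, 188, 190]
jump = [180, 195, 190, 205, 210, 215, 220, 230, 228, 240]

r, p = stats.pearsonr(height, jump)
print(f"r = {r:.3f}, p = {p:.3f}")  # report direction, strength, significance
```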
Significance
A relationship can be strong and yet not significant
A relationship can be weak but significant

Depends on:
 Size of the coefficient (r)
 Size of the sample (n)

Standard error of the correlation coefficient:

SEr = √((1 − r²) / n)

 Small samples – easy to produce a strong correlation
 Large samples – easy to achieve significance
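A quick sketch of how sample size drives significance, using the slide's standard-error formula (note that some texts place n − 2 in the denominator instead of n):

```python
import math

def se_r(r: float, n: int) -> float:
    """Standard error of r, per the slide's formula SEr = sqrt((1 - r^2) / n)."""
    return math.sqrt((1 - r ** 2) / n)

# The same r looks far more certain in a large sample
print(se_r(0.4, 10))    # ~0.290 – weak evidence
print(se_r(0.4, 1000))  # ~0.029 – easily significant
```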
11
Things to remember! (1)
The measurement is correlation, not causation!
Just because two variables are correlated does not
mean that one causes the other

 e.g., the number of snake bites correlates with the volume of ice cream eaten – both are explained by a third factor (warm weather)

 Weather gets warmer – snakes become more active – appetite for cool refreshments also increases
12
Things to remember! (2)
The correlation treats all variables equally
i.e., it doesn’t distinguish between independent and
dependent variables

e.g., as height ↑ so does basketball performance (makes sense)

 Plot it the other way round (treating height as if it were determined by basketball performance) and you still get the same r

Convention: IV on the x-axis, DV on the y-axis
13
Parametric or non-parametric
Correlations can be both parametric and non-parametric
in design.

Bivariate correlation
• Linear relationship between TWO variables

Partial correlation
• Linear relationship between TWO variables whilst controlling for the effects of one or more other variables
14
Assumptions of Pearson’s correlation
Variables must be either interval or ratio
measurements
Data must be approximately normally
distributed
There should be a linear relationship between
the two variables.
Outliers should be kept to a minimum or be
removed entirely.
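The normality assumption can be checked outside SPSS too; a minimal sketch with scipy's Shapiro–Wilk test (the height values are invented):

```python
from scipy import stats

height = [165, 170, 172, 175, 178, 180, 183, 185, 188, 190]  # hypothetical

stat, p = stats.shapiro(height)
print(f"W = {stat:.3f}, p = {p:.3f}")
# p > 0.05 suggests no significant departure from normality
```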

15
Linearity
Pearson’s correlation measures strength and
direction of LINEAR relationships only

 If you have a curvilinear relationship, a correlational analysis using a linear model will tell you that you have no relationship when in fact you do
 e.g., psychological arousal and performance (an inverted-U relationship)

16
Examining Linearity
To test whether your two variables form a linear relationship:
 Plot them on a graph (e.g., a scatterplot) and visually inspect the graph's shape (a Python sketch follows below)

 If non-linear – use Spearman's correlation
 Or transform the data, if the transformed data will still hold meaning
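If you want the same visual check outside SPSS, a minimal matplotlib sketch (variable names and values are invented):

```python
import matplotlib.pyplot as plt

revision_hours = [2, 5, 8, 10, 12, 15, 18, 20]  # hypothetical IV
exam_score = [45, 52, 58, 60, 66, 70, 74, 78]   # hypothetical DV

plt.scatter(revision_hours, exam_score)
plt.xlabel("Time spent revising (hours)")  # IV on the x-axis
plt.ylabel("Exam performance (%)")         # DV on the y-axis
plt.title("Visual check for linearity")
plt.show()
```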

17
How to get a scatterplot in SPSS

18
Then a new window will appear
Select ‘simple scatter’ and then click ‘define’

19
Let’s say you want to graph the association
between the variables ‘time spent revising’
and ‘exam performance’.

 Move one variable into the Y axis box and the other into the X axis box. It doesn't matter which variable goes to which axis.

 Click OK to get the scatterplot.
20
The scatterplot in this example does not show a curvilinear association. Therefore, in these data, the assumption that the relationship between the variables is linear has been met.
21
Outliers
Outliers will impact your correlation coefficient
The left graph shows a correlation coefficient of +0.4, but if the outlying data point were removed the correlation coefficient would increase to +0.7 (right graph)

 The outlying result is suspect – either atypical or a mistake (i.e., an outlier)

22
Examining Outliers
Again, assessed by visual inspection of a graph

 First, check you haven't made any data entry errors!
 Then consider whether there were any measurement errors
 If it is a genuine outlier…

23
Genuine Outliers
Transform your data – but then re-check all the
assumptions!
Keep it – but highlight it in write up
Run the correlation with and without the outlier to see if there is an appreciable difference (as sketched below)
Remove it (preferably with justification for doing so)
e.g., you aimed to take a cross-section of individuals and you have one participant with a BMI of 55 kg·m⁻² – that individual doesn't represent the population you want to generalise to
Provide information about that data point in the write-up
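A minimal sketch of that with/without comparison in Python (scipy; the data are invented, with the final pair as the suspect point):

```python
from scipy import stats

bmi =    [21, 23, 24, 25, 26, 27, 28, 29, 30, 55]  # last value is the outlier
health = [72, 70, 69, 68, 66, 65, 64, 62, 61, 60]  # hypothetical outcome score

r_all, p_all = stats.pearsonr(bmi, health)
r_trim, p_trim = stats.pearsonr(bmi[:-1], health[:-1])
print(f"with outlier:    r = {r_all:.3f}, p = {p_all:.3f}")
print(f"without outlier: r = {r_trim:.3f}, p = {p_trim:.3f}")
```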

24
Assumptions of Spearman’s
Correlation
Ordinal data (or interval/ratio data that do not meet parametric assumptions)
All you need is a monotonic relationship between the two variables
 i.e., one in which y either never decreases, or never increases, as x increases

1) As x increases, y never decreases (monotonic)
2) As x increases, y never increases (monotonic)
3) As x increases, y sometimes increases and sometimes decreases (not monotonic)
25
Spearman’s Rho in SPSS

26
Spearman’s Rho output

Degrees of freedom (df) = N − 2

Reporting the results:

 A Spearman's rank-order correlation was run to determine the relationship between students' English and Maths exam marks. There was a strong, positive correlation between English and Maths marks, which was statistically significant (rs(8) = .669, p = .035).
27
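A minimal Python equivalent with scipy (the ten pairs of marks are invented; scipy reports rho and its p-value):

```python
from scipy import stats

english = [56, 75, 45, 71, 61, 64, 58, 80, 76, 61]  # hypothetical marks
maths =   [66, 70, 40, 60, 65, 56, 59, 77, 67, 63]

rho, p = stats.spearmanr(english, maths)
print(f"rs({len(english) - 2}) = {rho:.3f}, p = {p:.3f}")  # df = N - 2
```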
Partial Correlation
It measures the relationship between two variables, controlling for the effect
that one or more other continuous variables (aka third/confounding variable)
has/have on them both.

 For example, we know that revision time is associated with (and therefore explains part of the variance in) both exam performance and exam anxiety. But we now want to see what the correlation between exam performance and exam anxiety is while controlling for the effect of revision time.

 What is the correlation between exam performance (1) and anxiety (2), controlling for revision time (3)?
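Outside SPSS, a first-order partial correlation can be illustrated by correlating the residuals left after regressing each variable on the control; a numpy/scipy sketch with invented data (note SPSS uses n − 3 degrees of freedom for the p-value, so it will differ slightly):

```python
import numpy as np
from scipy import stats

# Hypothetical data
performance = np.array([70, 65, 60, 75, 55, 80, 62, 68, 72, 58])
anxiety     = np.array([60, 70, 75, 55, 85, 50, 72, 66, 58, 80])
revision    = np.array([20, 15, 12, 24, 8, 28, 14, 18, 22, 10])

def residuals(y, control):
    """Residuals of y after removing the linear effect of the control variable."""
    slope, intercept = np.polyfit(control, y, 1)
    return y - (intercept + slope * control)

r, p = stats.pearsonr(residuals(performance, revision),
                      residuals(anxiety, revision))
print(f"partial r = {r:.3f}, p = {p:.3f}")
```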

28
Partial Correlation in SPSS

29
30
1. Move 'exam performance' and 'exam anxiety' across, into the 'Variables' box.

2. Move 'time spent revising' into the 'Controlling for' box.

3. Click OK to run the analysis.
31
How to report the results

A partial correlation was conducted to examine the association between exam anxiety and
exam performance while controlling for the effect of revision time. There was a moderate,
negative correlation between exam anxiety and exam performance, which was statistically
significant, r(100) = -.247, p = .012.
32
Once you’ve established a
correlation/relationship
It is not uncommon within health research to try to establish a true cause-and-effect relationship between one variable and another
It is possible to predict the influence of one variable's outcome on another variable's outcome
This is particularly important when practitioners and researchers try to predict the outcome on a hard-to-measure variable by looking at performance on a measurable variable.

REGRESSION ANALYSIS
34
Regression Analysis
In regression analysis you have a DV and one or more IVs that are related to it.
You can express the relationship as a linear
equation:
y = a + bx
y = DV
x = IV
a = constant
b = slope of line
For every increase of 1
unit in x, y changes by
an amount equal to b
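A minimal sketch of estimating a and b in Python (scipy's linregress; the data are invented):

```python
from scipy import stats

x = [1, 2, 3, 4, 5, 6, 7, 8]                       # IV
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]   # DV

fit = stats.linregress(x, y)
print(f"y = {fit.intercept:.2f} + {fit.slope:.2f}x")  # a = intercept, b = slope
# For every 1-unit increase in x, y changes by approximately fit.slope
```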
35
Linear Regression
Some relationships are perfectly linear and fit this equation
exactly
e.g., a mobile phone bill may be:

Total Charges = Monthly rental + 10p a minute

If you know the monthly rental fee and the number of minutes used,
you can predict the total charges exactly.

36
Multiple Regression
What if there are several factors affecting the dependent variable?
e.g., think of a price of a home
as a DV. Several factors
contribute to the price of a
home…..
So, it is common to use more
than one variable to predict the
performance on another
variable.
This is known as Multiple Regression
37
Multiple Regression
Each of the factors will have a separate relationship with
the price of a home
The equation that describes a multiple regression
relationship is:

y = a + b1x1 + b2x2 + b3x3 + … + bnxn + e

(e.g., x1 = no. of rooms, x2 = age of house, x3 = no. of bathrooms)

The equation separates each IV from the rest – each has its own
coefficient describing its relationship to the DV
 e.g., If square footage is an IV, and it has a coefficient of £50, every
additional square foot of space adds £50 on average to price of home
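A sketch of a multiple regression in Python using statsmodels; the housing values are invented and the variable names are illustrative:

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical housing data
rooms     = np.array([3, 4, 2, 5, 4, 3, 6, 5, 4, 3])
age       = np.array([20, 5, 40, 10, 15, 30, 2, 8, 12, 25])
bathrooms = np.array([1, 2, 1, 3, 2, 1, 3, 2, 2, 1])
price     = np.array([180, 260, 140, 330, 250, 190, 400, 310, 255, 175])  # £000s

X = sm.add_constant(np.column_stack([rooms, age, bathrooms]).astype(float))
model = sm.OLS(price, X).fit()
print(model.summary())  # coefficients b1..b3, R2, F-ratio, p-values
```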

38
How good is the model? (R²)

 The R² value provides a measure of how well the model explains the data
 Differences between observations that are not explained by the model remain in the error term
 The R² value tells you what percentage of those differences is explained by the model
 An R² of .68 means that 68% of the variance in the observed values of the dependent variable is explained by the model
 32% of those differences remains unexplained in the error term
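The definition can be made concrete by computing R² by hand from residuals; a short numpy sketch with invented values:

```python
import numpy as np

observed  = np.array([3.0, 4.5, 6.1, 7.9, 10.2])
predicted = np.array([3.2, 4.4, 6.5, 7.6, 10.0])  # from some fitted model

ss_res = np.sum((observed - predicted) ** 2)        # unexplained variation
ss_tot = np.sum((observed - observed.mean()) ** 2)  # total variation
r_squared = 1 - ss_res / ss_tot
print(f"R2 = {r_squared:.3f}")  # proportion of variance explained
```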
39
Is the model significant? (p)
There is a significance level provided for the model as a whole (from an ANOVA F-test)
This tells us whether the relationship is significant or not
Significance values are also shown for each IV, so we can see which ones are significant

40
Things to consider (1)
Multicollinearity
Occurs when one or more of your IVs are related to one another

Multicollinearity between predictors makes it difficult to


assess the individual importance of a predictor

If two predictors are highly correlated, and each accounts for
similar variance in the outcome then we should consider only
one
Collinearity Diagnostics
Examine the correlation matrix of all predictor variables to see if any are highly correlated
Correlations of up to 0.8 are generally considered acceptable
Variance Inflation Factor (VIF)
Indicates whether a predictor has a strong linear relationship with other
predictors
Value of above 10 gives cause for concern (VIF should be less than 10)
Tolerance
Another measure reported by SPSS relating to collinearity
Should be investigated if less than 0.1 (Tolerance should be >0.1)
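A sketch of obtaining VIF (and tolerance = 1/VIF) in Python with statsmodels; the predictor values are invented:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Hypothetical predictors (columns: age, weight, heart rate)
X = np.array([[25, 70, 80], [30, 80, 75], [35, 75, 85], [40, 90, 70],
              [45, 85, 90], [50, 95, 65], [55, 88, 95], [60, 92, 72]])
X = sm.add_constant(X.astype(float))

for i in range(1, X.shape[1]):  # skip the constant column
    vif = variance_inflation_factor(X, i)
    print(f"predictor {i}: VIF = {vif:.2f}, tolerance = {1 / vif:.2f}")
# Rule of thumb: VIF < 10 and tolerance > 0.1
```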

42
Basic requirements of MR
You have one dependent variable that is measured at the continuous level
You have two or more independent variables that are measured at either the continuous or nominal level

VO2max: the maximum rate at which the heart, lungs, and muscles can effectively use oxygen during exercise, used as a way of measuring a person's individual aerobic capacity

SPSS output

 The F-ratio in the ANOVA table tests whether the overall regression model is a good fit for the data. The table shows that the independent variables statistically significantly predict the dependent variable, F(4, 95) = 32.393, p < .0005 (i.e., the regression model is a good fit of the data).

 Unstandardized coefficients indicate how much the dependent variable varies with an independent variable when all other independent variables are held constant. Consider the effect of age. The unstandardized coefficient, B1, for age is equal to -0.165. This means that for each one-year increase in age, there is a decrease in VO2max of 0.165 ml/min/kg.
Conditions for Multiple Regression
Used to explore linear relationships
Independent variable – interval or ordinal scale
Dependent variable – interval/ratio scale
Large number of observations required
 Generally: n = 10 × number of independent variables
Data should be checked for outliers, normality, etc.
Additional independent variables:
 Unique contributions to understanding the dependent variable
 Independent variables should not be highly correlated
 Simplest (most parsimonious) model is the best

45
Interpreting and reporting MR
When interpreting and reporting your results from a multiple regression,
work through three stages:
(a) determine whether the multiple regression model is a good fit for the data
(b) understand the coefficients of the regression model, and
(c) make predictions of the dependent variable based on values of the independent variables (a prediction sketch follows below)
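Stage (c) can be sketched in Python by reusing a fitted statsmodels model; the data and the new case below are invented:

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical training data: age, weight -> VO2max
X = sm.add_constant(np.array([[25, 70], [30, 80], [35, 75], [40, 90],
                              [45, 85], [50, 95]], dtype=float))
y = np.array([48.0, 44.5, 42.0, 38.5, 36.0, 32.5])

model = sm.OLS(y, X).fit()

# Predict VO2max for a new 28-year-old participant weighing 72 kg
new_case = np.array([[1.0, 28, 72]])  # leading 1 for the constant
print(model.predict(new_case))
```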
Presenting the results of MR
A multiple regression was run to predict VO2max
from gender, age, weight and heart rate. These
variables statistically significantly predicted VO2max,
F(4, 95) = 32.393, p < .0005, R2 = .577. Age was negatively associated with VO2max (b = -0.165, t = -2.633, p < 0.05), etc.
Multiple Regression – Different
Methods
Some predictor variables will be more important than others; some may have no influence at all
We can choose to enter all our variables in our regression calculation
using the enter method.
We can decide which variables to include (forward method) or exclude
(backward method) in the regression equation
Or both (by stepwise method)
To end up with a model that includes the predictors we believe are
important in the prediction and excludes ones that have only trivial effect
on the DV
49
Hierarchical Regression
 A special form of a multiple linear regression analysis in
which more variables are added to the model in separate
steps called “blocks.”

This is often done to statistically “control” for certain variables,


to see whether adding variables significantly improves a
model’s ability to predict the criterion variable and/or to
investigate a moderating effect of a variable

 Hierarchical multiple regression is also known as sequential multiple regression
Hierarchical Regression
A health researcher wants to predict maximal aerobic capacity (VO2max). The
researcher knows that VO2max is related to an individual's age and gender.
The researcher wants to know whether adding an individual's weight can
improve the prediction of VO2max. Furthermore, the researcher wants to
know whether measuring an individual's heart rate immediately after a 20
minute submaximal walking test would also lead to an improvement in the
ability to predict VO2max. Therefore, the researcher recruits 100 participants
to perform a maximum VO2max test, whilst also recording their age, weight,
heart rate and gender.
Procedure of Hierarchical Regression
Similar to multiple regression but variables are entered in blocks

The basic command for hierarchical multiple regression analysis in


SPSS is “regression -> linear”:
Procedure of Hierarchical Regression

By transferring weight, you are testing if the addition of weight to the


regression model improves the prediction of VO2max over and above the
variables age and gender alone.
Procedure of Hierarchical Regression
The addition of weight to the prediction of VO2max (Model
2) led to a statistically significant increase in R2 of .239,
F(1, 96) = 40.059, p < .0005.

The addition of heart rate to the prediction of VO2max (Model 3)


led to a statistically significant increase in R2 of .283, F(1, 95) =
92.466, p < .0005.
The full model of gender, age, weight and heart rate to predict
VO2max (Model 3) was statistically significant, R2 = .710, F(4,
95) = 58.078, p < .0005, adjusted R2 = .698.
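The block-wise comparison can be reproduced outside SPSS by fitting nested models and testing the R² change; a statsmodels sketch on simulated stand-in data:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(1)
n = 100
df = pd.DataFrame({
    "age": rng.integers(20, 60, n),
    "gender": rng.integers(0, 2, n),
    "weight": rng.normal(80, 10, n),
})
# Simulated outcome with real effects of age and weight
df["vo2max"] = 60 - 0.2 * df["age"] - 0.15 * df["weight"] + rng.normal(0, 3, n)

m1 = smf.ols("vo2max ~ age + gender", data=df).fit()           # block 1
m2 = smf.ols("vo2max ~ age + gender + weight", data=df).fit()  # block 2

print(f"R2 change = {m2.rsquared - m1.rsquared:.3f}")
print(anova_lm(m1, m2))  # F-test for the improvement of model 2 over model 1
```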
Reporting
A hierarchical multiple regression was run to determine if the
addition of weight and then of heart rate obtained from a
submaximal test improved the prediction of VO2max over and
above age and gender alone. The full model of gender, age,
weight and heart rate to predict VO2max (Model 3) was
statistically significant, R2 = .710, F(4, 95) = 58.078, p < .001;
adjusted R2 = .698. The addition of weight to the prediction of
VO2max (Model 2) led to a statistically significant increase in
R2 of .239, F(1, 96) = 40.059, p < .001. The addition of heart
rate to the prediction of VO2max (Model 3) also led to a
statistically significant increase in R2 of .283, F(1, 95) =
92.466, p < .001.
Reporting
Logistic Regression
What about when your DV is not interval?
Binomial logistic regression – similar to linear
regression BUT
Predicts probability that an observation falls into one of
two categories (dichotomous DV) based on one or
more IVs
e.g., predict women who do or do not experience
breast pain based on a number of other variables
(e.g., age, number of children, menopausal status)
Predicting membership of a group/category
61
Requirements of logistic regression
≥ 2 IVs that can be either continuous or categorical (e.g., height, exam performance, gender, etc.)
One DV that is dichotomous (e.g., presence of
heart disease (yes/no), gender (male/female).
Note: If you have an ordinal IV, you need to treat this
as either a continuous or nominal variable.
Still don’t want outliers or multicollinearity!
Categories should be mutually exclusive
Minimum of 15 cases per IV
62
Example
Predicting whether the incidence of heart disease can be
predicted based on:
Age
Weight
Gender
Maximal aerobic capacity (VO2max)

100 participants recruited to perform VO2max test and record


age, weight and gender.
Also evaluated for presence of heart disease.
Logistic regression run to determine whether presence of heart
disease could be predicted from their VO2max, age, weight and
gender.
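A minimal sketch of this binomial logistic regression in statsmodels; the data are simulated stand-ins for the study's measurements:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n = 100
df = pd.DataFrame({
    "age": rng.integers(20, 70, n),
    "weight": rng.normal(80, 12, n),
    "gender": rng.integers(0, 2, n),
    "vo2max": rng.normal(40, 8, n),
})
# Simulated outcome: risk rises with age, falls with VO2max
log_odds = -4 + 0.08 * df["age"] - 0.05 * df["vo2max"]
df["heart_disease"] = (rng.random(n) < 1 / (1 + np.exp(-log_odds))).astype(int)

model = smf.logit("heart_disease ~ age + weight + gender + vo2max",
                  data=df).fit(disp=False)
print(model.summary())  # coefficients are log-odds; exponentiate for odds ratios
```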

63
Ordinal Logistic Regression
What about when our DV is ordinal?
Ordinal logistic regression – used to predict an ordinal DV given one or more IVs

e.g., the effects on taste (Likert scale ratings ranging from strong dislike
(1) to excellent taste (9)) of various cheese additives.

64
Requirements of Ordinal Logistic
regression
One DV that is measured at the ordinal level. (e.g., Likert
scales, PA level (sedentary, low, moderate and high)
≥ 1 IV that is continuous (e.g., age), ordinal (as above) or categorical (e.g., gender, ethnicity)
However, ordinal IVs must be treated as being either
continuous or categorical
Still don’t want outliers or multicollinearity!
You should have proportional odds
 Each IV has an identical effect at each cumulative split of the
ordinal DV

65
Example
Taxes have the ability to elicit strong responses in many
people (too high vs too low)
A researcher presented participants with the statement ‘Tax
is too high in this country’
Participants gave a level of agreement – ‘strongly disagree’,
‘disagree’, ‘agree’ or ‘strongly agree’
 Ordered responses categories of the DV (Tax View)
Other questions were asked, including whether they owned their own business, their age (age), and which political party they last voted for.
An ordinal logistic regression was run to determine whether tax view could be predicted by the IVs (a sketch follows below)
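A sketch using statsmodels' OrderedModel (available in recent statsmodels releases); the survey responses are simulated stand-ins, and 'owns_business' / 'tax_view' are illustrative column names:

```python
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(3)
n = 200
df = pd.DataFrame({
    "age": rng.integers(18, 80, n),
    "owns_business": rng.integers(0, 2, n),
})
# Simulated ordinal DV: 0 = strongly disagree ... 3 = strongly agree
latent = 0.03 * df["age"] + 0.8 * df["owns_business"] + rng.normal(0, 1, n)
df["tax_view"] = pd.cut(latent, bins=4, labels=False)

model = OrderedModel(df["tax_view"], df[["age", "owns_business"]],
                     distr="logit")
res = model.fit(method="bfgs", disp=False)
print(res.summary())
```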

66
Only an experimental design can
establish whether there is a causal
relationship between variables.

67
Reading

68
