Week 8 - Logistic Regression

Uploaded by

Marina
Copyright
© © All Rights Reserved

PSY2017 Advanced Statistics and Data Analysis

Logistic Regression
Dr Carolina Feher da Silva
Today’s class
• What is the point of logistic regression?
• Activity – picking the right test for research scenarios
• Break
• Running a logistic regression in jamovi
• Interpreting and reporting the results of logistic regression
• Activity – interpreting and reporting
Learning objectives
Part 1:
• Understand what logistic regression is
• Know when to use logistic regression in your research

Part 2:
• Be able to run a logistic regression analysis in jamovi

Part 3:
• Interpret statistical output of a logistic regression analysis – inc.
model results & checking assumptions
• Be able to report results clearly in APA style – text and graphs
What is the point of logistic regression?
When should I use it?
Results might look like this …
The logistic function
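A minimal Python sketch of the logistic function, which turns any value of a linear predictor into a probability between 0 and 1. The intercept b0 and slope b1 are made-up values for illustration only; they do not come from the module data.

```python
import math


def logistic(z: float) -> float:
    """Map any real number z to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical intercept and slope, chosen only to show the S-shaped curve.
b0, b1 = -4.0, 1.0

for x in [0, 2, 4, 6, 8]:
    p = logistic(b0 + b1 * x)
    print(f"x = {x}: predicted probability = {p:.3f}")
```

As x increases, the predicted probability rises along an S-shaped curve and stays strictly between 0 and 1.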
What is binary logistic regression?
• Like linear regression, logistic regression is all
about prediction
• Prediction of categorical outcomes instead of
continuous
• More specifically:
• Binary logistic regression
• Outcomes that are binary (e.g., yes vs no), that
is, they have 2 options only

• Has all the same uses as linear regression


What are some binary outcomes?
• True vs false
• Left vs right
• Repeat vs change decision
• Pass vs fail
‘Logistic regression was used to test if body dissatisfaction at 14 years old
predicted the onset of risky health behaviours at 21 years old.’
‘Multiple logistic regressions were run to explore psychological
predictors of intentions to consume a healthier diet…’
Aspect | Linear regression | Logistic regression
Purpose? | Predict value of Y based on value(s) of X | Predict probability of Y based on value(s) of X
Data type | Continuous outcome | Binary outcome
Is the model significant? | F test | χ2 test
How much variance does the model explain? | R2 | Pseudo R2
Are individual predictors significant? | P value based on t-test | P value based on z-test (Wald)
Effect size for predictors | B and beta | Odds ratio (must report 95% CI)
What is an odds ratio (OR)
• If you want to know in detail:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2938757/
• What are the odds of an event?
• The probability that the event will occur divided by the probability that the event will not
occur.
• If the probability is 0.8, then the odds are 0.8/0.2 = 4.
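The arithmetic above can be sketched in Python. The function names are chosen for this illustration only.

```python
def odds_from_probability(p: float) -> float:
    """Odds = probability the event occurs / probability it does not occur."""
    if not 0 <= p < 1:
        raise ValueError("p must be at least 0 and less than 1")
    return p / (1 - p)


def probability_from_odds(odds: float) -> float:
    """Convert odds back into a probability."""
    return odds / (1 + odds)


# The slide's example: a probability of 0.8 corresponds to odds of 4.
print(f"odds = {odds_from_probability(0.8):.2f}")
# Converting odds of 4 back gives the original probability of 0.8.
print(f"probability = {probability_from_odds(4):.2f}")
```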
Odds ratio for a predictor X = (odds after a unit change in the predictor) / (original odds, before the change)
• OR=1 Exposure/predictor does not affect odds of outcome
• OR>1 Exposure/predictor associated with higher odds of outcome
• OR<1 Exposure/predictor associated with lower odds of outcome
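To see why one odds ratio can describe a predictor, here is a small Python sketch under a logistic model with one predictor. The coefficients B0 and B1 are hypothetical. Whatever starting value of X is used, a one-unit increase multiplies the odds by the same amount, exp(B1).

```python
import math

# Hypothetical intercept and slope for a one-predictor logistic model.
B0, B1 = -2.0, 0.5


def predicted_probability(x: float, b0: float, b1: float) -> float:
    """Probability of the outcome predicted by the logistic model."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


def odds(p: float) -> float:
    """Odds corresponding to a probability p."""
    return p / (1 - p)


for x in [0, 1, 5]:
    before = odds(predicted_probability(x, B0, B1))
    after = odds(predicted_probability(x + 1, B0, B1))
    print(f"x = {x} -> {x + 1}: odds ratio = {after / before:.4f}")

# The same number is obtained directly from the slope.
print(f"exp(B1) = {math.exp(B1):.4f}")
```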
Odds ratio
What is an odds ratio?!

Odds ratio should always be reported alongside 95% confidence interval:

OR 1.57 (95% CI: 1.24-1.67)


OR 0.65 (95% CI: 0.45-0.89)

Greater than 1 = higher odds of experiencing the event (statistically significant if the whole 95% CI is above 1)
Less than 1 = lower odds of experiencing the event (statistically significant if the whole 95% CI is below 1)
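The rule for reading an odds ratio with its 95% confidence interval can be sketched in Python. The function name and wording of the returned labels are illustrative; the first two examples are the ones on this slide and the third is the oral contraceptive example used later in the lecture.

```python
def interpret_odds_ratio(ci_lower: float, ci_upper: float) -> str:
    """Classify an effect from the limits of its 95% confidence interval."""
    if ci_lower > 1:
        return "significantly higher odds"
    if ci_upper < 1:
        return "significantly lower odds"
    return "not statistically significant (CI includes 1)"


examples = [
    (1.57, 1.24, 1.67),
    (0.65, 0.45, 0.89),
    (0.81, 0.47, 1.40),
]

for odds_ratio, lower, upper in examples:
    label = interpret_odds_ratio(lower, upper)
    print(f"OR {odds_ratio} (95% CI: {lower}-{upper}): {label}")
```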
Example 1: Stroke and depression
Depression
No stroke 1
Stroke 3.5 (1.4-8.3) ***
*** p<0.001

Reference group = no stroke


Odds ratios relative to the group with no stroke

People who have had a stroke have 3.5 times the odds of being diagnosed with depression compared with people who have not had a stroke.
Whyte et al (2004). Depression after stroke: A prospective epidemiological study. Journal of the American Geriatrics
Society: 52(5); 774-778.
Example 2: Oral contraceptive pill use and
depression
Depression
No oral contraceptive use 1
Current oral contraceptive use 0.81 (0.47-1.40)
While the odds ratio suggests that women taking oral contraceptives may have lower odds of experiencing depression, the 95% confidence interval includes 1, which means this is not statistically significant.

Cheslack-Postava et al (2015). Oral Contraceptive Use and Psychiatric Disorders in a Nationally Representative Sample of Women. Archives of Women's Mental Health: 18(1); 103-111.
Odds ratios with continuous predictors
Continuous predictor…
• Different way to think about odds ratio
• OR tells us for every one-point increase in X what the resulting increase or decrease in the odds of Y will be.
Example: Anxiety, depression, self-esteem and bullying
Odds ratio (95% CI) P-value
Anxiety 0.36 (0.18-0.71) 0.003
Self-esteem 1.32 (0.75-2.31) 0.337
Lying 0.41 (0.23-0.76) 0.004
Depression 3.29 (1.63-6.66) 0.001

The effects of anxiety, lying and depression on being a bully are statistically significant.

For example: For every 1-point increase in anxiety, the odds of being a bully are multiplied by 0.36, that is, they decrease by 64%.

Salmon (1998). Bullying in schools: self-reported anxiety, depression and self-esteem in secondary school children. BMJ
317: 924-925.
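A short Python sketch of how an odds ratio for a continuous predictor translates into a percentage change in the odds, and how it compounds over more than one point. The function names are illustrative; the odds ratios are the ones in the table above.

```python
def percent_change_in_odds(odds_ratio: float) -> float:
    """Percentage change in the odds for a one-point increase in the predictor."""
    return (odds_ratio - 1) * 100


def odds_multiplier(odds_ratio: float, points: float) -> float:
    """Factor the odds are multiplied by after an increase of `points` units."""
    return odds_ratio ** points


# Anxiety: OR = 0.36, so the odds fall with each extra point.
print(f"Anxiety, 1 point: {percent_change_in_odds(0.36):.0f}% change in odds")
print(f"Anxiety, 2 points: odds multiplied by {odds_multiplier(0.36, 2):.2f}")

# Depression: OR = 3.29, so the odds rise with each extra point.
print(f"Depression, 1 point: {percent_change_in_odds(3.29):.0f}% change in odds")
```

Because the odds ratio is a multiplier, a two-point increase multiplies the odds by the OR squared rather than by twice the OR.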
Summary: Logistic regression
• Logistic regression allows us to determine the probability of
experiencing an event/outcome.
• Logistic regression can use both continuous and categorical predictors
(for categorical predictors we compare to a ‘reference’ group).
• The outcome for logistic regression is always categorical (binary).
• The effect of each predictor on the odds of experiencing an event is indicated by the odds ratio and 95% confidence interval (if the 95% confidence interval doesn't include 1, the effect is statistically significant).
• The contribution of each predictor is assessed with the Wald statistic.
Activity: Choose the test
Instructions:
• In groups, work through research scenarios/questions
• Choose which statistical test is the most appropriate to use
• Things to keep in mind to help you make your decision:

Research Question 1:
Mental health problems can be measured on a continuum (e.g. number of
symptoms) or discretely (e.g. diagnosed with an anxiety disorder vs not). If we had
data on both of these outcomes and a potential predictor (i.e., adverse childhood
experiences), how would you analyse the data to answer the following hypotheses?
H1: higher levels of adverse childhood experiences are associated with higher anxiety symptoms
H2: higher levels of adverse childhood experiences are associated with anxiety disorders.

Research Question 2:
Bevan has data on one million participants’ big five personality traits
collected in 2011. He links this data to the national death register to see
if there are any links between big five traits and dying in the past 10
years. First, what analysis would you use? Second, are there any
confounding variables or moderators you would recommend including
in the analysis?

Break Time
Stretch your legs
Get a cuppa
Running a logistic regression
in jamovi
Logistic regression example
• Do personality traits and gender predict getting a first (vs not) on the PSY2017 exam?

• Same data as the multiple regression example: survey data from a cohort of psychology students, N = 162, but there is some missing data (e.g., 25 people have missing psy2017_first data as they did not consent to data linkage)
• Self-report measures of Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism, all on a 1 to 5 scale, with higher scores meaning higher levels of the trait
• The outcome of interest is no longer the % exam score but whether the person gets a first
• Outcome variable ‘psy2017_first’: a first (70% or higher) = 1 and below a first (lower than 70%) = 0 (coded like a dummy variable)
• Gender is coded Female = 0 and Male = 1 (this is a dummy variable)
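The coding scheme described above can be sketched in Python. The student records are invented for illustration and are not the module data; only the 70% threshold and the 0/1 codes come from the slide.

```python
from typing import Optional

FIRST_THRESHOLD = 70  # a first is 70% or higher

# Dummy coding for gender as described on the slide.
GENDER_CODES = {"Female": 0, "Male": 1}


def code_first(exam_percent: Optional[float]) -> Optional[int]:
    """Return 1 for a first, 0 for below a first, None if the score is missing."""
    if exam_percent is None:
        return None
    return 1 if exam_percent >= FIRST_THRESHOLD else 0


# Hypothetical records; None marks a student without linked exam data.
students = [
    {"id": 1, "gender": "Female", "exam_percent": 74.0},
    {"id": 2, "gender": "Male", "exam_percent": 62.5},
    {"id": 3, "gender": "Female", "exam_percent": None},
]

for student in students:
    first = code_first(student["exam_percent"])
    gender = GENDER_CODES[student["gender"]]
    print(f"id {student['id']}: psy2017_first = {first}, gender = {gender}")
```

Keeping missing scores as missing, rather than coding them as 0, stops students without linked data from being counted as "below a first".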
Statistical Analysis process

Descriptive statistics

Assumption testing

Running the analysis

Post hoc analysis / Follow up tests if needed
Step 1
• Analyses
• Exploration
• Descriptives
Step 2
• Move variables (predictor and
outcome) across
• Click on arrow tab for Statistics
• Tick (if not already ticked):
• All Sample Size
• All Central Tendency
• All Distribution
• All Normality (Shapiro-Wilk)
Step 2 (continued): Checking binary
• Click on arrow tab for Plots
• Tick (if not already ticked):
• Histogram
Histogram should look like this: two columns, same width
Step 3: Checking binary
• Analyses
• Exploration
• Scatterplot
• Move variables across
• predictor on x axis
• outcome on y axis
• Tick Regression Line
• None
Scatterplot should look like this
Step 3 (continued): Checking for multicollinearity
• Analyses
• Regression
• Correlation Matrix
• Move variables across
• Click on Correlation Coefficients
• Pearson
• Click on Additional Options
• Report Significance
• Flag significant correlations
• N
Step 4
Run model and check
regression specific assumptions
• Analyses
• Regression
• Logistic Regression
• 2 Outcomes Binomial
Step 5
• Move variables across
• Predictor variable = Covariates
• can also put pre-made dummy variables in
with covariates
• Confounder variables & non dummy categorical
predictors = Factors
• Outcome variable = Dependent variable
• Click on Assumption Checks Tab & Tick
Assumption Checks
• Collinearity statistics (to get VIF value)
• Click on Save and tick
• Cook’s distance (this creates a new variable called Cook’s distance, which is used to check for influential cases)
Step 5 (continued)
• Click on
• Analyses
• Exploration
• Descriptives
• Move variable ‘Cook’s distance’ across
Step 5 (continued): Model building
• Click on model builder
• All predictors / confounders
appear in predictor list
• Transfer across to Blocks, the
main theoretical predictors
• Then add new block for next
set of predictors or confounder
variables
• The number of blocks created
= number of models generated
• Model 1 = Block 1 variables
• Model 2 = block 1 & 2 variables
Step 6
• Click on Model Fit Tab & Tick Fit measures
• Overall model test
& Pseudo R2 measures
• Cox & Snell R2
• Nagelkerke R2
• Click on Prediction Tab & Tick Predictive measures
• Classification table
Step 7
• Click on Model Coefficients and
Odds Ratio
• Odds ratio
• Confidence Interval (95%)
Logistic regression summary steps (similar to multiple regression)
1. Identify outcome variable - must be dichotomous / binary
2. Choose predictors and how to enter them (“building/specifying a model”)
• Empirical and/or theoretical basis
• Ideally research and choose predictors before collecting the data
• Be picky! Less is more.
• Control for confounders?
• Do you want to enter predictors in blocks?

3. Run the model


4. Check assumptions – if not met, what does this mean for your model?
5. Check results – predictors and overall model fit
6. Report results
Summary steps – ‘Exam first’ example
1. Identify outcome variable –needs to be dichotomous/binary
• histogram shows that PSY2017_first is binary (0=below first; 1=first)

2. Choose predictors and how to enter them (“building a model”)


• Empirical and/or theoretical basis
• Use big five traits as this is a theoretical model of personality traits.
• There is quite a lot of evidence linking some of these to academic outcomes (e.g., Poropat, 2009).
• Ideally research and choose predictors before collecting the data
• Yes, this was done.
• Be picky! Less is more.
• We will use all 5 traits, but there is an argument that Conscientiousness and Openness are most important if we look at previous research.
• Will you control for confounders?
• Yes, gender is our confounder – it is potentially related to both traits and grades
• Do you want to enter predictors in blocks?
• No, will enter the 5 traits and gender together

3. Run the model


4. Check results – predictors and overall model fit
5. Check assumptions
6. Report results
Interpreting and reporting the results of a logistic regression
Reporting in APA style: Remember
Some Rules to help simplify things
1. Report means and SD for integers to 1 decimal place e.g., 35.1
2. Report test statistics to 2 decimal places (F, t, values) e.g., F=3.45
3. Report p-values and effect sizes to 3 decimal places
e.g., p = .078, d = 0.234 (p value can go to 2 decimal places if > .1)
4. If the output shows p = .000 or p < .001, you must report this as p < .001
5. Only put a zero before the decimal if value can be >1 (e.g. t, d, F)
6. Do not repeat information in tables and main text
https://apastyle.apa.org/instructional-aids/numbers-statistics-guide.pdf
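A small Python helper sketching the p-value rules above: three decimal places, no zero before the decimal point, and very small values reported as p < .001. The function name is illustrative and this is only one way of applying the rules.

```python
def format_p(p: float) -> str:
    """Format a p-value in APA style."""
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    if p < 0.001:
        return "p < .001"
    # Three decimal places, then drop the zero before the decimal point.
    return "p = " + f"{p:.3f}".lstrip("0")


for value in [0.078, 0.04, 0.0004]:
    print(format_p(value))
```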
Reporting a Regression
Three important things to report
1. Information about assumptions
2. Information about the model
3. Information about individual predictors
For logistic regression (unlike multiple), it’s important to report all three, as info about model and predictors will differ now.
Descriptive statistics & assumption testing
The Basics (see Week 1 for more information)
• 1. Descriptive Statistics
• A) Assumption of binomial distribution
Assumptions specific to Logistic Regression
• B) Outliers (influential cases)
• C) Linearity
• D) Homoscedasticity of error
• E) Normally distributed error
• F) Multicollinearity
1. Descriptives (see Week 1 & simple regression)
Look at: mean, SDs, min/max
Check variables for reverse scoring
A) Binomial distribution
Check outcome variable is binary, thus suitable for logistic regression
Checking binary
Histogram should look like this: two columns, same width
2. Reporting model fit results: Overall Model
• How much variance (%) in ‘Exam first’ does the model predict?
• Pseudo R2 = approximate % of variance explained; used because there is no ‘proper’ R2 for logistic regression
• Pseudo adjusted R2 – only increases if new predictors improve model (so good to compare)
• Overall model test = Is the model significantly better than a model with no predictors (i.e., guessing the same outcome for everyone)? (similar to the F test in multiple regression, but a chi-square instead)
• Model comparisons tell us if model 2 (block 1 and 2) is better than model 1 alone (if doing blocks)
2. Overall Logistic Regression model
• Pseudo R2 statistics
• may be best for comparing results of different models that use the same data set, as they show how much the fit changes as predictors change.
• may be less good as an indicator of variance explained that can be used to compare across models (i.e., not quite the same as, or as good as, a normal R square)
• People may vary in terms of which pseudo R2 to report
• See more info here: https://stats.idre.ucla.edu/other/mult-pkg/faq/general/faq-what-are-pseudo-r-squareds/
• Report:
• “The overall model is significant, χ2 (6) = 13.2, p = .040.
• Pseudo R2 statistics indicate that approximately 9% (Cox and Snell R2) to 13% (Nagelkerke R2 ) of variance in the
outcome is explained by the model.”
The overall model test (chi-square) assesses whether the overall logistic regression model is statistically significant or not (similar to the F-ratio in multiple regression)
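For anyone curious where these numbers come from, here is a Python sketch of the standard formulas for the overall model chi-square and the Cox and Snell and Nagelkerke pseudo R2 values, computed from the log-likelihoods of the null (no predictors) model and the fitted model. jamovi reports these values directly; the log-likelihoods and sample size below are hypothetical and are not taken from the ‘Exam first’ output.

```python
import math


def model_chi_square(ll_null: float, ll_model: float) -> float:
    """Likelihood ratio chi-square: improvement of the model over the null model."""
    return 2 * (ll_model - ll_null)


def cox_snell_r2(ll_null: float, ll_model: float, n: int) -> float:
    """Cox and Snell pseudo R2."""
    return 1 - math.exp(2 * (ll_null - ll_model) / n)


def nagelkerke_r2(ll_null: float, ll_model: float, n: int) -> float:
    """Nagelkerke pseudo R2: Cox and Snell rescaled so its maximum is 1."""
    max_cox_snell = 1 - math.exp(2 * ll_null / n)
    return cox_snell_r2(ll_null, ll_model, n) / max_cox_snell


# Hypothetical log-likelihoods and sample size, for illustration only.
ll_null, ll_model, n = -60.0, -50.0, 100

print(f"chi-square = {model_chi_square(ll_null, ll_model):.2f}")
print(f"Cox and Snell R2 = {cox_snell_r2(ll_null, ll_model, n):.3f}")
print(f"Nagelkerke R2 = {nagelkerke_r2(ll_null, ll_model, n):.3f}")
```

Nagelkerke is always at least as large as Cox and Snell, which is why reports often give the two values as a range.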
Model fit continued…
• The classification table is also useful for evaluating how good your model is
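A Python sketch of the summary measures that can be read off a classification table. The counts are hypothetical and the function name is illustrative; here "sensitivity" is the proportion of actual firsts the model predicts correctly and "specificity" is the proportion of non-firsts it predicts correctly.

```python
def classification_summary(tn: int, fp: int, fn: int, tp: int) -> dict:
    """Accuracy, sensitivity and specificity from a 2 x 2 classification table."""
    total = tn + fp + fn + tp
    return {
        "accuracy": (tn + tp) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical counts: observed 0/1 outcome against predicted 0/1 outcome.
summary = classification_summary(tn=90, fp=10, fn=20, tp=30)

for name, value in summary.items():
    print(f"{name}: {value:.0%}")
```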
3. Reporting model Predictors
Examine if individual predictors were significant, and if so how important
were they in the model?

1 predictor was statistically significant
3a. Reporting predictors:
• Look at p values to determine significance of each predictor.
• The odds ratio (OR) tells us the direction and magnitude of effect of each predictor on the outcome, i.e. Odds ratio = Effect size
• Report:
• “On average, for every unit increase in conscientiousness the odds of getting a first become 2.68 times larger, OR = 2.68, 95% CI [1.41, 5.08].”
• Odds ratios below 1 and significant would simply be a decrease, e.g. for an OR of 0.50 …
• A predictor will not be statistically significant (p > .05) if its 95% Confidence Interval (CI) passes over 1 (e.g. 0.5 to 2.3), as the relationship could be an increase or a decrease. Thus, we can’t be sure the odds differ for people with different levels of these traits.
• Always report CIs for Odds ratios.
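jamovi reports the odds ratio and its confidence interval directly, but the link to the regression coefficient can be sketched in Python: the odds ratio is the exponential of the coefficient B, and the 95% limits are the exponentials of B plus or minus 1.96 standard errors. The coefficient and standard error below are assumed values, picked so the result lands close to the conscientiousness example; they are not taken from the output.

```python
import math


def odds_ratio_with_ci(b: float, se: float, z: float = 1.96):
    """Return (odds ratio, lower 95% limit, upper 95% limit) for a coefficient."""
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)


# Assumed coefficient and standard error, for illustration only.
b, se = 0.986, 0.3265

odds_ratio, lower, upper = odds_ratio_with_ci(b, se)
print(f"OR = {odds_ratio:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

A coefficient of 0 gives an odds ratio of exactly 1, which is why a confidence interval that includes 1 means the predictor is not statistically significant.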
ASSUMPTION B Outliers
B) Outliers
Maximum Cook’s distance < 1 = no outliers
• Look in descriptives of new variable cook’s distance:
• Are any cases/individuals having an excessive effect on the
model?
• No influential case when the maximum Cook’s distance is less than 1
• Don’t have to report specifically, but could write…
• No single cases exerted excessive effects on the regression model (Cook’s Distance,
range = 0.001 to 0.050).
• If higher than 1, then run model again without those cases and report….
• X cases were exerting large effects on the regression model (Cook’s Distance = X for
case X, X for case X). Re-running the regression model without these cases resulted in
….
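The Cook’s distance check can be sketched in Python. The values are hypothetical, standing in for the saved Cook’s distance variable, and the function name is illustrative.

```python
def influential_cases(cooks_distances, threshold=1.0):
    """Return (case number, Cook's distance) pairs above the threshold."""
    return [
        (case, distance)
        for case, distance in enumerate(cooks_distances, start=1)
        if distance > threshold
    ]


# Hypothetical Cook's distances for five cases.
cooks = [0.001, 0.012, 0.050, 0.004, 0.020]

flagged = influential_cases(cooks)
print(f"range = {min(cooks):.3f} to {max(cooks):.3f}")
if flagged:
    print("Re-run the model without cases:", flagged)
else:
    print("No single case exerted an excessive effect on the model.")
```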
ASSUMPTION F Multicollinearity
F) Multicollinearity – correlation matrix
Example:
Red ones are significant relationships between predictors (but low correlation value, so OK).
Green ones are significant relationships between predictors & outcome.
Blue ones are non-significant relationships between predictor & outcome.
F) Multicollinearity
• Don’t want predictors to be too similar to each other
• Problems if Variance Inflation Factor (VIF) > 10
• (tolerance = 1/VIF) In this example, these are fine
• Problems if predictors correlated >.8 or >.9 in correlation matrix
• If high VIFs, examine correlation matrix & drop or combine problem predictor(s)
• e.g. combine highly correlated predictors, or drop one of them
• If you want to report precisely, options are…
1. There was no indication of multicollinearity, VIF range = 1.14 to 1.26.
2. Inspection of the correlation matrix identified ?? predictor to be dropped and the model run again. Re-running the regression model without this predictor resulted in….
3. From inspecting the correlation matrix, predictors ?? and ?? were combined and the model run again. Re-running the regression model with the combined predictor resulted in….
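A Python sketch of how VIF and tolerance relate to the overlap between predictors. For each predictor, VIF = 1 / (1 - R2), where R2 comes from predicting that predictor from all the others; with only two predictors, R2 is simply their squared correlation. The correlations below are hypothetical.

```python
def tolerance(r_squared: float) -> float:
    """Share of a predictor's variance not explained by the other predictors."""
    return 1 - r_squared


def vif(r_squared: float) -> float:
    """Variance Inflation Factor, the reciprocal of tolerance."""
    if not 0 <= r_squared < 1:
        raise ValueError("r_squared must be at least 0 and less than 1")
    return 1 / (1 - r_squared)


# Hypothetical correlations between two predictors.
for r in [0.30, 0.80, 0.90, 0.95]:
    r_squared = r ** 2
    print(f"r = {r:.2f}: tolerance = {tolerance(r_squared):.2f}, VIF = {vif(r_squared):.2f}")
```

With two predictors, the VIF only passes 10 once the correlation is about .95, so the correlation matrix rule (>.8 or >.9) is the stricter of the two checks.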
Activity:
Interpret the data output
Stress and incidence of depression
• Wang and Schmitz (2011). Does job strain interact with psychosocial
factors outside of the workplace in relation to the risk of major depression?
The Canadian National Population Health Survey. Social Psychiatry and
Psychiatric Epidemiology 46: 577-584.

• Examined 6,008 working participants aged 18-64 (free of depression at baseline).

• Predictors = first model uses different types of stress, then the number of
stressors
• Outcome = 6-year incidence of depression
Different kinds of stress & depression
Variable Category OR (95% CI)
Job strain Low 1
High 1.58 (1.25-2.00)
Negative Life events No 1
Yes 2.11 (1.68-2.65)
Chronic stress No 1
Yes 1.99 (1.46-2.71)
Childhood trauma No 1
Yes 1.68 (1.32-2.15)

All models were adjusted by gender, age, marital status, education and time-varying covariates (employment status, self-rated health and having
one or more long-term medical conditions)

Which variables do you think might significantly predict depression incidence and why?
Can you interpret what each of these odds ratios mean?
Number of stressors (4 option categorical
predictor) and depression incidence
Number of stressors OR (95% CI)
0 1
1 1.52 (0.72-3.20)
2 1.91 (0.93-3.95)
3+ 4.37 (2.17-8.83)

All models were adjusted by gender, age, marital status, education and time-varying covariates (employment status, self-rated health and having
one or more long-term medical conditions)

Can you interpret what each of these odds ratios mean?
Other stuff
power analysis, sample size
Power
• Calculating power is quite involved for logistic regression so I won’t
cover it here.
• If you need to do this in the future, there are some nice videos on YouTube:
• https://www.youtube.com/watch?v=-XEMewjLnZk
Finally….Further viewing
• How to do logistic regression in jamovi: (9 minutes)
https://www.youtube.com/watch?v=s7GL0z-3ymA
• A different perspective in explaining logistic regression, including the usual part 4 – the complex statistical stuff:
• National Centre for Research Methods – Dr Heini Väisänen
• 19 minutes long
https://www.youtube.com/watch?v=777b3xDnN8w&list=RDCMUCtAbL1TprMe465zRm3Rbwsg&index=7
Finally ….Further reading examples
• Check the reading list:
• Urban street tree biodiversity and antidepressant prescriptions
• The impact of nonverbal ability on prevalence and clinical presentation of language disorder: evidence from a population study
• Perception of a need to change weight in individuals living with and beyond breast, prostate and colorectal cancer: a cross‑sectional survey
QUESTIONS?
DISCUSSION
BOARD
WORKSHOPS
Week 9
Mediation
