Slides 01
Fabian Scheipl
• You can always ask questions in German; I’ll translate them and reply in
English.
• Moodle forums will be used to collect your questions & requests for
clarification that we’ll discuss in the Q&A sessions.
• Lectures, code and exercise sessions will all be relevant for the exam.
1.2 Examples
[We will use the term “subject” for convenience, even though the unit of
observation might be an animal, a crop field, a country etc.]
• If the observation times also have the same distance $d = t_{j+1} - t_j$ for
all $j$, they are called equally spaced.
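The definition of equal spacing can be checked mechanically. The following is a minimal sketch; the function name `is_equally_spaced`, the tolerance, and the example time vectors are illustrative assumptions, not part of the lecture material.

```python
import math


def is_equally_spaced(times, tol=1e-9):
    """Return True if all consecutive observation times share one distance d."""
    if len(times) < 2:
        return True  # nothing to compare (a convention chosen for this sketch)
    diffs = [t_next - t for t, t_next in zip(times, times[1:])]
    return all(math.isclose(d, diffs[0], abs_tol=tol) for d in diffs)


# Illustrative observation times (hypothetical values).
print(is_equally_spaced([8, 10, 12, 14]))  # d = 2 throughout -> True
print(is_equally_spaced([0, 1, 4, 6]))     # distances 1, 3, 2 -> False
```

The check is applied to one vector of observation times; whether all subjects share that vector is a separate question.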
[Figure: Average reaction time (ms) against days of sleep deprivation, one
panel per subject (333, 334, 335, 337, 349, 350, 351, 352, 369, 370, 371, 372).]
• Data from Potthoff and Roy (1964), re-analyzed in the book by Little
and Rubin (1987) and part of the R-package nlme.
[Figure: Distance (mm) against age (years), by sex (male, female).]
• N=100 children 12-33 months old with high blood lead levels
• Response: Blood lead level (𝜇g/dl)
• Treatment: placebo or succimer (enhances urinary excretion of lead)
• Measurements: baseline, week 1, week 4 and week 6 (balanced data).
[Figure: Blood lead level (𝜇g/dl) against week, one panel per treatment group.]
• 𝑁 = 50 male rats
• Same measurement times $t_j$ for all rats, but dropout due to rats not
surviving the anesthesia (unequal $n_i$).
rats.sas
[Figure: Distance (pixels) for the rat data.]
• The data, with 2376 observations on 369 men infected with HIV, are
highly unbalanced (see Diggle et al., 2002).
• Questions of interest:
– the average time course for the CD4 cell depletion
– time courses for individuals
– heterogeneity between individuals
– factors influencing the CD4 cell count change (age, cigarette and
drug use, number of sexual partners, psychological health)
[Figure: CD4 cell count (1/ml) against time since seroconversion (years).]
• Covariates can be
– time-invariant and only measured at baseline, e.g. gender or
treatment for Examples 2-4.
– time-varying and measured over time, e.g. cigarette and drug use
in the CD4 data.
• If so, which shape do they take? Linear (e.g. growth in children)? Are
there break points (e.g. TLC trial)?
[Figure: two panels of income against time.]
Longitudinal studies can follow individual change over time and are thus
more informative than cross-sectional studies ($n_i = 1$).
Note that $\mathrm{E}[Y_{ij} - Y_{ik}] = \beta_L (t_{ij} - t_{ik})$, i.e. changes in $Y$ for subject $i$ when $t$
changes contribute to the estimation of $\beta_L$.
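The identity in the note above follows in one step once a model is written down. As a sketch, assume a linear model with a cross-sectional effect $\beta_C$ of the baseline time $t_{i1}$, a longitudinal effect $\beta_L$, and mean-zero errors $\varepsilon_{ij}$. This model, and the symbols $\beta_0$, $\beta_C$ and $\varepsilon_{ij}$, are assumptions made here for illustration.

```latex
\begin{align*}
Y_{ij} &= \beta_0 + \beta_C\, t_{i1} + \beta_L (t_{ij} - t_{i1}) + \varepsilon_{ij},
  \qquad \mathrm{E}[\varepsilon_{ij}] = 0, \\
\mathrm{E}[Y_{ij} - Y_{ik}]
  &= \beta_L (t_{ij} - t_{i1}) - \beta_L (t_{ik} - t_{i1}) \\
  &= \beta_L (t_{ij} - t_{ik}).
\end{align*}
```

The terms $\beta_0$ and $\beta_C t_{i1}$ are constant within subject $i$ and cancel in the difference. With $n_i = 1$ there is no within-subject difference to form, so under this assumed model a cross-sectional study carries no direct information on $\beta_L$.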
[Figure: two side-by-side panels of income against time.]
[Figure: Distance (mm) against age (years), separate panels for male and
female subjects, shown in two side-by-side versions.]
[Confounder: a variable that is associated with both the response and the
covariate of interest and will lead to biased effect estimates if ignored.]
Note that $z_i$ does not appear in the mean of the change in $Y$. Longitudinal
studies thus offer better protection against confounding: for changes
in the response, each subject serves as its own control for time-constant
variables such as age, gender, socio-economic background, education,
genetics, disease history, etc.
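A small simulation can make the "own control" argument concrete. The sketch below assumes a simple data-generating model with two time points, a covariate of interest `x`, and a time-constant confounder `z`. All names, effect sizes and distributions are illustrative assumptions, not taken from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000                  # number of subjects (illustrative)
beta1, gamma = 1.0, 2.0   # assumed effects of x and of the confounder z

z = rng.normal(size=n)                 # time-constant confounder z_i
x1 = z + rng.normal(size=n)            # covariate at time 1, correlated with z
x2 = x1 + rng.normal(loc=1.0, size=n)  # covariate at time 2; its change is independent of z
y1 = beta1 * x1 + gamma * z + rng.normal(size=n)
y2 = beta1 * x2 + gamma * z + rng.normal(size=n)

# Cross-sectional analysis at time 1 that ignores z: the slope absorbs part of gamma.
slope_cross = np.polyfit(x1, y1, 1)[0]

# Analysis of within-subject changes: gamma * z_i cancels in y2 - y1.
slope_change = np.polyfit(x2 - x1, y2 - y1, 1)[0]

print(f"cross-sectional slope: {slope_cross:.2f}")
print(f"change-score slope:    {slope_change:.2f}")
```

Under these assumptions the cross-sectional slope estimates `beta1 + gamma / 2 = 2` rather than `beta1 = 1`, while the slope from the changes estimates `beta1`. The protection only covers time-constant confounders; a `z` that changes over time would not cancel.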
• Often, observations are more similar the closer they are in time, i.e. the
correlation decreases with the time difference. This is in contrast to
clustered data, e.g. on families.
• There has been a lot of development in recent years, but flexibility and
robustness of software can still be an issue.
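To illustrate a correlation that decays with the time difference, the sketch below compares two common textbook structures: a continuous-time AR(1) form, where the correlation is `rho ** |t_j - t_k|`, and an exchangeable (compound symmetry) form, as is typical for clustered data. These two structures, the function names and the values of `rho` and `times` are choices made for illustration only.

```python
import numpy as np


def ar1_corr(times, rho):
    """Correlation rho ** |t_j - t_k|: decays as observations move apart in time."""
    t = np.asarray(times, dtype=float)
    lags = np.abs(t[:, None] - t[None, :])
    return rho ** lags


def exchangeable_corr(n, rho):
    """Same correlation rho for every pair, regardless of ordering."""
    return (1 - rho) * np.eye(n) + rho * np.ones((n, n))


times = [0, 1, 4, 6]  # illustrative, unequally spaced observation times
R_ar1 = ar1_corr(times, rho=0.8)
R_exch = exchangeable_corr(len(times), rho=0.8)

print(np.round(R_ar1, 3))
print(R_exch)
```

In the first matrix the entries shrink away from the diagonal, while in the second every off-diagonal entry is the same, which matches data without a natural ordering within the cluster.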
• Mixed models: Observations are correlated, because they are from the
same subject and share the same underlying processes.
• In the linear case, transition models and linear mixed models imply
marginal models with particular correlation structures (cf. Ch. 3.5 and 6.1).
The 𝛽 parameters in all three approaches then have the same marginal
interpretation. This is no longer the case in the generalized setting,
see Ch. 8 ff.
• For the linear case, we will focus on the linear mixed model (Ch. 3-7).
The generalized linear mixed model is discussed in Chapter 9.
• Marginal models are discussed for the generalized case in Chapter 10.
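The statement that a linear mixed model implies a marginal model with a particular correlation structure can be sketched numerically. The code assumes the standard form in which subject $i$ has random effects with covariance `D`, design matrix `Z`, and independent errors with variance `sigma2`, so that the marginal covariance is `Z D Z' + sigma2 * I`. The notation and all numbers are assumptions for illustration and may differ from the notation of the later chapters.

```python
import numpy as np


def marginal_cov(Z, D, sigma2):
    """Marginal covariance Z D Z' + sigma2 * I implied by a linear mixed model."""
    Z = np.asarray(Z, dtype=float)
    D = np.atleast_2d(np.asarray(D, dtype=float))
    return Z @ D @ Z.T + sigma2 * np.eye(Z.shape[0])


times = np.array([0.0, 1.0, 2.0, 3.0])  # illustrative observation times

# Random intercept only: constant covariance between any two observations.
Z_int = np.ones((4, 1))
V_int = marginal_cov(Z_int, D=[[2.0]], sigma2=1.0)
icc = V_int[0, 1] / V_int[0, 0]  # intra-subject correlation 2 / (2 + 1)

# Random intercept and slope: variances and covariances depend on time.
Z_slope = np.column_stack([np.ones(4), times])
V_slope = marginal_cov(Z_slope, D=[[2.0, 0.0], [0.0, 0.5]], sigma2=1.0)

print(V_int)
print(round(icc, 3))
print(V_slope)
```

A random intercept alone yields a compound symmetry structure, while adding a random slope lets the marginal variance grow with time.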
• Both longitudinal data analysis and time series analysis model time
courses and try to take into account temporal correlation between
observations.
• Longitudinal data typically span shorter time periods than time series,
but they contain independent replications in the form of subjects.
This allows us to borrow strength across subjects, which can make the
analysis more robust to model assumptions.