Ilovepdf Merged (2)
Business Studies
Fredrik Tell
Uppsala University
Outline
• What is science?
• Philosophical considerations
– Ontology
– Epistemology
– Methodology
• Four stances on philosophy of science (Van de Ven)
• Scientific progress and the role of paradigms
WHAT IS SCIENCE?
Dreams of science…
Being scientific…
• The goal is inference
• Descriptive inference
• Causal inference
• The procedures are public
• Makes possible assessment of how knowledge claims
were generated
• The conclusions are uncertain
• Reaching perfectly certain conclusions is impossible
• The content is the method
• Being scientific means adhering to the rules of science,
rather than the topic investigated
(King, G., Keohane, R. O., & Verba, S. 1994. Designing
Social Inquiry, Princeton University Press)
SOME CONCEPTS TO HELP US NAVIGATE
Metaphysics and Ontology
Metaphysics
Nomothetic
• Purpose: Establishing general laws and empirical generalization
• Requires comparative studies (usually of large samples).
• Studying the general occurrence of something (like an event). Often specific features.
• ”Something” = phenomenon
Idiographic
• Purpose: Establishing understanding of the particular context in order to generate (broader) understanding.
• Only feasible with few studies, in order to retain depth in description. Can be comparative.
• Studying a case of something (like an event)
• ”Something” = phenomenon
Variance and Process approaches
Variance Theory
• Attributes of:
– Environment (x1)
– Technology (x2)
– Decision Process (x3)
– Resources (x4)
• → Organization Outcomes (Y)
Process Theory
• State A → State B, through:
– events
– activities
– choices
• Logical positivism
• Relativism
• Pragmatism
• Realism
[Figure: kinds of seat: stool, recliner, seat, chair, camp chair, throne, rocker]
Ludwig
Wittgenstein
Wittgenstein (2): language games
Aspects of paradigms
• Anomalies
• Scientific discoveries (or inventions?)
• Holism and the problem of inventor identification
• Barriers and resistance to change
• Uncertainty and paradigms
• Evaluation criteria
• The substitution of paradigms
• The role of thought experiments
Analogies
Almagest, Claudius Ptolemy (c. AD 90 – c. AD 168)
On the Revolutions of the Celestial Spheres
Nicolaus Copernicus (19 February 1473 – 24 May 1543)
Do paradigms matter in Business Studies?
Joachim Landstrom
Landstrom, Joachim
Introduction
– Introduction
– Research questions
Research design issues when studying cause-effect
Designing a quantitative study
Managing a quantitative study
Jiang, J. (Xuefeng), Wang, I. Y., & Wangerin, D. D. (2018). How does the FASB
make decisions?
“This study examines how the Financial Accounting Standards Board (FASB) sets
Generally Accepted Accounting Principles (GAAP) over the past 40 years.”
Ingram, R. W. (1985). A Descriptive Analysis of Municipal Bond Price Data for Use
in Accounting Research
“In this paper I describe municipal bond data that have not been used in prior
research but that appear to perform in a similar fashion to corporate security data”
Hong, Y., & Andersen, M. L. (2011). The Relationship Between Corporate Social
Responsibility and Earnings Management: An Exploratory Study
“In this article, we examine the communication process by investigating the
potential relationship between corporate social responsibility (CSR) and the quality
of their financial reporting.”
Jedson, P., (2023). Mandatory disclosure and learning from external market
participants: Evidence from the JOBS act
“This paper examines whether mandatory disclosure affects how much firms learn
from external market participants, that is, whether there is a market feedback
effect.”
Credible (quantitative) research
External validity
“The failure of cold fusion”: Research is only preliminary until replication is
successful
Replication crisis, as evidenced in business studies
Economics: 61 percent complete replication (Camerer et al, 2016).
The Strategic Management Journal: 80 percent failed to replicate (Bergh et al,
2017).
Financial Accounting & Auditing: 60 percent complete replication, 29 percent
partial replication (Salterio et al, 2022).
Financial Economics: 82 percent complete replication (Jensen et al, 2023).
In-sample versus out-of-sample.
Internal validity
“Fit” between RQ/theory/hypotheses
“Fit” between theory/hypotheses/model(s)
Research is seldom truly innovative: it is incremental.
“Standing on the shoulders of giants” is a nice maxim.
Thus: it often makes sense to stay close to a lead article.
Reliability
Replicability
Method chapter is key
Clearly stated sample extraction/data management
Clearly stated model(s)
Clearly stated variable measurements
“Standing on the shoulders of giants” is a nice maxim.
It often makes sense to stay close to a lead article.
Theory
Not “high” theory,
and not consultancy reports either.
Published empirical research in the relevant area.
Current research (not stone-age old research).
Often, but not necessarily, generates hypotheses.
Data sources
Primary data
Unique
Time consuming
Difficult to get access
Not possible to replicate
Thus, rarely used in quant-studies
Secondary data
Non-unique: Comes from an external database
Relatively easy to get lots of data
Subscription based: Refinitiv Eikon, Retriever Business, SHoF, Retriever
Research, Factiva
Possible to replicate
Thus, almost always used for quant-studies
References
Bergh, D. D., Sharp, B. M., Aguinis, H., & Li, M. (2017). Is there a credibility crisis in
strategic management research? Evidence on the reproducibility of study findings.
Strategic Organization, 15(3), 423–436.
https://doi.org/10.1177/1476127017701076
Camerer, C. F., Dreber, A., Forsell, E., Ho, T.-H., Huber, J., Johannesson, M.,
Kirchler, M., Almenberg, J., Altmejd, A., Chan, T., Heikensten, E., Holzmeister, F.,
Imai, T., Isaksson, S., Nave, G., Pfeiffer, T., Razen, M., & Wu, H. (2016). Evaluating
replicability of laboratory experiments in economics. Science, 351(6280), 1433–1436.
https://doi.org/10.1126/science.aaf0918
Jensen, T. I., Kelly, B., & Pedersen, L. H. (2023). Is There a Replication Crisis in
Finance? The Journal of Finance, 78(5), 2465–2518.
https://doi.org/10.1111/jofi.13249
Salterio, S. E., Luo, Y., & Adamson, C. (2022). Replication of Audit and
Financial Accounting Research: We do a lot more than we think.
http://dx.doi.org/10.2139/ssrn.4210603
Scientific Methods in Business Studies, ht-24
Lecture 3
Anna Bengtson
Today's agenda
1. Qualitative studies – some recaps
3. Case studies
4. Finding data
5. Data sources
1. Interviews
2. Observations
3. Secondary data
4. Etc.
Motives for qualitative studies
• RQ
Plenty of choices!
• Design
• Experiment (quasi-experiment)
• Case
• Cross-sectional
• Longitudinal
• Comparative studies
• Pedagogical tool to teach law at Harvard Law School from the 1870s
• In law and medicine there are “already existing cases” – law cases
and patient cases respectively, while other subjects have to construct
their teaching cases
When to use research cases?
• When asking “how” and “why” questions
• Extendable or not?
• Agreement of length?
• Learn about the issue but also the context and/or person you are
going to meet.
Statement/Claim/Argument
How to make sense of data?
Coding
• Code = Word or phrase
representing the essence or key
attributes of a narrative, a
sentence, a piece of text…
• Used to categorize data
• Organizing into ”chunks” that are
alike or similar
Why do we code?
• Don’t always!
• A way of working
• Beginning to analyze
• Document analytical process
• Retrieving data – finding your way back
Theory
Tentative hypothesis
Pattern
Observations
Grounded theory – or “emergent” coding
• Constant comparison!
• Between data sources
• Between theory and data
• Between concepts
• Good fit between concepts and the data
• Constant Comparison Stop Motion Demo - YouTube
Core elements of coding
1. Get to know your data
2. Mark the text
3. Code
4. Relate to theory
1. Get to know your data!
• Read!
• Make notes of major themes,
unusual issues, events, things
that surprise you, recurrent
events/phrases/themes
• Group cases into types of
categories
2. Mark the text
• Mark sentences, words, phrases,
paragraphs, or longer pieces of
text
• Mark the text using short
descriptions, key words etc.
3. Code
• Develop a coding scheme
• Mark the text systematically
• Review the codes
• Think of groupings
• Drop, merge or split codes
4. Relate to theoretical ideas
• Add interpretation!
• Relate codes to research question
• Seek interconnections between codes
An example
RQ:How does scientific innovation happen?
Tell me about your BEC research. When did you get involved in this line of research, and why?
Interview A: ”I got a big grant, and then I could start this experiment…” […] ”I could hire three PhD students to work on the project, so I did not have to
do all the work myself” […] ”We worked for four years without any results” […]
Interview B: ”I accepted a professorship here and I could start to work properly”. […] ”I worked with a group of other physicists here, including PhD
students” […] ”This kind of experiment was getting a lot of attention because of the Nobel Prize in laser physics in XX.” […] ”It takes many, many years to
get to that point in this field”
Interview C: ”We had very little money so we had to buy smaller XX” […] ”It was believed to be a very ”hot” topic internationally” […] ”Our boss
supported us, despite the fact that the project ran over time”
(These quotes have been altered from the original for the purpose of this example!)
Coding
• How does scientific innovation happen?
Running late
Doctoral students
Long time perspective
Important publication
International scientific community
Nobel prize
Support from colleagues
Professorship
Belief in our work
Creating categories/themes
TIME
RESOURCES Running late
Professorship
SUPPORT
Important publication
• Support?
• Local support (department, colleagues, ”boss”…)
• International community
Resources Support
Protective
Space
How does scientific innovation happen?
• Theorize about the role of ”protective space” for scientists, and some
of the elements of such a space. We claim that scientists need to feel
”protected” or ”safe”, in order to engage in new or innovative
scientific fields/practices. This includes having a secure work
situation, (allowing both sufficient time to do research and access to
labs, equipment, PhD students etc.), feel support in their local
environment, and have the ”freedom to fail” and enough time to try
and retry.
Data: Quotes from interviews
• Read!
• Code!
• Analyze!
Coding is…
• …a way of working.
• …a step in the analytical process, but not everything.
• …not necessary.
Analyzing, structuring and presenting qualitative work
Linda Wedlin
Agenda
1. Grounded theory
2. Content Analysis
3. Discourse analysis
Deduction
Theory
Hypotheses
Observations
Confirmation
2. Content Analysis
• Deep method!
• textual raw data
• Critical discourse analysis - focuses on the ways in which
social and political domination are said to be visible in
text and talk
• Genres, styles, narratives, vocabulary?
Example: Scania CEO statement
Scania 2010 Sustainability report
• Chaos!
• Representation
• Authority
1. Sorting
• Convincing
• Representing
• Linking claims to evidence
Data: What you can read/hear/see/feel/touch/smell
Statement/Claim/Argument
On arguments
Constructing an argument
• Evidence:
• He continuously writes interesting and well-argued reports,
which are given the highest grades.
• Reason:
• Implicit: (Because good grades are assumed to be a
marker of a good student)
• Emotions play a larger role in rational decision-
making than most of us think, because without the
help of the emotional centers of the brain, we
cannot make rational decisions. Persons whose
brains have suffered physical damage to their
emotional centres cannot make even simple,
everyday decisions.
• Accurate
• Precise
• Sufficient
• Representative
• Authoritative
Your text should provide an
argument
• Start where your readers ”are”, what they know
and don’t know, what they question and what
they don’t
• Formulate a key claim – make a statement
• Argue for your claim!
• You have to convince them – not just ask them to
believe you
• You have to show AND tell
Some final notes
The importance of analysis
• Credibility
• Alternative explanations
• Explain outliers, deviant cases
• Apply comprehensive data treatment
• Transferability
• Analytical generalization
• Purposive sampling
An introduction to regression analysis
Joachim Landstrom
December 3, 2024
Introduction
$$
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_{K-1} x_{K-1} + u
$$
where,
y is the dependent variable (the ‘effect’)
x is the independent variable (the ‘cause’), a.k.a. covariate, regressor.
$E(y \mid x_1, x_2, \dots, x_{K-1})$ is the predicted/fitted value
The regression parameters, e.g. $\beta_1$, are population parameters
Landstrom, Joachim What is an OLS regression?
Regression plot
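If
$$
X = [\,i, X_1, X_2, \dots, X_{K-1}\,]_{N \times K} =
\begin{bmatrix}
1 & x_{11} & x_{21} & \cdots & x_{K-1,1} \\
1 & x_{12} & x_{22} & \cdots & x_{K-1,2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_{1N} & x_{2N} & \cdots & x_{K-1,N}
\end{bmatrix}
\ \text{(a.k.a. the design matrix)},
\qquad
\beta =
\begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_{K-1} \end{bmatrix}_{K \times 1}
$$
then
$$
y = X\beta + u \;\Rightarrow\; u = y - X\beta
$$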
Linearity
The model must have a linear relationship
Linear independence
No exact linear relationship in X, e.g.:
MarketCap = β0 + β1 Revenue + β2 COGS + β3 GrossProfit, where
GrossProfit ≡ Revenue − COGS
Clearly there is an exact linear relationship among the covariates
N must be at least as large as K , since we are solving a linear equation
system.
With a constant in the regression model, this assumption also implies that X
cannot be completely constant — variation is necessary.
Exogeneity of covariates
The error is not a function of X, i.e., E(u|x1, x2, ..., xK−1) = E(u|X) = 0
This means that the covariates do not carry useful information for predicting u
This also implies that for each observation E(ui|X) = 0
Since, for each observation, we have E(ui|X) = 0, it follows that E(u) = 0, and
this means that we get the PRF.
Homoskedasticity
Errors are uncorrelated, which implies that the covariance between the errors
is zero. That is, E(ui uj) = 0 for i ≠ j.
This is also known as non-autocorrelation (in time-series samples).
The conditional variance of errors is constant, which thus equals the
unconditional variance.
That is: E[var(u)|X] = E[var(u)] = σ², i.e. E[var(u)|X] ≡ E(uu′|X) = σ²I
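E(uu′|X) is also known as the variance-covariance matrix of the error:
$$
E(uu' \mid X) =
\begin{bmatrix}
u_1u_1 & u_1u_2 & u_1u_3 & \cdots & u_1u_N \\
u_2u_1 & u_2u_2 & u_2u_3 & \cdots & u_2u_N \\
u_3u_1 & u_3u_2 & u_3u_3 & \cdots & u_3u_N \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
u_Nu_1 & u_Nu_2 & u_Nu_3 & \cdots & u_Nu_N
\end{bmatrix}
=
\begin{bmatrix}
\sigma^2 & 0 & 0 & \cdots & 0 \\
0 & \sigma^2 & 0 & \cdots & 0 \\
0 & 0 & \sigma^2 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & \sigma^2
\end{bmatrix}_{N \times N}
= \sigma^2 I_N
$$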
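$$
\begin{aligned}
SSR \equiv \sum_i \hat{u}_i^2 = \hat{u}'\hat{u}
&= (y - X\hat{\beta})'(y - X\hat{\beta}) \\
&= y'y - \hat{\beta}'X'y - y'X\hat{\beta} + \hat{\beta}'X'X\hat{\beta} \\
&= y'y - 2\hat{\beta}'X'y + \hat{\beta}'X'X\hat{\beta}
\end{aligned}
$$
$$
\frac{d\,SSR}{d\hat{\beta}} = -2X'y + 2X'X\hat{\beta} = 0
\;\Rightarrow\;
\hat{\beta} = (X'X)^{-1}X'y
$$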
$$
\hat{\beta} = (X'X)^{-1}X'y = (X'X)^{-1}X'(X\beta + u)
= \beta + \underbrace{(X'X)^{-1}X'u}_{\text{sampling error}}
$$
Then if $E[u \mid X] = 0 \Rightarrow$
$$
E(\hat{\beta} \mid X) = \beta + (X'X)^{-1}X'E(u \mid X) = \beta + (X'X)^{-1}X'0 = \beta
$$
Since $\hat{\beta} = \beta + (X'X)^{-1}X'u$, we get
$$
\begin{aligned}
E[var(\hat{\beta}) \mid X]
&= E[(\hat{\beta} - \beta)(\hat{\beta} - \beta)' \mid X] \\
&= (X'X)^{-1}X'\,\underbrace{E(uu' \mid X)}_{\text{Note!}}\,X(X'X)^{-1} \\
&= (X'X)^{-1}X'\,\sigma^2 I\,X(X'X)^{-1} \\
&= \sigma^2 (X'X)^{-1}X'X(X'X)^{-1} \\
&= \sigma^2 (X'X)^{-1}
\end{aligned}
$$
Standard errors
The estimated residual variance (K − 1 is the number of covariates excluding the constant):
$$
\hat{\sigma}^2 = SSR \times (N - K)^{-1}
$$
The t-test
$$
t_{\hat{\beta}} = \frac{\hat{\beta} - \beta}{se(\hat{\beta})}
$$
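To make the chain from residual variance to standard error to t-statistic concrete, here is a small sketch for a regression with a constant and one covariate. The data and the function name `ols_with_se` are hypothetical; the standard errors are the square roots of the diagonal of σ̂²(X′X)⁻¹, written out for the two-parameter case.

```python
import math

def ols_with_se(x, y):
    """OLS of y on a constant and x; returns estimates, standard errors, sigma2_hat."""
    n, k = len(x), 2                      # K = 2 parameters: constant + one covariate
    xbar = sum(x) / n
    ybar = sum(y) / n
    sst_x = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sst_x
    b0 = ybar - b1 * xbar
    ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    sigma2_hat = ssr / (n - k)            # SSR x (N - K)^-1
    # Diagonal elements of sigma2_hat * (X'X)^-1 when X = [1, x].
    var_b1 = sigma2_hat / sst_x
    var_b0 = sigma2_hat * (1.0 / n + xbar ** 2 / sst_x)
    return (b0, b1), (math.sqrt(var_b0), math.sqrt(var_b1)), sigma2_hat

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]            # made-up observations
(b0, b1), (se_b0, se_b1), s2 = ols_with_se(x, y)
t_b1 = (b1 - 0.0) / se_b1                 # t-statistic for H0: beta1 = 0
print(b1, se_b1, t_b1)
```

The hypothesised value under H0 is set to zero here only because that is the most common test; any other value can be substituted in the numerator.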
Goodness-of-fit
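Let $SST \equiv \sum_i (y_i - \bar{y})^2 = y'y - N\bar{y}^2$, and $SSE \equiv SST - SSR \Rightarrow$
$$
\frac{SST}{SST} = \frac{SSE}{SST} + \frac{SSR}{SST} \Rightarrow
$$
$$
R^2 \equiv \frac{SSE}{SST} = 1 - \frac{SSR}{SST}
= 1 - \frac{\sum_i \hat{u}_i^2}{\sum_i (y_i - \bar{y})^2}
= 1 - \frac{\hat{u}'\hat{u}}{y'y - N\bar{y}^2}
$$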
The adjusted R 2
$R^2$ is non-decreasing in the number of independent variables.
E.g. add variable z to X: $R^2_{Xz} = R^2_X + (1 - R^2_X)\rho^2_{yz}$, where $\rho_{yz}$ is the partial correlation
between y and z, controlling for X.
So, to compare models, $R^2$ needs to be adjusted, so that such an automatic
increase does not bias the decision. The adjusted $R^2$ is computed as:
$$
R^2_{adjusted} = 1 - \frac{N - 1}{N - K}(1 - R^2)
$$
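As a quick numerical illustration of the two formulas, the sketch below computes R² and the adjusted R² from observed and fitted values. The fitted values are hypothetical (they are not produced by an actual regression), and `r_squared` is a name introduced only for this example.

```python
def r_squared(y, y_hat, k):
    """Return (R2, adjusted R2); k counts all parameters, including the constant."""
    n = len(y)
    ybar = sum(y) / n
    sst = sum((yi - ybar) ** 2 for yi in y)
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    r2 = 1.0 - ssr / sst
    r2_adj = 1.0 - (n - 1) / (n - k) * (1.0 - r2)
    return r2, r2_adj

y = [1.0, 2.0, 3.0, 4.0, 5.0]
y_hat = [1.2, 1.9, 3.0, 4.1, 4.8]         # hypothetical fitted values
r2, r2_adj = r_squared(y, y_hat, k=2)
print(r2, r2_adj)
```

Because (N − 1)/(N − K) exceeds one whenever K > 1, the adjusted figure is always below the unadjusted one for a model with at least one covariate.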
Heteroskedasticity I
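Recall:
$$
E[var(\hat{\beta}) \mid X] = (X'X)^{-1}X'E(uu' \mid X)X(X'X)^{-1}
$$
and
$$
E(uu' \mid X) =
\begin{bmatrix}
u_1u_1 & u_1u_2 & u_1u_3 & \cdots & u_1u_N \\
u_2u_1 & u_2u_2 & u_2u_3 & \cdots & u_2u_N \\
u_3u_1 & u_3u_2 & u_3u_3 & \cdots & u_3u_N \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
u_Nu_1 & u_Nu_2 & u_Nu_3 & \cdots & u_Nu_N
\end{bmatrix}
=
\begin{bmatrix}
\sigma^2 & 0 & 0 & \cdots & 0 \\
0 & \sigma^2 & 0 & \cdots & 0 \\
0 & 0 & \sigma^2 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & \sigma^2
\end{bmatrix}
$$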
Heteroskedasticity II
Thus, se(u) becomes biased, which biases se(β̂), and this affects the ability
of the t-test to correctly test H0.
Recall:
$$
E(\hat{\beta} \mid X) = \beta + \underbrace{(X'X)^{-1}X'E(u \mid X)}_{\text{sampling error}}
= \beta + (X'X)^{-1}X'0 = \beta
$$
The above only assumes E(u|X) = 0, which implies that E(β̂|X) = β still holds when
heteroskedasticity is present.
Auto-correlated errors I
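Recall:
$$
E[var(\hat{\beta}) \mid X] = (X'X)^{-1}X'E(uu' \mid X)X(X'X)^{-1}
$$
and
$$
E(uu' \mid X) =
\begin{bmatrix}
u_1u_1 & u_1u_2 & u_1u_3 & \cdots & u_1u_N \\
u_2u_1 & u_2u_2 & u_2u_3 & \cdots & u_2u_N \\
u_3u_1 & u_3u_2 & u_3u_3 & \cdots & u_3u_N \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
u_Nu_1 & u_Nu_2 & u_Nu_3 & \cdots & u_Nu_N
\end{bmatrix}
=
\begin{bmatrix}
\sigma^2 & 0 & 0 & \cdots & 0 \\
0 & \sigma^2 & 0 & \cdots & 0 \\
0 & 0 & \sigma^2 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & \sigma^2
\end{bmatrix}
$$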
Auto-correlated errors II
Cross-sectional data is assumed to be random, so auto-correlation is seldom an
issue.
Time-series data is not drawn randomly. It comes from a single entity
observed over time. Path-dependence is then often an issue.
If the time-series model does not correctly model the path-dependence, this
dependence finds its way into the errors, leading to auto-correlated errors.
Multicollinearity
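$$
var(\hat{\beta}_k) = \frac{\hat{\sigma}^2}{SST_k} \times \frac{1}{1 - R_k^2}
= \frac{\hat{\sigma}^2}{SST_k} \times VIF_k,
\qquad \text{where } SST_k = \sum_{i=1}^{N} (x_{i,k} - \bar{x}_k)^2
$$
Recall: $\hat{\beta} = (X'X)^{-1}X'y$
Recall: $E[var(\hat{\beta}) \mid X] = \sigma^2 (X'X)^{-1}$
Recall: $\hat{\beta} = \beta + \underbrace{(X'X)^{-1}X'u}_{\text{sampling error}}$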
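The variance inflation factor is easiest to see with only two covariates, because the R² from regressing one covariate on the other is then simply their squared correlation. The sketch below rests on that assumption; the data and the name `vif_two_covariates` are illustrative only.

```python
def vif_two_covariates(x1, x2):
    """VIF = 1 / (1 - R_k^2); with two covariates R_k^2 is their squared correlation."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    r2 = s12 ** 2 / (s11 * s22)
    return 1.0 / (1.0 - r2)

x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2_unrelated = [2.0, 1.0, 3.0, 1.0, 2.0]   # uncorrelated with x1 by construction
x2_similar = [1.1, 2.0, 2.9, 4.2, 5.0]     # almost a copy of x1
vif_low = vif_two_covariates(x1, x2_unrelated)
vif_high = vif_two_covariates(x1, x2_similar)
print(vif_low, vif_high)
```

A VIF of one means the covariate's variance is not inflated at all, while the near-duplicate covariate inflates var(β̂k) by more than two orders of magnitude.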
Variance in covariates
OLS regression thrives on variance in covariates
At least one of the covariates must vary
The greater the variance, the smaller the standard error
Standard errors by standard deviation (sd) of the covariate, N = 1,000:
            sd = 1    sd = 2    sd = 4    sd = 8
se(β0)      0.1875    0.1476    0.1355    0.1322
se(β1)      0.1321    0.0660    0.0330    0.0165
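The halving pattern for the slope's standard error can be reproduced with a small simulation. This is only a sketch: the true coefficients, the error standard deviation of 4 and the seed are assumptions, not the settings behind the table. The same random draws are reused for every sd, so only the spread of the covariate changes.

```python
import math
import random

def se_slope(x, y):
    """Standard error of the slope in an OLS of y on a constant and x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sst_x = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sst_x
    b0 = ybar - b1 * xbar
    ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(ssr / (n - 2) / sst_x)

random.seed(0)
n = 1000
z = [random.gauss(0.0, 1.0) for _ in range(n)]   # standardised covariate draws
u = [random.gauss(0.0, 4.0) for _ in range(n)]   # errors; sd of 4 is an assumption
se_by_sd = {}
for sd in (1, 2, 4, 8):
    x = [sd * zi for zi in z]                    # same draws, larger spread
    y = [1.0 + 0.5 * xi + ui for xi, ui in zip(x, u)]
    se_by_sd[sd] = se_slope(x, y)
print(se_by_sd)
```

Doubling sd multiplies SST of the covariate by four while leaving the residuals unchanged, so the standard error of the slope halves at each step.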
Outliers I
Outliers are extreme values, values that sever, or impede, the
causal relation y = f(x).
Covariates based on ratios often suffer from this due to the small
denominator effect. Other reasons might be e.g. data input errors.
It is difficult to separate outliers from extreme effects that may occur normally in
data.
Many methods exist for treating outliers. E.g.
Transformation using e.g. natural logarithm
Trimming
Winsorizing
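The difference between trimming and winsorizing is easy to show on a toy ratio variable. The sketch below uses a simple rank-based cut-off, which is one convention among several; the function names and the data are hypothetical.

```python
def percentile_bounds(values, lower, upper):
    """Rank-based cut-offs (one simple convention among several)."""
    ordered = sorted(values)
    n = len(ordered)
    lo = ordered[round(lower * (n - 1))]
    hi = ordered[round(upper * (n - 1))]
    return lo, hi

def trim(values, lower=0.05, upper=0.95):
    """Drop observations outside the cut-offs."""
    lo, hi = percentile_bounds(values, lower, upper)
    return [v for v in values if lo <= v <= hi]

def winsorize(values, lower=0.05, upper=0.95):
    """Keep every observation but pull extreme values in to the cut-offs."""
    lo, hi = percentile_bounds(values, lower, upper)
    return [min(max(v, lo), hi) for v in values]

# Made-up ratio variable with two extreme values (small-denominator effect).
ratios = [0.8, 1.1, 0.9, 1.0, 1.2, 0.95, 1.05, 0.85, 1.15, 250.0, -40.0]
trimmed = trim(ratios, 0.10, 0.90)
winsorized = winsorize(ratios, 0.10, 0.90)
print(len(trimmed), min(winsorized), max(winsorized))
```

Trimming shrinks the sample, whereas winsorizing keeps N intact and only caps the tails, which matters when the sample is small.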
Summary
Linearity/Non-linearity
Stationary/Non-stationary
Exogeneity of covariates
Over-specified model
Under-specified model (a.k.a omitted variable) — Endogeneity
Variance-covariance structure for errors
Homoskedasticity
Heteroskedasticity
Auto-correlation
Multicollinearity
Outliers
Next-up
Consider:
Earnings = β0 + β1 Education + β2 Ability + u
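$$
u'u = \begin{bmatrix} 1 & 2 & 3 & 4 \end{bmatrix}
\begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix}
= 1 \times 1 + 2 \times 2 + 3 \times 3 + 4 \times 4 = 1^2 + 2^2 + 3^2 + 4^2 = 30
$$
$$
uu' = \begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix}
\begin{bmatrix} 1 & 2 & 3 & 4 \end{bmatrix}
=
\begin{bmatrix}
1 \times 1 & 1 \times 2 & 1 \times 3 & 1 \times 4 \\
2 \times 1 & 2 \times 2 & 2 \times 3 & 2 \times 4 \\
3 \times 1 & 3 \times 2 & 3 \times 3 & 3 \times 4 \\
4 \times 1 & 4 \times 2 & 4 \times 3 & 4 \times 4
\end{bmatrix}
=
\begin{bmatrix}
1 & 2 & 3 & 4 \\
2 & 4 & 6 & 8 \\
3 & 6 & 9 & 12 \\
4 & 8 & 12 & 16
\end{bmatrix}
=
\begin{bmatrix}
u_1^2 & u_1u_2 & u_1u_3 & u_1u_4 \\
u_2u_1 & u_2^2 & u_2u_3 & u_2u_4 \\
u_3u_1 & u_3u_2 & u_3^2 & u_3u_4 \\
u_4u_1 & u_4u_2 & u_4u_3 & u_4^2
\end{bmatrix}
$$
and also
$$
u'u \equiv \sum diag(uu') = tr(uu')
$$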
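If
$$
X = \begin{bmatrix} 1 & 4 & 7 \\ 2 & 5 & 8 \\ 3 & 6 & 9 \end{bmatrix}
\quad \text{and} \quad
X' = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}
$$
then:
$$
X'X =
\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}
\begin{bmatrix} 1 & 4 & 7 \\ 2 & 5 & 8 \\ 3 & 6 & 9 \end{bmatrix}
=
\begin{bmatrix} 14 & 32 & 50 \\ 32 & 77 & 122 \\ 50 & 122 & 194 \end{bmatrix}
=
\begin{bmatrix}
\sum X_1^2 & \sum X_1X_2 & \sum X_1X_3 \\
\sum X_2X_1 & \sum X_2^2 & \sum X_2X_3 \\
\sum X_3X_1 & \sum X_3X_2 & \sum X_3^2
\end{bmatrix}
$$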
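The two worked examples can be verified with a few lines of plain Python. The helper functions `transpose` and `matmul` are written out only for this illustration; a matrix library would normally be used instead.

```python
def transpose(m):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    """Matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

u = [[1], [2], [3], [4]]              # column vector u (4 x 1)
inner = matmul(transpose(u), u)       # u'u: 1 x 1, the sum of squares
outer = matmul(u, transpose(u))       # uu': 4 x 4 outer product
trace = sum(outer[i][i] for i in range(4))

X = [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
XtX = matmul(transpose(X), X)         # sums of squares and cross-products

print(inner, trace)
print(XtX)
```

The diagonal of X′X holds the sums of squares of each column and the off-diagonal elements hold the cross-products, which is why the matrix is symmetric.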
Joachim Landstrom
December 3, 2024
Introduction
Ability is ‘unobservable’
cov (Education, Ability) ̸= 0
Omitted variables problem
Biased regression parameters
Plan
Introduction
Model presentation
Pooled OLS
First-differencing
LSDV
Time-demeaning fixed effect
Random effects
The residual
Specification tests
Standard regression
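$$
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}_{N \times 1}
= \beta_0 \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}
+ \beta_1 \begin{bmatrix} x_{11} \\ x_{12} \\ \vdots \\ x_{1N} \end{bmatrix}
+ \beta_2 \begin{bmatrix} x_{21} \\ x_{22} \\ \vdots \\ x_{2N} \end{bmatrix}
+ \dots
+ \beta_{K-1} \begin{bmatrix} x_{K-1,1} \\ x_{K-1,2} \\ \vdots \\ x_{K-1,N} \end{bmatrix}_{N \times 1}
+ u
$$
$$
y = \beta_0 i + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_{K-1} X_{K-1} + u
$$
If
$$
X = [\,i, X_1, X_2, \dots, X_{K-1}\,]_{N \times K} =
\begin{bmatrix}
1 & x_{11} & x_{21} & \cdots & x_{K-1,1} \\
1 & x_{12} & x_{22} & \cdots & x_{K-1,2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_{1N} & x_{2N} & \cdots & x_{K-1,N}
\end{bmatrix}
\ \text{(a.k.a. the design matrix)},
\qquad
\beta =
\begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_{K-1} \end{bmatrix}_{K \times 1}
$$
then
$$
y = X\beta + u \;\Rightarrow\; u = y - X\beta
$$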
Landstrom, Joachim Panel regression analysis
From standard regressions to panel regressions
Model presentation
Eliminating the unobserved variable(s)
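$$
\begin{bmatrix} y_{11} \\ y_{12} \\ y_{21} \\ y_{22} \\ \vdots \\ y_{NT} \end{bmatrix}_{NT \times 1}
=
\begin{bmatrix}
1 & x_{111} & x_{112} & \cdots & x_{11,K-1} \\
1 & x_{121} & x_{122} & \cdots & x_{12,K-1} \\
1 & x_{211} & x_{212} & \cdots & x_{21,K-1} \\
1 & x_{221} & x_{222} & \cdots & x_{22,K-1} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & \cdots & \cdots & \cdots & x_{NT,K-1}
\end{bmatrix}_{NT \times K}
\begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_{K-1} \end{bmatrix}_{K \times 1}
+
\begin{bmatrix} u_{11} \\ u_{12} \\ u_{21} \\ u_{22} \\ \vdots \\ u_{NT} \end{bmatrix}_{NT \times 1}
\;\Rightarrow\;
y = X\beta + u
$$
The first subscript is the cross-section (c-s) index, the second subscript is the time-series (t-s) index, and the third
subscript is the column index.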
Pooled OLS
First-differencing
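$$
\begin{aligned}
y_{nt} &= \beta_0 + X_{nt}\beta + z_n\gamma + u_t \\
y_{n,t-1} &= \beta_0 + X_{n,t-1}\beta + z_n\gamma + u_{t-1} \;\Rightarrow \\
\Delta y_{nt} &= \Delta X\beta + v, \quad \text{where } v = \Delta u = u_t - u_{t-1}
\end{aligned}
$$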
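A tiny two-period panel shows why differencing helps. In this sketch the unobserved effect z_n is correlated with the covariate and the error term is set to zero so that the arithmetic is exact; all numbers, γ = 1 and the helper name `slope` are assumptions made for illustration.

```python
def slope(x, y):
    """OLS slope of y on a constant and x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    return sxy / sxx

beta = 2.0
z = [0.0, 1.0, 2.0, 3.0]                 # unobserved, time-invariant effect per unit
x_t1 = [1.0, 2.0, 3.0, 4.0]              # period 1 covariate, correlated with z
x_t2 = [1.5, 3.0, 3.2, 5.0]              # period 2 covariate
y_t1 = [1.0 + beta * x + zn for x, zn in zip(x_t1, z)]
y_t2 = [1.0 + beta * x + zn for x, zn in zip(x_t2, z)]

# Pooled OLS ignores z_n, so its slope absorbs the correlation between x and z.
pooled = slope(x_t1 + x_t2, y_t1 + y_t2)

# First-differencing: z_n and the constant drop out, leaving dy = dx * beta.
dx = [b - a for a, b in zip(x_t1, x_t2)]
dy = [b - a for a, b in zip(y_t1, y_t2)]
fd = sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)
print(pooled, fd)
```

The pooled slope is pushed above the true value of 2 by the omitted z_n, while the first-difference estimate recovers it because anything constant within a unit cancels.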
LSDV
Assume cov (X, zn ) ̸= 0, or cov (X, zt ) ̸= 0, or both
Add one dummy variable for each c-s/time into the regression
This method works in a spreadsheet program, and in SPSS
This method reduces the degrees of freedom significantly
However, estimation of c-s dummies is only consistent in T.
Should be used with caution if the regression parameters for the dummies are of
interest.
Random effects
ynt = β0 + Xnt β + zn γ + u
Specification test
Selecting between Pooled OLS, FE-CS, FE-TS, RE, or FD can be subjected
to formal tests
1 Pooled OLS or FE? Call on the F-test for testing panel models
2 Test both time-invariant and cross-sectional invariant FE against pooled using
the F-test for testing panel models
3 Then test time-invariant vs cross-sectional invariant using the F-test for
testing panel models
4 Test one-way vs two-way using the F-test for testing panel models
5 Random effect, or FE? Call on the Hausman test
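Recall:
$$
E(uu' \mid X) =
\begin{bmatrix}
u_1u_1 & u_1u_2 & u_1u_3 & \cdots & u_1u_N \\
u_2u_1 & u_2u_2 & u_2u_3 & \cdots & u_2u_N \\
u_3u_1 & u_3u_2 & u_3u_3 & \cdots & u_3u_N \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
u_Nu_1 & u_Nu_2 & u_Nu_3 & \cdots & u_Nu_N
\end{bmatrix}
=
\begin{bmatrix}
\sigma^2 & 0 & 0 & \cdots & 0 \\
0 & \sigma^2 & 0 & \cdots & 0 \\
0 & 0 & \sigma^2 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & \sigma^2
\end{bmatrix}
= \sigma^2 I
$$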
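$$
E(uu' \mid X) =
\begin{bmatrix}
u_{11}u_{11} & u_{11}u_{12} & u_{11}u_{21} & u_{11}u_{22} & u_{11}u_{31} & u_{11}u_{32} \\
u_{12}u_{11} & u_{12}u_{12} & u_{12}u_{21} & u_{12}u_{22} & u_{12}u_{31} & u_{12}u_{32} \\
u_{21}u_{11} & u_{21}u_{12} & u_{21}u_{21} & u_{21}u_{22} & u_{21}u_{31} & u_{21}u_{32} \\
u_{22}u_{11} & u_{22}u_{12} & u_{22}u_{21} & u_{22}u_{22} & u_{22}u_{31} & u_{22}u_{32} \\
u_{31}u_{11} & u_{31}u_{12} & u_{31}u_{21} & u_{31}u_{22} & u_{31}u_{31} & u_{31}u_{32} \\
u_{32}u_{11} & u_{32}u_{12} & u_{32}u_{21} & u_{32}u_{22} & u_{32}u_{31} & u_{32}u_{32}
\end{bmatrix}
\Rightarrow
$$
$$
E(uu' \mid X) =
\begin{bmatrix}
\sigma^2 & 0 & 0 & 0 & 0 & 0 \\
0 & \sigma^2 & 0 & 0 & 0 & 0 \\
0 & 0 & \sigma^2 & 0 & 0 & 0 \\
0 & 0 & 0 & \sigma^2 & 0 & 0 \\
0 & 0 & 0 & 0 & \sigma^2 & 0 \\
0 & 0 & 0 & 0 & 0 & \sigma^2
\end{bmatrix}
= \sigma^2 I_{NT}
$$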
OLS assumptions
Same assumptions apply to a panel regression as to standard OLS
regression in the form of c-s and t-s regression. But more issues pile up.
There cannot be any:
1 Within cross-section heteroskedasticity, e.g. u11 u11 = u12 u12
2 Within cross-section auto-correlation, e.g. u11 u12 = 0
3 Cross-sectional heteroskedasticity, e.g. u12 u12 = u21 u21
4 Cross-sectional contemporaneous correlation, e.g. u11 u21 = 0
5 Cross-sectional auto-correlation, e.g. u11 u22 = 0
Joachim Landstrom
December 9, 2024
Motivation
Introduction
Main structure
Motivation
An event study is a quasi experiment where the event is the treatment
The research design isolates the treatment effect from other confounding
factors
The treatment may be temporary
Give the subject an electrical shock (RCT not quasi)
Threat of fine from misbehaving
Publication of some news
or permanent
A policy change such as the introduction of mandatory sustainability reporting
Introduction of import duties (Wooldridge, 2016, pp 347 – 350)
Isolation via observing the outcome both pre- and post-treatment
Useful when RCT cannot be used (or when unethical)
Transparent
Replicable
Well-established
Landstrom, Joachim Causal inference and the event study method
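[Figure: timeline marked T0, T1, T2, T3]
$$
D_{i\tau} =
\begin{cases}
1 & \text{if } t = \tau \\
0 & \text{if } t \neq \tau
\end{cases}
$$
$\alpha_{post}$ are the average treatment effects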
Staggered rollout two-way dynamic fixed effect
Regression-based event study method
Examples
Kothari and Warner (2007) find that between 1974 and 2000, 565 event-study
papers were published in five journals.
The basic statistical format of event studies has not changed.
Two changes to the method:
Daily return data are used instead of monthly data
The methods to estimate abnormal returns have improved.
[Figure: timeline marked T0, T1, T2, T3]
Set the event window, the post-event window, and the estimation
window
We then (often) trace the share price reaction over a period that we call the
post-event window. It typically begins shortly after the event window.
A short window is typically less than a year.
A long window is a year or longer.
We often observe the share price behaviour of a period before the event
window. This is the estimation window, and may extend a year back in time. It
typically ends shortly before the event window begins.
The estimation window and the post-event window are sometimes combined
to build a broader estimation window.
Abnormal returns
Expected returns
The market model parameters are usually estimated by OLS of Rit on Rmt
over the estimation window.
The choice of market index can be important. Several studies report problems
with choosing a value-weighted index. An equally-weighted index seems
preferable.
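The estimation step can be sketched as follows. The sketch assumes the standard market model, R_it = α_i + β_i R_mt + ε_it, with the abnormal return defined as the actual return minus the fitted market-model return; the return series, the 5% shock and the function names are all hypothetical.

```python
def market_model(r_i, r_m):
    """OLS of firm returns on market returns over the estimation window."""
    n = len(r_i)
    mi, mm = sum(r_i) / n, sum(r_m) / n
    sxy = sum((a - mi) * (b - mm) for a, b in zip(r_i, r_m))
    sxx = sum((b - mm) ** 2 for b in r_m)
    beta = sxy / sxx
    alpha = mi - beta * mm
    return alpha, beta

def abnormal_returns(r_i, r_m, alpha, beta):
    """Actual return minus the market-model expected return."""
    return [ri - (alpha + beta * rm) for ri, rm in zip(r_i, r_m)]

# Hypothetical estimation window: the firm moves exactly with the market.
est_rm = [0.010, -0.020, 0.015, 0.000, -0.005, 0.020]
est_ri = [0.001 + 1.5 * rm for rm in est_rm]
alpha, beta = market_model(est_ri, est_rm)

# Hypothetical event window: a 5% jump on the second day.
evt_rm = [0.005, -0.010, 0.002]
shocks = [0.0, 0.05, 0.0]
evt_ri = [0.001 + 1.5 * rm + s for rm, s in zip(evt_rm, shocks)]
ar = abnormal_returns(evt_ri, evt_rm, alpha, beta)
car = sum(ar)                         # cumulative abnormal return over the event window
print(alpha, beta, ar, car)
```

Estimating α and β strictly before the event window keeps the event itself from contaminating the benchmark.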
Event studies and market efficiency
Steps in a typical capital-market based event study
Capital-market based event studies
Expected and abnormal returns
An econometric skeleton of a capital market-based event study
How to aggregate the abnormal returns
Standard T-tests of CAR
By removing the portion of the return that is related to variation in the market’s
return, the variance of the abnormal return is reduced.
This in turn can lead to increased ability to detect event effects.
The benefit from using the market model will depend upon the R 2 of the
market model regression. The higher the R 2 the greater is the variance
reduction of the abnormal return, and the larger is the gain.
CAPM
$$
\hat{R}_{i\tau} = r_{f\tau} + \hat{\beta}_i \times (R_{m\tau} - r_{f\tau})
$$
Generally, the gains from employing multifactor models for event studies are
limited (Campbell, Lo, MacKinlay, 1997, p. 156). The reason for the limited
gains is the empirical fact that the marginal explanatory power of factors
beyond the market factor is small, and hence, there is little reduction in
the variance of the abnormal return.
The variance reduction will typically be greatest in cases where the sample
firms have a common characteristic, for example they are all members of one
industry or they are all firms concentrated in one market capitalisation group.
In these cases the use of a multifactor model warrants consideration.
Sorts
Suppose that there are two factors that affect returns: Size and B/M. We do
not know whether there is a stable or linear relationship as the one specified
in the FF 3-factor model. What do we do?
1 Sort all returns in the sample into ten deciles according to size.
2 Conditional on size, sort returns into ten deciles according to B/M. (This gives us
100 portfolios.)
3 Compute the average return of the 100 portfolios for each period. This gives us
the expected returns of stocks given the characteristics.
4 For each stock in the event study:
   4.1 Find the size decile to which it belongs.
   4.2 Find the B/M decile to which it belongs.
   4.3 Compare the return of the stock to the corresponding portfolio return.
   4.4 The deviations are the abnormal returns.
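The steps above can be sketched for a single period. To keep the example small, eight hypothetical stocks are sorted into 2 × 2 groups instead of 10 × 10 deciles; the function names, the rank-based grouping rule and the data are assumptions made for illustration.

```python
def assign_groups(values, n_groups):
    """Map each index to a quantile group 0..n_groups-1 by rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [0] * len(values)
    for rank, i in enumerate(order):
        groups[i] = rank * n_groups // len(values)
    return groups

def double_sort(size, bm, n_groups):
    """Sort on size first, then on B/M within each size group."""
    size_grp = assign_groups(size, n_groups)
    cell = [None] * len(size)
    for g in range(n_groups):
        members = [i for i, s in enumerate(size_grp) if s == g]
        bm_grp = assign_groups([bm[i] for i in members], n_groups)
        for i, b in zip(members, bm_grp):
            cell[i] = (g, b)
    return cell

def benchmark_returns(cell, returns):
    """Average return of each (size, B/M) portfolio for one period."""
    totals = {}
    for c, r in zip(cell, returns):
        s, n = totals.get(c, (0.0, 0))
        totals[c] = (s + r, n + 1)
    return {c: s / n for c, (s, n) in totals.items()}

# Hypothetical one-period data for eight stocks.
size = [10, 20, 30, 40, 50, 60, 70, 80]
bm = [0.5, 1.5, 0.4, 1.2, 0.9, 0.3, 1.1, 0.2]
ret = [0.02, 0.04, 0.01, 0.03, 0.05, 0.01, 0.06, 0.02]
cell = double_sort(size, bm, n_groups=2)
bench = benchmark_returns(cell, ret)
abnormal = [r - bench[c] for r, c in zip(ret, cell)]   # deviation from own portfolio
print(cell)
print(abnormal)
```

With real data the benchmark portfolios would be formed from the full sample and the event firms compared against them period by period; calling `double_sort` with `n_groups=10` gives the 100 portfolios described in the list.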
Sorts
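$$
var(\overline{AR}_t) = \frac{1}{N^2} \times \sum_{i=1}^{N} \hat{\sigma}^2_{\epsilon_i}
= \frac{1}{N^2} \times \sum_{i=1}^{N} MSE_i,
\quad \text{where } N \text{ is the number of events}
$$
$$
CAR = \sum_{t=1}^{T} \overline{AR}_t
$$
$$
var(CAR) = \sum_{t=1}^{T} var(\overline{AR}_t) = L \times var(\overline{AR}_t),
\quad \text{where } L \text{ is the length of the event window}
$$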
T-tests on CAR
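A minimal sketch of the aggregation and test follows. It assumes the usual test statistic, CAR divided by the square root of var(CAR), and that each event's residual variance (MSE_i) comes from its own estimation-window regression; the numbers and the function name `car_t_test` are hypothetical.

```python
import math

def car_t_test(ar_by_event, sigma2_eps):
    """ar_by_event[i][t]: abnormal return of event i on event day t.
    sigma2_eps[i]: residual variance (MSE_i) from event i's estimation window."""
    n = len(ar_by_event)
    length = len(ar_by_event[0])                      # L, the event-window length
    # Cross-sectional average abnormal return for each event day.
    avg_ar = [sum(ev[t] for ev in ar_by_event) / n for t in range(length)]
    var_avg_ar = sum(sigma2_eps) / n ** 2             # (1/N^2) x sum of MSE_i
    car = sum(avg_ar)                                 # cumulated over the window
    var_car = length * var_avg_ar                     # L x var(AR_t)
    t_stat = car / math.sqrt(var_car)
    return car, var_car, t_stat

# Three hypothetical events observed over a three-day event window.
ar = [[0.01, 0.03, 0.00], [0.00, 0.02, 0.01], [0.02, 0.01, -0.01]]
mse = [0.0004, 0.0004, 0.0004]
car, var_car, t_stat = car_t_test(ar, mse)
print(car, var_car, t_stat)
```

The variance formula treats abnormal returns as independent across events and across days, which is a strong assumption when event windows overlap in calendar time.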
Joachim Landstrom
Recall: $\hat{\beta} = \beta + \underbrace{(X'X)^{-1}X'u}_{\text{sampling error}}$
rp,t = rf ,t + β1 (rm,t − rf ,t ) ⇒
rp,t − rf ,t = β0 + β1 (rm,t − rf ,t ) + ut , where β0 = Jensen’s α
CAPM is dead: From chaos to order – APT, the ‘new kid on the block’
Cochrane, J. H. (April 2011), Discount Rates. NBER Working Paper No. w16972.
Landstrom, Joachim Causal inference and portfolio sorts
Portfolio sorts: Introduction
Regression & portfolio sorts
The portfolio sort process
Portfolio sorts
Cross-sectional regression:
rit − rft = α + xit β + uit , where
x ∈ {rm − rf , size, bm, momentum, accruals, roe, . . . }
Portfolio sorts are difficult to apply with more than two factors, but
Cross-sectional regressions may suffer from omitted variables, so
A solution is, maybe, to allow for panel fixed effects, and
Perhaps we should start to apply a two-way fixed effect (TWFE) event study
setup in the future.