Ilovepdf Merged (2)

The document discusses the philosophy of science in business studies, covering key concepts such as ontology, epistemology, and methodology. It outlines four philosophical stances on science, including logical positivism, relativism, pragmatism, and realism, and explores the implications of these perspectives on scientific progress and paradigms. Additionally, it emphasizes the importance of credible quantitative research design and the challenges associated with studying cause-effect relationships.

Philosophy of science in Business Studies

Fredrik Tell
Uppsala University
Outline

• What is science?
• Philosophical considerations
– Ontology
– Epistemology
– Methodology
• Four stances on philosophy of science (Van de Ven)
• Scientific progress and the role of paradigms
WHAT IS SCIENCE?
Dreams of science…
Being scientific…
• The goal is inference
• Descriptive inference
• Causal inference
• The procedures are public
• Makes possible assessment of how knowledge claims
were generated
• The conclusions are uncertain
• Reaching perfectly certain conclusions is impossible
• The content is the method
• Being scientific means adhering to the rules of science,
rather than the topic investigated
(King, G., R. O. Keohane & S. Verba 1994. Designing
Social Inquiry, Princeton University Press)
SOME CONCEPTS TO HELP
US NAVIGATE
Metaphysics and Ontology
Metaphysics

• Fundamental nature of reality “being-as-such”, outside the realm of objective study, things that are eternal and do not change

(Immaterialism is sometimes called idealism)

On what there is… (Quine)

• Realism (abstract universals, e.g. Platonic forms) vs. Nominalism (only particulars/instances, e.g. Aristotelian critique)
Epistemology
What can we know?
• What is (certain) knowledge?
– Justified
– True
– Belief
• How can we be justified in believing something as true?
• The skeptical challenge
Responses to skepticism:
Rationalism vs. Empiricism

René Descartes (1596–1650)

John Locke (1632–1704)


Representationalism and
abstraction of ideas
• Cartesian dualism
– Cartesian split between mind and matter (extended in time and space).
– Representation of innate (real) ideas
– Knowledge = ideas rationally beyond doubt
– Knowledge of external world through God
• Lockean view
– Mediate (experienced through our senses) and immediate (intuitive ideas) perception
– Knowledge = agreement of ideas obtained through sensory experience
Idealist critique (Berkeley): how can one substance causally
affect another substance of a fundamentally different kind?
Explanation
Explanans and explanandum

"By the explanandum, we understand the sentence describing the phenomenon to be explained (not that phenomenon itself); by the explanans, the class of those sentences which are adduced to account for the phenomenon" (Hempel and Oppenheim, 1948: 152)

Explanations explain the particular (explananda) by the general (explanans).
Causality
David Hume (1711-1776): A skeptic…

But no sense data can prove causation…


Methodology and implications
Deduction and induction
Nomothetic vs. Idiographic methods

Nomothetic
• Purpose: Establishing general laws and empirical generalization
• Requires comparative studies (usually of large samples).
• Studying the general occurrence of something (like an event). Often specific features.
• ”Something” = phenomenon

Idiographic
• Purpose: Establishing understanding of the particular context in order to generate (broader) understanding.
• Only feasible with few studies, in order to retain depth in description. Can be comparative.
• Studying a case of something (like an event)
• ”Something” = phenomenon
Variance and Process approaches
Variance Theory
• Attributes of: Environment (x1), Technology (x2), Decision Process (x3), Resources (x4)
• Organization Outcomes (Y)
• Y = f(x1, x2, x3, x4)

Process Theory
• State A (T0) → events, activities, choices → State B (T1)

(Van de Ven, Andrew H. (2007), Engaged Scholarship: A Guide for Organizational and Social Research, Oxford: Oxford University Press, chapter 5)
Four stances on philosophy of science

• Logical positivism
• Relativism
• Pragmatism
• Realism

(Van de Ven, Andrew H. (2007), Engaged Scholarship: A Guide for Organizational and Social Research, Oxford: Oxford University Press, chapter 2)
Comparing four stances

(Van de Ven, 2007, p. 39)


Logical positivism
• Ludwig Wittgenstein (1): Logic in itself is empty
(tautological) but can say everything that can be said
meaningfully about the world (Russell: logical atomism)
• The Vienna Circle (e.g. Otto Neurath and Rudolf
Carnap): Linguistic physicalism is possible!
– Every statement is synonymous with (i.e. is
equivalent in meaning with) some physical
statement
– Therefore every meaningful statement can be
verified physically/empirically (not possible to
verify=not meaningful)
The hypothetico-deductive model

Members of the Vienna Circle


Logical positivism

• Ontology: Rejecting metaphysics, what exists is empirical (nominalism)
• Epistemology: Mainly an empiricist notion of how we gain
knowledge, albeit enabled by force of logic
• Knower: Inquirer able to observe without interfering with
empirical reality, mind and reality operate according to logic
• Language: Language can be formalized to operate logically.
As empirical reality is structured logically, propositions
about the state of the world can be formulated and
verified/rejected
• Methodology: Propositional and hypothetical through
hypothetico-logical structure for testing, Nomothetic,
variance
On denoting… (or: “word & object”)

Meaning and reference


What animal is this?
Relativism
• Ontology: Subjective idealism, reality is socially constructed
• Epistemology: Covering laws and causality are not attainable,
as such explanations do not capture meanings that humans
(in interaction) ascribe to the world they encounter.
Knowledge is not cumulative.
• Knower: Constructivist, access to knowledge is mediated
through socio-linguistic processes (world-view), and so is the
ability to communicate knowledge
• Language: Language is self-referential (like rules in a
language-game), it presupposes values and interests and
cannot separate between sense and reference
• Methodology: To reveal meaning/understanding how people
make sense and construct/apply rules that are
followed/broken, Idiographic, Process
Pragmatism
Abduction/retroduction
Pragmatism
• Ontology: Uninterested in metaphysics, but rather practical
consequences. Favors ideas (and their generation) rather
than ”nominal” features of reality, hence ”realist” with ”r”
• Epistemology: Both meaning and explanation are defined
in terms of their practical consequences, more interested in
context of discovery than context of justification
• Knower: Consequentialist. The knower has a priori
cognitive frameworks which affect perception of the world
• Language: Language serves as a tool for the inquirer to
organize and assess consequences in terms of actionability
• Methodology: To generate frameworks to understand
reality and investigate actionable consequences.
Depending on stance Idiographic/Nomothetic, Process
For real?
A positivist critic: Karl Popper
(1902-1994)
Karl Popper’s rational inquiry
Objective knowledge: Two main components
• Conjectures and refutations: Falsificationism
– Rational pursuit of the truth (truth non-verifiable)
• Three world realism (interaction of three worlds)
World 1: Consisting of physical instances and
processes
World 2: Our mental states and processes of ”knowers”
World 3: A realm of the products of the interactions
between World 1 and World 2 considered per se. Only
accessible for humans (e.g. theories, myths, art).
An evolutionary argument – Open (rational) society as
one implication
Popper’s three world theory

Popper, K. R., Objective Knowledge. Oxford University Press, Oxford, 1972.
Critical realism

E.g. Bhaskar, R. (1975/2008). A realist theory of science, Routledge
Realism
• Ontology: Realist and no rejection of metaphysics
• Epistemology: Rationalistic grounding in scientific
inquiry, but empirical facts can reject what is rationally
conceived. Lack of clear criteria for justification.
• Knower: Any scientific inquiry is guided by cognitive
frameworks and perspectives (including not-yet-falsified
assertions about the world) that function as a
”searchlight” for the knower.
• Language: Language has an ability to describe (at least
partially) underlying (real) mechanisms and structures.
• Methodologies: Idiographic/nomothetic, cross-sectional
Comparing four stances

(Van de Ven, 2007, p. 39)


The process of scientific progress
Thomas Kuhn
on scientific revolutions
• Essentially two qualitatively different kinds of science
– Normal
– Revolutionary
• Fine-grained stages in the model
– Pre-paradigmatic
– Normal science
– Crisis
– Revolution
– Back to normal
Normal science and paradigms

• Scientists are not always explicitly in agreement on the foundations of the paradigm
– Foundations (“certainty”) may be unclear or tacit in
relation to the scientist
• Family resemblance and rule-following
– Seeing something as something
• Meaning of paradigm
– Complex belief system
– Exemplars and concrete methods
• Paradigms and scientific communities
Learning a paradigm through
family resemblance

(Figure: family resemblance among Stool, Recliner, Seat, Chair, Camp chair, Throne, Rocker)
Ludwig
Wittgenstein
Wittgenstein (2): language games
Aspects of paradigms

• Shared symbolic generalizations


• Models
• Values
• Metaphysical principles
• Exemplars or concrete problem situations

Newton-Smith, W. H. (1981), The Rationality of Science, Routledge
Normal science

• Paradigm as example and promise


• Specialization of science
• Pre-paradigmatic stages
• Purpose of normal science
– Normal problems and rules how to solve them
• Problem solving and jig-saw puzzles – paradigm
articulation
– Collection of data
– Theoretical work
Crises

• Anomalies
• Scientific discoveries (or inventions?)
• Holism and the problem of inventor identification
• Barriers and resistance to change
• Uncertainty and paradigms
• Evaluation criteria
• The substitution of paradigms
• The role of thought experiments
Analogies
Almagest, Claudius Ptolemy (c. AD 90 – c. AD 168)
On the Revolutions of the Celestial Spheres
Nicolaus Copernicus (19 February 1473 – 24 May 1543)
Do paradigms matter in Business Studies?

Morgan, Gareth (1980), Paradigms, Metaphors, and Puzzle Solving in Organization Theory, Administrative Science Quarterly, 25(4): 605-622. https://doi.org/10.2307/2392283
Paradigms and metaphors of organizations
Designing and managing quantitative research

Joachim Landstrom

Department of Business Studies, Uppsala University

November 12, 2024


Introduction
Research design issues when studying cause-effect Introduction
Designing a quantitative study Research questions
Managing a quantitative study

What is quantitative research?


A systematic investigation,
primarily using statistical techniques,
which is often (always?) aimed at being generalizable to a larger population.
RQs are typically either descriptive, correlational (A → B, or A ← B), or
cause-effect (most common) (A → B)
Mostly follows a “linear” progress: RQ → Model → Method → Results (not
very iterative)


Examples of descriptive RQs

Jiang, J. (Xuefeng), Wang, I. Y., & Wangerin, D. D. (2018). How does the FASB
make decisions?
“This study examines how the Financial Accounting Standards Board (FASB) sets
Generally Accepted Accounting Principles (GAAP) over the past 40 years.”

Ingram, R. W. (1985). A Descriptive Analysis of Municipal Bond Price Data for Use
in Accounting Research
“In this paper I describe municipal bond data that have not been used in prior
research but that appear to perform in a similar fashion to corporate security data”


Examples of correlational RQs

Larcker, D. F., Richardson, S. A., & Tuna, I. (2007). Corporate Governance, Accounting Outcomes, and Organizational Performance
“The purpose of this paper is to provide an exploratory inquiry into the dimensions
of corporate governance and. . . ”

Hong, Y., & Andersen, M. L. (2011). The Relationship Between Corporate Social
Responsibility and Earnings Management: An Exploratory Study
“In this article, we examine the communication process by investigating the
potential relationship between corporate social responsibility (CSR) and the quality
of their financial reporting.”


Examples of cause-effect RQs

Pinto, J. (2023). Mandatory disclosure and learning from external market
participants: Evidence from the JOBS act
“This paper examines whether mandatory disclosure affects how much firms learn
from external market participants, that is, whether there is a market feedback
effect.”

Gunn, R., Pierce, S., & Romney, M. (2023). How Do Investors Respond to Targets’ Interim Earnings?
“In this study, we seek to understand how investors in the target and acquirer
respond to these interim earnings announcements.”


Issues & solutions


Only theory can inform of y = f (x)
Regression will fit both y = f (x) and x = f (y)
Endogeneity: x is correlated with the errors
Confounding variables: Maybe true function is y = f (x, z)
How to proceed?
Out-of-sample predictions
Experiments: Treatment vs control
Randomized controlled trials (a.k.a. RCT)
Quasi-experiments
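
A minimal sketch of the issues listed above, assuming a made-up data-generating process in which a confounder z drives both x and y while x has no causal effect on y. The helper names (ols_slope, residuals) and all numbers are hypothetical and only serve the illustration. Regression happily fits both directions, and the apparent effect disappears once z is held constant.

```python
import random


def ols_slope(x, y):
    """Slope from an ordinary least squares regression of y on x (with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    return cov_xy / var_x


def residuals(x, y):
    """Residuals of y after regressing it on x (with intercept)."""
    slope = ols_slope(x, y)
    intercept = sum(y) / len(y) - slope * sum(x) / len(x)
    return [b - intercept - slope * a for a, b in zip(x, y)]


random.seed(1)  # fixed seed so the sketch is reproducible

# Assumed (hypothetical) process: z causes x and y; x does NOT cause y.
z = [random.gauss(0, 1) for _ in range(5000)]
x = [zi + random.gauss(0, 1) for zi in z]
y = [2 * zi + random.gauss(0, 1) for zi in z]

# Regression fits both directions and returns a non-zero slope each time.
slope_y_on_x = ols_slope(x, y)  # "y = f(x)"
slope_x_on_y = ols_slope(y, x)  # "x = f(y)"

# Holding z constant: regress the part of y not explained by z
# on the part of x not explained by z.
slope_controlling_for_z = ols_slope(residuals(z, x), residuals(z, y))

print(f"y on x: {slope_y_on_x:.2f}")
print(f"x on y: {slope_x_on_y:.2f}")
print(f"y on x, controlling for z: {slope_controlling_for_z:.2f}")
```

Nothing in the fitted slopes says which direction is causal or whether z was omitted. That knowledge has to come from theory or from the research design.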


Experiments: Treatment vs control


Randomized controlled trials: Participants randomly assigned to treatment
and control group
Can be difficult to implement in business studies, and the environment is not
controllable
Quasi-experiments: Participants non-randomly assigned to treatment and
control group
Data exists pre- and post-interventions for participants
Examples of methods: Regression discontinuity, difference-in-differences,
synthetic controls, event studies, fixed effects regressions, instrumental
variables, matching methods, interrupted time-series
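
As an illustration of one of the listed methods, the arithmetic behind a basic difference-in-differences estimate can be sketched as follows. The group means are invented for the example and the function name diff_in_diff is hypothetical. The estimate is only credible if the two groups would have followed parallel trends without the intervention.

```python
# Hypothetical mean outcomes for each group before and after an intervention.
group_means = {
    ("treated", "pre"): 10.0,
    ("treated", "post"): 14.0,
    ("control", "pre"): 9.0,
    ("control", "post"): 11.0,
}


def diff_in_diff(means):
    """Change in the treated group minus change in the control group."""
    change_treated = means[("treated", "post")] - means[("treated", "pre")]
    change_control = means[("control", "post")] - means[("control", "pre")]
    return change_treated - change_control


did_estimate = diff_in_diff(group_means)
print(did_estimate)  # (14 - 10) - (11 - 9) = 2.0
```

The control group's change stands in for what would have happened to the treated group anyway, which is why pre- and post-intervention data are needed for both groups.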

Credible (quantitative) research

What’s really, really, important in quant-research? It is credibility


External validity: Can the result be generalized to a broader population?
Internal validity: Is the research design appropriate to answer the RQ?
Reliability: Consistency of measurement


External validity
“The failure of cold fusion”: Research is only preliminary until replication is
successful
Replication crisis, as evidenced in business studies
Economics: 61 percent complete replication (Camerer et al, 2016).
The Strategic Management Journal: 80 percent failed to replicate (Bergh et al,
2017).
Financial Accounting & Auditing: 60 percent complete replication, 29 percent
partial replication (Salterio et al, 2022).
Financial Economics: 82 percent complete replication (Jensen et al, 2023).
In-sample versus out-of-sample.


Internal validity
“Fit” between RQ/theory/hypotheses
“Fit” between theory/hypotheses/model(s)
Research is seldom, truly, innovative: It is incremental
“Standing on the shoulders of giants” is a nice maxim.
Thus: Makes sense to, often, stay close to a lead-article


Reliability
Replicability
Method chapter is key
Clearly stated sample extraction/data management
Clearly stated model(s)
Clearly stated variable measurements
“Standing on the shoulders of giants” is a nice maxim.
Makes sense to, often, stay close to a lead-article


Theory
Not “high” theory,
and not consultancy reports either.
Published, empirical, research, in relevant area.
Current research (not stone-age old research).
Often generates hypotheses. But not necessarily.


Method & data management


1 Transparency
2 Transparency
3 Transparency
avoid e.g., Excel/Spreadsheets,
and avoid non-script based programs
Record what you do,
and have traceability in your coding,
and report what you do.
Remember: Replicability is key


Data sources
Primary data
Unique
Time consuming
Difficult to get access
Not possible to replicate
Thus, rarely used in quant-studies
Secondary data
Non-unique: Comes from an external database
Relatively easy to get lots of data
Subscription based: Refinitiv Eikon, Retriever Business, SHoF, Retriever
Research, Factiva
Possible to replicate
Thus, almost always used for quant-studies


Establish the project’s file structure


data directory
raw data
temporary data
final data
source directory (where you place scripts)
document directory
Don’t write only in cloud.
plot directory
table directory
README files with metadata in each directory
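
One possible way to script the structure above, so that every project starts from the same layout. The directory names and the function name create_project are assumptions made for this sketch, since the slide names only the roles of the directories. The demonstration runs in a temporary directory, so nothing is written to your own files.

```python
import tempfile
from pathlib import Path

# Hypothetical directory names for the roles listed above.
PROJECT_DIRS = [
    "data",
    "data/raw",
    "data/temp",
    "data/final",
    "src",      # scripts
    "doc",      # documents
    "plots",
    "tables",
]


def create_project(root):
    """Create the directory tree under root with a README stub in each directory."""
    root = Path(root)
    for rel in PROJECT_DIRS:
        directory = root / rel
        directory.mkdir(parents=True, exist_ok=True)
        readme = directory / "README.md"
        if not readme.exists():  # never overwrite notes that already exist
            readme.write_text(
                f"# {rel}\n\nDescribe contents, source and date here.\n",
                encoding="utf-8",
            )
    # Return the README paths relative to the project root.
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("README.md"))


# Demonstration in a throwaway directory that is deleted afterwards.
with tempfile.TemporaryDirectory() as tmp:
    readmes = create_project(Path(tmp) / "my_study")

print(readmes)
```

Keeping raw data in its own directory and generating everything else from scripts supports the traceability asked for earlier in the lecture.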


The actual work


Establish writing rules (e.g., present tense, active form, how to reference)
Use Zotero for referencing and use a joint Zotero directory/file/account.
Share the project directory,
but work on separate files,
and only merge when done.


References
Bergh, D. D., Sharp, B. M., Aguinis, H., & Li, M. (2017). Is there a credibility crisis in strategic management research? Evidence on the reproducibility of study findings. Strategic Organization, 15(3), 423–436. https://doi.org/10.1177/1476127017701076
Camerer, C. F., Dreber, A., Forsell, E., Ho, T.-H., Huber, J., Johannesson, M., Kirchler, M., Almenberg, J., Altmejd, A., Chan, T., Heikensten, E., Holzmeister, F., Imai, T., Isaksson, S., Nave, G., Pfeiffer, T., Razen, M., & Wu, H. (2016). Evaluating replicability of laboratory experiments in economics. Science (American Association for the Advancement of Science), 351(6280), 1433–1436. https://doi.org/10.1126/science.aaf0918
Jensen, T. I., Kelly, B., & Pedersen, L. H. (2023). Is There a Replication Crisis in Finance? The Journal of Finance (New York), 78(5), 2465–2518. https://doi.org/10.1111/jofi.13249
Salterio, S. E., Luo, Y., & Adamson, C. (2022). Replication of Audit and Financial Accounting Research: We do a lot more than we think. http://dx.doi.org/10.2139/ssrn.4210603
Scientific Methods in Business Studies, ht-24

Lecture 3

Collecting qualitative data

(Data collection, interviews and textual data)

Anna Bengtson
Today’s agenda
1. Qualitative studies – some recaps
2. Reasons for qualitative studies
3. Case studies
4. Finding data
5. Data sources
   1. Interviews
   2. Observations
   3. Secondary data
   4. Etc.
Motives for qualitative studies

• Study the dynamics of a process

• Study meaning; what happens and how do people understand and deal with it

• “Thick descriptions”; details, context, understandings


What are you trying to understand?
• Emotions, feelings, attitudes?

• Cultures, categories, identities, values?

• Non-formalized phenomena (e.g. informal leadership)

• Processes, change, dynamics?

• Settings that you are not allowed into?


Qualitative research methods

• Important to have a clear strategy, but also be open to modifications throughout the research process

• Should not be “everything that is not quantitative”/an excuse for sloppy research

• The reader needs to be convinced about the rigor of your research
Choice of data gathering techniques
• What is your overall purpose with the study?

• Preliminary literature review/basic knowledge


• What is known?
• What type of data is available in literature?
• What type of data is relevant / interesting to add?

• RQ
Plenty of choices!
• Design
• Experiment (quasi-experiment)
• Case
• Cross-sectional
• Longitudinal
• Comparative studies

• Techniques for gathering data


• Observations
• Interviews
• Surveys
• Documents / text
• Focus groups

• Coding and systematic data analysis


• Grounded theory
• Content analysis
• Discourse analysis
What is a case? (I)
The word “case” comes from the Latin word casus, meaning
event (“to happen”) or chance – but also accident.

- a description of a real situation (or a situation that could be real)
What is a teaching case?
• Background in teaching at universities and university colleges

• Pedagogical tool to teach law at Harvard Law School from the 1870s

• Used at Harvard Business School from the 1920s

• In law and medicine there are “already existing cases” – law cases
and patient cases respectively, while other subjects have to construct
their teaching cases
When to use research cases?
• When asking “how” and “why” questions

• When the investigator has little or no control over events

• To focus on a phenomenon in its real-life context and create context-dependent knowledge

• What is this a case of? What is the unit of analysis? What is it intended to illustrate?
• For theory development, not statistical generalization

• “… especially appropriate in new topic areas.” (Eisenhardt, 1989)
Many different types of cases

• Unique case (interesting because it cannot easily be explained by previous research)
• Typical case (to figure out what is at stake…)
• Extreme case (like fruit flies…)
• Comparative cases (look at differences and
similarities between cases in predefined
dimensions)
The purpose of cases according to Dawson (1997)

“not on working the data to strengthen the generalizability of the findings but rather to provide narrative accounts of the continuously developing and complex dynamic of people in organizations”
The purpose of cases according to Yin (1994)

…to investigate “a contemporary phenomenon within its real-life context, especially when the boundaries between phenomenon and context are not clearly evident”
A collective term for many approaches…

Positivism
- hypotheses
- several cases
- generalisability
- reliability, validity

Social constructivism
- capture complexity
- one multifaceted case
- convincing
- the text is central, trustworthiness
What is this a case of ?

“The locus of study is not the object of the study… anthropologists don’t study villages, they study in villages.”
Geertz, 1973
The object of study – the focal unit
Given the aim and purpose of the study, which focal
unit should the case revolve around?
• A company
• A location
• A group
• An individual
• A relationship
• A network
• An event
• A change process
• Etc. …
Process (case) studies
• To study changes over time
• Interaction between action and context (Pettigrew, 1987)
• Different kinds of process studies (Van de Ven & Poole,
1995)
• Division into phases? (Langley, 1999)
Data
Finding data for qualitative studies
• People – interview or observe
• Documents
• Internet
• Archives
• Videotapes

• Is there already available data that can be used?


• One type of data or a combination?
What would you prefer?

• 25 CEO statements (1 page each) from the annual reports (1995-2020), from one company

• 3 informal conversations with managers in 2 companies, 5 policy statements, 20 corp. web pages, and 15 newspaper articles.
Data – questions to ask…
• Match with RQ?

• Availability and access?

• Extendable or not?

• In both cases, making a structured analysis is required!

• Make a plan for collecting, analyzing and presenting data


Interviews

Some practical issues…


Time plan
Interviews take time:

- access… may take months


- researching… you need to be thorough
- conducting… travel?
- transcribing… 1 hour = 1 day
- analyzing… lots of work!
Your “contract” with the informant
• “Informed consent”

• Agree on the purpose of the interview

• Agreement of length?

• Agreement on how the material will be used (stored, shared, reported)

• Finish with agreement on possible further contact, other persons to interview (snowballing)
Come prepared!

• Get obvious information from other sources beforehand (web pages, annual reports, information material…)

• Learn about the issue but also the context and/or person you are
going to meet.

• Prepare the guide based on type of interview:


• Structured interview: questions to stick to
• Semi-structured interview: topics/areas plus questions/examples
• In-depth interviews: topics to cover

• Test the interview guide beforehand


Creating a dialog
• A conversation where the other person is the “expert” and talks the
most…
• Show interest:
• ”This makes me curious”
• “Explain more…”
• Ask for examples
• Stick to specifics – not general world views (if that is not the
purpose…)
• Stay on track!
• Silence: allow pauses for the interviewees to associate and
reflect, and break the silence themselves with significant
information.
Different types of questions
• Introducing questions (warm up, kick-start)
• Follow-up questions (more depth, develop, contextualize)
• Probing / penetrating questions (draw out more complete
narratives with details, examples)
• Specifying questions (make general statements more precise)
• Direct questions and indirect questions
• Structuring questions (finish off one part and open up another,
breaking off long irrelevant answers)
• “I would like to introduce another topic…”
• Interpreting questions (summarizing what you heard to check that you
understood it correctly)
Problems
• Sampling errors
• Is “convenience sample” too convenient?
• “Snowballing” may steer you in an unintended direction
• Biases
• Response bias (social desirability)
• Recall bias
• Interviewer bias (unclear questions)
• Recording errors
Other methods for collecting data
Ethnography
• Ethnography is a method of observing human interaction in
social settings and activities (i.e. in their cultural context)
• You learn from people from the inside (rather than study
people from the outside)
• Allows direct access to behavior and actions that may
otherwise be hidden
• Results in thick descriptions (cause and effect may be difficult
to establish)
• Reactivity of those observed?
• Takes time! Data overload?
Taking field notes
• Careful notes are essential
• Describe what you see and hear
• As well as the context in which it happens
• Provide descriptions without inferring meaning
• Separate your “analysis”/interpretations of events from the
observations
• Be careful when inferring causality and meaning, and beware of
observer bias!
Focus groups
• A research method organized to collect qualitative data
through interaction and directed discussions.
• Members need to be selected carefully.
• One moderator poses topics and makes sure that no one
individual dominates the group.
• One observer takes notes on non-verbal aspects.
• Neglected or unnoticed phenomena can be brought to the
surface.
• Influence of moderator / biased questions is a risk.
“Secondary sources”
• Plenty of data “out there” waiting for you…
• Databases
• Annual reports
• Newspaper articles
• Archives
• Websites

• Good to try to triangulate (use of a variety of data sources) to strengthen the reliability of the results
Archival data
• A structured source of information that exists independent
of the researcher (one type of secondary data)

• Not an objective mirroring of reality

• Fragments and lost pieces

• Pay attention to time (when written) and purpose (why written) of a certain document
Limit data!

• Use a variety of data sources to learn about the topic and what may be interesting to research further

• But do not try to do everything and do not lose sight of your purpose due to data overload…
Thank you for listening!
Questions, comments?
Coding – an example
Linda Wedlin
Finding the line
• Connecting ”claims” to ”evidence”
• Linking empirical findings to theoretical concepts/arguments
• Finding your way from ”raw data” to ”making a statement”
• Creating a trail….
Data: What you can read/hear/see/feel/touch/smell

Interpretation: How to make sense of what you read/hear/see/feel/touch…

Statement/Claim/Argument
How to make sense of data?
Coding
• Code = Word or phrase
representing the essence or key
attributes of a narrative, a
sentence, a piece of text…
• Used to categorize data
• Organizing into ”chunks” that are
alike or similar
Why do we code?
• Don’t always!
• A way of working
• Beginning to analyze
• Document analytical process
• Retrieving data – finding your way back

• Sorting – reducing – arguing


Various ways to do it…
• Descriptive coding
• Grounded Theory
• Thematic Coding
• In-vivo coding
• Various techniques!
• Gioia Method

• Inductive! (but coding can also be deductive…)


Induction

Theory

Tentative hypothesis

Pattern

Observations
Grounded theory – or “emergent” coding

• Glaser and Strauss (1967)


• Generate theory from data
• Do not ignore literature, but do not superimpose it!
• Not only inductive but also deductive
• "What’s going on?" and "What is the main problem of the
participants and how are they trying to solve it?"
Grounded theory

• Constant comparison!
• Between data sources
• Between theory and data
• Between concepts
• Good fit between concepts and the data
• Constant Comparison Stop Motion Demo - YouTube
Core elements of coding
1. Get to know your data
2. Mark the text
3. Code
4. Relate to theory
1. Get to know your data!
• Read!
• Make notes of major themes,
unusual issues, events, things
that surprise you, recurrent
events/phrases/themes
• Group cases into types of
categories
2. Mark the text
• Mark sentences, words, phrases,
paragraphs, or longer pieces of
text
• Mark the text using short
descriptions, key words etc.
3. Code
• Develop a coding scheme
• Mark the text systematically
• Review the codes
• Think of groupings
• Drop, merge or split codes
4. Relate to theoretical ideas
• Add interpretation!
• Relate codes to research question
• Seek interconnections between codes
An example
RQ: How does scientific innovation happen?
Tell me about your BEC research. When did you get involved in this line of research, and why?

Interview A: ”I got a big grant, and then I could start this experiment…” […] ”I could hire three PhD students to work on the project, so I did not have to
do all the work myself” […] ”We worked for four years without any results” […]

Interview B: ”I accepted a professorship here and I could start to work properly”. […] ”I worked with a group of other physicists here, including PhD
students” […] ”This kind of experiment was getting a lot of attention because of the Nobel Prize in laser physics in XX.” […] ”It takes many, many years to
get to that point in this field”

Interview C: ”We had very little money so we had to buy smaller XX” […] ”It was believed to be a very ”hot” topic internationally” […] ”Our boss
supported us, despite the fact that the project ran over time”

(These quotes have been altered from the original for the purpose of this example!)
Coding
• How does scientific innovation happen?

Big grants: ”I got a big grant, and then I could…”


Expensive equipment: ”We had very little money so we had to buy smaller XX”
Belief in our work. ”Our boss supported us, despite the fact that the project ran over time”
International scientific community: ”It was believed to be a very ”hot” topic internationally”
Long time perspective. ”It takes many, many years to get to that point in this field”
Support from colleagues:
Nobel prize/Attention
Running late
Doctoral students
Fixed position
Professorship
Deliver results quickly
Important publication
Sorting…

Running late
Doctoral students
Long time perspective

Deliver results quickly


Big grants

Expensive equipment Fixed position

Important publication
International scientific community
Nobel prize
Support from colleagues
Professorship
Belief in our work
Creating categories/themes
TIME
RESOURCES Running late

Doctoral students Long time perspective


Deliver results quickly
Big grants
Fixed position
Expensive equipment

Professorship

SUPPORT

Support from colleagues

Belief in our work

Important publication

International scientific community


Nobel prize
Merging or splitting categories?
• Resources
• Financial resources (”big grants”; ”money to buy equipment”…)
• Work capacity/human resources (”professorship”, ”PhD students”…)

• Support?
• Local support (department, colleagues, ”boss”…)
• International community

• Or Support vs. Attention?


Time

Resources Support

Protective
Space
How does scientific innovation happen?
• Theorize about the role of ”protective space” for scientists, and some
of the elements of such a space. We claim that scientists need to feel
”protected” or ”safe”, in order to engage in new or innovative
scientific fields/practices. This includes having a secure work
situation, (allowing both sufficient time to do research and access to
labs, equipment, PhD students etc.), feel support in their local
environment, and have the ”freedom to fail” and enough time to try
and retry.
Data: Quotes from interviews

Interpretation: Codes, Categories/Themes, Theory

Claims: Scientists need to feel ”protected”
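The trail from data to claim can be sketched in a few lines of Python. This is only an illustration, assuming that codes are stored as plain strings and that the theme assignment follows the categories slide above; the function name `build_trail` and the data layout are hypothetical, not part of any coding software.

```python
from collections import defaultdict

# Coded segments: (interview, quote, code). Quotes and codes are taken
# from the example above; the tuple layout is an assumption for this sketch.
segments = [
    ("A", "I got a big grant, and then I could...", "Big grants"),
    ("C", "We had very little money so we had to buy smaller XX", "Expensive equipment"),
    ("C", "Our boss supported us, despite the fact that the project ran over time", "Belief in our work"),
    ("C", "It was believed to be a very 'hot' topic internationally", "International scientific community"),
    ("B", "It takes many, many years to get to that point in this field", "Long time perspective"),
]

# Code -> theme, following the categories/themes slide.
themes = {
    "Big grants": "RESOURCES",
    "Expensive equipment": "RESOURCES",
    "Belief in our work": "SUPPORT",
    "International scientific community": "SUPPORT",
    "Long time perspective": "TIME",
}


def build_trail(segments, themes):
    """Group quotes by theme and code so every claim can be traced back to data."""
    trail = defaultdict(lambda: defaultdict(list))
    for interview, quote, code in segments:
        trail[themes[code]][code].append((interview, quote))
    return trail


trail = build_trail(segments, themes)
for theme in sorted(trail):
    for code, quotes in trail[theme].items():
        for interview, quote in quotes:
            print(f"{theme} <- {code} <- Interview {interview}: {quote}")
```

Keeping the interview label next to each quote is what makes it possible to "find your way back" from a theme to the raw data.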


Assessing your coding
• Does this make sense?
• Is it logical and coherent?
• Relatively easy to understand?
• Is it comprehensive? – Can it host all of your data?
• Is it useful? – Does it explain what you want to understand?
Using theory and theorizing from data
• Add interpretation!
• Relate codes to research question
• Relate to existing theory
• Theorize – can we use this to make sense ”at a higher level”?

• Read!
• Code!
• Analyze!
Coding is…
• …a way of working.
• …a step in the analytical process, but not everything.
• …not necessary.
Analyzing,
structuring and
presenting
Qualitative work

Linda Wedlin
Agenda

• Analytical strategies, some (more) examples


• On arguments
Analytical strategies
Two examples
Different strategies

• Preconceived theories or concepts


• OR
• Be more open-minded

• Aim for description of things you can observe


• OR
• Aim for conceptualisation of underlying patterns,
hidden meanings or other
Some commonly used methods

1. Grounded theory
2. Content Analysis
3. Discourse analysis
Deduction

Theory

Hypotheses

Observations

Confirmation
2. Content Analysis

• Suitable for large amounts of textual information


• Who says what, to whom, why, to what extent and with
what effect?
• Broad umbrella term:
• Word counts, spaces, important themes
• Recurring themes

• Often quantitative presentation


Content analysis - example

• Grafström, M. and Windell, K. (2011). The role of infomediaries: CSR in the business press during 2000–2009. Journal of Business Ethics, 1-17.
• Aim: understanding the role of business media in setting the corporate CSR agenda by exploring how CSR is presented in two English-language business newspapers with an international readership, Financial Times and The Guardian, between 2000 and 2009.
• Total: 274 articles; 152 (FT), 122 (Guardian).
• Sample months: May & November
Coding schema:
1. Date
2. Source (Financial Times or The Guardian)
3. Type of article (News article, News item, Letter, or Commentary)
4. Dominant theme (General, Corporate Citizenship, Charity and Donations, Regulation, or Marketing)
5. Dominant argument (opportunity or threat? Other?)
6. Dominant actor (investor, NGO, corporation, consultant, journalist, etc.?)
7. Specific dominant actor (if mentioned by name, open category)
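A coding schema like this maps naturally onto one record per article, which can then be tabulated for the quantitative presentation. The sketch below is a hedged illustration only: the class name `CodedArticle`, the helper `tabulate`, and all four rows are invented for the example and are not data from the study.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class CodedArticle:
    """One row in the coding database, one field per schema item."""
    date: str
    source: str
    article_type: str
    dominant_theme: str
    dominant_argument: str
    dominant_actor: str
    specific_actor: str = ""  # open category, filled only if named


# Hypothetical rows, invented purely for illustration.
articles = [
    CodedArticle("2003-05-12", "Financial Times", "News article", "Regulation", "threat", "corporation"),
    CodedArticle("2003-11-02", "The Guardian", "Commentary", "Charity and Donations", "opportunity", "NGO"),
    CodedArticle("2007-05-20", "Financial Times", "Letter", "Marketing", "opportunity", "corporation"),
    CodedArticle("2009-11-15", "The Guardian", "News item", "Regulation", "threat", "journalist"),
]


def tabulate(articles, field):
    """Count how often each value of a schema field occurs."""
    return Counter(getattr(article, field) for article in articles)


print(tabulate(articles, "source"))
print(tabulate(articles, "dominant_argument"))
```

Fixing the allowed values per field before coding starts is what keeps the counts comparable across coders and over time.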


Coding database
What is the role of media in
constructing and popularizing CSR?

• The role of infomediaries


• Linking CSR to specific
• corporate activities (HRM and corporate regulation)
• arguments (a business opportunity)
• Spokespersons (corporate actors)
• Contributes to the Construction of CSR in corporate
practice
• A ”business case” for CSR
Looking for meaning: Discourse analysis
• “Cultures of languages”; language as a form of social practice
• Often RQ of symbolic nature; looking for meanings
• Broad umbrella term:
• Rhetoric
• Text/narrative analysis
• Conversation Analysis
3. Discourse analysis

• Deep method!
• textual raw data
• Critical discourse analysis - focuses on the ways in which
social and political domination are said to be visible in
text and talk
• Genres, styles, narratives, vocabulary?
Example: Scania CEO statement
Scania 2010 Sustainability report

Our very large investments in employee training
during 2009 are aimed at enabling us to take quick
and efficient advantage of the next economic
upturn. Our vision to produce 150,000 vehicles per
year at the next economic peak during the second
half of the 2010 decade remains unchanged.
Because of our training programme, we can achieve
this with an unchanged production workforce.
• Genre? CEO statement
• Vocabulary?
• Taking advantage of, produce, achieve = strong actor
• investment, efficient, production = market discourse
• Employee training – “caring employer”

• Narrative? Large investments -09 + production -10 => vision -10 unchanged
Recap analysis
Analysis

• Process of breaking a complex topic or substance into smaller parts to gain a better understanding of it
• Practical examples: Market analysis, risk analysis, price
analysis
• For testing, modifying or generating theory
A few words…

• Keep an open mind – avoid preconception!


• Interpret data BUT be true to participants
• Focus on what is in the data, not what you think is
or should be in the data!
• Maintain chain of evidence!
Some pitfalls

• ”Going native”: using the meanings and explanations of practice as analytical concepts
• Local explanations are made universal
• The particular situation is seen as unique or as an exception
• No distinction is made in the analysis between
• Action and opinion/meaning
• Activities and actions; inferred causality
• Primary and secondary information
Problems?

• Chaos!
• Representation
• Authority
1. Sorting

• Create order from chaos


• Sort out important/relevant aspects
• Finding patterns
2. Reducing

• Finding categories, themes, issues


• Links and relations
• Level of abstraction
见微知著
“seeing the small to know the big”
3. Arguing

• Convincing
• Representing
• Linking claims to evidence
Data: What you can read/hear/see/feel/touch/smell

Interpretation: How to make sense


of what you read/hear/see/feel/touch…

Statement/Claim/Argument
On arguments
Constructing an argument

• Toulmin's (2003) model for argumentation


• Claim
• Reasons
• Evidence

• I claim that… because of reasons… which I base on this evidence…
• Claim:
• One of my students, Nils, is very good.

• Evidence:
• He continuously writes interesting and well-argued reports,
which are given the highest grades.

• Reason:
• Implicit: (Because good grades are assumed to be a
marker for a good student)
• Emotions play a larger role in rational decision-
making than most of us think, because without the
help of the emotional centers of the brain, we
cannot make rational decisions. Persons whose
brains have suffered physical damage to their
emotional centres cannot make even simple,
everyday decisions.

• Example from Booth et al. 2003 (p. 142-3)


• deduction
Providing good enough reasons

• Reasons interact with evidence


• Outlines the ”logic of the argument”
• Explicates the analytical steps needed in order to link
specific evidence to your particular claim

• Context and time dependent


• Often theory laden
What counts as evidence?

• Your own research; analyses, experiments, interviews, etc.
• Authoritative statements and ”accepted facts”
• Other people’s investigations/research
• A combination of the above!
Good evidence?

• Accurate
• Precise
• Sufficient
• Representative
• Authoritative
Your text should provide an
argument
• Start where your readers ”are”, what they know
and don’t know, what they question and what
they don’t
• Formulate a key claim – make a statement
• Argue for your claim!
• You have to convince them – not just ask them to
believe you
• You have to show AND tell
Some final notes
The importance of analysis

• Cannot be stressed enough!


• Good data is never enough – needs to be
properly analyzed!
• Limit your claims!

• Better to have thick, good and properly analyzed data on a small claim, than thin, bad or not sufficiently analyzed data for a ”great” claim!
• Remember: The choice of method is yours, and it
is your responsibility to make this choice and its
implications clear!
• Your analysis is your interpretation – but it is not an
opinion! It is based on evidence, backed by
reason, and verified with the help of scientific
methods!
Making qualitative research
trustworthy
• Trustworthiness
• Making the process transparent
• Making theoretical assumptions clear
• Making plausible interpretations

• Credibility
• Alternative explanations
• Explain outliers, deviant cases
• Apply comprehensive data treatment

• Transferability
• Analytical generalization
• Purposive sampling
An introduction to regression analysis

Joachim Landstrom

Department of Business Studies, Uppsala University

December 3, 2024
Introduction Introduction

Cause and effect


Regression cannot detect causation: only theory can do this
A theory giving a clear cause and effect relation is a must when employing
regressions: y = f (x1 , x2 , . . . , xK −1 )
‘bad theory’ ⇒ ‘bad model’ (misspecified) ⇒ ‘bad outcome/conclusions’
Quasi-experiments are no remedy for ‘bad theory’

From theory to the population regression function (PRF)


y = f (x1 , x2 , . . . , xK −1 ) ⇒ E(y |x1 , x2 , . . . , xK −1 ) = β0 + β1 x1 + β2 x2 + · · · + βK −1 xK −1

where,
y is the dependent variable (the ‘effect’)
x is the independent variable (the ‘cause’), a.k.a. covariate, regressor.
E(y |x1 , x2 , . . . , xK −1 ) is the predicted/fitted value
The regression parameters, e.g. β1 are population parameters
Landstrom, Joachim What is an OLS regression?

The linear regression model

The sample regression model


y = β0 + β1 x1 + β2 x2 + · · · + βK −1 xK −1 + u

y is the dependent variable (the ‘effect’)


x is the independent variable (the ‘cause’)
β0 and β1 are constants, and the regression parameters of interest for us.
β0 , the constant, a.k.a. intercept
β1 , the slope parameter.
Including the constant, there are K regression parameters
u is the regression error, the residual, and it is used for regression diagnostics.
We add a hat-symbol, e.g. β̂1 and û, to regression parameters and the error
when we apply the model to a sample.


Regression plot


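The regression model in matrix form (N obs with K − 1 covariates)

\[
\underbrace{\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}}_{N \times 1}
= \beta_0 \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}
+ \beta_1 \begin{bmatrix} x_{11} \\ x_{12} \\ \vdots \\ x_{1N} \end{bmatrix}
+ \beta_2 \begin{bmatrix} x_{21} \\ x_{22} \\ \vdots \\ x_{2N} \end{bmatrix}
+ \dots
+ \beta_{K-1} \begin{bmatrix} x_{K-1,1} \\ x_{K-1,2} \\ \vdots \\ x_{K-1,N} \end{bmatrix}
+ \underbrace{\begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_N \end{bmatrix}}_{N \times 1}
\]

\[
y = \beta_0 i + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_{K-1} X_{K-1} + u
\]

If

\[
X = \begin{bmatrix} i, X_1, X_2, \dots, X_{K-1} \end{bmatrix}_{N \times K}
= \underbrace{\begin{bmatrix}
1 & x_{11} & x_{21} & \cdots & x_{K-1,1} \\
1 & x_{12} & x_{22} & \cdots & x_{K-1,2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_{1N} & x_{2N} & \cdots & x_{K-1,N}
\end{bmatrix}}_{\text{a.k.a. the design matrix}},
\quad \text{and} \quad
\beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_{K-1} \end{bmatrix}_{K \times 1} \Rightarrow
\]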

y = Xβ + u ⇒ u = y − Xβ



OLS assumptions
OLS derivations

Linearity
The model must have a linear relationship


Linear independence
No exact linear relationship in X, e.g.:
MarketCap = β0 + β1 Revenue + β2 COGS + β3 GrossProfit, where
GrossProfit ≡ Revenue − COGS
Clearly there is an exact linear relationship among the covariates
N must be at least as large as K , since we are solving a linear equation
system.
With a constant in the regression model, this assumption also implies that X
cannot be completely constant — variation is necessary.
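The exact linear relationship in the MarketCap example can be detected numerically by checking the rank of the design matrix. The sketch below uses simulated, purely hypothetical figures for Revenue and COGS; only the identity GrossProfit ≡ Revenue − COGS comes from the slide.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 50

# Hypothetical firm-level data, invented for illustration.
revenue = rng.uniform(100.0, 200.0, N)
cogs = rng.uniform(40.0, 90.0, N)
gross_profit = revenue - cogs  # exact linear combination of the other two

# Design matrix with a constant and all three covariates (K = 4 columns).
X = np.column_stack([np.ones(N), revenue, cogs, gross_profit])
K = X.shape[1]
rank = np.linalg.matrix_rank(X)
print(f"K = {K}, rank(X) = {rank}")  # rank < K: X'X cannot be inverted

# Dropping the redundant covariate restores full column rank.
X_ok = X[:, :3]
rank_ok = np.linalg.matrix_rank(X_ok)
print(f"K = {X_ok.shape[1]}, rank(X_ok) = {rank_ok}")
```

When the rank is smaller than K, the linear equation system has no unique solution, which is why one of the three accounting variables has to be left out.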


Exogeneity of covariates
The error is not a function of X, i.e., E(u|x1 , x2 , . . . , xK −1 ) = E(u|X) = 0
This means that the covariates do not carry useful information for predicting u
This also implies that for each observation E(ui |X) = 0
Since, for each observation, we have E(ui |X) = 0, it follows that E(u) = 0, and
this means that we get the PRF.


Homoskedasticity
Errors are uncorrelated, which implies that the covariance between the errors
is zero. That is, E(ui uj |X) = 0 for i ≠ j.
This is also known as non-autocorrelation (in time-series samples).
The conditional variance of errors is constant, which thus equals the
unconditional variance.
That is: E[var (u)|X] = E[var (u)] = σ², i.e. E[var (u)|X] ≡ E(uu′ |X) = σ² I
E(uu′ |X) is also known as the variance-covariance matrix of the error:

\[
E(uu'|X) = E\left(
\begin{bmatrix}
u_1u_1 & u_1u_2 & u_1u_3 & \cdots & u_1u_N \\
u_2u_1 & u_2u_2 & u_2u_3 & \cdots & u_2u_N \\
u_3u_1 & u_3u_2 & u_3u_3 & \cdots & u_3u_N \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
u_Nu_1 & u_Nu_2 & u_Nu_3 & \cdots & u_Nu_N
\end{bmatrix}
\,\middle|\, X \right)
=
\begin{bmatrix}
\sigma^2 & 0 & 0 & \cdots & 0 \\
0 & \sigma^2 & 0 & \cdots & 0 \\
0 & 0 & \sigma^2 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & \sigma^2
\end{bmatrix}_{N \times N}
= \sigma^2 I_N
\]


Data generating process of covariates and errors


The data in X may be a mixture of both constant and random vectors.
We do not need to assume that the variables follow a certain distribution, such
as a normal distribution.
The really crucial assumption regarding the data generating process is the
exogeneity assumption: E(u|X) = 0
The assumption of homoskedastic errors, E(uu′ |X) = σ 2 I, is also important,
but may, easily, be relaxed.
In addition, the errors are usually assumed to be normally distributed. This
assumption may be violated when e.g. the sample size is small.



Regression parameters and their variances
OLS assumptions
Standard errors and the t-test
OLS derivations
Goodness-of-fit

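Deriving the slope in the OLS regression by minimizing SSR

\[
\begin{aligned}
SSR \equiv \sum_i \hat{u}_i^2 = \hat{u}'\hat{u}
&= (y - X\hat{\beta})'(y - X\hat{\beta}) \\
&= y'y - \hat{\beta}'X'y - y'X\hat{\beta} + \hat{\beta}'X'X\hat{\beta} \\
&= y'y - 2\hat{\beta}'X'y + \hat{\beta}'X'X\hat{\beta}
\end{aligned}
\]

\[
\frac{dSSR}{d\hat{\beta}} = -2X'y + 2X'X\hat{\beta} = 0
\;\Rightarrow\;
\hat{\beta} = (X'X)^{-1}X'y
\]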

where X′ X is a sscp matrix and X′ y is a cross-product matrix



Sampling error and the unbiasedness of the regression parameters

\[
\hat{\beta} = (X'X)^{-1}X'y = (X'X)^{-1}X'(X\beta + \hat{u})
= \beta + \underbrace{(X'X)^{-1}X'\hat{u}}_{\text{sampling error}}
\]

Then if E[û|X] = 0 ⇒

\[
E(\hat{\beta}|X) = \beta + (X'X)^{-1}X'E(\hat{u}|X) = \beta + (X'X)^{-1}X'0 = \beta
\]

\[
\lim_{N \to \infty} (\text{sampling error}) = 0
\]

X′X holds the sums of squares of each column of the matrix, and the sums of the
cross-products. So \( \lim_{N \to \infty} (X'X)^{-1} = 0 \)
Further, sampling bias effect in the residuals disappears too.

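The variance-covariance matrix for the regression parameters I

Since \( \hat{\beta} = \beta + (X'X)^{-1}X'\hat{u} \), we get

\[
\begin{aligned}
E[var(\hat{\beta})|X]
&= E\big([\hat{\beta} - E(\hat{\beta})][\hat{\beta} - E(\hat{\beta})]' \,\big|\, X\big) \\
&= E\big([\beta + (X'X)^{-1}X'\hat{u} - \beta][\beta + (X'X)^{-1}X'\hat{u} - \beta]' \,\big|\, X\big) \\
&= (X'X)^{-1}X' \underbrace{E(\hat{u}\hat{u}'|X)}_{\text{Note!}} X(X'X)^{-1}
\end{aligned}
\]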


The variance-covariance matrix for the regression parameters II

Assume homoskedasticity, i.e. E(ûû′ |X) = σ² I, and we get

\[
\begin{aligned}
E[var(\hat{\beta})|X]
&= (X'X)^{-1}X'E(\hat{u}\hat{u}'|X)X(X'X)^{-1} \\
&= (X'X)^{-1}X'\sigma^2 I X(X'X)^{-1} \\
&= \sigma^2 (X'X)^{-1}X'X(X'X)^{-1} \\
&= \sigma^2 (X'X)^{-1}
\end{aligned}
\]


Standard errors
The estimated residual variance (K − 1 is the number of covariates excluding the constant):

\[
\hat{\sigma}^2 = SSR \times (N - K)^{-1}
\]

N − K [in the book n − (k + 1)] is the so-called degrees of freedom

The residual standard error (i.e., the estimate of σ) is \( se(u) = \sqrt{\hat{\sigma}^2} \),
a.k.a. Root Mean Squared Error (RMSE) [standard error of regression (the
book)]
Standard errors for the regression parameters:

\[
se(\hat{\beta}) = \sqrt{\hat{\sigma}^2 \times diag[(X'X)^{-1}]} = se(u) \times \sqrt{diag[(X'X)^{-1}]}
\]


The t-test

\[
t_{\hat{\beta}} = \frac{\hat{\beta} - \beta}{se(\hat{\beta})}
\]

By default, all software sets H0 : β = 0, which is what is reported in the
summary output of regressions.
So, by default, the tests are two-sided, with H1 : β ≠ 0, in the software
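The steps from residuals to t-statistics can be traced by hand on a tiny data set. The sketch below is an illustration under assumptions: the four observations are invented so that the arithmetic stays checkable, and the function name `ols_summary` is hypothetical. It follows the formulas on the slides: σ̂² = SSR/(N − K), se(β̂) from the diagonal of σ̂²(X′X)⁻¹, and t = β̂/se(β̂) under H0: β = 0.

```python
import numpy as np


def ols_summary(X, y):
    """Return OLS estimates, classical standard errors and t-statistics (H0: beta = 0)."""
    N, K = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    u_hat = y - X @ beta_hat
    ssr = u_hat @ u_hat                      # sum of squared residuals
    sigma2_hat = ssr / (N - K)               # N - K degrees of freedom
    se = np.sqrt(sigma2_hat * np.diag(XtX_inv))
    t_stats = beta_hat / se                  # (beta_hat - 0) / se
    return beta_hat, se, t_stats, sigma2_hat


# Invented four-observation example with one covariate plus a constant.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 8.0])
X = np.column_stack([np.ones_like(x), x])

beta_hat, se, t_stats, sigma2_hat = ols_summary(X, y)
print("beta_hat  :", beta_hat)      # intercept 0.8, slope 2.3
print("sigma2_hat:", sigma2_hat)    # SSR = 0.30, so 0.30 / (4 - 2) = 0.15
print("se        :", se)
print("t         :", t_stats)
```

To turn the t-statistics into p-values one would compare them with a t-distribution with N − K degrees of freedom, which is what regression software does in its summary table.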


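Goodness-of-fit

\[
\text{Let } SST \equiv \sum_i (y_i - \bar{y})^2 = y'y - N\bar{y}^2, \text{ and } SSE \equiv SST - SSR \Rightarrow
\frac{SST}{SST} = \frac{SSE}{SST} + \frac{SSR}{SST} \Rightarrow
\]

\[
R^2 \equiv \frac{SSE}{SST} = 1 - \frac{SSR}{SST}
= 1 - \frac{\sum_i \hat{u}_i^2}{\sum_i (y_i - \bar{y})^2}
= 1 - \frac{\hat{u}'\hat{u}}{y'y - N\bar{y}^2}
\]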

SST is Sum of Squared Total


SSE is Sum of Squared Explained
R 2 is the so-called coefficient of determination. R 2 ∈ (0, 1)
Some software, e.g. R, calls it ‘Multiple R-squared’ since the residuals may be the
outcome of a regression with several covariates.
The regression model needs to include a constant term to use R 2 and be linear

The adjusted R 2
R² is increasing in the number of independent variables.
E.g. add variable z to X:

\[
R^2_{Xz} = R^2_X + (1 - R^2_X)\rho^2_{yz}
\]

where ρyz is the correlation between y and z (strictly, the partial correlation, controlling for X)
So, to compare models, R² needs to be adjusted, so that such an automatic
increase does not bias the decision. The adjusted R² is computed as:

\[
R^2_{adjusted} = 1 - \frac{N - 1}{N - K}(1 - R^2)
\]
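Both measures follow directly from SSR and SST. A minimal sketch, assuming an invented four-observation data set and a hypothetical helper name `r_squared`:

```python
import numpy as np


def r_squared(X, y):
    """Return (R2, adjusted R2) for an OLS regression that includes a constant."""
    N, K = X.shape
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    u_hat = y - X @ beta_hat
    ssr = u_hat @ u_hat                    # sum of squared residuals
    sst = y @ y - N * y.mean() ** 2        # y'y - N * ybar^2
    r2 = 1.0 - ssr / sst
    r2_adj = 1.0 - (N - 1) / (N - K) * (1.0 - r2)
    return r2, r2_adj


# Invented example data: one covariate plus a constant.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 8.0])
X = np.column_stack([np.ones_like(x), x])

r2, r2_adj = r_squared(X, y)
print(f"R2 = {r2:.4f}, adjusted R2 = {r2_adj:.4f}")
```

Because (N − 1)/(N − K) is at least 1 whenever K ≥ 1, the adjusted value never exceeds the unadjusted one, and the gap widens as more covariates are added to a fixed sample.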



Non-linearity
Assumption violations I
Heteroskedasticity
Assumption violations II
Auto-correlation

Linearity & Non-linearity


E(ui |X) = 0 & E(u) = 0 vs E(ui |X) ≠ 0 & E(u) = 0


Heteroskedasticity I

Recall:

\[
E[var(\hat{\beta})|X] = (X'X)^{-1}X'E(uu'|X)X(X'X)^{-1}, \text{ and}
\]

\[
E(uu'|X) = E\left(
\begin{bmatrix}
u_1u_1 & u_1u_2 & u_1u_3 & \cdots & u_1u_N \\
u_2u_1 & u_2u_2 & u_2u_3 & \cdots & u_2u_N \\
u_3u_1 & u_3u_2 & u_3u_3 & \cdots & u_3u_N \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
u_Nu_1 & u_Nu_2 & u_Nu_3 & \cdots & u_Nu_N
\end{bmatrix}
\,\middle|\, X \right)
=
\begin{bmatrix}
\sigma^2 & 0 & 0 & \cdots & 0 \\
0 & \sigma^2 & 0 & \cdots & 0 \\
0 & 0 & \sigma^2 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & \sigma^2
\end{bmatrix}
\]

Suppose we have σi², for all i ∈ {1, · · · , N}, rather than σ²
But still assume zero expected covariance, so that all off-diagonal elements are 0
Then E(uu′ |X) ≠ σ² I

Heteroskedasticity II
Thus, se(u) becomes biased, which biases se(β̂), which in turn affects the ability
of the t-test to correctly test H0

Recall:

\[
E(\hat{\beta}|X) = \beta + \underbrace{(X'X)^{-1}X'E(u|X)}_{\text{Sampling error}} = \beta + (X'X)^{-1}X'0 = \beta
\]

The above only assumes E(u|X) = 0, which implies that E(β̂|X) = β still holds when
heteroskedasticity is present.


Heteroskedasticity III: Regression plot


Heteroskedasticity IV: Histogram


Heteroskedasticity V: Normal Q-Q-Plot


Heteroskedasticity VI: Regression table


Heteroskedasticity VII: Remedies


Transform the covariates prior to regression
Remove scale effects by deflating variables, e.g. Profits/Assets rather than
Profits [i.e. avoid regressions using ‘levels’]
Use other transformations, such as ln(), when appropriate
Use robust standard errors, such as White standard errors (a.k.a. HC0), for
larger samples. Nowadays, Cribari-Neto & Da Silva (2011) suggest using
HC5 rather than HC0 when data may suffer from heteroskedasticity. (HC =
Heteroskedasticity-Consistent)
HC standard errors are unbiased when data is homoskedastic → always use
robust standard errors (?)
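White's HC0 estimator keeps the general "sandwich" form (X′X)⁻¹X′E(uu′|X)X(X′X)⁻¹ and replaces the unknown E(uu′|X) with a diagonal matrix of squared residuals. The sketch below is an illustration on simulated data; the data generating process (error standard deviation proportional to x) and the function name `ols_with_hc0` are assumptions for the example, and only HC0 is shown, not HC5.

```python
import numpy as np


def ols_with_hc0(X, y):
    """OLS estimates with classical and White (HC0) standard errors."""
    N, K = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    u_hat = y - X @ beta_hat

    # Classical: sigma^2 (X'X)^-1, valid under homoskedasticity.
    sigma2_hat = (u_hat @ u_hat) / (N - K)
    se_classical = np.sqrt(sigma2_hat * np.diag(XtX_inv))

    # HC0: (X'X)^-1 X' diag(u_hat^2) X (X'X)^-1.
    # Row-scaling X by u_hat^2 avoids building the N x N diagonal matrix.
    meat = X.T @ (X * (u_hat ** 2)[:, None])
    cov_hc0 = XtX_inv @ meat @ XtX_inv
    se_hc0 = np.sqrt(np.diag(cov_hc0))
    return beta_hat, se_classical, se_hc0, cov_hc0


# Simulated heteroskedastic data: the error spread grows with x (assumption).
rng = np.random.default_rng(42)
N = 500
x = rng.uniform(1.0, 10.0, N)
u = x * rng.normal(size=N)
X = np.column_stack([np.ones(N), x])
y = 1.0 + 2.0 * x + u

beta_hat, se_classical, se_hc0, cov_hc0 = ols_with_hc0(X, y)
print("beta_hat     :", beta_hat)
print("classical se :", se_classical)
print("HC0 se       :", se_hc0)
```

The point estimates are identical under both approaches; only the standard errors, and therefore the t-tests, change.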


Auto-correlated errors I

Recall:

\[
E[var(\hat{\beta})|X] = (X'X)^{-1}X'E(uu'|X)X(X'X)^{-1}, \text{ and}
\]

\[
E(uu'|X) = E\left(
\begin{bmatrix}
u_1u_1 & u_1u_2 & u_1u_3 & \cdots & u_1u_N \\
u_2u_1 & u_2u_2 & u_2u_3 & \cdots & u_2u_N \\
u_3u_1 & u_3u_2 & u_3u_3 & \cdots & u_3u_N \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
u_Nu_1 & u_Nu_2 & u_Nu_3 & \cdots & u_Nu_N
\end{bmatrix}
\,\middle|\, X \right)
=
\begin{bmatrix}
\sigma^2 & 0 & 0 & \cdots & 0 \\
0 & \sigma^2 & 0 & \cdots & 0 \\
0 & 0 & \sigma^2 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & \sigma^2
\end{bmatrix}
\]

Suppose the covariances in the off-diagonal elements are not 0
But still assume constant variance
As with heteroskedasticity, the error’s variance-covariance structure does not
collapse into the assumed, simple form, and we need to take some action.

Auto-correlated errors II
Cross-sectional data is assumed to be random, so auto-correlation is seldom an
issue.
Time-series data is not drawn randomly. It comes from a single entity
observed over time. Path-dependence is then often an issue.
If the ts-model does not correctly model the path-dependence, this
dependence finds its way into the errors - leading to auto-correlated errors.


Auto-correlated errors III: Regression plot


Auto-correlated errors IV: Histogram


Auto-correlated errors V: Normal Q-Q-Plot


Auto-correlated errors VI: Regression table


Auto-correlated errors VII: Remedies


Transform the covariates prior to regression
Re-specify model to incorporate path dependency
Use robust standard errors, such as Newey-West standard errors (a.k.a. HAC0).
HAC standard errors are also robust to heteroskedasticity (HAC =
Heteroskedasticity- and Autocorrelation-Consistent)
HAC standard errors are unbiased when data is homoskedastic → always use
robust standard errors (?)



Multicollinearity
Assumption violations I
Irrelevant variables
Assumption violations II
Omitted variables

Multicollinearity

\[
var(\hat{\beta}_k) = \frac{\hat{\sigma}^2}{SST_k} \times \frac{1}{(1 - R_k^2)} = \frac{\hat{\sigma}^2}{SST_k} \times VIF_k,
\quad \text{where } SST_k = \sum_{i=1}^{N} (x_{i,k} - \bar{x}_k)^2
\]

The problem of multicollinearity is not well-defined
Does not violate the assumption of no linear dependence
\( \lim_{R_k^2 \to 1} var(\hat{\beta}_k) = \infty \)
Can be mitigated by increasing N, since it increases SSTk
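The variance inflation factor VIFk = 1/(1 − Rk²) can be computed by regressing each covariate on all the others. The following is a sketch on simulated data; the three covariates, the noise level and the function name `vif` are assumptions chosen so that two covariates are nearly, but not exactly, collinear.

```python
import numpy as np


def vif(covariates):
    """Variance inflation factor for each column (covariates exclude the constant)."""
    N, P = covariates.shape
    out = np.empty(P)
    for k in range(P):
        target = covariates[:, k]
        others = np.delete(covariates, k, axis=1)
        Z = np.column_stack([np.ones(N), others])     # auxiliary regression with constant
        coef, *_ = np.linalg.lstsq(Z, target, rcond=None)
        resid = target - Z @ coef
        sst_k = np.sum((target - target.mean()) ** 2)
        r2_k = 1.0 - (resid @ resid) / sst_k          # R_k^2 of x_k on the other covariates
        out[k] = 1.0 / (1.0 - r2_k)
    return out


# Simulated covariates: x2 is almost a copy of x1, x3 is unrelated (assumption).
rng = np.random.default_rng(7)
N = 500
x1 = rng.normal(size=N)
x2 = x1 + 0.1 * rng.normal(size=N)
x3 = rng.normal(size=N)

vifs = vif(np.column_stack([x1, x2, x3]))
print("VIF:", vifs)  # large for x1 and x2, close to 1 for x3
```

There is no sharp threshold at which a VIF becomes a problem, which is one way of reading the remark that multicollinearity is not well-defined.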


Irrelevant variables: Over-specified model I

Recall: \( \hat{\beta} = (X'X)^{-1}X'y \)

Suppose the correct model is: y = Xβ + u
and the (incorrect) model used is: y = Xβ + zγ + u, where γ = 0 ⇒

\[
\hat{\beta} = (X'X)^{-1}X'(X\beta + z\gamma + \hat{u})
= \beta + \overbrace{(X'X)^{-1}X'z\gamma}^{\text{But } \gamma = 0} + \underbrace{(X'X)^{-1}X'\hat{u}}_{\text{Sampling error}}
\;\Rightarrow\;
\hat{\beta} - \beta = (X'X)^{-1}X'\hat{u}
\]


Irrelevant variables: Over-specified model II

Recall: \( E[var(\hat{\beta})|X] = \sigma^2 (X'X)^{-1} \)

The difference between variances:

\[
E[var(\hat{\beta})|X]^{-1} - E[var(\hat{\beta})|X, z]^{-1} = \frac{1}{\sigma^2} X'z(z'z)^{-1}z'X
\]

The regression parameters are unbiased


The variances of the regression parameters are no longer unbiased, unless
X′ z = 0


Omitted variables: Under-specified model I

Recall: \( \hat{\beta} = \beta + \underbrace{(X'X)^{-1}X'\hat{u}}_{\text{Sampling error}} \)

Suppose the correct model is: y = Xβ + zγ + v
and the (incorrect) model used is: y = Xβ + u
From this it follows that: u = zγ + v, and

\[
\hat{\beta} - \beta = \underbrace{(X'X)^{-1}X'z\gamma}_{\text{Omitted var bias}} + \underbrace{(X'X)^{-1}X'\hat{v}}_{\text{Sampling error}}
\]


Omitted variables: Under-specified model II


limN→∞ (sampling error) = 0
But the omitted variable bias does not go away as the sample size increases
Regression is biased if
X′ z ̸= 0 and
γ ̸= 0


Bias from under-specification


Assumptions: β0 = 0, β1 = 2, γ = 2, xi ∼ N(1, 2), v ∼ N(0, 4), zi = 0.5xi + w,
where w ∼ N(0, 4), which gives ρX,z ≈ 0.32

Bias                    N = 10     N = 100    N = 1,000    N = 10,000
X′z            [1,]     14.091      16.022      685.989     4,612.025
               [2,]     73.745     149.272    3,365.101    25,015.443
(X′X)⁻¹X′zγ    [1,]     -1.221      -0.613       -0.024        -0.097
               [2,]      3.515       0.790        1.352         1.024
(X′X)⁻¹X′v̂     [1,]      1.459       0.338       -0.112        -0.040
               [2,]     -0.809       0.083        0.064         0.014
β̂0 − β0                  0.238      -0.275       -0.136        -0.137
β̂1 − β1                  2.705       0.873        1.416         1.038
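A simulation in the spirit of the table can be sketched as follows. Two things are assumptions rather than facts from the slides: N(a, b) is read as mean a and variance b (a reading roughly consistent with ρX,z ≈ 0.32), and the function name `slope_bias`, the seed and the sample sizes are chosen for illustration, so the numbers will not reproduce the table. Under these settings the slope bias from omitting z should settle near γ · cov(x, z)/var(x) = 2 × 0.5 = 1 as N grows.

```python
import numpy as np


def slope_bias(N, rng):
    """Return (bias of slope when z is omitted, bias of slope when z is included)."""
    beta0, beta1, gamma = 0.0, 2.0, 2.0
    x = rng.normal(1.0, np.sqrt(2.0), N)   # assumed: mean 1, variance 2
    w = rng.normal(0.0, 2.0, N)            # assumed: variance 4
    v = rng.normal(0.0, 2.0, N)            # assumed: variance 4
    z = 0.5 * x + w
    y = beta0 + beta1 * x + gamma * z + v  # the correct model includes z

    # Under-specified model: y on a constant and x only.
    X = np.column_stack([np.ones(N), x])
    beta_short = np.linalg.solve(X.T @ X, X.T @ y)

    # Correctly specified model: y on a constant, x and z.
    Xz = np.column_stack([X, z])
    beta_long = np.linalg.solve(Xz.T @ Xz, Xz.T @ y)

    return beta_short[1] - beta1, beta_long[1] - beta1


rng = np.random.default_rng(0)
for N in (10, 100, 1_000, 10_000, 100_000):
    short, long_ = slope_bias(N, rng)
    print(f"N = {N:>7}: bias without z = {short:6.3f}, bias with z = {long_:6.3f}")
```

The sampling error shrinks with N in both models, but only the model that includes z converges to the true slope.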



Effect of sample size
Other issues Variance in the covariates
Outliers

The sample size effect


Variance in covariates
OLS regression thrives on variance in covariates
At least one of the covariates must vary
The greater the variance, the smaller the standard error

N = 1, 000 sd = 1 sd = 2 sd = 4 sd = 8
se(β0 ) 0.1875 0.1476 0.1355 0.1322
se(β1 ) 0.1321 0.0660 0.0330 0.0165


Outliers I
Outliers are extreme values, values that sever, or impede, the
causal relation y = f (x).
Covariates based on ratios often suffer from this due to the small
denominator effect. Other reasons might be e.g. data input errors.
It is difficult to separate outliers from extreme effects that may occur normally in
data.
Many methods exist for treating outliers, e.g.:
Transformation using e.g. natural logarithm
Trimming
Winsorizing


Outliers II: Regression plot


Outliers III: Histogram


Outliers IV: Regression table


Outliers V: Multivariate Normality Testing, Chi-square Q-Q-Plot (Core/Outliers)


Outliers VI: Trimming and Winsorizing


Both trimming and Winsorizing entail identifying an upper and a lower bound,
e.g. P2.5 & P97.5
Trimming implies dropping all observations from the data that are either less
than the lower bound or greater than the upper bound. This reduces the
dataset, which is a problem when the data hold few observations.
An alternative trimming process is to ‘nullify’ the extreme values, i.e. to set them
to ‘NAs’
Spreadsheets can’t process regressions with ‘NAs’.
Winsorizing implies "reducing the signal strength" of the values above the
upper and below the lower bound. That is, those values are replaced by the
boundary values.
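
The three treatments can be sketched in a few lines. The data, the seed, and the function names below are hypothetical and serve only to illustrate the P2.5/P97.5 bounds mentioned above.

```python
import numpy as np

def percentile_bounds(x, lower_pct=2.5, upper_pct=97.5):
    """Lower and upper bound, e.g. P2.5 and P97.5."""
    lo, hi = np.percentile(x, [lower_pct, upper_pct])
    return lo, hi

def trim(x):
    """Drop observations outside the bounds (the dataset shrinks)."""
    lo, hi = percentile_bounds(x)
    return x[(x >= lo) & (x <= hi)]

def nullify(x):
    """Keep the length but set extreme values to NaN ('NA')."""
    lo, hi = percentile_bounds(x)
    out = x.astype(float).copy()
    out[(x < lo) | (x > hi)] = np.nan
    return out

def winsorize(x):
    """Replace extreme values by the boundary values."""
    lo, hi = percentile_bounds(x)
    return np.clip(x, lo, hi)

rng = np.random.default_rng(1)                                   # assumed seed
data = np.concatenate([rng.standard_normal(200), [50.0, -40.0]])  # two planted outliers

trimmed, nullified, winsorized = trim(data), nullify(data), winsorize(data)
print(len(data), len(trimmed), int(np.isnan(nullified).sum()))
print(data.max(), winsorized.max())
```

Trimming shortens the sample, nullifying keeps its length but leaves gaps, and Winsorizing keeps every observation while capping its value.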


Outliers VII: Multivariate Normality Testing — Chi-square Q-Q-Plot


(Trimmed/Winsorized)


Outliers VIII: Regression table



Summary & next-up

Summary
Linearity/Non-linearity
Stationary/Non-stationary
Exogeneity of covariates
Over-specified model
Under-specified model (a.k.a omitted variable) — Endogeneity
Variance-covariance structure for errors
Homoskedasticity
Heteroskedasticity
Auto-correlation
Multicollinearity
Outliers


Next-up
Consider:
Earnings = β0 + β1 Education + β2 Ability + u

How can we remove the estimation bias from a knowingly under-specified
model?
We may miss some data
Some data cannot be collected
How can we remove the estimation bias from an unknowingly under-specified
model?
Solution to all of the above: Panel regressions!



Appendix: Math Tips

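Math Tip: The difference between u′u and uu′

Let $u = \begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix}$, then

$$u'u = \begin{bmatrix} 1 & 2 & 3 & 4 \end{bmatrix} \begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix} = 1 \times 1 + 2 \times 2 + 3 \times 3 + 4 \times 4 = 1^2 + 2^2 + 3^2 + 4^2 = 30$$

$$uu' = \begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix} \begin{bmatrix} 1 & 2 & 3 & 4 \end{bmatrix} = \begin{bmatrix} 1 \times 1 & 1 \times 2 & 1 \times 3 & 1 \times 4 \\ 2 \times 1 & 2 \times 2 & 2 \times 3 & 2 \times 4 \\ 3 \times 1 & 3 \times 2 & 3 \times 3 & 3 \times 4 \\ 4 \times 1 & 4 \times 2 & 4 \times 3 & 4 \times 4 \end{bmatrix} = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 2 & 4 & 6 & 8 \\ 3 & 6 & 9 & 12 \\ 4 & 8 & 12 & 16 \end{bmatrix} = \begin{bmatrix} u_1^2 & u_1 u_2 & u_1 u_3 & u_1 u_4 \\ u_2 u_1 & u_2^2 & u_2 u_3 & u_2 u_4 \\ u_3 u_1 & u_3 u_2 & u_3^2 & u_3 u_4 \\ u_4 u_1 & u_4 u_2 & u_4 u_3 & u_4^2 \end{bmatrix}$$

and also

$$u'u \equiv \sum \operatorname{diag}(uu') = \operatorname{tr}(uu')$$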
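
The same numbers can be checked numerically. A minimal sketch with NumPy, using the vector from the tip:

```python
import numpy as np

u = np.array([1, 2, 3, 4])

inner = u @ u            # u'u: a scalar (sum of squares)
outer = np.outer(u, u)   # uu': a 4x4 matrix of cross products

print(inner)
print(outer)
print(np.trace(outer))   # the trace of uu' equals u'u
```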


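Math Tip: Sums of squares and sums of cross products (SSCP) matrix

If $X = \begin{bmatrix} 1 & 4 & 7 \\ 2 & 5 & 8 \\ 3 & 6 & 9 \end{bmatrix}$ and $X' = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}$ then:

$$X'X = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} \begin{bmatrix} 1 & 4 & 7 \\ 2 & 5 & 8 \\ 3 & 6 & 9 \end{bmatrix} = \begin{bmatrix} 14 & 32 & 50 \\ 32 & 77 & 122 \\ 50 & 122 & 194 \end{bmatrix} = \begin{bmatrix} \sum X_1^2 & \sum X_1 X_2 & \sum X_1 X_3 \\ \sum X_2 X_1 & \sum X_2^2 & \sum X_2 X_3 \\ \sum X_3 X_1 & \sum X_3 X_2 & \sum X_3^2 \end{bmatrix}$$

The diagonal elements are the sums of squares for each column in X
The off-diagonal elements are the sums of cross-products between pairs of
columns in X
The matrix is a K×K matrix (one row and one column per column in X, here 3×3) and
it is symmetric, i.e. the upper and lower triangles are the same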
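
A short numerical check of the SSCP matrix, using the X from the tip. This is only an illustrative sketch:

```python
import numpy as np

X = np.array([[1, 4, 7],
              [2, 5, 8],
              [3, 6, 9]])

sscp = X.T @ X                         # sums of squares and cross products
column_sums_of_squares = (X ** 2).sum(axis=0)

print(sscp)
print(column_sums_of_squares)          # matches the diagonal of X'X
```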
Panel regression

Joachim Landstrom

Department of Business Studies, Uppsala University

December 3, 2024
Introduction

Introduction

Earnings = β0 + β1 Education + β2 Ability + u ⇒


Earnings = β0 + β1 Education + v ⇒
v = β2 Ability + u

Ability is ‘unobservable’
cov(Education, Ability) ≠ 0
Omitted variables problem
Biased regression parameters
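
The omitted variables problem can be illustrated with simulated data. This is a sketch under assumed values: the coefficients, the seed, and the way Education depends on Ability are hypothetical and chosen only so that cov(Education, Ability) ≠ 0.

```python
import numpy as np

rng = np.random.default_rng(1)                         # assumed seed
n = 10_000
ability = rng.standard_normal(n)                       # 'unobservable' in practice
education = 0.5 * ability + rng.standard_normal(n)     # correlated with ability
u = rng.standard_normal(n)

beta0, beta1, beta2 = 1.0, 2.0, 3.0                    # hypothetical true parameters
earnings = beta0 + beta1 * education + beta2 * ability + u

def ols(X, y):
    """OLS coefficients via least squares."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)
b_full = ols(np.column_stack([ones, education, ability]), earnings)
b_short = ols(np.column_stack([ones, education]), earnings)   # Ability omitted

print("beta1, full model :", round(b_full[1], 3))
print("beta1, Ability omitted:", round(b_short[1], 3))
```

With Ability left in the error term v, the Education coefficient absorbs part of the Ability effect and is biased upwards in this set-up.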

Landstrom, Joachim Panel regression analysis



Plan
Introduction
Model presentation
Pooled OLS
First-differencing
LSDV
Time-demeaning fixed effect
Random effects
The residual
Specification tests


Problems — and a solution


Cross-sectional (CS) analysis
Sample size too small
Dynamic effects cannot be modelled
Weak theory: risk for omitted variables
Missing data
Some data is unobservable
Time-series (TS) analysis
Too short time-series (i.e. sample size too small)
Weak theory: risk for omitted variables
Missing data
Some data is unobservable
Time-series cross-section (TSCS) analysis: Panel data
Sample size increases dramatically
Can use dynamic models
Can sweep away untreated heterogeneity (missing data/unobserved/unknown
variables)
From standard regressions to panel regressions
Model presentation
Eliminating the unobserved variable(s)

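Standard regression

$$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}_{N \times 1} = \beta_0 \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix} + \beta_1 \begin{bmatrix} x_{11} \\ x_{12} \\ \vdots \\ x_{1N} \end{bmatrix} + \beta_2 \begin{bmatrix} x_{21} \\ x_{22} \\ \vdots \\ x_{2N} \end{bmatrix} + \dots + \beta_{K-1} \begin{bmatrix} x_{K-1,1} \\ x_{K-1,2} \\ \vdots \\ x_{K-1,N} \end{bmatrix} + \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_N \end{bmatrix}_{N \times 1}$$

$$y = \beta_0 i + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_{K-1} X_{K-1} + u$$

If

$$X = \begin{bmatrix} i, X_1, X_2, \dots, X_{K-1} \end{bmatrix}_{N \times K} = \begin{bmatrix} 1 & x_{11} & x_{21} & \cdots & x_{K-1,1} \\ 1 & x_{12} & x_{22} & \cdots & x_{K-1,2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & \cdots & \cdots & \cdots & x_{K-1,N} \end{bmatrix} \text{ (a.k.a. the design matrix), and } \beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_{K-1} \end{bmatrix}_{K \times 1} \Rightarrow$$

$$y = X\beta + u \Rightarrow u = y - X\beta$$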

The panel regression


Now imagine that the design matrix, X, from the previous slide comes from a
single individual/department/firm/country.
Each row in X holds data from one time period. The first row holds the first time
period, and the time index increases for each successive row.
Now imagine that you collect more than one such design matrix and stack
them on top of each other.
Then you get a regression model as on the next page.


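A panel regression model

$$\begin{bmatrix} y_{11} \\ y_{12} \\ y_{21} \\ y_{22} \\ \vdots \\ y_{NT} \end{bmatrix}_{NT \times 1} = \begin{bmatrix} 1 & x_{111} & x_{112} & \cdots & x_{11,K-1} \\ 1 & x_{121} & x_{122} & \cdots & x_{12,K-1} \\ 1 & x_{211} & x_{212} & \cdots & x_{21,K-1} \\ 1 & x_{221} & x_{222} & \cdots & x_{22,K-1} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & \cdots & \cdots & \cdots & x_{NT,K-1} \end{bmatrix}_{NT \times K} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_{K-1} \end{bmatrix}_{K \times 1} + \begin{bmatrix} u_{11} \\ u_{12} \\ u_{21} \\ u_{22} \\ \vdots \\ u_{NT} \end{bmatrix}_{NT \times 1} \Rightarrow$$

$$y = X\beta + u$$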

First subscript is the c-s index, the 2nd subscript is the t-s index, and the 3rd
subscript is the column index.


The basic model

$$y_{nt} = \beta_0 + X_{nt}\beta + z_n\gamma + z_t\mu + u_{nt}$$

$z_n\gamma$ is a time-invariant constant unique for each c-s
$z_t\mu$ is a cross-sectional-invariant constant unique for each time
If only cov(X, z_n) ≠ 0, we have a one-way fixed effect model
If only cov(X, z_t) ≠ 0, we also have a one-way fixed effect model
If we have both cov(X, z_n) ≠ 0 and cov(X, z_t) ≠ 0, then we have a two-way
fixed effect model


Pooled OLS

$$y_{nt} = \beta_0 + X_{nt}\beta + z_n\gamma + z_t\mu + u_{nt} \Rightarrow$$

$$y_{nt} = \beta_0 + X_{nt}\beta + u_{nt}$$

Assumes both cov(X, z_n) = 0 and cov(X, z_t) = 0
Collapses the model to a standard OLS regression model that applies to the whole
panel


First-differencing

$$y_{nt} = \beta_0 + X_{nt}\beta + z_n\gamma + u_{nt} \Rightarrow$$

$$y_{n,t-1} = \beta_0 + X_{n,t-1}\beta + z_n\gamma + u_{n,t-1} \Rightarrow$$

$$\Delta y_{nt} = \Delta X_{nt}\beta + v_{nt}, \text{ where } v_{nt} = \Delta u_{nt} = u_{nt} - u_{n,t-1}$$

$z_n\gamma$ is a time-invariant constant unique for each c-s
Assume cov(X, z_n) ≠ 0
Has no intercept
The design matrix cannot hold time-invariant covariates such as industry dummies (no
variation in differences).
However, the error is now auto-correlated, by design
This method works in a spreadsheet program, and in SPSS
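
A simulated panel shows how differencing removes the time-invariant term. This is a sketch under assumed values: the panel size, the seed, and all coefficients are hypothetical, and the covariate is built to correlate with the unobserved unit effect.

```python
import numpy as np

rng = np.random.default_rng(2)                  # assumed seed
N, T = 200, 5                                   # hypothetical panel dimensions
beta = 2.0                                      # hypothetical true slope

z = rng.standard_normal(N)                      # unobserved, time-invariant unit effect
x = z[:, None] + rng.standard_normal((N, T))    # covariate correlated with z
u = rng.standard_normal((N, T))
y = 1.0 + beta * x + 3.0 * z[:, None] + u       # rows = units, columns = periods

# Pooled OLS that ignores z: the slope picks up part of the unit effect
X_pooled = np.column_stack([np.ones(N * T), x.ravel()])
b_pooled = np.linalg.lstsq(X_pooled, y.ravel(), rcond=None)[0][1]

# First differences within each unit; z and the intercept drop out
dx = np.diff(x, axis=1).ravel()
dy = np.diff(y, axis=1).ravel()
b_fd = (dx @ dy) / (dx @ dx)                    # OLS without intercept

print("pooled OLS slope:", round(b_pooled, 3))
print("first-difference slope:", round(b_fd, 3))
```

Each unit loses one observation to the differencing, which is part of the price paid for removing the unobserved effect.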

LSDV
Assume cov(X, z_n) ≠ 0, or cov(X, z_t) ≠ 0, or both
Add one dummy variable for each c-s/time into the regression
This method works in a spreadsheet program, and in SPSS
This method reduces the degrees of freedom significantly
However, estimation of the c-s dummies is consistent only as T grows.
Should be used with caution if the regression parameters for the dummies are of
interest.


Time-demeaning fixed effect

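$$y_{nt} = \beta_0 + X_{nt}\beta + z_n\gamma + u_{nt}, \text{ and}$$

$$\bar{y}_n = \beta_0 + \bar{X}_n\beta + \bar{z}_n\gamma + \bar{u}_n \Rightarrow$$

$$y_{nt} - \bar{y}_n = \beta_0 - \beta_0 + (X_{nt} - \bar{X}_n)\beta + (z_n - \bar{z}_n)\gamma + (u_{nt} - \bar{u}_n) \Rightarrow$$

$$y_{nt} - \bar{y}_n = (X_{nt} - \bar{X}_n)\beta + (u_{nt} - \bar{u}_n)$$

Bars denote averages over time within cross-section n. Since $z_n$ is time-invariant, $\bar{z}_n = z_n$ and the term drops out.
Does not introduce auto-correlation into errors
Does not zap the degrees-of-freedom
Does not rely on T-consistency
Has become the de-facto standard method for fixed effects regressions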
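
The time-demeaning (within) estimator and LSDV give the same slope, which a small simulation can confirm. This is a sketch under assumed values: panel size, seed, and coefficients are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)                  # assumed seed
N, T = 50, 6                                    # hypothetical panel dimensions
z = rng.standard_normal(N)                      # unobserved unit effect
x = z[:, None] + rng.standard_normal((N, T))    # covariate correlated with z
y = 1.0 + 2.0 * x + 3.0 * z[:, None] + rng.standard_normal((N, T))

# Within estimator: subtract each unit's time average, then OLS without intercept
x_dm = (x - x.mean(axis=1, keepdims=True)).ravel()
y_dm = (y - y.mean(axis=1, keepdims=True)).ravel()
b_within = (x_dm @ y_dm) / (x_dm @ x_dm)

# LSDV: the covariate plus one dummy per unit (no separate intercept)
unit = np.repeat(np.arange(N), T)               # unit index matching x.ravel()
dummies = (unit[:, None] == np.arange(N)[None, :]).astype(float)
X_lsdv = np.column_stack([x.ravel(), dummies])
b_lsdv = np.linalg.lstsq(X_lsdv, y.ravel(), rcond=None)[0][0]

print("within slope:", round(b_within, 4))
print("LSDV slope  :", round(b_lsdv, 4))
```

The within transformation gets the same slope from a regression with one column instead of N + 1, which is why it scales to panels with many cross-sections.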


Random effects

ynt = β0 + Xnt β + zn γ + u

But now assume cov (X, zn ) = 0


Time-demeaning fixed effect is now inefficient
Seldom seen in my line of work


Specification test
Selecting between Pooled OLS, FE-CS, FE-TS, RE, or FD can be subjected
to formal tests
1 Pooled OLS or FE? Call on the F-test for testing panel models
2 Test both time-invariant and cross-sectional invariant FE against pooled using
the F-test for testing panel models
3 Then test time-invariant vs cross-sectional invariant using the F-test for
testing panel models
4 Test one-way vs two-way using the F-test for testing panel models
5 Random effect, or FE? Call on the Hausman test



The variance-covariance matrix I

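Recall:

$$E(uu'|X) = E\left( \begin{bmatrix} u_1 u_1 & u_1 u_2 & u_1 u_3 & \cdots & u_1 u_N \\ u_2 u_1 & u_2 u_2 & \cdots & \cdots & u_2 u_N \\ u_3 u_1 & u_3 u_2 & u_3 u_3 & \cdots & u_3 u_N \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ u_N u_1 & u_N u_2 & u_N u_3 & \cdots & u_N u_N \end{bmatrix} \,\middle|\, X \right) = \begin{bmatrix} \sigma^2 & 0 & \cdots & \cdots & 0 \\ 0 & \sigma^2 & \cdots & \cdots & 0 \\ 0 & 0 & \sigma^2 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \sigma^2 \end{bmatrix} = \sigma^2 I$$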



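The variance-covariance matrix II & homoskedasticity

$$E(uu'|X) = E\left( \begin{bmatrix} u_{11}u_{11} & u_{11}u_{12} & u_{11}u_{21} & u_{11}u_{22} & u_{11}u_{31} & u_{11}u_{32} \\ u_{12}u_{11} & u_{12}u_{12} & u_{12}u_{21} & u_{12}u_{22} & u_{12}u_{31} & u_{12}u_{32} \\ u_{21}u_{11} & u_{21}u_{12} & u_{21}u_{21} & u_{21}u_{22} & u_{21}u_{31} & u_{21}u_{32} \\ u_{22}u_{11} & u_{22}u_{12} & u_{22}u_{21} & u_{22}u_{22} & u_{22}u_{31} & u_{22}u_{32} \\ u_{31}u_{11} & u_{31}u_{12} & u_{31}u_{21} & u_{31}u_{22} & u_{31}u_{31} & u_{31}u_{32} \\ u_{32}u_{11} & u_{32}u_{12} & u_{32}u_{21} & u_{32}u_{22} & u_{32}u_{31} & u_{32}u_{32} \end{bmatrix} \,\middle|\, X \right) \Rightarrow$$

$$E(uu'|X) = \begin{bmatrix} \sigma^2 & 0 & 0 & 0 & 0 & 0 \\ 0 & \sigma^2 & 0 & 0 & 0 & 0 \\ 0 & 0 & \sigma^2 & 0 & 0 & 0 \\ 0 & 0 & 0 & \sigma^2 & 0 & 0 \\ 0 & 0 & 0 & 0 & \sigma^2 & 0 \\ 0 & 0 & 0 & 0 & 0 & \sigma^2 \end{bmatrix} = \sigma^2 I_{NT}$$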
OLS assumptions
Same assumptions apply to a panel regression as to standard OLS
regression in the form of c-s and t-s regression. But more issues pile up.
There cannot be any of the following (the condition that must hold is given for each):
1 Within cross-section heteroskedasticity, i.e. we need e.g. E(u11 u11) = E(u12 u12)
2 Within cross-section auto-correlation, i.e. we need e.g. E(u11 u12) = 0
3 Cross-sectional heteroskedasticity, i.e. we need e.g. E(u12 u12) = E(u21 u21)
4 Cross-sectional contemporaneous correlation, i.e. we need e.g. E(u11 u21) = 0
5 Cross-sectional auto-correlation, i.e. we need e.g. E(u11 u22) = 0



Treating departures from homoskedasticity and no auto-correlation
If within cross-section auto-correlation: Is the model correctly specified? Is a
dynamic model needed (lags and/or leads)?
If within cross-section heteroskedasticity: Transformation? Outliers?
If cross-sectional contemporaneous correlation: Two-way model?
Also use HAC standard errors.



HAC errors I

Clustering                   Symbol     Formula / Comment
None (White)                 V_WH       White errors
Cross Section Clustering     V_CX       Arellano (1987)
Time Clustering              V_CT       Arellano (1987)
Double Clustering            V_CXT      V_CXT = V_CX + V_CT − V_WH
Time Cluster + Shock         V_CT.L     V_CT.L = V_CT + Σ_{l=1}^{L} (V_CT.l + V′_CT.l)
None (Newey-West)            V_NW.L     V_NW.L = V_WH + Σ_{l=1}^{L} ω_l (V_WH.l + V′_WH.l)
Driscoll and Kraay           V_SCC.L    V_SCC.L = V_CT + Σ_{l=1}^{L} ω_l (V_CT.l + V′_CT.l)
Double Clustering + Shock    V_CXT.L    V_CXT.L = V_CT.L + V_CX − V_NW.L|ω=1



HAC errors II
The Arellano (1987)-based covariance matrix allows for both serial correlation
and heteroskedasticity
The White covariance matrix only allows for heteroskedasticity
Time clustering requires a longer t-s, T > 4, which is often not available in
business studies. So is time/double clustering feasible?
In addition to the above: Use HC0 for large samples and HC3/HC4 for small samples.
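
The difference between HC0 and HC3 can be sketched by hand for a plain cross-section. This is an illustration under assumed data (seed, sample size, and coefficients are hypothetical) and not the panel-clustered estimators in the table. HC0 weights each observation by its squared residual, whereas HC3 inflates that weight by 1/(1 − h)², where h is the observation's leverage.

```python
import numpy as np

rng = np.random.default_rng(4)                 # assumed seed
n = 40                                         # deliberately small sample
x = rng.uniform(1.0, 5.0, n)
u = rng.standard_normal(n) * x                 # heteroskedastic: error sd grows with x
y = 1.0 + 2.0 * x + u                          # hypothetical true model

X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
e = y - X @ beta_hat                           # OLS residuals
h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)    # leverage: diagonal of the hat matrix

def sandwich(weights):
    """(X'X)^-1 X' diag(weights) X (X'X)^-1."""
    meat = X.T @ (X * weights[:, None])
    return XtX_inv @ meat @ XtX_inv

se_hc0 = np.sqrt(np.diag(sandwich(e ** 2)))
se_hc3 = np.sqrt(np.diag(sandwich(e ** 2 / (1.0 - h) ** 2)))

print("HC0 standard errors:", se_hc0)
print("HC3 standard errors:", se_hc3)
```

HC3 never gives smaller standard errors than HC0, and the gap shrinks as the sample grows and leverage values fall towards zero.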



Causal inference and quantitative research
The event study method

Joachim Landstrom

Department of Business Studies, Uppsala University

December 9, 2024
Motivation
Introduction
Main structure

Recap: Bias from under-specification


Assumptions: β0 = 0, β1 = 2, γ = 2, xi ∼ N(1, 2), v ∼ N(0, 4), zi = 0.5xi + w,
where w ∼ N(0, 4), which gives ρX,z ≈ 0.32

Bias                                      Row     N = 10    N = 100    N = 1,000    N = 10,000
X′z                                       [1,]    14.091    16.022     685.989      4,612.025
                                          [2,]    73.745    149.272    3,365.101    25,015.443
(X′X)^−1 X′zγ (Omitted variables bias)    [1,]    -1.221    -0.613     -0.024       -0.097
                                          [2,]    3.515     0.790      1.352        1.024
(X′X)^−1 X′v̂ (Sampling bias)              [1,]    1.459     0.338      -0.112       -0.040
                                          [2,]    -0.809    0.083      0.064        0.014
β̂0 − β0                                           0.238     -0.275     -0.136       -0.137
β̂1 − β1                                           2.705     0.873      1.416        1.038

Landstrom, Joachim Causal inference and the event study method



Recap: The basic panel regression model

$$y_{it} = \beta_0 + X_{it}\beta + \underbrace{\eta_i + \delta_t}_{\text{Fixed effects}} + u_{it}$$

$\eta_i$ is a time-invariant constant unique for each cross-section
$\delta_t$ is a cross-sectional-invariant constant unique for each time
$X_{it}$ are covariates
$y_{it}$ is the dependent variable


Motivation
An event study is a quasi experiment where the event is the treatment
The research design isolates the treatment effect from other confounding
factors
The treatment may be temporary
Give the subject an electrical shock (RCT not quasi)
Threat of fine from misbehaving
Publication of some news
or permanent
A policy change such as the introduction of mandatory sustainability reporting
Introduction of import duties (Wooldridge, 2016, pp 347 – 350)
Isolation via observing the outcome both pre- and post-treatment
Useful when RCT cannot be used (or when unethical)
Transparent
Replicable
Well-established

Staggered and non-staggered rollout design


Staggered rollout: Events occur at different calendar times
Non-staggered rollout: All events occur at the same calendar time
Susceptibility to confounding variables:
Staggered rollout avoids the clustering effect and
possibly other calendar-time dependent confounding events
Event-time vs calendar-time


Event-time and event windows for an event study

T0 |-- Pre-event window --| T1 |-- Event window --| T2 |-- Post-event window --| T3


The counterfactual: What would have been?

In cause-and-effect studies we need to figure out what would have been true
had we not provided the ‘treatment’, or likewise had the subject not ‘experienced
the event’: the counterfactual
How to predict the counterfactual
Ignore it, which implies a random walk
Use pre-event window information, which makes it possible to account for e.g. a trend
Use post-event window information mixed together with pre-event window information
E.g. regress on pre-event data and use those regression parameters, the β̂, with
post-event data to get predictions
Often in capital market-based event studies: the market model
Then we study any deviation from the predicted counterfactual during
post-treatment
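
The regress-on-pre-event idea can be sketched with a simple linear trend. Everything below is hypothetical: the series, the seed, the event date, and the size of the treatment effect are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)                  # assumed seed
t = np.arange(40)
event_start = 30                                # hypothetical first post-treatment period
true_effect = 5.0                               # hypothetical permanent level shift

y = 10.0 + 0.5 * t + rng.normal(0.0, 0.3, t.size)   # outcome with a trend
y[event_start:] += true_effect                       # the treatment

# Estimate the trend using pre-event data only
pre = t < event_start
X_pre = np.column_stack([np.ones(pre.sum()), t[pre]])
beta_hat = np.linalg.lstsq(X_pre, y[pre], rcond=None)[0]

# Predict the counterfactual for the post-event window with the pre-event beta-hat
X_post = np.column_stack([np.ones((~pre).sum()), t[~pre]])
counterfactual = X_post @ beta_hat
deviation = y[~pre] - counterfactual            # estimated treatment effect per period

print("mean deviation from the counterfactual:", round(deviation.mean(), 3))
```

Ignoring the trend (the random-walk counterfactual) would credit the treatment with the ordinary upward drift as well, which is why the pre-event information matters here.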

Example: Non-staggered events


Example: Staggered events but with calendar time


Example: Staggered events with event time



Staggered rollout two-way dynamic fixed effect
Regression-based event study method
Examples

The regression-based event study method


1 Pooled OLS + Dummy(ies)
2 One-way fixed effect + Dummy(ies)
3 Two-way fixed effect + Dummy(ies) [a.k.a. TWFE — Try to google it + event
study]
4 More advanced methods to allow for various heterogeneity across e.g.
cross-sections
Dummies are often binary, but, generically, they can be continuous too.
Dummies are often permanent (the switch is turned on), but the switch may be
allowed to turn on and off too, i.e. not be permanent for the duration of the
post-event window.
This lecture builds on a binary & permanent dummy set-up

Two-way dynamic fixed effect design I

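$$y_{it} = \underbrace{\eta_i + \delta_t}_{\text{Fixed effects}} + \overbrace{\sum_{\tau=-T}^{-2} \alpha_\tau^{pre} D_{i\tau}}^{\text{Pre-event window}} + \underbrace{\alpha_0^{event} D_{i0}}_{\text{Event window}} + \overbrace{\sum_{\tau=1}^{T} \alpha_\tau^{post} D_{i\tau}}^{\text{Post-event window}} + \underbrace{\sum_{k=1}^{K} \beta_k x_{ik}}_{\text{Controls}} + u_{it}$$

where

$$D_{i\tau} = \begin{cases} 1 & \text{if } t = \tau \\ 0 & \text{if } t \neq \tau \end{cases}$$

$\alpha^{post}$ are the average treatment effects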
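
A simulated staggered rollout shows how the event-time dummies and the two sets of fixed effects fit together. This is a sketch under several assumptions that are not in the slides: the panel size, the rollout dates, a constant treatment effect, no control variables, a group of never-treated units, and τ = −1 as the omitted reference period (in line with the pre-event sum stopping at −2).

```python
import numpy as np

rng = np.random.default_rng(6)                      # assumed seed
n_units, n_periods = 60, 10                         # hypothetical panel
true_effect = 2.0                                   # hypothetical permanent effect

# Staggered rollout: units 0-39 are treated at calendar time 4, 5 or 6,
# units 40-59 are never treated (event_time = -1)
event_time = np.full(n_units, -1)
event_time[:40] = np.tile([4, 5, 6], 14)[:40]

unit = np.repeat(np.arange(n_units), n_periods)     # unit index per observation
time = np.tile(np.arange(n_periods), n_units)       # calendar time per observation
ever_treated = event_time[unit] >= 0
rel = np.where(ever_treated, time - event_time[unit], -999)   # event time tau

eta = rng.standard_normal(n_units)                  # unit fixed effects
delta = rng.standard_normal(n_periods)              # time fixed effects
y = eta[unit] + delta[time] + true_effect * (rel >= 0) + rng.normal(0.0, 0.1, unit.size)

# Event-time dummies for every observed tau except the reference tau = -1
taus = [tau for tau in range(-6, 6) if tau != -1]
D = np.column_stack([(rel == tau).astype(float) for tau in taus])

# Two-way fixed effects via dummies (first period dropped to avoid collinearity)
unit_dummies = (unit[:, None] == np.arange(n_units)).astype(float)
time_dummies = (time[:, None] == np.arange(1, n_periods)).astype(float)
X = np.column_stack([D, unit_dummies, time_dummies])

coef = np.linalg.lstsq(X, y, rcond=None)[0]
alpha = dict(zip(taus, coef[:len(taus)]))
for tau in taus:
    print(f"tau = {tau:+d}: alpha = {alpha[tau]:.3f}")
```

The pre-event coefficients should hover around zero and the event and post-event coefficients around the assumed effect. The never-treated units matter for the design: they let the time fixed effects be separated from the event-time dummies.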

Simplifications to the two-way dynamic fixed effect design


A simple research design may ignore fixed effects, see e.g. Wooldridge, 2016,
Example 10.5, pp 349 – 350
It may also have only a single dummy for each window (pre-event, event,
post-event). Again see Example 10.5 in Wooldridge.
Or maybe even a single treatment dummy
Simplifications can thus be e.g.:
$Y_{it} = \beta_0 + \alpha D_i + u_{it}$, i.e. pooled OLS with a single treatment dummy
$Y_{it} = \beta_0 + \alpha_0 D_{i0} + \sum_{\tau=1}^{T} \alpha_\tau D_{i\tau} + u_{it}$, i.e. pooled OLS with event & post-event dummies
$Y_{it} = \beta_0 + \alpha_{pre} D_{i,pre} + \alpha_{event} D_{i,event} + \alpha_{post} D_{i,post} + u_{it}$


Research design issues


Cohorts pre- and post- may have changed
Knowledge of forthcoming treatment already in pre-event window
Parallel-trend assumption
Assumption of the same treatment effect across the cross-sections


Case: Punishment to promote prosocial behavior


Experiment to increase waste separation.
Staggered randomized rollout route-by-route across 65 routes in Tilburg 2014
– 2015, affecting 70,000 households.
1 Three weeks prior to the start, households were informed that it was illegal not to separate
their waste and that they could be fined
2 One month of visual inspection of each garbage can on the focal street(s). Warning
labels were attached if separation was breached.
Persistent waste reduction of 10-15 percent
Vollaard, B. & van Soest, D. (2024). "Punishment to promote prosocial behavior: a
field experiment", Journal of Environmental Economics and Management, vol. 124.


Residual (i.e. unsorted) waste change


Case: Social media and mental health


Staggered quasi experiment: Rollout of Facebook across US Colleges
Significant deterioration of mental health from social media
Braghieri, L., Levy, R., & Makarin, A. (2022). Social Media and Mental Health. The
American Economic Review, 112(11), 3660–3693. DOI


Case: Social media and mental health


Case: Predicting philanthropic contributions (static fixed effect)


How do mega-events and natural disasters affect philanthropic
contributions in the US?
(Presented separately)
Tilcsik, A., & Marquis, C. (2013). Punctuated Generosity: How Mega-events and
Natural Disasters Affect Corporate Philanthropy in U.S. Communities.
Administrative Science Quarterly, 58(1), 111–148. DOI



Introduction
Capital-market based event studies Example from marketing
An econometric skeleton of a capital market-based event study Example from organization studies
Finance and accounting

Capital-market based event studies


Using financial market data, an event study measures the impact of a specific
event on the value of a firm.
The usefulness of event studies arises from the fact that the magnitude of
abnormal performance at the time of an event provides a measure of the
(unanticipated) impact of this type of event on the wealth of the firms’
claimholders.
Event studies also serve an important purpose in capital market research as
the principal means of testing market efficiency.
Systematically non-zero abnormal security returns that persist after a particular
type of corporate event are inconsistent with market efficiency.
Event studies are often seen outside of mainstream accounting and finance
journals — as I am about to show you

Two-versions of an event study research design


A ‘pure’ event study — from here on this is what I call an event study
Portfolio sorts — next lecture


Outcome and counterfactuals in an event study

The outcome variable is often called CAR, the cumulative abnormal return
The counterfactual is the expected return
The expected return often uses post-event window information mixed together
with pre-event window information


Example from marketing: Gao et al. (2015)


Pre-recall advertising spending can moderate the financial damage from
product recalls (not hazard recalls)
Based on automotive recalls for the period 2005–2012
Gao, H., Xie, J., Wang, Q., & Wilbur, K. C. (2015). Should Ad Spending Increase
or Decrease Before a Recall Announcement? The Marketing-Finance Interface in
Product-Harm Crisis Management. Journal of Marketing, 79(5), 80–99. DOI


Example from marketing: Gao et al. (2015)


Example from organization studies


See separate pdfs


Event studies in accounting and finance

Kothari and Warner (2007) find that between 1974 and 2000, 565 event-study
papers were published in five journals.
The basic statistical format of event studies has not changed.
Two changes to the method:
Daily return data are used instead of monthly data
The methods to estimate abnormal returns have improved.



Event studies and market efficiency
Steps in a typical capital-market based event study
Capital-market based event studies
Expected and abnormal returns
An econometric skeleton of a capital market-based event study
How to aggregate the abnormal returns
Standard T-tests of CAR

Event studies and market efficiency


Event studies assume that the market is highly efficient.
Hence share prices reflect all available information.
If the release of new information is random, it follows that share prices should
follow a random walk, and the buying and selling of shares are zero-NPV
transactions.
If the share performs non-randomly, contingent on the release of new
information, it tells us that the event has an effect on security values.
The release of new information should be immediately reflected in share
prices.
The speed at which the market reacts informs us about the degree of market
efficiency.

Steps in a typical capital-market based event study


1 Define the event and establish the event window (T1 ≤ τ < T2).
2 Define the estimation window (T0 ≤ τ < T1).
3 Define the post-event window (T2 ≤ τ < T3).
4 Establish the firm selection criteria. Make sure that the shares are frequently
traded during the event window.
5 Estimate the model parameters using data in the estimation window.
6 Measure the abnormal returns for the shares in the sample.
7 Conduct tests. Define null and alternative hypotheses. Measure the abnormal
returns. Determine levels of significance for tests.
8 Present results and diagnostics.
9 Interpret results and draw inferences and conclusions.

Recall: Time-line for an event study

T0 |-- Estimation window --| T1 |-- Event window --| T2 |-- Post-event window --| T3


Sample selection criteria

Samples selected on common characteristics such as market cap, prior
returns, book-to-market ratio, and earnings-price ratio can be severely biased.
Such samples can easily lead to imprecise and potentially erroneous
inferences.


Set the event window, the post-event window, and the estimation
window

Event studies standardise the share price reactions by measuring the
reaction relative to the date of the event.
We identify the event date (announcement date, τ).
The event window is typically larger than just the announcement date. It starts
before and ends after it (it is often 2–3 days). Studies are usually more effective
with a short event window.


Set the event window, the post-event window, and the estimation
window, continued

We then (often) trace the share price reaction over a period that we call the
post-event window. It typically begins shortly after the event window.
A short window is typically less than a year.
A long window is a year or longer.
We often observe the share price behaviour of a period before the event
window. This is the estimation window, and may extend a year back in time. It
typically ends shortly before the event window begins.
The estimation window and the post-event window are sometimes combined
to build a broader estimation window.


Abnormal returns

Appraisal of the event’s impact requires a measure of the abnormal return
The abnormal return is the actual ex post return of the share over the event
window minus the expected return of the firm over the event window.
The normal return is defined as the expected return without conditioning on
the event taking place.

For firm i and event date τ the abnormal return is:

$$AR_{i\tau} = R_{i\tau} - \hat{R}_{i\tau}$$


Expected returns

Model choices for estimating the expected return:


1 The constant return model
2 The market model
3 CAPM and versions thereof
4 Sorts


Measuring the return

The total return including dividends

$$\log(R_{it}) = \log\left[\frac{P_{it} + D_{it}}{P_{i,t-1}}\right]$$

$R_{it}$ = Return on share i during period t
$P_{it}$ = Price of share i at the end of period t
$D_{it}$ = Dividend (ex dividend, not necessarily paid) on share i during period t
Logarithmic returns generally produce better estimates than
arithmetic returns.
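
The return calculation can be sketched as follows. The prices and dividends are made-up numbers, used only to show the mechanics.

```python
import numpy as np

price = np.array([100.0, 102.0, 101.0, 105.0])     # hypothetical end-of-period prices
dividend = np.array([0.0, 0.0, 1.0, 0.0])          # hypothetical dividends per period

gross = (price[1:] + dividend[1:]) / price[:-1]    # (P_t + D_t) / P_{t-1}
log_ret = np.log(gross)                            # logarithmic total return
arith_ret = gross - 1.0                            # arithmetic total return, for comparison

print("log returns       :", log_ret)
print("arithmetic returns:", arith_ret)
print("sum of log returns:", log_ret.sum())
```

Log returns add up across periods, so the sum over a window is the log of the compounded gross return, which is convenient when returns are later cumulated.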


The constant return model

Assumes that the mean return for a share (µ̄i ) is constant.


Often performs as well as more sophisticated models.
For daily data the model use nominal returns.
For monthly data the model can use e.g. real returns or excess returns.
Excess returns are return differences to a short-duration risk-free return.

The constant return model


\[ R_{i\tau} = \bar{\mu}_i + \zeta_{i\tau}, \quad \text{where } E(\zeta_{i\tau}) = 0 \text{ and } \operatorname{var}(\zeta_{i\tau}) = \sigma^2_{\zeta_i} \]


The market model


The market model is a statistical model that relates the return of a share to
the return of the market portfolio.
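For share i the market model is:
The market model
\[ R_{it} = \alpha_i + \beta_i R_{mt} + \epsilon_{it}, \quad \text{where } E(\epsilon_{it}) = 0 \text{ and } \operatorname{var}(\epsilon_{it}) = \sigma^2_{\epsilon_i} \]
\[ \hat{R}_{i\tau} = \hat{\alpha}_i + \hat{\beta}_i R_{m\tau} \]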

The market model parameters are usually estimated by OLS of Rit on Rmt
over the estimation window.
The choice of market index can be important. Several studies report problems
with choosing a value-weighted index. An equally-weighted index seems
preferable.
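
The estimation step can be sketched as follows: fit the intercept and slope by OLS over the estimation window, then subtract the fitted return in the event window. This is an illustrative sketch only; the function names and all return figures are hypothetical, and the estimation-window data are constructed to lie exactly on a line so the fitted parameters are easy to verify.

```python
from statistics import mean

def fit_market_model(r_i, r_m):
    """OLS of the share return on the market return: returns (alpha_hat, beta_hat)."""
    m_bar, i_bar = mean(r_m), mean(r_i)
    sxx = sum((x - m_bar) ** 2 for x in r_m)
    sxy = sum((x - m_bar) * (y - i_bar) for x, y in zip(r_m, r_i))
    beta = sxy / sxx
    alpha = i_bar - beta * m_bar
    return alpha, beta

def abnormal_returns(r_i, r_m, alpha, beta):
    """AR = realised return minus the market-model expected return."""
    return [y - (alpha + beta * x) for x, y in zip(r_m, r_i)]

# Hypothetical estimation window: share returns lie exactly on 0.001 + 1.5 * Rm.
est_rm = [0.010, -0.005, 0.007, 0.002, -0.012, 0.004]
est_ri = [0.001 + 1.5 * x for x in est_rm]
alpha_hat, beta_hat = fit_market_model(est_ri, est_rm)

# Hypothetical event window: a 2% jump on the first day, nothing on the second.
evt_rm = [0.003, -0.002]
evt_ri = [0.001 + 1.5 * 0.003 + 0.02, 0.001 + 1.5 * -0.002]
ar = abnormal_returns(evt_ri, evt_rm, alpha_hat, beta_hat)
```

With real data the estimation-window residuals are not zero, and their variance is what the later t-tests rely on.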

The market model

By removing the portion of the return that is related to variation in the market’s
return, the variance of the abnormal return is reduced.
This in turn can lead to increased ability to detect event effects.
The benefit from using the market model will depend upon the R² of the
market model regression. The higher the R², the greater the variance
reduction of the abnormal return, and the larger the gain.
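
A short worked relation may make the role of R² concrete. Assuming the market model is estimated by OLS, so that the residual is uncorrelated with the market return, the variance of the abnormal return for share i is

\[ \operatorname{var}(AR_{it}) = \sigma^2_{\epsilon_i} = (1 - R_i^2)\,\operatorname{var}(R_{it}) \]

As an illustration with a made-up value, a regression with R² = 0.3 leaves the abnormal-return variance at 70% of the raw return variance, which is the variance the constant return model would have to work with.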


The market adjusted model

The market adjusted model ignores risk


According to Kothari & Warner (2007, p. 21), this works well for short-window
event studies.
For share i the market adjusted model is:

The market adjusted model


R̂iτ = Rmτ


CAPM

CAPM
R̂iτ = rf τ + β̂i × (Rmτ − rf τ )

Common in the 1970s.


CAPM is nowadays seen as “dead”.


Fama and French’s 3-factor model


Fama and French’s 3-factor model
R̂iτ − rfτ = âi + b̂1i × (Rmτ − rfτ) + b̂2i × SMBτ + b̂3i × HMLτ

Generally, the gains from employing multifactor models for event studies are
limited (Campbell, Lo, MacKinlay, 1997, p. 156). The reason is the empirical
fact that the marginal explanatory power of factors beyond the market factor
is small, and hence there is little reduction in the variance of the
abnormal return.
The variance reduction will typically be greatest in cases where the sample
firms have a common characteristic, for example when they are all members of one
industry or are all concentrated in one market capitalisation group.
In these cases the use of a multifactor model warrants consideration.

Sorts
Suppose that there are two factors that affect returns: size and B/M. We do
not know whether the relationship is stable or linear, as the FF 3-factor
model specifies. What do we do?
1 Sort all stocks in the sample into deciles according to size.
2 Conditional on size, sort the stocks in each size decile into deciles according
to B/M. (This gives us 100 portfolios.)
3 Compute the average return of each of the 100 portfolios for each period. This
gives us the expected return of a stock given its characteristics.
4 For each stock in the event study:
1 Find which size decile it belongs to.
2 Find which B/M decile it belongs to.
3 Compare the return of the stock to the corresponding portfolio return.
4 The deviation is the abnormal return.
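
The steps above can be sketched for a single period as follows. This is an illustrative sketch under simplifying assumptions: it uses a 2 × 2 sort instead of 10 × 10 deciles to keep the example small, bins are assigned by rank, and all function names and figures are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def assign_bins(values, n_bins):
    """Rank-based bins 0..n_bins-1 with (near-)equal counts."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    bins = [0] * len(values)
    for rank, k in enumerate(order):
        bins[k] = rank * n_bins // len(values)
    return bins

def conditional_double_sort(size, bm, n_bins):
    """Sort on size first, then on B/M within each size bin."""
    size_bin = assign_bins(size, n_bins)
    bm_bin = [0] * len(size)
    for s in range(n_bins):
        members = [k for k, b in enumerate(size_bin) if b == s]
        inner = assign_bins([bm[k] for k in members], n_bins)
        for k, b in zip(members, inner):
            bm_bin[k] = b
    return list(zip(size_bin, bm_bin))

def benchmark_returns(cells, returns):
    """Average return of each (size bin, B/M bin) portfolio."""
    grouped = defaultdict(list)
    for cell, r in zip(cells, returns):
        grouped[cell].append(r)
    return {cell: mean(rs) for cell, rs in grouped.items()}

# Hypothetical one-period cross-section of eight stocks.
size = [10, 20, 30, 40, 50, 60, 70, 80]
bm = [0.5, 1.5, 0.7, 1.2, 0.3, 0.9, 1.1, 0.4]
ret = [0.02, 0.04, 0.01, 0.03, 0.00, 0.02, 0.03, -0.01]

cells = conditional_double_sort(size, bm, n_bins=2)
bench = benchmark_returns(cells, ret)
# Abnormal return = stock return minus its matched portfolio return.
abnormal = [r - bench[c] for r, c in zip(ret, cells)]
```

In an actual event study the benchmark portfolios would be formed from the full sample each period, and only the event stocks would be compared against them.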


Sorts

Sorts give more conservative results than the FF 3-factor model.


If we use the FF 3-factor method, we tend to find huge abnormal returns,
while with the sorts we do not.
Results change if we sort first by B/M and then by size (not good).
Results change if we sort according to other characteristics.


Cumulative abnormal returns


The abnormal return observations must be aggregated in order to draw
overall inferences for the event of interest. The aggregation is along two
dimensions—through time and across events.
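\[ AR_{it} = R_{it} - \hat{R}_{it} = R_{it} - \hat{\alpha}_i - \hat{\beta}_i \times R_{mt} \]
\[ \overline{AR}_t = \frac{1}{N} \times \sum_{i=1}^{N} AR_{it} \]
\[ \hat{\sigma}^2_{\epsilon_i} = \frac{1}{T-1} \times \sum_{t=1}^{T} AR_{it}^2 \quad \text{(market-adjusted model)} \]
\[ \hat{\sigma}^2_{\epsilon_i} = \frac{1}{T-2} \times \sum_{t=1}^{T} AR_{it}^2 \quad \text{(market model), which means that it is the regression's MSE} \]
\[ \hat{\sigma}^2_{\epsilon_i} = \frac{1}{T-4} \times \sum_{t=1}^{T} AR_{it}^2 \quad \text{(FF3F model), which means that it is the regression's MSE} \]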

Cumulative abnormal returns, cont.

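\[ \operatorname{var}(\overline{AR}_t) = \frac{1}{N^2} \times \sum_{i=1}^{N} \hat{\sigma}^2_{\epsilon_i} = \frac{1}{N^2} \times \sum_{i=1}^{N} MSE_i \]
where N is the number of events.
\[ CAR = \sum_{t=1}^{L} \overline{AR}_t \]
\[ \operatorname{var}(CAR) = \sum_{t=1}^{L} \operatorname{var}(\overline{AR}_t) = L \times \operatorname{var}(\overline{AR}_t) \]
where L is the length of the event window, so both sums run over the L periods
of the event window.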


T-tests on CAR

T-statistics for CAR


\[ \hat{t} = \frac{CAR}{\sqrt{\operatorname{var}(CAR)}} \]

t is distributed Student-t with T − 1 degrees of freedom
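
The aggregation and test can be sketched end to end as below. This is a minimal sketch, assuming that each event's residual variance (its regression MSE) has already been estimated over the estimation window and that abnormal returns are independent across events and periods. The function name and all figures are hypothetical.

```python
import math

def car_t_stat(event_ars, mse):
    """Average abnormal returns, CAR and its t-statistic.

    event_ars[i][t]: abnormal return of event i in event-window period t.
    mse[i]: residual variance of event i from its estimation-window regression.
    """
    n = len(event_ars)             # number of events, N
    length = len(event_ars[0])     # event-window length, L
    aar = [sum(ar[t] for ar in event_ars) / n for t in range(length)]
    car = sum(aar)
    var_aar = sum(mse) / n ** 2    # var of the cross-sectional average AR
    var_car = length * var_aar     # L times the per-period variance
    return aar, car, car / math.sqrt(var_car)

# Three hypothetical events observed over a three-period event window.
event_ars = [
    [0.01, 0.03, 0.00],
    [0.00, 0.02, 0.01],
    [-0.01, 0.04, 0.02],
]
mse = [0.0004, 0.0004, 0.0004]

aar, car, t_hat = car_t_stat(event_ars, mse)
```

The resulting statistic would then be compared against the Student-t critical value for the chosen significance level.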



Causal inference and quantitative research
The portfolio sorts method

Joachim Landstrom

Department of Business Studies, Uppsala University

December 18, 2024


Introduction

Recap: Omitted variables

Recall:
\[ \hat{\beta} = \beta + \underbrace{(X'X)^{-1} X'\hat{u}}_{\text{Sampling error}} \]

Suppose the correct model is: y = Xβ + zγ + v


and the (incorrect) model used is: y = Xβ + u
From this it follows that u = zγ + v, and
\[ \hat{\beta} - \beta = \underbrace{(X'X)^{-1} X'z\gamma}_{\text{Omitted var bias}} + \underbrace{(X'X)^{-1} X'\hat{v}}_{\text{Sampling error}} \]

Landstrom, Joachim Causal inference and portfolio sorts


Introduction

Recap: Bias from omitted variables


Assumptions: β0 = 0, β1 = 2, γ = 2, xi ∼ N(1, 2), v ∼ N(0, 4), zi = 0.5xi + w,
where w ∼ N(0, 4), which gives ρX,z ≈ 0.32

Bias             Row     N = 10    N = 100   N = 1,000   N = 10,000
X′z              [1,]    14.091     16.022     685.989    4,612.025
                 [2,]    73.745    149.272   3,365.101   25,015.443
(X′X)⁻¹X′zγ      [1,]    -1.221     -0.613      -0.024       -0.097
                 [2,]     3.515      0.790       1.352        1.024
(X′X)⁻¹X′v̂       [1,]     1.459      0.338      -0.112       -0.040
                 [2,]    -0.809      0.083       0.064        0.014
β̂0 − β0                   0.238     -0.275      -0.136       -0.137
β̂1 − β1                   2.705      0.873       1.416        1.038
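
A small simulation in the spirit of the table can be sketched as follows. It is not the code behind the table: the second argument of N(·, ·) is assumed to be a variance (which is consistent with ρ ≈ 0.32), the seed and sample size are arbitrary, and the function names are hypothetical. Under these assumptions the slope bias should approach γ × 0.5 = 1 as N grows.

```python
import random
from statistics import mean

def ols_simple(x, y):
    """OLS of y on a constant and x: returns (intercept, slope)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((a - x_bar) ** 2 for a in x)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def simulate_bias(n, seed=0):
    """Estimate y = b0 + b1*x + u while omitting z; return the estimation errors."""
    rng = random.Random(seed)
    beta0, beta1, gamma = 0.0, 2.0, 2.0
    # random.gauss takes a standard deviation, so variance 2 -> sd sqrt(2),
    # and variance 4 -> sd 2 (variance interpretation is an assumption).
    x = [rng.gauss(1.0, 2.0 ** 0.5) for _ in range(n)]
    z = [0.5 * a + rng.gauss(0.0, 2.0) for a in x]
    y = [beta0 + beta1 * a + gamma * c + rng.gauss(0.0, 2.0)
         for a, c in zip(x, z)]
    b0_hat, b1_hat = ols_simple(x, y)
    return b0_hat - beta0, b1_hat - beta1

bias0, bias1 = simulate_bias(20000)
```

Increasing N shrinks the sampling error but leaves the omitted-variable bias in the slope in place, which is the pattern the last row of the table shows.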


Recap: The capital market-based event study method


The treatment is the event
The market is assumed efficient
Thus pre- and post-event returns are assumed to follow a random walk.
The counterfactual is all non-treatment returns, scaled for risk.
In practice we use a market index as a proxy for the counterfactual, either controlling
for differences in risk or not.
The outcome is the spread between the treatment outcome and the expected
outcome, which is measured as above.
This difference is aggregated in different ways depending on the study setup, e.g.
as AAR, CAR, CAAR (also known as ACAR), or perhaps as (Jensen’s) alpha
[today’s topic], or simply as a hedge portfolio return [also today’s topic].


A classic capital market-based event-study set-up



The calendar-time portfolio

The calendar-time portfolio: An orderly world


Rather than stacking all events into ‘event time’, we may form a so-called
calendar-time portfolio.
All stocks that have experienced an event in a given time period are included
in a portfolio.
The portfolio is re-evaluated (re-weighted) at certain intervals.
Stocks are pruned from the portfolio if they had no event in the current time period.
New stocks that have experienced the event in the current time period
are added.
Portfolio returns are calculated and often evaluated using the portfolio’s
(Jensen’s) alpha.

rp,t = rf ,t + β1 (rm,t − rf ,t ) ⇒
rp,t − rf ,t = β0 + β1 (rm,t − rf ,t ) + ut , where β0 = Jensen’s α
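
The portfolio formation and the alpha regression can be sketched as below. This is an illustrative sketch under several assumptions: the portfolio is equal-weighted, a stock is held for a fixed number of periods after its event, all returns are already in excess of the risk-free rate, and the function names, event dates and return figures are hypothetical.

```python
from statistics import mean

def calendar_time_returns(stock_excess, event_period, holding, n_periods):
    """Equal-weighted excess return of all stocks whose event occurred within
    the last `holding` periods; None when the portfolio is empty."""
    portfolio = []
    for t in range(n_periods):
        held = [s for s, e in event_period.items() if e < t <= e + holding]
        if held:
            portfolio.append(mean([stock_excess[s][t] for s in held]))
        else:
            portfolio.append(None)
    return portfolio

def jensens_alpha(port_excess, mkt_excess):
    """OLS of portfolio excess return on market excess return: (alpha, beta)."""
    pairs = [(m, p) for m, p in zip(mkt_excess, port_excess) if p is not None]
    m_bar = mean([m for m, _ in pairs])
    p_bar = mean([p for _, p in pairs])
    sxx = sum((m - m_bar) ** 2 for m, _ in pairs)
    sxy = sum((m - m_bar) * (p - p_bar) for m, p in pairs)
    beta = sxy / sxx
    return p_bar - beta * m_bar, beta

# Hypothetical data: six periods, three stocks with events in periods 0, 1 and 3.
mkt = [0.010, -0.020, 0.015, 0.005, -0.010, 0.020]
stock_excess = {
    "A": [0.001 + m for m in mkt],
    "B": [0.003 + m for m in mkt],
    "C": [0.002 + m for m in mkt],
}
event_period = {"A": 0, "B": 1, "C": 3}

port = calendar_time_returns(stock_excess, event_period, holding=2, n_periods=6)
alpha, beta = jensens_alpha(port, mkt)
```

Because stocks enter and leave over time, the portfolio's composition changes from period to period, and the regression is run on the resulting single time series.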


Jensen’s alpha — and its children: From order to chaos


The model assumes that only beta risk is priced (a 1-factor model).
Then any significant non-zero alpha is an ‘anomaly’ (against the asset-pricing
model, and against the belief in an efficient market).
This is the so-called joint hypothesis problem.
The joint hypothesis problem:
Is the market inefficient, or
is the asset-pricing model incorrect, or
is the market inefficient and the asset-pricing model incorrect?
There is no way to tell which is the reason.
Belief in market efficiency:
Any anomaly is a sign of an incorrect asset-pricing model: An omitted variable!
Based on Ross’ arbitrage pricing theory (APT).
The ‘hunt’ for factors has been on since Fama-French (1992).
Today multi-factor models (4/5 factors) are used on US data. Less so in
Europe due to a lack of easily available factor models.

CAPM is dead: From chaos to order — APT, the ‘new kid on the block’

Cochrane, J. H. (April 2011), Discount Rates. NBER Working Paper No. w16972.
Portfolio sorts: Introduction
Regression & portfolio sorts
The portfolio sort process

Cross-sectional regression & Non-linearity


Cross-sectional regression & portfolio sorts

Cochrane, J. H. (April 2011), Discount Rates. NBER Working Paper No. w16972.

Portfolio sorts

Cross-sectional regression:
rit − rft = α + xit β + uit , where
x ∈ {rm − rf , size, bm, momentum, accruals, roe, . . . }

Portfolio sorts:                          Cross-sectional regression:
Allow for a non-linear relationship       Requires a linear relationship
Sort on one or two parameters             Allows for multivariate regression
Easy to handle inclusion/exclusion        Difficult to handle inclusion/exclusion
Allow for a hedge portfolio               Does not allow for a hedge portfolio

Cross-sectional regression → Panel regressions → TWFE

rit − rft = α + xit β + uit ⇒


rit − rft = xit β + ηi + µt + uit

Portfolio sorts are difficult to apply with more than two factors, but
cross-sectional regressions may suffer from omitted variables, so
a solution is, maybe, to allow for panel fixed effects, and
perhaps we should start to apply a two-way fixed effects (TWFE) event study
setup in the future.



Portfolio sorts: Introduction
Examples of research
The portfolio sort process

The sorting process I


1 Firms/stocks are sorted on some parameter:
Customer Satisfaction/Advertising exp/ESG/ROA/Size/BM/Mom/Accruals, . . .
2 Stocks are sorted into bins/groups/portfolios based on breakpoints (often 3, 5,
10 bins)
3 Average returns are measured during a period
4 For longer durations: Re-sorting at specific intervals to handle
inclusion/exclusion
Weekly/monthly/quarterly/semi-annually/annually
5 Often a High/Low difference bin is created and statistically tested.
In-between portfolios are often ignored. But, really? Why?
Should we not see a monotonic change between bins?
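
The sorting steps above can be sketched for one period as follows. This is an illustrative sketch only: bins are assigned by rank into equal-sized groups, the monotonicity check is a plain pairwise comparison of bin averages rather than a formal statistical test, and the function names and all figures are hypothetical.

```python
from statistics import mean

def sort_into_bins(signal, n_bins):
    """Rank-based bins 0 (low) .. n_bins-1 (high) with (near-)equal counts."""
    order = sorted(range(len(signal)), key=lambda k: signal[k])
    bins = [0] * len(signal)
    for rank, k in enumerate(order):
        bins[k] = rank * n_bins // len(signal)
    return bins

def bin_average_returns(bins, returns, n_bins):
    """Equal-weighted average return of each bin, ordered low to high."""
    return [mean([r for b, r in zip(bins, returns) if b == g])
            for g in range(n_bins)]

# Hypothetical sorting parameter and subsequent returns for nine stocks.
signal = [3, 1, 2, 9, 7, 8, 5, 4, 6]
ret = [0.01, 0.00, 0.02, 0.05, 0.04, 0.06, 0.03, 0.02, 0.04]

bins = sort_into_bins(signal, n_bins=3)
avg = bin_average_returns(bins, ret, n_bins=3)

# High/Low difference bin: return of the top bin minus the bottom bin.
high_minus_low = avg[-1] - avg[0]

# Crude check of whether average returns rise from bin to bin.
monotonic = all(lo < hi for lo, hi in zip(avg, avg[1:]))
```

Looking at every bin average, not only the two extremes, is what makes the monotonicity question visible in the first place.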


Is there a monotonic change?

Patton, A. & Timmermann (2008)


An implied trading strategy


In the High/Low difference bin, there is an implied return to a trading strategy.
Go long in the high bin and short the low bin.
But can you really short the low bin?
Are those stocks available to short sellers?
What is the transaction cost of such a trading strategy?
The higher the rebalancing frequency, the costlier it becomes.


The sorting process II — Unconditional/Conditional sorts


Two main versions of sorts:
Unconditional sorts:
Each parameter is sorted independently.
The sorts are then combined (as we will do in the workshop).
Sort order is unimportant.
Conditional sorts:
First sort on the first parameter into bins.
Then sort each of those bins on the second parameter into sub-bins.
Sort order becomes important.
Several other sorting variations exist that can be used, e.g. when the
parameter seems to be skewed.
See Cerniglia, J.A., Kolm, P.N., Fabozzi, F.J., (2013), pp. 214–219, for such a
discussion with nice examples.
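
The difference between the two versions can be sketched as follows. The example is deliberately extreme: the two hypothetical parameters are perfectly rank-correlated, so the unconditional sort leaves two of the four cells empty while the conditional sort fills all cells evenly. Function names and data are illustrative assumptions.

```python
from collections import Counter

def rank_bins(values, n_bins):
    """Rank-based bins 0..n_bins-1 with (near-)equal counts."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    bins = [0] * len(values)
    for rank, k in enumerate(order):
        bins[k] = rank * n_bins // len(values)
    return bins

def unconditional_sort(a, b, n_bins):
    """Sort each parameter independently, then combine the two bin labels."""
    return list(zip(rank_bins(a, n_bins), rank_bins(b, n_bins)))

def conditional_sort(a, b, n_bins):
    """Sort on a first, then on b within each bin of a."""
    first = rank_bins(a, n_bins)
    second = [0] * len(a)
    for g in range(n_bins):
        members = [k for k, f in enumerate(first) if f == g]
        inner = rank_bins([b[k] for k in members], n_bins)
        for k, s in zip(members, inner):
            second[k] = s
    return list(zip(first, second))

# Hypothetical parameters for eight stocks, perfectly rank-correlated.
size = [1, 2, 3, 4, 5, 6, 7, 8]
bm = [10, 20, 30, 40, 50, 60, 70, 80]

uncond_counts = Counter(unconditional_sort(size, bm, n_bins=2))
cond_counts = Counter(conditional_sort(size, bm, n_bins=2))

# Swapping the order only transposes the labels of an unconditional sort.
swapped = unconditional_sort(bm, size, n_bins=2)
order_irrelevant = swapped == [(y, x) for x, y in unconditional_sort(size, bm, 2)]
```

With less extreme correlation the unconditional cells are merely unevenly populated, but the same trade-off between balanced cells and order dependence applies.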


The Stock Market’s Pricing of Customer Satisfaction


Does the stock market react to the publication of the US Customer Satisfaction
Index?
Period: Q1 1995 – Q4 2006
Ittner, C., Larcker, D., & Taylor, D. (2009). Commentary–The Stock Market’s
Pricing of Customer Satisfaction. Marketing Science (Providence, R.I.), 28(5),
826–835.


The Stock Market’s Pricing of Customer Satisfaction
