Discrete Choice Analysis II
Moshe Ben-Akiva
1.201 / 11.545 / ESD.210
Transportation Systems Analysis: Demand & Economics
Fall 2008
Review – Last Lecture
● Introduction to Discrete Choice Analysis
● A simple example – route choice
● The Random Utility Model
– Systematic utility
– Random components
● Derivation of the Probit and Logit models
– Binary Probit
– Binary Logit
– Multinomial Logit
Outline – This Lecture
● Model specification and estimation
● Aggregation and forecasting
● Independence from Irrelevant Alternatives (IIA) property –
Motivation for Nested Logit
● Nested Logit - specification and an example
● Appendix:
– Nested Logit model specification
– Advanced Choice Models
Specification of Systematic Components
● Types of Variables
– Attributes of alternatives: Zin, e.g., travel time, travel cost
– Characteristics of decision-makers: Sn, e.g., age, gender, income,
occupation
– Therefore: Xin = h(Zin, Sn)
● Examples:
– Xin1 = Zin1 = travel cost
– Xin2 = log(Zin2) = log (travel time)
– Xin3 = Zin1/Sn1 = travel cost / income
● Functional Form: Linear in the Parameters
$$V_{in} = \beta_1 X_{in1} + \beta_2 X_{in2} + \dots + \beta_K X_{inK}$$
$$V_{jn} = \beta_1 X_{jn1} + \beta_2 X_{jn2} + \dots + \beta_K X_{jnK}$$
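The specification above can be sketched in Python as follows. This is a minimal illustration, not part of the original slides: the function names, the input values, and the coefficient values are hypothetical and chosen only to show how h(Zin, Sn) and the linear-in-parameters form fit together.

```python
import math

def build_x(z_cost, z_time, s_income):
    """Return [X_in1, X_in2, X_in3] from attributes Z_in and characteristics S_n."""
    return [
        z_cost,             # X_in1 = Z_in1 = travel cost
        math.log(z_time),   # X_in2 = log(Z_in2) = log(travel time)
        z_cost / s_income,  # X_in3 = Z_in1 / S_n1 = travel cost / income
    ]

def systematic_utility(beta, x):
    """Linear-in-parameters utility: V_in = sum over k of beta_k * X_ink."""
    return sum(b * xk for b, xk in zip(beta, x))

# Hypothetical coefficients and inputs, for illustration only.
beta = [-0.1, -1.0, -2.0]
x = build_x(z_cost=2.0, z_time=math.e, s_income=40.0)
v = systematic_utility(beta, x)  # -0.1*2.0 - 1.0*1.0 - 2.0*0.05
```

The utility stays linear in β even though the variables themselves are nonlinear transformations of the raw data.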
Data Collection
● Data collection for each individual in the sample:
– Choice set: available alternatives
– Socio-economic characteristics
– Attributes of available alternatives
– Actual choice
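| n | Income | Auto Time | Transit Time | Choice  |
|---|--------|-----------|--------------|---------|
| 1 | 35     | 15.4      | 58.2         | Auto    |
| 2 | 45     | 14.2      | 31.0         | Transit |
| 3 | 37     | 19.6      | 43.6         | Auto    |
| 4 | 42     | 50.8      | 59.9         | Auto    |
| 5 | 32     | 55.5      | 33.8         | Transit |
| 6 | 15     | N/A       | 48.4         | Transit |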
Model Specification Example
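$$V_{\text{auto}} = \beta_0 + \beta_1 TT_{\text{auto}} + \beta_2 \ln(\text{Income})$$
$$V_{\text{transit}} = \beta_1 TT_{\text{transit}}$$

|         | β0 | β1        | β2         |
|---------|----|-----------|------------|
| Auto    | 1  | TTauto    | ln(Income) |
| Transit | 0  | TTtransit | 0          |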
Probabilities of Observed Choices
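● Individual 1:
$$V_{\text{auto}} = \beta_0 + 15.4\,\beta_1 + \ln(35)\,\beta_2$$
$$V_{\text{transit}} = 58.2\,\beta_1$$
$$P(\text{Auto}) = \frac{e^{\beta_0 + 15.4\beta_1 + \ln(35)\beta_2}}{e^{\beta_0 + 15.4\beta_1 + \ln(35)\beta_2} + e^{58.2\beta_1}}$$
● Individual 2:
$$V_{\text{auto}} = \beta_0 + 14.2\,\beta_1 + \ln(45)\,\beta_2$$
$$V_{\text{transit}} = 31.0\,\beta_1$$
$$P(\text{Transit}) = \frac{e^{31.0\beta_1}}{e^{\beta_0 + 14.2\beta_1 + \ln(45)\beta_2} + e^{31.0\beta_1}}$$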
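As a rough numerical illustration of these two probabilities, the sketch below evaluates the binary logit formula for individuals 1 and 2. The β values are hypothetical placeholders (the slides do not report estimates), and the function name is an assumption made for this example.

```python
import math

def binary_logit_probs(beta, tt_auto, tt_transit, income):
    """Return (P(Auto), P(Transit)) for the auto/transit specification."""
    b0, b1, b2 = beta
    v_auto = b0 + b1 * tt_auto + b2 * math.log(income)
    v_transit = b1 * tt_transit
    # Subtract the larger utility before exponentiating for numerical stability;
    # this leaves the probabilities unchanged.
    m = max(v_auto, v_transit)
    e_auto = math.exp(v_auto - m)
    e_transit = math.exp(v_transit - m)
    total = e_auto + e_transit
    return e_auto / total, e_transit / total

# Hypothetical coefficients (beta0, beta1, beta2), for illustration only.
beta = (0.0, -0.05, 0.1)

p1_auto, p1_transit = binary_logit_probs(beta, 15.4, 58.2, 35)  # individual 1
p2_auto, p2_transit = binary_logit_probs(beta, 14.2, 31.0, 45)  # individual 2
```

With a negative travel-time coefficient, individual 1's much shorter auto time makes auto the more likely choice under these assumed values.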
Maximum Likelihood Estimation
● Find the values of β that are most likely to result in the choices observed
in the sample:
– max L*(β) = P1(Auto)P2(Transit)…P6(Transit)
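● Let
$$y_{in} = \begin{cases} 1, & \text{if person } n \text{ chose alternative } i \\ 0, & \text{if person } n \text{ chose alternative } j \end{cases}$$
● Then we maximize, over choices of {β1, β2, …, βK}, the following expression:
$$L^*(\beta_1, \beta_2, \dots, \beta_K) = \prod_{n=1}^{N} P_n(i)^{y_{in}} P_n(j)^{y_{jn}}$$
● $\beta^* = \arg\max_\beta L^*(\beta_1, \beta_2, \dots, \beta_K) = \arg\max_\beta \log L^*(\beta_1, \beta_2, \dots, \beta_K)$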
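The log-likelihood being maximized can be written out for the six-person sample as in the sketch below. Two assumptions are made here that the slides do not state: the "N/A" auto time for individual 6 is read as auto being outside that person's choice set, and the function and variable names are illustrative. The sketch only evaluates the function; six observations are far too few to produce meaningful estimates.

```python
import math

# (income, auto time, transit time, choice); None marks an unavailable alternative.
DATA = [
    (35, 15.4, 58.2, "Auto"),
    (45, 14.2, 31.0, "Transit"),
    (37, 19.6, 43.6, "Auto"),
    (42, 50.8, 59.9, "Auto"),
    (32, 55.5, 33.8, "Transit"),
    (15, None, 48.4, "Transit"),
]

def log_likelihood(beta, data=DATA):
    """Sum over individuals of log P_n(chosen alternative)."""
    b0, b1, b2 = beta
    ll = 0.0
    for income, tt_auto, tt_transit, choice in data:
        utilities = {"Transit": b1 * tt_transit}
        if tt_auto is not None:  # assumed: N/A means auto is not in the choice set
            utilities["Auto"] = b0 + b1 * tt_auto + b2 * math.log(income)
        m = max(utilities.values())
        log_denominator = m + math.log(
            sum(math.exp(v - m) for v in utilities.values())
        )
        ll += utilities[choice] - log_denominator
    return ll

# At beta = 0 each of the five two-alternative observations has probability 1/2,
# and the transit-only individual has probability 1.
ll_zero = log_likelihood((0.0, 0.0, 0.0))
```

In practice this function would be handed to a numerical optimizer; working with the log keeps the product of many small probabilities from underflowing.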
Sources of Data on User Behavior
● Revealed Preferences Data
– Travel Diaries
– Field Tests
● Stated Preferences Data
– Surveys
– Simulators
Stated Preferences / Conjoint Experiments
● Used for product design and pricing
– For products with significantly different attributes
– When attributes are strongly correlated in real markets
– Where market tests are expensive or infeasible
● Uses data from survey “trade-off” experiments in which
attributes of the product are systematically varied
● Applied in transportation studies since the early 1980s
Aggregation and Forecasting
● Objective is to make aggregate predictions from
– A disaggregate model, P( i | Xn )
– Which is based on individual attributes and
characteristics, Xn
– Having only limited information about the explanatory
variables
The Aggregate Forecasting Problem
● The fraction of population T choosing alt. i is:
$$W(i) = \int_X P(i \mid X)\, p(X)\, dX = \frac{1}{N_T} \sum_{n=1}^{N_T} P(i \mid X_n)$$
where p(X) is the density function of X and N_T is the number of individuals in the population of interest
● Not feasible to calculate because:
– We never know each individual’s complete vector of
relevant attributes
– p(X) is generally unknown
● The problem is to reduce the required data
Sample Enumeration
● Use a sample to represent the entire population
● For a random sample:
$$\hat{W}(i) = \frac{1}{N_s} \sum_{n=1}^{N_s} \hat{P}(i \mid x_n)$$
where N_s is the number of observations in the sample
● For a weighted sample:
$$\hat{W}(i) = \sum_{n=1}^{N_s} \frac{w_n}{\sum_{n'} w_{n'}}\, \hat{P}(i \mid x_n)$$
where 1/w_n is x_n's selection probability
● No aggregation bias, but there is sampling error
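Sample enumeration amounts to a (possibly weighted) average of the predicted individual probabilities. The sketch below shows both cases; the function name and the probability and weight values are made up for illustration.

```python
def sample_enumeration(probs, weights=None):
    """Estimate the share choosing alternative i.

    probs   -- predicted probabilities P_hat(i | x_n), one per sampled individual
    weights -- expansion weights w_n (inverse selection probabilities);
               None means a simple random sample, i.e. equal weights
    """
    if weights is None:
        weights = [1.0] * len(probs)
    total_weight = sum(weights)
    return sum(w * p for w, p in zip(weights, probs)) / total_weight

# Hypothetical predicted probabilities for three sampled individuals.
probs = [0.2, 0.5, 0.8]

share_random = sample_enumeration(probs)                     # (0.2 + 0.5 + 0.8) / 3
share_weighted = sample_enumeration(probs, [1.0, 1.0, 2.0])  # (0.2 + 0.5 + 1.6) / 4
```

Giving the third individual twice the weight pulls the estimated share toward that person's higher probability.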
Disaggregate Prediction
Generate a representative population
Apply demand model
• Calculate probabilities or simulate
decision for each decision maker
• Translate into trips
• Aggregate trips to OD matrices
Assign traffic to a network
Predict system performance
Generating Disaggregate Populations
[Figure: household surveys, exogenous forecasts, census data, and counts feed into data fusion (e.g., IPF, HH evolution), which produces a representative population.]
Review
● Empirical issues
– Model specification and estimation
– Aggregate forecasting
● Next…More theoretical issues
– Independence from Irrelevant Alternatives (IIA) property –
Motivation for Nested Logit
– Nested Logit - specification and an example
Summary of Basic Discrete Choice Models
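● Binary Probit:
$$P_n(i \mid C_n) = \Phi(V_n) = \int_{-\infty}^{V_n} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}\varepsilon^2}\, d\varepsilon$$
● Binary Logit:
$$P_n(i \mid C_n) = \frac{1}{1 + e^{-V_n}} = \frac{e^{V_{in}}}{e^{V_{in}} + e^{V_{jn}}}$$
● Multinomial Logit:
$$P_n(i \mid C_n) = \frac{e^{V_{in}}}{\sum_{j \in C_n} e^{V_{jn}}}$$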
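The three formulas can be compared side by side with a short Python sketch. It assumes V_n denotes the utility difference V_in − V_jn, which is what the second equality in the binary logit formula implies; the function names and utility values are illustrative.

```python
import math

def binary_probit(v_n):
    """Phi(V_n): standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(v_n / math.sqrt(2.0)))

def binary_logit(v_n):
    """1 / (1 + exp(-V_n)), with V_n assumed to be V_in - V_jn."""
    return 1.0 / (1.0 + math.exp(-v_n))

def multinomial_logit(utilities):
    """exp(V_in) / sum_j exp(V_jn) for every alternative in the choice set."""
    m = max(utilities.values())  # shift by the max for numerical stability
    exps = {alt: math.exp(v - m) for alt, v in utilities.items()}
    total = sum(exps.values())
    return {alt: e / total for alt, e in exps.items()}

# Hypothetical utilities for a two-alternative choice set.
v_i, v_j = 1.0, 0.0
p_probit = binary_probit(v_i - v_j)
p_logit = binary_logit(v_i - v_j)
p_mnl = multinomial_logit({"i": v_i, "j": v_j})["i"]  # equals p_logit
```

With two alternatives the multinomial logit reduces to the binary logit, which the last line illustrates.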
Independence from Irrelevant Alternatives (IIA)
● Property of the Multinomial Logit Model
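– $\varepsilon_{jn}$ independent identically distributed (i.i.d.)
– $\varepsilon_{jn} \sim \text{ExtremeValue}(0, \mu)\ \forall j$
– $P_n(i \mid C_n) = \dfrac{e^{\mu V_{in}}}{\sum_{j \in C_n} e^{\mu V_{jn}}}$
so
$$\frac{P(i \mid C_1)}{P(j \mid C_1)} = \frac{P(i \mid C_2)}{P(j \mid C_2)} \quad \forall\, i, j, C_1, C_2$$
such that i, j ∈ C1, i, j ∈ C2, C1 ⊆ Cn and C2 ⊆ Cn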
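The ratio property can be checked numerically. In the sketch below the alternative names and utility values are hypothetical; the point is only that the ratio of two logit probabilities depends on the utility difference of those two alternatives, regardless of what else is in the choice set.

```python
import math

def mnl(utilities, mu=1.0):
    """Multinomial logit probabilities with scale parameter mu."""
    m = max(utilities.values())
    exps = {alt: math.exp(mu * (v - m)) for alt, v in utilities.items()}
    total = sum(exps.values())
    return {alt: e / total for alt, e in exps.items()}

# Hypothetical systematic utilities.
V = {"auto": 1.0, "bus": 0.5, "rail": 0.2}

full = mnl(V)                                       # C1 = {auto, bus, rail}
subset = mnl({alt: V[alt] for alt in ("auto", "bus")})  # C2 = {auto, bus}

ratio_full = full["auto"] / full["bus"]
ratio_subset = subset["auto"] / subset["bus"]
# Both ratios equal exp(mu * (V_auto - V_bus)) = exp(0.5).
```

Dropping rail changes every individual probability, yet the auto-to-bus ratio is unaffected.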
Examples of IIA
● Route choice with an overlapping segment
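[Figure: network from origin O to destination D. Path 1 has travel time T. Path 2 has a segment of travel time T−δ and splits into two branches, a and b, of travel time δ.]
$$P(1 \mid \{1, 2a, 2b\}) = P(2a \mid \{1, 2a, 2b\}) = P(2b \mid \{1, 2a, 2b\}) = \frac{e^{\mu T}}{\sum_{j \in \{1, 2a, 2b\}} e^{\mu T}} = \frac{1}{3}$$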
Red Bus / Blue Bus Paradox
● Consider that initially auto and bus have the same utility
– Cn = {auto, bus} and Vauto = Vbus = V
– P(auto) = P(bus) = 1/2
● Suppose that a new bus service is introduced that is identical
to the existing bus service, except the buses are painted
differently (red vs. blue)
– Cn = {auto, red bus, blue bus}; Vred bus = Vblue bus = V
– Logit now predicts
P(auto) = P(red bus) = P(blue bus) =1/3
– We’d expect
P(auto) =1/2, P(red bus) = P(blue bus) =1/4
IIA and Aggregation
● Divide the population into two equally-sized groups: those
who prefer autos, and those who prefer transit
● Mode shares before introducing blue bus:
| Population     | Auto Share | Red Bus Share | P(auto)/P(red bus) |
|----------------|------------|---------------|--------------------|
| Auto people    | 90%        | 10%           | 9                  |
| Transit people | 10%        | 90%           | 1/9                |
| Total          | 50%        | 50%           |                    |
● Auto and red bus share ratios remain constant for each
group after introducing blue bus:
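| Population     | Auto Share | Red Bus Share | Blue Bus Share |
|----------------|------------|---------------|----------------|
| Auto people    | 81.8%      | 9.1%          | 9.1%           |
| Transit people | 5.2%       | 47.4%         | 47.4%          |
| Total          | 43.5%      | 28.25%        | 28.25%         |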
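The shares in this example can be recomputed with a few lines of Python. The sketch applies IIA within each group (the auto-to-red-bus ratio is held fixed and the blue bus receives the same share as the red bus) and then averages the two equally sized groups. The function name is illustrative. The exact values agree with the table to within rounding: the transit group's auto share is 1/19, about 5.3%, and each bus ends up with roughly 28.2% of the total.

```python
def add_blue_bus(auto_share, red_share):
    """Shares after adding a blue bus identical to the red bus, under IIA.

    The auto/red-bus ratio is unchanged and blue bus mirrors red bus.
    Returns (auto, red bus, blue bus).
    """
    total = auto_share + 2.0 * red_share
    return auto_share / total, red_share / total, red_share / total

# Before-shares (auto, red bus) for the two equally sized groups.
groups = {"Auto people": (0.9, 0.1), "Transit people": (0.1, 0.9)}

after = {name: add_blue_bus(*shares) for name, shares in groups.items()}

# Population totals: simple average because the groups are the same size.
total = tuple(
    sum(after[name][k] for name in after) / len(after) for k in range(3)
)
# total[0] is about 0.435: the aggregate auto share falls from 1/2,
# but not to the 1/3 that a single aggregate logit model would predict.
```

IIA holds inside each homogeneous group, yet the aggregate auto-to-red-bus ratio changes from 1 to about 1.54, so IIA does not carry over to the population as a whole.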
Motivation for Nested Logit
● Overcome the IIA Problem of Multinomial Logit when
– Alternatives are correlated
(e.g., red bus and blue bus)
– Multidimensional choices are considered (e.g., departure
time and route)
Tree Representation of Nested Logit
● Example: Mode Choice (Correlated Alternatives)
[Figure: three-level tree]
motorized
    auto: drive alone, carpool
    transit: bus, metro
non-motorized
    bicycle
    walk
Tree Representation of Nested Logit
● Example: Route and Departure Time Choice (Multidimensional Choice)
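[Figure: two alternative trees. In the first, the upper level is the route (Route 1, Route 2, Route 3) and the lower level is the departure time (8:10, 8:20, 8:30, 8:40, 8:50). In the second, the upper level is the departure time and the lower level is the route.]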
Nested Model Estimation
● Logit at each node
● Utilities at lower level enter at the node as the inclusive value
$$I_{NM} = \ln \sum_{i \in C_{NM}} e^{V_i}$$
[Figure: two-level tree. Non-motorized (NM): Walk, Bike. Motorized (M): Car, Taxi, Bus.]
● The inclusive value is often referred to as logsum
Nested Model – Example
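[Figure: two-level tree. Non-motorized (NM): Walk, Bike. Motorized (M): Car, Taxi, Bus.]
$$P(i \mid NM) = \frac{e^{\mu_{NM} V_i}}{e^{\mu_{NM} V_{Walk}} + e^{\mu_{NM} V_{Bike}}}, \quad i = \text{Walk}, \text{Bike}$$
$$I_{NM} = \frac{1}{\mu_{NM}} \ln\left(e^{\mu_{NM} V_{Walk}} + e^{\mu_{NM} V_{Bike}}\right)$$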
Nested Model – Example
$$P(i \mid M) = \frac{e^{\mu_M V_i}}{e^{\mu_M V_{Car}} + e^{\mu_M V_{Taxi}} + e^{\mu_M V_{Bus}}}, \quad i = \text{Car}, \text{Taxi}, \text{Bus}$$
$$I_M = \frac{1}{\mu_M} \ln\left(e^{\mu_M V_{Car}} + e^{\mu_M V_{Taxi}} + e^{\mu_M V_{Bus}}\right)$$
Nested Model – Example
$$P(NM) = \frac{e^{\mu I_{NM}}}{e^{\mu I_{NM}} + e^{\mu I_M}}$$
$$P(M) = \frac{e^{\mu I_M}}{e^{\mu I_{NM}} + e^{\mu I_M}}$$
Nested Model – Example
● Calculation of choice probabilities
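$$P(\text{Bus}) = P(\text{Bus} \mid M) \cdot P(M)$$
$$= \frac{e^{\mu_M V_{Bus}}}{e^{\mu_M V_{Car}} + e^{\mu_M V_{Taxi}} + e^{\mu_M V_{Bus}}} \cdot \frac{e^{\mu I_M}}{e^{\mu I_{NM}} + e^{\mu I_M}}$$
$$= \frac{e^{\mu_M V_{Bus}}}{e^{\mu_M V_{Car}} + e^{\mu_M V_{Taxi}} + e^{\mu_M V_{Bus}}} \cdot \frac{e^{\frac{\mu}{\mu_M} \ln\left(e^{\mu_M V_{Car}} + e^{\mu_M V_{Taxi}} + e^{\mu_M V_{Bus}}\right)}}{e^{\frac{\mu}{\mu_{NM}} \ln\left(e^{\mu_{NM} V_{Walk}} + e^{\mu_{NM} V_{Bike}}\right)} + e^{\frac{\mu}{\mu_M} \ln\left(e^{\mu_M V_{Car}} + e^{\mu_M V_{Taxi}} + e^{\mu_M V_{Bus}}\right)}}$$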
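The two-level calculation can be sketched in Python as follows. This is an illustrative implementation of the formulas in this example, not code from the course; the function names, the all-zero utilities, and the scale values are assumptions chosen to make the behavior easy to see.

```python
import math

def logsum(utilities, mu):
    """Inclusive value I = (1/mu) * ln(sum_i exp(mu * V_i))."""
    m = max(utilities)
    return m + math.log(sum(math.exp(mu * (v - m)) for v in utilities)) / mu

def nested_logit(nests, nest_scales, mu=1.0):
    """Two-level nested logit.

    nests       -- {nest name: {alternative: V}}
    nest_scales -- {nest name: mu_m}, the scale used inside each nest
    mu          -- scale of the upper-level (nest choice) logit
    Returns {alternative: probability}.
    """
    inclusive = {
        name: logsum(list(alts.values()), nest_scales[name])
        for name, alts in nests.items()
    }
    top = max(inclusive.values())
    denom = sum(math.exp(mu * (iv - top)) for iv in inclusive.values())
    probs = {}
    for name, alts in nests.items():
        p_nest = math.exp(mu * (inclusive[name] - top)) / denom  # P(nest)
        mu_m = nest_scales[name]
        v_max = max(alts.values())
        within = sum(math.exp(mu_m * (v - v_max)) for v in alts.values())
        for alt, v in alts.items():
            # P(alt) = P(alt | nest) * P(nest)
            probs[alt] = p_nest * math.exp(mu_m * (v - v_max)) / within
    return probs

# Hypothetical case: every alternative has the same systematic utility.
nests = {
    "NM": {"Walk": 0.0, "Bike": 0.0},
    "M": {"Car": 0.0, "Taxi": 0.0, "Bus": 0.0},
}

# Equal scales at both levels: the model collapses to multinomial logit (1/5 each).
mnl_like = nested_logit(nests, {"NM": 1.0, "M": 1.0}, mu=1.0)

# Large within-nest scales (strong correlation inside each nest):
# the two nests split the market almost evenly.
correlated = nested_logit(nests, {"NM": 50.0, "M": 50.0}, mu=1.0)
```

The second case is the nested logit answer to the red bus / blue bus problem: when alternatives within a nest are close substitutes, adding one more of them barely changes the nest's total share.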
Extensions to Discrete Choice Modeling
● Multinomial Probit (MNP)
● Sampling and Estimation Methods
● Combined Data Sets
● Taste Heterogeneity
● Cross Nested Logit and GEV Models
● Mixed Logit and Probit (Hybrid Models)
● Latent Variables (e.g., Attitudes and Perceptions)
● Choice Set Generation
Summary
● Introduction to Discrete Choice Analysis
● A simple example
● The Random Utility Model
● Specification and Estimation of Discrete Choice Models
● Forecasting with Discrete Choice Models
● IIA Property - Motivation for Nested Logit Models
● Nested Logit
Additional Readings
● Ben-Akiva, M. and Bierlaire, M. (2003), ‘Discrete Choice Models With Applications to
Departure Time and Route Choice,’ The Handbook of Transportation Science, 2nd ed.,
(eds.) R.W. Hall, Kluwer, pp. 7 – 38.
● Ben-Akiva, M. and Lerman, S. (1985), Discrete Choice Analysis, MIT Press, Cambridge,
Massachusetts.
● Train, K. (2003), Discrete Choice Methods with Simulation, Cambridge University Press,
United Kingdom.
● And/Or take 1.202 next semester!
Appendix
Nested Logit model specification
Cross-Nested Logit
Logit Mixtures (Continuous/Discrete)
Revealed + Stated Preferences
Nested Logit Model Specification
● Partition Cn into M non-overlapping nests:
Cmn ∩ Cm’n = ∅ ∀ m≠m’
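● Deterministic utility term for nest Cmn:
$$V_{C_{mn}} = \tilde{V}_{C_{mn}} + \frac{1}{\mu_m} \ln \sum_{j \in C_{mn}} e^{\mu_m \tilde{V}_{jn}}$$
● Model: $P(i \mid C_n) = P(C_{mn} \mid C_n)\, P(i \mid C_{mn}), \quad i \in C_{mn} \subseteq C_n$
where
$$P(C_{mn} \mid C_n) = \frac{e^{\mu V_{C_{mn}}}}{\sum_{l} e^{\mu V_{C_{ln}}}} \quad \text{and} \quad P(i \mid C_{mn}) = \frac{e^{\mu_m \tilde{V}_{in}}}{\sum_{j \in C_{mn}} e^{\mu_m \tilde{V}_{jn}}}$$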
Continuous Logit Mixture
Example:
● Combining Probit and Logit
● Error decomposed into two parts
– Probit-type portion for flexibility
– i.i.d. Extreme Value for tractability
● An intuitive, practical, and powerful method
– Correlations across alternatives
– Taste heterogeneity
– Correlations across space and time
● Requires simulation-based estimation
Cont. Logit Mixture: Error Component Illustration
● Utility:
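$$U_{\text{auto}} = \beta X_{\text{auto}} + \xi_{\text{auto}} + \nu_{\text{auto}}$$
$$U_{\text{bus}} = \beta X_{\text{bus}} + \xi_{\text{bus}} + \nu_{\text{bus}}$$
$$U_{\text{subway}} = \beta X_{\text{subway}} + \xi_{\text{subway}} + \nu_{\text{subway}}$$
where the error is $\varepsilon = \xi + \nu$, with ν i.i.d. Extreme Value and, e.g., ξ ~ N(0, Σ)
● Probability:
$$\Lambda(\text{auto} \mid X, \xi) = \frac{e^{\beta X_{\text{auto}} + \xi_{\text{auto}}}}{e^{\beta X_{\text{auto}} + \xi_{\text{auto}}} + e^{\beta X_{\text{bus}} + \xi_{\text{bus}}} + e^{\beta X_{\text{subway}} + \xi_{\text{subway}}}}$$
● ξ is unknown, so it is integrated out:
$$P(\text{auto} \mid X) = \int_{\xi} \Lambda(\text{auto} \mid X, \xi)\, f(\xi)\, d\xi$$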
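Because the integral has no closed form, it is usually approximated by averaging the logit probability over random draws of ξ. The sketch below does this with plain Monte Carlo. It simplifies the setup by drawing each ξ independently (a diagonal Σ), and the function names, utilities, standard deviations, and number of draws are all assumptions made for illustration.

```python
import math
import random

def logit(utilities):
    """Logit probabilities conditional on a given draw of xi."""
    m = max(utilities.values())
    exps = {alt: math.exp(v - m) for alt, v in utilities.items()}
    total = sum(exps.values())
    return {alt: e / total for alt, e in exps.items()}

def simulate_mixture(v, sigma, draws=2000, seed=0):
    """Approximate P(i | X) by averaging Lambda(i | X, xi) over draws of xi.

    v     -- {alternative: beta * X}
    sigma -- {alternative: standard deviation of xi}; independence across
             alternatives is assumed here for simplicity
    """
    rng = random.Random(seed)
    acc = {alt: 0.0 for alt in v}
    for _ in range(draws):
        conditional = logit({alt: v[alt] + rng.gauss(0.0, sigma[alt]) for alt in v})
        for alt in v:
            acc[alt] += conditional[alt]
    return {alt: total / draws for alt, total in acc.items()}

# Hypothetical systematic utilities and error-component standard deviations.
v = {"auto": 0.5, "bus": 0.0, "subway": 0.2}
sigma = {"auto": 1.0, "bus": 1.0, "subway": 1.0}

p_mixture = simulate_mixture(v, sigma)
p_no_mixing = simulate_mixture(v, {alt: 0.0 for alt in v}, draws=10)  # plain logit
```

Setting every standard deviation to zero removes the ξ term, and the simulated probabilities then coincide with the ordinary logit probabilities.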
Continuous Logit Mixture
Random Taste Variation
● Logit: β is a constant vector
– Can segment, e.g. βlow inc , βmed inc , βhigh inc
● Logit Mixture: β can be randomly distributed
– Can be a function of personal characteristics
– Distribution can be Normal, Lognormal, Triangular, etc
Discrete Logit Mixture
Latent Classes
Main Postulate:
• Unobserved heterogeneity is “generated” by discrete or
categorical constructs such as
– Different decision protocols adopted
– Choice sets considered may vary
– Segments of the population with varying tastes
• Above constructs characterized as latent classes
Latent Class Choice Model
$$P(i) = \sum_{s=1}^{S} \Lambda(i \mid s)\, Q(s)$$
– Λ(i | s): class-specific choice model (probability of choosing i conditional on belonging to class s)
– Q(s): class membership model (probability of belonging to class s)
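The latent class formula is a weighted sum and can be sketched in a few lines. The numbers below reuse the "auto people" and "transit people" shares from the IIA and Aggregation example purely as an illustration; the function name is hypothetical, and in practice both Λ(i | s) and Q(s) would come from estimated models.

```python
def latent_class_probability(class_choice_probs, class_membership):
    """P(i) = sum over classes s of Lambda(i | s) * Q(s)."""
    return sum(lam * q for lam, q in zip(class_choice_probs, class_membership))

# Two classes of equal size: one chooses auto with probability 0.9, the other 0.1.
lambda_auto = [0.9, 0.1]  # Lambda(auto | s) for s = 1, 2
q = [0.5, 0.5]            # Q(s), class membership probabilities

p_auto = latent_class_probability(lambda_auto, q)  # 0.9*0.5 + 0.1*0.5
```

The analyst does not observe which class a person belongs to, so only the mixture probability P(i) enters the likelihood.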
Summary of Discrete Choice Models
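|                                                  | Logit | NL/CNL | Probit | Logit Mixture               |
|--------------------------------------------------|-------|--------|--------|-----------------------------|
| Handles unobserved taste heterogeneity           | No    | No     | Yes    | Yes                         |
| Flexible substitution pattern                    | No    | Yes    | Yes    | Yes                         |
| Handles panel data                               | No    | No     | Yes    | Yes                         |
| Requires error terms normally distributed        | No    | No     | Yes    | No                          |
| Closed-form choice probabilities available       | Yes   | Yes    | No     | No (cont.), Yes (discrete)  |
| Numerical approximation and/or simulation needed | No    | No     | Yes    | Yes (cont.), No (discrete)  |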
6. Revealed and Stated Preferences
• Revealed Preferences Data
– Travel Diaries
– Field Tests
• Stated Preferences Data
– Surveys
– Simulators
Stated Preferences / Conjoint Experiments
• Used for product design and pricing
– For products with significantly different attributes
– When attributes are strongly correlated in real markets
– Where market tests are expensive or infeasible
• Uses data from survey “trade-off” experiments in which
attributes of the product are systematically varied
• Applied in transportation studies since the early 1980s
• Can be combined with Revealed Preferences Data
– Benefit from strengths
– Correct for weaknesses
– Improve efficiency
Framework for Combining Data
[Figure: attributes of alternatives and characteristics of the decision-maker determine utility, which underlies both stated preferences and revealed preferences.]
MIT OpenCourseWare
1.201J / 11.545J / ESD.210J Transportation Systems Analysis: Demand and Economics
Fall 2008