
Algorithmic Pricing in Securities Markets

Jean-Edouard Colliard, Thierry Foucault, Stefano Lovo


HEC Paris

CFM-Imperial Conference
Roadmap

Introduction

The Market Making Game

Algo MMs

Implementation and Findings

Role of Experimentation

Price Discovery

Conclusion
Algorithms in Securities Markets

▶ Rise of the “Algo market maker”.


AI in Securities Markets
Behavioral Algorithmic Finance

▶ Can we still rely on traditional models of market making to


explain and predict trading outcomes when prices are set by
algos?
▶ How to model the behavior of algo market makers?

1. Bayesian learning + Nash equilibrium (Glosten and


Milgrom (1985), Kyle (1985) etc.). Algo market makers are
just faster and more efficient in getting and processing
information.

2. Reinforcement learning (AlphaGo, Robotics, Autonomous


cars etc.); Algos learn how to make decisions iteratively by
experimenting, receiving feedback and adjusting their behavior.
▶ Does the second approach predict outcomes different from the
first? How and why?
What we do

▶ We consider a standard market-making game (≈


Glosten-Milgrom (1985)) but we assume that quotes are set
by Q-learning algorithms (“algo-MMs”) with no prior
knowledge about the environment (e.g., intensity of adverse
selection).
▶ We run experiments (a large number of interactions between
algo-MMs and their clients, holding the environment
constant) to study how Algo-MMs learn from experience and
set their prices.
▶ We benchmark the observations to the predictions of the
Nash equilibrium of the model (standard Bertrand equilibrium
with zero expected profits for market makers).
Questions

▶ Adverse selection. Can algo MMs learn to price “adverse


selection”?
“Zillow may simply have realized before anyone else that
adverse selection is intractable. If so, other iBuyers will
eventually fail, too.”
(The Washington Post, November 2021)

▶ Price discovery. Can algos learn asset values (“discover


fundamentals”)?
▶ Competition. Can algos learn to be competitive (undercut
when profitable to do so)? Major concern in online product
markets.
Algorithmic Pricing in Product Markets
Main New Findings

▶ Algo-MMs learn not to be adversely selected: Their


average realized spreads are positive.
▶ Algo-MMs do not learn to undercut: They eventually
settle on prices less competitive than those predicted by the
Nash (Glosten-Milgrom (1985)) equilibrium.
▶ Algo-MMs set prices that are more competitive (closer to
the Nash equilibrium) when adverse selection costs are larger.
▶ Algo-MMs learn to update their quotes based on the order
flow (discover asset fundamentals).
Roadmap

Introduction

The Market Making Game

Algo MMs

Implementation and Findings

Role of Experimentation

Price Discovery

Conclusion
The Market Making Game (“RFQ”)

▶ A risky asset with payoff ṽ, where v = vH = µ + ∆/2 or
v = vL = µ − ∆/2, with equal probabilities.

▶ A client considers buying one share of the asset. Her


valuation for the asset is:

ṽC = w̃C + L̃,

where (i) L̃ ∼ N(0, σ²) (“liquidity shock”) and (ii) w̃C = ṽ.


▶ 2 Market Makers (MMs) X and Y simultaneously post prices
aX and aY , not knowing ṽ and ṽC .
▶ The client buys if vC ≥ min(aX, aY).

▶ If a trade occurs, the dealers posting the best quote earn (in
aggregate) (amin − ṽ ) and zero otherwise.
The Client’s Demand

▶ The client’s realized demand is either 1 (buy) or 0 (no trade).


▶ Conditional on v , the likelihood of a trade is:

D(amin , wC ) = Pr(v + L̃ ≥ amin ).

▶ It decreases with amin and it increases with wC ⇒ Adverse


selection.
▶ The unconditional likelihood of a trade is:

D̄(amin) = ½ D(amin, vL) + ½ D(amin, vH).
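Since L̃ is normal, these likelihoods have a closed form: D(a, w) = Pr(w + L̃ ≥ a) = 1 − Φ((a − w)/σ). A minimal Python sketch (the helper names are ours, not the paper's code), evaluated at parameters used later in the deck:

```python
from math import erf, sqrt

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def D(a, w, sigma):
    """D(a, w) = Pr(w + L >= a) with L ~ N(0, sigma^2)."""
    return 1.0 - normal_cdf((a - w) / sigma)

def D_bar(a, v_l, v_h, sigma):
    """Unconditional trade likelihood: 0.5*D(a, v_L) + 0.5*D(a, v_H)."""
    return 0.5 * D(a, v_l, sigma) + 0.5 * D(a, v_h, sigma)

# Parameters used later in the deck: v_L = 0, v_H = 4, sigma = 5, best offer a = 2.5
print(D(2.5, 4, 5))         # higher: the client buys more often when v = v_H ...
print(D(2.5, 0, 5))         # ... than when v = v_L  =>  adverse selection
print(D_bar(2.5, 0, 4, 5))  # unconditional likelihood of a trade
```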
Likelihood of a Trade

Example: vH=4, vL=0, E(v)=2. Best Offer=2.5.


• Likelihood of a trade if v= vH = Blue + Red.
• Likelihood of a trade if v= vL = Blue.
• Prob Trade = Blue + 0.5 Red.
• Difference in likelihood between v=vH and v=vL = Red.
Economic Environment

▶ Two Cases:

1. With adverse selection: w̃C = ṽ .


▶ ⇒ The client is more likely to buy when the asset payoff is
high than when it is low:

∆D (amin ) = D(amin , vH ) − D(amin , vL ) > 0.


▶ ⇒ Dealers are exposed to adverse selection (more likely to sell
when the asset payoff is high than low).

2. Without adverse selection: w̃C and ṽ are i.i.d. The


likelihood that a client buys is D̄(amin ) whether the asset
payoff is high or low.
Dealers’ Expected Profits

▶ Π̄(aX, aY): Dealer X’s expected profit if X posts aX and Y
posts aY.
▶ Adverse Selection Case (suppose aX < aY):

Π̄(aX, aY) = ½ D(aX, vH)(aX − vH) + ½ D(aX, vL)(aX − vL),

or, equivalently,

Π̄(aX, aY) = D̄(aX) [(aX − E(ṽ)) − Cov(D(aX, ṽ), ṽ)/D̄(aX)],

where the last term, Cov(D(aX, ṽ), ṽ)/D̄(aX), is the adverse selection cost,
and Cov(D(aX, ṽ), ṽ) = ∆ × ∆D(aX)/4 > 0.
▶ No Adverse Selection Case: Cov(D(amin, ṽ), ṽ) = 0 because
the likelihood of a buy does not vary with v.
Benchmark (Glosten-Milgrom (1985))

▶ Competitive Price: a∗ such that Π(a∗ , a∗ ) = 0


▶ This is the only Nash equilibrium with 2 (or more)
dealers.
1. Without adverse selection: a∗ = E(ṽ ). Independent of the
risk of the asset ∆ and the standard deviation of L (σ).

2. With adverse selection: a∗ = E(ṽ | Buy) > E(ṽ ) (No regret


quotes, as in Glosten and Milgrom (1985)).
▶ Empiricists often use two measures of illiquidity.
1. Dealers’ average quoted spread: a∗ − E(ṽ ).
2. Dealers’ average realized spread: E(a∗ − ṽ | Buy ).
3. Dealers’ total expected profit = Likelihood of a trade ×
Average realized spread.
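For concreteness, the zero-profit condition with adverse selection, a∗ = E(ṽ | Buy), can be solved by simple fixed-point iteration. A minimal sketch (our own code, not the paper's), using the baseline parameters that appear later in the deck (vL = 0, vH = 4, σ = 5):

```python
from math import erf, sqrt

def D(a, v, sigma):
    """Pr(v + L >= a) with L ~ N(0, sigma^2): trade likelihood given payoff v."""
    return 1.0 - 0.5 * (1.0 + erf((a - v) / (sigma * sqrt(2.0))))

def competitive_price(v_l, v_h, sigma, n_iter=200):
    """Fixed point of a = E(v | Buy at quote a): the zero-profit (Glosten-Milgrom) quote."""
    a = 0.5 * (v_l + v_h)                          # start from E(v)
    for _ in range(n_iter):
        d_h, d_l = D(a, v_h, sigma), D(a, v_l, sigma)
        a = (d_h * v_h + d_l * v_l) / (d_h + d_l)  # update quote to E(v | Buy)
    return a

print(competitive_price(0, 4, 5))  # close to the 2.68 reported later in the deck
```

Without adverse selection the same condition reduces directly to a∗ = E(ṽ), as stated above.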
Glosten-Milgrom Equilibrium: Predictions

1. H.1. Dealers’ average quoted spread (a∗ − E(ṽ )) is strictly


larger with adverse selection.
2. H.2. Dealers’ average quoted spread declines with σ and
increases with ∆ under adverse selection. Without adverse
selection, these parameters have no effect on the quoted
spread.
3. H.3. Dealers’ average realized spread is zero with and
without adverse selection.
▶ Are these (very standard) predictions satisfied when prices are
set by Algo MMs?
Effects of exposure to adverse selection

σ 0.5 1 3 5 7
(1) Quoted Spread 2.00 2.00 1.24 0.68 0.47
(2) Adverse Sel. Cost 2.00 2.00 1.24 0.68 0.47
(3) Realized Spread (1)-(2) 0 0 0 0 0

∆v 0 2 4 6 8
Quoted Spread 0 0.16 0.68 1.65 3.02
Adverse Sel. Cost 0 0.16 0.68 1.65 3.02
Realized spread 0 0 0 0 0
Roadmap

Introduction

The Market Making Game

Algo MMs

Implementation and Findings

Role of Experimentation

Price Discovery

Conclusion
Why Reinforcement Learning

▶ The Nash approach assumes that the market makers (MMs)


know Π̄(ai , a−i ).
▶ This requires a lot of knowledge on the environment: (i) The
distribution of client’s valuation (vC ), (ii) the distribution of
the asset payoff, (iii) the number of competitors etc.
▶ An alternative behavioral model: Reinforcement learning
▶ Algo MMs (AMs): They have no prior knowledge about the
environment and learn to play the market making game via
experimentation.
▶ We focus on Q-learning. Not because it is realistic but
because it is a simple model of behavior for an algorithm. Our
goal is not to find the best possible algorithms in our
environment.
Q-Learning Algorithm - Description
▶ Holding parameters (∆ and σ) fixed, the market making game
is repeated with T different clients (realization of the asset
payoff after each client).
▶ Q-Matrix: Qit (a) is AM i’s assessment of the expected profit
with price a in episode t.
▶ Action: In episode t, AM i chooses her price as follows:
▶ With probability ϵt = e^(−βt): Explore: pick a price ait at
random from a pre-specified grid P.
▶ With probability (1 − ϵt): Exploit: play ait = arg maxa Qit(a).
▶ Feedback: After choosing ait, AM i obtains her realized
profit πit (and no further information).
▶ Learning: AM i updates the cell of the Q-matrix for ait (and
ait only):
Qi,t+1(ait) = απit + (1 − α)Qit(ait).
▶ Initialization: Qi0 (a) is chosen randomly.
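A minimal sketch of this learning rule in Python (our own illustration; the client and profit simulation is left as a callback, and all names are placeholders rather than the paper's implementation):

```python
import math
import random

def run_q_learning(prices, alpha, beta, T, realized_profit):
    """One algo market maker following the rule above.

    prices:          pre-specified price grid P
    alpha:           weight on the latest realized profit in the Q-update
    beta:            decay rate of the exploration probability eps_t = exp(-beta * t)
    realized_profit: callback(price, t) -> realized profit pi_it for episode t
    """
    Q = {a: random.random() for a in prices}       # random initialization Q_i0
    for t in range(1, T + 1):
        if random.random() < math.exp(-beta * t):  # explore with probability eps_t
            a = random.choice(prices)
        else:                                      # exploit: greedy price arg max_a Q
            a = max(Q, key=Q.get)
        pi = realized_profit(a, t)                 # feedback: realized profit only
        Q[a] = alpha * pi + (1 - alpha) * Q[a]     # update the chosen cell only
    return Q
```

In the experiments two such learners interact in every episode, so realized_profit would draw ṽ and the client's liquidity shock, compare both dealers' quotes, and return this dealer's share of the realized gain or loss.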
Q-Learning Algorithm - Example

▶ Parameters environment: E(v ) = 2, ∆ = 4, σ = 5. Adverse


selection (w̃C = ṽ ) ⇒ The competitive price is 2.68 in theory.

▶ Parameters algorithm: α = 0.5, β = 0.1. P = {3, 3.1}.

▶ Dealer X’s price is fixed at aX = 3.1 to simplify (will not be


the case in actual experiments).

▶ True expected profits for dealer Y: 0.12 with aY = 3 and
0.15/2 = 0.075 with aY = 3.1 ⇒ undercutting dealer X is
optimal (see the numerical check below).

▶ Will AM Y eventually learn to undercut if it uses a Q-learning


algorithm?
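The two expected-profit figures above can be checked from the closed-form trade probabilities. A short script (our own code; vL = 0, vH = 4, σ = 5 as in this example):

```python
from math import erf, sqrt

def D(a, v, sigma=5.0):
    """Pr(v + L >= a) with L ~ N(0, sigma^2)."""
    return 1.0 - 0.5 * (1.0 + erf((a - v) / (sigma * sqrt(2.0))))

def expected_profit(a, v_l=0.0, v_h=4.0):
    """Aggregate expected profit at the best quote a (adverse selection case)."""
    return 0.5 * D(a, v_h) * (a - v_h) + 0.5 * D(a, v_l) * (a - v_l)

# Dealer X is fixed at 3.1: quoting 3 wins the whole flow, quoting 3.1 splits it.
print(expected_profit(3.0))      # ~0.12
print(expected_profit(3.1) / 2)  # ~0.075-0.08, the shared profit (0.15/2) above
```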
Q-Learning Algorithm - Example

▶ Initialization of the Q-matrix:

QY0 = (QY0(3), QY0(3.1)) = (0, 0.01)

▶ t = 1: ṽ = vL = 0, ϵ1 = 0.90
Explore
a=3
Trade occurs (vL + L̃ ≥ 3).
QY 1 (3) = α × [3 − vL ] + (1 − α) × QY 0 (3) = 1.5.
QY 1 (3.1) = QY 0 (3.1) = 0.01
Q-Learning Algorithm - Example

▶ Reminder: Parameters algorithm: α = 0.5, β = 0.1.


 
QY1 = (QY1(3), QY1(3.1)) = (1.5, 0.01)

▶ t = 2: ṽ = vL = 0, ϵ2 = 0.82
Explore
a = 3.1
Trade occurs (vL + L̃ ≥ 3.1).
QY 2 (3.1) = α × [3.1 − vL ] + (1 − α) × QY 1 (3.1) = 1.55.
Q-Learning Algorithm - Example

▶ Reminder parameters algorithm: α = 0.5, β = 0.1.

 
QY2 = (QY2(3), QY2(3.1)) = (1.5, 1.55)

▶ t = 3: ṽ = vH = 4, ϵ3 = 0.74
Explore
a=3
Trade occurs.
QY 3 (3) = α × [3 − 4] + (1 − α) × QY 2 (3) = 0.25
Q-Learning Algorithm - Example

▶ Parameters: α = 0.5, β = 0.1.

 
QY3 = (QY3(3), QY3(3.1)) = (0.25, 1.55)

▶ t = 4: ṽ = vH = 4, ϵ4 = 0.67
Exploit
Greedy-price: a = 3.1
Trade does not occur,
QY 4 (3.1) = α × 0 + (1 − α) × QY 3 (3.1) = 0.775.
▶ Etc., for T episodes in total.
▶ Will AM Y eventually learn that the true expected payoff of
a = 3 is higher...? No...Not necessarily: T is finite and
experimentation is less likely over time.
Roadmap

Introduction

The Market Making Game

Algo MMs

Implementation and Findings

Role of Experimentation

Price Discovery

Conclusion
Implementation

▶ Algorithm: α = 0.01, β = 8 × 10⁻⁵, random Q0.

▶ Baseline Environment: E(v) = 2, ∆ = 4, σ = 5.

▶ For each parametrization of the game (σ,∆v ), we run


K = 10, 000 “experiments”, each with T = 106 clients.

▶ Price Grid: 139 prices, tick size δ = 0.1:
P = {1.1, 1.2, ..., 2, ..., 14.9}.

▶ We focus on the distribution of prices (mostly mean


prices) across experiments after the last client.
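A quick look at what these parameters imply for the exploration schedule ϵt = e^(−βt) (a sketch with our own code; the grid endpoints are our reading of the 139-price, 0.1-tick grid):

```python
import math

beta = 8e-5
grid = [round(1.1 + 0.1 * k, 1) for k in range(139)]  # assumed grid: 1.1, 1.2, ..., 14.9
print(len(grid), grid[0], grid[-1])                   # 139 prices

for t in (10_000, 100_000, 500_000, 1_000_000):
    print(t, math.exp(-beta * t))                     # exploration probability after t clients
```

The exploration probability falls below 0.1% after fewer than 100,000 clients, so for most of each experiment the algorithms are almost always exploiting.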
Learning
Observation 1: AMs’ prices are not competitive

(Figure: distribution of AMs’ prices across experiments, with the Nash equilibrium price marked.)
Observation 2: AMs’ prices account for adverse selection
Observation 3: Adverse selection reduces AMs’ rents
Observation 5: AMs’ spreads and rents increase with volatility
Observation 6: AMs’ spreads and rents decrease with the number of AMs
Evidence

Brogaard and Garriott (2019, JFQA):


Puzzles

▶ H.1 is satisfied: AMs charge larger spreads when there is


adverse selection.

▶ Otherwise, AMs’ prices are significantly different from


the Glosten-Milgrom prices:

1. Realized spreads are not zero. Algo MMs “leave money on
the table”: They do not learn to undercut non-competitive
quotes when it is profitable to do so.

2. Realized spreads are larger when there is no adverse


selection. In principle, they should be zero with or without
adverse selection.

3. Quoted spreads and realized spreads are smaller when


adverse selection costs are larger. Exactly the opposite of
the Glosten-Milgrom benchmark (Nash equilibrium).
An Explanation: Lack of Experimentation + Noise
▶ Parameters: σ = 5, ∆v = 4, E (v ) = 2. If the AMs post 5
(the modal price in the experiments), they each obtain an
average profit of 0.3. If Y undercuts at 4.9, it obtains:
(Figure: distribution of Y’s realized profit at aY = 4.9 when aX = 5; possible realizations 0, 0.9, and 4.9. Expected profit = 0.59, variance of profit = 1.78.)

▶ ⇒ The Q value of undercutting is a noisy estimate of


the true expected profit.
Adverse Selection Reduces Noise
▶ Same parameters as before but Blue: Adverse selection and
Red: No adverse selection.
(Figure: distribution of realized profit at aY = 4.9 when aX = 5, with adverse selection (blue) and without (red). Variance of profit: 1.78 with adverse selection, 2.93 without.)

▶ Adverse selection reduces the noise in the feedback


received by an AM if it undercuts ⇒ Makes learning to
undercut easier.
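Both variances (and the 0.59 expected profit) can be reproduced from the three possible profit realizations 0, aY − vH = 0.9, and aY − vL = 4.9. A short check with our own code:

```python
from math import erf, sqrt

def D(a, v, sigma=5.0):
    """Pr(v + L >= a) with L ~ N(0, sigma^2)."""
    return 1.0 - 0.5 * (1.0 + erf((a - v) / (sigma * sqrt(2.0))))

def profit_stats(a, v_l=0.0, v_h=4.0, adverse_selection=True):
    """Mean and variance of the undercutter's realized profit at quote a."""
    if adverse_selection:
        # The client is more likely to buy when v = v_H (w_C = v).
        p_h, p_l = 0.5 * D(a, v_h), 0.5 * D(a, v_l)
    else:
        # w_C independent of v: trade probability is D-bar(a) whatever the payoff.
        d_bar = 0.5 * (D(a, v_h) + D(a, v_l))
        p_h = p_l = 0.5 * d_bar
    outcomes = [(a - v_h, p_h), (a - v_l, p_l), (0.0, 1.0 - p_h - p_l)]
    mean = sum(x * p for x, p in outcomes)
    var = sum(p * (x - mean) ** 2 for x, p in outcomes)
    return mean, var

print(profit_stats(4.9, adverse_selection=True))   # mean ~0.59, variance ~1.78
print(profit_stats(4.9, adverse_selection=False))  # variance ~2.93
```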
The Dispersion In Clients’ Liquidity Shock Increases Noise
Asset Volatility
Roadmap

Introduction

The Market Making Game

Algo MMs

Implementation and Findings

Role of Experimentation

Price Discovery

Conclusion
More Experimentation, More competitive outcomes

Figure: Parameters: σ = 5, ∆v = 4. The experimentation rate never goes below 5%.
But...experimentation is costly

Figure: We vary β and compute average per period profits over various
time windows.
Roadmap

Introduction

The Market Making Game

Algo MMs

Implementation and Findings

Role of Experimentation

Price Discovery

Conclusion
Learning from Order Flow

▶ Price discovery: Dealers discover the asset fundamental, v ,


via repeated interactions with their clients.
▶ Implication: If dealers receive requests from two clients in
sequence, in the Glosten and Milgrom equilibrium:

1. They raise their offer for the second client after observing a
buy from the first (because a buy is more likely if v = vH ).

2. They lower their offer for the second client if the first does not
trade (because the first client is more likely not to trade if
v = vL ).
Example
▶ Parameters: adverse selection, ∆ = 4, Lt ∼ N(0, σ²) (σ = 5 in
the baseline). a2T (a2NT): price offered to the second client
after a trade (no trade) with the first client.

σ               0.5    1      3      5      7
(a2T − a1∗)     0      0      0.58   0.58   0.45
(a2NT − a1∗)    0      0      -0.8   -0.6   -0.45

∆v              0      2      4      6      8
(a2T − a1∗)     0      0.34   0.58   0.95   0.84
(a2NT − a1∗)    0      -0.36  -0.6   -1.2   -1.36

Price Discovery

▶ To study whether AMs update their quotes as in the Glosten


and Milgrom equilibrium, we extend the baseline model to 2
clients: each episode has two periods with the same ṽ .
▶ More complex dynamic problem: Dealers face a dynamic
optimization problem because the price in period 1 affects the
informational content of the trade in period 1 and therefore
the choice of the price in period 2 (Leach and Madhavan
(1995)).
▶ Q-learning was precisely developed for this kind of
environment (estimating the value function in dynamic
optimization problems).
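One way to sketch this extension is to index the Q-matrix by a state that records the first client's decision, so the second-period quote can condition on the order flow. This is our own illustration of the idea (the grid, parameters, and state encoding are assumptions); a full dynamic Q-learning update would also add a discounted continuation value for the state reached after the first client, which the simple rule below omits:

```python
import math
import random

PRICES = [round(1.1 + 0.1 * k, 1) for k in range(139)]      # assumed price grid
STATES = ("first_client", "after_trade", "after_no_trade")  # order-flow information

# One Q-value per (state, price): quotes can now depend on what the first client did.
Q = {(s, a): random.random() for s in STATES for a in PRICES}

def choose_price(state, t, beta=8e-5):
    """Explore with probability exp(-beta * t); otherwise play the greedy price for this state."""
    if random.random() < math.exp(-beta * t):
        return random.choice(PRICES)
    return max(PRICES, key=lambda a: Q[(state, a)])

def update(state, price, profit, alpha=0.01):
    """Same cell-by-cell learning rule as in the one-client game."""
    Q[(state, price)] = alpha * profit + (1 - alpha) * Q[(state, price)]
```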
Extension

(Timeline: episodes t = 1, 2, ..., 1,000,000; a new asset payoff vt is realized in each episode, and the two clients within that episode share the same ṽ.)

Price Discovery
Roadmap

Introduction

The Market Making Game

Algo MMs

Implementation and Findings

Role of Experimentation

Price Discovery

Conclusion
Conclusion
▶ The behavior of algorithmic market makers using a
Q-learning algorithm is significantly different from that
predicted by the standard equilibrium analysis of the
market making game.
1. Non-competitive prices (not collusion, just imperfect estimates
of actual expected profits).
2. An increase in adverse selection costs makes algo market
makers more competitive.
▶ The variance of profits is important even though AMs
are not penalized for risk taking in our experiments.
1. Parameters that increase this variance make learning true
expected payoffs and optimal actions more difficult for AMs.
▶ AMs’ behavior is more consistent with the predictions
of the Nash equilibrium if they experiment more, but this
is costly.
▶ Next step: Check whether our findings are robust to more
complex algorithms.
Thank You!
