Mastering Pandas For Finance - Sample Chapter
This book will teach you to use Python and the Python
Data Analysis Library (pandas) to solve real-world financial
problems.
Starting with a focus on pandas data structures, you will learn
to load and manipulate time-series financial data and then
calculate common financial measures, leading into more
advanced derivations using fixed- and moving-windows.
This leads into correlating time-series data to both index
and social data to build simple trading algorithms. From
there, you will learn about more complex trading algorithms
and implement them using open source back-testing tools.
Then, you will examine the calculation of the value of
options and Value at Risk. This then leads into the modeling
of portfolios and calculation of optimal portfolios based
upon risk. All concepts will be demonstrated continuously
through progressive examples using interactive Python
and IPython Notebook.
By the end of the book, you will be familiar with applying
pandas to many financial problems, giving you the knowledge
needed to leverage pandas in the real world of finance.
Mastering pandas
for Finance
Master pandas, an open source Python Data Analysis Library,
for financial data analysis
Michael Heydt
Algorithmic Trading
In this chapter, we will examine how to use pandas and a library known as Zipline
to develop automated trading algorithms. Zipline (http://www.zipline.io/) is a
Python-based algorithmic trading library. It provides event-driven approximations
of live-trading systems. It is currently used in production as the trading engine that
powers Quantopian (https://www.quantopian.com/), a free, community-centered
platform for collaborating on the development of trading algorithms with a
web browser.
We previously simulated trading based on a historical review of social and stock
data, but these examples were naive in that they glossed over many facets of real
trading, such as transaction fees, commissions, and slippage, among many others.
Zipline provides robust capabilities to include these factors in the trading model.
Zipline also provides a facility referred to as backtesting. Backtesting is the ability
to run an algorithm on historical data to determine the effectiveness of the decisions
made on actual market data. This can be used to vet the algorithm and compare it to
others in an effort to determine the best trading decisions for your situation.
We will examine three specific and fundamental trading algorithms: simple
crossover, dual moving average crossover, and pairs trade. We will first look at how
these algorithms operate and make decisions, and then we will actually implement
these using Zipline and execute and analyze them on historical data.
This chapter will cover the following topics in detail:
- Momentum and mean-reversion strategies
- Simple and exponentially weighted moving averages
- Crossovers and dual moving average crossovers
- Pairs trading
- Implementing and backtesting trading algorithms with Zipline
Notebook setup
The Notebook and examples will all require the following code to execute and
format output. Later in the chapter, we will import the Zipline package but only after
first discussing how to install it in your Python environment:
In [1]:
import pandas as pd
import pandas.io.data as web
import numpy as np
from datetime import datetime
import matplotlib.pyplot as plt
%matplotlib inline
pd.set_option('display.notebook_repr_html', False)
pd.set_option('display.max_columns', 8)
pd.set_option('display.max_rows', 10)
pd.set_option('display.width', 78)
pd.set_option('precision', 6)
Momentum strategies
Momentum trading focuses on stocks that are moving in a specific direction on
high volume, measuring the rate of change in price. Momentum is typically measured
by continuously computing price differences over a fixed time interval. It is a
useful indicator of the strength or weakness of a price trend, although it tends to be
more useful in rising markets, which occur more frequently than falling markets;
therefore, momentum-based prediction gives better results in a rising market.
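A minimal sketch, not from the book, of how an N-period momentum (rate of change)
measure could be computed with pandas; the function name and the 10-period default
are illustrative assumptions:

import pandas as pd

def momentum(prices, periods=10):
    # rate of change of the price over the last `periods` intervals;
    # positive values indicate upward momentum, negative values downward
    return prices / prices.shift(periods) - 1.0

For example, momentum(msft['Adj Close']) would produce a series that is positive
while the price is rising over the chosen interval and negative while it is falling.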
Mean-reversion strategies
Mean reversion is a theory in trading that prices and returns will eventually move
back towards the historical mean of the stock, or towards another average, such as the
growth of the economy or an industry average. When the market price is below the
average price, a stock is considered attractive for purchase as it is expected that the
price will rise and, hence, a profit can be made by buying and holding the stock as it
rises and then selling at its peak. If the current market price is above the mean, the
expectation is the price will fall and there is potential for profit in shorting the stock.
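A minimal sketch, not from the book, of how such a mean-reversion signal could be
expressed with pandas; the 120-day window, the 5 percent threshold, and the function
name are illustrative assumptions:

import pandas as pd

def mean_reversion_signal(prices, window=120, threshold=0.05):
    mean = pd.rolling_mean(prices, window)      # historical average price
    deviation = (prices - mean) / mean          # relative distance from the mean
    signal = pd.Series(0, index=prices.index)
    signal[deviation < -threshold] = 1          # well below the mean: expect a rise, buy
    signal[deviation > threshold] = -1          # well above the mean: expect a fall, short
    return signal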
Moving averages
Whether using a momentum or mean-reversion strategy for trading, the analyses
will, in one form or another, utilize moving averages of the closing price of stocks.
We have seen these before when we looked at calculating a rolling mean. We will
now examine several different forms of rolling means and cover several concepts
that are important in making trading decisions based upon how one or more
means move over time.
To demonstrate this, take a look at the closing price of MSFT for 2014 relative to its
7-day, 30-day, and 120-day rolling means during the same period:
In [2]:
msft = web.DataReader("MSFT", "yahoo",
                      datetime(2000, 1, 1),
                      datetime(2014, 12, 31))
msft[:5]
Out[2]:
              Open    High     Low   Close    Volume  Adj Close
Date
2000-01-03  117.38  118.62  112.00  116.56  53228400      41.77
2000-01-04  113.56  117.12  112.25  112.62  54119000      40.36
2000-01-05  111.12  116.38  109.38  113.81  64059600      40.78
2000-01-06  112.19  113.88  108.38  110.00  54976600      39.41
2000-01-07  108.62  112.25  107.31  111.44  62013600      39.93
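The cell that adds the moving average columns is not included in this excerpt. A
minimal sketch of what it presumably looks like, using the pd.rolling_mean function
the chapter relies on later and the MA7, MA30, MA90, and MA120 column names
referenced in the following cells (the cell number is an assumption):

In [3]:
msft['MA7'] = pd.rolling_mean(msft['Adj Close'], 7)
msft['MA30'] = pd.rolling_mean(msft['Adj Close'], 30)
msft['MA90'] = pd.rolling_mean(msft['Adj Close'], 90)
msft['MA120'] = pd.rolling_mean(msft['Adj Close'], 120)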
Then, we plot the price versus various rolling means to see this concept of support:
In [4]:
msft['2014'][['Adj Close', 'MA7',
'MA30', 'MA120']].plot(figsize=(12,8));
The price of MSFT had a progressive rise over 2014, and the 120-day rolling mean
has functioned as a floor/support, where the price bounces off this floor as it
approaches it. The longer the window of the rolling mean, the lower and smoother
the floor will be in an uptrending market.
Contrast this with the price of the stock in 2002, when it had a steady decrease
in value:
In [5]:
msft['2002'][['Adj Close', 'MA7',
'MA30', 'MA120']].plot(figsize=(12,8));
In this situation, the 120-day moving average functions as a ceiling for about 9
months. This ceiling is referred to as resistance as it tends to push prices down as
they rise up towards this ceiling.
The price does not always respect the moving average. In both
of these cases, the price has crossed over the moving average
and, at times, has reversed its movement slightly before or
just after crossing the average.
In general, though, if the price is above a particular moving average, then it can be
said that the trend for that stock is up relative to that average; when the price is
below a particular moving average, the trend is down.
The method of calculating the moving average used in the previous examples is
known as a simple moving average (SMA). The examples calculated the 7-, 30-, and
120-day SMA values.
While valuable and used to form the basis of other technical analyses, simple moving
averages have several drawbacks. They are listed as follows:

- The shorter the window used, the more the noise in the signal feeds into
the result
- The average calculated at the end of the window can be significantly skewed
by values earlier in the window that deviate significantly from the mean
An alternative that addresses these drawbacks is the exponentially weighted moving
average (EWMA), which weights recent observations more heavily than older ones.
For a given span, pandas uses alpha = 2 / (span + 1); each successively older
observation is weighted by an additional factor of (1 - alpha), and the weighted sum
of the observations, divided by the sum of the weights, is the result. This alpha can be
derived from any one of three parameters: span, center of mass, or half-life.
One must specify precisely one of the three values to the pd.ewma() function at
which point pandas will use the corresponding formulation for alpha.
For a 10-period span, the most recent value is weighted at 21 percent of the result;
each successively older point's weight decreases by a factor of (1 - alpha), and the
total of these weights is equal to 1.0.
The center of mass option specifies the point where half of the number of weights
would be on each side of the center of mass. In the case of a 10-period span, the center
of mass is 5.5. Data points 1, 2, 3, 4, and 5 are on one side, and 6, 7, 8, 9, and 10 are on
the other. The actual weight is not taken into account, just the number of items.
The half-life specification gives the number of periods over which the weighting
factor decays to half of its value. For the 10-period span, the half-life
value is 3.454152. The first weight is 0.21, and we would expect that to reduce to
0.105 just under halfway between points 4 and 5 (1+3.454152=4.454152). These
values are 0.115 and 0.094, and 0.105 is indeed between the two.
The following example demonstrates how the exponential weighted moving average
differs from a normal moving average. It calculates both kinds of averages for a
90-day window and plots the results:
In [7]:
span = 90
msft_ewma = msft[['Adj Close']].copy()
msft_ewma['MA90'] = pd.rolling_mean(msft_ewma['Adj Close'], span)
msft_ewma['EWMA90'] = pd.ewma(msft_ewma['Adj Close'],
span=span)
msft_ewma['2014'].plot(figsize=(12, 8));
The exponential moving averages exhibit less lag, and, therefore, are more sensitive to
recent prices and price changes. Since more recent values are favored, they will turn
before simple moving averages, facilitating decision making on changes in momentum.
Comparatively, a simple moving average represents a truer average of prices for
the entire time period. Therefore, a simple moving average may be better suited to
identify the support or resistance level.
Crossovers
A crossover is the most basic type of signal for trading. The simplest form of a
crossover is when the price of an asset moves from one side of a moving average to
the other. This crossover represents a change in momentum and can be used as a
point of making the decision to enter or exit the market.
The following command exemplifies several crossovers in the Microsoft data:
In [8]:
msft['2002-1':'2002-9'][['Adj Close',
'MA30']].plot(figsize=(12,8));
As an example, the cross occurring on July 09, 2002, is a signal of the beginning
of a downtrend and would likely be used to close out any existing long positions.
Conversely, a close above a moving average, as shown around August 13, may
suggest the beginning of a new uptrend and a signal to go long on the stock.
A second type of crossover, referred to as a dual moving average crossover, occurs
when a short-term average crosses a long-term average. This signal is used to
identify that momentum is shifting in the direction of the short-term average. A buy
signal is generated when the short-term average crosses the long-term average and
rises above it, while a sell signal is triggered by a short-term average crossing longterm average and falling below it.
To demonstrate this, the following command shows MSFT for January 2002 through
June 2002. There is one crossover of the 30- and 90-day moving averages, with the
30-day average moving from above to below the 90-day average. This is a significant
signal of a coming downswing in the stock:
In [9]:
msft['2002-1':'2002-6'][['Adj Close', 'MA30', 'MA90']
].plot(figsize=(12,8));
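To locate such crossover dates programmatically rather than by eye, a small sketch,
not from the book, assuming the MA30 and MA90 columns computed earlier:

sub = msft['2002-1':'2002-6']
above = sub['MA30'] > sub['MA90']
cross_dates = sub.index[above != above.shift(1)][1:]   # skip the first comparison
print(cross_dates)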
Pairs trading
Pairs trading is a strategy that implements statistical arbitrage and convergence trading.
The basic idea is that, as we have seen, prices tend to move back to the mean. If two
stocks can be identified that have a relatively high correlation, then the change in the
difference in price between the two stocks can be used to signal trading events if one
of the two moves out of correlation with the other.
If the change in the spread between the two stocks exceeds a certain level (their
correlation has decreased), then the relatively higher-priced stock is a candidate for
a short position and should be sold, as it is assumed that the spread will decrease
when the higher-priced stock returns to the mean (its price falls as the correlation
returns to a higher level). Likewise, the relatively lower-priced stock is a candidate
for a long position, and it is assumed that its price will rise as the correlation
returns to normal levels.
This strategy relies on the two stocks normally being correlated, with temporary
reductions in correlation caused by one of them making a positive or negative move
due to effects on that stock alone, outside of the market forces the pair shares. This
difference can be used to our advantage in an arbitrage by selling and buying equal
amounts of each stock and profiting as the two prices move back into correlation.
Of course, if the two stocks move to a truly different level of correlation, then this
might be a losing situation.
Coca-Cola (KO) and Pepsi (PEP) are a canonical example of pairs trading as they
are both in the same market segment and are likely to be affected by the same
market events, such as the prices of their common ingredients.
As an example, the following screenshot shows the price of Pepsi and Coca-Cola
from January 1997 through June 1998 (we will revisit this series of data later when
we implement pairs trading):
These prices are generally highly correlated during this period, but there is a marked
change in correlation that starts in August 1997 and seems to take until the end of
the year to move back into alignment. This is a situation where pairs trading can give
profits if identified and executed properly.
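The part of the chapter that covers installing Zipline is not included in this excerpt.
The remaining examples assume the library has been installed (for example, with pip)
and imported under the zp alias, along with statsmodels, which the pairs-trading
example uses later; the cell number is an assumption:

In [10]:
import zipline as zp
import statsmodels.api as sm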
The following is a simple algorithm for trading AAPL that is provided with the
Zipline examples, albeit modified to be a class and run in IPython. It also prints
some additional diagnostic output to trace how the process executes in more detail:
In [11]:
class BuyApple(zp.TradingAlgorithm):
    trace = False

    def initialize(context):
        if BuyApple.trace: print("---> initialize")
        if BuyApple.trace: print(context)
        if BuyApple.trace: print("<--- initialize")
The trading simulation starts with the call to the .initialize() method. This
is your opportunity to initialize the trading simulation. In this sample, we do not
perform any initialization other than printing the context for examination.
The implementation of the actual trading is handled in the override of the
handle_data method. This method will be called for each day of the trading
simulation and is passed the bar data for that day.
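The rest of the class is not shown in this excerpt. Based on the description that
follows (one share of AAPL is bought blindly on every trading day), the omitted
override presumably resembles the following sketch, in the same tracing style; the
exact body is an assumption:

    def handle_data(context, data):
        if BuyApple.trace: print("---> handle_data")
        if BuyApple.trace: print(data)
        # blindly place an order for one share of AAPL every trading day
        context.order('AAPL', 1)
        if BuyApple.trace: print("<-- handle_data")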
The data for the simulation is the AAPL price history:

In [12]:
# the first lines of this cell are missing from the excerpt; the call
# below is an assumption based on the parameters that remain
data = zp.utils.factory.load_from_yahoo(stocks=['AAPL'],
                                        indexes={},
                                        start=datetime(1990, 1, 1),
                                        end=datetime(2014, 1, 1),
                                        adjusted=False)
data.plot(figsize=(12,8));
Our first simulation will purposely use only one week of historical data so that we
can easily keep the output to a nominal size that will help us to easily examine the
results of the simulation:
In [13]:
result = BuyApple().run(data['2000-01-03':'2000-01-07'])
---> initialize
BuyApple(
capital_base=100000.0
sim_params=
SimulationParameters(
period_start=2006-01-01 00:00:00+00:00,
period_end=2006-12-31 00:00:00+00:00,
capital_base=100000.0,
data_frequency=daily,
emission_rate=daily,
first_open=2006-01-03 14:31:00+00:00,
last_close=2006-12-29 21:00:00+00:00),
initialized=False,
slippage=VolumeShareSlippage(
volume_limit=0.25,
price_impact=0.1),
commission=PerShare(cost=0.03, min_trade_cost=None),
blotter=Blotter(
transact_partial=(VolumeShareSlippage(
volume_limit=0.25,
price_impact=0.1), PerShare(cost=0.03, min_trade_cost=None)),
open_orders=defaultdict(<type 'list'>, {}),
orders={},
new_orders=[],
current_dt=None),
recorded_vars={})
<--- initialize
---> handle_data
BarData({'AAPL': SIDData({'volume': 1000, 'sid': 'AAPL',
'source_id': 'DataFrameSource-fc37c5097c557f0d46d6713256f4eaa3',
'dt': Timestamp('2000-01-03 00:00:00+0000', tz='UTC'), 'type': 4,
'price': 111.94})})
<-- handle_data
---> handle_data
[2015-04-16 21:53] INFO: Performance: Simulated 5 trading days
out of 5.
[2015-04-16 21:53] INFO: Performance: first open: 2000-01-03
14:31:00+00:00
[2015-04-16 21:53] INFO: Performance: last close: 2000-01-07
21:00:00+00:00
<-- handle_data
---> handle_data
BarData({'AAPL': SIDData({'price': 104.0, 'volume': 1000, 'sid': 'AAPL',
'source_id': 'DataFrameSource-fc37c5097c557f0d46d6713256f4eaa3',
'dt': Timestamp('2000-01-05 00:00:00+0000', tz='UTC'), 'type': 4})})
<-- handle_data
---> handle_data
BarData({'AAPL': SIDData({'price': 95.0, 'volume': 1000, 'sid': 'AAPL',
'source_id': 'DataFrameSource-fc37c5097c557f0d46d6713256f4eaa3',
'dt': Timestamp('2000-01-06 00:00:00+0000', tz='UTC'), 'type': 4})})
<-- handle_data
---> handle_data
BarData({'AAPL': SIDData({'price': 99.5, 'volume': 1000, 'sid': 'AAPL',
'source_id': 'DataFrameSource-fc37c5097c557f0d46d6713256f4eaa3',
'dt': Timestamp('2000-01-07 00:00:00+0000', tz='UTC'), 'type': 4})})
<-- handle_data
The context in the initialize method shows us some parameters that the
simulation will use during its execution. The context also shows that we start with
a base capitalization of 100000.0. There will be a commission of $0.03 assessed for
each share purchased.
The bar data is also printed for each day of trading. The output shows us that Zipline
passes the price data for AAPL on each day. We do not utilize this information in this
simulation and blindly purchase one share of AAPL.
The result of the simulation is assigned to the result variable, which we can analyze for
detailed results of the simulation on each day of trading. This is a DataFrame where
each column represents a particular measurement during the simulation, and each row
represents the values of those variables on each day of trading during the simulation.
We can examine a number of the variables to demonstrate what Zipline was doing
during the processing. The orders variable contains a list of all orders made during
the day. The following command gets the orders for the first day of the simulation:
In [14]:
result.iloc[0].orders
Out[14]:
[{'amount': 1,
'commission': None,
'created': Timestamp('2000-01-03 00:00:00+0000', tz='UTC'),
'dt': Timestamp('2000-01-03 00:00:00+0000', tz='UTC'),
'filled': 0,
'id': 'dccb19f416104f259a7f0bff726136a2',
'limit': None,
'limit_reached': False,
'sid': 'AAPL',
'status': 0,
'stop': None,
'stop_reached': False}]
This tells us that Zipline placed an order in the market for one share of AAPL on
2000-01-03. The order has a filled value of 0, which means that this trade has not yet
been executed in the market.
On the second day of trading, Zipline reports that two orders were made:
In [15]:
result.iloc[1].orders
Out[15]:
[{'amount': 1,
'commission': 0.03,
'created': Timestamp('2000-01-03 00:00:00+0000', tz='UTC'),
'dt': Timestamp('2000-01-04 00:00:00+0000', tz='UTC'),
'filled': 1,
'id': 'dccb19f416104f259a7f0bff726136a2',
'limit': None,
'limit_reached': False,
'sid': 'AAPL',
'status': 1,
'stop': None,
'stop_reached': False},
{'amount': 1,
'commission': None,
'created': Timestamp('2000-01-04 00:00:00+0000', tz='UTC'),
'dt': Timestamp('2000-01-04 00:00:00+0000', tz='UTC'),
'filled': 0,
'id': '1ec23ea51fd7429fa97b9f29a66bf66a',
'limit': None,
'limit_reached': False,
'sid': 'AAPL',
'status': 0,
'stop': None,
'stop_reached': False}]
The first order listed has the same ID as the order from day one, so it represents that
same order; its filled key is now 1, telling us that the order has been filled in
the market.

The second order is a new order representing our request on the second day of
trading, which will be reported as filled on the following day.
During the simulation, Zipline keeps track of the amount of cash (capital) we have
at the start and end of each day. As we purchase stocks, our cash is reduced. Starting
and ending cash are represented by the starting_cash and ending_cash variables
of the result.

Zipline also tracks the total value of the stock held during the simulation. This
value is represented in each trading period using the ending_value variable of
the result.

The following command shows us the running values of starting_cash,
ending_cash, and ending_value:
In [16]:
result[['starting_cash', 'ending_cash', 'ending_value']]
Out[16]:
                     starting_cash   ending_cash  ending_value
2000-01-03 21:00:00   100000.00000  100000.00000           0.0
2000-01-04 21:00:00   100000.00000   99897.46999         102.5
2000-01-05 21:00:00    99897.46999   99793.43998         208.0
2000-01-06 21:00:00    99793.43998   99698.40997         285.0
2000-01-07 21:00:00    99698.40997   99598.87996         398.0
Ending cash represents the amount of cash (capital) that we have to invest at the end
of the given day. We made an order on day one for one share of AAPL, but since the
transaction did not execute until the next day, we still have our starting capital at
the end of that day. On day two, the order executes at that day's closing price of
102.50. Hence, our ending_cash is reduced by 102.50 for the one share, plus the
$0.03 commission, resulting in 99897.47.
At the end of day two, our ending_value, that is, our position in the market, is 102.5
as we have accumulated one share of AAPL, and it closed at 102.5 on day two.
We did not print starting_value, as on the first day it is 0.0:
our initial capitalization of 100000.0 is entirely cash, and we
have not yet bought any securities.
While investing, we would be interested in the overall value of our portfolio, which,
in this case, would be the value of our on-hand cash plus our position in the market.
This can be easily calculated:
In [17]:
pvalue = result.ending_cash + result.ending_value
pvalue
Out[17]:
2000-01-03 21:00:00    100000.00000
2000-01-04 21:00:00     99999.96999
2000-01-05 21:00:00    100001.43998
2000-01-06 21:00:00     99983.40997
2000-01-07 21:00:00     99996.87996
dtype: float64
In a similar vein, we can also calculate the daily returns on our investment using
.pct_change():
In [19]:
result.portfolio_value.pct_change()
Out[19]:
2000-01-03 21:00:00             NaN
2000-01-04 21:00:00   -3.00103e-07
2000-01-05 21:00:00    1.46999e-05
2000-01-06 21:00:00   -1.80297e-04
2000-01-07 21:00:00    1.34722e-04
This is actually a column of the results from the simulation, so we do not need to
calculate it ourselves:
In [20]:
result['returns']
Out[20]:
2000-01-03 21:00:00             NaN
2000-01-04 21:00:00   -3.00103e-07
2000-01-05 21:00:00    1.46999e-05
2000-01-06 21:00:00   -1.80297e-04
2000-01-07 21:00:00    1.34722e-04
Using this small trading interval, we have seen what type of calculations Zipline
performs during each period. Now, let's run this simulation over a longer period of
time to see how it performs. The following command runs the simulation across the
entire year 2000:
In [21]:
result_for_2000 = BuyApple().run(data['2000'])
Out[21]:
[2015-02-15 05:05] INFO: Performance: Simulated 252 trading days
out of 252.
[2015-02-15 05:05] INFO: Performance: first open: 2000-01-03
14:31:00+00:00
[2015-02-15 05:05] INFO: Performance: last close: 2000-12-29
21:00:00+00:00
The following command shows us our cash on hand and the value of our
investments throughout the simulation:
In [22]:
result_for_2000[['ending_cash', 'ending_value']]
Out[22]:
                      ending_cash  ending_value
2000-01-03 21:00:00  100000.00000          0.00
2000-01-04 21:00:00   99897.46999        102.50
2000-01-05 21:00:00   99793.43998        208.00
2000-01-06 21:00:00   99698.40997        285.00
2000-01-07 21:00:00   99598.87996        398.00
...                           ...           ...
2000-12-22 21:00:00   82082.91821       3705.00
2000-12-26 21:00:00   82068.19821       3643.12
2000-12-27 21:00:00   82053.35821       3687.69
2000-12-28 21:00:00   82038.51820       3702.50
2000-12-29 21:00:00   82023.60820       3734.88
The following command visualizes our overall portfolio value during the year 2000:
In [23]:
result_for_2000.portfolio_value.plot(figsize=(12,8));
Our strategy has lost us money over the year 2000. AAPL generally trended
downward during the year, and simply buying every day is a losing strategy.
The following command runs the simulation over 5 years:
In [24]:
result = BuyApple().run(data['2000':'2004']).portfolio_value
result.plot(figsize=(12,8));
[2015-04-16 22:52] INFO: Performance: Simulated 1256 trading days
out of 1256.
[2015-04-16 22:52] INFO: Performance: first open: 2000-01-03
14:31:00+00:00
[2015-04-16 22:52] INFO: Performance: last close: 2004-12-31
21:00:00+00:00
Hanging in with this strategy over several more years has paid off, as AAPL had a
marked upswing in value starting in mid-2003.
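The introduction to the dual moving average crossover example, including the step
that creates the sub_data used below, is not part of this excerpt. Since the simulation
later reports trading days from 1990 through 2001, it is presumably just a slice of the
AAPL data loaded earlier, along the lines of:

sub_data = data['1990':'2001']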
class DualMovingAverage(zp.TradingAlgorithm):
    def initialize(context):
        # we need to track two moving averages, so we will set
        # these up in the context. The .add_transform method
        # informs Zipline to execute a transform on every day
        # of trading

        # the following will set up a MovingAverage transform,
        # named short_mavg, accessing the .price field of the
        # data, and a length of 100 days
        context.add_transform(zp.transforms.MovingAverage,
                              'short_mavg', ['price'],
                              window_length=100)
        # and the following is a 400 day MovingAverage
        context.add_transform(zp.transforms.MovingAverage,
                              'long_mavg', ['price'],
                              window_length=400)

        # this is a flag we will use to track the state of
        # whether or not we have made our first trade when the
        # means cross
        context.invested = False

    def handle_data(self, data):
        # the start of this method is not shown in the sample chapter
        # excerpt; the lines from here to the buy branch follow
        # Zipline's standard dual moving average example
        short_mavg = data['AAPL'].short_mavg['price']
        long_mavg = data['AAPL'].long_mavg['price']
        buy = False
        sell = False

        if short_mavg > long_mavg and not self.invested:
            # short moved across the long, trending up: buy
            # 100 shares and prevent further buys until
            # the next cross
            self.order_target('AAPL', 100)
            self.invested = True
            buy = True  # records that we did a buy
        elif short_mavg < long_mavg and self.invested:
            # short moved across the long, trending down
            # sell it all!
            self.order_target('AAPL', -100)
            # prevents further sales until the next cross
            self.invested = False
            sell = True  # and note that we did sell

        # add extra data to the results of the simulation to
        # give the short and long ma on the interval, and if
        # we decided to buy or sell
        self.record(short_mavg=short_mavg,
                    long_mavg=long_mavg,
                    buy=buy,
                    sell=sell)
We can now execute this algorithm by passing it data from 1990 through 2001, as
shown here:
In [27]:
results = DualMovingAverage().run(sub_data)
[2015-02-15 22:18] INFO: Performance: Simulated 3028 trading days
out of 3028.
[2015-02-15 22:18] INFO: Performance: first open: 1990-01-02
14:31:00+00:00
[2015-02-15 22:18] INFO: Performance: last close: 2001-12-31
21:00:00+00:00
To analyze the results of the simulation, we can use the following function that
creates several charts that show the short/long means relative to price, the value of
the portfolio, and the points at which we made buys and sells:
In [28]:
def analyze(data, perf):
    fig = plt.figure()
    ax1 = fig.add_subplot(211, ylabel='Price in $')
    # the next two statements are not shown in the sample chapter
    # excerpt; they plot the price and the two moving averages that
    # the buy/sell markers below are drawn against
    data['AAPL'].plot(ax=ax1, color='r', lw=2.)
    perf[['short_mavg', 'long_mavg']].plot(ax=ax1, lw=2.)
    ax1.plot(perf.ix[perf.buy].index, perf.short_mavg[perf.buy],
             '^', markersize=10, color='m')
    ax1.plot(perf.ix[perf.sell].index, perf.short_mavg[perf.sell],
             'v', markersize=10, color='k')

    ax2 = fig.add_subplot(212, ylabel='Portfolio value in $')
    perf.portfolio_value.plot(ax=ax2, lw=2.)
    ax2.plot(perf.ix[perf.buy].index,
             perf.portfolio_value[perf.buy],
             '^', markersize=10, color='m')
    ax2.plot(perf.ix[perf.sell].index,
             perf.portfolio_value[perf.sell],
             'v', markersize=10, color='k')

    plt.legend(loc=0)
    plt.gcf().set_size_inches(14, 10)
Using this function, we can plot the decisions made and the resulting portfolio value
as trades are executed:
In [29]:
analyze(sub_data, results)
The crossover points are noted on the graphs using triangles. Upward-pointing
magenta triangles identify buys and downward-pointing black triangles identify sells.
Portfolio value stays level after a sell as we are completely divested from the market
until we make another purchase.
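The step that loads the Coca-Cola and Pepsi prices for the pairs-trading example
(and produces the chart referred to below) is not included in this excerpt; it
presumably mirrors the earlier AAPL load, along these lines, where the exact
function call, dates, and options are assumptions:

data = zp.utils.factory.load_from_yahoo(stocks=['PEP', 'KO'],
                                        indexes={},
                                        start=datetime(1997, 1, 1),
                                        end=datetime(1998, 6, 1),
                                        adjusted=True)
data.plot(figsize=(12, 8));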
Analyzing the chart, we can see that the two stocks tend to follow along the same
trend line, but that there is a point where Coke takes a drop relative to Pepsi (August
1997 through December 1997). It then tends to follow the same path although with a
wider spread during 1998 than in early 1997.
We can dive deeper into this information to see what we can do with pairs trading.
In this algorithm, we will examine how the spread between the two stocks changes
over time. Therefore, we need to calculate the spread:

In [31]:
data['PriceDelta'] = data.PEP - data.KO
data['1997':].PriceDelta.plot(figsize=(12,8))
plt.ylabel('Spread')
plt.axhline(data.PriceDelta.mean());
Using this information, we can make a decision to buy one stock and sell the other
if the spread exceeds a particular size. In the algorithm we implement, we will
normalize the spread data on a 100-day window and use that to calculate the z-score
on each particular day.
If the z-score is > 2, then the spread has widened beyond our threshold with PEP at
the relatively higher price, so we will want to sell PEP and buy KO in the expectation
that the spread narrows. If the z-score is < -2, then PEP is at the relatively lower
price, so we want to buy PEP and sell KO.
Additionally, if the absolute value of the z-score < 0.5, then we will sell off any
holdings we have in either stock to limit our exposure as we consider the spread to
be fairly stable and we can divest.
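Before turning to the Zipline implementation, a small sketch, not the book's algorithm,
of a rolling z-score of the raw spread over a 100-day window, using the PriceDelta
column computed above:

window = 100
spread = data.PriceDelta
zscore = (spread - pd.rolling_mean(spread, window)) / pd.rolling_std(spread, window)
zscore.plot(figsize=(12, 8));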
One calculation that we will need to perform during the simulation is a regression
between the two price series. This will then be used to calculate the z-score of
the spread at each interval. To do this, the following function is created:
In [32]:
@zp.transforms.batch_transform
def ols_transform(data, ticker1, ticker2):
    p0 = data.price[ticker1]
    p1 = sm.add_constant(data.price[ticker2], prepend=True)
    slope, intercept = sm.OLS(p0, p1).fit().params
    return slope, intercept
    def compute_zscore(self, data, slope, intercept):
        # calculate the spread
        spread = (data['PEP'].price -
                  (slope * data['KO'].price + intercept))
        self.spreads.append(spread)  # record for z-score calc
        self.record(spread=spread)
        spread_wind = self.spreads[-self.window_length:]
        zscore = (spread - np.mean(spread_wind)) / np.std(spread_wind)
        return zscore
During the simulation of the algorithm, we recorded any transactions made, which
can be accessed using the action column of the result DataFrame:
In [35]:
selection = ((perf.action=='PK') | (perf.action=='KP') |
(perf.action=='DE'))
actions = perf[selection][['action']]
actions
Out[35]:
                    action
1997-07-16 20:00:00     KP
1997-07-22 20:00:00     DE
1997-08-05 20:00:00     PK
1997-10-15 20:00:00     DE
1998-03-09 21:00:00     PK
1998-04-28 20:00:00     DE
ax3.axhline(2, color='k')
ax3.axhline(-2, color='k')
plt.ylabel('Z-score')

ax4 = plt.subplot(414)
perf['1997':].portfolio_value.plot()
plt.ylabel('Portfolio Value')

for ax in [ax1, ax2, ax3, ax4]:
    for d in actions.index[actions.action=='PK']:
        ax.axvline(d, color='g')
    for d in actions.index[actions.action=='KP']:
        ax.axvline(d, color='c')
    for d in actions.index[actions.action=='DE']:
        ax.axvline(d, color='r')

plt.gcf().set_size_inches(16, 12)
The first event is on 1997-7-16 when the algorithm saw the spread become less than
-2, and, therefore, triggered a sale of KO and a buy of PEP. This quickly turned
around and moved to a z-score of 0.19 on 1997-7-22, triggering a divesting of our
position. During this time, even though we played the spread, we still lost because
a reversion happened very quickly.
On 1997-08-05, the z-score moved above 2.0 to 2.12985 and triggered a purchase of KO
and a sale of PEP. The z-score stayed around 2.0 until 1997-10-15 when it dropped to
-0.1482 and, therefore, we divested. Between those two dates, since the spread stayed
fairly consistent around 2.0, our playing of the spread made us consistent returns as
we can see with the portfolio value increasing steadily over that period.
On 1998-03-09, a similar trend was identified and, again, we bought KO and sold PEP.
Unfortunately, the spread started to narrow and we lost a little during this period.
Summary
In this chapter, we took an adventure into learning the fundamentals of algorithmic
trading using pandas and Zipline. We started with a little theory to set a framework
for understanding how the algorithms would be implemented. From there, we
implemented three different trading algorithms using Zipline and dived into the
decisions made and their impact on the portfolios as the transactions were executed.
Finally, we established a fundamental knowledge of how to simulate markets and
make automated trading decisions.