AlphaPy Documentation
Release 2.5.0
1 Introduction
1.1 Core Functionality
1.2 External Packages
2 Quick Start
3 Installation
3.1 XGBoost
3.2 Anaconda Python
4 Command Line
5 Support
5.1 Donations
6 Kaggle Tutorial
10 AlphaPy
10.1 Model Object Creation
10.2 Data Ingestion
10.3 Feature Processing
10.4 Feature Selection
10.5 Model Estimation
10.6 Grid Search
10.7 Model Evaluation
10.8 Model Selection
10.9 Plot Generation
10.10 Final Results
11 Project Structure
11.1 Setup
11.2 Model Configuration
11.3 Algorithms Configuration
11.4 Final Output
12 MarketFlow
12.1 Data Sources
12.2 Domain Configuration
12.3 Group Analysis
12.4 Variables and Aliases
12.5 Trading Systems
12.6 Model Configuration
12.7 Creating the Model
12.8 Running the Model
13 SportFlow
13.1 Data Sources
13.2 Domain Configuration
13.3 Model Configuration
13.4 Creating the Model
13.5 Running the Model
14 alphapy
14.1 alphapy package
Bibliography
Index
CHAPTER ONE: INTRODUCTION
AlphaPy is a machine learning framework for both speculators and data scientists. It is written in Python with the scikit-learn and pandas libraries, as well as many other helpful libraries for feature engineering and visualization. Here are just some of the things you can do with AlphaPy:
• Run machine learning models using scikit-learn and xgboost.
• Create models for analyzing the markets with MarketFlow.
• Predict sporting events with SportFlow.
• Develop trading systems and analyze portfolios using MarketFlow and Quantopian’s pyfolio.
The alphapy package is the base platform. The domain pipelines MarketFlow (mflow) and SportFlow (sflow)
run on top of alphapy. As shown in the diagram below, we separate the domain pipeline from the model pipeline.
The main job of a domain pipeline is to transform the raw application data into canonical form, i.e., a training set and
a testing set. The model pipeline is flexible enough to handle any project and has evolved over many Kaggle competitions.
Testing Data: The testing data is an external file that is read as a pandas dataframe. For classification, the labels
may or may not be included.
Model Pipeline: This Python code is generic for running all classification or regression models. The pipeline
begins with data and ends with a model object for new predictions.
Model YAML: The configuration file has specific sections for running the model pipeline. Every aspect of creating
a model is controlled through this file.
Model Object: All models are saved to disk. You can load and run your trained model on new data in scoring
mode.
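For example, reloading a saved model for scoring might look like this. This is a minimal sketch, assuming the model object was serialized with pickle and exposes a scikit-learn predict interface; the file name and X_new are illustrative:

import pickle
import numpy as np

# Restore the trained model object; the file name follows the
# model_[yyyymmdd].pkl pattern described in Project Structure.
with open("model/model_20171230.pkl", "rb") as f:
    predictor = pickle.load(f)

# X_new is a hypothetical feature matrix prepared the same way
# as the training data (50 features here, purely for illustration).
X_new = np.zeros((1, 50))
predictions = predictor.predict(X_new)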
1.1 Core Functionality

AlphaPy has been developed primarily for supervised learning tasks. You can generate models for any classification or regression problem.
• Binary Classification: classify elements into one of two groups
• Multiclass Classification: classify elements into multiple categories
• Regression: predict real values based on derived coefficients
Classification Algorithms:
• AdaBoost
• Extra Trees
• Gradient Boosting
• K-Nearest Neighbors
• Logistic Regression
• Support Vector Machine (including Linear)
• Naive Bayes (including Multinomial)
• Radial Basis Functions
• Random Forests
• XGBoost Binary and Multiclass
Regression Algorithms:
• Extra Trees
• Gradient Boosting
• K-Nearest Neighbors
• Linear Regression
• Random Forests
• XGBoost
1.2 External Packages

AlphaPy relies on a number of key packages in both its model and domain pipelines. Although many packages are included in the Anaconda Python platform, the following packages are not, so please refer to the Web or GitHub site for further information.
• categorical-encoding: https://github.com/scikit-learn-contrib/categorical-encoding
• imbalanced-learn: https://github.com/scikit-learn-contrib/imbalanced-learn
• pyfolio: https://github.com/quantopian/pyfolio
• XGBoost: https://github.com/dmlc/xgboost
CHAPTER TWO: QUICK START
cd AlphaPy/alphapy/examples
Note: You can work entirely within a Jupyter notebook, or solely from the command line. Generally, we like to run the pipelines first and then perform our analysis within a notebook.
CHAPTER THREE: INSTALLATION
You should already have pip, Python, and XGBoost (see below) installed on your system. Assuming the package is available on PyPI under the name alphapy, run the following command to install it:
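pip install -U alphapy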
3.1 XGBoost
For Macintosh and Windows users, XGBoost will not install automatically with pip. For instructions to install XGBoost on your specific platform, go to http://xgboost.readthedocs.io/en/latest/build.html.
3.2 Anaconda Python

Note: If you already have the Anaconda Python distribution, then you can create a virtual environment for AlphaPy with conda, using the following recipe.
conda create -n alphapy python=3.5
source activate alphapy
conda install -c conda-forge bokeh
conda install -c conda-forge ipython
conda install -c conda-forge matplotlib
conda install -c conda-forge numpy
conda install -c conda-forge pandas
conda install -c conda-forge pyyaml
conda install -c conda-forge scikit-learn
conda install -c conda-forge scipy
conda install -c conda-forge seaborn
conda install -c conda-forge xgboost
pip install pandas_datareader
pip install imbalanced-learn
pip install category_encoders
pip install pyfolio
CHAPTER FOUR: COMMAND LINE
The AlphaPy Command Line Interface (CLI) was designed to be as simple as possible. First, change the directory to
your project location, where you have already followed the Project Structure specifications:
cd path/to/project
alphapy
Usage:
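A sketch of the usage, assuming the training and prediction modes referenced elsewhere in these docs (run alphapy -h for the authoritative list of flags):

alphapy [--train | --predict]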
CHAPTER FIVE: SUPPORT
5.1 Donations
If you like the software, please click on the Donate button below:
CHAPTER SIX: KAGGLE TUTORIAL
The most popular introductory project on Kaggle is Titanic, in which you apply machine learning to predict which
passengers were most likely to survive the sinking of the famous ship. In this tutorial, we will run AlphaPy to train a
model, generate predictions, and create a submission file so you can see where you land on the Kaggle leaderboard.
Note: AlphaPy is a good starter for most Kaggle competitions. We also use it for other competitions such as the
crowd-sourced hedge fund Numerai.
Step 1: From the examples directory, change your directory:

cd Kaggle
Before running AlphaPy, let’s briefly review the model.yml file in the config directory. We will submit the actual
predictions (1 vs. 0) instead of the probabilities, so submit_probas is set to False. All features will be included
except for the PassengerId. The target variable is Survived, the label we are trying to accurately predict.
We’ll compare random forests and XGBoost, run recursive feature elimination and a grid search, and select the best
model. Note that a blended model of all the algorithms is a candidate for best model. The details of each algorithm
are located in the algos.yml file.
Listing 1: model.yml
project:
directory : .
file_extension : csv
data:
drop : ['PassengerId']
features : '*'
sampling :
option : False
method : under_random
ratio : 0.5
sentinel : -1
separator : ','
shuffle : False
split : 0.4
target : Survived
target_value : 1
model:
algorithms : ['RF', 'XGB']
balance_classes : True
calibration :
option : False
type : sigmoid
cv_folds : 3
estimators : 51
feature_selection :
option : False
percentage : 50
uni_grid : [5, 10, 15, 20, 25]
score_func : f_classif
grid_search :
option : True
iterations : 50
random : True
subsample : False
sampling_pct : 0.2
pvalue_level : 0.01
rfe :
option : True
step : 3
scoring_function : roc_auc
type : classification
features:
clustering :
option : True
increment : 3
maximum : 30
minimum : 3
counts :
option : True
encoding :
rounding : 2
type : factorize
factors : []
interactions :
option : True
pipeline:
number_jobs : -1
seed : 42
verbosity : 0
plots:
calibration : True
confusion_matrix : True
importances : True
learning_curve : True
roc_curve : True
xgboost:
stopping_rounds : 20
Step 2: Now, we are ready to run AlphaPy. Enter the following command:
alphapy
As alphapy runs, you will see the progress of the workflow, and the logging output is saved in alphapy.log.
When the workflow completes, your project structure will look like this, with a different datestamp:
Kaggle
Step 3: To see how your model ranks on the Kaggle leaderboard, upload the submission file from the output directory to the Web site https://www.kaggle.com/c/titanic/submit.
CHAPTER SEVEN
Machine learning subsumes technical analysis because collectively, technical analysis is just a set of features for
market prediction. We can use machine learning as a feature blender for moving averages, indicators such as RSI and
ADX, and even representations of chart formations such as double tops and head-and-shoulder patterns.
We are not directly predicting net return in our models, although that is the ultimate goal. By characterizing the market with models, we can increase the Return On Investment (ROI). We have a wide range of dependent or target variables from which to choose, not just net return. There is more power in building a classifier than a more traditional regression model, so we want to define binary conditions, such as whether or not today is going to be a trend day, rather than a numerical prediction of today's return.
In this tutorial, we will train a model that predicts whether or not the next day will have a larger-than-average range.
This is important for deciding which system to deploy on the prediction day. If our model gives us predictive power,
then we can filter out those days where trading a given system is a losing strategy.
Step 1: From the examples directory, change your directory:
cd "Trading Model"
Before running MarketFlow, let’s briefly review the configuration files in the config directory:
market.yml: The MarketFlow configuration file
model.yml: The AlphaPy configuration file
In market.yml, we limit our model to six stocks in the target group test, going back 2000 trading days. You can
define any group of stock symbols in the groups section, and then set the target_group attribute in the market
section to the name of that group.
This is a 1-day forecast, but we also use those features that can be calculated at the market open, such as gap information in the leaders section. In the features section, we define many variables for moving averages, historical range, RSI, volatility, and volume.
Listing 1: market.yml
market:
create_model : True
data_fractal : 1d
data_history : 500
forecast_period : 1
fractal : 1d
lag_period : 1
leaders : ['gap', 'gapbadown', 'gapbaup', 'gapdown', 'gapup']
predict_history : 100
schema : yahoo
subject : stock
target_group : test
groups:
all : ['aaoi', 'aapl', 'acia', 'adbe', 'adi', 'adp', 'agn', 'aig', 'akam',
'algn', 'alk', 'alxn', 'amat', 'amba', 'amd', 'amgn', 'amt', 'amzn',
'antm', 'arch', 'asml', 'athn', 'atvi', 'auph', 'avgo', 'axp', 'ayx',
'azo', 'ba', 'baba', 'bac', 'bby', 'bidu', 'biib', 'brcd', 'bvsn',
'bwld', 'c', 'cacc', 'cara', 'casy', 'cat', 'cde', 'celg', 'cern',
'chkp', 'chtr', 'clvs', 'cme', 'cmg', 'cof', 'cohr', 'comm', 'cost',
'cpk', 'crm', 'crus', 'csco', 'ctsh', 'ctxs', 'csx', 'cvs', 'cybr',
'data', 'ddd', 'deck', 'dgaz', 'dia', 'dis', 'dish', 'dnkn', 'dpz',
'drys', 'dust', 'ea', 'ebay', 'edc', 'edz', 'eem', 'elli', 'eog',
'esrx', 'etrm', 'ewh', 'ewt', 'expe', 'fang', 'fas', 'faz', 'fb',
'fcx', 'fdx', 'ffiv', 'fit', 'five', 'fnsr', 'fslr', 'ftnt', 'gddy',
'gdx', 'gdxj', 'ge', 'gild', 'gld', 'glw', 'gm', 'googl', 'gpro',
'grub', 'gs', 'gwph', 'hal', 'has', 'hd', 'hdp', 'hlf', 'hog', 'hum',
'ibb', 'ibm', 'ice', 'idxx', 'ilmn', 'ilmn', 'incy', 'intc', 'intu',
'ip', 'isrg', 'iwm', 'ivv', 'iwf', 'iwm', 'jack', 'jcp', 'jdst', 'jnj',
'jnpr', 'jnug', 'jpm', 'kite', 'klac', 'ko', 'kss', 'labd', 'labu',
'len', 'lite', 'lmt', 'lnkd', 'lrcx', 'lulu', 'lvs', 'mbly', 'mcd',
'mchp', 'mdy', 'meoh', 'mnst', 'mo', 'momo', 'mon', 'mrk', 'ms', 'msft',
'mtb', 'mu', 'nflx', 'nfx', 'nke', 'ntap', 'ntes', 'ntnx', 'nugt',
'nvda', 'nxpi', 'nxst', 'oii', 'oled', 'orcl', 'orly', 'p', 'panw',
'pcln', 'pg', 'pm', 'pnra', 'prgo', 'pxd', 'pypl', 'qcom', 'qqq',
'qrvo', 'rht', 'sam', 'sbux', 'sds', 'sgen', 'shld', 'shop', 'sig',
'sina', 'siri', 'skx', 'slb', 'slv', 'smh', 'snap', 'sncr', 'soda',
'splk', 'spy', 'stld', 'stmp', 'stx', 'svxy', 'swks', 'symc', 't',
'tbt', 'teva', 'tgt', 'tho', 'tlt', 'tmo', 'tna', 'tqqq', 'trip',
'tsla', 'ttwo', 'tvix', 'twlo', 'twtr', 'tza', 'uaa', 'ugaz', 'uhs',
'ulta', 'ulti', 'unh', 'unp', 'upro', 'uri', 'ups', 'uri', 'uthr',
'utx', 'uvxy', 'v', 'veev', 'viav', 'vlo', 'vmc', 'vrsn', 'vrtx', 'vrx',
'vwo', 'vxx', 'vz', 'wday', 'wdc', 'wfc', 'wfm', 'wmt', 'wynn', 'x',
'xbi', 'xhb', 'xiv', 'xle', 'xlf', 'xlk', 'xlnx', 'xom', 'xlp', 'xlu',
'xlv', 'xme', 'xom', 'wix', 'yelp', 'z']
etf : ['dia', 'dust', 'edc', 'edz', 'eem', 'ewh', 'ewt', 'fas', 'faz',
'gld', 'hyg', 'iwm', 'ivv', 'iwf', 'jnk', 'mdy', 'nugt', 'qqq',
'sds', 'smh', 'spy', 'tbt', 'tlt', 'tna', 'tvix', 'tza', 'upro',
aliases:
atr : 'ma_truerange'
aver : 'ma_hlrange'
cma : 'ma_close'
cmax : 'highest_close'
cmin : 'lowest_close'
hc : 'higher_close'
hh : 'higher_high'
hl : 'higher_low'
ho : 'higher_open'
hmax : 'highest_high'
hmin : 'lowest_high'
lc : 'lower_close'
lh : 'lower_high'
ll : 'lower_low'
lo : 'lower_open'
lmax : 'highest_low'
lmin : 'lowest_low'
net : 'net_close'
netdown : 'down_net'
netup : 'up_net'
omax : 'highest_open'
omin : 'lowest_open'
rmax : 'highest_hlrange'
rmin : 'lowest_hlrange'
rr : 'maratio_hlrange'
rixc : 'rindex_close_high_low'
rixo : 'rindex_open_high_low'
roi : 'netreturn_close'
rsi : 'rsi_close'
sepma : 'ma_sep'
variables:
abovema : 'close > cma_50'
belowma : 'close < cma_50'
bigup : 'rrover & sephigh & netup'
bigdown : 'rrover & sephigh & netdown'
doji : 'sepdoji & rrunder'
hookdown : 'open > high[1] & close < close[1]'
hookup : 'open < low[1] & close > close[1]'
inside : 'low > low[1] & high < high[1]'
madelta : '(close - cma_50) / atr_10'
nr : 'hlrange == rmin_4'
outside : 'low < low[1] & high > high[1]'
roihigh : 'roi_5 >= 5'
roilow : 'roi_5 < -5'
roiminus : 'roi_5 < 0'
roiplus : 'roi_5 > 0'
rrhigh : 'rr_1_10 >= 1.2'
rrlow : 'rr_1_10 <= 0.8'
rrover : 'rr_1_10 >= 1.0'
rrunder : 'rr_1_10 < 1.0'
sep : 'rixc_1 - rixo_1'
sepdoji : 'abs(sep) <= 15'
sephigh : 'abs(sep_1_1) >= 70'
seplow : 'abs(sep_1_1) <= 30'
trend : 'rrover & sephigh'
vmover : 'vmratio >= 1'
vmunder : 'vmratio < 1'
volatility : 'atr_10 / close'
wr : 'hlrange == rmax_4'
In each of the tutorials, we experiment with different options in model.yml to run AlphaPy. Here, we first apply
univariate feature selection and then run a random forest classifier with Recursive Feature Elimination, including
Cross-Validation (RFECV). When you choose RFECV, the process takes much longer, so if you want to see more
logging, then increase the verbosity level in the pipeline section.
Since stock prices are time series data, we apply the runs_test function to twelve features in the treatments
section. Treatments are powerful because you can write any function to extrapolate new features from existing ones.
AlphaPy provides some of these functions in the alphapy.features module, but it can also import external functions.
Our target variable is rrover, the ratio of the 1-day range to the 10-day average high/low range. If that ratio is greater
than or equal to 1.0, then the value of rrover is True. This is what we are trying to predict.
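In plain pandas, the underlying computation is roughly the following. This is an illustration of the variable's meaning, not MarketFlow's VDL engine, and the 10-day window is shortened to 3 so the toy series produces values:

import pandas as pd

# Toy OHLC bars; hlrange is the daily high-low range.
bars = pd.DataFrame({'high': [11.0, 12.0, 14.0, 13.0, 15.0],
                     'low' : [10.0, 10.5, 11.0, 12.0, 11.5]})
hlrange = bars['high'] - bars['low']

# rr: today's range relative to its trailing average range;
# rrover is True when that ratio is at least 1.0.
rr = hlrange / hlrange.rolling(window=3).mean()
rrover = rr >= 1.0
print(rrover.tolist())   # [False, False, True, False, True]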
Listing 2: model.yml
project:
directory : .
file_extension : csv
submission_file :
submit_probas : False
data:
drop : ['date', 'tag', 'open', 'high', 'low', 'close', 'volume', 'adjclose',
model:
algorithms : ['RF']
balance_classes : True
calibration :
option : False
type : isotonic
cv_folds : 3
estimators : 501
feature_selection :
option : True
percentage : 50
uni_grid : [5, 10, 15, 20, 25]
score_func : f_classif
grid_search :
option : False
iterations : 100
random : True
subsample : True
sampling_pct : 0.25
pvalue_level : 0.01
rfe :
option : True
step : 10
scoring_function : 'roc_auc'
type : classification
features:
clustering :
option : False
increment : 3
maximum : 30
minimum : 3
counts :
option : False
encoding :
rounding : 3
type : factorize
factors : []
interactions :
treatments:
doji : ['alphapy.features', 'runs_test', ['all'], 18]
hc : ['alphapy.features', 'runs_test', ['all'], 18]
hh : ['alphapy.features', 'runs_test', ['all'], 18]
hl : ['alphapy.features', 'runs_test', ['all'], 18]
ho : ['alphapy.features', 'runs_test', ['all'], 18]
rrhigh : ['alphapy.features', 'runs_test', ['all'], 18]
rrlow : ['alphapy.features', 'runs_test', ['all'], 18]
rrover : ['alphapy.features', 'runs_test', ['all'], 18]
rrunder : ['alphapy.features', 'runs_test', ['all'], 18]
sephigh : ['alphapy.features', 'runs_test', ['all'], 18]
seplow : ['alphapy.features', 'runs_test', ['all'], 18]
trend : ['alphapy.features', 'runs_test', ['all'], 18]
pipeline:
number_jobs : -1
seed : 10231
verbosity : 0
plots:
calibration : True
confusion_matrix : True
importances : True
xgboost:
stopping_rounds : 20
Step 2: Now, we are ready to run MarketFlow. Enter the following command:

mflow

As mflow runs, you will see the progress of the workflow, and the logging output is saved in market_flow.log.
When the workflow completes, your project structure will look like this, with a different datestamp:
Trading Model
    market_flow.log
    config
        algos.yml
        market.yml
        model.yml
    data
    input
        test_20170420.csv
        test.csv
        train_20170420.csv
        train.csv
    model
        feature_map_20170420.pkl
        model_20170420.pkl
    output
        predictions_20170420.csv
        probabilities_20170420.csv
        rankings_20170420.csv
    plots
        calibration_test.png
        calibration_train.png
        confusion_test_RF.png
        confusion_train_RF.png
        feature_importance_train_RF.png
        learning_curve_train_RF.png
        roc_curve_test.png
        roc_curve_train.png
Let’s look at the results in the plots directory. Since our scoring function was roc_auc, we examine the ROC Curve first. The AUC is approximately 0.61, which is not very high, but in the context of the stock market, we may still be able to derive some predictive power. Further, we are running the model on a relatively small sample of stocks, as denoted by the jittery line of the ROC Curve.
We can benefit from more samples, as the learning curve shows that the training and cross-validation lines have yet to
converge.
The good news is that even with a relatively small number of testing points, the Reliability Curve slopes upward from
left to right, with the dotted line denoting a perfect classifier.
To get better accuracy, we can raise our threshold to find the best candidates, since they are ranked by probability, but
this also means limiting our pool of stocks. Let’s take a closer look at the rankings file.
Step 3: From the command line, enter:
jupyter notebook
Step 4: From your browser, open the notebook A Trading Model.ipynb.
Step 5: Run the commands in the notebook, making sure that when you read in the rankings file, you change the date to match the result from the ls command.
Conclusion: We can predict large-range days with some confidence, but only at a higher probability threshold. This is important for choosing the correct system on any given day. We can achieve better results with more data, so we recommend expanding the stock universe, e.g., a group with at least 100 members going five years back.
CHAPTER EIGHT
In this tutorial, we use machine learning to predict whether or not an NCAA Men’s Basketball team will cover the
spread. The spread is set by Las Vegas bookmakers to balance the betting; it is a way of giving points to the underdog
to encourage bets on both sides.
SportFlow starts with the basic data and derives time series features based on streaks and runs (not the baseball runs).
In the table below, the game data includes both line and over_under information consolidated from various sports Web
sites. For example, a line of -9 means the home team is favored by 9 points. A line of +3 means the away team is
favored by 3 points; the line is always relative to the home team. An over_under is the predicted total score for the
game, with a bet being placed on whether or not the final total will be under or over that amount.
Step 1: From the examples directory, change your directory:

cd NCAAB
Before running SportFlow, let’s briefly review the configuration files in the config directory:
sport.yml: The SportFlow configuration file
model.yml: The AlphaPy configuration file
In sport.yml, the first three items are used for random_scoring, which we will not be doing here. By default,
we will create a model based on all seasons and calculate short-term streaks of 3 with the rolling_window.
Listing 1: sport.yml
sport:
league : NCAAB
points_max : 100
points_min : 50
random_scoring : False
seasons : []
rolling_window : 3
In each of the tutorials, we experiment with different options in model.yml to run AlphaPy. Here, we will run a
random forest classifier with Recursive Feature Elimination and Cross-Validation (RFECV), and then an XGBoost
classifier. We will also perform a random grid search, which increases the total running time to approximately 15
minutes. You can get in some two-ball dribbling while waiting for SportFlow to finish.
In the features section, we identify the factors generated by SportFlow. For example, we want to treat the
various streaks as factors. Other options are interactions, standard scaling, and a threshold for removing
low-variance features.
Our target variable is won_on_spread, a Boolean indicator of whether or not the home team covered the spread.
This is what we are trying to predict.
Listing 2: model.yml
project:
directory : .
file_extension : csv
submission_file :
submit_probas : False
data:
drop : ['Unnamed: 0', 'index', 'season', 'date', 'home.team', 'away.team',
model:
algorithms : ['RF', 'XGB']
balance_classes : False
calibration :
option : False
type : isotonic
cv_folds : 3
estimators : 201
features:
clustering :
option : False
increment : 3
maximum : 30
minimum : 3
counts :
option : False
encoding :
rounding : 3
type : factorize
factors : ['line', 'delta.wins', 'delta.losses', 'delta.ties',
'delta.point_win_streak', 'delta.point_loss_streak',
'delta.cover_win_streak', 'delta.cover_loss_streak',
'delta.over_streak', 'delta.under_streak']
interactions :
option : True
poly_degree : 2
sampling_pct : 5
isomap :
option : False
components : 2
neighbors : 5
logtransform :
option : False
numpy :
option : False
pca :
option : False
increment : 3
maximum : 15
minimum : 3
whiten : False
scaling :
option : True
type : standard
scipy :
option : False
text :
pipeline:
number_jobs : -1
seed : 13201
verbosity : 0
plots:
calibration : True
confusion_matrix : True
importances : True
learning_curve : True
roc_curve : True
xgboost:
stopping_rounds : 30
Step 2: Now, we are ready to run SportFlow. Enter the following command:

sflow

As sflow runs, you will see the progress of the workflow, and the logging output is saved in sport_flow.log.
When the workflow completes, your project structure will look like this, with a different datestamp:
NCAAB
    sport_flow.log
    config
        algos.yml
        sport.yml
        model.yml
    data
        ncaab_game_scores_1g.csv
    input
        test.csv
        train.csv
    model
        feature_map_20170427.pkl
        model_20170427.pkl
    output
        predictions_20170427.csv
        probabilities_20170427.csv
        rankings_20170427.csv
    plots
        calibration_test.png
        calibration_train.png
        confusion_test_RF.png
        confusion_test_XGB.png
Depending upon the model parameters and the prediction date, the AUC of the ROC Curve will vary between 0.54
and 0.58. This model is barely passable, but we are getting a slight edge even with our basic data. We will need more
game samples to have any confidence in our predictions.
After a model is created, we can run sflow in predict mode. Just specify the prediction date pdate, and SportFlow will make predictions for all cases in the predict.csv file on or after the specified date. Note that the predict.csv file is generated on the fly in predict mode and stored in the input directory.
Step 3: Now, let’s run SportFlow in predict mode, where all results will be stored in the output directory:
sflow --predict --pdate 2016-03-15
Conclusion: Even with just one season of NCAA Men’s Basketball data, our model predicts with 52-54% accuracy. To attain better accuracy, we need more historical data, both in the number of games and in other types of information such as individual player statistics. If you want to become a professional bettor, then you need at least 56% winners to break the bank.
CHAPTER NINE
A trading system is a set of automated rules for buying and selling stocks, options, futures, and other instruments.
Trading is considered to be both an art and a science; the scientific branch is known as technical analysis. Many
technicians spend their lives chasing the Holy Grail: a system that will make them rich simply by detecting common
patterns. Famous technicians such as Edwards, Elliott, Fibonacci, Gann, and Gartley show us visually appealing
charts, but there is no scientific evidence proving that these techniques actually work.
Trading systems generally operate in two contexts: trend and counter-trend. A system that follows the trend tries
to stay in one direction as long as possible. A system that bucks the trend reverses direction at certain support and
resistance levels, also known as fading the trend. With MarketFlow, you can implement either type of system using
our long/short strategy.
In this tutorial, we are going to test a simple long/short system. If today’s closing price is greater than yesterday’s
close, then we go long. If today’s close is lower than yesterday’s, then we go short, so we always have a position in
the market.
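As a sketch of these rules in plain pandas (an illustration, not MarketFlow's internal implementation; the column name is an assumption based on standard OHLC data):

import pandas as pd

# Toy daily closing prices; in MarketFlow these come from the data feed.
prices = pd.DataFrame({'close': [100.0, 101.5, 101.0, 102.2, 101.8]})

# The two conditions used by the closer system.
hc = prices['close'] > prices['close'].shift(1)   # higher close: long entry
lc = prices['close'] < prices['close'].shift(1)   # lower close: short entry

# Position: +1 after a higher close, -1 after a lower close.
position = hc.astype(int) - lc.astype(int)
print(position.tolist())   # [0, 1, -1, 1, -1]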
Step 1: From the examples directory, change your directory:
cd "Trading System"
Before running MarketFlow, let’s review the market.yml file in the config directory. Since we are just running a
system, we really don’t need the model.yml file, but if you have a project where the system is derived from a model,
then you will want to maintain both files.
In market.yml, we will test our system on five stocks in the target group faang, going back 1000 trading days.
We need to define only two features: hc for higher close, and lc for lower close. We name the system closer,
which requires just a longentry and a shortentry. There are no exit conditions and no holding period, so we
will always have a position in each stock.
Listing 1: market.yml
market:
create_model : False
data_fractal : 1d
data_history : 500
forecast_period : 1
fractal : 1d
lag_period : 1
leaders : []
predict_history : 50
schema : quandl_wiki
subject : stock
target_group : faang
system:
name : 'closer'
holdperiod : 0
longentry : hc
longexit :
shortentry : lc
shortexit :
scale : False
groups:
faang : ['fb', 'aapl', 'amzn', 'nflx', 'googl']
aliases:
hc : 'higher_close'
lc : 'lower_close'
Step 2: Now, we are ready to run MarketFlow. Enter the following command:

mflow

As mflow runs, you will see the progress of the workflow, and the logging output is saved in market_flow.log.
When the workflow completes, your project structure will look like this, with an additional directory systems:
Trading System
    market_flow.log
    config
        algos.yml
        market.yml
        model.yml
    data
    input
    model
MarketFlow records position, return, and transaction data in the systems directory, so now we can analyze our
results with Pyfolio.
Step 3: From the command line, enter:
jupyter notebook
Step 4: From your browser, open the notebook A Trading System.ipynb.
CHAPTER TEN: ALPHAPY
10.1 Model Object Creation

AlphaPy first reads the model.yml file and then displays the model parameters as confirmation that the file was read successfully. As shown in the example below, the Random Forest (RF) and XGBoost (XGB) algorithms are used to build the model. From the model specifications, a Model object will be created.
All of the model parameters are listed in alphabetical order. At a minimum, scan for algorithms, features,
model_type, and target to verify their accuracy, i.e., that you are running the right model. The verbosity
parameter will control the degree of output that you see when running the pipeline.
Listing 1: alphapy.log
[12/30/17 23:17:49] INFO ********************************************************************************
[12/30/17 23:17:49] INFO AlphaPy Start
[12/30/17 23:17:49] INFO ********************************************************************************
[12/30/17 23:17:49] INFO Model Configuration
[12/30/17 23:17:49] INFO No Treatments Found
[12/30/17 23:17:49] INFO MODEL PARAMETERS:
[12/30/17 23:17:49] INFO algorithms = ['RF', 'XGB']
10.2 Data Ingestion

Data are loaded from both the training file and the test file. Any features that you wish to remove from the data are then dropped. Statistics about the shape of the data and the target variable proportions are logged.
Listing 2: alphapy.log
[12/30/17 23:17:50] INFO Creating directory ./model
[12/30/17 23:17:50] INFO Creating directory ./output
[12/30/17 23:17:50] INFO Creating directory ./plots
[12/30/17 23:17:50] INFO Creating Model
[12/30/17 23:17:50] INFO Calling Pipeline
[12/30/17 23:17:50] INFO Training Pipeline
[12/30/17 23:17:50] INFO Loading Data
[12/30/17 23:17:50] INFO Loading data from ./input/train.csv
[12/30/17 23:17:50] INFO Found target Survived in data frame
[12/30/17 23:17:50] INFO Labels (y) found for Partition.train
[12/30/17 23:17:50] INFO Loading Data
[12/30/17 23:17:50] INFO Loading data from ./input/test.csv
[12/30/17 23:17:50] INFO Target Survived not found in Partition.test
[12/30/17 23:17:50] INFO Saving New Features in Model
[12/30/17 23:17:50] INFO Original Feature Statistics
[12/30/17 23:17:50] INFO Number of Training Rows : 891
[12/30/17 23:17:50] INFO Number of Training Columns : 11
[12/30/17 23:17:50] INFO Unique Training Values for Survived : [0 1]
[12/30/17 23:17:50] INFO Unique Training Counts for Survived : [549 342]
10.3 Feature Processing

There are two stages to feature processing. First, you may want to transform a column of a dataframe into a different format or break up a feature into its respective components. This is known as a treatment, and it is a one-to-many transformation. For example, a date feature can be extracted into day, month, and year.
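For example, a date treatment might look like this. This is an illustrative sketch, not an AlphaPy built-in; it follows the treatment calling convention covered in the Project Structure chapter, where f is the data frame and c is the column:

import pandas as pd

def extract_date(f, c):
    # Illustrative one-to-many treatment: split a date column into
    # day, month, and year features returned as a new frame.
    dates = pd.to_datetime(f[c])
    return pd.DataFrame({'day'   : dates.dt.day,
                         'month' : dates.dt.month,
                         'year'  : dates.dt.year})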
The next stage is feature type determination, which applies to all features, regardless of whether or not a treatment has been previously applied. The number of unique values of a feature dictates whether or not that feature is a factor. If the given feature is a factor, then a specific type of encoding is applied. Otherwise, the feature is generally either text or a number.
In the example below, each feature’s type is identified along with its number of unique values. For factors, a specific type of encoding is selected, as specified in the model.yml file. For text, you can choose either count vectorization and TF-IDF or just plain factorization. Numerical features have both imputation and log-transformation options.
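As a quick illustration of factorization (plain pandas, using a hypothetical category column):

import pandas as pd

# Factorization maps each unique category to an integer code.
embarked = pd.Series(['S', 'C', 'S', 'Q'])
codes, uniques = pd.factorize(embarked)
print(codes.tolist())   # [0, 1, 0, 2]
print(list(uniques))    # ['S', 'C', 'Q']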
Listing 3: alphapy.log
[12/30/17 23:17:50] INFO Number of Testing Rows : 418
[12/30/17 23:17:50] INFO Number of Testing Columns : 11
[12/30/17 23:17:50] INFO Original Features : Index(['PassengerId', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp', 'Parch', 'Cabin', 'Embarked'], dtype='object')
[12/30/17 23:17:50] INFO Feature Count : 10
[12/30/17 23:17:50] INFO Feature 2: Name is a text feature [12:82] with 1307 unique values
[12/30/17 23:17:50] INFO Feature 7: Ticket is a text feature [3:18] with 929 unique values
[12/30/17 23:17:50] INFO Feature 9: Cabin is a text feature [1:15] with 187 unique values
As AlphaPy runs, you can see the number of new features that are generated along the way, depending on which features you selected in the features section of the model.yml file. For interactions, you specify the polynomial degree and the percentage of the interactions that you would like to retain in the model. Be careful with the polynomial degree, as the number of interaction terms grows combinatorially with it; the sketch after the listing below illustrates the growth.
Listing 4: alphapy.log
[12/30/17 23:17:50] INFO k = 21
[12/30/17 23:17:50] INFO k = 24
[12/30/17 23:17:50] INFO k = 27
[12/30/17 23:17:51] INFO k = 30
[12/30/17 23:17:51] INFO Clustering Feature Count : 10
[12/30/17 23:17:51] INFO New Feature Count : 35
[12/30/17 23:17:51] INFO Saving New Features in Model
[12/30/17 23:17:51] INFO Creating Interactions
[12/30/17 23:17:51] INFO Initial Feature Count : 35
[12/30/17 23:17:51] INFO Generating Polynomial Features
[12/30/17 23:17:51] INFO Interaction Percentage : 10
[12/30/17 23:17:51] INFO Polynomial Degree : 5
[12/30/17 23:17:51] INFO Polynomial Feature Count : 15
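A quick way to see this growth with scikit-learn (an illustration, not AlphaPy's internal code):

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Feature counts explode as the polynomial degree rises: with 10 input
# features, degree 2 yields 66 terms, degree 3 yields 286, degree 5 yields 3003.
X = np.random.rand(5, 10)
for degree in (2, 3, 5):
    n_terms = PolynomialFeatures(degree=degree).fit_transform(X).shape[1]
    print(degree, n_terms)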
Listing 5: alphapy.log
[12/30/17 23:17:51] INFO Getting Class Weights
[12/30/17 23:17:51] INFO Class Weight for target Survived [1]: 1.605263
A classification model is highly dependent on the class proportions. If you’re trying to predict a rare pattern with high
accuracy, then training for accuracy will be useless because a dumb classifier could just predict the majority class and
be right most of the time. As a result, AlphaPy gives data scientists the ability to undersample majority classes or
oversample minority classes. There are even techniques that combine the two, e.g., SMOTE or ensemble sampling.
Before estimation, we need to apply sampling and possibly shuffling to improve cross-validation. For example, time
series data is ordered, and you may want to eliminate that dependency.
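For instance, oversampling a minority class with imbalanced-learn might look like this. This is a sketch of the sampling idea on toy data, not AlphaPy's internal call:

from collections import Counter
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

# A toy problem with a 90/10 class imbalance.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)
print(Counter(y))            # roughly 900 majority, 100 minority

# SMOTE synthesizes new minority-class examples until the classes balance.
X_res, y_res = SMOTE(random_state=42).fit_resample(X, y)
print(Counter(y_res))        # both classes now equal in size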
At the beginning of the estimation phase, we read in all of the algorithms from the algos.yml file and then select
those algorithms used in this particular model. The process is iterative for each algorithm: initial fit, feature selection,
grid search, and final fit.
Listing 6: alphapy.log
[12/30/17 23:17:51] INFO New Total Feature Count : 50
[12/30/17 23:17:51] INFO Saving New Features in Model
[12/30/17 23:17:51] INFO Removing Low-Variance Features
[12/30/17 23:17:51] INFO Low-Variance Threshold : 0.10
[12/30/17 23:17:51] INFO Original Feature Count : 50
[12/30/17 23:17:51] INFO Reduced Feature Count : 50
[12/30/17 23:17:51] INFO Saving New Features in Model
[12/30/17 23:17:51] INFO Skipping Shuffling
[12/30/17 23:17:51] INFO Skipping Sampling
[12/30/17 23:17:51] INFO Getting Class Weights
[12/30/17 23:17:51] INFO Class Weight for target Survived [1]: 1.605263
[12/30/17 23:17:51] INFO Getting All Estimators
[12/30/17 23:17:51] INFO Algorithm Configuration
[12/30/17 23:17:51] INFO Selecting Models
[12/30/17 23:17:51] INFO Algorithm: RF
[12/30/17 23:17:51] INFO Fitting Initial Model
[12/30/17 23:17:51] INFO Recursive Feature Elimination with CV
[12/30/17 23:18:14] INFO RFECV took 22.72 seconds for step 3 and 3 folds
[12/30/17 23:18:14] INFO Algorithm: RF, Selected Features: 20, Ranking: [ 2 1 1 1 5 9 1 1 2 6 8 7 8 6 7 10 11 11 11 10 10 1 1 1 1 9 6 9 5 1 5 4 2 4 1 1 1 3 7 1 1 8 1 1 4 1 1 3 3 1]
[12/30/17 23:18:14] INFO Randomized Grid Search
[12/30/17 23:19:08] INFO Grid Search took 54.03 seconds for 50 candidate parameter settings.
[12/30/17 23:19:08] INFO Algorithm: RF, Best Score: 0.8627, Best Parameters: {'est__n_estimators': 501, 'est__min_samples_split': 5, 'est__min_samples_leaf': 3,
10.7 Model Evaluation

Each model is evaluated using all of the metrics available in scikit-learn to give you a sense of how other scoring functions compare. Metrics are calculated on the training data for every algorithm. If test labels are present, then metrics are also calculated for the test data.
Listing 8: alphapy.log
[12/30/17 23:19:32] INFO Final Model Predictions for XGB
[12/30/17 23:19:32] INFO Skipping Calibration
[12/30/17 23:19:32] INFO Making Predictions
[12/30/17 23:19:32] INFO Predictions Complete
[12/30/17 23:19:32] INFO Blending Models
[12/30/17 23:19:32] INFO Blending Start: 2017-12-30 23:19:32.734086
[12/30/17 23:19:32] INFO Blending Complete: 0:00:00.010781
[12/30/17 23:19:32] INFO ================================================================================
[12/30/17 23:19:32] INFO Metrics for: Partition.train
[12/30/17 23:19:32] INFO --------------------------------------------------------------------------------
Listing 9: alphapy.log
[12/30/17 23:19:32] INFO Mean validation score: 0.855 (std: 0.023)
[12/30/17 23:19:32] INFO Parameters: {'est__subsample': 0.5, 'est__n_estimators': 21, 'est__min_child_weight': 1.0, 'est__max_depth': 7, 'est__learning_rate': 0.
[12/30/17 23:19:32] INFO Algorithm: XGB, Best Score: 0.8627, Best Parameters: {'est__subsample': 0.6, 'est__n_estimators': 21, 'est__min_child_weight': 1.1, 'est__
When more than one algorithm is scored in the estimation stage, the final step is to combine the predictions of each
one and create the blended model, i.e., the predictions from the independent models are used as training features. For
classification, AlphaPy uses logistic regression, and for regression, we use ridge regression.
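Conceptually, blending stacks the per-algorithm predictions into a new feature matrix and fits a meta-estimator on it. The following is a sketch of the idea with scikit-learn, using hypothetical numbers, not AlphaPy's exact code:

import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical training-set probability estimates from two base models.
preds_rf  = np.array([0.2, 0.8, 0.6, 0.1])
preds_xgb = np.array([0.3, 0.7, 0.9, 0.2])
y_train   = np.array([0, 1, 1, 0])

# The base-model predictions become the features of the blended model.
X_blend = np.column_stack([preds_rf, preds_xgb])
blender = LogisticRegression().fit(X_blend, y_train)
print(blender.predict_proba(X_blend)[:, 1])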
10.10 Final Results

• The model object is stored in Pickle (.pkl) format in the model directory of the project. The model is loaded later in prediction mode.
• The feature map is stored in Pickle (.pkl) format in the model directory. The feature map is restored for prediction mode.
• Predictions are stored in the project’s output directory.
• Sorted rankings of predictions are stored in output.
• Any submission files are stored in output.
[12/30/17 23:19:36] INFO ================================================================================
[12/30/17 23:19:36] INFO Saving Model Predictor
[12/30/17 23:19:36] INFO Writing model predictor to ./model/model_20171230.pkl
[12/30/17 23:19:36] INFO Saving Feature Map
[12/30/17 23:19:36] INFO Writing feature map to ./model/feature_map_20171230.pkl
CHAPTER ELEVEN: PROJECT STRUCTURE
11.1 Setup
Your initial configuration must have the following directories and files. The directories config, data, and input
store input, and the directories model, output, and plots store output:
project
    config
        model.yml
        algos.yml
    data
    input
        train.csv
        test.csv
    model
    output
    plots
The top-level directory is the main project directory with a unique name. There are six required subdirectories:
config: This directory contains all of the YAML files. At a minimum, it must contain model.yml and algos.
yml.
data: If required, any data for the domain pipeline is stored here. Data from this directory will be transformed into
train.csv and test.csv in the input directory.
input: The training file train.csv and the testing file test.csv are stored here. Note that these files can have any names, as configured in the model.yml file.
model: The final model is dumped here as a pickle file in the format model_[yyyymmdd].pkl.
output: This directory contains predictions, probabilities, rankings, and any submission files:
• predictions_[yyyymmdd].csv
• probabilities_[yyyymmdd].csv
• rankings_[yyyymmdd].csv
• submission_[yyyymmdd].csv
plots: All generated plots are stored here. The file name has the following elements:
• plot name
• ‘train’ or ‘test’
• algorithm abbreviation
• format suffix
For example, a calibration plot for the testing data for all algorithms will be named calibration_test.
png. The file name for a confusion matrix for XGBoost training data will be confusion_train_XGB.png.
11.2 Model Configuration

Here is an example of a model configuration file. It is written in YAML and is divided into logical sections reflecting the stages of the pipeline. Within each section, you can control different aspects for experimenting with model results. Please refer to the following sections for more detail.
Listing 1: model.yml
project:
directory : .
file_extension : csv
submission_file : 'gender_submission'
submit_probas : False
data:
drop : ['PassengerId']
features : '*'
sampling :
option : False
method : under_random
ratio : 0.5
sentinel : -1
separator : ','
shuffle : False
split : 0.4
target : Survived
target_value : 1
model:
algorithms : ['RF', 'XGB']
balance_classes : True
calibration :
option : False
type : sigmoid
cv_folds : 3
estimators : 51
feature_selection :
option : False
percentage : 50
uni_grid : [5, 10, 15, 20, 25]
score_func : f_classif
grid_search :
option : True
iterations : 50
random : True
subsample : False
sampling_pct : 0.2
pvalue_level : 0.01
rfe :
option : True
step : 3
features:
clustering :
option : True
increment : 3
maximum : 30
minimum : 3
counts :
option : True
encoding :
rounding : 2
type : factorize
factors : []
interactions :
option : True
poly_degree : 5
sampling_pct : 10
isomap :
option : False
components : 2
neighbors : 5
logtransform :
option : False
numpy :
option : True
pca :
option : False
increment : 1
maximum : 10
minimum : 2
whiten : False
scaling :
option : True
type : standard
scipy :
option : False
text :
ngrams : 3
vectorize : False
tsne :
option : False
components : 2
learning_rate : 1000.0
perplexity : 30.0
variance :
option : True
threshold : 0.1
pipeline:
number_jobs : -1
seed : 42
verbosity : 0
plots:
calibration : True
xgboost:
stopping_rounds : 20
Listing 2: model.yml
project:
directory : .
file_extension : csv
submission_file : 'gender_submission'
submit_probas : False
Warning: If you do not supply a value on the right-hand side of the colon [:], then Python will interpret that key
as having a None value, which is correct. Do not spell out None; otherwise, the value will be interpreted as the
string ‘None’.
Listing 3: model.yml
data:
drop : ['PassengerId']
features : '*'
sampling :
option : False
method : under_random
ratio : 0.5
sentinel : -1
separator : ','
shuffle : False
split : 0.4
target : Survived
target_value : 1
Listing 4: model.yml
model:
algorithms : ['RF', 'XGB']
balance_classes : True
calibration :
option : False
type : sigmoid
cv_folds : 3
estimators : 51
Listing 5: model.yml
features:
clustering :
option : True
increment : 3
maximum : 30
minimum : 3
counts :
option : True
encoding :
rounding : 2
type : factorize
factors : []
interactions :
option : True
poly_degree : 5
sampling_pct : 10
isomap :
option : False
components : 2
neighbors : 5
logtransform :
option : False
numpy :
option : True
pca :
option : False
increment : 1
maximum : 10
minimum : 2
whiten : False
scaling :
option : True
type : standard
scipy :
option : False
text :
ngrams : 3
vectorize : False
tsne :
option : False
components : 2
learning_rate : 1000.0
perplexity : 30.0
variance :
option : True
threshold : 0.1
Treatments are special functions for feature extraction. In the treatments section below, we are applying treatments to two features, doji and hc. Within the Python list, we are calling the runs_test function of the module alphapy.features. The module name is always the first element of the list, and the function name is always the second element of the list. The remaining elements of the list are the actual parameters to the function.
Listing 6: model.yml
treatments:
doji : ['alphapy.features', 'runs_test', ['all'], 18]
hc : ['alphapy.features', 'runs_test', ['all'], 18]
Here is the code for the runs_test function, which calculates runs for Boolean features. For a treatment function,
the first and second arguments are always the same. The first argument f is the data frame, and the second argument c
is the column (or feature) to which we are going to apply the treatment. The remaining function arguments correspond
to the actual parameters that were specified in the configuration file, in this case wfuncs and window.
Listing 7: features.py
import logging

import pandas as pd

logger = logging.getLogger(__name__)

def runs_test(f, c, wfuncs, window):
    fc = f[c]
    # window functions available for runs analysis; runs, streak,
    # rtotal, and zscore are defined elsewhere in alphapy.features
    all_funcs = {'runs'   : runs,
                 'streak' : streak,
                 'rtotal' : rtotal,
                 'zscore' : zscore}
    # use all functions
    if 'all' in wfuncs:
        wfuncs = all_funcs.keys()
    # apply each of the runs functions over a rolling window
    new_features = pd.DataFrame()
    for w in wfuncs:
        if w in all_funcs:
            new_feature = fc.rolling(window=window).apply(all_funcs[w])
            new_feature.fillna(0, inplace=True)
            frames = [new_features, new_feature]
            new_features = pd.concat(frames, axis=1)
        else:
            logger.info("Runs Function %s not found", w)
    return new_features
When the runs_test function is invoked, a new data frame is created, as multiple feature columns may be generated
from a single treatment function. These new features are returned and appended to the original data frame.
Listing 8: model.yml
pipeline:
number_jobs : -1
seed : 42
verbosity : 0
To turn on the automatic generation of any plot in the plots section, simply set the corresponding value to True.
Listing 9: model.yml
plots:
calibration : True
confusion_matrix : True
importances : True
learning_curve : True
roc_curve : True
11.3 Algorithms Configuration

Each algorithm has its own section in the algos.yml file, e.g., AB or RF. The following elements are required for every algorithm entry in the YAML file:
model_type: Specify classification or regression
params: The initial parameters for the first fitting
grid: The grid search dictionary for hyperparameter tuning of an estimator. If you are using randomized grid search,
then make sure that the total number of grid combinations exceeds the number of random iterations.
scoring: Set to True if a specific scoring function will be applied.
Note: The parameters n_estimators, n_jobs, seed, and verbosity are informed by the model.yml file. When the estimators are created, the proper values for these parameters are automatically substituted in the algos.yml file on a global basis.
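As a quick check of the randomized-search rule above (plain Python, using the AB grid shown below):

from functools import reduce
from operator import mul

# The AB grid below yields 5 * 6 * 2 = 60 combinations, which exceeds
# a randomized-search iteration count such as the 50 used earlier.
grid = {"n_estimators"  : [10, 50, 100, 150, 200],
        "learning_rate" : [0.2, 0.5, 0.7, 1.0, 1.5, 2.0],
        "algorithm"     : ['SAMME', 'SAMME.R']}
n_combos = reduce(mul, (len(v) for v in grid.values()), 1)
print(n_combos)   # 60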
AB:
# AdaBoost
model_type : classification
params : {"n_estimators" : n_estimators,
"random_state" : seed}
grid : {"n_estimators" : [10, 50, 100, 150, 200],
"learning_rate" : [0.2, 0.5, 0.7, 1.0, 1.5, 2.0],
"algorithm" : ['SAMME', 'SAMME.R']}
scoring : True
GB:
# Gradient Boosting
model_type : classification
params : {"n_estimators" : n_estimators,
"max_depth" : 3,
"random_state" : seed,
"verbose" : verbosity}
grid : {"loss" : ['deviance', 'exponential'],
"learning_rate" : [0.05, 0.1, 0.15],
"n_estimators" : [50, 100, 200],
"max_depth" : [3, 5, 10],
"min_samples_split" : [2, 3],
"min_samples_leaf" : [1, 2]}
scoring : True
GBR:
# Gradient Boosting Regression
model_type : regression
params : {"n_estimators" : n_estimators,
"random_state" : seed,
"verbose" : verbosity}
grid : {}
scoring : False
KNN:
# K-Nearest Neighbors
model_type : classification
params : {"n_jobs" : n_jobs}
grid : {"n_neighbors" : [3, 5, 7, 10],
"weights" : ['uniform', 'distance'],
"algorithm" : ['ball_tree', 'kd_tree', 'brute', 'auto'],
"leaf_size" : [10, 20, 30, 40, 50]}
scoring : False
KNR:
# K-Nearest Neighbor Regression
model_type : regression
params : {"n_jobs" : n_jobs}
grid : {}
scoring : False
LR:
# Linear Regression
model_type : regression
params : {"n_jobs" : n_jobs}
grid : {"fit_intercept" : [True, False],
"normalize" : [True, False],
"copy_X" : [True, False]}
scoring : False
LSVC:
# Linear Support Vector Classification
model_type : classification
params : {"C" : 0.01,
"max_iter" : 2000,
"penalty" : 'l1',
"dual" : False,
"random_state" : seed,
"verbose" : verbosity}
grid : {"C" : np.logspace(-2, 10, 13),
"penalty" : ['l1', 'l2'],
"dual" : [True, False],
"tol" : [0.0005, 0.001, 0.005],
"max_iter" : [500, 1000, 2000]}
scoring : False
LSVM:
# Linear Support Vector Machine
model_type : classification
params : {"kernel" : 'linear',
"probability" : True,
"random_state" : seed,
"verbose" : verbosity}
grid : {"C" : np.logspace(-2, 10, 13),
"gamma" : np.logspace(-9, 3, 13),
"shrinking" : [True, False],
"tol" : [0.0005, 0.001, 0.005],
"decision_function_shape" : ['ovo', 'ovr']}
scoring : False
NB:
# Naive Bayes
model_type : classification
params : {}
grid : {"alpha" : [0.01, 0.1, 0.2, 0.3, 0.4, 0.5, 1.0, 2.0, 5.0, 10.0],
RBF:
# Radial Basis Function
model_type : classification
params : {"kernel" : 'rbf',
"probability" : True,
"random_state" : seed,
"verbose" : verbosity}
grid : {"C" : np.logspace(-2, 10, 13),
"gamma" : np.logspace(-9, 3, 13),
"shrinking" : [True, False],
"tol" : [0.0005, 0.001, 0.005],
"decision_function_shape" : ['ovo', 'ovr']}
scoring : False
RF:
# Random Forest
model_type : classification
params : {"n_estimators" : n_estimators,
"max_depth" : 10,
"min_samples_split" : 5,
"min_samples_leaf" : 3,
"bootstrap" : True,
"criterion" : 'entropy',
"random_state" : seed,
"n_jobs" : n_jobs,
"verbose" : verbosity}
grid : {"n_estimators" : [21, 51, 101, 201, 501],
"max_depth" : [5, 7, 10, 20],
"min_samples_split" : [2, 3, 5, 10],
"min_samples_leaf" : [1, 2, 3],
"bootstrap" : [True, False],
"criterion" : ['gini', 'entropy']}
scoring : True
RFR:
# Random Forest Regression
model_type : regression
params : {"n_estimators" : n_estimators,
"random_state" : seed,
"n_jobs" : n_jobs,
"verbose" : verbosity}
grid : {}
scoring : False
SVM:
# Support Vector Machine
model_type : classification
params : {"probability" : True,
"random_state" : seed,
"verbose" : verbosity}
grid : {"C" : np.logspace(-2, 10, 13),
"gamma" : np.logspace(-9, 3, 13),
"shrinking" : [True, False],
"tol" : [0.0005, 0.001, 0.005],
TF_DNN:
# Google TensorFlow Deep Neural Network
model_type : classification
params : {"feature_columns" : [tf.contrib.layers.real_valued_column("",
˓→dimension=4)],
"n_classes" : 2,
"hidden_units" : [20, 40, 20]}
grid : {}
scoring : False
XGB:
# XGBoost Binary
model_type : classification
params : {"objective" : 'binary:logistic',
"n_estimators" : n_estimators,
"seed" : seed,
"max_depth" : 6,
"learning_rate" : 0.1,
"min_child_weight" : 1.1,
"subsample" : 0.9,
"colsample_bytree" : 0.9,
"nthread" : n_jobs,
"silent" : True}
grid : {"n_estimators" : [21, 51, 101, 201, 501],
"max_depth" : [5, 6, 7, 8, 9, 10, 12, 15, 20],
"learning_rate" : [0.01, 0.02, 0.05, 0.1, 0.2],
"min_child_weight" : [1.0, 1.1],
"subsample" : [0.5, 0.6, 0.7, 0.8, 0.9, 1.0],
"colsample_bytree" : [0.5, 0.6, 0.7, 0.8, 0.9, 1.0]}
scoring : False
XGBM:
# XGBoost Multiclass
model_type : multiclass
params : {"objective" : 'multi:softmax',
"n_estimators" : n_estimators,
"seed" : seed,
"max_depth" : 10,
"learning_rate" : 0.1,
"min_child_weight" : 1.1,
"subsample" : 0.9,
"colsample_bytree" : 0.9,
"nthread" : n_jobs,
"silent" : True}
grid : {}
scoring : False
XGBR:
# XGBoost Regression
model_type : regression
params : {"objective" : 'reg:linear',
"n_estimators" : n_estimators,
"seed" : seed,
"max_depth" : 10,
XT:
# Extra Trees
model_type : classification
params : {"n_estimators" : n_estimators,
"random_state" : seed,
"n_jobs" : n_jobs,
"verbose" : verbosity}
grid : {"n_estimators" : [21, 51, 101, 201, 501, 1001, 2001],
"max_features" : ['auto', 'sqrt', 'log2'],
"max_depth" : [3, 5, 7, 10, 20, 30],
"min_samples_split" : [2, 3],
"min_samples_leaf" : [1, 2],
"bootstrap" : [True, False],
"warm_start" : [True, False]}
scoring : True
XTR:
# Extra Trees Regression
model_type : regression
params : {"n_estimators" : n_estimators,
"random_state" : seed,
"n_jobs" : n_jobs,
"verbose" : verbosity}
grid : {}
scoring : False
11.4 Final Output

project
    alphapy.log
    config
        algos.yml
        model.yml
    data
    input
        test.csv
        train.csv
    model
        feature_map_20170325.pkl
        model_20170325.pkl
    output
        predictions_20170325.csv
CHAPTER TWELVE: MARKETFLOW
MarketFlow transforms financial market data into machine learning models for making market predictions. The
platform gets stock price data from Yahoo Finance (end-of-day) and Google Finance (intraday), transforming the data
into canonical form for training and testing. MarketFlow is powerful because you can easily apply new features to
groups of stocks simultaneously using our Variable Definition Language (VDL). All of the dataframes are aggregated
and split into training and testing files for input into AlphaPy.
12.1 Data Sources

MarketFlow gets daily stock prices from Yahoo Finance and intraday stock prices from Google Finance. Both data sources have the standard primitives: Open, High, Low, Close, and Volume. For daily data, there is a Date timestamp, and for intraday data, there is a Datetime timestamp. We augment the intraday data with a bar_number field to mark the end of the trading day. Not all trading days end at 4:00 pm EST, as there are holiday trading days that are shortened.
Note: Normal market hours are 9:30 am to 4:00 pm EST. Here, we retrieved the data from the CST time zone, which is one hour behind EST.
Note: You can get Google intraday data going back a maximum of 50 days. If you want to build your own historical record, then we recommend that you save the data on an ongoing basis for a larger backtesting window.
12.2 Domain Configuration

The market configuration file (market.yml) is written in YAML and is divided into logical sections reflecting different parts of MarketFlow. This file is stored in the config directory of your project, along with the model.yml and algos.yml files. The market section has the following parameters:
data_history: Number of periods of historical data to retrieve.
forecast_period: Number of periods to forecast for the target variable.
fractal: The time quantum for the data feed, represented by an integer followed by a character code. The string
“1d” is one day, and “5m” is five minutes.
leaders: A list of features that are coincident with the target variable. For example, with daily stock market data,
the Open is considered to be a leader because it is recorded at the market open. In contrast, the daily High or
Low cannot be known until the market close.
predict_history: This is the minimum number of periods required to derive all of the features in prediction
mode on a given date. If you use a rolling mean of 50 days, then the predict_history should be set to at
least 50 to have a valid value on the prediction date.
schema: This string uniquely identifies the subject matter of the data. A schema could be prices for identifying
market data.
target_group: The name of the group selected from the groups section, e.g., a set of stock symbols.
Listing 1: market.yml
market:
data_history : 2000
forecast_period : 1
fractal : 1d
leaders : ['gap', 'gapbadown', 'gapbaup', 'gapdown', 'gapup']
predict_history : 100
schema : prices
target_group : test
groups:
all : ['aaoi', 'aapl', 'acia', 'adbe', 'adi', 'adp', 'agn', 'aig', 'akam',
'algn', 'alk', 'alxn', 'amat', 'amba', 'amd', 'amgn', 'amt', 'amzn',
'antm', 'arch', 'asml', 'athn', 'atvi', 'auph', 'avgo', 'axp', 'ayx',
'azo', 'ba', 'baba', 'bac', 'bby', 'bidu', 'biib', 'brcd', 'bvsn',
'bwld', 'c', 'cacc', 'cara', 'casy', 'cat', 'cde', 'celg', 'cern',
'chkp', 'chtr', 'clvs', 'cme', 'cmg', 'cof', 'cohr', 'comm', 'cost',
'cpk', 'crm', 'crus', 'csco', 'ctsh', 'ctxs', 'csx', 'cvs', 'cybr',
'data', 'ddd', 'deck', 'dgaz', 'dia', 'dis', 'dish', 'dnkn', 'dpz',
'drys', 'dust', 'ea', 'ebay', 'edc', 'edz', 'eem', 'elli', 'eog',
'esrx', 'etrm', 'ewh', 'ewt', 'expe', 'fang', 'fas', 'faz', 'fb',
'fcx', 'fdx', 'ffiv', 'fit', 'five', 'fnsr', 'fslr', 'ftnt', 'gddy',
'gdx', 'gdxj', 'ge', 'gild', 'gld', 'glw', 'gm', 'googl', 'gpro',
'grub', 'gs', 'gwph', 'hal', 'has', 'hd', 'hdp', 'hlf', 'hog', 'hum',
'ibb', 'ibm', 'ice', 'idxx', 'ilmn', 'ilmn', 'incy', 'intc', 'intu',
'ip', 'isrg', 'iwm', 'ivv', 'iwf', 'iwm', 'jack', 'jcp', 'jdst', 'jnj',
'jnpr', 'jnug', 'jpm', 'kite', 'klac', 'ko', 'kss', 'labd', 'labu',
'len', 'lite', 'lmt', 'lnkd', 'lrcx', 'lulu', 'lvs', 'mbly', 'mcd',
'mchp', 'mdy', 'meoh', 'mnst', 'mo', 'momo', 'mon', 'mrk', 'ms', 'msft',
'mtb', 'mu', 'nflx', 'nfx', 'nke', 'ntap', 'ntes', 'ntnx', 'nugt',
'nvda', 'nxpi', 'nxst', 'oii', 'oled', 'orcl', 'orly', 'p', 'panw',
'pcln', 'pg', 'pm', 'pnra', 'prgo', 'pxd', 'pypl', 'qcom', 'qqq',
'qrvo', 'rht', 'sam', 'sbux', 'sds', 'sgen', 'shld', 'shop', 'sig',
'sina', 'siri', 'skx', 'slb', 'slv', 'smh', 'snap', 'sncr', 'soda',
'splk', 'spy', 'stld', 'stmp', 'stx', 'svxy', 'swks', 'symc', 't',
'tbt', 'teva', 'tgt', 'tho', 'tlt', 'tmo', 'tna', 'tqqq', 'trip',
'tsla', 'ttwo', 'tvix', 'twlo', 'twtr', 'tza', 'uaa', 'ugaz', 'uhs',
'ulta', 'ulti', 'unh', 'unp', 'upro', 'uri', 'ups', 'uri', 'uthr',
'utx', 'uvxy', 'v', 'veev', 'viav', 'vlo', 'vmc', 'vrsn', 'vrtx', 'vrx',
'vwo', 'vxx', 'vz', 'wday', 'wdc', 'wfc', 'wfm', 'wmt', 'wynn', 'x',
'xbi', 'xhb', 'xiv', 'xle', 'xlf', 'xlk', 'xlnx', 'xom', 'xlp', 'xlu',
'xlv', 'xme', 'xom', 'wix', 'yelp', 'z']
etf : ['dia', 'dust', 'edc', 'edz', 'eem', 'ewh', 'ewt', 'fas', 'faz',
'gld', 'hyg', 'iwm', 'ivv', 'iwf', 'jnk', 'mdy', 'nugt', 'qqq',
'sds', 'smh', 'spy', 'tbt', 'tlt', 'tna', 'tvix', 'tza', 'upro',
'uvxy', 'vwo', 'vxx', 'xhb', 'xiv', 'xle', 'xlf', 'xlk', 'xlp',
'xlu', 'xlv', 'xme']
tech : ['aapl', 'adbe', 'amat', 'amgn', 'amzn', 'avgo', 'baba', 'bidu',
'brcd', 'csco', 'ddd', 'emc', 'expe', 'fb', 'fit', 'fslr', 'goog',
'intc', 'isrg', 'lnkd', 'msft', 'nflx', 'nvda', 'pcln', 'qcom',
'qqq', 'tsla', 'twtr']
test : ['aapl', 'amzn', 'goog', 'fb', 'nvda', 'tsla']
aliases:
atr : 'ma_truerange'
aver : 'ma_hlrange'
cma : 'ma_close'
cmax : 'highest_close'
cmin : 'lowest_close'
hc : 'higher_close'
hh : 'higher_high'
hl : 'higher_low'
ho : 'higher_open'
hmax : 'highest_high'
hmin : 'lowest_high'
lc : 'lower_close'
lh : 'lower_high'
ll : 'lower_low'
lo : 'lower_open'
lmax : 'highest_low'
lmin : 'lowest_low'
net : 'net_close'
netdown : 'down_net'
netup : 'up_net'
omax : 'highest_open'
omin : 'lowest_open'
rmax : 'highest_hlrange'
rmin : 'lowest_hlrange'
rr : 'maratio_hlrange'
rixc : 'rindex_close_high_low'
rixo : 'rindex_open_high_low'
roi : 'netreturn_close'
rsi : 'rsi_close'
sepma : 'ma_sep'
vma : 'ma_volume'
vmratio : 'maratio_volume'
upmove : 'net_high'
variables:
abovema : 'close > cma_50'
belowma : 'close < cma_50'
bigup : 'rrover & sephigh & netup'
bigdown : 'rrover & sephigh & netdown'
doji : 'sepdoji & rrunder'
hookdown : 'open > high[1] & close < close[1]'
The cornerstone of MarketFlow is the Analysis. You can create models and forecasts for different groups of stocks.
The purpose of the analysis object is to gather data for all of the group members and then consolidate the data into
train and test files. Further, some features and the target variable have to be adjusted (lagged) to avoid data leakage.
A group is simply a collection of symbols for analysis. In this example, we create different groups for technology
stocks, ETFs, and a smaller group for testing. To create a model for a given group, simply set the target_group in
the market section of the market.yml file and run mflow.
Listing 2: market.yml
groups:
all : ['aaoi', 'aapl', 'acia', 'adbe', 'adi', 'adp', 'agn', 'aig', 'akam',
'algn', 'alk', 'alxn', 'amat', 'amba', 'amd', 'amgn', 'amt', 'amzn',
'antm', 'arch', 'asml', 'athn', 'atvi', 'auph', 'avgo', 'axp', 'ayx',
'azo', 'ba', 'baba', 'bac', 'bby', 'bidu', 'biib', 'brcd', 'bvsn',
'bwld', 'c', 'cacc', 'cara', 'casy', 'cat', 'cde', 'celg', 'cern',
'chkp', 'chtr', 'clvs', 'cme', 'cmg', 'cof', 'cohr', 'comm', 'cost',
'cpk', 'crm', 'crus', 'csco', 'ctsh', 'ctxs', 'csx', 'cvs', 'cybr',
'data', 'ddd', 'deck', 'dgaz', 'dia', 'dis', 'dish', 'dnkn', 'dpz',
'drys', 'dust', 'ea', 'ebay', 'edc', 'edz', 'eem', 'elli', 'eog',
'esrx', 'etrm', 'ewh', 'ewt', 'expe', 'fang', 'fas', 'faz', 'fb',
'fcx', 'fdx', 'ffiv', 'fit', 'five', 'fnsr', 'fslr', 'ftnt', 'gddy',
'gdx', 'gdxj', 'ge', 'gild', 'gld', 'glw', 'gm', 'googl', 'gpro',
'grub', 'gs', 'gwph', 'hal', 'has', 'hd', 'hdp', 'hlf', 'hog', 'hum',
'ibb', 'ibm', 'ice', 'idxx', 'ilmn', 'ilmn', 'incy', 'intc', 'intu',
'ip', 'isrg', 'iwm', 'ivv', 'iwf', 'iwm', 'jack', 'jcp', 'jdst', 'jnj',
'jnpr', 'jnug', 'jpm', 'kite', 'klac', 'ko', 'kss', 'labd', 'labu',
'len', 'lite', 'lmt', 'lnkd', 'lrcx', 'lulu', 'lvs', 'mbly', 'mcd',
'mchp', 'mdy', 'meoh', 'mnst', 'mo', 'momo', 'mon', 'mrk', 'ms', 'msft',
Because market analysis encompasses a wide array of technical indicators, you can define features using the Variable
Definition Language (VDL). The concept is simple: flatten out a function call and its parameters into a string, and that
string represents the variable name. You can use the technical analysis functions in AlphaPy, or define your own.
Let’s define a feature that indicates whether or not a stock is above its 50-day closing moving average. The
alphapy.market_variables module has a function ma to calculate a rolling mean. It has two parameters: the name of
the dataframe’s column and the period over which to calculate the mean. So, the corresponding variable name is
ma_close_50.
Typically, a moving average is calculated with the closing price, so we can define an alias cma which represents the
closing moving average. An alias is simply a substitution mechanism for replacing one string with an abbreviation.
Instead of ma_close_50, we can now refer to cma_50 using an alias.
Finally, we can define the variable abovema with a relational expression. Note that numeric values in the expression
can be substituted when defining features, e.g., abovema_20.
Listing 3: market.yml
features: ['abovema_50']
aliases:
cma : 'ma_close'
variables:
abovema : 'close > cma_50'
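To make the mechanism concrete, here is a minimal sketch, not AlphaPy’s actual parser, of how a VDL name
decomposes back into a function call. The eval_vdl helper is hypothetical and exists only for illustration:

import pandas as pd

def ma(df, c, p):
    # Rolling mean of column c over period p (mirrors the ma function above).
    return df[c].rolling(p).mean()

def eval_vdl(df, name):
    # Hypothetical helper: split 'ma_close_3' into ('ma', 'close', 3)
    # and dispatch to the named function.
    func_name, column, period = name.split('_')
    func = {'ma': ma}[func_name]
    return func(df, column, int(period))

df = pd.DataFrame({'close': [10.0, 10.5, 10.2, 10.8, 11.0]})
df['ma_close_3'] = eval_vdl(df, 'ma_close_3')
df['abovema_3'] = df['close'] > df['ma_close_3']  # the abovema variable for p=3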
Listing 4: market.yml
aliases:
atr : 'ma_truerange'
aver : 'ma_hlrange'
cma : 'ma_close'
cmax : 'highest_close'
cmin : 'lowest_close'
hc : 'higher_close'
hh : 'higher_high'
hl : 'higher_low'
ho : 'higher_open'
hmax : 'highest_high'
hmin : 'lowest_high'
lc : 'lower_close'
lh : 'lower_high'
ll : 'lower_low'
lo : 'lower_open'
lmax : 'highest_low'
lmin : 'lowest_low'
net : 'net_close'
netdown : 'down_net'
netup : 'up_net'
omax : 'highest_open'
omin : 'lowest_open'
rmax : 'highest_hlrange'
rmin : 'lowest_hlrange'
rr : 'maratio_hlrange'
rixc : 'rindex_close_high_low'
rixo : 'rindex_open_high_low'
roi : 'netreturn_close'
rsi : 'rsi_close'
sepma : 'ma_sep'
vma : 'ma_volume'
vmratio : 'maratio_volume'
upmove : 'net_high'
Variable expressions are valid Python expressions, with the addition of offsets to reference previous values.
Listing 5: market.yml
variables:
abovema : 'close > cma_50'
belowma : 'close < cma_50'
bigup : 'rrover & sephigh & netup'
bigdown : 'rrover & sephigh & netdown'
doji : 'sepdoji & rrunder'
hookdown : 'open > high[1] & close < close[1]'
hookup : 'open < low[1] & close > close[1]'
inside : 'low > low[1] & high < high[1]'
madelta : '(close - cma_50) / atr_10'
nr : 'hlrange == rmin_4'
outside : 'low < low[1] & high > high[1]'
roihigh : 'roi_5 >= 5'
roilow : 'roi_5 < -5'
roiminus : 'roi_5 < 0'
roiplus : 'roi_5 > 0'
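As a sketch of the offset semantics, close[1] means the close of the previous bar, which corresponds to a one-row
shift in pandas. This illustrates the concept only and is not AlphaPy’s evaluator:

import pandas as pd

df = pd.DataFrame({'open': [9.5, 11.5, 10.5],
                   'high': [10.0, 12.0, 11.0],
                   'low': [9.0, 10.0, 10.2],
                   'close': [10.0, 9.8, 10.5]})

# hookdown : 'open > high[1] & close < close[1]'
hookdown = (df['open'] > df['high'].shift(1)) & (df['close'] < df['close'].shift(1))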
Once the aliases and variables are defined, you have a foundation for defining all of the features that you want to
test.
Listing 6: market.yml
features: ['abovema_3', 'abovema_5', 'abovema_10', 'abovema_20', 'abovema_50',
'adx', 'atr', 'bigdown', 'bigup', 'diminus', 'diplus', 'doji',
'gap', 'gapbadown', 'gapbaup', 'gapdown', 'gapup',
'hc', 'hh', 'ho', 'hl', 'lc', 'lh', 'll', 'lo', 'hookdown', 'hookup',
'inside', 'outside', 'madelta_3', 'madelta_5', 'madelta_7', 'madelta_10',
'madelta_12', 'madelta_15', 'madelta_18', 'madelta_20', 'madelta',
'net', 'netdown', 'netup', 'nr_3', 'nr_4', 'nr_5', 'nr_7', 'nr_8',
'nr_10', 'nr_18', 'roi', 'roi_2', 'roi_3', 'roi_4', 'roi_5', 'roi_10',
'roi_20', 'rr_1_4', 'rr_1_7', 'rr_1_10', 'rr_2_5', 'rr_2_7', 'rr_2_10',
'rr_3_8', 'rr_3_14', 'rr_4_10', 'rr_4_20', 'rr_5_10', 'rr_5_20',
'rr_5_30', 'rr_6_14', 'rr_6_25', 'rr_7_14', 'rr_7_35', 'rr_8_22',
'rrhigh', 'rrlow', 'rrover', 'rrunder', 'rsi_3', 'rsi_4', 'rsi_5',
'rsi_6', 'rsi_8', 'rsi_10', 'rsi_14', 'sep_3_3', 'sep_5_5', 'sep_8_8',
'sep_10_10', 'sep_14_14', 'sep_21_21', 'sep_30_30', 'sep_40_40',
'sephigh', 'seplow', 'trend', 'vma', 'vmover', 'vmratio', 'vmunder',
'volatility_3', 'volatility_5', 'volatility', 'volatility_20',
'wr_2', 'wr_3', 'wr', 'wr_5', 'wr_6', 'wr_7', 'wr_10']
MarketFlow provides two out-of-the-box trading systems. The first is a long/short system that you define in the
system section of the configuration file market.yml. When MarketFlow detects a system section in the file, it
knows to execute that particular long/short strategy.
Listing 7: market.yml
market:
data_history : 1000
forecast_period : 1
fractal : 1d
leaders : []
predict_history : 50
schema : prices
target_group : faang
system:
name : 'closer'
holdperiod : 0
longentry : hc
longexit :
shortentry : lc
shortexit :
scale : False
groups:
faang : ['fb', 'aapl', 'amzn', 'nflx', 'googl']
aliases:
hc : 'higher_close'
lc : 'lower_close'
MarketFlow runs on top of AlphaPy, so the model.yml file has the same format. In the following example, note the
use of treatments to calculate runs for a set of features; each treatment entry names a module, a function, and the
remaining arguments to pass to it (see the sketch after the listing).
Listing 8: model.yml
project:
directory : .
file_extension : csv
submission_file :
submit_probas : False
data:
drop : ['date', 'tag', 'open', 'high', 'low', 'close', 'volume', 'adjclose',
model:
algorithms : ['RF']
balance_classes : True
calibration :
option : False
type : isotonic
cv_folds : 3
estimators : 501
feature_selection :
option : True
percentage : 50
uni_grid : [5, 10, 15, 20, 25]
score_func : f_classif
grid_search :
option : False
iterations : 100
random : True
subsample : True
sampling_pct : 0.25
pvalue_level : 0.01
rfe :
option : True
step : 10
scoring_function : 'roc_auc'
type : classification
features:
clustering :
option : False
increment : 3
maximum : 30
minimum : 3
counts :
option : False
encoding :
rounding : 3
type : factorize
factors : []
interactions :
option : True
poly_degree : 2
sampling_pct : 5
isomap :
option : False
components : 2
neighbors : 5
logtransform :
option : False
numpy :
option : False
pca :
treatments:
doji : ['alphapy.features', 'runs_test', ['all'], 18]
hc : ['alphapy.features', 'runs_test', ['all'], 18]
hh : ['alphapy.features', 'runs_test', ['all'], 18]
hl : ['alphapy.features', 'runs_test', ['all'], 18]
ho : ['alphapy.features', 'runs_test', ['all'], 18]
rrhigh : ['alphapy.features', 'runs_test', ['all'], 18]
rrlow : ['alphapy.features', 'runs_test', ['all'], 18]
rrover : ['alphapy.features', 'runs_test', ['all'], 18]
rrunder : ['alphapy.features', 'runs_test', ['all'], 18]
sephigh : ['alphapy.features', 'runs_test', ['all'], 18]
seplow : ['alphapy.features', 'runs_test', ['all'], 18]
trend : ['alphapy.features', 'runs_test', ['all'], 18]
pipeline:
number_jobs : -1
seed : 10231
verbosity : 0
plots:
calibration : True
confusion_matrix : True
importances : True
learning_curve : True
roc_curve : True
xgboost:
stopping_rounds : 20
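As promised above, here is a sketch of the treatment calling convention suggested by the entries in the treatments
section. The apply_treatment helper and the assumed argument order (dataframe and feature name first, then the
configured arguments) are illustrative, not AlphaPy’s exact code:

import importlib

def apply_treatment(df, feature, spec):
    # spec mirrors a treatments entry, e.g.
    # ['alphapy.features', 'runs_test', ['all'], 18]
    module_name, func_name, *args = spec
    func = getattr(importlib.import_module(module_name), func_name)
    # Assumed signature: func(df, feature, <configured arguments>)
    return func(df, feature, *args)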
First, change the directory to your project location, where you have already followed the Project Structure
specifications:
cd path/to/project
mflow
Usage:
In the project location, run mflow with the predict flag. MarketFlow will automatically create the predict.csv
file using the pdate option:
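A plausible invocation, assuming the flags are spelled --predict and --pdate as the text above suggests (the date is
illustrative):

cd path/to/project
mflow --predict --pdate 2017-03-25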
THIRTEEN
SPORTFLOW
SportFlow applies machine learning algorithms to predict game outcomes for matches in any team sport. We create
binary features (for classification) to determine whether or not a team will win the game or, even more importantly,
cover the spread. We also try to predict whether or not a game’s total points will exceed the over/under.
Of course, there are practical matters to predicting a game’s outcome. The strength of supervised learning is to improve
an algorithm’s performance with lots of data. While major-league baseball has a total of 2,430 games per year, pro
football has only 256 games per year. College football and basketball are somewhere in the middle of this range.
The other complication is determining whether or not a model for one sport can be used for another. The advantage
is that combining sports gives us more data. The disadvantage is that each sport has unique characteristics that could
make a unified model infeasible. Still, we can combine the game data to test an overall model.
SportFlow starts with minimal game data (lines and scores) and expands these data into temporal features such as
runs and streaks. Currently, we do not incorporate player data or other external factors, but there are some excellent
open-source packages, such as BurntSushi’s nflgame Python code. For its initial version, SportFlow game data must
be in the format below:
The SportFlow logic is split-apply-combine, as the data are first split along team lines, then team statistics are
calculated and applied, and finally the team data are inserted into the overall model frame.
The SportFlow configuration file is minimal. You can simulate random scoring to compare with a real model. Further,
you can experiment with the rolling window for run and streak calculations.
Listing 1: sport.yml
sport:
points_max : 100
points_min : 50
random_scoring : False
seasons : []
rolling_window : 3
SportFlow runs on top of AlphaPy, so the model.yml file has the same format.
Listing 2: model.yml
project:
directory : .
file_extension : csv
submission_file :
submit_probas : False
data:
drop : ['Unnamed: 0', 'index', 'season', 'date', 'home.team', 'away.team',
model:
features:
clustering :
option : False
increment : 3
maximum : 30
minimum : 3
counts :
option : False
encoding :
rounding : 3
type : factorize
factors : ['line', 'delta.wins', 'delta.losses', 'delta.ties',
'delta.point_win_streak', 'delta.point_loss_streak',
'delta.cover_win_streak', 'delta.cover_loss_streak',
'delta.over_streak', 'delta.under_streak']
interactions :
option : True
poly_degree : 2
sampling_pct : 5
isomap :
option : False
components : 2
neighbors : 5
logtransform :
option : False
numpy :
option : False
pca :
option : False
increment : 3
maximum : 15
minimum : 3
pipeline:
number_jobs : -1
seed : 13201
verbosity : 0
plots:
calibration : True
confusion_matrix : True
importances : True
learning_curve : True
roc_curve : True
xgboost:
stopping_rounds : 30
First, change the directory to your project location, where you have already followed the Project Structure
specifications:
cd path/to/project
sflow
Usage:
In the project location, run sflow with the predict flag. SportFlow will automatically create the predict.csv
file using the pdate option:
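A plausible invocation, assuming the flags are spelled --predict and --pdate as the text above suggests (the date is
illustrative):

cd path/to/project
sflow --predict --pdate 2017-03-25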
FOURTEEN
ALPHAPY
14.1.1 Submodules
alphapy.__main__.main(args=None)
AlphaPy Main Program
alphapy.__main__.main_pipeline(model)
AlphaPy Main Pipeline
Parameters model (alphapy.Model) – The model specifications for the pipeline.
Returns model – The final model.
Return type alphapy.Model
alphapy.__main__.prediction_pipeline(model)
AlphaPy Prediction Pipeline
Parameters model (alphapy.Model) – The model object for controlling the pipeline.
Returns None
Return type None
Notes
The saved model is loaded from disk, and predictions are made on the new testing data.
alphapy.__main__.training_pipeline(model)
AlphaPy Training Pipeline
Parameters model (alphapy.Model) – The model object for controlling the pipeline.
Returns model – The final results are stored in the model object.
Return type alphapy.Model
Raises KeyError – If the number of columns of the train and test data do not match, then this
exception is raised.
aliases = {}
alphapy.alias.get_alias(alias)
Find an alias value with the given key.
Parameters alias (str) – Key for finding the alias value.
Returns alias_value – Value for the corresponding key.
Return type str
Examples
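A minimal illustration, assuming the module-level aliases dictionary has been populated (for example, by the
configuration loader):

>>> from alphapy.alias import aliases, get_alias
>>> aliases['cma'] = 'ma_close'
>>> get_alias('cma')
'ma_close'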
alphapy.calendrical.biz_day_month(rdate)
Calculate the business day of the month.
Parameters rdate (int) – RDate date format.
Returns bdm – Business day of month.
Return type int
alphapy.calendrical.biz_day_week(rdate)
Calculate the business day of the week.
Parameters rdate (int) – RDate date format.
Returns bdw – Business day of week.
Return type int
alphapy.calendrical.christmas_day(gyear, observed)
Get Christmas Day for a given year.
Parameters
• gyear (int) – Gregorian year.
• observed (bool) – False if the exact date, True if the weekday.
Returns xmas – Christmas Day in RDate format.
Return type int
alphapy.calendrical.cinco_de_mayo(gyear)
Get Cinco de Mayo for a given year.
Parameters gyear (int) – Gregorian year.
Returns cinco_de_mayo – Cinco de Mayo in RDate format.
Return type int
alphapy.calendrical.day_of_week(rdate)
Get the ordinal day of the week.
Parameters rdate (int) – RDate date format.
Returns dw – Ordinal day of the week.
Return type int
alphapy.calendrical.day_of_year(gyear, gmonth, gday)
Calculate the day number of the given calendar year.
Parameters
• gyear (int) – Gregorian year.
• gmonth (int) – Gregorian month.
• gday (int) – Gregorian day.
Returns dy – Day number of year in RDate format.
Return type int
alphapy.calendrical.days_left_in_year(gyear, gmonth, gday)
Calculate the number of days remaining in the calendar year.
Parameters
• gyear (int) – Gregorian year.
• gmonth (int) – Gregorian month.
• gday (int) – Gregorian day.
Returns days_left – Calendar days remaining in RDate format.
Return type int
alphapy.calendrical.easter_day(gyear)
Get Easter Day for a given year.
Parameters gyear (int) – Gregorian year.
Returns ed – Easter Day in RDate format.
Return type int
alphapy.calendrical.expand_dates(date_list)
alphapy.calendrical.fathers_day(gyear)
Get Father’s Day for a given year.
Parameters gyear (int) – Gregorian year.
Returns fathers_day – Father’s Day in RDate format.
Return type int
alphapy.calendrical.first_kday(k, gyear, gmonth, gday)
Calculate the first kday in RDate format.
Parameters
• k (int) – Day of the week.
• gyear (int) – Gregorian year.
• gmonth (int) – Gregorian month.
• gday (int) – Gregorian day.
Returns fkd – first-kday in RDate format.
Return type int
alphapy.calendrical.gdate_to_rdate(gyear, gmonth, gday)
Convert Gregorian date to RDate format.
Parameters
• gyear (int) – Gregorian year.
• gmonth (int) – Gregorian month.
• gday (int) – Gregorian day.
Returns rdate – RDate date format.
Return type int
alphapy.calendrical.get_holiday_names()
Get the list of defined holidays.
Returns holidays – List of holiday names.
Return type list of str
alphapy.calendrical.kday_before(rdate, k)
Calculate the day before a given RDate.
Parameters
• rdate (int) – RDate date format.
• k (int) – Day of the week.
Returns kdb – kday-before in RDate format.
Return type int
alphapy.calendrical.kday_nearest(rdate, k)
Calculate the day nearest a given RDate.
Parameters
• rdate (int) – RDate date format.
• k (int) – Day of the week.
Returns kdn – kday-nearest in RDate format.
Return type int
alphapy.calendrical.kday_on_after(rdate, k)
Calculate the day on or after a given RDate.
Parameters
• rdate (int) – RDate date format.
• k (int) – Day of the week.
Returns kdoa – kday-on-or-after in RDate format.
Return type int
alphapy.calendrical.kday_on_before(rdate, k)
Calculate the day on or before a given RDate.
Parameters
• rdate (int) – RDate date format.
• k (int) – Day of the week.
Returns kdob – kday-on-or-before in RDate format.
Return type int
alphapy.calendrical.labor_day(gyear)
Get Labor Day for a given year.
Parameters gyear (int) – Gregorian year.
Returns lday – Labor Day in RDate format.
Return type int
alphapy.calendrical.last_kday(k, gyear, gmonth, gday)
Calculate the last kday in RDate format.
Parameters
• k (int) – Day of the week.
• gyear (int) – Gregorian year.
alphapy.calendrical.next_holiday(rdate, holidays)
Find the next holiday after a given date.
Parameters
• rdate (int) – RDate date format.
• holidays (dict of RDate (int)) – Holidays in RDate format.
Returns holiday – Next holiday in RDate format.
Return type RDate (int)
alphapy.calendrical.nth_bizday(n, gyear, gmonth)
Calculate the nth business day in a month.
Parameters
• n (int) – Number of the business day to get.
• gyear (int) – Gregorian year.
• gmonth (int) – Gregorian month.
Returns bizday – Nth business day of a given month in RDate format.
Return type int
alphapy.calendrical.nth_kday(n, k, gyear, gmonth, gday)
Calculate the nth-kday in RDate format.
Parameters
• n (int) – Occurrence of a given day counting in either direction.
• k (int) – Day of the week.
• gyear (int) – Gregorian year.
• gmonth (int) – Gregorian month.
• gday (int) – Gregorian day.
Returns nthkday – nth-kday in RDate format.
Return type int
alphapy.calendrical.presidents_day(gyear)
Get President’s Day for a given year.
Parameters gyear (int) – Gregorian year.
Returns prezday – President’s Day in RDate format.
Return type int
alphapy.calendrical.previous_event(rdate, events)
Find the previous event before a given date.
Parameters
• rdate (int) – RDate date format.
• events (list of RDate (int)) – Monthly events in RDate format.
Returns event – Previous event in RDate format.
Return type RDate (int)
alphapy.calendrical.previous_holiday(rdate, holidays)
Find the previous holiday before a given date.
Parameters
• rdate (int) – RDate date format.
• holidays (dict of RDate (int)) – Holidays in RDate format.
Returns holiday – Previous holiday in RDate format.
Return type RDate (int)
alphapy.calendrical.rdate_to_gdate(rdate)
Convert RDate format to Gregorian date format.
Parameters rdate (int) – RDate date format.
Returns
• gyear (int) – Gregorian year.
• gmonth (int) – Gregorian month.
• gday (int) – Gregorian day.
alphapy.calendrical.rdate_to_gyear(rdate)
Convert RDate format to Gregorian year.
Parameters rdate (int) – RDate date format.
Returns gyear – Gregorian year.
Return type int
alphapy.calendrical.saint_patricks_day(gyear)
Get Saint Patrick’s day for a given year.
Parameters
• gyear (int) – Gregorian year.
• observed (bool) – False if the exact date, True if the weekday.
Returns patricks – Saint Patrick’s Day in RDate format.
Return type int
alphapy.calendrical.set_events(n, k, gyear, gday)
Define monthly events for a given year.
Parameters
• n (int) – Occurrence of a given day counting in either direction.
• k (int) – Day of the week.
• gyear (int) – Gregorian year for the events.
• gday (int) – Gregorian day representing the first day to consider.
Returns events – Monthly events in RDate format.
Return type list of RDate (int)
alphapy.calendrical.set_holidays(gyear, observe)
Set the holidays for a given year.
Parameters
• gyear (int) – Gregorian year.
• observe (bool) – True to get the observed date, otherwise False.
Returns holidays – Set of holidays in RDate format for a given year.
Return type dict of int
alphapy.calendrical.subtract_dates(gyear1, gmonth1, gday1, gyear2, gmonth2, gday2)
Calculate the difference between two Gregorian dates.
Parameters
• gyear1 (int) – Gregorian year of first date.
• gmonth1 (int) – Gregorian month of first date.
• gday1 (int) – Gregorian day of first date.
• gyear2 (int) – Gregorian year of successive date.
• gmonth2 (int) – Gregorian month of successive date.
• gday2 (int) – Gregorian day of successive date.
Returns delta_days – Difference in days in RDate format.
Return type int
alphapy.calendrical.thanksgiving_day(gyear)
Get Thanksgiving Day for a given year.
Parameters gyear (int) – Gregorian year.
Returns tday – Thanksgiving Day in RDate format.
Return type int
alphapy.calendrical.valentines_day(gyear)
Get Valentine’s day for a given year.
Parameters gyear (int) – Gregorian year.
Returns valentines – Valentine’s Day in RDate format.
Return type int
alphapy.calendrical.veterans_day(gyear, observed)
Get Veteran’s day for a given year.
Parameters
• gyear (int) – Gregorian year.
• observed (bool) – False if the exact date, True if the weekday.
Returns veterans – Veteran’s Day in RDate format.
alphapy.data.get_google_data(schema, subschema, symbol, intraday_data, data_fractal, from_date, to_date, lookback_period)
Get Google data.
Parameters
• schema (str) – The schema (including any subschema) for this data feed.
• subschema (str) – Any subschema for this data feed.
• symbol (str) – A valid stock symbol.
• intraday_data (bool) – If True, then get intraday data.
• data_fractal (str) – Pandas offset alias.
• from_date (str) – Starting date for symbol retrieval.
• to_date (str) – Ending date for symbol retrieval.
• lookback_period (int) – The number of periods of data to retrieve.
Returns df – The dataframe containing the market data.
Return type pandas.DataFrame
alphapy.data.get_quandl_data(schema, subschema, symbol, intraday_data, data_fractal,
from_date, to_date, lookback_period)
Get Quandl data.
Parameters
• schema (str) – The schema for this data feed.
• subschema (str) – Any subschema for this data feed.
• symbol (str) – A valid stock symbol.
• intraday_data (bool) – If True, then get intraday data.
• data_fractal (str) – Pandas offset alias.
• from_date (str) – Starting date for symbol retrieval.
• to_date (str) – Ending date for symbol retrieval.
• lookback_period (int) – The number of periods of data to retrieve.
Returns df – The dataframe containing the market data.
Return type pandas.DataFrame
alphapy.data.get_yahoo_data(schema, subschema, symbol, intraday_data, data_fractal, from_date,
to_date, lookback_period)
Get Yahoo data.
Parameters
• schema (str) – The schema (including any subschema) for this data feed.
• subschema (str) – Any subschema for this data feed.
• symbol (str) – A valid stock symbol.
• intraday_data (bool) – If True, then get intraday data.
• data_fractal (str) – Pandas offset alias.
• from_date (str) – Starting date for symbol retrieval.
• to_date (str) – Ending date for symbol retrieval.
• lookback_period (int) – The number of periods of data to retrieve.
Returns df – The dataframe containing the market data.
Return type pandas.DataFrame
alphapy.data.sample_data(model)
Sample the training data.
Sampling is configured in the model.yml file (data:sampling:method). You can learn more about resampling
techniques here [IMB].
Parameters model (alphapy.Model) – The model object describing the data.
Returns model – The model object with the sampled data.
Return type alphapy.Model
alphapy.data.shuffle_data(model)
Randomly shuffle the training data.
Parameters model (alphapy.Model) – The model object describing the data.
Returns model – The model object with the shuffled data.
Return type alphapy.Model
Parameters cfg_dir (str) – The directory where the configuration file algos.yml is stored.
Returns specs – The specifications for determining which algorithms to run.
Return type dict
alphapy.estimators.get_estimators(model)
Define all the AlphaPy estimators based on the contents of the algos.yml file.
Parameters model (alphapy.Model) – The model object containing global AlphaPy parameters.
Returns estimators – All of the estimators required for running the pipeline.
Return type dict
Notes
Isomaps are very memory-intensive. Your process will be killed if you run out of memory.
References
You can find more information on Isomap here [ISO].
alphapy.features.create_numpy_features(base_features, sentinel)
Calculate the sum, mean, standard deviation, and variance of each row.
Parameters
• base_features (numpy array) – The feature dataframe.
• sentinel (float) – The number to be imputed for NaN values.
Returns
• np_features (numpy array) – The calculated NumPy features.
• np_fnames (list) – The NumPy feature names.
alphapy.features.create_pca_features(features, model)
Apply Principal Component Analysis (PCA) to the features.
Parameters
• features (numpy array) – The input features.
• model (alphapy.Model) – The model object with the PCA parameters.
Returns
• pfeatures (numpy array) – The PCA features.
• pnames (list) – The PCA feature names.
References
You can find more information on Principal Component Analysis here [PCA].
alphapy.features.create_scipy_features(base_features, sentinel)
Calculate the skew, kurtosis, and other statistical features for each row.
Parameters
• base_features (numpy array) – The feature dataframe.
• sentinel (float) – The number to be imputed for NaN values.
Returns
• sp_features (numpy array) – The calculated SciPy features.
• sp_fnames (list) – The SciPy feature names.
alphapy.features.create_tsne_features(features, model)
Create t-SNE features.
Parameters
• features (numpy array) – The input features.
• model (alphapy.Model) – The model object with the t-SNE parameters.
Returns
• tfeatures (numpy array) – The t-SNE features.
• tnames (list) – The t-SNE feature names.
References
You can find more information on the t-SNE technique here [TSNE].
alphapy.features.drop_features(X, drop)
Drop any specified features.
Parameters
• X (pandas.DataFrame) – The dataframe containing the features.
• drop (list) – The list of features to remove from X.
Returns X – The dataframe without the dropped features.
Return type pandas.DataFrame
alphapy.features.float_factor(x, rounding)
Convert a floating point number to a factor.
Parameters
• x (float) – The value to convert to a factor.
• rounding (int) – The number of places to round.
Returns ffactor – The resulting factor.
Return type int
alphapy.features.get_factors(model, X_train, X_test, y_train, fnum, fname, nvalues, dtype, encoder, rounding, sentinel)
Convert the original feature to a factor.
Parameters
• model (alphapy.Model) – Model object with the feature specifications.
• X_train (pandas.DataFrame) – Training dataframe containing the column fname.
• X_test (pandas.DataFrame) – Testing dataframe containing the column fname.
• y_train (pandas.Series) – Training series for target variable.
• fnum (int) – Feature number, strictly for logging purposes.
• fname (str) – Name of the text column in the dataframe df.
• nvalues (int) – The number of unique values.
• dtype (str) – The values 'float64', 'int64', or 'bool'.
• encoder (alphapy.features.Encoders) – Type of encoder to apply.
• rounding (int) – Number of places to round.
• sentinel (float) – The number to be imputed for NaN values.
Returns
• all_features (numpy array) – The features that have been transformed to factors.
References
To use count vectorization and TF-IDF, you can find more information here [TFE].
alphapy.features.impute_values(feature, dt, sentinel)
Impute values for a given data type. The median strategy is applied for floating point values, and the most
frequent strategy is applied for integer or Boolean values.
Parameters
• feature (pandas.Series or numpy.array) – The feature for imputation.
• dt (str) – The values 'float64', 'int64', or 'bool'.
• sentinel (float) – The number to be imputed for NaN values.
Returns imputed – The feature after imputation.
Return type numpy.array
Raises TypeError – Data type dt is invalid for imputation.
References
You can find more information on low-variance feature selection here [LV].
alphapy.features.save_features(model, X_train, X_test, y_train=None, y_test=None)
Save new features to the model.
Parameters
• model (alphapy.Model) – Model object with train and test data.
• X_train (numpy array) – Training features.
• X_test (numpy array) – Testing features.
• y_train (numpy array) – Training labels.
• y_test (numpy array) – Testing labels.
Returns model – Model object with new train and test data.
References
You can find more information on univariate feature selection here [UNI].
frames = {}
alphapy.frame.dump_frames(group, directory, extension, separator)
Save a group of data frames to disk.
Parameters
• group (alphapy.Group) – The collection of frames to be saved to the file system.
• directory (str) – Full directory specification.
• extension (str) – File name extension, e.g., csv.
• separator (str) – The delimiter between fields in the file.
Returns None
Return type None
alphapy.frame.frame_name(name, space)
Get the frame name for the given name and space.
Parameters
• name (str) – Group name.
• space (alphapy.Space) – Context or namespace for the frame.
Returns fname – The name of the frame.
Return type str
• leaders (list) – The features that are contemporaneous with the target.
• lag_period (int) – The number of lagged rows for prediction.
Returns new_frame – The transformed dataframe with variable sequences.
Return type pandas.DataFrame
alphapy.frame.write_frame(df, directory, filename, extension, separator, index=False, index_label=None, columns=None)
Write a dataframe into a delimiter-separated file.
Parameters
• df (pandas.DataFrame) – The pandas dataframe to save to a file.
• directory (str) – Full directory specification.
• filename (str) – Name of the file to write, excluding the extension.
• extension (str) – File name extension, e.g., csv.
• separator (str) – The delimiter between fields in the file.
• index (bool, optional) – If True, write the row names (index).
• index_label (str, optional) – A column label for the index.
• columns (str, optional) – A list of column names.
Returns None
Return type None
class alphapy.globals.Encoders(value)
Bases: enum.Enum
AlphaPy Encoders.
These are the encoders used in AlphaPy, as configured in the model.yml file (features:encoding:type). You
can learn more about encoders here [ENC].
backdiff = 1
basen = 2
binary = 3
catboost = 4
hashing = 5
helmert = 6
jstein = 7
leaveone = 8
mestimate = 9
onehot = 10
ordinal = 11
polynomial = 12
sum = 13
target = 14
woe = 15
class alphapy.globals.ModelType(value)
Bases: enum.Enum
AlphaPy Model Types.
classification = 1
clustering = 2
multiclass = 3
oneclass = 4
regression = 5
class alphapy.globals.Objective(value)
Bases: enum.Enum
Scoring Function Objectives.
Best model selection is based on the scoring or Objective function, which must be either maximized or mini-
mized. For example, roc_auc is maximized, while neg_log_loss is minimized.
maximize = 1
minimize = 2
class alphapy.globals.Orders
Bases: object
System Order Types.
Variables
• le (str) – long entry
• se (str) – short entry
• lx (str) – long exit
• sx (str) – short exit
• lh (str) – long exit at the end of the holding period
• sh (str) – short exit at the end of the holding period
le = 'le'
lh = 'lh'
lx = 'lx'
se = 'se'
sh = 'sh'
sx = 'sx'
class alphapy.globals.Partition(value)
Bases: enum.Enum
AlphaPy Partitions.
predict = 1
test = 2
train = 3
class alphapy.globals.SamplingMethod(value)
Bases: enum.Enum
AlphaPy Sampling Methods.
These are the data sampling methods used in AlphaPy, as configured in the model.yml file (data:sampling:
method). You can learn more about resampling techniques here [IMB].
ensemble_bc = 1
ensemble_easy = 2
over_random = 3
over_smote = 4
over_smoteb = 5
over_smotesv = 6
overunder_smote_enn = 7
overunder_smote_tomek = 8
under_cluster = 9
under_ncr = 10
under_nearmiss = 11
under_random = 12
under_tomek = 13
class alphapy.globals.Scalers(value)
Bases: enum.Enum
AlphaPy Scalers.
These are the scaling methods used in AlphaPy, as configured in the model.yml file (features:scaling:type).
You can learn more about feature scaling here [SCALE].
minmax = 1
standard = 2
Examples
>>> Group('tech')
add(newlist)
Add new members to the group.
Parameters newlist (list) – New members or identifiers to add to the group.
Returns None
Return type None
alphapy.market_flow.get_market_config()
Read the configuration file for MarketFlow.
Parameters None (None)
Returns specs – The parameters for controlling MarketFlow.
Return type dict
alphapy.market_flow.main(args=None)
MarketFlow Main Program
alphapy.market_flow.market_pipeline(model, market_specs)
AlphaPy MarketFlow Pipeline
Parameters
• model (alphapy.Model) – The model object for AlphaPy.
• market_specs (dict) – The specifications for controlling the MarketFlow pipeline.
Returns model – The final results are stored in the model object.
Return type alphapy.Model
class alphapy.model.Model(specs)
Bases: object
Create a new model.
Parameters specs (dict) – The model specifications obtained by reading the model.yml file.
Variables
• specs (dict) – The model specifications.
• X_train (pandas.DataFrame) – Training features in matrix format.
• X_test (pandas.DataFrame) – Testing features in matrix format.
• y_train (pandas.Series) – Training labels in vector format.
• y_test (pandas.Series) – Testing labels in vector format.
• algolist (list) – Algorithms to use in training.
• estimators (dict) – Dictionary of estimators (key: algorithm)
• importances (dict) – Feature Importances (key: algorithm)
• coefs (dict) – Coefficients, if applicable (key: algorithm)
• support (dict) – Support Vectors, if applicable (key: algorithm)
• preds (dict) – Predictions or labels (keys: algorithm, partition)
• probas (dict) – Probabilities from classification (keys: algorithm, partition)
• metrics (dict) – Model evaluation metrics (keys: algorithm, partition, metric)
Raises KeyError – Model specs must include the key algorithms, which is stored in algolist.
alphapy.model.first_fit(model, algo, est)
Fit the model before optimization.
Parameters
• model (alphapy.Model) – The model object with specifications.
• algo (str) – Abbreviation of the algorithm to run.
• est (alphapy.Estimator) – The estimator to fit.
Returns model – The model object with the initial estimator.
Return type alphapy.Model
Notes
AlphaPy fits an initial model because the user may choose to get a first score without any additional feature
selection or grid search. XGBoost is a special case because it has the advantage of an eval_set and
early_stopping_rounds, which can speed up the estimation phase.
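For reference, a minimal sketch of the eval_set and early_stopping_rounds mechanism using the standalone XGBoost
scikit-learn API (older releases accept these arguments in fit; this is not AlphaPy’s internal call):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=500, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=42)

clf = XGBClassifier(n_estimators=500, learning_rate=0.1)
clf.fit(X_train, y_train,
        eval_set=[(X_val, y_val)],   # validation set scored each round
        early_stopping_rounds=20,    # stop when the score stops improving
        verbose=False)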
alphapy.model.generate_metrics(model, partition)
Generate model evaluation metrics for all estimators.
Parameters
• model (alphapy.Model) – The model object with stored predictions.
• partition (alphapy.Partition) – Reference to the dataset.
Notes
AlphaPy takes a brute-force approach to calculating each metric: it calls every applicable scikit-learn scoring
function. If a calculation fails for any reason, the evaluation continues without error.
References
For more information about model evaluation and the associated metrics, refer to [EVAL].
alphapy.model.get_model_config()
Read in the configuration file for AlphaPy.
Parameters None (None)
Returns specs – The parameters for controlling AlphaPy.
Return type dict
Raises ValueError – Unrecognized value of a model.yml field.
alphapy.model.load_feature_map(model, directory)
Load the feature map from storage. By default, the most recent feature map is loaded into memory.
Parameters
• model (alphapy.Model) – The model object to contain the feature map.
• directory (str) – Full directory specification of the feature map’s location.
Returns model – The model object containing the feature map.
Return type alphapy.Model
alphapy.model.load_predictor(directory)
Load the model predictor from storage. By default, the most recent model is loaded into memory.
Parameters directory (str) – Full directory specification of the predictor’s location.
Returns predictor – The scoring function.
Return type function
alphapy.model.make_predictions(model, algo, calibrate)
Make predictions for the training and testing data.
Parameters
• model (alphapy.Model) – The model object with specifications.
• algo (str) – Abbreviation of the algorithm to make predictions.
• calibrate (bool) – If True, calibrate the probabilities for a classifier.
Returns model – The model object with the predictions.
Return type alphapy.Model
Notes
For classification, calibration is a precursor to making the actual predictions. In this case, AlphaPy predicts both
labels and probabilities. For regression, real values are predicted.
alphapy.model.predict_best(model)
Select the best model based on score.
Parameters model (alphapy.Model) – The model object with all of the estimators.
Returns model – The model object with the best estimator.
Return type alphapy.Model
Notes
Best model selection is based on a scoring function. If the objective is to minimize (e.g., negative log loss), then
we select the model with the algorithm that has the lowest score. If the objective is to maximize, then we select
the algorithm with the highest score (e.g., AUC).
For multiple algorithms, AlphaPy always creates a blended model. Therefore, the best algorithm that is selected
could actually be the blended model itself.
alphapy.model.predict_blend(model)
Make predictions from a blended model.
Parameters model (alphapy.Model) – The model object with all of the estimators.
Returns model – The model object with the blended estimator.
Return type alphapy.Model
Notes
For classification, AlphaPy uses logistic regression for creating a blended model. For regression, ridge regres-
sion is applied.
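A sketch of the blending idea for classification, assuming base-model probabilities are stacked column-wise and fed
to a logistic regression (illustrative only, not AlphaPy’s exact code):

import numpy as np
from sklearn.linear_model import LogisticRegression

# Probabilities from two base classifiers on the same training rows.
probas_a = np.array([0.2, 0.7, 0.9, 0.4])
probas_b = np.array([0.3, 0.6, 0.8, 0.5])
y = np.array([0, 1, 1, 0])

X_blend = np.column_stack([probas_a, probas_b])  # one column per algorithm
blender = LogisticRegression().fit(X_blend, y)   # the blended model
blend_probas = blender.predict_proba(X_blend)[:, 1]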
alphapy.model.save_feature_map(model, timestamp)
Save the feature map to disk.
Parameters
• model (alphapy.Model) – The model object containing the feature map.
• timestamp (str) – Date in yyyy-mm-dd format.
Returns None
Return type None
alphapy.model.save_model(model, tag, partition)
Save the results in the model file.
Parameters
• model (alphapy.Model) – The model object to save.
• tag (str) – A unique identifier for the output files, e.g., a date stamp.
• partition (alphapy.Partition) – Reference to the dataset.
Returns None
Notes
The following components are extracted from the model object and saved to disk:
• Model predictor (via joblib/pickle)
• Predictions
• Probabilities (classification only)
• Rankings
• Submission File (optional)
alphapy.model.save_predictions(model, tag, partition)
Save the predictions to disk.
Parameters
• model (alphapy.Model) – The model object to save.
• tag (str) – A unique identifier for the output files, e.g., a date stamp.
• partition (alphapy.Partition) – Reference to the dataset.
Returns
• preds (numpy array) – The prediction vector.
• probas (numpy array) – The probability vector.
alphapy.model.save_predictor(model, timestamp)
Save the time-stamped model predictor to disk.
Parameters
• model (alphapy.Model) – The model object that contains the best estimator.
• timestamp (str) – Date in yyyy-mm-dd format.
Returns None
Return type None
alphapy.optimize.grid_report(results, n_top=3)
Report the top grid search scores.
Parameters
• results (dict of numpy arrays) – Mean test scores for each grid search iteration.
• n_top (int, optional) – The number of grid search results to report.
Returns None
Return type None
alphapy.optimize.hyper_grid_search(model, estimator)
Return the best hyperparameters for a grid search.
Parameters
Notes
To reduce the time required for grid search, use either randomized grid search with a fixed number of iterations
or a full grid search with subsampling. AlphaPy uses the scikit-learn Pipeline with feature selection to reduce
the feature space.
Notes
If a scoring function is available, then AlphaPy can perform RFE with Cross-Validation (CV), as in this function;
otherwise, it just does RFE without CV.
alphapy.plots.generate_plots(model, partition)
Generate plots while running the pipeline.
Parameters
• model (alphapy.Model) – The model object with plotting specifications.
• partition (alphapy.Partition) – Reference to the dataset.
Returns None
Return type None
alphapy.plots.get_partition_data(model, partition)
Get the X, y pair for a given model and partition
Parameters
• model (alphapy.Model) – The model object with partition data.
• partition (alphapy.Partition) – Reference to the dataset.
Returns
• X (numpy array) – The feature matrix.
• y (numpy array) – The target vector.
Raises TypeError – Partition must be train or test.
alphapy.plots.get_plot_directory(model)
Get the plot output directory of a model.
Parameters model (alphapy.Model) – The model object with directory information.
Returns plot_directory – The output directory to write the plot.
Return type str
alphapy.plots.plot_boundary(model, partition, f1=0, f2=1)
Display a comparison of classifiers.
Parameters
• model (alphapy.Model) – The model object with plotting specifications.
• partition (alphapy.Partition) – Reference to the dataset.
• f1 (int) – Number of the first feature to compare.
• f2 (int) – Number of the second feature to compare.
Returns None
Return type None
References
http://seaborn.pydata.org/generated/seaborn.boxplot.html
alphapy.plots.plot_calibration(model, partition)
Display scikit-learn calibration plots.
Parameters
• model (alphapy.Model) – The model object with plotting specifications.
• partition (alphapy.Partition) – Reference to the dataset.
Returns None
Return type None
References
http://bokeh.pydata.org/en/latest/docs/gallery/candlestick.html
alphapy.plots.plot_confusion_matrix(model, partition)
Draw the confusion matrix.
Parameters
• model (alphapy.Model) – The model object with plotting specifications.
• partition (alphapy.Partition) – Reference to the dataset.
Returns None
Return type None
References
http://scikit-learn.org/stable/modules/model_evaluation.html#confusion-matrix
alphapy.plots.plot_distribution(df, target, tag='eda', directory=None)
Display a Distribution Plot.
Parameters
• df (pandas.DataFrame) – The dataframe containing the target feature.
• target (str) – The target variable for the distribution plot.
• tag (str) – Unique identifier for the plot.
• directory (str, optional) – The full specification of the plot location.
Returns None
Return type None.
References
http://seaborn.pydata.org/generated/seaborn.distplot.html
alphapy.plots.plot_facet_grid(df, target, frow, fcol, tag='eda', directory=None)
Plot a Seaborn faceted histogram grid.
Parameters
• df (pandas.DataFrame) – The dataframe containing the features.
• target (str) – The target variable for contrast.
• frow (list of str) – Feature names for the row elements of the grid.
• fcol (list of str) – Feature names for the column elements of the grid.
• tag (str) – Unique identifier for the plot.
• directory (str, optional) – The full specification of the plot location.
Returns None
Return type None.
References
http://seaborn.pydata.org/generated/seaborn.FacetGrid.html
alphapy.plots.plot_importance(model, partition)
Display scikit-learn feature importances.
Parameters
• model (alphapy.Model) – The model object with plotting specifications.
• partition (alphapy.Partition) – Reference to the dataset.
Returns None
Return type None
References
http://scikit-learn.org/stable/auto_examples/ensemble/plot_forest_importances.html
alphapy.plots.plot_learning_curve(model, partition)
Generate learning curves for a given partition.
Parameters
• model (alphapy.Model) – The model object with plotting specifications.
• partition (alphapy.Partition) – Reference to the dataset.
Returns None
Return type None
References
http://scikit-learn.org/stable/auto_examples/model_selection/plot_learning_curve.html
alphapy.plots.plot_partial_dependence(est, X, features, fnames, tag, n_jobs=-1, verbosity=0, directory=None)
Display a Partial Dependence Plot.
Parameters
• est (estimator) – The scikit-learn estimator for calculating partial dependence.
• X (numpy array) – The data on which the estimator was trained.
• features (list of int) – Feature numbers of X.
• fnames (list of str) – The feature names to plot.
• tag (str) – Unique identifier for the plot
• n_jobs (int, optional) – The maximum number of parallel jobs.
• verbosity (int, optional) – The level of logging, from 0 (minimum) upward.
• directory (str) – Directory where the plot will be stored.
Returns None
Return type None.
References
http://scikit-learn.org/stable/auto_examples/ensemble/plot_partial_dependence.html#
sphx-glr-auto-examples-ensemble-plot-partial-dependence-py
alphapy.plots.plot_roc_curve(model, partition)
Display ROC Curves with Cross-Validation.
Parameters
• model (alphapy.Model) – The model object with plotting specifications.
• partition (alphapy.Partition) – Reference to the dataset.
Returns None
Return type None
References
http://scikit-learn.org/stable/modules/model_evaluation.html#receiver-operating-characteristic-roc
alphapy.plots.plot_scatter(df, features, target, tag='eda', directory=None)
Plot a scatterplot matrix, also known as a pair plot.
Parameters
• df (pandas.DataFrame) – The dataframe containing the features.
• features (list of str) – The features to compare in the scatterplot.
• target (str) – The target variable for contrast.
• tag (str) – Unique identifier for the plot.
• directory (str, optional) – The full specification of the plot location.
Returns None
Return type None.
References
https://seaborn.pydata.org/examples/scatterplot_matrix.html
alphapy.plots.plot_swarm(df, x, y, hue, tag='eda', directory=None)
Display a Swarm Plot.
Parameters
• df (pandas.DataFrame) – The dataframe containing the x and y features.
• x (str) – Variable name in df to display along the x-axis.
• y (str) – Variable name in df to display along the y-axis.
• hue (str) – Variable name to be used as hue, i.e., another data dimension.
• tag (str) – Unique identifier for the plot.
• directory (str, optional) – The full specification of the plot location.
Returns None
Return type None.
References
http://seaborn.pydata.org/generated/seaborn.swarmplot.html
alphapy.plots.plot_time_series(df, target, tag='eda', directory=None)
Plot time series data.
Parameters
• df (pandas.DataFrame) – The dataframe containing the target feature.
• target (str) – The target variable for the time series plot.
• tag (str) – Unique identifier for the plot.
• directory (str, optional) – The full specification of the plot location.
Returns None
Return type None.
References
http://seaborn.pydata.org/generated/seaborn.tsplot.html
alphapy.plots.plot_validation_curve(model, partition, pname, prange)
Generate scikit-learn validation curves.
Parameters
• model (alphapy.Model) – The model object with plotting specifications.
• partition (alphapy.Partition) – Reference to the dataset.
• pname (str) – Name of the hyperparameter to test.
• prange (numpy array) – The values of the hyperparameter that will be evaluated.
Returns None
Return type None
References
http://scikit-learn.org/stable/auto_examples/model_selection/plot_validation_curve.html#
sphx-glr-auto-examples-model-selection-plot-validation-curve-py
alphapy.plots.write_plot(vizlib, plot, plot_type, tag, directory=None)
Save the plot to a file, or display it interactively.
Parameters
• vizlib (str) – The visualization library: 'matplotlib', 'seaborn', or 'bokeh'.
• plot (module) – Plotting context, e.g., plt.
• plot_type (str) – Type of plot to generate.
• tag (str) – Unique identifier for the plot.
• directory (str, optional) – The full specification for the directory location. If directory
is None, then the plot is displayed interactively.
Returns None
References
Visualization Libraries:
• Matplotlib : http://matplotlib.org/
• Seaborn : https://seaborn.pydata.org/
• Bokeh : http://bokeh.pydata.org/en/latest/
Notes
Warning: The portfolio management functions balance, kick_out, and stop_loss are not part of
the main StockStream pipeline, and thus have not been thoroughly tested. Feel free to exercise the code and
report any issues.
Notes
This function also generates the files required for analysis by the pyfolio package:
• Returns File
• Positions File
• Transactions File
alphapy.portfolio.kick_out(p, tdate)
Trim the portfolio based on filter criteria.
To reduce a portfolio’s positions, AlphaPy can rank the positions on some criterion, such as open profit or net
return. On a periodic basis, the worst performers can be culled from the portfolio.
Parameters
• p (alphapy.Portfolio) – The portfolio for reducing positions.
• tdate (datetime) – The date to trim the portfolio positions.
Returns p – The reduced portfolio.
Return type alphapy.Portfolio
Notes
Warning: The portfolio management functions kick_out, balance, and stop_loss are not part of
the main StockStream pipeline, and thus have not been thoroughly tested. Feel free to exercise the code and
report any issues.
alphapy.portfolio.portfolio_name(group_name, tag)
Return the name of the portfolio.
Parameters
Notes
Warning: The portfolio management functions stop_loss, balance, and kick_out are not part of
the main StockStream pipeline, and thus have not been thoroughly tested. Feel free to exercise the code and
report any issues.
Notes
The cost basis is calculated as the total value of all trades (14,000) divided by the total number of shares traded
(800), so 14,000 / 800 = 17.5, and the net position is -200.
alphapy.portfolio.withdraw_portfolio(p, cash, tdate)
Withdraw cash from a given portfolio.
Parameters
• p (alphapy.Portfolio) – Portfolio to accept the withdrawal.
• cash (float) – Cash amount to withdraw.
• tdate (datetime) – The date of withdrawal.
Returns p – Portfolio with the withdrawn cash.
Return type alphapy.Portfolio
alphapy.sport_flow.main(args=None)
The main program for SportFlow.
systems = {}
alphapy.system.run_system(model, system, group, intraday=False, quantity=1)
Run a system for a given group, creating a trades frame.
Parameters
• model (alphapy.Model) – The model object with specifications.
alphapy.transforms.abovema(f, c, p=50)
Determine those values of the dataframe that are above the moving average.
Parameters
• f (pandas.DataFrame) – Dataframe containing the column c.
• c (str) – Name of the column in the dataframe f.
• p (int) – The period of the moving average.
Returns new_column – The array containing the new feature.
Return type pandas.Series (bool)
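A sketch of this transform, assuming it is a comparison against a rolling mean (consistent with the ma_close
discussion in the MarketFlow chapter):

import pandas as pd

def abovema(f, c, p=50):
    # True where column c exceeds its p-period rolling mean.
    return f[c] > f[c].rolling(p).mean()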
alphapy.transforms.adx(f, p=14)
Calculate the Average Directional Index (ADX).
Parameters
• f (pandas.DataFrame) – Dataframe with all columns required for calculation. If you are
applying ADX through vapply, then these columns are calculated automatically.
• p (int) – The period over which to calculate the ADX.
Returns new_column – The array containing the new feature.
Return type pandas.Series (float)
References
The Average Directional Movement Index (ADX) was invented by J. Welles Wilder in 1978 [WIKI_ADX]. Its
value reflects the strength of trend in any given instrument.
alphapy.transforms.belowma(f, c, p=50)
Determine those values of the dataframe that are below the moving average.
Parameters
• f (pandas.DataFrame) – Dataframe containing the column c.
• c (str) – Name of the column in the dataframe f.
• p (int) – The period of the moving average.
Returns new_column – The array containing the new feature.
Return type pandas.Series (bool)
alphapy.transforms.c2max(f, c1, c2)
Take the maximum value between two columns in a dataframe.
Parameters
• f (pandas.DataFrame) – Dataframe containing the two columns c1 and c2.
• c1 (str) – Name of the first column in the dataframe f.
• c2 (str) – Name of the second column in the dataframe f.
Returns max_val – The maximum value of the two columns.
Return type float
alphapy.transforms.c2min(f, c1, c2)
Take the minimum value between two columns in a dataframe.
Parameters
• f (pandas.DataFrame) – Dataframe containing the two columns c1 and c2.
• c1 (str) – Name of the first column in the dataframe f.
• c2 (str) – Name of the second column in the dataframe f.
Returns min_val – The minimum value of the two columns.
Return type float
alphapy.transforms.diff(f, c, n=1)
Calculate the n-th order difference for the given variable.
Parameters
• f (pandas.DataFrame) – Dataframe containing the column c.
• c (str) – Name of the column in the dataframe f.
• n (int) – The number of times that the values are differenced.
Returns new_column – The array containing the new feature.
Return type pandas.Series (float)
alphapy.transforms.diminus(f, p=14)
Calculate the Minus Directional Indicator (-DI).
Parameters
• f (pandas.DataFrame) – Dataframe with columns high and low.
• p (int) – The period over which to calculate the -DI.
Returns new_column – The array containing the new feature.
Return type pandas.Series (float)
References
A component of the average directional index (ADX) that is used to measure the presence of a downtrend. When
the -DI is sloping downward, it is a signal that the downtrend is getting stronger [IP_NDI].
alphapy.transforms.diplus(f, p=14)
Calculate the Plus Directional Indicator (+DI).
Parameters
• f (pandas.DataFrame) – Dataframe with columns high and low.
• p (int) – The period over which to calculate the +DI.
Returns new_column – The array containing the new feature.
Return type pandas.Series (float)
References
A component of the average directional index (ADX) that is used to measure the presence of an uptrend. When
the +DI is sloping upward, it is a signal that the uptrend is getting stronger [IP_PDI].
alphapy.transforms.dminus(f )
Calculate the Minus Directional Movement (-DM).
Parameters f (pandas.DataFrame) – Dataframe with columns high and low.
Returns new_column – The array containing the new feature.
Return type pandas.Series (float)
References
Directional movement is negative (minus) when the prior low minus the current low is greater than the current
high minus the prior high. This so-called Minus Directional Movement (-DM) equals the prior low minus the
current low, provided it is positive. A negative value would simply be entered as zero [SC_ADX].
alphapy.transforms.dmplus(f )
Calculate the Plus Directional Movement (+DM).
Parameters f (pandas.DataFrame) – Dataframe with columns high and low.
Returns new_column – The array containing the new feature.
Return type pandas.Series (float)
References
Directional movement is positive (plus) when the current high minus the prior high is greater than the prior low
minus the current low. This so-called Plus Directional Movement (+DM) then equals the current high minus the
prior high, provided it is positive. A negative value would simply be entered as zero [SC_ADX].
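A minimal sketch of the two directional movement rules described above, written directly from the [SC_ADX] definitions (not necessarily the library's exact code):
import numpy as np
import pandas as pd

def dmplus_sketch(f: pd.DataFrame) -> pd.Series:
    up = f['high'] - f['high'].shift(1)   # current high minus prior high
    down = f['low'].shift(1) - f['low']   # prior low minus current low
    # +DM is the up-move when it exceeds the down-move and is positive; otherwise zero.
    return pd.Series(np.where((up > down) & (up > 0), up, 0.0), index=f.index)

def dminus_sketch(f: pd.DataFrame) -> pd.Series:
    up = f['high'] - f['high'].shift(1)
    down = f['low'].shift(1) - f['low']
    # -DM is the down-move when it exceeds the up-move and is positive; otherwise zero.
    return pd.Series(np.where((down > up) & (down > 0), down, 0.0), index=f.index)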
alphapy.transforms.down(f, c)
Find the negative values in the series.
Parameters
• f (pandas.DataFrame) – Dataframe containing the column c.
• c (str) – Name of the column in the dataframe f.
Returns new_column – The array containing the new feature.
Return type pandas.Series (bool)
alphapy.transforms.dpc(f, c)
Get the negative values, with positive values zeroed.
Parameters
• f (pandas.DataFrame) – Dataframe with column c.
• c (str) – Name of the column.
Returns new_column – The array containing the new feature.
Return type pandas.Series (float)
alphapy.transforms.ema(f, c, p=20)
Calculate the exponential moving average (EMA).
Parameters
• f (pandas.DataFrame) – Dataframe containing the column c.
• c (str) – Name of the column in the dataframe f.
• p (int) – The period over which to calculate the rolling mean.
Returns new_column – The array containing the new feature.
Return type pandas.Series (float)
References
An exponential moving average (EMA) is a type of moving average that is similar to a simple moving average,
except that more weight is given to the latest data [IP_EMA].
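A one-line sketch of the calculation, assuming the common span parameterization (smoothing factor 2 / (p + 1)):
import pandas as pd

def ema_sketch(f: pd.DataFrame, c: str, p: int = 20) -> pd.Series:
    # Exponentially weighted mean of column c over period p.
    return f[c].ewm(span=p).mean()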
alphapy.transforms.extract_bizday(f, c)
Extract business day of month and week.
Parameters
• f (pandas.DataFrame) – Dataframe containing the date column c.
• c (str) – Name of the date column in the dataframe f.
Returns date_features – The dataframe containing the date features.
Return type pandas.DataFrame
alphapy.transforms.extract_date(f, c)
Extract date into its components: year, month, day, dayofweek.
Parameters
• f (pandas.DataFrame) – Dataframe containing the date column c.
• c (str) – Name of the date column in the dataframe f.
Returns date_features – The dataframe containing the date features.
Return type pandas.DataFrame
alphapy.transforms.extract_time(f, c)
Extract time into its components: hour, minute, second.
Parameters
• f (pandas.DataFrame) – Dataframe containing the time column c.
• c (str) – Name of the time column in the dataframe f.
Returns time_features – The dataframe containing the time features.
Return type pandas.DataFrame
alphapy.transforms.gap(f )
Calculate the gap percentage between the current open and the previous close.
Parameters f (pandas.DataFrame) – Dataframe with columns open and close.
Returns new_column – The array containing the new feature.
Return type pandas.Series (float)
References
A gap is a break between prices on a chart that occurs when the price of a stock makes a sharp move up or down
with no trading occurring in between [IP_GAP].
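A sketch of the calculation, assuming the gap is expressed as a percentage of the prior close:
import pandas as pd

def gap_sketch(f: pd.DataFrame) -> pd.Series:
    prev_close = f['close'].shift(1)
    # Percentage move from yesterday's close to today's open.
    return 100.0 * (f['open'] - prev_close) / prev_close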
alphapy.transforms.gapbadown(f )
Determine whether or not there has been a breakaway gap down.
Parameters f (pandas.DataFrame) – Dataframe with columns open and low.
Returns new_column – The array containing the new feature.
Return type pandas.Series (bool)
References
A breakaway gap represents a gap in the movement of a stock price supported by levels of high volume
[IP_BAGAP].
alphapy.transforms.gapbaup(f )
Determine whether or not there has been a breakaway gap up.
Parameters f (pandas.DataFrame) – Dataframe with columns open and high.
Returns new_column – The array containing the new feature.
Return type pandas.Series (bool)
References
A breakaway gap represents a gap in the movement of a stock price supported by levels of high volume
[IP_BAGAP].
alphapy.transforms.gapdown(f )
Determine whether or not there has been a gap down.
Parameters f (pandas.DataFrame) – Dataframe with columns open and close.
Returns new_column – The array containing the new feature.
Return type pandas.Series (bool)
References
A gap is a break between prices on a chart that occurs when the price of a stock makes a sharp move up or down
with no trading occurring in between [IP_GAP].
alphapy.transforms.gapup(f )
Determine whether or not there has been a gap up.
Parameters f (pandas.DataFrame) – Dataframe with columns open and close.
Returns new_column – The array containing the new feature.
Return type pandas.Series (bool)
References
A gap is a break between prices on a chart that occurs when the price of a stock makes a sharp move up or down
with no trading occurring in between [IP_GAP].
alphapy.transforms.gtval(f, c1, c2)
Determine whether or not the first column of a dataframe is greater than the second.
Parameters
• f (pandas.DataFrame) – Dataframe containing the two columns c1 and c2.
• c1 (str) – Name of the first column in the dataframe f.
• c2 (str) – Name of the second column in the dataframe f.
Returns new_column – The array containing the new feature.
Return type pandas.Series (bool)
alphapy.transforms.gtval0(f, c1, c2)
If the value in the first column is positive and greater than the value in the second column, return the value in
the first column; otherwise, return zero.
Parameters
• f (pandas.DataFrame) – Dataframe containing the two columns c1 and c2.
• c1 (str) – Name of the first column in the dataframe f.
• c2 (str) – Name of the second column in the dataframe f.
Returns new_val – A positive value or zero.
Return type float
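A sketch of the rule (not necessarily the library's exact implementation):
import numpy as np
import pandas as pd

def gtval0_sketch(f: pd.DataFrame, c1: str, c2: str) -> pd.Series:
    # Keep c1 where it is positive and greater than c2; zero elsewhere.
    return pd.Series(np.where((f[c1] > f[c2]) & (f[c1] > 0), f[c1], 0.0), index=f.index)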
alphapy.transforms.higher(f, c, o=1)
Determine whether or not a series value is higher than the value o periods back.
Parameters
• f (pandas.DataFrame) – Dataframe containing the column c.
• c (str) – Name of the column in the dataframe f.
• o (int, optional) – Offset value for shifting the series.
Returns new_column – The array containing the new feature.
Return type pandas.Series (bool)
alphapy.transforms.highest(f, c, p=20)
Calculate the highest value on a rolling basis.
Parameters
• f (pandas.DataFrame) – Dataframe containing the column c.
• c (str) – Name of the column in the dataframe f.
• p (int) – The period over which to calculate the rolling maximum.
Returns new_column – The array containing the new feature.
Return type pandas.Series (float)
alphapy.transforms.hlrange(f, p=1)
Calculate the Range, the difference between High and Low.
Parameters
• f (pandas.DataFrame) – Dataframe with columns high and low.
• p (int) – The period over which the range is calculated.
Returns new_column – The array containing the new feature.
Return type pandas.Series (float)
alphapy.transforms.lower(f, c, o=1)
Determine whether or not a series value is lower than the value o periods back.
Parameters
• f (pandas.DataFrame) – Dataframe containing the column c.
• c (str) – Name of the column in the dataframe f.
• o (int, optional) – Offset value for shifting the series.
Returns new_column – The array containing the new feature.
Return type pandas.Series (bool)
alphapy.transforms.lowest(f, c, p=20)
Calculate the lowest value on a rolling basis.
Parameters
• f (pandas.DataFrame) – Dataframe containing the column c.
• c (str) – Name of the column in the dataframe f.
• p (int) – The period over which to calculate the rolling minimum.
Returns new_column – The array containing the new feature.
Return type pandas.Series (float)
alphapy.transforms.ma(f, c, p=20)
Calculate the mean on a rolling basis.
Parameters
• f (pandas.DataFrame) – Dataframe containing the column c.
• c (str) – Name of the column in the dataframe f.
• p (int) – The period over which to calculate the rolling mean.
Returns new_column – The array containing the new feature.
Return type pandas.Series (float)
References
In statistics, a moving average (rolling average or running average) is a calculation to analyze data points by
creating series of averages of different subsets of the full data set [WIKI_MA].
alphapy.transforms.maratio(f, c, p1=1, p2=10)
Calculate the ratio of two moving averages.
Parameters
• f (pandas.DataFrame) – Dataframe containing the column c.
• c (str) – Name of the column in the dataframe f.
• p1 (int) – The period of the first moving average.
• p2 (int) – The period of the second moving average.
Returns new_column – The array containing the new feature.
Return type pandas.Series (float)
alphapy.transforms.mval(f, c)
Get the negative value, otherwise zero.
Parameters
• f (pandas.DataFrame) – Dataframe containing the column c.
• c (str) – Name of the column in the dataframe f.
Returns new_val – Negative value or zero.
Return type float
alphapy.transforms.net(f, c='close', o=1)
Calculate the net change of a given column.
Parameters
• f (pandas.DataFrame) – Dataframe containing the column c.
• c (str) – Name of the column in the dataframe f.
• o (int, optional) – Offset value for shifting the series.
Returns new_column – The array containing the new feature.
Return type pandas.Series (float)
References
Net change is the difference between the closing price of a security on the day’s trading and the previous day’s
closing price. Net change can be positive or negative and is quoted in terms of dollars [IP_NET].
alphapy.transforms.netreturn(f, c, o=1)
Calculate the net return, or Return on Investment (ROI).
Parameters
• f (pandas.DataFrame) – Dataframe containing the column c.
• c (str) – Name of the column in the dataframe f.
• o (int, optional) – Offset value for shifting the series.
Returns new_column – The array containing the new feature.
Return type pandas.Series (float)
References
ROI measures the amount of return on an investment relative to the original cost. To calculate ROI, the benefit
(or return) of an investment is divided by the cost of the investment, and the result is expressed as a percentage
or a ratio [IP_ROI].
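A sketch of the calculation, assuming the return is reported in percent over an offset of o periods:
import pandas as pd

def netreturn_sketch(f: pd.DataFrame, c: str, o: int = 1) -> pd.Series:
    # Percentage change of column c relative to its value o periods back.
    return 100.0 * f[c].pct_change(periods=o)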
alphapy.transforms.pchange1(f, c, o=1)
Calculate the percentage change within the same variable.
Parameters
• f (pandas.DataFrame) – Dataframe containing the column c.
• c (str) – Name of the column in the dataframe f.
• o (int) – Offset to the previous value.
Returns new_column – The array containing the new feature.
Return type pandas.Series (float)
alphapy.transforms.pchange2(f, c1, c2)
Calculate the percentage change between two variables.
Parameters
• f (pandas.DataFrame) – Dataframe containing the two columns c1 and c2.
• c1 (str) – Name of the first column in the dataframe f.
• c2 (str) – Name of the second column in the dataframe f.
Returns new_column – The array containing the new feature.
Return type pandas.Series (float)
alphapy.transforms.pval(f, c)
Get the positive value, otherwise zero.
Parameters
• f (pandas.DataFrame) – Dataframe containing the column c.
• c (str) – Name of the column in the dataframe f.
Returns new_val – Positive value or zero.
Return type float
alphapy.transforms.rsi(f, c, p=14)
Calculate the Relative Strength Index (RSI).
Parameters
• f (pandas.DataFrame) – Dataframe containing the column c.
• c (str) – Name of the column in the dataframe f.
• p (int) – The period over which to calculate the RSI.
Returns new_column – The array containing the new feature.
Return type pandas.Series (float)
References
Developed by J. Welles Wilder, the Relative Strength Index (RSI) is a momentum oscillator that measures the
speed and change of price movements [SC_RSI].
alphapy.transforms.rtotal(vec)
Calculate the running total.
Parameters vec (pandas.Series) – The input array for calculating the running total.
Returns running_total – The final running total.
Return type int
Example
>>> vec.rolling(window=20).apply(rtotal)
alphapy.transforms.runs(vec)
Calculate the total number of runs.
Parameters vec (pandas.Series) – The input array for calculating the number of runs.
Returns runs_value – The total number of runs.
Return type int
Example
>>> vec.rolling(window=20).apply(runs)
References
For more information about runs tests for detecting non-randomness, refer to [RUNS].
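One way to count runs, shown here as an illustrative sketch rather than the library's exact code: a new run starts at every change point, so the count is one more than the number of changes.
import pandas as pd

def runs_sketch(vec: pd.Series) -> int:
    changes = (vec != vec.shift(1)).iloc[1:].sum()
    return int(changes) + 1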
alphapy.transforms.split_to_letters(f, c)
Separate text into distinct characters.
Parameters
• f (pandas.DataFrame) – Dataframe containing the column c.
• c (str) – Name of the text column in the dataframe f.
Returns new_feature – The array containing the new feature.
Return type pandas.Series
alphapy.transforms.streak(vec)
Determine the length of the latest streak.
Parameters vec (pandas.Series) – The input array for calculating the latest streak.
Returns streak_value – The length of the latest streak.
Return type int
Example
>>> vec.rolling(window=20).apply(streak)
alphapy.transforms.texplode(f, c)
Get dummy values for a text column.
Parameters
• f (pandas.DataFrame) – Dataframe containing the column c.
• c (str) – Name of the text column in the dataframe f.
Returns dummies – The dataframe containing the dummy variables.
Return type pandas.DataFrame
Example
This function is useful for columns that appear to hold separate character codes but are consolidated into a single
column. For instance, a text column whose values are drawn from five distinct characters would be transformed
into five dummy variables, one per character.
alphapy.transforms.truehigh(f )
Calculate the True High value.
Parameters f (pandas.DataFrame) – Dataframe with columns high and low.
Returns new_column – The array containing the new feature.
Return type pandas.Series (float)
References
The True High is today's high or the previous close, whichever is higher [TS_TR].
alphapy.transforms.xmadown(f, c='close', pfast=20, pslow=50)
Determine those values of the dataframe where the fast moving average crosses below the slow moving average.
Parameters
• f (pandas.DataFrame) – Dataframe containing the column c.
• c (str, optional) – Name of the column in the dataframe f.
• pfast (int, optional) – The period of the fast moving average.
• pslow (int, optional) – The period of the slow moving average.
Returns new_column – The array containing the new feature.
Return type pandas.Series (bool)
References
In the statistics of time series, and in particular the analysis of financial time series for stock trading purposes,
a moving-average crossover occurs when, on plotting two moving averages each based on different degrees of
smoothing, the traces of these moving averages cross [WIKI_XMA].
alphapy.transforms.xmaup(f, c='close', pfast=20, pslow=50)
Determine those values of the dataframe where the fast moving average crosses above the slow moving average.
Parameters
• f (pandas.DataFrame) – Dataframe containing the column c.
• c (str, optional) – Name of the column in the dataframe f.
• pfast (int, optional) – The period of the fast moving average.
• pslow (int, optional) – The period of the slow moving average.
Returns new_column – The array containing the new feature.
Return type pandas.Series (bool)
References
In the statistics of time series, and in particular the analysis of financial time series for stock trading purposes,
a moving-average crossover occurs when, on plotting two moving averages each based on different degrees of
smoothing, the traces of these moving averages cross [WIKI_XMA].
alphapy.transforms.zscore(vec)
Calculate the Z-Score.
Parameters vec (pandas.Series) – The input array for calculating the Z-Score.
Returns zscore – The value of the Z-Score.
Return type float
References
To calculate the Z-Score, you can find more information here [ZSCORE].
Example
>>> vec.rolling(window=20).apply(zscore)
alphapy.utilities.get_datestamp()
Returns today’s datestamp.
Returns datestamp – The valid date string in YYYY-mm-dd format.
Return type str
alphapy.utilities.most_recent_file(directory, file_spec)
Find the most recent file in a directory.
Parameters
• directory (str) – Full directory specification.
• file_spec (str) – Wildcard search string for the file to locate.
Returns file_name – Name of the file to read, excluding the extension.
Return type str
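A sketch of the documented behavior using the standard library (the real function may differ in details):
import glob
import os

def most_recent_file_sketch(directory: str, file_spec: str) -> str:
    # Match the wildcard pattern, pick the newest file by modification time,
    # and return its name without directory or extension.
    newest = max(glob.glob(os.path.join(directory, file_spec)), key=os.path.getmtime)
    return os.path.splitext(os.path.basename(newest))[0]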
alphapy.utilities.np_store_data(data, dir_name, file_name, extension, separator)
Store NumPy data in a file.
Parameters
• data (numpy array) – The model component to store.
• dir_name (str) – Full directory specification.
• file_name (str) – Name of the file to read, excluding the extension.
• extension (str) – File name extension, e.g., csv.
• separator (str) – The delimiter between fields in the file.
Returns None
Return type None
alphapy.utilities.remove_list_items(elements, alist)
Remove one or more items from the given list.
Parameters
• elements (list) – The items to remove from the list alist.
• alist (list) – The list to filter; its items may be of any type.
Returns sublist – The subset of items after removal.
Return type list
Examples
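A minimal illustration of the documented behavior (the exact output assumes simple membership filtering):
>>> remove_list_items([1], [1, 2, 1, 3])
[2, 3]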
alphapy.utilities.subtract_days(date_string, ndays)
Subtract a number of days from a given date.
Parameters
• date_string (str) – An alphanumeric string in the format %Y-%m-%d.
• ndays (int) – Number of days to subtract.
Returns new_date_string – The adjusted date string in the format %Y-%m-%d.
Return type str
Examples
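A minimal illustration of the documented behavior (note the leap-year handling):
>>> subtract_days('2020-03-10', 10)
'2020-02-29'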
alphapy.utilities.valid_date(date_string)
Determine whether or not the given string is a valid date.
Parameters date_string (str) – An alphanumeric string in the format %Y-%m-%d.
Returns date_string – The valid date string.
Return type str
Raises ValueError – Not a valid date.
Examples
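A minimal illustration of the documented behavior (the exact error text is an assumption):
>>> valid_date('2020-02-29')
'2020-02-29'
>>> valid_date('2020-02-30')
ValueError: Not a valid date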
alphapy.utilities.valid_name(name)
Determine whether or not the given string is a valid alphanumeric string.
Parameters name (str) – An alphanumeric identifier.
Returns result – True if the name is valid, else False.
Return type bool
Examples
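A minimal illustration, assuming standard identifier rules:
>>> valid_name('alpha')
True
>>> valid_name('!alpha')
False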
class alphapy.variables.Variable(name, expr, replace=False)
Create a new variable as a key-value pair. All variables are stored in Variable.variables. Duplicate keys are not allowed, unless the replace parameter is True.
Parameters
• name (str) – Variable key.
• expr (str) – Variable value, a valid expression in the Variable Definition Language.
• replace (bool, optional) – Replace the current key-value pair if it already exists.
variables = {}
Class variable for storing all known variables
alphapy.variables.allvars(expr)
Get the list of valid names in the expression.
Parameters expr (str) – A valid expression conforming to the Variable Definition Language.
Returns vlist – List of valid variable names.
Return type list
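A minimal illustration; the expression and the exact output are assumptions for the sake of the example:
>>> allvars('rsi_14 > 70 & net < 0')
['rsi_14', 'net']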
alphapy.variables.vapply(group, vname, vfuncs=None)
Apply a variable to multiple dataframes.
Parameters
• group (alphapy.Group) – The input group.
• vname (str) – The variable to apply to the group.
• vfuncs (dict, optional) – Dictionary of external modules and functions.
Returns None
Return type None
Other Parameters Frame.frames (dict) – Global dictionary of dataframes
See also:
vunapply()
alphapy.variables.vexec(f, v, vfuncs=None)
Add a variable to the given dataframe.
This is the core function for adding a variable to a dataframe. The default variable functions are already defined
locally in alphapy.transforms; however, you may want to define your own variable functions. If so, then
the vfuncs parameter will contain the list of modules and functions to be imported and applied by the vexec
function.
To write your own variable function, your function must have a pandas DataFrame as an input parameter and
must return a pandas DataFrame with the new variable(s).
Parameters
• f (pandas.DataFrame) – Dataframe to contain the new variable.
• v (str) – Variable to add to the dataframe.
• vfuncs (dict, optional) – Dictionary of external modules and functions.
Returns f – Dataframe with the new variable.
Return type pandas.DataFrame
Other Parameters Variable.variables (dict) – Global dictionary of variables
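As an illustration of the contract described above, a user-defined variable function might look like the following sketch (the function name and column are hypothetical):
import pandas as pd

def dfma(f: pd.DataFrame, c: str = 'close', p: int = 10) -> pd.DataFrame:
    # Hypothetical transform: distance of column c from its p-period moving average.
    f['dfma_' + str(p)] = f[c] - f[c].rolling(window=p).mean()
    return f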
alphapy.variables.vmapply(group, vs, vfuncs=None)
Apply multiple variables to multiple dataframes.
Parameters
• group (alphapy.Group) – The input group.
• vs (list) – The list of variables to apply to the group.
• vfuncs (dict, optional) – Dictionary of external modules and functions.
Returns None
Return type None
See also:
vmunapply()
alphapy.variables.vmunapply(group, vs)
Remove a list of variables from multiple dataframes.
Parameters
• group (alphapy.Group) – The input group.
• vs (list) – The list of variables to remove from the group.
Returns None
Return type None
See also:
vmapply()
alphapy.variables.vparse(vname)
Parse a variable name into its respective components.
Parameters vname (str) – The name of the variable.
Returns
• vxlag (str) – Variable name without the lag component.
• root (str) – The base variable name without the parameters.
• plist (list) – The parameter list.
• lag (int) – The offset starting with the current value [0] and counting back, e.g., an offset [1]
means the previous value of the variable.
Notes
AlphaPy makes feature creation easy. The syntax of a variable name maps to a function call:
xma_20_50 => xma(20, 50)
Examples
>>> vparse('xma_20_50[1]')
# ('xma_20_50', 'xma', ['20', '50'], 1)
alphapy.variables.vsub(v, expr)
Substitute the variable parameters into the expression.
This function performs the parameter substitution when applying features to a dataframe. It is a mechanism
for the user to override the default values in any given expression when defining a feature, instead of having to
programmatically call a function with new values.
Parameters
• v (str) – Variable name.
• expr (str) – The expression for substitution.
Returns newexpr – The expression with the new, substituted values.
Return type str
alphapy.variables.vtree(vname)
Get all of the antecedent variables.
Before applying a variable to a dataframe, we have to recursively get all of the child variables, beginning with
the starting variable’s expression. Then, we have to extract the variables from all the subsequent expressions.
This process continues until all antecedent variables are obtained.
Parameters vname (str) – A valid variable stored in Variable.variables.
Returns all_variables – The variables that need to be applied before vname.
Return type list
Other Parameters Variable.variables (dict) – Global dictionary of variables
alphapy.variables.vunapply(group, vname)
Remove a variable from multiple dataframes.
Parameters
• group (alphapy.Group) – The input group.
• vname (str) – The variable to remove from the group.
Returns None
Return type None
Other Parameters Frame.frames (dict) – Global dictionary of dataframes
See also:
vapply()
FIFTEEN
INDICES AND TABLES
• genindex
• modindex
• search
BIBLIOGRAPHY
[CLUS] http://scikit-learn.org/stable/modules/clustering.html
[ISO] http://scikit-learn.org/stable/modules/manifold.html#isomap
[PCA] http://scikit-learn.org/stable/modules/decomposition.html#pca
[TSNE] http://scikit-learn.org/stable/modules/manifold.html#t-distributed-stochastic-neighbor-embedding-t-sne
[POLY] http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PolynomialFeatures.html
[IMP] http://scikit-learn.org/stable/modules/preprocessing.html#imputation
[LV] http://scikit-learn.org/stable/modules/feature_selection.html#variance-threshold
[UNI] http://scikit-learn.org/stable/modules/feature_selection.html#univariate-feature-selection
[ENC] https://github.com/scikit-learn-contrib/categorical-encoding
[IMB] https://github.com/scikit-learn-contrib/imbalanced-learn
[SCALE] http://scikit-learn.org/stable/modules/preprocessing.html
[EVAL] http://scikit-learn.org/stable/modules/model_evaluation.html
[GRID] http://scikit-learn.org/stable/modules/grid_search.html#grid-search
[PIPE] http://scikit-learn.org/stable/modules/pipeline.html#pipeline
[RFECV] http://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.RFECV.html
[WIKI_ADX] https://en.wikipedia.org/wiki/Average_directional_movement_index
[IP_NDI] http://www.investopedia.com/terms/n/negativedirectionalindicator.asp
[IP_PDI] http://www.investopedia.com/terms/p/positivedirectionalindicator.asp
[SC_ADX] http://stockcharts.com/school/doku.php?id=chart_school:technical_indicators:average_directional_index_adx
[IP_EMA] http://www.investopedia.com/terms/e/ema.asp
[IP_GAP] http://www.investopedia.com/terms/g/gap.asp
[IP_BAGAP] http://www.investopedia.com/terms/b/breakawaygap.asp
[WIKI_MA] https://en.wikipedia.org/wiki/Moving_average
[IP_NET] http://www.investopedia.com/terms/n/netchange.asp
[IP_ROI] http://www.investopedia.com/terms/r/returnoninvestment.asp
[SC_RSI] http://stockcharts.com/school/doku.php?id=chart_school:technical_indicators:relative_strength_index_rsi
[RUNS] http://www.itl.nist.gov/div898/handbook/eda/section3/eda35d.htm
[TS_TR] http://help.tradestation.com/09_01/tradestationhelp/charting_definitions/true_range.htm
[WIKI_XMA] https://en.wikipedia.org/wiki/Moving_average_crossover
[ZSCORE] https://en.wikipedia.org/wiki/Standard_score
PYTHON MODULE INDEX
a
alphapy.__main__, 95
alphapy.alias, 96
alphapy.analysis, 97
alphapy.calendrical, 98
alphapy.data, 106
alphapy.estimators, 109
alphapy.features, 110
alphapy.frame, 116
alphapy.globals, 118
alphapy.group, 121
alphapy.market_flow, 122
alphapy.model, 123
alphapy.optimize, 126
alphapy.plots, 127
alphapy.portfolio, 134
alphapy.space, 141
alphapy.sport_flow, 141
alphapy.system, 144
alphapy.transforms, 145
alphapy.utilities, 159
alphapy.variables, 161
INDEX
A
abovema() (in module alphapy.transforms), 145
add() (alphapy.group.Group method), 121
add_features() (in module alphapy.sport_flow), 141
add_position() (in module alphapy.portfolio), 136
adx() (in module alphapy.transforms), 145
Alias (class in alphapy.alias), 96
aliases (alphapy.alias.Alias attribute), 96
allocate_trade() (in module alphapy.portfolio), 136
allvars() (in module alphapy.variables), 161
alphapy.__main__ (module), 95
alphapy.alias (module), 96
alphapy.analysis (module), 97
alphapy.calendrical (module), 98
alphapy.data (module), 106
alphapy.estimators (module), 109
alphapy.features (module), 110
alphapy.frame (module), 116
alphapy.globals (module), 118
alphapy.group (module), 121
alphapy.market_flow (module), 122
alphapy.model (module), 123
alphapy.optimize (module), 126
alphapy.plots (module), 127
alphapy.portfolio (module), 134
alphapy.space (module), 141
alphapy.sport_flow (module), 141
alphapy.system (module), 144
alphapy.transforms (module), 145
alphapy.utilities (module), 159
alphapy.variables (module), 161
analyses (alphapy.analysis.Analysis attribute), 97
Analysis (class in alphapy.analysis), 97
analysis_name() (in module alphapy.analysis), 97
apply_transform() (in module alphapy.features), 110
apply_transforms() (in module alphapy.features), 110
B
backdiff (alphapy.globals.Encoders attribute), 118
balance() (in module alphapy.portfolio), 136
basen (alphapy.globals.Encoders attribute), 118
belowma() (in module alphapy.transforms), 146
binary (alphapy.globals.Encoders attribute), 118
biz_day_month() (in module alphapy.calendrical), 98
biz_day_week() (in module alphapy.calendrical), 98
C
c2max() (in module alphapy.transforms), 146
c2min() (in module alphapy.transforms), 146
catboost (alphapy.globals.Encoders attribute), 118
christmas_day() (in module alphapy.calendrical), 98
cinco_de_mayo() (in module alphapy.calendrical), 98
classification (alphapy.globals.ModelType attribute), 119
close_position() (in module alphapy.portfolio), 137
pval() (in module alphapy.transforms), 153
R
rdate_to_gdate() (in module alphapy.calendrical), 104
rdate_to_gyear() (in module alphapy.calendrical), 104
read_frame() (in module alphapy.frame), 117
regression (alphapy.globals.ModelType attribute), 119
remove() (alphapy.group.Group method), 121
remove_list_items() (in module alphapy.utilities), 159
remove_lv_features() (in module alphapy.features), 115
remove_position() (in module alphapy.portfolio), 139
rfecv_search() (in module alphapy.optimize), 127
rindex() (in module alphapy.transforms), 154
rsi() (in module alphapy.transforms), 154
rtotal() (in module alphapy.transforms), 154
run_analysis() (in module alphapy.analysis), 97
run_system() (in module alphapy.system), 144
runs() (in module alphapy.transforms), 155
runs_test() (in module alphapy.transforms), 155
S
saint_patricks_day() (in module alphapy.calendrical), 104
sample_data() (in module alphapy.data), 108
SamplingMethod (class in alphapy.globals), 120
save_feature_map() (in module alphapy.model), 125
save_features() (in module alphapy.features), 115
save_model() (in module alphapy.model), 125
save_predictions() (in module alphapy.model), 126
save_predictor() (in module alphapy.model), 126
Scalers (class in alphapy.globals), 120
se (alphapy.globals.Orders attribute), 119
select_features() (in module alphapy.features), 116
sequence_frame() (in module alphapy.frame), 117
set_events() (in module alphapy.calendrical), 104
set_holidays() (in module alphapy.calendrical), 105
sh (alphapy.globals.Orders attribute), 119
shuffle_data() (in module alphapy.data), 109
Space (class in alphapy.space), 141
space_name() (in module alphapy.space), 141
split_to_letters() (in module alphapy.transforms), 155
standard (alphapy.globals.Scalers attribute), 120
states (alphapy.portfolio.Trade attribute), 136
stop_loss() (in module alphapy.portfolio), 139
streak() (in module alphapy.transforms), 156
subtract_dates() (in module alphapy.calendrical), 105
subtract_days() (in module alphapy.utilities), 160
sum (alphapy.globals.Encoders attribute), 118
sx (alphapy.globals.Orders attribute), 119
System (class in alphapy.system), 144
systems (alphapy.system.System attribute), 144
T
target (alphapy.globals.Encoders attribute), 119
test (alphapy.globals.Partition attribute), 120
texplode() (in module alphapy.transforms), 156
thanksgiving_day() (in module alphapy.calendrical), 105
Trade (class in alphapy.portfolio), 135
trade_system() (in module alphapy.system), 145
train (alphapy.globals.Partition attribute), 120
training_pipeline() (in module alphapy.__main__), 96
truehigh() (in module alphapy.transforms), 156
truelow() (in module alphapy.transforms), 157
truerange() (in module alphapy.transforms), 157
U
under_cluster (alphapy.globals.SamplingMethod attribute), 120
under_ncr (alphapy.globals.SamplingMethod attribute), 120
under_nearmiss (alphapy.globals.SamplingMethod attribute), 120
under_random (alphapy.globals.SamplingMethod attribute), 120
under_tomek (alphapy.globals.SamplingMethod attribute), 120
up() (in module alphapy.transforms), 157
upc() (in module alphapy.transforms), 157
update_portfolio() (in module alphapy.portfolio), 139
update_position() (in module alphapy.portfolio), 139
V
valentines_day() (in module alphapy.calendrical), 105
valid_date() (in module alphapy.utilities), 160
valid_name() (in module alphapy.utilities), 160
valuate_portfolio() (in module alphapy.portfolio), 140
valuate_position() (in module alphapy.portfolio), 140
vapply() (in module alphapy.variables), 161
Variable (class in alphapy.variables), 161
W
withdraw_portfolio() (in module alphapy.portfolio), 140
woe (alphapy.globals.Encoders attribute), 119
write_frame() (in module alphapy.frame), 118
write_plot() (in module alphapy.plots), 133
X
xmadown() (in module alphapy.transforms), 157
xmaup() (in module alphapy.transforms), 158
Z
zscore() (in module alphapy.transforms), 158