Pandas Datareader
Release 0.1
1 Installation
1.1 Install latest release version via pip
1.2 Install latest development version
2 Usage
3 Documentation
3.1 What’s New
3.2 Remote Data Access
3.3 Caching queries
pandas-datareader Documentation, Release 0.1
Up-to-date remote data access for pandas that works across multiple versions of pandas.
CHAPTER 1
Installation
Install the latest release version via pip, or install the latest development version.
CHAPTER 2
Usage
Starting in 0.19.0, pandas no longer supports pandas.io.data or pandas.io.wb, so you must replace your
imports from pandas.io with those from pandas_datareader:
Many functions from the data module have been included in the top level API.
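As a rough sketch of the change described above: the old import is shown only as a comment, and the replacement import is wrapped in a helper so that nothing is imported or downloaded until it is called. The helper names load_datareader and fetch_10_year_treasury, and the FRED series GS10, are illustrative assumptions; get_data_fred is one of the data-module functions available from the top-level package.

```python
def load_datareader():
    """Return the pandas_datareader replacements for pandas.io.data and pandas.io.wb."""
    # Before pandas 0.19.0 (no longer works):
    #     from pandas.io import data, wb
    # Replacement:
    from pandas_datareader import data, wb
    return data, wb


def fetch_10_year_treasury():
    """Call a data-module function through the top-level API (needs network access)."""
    import pandas_datareader as pdr
    return pdr.get_data_fred("GS10")  # "GS10" is an illustrative series name


# data, wb = load_datareader()
# gs10 = fetch_10_year_treasury()
```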
CHAPTER 3
Documentation
Contents:
What’s New
This is a major release from 0.4.0. We recommend that all users upgrade.
Highlights include:
• Compatibility with the new Yahoo iCharts API. Yahoo removed the older API; this release restores the ability to
download from Yahoo. (GH315)
• Enhancements
• Backwards incompatible API changes
• Bug Fixes
Enhancements
Bug Fixes
This is a major release from 0.3.0. It includes compatibility with pandas 0.20.1 and some backwards incompatible API
changes.
Highlights include:
• Enhancements
• Backwards incompatible API changes
Enhancements
• Support has been dropped for Python 2.6 and 3.4 (GH313)
• Support has been dropped for pandas versions before 0.17.0 (GH313)
This is a major release from 0.2.1 and includes new features and a number of bug fixes.
Highlights include:
• New features
– Other enhancements
• Bug Fixes
New features
• DataReader now supports dividend-only pulls from Yahoo! Finance (GH138).
• DataReader now supports downloading mutual fund prices from the Thrift Savings Plan (GH157).
• DataReader now supports the Google options data source (GH148).
• DataReader now supports Google quotes (GH188).
• DataReader now supports Enigma datasets (GH245).
• DataReader now supports downloading a full list of NASDAQ-listed symbols (GH254).
Other enhancements
• Eurostat reader now supports larger data returned from API via zip format. (GH205)
• Added support for Python 3.6.
• Added support for pandas 0.19.2.
Bug Fixes
• Fixed bug that caused DataReader to fail if company name has a comma. (GH85).
• Fixed bug in YahooOptions caused by a change in the Yahoo website format. (GH244).
This is a minor release from 0.2.0 and includes new features and bug fixes.
Highlights include:
• New features
• Backwards incompatible API changes
New features
• Options columns PctChg and IV (Implied Volatility) are now type float rather than string. (GH122)
This is a major release from 0.1.1 and includes new features and a number of bug fixes.
Highlights include:
• New features
• Backwards incompatible API changes
• Bug Fixes
New features
• Fama French indexes are now pandas.PeriodIndex for annual and monthly data, and pandas.DatetimeIndex
otherwise (GH56).
Bug Fixes
Functions from pandas_datareader.data and pandas_datareader.wb extract data from various Internet
sources into a pandas DataFrame. Currently the following sources are supported:
• Yahoo! Finance
• Google Finance
• Enigma
• Quandl
• St. Louis Fed (FRED)
Yahoo! Finance
In [6]: f.loc['2010-01-04']
Out[6]:
Open 10.170000
High 10.280000
Low 10.050000
Close 10.280000
Adj Close 8.201456
Volume 60855800.000000
Name: 2010-01-04 00:00:00, dtype: float64
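The call that created f is not part of the transcript above. A minimal sketch, assuming the ticker "F" and the date range shown (both illustrative, as is the helper name fetch_yahoo_daily); the Yahoo! endpoint may no longer respond, so the download itself is left commented out. Passing "yahoo-dividends" as the data source would request the dividend-only pull instead of prices.

```python
import datetime


def fetch_yahoo_daily(symbol, start, end, data_source="yahoo"):
    """Download daily data for `symbol` from Yahoo! Finance (needs network access)."""
    # Imported lazily so that defining this helper never touches the network.
    import pandas_datareader.data as web
    return web.DataReader(symbol, data_source, start, end)


# Illustrative values, not taken from the original transcript.
start = datetime.datetime(2010, 1, 1)
end = datetime.datetime(2013, 1, 27)

# f = fetch_yahoo_daily("F", start, end)
# f.loc["2010-01-04"]
```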
Historical corporate actions (Dividends and Stock Splits) with ex-dates from Yahoo! Finance.
In [17]: f
Out[17]:
Dividends
Date
2012-01-27 0.05
2012-04-30 0.05
2012-08-01 0.05
2012-10-31 0.05
*Experimental*
The YahooQuotesReader class retrieves quote data from Yahoo! Finance.
In [18]: import pandas_datareader.data as web
In [20]: amzn
Out[20]:
PE change_pct last short_ratio time
AMZN 195.83 +0.09% 1039.87 1.14 4:00pm
*Experimental*
The Options class allows the download of options data from Yahoo! Finance.
The get_all_data method downloads and caches option data for all expiry months and provides a formatted
DataFrame with a hierarchical index, so it’s easy to get to the specific option you want.
In [21]: from pandas_datareader.data import Options
PctChg
Strike Expiry Type Symbol
2.5 2017-08-18 call AAPL170818C00002500 0.000000
put AAPL170818P00002500 0.000000
2018-01-19 call AAPL180119C00002500 -1.390978
5.0 2017-08-18 call AAPL170818C00005000 0.000000
2018-01-19 call AAPL180119C00005000 0.000000
#Show the volume traded of $100 strike puts at all expiry dates:
In [26]: data.loc[(100, slice(None), 'put'),'Vol'].head()
Out[26]:
Strike Expiry Type Symbol
100 2017-07-28 put AAPL170728P00100000 1
2017-08-18 put AAPL170818P00100000 620
2017-08-25 put AAPL170825P00100000 2
2017-09-01 put AAPL170901P00100000 1
2017-09-15 put AAPL170915P00100000 3
Name: Vol, dtype: float64
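The selection in In [26] relies on tuple-based slicing of a hierarchical index. The sketch below reproduces the idea on a small hand-made frame so it can be run without downloading anything. The index levels mirror part of the option data, but the Symbol level is omitted and all numbers are made up.

```python
import pandas as pd

# Hand-made stand-in for the options frame; values are illustrative only.
index = pd.MultiIndex.from_tuples(
    [
        (95, "2017-07-28", "call"),
        (100, "2017-07-28", "call"),
        (100, "2017-07-28", "put"),
        (100, "2017-08-18", "put"),
        (105, "2017-08-18", "put"),
    ],
    names=["Strike", "Expiry", "Type"],
)
# Sorting the index keeps tuple-based slicing well defined.
data = pd.DataFrame({"Vol": [10, 20, 1, 620, 5]}, index=index).sort_index()

# Strike 100, any expiry (slice(None) matches every value), puts only.
puts_100 = data.loc[(100, slice(None), "put"), "Vol"]
print(puts_100)
```

Each position in the tuple addresses one index level, so replacing 100 with slice(None) would return puts at every strike.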
If you don’t want to download all the data, more specific requests can be made.
PctChg
Strike Expiry Type Symbol
95 2017-07-28 call AAPL170728C00095000 0.000000
100 2017-07-28 call AAPL170728C00100000 0.000000
105 2017-07-28 call AAPL170728C00105000 0.000000
120 2017-07-28 call AAPL170728C00120000 -2.171799
130 2017-07-28 call AAPL170728C00130000 3.579415
Note that if you call get_all_data first, this second call will happen much faster, as the data is cached.
If a given expiry date is not available, data for the next available expiry will be returned (January 15, 2015 in the above
example).
Available expiry dates can be accessed from the expiry_dates property.
In [31]: aapl.expiry_dates
Out[31]:
[datetime.date(2017, 7, 28),
datetime.date(2017, 8, 4),
datetime.date(2017, 8, 11),
datetime.date(2017, 8, 18),
datetime.date(2017, 8, 25),
datetime.date(2017, 9, 1),
datetime.date(2017, 9, 15),
datetime.date(2017, 10, 20),
datetime.date(2017, 11, 17),
datetime.date(2017, 12, 15),
datetime.date(2018, 1, 19),
datetime.date(2018, 2, 16),
datetime.date(2018, 4, 20),
datetime.date(2018, 6, 15),
datetime.date(2018, 9, 21),
datetime.date(2019, 1, 18)]
PctChg
Strike Expiry Type Symbol
95 2017-07-28 call AAPL170728C00095000 0.000000
100 2017-07-28 call AAPL170728C00100000 0.000000
105 2017-07-28 call AAPL170728C00105000 0.000000
120 2017-07-28 call AAPL170728C00120000 -2.171799
130 2017-07-28 call AAPL170728C00130000 3.579415
A list-like object containing dates can also be passed to the expiry parameter, returning options data for all expiry dates
in the list.
The month and year parameters can be used to get all options data for a given month.
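The narrower requests described above (a single expiry date, a list of expiry dates, or a month and year) might look like the following sketch. The wrapper fetch_option_subsets and its default symbol are hypothetical; get_call_data, get_near_stock_price and expiry_dates are used on the assumption that the Yahoo! Options class still exposes them as it did for this release. The calls need network access and a working Yahoo! endpoint.

```python
def fetch_option_subsets(symbol="aapl"):
    """Sketch of narrower option requests (needs network access)."""
    # Imported lazily so that defining this helper never touches the network.
    from pandas_datareader.data import Options

    opts = Options(symbol, "yahoo")

    # Calls for a single expiry date.
    calls = opts.get_call_data(expiry=opts.expiry_dates[0])

    # A list-like of expiry dates returns data for every date in the list.
    near = opts.get_near_stock_price(expiry=opts.expiry_dates[0:3])

    # All calls for one month, selected by month and year.
    first = opts.expiry_dates[0]
    by_month = opts.get_call_data(month=first.month, year=first.year)

    return calls, near, by_month


# calls, near, by_month = fetch_option_subsets()
```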
Google Finance
In [41]: f.loc['2010-01-04']
Out[41]:
Open 10.17
High 10.28
Low 10.05
Close 10.28
Volume 60855796.00
Name: 2010-01-04 00:00:00, dtype: float64
*Experimental*
The GoogleQuotesReader class retrieves quote data from Google Finance.
In [44]: q
Out[44]:
change_pct last time
AMZN 0.09 1039.87 2017-07-25 16:00:00
GOOG -3.02 950.70 2017-07-25 16:00:00
*Experimental*
The Options class allows the download of options data from Google Finance.
The get_options_data method downloads options data for the specified expiry date and provides a formatted
DataFrame with a hierarchical index, so it’s easy to get to the specific option you want.
Available expiry dates can be accessed from the expiry_dates property.
In [45]: from pandas_datareader.data import Options
PctChg
Strike Expiry Type Symbol
340 2018-01-19 call GOOG180119C00340000 0.00
put GOOG180119P00340000 0.00
350 2018-01-19 call GOOG180119C00350000 -4.85
put GOOG180119P00350000 -50.00
360 2018-01-19 call GOOG180119C00360000 0.00
Enigma
Access datasets from Enigma, the world’s largest repository of structured public data.
In [49]: import os
An Enigma API key is required. If no key is passed and the ENIGMA_API_KEY environment variable is not set, the
reader raises:
ValueError: Please provide an Enigma API key or set the ENIGMA_API_KEY environment variable
If you do not have an API key, you can get one here: https://app.enigma.io/signup
In [52]: df.columns
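A minimal sketch of supplying the API key, assuming it is stored in the ENIGMA_API_KEY environment variable. The wrapper read_enigma is hypothetical, and the dataset identifier is left as a parameter because the original example's identifier is not shown here. The check runs before any import or network access, so a missing key fails early.

```python
import os


def read_enigma(dataset_id, api_key=None):
    """Fetch an Enigma dataset, failing early if no API key is available."""
    api_key = api_key or os.getenv("ENIGMA_API_KEY")
    if not api_key:
        raise ValueError("Set ENIGMA_API_KEY or pass api_key explicitly")
    # Imported lazily; the call below needs network access.
    import pandas_datareader.data as web
    return web.get_data_enigma(dataset_id, api_key)


# df = read_enigma("<dataset id>")  # placeholder, not a real dataset id
```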
Quandl
Daily financial data (prices of stocks, ETFs, etc.) from Quandl. The symbol names consist of two parts: DB name and
symbol name. DB names can be any of the free ones listed on the Quandl website
(https://blog.quandl.com/free-data-on-quandl). Symbol names vary with DB name; for WIKI (US stocks),
they are the common ticker symbols, while in some other cases (such as FSE) they can be a bit strange. Some sources
are also mapped to suitable ISO country codes in a dot-suffix style, currently available for BE, CN, DE, FR, IN, JP,
NL, PT, UK, US.
As of June 2017, each DB has a different data schema, the coverage in terms of time range is sometimes surprisingly
small, and the data quality is not always good.
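A sketch of a Quandl request, under two assumptions that are not confirmed by the text above: that the DB name and symbol name are joined with a slash (for example "WIKI/AAPL"), and that no API key is needed (later releases may require one). The helper name fetch_quandl and the dates are illustrative.

```python
def fetch_quandl(symbol, start, end):
    """Download daily data from Quandl (needs network access).

    `symbol` combines the DB name and the symbol name; the "DB/SYMBOL"
    form used in the example below is an assumption.
    """
    # Imported lazily so that defining this helper never touches the network.
    import pandas_datareader.data as web
    return web.DataReader(symbol, "quandl", start, end)


# df = fetch_quandl("WIKI/AAPL", "2015-01-01", "2015-01-05")
# df.loc["2015-01-02"]
```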
In [56]: df.loc['2015-01-02']
Out[56]:
Open High Low Close Volume ExDividend SplitRatio \
Date
2015-01-02 111.39 111.44 107.35 109.33 53204626 0 1
FRED
In [62]: gdp.loc['2013-01-01']
Out[62]:
GDP 16475.4
Name: 2013-01-01 00:00:00, dtype: float64
# Multiple series:
In [63]: inflation = web.DataReader(["CPIAUCSL", "CPILFESL"], "fred", start, end)
In [64]: inflation.head()
Out[64]:
CPIAUCSL CPILFESL
DATE
2010-01-01 217.488 220.633
2010-02-01 217.281 220.731
2010-03-01 217.353 220.783
2010-04-01 217.403 220.822
2010-05-01 217.290 220.962
Fama/French
Access datasets from the Fama/French Data Library. The get_available_datasets function returns a list of
all available datasets.
In [67]: len(get_available_datasets())
Out[67]: 262
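The transcript uses get_available_datasets and a ds object without showing where they come from. A minimal sketch, assuming the function lives in pandas_datareader.famafrench and that the dataset is requested under the name "5_Industry_Portfolios" (the name is inferred from the description printed below and may differ). The helper load_fama_french is hypothetical.

```python
def load_fama_french(name="5_Industry_Portfolios"):
    """Return (available dataset names, dataset dict) from the Fama/French library."""
    # Imported lazily; both calls below need network access.
    import pandas_datareader.data as web
    from pandas_datareader.famafrench import get_available_datasets

    available = get_available_datasets()
    ds = web.DataReader(name, "famafrench")
    return available, ds


# available, ds = load_fama_french()
# print(ds["DESCR"])      # text description of the dataset
# ds[4].loc["1926-07"]    # one of the numbered tables in the result
```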
In [69]: print(ds['DESCR'])
5 Industry Portfolios
---------------------
This file was created by CMPT_IND_RETS using the 201705 CRSP database. It contains value- and equal-weighted
returns for 5 industry portfolios. The portfolios are constructed at the end of June. The annual returns are from
January to December. Missing data are indicated by -99.99 or -999. Copyright 2017 Kenneth R. French
In [70]: ds[4].loc['1926-07']
World Bank
pandas users can easily access thousands of panel data series from the World Bank’s World Development Indicators
by using the wb I/O functions.
Indicators
Every World Bank indicator is accessible, either by exploring the World Bank site or by using the included search
function.
For example, to compare gross domestic product per capita in constant dollars across North America, first search for
the matching indicators:
In [2]: wb.search('gdp.*capita.*const').iloc[:,:2]
Out[2]:
id name
3242 GDPPCKD GDP per Capita, constant US$, millions
5143 NY.GDP.PCAP.KD GDP per capita (constant 2005 US$)
5145 NY.GDP.PCAP.KN GDP per capita (constant LCU)
5147 NY.GDP.PCAP.PP.KD GDP per capita, PPP (constant 2005 internation...
Then you would use the download function to acquire the data from the World Bank’s servers:
In [3]: dat = wb.download(indicator='NY.GDP.PCAP.KD', country=['US', 'CA', 'MX'], start=2005, end=2008)
In [4]: print(dat)
NY.GDP.PCAP.KD
country year
Canada 2008 36005.5004978584
2007 36182.9138439757
2006 35785.9698172849
2005 35087.8925933298
Mexico 2008 8113.10219480083
2007 8119.21298908649
2006 7961.96818458178
2005 7666.69796097264
United States 2008 43069.5819857208
2007 43635.5852068142
2006 43228.111147107
2005 42516.3934699993
The resulting dataset is a properly formatted DataFrame with a hierarchical index, so it is easy to apply .groupby
transformations to it:
In [6]: dat['NY.GDP.PCAP.KD'].groupby(level=0).mean()
Out[6]:
country
Canada 35765.569188
Mexico 7965.245332
United States 43112.417952
dtype: float64
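To try the groupby step without contacting the World Bank, the frame printed above can be rebuilt by hand. This is only a sketch: the values are copied from the output of In [4], and the index is constructed manually rather than returned by wb.download.

```python
import pandas as pd

# Rebuild the (country, year) hierarchical index shown in the printed frame.
index = pd.MultiIndex.from_product(
    [["Canada", "Mexico", "United States"], [2008, 2007, 2006, 2005]],
    names=["country", "year"],
)
values = [
    36005.5004978584, 36182.9138439757, 35785.9698172849, 35087.8925933298,
    8113.10219480083, 8119.21298908649, 7961.96818458178, 7666.69796097264,
    43069.5819857208, 43635.5852068142, 43228.111147107, 42516.3934699993,
]
dat = pd.DataFrame({"NY.GDP.PCAP.KD": values}, index=index)

# level=0 groups by the first index level (country) and averages over years.
means = dat["NY.GDP.PCAP.KD"].groupby(level=0).mean()
print(means)
```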
Now imagine you want to compare GDP to the share of people with cellphone contracts around the world.
In [7]: wb.search('cell.*%').iloc[:,:2]
Out[7]:
id name
3990 IT.CEL.SETS.FE.ZS Mobile cellular telephone users, female (% of ...
3991 IT.CEL.SETS.MA.ZS Mobile cellular telephone users, male (% of po...
4027 IT.MOB.COV.ZS Population coverage of mobile cellular telepho...
Notice that this second search was much faster than the first one because pandas now has a cached list of available
data series.
In [13]: ind = ['NY.GDP.PCAP.KD', 'IT.MOB.COV.ZS']
In [14]: dat = wb.download(indicator=ind, country='all', start=2011, end=2011).dropna()
Finally, we use the statsmodels package to assess the relationship between our two variables using ordinary least
squares regression. Unsurprisingly, populations in rich countries tend to use cellphones at a higher rate:
In [17]: import numpy as np
In [18]: import statsmodels.formula.api as smf
In [19]: mod = smf.ols("cellphone ~ np.log(gdp)", dat).fit()
In [20]: print(mod.summary())
OLS Regression Results
==============================================================================
Dep. Variable: cellphone R-squared: 0.297
Model: OLS Adj. R-squared: 0.274
Method: Least Squares F-statistic: 13.08
Date: Thu, 25 Jul 2013 Prob (F-statistic): 0.00105
Time: 15:24:42 Log-Likelihood: -139.16
No. Observations: 33 AIC: 282.3
Df Residuals: 31 BIC: 285.3
Df Model: 1
===============================================================================
coef std err t P>|t| [95.0% Conf. Int.]
-------------------------------------------------------------------------------
Intercept 16.5110 19.071 0.866 0.393 -22.384 55.406
np.log(gdp) 9.9333 2.747 3.616 0.001 4.331 15.535
==============================================================================
Omnibus: 36.054 Durbin-Watson: 2.071
Prob(Omnibus): 0.000 Jarque-Bera (JB): 119.133
Skew: -2.314 Prob(JB): 1.35e-26
Kurtosis: 11.077 Cond. No. 45.8
==============================================================================
Country Codes
The country argument accepts a string or list of mixed two or three character ISO country codes, as well as dynamic
World Bank exceptions to the ISO standards.
For a list of the hard-coded country codes (used solely for error handling logic), see
pandas_datareader.wb.country_codes.
Note: The World Bank’s country list and indicators are dynamic. As of 0.15.1, wb.download() is more flexible.
To achieve this, the warning and exception logic changed.
The World Bank converts some country codes in its response, which makes error checking by pandas difficult.
Retired indicators still persist in the search.
Given the new flexibility of 0.15.1, users may need to improve their own error handling for fringe cases.
To help identify issues:
There are at least 4 kinds of country codes:
1. Standard (2/3 character ISO) - returns data, will warn and error properly.
2. Non-standard (WB Exceptions) - returns data, but will falsely warn.
3. Blank - silently missing from the response.
4. Bad - causes the entire response from WB to fail and always raises an exception.
There are at least 3 kinds of indicators:
1. Current - Returns data.
2. Retired - Appears in search results, yet won’t return data.
3. Bad - Will not return data.
Use the errors argument to control warnings and exceptions. Setting errors to ignore or warn won’t stop failed
responses (i.e., 100% bad indicators, or a single “bad” (#4 above) country code).
See docstrings for more info.
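A sketch of passing the errors argument, assuming the accepted values are "ignore", "warn" and "raise" (only the first two are named above). The wrapper download_gdp_per_capita and its up-front validation are illustrative additions, not part of the library.

```python
VALID_ERRORS = ("ignore", "warn", "raise")


def download_gdp_per_capita(countries, errors="warn"):
    """Download GDP per capita, choosing how problems are reported."""
    # Local sanity check added for this sketch; wb.download does its own handling.
    if errors not in VALID_ERRORS:
        raise ValueError("errors must be one of %r" % (VALID_ERRORS,))
    # Imported lazily; the call below needs network access.
    from pandas_datareader import wb
    return wb.download(
        indicator="NY.GDP.PCAP.KD",
        country=countries,
        start=2005,
        end=2008,
        errors=errors,
    )


# dat = download_gdp_per_capita(["US", "CA", "MX"], errors="raise")
```

As noted above, a failed response (for example a single bad country code) still fails even with errors set to ignore or warn, so callers may want their own try/except around the call.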
OECD
OECD Statistics are available via DataReader. You have to specify the OECD data set code.
To find the data set code, open the data set on the OECD site and choose Export -> SDMX Query. The following
example downloads the “Trade Union Density” data, whose set code is “UN_DEN”.
In [74]: df.columns
Out[74]:
Index([u'Australia', u'Austria', u'Belgium', u'Canada', u'Czech Republic',
u'Denmark', u'Finland', u'France', u'Germany', u'Greece', u'Hungary',
u'Iceland', u'Ireland', u'Italy', u'Japan', u'Korea', u'Luxembourg',
u'Mexico', u'Netherlands', u'New Zealand', u'Norway', u'Poland',
u'Portugal', u'Slovak Republic', u'Spain', u'Sweden', u'Switzerland',
u'Turkey', u'United Kingdom', u'United States', u'OECD countries',
u'Chile', u'Slovenia', u'Estonia', u'Israel'],
dtype='object', name=u'Country')
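The request that produced df is not shown in the transcript. A minimal sketch, assuming the data set code is passed as the name and "oecd" as the data source; the helper fetch_oecd and the end date are illustrative, and the code "UN_DEN" may have been renamed by the OECD since this release.

```python
import datetime


def fetch_oecd(dataset_code, end=None):
    """Download an OECD data set by its code (needs network access)."""
    # Imported lazily so that defining this helper never touches the network.
    import pandas_datareader.data as web
    return web.DataReader(dataset_code, "oecd", end=end)


# Illustrative end date, not taken from the original transcript.
end = datetime.datetime(2012, 1, 1)

# df = fetch_oecd("UN_DEN", end=end)
# df.columns
```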
Eurostat
In [78]: df
Out[78]:
ACCIDENT Collisions of trains, including collisions with obstacles within the clearance gauge \
UNIT Number
GEO Austria
FREQ Annual
TIME_PERIOD
2010-01-01 3
2011-01-01 2
2012-01-01 1
2013-01-01 4
2014-01-01 1
2015-01-01 7
ACCIDENT \
UNIT
GEO Belgium Bulgaria Switzerland Channel Tunnel Czech Republic
FREQ Annual Annual Annual Annual Annual
TIME_PERIOD
2010-01-01 5 2 5 0 3
2011-01-01 0 0 4 0 6
2012-01-01 3 3 4 0 6
2013-01-01 1 2 6 0 5
2014-01-01 3 4 0 0 13
2015-01-01 0 3 3 0 14
ACCIDENT \
UNIT
GEO Germany (until 1990 former territory of the FRG) Denmark Estonia
FREQ Annual Annual Annual
TIME_PERIOD
2010-01-01 13 0 1
2011-01-01 18 1 0
2012-01-01 23 1 3
2013-01-01 29 0 0
2014-01-01 32 0 0
2015-01-01 40 3 0
ACCIDENT
UNIT
GEO Romania Sweden Slovenia Slovakia United Kingdom
FREQ Annual Annual Annual Annual Annual
TIME_PERIOD
2010-01-01 271 69 21 85 62
2011-01-01 217 54 11 84 78
2012-01-01 215 47 14 96 75
2013-01-01 180 43 13 94 84
2014-01-01 185 53 15 113 54
2015-01-01 141 40 14 87 40
EDGAR Index
** As of December 31st, the SEC disabled access via FTP. EDGAR support is currently broken until it is rewritten to
use HTTPS. **
Company filing index from EDGAR (SEC).
The daily indices get large quickly (e.g. the set of daily indices from 1994 to 2015 is 1.5GB), and the FTP server will
close the connection past some downloading threshold. In testing, pulling one year at a time works well. If the FTP
server starts refusing your connections, you should be able to reconnect after waiting a few minutes.
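Pulling one year at a time amounts to splitting the overall period into calendar-year ranges and issuing one request per range. The sketch below shows only the splitting step; yearly_ranges is a hypothetical helper, and the reader call is omitted because EDGAR support is noted above as broken.

```python
import datetime


def yearly_ranges(first_year, last_year):
    """Yield (start, end) date pairs, one calendar year per pair."""
    for year in range(first_year, last_year + 1):
        yield datetime.date(year, 1, 1), datetime.date(year, 12, 31)


ranges = list(yearly_ranges(1994, 1996))
for start, end in ranges:
    # Each pair would be used as the start and end of a separate request.
    print(start, end)
```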
In [81]: tspreader.read()
Out[81]:
L Income L 2020 L 2030 L 2040 L 2050 G Fund F Fund \
date
2015-10-01 17.5164 22.5789 24.2159 25.5690 14.4009 14.8380 17.0467
Caching queries
Making the same request repeatedly can use a lot of bandwidth, slow down your code and may result in your IP being
banned.
pandas-datareader allows you to cache queries using requests_cache by passing a requests_cache.CachedSession
to DataReader or Options using the session parameter.
Below is an example with Yahoo! Finance. The session parameter is implemented for all datareaders.
In [9]: f.loc['2010-01-04']
Out[9]:
Open 10.170000
High 10.280000
Low 10.050000
Close 10.280000
Adj Close 8.201456
Volume 60855800.000000
Name: 2010-01-04 00:00:00, dtype: float64
A SQLite file named cache.sqlite will be created in the working directory, storing the request until the expiry
date.
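A sketch of how such a cached session might be set up, assuming requests_cache is installed and using its CachedSession class with a SQLite backend. The three-day expiry, the cache name "cache" and the helper names are illustrative choices.

```python
import datetime


def make_cached_session(days=3):
    """Create a requests_cache session whose entries expire after `days` days."""
    import requests_cache  # third-party package, imported lazily
    expire_after = datetime.timedelta(days=days)
    # With the sqlite backend, cache_name="cache" yields a cache.sqlite file.
    return requests_cache.CachedSession(
        cache_name="cache", backend="sqlite", expire_after=expire_after
    )


def fetch_cached(symbol, start, end, session):
    """Download Yahoo! Finance data through the cached session (needs network access)."""
    import pandas_datareader.data as web
    return web.DataReader(symbol, "yahoo", start, end, session=session)


# session = make_cached_session()
# f = fetch_cached("F", datetime.datetime(2010, 1, 1), datetime.datetime(2013, 1, 27), session)
```

Repeating the same request with the same session before the expiry would then be served from the local cache instead of the network.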
For additional information on using requests-cache, see the documentation.