fredapi is a Python API for the FRED data provided by the
Federal Reserve Bank of St. Louis. fredapi provides a Python wrapper for the
FRED web service, and also provides several convenient methods
for parsing and analyzing point-in-time data (i.e. historic data revisions) from ALFRED.
fredapi makes use of pandas and returns data to you in a pandas Series or DataFrame.
pip install fredapi

First you need an API key; you can apply for one for free on the FRED website. Once you have your API key, you can set it in one of three ways:
- set it in the environment variable FRED_API_KEY
- save it to a file and use the 'api_key_file' parameter
- pass it directly as the 'api_key' parameter
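
The three options above could be resolved along the following lines. This is only a minimal sketch, not fredapi's actual code: `resolve_api_key` is a hypothetical helper, the key strings are placeholders, and the order of precedence (argument first, then file, then environment variable) is an assumption made for illustration.

```python
import os
import tempfile


def resolve_api_key(api_key=None, api_key_file=None):
    """Hypothetical helper: look for a key in an argument, a file, or the environment."""
    if api_key is not None:
        return api_key
    if api_key_file is not None:
        # Assumes the key is stored on the first line of the file.
        with open(api_key_file) as f:
            return f.readline().strip()
    return os.environ.get('FRED_API_KEY')


# Placeholder values for demonstration only (not real keys).
os.environ['FRED_API_KEY'] = 'key-from-environment'
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('key-from-file\n')

from_env = resolve_api_key()
from_file = resolve_api_key(api_key_file=tmp.name)
from_arg = resolve_api_key(api_key='key-from-argument')
os.remove(tmp.name)

print(from_env, from_file, from_arg)
```

Keeping the key in an environment variable or a file keeps it out of source code that may end up in version control.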
from fredapi import Fred
fred = Fred(api_key='insert api key here')
data = fred.get_series('SP500')

Many economic data series contain frequent revisions. fredapi provides several convenient methods for handling data revisions and answering the question of what-data-was-known-when.
In ALFRED there is the concept of a vintage date. Basically every observation can have three dates associated with it: date, realtime_start and realtime_end.
- date: the date the value is for
- realtime_start: the first date the value is valid
- realtime_end: the last date the value is valid
For instance, there have been three observations (data points) for the GDP of 2014 Q1:
<observation realtime_start="2014-04-30" realtime_end="2014-05-28" date="2014-01-01" value="17149.6"/>
<observation realtime_start="2014-05-29" realtime_end="2014-06-24" date="2014-01-01" value="17101.3"/>
<observation realtime_start="2014-06-25" realtime_end="2014-07-29" date="2014-01-01" value="17016.0"/>

This means the GDP value for Q1 2014 has been released three times. The first release was on 4/30/2014 for a value of 17149.6, and then there were two revisions on 5/29/2014 and 6/25/2014 for revised values of 17101.3 and 17016.0, respectively.
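
To make the real-time period concrete, the three observations above can be parsed with the standard library and queried for the value that was in effect on a given day. This is an illustrative sketch only: `value_known_on` is a hypothetical helper, and the `<observations>` wrapper element is added here just so the snippet is a well-formed XML document.

```python
import xml.etree.ElementTree as ET
from datetime import date

# The three GDP observations quoted above, wrapped in a root element.
xml_text = """
<observations>
  <observation realtime_start="2014-04-30" realtime_end="2014-05-28" date="2014-01-01" value="17149.6"/>
  <observation realtime_start="2014-05-29" realtime_end="2014-06-24" date="2014-01-01" value="17101.3"/>
  <observation realtime_start="2014-06-25" realtime_end="2014-07-29" date="2014-01-01" value="17016.0"/>
</observations>
""".strip()


def value_known_on(xml_string, as_of):
    """Return the value whose real-time period contains `as_of`, or None."""
    for obs in ET.fromstring(xml_string).iter('observation'):
        start = date.fromisoformat(obs.get('realtime_start'))
        end = date.fromisoformat(obs.get('realtime_end'))
        if start <= as_of <= end:
            return float(obs.get('value'))
    return None


known_june_1 = value_known_on(xml_text, date(2014, 6, 1))    # first revision: 17101.3
known_july_1 = value_known_on(xml_text, date(2014, 7, 1))    # second revision: 17016.0
known_april_1 = value_known_on(xml_text, date(2014, 4, 1))   # before the first release: None

print(known_june_1, known_july_1, known_april_1)
```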
If you pass realtime_start and/or realtime_end to get_series, you will get a pandas.DataFrame with a pandas.MultiIndex instead of a pandas.Series.
For instance, with observation_start and observation_end set to 2015-01-01 and realtime_start set to 2015-01-01, one will get:
GDP
obs_date rt_start rt_end
2015-01-01 2015-04-29 2015-05-28 17710.0
2015-05-29 2015-06-23 17665.0
2015-06-24 9999-12-31 17693.3
To get only the first release of each observation (i.e. ignoring revisions):
data = fred.get_series_first_release('GDP')
data.tail()

this outputs:
date
2013-04-01 16633.4
2013-07-01 16857.6
2013-10-01 17102.5
2014-01-01 17149.6
2014-04-01 17294.7
Name: value, dtype: object

To get the latest data, call get_series_latest_release(). Note that this is the same as simply calling get_series().
data = fred.get_series_latest_release('GDP')
data.tail()

this outputs:
2013-04-01 16619.2
2013-07-01 16872.3
2013-10-01 17078.3
2014-01-01 17044.0
2014-04-01 17294.7
dtype: float64
data = fred.get_dataframe(['SP500', 'GDP'], frequency='q')
data.tail()

this outputs:
SP500 GDP
2014-07-31 1975.91 17599.8
2014-10-31 2009.34 17703.7
2015-01-31 2063.69 17693.3
dtype: float64
Note that if you do not specify the frequency, each series will be output at its own intrinsic frequency, which introduces NaN values in the DataFrame.
data = fred.get_dataframe(['GDP', 'PAYEMS'])
data.tail()

outputs:
GDP PAYEMS
2014-07-31 17599.8 139156
2014-08-31 NaN 139369
2014-09-30 NaN 139619
2014-10-31 17703.7 139840
2014-11-30 NaN 140263
2014-12-31 NaN 140592
2015-01-31 17693.3 140793
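
The NaN pattern above can be reproduced with plain pandas. The sketch below does not use fredapi and makes no claim about how get_dataframe works internally; it only shows why aligning a quarterly and a monthly series on a shared index leaves gaps, and two common ways to handle them. The values are copied from the output above.

```python
import pandas as pd

# Quarterly GDP and monthly PAYEMS values taken from the output above.
gdp = pd.Series(
    [17599.8, 17703.7, 17693.3],
    index=pd.to_datetime(['2014-07-31', '2014-10-31', '2015-01-31']),
    name='GDP',
)
payems = pd.Series(
    [139156, 139369, 139619, 139840, 140263, 140592, 140793],
    index=pd.to_datetime(['2014-07-31', '2014-08-31', '2014-09-30', '2014-10-31',
                          '2014-11-30', '2014-12-31', '2015-01-31']),
    name='PAYEMS',
)

# Aligning on the union of both indexes leaves NaN where GDP has no observation.
combined = pd.concat([gdp, payems], axis=1).sort_index()
nan_count = int(combined['GDP'].isna().sum())

# Option 1: keep only the dates on which every series has a value.
complete = combined.dropna()

# Option 2: carry the last known quarterly value forward to the monthly dates.
filled = combined.ffill()

print(combined)
print(nan_count, len(complete))
```

Whether dropping or forward-filling is appropriate depends on the analysis; forward-filling treats the quarterly figure as unchanged until the next observation.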
To get the latest data known on a given date:
fred.get_series_as_of_date('GDP', '6/1/2014')

this outputs:

| | date | realtime_start | value |
|---|---|---|---|
| 2237 | 2013-10-01 00:00:00 | 2014-01-30 00:00:00 | 17102.5 |
| 2238 | 2013-10-01 00:00:00 | 2014-02-28 00:00:00 | 17080.7 |
| 2239 | 2013-10-01 00:00:00 | 2014-03-27 00:00:00 | 17089.6 |
| 2241 | 2014-01-01 00:00:00 | 2014-04-30 00:00:00 | 17149.6 |
| 2242 | 2014-01-01 00:00:00 | 2014-05-29 00:00:00 | 17101.3 |
To get every release of every observation, use get_series_all_releases, which returns a DataFrame with all the data from ALFRED:
df = fred.get_series_all_releases('GDP')
df.tail()

this outputs:

| | date | realtime_start | value |
|---|---|---|---|
| 2236 | 2013-07-01 00:00:00 | 2014-07-30 00:00:00 | 16872.3 |
| 2237 | 2013-10-01 00:00:00 | 2014-01-30 00:00:00 | 17102.5 |
| 2238 | 2013-10-01 00:00:00 | 2014-02-28 00:00:00 | 17080.7 |
| 2239 | 2013-10-01 00:00:00 | 2014-03-27 00:00:00 | 17089.6 |
| 2240 | 2013-10-01 00:00:00 | 2014-07-30 00:00:00 | 17078.3 |
| 2241 | 2014-01-01 00:00:00 | 2014-04-30 00:00:00 | 17149.6 |
| 2242 | 2014-01-01 00:00:00 | 2014-05-29 00:00:00 | 17101.3 |
| 2243 | 2014-01-01 00:00:00 | 2014-06-25 00:00:00 | 17016 |
| 2244 | 2014-01-01 00:00:00 | 2014-07-30 00:00:00 | 17044 |
| 2245 | 2014-04-01 00:00:00 | 2014-07-30 00:00:00 | 17294.7 |
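
Once all releases are in a single table, the first release, the latest release, and the view as of a given date can all be derived locally. The following is a sketch using plain pandas on the rows shown above, not a description of how fredapi computes them; the variable names are illustrative.

```python
import pandas as pd

# Rows copied from the table above: (date, realtime_start, value).
releases = pd.DataFrame(
    [
        ('2013-07-01', '2014-07-30', 16872.3),
        ('2013-10-01', '2014-01-30', 17102.5),
        ('2013-10-01', '2014-02-28', 17080.7),
        ('2013-10-01', '2014-03-27', 17089.6),
        ('2013-10-01', '2014-07-30', 17078.3),
        ('2014-01-01', '2014-04-30', 17149.6),
        ('2014-01-01', '2014-05-29', 17101.3),
        ('2014-01-01', '2014-06-25', 17016.0),
        ('2014-01-01', '2014-07-30', 17044.0),
        ('2014-04-01', '2014-07-30', 17294.7),
    ],
    columns=['date', 'realtime_start', 'value'],
)
releases['date'] = pd.to_datetime(releases['date'])
releases['realtime_start'] = pd.to_datetime(releases['realtime_start'])
releases = releases.sort_values(['date', 'realtime_start'])

# Earliest and most recent release per observation date.
# Caveat: this excerpt holds only the last ten rows, so the "first" value for
# 2013-07-01 here is not the true first release of that quarter.
first_release = releases.groupby('date')['value'].first()
latest_release = releases.groupby('date')['value'].last()

# Everything published on or before the target date, then the newest of those.
as_of = pd.Timestamp('2014-06-01')
known = releases[releases['realtime_start'] <= as_of]
known_latest = known.groupby('date')['value'].last()

print(first_release)
print(latest_release)
print(known_latest)
```

For the 2014-01-01 observation this gives 17149.6 as the first release, 17044.0 as the latest, and 17101.3 as the value known on 2014-06-01.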
Getting all releases for several series with get_dataframe works the same way as for the latest release: one just adds realtime_start, realtime_end, or both.
data = fred.get_dataframe(['GDP', 'CP'], observation_start='7/1/2014',
observation_end='1/1/2015', realtime_start='7/1/2014')
data.tail()

outputs:
GDP CP
obs_date rt_start rt_end
2014-07-01 2014-10-30 2014-11-24 17535.4 NaN
2014-11-25 2014-12-22 17555.2 1872.7
2014-12-23 NaT 17599.8 NaN
2015-07-29 NaN 1894.6
2015-07-30 NaT NaN 1761.1
2014-10-01 2015-01-30 2015-02-26 17710.7 NaN
2015-02-27 2015-03-26 17701.3 NaN
2015-03-27 NaT 17703.7 NaN
2015-07-29 NaN 1837.5
2015-07-30 NaT NaN 1700.5
2015-01-01 2015-04-29 2015-05-28 17710.0 NaN
2015-05-29 2015-06-23 17665.0 1893.8
2015-06-24 NaT 17693.3 NaN
2015-07-29 NaN 1891.2
2015-07-30 NaT NaN 1734.5
The advantage of this approach is that all the information has already been downloaded, so one can apply further transformations without making more web queries.
For instance:
import pandas as pd
dfo = data.reset_index(level=[1, 2])  # move rt_start and rt_end to columns.
target = pd.to_datetime('2015-06-01')
dfo[(dfo.rt_start < target) & (target < dfo.rt_end)].groupby(level=0).first()

will output the value of the series as of the target date:
rt_start rt_end GDP CP
obs_date
2014-07-01 2014-12-23 2015-07-29 17599.8 1894.6
2014-10-01 2015-03-27 2015-07-29 17703.7 1837.5
2015-01-01 2015-05-29 2015-06-23 17665.0 1893.8

To get all vintage dates of a series:
from __future__ import print_function
vintage_dates = fred.get_series_vintage_dates('GDP')
for dt in vintage_dates[-5:]:
    print(dt.strftime('%Y-%m-%d'))

this outputs:
2014-03-27
2014-04-30
2014-05-29
2014-06-25
2014-07-30
You can always search for data series on the FRED website. But sometimes it can be more convenient to search programmatically.
fredapi provides a search() method that does a full-text search and returns a DataFrame of results.
fred.search('potential gdp').T

this outputs:

| series id | GDPPOT | NGDPPOT |
|---|---|---|
| frequency | Quarterly | Quarterly |
| frequency_short | Q | Q |
| id | GDPPOT | NGDPPOT |
| last_updated | 2014-02-04 10:06:03-06:00 | 2014-02-04 10:06:03-06:00 |
| notes | Real potential GDP is the CBO's estimate of the output the economy would produce with a high rate of use of its capital and labor resources. The data is adjusted to remove the effects of inflation. | None |
| observation_end | 2024-10-01 00:00:00 | 2024-10-01 00:00:00 |
| observation_start | 1949-01-01 00:00:00 | 1949-01-01 00:00:00 |
| popularity | 72 | 61 |
| realtime_end | 2014-08-23 00:00:00 | 2014-08-23 00:00:00 |
| realtime_start | 2014-08-23 00:00:00 | 2014-08-23 00:00:00 |
| seasonal_adjustment | Not Seasonally Adjusted | Not Seasonally Adjusted |
| seasonal_adjustment_short | NSA | NSA |
| title | Real Potential Gross Domestic Product | Nominal Potential Gross Domestic Product |
| units | Billions of Chained 2009 Dollars | Billions of Dollars |
| units_short | Bil. of Chn. 2009 $ | Bil. of $ |
- I have a blog post with more examples written in an
  IPython notebook