Running the linear regression models requires scikit-learn (installation instructions here) and statsmodels. Beyond those, the code uses common third-party Python libraries such as pandas, numpy, matplotlib, and seaborn, and should run without issues on Python 3.*.
Using Airbnb data on listings in New York City from August 2020 to August 2021, I wanted to understand:
- How do Airbnb prices in NYC differ by neighborhood and other attributes?
- What are the most salient factors that predict pricing?
- How have prices trended during COVID-19?
There are two notebooks, which analyze 1) factors related to pricing and 2) price trends over time, as denoted by their titles. Each notebook contains markdown cells that walk through the exploration, visualization, and model building step by step.
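For a rough idea of what the pricing-factor model looks like, here is a minimal sketch of a linear regression with scikit-learn. It is not the code from the notebooks: the column names (`neighbourhood`, `bedrooms`, `price`) and the six rows of data are made up for illustration, and the prices are constructed so the expected coefficients are known in advance.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic listings (hypothetical columns, not the real dataset).
# Prices are built as 50 + 30 * bedrooms + 40 * (neighbourhood == "Manhattan").
df = pd.DataFrame({
    "neighbourhood": ["Manhattan", "Brooklyn", "Manhattan",
                      "Brooklyn", "Manhattan", "Brooklyn"],
    "bedrooms": [1, 1, 2, 2, 3, 3],
    "price": [120.0, 80.0, 150.0, 110.0, 180.0, 140.0],
})

# One-hot encode the categorical column; drop_first avoids a redundant
# dummy column, so "Brooklyn" becomes the baseline category.
X = pd.get_dummies(
    df[["neighbourhood", "bedrooms"]],
    columns=["neighbourhood"],
    drop_first=True,
    dtype=float,
)
y = df["price"]

model = LinearRegression().fit(X, y)

# Pair each coefficient with its feature name for readability.
coefs = pd.Series(model.coef_, index=X.columns)
print(coefs)
print("intercept:", model.intercept_)
```

Each coefficient reads as the change in price associated with a one-unit change in that feature, holding the others fixed. With real listings the fit will not be exact, so the coefficients should be interpreted alongside their uncertainty (statsmodels reports this in its summary output).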
The 'data' folder contains monthly data downloaded from Airbnb. The 'figures' folder contains all figures saved from the analysis.
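As a sketch of how monthly files can be combined into a price trend, the example below stacks two months of listings and computes the median price per month. The file contents, the `id` and `price` column names, and the assumption that prices are stored as strings like `$1,200.00` are all illustrative; the actual files in the 'data' folder may be laid out differently.

```python
import io

import pandas as pd

# Hypothetical stand-ins for two monthly CSV files, kept in memory so the
# example runs without the real data.
MONTHLY_CSVS = {
    "2020-08": 'id,price\n1,$100.00\n2,"$1,200.00"\n3,$150.00\n',
    "2020-09": "id,price\n1,$90.00\n2,$110.00\n3,$130.00\n",
}


def load_month(month: str, csv_text: str) -> pd.DataFrame:
    """Parse one month of listings and tag each row with its month."""
    df = pd.read_csv(io.StringIO(csv_text))
    # Assumption: price is a string with a dollar sign and thousands separators.
    df["price"] = df["price"].str.replace(r"[$,]", "", regex=True).astype(float)
    df["month"] = month
    return df


# Stack all months into one long table, then summarize price by month.
listings = pd.concat(
    [load_month(month, text) for month, text in MONTHLY_CSVS.items()],
    ignore_index=True,
)
median_by_month = listings.groupby("month")["price"].median()
print(median_by_month)
```

The median is used here rather than the mean because a few very expensive listings can pull the mean upward. To read from disk instead, the `io.StringIO(csv_text)` argument would be replaced by a file path.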
The main results of this analysis can be found in this Medium post.
The Airbnb data is available under a Creative Commons CC0 1.0 Universal (CC0 1.0) "Public Domain Dedication" license and can be found here.