Black Friday Sales Prediction

Project Introduction

Black Friday is the informal name for the Friday following Thanksgiving Day in the United States, which falls on the fourth Thursday of November. It has been considered the beginning of the U.S. Christmas shopping season since 1952. Over the years, retailers have capitalized on this day by offering heavy discounts and starting sales as early as midnight.

For a retail store or an e-commerce business, a major challenge is setting product prices that maximize profit during such high-volume sales events. This project aims to predict purchase amounts based on historical sales data, which can help businesses in pricing decisions to increase profits.

Dataset Description

The dataset used in this project was obtained from an online data analytics hackathon hosted by Analytics Vidhya. It contains various features such as:

Age
Gender
Marital Status
Product Categories
City Demographics
Purchase Amount

It consists of 12 columns and 537,577 records. The target variable is the purchase amount, which we aim to predict using different regression models.

Exploratory Data Analysis (EDA)

Key insights derived from the data visualization:

Gender: Approximately 75% of the purchases were made by male users, and only 25% by females. Males also tend to spend more on average.
Marital Status: Single men spend the most during Black Friday. Spending tends to decrease after marriage, possibly due to added responsibilities.
Age Group: The age group 25–40 is the most active in terms of spending.
Stay in Current City: Users who have been in the city for just 1 year tend to spend more, possibly due to their need to buy new items. Long-term residents (>4 years) show reduced purchase activity.
City Category: While City B contributes the most to overall sales revenue, certain products are purchased more frequently in City C.

Data Preparation

Label Encoding was used for categorical features such as Age, Gender, and City_Category.
One-hot Encoding (get_dummies) was applied to the Stay_In_Current_City_Years column.
Missing Values in Product_Category_2 and Product_Category_3 were filled appropriately.

Modeling Phase

The dataset was split into training and testing sets using an 80:20 ratio.
Multiple regression models were implemented:
- Linear Regression
- Decision Tree Regressor
- Random Forest Regressor
- XGBoost Regressor

Evaluation Metric

The performance of models was evaluated using Root Mean Square Error (RMSE), a standard metric to measure the accuracy of regression models. RMSE calculates the square root of the average squared differences between predicted and actual values.

Technologies Used

Python
Pandas – for data manipulation and analysis
NumPy – for numerical operations
Matplotlib and Seaborn – for data visualization
Scikit-learn – for preprocessing, modeling, and evaluation
XGBoost – for implementing the XGBoost regression model
Jupyter Notebook – for interactive development and analysis

Conclusion

Several regression models were tested in this project:

Linear Regression
Decision Tree Regressor
Random Forest Regressor
XGBoost Regressor

Among these, the XGBoost Regressor yielded the best results with the lowest RMSE of 2879, making it the most effective model for predicting purchase amounts in this scenario.

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
Data		Data
Black Friday Sales Prediction.ipynb		Black Friday Sales Prediction.ipynb
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Black Friday Sales Prediction

Table of Contents

Project Introduction

Dataset Description

Exploratory Data Analysis (EDA)

Data Preparation

Modeling Phase

Evaluation Metric

Technologies Used

Conclusion

About

Uh oh!

Releases

Packages

Languages

nanthasnk/Black-Friday-Sales-Prediction

Folders and files

Latest commit

History

Repository files navigation

Black Friday Sales Prediction

Table of Contents

Project Introduction

Dataset Description

Exploratory Data Analysis (EDA)

Data Preparation

Modeling Phase

Evaluation Metric

Technologies Used

Conclusion

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages