Retail Analysis With Walmart Data
Machepalli Ramaseshu
Data Science with Python (Simplilearn)
Abstract
The aim of the project is to analyse the sales of Walmart's stores and to
predict sales and demand accurately. Certain events and holidays affect sales
on particular days. The business sometimes runs out of stock because of
unforeseen demand, a problem attributed to an inappropriate machine learning
algorithm. An ideal ML algorithm would predict demand accurately and take into
account economic factors such as CPI and the Unemployment Index.
Keywords
Weekly sales, Monthly sales, Temperature, Markdown, Linear regression.
Business Problem
Walmart's decision makers should be able to analyse the factors affecting
sales in their stores. These factors include fuel price, temperature,
unemployment and CPI.
Analysis Tasks
Which store has the maximum sales?
Which store has the maximum standard deviation, i.e., where do sales vary the
most? Also, find the coefficient of mean to standard deviation.
Which store(s) had a good quarterly growth rate in Q3 2012?
Some holidays have a negative impact on sales. Find the holidays which
have higher sales than the mean sales in the non-holiday season for all stores
together.
Provide a monthly and semester view of sales in units and give insights.
Data Understanding
This is the historical data that covers sales from 2010-02-05 to 2012-11-01, in
the file Walmart_Store_sales. Within this file you will find the following fields:
Store - the store number
Date - the week of sales
Weekly_Sales - sales for the given store
Holiday_Flag - whether the week is a special holiday week
(1 = holiday week, 0 = non-holiday week)
Temperature - Temperature on the day of sale
Fuel_Price - Cost of fuel in the region
CPI - Prevailing consumer price index
Unemployment - Prevailing unemployment rate
Holiday Events
Super Bowl: 12-Feb-10, 11-Feb-11, 10-Feb-12, 8-Feb-13
Labour Day: 10-Sep-10, 9-Sep-11, 7-Sep-12, 6-Sep-13
Thanksgiving: 26-Nov-10, 25-Nov-11, 23-Nov-12, 29-Nov-13
Christmas: 31-Dec-10, 30-Dec-11, 28-Dec-12, 27-Dec-13
Data Preparation
Converted the Date column to datetime format.
Checked for missing values; there are none in the dataset.
Split the Date column into Day, Month and Year columns to facilitate
the analysis of monthly sales and similar views.
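The three preparation steps can be sketched with pandas as follows. This is a minimal illustration, not the project's actual code: the rows are made-up stand-ins for the Walmart_Store_sales file, and the day-month-year date format is an assumption.

```python
import pandas as pd

# Made-up rows standing in for Walmart_Store_sales (the real file would be
# loaded with pd.read_csv). Assumption: dates are day-month-year strings.
df = pd.DataFrame({
    "Store": [1, 1, 2],
    "Date": ["05-02-2010", "12-02-2010", "05-02-2010"],
    "Weekly_Sales": [100.0, 120.0, 90.0],
})

# Step 1: convert the Date column to datetime.
df["Date"] = pd.to_datetime(df["Date"], format="%d-%m-%Y")

# Step 2: count missing values per column.
missing = df.isnull().sum()

# Step 3: split the date into Day, Month and Year columns.
df["Day"] = df["Date"].dt.day
df["Month"] = df["Date"].dt.month
df["Year"] = df["Date"].dt.year
```

Giving `to_datetime` an explicit `format` avoids ambiguity between day-first and month-first dates.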
Q1. Which store has maximum sales?
Fig. 2 shows that Store 20 has the maximum sales and Store 33 has the
minimum sales.
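A comparison of this kind can be sketched by summing Weekly_Sales per store and taking the largest and smallest totals. The figures below are made up for illustration, and the variable names are hypothetical.

```python
import pandas as pd

# Made-up weekly sales for three stores.
df = pd.DataFrame({
    "Store": [1, 1, 2, 2, 3, 3],
    "Weekly_Sales": [100.0, 120.0, 300.0, 280.0, 50.0, 60.0],
})

# Total sales per store over the whole period.
total_sales = df.groupby("Store")["Weekly_Sales"].sum()

# Store numbers with the largest and smallest totals.
best_store = total_sales.idxmax()
worst_store = total_sales.idxmin()
```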
Q2. Which store has the maximum standard deviation? Also find the
coefficient of mean to standard deviation.
Store 14 has the maximum standard deviation, at $317,570.
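One way to compute both quantities is sketched below. It assumes that "coefficient of mean to standard deviation" means the coefficient of variation (standard deviation divided by mean); that reading is an assumption, as are the made-up figures.

```python
import pandas as pd

# Made-up weekly sales for two stores.
df = pd.DataFrame({
    "Store": [1, 1, 1, 2, 2, 2],
    "Weekly_Sales": [100.0, 110.0, 90.0, 200.0, 300.0, 100.0],
})

# Per-store mean and sample standard deviation of weekly sales.
stats = df.groupby("Store")["Weekly_Sales"].agg(["mean", "std"])

# Assumed reading of the task: coefficient of variation = std / mean.
stats["cv"] = stats["std"] / stats["mean"]

# Store whose weekly sales vary the most in absolute terms.
most_variable = stats["std"].idxmax()
```

The coefficient of variation is unitless, so it allows stores of very different sizes to be compared on relative variability.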
Q3. Which store/s has good quarterly growth rate in Q3’2012?
The store with the best growth rate in the third quarter of 2012 is Store 4,
with $25,652,119.35.
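A quarterly growth rate can be sketched as the relative change in total sales from Q2 2012 to Q3 2012. That definition is an assumption, since the report does not state its formula, and the figures are made up.

```python
import pandas as pd

# Made-up 2012 sales for two stores, two weeks in Q2 and two in Q3.
df = pd.DataFrame({
    "Store": [1, 1, 1, 1, 2, 2, 2, 2],
    "Date": pd.to_datetime([
        "2012-04-06", "2012-06-29", "2012-07-06", "2012-09-28",
        "2012-04-06", "2012-06-29", "2012-07-06", "2012-09-28",
    ]),
    "Weekly_Sales": [100.0, 100.0, 120.0, 120.0, 200.0, 200.0, 190.0, 190.0],
})

# Keep 2012 only and label each row with its calendar quarter.
q = df[df["Date"].dt.year == 2012].copy()
q["Quarter"] = q["Date"].dt.quarter

# Total sales per store and quarter, one column per quarter.
quarterly = q.pivot_table(index="Store", columns="Quarter",
                          values="Weekly_Sales", aggfunc="sum")

# Assumed definition: growth = (Q3 - Q2) / Q2.
growth_q3 = (quarterly[3] - quarterly[2]) / quarterly[2]
best_growth_store = growth_q3.idxmax()
```

A rate defined this way is a fraction (0.2 means 20% growth), which distinguishes it from a total sales amount in dollars.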
Q4. Some holidays have a negative impact on sales. Find out holidays which
have higher sales than the mean sales in non-holiday season for all stores
together
The graphs above show that sales increase during Thanksgiving and decrease
during Christmas.
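The comparison behind this question can be sketched by labelling each holiday week, averaging its sales, and comparing against the mean of non-holiday weeks. The holiday dates are those listed under Holiday Events for 2010 to 2012; the sales figures are made up, and the variable names are hypothetical.

```python
import pandas as pd

# Holiday weeks taken from the Holiday Events list (2010 to 2012).
holidays = {
    "Super Bowl": ["2010-02-12", "2011-02-11", "2012-02-10"],
    "Labour Day": ["2010-09-10", "2011-09-09", "2012-09-07"],
    "Thanksgiving": ["2010-11-26", "2011-11-25", "2012-11-23"],
    "Christmas": ["2010-12-31", "2011-12-30", "2012-12-28"],
}

# Made-up weekly sales, already aggregated over all stores.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2010-02-05", "2010-02-12", "2010-11-19",
                            "2010-11-26", "2010-12-31"]),
    "Weekly_Sales": [100.0, 110.0, 100.0, 150.0, 80.0],
    "Holiday_Flag": [0, 1, 0, 1, 1],
})

# Map each date to its holiday name; non-holiday weeks become NaN.
date_to_holiday = {pd.Timestamp(d): name
                   for name, dates in holidays.items() for d in dates}
df["Holiday"] = df["Date"].map(date_to_holiday)

# Baseline: mean sales over non-holiday weeks.
non_holiday_mean = df.loc[df["Holiday_Flag"] == 0, "Weekly_Sales"].mean()

# Mean sales per holiday, then keep those above the baseline.
holiday_means = df.dropna(subset=["Holiday"]).groupby("Holiday")["Weekly_Sales"].mean()
above_baseline = holiday_means[holiday_means > non_holiday_mean]
```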
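Q5. Provide a monthly and semester view of sales in units and give insights
The graphs above show that sales are highest in December and lowest in
January.
By year, sales are highest in 2011 and lowest in 2012 (the data ends on
2012-11-01, so 2012 is an incomplete year).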
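Monthly and semester views can be sketched by grouping on calendar fields derived from the date. The semester definition (January to June, July to December) is an assumption, and the figures are made up.

```python
import pandas as pd

# Made-up weekly sales across one year.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2011-01-07", "2011-01-14", "2011-06-24",
                            "2011-07-01", "2011-12-30"]),
    "Weekly_Sales": [10.0, 20.0, 30.0, 40.0, 50.0],
})

df["Year"] = df["Date"].dt.year
df["Month"] = df["Date"].dt.month

# Assumed definition: semester 1 = January to June, semester 2 = July to December.
df["Semester"] = (df["Month"] > 6).astype(int) + 1

# Total sales per (year, month) and per (year, semester).
monthly = df.groupby(["Year", "Month"])["Weekly_Sales"].sum()
semester = df.groupby(["Year", "Semester"])["Weekly_Sales"].sum()
```

Grouping on Year as well as Month keeps the same month of different years separate, which matters when a year is only partly covered.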
Statistical Model
The variables considered in building the model are Temperature, CPI,
Fuel_Price and Unemployment.
a. Linear regression model:
Accuracy: 12.820941389380858
Mean Absolute Error: 451424.22420130763
Mean Squared Error: 294645504043.84894
Root Mean Squared Error: 542812.5864825252
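The error metrics above can be sketched for a linear regression as follows. This uses ordinary least squares from NumPy on synthetic data rather than the project's actual model or dataset. The report does not say how "Accuracy" was computed, so the R-squared value here is only one possible reading.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic stand-ins for Temperature, CPI, Fuel_Price and Unemployment.
X = rng.normal(size=(n, 4))
true_coef = np.array([1.0, -2.0, 0.5, 3.0])
y = X @ true_coef + 10.0 + rng.normal(scale=0.1, size=n)

# Simple split: first 80% of rows for fitting, the rest for evaluation.
split = int(0.8 * n)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

# Ordinary least squares with an intercept column.
A_train = np.column_stack([np.ones(len(X_train)), X_train])
beta, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)

# Predictions on the held-out rows.
A_test = np.column_stack([np.ones(len(X_test)), X_test])
pred = A_test @ beta

# Error metrics on the held-out rows.
errors = y_test - pred
mae = np.mean(np.abs(errors))
mse = np.mean(errors ** 2)
rmse = np.sqrt(mse)
r2 = 1.0 - np.sum(errors ** 2) / np.sum((y_test - y_test.mean()) ** 2)
```

RMSE is the square root of MSE, so it is expressed in the same units as Weekly_Sales and is the easier of the two to compare with typical weekly sales values.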