Case Study Report
Introduction
This case study aims to analyze the Amazon sales data to uncover insights into customer
demographics, product preferences, and sales performance. The analysis focuses on identifying
key trends and patterns that can inform business strategies and decision-making.
Data Overview
The data consists of various features related to Amazon sales, such as order ID, order date, ship
date, ship mode, customer ID, segment, country, city, state, postal code, region, product ID,
category, sub-category, product name, sales, quantity, discount, and profit.
Objectives
Methodology
# Descriptive statistics (df is the sales DataFrame, assumed already loaded with pandas)
df.describe()
# Ten states with the most orders
top_10_states = df['ship-state'].value_counts().head(10)
Visualization:
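import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=(12, 6))
# Count plot restricted to rows from the ten most frequent states
sns.countplot(data=df[df['ship-state'].isin(top_10_states.index)], x='ship-state')
plt.xlabel('ship-state')
plt.ylabel('count')
plt.title('Distribution of State')
plt.xticks(rotation=45)
plt.show()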
Insights
Top 10 States: The analysis identifies the 10 states with the most orders. A count plot
is used to visualize the distribution, showing that Maharashtra has the highest number of
orders.
Customer Base: The data reveals a significant customer base in Maharashtra.
Product Preferences: T-shirts are highly demanded, with M-size being the most preferred choice
among buyers.
Order Fulfillment: Orders are primarily fulfilled through Amazon, highlighting its role as a crucial
distribution channel.
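The product and fulfillment insights above come down to simple frequency counts. The following is a minimal sketch of how such counts could be computed, assuming the DataFrame has columns named Category, Size, and Fulfilment. These column names, the helper most_common, and the toy rows are assumptions for illustration and are not taken from the report.

```python
import pandas as pd

# Toy stand-in for the sales data; column names and rows are hypothetical.
df = pd.DataFrame({
    "Category": ["T-shirt", "T-shirt", "Shirt", "T-shirt", "Blazer"],
    "Size": ["M", "L", "M", "M", "S"],
    "Fulfilment": ["Amazon", "Amazon", "Merchant", "Amazon", "Merchant"],
})


def most_common(frame, column):
    """Return the most frequent value in a column and its share of all rows."""
    shares = frame[column].value_counts(normalize=True)
    return shares.index[0], float(shares.iloc[0])


top_category, category_share = most_common(df, "Category")
top_size, size_share = most_common(df, "Size")
top_channel, channel_share = most_common(df, "Fulfilment")

print(f"Most ordered category: {top_category} ({category_share:.0%} of rows)")
print(f"Most ordered size: {top_size} ({size_share:.0%} of rows)")
print(f"Main fulfillment channel: {top_channel} ({channel_share:.0%} of rows)")
```

Reporting the share alongside the top value makes it easier to judge how dominant a category, size, or channel really is, rather than only which one ranks first.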
Conclusion
The data analysis reveals that the business has a significant customer base in Maharashtra,
mainly serves retailers, fulfills orders through Amazon, experiences high demand for T-shirts,
and sees M-size as the preferred choice among buyers. These insights can help the business
tailor its marketing strategies, optimize inventory management, and enhance customer
satisfaction.
Recommendations
A linear regression model could also be applied to the sales data above. Using linear
regression on an Amazon sales report involves these general steps:
1. Collect Data:
   - Gather historical sales data. This can include daily, weekly, or monthly sales
     figures, depending on the granularity you need.
   - Collect other relevant features that might influence sales, such as price,
     advertising spend, promotions, seasonality, and competitor prices.
2. Preprocess Data:
   - Clean the data by handling missing values, removing outliers, and ensuring
     consistency.
   - Encode categorical variables if necessary (e.g., product categories).
   - Normalize or standardize numerical features to improve the performance of the
     regression model.
3. Exploratory Data Analysis (EDA):
   - Visualize the data to understand trends, patterns, and relationships between
     variables.
   - Use plots like histograms, scatter plots, and correlation matrices.
4. Split the Data:
   - Divide the data into training and testing sets. A common split is 80% for training
     and 20% for testing.
5. Build the Linear Regression Model:
   - Use a machine learning library like scikit-learn in Python to create and train the
     linear regression model.
6. Evaluate the Model:
   - Assess the model's performance using metrics like Mean Absolute Error (MAE),
     Mean Squared Error (MSE), and R-squared.
   - Plot residuals to check for patterns that might indicate issues with the model.
7. Make Predictions:
   - Use the trained model to make predictions on new data or to forecast future sales.
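The steps above can be sketched with scikit-learn roughly as follows. This is a minimal illustration under stated assumptions, not the report's own model: the data is synthetic, and the columns price, ad_spend, category, and sales are hypothetical stand-ins for whatever features the real sales data provides.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Step 1: synthetic stand-in for historical sales data (all columns hypothetical).
rng = np.random.default_rng(0)
n = 500
data = pd.DataFrame({
    "price": rng.uniform(200, 1000, n),
    "ad_spend": rng.uniform(0, 500, n),
    "category": rng.choice(["T-shirt", "Shirt", "Kurta"], n),
})
# Sales follow a known linear rule plus noise, so the fit can be sanity-checked.
data["sales"] = (
    5000
    - 3.0 * data["price"]
    + 4.0 * data["ad_spend"]
    + (data["category"] == "T-shirt") * 300
    + rng.normal(0, 50, n)
)

X = data[["price", "ad_spend", "category"]]
y = data["sales"]

# Step 2: standardize numeric features and one-hot encode the categorical one.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["price", "ad_spend"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["category"]),
])

# Step 4: 80/20 train/test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Step 5: preprocessing and linear regression combined in one pipeline.
model = Pipeline([("preprocess", preprocess), ("regress", LinearRegression())])
model.fit(X_train, y_train)

# Step 6: evaluate on the held-out test set.
y_pred = model.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
residuals = y_test - y_pred  # plot against y_pred to look for leftover patterns
print(f"MAE={mae:.1f}  MSE={mse:.1f}  R^2={r2:.3f}")

# Step 7: predict sales for a new, hypothetical observation.
new_row = pd.DataFrame(
    {"price": [499.0], "ad_spend": [250.0], "category": ["T-shirt"]}
)
forecast = model.predict(new_row)[0]
print(f"Forecast sales: {forecast:.0f}")
```

Wrapping the preprocessing and the regression in a single Pipeline means the scaler and encoder are fitted on the training split only, which avoids leaking information from the test set into the model.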
Output:
The results of the Amazon sales report analysis are shown below. This output shows
which state in India purchased the most products.