
Extended Project - Wholesales Customer

Uploaded by

Nandini Priya

PDS GRADED PROJECT – GUIDED

REPORT SUBMITTED BY NANDINI PRIYA M

Table of Contents

1. Understanding the structure of the data

1.1. Load the data

1.2. Check the structure of the data

1.3. Check the types of the data

1.4. Check for missing values

1.5. Check the statistical summary

2. Univariate Analysis

2.1. Explore all the variables and provide observations on the distributions of all the relevant variables in the dataset

2.2. Explore all the categorical variables and provide observations on their frequency

2.3. Find the distribution of spending across all categories

2.4. Check for any outliers in the data

3. Multivariate Analysis

3.1. Find the total spending across regions

3.2. Find the total spending of all the channels

3.3. Find the total spending across regions via different channels

3.4. Find the total spending on each of the categories across different regions and channels

3.5. Do the item varieties show similar behaviour across region and channel?

3.6. Is there any correlation between the different item varieties in terms of spending?

4. Conclusion and Recommendations

4.1. Conclude with the key insights/observations

4.2. Mention recommendations for the business


1. Structure of the data

1.1 Import the required libraries

1.2 Dataset shape, datatypes


(440, 9)

There are 440 rows and 9 columns in the dataset.

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 440 entries, 0 to 439
Data columns (total 9 columns):
 #   Column            Non-Null Count  Dtype
---  ------            --------------  -----
 0   Buyer/Spender     440 non-null    int64
 1   Channel           440 non-null    object
 2   Region            440 non-null    object
 3   Fresh             440 non-null    int64
 4   Milk              440 non-null    int64
 5   Grocery           440 non-null    int64
 6   Frozen            440 non-null    int64
 7   Detergents_Paper  440 non-null    int64
 8   Delicatessen      440 non-null    int64
dtypes: int64(7), object(2)

Based on the output above, the numerical (int64) columns are:

• Buyer/Spender
• Fresh
• Milk
• Grocery
• Frozen
• Detergents_Paper
• Delicatessen

The categorical (object) columns are:

• Channel
• Region
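
A minimal sketch of how the shape, structure and column types above could be obtained with pandas. The small frame below is a hypothetical stand-in with made-up values; the real 440-row dataset would normally be loaded from the project's data file instead.

```python
import pandas as pd

# Hypothetical stand-in for the real dataset: same column names, made-up values.
df = pd.DataFrame({
    "Buyer/Spender": [1, 2, 3, 4],
    "Channel": ["Retail", "Hotel", "Hotel", "Retail"],
    "Region": ["Other", "Lisbon", "Oporto", "Other"],
    "Fresh": [1200, 700, 1300, 900],
    "Milk": [960, 980, 120, 820],
    "Grocery": [750, 950, 420, 610],
    "Frozen": [20, 170, 640, 60],
    "Detergents_Paper": [260, 320, 50, 170],
    "Delicatessen": [130, 170, 180, 140],
})

print(df.shape)  # (number of rows, number of columns)
df.info()        # column names, non-null counts and dtypes

# Split columns by type: numeric columns vs. everything else (text labels here).
numeric_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()
print("Numerical:", numeric_cols)
print("Categorical:", categorical_cols)
```

Selecting by `exclude="number"` rather than by a specific text dtype is a design choice here, so the split does not depend on how a given pandas version stores strings.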

1.3 Statistical summary and check for missing values
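
The summary and missing-value check for this step can be sketched as follows. The data is a hypothetical sample with made-up values, not the report's dataset.

```python
import pandas as pd

# Hypothetical sample with made-up values; the real dataset has 440 rows.
df = pd.DataFrame({
    "Channel": ["Retail", "Hotel", "Hotel", "Retail"],
    "Fresh": [100, 200, 300, 4000],
    "Milk": [50, 60, 70, 900],
})

summary = df.describe()      # count, mean, std, min, quartiles, max (numeric columns only)
missing = df.isnull().sum()  # number of missing values per column

print(summary)
print(missing)
```

The `50%` row of `describe()` is the median, which is the more robust centre when a few large values pull the mean up.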

2. Univariate Analysis

2.1 Explore all the variables and provide observations


Observations
1. Fresh: The distribution of the 'Fresh' variable is right-skewed, indicating that most
customers purchase a relatively small amount of fresh products.
2. Milk, Grocery, Frozen, Detergents_Paper, Delicatessen: These variables also exhibit right-
skewed distributions, suggesting that a majority of customers tend to purchase smaller
quantities of these products.
3. The presence of outliers is evident in some of the histograms, particularly for variables like
'Grocery' and 'Detergents_Paper'. These outliers represent customers with unusually high
purchases for these categories.
4. The overall shape of the histograms suggests that the purchasing behavior of customers
varies significantly across different product categories.
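
Right skew can also be checked numerically rather than only by eye. The sketch below uses made-up values and two of the six categories; a positive skewness and a mean above the median are both consistent with a right-skewed distribution.

```python
import pandas as pd

# Hypothetical spending values: most customers spend little, one spends a lot.
df = pd.DataFrame({
    "Fresh": [100, 200, 300, 400, 5000],
    "Milk": [50, 60, 70, 80, 900],
})

skewness = df.skew()                      # positive values indicate a longer right tail
mean_vs_median = df.mean() - df.median()  # positive when large values pull the mean up

print(skewness)
print(mean_vs_median)
```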

2.2 Explore all the categorical variables and provide observations on their frequency
Observations

1. Region:
• The "Lisbon" region has the highest number of customers.
• "Oporto" and "Other" regions have a similar number of customers.

2. Channel:
• The "Hotel/Restaurant/Cafe" channel has the highest number of customers.
• "Retailer" and "Direct" channels have a similar number of customers.
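
Frequencies of the categorical variables can be tabulated with `value_counts`. This is a sketch on hypothetical labels; the counts it prints say nothing about the real dataset.

```python
import pandas as pd

# Hypothetical category labels, for illustration only.
df = pd.DataFrame({
    "Region": ["Other", "Other", "Lisbon", "Oporto", "Other"],
    "Channel": ["Hotel", "Retail", "Hotel", "Hotel", "Retail"],
})

region_counts = df["Region"].value_counts()    # customers per region, sorted descending
channel_counts = df["Channel"].value_counts()  # customers per channel, sorted descending

print(region_counts)
print(channel_counts)
```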

2.3 Find the distribution of spending across all categories


Observation
1. Milk has a median spending of 3627
2. Grocery has a median spending of 4755
3. Fresh has a median spending of 8504
4. Detergents_Paper has a median spending of 816
5. Delicatessen has a median spending of 965
6. Outliers are present in all categories. Among the categories listed, Fresh has the highest median
spending.
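
Median spending per category can be computed directly and ranked. The sketch below assumes the six category column names from the dataset and uses made-up values.

```python
import pandas as pd

categories = ["Fresh", "Milk", "Grocery", "Frozen", "Detergents_Paper", "Delicatessen"]

# Hypothetical spending values, three customers only.
df = pd.DataFrame({
    "Fresh": [100, 800, 900],
    "Milk": [300, 350, 400],
    "Grocery": [400, 450, 500],
    "Frozen": [100, 150, 200],
    "Detergents_Paper": [70, 80, 90],
    "Delicatessen": [80, 90, 100],
})

# Median per category, largest first.
medians = df[categories].median().sort_values(ascending=False)
print(medians)
```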
2.4 Check for any outliers in the data
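
One common way to count outliers is the 1.5 × IQR rule that box plots use. The report does not state which rule was applied, so the rule and the made-up data below are assumptions for illustration.

```python
import pandas as pd

# Hypothetical spending values; Fresh contains one unusually large purchase.
df = pd.DataFrame({
    "Fresh": [100, 200, 300, 400, 5000],
    "Milk": [50, 60, 70, 80, 90],
})

q1 = df.quantile(0.25)
q3 = df.quantile(0.75)
iqr = q3 - q1

# A value is flagged when it lies more than 1.5 * IQR outside the quartiles.
outlier_mask = (df < q1 - 1.5 * iqr) | (df > q3 + 1.5 * iqr)
outlier_counts = outlier_mask.sum()  # number of flagged values per category

print(outlier_counts)
```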

3. Multivariate Analysis

3.1 Find the total spending across regions


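Observation - Total spending is highest in the Other region, followed by Lisbon and Oporto. This ordering is the one implied by the per-channel breakdown, where Other leads Lisbon and Oporto in both the Hotel and the Retail channel.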
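
Total spending per region can be sketched as a row-wise sum followed by a group-by. The `Total` column name and the values are hypothetical, and only three of the six categories are included to keep the example short.

```python
import pandas as pd

categories = ["Fresh", "Milk", "Grocery"]  # subset of the six categories, for brevity

# Hypothetical customers with made-up spending.
df = pd.DataFrame({
    "Region": ["Other", "Lisbon", "Oporto", "Other"],
    "Fresh": [100, 200, 300, 400],
    "Milk": [10, 20, 30, 40],
    "Grocery": [1, 2, 3, 4],
})

# Total spending per customer, then summed per region and ranked.
df["Total"] = df[categories].sum(axis=1)
region_totals = df.groupby("Region")["Total"].sum().sort_values(ascending=False)

print(region_totals)
```

Grouping by `Channel` instead of `Region` gives the per-channel totals in the same way.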

3.2 Find the total spending of all the channels


Observation - Total spending is highest in the Retail channel.

3.3 Find the total spending across regions via different channels

Observation - Among the channels, Retail has the highest spending.


Hotel: total spending is highest in Other, followed by Lisbon and Oporto.
Retail: total spending is highest in Other, followed by Lisbon and Oporto.
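
A channel-by-region breakdown like the one above can be laid out with a pivot table. This sketch assumes a precomputed, hypothetical `Total` column holding each customer's spending across all categories; the values are made up.

```python
import pandas as pd

# Hypothetical per-customer totals.
df = pd.DataFrame({
    "Region": ["Other", "Other", "Lisbon", "Lisbon", "Oporto", "Oporto"],
    "Channel": ["Hotel", "Retail", "Hotel", "Retail", "Hotel", "Retail"],
    "Total": [600, 500, 300, 200, 150, 100],
})

# Rows are channels, columns are regions, cells are summed spending.
pivot = df.pivot_table(index="Channel", columns="Region", values="Total", aggfunc="sum")
channel_totals = pivot.sum(axis=1)  # total per channel across all regions

print(pivot)
print(channel_totals)
```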

3.4 Find the total spending on each of the categories across different region and
channels
Observation

• For Fresh, Hotel has the highest total spending, followed by Retail.
• In Retail, total spending is highest in Other, followed by Oporto.
• In Hotel, total spending is highest in Other, followed by Lisbon.
• The Other region performs well in all channels.
Observation

• For Milk, Retail has the highest total spending, followed by Hotel.
• In Retail, total spending is highest in Other, followed by Lisbon.
• In Hotel, total spending is highest in Lisbon, followed by Other.
Observation

• For Grocery, Retail has the highest total spending, followed by Hotel.
• In Retail, total spending is highest in Lisbon, followed by Other.
• In Hotel, total spending is highest in Oporto, followed by Lisbon.
Observation

• For Frozen, Hotel has the highest total spending, followed by Retail.
• In Retail, total spending is highest in Lisbon.
• In Hotel, total spending is highest in Oporto, followed by Other.
Observation

• For Detergents_Paper, Retail has the highest total spending, followed by Hotel.
• In Retail, total spending is highest in Oporto, followed by Lisbon.
• In Hotel, total spending is highest in Lisbon.
Observation

• For Delicatessen, Retail has the highest total spending, followed by Hotel.
• In Retail, total spending is highest in Lisbon, followed by Other.
• In Hotel, total spending is highest in Other, followed by Lisbon.
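
Per-category comparisons of this kind can be produced by grouping on channel and region together. The sketch uses two categories and made-up values, so its output does not reproduce the observations above.

```python
import pandas as pd

categories = ["Fresh", "Milk"]  # subset of the six categories, for brevity

# Hypothetical spending by channel and region.
df = pd.DataFrame({
    "Channel": ["Hotel", "Hotel", "Retail", "Retail"],
    "Region": ["Other", "Lisbon", "Other", "Lisbon"],
    "Fresh": [500, 300, 200, 100],
    "Milk": [100, 50, 400, 300],
})

# Total per category for every (channel, region) pair.
by_group = df.groupby(["Channel", "Region"])[categories].sum()

# Total per category for each channel, and the leading channel per category.
channel_totals = df.groupby("Channel")[categories].sum()
top_channel = channel_totals.idxmax()

print(by_group)
print(top_channel)
```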

3.5 Do the item varieties show similar behavior across region and channel?
By region
By channel

Observation

• All item categories show similar behavior across regions.
• Milk, Grocery and Detergents_Paper behave similarly to one another across channels. Frozen and
Delicatessen form a second group with similar behavior across channels.

3.6 Is there any correlation between the different item varieties in terms of
spending?
Observation

• Milk and Grocery have a high positive correlation (0.72).
• Grocery and Detergents_Paper have a very high positive correlation (0.92).
• Grocery has the lowest coefficient of variation, indicating that its spending is relatively consistent
across different regions and channels.
• Delicatessen has the highest coefficient of variation, suggesting that its spending varies
significantly across different regions and channels.
• Detergents_Paper and Frozen have moderate coefficients of variation, indicating some variation
in spending but not as much as Delicatessen.
• Milk has a relatively low coefficient of variation, similar to Grocery, suggesting that its spending is
also somewhat consistent across different regions and channels.
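
Both measures used above can be sketched in a few lines. The report does not say how the coefficient of variation was computed, so the definition below (sample standard deviation divided by mean, per category) is an assumption, and the data is made up.

```python
import pandas as pd

# Hypothetical spending; Detergents_Paper is exactly half of Grocery in every row.
df = pd.DataFrame({
    "Milk": [100, 300, 200, 500],
    "Grocery": [200, 400, 300, 900],
    "Detergents_Paper": [100, 200, 150, 450],
})

corr = df.corr()           # pairwise Pearson correlation (the pandas default)
cv = df.std() / df.mean()  # coefficient of variation: relative spread per category

print(corr.round(2))
print(cv.round(2))
```

Because the coefficient of variation divides by the mean, it allows categories with very different spending levels to be compared on the same scale.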

4. Conclusion and Recommendations

4.1 Conclude with the key insights/observations

1. Spending patterns across regions:
• Region Other has the highest total spending, followed by Lisbon and Oporto, in line with the
per-channel breakdown in section 3.
• The spending patterns for different item categories vary across regions.
2. Spending patterns across channels:
• The Retail channel has the highest total spending, followed by the Hotel channel.
• The spending patterns for different item categories vary across channels.
3. Spending patterns by region and channel:
• The combination of Region Other and the Retail channel has the highest spending for most
item categories.
• There are some exceptions: Region Other with the Hotel channel has the highest spending
for Fresh, Region Lisbon with the Retail channel for Grocery, and Region Oporto with the
Hotel channel for Frozen.
4. Distribution of spending across categories:
• Spending on every item category is right-skewed, with a few high-spending
customers driving the average spending up.
• This suggests that there is an opportunity to target these high-spending customers with
personalized marketing campaigns.
5. Outliers in spending:
• There are a few outliers in the spending data for each item category.
• These outliers could be due to data entry errors or unusual customer behavior.
6. Coefficient of variation:
• The coefficient of variation is highest for the Delicatessen category, indicating that
spending on this category is the most variable.
• This suggests that there is an opportunity to increase sales of Delicatessen products by
targeting customers who are more likely to spend on this category.
7. Correlation:
• Milk and Grocery have a high positive correlation (0.72).
• Grocery and Detergents_Paper have a very high positive correlation (0.92).

4.2 Mention recommendations for the business


Based on the analysis, here are some recommendations for the business:
1. Focus on increasing sales of high-spending items:
• Items such as Milk, Grocery and Fresh have higher spending than the other categories.
• Strategies like promotions, targeted marketing, and improved product placement
can be implemented to boost their sales.
2. Analyze regional spending patterns:
• Spending varies across regions.
• Tailor marketing strategies and product offerings to meet the specific needs and
preferences of each region.
3. Explore opportunities in the Hotel channel:
• The Hotel channel shows promising spending potential.
• Develop strategies to increase sales in this channel, such as offering tailored product
bundles or discounts.
4. Address outliers in spending:
• Investigate the reasons behind unusually high or low spending for certain customers
or items.
• Implement measures to address any underlying issues or capitalize on opportunities.
5. Monitor spending trends over time:
• Regularly track spending patterns to identify changes and emerging trends.
• Use this information to adjust strategies and make informed business decisions.
6. Conduct further analysis:
• Explore the relationship between spending and other factors like customer
demographics, purchase frequency, and seasonality.
• Use techniques such as segmentation and predictive analytics to gain
deeper insights into spending patterns.

By implementing these recommendations, the business can optimize its product offerings,
marketing strategies, and sales channels to increase revenue and profitability.
