
SMDM Project

Great Learning

Uploaded by

nishu.ashwin007

Problem 1: Wholesale Customers Analysis

Introduction

A wholesale distributor operating in different regions of Portugal has information on the annual
spending on several items in its stores across different regions and channels. The data consist of
440 large retailers’ annual spending on 6 different varieties of products in 3 different regions (Lisbon,
Oporto, Other) and across 2 sales channels (Hotel, Retail).

Data Description EDA Analysis:


1.1 Use methods of descriptive statistics to summarize data. Which Region and which
Channel spent the most? Which Region and which Channel spent the least?

Descriptive Statistics Summarize the data for Region & Channel


The data show that spending in the Other region is far higher than the spending in Lisbon and Oporto
combined. The Other region has the highest annual spending and Oporto has the lowest.
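
The totals behind this comparison can be sketched as below. This is a minimal sketch, not the code used for the report: the column names (Channel, Region, Fresh, Milk) are assumed, and the rows are hypothetical sample values, not the real 440-retailer data.

```python
import pandas as pd

# Hypothetical sample rows; the real dataset has 440 retailers and six item columns.
df = pd.DataFrame({
    "Channel": ["Hotel", "Retail", "Hotel", "Retail", "Hotel", "Retail"],
    "Region": ["Lisbon", "Lisbon", "Oporto", "Oporto", "Other", "Other"],
    "Fresh": [100, 50, 80, 40, 300, 200],
    "Milk": [20, 60, 10, 30, 90, 150],
})
item_cols = ["Fresh", "Milk"]  # assumed item columns; extend to all six for the real data

# Total annual spend per retailer, summed over the item columns.
df["Total"] = df[item_cols].sum(axis=1)

# Total spend per Region and per Channel, largest first.
region_totals = df.groupby("Region")["Total"].sum().sort_values(ascending=False)
channel_totals = df.groupby("Channel")["Total"].sum().sort_values(ascending=False)

print(region_totals)
print(channel_totals)
print("Region with highest spend:", region_totals.idxmax())
print("Region with lowest spend:", region_totals.idxmin())
```

Sorting the grouped totals puts the highest and lowest spender at the two ends of each result, which answers both parts of the question in one pass.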

1.2. There are 6 different varieties of items that are considered. Describe and
comment/explain all the varieties across Region and Channel? Provide a detailed
justification for your answer.

Answer 1.2: - The behavioural characteristics of the data can be determined by aggregating the data
using the describe function in Python. The data below show the count, mean, std, min, max and quartiles
of all the varieties across the three regions and two channels.
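
A minimal sketch of this aggregation, assuming pandas is the library behind the describe function and that the columns are named Region, Channel, Fresh and Milk. The rows are hypothetical sample values used only to show the shape of the result.

```python
import pandas as pd

# Hypothetical sample rows standing in for the real 440-retailer dataset.
df = pd.DataFrame({
    "Region": ["Lisbon", "Lisbon", "Lisbon", "Lisbon", "Other", "Other", "Other", "Other"],
    "Channel": ["Hotel", "Hotel", "Retail", "Retail", "Hotel", "Hotel", "Retail", "Retail"],
    "Fresh": [100, 200, 50, 70, 300, 500, 150, 250],
    "Milk": [20, 40, 60, 80, 90, 110, 150, 170],
})
item_cols = ["Fresh", "Milk"]  # assumed item columns; extend to all six for the real data

# One row per (Region, Channel) pair; the columns are (item, statistic) pairs
# with count, mean, std, min, 25%, 50%, 75% and max for each item.
summary = df.groupby(["Region", "Channel"])[item_cols].describe()

print(summary)
```

Grouping by Region alone or by Channel alone gives the coarser per-region and per-channel views.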
1.3 On the basis of the descriptive measure of variability, which item shows the most
inconsistent behaviour? Which item shows the least inconsistent behaviour?

Ans = The descriptive measure of variability used here is the coefficient of variation (standard
deviation divided by mean).

Using the coefficient of variation, the lowest value is for the category “Fresh” (1.05) and the highest
value is for the category “Delicatessen” (1.85). So the most inconsistent behaviour is shown by
Delicatessen and the least inconsistent behaviour is shown by Fresh.
Below is the output from Python:

Coefficient of Variation for Fresh is 1.0539179237473144
Coefficient of Variation for Milk is 1.2732985840065412
Coefficient of Variation for Frozen is 1.5803323836352914
Coefficient of Variation for Grocery is 1.1951743730016822
Coefficient of Variation for Detergents Paper is 1.654647138500516
Coefficient of Variation for Delicatessen is 1.849406898115838
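
The calculation can be sketched as follows. The helper name coefficient_of_variation is hypothetical, the values are made-up sample numbers, and the use of the sample standard deviation (the pandas default, ddof=1) is an assumption about how the figures above were produced.

```python
import pandas as pd


def coefficient_of_variation(series: pd.Series) -> float:
    """Sample standard deviation (ddof=1) divided by the mean."""
    return series.std() / series.mean()


# Hypothetical values for two items; the real data has six item columns.
df = pd.DataFrame({
    "Fresh": [10, 12, 11, 13],
    "Delicatessen": [1, 20, 2, 40],
})

# Apply the helper to every column and sort from least to most inconsistent.
cv = df.apply(coefficient_of_variation).sort_values()

for item, value in cv.items():
    print(f"Coefficient of Variation for {item} is {value}")

print("Least inconsistent:", cv.index[0])
print("Most inconsistent:", cv.index[-1])
```

Because the coefficient of variation divides by the mean, it allows items with very different spending levels to be compared on the same scale.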

1.4 Are there any outliers in the data? Back up your answer with a suitable plot/technique
with the help of detailed comments.

Ans - Several plots and methods are available to check for outliers in the dataset. Boxplots and
scatterplots can be used: a data point lying far from the other data points in the plot is considered
an outlier. We can also calculate the interquartile range (IQR) for each variable. A data point below
Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR is considered an outlier. The skewness and behavioural
characteristics seen in the answers above indicate that there are many outliers in the data. We can
confirm this with a boxplot of all the items.

The box plots clearly show that every item in the data has outliers. The median and IQR are also not
evenly placed for a few items, such as Detergents Paper. There are extreme outliers for all of these
items.
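
The IQR rule mentioned above can be sketched as below. The function name iqr_outliers and the sample values are hypothetical, and the multiplier 1.5 is the conventional boxplot choice, not a value stated in the analysis.

```python
import pandas as pd


def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the values lying more than k * IQR outside the quartiles."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    return series[(series < lower) | (series > upper)]


# Hypothetical spending values with one extreme point.
sample = pd.Series([10, 12, 11, 13, 12, 14, 11, 95], name="Fresh")

outliers = iqr_outliers(sample)
print(outliers)
print("Number of outliers:", len(outliers))
```

Running the function on each item column and counting the results gives a numeric check to set beside the box plots.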
1.5 On the basis of your analysis, what are your recommendations for the business? How
can your analysis help the business to solve its problem? Answer from the business
perspective.

Ans - Business Recommendations

1) The analysis shows inconsistencies in spending across the different items (measured by the
coefficient of variation), which should be minimized. Spending differs between the Hotel and Retail
channels and should be more or less equal; spending should also be balanced across the different
regions. Items other than “Fresh” and “Grocery” also need focus.

2) Overall sales in the Hotel channel are much higher than sales in the Retail channel. The
distributor may consider the Retail channel as a target area for further expansion and growth. Spend in
Hotel needs to be increased in Milk, Grocery and Detergents Paper. Spend in Retail needs to be
increased in Fresh, Frozen and Delicatessen.

3) The annual spending on Grocery items is directly proportional to the number of retailers in the
region, so the Retail channel should spend the most on Grocery items. This spending should be managed
carefully, as Grocery items are also very inconsistent.

4) The annual spending through both channels in all regions should be managed carefully,
especially for Fresh items. Fresh items have the highest standard deviation, even though they are the
least inconsistent item by coefficient of variation, so spending on this item should be managed carefully.

5) The data are not normally distributed, owing to the presence of many outliers. This indicates that a
large share of sales can be attributed to a few specific buyers. The business should give these buyers
additional consideration to ensure long-term retention.

6) There needs to be a focus on increasing the spend in the Lisbon and Oporto regions and in the Retail
channel, to balance the spend and reduce risk while growing the business.
