Descriptive Analytics Assignment
You have been assigned as a business analyst for a leading online retail company specializing in furniture and office supplies. A dataset of
51,290 transactions between 2013 and 2016 has been given to you. The dataset contains the columns Row ID, Order ID, Order Date, Ship Date,
Ship Mode, Customer ID, Customer Name, Segment, Postal Code, City, State, Country, Region, Market, Product ID, Category, Sub-Category,
Product Name, Sales, Quantity, Discount, Profit, Shipping Cost, Order Priority. Use the dataset to derive marketing strategy insights, and make a
report by answering the following questions:
1. Use appropriate visualization techniques to summarize each of the variables and to construct hypothesis statements. Develop at least five
hypothesis statements.
2. Use appropriate analytic techniques to test the hypothesis statements. Explain and interpret your results in detail.
3. Provide recommendations to the company relating to marketing strategies that it should adopt based on the results of the analysis.
Sales vs Count
Conclusion
A sample of 5129 records was taken for analysis.
The mean, standard error, median, mode, standard deviation, sample variance, kurtosis and skewness of Sales are higher.
This difference in LOS was impacting the overall statistics of the hospital as well.
Box plots revealed outliers in each key variable. These outliers were removed using Z-score computation,
which improved the quality of the data set.
The statistics were then recomputed after the outliers were removed and compared as below:
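As an illustration of the Z-score step, the following Python sketch drops values whose absolute Z-score exceeds a cutoff. The cutoff of 3 and the sample figures are assumptions made for this example; the report does not state the threshold that was used.

```python
from statistics import mean, stdev


def remove_outliers_zscore(values, threshold=3.0):
    """Return the values whose absolute Z-score is at most `threshold`."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    if sigma == 0:
        return list(values)  # all values identical, nothing to remove
    return [v for v in values if abs(v - mu) / sigma <= threshold]


# Made-up Sales figures: 30 ordinary values plus one extreme value.
sales = [100.0 + (i % 7) for i in range(30)] + [5000.0]

# threshold=3.0 is an assumed cutoff, not one taken from the report.
cleaned = remove_outliers_zscore(sales, threshold=3.0)
print(len(sales), len(cleaned))
```

A single pass is shown here. Because removing outliers changes the mean and standard deviation, repeating the filter could remove further values.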
[Figure "Tables": image not recovered]
Sales vs Discount
Scatter plot between Sales and Discount – Sales appears to be independent of Discount. Sales do not increase as the
percentage of discount given increases.
[Figure "Tables": image not recovered]
Histogram of Sales (before removing outliers) – The data is heavily skewed to the right.
Histogram of Sales (after removing outliers) – The data is still skewed to the right, but the values are more evenly spread.
Histogram of Profit (before removing outliers) – The data appears to be approximately normally distributed.
Histogram of Profit (after removing outliers) – Removing the outliers has reduced the number of extreme values in the data.
Bar graph of Sales vs Sub-Category – Chairs, Bookcases and Phones seem to be in higher demand than other items.
Stacked bar chart of Sub-Category vs Shipping Mode – First Class shipment is most in demand for Art, Binders and Storage items.
Shipping Mode vs Items – First Class or Same Day shipping is used mainly for orders below 1000, due to the cost factor.
Order Priority vs Segment – The Consumer segment is given higher priority for critical order shipments.
Box plot of Category vs Sales (before removing outliers) – A few outliers are visible in the Technology and Office Supplies categories.
Box plot of Category vs Sales (after removing outliers) – There is a significant difference in the data, and most of the extreme values have been removed.
Sub-Category vs Profit – Phones, Chairs and Copiers show higher profit. Sub-categories such as Binders, Phones and Copiers also show some extreme
Profit values caused by the outliers present.
Sub-Category vs Profit (after removing outliers) – There is a significant decrease in extreme or outlier values. For Tables, almost no
outliers or extreme values remain.
Correlogram of the sample data – There is a strong positive correlation between Sales and Shipping Cost, and a strong negative correlation between
Discount and Profit.
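A correlogram is built from pairwise Pearson correlation coefficients. A minimal Python sketch of that calculation follows. The column values are made up for illustration and are not taken from the dataset, and pearson_r is a helper name introduced for this example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up values, used only to show the shape of the calculation.
columns = {
    "Sales": [120.0, 340.0, 90.0, 560.0, 210.0, 430.0],
    "Shipping Cost": [12.0, 35.0, 8.0, 61.0, 20.0, 44.0],
    "Discount": [0.0, 0.1, 0.4, 0.0, 0.2, 0.5],
    "Profit": [30.0, 60.0, -15.0, 140.0, 20.0, -40.0],
}

names = list(columns)
# Correlation matrix stored as {(row_name, column_name): r}.
corr = {(a, b): pearson_r(columns[a], columns[b]) for a in names for b in names}

for a in names:
    print(a.ljust(14), " ".join(f"{corr[(a, b)]:6.2f}" for b in names))
```

Values near +1 or -1 indicate a strong linear relationship, and values near 0 indicate little linear relationship.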
Model
The model is a multiple linear regression built in Excel, using a 0.05 level of significance (alpha). Based on the coefficient of determination (R
square), 92% of the variation in Sales is explained by the independent variables used in the model, so the model is a good fit.
Hypothesis statements
1.) Mean Sales is equal across all Regions
2.) Mean Shipping Cost is equal across all Regions
3.) Mean Profit is equal across all Regions
4.) Profit is independent of Discount, Sales and Quantity
5.) Shipping Cost is independent of Country and SubCategory
As the P value is 4.34292487440634E-09, which is less than 0.05, we reject the null hypothesis and accept the alternative hypothesis.
As the P value is 0.99, which is greater than 0.05, we fail to reject the null hypothesis.
Hypothesis 3: Mean Profit is equal across all Regions
ANOVA
Source of Variation   SS            df     MS            F             P-value       F crit
Between Groups        3709487.433   22     168613.0652   6.756952375   1.98104E-20   1.544230836
Within Groups         121176678.3   4856   24954.01119
As the P value is 1.98103919419889E-20 which is less than 0.05, we reject the null hypothesis and accept the alternate hypothesis.
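The F statistic in the ANOVA table is the ratio of the between-group mean square to the within-group mean square. The Python sketch below recomputes it from the SS and df values reported above, and also runs the same calculation on three small made-up groups. one_way_anova is a helper name introduced for this example.

```python
def one_way_anova(groups):
    """Return (ss_between, ss_within, df_between, df_within, f_stat)."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    ss_within = sum(
        (v - m) ** 2 for g, m in zip(groups, group_means) for v in g
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_stat


# 1) Recompute F from the values reported in the ANOVA table.
ms_between = 3709487.433 / 22
ms_within = 121176678.3 / 4856
f_reported = ms_between / ms_within
print(round(f_reported, 4))

# 2) Same calculation on made-up Profit values for three groups.
toy_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
print(one_way_anova(toy_groups))
```

The null hypothesis of equal means is rejected when F exceeds F crit, which is equivalent to the P value being below alpha.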
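Hypothesis 4: Profit is independent of Discount, Sales and Quantity
H0 - Profit does not depend on Discount, Sales or Quantity (all regression coefficients are zero)
H1 - Profit depends on at least one of Discount, Sales and Quantity
Sales and Discount significantly affect Profit (P value below 0.05); Quantity does not (P value 0.255).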
Regression Statistics
Multiple R          0.483848399
R Square            0.234109273
Adjusted R Square   0.233660947
Standard Error      137.6369678
Observations        5129
ANOVA
             df     SS            MS            F            Significance F
Regression   3      29676717.88   9892239.294   522.185034   4.1619E-296
Residual     5125   97087666.4    18943.93491
Total        5128   126764384.3
            Coefficients    Standard Error   t Stat         P-value       Lower 95%      Upper 95%
Intercept   35.49332164     3.755206698      9.451762443    4.93786E-21   28.13151313    42.85513014
Sales       0.096358198     0.003532539      27.27732036    4.2666E-153   0.089432913    0.103283482
Quantity    1.003829029     0.881729614      1.138477162    0.254974505   -0.72473749    2.732395548
Discount    -227.8011358    9.123190008      -24.96946085   5.8321E-130   -245.6864836   -209.915788
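Sales and Discount have a significant effect on Profit (P value below 0.05), while Quantity does not (P value 0.255). Together the three variables
explain only 23.41% of the variation in Profit. As the overall P value (Significance F) is less than 0.05, we reject the null hypothesis: Profit is not
independent of Discount, Sales and Quantity.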
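R Square, Adjusted R Square and F in the Excel output follow directly from the sums of squares in the regression ANOVA table. As a check, the short Python sketch below recomputes them from the reported values. The variable names are chosen for this example.

```python
# Values copied from the regression output above.
ss_regression = 29676717.88
ss_residual = 97087666.4
n_obs = 5129       # Observations
k_predictors = 3   # Sales, Quantity, Discount

ss_total = ss_regression + ss_residual
df_residual = n_obs - k_predictors - 1

# Share of the variation in Profit explained by the model.
r_square = ss_regression / ss_total

# Adjusted R Square penalises the number of predictors.
adj_r_square = 1 - (1 - r_square) * (n_obs - 1) / df_residual

# Overall F statistic: regression mean square over residual mean square.
f_stat = (ss_regression / k_predictors) / (ss_residual / df_residual)

# t Stat for each coefficient is the coefficient divided by its standard error.
t_quantity = 1.003829029 / 0.881729614

print(round(r_square, 6), round(adj_r_square, 6), round(f_stat, 3), round(t_quantity, 3))
```

The small t Stat for Quantity is consistent with its P value of about 0.255, which is above the 0.05 level of significance.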
Hypothesis 5: Shipping Cost is independent of Country and SubCategory
Pr(>|t|)
(Intercept) 0.332639
Sub-Category : Appliances 0.255826
Sub-Category : Art 0.003758 **
Sub-Category : Binders 0.000448 ***
Sub-Category : Bookcases 0.140265
Sub-Category : Chairs 0.100037
Sub-Category : Copiers 4.18e-05 ***
Sub-Category : Envelopes 0.018414 *
Sub-Category : Fasteners 0.002851 **
Sub-Category : Furnishings 0.003676 **
Sub-Category : Labels 0.000587 ***
Sub-Category : Machines 0.76511
Sub-Category : Paper 0.015242 *
Sub-Category : Phones 0.236621
Sub-Category : Storage 0.106763
Sub-Category : Supplies 0.012435 *
Sub-Category : Tables < 2e-16 ***
Country : Albania 0.91281
Country : Algeria 0.738475
Country : Angola 0.951432
Country : Argentina 0.234835
Country : Australia 0.899099
Country : Austria 0.594137
Country : Azerbaijan 0.765025
Country : Bahrain 0.705502
Country : Bangladesh 0.981222
Country : Barbados 0.826301
Country : Belarus 0.250224
Country : Belgium 0.799339
Country : Benin 0.769308
Country : Bolivia 0.931652
Country : Bosnia and Herzegovina 0.780198
Country : Brazil 0.37474
Country : Bulgaria 0.818955
Country : Burkina Faso 0.007239 **
Country : Cambodia 0.316617
Country : Cameroon 0.746526
Country : Canada 0.909344
Country : Chile 0.706947
Country : China 0.543633
Country : Colombia 0.734385
Country : Costa Rica 0.746996
Country : Cote d'Ivoire 0.817516
Country : Croatia 0.974605
Country : Cuba 0.660458
Country : Czech Republic 0.905662
Country : Democratic Republic of the Congo 0.831813
Country : Denmark 0.110653
Country : Dominican Republic 0.410668
Country : Ecuador 0.798754
Country : Egypt 0.754726
Country : El Salvador 0.955801
Country : Equatorial Guinea 0.664443
Country : Estonia 0.881587
Country : Finland 0.984997
Country : France 0.991896
Country : Gabon 0.885618
Country : Georgia 0.485149
Country : Germany 0.930062
Country : Ghana 0.967607
Country : Guatemala 0.405258
Country : Guinea 0.961298
Country : Guinea-Bissau 0.880244
Country : Guyana 0.817636
Country : Haiti 0.096429 .
Country : Honduras 0.199614
Country : Hong Kong 0.865287
Country : Hungary 0.76052
Country : India 0.774885
Country : Indonesia 0.672151
Country : Iran 0.696326
Country : Iraq 0.986082
Country : Ireland 0.190443
Country : Israel 0.85794
Country : Italy 0.408283
Country : Jamaica 0.720221
Country : Japan 0.657808
Country : Jordan 0.908563
Country : Kazakhstan 0.071053 .
Country : Kenya 0.952589
Country : Kyrgyzstan 0.611104
Country : Lesotho 0.890601
Country : Liberia 0.121887
Country : Libya 0.851323
Country : Lithuania 0.272273
Country : Luxembourg 0.556419
Country : Madagascar 0.811705
Country : Malaysia 0.433257
Country : Mali 0.826235
Country : Martinique 0.402828
Country : Mauritania 0.395629
Country : Mexico 0.931423
Country : Moldova 0.789967
Country : Mongolia 0.802962
Country : Montenegro 0.881106
Country : Morocco 0.53037
Country : Mozambique 0.534544
Country : Myanmar (Burma) 0.506389
Country : Namibia 0.74116
Country : Nepal 0.989618
Country : Netherlands 0.016850 *
Country : New Zealand 0.8569
Country : Nicaragua 0.823594
Country : Niger 0.942059
Country : Nigeria 0.011995 *
Country : Norway 0.485229
Country : Pakistan 0.038315 *
Country : Panama 0.108455
Country : Papua New Guinea 0.961708
Country : Paraguay 0.944253
Country : Peru 0.244631
Country : Philippines 0.10966
Country : Poland 0.940954
Country : Portugal 0.032413 *
Country : Qatar 0.888379
Country : Romania 0.46974
Country : Russia 0.921857
Country : Rwanda 0.705098
Country : Saudi Arabia 0.914051
Country : Senegal 0.93805
Country : Sierra Leone 0.916336
Country : Singapore 0.845836
Country : Slovakia 0.884829
Country : Slovenia 0.000184 ***
Country : Somalia 0.936314
Country : South Africa 0.828634
Country : South Korea 0.088749 .
Country : Spain 0.925237
Country : Sri Lanka 0.961813
Country : Sudan 0.619647
Country : Sweden 0.120765
Country : Switzerland 0.215018
Country : Syria 0.988224
Country : Taiwan 0.289607
Country : Tanzania 0.964852
Country : Thailand 0.213359
Country : Togo 0.769891
Country : Trinidad and Tobago 0.764979
Country : Tunisia 0.106301
Country : Turkey 0.044371 *
Country : Turkmenistan 0.709195
Country : Uganda 0.008655 **
Country : Ukraine 0.999113
Country : United Arab Emirates 0.131181
Country : United Kingdom 0.722631
Country : United States 0.888192
Country : Uruguay 0.910498
Country : Uzbekistan 0.81616
Country : Venezuela 0.354139
Country : Vietnam 0.483576
Country : Yemen 1.57e-05 ***
Country : Zambia 0.996456
Country : Zimbabwe 0.166744
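The symbols beside the p-values follow R's standard significance codes: *** below 0.001, ** below 0.01, * below 0.05 and . below 0.1. The Python sketch below applies that mapping to a few of the values listed above. signif_code is a helper name introduced for this example.

```python
def signif_code(p_value):
    """Map a p-value to R's standard significance symbol."""
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    if p_value < 0.1:
        return "."
    return ""


# A few p-values copied from the table above.
sample = {
    "Sub-Category : Art": 0.003758,
    "Sub-Category : Binders": 0.000448,
    "Sub-Category : Machines": 0.76511,
    "Country : Haiti": 0.096429,
    "Country : Turkey": 0.044371,
}

for term, p in sample.items():
    print(f"{term:<26} {p:<10} {signif_code(p)}")

# Terms that are significant at the 0.05 level used in this report.
significant = [term for term, p in sample.items() if p < 0.05]
print(significant)
```

With this many coefficients tested at once, a few P values below 0.05 can occur by chance, so individual country terms should be read with caution.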
The dataset contains a sufficient number of records for the study. Of the 5129 records available, the outliers were removed to obtain better data quality.
Sales of the goods depend on various factors such as price, shipping cost, quantity, region and sub-category.
The model has an R square of 92% (86% adjusted), so only a few factors outside the model remain that will impact Sales.