

SMDM PROJECT REPORT

DSBA

Name – Nandini Gupta

PGP-DSBA Online Oct’ 21
Date: 12/12/2021

Contents
Problem 1
1.1. Use methods of descriptive statistics to summarize data. Which Region and which Channel spent the most? Which Region and which Channel spent the least?
1.2. There are 6 different varieties of items that are considered. Describe and comment/explain all the varieties across Region and Channel. Provide a detailed justification for your answer.
1.3. On the basis of the descriptive measure of variability, which item shows the most inconsistent behaviour? Which item shows the least inconsistent behaviour?
1.4. Are there any outliers in the data? Back up your answer with a suitable plot/technique with the help of detailed comments.
1.5. On the basis of your analysis, what are your recommendations for the business? How can your analysis help the business to solve its problem? Answer from the business perspective.
Problem 2
2.1. For this data, construct the following contingency tables (keep Gender as the row variable)
2.1.1. Gender and Major
2.1.2. Gender and Grad Intention
2.1.3. Gender and Employment
2.1.4. Gender and Computer
2.2. Assume that the sample is representative of the population of CMSU. Based on the data, answer the following questions:
2.2.1. What is the probability that a randomly selected CMSU student will be male?
2.2.2. What is the probability that a randomly selected CMSU student will be female?
2.3. Assume that the sample is representative of the population of CMSU. Based on the data, answer the following questions:
2.3.1. Find the conditional probability of different majors among the male students in CMSU.
2.3.2. Find the conditional probability of different majors among the female students of CMSU.
2.4. Assume that the sample is representative of the population of CMSU. Based on the data, answer the following questions:
2.4.1. Find the probability that a randomly chosen student is a male and intends to graduate.
2.4.2. Find the probability that a randomly selected student is a female and does NOT have a laptop.
2.5. Assume that the sample is representative of the population of CMSU. Based on the data, answer the following questions:
2.5.1. Find the probability that a randomly chosen student is a male or has full-time employment.
2.5.2. Find the conditional probability that, given a female student is randomly chosen, she is majoring in international business or management.
2.6. Construct a contingency table of Gender and Intent to Graduate at 2 levels (Yes/No). The Undecided students are not considered now and the table is a 2x2 table. Do you think the graduate intention and being female are independent events?
2.7. Note that there are four numerical (continuous) variables in the data set: GPA, Salary, Spending, and Text Messages.
2.7.1. If a student is chosen randomly, what is the probability that his/her GPA is less than 3?
2.7.2. Find the conditional probability that a randomly selected male earns 50 or more. Find the conditional probability that a randomly selected female earns 50 or more.
2.8. Note that there are four numerical (continuous) variables in the data set: GPA, Salary, Spending, and Text Messages. For each of them, comment whether they follow a normal distribution. Write a note summarizing your conclusions.
Problem 3
3.1. Do you think there is evidence that the mean moisture contents in both types of shingles are within the permissible limits? State your conclusions clearly, showing all steps.
3.2. Do you think that the population means for shingles A and B are equal? Form the hypothesis and conduct the test of the hypothesis. What assumption do you need to check before the test for equality of means is performed?
Problem 1 Wholesale Customers Analysis
A wholesale distributor operating in different regions of Portugal has information on the annual spending on several items in its stores across different regions and channels. The data consists of 440 large retailers' annual spending on 6 different varieties of products in 3 different regions (Lisbon, Oporto, Other) and across 2 sales channels (Hotel, Retail).
This data set has 440 rows and 9 columns. It refers to customers of a wholesale distributor and records their annual spending in monetary units (m.u.) on different product categories. The following data dictionary describes the variables:

• FRESH: annual spending (m.u.) on fresh products (Continuous);
• MILK: annual spending (m.u.) on milk products (Continuous);
• GROCERY: annual spending (m.u.) on grocery products (Continuous);
• FROZEN: annual spending (m.u.) on frozen products (Continuous);
• DETERGENTS_PAPER: annual spending (m.u.) on detergents and paper products (Continuous);
• DELICATESSEN: annual spending (m.u.) on delicatessen products (Continuous);
• CHANNEL: customer's channel, Hotel (Hotel/Restaurant/Cafe) or Retail (Nominal);
• REGION: customer's region, Lisbon, Oporto or Other (Nominal);
• BUYER/SPENDER: a running ID number (assumed to be an index, so it is not analysed as a measurement).

Region frequency (total 440 rows)
Lisbon    77
Oporto    47
Other    316

Channel frequency (total 440 rows)
Hotel    298
Retail   142

The goal of this project is to analyse the data and answer the questions asked. There is no outcome to be predicted; the exploratory data analysis (EDA) only looks for patterns in the data.
1.1 Use methods of descriptive statistics to summarize data.
(a) Which Region and which Channel spent the most?
(b) Which Region and which Channel spent the least?
We first used the pandas describe() function in Python to look at the basic descriptive statistics of the dataset.
Sample of the Data:

Tab 1.1.1

Exploratory Analysis of the Data:

Tab 1.1.2
Descriptive Statistics of the Data:

Tab 1.1.3

Category Spend by Region:

Tab 1.1.4

Table 1.1.4 shows the total spend (all six categories combined) by region:
1. Other has spent 10,677,599
2. Lisbon has spent 2,386,813
3. Oporto has spent 1,555,088

Fig 1.1
Fig 1.1 shows Table 1.1.4 as a bar graph.

Category Spend by Channel:

Tab 1.1.5
Table 1.1.5 shows the total spend (all six categories combined) by channel:
1. Hotel has spent 7,999,569
2. Retail has spent 6,619,931

Fig 1.2
Fig 1.2 shows Table 1.1.5 as a bar graph.

Category Spend by Channel and Region:

Fig 1.3

Fig 1.3 shows the spend on each category in the different regions, split by the two channels.
From the above data we can conclude that, among the regions, Other has spent the most and Oporto the least, whereas among the channels, Hotel has spent more than Retail.

(a) Which Region and which Channel spent the most?

Among the regions, Other has spent the most; among the channels, Hotel has spent the most.

(b) Which Region and which Channel spent the least?

Among the regions, Oporto has spent the least; among the channels, Retail has spent the least.
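Total spend also reflects group size: Other has 316 of the 440 customers and Hotel has 298. The following is a minimal sketch that divides each total by its customer count, using only figures quoted in this report (the dictionary and function names are illustrative). The Other total is taken as 10,677,599, the value for which the three regional totals add up to the same grand total (14,619,500) as the two channel totals.

```python
# Totals (m.u.) from Tables 1.1.4 and 1.1.5 and customer counts from the
# frequency tables; all figures are copied from this report.
region_total = {"Other": 10_677_599, "Lisbon": 2_386_813, "Oporto": 1_555_088}
region_count = {"Other": 316, "Lisbon": 77, "Oporto": 47}
channel_total = {"Hotel": 7_999_569, "Retail": 6_619_931}
channel_count = {"Hotel": 298, "Retail": 142}


def summarise(totals: dict, counts: dict) -> dict:
    """Return group -> (total spend, average spend per customer)."""
    return {g: (totals[g], totals[g] / counts[g]) for g in totals}


regions = summarise(region_total, region_count)
channels = summarise(channel_total, channel_count)

for name, table in (("Region", regions), ("Channel", channels)):
    most = max(table, key=lambda g: table[g][0])
    least = min(table, key=lambda g: table[g][0])
    print(f"{name}: most = {most}, least = {least} (by total spend)")
    for g, (total, avg) in table.items():
        print(f"  {g:<7} total {total:>12,}  per customer {avg:>10,.2f}")
```

With these figures, Other and Oporto remain the highest and lowest regions by total spend. Per customer, though, a Retail customer averages about 46,619 m.u. against about 26,844 for Hotel, and Lisbon (about 30,998) has the lowest regional average.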

1.2 There are 6 different varieties of items that are considered. Describe and comment/explain all the varieties across Region and Channel. Provide a detailed justification for your answer.

Tab1.1.6
Measures of central tendency: mean, median, mode. Measures of dispersion: range, IQR, standard deviation.
From Tab 1.1.3 and Tab 1.1.6, we can infer the following:
• Channel has two unique values, with "Hotel" the most frequent at 298 of the 440 customers, i.e. 67.7% of the customers belong to the Hotel channel.
• Region has three unique values, with "Other" the most frequent at 316 of the 440 customers, i.e. 71.8% of the customers are in the Other region.
• Fresh (count 440): mean 12000.3, standard deviation 12647.3, minimum 3, maximum 112151.
  Q1 (25%) = 3127.75, Q2 (50%) = 8504, Q3 (75%) = 16933.8.
  Range = max - min = 112151 - 3 = 112,148; IQR = Q3 - Q1 = 16933.8 - 3127.75 = 13,806.05. The IQR is used to compute the outlier limits (Q1 - 1.5 IQR and Q3 + 1.5 IQR).

• Milk (count 440): mean 5796.27, standard deviation 7380.38, minimum 55, maximum 73498.
  Q1 (25%) = 1533, Q2 (50%) = 3627, Q3 (75%) = 7190.25.
  Range = 73498 - 55 = 73,443; IQR = 7190.25 - 1533 = 5,657.25.

• Grocery (count 440): mean 7951.28, standard deviation 9503.16, minimum 3, maximum 92780.
  Q1 (25%) = 2153, Q2 (50%) = 4755.5, Q3 (75%) = 10655.8.
  Range = 92780 - 3 = 92,777; IQR = 10655.8 - 2153 = 8,502.8.

• Frozen (count 440): mean 3071.93, standard deviation 4854.67, minimum 25, maximum 60869.
  Q1 (25%) = 742.25, Q2 (50%) = 1526, Q3 (75%) = 3554.25.
  Range = 60869 - 25 = 60,844; IQR = 3554.25 - 742.25 = 2,812.

• Detergents_Paper (count 440): mean 2881.49, standard deviation 4767.85, minimum 3, maximum 40827.
  Q1 (25%) = 256.75, Q2 (50%) = 816.5, Q3 (75%) = 3922.
  Range = 40827 - 3 = 40,824; IQR = 3922 - 256.75 = 3,665.25.

• Delicatessen (count 440): mean 1524.87, standard deviation 2820.11, minimum 3, maximum 47943.
  Q1 (25%) = 408.25, Q2 (50%) = 965.5, Q3 (75%) = 1820.25.
  Range = 47943 - 3 = 47,940; IQR = 1820.25 - 408.25 = 1,412.
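The figures above are the ones pandas' describe() reports, with range and IQR derived from them. As a rough stdlib-only sketch of the same computation (the function name and the toy values are hypothetical, not the wholesale data):

```python
import statistics


def describe(values: list[float]) -> dict[str, float]:
    """Summary statistics in the spirit of pandas' describe(), plus range and IQR.

    Quartiles use statistics.quantiles(method="inclusive"), i.e. linear
    interpolation between order statistics, the same convention pandas
    uses by default.
    """
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    lo, hi = min(values), max(values)
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample SD (n - 1), as in describe()
        "min": lo,
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": hi,
        "range": hi - lo,
        "IQR": q3 - q1,
    }


# Hypothetical spends for illustration only (not the wholesale data).
toy = [3, 120, 450, 800, 1500, 2600, 9000]
stats = describe(toy)
for key, val in stats.items():
    print(f"{key:>5}: {val:,.2f}")
# A mean above the median is a quick sign of right skew.
print("right-skewed (mean > median):", stats["mean"] > stats["50%"])
```

The last line is a quick check of skew direction; in every item listed above the mean exceeds the median.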

Histograms of all six varieties across Region and Channel:

Fig 1.4

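From Fig 1.4 we can conclude that the data is right-skewed (positively skewed): for every item the mean is above the median and the maximum lies far above Q3. All varieties show similar behaviour across Region and Channel.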

1.3 On the basis of the descriptive measure of variability, which item shows the most inconsistent behaviour? Which item shows the least inconsistent behaviour?
We calculated the variance and the coefficient of variation (CV = standard deviation / mean) of all the varieties.

Variety            Variance       Coefficient of Variation
Fresh              1.599549e+08   1.0527
Milk               5.446997e+07   1.2719
Grocery            9.031010e+07   1.1938
Frozen             2.356785e+07   1.5785
Detergents_Paper   2.273244e+07   1.6528
Delicatessen       7.952997e+06   1.8473
Tab 1.1.7

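Tab 1.1.7 shows the variance and coefficient of variation of all the varieties.
Because the mean spend differs widely between items, the coefficient of variation, which scales the standard deviation by the mean, is the fairer measure for comparing their variability. On that basis:

Fresh has the lowest coefficient of variation (1.0527), so it shows the least inconsistent behaviour.

Delicatessen has the highest coefficient of variation (1.8473), so it shows the most inconsistent behaviour.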
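The coefficients in Tab 1.1.7 can be reproduced from the reported means (Tab 1.1.3) and variances. The sketch below assumes, as the numbers themselves suggest, that the variance column uses the sample variance (denominator n - 1) while the CV column uses the population standard deviation (denominator n, for example the default ddof=0 of scipy.stats.variation). With 440 rows the two conventions differ by only about 0.1%, so the ranking is unaffected.

```python
import math

# Means and (sample) variances copied from Tab 1.1.3 and Tab 1.1.7.
N = 440
items = {
    "Fresh": (12000.3, 1.599549e08),
    "Milk": (5796.27, 5.446997e07),
    "Grocery": (7951.28, 9.031010e07),
    "Frozen": (3071.93, 2.356785e07),
    "Detergents_Paper": (2881.49, 2.273244e07),
    "Delicatessen": (1524.87, 7.952997e06),
}


def cv(mean: float, sample_var: float, n: int, ddof: int = 0) -> float:
    """Coefficient of variation = standard deviation / mean.

    `sample_var` uses n - 1 in its denominator; it is rescaled to the
    requested ddof (0 = population SD, 1 = sample SD) before the square root.
    """
    var = sample_var * (n - 1) / (n - ddof)
    return math.sqrt(var) / mean


cvs = {name: cv(m, v, N) for name, (m, v) in items.items()}
for name, value in sorted(cvs.items(), key=lambda kv: kv[1]):
    print(f"{name:<17} CV = {value:.4f}")
print("least inconsistent:", min(cvs, key=cvs.get))
print("most inconsistent:", max(cvs, key=cvs.get))
```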

1.4 Are there any outliers in the data? Back up your answer with a suitable plot/technique with the help of
detailed comments.
Boxplots are used to detect outliers: in Fig 1.5, the black points plotted beyond the whiskers are outliers.

Fig 1.5

Fig 1.6
Yes, there are outliers in every item across the product range (Fresh, Milk, Grocery, Frozen, Detergents_Paper and Delicatessen).
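The boxplot reading can be cross-checked with the 1.5 × IQR rule on which default boxplot whiskers are based, using only the quartiles, minima and maxima listed in section 1.2 (the layout of the dictionary is illustrative):

```python
# Quartiles and extremes (m.u.) copied from the summary in section 1.2.
summary = {
    #                     Q1,      Q3,      min, max
    "Fresh":            (3127.75, 16933.8, 3,   112151),
    "Milk":             (1533.0,  7190.25, 55,  73498),
    "Grocery":          (2153.0,  10655.8, 3,   92780),
    "Frozen":           (742.25,  3554.25, 25,  60869),
    "Detergents_Paper": (256.75,  3922.0,  3,   40827),
    "Delicatessen":     (408.25,  1820.25, 3,   47943),
}


def tukey_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Lower and upper outlier limits: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


flags = {}
for item, (q1, q3, lo, hi) in summary.items():
    lower, upper = tukey_fences(q1, q3)
    flags[item] = (lo < lower, hi > upper)  # (low outliers?, high outliers?)
    print(f"{item:<17} limits [{lower:>10.2f}, {upper:>9.2f}]  "
          f"low outliers: {flags[item][0]}  high outliers: {flags[item][1]}")
```

Every maximum lies far above its upper limit, while every lower limit is negative and so below the smallest observed spend; by this rule all the outliers are on the high-spending side.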

1.5 On the basis of your analysis, what are your recommendations for the business? How can your analysis
help the business to solve its problem? Answer from the business perspective.
Based on the analysis, spending on the different items is inconsistent (as measured by the coefficient of variation), and this inconsistency should be reduced. Spending is unevenly split between the Hotel and Retail channels and should be brought closer to equal; spending should likewise be more balanced across the regions. More focus should be given to items other than Fresh and Grocery.

Problem 2 CMSU Student Survey

The Student News Service at Clear Mountain State University (CMSU) has decided to gather data about the
undergraduate students that attend CMSU. CMSU creates and distributes a survey of 14 questions and receives
responses from 62 undergraduates.
2.1 For this data, construct the following contingency tables (Keep Gender as row variable)
The data is stored in the Survey data set as follows:

Tab2.1.1
2.1.1. Gender and Major

Tab2.1.2
2.1.2. Gender and Grad Intention

Tab2.1.3

2.1.3. Gender and Employment

Tab2.1.4

2.1.4. Gender and Computer

Tab2.1.5
2.2. Assume that the sample is representative of the population of CMSU. Based on the data, answer the
following question:
2.2.1. What is the probability that a randomly selected CMSU student will be male?
From the contingency tables:
Total number of students = 62
Total number of male students = 29
P(male) = number of male students / total number of students = 29/62
From the calculations done in Python, we conclude that:
The probability that a randomly selected CMSU student will be male is 46.77%.
2.2.2. What is the probability that a randomly selected CMSU student will be female?
From the contingency tables:
Total number of students = 62
Total number of female students = 33
P(female) = number of female students / total number of students = 33/62

From the calculations done in Python, we conclude that:

The probability that a randomly selected CMSU student will be female is 53.23%.
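The marginal probabilities can be computed exactly from the counts quoted above. A minimal sketch with Python's fractions module (the dictionary layout is illustrative); because every student in the data is either male or female, the two probabilities sum to 1.

```python
from fractions import Fraction

# Counts from the CMSU contingency tables.
total_students = 62
gender_counts = {"Male": 29, "Female": 33}

marginal = {g: Fraction(c, total_students) for g, c in gender_counts.items()}
for g, p in marginal.items():
    print(f"P({g}) = {p.numerator}/{p.denominator} = {float(p):.2%}")
```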

2.3. Assume that the sample is representative of the population of CMSU. Based on the data, answer the following question:

2.3.1. Find the conditional probability of different majors among the male students in CMSU.
Contingency table for Gender and Major:

Tab2.1.2

From the contingency table:

Probability of Accounting among the male students = 4/29
Probability of CIS among the male students = 1/29
Probability of Economics/Finance among the male students = 4/29
Probability of International Business among the male students = 2/29
Probability of Management among the male students = 6/29
Probability of Other among the male students = 4/29
Probability of Retailing/Marketing among the male students = 5/29
Probability of Undecided among the male students = 3/29

From the calculations done in Python, we conclude that:

The probability of Accounting among the male students is 13.79%
The probability of CIS among the male students is 3.45%
The probability of Economics/Finance among the male students is 13.79%
The probability of International Business among the male students is 6.90%
The probability of Management among the male students is 20.69%
The probability of Other among the male students is 13.79%
The probability of Retailing/Marketing among the male students is 17.24%
The probability of Undecided among the male students is 10.34%

2.3.2 Find the conditional probability of different majors among the female students of CMSU.

Contingency table for Gender and Major:

Tab2.1.2
From the contingency table:

Probability of Accounting among the female students = 3/33
Probability of CIS among the female students = 3/33
Probability of Economics/Finance among the female students = 7/33
Probability of International Business among the female students = 4/33
Probability of Management among the female students = 4/33
Probability of Other among the female students = 3/33
Probability of Retailing/Marketing among the female students = 9/33
Probability of Undecided among the female students = 0/33

From the calculations done in Python, we conclude that:

The probability of Accounting among the female students is 9.09%
The probability of CIS among the female students is 9.09%
The probability of Economics/Finance among the female students is 21.21%
The probability of International Business among the female students is 12.12%
The probability of Management among the female students is 12.12%
The probability of Other among the female students is 9.09%
The probability of Retailing/Marketing among the female students is 27.27%
The probability of Undecided among the female students is 0%
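A sketch of the row-conditional probabilities, P(major | gender) = cell count / row total, using the counts listed in 2.3.1 and 2.3.2. With the raw survey loaded into a DataFrame, pandas.crosstab(df["Gender"], df["Major"], normalize="index") should give the same table; the column names there are assumptions.

```python
# Gender x Major counts as listed in sections 2.3.1 and 2.3.2.
majors = ["Accounting", "CIS", "Economics/Finance", "International Business",
          "Management", "Other", "Retailing/Marketing", "Undecided"]
counts = {
    "Male":   [4, 1, 4, 2, 6, 4, 5, 3],
    "Female": [3, 3, 7, 4, 4, 3, 9, 0],
}


def row_conditional(row: list[int]) -> list[float]:
    """P(major | gender): divide each cell by its row total."""
    total = sum(row)
    return [c / total for c in row]


conditional = {g: dict(zip(majors, row_conditional(r))) for g, r in counts.items()}
for g, dist in conditional.items():
    print(f"P(major | {g}), row total = {sum(counts[g])}")
    for major, p in dist.items():
        print(f"  {major:<22} {p:.2%}")
```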

2.4. Assume that the sample is representative of the population of CMSU. Based on the data, answer the following question:
2.4.1. Find the probability that a randomly chosen student is a male and intends to graduate.

Contingency table for Gender and Grad Intention:

Tab2.1.3
Probability that a randomly chosen student is male = 29/62
Probability that a male student intends to graduate = 17/29
Probability that a randomly chosen student is a male and intends to graduate
= P(male) * P(intends to graduate | male) = (29/62) * (17/29) = 17/62

From the calculations done in Python, we conclude that:

The probability that a randomly chosen student is a male and intends to graduate is 27.42%.

2.4.2 Find the probability that a randomly selected student is a female and does NOT have a laptop.

Contingency table for Gender and Computer:

Tab2.1.5

Probability that a randomly chosen student is female = 33/62
Probability that a female student does not have a laptop = 1 - (29/33) = 4/33

Probability that a randomly selected student is a female and does NOT have a laptop
= P(female) * P(no laptop | female) = (33/62) * (4/33) = 4/62

From the calculations done in Python, we conclude that:

The probability that a randomly selected student is a female and does NOT have a laptop is 6.45%.
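The multiplication rule used in 2.4, P(A and B) = P(A) * P(B | A), can be checked with exact fractions from the counts above. The result equals the joint cell count over the sample size (17 of 62, and 33 - 29 = 4 of 62), which is a useful sanity check. A minimal sketch:

```python
from fractions import Fraction

N = 62  # students in the sample

# 2.4.1: P(Male and Grad Intention = Yes) = P(Male) * P(Yes | Male)
p_male = Fraction(29, N)
p_yes_given_male = Fraction(17, 29)
p_male_and_yes = p_male * p_yes_given_male  # equals 17/62

# 2.4.2: P(Female and no laptop) = P(Female) * P(no laptop | Female)
p_female = Fraction(33, N)
p_no_laptop_given_female = 1 - Fraction(29, 33)  # equals 4/33
p_female_no_laptop = p_female * p_no_laptop_given_female  # equals 4/62 = 2/31

print(f"P(Male and intends to graduate) = {p_male_and_yes} = {float(p_male_and_yes):.2%}")
print(f"P(Female and no laptop) = {p_female_no_laptop} = {float(p_female_no_laptop):.2%}")
```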

2.5. Assume that the sample is representative of the population of CMSU. Based on the data, answer the following question:

2.5.1. Find the probability that a randomly chosen student is either a male or has full-time employment.

Contingency table for Gender and Employment:

Tab2.1.4

Probability of a student being male = 29/62

Probability of a student having full-time employment = 10/62
Probability of a student being male and having full-time employment = 7/62
Probability that a randomly chosen student is either a male or has full-time employment
= P(male) + P(full-time) - P(male and full-time) = 29/62 + 10/62 - 7/62 = 32/62

Hence:

The probability that a randomly chosen student is either a male or has full-time employment is 51.61%.

2.5.2. Find the conditional probability that, given a female student is randomly chosen, she is majoring in international business or management.

Contingency table for Gender and Major:

Tab2.1.2

Probability of international business given female = 4/33
Probability of management given female = 4/33

Since each student has a single major, international business and management are mutually exclusive events, so their probabilities can be added:

Probability of international business or management given female
= P(international business | female) + P(management | female) = 4/33 + 4/33 = 8/33

From the calculations done in Python, we conclude that:

The conditional probability that, given a female student is randomly chosen, she is majoring in international business or management is 24.24%.
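A sketch of the two rules used in 2.5, with exact fractions and only the counts quoted in this section. For 2.5.1 every term of the general addition rule must be a proportion of all 62 students: P(Male) = 29/62 and P(Male and full-time) = 7/62 (not the conditional 7/29). For 2.5.2 the majors are mutually exclusive, so the conditional probabilities simply add.

```python
from fractions import Fraction

N = 62
male, full_time, male_and_full_time = 29, 10, 7  # counts quoted in 2.5.1

# General addition rule: P(M or FT) = P(M) + P(FT) - P(M and FT).
# All three terms are proportions of the whole sample (denominator 62).
p_male_or_ft = (Fraction(male, N) + Fraction(full_time, N)
                - Fraction(male_and_full_time, N))

# 2.5.2: a student has exactly one major, so International Business and
# Management are mutually exclusive and their conditional probabilities add.
p_ib_or_mgmt_given_female = Fraction(4, 33) + Fraction(4, 33)

print(f"P(Male or full-time) = {p_male_or_ft} = {float(p_male_or_ft):.2%}")
print(f"P(IB or Management | Female) = {p_ib_or_mgmt_given_female} = "
      f"{float(p_ib_or_mgmt_given_female):.2%}")
```

Under those definitions the union probability is 32/62, about 51.61%, and the conditional probability for 2.5.2 is 8/33, about 24.24%.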

2.6. Construct a contingency table of Gender and Intent to Graduate at 2 levels (Yes/No).

The Undecided students are not considered now and the table is a 2x2 table. Do you think the graduate intention and being female are independent events?

2x2 contingency table of Gender and Intent to Graduate, without the Undecided students:

Tab2.1.3

Two events A and B are independent when they satisfy the condition:
P(A∩B) = P(A) * P(B)

In this case, being female and graduate intention are independent only if:
P(F∩Yes) = P(F) * P(Yes)

where F = female and Yes = Grad Intention is Yes.

From the calculations done in Python, we conclude that: P(F∩Yes) ≠ P(F) * P(Yes)

Hence, graduate intention and being female are not independent events.
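The independence condition can be expressed as a small function over a contingency table. The tables below are hypothetical, chosen only to show one independent and one dependent case; they are not the CMSU counts. With real sample data exact equality almost never holds, so in practice a chi-square test of independence (for example scipy.stats.chi2_contingency) is the usual formal check.

```python
from fractions import Fraction


def independence_check(table: dict[str, dict[str, int]], row: str,
                       col: str) -> tuple[Fraction, Fraction, bool]:
    """Compare P(row and col) with P(row) * P(col) for a contingency table.

    `table` maps row label -> {column label -> count}.
    """
    n = sum(sum(cells.values()) for cells in table.values())
    p_row = Fraction(sum(table[row].values()), n)
    p_col = Fraction(sum(cells[col] for cells in table.values()), n)
    p_joint = Fraction(table[row][col], n)
    return p_joint, p_row * p_col, p_joint == p_row * p_col


# Hypothetical 2x2 tables for illustration (NOT the CMSU counts in Tab2.1.3).
independent = {"Female": {"Yes": 6, "No": 4}, "Male": {"Yes": 12, "No": 8}}
dependent = {"Female": {"Yes": 4, "No": 6}, "Male": {"Yes": 14, "No": 6}}

for name, t in (("independent", independent), ("dependent", dependent)):
    joint, product, equal = independence_check(t, "Female", "Yes")
    print(f"{name}: P(F and Yes) = {joint}, P(F)P(Yes) = {product}, "
          f"independent: {equal}")
```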

2.7. Note that there are four numerical (continuous) variables in the dataset: GPA, Salary, Spending, and Text Messages. Answer the following questions based on the data.

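2.7.1. If a student is chosen randomly, what is the probability that his/her GPA is less than 3?

GPA is a continuous variable, whereas the Poisson distribution is a discrete distribution for counts (it assigns probability only to the whole numbers 0, 1, 2, ...), so it is not an appropriate model for GPA. The calculation in Python nevertheless used the Poisson distribution, adding the Poisson probabilities of whole-number GPA values up to 3.

A more suitable estimate is either the proportion of students in the sample whose GPA is below 3, or a normal approximation, P(GPA < 3) = Φ((3 - x̄)/s), where x̄ and s are the sample mean and standard deviation of GPA.

From the Poisson calculation done in Python:

If a student is chosen randomly, the probability that his/her GPA is less than 3 comes out as 39.49%; this value should be recomputed with one of the continuous approaches above.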
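Because GPA is continuous, P(GPA < 3) can be estimated either as the proportion of sampled students below 3 or from a fitted normal distribution. A stdlib sketch of both estimates (the function name and the GPA values are hypothetical, not the survey data):

```python
import statistics
from statistics import NormalDist


def prob_below(values: list[float], cutoff: float) -> tuple[float, float]:
    """Two estimates of P(X < cutoff) for a continuous variable.

    Returns (empirical proportion, normal approximation using the sample
    mean and sample standard deviation).
    """
    empirical = sum(v < cutoff for v in values) / len(values)
    dist = NormalDist(statistics.mean(values), statistics.stdev(values))
    return empirical, dist.cdf(cutoff)


# Hypothetical GPA values for illustration only (not the CMSU survey data).
gpas = [2.6, 2.8, 2.9, 3.0, 3.1, 3.1, 3.2, 3.3, 3.4, 3.6]
emp, approx = prob_below(gpas, 3.0)
print(f"empirical P(GPA < 3) = {emp:.2%}, normal approximation = {approx:.2%}")
```

If the two estimates differ a lot on the real data, that is itself a sign the normal approximation fits GPA poorly.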

2.7.2. Find the conditional probability that a randomly selected male earns 50 or more. Find the conditional probability that a randomly selected female earns 50 or more.
(a) Conditional probability that a randomly selected male earns 50 or more:

Fig 2.1

The above distplot (Fig 2.1) shows the salaries of the male students in the sample.
The distribution looks approximately normal, so the conditional probability that a randomly selected male earns 50 or more can be calculated using the normal distribution.

To calculate this, we compute the cumulative probability of a salary below 50 under the normal distribution and subtract it from 1, i.e. P(salary >= 50 | male) = 1 - P(salary < 50 | male).

From the calculations done in Python, we conclude that:

The conditional probability that a randomly selected male earns 50 or more is 83.04%.

(b) Conditional probability that a randomly selected female earns 50 or more:

Fig 2.2

The above distplot (Fig 2.2) shows the salaries of the female students in the sample.

The distribution looks approximately normal, so the conditional probability that a randomly selected female earns 50 or more can be calculated using the normal distribution.

To calculate this, we compute the cumulative probability of a salary below 50 under the normal distribution and subtract it from 1, i.e. P(salary >= 50 | female) = 1 - P(salary < 50 | female).

From the calculations done in Python, we conclude that:

The conditional probability that a randomly selected female earns 50 or more is 86.09%.
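The upper-tail calculation described above, P(X >= 50) = 1 - P(X < 50), as a minimal stdlib sketch. The means and standard deviations are hypothetical placeholders; in the report they would be the sample mean and standard deviation of salary for each gender. Comparing the result with the empirical share of students earning 50 or more is a simple check on the normal approximation.

```python
from statistics import NormalDist


def prob_at_least(threshold: float, mean: float, sd: float) -> float:
    """P(X >= threshold) for X ~ Normal(mean, sd): 1 minus the CDF at threshold.

    `sd` is the standard deviation of individual salaries, not the standard
    error of the mean.
    """
    return 1 - NormalDist(mean, sd).cdf(threshold)


# Hypothetical parameters for illustration only.
for group, mean, sd in (("group A", 55.0, 10.0), ("group B", 50.0, 12.0)):
    print(f"{group}: P(salary >= 50) = {prob_at_least(50, mean, sd):.2%}")
```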

2.8. Note that there are four numerical (continuous) variables in the dataset, GPA, Salary, Spending, and Text
Messages. For each of them comment whether they follow a normal distribution. Write a note summarizing
your conclusions.

Fig 2.3
From the histograms in Fig 2.3 for the continuous variables GPA, Salary, Spending and Text Messages, we can see that:

• GPA is approximately normally distributed, with a slight skew to the left.
• Salary is also approximately normally distributed, with a slight skew to the right.
• Spending is not normally distributed; it is highly right-skewed.
• Text Messages is not normally distributed; it is highly right-skewed.

The following table lists the skewness value of each variable.

Tab2.1.6

As noted above, Tab2.1.6 confirms that:

• GPA has very little skewness, and it is negative (skewed to the left).
• Salary also has very little skewness, but it is positive (skewed to the right).
• Spending is highly right-skewed.
• Text Messages is highly right-skewed.

In summary, GPA and Salary can reasonably be treated as approximately normal, while Spending and Text Messages cannot.
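Skewness values like those in Tab2.1.6 can be computed as below. This sketch implements the bias-adjusted Fisher-Pearson coefficient G1, which is what pandas' Series.skew() and scipy.stats.skew(..., bias=False) return; the data are hypothetical.

```python
import math
import statistics


def sample_skewness(values: list[float]) -> float:
    """Adjusted Fisher-Pearson skewness G1.

    g1 = m3 / m2**1.5 with central moments m_k; G1 = g1 * sqrt(n(n-1)) / (n-2).
    Negative values indicate a longer left tail, positive a longer right tail.
    """
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    g1 = m3 / m2 ** 1.5
    return g1 * math.sqrt(n * (n - 1)) / (n - 2)


# Hypothetical data for illustration only (not the survey variables).
symmetric = [1, 2, 3, 4, 5]
right_tail = [1, 1, 2, 2, 3, 4, 10, 25]
print(f"symmetric: {sample_skewness(symmetric):.3f}")
print(f"right tail: {sample_skewness(right_tail):.3f}")
```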

Problem 3 A & B shingles

An important quality characteristic used by the manufacturers of ABC asphalt shingles is the amount of moisture
the shingles contain when they are packaged. Customers may feel that they have purchased a product lacking in quality if
they find moisture and wet shingles inside the packaging. In some cases, excessive moisture can cause the granules
attached to the shingles for texture and colouring purposes to fall off the shingles resulting in appearance problems. To monitor
the amount of moisture present, the company conducts moisture tests. A shingle is weighed and then dried.
The shingle is then reweighed, and based on the amount of moisture taken out of the product, the pounds of moisture per 100 square feet is calculated. The company would like to show that the mean moisture content is less than 0.35 pound per 100 square feet.

The file (A & B shingles.csv) includes 36 measurements (in pounds per 100 square feet) for A shingles and 31 for B shingles. This business report explains the approach to each problem in the assignment and provides the relevant information for solving it.

Tab3.1.1

3.1 Do you think there is evidence that the mean moisture contents in both types of shingles are within the permissible limits? State your conclusions clearly, showing all steps.

For the A shingles, the null and alternative hypotheses to test whether the population mean moisture content is less than 0.35 pound per 100 square feet are:

H0 : mean moisture content >= 0.35

HA : mean moisture content < 0.35

Level of significance: 0.05

The population standard deviation is unknown, so we use the t distribution and the t statistic. Since we are testing a single sample (A) against a fixed value, we use a one-sample t-test.

By default, ttest_1samp in Python (scipy.stats) returns a two-sided p-value, so it is divided by 2 for this one-sided test. Halving gives the lower-tail p-value only when the t statistic is negative, i.e. when the sample mean is below 0.35.

From the calculations done in Python, we conclude that:

One-sample t-test p-value = 0.07477633

Since the p-value > level of significance, we fail to reject the null hypothesis: there is not enough evidence at the 5% level that the mean moisture content of A shingles is below 0.35.

For the B shingles, the null and alternative hypotheses to test whether the population mean moisture content is less than 0.35 pound per 100 square feet are:

H0 : mean moisture content >= 0.35

HA : mean moisture content < 0.35

Level of significance: 0.05

As for A, the population standard deviation is unknown, so we use the t distribution and a one-sample t-test, this time on sample B. The two-sided p-value returned by ttest_1samp is again divided by 2 for this one-sided test.

From the calculations done in Python, we conclude that:

One-sample t-test p-value = 0.0020904774003191826

Since the p-value < level of significance, we reject the null hypothesis: there is evidence at the 5% level that the mean moisture content of B shingles is below 0.35, i.e. within the permissible limit.
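The p-values above come from halving SciPy's two-sided result; SciPy 1.6 and later can return the one-sided value directly with ttest_1samp(sample, 0.35, alternative="less"). To make the mechanics explicit without SciPy, here is a stdlib-only sketch: it computes the t statistic and integrates the Student t density numerically with Simpson's rule. The function names and moisture readings are hypothetical, not the shingles data.

```python
import math
import statistics


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(t: float, df: int, steps: int = 2000) -> float:
    """P(T <= t): Simpson's rule on [0, |t|], then symmetry about 0."""
    h = abs(t) / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # probability between 0 and |t|
    return 0.5 + area if t >= 0 else 0.5 - area


def one_sample_t_less(sample: list[float], mu0: float) -> tuple[float, float]:
    """Lower-tailed test of H0: mean >= mu0 against HA: mean < mu0."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    t = (statistics.mean(sample) - mu0) / se
    return t, t_cdf(t, n - 1)  # the p-value is the lower-tail area


# Hypothetical moisture readings (lb per 100 sq ft), for illustration only.
readings = [0.30, 0.32, 0.28, 0.35, 0.31, 0.29]
t_stat, p_less = one_sample_t_less(readings, 0.35)
two_sided = 2 * t_cdf(-abs(t_stat), len(readings) - 1)
print(f"t = {t_stat:.3f}, one-sided p = {p_less:.4f}, two-sided p = {two_sided:.4f}")
```

Because t is negative here, the one-sided p-value is exactly half the two-sided one; with a positive t, halving would give the upper-tail value instead, which is why the sign should be checked first.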

3.2 Do you think that the population means for shingles A and B are equal? Form the hypothesis and conduct the test of the hypothesis. What assumption do you need to check before the test for equality of means is performed?

Theoretical assumptions for the hypothesis testing:

Before the test for equality of means is performed, we need to check that:
• the A and B measurements are independent random samples;
• the moisture content in each population is approximately normally distributed (or the samples are large enough for the t-test to be robust);
• for the pooled two-sample t-test, the two population variances are equal. This can be checked with Levene's test; if the variances differ, Welch's t-test is used instead.

To test the equality of the population means of the A and B shingles, the null and alternative hypotheses are:

H0 : mean moisture content of A = mean moisture content of B
HA : mean moisture content of A ≠ mean moisture content of B
Level of significance: 0.05

We have two samples, A and B, and we do not know the population standard deviations, so we use the t distribution and the t statistic. Since we are testing for equality between samples A and B, we use a two-sample t-test.

From the calculations done in Python, we conclude that:

Two-sample t-test p-value = 0.2017496571835306

Since the p-value > level of significance, we do not have enough evidence to reject the null hypothesis in favour of the alternative hypothesis.

Therefore, there is no significant evidence that the population mean moisture contents of shingles A and B differ; the data are consistent with equal means.
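A stdlib sketch of the two versions of the two-sample t statistic, to make the equal-variance assumption concrete. The pooled statistic (what scipy.stats.ttest_ind computes with its default equal_var=True) is only appropriate when the population variances are equal; Welch's version (equal_var=False) drops that assumption. The variance ratio is a rough indicator only; Levene's test (scipy.stats.levene) is the formal check. The function name and samples are hypothetical.

```python
import math
import statistics


def two_sample_t(a: list[float], b: list[float]) -> dict[str, float]:
    """Pooled (equal-variance) and Welch t statistics with their degrees of freedom."""
    n1, n2 = len(a), len(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)  # sample variances
    diff = statistics.mean(a) - statistics.mean(b)

    # Pooled test: assumes equal population variances.
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t_pooled = diff / math.sqrt(sp2 * (1 / n1 + 1 / n2))

    # Welch test: no equal-variance assumption; Welch-Satterthwaite df.
    s1, s2 = v1 / n1, v2 / n2
    t_welch = diff / math.sqrt(s1 + s2)
    df_welch = (s1 + s2) ** 2 / (s1 ** 2 / (n1 - 1) + s2 ** 2 / (n2 - 1))

    return {"t_pooled": t_pooled, "df_pooled": n1 + n2 - 2,
            "t_welch": t_welch, "df_welch": df_welch,
            "variance_ratio": max(v1, v2) / min(v1, v2)}


# Hypothetical samples for illustration only (not the shingles data).
a = [1, 2, 3, 4, 5]
b = [2, 3, 4, 5, 6]
for key, value in two_sample_t(a, b).items():
    print(f"{key}: {value:.3f}")
```

With equal sample variances, as here, the pooled and Welch statistics coincide; they diverge as the variance ratio moves away from 1.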

• FRESH: annual spending (m.u.) on fresh products (Continuous);

Categories Spend by Region:

Categories Spend by Channel:

Categories Spend by Channel and Region:

(a) Which Region and which Channel spent the most?

(b) Which Region and which Channel spent the least?

• Milk item (440 count)

• Grocery item (440 count):

range = max - min = 92780 - 3 = 92777 and IQR = Q3 - Q1 = 10655.8 - 2153 = 8502.8

• Frozen (440 count):

range = max - min = 60869 - 25 = 60844 and IQR = Q3 - Q1 = 3554.25 - 742.25 = 2812

• Detergents_Paper (440 count):

range = max - min = 40827 - 3 = 40824 and IQR = Q3 - Q1 = 3922 - 256.75 = 3665.25

• Delicatessen (440 count):

range = max - min = 47943 - 3 = 47940 and IQR = Q3 - Q1 = 1820.25 - 408.25 = 1412
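The range and IQR arithmetic above can be checked with a short sketch. The summary values below are copied from the text; the `spread_from_data` helper (a hypothetical name) shows how the same two quantities could be derived from raw column values, assuming quartiles are computed with linear interpolation, as pandas' default `Series.quantile` does.

```python
import statistics

# (min, Q1, Q3, max) as reported in the text above.
REPORTED = {
    "Grocery": (3, 2153, 10655.8, 92780),
    "Frozen": (25, 742.25, 3554.25, 60869),
    "Detergents_Paper": (3, 256.75, 3922, 40827),
    "Delicatessen": (3, 408.25, 1820.25, 47943),
}


def spread_from_summary(minimum, q1, q3, maximum):
    """Return (range, IQR) from already-computed summary statistics."""
    return maximum - minimum, q3 - q1


def spread_from_data(values):
    """Return (range, IQR) from raw values.

    statistics.quantiles(method="inclusive") uses the same linear
    interpolation between order statistics as pandas' default quantile.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return max(values) - min(values), q3 - q1


for item, summary in REPORTED.items():
    rng, iqr = spread_from_summary(*summary)
    print(f"{item}: range = {rng}, IQR = {iqr}")
```

The large gap between each range and its IQR reflects a few very large spenders, which is consistent with the outliers mentioned for these categories.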

Varieties | Variance | Coefficient of Variation

The Fresh item has the lowest coefficient of variation, so it is the most consistent.

The Delicatessen item has the highest coefficient of variation, so it is the most inconsistent.
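A sketch of the coefficient-of-variation comparison, assuming CV is the sample standard deviation (n - 1 denominator, which is also pandas' default for `std`) divided by the mean; whether the original table used the sample or the population standard deviation is not stated. The function names are hypothetical, and on the real dataset one would pass each product column as a list of spending values.

```python
import statistics


def coefficient_of_variation(values):
    """CV = sample standard deviation / mean (n - 1 denominator)."""
    return statistics.stdev(values) / statistics.mean(values)


def rank_by_consistency(columns):
    """Return (most consistent item, most inconsistent item) by CV.

    `columns` maps an item name to its list of spending values; a lower CV
    means spending is less spread out relative to its average.
    """
    cvs = {name: coefficient_of_variation(vals) for name, vals in columns.items()}
    return min(cvs, key=cvs.get), max(cvs, key=cvs.get)
```

CV is preferred over the raw variance here because the categories have very different average spends, and dividing by the mean puts them on a comparable, unit-free scale.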

Problem 2 Wholesale Customers Analysis

2.1.3. Gender and Employment

2.1.4. Gender and Computer


From all the contingency tables created, it can be seen that:


Contingency table for Gender and Major:

Probability of Accounting among the female students = 3/33

Hence from the calculations done in Python we conclude that :

The probability of Accounting among the female students is 9.09%.
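The conditional probability above is a ratio read off the Gender and Major contingency table: the joint count divided by the size of the conditioning group. A minimal sketch using the two counts quoted in the text (`conditional_probability` is a hypothetical helper name):

```python
from fractions import Fraction


def conditional_probability(joint_count: int, condition_count: int) -> Fraction:
    """P(A | B) = n(A and B) / n(B), read off a contingency table."""
    if condition_count == 0:
        raise ValueError("the conditioning group is empty")
    return Fraction(joint_count, condition_count)


# Counts quoted in the text: 3 female Accounting majors out of 33 female students.
p = conditional_probability(3, 33)
print(p, f"{float(p):.2%}")
```

Using `Fraction` keeps the exact ratio (3/33 reduces to 1/11) and defers rounding to the final percentage.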

Contingency table for Gender and Grad Intention:


Contingency table for Gender and Computer:


Contingency table for Gender and Employment:

Probability of a student being male = 29/62 (29 male students out of 29 + 33 = 62 students in total)


Contingency table for Gender and Major:

Since International Business and Management are mutually exclusive (a student has only one major), their conditional probabilities can be added.


2.6. Construct a contingency table of Gender and Intent to Graduate at 2 levels (Yes/No).

where F = Female and Yes = Grad Intention being Yes.
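To illustrate how a two-level Gender by Grad Intention table like the one described above can be built, here is a standard-library sketch that keeps only Yes/No responses and adds row and column totals, similar in spirit to `pandas.crosstab(..., margins=True)`. The helper names are hypothetical, and any records passed to them here are illustrative rather than the survey data.

```python
from collections import Counter


def contingency_table(pairs):
    """Cross-tabulate (row, column) pairs into nested dicts with "All" totals."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    table = {r: {c: counts[(r, c)] for c in cols} for r in rows}
    for r in rows:
        table[r]["All"] = sum(table[r][c] for c in cols)  # row total
    table["All"] = {c: sum(table[r][c] for r in rows) for c in cols}  # column totals
    table["All"]["All"] = len(pairs)  # grand total
    return table


def gender_by_intent(records):
    """Keep only Yes/No intentions (any other response is dropped), then tabulate."""
    kept = [(gender, intent) for gender, intent in records if intent in ("Yes", "No")]
    return contingency_table(kept)
```

A conditional probability such as P(Yes | F) is then `table["Female"]["Yes"] / table["Female"]["All"]`.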


(b) Conditional probability that a randomly selected female earns 50 or more:


The following table lists the skewness value of each variable.

As mentioned earlier, from Table 2.1.6 it is evident that:

Problem 3 A & B shingles

H0: mean moisture content ≤ 0.35

Level of significance: 0.05

Hence from the calculations done in Python we conclude that :

Our one-sample t-test p-value = 0.07477633
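A sketch of a right-tailed one-sample test of H0: μ ≤ 0.35 against H1: μ > 0.35, using only the standard library. It is not necessarily how the reported p-value was produced: the bracketed output suggests SciPy, and SciPy 1.6 and later accept `alternative="greater"` in `scipy.stats.ttest_1samp`. The moisture readings are not reproduced in this section, so the function names are hypothetical and any sample passed to them is illustrative.

```python
import math
import statistics


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with `df` degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t: float, df: float, steps: int = 2000) -> float:
    """P(T >= t), from Simpson's rule on [0, |t|] plus symmetry of the density."""
    a = abs(t)
    h = a / steps
    area = (t_pdf(0.0, df) + t_pdf(a, df)) * h / 3  # endpoint terms
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df) * h / 3
    # area is now P(0 <= T <= |t|)
    return 0.5 - area if t >= 0 else 0.5 + area


def one_sample_t_test_greater(sample, mu0: float):
    """Right-tailed test of H0: mu <= mu0 against H1: mu > mu0.

    Returns (t statistic, degrees of freedom, one-sided p-value).
    """
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    t = (statistics.mean(sample) - mu0) / se
    return t, n - 1, t_upper_tail(t, n - 1)
```

The decision rule is the same as in the text: reject H0 only if the one-sided p-value falls below the 0.05 level of significance.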

H0: mean moisture content ≤ 0.35

Level of significance: 0.05

Theoretical Assumptions for the Hypothesis Testing :
