
Business Report Project - Sheetal - SMDM

This business report summarizes analyses of two datasets: a wholesale customer dataset and a student survey dataset from Clear Mountain State University. For the wholesale dataset, the report identifies the region and channel with highest and lowest spending. It finds that spending behavior varies across categories, with fresh items showing the most inconsistency. There are outliers in the data. Recommendations focus on expanding to new regions and channels. For the student survey, the report constructs contingency tables and calculates probabilities across gender, major, employment, and other attributes. Tests find that gender and graduation intent are dependent. The probability of a GPA under 3 is also provided.

Uploaded by

Sheetal Mayanale

BUSINESS REPORT

PROJECT

SMDM
Student Name: Sheetal Basalingappa

Great Learning
Synopsis

Wholesale Customer Data Analysis

1.1 – Problem 1.1
1.2 – Problem 1.2
1.3 – Problem 1.3
1.4 – Problem 1.4
1.5 – Problem 1.5

Survey of Clear Mountain State University (CMSU)

2.1 - Problem 2.1
2.2 - Problem 2.2
2.3 - Problem 2.3
2.4 - Problem 2.4
2.5 - Problem 2.5
2.6 - Problem 2.6
2.7 - Problem 2.7
2.8 - Problem 2.8

Hypothesis Testing of ABC Asphalt Shingles Manufacturers

3.1 – Problem 3.1
3.2 – Problem 3.2
Wholesale Customer Analysis
Problem Statement:
A wholesale distributor operating in different regions of Portugal has information on the annual spending
on several items in its stores across different regions and channels. The data consists of 440 large
retailers’ annual spending on 6 different varieties of products in 3 different regions (Lisbon, Oporto,
Other) and across 2 sales channels (Hotel, Retail).

Summary:

This business report provides a detailed explanation of the approach to each problem given in the
assignment and provides relevant information with regard to solving the problem.

Wholesale Customer Data Analysis

We imported the ‘Wholesale Customers Data’ dataset in Python to analyze the spend on each store
item across regions and channels and to find solutions to each problem. Below is the detailed approach
for the given data.

1.1 Use methods of descriptive statistics to summarize data.


We can conclude that the dataset has:

• 440 observations in every variable
• Two unique values in the Channel variable
• Three unique values in the Region variable
• Different mean values for each spending variable
• A minimum value of 3 for Fresh, Grocery, Detergents Paper and Delicatessen
• Quartiles (25%, 50%, 75%) that show how the spending in each variable is spread
• A maximum value of 112151, held by Fresh
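The summary above can be reproduced along the following lines. This is a minimal sketch, not the report's actual code: the miniature table and its values are made up, and the column names (`Channel`, `Region`, `Fresh`, `Grocery`) are assumptions about how the wholesale file is laid out.

```python
import pandas as pd

# Hypothetical miniature stand-in for the wholesale data (values are made up).
df = pd.DataFrame({
    "Channel": ["Hotel", "Retail", "Hotel", "Retail"],
    "Region": ["Other", "Lisbon", "Oporto", "Other"],
    "Fresh": [12669, 7057, 6353, 13265],
    "Grocery": [7561, 9568, 7684, 4221],
})

# Numeric summary: count, mean, std, min, quartiles (25%, 50%, 75%) and max.
numeric_summary = df.describe()

# Number of distinct values in the two categorical variables.
unique_counts = df[["Channel", "Region"]].nunique()

print(numeric_summary)
print(unique_counts)
```

On the full dataset, the `count` row would show 440 for every spending variable, and `unique_counts` would show 2 channels and 3 regions.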

1.1 Problem: 1.1.2. & 1.1.3 Which region and channel spend most & least?

Solution:

Using the describe function in Python, we first looked at the basic descriptive statistics of the data set.
Using a bar graph of spend by Region and Channel, we were able to identify the region and channel with
maximum and minimum spend. From the bar graph, the Hotel channel spends the most and the Retail
channel spends the least.

• Highest spend by Region is from Other, and lowest spend by Region is from Oporto
• Highest spend by Channel is from Hotel, and lowest spend by Channel is from Retail
• Highest spend by Region/Channel combination is from Other/Hotel, and lowest is from Oporto/Hotel

Similarly, we grouped the spend by Region to get the total for each region. The Other region has the
highest total spend and the Oporto region has the lowest total spend.
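The grouping step described above might look like the sketch below. The data is a made-up four-row table and the column names are assumptions; only the technique (sum the item columns into a total, then group by Region and Channel) reflects the text.

```python
import pandas as pd

# Hypothetical data; the real file has 440 rows and 6 item columns.
df = pd.DataFrame({
    "Channel": ["Hotel", "Retail", "Hotel", "Retail"],
    "Region": ["Other", "Other", "Lisbon", "Oporto"],
    "Fresh": [100, 80, 30, 10],
    "Grocery": [50, 40, 20, 5],
})

item_cols = ["Fresh", "Grocery"]
# Total annual spend per retailer across all item columns.
df["Total"] = df[item_cols].sum(axis=1)

# Total spend per region, per channel, and per region/channel combination.
by_region = df.groupby("Region")["Total"].sum()
by_channel = df.groupby("Channel")["Total"].sum()
by_both = df.groupby(["Region", "Channel"])["Total"].sum()

print("Highest region:", by_region.idxmax(), "| lowest region:", by_region.idxmin())
print("Highest channel:", by_channel.idxmax(), "| lowest channel:", by_channel.idxmin())
print(by_both)
```

A bar graph of `by_both` would then give the visual comparison the report refers to.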

1.2 Problem 1.2 There are 6 different varieties of items considered. Do all varieties show similar
behavior across Region and Channel? Provide justification for your answer.

Solution:
Using pivot tables for each category and checking spend across Region and Channel, we get the following
outputs.

In the OTHER region the spending is highest on all varieties, and in the OPORTO region the spending is
lowest on all varieties. Since we have 6 varieties, looking across Channel gives insights on each of the
6 varieties.
Across Channel, the spending differs from product to product. For example, spending on the Fresh variety
is large in the Hotel channel but smaller in the Retail channel.
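A pivot table of this kind could be built roughly as follows. The numbers are invented and the column names are assumptions; the sketch only illustrates `pandas.pivot_table` with Region as rows and Channel as columns.

```python
import pandas as pd

# Hypothetical data standing in for the wholesale file.
df = pd.DataFrame({
    "Channel": ["Hotel", "Retail", "Hotel", "Retail"],
    "Region": ["Other", "Other", "Lisbon", "Oporto"],
    "Fresh": [100, 80, 30, 10],
    "Grocery": [50, 40, 20, 5],
})

# One block of columns per item, split by channel; rows are regions.
# fill_value=0 marks region/channel combinations with no retailers.
pivot = pd.pivot_table(
    df,
    values=["Fresh", "Grocery"],
    index="Region",
    columns="Channel",
    aggfunc="sum",
    fill_value=0,
)

print(pivot)
```

Reading across a row shows whether an item behaves the same way in both channels within a region; reading down a column compares regions within a channel.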

1.2 Based on a descriptive measure of variability, which item shows the most inconsistent behavior?
Which item shows the least inconsistent behavior?

Solution:

We can use the IQR or the standard deviation (STD) to find the most and least inconsistent items.

IQR for all 6 varieties

Fresh seems to be the most inconsistent variety in terms of spending by the buyers, and Delicatessen
seems to be the least inconsistent variety in terms of spending by the buyers.
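As an illustration of the two measures named above, the sketch below computes the standard deviation and the IQR per item. The two columns and their values are made up; the coefficient of variation is added as an assumption of ours, since it compares spread across items with very different means.

```python
import pandas as pd

# Hypothetical spending values for two of the six varieties.
spend = pd.DataFrame({
    "Fresh": [3, 3000, 8500, 17000, 112000],
    "Delicatessen": [3, 400, 950, 1800, 4800],
})

std = spend.std()                                     # standard deviation per item
iqr = spend.quantile(0.75) - spend.quantile(0.25)     # interquartile range per item
cv = std / spend.mean()                               # coefficient of variation (our addition)

print("Most inconsistent by STD:", std.idxmax())
print("Least inconsistent by STD:", std.idxmin())
print(iqr)
print(cv)
```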

1.3 Are there any outliers in the data?

Yes, all 6 varieties have outliers in the dataset.

We can use box plots to find the outliers in the dataset.
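A box plot flags points beyond 1.5 times the IQR from the quartiles, so the same check can be done numerically. The sketch below assumes that standard whisker rule and uses made-up values; `count_outliers` is a hypothetical helper, not something from the report.

```python
import pandas as pd

def count_outliers(values: pd.Series) -> int:
    """Count points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR], the usual box-plot rule."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return int(((values < lower) | (values > upper)).sum())

# Hypothetical spending values with one very large purchase.
fresh = pd.Series([3, 3000, 8500, 17000, 112000])
n_outliers = count_outliers(fresh)
print(n_outliers)
```

Applying the helper to each of the six item columns would give the outlier count per variety.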


1.4 Problem On the basis of your analysis, what are your recommendations for the business? How can
your analysis help the business to solve its problem? Answer from the business perspective.

Based on the analysis done so far, it is evident that there are many buyers in the Other region and
that there are inconsistencies in the spending on different items. So, we could grow our retailer base
in the Lisbon and Oporto regions as well. We find evidence that Fresh items are bought in considerable
volume by retailers in all regions. We find that varieties like Frozen, Detergents Paper and
Delicatessen are not popular among the retailers, so we can focus these products where the demand is
higher. As of now there are only two modes of sale, Retail and Hotel. Considering the customer base of
each region, we can use additional sales channels to reach our customers.
Survey of Clear Mountain State University (CMSU)

Problem Statement

The Student News Service at Clear Mountain State University (CMSU) has decided to gather data about
the undergraduate students that attend CMSU. CMSU creates and distributes a survey of 14 questions
and receives responses from 62 undergraduates.

Summary

This business report provides a detailed explanation of the approach to each problem given in the
assignment and provides relevant information with regard to solving the problem.
2 – CMSU Survey Data Analysis
We imported the ‘Survey-1’ dataset in Python to analyze the data about the undergraduate students
who attend CMSU. Below is the detailed approach and answer.

2.1 Problem. For this data, construct the following contingency tables (keep Gender as the
row variable):

2.1.1. Gender and Major

2.1.2. Gender and Grad Intention

2.1.3. Gender and Employment

2.1.4. Gender and Computer

Solution:
The four contingency tables listed above were produced as output from Python.
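A contingency table with Gender as the row variable can be built with `pandas.crosstab`. The sketch below uses a made-up five-row survey and assumes the columns are named `Gender` and `Major`; the same call with a different second column would give the other three tables.

```python
import pandas as pd

# Hypothetical miniature survey (the real one has 62 responses).
survey = pd.DataFrame({
    "Gender": ["Male", "Male", "Female", "Female", "Female"],
    "Major": ["Accounting", "Management", "Accounting", "Management", "Management"],
})

# Rows: Gender, columns: Major; margins=True adds row and column totals ("All").
gender_major = pd.crosstab(survey["Gender"], survey["Major"], margins=True)
print(gender_major)
```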

2.2. Assume that the sample is representative of the population of CMSU. Based on the data, answer
the following question:

2.2.1. What is the probability that a randomly selected CMSU student will be male?

Number of male students (A) = 29

Total number of students (B) = 62

P(Male) = A/B = 29/62

The probability that a randomly selected CMSU student will be male is 46.77%

2.2.2. What is the probability that a randomly selected CMSU student will be female?

Number of female students (A) = 33

Total number of students (B) = 62

P(Female) = A/B = 33/62

The probability that a randomly selected CMSU student will be female is 53.23%

2.3. Assume that the sample is representative of the population of CMSU. Based on the data, answer
the following question:

2.3.1. Find the conditional probability of different majors among the male students in CMSU.

Conditional probability of different majors

P(Major | Male)

The Python output shows the probability of each major among the male students.

2.3.2 Find the conditional probability of different majors among the female students of CMSU.

P(Major | Female)

The Python output shows the probability of each major among the female students.
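The conditional probabilities P(Major | Gender) are the row proportions of the Gender and Major contingency table. A minimal sketch, assuming columns named `Gender` and `Major` and using made-up responses:

```python
import pandas as pd

# Hypothetical miniature survey.
survey = pd.DataFrame({
    "Gender": ["Male", "Male", "Female", "Female", "Female", "Female"],
    "Major": ["Accounting", "Management", "Accounting", "Management",
              "Management", "Retailing"],
})

# normalize="index" divides each row by its row total,
# so each row holds P(Major | Gender) and sums to 1.
p_major_given_gender = pd.crosstab(survey["Gender"], survey["Major"], normalize="index")
print(p_major_given_gender.round(4))
```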

2.4. Assume that the sample is a representative of the population of CMSU. Based on the data, answer
the following question:

2.4.1. Find the probability that a randomly chosen student is a male and intends to graduate.
Solution:

P(Grad Intent Yes | Male) = 17/29 = 58.62%

This is the conditional probability that a male student intends to graduate. The joint probability that
a randomly chosen student is a male and intends to graduate divides by all 62 students instead:

P(Male and Grad Intent Yes) = 17/62 = 27.42%

2.4.2 Find the probability that a randomly selected student is a female and does NOT have a laptop.

P(Has a laptop | Female) = 29/33 = 87.88%

P(Does not have a laptop | Female) = 1 - 29/33 = 4/33 = 12.12%

This is the conditional probability that a female student does not have a laptop. The joint probability
that a randomly selected student is a female and does NOT have a laptop divides by all 62 students:

P(Female and does not have a laptop) = 4/62 = 6.45%
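The wording "is a male and intends to graduate" describes a joint probability, while dividing by the number of males gives a conditional one. The short sketch below shows both, using only the counts stated in the report (62 students, 29 males, 17 males intending to graduate, 33 females, 29 females with a laptop); the variable names are ours.

```python
n_students = 62
n_male, n_male_grad_yes = 29, 17
n_female, n_female_laptop = 33, 29

# Conditional: restrict to males, then ask about graduation intent.
p_grad_given_male = n_male_grad_yes / n_male
# Joint: both conditions measured against all students.
p_male_and_grad = n_male_grad_yes / n_students

n_female_no_laptop = n_female - n_female_laptop
p_no_laptop_given_female = n_female_no_laptop / n_female
p_female_and_no_laptop = n_female_no_laptop / n_students

print(f"P(Yes | Male)          = {p_grad_given_male:.2%}")
print(f"P(Male and Yes)        = {p_male_and_grad:.2%}")
print(f"P(No laptop | Female)  = {p_no_laptop_given_female:.2%}")
print(f"P(Female and no laptop) = {p_female_and_no_laptop:.2%}")
```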

2.5. Assume that the sample is representative of the population of CMSU. Based on the data, answer
the following question:

2.5.1. Find the probability that a randomly chosen student is either a male or has full-time
employment.

Probability that a randomly selected student is male, P(A) = 46.77%

Probability that a randomly selected student has a full-time job, P(B) = 16.13%

Probability that a randomly selected student is male and has a full-time job, P(A and B) = 11.29%

P(A or B) = P(A) + P(B) - P(A and B) = 46.77% + 16.13% - 11.29% = 51.61%

The probability that a randomly chosen student is either a male or has full-time employment is 51.61%
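The addition rule used here can be checked with counts. The counts of 10 full-time students and 7 male full-time students are not stated in the report; they are inferred from the reported 16.13% and 11.29% of 62 students, so treat them as assumptions.

```python
n_students = 62
n_male = 29                # stated in the report
n_fulltime = 10            # inferred from 16.13% of 62 (assumption)
n_male_and_fulltime = 7    # inferred from 11.29% of 62 (assumption)

p_male = n_male / n_students
p_fulltime = n_fulltime / n_students
p_both = n_male_and_fulltime / n_students

# Addition rule: subtract the overlap so male full-time students are not counted twice.
p_male_or_fulltime = p_male + p_fulltime - p_both
print(f"{p_male_or_fulltime:.2%}")
```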

2.5.2. Find the conditional probability that given a female student is randomly chosen, she is majoring
in international business or management.

The probability that, given a female student is randomly chosen, she is majoring in international
business or management is 24.24%

2.6. Construct a contingency table of Gender and Intent to Graduate at 2 levels (Yes/No). The
Undecided students are not considered now, and the table is a 2x2 table. Do you think the graduate
intention and being female are independent events?

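CONCLUSION:

The probability that a randomly selected student is female is 50.0%

The conditional probability that a student intends to graduate, given that the student is female, is 55.0%

Two events are independent only if P(Intends to graduate | Female) equals P(Intends to graduate) over
all students in the 2x2 table. The analysis found that this does not hold, so they are not independent
events.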
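An independence check on a 2x2 table compares the joint probability with the product of the marginals. The report does not show its 2x2 table, so the counts below are illustrative only: they are chosen to be consistent with the reported 50.0% and 55.0% and should not be read as the report's data.

```python
# Illustrative 2x2 counts (NOT taken from the report).
counts = {
    ("Female", "Yes"): 11, ("Female", "No"): 9,
    ("Male", "Yes"): 17, ("Male", "No"): 3,
}
total = sum(counts.values())

p_female = (counts[("Female", "Yes")] + counts[("Female", "No")]) / total
p_yes = (counts[("Female", "Yes")] + counts[("Male", "Yes")]) / total
p_female_and_yes = counts[("Female", "Yes")] / total
p_yes_given_female = p_female_and_yes / p_female

# Independence requires P(Female and Yes) == P(Female) * P(Yes).
independent = abs(p_female_and_yes - p_female * p_yes) < 1e-9

print(f"P(Female) = {p_female:.2f}, P(Yes) = {p_yes:.2f}")
print(f"P(Yes | Female) = {p_yes_given_female:.2f}")
print("Independent:", independent)
```

With sample data, an exact match is rare, so a chi-square test of independence is the more formal way to decide whether the difference is larger than chance would explain.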

2.7. Note that there are four numerical (continuous) variables in the data set: GPA, Salary, Spending,
and Text Messages.

Answer the following questions based on the data

2.7.1. If a student is chosen randomly, what is the probability that his/her GPA is less than 3?

The probability that his/her GPA is less than 3 is 27.42%

2.7.2. Find the conditional probability that a randomly selected male earns 50 or more. Find the
conditional probability that a randomly selected female earns 50 or more.

Probability that a randomly selected male earns 50 or more is 48%

Probability that a randomly selected female earns 50 or more is 54%
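Probabilities of this kind are proportions of rows that satisfy a condition, which in pandas is the mean of a boolean mask. The sketch below uses four made-up students and assumes columns named `Gender`, `GPA` and `Salary`.

```python
import pandas as pd

# Hypothetical miniature survey.
survey = pd.DataFrame({
    "Gender": ["Male", "Male", "Female", "Female"],
    "GPA": [2.8, 3.4, 3.1, 2.9],
    "Salary": [55, 40, 50, 60],
})

# P(GPA < 3): share of all students whose GPA is below 3.
p_gpa_below_3 = (survey["GPA"] < 3).mean()

# P(Salary >= 50 | Gender): share within each gender earning 50 or more.
p_salary_50_plus = (survey["Salary"] >= 50).groupby(survey["Gender"]).mean()

print(p_gpa_below_3)
print(p_salary_50_plus)
```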

2.8. Note that there are four numerical (continuous) variables in the data set, GPA, Salary, Spending,
and Text Messages. For each of them comment whether they follow a normal distribution. Write a
note summarizing your conclusions

Solution:

We used distplot to examine the distribution of these four numerical (continuous) variables in the data
set: GPA, Salary, Spending and Text Messages.
From these plots we conclude that, of the four variables, ‘GPA’ and ‘Salary’ approximately follow a
normal distribution, whereas the other two, ‘Spending’ and ‘Text Messages’, do not follow a normal
distribution.
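A visual judgment from distribution plots can be complemented with a numeric check. The sketch below is our addition, not the report's method: it applies sample skewness and the Shapiro-Wilk test from SciPy to two synthetic samples (one symmetric, one right-skewed) that merely stand in for survey columns.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic stand-ins: a bell-shaped sample and a right-skewed sample.
symmetric = rng.normal(loc=3.0, scale=0.4, size=200)
skewed = rng.exponential(scale=100.0, size=200)

for name, sample in [("symmetric", symmetric), ("skewed", skewed)]:
    w_stat, p_value = stats.shapiro(sample)   # H0: the sample comes from a normal distribution
    skewness = stats.skew(sample)             # near 0 for symmetric data
    verdict = "not rejected" if p_value >= 0.05 else "rejected"
    print(f"{name}: skew={skewness:.2f}, Shapiro p={p_value:.4f}, normality {verdict}")
```

A small p-value is evidence against normality; a large one only means the test found no clear departure, which is why the plots remain useful.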
SHINGLES ANALYSIS A & B
Problem Statement

An important quality characteristic used by the manufacturers of ABC asphalt shingles is the amount of
moisture the shingles contain when they are packaged. Customers may feel that they have purchased a
product lacking in quality if they find moisture and wet shingles inside the packaging. In some cases,
excessive moisture can cause the granules attached to the shingles for texture and coloring purposes to
fall off the shingles resulting in appearance problems. To monitor the amount of moisture present, the
company conducts moisture tests. A shingle is weighed and then dried. The shingle is then reweighed
and based on the amount of moisture taken out of the product; the pounds of moisture per 100 square
feet are calculated. The company would like to show that the mean moisture content is less than 0.35
pound per 100 square feet.

The file (‘A+&+B+shingles.csv’) includes 36 measurements (in pounds per 100 square feet) for A shingles
and 31 for B shingles.

Summary:

This business report provides a detailed explanation of the approach to each problem given in the
assignment and provides relevant information with regard to solving the problem.
3 – Asphalt Shingles Data Analysis
We imported the ‘A & B shingles’ dataset in python to analyze the data about the Asphalt Shingles.
Below is the detailed approach and answer.

3.1 Problem Do you think there is evidence that mean moisture contents in both types of
shingles are within the permissible limits? State your conclusions clearly showing all
steps.
SOLUTION:

In this problem we are provided with two independent samples of shingles, A and B. The population
standard deviation is unknown, so we cannot perform a z-test and have to use a t-test.

Since we have to test whether the mean moisture level is less than the permissible limit for both
samples, we perform a one-sample t-test for sample A and for sample B.

SAMPLE A
STEP 1:

DEFINE NULL AND ALTERNATE HYPOTHESIS

The null hypothesis states that the mean moisture content of sample A is greater than or equal to the
permissible limit, 𝜇 ≥ 0.35. The alternative hypothesis states that the mean moisture content of sample A
is less than the permissible limit, 𝜇 < 0.35.

𝐻0 : 𝜇 ≥ 0.35

𝐻1 : 𝜇 < 0.35

STEP 2:

DECIDE THE SIGNIFICANCE LIMIT

Since the alpha value is not given in the question, we assume alpha = 0.05

STEP 3

IDENTIFY THE TEST STATISTIC

We have sample A and we do not know the population standard deviation. Sample size n=36. We use
the t distribution and the 𝑡𝑆𝑇𝐴𝑇 test statistic for one sample t-test.

STEP 4:

CALCULATE THE P - VALUE AND TEST STATISTIC

Xbar = 0.316667
S = 0.135731
N = 36
Mu = 0.35
Tstat = -1.4735
One-tailed P Value (two-tailed P Value/2) = 0.0747
STEP 5:

DECIDE TO REJECT OR ACCEPT NULL HYPOTHESIS

Since p_value (0.0747) > alpha (0.05), we fail to reject the null hypothesis

We conclude that there is not enough evidence that the mean moisture content of sample A is less than
the permissible limit.
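The test statistic and one-tailed p-value for Sample A can be recomputed from the summary values listed in Step 4. This is a sketch under the assumption of a lower-tailed one-sample t-test with n - 1 degrees of freedom; the helper name `one_sample_t_lower` is ours.

```python
import math
from scipy import stats

def one_sample_t_lower(xbar: float, s: float, n: int, mu0: float):
    """Lower-tailed one-sample t-test (H1: mu < mu0) from summary statistics."""
    standard_error = s / math.sqrt(n)
    t_stat = (xbar - mu0) / standard_error
    p_value = stats.t.cdf(t_stat, df=n - 1)   # area to the left of t_stat
    return t_stat, p_value

alpha = 0.05
t_a, p_a = one_sample_t_lower(xbar=0.316667, s=0.135731, n=36, mu0=0.35)
reject_a = p_a < alpha    # the decision compares the p-value with alpha

print(f"t = {t_a:.4f}, one-tailed p = {p_a:.4f}, reject H0: {reject_a}")
```

The decision rule compares the p-value with alpha, not the t statistic with the p-value. The same helper can be called with Sample B's summary values.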

SAMPLE B

STEP 1:

DEFINE NULL AND ALTERNATE HYPOTHESIS

The null hypothesis states that the mean moisture content of sample B is greater than or equal to the
permissible limit, 𝜇 ≥ 0.35

The alternative hypothesis states that the mean moisture content of sample B is less than the
permissible limit, 𝜇 < 0.35

𝐻0 : 𝜇 ≥ 0.35

𝐻1 : 𝜇 < 0.35

STEP 2:

DECIDE THE SIGNIFICANCE LIMIT

Since the alpha value is not given in the question, we assume alpha = 0.05

STEP 3

IDENTIFY THE TEST STATISTIC

We have sample B and we do not know the population standard deviation. Sample size n=31. We use
the t distribution and the 𝑡𝑆𝑇𝐴𝑇 test statistic for a one-sample t-test.

STEP 4:

CALCULATE THE P - VALUE AND TEST STATISTIC


Xbar = 0.2735

S = 0.1372

N = 31

Mu = 0.35

Tstat = -3.1003

P Value = 0.0020

STEP 5:

DECIDE TO REJECT OR ACCEPT NULL HYPOTHESIS

Since p_value (0.0020) < alpha (0.05), we reject the null hypothesis

We conclude that the mean moisture content is less than the permissible limit in sample B

3.2 Do you think that the population mean for shingles A and B are equal? Form the hypothesis and
conduct the test of the hypothesis. What assumption do you need to check before the test for equality
of means is performed?

STEP 1

DEFINE NULL AND ALTERNATIVE HYPOTHESIS

In testing whether the means for shingles A and shingles B are the same, the null hypothesis states that
the mean of shingles A equals the mean of shingles B, 𝜇A = 𝜇B. The alternative hypothesis states that
the means are different, 𝜇A ≠ 𝜇B.

𝐻0 : 𝜇A = 𝜇B

𝐻1 : 𝜇A ≠ 𝜇B

STEP 2:

DECIDE THE SIGNIFICANCE LIMIT

Since the alpha value is not given in the question, we assume alpha = 0.05

STEP 3
IDENTIFY THE TEST STATISTIC

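We have two independent samples and we do not know the population standard deviations.

The sample sizes of the two samples are not the same.

Both sample sizes are above 30 (n > 30). So we use the t distribution and the 𝑡𝑆𝑇𝐴𝑇 test statistic for a
two-sample test.

Before the test, we need to check the assumptions that the two samples are independent, that each
population is approximately normally distributed, and that the two population variances are equal. The
two sample variances here are both about 0.02.

Two-tailed test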

STEP 4:

CALCULATE THE P - VALUE AND TEST STATISTIC

CALCULATION:

N1 = 36, N2 = 31

M1 = 0.32, M2 = 0.27

S1² = 0.02, S2² = 0.02

DF1 = 35, DF2 = 30

Tstat = 1.2896

P Value = 0.2017

STEP 5

DECIDE TO REJECT OR ACCEPT THE NULL HYPOTHESIS

Since p_value (0.2017) > alpha (0.05), we fail to reject the null hypothesis. We conclude that there is
not enough evidence that the means for shingles A and shingles B are different.
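The two-sample result can be approximated from the summary statistics reported earlier for the two samples (means, standard deviations and sizes from the one-sample tests). The sketch assumes a pooled, equal-variance two-tailed t-test via `scipy.stats.ttest_ind_from_stats`; because the inputs are rounded, the output will differ slightly from the values computed on the raw data.

```python
from scipy import stats

# Summary statistics as reported for Sample A and Sample B.
result = stats.ttest_ind_from_stats(
    mean1=0.316667, std1=0.135731, nobs1=36,
    mean2=0.2735, std2=0.1372, nobs2=31,
    equal_var=True,   # pooled variance; assumes equal population variances
)
t_stat, p_value = result.statistic, result.pvalue

alpha = 0.05
reject_null = p_value < alpha   # two-tailed decision

print(f"t = {t_stat:.4f}, p = {p_value:.4f}, reject H0: {reject_null}")
```

If the equal-variance assumption looked doubtful, `equal_var=False` would run Welch's t-test instead.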
