Business Report Project - Sheetal - SMDM
PROJECT
SMDM
Student Name: Sheetal Basalingappa
Great Learning
Synopsis
Summary:
This business report gives a detailed explanation of the approach taken for each problem in the assignment
and provides the information relevant to solving it.
We imported the ‘Wholesale Customers Data’ dataset in Python to analyze the spend on each item
across regions and channels and to answer each problem. The detailed approach for
the given data is described below.
1.1 Problem 1.1.2 & 1.1.3: Which Region and Channel spend the most and the least?
Solution:
Using the describe function in Python, we first looked at the basic descriptive statistics of the data set. Using
bar graphs of spend by Region and by Channel, we then identified where spend is at its maximum and
minimum. The bar graphs show that the Hotel channel spends the most and the Retail channel spends the least.
The highest spend by Region is from Others and the lowest is from Oporto.
The highest spend by Channel is from Hotel and the lowest is from Retail.
The highest spend by Region/Channel combination is from Others/Hotel and the lowest is from
Oporto/Hotel.
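The comparison described above can be sketched with a pandas groupby. This is only an illustration: the rows below are made-up toy values, and the column names (Region, Channel, Fresh, Milk) are assumptions about how the dataset is laid out.

```python
import pandas as pd

# Toy rows standing in for the real data (all values are made up).
df = pd.DataFrame({
    "Region":  ["Other", "Other", "Lisbon", "Oporto", "Oporto", "Lisbon"],
    "Channel": ["Hotel", "Retail", "Hotel", "Hotel", "Retail", "Retail"],
    "Fresh":   [12000, 7000, 9000, 3000, 2000, 4000],
    "Milk":    [5000, 9000, 3000, 1000, 2500, 6000],
})

item_cols = ["Fresh", "Milk"]             # the real data has six item columns
df["Total"] = df[item_cols].sum(axis=1)   # total spend per customer

by_region = df.groupby("Region")["Total"].sum()
by_channel = df.groupby("Channel")["Total"].sum()

print("Region  max/min:", by_region.idxmax(), by_region.idxmin())
print("Channel max/min:", by_channel.idxmax(), by_channel.idxmin())
```

Grouping on both columns at once, with groupby(["Region", "Channel"]), would give the Region/Channel combinations in the same way.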
1.2 Problem 1.2: Six different varieties of items are considered. Do all varieties show similar
behavior across Region and Channel? Provide justification for your answer.
Solution:
Using pivot tables for each category and checking spend across Region and Channel, we get the following
outputs.
In the Other region, spending is the highest on all six varieties, and in the Oporto region
spending is the lowest on all varieties. Looking across Channel gives further
insight into each of the six varieties.
Across Channel, spending differs from one variety to another. For example, spending on the Fresh variety is
high in the Hotel channel and lower in the Retail channel.
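A pivot table of this kind might be built as follows. The data is a made-up toy frame and the column names are assumptions. The same call would be repeated for each of the six varieties.

```python
import pandas as pd

# Toy data (values are illustrative only).
df = pd.DataFrame({
    "Region":  ["Other", "Other", "Oporto", "Oporto"],
    "Channel": ["Hotel", "Retail", "Hotel", "Retail"],
    "Fresh":   [12000, 4000, 3000, 1000],
    "Frozen":  [3000, 1000, 800, 500],
})

# Total Fresh spend for every Region x Channel cell.
fresh_pivot = pd.pivot_table(
    df, values="Fresh", index="Region", columns="Channel", aggfunc="sum"
)
print(fresh_pivot)
```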
1.2 Based on a descriptive measure of variability, which item shows the most inconsistent behavior?
Which item shows the least inconsistent behavior?
Solution:
We can use the interquartile range (IQR) or the standard deviation (STD) to find the most and least inconsistent items.
The Fresh variety is the most inconsistent in terms of spending by buyers, and the Delicatessen variety
is the least inconsistent.
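The measures named above can be computed side by side as in this sketch. The numbers are toy values, not the report's data. The coefficient of variation (standard deviation divided by mean) is an extra measure added here for illustration, since it allows items with very different average spend to be compared.

```python
import pandas as pd

# Toy spend columns (made-up values).
df = pd.DataFrame({
    "Fresh":        [12000, 500, 30000, 2000, 8000],
    "Delicatessen": [1500, 1200, 1800, 1400, 1600],
})

std = df.std()                                 # sample standard deviation (ddof=1)
iqr = df.quantile(0.75) - df.quantile(0.25)    # interquartile range
cv = std / df.mean()                           # coefficient of variation (unitless)

summary = pd.DataFrame({"std": std, "iqr": iqr, "cv": cv})
print(summary)
print("most variable:", cv.idxmax(), "| least variable:", cv.idxmin())
```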
Based on the analysis so far, it is evident that most buyers are in the Other region and that
spending on the different items is inconsistent. We could therefore expand our retailer base in the Lisbon
and Oporto regions as well. There is evidence that Fresh items are bought in considerable quantity by many retailers
across all regions. Varieties such as Frozen, Detergents Paper and Delicatessen are not
popular among the retailers, so we can try to push these products where demand is higher.
At present there are only two modes of sale, Retail and Hotel. Considering the customer base of each region,
we could use additional sales methods to reach our customers.
Survey of Clear Mountain State University (CMSU)
Problem Statement
The Student News Service at Clear Mountain State University (CMSU) has decided to gather data about
the undergraduate students that attend CMSU. CMSU creates and distributes a survey of 14 questions
and receives responses from 62 undergraduates.
Summary
This business report gives a detailed explanation of the approach taken for each problem in the assignment
and provides the information relevant to solving it.
2 – CMSU Survey Data Analysis
We imported the ‘Survey-1’ dataset in Python to analyze the data about the undergraduate students
who attend CMSU. The detailed approach and answers are given below.
2.1 Problem: For this data, construct the following contingency tables (keep Gender as the
row variable).
2.1.1 Gender and Major
Solution:
Below is the output from Python
2.2. Assume that the sample is representative of the population of CMSU. Based on the data, answer
the following question:
2.2.1. What is the probability that a randomly selected CMSU student will be male?
P(Male) = 29/62
The probability that a randomly selected CMSU student will be male is 46.77%.
2.2.2. What is the probability that a randomly selected CMSU student will be female?
P(Female) = 33/62
The probability that a randomly selected CMSU student will be female is 53.23%.
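The two probabilities above follow directly from the counts 29 and 33 out of 62 respondents. A minimal check of the arithmetic:

```python
male, female = 29, 33      # counts reported above
total = male + female      # 62 respondents

p_male = male / total
p_female = female / total

print(f"P(Male)   = {p_male:.4f} ({p_male:.2%})")
print(f"P(Female) = {p_female:.4f} ({p_female:.2%})")
```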
2.3. Assume that the sample is representative of the population of CMSU. Based on the data, answer
the following question:
2.3.1. Find the conditional probability of different majors among the male students in CMSU.
2.3.2 Find the conditional probability of different majors among the female students of CMSU.
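Conditional probabilities of Major given Gender can be read from a row-normalized contingency table. The sketch below uses a small made-up sample, and the column names Gender and Major are assumptions about the survey file.

```python
import pandas as pd

# Toy survey rows (illustrative only, not the CMSU responses).
survey = pd.DataFrame({
    "Gender": ["Male", "Male", "Male", "Female",
               "Female", "Female", "Female", "Male"],
    "Major":  ["Accounting", "Management", "Accounting", "Retailing",
               "Management", "Management", "Accounting", "Retailing"],
})

# Counts, with Gender as the row variable.
counts = pd.crosstab(survey["Gender"], survey["Major"])

# normalize="index" divides each row by its total, giving P(Major | Gender).
cond = pd.crosstab(survey["Gender"], survey["Major"], normalize="index")

print(counts)
print(cond.round(4))
```

Each row of the normalized table sums to 1, so the Male row holds the conditional distribution of majors among male students and the Female row the one among female students.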
2.4. Assume that the sample is representative of the population of CMSU. Based on the data, answer
the following question:
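2.4.1. Find the probability that a randomly chosen student is a male and intends to graduate.
Solution:
The figure 58.62% is the proportion of male students who intend to graduate, i.e. the conditional probability P(Intends to graduate | Male). The joint probability that a randomly chosen student is a male and intends to graduate is P(Male) × P(Intends to graduate | Male) = (29/62) × 0.5862 ≈ 27.42%.
2.4.2 Find the probability that a randomly selected student is a female and does NOT have a laptop.
The figure 12% is the proportion of female students who do not have a laptop, i.e. P(No laptop | Female). The joint probability that a randomly selected student is a female and does NOT have a laptop is P(Female) × P(No laptop | Female) = (33/62) × 0.12, which is approximately 6.4%.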
2.5. Assume that the sample is representative of the population of CMSU. Based on the data, answer
the following question:
2.5.1. Find the probability that a randomly chosen student is either a male or has full-time
employment.
Probability that a randomly selected student is male: P(A) = 46.77%
P(A or B) = p_of_male_stu + p_of_fulltime_emp - p_of_male_fulltime_emp = 51.61%
The probability that a randomly chosen student is either a male or has full-time employment is
51.61%.
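The addition rule used above, P(A or B) = P(A) + P(B) - P(A and B), can be illustrated with boolean masks. The rows are toy values and the column names Gender and Employment are assumptions. The variable names mirror those in the formula above.

```python
import pandas as pd

# Toy survey rows (illustrative only).
survey = pd.DataFrame({
    "Gender":     ["Male", "Male", "Female", "Female", "Female", "Male"],
    "Employment": ["Full-Time", "Part-Time", "Full-Time",
                   "Part-Time", "Unemployed", "Part-Time"],
})

is_male = survey["Gender"] == "Male"
is_fulltime = survey["Employment"] == "Full-Time"

p_of_male_stu = is_male.mean()                           # P(A)
p_of_fulltime_emp = is_fulltime.mean()                   # P(B)
p_of_male_fulltime_emp = (is_male & is_fulltime).mean()  # P(A and B)

# Addition rule: subtract the overlap so it is not counted twice.
p_either = p_of_male_stu + p_of_fulltime_emp - p_of_male_fulltime_emp

# Cross-check against a direct count of "male OR full-time".
p_direct = (is_male | is_fulltime).mean()
print(round(p_either, 4), round(p_direct, 4))
```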
2.5.2. Find the conditional probability that given a female student is randomly chosen, she is majoring
in international business or management.
Given that a randomly chosen student is female, the probability that she is majoring in International Business or
Management is 24.24%.
2.6. Construct a contingency table of Gender and Intent to Graduate at 2 levels (Yes/No). The
Undecided students are not considered now, and the table is a 2x2 table. Do you think the graduate
intention and being female are independent events?
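CONCLUSION:
In the 2x2 table, the probability that a student intends to graduate, given that the student is female, is 55.0%. Graduate intention and being female are independent only if this conditional probability equals the overall probability of intending to graduate in the same table.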
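One way to check independence in a 2x2 table is to compare P(Yes | Female) with the overall P(Yes). The counts below are hypothetical and serve only to show the comparison. They are not the survey's table.

```python
import pandas as pd

# Hypothetical 2x2 counts (Undecided already excluded).
table = pd.DataFrame(
    {"No": [8, 5], "Yes": [12, 15]},
    index=["Female", "Male"],
)

n = table.to_numpy().sum()
p_yes = table["Yes"].sum() / n                                    # P(Yes)
p_yes_given_female = table.loc["Female", "Yes"] / table.loc["Female"].sum()

# Independence requires P(Yes | Female) == P(Yes).
independent = abs(p_yes - p_yes_given_female) < 1e-9
print(f"P(Yes) = {p_yes:.3f}, P(Yes | Female) = {p_yes_given_female:.3f}")
print("independent:", bool(independent))
```

With sample data the two values will rarely match exactly, so in practice a formal test such as a chi-square test of independence is often used to judge whether the gap is larger than chance would explain.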
2.7. Note that there are four numerical (continuous) variables in the data set: GPA, Salary, Spending,
and Text Messages.
2.7.1. If a student is chosen randomly, what is the probability that his/her GPA is less than 3?
The probability that his/her GPA is less than 3 is 27.42%.
2.7.2. Find the conditional probability that a randomly selected male earns 50 or more. Find the
conditional probability that a randomly selected female earns 50 or more.
2.8. Note that there are four numerical (continuous) variables in the data set, GPA, Salary, Spending,
and Text Messages. For each of them comment whether they follow a normal distribution. Write a
note summarizing your conclusions
Solution:
We used distplot to inspect the distribution of each of the four numerical (continuous) variables in the data
set: GPA, Salary, Spending and Text Messages.
From these plots we conclude that, of the four variables, ‘GPA’ and ‘Salary’ approximately follow a normal
distribution, whereas ‘Spending’ and ‘Text Messages’ do not.
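A visual check can be supplemented with a numerical one. The sketch below uses sample skewness and the Shapiro-Wilk test from SciPy, which is a different technique from the distplot inspection used in this report. The two samples are synthetic and only stand in for a roughly normal variable and a right-skewed one.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic stand-ins (not the survey data).
samples = {
    "roughly_normal": rng.normal(loc=3.0, scale=0.4, size=200),
    "right_skewed":   rng.exponential(scale=100.0, size=200),
}

results = {}
for name, x in samples.items():
    w_stat, p_value = stats.shapiro(x)   # H0: the data come from a normal distribution
    results[name] = {"skew": stats.skew(x), "shapiro_p": p_value}
    print(f"{name}: skew={results[name]['skew']:.2f}, Shapiro p={p_value:.4f}")
```

A small Shapiro-Wilk p-value (below the chosen alpha) is evidence against normality, while a large one only means that normality cannot be ruled out.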
SHINGLES ANALYSIS A & B
Problem Statement
An important quality characteristic used by the manufacturers of ABC asphalt shingles is the amount of
moisture the shingles contain when they are packaged. Customers may feel that they have purchased a
product lacking in quality if they find moisture and wet shingles inside the packaging. In some cases,
excessive moisture can cause the granules attached to the shingles for texture and coloring purposes to
fall off the shingles resulting in appearance problems. To monitor the amount of moisture present, the
company conducts moisture tests. A shingle is weighed and then dried. The shingle is then reweighed
and based on the amount of moisture taken out of the product; the pounds of moisture per 100 square
feet are calculated. The company would like to show that the mean moisture content is less than 0.35
pound per 100 square feet.
The file (‘A+&+B+shingles.csv’) includes 36 measurements (in pounds per 100 square feet) for A shingles
and 31 for B shingles.
Summary:
This business report gives a detailed explanation of the approach taken for each problem in the assignment
and provides the information relevant to solving it.
3 – Asphalt Shingles Data Analysis
We imported the ‘A & B shingles’ dataset in Python to analyze the data about the asphalt shingles.
The detailed approach and answers are given below.
3.1 Problem: Do you think there is evidence that mean moisture contents in both types of
shingles are within the permissible limits? State your conclusions clearly showing all
steps.
SOLUTION:
In this problem we are provided with two independent samples, shingles A and shingles B. The population
standard deviation is unknown, so we cannot perform a z-test and use a t-test instead.
Since we have to test whether the mean moisture level is less than the permissible limit for each sample, we
perform a one-sample t-test for sample A and another for sample B.
SAMPLE A
STEP 1:
The null hypothesis states that the mean moisture content of sample A is greater than or equal to the
permissible limit, 𝜇 ≥ 0.35. The alternative hypothesis states that the mean moisture content of sample A is less
than the permissible limit, 𝜇 < 0.35.
𝐻0 : 𝜇 ≥ 0.35
𝐻1 : 𝜇 < 0.35
STEP 2:
Since the alpha value is not given in the question, we assume alpha = 0.05.
STEP 3
We have sample A and we do not know the population standard deviation. Sample size n=36. We use
the t distribution and the 𝑡𝑆𝑇𝐴𝑇 test statistic for one sample t-test.
STEP 4:
Xbar = 0.316667
S = 0.135731
N = 36
Mu = 0.35
Tstat = -1.4735
One-tailed P Value (P Value/2) = 0.0747
STEP 5:
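Since the one-tailed p-value (0.0747) is greater than alpha (0.05), we fail to reject the null hypothesis. There is not enough evidence to conclude that the mean moisture content of sample A is less than the permissible limit.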
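The Step 4 figures for sample A can be reproduced from the summary statistics alone. This is a minimal sketch using SciPy's t distribution. It recomputes the test statistic and the lower-tail p-value from Xbar, S, N and Mu as listed above, and assumes alpha = 0.05 as in Step 2.

```python
from math import sqrt
from scipy import stats

# Summary statistics for sample A, as listed in Step 4.
xbar, s, n, mu0 = 0.316667, 0.135731, 36, 0.35
alpha = 0.05

# One-sample t statistic: (sample mean - hypothesized mean) / standard error.
t_stat = (xbar - mu0) / (s / sqrt(n))

# H1 is mu < mu0, so the p-value is the lower-tail area with n - 1 degrees of freedom.
p_one_tailed = stats.t.cdf(t_stat, df=n - 1)

reject_h0 = p_one_tailed < alpha
print(f"t = {t_stat:.4f}, one-tailed p = {p_one_tailed:.4f}, reject H0: {reject_h0}")
```

When the raw measurements are available, scipy.stats.ttest_1samp returns the same statistic with a two-sided p-value by default, which is why the report halves it for the one-tailed test.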
SAMPLE B
STEP 1:
The null hypothesis states that the mean moisture content of sample B is greater than or equal to the
permissible limit, 𝜇 ≥ 0.35.
The alternative hypothesis states that the mean moisture content of sample B is less than the permissible limit,
𝜇 < 0.35.
𝐻0 : 𝜇 ≥ 0.35
𝐻1 : 𝜇 < 0.35
STEP 2:
Since the alpha value is not given in the question, we assume alpha = 0.05.
STEP 3
We have sample B and we do not know the population standard deviation. Sample size n=31. We use
the t distribution and the 𝑡𝑆𝑇𝐴𝑇 test statistic for one sample t-test.
STEP 4:
S = 0.1372
N = 31
Mu = 0.35
Tstat = -3.1003
P Value = 0.0020
STEP 5:
Since the p-value (0.0020) is less than alpha (0.05), we reject the null hypothesis. We conclude that the mean moisture content of sample B is less than the permissible limit.
3.2 Do you think that the population mean for shingles A and B are equal? Form the hypothesis and
conduct the test of the hypothesis. What assumption do you need to check before the test for equality
of means is performed?
STEP 1
In testing whether the means for shingles A and shingles B are the same, the null hypothesis states that
the mean of shingles A and the mean of shingles B are equal, $\mu_A = \mu_B$. The alternative
hypothesis states that the means are different, $\mu_A \neq \mu_B$.
STEP 2:
Since the alpha value is not given in the question, we assume alpha = 0.05.
STEP 3
IDENTIFY THE TEST STATISTIC
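We have two samples and we do not know the population standard deviation.
Each sample size is n > 30. So we use the t distribution and the 𝑡𝑆𝑇𝐴𝑇 test statistic for the two-sample test.
Before performing the test we need to check that the two samples are independent, that each population is approximately normal, and, for the pooled-variance version of the test, that the two population variances are equal.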
STEP 4:
CALCULATION:
N1 = 36, N2 = 31
$S_1^2$ = 0.02, $S_2^2$ = 0.02
DF1 = 35, DF2 = 30
Tstat = 1.2896
P Value = 0.2017
STEP 5
Since the p-value (0.2017) is greater than alpha (0.05), we fail to reject the null hypothesis. There is not
enough evidence to conclude that the means for shingles A and shingles B are different.
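A two-sample test of this kind, together with a check of the equal-variance assumption, might look like the sketch below. The two arrays are synthetic draws with the same sample sizes as A and B, not the shingles measurements. Levene's test is one possible choice for the variance check, and the pooled-variance form (equal_var=True) is an assumption about which version of the test applies.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Synthetic stand-ins with the same sample sizes as shingles A and B.
a = rng.normal(loc=0.32, scale=0.14, size=36)
b = rng.normal(loc=0.27, scale=0.14, size=31)

# Assumption check: H0 of Levene's test is that the two variances are equal.
lev_stat, lev_p = stats.levene(a, b)

# Two-sided, pooled-variance two-sample t-test. H0: mu_A == mu_B.
t_stat, p_value = stats.ttest_ind(a, b, equal_var=True)

alpha = 0.05
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"Levene p = {lev_p:.4f}")
print(f"t = {t_stat:.4f}, p = {p_value:.4f} -> {decision}")
```

The decision compares the p-value with alpha. If Levene's test suggested unequal variances, passing equal_var=False would switch to Welch's t-test instead.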