Q 1.2: Use the describe command, or base the answer on correlation?
Should I use pairplot? (It does not show
correlation by Region or Channel.)
Q 1.3: Should the answer be based on standard deviation?
Q 1.5: Pending
Problem 1
Q 1.1 Use methods of descriptive statistics to summarize data.
#Which Region and which Channel seems to spend more?
#Which Region and which Channel seems to spend less?
Answer: Used the max and min commands to identify the highest- and lowest-spending
Regions and Channels.
The Other Region spends more.
The Retail Channel spends more.
The Lisbon Region spends less.
The Hotel Channel spends less.
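The max/min comparison above can be sketched in plain Python as follows. This is only an illustration: the rows are made-up placeholder values, not the assignment data, and the field names (region, channel, spend) are assumptions. With pandas, DataFrame.groupby followed by sum would give the same totals.

```python
from collections import defaultdict

# Placeholder rows (made-up numbers, NOT the assignment data).
rows = [
    {"region": "Other", "channel": "Retail", "spend": 500},
    {"region": "Other", "channel": "Hotel", "spend": 300},
    {"region": "Lisbon", "channel": "Retail", "spend": 200},
    {"region": "Lisbon", "channel": "Hotel", "spend": 100},
]


def total_by(rows, key):
    """Sum the 'spend' field for each distinct value of `key`."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row["spend"]
    return dict(totals)


region_totals = total_by(rows, "region")
channel_totals = total_by(rows, "channel")

# Highest and lowest spenders by total spend.
top_region = max(region_totals, key=region_totals.get)
low_region = min(region_totals, key=region_totals.get)
top_channel = max(channel_totals, key=channel_totals.get)
low_channel = min(channel_totals, key=channel_totals.get)
print(top_region, low_region, top_channel, low_channel)
```

Totals are used here; comparing means instead may be more appropriate when the groups have very different sizes.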
Q 1.2
Six different varieties of items are considered.
#Do all varieties show similar behaviour across Region and Channel?
Answer:
Q 1.3
On the basis of the descriptive measure of variability, which item shows the most inconsistent
behaviour? Which item shows the least inconsistent behaviour?
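One possible measure for this question is the coefficient of variation (standard deviation divided by the mean), which allows items with very different average spend to be compared. The sketch below assumes that measure; the item names and values are made-up placeholders, not the assignment data.

```python
import statistics

# Placeholder spend values per item (made-up, NOT the assignment data).
items = {
    "item_a": [10, 12, 11, 13, 14],
    "item_b": [5, 50, 20, 80, 45],
}


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


cv = {name: coefficient_of_variation(vals) for name, vals in items.items()}

# Highest CV = most inconsistent, lowest CV = least inconsistent.
most_inconsistent = max(cv, key=cv.get)
least_inconsistent = min(cv, key=cv.get)
print(most_inconsistent, least_inconsistent)
```

Ranking by raw standard deviation alone would favour items with larger typical values, which is why a relative measure is used in this sketch.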
Q 1.4
Are there any outliers in the data?
Answer: Yes, there are outliers in all 6 items; the box plots below confirm
this.
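Box plots usually flag points lying more than 1.5 times the interquartile range beyond the quartiles. A minimal sketch of that rule is shown below, assuming the 1.5 x IQR convention; the sample values are made up, and the quartile method used here may differ slightly from the one a plotting library applies.

```python
import statistics


def iqr_outliers(values):
    """Return values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < lower or v > upper]


# Placeholder values (made-up, NOT the assignment data).
sample = [10, 12, 11, 13, 12, 11, 100]
print(iqr_outliers(sample))
```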
Q 1.5
On the basis of this report, what are the recommendations?
Problem 2
Part I
2.1. For this data, construct the following contingency tables (Keep Gender as row variable)
2.1.1. Gender and Major
Major   Accounting  CIS  Economics/Finance  International Business  Management  Other  Retailing/Marketing  Undecided
Gender
Female  3.0         3.0  7.0                4.0                     4.0         3.0    9.0                  NaN
Male    4.0         1.0  4.0                2.0                     6.0         4.0    5.0
2.1.2. Gender and Grad Intention
Grad Intention No Undecided Yes
Gender
Female 9 13 11
Male 3 9 17
2.1.3. Gender and Employment
Employment Full-Time Part-Time Unemployed
Gender
Female 3 24 6
Male 7 19
2.1.4. Gender and Computer
Computer Desktop Laptop Tablet
Gender
Female 2.0 29.0 2.0
Male 3.0 26.0 NaN
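As an illustration of how such a contingency table is built, the sketch below counts (Gender, category) pairs in plain Python. The records are made-up placeholders, not the CMSU data. With pandas, pd.crosstab produces the same kind of table in one call. Note that this sketch fills absent combinations with 0 rather than NaN.

```python
from collections import Counter

# Placeholder survey records (made-up, NOT the CMSU data).
records = [
    ("Female", "Laptop"),
    ("Female", "Laptop"),
    ("Female", "Desktop"),
    ("Male", "Laptop"),
    ("Male", "Tablet"),
]

counts = Counter(records)
genders = sorted({g for g, _ in records})
categories = sorted({c for _, c in records})

# Rows are Gender, columns are the second variable; absent pairs count as 0.
table = {g: {c: counts[(g, c)] for c in categories} for g in genders}

print("Gender", categories)
for g in genders:
    print(g, [table[g][c] for c in categories])
```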
Part II
Q 2.2.1. What is the probability that a randomly selected CMSU student will be male?
What is the probability that a randomly selected CMSU student will be female?
The probability that a randomly selected CMSU student is male is 29/62 ≈ 46.8%.
The probability that a randomly selected CMSU student is female is 33/62 ≈ 53.2%.
(Please refer to the Python notebook for coding details.)
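The arithmetic behind these two probabilities can be checked with a few lines of Python, using the gender counts stated in this report (33 female, 29 male). The variable names are illustrative only.

```python
# Gender counts as stated in this report.
female = 33
male = 29
total = female + male

# Marginal probability = group count / total count.
p_male = male / total
p_female = female / total
print(f"P(Male) = {p_male:.1%}, P(Female) = {p_female:.1%}")
# prints: P(Male) = 46.8%, P(Female) = 53.2%
```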
Q 2.2.2. Find the conditional probability of different majors among the male students in
CMSU.
Find the conditional probability of different majors among the female students of CMSU.
Let A be the event that a student has a particular Major, and B the event that the student is of a given Gender.
P(A|B) = P(A ∩ B) / P(B)
Accounting major: Female 3 and Male 4
Total Female = 33
Total Male = 29
Conditional probability of the Accounting major among male students = 4/29 ≈ 14%
Conditional probability of the Accounting major among female students = 3/33 ≈ 9%
Q 2.2.3. Find the conditional probability of intent to graduate, given that the student is a
male.
Find the conditional probability of intent to graduate, given that the student is a female.
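A minimal sketch of this calculation, using the counts from the Gender and Grad Intention table in 2.1.2, is given below. It assumes that "intent to graduate" corresponds to the Yes column; the function name is illustrative.

```python
# Counts from the Gender x Grad Intention table in 2.1.2.
grad_intention = {
    "Female": {"No": 9, "Undecided": 13, "Yes": 11},
    "Male": {"No": 3, "Undecided": 9, "Yes": 17},
}


def conditional(table, gender):
    """P(category | gender) = count(gender, category) / count(gender)."""
    row = table[gender]
    row_total = sum(row.values())
    return {category: count / row_total for category, count in row.items()}


p_given_male = conditional(grad_intention, "Male")
p_given_female = conditional(grad_intention, "Female")
print(f"P(Yes | Male) = {p_given_male['Yes']:.1%}")      # prints 58.6%
print(f"P(Yes | Female) = {p_given_female['Yes']:.1%}")  # prints 33.3%
```

The same function can be applied to the Employment and Computer tables once their counts are complete.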
Q 2.2.4. Find the conditional probability of employment status for the male students as well
as for the female students.
Q 2.2.5. Find the conditional probability of laptop preference among the male students as
well as among the female students.
Q 2.3. Based on the above probabilities, do you think that the column variable in each case
is independent of Gender?
Justify your comment in each case
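Beyond comparing conditional probabilities by eye, one common way to assess independence is a chi-square test on the contingency table. The sketch below assumes that approach and applies it to the Gender and Grad Intention counts from 2.1.2. The closed-form p-value used here is valid only for 2 degrees of freedom; for other table shapes a library routine such as scipy.stats.chi2_contingency would be needed.

```python
import math

# Observed counts from the Gender x Grad Intention table in 2.1.2.
observed = [
    [9, 13, 11],  # Female: No, Undecided, Yes
    [3, 9, 17],   # Male:   No, Undecided, Yes
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Chi-square statistic: sum of (observed - expected)^2 / expected,
# where expected = row total * column total / grand total.
chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi2 += (obs - expected) ** 2 / expected

dof = (len(observed) - 1) * (len(observed[0]) - 1)
# For exactly 2 degrees of freedom the chi-square survival function is exp(-x/2).
assert dof == 2
p_value = math.exp(-chi2 / 2)
print(f"chi2 = {chi2:.3f}, dof = {dof}, p = {p_value:.3f}")
```

The p-value is then compared with the chosen significance level; a small p-value would suggest that the column variable is not independent of Gender.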
Q 2.4. Note that there are three numerical (continuous) variables in the data set, Salary,
Spending and Text Messages. For each of them comment whether they follow a normal
distribution.
Write a note summarizing your conclusions.
[Recall that a symmetric histogram does not necessarily mean that the underlying distribution
is symmetric.]
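As a rough numerical complement to histograms, sample skewness and excess kurtosis can be computed; both are close to 0 for normally distributed data. The sketch below is one simple way to do this with made-up placeholder values, not the survey data. A formal test (for example scipy.stats.shapiro) would be a stronger check.

```python
import statistics


def skew_and_excess_kurtosis(values):
    """Population (biased) moment estimates of skewness and excess kurtosis."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3


# Placeholder values (made-up, NOT the survey data).
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]

print(skew_and_excess_kurtosis(symmetric))
print(skew_and_excess_kurtosis(right_skewed))
```

A clearly positive skewness, as in the second placeholder sample, indicates a long right tail and is one sign of departure from normality.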
Problem 3
Q 3.1. For the A shingles, form the null and alternative hypothesis to test whether the
population mean moisture content is less than 0.35 pound per 100 square feet.
Q 3.2. For the B shingles, form the null and alternative hypothesis to test whether the
population mean moisture content is less than 0.35 pound per 100 square feet.
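For both Q 3.1 and Q 3.2, a natural formulation is a one-tailed test with H0: mu >= 0.35 and H1: mu < 0.35, where mu is the population mean moisture content. The sketch below computes the one-sample t statistic for such a test. The readings are made-up placeholders, not the shingles data, and the function name is illustrative. The p-value would come from the lower tail of the t distribution (for example via scipy.stats.ttest_1samp with alternative="less").

```python
import math
import statistics


def one_sample_t(values, mu0):
    """t = (sample mean - mu0) / (s / sqrt(n)), with n - 1 degrees of freedom."""
    n = len(values)
    mean = statistics.fmean(values)
    s = statistics.stdev(values)
    return (mean - mu0) / (s / math.sqrt(n)), n - 1


# Placeholder moisture readings (made-up, NOT the shingles data).
readings = [0.30, 0.32, 0.28, 0.31, 0.29]
t_stat, dof = one_sample_t(readings, mu0=0.35)
print(f"t = {t_stat:.3f}, dof = {dof}")
```

A strongly negative t statistic supports H1 (mean below 0.35); whether it is significant depends on the chosen significance level.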
Q 3.3. Do you think that the population means for shingles A and B are equal?
Form the hypothesis and conduct the test of the hypothesis.
What assumption do you need to check before the test for equality of means is performed?
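For the equality of means, the usual formulation is H0: mu_A = mu_B against H1: mu_A != mu_B, and one assumption commonly checked first is whether the two population variances are equal. The sketch below assumes the pooled-variance (equal variances) t test and also reports the ratio of sample variances as a quick check. The samples are made-up placeholders, not the shingles data.

```python
import math
import statistics


def pooled_t(a, b):
    """Two-sample t statistic assuming equal population variances."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    pooled_var = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.fmean(a) - statistics.fmean(b)) / se, na + nb - 2


# Placeholder samples (made-up, NOT the shingles data).
sample_a = [0.30, 0.32, 0.28, 0.31, 0.29]
sample_b = [0.27, 0.29, 0.25, 0.28, 0.26]

# A ratio far from 1 would cast doubt on the equal-variance assumption.
variance_ratio = statistics.variance(sample_a) / statistics.variance(sample_b)
t_stat, dof = pooled_t(sample_a, sample_b)
print(f"variance ratio = {variance_ratio:.2f}, t = {t_stat:.3f}, dof = {dof}")
```

If the variances look clearly unequal, a test that does not pool the variances (Welch's t test) would be the safer choice.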
Q 3.4. What assumption about the population distribution is needed in order to conduct the
hypothesis tests above?