Data Analysis and Probability Insights

The document discusses analyzing survey data from CMSU students. It includes constructing contingency tables for gender vs major, graduation intention, employment, and computer use. It then calculates conditional probabilities for these variables given gender. It also comments on whether the column variables are independent of gender. Finally, it discusses hypothesis tests for shingle moisture content data and the assumptions needed.

- Q 1.2: Should the describe command be used, or should the answer be based on correlation? A pairplot is an option, but it does not give Region- or Channel-related correlation.
- Q 1.3: Should the answer be based on standard deviation?
- Q 1.5: Pending

Problem 1

Q 1.1  Use methods of descriptive statistics to summarize data.

#Which Region and which Channel seems to spend more?

#Which Region and which Channel seems to spend less?

Answer: The max and min commands were used to identify the highest- and lowest-spending Regions and Channels.

The Other region spends the most.
The Retail channel spends the most.
The Lisbon region spends the least.
The Hotel channel spends the least.
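
The max/min comparison above could be sketched in pandas roughly as follows. This is only an illustration: the item columns (`Fresh`, `Milk`) and all the numbers are made-up placeholders, not the actual data set, and the real data has six item columns.

```python
import pandas as pd

# Hypothetical toy rows (NOT the real data); column names are assumptions.
df = pd.DataFrame({
    "Region":  ["Lisbon", "Oporto", "Oporto", "Other", "Other", "Other"],
    "Channel": ["Hotel", "Hotel", "Retail", "Hotel", "Retail", "Retail"],
    "Fresh":   [100, 200, 150, 300, 400, 250],
    "Milk":    [50, 80, 60, 120, 200, 90],
})

item_cols = ["Fresh", "Milk"]

# Total spend per row, summed across the item columns.
df["Total"] = df[item_cols].sum(axis=1)

# Aggregate total spend by Region and by Channel.
spend_by_region = df.groupby("Region")["Total"].sum()
spend_by_channel = df.groupby("Channel")["Total"].sum()

print("Region spending most: ", spend_by_region.idxmax())
print("Region spending least:", spend_by_region.idxmin())
print("Channel spending most: ", spend_by_channel.idxmax())
print("Channel spending least:", spend_by_channel.idxmin())
```

Note that group totals favour groups with more rows; replacing `.sum()` with `.mean()` in the `groupby` lines would compare average spend per customer instead.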

Q 1.2 

Six different varieties of items are considered.

#Do all varieties show similar behaviour across Region and Channel?

Answer:

Q 1.3 

On the basis of the descriptive measure of variability, which item shows the most inconsistent
behaviour? Which item shows the least inconsistent behaviour?
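
One possible way to answer this is the coefficient of variation (standard deviation divided by mean), since the raw standard deviation depends on the scale of each item. The sketch below assumes that approach; the column names and values are hypothetical placeholders, not the actual data.

```python
import pandas as pd

# Hypothetical spend values for two items (NOT the real data).
df = pd.DataFrame({
    "Fresh": [100, 200, 150, 300, 400, 250],
    "Milk":  [50, 80, 60, 120, 200, 90],
})

# Coefficient of variation: sample standard deviation relative to the mean.
# It is unitless, so items with different spend levels can be compared.
cv = df.std() / df.mean()

most_inconsistent = cv.idxmax()   # highest relative variability
least_inconsistent = cv.idxmin()  # lowest relative variability

print(cv.round(3))
print("Most inconsistent item: ", most_inconsistent)
print("Least inconsistent item:", least_inconsistent)
```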
Q 1.4 

Are there any outliers in the data?

Answer: Yes, there are outliers in all 6 items; the box plots for each item confirm this.
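
A box plot flags points lying more than 1.5 times the interquartile range (IQR) beyond the quartiles. A minimal sketch of that rule, using made-up values rather than the actual data, might look like this:

```python
import pandas as pd

# Hypothetical values for one item (NOT the real data).
values = pd.Series([10, 12, 11, 13, 12, 14, 11, 95], name="Item")

# Quartiles and interquartile range.
q1 = values.quantile(0.25)
q3 = values.quantile(0.75)
iqr = q3 - q1

# Standard box-plot fences: 1.5 * IQR beyond each quartile.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Points outside the fences are the ones a box plot draws as outliers.
outliers = values[(values < lower_fence) | (values > upper_fence)]
print(outliers.tolist())
```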

Q 1.5 

On the basis of this report, what are the recommendations?


Problem 2

Part I

2.1. For this data, construct the following contingency tables (Keep Gender as row variable)

2.1.1. Gender and Major

Major   Accounting  CIS  Economics/Finance  International Business  Management  Other  Retailing/Marketing  Undecided
Gender
Female  3.0         3.0  7.0                4.0                     4.0         3.0    9.0                  NaN
Male    4.0         1.0  4.0                2.0                     6.0         4.0    5.0

2.1.2. Gender and Grad Intention

Grad Intention  No  Undecided  Yes
Gender
Female          9   13         11
Male            3   9          17

2.1.3. Gender and Employment

Employment  Full-Time  Part-Time  Unemployed
Gender
Female      3          24         6
Male        7          19

2.1.4. Gender and Computer

Computer  Desktop  Laptop  Tablet
Gender
Female    2.0      29.0    2.0
Male      3.0      26.0    NaN
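
Contingency tables of this kind can be built with `pandas.crosstab`. The sketch below uses a handful of hypothetical survey rows (not the CMSU data) and assumes the columns are named `Gender` and `Computer`.

```python
import pandas as pd

# Hypothetical survey rows (NOT the CMSU data); column names are assumptions.
survey = pd.DataFrame({
    "Gender":   ["Female", "Female", "Male", "Male", "Female", "Male"],
    "Computer": ["Laptop", "Tablet", "Laptop", "Desktop", "Laptop", "Laptop"],
})

# Gender is the row variable, Computer the column variable.
table = pd.crosstab(survey["Gender"], survey["Computer"])
print(table)

# margins=True appends row and column totals, which are useful later
# as denominators for marginal and conditional probabilities.
table_with_totals = pd.crosstab(survey["Gender"], survey["Computer"], margins=True)
print(table_with_totals)
```

`crosstab` reports a count of 0 for combinations that never occur, so a NaN cell such as the one above can be read as a zero count.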

Part II
Q 2.2.1. What is the probability that a randomly selected CMSU student will be male?
What is the probability that a randomly selected CMSU student will be female?
The probability that a randomly selected CMSU student is male is 29/62, about 47%.

The probability that a randomly selected CMSU student is female is 33/62, about 53%.


(Please refer to the Python book for coding details.)

Q 2.2.2. Find the conditional probability of different majors among the male students in
CMSU.
Find the conditional probability of different majors among the female students of CMSU.

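Let A be the event that a student has a particular Major and B the event that the student is of a given Gender. P(A) is the probability of that Major, and

P(A|B) = P(A ∩ B) / P(B)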

#Major Accounting - Female 3 and Male 4

Total Female students = 33

Total Male students = 29

- Conditional probability of an Accounting major among male students = 4/29 ≈ 14%
- Conditional probability of an Accounting major among female students = 3/33 ≈ 9%
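
The arithmetic above can be checked against the formula P(A|B) = P(A ∩ B) / P(B) with a few lines of Python. The counts are the ones quoted above; the variable names are just illustrative.

```python
# Counts taken from the Gender and Major table above.
n_female, n_male = 33, 29
n_total = n_female + n_male          # 62 students in all

accounting_male = 4
accounting_female = 3

# Marginal probabilities of each gender.
p_male = n_male / n_total
p_female = n_female / n_total

# Joint probabilities P(Accounting and gender).
p_acc_and_male = accounting_male / n_total
p_acc_and_female = accounting_female / n_total

# Conditional probabilities via P(A|B) = P(A and B) / P(B).
p_acc_given_male = p_acc_and_male / p_male        # equals 4/29
p_acc_given_female = p_acc_and_female / p_female  # equals 3/33

print(f"P(Male) = {p_male:.3f}, P(Female) = {p_female:.3f}")
print(f"P(Accounting | Male)   = {p_acc_given_male:.3f}")
print(f"P(Accounting | Female) = {p_acc_given_female:.3f}")
```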

Q 2.2.3. Find the conditional probability of intent to graduate, given that the student is a
male.
Find the conditional probability of intent to graduate, given that the student is a female.
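
One way to obtain these conditional probabilities is to divide each row of the Gender and Grad Intention table by its row total. The sketch below uses the counts from table 2.1.2; it is one possible approach, not necessarily the one used in the original workbook.

```python
import pandas as pd

# Counts from the Gender and Grad Intention table (2.1.2).
counts = pd.DataFrame(
    {"No": [9, 3], "Undecided": [13, 9], "Yes": [11, 17]},
    index=["Female", "Male"],
)

# Divide every row by its own total, so each row sums to 1 and holds
# P(Grad Intention | Gender).
cond_prob = counts.div(counts.sum(axis=1), axis=0)
print(cond_prob.round(3))

p_yes_given_male = cond_prob.loc["Male", "Yes"]      # 17/29
p_yes_given_female = cond_prob.loc["Female", "Yes"]  # 11/33
```

The same row-normalisation applies unchanged to the Employment and Computer tables.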

Q 2.2.4. Find the conditional probability of employment status for the male students as well
as for the female students.

Q 2.2.5. Find the conditional probability of laptop preference among the male students as
well as among the female students.

Q 2.3. Based on the above probabilities, do you think that the column variable in each case
is independent of Gender?
Justify your comment in each case
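
If a column variable is independent of Gender, the conditional probability given Gender should be close to the unconditional (marginal) probability. The sketch below makes that comparison for Grad Intention using the counts from table 2.1.2. The chi-square test at the end is an extra formal check that the question does not ask for; it is included here only as an assumption about how one might back up the comparison.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Rows: Female, Male. Columns: No, Undecided, Yes (table 2.1.2).
counts = np.array([[9, 13, 11],
                   [3, 9, 17]])

n_total = counts.sum()

# Marginal P(Yes) versus conditional P(Yes | gender).
p_yes = counts[:, 2].sum() / n_total
p_yes_given_female = counts[0, 2] / counts[0].sum()
p_yes_given_male = counts[1, 2] / counts[1].sum()

print(f"P(Yes) = {p_yes:.3f}")
print(f"P(Yes | Female) = {p_yes_given_female:.3f}")
print(f"P(Yes | Male)   = {p_yes_given_male:.3f}")

# Optional formal check: chi-square test of independence.
# A small p-value would suggest Grad Intention depends on Gender.
chi2, p_value, dof, expected = chi2_contingency(counts)
print(f"chi2 = {chi2:.3f}, dof = {dof}, p-value = {p_value:.3f}")
```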

Q 2.4. Note that there are three numerical (continuous) variables in the data set, Salary,
Spending and Text Messages. For each of them comment whether they follow a normal
distribution.
Write a note summarizing your conclusions.
[Recall that symmetric histogram does not necessarily mean that the underlying distribution
is symmetric]
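
Besides looking at histograms, a normality check can be sketched with the sample skewness and the Shapiro-Wilk test from SciPy. The data below is randomly generated as a stand-in for a column such as Salary; it is not the survey data, and the choice of Shapiro-Wilk is an assumption rather than the method prescribed by the question.

```python
import numpy as np
from scipy import stats

# Hypothetical stand-in for one numerical column (NOT the survey data).
rng = np.random.default_rng(0)
salary = rng.normal(loc=50, scale=10, size=62)

# Skewness near 0 is consistent with a symmetric distribution.
skewness = stats.skew(salary)

# Shapiro-Wilk test: the null hypothesis is that the data is normal.
# A p-value below the chosen significance level (e.g. 0.05) would
# suggest the column does not follow a normal distribution.
w_stat, p_value = stats.shapiro(salary)

print(f"skewness = {skewness:.3f}")
print(f"Shapiro-Wilk W = {w_stat:.3f}, p-value = {p_value:.3f}")
```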

Problem 3

Q 3.1. For the A shingles, form the null and alternative hypothesis to test whether the
population mean moisture content is less than 0.35 pound per 100 square feet.

Q 3.2. For the B shingles, form the null and alternative hypothesis to test whether the
population mean moisture content is less than 0.35 pound per 100 square feet.
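
For both shingle types the hypotheses take the form H0: μ ≥ 0.35 against H1: μ < 0.35, which is a one-sided (left-tailed) one-sample t-test. A minimal sketch with SciPy is shown below. The sample values are invented for illustration and are not the shingle measurements, and the `alternative` argument assumes SciPy 1.6 or newer.

```python
from scipy import stats

# Hypothetical moisture readings (NOT the actual shingle data).
sample = [0.30, 0.32, 0.35, 0.29, 0.33, 0.31, 0.36, 0.28]

# H0: population mean >= 0.35
# H1: population mean <  0.35  (left-tailed test)
result = stats.ttest_1samp(sample, popmean=0.35, alternative="less")

print(f"t = {result.statistic:.3f}, p-value = {result.pvalue:.4f}")

# Reject H0 at the 5% level only if the p-value is below 0.05.
alpha = 0.05
reject_h0 = result.pvalue < alpha
print("Reject H0" if reject_h0 else "Fail to reject H0")
```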

Q 3.3. Do you think that the population means for shingles A and B are equal?
Form the hypothesis and conduct the test of the hypothesis.
What assumption do you need to check before the test for equality of means is performed?
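
Here the hypotheses are H0: μA = μB against H1: μA ≠ μB. Before a pooled two-sample t-test, the equality of the two population variances is usually checked. The sketch below uses Levene's test for that check and then picks the pooled or Welch version of the t-test accordingly. The readings are invented placeholders, not the shingle data, and Levene's test is one common choice rather than the only option.

```python
from scipy import stats

# Hypothetical moisture readings (NOT the actual shingle data).
shingles_a = [0.30, 0.32, 0.35, 0.29, 0.33, 0.31, 0.36, 0.28]
shingles_b = [0.27, 0.29, 0.31, 0.26, 0.30, 0.28, 0.25, 0.32]

# Assumption check: Levene's test, H0 = the two variances are equal.
levene_stat, levene_p = stats.levene(shingles_a, shingles_b)
equal_var = bool(levene_p > 0.05)

# Two-sided test of H0: mean_A == mean_B.
# equal_var=True gives the pooled t-test, False gives Welch's t-test.
t_stat, p_value = stats.ttest_ind(shingles_a, shingles_b, equal_var=equal_var)

print(f"Levene p-value = {levene_p:.3f} (equal variances assumed: {equal_var})")
print(f"t = {t_stat:.3f}, p-value = {p_value:.4f}")
```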

Q 3.4. What assumption about the population distribution is needed in order to conduct the
hypothesis tests above?
