Inferential Statistics (IS)

Guided project report

Submitted to

By
Pirangi Charan Teja Goud

In partial Fulfillment of
PGP-DSBA

List of Tables
List of Equations
Data Dictionary
Introduction
Step 2: Select the Significance Level
Step 3: Conduct the Hypothesis Test
Results for Shingle A:
Results for Shingle B:
Step 4: Decision Based on p-value
Step 1: Define the Hypotheses
Step 2: Choose Significance Level
Step 3: Identify the Test Statistic
Step 4: Compute the Test Statistic and p-value
Step 5: Decision Rule
Step 6: Conclusion
Problem 3
Salary Distribution by Education Level (Boxplot)
Summary of Salary Distribution by Education Level (Boxplot Analysis)
Salary Variation and Spread
Presence of Outliers
Skewness and Distribution Shape
Conclusion
Summary of Salary Distribution by Occupation (Boxplot Analysis)
Median Salary Comparison
Salary Variability and Dispersion
Outliers and Distribution Characteristics
Conclusion
Step 1: State the Hypotheses
Step 2: Check the Assumptions of ANOVA
Step 3: Conduct the Hypothesis Test (One-Way ANOVA)
Step 4: Conclusion from the Results
Conclusion
Step 1:
Null Hypothesis (H₀)
Alternative Hypothesis (H₁)
Step 2: Assumption Checks
Step 3: Normality Test (Shapiro-Wilk Test)
Step 4: Homogeneity of Variance (Levene’s Test)
Step 4: Conduct the Hypothesis Test (One-Way ANOVA)
Step 2: Check the Assumptions
Step 3: Conduct the Hypothesis Test (Two-Way ANOVA)
ANOVA Table
Step 4: Conclusion

List of Tables
1. Table 1: Summary Statistics of Student Demographics
2. Table 2: Probability Analysis of Gender Distribution
3. Table 3: Conditional Probabilities of Majors by Gender
4. Table 4: Summary Statistics of Moisture Content in Shingles
5. Table 5: Hypothesis Test Results for Moisture Content
6. Table 6: ANOVA Results for Salary Differences

List of Figures

1. Figure 1: Distribution of GPA

2. Figure 2: Distribution of Salary
3. Figure 3: Distribution of Spending
4. Figure 4: Distribution of Text Messages
5. Figure 5: Q-Q Plot of GPA Data
6. Figure 6: Q-Q Plot of Salary Data
7. Figure 7: Q-Q Plot of Spending Data
8. Figure 8: Q-Q Plot of Text Messages Data
9. Figure 9: Histogram of Moisture Content – Shingle A
10. Figure 10: Boxplot of Moisture Content – Shingle A
11. Figure 11: Histogram of Moisture Content – Shingle B
12. Figure 12: Boxplot of Moisture Content – Shingle B
13. Figure 13: Correlation Heatmap Between Shingles A & B
14. Figure 14: Boxplot – Salary Distribution by Education Level
15. Figure 15: Boxplot – Salary Distribution by Occupation

List of Equations
1. Probability Formula:

$$P(\text{Event}) = \frac{\text{Favorable Outcomes}}{\text{Total Outcomes}}$$

2. Conditional Probability Formula:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$

3. Union of Two Events:

$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$

4. ANOVA F-Test Formula:

$$F = \frac{\text{Between-Group Variance}}{\text{Within-Group Variance}}$$

Data Dictionary

| Column Name | Data Type | Description |
|---|---|---|
| ID | Integer | Unique identifier for each student |
| Gender | Text | Student’s gender (Male/Female) |
| Age | Integer | Student’s age in years |
| Class | Text | Academic class or year |
| Major | Text | Declared major field of study |
| Grad Intention | Text | Whether the student intends to graduate (Yes/No) |
| GPA | Float | Student's Grade Point Average |
| Employment | Text | Employment status (Full-time, Part-time, Unemployed) |
| Salary | Float | Monthly salary in dollars |
| Social Networking | Float | Hours spent on social networking per day |
| Satisfaction | Integer | Satisfaction score (scale-based) |
| Spending | Float | Amount of money spent per semester |
| Computer | Text | Type of computer owned (Laptop/None) |
| Text Messages | Integer | Number of text messages sent per day |

Introduction

This report applies inferential statistics to three problems: (1) student demographics and behavior at a
university, (2) moisture content in ABC asphalt shingles, and (3) the relationship between salary, education,
and occupation. The techniques used are probability analysis, hypothesis testing, and ANOVA. Each section
follows the same structure: problem statement, methodology, results, and conclusions. The findings are
intended to support organizational decision-making.

Problem 1:
The Student News Service at Clear Mountain State University (CMSU) has decided to gather data about the
undergraduate students that attend CMSU. CMSU creates and distributes a survey of 14 questions and receives
responses from 62 undergraduates (stored in the Survey data set).

1.1 What is the probability that a randomly selected CMSU student will be male?
Solution: The probability that a randomly selected CMSU student will be male is
Probability = (Number of male students) / (Total number of students)

From the dataset:

• Number of male students = 29
• Total number of students = 62

P(Male) = 29/62 ≈ 0.468

Thus, the probability that a randomly selected CMSU student will be male is 0.468 (or 46.8%).

1.2 What is the probability that a randomly selected CMSU student will be female?
Solution: The probability of selecting a female student is calculated using the formula
Probability = (Number of female students) / (Total number of students)

From the dataset:

• Number of female students = 33

• Total number of students = 62
P(Female) = 33/62 ≈ 0.532

Thus, the probability that a randomly selected CMSU student will be female is 0.532 (or 53.2%).
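
The two marginal probabilities above can be checked with a few lines of Python. This is a minimal sketch that uses only the counts reported in the text (29 male, 33 female); the variable names are illustrative and not taken from the original analysis.

```python
from fractions import Fraction

# Counts reported in the text for the 62 survey respondents.
n_male, n_female = 29, 33
n_total = n_male + n_female

# Exact fractions avoid rounding until the final display step.
p_male = Fraction(n_male, n_total)
p_female = Fraction(n_female, n_total)

print(f"P(Male)   = {p_male} = {float(p_male):.3f}")
print(f"P(Female) = {p_female} = {float(p_female):.3f}")
```

Because every student in the survey is recorded as either male or female, the two probabilities sum to exactly 1.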

1.3 What is the conditional probability of different majors among male students in CMSU?

Probability = (Number of male students in a specific major) / (Total number of male students)

• Management: P(Management | Male) = 6/29 ≈ 0.207
• Retailing/Marketing: P(Retailing or Marketing | Male) = 5/29 ≈ 0.172
• Other: P(Other | Male) = 4/29 ≈ 0.138
• Economics/Finance: P(Economics or Finance | Male) = 4/29 ≈ 0.138
• Accounting: P(Accounting | Male) = 4/29 ≈ 0.138
• Undecided: P(Undecided | Male) = 3/29 ≈ 0.103
• International Business: P(International Business | Male) = 2/29 ≈ 0.069
• CIS: P(CIS | Male) = 1/29 ≈ 0.034
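
The same conditional probabilities can be produced from a table of counts. The sketch below hard-codes the male counts listed above, since the raw survey file is not part of this report; the dictionary name is an assumption made for illustration.

```python
from fractions import Fraction

# Number of male students per major, as listed in the text.
male_major_counts = {
    "Management": 6,
    "Retailing/Marketing": 5,
    "Other": 4,
    "Economics/Finance": 4,
    "Accounting": 4,
    "Undecided": 3,
    "International Business": 2,
    "CIS": 1,
}

n_male = sum(male_major_counts.values())  # should equal the 29 male students

# P(Major | Male) = count in major / total males
cond_prob = {major: Fraction(count, n_male) for major, count in male_major_counts.items()}

for major, p in cond_prob.items():
    print(f"P({major} | Male) = {p.numerator}/{p.denominator} = {float(p):.3f}")
```

Summing the counts first is a useful sanity check: the majors partition the male students, so the conditional probabilities must add up to 1.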

1.4 What is the conditional probability of different majors among the female students of CMSU?
Solution: The conditional probability of a female student being in a particular major is given by the formula:
Probability = (Number of female students in a specific major) / (Total number of female students)

From the dataset, the total number of female students is 33. Below are the probabilities for different majors
among female students:

• Retailing/Marketing: P(Retailing or Marketing | Female) = 9/33 ≈ 0.273
• Economics/Finance: P(Economics or Finance | Female) = 7/33 ≈ 0.212
• Management: P(Management | Female) = 4/33 ≈ 0.121
• International Business: P(International Business | Female) = 4/33 ≈ 0.121
• Other: P(Other | Female) = 3/33 ≈ 0.091
• CIS: P(CIS | Female) = 3/33 ≈ 0.091
• Accounting: P(Accounting | Female) = 3/33 ≈ 0.091

These probabilities represent the likelihood that a randomly selected female student is in a specific major.

1.5 What is the probability that a randomly chosen student is a male and intends to graduate?
Solution: The probability that a randomly chosen student is male and intends to graduate is calculated using the
formula:
Probability = (Number of male students who intend to graduate) / (Total number of students)

From the dataset:

• Number of male students who intend to graduate = 17

• Total number of students = 62
P(Male ∩ Graduate Intention) = 17/62 ≈ 0.274
Thus, the probability that a randomly chosen student is male and intends to graduate is 0.274 (or
27.4%).

1.6 What is the probability that a randomly selected student is a female and does NOT have a laptop?
Solution: the probability that a randomly selected student is a female and does NOT have a laptop:
Probability = (Number of female students without a laptop) / (Total number of students)

From the dataset:

• Number of female students who do not have a laptop = 4

• Total number of students = 62
P(Female ∩ No Laptop) = 4/62 ≈ 0.065

The probability that a randomly selected student is female and does NOT have a laptop is 0.065 (or 6.5%).

1.7 What is the probability that a randomly chosen student is a male or has full-time employment?
Solution: The probability that a randomly chosen student is male or has full-time employment is calculated
using the formula:

P (Male ∪ Full-Time Employment) =P(Male)+P (Full-Time Employment) −P (Male ∩ Full-Time Employment)

From the dataset:

• Number of male students = 29
• Number of students with full-time employment = 11
• Number of male students with full-time employment = 6
• Total number of students = 62

Now, calculating the probabilities:

P(Male) = 29/62 ≈ 0.468

P(Full-Time Employment) = 11/62 ≈ 0.177

P(Male ∩ Full-Time Employment) = 6/62 ≈ 0.097

Applying the formula:

P (Male ∪ Full-Time Employment) = 0.468 + 0.177 − 0.097 = 0.548

The probability that a randomly chosen student is male or has full-time employment is 0.548 (or 54.8%).
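
The addition rule used here is easy to verify in code. The following sketch plugs the counts reported in the text into P(A ∪ B) = P(A) + P(B) − P(A ∩ B); the variable names are illustrative.

```python
from fractions import Fraction

n_total = 62
n_male = 29            # male students
n_full_time = 11       # students with full-time employment
n_male_full_time = 6   # male students with full-time employment

p_male = Fraction(n_male, n_total)
p_full_time = Fraction(n_full_time, n_total)
p_both = Fraction(n_male_full_time, n_total)

# Subtract the overlap once so that male full-time students are not counted twice.
p_union = p_male + p_full_time - p_both

print(f"P(Male or Full-Time) = {p_union} = {float(p_union):.3f}")
```

Equivalently, 29 + 11 − 6 = 34 students are male or employed full-time, and 34/62 gives the same result.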

1.8 What is the conditional probability that given a female student is randomly chosen, she is majoring in
international business or management?
Solution: Given that a randomly chosen student is female, the conditional probability that she is majoring in
International Business or Management is
Probability = (Number of female students majoring in International Business or Management) / (Total number
of female students)
= (4 + 4)/33 = 8/33 ≈ 0.242

Thus, the probability that a randomly chosen female student is majoring in International Business or
Management is 0.242 (or 24.2%).
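
Since a student has one declared major, the two events are mutually exclusive and their counts can simply be added. A short sketch using the female counts from section 1.4 (the dictionary name is an assumption for illustration):

```python
from fractions import Fraction

# Number of female students per major, as listed in section 1.4.
female_major_counts = {
    "Retailing/Marketing": 9,
    "Economics/Finance": 7,
    "Management": 4,
    "International Business": 4,
    "Other": 3,
    "CIS": 3,
    "Accounting": 3,
}

n_female = sum(female_major_counts.values())  # should equal 33

# Mutually exclusive categories: add the counts, then divide by the female total.
n_ib_or_mgmt = (female_major_counts["International Business"]
                + female_major_counts["Management"])
p_ib_or_mgmt = Fraction(n_ib_or_mgmt, n_female)

print(f"P(IB or Management | Female) = {p_ib_or_mgmt} = {float(p_ib_or_mgmt):.3f}")
```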

1.9 If a student is chosen randomly, what is the probability that his/her GPA is less than 3?
Solution: The probability that a randomly chosen student has a GPA less than 3 is given by:

Probability = (Number of students with GPA < 3) / (Total number of students)

Using the dataset:

• Number of students with GPA < 3 = 17
• Total number of students = 62

P(GPA < 3) = 17/62 ≈ 0.274
Thus, the probability that a randomly chosen student has a GPA less than 3 is 0.274 (or 27.4%).

1.10 What is the conditional probability that a randomly selected male earns 50 or more?
Solution:
The conditional probability that a randomly selected male earns 50 or more is
P(Earning ≥ 50 | Male) = (Number of males earning ≥ 50) / (Total number of males)
= 14/29 ≈ 0.4828

Thus, the conditional probability that a randomly selected male earns 50 or more is 0.4828 (or 48.28%).

1.11 What is the conditional probability that a randomly selected female earns 50 or more?
Solution:
The conditional probability that a randomly selected female earns 50 or more is
P(Earning ≥ 50 | Female) = (Number of females earning ≥ 50) / (Total number of females)
= 6/11 ≈ 0.5455 (6/11 is the reduced form of 18/33, given the 33 female students)

Thus, the conditional probability that a randomly selected female earns 50 or more is 0.5455 (or 54.55%).

1.12 Are the continuous variables in the data normally distributed? Write a note summarizing your conclusions.
Solution:

Fig 1: Distribution of GPA
The distribution of GPA appears to be approximately normal, with a slight left skew (skew = -0.31). This
suggests that most students have GPAs clustered around the mean, with slightly more values leaning towards
the higher end. Since the distribution is close to normal, standard statistical methods can be applied without
significant concerns.

Fig 2: Distribution of Salary

The Salary distribution is moderately right-skewed (skew = 0.53). This indicates that most individuals earn
within a lower to mid-range salary, but a few earn significantly higher amounts, pulling the distribution to the
right. While this skewness is not extreme, it suggests that median-based statistics might provide a better
representation of central tendency than the mean.

Fig 3: Distribution of Spending

Spending is highly right-skewed (skew=1.59), meaning that most individuals spend relatively low amounts,
while a few outliers have significantly higher spending levels. This skewness suggests that a small percentage
of people drive up the average spending. A log transformation may help normalize this variable for better
statistical analysis.

Fig 4: Distribution of Text Messages

Similarly, Text Messages show a highly right-skewed distribution (skew=1.30), indicating that most
individuals send relatively few messages, while a smaller group sends an exceptionally high number. This
skewed nature suggests that using median-based measures or transformation techniques could be beneficial in
further analysis.

Conclusion: GPA is nearly normal, while Salary, Spending, and Text Messages are right-skewed because of
high-value outliers. If an analysis requires normality, either apply a transformation such as logarithmic
scaling or use non-parametric tests instead.
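
To make the skewness figures concrete, the sketch below computes a sample skewness coefficient and shows how a log transformation reduces right skew. The report does not state which estimator produced its skew values, so the bias-corrected (adjusted Fisher-Pearson) form used here is an assumption, and the data values are hypothetical.

```python
import math

def sample_skewness(values):
    """Adjusted Fisher-Pearson sample skewness (bias-corrected)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 observations")
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n   # third central moment
    g1 = m3 / m2 ** 1.5                             # uncorrected skewness
    return math.sqrt(n * (n - 1)) / (n - 2) * g1    # small-sample correction

# Hypothetical values, not taken from the survey data.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]
logged = [math.log(x) for x in right_skewed]

print(f"symmetric:    {sample_skewness(symmetric):.3f}")
print(f"right-skewed: {sample_skewness(right_skewed):.3f}")
print(f"after log:    {sample_skewness(logged):.3f}")
```

A positive value indicates a longer right tail and a negative value a longer left tail; the log transform pulls in the large values and so lowers the coefficient.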

Fig 5: GPA Data: Theoretical vs. Observed Quantiles

Summary:

• The GPA data appears to be somewhat normally distributed, but there may be slight deviations in
the tails, indicating potential skewness or outliers.

Fig 6: Salary Data: Theoretical vs. Observed Quantiles

• If the points follow the diagonal line closely, the Salary data is normally distributed.
• If the points deviate significantly, especially at the higher or lower ends, it suggests that the Salary data
may have a skewed distribution or outliers.
• Summary:
• The Salary data likely deviates from normality, particularly in the tails. This could indicate a right-
skewed distribution, where a few individuals have significantly higher salaries compared to the
majority.
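
The reading rule above (points close to the diagonal indicate normality) follows from how a Q-Q plot is built: each ordered observation is paired with the quantile a normal distribution would place at the same rank. A minimal sketch using only the standard library; the plotting positions (i − 0.5)/n are one common convention among several, and the salary values are hypothetical.

```python
from statistics import NormalDist, mean, stdev

def qq_pairs(values):
    """Return (theoretical, observed) quantile pairs for a normal Q-Q plot."""
    ordered = sorted(values)
    n = len(ordered)
    # Normal distribution with the same mean and standard deviation as the sample.
    fitted = NormalDist(mean(ordered), stdev(ordered))
    return [
        (fitted.inv_cdf((i - 0.5) / n), observed)
        for i, observed in enumerate(ordered, start=1)
    ]

# Hypothetical salary values, not taken from the survey data.
salaries = [25, 30, 40, 40, 45, 50, 55, 60, 80]

for theoretical, observed in qq_pairs(salaries):
    print(f"{theoretical:7.2f}  {observed:7.2f}")
```

If the observed column grows faster than the theoretical column at the top end, the points curve upward away from the diagonal, which is the right-skew pattern described for Salary.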

Fig 7: Spending Data: Theoretical vs. Observed Quantiles

• If the points align well with the diagonal line, the Spending data is normally distributed.
• Deviations, especially at the higher end (e.g., points curving upward), suggest that the Spending
data may have a right-skewed distribution, with some individuals spending significantly more than
others.
• Summary:
• The Spending data appears to be right-skewed, with a few outliers or individuals spending much
more than the majority. This is common in spending data, where most people spend within a
certain range, but a few spend much more.

Fig 8: Text Messages Data: Theoretical vs. Observed Quantiles

• If the points follow the diagonal line, the Text Messages data is normally distributed.
• Deviations, especially at the higher end, suggest that the data may have a right-skewed
distribution, with some individuals sending significantly more text messages than others.
• Summary:
• The Text Messages data is likely right-skewed, with a few individuals sending a much higher
number of text messages compared to the majority. This is common in communication data,
where most people send a moderate number of messages, but a few are highly active.

• Conclusion:
• GPA: Approximately normally distributed, suitable for parametric methods with minor adjustments.
• Salary: Right-skewed, non-parametric methods or transformations recommended.
• Spending: Right-skewed, non-parametric methods or transformations recommended.
• Text Messages: Right-skewed, non-parametric methods or transformations recommended.
• For the skewed datasets (Salary, Spending, and Text Messages), consider using log transformations
or non-parametric statistical tests to handle the skewness and outliers. For GPA, parametric
methods can be used, but it’s important to check for outliers or slight deviations from normality.

Problem 2:

An important quality characteristic used by the manufacturers of ABC asphalt shingles is the amount of
moisture the shingles contain when they are packaged. Customers may feel that they have purchased a product
lacking in quality if they find moisture and wet shingles inside the packaging. In some cases, excessive moisture
can cause the granules attached to the shingles for texture and coloring purposes to fall off the shingles resulting
in appearance problems. To monitor the amount of moisture present, the company conducts moisture tests. A
shingle is weighed and then dried. The shingle is then reweighed and based on the amount of moisture taken out
of the product, the pounds of moisture per 100 square feet is calculated. The company would like to show that
the mean moisture content is less than 0.35 pounds per 100 square feet.

Exploratory Data Analysis (EDA):

Exploratory Data Analysis (EDA) is essential for understanding the distribution, patterns, and potential
anomalies within the dataset before conducting statistical tests. In this analysis, we will examine Shingle A and
Shingle B individually using summary statistics and visualizations.

Visualization for each shingle type: Histogram for Shingle - A

Fig 9: Histogram of Moisture Content – Shingle A

• Summary:
• The histogram shows the distribution of moisture content in Shingle A.
• If the shape is approximately normal, it indicates a well-distributed moisture content.
• If skewed, it suggests uneven moisture retention.

Boxplot for Shingle – A

Fig 10: Boxplot of Moisture Content – Shingle A

• Summary:
• The boxplot helps identify outliers (values far from the whiskers).
• If outliers exist, further investigation is needed—whether they are natural variations or errors.

Histogram for Shingle – B

Fig 11: Histogram of Moisture Content – Shingle B
• Summary
• Similar insights as above but specific to Shingle B.
• A wider spread would indicate higher variability in moisture content.

Boxplot for Shingle – B

Fig 12: Boxplot of Moisture Content – Shingle – B
• Summary
• Compares the moisture variability with A.
• If the box (IQR) is larger than A, Shingle B has higher variation in moisture content.

Correlation Heatmap Between Shingles A & B

Fig 13: Correlation Heatmap Between Shingles A and B

2.1 Is there any evidence that the mean moisture content in both types of shingles is within the permissible
limits?

Solution:

Step 1: State the Null and Alternate Hypotheses

For both Shingle A and Shingle B, we are testing if the mean moisture content is less than the permissible limit
(0.35 pounds per 100 square feet).

• Null Hypothesis (H₀): The mean moisture content is greater than or equal to 0.35.
H₀: μ ≥ 0.35

• Alternative Hypothesis (H₁): The mean moisture content is less than 0.35.
H₁: μ < 0.35

This is a one-tailed t-test for a single mean.

Step 2: Select the Significance Level

• We will use α = 0.05 (5%), which means we will reject the null hypothesis if the p-value is less than
0.05.

Step 3: Conduct the Hypothesis Test

We use a one-sample t-test to check if the mean moisture content is significantly less than 0.35.

Results for Shingle A:

• t-statistic: -1.4735
• p-value: 0.0748

Results for Shingle B:

• t-statistic: -3.6087
• p-value: 0.00048

Step 4: Decision Based on p-value

• For Shingle A: Since p-value (0.0748) > 0.05, we fail to reject H₀.
• Conclusion: There is not enough statistical evidence to conclude that the moisture content of
Shingle A is less than 0.35.
• The data therefore do not demonstrate that Shingle A meets the permissible moisture limit. Failing to
reject H₀ does not prove that the limit is exceeded.

• For Shingle B: Since p-value (0.00048) < 0.05, we reject H₀.

• Conclusion: There is enough evidence to conclude that the moisture content of Shingle B is less
than 0.35.
• This means Shingle B meets the permissible moisture limit.
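
The test statistic behind these results is t = (x̄ − μ₀) / (s / √n). The sketch below computes it with the standard library and applies the decision rule to the p-values reported above. The moisture readings are hypothetical, since the raw shingle data is not included in this report, and the function names are illustrative. The p-value itself requires the t-distribution CDF, which the standard library does not provide; a statistics package such as SciPy (`scipy.stats.ttest_1samp` with `alternative="less"`) is one way to obtain it.

```python
import math
from statistics import mean, stdev

def one_sample_t(sample, mu0):
    """Return the one-sample t statistic and its degrees of freedom."""
    n = len(sample)
    standard_error = stdev(sample) / math.sqrt(n)
    return (mean(sample) - mu0) / standard_error, n - 1

def reject_h0(p_value, alpha=0.05):
    """Decision rule: reject H0 when the p-value is below alpha."""
    return p_value < alpha

# Hypothetical moisture readings (pounds per 100 square feet).
moisture = [0.30, 0.28, 0.36, 0.33, 0.31, 0.29]
t_stat, df = one_sample_t(moisture, mu0=0.35)
print(f"t = {t_stat:.4f} with {df} degrees of freedom")

# p-values reported in the text for the two shingle types.
reported_p = {"Shingle A": 0.0748, "Shingle B": 0.00048}
for name, p in reported_p.items():
    verdict = "reject H0" if reject_h0(p) else "fail to reject H0"
    print(f"{name}: p = {p} -> {verdict}")
```

A negative t statistic means the sample mean is below 0.35; whether it is far enough below depends on the sample size and spread, which is what the p-value summarizes.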

2.2 Are the population means for shingles A and B equal?

Solution:

Step 1: Define the Hypotheses

• Null Hypothesis (H₀): The mean moisture contents of Shingle A and Shingle B are equal.
H₀: μ_A = μ_B
• Alternative Hypothesis (H₁): The mean moisture contents of Shingle A and Shingle B are not equal.
H₁: μ_A ≠ μ_B

This is a two-tailed test.

Step 2: Choose Significance Level

• The level of significance is α = 0.05 (5%).

Step 3: Identify the Test Statistic

• Since the population standard deviations are unknown, and sample sizes are different, we use the
independent two-sample t-test (Welch’s t-test).
• This test assumes unequal variances and follows a t-distribution.

Step 4: Compute the Test Statistic and p-value

• t-statistic: 1.3912
• p-value: 0.1686

Step 5: Decision Rule

• If p-value < 0.05, we reject H₀ (significant difference in means).

• If p-value ≥ 0.05, we fail to reject H₀ (no significant difference).

Step 6: Conclusion

Since the p-value (0.1686) is greater than the significance level (0.05), we fail to reject the null hypothesis
(H₀).

Conclusion: There is not enough statistical evidence to say that the mean moisture contents of Shingle A
and Shingle B differ. The data are consistent with equal means at the 5% significance level, although
failing to reject H₀ does not prove that the means are identical.
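
For reference, Welch's t statistic and its approximate degrees of freedom (the Welch-Satterthwaite formula) can be computed directly from the two samples. This is a minimal sketch with hypothetical readings of different sample sizes; it does not reproduce the reported t = 1.3912, which depends on the actual shingle data.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)   # squared standard error of mean(a)
    vb = variance(b) / len(b)   # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical moisture readings for the two shingle types.
shingle_a = [0.32, 0.30, 0.36, 0.29, 0.34, 0.31]
shingle_b = [0.28, 0.27, 0.33, 0.26, 0.30]

t_stat, df = welch_t(shingle_a, shingle_b)
print(f"t = {t_stat:.4f}, df = {df:.2f}")
```

Unlike the pooled-variance test, the degrees of freedom here are usually not an integer; they shrink toward the smaller sample's n − 1 as the variances become more unequal.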

Problem 3
Salary is hypothesized to depend on educational qualification and occupation. To understand the dependency,
the salaries of 40 individuals are collected and each person’s educational qualification and occupation are noted.
Educational qualification is at three levels, High school graduate, Bachelor's, and Doctorate. Occupation is at
four levels, Administrative and clerical, Sales, Professional or specialty, and Executive or managerial. A
different number of observations are in each level of education–occupation combination.

Exploratory Data Analysis:

Salary Distribution by Education Level (Boxplot)

Fig 14: Boxplot – Salary Distribution by Education Level

Summary of Salary Distribution by Education Level (Boxplot Analysis)

The boxplot displays the salary distribution for individuals with different education levels (High School
Graduate, Bachelor's, and Doctorate). The key insights are:

Median Salary Comparison

• Doctorate holders have the highest median salary, reflecting their advanced qualifications.
• Bachelor's degree holders earn more than High School graduates but less than Doctorate holders.
• High School graduates (HS-grad) have the lowest median salary.

Salary Variation and Spread

• Doctorate: The widest interquartile range (IQR), indicating substantial variation in salaries.
• Bachelor’s: Moderate salary variation, with a slightly smaller IQR than Doctorates.
• High School Graduates: The smallest salary spread, indicating relatively consistent earnings.

Presence of Outliers

• No extreme outliers are observed in any education category.

Skewness and Distribution Shape

• The distributions for Doctorate and bachelor's degree holders are more spread out, suggesting a broader
salary range.
• The salary distribution for HS Graduates is more compact, implying lower variability in earnings.

Conclusion

The analysis suggests that salary levels tend to increase with higher education levels. Doctorate holders not only
earn the highest median salary but also show the most variation in salaries. Bachelor's degree holders follow a
similar trend but with lower earnings. High School graduates earn the least, with more stable salaries. These
patterns indicate potentially significant differences in salaries across education levels, warranting further
statistical analysis (such as ANOVA) to verify the significance of these differences.

Fig 15: Boxplot – Salary Distribution by Occupation

Summary of Salary Distribution by Occupation (Boxplot Analysis)

The boxplot illustrates salary distributions across various occupational categories: Administrative & Clerical,
Sales, Professional Specialty, and Executive & Managerial. Key observations include:

Median Salary Comparison

• Executive & Managerial roles have the highest median salary, indicating that individuals in these
positions typically earn more than those in other occupations.
• Sales roles exhibit a moderately high median salary, though with a more dispersed distribution.
• Administrative & Clerical roles have a relatively lower median salary.
• Professional Specialty salaries vary widely around their median, suggesting significant disparities in
earnings within this category.

Salary Variability and Dispersion

• Professional Specialty roles demonstrate the largest salary spread, reflecting substantial variability in
earnings.
• Executive & Managerial positions show the least variation, suggesting more consistency in salaries
within this category.
• Sales roles exhibit moderate salary dispersion.

Outliers and Distribution Characteristics

• No extreme outliers are present in any occupational category.

• Professional Specialty roles have a broader salary range, potentially indicating the presence of highly
paid specialists.
• Administrative & Clerical positions show a more compact salary distribution, signifying less variability.

Conclusion

The salary distribution varies considerably across occupations. Executive & Managerial roles tend to offer the
highest and most stable salaries, whereas Professional Specialty roles demonstrate the greatest variation,
potentially due to differences in specialization and expertise. These observations underscore the significant
influence of occupation on salary levels. To validate these differences statistically, an ANOVA test should be
conducted to determine whether they are significant.

3.1 Is there any significant difference in salaries among different levels of education?

Solution:

Step 1: State the Hypotheses

• H₀: μ Doctorate = μ Bachelors = μ HS-grad

• H₁: At least one μ is different

Step 2: Check the Assumptions of ANOVA

To perform a One-Way ANOVA, we need to check the following assumptions:

1. Normality: Salaries within each education level should be approximately normally distributed. We test
this using the Shapiro-Wilk test.
2. Homogeneity of Variance: The variance in salaries across education levels should be similar. We test
this using Levene’s test.
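
Levene's test can be understood as a one-way ANOVA applied to each observation's absolute deviation from its group center. The sketch below computes that statistic using the group median as the center (the Brown-Forsythe variant); the report does not say which center was used, so this is an assumption, and the salary values are hypothetical. The Shapiro-Wilk test is not sketched here because it relies on tabulated coefficients.

```python
from statistics import mean, median

def levene_statistic(groups):
    """Levene-type W: one-way ANOVA F computed on |x - median(group)|."""
    # Absolute deviation of every value from its own group's median.
    z = [[abs(x - median(g)) for x in g] for g in groups]
    z_all = [v for g in z for v in g]
    k, n = len(z), len(z_all)
    grand = mean(z_all)
    between = sum(len(g) * (mean(g) - grand) ** 2 for g in z) / (k - 1)
    within = sum((v - mean(g)) ** 2 for g in z for v in g) / (n - k)
    return between / within

# Hypothetical salaries (in thousands) for three education levels.
hs_grad = [50, 55, 60, 65, 70]
bachelors = [90, 110, 130, 150, 170]
doctorate = [120, 150, 180, 210, 260]

print(f"W = {levene_statistic([hs_grad, bachelors, doctorate]):.4f}")
```

A W near zero means the groups have similar spread; a large W, judged against an F distribution with (k − 1, n − k) degrees of freedom, signals unequal variances.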

Step 3: Conduct the Hypothesis Test (One-Way ANOVA)

We perform a One-Way ANOVA to determine whether at least one education level has a significantly different
salary compared to others.

These are the results after running code:

Unique education levels: [' Doctorate' ' Bachelors' ' HS-grad']

Normality Test (Shapiro-Wilk Test):

Levene’s Test for Homogeneity of Variance:

Test Statistic: 1.8801, p-value: 0.1669

One-Way ANOVA Results:

F-statistic: 30.9563, p-value: 0.0000
Conclusion: Reject the null hypothesis. There is a significant difference in salaries among education
levels.

Step 4: Conclusion from the Results

1. Normality Test (Shapiro-Wilk Test):

a. Doctorate: p-value = 0.0676 (normal)
b. Bachelors: p-value = 0.7051 (normal)

c. HS-grad: p-value = 0.1783 (normal)
d. Since all p-values > 0.05, the normality assumption holds.
2. Homogeneity of Variance (Levene’s Test):
a. Test Statistic: 1.8801, p-value = 0.1669
b. Since p-value > 0.05, there is no evidence of unequal variances, so we can proceed with ANOVA.
3. One-Way ANOVA Results:
a. F-statistic: 30.9563, p-value < 0.0001
b. Since p-value < 0.05, we reject H₀ and conclude that salaries differ significantly among
education levels.
4. Post-hoc Analysis (Tukey’s HSD Test) (if applicable):
a. If ANOVA is significant, Tukey’s HSD test helps identify which specific education levels have
significantly different salaries.

Conclusion

• Based on One-Way ANOVA, there is a statistically significant difference in salaries among different
education levels.
• Since ANOVA only tells us that at least one group is different, a post-hoc test (Tukey’s HSD) can be
conducted to determine which specific education levels differ from each other.
• This finding suggests that education level is associated with differences in mean salary; the ANOVA result
alone does not show which levels differ or in which direction.
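The post-hoc step mentioned above could look like the sketch below. It assumes SciPy 1.8 or later, which provides `scipy.stats.tukey_hsd`; the report does not say which library it would use. The salaries are synthetic (hypothetical values), so the output only illustrates the mechanics of the pairwise comparison.

```python
from itertools import combinations

import numpy as np
from scipy import stats

# Hypothetical salaries per education level (synthetic, NOT the report's data).
rng = np.random.default_rng(2)
samples = {
    "Doctorate": rng.normal(200_000, 20_000, size=15),
    "Bachelors": rng.normal(160_000, 20_000, size=15),
    "HS-grad": rng.normal(80_000, 20_000, size=10),
}
names = list(samples)
alpha = 0.05

# Tukey's HSD compares every pair of groups while controlling the
# family-wise error rate; result.pvalue is a k x k matrix of p-values.
result = stats.tukey_hsd(*samples.values())

significant_pairs = []
for i, j in combinations(range(len(names)), 2):
    p = result.pvalue[i, j]
    if p < alpha:
        significant_pairs.append((names[i], names[j]))
    print(f"{names[i]} vs {names[j]}: p-value = {p:.4g}")

print("Pairs with significantly different means:", significant_pairs)
```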

3.2 Is there any significant difference in salaries among different occupations?
Solution:

Step 1:

Null Hypothesis (H₀)

There is no significant difference in mean salaries among different levels of education.

H0: μ Doctorate = μ Bachelors = μ HS-grad

Alternative Hypothesis (H₁)

At least one education level has a significantly different mean salary compared to others.

H1: At least one μ is different

Step 2: Assumption Checks

Normality (Shapiro-Wilk Test): Salaries within each education level should be approximately normally
distributed.
Homogeneity of Variance (Levene’s Test): The variance in salaries across education levels should be similar.

Step 3: Normality Test (Shapiro-Wilk Test)

Education Level   p-value   Normality Assumption

Doctorate         0.0676    Normal
Bachelors         0.7051    Normal
HS-grad           0.1783    Normal

Since all p-values > 0.05, the normality assumption holds.

Step 4: Homogeneity of Variance (Levene’s Test)

• Test Statistic: 1.8801

• p-value: 0.1669

Since p-value > 0.05, we fail to reject the hypothesis of equal variances, so we can proceed with ANOVA.

Step 5: Conduct the Hypothesis Test (One-Way ANOVA)

• F-statistic: 30.9563
• p-value: < 0.0001

Since p-value < 0.05, we reject the null hypothesis (H₀) and conclude that salaries differ significantly among
education levels.

Conclusion

3.3 Is there a significant interaction between Education and Occupation on Salary?

Solution:

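Step 1: Null Hypothesis (H₀):

H₀: There is no interaction between Education and Occupation in their effect on Salary (interaction effect = 0)

Alternative Hypothesis (H₁):

H₁: There is an interaction between Education and Occupation in their effect on Salary (interaction effect ≠ 0)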

Step 2: Check the Assumptions

1. Normality Check (Shapiro-Wilk Test)

a. Ideally, salaries within each Education-Occupation group should be normally distributed.
b. However, some groups have very few observations (e.g., Exec-managerial for HS-grad = 0),
making normality difficult to assess.

2. Homogeneity of Variance (Levene’s Test)

• This should be conducted to ensure similar variance across Education-Occupation groups.

• The assumption might be violated due to the unbalanced design.

Step 3: Conduct the Hypothesis Test (Two-Way ANOVA)

ANOVA Table

Source                   Sum of Squares   df   F-Statistic   p-value

Education                1.94e+11         2    136.33        1.76e-12
Occupation               4.08e+08         3    0.19          0.8270
Education × Occupation   4.23e+10         6    9.91          1.32e-05
Residual                 2.06e+10         29   -             -

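
As a quick arithmetic check, each F-statistic in the table is the mean square of the source (sum of squares divided by df) divided by the residual mean square. The sketch below recomputes them from the table's printed values; because those values are rounded, the results only approximately match the reported F-statistics.

```python
# Sum of squares and degrees of freedom as printed in the ANOVA table
# (rounded in the report, so the recomputed F values are approximate).
table = {
    "Education": (1.94e11, 2),
    "Occupation": (4.08e8, 3),
    "Education x Occupation": (4.23e10, 6),
}
ss_residual, df_residual = 2.06e10, 29

# Residual mean square: the denominator of every F ratio.
ms_residual = ss_residual / df_residual

# F = (SS_source / df_source) / MS_residual
f_values = {
    source: (ss / df) / ms_residual
    for source, (ss, df) in table.items()
}

for source, f in f_values.items():
    print(f"{source}: F = {f:.2f}")
```
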
• Education: The p-value (1.76e-12) is very small (< 0.05), indicating that Education has a significant
effect on Salary.
• Occupation: The p-value (0.8270) is greater than 0.05, meaning that Occupation does not have a
significant effect on Salary.
• Education × Occupation Interaction: The p-value (1.32e-05) is less than 0.05, meaning that there is a
significant interaction between Education and Occupation on Salary.

Step 4: Conclusion

• Since the interaction effect (Education × Occupation) is significant (p < 0.05), we reject the null
hypothesis.
• This means that the effect of Education on Salary depends on Occupation, and vice versa.
• However, due to the unbalanced group sizes and empty cells (Exec-managerial for HS-grad = 0), the
results should be interpreted with caution.
