Assignment 8
Raymond Guo
2020-04-04
Exercise 1
Exercise 2
# Overlaid histograms of pregnancy length, one per birth order group
ggplot(data = pregnancy_length) +
  labs(title = "Pregnancy Length Impacted by Birth Order",
       x = "Pregnancy Length", y = "Count") +
  geom_histogram(mapping = aes(x = prglngth, fill = birth_order),
                 binwidth = 1,
                 position = "identity",
                 alpha = 0.5) +
  # Zoom the x axis to 27-46 without dropping any observations
  coord_cartesian(xlim = c(27, 46))
[Figure: "Pregnancy Length Impacted by Birth Order". Histogram with x axis: Pregnancy Length, y axis: Count, fill: birth_order (first, other).]
The histogram is left skewed, with its center at a pregnancy length of about 38.
There appear to be many outliers, and more of them lie to the left of the center than to the right.
The reason for this is unknown, so I will treat the spread as roughly even.
The graph suggests that first-born babies have a longer pregnancy length than
other babies. In conclusion, first borns arrive late compared to their siblings.
Exercise 3
pregnancy_length %>%
  filter(birth_order == "first") %>%
  summarize(
    mean = mean(prglngth),
    median = median(prglngth),
    deviation = sd(prglngth),
    iqr = IQR(prglngth),
    minimum = min(prglngth),
    maximum = max(prglngth)
  )
pregnancy_length %>%
  filter(birth_order == "other") %>%
  summarize(
    mean = mean(prglngth),
    median = median(prglngth),
    deviation = sd(prglngth),
    iqr = IQR(prglngth),
    minimum = min(prglngth),
    maximum = max(prglngth)
  )
The mean, median, standard deviation, and maximum all make sense given the histogram,
which shows a large concentration of values at a pregnancy length of about 38 for both “first” and “other”.
The only notable problem is the minimum, which looks like an error in the dataset
because a pregnancy cannot have a length of 0. Lastly, the difference in interquartile range may
not be obvious to the eye, but it is important to note that the births that did not come
first have more recordings at the center than the births that were
first. This explains why the “other” births have an interquartile range of 0: so
many values are exactly 38 that the first and third quartiles coincide. The “first” births do not
reach an interquartile range of 0 because fewer of their values are exactly 38.
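To illustrate how an interquartile range of 0 can arise, here is a minimal base R sketch. The two vectors are made-up toy values, not rows from pregnancy_length; they only show that the IQR collapses to 0 once enough observations share the same value.

```r
# Toy vectors (hypothetical values, not taken from the assignment data).
concentrated <- c(36, 38, 38, 38, 38, 38, 40)  # most values are exactly 38
spread_out   <- c(35, 37, 38, 38, 39, 40, 41)  # values vary around 38

# IQR() is the distance between the 75th and 25th percentiles.
print(quantile(concentrated, probs = c(0.25, 0.75)))
print(quantile(spread_out, probs = c(0.25, 0.75)))

iqr_concentrated <- IQR(concentrated)  # both quartiles are 38, so this is 0
iqr_spread_out   <- IQR(spread_out)    # quartiles are 37.5 and 39.5, so this is 2

print(iqr_concentrated)
print(iqr_spread_out)
```

In the concentrated vector both quartiles land on 38, which is the same situation described for the “other” group.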
Exercise 4
The test statistic would be “diff in means”.
Null hypothesis: There is no difference in mean pregnancy length between first borns and non-first borns.
Alternative hypothesis: First borns have a longer mean pregnancy length than non-first borns.
This is a one-sided hypothesis test.
Exercise 5
pregnancy_length_null %>%
  get_p_value(obs_stat = pregnancy_length_obs_stat, direction = "right")
p_value
0.0866
pregnancy_length_null %>%
  visualize() +
  shade_p_value(obs_stat = pregnancy_length_obs_stat, direction = "right") +
  labs(title = "Simulated Null Distributions for Birth Order",
       x = "Difference in Means", y = "Count")
[Figure: simulated null distribution with the p-value region shaded. x axis: Difference in Means, y axis: Count.]
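The right-tailed p-value above comes from comparing the observed difference in means with a simulated null distribution. The following is a minimal base R sketch of that idea, not the code used to build pregnancy_length_null. The toy data frame, the helper diff_in_means(), the seed, and the 1000 repetitions are all assumptions made for illustration.

```r
set.seed(1)  # arbitrary seed so the sketch is repeatable

# Hypothetical toy data standing in for pregnancy_length.
toy <- data.frame(
  prglngth    = c(39, 40, 38, 41, 39, 40, 38, 39, 37, 38, 39, 38),
  birth_order = rep(c("first", "other"), each = 6)
)

# Test statistic: mean of "first" minus mean of "other".
diff_in_means <- function(values, groups) {
  mean(values[groups == "first"]) - mean(values[groups == "other"])
}

obs_stat <- diff_in_means(toy$prglngth, toy$birth_order)

# Null distribution: shuffle the group labels so that birth order
# cannot be related to pregnancy length, then recompute the statistic.
null_stats <- replicate(
  1000,
  diff_in_means(toy$prglngth, sample(toy$birth_order))
)

# Right-tailed p-value: share of shuffled statistics that are
# at least as large as the observed one.
p_value <- mean(null_stats >= obs_stat)

print(obs_stat)
print(p_value)
```

Shuffling the labels is one common way to simulate "no difference between groups"; the direction of the tail matches the one-sided alternative.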
Exercise 6
2.5% 97.5%
-0.110664 0.1117476
Yes, the null value (a difference in means of 0) falls within the 95% confidence interval.
birth_bootstraps %>%
  visualize() +
  shade_confidence_interval(bootstrap_ci) +
  labs(title = "Simulated Null Distributions for Birth Order",
       x = "Difference in Means", y = "Count")
[Figure: "Simulated Null Distributions for Birth Order" with the confidence interval shaded. x axis: Difference in Means, y axis: Count.]
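A percentile confidence interval like the one printed above can be sketched in base R as follows. This is an illustration under stated assumptions, not the code that produced birth_bootstraps or bootstrap_ci: the toy data are made up, and the sketch resamples within each group, which may differ from how the assignment's bootstrap was generated.

```r
set.seed(2)  # arbitrary seed so the sketch is repeatable

# Hypothetical toy data standing in for pregnancy_length.
toy <- data.frame(
  prglngth    = c(39, 40, 38, 41, 39, 40, 38, 39, 37, 38, 39, 38),
  birth_order = rep(c("first", "other"), each = 6)
)

first <- toy$prglngth[toy$birth_order == "first"]
other <- toy$prglngth[toy$birth_order == "other"]

# Bootstrap: resample each group with replacement and recompute
# the difference in means (first minus other) each time.
boot_stats <- replicate(
  1000,
  mean(sample(first, replace = TRUE)) - mean(sample(other, replace = TRUE))
)

# Percentile method: the middle 95% of the bootstrap statistics.
ci <- quantile(boot_stats, probs = c(0.025, 0.975))

# Check whether the null value of 0 lies inside the interval.
contains_zero <- ci[[1]] <= 0 && 0 <= ci[[2]]

print(ci)
print(contains_zero)
```

If 0 lies inside the interval, a difference of zero is compatible with the data at the 95% level, which is the check made in this exercise.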
Exercise 7
bootstrap_report(bootstrap_results)
## 0.0288791
plot_ci(bootstrap_results)
[Figure: output of plot_ci(bootstrap_results). x axis: Cohen's d, y axis: PMF.]
The effect size for the difference in pregnancy lengths between “first” and “other” births is
roughly “very small”: Cohen's d is 0.0288791, which is closest to 0.01 on the given table.
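For reference, Cohen's d can be sketched in base R as the difference in means divided by a pooled standard deviation. This is an assumption about the definition: the internals of bootstrap_report() are not shown in this assignment, and the helper cohens_d() and the toy vectors below are hypothetical.

```r
# Cohen's d using the pooled standard deviation (assumed definition).
cohens_d <- function(x, y) {
  n_x <- length(x)
  n_y <- length(y)
  # Pooled variance weights each group's variance by its degrees of freedom.
  pooled_var <- ((n_x - 1) * var(x) + (n_y - 1) * var(y)) / (n_x + n_y - 2)
  (mean(x) - mean(y)) / sqrt(pooled_var)
}

# Toy vectors (hypothetical): means 4 and 3, both with variance 4.
x <- c(2, 4, 6)
y <- c(1, 3, 5)

d <- cohens_d(x, y)  # (4 - 3) / sqrt(4) = 0.5
print(d)
```

Because d is expressed in standard deviation units, a value near 0.03 means the two group means differ by only a few hundredths of a standard deviation.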