
Assignment 8

This document contains code and analysis to examine differences in pregnancy length based on birth order using data from the National Survey of Family Growth. The analysis includes: 1) Filtering the data to separate first births from other births. 2) Creating a histogram showing pregnancy length is slightly longer for first births compared to other births. 3) Calculating summary statistics finding means, medians, and ranges are similar between groups. 4) Performing a permutation test finding no significant difference between groups. 5) Conducting a bootstrap analysis finding the null result falls within the 95% confidence interval, indicating no significant effect. 6) Calculating the effect size, which is found to be very

Uploaded by

Ray Guo

Assignment 8: Birth Times

Raymond Guo
2020-04-04

Exercise 1

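library(tidyverse)  # provides %>%, filter(), mutate(), bind_rows(), select(), ggplot()

# nsfg6 is the NSFG data frame used in this assignment; it is assumed to be loaded already.
# Keep only pregnancies that ended in a live birth.
live_births <- nsfg6 %>%
  filter(outcome == 1)

# Label first births and all later births.
first_births <- live_births %>%
  filter(birthord == 1) %>%
  mutate(birth_order = "first")

other_births <- live_births %>%
  filter(birthord > 1) %>%
  mutate(birth_order = "other")

# Stack the two groups and keep only the columns needed for the analysis.
pregnancy_length <- first_births %>%
  bind_rows(other_births) %>%
  select(prglngth, birth_order)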

Exercise 2

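ggplot(data = pregnancy_length) +
  labs(title = "Pregnancy Length Impacted by Birth Order",
       x = "Pregnancy Length", y = "Count") +
  # Overlay the two groups (position = "identity") with transparency.
  geom_histogram(mapping = aes(x = prglngth, fill = birth_order),
                 binwidth = 1,
                 position = "identity",
                 alpha = 0.5) +
  # Zoom in on 27 to 46 without dropping any rows; c() replaces the deprecated combine().
  coord_cartesian(xlim = c(27, 46))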

[Figure: overlaid histograms titled "Pregnancy Length Impacted by Birth Order", filled by birth_order (first, other). x-axis: Pregnancy Length; y-axis: Count.]

The histogram is left-skewed, with its center at around 38 weeks of pregnancy length.
There appear to be many outliers, particularly to the left of the center, compared to the right.
The reason for this is unknown, so apart from that left tail I will describe the spread as roughly even.
Based on this graph, first-born babies appear to have a slightly longer pregnancy length than
other babies. This suggests that first borns tend to arrive later than their siblings.

Exercise 3

pregnancy_length %>%
  filter(birth_order == "first") %>%
  summarize(
    mean = mean(prglngth),
    median = median(prglngth),
    deviation = sd(prglngth),
    iqr = IQR(prglngth),
    minimum = min(prglngth),
    maximum = max(prglngth)
  )

    mean  median  deviation  iqr  minimum  maximum
38.60095      39   2.791901    1        0       48

pregnancy_length %>%
  filter(birth_order == "other") %>%
  summarize(
    mean = mean(prglngth),
    median = median(prglngth),
    deviation = sd(prglngth),
    iqr = IQR(prglngth),
    minimum = min(prglngth),
    maximum = max(prglngth)
  )

    mean  median  deviation  iqr  minimum  maximum
38.52291      39   2.615852    0        4       50

The mean, median, standard deviation, and maximum all make sense given the histogram, which
shows a heavy concentration of values at around 38 to 39 weeks for both “first” and “other”.
The only notable problem is the minimum: a live birth recorded at 0 weeks (or at 4 weeks for
“other”) is not plausible, so these values are most likely errors in the dataset. Lastly, the
difference in interquartile range looks small, but it is worth noting that the “other” births
are more concentrated at the center than the first births. An interquartile range of 0 means
the 25th and 75th percentiles are the same value, which here must equal the median of 39, so
at least half of the “other” pregnancies lasted exactly 39 weeks. The “first” births have an
interquartile range of 1 because their values are slightly less concentrated at that single value.
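
An interquartile range of 0 arises whenever the 25th and 75th percentiles coincide. The following is a minimal sketch of that effect in base R; the two vectors are made-up illustration values in weeks, not NSFG data.

```r
# Hypothetical pregnancy lengths in weeks (illustration only, not NSFG data).
concentrated <- c(36, 38, 39, 39, 39, 39, 39, 39, 40, 41)
spread_out   <- c(35, 37, 38, 38, 39, 39, 40, 40, 41, 42)

# When more than half of the values are identical, the quartiles coincide.
q_concentrated <- quantile(concentrated, c(0.25, 0.75))
iqr_concentrated <- IQR(concentrated)

# With values spread over several weeks, the quartiles differ.
iqr_spread_out <- IQR(spread_out)

print(q_concentrated)
print(c(concentrated = iqr_concentrated, spread_out = iqr_spread_out))
```

Both vectors have the same median, so the interquartile range is what distinguishes how tightly each one clusters around it.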

Exercise 4
The test statistic is the difference in means (“diff in means”): the mean pregnancy length of first births minus that of other births.
Null hypothesis: There is no difference in mean pregnancy length between first borns and non-first borns.
Alternative hypothesis: First borns have a longer mean pregnancy length than non-first borns.
This is a one-sided hypothesis test.

Exercise 5

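library(infer)  # specify(), hypothesize(), generate(), calculate(), get_p_value()

# Null distribution: shuffle the birth_order labels 10000 times and
# recompute the difference in means (first minus other) each time.
pregnancy_length_null <- pregnancy_length %>%
  specify(formula = prglngth ~ birth_order) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 10000, type = "permute") %>%
  calculate(stat = "diff in means", order = c("first", "other"))

# Observed difference in means from the actual data.
pregnancy_length_obs_stat <- pregnancy_length %>%
  specify(formula = prglngth ~ birth_order) %>%
  calculate(stat = "diff in means", order = c("first", "other"))

# One-sided p-value: share of permuted differences at least as large as the observed one.
pregnancy_length_null %>%
  get_p_value(obs_stat = pregnancy_length_obs_stat, direction = "right")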

p_value
0.0866

The p-value (0.0866) is greater than 0.05, so we fail to reject the null hypothesis.
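
The permutation p-value and the decision rule can also be written out by hand. This is a rough base R sketch, not the infer implementation: the data are simulated, and the group sizes, the seed, and the helper `diff_in_means` are assumptions made for illustration.

```r
set.seed(1)

# Simulated stand-in data (hypothetical, not NSFG): lengths in weeks for two groups.
n_first <- 200
n_other <- 250
length_weeks <- c(rnorm(n_first, mean = 38.6, sd = 2.8),
                  rnorm(n_other, mean = 38.5, sd = 2.6))
group <- rep(c("first", "other"), times = c(n_first, n_other))

# Test statistic: mean of "first" minus mean of "other".
diff_in_means <- function(y, g) mean(y[g == "first"]) - mean(y[g == "other"])

observed <- diff_in_means(length_weeks, group)

# Shuffling the labels breaks any link between group and length,
# which is what the null hypothesis of independence assumes.
null_diffs <- replicate(2000, diff_in_means(length_weeks, sample(group)))

# Right-tailed p-value: share of shuffled differences at least as large as observed.
p_value <- mean(null_diffs >= observed)

# The null is rejected only when the p-value is below the threshold.
decision <- if (p_value < 0.05) "reject the null" else "fail to reject the null"
print(c(observed = observed, p_value = p_value))
print(decision)
```

A p-value above the threshold means the observed difference is not unusual among the shuffled differences, so the data do not give enough evidence against the null hypothesis.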


pregnancy_length_null %>%
  visualize() +
  shade_p_value(obs_stat = pregnancy_length_obs_stat, direction = "right") +
  labs(title = "Simulated Null Distributions for Birth Order",
       x = "Difference in Means", y = "Count")

[Figure: histogram titled "Simulated Null Distributions for Birth Order", with the p-value region shaded. x-axis: Difference in Means (about −0.2 to 0.2); y-axis: Count.]

Exercise 6

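# Note: hypothesize() with type = "permute" simulates the null distribution,
# so the interval below is the middle 95% of the null distribution rather
# than a bootstrap confidence interval for the difference in means.
birth_bootstraps <- pregnancy_length %>%
  specify(prglngth ~ birth_order) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 10000, type = "permute") %>%
  calculate(stat = "diff in means", order = c("first", "other"))

bootstrap_ci <- birth_bootstraps %>%
  get_confidence_interval()
bootstrap_ci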

2.5% 97.5%
-0.110664 0.1117476

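Yes, the observed difference in means (about 0.078 weeks, from the group means in Exercise 3) falls within this 95% interval, which is consistent with no significant difference between the groups. Because the resamples were generated by permutation, the interval describes the null distribution rather than a bootstrap distribution.

birth_bootstraps %>%
  visualize() +
  shade_confidence_interval(bootstrap_ci) +
  labs(title = "Simulated Null Distributions for Birth Order",
       x = "Difference in Means", y = "Count")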

[Figure: histogram titled "Simulated Null Distributions for Birth Order", with the 95% interval shaded. x-axis: Difference in Means (about −0.2 to 0.2); y-axis: Count.]
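
The code in this exercise permutes the labels under the null hypothesis, so its interval is centered near 0. For comparison, a percentile bootstrap resamples each group with replacement and is centered near the observed difference. The sketch below shows that idea in base R; the data are simulated and the group sizes and seed are assumptions, so the numbers it prints are not results for the NSFG data.

```r
set.seed(2)

# Simulated stand-in data (hypothetical, not NSFG): lengths in weeks.
first <- rnorm(200, mean = 38.6, sd = 2.8)
other <- rnorm(250, mean = 38.5, sd = 2.6)

observed <- mean(first) - mean(other)

# Percentile bootstrap: resample each group with replacement and
# recompute the difference in means for every replicate.
boot_diffs <- replicate(2000, {
  mean(sample(first, replace = TRUE)) - mean(sample(other, replace = TRUE))
})

# The middle 95% of the bootstrap differences forms the interval.
boot_ci <- quantile(boot_diffs, c(0.025, 0.975))

# If 0 lies inside the interval, "no difference" remains plausible.
contains_zero <- boot_ci[[1]] <= 0 && 0 <= boot_ci[[2]]
print(boot_ci)
print(contains_zero)
```

With a bootstrap interval, the question to ask is whether the null value 0 lies inside it; with a permutation interval, the question is whether the observed difference does.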

Exercise 7

bootstrap_results <- cohens_d_bootstrap(
  data = pregnancy_length, model = prglngth ~ birth_order
)

bootstrap_report(bootstrap_results)

## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 5000 bootstrap replicates
##
## CALL :
## boot::boot.ci(boot.out = cohens_d_bootstrap_sim, type = c("perc"))
##
## Intervals :
## Level Percentile
## 95% (-0.0122, 0.0706 )
## Calculations and Intervals on Original Scale
##
## Response variable
## prglngth
##
## Explanatory variable
## birth_order
##
## Explanatory category with larger mean
## first
##
## Explanatory category with smaller mean
## other
##
## Cohen's d observed value
## 0.0288791

plot_ci(bootstrap_results)

[Figure: "Bootstrap distribution: Cohen's d", with the confidence interval (−0.0122, 0.0706) marked. x-axis: Cohen's d (about −0.06 to 0.10); y-axis: PMF.]

The effect size for the difference in pregnancy length between “first” and “other” births is
“very small”: the observed Cohen's d of 0.0288791 is closest to 0.01 on the given table. The 95%
bootstrap confidence interval (-0.0122, 0.0706) also includes 0.
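
As a rough cross-check, Cohen's d can be approximated from the summary statistics in Exercise 3 as the difference in means divided by a pooled standard deviation. The sketch below assumes equal group sizes when pooling, because the group sizes are not shown in this report, so it only approximates what `cohens_d_bootstrap()` reports.

```r
# Summary statistics reported in Exercise 3.
mean_first <- 38.60095
sd_first   <- 2.791901
mean_other <- 38.52291
sd_other   <- 2.615852

# Pooled standard deviation, ASSUMING equal group sizes (the true sizes are
# not given here, so this is an approximation of the usual weighted pooling).
pooled_sd <- sqrt((sd_first^2 + sd_other^2) / 2)

# Cohen's d: difference in means expressed in pooled standard deviations.
d_approx <- (mean_first - mean_other) / pooled_sd
print(d_approx)
```

The approximation lands close to the reported observed value of 0.0288791: the difference between the groups is only about 3% of one standard deviation.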
