
Assignment 8

This document contains code and analysis examining differences in pregnancy length by birth order, using data from the National Survey of Family Growth. The analysis includes: 1) filtering the data to separate first births from other births; 2) creating a histogram suggesting that pregnancy length is slightly longer for first births than for other births; 3) calculating summary statistics, which show similar means, medians, and ranges for the two groups; 4) performing a permutation test, which finds no significant difference between the groups; 5) conducting a bootstrap analysis, which finds that the null result falls within the 95% confidence interval, indicating no significant effect; and 6) calculating the effect size, which is very small.

Uploaded by Ray Guo


Assignment 8: Birth Times

Raymond Guo
2020-04-04

Exercise 1

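library(tidyverse)  # dplyr verbs (filter, mutate, bind_rows) and ggplot2

# Keep only pregnancies that ended in a live birth (outcome == 1)
live_births <- nsfg6 %>%
  filter(outcome == 1)

# First-born babies
first_births <- live_births %>%
  filter(birthord == 1) %>%
  mutate(birth_order = "first")

# Babies born second or later
other_births <- live_births %>%
  filter(birthord > 1) %>%
  mutate(birth_order = "other")

# Stack both groups and keep only the variables used below
pregnancy_length <- first_births %>%
  bind_rows(other_births) %>%
  select(prglngth, birth_order)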

Exercise 2

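ggplot(data = pregnancy_length) +
  labs(title = "Pregnancy Length Impacted by Birth Order",
       x = "Pregnancy Length (weeks)", y = "Count") +
  geom_histogram(mapping = aes(x = prglngth, fill = birth_order),
                 binwidth = 1,             # one bar per week
                 position = "identity",    # overlay the groups instead of stacking
                 alpha = 0.5) +            # transparency so overlaps stay visible
  coord_cartesian(xlim = c(27, 46))        # zoom to 27-46 weeks without dropping data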

[Figure: "Pregnancy Length Impacted by Birth Order": overlaid histograms of Pregnancy Length (x) vs. Count (y), filled by birth_order (first, other).]

The histogram is left-skewed, with its center around 38 to 39 weeks. Most of the outlying values lie to the left of the center (short, preterm pregnancies), with far fewer to the right, so the spread is not symmetric.
The graph suggests that first births tend to have slightly longer pregnancies than other births, which would mean first-born babies arrive a little later than their siblings. Whether this visual difference is statistically meaningful is tested in the exercises that follow.

Exercise 3

pregnancy_length %>%
  filter(birth_order == "first") %>%
  summarize(
    mean = mean(prglngth),
    median = median(prglngth),
    deviation = sd(prglngth),
    iqr = IQR(prglngth),
    minimum = min(prglngth),
    maximum = max(prglngth)
  )

      mean median deviation iqr minimum maximum
  38.60095     39  2.791901   1       0      48

pregnancy_length %>%
  filter(birth_order == "other") %>%
  summarize(
    mean = mean(prglngth),
    median = median(prglngth),
    deviation = sd(prglngth),
    iqr = IQR(prglngth),
    minimum = min(prglngth),
    maximum = max(prglngth)
  )

      mean median deviation iqr minimum maximum
  38.52291     39  2.615852   0       4      50

The mean, median, standard deviation, and maximum are all consistent with the histogram, which shows a heavy concentration of values around 38 to 39 weeks for both "first" and "other" births.
The notable problem is the minimum: a first birth with a recorded pregnancy length of 0 weeks is impossible, which points to a recording error in the dataset. Lastly, the difference in interquartile range is easy to overlook but meaningful: births that were not first are even more concentrated at the center than first births. So many "other" births fall at exactly 39 weeks (the median) that the first and third quartiles are both 39, giving an interquartile range of 0. First births are less concentrated at 39 weeks, so their quartiles spread across neighboring weeks and their interquartile range is 1.
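To illustrate why a heavy concentration at a single value can drive the IQR to 0, here is a minimal base R sketch on two small hypothetical vectors (made up for illustration, not drawn from the NSFG). It assumes R's default quantile definition (type 7), which is what `quantile()` and `IQR()` use unless told otherwise.

```r
# Hypothetical toy data in weeks, NOT taken from the NSFG:
# six of the ten values are exactly 39.
toy_concentrated <- c(36, 38, 39, 39, 39, 39, 39, 39, 40, 42)

# Same idea, but only three of the ten values sit at 39.
toy_spread <- c(37, 38, 38, 39, 39, 39, 40, 40, 41, 42)

# quantile() and IQR() both default to the type 7 quantile definition.
quantile(toy_concentrated, probs = c(0.25, 0.75))  # Q1 and Q3 are both 39
IQR(toy_concentrated)                              # 0
IQR(toy_spread)                                    # 1.75 (Q1 = 38.25, Q3 = 40)
```

Once more than half of the values share one number, both quartiles can land on it, which is the pattern the "other" group shows.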

Exercise 4
The test statistic is the difference in means ("diff in means"), computed as the mean pregnancy length of first births minus that of other births.
Null hypothesis: there is no difference in mean pregnancy length between first births and other births.
Alternative hypothesis: first births have a longer mean pregnancy length than other births.
This is a one-sided (right-tailed) hypothesis test.
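In symbols, writing mu_first and mu_other for the population mean pregnancy lengths in weeks (notation introduced here for illustration), the one-sided test and the observed statistic computed from the Exercise 3 means are:

```latex
\begin{aligned}
H_0 &: \mu_{\text{first}} - \mu_{\text{other}} = 0 \\
H_A &: \mu_{\text{first}} - \mu_{\text{other}} > 0 \\
\bar{x}_{\text{first}} - \bar{x}_{\text{other}} &= 38.60095 - 38.52291 = 0.07804
\end{aligned}
```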

Exercise 5

library(infer)  # specify(), hypothesize(), generate(), calculate(), get_p_value()

# Null distribution: shuffle the birth_order labels 10,000 times
pregnancy_length_null <- pregnancy_length %>%
  specify(formula = prglngth ~ birth_order) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 10000, type = "permute") %>%
  calculate(stat = "diff in means", order = c("first", "other"))

# Observed difference in means (first minus other)
pregnancy_length_obs_stat <- pregnancy_length %>%
  specify(formula = prglngth ~ birth_order) %>%
  calculate(stat = "diff in means", order = c("first", "other"))

# Right-tailed p-value, matching the one-sided alternative
pregnancy_length_null %>%
  get_p_value(obs_stat = pregnancy_length_obs_stat, direction = "right")

p_value
0.0866

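The p-value of 0.0866 is greater than 0.05, so we fail to reject the null hypothesis: the data do not provide convincing evidence that first births have longer pregnancies than other births.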
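The infer pipeline hides the mechanics of the permutation test. As a rough sketch of the same idea in base R, using small hypothetical samples (`first_wk` and `other_wk` are made-up names and values, not the NSFG data), the null distribution comes from repeatedly shuffling the group labels, and the right-tailed p-value is the share of shuffled differences at least as large as the observed one, which is what `direction = "right"` asks for.

```r
set.seed(2020)  # arbitrary seed so the shuffles are reproducible

# Hypothetical toy samples of pregnancy length in weeks (NOT the NSFG data)
first_wk <- c(39, 40, 41, 39, 38, 40, 42, 39)
other_wk <- c(39, 38, 39, 37, 39, 40, 38, 39)

# Observed statistic: difference in means, first minus other (1.125 here)
obs_diff <- mean(first_wk) - mean(other_wk)

# Under H0 the labels are exchangeable, so shuffle them and recompute
pooled  <- c(first_wk, other_wk)
n_first <- length(first_wk)
null_diffs <- replicate(10000, {
  shuffled <- sample(pooled)  # random relabeling of all 16 values
  mean(shuffled[seq_len(n_first)]) - mean(shuffled[-seq_len(n_first)])
})

# Right-tailed p-value: share of shuffled differences >= the observed one
p_value <- mean(null_diffs >= obs_diff)

# Decision at the 0.05 significance level
if (p_value < 0.05) "reject H0" else "fail to reject H0"
```

The decision rule is the one used above: reject the null hypothesis only when the p-value falls below 0.05.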

pregnancy_length_null %>%
  visualize() +
  shade_p_value(obs_stat = pregnancy_length_obs_stat, direction = "right") +
  labs(title = "Simulated Null Distributions for Birth Order",
       x = "Difference in Means", y = "Count")

[Figure: "Simulated Null Distributions for Birth Order": histogram of Difference in Means (x) vs. Count (y), with the p-value region shaded.]

Exercise 6

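# Note: hypothesize() together with type = "permute" generates the null
# (permutation) distribution; a bootstrap would omit hypothesize() and use
# type = "bootstrap".
birth_bootstraps <- pregnancy_length %>%
  specify(prglngth ~ birth_order) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 10000, type = "permute") %>%
  calculate(stat = "diff in means", order = c("first", "other"))

# Middle 95% of the simulated distribution (percentile method)
bootstrap_ci <- birth_bootstraps %>%
  get_confidence_interval()
bootstrap_ci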

2.5% 97.5%
-0.110664 0.1117476

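Yes, the null result falls within the 95% interval: 0, the difference expected under the null hypothesis, lies inside it, and so does the observed difference of about 0.078 weeks (38.60095 - 38.52291). Because this interval was built from the permutation distribution, it describes the range of differences expected by chance alone, and the observed difference lying inside it agrees with the permutation test's conclusion.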

birth_bootstraps %>%
  visualize() +
  shade_confidence_interval(bootstrap_ci) +
  labs(title = "Simulated Null Distributions for Birth Order",
       x = "Difference in Means", y = "Count")

[Figure: "Simulated Null Distributions for Birth Order": histogram of Difference in Means (x) vs. Count (y), with the 95% interval shaded.]
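The interval in Exercise 6 comes from a permutation distribution, since the pipeline uses `hypothesize()` with `type = "permute"`. A percentile bootstrap confidence interval instead resamples each group with replacement without imposing the null hypothesis; in infer that would mean dropping `hypothesize()` and using `type = "bootstrap"`. The base R sketch below shows the idea on hypothetical toy samples (made-up values, not the NSFG data) and checks whether 0 lies inside the interval.

```r
set.seed(2020)  # arbitrary seed for reproducibility

# Hypothetical toy samples standing in for the two groups (NOT NSFG data)
first_wk <- c(39, 40, 41, 39, 38, 40, 42, 39)
other_wk <- c(39, 38, 39, 37, 39, 40, 38, 39)

# Bootstrap: resample each group WITH replacement, keeping labels fixed,
# and recompute the difference in means (first minus other)
boot_diffs <- replicate(10000, {
  mean(sample(first_wk, replace = TRUE)) - mean(sample(other_wk, replace = TRUE))
})

# Percentile 95% confidence interval: middle 95% of the bootstrap distribution
ci <- quantile(boot_diffs, probs = c(0.025, 0.975))
ci

# The null value 0 is plausible if it lies inside the interval
zero_inside <- unname(ci[1] <= 0 & 0 <= ci[2])
zero_inside
```

Unlike the permutation distribution, which is centered at 0 by construction, this bootstrap distribution is centered near the observed difference, so checking whether 0 falls inside it is a meaningful test.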

Exercise 7

bootstrap_results <- cohens_d_bootstrap(
  data = pregnancy_length,
  model = prglngth ~ birth_order
)

bootstrap_report(bootstrap_results)

## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 5000 bootstrap replicates
##
## CALL :
## boot::boot.ci(boot.out = cohens_d_bootstrap_sim, type = c("perc"))
##
## Intervals :
## Level Percentile
## 95% (-0.0122, 0.0706 )
## Calculations and Intervals on Original Scale
##
## Response variable
## prglngth
##
## Explanatory variable
## birth_order
##
## Explanatory category with larger mean
## first
##
## Explanatory category with smaller mean
## other
##
## Cohen's d observed value
## 0.0288791

plot_ci(bootstrap_results)

[Figure: "Bootstrap distribution: Cohen's d": PMF (y) vs. Cohen's d (x), with the confidence interval (-0.0122, 0.0706) marked.]

The effect size for the difference in pregnancy length between "first" and "other" births is "very small": the observed Cohen's d of 0.0288791 is closest to the 0.01 benchmark on the given table. The 95% bootstrap interval for d, (-0.0122, 0.0706), also includes 0.
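The source of the `cohens_d_bootstrap()` helper is not shown, so its exact formula is an assumption here. One common definition of Cohen's d divides the difference in means by the pooled standard deviation; the sketch below implements that definition in base R (`cohens_d` is a hypothetical helper name) and runs a rough check against the Exercise 3 summary statistics, assuming equal weights for the two groups because their sizes are not reported.

```r
# Cohen's d for two independent samples, using the pooled standard deviation
cohens_d <- function(x, y) {
  nx <- length(x)
  ny <- length(y)
  pooled_var <- ((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2)
  (mean(x) - mean(y)) / sqrt(pooled_var)
}

# Tiny check: the means differ by 1 and both variances are 1, so d = -1
cohens_d(c(1, 2, 3), c(2, 3, 4))

# Rough check against the Exercise 3 summaries. Group sizes are not shown
# there, so this ASSUMES equal weights for the two standard deviations.
approx_d <- (38.60095 - 38.52291) / sqrt((2.791901^2 + 2.615852^2) / 2)
approx_d  # roughly 0.029, close to the reported 0.0288791
```

A difference of about 0.08 weeks against a standard deviation of about 2.7 weeks is what makes the effect "very small".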
