Data: (3 points) Describe how the observations in the sample are collected, and the
implications of this data collection method on the scope of inference (generalizability /
causality). Note that you will need to look into documentation on the BRFSS to answer this
question. See http://www.cdc.gov/brfss/ as well as the "More information on the data" section
below.
This entire sample is subject to non-response bias: all the collected data come from people
who were willing to answer the phone and finish a very long survey. It is therefore not a great
representation of the general population, but the sample is large enough to detect
statistically significant differences even when they are small. For computing capabilities, we will refer to the data
Research question 3: Does a correlation exist between the body mass index (BMI) of the
individual answering the survey and having a high blood cholesterol or having a heart attack?
explanatory variable: _bmi5 (continuous) computed by researchers in kg/m^2
response variables: toldhi2 and cvdinfr4
Part 3:
Research question 1: What are the statistical characteristics of the estimated age-gender
specific maximum oxygen consumption? Does it follow a normal distribution?
The last two digits of this variable are implied decimal places, so before doing anything else
we divide it by 100 to express it in its proper units: ml/min/kg
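The implied-decimal correction described above can be sketched as follows. This is only an illustration in Python, not the code used in this report; the function name `apply_implied_decimals` and the raw codes are hypothetical and are not taken from the BRFSS file.

```python
def apply_implied_decimals(raw_values, places=2):
    """Rescale integer-coded values whose last `places` digits are decimals."""
    scale = 10 ** places
    # None marks a missing observation and is passed through unchanged.
    return [None if v is None else v / scale for v in raw_values]


# Made-up raw codes, for illustration only.
raw = [3550, 2875, None, 4120]
converted = apply_implied_decimals(raw)
print(converted)
```

A raw code of 3550 is therefore read as 35.50 in the corrected units.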
The sample does not appear to follow a normal distribution; it seems to be bimodal, as
suggested by the ‘S’ shape of the QQ plot.
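To make the reading of the QQ plot concrete, the points of such a plot can be computed as in the sketch below. It is a minimal Python illustration, not the code behind the plot in this report; the helper name `qq_points`, the plotting position (i + 0.5) / n, and the sample values are assumptions made for the example.

```python
from statistics import NormalDist, mean, stdev


def qq_points(sample):
    """Return (theoretical, observed) quantile pairs for a normal QQ plot."""
    ordered = sorted(sample)
    n = len(ordered)
    # Reference normal distribution with the sample's own mean and spread.
    reference = NormalDist(mean(ordered), stdev(ordered))
    # Plotting positions (i + 0.5) / n stay strictly inside (0, 1).
    theoretical = [reference.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))


# Made-up values, for illustration only.
points = qq_points([1.0, 2.0, 3.0, 4.0, 5.0])
for expected, observed in points:
    print(round(expected, 2), observed)
```

If the data were normal, the pairs would fall close to the line where both coordinates are equal; a systematic ‘S’-shaped departure from that line is what indicates non-normality.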
Regardless, we can use the Shapiro-Wilk normality test to assess this more thoroughly. The null
hypothesis (H0) is that the data follow a normal distribution, and we will reject H0 if the
p-value is below 0.05.
Conclusion: p < 2.2e-16, so we reject H0 and conclude that this sample does not follow a normal distribution.
Research question 2: Does a correlation exist between the hours slept each night and reported
days with poor physical health or poor mental health in the last 30 days?
First, we convert all the values to numeric.
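The conversion to numeric values could look like the sketch below. It is a Python illustration only, assuming that the answers arrive as text and that blank or missing entries should become missing values; the function name `to_numeric` and the sample answers are hypothetical.

```python
def to_numeric(values):
    """Convert answers stored as text to floats; unparseable entries become None."""
    converted = []
    for value in values:
        try:
            converted.append(float(value))
        except (TypeError, ValueError):
            # Blank strings, None, and non-numeric text are treated as missing.
            converted.append(None)
    return converted


# Made-up answers, for illustration only.
raw_hours = ["7", "8", "", None, "6"]
hours = to_numeric(raw_hours)
print(hours)
```

Keeping missing answers as `None` rather than zero avoids biasing the descriptive statistics that follow.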
Descriptive statistics
Let’s test the correlation between hours slept each night and reported days with poor physical
health in the last 30 days.
There appears to be no clear correlation in the exploratory graph. Also, the correlation
coefficient r = -0.04 is too close to zero to indicate a meaningful linear relationship.
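For reference, the Pearson correlation coefficient reported here can be computed as in the following sketch. It is a plain Python illustration, not the code used for this report; the function name `pearson_r`, the choice to drop pairs with a missing value, and the sample data are assumptions.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences, skipping missing pairs."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / sqrt(var_x * var_y)


# Made-up values, for illustration only.
sleep_hours = [5, 6, 7, None, 8, 9]
poor_days = [10, 8, 6, 3, 4, 2]
print(round(pearson_r(sleep_hours, poor_days), 3))
```

Values of r near -1 or 1 indicate a strong linear relationship, and values near 0 indicate little or no linear relationship.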
Now, let us test the correlation between hours slept each night and reported days with poor
mental health in the last 30 days.
The result is the same: no correlation is visible in the graph, and no meaningful coefficient can be deduced (r = -0.11).
Conclusion: Even though there is no linear relationship between the hours slept and either
dependent variable, there is a dip in the lowess regression line, which suggests
that people at both extremes of the sleep spectrum might have a higher number of days with poor
physical or mental health.
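A dip of this kind can also be checked without a lowess fit, by averaging the response within each value of hours slept. The sketch below uses that simpler group-mean approach as a stand-in for lowess; it is not the method used in this report, and the function name `mean_by_group` and the data are made up for illustration.

```python
from collections import defaultdict


def mean_by_group(groups, values):
    """Average `values` within each distinct entry of `groups`, skipping missing pairs."""
    totals = defaultdict(lambda: [0.0, 0])  # group -> [running sum, count]
    for group, value in zip(groups, values):
        if group is None or value is None:
            continue
        totals[group][0] += value
        totals[group][1] += 1
    return {group: s / c for group, (s, c) in sorted(totals.items())}


# Made-up values, for illustration only.
hours_slept = [4, 4, 7, 7, 10, 10]
poor_health_days = [10, 8, 1, 3, 9, 7]
print(mean_by_group(hours_slept, poor_health_days))
```

In this made-up example the average is high at 4 and 10 hours and low at 7 hours, which is the U-shaped pattern a near-zero linear correlation coefficient cannot capture.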
Research question 3: Does a correlation exist between the body mass index (BMI) of the
individual answering the survey and having a high blood cholesterol or having a heart attack?
The last two digits of this variable are implied decimal places, so before doing anything else
we divide it by 100 to express it in its proper units: kg/m^2
Descriptive statistics
p = 2.9e-4; therefore, we can reject the H0 that states that there is no correlation between having
a higher BMI and suffering from hypercholesterolemia.
Exploratory graph for heart attack
Statistical analysis for heart attack
p = 6.47e-13; therefore, we can reject the H0 that states that there is no correlation between
having a higher BMI and having a heart attack.