
Data: (3 Points) Describe How The Observations in The Sample Are Collected, and The

The document discusses analyzing health data from the Behavioral Risk Factor Surveillance System (BRFSS) survey on California residents. It addresses three research questions: 1) Whether estimated maximum oxygen consumption follows a normal distribution. Descriptive statistics and normality tests show it is bimodally distributed. 2) Whether hours of sleep correlates with days of poor physical or mental health. Exploratory graphs and correlation tests show no significant linear correlation. 3) Whether body mass index (BMI) correlates with high cholesterol or heart attacks. Exploratory graphs and correlation tests show BMI significantly correlates with both conditions.

Uploaded by

Christian Marino
Copyright
© All Rights Reserved

Part 1:

Data: (3 points) Describe how the observations in the sample are collected, and the
implications of this data collection method on the scope of inference (generalizability /
causality). Note that you will need to look into documentation on the BRFSS to answer this
question. See http://www.cdc.gov/brfss/ as well as the "More information on the data" section
below.

This entire sample is subject to non-response bias: all the collected data come from people
who were willing to answer the phone and finish a very long survey. It is therefore not a fully
representative sample of the general population, but the sample is large enough to detect
statistically significant differences even when they are small. Because of limited computing
capacity, we restrict the analysis to the data collected in the state of California.

Because the data were collected retrospectively through an observational survey, we can only
infer CORRELATION, not CAUSALITY.
Part 2:
Research question 1: What are the statistical characteristics of the estimated age-gender
specific maximum oxygen consumption? Does it follow a normal distribution?
variable: $maxvo2_ (continuous) in ml/min/kg
Research question 2: Does a correlation exist between the hours slept each night and reported
days with poor physical health or poor mental health in the last 30 days?
explanatory variable: sleptim1 in hours
response variables: physhlth and menthlth in days
Research question 3: Does a correlation exist between the body mass index (BMI) of the
individual answering the survey and having high blood cholesterol or having a heart attack?
explanatory variable: _bmi5 (continuous) computed by researchers in kg/m^2
response variables: toldhi2 and cvdinfr4
Part 3:
Research question 1: What are the statistical characteristics of the estimated age-gender
specific maximum oxygen consumption? Does it follow a normal distribution?
The last two digits of this variable are implied decimal places, so before doing anything else
we divide it by 100 to express it in the correct units: ml/min/kg
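The implied-decimal correction can be sketched as follows. This is a minimal Python illustration, not the code used in the report (which appears to have been written in R); the function name `apply_implied_decimals` and the raw codes are hypothetical.

```python
def apply_implied_decimals(raw_value, places=2):
    """Convert an integer code with implied decimal places to real units."""
    if raw_value is None:
        return None  # keep missing responses missing
    return raw_value / 10 ** places


# Hypothetical raw codes, not taken from the survey file.
raw_maxvo2 = [2469, 3810, None, 4125]
maxvo2 = [apply_implied_decimals(v) for v in raw_maxvo2]
print(maxvo2)  # values now in ml/min/kg
```

The same helper applies to any other variable stored with two implied decimal places.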

We will now get some descriptive statistics on the computed measurement:
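The report's output for this step is not preserved. As a rough sketch of what such a summary contains, the following Python snippet computes a five-number summary plus the mean and standard deviation; the function name `describe` and the sample values are assumptions made for illustration.

```python
import statistics


def describe(values):
    """Return a five-number summary plus mean and standard deviation."""
    clean = sorted(v for v in values if v is not None)  # drop missing responses
    q1, median, q3 = statistics.quantiles(clean, n=4, method="inclusive")
    return {
        "n": len(clean),
        "min": clean[0],
        "q1": q1,
        "median": median,
        "mean": statistics.mean(clean),
        "q3": q3,
        "max": clean[-1],
        "sd": statistics.stdev(clean),  # sample standard deviation
    }


# Hypothetical values in ml/min/kg, not taken from the survey file.
sample = [24.7, 38.1, None, 41.3, 29.5, 35.0]
summary = describe(sample)
print(summary)
```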

Next, some exploratory graphs:


Another visual method to evaluate whether this sample follows a normal distribution is a normal
probability plot using the qqnorm function
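To make clear what a normal probability plot compares, here is a minimal Python sketch that computes the plotted points: each sorted observation is paired with the standard normal quantile at the same rank. The function name `normal_qq_points` and the plotting positions (i - 0.5)/n are assumptions of this sketch; R's qqnorm may use slightly different positions for small samples.

```python
from statistics import NormalDist


def normal_qq_points(values):
    """Pair each sorted observation with its theoretical normal quantile."""
    sample = sorted(values)
    n = len(sample)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, sample))


# Hypothetical values, not taken from the survey file.
points = normal_qq_points([29.5, 24.7, 41.3, 35.0, 38.1])
for theoretical_q, observed in points:
    print(f"{theoretical_q:6.3f}  {observed:5.1f}")
```

If the sample were normally distributed, these points would fall close to a straight line; systematic curvature indicates a departure from normality.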

The sample does not appear to follow a normal distribution; it seems to be bimodal, as
suggested by the ‘S’ shape of the QQ plot.
Regardless, we can use the Shapiro-Wilk normality test to assess this more formally. The null
hypothesis (Ho) is that the data follow a normal distribution, and we will reject the Ho if the
p-value < 0.05.

Conclusion: p < 2.2e-16, so we reject the Ho and conclude that this sample does not follow a normal distribution.
Research question 2: Does a correlation exist between the hours slept each night and reported
days with poor physical health or poor mental health in the last 30 days?
First, we convert all the values to numeric.

Descriptive statistics

Let’s test the correlation between hours slept each night and reported days with poor physical
health in the last 30 days.
There appears to be no clear correlation in the exploratory graph. Also, the correlation
coefficient r = -0.04 is too close to zero to indicate a meaningful linear relationship.
Now, let us test the correlation between hours slept each night and reported days with poor
mental health in the last 30 days.
The result is the same: the graph shows no visible correlation, and the coefficient r = -0.11
indicates at most a very weak linear relationship.
Conclusion: Even though there is no linear relationship between the hours slept and either
dependent variable, the lowess regression line shows a dip, which suggests that people at
both extremes of sleep duration might have a higher number of days with poor physical or
mental health.
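A correlation coefficient near zero does not rule out a non-linear pattern of this kind. The Python sketch below illustrates the point with deliberately U-shaped, made-up data: Pearson's r is zero, while the mean number of poor-health days per hour of sleep clearly dips in the middle. The helper names `pearson_r` and `mean_by_group` are hypothetical, and grouping by hour is used here as a simple stand-in for the lowess curve in the report.

```python
from collections import defaultdict
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def mean_by_group(xs, ys):
    """Mean of y for each distinct value of x, ordered by x."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)
    return {x: sum(v) / len(v) for x, v in sorted(groups.items())}


# Made-up data with a deliberate U shape; not taken from the survey file.
hours = [4, 5, 6, 7, 8, 9, 10]
poor_days = [9, 4, 1, 0, 1, 4, 9]

r = pearson_r(hours, poor_days)
means = mean_by_group(hours, poor_days)
print(f"r = {r:.2f}")
print(means)
```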

Research question 3: Does a correlation exist between the body mass index (BMI) of the
individual answering the survey and having high blood cholesterol or having a heart attack?
The last two digits of this variable are implied decimal places, so before doing anything else
we divide it by 100 to express it in the correct units: kg/m^2

Descriptive statistics

Exploratory graph for high cholesterol


Statistical analysis for high cholesterol

p = 2.9e-4, so we can reject the Ho that there is no correlation between having
a higher BMI and having hypercholesterolemia.
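The report does not state which test produced this p-value. Purely as an illustration of how a continuous variable such as BMI can be compared across a yes/no response, the Python sketch below computes a Welch-type statistic for the difference in mean BMI between two groups, with a two-sided p-value from the normal approximation (reasonable only for large samples). The function name `welch_z_test` and all data values are assumptions, and this is not necessarily the author's method.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_z_test(group_a, group_b):
    """Welch statistic with a two-sided p-value from the normal approximation."""
    # Standard error of the difference in means, allowing unequal variances.
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    stat = (mean(group_a) - mean(group_b)) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(stat)))
    return stat, p_value


# Made-up BMI values in kg/m^2; not taken from the survey file.
bmi_high_chol = [31.2, 28.4, 33.0, 29.9, 27.5]
bmi_normal_chol = [24.1, 26.3, 22.8, 25.0, 27.1]

stat, p_value = welch_z_test(bmi_high_chol, bmi_normal_chol)
print(f"statistic = {stat:.2f}, p = {p_value:.4f}")
```

With samples this small a t distribution would be the appropriate reference; the normal approximation is used here only to keep the sketch dependency-free.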
Exploratory graph for heart attack
Statistical analysis for heart attack

p = 6.47e-13, so we can reject the Ho that there is no correlation between
having a higher BMI and having a heart attack.

Conclusion: A higher BMI is significantly correlated with both hypercholesterolemia and
having a heart attack.
