Assignment 2
The data for this study is stored in the file Sleep.csv, which can be downloaded from Canvas.
The data contains 2 variables:
Treatment    The treatment group the volunteer was assigned to (either Drug, Placebo or
             Neither).
Sleep        Whether the volunteer fell asleep within 30 minutes (either Yes or No).
(b) Run the iNZightVIT software and load the file Sleep.csv into it.
(i) Run a randomisation test to compare the proportion of volunteers who fell asleep within
30 minutes between the three treatment groups. Include the output from this in your
assignment answers.
Notes: Variable 1 needs to be Sleep and Variable 2 needs to be Treatment. Before you select
Record my Choice in the Analyse window, change the level of interest to Yes.
(ii) When chance is acting alone, would it be unusual to get an average deviation from the
overall proportion who fell asleep at least as big as the observed difference?
(Use your randomisation test output to answer this.)
(iii) Is it plausible that the observed average deviation from the overall proportion who fell
asleep can be explained by chance acting alone? Briefly justify your answer.
(iv) Can we conclude that different treatments caused differences in the sleep rate? If so,
justify why with two reasons. If not, what can we conclude?
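The logic of the randomisation test in (b) can be sketched in Python. This is a minimal sketch, not iNZightVIT's implementation: it assumes the test statistic is the mean absolute deviation of each group's proportion from the overall proportion, and the counts below are hypothetical stand-ins, not the contents of Sleep.csv. The function names are illustrative only.

```python
import random


def average_deviation(outcomes, groups):
    """Mean absolute deviation of each group's 'Yes' proportion from the overall proportion."""
    overall = sum(outcomes) / len(outcomes)
    deviations = []
    for label in sorted(set(groups)):
        members = [o for o, g in zip(outcomes, groups) if g == label]
        deviations.append(abs(sum(members) / len(members) - overall))
    return sum(deviations) / len(deviations)


def randomisation_test(outcomes, groups, n_shuffles=1000, seed=1):
    """Shuffle the group labels and count how often chance alone matches the observed statistic."""
    rng = random.Random(seed)
    observed = average_deviation(outcomes, groups)
    shuffled = list(groups)
    at_least_as_big = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # re-assign treatments at random
        if average_deviation(outcomes, shuffled) >= observed:
            at_least_as_big += 1
    return observed, at_least_as_big / n_shuffles


# Hypothetical data, NOT Sleep.csv: 1 = fell asleep within 30 minutes (Yes), 0 = No.
groups = ["Drug"] * 20 + ["Placebo"] * 20 + ["Neither"] * 20
outcomes = [1] * 15 + [0] * 5 + [1] * 10 + [0] * 10 + [1] * 8 + [0] * 12

observed, tail_proportion = randomisation_test(outcomes, groups)
print("observed average deviation:", observed)
print("tail proportion:", tail_proportion)
```

A small tail proportion means a deviation this large would be unusual when chance is acting alone; a large one means chance alone is a plausible explanation.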
Department of Statistics
Many research organisations give their interviewers exact scripts to follow when conducting
interviews to measure opinions on controversial issues.
(i) Give one type of bias which the research organisations are trying to minimise by using
exact scripts for the interview.
(ii) Will the use of scripts completely eliminate this form of bias? Briefly justify your answer.
(b) In March 2001, Princeton Survey Research Associates conducted a poll asking about the 2000
presidential election in America. The poll asked a random sample of 1200 adults, "Did you vote
in the 2000 presidential election?" 72% said yes and 28% said no (with a margin of error of
3 percent). The actual voter turnout was 51%.
What non-sampling error is most likely to explain the difference between the polled and actual
figures? Justify your answer.
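The size of the sampling error in this poll can be checked with a short calculation. The sketch below assumes the quoted margin of error is a 95% margin using a multiplier of 1.96 (the poll description does not state the confidence level), and it computes the margin both at the observed proportion and at the worst case of 0.5.

```python
import math

n = 1200        # sample size reported for the poll
p_hat = 0.72    # proportion who said they voted
actual = 0.51   # actual voter turnout
z = 1.96        # assumed 95% multiplier; the confidence level is not stated in the question

# Margin of error at the observed proportion and at the worst case p = 0.5.
margin_at_p_hat = z * math.sqrt(p_hat * (1 - p_hat) / n)
margin_worst_case = z * math.sqrt(0.5 * 0.5 / n)

# Gap between the polled figure and the actual turnout.
gap = p_hat - actual

print(f"margin at p_hat:   {margin_at_p_hat:.3f}")
print(f"worst-case margin: {margin_worst_case:.3f}")
print(f"polled minus actual: {gap:.2f}")
```

Comparing the gap with the margin of error shows whether sampling error alone could account for the difference.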
(c) In a U.S. court case, Bristol Myers was ordered by the Federal Trade Commission to stop
advertising that twice as many dentists use Ipana than any other toothpaste. Bristol Myers
had based their claim on a survey of 10,000 randomly selected subscribers to two dental
magazines. They received 1,983 responses, with 621 saying they used Ipana and only 258
saying they used the second most popular brand.
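The figures quoted above can be turned into rates with a few lines of arithmetic. The sketch below uses only the numbers given in the question; the variable names are illustrative.

```python
surveyed = 10_000      # subscribers who were sent the survey
responses = 1_983      # responses received
ipana = 621            # respondents who said they used Ipana
second_brand = 258     # respondents who said they used the second most popular brand

response_rate = responses / surveyed
ipana_share = ipana / responses                # share of respondents, not of all dentists
second_brand_share = second_brand / responses
ratio = ipana / second_brand

print(f"response rate:      {response_rate:.1%}")
print(f"Ipana share:        {ipana_share:.1%}")
print(f"second brand share: {second_brand_share:.1%}")
print(f"Ipana to second brand ratio: {ratio:.2f}")
```

Note that every share here is a share of those who responded, which is the group the questions below ask you to think about.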
(i) Explain how selection bias may be a potential problem with the survey.
(ii) Explain why self-selection bias is not a potential problem with the survey.
(iii) Apart from selection bias, what is the main non-sampling error affecting the survey?
Briefly justify your answer.
(iv) Do the results of the survey provide convincing evidence of Bristol Myers' claim? Briefly
justify your answer.
The data
The wait time between placing the order and receiving the coffee (in seconds)
Gender
Run the iNZightVIT software and load the file Coffee.csv into it.
(a)
(i) Why would the median be a better estimate of the centre of this data than the mean?
(ii) Generate a bootstrap confidence interval for the median wait times of customers. (DO
NOT use the variable Gender at this point.) Include the output in your assignment
answers.
(iii) What is the parameter we are estimating using this bootstrap confidence interval?
(iv) Do we know the true value of this parameter?
(v) Interpret the bootstrap confidence interval.
(b)
(i) Generate a bootstrap confidence interval for the difference in the median wait times
between males and females. Include the output in your assignment answers.
(ii) What is the parameter we are estimating using this bootstrap confidence interval?
(iii) Interpret the bootstrap confidence interval.
(iv) Based on the bootstrap confidence interval, is it believable that the median wait time for
males is the same as the median wait time for females? Briefly justify your answer.
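The two bootstrap intervals asked for above can be illustrated with a short Python sketch. It is not iNZightVIT's implementation: it assumes a percentile interval (the central 95% of 1000 resampled statistics) and, for the difference, resampling within each gender separately. The wait times are hypothetical, not the contents of Coffee.csv, and the function names are illustrative.

```python
import random
import statistics


def percentile_interval(values, level=0.95):
    """Central `level` share of the sorted bootstrap statistics."""
    ordered = sorted(values)
    tail = (1 - level) / 2
    lower_index = round(tail * len(ordered))
    upper_index = round((1 - tail) * len(ordered)) - 1
    return ordered[lower_index], ordered[upper_index]


def bootstrap_median_ci(sample, n_boot=1000, seed=1):
    """Resample with replacement, record each resample's median, take the central 95%."""
    rng = random.Random(seed)
    medians = [
        statistics.median(rng.choices(sample, k=len(sample))) for _ in range(n_boot)
    ]
    return percentile_interval(medians)


def bootstrap_median_difference_ci(sample_a, sample_b, n_boot=1000, seed=1):
    """Same idea for the difference in medians (a minus b), resampling each group separately."""
    rng = random.Random(seed)
    differences = []
    for _ in range(n_boot):
        resample_a = rng.choices(sample_a, k=len(sample_a))
        resample_b = rng.choices(sample_b, k=len(sample_b))
        differences.append(statistics.median(resample_a) - statistics.median(resample_b))
    return percentile_interval(differences)


# Hypothetical wait times in seconds, NOT Coffee.csv.
male_waits = [95, 110, 120, 130, 135, 150, 160, 185, 240, 410]
female_waits = [100, 105, 125, 140, 145, 155, 170, 200, 260, 390]
all_waits = male_waits + female_waits

median_ci = bootstrap_median_ci(all_waits)
difference_ci = bootstrap_median_difference_ci(male_waits, female_waits)
print("median wait time interval:", median_ci)
print("difference in medians (male minus female) interval:", difference_ci)
```

If the interval for the difference contains 0, a difference of zero between the two medians is among the believable values.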
Calculate and interpret a 95% confidence interval for the mean delivery time.
Note: You must clearly show that you have followed the step-by-step guide to producing a
confidence interval by hand given in the Lecture Workbook, Chapter 6. Use the t-procedures
tool to find values for t-multipliers and standard errors.
(b) The student believes that, on average, pizzas from this pizzeria take less than 25 minutes to
arrive. Does the data support this? Briefly justify your answer.
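The by-hand calculation (estimate ± t-multiplier × standard error) can be checked with a short Python sketch. The delivery times below are hypothetical, since the data for this question is not reproduced here, and the t-multiplier 2.262 is the tabulated 95% value for 9 degrees of freedom, which matches this made-up sample of 10 only.

```python
import math
import statistics

# Hypothetical delivery times in minutes, NOT the assignment's data.
delivery_times = [18, 22, 25, 19, 30, 24, 21, 27, 23, 20]

n = len(delivery_times)
estimate = statistics.mean(delivery_times)       # sample mean
std_dev = statistics.stdev(delivery_times)       # sample standard deviation (n - 1 divisor)
standard_error = std_dev / math.sqrt(n)

t_multiplier = 2.262                             # 95% multiplier for df = n - 1 = 9
margin = t_multiplier * standard_error
lower, upper = estimate - margin, estimate + margin

print(f"estimate = {estimate:.2f}, se = {standard_error:.3f}")
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```

For part (b), the relevant check is where 25 minutes sits relative to the interval.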
40+      208  183  119  55  251  54  75
Total    945  688
Females  795  897  750  607  634  465
(a) State the sampling situation (a, b or c) for calculating the standard error of the difference in
the following scenarios:
(i) for people who would take out travel insurance, estimating the difference between the
proportion of females who did so to cover the cost of lost luggage and the proportion of
males who did so to cover the cost of lost luggage.
(ii) for people who would take out travel insurance, estimating the difference between the
proportion of females who did so to cover the cost of car accidents and the proportion of
females who did so to cover the cost of stolen items.
(iii) for people who had travelled overseas before, estimating the difference between the
proportion of people under 40 who thought nearby travellers letting children misbehave
was most annoying and the proportion of people under 40 who thought nearby travellers
reclining their seat was most annoying.
(b) Consider people who had travelled overseas before. Calculate and interpret a 95% confidence
interval for the difference between the proportion of people under 40 years of age who thought
the most annoying thing a nearby traveller could do was smell and the proportion of people at
least 40 years of age who thought the most annoying thing a nearby traveller could do was
smell.
Note: You must clearly show that you have followed the step-by-step guide to producing a
confidence interval by hand given in the Lecture Workbook, Chapter 6. Use the t-procedures
tool to find values for t-multipliers and standard errors.
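The structure of the calculation in (b) can be sketched in Python. This is a sketch under stated assumptions: the two age groups are treated as two independent samples, the 95% multiplier is taken as 1.96, and the counts are hypothetical placeholders rather than values from the survey table. The function name is illustrative.

```python
import math


def difference_in_proportions_ci(x1, n1, x2, n2, multiplier=1.96):
    """CI for p1 - p2 from two independent samples: estimate ± multiplier × se."""
    p1, p2 = x1 / n1, x2 / n2
    estimate = p1 - p2
    # Standard error of a difference between two independent sample proportions.
    standard_error = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    margin = multiplier * standard_error
    return estimate - margin, estimate + margin


# Hypothetical counts, NOT taken from the survey table:
# 120 of 400 people under 40, and 90 of 450 people aged 40+.
lower, upper = difference_in_proportions_ci(120, 400, 90, 450)
print(f"95% CI for the difference: ({lower:.3f}, {upper:.3f})")
```

The sign and width of the interval are what the interpretation rests on: whether 0 is inside it, and how far from 0 the ends lie.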