Midterm I Review - 1 Per Page
Midterm I Review
Units 1-5
Midterm Exam Logistics
Midterm will be held during the regular class time and location on
Thursday: it starts at approximately 10:07 and runs for 83 min.
It's closed-book. You are allowed one 8.5x11 page of notes
(front-and-back OK). No laptops or cell phones. Remember to bring a
calculator.
It covers Units 1-5 and the material on HWs 1-4. There are
practice problems and a practice exam on the course website.
Be sure to briefly explain your answers and show calculations.
Office Hours Tues-Thurs (no sections or OHs otherwise):
Tues: 11:30am-12:30pm in SC-300b (Kevin)
Wed: 11am-1pm in SC-107 (Joseph), 1-3pm in SC-300b (Kevin)
Thurs: 9-10am in SC-300b (Kevin)
Outline of Material for the Midterm
Sampling and Measurement (Unit 2)
Surveys and Sampling
Experimental Design and Randomization
Descriptive Statistics (Unit 3)
Univariate
Bivariate
Probability & Random Variables (Unit 4 - Part I)
Binomial Random Variables
Normal Distribution
Law of Large Numbers and the Central Limit Theorem
Inference for a Population Mean (Unit 5)
Confidence Intervals
Hypothesis Tests
Power and Sample Size Calculations
Descriptive Statistics
(Univariate)
Descriptive Statistics
(Bivariate)
Study Design
(Experiments)
Principles
Control, Replication, Randomization: together these
justify causal conclusions
Elements
Treatments, response
Types
Simple Randomized Experiment
Stratified/Blocked, Matched Pairs
Not always feasible (ethically, etc)
Study Design
(Surveys)
Elements
Target Population (and a parameter), the sample (and a statistic)
Types
SRS (Randomly select Harvard students)
Stratified (Randomly select three students from each Harvard house)
Bias
Selection (systematic bias in those that were chosen to be sampled)
Response (systematic bias in the way that people responded to the
questions, for example: bad wording or pressure from the
investigator)
Non-response (systematic bias in the way that people did not respond
to the survey)
Intro to Probability
Sample Space, Outcomes
Events
Union, Intersection, Disjoint, Complement
Probability (only on events)
Rules (Unions & Intersections)
Conditional Probability:
Definition: P(A|B) = P(A and B)/P(B)
Independence
Check: P(A and B) = P(A)*P(B)
or P(A|B) = P(A)
Bayes Rule (often it's just easier to use a 2x2 table)
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B \mid A)\,P(A) + P(B \mid A^C)\,P(A^C)}$$
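As a rough illustration of why the 2x2 table and the Bayes Rule formula agree, here is a minimal Python sketch. The events and all three input probabilities are made-up numbers chosen for illustration, not values from the course.

```python
def bayes_posterior(p_b_given_a, p_a, p_b_given_not_a):
    """Return P(A | B) from P(B | A), P(A), and P(B | A^C)."""
    numerator = p_b_given_a * p_a
    denominator = numerator + p_b_given_not_a * (1.0 - p_a)
    return numerator / denominator

# Hypothetical inputs: A = "has condition", B = "tests positive".
p_a = 0.01
p_b_given_a = 0.95
p_b_given_not_a = 0.10

posterior = bayes_posterior(p_b_given_a, p_a, p_b_given_not_a)

# Same calculation using 2x2 table counts for 10,000 imaginary people.
n_people = 10_000
a_and_b = n_people * p_a * p_b_given_a                 # about 95 people
not_a_and_b = n_people * (1 - p_a) * p_b_given_not_a   # about 990 people
table_answer = a_and_b / (a_and_b + not_a_and_b)

print(round(posterior, 4), round(table_answer, 4))
```

Both routes divide the "A and B" cell by the total of the B column, which is all the formula does.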
Intro to Random Variables
Discrete
Probability distribution function (sum to one)
Usually defined in tabular form
Calculating the mean, variance & sd
$\mu_X = E(X) = \sum_x [x \cdot P(X = x)]$
$\sigma_X^2 = E((X - \mu_X)^2) = \sum_x [(x - \mu_X)^2 \cdot P(X = x)]$
Continuous
Probability density function
Probabilities represented by areas
Note: P(X = x) = 0 for continuous variables
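The mean, variance, and sd of a discrete random variable can be computed directly from its table. The sketch below uses a hypothetical distribution (the number of heads in two fair coin flips); the dictionary layout is just one convenient way to hold the table.

```python
import math

# Hypothetical distribution in tabular form: value -> probability.
dist = {0: 0.25, 1: 0.50, 2: 0.25}

# A valid probability distribution must sum to one.
assert math.isclose(sum(dist.values()), 1.0)

# Mean: sum of x * P(X = x).
mean = sum(x * p for x, p in dist.items())

# Variance: sum of (x - mean)^2 * P(X = x).
variance = sum((x - mean) ** 2 * p for x, p in dist.items())

# Standard deviation: square root of the variance.
sd = math.sqrt(variance)

print(mean, variance, round(sd, 4))
```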
The Normal Distribution
Continuous
$X \sim N(\mu, \sigma)$
Standardize to find probabilities
in Table A:
$$Z = \frac{X - \mu}{\sigma}$$
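A small sketch of standardizing, using Python's standard-library `NormalDist` in place of Table A. The mean, standard deviation, and cutoff are hypothetical values picked so that z comes out to a round number.

```python
from statistics import NormalDist

# Hypothetical normal variable and cutoff (not from the course).
mu, sigma = 100.0, 15.0
x = 130.0

# Standardize: how many standard deviations is x above the mean?
z = (x - mu) / sigma

# Area to the left of z under the standard normal curve
# (the quantity a standard normal table reports).
p_below = NormalDist().cdf(z)

# Cross-check: the same area computed without standardizing.
p_direct = NormalDist(mu, sigma).cdf(x)

print(z, round(p_below, 4), round(p_direct, 4))
```

The area to the right of the cutoff is `1 - p_below`.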
Binomial Random Variables
Think Coin Flips (counting heads)
4 major characteristics:
dichotomous outcomes, n fixed, $\pi$ fixed, independent trials
Shorthand: $X \sim \text{Bin}(n, \pi)$
Sample proportion: $\hat{\pi} = X/n$
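For the coin-flip picture, a binomial probability can be computed from the standard formula C(n, k) * p^k * (1 - p)^(n - k). This is a minimal sketch; the helper name `binom_pmf`, the argument name `pi` for the success probability, and the choice of 10 fair flips are illustrative assumptions.

```python
from math import comb

def binom_pmf(k, n, pi):
    """P(X = k) when X counts successes in n independent trials
    that each succeed with probability pi."""
    return comb(n, k) * pi**k * (1 - pi) ** (n - k)

# Hypothetical example: exactly 5 heads in 10 fair coin flips.
n, pi = 10, 0.5
p_five = binom_pmf(5, n, pi)

# The probabilities over k = 0..n should sum to one.
total = sum(binom_pmf(k, n, pi) for k in range(n + 1))

# Sample proportion if 5 heads were observed.
proportion = 5 / n

print(p_five, total, proportion)
```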
Binomial Random Variables
(Normal approximation to the binomial)
$$X \sim N\left(n\pi,\ \sqrt{n\pi(1-\pi)}\right)$$
$$\hat{\pi} \sim N\left(\pi,\ \sqrt{\frac{\pi(1-\pi)}{n}}\right)$$
This holds only if:
$n\pi \ge 10$
$n(1-\pi) \ge 10$
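To see how close the normal approximation gets, the sketch below compares an exact binomial probability with the normal-curve value. The numbers (100 trials, success probability 0.5, cutoff 55) are hypothetical, and no continuity correction is applied.

```python
from math import comb, sqrt
from statistics import NormalDist

# Hypothetical setup: X counts successes in n trials.
n, pi = 100, 0.5

# Rule-of-thumb check before trusting the approximation.
assert n * pi >= 10 and n * (1 - pi) >= 10

# Exact binomial probability P(X <= 55).
exact = sum(comb(n, k) * pi**k * (1 - pi) ** (n - k) for k in range(56))

# Normal approximation with mean n*pi and sd sqrt(n*pi*(1 - pi)).
approx = NormalDist(n * pi, sqrt(n * pi * (1 - pi))).cdf(55)

print(round(exact, 4), round(approx, 4))
```

The two values are close but not identical; the gap narrows as n grows.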
Law of Large Numbers and
the Central Limit Theorem
Law of Large Numbers
$\bar{X}$ will have mean equal to the individual observations'
mean ($\mu$), and its variance will be smaller than that of a single observation:
$E(\bar{X}) = \mu$
$\mathrm{Var}(\bar{X}) = \sigma^2/n$
Central Limit Theorem
States that sample means ($\bar{X}$) and sums of RVs will
be approximately normally distributed, no matter what the original
distribution (assuming n is large)
Remember: $\bar{X} \sim N\left(\mu_X,\ \frac{\sigma_X}{\sqrt{n}}\right)$
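A quick simulation can make these two results concrete. The sketch below draws repeated samples from a strongly skewed distribution (exponential with mean 1 and sd 1, an arbitrary choice) and checks that the sample means center on the population mean with sd close to sigma / sqrt(n). The seed, sample size, and number of repetitions are all illustrative assumptions.

```python
import math
import random
import statistics

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 50                   # observations per sample
reps = 2000              # number of simulated samples

# Each entry is the mean of one sample of n exponential draws.
sample_means = [
    statistics.mean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(reps)
]

mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.stdev(sample_means)
theoretical_sd = 1.0 / math.sqrt(n)   # sigma / sqrt(n) with sigma = 1

print(round(mean_of_means, 3), round(sd_of_means, 3), round(theoretical_sd, 3))
```

A histogram of `sample_means` would look roughly bell-shaped even though the individual draws are skewed.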
Inference
(One-Sample for Means, $\sigma$ unknown)
One-sample t-test: $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$, with $df = n - 1$
CI for one sample: $\bar{x} \pm t^* \frac{s}{\sqrt{n}}$
We assume $X_i \sim N(\mu, \sigma)$ & independent
$t \sim t(df = n - 1)$ [when null hypothesis is true]
Assume normal if n is small, OK without extreme outliers
when n is large (n > 15)
Example:
A sample of 5 Stat 101 students was found to have slept an average of 5.8 hours
the night before their final, with a standard deviation of 2.0 hours. Is this
significantly lower than the recommended minimum of 7.5 hours?
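One way to work through the sleep example is sketched below. The 5% significance level is an assumption (the slide does not state one), and the two critical values are hardcoded from a standard t table for df = 4 rather than computed.

```python
import math

# Values given in the example.
n = 5
xbar = 5.8    # sample mean (hours)
s = 2.0       # sample standard deviation (hours)
mu0 = 7.5     # recommended minimum, used as the null value

# One-sample t statistic with df = n - 1.
se = s / math.sqrt(n)
t_stat = (xbar - mu0) / se
df = n - 1

# Assumed one-sided test at the 5% level (H_a: mu < 7.5).
# t-table value for df = 4 with upper-tail area 0.05.
t_crit_one_sided = 2.132
reject = t_stat < -t_crit_one_sided

# 95% confidence interval; t* for df = 4 with upper-tail area 0.025.
t_star = 2.776
ci = (xbar - t_star * se, xbar + t_star * se)

print(round(t_stat, 2), df, reject, tuple(round(v, 2) for v in ci))
```

Under these assumptions the t statistic is about -1.90, which does not fall below -2.132, so the sample mean is not significantly lower at the 5% level. The interval containing 7.5 tells the same story.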
Power and Sample Size
Calculating Power (2 steps)
1) Determine the rejection region for $\bar{x}$ under $H_0$
2) Calculate the probability of $\bar{x}$ falling in that rejection
region when $H_a$ is true
Power increases with:
larger sample size, n
smaller $\sigma$
greater distance between $\mu_A$ and $\mu_0$
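The two-step power calculation can be sketched as follows. To keep it self-contained, this version treats sigma as known and uses a one-sided z test with H_a: mu < mu_0, which is a simplifying assumption; every number and the function name `power_lower_tail` are hypothetical.

```python
import math
from statistics import NormalDist

def power_lower_tail(mu0, mu_a, sigma, n, alpha=0.05):
    """Power of a one-sided z test of H0: mu = mu0 vs Ha: mu < mu0
    when the true mean is mu_a (sigma assumed known)."""
    se = sigma / math.sqrt(n)
    # Step 1: rejection region under H0 is xbar < cutoff.
    z_star = NormalDist().inv_cdf(1 - alpha)
    cutoff = mu0 - z_star * se
    # Step 2: probability xbar lands in that region when Ha is true.
    return NormalDist(mu_a, se).cdf(cutoff)

# Hypothetical scenario.
base = power_lower_tail(mu0=7.5, mu_a=6.5, sigma=2.0, n=25)

# Each change below should raise the power.
bigger_n = power_lower_tail(mu0=7.5, mu_a=6.5, sigma=2.0, n=50)
smaller_sigma = power_lower_tail(mu0=7.5, mu_a=6.5, sigma=1.0, n=25)
farther_mu_a = power_lower_tail(mu0=7.5, mu_a=6.0, sigma=2.0, n=25)

print(round(base, 3), round(bigger_n, 3),
      round(smaller_sigma, 3), round(farther_mu_a, 3))
```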
Calculating Sample Size from a Desired Margin of Error (m)
$$n = \left(\frac{z^* \sigma}{m}\right)^2$$
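A short sketch of the sample size calculation, rounding up because n must be a whole number. The function name `sample_size`, the 95% confidence level, and the values of sigma and m are assumptions made for illustration.

```python
import math
from statistics import NormalDist

def sample_size(sigma, m, conf_level=0.95):
    """Smallest n whose margin of error is at most m,
    using n = (z* * sigma / m)^2 rounded up."""
    # z* leaves (1 - conf_level) / 2 in each tail.
    z_star = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    return math.ceil((z_star * sigma / m) ** 2)

# Hypothetical inputs: sigma = 2.0 hours, desired margin of error 0.5 hours.
n_needed = sample_size(sigma=2.0, m=0.5)
print(n_needed)
```

Halving the margin of error roughly quadruples the required n, since m is squared in the denominator.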
Practice Problem #1
The following is a collection of unrelated quick problems. Briefly justify
your answer for each problem.
a) Suppose that A and B are two disjoint events within the same sample
space. In addition, let P(A) = 1/8 and P(B) = 1/4. Are events A and B
independent? Explain.
Practice Problem #2
Cancer is the #2 cause of death in the United States, yet is not
nearly as deadly in other parts of the world. An investigator
looks at the cancer mortality rate (per 1,000 person-years) vs.
Population Growth Rate per year (in percent) for 171
countries. She starts by looking at the scatterplot and some
summary statistics from her data:
Practice Problem #2 (cont.)
Practice Problem #3
Not everybody likes Britney Spears. In fact, an internet poll run by
Rolling Stone magazine showed that 66% of college-aged men
said they liked Britney, while 30% of college-aged women said they liked her.
b) A line at the snack bar for the concert has 10 people (all Brit-fans).
What is the probability that exactly 5 of these students are women?
Practice Problem #3 (cont.)
c) There is a line of 100 students to get into the concert (all of
whom are Brit-fans). What is the probability that the
majority of them are women?
d) The internet poll run by Rolling Stone reported that 66%
of all college-aged men like Britney Spears. Why could this
be a mistake?
Practice Problem #4
A friend of yours is curious to see how confident Harvard students are in
their looks. He asks a random sample of n = 130 Harvard students, "What
percent of Harvard students do you believe are better looking than you?"
This sample had a mean of 30.8% and a standard deviation of 24.2%.
***Note: if people had realistic judgments about themselves, the mean
in the population should be 50%.