
Midterm I Review - 1 Per Page


Stat 101

Midterm I Review
Units 1-5
Midterm Exam Logistics
The midterm will be held during the regular class time and location on
Thursday: it starts at approximately 10:07 and runs for 83 min.
It's closed-book. You are allowed one 8.5x11 page of notes (front-and-back OK).
No laptops or cell phones. Remember to bring a calculator.
It covers Units 1-5 and the material on HWs 1-4. There are
practice problems and a practice exam on the course website.
Be sure to briefly explain your answers and show calculations.
Office Hours Tues-Thurs (no sections or OHs otherwise):
Tues: 11:30am-12:30pm in SC-300b (Kevin)
Wed: 11am-1pm in SC-107 (Joseph), 1-3pm in SC-300b (Kevin)
Thurs: 9-10am in SC-300b (Kevin)
Outline of Material for the Midterm
Sampling and Measurement (Unit 2)
Surveys and Sampling
Experimental Design and Randomization
Descriptive Statistics (Unit 3)
Univariate
Bivariate
Probability & Random Variables (Unit 4 - Part I)
Binomial Random Variables
Normal Distribution
Law of Large Numbers and the Central Limit Theorem
Inference for a Population Mean (Unit 5)
Confidence Intervals
Hypothesis Tests
Power and Sample Size Calculations
Descriptive Statistics
(Univariate)

Histograms and Boxplots


Measures of Center/Location
Mean, Median, Mode, Quantiles
Measures of Spread
Range, IQR, Variance/Standard Deviation
Measures of Shape
Skewness
Outliers (1.5*IQR Rule)
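
The 1.5*IQR rule listed above can be sketched in a few lines of Python. This is only an illustration: the data set is made up, and the quartile convention (`method="inclusive"`) is an assumption, since textbooks and software packages compute quartiles in slightly different ways.

```python
import statistics

def iqr_outliers(values):
    """Return (lower_fence, upper_fence, outliers) under the 1.5*IQR rule."""
    # Quartile conventions differ; "inclusive" is an assumption made here,
    # not necessarily the definition used in the course.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return lower_fence, upper_fence, outliers

# Hypothetical data, invented for illustration.
data = [2, 4, 5, 6, 7, 8, 9, 30]
low, high, flagged = iqr_outliers(data)
print(low, high, flagged)
```

Any observation outside the two fences is flagged; whether to drop it is a separate judgment call.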

Descriptive Statistics
(Bivariate)

Scatterplots, Side-by-Side Boxplots, Contingency Tables (Two-way tables)

Correlation (r): a measure of linear association


Regression: $\hat{y} = a + bx$
Residual: $e = y - \hat{y}$
$R^2$: proportion of the total variability in the y-variable
that can be predicted by using the x-variable.
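
As a rough sketch of how the slope, intercept, residuals, and R-squared fit together, the following Python computes them from the usual sums of squares. The five data points and the function name `least_squares` are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
import math

def least_squares(xs, ys):
    """Return (a, b, r): intercept, slope, and correlation for y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx                    # slope
    a = y_bar - b * x_bar            # intercept: line passes through (x_bar, y_bar)
    r = sxy / math.sqrt(sxx * syy)   # correlation coefficient
    return a, b, r

# Hypothetical data, invented for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b, r = least_squares(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]  # e = y - y-hat
r_squared = r ** 2
print(a, b, r_squared)
```

For simple linear regression, R-squared is just the square of the correlation r, and the residuals sum to zero.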

Study Design
(Experiments)
Principles
Control, Replication, Randomization (together these support
causal conclusions)
Elements
Treatments, response
Types
Simple Randomized Experiment
Stratified/Blocked, Matched Pairs
Not always feasible (ethically, etc.)

Study Design
(Surveys)
Elements
Target Population (and a parameter), the sample (and a statistic)
Types
SRS (Randomly select Harvard students)
Stratified (Randomly select three students from each Harvard house)
Bias
Selection (systematic bias in those that were chosen to be sampled)
Response (systematic bias in the way that people responded to the
questions, for example: bad wording or pressure from the
investigator)
Non-response (systematic bias in the way that people did not respond
to the survey)
Intro to Probability
Sample Space, Outcomes
Events
Union, Intersection, Disjoint, Complement
Probability (only on events)
Rules (Unions & Intersections)
Conditional Probability:
Definition: P(A|B) = P(A and B)/P(B)
Independence
Check: P(A and B) = P(A)*P(B)
or P(A|B) = P(A)
Bayes' Rule (often, it's just easier to use a 2x2 table):
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B \mid A)\,P(A) + P(B \mid A^C)\,P(A^C)}$$
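
A minimal sketch of Bayes' Rule as a function may help when checking hand calculations. The function name `bayes_posterior` and the input probabilities are hypothetical, chosen for illustration and not taken from the course materials.

```python
def bayes_posterior(p_a, p_b_given_a, p_b_given_not_a):
    """Return P(A | B) from P(A), P(B | A), and P(B | A^C)."""
    numerator = p_b_given_a * p_a
    # Law of total probability: P(B) = P(B|A)P(A) + P(B|A^C)P(A^C)
    denominator = numerator + p_b_given_not_a * (1 - p_a)
    return numerator / denominator

# Hypothetical inputs: P(A) = 0.01, P(B|A) = 0.90, P(B|A^C) = 0.05.
posterior = bayes_posterior(0.01, 0.90, 0.05)
print(round(posterior, 4))
```

The denominator is the total probability of B, which is the same quantity a 2x2 table gives as a column (or row) total.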

Intro to Random Variables
Discrete
Probability distribution function (sum to one)
Usually defined in tabular form
Calculating the mean, variance & sd
$\mu_X = E(X) = \sum [x \cdot P(X = x)]$
$\sigma_X^2 = E((X - \mu_X)^2) = \sum [(x - \mu_X)^2 \cdot P(X = x)]$
Continuous
Probability density function
Probabilities represented by areas
Note: P(X = x) = 0 for continuous variables
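
The mean, variance, and standard deviation formulas for a discrete random variable can be sketched as follows. The probability table here is hypothetical, invented so the numbers are easy to verify by hand.

```python
import math

# Hypothetical distribution in tabular form: value -> probability.
dist = {0: 0.2, 1: 0.5, 2: 0.3}

# A valid discrete distribution must have probabilities summing to one.
total_prob = sum(dist.values())

# Mean: sum of x * P(X = x)
mean = sum(x * p for x, p in dist.items())

# Variance: sum of (x - mean)^2 * P(X = x)
variance = sum((x - mean) ** 2 * p for x, p in dist.items())

sd = math.sqrt(variance)
print(mean, variance, sd)
```

For this table the mean is 1.1, the variance is 0.49, and the standard deviation is 0.7, up to floating-point rounding.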

The Normal Distribution

Continuous
$X \sim N(\mu, \sigma)$
Standardize to find probabilities
in Table A:
$Z = \frac{X - \mu}{\sigma}$
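
As a sketch of the standardization step, the snippet below converts a value to a z-score and looks up the area to its left with Python's `statistics.NormalDist`, which plays the role of Table A. The mean of 100 and standard deviation of 15 are hypothetical values.

```python
from statistics import NormalDist

# Hypothetical parameters, invented for illustration.
mu, sigma = 100, 15
x = 130

# Standardize: z = (x - mu) / sigma
z = (x - mu) / sigma

# Area to the left of z under the standard normal curve (what Table A lists).
p_below = NormalDist().cdf(z)

# Same probability computed directly on the original scale.
p_below_direct = NormalDist(mu, sigma).cdf(x)

print(z, round(p_below, 4))
```

Both routes give the same probability, which is the point of standardizing: one table serves every normal distribution.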

Binomial Random Variables
Think Coin Flips (counting heads)

4 Major characteristics
dichotomous, n fixed, $\pi$ fixed, independent trials

Shorthand: $X \sim \text{Bin}(n, \pi)$

Finding probabilities (Formula)

$E(X) = n\pi$, $\text{Var}(X) = n\pi(1 - \pi)$

$\hat{\pi} = X/n$
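
A short sketch of the binomial probability formula, P(X = k) = C(n, k) * prob^k * (1 - prob)^(n - k), where `prob` is the success probability on each trial. The coin-flip numbers (10 flips, probability 0.5) are an illustrative assumption in the spirit of the "think coin flips" advice.

```python
import math

def binomial_pmf(k, n, prob):
    """P(X = k) when X counts successes in n independent trials."""
    return math.comb(n, k) * prob ** k * (1 - prob) ** (n - k)

# Hypothetical example: 10 fair coin flips, counting heads.
n, prob = 10, 0.5
p_five_heads = binomial_pmf(5, n, prob)

# The probabilities over all possible counts 0..n should sum to one.
total = sum(binomial_pmf(k, n, prob) for k in range(n + 1))

mean = n * prob                    # E(X)
variance = n * prob * (1 - prob)   # Var(X)
print(p_five_heads, mean, variance)
```

Exactly five heads in ten fair flips has probability 252/1024, about 0.246, so the single most likely outcome still happens less than a quarter of the time.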
Binomial Random Variables
(Normal approximation to the binomial)

Let $X \sim \text{Bin}(n, \pi)$. Then approximately:

$X \sim N\left(n\pi, \sqrt{n\pi(1 - \pi)}\right)$
$\hat{\pi} \sim N\left(\pi, \sqrt{\frac{\pi(1 - \pi)}{n}}\right)$

This holds only if:
$n\pi \geq 10$
$n(1 - \pi) \geq 10$
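
To get a feel for how good the approximation is, the sketch below checks the two conditions and compares an exact binomial probability with its normal approximation. The values n = 100, success probability 0.5, and cutoff 55 are hypothetical, and no continuity correction is applied.

```python
import math
from statistics import NormalDist

def normal_approx_ok(n, prob):
    """Check the rule of thumb: n*prob >= 10 and n*(1 - prob) >= 10."""
    return n * prob >= 10 and n * (1 - prob) >= 10

def binomial_cdf(k, n, prob):
    """Exact P(X <= k) for a binomial count."""
    return sum(math.comb(n, i) * prob ** i * (1 - prob) ** (n - i)
               for i in range(k + 1))

def normal_approx_cdf(k, n, prob):
    """Approximate P(X <= k) using N(n*prob, sqrt(n*prob*(1 - prob)))."""
    sd = math.sqrt(n * prob * (1 - prob))
    return NormalDist(n * prob, sd).cdf(k)

# Hypothetical example values.
n, prob, k = 100, 0.5, 55
exact = binomial_cdf(k, n, prob)
approx = normal_approx_cdf(k, n, prob)
print(normal_approx_ok(n, prob), round(exact, 4), round(approx, 4))
```

When the conditions hold, the two answers should agree to within a few hundredths; when they fail (for example n = 20 with success probability 0.1), the approximation should not be trusted.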
Law of Large Numbers and
the Central Limit Theorem
Law of Large Numbers
$\bar{X}$ will have mean equal to the individual observations'
mean ($\mu$), and its variance will be smaller than that of a single observation
$E(\bar{X}) = \mu$
$\text{Var}(\bar{X}) = \sigma^2/n$
Central Limit Theorem
States that sample means ($\bar{X}$) and sums of RVs will
be approximately normally distributed, no matter what the original
distribution (assuming n is large)
Remember: $\bar{X} \sim N\left(\mu_X, \frac{\sigma_X}{\sqrt{n}}\right)$
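
A small simulation can make these results concrete. The sketch below draws many samples from a strongly right-skewed distribution (exponential with mean 1 and standard deviation 1) and looks at how the sample means behave. The sample size, number of repetitions, and seed are arbitrary choices made for illustration.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is reproducible

n = 30        # observations per sample (arbitrary choice)
reps = 5000   # number of simulated samples (arbitrary choice)

# Exponential(1) has mean 1 and standard deviation 1, and is far from normal.
sample_means = []
for _ in range(reps):
    sample = [random.expovariate(1.0) for _ in range(n)]
    sample_means.append(statistics.fmean(sample))

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
theoretical_sd = 1 / math.sqrt(n)  # sigma / sqrt(n)

print(round(mean_of_means, 3), round(sd_of_means, 3), round(theoretical_sd, 3))
```

The simulated means should center near 1 with a spread near 1/sqrt(30), and a histogram of `sample_means` would look roughly bell-shaped even though the individual observations are skewed.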
Inference
(One-Sample for Means, $\sigma$ unknown)
One-sample t-test for $\mu$:
$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$, with $df = n - 1$
CI for one sample:
$\bar{x} \pm t^* \frac{s}{\sqrt{n}}$
We assume $X_i \sim N(\mu, \sigma)$ & independent
$t \sim t(df = n - 1)$ [when null hypothesis is true]
Assume normal if n is small, OK without extreme outliers
when n is large (n > 15)
Example:
A sample of 5 Stat 101 students was found to have slept an average of 5.8 hours
the night before their final, with a standard deviation of 2.0. Is this
significantly lower than the recommended minimum of 7.5 hours?
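
One way to set up the sleep example is sketched below. The significance level of 0.05 and the one-sided (lower-tail) alternative are assumptions, since the slide does not state them, and the critical value 2.132 is the standard t-table entry for df = 4 with 0.05 in one tail.

```python
import math

def one_sample_t(x_bar, s, n, mu_0):
    """Return (t statistic, degrees of freedom) for a one-sample t-test."""
    standard_error = s / math.sqrt(n)
    t_stat = (x_bar - mu_0) / standard_error
    return t_stat, n - 1

# Values from the example: n = 5, mean 5.8 hours, sd 2.0, null value 7.5.
t_stat, df = one_sample_t(x_bar=5.8, s=2.0, n=5, mu_0=7.5)

# Assumed: one-sided test at alpha = 0.05. Table value for df = 4.
T_CRITICAL = 2.132
reject_null = t_stat < -T_CRITICAL

print(round(t_stat, 3), df, reject_null)
```

Under these assumptions the t statistic is about -1.90, which does not fall below -2.132, so the data would not be judged significantly lower than 7.5 hours at the 0.05 level.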
Power and Sample Size
Calculating Power (2 steps)
1) Determine the rejection region for $\bar{x}$ under $H_0$
2) Calculate the probability of $\bar{x}$ falling in that rejection
region when $H_a$ is true
Power increases with:
larger sample size, n
smaller $\sigma$
further distance between $\mu_A$ and $\mu_0$
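
The two-step power calculation can be sketched as follows. To keep it short, this assumes a lower-tailed z-test with known standard deviation rather than a t-test, and every number in the example (null mean 7.5, alternative mean 6.5, standard deviation 2, alpha 0.05) is hypothetical.

```python
import math
from statistics import NormalDist

def power_lower_tail(mu_0, mu_a, sigma, n, alpha=0.05):
    """Power of a lower-tailed z-test (sigma assumed known)."""
    standard_error = sigma / math.sqrt(n)
    # Step 1: rejection region under H0 is x-bar below this cutoff.
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    cutoff = mu_0 - z_alpha * standard_error
    # Step 2: probability that x-bar lands below the cutoff when Ha is true.
    return NormalDist(mu_a, standard_error).cdf(cutoff)

# Hypothetical values, invented for illustration.
power_n25 = power_lower_tail(mu_0=7.5, mu_a=6.5, sigma=2.0, n=25)
power_n50 = power_lower_tail(mu_0=7.5, mu_a=6.5, sigma=2.0, n=50)
power_small_sigma = power_lower_tail(mu_0=7.5, mu_a=6.5, sigma=1.0, n=25)

print(round(power_n25, 3), round(power_n50, 3), round(power_small_sigma, 3))
```

Rerunning with a larger n, a smaller standard deviation, or an alternative mean further from the null value shows the power rising in each case.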
Calculating Sample Size from a Desired Margin of Error (m)
$n = \left(\frac{z^* \sigma}{m}\right)^2$
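
A sketch of the sample-size calculation, rounding up because n must be a whole number. The 95% confidence level, standard deviation of 2, and margin of error of 0.5 are hypothetical inputs.

```python
import math
from statistics import NormalDist

def sample_size_for_margin(sigma, margin, confidence=0.95):
    """Smallest n with z* * sigma / sqrt(n) <= margin (sigma assumed known)."""
    # z* leaves (1 - confidence)/2 in each tail of the standard normal.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z_star * sigma / margin) ** 2)

# Hypothetical inputs, invented for illustration.
n_needed = sample_size_for_margin(sigma=2.0, margin=0.5)
print(n_needed)
```

Always round up: rounding down would give a margin of error slightly larger than the one requested.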
Practice Problem #1
The following are a collection of unrelated quick problems. Briefly justify
your answer for each problem.
a) Suppose that A and B are two disjoint events within the same sample
space. In addition, let P(A) = 1/8 and P(B) = 1/4. Are events A and B
independent? Explain.

b) Suppose a particular outcome from a random event has a probability of
0.02. Which of the following statements represents a correct interpretation of
this probability? Circle the right answer and provide justification.
i) The outcome will never happen.
ii) The outcome will happen two times out of every 100 trials, for
certain.
iii) The outcome will happen two times out of every 100 trials, on the
average.
iv) The outcome could happen, or it might not; the chances of either
result are the same.
Practice Problem #1 (cont.)
c) If females of a certain species of lizard always mate with males that are
0.75 years younger than they are, what would the correlation
coefficient between the ages of these male and female lizards be?
Circle the right answer and provide justification.
i) 0.75 iv) 1
ii) -0.75 v) -1
iii) 0 vi) Not enough information to tell

d) Consider the annual salaries of mutual fund managers in the Boston
area. The mean salary is $450,000 and the median salary is $380,000.
Circle the correct answer below. The probability that the salary of a
randomly selected mutual fund manager from the Boston area is larger
than the mean of $450,000 is:
i) > 0.5 iii) = 0.5
ii) < 0.5 iv) Cannot be determined

Practice Problem #2
Cancer is the #2 cause of death in the United States, yet is not
nearly as deadly in other parts of the world. An investigator
looks at the cancer mortality rate (per 1,000 person-years) vs.
Population Growth Rate per year (in percent) for 171
countries. She starts by looking at the scatterplot and some
summary statistics from her data:

Practice Problem #2 (cont.)

a) What is the equation for the least squares regression line?

b) The US has a growth rate of 0.90%. What is the predicted
cancer mortality rate for the US (the true cancer mortality rate is 123.8)?
Practice Problem #2 (cont.)
c) What percentage of total variability in cancer
mortality rate can be predicted using growth rate?

d) The investigator believes cancer mortality rates
could be lowered if countries encouraged more
baby-making and more immigration. Briefly
explain why this statement may not be correct.

Practice Problem #3
Not everybody likes Britney Spears. In fact, an internet poll run by
the Rolling Stones magazine showed that 66% of college-aged men
said they liked Britney, while 30% of college-aged women said they liked her.

a) Imagine Harvard, made up of 52% women, is hosting a Britney
Spears concert. Given that only fans of Britney attend the concert,
what is the probability that the person sitting next to you at the
concert is a woman?

b) A line at the snack bar for the concert has 10 people (all Brit-fans).
What is the probability that exactly 5 of these students are women?

Practice Problem #3 (cont.)
c) There is a line of 100 students to get into the concert (all of
whom are Brit-fans). What is the probability that the
majority of them are women?

d) The internet poll run by the Rolling Stones reported that 66%
of all college-aged men like Britney Spears. Why could this
be a mistake?

Practice Problem #4
A friend of yours is curious to see how confident Harvard students are in
their looks. He asks a random sample of n = 130 Harvard students, "What
percent of Harvard students do you believe are better looking than you?"
This sample had a mean of 30.8% and a standard deviation of 24.2%.
Note: if people had realistic judgments about themselves, the mean
in the population should be 50%.

a) Calculate the 95% confidence interval to estimate the true mean
percent of students that Harvard students think they are better looking
than.

b) Based on your confidence interval in part (a), would you expect
$H_0: \mu = 50$ to be rejected in a two-sided hypothesis test?

c) Perform the formal hypothesis test as stated in part (b).
