AP Stats Study Guide
TYPES OF VARIABLES
CATEGORICAL → animals, people, things
→ pie chart
→ bar graph
→ segmented bar chart
→ frequency table
→ relative frequency table
→ 2 way table
QUANTITATIVE (numerical data) → height, weight, GPA
→ histogram
→ boxplot
→ stemplot
→ dotplot
DESCRIBING GRAPHS
Right-skewed: right tail, positive skew, mean > median
Left-skewed: left tail, negative skew, mean < median
Association: This exists when knowing the value of one variable helps us predict the value of
the other. It is any relationship between 2 variables.
Correlation: A linear relationship between 2 variables. DOES NOT imply causation.
Correlation r: Measures the direction and strength of the linear association between two
quantitative variables
- r is always between -1 and 1
- r > 0 → positive association
- r < 0 → negative association
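A minimal sketch of how r can be computed by hand, assuming the usual textbook formula (the average product of the z-scores of x and y, dividing by n - 1). The function name and the data are hypothetical, made up only for illustration.

```python
from math import sqrt

def correlation(xs, ys):
    """Correlation r: sum of (z-score of x) * (z-score of y), divided by n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample standard deviations (divide by n - 1).
    sx = sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sy = sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    z_products = [((x - mean_x) / sx) * ((y - mean_y) / sy)
                  for x, y in zip(xs, ys)]
    return sum(z_products) / (n - 1)

# Hypothetical data (not from the guide).
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]
r = correlation(hours, scores)
print(round(r, 3))  # positive value, so a positive association
```

Here r comes out positive and below 1, which matches a moderately strong, positive, roughly linear pattern.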
S: Strength - weak, moderate, or strong association
O: Outliers/Unusual features
F: Form - linear or nonlinear form
A: Association/Direction - positive, negative, or no association
LEAST-SQUARES REGRESSION
ŷ = b0 + b1x
ŷ - predicted value of y for a given value of x
b0 - y intercept
b1 - slope
*This line makes the sum of the squared residuals as small as possible and contains the point
(x̄, ȳ)
Extrapolation: Use of a regression line for prediction far outside the interval of x values used to
obtain the line. Such predictions are often not accurate.
Residual: Difference between the actual value of y and the value of y predicted by the
regression line. (residual = y - ŷ)
Standard deviation of the residuals s: Measures the size of a typical residual, that is, the typical
distance between the actual y values and the predicted y values.
Coefficient of determination r²: Measures the percent of variability in the response variable that
is accounted for by the least-squares regression line (LSRL).
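The quantities in this section can be sketched in a few lines of Python. This assumes the standard formulas b1 = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)², b0 = ȳ - b1x̄, and s = √(Σ residual² / (n - 2)); the function name and data are hypothetical.

```python
from math import sqrt

def least_squares(xs, ys):
    """Return (b0, b1) for the line y-hat = b0 + b1*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x  # forces the line through (x-bar, y-bar)
    return b0, b1

# Hypothetical data (not from the guide).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1 = least_squares(xs, ys)

predicted = [b0 + b1 * x for x in xs]
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]  # residual = y - y-hat

sse = sum(e ** 2 for e in residuals)           # sum of squared residuals
mean_y = sum(ys) / len(ys)
sst = sum((y - mean_y) ** 2 for y in ys)       # total variability in y
s = sqrt(sse / (len(xs) - 2))                  # typical size of a residual
r_squared = 1 - sse / sst                      # share of variability explained

print(round(b0, 2), round(b1, 2), round(s, 3), round(r_squared, 2))
```

The residuals from a least-squares line always sum to (essentially) zero, which is a quick check on the arithmetic.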
SAMPLING
Sample survey: A study that collects data from a sample that is chosen to represent a specific
population.
Sample: A subset of individuals in the population from which we collect data.
Population: The entire group of individuals we want info about in a statistical study.
Census: Collects data from every individual in the population.
Bias: The design of a statistical study shows bias if it is very likely to underestimate or very
likely to overestimate the value you want to know.
- Undercoverage: When some members of the population are less likely to be chosen or
cannot be chosen in a sample.
- Nonresponse: When an individual chosen for the sample can’t be contacted or refuses
to participate.
- Response bias: When there’s a systematic pattern of inaccurate answers to a survey
question.
When asked how the design of a sample survey leads to bias, do the following:
→ Describe how members of a sample might respond differently from the rest of the population
→ Explain how this difference would lead to an underestimate or overestimate
Ex. This is a convenience sample. It would probably include a much higher proportion of
students with a graphing calculator than in the population at large because a graphing
calculator is required for the statistics class. So this method would probably lead to an
overestimate of the actual population proportion.
Matched pairs design: A design comparing 2 treatments that uses blocks of size 2. In some
designs, 2 very similar experimental units are paired and the 2 treatments are randomly
assigned within each pair. In others, each experimental unit receives both treatments in a
random order.
INFERENCE
Statistically significant: When the observed results of a study are too unusual to be explained
by chance alone, these results are statistically significant.
*Larger random samples tend to produce estimates closer to the true population value
*If individuals are randomly selected → You can make inferences about the population
*If treatments are randomly assigned → You can infer cause and effect
PROBABILITY
Law of large numbers: Says that if we observe more and more repetitions of any chance
process, the proportion of times that a specific outcome occurs approaches its probability.
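A small simulation can make the law of large numbers concrete. This is only a sketch: the chance process (a fair coin, probability 0.5) and the seed are assumptions chosen for illustration.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable

# Simulate 100,000 flips of a fair coin; True counts as "heads".
flips = [random.random() < 0.5 for _ in range(100_000)]

# Proportion of heads after more and more repetitions.
for n in (10, 100, 1_000, 10_000, 100_000):
    proportion = sum(flips[:n]) / n
    print(n, round(proportion, 4))

final_proportion = sum(flips) / len(flips)
```

The early proportions can wander quite far from 0.5, but the later ones settle close to it.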
Sample space: The list of all possible outcomes.
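If all outcomes in the sample space are equally likely, the probability that event A occurs:
P(A) = (number of outcomes in event A) / (total number of outcomes in the sample space)
Complement rule: P(Aᶜ) = 1 - P(A), the probability that event A does NOT occur
Conditional probability: P(A | B) = P(A ∩ B) / P(B), the probability that A occurs given that B
has occurred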
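These rules can be checked by listing a sample space. The sketch below assumes two fair six-sided dice (36 equally likely outcomes) and uses the standard definition P(A | B) = P(A and B) / P(B); the events and helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes for two fair dice.
sample_space = list(product(range(1, 7), repeat=2))

def prob(event):
    """P(event) = outcomes in the event / total outcomes (equally likely case)."""
    favorable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favorable), len(sample_space))

def sum_is_8(outcome):      # event A
    return outcome[0] + outcome[1] == 8

def first_is_6(outcome):    # event B
    return outcome[0] == 6

p_a = prob(sum_is_8)
p_not_a = 1 - p_a                                   # complement rule
p_a_and_b = prob(lambda o: sum_is_8(o) and first_is_6(o))
p_a_given_b = p_a_and_b / prob(first_is_6)          # conditional probability

print(p_a, p_not_a, p_a_given_b)
```

Knowing the first die is a 6 changes the probability of a sum of 8 from 5/36 to 1/6, so the two events are not independent.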
TREE DIAGRAM
RANDOM VARIABLES
Discrete random variable X: Takes a fixed set of possible values with gaps between them.
E(X) = μX = Σ xi·pi, the average value over many, many repetitions of the same chance process
Ex. If many, many newborns are randomly selected, their average Apgar
score will be about 8.128.
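The expected value of a discrete random variable is the sum of each value times its probability, E(X) = Σ xi·pi. A minimal sketch, using a hypothetical distribution (this is not the Apgar data):

```python
def expected_value(values, probs):
    """E(X) = sum of value * probability over all possible values."""
    if abs(sum(probs) - 1) > 1e-9:
        raise ValueError("probabilities must add to 1")
    return sum(x * p for x, p in zip(values, probs))

# Hypothetical probability distribution, made up for illustration.
values = [0, 1, 2, 3]
probs = [0.1, 0.2, 0.3, 0.4]

mu = expected_value(values, probs)
print(round(mu, 3))
```

Interpretation in the guide's style: over many, many repetitions, the average value of X would be about 2.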
Binomial setting: When we perform n independent trials of the same chance process and
count the number of times a success occurs.
B: Binary - 2 outcomes, “success” or “failure”
I: Independent - trials must be independent (when sampling without replacement, check the
10% Condition, n < 0.10N)
N: Number of trials must be fixed
S: Same probability of success on each trial
E(X) = np
Ex. If all the students in Mr. Hogarth’s class were just guessing and
repeated the activity many times, the average number of students who
would guess correctly would be about 7.
σX = √(np(1 − p))
Ex. If all the students in Mr. Hogarth’s class were just guessing and
repeated the activity many times, the number of students who would
guess correctly would typically vary by about 2.16 from the mean of 7.
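A quick check of the binomial mean and standard deviation, using E(X) = np and σX = √(np(1 − p)). The guide does not state n or p for Mr. Hogarth's class; n = 21 and p = 1/3 are assumed here only because they reproduce the quoted 7 and 2.16. The pmf formula is the standard binomial one.

```python
from math import comb, sqrt

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial count of successes in n trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Assumed values (not given in the guide).
n, p = 21, 1 / 3

mean = n * p                   # E(X) = np
sd = sqrt(n * p * (1 - p))     # sigma = sqrt(np(1 - p))

print(round(mean, 2), round(sd, 2))
print(round(binomial_pmf(7, n, p), 4))  # P(exactly 7 correct guesses)
```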
Geometric setting: When we perform independent trials of the same chance process and
record the number of trials it takes to get one success. On each trial, the
probability p of success must be the same.
BIS apply, but the number of trials does not have to be fixed.
E(X) = 1/p, the expected number of trials required to get the 1st success
σX = √(1 − p) / p
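A minimal sketch of the geometric formulas, E(X) = 1/p and σX = √(1 − p)/p, together with the standard pmf P(X = k) = (1 − p)^(k − 1)·p. The value p = 0.2 is hypothetical.

```python
from math import sqrt

def geometric_pmf(k, p):
    """P(first success happens on trial k): k - 1 failures, then a success."""
    return (1 - p) ** (k - 1) * p

p = 0.2  # hypothetical probability of success on each trial

mean = 1 / p            # expected number of trials to get the 1st success
sd = sqrt(1 - p) / p    # standard deviation of the number of trials

print(round(mean, 2), round(sd, 3))
print(round(geometric_pmf(3, p), 3))  # P(first success on the 3rd trial)
```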
10% Condition: When taking a random sample of size n from a population of size N, we
can use a binomial distribution to model the count of successes in the
sample as long as n < 0.10N.
Large Counts Condition: Suppose that a count X of successes has a binomial distribution. The
Large Counts Condition says that the probability distribution of X is
approximately Normal if np ≥ 10 and nq ≥ 10, where q = 1 − p.
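Both conditions are simple inequalities, so they are easy to check in code. The helper names and the numbers below are hypothetical.

```python
def ten_percent_condition(n, population_size):
    """True when the sample is less than 10% of the population (n < 0.10N)."""
    return n < 0.10 * population_size

def large_counts_condition(n, p):
    """True when np >= 10 and n(1 - p) >= 10."""
    return n * p >= 10 and n * (1 - p) >= 10

# Hypothetical examples.
print(ten_percent_condition(50, 1000))    # True: 50 < 100
print(large_counts_condition(50, 0.1))    # False: np = 5
print(large_counts_condition(200, 0.1))   # True: np = 20, n(1 - p) = 180
```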
SAMPLING DISTRIBUTION
Statistic: A number that describes some characteristic of a sample.
Parameter: A number that describes some characteristic of the population.
Sampling variability: Different random samples of the same size from the same population
produce different values for a statistic.
Sampling distribution: The distribution of values taken by a statistic in all
possible samples of the same size from the same population.
Unbiased estimator: A statistic used to estimate a parameter is an unbiased estimator if the
mean of its sampling distribution is equal to the value of the parameter being estimated.
An estimator should have no bias and low variability.
*Accurate ≠ Precise: increasing n makes an estimate more precise (less variable), not more
accurate (less biased)
Sampling distribution of p̂:
μp̂ = p
σp̂ = √(p(1 − p)/n), as long as the 10% Condition is satisfied
p̂ is approximately Normal if the Large Counts Condition is satisfied
Sampling distribution of x̄:
μx̄ = μ
σx̄ = σ/√n, as long as the 10% Condition is satisfied
The sampling distribution of x̄ is Normal if the population distribution is also Normal.
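The two standard deviation formulas, σp̂ = √(p(1 − p)/n) and σx̄ = σ/√n, can be sketched as small helpers. The function names and the inputs are hypothetical.

```python
from math import sqrt

def sd_of_p_hat(p, n):
    """Standard deviation of the sampling distribution of p-hat."""
    return sqrt(p * (1 - p) / n)

def sd_of_x_bar(sigma, n):
    """Standard deviation of the sampling distribution of x-bar."""
    return sigma / sqrt(n)

# Hypothetical values.
print(round(sd_of_p_hat(0.5, 100), 4))
print(round(sd_of_x_bar(10, 25), 4))
print(round(sd_of_x_bar(10, 100), 4))   # 4 times the sample size, half the SD
```

Because n sits under a square root, quadrupling the sample size only cuts the variability in half.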
CONFIDENCE INTERVALS
Point estimator: A statistic that provides an estimate of a population parameter.
Point estimate: Value of that statistic from a sample.
Confidence interval: An interval of plausible values for a parameter based on sample data.
confidence interval = point estimate ± margin of error
Confidence level C: The overall success rate of the method used to calculate the interval.
→ Make sure to describe the PARAMETER and not the statistic in interpretations
→ DOES NOT tell the probability that a particular interval captures the population parameter
Ex. We are 95% confident that the interval from 0.613 to 0.687 captures the true proportion of
all U.S. adults who would admit to experiencing some financial difficulty paying an unexpected
bill of $1000 right away.
Margin of error: Describes how far, at most, we expect the estimate to vary from
the true population value. In a C% confidence interval, the distance between the point estimate
and the true parameter value will be less than the margin of error in C% of all samples.
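A sketch of "point estimate ± margin of error" for one proportion. It assumes the standard one-sample z interval, with margin of error z*·√(p̂(1 − p̂)/n); the guide itself does not give this formula, and the counts below are hypothetical (they are not the data behind the 0.613 to 0.687 example).

```python
from math import sqrt
from statistics import NormalDist

def one_prop_z_interval(x, n, conf_level=0.95):
    """Return (lower, upper) for a one-sample z interval for a proportion."""
    p_hat = x / n                                   # point estimate
    # Critical value z* leaving (1 - C)/2 in each tail of the standard Normal.
    z_star = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    margin_of_error = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin_of_error, p_hat + margin_of_error

# Hypothetical sample: 130 successes out of 200.
lower, upper = one_prop_z_interval(x=130, n=200, conf_level=0.95)
print(round(lower, 3), round(upper, 3))
```

The interval is centered at the point estimate, and a higher confidence level gives a wider interval.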
SIGNIFICANCE TESTS
A test is conducted with the assumption that H0 is true
Null hypothesis H0: Claim that we weigh evidence against in a significance test.
Alternative hypothesis Ha: Claim that we are trying to find evidence for in a significance test.
One-sided: Ha is one-sided if it states that the parameter is greater than the null value or that
the parameter is less than the null value.
Two-sided: Ha is two-sided if it states that the parameter is different from the null value (it could
be either greater than or less than).
P-value: Probability of obtaining evidence for Ha as strong or stronger than the observed
evidence when H0 is true.
Power: Probability that the test will find convincing evidence for Ha when a specific alternative
value of the parameter is true.
- power = P(reject H0 | parameter = some specific alternative value)
- power = 1 - P(Type II error)
- P(Type II error) = 1-power
Power can be increased by:
→ Increasing sample size
→ Decreasing SE
→ Making null + alternative parameter values farther apart
→ Increasing α
MAKING CONCLUSIONS
→ If p < α, reject H0 + conclude that there is convincing evidence for Ha (in context).
→ If p > α, fail to reject H0 + conclude there isn’t convincing evidence for Ha (in context).
PHANTOMS
P: Parameter of interest + BINS (omit this for means)
→ Mention the words “true” or “population”
H: Hypotheses, H0, Ha, and α (default is 0.05)
A: Assumptions
→ S: Random selection
→ I: Independence, n < 0.10N
→ N: For proportions, approximately normal if np0 ≥ 10 and n(1-p0) ≥10
For means, approximately normal if n ≥ 30
N: Name of test
→ proportion: 1-proportion z-test for population proportion
→ mean: 1-sample t-test for population mean
T: Test statistic (z, t)
O: Obtain p-value, 1propztest/1samplettest(…, …, …, …), df = …
M: Make a decision, p = … → reject/fail to reject H0
S: State conclusion
→ There is evidence to support Ha …
→ There isn’t convincing evidence to support Ha …
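The T, O, and M steps for a proportion can be sketched as follows. This assumes the standard test statistic z = (p̂ − p0) / √(p0(1 − p0)/n) and a Normal model for the P-value; the function name, its `alternative` argument, and the data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def one_prop_z_test(x, n, p0, alternative="two-sided"):
    """Return (z, p_value) for a 1-proportion z-test of H0: p = p0."""
    p_hat = x / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # computed assuming H0 is true
    std_normal = NormalDist()
    if alternative == "greater":                  # Ha: p > p0
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "less":                   # Ha: p < p0
        p_value = std_normal.cdf(z)
    else:                                         # Ha: p != p0
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    return z, p_value

# Hypothetical data: 60 successes in 100 trials, H0: p = 0.5, Ha: p > 0.5.
z, p_value = one_prop_z_test(x=60, n=100, p0=0.5, alternative="greater")
alpha = 0.05
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(round(z, 2), round(p_value, 4), decision)
```

The conclusion still has to be written in context: the code only covers the arithmetic, not the P, H, A, or S steps.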
PAIRED DATA
One subject gets both treatments (e.g., filling out a survey with 2 questions and comparing the
answers to both questions). Use a one-sample t test for means on the differences.
Compute the difference within each pair (group 1 - group 2).
Name: Paired t test
Let μd be the true mean difference (group 1 - group 2).
H0: μd (mean difference) = 0
Ha: μd (mean difference) ≠/</> 0
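A sketch of the paired calculation: take the difference within each pair, then run a one-sample t procedure on those differences. It assumes the standard statistic t = (x̄d − 0) / (sd/√n) with df = n − 1. The data are hypothetical, and the P-value is left to a calculator or t table because the Python standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical paired data: two measurements on the same 5 subjects.
group_1 = [12, 15, 11, 14, 13]
group_2 = [10, 14, 8, 12, 11]

# One difference per subject (group 1 - group 2).
differences = [a - b for a, b in zip(group_1, group_2)]

n = len(differences)
mean_diff = mean(differences)       # sample mean of the differences
sd_diff = stdev(differences)        # sample SD of the differences
t = (mean_diff - 0) / (sd_diff / sqrt(n))   # H0: mu_d = 0
df = n - 1

print(differences, round(t, 3), df)
```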
INTERPRETATIONS
Slope: For every 1 unit increase in [explanatory variable], our model predicts an
average increase/decrease of [slope] in [response variable].
Y-intercept: When [explanatory variable] is zero units, our model predicts that
[response variable] would be [y-intercept].
Extrapolation: The prediction is not reliable due to extrapolation. The prediction was
made outside the interval of current data x-values. Trends seen in the
scatterplot may not continue at this new x-value.
s: The actual [response variable] is typically about [s] away from the
value predicted by the least-squares regression line with x =
[explanatory variable].
Ex. The actual price of a Ford F-150 is typically about $5740 away from
the price predicted by the least-squares regression line with x = miles
driven.
Residual: The actual [response variable] is [residual] more/less than the value
predicted by the regression line with x = [explanatory variable].
Ex. Andres grabbed 3.54 more Starburst candies than the number
predicted by the regression line with x = hand span.
Confidence interval: We are C% confident that the interval from ____ to ____ captures the
[parameter in context].
CALCULATOR FUNCTIONS
1 AND 2 VARIABLE DATA
1-Var Stats (STAT, CALC)
→ To find the mean, standard deviation, and 5 number summary for a data set.
→ Enter data in L1 and frequency in L2 if needed
→ 1-Var Stats L1 or 1-Var Stats L1,L2
LinReg (a + bx) (STAT, CALC)
→ To find the equation for a least-squares regression line. To find r and r² (DiagnosticOn).
→ Enter values in L1 (explanatory)
→ Enter values in L2 (response)
→ LinReg (a + bx) L1,L2
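For checking calculator output by hand, the 1-Var Stats summary can be mirrored in Python. This is a sketch: the quartiles use the "median of each half" convention (leaving out the middle value when n is odd), which is a common textbook rule but may not match every calculator or software package. The data are hypothetical.

```python
from statistics import mean, median, stdev

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    values = sorted(data)
    half = len(values) // 2
    lower_half = values[:half]      # values below the median position
    upper_half = values[-half:]     # values above the median position
    return (values[0], median(lower_half), median(values),
            median(upper_half), values[-1])

# Hypothetical data set.
data = [2, 4, 4, 5, 7, 8, 10]

print(round(mean(data), 3))        # sample mean
print(round(stdev(data), 3))       # sample standard deviation (divides by n - 1)
print(five_number_summary(data))
```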
PROBABILITY CALCULATIONS
normalcdf (6, 5, 2)
→ To find an area for an interval in a normal distribution.
→ normalcdf(lower, upper, mean, SD)
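The same area can be found in Python with the standard library's `statistics.NormalDist`. The `normalcdf` function below is a hypothetical helper written to mirror the calculator's argument order; it is not a built-in.

```python
from statistics import NormalDist

def normalcdf(lower, upper, mean=0.0, sd=1.0):
    """Area under a Normal(mean, sd) curve between lower and upper."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(upper) - dist.cdf(lower)

# Area within 1 SD of the mean on the standard Normal curve.
area_within_1_sd = normalcdf(-1, 1, 0, 1)
print(round(area_within_1_sd, 4))

# Hypothetical example: proportion of values between 60 and 70
# when the mean is 64 and the SD is 2.5.
print(round(normalcdf(60, 70, 64, 2.5), 4))
```

For a one-sided area, use a very large or very small bound (for example, upper = 1e99), as on the calculator.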
CONFIDENCE INTERVALS
1-PropZInt (6, 6, 5)
→ To calculate a confidence interval to estimate a single proportion.
→ x: number of successes
→ n: sample size
→ C-Level: confidence level
SIGNIFICANCE TESTS
1-PropZTest (6, 7, 5)
→ To test a claim made about a single proportion.
→ p0: null value
→ x: number of successes
→ n: sample size
→ prop: ≠p0, <p0, >p0 (alternative)