
AP Stats Study Guide

This document provides an overview of key concepts in statistics: 1) types of variables (categorical and quantitative) and the graphs commonly used to describe each type; 2) measures of center, spread, and shape for distributions, including resistant measures like the median and non-resistant measures like the mean; 3) normal distributions, z-scores, and how adding or multiplying by a constant affects a distribution; and 4) sampling, experiments vs. observational studies, and probability concepts such as independent events, conditional probability, and tree diagrams.

Uploaded by happyhania23
Copyright © All Rights Reserved

AP STATISTICS STUDY GUIDE

TYPES OF VARIABLES
CATEGORICAL (e.g., animals, people, things)
→ pie chart → bar graph → segmented bar chart
→ frequency table → relative frequency table → 2-way table
QUANTITATIVE (numerical data, e.g., height, weight, GPA)
→ histogram → boxplot → stemplot → dotplot

DESCRIBING GRAPHS
Right-skewed: right tail, positive skew, mean > median
Left-skewed: left tail, negative skew, mean < median

S: Shape
O: Outliers
C: Center
C: CONTEXT
S: Spread
Ex. The distribution of goals scored is skewed to the right, with a single peak at 1 goal. There is a gap between 5 and 9 goals, and the games when the team scored 9 and 10 goals appear to be outliers. The distribution is centered at a median of 2 goals. The data varies from 1 to 10 goals scored.

RESISTANT MEASURES OF CENTER AND VARIABILITY


Resistant: A statistical measure that isn’t sensitive to extreme values
→ Median → IQR

NON-RESISTANT MEASURES OF CENTER AND VARIABILITY


→ Mean → Standard deviation
→ Range → Correlation
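To see resistance in action, the sketch below (Python, standard library only; the data values are made up for illustration) adds one extreme value to a small data set and compares the summaries. The mean, SD, and range jump, while the median barely moves.

```python
from statistics import mean, median, stdev

# Hypothetical data set (made-up values for illustration)
data = [2, 3, 3, 4, 5, 5, 6]
with_outlier = data + [40]  # add one extreme value

for label, values in [("original", data), ("with outlier", with_outlier)]:
    print(f"{label:>12}: mean={mean(values):.2f}  median={median(values):.2f}  "
          f"SD={stdev(values):.2f}  range={max(values) - min(values)}")
```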
NORMAL DISTRIBUTION
Percentile: An individual’s percentile is the percent of values in a distribution that are less than
the individual’s data value
Z-score: For an individual value in a distribution, the z-score tells us how many standard deviations from the mean the value falls, and in what direction: z = (value − mean)/SD. Values larger than the mean have positive z-scores; values smaller than the mean have negative z-scores.
Normal distribution: Symmetric, single-peaked, bell-shaped density curve called a Normal
curve. Specified by mean μ and standard deviation σ.

Empirical rule (68-95-99.7 rule): In a Normal distribution,

→ about 68% of the observations fall within 1 SD of the mean
→ about 95% fall within 2 SD of the mean
→ about 99.7% fall within 3 SD of the mean
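A minimal Python sketch of z-scores and the 68-95-99.7 rule, using the standard library's statistics.NormalDist. The value 85 with mean 70 and SD 10 is a hypothetical example; the computed areas show that the empirical rule percentages are rounded versions of exact Normal areas.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """How many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

std_normal = NormalDist(mu=0, sigma=1)

# Area within k SDs of the mean for a Normal distribution: P(-k < Z < k)
for k in (1, 2, 3):
    area = std_normal.cdf(k) - std_normal.cdf(-k)
    print(f"within {k} SD: {area:.4f}")

# Percentile of a hypothetical value x = 85 when mu = 70 and sigma = 10
z = z_score(85, 70, 10)
print(z, std_normal.cdf(z))  # z = 1.5, about the 93rd percentile
```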

THE EFFECT OF ADDING OR SUBTRACTING A CONSTANT


→ Adds (or subtracts) the constant to measures of center + location (mean, 5-number summary, percentiles)
→ No change to measures of variability (range, IQR, SD)
→ No change to the shape of the distribution

THE EFFECT OF MULTIPLYING OR DIVIDING BY A CONSTANT


→ Multiplies (or divides) measures of center + location by the constant (mean, 5-number summary, percentiles)
→ Multiplies (or divides) measures of variability by the absolute value of the constant (range, IQR, SD)
→ No change to the shape of the distribution
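The sketch below (Python, hypothetical quiz scores) applies both kinds of transformation to the same data so the rules can be checked: adding 5 shifts the center by 5 but leaves the SD and range alone, while doubling multiplies both center and spread by 2.

```python
from statistics import mean, median, stdev

# Hypothetical quiz scores (made-up values)
scores = [4, 7, 7, 8, 10]

def summary(values):
    """Return (mean, median, SD, range) for a list of numbers."""
    return mean(values), median(values), stdev(values), max(values) - min(values)

shifted = [x + 5 for x in scores]   # add a constant
scaled = [2 * x for x in scores]    # multiply by a constant

print(summary(scores))
print(summary(shifted))   # center +5, spread unchanged
print(summary(scaled))    # center x2, spread x2
```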

DENSITY CURVES
- The total area underneath the curve is exactly 1 (for a rectangular density curve, area = base × height)

ASSESSING NORMALITY WITH A NORMAL PROBABILITY PLOT


- Data is approximately normal if the points on the plot lie close to a straight line
- Nonlinear form indicates non-normal distribution

COMBINING NORMAL DISTRIBUTIONS


If X and Y are independent and normally distributed, X + Y is normally distributed with mean μX + μY and variance σ²X + σ²Y, so its standard deviation is √(σ²X + σ²Y) (variances add; standard deviations do not).
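A quick simulation check of this rule, assuming X ~ N(10, 3) and Y ~ N(5, 4) are independent (values chosen only for illustration). The simulated SD of X + Y lands near √(3² + 4²) = 5, not 3 + 4 = 7.

```python
import math
import random
from statistics import mean, stdev

random.seed(1)  # fixed seed so the run is reproducible

# Hypothetical independent Normal variables: X ~ N(10, 3), Y ~ N(5, 4)
mu_x, sd_x = 10, 3
mu_y, sd_y = 5, 4

sums = [random.gauss(mu_x, sd_x) + random.gauss(mu_y, sd_y) for _ in range(50_000)]

theory_mean = mu_x + mu_y                   # 15
theory_sd = math.sqrt(sd_x**2 + sd_y**2)    # sqrt(9 + 16) = 5, NOT 3 + 4 = 7

print(f"simulated mean {mean(sums):.3f} vs {theory_mean}")
print(f"simulated SD   {stdev(sums):.3f} vs {theory_sd}")
```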

SCATTERPLOTS AND CORRELATION


Explanatory variable: May help predict or explain changes in a response variable.
Response variable: Measures an outcome of a study
Ex. Hand span would be an explanatory variable because we anticipate that knowing a
student’s hand span will help us predict the number of candies that student can grab.

Association: This exists when knowing the value of one variable helps us predict the value of
the other. It is any relationship between 2 variables.
Correlation: A linear relationship between 2 variables. DOES NOT imply causation.
Correlation r: Measures the direction and strength of the association for a linear association
between two quantitative variables
- r is always between -1 and 1
- r > 0 → positive association
- r < 0 → negative association
Describing a scatterplot (SOFA):
S: Strength - weak, moderate, or strong association
O: Outliers/Unusual features
F: Form - linear or nonlinear form
A: Association/Direction - positive, negative, or no association
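As a sketch of how r is calculated: r = (1/(n − 1)) Σ zₓz_y, the (almost) average product of standardized x and y values. The Python below uses made-up hand-span and candy counts; it is an illustration, not data from the example above.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """r = (1 / (n - 1)) * sum of z_x * z_y, using sample means and sample SDs."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (n - 1)

# Hypothetical hand spans (cm) and candies grabbed (made-up values)
hand_span = [17, 19, 20, 21, 22, 23]
candies = [20, 24, 23, 27, 28, 30]
print(round(correlation(hand_span, candies), 3))
```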

LEAST-SQUARES REGRESSION
ŷ = b0 + b1x
ŷ - predicted value of y for a given value of x
b0 - y-intercept (b0 = ȳ − b1x̄)
b1 - slope (b1 = r·sy/sx)
*This line makes the sum of the squared residuals as small as possible and contains the point
(x̄, ȳ)
Extrapolation: Use of a regression line for prediction far outside the interval of x values used to
obtain the line. Such predictions are often not accurate.
Residual: Difference between the actual value of y and the value of y predicted by the
regression line. (residual = y - ŷ)
Standard deviation of the residuals s: Measures the size of a typical residual, that is, the typical distance between the actual y values and the predicted y values.
Coefficient of determination r²: Measures the percent of the variability in the response variable that is accounted for by the LSRL.
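A minimal Python sketch of fitting the LSRL with b1 = r·sy/sx and b0 = ȳ − b1x̄, then computing residuals, r², and s. It assumes the usual formula s = √(Σ residual² / (n − 2)); the data are hypothetical. The residuals sum to (essentially) zero and the line passes through (x̄, ȳ).

```python
import math
from statistics import mean, stdev

def lsrl(xs, ys):
    """Return (b0, b1, r) for the least-squares line y-hat = b0 + b1*x."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    b1 = r * s_y / s_x          # slope
    b0 = y_bar - b1 * x_bar     # intercept, so the line passes through (x-bar, y-bar)
    return b0, b1, r

# Hypothetical data (made-up values)
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1, r = lsrl(xs, ys)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]        # residual = y - y-hat
s = math.sqrt(sum(e**2 for e in residuals) / (len(xs) - 2))    # SD of residuals, df = n - 2
print(f"y-hat = {b0:.3f} + {b1:.3f}x, r^2 = {r**2:.4f}, s = {s:.3f}")
```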

READING COMPUTER OUTPUT

INTERPRETING RESIDUAL PLOT


A linear regression model is appropriate when the residual plot shows no leftover curved pattern: the residuals look randomly scattered around 0 with no clear pattern.

OUTLIERS AND INFLUENTIAL DATA IN REGRESSION


Influential observations: Individual points that substantially change the correlation or the
regression line. Outliers in x are often influential.
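Lurking variable: A variable that is not among the explanatory or response variables in a study but may influence the relationship between them; it is often unknown and not controlled for.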

SAMPLING
Sample survey: A study that collects data from a sample that is chosen to represent a specific
population.
Sample: A subset of individuals in the population from which we collect data.
Population: The entire group of individuals we want info about in a statistical study.
Census: Collects data from every individual in the population.

BAD METHODS OF SAMPLING


→ Convenience sampling → Voluntary response sampling

Bias: The design of a statistical study shows bias if it is very likely to underestimate or very
likely to overestimate the value you want to know.
- Undercoverage: When some members of the population are less likely to be chosen or
cannot be chosen in a sample.
- Nonresponse: When an individual chosen for the sample can’t be contacted or refuses
to participate.
- Response bias: When there’s a systematic pattern of inaccurate answers to a survey
question.
When asked how the design of a sample survey leads to bias, do the following:
→ Describe how members of the sample might respond differently from the rest of the population
→ Explain how this difference would lead to an underestimate or overestimate
Ex. This is a convenience sample. It would probably include a much higher proportion of
students with a graphing calculator than in the population at large because a graphing
calculator is required for the statistics class. So this method would probably lead to an
overestimate of the actual population proportion.

GOOD METHODS OF SAMPLING


→ Random sampling → SRS
→ Stratified random sample → Cluster sampling

EXPERIMENT VS OBSERVATIONAL STUDY


An observational study observes individuals and measures variables of interest but does not
attempt to influence the responses. Data is collected. Nothing is influenced by the researcher.
An experiment deliberately imposes some treatment on individuals to measure their responses.
Placebo: A treatment that has no active ingredient, but is otherwise like other treatments.
Treatment: A specific condition applied to the individuals in an experiment.
Experimental units: The objects to which treatments are randomly assigned.
Subjects: When experimental units are human beings.
Double-blind: Neither the subjects nor those who interact with them and measure the response
variable know which treatment a subject received.
Single-blind: Either subjects don’t know which treatment they are receiving or the people who
interact with them and measure the response variable don’t know which subjects are receiving
which treatment.

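EXPERIMENTAL DESIGNS

Completely randomized design: Experimental units are assigned to treatments completely at random.
Randomized block design: Units are first separated into blocks of similar units (e.g., by gender or nationality), and treatments are randomly assigned within each block. Blocking reduces variability.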

Matched pairs design: A design comparing 2 treatments that uses blocks of size 2. In some
designs, 2 very similar experimental units are paired and the 2 treatments are randomly
assigned within each pair. In others, each experimental unit receives both treatments in a
random order.

INFERENCE
Statistically significant: When the observed results of a study are too unusual to be explained
by chance alone, these results are statistically significant.
*Larger random samples tend to produce estimates closer to the true population value
*If individuals are randomly selected → You can make inferences about the population
*If treatments are randomly assigned → You can conclude cause and effect

PROBABILITY
Law of large numbers: Says that if we observe more and more repetitions of any chance
process, the proportion of times that a specific outcome occurs approaches its probability.
Sample space: The list of all possible outcomes.
If all outcomes in the sample space are equally likely, the probability that event A occurs is P(A) = (number of outcomes in A) / (total number of outcomes in the sample space).
Complement rule: P(Aᶜ) = 1 − P(A), the probability that event A does NOT occur

Mutually exclusive: P(A∩B) = 0, events A and B have no outcomes in common

General addition rule: P(A∪B) = P(A) + P(B) − P(A∩B)

General multiplication rule: P(A∩B) = P(A) × P(B|A)

Multiplication rule for independent events: P(A∩B) = P(A) × P(B)

Conditional probability: P(A|B) = P(A∩B) / P(B), the probability that A occurs given that B has occurred

Independent events: P(A|B) = P(A|Bᶜ) = P(A) or P(B|A) = P(B|Aᶜ) = P(B); knowing whether or not one event has occurred doesn’t change the probability of the other occurring

Intersection: P(A∩B), outcomes common to A and B

Union: P(A∪B), all outcomes in A, in B, or in both
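These rules can be checked on a small sample space. In this Python sketch (one roll of a fair die; the events are chosen for illustration), Fraction keeps the probabilities exact. Note that in Python `A | B` and `A & B` are set union and intersection, not conditional probability.

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die (equally likely outcomes)
S = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}      # event A: roll is even
B = {1, 2, 3, 4}   # event B: roll is 4 or less

def P(event):
    """Equally likely outcomes: P(E) = (outcomes in E) / (outcomes in S)."""
    return Fraction(len(event), len(S))

# In Python, A | B is set union and A & B is set intersection
p_union = P(A) + P(B) - P(A & B)     # general addition rule
p_a_given_b = P(A & B) / P(B)        # conditional probability P(A|B)

print(P(A | B), p_union)             # both 5/6
print(p_a_given_b, P(A))             # both 1/2, so A and B are independent
print(P(A & B) == P(A) * P(B))       # True
print(P(S - A) == 1 - P(A))          # complement rule: True
```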

TREE DIAGRAM

RANDOM VARIABLES
Discrete random variable X: Takes a fixed set of possible values with gaps between them.
E(X) = μX = Σ xᵢpᵢ: The average value over many, many repetitions of the same chance process.
Ex. If many, many newborns are randomly selected, their average Apgar score will be about 8.128.

σX = √(Σ (xᵢ − μX)² pᵢ): Measures how much the values of the variable typically vary from the mean.
Ex. A randomly selected newborn baby’s Apgar score will typically vary from the mean (8.128) by about 1.437 units.
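A short Python sketch applying the two formulas to a hypothetical probability distribution (made-up values; the Apgar score distribution itself is not reproduced here).

```python
import math

# Hypothetical probability distribution (made-up values; probabilities sum to 1)
values = [0, 1, 2, 3]
probs = [0.1, 0.3, 0.4, 0.2]
assert abs(sum(probs) - 1) < 1e-9  # a valid distribution must sum to 1

mu = sum(x * p for x, p in zip(values, probs))                              # E(X)
sigma = math.sqrt(sum((x - mu) ** 2 * p for x, p in zip(values, probs)))    # SD of X
print(f"E(X) = {mu:.3f}, SD = {sigma:.3f}")
```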

Binomial setting: When we perform n independent trials of the same chance process and count the number of times a success occurs.
B: Binary - 2 outcomes, “success” or “failure”
I: Independent trials (when sampling without replacement, check the 10% Condition)
N: Number of trials must be fixed
S: Same probability of success on each trial

E(X) = np
Ex. If all the students in Mr. Hogarth’s class were just guessing and repeated the activity many times, the average number of students who would guess correctly would be about 7.
σX = √(np(1 − p))
Ex. If all the students in Mr. Hogarth’s class were just guessing and repeated the activity many times, the number of students who would guess correctly would typically vary by about 2.16 from the mean of 7.
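A sketch of the binomial mean and SD formulas in Python. The example above does not state n and p; the values n = 21 and p = 1/3 used here are an assumption, chosen only because they give np = 7 and √(np(1 − p)) ≈ 2.16.

```python
import math

def binomial_mean_sd(n, p):
    """Mean np and standard deviation sqrt(np(1 - p)) of a binomial count."""
    return n * p, math.sqrt(n * p * (1 - p))

# Assumed, hypothetical values: n = 21 trials, p = 1/3 chance of success on each
mean_x, sd_x = binomial_mean_sd(21, 1 / 3)
print(round(mean_x, 2), round(sd_x, 2))
```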

Geometric setting: When we perform independent trials of the same chance process and record the number of trials it takes to get one success. On each trial, the probability p of success must be the same.
Conditions: B, I, and S as in the binomial setting; the number of trials does not have to be fixed.

E(X) = 1/p, the expected number of trials required to get the 1st success
σX = √(1 − p)/p
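To see where E(X) = 1/p and σX = √(1 − p)/p come from, this Python sketch sums the geometric probabilities P(X = k) = (1 − p)^(k − 1)·p over many trials for a hypothetical p = 0.2 and compares the results with the formulas. The infinite sum is truncated at k = 499, which leaves a negligible tail for this p.

```python
import math

def geometric_pmf(p, k):
    """P(first success occurs on trial k) = (1 - p)^(k - 1) * p, for k = 1, 2, 3, ..."""
    return (1 - p) ** (k - 1) * p

p = 0.2              # hypothetical success probability
ks = range(1, 500)   # truncate the infinite sum at k = 499

mean_series = sum(k * geometric_pmf(p, k) for k in ks)
var_series = sum((k - mean_series) ** 2 * geometric_pmf(p, k) for k in ks)

print(f"series mean {mean_series:.4f} vs 1/p = {1 / p}")
print(f"series SD   {math.sqrt(var_series):.4f} vs sqrt(1-p)/p = {math.sqrt(1 - p) / p:.4f}")
```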

10% Condition: When taking a random sample of size n from a population of size N, we can use a binomial distribution to model the count of successes in the sample as long as n < 0.10N.

Large Counts Condition: Suppose that a count X of successes has a binomial distribution. The Large Counts condition says that the probability distribution of X is approximately Normal if np ≥ 10 and n(1 − p) ≥ 10.

SAMPLING DISTRIBUTION
Statistic: A number that describes some characteristic of a sample.
Parameter: A number that describes some characteristic of the population.
Sampling variability: Different random samples of the same size from the same population
produce different values for a statistic.
Sampling distribution: of a statistic is the distribution of values taken by the statistic in all
possible samples of the same size from the same population.
Unbiased estimator: A statistic used to estimate a parameter is unbiased if the mean of its sampling distribution is equal to the value of the parameter being estimated.
An estimator should have no bias and low variability.
*Accurate ≠ precise: increasing n makes an estimate more precise (less variable), not more accurate (it does not remove bias).

Sampling distribution of p̂:
μp̂ = p
σp̂ = √(p(1 − p)/n), as long as the 10% Condition is satisfied
p̂ is approximately Normal if the Large Counts condition is satisfied

Shape: When n is small and p is close to 0, the sampling distribution of p̂ is right-skewed. When n is small and p is close to 1, the sampling distribution of p̂ is left-skewed. The sampling distribution of p̂ becomes more Normal when p is closer to 0.5 or n is larger (or both).
Variability: The value of σp̂ depends on both n and p. For a specific sample size, σp̂ is larger for values of p close to 0.5 and smaller for values of p close to 0 or 1. For a specific value of p, σp̂ gets smaller as n gets larger. Multiplying the sample size by 4 cuts σp̂ in half.
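A small Python sketch of the Variability and Large Counts rules for p̂ (the function names are hypothetical helpers): with p = 0.5 fixed, each time n is multiplied by 4 the SD halves, and a p far from 0.5 gives a smaller SD.

```python
import math

def sd_p_hat(p, n):
    """Standard deviation of the sampling distribution of p-hat: sqrt(p(1 - p)/n)."""
    return math.sqrt(p * (1 - p) / n)

def large_counts_ok(p, n):
    """Large Counts condition: np >= 10 and n(1 - p) >= 10."""
    return n * p >= 10 and n * (1 - p) >= 10

for n in (100, 400, 1600):
    print(n, sd_p_hat(0.5, n), large_counts_ok(0.5, n))   # SD halves each time n is multiplied by 4

print(sd_p_hat(0.1, 100) < sd_p_hat(0.5, 100))            # True: p near 0.5 gives the largest SD
print(large_counts_ok(0.05, 100))                         # False: np = 5 < 10
```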
Sampling distribution of x̄:
μx̄ = μ
σx̄ = σ/√n, as long as the 10% Condition is satisfied
If the population distribution is Normal, the sampling distribution of x̄ is also Normal.

Central Limit Theorem (CLT): If n ≥ 30, the sampling distribution of x̄ is approximately Normal, even when the population distribution is not Normal.
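A simulation sketch of the CLT in Python, assuming a strongly right-skewed exponential population with μ = 1 and σ = 1 (an illustrative choice). With n = 30, the mean and SD of the simulated sample means should land close to μ = 1 and σ/√n ≈ 0.183.

```python
import math
import random
from statistics import fmean, pstdev

random.seed(2)  # fixed seed for reproducibility

# Right-skewed population: exponential with mean 1 and SD 1 (assumption for the demo)
n, reps = 30, 5000
sample_means = [fmean(random.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

print(f"mean of x-bars {fmean(sample_means):.3f} (theory: mu = 1)")
print(f"SD of x-bars   {pstdev(sample_means):.3f} (theory: sigma/sqrt(n) = {1 / math.sqrt(n):.3f})")
```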

CONFIDENCE INTERVALS
Point estimator: A statistic that provides an estimate of a population parameter.
Point estimate: Value of that statistic from a sample.
Confidence interval: An interval of plausible values for a parameter based on sample data.
confidence interval = point estimate ± margin of error

Confidence level C: The overall success rate of the method used to calculate the interval.
→ Make sure to describe the PARAMETER and not the statistic in interpretations
→ DOES NOT tell the probability that a particular interval captures the population parameter
Ex. We are 95% confident that the interval from 0.613 to 0.687 captures the true proportion of
all U.S. adults who would admit to experiencing some financial difficulty paying an unexpected
bill of $1000 right away.

Margin of error: The margin of error of an estimate describes how far, at most, we expect the estimate to vary from the true population value. In a C% confidence interval, the distance between the point estimate and the true parameter value will be less than the margin of error in C% of all samples.

DECREASING MARGIN OF ERROR


- Decrease confidence level
- Increase sample size n
PANIC2
P: Parameter of interest + BINS (omit this for means)
→ Mention the words “true” or “population”
A: Assumptions
→ S: Random selection
→ I: Independence, n < 0.10N
→ N: For proportions, approximately normal if np̂ ≥ 10 and n(1 − p̂) ≥ 10
For means, approximately normal if n ≥ 30
N: Name of interval
→ proportion: 1-proportion z-interval for population proportion
→ mean: 1-sample t-interval for population mean
I: Interval
C: Conclusion in context

Confidence interval for a population proportion: p̂ ± z*√(p̂(1 − p̂)/n)

Confidence interval for a population mean: x̄ ± t*(sx/√n)

t* is the critical value with n − 1 degrees of freedom
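A Python sketch of the two interval formulas (standard library only). Python's standard library has no t distribution, so the t-interval function takes t* as an argument; t* = 2.262 for 95% confidence with df = 9 comes from a t table. All sample values are hypothetical.

```python
import math
from statistics import NormalDist

def one_prop_z_interval(x, n, confidence=0.95):
    """p-hat +/- z* sqrt(p-hat(1 - p-hat)/n)."""
    p_hat = x / n
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)   # about 1.96 for 95%
    me = z_star * math.sqrt(p_hat * (1 - p_hat) / n)      # margin of error
    return p_hat - me, p_hat + me

def one_sample_t_interval(x_bar, s_x, n, t_star):
    """x-bar +/- t* (s_x / sqrt(n)); t_star comes from a t table or invT with df = n - 1."""
    me = t_star * s_x / math.sqrt(n)
    return x_bar - me, x_bar + me

# Hypothetical sample: 65 successes out of 100
print(one_prop_z_interval(65, 100))            # roughly (0.557, 0.743)

# Hypothetical sample: n = 10, x-bar = 50, s_x = 4; t* = 2.262 for 95% with df = 9
print(one_sample_t_interval(50, 4, 10, 2.262))
```

Lowering the confidence level (for example, `one_prop_z_interval(65, 100, 0.90)`) gives a narrower interval, matching the margin-of-error rules above.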

SIGNIFICANCE TESTS
A test is conducted with the assumption that H0 is true
Null hypothesis H0: Claim that we weigh evidence against in a significance test.
Alternative hypothesis Ha: Claim that we are trying to find evidence for in a significance test.
One-sided: What the Ha is if it states that a parameter is greater than the null value or if it states
that the parameter is less than the null value.
Two-sided: What the Ha is if it states that the parameter is different from the null value (it could
be either greater than or less than).
P-value: Probability of obtaining evidence for Ha as strong or stronger than the observed
evidence when H0 is true.
Power: Probability that the test will find convincing evidence for Ha when a specific alternative
value of the parameter is true.
- power = P(reject H0 | parameter = some specific alternative value)
- power = 1 - P(Type II error)
- P(Type II error) = 1-power
Power can be increased by:
→ Increasing sample size → Decreasing SE
→ Making null + alternative parameter values farther apart → Increasing α
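To make the power rules concrete, here is a Python sketch for a one-sided 1-proportion z-test (H0: p = 0.5, Ha: p > 0.5), assuming the Normal approximation for p̂ holds. The alternatives and sample sizes are hypothetical; the comparisons show that increasing n, moving the alternative farther from the null value, or increasing α each raises the power.

```python
import math
from statistics import NormalDist

Z = NormalDist()

def power_one_prop_upper(p0, p_alt, n, alpha=0.05):
    """Power of a one-sided 1-proportion z-test (Ha: p > p0) when the true proportion is p_alt."""
    se_null = math.sqrt(p0 * (1 - p0) / n)
    cutoff = p0 + Z.inv_cdf(1 - alpha) * se_null      # reject H0 when p-hat exceeds this
    se_alt = math.sqrt(p_alt * (1 - p_alt) / n)
    return 1 - NormalDist(p_alt, se_alt).cdf(cutoff)   # P(p-hat > cutoff | p = p_alt)

base = power_one_prop_upper(0.5, 0.6, 100)
print(round(base, 3), "P(Type II error) =", round(1 - base, 3))
print(power_one_prop_upper(0.5, 0.6, 400) > base)              # larger n -> more power
print(power_one_prop_upper(0.5, 0.65, 100) > base)             # alternative farther from p0 -> more power
print(power_one_prop_upper(0.5, 0.6, 100, alpha=0.10) > base)  # larger alpha -> more power
```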

MAKING CONCLUSIONS
→ If p-value < α, reject H0 + conclude that there is convincing evidence for Ha (in context).
→ If p-value > α, fail to reject H0 + conclude that there isn’t convincing evidence for Ha (in context).

Type I error (probability α): When H0 is true but we reject H0

Type II error (probability β): When Ha is true but we fail to reject H0

PHANTOMS
P: Parameter of interest + BINS (omit this for means)
→ Mention the words “true” or “population”
H: Hypotheses, H0, Ha, and α (default is 0.05)
A: Assumptions
→ S: Random selection
→ I: Independence, n < 0.10N
→ N: For proportions, approximately normal if np0 ≥ 10 and n(1-p0) ≥10
For means, approximately normal if n ≥ 30
N: Name of test
→ proportion: 1-proportion z-test for population proportion
→ mean: 1-sample t-test for population mean
T: Test statistic (z, t)
O: Obtain p-value, 1propztest/1samplettest(…, …, …, …), df = …
M: Make a decision, p = … → reject/fail to reject H0
S: State conclusion
→ There is convincing evidence to support Ha …
→ There isn’t convincing evidence to support Ha …

Significance test for a population proportion: z = (p̂ − p0)/√(p0(1 − p0)/n)

Significance test for a population mean: t = (x̄ − μ0)/(sx/√n), df = n − 1
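A Python sketch of the 1-proportion z-test from the formula above, with hypothetical data (60 successes in 100 trials, H0: p = 0.5, Ha: p > 0.5). The p-value comes from the standard Normal distribution in statistics.NormalDist.

```python
import math
from statistics import NormalDist

def one_prop_z_test(x, n, p0, alternative="two-sided"):
    """Return (z, p-value) for H0: p = p0. alternative is 'greater', 'less', or 'two-sided'."""
    p_hat = x / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)   # SD uses p0 because we assume H0 is true
    Z = NormalDist()
    if alternative == "greater":
        p_value = 1 - Z.cdf(z)
    elif alternative == "less":
        p_value = Z.cdf(z)
    else:
        p_value = 2 * (1 - Z.cdf(abs(z)))
    return z, p_value

# Hypothetical data: 60 successes in 100 trials, H0: p = 0.5, Ha: p > 0.5
z, p_value = one_prop_z_test(60, 100, 0.5, "greater")
alpha = 0.05
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```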

CONFIDENCE INTERVAL FOR DIFFERENCES


PANIC2
Pause when there are two samples: 2 proportions always come from independent samples; 2 means can only be matched pairs if the sample sizes are the same (ALWAYS CHECK)
P: Parameter of interest + BINS (omit this for means)
→ Mention the words “true” or “population”
→ p1 - p2
→ μ1 - μ2
A: Assumptions
→ S: Random selection
→ I: Independence, n1 < 0.10N1 and n2 < 0.10N2
→ N: For proportions, approximately normal if the numbers of successes and failures in both samples are all ≥ 10
For means, approximately normal if both sample sizes are ≥ 30
N: Name of interval
→ proportion: 2-proportion z-interval for difference in proportions
→ mean: 2-sample t-interval for difference in means
I: Interval
C: Conclusion in context
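A Python sketch of the 2-proportion z-interval, (p̂1 − p̂2) ± z*√(p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2), with hypothetical counts. Because this interval does not contain 0, 0 is not a plausible value for p1 − p2.

```python
import math
from statistics import NormalDist

def two_prop_z_interval(x1, n1, x2, n2, confidence=0.95):
    """(p1-hat - p2-hat) +/- z* sqrt(p1-hat(1 - p1-hat)/n1 + p2-hat(1 - p2-hat)/n2)."""
    p1, p2 = x1 / n1, x2 / n2
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se

# Hypothetical samples: 120/200 successes vs 90/200 successes
low, high = two_prop_z_interval(120, 200, 90, 200)
print(f"({low:.3f}, {high:.3f})")
```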

SIGNIFICANCE TEST FOR DIFFERENCES


PHANTOMS
Pause when there are two samples: 2 proportions always come from independent samples; 2 means can only be matched pairs if the sample sizes are the same (ALWAYS CHECK)
P: Parameter of interest + BINS (omit this for means)
→ Mention the words “true” or “population”
H: Hypotheses, H0: p1 = p2/μ1 = μ2, Ha, and α (default is 0.05)
A: Assumptions
→ S: Random selection
→ I: Independence, n1 < 0.10N1 and n2 < 0.10N2
→ N: For proportions, approximately normal if the numbers of successes and failures in both samples are all ≥ 10
For means, approximately normal if both sample sizes are ≥ 30
N: Name of test
→ proportion: 2-proportion z-test for difference in proportions
→ mean: 2-sample t-test for difference in means
T: Test statistic (z, t)
O: Obtain p-value, 2propztest/2samplettest(…, …, …, …), df = …
M: Make a decision, p = … → reject/fail to reject H0
S: State conclusion
→ There is convincing evidence to support Ha …
→ There isn’t convincing evidence to support Ha …
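For the 2-proportion z-test, the standard approach pools the two samples to estimate the common proportion under H0 (since H0 says p1 = p2). A Python sketch with hypothetical counts and a two-sided alternative:

```python
import math
from statistics import NormalDist

def two_prop_z_test(x1, n1, x2, n2):
    """z and two-sided p-value for H0: p1 = p2, using the pooled (combined) proportion."""
    p1, p2 = x1 / n1, x2 / n2
    p_c = (x1 + x2) / (n1 + n2)                           # pooled proportion, assuming H0 is true
    se = math.sqrt(p_c * (1 - p_c) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

z, p_value = two_prop_z_test(120, 200, 90, 200)   # hypothetical counts
print(f"z = {z:.2f}, two-sided p-value = {p_value:.4f}")
```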

PAIRED DATA
One subject gets both treatments (e.g., fills out a survey with 2 questions, and we compare the answers to both questions). Use a one-sample t test on the differences.
Compute the difference (group 1 − group 2) for each subject.
Name: Paired t test
Let μd be the true mean difference (group 1 − group 2).
H0: μd (mean of the differences) = 0
Ha: μd (mean of the differences) ≠/</> 0
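A Python sketch of the paired-data steps with hypothetical survey answers: take one difference per subject, then compute a one-sample t statistic on the differences. Only the test statistic and df are computed here, since Python's standard library has no t distribution for the p-value.

```python
import math
from statistics import mean, stdev

# Hypothetical paired data: each subject answers question 1 and question 2
group1 = [7, 5, 8, 6, 9, 7]
group2 = [5, 5, 6, 4, 8, 6]

diffs = [a - b for a, b in zip(group1, group2)]   # one difference per subject (group 1 - group 2)
n = len(diffs)
d_bar, s_d = mean(diffs), stdev(diffs)
t = (d_bar - 0) / (s_d / math.sqrt(n))             # test statistic for H0: mu_d = 0
print(diffs, round(t, 3), "df =", n - 1)
```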

INTERPRETATIONS
Slope: For every 1-unit increase in [explanatory variable], our model predicts an average increase/decrease of [slope] in [response variable].

Y-intercept: When [explanatory variable] is zero units, our model predicts that [response variable] would be [y-intercept].

Extrapolation: The prediction is not reliable due to extrapolation. The prediction was made outside the interval of x-values in the current data. Trends seen in the scatterplot may not continue at this new x-value.

r²: [r²]% of the variation in [response variable] can be explained by the LSRL using [explanatory variable] and [response variable].
Ex. 73.33% of the variation in wait time is explained by the least-squares regression line using the number of customers in line and the wait time.

s: The actual [response variable] is typically about [s] away from the [response variable] predicted by the least-squares regression line with x = [explanatory variable].
Ex. The actual price of a Ford F-150 is typically about $5740 away from the price predicted by the least-squares regression line with x = miles driven.

Residual: The actual [response variable] is [residual] more/less than the [response variable] predicted by the regression line with x = [explanatory variable].
Ex. Andres grabbed 3.54 more Starburst candies than the number predicted by the regression line with x = hand span.

Confidence interval: We are C% confident that the interval from ____ to ____ captures the [parameter in context].

Confidence level: If we were to select many random samples from a population and construct a C% confidence interval using each sample, about C% of the intervals would capture the [parameter in context].

CALCULATOR FUNCTIONS
1 AND 2 VARIABLE DATA
1-Var Stats (STAT, CALC): To find the mean, standard deviation, and 5-number summary for a data set.
→ Enter data in L1 and frequencies in L2 if needed
→ 1-Var Stats L1 or 1-Var Stats L1,L2

LinReg (a + bx) (STAT, CALC, with DiagnosticOn): To find the equation for a least-squares regression line, and r and r².
→ Enter values in L1 (explanatory) and L2 (response)
→ LinReg (a + bx) L1,L2

PROBABILITY CALCULATIONS
normalcdf (6, 5, 2): To find an area for an interval in a normal distribution.
→ normalcdf(lower, upper, mean, SD)

invNorm (6, 5, 3): To find a boundary value in a normal distribution.
→ invNorm(area left, mean, SD)

binompdf (6, 5, A): To find the probability of getting exactly X successes in a binomial setting.
→ binompdf(n, p, X)
n: number of trials
p: probability of success
X: number of successes

binomcdf (6, 5, B): To find the probability of getting at most X successes in a binomial setting.
→ binomcdf(n, p, X)
n: number of trials
p: probability of success
X: number of successes

geompdf (6, 5, H): To find the probability that the first success occurs on trial X in a geometric setting.
→ geompdf(p, X)
p: probability of success
X: trial on which the first success occurs

geomcdf (6, 5, I): To find the probability that the first success occurs between trial lower and trial upper in a geometric setting (use lower = 1 for “on or before trial upper”).
→ geomcdf(p, lower, upper)
p: probability of success
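For checking calculator work, here are rough Python equivalents of the probability commands above, using only the standard library. The function names mirror the calculator commands but are hypothetical helpers, not TI code; the printed values can be compared with calculator output.

```python
import math
from statistics import NormalDist

def normalcdf(lower, upper, mean=0, sd=1):
    """Area between lower and upper under a Normal(mean, sd) curve."""
    d = NormalDist(mean, sd)
    return d.cdf(upper) - d.cdf(lower)

def invNorm(area_left, mean=0, sd=1):
    """Boundary value with the given area to its left."""
    return NormalDist(mean, sd).inv_cdf(area_left)

def binompdf(n, p, x):
    """P(X = x): exactly x successes in n trials."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

def binomcdf(n, p, x):
    """P(X <= x): at most x successes in n trials."""
    return sum(binompdf(n, p, k) for k in range(x + 1))

def geompdf(p, x):
    """P(X = x): first success occurs on trial x."""
    return (1 - p) ** (x - 1) * p

def geomcdf(p, lower, upper):
    """P(lower <= X <= upper): first success occurs between trials lower and upper."""
    return sum(geompdf(p, k) for k in range(lower, upper + 1))

print(normalcdf(-1, 1))          # about 0.6827
print(invNorm(0.975))            # about 1.96
print(binompdf(10, 0.5, 5))      # about 0.2461
print(binomcdf(10, 0.5, 5))      # about 0.6230
print(geomcdf(0.2, 1, 3))        # 1 - 0.8**3 = 0.488
```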

CONFIDENCE INTERVALS
1-PropZInt (6, 6, 5): To calculate a confidence interval to estimate a single proportion.
→ 1-PropZInt
x: number of successes
n: sample size
C-Level: confidence level

2-PropZInt (6, 6, 6): To calculate a confidence interval to estimate a difference of proportions.
→ 2-PropZInt
x1: number of successes in sample 1
n1: sample size of sample 1
x2: number of successes in sample 2
n2: sample size of sample 2
C-Level: confidence level

TInterval (6, 6, 2): To calculate a confidence interval to estimate a single mean when the standard deviation of the population is unknown.
→ TInterval
Inpt: Stats
x̄: sample mean
Sx: sample standard deviation
n: sample size
C-Level: confidence level

2-SampTInt (6, 6, 4): To calculate a confidence interval to estimate a difference of means when the standard deviations of the populations are unknown.
→ 2-SampTInt
Inpt: Stats
x̄1: sample mean of sample 1
Sx1: standard deviation of sample 1
n1: sample size of sample 1
x̄2: sample mean of sample 2
Sx2: standard deviation of sample 2
n2: sample size of sample 2
C-Level: confidence level
Pooled: No

SIGNIFICANCE TESTS
1-PropZTest (6, 7, 5): To test a claim made about a single proportion.
→ 1-PropZTest
p0: null value
x: number of successes
n: sample size
Prop: ≠p0 <p0 >p0 (alternative)

2-PropZTest (6, 7, 6): To test a claim made about a difference of proportions.
→ 2-PropZTest
x1: number of successes in sample 1
n1: sample size of sample 1
x2: number of successes in sample 2
n2: sample size of sample 2
p1: ≠p2 <p2 >p2 (alternative)

T-Test (6, 7, 2): To test a claim made about a single mean when the standard deviation of the population is unknown.
→ T-Test
Inpt: Stats
μ0: null value
x̄: sample mean
Sx: sample standard deviation
n: sample size
Ha: ≠μ0 <μ0 >μ0 (alternative)

2-SampTTest (6, 7, 4): To test a claim made about a difference of means when the standard deviations of the populations are unknown.
→ 2-SampTTest
Inpt: Stats
x̄1: sample mean of sample 1
Sx1: standard deviation of sample 1
n1: sample size of sample 1
x̄2: sample mean of sample 2
Sx2: standard deviation of sample 2
n2: sample size of sample 2
µ1: ≠µ2 <µ2 >µ2 (alternative)
Pooled: No
