STATS Unit 2 Notes
Step 1: Restate the Question as a Research Hypothesis and a Null Hypothesis about the Populations
• The alternative hypothesis (often denoted as H₁ or Hₐ) predicts a specific effect or difference
between two populations or treatments.
• In contrast, the null hypothesis (denoted as H₀) assumes that there is no effect or difference,
serving as the default or neutral stance.
For example, if researchers want to test whether a specially purified vitamin helps babies walk earlier, the
research hypothesis might state that babies who receive the vitamin (Population 1) walk earlier than
those who do not (Population 2). The null hypothesis would then state that there is no difference in
walking age between the two groups. The null hypothesis is what researchers test directly because it is
more specific and allows for statistical evaluation using probability.
Step 2: Determine the Characteristics of the Comparison Distribution
In the second step, researchers determine what the distribution of results would look like if the null
hypothesis were true. This involves defining the comparison distribution, which is the theoretical
distribution that represents the population under the null hypothesis. Researchers need to know the
mean, standard deviation, and shape of this distribution to evaluate how likely it is to obtain the sample
result if the null hypothesis is true. Typically, if the population distribution is normal or the sample size is
large, the comparison distribution will also be normal.
In the vitamin example, if the null hypothesis assumes no effect of the vitamin, then the distribution of
ages at which babies begin to walk in the general population (those who didn’t take the vitamin) becomes
the comparison distribution. This distribution forms the basis against which the sample result is evaluated.
Step 3: Determine the Cutoff Sample Score on the Comparison Distribution at Which the Null
Hypothesis Should Be Rejected
The third step involves setting a critical value, also called a cutoff score, that defines how extreme the
sample result must be to reject the null hypothesis. This decision is made before collecting data to avoid
bias. The level of significance, denoted by α (alpha), is chosen to represent the probability of rejecting the
null hypothesis when it is actually true (Type I error).
Common significance levels (α) are:
• 5% (α = 0.05) → Z = ±1.96 (two-tailed)
• 1% (α = 0.01) → Z = ±2.58 (two-tailed)
• For a one-tailed test at the 1% level, the cutoff is Z = 2.33 in the predicted tail; researchers
would reject H₀ only if the sample Z-score falls below −2.33 (for the lower tail).
This step ensures objectivity and limits the likelihood of making a false positive conclusion.
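These cutoffs come straight from the standard normal distribution, so they can be recovered with the Python standard library rather than a Z-table. A minimal sketch:

```python
# Deriving the Z cutoffs in the notes from alpha, using only the
# Python standard library (statistics.NormalDist).
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def two_tailed_cutoff(alpha):
    """Z beyond which alpha/2 of the distribution lies in each tail."""
    return std_normal.inv_cdf(1 - alpha / 2)

def one_tailed_cutoff(alpha):
    """Z beyond which alpha of the distribution lies in one tail."""
    return std_normal.inv_cdf(1 - alpha)

print(round(two_tailed_cutoff(0.05), 2))  # 1.96
print(round(two_tailed_cutoff(0.01), 2))  # 2.58
print(round(one_tailed_cutoff(0.01), 2))  # 2.33
```

Note that the two-tailed cutoff splits α across both tails (hence 1 − α/2), while the one-tailed cutoff puts all of α in the predicted tail.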
Step 4: Determine Your Sample’s Score on the Comparison Distribution
Once the data is collected, researchers calculate the sample’s Z-score to find out how extreme the result
is compared to the comparison distribution. This Z-score tells us how many standard deviations the
sample result is from the mean under the null hypothesis. The formula used is:
Z = (X − μ) / σ
where X is the sample score, μ is the population mean, and σ is the population standard deviation.
For example, if a baby walked at 6 months, and the population mean is 14 months with a standard
deviation of 3 months, the Z-score would be Z = (6 − 14) / 3 ≈ −2.67.
This score is then compared with the critical value determined in Step 3. Calculating this Z-score is
essential to determine whether the observed result is statistically significant or likely due to chance.
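The Z-score calculation above is a one-liner; here it is applied to the walking-age numbers from the notes:

```python
# Step 4: the sample's Z-score on the comparison distribution.
def z_score(x, mu, sigma):
    """How many standard deviations x lies from the mean mu."""
    return (x - mu) / sigma

# Values from the vitamin example: baby walked at 6 months,
# population mean 14 months, population SD 3 months.
z = z_score(6, 14, 3)
print(round(z, 2))  # -2.67
```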
Step 5: Decide Whether to Reject the Null Hypothesis
The final step is to compare the sample's Z-score with the cutoff score set earlier. If the sample's
score is more extreme than the cutoff, the null hypothesis is rejected; otherwise, the result is
considered inconclusive. Several factors affect the power of the test to detect a real effect:
• The significance level (α) (if you are too strict, power can go down).
• The true difference between the population and the hypothesized value (larger differences are
easier to detect).
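The five steps can be tied together for the vitamin example: the observed Z-score from Step 4 is compared with the one-tailed 1% cutoff from Step 3.

```python
# Steps 3-5 for the walking-age example: compute the lower-tail cutoff
# at alpha = 0.01, the sample Z-score, and the reject/retain decision.
from statistics import NormalDist

alpha = 0.01
cutoff = NormalDist().inv_cdf(alpha)  # lower-tail cutoff, about -2.33
z = (6 - 14) / 3                      # sample Z-score, about -2.67

reject_null = z < cutoff
print(reject_null)  # True: -2.67 is more extreme than -2.33, so reject H0
```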
CONFIDENCE INTERVAL
A Confidence Interval (CI) is a statistical tool used to estimate the range within which the true value of a
population parameter, such as the mean or proportion, is likely to lie. Since it is often impossible or
impractical to collect data from an entire population, researchers collect data from a sample and then use
this sample to make inferences about the larger population. However, because samples vary, there is
always some uncertainty about how close the sample result is to the true population value.
A confidence interval provides a solution to this problem by calculating a range of values that is likely to
contain the true parameter, based on the sample data and the level of confidence the researcher
chooses. The most commonly used confidence levels are 95% and 99%. A 95% confidence interval
means that if the same sampling procedure were repeated many times, 95 out of 100 calculated intervals
would contain the actual population parameter, and only 5 would not. A wider confidence interval
suggests greater uncertainty or variability in the data, while a narrow interval indicates a more precise
estimate.
Confidence intervals are extremely useful in research because they do not just offer a single value, but
rather reflect the natural uncertainty and variability involved in working with samples, helping researchers
understand the reliability of their results and guiding sound, evidence-based decisions. Unlike hypothesis
testing, which only tells whether an effect is statistically significant, confidence intervals provide a clear
picture of both the size of the effect and the degree of confidence we can place in the estimate, making
them an essential part of any meaningful statistical analysis.
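When the population standard deviation is known, a confidence interval for the mean follows the Z-based formula mean ± z* · σ/√n. A minimal sketch using made-up walking-age data (the sample values and the assumed population SD of 3 are illustrative, not from the notes):

```python
# 95% confidence interval for a population mean with known population SD,
# using mean +/- z* * sigma / sqrt(n). Sample data are hypothetical.
from math import sqrt
from statistics import NormalDist, mean

sample = [13, 15, 14, 16, 12, 14, 15, 13]  # hypothetical walking ages (months)
sigma = 3                                  # assumed known population SD
n = len(sample)
z_star = NormalDist().inv_cdf(0.975)       # about 1.96 for 95% confidence

m = mean(sample)
margin = z_star * sigma / sqrt(n)
print(f"95% CI: ({m - margin:.2f}, {m + margin:.2f})")  # (11.92, 16.08)
```

Raising the confidence level to 99% would use z* ≈ 2.58 instead, widening the interval, which matches the note that wider intervals reflect greater uncertainty.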
WHAT IS A T-TEST?
A t-test is a statistical tool used to determine whether the difference between the means (averages) of two
groups is significant — meaning, is the difference likely due to a real effect or just random chance? When
we collect data, sample means will always differ slightly, but a t-test tells us if that difference is large
enough to conclude that the groups are actually different in the population. There are different types of t-
tests depending on the type of data and the study design. The most common are:
1. Independent Groups t-test (unpaired t-test)
2. Dependent Groups t-test (paired t-test)
INDEPENDENT GROUPS
The Independent Groups t-test is a statistical method used to determine whether there is a significant
difference between the means (averages) of two unrelated (independent) groups. This test helps us
understand whether the difference between group averages is due to chance or because of some real
factor (like a treatment or condition).
Example: You want to compare the average test scores of Class A and Class B after teaching
them with two different methods. The students in Class A are not the same as in Class B—so they are
independent groups.
Constructs (Core Elements)
1. Two Independent Groups
In an Independent t-test, the two groups being compared must be entirely separate. This means that no
individual should appear in both groups. For example, if you're comparing boys and girls in different
classrooms, a boy cannot also be in the girls’ group. Each participant belongs to only one group.
2. Dependent Variable (What You're Measuring)
This is the variable you are observing or measuring in both groups. It represents the outcome you're
comparing—such as test scores, blood pressure, or reaction time. The dependent variable must be
numerical so that averages can be calculated and compared.
3. Independent Variable (Grouping Factor)
The independent variable is what divides participants into two groups. It’s often a treatment or condition,
like different teaching methods, types of diet, or gender. This variable creates the two distinct, unrelated
groups whose averages you will compare.
4. Sample Mean
Each group will have a sample mean, which is the average score of all participants in that group. The t-
test compares these two means to determine whether the difference between them is statistically
significant or just due to chance.
5. Standard Deviation and Variance
These are measures of how spread out the scores are within each group. A small variance means the
scores are close to the average, while a large variance means the scores are more spread out. These
values are important when calculating the t-value in the test.
Assumptions
1. Independence of Observations
This assumption means the scores of one participant should not influence the scores of another,
especially from the other group. The groups must not be related or matched in any way. Each observation
should be independent to ensure valid results.
2. Normality
The data in both groups should follow a roughly normal distribution, especially if the sample size is small.
This helps ensure the accuracy of the t-test results. When sample sizes are large (usually over 30), slight
non-normality is acceptable.
3. Homogeneity of Variance
The variance (spread) in scores should be similar between the two groups. If one group’s scores vary a
lot more than the other, it can affect the test’s reliability. This assumption is often tested using Levene’s
Test before performing the t-test.
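Levene's test itself requires a statistics package, but a quick informal check is simply to compare the two sample variances; a minimal sketch with made-up scores:

```python
# Rough check of homogeneity of variance: the ratio of the larger sample
# variance to the smaller. This is NOT Levene's test, only an informal
# screen; a ratio much above about 4 is a common rule-of-thumb warning.
from statistics import variance

group_a = [78, 85, 90, 72, 88, 80]  # hypothetical scores
group_b = [70, 75, 68, 74, 72, 71]  # hypothetical scores

var_a, var_b = variance(group_a), variance(group_b)
ratio = max(var_a, var_b) / min(var_a, var_b)
print(round(ratio, 2))
```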
4. Scale of Measurement
The dependent variable must be measured at either the interval or ratio level. This means the data
should have a meaningful numerical value, such as height, weight, or test scores—not just categories like
"yes" or "no".
Characteristics
Two-tailed or One-tailed
The test can be two-tailed (checking if there's any difference) or one-tailed (checking if one mean is
specifically higher or lower than the other). The choice depends on the research question or hypothesis
being tested.
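The pooled-variance independent-groups t statistic can be computed by hand with the standard library. The class scores below are hypothetical, and in practice the resulting t would be compared against a t-table critical value at df = n₁ + n₂ − 2:

```python
# Independent-groups (pooled-variance) t-test computed by hand.
# Scores are made up for illustration.
from math import sqrt
from statistics import mean, variance

class_a = [78, 85, 90, 72, 88, 80]  # taught with method A (hypothetical)
class_b = [70, 75, 68, 74, 72, 71]  # taught with method B (hypothetical)

n1, n2 = len(class_a), len(class_b)
m1, m2 = mean(class_a), mean(class_b)

# Pooled variance: a weighted average of the two sample variances.
sp2 = ((n1 - 1) * variance(class_a) + (n2 - 1) * variance(class_b)) / (n1 + n2 - 2)

# t = difference in means over the standard error of that difference.
t = (m1 - m2) / sqrt(sp2 * (1 / n1 + 1 / n2))
df = n1 + n2 - 2

print(f"t = {t:.2f}, df = {df}")  # t = 3.55, df = 10
```

The computed t is then checked against the critical t for the chosen α and df, exactly as the Z-score was checked against its cutoff earlier.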
DEPENDENT GROUPS
Dependent-Samples Design (Repeated-Measures Design):
In a dependent-samples design, measurements in one sample are related to measurements in another
sample or within the same sample under different conditions. In this design, each subject is measured
multiple times, often under different experimental conditions. Since the observations are related, the
samples cannot be considered independent. The primary advantage of this design is that it reduces
variability between groups because each subject acts as their own control, minimizing random differences
that could obscure the effect of the treatment. This helps in identifying the effect of the condition being
tested more clearly.
Matched-Subjects Design:
In a matched-subjects design, subjects from two or more samples are paired based on a characteristic
that could affect the outcome (e.g., age, ability). After matching, subjects in each pair are assigned to
different experimental conditions. This design helps control for variability due to extraneous factors by
ensuring that comparisons are made between subjects with similar characteristics. Thus, it allows for a
more accurate assessment of the treatment effect by reducing the influence of potential confounding
variables.
Why Fewer Degrees of Freedom for Dependent Samples?
In independent samples, each subject's score in one group is unrelated to the score of another subject in
the other group. In contrast, in dependent samples (such as repeated-measures or matched-subjects
designs), the scores are related. For example, when the same subjects are measured under both
conditions, or when subjects are paired based on some characteristic, the scores are not independent of
each other.
• Repeated-measures design: A subject who performs well under one condition will likely perform well
under the other condition, and similarly for those who perform poorly.
• Matched-subjects design: If a subject performs well under one condition, their matched counterpart
is also likely to perform well under the other condition, assuming the matching variable is relevant.
Because the performance of one subject in a pair is not independent of the other, knowing one score
provides information about the other. This interdependence reduces the number of independent pieces of
information available for estimating variability, and thus, only n - 1 degrees of freedom are available for
statistical testing. This results in fewer degrees of freedom compared to independent samples, where
each observation is independent.
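Because the dependent-groups test is run on the difference scores D = X − Y, it reduces to a one-sample t-test on D with df = n − 1. A minimal sketch with hypothetical before/after measurements for the same subjects:

```python
# Dependent-groups (paired) t-test: run on the difference scores D = X - Y,
# with df = n - 1. Before/after values are hypothetical.
from math import sqrt
from statistics import mean, stdev

before = [12, 15, 11, 14, 13, 16]  # e.g., scores before treatment
after  = [10, 13, 11, 12, 11, 14]  # same subjects after treatment

diffs = [b - a for b, a in zip(before, after)]  # D = X - Y for each pair
n = len(diffs)

# t = mean difference over the standard error of the differences.
t = mean(diffs) / (stdev(diffs) / sqrt(n))
df = n - 1  # one df fewer per pair, as explained above

print(f"t = {t:.2f}, df = {df}")  # t = 5.00, df = 5
```

Note that df is based on the number of pairs (n = 6 here), not the total number of observations (12), which is exactly the loss of degrees of freedom described above.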
Assumptions When Testing a Hypothesis about the Difference Between Two Dependent Means:
Unlike the test for independent means, the assumption of homogeneity of variance is not required for
dependent samples. However, certain assumptions still apply, such as random sampling and normality of
the sampling distribution of the difference scores (X - Y, or D). The Central Limit Theorem assures us that
for larger sample sizes (n > 25), the sampling distribution of D will approximate normality regardless of the
population's shape. However, for smaller sample sizes (n < 25), if the population is not normal,
nonparametric tests might be a better alternative.
Problems with Using the Dependent-Samples Design:
1. Order Effects: Order effects occur when repeated measurements are made on the same subjects.
The first treatment condition might influence how a subject performs on the second treatment. For
example, exposure to one treatment can alter a subject’s behavior or outlook, which can lead to
difficulty interpreting the results. If the order of treatment conditions isn't randomly assigned, this
introduces additional variation and increases the standard error, making it harder to detect actual
differences.
2. Repeated-Measures Design with Nonrandom Assignment: If the order of treatments in a
repeated-measures design is not randomly assigned, it introduces bias in the results. For example, in
studies evaluating psychotherapy effectiveness, where conditions are assessed before and after
treatment, random assignment is not feasible. This type of design can lead to biased comparisons as
improvements could be due to the treatment, placebo effects, or natural recovery. To mitigate this, the
study should ideally have a control group to account for these potential biases.
3. Sampling Variation in Matched-Subjects Designs: In matched-subjects designs, the reduction in
variance between paired observations depends on the strength of the correlation between the
variables used for pairing. A high correlation between paired variables (such as matching subjects
based on intelligence for a learning experiment) can reduce variability and increase the power of the
test. However, the correlation coefficient observed in the sample might not exactly match the
population correlation. If the sample correlation is not accurate, it could lead to errors in pairing, and
thus, the reduction in variance might not be as substantial as expected. Therefore, care must be
taken when matching subjects on variables that are highly correlated with the outcome.