Lab 3 Statistics Intro

The document discusses statistical concepts like populations, samples, parameters, and different types of data. It explains measures of central tendency including mean, median and mode. It also covers measures of spread such as variance, standard deviation, and standard error which are used to quantify the variability or dispersion of data around the mean. Examples are provided to illustrate concepts like skewed distributions and how outliers can affect the mean.

LAB #3: Statistical descriptions of data; comparing two samples

BEFORE LAB
• Read the Introduction and skim the lab exercises below.
• Read closely through the middle of page 4, and please use the videos linked here as
needed to help understand the text.
Samples and Populations
Categorical vs Numeric data
Measures of Central Tendency: Crash Course Statistics #3
Measures of Spread: Crash Course Statistics #4
• Watch the video – Beak of the finch ( https://www.youtube.com/watch?v=mcM23M-CCog )

OBJECTIVES

1. Understand the primary functions of statistics: describing populations and testing alternative
hypotheses.
2. Become familiar with and understand statistical terms: population vs sample, parameter, mean,
median, mode, continuous vs categorical data, alternative vs null hypothesis, test statistic
(including the example of t), alpha and P-value, population and sample variance, population
and sample standard deviation, standard error.
3. Understand the meaning of P-value in the context of hypothesis testing.
4. Practice skills in data management, graphing, and statistical analysis (Excel).

INTRODUCTION

Populations and samples

Populations in statistics are the complete groups of interest; the full set of individuals we
are interested in making inferences about. It could be all men or women, or all voters in an
election, or all members of a single lizard species. There are two sampling problems statistics
tries to solve. First, we almost never can collect data on an entire population. Instead, we have to
make inferences about the parameters, variables that describe a population [like mean, median,
or variance], from smaller samples*. We measure these parameters in the samples, then use the
variability in those samples to figure out how close we think our sample parameter estimates are
to the true population parameters. Second, we often don’t even know if we are dealing with two
or more separate populations for a given trait of interest, like leg length, or if we are in fact
looking at a single population. Note that two different species might well be good biological
species, but if they don’t differ in some specific trait, like leg length, we consider them
statistically to be a single population (for that trait). In this lab we will measure a couple of different
kinds of samples: Paramecium caudatum vs P. aurelia, and the morphology of Galapagos
finches before and after a drought in the Galapagos. In both cases we will use statistics to ask
whether the two samples come from the same or different populations.

______________
* Proper sampling is a subject of its own, but any factor that makes a sample less than a completely random
selection from the population of interest can bias results. Almost all actual samples are samples of
convenience, meaning they are what we could get, not a random draw from what is actually out there. One of
many good discussions is here: https://blogs.scientificamerican.com/guest-blog/where-are-the-real-errors-in-political-polls/

Before we get to statistical parameters we should define variable types. Data can take at
least three forms: continuous, categorical, and ordinal. Continuous data vary quantitatively,
in either integer or decimal units. Think length, width, running speed, height,
weight, age at death, development time. Categorical data refer to discrete, discontinuous states
that variables can take. Think sex, species, phylum, state or country of origin, color, or genotype.
Ordinal data are beyond the scope of this course but are used frequently in behavioral studies;
they are categories with ranks, such that the order is important but the scale is not regular or
linear. In this lab we will describe continuous variables using the sample parameters of mean,
variance and standard deviation. The parameters we will deal with today are measures of central
tendency and spread.
Measures of central tendency
Central tendency can be described by the mean, or average of all values, the median, or
the value separating the larger half of the values from the smaller half, and the mode, the most
common value. The mean is the parameter we are probably most comfortable with. The mean
score a class achieves on an exam is often what we use to measure our own success. We use a
bar above a variable to show that it is a mean ($\bar{X}$). The entire population has a mean µ, and an
estimate of the population mean from a sample is $\bar{X}$.

Population mean = $\mu = \frac{\sum X_i}{N}$

Sample mean = $\bar{X} = \frac{\sum X_i}{n}$

In the above formulas Σ is the symbol for “sum of”, so $\sum X_i$ means the sum of the X’s:
the values from each individual are summed and then divided by n, the number of individuals
in the sample (or by N, the number of individuals in the population).
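
As a rough illustration of the three measures of central tendency, here is a minimal Python sketch. The measurements are made up for the example, and the lab itself uses Excel rather than Python; the `statistics` module is part of the Python standard library.

```python
from statistics import mean, median, mode

# Hypothetical sample: lengths (in µm) of 9 individuals (made-up values).
lengths = [180, 195, 200, 200, 205, 210, 215, 220, 265]

sample_mean = mean(lengths)      # sum of the values divided by n
sample_median = median(lengths)  # middle value once the data are sorted
sample_mode = mode(lengths)      # most common value

# The same mean, written out as the formula: sum of X_i divided by n.
mean_by_formula = sum(lengths) / len(lengths)

print(sample_mean, sample_median, sample_mode, mean_by_formula)
```

In this made-up sample the single large value (265) pulls the mean above the median, which is the pattern discussed next for skewed distributions.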
A normally distributed population is one that is fully described by the mean and the
variance. It is symmetrical around the mean, giving it the familiar ‘bell’ shape. In theory the tails
extend to infinity in each direction. The mean, median and mode are the same. In the right
(positive)-skewed distribution the mean is larger than the mode because it is affected more by
extremely large values. An example of a right-skewed population is household income in the US.

[Figure: illustration of skewed distributions. Author: Diva Jain. Source: Wikipedia.]


The median individual income in the US was under $33,000 in 2018. However, the extremely
high annual incomes of the wealthiest 1% (average income in this percentile is over $1,300,000)
raise the mean individual income to $50,000 (US Social Security Administration). The mean is more
vulnerable to outliers than the median. A left (negative) skewed population has a mean smaller
than the median or mode because the mean is more affected by extremely small values.
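
To see how strongly one extreme value moves the mean compared with the median, consider this small Python sketch. The incomes are invented for illustration (in thousands of dollars) and are not the Social Security Administration figures quoted above.

```python
from statistics import mean, median

# Hypothetical incomes in thousands of dollars (illustrative values only).
incomes = [22, 28, 30, 33, 35, 40, 48]
with_outlier = incomes + [1300]  # add one extremely high income

mean_before, median_before = mean(incomes), median(incomes)
mean_after, median_after = mean(with_outlier), median(with_outlier)

print("without outlier:", round(mean_before, 1), median_before)
print("with outlier:   ", round(mean_after, 1), median_after)
```

Adding the one outlier multiplies the mean several times over, while the median barely moves.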
Measures of spread
Measures of variability in data are just as important as measures of central tendency. This
is because the more variable a population, the more often you might collect two samples with
means pretty far from each other by chance. The less variable a population, the closer we would
expect the means of multiple samples to be to each other. This means that the variability
(sometimes called spread or dispersion) of our data is critical to determining the statistical
significance of any difference in the means of different samples. One measure of spread is pretty
crude, the range of values from low to high. The ones we will focus on are based on the
differences of each point from the mean.
Variance: If we were to just use the average difference of each point from the mean as a
measure of spread, we would always end up with zero. This is because by definition some points
will be above and some below the mean, and the sum of those differences must cancel out. One
way (not the only way, but that’s history) to make the differences positive is to square them. The
fundamental measure of spread in statistics is the variance, the average of the squares of the
differences of each point from the mean. The population variance, denoted $\sigma^2$, is then:

$$\sigma^2 = \frac{\sum_{i=1}^{N}(X_i - \mu)^2}{N}$$

Where i indexes each individual data point, N is the population size, $X_i$ is the value of an individual,
and µ is the population mean. Now remember we almost never get to measure the population
variance or mean (µ). Instead we collect data from samples, which have sample mean $\bar{X}$, and the
sample variance is slightly different:

$$s^2 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1}$$

This is very similar, but not identical, to the population variance. Some parts of this equation are
just different symbols. $\bar{X}$ really is the same calculation as a population mean, just for the sample
collected; similarly, n is just the sample size rather than the population size. However, why n-1 and not just
n? Stats math geniuses over a century ago realized that small samples tend to under-estimate
variance, because they are less likely to pick up rare extreme values and because the differences are
measured from the sample mean, which is itself calculated from the same data. So they put in a correction:
using n-1 in the denominator makes the fraction bigger than using just n. The smaller the
sample, the greater the effect of subtracting one from the denominator, so the correction is largest
for small samples. Large samples, which are likely
to catch more rare extreme values, won’t have sample variances increased much at all. The
sample standard deviation, which is sort of like (not exactly, don’t worry about it until you’re
in a stats class) the average difference from the mean, is the square root of the variance, so
$s = \sqrt{s^2}$. Variance and standard deviation (SD) alone are measures of how variable your sample data
are, but to put a range around a mean estimate or test the difference between two means, we need
to measure the uncertainty of the mean estimate itself, not the variability in our data. The term
that describes variability around the sample mean estimate is the standard error, SE, sometimes
written $SE_{\bar{X}}$, to show that it is describing variability of the mean estimate, not variability of the
individual data points. It is calculated by taking the sample standard deviation and dividing it by
the square root of the sample size,

$$SE_{\bar{X}} = \frac{s}{\sqrt{n}}$$

In this way SE is influenced by variability among data points, because that goes into s, but there
is also a large effect of sample size; larger samples will have larger denominators, and smaller
standard errors. This makes a lot of sense. If individual data points are less variable (low s) then
there is less variability around the estimate of the mean (any sample will probably be close to the
true mean). However, even if individual data points are highly variable, if you sample a lot of
them the sample mean will also be pretty close to the true mean; there will be less uncertainty
around your mean estimate.
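
The chain from sum of squares to variance, standard deviation and standard error can be sketched in a few lines of Python. This is only an illustration with made-up measurements; the function name `describe_spread` is hypothetical, and the result is checked against the standard library's `statistics.variance` and `statistics.stdev`, which also use n-1.

```python
import math
import statistics

def describe_spread(values):
    """Return (sum of squares, sample variance, SD, SE) for a list of numbers."""
    n = len(values)
    sample_mean = sum(values) / n
    # Squared difference of each point from the sample mean, summed.
    sum_of_squares = sum((x - sample_mean) ** 2 for x in values)
    variance = sum_of_squares / (n - 1)  # n - 1, not n, because this is a sample
    sd = math.sqrt(variance)             # standard deviation: variability of the data
    se = sd / math.sqrt(n)               # standard error: uncertainty of the mean
    return sum_of_squares, variance, sd, se

sample = [4.0, 6.0, 7.0, 9.0, 9.0]  # hypothetical measurements
ss, var, sd, se = describe_spread(sample)
print(ss, var, round(sd, 3), round(se, 3))

# Cross-check against the standard library.
print(statistics.variance(sample), round(statistics.stdev(sample), 3))
```

Note that the SD describes the spread of the five data points, while the SE, which is smaller, describes how uncertain the estimate of their mean is.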
The range within which the true mean is found
The standard error ($SE_{\bar{X}}$) is used to define the range within which we have a certain level
of confidence (90%, 95%, 99%) that the true mean lies. We determine that confidence interval by
multiplying the SE by a new variable, t. t follows a continuous probability distribution (a curve that
defines the probabilities of different values of a variable) that describes how many standard
errors away from the true mean the means of small samples are expected to be. It was developed by a
statistician working to improve quality control from small samples at the Guinness brewing company
at the start of the 20th century. For a given value of degrees of freedom, which for a
single mean is just one less than the sample size (n-1), and a given level of confidence in your
mean estimate, t will tell you the range where that proportion of sample means will fall. The
figure below shows the probability density function of t for 30 degrees of freedom. The filled areas
each cover 0.025 (2.5%) of the area under the curve, showing that 5% of the time sample means will
be more than 2.042 standard errors away from the true mean.

[Figure: t-distribution for n=30. x-axis: standard errors from the mean (-4 to 4); y-axis: probability density (0 to 0.5). Each shaded tail covers 0.025 of the area under the curve (α = 0.05).]

So, to generate a 95% confidence interval around a mean estimate from a sample, you add and
subtract t times the standard error from the sample mean:

$$CI = \bar{X} \pm t_{\alpha/2,\,df} \times SE_{\bar{X}}$$

Where $t_{\alpha/2,\,df}$ can be looked up in a table like the one below; for a 95% confidence interval α/2 is
0.025, df is n-1, and $SE_{\bar{X}} = \frac{s}{\sqrt{n}}$. Note from the table below that for large sample sizes a good rule
of thumb is that 95% of sample means lie within 2 standard errors of the true mean, just as 95% of the
data in a normally distributed population lie within about 2 standard deviations of the population mean.
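
As a worked illustration of the confidence interval formula, here is a minimal Python sketch. The sample values are made up, the function name `confidence_interval_95` is hypothetical, and the three critical values are simply copied from the α/2=0.025 column of Table 1 rather than computed.

```python
import math
import statistics

# Critical values of t for a 95% CI (alpha/2 = 0.025), copied from Table 1.
# Only a few df are included here; add rows from the table as needed.
T_CRITICAL_95 = {4: 2.776, 7: 2.365, 14: 2.145}

def confidence_interval_95(values):
    """Return (lower, upper) limits of the 95% CI around the sample mean."""
    n = len(values)
    df = n - 1                                     # degrees of freedom for one mean
    t = T_CRITICAL_95[df]                          # look up t for this df
    sample_mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(n)   # SE = s / sqrt(n)
    return sample_mean - t * se, sample_mean + t * se

sample = [10, 12, 11, 13, 9, 14, 12, 15]  # hypothetical sample, n = 8
lower, upper = confidence_interval_95(sample)
print(round(lower, 2), round(upper, 2))
```

For this sample the mean is 12, s is 2 and the SE is about 0.71, so the interval is 12 plus or minus 2.365 standard errors.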

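Table 1: Values of t for various df and α-levels. The α/2=0.025 column gives a 95% confidence interval (CI P=0.95) and the α/2=0.005 column gives a 99% confidence interval (CI P=0.99).

df α/2=0.1 α/2=0.05 α/2=0.025 α/2=0.01 α/2=0.005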
1 3.078 6.314 12.706 31.821 63.657
2 1.886 2.920 4.303 6.965 9.925
3 1.638 2.353 3.182 4.541 5.841
4 1.533 2.132 2.776 3.747 4.604
5 1.476 2.015 2.571 3.365 4.032
6 1.440 1.943 2.447 3.143 3.707
7 1.415 1.895 2.365 2.998 3.499
8 1.397 1.860 2.306 2.896 3.355
9 1.383 1.833 2.262 2.821 3.250
10 1.372 1.812 2.228 2.764 3.169
11 1.363 1.796 2.201 2.718 3.106
12 1.356 1.782 2.179 2.681 3.055
13 1.350 1.771 2.160 2.650 3.012
14 1.345 1.761 2.145 2.624 2.977
15 1.341 1.753 2.131 2.602 2.947
16 1.337 1.746 2.120 2.583 2.921
17 1.333 1.740 2.110 2.567 2.898
18 1.330 1.734 2.101 2.552 2.878
19 1.328 1.729 2.093 2.539 2.861
20 1.325 1.725 2.086 2.528 2.845
21 1.323 1.721 2.080 2.518 2.831
22 1.321 1.717 2.074 2.508 2.819
23 1.319 1.714 2.069 2.500 2.807
24 1.318 1.711 2.064 2.492 2.797
25 1.316 1.708 2.060 2.485 2.787
26 1.315 1.706 2.056 2.479 2.779
27 1.314 1.703 2.052 2.473 2.771
28 1.313 1.701 2.048 2.467 2.763
29 1.311 1.699 2.045 2.462 2.756
30 1.310 1.697 2.042 2.457 2.750
40 1.303 1.684 2.021 2.423 2.704
50 1.299 1.676 2.009 2.403 2.678
60 1.296 1.671 2.000 2.390 2.660
70 1.294 1.667 1.994 2.381 2.648
80 1.292 1.664 1.990 2.374 2.639
90 1.291 1.662 1.987 2.368 2.632
100 1.290 1.660 1.984 2.364 2.626
infinity 1.282 1.645 1.960 2.326 2.576

Exercise 1: Comparing the size of two protists.
Last week you measured 10 protists, either Paramecium caudatum or P. aurelia.
Fill in the table below with your 10 measurements in ocular units at 10x. For fun let’s use Excel
to calculate the actual length in µm.

Species:
Observation Ocular units µm
1
2
3
4
5
6
7
8
9
10

There is a companion Excel file for this lab, called “Lab 3 T-testinator.” Since the whole point
of this and any lab is to mess things up, download the Excel file and rename it; it is best to include
your name and lab. In a pinch you can download the original file and start over.

Let’s learn to use Excel to calculate the actual length in µm using the conversion factor from the
table above. Use the first sheet in the workbook, “Single sample worksheet”, and enter the species,
the ocular units and the stage micrometer µm.
1. In cell C4, to the right of the cell that says “µm per unit”, type “=C3/C2” without
quotation marks. The ‘=’ sign tells Excel that you are typing a formula, and either
clicking on cells or typing the column letter and row number will tell Excel to use the
contents of those cells in the formula. There is a huge number of functions Excel can
calculate for you, each with its own non-intuitive syntax.
2. Enter the ocular unit measurements. After you have entered the 10 measurements in
ocular units, take a look at the formula that calculates the length in µm by clicking on that
cell. You’ll see the formula uses a named variable, “µm_per_ocular_unit”, so that the
ocular unit measurement is always multiplied by the same value in cell C4.
3. The average of the sample is calculated in cell C18, but this workbook used the
Insert\Name\Define Name menu command to assign the name ‘average’ to that cell, so
now other formulas can refer to that fixed cell value. The first cell
in the column that calculates the difference between each datapoint and the sample mean
uses that variable name. **NB: the browser version of Office 365 does not support
defining names (called ‘Name Manager’ in the Windows version). This is super annoying,
and the reason you need to download ‘T-testinator’ and work with the Desktop version.
4. Take a look at those formulas and paste them down, so now the sheet will calculate the
sum of squares and the variance. Now you enter the formulas to calculate the SD and SE.
In Excel the square root of a value x would be written x^0.5. Now you’re ready to
calculate the confidence interval:

What is the value of t in Table 1 for n=10, α/2 = 0.025? _________________
Use that value of t to calculate the CI95 as described in “The range within which the true mean is found” _____________________
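
The unit conversion in steps 1 and 2 of the worksheet can also be sketched outside Excel. The Python below is only an illustration: the calibration numbers and measurements are made up, and the mapping of variables to cells C2, C3 and C4 is an assumption based on the “=C3/C2” formula described above.

```python
# Hypothetical calibration: 40 ocular units span 400 µm on the stage micrometer.
ocular_units_in_calibration = 40   # assumed to correspond to cell C2
stage_micrometer_um = 400          # assumed to correspond to cell C3

# Same calculation as typing "=C3/C2" into cell C4.
um_per_ocular_unit = stage_micrometer_um / ocular_units_in_calibration

# Hypothetical measurements in ocular units.
measurements_ocular = [18, 20, 21, 19, 22]

# Every measurement is multiplied by the same named constant, which is what
# the worksheet's named variable does for each row.
lengths_um = [units * um_per_ocular_unit for units in measurements_ocular]

print(um_per_ocular_unit, lengths_um)
```

Using one named constant, rather than retyping the conversion factor in every row, means a change in the calibration only has to be made in one place.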
We are now ready to compare the sample means of the two different species and ask if they are
significantly different.

Compare the means of two samples:


To statistically test whether two means are different we need a new measure of uncertainty
around our sample estimates. Instead of the $SE_{\bar{X}}$, the standard error of the sample mean estimate,
we need the $SE_{\bar{X}_1-\bar{X}_2}$, which is to say how variable the estimate of the difference between the
two sample means is.

$$SE_{\bar{X}_1-\bar{X}_2} = \sqrt{\frac{s_1^2}{N_1} + \frac{s_2^2}{N_2}}$$

Where s1 and s2 are the standard deviations of each sample, and N1 and N2 are the sample sizes.
There are different ways to calculate the test statistic, t, depending on assumptions of equal
variances, but let’s use the simplest, not because the assumptions are met, but because it shows
what shapes the test statistic. The simplest test statistic to compare two means is

$$t = \frac{\bar{X}_1 - \bar{X}_2}{SE_{\bar{X}_1-\bar{X}_2}}$$

How will t vary with an increasing difference between the sample means?
______________________________________________________________
How will t vary with increasing variability of the data within each sample?
______________________________________________________________
How will t vary with an increasing sample size (N1 and N2)?
______________________________________________________________
The degrees of freedom in a two-sample t-test are easy to calculate if you know or can assume that the
variances are equal in the two samples. In that case df = N1+N2-2. It has to be approximated if
that assumption can’t be made, and that’s beyond this lab.
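
The two formulas above can be put together in a short Python sketch. The two samples are made up, the function name `two_sample_t` is hypothetical, and the df returned is the simple N1+N2-2 version, which is only appropriate if equal variances can be assumed.

```python
import math
import statistics

def two_sample_t(sample_1, sample_2):
    """Return (t, SE of the difference between means, df) for two samples."""
    n1, n2 = len(sample_1), len(sample_2)
    var1 = statistics.variance(sample_1)  # s1 squared (uses n - 1)
    var2 = statistics.variance(sample_2)  # s2 squared
    # Standard error of the difference between the two sample means.
    se_diff = math.sqrt(var1 / n1 + var2 / n2)
    t = (statistics.mean(sample_1) - statistics.mean(sample_2)) / se_diff
    df = n1 + n2 - 2  # valid only under the equal-variance assumption
    return t, se_diff, df

# Hypothetical samples: the second is the first shifted down by 2 units.
sample_a = [10, 12, 11, 13, 9, 14, 12, 15]
sample_b = [8, 10, 9, 11, 7, 12, 10, 13]

t, se_diff, df = two_sample_t(sample_a, sample_b)
print(round(t, 3), round(se_diff, 3), df)
```

Changing the samples so that the means are farther apart, the data are noisier, or the samples are larger is a quick way to explore the three questions above.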
Work with your table to complete the following table from two samples of Paramecium
caudatum:

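Parameter                         Length of Paramecium caudatum (in µm), sample 1    Length of Paramecium caudatum (in µm), sample 2
Sample size, N
Mean, $\bar{X}$
Standard deviation, s
Variance, $s^2$
95% confidence interval, CI

$SE_{\bar{X}_1-\bar{X}_2}$ = ______________

$t_{\bar{X}_1-\bar{X}_2}$ = ______________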

Now, look up in the t-table above, for 18 df (N1+N2-2), whether the t calculated from your samples is
greater than the critical value of t in the table for α=0.05 (α/2=0.025). The critical value of a test
statistic is the value of the test statistic above which we reject the null hypothesis. Let’s unpack that
definition. A null hypothesis is the hypothesis of no difference, relationship or effect. For the
two-sample mean comparison it is that the two means are equal. The alternative hypothesis is that the means are
different. The critical value depends on the alpha-level, the cutoff we choose for p, which is a measure of
how unlikely our data are if the null hypothesis were in fact true. The p-value is the probability of
obtaining a result as far from the null hypothesis prediction as our data were, or farther, if the null
hypothesis were true. In this case it is asking how often we would observe means as far from each other
or farther if the true means were the same. A low p-value suggests that it would be very unlikely that two
samples from the same population would have means as different as we observed, and provides evidence to
reject the null hypothesis. When the p-value is less than the alpha-level you select, the result is considered
statistically significant. Typically, 0.05 is used as the alpha-level.
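
The decision rule described above (compare the observed t with the critical value for your df) can be written as a tiny Python sketch. The function name `reject_null` is hypothetical, the critical values are copied from the α/2=0.025 column of Table 1 for a handful of df only, and the observed t values are made up.

```python
# Critical values of t for alpha = 0.05 (alpha/2 = 0.025), copied from Table 1.
T_CRITICAL_05 = {10: 2.228, 14: 2.145, 20: 2.086, 30: 2.042}

def reject_null(t_observed, df):
    """True when |t| exceeds the critical value, i.e. p < 0.05 (two-tailed)."""
    # The sign of t only says which mean is larger, so compare the absolute value.
    return abs(t_observed) > T_CRITICAL_05[df]

decision_small_t = reject_null(2.0, 14)   # 2.0 is below the critical value 2.145
decision_large_t = reject_null(-2.5, 14)  # |-2.5| is above 2.145

print(decision_small_t, decision_large_t)
```

A t of 2.0 with 14 df does not reach the critical value, so the null hypothesis of equal means is not rejected; a t of -2.5 does, so it is.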
What’s the critical value of t for 18 df and α=0.05? ______________
How does the test statistic t compare with the critical value?
______________________________________________________________
What’s your conclusion?
______________________________________________________________
If there is time you may be able to test whether the mean lengths of two species, P. caudatum
and P. aurelia, are significantly different. Use the table below for that comparison

Parameter                         Length of Paramecium caudatum (in µm)    Length of Paramecium aurelia (in µm)
Sample size, N
Mean, $\bar{X}$
Standard deviation, s
Variance, $s^2$
95% confidence interval, CI

$SE_{\bar{X}_1-\bar{X}_2}$ = ______________

$t_{\bar{X}_1-\bar{X}_2}$ = ______________

The next part of the lab is adapted from HHMI BioInteractive.

Evolution in Action:
Statistical Analysis Activity
Student Handout

INTRODUCTION
In 1973, Princeton University evolutionary biologists Peter and Rosemary Grant began studying the finches of
the Galápagos archipelago, a group of islands about 600 miles off the coast of Ecuador. They collected
thousands of measurements every year to track changes in the physical characteristics of finch populations over
time. One of their major goals was to collect enough data to identify associations between environmental and
evolutionary changes in finch populations.

For their study, the Grants focused on the medium ground finch (Geospiza fortis), a seed-eating species of finch
on the island of Daphne Major. Every year, the Grants measured physical characteristics like wing length, body
mass, tarsus length (the section of leg between the ankle and knee), and beak depth for hundreds of individual
medium ground finches. Small changes in these structures can be important for survival in different
environments. In addition, these traits tend to vary widely within populations.

In early 1977, a drought began on Daphne Major. The drought lasted for 18 months and caused the type and
abundance of food available to the finches to change rapidly. Medium ground finches prefer to eat the small,
soft seeds of the bushy plant chamaesyce (Chamaesyce amplexicaulis), but the supply of chamaesyce seeds was
extremely limited as a result of the drought. As the drought progressed and the hungry finches quickly ate the
small, soft chamaesyce seeds, one of the only remaining food sources for the medium ground finch became the
seeds of a plant called caltrop (Tribulus cistoides). Caltrop seeds are much larger and harder than those of the
chamaesyce and are covered with pointy spines. Fewer than 20% of the 1,200 medium ground finches on the
island survived the drought of 1977.

The Grants were interested in determining whether there were any differences between the finches that
survived the drought and the finches that did not, and in particular, whether any physical characteristics were
key to survival. To answer this question, they compared the average value of different characteristics in the
finches that survived the drought to the average values of the same characteristics in those that did not survive.
They then applied statistical methods to determine whether the differences they found between the two groups
were likely to be real or merely occurred by chance.

You now have the opportunity to statistically analyze data collected by the Grants.

MATERIALS
• Scientific calculator if not using a computer with a spreadsheet program like Excel or Google spreadsheet
• Graphing paper if not using a computer
• Colored pencils for graphing if not using a computer
• Ruler for graphing if not using a computer

PROCEDURE
Table 1 (on the next page) shows body measurements from 100 medium ground finches living on Daphne Major
in 1976. Fifty of those birds did not survive the 1977 drought (nonsurvivors) and 50 did (survivors). These data
are also provided in an Excel spreadsheet; use either the data in Table 1 or in the Excel spreadsheet to
construct several graphs as outlined in the following pages.

The Origin of Species: Beak of the Finch. Revised December 2017. www.BioInteractive.org

The final sheet in the workbook contains actual data from 100 medium ground finches living on
Daphne Major in 1976. Fifty of those birds did not survive the 1977 drought (nonsurvivors) and 50
did (survivors). Use the T-testinator spreadsheet to complete the table below comparing the samples
of survivors and non-survivors. Figure 1 is an example of a good bar graph, showing how one
continuous trait, dorsal fin height, varies as a function of the explanatory categorical variable
sex. The bars crossing the means are 95% confidence intervals. In the box below the table,
draw a bar graph of how one variable varies between survivors and non-survivors.

[Figure 1: example bar graph, “Mean Dorsal Fin Height Among Male and Female Orca Whales”. y-axis: mean dorsal fin height (m). Caption truncated.]

PART A: Calculating Descriptive Statistics
As you complete steps 1-3 below, enter your calculations in Table 2 for the mean, standard deviation, standard
error of the mean, and/or 95% confidence interval as assigned by your instructor.

Table: Morphological variation in surviving and nonsurviving Darwin’s finches.

Table 2. Descriptive statistics for morphological measurements taken from 100 medium ground finches (Geospiza fortis).
The data are presented in two groups: birds that did not survive the 1977 drought (Nonsurvivors) and birds that survived
the drought (Survivors).

                              Nonsurvivors                        Survivors
Descriptive                   Body    Wing    Tarsus  Beak        Body    Wing    Tarsus  Beak
Statistics                    Mass    Length  Length  Depth       Mass    Length  Length  Depth
                              (g)     (mm)    (mm)    (mm)        (g)     (mm)    (mm)    (mm)
Mean
Variance (s2)                 1.842   5.181   0.701   0.775       3.087   5.448   0.735   0.709
Standard Deviation
Standard Error of the Mean
95% Confidence Interval

1. For the data in Table 1, calculate the mean for each physical characteristic in the nonsurvivor and survivor
group.

2. Calculate the standard deviation for each set of data. The standard deviation measures the mean difference
between each individual measurement and the mean of the entire population. Standard deviation is a way
to quantify how spread out a set of measurements is compared to the mean.

(Note: To calculate the standard deviation for a sample, simply calculate the square root of the variance (s2) for
that sample. In Table 2, the variance has already been calculated.)

3. Calculate the standard error of the mean for each set of data.
Because you are analyzing random samples of 50 birds taken from the entire medium ground finch population
living on Daphne Major, it is not possible to know for certain that the mean you have calculated for each sample
is the same as the mean of the entire medium ground finch population. One way to show how close the sample
mean is to the population mean is to calculate the standard error of the mean (SEM). If you take many random
samples, the SEM is the standard deviation of the different sample means. About 68% of sample means would
be within one standard error of the population mean.

Use the formula below to calculate the SEM:

$$\mathrm{SEM} = \frac{s}{\sqrt{n}}$$

4. Calculate the 95% confidence interval for each set of data.
Confidence limits serve the same purpose as SEM. The 95% CI provides a range of values [within which the mean]
of the entire population is likely to be found. As an approximation, use the simplified formula below to
calculate the 95% confidence [interval, which] is roughly twice the SEM:

$$95\%\ \mathrm{CI} = \frac{2s}{\sqrt{n}}$$

PART B: Graphing the Data
5. On a separate sheet of graph paper or on your computer, construct four bar graphs [comparing the]
means of nonsurvivors and survivors for each physical characteristic (body mass, w[ing length, tarsus length,]
and beak depth). Label both axes of each graph and show either the SEM or 95% CI [depending]
on your instructor’s directions. An example of a well-constructed bar graph is shown [in Figure 1].

6. Once you complete your four bar graphs, describe in the space below any differences [between nonsurvivors]
and survivors you observe in each graph.

Figure caption: _______________________________________________________________
__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________

[1/8" graph paper: space for your bar graph]

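
Steps 2 to 4 of Part A (standard deviation from the variance, then SEM, then the approximate 95% CI) can be sketched in Python as follows. The variance used here is a made-up value rather than one from Table 2, so the table is still yours to fill in, and the function name `sem_and_ci_from_variance` is hypothetical.

```python
import math

def sem_and_ci_from_variance(variance, n):
    """Return (SD, SEM, approximate 95% CI half-width) from a sample variance."""
    sd = math.sqrt(variance)   # step 2: SD is the square root of the variance
    sem = sd / math.sqrt(n)    # step 3: SEM = s / sqrt(n)
    ci_95 = 2 * sem            # step 4: simplified rule, roughly twice the SEM
    return sd, sem, ci_95

# Hypothetical trait: variance of 0.64 in a sample of 50 birds.
sd, sem, ci_95 = sem_and_ci_from_variance(0.64, 50)
print(round(sd, 3), round(sem, 3), round(ci_95, 3))
```

The factor of 2 is an approximation to the value of t; for samples of 50 the α/2=0.025 column of the t-table gives a value very close to 2, so the shortcut changes the interval only slightly.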
1. From our discussion of this example in lecture, and from the video ‘Beak of the finch’,
which variable would you expect to change the most – what was selection supposed to be
acting on?
__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________

2. Of the four variables, which seemed to change the most over the course of the drought?
Why do you say that?
__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________

3. What do you think is a better metric to measure the amount of change: SD (standard deviation)
or SE (standard error), and why? Is it more meaningful to say a mean shifted 1.5 SDs or
1.5 SEs?
__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________

4. Using the answer above, what trait do you now think changed the most, and what
measure are you using to measure change?
__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________

Statistical terms worksheet – make sure you understand these!
Population vs Sample

Mean

Median

Mode

Sum of squares

Variance

Standard deviation (of the sample or population)

Standard error (of the mean)

(How are SD and SE different?)

Confidence interval

t-test

t-test is used to:

t-value is an example of a test statistic. What’s a test statistic? It’s a measure of how different
our observations are from the null hypothesis. Examples include t, F, and χ² (chi-square).
Null hypothesis

Alternative hypothesis

P-value
