Sample Size
This document discusses how to determine your random sample size based on
the overall purpose of your research project. Methods for determining the
random sample size are outlined.
Prepared by:
UW-Stout Office of Planning, Assessment, Research and Quality
Contact:
Susan Greene
Revised: 8/13/2012
3/20/2017
OFFICE OF PLANNING, ASSESSMENT, RESEARCH AND QUALITY
Inspiring Innovation. Learn more at www.uwstout.edu
RANDOM SAMPLE DECISION TREE
Random Sample
· PURPOSE: Sample Generalized to Population → Computations → Type of Data
  · Categorical: nominal data; 2 or more categories; categories not ordered; can assign numbers but the value is meaningless; EX. (yes/no) (male/female)
  · Continuous: evenly spaced categories or is a continuous number; distance between categories is the same
· PURPOSE: Sample Compared to Population → Computations
DEFINITIONS
Population vs. sample: The project population is the group of individuals you want to generalize your results to. These are the people you are interested in describing, comparing, or predicting. The project sample is the part of the population you select to produce the results. Typically, the population is everyone of interest, and the sample is a subset of the population.
Confidence interval: The range in a sample distribution within which the true population value is expected to lie, given a particular degree of confidence (typically 95% or 99%).
Null hypothesis: The project research question stated as a hypothesis that assumes there is no effect or no difference between comparison groups. Statistical analysis tests whether the null hypothesis can be rejected or not. Often symbolized as H0.
Alpha: Probability that you reject the null hypothesis when it is true; this is a false positive. Typically, alpha is set by the researcher prior to any statistical testing; common settings are 0.05 and 0.01. Often symbolized as α.
Beta: Probability that you accept (fail to reject) the null hypothesis when it is false; this is a false negative. Often symbolized as β.
Power: Probability that you reject the null hypothesis when it is false, that is, that you are able to detect a true effect. Often symbolized as 1 - β.
Categorical data: Also called nominal data. Data that has 2 or more categories that are not ordered. You can assign numbers to the categories, but the absolute values have no practical meaning. For example, yes/no responses, male/female.
Margin of error: Tells us about the error due to sampling, that is, how well our sample represents the population.
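As a rough illustration of how alpha, the confidence interval, and the margin of error fit together, the sketch below computes a confidence interval for a sample proportion using the normal approximation. The function name and the example numbers (200 "yes" answers out of 400 responses) are hypothetical and chosen only for illustration; this is not a method prescribed by this document.

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(successes, n, alpha=0.05):
    """Normal-approximation confidence interval for a sample proportion.

    Returns (lower, upper, margin_of_error). This is a large-sample
    approximation, used here for illustration only.
    """
    p_hat = successes / n
    # Two-sided critical value: about 1.96 for alpha = 0.05.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin, margin


# Hypothetical survey: 200 of 400 respondents answered "yes".
lower, upper, margin = proportion_confidence_interval(200, 400)
print(f"95% CI: {lower:.3f} to {upper:.3f} (margin of error {margin:.3f})")
```

With these made-up numbers the margin of error comes out just under 5 percentage points. A smaller alpha (for example 0.01) widens the interval for the same sample.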
COMPUTATIONS
Notes:
1. For surveys or other archival data with more than one type of data, Cochran¹ suggests that
the researcher decide which type of data contains the most critical information for the
success of the project, and base the sample size on that data type. The researcher could also
calculate sample sizes for each type of data and then use the most reasonable number based
on available resources.
2. The results of the chosen estimation method will be for minimum random sample sizes only.
For surveys and longitudinal studies, the researcher will need to increase the sample
size due to non-response and drop-outs. The exact amount of adjustment will depend on
the particular circumstances of the study. It is best to consult with resident experts to
determine the adjustment factor for a specific project.
3. Sample size selection is also dependent on the precision of the measurement tool.
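One simple way to picture the adjustment described in note 2 is to divide the minimum sample size by the response rate you expect. The sketch below does only that; the function name and the 25% response rate are hypothetical, and the right adjustment factor for a real project still depends on its circumstances.

```python
from math import ceil


def inflate_for_nonresponse(min_sample_size, expected_response_rate):
    """Number of people to contact so that, at the expected response rate,
    the number of completed responses reaches the minimum sample size."""
    if not 0 < expected_response_rate <= 1:
        raise ValueError("expected_response_rate must be in (0, 1]")
    return ceil(min_sample_size / expected_response_rate)


# Hypothetical: 385 completed responses needed, 25% expected response rate.
print(inflate_for_nonresponse(385, 0.25))  # 1540
```

The same function can be reused for expected drop-out in a longitudinal study by passing the expected retention rate instead of the response rate.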
You are sending a survey to a random sample of UW-Stout students that contains a series
of yes/no questions. You want to collect enough responses to reasonably generalize the
results of your random sample to the entire UW-Stout student body. Follow the
“Confidence Interval Method -- Categorical Data” methodology.
Your survey contains rating scale questions – for example, Likert-type scale where
1=strongly disagree, 2=disagree, 3=neutral, 4=agree, 5=strongly agree. You want to
collect enough responses from your random sample of UW-Stout students to be confident
in saying that the average ratings represent the opinion of all current Stout students.
Follow the “Confidence Interval Method – Continuous Data” method. Note: if you don’t
agree that these types of survey questions yield continuous data, please use the
“Confidence Interval Method -- Categorical Data” methodology.
¹ Cochran, W. G. (1977). Sampling Techniques (3rd edition). New York: John Wiley & Sons.
Confidence Interval Method
Categorical Data:
Data needed prior to calculations:
· Specify population size
· Specify alpha and margin of error, typically set at 0.05 and 5% respectively.
· Specify variance estimate. For a dichotomous variable use ½ or 0.50 as the estimate of the
population proportion unless you have evidence otherwise.
There are two options for calculating sample size for categorical data – using an online tool, or
doing this by hand.
1. Online tool at http://www.raosoft.com/samplesize.html
2. Hand calculations using the Cochran method outlined in Bartlett, Kotrlik, and Higgins
(2001)²:
$n_0 = \frac{t^2 \times p \times (1 - p)}{d^2}$    (Equation 1)
Where
· 𝑛0 is the minimum estimated sample size
· t is the two-tailed value of the t-distribution corresponding to the chosen alpha level; for alpha = .05 this is 1.96 (the large-sample value)
· p is the estimate of population proportion*
· d is the margin of error; Bartlett et al. recommend using 5%
3. If the estimate 𝑛0 is greater than 5% of the overall population, make the following correction:
$n_1 = \frac{n_0}{1 + n_0 / \text{Population}}$    (Equation 2)
Where
· 𝑛1 is the adjusted minimum estimated sample size
· 𝑃𝑜𝑝𝑢𝑙𝑎𝑡𝑖𝑜𝑛 is the total population size
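The hand calculation in Equations 1 and 2 can be sketched in a few lines of Python. This is an illustrative sketch, not an official tool: the function name is hypothetical, the critical value is taken from the normal distribution as a stand-in for the large-sample t value (about 1.96 at alpha = .05), and rounding the result up to the next whole person is an assumption made here.

```python
from math import ceil
from statistics import NormalDist


def cochran_categorical(population, alpha=0.05, margin_of_error=0.05, p=0.5):
    """Minimum sample size for categorical data (Equations 1 and 2)."""
    # Assumption: the large-sample (normal) critical value stands in for t.
    t = NormalDist().inv_cdf(1 - alpha / 2)
    n0 = (t ** 2) * p * (1 - p) / margin_of_error ** 2  # Equation 1
    if n0 > 0.05 * population:
        # Equation 2: correction when n0 exceeds 5% of the population.
        return ceil(n0 / (1 + n0 / population))
    return ceil(n0)


# Hypothetical population sizes, chosen only to show both branches.
print(cochran_categorical(9000))  # 385, no correction needed
print(cochran_categorical(1500))  # 306, after the Equation 2 correction
```

With the defaults (alpha = .05, 5% margin of error, p = 0.50), Equation 1 gives roughly 384.1 before rounding, whatever the population size. Only the correction step depends on the population.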
² “Organizational Research: Determining Appropriate Sample Size in Survey Research” accessible online at
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.486.8295&rep=rep1&type=pdf
Continuous Data:
Hand computation using the method developed by Cochran and outlined in Bartlett et al.:
$n_0 = \frac{t^2 \times S^2}{d^2}$
Where
· 𝑛0 is the minimum estimated sample size
· t is the two-tailed value of the t-distribution corresponding to the chosen alpha level; for alpha = .05 this is 1.96 (the large-sample value)
· 𝑆 is the estimate of standard deviation
· d is the margin of error
5. If the estimate 𝑛0 is greater than 5% of the overall population, make the following correction:
$n_1 = \frac{n_0}{1 + n_0 / \text{Population}}$    (Equation 5)
Where
· 𝑛1 is the adjusted minimum estimated sample size
· 𝑃𝑜𝑝𝑢𝑙𝑎𝑡𝑖𝑜𝑛 is the total population size
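A minimal sketch of the continuous-data calculation follows. It assumes Cochran's formula in its usual form, n0 = t² × S² / d², together with the finite-population correction of Equation 5. The function name, the standard deviation of 1.0, and the margin of error of 0.15 scale points are hypothetical values for illustration. As in the categorical case, the normal critical value stands in for t, and rounding up is an assumption.

```python
from math import ceil
from statistics import NormalDist


def cochran_continuous(population, std_dev, margin_of_error, alpha=0.05):
    """Minimum sample size for continuous data.

    std_dev and margin_of_error must be in the same units
    (for example, points on the rating scale).
    """
    # Assumption: the large-sample (normal) critical value stands in for t.
    t = NormalDist().inv_cdf(1 - alpha / 2)
    n0 = (t * std_dev / margin_of_error) ** 2
    if n0 > 0.05 * population:
        # Equation 5: correction when n0 exceeds 5% of the population.
        return ceil(n0 / (1 + n0 / population))
    return ceil(n0)


# Hypothetical inputs: S = 1.0 and d = 0.15 points on a 5-point scale.
print(cochran_continuous(9000, std_dev=1.0, margin_of_error=0.15))  # 171
print(cochran_continuous(1500, std_dev=1.0, margin_of_error=0.15))  # 154
```

The result is sensitive to the estimate of S, so it is worth trying a few plausible values before settling on a sample size.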
³ Lenth, R. V. (2001). “Some Practical Guidelines for Effective Sample Size Determination.” The American
Statistician, 55, 187-193.
Purpose: Sample Compared to Population
When your project results are meant to compare the sample to the broader population, use the
methods outlined in this section to select your sample size.
Power Method
Option 1: Free online tool developed by Russell Lenth located at
http://www.cs.uiowa.edu/~rlenth/Power/#Advice
Option 2: Free software to download and run on your PC, information located at
http://www.psycho.uni-duesseldorf.de/abteilungen/aap/gpower3. G-Power offers more options
for selecting test type than the Lenth tool.
You will need to have the following information prior to obtaining your sample size results:
· Statistical test you are interested in running
and then select your final sample size. Or conversely, you can use different effect
sizes and a given sample size and estimate the power, review these keeping in mind
your purpose and resources, and then select your final sample size.
2. Examine published literature related to the study and see what the typical effect sizes
are. Could you reasonably expect the same effect size? If so, use this as your base
absolute effect size.
· Determine if you will have balanced or unbalanced sub-groups. For example, if you are
making comparisons between men and women, will you have equal numbers in your
response sample?
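To get a feel for how effect size, power, and group balance trade off before opening one of the tools above, the sketch below uses the textbook normal approximation for a two-sided comparison of two group means. It is an approximation only and will not exactly match the output of the Lenth tool or G-Power. The function name, the standardized effect sizes, and the `ratio` parameter (size of group 2 divided by size of group 1) are assumptions introduced for illustration.

```python
from math import ceil
from statistics import NormalDist


def two_group_sample_sizes(effect_size, alpha=0.05, power=0.80, ratio=1.0):
    """Approximate group sizes for a two-sided, two-sample comparison of means.

    effect_size is the standardized mean difference (difference / std. dev.).
    ratio is n2 / n1; use 1.0 for balanced sub-groups.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # controls false positives (alpha)
    z_power = z.inv_cdf(power)          # controls false negatives (1 - beta)
    n1 = ceil((1 + 1 / ratio) * ((z_alpha + z_power) / effect_size) ** 2)
    n2 = ceil(ratio * n1)
    return n1, n2


# Hypothetical effect sizes: compare the required group sizes across them.
for d in (0.2, 0.5, 0.8):
    print(d, two_group_sample_sizes(d))

# Unbalanced sub-groups: group 2 twice as large as group 1.
print(two_group_sample_sizes(0.5, ratio=2.0))
```

For a standardized effect size of 0.5 with balanced groups, the approximation gives 63 per group. Smaller expected effects or unbalanced groups push the total required sample size up.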