Data Collection: Populations and Samples
In statistics, a population is the whole set of items that are of interest, e.g. all the
people in a town.
A sample is a selection of observations taken from a subset of the population which is used
to find out information about the population as a whole.
The size of the sample affects the validity of any conclusions drawn, and the size needed
depends on the required accuracy. Generally, the larger the sample, the greater the accuracy.
A larger sample is required if the population is very varied. Different samples could lead to
different conclusions.
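The effect of sample size can be illustrated with a short simulation. This is only a sketch: the population below is made up (10,000 values generated with mean about 50 and standard deviation about 15), and the function name spread_of_sample_means is hypothetical. It repeatedly takes samples of a given size and measures how far apart the sample means are.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the illustration is repeatable

# Hypothetical population (an assumption, not real data)
population = [rng.gauss(50, 15) for _ in range(10_000)]

def spread_of_sample_means(sample_size, repeats=200):
    """Range (max - min) of the means of repeated random samples."""
    means = [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(repeats)
    ]
    return max(means) - min(means)

spread_small = spread_of_sample_means(10)    # small samples
spread_large = spread_of_sample_means(1000)  # large samples
print(spread_small, spread_large)
```

The means of the small samples should vary much more than those of the large samples, which is why different small samples can lead to different conclusions.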
Often sampling units of a population are individually named or numbered to form a list
called a sampling frame.
Sampling
Random sampling helps to remove bias from a sample. There are three methods:
• Simple random sampling
• Systematic sampling
• Stratified sampling
A simple random sample of size n is one where every sample of size n has an equal chance
of being selected.
To carry out a simple random sample, each item in a sampling frame is allocated a unique
number, and numbers are then selected at random.
These random numbers can be selected either using a computer or using lottery sampling
(taking numbers out of a hat).
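As a rough sketch of the computer method, Python's standard random module can pick the numbers. The sampling frame here (100 items numbered 1 to 100), the sample size of 10 and the function name simple_random_sample are all assumptions made for illustration.

```python
import random

def simple_random_sample(sampling_frame, n, seed=None):
    """Choose n distinct items so that every subset of size n is equally likely."""
    rng = random.Random(seed)  # the seed is only there to make the example repeatable
    return rng.sample(sampling_frame, n)

# Hypothetical sampling frame: items numbered 1 to 100
frame = list(range(1, 101))
sample = simple_random_sample(frame, 10, seed=1)
print(sample)
```

Because random.sample selects without replacement, no item can be chosen twice, just as a number taken out of a hat is not put back.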
In systematic sampling, the required elements are chosen at regular intervals from an ordered
list.
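A minimal sketch of one common way to do this: take every k-th element, where k is the population size divided by the sample size, starting from a randomly chosen position within the first k items. The random start, the list of 100 numbered items and the function name systematic_sample are assumptions, not part of the notes above.

```python
import random

def systematic_sample(ordered_list, n, seed=None):
    """Take n elements at a regular interval k from an ordered list."""
    k = len(ordered_list) // n  # sampling interval
    if k < 1:
        raise ValueError("sample size cannot exceed the population size")
    rng = random.Random(seed)
    start = rng.randrange(k)  # random starting index among the first k items
    return [ordered_list[start + i * k] for i in range(n)]

# Hypothetical ordered list: items numbered 1 to 100, sample of 20, so k = 5
frame = list(range(1, 101))
chosen = systematic_sample(frame, 20, seed=0)
print(chosen)
```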
In stratified sampling, the population is divided into mutually exclusive strata, e.g. males
and females, and a random sample is taken from each. The proportion of each stratum sampled
should be the same.
Number sampled in a stratum = (number in stratum / number in population) × overall sample size
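As a worked sketch, the number taken from each stratum is (number in stratum / number in population) × overall sample size. The figures below (60 males, 40 females, a sample of 20) and the function name stratified_allocation are made up for illustration.

```python
def stratified_allocation(stratum_sizes, sample_size):
    """Number to sample from each stratum, in proportion to its size."""
    population_size = sum(stratum_sizes.values())
    return {
        name: round(size * sample_size / population_size)
        for name, size in stratum_sizes.items()
    }

# Hypothetical strata: 60 males and 40 females, overall sample of 20
strata = {"males": 60, "females": 40}
allocation = stratified_allocation(strata, 20)
print(allocation)  # {'males': 12, 'females': 8}
```

When the calculation does not give whole numbers, the rounded values may not add up exactly to the overall sample size, so the totals should be checked by hand.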
Non-random sampling
Quota sampling: the population is divided into groups and an interviewer selects a set
number (quota) from each group so that the sample reflects the characteristics of the whole
population.
Opportunity sampling: taking a sample from people who are available at the time and who fit
the criteria, e.g. the first 20 people outside a supermarket.
Advantages and disadvantages of each method:
Quota sampling
Advantages:
• Allows a small sample to represent the population
• No sampling frame required
• Quick and easy
• Easy comparison between groups
Disadvantages:
• Can introduce bias
• Population must be divided into groups
• Non-responses are not recorded as such
Opportunity sampling
Advantages:
• Easy
• Inexpensive
Disadvantages:
• Unlikely to provide a representative sample
• Highly dependent on the individual researcher
Types of data
When data is presented in a grouped frequency table, the specific data values are not
shown. The groups are known as classes:
• Class boundaries tell you the maximum and minimum values in a class
• The midpoint is the average of the class boundaries
• The class width is the difference between the upper and lower boundaries
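These definitions can be checked with a short calculation. The class boundaries used below (0 to 10, 10 to 20, 20 to 40) and the function name class_summary are invented for illustration.

```python
def class_summary(lower, upper):
    """Return (midpoint, class width) for a class with the given boundaries."""
    midpoint = (lower + upper) / 2  # average of the class boundaries
    width = upper - lower           # upper boundary minus lower boundary
    return midpoint, width

# Hypothetical classes given as (lower boundary, upper boundary)
classes = [(0, 10), (10, 20), (20, 40)]
for lower, upper in classes:
    midpoint, width = class_summary(lower, upper)
    print(lower, upper, midpoint, width)
```

For the class with boundaries 20 and 40, this gives a midpoint of 30 and a class width of 20.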
Some questions will be based on weather data from the large data set provided by Edexcel.