PLANNING & TYPES OF DATA
© 2015-2024 Save My Exams, Ltd. · Revision Notes, Topic Questions, Past Papers
Head to www.savemyexams.com for more awesome resources
Planning an Enquiry
Statistical Enquiry Cycle
What is the statistical enquiry cycle?
The process for statistical investigations in the real world is a cycle, known as the statistical enquiry
cycle
Being a cycle means there is no simple ‘beginning’ and ‘end’
Instead the steps are repeated with improvements made each time
An ‘improve-repeat’ process like this is known as an iterative process
Plan how the data will be represented (graphs, tables, diagrams, etc.)
Reasons should be given for each choice made in a plan
Collecting Data
Design data collection to minimise bias
Be aware of possible issues of sensitivity
Collect primary data using an appropriate method
Consider using secondary data (but only if it is reliable)
Processing and Representing Data
Organise the data and process it according to the plan
Clean the data if necessary
Create diagrams, etc., to represent the data
Calculate summary statistics to allow the data to be compared
Consider your target audience when presenting the data
Acknowledge any sources used (e.g. sources of secondary data)
Use technology where appropriate to save time and avoid errors
Interpreting Results
Interpret your summary statistics, and your diagrams, etc., in the context of the investigation
Draw conclusions that are related to the hypothesis
Make any appropriate inferences and predictions
Be sure to comment on the reliability of the results
Evaluating
Identify any possible issues with how the data was collected, processed and represented
Suggest improvements to deal with those issues
Reflect on how appropriate the data representation(s) were for the target audience
Repeat the process with improvements to investigate the hypothesis further
An exam question may directly mention one or more stages of the statistical enquiry cycle
But you should keep the statistical enquiry cycle process in mind when answering any exam
question
Sensitivity
People may be uncomfortable discussing sensitive topics
Who the data collector is, or how the data is collected, may affect this
Convenience
Some pieces or types of data might be hard to find or collect
What other issues might affect an investigation?
You should also try to anticipate issues that might arise during the statistical enquiry process
And think of proactive ways to deal with these
Being proactive means acting ahead of time, instead of only reacting once a problem has
appeared
Some examples might be:
Difficulties identifying the population you want to study
People may not answer some or all of the questions asked
Some responses or outcomes of an experiment may be unexpected
WORKED EXAMPLE
Guillaume wants to investigate the amount of time students and teachers in his school spend
listening to music.
He is going to write a plan for this investigation.
His hypothesis is
“The amount of time that students spend listening to music is greater
than the amount of time that teachers spend listening to music”.
Write down three other things he should include in his plan.
Explain why each of these things is appropriate.
You must refer to more than one stage of the statistical enquiry cycle.
A question like this doesn’t have a single ‘standard’ answer
To get full marks, the important things are to follow the directions in the question:
Make sure to write down three valid things that should be in his plan
Make sure to give an explanation for each of those things
Make sure to refer to more than one stage of the statistical enquiry cycle
An example from the ‘Collecting Data’ stage could be:
Guillaume should use random sampling to choose the students and teachers to include in his study.
This will help reduce bias, because if he only chooses students and teachers he knows well they may
have similar listening habits to him.
‘Collecting Data’ answers could also involve: what data to collect; how to collect and record the
data; a strategy for processing the data; the importance of acknowledging sources for secondary
data; identifying possible issues of sensitivity in collecting data
An example from the ‘Processing and Representing Data’ stage could be:
Guillaume should use box plots to represent the data. This allows easy visual comparison of the data
for teachers and students, and also allows medians and interquartile ranges to be compared easily.
‘Processing and Representing Data’ answers could also involve: how to organise and/or process the
data; what statistical measures will be calculated to compare the data
An example from the ‘Interpreting Results’ stage could be:
Guillaume should compare the medians and interquartile ranges for the students and teachers. The
medians will show which group spends the higher average amount of time, and the interquartile
ranges will show how spread out the results for the two groups are.
‘Interpreting Results’ answers could also involve: planning to make a prediction or an inference
based on the results of the investigation
Possible answers from the ‘Evaluating’ stage could involve: planning to identify any weaknesses in
the approach used or in the representations chosen; planning to improve the process in order to get
a better sense of how valid the hypothesis is
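The random sampling and the median and interquartile range comparison suggested in the example answers above can be sketched in Python. This is only an illustration under stated assumptions: the ID lists, sample sizes and listening times are made up, and `choose_sample` and `median_and_iqr` are hypothetical helper names. Quartile conventions vary, so the values from `statistics.quantiles` (used here with its default settings) may differ slightly from the method taught on your course.

```python
import random
import statistics


def choose_sample(population, size, seed=None):
    """Pick `size` distinct members at random, each equally likely to be chosen."""
    rng = random.Random(seed)  # a seed makes the choice repeatable for checking
    return rng.sample(population, size)


def median_and_iqr(values):
    """Return (median, interquartile range) for a list of numbers."""
    q1, q2, q3 = statistics.quantiles(values, n=4)  # lower quartile, median, upper quartile
    return q2, q3 - q1


# Hypothetical ID lists standing in for the school's students and teachers.
student_ids = list(range(1, 201))
teacher_ids = list(range(1, 41))
student_sample = choose_sample(student_ids, 20, seed=1)
teacher_sample = choose_sample(teacher_ids, 10, seed=1)

# Made-up daily listening times in minutes, for illustration only.
student_minutes = [30, 45, 60, 75, 90, 120, 150]
teacher_minutes = [10, 20, 30, 40, 50, 60, 70]

student_median, student_iqr = median_and_iqr(student_minutes)
teacher_median, teacher_iqr = median_and_iqr(teacher_minutes)
print("Students:", student_median, student_iqr)
print("Teachers:", teacher_median, teacher_iqr)
```

With these invented figures the students have the higher median, which would support the hypothesis, while the larger interquartile range shows that their times are more spread out.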
Types of Data
Types of Collected Data
What types of data do I need to be familiar with?
There are a number of terms for types of data that you need to be familiar with
You need to recognise and understand them when they appear in exam questions
And be able to use them when writing your answers to questions
Raw data is data in exactly the form that it was collected
i.e. before it has been organised or processed in any way
Raw data can be either quantitative or qualitative
Quantitative data can be recorded as a number
e.g. heights, lengths of time, numbers of people or objects, shoe sizes, etc.
Qualitative data cannot be recorded as a number
e.g. colours, flavours, kinds of animal, makes of car, etc.
Quantitative data can be either continuous or discrete
Continuous data can take any numerical value on a scale
e.g. height, length, weight, mass
For continuous data the measurements can become more and more accurate the more you
'zoom in'
Discrete data can only take on particular numerical values on a scale
Often these are integers (e.g. numbers of people or objects)
But they don't have to be integers (e.g. shoe sizes, which include 'half sizes')
Categorical data is data that can be organised into non-overlapping categories
'Non-overlapping' is important here
Each piece of data can belong to one and only one category
e.g. heights less than 1.7 metres (h < 1.7) and heights greater than or equal to 1.7 metres (h ≥ 1.7)
but not h ≤ 1.7 and h ≥ 1.7 (because a height of 1.7 metres would belong to both
categories)
The categories can be numerical or non-numerical
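The 'one and only one category' rule can be checked with a short Python sketch. The function name `categories_containing` is hypothetical. The two sets of categories are the height examples from the notes above.

```python
def categories_containing(value, categories):
    """Return the names of every category whose rule accepts `value`."""
    return [name for name, rule in categories.items() if rule(value)]


# Non-overlapping categories: every height fits exactly one of them.
good_categories = {
    "h < 1.7": lambda h: h < 1.7,
    "h >= 1.7": lambda h: h >= 1.7,
}

# Overlapping categories: a height of exactly 1.7 m satisfies both rules.
bad_categories = {
    "h <= 1.7": lambda h: h <= 1.7,
    "h >= 1.7": lambda h: h >= 1.7,
}

print(categories_containing(1.7, good_categories))  # ['h >= 1.7']
print(categories_containing(1.7, bad_categories))   # ['h <= 1.7', 'h >= 1.7']
```

A value that comes back with more than one category name (or with none) shows that the categories are not suitable for categorical data.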
Ordinal data is data that can be written in order
If the data is numbers, these can be ordered in the usual way
If the data is not numbers, then it must be possible to apply a numerical 'rating scale'
e.g. a scale of 1 to 5 with 1 as 'disagree strongly' and 5 as 'agree strongly'
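As a rough illustration of a rating scale, the sketch below maps survey responses to the numbers 1 to 5 so that they can be put in order. Only the labels for 1 and 5 come from the notes. The three middle labels and the list of responses are assumptions made for this example.

```python
import statistics

# Rating scale: the labels for 2, 3 and 4 are assumed for illustration.
RATING_SCALE = {
    "disagree strongly": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "agree strongly": 5,
}

# Made-up survey responses.
responses = ["agree", "disagree strongly", "agree strongly", "neutral", "agree"]

# Because each label has a number, the responses can be written in order.
ordered = sorted(responses, key=RATING_SCALE.get)
median_rating = statistics.median(RATING_SCALE[r] for r in responses)

print(ordered)
print(median_rating)  # 4
```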
Bivariate data is data that is collected as pairs of values
This could be data collected to investigate
the relationship between two variables
how changes in one variable affect the other variable
e.g. age of car and cost of annual maintenance, train ticket price and length of journey, etc.
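A minimal sketch of how bivariate data might be stored, using the car example above. The ages and costs are invented, and the variable names are hypothetical. The point is that each pair belongs to one car, so the two values must always stay together.

```python
# Made-up data: one entry per car, in the same order in both lists.
ages_years = [5, 1, 10, 3, 8]
annual_cost_pounds = [300, 150, 610, 220, 480]

# Each car must have both values recorded.
assert len(ages_years) == len(annual_cost_pounds)

# Pair the values so each (age, cost) tuple describes a single car.
pairs = list(zip(ages_years, annual_cost_pounds))

# Sorting by age moves whole pairs, so each cost stays with its own car.
pairs_by_age = sorted(pairs)
print(pairs_by_age)
```

Sorting the two lists separately would break the pairing and make the data useless for studying the relationship.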
What is the difference between primary data and secondary
data?
For the exam, you need to know the difference between primary data and secondary data
This includes recognising the advantages and disadvantages of each
Primary data is data that is collected either by the person who is going to use it, or specifically for the
person who is going to use it
Advantages of primary data:
Can be gathered specifically for the question you are trying to answer
The level of accuracy will be known
The collection method will be known
Disadvantages of primary data:
Collecting data can require a lot of time
It can also be expensive
Secondary data is data that has been collected by somebody else
Some possible sources for secondary data:
the internet
print media (newspapers, magazines, etc.)
databases
research articles
census returns
Advantages of secondary data:
Can be quicker to obtain (i.e. less time)
Can be easier to obtain (i.e. more convenient)
Less expensive than collecting data yourself
May be more accurate than data you collect yourself (depending on the source)
Disadvantages of secondary data:
May be hard to find relevant data for your specific question
The data may be out of date
The level of accuracy may not be known (e.g. the data may have been rounded)
The collection method may not be known
The source of the data may not be reliable
If you use secondary data, it is always necessary to acknowledge the source that the data was
taken from
WORKED EXAMPLE
(a) Which of the following words can be used to describe the data in the following examples?
quantitative qualitative continuous discrete
More than one word might be applicable in each case.
(i) The weights of dogs participating in a dog show.
Weight is recorded by a number, so it is quantitative data
And weight can take on any value, so it is continuous
quantitative, continuous
(ii) The favourite ice cream flavours of the students in a school.
Flavour is not recorded as a number, so it is qualitative data
And only quantitative data can be discrete or continuous
qualitative
and the frequency of each category (i.e., the number of values in each category) is reported
The categories are known as classes
The intervals defining what goes into what class are known as class intervals
Advantages of using grouped data:
The distribution of the data can be seen more clearly
Patterns in the data can be spotted more easily
Disadvantages of using grouped data:
The exact data values are no longer visible
You can only see how many values fall within each class
Statistics calculated from grouped data are less precise
e.g. mean, median and mode from grouped data can only be estimates
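To see why a mean from grouped data is only an estimate, the sketch below compares the exact mean of some made-up raw values with the estimate obtained from a grouped table using class midpoints. The data, the class intervals and the function name `estimated_mean` are all assumptions for illustration.

```python
import statistics


def estimated_mean(grouped):
    """Estimate the mean from (lower, upper, frequency) rows using class midpoints."""
    total_frequency = sum(freq for _, _, freq in grouped)
    weighted_sum = sum((lower + upper) / 2 * freq for lower, upper, freq in grouped)
    return weighted_sum / total_frequency


# Made-up raw data and the same data grouped into classes lower <= x < upper.
raw_values = [1, 3, 8, 11, 12, 13, 14, 19, 21, 22]
grouped = [(0, 10, 3), (10, 20, 5), (20, 30, 2)]

print(statistics.mean(raw_values))  # exact mean: 12.4
print(estimated_mean(grouped))      # estimate from midpoints: 14.0
```

The estimate treats every value as if it sat at the middle of its class, which is why it differs from the exact mean once the raw values are no longer visible.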
What things are important when grouping data?
You must be careful when selecting the class intervals for grouped data
The class intervals must not overlap
For discrete data make sure no data value occurs in more than one class interval
e.g. 0-10, 11-20, 21-30, etc.
For continuous data the class intervals also must not have any gaps between them
Consider how many class intervals to use for grouping the data
If there are too many intervals (too much detail)
or too few intervals (not enough detail)
then it can be hard to spot trends in the data
Class intervals do not all need to be the same width
You will often see grouped data where the class intervals have equal widths
This is appropriate when the data is roughly evenly spread out
But sometimes unequal class widths might be more appropriate
e.g. when most of the data values are clustered 'in the middle'
It might make more sense to have wider intervals at the start and end
and narrower intervals in the middle
Too many or too few data values falling into certain class intervals
can make the data representation less useful
Also be careful with class intervals when working with rounded data values
All values that might round to a particular value must fall within the same class interval
e.g. if the data is time rounded to the nearest second
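The rule that class intervals for continuous data must neither overlap nor leave gaps can be checked with a short sketch like the one below. It assumes each interval is written as a pair (lower, upper) meaning lower ≤ x < upper, and `check_class_intervals` is a hypothetical name. It is not suitable for discrete intervals such as 0-10, 11-20, where a 'gap' between 10 and 11 is fine.

```python
def check_class_intervals(intervals):
    """Check (lower, upper) intervals, each meaning lower <= x < upper.

    Returns a list of problems; an empty list means no gaps and no overlaps.
    """
    problems = []
    ordered = sorted(intervals)
    # Compare each interval with the one that follows it.
    for (low_a, high_a), (low_b, high_b) in zip(ordered, ordered[1:]):
        if low_b < high_a:
            problems.append(f"overlap between {low_a}-{high_a} and {low_b}-{high_b}")
        elif low_b > high_a:
            problems.append(f"gap between {high_a} and {low_b}")
    return problems


print(check_class_intervals([(0, 10), (10, 20), (20, 30)]))  # []
print(check_class_intervals([(0, 10), (11, 20)]))            # ['gap between 10 and 11']
print(check_class_intervals([(0, 10), (9, 20)]))             # ['overlap between 0-10 and 9-20']
```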
WORKED EXAMPLE
Hazel and Avelaine have been collecting data on the weights of walnuts. After rounding all the
weights to the nearest gram, the weights in their data set (in grams) are as follows:
9 13 17 11 15 16 22 18 14 16 15 19
14 13 10 15 20 14 16 13 12 18 16 12
(a) Avelaine suggests using the following table to group the data:
w < 10
10 ≤ w < 13
13 ≤ w < 15
15 ≤ w < 17
17 ≤ w < 20
w ≥ 20
Based on the nature of the data, suggest one problem with Avelaine's table.
Remember that rounded and unrounded values need to fall within the same class interval
The unrounded weight of any nut could be up to 0.5 grams more or less than the rounded value
Avelaine's table doesn't take account of the rounding of the data.
For example, a 9.7 g nut would fall in the w < 10 class interval, but the rounded value (10 g) would fall in
the 10 ≤ w < 13 class interval.
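The problem with Avelaine's table can be demonstrated with a small sketch. It assumes the class boundaries are stored as a sorted list and uses the standard library function `bisect_right` to find the class. The helper name `class_index` is hypothetical.

```python
from bisect import bisect_right


def class_index(weight, boundaries):
    """Return the position of the class containing `weight`.

    Class 0 is everything below boundaries[0]; class k covers
    boundaries[k-1] <= weight < boundaries[k]; the last class is open-ended.
    """
    return bisect_right(boundaries, weight)


# Boundaries of Avelaine's classes: w < 10, 10 <= w < 13, ..., w >= 20.
avelaine_boundaries = [10, 13, 15, 17, 20]

unrounded = 9.7
rounded = round(unrounded)  # 10, the value that was actually recorded

print(class_index(unrounded, avelaine_boundaries))  # 0, i.e. w < 10
print(class_index(rounded, avelaine_boundaries))    # 1, i.e. 10 <= w < 13
```

The two results differ, so the recorded value lands in a different class from the nut's true weight.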
(b) Hazel suggests using the following table instead:
w < 9.5
9.5 ≤ w < 12.5
12.5 ≤ w < 14.5
14.5 ≤ w < 16.5
16.5 ≤ w < 19.5
w ≥ 19.5
Also make sure your frequencies total up to 24 (the number of data values in the list)
Class interval      Frequency
w < 9.5             1
9.5 ≤ w < 12.5      4
12.5 ≤ w < 14.5     6
14.5 ≤ w < 16.5     7
16.5 ≤ w < 19.5     4
w ≥ 19.5            2
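The tally for Hazel's class intervals can be checked with a short sketch. The weights are the 24 values listed in the question and the boundaries are those of Hazel's table. Storing the boundaries in a list and counting with `bisect_right` is just one possible approach.

```python
from bisect import bisect_right

# The 24 rounded weights (in grams) from the question.
weights = [9, 13, 17, 11, 15, 16, 22, 18, 14, 16, 15, 19,
           14, 13, 10, 15, 20, 14, 16, 13, 12, 18, 16, 12]

# Boundaries of Hazel's classes, with a label for each class.
boundaries = [9.5, 12.5, 14.5, 16.5, 19.5]
labels = ["w < 9.5", "9.5 ≤ w < 12.5", "12.5 ≤ w < 14.5",
          "14.5 ≤ w < 16.5", "16.5 ≤ w < 19.5", "w ≥ 19.5"]

# bisect_right counts how many boundaries are <= w, which is the class position.
frequencies = [0] * len(labels)
for w in weights:
    frequencies[bisect_right(boundaries, w)] += 1

for label, frequency in zip(labels, frequencies):
    print(label, frequency)
print("Total:", sum(frequencies))  # should match the number of data values
```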
The researcher suspects that changes in this variable will cause changes in the other variable
The explanatory variable is thought to 'explain' why the other variable changes
The second variable is called the response variable (or dependent variable)
This is the variable that the researcher measures after changes have been made in the
explanatory variable
The researcher suspects that this variable will be affected by changes in the explanatory
variable
The response variable 'responds' to changes in the explanatory variable
For example, a researcher wants to study the effects of different types of running shoe on how
long it takes runners to run 100 metres
The explanatory variable is the type of running shoe
The response variable is the time taken to run 100 m
Any other variables in an experiment are known as extraneous variables
These should be eliminated or minimised so they don't affect the results
You need to be very careful with explanatory and response variables when drawing a scatter diagram
The explanatory variable MUST be on the x-axis
And the response variable MUST be on the y-axis
WORKED EXAMPLE
In each of the following experiments, state which variable is the explanatory variable and which is the
response variable.
(a) An engineer wishes to study whether temperature has an effect on charging times for mobile
phone batteries.
Explanatory variable: temperature
Response variable: how long it takes the batteries to charge
(b) An education researcher wants to see whether a new AI study app improves students' scores on a
maths test.
Explanatory variable: whether or not a student has used the app
Response variable: scores on the test
(c) A naturalist wants to explore whether the number of offspring successfully raised by breeding
pairs of a particular species of bird depends on the percentage of tree cover in the region where the
birds live.