Statistical Method Book For Lectures
A CONCISE COURSE IN
ADVANCED LEVEL
STATISTICS
J CRAWSHAW BSc
J CHAMBERS MA
Text© J Crawshaw and J Chambers 1984, 1990, 1994, 2001
Original illustrations © Nelson Thornes Ltd 1994, 2001
Text © ICT Statistics Supplement, Douglas Butler, 2001

Contents

3 Probability
11 Hypothesis testing (z-tests and t-tests)
   Hypothesis testing
   One-tailed and two-tailed tests
   Critical z-values
   Summary of critical values and rejection criteria
   Stages in the hypothesis test
   Hypothesis test 1: testing μ (the mean of a population)
   Type I and Type II errors
   Hypothesis test 2: testing a binomial proportion p when n is large
   Hypothesis test 3: testing μ1 - μ2, the difference between means of two normal populations
   Summary
12 The χ² significance test
   The χ² significance test
   Performing a χ² goodness-of-fit test
   Summary of the procedure for performing a χ² goodness-of-fit test
   Test 1 - goodness-of-fit test for a uniform distribution
   Test 2 - goodness-of-fit test for a distribution in a given ratio
   Test 3 - goodness-of-fit test for a binomial distribution
   Test 4 - goodness-of-fit test for a Poisson distribution
   Test 5 - goodness-of-fit test for a normal distribution
   Summary of the number of degrees of freedom for a goodness-of-fit test
   The χ² significance test for independence
   Summary
13 Significance tests for correlation coefficients
   Significance tests for correlation coefficients
   Test for the product-moment correlation coefficient, r
   Spearman's coefficient of rank correlation, rs
   Summary
ICT statistics supplement
Appendix
   Cumulative binomial probabilities
   Cumulative Poisson probabilities
   The standard normal distribution function
   Critical values for the normal distribution
   Critical values for the t-distribution
   Critical values for the χ² distribution
   Critical values for correlation coefficients
   Random numbers
Answers

Preface

Introduction
This fully revised and updated edition of A Concise Course in Advanced Level Statistics is a
comprehensive text for use primarily by students and teachers of Advanced Level
Mathematics, both at AS and A2 level. It also provides a useful support for those studying
statistics as part of science, social science and humanities courses.

Features
Points of theory are explained concisely and illustrated clearly by worked examples, many
taken from Advanced Level papers.
Carefully graded exercises help you to consolidate ideas and gain experience in applying
theory to different situations.
Frequent hints pinpoint common misunderstandings and reinforce ideas.
Key concepts and formulae are highlighted in colour to increase clarity. Frequent
summaries provide a quick reference.
Extensive miscellaneous exercises and end-of-chapter tests provide practice in tackling
examination questions, providing essential examination preparation.
Answers to all exercises are provided.
An ICT supplement explores the use of ICT in the study of statistics.

Specifications
The text covers the main theory required in the specifications of all the examination boards
for the statistics sections of AS and A2 Mathematics.

Examination Questions
We are grateful to the following Awarding Bodies for permission to reproduce questions from
their past examinations:
Assessment and Qualifications Alliance (AQA), including Northern Examinations and
Assessment Board (NEAB/JMB) and Associated Examining Board (AEB)
The Edexcel Foundation including University of London Examinations and Assessment
Councils (L)
Mathematics in Education and Industry (MEI)
Oxford, Cambridge and RSA (OCR) including University of Cambridge Local
Examinations Syndicate (C), Oxford & Cambridge Schools Examination Board (O & C)
and Oxford Delegacy of Local Examinations (O)
Welsh Joint Education Committee (WJEC)
All answers and worked solutions provided for examination questions are the responsibility of
the authors.

We hope that you will enjoy using this text and that it will enhance your understanding of
statistics and give you confidence to succeed.

J Crawshaw & J Chambers
2001

Representation and summary of data
pie charts
cumulative frequency
DISCRETE DATA
In a survey of 1 m quadrats in a field the number of snails in each of 30 quadrats was recorded
as follows:
1124023 1 4 2 3 5 2 2 3 2
231232 0 1 1 2 0 3 2 3 3
This is an example of discrete raw data.
Discrete data can take only exact values, for example
the number of cars passing a checkpoint in 30 minutes,
the shoe sizes of children in a class,
the number of tomatoes on each plant in a greenhouse.
The data are known as raw because they have not been ordered in any way.
Frequency distribution for discrete data
To illustrate the data more concisely, count the number of times each value occurs and
summarise these in a table, known as a frequency distribution.

Number of snails   0   1   2   3   4   5
Frequency          3   5  11   8   2   1   Total 30

[Figures: vertical line graph and bar chart, each showing frequency against number of snails]

Notice that
• in the vertical line graph the distinct lines reinforce the discrete nature of the variable,
• in the bar chart the bars are all the same width and they are labelled in the middle of the
  bar on the horizontal axis.

The mode
The mode is the value that occurs most often.
The mode is the most popular value, deriving from the French 'à la mode' meaning
fashionable. It is easy to see from the diagrams above that the mode is 2 snails per quadrat.

CONTINUOUS DATA
The following data were obtained in a survey of the heights of 20 children in a sports club.
Each height was measured to the nearest centimetre.
133 136 120 138 133 131 127 141 127 143
130 131 125 144 128 134 135 137 133 129
This is an example of continuous raw data.
Continuous data cannot take exact values but can be given only within a certain range or
measured to a certain degree of accuracy.
For example, the measurement 144 cm (given to the nearest cm) could have arisen from any
value in the interval 143.5 cm ≤ h < 144.5 cm.
Other examples of continuous data are
the speed of a vehicle as it passes a checkpoint,
the mass of a cooking apple,
the time taken by a volunteer to perform a task.

Height (cm)          Height (cm)    Height (to the nearest cm)
119.5 ≤ h < 124.5    119.5-124.5    120-124
124.5 ≤ h < 129.5    124.5-129.5    125-129
129.5 ≤ h < 134.5    129.5-134.5    130-134
134.5 ≤ h < 139.5    134.5-139.5    135-139
139.5 ≤ h < 144.5    139.5-144.5    140-144

The values 119.5, 124.5, 129.5, ... are called the class boundaries or the interval boundaries.
The upper class boundary (u.c.b.) of one interval is the lower class boundary (l.c.b.) of the
next interval.

Width of an interval
The width of an interval is the difference between the boundaries.
Width of an interval = upper class boundary - lower class boundary
Often intervals with equal widths are chosen, as in the above illustrations in which each width
is 5 cm.
To group the heights it helps to use a tally column, entering the numbers in the first row
133, 136, 120, ... etc. and then the second row. It is a good idea to cross off each number in
the list as you enter it. The frequency distribution for the above data should read:

Height (cm)          Tally       Frequency
119.5 ≤ h < 124.5    |           1
124.5 ≤ h < 129.5    |||||       5
129.5 ≤ h < 134.5    ||||| ||    7
134.5 ≤ h < 139.5    ||||        4
139.5 ≤ h < 144.5    |||         3
                                 Total 20
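The grouping of the 20 heights can also be checked mechanically. The following Python sketch is an illustration only and is not part of the original text: the helper name `class_boundaries` and the variable names are assumptions, and it relies on the statement above that each height was recorded to the nearest centimetre, so that a class written as 120-124 has boundaries 119.5 and 124.5.

```python
# Heights of the 20 children, to the nearest centimetre (taken from the text).
heights = [133, 136, 120, 138, 133, 131, 127, 141, 127, 143,
           130, 131, 125, 144, 128, 134, 135, 137, 133, 129]

# Classes as they would be written for values rounded to the nearest cm.
recorded_classes = [(120, 124), (125, 129), (130, 134), (135, 139), (140, 144)]


def class_boundaries(low, high, precision=1.0):
    """Return (lower, upper) class boundaries for a class of rounded values.

    `precision` is the rounding unit (1 cm here), so each recorded limit
    is moved outwards by half a unit.
    """
    half = precision / 2
    return low - half, high + half


boundaries = [class_boundaries(lo, hi) for lo, hi in recorded_classes]

# Width of an interval = upper class boundary - lower class boundary.
widths = [upper - lower for lower, upper in boundaries]

# Tally: count the heights h with lower <= h < upper in each class.
frequencies = [sum(1 for h in heights if lower <= h < upper)
               for lower, upper in boundaries]

for (lower, upper), f in zip(boundaries, frequencies):
    print(f"{lower} <= h < {upper}: {'|' * f} {f}")
print("Total", sum(frequencies))
```

Counting with `lower <= h < upper` mirrors the convention that the upper class boundary of one interval is the lower class boundary of the next, so no value can fall into two classes.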
It is important to note that when the data are presented only in the form of a grouped
frequency distribution, the original information has been lost. For example you would know
that there was one item in the first interval, but you would not know what it was. You would
know only that it was between 119.5 cm and 124.5 cm.

STEM AND LEAF DIAGRAMS (STEMPLOTS)
A very useful way of grouping data into classes while still retaining the original data is to
draw a stem and leaf diagram, also known as a stemplot.
These are the marks of 20 students in an assignment:
84 17 38 45 47 53 76 54 75 22
66 65 55 54 51 44 39 19 54 72
Notice that the lowest mark is 17 and the highest mark is 84.
In stem and leaf diagrams, all the intervals must be of equal width, so it seems sensible to
choose intervals 10-19, 20-29, 30-39, ... , 80-89 for this data.
Take the stem to represent the tens and the leaf to represent the units.

The first five entries 84, 17, 38, 45 and 47 are represented like this:

Stem | Leaf
  1  | 7
  2  |
  3  | 8
  4  | 5 7
  5  |
  6  |
  7  |
  8  | 4

When all the numbers have been entered the diagram looks like this:

Stem and leaf diagram to show assignment marks
Stem | Leaf                Key 1|7 means 17 marks
  1  | 7 9
  2  | 2
  3  | 8 9
  4  | 4 5 7
  5  | 1 3 4 4 4 5
  6  | 5 6
  7  | 2 5 6
  8  | 4

The stemplot gives a good idea at a glance of the shape of the distribution. It is easy to pick
out the smallest and largest values and to see that the mode is 54. It is also obvious that the
modal class is 50-59.

Example 1.1
The maximum temperature in °C, measured to the nearest degree, was recorded each day
during June in Sutton with the following results:
19 23 19 19 20 12 19 22 22 16 18 16 19 20 17
13 14 12 15 17 16 17 19 22 22 20 19 19 20 20
Draw a stem and leaf diagram to illustrate the temperatures and write down the modal
temperature.

Solution 1.1
The smallest value is 12 and the highest value is 23. Grouping the data into intervals
10-19, 20-29, ... would give you very little information.
Choose a sensible number of intervals; usually between 5 and 10. Since you must use intervals
with equal width, you could use intervals of 2 °C and consider 12-13, 14-15, 16-17, 18-19,
20-21, 22-23.
First do a preliminary plot and then arrange the entries in each leaf in order.
NOTE: The stem does not necessarily represent the tens digit. For example, suppose you
want to use intervals 12-14, 15-17, 18-20, 21-23. The interval 18-20 cannot be represented
by a stem of 1, since the tens digit changes during the interval. For the stem you can use 12,
15, 18 and 21. The leaf is then given as the number that is added to the stem.

Stem | Leaf                           Key 15|2 means 17 °C
 12  | 0 0 1 2                            18|0 means 18 °C
 15  | 0 1 1 1 2 2 2
 18  | 0 1 1 1 1 1 1 1 1 2 2 2 2 2
 21  | 1 1 1 1 2
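The rule used in the NOTE to Solution 1.1 (the stem is the lowest value of its interval and the leaf is the amount added to the stem) can be sketched in a few lines of Python. This is an illustrative sketch rather than anything from the original text: the function name `stemplot` and its parameters `width` and `start` are assumptions, and it is applied here to the June temperatures with intervals 12-14, 15-17, 18-20, 21-23.

```python
from collections import defaultdict


def stemplot(values, width, start):
    """Group `values` into intervals of equal `width` beginning at `start`.

    Each stem is the lowest value of its interval and each leaf is the
    amount added to the stem, so a stem need not be a tens digit.
    Returns a dict mapping stem -> sorted list of leaves.
    """
    rows = defaultdict(list)
    for v in values:
        stem = start + ((v - start) // width) * width
        rows[stem].append(v - stem)
    return {stem: sorted(leaves) for stem, leaves in sorted(rows.items())}


# Maximum daily temperatures in degrees C for June (Example 1.1).
temperatures = [19, 23, 19, 19, 20, 12, 19, 22, 22, 16, 18, 16, 19, 20, 17,
                13, 14, 12, 15, 17, 16, 17, 19, 22, 22, 20, 19, 19, 20, 20]

plot = stemplot(temperatures, width=3, start=12)
for stem, leaves in plot.items():
    print(f"{stem:>2} | {' '.join(str(leaf) for leaf in leaves)}")
print("Key: 15 | 2 means 17 degrees C")
```

Calling the same function with `width=10, start=10` on the assignment marks gives stems 10, 20, ..., 80, whose leaves are the units digits, which corresponds to the tens-and-units stemplot described earlier.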
NOTE: The key is essential in explaining how the stemplot has been formed.

In a stem and leaf diagram, or stemplot
• equal intervals must be chosen,
• a key is essential.

Example 1.2
The table gives the number of days on which rain fell in 36 consecutive intervals of 30 days.
21 19  6 12  8 18  9  8 11 17 15 13
16  9 17 18  9 24 17  7  8 17 17  8
 7 11 16 17  8  5 13 22 20 16 20 13
Draw stem and leaf diagrams with the following class intervals:
(a) 5-9, 10-14, 15-19, 20-24
(b) 4-6, 7-9, 10-12, 13-15, 16-18, 19-21, 22-24.

Solution 1.2
(a) Using intervals 5-9, 10-14, 15-19, 20-24 the completed stem and leaf diagram is:

Stem | Leaf                          Key 1|6 means 16
  0  | 5 6 7 7 8 8 8 8 8 9 9 9
  1  | 1 1 2 3 3 3
  1  | 5 6 6 6 7 7 7 7 7 7 8 8 9
  2  | 0 0 1 2 4

NOTE: The stem and leaf diagram could have been written differently, as follows:

Stem | Leaf                          Key 15|1 means 16
  5  | 0 1 2 2 3 3 3 3 3 4 4 4            5|3 means 8
 10  | 1 1 2 3 3 3
 15  | 0 1 1 1 2 2 2 2 2 2 3 3 4
 20  | 0 0 1 2 4

(b) Using intervals 4-6, 7-9, 10-12, ... the completed diagram, arranged in order, is:

Stem | Leaf                          Key 13|2 means 15
  4  | 1 2
  7  | 0 0 1 1 1 1 1 2 2 2
 10  | 1 1 2
 13  | 0 0 0 2
 16  | 0 0 0 1 1 1 1 1 1 2 2
 19  | 0 1 1 2
 22  | 0 2

Both diagrams show that the mode is 17 rainy days, but the seven intervals used in (b)
show more clearly the two peaks, illustrating that the distribution is approximately
bi-modal, with modal classes 7-9 and 16-18.

Example 1.3
Look at this stem and leaf diagram and for each of the three keys provided, give
(a) the value ringed,
(b) the width of the interval containing the ringed value.

Stem | Leaf                 (the ringed value is shown in brackets)
  0  | 7
  0  | 9
  1  | 0 1
  1  | 2 2
  1  | 4 4 4 5 5
  1  | 6 6 7 7 7
  1  | 8 8 8 8 9 9 (9)
  2  | 0 0 1 1
  2  | 2 3
  2  | 4

(i) the widths of 30 metal components
    Key 1|2 means 1.2 cm
(ii) the reaction times of 30 volunteers
    Key 1|2 means 12 hundredths of a second
(iii) the attendance at 30 matches
    Key 1|2 means 1200 people

Solution 1.3
(i) (a) 1|9 means 1.9 cm.
    (b) The interval is 1.8 cm-1.9 cm. Since width is a continuous variable, and assuming
        that widths have been measured to the nearest tenth of a centimetre, then
        1.75 cm ≤ width < 1.95 cm and the class width is 2 mm.
(ii) (a) 1|9 means 19 hundredths of a second, i.e. 0.19 seconds.
    (b) The interval is 0.18 sec-0.19 sec, i.e. 0.175 ≤ time < 0.195, so the class width is 0.02
        seconds.
(iii) (a) 1|9 means 1900 people.
    (b) The interval is 1800 people-1900 people. Assuming that the number has been given
        to the nearest hundred, then 1750 ≤ number < 1950, so the class width is 200 people.

Back-to-back stemplots
Stem and leaf diagrams can be used to compare two samples by showing the results together
on a back-to-back stemplot.

Example 1.4
Use a stem and leaf diagram to compare the examination marks in French and English for a
class of 20 pupils.

French   75 69 58 58 46 44 32 50 53 78
         81 61 61 45 31 44 53 66 47 57

English  52 58 68 77 38 85 43 44 56 65
         65 79 44 71 84 72 63 69 72 79
Solution 1.4
The first four entries for French (75, 69, 58, 58) and for English (52, 58, 68, 77) are entered
into a back-to-back stemplot as follows:

     French |   | English
            | 3 |
            | 4 |
        8 8 | 5 | 2 8
          9 | 6 | 8
          5 | 7 | 7
            | 8 |
Key (French) 9|6 means 69        Key (English) 5|2 means 52

The completed diagram, before rearranging, is:

     French |   | English
        1 2 | 3 | 8
  7 4 5 4 6 | 4 | 3 4 4
7 3 3 0 8 8 | 5 | 2 8 6
    6 1 1 9 | 6 | 8 5 5 3 9
        8 5 | 7 | 7 9 1 2 2 9
          1 | 8 | 5 4

The final diagram, arranged in order:

     French |   | English
        2 1 | 3 | 8
  7 6 5 4 4 | 4 | 3 4 4
8 8 7 3 3 0 | 5 | 2 6 8
    9 6 1 1 | 6 | 3 5 5 8 9
        8 5 | 7 | 1 2 2 7 9 9
          1 | 8 | 4 5
Key (French) 8|5 means 58        Key (English) 6|3 means 63

From the diagram it is clear that the class had higher marks in English than in French and it
appears that they performed better in English. This would, however, depend on the standards
of marking used in the two examinations.

Exercise 1a

1. (a) Draw a stemplot to show the masses, correct to the nearest kilogram, of 30 men.
       Use intervals 50-54, 55-59, 60-64, ...
   (b) Write down the modal mass.
   74 52 67 68 71 76 86 81 73
   68 64 75 71 61 63 57 67 57
   59 72 79 64 70 74 77 79 65
   68 76 83

2. A teacher recorded the times taken by 20 boys to swim one length of the pool.
   The times are given to the nearest second.
   Using intervals 24-25, 26-27, ..., draw a stem and leaf diagram to illustrate the results.
   32 31 26 27 27 32 29 26 25 25
   29 31 32 26 30 24 32 27 26 31

3. A group of adults took part in an experiment which measured their reaction times. The
   results were given to the nearest hundredth of a second.
   0.14 0.17 0.21 0.20 0.20 0.22
   0.14 0.24 0.26 0.17 0.14 0.17
   0.21 0.20 0.22 0.14 0.24 0.26
   0.17 0.18 0.17 0.21 0.20 0.23
   0.17 0.23 0.21 0.23 0.24 0.23
   Use intervals 0.14-0.15, 0.16-0.17, 0.18-0.19, ... to draw a stemplot to illustrate the
   results. Comment on your diagram.

4. In a lesson on measurement, 30 pupils estimated the length of a line in centimetres and
   wrote down their value correct to the nearest mm.
   Using intervals 3.0-3.9, 4.0-4.9, ..., draw a stemplot.
   9.2 7.3 7.0 6.5 5.4 5.3 10.1 8.4
   8.8 7.1 7.6 7.9 6.7 9.6 5.5 7.4
   7.0 8.2 5.5 7.8 8.2 7.5 6.1 6.1
   3.9 6.8 7.6 8.1 8.0 10.0

5. The daily hours of sunshine in London during August were
   7.0 7.6 12.5 12.9 8.3 9.7 8.4 11.1
   7.5 7.5 9.8 10.4 11.6 11.3 7.3 7.8
   6.8 6.2 6.1 5.6 5.6 5.8 4.8 4.3
   0.0 0.6 0.8 1.6 0.2 2.4 2.6
   Illustrate these data on a stem and leaf diagram and comment.

6. A stemplot is given below but it does not have a key.

   Stem | Leaf            (the ringed value is shown in brackets)
     5  | 9
     6  | 1 4
     6  | 7 8 9
     7  | 2 3 (3)
     7  | 5 6 6 6 7 8
     8  | 0 3 4
     8  | 5

   State the value ringed and the width of the interval that it is in when the diagram illustrates
   (a) the times taken for a journey, where 6|8 represents 6.8 hours,
   (b) the masses, in g to three decimal places, of components, where 6|8 represents 0.068 g.

7. Draw back-to-back stemplots for the following data. What conclusions can you draw?
   (a) The pulse rates of 30 company directors were measured before and after taking
       exercise.
       Before: 110, 93, 81, 75, 73, 73, 48, 53, 69, 69, 66, 111, 105, 93, 90, 50, 57, 64, 90,
       111, 91, 70, 70, 51, 79, 93, 105, 51, 66, 93.
       After: 117, 81, 77, 108, 130, 69, 77, 84, 84, 86, 95, 125, 96, 104, 104, 137, 143, 70, 80,
       131, 145, 106, 130, 109, 137, 75, 104, 75, 97, 80.
       (Use class intervals 40-49, 50-59, 60-69, ...)
   (b) The ages of teachers in two schools:
       School A: 51, 45, 33, 37, 37, 27, 28, 54, 54, 61, 34, 31, 39, 23, 53, 59, 40, 46, 48, 48,
       39, 33, 25, 31, 48, 40, 53, 51, 46, 45, 45, 48, 39, 29, 23, 37.
       School B: 59, 56, 40, 43, 46, 38, 29, 52, 54, 34, 23, 41, 42, 52, 50, 58, 60, 45, 45, 56,
       59, 49, 44, 36, 38, 25, 56, 36, 42, 47, 50, 54, 59, 47, 58, 57.
       (Use class intervals 20-29, 30-39, 40-49, ...)
   (c) 20 boys and 20 girls took part in a reaction-timing experiment. Their results were
       measured to the nearest hundredth of a second.
       Girls: 0.22, 0.21, 0.18, 0.18, 0.16, 0.19, 0.25, 0.22, 0.17, 0.19, 0.16, 0.21, 0.24,
       0.22, 0.19, 0.22, 0.25, 0.22, 0.17, 0.22.
       Boys: 0.14, 0.20, 0.22, 0.16, 0.19, 0.16, 0.15, 0.23, 0.23, 0.19, 0.16, 0.15, 0.09,
       0.23, 0.11, 0.21, 0.22, 0.18, 0.18, 0.16.
       (Use class intervals 0.08-0.09, 0.10-0.11, 0.12-0.13, ...)

WAYS OF GROUPING DATA
The following frequency distributions show some of the ways that data can be grouped. The
information is more concise than the raw data, but the disadvantage is that the original
information has been lost.

(i) Frequency distribution to show the lengths, to the nearest millimetre, of 30 rods

Length (mm)   27-31   32-36   37-46   47-51
Frequency       4      11      12       3

The interval 27-31 means 26.5 mm ≤ length < 31.5 mm.
The class boundaries are 26.5, 31.5, 36.5, 46.5, 51.5
The class widths are 5, 5, 10, 5
(ii) Frequency distribution to show the marks in a test of 100 students

Mark        30-39   40-49   50-59   60-69   70-79   80-99
Frequency    10      14      26      20      18      12

This distribution can be interpreted in two ways:
(a) As discrete data, the interval 30-39 represents 30 ≤ mark < 40.
    The class boundaries are 30, 40, 50, 60, 70, 80, 100
    The class widths are 10, 10, 10, 10, 10, 20
(b) As continuous data, assuming marks are to the nearest integer, 30-39 would
    represent 29.5 ≤ mark < 39.5.
    The class boundaries are 29.5, 39.5, 49.5, 59.5, 69.5, 79.5, 99.5
    The class widths are 10, 10, 10, 10, 10, 20

(iii) Frequency distribution to show the lengths of 50 telephone calls

Length of call (min)   0-   3-   6-   9-   12-   18-
Frequency               9   12   15   10    4     0

The interval '3-' means 3 minutes ≤ time < 6 minutes, so any time including 3 minutes
and up to (but not including) 6 minutes comes into this interval.

(iv) Frequency distribution of masses

Mass (g)    -100   -250   -500   -800
Frequency     8     10     16      6

The interval '-250' means 100 g < mass ≤ 250 g, so any mass over 100 grams up to and
including 250 grams comes into this interval.
The class boundaries are 0, 100, 250, 500, 800
The class widths are 100, 150, 250, 300

(v) Frequency distribution of speeds

Speed (km/h)   20-30   30-40   40-60   60-80   80-100
Frequency        2       7      20      16       5

The interval 30-40 means 30 km/h ≤ speed < 40 km/h.
The class boundaries are 20, 30, 40, 60, 80, 100
The class widths are 10, 10, 20, 20, 20

(vi) Frequency distribution to show ages (in completed years) of applicants for a teaching post

Age (years)   21-24   25-28   29-32   33-40   41-52
Frequency       4       2       2       1       1

Since the ages are given in completed years (not to the nearest year) then '21-24' means
21 ≤ age < 25. Someone who is 24 years and 11 months would come into this category.
Sometimes this interval is written '21-' and the next is '25-', etc.
The class boundaries are 21, 25, 29, 33, 41, 53
The class widths are 4, 4, 4, 8, 12

HISTOGRAMS
Grouped data can be displayed in a histogram as in the following diagram.

[Histogram: frequency density against age, x years]

This histogram represents the following table for the distribution of ages of passengers on a
shuttle flight from Denver, Colorado to Salt Lake City, Utah.

Age, x years   0 ≤ x < 20   20 ≤ x < 40   40 ≤ x < 50   50 ≤ x < 70   70 ≤ x < 100
Frequency           4            44            36            28             6
Histograms resemble bar charts, but there are two important differences.
• there are no gaps between the bars,
• the area of each bar is proportional to the frequency that it represents.

Histograms often have bars of varying widths, so the height of the bar must be adjusted in
accordance with the width of the bar.
The vertical axis is not labelled frequency but frequency density where

frequency density = frequency / interval width

Consider the interval 20 ≤ x < 40 in the frequency table above.
Frequency = 44, interval width = 20, so frequency density = 44/20 = 2.2
The complete table looks like this:

Ages            Interval width   Frequency   Frequency density
0 ≤ x < 20           20              4             0.2
20 ≤ x < 40          20             44             2.2
40 ≤ x < 50          10             36             3.6
50 ≤ x < 70          20             28             1.4
70 ≤ x < 100         30              6             0.2

Modal class
The highest bar in the histogram represents the interval 40 ≤ x < 50. This is the modal class.
Notice that in the table this interval does not have the greatest frequency, but it does have the
greatest frequency density.
In a histogram, the modal class is the interval with the greatest frequency density; it is
represented by the highest bar in the histogram.

Example 1.5
The grouped frequency distribution records the masses, to the nearest gram, of 84 letters
delivered by the postman.

Mass (g)    1-20   21-40   41-60   61-80   81-100
Frequency    10     18      24      14       18

Draw a histogram to illustrate these data.

Solution 1.5
The data are continuous.
The class boundaries are 0.5, 20.5, 40.5, 60.5, 80.5, 100.5
The interval widths are 20, 20, 20, 20, 20
In this example all the intervals are of equal width and you could use the frequency for the
height of the bar. It is, however, a good idea to use the frequency density for the height of the
bar. The resulting histogram will then have a total area which represents the total frequency.

Mass (g)             Interval width   Frequency   Frequency density
0.5 ≤ x < 20.5            20             10             0.5
20.5 ≤ x < 40.5           20             18             0.9
40.5 ≤ x < 60.5           20             24             1.2
60.5 ≤ x < 80.5           20             14             0.7
80.5 ≤ x < 100.5          20             18             0.9

[Histogram to show the masses of letters: frequency density against mass of letter (g),
class boundaries 0.5, 20.5, 40.5, 60.5, 80.5, 100.5]

The main purpose of histograms is to illustrate grouped continuous data, but they can also be
used to illustrate grouped discrete data.

Example 1.6
These are the examination marks for a group of 120 first year statistics students.

Mark        0-9   10-19   20-29   30-49   50-79
Frequency    8     21      53      28      10

Represent the data in a histogram and comment on the shape of the distribution.

Solution 1.6
The data are discrete, so, to avoid gaps in the histogram, use class boundaries 9.5, 19.5, 29.5,
49.5. This leads to -0.5 and 79.5 as the remaining two boundaries, even though these marks
are outside the range of the discrete data.
The class boundaries are -0.5, 9.5, 19.5, 29.5, 49.5, 79.5
The interval widths are 10, 10, 10, 20, 30

Mark     Class width   Frequency   Frequency density
0-9          10            8             0.8
10-19        10           21             2.1
20-29        10           53             5.3
30-49        20           28             1.4
50-79        30           10             0.3

[Histogram of the examination marks: frequency density against marks, class boundaries
-0.5, 9.5, 19.5, 29.5, 49.5, 79.5]

The distribution has a long tail of values to the right. It is said to be positively skewed.
HINT: when drawing the histogram you will find it easier to mark out the horizontal axis
-0.5, 9.5, 19.5, ... using the lines of your squared paper. Then draw in the vertical frequency
density axis in a suitable position. Anywhere will do for this; it does not have to go through
(0, 0), but could be to the left of -0.5, for example.

Finding the frequencies from a histogram
To find the frequency in each interval, use

frequency = interval width × frequency density

Example 1.7
A Passengers' Association conducted a survey on the punctuality of trains using a particular
station. The histogram illustrates the results.
(a) Construct the frequency distribution.
(b) How many trains were there in the survey?

[Histogram to show lateness of trains: frequency density against number of minutes late (t),
horizontal axis from 0 to 80]

Solution 1.7
(a) To find the frequency in each interval, use frequency = interval width × frequency density

Number of minutes late (t)   Frequency
0 ≤ t < 5                     5 × 6.4 = 32
5 ≤ t < 10                    5 × 8.8 = 44
10 ≤ t < 20                  10 × 2.8 = 28
20 ≤ t < 30                  10 × 1.2 = 12
30 ≤ t < 50                  20 × 0.6 = 12
50 ≤ t < 80                  30 × 0.2 = 6
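The two relationships used in this section, frequency density = frequency ÷ interval width and frequency = interval width × frequency density, can be sketched in Python as follows. This is an illustration under stated assumptions rather than part of the original text: the function and variable names are invented here, the age data are those of the passenger table, and the train frequency densities are the values quoted in the Solution 1.7 table. The total printed for the trains is simply the sum of those computed frequencies.

```python
def frequency_density(frequency, width):
    """Height of a histogram bar: frequency divided by interval width."""
    return frequency / width


def frequency_from_density(density, width):
    """Recover a frequency from a bar: interval width times frequency density."""
    return width * density


# Ages of passengers: (lower boundary, upper boundary, frequency).
ages = [(0, 20, 4), (20, 40, 44), (40, 50, 36), (50, 70, 28), (70, 100, 6)]
age_densities = [frequency_density(f, upper - lower) for lower, upper, f in ages]

# The modal class has the greatest frequency density, not the greatest frequency.
modal_class = max(ages, key=lambda c: frequency_density(c[2], c[1] - c[0]))[:2]

# Lateness of trains: (lower boundary, upper boundary, frequency density).
trains = [(0, 5, 6.4), (5, 10, 8.8), (10, 20, 2.8),
          (20, 30, 1.2), (30, 50, 0.6), (50, 80, 0.2)]

# round() removes floating-point noise; frequencies are whole numbers of trains.
train_frequencies = [round(frequency_from_density(d, upper - lower))
                     for lower, upper, d in trains]
total_trains = sum(train_frequencies)

print(age_densities)
print(modal_class)
print(train_frequencies, total_trains)
```

Working from class boundaries rather than from the printed class labels keeps the widths correct when the intervals are unequal, which is exactly the case in which frequency density differs from frequency.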
[Histogram for Example 1.8: frequency density (scale not marked) against number of letters
delivered, horizontal axis from 2 to 10]

Solution 1.8
The scale on the frequency density axis has not been marked but since you are given that there
are 13 houses in the interval 3-4 it is easy to see the area of four small squares represents one
house.
The frequencies can be deduced directly from this, for example, the interval 7-10 contains
two houses.
Total frequency = 5 + 13 + 10 + 2 = 30
There are 30 houses in the street.
To work out the scale on the frequency density axis, note that the interval 3-4 has frequency
13 and is of width 2, therefore frequency density = 13 ÷ 2 = 6.5.
Since the bar is 13 squares high, each square on the vertical axis represents a frequency
density of 0.5.

Although it is easier to use frequency density for the vertical scale in the histogram, other
scales can be used, provided that area is proportional to frequency. This is illustrated in the
following example.

Example 1.9
A teacher recorded the time, to the nearest minute, spent reading during a particular day by
each child in a group. The times were summarised in a grouped frequency distribution and
represented by a histogram. The first class in the grouped frequency distribution was 10-19
and its associated frequency was eight children. On the histogram the height of the rectangle
representing the class was 2.4 cm and the width was 2 cm. The total area under the histogram
was 53.4 cm².
Find the number of children in the group. (L)

Solution 1.9
The rectangle for the class 10-19 has area 2 × 2.4 = 4.8 cm² and represents 8 children.
Area ∝ frequency
Area = k × frequency
4.8 = k × 8
k = 0.6
Total area = k × total frequency
53.4 = 0.6 × total frequency
Total frequency = 53.4 ÷ 0.6 = 89

FREQUENCY POLYGONS
A grouped frequency distribution can be displayed as a frequency polygon.
To construct a frequency polygon, for each interval plot frequency density against the
mid-interval value, where

mid-interval value = ½ (lower class boundary + upper class boundary)

Then join the points with straight lines.

Example 1.10
Draw a frequency polygon to illustrate this frequency distribution which gives the times taken
by 31 competitors to complete a cross-country run.

Time t (min)   25 ≤ t < 30   30 ≤ t < 35   35 ≤ t < 40   40 ≤ t < 50   50 ≤ t < 65
Frequency           4            12             8             4             3

Solution 1.10

Time           Mid-interval value   Interval width   Frequency   Frequency density
25 ≤ t < 30          27.5                 5              4         4/5 = 0.8
30 ≤ t < 35          32.5                 5             12        12/5 = 2.4
35 ≤ t < 40          37.5                 5              8         8/5 = 1.6
40 ≤ t < 50          45                  10              4        4/10 = 0.4
50 ≤ t < 65          57.5                15              3        3/15 = 0.2
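The points of the frequency polygon for the cross-country times in Example 1.10 can be worked out with a short Python sketch. It is illustrative only and not taken from the original text: the function name `polygon_points` is an assumption, and each class is described by its lower boundary, upper boundary and frequency as given in the example.

```python
# Cross-country times: (lower class boundary, upper class boundary, frequency).
classes = [(25, 30, 4), (30, 35, 12), (35, 40, 8), (40, 50, 4), (50, 65, 3)]


def polygon_points(classes):
    """Return (mid-interval value, frequency density) for each class."""
    points = []
    for lower, upper, frequency in classes:
        mid = (lower + upper) / 2              # mid-interval value
        density = frequency / (upper - lower)  # frequency / interval width
        points.append((mid, density))
    return points


points = polygon_points(classes)
for mid, density in points:
    print(mid, density)
```

Joining these points in order with straight lines gives the polygon; any plotting tool could be used for that step, which is left out here to keep the sketch free of dependencies.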
[Frequency polygon to show times taken to complete a cross-country run: frequency density
against time (min), horizontal axis from 25 to 65]

Note that this distribution is skewed with a tail at the right hand end, i.e. it is positively
skewed.
You could of course construct the histogram first and then join the mid-points of the tops of
the rectangles to give the frequency polygon.

Comparative frequency polygons

Age         20-  25-  30-  35-  40-  45-  50-  55-  60-  65-
College A    4    6   11   14    9    5    5    3    0    0
College B    0    2    4    7   11   12   11    8    5    0

Solution 1.11
Work out the mid-interval value for each interval, for example in the interval '20-' the lower
boundary is 20 and the upper boundary is 25, so mid-interval value = ½ (20 + 25) = 22.5
The width of each interval is 5, so work out the frequency densities for each college by
dividing the frequencies by 5.

                      Frequency density   Frequency density
Mid-interval value        College A           College B
      22.5                   0.8                 0
      27.5                   1.2                 0.4
      32.5                   2.2                 0.8
      37.5                   2.8                 1.4
      42.5                   1.8                 2.2
      47.5                   1                   2.4
      52.5                   1                   2.2
      57.5                   0.6                 1.6
      62.5                   0                   1
      67.5                   0                   0

[Frequency polygons for College A and College B drawn on the same axes]

Notice that in this example, since all the intervals are of equal width, frequency could have
been used on the vertical axis.

FREQUENCY CURVES
When the number of intervals is large the frequency polygon
consists of a large number of line segments. The frequency
polygon approaches a smooth curve, known as a frequency
curve.
(a) Positive skew
In a positively skewed distribution, there is a long tail at the positive end of the
distribution.

(b) Negative skew
In a negatively skewed distribution, there is a long tail at the negative end of the
distribution.

(c) Reverse J-shape
In a J-shaped (reverse) distribution an initial 'bulge' is followed by a long tail.

• the number of children in a family,
• the age at which women marry,
• the distribution of wages in a firm.

In a uniform or rectangular distribution the data are evenly spread throughout the range.

(e) The normal distribution
This symmetrical, bell-shaped distribution is known as a normal distribution.
An approximately normal distribution occurs when measuring quantities such as heights,
masses, examination marks.

A negatively skewed distribution could occur when considering, for example,
• reaction times for an experiment,
• daily maximum temperatures for a month in the summer.

1. A researcher timed how long it took for each of 38 volunteers to perform a simple task. The
   results are shown in the table.

   Time (seconds)   5-   10-   20-   25-   40-   45-
   Frequency         2    12     7    15     2     0

   Draw a histogram to illustrate the data.

2. In a survey the masses of 50 apples were noted and recorded in the following table. Each value
   was given to the nearest gram.
    86 101 114 118  87  92  93 116
   105 102  97  93 101 111  96 117
   100 106 118 101 107  96 101 102
   104  92  99 107  98 105 113 100
   103 108  92 109  95 100 103 110
   113 108  99 106 116 101 105  86
    88  92
   (a) Construct a frequency distribution, using equal class intervals of width 5 g and taking
       the first interval as 85-89.
   (b) Draw a histogram to illustrate the data and write down the modal class.
   (c) Draw a stemplot to illustrate the data and write down the mode.

3. On a particular day the length of stay of each car at a city car park was recorded:

   Length of stay (min)   Frequency
   t < 25                     62
   25 ≤ t < 60                70
   60 ≤ t < 80                88
   80 ≤ t < 150              280
   150 ≤ t < 300              30

   Represent the data by a histogram and state the modal class.

4. Draw a histogram to show the masses, measured to the nearest kilogram, of 200 girls.

   Mass (kg)   41-50   51-55   56-60   61-70   71-75
   Frequency     21      62      55      50      12
20% solution
5. This histogram represents the speeds of cars 9. The table shows the ages, in completed years, of Length (em) Frequency
passing a 30 miles per hour sign. Write out the women who gave birth to a child at Anytown
Height (em) Frequency
frequency distribution. Maternity Hospital during a particular year. O<;x<4 2
Without drawing a histogram first, draw a
frequency polygon to illustrate the information.
4,;;x<8 so 0
Describe the distribution.
8<;x<12 51 1
12<;x<16 52 0
Age (years) Number of births 16<;x<18 53 2
,-- 16- 70 18<;x<20 54 5
20- 470 20 <;X< 30 55 9
5
535 56 17 'I
4 1-- 25- I
280 13. Lucy and Jack play a computer game every day 57 25
30-
3
35- 118
and keep a record of their scores. Lucy's scores 58 20 ii
are shown in the table. Draw a frequency I
2 f-- 0 59 12
45- polygon to represent her scores. 1.:
60 9 \!
Frequency
50-99 100-149 150-199 200-249 250-299
6 14 10 6 4
40% solution I
6. In a competition to grow the tallest hollyhock, Number of cigarettes Height (em) Frequency
the heights recorded by 50 primary school Frequency Jack's scores are as follows:
smoked per day 54 0
children were as follows. Heights were measured
to the nearest centimetre. 0-9 5 Jack's 55 2
10-14 8 scores 50-99 100-149 150-199 200-249 250-299 56 2
Height (em) Frequency
15-19 32 57 2
l~requency 2 6 10 16 6
177-186 12 20-29 41 58 7
187-191 8 16 Draw a frequency polygon for Jack's scores on 59 10
30-39
192-196 8 40 and over 2 the same set of axes as Lucy's and use it to 60 11
197-201 9 compare the two sets of scores. 18
61
202-206 7 Draw a histogram to represent this data. 62 18
14. Students were investigating the effects of a
207-216 6 growth hormone placed on the growing tip of a 63 16
11. The marks awarded to 136 students in an
examination arc summarised in the table. Draw a maize seedling. The hormone was used in two 64 9
Draw a histogram and superimpose a frequency different concentrations and distilled water was
histogram to illustrate the data. 65 5
polygon. used as a control on a third set of seedlings. After
three weeks the heights of the plants were 66 0
Marks Frequency
7. The table shows the duration, in minutes, of measured to the nearest centimetre. They are
64 telephone calls made from a High Street call 10-29 22 shown in the table. Draw frequency polygons to 15. In one month, a stUdent recorded the length, to
box in a day. 18 represent the data and compare the results. the nearest minute, of each of the lectures she
30-39
22 Control attended. The table below shows her data and
Length of call (min) Frequency 40-49
the calculations she made before drawing a
S0-59 24 Height (em) Frequency histogram to illustrate these data.
0- 3 60-64 14
1!- 7 65-69 12 45 0 Length of
3- 22 70-84 24 46 7 lecture (minutes) 50-53 54-55 56-59 60-67
6- 20 47 11
6 48 12 Number of
12- 12.
6 49 14 lectures a b 30 c
15- "' 3
-~ 2.5
21- 0 u
g-
2
~ 1.5
-l
-_1·_-_-i__
50
51
14
18
Frequency
density 5 13 7.5 1.5
Draw a frequency polygon to illustrate the data. l l
1=1
52 12
0.5 53 8 Calculate
8. These are the number of times the letter 'e' (a) the value of a, of band of c,
appears in each sentence in an article called 'My 0246810Ul41618WUM~M~ 54 3 (b) the total number of lectures attended during
Kind of Day'. Make a grouped frequency Length (em) 55 1 the month. (C Additional)
distribution and draw a histogram. 56 0
Complete the frequency distribution represented
15 12 8 12 3 10 14 17 5 3 8 11 by the frequency polygon above.
7 16 5 13 12 11 6 7 4 17 8 1
CIRCULAR DIAGRAMS OR PIE CHARTS
Pie charts are so called because they look like an apple pie! The areas of the slices or sectors of the pie are in proportion to the quantities being represented.

Example 1.12
The pie chart, which is not drawn to scale, shows the distribution of various types of land and water in a certain county. Calculate
(a) the area of woodland,
(b) the angle of the urban sector,
(c) the total area of the county. (C)
[Pie chart, not drawn to scale: the Farmland sector is labelled 1200 km².]

Solution 1.12
(a) 160° represents 1200 km², so 88° represents $\frac{1200}{160} \times 88 = 660$ km².
Area of woodland = 660 km²
(b) 1200 km² is represented by 160°, so 30 km² is represented by $\frac{160}{1200} \times 30 = 4$ degrees.
Angle for the urban sector = 4°

Pie charts of different sizes are useful when comparing two or more populations. The area of each pie will be in proportion to the different population sizes, so if the pies are drawn with radii $r_1$ and $r_2$ and represent total population sizes $F_1$ and $F_2$, then
$\pi r_1^2 : \pi r_2^2 = F_1 : F_2$
Dividing by $\pi$: $r_1^2 : r_2^2 = F_1 : F_2$
Taking square roots: $r_1 : r_2 = \sqrt{F_1} : \sqrt{F_2}$
Radii should be chosen so that $r_1 : r_2 = \sqrt{F_1} : \sqrt{F_2}$.

Example 1.13
The table shows, in millions of pounds, the sales of a company in two successive years.

Year          Africa   America   Asia   Europe
First year    5.5      6.7       13.2   19.6
Second year   5.8      15.2      9.2    29.8

Solution 1.13
First calculate the total sales for each year and the angles in the pie charts.
Total sales (in millions of pounds):
First year: $F_1 = 5.5 + 6.7 + 13.2 + 19.6 = 45$
Second year: $F_2 = 5.8 + 15.2 + 9.2 + 29.8 = 60$
Angles: for example, for Asia in the first year, $\frac{13.2}{45} \times 360^\circ = 105.6^\circ$. The angles for each year total 360°.
Work out the ratio of the radii using
$r_1^2 : r_2^2 = F_1 : F_2 = 45 : 60 = 3 : 4$
$r_1 : r_2 = \sqrt{3} : \sqrt{4} = 1.73\ldots : 2$
[Pie charts for the two years, with sectors for Africa, America, Asia and Europe.]

Example 1.14
On a particular Wednesday the sales of sugar from a supermarket consisted of 250 large packets, 210 medium packets and 225 small packets. The mass of sugar in a large packet is 1½ times that in a medium packet and 2½ times that in a small packet. Calculate the angles needed to draw a pie chart representing the total masses of sugar sold in large, medium and small packets.
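The sector-angle calculation and the rule for choosing radii, r1 : r2 = √F1 : √F2, can be sketched in Python. This is only an illustration: the function names are invented here, the sales figures are those quoted in Solution 1.13, and the first radius of 3 cm is an arbitrary assumption.

```python
import math

def sector_angles(amounts):
    """Return the pie-chart angle, in degrees, for each amount."""
    total = sum(amounts)
    return [amount / total * 360 for amount in amounts]

def radius_for_comparison(r1, total1, total2):
    """Radius of a second pie whose area is to the first as total2 : total1."""
    return r1 * math.sqrt(total2 / total1)

# Sales in millions of pounds: Africa, America, Asia, Europe
first_year = [5.5, 6.7, 13.2, 19.6]
second_year = [5.8, 15.2, 9.2, 29.8]

angles = sector_angles(first_year)
# 3.0 cm is a hypothetical radius for the first-year pie
r2 = radius_for_comparison(3.0, sum(first_year), sum(second_year))

print([round(angle, 1) for angle in angles])
print(round(r2, 2))
```

Because areas, not radii, carry the comparison, the second radius grows with the square root of the ratio of the totals.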
8. On a certain day, 125 people, each buying one newspaper, were asked which newspaper they had bought. The results of the survey are shown in the table below.

Newspaper          Number bought
The Times          10
The Telegraph      25
The Express        40
Some other paper   50

Calculate the angles of the sectors of a pie chart of radius 5 cm which would illustrate these data.
The following day a similar survey was carried out and the radius of the pie chart necessary to compare the new set of data with the previous set was 6 cm. Calculate the number of people in the second survey. (C Additional)

9. A householder keeps an annual account of four items of expenditure. The figures for the year 1991 are shown in the table below.

Item         Expenditure (£)
Taxes        x
Travel       1000
Light/Heat   y
Telephone    300

A pie chart was drawn to illustrate these data. Given that the angles of the sectors representing Taxes and Travel were 124° and 80° respectively, calculate
(a) the total expenditure for the year,
(b) the value of x and of y,
(c) the angle of each of the remaining sectors.
In 1992, the total expenditure on the same items was £8000. Given that the radius of the pie chart for 1991 was 6 cm, calculate the radius of the pie chart for 1992 in order that the two sets of data may be compared. (C Additional)

THE MEAN
A typical or average value is useful when interpreting data. One such average is the mean.

Solution 1.15
For the first four tests, $\frac{x_1 + x_2 + x_3 + x_4}{4} = 68$, so $x_1 + x_2 + x_3 + x_4 = 4 \times 68 = 272$.
For five tests, Ben wants his mean mark to be at least 70.
$\frac{x_1 + x_2 + x_3 + x_4 + x_5}{5} \ge 70$
$\frac{272 + x_5}{5} \ge 70$
$272 + x_5 \ge 350$
$x_5 \ge 350 - 272$
$x_5 \ge 78$
To obtain Grade A, Ben must get at least 78 marks in his fifth test.

A shorthand way of writing $x_1 + x_2 + x_3 + x_4$ is $\sum_{i=1}^{4} x_i$.
The symbol $\Sigma$ (the Greek capital letter 'sigma') is used to denote 'the sum of'. So for $x_1 + x_2 + x_3 + \ldots + x_n$ you could write $\sum_{i=1}^{n} x_i$.
The mean is often denoted by $\bar{x}$, so
$\bar{x} = \frac{x_1 + x_2 + \ldots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$
This is rather cumbersome, so usually the subscript i is omitted.

Solution 1.16
In the above example, the data could have been arranged in a frequency distribution:

Number of instruments, x   1    2    3   4   5
Frequency, f               11   10   5   3   1
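The definition of the mean and the reasoning of Solution 1.15 can be sketched in Python. The function names are invented for this illustration; the figures (a mean of 68 over four tests and a target mean of 70 over five) are those of the worked solution.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(values) / len(values)

def minimum_next_mark(previous_mean, tests_taken, target_mean):
    """Smallest next mark that brings the mean of tests_taken + 1 marks up to target_mean."""
    total_so_far = previous_mean * tests_taken        # e.g. 4 x 68 = 272
    required_total = target_mean * (tests_taken + 1)  # e.g. 5 x 70 = 350
    return required_total - total_so_far

needed = minimum_next_mark(previous_mean=68, tests_taken=4, target_mean=70)
print(needed)
```

The sketch works from totals rather than individual marks, which is all the solution needs: only the sum of the first four marks matters.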
The total number of instruments played can be calculated in an organised way as follows: Find the other mid-interval values and form a table:
Mid-interval
X f fxx total number of instruments fx
x Speed (m.p.h.) value, x f
_ 'Efx
1 11 11 total number of people x~--
To obtain
Solution 1.17 >>32 \SHIFT![I] G \2ndF! m
Work out the mid-interval value for the first interval21-25, using lower class boundary~ 20.5, n~s \RCL\ [9 Red kttcrs on third \2nd F\1]
upper class boundary~ 25.5. l:x ~ 160 \RCL\ I!] I ro¥--' of calculator \2ndF\@
So mid-interval value~~ (20.5 + 25.5) ~ 23. To clear \MODE\ [I] \MODE\ [QJ
SD mode
You then assume that all the values in the interval 21-25 are in fact 23.
From the calculator, the mean is 32.
Example 1.19
Find the mean number of children per family for the following frequency distribution.

Number of children per family, x   1   2   3   4   5
Frequency, f                       3   4   8   2   3

Solution 1.19
Using the calculator in SD mode (Casio 570W/85W/85WA or Sharp):
Set SD mode.
Clear memories.
Input data in the order x × f.
[The key sequences are not recoverable from the source.]
To obtain
$\bar{x} = 2.9$
$\Sigma fx = 58$

Example 1.20
The diagram shows a histogram of the distribution of masses of 50 first-year University students. All the rectangles are there but the vertical axis has been torn off.
(a) Compile a grouped frequency table for the distribution.
(b) Use the values in your frequency table to find an approximate value for the mean mass of the students.
[Histogram: horizontal axis 'Mass in kg', starting at 40; vertical axis missing.]

Solution 1.20
Let one small square be h on the vertical axis.
Remember that in a histogram, the area of each rectangle is proportional to the frequency.
The areas are
5h × 10, 10h × 10, 18h × 5, 22h × 5, 10h × 15
i.e. 50h, 100h, 90h, 110h, 150h.
So the total area = 500h.
But total frequency = the number of students = 50
500h = 50
h = 0.1
This means that the frequencies are 5, 10, 9, 11, 15, giving a total of 50.

Mass (kg)      Frequency, f
40 ≤ m < 50    5
50 ≤ m < 60    10
60 ≤ m < 65    9
65 ≤ m < 70    11
70 ≤ m < 85    15

To estimate the mean, use the mid-point of each interval. For example, the mid-point of the interval 60 ≤ m < 65 is ½(60 + 65) = 62.5.

Mid-point, x   Frequency, f   fx
45             5              225
55             10             550
62.5           9              562.5
67.5           11             742.5
77.5           15             1162.5
               Σf = 50        Σfx = 3242.5

$\bar{x} = \frac{\Sigma fx}{\Sigma f} = \frac{3242.5}{50} = 64.85$ kg
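The mid-interval method of Solution 1.20 can be sketched in Python. The function name is invented here, and the class boundaries are an assumption inferred from the mid-points and rectangle widths quoted in the solution (widths 10, 10, 5, 5, 15 starting at 40 kg).

```python
def grouped_mean(classes):
    """Estimate the mean from (lower, upper, frequency) classes,
    treating every value in a class as equal to its mid-interval value."""
    total_f = sum(f for _, _, f in classes)
    total_fx = sum((lower + upper) / 2 * f for lower, upper, f in classes)
    return total_fx / total_f

# (lower boundary, upper boundary, frequency); boundaries inferred as noted above
masses = [(40, 50, 5), (50, 60, 10), (60, 65, 9), (65, 70, 11), (70, 85, 15)]

estimate = grouped_mean(masses)
print(estimate)
```

The result is only an estimate, since the individual masses inside each class are unknown.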
4. The amounts spent by 120 motorists at a petrol (a) A student was asked to draw a histogram to
Using the calculator: illustrate the data and produced the
station were recorded.
following diagram.
Casio 570W/85W/85WA Sharp Amount spent, £x Number of motorists
A histogram to illustrate the heights of birch trees
Set SD Mode IMODEIIMODEI [I) or IMODEIIIJ !MODEl [I) x<5 12
5 <x < 10 38
Clear memories ISHJFTIISc!l B l2ndFIICAI ill 20
10<;x<15 42 1: ,-
0
~[><][I] IDATAl
Input Data ~ llli1ITl 0 [I] IDTI 15 <;X< 20
20<;x<40
20
8
•§
~
15
.--
,-
in the order
ITilllli1B'l 0 [I_QJ IDT I lTil [><] [I_QJ IDATA I z
xxf (a) Draw a histogram to represent the data. 10
I62.5IISHIFTI 0 [2] IDTI 162.51 [><] [2] IDATAI (b) Estimate the mean amount spent.
10-19 20
numbers, a record made of the number of matches per
box. The results were as follows: Stem Leaf 20-24 20
(i) not using SD mode, 12 0 0
(ii) using SD mode. 25-29 15
15 () 1 1
Compare your answers. Number of 18 1 1 2 30 14
{a) 5, 6, 6, 8, 8, 9, 11, 13, 14,17 matches per box 47 48 49 50 51 21 0 1 1 2 2 2 31-34 16
24 0 0 1 2 35-39 10
(b) 148, 153, 156, 157, 160 Frequency 4 20 35 24 17 27 1 1
40-59 10
{c) 44!, 471, 48!, stt, 521, 54±, sst, S6i 30 2
Calculate the mean number of matches per box.
(d) 1769,1771,1772,1775,1778,1781,1784 (a) Represent these data by a histogram.
7. The height, correct to the nearest metre, was
(e) 0.85, 0.88, 0.89, 0.93, 0.94, 0.96 3. On a certain day the numbers of books on 40 Give a reason to justify the use of a
recorded for each of the 59 birch trees in an area
shelves in a library were noted and grouped as histogram to represent these data.
(f) of woodland. The heights are summarised in the
1 2 3 4 5 6 7 (b) Calculate an estimate of the mean time
shown. Find the mean number of books on a following table. (L)
taken to answer the calls.
shelf.
4 5 8 10 17 5 1
Number of shelves Height(m) 5-9 10-12 13-15 16-18 19-28
Number of books
(g) 28 29 30 31 32
X 27 Number of
31-35 4
35 trees 14 18 15 4 8
f 30 43 51 49 42 36-40 6
41-45 10
(h) 121 122 123 124 125
X 46-50 13
14 25 32 23 6 51-55 5
f
56-60 2
Weighted means
In some situations it may not be suitable to calculate an ordinary mean. There may be times when you wish to place greater emphasis on some of the values, as illustrated in the following example.

Example 1.21
A candidate obtained the following results in her GCSE mathematics examination:
Paper 1: 72%, Paper 2: 64%, Coursework: 73%
The regulations state that the two written papers have equal weighting and count for 80% of the final result, whereas the coursework counts for 20%. What was the candidate's final mark?

Solution 1.21
The results are in the following ratio:
40% : 40% : 20% = 4 : 4 : 2 = 2 : 2 : 1.
For the final result, you have to take this weighting into account:
weighted mean $= \frac{2(72) + 2(64) + 1(73)}{2 + 2 + 1} = \frac{345}{5} = 69$
Therefore the final mark is 69%.

VARIABILITY OF DATA
Each of these sets of numbers has a mean of 7 but the spread of each set is different:
(a) 7, 7, 7, 7, 7
(b) 4, 6, 6.5, 7.2, 11.3
(c) -193, -46, 28, 69, 177
There is no variability in set (a), but the numbers in set (c) are obviously much more spread out than those in set (b).
There are various ways of measuring the variability or spread of a distribution, two of which are described here.

The range
The range is based entirely on the extreme values of the distribution.
In (a) the range = 7 - 7 = 0
In (b) the range = 11.3 - 4 = 7.3
In (c) the range = 177 - (-193) = 370
Note that there are also ranges based on particular observations within the data and these percentile and quartile ranges are considered on page 68.
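The weighted mean of Example 1.21 and the ranges of sets (a), (b) and (c) can be sketched in Python. The function names are invented for this illustration; the weights 2 : 2 : 1 and the three data sets are those given in the text.

```python
def weighted_mean(values, weights):
    """Sum of weight x value, divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

def data_range(values):
    """Range: highest value minus lowest value."""
    return max(values) - min(values)

# Paper 1, Paper 2, Coursework with weights 2 : 2 : 1
final_mark = weighted_mean([72, 64, 73], [2, 2, 1])

set_a = [7, 7, 7, 7, 7]
set_b = [4, 6, 6.5, 7.2, 11.3]
set_c = [-193, -46, 28, 69, 177]
ranges = [data_range(s) for s in (set_a, set_b, set_c)]

print(final_mark)
print(ranges)
```

Only the ratio of the weights matters: using 40, 40 and 20 in place of 2, 2 and 1 would give the same final mark.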
In general, if $x_1, x_2, \ldots, x_n$ are given weights $w_1, w_2, \ldots, w_n$ then
weighted mean $= \frac{\Sigma w_i x_i}{\Sigma w_i}$

Exercise on weighted means
1. Find the weighted mean of the numbers 8 and 12, if they are given the weights 2 and 3 respectively.
2. The final mark allocated to a student is calculated from her mark in each subject.
(a) The class teacher worked out an ordinary mean.
(b) The headteacher decided to weight the subjects in proportion to the number of lessons per week, as shown in the table.

Subject       Mark   Number of lessons per week
Mathematics   64%    5
English       52%    4
Science       71%    6
French        75%    3
History       82%    2

Which method gave the higher mark and by how much?
3. The prices of articles A, B and C are £30, £42 and £65. Find the mean price, if the three articles are given weights of 5, 3 and 2 respectively.
4. The weighted mean of the two numbers 30 and 15 is 20. If the weightings are 2 and x respectively, find x.
5. Two students, Jack and Jill, take an examination in French, German and English. The table below shows the marks for each student and the weight to be applied to each subject.

Subject          French   German   English
Marks for Jack   80       72       46
Marks for Jill   64       82       [missing]
Weight           2        x        [missing]

Calculate the value of x for which Jack and Jill have the same weighted mean mark and find the value of this mean. (C …)

The standard deviation
The standard deviation, s, gives a measure of the deviations of the readings from the mean, $\bar{x}$. It is calculated using all the values in the distribution. To calculate s:
• for each reading x, calculate $x - \bar{x}$, its deviation from the mean,
• square this deviation to give $(x - \bar{x})^2$ and note that, irrespective of whether the deviation was positive or negative, this is now positive,
• find $\Sigma(x - \bar{x})^2$, the sum of all these values,
• find the average by dividing the sum by n, the number of readings; this gives $\frac{\Sigma(x - \bar{x})^2}{n}$ and is known as the variance,
• finally take the positive square root of the variance to obtain the standard deviation, s.
$s = \sqrt{\frac{\Sigma(x - \bar{x})^2}{n}}$

Each of the three sets of numbers on the previous page has mean 7, i.e. $\bar{x} = 7$.
(a) For the set 7, 7, 7, 7, 7
Since $x - \bar{x} = 7 - 7 = 0$ for every reading, s = 0, indicating that there is no deviation from the mean.
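The steps for calculating the standard deviation (deviation from the mean, square, sum, divide by n, square root) can be sketched in Python. The function name is invented for this illustration; it divides by n, as in the text, and is applied to the three sets of numbers with mean 7.

```python
import math

def standard_deviation(values):
    """Standard deviation using divisor n, following the steps in the text."""
    n = len(values)
    mean = sum(values) / n
    # Square each deviation from the mean so that every term is non-negative
    squared_deviations = [(x - mean) ** 2 for x in values]
    variance = sum(squared_deviations) / n
    return math.sqrt(variance)

set_a = [7, 7, 7, 7, 7]
set_b = [4, 6, 6.5, 7.2, 11.3]
set_c = [-193, -46, 28, 69, 177]

for data in (set_a, set_b, set_c):
    print(round(standard_deviation(data), 1))
```

Python's standard library offers the same divisor-n calculation as `statistics.pstdev`; the explicit version above is written out only to mirror the listed steps.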
(b) For the set 4, 6, 6.5, 7.2, 11.3
$\Sigma(x - \bar{x})^2 = (4 - 7)^2 + (6 - 7)^2 + (6.5 - 7)^2 + (7.2 - 7)^2 + (11.3 - 7)^2 = 28.78$
$s = \sqrt{\frac{\Sigma(x - \bar{x})^2}{n}} = \sqrt{\frac{28.78}{5}} = 2.4$ (1 d.p.)

(c) For the set -193, -46, 28, 69, 177
$\Sigma(x - \bar{x})^2 = (-193 - 7)^2 + (-46 - 7)^2 + (28 - 7)^2 + (69 - 7)^2 + (177 - 7)^2 = 75\,994$
$s = \sqrt{\frac{\Sigma(x - \bar{x})^2}{n}} = \sqrt{\frac{75\,994}{5}} = 123.3$ (1 d.p.)

Notice that set (c) has a much higher standard deviation than set (b), confirming that it is much more spread about the mean.

Remember that
Standard deviation = s
Variance = $s^2$

NOTE:
• The standard deviation gives an indication of the lowest and highest values of the data as follows. In most distributions, the bulk of the distribution lies within two standard deviations of the mean, i.e. within the interval $\bar{x} \pm 2s$ or $(\bar{x} - 2s, \bar{x} + 2s)$. This helps to give an idea of the spread of the data.
• The units of standard deviation are the same as the units of the data.
• Standard deviations are useful when comparing sets of data; the higher the standard deviation, the greater the variability in the data.

Example 1.22
Two machines, A and B, are used to pack biscuits. A random sample of ten packets was taken from each machine and the mass of each packet was measured to the nearest gram and noted. Find the standard deviation of the masses of the packets taken in the sample from each machine. Comment on your answer.

Machine A (mass in g): 196, 198, 198, 199, 200, 200, 201, 201, 202, 205
Machine B (mass in g): 192, 194, 195, 198, 200, 201, 203, 204, 206, 207

Solution 1.22
Machine A: $\bar{x} = \frac{\Sigma x}{n} = \frac{2000}{10} = 200$
Machine B: $\bar{x} = \frac{\Sigma x}{n} = \frac{2000}{10} = 200$
Since the mean mass for each machine is 200, $x - \bar{x} = x - 200$.
To calculate s, put the data into a table:

Machine A                        Machine B
x      x - 200   (x - 200)²      x      x - 200   (x - 200)²
196    -4        16              192    -8        64
198    -2        4               194    -6        36
198    -2        4               195    -5        25
199    -1        1               198    -2        4
200    0         0               200    0         0
200    0         0               201    1         1
201    1         1               203    3         9
201    1         1               204    4         16
202    2         4               206    6         36
205    5         25              207    7         49
       Σ(x - 200)² = 56                 Σ(x - 200)² = 240

Machine A: $s^2 = \frac{\Sigma(x - 200)^2}{10} = \frac{56}{10} = 5.6$, so $s = \sqrt{5.6} = 2.37$ (2 d.p.)
Machine B: $s^2 = \frac{\Sigma(x - 200)^2}{10} = \frac{240}{10} = 24$, so $s = \sqrt{24} = 4.90$ (2 d.p.)
Machine A: s.d. = 2.37 g (2 d.p.)   Machine B: s.d. = 4.90 g (2 d.p.)
Machine A has less variation, indicating that it is more reliable than machine B.

Alternative form of the formula for standard deviation
The formula given above is sometimes difficult to use, especially when $\bar{x}$ is not an integer, so an alternative form is often used. This is derived as follows:
$s^2 = \frac{1}{n}\Sigma(x - \bar{x})^2$
$= \frac{1}{n}\Sigma(x^2 - 2\bar{x}x + \bar{x}^2)$
$= \frac{1}{n}(\Sigma x^2 - 2\bar{x}\Sigma x + \Sigma\bar{x}^2)$
$= \frac{\Sigma x^2}{n} - 2\bar{x}\frac{\Sigma x}{n} + \frac{n\bar{x}^2}{n}$
$= \frac{\Sigma x^2}{n} - 2\bar{x}(\bar{x}) + \bar{x}^2$ since $\frac{\Sigma x}{n} = \bar{x}$
$= \frac{\Sigma x^2}{n} - \bar{x}^2$
So $s = \sqrt{\frac{\Sigma x^2}{n} - \bar{x}^2}$

NOTE: It is useful to remember that $\frac{\Sigma x^2}{n} - \bar{x}^2$ can be thought of as 'the mean of the squares minus the square of the mean'.
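The two forms of the formula can be compared numerically. The sketch below, with function names invented for this illustration, applies both to the machine data of Example 1.22; it assumes the divisor-n definition used in the text.

```python
import math

def sd_definition(values):
    """Square root of the mean squared deviation from the mean."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)

def sd_alternative(values):
    """Square root of 'the mean of the squares minus the square of the mean'."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum(x * x for x in values) / n - mean ** 2)

machine_a = [196, 198, 198, 199, 200, 200, 201, 201, 202, 205]
machine_b = [192, 194, 195, 198, 200, 201, 203, 204, 206, 207]

for name, data in (("A", machine_a), ("B", machine_b)):
    print(name, round(sd_definition(data), 2), round(sd_alternative(data), 2))
```

The two functions agree to within rounding. In floating-point arithmetic the alternative form subtracts two large, nearly equal numbers, so for data with a large mean and a small spread the definition form is usually the safer choice on a computer, even though the alternative form is quicker by hand.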
or ir1 the ahcmmivc form
Example 1.23
The mean of the five numbers 2, 3, 5, 6, 8 is 4.8. Calculate the standard deviation.
Solution 1.23
S .~
2 Consider again the data given in Example 1.19, on page 32, which shows the number of
Method 1 using )L(xn-x) Method 2 using s ~ )Lnx'- x-z
children in 20 families. The mean is 2.9.
2 -2.8 7.84 2 4 3 4 8 2 3
Frequency, f
3 -1.8 3.24 3 9
5 0.2 0.04 5 25 You could use one of these three methods for finding the standard deviation. Method 2 is
6 1.2 1.44 6 36
8 3.2 10.24 8 64 more popular than Method 1.
Input data [I)IDTI [I) IDATAI The standard deviation of the number of children per family is 1.22 (2 d.p.).
llJIDTI llJIDATAI
2
[I] IDTI [I] IDATAI Method 2 - using s ~ )L!xf
~ - . x_ 2
IIJIDTI IIJIDATAI
ITJIDTI rn IDATAl X f x' fx'
To obtain 1 3
s ~ 2.135 ... ISHIFT! [I) B I2nd Fl EJ 1 3
4 16
2 4
You can check 72
3 8 9
x~4.8 ISHIFT I [JJ B I2nd Fl [J
2:x~24
2:x2 ~ 138
IRCLI[ID
IRCLI~
IRCLI [g I
Red he!"'"' e~; 1i1Id
CU\Y u! ,,_';-LLi.ihltor
l I2nd Fl
l2ndFI Q
EJ
I2nd Fl [I]
4
.5
2
3
2:{~20
16
25
2:fx 2
32
75
~ 198
n 5
To clear IMODEI[l] IMODEI [Q]
SD mode
$s^2 = \frac{\Sigma fx^2}{\Sigma f} - (2.9)^2$
$= \frac{198}{20} - (2.9)^2$
$= 1.49$
$s = \sqrt{1.49}$
$= 1.22$ (2 d.p.)
The standard deviation is 1.22 (2 d.p.), as before.

Method 3 - using the calculator in SD mode.
This time you need to take account of the frequencies, and this is done in exactly the same way as when finding the mean:
Using the calculator in SD mode (Casio 570W/85W/85WA or Sharp):
Set SD mode.
Clear memories.
Input data. Do this in the order x × f.
[The key sequences are not recoverable from the source.]
To obtain
$\bar{x} = 2.9$
$s = 1.220\ldots$
$\Sigma f = 20$
Therefore the standard deviation is 1.22 (2 d.p.), as before.

In a grouped frequency distribution, the mid-interval value is taken as representative of the interval, as in the following example.

Example 1.24
An intelligence test was taken by 115 candidates. For each candidate the time taken to complete the test was recorded, and the times were summarised in a histogram (see diagram).
Write down the frequency for each of the class intervals 0-1, 1-2, 2-3, 3-5 and 5-10 minutes.
Calculate estimates of the mean and standard deviation of the times taken to complete the test. (C)
[Histogram: frequency density plotted against time (minutes).]

Solution 1.24
Frequency = frequency density × interval width. Note that the interval 2-3, for example, represents 2 ≤ time < 3.

Time (min)   0-1   1-2   2-3   3-5   5-10
Frequency    10    15    25    40    25

To calculate estimates for the mean and standard deviation, use mid-interval values, x.

Time (min)   x     f     fx      fx²
0-1          0.5   10    5       2.5
1-2          1.5   15    22.5    33.75
2-3          2.5   25    62.5    156.25
3-5          4     40    160     640
5-10         7.5   25    187.5   1406.25
                   Σf = 115   Σfx = 437.5   Σfx² = 2238.75

$\bar{x} = \frac{\Sigma fx}{\Sigma f} = \frac{437.5}{115} = 3.80\ldots$
$s = \sqrt{\frac{\Sigma fx^2}{\Sigma f} - \bar{x}^2} = \sqrt{\frac{2238.75}{115} - 3.80\ldots^2} = 2.2$ (2 s.f.)
The mean time is 3.8 minutes and the standard deviation is 2.2 minutes.
[You could have calculated these directly using the calculator in SD mode. Check them yourself.]

If you are given summary information, rather than the raw data or frequency distribution, you cannot use the calculator in SD mode. You will have to use the formulae to calculate the mean and standard deviation, as in the following example.

Example 1.25
(a) Cartons of orange juice are advertised as containing 1 litre. A random sample of 100 cartons gave the following results for the volume, x.
$\Sigma x = 101.4$, $\Sigma x^2 = 102.83$
Calculate the mean and the standard deviation of the volume of orange juice in these 100 cartons.
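When only totals are available, the mean and standard deviation follow directly from the formulae. The sketch below is an illustration with an invented function name; it takes the number of readings, the sum and the sum of squares (for a frequency distribution these are Σf, Σfx and Σfx²). The totals used are those quoted for the children-per-family data, for Solution 1.24 and for Example 1.25(a); the value Σfx = 58 for the children data is computed here from the frequency table.

```python
import math

def mean_and_sd_from_totals(n, sum_x, sum_x2):
    """Mean and divisor-n standard deviation from n, the sum and the sum of squares."""
    mean = sum_x / n
    variance = sum_x2 / n - mean ** 2   # mean of the squares minus square of the mean
    return mean, math.sqrt(variance)

children = mean_and_sd_from_totals(20, 58, 198)          # number of children per family
times = mean_and_sd_from_totals(115, 437.5, 2238.75)     # test times, Solution 1.24
cartons = mean_and_sd_from_totals(100, 101.4, 102.83)    # carton volumes, Example 1.25(a)

for mean, sd in (children, times, cartons):
    print(round(mean, 3), round(sd, 3))
```

For the carton data the variance is the difference of two numbers that agree to four significant figures, so the totals must be used unrounded.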
The standard deviation of the volume is 0.010 litres (2 s.f.) Score Frequency 1
10. For a set of nine numbers I:(x- X) = 60 and
I:x 1 = 285. Find the mean of the numbers.
(b) L(x~997,Lfx 2 ~49711,L(~20 100-106 8
107-113 13
J 1. A group of 20 people played a game. The table
(i) x ~ L.fx ~ 997 ~ 49.85 114-120 24 below shows the frequency distribution of their
L( 20 121-127 11 scores.
128-134 4
The mean length of the rods is 49.85 em. Score 2 4 X
L(x 2 2
49 711
-49.85 2 ~ 0.5275 Number of people 2 5 7 6
(ii) Variance=--- x 5. The stemplot shows the times, recorded to the
L( 20 nearest second, of 12 people in a race.
Given that the mean score is 5, find
The variance is 0.5275 cm 2 • Calculate the mean time and the standard (a) the value of x,
deviation. (b) the variance of the distribution.
(C Additional)
Stem ~e~f J Key 115 means 15 seconds
1
1 5 5 6 6 6 12. From the information given about each of the
1 7 9 9 following sets of data, work out the missing
2 0 1 values in the table:
lf ean a 6. A vertical line graph for a set of data is shown n ~X ~x' x s
2. The table shows the weekly wages in£ of each of below. ·calculate the mean and standard
1. Do not use the statistical program on your
100 factory workers. deviation of the data. (a) 63 7623 924 800
calculator for this question.
(i) For each of the following sets of numbers, (a) Draw a histogram to illustrate this (b) 152.6 10.9 1.7
calculate the mean and the standard information. 57 300 33
(b) Calculate the mean wage and the standard (c) 52
deviation. Try using both forms of the 57 4
deviation. (d) 18
formula for the standard deviation in parts
(a) to (c). In parts (d) to (f) choose one of
the methods. Number of
13. At a bird observatory, migrating willow warblers
Wage£ workers are caught, measured and ringed before being
(a) 2, 4, 5, 6, 8
(b) 6, 8, 9, 11 10 released. The histogram below illustrates the
200<x<250 lengths, in millimetres, of the willow warblers
(c) 11, 14, 17, 23,29
(d) 5, 13, 7, 9, 16, 15 250 <x < 300 16 caught during one migration season.
(e) 4.6, 2.7, 3.1, 0.5, 6.2 300<x<375 40
(f) 200, 203, 206, 207, 209 375<x<400 26
5 6 7 8 9
(ii) Now check your answers using your 400 <x < 500 8
calculator in SD (STAT) mode.
Solution 1.26
(b) State briefly how it may be deduced from
16 the histogram (without any calculation) that 2+3+6+9
·~
ni>
E c
an estimate of the mean length is 111 mm. (a) x ----:--~ 5
4
s~ Explain briefly why this value may not be
~0
'l;;E
c E
12
the true mean length of the willow warblers
caught. S
2 Lx2
~---X
-2 4+9+36+81
4
5 2 ~ 7.5, s~ ru ~ 2.7 (2 s.f.)
~tv 8 n
~0 (c) Given that the lengths, x mm, of the willow
g.gj
~-=
warblers caught during this migration
on 4 season were such that :Ex= 13 099 and (b) Newmean~5+1~6
£0 2+3+6+9+a+b
:Ex 2 = 1 455 506, calculate the standard
deviation of the lengths. (C) 6
0 100 105 110 115 120 125 6
Length (mml
14. For a particular set of observations "'Ef = 20, 20 +a+ b
"f.(x 2 = 16 143, "f.{x = 563. Find the values of the 6
6
mean and the standard deviation. 20 +a+ b ~ 36
(a) Explain how the histogram shows that the a+b~16 ...... ®
15. For a given frequency distribution
total number of willow warblers caught at Lf(x- x) 2 ~ 182.3, '£fx 2 ~ 1025, Lf ~ 30. Variance of original set~ s 2 ~ 7.5. So new variance~ 7.5 + 2.5 ~ 10
the observatory during the migration season find the mean of the distribution. 2 2
is 1'1 8. 4+9+36+81+a +b 2
10 -6
6
16. The speeds of cars passing a speed camera are shown in the histogram.
Calculate estimates of the mean speed and the standard deviation.

$10 = \frac{130 + a^2 + b^2}{6} - 36$
$46 = \frac{130 + a^2 + b^2}{6}$
$130 + a^2 + b^2 = 276$
$a^2 + b^2 = 146$ ...... (ii)
From (i), $b = 16 - a$. Substituting in (ii):
$a^2 + (16 - a)^2 = 146$
$a^2 + 256 - 32a + a^2 = 146$
$2a^2 - 32a + 110 = 0$
$a^2 - 16a + 55 = 0$
$(a - 11)(a - 5) = 0$
so a = 11 or a = 5
If a = 11, b = 16 - 11 = 5
If a = 5, b = 16 - 5 = 11
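The pair of equations solved in Solution 1.26, a + b = 16 and a² + b² = 146, can be checked with a short Python sketch. The function name is invented for this illustration; it substitutes b = total - a and solves the resulting quadratic with the quadratic formula.

```python
import math

def solve_sum_and_sum_of_squares(total, total_squares):
    """Find the pairs (a, b) with a + b = total and a**2 + b**2 = total_squares."""
    # Substituting b = total - a gives 2a^2 - 2*total*a + (total^2 - total_squares) = 0
    qa, qb, qc = 2, -2 * total, total ** 2 - total_squares
    discriminant = qb ** 2 - 4 * qa * qc
    if discriminant < 0:
        raise ValueError("no real solutions")
    root = math.sqrt(discriminant)
    a_high = (-qb + root) / (2 * qa)
    a_low = (-qb - root) / (2 * qa)
    return (a_high, total - a_high), (a_low, total - a_low)

solutions = solve_sum_and_sum_of_squares(16, 146)
print(solutions)
```

The two solutions are the same pair of numbers in either order, which matches the symmetric roles of a and b in the original equations.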
Solution 1.27
(a) $\bar{x} = \frac{\Sigma x}{n} = \frac{920}{200} = 4.6$
$s^2 = \frac{\Sigma x^2}{n} - \bar{x}^2 = \frac{5032}{200} - 4.6^2 = 4$
$s = \sqrt{4} = 2$
The mean is 4.6 errors per page and the standard deviation is 2 errors.
(b) For the errors, y, on the further 50 pages
Mean = 4.4
so $4.4 = \frac{\Sigma y}{50}$
$\Sigma y = 50 \times 4.4 = 220$
The standard deviation = 2.2
so $2.2^2 = \frac{\Sigma y^2}{50} - 4.4^2$
$\Sigma y^2 = 50(2.2^2 + 4.4^2) = 1210$
For the combined set of 250 pages:
[The working for the combined mean and variance is not recoverable from the source.]

        Perch   Tench   Roach   Mean mass (kg)   Standard deviation (kg)
Ali     2       3       7       1.07             0.42
Les     6       2       8       0.76             0.27
Sam     1       0       1       1.00             0

(a) State how it may be deduced from the data that the mass of each fish caught by Sam was 1.00 kg.
(b) The winner was the person who had caught the greatest total mass of fish by 4 p.m. Determine who was the winner, showing your working.
(c) Before leaving the waterside, Sam catches one more fish and weighs it. He then announces that, if this extra fish is included with the other two fish he caught, the standard deviation is 1.00 kg. Find the mass of this extra fish. (C)

Solution 1.28
(a) If the standard deviation is 0, there is no deviation from the mean. All the readings must be exactly the same as the mean.
Since the mean is 1.00 kg, both fish must have weighed 1.00 kg.
(b)
        Number of fish   Mean      Total mass
Ali     12               1.07 kg   12 × 1.07 = 12.84 kg
Les     16               0.76 kg   16 × 0.76 = 12.16 kg
Sam     2                1.00 kg   2 × 1.00 = 2.00 kg

(c) Let the mass of the extra fish be x kg. Then
$1 = \frac{2 + x^2}{3} - \left(\frac{2 + x}{3}\right)^2$ (squaring both sides)
which simplifies to $2x^2 - 4x - 7 = 0$, so
$x = \frac{4 \pm \sqrt{72}}{4}$
x = 3.121... (ignoring negative value for x)
Mass of Sam's extra fish is 3.12 kg (2 d.p.)
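Part (c) of Solution 1.28 can be checked with a short Python sketch. The function name is invented for this illustration; it sets up the equation 'mean of the squares minus the square of the mean = target variance' for the enlarged data set and solves the resulting quadratic for the extra value, keeping the larger root.

```python
import math
import statistics

def extra_value_for_sd(existing, target_sd):
    """Largest x such that existing + [x] has divisor-n standard deviation target_sd."""
    n = len(existing) + 1
    s = sum(existing)
    q = sum(v * v for v in existing)
    # (q + x^2)/n - ((s + x)/n)^2 = target_sd^2 rearranges to
    # (n - 1)x^2 - 2sx + (nq - s^2 - n^2 * target_sd^2) = 0
    qa = n - 1
    qb = -2 * s
    qc = n * q - s ** 2 - (n * target_sd) ** 2
    discriminant = qb ** 2 - 4 * qa * qc
    if discriminant < 0:
        raise ValueError("no real solution")
    return (-qb + math.sqrt(discriminant)) / (2 * qa)

extra_fish = extra_value_for_sd([1.00, 1.00], target_sd=1.00)
print(round(extra_fish, 2))
# Check: the three masses together should have standard deviation 1.00
print(round(statistics.pstdev([1.00, 1.00, extra_fish]), 2))
```

For Sam's two 1.00 kg fish the quadratic reduces to 2x² - 4x - 7 = 0, the same equation that the hand solution leads to.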
13. The figures in the table below are the ages, to the Number of Mean cost S.D.
nearest year, of a random sample of 30 people (£) (£)
holidays
ean a negotiating a mortgage with a bank.
10. The manager of a car showroom monitored the ShopR 32 190.35 10.4
1. The mean of ten numbers is 8. If an eleventh 29 26 31 42 38 202.25 15.5
number is now included in the results, the mean numbers of cars sold during two successive
38 38 ShopS 24
five-day periods. During the first five days the 45 35 37
becomes 9. What is the value of the eleventh 36 39 49 40 32 j,
numbers of cars sold per day had mean 1.8 and (L)
number? 32 34 27 61 29
variance 0.56. During the next five days the
33 31 33 52 44
numbers of cars sold per day had mean 2.8 and 15. Three random samples of 50, 30 and 20 bags
2. The mean of four numbers is 5, and the mean of 32 30 38 42 33
variance 1.76. Find the mean and variance of the respectively are taken from the production line of
three different numbers is 12. What is the mean numbers of cars sold per day during the full ten
Copy and complete the following stem and leaf '12 kg bags' of cat litter. The contents of each
of the seven numbers together? days. (NEAB) bag are then weighed. A summary of the results
diagram. Use the diagram to identify two
features of the shape of the distribution. is shown in the table.
3. The mean of n numbers is 5. If the number 13 is
11. Prior to the start of delicate wage negotiations in
now included with then numbers, the new mean
is 6. Find the value of n.
a large company, the unions and the
management take independent samples of the
25
30
I 41 1
Size
Mean wt.
(kg)
S.D.
(kg)
Sample
work force and ask them at what percentage 35
4. The mean of the numbers 3, 6, 7, a, 14, is 8. level they believe a settlement should be made. 11.8 0.5
Find the mean age of the 30 people. Given that 1 50
Find the standard deviation of the set of The results are as follows: 18 of them are men and that the mean age of the 2 30 12.1 0.9
numbers. men is 37.72, find the mean age of the 12
3 20 11.7 1.1
Standard women. (ME/)
5. The numbers a, b, 8, 5, 7 have mean 6 and
variance 2. Find the values of a and b, if a> b. Sample Size Mean deviation Find, in kilograms to two decimal places, the
14. A travel agency has two shops, RandS. The
mean weight per bag and the standard deviation
350 12.4% 2.1% number of holidays purchased in a particular
6. For a set of 20 numbers lli = 300 and 'management' for the 100 bags. (L)
week and the mean and standard deviation of the
Z.x 2 = 5500. For a second set of 30 numbers 'union' 237 10.7% 1.8%
costs of these holidays at each shop are shown in
Lx = 480 and :Ex 2 = 9600. Find the mean and the 16. The average height of 20 boys is 160 em, with a
the following table. standard deviation of 4 em. The average height
standard deviation of the combined set of Assuming that no individual was consulted by Calculate the mean, and, to the nearest penny,
50 numbers. of 30 girls is 155 em, with a standard deviation
both sides, calculate the mean and standard the standard deviation of the costs of all the
of 3.5 em. Find the standard deviation of the
deviation for these 587 workers. (AEB) 56 holidays purchased.
7. If the mean of the following frequency whole group of 50 children.
distribution is 3.66, find the value of a.
12. In a germination experiment, 200 rows of seeds,
5 6 with ten seeds per row, were incubated. The
1 2 3 4
frequency distribution of the number of seeds
3 9 a 11 8 7 which germinated per row is shown below. SCALING SETS OF DATA
Number of seeds germinated Frequency
8. A bag contained five balls each bearing one of
the numbers 1, 2, 3, 4, 5. A ball was drawn from .0 4 Example 1.29
the bag, its number noted, and then replaced.
1 10
This was done 50 times in all and the table Sweets are packed into bags with a nominal mass of 75 g. Ten bags are picked at random
below shows the resulting frequency distribution. 2 16
from the production line and weighed. Their masses, in grams, are
3 28
Number 1 2 3 4 5 34
4 76, 74.2, 75.1, 73.7, 72, 74.3, 75.4, 74, 73.1, 72.8
11 y 8 9 5 44
Frequency X
6 32 (a) Use your calculator to find the mean mass and the standard deviation.
If the mean is 2.7, determine the values 7 16
of x andy. 10 It was later discovered that the scales were reading 3.2 g below the correct weight.
8
9 6
9. Parplan Opinion Polls Ltd conducted a (b) What was the correct mean mass of the ten bags and the correct standard deviation?
nationwide survey into the attitudes of teenage 10 0
girls. One of the questions asked was 'What is (c) Compare your answers to (a) and (b) and comment.
the ideal age for a girl to have her first baby?' In (a) Calculate the mean and the standard
reply, the sample of 165 girls from the Northern deviation of the number of seeds
zone gave a mean of 23.4 years and a standard
germinating per row.
deviation of 1.6 years. Subsequently, the overall Solution 1.29
sample of 384 girls (Northern plus Southern For another 50 rows an analysis shows that t~e
zones) gave a mean of 24.8 years and a standard mean is 4.4 seeds and the standard deviation IS
(a) According to the scales with measurements being given in grams
deviation of 2.2 years. 2.2 seeds.
(b) Determine the mean and, to two decimal
Assuming that no girl was consulted twice, X= 74.06, s = 1.166 ... = 1.17 (2 d.p.)
places, the standard deviation for the
calculate the mean and standard deviation for
the 219 girls from the Southern zone. (AEB) 250 rows.
You can see from the diagram that the new set of data is much more spread out.
(b) The correct readings are:
79.2, 77.4, 78.3, 76.9, 75.2, 77.5, 78.6, 77.2, 76.3, 76
(c) Notice that 77.26 - 74.06 = 3.2, i.e. correct mean - original mean = 3.2
So correct mean = original mean + 3.2; correct s.d. = original s.d.
If each reading is increased by 3.2, then the mean is increased by 3.2. The standard deviation is unchanged.
Showing the two sets of readings on a graph helps to show that although the mean increased, the spread of the data about the mean remained the same.
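The effect of adding a constant to every reading can be checked numerically. The sketch below uses the ten masses of Example 1.29 and Python's `statistics` module; the helper `scale` is invented for this illustration. The last two lines apply the formula y = 2x - 6 from Example 1.30 to the same readings, purely as an assumed illustration of the standard result that multiplying by a also multiplies the standard deviation by |a|.

```python
import statistics

def scale(values, a, b):
    """Apply the linear scaling y = a*x + b to every value."""
    return [a * x + b for x in values]

readings = [76, 74.2, 75.1, 73.7, 72, 74.3, 75.4, 74, 73.1, 72.8]
corrected = scale(readings, 1, 3.2)   # scales were reading 3.2 g too low

original_mean = statistics.mean(readings)
original_sd = statistics.pstdev(readings)    # divisor n, as in the text
corrected_mean = statistics.mean(corrected)
corrected_sd = statistics.pstdev(corrected)

print(round(original_mean, 2), round(original_sd, 2))
print(round(corrected_mean, 2), round(corrected_sd, 2))

doubled = scale(readings, 2, -6)
print(round(statistics.mean(doubled), 2), round(statistics.pstdev(doubled), 2))
```

Adding a constant shifts every reading and the mean by the same amount, so each deviation from the mean, and therefore the standard deviation, is left unchanged.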
va!nt of/::_.,
[Diagram: the original data plotted as crosses on a number line, with the original mean marked.]
For example, if y = -!x, since HI~~·
Example 1.30
Joe's mean mark for the physics tests for the term was 72. His teacher decided to scale all the
marks according to the formula y ~ 2x - 6, where y is the new mark and x the original mark.
Exercise 1h Scaling sets of data
Example 1.32
1. (a) Find the mean and the standard deviation of {a) the values of a and b,
(b) the value of the scaled mark which
For students on an Electronics course the assessment consists of two components: a written
examination paper and a project. The marks for the examination paper are distributed with a
mean of 62 and a standard deviation of 16. Those for the project have a mean of 37 and a
the set of numbers 4, 6, 9, 3, 5, 6, 9.
(b) Deduce the mean and the standard deviation
of the set of numbers 514,516, 519,513,
corresponds to a mark of 64 in the original
data,
(c) the value in the original data if the scaled
il
515, 516, 519.
standard deviation of 6. Anna, a student on the course, scored 80 marks on the examination (c) Deduce the mean and the standard deviation mark is 79.
of the set of numbers 52, 78, 117, 39, 65,
paper and 46 marks for her project. 78, 117. 6. The marks of five students in a mathematics test
were 27, 31, 35, 47, 50.
Transfonn each of Anna's marks into a standardised score, such that, for each 2. A set of numbers has a mean of 22 and a (a) Calculate the mean mark and the standard
(a)
component, the mean and standard deviation for all students on the course are 50 and 20, standard deviation of 6. If 3 is added to each deviation.
number of the set, and each resulting number is {b) The marks are scaled so that the mean and
respectively. then doubled, find the mean and standard standard deviation become 50 and 20
Hence cmnpare Anna's relative perfonnance in the two assessment components. (NEAB)
(b) deviation of the new set. (C Additional} respectively. Calculate, to the nearest whole
number, the new marks corresponding to
3. A set of values of a variable X has a mean 11 and the original marks of 31 and 50.
a standard deviation a. State the new value of the IC Additional)
Solution 1.32 mean and of the standard deviation when each of
the variables is (a) increased by k, (b) multiplied
7. It is proposed to convert a set of values of a
by.p. Values of a new variable Yare obtained by
(a) Standardised values: y = 50, sy = 20 usmg the formula Y = 3X + 5. Find the mean and
variable X, whose mean 'and standard deviation
are 20 and 5 respectively, to a set of values of a
Examination X= 62, sx = 16 the standard deviation of the set of values of Y.
variable Y whose mean and standard deviation
Let y=ax+b (C Additional)
are 42 and 8 respectively. If the conversion
then y =ax+ b 4. Show that the standard deviation of the integers
formula is Y =aX+ b, calculate the values of a
50= 62a + b ...... ® 1, 2, 3, 4, 5, 6, 7 is 2.
and of b. (C Additional)
Using this result find the standard deviation of 8. In order to compare the performances of
Now = asx
Sy the numbers candidates in two schools a test was given. The
20=ax 16 (a) 101, 102, 103, 104, 105, 106, 107.
mean mark at school A was 45, and the mean
a= 1.25 (b) 100,200, 300 400 500 600 700
mark at school B was 31 with a standard
l~i 2.o1, 3.02, 4.<13, 5.04, 6:o5, 7.o6, 8.o7. deviation of 5. The marks of school A are scaled
{ } Wnte down seven integers which have so that the mean and standard deviation are the
Substituting in (j)
50= 62 X 1.25 + b mean 5 and standard deviation 6. same as school B and a mark of 85 at school A
b = -27.5 (L Additional)
becomes 63. Find the values of a and b if the
5. It is prop transformation used is y =ax+ b. Find also the
. ose d to convert a set of marks whose
The transformation for the examination paper is y = 1.25x- 27.5 mean original standard deviation of the marks from
m·t k IS 52
. h,and st an d ar d d evtatton
. . ts . 4 to a set of
school A.
When x = 80, y = 1.25 x 80-27.5 = 72.5 , r ,s Wit . mean 61 an d stand ar d deviation 3.
Th,
c equation
conve t h for th. e t rans £ormatwn
. necessary to
Anna's standardised mark for the examination is 72.5. r t e marks IS y =ax+ b. Find
T""'''
I
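The scaling questions above all rest on the same two equations used in Solution 1.32: ȳ = ax̄ + b and s_y = a s_x. A minimal sketch of solving them is given below. The function name linear_scaling is hypothetical, and the sketch assumes a positive multiplier a, so that the order of the marks is preserved. The project figures are taken from Example 1.32; the project score is computed here rather than quoted from the text.

```python
def linear_scaling(mean_x, sd_x, mean_y, sd_y):
    """Return (a, b) for y = a*x + b, assuming a > 0.

    Uses sd_y = a * sd_x and mean_y = a * mean_x + b.
    """
    a = sd_y / sd_x
    b = mean_y - a * mean_x
    return a, b

# Examination paper: mean 62, s.d. 16, scaled to mean 50, s.d. 20.
a_exam, b_exam = linear_scaling(62, 16, 50, 20)
exam_score = a_exam * 80 + b_exam          # Anna scored 80

# Project: mean 37, s.d. 6, scaled to the same mean and s.d.
a_proj, b_proj = linear_scaling(37, 6, 50, 20)
project_score = a_proj * 46 + b_proj       # Anna scored 46

print(a_exam, b_exam, exam_score)
print(round(project_score, 1))
```

The first line of output reproduces the values a = 1.25, b = -27.5 and the standardised examination mark 72.5 found in the solution.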
Example 1.34
Use the coding y = (x - 200 000)/25 000 to find the mean and standard deviation of the following:

(c) Using the frequency table estimate the mean and standard deviation of the marks.
9. The fo\lowing is a set of 109 examination marks (d) The marks are to be scaled linearly by the
relation Y =a+ bX where X is the old mark
ordered for convenience. and Y the new mark. The new mean and
20 150 000 175 000 200 000 225 000 250 000 275 000
16 17 18 X 125 000
11 11 12 13 14 2.1 25 26 26 standard deviation are to be 50 and 10
6 respectively. Using your estimates in (c) 3
24 25 25 31 19 27 35 24 12
21 21 23 28 28 29 29 29 30 36 calculate suitable values for a and b. f 5
27 27 28 32 33 33 34 34 35 39
31 32 32 38 38 39 40 10. The mean of the marks scored by candidates in
37 37 38
36 37 37 39 39 39 40 40 40 43 an examination is 45. These marks are scaled Solution 1.34
39 39 39 42 42 42 linearly to give a mean of 50 and a standard
40 41 41 41 42 47 47 47 47
40
45 46 46 53 deviation of 15. Given that the scaled mark of 80 y ~ x- 200 000
43 43 44 52 52 53 62 corresponds to an original mark of 70, calculate
50 50 51 51 52 59 61 (a) the standard deviation of the original marks, 25 000
48 59
57 58 58 82
54 54 5.1 67 70 76 77 (b) the mark which is unchanged by the scaling. so 25 OOOy ~ x - 200 000
66
63 64 66 I.e. ": ~ 25 OOOy + 200 000
Given that the greatest and least scaled marks are
(a) Construct a grouped frequency distribution x ~ 25 OOOy + 200 000 and sx 25 OOOs y
using a class width of 10 and starting with 92 and 2 respectively, calculate the
corresponding original marks. (C Additional)
o-9.
(b) Draw a histogram and comment on the shape of the distribution.

Coded values for Solution 1.34:

x          y     f     fy    fy²
125 000   -3     5    -15    45
150 000   -2    19    -38    76
175 000   -1    27    -27    27
200 000    0    35      0     0
225 000    1    24     24    24
250 000    2    12     24    48
275 000    3     3      9    27
                Σf = 125   Σfy = -23   Σfy² = 247

ȳ = Σfy/Σf = -23/125 = -0.184
s_y² = Σfy²/Σf - ȳ² = 247/125 - (-0.184)² = 1.942...
s_y = 1.393...
Hence x̄ = 25 000 × (-0.184) + 200 000 = 195 400 and s_x = 25 000 × 1.393... = 34 840 (to the nearest 10).

USING A METHOD OF CODING TO FIND THE MEAN AND STANDARD DEVIATION

In general, if y = (x - a)/b with b > 0, then x = a + by, so x̄ = a + bȳ and s_x = b s_y.

Example 1.33
Salt is packed in bags which the manufacturer claims contain 25 kg each. Eighty bags are examined and the mass, x kg, of each is found. The results are Σ(x - 25) = 27.2, Σ(x - 25)² = 85.1. Find the mean and the standard deviation of the masses.

Let y = x - 25, so that ȳ = 27.2/80 = 0.34 and s_y = √(85.1/80 - 0.34²) = 0.9737...
Now if y = x - 25, then x = y + 25
So x̄ = ȳ + 25
Therefore x̄ = 0.34 + 25 = 25.34
Also s_x = s_y
so s_x = 0.9737...
The mean mass is 25.34 kg and the standard deviation is 0.97 kg (2 d.p.).
NOTE: The value 25 used here is sometimes known as the assumed mean.
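The coding method can be sketched in a few lines of code. The example below is an illustration, not part of the original text: the function names coded_summary and decode are hypothetical, and the coding is assumed to have the form y = (x - a)/b with b > 0. It is applied to the totals of Example 1.33 and to the frequency table of Example 1.34.

```python
from math import sqrt

def coded_summary(sum_f, sum_fy, sum_fy2):
    """Mean and standard deviation of the coded variable y."""
    mean_y = sum_fy / sum_f
    sd_y = sqrt(sum_fy2 / sum_f - mean_y ** 2)
    return mean_y, sd_y

def decode(mean_y, sd_y, a, b):
    """Undo the coding y = (x - a) / b, assuming b > 0."""
    return a + b * mean_y, b * sd_y

# Example 1.33: n = 80, sum(x - 25) = 27.2, sum((x - 25)^2) = 85.1.
mean_y33, sd_y33 = coded_summary(80, 27.2, 85.1)
mean_mass, sd_mass = decode(mean_y33, sd_y33, a=25, b=1)

# Example 1.34: coding y = (x - 200 000) / 25 000.
xs = [125_000, 150_000, 175_000, 200_000, 225_000, 250_000, 275_000]
fs = [5, 19, 27, 35, 24, 12, 3]
ys = [(x - 200_000) / 25_000 for x in xs]

sum_f = sum(fs)
sum_fy = sum(f * y for f, y in zip(fs, ys))
sum_fy2 = sum(f * y * y for f, y in zip(fs, ys))

mean_y34, sd_y34 = coded_summary(sum_f, sum_fy, sum_fy2)
mean_x34, sd_x34 = decode(mean_y34, sd_y34, a=200_000, b=25_000)

print(round(mean_mass, 2), round(sd_mass, 2))
print(sum_f, sum_fy, sum_fy2, round(mean_y34, 3), round(sd_y34, 3))
```

Working with the small coded values keeps the arithmetic manageable by hand; decoding at the end restores the original units.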
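CUMULATIVE FREQUENCY
The cumulative frequency is the total frequency up to a particular item. A cumulative frequency distribution can be illustrated
(a) when the data are discrete and ungrouped - by drawing a step diagram,
(b) when the data are continuous or in the form of a grouped discrete distribution - by drawing a cumulative frequency polygon or curve.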
1\ d deviation of the
Time {min)
Frequenc_!_
nd the stan d ar 0 (a) Cumulative frequency - step diagrams for discrete ungrouped data
1. Find the mean af d using the coding -15
following sets o ata, 3
-20 The table shows the number of attempts needed to pass the driving test by 100 candidates at a
indicated: x- 312 2
-25
X f y~~ 6 particular test centre.
(a) -30
10 2 3 4 5 6
304 1 -35 Number of attempts 1
7
308 5 -40 4 2
2 Frequency 33 42 13 6
312 9 -45
1 (Number of candidates)
316 4 -50
320 4 . taken to feed the The cumulative frequency distribution is formed as follows:
t the mean ume .
324 2 CaIcuIa e hod of codmg.
animals, using a met <:1 <:2 <3 <;4 <:5 <;6
x-450 Number of attempts
y~~ h times taken on
Interval f 5 The table shows t e f oach to complete Cumulative- frequency 33 75 88 94 98 100
(b) · 30 consecuuv · e days or ac · h ve
. ular route. T1mes a
3
100 <:x < 200 one journey on a part;~st minute. Find the_ m~an T T T
200 <;X< 300
7 been given t~ the nea d the standard devmtton, 33 +42 33 + 42 + 13 total number
12 time for the JOurney ad-
300<:x<400 using a method of co mg. of candidates
18
400 <:x < 500
12 Frequency Plot the cumulative frequency against the number of attempts and decide how to join the
500 <;x < 600 Time (min)
6 1 points.
600 <;x <700
60-63
X- 0.0225 3
y- 64-67
f 0.005 12
(c) Interval 68-71
10
5 72-75
0- 4
10 76-79
o.oo5-
0.01- 13
tudents timed how long it
Q.Q15- 18 6. In a practical d~ss ~f their saliva to break dow~
0.02- 12 took for a samp e. The times to the neares
6 a 2% starch solutt<;>ll·h ble b~low. Find the
0.025- hown lD t eta
6 second _are s . a method of coding.
0.03- mean tlme, usmg
0
o.Q35-
Time (seconds)
Frequency
1
ii
· 1 t of data 11-20
2. For a partlCU ar se I 50)2 = 238.4 2
~'x- 50)~ 123.5, :E,x- 21-30
n = 100, .:..., d d deviation of x. 5
e stan ar 31-40
Find the mean and th 11
41-50
8
· nee of x it 51-60
3 Find t he vana 2 2593 2
· 7 Lf'x 100) ~ '
:E{(x-100)~12, 1
- 61-70
1
:E f ~ 20 71-90
onth the owner ot a i:
4 Each morning tor am I it took to feed the
. h ld. timed how ong
small o mg
animals. The resu1ts
were as shown: _:\ i i h ~
l_ r:
Plotting the values (1, 33), (2, 75), (3, 88), ... tells you that 33 people took 1 attempt, 75 people took ≤ 2 attempts, 88 people took ≤ 3 attempts.
If you join the points with straight lines, such as from A to B, and consider a cumulative frequency of 50, this would suggest that 50 people took ≤ 1.4 attempts, which is nonsense.
0 1 3 14 24 29 30
Cumulative frequency
roul number
P) t'ln•b
The step diagram is necessary because the values are not distributed evenly throughout the
intervals 1 to 2, 2 to 3, ... but 'jump' or 'step up' from 1 to 2, then 2 to 3 and so on.
(a) Consider a cumulative frequency of 50. From the graph, the number of attempts is two.
This means that when the data are placed in ascending order of size, the 50th item is 2, i.e. the 50th person took two attempts.
(b) To find how many took up to four attempts, go up from 4 on the horizontal axis to the top of the step, then go left to the cumulative frequency axis.
This shows that 94 candidates took up to four attempts.
If you go to the bottom of the step, this tells you the number of candidates who took fewer than four attempts (88 in this case).
Notice that it only makes sense to read from the discrete values on the horizontal axis.
It would be silly to consider 3.6 attempts, for example.
Note that in a step diagram, the mode is given by the value of the variable that gives the 'steepest' step.
From the graph above, the mode is two.

[Graph: cumulative frequency (0 to 30) against Height (cm), with readings marked at 10.5 cm and 16.5 cm.]
Values can be estimated from the graph. Note that the graph can be read in either direction.
(i) To find the number of plants that were less than 10.5 cm tall:
• Find the height 10.5 cm on the horizontal axis
• Draw a vertical line up from 10.5 to meet the curve
• Draw a horizontal line to the cumulative frequency axis and read the value
From the graph, seven plants were less than 10.5 cm tall.
(ii) To find x where 90% of the plants were less than x cm tall:
• 90% of 30 = 27
• Find 27 on the vertical axis and draw a horizontal line to meet the curve
• Draw a vertical line down to the horizontal axis and read the value
From the graph, 27 plants were less than 16.5 cm tall, so x = 16.5

Example 1.35
A survey is carried out to determine the numbers of pupils in various age groups who are attending nurseries, schools and colleges within a certain area. The results are summarised in the histogram.

[Histogram: frequency density against Age in years (0 to 20).]

(a) Copy and complete the following table showing the ages of the pupils and the corresponding cumulative frequencies.

Age in years up to      3    5    7    11    16    18
Cumulative frequency    0                          5600

(b) Draw a cumulative frequency diagram for the distribution.
(c) Use your cumulative frequency diagram to estimate the age exceeded by 30% of the pupils in the survey. (NEAB)

Solution 1.35
Frequency = frequency density × width, so for 3 ≤ age < 5, f = 200 × 2 = 400
Calculating the other frequencies gives

Age                             3 ≤ x < 5   5 ≤ x < 7   7 ≤ x < 11   11 ≤ x < 16   16 ≤ x < 18
Frequency (Number of pupils)    400         800         1800         2000          600

(a) The cumulative frequency table is

Age in years up to      3    5     7      11     16     18
Cumulative frequency    0    400   1200   3000   5000   5600

(b) [Cumulative frequency diagram: cumulative frequency against Age in years.]

(c) If 30% of the pupils exceed a certain age, then 70% of pupils are younger than this age.
Now 70% of 5600 = 3920
From the graph, 3920 pupils have an age up to 13.3 years, i.e. approx 13 years 4 months, so 30% of the pupils are older than 13 years 4 months.
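The arithmetic behind the table for this example, frequency = frequency density × width followed by a running total, can be sketched as follows. Only the first histogram height (200) is quoted in the text; the remaining densities below are an assumption, back-calculated from the frequencies given in the solution.

```python
from itertools import accumulate

# Class boundaries (age in years) from Example 1.35.
boundaries = [3, 5, 7, 11, 16, 18]

# Assumed histogram heights (pupils per year of age). Only 200 is quoted
# in the solution; the others are inferred from its frequencies.
densities = [200, 400, 450, 400, 300]

# Width of each class, then frequency = density * width.
widths = [hi - lo for lo, hi in zip(boundaries, boundaries[1:])]
frequencies = [d * w for d, w in zip(densities, widths)]

# Cumulative frequency "up to" each boundary, starting from 0 at age 3.
cumulative = [0] + list(accumulate(frequencies))

# Number of pupils younger than the age exceeded by 30% of pupils.
total = cumulative[-1]
target = total * 70 // 100

print(frequencies)
print(cumulative)
print(target)
```

The target cumulative frequency is the value that is then read across to the curve in part (c).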
Example 1.36
Students were asked how long it took them to travel to college on a particular morning. A cumulative frequency distribution was formed.

Time taken (minutes)    <5   <10   <15   <20   <25   <30   <35   <40   <45
Cumulative frequency    28   45    81    143   280   349   374   395   400

(a) Draw a cumulative frequency polygon.
(b) Estimate how many students took less than 18 minutes.
(c) Taking equal class intervals of 0-, 5-, 10-, ..., construct a frequency distribution and draw a histogram.

Solution 1.36
(a) Cumulative frequency polygon to show the times taken to travel to college

[Cumulative frequency polygon: cumulative frequency against Time (minutes), 0 to 45.]

(c) Form the frequency distribution and calculate the frequency density for each interval, where frequency density = frequency ÷ interval width.
Note that the width of each interval is 5.

Upper boundary   Cumulative frequency   Time (min)   Frequency          Frequency density
5                28                     0-           28                 5.6
10               45                     5-           45 - 28 = 17       3.4
15               81                     10-          81 - 45 = 36       7.2
20               143                    15-          143 - 81 = 62      12.4
25               280                    20-          280 - 143 = 137    27.4
30               349                    25-          349 - 280 = 69     13.8
35               374                    30-          374 - 349 = 25     5
40               395                    35-          395 - 374 = 21     4.2
45               400                    40-(45)      400 - 395 = 5      1
                                                     Total = 400

[Histogram: frequency density against Time (minutes), 0 to 45.]
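Recovering the frequencies from a cumulative distribution is a matter of taking successive differences. The short sketch below reproduces the frequency and frequency density columns for Example 1.36; the variable names are illustrative only.

```python
# Cumulative frequencies from Example 1.36 (time < 5, < 10, ..., < 45 minutes).
upper_boundaries = [5, 10, 15, 20, 25, 30, 35, 40, 45]
cumulative = [28, 45, 81, 143, 280, 349, 374, 395, 400]
width = 5  # every class has the same width

# Each frequency is the difference between consecutive cumulative values;
# a leading 0 supplies the starting point for the first class.
frequencies = [c - p for p, c in zip([0] + cumulative, cumulative)]

# Frequency density = frequency / interval width.
densities = [f / width for f in frequencies]

for ub, f, d in zip(upper_boundaries, frequencies, densities):
    print(f"{ub - width}-  {f:4d}  {d:5.1f}")
```

Because the classes are of equal width here, the densities are proportional to the frequencies; the division matters when the widths differ, as in Example 1.35.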
Girls

Mark    Cumulative %   Interval   % frequency   Frequency density
<10     0%             0-         0%            0
<20     2%             10-        2%            0.2
<30     4%             20-        2%            0.2
<40     8%             30-        4%            0.4
<50     14%            40-        6%            0.6
<60     25%            50-        11%           1.1
<70     40%            60-        15%           1.5
<80     82%            70-        42%           4.2
<90     98%            80-        16%           1.6
<100    100%           90-        2%            0.2
                                  Total 100%

[Graphs: cumulative percentage frequency curves and histograms of the marks (horizontal axis: Mark, 0 to 100).]

Great care must be taken when comparing these curves. A common mistake is to say that the boys have done better than the girls because the boys' graph is above that of the girls. If you calculate the corresponding percentage frequencies and draw the histograms you will see that
Interpercentile range
Ranges between various percentiles can be found. For example, the range giving the middle 80% of the readings is found by subtracting the 10th percentile from the 90th percentile, i.e. P90 - P10.
When finding the median and percentiles, it is important to take note of whether the data are grouped or ungrouped.

Ungrouped data - median
For ungrouped data consisting of n observations in order of size, the median is the ½(n + 1)th observation.
(a) Consider this set of numbers: 7, 7, 2, 3, 4, 2, 7, 9, 31.
There are nine numbers, so the median is the ½(9 + 1)th observation, i.e. the 5th observation.
The median is 7.

Interquartile range = Q3 - Q1 = 27 - 23 = 4

(c) 147 150 | 154 158 | 159 162 | 164 165
Q1 = ½(150 + 154) = 152
Q2 = ½(158 + 159) = 158.5
Q3 = ½(162 + 164) = 163
Interquartile range = Q3 - Q1 = 163 - 152 = 11

(d) 10 12 | 13 15 (19) 19 24 | 26 26
Q1 = ½(12 + 13) = 12.5
Q2 = 19
Q3 = ½(24 + 26) = 25
Interquartile range = Q3 - Q1 = 25 - 12.5 = 12.5

Sometimes the following rule is used to find the quartiles:
Q1 = ¼(n + 1)th value, Q3 = ¾(n + 1)th value.
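For ungrouped data the median and quartiles can be found by sorting the values and taking the median of each half, which is the approach shown in the worked sets (c) and (d). The function below is a sketch under that assumption: the name quartiles is hypothetical, and when n is odd the middle value is left out of both halves. The ¼(n + 1) rule can give slightly different values.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q2, Q3) by taking the median of each half of the data.

    Assumes at least two values. When the number of values is odd,
    the middle value belongs to neither half.
    """
    ordered = sorted(data)
    half = len(ordered) // 2
    lower = ordered[:half]    # values below the median position
    upper = ordered[-half:]   # values above the median position
    return median(lower), median(ordered), median(upper)

set_c = [147, 150, 154, 158, 159, 162, 164, 165]
set_d = [10, 12, 13, 15, 19, 19, 24, 26, 26]

q1_c, q2_c, q3_c = quartiles(set_c)
q1_d, q2_d, q3_d = quartiles(set_d)

print(q1_c, q2_c, q3_c, q3_c - q1_c)
print(q1_d, q2_d, q3_d, q3_d - q1_d)
print(median([7, 7, 2, 3, 4, 2, 7, 9, 31]))
```

The last line confirms that the data need not be supplied in order; sorting is part of the calculation.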
Therefore Q1 is 0.135 seconds.
Q3 is the 18.5th value. This is halfway between the 18th and 19th values, which are both 0.2 seconds.
Therefore Q3 is 0.2 seconds.
The interquartile range = Q3 - Q1 = 0.2 - 0.135 = 0.065 seconds

Summary of results
                       Girls     Boys
Median                 0.19 s    0.17 s
Interquartile range    0.06 s    0.065 s

Example 1.37
A reaction time experiment was performed first with 21 girls and then with 24 boys. The results are shown on the stem and leaf diagram.

Reaction times
Key (Girls) 6 | 1 means 0.16 sec.      Key (Boys) 1 | 8 means 0.18 sec.
Girls      Boys
2 4
4
3 3 2 2 2 2 2
0 0 0 1 1
These results confirm what the diagram shows, that the girls generally are slower than the boys to react, but there is more variability in the boys' results.
1 0 0 2
1 8 8 8
9 9 @) 8 8 6 6 7 7
7 7 6 1
1 4 5 5
5 5 5 4 4 2 3
1
0 1 1
1
0 9
Find the median and the interquartile range for both sets of reaction times. Comment on your

Ungrouped data in a frequency distribution - median and quartiles
To find the median and quartiles of data in the form of an ungrouped frequency distribution, it is useful to find the cumulative frequency, as this gives a frequency up to a particular item.
Illustrating this on a 'step' diagram showing the cumulative frequency:

[Step diagram: cumulative frequency.]
Solution 1.39
\U\' LT_ ,-~
.1'
d. umber of absences. -i-n --->----· 75% --">"
4.
. d the median and interquartile range of the
Fm . (a) Find the me mn fn h .ddle 50% of the ~ "~
following distributwns: j (b) Find the range o t e rot 1'
(a) Sten; te~f [Key 512 means 52 observations. ber of absences. 8
in --·->- '
I
~
0
50% ->-· I l
(c) Calculate the mean num . .
(d) Calculate the standard devtatton. . I y I y
2
3
4
3 4 4
2 8 8
15667
d . the effectiveness of Famtly i-n ->" v 25% -~>-, v
5
6
2 3 3
5 7 8 8
8. A researcher, stu ymg . d out a survey of
Income Supplem:n~, ca~l\enefit. As part of the
120 families recetvmg t e d d the number of
v v
7 2 4 survey th_e researfhe~tec~~eeresults are illustrated
0
a, a3 a, "'o;· Q3
1
~
8 0 ,
(b) Stem Leaf [Key 112 means 1.2 J in the cumu attve req
,.,,
"'Ff
,
3 6
;:.-- 120
,,, t<
3
2
1 2
5 7
0 3 4 4
~
5- 110
!t1\I1 Grouped data
Cumulative
frequency curve
Cumulative percentage
frequency curve
2 .!'
1 6 7 8 8 9 9
1 2 2 3 4 "
~ 100 Lower quartile, Q 1 ! nth reading 25% reading
0 5 5 'S
E Median, Q 2 ! nth reading SO% reading
0 1 3 3 8 0 9
i nth reading 75% reading
Upper quartile, Q 3
(c) Stem Leaf [Key 22 11 means 23] 80
6 0 2 2 -
1 1 2 3 Cumulative frequency curve Cumulative % frequency curve
10 70
14 0 2 2 3 3
0 2 3 3 3 3 3 :'.'~'' \\ Note that the i(n + 1)th reading is not used for the median. If you used this value you would
not arrive at the same point on the cumulative frequency axis when you worked down from
18 60 I,
22 3 3 3 3 the top of the scale as you would when you worked up from the bottom of the scale. The !nth
26 0 0 2
50 or 50% value is needed for the median.
30 1 3
5.
Find the median td
requenc
interid~:t~~~~~~~:of each 40 Note also, that if preferred, a cumulative frequency polygon or cumulative percentage
frequency polygon can be drawn. The values obtained for the median and quartiles will not
of th e following 30
(a) 8 9 10 vary greatly from those obtained from curves.
5 6 7
X
5 20
15 18 6
6 11
f Example 1.40
14 15 16
10 1.\',\
',\\ :n lj
\.',!
i\\\ ~~!
8 The table gives the cumulative distribution of the heights (in centimetres) of 400 children in a
(b) 12 13 7
X o 23456
15 7 0 Number of children
3 9 11
f Height (em) <100 <110 <120 <130 <140 <150 <160 <170
h ws the number of goals
(a) Write down the mode ~nd the median of the
395 400
number of children per ~~rrnly. of the number Cumulative 0 27 85 215 320 370
6. The frequency tables o. . 25 games played.
scored in netball by Jemtma tn (b) Find the interquartl_ e range frequency
of children per fa~tly · t' le range is only a
Number of goals ) Ex lain why the mterquar 1 . e of
certain school:
8 6 (c ro~gh measure of spread for thts typ (NEAB)
1 3 2 5
0 distribution. (a) Draw a cumulative frequency curve.
Frequency
(b) Find an estimate of the median height.
umulative frequency table.
(a) Construct a cd_ t illustrate the table. (c) Determine the interquartile range.
b) Draw a step tagram o f \
( . d the median number o goa s.
(c) Fm '\ nge
(d) Find the interquartl e ra .
(d) Determine the 10 to 90 percentile range.

Solution 1.40
(a) Cumulative frequency curve to show the heights of 400 children

[Cumulative frequency curve: cumulative frequency against Height (cm), 100 to 170.]

(b) For the median, find the ½(400)th value, i.e. the 200th value.
From the graph, an estimate of the median is 129 cm.
(c) For the lower quartile, Q1, find the 100th value and for the upper quartile, Q3, the 300th value.
From the graph, Q1 = 121.5 cm and Q3 = 137.5 cm
The interquartile range = Q3 - Q1
= 137.5 - 121.5
= 16 cm
Note that this is the range of the middle 50% of the readings.
(d) For the 10th percentile (written P10), find the value which is 10% of the way through the readings, i.e. the (10/100)(400)th value, i.e. the 40th value. The 90th percentile is the (90/100)(400)th value, i.e. the 360th value.

Example 1.41
The masses, measured to the nearest kilogram, of 50 boys are noted and a cumulative percentage frequency distribution formed.

Mass (kg)                <59.5   <64.5   <69.5   <74.5   <79.5   <84.5   <89.5
Cumulative % frequency   0       4       16      40      68      88      100

Draw a cumulative percentage frequency curve and use it to estimate the median mass and the interquartile range.

Solution 1.41
Cumulative % frequency curve to show masses of 50 boys

[Cumulative % frequency curve: cumulative % frequency against Mass (kg), 59.5 to 89.5.]

The median is the 50% reading. From the graph this is 76.3 kg.
The lower quartile, Q1, is the 25% reading, so Q1 = 71.5 kg.
The upper quartile, Q3, is the 75% reading, so Q3 = 80.5 kg.
The interquartile range = Q3 - Q1 = 80.5 - 71.5 = 9 kg.
It is interesting to note that if the data are represented by a histogram, the median divides the area exactly in half.

Histogram to show the masses of 50 boys

[Histogram: frequency density against Mass (kg).]
Example 1.42
Examinations in English, Mathematics and Science were taken by 400 students. Each examination was marked out of 100 and the cumulative frequency graphs illustrating the results are shown below.

[Three cumulative frequency graphs, for English, Mathematics and Science: cumulative frequency (0 to 400) against Mark (0 to 100).]

(a) In which subject was the median mark the highest?
(b) In which subject was the interquartile range of the marks the greatest?
(c) In which subject did approximately 75% of the students score 50 marks or more? (C)

Solution 1.42
Showing the working on the diagrams:
Median Q2 is the 200th reading, lower quartile Q1 is the 100th, upper quartile Q3 is the 300th reading.

[The three graphs for English, Mathematics and Science, with Q1, Q2 and Q3 marked.]

(a) The median, Q2, is the highest in English.
(b) The interquartile range, Q3 - Q1, is greatest for Science.
(c) The subject in which 300 students scored 50 or more, i.e. 75% scored 50 or more, is Science.

Using linear interpolation
It is possible to estimate the median, quartiles or other percentiles for grouped data without drawing the cumulative frequency graph. The method is known as linear interpolation.

Example 1.43

Age                  30-   40-   50-   60-   70-   90-
Number of members    5     42    61    37    15    0

Without drawing a cumulative frequency curve, estimate
(a) the median age,
(b) the number of members aged 67 or over,
(c) the 20th percentile.

Solution 1.43
Form a cumulative frequency distribution.

Age                     <40   <50   <60   <70   <90
Cumulative frequency    5     47    108   145   160

(a) Since there are 160 observations, the median is the 80th observation.
From the table, 47 are under 50 and 108 are under 60, so the 80th person is in the interval 50-60.

[Diagram: 47 people below 50 years, the median between 50 and 60 years, 108 people below 60 years.]

The number of people in this interval = 108 - 47 = 61, and 80 - 47 = 33, so the median is 33/61 of the way through the interval 50-60.
Median = 50 + (33/61) × 10 = 55.4 years (1 d.p.)

(b) The age 67 is in the interval 60-70, which has a width of 10.
67 is located 7/10 of the way through this interval.

[Diagram: 108 people below 60, x people below 67, 145 people below 70.]

The number of people in the interval 60-70 is 145 - 108 = 37
Now 7/10 of 37 = 25.9, so 26 of the 37 people will be under 67 years old.
So number of people under 67 years old = 108 + 26 = 134
Number of people 67 or over = 160 - 134 = 26
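Linear interpolation assumes that the observations are spread evenly through each class, so that the cumulative frequency graph is a polygon. The sketch below works in both directions on the cumulative table for the 160 members (ages 30 to 90, taken from the example data): from a cumulative frequency to an age, and from an age to a cumulative frequency. The function names value_at and cum_at are hypothetical.

```python
def value_at(target, bounds, cum):
    """Value of the variable at cumulative frequency `target`."""
    for i in range(1, len(bounds)):
        c_lo, c_hi = cum[i - 1], cum[i]
        if c_lo <= target <= c_hi:
            lo, hi = bounds[i - 1], bounds[i]
            if c_hi == c_lo:          # empty class: no interpolation needed
                return lo
            return lo + (target - c_lo) * (hi - lo) / (c_hi - c_lo)
    raise ValueError("target lies outside the cumulative table")

def cum_at(value, bounds, cum):
    """Cumulative frequency below `value`."""
    for i in range(1, len(bounds)):
        lo, hi = bounds[i - 1], bounds[i]
        if lo <= value <= hi:
            c_lo, c_hi = cum[i - 1], cum[i]
            return c_lo + (value - lo) * (c_hi - c_lo) / (hi - lo)
    raise ValueError("value lies outside the table")

# Class boundaries (age) and cumulative frequencies for the 160 members.
bounds = [30, 40, 50, 60, 70, 90]
cum = [0, 5, 47, 108, 145, 160]

under_67 = cum_at(67, bounds, cum)              # members younger than 67
p20 = value_at(20 * 160 / 100, bounds, cum)     # 20th percentile
median_age = value_at(160 / 2, bounds, cum)     # 80th observation

print(round(under_67, 1), round(160 - under_67))
print(round(p20, 1), round(median_age, 1))
```

The same functions work on percentage tables: with boundaries 100 and 190 seconds and cumulative percentages 50 and 80, value_at(75, [100, 190], [50, 80]) returns 175.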
(c) The 20th percentile is the value 20% of the way through the distribution.
There are 160 observations and 20% of 160 = 32, so the age of the 32nd person is needed.
Five people were under 40, 47 people were under 50, so the 32nd person is in the interval 40-50.
The number of people in this interval = 47 - 5 = 42.
32 - 5 = 27, so x will be 27/42 of the way through the interval 40-50.
x = 40 + (27/42) × 10 = 46.4...
The 20th percentile is 46.4 years (1 d.p.)

Example 1.44
The distribution of the lengths of time of a large number of telephone calls made from an office in a given week was such that the median was 100 seconds and the 80th percentile was 190 seconds. Without drawing the cumulative frequency curve, estimate
(a) the upper quartile,
(b) the number of calls, out of 500, that lasted less than a minute.

Solution 1.44
(a) Show the information on a diagram, denoting the upper quartile by Q3.

[Diagram: 50% of the distribution below 100 seconds, 75% below Q3, 80% below 190 seconds.]

Percentage of distribution in interval 100-190 is 30%.
Percentage of distribution in interval 100-Q3 is 25%.
So Q3 is 25/30 of the way through the interval 100-190.
This interval has a width of 90 so Q3 = 100 + (25/30) × 90 = 175
The upper quartile is 175 seconds.

(b)
[Diagram: x% of the distribution below 60 seconds, 50% below 100 seconds.]

x% of calls lasted less than 60 seconds and 50% lasted less than 100 seconds.
Using ratios:
x/50 = 60/100, so x = (60/100) × 50 = 30
The number of calls = 30% of 500 = 150
150 calls lasted less than a minute.

Exercise 1k Cumulative frequency, median and quartiles - grouped data

1.
Mass (kg)   Frequency
40-44       3
45-49       2
50-54       7
55-59       18
60-64       18
65-69       3
70-74       1

(a) Construct a cumulative frequency table and draw a cumulative frequency curve.
(b) How many students weighed less than … kg?
(c) How many students weighed more than 61 kg?
(d) 2…% were heavier than x kg. Find the value of x.
(e) Estimate the median.
(f) Estimate the interquartile range.

2. Fifty soil samples were collected in an area of woodland and the pH value for each sample was found. The cumulative frequency distribution was constructed as shown in the table.

pH value   Cumulative frequency
<4.8       1
<5.2       2
<5.6       5
<6.0       10
<6.4       19
<6.8       38
<7.2       43
<7.6       46
<8.0       49
<8.4       50

… of the samples had a pH value less than 7 …
Taking intervals 4.8 ≤ x < 5.2 etc., construct the frequency distribution and draw a histogram. Show the median on the histogram.

3. Eggs laid at Hill Farm are weighed and the results grouped as shown:

Mass (g)   Frequency
-50        3
-54        2
-58        5
-62        12
-66        10
-70        6
-74        2

Construct a cumulative frequency table and draw a cumulative frequency curve. Use the curve to estimate the median mass.

4.
[Cumulative frequency curve: cumulative frequency against Time (minutes), 0 to 40.]
and travels to a second ci f~rts. from one city
13. Every day at 08:28 a train de
Use your curve to estimate
(a) the median distance the journey were recordeY>10 ~
tnnes taken for
l~l :~~ 1
;e~:!~~~~ ~[~;~~t~~~~e ~~tances, ![
(a) Draw the cumulative frequency curve. certain period and mmutes over a
(b) Use your curve to estimate the median table. were grouped as shown in the
The cumulative frequency curve has been drawn travel more than I 30 o need to
<ill. IC)
temperature.
from information about the amount of time (c) In a particular house it was found that the
spent by 50 people in a supermarket on a central heating was turned on when the Time Frequency
weekly maximum temperature fell 10. The prices, on a particular da 0 f
the London Stock E h Y' 53 stocks on
particular day.
(a) Construct the cumulative frequency table, below 17 oc. Use your curve to estimate the table below. xc ange are summarised in -80 0
the number of weeks when the heating -85 6
taking boundaries .;;;;5, .;;;;10, ... 12
-90
(b) How many people spent between 17 and was turned on. Number of
(d) A week is classified as extremely warm -95 22
27 minutes in the supermarket? 31
(c) 60% of the people spent less than or equal when the weekly maximum is greater Price £x stocks -100
than 21 oc. -105 15
to t minutes. Find t. Use your curve to estimate the percentage of 75<x<;95 6 7
(d) 60% of the people spent longer than -110
weeks that are classified as extremely warm.
(C) 95<x<;100 10 -115 4
s minutes. Finds. 2
100<x<;105 12 -120
(e) Estimate the median.
(f) Find the interquartile range. -125 1
105<x<;110 13
8. The times, to the nearest minute, taken by a group Over 125 0
5. In a quality-control survey, the length of life, in of 120 students to write a particular essay, were 110<x<;120 7
hours, of 50 light bulbs is noted. recorded and are grouped in the table below. 120 <X<; 135 5
The results are summarised in the table.
(The
than interval upl~
' 90'
85 minut:s · d.Jcadte~ alll ti~es greater
0
Using linear interpolation, calculate estimates of minutes.) an me udmg 90
C_on~truct the
d1stnbution cumulative
and dr h fr equency
- table for this
th" draw a cu~u 1atJve
Ten curve. aw t e cumulative frequency From . frequency
(a) the median, (minutes) 40-44 45-49 so-54 ss-59 60-64 curve these figures
and from
(b) the interquartile range. . IS curve esttmate
Frequency 26
Use your curve to estimate a ) t h emed1anti
I(b) . me f or t h e JOUrney
·
Length of life (h) Number 34 30 (a) the ~edian price, t e mterquartile range '
22
bet~:~:S ~:~~ha:-di~~~l~.
8 (b) the mterquartile range
3 of students Ic) the number oft · 'h.
650" h <670
7 (c) :~t;1~~~r of stocks ~osting between £89 second city the
670<:h <680 Construct the cumulative frequency table for this (C) IC Additional)
20 distribution and draw the cumulative frequency
680,;; h <690
17 11. The masses, measured to the
690d <700 curve. 80 eggs were recorded d nearest gram, of 14. Two hundred and fift A
following heights.
.
Y rmy recrmts have the
3 Use your curve to estimate table below. an are grouped in the
700 "h < 702 (a) the interquartile range of the times,
(b) the percentage of these students who spent
over 62 minutes in writing the essay. 65 69 70 79 Height (em) No. of recruits
6. A factory produces a certain component. The Mass (g) 50-59 60 64
masses of 500 of these components were Another group of 30 students wrote the same 165- 18
measured to the nearest gram and are grouped in Number
essay and all took over 65 minutes to 20 y 37
of eggs 18 X 170-
the following table. complete
Use your it.
curve to estimate the median time of all 175- 60
60-69 70-74 75-79 80-84 85-89 Assuming that the read" · 65
150 students. (C) linearly distributed dm?s m each group are 180-
Mass (g) these c s h an gtven that 60% of all 185- 48
Number of 196 53 9. Each of 50 sportsmen was asked to state the calcula~~ th:::l:ctuafl masdses below 66.5 g,
e o x an of y. \C) 190-195 22
93 120 distance, x km, he needs to travel to obtain
components 38
access to suitable training facilities. The results
Without drawing a cumulative frequency curve, are summarised in the table below. s~rength, measur~d i~ ~~el a_:; tested for tensile
12. 30 specimens of she ij~~u~~cda~a in the form of a cumulative
estimate gives the distribution 0 f t hm . The table below
e measurements. (a) the ~e~r:~-h~:;~~c curve to estimate
(a) the 60th percentile, Number of sportsmen
(b) the number of components whose mass is Distance (x km) (b) the lower quartile height
less than 78 grams. (C) 1 Tensile strength Number of specimens The tallest 40% of the re ..
into a special squad E . crmts are to be formed
o<x<4 2 · stJmate
405 415 4
7. The weekly maximum temperatures in a 4<;x<10 Ic) t he median
certain town were recorded, to the nearest 6 415-425 3 (d) the upper ~uartile of the heights f h
degree Celsius, over a period of two years and 10<:x<20 19 425-435 6 members of this squad. o t e
grouped in the following table. 20<:x<35 12 435-445 10
445-455 5
task was o f ~h e ttmds
Number of weeks 35<:x<60 10 15. The distribution ·
455-465 2 certain taken when a
Temperature (°C) 60<:x<100 number of e ter orme by each of a large
8 percentile !a~~; w~s such. that its twentieth
-5 to -1
12
Construct the cumulative frequency table for the d_ts~nbution.
D_ra';' a cumulative fr equency dtagram
. of this was 50 minutes it~~?ut~esh,Its fortie.th percentile
Oto4 distribution and draw the cumulative frequencY 6 . •. x tet percent 1le was
17
5 to 9
btnnate the med"tan and the 10th and 90th
percentiles. 744 :~~~~~~-and tts eightieth percentile was
31 curve. (O&C)
10 to 14 23
15 to 19 9
20 to 24 4
25 to 29
Use linear interpolation to estimate (a) the median of the distribution, (b) the upper quartile of the distribution, (c) the percentage of persons who performed the task in forty minutes or less. (NEAB)

Amount raised, £    Number of children
1-5                 70
6-10                36
11-15               19

Notice that when distributions are skew, the median generally lies between the mode and the mean, and the following relationship is satisfied:

$$\text{mean} - \text{mode} \approx 3(\text{mean} - \text{median})$$

One measure of skewness is given by Pearson's coefficient of skewness.

$$\text{Pearson's coefficient of skewness} = \frac{\text{mean} - \text{mode}}{\text{standard deviation}}$$
1 . ossible amount which ma~
oco
SKEWNESS

On page 20 you considered the shape of various distributions. There are mathematical ways of expressing the degree of skewness of a distribution:

In a negatively-skewed distribution the tail of the distribution is pulled in the negative direction.
    mean < median < mode

In a symmetrical distribution,
    mean = mode = median

In a positively-skewed distribution the tail of the distribution is pulled in the positive direction.
    mode < median < mean

Current, x (A)    Number of fuses
25 ≤ x < 28        6
28 ≤ x < 29       12
29 ≤ x < 30       27
30 ≤ x < 31       30
31 ≤ x < 32       18
32 ≤ x < 33       14
33 ≤ x < 34        9
34 ≤ x < 35        4
35 ≤ x < 40        5

Draw a histogram to represent these data.
A measure of the skewness (or asymmetry) of a distribution is given by

$$\frac{3(\text{mean} - \text{median})}{\text{standard deviation}}$$

Calculate the value of this measure of skewness for the above data.
Explain briefly how this skewness is apparent in the shape of your histogram. (L)

Solution 1.45

Frequency density = frequency / interval width

Current (A)     Interval width    Frequency    Frequency density
25 ≤ x < 28     3                  6            2
28 ≤ x < 29     1                 12           12
29 ≤ x < 30     1                 27           27
30 ≤ x < 31     1                 30           30
31 ≤ x < 32     1                 18           18
32 ≤ x < 33     1                 14           14
33 ≤ x < 34     1                  9            9
34 ≤ x < 35     1                  4            4
35 ≤ x < 40     5                  5            1

[Figure: Histogram to show the current at which fuses blow. Horizontal axis: Current (A), marked from 25 to 40; vertical axis: frequency density.]

(a) The median is estimated by linear interpolation as follows:
Since 45 fuses blew at a current less than 30 A and 75 fuses blew at a current less than 31 A, the median lies in the interval, of width 1 A, from 30 A to 31 A.

[Figure: the interval from 30 A to 31 A, with 45 fuses below 30 A, 75 fuses below 31 A and the median marked between them.]

$$\text{Median} = 30 + \frac{17.5}{30} \times 1 = 30.58\ldots = 30.6 \text{ A (1 d.p.)}$$

Mid-point (x)    f
26.5              6
28.5             12
29.5             27
30.5             30
31.5             18
32.5             14
33.5              9
34.5              4
37.5              5
                 Σf = 125

(b) $\bar{x} = \dfrac{\sum fx}{\sum f} = \dfrac{3861.5}{125} = 30.892$

(c) $s^2 = \dfrac{\sum fx^2}{\sum f} - \bar{x}^2 = \dfrac{119\,905.25}{125} - 30.892^2 = 4.926\ldots$, so $s = 2.219\ldots$

[Check these on your calculator, using SD mode.]

Therefore the mean is 30.892 A and the standard deviation is 2.22 A (2 d.p.)

Now

$$\text{skewness} = \frac{3(\text{mean} - \text{median})}{\text{standard deviation}} = \frac{3(30.892 - 30.58\ldots)}{2.219\ldots} = 0.42 \text{ (2 d.p.)}$$

Since skewness > 0, the distribution is positively skewed.
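The arithmetic in Solution 1.45 can be checked numerically. The following is a minimal sketch, assuming the class boundaries and frequencies of the fuse data above; the function names `grouped_mean_sd` and `grouped_median` are illustrative only and do not come from the text.

```python
from math import sqrt

# Class boundaries (A) and frequencies for the fuse data of Example 1.45.
boundaries = [25, 28, 29, 30, 31, 32, 33, 34, 35, 40]
freqs = [6, 12, 27, 30, 18, 14, 9, 4, 5]


def grouped_mean_sd(boundaries, freqs):
    """Estimate the mean and standard deviation from mid-interval values."""
    mids = [(lo + hi) / 2 for lo, hi in zip(boundaries, boundaries[1:])]
    n = sum(freqs)
    mean = sum(f * x for f, x in zip(freqs, mids)) / n
    # Variance in the form (sum of f x^2) / (sum of f) - mean^2.
    variance = sum(f * x * x for f, x in zip(freqs, mids)) / n - mean ** 2
    return mean, sqrt(variance)


def grouped_median(boundaries, freqs):
    """Estimate the median by linear interpolation within its class."""
    target = sum(freqs) / 2
    cumulative = 0
    for lo, hi, f in zip(boundaries, boundaries[1:], freqs):
        if f > 0 and cumulative + f >= target:
            return lo + (target - cumulative) / f * (hi - lo)
        cumulative += f
    raise ValueError("frequencies must have a positive total")


mean, sd = grouped_mean_sd(boundaries, freqs)
median = grouped_median(boundaries, freqs)
skewness = 3 * (mean - median) / sd
print(round(mean, 3), round(sd, 2), round(median, 2), round(skewness, 2))
```

The printed values should agree with the mean, standard deviation, median and skewness quoted in the solution, up to rounding.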
Quartile coefficient of skewness

Another measure of skewness is defined in terms of the quartiles. Writing $Q_1$ for the lower quartile, $Q_2$ the median and $Q_3$ the upper quartile,

$$\text{Quartile coefficient of skewness} = \frac{(Q_3 - Q_2) - (Q_2 - Q_1)}{Q_3 - Q_1}$$

[Figure: three box plots illustrating the cases below.]

Q3 - Q2 > Q2 - Q1     Quartile skewness > 0
Q3 - Q2 ≈ Q2 - Q1     Quartile skewness ≈ 0
Q3 - Q2 < Q2 - Q1     Quartile skewness < 0

Example 1.46
31 students tried to estimate the length of a line. The line was actually 60 mm long. These are their results, in millimetres.

61 70 46 44 26 23 30 83 52 44 38
37 49 59 58 63 31 29 37 48 76 61
46 31 38 41 49 52 56 75 61

Find the median and the quartiles of this distribution and use the quartiles to estimate the skewness.
Draw a histogram with equal class intervals 20 ≤ l < 30, 30 ≤ l < 40, ...

Solution 1.46
Arrange the results in order. The ringed values are shown in brackets.

23 26 29 30 31 31 37 [37] 38 38 41 44 44 46 46
[48]
49 49 52 52 56 58 59 [61] 61 61 63 70 75 76 83

There are 31 results, so the median, $Q_2$, is the $\frac{1}{2}(31 + 1)$th value, i.e. the 16th value.
So median = 48.

To find the quartiles, since n is odd (see page 69)
$Q_1 = \frac{1}{4}(31 + 1)$th value = 8th value = 37
$Q_3 = \frac{3}{4}(31 + 1)$th value = 24th value = 61

$$\text{Quartile coefficient of skewness} = \frac{13 - 11}{61 - 37} = 0.083\ldots$$

This indicates a positive skew.
The histogram confirms the positive skew.

THE NORMAL DISTRIBUTION

There is a special symmetrical distribution known as the normal distribution. This is bell-shaped, centred around the mean.

Here are two normal distributions with the same mean, but different standard deviations.
[Figure: two normal curves with a common mean.]

Here are two normal distributions with the same standard deviation but with different means.
[Figure: two normal curves of the same shape centred at different means.]

In a normal distribution:
[Figure: three normal curves with horizontal axes marked $\bar{x} - s$, $\bar{x}$, $\bar{x} + s$; $\bar{x} - 2s$, $\bar{x}$, $\bar{x} + 2s$; and $\bar{x} - 3s$, $\bar{x}$, $\bar{x} + 3s$.]
State
mean the
andmodal val ud a~d_calculate the median
standard
cheques with errors l·nevtatwnkof the number of'
The quartiles are a wee
skewness- _
mean-mode
_c_cc'-'-ccc'-~'-'c
Time
Number of children
8
i i 8 I
Key 3[7 means 37 I 130-
135-
6
5
standard deviation 10-19 15 3 3 7 140- 4
20-24 25 4 5 150-180 6
(a) 25-29 5 2 5 5 7
18 6 1 1 6 6 8 8 8 (a) Find
30-39
14. 12 7 3 5 5 (i)
ii) Pearson's coefficient of sl
40-49
i 12 50-64
7 8
9
2 9
1
(
(b) D
the quartile coeff . <ewness
tctent of skewness
£ 10 65-89
5 raw the histogram. .
r
(b)
(b) Calculate the corresponding value for the
above data. Comment on your result.
i 100
14
17 0 1 Th b ' op,
20 1 e ox plots can be drawn h onzontally
. ' as sh own above, or vertically
. like th'!S.
40
20
60
\ 40
------
0
20 i
'
i03
80 100
0 40 60
0 Mark
Vertically
...---- Highest value
A class of pupils played a computer game which tested how quickly they reacted to a visual
instruction to press a particular key. The computer measured their reaction times in tenths of
a second and stored a record of the sex and reaction time of each pupil. Finally it displayed
-4;---- Lowest value the following summary statistics for the whole class.
Solution 1.47
(a) Q, Q, a,
Girls
Q, Q, Q,
The whiskers are of eq':'al Boys
8 f 50 ackets. The1r A group of athletes frequently run round a cross-country course in training. The box and
Example 1 ·4 f eets in each 0
P whisker plots below represent the times taken by athletes A, B, C and D to complete the
of the numbers o sw
out a survey .
A group of chlldren carrLI\owmg stem and leaf d>a•tg~r:a:m:·------:;-;~=::;in-;;~~~ course.
results are shown m the -Key 20\4 means 24 sweets in a packet. A--_.j
9
778899 9 4 4 4 8-----------------
3
20 14 6 6 7 7 1 1 2 2 2 3 8 9 9
30 0 0 0 0 ~ 7 7 7 8 8 8 8
30 5 5 5 6 4 4 c-----
0 1 1 2 . . (NEAB)
40 0 . f this distribunon.
1
d' n and the quartl es o
(a) Calculate the ~e ::r the distribution.
(b) Draw a box pot
27 28 29 30 31 32 33 34 35
Solution 1.48 . ' 50+ 1 )th value, i.e. the 25 .5th value. Time (minutes)
. the median lS the 2( h' h are 33 and 34.
(a) There are 50 ltems, so h 25th and 26th value w lC (a) Compare the times taken by athletes C and D.
This is half-way between t e
. n Q = 33.5 sweets. Assume that the distributions shown above are representative of the times the athletes would
So me d1a , 2 take in a race over the same course.
(13 values) (b) Which of the athletes A orB would you choose if you were asked to select one of them to
(15 values) win a race against
7 7 8 8 9 9
20
30
14 6 6 7 6
1 1 2 2 2 3
0 0 0 0 6 7 7 7 @) 8 8
4 4
9 9
(15 values)
(7 values) (i)
(ii)
c
D?
5 6
30 5 5 4 4 T
40 0 0 1 1 2 Q, Q2 Give a reason for each answer.
I . 0 = 29 sweets.
0 so Q 1 is the 13th va ue, l.e. ~Ql - 38 sweets. (c) Which athlete would be most likely to win a race between A and B? (AEB)
items to the left o f ~2' Q . the 38th value, l.e. 3-
There are 25 · h f Q so 3 lS d.
There are 25 items to the ng t o 2• d' 'b tions either side of the me >an. Solution 1.49
. 'd . half the two lstn u
th at the quartiles dlVl e m (a) Dis always faster than C.
Remem b er
. easy to see: C's times are more variable than D's.
Note that the pattern lS 12 items C's times are positively skewed.
a,
12 items D's times are negatively skewed.
12 items
12 items t 34 35
38
38 38. . 44
(b) (i) B's median average time is faster than C's, but B's times are more variable. It is
30 30 . . 33
24 26. . 29 probable that B would win against C.
Although A's slowest time of approximately 32 minutes appears to be slightly greater
than C's fastest time, A will almost certainly win against C.
(b) Box plot Therefore choose A to win a race against C.
44
a, (ii) A has a small chance of winning against D, but B has a slightly greater chance of
va\u·-· a, t winning against D.
Therefore choose B to win against D.
==:
(c) A's average time is faster than B's and A's times are not as variable as B's.
Therefore choose A to win a race between A and B.
40
35
25
20
temperature.).;~; :~:e
It wron 1that. the .te ~perature recorded as 94 op .
would appear
recorded
It IS most
perature unusual
of 57 to have.
op how
.
just~~:~
outhehr.
ay Wit an(It was probably
extremely high
' ever lS not an outli
[Figure: box plot with $Q_1$ and the lower and upper boundaries marked.]

(b) $\bar{x} = 71$ and $s = 7.17$, so
$\bar{x} - 2s = 56.6$,
$\bar{x} + 2s = 85.3$

Since outliers lie outside these values, 57 °F is not an outlier but 94 °F is an outlier.
Example 1.50
A class of 31 children recorded the maximum daily temperature for the month of July with
the following results. The median and quartiles are shown on the stem and leaf diagram.
Key 6 \ 8 means 68°F l J 1. The table below gives the length .
50 telephone calls from a sch ools,office.
.
m mmutes, of Key 4 12 means Key 214 means
24 hundredths 24 hundredths
9 4 of a second
of a second
8 Length of Number of
8 1 1 call (min) calls Group 1 Group 2
7 7 7 9 9 3 3 3 G) 6 6 2
2 2 2
7 0 0 @ 0 <;1 8 5 4 4 2 4 5
2 2 2 3 3
6 (6) 8 8 8 9 9 1-2 11 333322
11000
2
2 0 0 1
4 4
6 1 3 4 4 2-3 17 8 1 8 9 9 9 9
5 7 3-5 8 7 6 6 1 6 6 7 7
5-10 6 5 4 4 1 4 4
Identify any outliers ;;.1o 0 1 2
1
(a) by using the values of the quartiles and illustrating your results on a boxplot, 0 9
(b) by using the mean and the standard deviation. (a) Draw a cumulative fr
(b) Estimate the med' eq~ency polygon.
(c) Draw a box lot lan an the quartiles. 3. !wenty-one
m millimetresgirls
Thestimated
I t h e Iength of a line
distribution. P and comment on the · e resu ts were '
Solution 1.50
(a) The values ringed are 66, 70 and 73, so Q 1 ~ 66 op, Q2 ~ 70 op, Q
3
~ 73 op, 2. ,I."wo groups of e
tnning experim~n ople t~:>Ok part in a reaction-
51
85 45
62 31 43 51
20 22 97 ~~ 4198 23
22 34 35 35
18 27
,~
hundredth o f a second
t. Theu results
h ' to the nearest
.
Interquartile range = Q3 - Q1 = 73° - 66° = 7 °F
Upper boundary = Q3 + 1.5 × 7 = 73° + 10.5° = 83.5°
Lower boundary = Q1 - 1.5 × 7 = 66° - 10.5° = 55.5°

Draw a box plot and use it to identify any
are shown below.
Construct box plots to represent the distributions and comment.
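The boundary calculation in Solution 1.50(a) follows a fixed recipe: work out the interquartile range, then go 1.5 times that distance below the lower quartile and above the upper quartile. A minimal sketch, assuming the quartiles 66 °F and 73 °F from the solution; the helper names `outlier_boundaries` and `is_outlier` are illustrative only.

```python
def outlier_boundaries(q1, q3, k=1.5):
    """Return (lower, upper) boundaries lying k interquartile ranges
    outside the lower and upper quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def is_outlier(value, q1, q3, k=1.5):
    """A value is treated as an outlier if it lies outside the boundaries."""
    lower, upper = outlier_boundaries(q1, q3, k)
    return value < lower or value > upper


# Quartiles from Solution 1.50: Q1 = 66 °F, Q3 = 73 °F.
lower, upper = outlier_boundaries(66, 73)
print(lower, upper)
print(is_outlier(57, 66, 73), is_outlier(94, 66, 73))
```

With these quartiles, 57 °F falls inside the boundaries and 94 °F falls outside, which matches the conclusion reached with the mean and standard deviation.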
6. db l in Enghsh an m
obtaine Y a c ass
Mathematics. Comment on
'the distributions of
10 , ......
on the standard strain of corn he had previously
used and he recorded their weight gains after
i '' 8 I I ... i ..... three weeks. The results for this control group
i marks.
••''
L
I
(a) 3 are given in the ordered stem and leaf display in
-~ L I i English 6 .. . .....
I ....
i 2 l
( ....
•''
\
i\l--- 2
I
-\········ I I
Weight gain (grams)
Unit is 1 gram
~ \ --L___j____J
Mathematics
1
'•''
' marks 0
2 3 1 ? 7 X
32 1 5
I 33
( ( 70 80 10. A frequency diagram for a set of data is shown
'
50 50 60 34 5 9
0 30 40 30 40 below.
10 20 20
0 Lengtll (em) 35 0 6 6
Key 6\5 means 36 0 1 6
7. Key 4\6 means 6.5 hours 6 7
6.4 hours ~ 6 37 1 2 2 3
(bl 0.6 December
july ,!5 38 0 6
-~ 0 1 4 39 9
12
~ 11 1 3 3 4 3 40 2
i 0.4 10
9 2 2 8 8 41
~ 8 : +-r--+1--+-+-'i--+--+-+--HII-+-1-,---,-,--+1 42 1 3
0.2 7 3 4 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
420 6 55666 (b) On a single diagram draw two box-and-
{a) Find the median and the mode of the data. whisker plots, one for the weight gains of
1 5 ooz84 {b) Given that the mean is 5.95 and the
0 80 100 the chicks fed the new strain of corn and the
40 60 9 1 4 1 3 standard deviation is 2.58, explain why the
0 20 other for the weight gains of the control
Length (em) 1 3 5 5 value 15 may be regarded as an outlier. group fed the standard strain of corn.
764433 2 6 (c) Explain how you would treat the outlier if (c) Use your box and whisker plots to compare
9 8 8 6 0 0 1 1 2 the diagram represents and contrast the two distributions. (0)
887332000000 0 34 (i) the ages {in completed years) of
· daily hours of children at a party, 12. A random sample of 51 people was asked to
This back-to-bade stempl at glVes (ii) the sums of the scores obtained when record the number of miles they travelled by car
sunshine in December and July throwing a pair of dice. in a given week. The distances, to the nearest
{d) Find the median and the mode of the data mile, are shown below.
Find the median and quartiles for each month
after the outlier is removed.
and construct the box plots. {e) Without doing any calculations state what 67 76 85 42 93 48 93 46 52
Comment on the distributions. effect, if any, removing the outlier would n 77 53 41 48 86 78 56 80
have on the mean and on the standard 70 70 66 62 54 85 60 58 43
58 74 44 52 74 52 82 78 47
These are the times of t~e pastil .delivery to my deviation.
66 50 67 87 78 86 94 63 72
(f) Docs ~he diagram exhibit positive skewness,
8. house over four successive wee cs. 63 44 47 57 68 81
negative skewness or no skewness?
9o01 9o22 9o30 9o19 9o15 929 How is the skewness affected by removing (a) Construct a stem and leaf diagram to
9A5 9o53 9o02 9o05 9o31 9A7 the outlier? (MEI) represent these data.
9A8 9o29 9o09 9o29 9o02 (b) Find the median and the quartiles of this
~~i6 9o12 9o25 9o10 9o13 9o19 distribution.
(c) Draw a box plot to represent these data.
Draw a stem and leaf diagram. (d) Give one advantage of using
(a)
(b)
Find the median time. (i) a stem and leaf diagram,
Find the quartiles. .
(c) Draw a box and whisker dJagram. (iii a box plot,
(d) to illustrate data such as that given above. (L)
Pie charts
Area ∝ frequency
To compare sets of data with total frequencies $F_1$ and $F_2$, draw circles with radii in the ratio $\sqrt{F_1} : \sqrt{F_2}$.
[Figure: two pie charts with sectors A, B and C, one of radius $r_1$ and one of radius $r_2$.]

Vertical line graphs - ungrouped discrete data
Height represents frequency.
Mode is denoted by the tallest line.

Stem and leaf diagrams (stemplot)
Key 2 | 7 means 27

Stem | Leaf
1    | 0 4 5 9
2    | 2 2 3 5 6 7 7
3    | 1 2 2 7 8
4    | 3 3 4 6
5    | 2 3 7

The stem plot must have a key.
Intervals are 10-19, 20-29, 30-39, 40-49, 50-59.
Equal width intervals must be chosen.

Histograms - grouped data
Area ∝ frequency
Frequency density = frequency / interval width
Modal class is represented by tallest rectangle.
Interval width = upper class boundary - lower class boundary.

Frequency polygons - grouped data
Plot frequency density against the mid-interval value.
Join with straight lines.

Mean, $\bar{x}$
Raw data: $\bar{x} = \dfrac{\sum x}{n}$
Frequency distributions: $\bar{x} = \dfrac{\sum fx}{\sum f}$
When data are grouped, the mid-interval value, $\frac{1}{2}$(lower boundary + upper boundary), is taken to represent the interval.

Standard deviation, s
Raw data: $s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n}}$ or $s = \sqrt{\dfrac{\sum x^2}{n} - \bar{x}^2}$
Frequency distributions: $s = \sqrt{\dfrac{\sum f(x - \bar{x})^2}{\sum f}}$ or $s = \sqrt{\dfrac{\sum fx^2}{\sum f} - \bar{x}^2}$
variance = $s^2$
$s = \sqrt{\text{variance}}$

Scaling data
If $y = ax + b$, where a and b are constants, then $\bar{y} = a\bar{x} + b$ and $s_y = |a| s_x$

Coding data
If $y = \dfrac{x - a}{b}$ then $x = a + by$,
$\bar{x} = a + b\bar{y}$,
$s_x = |b| s_y$

Combining sets of numbers, x and y
new mean $= \dfrac{\sum x + \sum y}{n_1 + n_2}$
new variance $= \dfrac{\sum x^2 + \sum y^2}{n_1 + n_2} - (\text{new mean})^2$

Weighted means
If $x_1, x_2, \ldots, x_n$ are given weightings $w_1, w_2, \ldots, w_n$ then

$$\text{weighted mean} = \frac{w_1 x_1 + w_2 x_2 + \ldots + w_n x_n}{w_1 + w_2 + \ldots + w_n} = \frac{\sum w_i x_i}{\sum w_i}$$
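As a check on the formulae for combining two sets of numbers, the sketch below computes the new mean and new variance from the sums and compares them with the values found by pooling the data directly. The two short lists are made-up illustrative numbers, not data from the text, and the function name `combined_mean_variance` is hypothetical.

```python
from statistics import fmean, pvariance


def combined_mean_variance(xs, ys):
    """Mean and variance of two sets combined, using only the
    sums, the sums of squares and the two sample sizes."""
    n = len(xs) + len(ys)
    new_mean = (sum(xs) + sum(ys)) / n
    sum_squares = sum(x * x for x in xs) + sum(y * y for y in ys)
    new_variance = sum_squares / n - new_mean ** 2
    return new_mean, new_variance


# Illustrative values only (not taken from the text).
xs = [2, 4, 6]
ys = [1, 3, 5, 7, 9]

new_mean, new_variance = combined_mean_variance(xs, ys)
pooled = xs + ys
print(new_mean, new_variance)
print(fmean(pooled), pvariance(pooled))  # same values, found directly
```

Here `pvariance` divides by n, which is the convention used for the variance throughout this summary.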
"' Skewness
[Figure: positively-skewed distribution with the mode, median and mean marked.]
In a positively-skewed distribution the tail of the distribution is pulled in the positive direction.
mode < median < mean
Q3 - Q2 > Q2 - Q1

[Figure: negatively-skewed distribution.]
In a negatively-skewed distribution the tail of the distribution is pulled in the negative direction.
mean < median < mode
Q3 - Q2 < Q2 - Q1

$$\text{Pearson's coefficient of skewness} = \frac{\text{mean} - \text{mode}}{\text{standard deviation}} \approx \frac{3(\text{mean} - \text{median})}{\text{standard deviation}}$$

$$\text{Quartile coefficient of skewness} = \frac{(Q_3 - Q_2) - (Q_2 - Q_1)}{Q_3 - Q_1}$$

Cumulative frequency
- steepest step denotes the mode.
- plot cumulative frequency against upper class boundary.
- join with a curve (or with straight lines for a cumulative frequency polygon).

Median, quartiles and percentiles
For n observations arranged in order of size
- the median, $Q_2$, is the value 50% of the way through the distribution,
- the lower quartile, $Q_1$, is the value 25% of the way through the distribution,
- the upper quartile, $Q_3$, is the value 75% of the way through the distribution,
- the xth percentile, $P_x$, is the value x% of the way through the distribution.

               Ungrouped data                         Grouped data
$Q_2$          $\frac{1}{2}(n + 1)$th value           $\frac{1}{2}n$th value = 50% value
$Q_1$, $Q_3$   Divide the distribution either         $\frac{1}{4}n$th value = 25% value
               side of the median in half             $\frac{3}{4}n$th value = 75% value

Box and whisker diagrams (boxplots)
[Figure: box plot of a symmetrical distribution, marking the lowest value, $Q_1$, $Q_2$, $Q_3$ and the highest value.]
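The rule for ungrouped data (the median as the middle value, and the quartiles as the medians of the two halves either side of it) can be sketched in code, together with the quartile coefficient of skewness. This is an illustration under that stated rule, using the 31 line-length estimates of Example 1.46; the function names are not from the text.

```python
def median(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(data):
    """Return (Q1, Q2, Q3), taking Q1 and Q3 as the medians of the
    values either side of the median position."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # When n is odd the median itself belongs to neither half.
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return median(lower), median(ordered), median(upper)


# Line-length estimates (mm) from Example 1.46.
lengths = [61, 70, 46, 44, 26, 23, 30, 83, 52, 44, 38,
           37, 49, 59, 58, 63, 31, 29, 37, 48, 76, 61,
           46, 31, 38, 41, 49, 52, 56, 75, 61]

q1, q2, q3 = quartiles(lengths)
coefficient = ((q3 - q2) - (q2 - q1)) / (q3 - q1)
print(q1, q2, q3, round(coefficient, 3))
```

For 31 values this picks out the 8th, 16th and 24th values in order, the same positions used in Solution 1.46.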
Solution 1.51

(a) mean $\bar{t} = \dfrac{\sum t}{n} = \dfrac{547}{10} = 54.7$ seconds

(b) $s = \sqrt{\dfrac{29\,957.48}{10} - 54.7^2} = \sqrt{3.658} = 1.9$ seconds (2 s.f.)

(c) There are 10 values, so the median is the $\frac{1}{2}(10 + 1)$th value = 5.5th value.
Re-arranging the times in order of size:
52.7, 53.2, 53.6, 53.7, 53.8, 54.0, 54.2, 55.7, 56.8, 59.3
The median (the 5.5th value) is half way between 53.8 and 54.0, i.e.
median = $\frac{1}{2}(53.8 + 54.0) = 53.9$ seconds

Example 1.52
Applicants for an assembly job are required to take a test of manual dexterity. The times, in seconds, taken to complete the task by 19 applicants were as follows:

63, 229, 165, 77, 49, 74, 67, 59, 66, 102, 81, 72, 59, 74, 61, 82, 48, 70, 86.

For these data find
(a) the median,
(b) the upper and lower quartiles.
(c) Explain briefly why these data are consistent with the distribution of times you might

[Figure: box plot. Horizontal axis: Time in seconds, marked from 0 to 240 in steps of 20.]

Example 1.53
Whig and Penn, solicitors, monitored the time spent on consultations with a random sample of 120 of their clients. The times, to the nearest minute, are summarised in the following table.

Time       Number of clients
10-14        2
15-19        5
20-24       17
25-29       33
30-34       27
35-44       25
45-59        7
60-89        3
90-119       1
Total      120

(a) By calculation, obtain estimates of the median and quartiles of this distribution.
(b) Comment on the skewness of the distribution.

Solution 1.53
$Q_3$ lies in the interval 34.5-44.5 (width 10).
There are 25 items in this interval.
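Part (a) of Example 1.53 asks for estimates by calculation, which for grouped data means linear interpolation within the class containing the required value. The sketch below assumes class boundaries at the half-minute (times are recorded to the nearest minute, so 10-14 is taken as 9.5 to 14.5) and uses the nth-value convention for grouped data. The function name `grouped_percentile` is illustrative only, and the printed figures are this sketch's estimates rather than values quoted from the text.

```python
# Class boundaries (minutes) and frequencies for Example 1.53.
boundaries = [9.5, 14.5, 19.5, 24.5, 29.5, 34.5, 44.5, 59.5, 89.5, 119.5]
freqs = [2, 5, 17, 33, 27, 25, 7, 3, 1]


def grouped_percentile(boundaries, freqs, percent):
    """Estimate the value percent% of the way through a grouped
    distribution by linear interpolation within its class."""
    target = percent / 100 * sum(freqs)
    cumulative = 0
    for lo, hi, f in zip(boundaries, boundaries[1:], freqs):
        if f > 0 and cumulative + f >= target:
            return lo + (target - cumulative) / f * (hi - lo)
        cumulative += f
    raise ValueError("frequencies must have a positive total")


q1 = grouped_percentile(boundaries, freqs, 25)
q2 = grouped_percentile(boundaries, freqs, 50)
q3 = grouped_percentile(boundaries, freqs, 75)
print(round(q1, 1), round(q2, 1), round(q3, 1))
```

For the upper quartile the function locates the 90th value in the class 34.5-44.5, which has width 10 and contains 25 items, in line with the statement in the solution.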
Cumulative
(g)
1Whif and Pe~n
1
!Law:and--couri
i
1
1 i 11 I·
. ..
( l the
whendifference
th . of th e scores obtained Estimate
Calculate the mean mark for all 50 candidates rowmg a pa' f r
and show that the standard deviation of all (B) ~he number of child:~no or~mary dice, ~~~ t~c mea~l of the distribution
10. An advertising campaign to promote electric m a neighbourh d per ousehold less than 50age of the a PP l"tcants
t e medtan ' who are
50 marks is 9. years of age . (C)
showers consists of a mailshot which includes a lt is suggested that the original marks of the (e) Calculate new valu ~o survey.
pre-paid postcard requesting further details. candidates from Set A should be linearly scaled standard deviation~£ thr t?e mlcan and
Prospective customers who return the postcard so that their scaled marks would have a standard removed. e smg e outlier is
(MEI)
16. The cumulat'Jve 1requency t bl b l
the lengths, in minutes of 4a00e ow refers tot
are then· contacted by one of five sales staff: deviation of 9 and a mean mark equal to the made from a certain h~useholdte e~hone calls
Gideon, Magnus, }emma, Pandora or Muruvet. mean mark of all 50 candidates. 14. Data of three months. dunng a period
The pie charts below represent the number of {a) What effect would this have on an original 4320 collected
houses inf:o;n a survey of the cost of
sales completed and the number of potential table below. own arc summarised in the
mark of 60 obtained by a candidate from
customers contacted during a one-month period. Length of call
Set A? Number of houses in minutes Number of calls
(b) Given that the original marks of the Cost(£)
Sales completed candidates in Set A were all integers, explain
Pandora why no mark would remain unchanged. 20 001 50 000 540 <;1 20
(C Additional)
50 001-60 000 1150 <;2 67
Magnus 60 001-70 000 1320 ~2} 118
12. Machine A is set to cut lengths of wood 100 mm
70 001-100 000 860 <;3 177
long. To test the accuracy of the machine, a
random sample is taken from the output. The 100 001-150 000 450 <;5 315
sample size is denoted by n and the length in <; 10 400
mtllimetres of each piece of wood is denoted On graph paper, tllustrate th d
Gideon cumulative frequency h e ata by means of a
by x. The results arc summarised by to estimate the d. grap ' and use your graph Construct the correspondin f
me mn cost an d t h e mterquartile
· ' draw a histogram to ·n g requency table and
n ~50, :Ex~ 5035, ~ 507 033. range.
2
Potential customers contacted :Ex Use linear inter ola _t ustrate_the data.
Calculate the mean and standard deviation of the length of call a~d e tt~n_to ~sttmate the median
Muruvet
lengths in the sample, giving your answers
15. The age d'_IStn·bution of the a l' . (C) significance of a v ~p a m .t e geometrical
recorded m the table below. pp Jcants for a JOb is . erttca1 1me dra h
correct to one decimal place. tstogram at this val ue. < wn t rough the
(C Additional)
h
Machine B is also set to cut lengths of wood Age (years) 20- 35 40 45 50 60
100 rom long. A random sample of 50 items
from this machine has mean 100.2 mm and Number of
standard deviation 1.1 mm. Giving your reasons, 8 9 0
applicants 14 12 7
comment briefly on the accuracy of the two
Pandora
machines. l C)
17. ~he . fr e ep lone calls from my hous e during the
13. A frequency diagram for a set of data is shown in ftrst figure showsofa last
six months cumulative
year equency curve for the length oft I 1
the figure. No scale is given on the frequency
Gideon axis, but summary statistics are given for the
distribution:
160 ill !
11 :Lj LllJ :;t
1 il H Iii
fl li' nu_tt 'I
TiT
i '
~n
Magnus
' 1'I
,.
(a) The total number of potential customers 20
::I
I i
I'
contacted is 1100. Find, approximately, the
total number of sales completed. :, IT i i I
8 '
40
I;
rr~~~
11. An examination is taken by two sets of 2 3 4 !T:
1
candidates from the same school. The number of (a) State the mode and the mid-range value of IT
~~ ~~1-i· 1
candidates in each set, the mean marks and the i
?i'
variances are shown below.
the data.
(b) Identify two features of the distribution.
(c) Calculate the mean and standard deviation
0
0
_...I.T
'T iT
10
!i i
1
15
;
20 25 30
5
Number of Variance of the data and explain why the value 8, Time in minutes
Mean mark which occurs just once, may be regarded as
candidates
66 9 an outlier.
SetA 20 39
51
Set B 30
I i' 115
(c) for
Dataa sample
on the number (a) State the type of dia ram a .
{A) the distribution of these call times is of OO of wor d s per sentence illustrating the data.g ppropnate for
second m . 1 sentences taken from a
negatively skewed, . 'l agazme were presented in a t bl (b) Ca!culations using the data in the tabl .
(a) Find the median and inter-quartile range. (B) the majority of the calls last longer than stml ar to Table 2 and . h h a e esttmates as follows e g1ve
(b) Construct a histogram with six equal intervals. From this ne:tt bi c same _class
6 minutes, for the mean numb cr o f wor ta de,s an mean time of the experiments
intervals to illustrate the data. (C) the majority of the calls last between peresttmate standard deviation 69.64
6.37 s.s,
(c) Use the frequency distribution associated 5 and 10 minutes, was calculated to be 9 14 sentence
with your histogram to estimate the mean (D) the majority of the calls are shorter than sentence in this sa 1. ?hin fact, the only E~xp lainvalues.
precise why thes e are estimates
· rather than
the mean length. (MEl) words was one w:pl e wtt more than 25
length of call.
(d) State whether each of the following is true
or false
C I 1 JC 1 was 32 words 1
rna cu afteha_n improved estimate for
ean o t ts second sample .
th~ng.
(C)
range of the times t
experiments.
tn
(c) Estimate the median d h .
ft e mterquartile
a <en or completing the
pr~~gt·hdof ~thranhdom
classification of the customers during one hour
6. As part of aand
employees detailed theof1its wor k f?rce, a large company select l
study
recorded
use of trading and Table 2 shows the number of sample· Th
of e100
h'tstogram
(a) this
to draw a histogram to represent the data, words per sentence for a sample of 100 sentences
tllustrates the distribution uce . ttme each employee had beenecWI t e company male
(b) to estimate the mean value ofT. (C)
taken from a magazine.
2. The following stem and leaf diagram summarises
the blood glucose level, in mmol/1, of a patient, Table 1
Time employed by
J; ;:LL~LJ_c'-'=i';
. the comp an y for. a random sample of 100 f. 1
o Its rna e employees
measured daily over a period of time. Adult Adult
Totals Child/
Blood glucose level 5 \ 0 means 5.0 female male
(12) Student
5 0 0 1 1 1 2 2 3 3 3 4 4 ( 9)
5 5 5 6 6 7 8 8 9 9 (10) Number of 22
6 0 1 1 1 2 3 4 4 4 4 ( ) 5 28
customers
6 5 5 6 7 8 9 9 ( )
7 1 1 2 2 2 3 ( )
7 5 7 9 9 ( ) Table 2
8 1 1 1 2 2 3 34 ( 3)
8 7 9 9 ( 4)
9 0 1 1 2 ( 3) Words per
1-5 6-10 11-15 16-25 26-45
9 5 7 9 sentence
(a) Write down the numbers required to
complete the stem and leaf diagram. Number of 14 14
18 32 22
(b) Find the median and quartiles of these data. sentences
(c) On graph paper, construct a box plot to
represent these data. Show your scale (a) State a suitable type of diagram which could 30
Time (yearn)
represent the data in Table 1.
clearly. (b) The survey was carried out on a Monday
(d) Comment on the skewness of the (L)
morning. Give one possible reason why Copy and complete the f 0 II owmg
· table. 8.6 ~ears for the quartile times Th 1
distribution.
conclusions based upon the results of
T' (a) servmg woman in the
the compan for
l h e ongest
samp e ad been with
included a ;oma y~ah. Jhe sample also
3. The 30 members of the Darton town orchestra Table 1 should be treated with caution. I me (years) 0- 2 5 10- 15 20 30 20
joined the compa~; Do a dv~ry recently
each recorded the amount of individual practice, (c) Represent the data in Table 2 by means of
x hours, they did in the first week of June. The an accurately drawn histogram on graph Number of males 3S plots to · raw a Jacent box
results are summarised as follows: paper.
in the sample length c~~pare the distributions of the
1:x ~ 225, 1:x 2 ~ 1755.
(d) Use the figures in Table 2 to calculate,
correct to three significant figures, estimates (b) Calculate estimates f h . emplo;;es t~~~ be~~ :7tbl~hyees and female
The mean and standard deviation of the number (c) quartile times f ho t e median and (d) List . re~ d'ff
. th e company
An equivalent r~r t e sample.
of the mean and standard deviation of the I erences between the two .
6.
g 5 00
lfiJ ;x Hl .
1
0
2. A student collected some data on the heights,
0
I!!:
x em, of plants of a particular species. She chose (a) Compare, in words, the distributions of the '
0 1 2 3 4 5 6 7
to represent the data in a stem and leaf display, waiting times in the two post offices.
Time in hours
as shown below. (b) Advise the cleaner which post office to use if
I Unit is 1 em the outliers were due to The diagram shows a cumulative frequenc (b) Cobpl Y and complete the following frequency
9 polygon for the numbers of competitors who
2 2 3 4 4 4 5 6 7 7 (i) a cable laying company having severed
completed a marathon within 2 2! 3 4 d 7
ta e.
12 1 1 1 2 5 5 7 the electricity supply to the post office,
hours of the start. ' 2• ' an
3 1 2 2 5 5 9 (ii) the post office being short~staffed. Time in hours 2-2} 2}-3 3 4 4 7
(AEB) (a) "l!se the diagram to estimate
4 1 3 4 5 (1) the median No. of competitors 200
(a) (i) Explain why the data might be better 5. Thirty children were given a task to perform and (ii) the quartil;s
represented by a two-part stem and leaf
the times taken were recorded, each to the next of the times takei; by the 500 competitors (c) Calculate an estimate of
display. who completed the run.
whole number of minutes above the actual time. (i) the mean
(ii) Rewrite the above data in such a
display. The results were as follows: (ii) the stand~rd deviation
(b) Calculate an estimate of the mean height, 12 20 14 17 17 8 19 13 27 13 of the 500 competitors' tim'es. (NEAB)
22 16 11 18 13 6
in centimetres, of plants of this species. 16 18 10 7
8 10 17 16 19
(c) Calculate the median of the data given in the 16 12 14 23 15
display. (a)
Copy and complete the following stem and
(d) State which of the mean and median would leaf diagram to illustrate the above data.
be a better measure of location for the
heights of these 29 plants. Give a reason for Key 10 \ 5 represents a time of 15 minutes
your answer. (0)
7 8 8
3. A pie chart was drawn, for each of the years 0 1 2 2
l! Il
1990 and 1995, to illustrate the amounts spent 6 6
by a householder on electricity, gas, water and 2 3
telephone, and to compare the total amounts
spent in the two years.
(a) Given that the radii of the 1990 and 1995 (b) Usc your diagram to estimate the median
charts were 15 em and 18 em respectively, and the quartiles of the distribution of times
calculate the percentage increase in the total taken to complete the task.
amount spent. (c) Draw a box plot to illustrate the
distribution. (NEAB)
The amount spent on water in 1995 was twice
the amount spent in 1990. In the 1995 chart the
amount spent on water was represented by an
angle of 4 7o.
- the number of hours spent studying for ...
- a student's mark in a French test (x) and the mark in a German test (y),
- the diameter of the stem of a plant (x) and the average length of leaf of the plant (y).

Data connecting two variables are known as bivariate data. When pairs of values are plotted, a scatter diagram is produced. Here are some examples:

[Figure: example scatter diagrams of y against x.]

Dependent and independent variables

If one of the variables has been controlled, it is called the independent or explanatory variable. The other variable is then the dependent or response variable.

For example, if you place weights of 10 g, 20 g, 25 g, 30 g, 35 g, 50 g, etc on the end of a spring and record the length of the spring for each weight, the weight is controlled so it is the independent variable. The length of the spring is the dependent variable.

REGRESSION FUNCTION

Having drawn a scatter diagram, you can then look for a mathematical relationship between the variables, y = f(x), where the function f, known as the regression function, is to be determined.

Common sense and care are needed when interpreting scatter diagrams.

- Mathematically, there may appear to be a relationship, but this does not imply that there is a relationship in reality. You might find, for example, that over a period of time in a particular city there has been an increase in the number of robberies and an increase in the number of health food shops. It would however be foolish to imply that there is a relationship between these two variables.
- The appearance of a mathematical relationship does not imply that there is a causal relationship. An increase in one variable does not necessarily cause an increase, or decrease, in the other variable.

If it appears from the scatter diagram that a linear relationship is a sensible interpretation, you may then attempt to find a model for the relationship in the form of a regression line.

A line of 'best fit' drawn 'by eye'

In previous work you may have drawn a line of best fit on the scatter diagram, attempting to draw it so that there are as many points above the line as below it, or as many points to the left of the line as to the right of it. The line should also go through the point $(\bar{x}, \bar{y})$, the means of the two sets of data.

This method, known as drawing 'by eye', is rather haphazard. There is a mathematical way of fitting the regression line, known as the method of least squares, and this is illustrated in the following example.

Consider the situation in which the mass, y g, of a chemical is related to the time, x minutes, for which the chemical reaction has been taking place, according to the table:

Time, x min    5    7    12    16    20

Note that the times, x, are chosen by the person holding the stopwatch, so x is the independent variable. The values of the mass, y, depend on the results of the chemical process at these times, therefore y is the dependent variable. If you were to repeat the experiment with the same values of x, you would almost certainly get a different set of values of y. So for a fixed value of x you could have several different values of y, all in the same vertical line on the scatter diagram.
[Diagrams 1, 2 and 3: scatter diagrams of the chemical reaction data (x from 0 to 20, y from 0 to 25) with attempted lines of best fit; the vertical distances from the points to the line are labelled m1, m2, m3, ...]

In each of the attempts, the dotted line goes through $(\bar{x}, \bar{y})$ and there are three points above the line and two points below it, yet neither of these lines is correct.
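To see why a line drawn by eye through the mean point can still be wrong, the following minimal sketch compares the sum of squared vertical distances for three lines through $(\bar{x}, \bar{y})$. It is an illustration only: the two 'by eye' gradients (1.0 and 1.5) are hypothetical, and the masses for x = 16 and x = 20 are taken as 21 and 24, the values consistent with the totals Σy = 79 and Σxy = 1136 quoted in the text.

```python
# Chemical reaction data; the last two masses are assumed from the quoted totals.
xs = [5, 7, 12, 16, 20]
ys = [4, 12, 18, 21, 24]

x_bar = sum(xs) / len(xs)  # 12.0
y_bar = sum(ys) / len(ys)  # 15.8


def intercept_through_mean(b):
    """Intercept a such that y = a + b*x passes through (x_bar, y_bar)."""
    return y_bar - b * x_bar


def sum_squared_vertical(a, b):
    """Sum of the squared vertical distances from the points to y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Two hypothetical gradients chosen by eye, and the least squares gradient 188/154.
candidates = {
    "by eye, too shallow": 1.0,
    "by eye, too steep": 1.5,
    "least squares": 188 / 154,
}

results = {
    name: sum_squared_vertical(intercept_through_mean(b), b)
    for name, b in candidates.items()
}

for name, total in results.items():
    print(f"{name}: sum of squares = {total:.2f}")
```

All three lines pass through the mean point, but only the least squares gradient gives the smallest total, which is the sense in which that line is the 'true' line of best fit.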
Diagram 3 shows the true line of best fit. It has equation

    y = 1.15 + 1.22x

This equation has been calculated by using the method of least squares and the calculations are shown on page 123.

[Diagram 3: scatter diagram of the chemical reaction data with the true line of best fit, y = 1.15 + 1.22x]

Useful formulae when calculating regression lines

Before looking at how to find the equation of the regression line, here is a reminder of the formulae for the mean (page 28) and the variance (page 37) of a set of data, together with a new formula that connects the x and y data, the covariance.

For the x data:

The mean of the x data is $\bar{x}$ where $\bar{x} = \frac{\sum x}{n}$.

The variance is usually written $s^2$, but to distinguish that it is the variance of the x data, you could write $s_x^2$. Usually, however, when working in the context of regression and correlation, the variance of the x data is written $s_{xx}$.

$s_{xx} = \frac{1}{n}\sum (x - \bar{x})^2$

Remember that there are alternative formats of the variance:

$s_{xx} = \frac{1}{n}\sum x^2 - \bar{x}^2 = \frac{\sum x^2}{n} - \bar{x}^2$

For the y data:

$\bar{y} = \frac{\sum y}{n}$

$s_{yy} = \frac{1}{n}\sum (y - \bar{y})^2$ or $s_{yy} = \frac{1}{n}\sum y^2 - \bar{y}^2 = \frac{\sum y^2}{n} - \bar{y}^2$

For the x and y data:

The covariance, $s_{xy}$, connects the x and y data and the formula is

$s_{xy} = \frac{1}{n}\sum (x - \bar{x})(y - \bar{y})$ or $s_{xy} = \frac{1}{n}\sum xy - \bar{x}\bar{y} = \frac{\sum xy}{n} - \bar{x}\bar{y}$

In some textbooks and formulae booklets you might see the notation $S_{xx}$, $S_{yy}$ and $S_{xy}$. These are known as the 'big S' formulae and are derived from the 'small s' formulae above as follows:

$S_{xx} = \sum (x - \bar{x})^2$ or $S_{xx} = \sum x^2 - n\bar{x}^2 = \sum x^2 - \frac{(\sum x)^2}{n}$

$S_{yy} = \sum (y - \bar{y})^2$ or $S_{yy} = \sum y^2 - n\bar{y}^2 = \sum y^2 - \frac{(\sum y)^2}{n}$

$S_{xy} = \sum (x - \bar{x})(y - \bar{y})$ or $S_{xy} = \sum xy - n\bar{x}\bar{y} = \sum xy - \frac{\sum x \sum y}{n}$

The big S formulae are useful in calculations where the factor of n cancels, but it should be remembered that they are not the formulae for the variance and covariance.

The equation of the regression line y on x

You are probably familiar with the equation of a straight line in the form

    y = mx + c

where m is the gradient and c is the y-intercept.

When writing the equation of the regression line, a slightly different format is usually used in which the constant term is written before the x-term and the letters used are a and b. The format is

    y = a + bx

where b is the gradient and a is the y-intercept.

To define the regression line, you need to find the values of a and b for a particular set of data. This is done as follows:

For the regression line y on x written in the form y = a + bx, the gradient, b, can be calculated as follows:

$b = \frac{s_{xy}}{s_{xx}}$ or $b = \frac{S_{xy}}{S_{xx}}$

Note that b is known as the regression coefficient of y on x.

To find a, use the fact that $(\bar{x}, \bar{y})$ lies on the line.

If $y = a + bx$ then $\bar{y} = a + b\bar{x}$

Rearranging, $a = \bar{y} - b\bar{x}$

To find the equation of the regression line y on x for the chemical experiment data on page 120:

     x     y    x^2    y^2     xy
     5     4     25     16     20
     7    12     49    144     84
    12    18    144    324    216
    16    21    256    441    336
    20    24    400    576    480
    --------------------------------
    60    79    874   1501   1136

There are five pairs of data, so n = 5.

$\bar{x} = \frac{\sum x}{n} = \frac{60}{5} = 12$ and $\bar{y} = \frac{\sum y}{n} = \frac{79}{5} = 15.8$

$s_{xy} = \frac{1}{n}\sum xy - \bar{x}\bar{y} = \frac{1}{5} \times 1136 - 12 \times 15.8 = 37.6$

$s_{xx} = \frac{1}{n}\sum x^2 - \bar{x}^2 = \frac{1}{5} \times 874 - 12^2 = 30.8$

For the regression line y on x in the form y = a + bx:

$b = \frac{s_{xy}}{s_{xx}} = \frac{37.6}{30.8} = 1.2207\ldots = 1.22$ (2 d.p.)

and $a = \bar{y} - b\bar{x} = 15.8 - 1.2207 \times 12 = 1.150\ldots = 1.15$ (2 d.p.)

So the equation of the regression line y on x is y = 1.15 + 1.22x.
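The 'small s' calculation for the chemical experiment can be sketched in a few lines of Python. This is an illustrative check rather than part of the original method: the function name `regression_y_on_x` is made up for the example, and the masses for x = 16 and x = 20 are taken as 21 and 24, the values consistent with the totals Σy = 79 and Σxy = 1136.

```python
def regression_y_on_x(xs, ys):
    """Return (a, b) for the least squares line y = a + b*x (small s formulae)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Covariance and variance in their 'alternative' formats.
    s_xy = sum(x * y for x, y in zip(xs, ys)) / n - x_bar * y_bar
    s_xx = sum(x * x for x in xs) / n - x_bar ** 2
    b = s_xy / s_xx          # regression coefficient of y on x
    a = y_bar - b * x_bar    # the line passes through (x_bar, y_bar)
    return a, b


# Chemical reaction data (last two masses assumed from the quoted totals).
times = [5, 7, 12, 16, 20]
masses = [4, 12, 18, 21, 24]

a, b = regression_y_on_x(times, masses)
print(f"y = {a:.2f} + {b:.2f}x")  # prints: y = 1.15 + 1.22x
```

Using the big S formulae instead would multiply both numerator and denominator of b by n, so the result is unchanged.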
If you use the big S formulae:

$S_{xy} = \sum xy - \frac{\sum x \sum y}{n} = 1136 - \frac{60 \times 79}{5} = 188$

$S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 874 - \frac{60^2}{5} = 154$

$b = \frac{S_{xy}}{S_{xx}} = \frac{188}{154} = 1.2207\ldots = 1.22$ (2 d.p.)

and a is calculated as above to give a = 1.15 (2 d.p.).

An alternative way of working out the equation is to use the following format for the equation of a straight line:

If m is the gradient and the line goes through a fixed point (h, k), the equation of the line can be written

    y - k = m(x - h)

In the case of the regression line y on x, the gradient is b and the fixed point is $(\bar{x}, \bar{y})$, so the equation of the regression line y on x can be written

$y - \bar{y} = b(x - \bar{x})$ where $b = \frac{s_{xy}}{s_{xx}}$ or $b = \frac{S_{xy}}{S_{xx}}$

For the above data relating to the chemical experiment, the equation is

    y - 15.8 = 1.2207(x - 12)
    y - 15.8 = 1.2207x - 14.648
    y = 1.2207x + 1.152
    y = 1.22x + 1.15 (2 d.p.) as before

Summarising:

The least squares regression line y on x is

$y = a + bx$ where $b = \frac{s_{xy}}{s_{xx}} = \frac{S_{xy}}{S_{xx}}$ and $a = \bar{y} - b\bar{x}$

Making predictions using the regression line y on x

The regression line y on x gives you the average value of y for a given value of x, so in certain circumstances it can be used to predict or estimate missing values. This is known as interpolating from the given information.

The regression line y on x is used
- when x is the independent variable and you want to estimate y for a given value of x, or you want to estimate x for a given value of y.
- when neither variable is controlled and you want to estimate y for a given value of x.

For the chemical reaction data, in which x is the independent variable, you can use the regression line y = 1.15 + 1.22x to estimate (a) y when x = 10, (b) x when y = 20, as follows:

(a) The estimate of y when x = 10, written $\hat{y}$, is given by
    $\hat{y}$ = 1.15 + 1.22 × 10 = 13.35

(b) The estimate for x when y = 20, written $\hat{x}$, is given by
    20 = 1.15 + 1.22x
    1.22x = 18.85
    x = 15.4

Warning: you must take care, though, as estimating outside the range of your data is unreliable. For example, for the chemical reaction data, when the reactants have formed their product, the reaction ceases and the mass would not continue to increase. Going outside the range of data is known as extrapolating from the given information.

Important note: In the situation where neither variable is controlled and you want to estimate x for a given value of y, you would use a different regression line, the least squares line x on y. You would also use the regression line x on y if y is the independent variable. This is described more fully on page 130.

Using a calculator to find the regression line y on x

Linear regression (LR) mode on the calculator enables you to input the pairs of data $(x_i, y_i)$ and then obtain the values of a and b and also $\bar{x}$, $\bar{y}$, $\sum x$, $\sum x^2$, $\sum y$, $\sum y^2$, $\sum xy$ and n. On the calculator, the value of a is usually denoted by A and the value of b by B.
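The estimates made from the regression line y = 1.15 + 1.22x in the section above can be sketched as follows. This is a hedged illustration: the function names and the decision to raise an error outside the observed time range (5 to 20 minutes) are assumptions made for the example, not a rule stated in the text.

```python
A, B = 1.15, 1.22        # rounded coefficients of the regression line y on x
X_MIN, X_MAX = 5, 20     # range of the observed times in the chemical data


def estimate_y(x):
    """Estimate y from x, refusing to extrapolate beyond the observed x range."""
    if not X_MIN <= x <= X_MAX:
        raise ValueError("x is outside the range of the data; extrapolation is unreliable")
    return A + B * x


def estimate_x(y):
    """Estimate x from y by rearranging y = A + B*x (x is the independent variable)."""
    return (y - A) / B


print(round(estimate_y(10), 2))   # estimate of y when x = 10
print(round(estimate_x(20), 2))   # estimate of x when y = 20
```

Calling `estimate_y(30)` raises `ValueError`, mirroring the warning about extrapolating beyond the data.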
Casio 85W/85WA/570W

Input data (each pair entered as x , y followed by DT):

    5 , 4 DT
    7 , 12 DT
    ...
    20 , 24 DT

You now have access to (using SHIFT):

    A = 1.1506...
    B = 1.2207...

Equation of regression line: y = A + Bx, so y = 1.15 + 1.22x

You can check the following. The sums are recalled with RCL followed by one of the red letters A, B, C, D, E and F on the third row of the calculator; the means and variances are obtained using SHIFT.

    Σx² = 874     Σx = 60     n = 5
    Σy² = 1501    Σy = 79     Σxy = 1136
    $\bar{x}$ = 12    $s_{xx}$ = 30.8
    $\bar{y}$ = 15.8  $s_{yy}$ = 50.56

To clear LR mode, use the MODE key.

Example 2.1

One measure of personal fitness is the time taken for an individual's pulse rate to return to normal after strenuous exercise; the greater the fitness, the shorter the time. Reg and Norman have the same normal pulse rates. Following a short programme of strenuous exercise they both recorded their pulse rates P at time t minutes after they had stopped exercising. Norman's results are given in the table below.

    t    0.5   1.0   1.5   2.0   3.0   4.0   5.0
    P    125   113   102    94    81    83    71

[...]

(c) Find the equation of the regression line of P on t for Reg's data.
(d) State, giving a reason, which of Reg or Norman you consider to be the fitter. (L)

Solution 2.1

(a) Scatter diagram to show Norman's data

[Scatter diagram of Norman's pulse rate P against time t]

(b) P = 122.3 - 11.0t
    so when t = 2.5, P = 122.3 - 11.0 × 2.5 = 94.8

(c) Regression line of P on t for Reg's data is P = a + bt

where $a = \bar{P} - b\bar{t}$ and $b = \frac{s_{tP}}{s_{tt}} = \frac{S_{tP}}{S_{tt}}$

$\bar{P} = \frac{\sum P}{n} = \frac{829}{8} = 103.625$, $\bar{t} = \frac{\sum t}{n} = \frac{19.5}{8} = 2.4375$

To find b using the small s format:

$s_{tP} = \frac{1}{n}\sum tP - \bar{t}\bar{P} = \frac{1}{8} \times 1867 - 103.625 \times 2.4375 = -19.210\ldots$

$s_{tt} = \frac{1}{n}\sum t^2 - \bar{t}^2 = \frac{1}{8} \times 63.75 - 2.4375^2 = 2.027\ldots$

$b = \frac{-19.210}{2.027} = -9.4759\ldots$

To find b using the big S format:

$S_{tP} = \sum tP - \frac{\sum t \sum P}{n} = 1867 - \frac{19.5 \times 829}{8} = -153.6875$

$S_{tt} = \sum t^2 - \frac{(\sum t)^2}{n} = 63.75 - \frac{19.5^2}{8} = 16.21875$

$b = \frac{S_{tP}}{S_{tt}} = \frac{-153.6875}{16.21875} = -9.4759\ldots$

To calculate a, use

$a = \bar{P} - b\bar{t} = 103.625 - (-9.4759) \times 2.4375 = 126.72\ldots$

Regression line P on t for Reg is P = 126.7 - 9.5t.

(d) Norman is fitter as his pulse rate decreases more rapidly. This can be seen from the gradients of the regression lines: the gradient for Norman is -11.0 and the gradient for Reg is -9.5.

Drawing a regression line on a scatter diagram

Example 2.2

The following data represent the lengths (x) and breadths (y) of 12 cuckoos' eggs measured in millimetres.

    x   22.3  23.6  24.2  22.6  22.3  22.3  22.1  23.3  22.2  22.2  21.8  23.2
    y   16.5  17.1  17.3  17.0  16.8  16.4  17.2  16.8  16.7  16.2  16.6  16.4

Draw a scatter diagram for the data.
Obtain the least squares regression line of y on x and plot this on the scatter diagram. (NEAB)

Solution 2.2

The scatter diagram is shown below together with the regression line.

To find the equation of the regression line use the formulae or find it directly on the calculator, where you should find that A = 11.473122... and B = 0.2327179.... Giving values to four significant figures, the equation of the least squares regression line of y on x is

    y = 11.47 + 0.2327x

To plot the line on the scatter diagram you need to work out three points on the line, including $(\bar{x}, \bar{y})$, the mean of each set of data.

From the calculator, $\bar{x}$ = 22.675 and $\bar{y}$ = 16.75, so plot $(\bar{x}, \bar{y})$ as accurately as you can.

Now choose two other x-coordinates and calculate the y value for each. The x-coordinates should be within the range of data, perhaps at the extremities.

Choosing x = 21.8 and x = 24.2:

When x = 21.8, y = 11.47 + 0.2327 × 21.8 = 16.54..., so plot (21.8, 16.54).
To obtain this directly on the calculator key in 21.8 SHIFT $\hat{y}$ to give 16.546...

When x = 24.2, y = 11.47 + 0.2327 × 24.2 = 17.10..., so plot (24.2, 17.1).
Directly on the calculator: 24.2 SHIFT $\hat{y}$ gives 17.1048...

Now draw the regression line, joining the three points, but do not take the line beyond the range of the data.

[Scatter diagram to show the lengths (x) and breadths (y) of 12 cuckoo eggs, with the regression line y on x]
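As a check on Solution 2.2, the regression line for the cuckoo egg data can be recomputed from the summary totals using the big S formulae. This is a sketch for illustration; the helper name `y_hat` is an assumption, and the totals (Σx = 272.1, Σy = 201, Σx² = 6175.69, Σxy = 4559.03, n = 12) are those quoted for Example 2.2.

```python
# Summary totals for the lengths (x) and breadths (y) of the 12 cuckoo eggs.
n = 12
sum_x, sum_y = 272.1, 201
sum_x2, sum_xy = 6175.69, 4559.03

x_bar = sum_x / n
y_bar = sum_y / n

# Big S formulae.
S_xy = sum_xy - sum_x * sum_y / n
S_xx = sum_x2 - sum_x ** 2 / n

b = S_xy / S_xx          # gradient of the regression line y on x
a = y_bar - b * x_bar    # intercept, since the line passes through the means


def y_hat(x):
    """Estimated breadth for a given length x (within the range of the data)."""
    return a + b * x


# Three points for plotting: the two extremities of x and the mean point.
plot_points = [(x, y_hat(x)) for x in (21.8, x_bar, 24.2)]
for x, y in plot_points:
    print(f"({x:.3f}, {y:.3f})")
```

The middle point is the mean point, which lies on the line by construction; the other two use the smallest and largest lengths in the data so the line is not drawn beyond its range.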
The least squares regression line x on y

In Example 2.2, the regression line y on x would be used to estimate the breadth of a cuckoo's egg, y, for a given value of the length, x. Note that neither the length nor the breadth of the cuckoo's egg is controlled, so there is no independent variable. If you wanted to estimate the length, x, for a given value of the breadth, y, you would use a different line, the regression line x on y.

The least squares regression line x on y is used
- when neither variable is controlled and you want to estimate x for a given value of y.
- when y is the controlled (independent) variable and you want to estimate x for a given value of y, or y for a given value of x.

This time the horizontal distances $n_1, n_2, n_3, \ldots$ from the points to the line are considered.

The sum of their squares,

$\sum n_i^2 = n_1^2 + n_2^2 + n_3^2 + \ldots$

is made as small as possible, i.e. the line is drawn so that $\sum n_i^2$ is a minimum.

[Diagram: least squares regression line x on y, with horizontal distances n1, n2, ... from the points to the line]

The equation of the regression line x on y

The equation of the regression line x on y is often written in the form

$x = c + dy$ where $d = \frac{s_{xy}}{s_{yy}} = \frac{S_{xy}}{S_{yy}}$ and $c = \bar{x} - d\bar{y}$

See page 122 for the formulae for $s_{xy}$, $s_{yy}$, $S_{xy}$ and $S_{yy}$.

Also, since the line goes through $(\bar{x}, \bar{y})$, the equation can be written

$x - \bar{x} = d(y - \bar{y})$

d is known as the regression coefficient of x on y.

Note, however, that d is not the gradient of the line. This can be seen by rearranging the equation x = c + dy:

$dy = x - c$

$y = \left(\frac{1}{d}\right)x - \frac{c}{d}$

So the gradient of the regression line x on y is $\frac{1}{d}$ and the y-intercept is $-\frac{c}{d}$.

Considering the data of Example 2.2, the summary information is

Σx = 272.1, Σx² = 6175.69, Σy = 201, Σy² = 3368.08, Σxy = 4559.03 and n = 12.

To find the equation of the regression line x on y in the form x = c + dy, calculate c and d as follows:

Find $\bar{x} = \frac{\sum x}{n} = \frac{272.1}{12} = 22.675$ and $\bar{y} = \frac{\sum y}{n} = \frac{201}{12} = 16.75$

To calculate d using the small s format:

$s_{xy} = \frac{1}{n}\sum xy - \bar{x}\bar{y} = \frac{1}{12} \times 4559.03 - 22.675 \times 16.75 = 0.11291\ldots$

$s_{yy} = \frac{1}{n}\sum y^2 - \bar{y}^2 = \frac{1}{12} \times 3368.08 - 16.75^2 = 0.11083\ldots$

$d = \frac{s_{xy}}{s_{yy}} = \frac{0.11291\ldots}{0.11083\ldots} = 1.0187\ldots$

To calculate d using the big S format:

$S_{xy} = \sum xy - \frac{\sum x \sum y}{n} = 4559.03 - \frac{272.1 \times 201}{12} = 1.355$

$S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 3368.08 - \frac{201^2}{12} = 1.33$

$d = \frac{S_{xy}}{S_{yy}} = \frac{1.355}{1.33} = 1.0187\ldots$

To calculate c, use $c = \bar{x} - d\bar{y} = 22.675 - 1.0187 \times 16.75 = 5.6101\ldots$

The equation of the regression line x on y is x = 5.61 + 1.02y.

It is interesting to plot this on the scatter diagram, together with the regression line y on x.

You know that the line must go through $(\bar{x}, \bar{y})$, so plot (22.675, 16.75). Now choose two other y-coordinates, say y = 16.4 and y = 17.0, and calculate the value of x.

When y = 16.4, x = 5.61 + 1.02 × 16.4 = 22.3
When y = 17.0, x = 5.61 + 1.02 × 17.0 = 22.95

Plot (22.3, 16.4) and (22.95, 17.0) and join the three points with a straight line.
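The distinction between the regression coefficient d and the gradient of the line x on y can be checked numerically. The sketch below uses the summary totals quoted for the cuckoo egg data; the variable names are chosen for this illustration only.

```python
# Summary totals for the cuckoo egg data of Example 2.2.
n = 12
sum_x, sum_y = 272.1, 201
sum_y2, sum_xy = 3368.08, 4559.03

x_bar = sum_x / n
y_bar = sum_y / n

# Big S formulae; note that S_yy (not S_xx) is the denominator for x on y.
S_xy = sum_xy - sum_x * sum_y / n
S_yy = sum_y2 - sum_y ** 2 / n

d = S_xy / S_yy          # regression coefficient of x on y
c = x_bar - d * y_bar    # so that x = c + d*y passes through the means

# Rearranged into the form y = (1/d)x - c/d for plotting on x-y axes.
gradient = 1 / d
y_intercept = -c / d

print(f"x = {c:.2f} + {d:.2f}y")
print(f"gradient on the scatter diagram = {gradient:.3f}")
```

The gradient is the reciprocal of d, so it is d itself only in the special case d = 1 or d = -1.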
[Scatter diagram of the cuckoo egg data showing both regression lines, y on x and x on y]

Notice that the lines are not the same; in fact they are quite far apart. You will see later that this indicates that the correlation is not very strong (page 139).

Using a calculator to find the regression line x on y

The procedure, using linear regression mode, is similar to that described on page 126 for calculating the line y on x. This time, however, input the data with the y-coordinate first. For the equation x = c + dy, the value of c is given by A on the calculator, and the value of d by B.

Using the data for the cuckoos' eggs given in Example 2.2 on page 128:

Casio 85W/85WA/570W

Set LR mode (MODE keys).
Clear memories.

Input data (each pair entered as y , x followed by DT):

    16.5 , 22.3 DT
    ...
    17.3 , 24.2 DT
    ...
    16.4 , 23.2 DT

You now have access to (using SHIFT):

    A = 5.610...    (c)
    B = 1.0187...   (d)

Equation of regression line: x = c + dy, i.e. x = 5.61 + 1.02y

You can check the following. The sums are recalled with RCL followed by one of the red letters on the third row of the calculator; the means and variances are obtained using SHIFT.

    Σy² = 3368.08    Σy = 201      n = 12
    Σx² = 6175.69    Σx = 272.1    Σxy = 4559.03
    $\bar{y}$ = 16.75     $s_{yy}$ = 0.1108...
    $\bar{x}$ = 22.675    $s_{xx}$ = 0.4852...

Note that if your calculator shows what is being found on the display, you should read y for x and x for y when checking these.

To clear LR mode, use the MODE key.
Example 2.3

[...]

$\frac{1}{d} = 2$
d = 0.5
So x = c + 0.5y

You are given that (1, 4) lies on the line:
1 = c + 0.5 × 4
c = -1

The equation of the regression line x on y is x = -1 + 0.5y.

Exercise 2a Equations of least squares regression lines

Use the method you prefer for calculating the equations of the regression lines. It is a good idea to be able to use the calculator in LR mode and to be competent at using the formula.

1. For each set of data, find
   (a) the equation of the regression line of y on x,
   (b) the equation of the regression line of x on y.
   Plot them both on a scatter diagram and comment.

   Data set 1

       x    3    7    9   11   14   14   15   21   22   23   26
       y    5   12    5   12   10   17   23   16   10   20   25

   Data set 2

       x       y
       1      85
       5      82
       5      85
       5      89
       6      78
       7.5    66
       7.5    77
       7.5    81
       10     70
       11     74
       12.5   65
       14     69
       14.5   63

2. The following data show, in convenient units, the yield (y) of a chemical reaction run at various different temperatures (x):

       Temperature (x)   Yield (y)
       110               2.1
       120               4.3
       130               3.1
       140               3.4
       150               2.9
       160               5.5
       170               3.3

   (a) Plot the data. Comment on whether it appears that the usual simple linear regression model is appropriate.
   (b) Assuming that such a model is appropriate, estimate the regression line of yield on temperature.
   (c) Plot your estimated line on your graph, and indicate clearly on your graph the distances, the sum of whose squares is minimised by the linear regression procedure. (MEI)

3. In a certain heathland region there is a large number of alder trees where the ground is marshy but very few where the ground is dry. [...]

4. To test the effect of a new drug twelve patients were examined before the drug was administered and given an initial score (I) depending on the severity of various symptoms. After taking the drug they were examined again and given a final score (F). A decrease in score represented an improvement. The scores for the twelve patients are given in the table below.

                   Score
       Patient   Initial (I)   Final (F)
       1         61            49
       2         23            12
       3         8             3
       4         14            4
       5         42            28
       6         34            27
       7         32            20
       8         31            20
       9         41            34
       10        25            15
       11        20            16
       12        50            40

   Calculate the equation of the line of regression of F on I.
   On the average what improvement would you expect for a patient whose initial score was 30? (MEI)

5. For a given set of data
   Σx = 15, Σx² = 55, Σy = 43, Σy² = 397, Σxy = 145, n = 5
   Find the equations of the regression lines y on x, and x on y.

6. The following table shows the marks (x) obtained in a Christmas examination and the marks (y) obtained in the following summer examination by a group of nine students.

   [...]

   It is given that Σx = 567, Σy = 552, Σxy = 36 261, Σx² = 37 777, Σy² = 36 112.
   (a) Find the equation of the estimated least squares regression line of y on x.
   (b) A tenth student obtained a mark of 70 in the Christmas examination but was absent from the summer examination. Estimate the mark that this student would have obtained in the summer examination. (C)

7. For a period of three years a company monitors the number of units of output produced per quarter and the total cost of producing the units. The table below shows their results.

       Units of output (x) 1000's   Total cost (y) £1000
       14                           35
       29                           50
       55                           73
       74                           93
       11                           31
       23                           42
       47                           65
       69                           86
       18                           38
       36                           54
       61                           81
       79                           96

   (Use Σx² = 28 740; Σxy = 38 286)
   (a) Draw a scatter diagram of these data.
   (b) Calculate the equation of the regression line of y on x and draw this line on your scatter diagram.
   The selling price of each unit of output is £1.60.
   (c) Use your graph to estimate the level of output at which the total income and total costs are equal.
   (d) Give a brief interpretation of this value. (AEB)

8. From a set of pairs of observations of the variables x and y, it is found that the regression line of y on x passes through the point (0, 1.8). If the means of the x and y values are 5.0 and 8.3 respectively, find the equation of the regression line of y on x in the form y = a + bx. (L)
[...]

       J    40    ...
       K    30    9150
       L    37    10 400

   (You may assume that Σx = 459, Σx² = 22 889, Σy = 132 600, Σxy = 6 094 750)
   (a) Plot the data on a scatter diagram.
   (b) Estimate values that could have been used for a and b last year by fitting the regression line y = a + bx to the data. Draw the line on the scatter diagram.
   (c) Comment on whether the suggested method is likely to prove reasonably satisfactory in practice.
   (d) Without recalculating the regression line find the appropriate values of a and b if every employee were to receive a rise of (i) £500 a year, (ii) 8%, (iii) 4% plus £300 per year.
   (e) Two employees, B and C, had to work away from home for a large part of the year. In the light of this additional information, suggest an improvement to the model. (AEB)

12. In a regression calculation for five pairs of observations one pair of values was lost when the data were filed. For the regression of y on x the equation was calculated as
    y = 2x - 0.1
    [...]

[...]

   The technician now varies the temperature (°C) while keeping other conditions as constant as possible and obtains the following results:

       Yield, y   Temperature, t
       127.6      70
       128.7      75
       130.4      80
       131.2      85
       133.6      90

   He calculates (correctly) that the regression line is y = 107.1 + 0.29t.
   (c) Draw a scatter diagram of these data together with the regression line.
   (d) The technician reports as follows, 'The regression coefficient of yield on percentage additive is larger than that of yield on temperature, hence the most effective way of increasing the yield is to make the percentage additive as large as possible, within reason.'
       Criticise the report and make your own recommendations on how to achieve the maximum yield.

[Scatter diagrams showing the regression lines y on x and x on y for different degrees of correlation:
 Perfect positive correlation, r = 1 (x on y and y on x coincide);
 Strong positive correlation, r = 0.8;
 Some positive correlation, r = 0.5;
 No correlation, r = 0;
 Some negative correlation, r = -0.4;
 Strong negative correlation, r = -0.9;
 Perfect negative correlation, r = -1 (x on y coincides with y on x)]
r is a very useful measure because it is independent of the units of scale of the variables. It is calculated as follows.

$r = \frac{s_{xy}}{\sqrt{s_{xx}s_{yy}}} = \frac{S_{xy}}{\sqrt{S_{xx}S_{yy}}}$

There are ten pairs of data, so n = 10.

$\bar{x} = \frac{\sum x}{n} = \frac{528}{10} = 52.8$ and $\bar{y} = \frac{\sum y}{n} = \frac{666}{10} = 66.6$

Casio 85W/85WA/570W

    Set LR mode (MODE keys)
    Clear memories
    Input the data
    Output: r = 0.862... (SHIFT r)
    Clear LR mode (MODE key)

NOTE: The value of r should be considered in conjunction with a diagram.

By calculation, or using the data already in your calculator, you should find that the regression line y on x has equation y = 38.7 + 0.527x. See page 126 if you have forgotten how to obtain this.

Also, it can be shown that the regression line x on y has equation x = -41.1 + 1.41y. Check this yourself on your calculator.

The diagram shows the scatter diagram together with these two regression lines.

[Scatter diagram with both regression lines; horizontal axis: Mark in Physics, 0 to 80]

As expected, since r is close to 1, the two regression lines are close together. The scatter diagram confirms good positive correlation.

In Example 2.6,
b = 0.527, d = 1.41 so $r^2 = b \times d = 0.743\ldots$
$r = \sqrt{b \times d} = 0.86$ (2 d.p.)

Example 2.7

Solution 2.7

Example 2.8

[...]

Now $b = \frac{s_{xy}}{s_{xx}}$, so b = 0. Also $d = \frac{s_{xy}}{s_{yy}}$, so d = 0.

The equation of the regression line y on x is y = a + bx, but b = 0, therefore the equation is y = a.

The equation of the regression line x on y is x = c + dy, but d = 0, therefore the equation is x = c.

[Diagram: regression line y on x is the horizontal line y = a; regression line x on y is the vertical line x = c]
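The relationship r² = b × d used for Example 2.6 can be sketched as a small function. The function name is an assumption for this illustration. The sign rule relies on the fact that b, d and r all take the sign of the covariance, so the square root is given the sign of b.

```python
import math


def r_from_coefficients(b, d):
    """Product-moment correlation coefficient from the two regression coefficients.

    b is the regression coefficient of y on x, d that of x on y.
    Both share the sign of the covariance, so b*d cannot be negative.
    """
    if b * d < 0:
        raise ValueError("b and d must have the same sign")
    return math.copysign(math.sqrt(b * d), b)


# Example 2.6: y = 38.7 + 0.527x and x = -41.1 + 1.41y
print(round(r_from_coefficients(0.527, 1.41), 2))   # 0.86

# Hypothetical negative coefficients give a negative r.
print(round(r_from_coefficients(-0.5, -0.8), 2))    # -0.63

# When b = 0 and d = 0 the lines are y = a and x = c, and r = 0.
print(r_from_coefficients(0, 0))                    # 0.0
```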
Example 2.9

[...]

(a)    x   -2   -1    0    1    2
       y    4    1    0    1    4

(b)    x    1    1    1    2    2    2    3    3    3    9
       y    1    2    3    1    2    3    1    2    3    8

Solution 2.9

(a) Using a calculator for the first set of data, you should find that r = 0, indicating no linear correlation. But there could be some other relationship between the variables.

[Scatter diagram to illustrate (a)]

You may have noticed that the points all lie on the curve y = x².
There is a relationship between the variables - it is a quadratic one.

NOTE: r = 0 implies that either there is no correlation between the variables and they are independent, or the variables are related in a non-linear way.

(b) Using a calculator for the second set of data, you should find that r = 0.86 (2 d.p.), apparently indicating a strong degree of positive correlation.

[Scatter diagram to illustrate (b)]

But you can see from the scatter diagram that there is not strong positive correlation.
The value of r has been distorted by the point (9, 8), known as an outlier.

So a value of r close to 1, or -1, does not necessarily imply a strong degree of linear correlation. Always check by referring to a scatter diagram.

[...]

   (c)    s    1      2      3      4      5      6      7     8
          t    12.4   12.8   12.6   13.9   13.4   13.2   14    14.6

   (d)    t    27   43   62   89   72
          z    48   50   81   75   60

2. For a given set of data
   Σx = 680, Σy = 996, Σx² = 20 154, Σy² = 34 670, Σxy = 24 844, n = 30.
   Find the product-moment correlation coefficient.

3. The following data relate to the percentage unemployment and percentage change in wages over several years.

       % Unemployment (x)   % Change in wages (y)
       1.6                  5.0
       2.2                  3.2
       2.3                  2.7
       1.7                  2.1
       1.6                  4.1
       2.1                  2.7
       2.6                  2.9
       1.7                  4.6
       1.5                  3.5
       1.6                  4.4

   (a) Calculate the product-moment correlation coefficient between x and y.
       (Use Σx = 18.9, Σy = 35.2, Σx² = 37.01, Σy² = 132.22, Σxy = 64.7)
   It has been suggested that low unemployment and a low rate of wage inflation cannot exist together. [...]

[...]

       A    1    3
       B    2    4
       C    2    5
       D    4    5
       E    5    4
       F    5    8
       G    6    6
       H    7    6
       I    8    6
       J    8    7
       K    9    8
       L    9   10

   Determine the product-moment correlation coefficient.

5. Ten boys compete in throwing a cricket ball and the table shows the height of each boy (x cm) measured to the nearest centimetre and the distance (y m) to which he can throw the ball.

       Boy    x     y
       A     122   41
       B     124   38
       C     133   52
       D     138   56
       E     144   29
       F     156   54
       G     158   59
       H     161   61
       I     164   63
       J     168   67

   Calculate the product-moment correlation coefficient.
   Calculate also the equations of the regression lines of y on x and x on y. (AEB)
   NOTE: check your value of r by using the regression coefficients obtained in the equations of the regression lines.
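The two warnings in Solution 2.9 (r = 0 for a quadratic relationship, and r inflated by an outlier) can be reproduced with a short sketch. It assumes the standard big S definition r = S_xy / √(S_xx S_yy); the function name `pmcc` is chosen for this illustration.

```python
import math


def pmcc(xs, ys):
    """Product-moment correlation coefficient using the big S formulae."""
    n = len(xs)
    S_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    S_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    S_yy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    return S_xy / math.sqrt(S_xx * S_yy)


# (a) Points on the curve y = x^2: a perfect relationship, but not a linear one.
xa = [-2, -1, 0, 1, 2]
ya = [4, 1, 0, 1, 4]
r_a = pmcc(xa, ya)

# (b) A 3 x 3 block of points plus the outlier (9, 8).
xb = [1, 1, 1, 2, 2, 2, 3, 3, 3, 9]
yb = [1, 2, 3, 1, 2, 3, 1, 2, 3, 8]
r_b = pmcc(xb, yb)

# The same data with the outlier removed.
r_b_without_outlier = pmcc(xb[:-1], yb[:-1])

print(f"(a) r = {r_a:.2f}")
print(f"(b) r = {r_b:.2f}, without the outlier r = {r_b_without_outlier:.2f}")
```

Removing the single point (9, 8) takes r from 0.86 to 0, which is why the text recommends always checking the scatter diagram.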
6. The heights h, in centimetres, and weights W, in kilograms, of ten people are measured. It is found that Σh = 1710, ΣW = 760, Σh² = 293 162, ΣhW = 130 628 and ΣW² = 59 390.
   Calculate the correlation coefficient between the values of h and W.
   What is the equation of the regression line of W on h? (O&C)

7. For a set of data, the equations of the least squares regression lines are
   y = 0.648x + 2.64 (y on x) and
   x = 0.917y - 1.91 (x on y)
   Find the product-moment correlation coefficient for the data.

8. For a given set of data the equations of the least squares regression lines are
   y = -0.219x + 20.8 (y on x) and
   x = -0.785y + 16.2 (x on y)
   Find the product-moment correlation coefficient for the data.

9. For a given set of data, the regression line y on x is y = 0.4 + 1.3x and x on y is x = -0.1 + 0.7y.
   Find (a) the product-moment correlation coefficient, (b) $\bar{x}$ and $\bar{y}$.

10. The body and heart masses of fourteen ten-month-old mice are tabulated below:

       Body mass (x g)   Heart mass (y mg)
       27                118
       30                136
       37                156
       38                150
       32                140
       36                155
       32                157
       32                114
       38                144
       42                159
       36                149
       44                170
       33                131
       38                160

    (a) Draw a scatter diagram of these data.
    (b) Calculate the equation of the regression line of y on x and draw this line on the scatter diagram.
    (c) Calculate the product-moment coefficient of correlation. (AEB)

SPEARMAN'S COEFFICIENT OF RANK CORRELATION, $r_s$

You have used the product-moment correlation coefficient, r, as a measure of the strength of the correlation between the paired data $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$. This is reasonable provided that both x and y can be measured. Sometimes it is not possible to measure certain variables, but it is possible to arrange them in order.

For example, if two wine experts were asked to place six wines in order of preference, they would rank the six wines in order, using the numbers 1, 2, 3, 4, 5, 6.

The wine they liked best would be ranked 1.
The wine they liked least would be ranked 6.

It is possible to measure the strength of the correlation between the two rankings by using Spearman's coefficient of rank correlation, $r_s$.

In general, this is obtained as follows:
- Assign ranks 1, 2, 3, ..., n to the values of each variable. This can be done by putting the values in descending order or in ascending order, but whichever you choose, you must use the same rule for both sets of data.
- For each pair of values, calculate d where d = rank x - rank y.
- Calculate $r_s$ using the formula

$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$

Consider this example: The five finalists in the County Dog Show were a Bulldog, a Poodle, a Red Setter, a Terrier and a Cocker Spaniel. Two judges ranked the dogs in order of preference. The dog they liked best was ranked 1 and the results are shown in the table:

                               Bulldog   Poodle   Setter   Terrier   Spaniel
    Judge X                    1         2        3        4         5
    Judge Y                    3         2        4        1         5
    d = rank x - rank y        -2        0        -1       3         0
    d²                         4         0        1        9         0        Σd² = 14

To calculate Spearman's rank correlation coefficient, use

$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$ with n = 5 and Σd² = 14

So $r_s = 1 - \frac{6 \times 14}{5 \times (25 - 1)} = 1 - \frac{7}{10} = 0.3$

But what does this value of $r_s$ tell you? In fact Spearman's rank correlation coefficient is actually derived from the product-moment correlation coefficient, and is such that

$-1 \le r_s \le 1$

$r_s$ = 0.3 indicates a weak positive correlation between the two rankings. To put it another way, it indicates a small degree of agreement between the two judges.

$r_s$ = +1 means that the rankings are in perfect agreement.
$r_s$ = 0 means that there is no correlation between the rankings.
$r_s$ = -1 means that the rankings are in complete disagreement. In fact they are in exact reverse order.

To illustrate this, consider three different sets of judges at the Dog Show.

First pair of judges (perfect agreement):

          Bulldog   Poodle   Setter   Terrier   Spaniel
    A     1         2        3        4         5
    B     1         2        3        4         5
    d     0         0        0        0         0
    d²    0         0        0        0         0        Σd² = 0

$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - 0 = 1$ and the rankings are in perfect agreement.

Second pair of judges (no correlation):

          Bulldog   Poodle   Setter   Terrier   Spaniel
    C     1         2        3        4         5
    D     4         1        3        5         2
    d     -3        1        0        -1        3
    d²    9         1        0        1         9        Σd² = 20

$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 20}{5 \times 24} = 1 - 1 = 0$ and there is no correlation between the rankings.
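The procedure for Spearman's coefficient when the data are already ranked can be sketched as below. The function name is chosen for this illustration; the rankings are those of the Dog Show judges in the text.

```python
def spearman_from_ranks(rank_x, rank_y):
    """Spearman's coefficient r_s = 1 - 6*sum(d^2) / (n(n^2 - 1)) for untied ranks."""
    n = len(rank_x)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Order of dogs: Bulldog, Poodle, Setter, Terrier, Spaniel.
judge_x = [1, 2, 3, 4, 5]
judge_y = [3, 2, 4, 1, 5]
print(round(spearman_from_ranks(judge_x, judge_y), 2))   # 0.3

# The illustrative pairs of judges.
print(spearman_from_ranks([1, 2, 3, 4, 5], [1, 2, 3, 4, 5]))   # 1.0, perfect agreement
print(spearman_from_ranks([1, 2, 3, 4, 5], [4, 1, 3, 5, 2]))   # 0.0, no correlation
print(spearman_from_ranks([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]))   # -1.0, exact reverse order
```

Because each difference d is squared, its sign does not affect the result.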
Third pair of judges (complete disagreement):

          Bulldog   Poodle   Setter   Terrier   Spaniel
    E     1         2        3        4         5
    F     5         4        3        2         1
    d     -4        -2       0        2         4
    d²    16        4        0        4         16       Σd² = 40

$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 40}{5 \times 24} = 1 - 2 = -1$ and the rankings are in exact reverse order.

NOTE: the difference between the ranks, d, could be positive or negative. Since you are going to square this value to obtain d², you could just write the numerical value for the difference in the table. This is written |d|, so in the table above, for Bulldog

Rank E - Rank F = 1 - 5 = -4, so |d| = 4 and d² = 16.

Example 2.10

The marks of eight candidates in English and Mathematics are:

    Candidate          1    2    3    4    5    6    7    8
    English (x)        50   58   35   86   76   43   40   60
    Mathematics (y)    65   72   54   82   32   74   40   53

Rank the results and hence find Spearman's rank correlation coefficient between the two sets of marks. Comment on the value obtained.

Solution 2.10

There are eight pairs of data, so n = 8. Ranking the lowest mark 1 and the highest mark 8 gives the ranks as shown in the table.

    English (x)    50   58   35   86   76   43   40   60
    Maths (y)      65   72   54   82   32   74   40   53
    Rank x         4    5    1    8    7    3    2    6
    Rank y         5    6    4    8    1    7    2    3
    |d|            1    1    3    0    6    4    0    3
    d²             1    1    9    0    36   16   0    9     Σd² = 72

$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6(72)}{8(64 - 1)} = 0.14$ (2 d.p.)

Spearman's coefficient of rank correlation is 0.14 (2 d.p.).

This appears to show a very weak positive correlation between the English and Mathematics rankings.

It is interesting to compare the value of $r_s$ with the value of r, the product-moment correlation coefficient.

Using your calculator in linear regression mode, or using the formula, you should find that r = 0.15 (2 d.p.).

The two values of the correlation coefficient are very similar in this example.

Plotting a scatter diagram of the marks does not appear to indicate much correlation.

[Scatter diagram of Mathematics mark (y, 0 to 80) against English mark (x, 0 to 90)]

Spearman's coefficient of rank correlation can be found when data have already been ranked, as in the following example.

Example 2.11

Two judges rank the eight photographs in a competition as follows:

    Photograph    A    B    C    D    E    F    G    H
    1st judge     2    5    3    6    1    4    7    8
    2nd judge     4    3    2    6    1    8    5    7

Calculate Spearman's coefficient of rank correlation for the data.

Solution 2.11

In this example, the data have already been ranked.

    Rank x    2    5    3    6    1    4    7    8
    Rank y    4    3    2    6    1    8    5    7
    |d|       2    2    1    0    0    4    2    1
    d²        4    4    1    0    0    16   4    1     Σd² = 30

$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$ where n = 8

$= 1 - \frac{6(30)}{8(64 - 1)} = 0.64$ (2 d.p.)

Spearman's coefficient of rank correlation for the data is 0.64, indicating some agreement between the judges.
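The remark that Spearman's coefficient is derived from the product-moment correlation coefficient can be illustrated with the marks of Example 2.10: when there are no tied values, applying the product-moment formula to the ranks gives the same value as the Σd² formula. The sketch below is illustrative; the helper names are assumptions, and the simple `ranks` function does not handle ties.

```python
import math


def ranks(values):
    """Rank values from 1 (lowest) to n (highest); assumes no tied values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def pmcc(xs, ys):
    """Product-moment correlation coefficient using the big S formulae."""
    n = len(xs)
    S_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    S_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    S_yy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    return S_xy / math.sqrt(S_xx * S_yy)


english = [50, 58, 35, 86, 76, 43, 40, 60]
maths = [65, 72, 54, 82, 32, 74, 40, 53]

rank_x = ranks(english)   # [4, 5, 1, 8, 7, 3, 2, 6]
rank_y = ranks(maths)     # [5, 6, 4, 8, 1, 7, 2, 3]

n = len(english)
sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
r_s = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print(f"sum of d^2 = {sum_d2}")                          # 72
print(f"r_s from the formula = {r_s:.2f}")               # 0.14
print(f"pmcc of the ranks = {pmcc(rank_x, rank_y):.2f}") # 0.14
print(f"pmcc of the marks = {pmcc(english, maths):.2f}") # 0.15
```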
Alternatively
x − x̄ = d(y − ȳ), where d = S_xy / S_yy

Linear correlation
The product-moment correlation coefficient, r, is a measure of the strength of the linear correlation: −1 ≤ r ≤ 1.

[Scatter diagrams: perfect negative correlation, r = −1; high negative correlation, r = −0.8; no correlation, r = 0; some positive correlation, r = 0.5; perfect positive correlation, r = 1]

Formulae to calculate r
Small s format: r = s_xy / (s_x s_y), where s_xy = (1/n)Σxy − x̄ȳ, s_xx = s_x² = (1/n)Σx² − x̄², s_yy = s_y² = (1/n)Σy² − ȳ²
Big S format: r = S_xy / √(S_xx S_yy), where S_xy = Σxy − ΣxΣy/n, S_xx = Σx² − (Σx)²/n, S_yy = Σy² − (Σy)²/n

Example 2.13

Sample    A    B    C    D    E    F    G    H    I
x       1.0  1.5  2.0  2.5  3.0  3.5  4.0  4.5  5.0
y        49   60   66   62   72   64   89   90   96

The quantity x is a measure of the amount of chemical applied, and y is the contrast index, which takes values between 0 (no contrast) and 100 (maximum contrast).
(a) Plot a scatter diagram to illustrate the data.
(b) It is subsequently discovered that one of the samples of film was damaged and produced an incorrect result. State which sample you think this was.

In all subsequent calculations this incorrect sample is ignored. The remaining data can be summarised as follows:

Σx = 23.5, Σy = 584, Σx² = 83.75, Σy² = 44 622, Σxy = 1883, n = 8.

(c) Calculate the product moment correlation coefficient.
(d) State, with a reason, whether it is sensible to conclude from your answer to part (c) that x and y are linearly related.
(e) The line of regression of y on x has equation y = a + bx. Calculate the values of a and b, each correct to three significant figures.
(f) Use your regression equation to estimate what the contrast index corresponding to the damaged piece of film would have been if the piece had been undamaged.
(g) State, with a reason, whether it would be sensible to use your regression equation to estimate the contrast index when the quantity of chemical applied to the film is zero. (C)

Solution 2.13
(a) Scatter diagram
[Scatter diagram of y against x]

(b) Sample F was damaged.

(c) x̄ = Σx/n = 23.5/8 = 2.9375 and ȳ = Σy/n = 584/8 = 73

To calculate r:
Using small s format
s_xy = (1/n)Σxy − x̄ȳ = (1/8) × 1883 − 2.9375 × 73 = 20.9375
s_xx = (1/n)Σx² − x̄² = (1/8) × 83.75 − 2.9375² = 1.839...
s_yy = (1/n)Σy² − ȳ² = (1/8) × 44 622 − 73² = 248.75
∴ r = s_xy / (s_x s_y) = 20.9375 / (√1.839... × √248.75) = 0.9787...

Using big S format
S_xy = Σxy − ΣxΣy/n = 1883 − (23.5 × 584)/8 = 167.5
S_xx = Σx² − (Σx)²/n = 83.75 − 23.5²/8 = 14.71...
S_yy = Σy² − (Σy)²/n = 44 622 − 584²/8 = 1990
∴ r = S_xy / √(S_xx S_yy) = 167.5 / √(14.71... × 1990) = 0.9787...

So r = 0.98 (2 s.f.).
(You should try this on your calculator, using LR mode.)

(d) Yes, it is sensible to conclude that x and y are linearly related. Since r is very close to 1, it would appear to indicate a very strong positive linear correlation.

(e) For the regression line y = a + bx, a = ȳ − bx̄
b = s_xy / s_xx = 20.9375 / 1.839... = 11.38...
or b = S_xy / S_xx = 167.5 / 14.71... = 11.38...
a = ȳ − bx̄ = 73 − 11.38... × 2.9375 = 39.57...
y = 39.6 + 11.4x (3 s.f.)

(f) When x = 3.5, y = 39.57... + 11.38... × 3.5 = 79 (2 s.f.)
The contrast index would have been 79.

(g) No, it would not be sensible to use the regression equation when x = 0, since this is outside the range of data. Extrapolating outside the data is unreliable.

Example 2.14
The rules for a flower competition at a village fete are as follows.
Three judges each give a score out of 100 to each entry. The two judges whose rankings are in closest agreement are identified, and their scores for each entry are added. The three prize-winners are those whose total scores from these two judges are the highest. The scores of the third judge are ignored.

The judges awarded marks as shown in the table below.

Contestant   A   B   C   D   E   F   G
Judge X     89  83  80  72  69  54  41
Judge Y     77  84  85  65  79  72  69
Judge Z     73  83  89  80  67  75  69

The value of Spearman's rank correlation coefficient between X and Y is 0.5, and between X and Z is 0.46, correct to two decimal places. Calculate the value of Spearman's rank correlation coefficient between judges Y and Z, and hence establish which were the three prize-winners. (C)
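Returning to Example 2.13, the big S calculations in its solution use nothing but the quoted summary totals, so they are easy to check by machine. The following Python sketch is one way to do this; the variable names are illustrative, and the totals are those given in the question with the damaged sample excluded.

```python
from math import sqrt

# Summary totals quoted in Example 2.13 (damaged sample excluded).
n = 8
sum_x, sum_y = 23.5, 584
sum_x2, sum_y2, sum_xy = 83.75, 44622, 1883

# "Big S" quantities.
S_xy = sum_xy - sum_x * sum_y / n
S_xx = sum_x2 - sum_x ** 2 / n
S_yy = sum_y2 - sum_y ** 2 / n

r = S_xy / sqrt(S_xx * S_yy)      # product moment correlation coefficient
b = S_xy / S_xx                   # gradient of the regression line of y on x
a = sum_y / n - b * sum_x / n     # intercept, from a = y_bar - b * x_bar

print(f"r = {r:.2f}, y = {a:.1f} + {b:.1f}x")
print(f"estimate at x = 3.5: {a + b * 3.5:.0f}")
```

The code keeps full precision for a and b until the final print, which is why the estimate at x = 3.5 agrees with the value 79 obtained from the unrounded coefficients.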
Solution 2.14

Contestant      A  B  C  D  E  F  G
Rank Judge Y    4  6  7  1  5  3  2
Rank Judge Z    3  6  7  5  1  4  2
|d|             1  0  0  4  4  1  0
d²              1  0  0 16 16  1  0      Σd² = 34

r_s = 1 − 6(34) / [7(49 − 1)] = 0.39 (2 d.p.)

The largest of the three coefficients is 0.5, between X and Y, so these are the two judges in closest agreement. Adding their scores gives the totals

Contestant    A    B    C    D    E    F    G
X + Y       166  167  165  137  148  126  110

so the three prize-winners are B, A and C.

Example 2.15
A mother monitored the growth of her baby and recorded the length h cm and weight y kg at various stages in the baby's development. The results were as follows.

h     50    58    63    68     82     88     96
y   4.43  4.88  6.31  7.18  10.63  13.60  17.95

The mother thought that a model of the form
y = p + qh
where p and q are constants, might be suitable to describe the relationship between y and h.
The diagram shows a scatter diagram of these data.

[Scatter diagram of y against h; horizontal axis: Length (cm)]

(a) Comment on the suggested model.
(b) Suggest, giving reasons, a better model to represent the relationship between y and h.

The new variable x = h³/10 000 was calculated and the values of x and y are given in the table below.

x   12.5  19.5  25.0  31.4   55.1   68.1   88.5
y   4.43  4.88  6.31  7.18  10.63  13.60  17.95

[Scatter diagram of y against x]

Solution 2.15
The scatter diagram of y against x suggests that a linear model would be a reasonable fit.

(d) Σx = 300.1, Σy = 64.98
Equation of regression line y on x is y = a + bx
To find b using small s format
s_xy = Σxy/n − x̄ȳ = 3634.185/7 − (300.1/7) × (64.98/7) = 121.1999...
s_xx = Σx²/n − x̄² = 17 653.33/7 − (300.1/7)² = 683.944...
∴ b = s_xy / s_xx = 121.1999... / 683.944... = 0.1772...

To find b using big S format
S_xy = Σxy − ΣxΣy/n = 3634.185 − (300.1 × 64.98)/7 = 848.399...
S_xx = Σx² − (Σx)²/n = 17 653.33 − (300.1)²/7 = 4787.614...
∴ b = S_xy / S_xx = 848.399... / 4787.614... = 0.1772...

a = ȳ − bx̄ = 64.98/7 − 0.1772... × 300.1/7 = 1.685...

Giving values to three significant figures, the equation of the regression line is
y = 1.69 + 0.177x.

(e) When h = 75, x = h³/10 000 = 75³/10 000 = 42.1875
When x = 42.1875, y = 1.685... + 0.1772... × 42.1875 = 9.16 (3 s.f.)
When the baby is 75 cm long, an estimate of the weight is 9.16 kg.

(a) Calculate the equation of a suitable regression line from which a value of t can be estimated for a given value of w. Simplify your answer as far as possible, giving the constants correct to three significant figures.
(b) Use your equation to estimate the perceived temperature when the wind speed is
    (i) 38 miles per hour,
    (ii) 55 miles per hour.
(c) Calculate the value of the product moment correlation coefficient for the data, and state what this indicates about the data.
(d) Comment on the reliability of the two estimates found in (b). (C)

3. The following data were collected during a study, under experimental conditions, of the effect of temperature, x °C, on the pH, y, of skimmed milk.

Temperature (x °C)   pH (y)
        4             6.85
        9             6.75
       17             6.74
       24             6.63
       32             6.68
       40             6.52
       46             6.54
       57             6.48
       63             6.36
       69             6.33
       72             6.35
       78             6.29

(a) Making reference to the following scatter diagram for these data, explain what it reveals about the relationship between x and y.
(d) Estimate the pH of skimmed milk at 20 °C and at 95 °C. In each case indicate, with a reason but without further calculation, how reliable you think these estimates might be.
(e) Find the temperature at which you would expect skimmed milk to have a pH of 6.5. (NEAB)

4. The price £x of a certain cassette recorder is increased by £2 every six months. The number of recorders sold during the six months before the next increase is y thousand. The values covering eight consecutive periods are shown in the table.

x    40    42    44    46    48    50    52    54
y  12.8  11.6  11.3  10.3  10.7   9.1   8.9   9.2

[Σx = 376, Σx² = 17 840, Σy = 83.9, Σy² = 893.33, Σxy = 3898.4.]
(a) Plot a scatter diagram for the data.
(b) Obtain, in the form y = a + bx, the equation of the regression line of y on x, giving the values of a and b correct to three significant figures. Plot this line on your scatter diagram.
(c) Calculate an estimate of the number of recorders sold when the price is £58, and comment on the reliability of your estimate.
(d) Without further calculation, state whether the regression line of x on y will be the same as the line plotted in part (b). Give a reason for your answer. (C)

5. Explain, briefly, your understanding of the term 'correlation'.
Describe how you used, or could have used, correlation in a project or in classwork.
Twelve students sat two Biology tests, one theoretical and one practical. Their marks are shown in the table.

Miscellaneous exercise 2d
w t
1. A set of bivariate data can be summarised as ~ l.O ..... Marks in. theoretica~ Marks in practical
follows: 25 -"', ' i ....
..
....
0 ..... ' ...
test (T) test (P)
n~6, Lx~21, Ly~43, 5 21
I
a'· " . ..
....
Jun 1 22.3
y
6.25 8.02 8.42 5.27 7.21 8.71 5.68 X X 0 OA
X
2 20.2 X
X 5 1.5
Jul [Lxy ~ 654.006, Lx ~ 91, Lx ~ 1191.72,
2
17.9 X X 10 3.4
Aug 3
16.1 Ly ~ 49.56, Ly 2 ~ 362.1628] 15 5.5
Sep 4 (a) Find the linear {product-moment) (c) Y X 20 7.7
5 16.8 correlation coefficient between x andy. 9.7
Oct 25
6 12.6 (b) Find the equation of the least squares 30 11.7
Nov regression line of y on x and also that of
7 10.9 35 13.5
Dec x ony. 40 15.4
(c) Given that the rainfall in the growing season
(a) Plot a scatter diagram of the data using as of a subsequent year was 14.0 em, estimate
x coordinates the coding shown in the table You may assume that :Ex= 180, :Ey = 68.8,
and the maximum temperature as the the yield in that year. Lxy ~ 1960, Lx 2 ~ 5100.
(d) Given that the yield in a subsequent year was (d) y
y coordinate. Mark the mean point of the 8.08 tons per acre, estimate the rainfall in (a) Plot the data on a scatter diagram.
data on your graph. the growing season of that year. (C) (b) Calculate the regression line y =a+ bx and
{b) Given that :Exy = 416.7, demonstrate that the draw it on your scatter diagram.
gradient of the line of regression of y on x is 17. Following a leak of radioactivity from a nuclear (c) Predict the temperature 60 minutes from
-1.80 (to three significant figures). What is power station an index of exposure to switching on the fire.
the physical meaning of this gradient? radioactivity was calculated for each of seven Why should this prediction be treated with
(c) Calculate the full equation of regression of geographical areas close to the power station. caution?
maximum temperature on month.
Mixed test 28
Mixed test 2A 1. The average trade-in value of a particular make Give a reason why this estimate differs from the
3. Two people, X andY, were asked to give marks of used car depreciates with time according to actual number of hours of sunshine on May 5th.
1. The following table shows the amount of water, out of 20 for seven brands of fish finger. The the following table, in which the values of x may
in centimetres, applied to seven similar plots on Explain the conc.ept of least squares by reference
results are recorded in the table. be assumed to be exact. to your scatter dugram and the regression line of
an experimental farm. It also shows the yield of
F G y onx. (C)
hay in tonnes per acre. Brand A B c D E Age (x years) Value (£y thousand)
Yield of hay (y) 18 2 1 4 15 2.0 6.10 3. A car manufacturer is testing the braking
Amount of water (x) X's mark 8 10
1 19 2.5 5.55 distance for a new model of car. The table shows
5 14 12 9 4
4.85 Y'smark 3.0 5.09 the braking distance, y metres, for different
30
speeds, x km/h, when the brakes were applied.
45 5.20 Construct a table of ranks and calculate 3.5 4.65
(C) 4.5 3.89
60 5.76 Spearman's rank correlation coefficient. Speed of car,
6.60 5.0 3.51
75 x lcm/h 30 50 70 90 110 130
4. Values of x andy for a set of bivariate data are 6.0 3.31
90 7.35
given in the following table. 7.0 2.50 Braking distance,
105 7.95
y [n ~ 8, Lx ~ 33.5, LY ~ 34.6, LX 2 ~ 161.75, y metres 25 50 85 155 235 350
120 7.77 X
LY 2 ~ 160.2014, LXY ~ 130.035.] (to the nearest 5 metres)
0.1 1.97
(a) Calculate the product moment correlation
(Use Lx 2 ~ 45 675; LXY ~ 3648.75) 0.2 1.94
(a) Find the equation of the regression line of y
coefficient between x and y, and state what LX~ 480, LX 2 ~ 45 400, LY ~ 900
0.3 1.89 its value tells you about a scatter diagram LY 2 ~ 212 100, LXY ~ 94 500. ,
on x in the form y=a+ bx. 1.82
0.4 illustrating the data. (a) Plot a scatter diagram.
(b) Interpret the coefficients of your regression
0.5 1.73 (b) It _is required to estimate the value of y when (b) Calculate the equation of the regression line
line. x ts 4.0. Calculate the equation of a suitable of y on x and draw the line on your scatter
(c) What would you predict the yield to be for 0.6 1.62
line of regression, and use it to obtain the diagram.
x = 28 and for x = 150? Comment on the 0.7 1.49
reliability of each of your predicted yields. required estimate. {c) Use your regression equation to predict
0.8 1.34 (c) Interpret the gradient of the line of
(L) values of y when x = 100 and x = 150.
0.9 1.17 regression in the context of this situation. Comment, with reasons, on the likely
(d) State, with a reason in each case, whether accuracy of these predictions.
[n~9, Lx~4.5, Ly~14.97, Lx ~2.85,
2
2. In a physics experiment, a bottle of milk was
brought from a cool room into a warm room. Its
you could use your equation to obtain a (d) Disc~ss briefly whether the regression line
Ly 2 ~ 25.5309, LXY ~ 6.885.] reliable estimate of provtdes a good model or whether there is a
temperature, y oc, was recorded at t minutes (a) Calculate the product moment correlation
after it was brought in, for 11 different values (i) y when x ~ 10.0, better way of modelling the relationship
coefficient for this data and state what its (ii) x when y ~ 3.00. (C) between y and x. {MEl)
oft. The results are summarised as: value tells you about the relationship
Lt ~ 44, Lt 2 ~ 180.4, Lty ~ 824.5, between x andy. 2. The following table gives x, the number of hours 4. In the t~.o rounds of a show-jumping
LY ~ 205. of sunshine, andy, the mid-day temperature in competttwn, seven riders recorded times in
(a) Calculate the equation of the line of oc, at Springtown on the first seven days in May. seconds, given in the following table. '
regression of y on t in the form y =a+ bt. X X X Mid-day
X Hours of
(b) Explain the practical significance of the X
X temperature, yoC
Rider A B c D E F G
X Date sunshine, x
value of a.
(c) Use your equation to estimate the values of X Round 1 127 131 133 139 140 141 146
X May 1st 10 17
y at t ~ 4.5 and t ~ 20.0. May 2nd 11 21 Round 2 132 130 140 137 133 138 142
(d) State, with a reason, which of these 12
estimates is likely to be the more reliable. May 3rd 2
The experimenter plotted a graph of y against t, May 4th 7 13 (a) Calculate Spearman's rank correlation
but used only the data in the table below. May 5th 5 18 coefficient between the times for the two
May 6th 6 16 rounds.
May 7th 12 15 (b) It was subsequently discovered that rider G
Time
3.8 4.2 4.6 5 had broken the rules of the competition and
(minutes), t 3 3.4 The scatter diagram representing this data is [LX~ 53, LY ~ 112, LX 2 ~ 479, 1 0 seconds was added to his Round 2 time
shown above. LY 2 ~ 1848, I:xy ~ 882.] as a penalty. State, with a reason what can
be said about the value of Spear~an's rank
Temperature (b) State the value of Spearman's rank
(oC), Y 17 18.3 18.6 18.9 19.3 19.4 correlation coefficient for this data, and state Plot the data on a scatter diagram.
correlation coefficient calculated from the
what further information its value gives Calculate the product moment correlation revised data.
{e) Plot this graph, and on it draw the line of about the relationship between x andy. coefficient. {c) Lat~r still it was discovered that, in Round
regression. (c) State which of the following best indicates The regression line of x on y has equation 2, nders A and B had to have their times
(f) State why the linear model could not be the relationship between x andy. x = 0.607y- 2.14, and the regression line of y interchanged. State, with a reason but
valid for very large values of the time. (i) The product moment correlation on x has equation y = 0.438x + 12.7 where the without further calculation, whether, as a
(g) Using your graph, comment on whether the coefficient. c_oefficients are correct to three significant result of this change, the value of
model is a reasonable one, and state, giving {ii) Spearman's rank correlation coefficient. ftgures: Usi-?g the ~quation of the appropriate Spearman's rank correlation coefficient
a reason, whether you consider that a more (iii) The scatter diagram. regressiOn hoe, estimate the number of hours of would increase, decrease or stay the same.
refined model could be found. (L) Give a reason for your answer. {C) su.nshine expected on a day in May when the (C)
mtd-day temperature is 18 oe,
Probability

In this chapter you will learn:
• about different ways of estimating probabilities
• how to use probability notation
• about the probability laws including
    the rule for combined events,
    the 'or' rule for mutually exclusive events,
    the 'and' rule for independent events
• about conditional probability
• how to use tree diagrams
• about arrangements, selections, permutations and combinations and their application to probability

The probability of an event is a measure of the likelihood that it will happen and it is given on a numerical scale from 0 to 1. The numbers representing probabilities can be written as percentages, fractions or decimals.
A probability of 0 indicates that the event is impossible.
A probability of 1 (i.e. 100%) indicates that the event is certain to happen.
All other events have a probability between 0 and 1.

For example
There is an evens chance of a coin coming down heads when tossed; the probability is 1/2 or 0.5 or 50%.
There is a 1 in 4 chance of cutting a pack of cards at a diamond; the probability is 1/4 or 0.25 or 25%.
The weather forecaster may say that there is a 70% chance of rain.
The likelihood of winning the lottery with one ticket can be shown to be approximately 1 in 14 million, so the probability is 1/14 000 000 ≈ 0.000 000 07.

[Probability scale from 0 to 1 showing: winning the lottery jackpot, cutting a pack at a diamond, coin coming down heads, rain]

EXPERIMENTAL PROBABILITY
When you drop a drawing pin from a height it lands in one of two positions: point-up or point-down.
(a) Take ten identical drawing pins and drop them from a height, say 30 cm, onto a flat surface,
(b) count the number out of the ten with points in the air,
(c) repeat the experiment so that it is carried out a total of 20 times, noting the cumulative number of 'points-up' after each time,
(d) calculate the relative frequency of 'points-up' each time, where

    relative frequency = number of 'points-up' / total number of pins thrown

Here is a table showing the results when this experiment was performed.

Number of 'points-up'   Cumulative number   Cumulative number   Relative frequency
in 10 drawing pins      of 'points-up'      of pins thrown      of 'points-up' (2 d.p.)
        3                      3                  10             3/10 = 0.30
        8                     11                  20             11/20 = 0.55
        5                     16                  30             16/30 = 0.53
        5                     21                  40             21/40 = 0.53
        7                     28                  50             28/50 = 0.56
        6                     34                  60             34/60 = 0.57
        6                     40                  70             40/70 = 0.57
        5                     45                  80             45/80 = 0.56
        3                     48                  90             48/90 = 0.53
        7                     55                 100             55/100 = 0.55
        7                     62                 110             62/110 = 0.56
        7                     69                 120             69/120 = 0.58
        5                     74                 130             74/130 = 0.57
        4                     78                 140             78/140 = 0.56
        8                     86                 150             86/150 = 0.57
        7                     93                 160             93/160 = 0.58
        8                    101                 170             101/170 = 0.59
        7                    108                 180             108/180 = 0.60
        7                    115                 190             115/190 = 0.61
        7                    122                 200             122/200 = 0.61
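The cumulative columns of the drawing pin table can be rebuilt from the first column alone. The Python sketch below does this for the 20 recorded throws; the variable names are illustrative only. Relative frequencies are printed to four decimal places so that borderline values such as 21/40 = 0.525 can be seen before any rounding to 2 d.p.

```python
from itertools import accumulate

# Number of 'points-up' in each of the 20 throws of 10 drawing pins (from the table).
points_up = [3, 8, 5, 5, 7, 6, 6, 5, 3, 7, 7, 7, 5, 4, 8, 7, 8, 7, 7, 7]
pins_per_throw = 10

# Running totals of 'points-up' and of pins thrown.
cumulative_up = list(accumulate(points_up))
cumulative_pins = [pins_per_throw * (i + 1) for i in range(len(points_up))]

# Relative frequency after each throw.
relative_frequency = [up / total for up, total in zip(cumulative_up, cumulative_pins)]

for up, total, rf in zip(cumulative_up, cumulative_pins, relative_frequency):
    print(f"{up:3d} / {total:3d} = {rf:.4f}")
```

Running the same code on your own results shows how the relative frequency fluctuates at first and then changes less and less as the number of pins thrown increases.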
The results can be illustrated on a graph.

[Graph of relative frequency of 'points-up' against number of pins thrown; vertical axis marked at 0.5 and 0.6]

From the graph, an estimate is P(drawing pin lands point-up) = 0.6.

COUNTERS
You will need a supply of counters of two different colours. Ask someone to mix them up in a bag in a ratio known only to them.
Use relative frequency methods to estimate the proportion of each colour in the bag. Then check with the actual values to see how close your estimate was.

THREE COINS
Toss three coins a large number of times and use relative frequency methods to estimate the probability that on any given throw two tails and one head will be obtained.

DOMINOES
Place a set of dominoes in a large bag. Use the relative frequency method to estimate the probability of drawing out of the bag at random two dominoes that have a number in common on one of their halves.

When tossing a coin there are two possible outcomes, a head or a tail, and if the coin is fair these are equally likely to occur. Only one of the outcomes is successful (obtaining a head) so P(head) = 1/2.

PROBABILITY NOTATION AND PROBABILITY LAWS
When deriving mathematical rules for probability it is useful to use the definition based on equally likely outcomes, but remember that the results hold for probability in general.

Some preliminary definitions are needed:
Any statistical experiment or trial has a number of possible outcomes.
The set of all possible outcomes is called the possibility space S.
An event A of the experiment is defined to be a subset of S.

Here are some examples:
• When a die is thrown, the outcomes are the numbers 1 to 6.
  So S = {1, 2, 3, 4, 5, 6}.
  Define A to be the event 'the score is less than 3'.
  Then A = {1, 2}.
• When two dice are thrown, there are 36 possible outcomes, shown by dots on the possibility space diagram.
  Define A to be the event 'the sum of the two scores is 6'. These outcomes are shown ringed in the diagram.

[Possibility space diagram: second die against first die, each from 1 to 6]

In general terms a Venn diagram is often used to show A and S.

The number of outcomes in the possibility space is denoted by n(S).
The number of outcomes in event A is denoted by n(A).
Writing P(A) for the probability of A,

P(A) = n(A) / n(S)

A is a subset of S, so 0 ≤ n(A) ≤ n(S).
Dividing throughout by n(S) gives

0 ≤ P(A) ≤ 1

Remember that
P(A) = 0 means that event A is impossible,
P(A) = 1 means that event A is certain to happen.

The complementary event A'
A' denotes the event 'A does not occur'.
n(A') = n(S) − n(A)
so P(A') = [n(S) − n(A)] / n(S) = 1 − n(A)/n(S) = 1 − P(A)
Therefore
P(A') = 1 − P(A)
or
P(A) + P(A') = 1
Note that sometimes Ā is written for the complementary event instead of A'.

Example 3.1
A group of 20 university students contains eight who are in their first year of study. A student is picked at random to represent the group at a meeting. Find the probability that the student is not in the first year of study.

Solution 3.1
Event A: student is in the first year of study.
P(A) = 8/20 = 0.4
so P(A') = 1 − P(A) = 1 − 0.4 = 0.6.
The probability that the student is not in the first year of study is 0.6.

Example 3.2
Two fair coins are tossed. Show the possible outcomes on a possibility space diagram and find the probability that two heads are obtained.

Solution 3.2
Each coin is equally likely to show a head or a tail.
The possibility space for the outcomes is shown in the diagram, indicating that n(S) = 4.

[Possibility space diagram: second coin against first coin, showing the outcomes HH, TH, HT, TT]

Event A: Two heads are obtained.
There is just one outcome for this so n(A) = 1.
Therefore P(A) = n(A) / n(S) = 1/4
The probability that two heads are obtained is 1/4.

Exercise 3a Probability

1. An ordinary die is thrown. Find the probability that the number obtained is
(a) a multiple of 3,
(b) less than 7,
(c) a factor of 6.

2. In a box of highlighters there are eight which have dried up and will not write. The box contains 10 red, 15 blue, 5 green and 10 yellow highlighters.
A highlighter is picked at random from the box. Find the probability that
(a) it is blue,
(b) it is neither green nor yellow,
(c) it is not yellow,
(d) it is purple,
(e) it will write.

3. The possibility space consists of the integers from 1 to 20 inclusive.
A is the event 'the number is a multiple of 3'.
B is the event 'the number is a multiple of 4'.
An integer is picked at random.
Find (a) P(A), (b) P(B').

4. Dan carried out an experiment in which 16 coins were tossed together. The number of tails obtained from tossing the coins was counted.
This procedure was carried out ten times in all and the results were
Number of tails: 9, 7, 8, 6, 10, 7, 5, 5, 8, 9
(a) Use Dan's data to calculate the probability of obtaining a tail.
14. Two fair cubical dice are thrown simultaneously (a) Calculate (i) P(9) (ii) P(4) (iii) P(14).
hose who took part were then
The names o f t and the scores multiplied. P(n) denotes the 1
The experiment was co_ntinued until the 16 coins placed in a prize draw· . probability that the number n will be obtained. (b) If P(t) = 9'' find the possible values oft.
were each tossed 100 tunes. F d the probability th~t some<?ne who satd
(b) Calculate the total nu_mber of tails that Dan ' tn . . costs' will wm the pnze.
servtcmg
would expect to obtam.
10 The durations of 60 telephone calls arc
The probability of an event ~ccu~ring is 0?7; . summarised in the table below. ILLUSTRATING TWO OR MORE EVENTS USING VENN DIAGRAMS
5. What is the probability that It wtlt not occur.
9- 18- 27- 36- 45-
Duration (minutes) 0 Suppose A and B are two events associated with the same experiment. Consider the outcmnes
card is drawn at random from an ordinary
6. A d 6 10 21 20 3 0 described below
pack of 52 playing car s. . Number of calls
(a) Find the probability that the card drawn ts (a) AUB
Use linear interpolation t~ estimate the In set language, the set that contains the outcomes that are in A or B or both is called the
(i) the four of spades, . d
bability that the duratton of a call, d
(")
n
the four of spades or any dmmon '
d (] k or Queen or ;~~cted at random from the 60 calls, excee s (C) union of A and B and is written AU B.
(iii) not a picture ca_r ac
King) of any sutt. 30 minutes. To represent AU Bon the Venn diagram, shade the
(b) The card drawn is the three of diamon~s,- It The table summarises the results of d_ll ~he th whole of the coloured 'figure-of-eight' shape.
is laced on the table and a second car ts 11.
driving tests taken at a Test Centre unng e
p
drawn.
What is the probability that the
d. nd? first week of September.
Remember that although this outcome is written
second card drawn is not a wmo . .A or H it includes the events that are in both A and B
Female as well. A u 8 means A orB or both.
7. The pupils in a junior sch?ol cla~s w~rd_ a~~~r
Male
how many brothers. and ststers t ey a . 32 43 (b) A nB
answers are shown tn the table. Pass 15
Fail 8 In set language, the set that contains the outcomes that are in both A and B is called the
2 3 4 5 intersection of A and B and is written An B.
Number of brothers 0 1
A person is chosen at random from those who
and sisters took their test that week. To represent A n B on the Venn diagram, shade the
8 3 2 1
Number of pupils 4 12 (a) Find the probability that the person overlap of A and B. This outcome is often written
(i) passed the driving test, .. /,and H.
Find the probability that a chi}d chos~n a~ly with (ii) w~s a female who failed her dnvmg
random from the class comes rom a am A nBmeansAandB.
test.
three children.
(b) A male is chosen. What is the probability
. I die numbered 1 to 6, is weighted so that he did not pass the test? PROBABILITY RULE FOR COMBINED EVENTS
8. ~~~~t~~ is t~vice as likel~ .to occur as any other components gave the following
number. Find the probabtbty of 12. Wear tests on 100 f l'f 1 h s
grouped frequency distribution o t e engt ,
(a) a six occurring, . A 8
(b) an odd number occurnng. Number of components If the number of outcomes in A is n(A) and the number of
Life length (x hours)
. d yin which outcomes in B is n(B), then for two overlapping sets A and B,
9. A car manufacturder cha.rrhtef otuotrafrsou~;~he 15
eo le were aske w tc ac · 500 <:;X< 530 if you add n(A) and n(B) together you will count the overlap
foll~wing list influenced them most when buymg 24
530<;x<550 twice.
33
a car: 550<;x<570
A_ the colour range available,
21
570 <:;X< 600 AnB
B - the servicing costs, 7
600 <:;X< 650 So to find the number of outcomes in A U B you have to take one overlap away like this:
c- driver air bag,
D- fuel economy,
E _range of optional extras.
Use linear interpolation to estimate the dom n(A n B) ~ n(A) + n(B)- n(A n B)
bability that a component drawn at ran d
The pie chart shows the results from 90 people. r:~m the 100 has a life length between 540 an (C) Dividing by n(S), this becomes
580 hours.
A
Two ordinary unbiased dice are thrown.
13. Alternatively
B Find the probability that
) the sum on the two dice is 3, {)_J_ and lf:
(~) the sum on the two dice exceeds 9, T
(c) the two dice show the samd.e nJ.ff~:~y more Remember that the word
(d) the numbers on the two JCe 1 or means A or B or both.
than 2.
Example 3.3
In a class of 20 children, 4 of the 9 boys and 3 of the 11 girls are in the athletics team. A person from the class is chosen to be in the egg and spoon race on Sports Day. Find the probability that the person chosen is
(a) in the athletics team,
(b) female,
(c) a female member of the athletics team,
(d) a female or in the athletics team.

Solution 3.3
Possibility space S: the class of 20 people.
(a) A: person is in the athletics team.
P(A) = 7/20 = 0.35
(b) F: person is female.
P(F) = 11/20 = 0.55
(c) P(female and in the athletics team) = P(A and F)
There are three girls in the athletics team, so
P(A and F) = 3/20 = 0.15
(d) P(A or F) = P(A) + P(F) − P(A and F)
= 0.35 + 0.55 − 0.15
= 0.75

Example 3.4
Events C and D are such that P(C) = 19/30, P(D) = 2/5 and P(C ∪ D) = 4/5.
Find P(C ∩ D).

Solution 3.4
Using P(C ∪ D) = P(C) + P(D) − P(C ∩ D)
4/5 = 19/30 + 2/5 − P(C ∩ D)
P(C ∩ D) = 19/30 + 2/5 − 4/5 = 7/30

Other useful results relating two events A and B

P(A ∩ B) = P(B ∩ A)
P(A and B) = P(B and A)

P(A) = P(A ∩ B) + P(A ∩ B')
P(A) = P(A and B) + P(A but not B)

P(neither A nor B) = 1 − P(A or B)
i.e. P(A' ∩ B') = 1 − P(A ∪ B)

Example 3.5
In a survey, 15% of the participants said that they had never bought lottery tickets or premium bonds, 73% had bought lottery tickets and 49% had bought premium bonds.
Find the probability that a person chosen at random from those taking part in the survey
(a) had bought lottery tickets or premium bonds,
(b) had bought lottery tickets and premium bonds,
(c) had bought lottery tickets only.

Solution 3.5
L: person has bought lottery tickets, P(L) = 0.73.
B: person has bought premium bonds, P(B) = 0.49.
P(neither L nor B) = 0.15
(a) P(L or B) = 1 − P(neither L nor B)
= 1 − 0.15
= 0.85
(b) Use P(L or B) = P(L) + P(B) − P(L and B)
0.85 = 0.73 + 0.49 − P(L and B)
P(L and B) = 0.73 + 0.49 − 0.85
= 0.37
(c) P(L only) = P(L) − P(L and B)
= 0.73 − 0.37
= 0.36
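The rearrangements of the rule for combined events used in Examples 3.4 and 3.5 can be checked with exact fractions. This Python sketch is illustrative only; the helper name `p_intersection` is an assumption made for the example, and the 'lottery tickets only' line uses the result P(A) = P(A and B) + P(A but not B).

```python
from fractions import Fraction


def p_intersection(p_a, p_b, p_union):
    """Rearranged addition rule: P(A and B) = P(A) + P(B) - P(A or B)."""
    return p_a + p_b - p_union


# Example 3.4: exact fractions avoid rounding.
p_c_and_d = p_intersection(Fraction(19, 30), Fraction(2, 5), Fraction(4, 5))

# Example 3.5: percentages written as exact fractions.
p_l, p_b, p_neither = Fraction(73, 100), Fraction(49, 100), Fraction(15, 100)
p_l_or_b = 1 - p_neither                  # complement of 'neither L nor B'
p_l_and_b = p_intersection(p_l, p_b, p_l_or_b)
p_l_only = p_l - p_l_and_b                # P(L but not B)

print(p_c_and_d, float(p_l_or_b), float(p_l_and_b), float(p_l_only))
```

Working with `Fraction` rather than decimals means that the answer to Example 3.4 comes out as 7/30 exactly, with no rounding error to explain away.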
Solution 3.6
P(A) = P(A ∩ B) + P(A ∩ B')
0.3 = 0.1 + P(A ∩ B')
P(A ∩ B') = 0.2

P(A' ∩ B') = 1 − P(A ∪ B)
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
= 0.3 + 0.4 − 0.1
= 0.6
P(A' ∩ B') = 1 − P(A ∪ B)
= 1 − 0.6
= 0.4

Example 3.7
A group of 50 people was asked which of three newspapers, A, B or C, they read. The results showed that 25 read A, 16 read B, 14 read C, 5 read both A and B, 4 read both B and C, 6 read both C and A and 2 read all 3.
(a) Represent these data on a Venn diagram.
Find the probability that a person selected at random from this group reads
(b) at least 1 of the newspapers,
(c) only 1 of the newspapers,
(d) only A. (L)

Solution 3.7
(a) Draw 3 overlapping sets to represent A, B and C and fit in the numbers given.

[Venn diagram of A, B and C within S: only A 16, only B 9, only C 6, A and B only 3, B and C only 2, C and A only 4, all three 2]

The numbers in the sets add to 42, showing that 8 read none of the newspapers.
(b) P(reads at least one) = 42/50 = 0.84
(c) P(reads only one) = P(reads only A) + P(reads only B) + P(reads only C)
= 16/50 + 9/50 + 6/50 = 31/50 = 0.62
(d) P(reads only A) = 16/50 = 0.32

If A and B are mutually exclusive, 'A and B' is an impossible event, so P(A ∩ B) = 0. There is no overlap of A and B.
For exclusive events, the rule for combined events becomes

P(A ∪ B) = P(A) + P(B)

This is known as the addition rule for exclusive events.
It is also known as the 'or' rule for exclusive events:

P(A or B) = P(A) + P(B)

Extending this result to n exclusive events,

P(A₁ ∪ A₂ ∪ ... ∪ Aₙ) = P(A₁) + P(A₂) + ... + P(Aₙ)
or
P(A₁ or A₂ or ... or Aₙ) = P(A₁) + P(A₂) + ... + P(Aₙ)

Example 3.8
In a race in which there are no dead heats, the probability that John wins is 0.3, the probability that Paul wins is 0.2 and the probability that Mark wins is 0.4.
Find the probability that
(a) John or Mark wins,
(b) John or Paul or Mark wins,
(c) someone else wins.

Solution 3.8
Since only one person wins, the events are mutually exclusive.
(a) P(John or Mark wins) = P(John wins) + P(Mark wins)
= 0.3 + 0.4 = 0.7
(b) P(John or Paul or Mark wins) = P(John wins) + P(Paul wins) + P(Mark wins)
= 0.3 + 0.2 + 0.4 = 0.9
(c) P(someone else wins) = 1 − 0.9 = 0.1
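The numbers in each region of the Venn diagram for Example 3.7 can be found by working outwards from the centre. The Python sketch below follows that approach using the figures quoted in the question; the variable names are illustrative only.

```python
# Figures quoted in Example 3.7.
total = 50
n_a, n_b, n_c = 25, 16, 14
n_ab, n_bc, n_ca = 5, 4, 6
n_abc = 2

# Work outwards from the centre of the Venn diagram.
only_ab = n_ab - n_abc      # read A and B but not C
only_bc = n_bc - n_abc      # read B and C but not A
only_ca = n_ca - n_abc      # read C and A but not B
only_a = n_a - only_ab - only_ca - n_abc
only_b = n_b - only_ab - only_bc - n_abc
only_c = n_c - only_bc - only_ca - n_abc

at_least_one = only_a + only_b + only_c + only_ab + only_bc + only_ca + n_abc
none = total - at_least_one

print(at_least_one / total)                    # P(at least one newspaper)
print((only_a + only_b + only_c) / total)      # P(only one newspaper)
print(only_a / total)                          # P(only A)
```

The seven regions are mutually exclusive, which is why their counts can simply be added to find the number who read at least one newspaper.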
Special case:
Consider an event A and its complementary event A'.
P(A ∩ A') = 0
P(A ∪ A') = P(A) + P(A') = 1
Event A and its complementary event A' are both mutually exclusive and exhaustive.
Extending this to n events:
If A1, A2, ..., An are n events which between them make up the whole possibility space without overlapping, then
P(A1) + P(A2) + ... + P(An) = 1
and the n events are both mutually exclusive and exhaustive.

Example 3.9
A card is drawn from an ordinary pack of 52 playing cards. Find the probability that the card is
(a) a club or a diamond,
(b) a club or a King.

Solution 3.9
Possibility space S: the pack of 52 cards, so n(S) = 52
C: a club is drawn, so P(C) = n(C)/n(S) = 13/52 = 1/4.
D: a diamond is drawn, so P(D) = n(D)/n(S) = 13/52 = 1/4.
(a) Since a card cannot be both a club and a diamond, the events C and D are mutually exclusive.
Therefore P(C or D) = P(C) + P(D)
= 1/4 + 1/4 = 1/2
(b) Event K: a King is drawn, so P(K) = n(K)/n(S) = 4/52 = 1/13.
The events C and K are not mutually exclusive since a card can be both a King and a club.
P(C and K) = P(King of clubs) = 1/52.
Therefore
P(C or K) = P(C) + P(K) - P(C and K)
= 13/52 + 4/52 - 1/52 = 16/52 = 4/13.

EXHAUSTIVE EVENTS
If two events A and B are such that between them they make up the whole of the possibility space, then A and B are said to be exhaustive events.
For example, if
S = {the integers from 1 to 10 inclusive},
A = {the integers below 7} = {1, 2, 3, 4, 5, 6},
B = {the integers above 5} = {6, 7, 8, 9, 10}
then A ∪ B = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10} = S.
It is also true that P(A ∪ B) = 1.

Exercise 3b Probability combined events
1. An ordinary die is thrown. Find the probability that the number obtained is
(a) even, (b) prime, (c) even or prime.
2. In a group of 30 students all study at least one of the subjects Physics and Biology. 20 attend the Physics class and 21 attend the Biology class. Find the probability that a student chosen at random studies both Physics and Biology.
3. From an ordinary pack of 52 playing cards the seven of diamonds has been lost. A card is dealt from the well-shuffled pack. Find the probability that it is (a) a diamond, (b) a Queen, (c) a diamond or a Queen, (d) a diamond or a seven.
4. For events A and B it is known that P(A) = }·, P(A ∪ B) = 'f and P(A ∩ B) = f,_. Find P(B).
5. For events C and D,
P(C) = 0.7, P(D ∪ C) = 0.9, P(C ∩ D) = 0.3.
Find (a) P(D), (b) P(D' ∩ C), (c) P(D ∩ C'), (d) P(D' ∩ C').
6. Tests are carried out on three machines A, B and C to assess the likelihood that each machine will produce a faulty component. The results are summarised in the table.

            Faulty   Not faulty
Machine A      3         12
Machine B      2          8
Machine C      5         15

A component is chosen at random from those tested.
(a) Find the probability that the component chosen
(i) is from Machine A,
(ii) is a faulty component from Machine C,
(iii) is not faulty or is from Machine A.
(b) It is known that the component chosen is faulty. Find the probability that it is from Machine B.
7. It is known that P(X) = i and P(Y) = ~. Given that X and Y are mutually exclusive, find
(a) P(X ∪ Y), (b) P(Y ∩ X), (c) P(Y ∩ X').
8. For events A and B it is known that P(A) = P(B), P(A ∩ B) = 0.1 and P(A ∪ B) = 0.7. Find P(A').
9. The probability that a boy in Class 2 is in the football team is 0.4 and the probability that he is in the chess team is 0.5. If the probability that a boy in the class is in both teams is 0.2, find the probability that a boy chosen at random is in the football or the chess team.
10. Two ordinary dice are thrown. Find the probability that the sum of the scores obtained
(a) is a multiple of 5,
(b) is greater than 9,
(c) is a multiple of 5 or is greater than 9,
(d) is a multiple of 5 and is greater than 9.
11. Given that P(A') = ~', P(B) = i and P(A ∩ B) = f,, find P(A ∪ B).
12. Two ordinary dice are thrown. Find the probability that
(a) at least one six is thrown,
(b) at least one three is thrown,
(c) at least one six or at least one three is thrown.
13. A and B are two events such that P(A) = fs, P(B) = ~ and P(A ∩ B) = !. Are A and B exhaustive events?
16. In a large garden there are seven fruit trees and 13 other types of tree. Six of the trees have birds nesting in them but only two of these are fruit trees.
(a) Copy and complete the table below to illustrate this information.

              Fruit tree   Other tree   Total
Bird's nest        2                      6
No nest
Total              7           13

The owner of the garden has given permission for Abdul to play in the garden but has instructed him not to climb any fruit trees or trees that have birds nesting in them. Abdul selects a tree at random to climb.
(b) Find the probability that Abdul will obey the owner's instructions.

Example 3.10
When a die was thrown the score was an odd number. What is the probability that it was a prime number?

Solution 3.10
P(prime, given odd) = P(prime and odd)/P(odd)
There are two numbers, 3 and 5, that are prime and odd. There are three odd numbers, 1, 3 and 5.
P(prime, given odd) = (2/6)/(3/6) = 2/3
Rearranging:
P(A and B) = P(A | B) × P(B)
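Solution 3.10 can be confirmed by listing the outcomes. The following is a minimal sketch, assuming equally likely outcomes; the helper names `prob` and `cond_prob` are illustrative only.

```python
from fractions import Fraction

def prob(event, space):
    """P(event) when every outcome in `space` is equally likely."""
    return Fraction(len(event & space), len(space))

def cond_prob(a, b, space):
    """P(a | b) = P(a and b) / P(b)."""
    return prob(a & b, space) / prob(b, space)

space = {1, 2, 3, 4, 5, 6}   # scores on one die
odd = {1, 3, 5}
prime = {2, 3, 5}

print(cond_prob(prime, odd, space))   # 2/3

# The rearranged form: P(A and B) = P(A | B) x P(B)
lhs = prob(prime & odd, space)
rhs = cond_prob(prime, odd, space) * prob(odd, space)
print(lhs == rhs)                     # True
```

Working with sets keeps the reduced possibility space explicit: conditioning on 'odd' simply restricts attention to the outcomes 1, 3 and 5.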
(b) Given that 35% are male, full-time students
P(M ∩ Full) = 0.35
Also P(Full) = P(M ∩ Full) + P(F ∩ Full)
0.65 = 0.35 + P(F ∩ Full)
∴ P(F ∩ Full) = 0.30
P(F) = P(F ∩ Full) + P(F ∩ Part)
0.55 = 0.30 + P(F ∩ Part)
∴ P(Female and part-time) = 0.25
(c) P(Part, given F) = P(Part and F)/P(F)
= 0.25/0.55 = 0.45 (2 d.p.)
P(student chosen from female students is part-time) = 0.45

Example 3.12
X and Y are two events such that P(X | Y) = 0.4, P(Y) = 0.25 and P(X) = 0.2.
Find
(a) P(Y | X) (b) P(X ∩ Y) (c) P(X ∪ Y)

Solution 3.12
(a) P(Y | X) × P(X) = P(X | Y) × P(Y)
P(Y | X) × 0.2 = 0.4 × 0.25
P(Y | X) = 0.5
(b) P(X ∩ Y) = P(X | Y) × P(Y)
= 0.4 × 0.25
P(X ∩ Y) = 0.1
(c) P(X ∪ Y) = P(X) + P(Y) - P(X ∩ Y)
= 0.2 + 0.25 - 0.1
= 0.35
P(X ∪ Y) = 0.35

Solution 3.13
Events
M1: a girl takes module M1
M2: a girl takes module M2
You are given that P(M2 | M1) = 1/5, P(M1 | M2) = 1/3
Since each girl takes one or both, P(M1 ∪ M2) = 1
(a) Let P(M1 ∩ M2) = x
P(M2 | M1) = P(M2 ∩ M1)/P(M1)
1/5 = x/P(M1)
P(M1) = 5x
Also
P(M1 | M2) = P(M1 ∩ M2)/P(M2)
1/3 = x/P(M2)
P(M2) = 3x
P(M1 ∪ M2) = P(M1) + P(M2) - P(M1 ∩ M2)
But M1 and M2 are exhaustive events, so P(M1 ∪ M2) = 1
∴ 1 = 5x + 3x - x
x = 1/7
[Venn diagram: M1 only 4/7, M1 ∩ M2 1/7, M2 only 2/7]
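The algebra in Solution 3.13 can be reproduced with exact fractions. This sketch assumes the conditional probabilities are 1/5 and 1/3, the values implied by P(M1) = 5x and P(M2) = 3x; the variable names are illustrative.

```python
from fractions import Fraction

# Assumed from the working P(M1) = 5x and P(M2) = 3x.
p_m2_given_m1 = Fraction(1, 5)
p_m1_given_m2 = Fraction(1, 3)

# With x = P(M1 and M2):
#   P(M1) = x / P(M2 | M1)   and   P(M2) = x / P(M1 | M2).
# M1 and M2 are exhaustive, so
#   1 = P(M1) + P(M2) - x = x * (1/P(M2|M1) + 1/P(M1|M2) - 1).
x = 1 / (1 / p_m2_given_m1 + 1 / p_m1_given_m2 - 1)
p_m1 = x / p_m2_given_m1
p_m2 = x / p_m1_given_m2

print(x, p_m1, p_m2)         # 1/7 5/7 3/7
print(p_m1 - x, p_m2 - x)    # 4/7 2/7  (the 'only' regions of the Venn diagram)
print(p_m1 + p_m2 - x)       # 1
```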
On one throw, P(5) = 1/6
On two throws, P(5₁ and 5₂) = P(5₁) × P(5₂)   (independent events)
= 1/6 × 1/6
= 1/36
P(two fives are thrown) = 1/36

Example 3.15
In a group of 60 students, 20 study History, 24 study French and 8 study both History and French. Are the events 'a student studies History' and 'a student studies French' independent?

Solution 3.15
From the information given:
P(History) = 20/60 = 1/3, P(French) = 24/60 = 2/5, P(History and French) = 8/60 = 2/15
Now P(History) × P(French) = 1/3 × 2/5 = 2/15
So P(History and French) = P(History) × P(French)
The two events are independent.

Example 3.17
The events A and B are such that P(A | B) = 0.4, P(B | A) = 0.25, P(A ∩ B) = 0.12.
(a) Calculate the value of P(B).
(b) Give a reason why A and B are not independent.
(c) Calculate the value of P(A ∩ B'). (L)

Solution 3.17
(a) P(A | B) = P(A ∩ B)/P(B)
0.4 = 0.12/P(B)
∴ P(B) = 0.12/0.4 = 0.3
(b) P(B | A) = 0.25
≠ P(B)
A and B are not independent.
(c) P(A) = P(A ∩ B) + P(A ∩ B')
Also P(B | A) = P(B ∩ A)/P(A)
0.25 = 0.12/P(A)
P(A) = 0.48
So 0.48 = 0.12 + P(A ∩ B')
P(A ∩ B') = 0.36
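The test for independence used in Examples 3.15 and 3.17 is a single comparison, sketched below with exact fractions so that rounding cannot affect the equality check. The function name `are_independent` is illustrative.

```python
from fractions import Fraction

def are_independent(p_a, p_b, p_a_and_b):
    """A and B are independent exactly when P(A and B) = P(A) x P(B)."""
    return p_a_and_b == p_a * p_b

# Example 3.15: 60 students, 20 History, 24 French, 8 both.
print(are_independent(Fraction(20, 60), Fraction(24, 60), Fraction(8, 60)))  # True

# Example 3.17: P(A|B) = 0.4, P(B|A) = 0.25, P(A and B) = 0.12.
p_ab = Fraction(12, 100)
p_b = p_ab / Fraction(40, 100)    # P(B) = P(A and B) / P(A|B)
p_a = p_ab / Fraction(25, 100)    # P(A) = P(A and B) / P(B|A)
print(p_b, p_a)                          # 3/10 12/25
print(are_independent(p_a, p_b, p_ab))   # False
print(p_a - p_ab)                        # 9/25, i.e. P(A and B') = 0.36
```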
(c) P(A | B) = P(A ∩ B)/P(B)
= 0.1/0.35
= 0.286 (3 d.p.)
It can be shown that if A and B are independent, then A' and B' are also independent.
For this value of x, find
It is important not to confuse the terms 'mutually exclusive' and 'independent'.
Mutually exclusive events are events that cannot happen together. They are usually the outcomes of one experiment.
Independent events are events that can happen simultaneously or can be seen to happen one after the other.
These three results are particularly useful. Learn them.
For mutually exclusive events: P(A and B) = 0
For independent events: P(A and B) = P(A) × P(B)
For any events A and B: P(A and B) = P(A | B) × P(B)

Example 3.20
The probability that a certain type of machine will break down in the first month of operation is 0.1. If a firm has two such machines which are installed at the same time, find the probability that, at the end of the first month, just one has broken down.
Assume that the performances of the two machines are independent.

Solution 3.20
P(just one has broken down) = 0.1 × 0.9 + 0.9 × 0.1 = 0.18
The probability that after one month just one machine has broken down is 0.18.

Example 3.22
The three events E1, E2 and E3 are defined in the same sample space. The events E1 and E3 are mutually exclusive. The events E1 and E2 are independent.
Given that P(E1) = 2/5, P(E3) = 1/3 and P(E1 ∪ E2) = ~, find
(a) P(E1 ∪ E3), (b) P(E2).

Solution 3.22
(a) Since E1 and E3 are mutually exclusive,
P(E1 ∪ E3) = P(E1) + P(E3)
= 2/5 + 1/3
= 11/15

(a) Find the probabilities of the following events:
Event A: the number showing on the red die will be a 5 or a 6,
Event B: the total of the numbers showing on the two dice will be 7,
Event C: the total of the numbers showing on the two dice will be 8.
(b) State, with a reason, which two of the events A, B and C are mutually exclusive.
(c) Show that the events A and B are independent. (NEAB)

Solution 3.23
(a) [Possibility space diagram: score on red die (1 to 6) on the horizontal axis against the score on the other die (1 to 6), with the outcomes in events A, B and C marked]
There are 36 equally likely outcomes, so n(S) = 36
n(A) = 12 ∴ P(A) = 12/36 = 1/3
n(B) = 6 ∴ P(B) = 6/36 = 1/6
n(C) = 5 ∴ P(C) = 5/36
(b) It is not possible to score 7 and 8 with one throw of the dice, so events B and C do not overlap.
Events B and C are mutually exclusive.
(c) There are two ways to score 7 with the red die showing 5 or 6. These are (5, 2) and (6, 1).
So n(A and B) = 2 and P(A and B) = 2/36 = 1/18
But P(A) × P(B) = 1/3 × 1/6 = 1/18
So P(A and B) = P(A) × P(B)
Events A and B are independent.
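The possibility space in Solution 3.23 is small enough to enumerate. The sketch below lists all 36 outcomes as (red score, other score) pairs; treating the first coordinate as the red die is a labelling choice made here for illustration.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes: (score on red die, score on other die).
space = set(product(range(1, 7), repeat=2))

def prob(event):
    """P(event) for an event given as a set of outcomes in `space`."""
    return Fraction(len(event), len(space))

A = {o for o in space if o[0] in (5, 6)}   # red die shows 5 or 6
B = {o for o in space if sum(o) == 7}      # total is 7
C = {o for o in space if sum(o) == 8}      # total is 8

print(prob(A), prob(B), prob(C))           # 1/3 1/6 5/36
print(B & C == set())                      # True: B and C are mutually exclusive
print(sorted(A & B))                       # [(5, 2), (6, 1)]
print(prob(A & B) == prob(A) * prob(B))    # True: A and B are independent
```

Note that B and C are mutually exclusive but not independent, since P(B and C) = 0 while P(B) × P(C) is not zero.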
16. Two events A and Bare such that (c) Given also that P(C U D) = !. find the
P(A) ~ fs, P(B) ~ J, P(A IB) ~ t. valueofp. (0)
Exercise 3c Combined events Calculate the probabilities that
(a) Copy and complete the table. (a) both events occur, 19. Events A and Bare such that P(A) = 0.4 and
1. A number is picked at randombfro~ the dli~itf f {b) only one of the two events occurs, P(B) = 0.25. If A and Bare independent events,
1, 2, ... , 9. Given that the num er ts a ~u ttp eo Boys Girls (c) neither event occurs. (NEAB) find
3 find the probability that the number IS (a) P(A n B), (b) P(A n B'), (c) P(A' n B').
(~)even, (b) a multiple of 4. 16 8 17. All the answers to this question should be given
Passed driving test
6 either as fractions in their lowest terms or as 20. Two tetrahedral dice, with faces labelled 1, 2, 3
l. In a large group of people it is knohwn thath10%d Taken driving test, but failed decimals correct to three significant figures. and 4, are thrown and the number on which
have a hot breakfast, 20% have a ot 1unc ~· an Learning, but not yet taken a
{a) A man draws one card at random from a each lands is noted. The score is the sum of these
lS% have a hot breakfast or a hot lunch. Fmd
driving test complete pack of 52 playing cards, replaces two numbers. Find the probability that (a) the
the probability that a person chosen at random
T 00 young to take a driving test it and then draws another card at random score is even, given that at least one die lands on
from this group a three, (b) at least one die lands on a three,
{a) has a hot breakfast and a hot lunch, from the pack.
given that the score is even.
(b) has a hot lunch, given that the person Use your table to find the probability that Calculate the probability that
chosen had a hot breakfast. (L) (b) a student chosen at random has failed a (i) both cards are clubs, 21. Events C and Dare such that P(C) = 1.
J. If events A and B are such that they are . driving test, d .. (ii) exactly one of the cards is a Queen, J,
P(C n D') ~ P(CI D)~ f,.
(c) a girl chosen at random has taken a nvmg
independent and P(A) ~ 0.3, P(B) ~ 0.5, fmd (iii) the two cards are identical. Find (a) P(C n D), (b) P(D), (c) P(D I C).
(a) P(A n B), (b) P(A U B). test,
(d) a boy chosen at random has not yet taken a (b) On another occasion the man draws
Are events A and B mutually exclusive? simultaneously two cards at random from 22. Two athletes, A and B, are attempting to qualify
driving test,
(e) 2 students, chosen at random, are both too the pack of cards. for an international competition in both the
4 _ If P(A 1B) ~ }, P(B) ~ t, P(A) ~ \, find young to take a driving test, Calculate the probability that
5000 m and 10 000 m races. The probabilities of
(a) P(B I A), (b) P(A n B). (f) a boy and a girl, each chosen at random, each qualifying arc shown in the following table.
(i) exactly one of the cards is a Queen,
have both passed their driving test. (C)
5. A die is thrown twice. Find the probability of (ii) the two cards are identical. (C)
Given that two events, A and B, are such Athlete 5000 m 10 000 m
obtaining a number less than three on both
11. (a ) that I'(A and B)~ P(A) x p (B) , state w ha t
throws. 18. (a) The probability that an event A occurs is A 3 1
you can say about the events A and B. 5 4
P(A) = 0.4. B is an event independent of A
6. Events A and B are such that P(A) = ~, If event A is 'obtaining a 6 on a s~n~le throw and the probability of the union of A and B B 2
J t
P(AIB) ~ ~, P(B) ~ t· of a die', suggest a possible descnptton for is P(A u B)~ 0.7.
Find (a) P(B I A), (b) P(A n B). Assuming that the probabilities are independent,
event B. FindP(B).
calculate the probability that
7. A card is picked at random from a pac~ of ~0 (b) Given that two events, C and D, are (b) C and Dare two events such that
cards numbered 1, 2, 3, ... , 20. Given t at t ~. such that I'(C or D)~ P(C) + I'(D), state P(DIC) ~ tandP(CID) ~ j. (a) athlete A will qualify for both races,
card shows an even number, find the probabihty what you can say about the two events (b) exactly one of the athletes qualifies for the
Given that P{C n D)= p, express in terms 5000 m race,
that it is a multiple of 4. Cand D.
of p (c) both athletes qualify only for the 10 000 m
Write down the value of P(C and D). (C)
8. In a group of 100 people, 40 own a ~at, 25 own (i) P(C), (ii) P(D). race. (C)
a dog and 15 own a cat and a dog. Fmd the
probability that a person chosen at random 12. The probability that a person in a particular
evening class is left-handed is !:. From a class of
(a) owns a dog or a cat,
(b) owns a dog or a cat, but not both, 15 women and 5 men a person is chosen, ~t
random. Assuming that 'left-handedness ts
(c) owns a dog, given that he owns a cat,
(d) does not own a cat, given that he owns a dog.
independent of the sex of a person,_ find the . PROBABILITY TREES
probability that the person chosen IS a man or ts
9. A card is picked from a pack codntaidning 52 d left-handed.
A useful way of tackling many probability problems is to draw a probability tree. The method
playing cards. It is then replace. _an a secon
card is picked. Find the probabthty that 13. A and B are exhaustive events and it is known is illustrated in the following example.
(a) both cards are the seven of diamonds, t
that P(A I B) ~ and P(B) ~ 1-
Find I'(A).
{b) the first card is a heart and the second a
spade, h . 14. A bag contains four red counters and six black Example 3.24
(c) one card is from a black suit and the ot er ts counters. A counter is picked at random .from the
from a red suit,
bag and not replaced. A.s~cond counter ts then In a certain selection of flower seeds ~have been treated to improve germination and t have
(d) at least one card is a Queen. picked. Find the probabthty that . been left untreated. The seeds which have been treated have a probability of germination of
(a) the second counter is red, given that the first
0.8, whereas the untreated seeds have a probability of germination of 0.5.
10. A student investigating success in drivin~ tests counter is red,
gathered information from 60 students m her (b) both counters are red, (a) Find the probability that a seed, selected at random, will germinate.
school. Of these students, 25 were girls and 35 (c) the counters are of different colours.
were boys. She found that 37 o~ the st.udent~ had The seeds were sown and given time to germinate.
already taken a driving test, whtlst 5, mcludmg 3 15. A and B are two independent events such that
girls, were too young to take a driving test. ~f P(A) ~ 0.2 and P(B) ~ 0.15. . .. (b) Find the probability that a seed selected at random had been treated, given that it had
the 37 who had taken a test, 16 boys ~nd 8 ?Irls Evaluate the following probabiltties.
germinated. (L)
had passed their test. The remainder, mcludmg (a) P(A 1 B), (b) P(A n B), (c) P(A U B). (L)
6 girls, had failed their test.
Solution 3.24
Events
T: seed is treated          P(T) = 2/3, P(T') = 1/3
G: seed germinates          P(G | T) = 0.8, P(G | T') = 0.5
P(T and G) = P(T) × P(G | T) = 2/3 × 0.8
(a) P(G) = P(T ∩ G) + P(T' ∩ G)
= 2/3 × 0.8 + 1/3 × 0.5
= 0.7
(b) P(T, given G) = P(T and G)/P(G)
= (2/3 × 0.8)/0.7
= 0.762 (3 d.p.)
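The two steps of Solution 3.24, adding the end results of the tree and then dividing to reverse the condition, can be sketched as two small functions. The values P(T) = 2/3 and P(T') = 1/3 are assumed here because they are the ones consistent with P(G) = 0.7 in the solution; the function names are illustrative.

```python
from fractions import Fraction

def total_probability(priors, likelihoods):
    """Sum of P(branch) x P(outcome | branch) over all first-stage branches."""
    return sum(priors[k] * likelihoods[k] for k in priors)

def posterior(priors, likelihoods, key):
    """P(branch `key` | outcome): one end result divided by the total."""
    return priors[key] * likelihoods[key] / total_probability(priors, likelihoods)

# Assumed values: P(T) = 2/3, P(T') = 1/3; given: P(G|T) = 0.8, P(G|T') = 0.5.
priors = {"T": Fraction(2, 3), "T'": Fraction(1, 3)}
germinates = {"T": Fraction(8, 10), "T'": Fraction(5, 10)}

p_g = total_probability(priors, germinates)
p_t_given_g = posterior(priors, germinates, "T")

print(p_g)                           # 7/10
print(p_t_given_g)                   # 16/21
print(round(float(p_t_given_g), 3))  # 0.762
```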
0.8 + 0.2 = 1
(iii) Check that all the end results add up to 1.
(iv) To answer any questions find the relevant end results. If more than one satisfy the requirements, add these end results together.
In practice you would usually label your tree more simply as follows.
[Tree diagram: first branches 'Treated or not', second branches 'Germinates or not', with end results P(T ∩ G) = 2/3 × 0.8 (marked *) and P(T' ∩ G') = 1/3 × 0.5]
[Tree diagram: first pen, second pen, with end results P(D ∩ D') = 0.025 × 0.975 and P(D' ∩ D) = 0.975 × 0.025]

Example 3.26
Events X and Y are such that P(X') = 3/5, P(Y | X') = 1/3, P(Y' | X) = 1/4.
By drawing a tree diagram, find
(a) P(Y) (b) P(X' | Y)
Solution 3.26
Draw a tree diagram, showing event X followed by event Y, and write in all the given probabilities. Then work out the missing probabilities using the fact that probabilities on all the branches from a point add up to 1.
[Tree diagram: event X or X' followed by event Y or Y']
P(X ∩ Y) = 2/5 × 3/4 = 3/10
(a) P(Y) = P(X ∩ Y) + P(X' ∩ Y)
= 3/10 + 1/5
= 1/2
(b) P(X' | Y) × P(Y) = P(Y | X') × P(X')
P(X' | Y) × 1/2 = 1/3 × 3/5
P(X' | Y) = 2/5
Alternatively
P(X' | Y) = P(X' ∩ Y)/P(Y)
= (3/5 × 1/3)/(1/2)
= 2/5

Example 3.27
When a person needs a minicab, it is hired from one of three firms, X, Y and Z. Of the hirings 40% are from X, 50% are from Y and 10% are from Z. For cabs hired from X, 9% arrive late, the corresponding percentages for cabs hired from firms Y and Z being 6% and 20% respectively. Calculate the probability that the next cab hired
(a) will be from X and will not arrive late,
(b) will arrive late.
Given that a call is made for a minicab and that it arrives late, find, to three decimal places, the probability that it came from Y. (L)

Solution 3.27
Events                  Probabilities
X: cab is from X        P(X) = 0.4
Y: cab is from Y        P(Y) = 0.5
Z: cab is from Z        P(Z) = 0.1
L: cab is late          P(L | X) = 0.09, P(L | Y) = 0.06, P(L | Z) = 0.2
[Tree diagram: firm X, Y or Z followed by L or L', with end results]
P(X ∩ L) = 0.4 × 0.09 = 0.036
P(X ∩ L') = 0.4 × 0.91 = 0.364 *
P(Y ∩ L) = 0.5 × 0.06 = 0.03
P(Y ∩ L') = 0.5 × 0.94 = 0.47
P(Z ∩ L) = 0.1 × 0.2 = 0.02
P(Z ∩ L') = 0.1 × 0.8 = 0.08
(a) P(from X and not late) = P(X ∩ L') = 0.364   (marked * on diagram)
(b) P(arrives late) = P(X and late) + P(Y and late) + P(Z and late)
= P(X ∩ L) + P(Y ∩ L) + P(Z ∩ L)   (shaded in diagram)
= 0.036 + 0.03 + 0.02
= 0.086
The possibility space is now reduced to the outcomes when the cab arrives late, where P(L) = 0.086 (part b).
P(from Y given it was late) = P(Y and late)/P(late)
i.e. P(Y | L) = P(Y ∩ L)/P(L)
= 0.03/0.086   (marked ✓ in diagram)
= 0.349 (3 d.p.)

BAYES' THEOREM
P(Y | L) is easy to find from the tree diagram once you realise that the sample space has been reduced to the outcomes in which L occurs. This is a useful method when you want to 'reverse the conditions', as in Example 3.27, when you know P(L | Y) and you wanted P(Y | L).
It is interesting to write out the full formulae used:
P(Y and L) = P(L | Y) × P(Y)
also P(Y and L) = P(Y | L) × P(L)
so P(Y | L) × P(L) = P(L | Y) × P(Y)
P(Y | L) = P(L | Y) × P(Y)/P(L)
But P(L) = P(X ∩ L) + P(Y ∩ L) + P(Z ∩ L)
= P(L | X) × P(X) + P(L | Y) × P(Y) + P(L | Z) × P(Z)
so P(Y | L) = P(L | Y) × P(Y)/[P(L | X) × P(X) + P(L | Y) × P(Y) + P(L | Z) × P(Z)]
The formula has been included here for reference. It is however easier to work from the format
P(Y | L) = P(Y ∩ L)/P(L)
especially when you have a tree diagram to illustrate the situation!

Example 3.28
A computer program generates random questions in arithmetic that children have to answer within a fixed time. The probability of the first question being answered correctly is 0.8. Whenever a question is answered correctly, the next question generated is more difficult, and the probability of a correct answer being given is reduced by 0.1. Whenever a question is answered wrongly, the next question is of the same standard, and the probability of a correct answer being given remains unchanged. The following tree diagram shows this information for the first two questions generated.
[Tree diagram: 1st question followed by 2nd question, each with branches Correct and Wrong]
(a) Find the probability that the second question is answered correctly.
(b) By extending the tree diagram, or otherwise, find the probability that the second question is answered correctly given that the third question is answered correctly. (C)

Solution 3.28
(a) [Tree diagram: 1st and 2nd questions, with end results]
P(C1 ∩ C2) = 0.8 × 0.7 = 0.56
P(C1 ∩ W2) = 0.8 × 0.3 = 0.24
(b) [Tree diagram extended to the 3rd question, with end results]
P(C1 ∩ C2 ∩ C3) = 0.8 × 0.7 × 0.6 = 0.336 *
P(C1 ∩ W2 ∩ C3) = 0.8 × 0.3 × 0.7 = 0.168
P(W1 ∩ C2 ∩ C3) = 0.2 × 0.8 × 0.7 = 0.112 *
P(W1 ∩ W2 ∩ C3) = 0.2 × 0.2 × 0.8 = 0.032
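The extended tree of Example 3.28 can be generated rather than drawn. The sketch below is one possible way to do this, not the book's method: it walks every path of three answers, multiplies along the branches, and then applies the format P(C2 | C3) = P(C2 ∩ C3)/P(C3). The function name and its parameters are illustrative.

```python
from fractions import Fraction
from itertools import product

def path_probability(path, p_first=Fraction(8, 10), drop=Fraction(1, 10)):
    """Probability of a sequence such as ('C', 'W', 'C') of correct/wrong answers."""
    p_correct = p_first
    prob = Fraction(1)
    for outcome in path:
        if outcome == "C":
            prob *= p_correct
            p_correct -= drop        # next question is more difficult
        else:
            prob *= 1 - p_correct    # next question is of the same standard
    return prob

# End results for all 8 paths through the three-question tree.
paths = {path: path_probability(path) for path in product("CW", repeat=3)}

p_c2 = sum(p for path, p in paths.items() if path[1] == "C")
p_c3 = sum(p for path, p in paths.items() if path[2] == "C")
p_c2_and_c3 = sum(p for path, p in paths.items()
                  if path[1] == "C" and path[2] == "C")

print(sum(paths.values()))    # 1  (the end results add up to 1)
print(p_c2)                   # 18/25, i.e. 0.72
print(p_c3)                   # 81/125, i.e. 0.648
print(p_c2_and_c3 / p_c3)     # 56/81, about 0.691
```

The two starred end results in the solution, 0.336 and 0.112, are exactly the paths counted in `p_c2_and_c3`.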
13. A team needs to win at least two of its remaining subsequent game in the series, Alec's probability
Exercise 3d Tree diagrams three games to secure the championship. The of winning the game is 0.7 if he won tl1e
probabilities that the team will win the games are preceding game but only 0.5 if he lost the
Draw a tree diagram to show all the po~~i?le assessed to be 0.6, 0.7 and 0.8, respectively. preceding game. A game cannot be drawn. Find
Section A total scores and their respective probabiiltles Calculate the probability, based on these assessed the probability that Alec will win the third game
1. The probability that I am late for work is ?·05. after a player has completed two rounds. values, that the team will secure the in the next series he plays with Bill. (NEAB)
Find the probability that, on two consecutiVe Find the probability that a player has (a) a score championship. (C)
mornings, (a) I am late for work twice, (b) I am of 4 after two rounds (b) an odd number score 18. Three men, A, Band C agree to meet at the
late for work once. after two rounds. ' (L Additional) 14. In the game of tennis a player has two serves. theatre. The man A cannot remember whether
If the first serve is successful the game continues. they agreed to meet at the Palace or the Queen's
2. A mother and her daughter both enter the cake 7. The probability that I have to wait at the traffic and tosses a coin to decide which theatre to go
competition at a show. The probability that the lights on my way to school is 0.25.
If the first serve is not successful the player serves
to. The man B also tosses a coin to decide
!
mother wins a prize is and the probability that again. If this second service is successful the
Find the probability that, on two consecutive . game continues. between the Queen's and the Royalty. The man
her daughter wins a prize is ~· mornings, I have to wait on at least one mornmg. C tosses a coin to decide whether to go to the
Assuming that the two events are independent, If both serves are unsuccessful the player has Palace or not and in this latter case he tosses
find the probability that 8. A die is thrown three times. What is the served a 'double fault' and loses the point. again to decide between the Queen's and the
(a) either the mother, or the daughter, but not probability of scoring a two on just one occasion? Gabriella plays tennis. She is successful with 60% Royalty. Find the probability that
both, wins a prize, of her first serves and 95% of her second serves. (a) A and B meet,
(b) at least one of them wins a prize. 9. A coin is tossed four times. Find the probability
(a) Calculate the probability that Gabriella (b) Band C meet,
of obtaining less than two heads.
3. In a restaurant 40% of the customers choose serves a double fault. (c) A, Band Call meet,
steak for their main course. If a customer (d) A, Band Call go to different places,
10. Two golfers, Smith and Jones, are attempting to If Gabriella is successful with her first serve she
chooses steak, the probability that he will choose qualify for a golf championship. I.t i~ est.imated has a probability of 0. 75 of winning the point. (e) at least two meet. (C)
ice cream to follow is 0.6. If he does not have that the probability of Jones quahfymg 1s 0.8, If she is successful with her second serve she has
steak, the probability that he will choose ice and that the probability of both Smith and Jones Section B
a probability of 0.5 of winning the point.
cream is 0.3. Find the probability that a qualifying is 0.6. Given that the probability of 1. I travel to work by route A or route B. The
customer picked at random will choose Smith qualifying and the probability of Jon~s. (b) Calculate the probability that Gabriella wins probability that I choose route A is !. The
(a) steak and ice cream, qualifying are independent, find the probabthty the point. (MEG) probability that I am late for work if I go via
(b) ice cream. that only one of them will qualify. (C)
15. In a group of 12 international referees there are route A is j- and the corresponding probability if
three from Africa, four from Asia and five from I go via route B is!.
4. A box contains six red pens and three blue pens. 11. Whether or not Jonathan gets up in time for (a) What is the probability that I am late for
(a) A pen is selected at random, the colour is school depends on whether he remembers to set Europe. To officiate at a tournament, three
referees arc chosen at random from the group. work on Monday?
noted and the pen is returned to the box. his alarm clock the evening before. (b) Given that I am late for work, what is the
This procedure is performed a second, then a Calculate the probability that
For 85% of the time he remembers to set the probability that I went via route B?
third time. Find the probability of obtaining clock; the other 15% of the time he forgets. (a) a referee is chosen from each continent,
(i) three red pens, (b) exactly two referees arc chosen from Asia, 2. A box contains 20 chocolates, of which 15 have
(ii) two red pens and one blue pen, in any If the clock is set, he gets up in time for school (c) the three referees are chosen from the same soft centres and five have hard centres. Two
order, on 90% of the occasions. continent. (C) chocolates are taken at random, one after the
(iii) more than one blue pen. If the clock is not set, he does not get up in time other. Calculate the probability that
(b) Repeat (a) but this time find the probabilities for school on 60% of the occasions. 16. A bag contains seven black and three white
(a) both chocolates have soft centres,
if, at each selection, the pen is not returned to marbles. Three marbles are chosen at random
On what proportion of the occasions does he get (b) one of each sort of chocolate is taken,
the box. and in succession, each marble being replaced
up in time for school? (NEAB) (c) both chocolates have hard centres, given that
after it has been taken out of the bag.
the second chocolate has a hard centre. (C)
5. Mass-produced glass bricks are inspected ~or 12. In a game, a steel ball is dropped onto a set of Draw a tree diagram to show all possible
defects. The probability that a brick has a1r nails arranged in three levels as shown. selections. 3. (a) Explain in words the meaning of the symbol
bubbles is 0.002. If a brick has air bubbles the P(A IB) where A and B are two events. State
probability that it is also cracked is 0.5 while the When a ball hits a nail, the probability of it From your diagram, or otherwise, calculate, to
the relationship between A and B when
moving right or left before reaching the next two significant figures, the probability of
probability that a brick free of air bubbles is (i) P(A IB)~ 0, (ii) P(A IB)~ P(A).
cracked is 0.005. What is the probability that a level is!-. choosing
(b) When a car owner needs her car serviced she
brick chosen at random is cracked? The (a) three black marbles, phones one of three garages, A, B, or C. Of
,Q
probability that a brick is discoloured is 0.006. (b) a white marble, a black marble and a white her phone calls to them, 30% arc to garage
Given that discolouration occurs independently marble in that order, A, 10% to B and 60% to C.
of the other two defects, find the probability that (c) two white marbles and a black marble in
The percentages of occasions when the
a brick chosen at random has no defects. (0 &C) any order,
garage phoned can take the car in on the
r1Y11Y~
(d) at least one black marble.
6. In each round of a certain game a player can day of phoning arc 20% for A, 6% forB
State an event from this experiment which and 9% for C.
score 1 2 or 3 only. Copy and complete the table together with the event described in (d) would be
which ~haws the scores and two of the respective both exhaustive and mutually exclusive. (L)
Find the probability that the garage phoned
probabilities of these being scored in a single w L.J...J L.J...J L.J will not be able to take the car in on the day
c of phoning.
round. 17. Alec and Bill frequently play each other in a series
Calculate the probability of a ball of games of table tennis. Records of the outcomes Given that the car owner phones a garage
Score 1 2 3 (a) reaching A of these games indicate that whenever they play a and the garage can take her car in on that
(b) reaching B, series of games, Alec has the probability 0.6 of day, find the probability that she phoned
Probability (c) dropping into slot C. (NEAB) winning the first game and that in every garage B. (L)
8. Of a group of pupils studying at A-level in defined as follows: from the disease is denoted by B.
4. A shop stocks tinned cat food of two makes, A and B, and two sizes, large and small.
Of the stock, 70% is of brand A, 30% is of brand B.
Of the tins of brand A, 30% are small size whilst of the tins of brand B, 40% are small size.
Using a tree diagram, or otherwise, find the probability that
(a) a tin chosen at random from the stock will be of small size,
(b) a small tin chosen at random from the stock will be of brand A. (L)

5. A die is known to be biased in such a way that, when it is thrown, the probability of a six showing is [fraction illegible]. This biased die and an ordinary fair die are thrown. Find the probability that
(a) the fair die shows a six and the biased die does not show a six,
(b) at least one of the two dice shows a six,
(c) exactly one of the two dice shows a six, given that at least one of them shows a six. (C)

6. A golfer observes that, when playing a particular hole at his local course, he hits a straight drive on 80% of the occasions when the weather is not windy but only on 30% of the occasions when the weather is windy. Local records suggest that the weather is windy on 55% of all days.
(a) Show that the probability that, on a randomly chosen day, the golfer will hit a straight drive at the hole is 0.525.
(b) Given that he fails to hit a straight drive at the hole, calculate the probability that the weather is windy. (NEAB)

7. In my bookcase there are four shelves and the number of books on each shelf is as shown in the table:

          Hardback   Paperback
Shelf 1      11          9
Shelf 2       8         12
Shelf 3      16          4
Shelf 4       9          3

(a) If I choose a book at random, irrespective of its position in the bookcase, what is the probability that it is a paperback?
(b) I am equally likely to choose any shelf. I choose a shelf at random and then choose a book. (i) What is the probability that it is a hardback? (ii) If the book chosen is a hardback, what is the probability that it is from shelf 3?

8. ... schools in a certain area, 56% are boys and 44% are girls. The probability that a boy of this group is studying Chemistry is [fraction illegible] and the probability that a girl of this group is studying Chemistry is [fraction illegible].
(a) Find the probability that a pupil selected at random from this group is a girl studying Chemistry.
(b) Find the probability that a pupil selected at random from this group is not studying Chemistry.
(c) Find the probability that a Chemistry pupil selected at random from this group is male.
(You may leave your answers as fractions in their lowest terms.) (O&C)

9. Explain, by suitably defining events A and B, what is meant by 'the probability of A occurring given that B has occurred'.
A local greengrocer sells conventionally grown and organically grown vegetables. Conventionally grown vegetables constitute 80% of his sales; carrots constitute 12% of the conventional sales and 30% of the organic sales. Display this information in an appropriately and accurately labelled tree diagram.
One day a customer emerges from the shop and is questioned about her purchases. What is the probability that she bought
(a) conventionally grown carrots,
(b) carrots?
Given that she did buy carrots, what is the probability that they were organically grown? What assumptions have you made in answering this question? (O)

10. In a simple model of the weather in October, each day is classified as either fine or rainy. The probability that a fine day is followed by a fine day is 0.8. The probability that a rainy day is followed by a fine day is 0.4. The probability that 1 October is fine is 0.75.
(a) Find the probability that 2 October is fine and the probability that 3 October is fine.
(b) Find the conditional probability that 3 October is rainy, given that 1 October is fine.
(c) Find the conditional probability that 1 October is fine, given that 3 October is rainy. (C)

11. At the ninth hole on a certain golf course there is a pond. A golfer hits a grade B ball into the pond. Including the golfer's ball there are then six grade C, ten grade B and four grade A balls in the pond. The golfer uses a fishing net and 'catches' four balls. The events X, Y and Z are
X: the catch consists of two grade A balls and two grade C balls
Y: the catch consists of two grade B balls and two other balls
Z: the catch includes the golfer's own ball
Assuming that the catch is a random selection from the balls in the pond, determine
(a) P(X), (b) P(Y), (c) P(Z), (d) P(Z | Y).
For each of the pairs X and Y, Y and Z, state, with a brief reason, whether the two events are (i) mutually exclusive, (ii) independent. (C)

12. [In this question, give your answers in decimal form, correct to three significant figures.]
A choir has seven sopranos, six altos, three tenors and four basses. The sopranos and altos are women and the tenors and basses are men. At a particular rehearsal, three members of the choir are chosen at random to make the tea.
(a) Find the probability that all three tenors are chosen.
(b) Find the probability that exactly one bass is chosen.
(c) Find the conditional probability that two women are chosen, given that exactly one bass is chosen.
(d) Find the probability that the chosen group contains exactly one tenor or exactly one bass (or both). (C)

13. Vehicles approaching a crossroads must go in one of three directions: left, right or straight on. Observations by traffic engineers showed that of vehicles approaching from the north, 45% turn left, 20% turn right and 35% go straight on. Assuming that the driver of each vehicle chooses direction independently, what is the probability that of the next three vehicles approaching from the north
(a) all go straight on,
(b) all go in the same direction,
(c) two turn left and one turns right,
(d) all go in different directions,
(e) exactly two turn left?
Given that three consecutive vehicles all go in the same direction, what is the probability that they all turned left? (AEB)

14. During an epidemic of a certain disease a doctor is consulted by 110 people suffering from symptoms commonly associated with the disease. Of the 110 people, 45 are female of whom 20 actually have the disease and 25 do not. Fifteen males have the disease and the rest do not.
(a) A person is selected at random. The event that this person is female is denoted by A and the event that this person is suffering from the disease is denoted by B. Evaluate (i) P(A), (ii) $P(A \cup B)$, (iii) $P(A \cap B)$, (iv) P(A | B).
(b) If three different people are selected at random without replacement, what is the probability of (i) all three having the disease, (ii) exactly one of the three having the disease, (iii) one of the three being a female with the disease, one a male with the disease and one a female without the disease?
(c) Of people with the disease 96% react positively to a test for diagnosing the disease as do 8% of people without the disease. What is the probability of a person selected at random (i) reacting positively, (ii) having the disease given that he or she reacted positively? (AEB)

15. In an experiment two bags A and B, containing red and green marbles, are used. Bag A contains four red marbles and one green marble and bag B contains two red marbles and seven green marbles. An unbiased coin is tossed. If a head turns up, a marble is drawn at random from bag A while if a tail turns up, a marble is drawn at random from bag B. Calculate the probability that a red marble is drawn in a single trial. Given that a red marble is selected, calculate the probability that when the coin was tossed a head was obtained. (L)

16. In a computer game played by a single player, the player has to find, within a fixed time, the path through a maze shown on the computer screen. On the first occasion that a particular player plays the game, the computer shows a simple maze, and the probability that the player succeeds in finding the path in the time allowed is [fraction illegible]. On subsequent occasions, the maze shown depends on the result of the previous game. If the player succeeded on the previous occasion, the next maze is harder, and the probability that the player succeeds is one half of the probability of success on the previous occasion. If the player failed on the previous occasion, a simple maze is shown and the probability of the player succeeding is again [fraction illegible].
The player plays three games.
(a) Show that the probability that the player succeeds in all three games is [fraction illegible].
(b) Find the probability that the player succeeds in exactly one of the games.
(c) Find the probability that the player does not have two consecutive successes.
(d) Find the conditional probability that the player has two consecutive successes given that the player has exactly two successes. (C)

17. A sailing competition between two boats, A and B, consists of a series of independent races, the competition being won by the first boat to win three races. Every race is won by either A or B,
and their respective probabilities of winning are influenced by the weather. In rough weather the probability that A will win is 0.9; in fine weather the probability that A will win is 0.4. For each race the weather is either rough or fine, the probability of rough weather being 0.2. Show that the probability that A will win the first race is 0.5.
Given that the first race was won by A, determine the conditional probability that
(a) the weather for the first race was rough,
(b) A will win the competition. (C)

(a) Problems involving an 'at least' situation

Example 3.29
(a) Find the probability of obtaining at least one six when five dice are thrown.
(b) Find the probability of obtaining at least one six when n dice are thrown.
(c) How many dice must be thrown so that the probability of obtaining at least one six is at least 0.99?

Solution 3.29
(a) In one throw $P(6) = \frac{1}{6}$ and $P(\text{not } 6) = \frac{5}{6}$.
When five dice are thrown,
$P(\text{at least one six}) = 1 - P(\text{no sixes}) = 1 - \left(\frac{5}{6}\right)^5 = 0.598$ (3 d.p.)
(b) When n dice are thrown,
$P(\text{at least one six}) = 1 - \left(\frac{5}{6}\right)^n$
(c) You need to find n such that
$1 - \left(\frac{5}{6}\right)^n \ge 0.99$, i.e. $\left(\frac{5}{6}\right)^n \le 0.01$
You could do this by trial and improvement:
$\left(\frac{5}{6}\right)^{20} = 0.026\ldots > 0.01$
$\left(\frac{5}{6}\right)^{25} = 0.0104\ldots > 0.01$
$\left(\frac{5}{6}\right)^{26} = 0.0087\ldots < 0.01$
So the least value of n is 26.
$\therefore$ 26 dice must be thrown.
Alternatively, take logarithms of both sides: $n \log\left(\frac{5}{6}\right) \le \log(0.01)$.
Divide both sides by $\log\left(\frac{5}{6}\right)$. Since $\log\left(\frac{5}{6}\right)$ is negative, this will reverse the inequality sign.
$n \ge \frac{\log(0.01)}{\log\left(\frac{5}{6}\right)}$
$n \ge 25.3\ldots$
The least value of n is 26.

Many probability examples involve the use of GPs and the following formula is required.
If $S_\infty = a + ar + ar^2 + ar^3 + \ldots$ (to infinity), then
$S_\infty = \frac{a}{1-r}$ for $|r| < 1$, where a is the first term and r is the common ratio.

Example 3.30
Joe and Pete play a game in which they each throw a die in turn until someone throws a six. The person who throws the six wins the game. Joe starts the game. Find the probability that he wins.

Solution 3.30
Joe will win the game if he wins on his first go, or on his second go, or on his third go, and so on.
$P(\text{Joe wins on his first go}) = \frac{1}{6}$
$P(\text{Joe wins on his second go}) = P(\text{Joe doesn't throw a six, Pete doesn't throw a six, then Joe throws a six}) = \frac{5}{6} \times \frac{5}{6} \times \frac{1}{6} = \left(\frac{5}{6}\right)^2 \times \frac{1}{6}$
$P(\text{Joe wins on his third go}) = \frac{5}{6} \times \frac{5}{6} \times \frac{5}{6} \times \frac{5}{6} \times \frac{1}{6} = \left(\frac{5}{6}\right)^4 \times \frac{1}{6}$, and so on.
$P(\text{Joe wins}) = \frac{1}{6} + \left(\frac{5}{6}\right)^2 \times \frac{1}{6} + \left(\frac{5}{6}\right)^4 \times \frac{1}{6} + \ldots = \frac{1}{6}\left(1 + \left(\frac{5}{6}\right)^2 + \left(\frac{5}{6}\right)^4 + \ldots\right)$
Now $1 + \left(\frac{5}{6}\right)^2 + \left(\frac{5}{6}\right)^4 + \ldots$ is the sum of an infinite GP with $a = 1$, $r = \left(\frac{5}{6}\right)^2 = \frac{25}{36}$.
$S_\infty = \frac{a}{1-r} = \frac{1}{1 - \frac{25}{36}} = \frac{36}{11}$
$\therefore P(\text{Joe wins}) = \frac{1}{6} \times \frac{36}{11} = \frac{6}{11}$
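The arithmetic in Solutions 3.29 and 3.30 can be checked numerically. The following is a minimal sketch rather than part of the original text; the function names `p_at_least_one_six` and `least_n` are illustrative choices.

```python
from fractions import Fraction
import math

def p_at_least_one_six(n):
    """P(at least one six in n throws) = 1 - (5/6)^n, kept as an exact fraction."""
    return 1 - Fraction(5, 6) ** n

def least_n(target):
    """Smallest n with P(at least one six) >= target, found by trial and improvement."""
    n = 1
    while p_at_least_one_six(n) < target:
        n += 1
    return n

# Example 3.29 (a) and (c)
p5 = p_at_least_one_six(5)
n_trial = least_n(Fraction(99, 100))
# The logarithm method: n >= log(0.01) / log(5/6), rounded up to a whole die
n_logs = math.ceil(math.log(0.01) / math.log(5 / 6))

# Example 3.30: P(Joe wins) is a GP with first term a = 1/6 and ratio r = (5/6)^2
a = Fraction(1, 6)
r = Fraction(5, 6) ** 2
p_joe = a / (1 - r)

print(round(float(p5), 3), n_trial, n_logs, p_joe)
```

Using `Fraction` keeps the GP sum exact, so the result can be compared directly with the fraction obtained by hand.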
Exercise 3e Useful methods

1. A coin is biased so that the probability that it falls showing tails is 0.75.
(a) Find the probability of obtaining at least one head when the coin is tossed five times.
(b) How many times must the coin be tossed so that the probability of obtaining at least one head is greater than 0.98?

2. A missile is fired at a target and the probability that the target is hit is 0.7.
(a) Find how many missiles should be fired so that the probability that the target is hit at least once is greater than 0.995.
(b) Find how many missiles should be fired so that the probability that the target is not hit is less than 0.001.

3. A die is biased so that the probability of obtaining a three is p. When the die is thrown four times the probability that there is at least one three is 0.9375. Find the value of p.
How many times should the die be thrown so that the probability that there are no threes is less than 0.03?

4. On a safe there are four alarms which are arranged so that any one will sound when someone tries to break into the safe. If the probability that each alarm will function properly is 0.85, find the probability that at least one alarm will sound when someone tries to break into the safe.

5. For a certain strain of wallflower, the probability that, when sown, a seed produces a plant with yellow flowers is [fraction illegible]. Find the minimum number of seeds that should be sown in order that the probability of obtaining at least one plant with yellow flowers is greater than 0.98. (L)

6. Two people, A and B, play a game. An ordinary die is thrown and the first person to throw a four wins. A and B take it in turns to throw the die, starting with A. Find the probability that B wins.

7. A, B, C and D throw a coin, in turn, starting with A. The first to throw a head wins. The game can continue indefinitely until a head is thrown. However, D objects because the others have their first turn before him.
Compare the probability that D wins with the probability that A wins.

8. A box contains five black balls and one white ball. Alan and Bill take turns to draw a ball from the box, starting with Alan. The first boy to draw the white ball wins the game.
Assuming that they do not replace the balls as they draw them out, find the probability that Bill wins the game.
If the game is changed, so that, in the new game, they replace each ball after it has been drawn out, find the probabilities that:
(a) Alan wins at his first attempt;
(b) Alan wins at his second attempt;
(c) Alan wins at his third attempt.
Show that these answers are terms in a Geometric Progression. Hence find the probability that Alan wins the new game.

9. Two archers A and B shoot alternately at a target until one of them hits the centre of the target and is declared the winner. Independently, A and B have probabilities of [fraction illegible] and [fraction illegible], respectively, of hitting the centre of the target on each occasion they shoot.
(a) Given that A shoots first, find (i) the probability that A wins on his second shot, (ii) the probability that A wins on his third shot, (iii) the probability that A wins.
(b) Given that the archers toss a fair coin to determine who shoots first, find the probability that A wins. (NEAB)

ARRANGEMENTS
In order to calculate the number of possible outcomes in a possibility space or an event, the following results are often used.

Result 1
The number of ways of arranging n unlike objects in a line is $n!$
NOTE: $n! = n \times (n-1) \times (n-2) \times \ldots \times 3 \times 2 \times 1$.

For example, consider the letters A, B, C, D.
The first letter can be chosen in four ways (either A or B or C or D),
the second letter can be chosen in three ways,
the third letter can be chosen in two ways,
the fourth letter can be chosen in only one way.
Therefore the number of ways of arranging the four letters is $4 \times 3 \times 2 \times 1 = 4! = 24$.
On a calculator, use the factorial key. (You may have to use the SHIFT key.)
The arrangements are
ABCD ABDC ACBD ACDB ADCB ADBC
BCDA BCAD BDAC BDCA BACD BADC
CDBA CDAB CABD CADB CBAD CBDA
DABC DACB DBCA DBAC DCAB DCBA

Example 3.31
A witness reported that a car seen speeding away from the scene of the crime had a number plate that began with V or W, the digits were 4, 7 and 8 and the end letters were A, C, E. He could not however remember the order of the digits or the end letters. How many cars would need to be checked to be sure of including the suspect car?

Solution 3.31
There are 3! ways of arranging the digits 4, 7, 8 and 3! ways of arranging the letters A, C, E.
There are two choices for the initial letter.
The total number of different plates $= 2 \times 3! \times 3! = 72$
72 cars would need to be checked.

Result 2
The number of ways of arranging in a line n objects, of which p are alike, is $\frac{n!}{p!}$

If instead of the letters A, B, C, D you have the letters A, A, A, D then the 24 arrangements listed previously reduce to the following:
AAAD AADA ADAA DAAA
So the number of ways of arranging the four objects, of which three are alike $= \frac{4!}{3!} = \frac{4 \times 3 \times 2 \times 1}{3 \times 2 \times 1} = 4$.
On a calculator, evaluate 4! ÷ 3!.

The result can be extended as follows:
The number of ways of arranging in a line n objects of which p of one type are alike, q of a second type are alike, r of a third type are alike, and so on, is $\frac{n!}{p!\,q!\,r!\ldots}$
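The counting results for arrangements in a line can be confirmed by listing. This is a minimal sketch, not part of the original text; the helper names `arrangements` and `distinct_by_listing` are illustrative.

```python
from collections import Counter
from itertools import permutations
from math import factorial

def arrangements(word):
    """n! / (p! q! r! ...) for a string whose repeated letters count as alike."""
    total = factorial(len(word))
    for count in Counter(word).values():
        total //= factorial(count)
    return total

def distinct_by_listing(word):
    """Count distinct orderings by brute force (only sensible for short words)."""
    return len(set(permutations(word)))

print(arrangements("ABCD"), distinct_by_listing("ABCD"))   # four unlike letters
print(arrangements("AAAD"), distinct_by_listing("AAAD"))   # three alike

# Example 3.31: two initial letters, 3! digit orders, 3! end-letter orders
plates = 2 * factorial(3) * factorial(3)
print(plates)
```

The formula and the brute-force listing agree, which is a useful check when letters repeat.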
(a) Consider the word STATISTICS.
There are ten letters and S occurs three times,
T occurs three times,
I occurs twice.
Therefore number of ways $= \frac{10!}{3!\,3!\,2!} = 50\,400$
On a calculator, evaluate 10! ÷ 3! ÷ 3! ÷ 2!.

E' is the event 'the two youngest are not together'.
$P(E') = 1 - P(E) = 1 - 0.2 = 0.8$
The probability that the two youngest are separated is 0.8.

Example 3.34

(b) To find the number of ways with the vowels together, treat AEAI as one item, so that M, T, H, M, T, C, S and AEAI are 8 items.
Then number of arrangements $= \frac{8!}{2!\,2!} = 10\,080$
The vowels A, E, A, I, however, can be arranged in $\frac{4!}{2!} = 12$ ways,
so total number of arrangements $= 12 \times 10\,080 = 120\,960$
$P(\text{vowels together}) = \frac{120\,960}{4\,989\,600} = 0.024$ (2 s.f.)

Example 3.36
The six letters of the word LONDON are each written on a card and the six cards are then shuffled and placed in a line.
(a) Calculate the number of different arrangements.
(b) Find the probability that the middle two cards both have the letter N on them.
(c) Find the probability that the two cards with letter O are adjacent and the two cards with letter N are also adjacent.
The cards are shuffled again and placed in a line, face down. The first two cards in the line are turned over and reveal the letters L and O.
(d) Find the probability that when the other four cards are turned over the letters will spell LONDON. (L)

Solution 3.36
(a) Number of different arrangements of LONDON $= \frac{6!}{2! \times 2!} = 180$ (2 Os, 2 Ns)
(b) If the middle two letters are NN, then you need to find the number of different arrangements of LODO.
Number of arrangements $= \frac{4!}{2!} = 12$ (2 Os)
$P(\text{middle two letters are NN}) = \frac{12}{180} = \frac{1}{15}$
(c) If two Os and two Ns are adjacent then it is easier to think of each pair being glued together, so treat OO as one item and NN as one item.
Number of different arrangements of L, OO, NN, D $= 4! = 24$
$P(\text{two Os, two Ns are adjacent}) = \frac{24}{180} = \frac{2}{15}$
(d) If the first two cards are L, O then you need to find the number of different arrangements of N, D, O, N.
Number of arrangements $= \frac{4!}{2!} = 12$ (2 Ns)
It is quite easy to list these arrangements:
*NDON DNNO
NDNO DNON
NODN DONN
NOND ONND
NNDO ONDN
NNOD ODNN
Of course, only one of these, marked (*), will spell LONDON.
So $P(\text{L, O and four remaining letters spell LONDON}) = \frac{1}{12}$

Result 3
The number of ways of arranging n unlike objects in a ring when clockwise and anticlockwise arrangements are different is $(n-1)!$

For example, consider four people A, B, C and D, who are to be seated at a round table. The following four arrangements are the same, as A always has D on his immediate right and B on his immediate left.
[Figure: four equivalent seatings of A, B, C and D around a round table]
To find the number of different arrangements, fix A and then consider the number of ways of arranging B, C and D.
Therefore the number of different arrangements of four people around the table is 3!
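The answers to Example 3.36, and the round-table count in Result 3, can be checked by brute-force listing. This is a minimal sketch, not part of the original text; `adjacent` is an illustrative helper.

```python
from fractions import Fraction
from itertools import permutations

# (a) Distinct orderings of the six cards; a set removes duplicates from repeated letters
orderings = set(permutations("LONDON"))
total = len(orderings)

# (b) Middle two cards (positions 3 and 4) both show N
middle_nn = sum(1 for o in orderings if o[2] == "N" and o[3] == "N")
p_b = Fraction(middle_nn, total)

def adjacent(ordering, letter):
    """True if the two cards showing `letter` sit next to each other."""
    i = ordering.index(letter)            # position of the first copy
    return ordering[i + 1:i + 2] == (letter,)

# (c) The two Os are adjacent and the two Ns are adjacent
both_pairs = sum(1 for o in orderings if adjacent(o, "O") and adjacent(o, "N"))
p_c = Fraction(both_pairs, total)

# (d) Given the first two cards are L, O: orderings of the remaining N, D, O, N
rest = set(permutations("NDON"))
p_d = Fraction(1, len(rest))

# Result 3: seat A, B, C, D at a round table by fixing A and arranging the others
round_table = len([p for p in permutations("ABCD") if p[0] == "A"])

print(total, p_b, p_c, p_d, round_table)
```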
Result 4
The number of ways of arranging n unlike objects in a ring when clockwise and anticlockwise arrangements are the same, is $\frac{(n-1)!}{2}$

For example, if A, B, C and D are four different coloured beads which are threaded on a ring, then the following two arrangements are the same, since one is the other viewed from the other side.
[Figure: two rings of beads A, B, C, D, one the mirror image of the other]

Let E be the event 'the bulbs that do not grow are next to each other'.
Consider the two bulbs that do not grow as one item. They can be arranged in 2! ways.
There are now five items to be arranged in a ring and this can be done in 4! ways.
Therefore $n(E) = 2!\,4!$
So $P(E) = \frac{n(E)}{n(S)} = \frac{2!\,4!}{5!} = \frac{2}{5}$
The probability that the bulbs that do not grow are next to each other is $\frac{2}{5}$.

Example 3.38
One white, one blue, one red and two yellow beads are threaded on a ring to make a bracelet. Find the probability that the red and white beads are next to each other.

Solution 3.38
Let S be the possibility space.
If all the objects are unlike, the number of ways of arranging five beads on a ring is $\frac{4!}{2}$, but since there are two yellows, $n(S) = \frac{4!}{(2)(2!)} = 6$
Let E be the event 'the red and the white beads are next to each other'.
[Figure: the six possible bracelets; red and white can be arranged in 2! ways; clockwise and anticlockwise arrangements are the same]
NOTE: as expected, in three of the six arrangements the red and white beads are next to each other, so $P(E) = \frac{3}{6} = \frac{1}{2}$.
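The bracelet count in Solution 3.38 can be checked by listing every ring and treating rotations and reflections as the same arrangement. This is a minimal sketch, not part of the original text; the single-letter bead codes (W, B, R, Y) and the helper names are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def canonical(beads):
    """Pick one representative (the smallest) among all rotations and reflections."""
    n = len(beads)
    variants = []
    for seq in (beads, beads[::-1]):          # the ring and its mirror image
        for k in range(n):
            variants.append(seq[k:] + seq[:k])  # every rotation
    return min(variants)

def adjacent_on_ring(ring, x, y):
    """True if beads x and y are neighbours on the ring (x assumed unique)."""
    n = len(ring)
    i = ring.index(x)
    return ring[(i + 1) % n] == y or ring[(i - 1) % n] == y

# W = white, B = blue, R = red, Y = yellow (two of them)
bracelets = {canonical(p) for p in permutations("WBRYY")}
favourable = [b for b in bracelets if adjacent_on_ring(b, "R", "W")]
p_red_next_to_white = Fraction(len(favourable), len(bracelets))

# The bulbs calculation: 2! 4! / 5!
p_bulbs = Fraction(factorial(2) * factorial(4), factorial(5))

print(len(bracelets), len(favourable), p_red_next_to_white, p_bulbs)
```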
NOTE: Using the formula, ${}^nP_n = \frac{n!}{(n-n)!} = \frac{n!}{0!}$
But the number of ways of arranging n unlike objects is $n!$
So $0! = 1$
Try it on your calculator.

COMBINATIONS OF r OBJECTS FROM n OBJECTS
When considering the number of combinations of r objects from n objects, the order in which they are placed is not important.
For example, the one combination ABC gives rise to 3! permutations
ABC, ACB, BCA, BAC, CAB, CBA
Denoting the number of combinations of three letters from the seven letters A, B, C, D, E, F, G, by ${}^7C_3$ then
${}^7C_3 \times 3! = {}^7P_3$
${}^7C_3 = \frac{{}^7P_3}{3!} = \frac{7!}{3!\,4!} = 35$
On the calculator, ${}^7C_3$ can be obtained directly using the nCr key. (You may have to use the shift key.)

Solution 3.40
Let S be the possibility space, then $n(S) = {}^8C_4 = 70$
$P(E) = \frac{n(E)}{n(S)} = \frac{5}{70} = \frac{1}{14}$
The probability that the four letters chosen are consonants is $\frac{1}{14}$.

Example 3.41
A team of four is chosen at random from five girls and six boys.
(a) In how many ways can the team be chosen if
(i) there are no restrictions;
(ii) there must be more boys than girls?
(b) Find the probability that the team contains only one boy.
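The relation between combinations and permutations, and the counts asked for in Example 3.41, can be explored with Python's standard library (`math.comb` and `math.perm` need Python 3.8 or later). This is a minimal sketch; the values computed for Example 3.41 are the sketch's own working under the stated team sizes, since the book's solution is not reproduced in this section.

```python
from fractions import Fraction
from itertools import combinations
from math import comb, factorial, perm

# 7C3 x 3! = 7P3, so 7C3 = 7P3 / 3!
c73 = comb(7, 3)
assert c73 * factorial(3) == perm(7, 3)
# Listing the combinations of three letters from A..G gives the same count
listed = len(list(combinations("ABCDEFG", 3)))

# Example 3.41: a team of four from five girls and six boys
girls, boys, size = 5, 6, 4
no_restriction = comb(girls + boys, size)
# More boys than girls means 3 boys + 1 girl, or 4 boys + 0 girls
more_boys = sum(
    comb(boys, b) * comb(girls, size - b)
    for b in range(size + 1)
    if b > size - b
)
# Exactly one boy (and therefore three girls)
p_one_boy = Fraction(comb(boys, 1) * comb(girls, 3), no_restriction)

print(c73, listed, no_restriction, more_boys, p_one_boy)
```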
20. A competition has a first prize, a second prize, a third prize and a fourth prize. Ten competitors enter this competition and the prizes are awarded for the first, second, third and fourth competitors in order of merit.
(a) Find the number of different ways in which these prizes could be won.

25. How many even numbers can be formed with the digits 3, 4, 5, 6, 7 by using some or all of the numbers (repetitions are not allowed)?

26.

29. A committee consisting of six persons is to be selected from five women and six men.
(a) Calculate the number of ways in which the chosen committee will contain exactly two men.
(b) Given that the committee is to contain at least two men, show that it can be selected in 456 ways.
(ii) Given that the committee consists of three men and three women and that the men and women must sit alternately round the table, calculate in how many different ways they may be seated.
(L Additional)
• Conditional probability
$P(A \text{ given } B) = \frac{P(A \text{ and } B)}{P(B)}$, i.e. $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$
$P(A \text{ and } B) = P(A \mid B)P(B) = P(B \mid A)P(A)$

• For independent events A, B
$P(A \mid B) = P(A)$
$P(B \mid A) = P(B)$
$P(A \text{ and } B) = P(A) \times P(B)$ ('and' rule for independent events)

• Tree diagrams (Multiply along the branches)
$P(A \cap B) = P(A) \times P(B \mid A)$
$P(A \cap B') = P(A) \times P(B' \mid A)$
$P(A' \cap B) = P(A') \times P(B \mid A')$
$P(A' \cap B') = P(A') \times P(B' \mid A')$
$P(B) = P(A \cap B) + P(A' \cap B)$

• Arrangements, permutations and combinations
- The number of ways of arranging n unlike objects in a line: $n!$
- The number of ways of arranging in a line n objects of which p of one type are alike, q of another type are alike, r of a third type are alike, and so on: $\frac{n!}{p!\,q!\,r!\ldots}$
- The number of ways of arranging n unlike objects in a ring when clockwise and anticlockwise arrangements are different: $(n-1)!$

Miscellaneous worked examples

Example 3.45
A die is biased so that, when it is rolled, the probability of obtaining a score of 6 is $\frac{1}{4}$. The probabilities of obtaining each of the other five scores 1, 2, 3, 4, 5 are all equal. Calculate the probability of obtaining a score of five with this biased die.
(a) The biased die and an unbiased die are now rolled together. Calculate the probability that the total score is 11 or more.
(b) The two dice are rolled again. Given that the total score is 11 or more, calculate the probability that the score on the biased die is 6. (C)

Solution 3.45
Events
$6_B$: score 6 on biased die, $6_U$: score 6 on unbiased die
$5_B$: score 5 on biased die, $5_U$: score 5 on unbiased die
For the biased die, $P(6_B) = \frac{1}{4}$
$\therefore P(\text{score is 1, 2, 3, 4 or 5}) = \frac{3}{4}$
$P(5_B) = \frac{1}{5} \times \frac{3}{4} = \frac{3}{20}$
(a) $P(\text{11 or more}) = P(6_B 6_U)^* + P(6_B 5_U)^* + P(5_B 6_U)$
$= \frac{1}{4} \times \frac{1}{6} + \frac{1}{4} \times \frac{1}{6} + \frac{3}{20} \times \frac{1}{6}$
$= \frac{1}{6}\left[\frac{1}{4} + \frac{1}{4} + \frac{3}{20}\right]$
$= \frac{1}{6} \times \frac{13}{20}$
$= \frac{13}{120}$
(b) $P(6_B \mid \text{score is 11 or more}) = \frac{P(6_B \text{ and score is 11 or more})}{P(\text{score is 11 or more})}$ (the numerator is the sum of the terms marked * above)
$= \frac{\frac{1}{4} \times \frac{1}{6} + \frac{1}{4} \times \frac{1}{6}}{\frac{13}{120}} = \frac{\frac{1}{12}}{\frac{13}{120}} = \frac{10}{13}$
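Solution 3.45 can be checked by summing over every pair of scores. This is a minimal sketch, not part of the original text; the dictionary names `biased` and `fair` are illustrative.

```python
from fractions import Fraction

# Biased die: P(6) = 1/4, the other five scores share the remaining 3/4 equally
biased = {score: Fraction(3, 4) / 5 for score in range(1, 6)}
biased[6] = Fraction(1, 4)
# Unbiased die: every score has probability 1/6
fair = {score: Fraction(1, 6) for score in range(1, 7)}

# (a) P(total is 11 or more): add P(b) x P(f) over every qualifying pair
p_11_plus = sum(
    pb * pf
    for b, pb in biased.items()
    for f, pf in fair.items()
    if b + f >= 11
)

# (b) P(biased die shows 6 | total is 11 or more)
p_six_and_11_plus = sum(biased[6] * pf for f, pf in fair.items() if 6 + f >= 11)
p_six_given_11_plus = p_six_and_11_plus / p_11_plus

print(biased[5], p_11_plus, p_six_given_11_plus)
```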
(a) A record card from 1996 is selected at random. Let A represent the event that the dog referred to on the record card was female and B represent the event that the dog referred to was suffering from the disease.
Find
(i) P(A),
(ii) $P(A \cup B)$,
(iii) $P(A \cap B)$,
(iv) $P(A \mid B)$.
(b) If three different record cards are selected at random, without replacement, find the probability that
(i) all three record cards relate to dogs with the disease,
(ii) exactly one of the three record cards relates to a dog with the disease,
(iii) one record card relates to a female dog with the disease, one to a male dog with the disease and one to a female dog not suffering from the disease. (L)

Solution 3.46
Summarising the information in a table:

              Diseased (B)   Not Diseased (B')   Total
Female (A)         25               35             60
Male (A')          20               45             65
Total              45               80            125

(a) (i) $P(A) = \frac{60}{125} = 0.48$
(ii) $P(A \cup B) = \frac{25 + 35 + 20}{125} = \frac{80}{125} = 0.64$
(iii) $P(A \cap B) = \frac{25}{125} = 0.2$
(iv) $P(A \mid B) = \frac{25}{45} = \frac{5}{9}$
(b) (i) $P(BBB) = \frac{45}{125} \times \frac{44}{124} \times \frac{43}{123} = 0.045$ (2 s.f.)
(ii) Number of ways of arranging B, B', B' = 3
So $P(BB'B' \text{ in any order}) = 3 \times \frac{45}{125} \times \frac{80}{124} \times \frac{79}{123} = 0.45$ (2 s.f.)
(iii) Number of ways of arranging the cards = 3!
So P(female with disease, male with disease, female without disease)
$= 3! \times \frac{25}{125} \times \frac{20}{124} \times \frac{35}{123} = 0.055$ (2 s.f.)

Solution 3.47
(a) The first post can be allocated in 8 possible ways.
The second post can be allocated in 7 possible ways.
The third post can be allocated in 6 possible ways.
Number of allocations $= 8 \times 7 \times 6 = 336$
(b) Number of different sets of three officers $= {}^8C_3 = 56$
(c) If both the Browns are chosen, number of ways to choose third representative = 6
So $P(\text{both Browns are chosen}) = \frac{6}{56} = \frac{3}{28}$

Example 3.48
A factory has three machines A, B, C producing large numbers of a certain item. Of the total daily production of the item, 50% are produced on A, 30% on B and 20% on C. Records show that 2% of items produced on A are defective, 3% of items produced on B are defective and 4% of items produced on C are defective. The occurrence of a defective item is independent of all other items.
One item is chosen at random from a day's total output.
(a) Show that the probability of its being defective is 0.027.
(b) Given that it is defective, find the probability that it was produced on machine A. (W)

Solution 3.48
Events are defined as follows
A: Item produced on A, P(A) = 0.5, P(D, given A) = 0.02
B: Item produced on B, P(B) = 0.3, P(D, given B) = 0.03
C: Item produced on C, P(C) = 0.2, P(D, given C) = 0.04
D: Item is defective
[Tree diagram: first branches A, B, C; second branches D, D']
$P(D \cap A) = 0.02 \times 0.5 = 0.01$ *
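The probabilities in Solution 3.46 follow directly from the two-way table, and can be recomputed as exact fractions. This is a minimal sketch, not part of the original text; the dictionary keys and the helper `draw` are illustrative names.

```python
from fractions import Fraction

# Counts from the two-way table in Solution 3.46
table = {
    ("female", "diseased"): 25, ("female", "not diseased"): 35,
    ("male", "diseased"): 20,   ("male", "not diseased"): 45,
}
total = sum(table.values())
female = sum(v for (sex, _), v in table.items() if sex == "female")
diseased = sum(v for (_, state), v in table.items() if state == "diseased")
both = table[("female", "diseased")]

p_a = Fraction(female, total)
p_a_or_b = Fraction(female + diseased - both, total)
p_a_and_b = Fraction(both, total)
p_a_given_b = Fraction(both, diseased)

def draw(*counts):
    """P(drawing cards from groups of the given remaining sizes, in this order,
    without replacement); the caller supplies the count left in each group."""
    p = Fraction(1)
    for k, count in enumerate(counts):
        p *= Fraction(count, total - k)
    return p

p_all_diseased = draw(45, 44, 43)
p_exactly_one = 3 * draw(45, 80, 79)      # 3 orders of B, B', B'
p_mixed = 6 * draw(25, 20, 35)            # 3! orders of the three different cards

print(p_a, p_a_or_b, p_a_and_b, p_a_given_b)
print(float(p_all_diseased), float(p_exactly_one), float(p_mixed))
```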
(a) $P(D) = P(D \text{ and } A) + P(D \text{ and } B) + P(D \text{ and } C)$
$= 0.01 + 0.009 + 0.008$
$= 0.027$
(b) You already know that P(D given A) = 0.02, but now you need to 'reverse the conditions' to find P(A given D).
Use $P(A \text{ given } D) = \frac{P(A \text{ and } D)}{P(D)}$ (the numerator is marked * on the tree; the denominator was found in part (a))
$= \frac{0.01}{0.027}$
$= 0.370$ (3 d.p.)

Example 3.49
A house is infested with mice and to combat this the householder acquired four cats, Albert, Belinda, Khalid and Poon. The householder observes that only half of the creatures caught are mice. A fifth are voles and the rest are birds.
20% of the catches are made by Albert, 45% by Belinda, 10% by Khalid and 25% by Poon.
(a) The probability of a catch being a mouse, a bird or a vole is independent of whether or not it is made by Albert. What is the probability of a randomly selected catch being a

(ii) $P(\text{Bird not caught by Albert}) = P(B \cap A')$
$= P(B) \times P(A')$ (B and A' are independent)
$= 0.3 \times 0.8$
$= 0.24$
Before answering the next parts, it is useful to show all the given information on a tree diagram:
[Tree diagram: one branch for each cat, then branches for mouse (M), vole (V) and bird (B); the values marked on it are $P(M \cap A) = 0.1$, $P(M \cap L) = 0.15$, $P(M \cap K) = 0.05$, $P(M \cap N) = 0.2$]
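The 'reverse the conditions' step in Solution 3.48 is an application of the conditional probability formula, and the figures can be recomputed exactly. This is a minimal sketch, not part of the original text; the dictionary names `share` and `defect_rate` are illustrative.

```python
from fractions import Fraction

# Share of daily production and defect rate for each machine (Example 3.48)
share = {"A": Fraction(50, 100), "B": Fraction(30, 100), "C": Fraction(20, 100)}
defect_rate = {"A": Fraction(2, 100), "B": Fraction(3, 100), "C": Fraction(4, 100)}

# Multiply along each branch of the tree: P(D and machine)
p_d_and = {m: share[m] * defect_rate[m] for m in share}

# (a) Add the branches that end in D
p_d = sum(p_d_and.values())

# (b) Reverse the condition: P(A given D) = P(A and D) / P(D)
p_a_given_d = p_d_and["A"] / p_d

print(float(p_d), round(float(p_a_given_d), 3))
```

The same pattern (multiply along branches, add the relevant branches, then divide) applies to any two-stage tree diagram.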
Miscellaneous exercise 3g

1. Each time a table tennis player serves, the probability that she wins the point is 0.6, independently of the result of any preceding serves. At the start of a particular game, she serves for each of the first five points. Calculate the probability that, for the first two points of this game,
(a) she wins both points,
(b) she wins exactly one of these two points.
Calculate the probability that, for the first five points of this game,
(c) she loses all five points,
(d) she wins at least one of these five points. (C)

2. A director of a company is selected at random.
C denotes the event that the director's annual salary is more than £300 000.
C' denotes the event that the director's annual salary is not more than £300 000.
D denotes the event that the director's annual salary is less than £200 000.
E denotes the event that the director's annual salary is less than £350 000.
Write down two of the events C, C', D and E which are
(a) complementary,
(b) mutually exclusive but not exhaustive,
(c) exhaustive but not mutually exclusive. (AEB)

3. Newborn babies are routinely screened for a serious disease which affects only two per 1000 babies. The result of screening can be positive or negative. A positive result suggests that the baby has the disease, but the test is not perfect. If a baby has the disease, the probability that the result will be negative is 0.01. If the baby does not have the disease, the probability that the result will be positive is 0.02.
(a) Find the probability that a baby has the disease, given that the result of the test is positive.
(b) Comment on the value you obtain. (L)

4. A penalty shoot-out in a game of hockey requires each of two players to take a penalty hit to try to score a goal. In a simple model, each player has a probability of 0.8 of scoring a goal, and independence is assumed. Calculate the probability

5. Forty 17- and 18-year old students are the only people present at a party. The numbers of male and female students of each age are given in the following table.

          17-year old   18-year old
Male           9             13
Female         7             11

In the Grand Draw, each of the forty students has an equal chance of winning one of two prizes. The first prize is a gift token and the second prize is a box of chocolates. No student may win more than one prize. Find the probability that
(a) the gift token will be won by an 18-year old male student,
(b) both prizes will be won by female students,
(c) the box of chocolates will be won by a 17-year old student, given that the gift token is won by a 17-year old male student. (C)

6. Each customer at a supermarket pays by one of cash, cheque or credit card. The probability of a randomly selected customer paying by cash is 0.54 and by cheque is 0.18.
(a) Determine the probability of a randomly selected customer paying by credit card.
Three customers are selected at random.
(b) Find the probability of
(i) all three paying by cash,
(ii) exactly one paying by cheque,
(iii) one paying by cash, one by cheque and one by credit card.
The probability that the amount payable exceeds £30 is 0.26. If the amount payable does exceed £30, then the probability of it being paid by cheque is 0.28.
(c) Find the probability that a randomly selected customer pays more than £30 and pays by cheque.
(d) Hence find the probability that a randomly selected customer pays more than £30, given that the customer pays by cheque. (AEB)

7. A writer submits a poem for publication by a literary magazine. The poem will be accepted for [...]
The writer submits a different poem for each of three separate issues of the magazine. Given that the probabilities remain the same, calculate the probability that all three of her poems are accepted. (C)

8. At an art exhibition seven paintings are to be hung in a row along one wall. Find the number of possible arrangements.
Given that three paintings are by the same artist, find the number of arrangements in which
(a) these three paintings are hung side by side,
(b) any one of these three paintings is hung at the beginning of the row but neither of the other two is hung at the end of the row. (C)

9. A group of three pregnant women attend ante-natal classes together. Assuming that each woman is equally likely to give birth on each of the seven days in a week, find the probability that all three give birth
(a) on a Monday,
(b) on the same day of the week,
(c) on different days of the week,
(d) at a weekend (either a Saturday or Sunday).
(e) Find the probability of all three giving birth on the same day of the week given that they all give birth at a weekend.
(f) How large would the group need to be to make the probability of all the women in the group giving birth on different days of the week less than 0.05? (AEB)

10. The probability that for any married couple the husband has a degree is [fraction illegible] and the probability that the wife has a degree is [fraction illegible]. The probability that the husband has a degree, given that the wife has a degree, is [fraction illegible].
A married couple is chosen at random. Find the probability that
(a) both of them have degrees,
(b) only one of them has a degree,
(c) neither of them has a degree.
Two married couples are chosen at random.
(d) Find the probability that only one of the two husbands and only one of the two wives have a degree. (L)

11. A personal stereo system consists of a playing unit and a headphone unit. Each unit is tested for faults. If a unit is found to be faulty, an attempt is made to correct the fault and the unit is then retested. Any unit that is found to be faulty a [...]
(i) Calculate the probability that a randomly chosen playing unit is rejected.
(ii) Given that a playing unit is accepted, calculate the probability that a fault was found on the first test. Give your answer correct to three significant figures.
(b) The probability of a randomly chosen headphone unit being found to be faulty on the first test is 0.04. If a second test is needed, the probability of a headphone unit being found to be faulty on the second test is 0.02. Calculate the probability that a randomly chosen headphone unit is accepted. Give your answer correct to three significant figures.
(c) A randomly chosen playing unit that has been accepted and a randomly chosen headphone unit that has been accepted are combined to make a personal stereo system. Calculate the probability that at least one of the two units has been retested. Give your answer correct to three significant figures. (C)

12. A bag contains four red counters, three blue counters and three green counters. A counter is drawn at random from the bag and not replaced. A second counter is then drawn at random from the bag.
Assuming that at each stage each counter left in the bag has an equal chance of being drawn,
(a) find the probability, giving your answers as fractions in their lowest terms, that the second counter will be blue given that
(i) the first counter is red,
(ii) the first counter is blue,
(iii) the first counter is green.
(b) Find the probability, giving your answer as a fraction in its lowest terms, that the first counter will be red and the second counter will be blue.
(c) Find the probability, giving your answer as a fraction in its lowest terms, that the second counter will be blue regardless of the colour of the first counter. (C)

13. A particular firm has six vacancies to fill from 15 applicants. Calculate the number of ways in which these vacancies could be filled if there are no restrictions.
The firm decides that three of the six vacancies shall be filled by women and three by men. The applicants consist of seven women and eight men.
that exactly one goal is scored from the two hits. publication if it is approved by at least two of the Calculate the number of ways in which the six
three members of the editorial staff who second time is rejected.
In an alternative model, the probability of the vacancies could be filled under these conditions.
independently assess it. Given that the (a) The probability of a randomly chosen
second player scoring is reduced to 0. 7 if the first probabilities that the poem is approved by the playing unit being found to be faulty on the One of the seven women is the wife of one of the
player does not score. Calculate the probability three members are 0.9, 0.7 and 0.6 respectively, first test is 0.1. If a second test is needed, the eight men. Calculate the number of ways in
that the second player has scored, given that only find the probability that the poem is not probability of a playing unit being found to which three women and three men could fill the
one goal is scored. (C) accepted. be faulty on the second test is 0.05. six vacancies, given that both the wife and her
husband are among those appointed.
(b) In how many selections are there exactly three boys?
(c) What is the probability that exactly three boys are invited to the party?
In fact, there are three girls at the party, including Laura, and three boys, including Liam and John. For the party tea they sit round a circular table, equally spaced, with Laura sitting in the position shown in the diagram.
[Diagram: seats round a circular table, with Laura's position marked]
(d) In how many different ways can the other children fill the remaining seats?
With Laura sitting in her place, the other children take their seats at random.
(e) Find the probability that Laura sits next to Liam and John.
(f) Find the probability that boys and girls sit alternately. (MEI)

15. A draw is being made for the quarter-finals of a knock-out table tennis tournament. Eight counters, alike in every respect except that they are numbered from one to eight inclusive, are placed in a bag and drawn one by one, without replacement. A typical draw might produce the numbers in the order 3, 5, 7, 2, 1, 8, 6, 4, resulting in the matches:
Match A   3 plays 5
Match B   7 plays 2
Match C   1 plays 8
Match D   6 plays 4
(a) In how many different orders can the counters be drawn from the bag?
(b) In how many ways can the counters be drawn such that
(i) players 1 and 2 play each other in match A,
(ii) players 1 and 2 play each other.

indistinguishable so that, for example, the two diagrams in figure 1 represent the same domino.
[Fig. 1: two diagrams of the same domino]
A domino which has the same number of spots at each end, or no spots at all, is called a 'double'. A domino is drawn at random from the set. Figure 2 shows a sample space diagram to represent the complete set of outcomes, each of which is equally likely.
[Fig. 2: sample space diagram, both axes numbered 0 to 6]
Let the event A be 'the domino is a double', event B 'the total number of spots on the domino is six' and event C be 'at least one end of the domino has five spots'.
Figure 3 shows the sample space with the event A marked.
[Fig. 3: sample space diagram with event A marked]
(a) Write down the probability that event A occurs.
(b) Find the probability that either B or C or both occur.
(c) Determine whether or not events A and B are independent.
(d) Find the conditional probability P(A | C). Explain why events A and C are not independent.
After the first domino has been drawn, a second domino is chosen at random from the remainder.
(e) Find the probability that at least one end of the first domino has the same number of spots as at least one end of the second domino.
[HINT: Consider separately the cases where the first domino is a double and where it is not.] (MEI)

[Tree diagram: branches labelled 'Nicky' and 'Other', followed by branches labelled 'Sam' and 'Other']
By considering the above tree diagram, or otherwise,
(a) find the probability that both Nicky and Sam are chosen,
(b) find the probability that both Nicky and Sam are chosen, given that at least one of Nicky and Sam is chosen. (C)

2. A bag contains ten balls, of which four are red and six are blue. An experiment consists of drawing at random and without replacement three balls, one at a time, from the bag.
(a) Draw a tree diagram to show all the possible outcomes of the experiment.
Hence, or otherwise, find the probability that
(b) the first two balls drawn will be of different colours,
(c) the third ball will be red,
(d) the third ball will be red, given that the first two balls drawn were both blue. (L)

3. Ann, Barry and Clare are three students taking a multiple choice examination paper. For each question a student has to select the correct answer from five that are offered. For Question 1, Ann has no idea of the correct answer, Barry correctly identifies one answer that is wrong and Clare correctly identifies two wrong answers. All three students decide to guess at random from the answers they think stand a chance of being correct. Calculate the probability that
(a) none of the three students chooses the correct answer,
(b) Clare is the only one to choose the correct answer,
(c) exactly one of the three students chooses the correct answer. (NEAB)

affairs of an employee selected at random.
D is the event that a weekly paid employee is selected.
E is the event that an employee who received no pay rise is selected.
D' and E' are the events "not D" and "not E" respectively.
Find
(a) P(D),
(b) P(D ∪ E),
(c) P(D' ∩ E').
F is the event that an employee is female.
(d) Given that P(F') = 0.8, find the number of female employees.
(e) Interpret P(D | F) in the context of this question.
(f) Given that P(D ∩ F) = 0.1, find P(D | F). (AEB)

5. The captain of a darts team is trying to arrange an evening match for next Monday, Tuesday, Wednesday or Thursday. He hopes that the leading players, A, B, C and D, will all be free on one of these evenings. In fact each of the four players has arranged an engagement for exactly one of the four evenings.
Assuming that each player is equally likely to have chosen any one of the four evenings, and that their choices are independent, find the probability that
(a) A and B have both chosen Monday evening,
(b) either C or D (or both) has chosen Monday evening,
(c) the four players have chosen four different evenings,
(d) there will be at least one evening when all four players are free. (NEAB)
PROBABILITY DISTRIBUTIONS
A probability distribution gives the probability of each possible value of the variable.

Consider this situation:
By mistake, three faulty fuses are put into a box containing two good fuses. The faulty and good fuses become mixed up and are indistinguishable by sight. You take two fuses from the box. What is the probability that you take
(a) no faulty fuses,
(b) one faulty fuse,
(c) two faulty fuses?

It is possible to show the outcomes and probabilities on a tree diagram, where F denotes a faulty fuse and F' a good fuse:

Probability                          Outcome
P(F, F)   = 3/5 × 2/4 = 0.3          2 faulty fuses
P(F, F')  = 3/5 × 2/4 = 0.3          1 faulty fuse
P(F', F)  = 2/5 × 3/4 = 0.3          1 faulty fuse
P(F', F') = 2/5 × 1/4 = 0.1          0 faulty fuses

(a) P(no faulty fuses) = 0.1
(b) P(one faulty fuse) = 0.3 + 0.3 = 0.6
(c) P(two faulty fuses) = 0.3

The variable being considered here is 'the number of faulty fuses' and it can be denoted by X.
The values that X can take are 0, 1 or 2.
The probability that there are no faulty fuses, i.e. the probability that the variable X takes the value 0, can be written P(X = 0), so P(X = 0) = 0.1.
Similarly P(X = 1) = 0.6 and P(X = 2) = 0.3.
Sometimes these are written p0 = 0.1, p1 = 0.6, p2 = 0.3.
When defining variables, the variable is usually denoted by a capital letter (X, Y, R, etc) and a particular value that the variable takes by a small letter (x, y, r, etc), so that P(X = x) means 'the probability that the variable X takes the value x'.
The probability distribution for X can be summarised in a table and illustrated in a vertical line graph.

x           0     1     2
P(X = x)    0.1   0.6   0.3

[Vertical line graph of P(X = x) against x]

If the sum of the probabilities is 1, the variable is said to be random.
In this example P(X = 0) + P(X = 1) + P(X = 2) = 0.1 + 0.6 + 0.3 = 1, so X is a discrete random variable.

For a discrete random variable X, Σ P(X = x) = 1, where the sum is taken over all x.

The function that is responsible for allocating probabilities, P(X = x), is known as the probability density function of X, sometimes abbreviated to the p.d.f. of X. The probability density function can either list the probabilities individually or summarise them in a formula.

Example 4.1
Two tetrahedral dice, each with faces labelled 1, 2, 3 and 4, are thrown and the score noted, where the score is the sum of the two numbers on which the dice land. Find the probability density function (p.d.f.) of X, where X is the random variable 'the score when two dice are thrown'.

Solution 4.1
The score for each possible outcome is shown in the possibility space:

Second die
    4 |  5   6   7   8
    3 |  4   5   6   7
    2 |  3   4   5   6
    1 |  2   3   4   5
      +----------------
         1   2   3   4    First die

Since each outcome is equally likely, the probabilities can be found from the diagram.
For example, P(X = 5) = 4/16, since 4 out of the possible 16 outcomes result in a score of 5.
The probability distribution is formed:

x           2      3      4      5      6      7      8
P(X = x)    1/16   2/16   3/16   4/16   3/16   2/16   1/16

Notice the pattern for the probabilities relating to x from 2 to 5:
P(X = x) = (x - 1)/16   for x = 2, 3, 4, 5
For x from 6 to 8, there is a different pattern:
P(X = x) = (9 - x)/16   for x = 6, 7, 8
These two formulae give the p.d.f. of X.

Example 4.2
The p.d.f. of a discrete random variable Y is given by P(Y = y) = cy², for y = 0, 1, 2, 3, 4. Given that c is a constant, find the value of c.

Solution 4.2
It helps to write out the probability distribution of Y.

y           0    1    2     3     4
P(Y = y)    0    c    4c    9c    16c

Since Y is a random variable, Σ P(Y = y) = 1, i.e. the sum of all the probabilities is 1.
So c + 4c + 9c + 16c = 1
30c = 1
c = 1/30

Example 4.3
The discrete random variable W has probability distribution as shown.

w           -3     -2     -1     0      1
P(W = w)    0.1    0.25   0.3    0.15   d

Find
(a) the value of d, (b) P(-3 ≤ W < 0), (c) P(W > -1),
(d) P(-1 < W < 1), (e) the mode.

Solution 4.3
(a) Since Σ P(W = w) = 1, summing over all w,
0.1 + 0.25 + 0.3 + 0.15 + d = 1
0.8 + d = 1
d = 0.2
(b) P(-3 ≤ W < 0) = P(W = -3) + P(W = -2) + P(W = -1)
= 0.1 + 0.25 + 0.3
= 0.65
(c) P(W > -1) = P(W = 0) + P(W = 1)
= 0.15 + 0.2
= 0.35
(d) P(-1 < W < 1) = P(W = 0)
= 0.15
(e) The value of w with the highest probability is -1, so the mode = -1.

Exercise 4a Probability distributions

1. The discrete random variable X has the given probability distribution.

x           1     2      3     4    5
P(X = x)    0.2   0.25   0.4   a    0.05

(a) Find the value of a and draw a vertical line graph to illustrate the distribution.
(b) Find (i) P(1 ≤ X ≤ 3), (ii) P(X > 2), (iii) P(2 < X < 5), (iv) the mode.

2. The probability density function of a discrete random variable X is given by P(X = x) = kx for x = 12, 13, 14.
Write out the probability distribution and find the value of k.

3. The discrete random variable X can take values 3, 5, 6, 8 and 10 only. Given that p3 = 0.1, p5 = 0.05, p6 = 0.45 and p8 = 3p10, calculate p10.

4. X has probability distribution as shown in the table

x           1      2      3    4     5
P(X = x)    1/10   3/10   a    1/5   1/20

(a) Find the value of a.
(b) Find P(X ≥ 4).
(c) Find P(X < 1).
(d) Find P(2 ≤ X < 4).

5. Write out the probability distribution for each of these variables.
(a) The number of heads, X, obtained when two fair coins are tossed.
(b) The number of tails, X, obtained when three fair coins are tossed.

6. A drawer contains eight brown socks and four blue socks. A sock is taken from the drawer at random, its colour is noted and it is then replaced. This procedure is performed twice more. X is the random variable the number of brown socks taken. Find the probability distribution for X.

7. The discrete random variable R has p.d.f. P(R = r) = c(3 - r) for r = 0, 1, 2, 3.
(a) Find the value of the constant c.
(b) Draw a vertical line graph to illustrate the distribution.
(c) Find P(1 ≤ R < 3).

8. Write down the formula for the p.d.f. of X where X is the numerical value of a digit chosen from a set of random number tables.

9. A game consists of throwing tennis balls into a bucket from a given distance. The probability that William will get the tennis ball in the bucket is 0.4. A turn consists of three attempts.
(a) Construct the probability distribution for X, the number of tennis balls that land in the bucket in a turn.
(b) William wins a prize if, at the end of his turn, there are two or more tennis balls in the bucket. What is the probability that William does not win a prize?

10. Emma plays a game in which she throws two dice. If she gets two sixes, she wins 20p, if she gets one six she wins 10p, otherwise she wins nothing. She has to pay 5p to enter.
Write out the probability distribution of X, the amount Emma gains in one turn.

11. A student has a fair coin and two six-sided dice, one of which is white and the other blue. The student tosses the coin and then rolls both dice. Let X be a random variable such that if the coin falls heads, X is the sum of the scores on the two dice, otherwise X is the score on the white die only.
Find the probability function of X in the form of a table of possible values of X and their associated probabilities.
Find P(3 ≤ X ≤ 7).
State the assumption you made to enable you to evaluate the probability function. (AEB)

12. X can take values 5, 6, 7, 8 and 9. The vertical line graph to illustrate the distribution of X is incomplete. Given that P(X = 8) = 2P(X = 9), complete the line graph and describe the distribution.
[Incomplete vertical line graph of P(X = x) against x for x = 5, 6, 7, 8, 9; vertical axis marked 0.1 to 0.4]

EXPECTATION OF X, E(X)
E(X) is read as 'E of X' and it gives an average or typical value of X, known as the expected value or expectation of X. This is comparable with the mean in descriptive statistics.

Experimental approach
The frequency distribution shows the results when an unbiased die is thrown 120 times.

Score, x        1    2    3    4    5    6
Frequency, f    15   22   23   19   23   18    Total 120
The mean score, x̄ = Σfx / Σf
= (1 × 15 + 2 × 22 + 3 × 23 + 4 × 19 + 5 × 23 + 6 × 18) / 120
= 3.6 (2 s.f.)

You could write this out in a different way:
x̄ = 1 × 15/120 + 2 × 22/120 + 3 × 23/120 + 4 × 19/120 + 5 × 23/120 + 6 × 18/120

The fractions 15/120, 22/120, 23/120, 19/120, 23/120, 18/120 are the relative frequencies of the scores 1, 2, 3, 4, 5, 6 respectively.
Notice that they are close to 20/120 = 1/6.
If you throw the die a large number of times, you would expect each of these fractions to be closer to 1/6, the limiting value of the relative frequency of a particular score on the die.

NOTE: the symbol μ, pronounced 'mew', is often used for the expectation, where μ = E(X).

Example 4.4
A random variable X has probability distribution as shown. Find the expectation, E(X).

x           -2     -1     0      1     2
P(X = x)    0.3    0.1    0.15   0.4   0.05

Solution 4.4
E(X) = Σ x P(X = x), summed over all x
= -2 × 0.3 + (-1) × 0.1 + 0 × 0.15 + 1 × 0.4 + 2 × 0.05
= -0.2

The expected number of sixes when three dice are thrown is 0.5.
NOTE: in 50 throws you would expect 25 sixes. In practice you may not get 25 sixes. In 1000 throws though, you may get very close to 500 sixes. The expected value gives you the long-term average value.
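The experimental approach described in this section can be checked with a short calculation. The sketch below is illustrative only: the helper names `mean_from_frequencies` and `expectation` are not from the text, and the fair-die distribution P(X = x) = 1/6 is an assumption standing in for the 'limiting value of the relative frequency'. It computes the mean of the 120 observed scores and compares it with E(X) = Σ x P(X = x).

```python
from fractions import Fraction

# Observed frequencies for 120 throws of a die, as tabulated in the text.
scores = [1, 2, 3, 4, 5, 6]
frequencies = [15, 22, 23, 19, 23, 18]


def mean_from_frequencies(xs, fs):
    """Return the sample mean, sum(f * x) / sum(f), as an exact fraction."""
    return Fraction(sum(x * f for x, f in zip(xs, fs)), sum(fs))


def expectation(distribution):
    """Return E(X) = sum of x * P(X = x) for a dict {x: P(X = x)}."""
    if sum(distribution.values()) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in distribution.items())


sample_mean = mean_from_frequencies(scores, frequencies)

# Assumption: a fair die, so every score has probability 1/6.
fair_die = {x: Fraction(1, 6) for x in scores}
expected_score = expectation(fair_die)

print(sample_mean, round(float(sample_mean), 1))  # sample mean, 3.6 to 2 s.f.
print(expected_score)                             # long-run average score
```

The sample mean of about 3.6 is close to, but not equal to, the expectation of 3.5, which is what the long-term average interpretation suggests.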
[Vertical line graph of P(X = x) against x]

It can be seen from the table or from the vertical line graph that the distribution is symmetrical about the central value X = 3, so E(X) = 3.
Check: E(X) = Σ x P(X = x), summed over all x
= 1 × 0.1 + 2 × 0.2 + 3 × 0.4 + 4 × 0.2 + 5 × 0.1 = 3

(b) If X is the random variable 'the digit picked from a random number table', then the p.d.f. of X is P(X = x) = 0.1 for x = 0, 1, ..., 9.

x           0     1     2     3     4     5     6     7     8     9
P(X = x)    0.1   0.1   0.1   0.1   0.1   0.1   0.1   0.1   0.1   0.1

[Vertical line graph of P(X = x) against x for x = 0, 1, ..., 9]

The distribution is symmetrical about the central value mid-way between 4 and 5, so E(X) = 4.5.
NOTE: the random variable X with p.d.f. P(X = x) = k, for all possible values of x, where k is a constant, is said to follow a discrete uniform distribution.

Example 4.6
A fruit machine consists of three windows which operate independently. Each window shows pictures of fruits: lemons, apples, cherries or bananas. The probability that a window shows a particular fruit is as follows.
P(lemon) = 0.4
P(cherries) = 0.2
P(apple) = 0.1
P(banana) = 0.3

The prizes are as follows.
Three apples: you win £1
Three cherries: you win 50p
Three lemons: you win 40p
Two apples and one cherries (in any order): you win 80p

Find the expected gain or loss if you play a game.

Solution 4.6
The variable X is 'the amount gained, in pence, in a game'.
Taking into account the cost of 10p to play, X can take the values 90, 70, 40, 30, -10.
P(X = 90) = P(3 apples) = 0.1 × 0.1 × 0.1 = 0.001
P(X = 70) = P(2 apples and one with cherries, in any order)
= P(A, A, C) + P(A, C, A) + P(C, A, A)
= 3 × 0.1² × 0.2
= 0.006
P(X = 40) = P(3 cherries) = (0.2)³ = 0.008
P(X = 30) = P(3 lemons) = (0.4)³ = 0.064
P(X = -10) = P(you win none of these prizes)
= 1 - (0.001 + 0.006 + 0.008 + 0.064) = 0.921

The probability distribution for X is

x           90      70      40      30      -10
P(X = x)    0.001   0.006   0.008   0.064   0.921

E(X) = Σ x P(X = x), summed over all x
= 90 × 0.001 + 70 × 0.006 + 40 × 0.008 + 30 × 0.064 + (-10) × 0.921
= -6.46
The negative value means an expected loss of 6.46p per game.
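The fruit machine calculation can be cross-checked by listing all 4 × 4 × 4 window combinations. The following is a minimal sketch, not part of the original solution: the function name `prize` and the dictionary layout are illustrative choices, and the prizes are taken to be the gains quoted in Solution 4.6 plus the 10p cost of playing.

```python
from fractions import Fraction
from itertools import product

# Probability that a single window shows each fruit (from Example 4.6).
window = {
    "lemon": Fraction(4, 10),
    "apple": Fraction(1, 10),
    "cherries": Fraction(2, 10),
    "banana": Fraction(3, 10),
}
COST = 10  # pence paid to play one game


def prize(outcome):
    """Return the prize in pence for a tuple of three fruits."""
    if outcome.count("apple") == 3:
        return 100
    if outcome.count("apple") == 2 and outcome.count("cherries") == 1:
        return 80
    if outcome.count("cherries") == 3:
        return 50
    if outcome.count("lemon") == 3:
        return 40
    return 0


# Build the distribution of the gain X by enumerating every outcome.
gain_dist = {}
for outcome in product(window, repeat=3):
    p = window[outcome[0]] * window[outcome[1]] * window[outcome[2]]
    gain = prize(outcome) - COST
    gain_dist[gain] = gain_dist.get(gain, 0) + p

expected_gain = sum(g * p for g, p in gain_dist.items())
print({g: float(p) for g, p in sorted(gain_dist.items())})
print(float(expected_gain))  # expected gain in pence per game
```

Enumeration avoids having to count the 'in any order' arrangements by hand, at the cost of visiting all 64 outcomes.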
Solution 4.8
(i) Find the profit on these magazines in a week when he sells 11 copies. X -5 1
(ii) Construct a probability distribution table for the newsagent's weekly profit from the 19
sale of these magazines. Hence, or otherwise, calculate an estimate of his mean 27
1nake the game worthwhile, perhaps :!ggest ~~:;;. wm tf you get a one or a six is £2.11. To
The minim urn amount you should su est th · ·
a s1x appears.
To make the game worthwhile to yourself, what is the minimum amount that you would
suggest?
Find μ.

3. The probability distribution of a random variable X is as shown in the table:

x           1     2     3    4     5
P(X = x)    0.1   0.3   y    0.2   0.1

Find (a) the value of y, (b) E(X).

4. Find the expected number of heads when two fair coins are tossed.

5. A bag contains five black counters and six red counters. Two counters are drawn, one at a time, and not replaced. Let X be 'the number of red counters drawn'. Find E(X).

6. An unbiased tetrahedral die has faces marked 1, 2, 3, 4. If the die lands on the face marked 1, the player has to pay 10p. If it lands on a face marked with a 2 or a 4, the player wins 5p and if it lands on a 3, the player wins 3p. Find the expected gain in one throw.

7. A discrete random variable X can take values 10 and 20 only. If E(X) = 16, write out the probability distribution of X.

8. The discrete random variable X can take values 0, 1, 2 and 3 only. Given P(X ≤ 2) = 0.9, P(X ≤ 1) = 0.5 and E(X) = 1.4, find
(a) P(X = 1), (b) P(X = 0).

9.
x           0    1    2    3
P(X = x)    c    c²
The above table shows the probability distribution for a random variable X.
Calculate (a) c, (b) E(X). (L Additional)

10. A bag contains three red balls and one blue ball. A second bag contains one red ball and one blue ball. A ball is picked out of each bag and is then placed in the other bag. What is the expected number of red balls in the first bag?

Draw up a table showing all the possible scores and the probability of each.
If the player pays 10p for each game and receives back a number of pence equal to his score, calculate the player's expected gain or loss per 50 games. (C Additional)

12. In a game a player tosses three fair coins. He wins £10 if three heads occur, £x if two heads occur, £3 if one head occurs and £2 if no heads occur.
Express in terms of x his expected gain from each game.
Given that he pays £4.50 to play each game, calculate
(a) the value of x for which the game is fair,
(b) his expected gain or loss over 100 games if x = 4.90. (C Additional)

13. In an examination a candidate is given the four answers to four questions but is not told which answer applies to which question. He is asked to write down each of the four answers next to its appropriate question.
(a) Calculate in how many different ways he could write down the four answers.
(b) Explain why it is impossible for him to have just three answers in the correct places and show that there are six ways of getting just two answers in the correct places.
(c) If a candidate guesses at random where the four answers are to go and X is the number of correct guesses he makes, draw up the probability distribution for X in tabular form.
(d) Calculate E(X). (L Additional)

14. The discrete random variable X has p.d.f. P(X = x) = kx for x = 1, 2, 3, 4, 5 where k is a constant. Find E(X).

15. A woman has three keys on a ring, just one of which opens the front door. As she approaches the front door she selects one key after another at random without replacement. Draw a tree diagram to illustrate the various selections before she finds the correct key. Use this diagram to calculate the expected number of keys that she will use before opening the door. (L Additional)

E(1/X) = Σ (1/x) P(X = x)
E(X - 4) = Σ (x - 4) P(X = x)

Example 4.9
The random variable X has p.d.f. P(X = x) for x = 1, 2, 3 as shown.

x           1     2     3
P(X = x)    0.1   0.6   0.3

Calculate
(a) E(X), (b) E(3), (c) E(5X), (d) E(5X + 3).

Solution 4.9
(a) E(X) = Σ x P(X = x)
= 1 × 0.1 + 2 × 0.6 + 3 × 0.3
= 2.2
(b) E(3) = Σ 3 P(X = x)
= 3 × 0.1 + 3 × 0.6 + 3 × 0.3
= 3
Notice that the expected value of a constant is equal to the constant.
(c) E(5X) = Σ 5x P(X = x)
= 5 × 0.1 + 10 × 0.6 + 15 × 0.3
= 11
Notice that 5E(X) = 5 × 2.2 = 11
so E(5X) = 5E(X).
(d) E(5X + 3) = Σ (5x + 3) P(X = x)
= 8 × 0.1 + 13 × 0.6 + 18 × 0.3
= 14
Notice that E(5X) + E(3) = 11 + 3 = 14
so E(5X + 3) = E(5X) + E(3)
i.e. E(5X + 3) = 5E(X) + 3

Example 4.10
A six-sided die has faces marked with the numbers 1, 3, 5, 7, 9 and 11. It is biased so that the probability of obtaining the number R in a single roll of the die is proportional to R.
(a) Show that the probability distribution of R is given by
P(R = r) = r/36,   r = 1, 3, 5, 7, 9, 11.
(b) The die is to be rolled and a rectangle drawn with sides of lengths 6 cm and R cm. Calculate the expected value of the area of the rectangle.
(c) The die is to be rolled again and a square drawn with sides of length 24/R cm. Calculate the expected value of the perimeter of the square. (NEAB)

Solution 4.10
(a)
r           1    3     5     7     9     11
P(R = r)    k    3k    5k    7k    9k    11k

Since the probabilities sum to 1, 36k = 1, so k = 1/36 and P(R = r) = r/36.

(b) A = 6R
∴ E(A) = E(6R) = 6E(R)
E(R) = Σ r P(R = r)
= 1 × 1/36 + 3 × 3/36 + 5 × 5/36 + 7 × 7/36 + 9 × 9/36 + 11 × 11/36
= 7 17/18
E(A) = 6 × 7 17/18 = 47 2/3
The expected value of the area is 47 2/3 cm².

(c) P = 4 × 24/R = 96/R
∴ E(P) = 96 E(1/R)
E(1/R) = Σ (1/r) P(R = r)
= 1 × 1/36 + 1/3 × 3/36 + 1/5 × 5/36 + 1/7 × 7/36 + 1/9 × 9/36 + 1/11 × 11/36
= 1/6
E(P) = 96 × 1/6 = 16
The expected value of the perimeter is 16 cm.

Example 4.11
X is the number of heads obtained when two coins are tossed. Find
(a) the expected number of heads,
(b) E(X²),
(c) E(X² - X).
(c) E(X² - X) = Σ (x² - x) P(X = x)
= 0 × 1/4 + 0 × 1/2 + 2 × 1/4
= 1/2
Notice that E(X²) - E(X) = 1 1/2 - 1 = 1/2 and E(X² - X) = 1/2,
so E(X² - X) = E(X²) - E(X).

In general, for two functions of X, g(X) and h(X),
E(g(X) + h(X)) = E(g(X)) + E(h(X))

Theoretical approach
For a discrete random variable X, with E(X) = μ, the variance is defined as follows:
Var(X) = E(X - μ)² = Σ (x - μ)² P(X = x)
Alternatively, Var(X) = E(X - μ)²
= E(X² - 2μX + μ²)
= E(X²) - 2μE(X) + E(μ²)
= E(X²) - 2μ² + μ²
= E(X²) - μ²

Example 4.14
Two boxes each contain three cards. The first box contains cards labelled 1, 3 and 5; the second box contains cards labelled 2, 6 and 8. In a game, a player draws one card at random from each box and his score, X, is the sum of the numbers on the two cards.
(a) Obtain the six possible values of X and find the corresponding probabilities.
(b) Calculate E(X), E(X²) and the variance of X. (C Additional)

Solution 4.14
(a) Possibility space

                 Second box
                 2     6     8
First box   1    3     7     9
            3    5     9     11
            5    7     11    13

Probability distribution

x           3     5     7     9     11    13
P(X = x)    1/9   1/9   2/9   2/9   2/9   1/9

(b) E(X) = Σ x P(X = x)
= 3 × 1/9 + 5 × 1/9 + 7 × 2/9 + 9 × 2/9 + 11 × 2/9 + 13 × 1/9
= 8 1/3
E(X²) = Σ x² P(X = x)
= 9 × 1/9 + 25 × 1/9 + 49 × 2/9 + 81 × 2/9 + 121 × 2/9 + 169 × 1/9
= 78 1/3
Var(X) = E(X²) - [E(X)]²
= 78 1/3 - (8 1/3)²
= 8 8/9

The following results relating to variance are useful.

Exercise on variance

1. The discrete random variable X has p.d.f. P(X = x) for x = 1, 2, 3.

x           1     2     3
P(X = x)    0.2   0.3   0.5

Find (a) E(X), (b) E(X²), (c) Var(X).

2. The discrete random variable X has the probability distribution specified in the following table.

x           -1     0      1      2
P(X = x)    0.25   0.10   0.45   0.20

(a) Find P(-1 ≤ X < 1).
(b) Find E(2X + 3).

3. The discrete random variable X has p.d.f. P(X = 0) = 0.05, P(X = 1) = 0.45, P(X = 2) = 0.5. Find
(a) μ = E(X), (b) E(X²), (c) E(5X² + 2X - 3).

4. The discrete random variable X has p.d.f. P(X = x) = k for x = 1, 2, 3, 4, 5, 6. Find
(a) E(X), (b) E(X²),
(c) E(3X + 4), (d) Var(X).

5. The random variable X takes values 2, 4, 6, 8 and its probability distribution is represented in the vertical line graph.
[Vertical line graph of P(X = x) against x for x = 2, 4, 6, 8; vertical axis marked 0.1 to 0.5]
Find Var(X).

6. A roulette wheel is divided into six sectors of unequal area, marked with the numbers 1, 2, 3, 4, 5, and 6. The wheel is spun and X is the

7. The random variable X has p.d.f. P(X = x) as shown in the table:

x           -2    -1    0     1     c
P(X = x)    0.1   0.1   0.3   0.4   0.1

Find the value of c (a) if E(X) = 0.3, (b) if E(X²) = 1.8.

8. The discrete random variable X has probability function given by
p(x) = [illegible]   for x = 1, 2, 3, 4, 5,
p(x) = c             for x = 6,
p(x) = 0             otherwise,
where c is a constant.
Determine the value of c and hence the mode and mean of X. (L)

9. A game consists of tossing four unbiased coins simultaneously. The total score is calculated by giving three points for each head and one point for each tail. The random variable X represents the total score.
(a) Show that P(X = 8) = 3/8.
(b) Copy and complete the table, given below, for the symmetrical probability distribution of X.

x           4    6    8    10    12
P(X = x)

(c) Calculate the variance of X. (NEAB)

10. Find Var(X) for each of the following probability distributions:
(a)
x           -3    -2    0     2     3
P(X = x)    0.3   0.3   0.2   0.1   0.1

(b)
x           1    3    5    7    9
P(X = x)    [illegible]
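The arithmetic in Solution 4.14 can be verified by enumerating the nine equally likely pairs of cards. This is a sketch under that equal-likelihood assumption; the helper `expectation` and the variable names are illustrative and do not come from the text. It also confirms that the definition Var(X) = E(X - μ)² and the alternative form E(X²) - μ² agree.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

first_box = [1, 3, 5]
second_box = [2, 6, 8]

# Count how often each total occurs among the 9 equally likely pairs.
counts = Counter(a + b for a, b in product(first_box, second_box))
n_outcomes = len(first_box) * len(second_box)
dist = {x: Fraction(c, n_outcomes) for x, c in sorted(counts.items())}


def expectation(distribution, g=lambda x: x):
    """Return E(g(X)) = sum of g(x) * P(X = x)."""
    return sum(g(x) * p for x, p in distribution.items())


mean = expectation(dist)                          # E(X)
mean_of_squares = expectation(dist, lambda x: x * x)  # E(X^2)
variance = mean_of_squares - mean ** 2            # E(X^2) - mu^2
variance_from_definition = expectation(dist, lambda x: (x - mean) ** 2)

print(dist)
print(mean, mean_of_squares, variance)
```

Using exact fractions keeps the results in the same form as the worked solution (25/3, 235/3 and 80/9, i.e. 8 1/3, 78 1/3 and 8 8/9).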
14. If X is the random variable 'the sum of the scores on two tetrahedral dice', where the score is the number on which the die lands, find (a) E(X), (b) Var(X), (c) Var(2X), (d) Var(2X + 3).

15. The discrete random variable X has probability distribution as shown in the table. Find Var(2X + 3).

x           10    20    30
P(X = x)    0.1   0.6   0.3

16. Two discs are drawn, without replacement, from a box containing three red discs and four white discs. The discs are drawn at random. If X is the random variable 'the number of red discs drawn', find (a) the expected number of red discs, (b) the standard deviation of X.

17. Ten identically shaped discs are in a bag; two of them are black, the rest white. Discs are drawn at random from the bag in turn and not replaced. Let X be the number of discs drawn up to and including the first black one.
List the values of X and the associated theoretical probabilities.
Calculate the mean value of X and its standard deviation. What is the most likely value of X?
If, instead, each disc is replaced before the next is drawn, construct a similar list of values and point out the chief differences between the two lists.

18. The discrete random variable X has p.d.f. P(X = x) = k|x| where x takes the values -3, -2, -1, 0, 1, 2, 3.
Find (a) the value of the constant k,
(b) E(X),
(c) E(X²),
(d) the standard deviation of X.

19. The random variable X takes integer values only and has p.d.f.
P(X = x) = kx           x = 1, 2, 3, 4, 5
P(X = x) = k(10 - x)    x = 6, 7, 8, 9
Find
(a) the value of the constant k, (b) E(X),
(c) Var(X), (d) E(2X - 3), (e) Var(2X - 3).

Value of X     10    20    50    100
Probability    0.5   0.3   p     q

Given that X can only take the values 10, 20, 50 or 100, and that E(X) = 25, calculate
(i) the value of p and of q,
(ii) the variance of X.
In a fairground game, a player rolls discs on to a board containing squares, each of which bears one of the numbers 10, 20, 50 or 100. If a disc falls entirely within a square, the player receives the same number of pence as the number in the square; if it does not, the player does not receive anything. The probability that a player will receive money from any given roll is [illegible]. If a player does receive money, the probabilities of receiving 10p, 20p, 50p or £1 are the same as those connected with the values of X above. How many discs should a player be allowed to roll for 50p, if the game is to be fair? (C Additional)

21. (a) A man takes part in a game in which he throws two fair dice and scores the sum of the two numbers shown. The rewards for the scores are given in the following table.

Score         12    10    7    5    other
Reward (£)    16    6     3    5    0

Calculate the expected reward for a throw of the two dice.
(b) A bag contains five identical discs, two of which are marked with the letter A and three with the letter B. The discs are randomly drawn, one at a time without replacement, until both discs marked A are obtained. Show that the probability that three draws are required is 1/5.
Given that X denotes the number of draws required to obtain both discs marked A, copy and complete the following table.

Value of X          2    3    4    5
Probability of X

Evaluate (i) E(X), (ii) E(X²), (iii) the variance of X. (C Additional)

F(1) = P(X ≤ 1) = 0.05
F(2) = P(X ≤ 2) = P(X = 1) + P(X = 2) = 0.05 + 0.4 = 0.45
F(3) = P(X ≤ 3) = 0.75
F(4) = P(X ≤ 4) = 0.9
F(5) = P(X ≤ 5) = 1
Notice that F(5) gives the total probability.
The cumulative distribution function is

x       1      2      3      4     5
F(x)    0.05   0.45   0.75   0.9   1

In general, for the discrete random variable X, the cumulative distribution function is F(x) = P(X ≤ x).

Sometimes F(x) can be given by a formula as in the following example.

Example 4.15
The discrete random variable X has cumulative distribution function F(x) = x/6 for x = 1, 2, ..., 6. Write out the probability distribution and suggest what X represents.

Solution 4.15
The cumulative distribution function is

x       1     2     3     4     5     6
F(x)    1/6   2/6   3/6   4/6   5/6   1

You can find the probability distribution from the table.
P(X = 1) = 1/6
P(X = 2) = F(2) - F(1) = 2/6 - 1/6 = 1/6
P(X = 3) = F(3) - F(2) = 3/6 - 2/6 = 1/6 and so on.
The probability distribution is

x           1     2     3     4     5     6
P(X = x)    1/6   1/6   1/6   1/6   1/6   1/6

This is the uniform distribution, P(X = x) = 1/6, x = 1, 2, ..., 6.
X could be the score when a die is thrown.
nction
Example 4.16 . h
~ ' J&re•~ mred<>m ""~ bl< X ''' ru:<>l "'" "'";""Coo fu•re:<><> fi" I " "; lowre
1. The probability distribution for the random 6. For a discrete random variable X the cumulative
F variable Y is shown in the table: distribution function is given by F{x) = kx,
x = 1, 2, 3. Find (a) the value of the constant k,
y 0.1 0.2 0.3 0.4 0.5 (b) P(X < 3),
\:(x) 0.2 0.32 0.67 0.9 1 0.05 0.25 0.3 0.15 0.25
(c) the probability distribution of X,
(d) the standard deviation of X.
Construct the cumulative distribution table. 7. The discrete random variable X has distribution
Find (a) P(X ~ 3), (b) P(X > 2). function F{x) where
2. For a discrete random variable R the cumulative
F(x)~1-(1-*x)" lorx~1,2,3,4
distribution function F'(r) is as shown in the
Solution 4.16 table: (a) Show that F(3) ~;~and F(2) ~ i.
(a) From the table, (b) Obtain the probability distribution of X.
~(r)
1 2 3 4 (c) Find E(X) and Var(X).
F(3) ~ P(X < 3) ~ P(X ~ 1) + P(X ~ 2) + P(X ~ 3) ~ 0.67 (d) Find P(X > E(X)).
F(2) ~ P(X < 2) ~ P(X ~ 1) + P(X ~ 2) ~ 0.32 I 0.13 0.54 0.75 1
8. The cumulative probabilities for X are given in
P(X ~ 3) ~ F(3)- F(2) Find (a) P(R ~ 2), (b) P(R > l), (c) P(R;;. 3), the following table, where X takes the values
~ 0.67- 0.32 ~ 0.35 (d) P(R < 2), (e) E(R). 0, 1' 2, ... 12.
Solution 4.17
(a) P(X ≤ 5) = F(5) = 0.9327
(b) P(X > 3) = 1 - P(X ≤ 3) = 1 - 0.6477 = 0.3523
(c) P(3 ≤ X ≤ 7) = F(7) - F(2) = 0.9941 - 0.4049 = 0.5892
(d) P(X = 7) = P(X ≤ 7) - P(X ≤ 6) = 0.9941 - 0.9781 = 0.016
(e) P(X ≥ 8) = 1 - P(X ≤ 7) = 1 - F(7) = 1 - 0.9941 = 0.0059
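The rules used in Solution 4.17 can be sketched in a few lines of Python. This is only an illustration: the dictionary F holds the cumulative values quoted in the solution, and the helper names (exactly, between, at_least, more_than) are invented for this sketch.

```python
# Cumulative probabilities F(x) = P(X <= x) quoted in Solution 4.17
F = {2: 0.4049, 3: 0.6477, 5: 0.9327, 6: 0.9781, 7: 0.9941}

def exactly(x):
    """P(X = x) = F(x) - F(x - 1)."""
    return F[x] - F[x - 1]

def between(a, b):
    """P(a <= X <= b) = F(b) - F(a - 1)."""
    return F[b] - F[a - 1]

def at_least(x):
    """P(X >= x) = 1 - F(x - 1)."""
    return 1 - F[x - 1]

def more_than(x):
    """P(X > x) = 1 - F(x)."""
    return 1 - F[x]

print(round(more_than(3), 4))    # part (b)
print(round(between(3, 7), 4))   # part (c)
print(round(exactly(7), 4))      # part (d)
print(round(at_least(8), 4))     # part (e)
```

Notice that each helper subtracts the cumulative value one step below the smallest value wanted, which is the point most often missed.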
The probability distribution for X+ Y is
TWO INDEPENDENT RANDOM VARIABLES 1 2 3 4 5
Notice that
E(X) + E(Y) = 1.3 + 2.2 = 3.5 ... (1)
Var(X) + Var(Y) = 0.41 + 0.76 = 1.17 ... (2)
Now consider the distribution X + Y where X + Y can take the values 1, 2, 3, 4, 5.
For example,
P(X + Y = 4) = P(X = 1 and Y = 3) + P(X = 2 and Y = 2)
= 0.5 × 0.5 + 0.4 × 0.2
= 0.33
Notice the + sign here.
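The calculation of P(X + Y = 4) can be repeated for every value of X + Y. The sketch below is one way to do this in Python. The two distributions are an assumption: they are inferred from the products that appear in the text (0.1, 0.5, 0.4 for X = 0, 1, 2 and 0.3, 0.2, 0.5 for Y = 1, 2, 3), and they agree with E(X) = 1.3, Var(X) = 0.41, E(Y) = 2.2 and Var(Y) = 0.76.

```python
from itertools import product

# Distributions inferred from the worked values in the text (an assumption)
px = {0: 0.1, 1: 0.5, 2: 0.4}
py = {1: 0.3, 2: 0.2, 3: 0.5}

def mean(dist):
    return sum(value * prob for value, prob in dist.items())

def variance(dist):
    m = mean(dist)
    return sum(value ** 2 * prob for value, prob in dist.items()) - m ** 2

def sum_distribution(px, py):
    """Distribution of X + Y for independent X and Y."""
    out = {}
    for (x, p), (y, q) in product(px.items(), py.items()):
        # independence: multiply along a branch, add across branches
        out[x + y] = out.get(x + y, 0.0) + p * q
    return out

p_sum = sum_distribution(px, py)
print(round(p_sum[4], 2))
print(round(mean(p_sum), 2), round(variance(p_sum), 2))
```

The mean and variance printed for X + Y can be compared with results (1) and (2).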
A tree diagram shows all the outcomes:
X+Y Probability
X
y Example 4.18
1 0.1 X 0.3 ~ 0.03
X and Yare independent random variables such that
2 0.1 X 0.2~0.02
E(X) ~ 10, Var(X) ~ 2, E(Y) ~ 8, Var(Y) ~ 3.
3 0.1 X 0.5 ~ 0.05
Find (a) E(5X +4¥), (b) Var(5X + 4Y), (c) Var(~X- Y), (d) Var(~X + Y).
2 0.5 X 0.3 ~ 0.15
H the observations an: imlq:,cndclll (a) Find the probability distribution of S, the sum of the two numbers obtained when the die
-\- c;;_2
is thrown twice, where S ~X 1 + X 2 and illustrate it by drawing a vertical line graph.
--~ n Find E(S) and Var(S).
(b) Find the probability distribution of D, where Dis double the number on which the die
lands when it is thrown once. Illustrate by drawing a vertical line graph.
Example 4.19 . f h ber of heads obtained when six coins are tossed. Find E(D) and Var(D).
Find the expectation and vanance o t e num .
Solution 4.20
Solution 4.19 1 o 1 (a) Consider the sum when the die is thrown twice and illustrate the outcomes on a possibility
. . tossed where X can take the va ues ' .
Let X be the number of headds when a co~n~s The p;obability distribution is space diagram.
First find the expectatton an vra_r_.w_n_c_e_o_ _· --;:----11 Scan take the values 2, 3, 4, 5, 6, 7, 8 and the outcomes (all equally likely) are shown in
0
\:(X~
the diagram:
x) os :5 \
E(X) = 0.5 (by symmetry)
E(X²) = 1 × 0.5 = 0.5
so Var(X) = E(X²) - E²(X) = 0.5 - 0.5² = 0.25.
Now consider Y = X1 + X2 + ... + X6, where Y is the number of heads when six coins are tossed.
E(Y) = 6E(X) = 6(0.5) = 3
Var(Y) = 6 Var(X) = 6(0.25) = 1.5
The expected number of heads is 3 and the variance is 1.5.
[Possibility space diagram; horizontal axis: first throw, X1]
The probability distribution of S is:
s          2     3     4     5     6     7     8
P(S = s)   1/16  2/16  3/16  4/16  3/16  2/16  1/16
Combi of ra va
2
Var(S) = E(S 2 ) - E (S) 1. Ihdep_;ndent random variables X and yare such (e) Chonstruct the probability distribution for
=27.5-25 ~i~~E(X) = 4, E(Y) = 5, Var(X) = 1, Var(Y) = 2. t e random variable X - y
(f) Verify that E(X- Y) = E()C) _ E(Y).
=2.5 (a) E(4X + 2Y), (g) Venfy that Var(X- Y) = Var(X) + Var(Y).
(b) E(5X- Y),
As expected, (c) Var(3X + 2¥) 5. Rods of le~gth 2 m or 3 m are selected at
(d) Var(5Y- 3X): rando~ wtth probabilities 0.4 and 0 6
E(S) = E(X 1 + X 2 ) = 2E(X) = 5 respect1vely. ·
(e) Var(3X- 5¥).
Var(S) = Var(X 1 + X 2) = 2 Var(X) = 2.5
(a) Find the expectation and variance of the
2. Independent random variables X and y h
that E(X') = 14, E(Y') = 20, Var(X) = 1~re sue
(b) D is double the number on which the die lands, so D = 2X. length of a rod.
The probability distribution for D is (b) T:wo lengths are now selected at random
Var(Y) = 11. Find ' Fmd the expectation and variance of the.
8 (a) E(3X- 2Y), (b) Var(5X _ 2Y).
2 4 6 sum of the two lengths.
d (c) Three lengths are now selected at random.
0.25 3. Independent random variables X andy are such
0.25 0.25 S~ow that the probability distribution of y
0.25 that E(X) = 3, E(X') = 12, E(Y) = 4 E(Y') =
Fmdthevalueof ' 18. t e sum of the three lengths, is. '
2 4 6 8
E(D) = 5 (by symmetry) (a) E(3X- 2Y),
y 6 7 8 9
(b) E(2Y- 3X),
2 (c) E(6X+4Y),
E(D 2 ) = "£ d P(D =d) P(Y-y) 0.064 0.288 0.432 0.216
(d) Var(2X- Y),
= 0.25(4 + 16 + 36 + 64) (e) Var(2X + Y), and find E(Y) and Var(Y). Comment on
= 30 (f) Var(3Y + 2X). your results
2
Var(D) = E(D 2 ) - E (D) 4. Indepe~~ent ~an?om variables X and y have 6. Find t~e variance of the sum of the scores when
= 30-25 probabthty distnbutions as shown in the tables: an ordmary die is thrown ten times.
=5
7. X has a p.d.f. given by P(X = x) = kx
As expected, I. :(X-x) 003 1
0.2
2
0.4
3
0.1
X "' 1, 2, 3, 4. Find
(a) k,
'
For 11 otJSCfV<ltirms
It is important that you understand whether multiples or sums are being considered. Think
carefully about this point.
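The difference between a sum and a multiple can be checked by direct calculation. The sketch below assumes a fair die with faces 1, 2, 3, 4; this is inferred from the values taken by S and D in Example 4.20 and is not stated in the lines above. Fractions are used so that the results are exact.

```python
from fractions import Fraction
from itertools import product

faces = [1, 2, 3, 4]          # assumed four-faced die
p = Fraction(1, 4)            # each face equally likely

def mean_and_variance(dist):
    mean = sum(v * pr for v, pr in dist.items())
    var = sum(v * v * pr for v, pr in dist.items()) - mean ** 2
    return mean, var

# S = X1 + X2: the sum of two independent throws
s_dist = {}
for a, b in product(faces, repeat=2):
    s_dist[a + b] = s_dist.get(a + b, 0) + p * p

# D = 2X: double the score on a single throw
d_dist = {2 * a: p for a in faces}

print(mean_and_variance(s_dist))   # same mean as D, variance 2 Var(X)
print(mean_and_variance(d_dist))   # same mean as S, variance 4 Var(X)
```

Both variables have mean 5, but Var(S) = 2 Var(X) while Var(D) = 2² Var(X), so the multiple is more spread out than the sum.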
Var(aX + bY) = a² Var(X) + b² Var(Y)
Var(aX - bY) = a² Var(X) + b² Var(Y)
99
7. The probability of there being X unusable (a) Show that P(B ~ 3) ~ fr.
matches in a full box of Surelite matches is given (b) Find the probability distribution of B.
(c) E(D} ~ :E1 dXP~+~ :)lc+ 3 X fa+ 4 X fa+ 5 Xfa+ 6 Xfa+ 7 X fa+ 8 X fa+ 9 X fo by P(X ~ 0) ~ 8k, P(X ~ 1) ~ Sk,
P(X ~2) ~P(X~ 3) ~ k, P(X~4) ~ 0.
(c)
(d)
Find E(B).
Show that P(R ~ 4) ~ ,",.
:::: 20 20
Determine the constant k and the expectation (c) Find P(R ~B). (L)
- _L(2 + 4 + 6 + 8 + 15 + 18 + 14 + 16 + 18) and variance of X.
- 20 2
Two full boxes of Surelite matches are chosen at
~s.os 2 2 16x2+25x-'o+36xfa+49xfa+64xfo+81xw random and the total number Y of unusable
11. The discrete random variable X has the
probability distribution given in the following
(d) E(D2) ~1xfa+4x 20 +9Xw+ 20 2
matches is determined. Calculate P(Y> 4), and table.
~ fo(2 + 8 + 18 + 32 + 75 + 108 + 98 + 128 + 162) state the values of the expectation and variance
~~ (C) 1 2 3 4
~31.55 2
Var(D} ~ E(D')- E2(D} ~ 31.55-5.05 8. Two unbiased four-sided dice, having the 0.4 0.3 0.1 0.2
numbers 1, 2, 3 and 4 on their faces, are thrown
~ 6.05[2 d.p.] together. The random variable D represents the Two independent observations of X arc made.
modulus of the difference between the numbers The value of the random variable Y is found by
on the two hidden faces. subtracting the smaller of the two values of X
(a) Show that P(D ~ 1) ~ ~- from the larger. If the two values of X are equal,
(b) Calculate the probability for each of the Y is zero. Show that P(Y == 1) = 0.34 and tabulate
other possible values of D. the complete probability distribution of Y.
(c) Calculate the expected value of D. (NEAB) Find
Miscellaneous exercise 4f . I ha ed six-faced die produces a
(a) E(Y),
Probability
0.15 0.34 0.27 0.14 0.10
p(x)
Find the mean and standard deviation of X.
Special discrete probability distributions

In this chapter you will learn
• about the conditions needed to model a situation for a discrete variable using ...
• about the use of the Poisson distribution as an approximation to the binomial distribution
• about the distribution of the sum of two or more independent Poisson variables

THE UNIFORM DISTRIBUTION
Throw an ordinary die. The probability distribution of X, the number on the die, is shown in the table and illustrated by the vertical line graph.

x          1    2    3    4    5    6
P(X = x)   1/6  1/6  1/6  1/6  1/6  1/6

P(X = x) = 1/6 for x = 1, 2, 3, 4, 5, 6
This is an example of a discrete uniform distribution.

Conditions for a uniform model
For a situation to be described using a discrete uniform model,
• the discrete random variable X is defined over the set of n distinct values x1, x2, ..., xn
• each value is equally likely to occur and
P(X = xr) = 1/n for r = 1, 2, ..., n

Example 5.1
The discrete variable X is such that P(X = x) = c for x = 20, 30, 45, 50. Find
(a) the probability distribution of X,
(b) μ, the expectation of X,
(c) P(X < μ),
(d) σ, the standard deviation of X.

Solution 5.1
(a)
x          20   30   45   50
P(X = x)   c    c    c    c
Σ P(X = x) = 1
∴ 4c = 1
c = 0.25
P(X = x) = 0.25 for x = 20, 30, 45, 50
(b) μ = E(X) = Σ x P(X = x)
= 0.25(20 + 30 + 45 + 50)
= 36.25
(c) P(X < μ) = P(X < 36.25)
= P(X = 20) + P(X = 30)
= 0.25 + 0.25
= 0.5
(d) E(X²) = Σ x² P(X = x)
= 0.25(20² + 30² + 45² + 50²)
= 1456.25
Var(X) = E(X²) - μ²
= 1456.25 - 36.25²
= 142.1875
σ = √142.1875 = 11.9 (3 s.f.)

THE GEOMETRIC DISTRIBUTION
Plastic models of animals are given away in packets of breakfast cereal. The probability that a packet contains a model of a rabbit is 0.1. Consider the probability distribution of X, the number of packets you open until you get a rabbit.
P(X = 1) = P(first packet contains a rabbit) = 0.1
P(X = 2) = P(first doesn't, second packet does) = 0.9 × 0.1 = 0.09
P(X = 3) = P(first doesn't, second doesn't, third packet does) = 0.9 × 0.9 × 0.1 = 0.081
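The cereal packet probabilities follow a pattern: to get the first rabbit in packet r there must be r - 1 packets without a rabbit followed by one with a rabbit. A minimal Python sketch of that pattern is given below; the function name geometric_pmf is chosen for this illustration only.

```python
def geometric_pmf(r, p):
    """P(X = r): first success occurs on trial r, with success probability p."""
    if r < 1:
        raise ValueError("r must be a positive integer")
    q = 1 - p
    return q ** (r - 1) * p

# Probability that the first rabbit is found in packet 1, 2 or 3
for r in (1, 2, 3):
    print(r, round(geometric_pmf(r, 0.1), 3))
```

The three printed values can be compared with P(X = 1), P(X = 2) and P(X = 3) in the text.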
P(X ≤ 3) = p + qp + q²p
= 1/6 + 5/6 × 1/6 + (5/6)² × 1/6
= 0.42 (2 s.f.)
Alternatively,
P(X ≤ 3) = P(success at some trial in the first three trials)
= 1 - P(no success in first three trials)
= 1 - q³
= 1 - (5/6)³
= 0.42 (2 s.f.)
(d) P(X > 3) = 1 - P(X ≤ 3)
= 1 - (1 - q³)
= q³
= (5/6)³
= 0.58 (2 s.f.)

[Figure: vertical line graphs of P(X = x) against x]

0.92ⁿ ≤ 0.1
n log 0.92 ≤ log 0.1        (taking logs to base 10 of both sides)
n ≥ log 0.1 / log 0.92      (log 0.92 is negative, and dividing by a negative quantity reverses the inequality)
i.e. n ≥ 27.6 ...
The smallest value of n is 28, as before.
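The logarithm method for finding the smallest n can be sketched in Python as follows. The function name is invented for this illustration, and the two short loops are only a safeguard against rounding in the floating-point logarithms.

```python
import math

def least_n_power_at_most(q, bound):
    """Smallest positive integer n with q**n <= bound, for 0 < q < 1 and 0 < bound < 1."""
    # Dividing by log q (negative) reverses the inequality: n >= log(bound) / log(q)
    n = max(1, math.ceil(math.log(bound) / math.log(q)))
    while q ** n > bound:                      # step up if rounding left n too small
        n += 1
    while n > 1 and q ** (n - 1) <= bound:     # step down if rounding left n too large
        n -= 1
    return n

print(math.log(0.1) / math.log(0.92))     # 27.6...
print(least_n_power_at_most(0.92, 0.1))   # 28
```

Trying n = 27 and n = 28 directly in 0.92ⁿ is a quick way to confirm the answer without logarithms.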
(b) X ~ Geo(0.2), i.e. p = 0.2, q = 0.8
P(X = 5) = q⁴p
= 0.8⁴ × 0.2
= 0.08192

Example 5.6
X ~ Geo(p) and it is known that P(X = 2) = 0.21 and p < 0.5. Find P(X = 1).

Solution 5.6
P(X = 2) = qp where q = 1 - p
0.21 = (1 - p) × p
so 0.21 = p - p²
p² - p + 0.21 = 0
(p - 0.3)(p - 0.7) = 0
p = 0.3 or p = 0.7
Since p < 0.5, p = 0.3
P(X = 1) = p = 0.3

6. A random number machine generates random digits between 0 and 9. Each of the ten digits is equally likely to be generated.
(a) X is the value of the digit generated. Find
(i) P(X < 6),
(ii) P(X = 7),
(iii) E(X),
(iv) the standard deviation of X.
(b) X is the number of digits generated to the first occurrence of a 5. Find
(i) the probability that the first occurrence of the digit 5 is at the seventh number generated,
(ii) the most likely number of digits generated to obtain a 5,
(iii) the mean number of digits generated to obtain a 5.

7. X ~ Geo(0.5). Find
(a) the mode,
(b) the mean of X,
(c) the standard deviation of X.

8. A darts player practises throwing a dart at the bull's eye on a dart board. Independently for each throw, her probability of hitting the bull's eye is 0.2. Let X be the number of throws she makes, up to and including her first success.
(a) Find the probability that she is successful for the first time on the third throw.
(b) Write down the distribution of X and give

13. In a computer game, the probability that the player hits the target is 0.4 for each attempt and the result of each attempt is independent of all others. Find
(a) the probability that he hits the target for the first time on the fourth attempt,
(b) the mean number of attempts needed to hit the target,
(c) the standard deviation of the number of attempts,
(d) the most likely number of attempts to hit the target,
(e) the probability that he takes more than seven attempts to hit the target.

14. Alice runs a stall at a fete in which each player is guaranteed to win £10. Players pay a certain amount each time they throw a die and must keep throwing the die until a four occurs. When a four is obtained, Alice gives the player £10. On average Alice expects to make a profit of 50p per game. How much does she charge per throw?

15. During the winter in Glen Shee, the probability that snow will fall on any given day is 0.1. Taking 1 November as the first day of winter and assuming independence from day to day, find, to two significant figures, the probability that the first snow of winter will fall in Glen Shee on the last day of November (30th).
Given that no snow has fallen at Glen Shee during the whole of November, a teacher decides not to wait any longer to book a skiing holiday.
THE BINOMIAL DISTRIBUTION
In a particular population, 10% of people have blood type B. If three people are selected at random from the population, what is the probability that exactly two of them have blood type B?
Since the people are selected at random, assume that the blood type of one person is independent of that of another, so P(type B) = P(B) = 0.1, P(not type B) = P(B') = 0.9.
To calculate the probability you could use a tree diagram.

[Tree diagram: first person, second person, third person; each branch is B or B']
P(B, B, B') = 0.1 × 0.1 × 0.9 = 0.009*
P(B, B', B) = 0.1 × 0.9 × 0.1 = 0.009*
P(B', B, B) = 0.9 × 0.1 × 0.1 = 0.009*

P(exactly two type B) = P(B, B, B') + P(B, B', B) + P(B', B, B)
= 3 × 0.9 × 0.1²

On a calculator, use the ⁿCᵣ key; otherwise use the formula ⁿCᵣ = n!/(r!(n - r)!), so ⁸C₂ = 8!/(2! 6!).
You should find that ⁸C₂ = 28. So there are 28 different arrangements of two who have blood type B and six who do not have blood type B.
Therefore P(exactly 2 have type B) = 28 × 0.9⁶ × 0.1² = 0.15 (2 s.f.)
Using a similar argument, you could find the probability that exactly two have blood type B in a randomly selected group of 12 people. In this case, ten will not have type B and
P(exactly 2 have type B) = ¹²C₂ × 0.9¹⁰ × 0.1² = 0.23 (2 s.f.)
The above three situations have been described using a binomial model.

Conditions for a binomial model
For a situation to be described using a binomial model,
If the above conditions are satisfied, X is said to follow a binomial distribution. This is written
X ~ B(n, p) or X ~ Bin(n, p)
NOTE: The number of trials, n, and the probability of success, p, are both needed to describe the distribution completely. They are known as the parameters of the binomial distribution.
Writing P(failure) as q where q = 1 - p:
the probability of x successes in n trials is
P(X = x) = ⁿCₓ qⁿ⁻ˣ pˣ for x = 0, 1, 2, ..., n
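The counting argument behind the blood type examples can be checked by listing every possible sequence of B and B' and keeping those with exactly two B's. The Python sketch below does this by brute force and compares the result with the formula ⁿCₓ qⁿ⁻ˣ pˣ; the function names are invented for this illustration.

```python
from itertools import product
from math import comb

def by_enumeration(k, n, p):
    """Count the B/B' sequences of length n with exactly k B's and add their probabilities."""
    count, total = 0, 0.0
    for outcome in product((True, False), repeat=n):   # True means type B
        if sum(outcome) == k:
            count += 1
            total += p ** k * (1 - p) ** (n - k)
    return count, total

def by_formula(k, n, p):
    """nCk * q^(n-k) * p^k."""
    return comb(n, k) * (1 - p) ** (n - k) * p ** k

for n in (3, 8, 12):
    count, total = by_enumeration(2, n, 0.1)
    print(n, count, round(total, 4), round(by_formula(2, n, 0.1), 4))
```

For n = 3, 8 and 12 the number of arrangements found is 3, 28 and 66, which are the values of ³C₂, ⁸C₂ and ¹²C₂.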
When 12 people are selected, n = 12, p = 0.1, q = 0.9.
X is the number of successful outcomes in 12 trials, so X ~ B(12, 0.1).
P(X = 2) = ¹²C₂ q¹⁰ p²
= 66 × 0.9¹⁰ × 0.1²
= 0.23 (2 s.f.)

Example 5.7
At Sellitall Supermarket, 60% of customers pay by credit card. Find the probability that in a randomly selected sample of ten customers,
(a) exactly two pay by credit card,
(b) more than seven pay by credit card.

Solution 5.7
X is the number of customers in a sample of ten who pay by credit card.
Consider 'paying by credit card' as success, p = 0.6, q = 1 - p = 0.4.
Assuming independence, a binomial model can be used, with n = 10, so X ~ B(10, 0.6).
(a) P(X = 2) = ¹⁰C₂ q⁸ p²
= 45 × 0.4⁸ × 0.6²
= 0.011 (2 s.f.)
(b) P(X > 7) = P(X = 8) + P(X = 9) + P(X = 10)
= ¹⁰C₈ q² p⁸ + ¹⁰C₉ q¹ p⁹ + ¹⁰C₁₀ q⁰ p¹⁰
= 45 × 0.4² × 0.6⁸ + 10 × 0.4 × 0.6⁹ + 0.6¹⁰
= 0.17 (2 s.f.)

Example 5.8
Five independent trials of an experiment are carried out. The probability of a successful outcome is p and the probability of failure is 1 - p = q.
Write out the probability distribution of X, where X is the number of successful outcomes in five trials. Comment on your answer.

Solution 5.8
X ~ B(5, p) and X takes the values 0, 1, 2, 3, 4, 5.
P(X = 0) = ⁵C₀ q⁵ p⁰ = q⁵
P(X = 1) = ⁵C₁ q⁴ p¹ = 5q⁴p
P(X = 2) = ⁵C₂ q³ p² = 10q³p²
P(X = 3) = ⁵C₃ q² p³ = 10q²p³
P(X = 4) = ⁵C₄ q¹ p⁴ = 5qp⁴
P(X = 5) = ⁵C₅ q⁰ p⁵ = p⁵
Notice that the powers of q and p add up to 5 each time.
The terms q⁵, 5q⁴p, ..., p⁵ are the terms in the binomial expansion of (q + p)⁵:
q⁵ + 5q⁴p + 10q³p² + 10q²p³ + 5qp⁴ + p⁵ = (q + p)⁵
But (q + p)⁵ = 1, since q + p = 1,
∴ P(X = 0) + P(X = 1) + ... + P(X = 5) = 1.
This confirms that the total sum of the probabilities is 1.
NOTE: Some vertical line graphs illustrating the binomial distribution are given on page 289.

Example 5.9
The random variable X is distributed B(7, 0.2). Find, correct to three decimal places,
(a) P(X = 3),
(b) P(1 < X ≤ 4),
(c) P(X > 1).
(c) P(X > 1) = P(X = 2) + P(X = 3) + ... + P(X = 7)
Rather than calculate all these terms, it is much quicker to find
P(X > 1) = 1 - P(X ≤ 1)
= 1 - (P(X = 0) + P(X = 1))
= 1 - (q⁷ + ⁷C₁ q⁶ p)
= 1 - (0.8⁷ + 7 × 0.8⁶ × 0.2)
= 0.423 (3 d.p.)

Solution 5.10
Let n be the number of pens you need to select. X is the number of faulty pens in n.
Assuming independence and using a binomial model, X ~ B(n, 0.1) with p = 0.1, q = 0.9.

From the calculator, you find that log 0.9 = -0.045 ..., so divide both sides by log 0.9 and reverse the inequality (as you are dividing by a negative quantity).
n > log 0.05 / log 0.9
n > 28.4 ...
The least value of n is 29, as before.

x          0       1       2       3
P(X ≤ x)   0.2097  0.5767  0.8520  0.9667
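The cumulative values 0.2097, 0.5767, 0.8520 and 0.9667 agree with those of a B(7, 0.2) distribution, the distribution used in Example 5.9. As a sketch of how such a table could be built and then used for part (c), assuming only the formula P(X = x) = ⁿCₓ qⁿ⁻ˣ pˣ:

```python
from itertools import accumulate
from math import comb

def binomial_cdf_table(n, p):
    """Return [P(X <= 0), P(X <= 1), ..., P(X <= n)] for X ~ B(n, p)."""
    pmf = [comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)]
    return list(accumulate(pmf))   # running totals of the individual probabilities

table = binomial_cdf_table(7, 0.2)
print([round(value, 4) for value in table[:4]])

# P(X > 1) = 1 - P(X <= 1), as in part (c)
p_greater_than_1 = 1 - table[1]
print(round(p_greater_than_1, 3))
```

Only two terms of the distribution are needed for P(X > 1), which is why the complement is quicker than adding six terms.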
bus. week in September, it will rain on
From a sample of ten pupils chosen at random,
I find the probability that
(a) exactly two days,
I 0 1 2 3 4 5 ' (b) at least two days,
(a) only three travel by bus, (c) at most two days,
2 3 4 5 '
0 (b) less than half travel by bus. (d) exactly three days that are consecutive.
you can see that 2. In a survey on washing powder, it is found that 7. A fair coin is tossed six times. Find the
P(X = 0 IP= 0.3) = 0.17=P(X = 5IP =0.7) the probability that a shopper chooses Soapysuds probability of throwing at least four heads.
P(X = 11 p = 0.3) = 0.36 = P(X = 41 p = 0.7) is 0.25. Find the probability that in a random
8. Assuming that a couple are equally likely to
P(X =2IP = 0.3) =0.31 =P(X = 3IP = 0. 7 )
sample of nine shoppers
(a) exactly three choose Soapysuds, produce a boy or a girl, find the probability that
(b) more than seven choose Soapysuds. in a family of five children there are more boys
and so on. = O. 71
Also P(X dl P = 0.3) = 0 · 84 = P(X;;, 3 1p
than girls.
3. A bag contains counters of which 40% are red
and the rest yellow. A counter is taken from the 9. X is B(4, p) and P(X ~ 4) ~ 0.0256.
In general bag, its colour noted and then replaced. This is Find P(X ~ 2).
n r\.X-- .s /! •• 0.4
!t performed eight times in all.
,:; r\ X~ r\ X-~ _l " fJ)}
0.0168
Calculate the probability that 10. Charlie finds that when she takes a cutting from
1 i . I) a particular plant, the probability that it roots
o(,ft"·F [X·-· (a) exactly three will be red,
) r\X ~-· p)) 0.1064 successfully is!.
{b) at least one will be red,
2 0.3154 {c) more than four will be yellow. (a) She takes nine cuttings. Find the probability
0.5941 that
.l
Example 5.12 0.8263
4. The random variable X is B(6, 0.42). Find (i) more than five cuttings root
.j
6 successfully,
The random variable X is B( 8 , 0 · 1· . f X B(8 0 4) to find 5 0.9502 (a) P(X ~ 6), (b) P(X ~ 4), (c) P(X <: 2).
(ii) at least three cuttings root successfully.
Use the extract of the cumulative binomtal tables or - ' , 0.9915 5. An unbiased die is thrown seven times. Find the (b) Find the number of cuttings that she should
6
0.9993 probability of throwing at least 5 sixes. take in order to be 99% certain that at least
(a) P(X:>3) one cutting roots successfully.
s 1.0000
(b) P(X <: 2)
(c) P(X = 5)
11. An experiment consists of taking seven shots at a target and counting the number of hits. The probability of hitting the target with a single shot is 0.6. Using a binomial model, find the probability that in seven attempts the target is hit at most twice.
Give a reason why the binomial model may not be a good one to use in this situation.

12. In the mass production of bolts it is found that 5% are defective. Bolts are selected at random and put into packets of ten.
A packet is selected at random. Find the probability that it contains
(a) three defective bolts,
(b) less than three defective bolts.
Two packets are selected at random.
(c) Find the probability that there are no defective bolts in either packet.

13. A coin is biased so that it is twice as likely to show heads as tails. The coin is tossed five times. Calculate the probability that
(a) exactly three heads are obtained,
(b) more than three are obtained.

14. The random variable X can be modelled by a binomial distribution with n = 6 and p = 0.5. Construct the probability distribution and illustrate it graphically. Comment on the distribution.

15. The probability that a target is hit is 0.3. Find the least number of shots which should be fired if the probability that the target is hit at least once is greater than 0.95.
State any assumptions that you have made.

16. 1% of light bulbs in a box are faulty. Using a binomial model, find the largest sample size which can be taken if it is required that the probability that there are no faulty bulbs in the sample is greater than 0.5.
Comment on the use of the binomial model in this situation.

17. In a test there are ten multiple choice questions. For each question there is a choice of four answers, only one of which is correct. A student guesses each of the answers.
(a) Find the probability that he gets more than seven correct.
He needs to obtain over half marks to pass and each question carries equal weight.
(b) Find the probability that he passes the test.

18. X ~ B(n, 0.3). Find the least possible value of n such that P(X ≥ 1) > 0.8.

19. Given that X ~ B(7, 0.85), use the cumulative binomial probability tables on page 646 to write out the probability distribution of X.

20. The random variable X is B(n, 0.6) and P(X < 1) = 0.0256. Find the value of n.

21. For each of the experiments described below, state, giving a reason, whether a binomial distribution is appropriate.
Experiment 1: A bag contains black, white and red marbles that are selected one at a time, with replacement. The colour of each marble is noted.
Experiment 2: This experiment is a repeat of experiment 1 except that the bag contains black and white marbles only.
Experiment 3: This experiment is a repeat of experiment 2 except that the marbles are not replaced after each selection. (L)

If X ~ B(n, p) then E(X) = np and Var(X) = npq where q = 1 - p.
These results can be quoted and should be learnt. They are illustrated in the following example.

Example 5.13
The random variable X is B(4, 0.8). Construct the probability distribution for X and find the expectation and variance. Verify that E(X) = np and Var(X) = npq.

Solution 5.13
X is B(4, 0.8) so n = 4 and p = 0.8.
P(X = 0) = 0.2⁴ = 0.0016
P(X = 1) = 4 × 0.2³ × 0.8 = 0.0256
P(X = 2) = ⁴C₂ × 0.2² × 0.8² = 0.1536
P(X = 3) = ⁴C₃ × 0.2 × 0.8³ = 0.4096
P(X = 4) = 0.8⁴ = 0.4096
The probability distribution for X is

x          0       1       2       3       4
P(X = x)   0.0016  0.0256  0.1536  0.4096  0.4096

E(X) = Σ x P(X = x)
= 0 × 0.0016 + 1 × 0.0256 + 2 × 0.1536 + 3 × 0.4096 + 4 × 0.4096
= 3.2
E(X²) = Σ x² P(X = x)
= 1 × 0.0256 + 4 × 0.1536 + 9 × 0.4096 + 16 × 0.4096
= 10.88
Var(X) = E(X²) - E²(X)
= 10.88 - 3.2²
= 0.64
Now np = 4 × 0.8 = 3.2, ∴ E(X) = np
npq = 4 × 0.8 × 0.2 = 0.64, ∴ Var(X) = npq

Example 5.14
The probability that it will be a fine day is 0.4. Find the expected number of fine days in a week and also the standard deviation.

Solution 5.14
E(X) = np
= 7 × 0.4
= 2.8
Standard deviation of X = √Var(X)
= √(npq)
= √(7 × 0.4 × 0.6)
= 1.3 days (2 s.f.)
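The verification in Example 5.13 can also be carried out exactly. The sketch below builds the B(4, 0.8) distribution from the formula ⁿCₓ qⁿ⁻ˣ pˣ using fractions, so no rounding is involved, and then compares the mean and variance with np and npq. The function names are chosen for this illustration only.

```python
from fractions import Fraction
from math import comb, sqrt

def binomial_distribution(n, p):
    """Return {x: P(X = x)} for X ~ B(n, p)."""
    q = 1 - p
    return {x: comb(n, x) * q ** (n - x) * p ** x for x in range(n + 1)}

def mean_and_variance(dist):
    mean = sum(x * pr for x, pr in dist.items())
    var = sum(x * x * pr for x, pr in dist.items()) - mean ** 2
    return mean, var

n, p = 4, Fraction(4, 5)                # p = 0.8 written as an exact fraction
dist = binomial_distribution(n, p)
mean, var = mean_and_variance(dist)
print(float(mean), float(var))          # compare with np and npq
print(mean == n * p, var == n * p * (1 - p))

# Example 5.14: standard deviation of the number of fine days in a week
print(round(sqrt(7 * 0.4 * 0.6), 1))
```

The same two functions work for any n and p, so they can be used to check answers to the exercises on expectation and variance.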
Example 5.15
X is B(n, p) with mean 5 and standard deviation 2. Find the values of n and p.

Solution 5.15
E(X) = np, therefore np = 5 ... (1)
Var(X) = npq, therefore npq = 2² = 4 ... (2)
Substituting for np in equation (2)
5q = 4
q = 0.8
p = 1 - q
So p = 0.2
Substituting for p in equation (1)
n × 0.2 = 5
n = 25

Number of heads   0    1    2    3    4
Frequency         12   50   151  200  87

(a) From the experimental data, estimate the probability of obtaining a head when the coin is tossed.
(b) Using a binomial distribution with the same mean, calculate the theoretical probabilities of obtaining 0, 1, 2, 3 and 4 heads.

Solution 5.16
(a) For the frequency distribution,
mean = x̄ = Σfx / Σf = 1300/500 = 2.6
Let X be the number of heads in four tosses. Then X ~ B(4, p).
For a distribution with the same mean, 4p = 2.6
p = 0.65
An estimate of the probability that the coin shows heads is 0.65.
(b) Using X ~ B(4, 0.65), calculate the probabilities of 0, 1, 2, 3 and 4 heads and multiply these by 500 to obtain the theoretical frequencies.

x    P(X = x)                       Frequency (nearest integer)
0    0.35⁴ = 0.015 ...              8
1    4 × 0.35³ × 0.65 = 0.111 ...   56
2    6 × 0.35² × 0.65² = 0.310 ...  155
3    4 × 0.35 × 0.65³ = 0.384 ...   192
4    0.65⁴ = 0.178 ...              89
                                    Total 500

These compare reasonably well with the original distribution.
A statistical test to compare the two sets of data, the χ² test, is illustrated on page 571.

[Figure: vertical line graphs of binomial distributions, P(X = x) against x; in the last graph the probabilities for x from 12 to 20 are too small to illustrate]

The mode of the binomial distribution
The mode is the value of X that is most likely to occur.
From the probability distribution sketches above, it can be seen that
• when p = 0.5 and n is odd, there are two modes,
• otherwise the distribution has one mode.
The mode can be found by calculating all the probabilities and finding the value of X with the highest probability. This is, however, very tedious; it is usually only necessary to consider the probabilities of values of X close to the mean np.
10. Seeds are planted in rows of six and after 14
?ays the number of seeds which have germinated (a) Calculate, to two significant figures, the
Example 5.17 111 each of the 100 rows is noted. probability th~t, in any one sample, two
The probability that a student is awarded a distinction in the Mathematics examination is 0.05. The results are shown in the table: bolts or less will be faulty.
(b) Find the expected value and the variance of
In a randomly selected group of 50 students, what is the most likely number of students Number of seeds the number of bolts in a sample which will
germinating 0 1 2 3 4 5 6 not be faulty. (L Additional)
awarded a distinction?
Number of rows 2 1 2 10 30 35 20 14. An experiment consists of taking 12 shots at a
target and counting the number of hits.
Solution 5.17 Find the theoretical frequencies of 0, 1, ... , 6 When this e;cperiment was repeated a large
X is the number of students who are awarded a distinction in 50, so X- B(50, 0.05). seeds g~rminating in a row, using the associated number of ttmes the mean number of hits was
theorettcal binomial distribution. found to be 3. Calculate
E(X) ~ np ~50 x 0.05 ~ 2.5, so calculate the probabilities for values of X near 2.5. (a) the probability of hitting the target with a
11. Each day a bakery delivers the same number of single shot,
P(X ~ 1) =50 0.95 49 0.05 ~ 0.202 ... loaves to a certain shop which sells on average
X X (b) the standard deviation of the number of hits
98% of them. Assuming that the n~mber of '
P(X ~ 2) ~ 5°C, X 0.95 48
X 0.05 2 = 10.2611 .. . loaves sold per day has a binomial distribution
in an experiment. (C Additional)
with a standard deviation of 7, find the number
P(X ~ 3) ~ 5°C3 X 0.95 47 X 0.05 3 ~ 0.219 .. . 15. In an experiment a certain number of dice are
of loaves the shop would expect to sell per day. thrown and the number of sixes obtained is
From the list, you can see that the value of X with the highest probability is 2. IC Additional) recorded. The dice are all biased and the
12. In a large batch of items from a production line probability of obtaining a six with each individual
The most likely number of students awarded a distinction in a group of 50 is two. the probability that an item is faulty is p. die is fJ. In all there were 60 experiments and the
400 samples, each of size 5, are taken and the results are shown in the table.
number of faulty items in each batch is noted.
From the frequency distribution below estimate p
Number of sixes
and work out the expected frequencies of 0 '] 2
of binomial di on 3, 4, 5 faulty items per batch for a theoreti~al' ' obtained in an
Exercise 5c Expectation, variance binomial distribution having the same mean. experiment 0 2 3 4 >4
In a large number of experiments the standard Frequency 19 26 12 2 1 0
1. 10% of the articles from a certain production Number of
deviation of the number of sixes is 1.5.
line are defective. A sample of 25 articles is faulty items 0 1 2 3 4 5
Calculate the value of p and hence determine, to
taken. Find the expected number of defective Calculate the mean and the standard deviation of
two places of decimals, the probability that
items and the standard deviation. Frequency 297 90 10 2 1 0 these data.
exactly three sixes arc recorded during a
By comparing these answers with those expected
2. The probability that an apple picked at random particular experiment. (C)
for a binomial distribution, estimate
from a sack is bad is 0.15. 7. In a certain African village, 80% of the villagers J 3. On a~era.ge 20% of the bolts produced by a
machme m a factory arc faulty. Samples of ten (a) the number of dice thrown in each
(a) Find the standard deviation of the number are known to have a particular eye disorder. experiment,
Twelve people are waiting to see the nurse. bolts are to ?c selected at random each day.
of bad apples in a sample of 15 apples. Each bolt wt!l be selected and replaced in the set (b) the value of p. (C Additional)
(b) What is the most likely number of bad (a) What is the most likely number to have the of bolts which have been produced on that day.
apples in a sample of 30 apples? eye disorder?
(b) Find the probability that fewer than half
3. The random variable X is B(n, 0.3) and have the eye disorder.
E(X} = 2.4. Find nand the standard deviation
of X. 8. In a bag there arc six red counters, eight yellow
counters and six green counters. An experiment
4. In a group of people the expected number who consists of taking a counter at random from the THE POISSON DISTRIBUTION
wear glasses is two and the variance is 1.6. bag, noting its colour and then replacing it in the
Find the probability that bag. This procedure is carried out ten times in Consider these randmn variables
all. Find
(a) a person chosen at random from the group
wears glasses, (a) the expected number of red counters drawn, * the number of emergency calls received by an ambulance control in an hour
(b) six people in the group wear glasses. (b) the most likely number of green counters e the nun1ber of vehicles approaching a motorway toll bridge in a five-minute' interval
drawn, e the number of flaws in a metre length of material, '
5. The random variable X is B(10, p) where p < 0.5. (c) the probability that no more than four
yellow counters arc drawn. e the number of white corpuscles on a slide.
The variance of X is 1.875. Find
(a) the value of p, 9. The random variable X is distributed binomially Assuming :hat each ~ccu.rs randomly, they are all examples of variables that can be modelled
(b) E(X), with mean 2 and variance '1 .6. Find usmg a Potsson distnbutiOn.
(c) P(X~2). (a) the probability that X is less than 6,
(b) the most likely value of X.
6. A die is biased and the probability, p, of
throwing a six is known to be less than t. An
experiment consists of recording the number of
sixes in 25 throws of the die.
Using $P(X = x) = e^{-\lambda}\frac{\lambda^x}{x!}$ with $\lambda = 4$:

(a) $P(X = 5) = e^{-4}\frac{4^5}{5!} = 0.156$ (3 s.f.)

(b) $P(X = 0) = e^{-4}\frac{4^0}{0!} = 0.0183$ (3 s.f.)

(c) $P(X < 3) = P(X = 0) + P(X = 1) + P(X = 2)$
$= e^{-4}\frac{4^0}{0!} + e^{-4}\frac{4^1}{1!} + e^{-4}\frac{4^2}{2!}$
$= e^{-4}(1 + 4 + 8)$
$= 13e^{-4}$
$= 0.238$ (3 s.f.)

(a) With a mean of 8 breakdowns in a week, $P(X = 5) = e^{-8}\frac{8^5}{5!} = 0.0916$ (3 s.f.)

(b) Let Y be the number of breakdowns in a day.
The mean number of breakdowns in a day is $\frac{8}{5} = 1.6$, so $Y \sim \mathrm{Po}(1.6)$.
$P(Y = 1) = 1.6e^{-1.6} = 0.323$ (3 s.f.)

(c) Let F be the number of breakdowns in a fortnight.
The mean number of breakdowns in a fortnight is $2 \times 8 = 16$, so $F \sim \mathrm{Po}(16)$.
$P(F = 8) = e^{-16}\frac{16^8}{8!} = 0.0120$ (3 s.f.)
NOTE:
• $P(X = 0) = e^{-4}\frac{4^0}{0!}$, but $4^0 = 1$ and $0! = 1$, so $P(X = 0) = e^{-4}$.
• $P(X = 1) = e^{-4}\frac{4^1}{1!}$, but $4^1 = 4$ and $1! = 1$, so $P(X = 1) = 4e^{-4}$.

These two results are useful in general:
If $X \sim \mathrm{Po}(\lambda)$, then $P(X = 0) = e^{-\lambda}$ and $P(X = 1) = \lambda e^{-\lambda}$.

Mean and variance of the Poisson distribution

The mean number of occurrences in the interval, $\lambda$, is all that is needed to define the distribution completely; $\lambda$ is the only parameter of the distribution.

In a Poisson distribution, it is obvious that the mean, $E(X) = \lambda$, but it is also the case that $\mathrm{Var}(X) = \lambda$. The following should be learnt:

If $X \sim \mathrm{Po}(\lambda)$, then $E(X) = \lambda$ and $\mathrm{Var}(X) = \lambda$.
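The worked probabilities for $\lambda = 4$ and the statement that the mean and variance both equal $\lambda$ can be cross-checked numerically. The following is a minimal sketch using only the Python standard library; the function names `poisson_pmf` and `poisson_cdf` are illustrative choices, not notation from the text, and the infinite sums for the mean and variance are truncated at x = 59 on the assumption that the remaining tail is negligible for $\lambda = 4$.

```python
from math import exp, factorial


def poisson_pmf(x: int, lam: float) -> float:
    """Return P(X = x) for X ~ Po(lam), using e^(-lam) * lam^x / x!."""
    return exp(-lam) * lam**x / factorial(x)


def poisson_cdf(x: int, lam: float) -> float:
    """Return P(X <= x) by summing the individual probabilities."""
    return sum(poisson_pmf(k, lam) for k in range(x + 1))


lam = 4.0
p_five = poisson_pmf(5, lam)          # P(X = 5)
p_zero = poisson_pmf(0, lam)          # P(X = 0) = e^(-4)
p_under_three = poisson_cdf(2, lam)   # P(X < 3) = P(X <= 2) = 13e^(-4)

# Approximate E(X) and Var(X) by truncating the infinite sums at x = 59.
mean = sum(k * poisson_pmf(k, lam) for k in range(60))
variance = sum(k * k * poisson_pmf(k, lam) for k in range(60)) - mean**2

print(f"P(X = 5) = {p_five:.3f}")
print(f"P(X = 0) = {p_zero:.4f}")
print(f"P(X < 3) = {p_under_three:.3f}")
print(f"mean = {mean:.6f}, variance = {variance:.6f}")
```

Changing `lam` to 8, 1.6 or 16 gives the corresponding checks for the breakdown example.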
Example 5.20
X follows a Poisson distribution with standard deviation 1.5. Find $P(X \ge 3)$.

Solution 5.20
If $X \sim \mathrm{Po}(\lambda)$ then $\mathrm{Var}(X) = \lambda$.
But $\mathrm{Var}(X) = (\text{standard deviation})^2 = 1.5^2 = 2.25$,
so $\lambda = 2.25$ and $X \sim \mathrm{Po}(2.25)$.

$P(X \ge 3) = 1 - P(X < 3)$
$= 1 - (P(X = 0) + P(X = 1) + P(X = 2))$
$= 1 - e^{-2.25}\left(1 + 2.25 + \frac{2.25^2}{2!}\right)$
$= 1 - 0.6093\ldots$
$= 0.391$ (3 s.f.)

Using cumulative Poisson probability tables

If you have access to these tables you may wish to use them to calculate probabilities. The tables are printed on page 647. As with the cumulative binomial tables, they give $P(X \le r)$ for various values of $\lambda$, where $X \sim \mathrm{Po}(\lambda)$.
Here is an extract for Po(1.6).

Example 5.21
Given that $X \sim \mathrm{Po}(1.6)$, use cumulative Poisson probability tables to find, to three decimal places,
(a) $P(X \le 6)$,
(b) $P(X = 5)$,
(c) $P(X \ge 3)$,
(d) $P(X = 10)$.
Find also the smallest integer n such that $P(X > n) < 0.01$.

Solution 5.21
Using the table printed above,

(b) $P(X = 5) = P(X \le 5) - P(X \le 4)$
$= 0.9940 - 0.9763$
$= 0.018$ (3 d.p.)

(c) $P(X \ge 3) = 1 - P(X \le 2)$
$= 1 - 0.7834$
$= 0.217$ (3 d.p.)

(d) X takes the values 0, 1, 2, ..., to infinity, but from the tables, $P(X \le 8) = 1.0000$ to four decimal places. This implies that for values of X greater than 8, the probabilities are very small, so to three decimal places, $P(X = 10) = 0.000$.
In fact, using the formula, $P(X = 10) = e^{-1.6} \times \frac{1.6^{10}}{10!} = 0.000\,006\,117\ldots$

If $P(X > n) < 0.01$
then $1 - P(X \le n) < 0.01$
so $P(X \le n) > 0.99$
From tables $P(X \le 4) = 0.9763 < 0.99$
and $P(X \le 5) = 0.9940 > 0.99$
The smallest integer n is 5.

Diagrammatic representation of the Poisson distribution

Notice that for small values of $\lambda$, the distribution is very skew but it becomes more symmetrical as $\lambda$ increases.

[Figure: vertical line graphs of P(X = x) against x for X ~ Po(1), Po(1.6), Po(2), Po(2.2), Po(3) and Po(3.8)]
[Figure: vertical line graphs of P(X = x) against x for X ~ Po(5) and X ~ Po(10)]

The mode of the Poisson distribution

The mode is the value of X that is most likely to occur, i.e. the one with the greatest probability.

From the diagrams, you can see that
when $\lambda = 1$, there are two modes, 0 and 1,
when $\lambda = 2$, there are two modes, 1 and 2,
when $\lambda = 3$, there are two modes, 2 and 3.

In general, if $\lambda$ is an integer, there are two modes, $\lambda - 1$ and $\lambda$.
For example, if $X \sim \mathrm{Po}(8)$, the modes are 7 and 8.

Notice also that
when $\lambda = 1.6$, the mode is 1,
when $\lambda = 2.2$, the mode is 2,
when $\lambda = 3.8$, the mode is 3.

In general, if $\lambda$ is not an integer, the mode is the integer below $\lambda$.
For example, if $X \sim \mathrm{Po}(4.9)$, the mode is 4.

Fitting a theoretical distribution to practical data

As with the binomial distribution it is possible to fit a theoretical Poisson distribution to experimental data.

Example 5.22
I recorded the number of e-mails I received over a period of 150 days with the following results:

Number of e-mails    0    1    2    3    4
Number of days      51   54   36    6    3

(a) Find the mean number of e-mails per day.
(b) Calculate the frequencies of the Poisson distribution having the same mean.

Solution 5.22
(a) $\bar{x} = \frac{\sum fx}{\sum f} = \frac{156}{150} = 1.04$

(b) Let X be the number of e-mails received in a day. For a Poisson distribution with the same mean, use $X \sim \mathrm{Po}(1.04)$ and calculate the probabilities of 0, 1, 2, 3, 4, ... e-mails. Multiply these by 150 to obtain the theoretical frequencies.

x     P(X = x)                                               Theoretical frequency
0     $e^{-1.04} = 0.3534\ldots$                              53
1     $1.04e^{-1.04} = 0.3675\ldots$                          55
2     $e^{-1.04} \times \frac{1.04^2}{2!} = 0.1911\ldots$     29
3     $e^{-1.04} \times \frac{1.04^3}{3!} = 0.0662\ldots$     10
4     $e^{-1.04} \times \frac{1.04^4}{4!} = 0.0172\ldots$     3
>4    $1 - P(X \le 4) = 0.004\,31\ldots$                      0
                                                             Total 150

These compare reasonably well with the original distribution.
A statistical test to compare the two sets of data, the $\chi^2$ test, is illustrated on page 573.

Exercise

1. An insurance company receives on average two claims per week from a particular factory. Assuming that the number of claims can be modelled by a Poisson distribution, find the probability that it receives
(a) three claims in a given week,
(b) more than four claims in a given week,
(c) four claims in a given fortnight,
(d) no claims on a given day, assuming that the factory operates on a five-day week.

2. A sales manager receives six telephone calls on average between 9.30 a.m. and 10.30 a.m. on a weekday. Find the probability that
(a) she will receive two or more calls between 9.30 a.m. and 10.30 a.m. on Tuesday,
(b) she will receive exactly two calls between 9.30 a.m. and 9.40 a.m. on Wednesday,
(c) during a five-day working week, there will be exactly three days on which she receives no calls between 10.00 a.m. and 10.10 a.m.

3. The number of bacterial colonies on a petri dish can be modelled by a Poisson distribution with average number 2.5 per cm². Find the probability that
(a) in 1 cm² there are no bacterial colonies,
(b) in 2 cm² there are more than four bacterial colonies,
(c) in 4 cm² there are six bacterial colonies.

4. On a particular motorway bridge, breakdowns occur at a rate of 3.2 a week. Assuming that the number of breakdowns can be modelled by a Poisson distribution, find the probability that
(a) fewer than the mean number of breakdowns occur in a particular week,
(b) more than five breakdowns occur in a given fortnight,
(c) exactly three breakdowns occur in each of four successive weeks.
5. Cars arrive at a petrol station at an average rate of 30 per hour. Assuming that the cars arrive at random, find the probability that
(a) no cars arrive during a particular five-minute interval,
(b) more than three cars arrive during a five-minute interval,
(c) more than five cars arrive in a 15-minute interval,
(d) in a half hour period, ten cars arrive,
(e) fewer than three cars arrive in a ten-minute interval.

6. Flaws occur randomly in a roll of fabric at an average rate of 1.5 per metre length.
(a) Find the probability that in a randomly chosen one-metre length there are more than two flaws.
(b) Find the probability that in a randomly chosen two-metre length there are no flaws.
(c) What is the standard deviation of the number of flaws in a four-metre length?

7. The number of calls made to a Health Centre can be modelled by a Poisson distribution with standard deviation 2 per five-minute interval. Find the probability that in a given five-minute interval the number of calls is more than the average for a five-minute interval.

8. The average number of misprints on each page in the first draft of a novel is four. Find the probability that on a randomly selected double page
(a) there are three misprints on each page,
(b) there are six misprints in total.

9. The number of goals scored in a match by Random Rovers can be modelled using a Poisson distribution. The probability, to three decimal places, that the team scores no goals is ... Given that the mean number of goals scored in a match is an integer, find the probability that the team scores fewer than three goals in a match.

10. The number of accidents occurring in a week in a certain factory follows a Poisson distribution with variance 3.2. Find
(a) the most likely number of accidents in a given week,
(b) the probability that exactly seven accidents happen in a given fortnight.

11. For each of the following sets of data, fit a theoretical Poisson distribution with the same mean.

(a)  x    0    1    2    3    4    5
     f  110   50   20   12    7    1

(b)  x    0    1    2    3    4
     f   45   44   20    8    3

12. A firm investigated the number of employees suffering injuries whilst at work. The results recorded below were obtained for a 52-week period:

Number of employees injured in a week    0    1    2    3    4 or more
Number of weeks                         31   17    3    1    0

Give reasons why one might expect this distribution to approximate to a Poisson distribution. Evaluate the mean and variance of the data and explain why this gives further evidence in favour of a Poisson distribution. Using the calculated value of the mean, find the theoretical frequencies of a Poisson distribution for the number of weeks in which 0, 1, 2, 3, 4 or more, employees were injured. (C)

13. Along a stretch of motorway, breakdowns which require the summoning of the breakdown services occur with a frequency of 2.4 per day, on average. Assuming that the breakdowns occur randomly and that they follow a Poisson distribution, find
(a) the probability that there will be exactly two breakdowns on a given day,
(b) the smallest integer n such that the probability of more than n breakdowns in a day is less than 0.03.

USING THE POISSON DISTRIBUTION AS AN APPROXIMATION TO THE BINOMIAL DISTRIBUTION

A binomial distribution $X \sim B(n, p)$ with $n > 50$ and $p < 0.1$ can be approximated by a Poisson distribution with the same mean, i.e. $X \sim \mathrm{Po}(np)$.
The approximation gets better as n gets larger and p gets smaller.

Example 5.23
Eggs are packed into boxes of 500. On average 0.7% of the eggs are found to be broken when the eggs are unpacked. Find, correct to two significant figures, the probability that in a box of 500 eggs,
(a) exactly three are broken,
(b) at least two are broken.

Solution 5.23
Let X be the number of broken eggs in a box of 500.
P(egg is broken) = 0.007, so $X \sim B(500, 0.007)$.
$E(X) = np = 500 \times 0.007 = 3.5$
Since $n > 50$ and $p < 0.1$, use a Poisson approximation, $X \sim \mathrm{Po}(3.5)$.

(a) $P(X = 3) = e^{-3.5}\frac{3.5^3}{3!} = 0.22$ (2 s.f.)

(b) $P(X \ge 2) = 1 - (P(X = 0) + P(X = 1))$
$= 1 - (e^{-3.5} + 3.5e^{-3.5})$
$= 0.86$ (2 s.f.)

Example 5.24
A Christmas draw aims to sell 5000 tickets, 50 of which will win a prize.
(a) A syndicate buys 200 tickets. Let X represent the number of these tickets that win a prize.
(i) Justify the use of the Poisson approximation for the distribution of X.
(ii) Calculate $P(X \le 3)$.
(b) Calculate how many tickets should be bought in order for there to be a 90% probability of winning at least one prize. (C)
Solution 5.24
P(a ticket wins a prize) $= \frac{50}{5000} = 0.01$

(a) Let X be the number of these tickets that win a prize.
Strictly speaking you do not have independent trials, but since n is very large, X can be considered to be modelled by a binomial distribution where $X \sim B(200, 0.01)$.
$E(X) = np = 200 \times 0.01 = 2$.

(i) Since $n > 50$ and $p < 0.1$, use a Poisson approximation, $X \sim \mathrm{Po}(2)$.

(ii) $P(X \le 3) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3)$
$= e^{-2} + 2e^{-2} + \frac{2^2}{2!}e^{-2} + \frac{2^3}{3!}e^{-2}$
$= 0.86$ (2 s.f.)

(b) Let X be the number of tickets that win a prize in n tickets, so $X \sim B(n, 0.01)$ and $E(X) = np = 0.01n$.
Assuming $n > 50$ and $p < 0.1$, use $X \sim \mathrm{Po}(0.01n)$.

You want $P(X \ge 1) = 0.9$
But $P(X \ge 1) = 1 - P(X = 0) = 1 - e^{-0.01n}$
So $0.9 = 1 - e^{-0.01n}$
$e^{-0.01n} = 0.1$

Taking logs to base e
$-0.01n = \ln 0.1$
$n = -\frac{\ln 0.1}{0.01}$
$n = 230.25\ldots$

So the least integer value of n must be 231.

Check: If n = 230, $np = 230 \times 0.01 = 2.3$ and $1 - e^{-2.3} = 0.8997\ldots < 0.9$
If n = 231, $np = 231 \times 0.01 = 2.31$ and $1 - e^{-2.31} = 0.9007\ldots > 0.9$

231 tickets should be bought.
Note that n can be found by trial and improvement methods if logarithms are not used.

5. An aircraft has 116 seats. The airline has found, from long experience, that on average 2.5% of people who have bought tickets for a flight do not arrive for that flight. The airline sells 120 tickets for a particular flight.
(a) Calculate, using a suitable approximation, the probability that more than 116 people arrive for the flight.
(b) Calculate also the probability that there are empty seats on the flight. (C)

6. In a large town one person in 80, on average, has blood of type X. If 200 blood donors are sampled at random, find an approximation to the probability that they include at least five people with blood type X.
How many donors must be sampled in order that the probability of including at least one donor of type X is 90% or more? (AEB)

7. A lottery has a very large number of tickets, one in every 500 of which entitles the purchaser to a prize. An agent sells 1000 tickets for the lottery. Using a Poisson approximation, find, to three decimal places, the probability that the number of prize-winning tickets sold by the agent is
(a) less than three,
(b) more than five.
Calculate the minimum number of tickets the agent must sell to have a 95% chance of selling at least one prize-winning ticket. (NEAB)

8. A manufacturer has found that 3% of seeds produced do not germinate. Using a Poisson approximation, find, to two significant figures, the probability that in a pack containing 150 seeds,
(a) more than four fail to germinate,
(b) at least 145 germinate.

9. X is B(250, p). The value of p is such that it is valid to apply a Poisson approximation. When this is done, it is found that P(X = 0) = 0.0235. Find the value of p.

10. The probability that I dial a wrong number when making a telephone call is 0.015. In a typical week I will make 50 telephone calls. Using a Poisson approximation to a binomial model find, correct to two decimal places, the probability that in such a week,
(a) I dial no wrong numbers,
(b) I dial more than two wrong numbers.
Comment on the suitability of the binomial model and of the Poisson approximation. (C)

11. A newspaper reports that 8.6% of adults in the U.K. painted the outside of their houses. A sample of 55 adults in the U.K. was selected. Stating any necessary assumptions, show that the number in the sample that painted the outsides of their own houses can be approximated by a Poisson distribution.
Using this approximation, find the probability that fewer than four people in the sample painted the outsides of their own houses. (C)
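The Poisson approximation in Example 5.23 and the "trial and improvement" search mentioned at the end of Solution 5.24 can both be illustrated with a short calculation. This is a sketch only: `binomial_pmf`, `poisson_pmf` and `least_tickets` are hypothetical helper names introduced here, and the search simply steps n upwards until the Poisson probability of at least one prize reaches the target.

```python
from math import ceil, comb, exp, factorial, log


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Exact P(X = x) for X ~ B(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) for X ~ Po(lam)."""
    return exp(-lam) * lam**x / factorial(x)


# Example 5.23(a): B(500, 0.007) against its approximation Po(3.5).
exact = binomial_pmf(3, 500, 0.007)
approx = poisson_pmf(3, 500 * 0.007)
print(f"binomial: {exact:.4f}, Poisson: {approx:.4f}")


def least_tickets(p_win: float, target: float) -> int:
    """Trial and improvement: smallest n with 1 - e^(-p_win * n) >= target."""
    n = 1
    while 1 - exp(-p_win * n) < target:
        n += 1
    return n


# Example 5.24(b): by stepping through n, and by the logarithm method.
n_search = least_tickets(0.01, 0.9)
n_logs = ceil(-log(1 - 0.9) / 0.01)
print(n_search, n_logs)
```

The two printed probabilities agree to two significant figures, which is the accuracy asked for in the example, and both routes to n give the same integer.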
$= 1 - e^{-0.6}\left(1 + 0.6 + \frac{0.6^2}{2}\right)$
$= 0.023$ (3 d.p.)

... (O & C)

... lorries is three in any 30-minute period.
Find the probability
(a) that there will be no lorries passing P in a given ten-minute period,
(b) that at least one lorry from each direction will pass P in a given ten-minute period,
(c) that there will be exactly four lorries passing P in a given 20-minute period. (O & C)

4. A restaurant kitchen has two food mixers A and B. The number of times per week that A breaks down has a Poisson distribution with mean 0.4 while independently the number of times that B breaks down in a week has a Poisson distribution with mean 0.1. Find, to three decimal places, the probability that in the next three weeks
(a) A will not break down at all,
(b) each mixer will break down exactly once,
(c) there will be a total of two breakdowns. (L)

Example 5.26
The centre pages of the Weekly Sentinel consist of a page of film and theatre reviews and a page of classified advertisements. The number of misprints in the reviews can be modelled using a Poisson distribution with mean 2.3 and the number of misprints in the classified section can be modelled by a Poisson distribution with mean 1.7.
Using cumulative Poisson probability tables, find
(a) the probability that on the centre pages there will be more than five misprints,
(b) the smallest integer n such that the probability that there are more than n misprints on the
centre pages is less than 5%.
Solution 5.26
Let X be the number of misprints in the reviews, then $X \sim \mathrm{Po}(2.3)$.
Let Y be the number of misprints in the classified advertisements, then $Y \sim \mathrm{Po}(1.7)$.
Let T be the total number of misprints on the centre pages, then $T = X + Y$ and $T \sim \mathrm{Po}(2.3 + 1.7)$, i.e. $T \sim \mathrm{Po}(4)$.

The cumulative tables are printed on page 647 and the relevant extract is shown here:

$\lambda = 4.0$
t     P(T ≤ t)
0     0.0183
1     0.0916
2     0.2381
3     0.4335
4     0.6288
5     0.7851
6     0.8893
7     0.9489
8     0.9786
9     0.9919
10    0.9972
11    0.9991
12    0.9997
13    0.9999
14    1.0000

(a) $P(T > 5) = 1 - P(T \le 5)$
$= 1 - 0.7851$
$= 0.215$ (3 d.p.)

(b) You need the smallest value of n such that
$P(T > n) < 0.05$
i.e. $1 - P(T \le n) < 0.05$
so $P(T \le n) > 0.95$

From the tables, $P(T \le 7) = 0.9489 < 0.95$
$P(T \le 8) = 0.9786 > 0.95$

The smallest value of n is 8.
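The table look-ups in this solution can be reproduced without printed tables. The sketch below assumes, as the solution does, that the two misprint counts are independent so that their sum is Poisson with mean 2.3 + 1.7; `poisson_cdf` and `smallest_n` are illustrative names, not part of the text.

```python
from math import exp, factorial


def poisson_cdf(x: int, lam: float) -> float:
    """P(T <= x) for T ~ Po(lam), the quantity given in cumulative tables."""
    return sum(exp(-lam) * lam**k / factorial(k) for k in range(x + 1))


def smallest_n(lam: float, tail: float) -> int:
    """Smallest integer n such that P(T > n) < tail."""
    n = 0
    while 1 - poisson_cdf(n, lam) >= tail:
        n += 1
    return n


lam_total = 2.3 + 1.7                        # mean of T = X + Y
p_more_than_5 = 1 - poisson_cdf(5, lam_total)
n_5_percent = smallest_n(lam_total, 0.05)

print(f"P(T > 5) = {p_more_than_5:.3f}")
print(f"smallest n with P(T > n) < 0.05: {n_5_percent}")
```

The same `smallest_n` helper with `lam = 1.6` and `tail = 0.01` repeats the search in Solution 5.21.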
Miscellaneous worked examples

Mixed test 5A

1. A series of n experiments is carried out and in each experiment the only possible outcomes are 'success' and 'failure'. The total number of successes is denoted by X. State two conditions which must be satisfied for the distribution of X to be modelled by a binomial distribution.
Gromit invites 11 friends to a party. For each friend, the probability that he or she will accept the invitation may be taken to be ... Use a binomial distribution to calculate the probability that
(a) exactly nine,
(b) fewer than nine,
of the friends will accept the invitation.
Give a reason why a binomial distribution might not be a good model in this situation. (C)

2. The weekly number of detached dwellings sold by an estate agent may be modelled by a Poisson distribution with mean 2.75 and, independently, the weekly number of other dwellings sold may be modelled by a Poisson distribution with mean 3.25.
Determine the probability that the estate agent sells
(a) exactly four detached dwellings in a week,
(b) between ten and 15, inclusive, detached dwellings over a four-week period,
(c) fewer than five dwellings in a week. (NEAB)

3. In one part of the country, one person in 80 has blood of Type P. A random sample of 150 blood donors is chosen from that part of the country. Let X represent the number of donors in the sample having blood of Type P.
(a) State the distribution of X. Find the parameter of the Poisson distribution which can be used as an approximation. Give a reason why a Poisson approximation is appropriate.
(b) Using the Poisson distribution, calculate the probability that in the sample of 150 donors at least two have blood of Type P.
(c) A hospital urgently requires blood of Type P. How large a random sample of donors must be taken in order that the probability of finding at least one donor of Type P should be 0.99 or more? (MEI)

4. A geography student is studying the distribution of telephone boxes in a large rural area where there is an average of 300 boxes per 500 km². A map of part of the area is divided into 50 squares, each of area 1 km², and the student wishes to model the number of telephone boxes per square.
(a) Suggest a suitable simple model the student could use and specify any parameters required.
One of the squares is picked at random.
(b) Find the probability that this square does not contain any telephone boxes.
(c) Find the probability that this square contains at least three telephone boxes.
The student suggests using this model on another map of a large city and surrounding villages.
(d) Comment, giving your reason briefly, on the suitability of the model in this situation. (L)

5. A crossword puzzle is published in The Times each day of the week, except Sunday. A woman is able to complete, on average, eight out of ten of the crossword puzzles.
(a) Find the expected value and the standard deviation of the number of completed crosswords in a given week.
(b) Show that the probability that she will complete at least five in a given week is 0.655 (to three significant figures).
(c) Given that she completes the puzzle on Monday, find, to three significant figures, the probability that she will complete at least four in the rest of the week.
(d) Find, to three significant figures, the probability that, in a period of four weeks, she completes four or less in only one of the four weeks. (C)

Mixed test 5B

1. In practising the high jump a certain athlete has five attempts at a particular height. The probability that she succeeds at any one attempt is p. Find an expression, in terms of p, for the probability that she succeeds
(a) exactly four times,
(b) exactly two times.
The probability that she succeeds exactly four times is twice the probability that she succeeds exactly two times. Find the value of p. (C)

2. Before starting to play the game 'Snakes and Ladders' each player throws an ordinary unbiased die until a six is obtained. The number of throws before a player starts is the random variable Y, where Y takes the values 1, 2, 3, ....
(a) Name the probability distribution of Y, stating a necessary assumption.
(b) Find Var(Y).
(c) Two people play Snakes and Ladders. Calculate the probability that they will each need at least five throws before starting. (C)

3. State, giving your reasons, the distribution which you would expect to be appropriate in describing
(a) the number of heads in ten throws of a penny,
(b) the number of blemishes per square metre of sheet metal.
A building has an automatic telephone exchange. The number X of wrong connections in any one day is a Poisson variable with parameter $\lambda$. Find, in terms of $\lambda$, the probability that in any one day there will be
(c) exactly three wrong connections,
(d) three or more wrong connections.
Evaluate, to three decimal places, these probabilities when $\lambda = 0.5$. Find, to three decimal places, the largest value of $\lambda$ for the probability of one or more wrong connections in any day to be at most ... (L)

4. The number of customers entering a certain branch of a bank on a Monday lunchtime may be modelled by a Poisson distribution with mean 2.4 per minute.
(a) Find the probability that, during a particular minute, four or more customers enter the branch.
The probability that a customer, who enters the branch, intends to open a new account is 0.002 and is independent of the intentions of other customers. During a particular morning 450 customers enter the bank.
(b) Use a suitable approximation to find the probability that three or fewer of these 450 customers intend to open new accounts. (AEB)

5. A process for making plate glass produces small bubbles (imperfections) scattered at random in the glass, at an average rate of four small bubbles per 10 m².
Assuming a Poisson model for the number of small bubbles, determine, to three decimal places, the probability that a piece of glass 2.2 m × 3.0 m will contain
(a) exactly two small bubbles,
(b) at least one small bubble,
(c) at most two small bubbles.
Show that the probability that five pieces of glass, each 2.5 m by 2.0 m, will all be free of small bubbles is $e^{-10}$.
Find, to three decimal places, the probability that five pieces of glass, each 2.5 m by 2.0 m, will contain a total of at least ten small bubbles. (L)
Probability distributions II - continuous variables

how to find
- the expectation, E(X), of the continuous random variable, X
- the expectation of any function of X
- the variance of X
- the mode
about the cumulative distribution function, F(x)
how to find the median, quartiles and other percentiles
how to obtain the probability density function f(x) from the cumulative function F(x)
about the rectangular (uniform) distribution

CONTINUOUS RANDOM VARIABLES

The following are examples of continuous random variables:

• the mass, in grams, of a bag of sugar packaged by a particular machine,
• the time taken, in minutes, to perform a task,
• the height, in centimetres, of a five-year-old girl,
• the lifetime, in hours, of a 100-watt light bulb.

PROBABILITY DENSITY FUNCTION (P.D.F.) f(x)

A continuous random variable X is given by its probability density function (p.d.f.), which is specified for the range of values for which x is valid. The function can be illustrated by a curve, y = f(x). Note that this function cannot be negative anywhere in the specified range.
Probabilities are given by the area under the curve. It is sometimes possible to find an area by geometry, for example by using formulae for the area of a triangle or a trapezium. Often, however, areas need to be calculated using integration.

Example 6.1
X is the delay, in hours, of a flight from Chicago, where
$f(x) = 0.2 - 0.02x$, $0 \le x \le 10$
Find
(a) the probability that the delay will be less than four hours,
(b) the probability that the delay will be between two and six hours.

Solution 6.1
It is useful to draw a sketch of f(x).
Note that since f(x) is valid for $0 \le x \le 10$, the delay can be between 0 and 10 hours.

[Figure: sketch of y = f(x), a straight line from (0, 0.2) to (10, 0)]

(a) The probability that the delay will be less than four hours is given by the area under the curve between 0 and 4.

Method 1 - using geometry
In this example it is easy to calculate the area using $A = \frac{1}{2}(a + b)h$, the formula for the area of a trapezium.
$a = 0.2$, $h = 4$
$b = f(4) = 0.2 - 0.02 \times 4 = 0.12$
$A = \frac{1}{2}(a + b)h$
$= \frac{1}{2}(0.2 + 0.12) \times 4$
$= 0.64$

Method 2 - using integration
$P(0 \le X \le 4) = \int_0^4 (0.2 - 0.02x)\,dx$
$= \left[0.2x - 0.01x^2\right]_0^4$
$= 0.8 - 0.16$
$= 0.64$
The probability that the delay will be less than four hours is 0.64.

(b) The probability that the delay will be between two and six hours is given by the area under the curve between 2 and 6.

Method 1 - using geometry
$f(2) = 0.2 - 0.02 \times 2 = 0.16$
$f(6) = 0.2 - 0.02 \times 6 = 0.08$
$A = \frac{1}{2}(a + b)h$
$= \frac{1}{2}(0.16 + 0.08) \times 4$
$= 0.48$

Method 2 - using integration
$P(2 \le X \le 6) = \int_2^6 (0.2 - 0.02x)\,dx$
$= \left[0.2x - 0.01x^2\right]_2^6$
$= 1.2 - 0.36 - (0.4 - 0.04)$
$= 0.48$
The probability that the delay will be between two and six hours is 0.48.

Notice that the total area under the curve gives the total probability.
In the above example it is easy to check by finding the area of the triangle.
Area of triangle $= \frac{1}{2} \times \text{base} \times \text{height}$
$= \frac{1}{2} \times 10 \times 0.2$
$= 1$
Alternatively, $\int_0^{10} f(x)\,dx = \int_0^{10} (0.2 - 0.02x)\,dx = \left[0.2x - 0.01x^2\right]_0^{10} = 1$

Note that it is not possible to find the probability that the delay is, say, exactly three hours.
If you try to integrate, you get
$P(X = 3) = \int_3^3 f(x)\,dx = 0$
You can only find the probability that X lies within a particular range.
It is also not possible to distinguish between
$P(2 < X < 6)$,
$P(2 \le X < 6)$,
$P(2 < X \le 6)$,
$P(2 \le X \le 6)$,
so there is no need to worry about whether the inequality is strict or not.

Example 6.2
X is the continuous variable, the mass, in kilograms, of a substance produced per minute in an ...

... You will need to find this by integrating:
$f(x) = \frac{1}{36}x(6 - x)$
$P(X > 5) = \frac{1}{36}\int_5^6 (6x - x^2)\,dx$
$= \frac{1}{36}\left[3x^2 - \frac{x^3}{3}\right]_5^6$
$= 0.074$ (3 d.p.)
The probability that the mass is more than 5 kg is 0.074 (3 d.p.).

In general, for a continuous random variable with p.d.f. f(x) valid over the range $a \le x \le b$,
$\int_a^b f(x)\,dx = 1$

[Figure: sketches of y = f(x) over a ≤ x ≤ b, one marked "Area = 1" and one with x1 and x2 marked on the x-axis]

Remember that in an experimental approach, the area under the histogram represents frequency. In a theoretical approach, the area under the curve y = f(x) represents probability.

Example 6.3
A continuous random variable has p.d.f. $f(x) = kx^2$ for $0 \le x \le 4$.
(a) Find the value of the constant k.
(b) Find $P(1 \le X \le 3)$.

Solution 6.3
(a) $\int_{\text{all } x} f(x)\,dx = 1$
$\int_0^4 kx^2\,dx = 1$
$k\left[\frac{x^3}{3}\right]_0^4 = 1$
$\frac{64}{3}k = 1$
$k = \frac{3}{64}$

(b) $P(1 \le X \le 3) = \int_1^3 \frac{3}{64}x^2\,dx$
$= \frac{1}{64}\left[x^3\right]_1^3$
$= 0.406\,25$
$= 0.41$ (2 s.f.)

Example 6.4
X is a continuous random variable with p.d.f. f(x) where
$f(x) = k(x + 2)^2$ for $-2 \le x < 0$,
$f(x) = 4k$ for $0 \le x \le 1\frac{1}{3}$,
$f(x) = 0$ otherwise.
(a) Find the value of the constant k.
(b) Sketch y = f(x).
(c) Find $P(-1 \le X \le 1)$.
(d) Find $P(X > 1)$.

Solution 6.4
(a) To find k, you need to use the result $\int_{\text{all } x} f(x)\,dx = 1$.
f(x) has been given in two parts, so you will need to calculate two separate integrals, as follows:
$\int_{-2}^{0} k(x + 2)^2\,dx + \int_0^{1\frac{1}{3}} 4k\,dx = 1$
$\left[\frac{k(x + 2)^3}{3}\right]_{-2}^{0} + \left[4kx\right]_0^{1\frac{1}{3}} = 1$
$\frac{k}{3}(8) + 4k\left(\frac{4}{3}\right) = 1$
$8k = 1$
$k = \frac{1}{8}$

(b) $f(x) = \frac{1}{8}(x + 2)^2$ for $-2 \le x < 0$, $f(x) = \frac{1}{2}$ for $0 \le x \le 1\frac{1}{3}$, and $f(x) = 0$ otherwise.

[Figure: sketch of y = f(x), the curve $y = \frac{1}{8}(x + 2)^2$ from x = -2 to x = 0, then the horizontal line $y = \frac{1}{2}$ from x = 0 to x = $1\frac{1}{3}$]

(c) $P(-1 \le X \le 0) = \int_{-1}^{0} \frac{1}{8}(x + 2)^2\,dx$
$= \frac{1}{24}(8 - 1)$
$= \frac{7}{24}$
and $P(0 \le X \le 1)$ = area of rectangle $= \frac{1}{2}$
$\therefore P(-1 \le X \le 1) = \frac{7}{24} + \frac{1}{2} = \frac{19}{24}$

(d) From the diagram,
$P(X > 1)$ = area of shaded rectangle
$= \frac{1}{3} \times \frac{1}{2}$
$\therefore P(X > 1) = \frac{1}{6}$

Exercise 6a

1. The continuous random variable X has a p.d.f. f(x) where $f(x) = kx^2$, $0 \le x \le 2$.
(a) Find the value of the constant k.
(b) Find $P(X \ge 1)$.
(c) Find $P(0.5 \le X \le \ldots)$.

2. The continuous random variable X has p.d.f. f(x) where $f(x) = k$, $-2 \le x \le 3$.
(a) Sketch y = f(x).
(b) Find the value of the constant k.
(c) Find $P(-1.6 \le X \le 2.1)$.

3. The continuous random variable X has p.d.f. f(x) where $f(x) = k(4 - x)$, $1 \le x \le \ldots$
(a) Find the value of the constant k.
(b) Sketch y = f(x).
(c) Find $P(1.2 \le X \le 2.4)$.

4. The continuous random variable X has p.d.f. f(x) where $f(x) = k(x + 2)^2$, $0 \le x \le 2$.
(a) Find the value of the constant k.
(b) Find $P(0 \le X \le 1)$ and hence find $P(X > 1)$.
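Areas under a p.d.f. such as those in Examples 6.1 and 6.3 can be checked numerically when an exact integral is awkward. The sketch below uses composite Simpson's rule, a standard numerical method that is not part of the text; the names `simpson`, `delay_pdf` and `k` are illustrative, and the rule happens to be exact (up to rounding) for the linear and quadratic densities used here.

```python
def simpson(f, a: float, b: float, strips: int = 1000) -> float:
    """Approximate the integral of f from a to b with composite Simpson's rule.

    `strips` must be even.
    """
    h = (b - a) / strips
    total = f(a) + f(b)
    for i in range(1, strips):
        weight = 4 if i % 2 else 2   # odd interior points 4, even ones 2
        total += weight * f(a + i * h)
    return total * h / 3


def delay_pdf(x: float) -> float:
    """p.d.f. of Example 6.1, valid for 0 <= x <= 10."""
    return 0.2 - 0.02 * x


total_area = simpson(delay_pdf, 0, 10)   # should be 1
p_under_4 = simpson(delay_pdf, 0, 4)     # part (a)
p_2_to_6 = simpson(delay_pdf, 2, 6)      # part (b)

# Example 6.3: f(x) = k x^2 on 0 <= x <= 4, so k = 1 / (integral of x^2).
k = 1 / simpson(lambda x: x * x, 0, 4)
p_1_to_3 = simpson(lambda x: k * x * x, 1, 3)

print(total_area, p_under_4, p_2_to_6)
print(k, p_1_to_3)
```

Dividing 1 by the integral of the unscaled function is the same step as solving $\int f(x)\,dx = 1$ for k by hand.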
If f(x) has a line of symmetry in the specified range, then E(X) can be found directly as in the
7. The continuous random variable X has p.d.f.
5. The continuous random t(xl followmg example.
f(x) where
variable X has p.d.f. f(x)
where f(x) = kx 3 , 0 ~x ~ c 0 <;x< 2
and P(X ~ !) = -/6.
Find the values of the
f(x) ~ l~(2x- 3) 2 ~X~ 3
otherwise
Example 6.6
A continuous random variable X has p.d.f. f(x) where ·
constants c and k.
0 c (a) Find the value of the constant k. 0.25x 0 <x< 2
(b) Sketchy~ f(x).
(c) Find P(X.;; 1). f(x) ~ 1 - 0.25x 2 <x < 4
6. A continuous random variable has p.d.f. f(x) (
where f(x) ~ kx, 0 <; x <; 4. (d) Find P(X > 2.5). 0 otherwise
(e) Find P(l<; X.;; 2.3).
(a) Find the value of the constant k. Sketchy~ f(x) and find E(X).
(b) Sketchy~ f(x).
(c) Find P(l <;X<; 2.5).
Solution 6.6
Sketch of y ~ f(x)
EXPECTATION OF X, E()O f~)
(b) FindP(X<!'). ~
J
2
x x 0.25xdx + L4
x x (1 - 0.25x)dx
Solution 6.5
(a) I'~ E(X) 0 3
0
~ 0.25 I: x 2
dx + r(x - 0.25x 2)dx
~J
,ux
xf(x)dx ~ 0.25[3x']20 + [x2
2- 0.25 ~']42
~~+(8-136 -(2-~))
~2
Example 6.7
A teacher of young children is thinking of asking her class to guess her height in metres. The
teacher consrders that the height guessed by a randomly selected child can be modelled by the
random vanable H wrth probability density function
f(h) ~ {-
{, (4h- h") o <h < 2
0 otherwise
Using this model,
(a) find P(H < 1 ),
(b) show that E(H) ~ 1.25.
A friend of the teacher suggests that the random variable X with probability density function
$g(x) = kx^3$ for $0 \le x \le 2$, $g(x) = 0$ otherwise,
where k is a constant, might be a more suitable model.
(c) Show that $k = \frac{1}{4}$.
(d) Find $P(X < 1)$.
(e) Find E(X).
(f) Using your calculations in (a), (b), (d) and (e), state, giving reasons, which of the random variables H or X is likely to be the more appropriate model in this instance. (L)

Solution 6.7

(a) $P(H < 1) = \int_0^1 f(h)\,dh$
$= \frac{3}{16}\int_0^1 (4h - h^2)\,dh$
$= \frac{3}{16}\left[2h^2 - \frac{h^3}{3}\right]_0^1$
$= \frac{5}{16} = 0.3125$

[Figure: sketch of f(h) for 0 ≤ h ≤ 2]

(b) $E(H) = \int_0^2 h f(h)\,dh$
$= \frac{3}{16}\int_0^2 (4h^2 - h^3)\,dh$
$= \frac{3}{16}\left[\frac{4h^3}{3} - \frac{h^4}{4}\right]_0^2$
$= 1.25$

(c) $\int_{\text{all } x} g(x)\,dx = 1$
$\int_0^2 kx^3\,dx = 1$
$k\left[\frac{x^4}{4}\right]_0^2 = 1$
$4k = 1$
$k = \frac{1}{4}$

[Figure: sketch of g(x) for 0 ≤ x ≤ 2]

(d) $P(X < 1) = \int_0^1 \frac{1}{4}x^3\,dx$
$= \frac{1}{4}\left[\frac{x^4}{4}\right]_0^1$
$= \frac{1}{16} = 0.0625$

(e) $E(X) = \int_0^2 x g(x)\,dx$
$= \frac{1}{4}\int_0^2 x^4\,dx$
$= \frac{1}{4}\left[\frac{x^5}{5}\right]_0^2$
$= 1.6$

(f) For H,
P(H < 1) = 0.3125, so 31% of children guess the teacher's height to be less than 1 m (i.e. 3 ft 3 in).
E(H) = 1.25, so the average guess for the height of the teacher is 1.25 m (i.e. 4 ft 1 in).
For X,
P(X < 1) = 0.0625, so only 6% of children guess the height to be less than 3 ft 3 in.
E(X) = 1.6, so the average guess for the height of the teacher is 1.6 m (i.e. 5 ft 2 in).
X is the more appropriate model.

Exercise 6b Expectation

1. Find E(X) for each of the following continuous random variables.
(a) $f(x) = \frac{3}{4}(x^2 + 1)$, $0 \le x \le 1$.
(b) $f(x) = \frac{3}{4}x(2 - x)$, $0 \le x \le 2$.
(c) $f(x) = \frac{1}{18}(6 - x)$, $0 \le x \le 6$.
(d) $f(x) = kx^{\ldots}$, $0 \le x \le 2$.
(e) f(x) = ..., $\ldots \le x < 2$; ..., $2 < x \le 4$; 0 otherwise.

2. The continuous random variable X has p.d.f. f(x) where
$f(x) = kx$ for $0 \le x < 1$,
$f(x) = k$ for $1 \le x < 3$,
$f(x) = k(4 - x)$ for $3 \le x \le 4$,
$f(x) = 0$ otherwise.
(a) Draw a sketch of y = f(x).
(b) Find k.
(c) Find E(X).

3. X is a continuous random variable with p.d.f. $f(x) = kx^2$, $0 \le x \le 4$. Find E(X).

4. In a game a wooden block is propelled with a stick across a flat deck. On each attempt the distance, x metres, reached by the block lies between 0 and 10 m, and the variation is modelled by the probability density function $f(x) = 0.0012x^2(10 - x)$.
Calculate the mean distance reached by the block. (SMP)
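The expectation integrals of Example 6.7 follow the pattern $E(X) = \int x f(x)\,dx$ and can be verified numerically. This sketch assumes the density $f(h) = \frac{3}{16}(4h - h^2)$ on $0 \le h \le 2$, read off from the integrand in Solution 6.7, together with $g(x) = \frac{1}{4}x^3$; the helper names `simpson`, `f_h` and `g_x` are illustrative, and Simpson's rule is a standard numerical method rather than something the text uses.

```python
def simpson(f, a: float, b: float, strips: int = 1000) -> float:
    """Approximate the integral of f from a to b with composite Simpson's rule.

    `strips` must be even.
    """
    h = (b - a) / strips
    total = f(a) + f(b)
    for i in range(1, strips):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def f_h(h: float) -> float:
    """Assumed p.d.f. of H on 0 <= h <= 2 (taken from the worked solution)."""
    return 3 / 16 * (4 * h - h * h)


def g_x(x: float) -> float:
    """p.d.f. of X on 0 <= x <= 2, with k = 1/4."""
    return x**3 / 4


p_h_below_1 = simpson(f_h, 0, 1)                    # part (a)
mean_h = simpson(lambda h: h * f_h(h), 0, 2)        # part (b): E(H)
p_x_below_1 = simpson(g_x, 0, 1)                    # part (d)
mean_x = simpson(lambda x: x * g_x(x), 0, 2)        # part (e): E(X)

print(p_h_below_1, mean_h)
print(p_x_below_1, mean_x)
```

The same two-line pattern, one integral of the density and one of x times the density, applies to the expectation questions in the exercise.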
The lifetime X in years of an electric light bulb Example 6.8
5. The continuous random variable X has the
probability density function f given by f(x) ""kx, has this distribution. Given that a lamp standard
is fitted with two such new bulbs and that their The continuous random variable X has p.d.f. f(x) where f(x) ~ z'o (x + 3), 0,; x,; 4.
5 < x < 10, f(x)"" 0 otherwise.
failures are independent, find the probability that
(a) Find the value of k. (a) Find E(X).
neither bulb fails in the first year and the
(b) Find the expected value of X. probability that exactly one bulb fails within two (b) Find E(2X + 5).
(c) Find the probability that X> 8. years. (MEI) (c) Find E(X 2 ).
The annual income from money invested in a (d) Find E(X 2 + 2X- 3).
Unit Trust Fund is X per cent of the amount 8. The mass, X kg, of a particular substance
invested, where X has the above distribution. produced per hour in a chemical process is a
Suppose that you have a sum of money to invest continuous random variable whose probability Solution 6.8
and that you are prepared to leave the money density function is given by
invested over a period of several years. State, Sketch of y ~ f(x). y
3x 2
with your reasons, whether you would invest in f(xl~- O<x<2
the Unit Trust Fund or in a Money Bond offering 32
i\(11\: fmn1 dh: ?kctc·h tint
a guaranteed annual income of 8% on the money 3(6-xl
invested. (NEAB) f( xl- -----u- 2<x<6 dwrc i~ IHl 'J'Jlllllcrry.
r
Calculate the mean of X. an hour the profit will exceed £7. (NEAB)
A torch runs on two batteries, both of which 9. A continuous random variable X has the
~ 20
have to be working for the torch to function. If probability density function f defined by 1 2
two new batteries are put in the torch, what is (x + 3x)dx
the probability that the torch will function for at
least 22 hours, on the assumption that the life-
times of the batteries are independent? (0 & C) 3 <x< 4 ~ 210 [~\ 3~'1:
otherwise ~ 2.266 ...
7. A random variable X has a probability density
function f given by where c is a positive constant. Find ~ 2.3 (2 s.f.)
cx(S-xl O~x<S (a} the value of c, (b) E(2X + 5) ~ E(2X) + 5 (Result 3)
f (xl~ 0 otherwise (b) the mean of X,
\ (c) the value, a, for there to be a probability of ~2E(X)+5 (Result 2)
6 . 0.85 that a randomly observed value of X ~ 2(2.266 ... ) + 5
Show that c""- -and fmd the mean of X. will exceed a. (NEAB)
125 ~ 9.533...
~ 9.5 (2 s.f.)
(c) E(X²) = ∫ x² f(x) dx   (over all x)
          = (1/20) ∫_0^4 x²(x + 3) dx
          = (1/20) ∫_0^4 (x³ + 3x²) dx
          = (1/20) [x⁴/4 + x³]_0^4
          = 6.4
(d) E(X² + 2X − 3) = E(X²) + E(2X) − E(3)   (Result 4)
                   = E(X²) + 2E(X) − 3       (Results 1, 2)
                   = 6.4 + 2(2.266...) − 3
                   = 7.933...
                   = 7.9 (2 s.f.)

THE EXPECTATION OF ANY FUNCTION OF X
If g(X) is any function of the continuous random variable X with p.d.f. f(x), then
E(g(X)) = ∫ g(x) f(x) dx   (over all x)
The following results also hold when X is continuous; a and b are constants:
1. E(a) = a
2. E(aX) = aE(X)
3. E(aX + b) = aE(X) + b
4. E(g(X) + h(X)) = E(g(X)) + E(h(X))
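The four results can be checked exactly for a polynomial p.d.f. The sketch below assumes the p.d.f. f(x) = (x + 3)/20 on 0 ≤ x ≤ 4 (the one used in Example 6.8); the helper names `integrate_poly` and `times_x` are illustrative only and do not come from the text.

```python
from fractions import Fraction

def integrate_poly(coeffs, lower, upper):
    """Integrate sum(coeffs[n] * x**n) exactly between lower and upper."""
    def antiderivative(x):
        return sum(Fraction(c) * Fraction(x) ** (n + 1) / (n + 1)
                   for n, c in enumerate(coeffs))
    return antiderivative(upper) - antiderivative(lower)

def times_x(coeffs, power=1):
    """Multiply a polynomial by x**power by shifting its coefficient list."""
    return [0] * power + list(coeffs)

# Assumed p.d.f.: f(x) = 3/20 + x/20 on 0 <= x <= 4 (coefficients in ascending powers).
pdf = [Fraction(3, 20), Fraction(1, 20)]

total = integrate_poly(pdf, 0, 4)                # total area, 1 for a valid p.d.f.
mean = integrate_poly(times_x(pdf, 1), 0, 4)     # E(X)
mean_sq = integrate_poly(times_x(pdf, 2), 0, 4)  # E(X^2)

# Results 1 to 4 reduce the remaining expectations to arithmetic.
e_2x_plus_5 = 2 * mean + 5
e_quadratic = mean_sq + 2 * mean - 3

print(float(mean), float(e_2x_plus_5), float(mean_sq), float(e_quadratic))
```

Working with `Fraction` keeps every value exact, so the rounded answers of the worked solution can be compared against 34/15, 143/15, 32/5 and 119/15 without rounding drift.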
Example 6.9
The mass, X kg, of a particular substance produced in one hour in a chemical process is modelled by a continuous random variable with probability density function given by
f(x) = (3/32)x², 0 ≤ x < 2,
f(x) = (3/32)(6 − x), 2 ≤ x ≤ 6,
f(x) = 0, otherwise.

E(X) = ∫_0^2 (3/32)x³ dx + ∫_2^6 (3/32)(6x − x²) dx
     = (3/32)[x⁴/4]_0^2 + (3/32)[3x² − x³/3]_2^6
     = 3/8 + (3/32)(108 − 72 − (12 − 8/3))
     = 2 7/8
(d) Y = 100X − 20
(e) E(Y) = E(100X − 20)
         = 100E(X) − 20
         = 267 1/2
So the expected profit is £267.50.

Example 6.10
The continuous random variable X has p.d.f. f(x) where
f(x) = (6/7)x, 0 ≤ x ≤ 1
f(x) = (6/7)x(2 − x), 1 ≤ x ≤ 2
f(x) = 0, otherwise
Find E(X²).

Solution 6.10
E(X²) = ∫ x² f(x) dx   (over all x)
      = (6/7) ∫_0^1 x³ dx + (6/7) ∫_1^2 (2x³ − x⁴) dx
      = (6/7)[x⁴/4]_0^1 + (6/7)[x⁴/2 − x⁵/5]_1^2
      = 1.328...
      = 1.3 (2 s.f.)

Var(X) = E(X²) − μ², where μ = E(X).
The standard deviation of X is often written as σ, so σ = √Var(X).
As in the case of the discrete random variable (see page 250) the following results also hold when X is continuous, where a and b are constants:
1. Var(a) = 0
2. Var(aX) = a²Var(X)
3. Var(aX + b) = a²Var(X)
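The variance results can also be checked numerically. The following is a minimal sketch, assuming the p.d.f. f(x) = x/8 on 0 ≤ x ≤ 4 and using Simpson's rule for the integrals; the function names are illustrative and not part of the text.

```python
def simpson(func, lower, upper, steps=1000):
    """Approximate the integral of func on [lower, upper] by Simpson's rule (steps even)."""
    width = (upper - lower) / steps
    total = func(lower) + func(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * func(lower + i * width)
    return total * width / 3

def pdf(x):
    """Assumed p.d.f. for the illustration: f(x) = x/8 on 0 <= x <= 4."""
    return x / 8

def expectation(g):
    """E(g(X)) = integral of g(x) f(x) over the range of X."""
    return simpson(lambda x: g(x) * pdf(x), 0, 4)

def variance(g):
    """Var(g(X)) = E(g(X)^2) - [E(g(X))]^2."""
    return expectation(lambda x: g(x) ** 2) - expectation(g) ** 2

var_x = variance(lambda x: x)
sd_x = var_x ** 0.5
var_3x_plus_2 = variance(lambda x: 3 * x + 2)   # should equal 9 * Var(X)

print(round(var_x, 4), round(sd_x, 4), round(var_3x_plus_2, 4))
```

Simpson's rule is exact for cubic integrands, which is all this p.d.f. produces, so the only error here is floating-point rounding.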
Example 6.11
The continuous random variable X has p.d.f. f(x) where f(x) = (1/8)x, 0 ≤ x ≤ 4. Find
(a) E(X),
(b) E(X²),
(c) Var(X),
(d) σ, the standard deviation of X,
(e) Var(3X + 2).

Solution 6.11
(a) E(X) = ∫ x f(x) dx   (over all x)
         = ∫_0^4 (1/8)x² dx
         = (1/8)[x³/3]_0^4
         = 2.666...
         = 2.7 (2 s.f.)
(b) E(X²) = ∫ x² f(x) dx   (over all x)
          = ∫_0^4 (1/8)x³ dx
          = (1/8)[x⁴/4]_0^4
          = (1/8)(64)
          = 8
(c) Var(X) = E(X²) − E²(X)
           = 8 − (2.666...)²
           = 0.888...
           = 0.89 (2 s.f.)
(d) σ = √Var(X)
      = √0.888...
      = 0.9428...
      = 0.94 (2 s.f.)
(e) Var(3X + 2) = 9 Var(X)   (using variance result 3)
                = 9(0.888...)
                = 8

Example 6.12
As an experiment a temporary roundabout is installed at the crossroads. The time, X minutes, which vehicles have to wait before entering the roundabout has probability density function
f(x) = 0.8 − 0.32x, 0 ≤ x ≤ 2.5
f(x) = 0, otherwise
Find the mean and the standard deviation of X. (AEB)

Solution 6.12
E(X) = ∫ x f(x) dx   (over all x)
     = ∫_0^2.5 (0.8x − 0.32x²) dx
     = [0.8x²/2 − 0.32x³/3]_0^2.5
     = 0.833... minutes
     = 50 seconds
The mean time is 50 seconds.
E(X²) = ∫ x² f(x) dx   (over all x)
      = ∫_0^2.5 (0.8x² − 0.32x³) dx
      = [0.8x³/3 − 0.32x⁴/4]_0^2.5
      = 1.041...
Var(X) = E(X²) − E²(X)
       = 1.041... − (0.833...)²
       = 0.347...
s.d. of X = √0.347...
          = 0.589... minutes
          = 35 seconds (2 s.f.)

THE MODE
The mode is the value of X for which f(x) is greatest in the given range of X.
To locate the mode it is a good idea to draw a sketch. Sometimes the mode can be deduced immediately.
[Three sketches of y = f(x) on 0 ≤ x ≤ 4: mode is 4; mode is 2; mode is 0]
For some probability density functions you will need to determine the maximum point on the curve y = f(x) using the fact that, at a maximum point, f'(x) = 0, where f'(x) = d/dx f(x).
Note that a maximum point is confirmed if f''(x) < 0, where f''(x) = d/dx f'(x).

Example 6.13
X has p.d.f. defined by f(x) = (3/80)(2 + x)(4 − x), for 0 ≤ x ≤ 4 and is illustrated in the diagram.
Find the mode.
[Sketch of y = (3/80)(2 + x)(4 − x) for 0 ≤ x ≤ 4, with the mode marked]

Solution 6.13
f(x) = (3/80)(2 + x)(4 − x) = (3/80)(8 + 2x − x²)
Differentiating, f'(x) = (3/80)(2 − 2x), so f'(x) = 0 when x = 1.
Differentiate again to find f''(x):
f''(x) = (3/80)(−2) = −3/40,
so f''(x) < 0 for all values of x, indicating that there is a maximum point when x = 1.
The mode = 1

Example 6.14
A random variable X has a probability density function
f(x) = Ax(6 − x)², 0 ≤ x ≤ 6
     = 0, elsewhere.
(a) Find the value of the constant A.
(b) Calculate
(i) the mean,
(ii) the mode,
(iii) the variance,
(iv) the standard deviation of X. (AEB)

Solution 6.14
(a) Since X is a random variable, ∫ f(x) dx = 1   (over all x)
∴ 1 = ∫_0^6 Ax(6 − x)² dx
    = A[18x² − 4x³ + x⁴/4]_0^6
    = 108A
  A = 1/108
[Sketch of y = f(x) for 0 ≤ x ≤ 6]
(b) f(x) = (1/108)x(6 − x)², 0 ≤ x ≤ 6
(i) The mean is E(X) where E(X) = ∫ x f(x) dx   (over all x)
    E(X) = (1/108) ∫_0^6 x²(6 − x)² dx
         = 2.4
(ii) To find the mode, find the value of x for which f(x) is a maximum, 0 ≤ x ≤ 6.
    f(x) = (1/108)(36x − 12x² + x³)
    Differentiating,
    f'(x) = (1/108)(36 − 24x + 3x²)
          = (1/36)(6 − x)(2 − x)
    ∴ f'(x) = 0 when x = 2 and when x = 6
    f''(x) = (1/108)(6x − 24)
    To check maximum or minimum, consider f''(x).
    When x = 2, f''(x) < 0 and when x = 6, f''(x) > 0.
    f(x) is maximum when x = 2, so the mode is 2.
(iii) To find the variance of X, first find E(X²).
    E(X²) = ∫ x² f(x) dx   (over all x)
          = (1/108) ∫_0^6 (36x³ − 12x⁴ + x⁵) dx
          = (1/108)[9x⁴ − 12x⁵/5 + x⁶/6]_0^6
          = 7.2
    Var(X) = E(X²) − E²(X) = 7.2 − (2.4)²
           = 1.44
(iv) Standard deviation of X = √Var(X)
                             = √1.44
                             = 1.2

Example 6.15
The time taken to perform a particular task, t hours, has the probability density function
f(t) = 10ct², 0 ≤ t < 0.6
f(t) = 9c(1 − t), 0.6 ≤ t ≤ 1.0
f(t) = 0, otherwise,
where c is a constant.
(a) Find the value of c and sketch the graph of this distribution.
(b) Write down the most likely time.
(c) Find the expected time.
(d) Determine the probability that the time will be
(i) more than 48 minutes,
(ii) between 24 and 48 minutes.

Solution 6.15
(a) 1 = ∫ f(t) dt   (over all t)
      = 10c ∫_0^0.6 t² dt + 9c ∫_0.6^1.0 (1 − t) dt
      = (10c/3)[t³]_0^0.6 + 9c[t − t²/2]_0.6^1.0
      = 0.72c + 0.72c
      = 1.44c
    c = 1/1.44 = 100/144 = 25/36
The probability density function is
f(t) = (125/18)t², 0 ≤ t < 0.6
f(t) = (25/4)(1 − t), 0.6 ≤ t ≤ 1.0
f(t) = 0, otherwise
[Sketch of f(t), with maximum value 2.5 at t = 0.6]
(b) From the sketch, t = 0.6 gives the maximum value of f(t).
Therefore the mode is 0.6 hours = 36 minutes.
The most likely time is 36 minutes.
(c) E(T) = ∫ t f(t) dt   (over all t)
         = 10c ∫_0^0.6 t³ dt + 9c ∫_0.6^1.0 (t − t²) dt
         = 10c[t⁴/4]_0^0.6 + 9c[t²/2 − t³/3]_0.6^1.0
         = 0.225 + 0.366...
         = 0.591... hours
         = 35.5 minutes
The expected time is 35.5 minutes.
(d) (i) 48 minutes = 0.8 hours
    P(T > 0.8) = 9c ∫_0.8^1.0 (1 − t) dt
               = 9c[t − t²/2]_0.8^1.0
               = 0.125
    The probability that the time will be more than 48 minutes is 0.125.
(ii) 24 minutes = 0.4 hours
    P(T < 0.4) = 10c ∫_0^0.4 t² dt
               = (10c/3)[t³]_0^0.4
               = 0.1481...
    P(0.4 < T < 0.8) = 1 − 0.125 − 0.1481...
                     = 0.727 (3 s.f.)
    The probability that the time will be between 24 and 48 minutes is 0.727.

6c Standard deviation and variance
In Questions 1-7 find
(a) E(X), (b) E(X²), (c) Var(X), (d) the standard deviation of X.
It is assumed that the value of the function is zero outside the range(s) stated. Do not forget to look for symmetry when considering E(X).
NOTE: some of these functions were given in Exercise 6a and you may wish to refer to your previous sketches.
1. f(x) = (3/8)x², 0 ≤ x ≤ 2
2. f(x) = 1/5, −2 ≤ x ≤ 3
3. f(x) = (1/4)(4 − x), 1 ≤ x ≤ 3
4. f(x) = (3/56)(x + 2)², 0 ≤ x ≤ 2
5. f(x) = 4x³, 0 ≤ x ≤ 1
6. f(x) = 1/4, 0 ≤ x ≤ 2
   f(x) = (1/4)(2x − 3), 2 ≤ x ≤ 3
7. f(x) = (1/8)(x + 2)², −2 ≤ x ≤ 0
   f(x) = 1/2, 0 ≤ x ≤ 1 1/3
   f(x) = 0, otherwise
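The calculus route to the mode (solve f'(x) = 0, then check the sign of f''(x)) can be mirrored in a few lines of code. This is a sketch only, assuming the cubic p.d.f. f(x) = x(6 − x)²/108 on 0 ≤ x ≤ 6 and assuming the maximum lies at an interior stationary point; end points of the range are not examined, so a sketch of the curve is still the safer first step.

```python
import math

# Assumed p.d.f.: f(x) = (36x - 12x^2 + x^3)/108 on 0 <= x <= 6, ascending powers of x.
COEFFS = [0, 36 / 108, -12 / 108, 1 / 108]

def derivative(coeffs):
    """Differentiate a polynomial given by its ascending coefficient list."""
    return [n * c for n, c in enumerate(coeffs)][1:]

def evaluate(coeffs, x):
    """Evaluate the polynomial at x."""
    return sum(c * x ** n for n, c in enumerate(coeffs))

first = derivative(COEFFS)    # f'(x), a quadratic
second = derivative(first)    # f''(x), linear

# Solve f'(x) = 0 with the quadratic formula.
c0, c1, c2 = first
root_disc = math.sqrt(c1 ** 2 - 4 * c2 * c0)
stationary = sorted([(-c1 - root_disc) / (2 * c2), (-c1 + root_disc) / (2 * c2)])

# A stationary point is a maximum when f''(x) < 0; the mode is the highest such point.
maxima = [x for x in stationary if evaluate(second, x) < 0]
mode = max(maxima, key=lambda x: evaluate(COEFFS, x))

print(stationary, mode)
```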
THE CUMULATIVE DISTRIBUTION FUNCTION, F(x)
In Chapter 4 (page 253) you met the idea of a cumulative distribution function, F(x), for a discrete random variable and in Chapter 5 (pages 283 and 294) you used cumulative probability tables giving F(r) = P(X ≤ r) for binomial and Poisson distributions.
In the same way, if X is a continuous random variable with p.d.f. f(x), you can find the cumulative distribution function F(x).
For a particular value, t, in the range of the function,
F(t) = P(X ≤ t) = ∫_{−∞}^t f(x) dx
The lower limit is given as −∞, but in practice it is the smallest possible value of X in the range for which x is valid.
So if f(x) is valid in the range a ≤ x ≤ b, then
F(t) = ∫_a^t f(x) dx
Notice that
F(b) = P(X ≤ b) = ∫_a^b f(x) dx = 1

Finding the median, quartiles and other percentiles
The median is the value 50% of the way through the distribution. It splits the area under the curve y = f(x) into two halves. If m is the median, then for f(x) defined for a ≤ x ≤ b,
∫_a^m f(x) dx = 0.5, i.e. F(m) = 0.5
The lower quartile, q₁, is the value 25% of the way through the distribution, so F(q₁) = 0.25.
The upper quartile, q₃, is the value 75% of the way through the distribution, so
∫_a^q₃ f(x) dx = 0.75, i.e. F(q₃) = 0.75
Similarly for other percentiles, for example
F(10th percentile) = 0.1 and F(35th percentile) = 0.35.

(b) P(0.3 ≤ X ≤ 1.8) = F(1.8) − F(0.3)
    F(1.8) = 1.8²/16 = 0.2025
    F(0.3) = 0.3²/16 = 0.005625
    P(0.3 ≤ X ≤ 1.8) = 0.2025 − 0.005625 = 0.197 (3 d.p.)
(c) For the median m , Y Sketch of y "" f{x)
y=-~ + 2
=l~:I
_H4) X y
16 0 ~X~ 2 2 3
3 3
r=} )
2x
i" f(x) ~ --+2 2 ~X~ 3
3
16
0 otherwise 0 2
tl 0 I t 2 3 ' 6
6
x2 y
F(1) = ~
So, for 0 < x < 2, F(x) =6 P(1 <X< 2.5) = F(2.5)- F(l)
11 1
NOTE: F(2)=t=i
12 6
0 2 = 0.75
'1
F(m)=G
=F(2)+[- x; +2xI So
m2
6=0.5
0
=~+{- ~ +2t-(-i+4)1
m 2 =3
i;' ~, m = 1.73 (2 d.p.)
tl --;'
' h
=--+2t-2
3 ··~ Cl
:::::,\___, Cumulative clistri on
Writing the answer as a general formula in terms of x,
Sketch of y = F(x) 1. The random variable X has probability density 3. The random variable X has probability density
x2 function function
0 <x< 2 y
y =
x' 2x-2
-3+
6 =I f(x)=ix'. O<:x<:2 f(x)=k, 1 <:x<; 6
x2 Find (a) Find k.
F(x)= --+2x-2 2<x<3
3 (b) Find the cumulative distribution function
'3 (a) the cumulative distribution function, F(x)
F(x).
x>3 x' and draw a sketch of y = F(x),
1 1
y=6 (b) the median, m. (c) Find the 20th percentile.
3 (d) Find the interquartile range.
2. The random variable X has probability density )
0 -~
function 4. The random variable X has probability density
0 2 3 ' function
f(x)=h4-x), 1 <:x<; 3
0 <x< 2
Find f(x)=j;(2x-3) 2 <x< 3
(a) the cumulative distribution function F(x),
(b) P(l.S <X< 2). Find
(a) the cumulative distribution function F(x),
(b) the median m.
I
15. A continuous random variable, X, has 16. The continuous random variable X has
5. The random variable X has cumulative 11. A factory is supplied with flour at the beginning probability density function given by
of each week. The weekly demand, X thousand probability density function given by
distribution function tonnes for flour from this factory is a
~ ~~
f(x) ~ax- bx' for Oo;;;:xo;;;:2
0 X< 0 contin~ous random variable having the forl<xo;;;:9,
~o elsewhere
probability density function f(x)
F(x)~ x 4 0 <:x<: 1 Observations on X indicate that the mean is 1. otherwise
{1 X~ 1 f(x)~k(1-x)', 0 "x" 1 (a) Obtain two simultaneous equations for a
where k is a constant. Giving your answers
f(x) ~ o, elsewhere and b, show that a= 1.5 and find the value
Find correct to three significant figures where
of b.
Find appropriate, find
(a) P(0.3 <X< 0.6), (b) Find the variance of X.
(b) the median m, (a) the value of k, (c) If F(x) is the probability that X~ x find F(x) (a) the value of k, and also the median value
(c) the value of a such that P(X >a)~ 0.4. (b) the mean value of X, . and verify that F(2) ~ 1. of X,
(c) the variance of X, to three dec1mal places. (d) If two independent observations are made (b) the mean and variance of X,
6. The continuous random variable X has p.d.f. on X what is the probability that at least (c) the cumulative distribution function, F, of
Sketch the probability density functi?n. X, and sketch the graph of y ~ F(x). (C)
f(x) ~ j, 0 <; x <: 3. Find Find, to the nearest tonne, the quantity of flour
one of them is less than!?
(a) E(X), (b) Var(X), that the factory should have in stock at ~he
(c) F(x) and sketchy~ F(x), (d) P(X ~ 1.8), beginning of a week in order that there IS a
(e) P(1.1<:X"1.7). probability of 0.98 that the demand in that week OBTAINING THE P.D.F., f(x), FROM THE CUMULATIVE DISTRIBUTION
will be met. (L)
7. X is the continuous random variable with pk.d.fd. FUNCTION f(x)
f(x) = kx 2 , 1.;;;: x.;;;: 2. Find (a) the co?-st~nt an 12. A continuous random variable X has probability
sketchy= f(x), (b) the standard dev1.at10n a, density function, f, defined by Since F can be obtained by integrating f, it follows that f can be obtained by differentiating F.
(c) the cumulative distribution funct10n F(x),
(d) the median, m. f(x)~~. 0 "x" 1 d
x' 1 o;;;:xo;;;: 2 dx
f<xJ~
8. The continuous random variable X has
probability density function f given by
5.
f(x) ~ 0, otherwise
NOTE: the gradient of the F(x) curve gives the value of f(x).
k(4-x 2) forO"x"2
Obtain the distribution function and hence, or
f(x) ~ ( 0 otherwise
otherwise, find, to three decimal places, the
median and the interquartile range of the Example 6.18
where k is a constant. Show that k = -f6 and find
the values of E(X) and Var(X). distribution (L)
The continuous random variable X has cumulative distribution function F(x) where
Find the cumulative distribution function of X,
and verify by calculation that the median value 13. The continuous random variable X has 0 x~O
J
of X is between 0.69 and 0. 70 probability density function f given by
x' _)~~~~~-~-
Find also P(0.69 <X< 0.70), giving your answer -3o;;;:xo;;;:3 F(x)~ 0 ~X~ 3
k(x + 3),
correct to one significant figure. (C) f(x) ~ 27
otherwise
\ 0'
9. The continuous random variable X has 1 x)3
where k is a constant.
continUous p.d.f. f(x) where 0 3 '
(a) Showthatk=fs.
X 2 (b) Find E(X) and Var(X). (a) State the range of values for which the probability density function f(x) is valid.
2"x"3 (c) Find the lower quartile of X, i.e. the value q
3 3 (b) Find f(x) and illustrate it in a sketch. · ·
3 ,;;;xo;;;: 5 such that P(X" q) ~ ~·
f(x) ~ a (d) Let Y =aX+ b, where a and bare constants
2 -f3x 5 o;;;:x,;;; 6 with a > 0. Find the values of a and b for Solution 6.18
\
otherwise which E(Y) ~ 0 and Var(Y) ~ 1. (C)
0
(a) Since F(x) is unchanging in the regions x ( 0 and x ;> 3 it follows that f(x) must be zero for
Find (a) a and j3, (b) F(x) and sketchy~ F(x), 14. The continuous random variable, X, has x < 0 and x) 3.
(c) P(2 <;X" 3.5), (d) P(X ~ 5.5). probability density function defined by
So f(x) is valid for 0 ( x ( 3 and f(x) ~ 0 otherwise.
10. The continuous random variable X has ~kx, 0 ,;;;xo;;;: 8
d
probability density function f(x) ~ l~k, 8 <xo;;;: 9 (b) f(x) ~ dx F(x)
p+x otherwise
~ ~ (~;)
1o;;;:xo;;;:3
f(x)~\o 6 where k is a constant.
otherwise
(a) Sketch the graph of f(x).
(a) Sketch the probability density function of X. (b) Show that k ~ 0.025. 3x 2
(c) Determine, for all x, the distribution
(b) Calculate the mean of X. 27
(c) Specify fully the cumulative distribution function F(x). d x2
(d) Calculate the probability that an obse;;~AB)
function of X.
(d) FindmsuchthatP(X"m)~1. (L) value of X exceeds 6. ( 9
The p.d.f. for X is f(x) where
f(x) = x²/9, 0 ≤ x ≤ 3
f(x) = 0, otherwise
[Sketch of y = f(x) for 0 ≤ x ≤ 3]

Example 6.19
The continuous random variable X has cumulative distribution function F(x) as shown in the sketch.
F(x) = 0, x < −2
F(x) = (1/12)(2 + x), −2 ≤ x < 0
F(x) = (1/6)(1 + x), 0 ≤ x < 4
F(x) = (1/12)(6 + x), 4 ≤ x < 6
F(x) = 1, x ≥ 6
[Sketch of y = F(x) for −2 ≤ x ≤ 6]
(a) Find the p.d.f. of X, f(x), and sketch y = f(x).
(b) Find E(X).

Solution 6.19
(a) Since F(x) is unchanging for x < −2 and x ≥ 6, it follows that f(x) must be zero for x < −2 and x ≥ 6.
Since f(x) = d/dx F(x),
for −2 ≤ x < 0, f(x) = d/dx (1/12)(2 + x) = 1/12
for 0 ≤ x < 4, f(x) = d/dx (1/6)(1 + x) = 1/6
for 4 ≤ x < 6, f(x) = d/dx (1/12)(6 + x) = 1/12
[Sketch of y = f(x): height 1/12 for −2 ≤ x < 0, 1/6 for 0 ≤ x < 4, 1/12 for 4 ≤ x < 6]
(b) The sketch of y = f(x) is symmetrical about x = 2, so by symmetry E(X) = 2.

Example 6.20
The continuous random variable X has cumulative distribution function given by
F(x) = 0, x < 0
F(x) = 2x − x², 0 ≤ x ≤ 1
F(x) = 1, x > 1
(a) Show that P(X < 1/2) = 3/4.
(b) Find the interquartile range of X. (C)

Solution 6.20
(a) P(X < 1/2) = F(1/2) = 2 × 1/2 − (1/2)² = 0.75
(b) To find the interquartile range, you need to find the upper quartile and lower quartile.
Upper quartile q₃ is such that F(q₃) = 0.75.
From (a), F(1/2) = 0.75
∴ q₃ = 1/2
Lower quartile q₁ is such that F(q₁) = 0.25
F(q₁) = 2q₁ − q₁²
∴ 2q₁ − q₁² = 0.25
q₁² − 2q₁ + 0.25 = 0
(q₁ − 1)² − 1 + 0.25 = 0
(q₁ − 1)² = 0.75
q₁ − 1 = ±√0.75
So q₁ = 1 + √0.75 or q₁ = 1 − √0.75
Since F(x) is unchanging for x > 1, f(x) = 0 for x > 1.
So 1 + √0.75 is outside the range of f(x).
∴ q₁ = 1 − √0.75 = 0.1339...
Interquartile range = q₃ − q₁
                    = 0.5 − 0.1339...
                    = 0.37 (2 s.f.)
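Both directions of the relationship between F and f can be illustrated numerically. The sketch below assumes the cumulative distribution function F(x) = 2x − x² on 0 ≤ x ≤ 1; it estimates f(x) as the gradient of F by a central difference and solves F(x) = p by bisection. The function names and the step size `h` are illustrative choices, not taken from the text.

```python
def cdf(x):
    """Assumed c.d.f. for the illustration: F(x) = 2x - x^2 on 0 <= x <= 1."""
    if x < 0:
        return 0.0
    if x > 1:
        return 1.0
    return 2 * x - x * x

def pdf_from_cdf(x, h=1e-6):
    """Estimate f(x) = F'(x), the gradient of the F(x) curve, by a central difference."""
    return (cdf(x + h) - cdf(x - h)) / (2 * h)

def percentile(p, lower=0.0, upper=1.0, tol=1e-12):
    """Solve F(x) = p by bisection, relying on F being non-decreasing."""
    while upper - lower > tol:
        mid = (lower + upper) / 2
        if cdf(mid) < p:
            lower = mid
        else:
            upper = mid
    return (lower + upper) / 2

q1 = percentile(0.25)
q3 = percentile(0.75)
iqr = q3 - q1

print(round(pdf_from_cdf(0.5), 4), round(q1, 4), round(q3, 4), round(iqr, 2))
```

Bisection sidesteps the choice between the two roots of the quadratic, because the search is confined to the range where f(x) is non-zero.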
:==t 1. The cumulative distribution function of X is
' ' given by
-2 0 4 6 '
0 2 4 6 '
F(x) ~ \~
3 0 <x< 1
1 x;;;:.7 k=0.2
f(x) = 0.2, 1 ....;;x....;; 6
x>1 Find
(a) the p.d.f. f(x) and sketch it, 0 6 '
Find (b) E(X) X is said to follow a continuous uniform, or rectangular, distribution between 1 and 6.
(a) the median, (c) Var(X),
(d) the median of X, This can be written X - R(1, 6).
(b) the mean. . . .
3. The cumulative distribution functiOn of X 1s (e) P(2.8 <:X" 5.2).
In general:
given by
6. The continuous random variable X has
x<O The probability rtnrmh m
~ \~-
(cumulative) distribution function given by
0 <x< 2 the range a < x < b is
F(x) kx' 1+X
-1 <x< 0 1
y f(x) =If-
x;;;:.2 8
f -- - §l.
b-a
''
Find 1 + 3x ''
F(x) ~ 0 <x< 2
(a) the value of k, . 8
··rhi':> Is \Vrittcn X" b.! ''
(b) the probability density functlon f(x), 5+x '
(c) the median of X, 2 <x< 3
8 a and b arc kno\vn as the parameters of the distribution. 0 a b
(d) the variance of X.
where F(x) = 0 for x < -1, and F(x) = 1 for x > 3.
NOTE: It is easy to see from the diagram that the total area is 1.
4. The continuous random variable X has (a) Sketch the graph of the probability density
cumulative distribution function F(x) where
function f(x).
0 x<O (b) Determine the expectation of X and the 1
2x
variance of X. I1 Area= (b -a) x - -
(b -a)
F(x)~
3
:._+k
0
1
<x<
<x< 2
1 (c) Determine P(3 <: 2X" 5).
Example 6.22
The lengths of metal rods are measured to the nearest 5 mm. What is the distribution of the random variable E, the rounding error made when measuring? Give its probability density function f(e).

Solution 6.22
The error is the difference between the true length and the recorded length after rounding to the nearest 5 mm.
Suppose you have recorded a length to be 75 mm, to the nearest 5 mm. The true length, l, could have been any length in the interval
72.5 mm ≤ l < 77.5 mm
So the error, E, could be anywhere in the interval −2.5 ≤ E < 2.5.
All points in this interval are equally likely 'stopping places' for E, so E is uniformly distributed in the interval, i.e.
E ~ R(−2.5, 2.5)
f(e) = 1/(2.5 − (−2.5)) = 1/5, −2.5 ≤ e ≤ 2.5
[Sketch of f(e): height 1/5 between e = −2.5 and e = 2.5]

Example 6.23
Rosie spins a 'Spinning Jenny' at a fair. When the wheel stops, the shorter distance of an arrow measured along the circumference from Rosie is denoted by C. What is the distribution of C?

Solution 6.23
All the points on the circumference are equally likely stopping places for the arrow, so C is uniformly distributed between 0 (when the arrow is next to Rosie) and πr (when the arrow is diametrically opposite Rosie), where r is the radius of the wheel.
So C ~ R(0, πr)
f(c) = 1/(πr − 0) = 1/(πr), 0 ≤ c ≤ πr
[Sketch of f(c): height 1/(πr) between c = 0 and c = πr]

Example 6.24
The error, in grams, made by a greengrocer's scales may be modelled by the random variable X, with probability density function
f(x) = 0.1, −3 ≤ x ≤ 7
f(x) = 0, otherwise.
Find the probability that
(a) an error is positive,
(b) the magnitude of an error exceeds 2 grams (i.e. |X| > 2),
(c) the magnitude of an error is less than 4 grams (i.e. |X| < 4). (AEB)

Solution 6.24
(a) P(X > 0) = 7 × 0.1 = 0.7
(b) P(|X| > 2) = 1 − P(|X| < 2)
               = 1 − P(−2 < X < 2)
               = 1 − 4 × 0.1
               = 0.6
(c) P(|X| < 4) = P(−4 < X < 4)
Since f(x) = 0 when x < −3, find P(−3 < X < 4).
P(−3 < X < 4) = 7 × 0.1
              = 0.7
So P(|X| < 4) = 0.7

EXPECTATION AND VARIANCE OF THE UNIFORM DISTRIBUTION

Example 6.25
The continuous random variable Y has a rectangular distribution
f(y) = 1/π, −π/2 ≤ y ≤ π/2
f(y) = 0, otherwise
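(b) To find Var(Y), find E(Y²) first.
E(Y²) = ∫ y² f(y) dy   (over all y)
      = (1/π) ∫_{−π/2}^{π/2} y² dy
      = (1/π)[y³/3]_{−π/2}^{π/2}
      = π²/12
Var(Y) = E(Y²) − E²(Y)
       = π²/12 − 0
       = π²/12
The variance of Y is π²/12.

It is possible to write the mean and the variance of a uniform distribution in general formulae.
If the continuous variable X is uniformly distributed over the interval (a, b), then
X ~ R(a, b), f(x) = 1/(b − a)
By symmetry, E(X) = (1/2)(a + b)
It can also be shown that Var(X) = (1/12)(b − a)²

For 5 ≤ t ≤ 9,
F(t) = ∫_5^t (1/4) dx
     = [x/4]_5^t
     = t/4 − 5/4
     = (t − 5)/4
Alternatively, from the area of the rectangle, F(t) = (t − 5) × 1/4 = (t − 5)/4.
So F(x) = 0, x < 5
   F(x) = (x − 5)/4, 5 ≤ x ≤ 9
   F(x) = 1, x ≥ 9
In general, for X ~ R(a, b),
   F(x) = 0, x < a
   F(x) = (x − a)/(b − a), a ≤ x ≤ b
   F(x) = 1, x > b
F(x) can be illustrated diagrammatically.
[Sketches of y = f(x), of height 1/(b − a) between a and b, and of y = F(x), rising from 0 at x = a to 1 at x = b]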
(a)
0
the value of k,
k
7. The continuous random variable Xb is uniformly
distributed in the interval a< x < . .
The lower quartile is 5 and the upper quarttle
is 9.
Find
"
Iallx
r
illustrated as follows
y
F0<l
F(t) = f(x )dx for a< t <b.
To obtain f(x) from F(x), differentiate F(x)
o+-~--------~~~, d
0 0.5 3 f(x) = dx F(x) = F'(x).
-2 3
If two independent observations of X are made,
find the probability that one is less than 1.5 and (a) Find the probability density function f(x). Median, quartiles and other percentiles
the other is greater than the mean. (b) Find the standard deviation of X. Medianm:
(c) Find the interquartile range.
F(m) =0.5
5. The random variable Y has probability density (d) Find the 20th percentile Lower quartile q 1: F(q 1 ) = 0.25
function given by
Upper quartile q 3 : F(q 3 ) = 0.75
f0.2 32<y<37
f(y) ~ )0 otherwise n
nth percentile F(nth percentile)=-
Find the probability that Y lies within one 100
standard deviation of the mean. Interquartile range = q 3 - q 1
• The continuous uniform (rectangular) distribution
If f(x) = 1/(b − a), a ≤ x ≤ b, then X ~ R(a, b)
E(X) = (1/2)(a + b)
Var(X) = (1/12)(b − a)²
F(x) = 0, x ≤ a
F(x) = (x − a)/(b − a), a ≤ x ≤ b
F(x) = 1, x ≥ b
[Sketches of y = f(x), of height 1/(b − a) between a and b, and of y = F(x)]
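For a uniform distribution the percentile equation F(x) = n/100 can be solved directly, since (x − a)/(b − a) = n/100 gives x = a + (n/100)(b − a). A minimal sketch, assuming X ~ R(5, 9) purely as an illustration; `uniform_percentile` is a hypothetical helper name.

```python
def uniform_percentile(n, a, b):
    """Value x with F(x) = n/100 for X ~ R(a, b)."""
    return a + (n / 100) * (b - a)

a, b = 5, 9   # assumed example: X ~ R(5, 9)
median = uniform_percentile(50, a, b)
q1 = uniform_percentile(25, a, b)
q3 = uniform_percentile(75, a, b)
iqr = q3 - q1
p20 = uniform_percentile(20, a, b)

print(median, q1, q3, iqr, p20)
```

The interquartile range of any uniform distribution is half its width, (b − a)/2, which the example reproduces.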
Miscellaneous worked examples

Example 6.27
The random variable X has probability density function
f(x) = 3x^k, 0 ≤ x ≤ 1,
f(x) = 0, otherwise,
where k is a positive integer.
Find
(a) the value of k,
(b) the mean of X,
(c) the value, x, such that P(X ≤ x) = 0.5.

Solution 6.27
(a) ∫ f(x) dx = 1   (over all x)
    ∫_0^1 3x^k dx = 1
    [3x^(k+1)/(k + 1)]_0^1 = 1
    3/(k + 1) = 1
    k = 2
So f(x) = 3x², 0 ≤ x ≤ 1
   f(x) = 0, otherwise
(b) E(X) = ∫ x f(x) dx   (over all x)
         = ∫_0^1 3x³ dx
         = (3/4)[x⁴]_0^1
         = 0.75
The mean of X is 0.75.
(c) Let P(X ≤ x₁) = 0.5
Therefore ∫_0^x₁ 3x² dx = 0.5
[x³]_0^x₁ = 0.5
x₁³ = 0.5
x₁ = (0.5)^(1/3)
   = 0.794 (3 d.p.)
So x = 0.794 (3 d.p.)

Example 6.28
The lengths of blades of grass mown from a lawn are modelled by a uniform distribution between 1 cm and 5 cm.
(a) Find the standard deviation of this distribution.
(b) Find the percentage of blades of grass whose lengths lie within one standard deviation of the mean length.
(c) A better model may be a triangular distribution as shown.
[Sketch of f(x) between x = 1 and x = 5, with maximum value c]

Solution 6.28
(a) X ~ R(1, 5), so Var(X) = (1/12)(5 − 1)² = 4/3
    Standard deviation = √(4/3) = 1.15 (2 d.p.)
(b) The mean is 3 and f(x) = 1/4 for 1 ≤ x ≤ 5, so
    P(3 − √(4/3) < X < 3 + √(4/3)) = 2√(4/3) × 1/4
                                   = 0.577...
    [Sketch of f(x) with the region between 3 − √(4/3) and 3 + √(4/3) shaded]
So approximately 58% of blades of grass have length within one standard deviation of the mean.
(c) Total area = 1
    Area of rectangle = 4 × 1/8 = 1/2
    Area of triangle = 1/2 × 4 × h = 2h
    1/2 + 2h = 1
    h = 1/4
    c = h + 1/8 = 1/4 + 1/8 = 3/8
Example 6.29
Miscellaneous exercise 6g
On any day, the amount of time, measured in hours, that Mr Goggle spends watching 1. A continuous random variable X has a 5. The continuous random variable X has
television is a continuous random variable T, with cumulative distribution function g1ven by probability density function, f, defined by probability density function f(x) defined by
f(x)~jx O<x<2, e
(x < -1)
t ~ 0
~ ~~- k(15- t)
x4
!
f(x) = 0 otherwise.
f(x)~ e~2-x )
2 2
F(t) 0 q,;, 15 (-1 <x~ 1)
Find the expected value of
t;;, 15 (x > 1)
(a) X, x4
(b) 2X+4 (NEAB)
where k is a constant. (a) Show that c = l·
(b) Sketch the graph of f(x).
(a) Show that k ~ 2h and find P(5,;, T,;, 10). 2. (a) A continuous variable X is distributed at
(c) Determine the cumulative distribution
(h) Show that, for 0,;, t,;, 15, the probability density function ofT is given by random between the values 2 and 3 and has
6 function F(x).
a probability density function of 2 . (d) Determine the expected value of X and the
f(t) ~ fs- zis t. X variance of X. (C)
(C) Find the median value of X.
(c) Find the median ofT. 6. A continuous variable X is distributed at random
(b) A continuous random variable X takes
values between 0 and 1, with a probability between the values x = 0 and x = 2, and has a
density function of Ax(1- x) 3• Find the probability density function of ax 1 + bx. The
Solution 6.29 value of A, and the mean and standard mean is 1.25.
(a) When t ~ 0, F(t) ~ 0 deviation of X. (a) Show that b =~'and find the value of a.
(b) Find the variance of X.
Using F(t) ~ 1- k(15- t) 2 , 3. A continuous random variable X has probability (c) Verify that the median value of X is
when t~ 0 density function f(x) given by f(x) = 0 for x < 0 approximately 1.3.
0 ~ 1- k X 15 2 and x > 3 and between x = 0 and x = 3 its form is (d) Find the mode.
as shown in the graph.
7. The continuous random variable X has
f(X)
P(5,;, y,;, 10) ~ F(10)- F(5) probability density function given by
~ 1- 2) 5 (15 -10)"- (1- 1, (15- 5) 2) ex 2 0 .;;;;x.;;;; 2
l
2
l
ex, 0 <x< 1
The cost of extracting the metal from 10 kg of
m ~ 15- '1/112.5 ore is £10x. Find the expected cost of extracting f(x)~ c(2-x) 1 .;;;;x< 2
~4.393... the metal from 10 kg of ore. (MEl) 0, otherwise
18. A random variable X has cumulative 19. (a) A discrete random variable R takes integer Mixed test 6B
values between 0 and 4 inclusive with
(distribution) function F(x) where
probabilities given by 1. The continuous random variable X has 3. A firm has a large number of employees. The
0 X< -1 probability density function given by distance in miles they have to travel each day
r+ 1
ax+a -1 <x< 0 (r ~ 0, 1, 2) from home to work can be modelled by a
F(x)~
2ax+a 0 <;;;;x< 1 P(R ~ r) ~
10 f(x)~('otx' 0<;x~2 continuous random variable X whose cumulative
9-2r otherwise. distribution function is given by
1~x (r~3,4)
\ 3a Sketch the graph of f.
\ 10 (a) F(l) ~ 0
Determine (b) Calculate the mean of X.
Find the expectation and variance of R.
(a) the value of a, (c) Calculate the standard deviation of X. F(xH(1-M 1 <;x~b
(b) A continuous random variable X takes
{d) Show why the median value of X must be
(b) the frequency function f(x) of X, values in the interval x > 0. The probability
(c) the expected value ,a of X, greater than the mean. (NEAB) F(b) ~ 1
density function of X is defined by
(d) the standard deviation a of X,
2. The random variable X has probability density where b represents the farthest distance anyone
~ {k:
(e) the probability that l X -Jl l exceeds~· (C) if 0 <;;;;X<;;;; 1
lives from work.
function
f(x) The diagram shows a sketch of this cumulative
if X> 1
x' f(x) ~ ~~(x- x') O<;x~l distribution function.
otherwise
Prove that k = ~ and find the expectation F(x)
1. A survey of 491 households, in part of the 2. The random variable X has a probability density
Midlands, gave the following results for gross function given by o+-4-------~--+
weekly income, £y. 0 <;;;;x< 1 0 b
(a) Draw a histogram on graph paper to (a) Find the cumulative distribution function
illustrate these data. Label your scales and of W.
axes clearly. (b) Find, to three decimal places, the probability
that the family eats between 2 kg and 4 kg
A statistician suggests that a suitable model for of vegetables in one week.
the gross weekly income in £100 units is the (c) Given that the mean of the distribution is
continuous random variable X with probability 3l, find, to three decimal places, the
density function
variance of W.
3k O~x<4 (d) Find the mode of the distribution.
(e) Verify that the amount, m, of vegetables
f(x)~k 4~x<:8
such that the family is equally likely to eat
{0 otherwise more or less than m in any week is about
where k is a constant. 3.431 kg.
(f) Use the information above to comment on
(b) Find the value of k.
the skewness of the distribution. (L)
(c) Use this model to estimate how many of
these 491 households have a gross weekly
income in the range £0-£130.
(d) Comment on your findings. (L)
In this chapter you will learn how to
• standardise a normal variable and use standard normal tables
• use the normal distribution as a model to solve problems
• use the normal distribution as an approximation to the binomial distribution and to the Poisson distribution

The normal distribution is one of the most important distributions in statistics. Many measured quantities in the natural sciences follow a normal distribution and under certain circumstances it is also a useful approximation to the binomial distribution and to the Poisson distribution.
The normal variable X is continuous. Its probability density function f(x) depends on its mean μ and standard deviation σ, where
f(x) = (1/(σ√(2π))) e^(−(x − μ)²/(2σ²)), −∞ < x < ∞.
This is very complicated and has been included just for reference. You would not be expected to remember it!
To describe the distribution, write X ~ N(μ, σ²).

Notice also that
• approximately 95% of the distribution lies within two standard deviations of the mean
• approximately 99.7% (very nearly all) of the distribution lies within three standard deviations of the mean
The spread of the distribution depends on σ. Here are some normal curves, each drawn to the same scale:
[Three normal curves: X ~ N(0, 1) with μ = 0, σ = 1; X ~ N(4, 1/4) with μ = 4, σ = 1/2; X ~ N(50, 4) with μ = 50, σ = 2]

FINDING PROBABILITIES
The probability that X lies between a and b is written P(a < X < b). To find this probability, you need to find the area under the normal curve between a and b.
One way of finding areas is to integrate, but since the normal function is complicated and very difficult to integrate, tables are used instead.
[Sketch of a normal curve with the area between a and b shaded]
Th e maximum va 1ue o f f(x) 1·s -{2;; '
I ; Ci ; ADD
2 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 O.S359 4 8 12 16 20 24 28 32 36
the diagram below. 1,1 ,, ' 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 10.56361 0.5675 0.5714 0.5753 4 8 12 16 20 24 28 32 36
0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141 4 8 12 15 19 23 27 31 35
Now translate tbe curve 50 units to the left so that the mean is 0. This is shown on the left (b) 0.6179 0.6217
hand section of the diagram. The standard deviation a is still 2, so the max1mum value 1s
'),_ 0.6255 0.6293 10.63311 0.6368 0.6404 0.6443 0.6480 0.6517 4 7 11 15 ll2l 22 26 30 34
(c) 0.6554 0.6591 10.66281 0.6664 0.6700 0.6736 0.6772 0.6808 0.6884 0.6879 4 7 11 14 18 22 25 291m
again approximately 0.2. 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224 3 7 10 14 17 20 24 27 31
0.4
f(x)
"''
0.4
X~ N(50, 41 "·"
0.7257
0.7580
0.7291
0.7611
0.7324
0.7642
0.7357
0.7673
0.7389
0.7704
0.7422
0.7734
0.7454
0.7764
0.7486
0.7794
0.7517
0.7823
0.7549
0.7852
3 7 10 13 16 19 23
3 6 9 12 15 18 21
26
24
29
27
,·I.~ 0.7881
X- 50 - NIO, 41 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133 3 5 8 11 14 16 19 22 25
Translate 50 units i) '/ 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389 3 5 8 10 13 15 18 20 23
(a) To find P(Z < 0.16), read off the value of <1>(0.16):
~ ~ ~ 0 2 4 6 M ~ q ~ ~ M M
" find row 0.1 and go across to column 6. This gives 0.5636.
P(Z < 0.16) ~ 0.5636 Q>(z) = 0.5636
Now 'squash' the curve towards the vertical axis so that the standard deviation is 1. This is
done by dividing by the standard deviation (a~ 2). (b) To find P(Z < 0.345), read off the value of <1>(0.345):
I
e Find the value when z ~ 0.34 from row 0.3, column 4.
f(Z) 00.16
This is 0.6331.
ot;-, I
X- 50 ~ N(O 1) • Now go to the right-hand section and read the number along that row in column 5.
--7,:\
'squash' in /0.2 1 \
~
'squash' in
2 '
X-50 •
This is 19.
Note the instruction to ADD. This means that 19 is added to the digits 6331.
Youwrite Z=---
/
~~.,,-~.,·
: \ "'·"- so that Z ~
2
N(O, 1)
6331
+ 19 P(Z < 0.345) ~ 0.6350
-3 -2 -1 0 2 3 6350
In general
(c) To find P(Z < 0.429), read off <1>(0.429): 6228
fo " Find row 0.4, column 2, right-hand section 9. + 32
•,ubtract the mea.n p P(Z < 0.429) ~ 0.6660 6660
then divide the deviation o
to obtain
vvhcn~ l)
Example 7.1
Using the standard normal tables printed on page 649, find
(a) P(Z < 0.85) (b) P(Z > 0.85)
USING STANDARD NORMAL TABLES
Solution 7.1
The standard normal tables give the area under the curve as far as
a particular value z. This is written <l>(z). (a) (b)
This area gives the probability that Z is less than a particular 1 - ¢(0.85)
value z, so P(Z < z) ~ <l>(z).
Note that <1> is a Greek letter, pronounced phi. 0 z
The tables are printed on page 649. On the foJlowing page is an extract from the first section. 0 0.85 0 0.85
The highlighted values are referred to in the following text. Notice that the values of <l>(z) are P(Z < 0.85) ~ <1>(0.85) P(Z > 0.85) ~ 1- <1>(0.85)
given to four decimal places in the tables. ~ 0.8023 ~ 1-0.8023
~ 0.1977
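The table look-ups for Φ(0.16) and Φ(0.85) can be cross-checked numerically. The following is a minimal sketch, assuming the standard identity Φ(z) = ½(1 + erf(z/√2)); the function name `phi` is chosen here for illustration and does not come from the text.

```python
import math


def phi(z: float) -> float:
    """Standard normal cumulative probability P(Z < z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Values that the text reads from the printed tables
p_016 = phi(0.16)            # tables give 0.5636
p_less = phi(0.85)           # Example 7.1 (a): tables give 0.8023
p_more = 1.0 - phi(0.85)     # Example 7.1 (b): tables give 0.1977

print(round(p_016, 4), round(p_less, 4), round(p_more, 4))
```

Values computed this way can differ from table values in the fourth decimal place when the ADD columns are used, because those columns are themselves rounded.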
In general, P(Z > a) = 1 - Φ(a).

Finding probabilities involving negative values of z
The standard normal tables start at z = 0. You can however find probabilities relating to negative values of z by using the symmetrical properties of the curve. Look at these diagrams:
[Diagrams: shaded areas under the standard normal curve, marked at -a and at a]
To find P(Z < -a), where a > 0
P(Z < -a) = Φ(-a)
= 1 - Φ(a)
To find P(Z > -a), where a > 0
From the diagrams, it is obvious that
P(Z > -a) = Φ(a)

(i) P(Z > -1.377) = Φ(1.377)
= 0.9158
= 0.92 (2 s.f.)
(ii) Using P(Z > a) = 1 - Φ(a)
P(Z > 1.377) = 1 - Φ(1.377)
= 1 - 0.9158
= 0.0842
= 0.084 (2 s.f.)
(iii) Using P(Z < -a) = P(Z > a) = 1 - Φ(a)
P(Z < -1.377) = 1 - Φ(1.377)
= 1 - 0.9158
= 0.0842
= 0.084 (2 s.f.)

Important results: these are worth learning.
In the following, a > 0, b > 0 and a < b.
Examples:
(a) P(0.345 < Z < 1.751)
= Φ(1.751) - Φ(0.345)
= 0.9600 - 0.6350
= 0.3250
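The symmetry rules for negative values of z, P(Z < -a) = 1 - Φ(a) and P(Z > -a) = Φ(a), can be checked with a short calculation. This is a sketch only, assuming Python's standard-library `statistics.NormalDist` as a stand-in for the printed tables; the variable names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

a = 1.377
p_gt_minus_a = 1.0 - Z.cdf(-a)   # P(Z > -a), expected to equal Phi(a)
p_gt_a = 1.0 - Z.cdf(a)          # P(Z > a) = 1 - Phi(a)
p_lt_minus_a = Z.cdf(-a)         # P(Z < -a), expected to equal 1 - Phi(a)

print(round(p_gt_minus_a, 2), round(p_gt_a, 3), round(p_lt_minus_a, 3))
```

To two significant figures these agree with the values 0.92 and 0.084 found from the tables.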
(c) P(-1.4 < Z < -0.6)
= Φ(1.4) - Φ(0.6)
= 0.9192 - 0.7257
= 0.1935
In general, P(-b < Z < -a) = Φ(-a) - Φ(-b)
= 1 - Φ(a) - (1 - Φ(b))
= Φ(b) - Φ(a)
(d) P(|Z| < 1.433)
= P(-1.433 < Z < 1.433)
= 2Φ(1.433) - 1
= 2 × 0.9240 - 1
= 0.8480
In general, P(|Z| < a) = P(-a < Z < a)
= Φ(a) + Φ(a) - 1 (using Result (b))
= 2Φ(a) - 1
(e) P(|Z| > 1.433)
= P(Z < -1.433) + P(Z > 1.433)
= 2(1 - Φ(1.433))
= 2(1 - 0.9240)
= 0.1520
In general, P(|Z| > a) = 2(1 - Φ(a))

Solution 7.3
(a) P(-1.96 < Z < 1.96) = 2Φ(1.96) - 1
= 2(0.975) - 1
= 0.95
So P(-1.96 < Z < 1.96) = 0.95: the central 95% of the distribution lies between ±1.96, with 2.5% in each tail.
(b) P(-2.575 < Z < 2.575) = 2Φ(2.575) - 1
= 2(0.995) - 1
= 0.99
The central 99% of the distribution lies between ±2.575, with 0.5% in each tail.
NOTE: These are important results which will be used in later work.

Draw sketches to illustrate your answer and consider whether your answer is sensible.
1. If Z ~ N(0, 1), find
(a) P(Z < 0.874), (b) P(Z > -0.874), (c) P(Z > 0.874), (d) P(Z < -0.874).
2. If Z ~ N(0, 1), find
(a) P(Z > 1.8), (b) P(Z < -0.65), (c) P(Z > -2.46), (d) P(Z < 1.36), (e) P(Z > 2.58), (f) P(Z > -2.37), (g) P(Z < 1.86), (h) P(Z < -0.725), (i) P(Z > 1.863), (j) P(Z < 1.63), (k) P(Z > -2.061), (l) P(Z < -2.875).
3. If Z ~ N(0, 1), find
(a) P(Z > 1.645), (b) P(Z < -1.645),
5. If Z ~ N(0, 1), find
(a) P(0.829 < Z < 1.834), (b) P(-2.56 < Z < 0.134), (c) P(-1.762 < Z < -0.246), (d) P(0 < Z < 1.73), (e) P(-2.05 < Z < 0), (f) P(-2.08 < Z < 2.08), (g) P(1.764 < Z < 2.567), (h) P(-1.65 < Z < 1.725), (i) P(-0.98 < Z < -0.16), (j) P(Z < -1.97 or Z > 2.5), (k) P(|Z| < 1.78), (l) P(|Z| > 0.754), (m) P(-1.645 < Z < 1.645), (n) P(|Z| > 2.326).
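The results for P(|Z| < a) and P(|Z| > a) can be wrapped in two small helper functions. This is a sketch under the assumption that `statistics.NormalDist` is an acceptable substitute for the tables; the function names are hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def central_probability(a: float) -> float:
    """P(|Z| < a) = 2*Phi(a) - 1, for a > 0."""
    return 2.0 * Z.cdf(a) - 1.0


def two_tail_probability(a: float) -> float:
    """P(|Z| > a) = 2*(1 - Phi(a)), for a > 0."""
    return 2.0 * (1.0 - Z.cdf(a))


# The two important results: central 95% and central 99%
print(round(central_probability(1.96), 2))    # tables give 0.95
print(round(central_probability(2.575), 2))   # tables give 0.99
# The worked values for a = 1.433
print(round(central_probability(1.433), 3), round(two_tail_probability(1.433), 3))
```

The two functions always sum to 1, which gives a quick check on any hand calculation.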
USING STANDARD NORMAL TABLES FOR ANY NORMAL VARIABLE X
Any normal variable X ~ N(μ, σ²) can be standardised, by subtracting the mean μ and then dividing by the standard deviation σ, to give
Z = (X - μ)/σ, where Z ~ N(0, 1)
The procedure is illustrated in the following example:

Example 7.4
Lengths of metal strips produced by a machine are normally distributed with mean length of 150 cm and a standard deviation of 10 cm.
Find the probability that the length of a randomly selected strip is
(a) shorter than 165 cm,
(b) within 5 cm of the mean.

Solution 7.4
X is the length, in centimetres, of a metal strip.
Since μ = 150 and σ = 10, X ~ N(150, 10²).
(a) You need to find the probability that the length is shorter than 165 cm, i.e. P(X < 165).
To be able to use the standard normal tables, standardise the X variable by subtracting the mean, 150, then dividing by the standard deviation, 10. Apply this to both sides of the inequality X < 165.
X becomes (X - 150)/10 = Z,
165 becomes (165 - 150)/10 = 1.5,
so P(X < 165) = P(Z < 1.5) = Φ(1.5) = 0.9332.
(b) To find the probability that the length is within 5 cm of the mean you need to find P(|X - 150| < 5).
P(|X - 150| < 5) = P(145 < X < 155)
= P(-0.5 < Z < 0.5)
= 2Φ(0.5) - 1
= 0.383
= 0.38 (2 s.f.)
The probability that the length is within 5 cm of the mean is 0.38.

Example 7.5
The time taken by the milkman to deliver to the High Street is normally distributed with a mean of 12 minutes and a standard deviation of 2 minutes. He delivers milk every day.
Estimate the number of days during the year when he takes
(a) longer than 17 minutes,
(b) less than ten minutes,
(c) between nine and 13 minutes.

Solution 7.5
X is the time, in minutes, taken to deliver milk to the High Street.
X ~ N(12, 2²)
Standardise X using Z = (X - μ)/σ, i.e. Z = (X - 12)/2.
(a) P(X > 17) = P(Z > (17 - 12)/2)
= P(Z > 2.5)
= 1 - Φ(2.5)
= 1 - 0.9938
= 0.0062
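The standardising step Z = (X - μ)/σ used in Examples 7.4 and 7.5 can be sketched in a few lines. The helper name `standardise` is an assumption made for illustration, and the 365-day year in the last step is likewise an assumption, not a figure from the text.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def standardise(x: float, mu: float, sigma: float) -> float:
    """Convert a value x of X ~ N(mu, sigma^2) into its z value."""
    return (x - mu) / sigma


# Example 7.4: X ~ N(150, 10^2)
z_165 = standardise(165, 150, 10)                      # 1.5
p_shorter = Z.cdf(z_165)                               # P(X < 165)
p_within = Z.cdf(standardise(155, 150, 10)) - Z.cdf(standardise(145, 150, 10))

# Example 7.5 (a): X ~ N(12, 2^2)
p_longer = 1.0 - Z.cdf(standardise(17, 12, 2))         # P(X > 17)
expected_days = 365 * p_longer                         # assumes a 365-day year

print(round(p_shorter, 4), round(p_within, 3), round(p_longer, 4), round(expected_days, 1))
```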
(d) more than 10 cm difference from the mean height.

3. X ~ N(300, 25)
Find the probabilities represented by the shaded areas in the diagrams:
[Diagrams (a), (b) and (c) not reproduced; diagram (a) is marked at 300 and 308]

... normally distributed with a mean of 150 hours and standard deviation of 12 hours. In a quality control test two batteries are chosen at random from a batch. If both batteries have a life less than 120 hours the batch is rejected.
Find the probability that the batch is rejected.

8. Cartons of milk from a particular supermarket are advertised as containing 1 litre, but in fact the volume of the contents is normally distributed with a mean of 1012 ml and a standard deviation of 5 ml.
(a) Find the probability that a randomly chosen carton contains more than 1010 ml.
(b) In a batch of 1000 cartons, estimate the number of cartons that contain less than the advertised volume of milk.

Extract from the standard normal tables (highlighted values are shown in square brackets)
z    0      1      2      3      4      5      6      7      8      9     | ADD: 1 2 3 4 5 6 7 8 9
1.5  0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 [0.9406] 0.9418 0.9429 0.9441 | 1 2 4 5 6 7 8 10 11
1.6  0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545 | 1 2 3 4 5 6 7 8 9
1.7  0.9554 0.9564 [0.9573] 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633 | 1 2 3 4 4 5 6 7 8
1.8  0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706 | 1 1 2 3 4 4 5 6 6
1.9  0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767 | 1 1 2 2 3 4 4 5 5
2.0  0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817 | 0 1 1 2 2 3 3 4 4
2.1  0.9821 0.9826 [0.9830] 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857 | 0 1 1 [2] [2] [2] 3 3 4
2.2  0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890 | 0 1 1 1 2 2 2 3 3

(a) If you are given that Φ(z) = 0.9406, to find z, look for 0.9406 in the main body of the table.
This occurs when z = 1.56,
so if Φ(z) = 0.9406, then z = 1.56.
Using notation similar to that used in trigonometry where, for example, if sin θ = 0.82, then θ = sin⁻¹ 0.82, you could write
Φ⁻¹(0.9406) = 1.56
This means that the value of z such that Φ(z) = 0.9406 is 1.56.
Similarly, if Φ(z) = 0.9832, then z = Φ⁻¹(0.9832).
Look for 0.9832 in the main body of the table; note that Φ(2.12) = 0.9830.
Refer to the end column and you find that
Φ(2.124) = 0.9832
Φ(2.125) = 0.9832
Φ(2.126) = 0.9832
The probabilities have been given to four decimal places and it is not possible to distinguish between the z values, so just decide on one of them, say 2.124.
So z = Φ⁻¹(0.9832) = 2.124.
NOTE: If you cannot find the value for the probability in the table, choose the value that is closest to the required probability.
Often final answers are given to two or three significant figures, so these discrepancies are not important.

Example 7.6
If Z ~ N(0, 1), find the value of a if
(a) P(Z < a) = 0.9693
(b) P(Z > a) = 0.3802
(c) P(Z > a) = 0.7367
(d) P(Z < a) = 0.0793

Solution 7.6
(a) P(Z < a) = 0.9693
i.e. Φ(a) = 0.9693
a = Φ⁻¹(0.9693)
= 1.87
(b) P(Z > a) = 0.3802
i.e. 1 - Φ(a) = 0.3802
so Φ(a) = 1 - 0.3802
= 0.6198
a = Φ⁻¹(0.6198)
= 0.305
(d) P(Z < a) = 0.0793
Since the probability is less than 0.5, a must be negative.
Using symmetry,
Φ(-a) = 1 - 0.0793
= 0.9207
-a = Φ⁻¹(0.9207)
= 1.41
a = -1.41

Example 7.7
If Z ~ N(0, 1) find a such that P(|Z| < a) = 0.9.

Solution 7.7
P(|Z| < a) = 0.9,
i.e. P(-a < Z < a) = 0.9.
From symmetry, using result (d) on page 366,
2Φ(a) - 1 = 0.9
2Φ(a) = 1.9
Φ(a) = 0.95
a = Φ⁻¹(0.95)
= 1.645
This means that the central 90% of the standard normal distribution lies between ±1.645.
Alternatively,
if P(-a < Z < a) = 0.9
then the value of a corresponds to an upper tail probability of 0.05, and a lower tail probability of 0.95.
Φ(a) = 0.95
a = Φ⁻¹(0.95)
= 1.645
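Using the tables in reverse amounts to evaluating Φ⁻¹. The sketch below assumes that `statistics.NormalDist.inv_cdf` may stand in for the reverse table look-up; the wrapper name `phi_inverse` is hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def phi_inverse(k: float) -> float:
    """Return a such that P(Z < a) = k, for 0 < k < 1."""
    return Z.inv_cdf(k)


a_first = phi_inverse(0.9693)            # Example 7.6 (a): tables give 1.87
a_second = phi_inverse(1.0 - 0.3802)     # Example 7.6 (b): tables give 0.305
a_fourth = phi_inverse(0.0793)           # Example 7.6 (d): tables give -1.41
a_central = phi_inverse((1.0 + 0.9) / 2) # Example 7.7: tables give 1.645

print(round(a_first, 2), round(a_second, 3), round(a_fourth, 2), round(a_central, 3))
```

Unlike the printed tables, the function accepts probabilities below 0.5 directly, so the symmetry step in part (d) is not needed when working this way.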
USING THE TABLES IN REVERSE FOR ANY NORMAL VARIABLE X

Example 7.8
The heights of female students at a particular college are normally distributed with a mean of 169 cm and a standard deviation of 9 cm.
(a) Given that 80% of these female students have a height less than h cm, find the value of h.
(b) Given that 60% of these female students have a height greater than s cm, find the value of s.

(b) P(X > s) = 0.6, so P(Z > (s - 169)/9) = 0.6
Let z = (s - 169)/9
P(Z > z) = 0.6
z must be negative
and Φ(-z) = 0.6
-z = Φ⁻¹(0.6)
= 0.253
z = -0.253

Example 7.9
The marks of 500 candidates in an examination are normally distributed with a mean of 45 marks and a standard deviation of 20 marks.
(a) Given that the pass mark is 41, estimate the number of candidates who passed the examination.
(b) If 5% of the candidates obtain a distinction by scoring x marks or more, estimate the value of x.
(c) Estimate the interquartile range of the distribution. (L Additional)

(b) z = Φ⁻¹(0.95)
= 1.645
(x - 45)/20 = 1.645
x = 45 + 1.645 × 20
= 78 (2 s.f.)
A distinction is awarded for a mark of 78 or more.
(c) The interquartile range encloses the central 50% of the distribution between the lower quartile, q1, and upper quartile, q3.
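Working in reverse for a general normal variable means finding z from the tables and then undoing the standardisation, x = μ + zσ. The following sketch does both steps at once, assuming `statistics.NormalDist` with the means and standard deviations of Examples 7.8 and 7.9; the variable names are illustrative.

```python
from statistics import NormalDist

# Example 7.9: marks ~ N(45, 20^2)
marks = NormalDist(mu=45, sigma=20)
x_distinction = marks.inv_cdf(0.95)      # top 5% score x marks or more
q1 = marks.inv_cdf(0.25)                 # lower quartile
q3 = marks.inv_cdf(0.75)                 # upper quartile
interquartile_range = q3 - q1

# Example 7.8 (b): heights ~ N(169, 9^2), with P(X > s) = 0.6, so P(X < s) = 0.4
heights = NormalDist(mu=169, sigma=9)
s = heights.inv_cdf(0.4)
z_for_s = (s - 169) / 9                  # the text finds z = -0.253

print(round(x_distinction), round(interquartile_range, 1), round(s, 1), round(z_for_s, 3))
```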
5. Find x in each of the following. 8. A sample of 100 apples is taken from a load.
Now z is the standardised value of the upper quartile, q '' (a) X- N(60, 25) The apples have the following distribution of sizes
/i
82 -I" z~ rp- 1 (0.7)
z~--
6 ~ 0.524
so P(Z > z) ~ 0.0478 4.00-1"
<l>(z) ~ 1- 0.0478 0.524
(5
Example 7.13 Exercise 7cl Finding .u oro or both, where X-~ N(,u, a")
The speeds of cars passing a certain point on a motorway can be taken to be normally You are advised to draw sketches to illustrate your answers.
distributed. Observations show that of cars passing the point, 95% are travellmg at less than
1. The random variable X is distributed N(J.J., a 2 ), 10. The marks in an examination were found to be
85 m.p.h. and 10% are travelling at less than 55 m.p.h. witha=25. normally distributed.
If P(X < 27.5) ~ 0.3085, find the value of I'· 10% of the candidates were awarded a
(a) Find the average speed of the cars passing the point. distinction for obtaining over 75.
(b) Find the proportion of cars that travel at more than 70 m.p.h. (L) 2. The random variable X is normally distributed 20% of the candidates failed the examination
with a mean of 45. The probability that X is with a mark of under 40. Find the mean and
greater than 51 is 0.288. Find the standard standard deviation of the distribution of marks.
Solution 7.13 deviation of the distribution.
11. A farmer cuts hazel twigs to make into bean
X is the speed, in m.p.h., of a car passing a certain point. 3. The volumes of drinks in cans are normally poles to sell at the market. He says that a stick is
distributed with a mean of 333 ml.
X -N{ft,a 2 ). 240 em long. In fact the lengths of the sticks are
k
Given that 20% of the cans contain more than normally distributed and 55% are over 240 em
(a) P(X < 85) ~ 0.95 0.95 340 ml, find the standard deviation of the long. 10% are over 250 em long.
volume of drink in a can. Find also the Find the probability that a randomly selected
85- fl percentage of cans that contain less than than
i.e. P(Z < z 1) ~ 0.95 where Z 1 ~ ~~ stick is shorter than 235 em.
a 330 ml.
z1 ~ q,- 1(0.95) x, I' 85 12. The diameters of bolts produced by a particular
~ 1.645
z, 0 z, 4. The random variable X is distributed N{!J., 12) machine follow a normal distribution with mean
and it is known that P(X > 32) = 0.8438. Find 1.34 em and standard deviation 0.04 em. A bolt
85- fl ~ 1.645 85 ~ f' + 1.645a ... ®
the value of f-1· is rejected if its diameter is less than 1.24 em or
more than 1.40 em. Find the percentage of bolts
a 5. The heights, measured in metres, of 500 people which are accepted.
are normally distributed with a standard The setting of the machine is altered so that the
P(X <55)~ 0.1 deviation of 0.080 m. Given that the heights of mean diameter changes but the standard
55- fl 129 of these people are greater than the mean deviation remains the same. With the new
i.e. P(Z < z 2 ) ~ 0.1 where z2 = --- height, but less than 1.806 m, estimate the mean setting, 3% of the bolts are rejected because they
a height. (C) are too large in diameter. Find the new mean
q'>(-z 2 ) ~ 0.9 diameter of bolts produced by the machine. Find
6. The random variable X is distributed N(J.i., a 2 ). also the percentage of bolts that are now rejected
-z, ~ q,- 1 (0.9) P(X> 80) ~ 0.0113 and P(X < 30) ~ 0.0287.
X: 55 p because they are too small in diameter.
~ 1.282 Findp and a.
-
Z: z2 0
. . z, ~ -1.282 13. Tea is sold in packages marked 750 g. The
7. The masses of boxes of apples are normally masses of the packages are normally distributed
55 -r< ~-1.282
55~ r<- 1.282a ... @ distributed such that 20% of them are greater with a mean of 760 g. It is known that less
()" than 5.08 kg and 15% are greater than 5.62 kg. than 1% of packages are underweight. What is
Estimate the mean and standard deviation of the the maximum value of the standard deviation of
CD-@ gives 30 ~ 2.927a masses. the distribution?
a~ 10.249 ...
Substituting in CD 85 ~ p + 1.645 x 10.249 ... 8. Metal rods produced by a machine have lengths 14. The random variable X is normally distributed.
that are normally distributed. The probability that X is less than 53 is 0.04 and
fl ~ 68.139 ... 2% of the rods are rejected as being too short the probability that X is less than 65 is 0.97.
and 5% are rejected as being too long. Find the interquartile range of the distribution.
The average speed is 68 m.p.h. (2 s.f.).
70-68.13 ... ) :\ (a) Given that the least and greatest acceptable
l~ngths of the rods are 6.32 em and 7.52 em, 15. A certain make of car tyre can be safely used for
(b) P(Z > 70)~P ( Z > _ ... I \ calculate the mean and variance of the 25 000 km on average before it is replaced. The
10 24
~ P(Z > 0.1815) 1\ I
: ~~~
' lengths of the rods.
(b) If ten rods are chosen at random from a
batch produced by the machine, find the
makers guarantee to pay compensation to
anyone whose tyre does not last for 22 000 km.
They expect 7.5% of all tyres sold to qualify for
~ 1- ¢(0.1815) I compensation. Assuming that the distance X
probability that exactly three of them are
~ 1-0.5718 6K14 70 tra veiled before a tyre is replaced has a normal
0 0.1815 rejected as being too long.
~ 0.4282
probability distribution, draw a diagram
9. The random variable X is distributed N(j.J., a 2 ). illustrating the facts given above.
The proportion of cars travelling at more than 70 m.p.h. is 0.43 (2 s.f.). P(X < 35) ~ 0.2 and P(35 <X< 45) ~ 0.65. Calculate, to three significant figures, the
Find ft and a. standard deviation of X.
Estimate the number of tyres per 1000 which
will not have been replaced when they have
covered 26 500 km. (L Additional)
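Example 7.13 finds μ and σ by solving two simultaneous equations of the form x = μ + zσ. The sketch below carries out the same elimination; it assumes `statistics.NormalDist` for the z values, and the function name is hypothetical.

```python
from statistics import NormalDist


def mean_and_sd_from_two_points(x1: float, p1: float, x2: float, p2: float):
    """Solve x1 = mu + z1*sigma and x2 = mu + z2*sigma, where P(X < xi) = pi."""
    Z = NormalDist()
    z1, z2 = Z.inv_cdf(p1), Z.inv_cdf(p2)
    sigma = (x1 - x2) / (z1 - z2)   # subtract the two equations to eliminate mu
    mu = x1 - z1 * sigma            # substitute back into the first equation
    return mu, sigma


# Example 7.13: 95% of cars travel at less than 85 m.p.h., 10% at less than 55 m.p.h.
mu, sigma = mean_and_sd_from_two_points(85, 0.95, 55, 0.10)
p_over_70 = 1.0 - NormalDist(mu, sigma).cdf(70)

print(round(mu), round(sigma, 2), round(p_over_70, 2))
```

Small differences from the worked solution in the later decimal places arise because the tables give z to three decimal places only.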
/(fl'h
\
with mean 283 ns and standard deviation
8 ns. What is the probability that the delay
time of one line selected at random from
Goodline's output is between 275 ns and
286 ns?
19. A machine is used to fill cans of soup with a
nominal volume of 0.450 litres. Suppose that the
machine delivers a quantity of soup which is
normally distributed with mean fl litres and
standard deviation a litres. Given that fl = 0.457
r
0
I''I'~
2 3 4 5
' 0 1 2 3 4 5
(b) It is found that, in the output of Megadelay, (b) n~ 12
10% of the delay times are less than and a= 0.004, find the probability that a
randomly chosen can will contain less than the X- 8(12, 0.2) X- 8(12, 0.5)
274.6 ns and 7.5% are more than 288.2 ns.
Again assuming a normal distribution, nominal volume.
""" """
calculate the mean and standard deviation It is required by law that no more than 1% of
" "
Q
l
of the delay times for Megadelay. Give your cans contain less than the nominal volume.
answers correct to three significant figures.
(C)
Find
' 'h, I
,A1'f h,l
(a) the least value of p which will comply with
the law if a= 0.004,
17. Machine components arc mass-produced at a
factory. A customer requires that the
(b) the greatest value of a which will comply
with the law if I'~ 0.457. (MEl) 0 2 4 6
~ '
8
~ .
' "
10 12 X 0 2 4 6 8 10 12 X
components should be 5.2 em long but they will
be acceptable if they are within limits 5.195 em 20. The masses of packets of sugar are normally (c) n~20
to 5.205 em. The customer tests the components distributed. In a large consignment of packets of
and finds that 10.75% of those supplied are >< X- 8(20, 0.2) X- 8(20, 0.5}
over-size and 4.95% are under-size. Find the
sugar, it is found that 5% of them have a mass
" """
mean and standard deviation of the lengths of
the components supplied, assuming that they are
normally distributed.
greater than 510 g and 2% have a mass greater
than 515 g. Estimate the mean and the standard
deviation of this distribution. (C)
" "
If three of the components are selected at
random what is the probability that one is
21. On a particular day, 50% of the employees in a
large company had arrived at work by 8.30 a.m., /1/ 1Ill
under-size, one is over-size and one satisfactory? and l0%1 had not arrived by 8.55 a.m.
(a) Assuming a normal model, find the standard
( lh.
0 2 4 6 8 10 0 3 10 17 20 X
18. A machine dispenses peanuts into bags so that
the weight of peanuts in a bag is normally deviation of the arrival times, in minutes. Notice
(b) It is given that only 5% of the employees
distributed.
had arrived by 8.05 a.m. Without further ., when p ~ 0.5, the distributions are symmetric and f 1
(a) Initially the mean weight of peanuts in a bag calculation, explain why this might suggest takes on the characteristic nonnal shape, ' or arger values of n, the distribution
is 128.5 g and the standard deviation is that a normal model is not appropriate.
" when P"" 0.2, the distribution is positively skewed for small values of
~ 20,
1.5 g. Find the probability that the weight of (c) Eighty employees are selected at random.
peanuts in a randomly chosen bag exceeds Find the expectation of the number of these the dtstnbutwn ts almost symmetrical and bell-shaped. n, but when n
130 g. employees that arrived between 8.30 and
(b) The machine is given a minor overhaul that For the discrete random variable X, distributed binomial! wh X- .
changes the mean weight, p, of peanuts in a
8.55 a.m. (C)
~ E(X) ~ np and the variance
!'
· a2-- Va r (X) ~ npq (see page
y 286).
ere B(n, p), the mean
bag without affecting the standard
X is the number of heads in 12 tosses.
Since the coin is fair, P(head) = 0.5, so X ~ B(12, 0.5).
(a) Using the binomial distribution,
P(X = 4) = 12C4 (0.5)^8 × (0.5)^4 = 12C4 (0.5)^12 = 0.1208...
P(X = 5) = 12C5 (0.5)^12 = 0.1933...
P(X = 6) = 12C6 (0.5)^12 = 0.2255...
P(X = 7) = 12C7 (0.5)^12 = 0.1933...
P(4 ≤ X ≤ 7) = 0.733 (3 d.p.)
(b) The diagram below shows the probability distribution for X ~ B(12, 0.5). Note that the vertical lines have been replaced by rectangles to help illustrate the intention to use a continuous distribution as an approximation for a discrete one. The required binomial probability is represented by the sum of the areas of the shaded rectangles.
[Diagram: probability distribution of X ~ B(12, 0.5) drawn as rectangles, with those for 4, 5, 6 and 7 shaded]
First check the conditions for a normal approximation:
np = 12 × 0.5 = 6, so np > 5
nq = 12 × 0.5 = 6, so nq > 5
Since np > 5 and nq > 5, use the normal approximation
X ~ N(np, npq) with np = 6, npq = 12 × 0.5 × 0.5 = 3
So X ~ N(6, 3).
Superimposing the curve which is approximately N(6, 3), the probability of obtaining 4, 5, 6 or 7 heads is found by considering the area under this normal curve from x = 3.5 to x = 7.5.
Note that the probabilities found by the two different methods compare well and the working for part (b) is quicker to perform. The approximation is good because, although n is not very large, p = 0.5.

More about continuity corrections
Continuity corrections sometimes cause difficulties, so these are considered in more detail, using the diagram for the distribution of the number of heads when a coin is tossed 12 times.
If you want the probability that there are three heads or fewer, i.e. P(X ≤ 3), then consider P(X < 3.5). The rectangle for 3 is included.
If you want the probability that there are fewer than three heads, i.e. P(X < 3), then consider P(X < 2.5). The rectangle for 3 is not included.
If you want the probability that there are exactly three heads, i.e. P(X = 3), then consider P(2.5 < X < 3.5).
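The comparison between the exact binomial probability for X ~ B(12, 0.5) and the area under the N(6, 3) curve from 3.5 to 7.5 can be reproduced as follows. This is a sketch using only the standard library; the variable names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 12, 0.5
q = 1 - p

# Exact binomial probability P(4 <= X <= 7)
exact = sum(comb(n, k) * p**k * q**(n - k) for k in range(4, 8))

# Normal approximation X ~ N(np, npq), with continuity correction 3.5 to 7.5
approx_dist = NormalDist(mu=n * p, sigma=sqrt(n * p * q))
approx = approx_dist.cdf(7.5) - approx_dist.cdf(3.5)

print(round(exact, 3), round(approx, 3))
```

The two values agree to about two decimal places, which supports the remark that the approximation works well here even though n is small.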
Exercise 7e Continuity corrections

Example 7.15
In a sack of mixed grass seeds, the probability that a seed is ryegrass is 0.35.
Find the probability that in a random sample of 400 seeds from the sack,
(a) less than 120 are ryegrass seeds,
(b) between 120 and 150 (inclusive) are ryegrass,
(c) more than 160 are ryegrass seeds.

Solution 7.15
X is the number of ryegrass seeds in a sample of 400 seeds.
n = 400, p = 0.35, q = 0.65, so X ~ B(400, 0.35)
To see whether a normal approximation is suitable, check the value of np and nq:
np = 400 × 0.35 = 140 and nq = 400 × 0.65 = 260.
Since np > 5 and nq > 5, use the normal approximation
X ~ N(np, npq) with np = 140, npq = 400 × 0.35 × 0.65 = 91
So X ~ N(140, 91)
(a) P(X < 120) → P(X < 119.5) (continuity correction)
P(X < 119.5) = P(Z < (119.5 - 140)/√91)
= P(Z < -2.149)
= 1 - Φ(2.149)
= 0.0158
The probability that there are less than 120 ryegrass seeds is 0.016 (2 s.f.).
(b) P(120 ≤ X ≤ 150) → P(119.5 < X < 150.5) (continuity correction)
P(119.5 < X < 150.5) = P((119.5 - 140)/√91 < Z < (150.5 - 140)/√91)
= P(-2.149 < Z < 1.101)
= Φ(2.149) + Φ(1.101) - 1
= 0.8487
The probability that there are between 120 and 150 ryegrass seeds is 0.85 (2 s.f.).
(c) P(X > 160) → P(X > 160.5) (continuity correction)
P(X > 160.5) = P(Z > (160.5 - 140)/√91)
= P(Z > 2.149)
= 1 - Φ(2.149)
= 0.0158
The probability that there are more than 160 ryegrass seeds is 0.016 (2 s.f.).

Example 7.16
It is given that 40% of the population support the Gamboge Party. One hundred and fifty members of the population are selected at random. Use a suitable approximation to find the probability that more than 55 out of the 150 support the Gamboge Party. (C)

Solution 7.16
X is the number in 150 who support the Gamboge Party.
n = 150, p = 0.4, q = 0.6
so X ~ B(n, p) with n = 150, p = 0.4, q = 0.6
Check np and nq:
np = 150 × 0.4 = 60, nq = 150 × 0.6 = 90
Since np > 5 and nq > 5, use the normal approximation
X ~ N(np, npq) with np = 60, npq = 150 × 0.4 × 0.6 = 36
So X ~ N(60, 36)
P(X > 55) → P(X > 55.5) (continuity correction)
= P(Z > (55.5 - 60)/6)
= P(Z > -0.75)
= Φ(0.75)
= 0.7734
= 0.77 (2 s.f.)

DECIDING WHEN TO USE A NORMAL APPROXIMATION AND WHEN TO USE A POISSON APPROXIMATION FOR A BINOMIAL DISTRIBUTION
For X ~ B(n, p)
• a Poisson approximation can be used when n is large (n > 50) and p is small (p < 0.1). Then X ~ Po(np) approximately.
• a normal approximation can be used when n and p are such that np > 5 and nq > 5. Then X ~ N(np, npq) approximately.
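The continuity-corrected calculations in Examples 7.15 and 7.16 follow one pattern: build N(np, npq), shift each boundary by 0.5, and read off the areas. A minimal sketch of that pattern is given below; the helper name `normal_approx` is an assumption, and `statistics.NormalDist` replaces the table look-ups.

```python
from math import sqrt
from statistics import NormalDist


def normal_approx(n: int, p: float) -> NormalDist:
    """Normal approximation N(np, npq) to B(n, p); assumes np > 5 and nq > 5."""
    return NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))


# Example 7.15: X ~ B(400, 0.35), approximated by N(140, 91)
rye = normal_approx(400, 0.35)
p_less_120 = rye.cdf(119.5)                       # P(X < 120)  -> P(X < 119.5)
p_120_to_150 = rye.cdf(150.5) - rye.cdf(119.5)    # P(120 <= X <= 150)
p_more_160 = 1.0 - rye.cdf(160.5)                 # P(X > 160)  -> P(X > 160.5)

# Example 7.16: X ~ B(150, 0.4), approximated by N(60, 36)
party = normal_approx(150, 0.4)
p_more_55 = 1.0 - party.cdf(55.5)                 # P(X > 55)   -> P(X > 55.5)

print(round(p_less_120, 3), round(p_120_to_150, 2), round(p_more_160, 3), round(p_more_55, 2))
```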
-a 0 -a 0 The probability that the mass lies between 1.37 kg and 1.45 kg is 0.86 (2 s.f.).
A 0 a b
P(a < Z < b) = Φ(b) - Φ(a)
5000 × 0.0026 = 13
If Φ(a) = k, i.e. P(Z < a) = k, then a = Φ⁻¹(k)
13 packets have a mass less than 1.35 kg.
The normal approximation to the binomial distribution
If X ~ B(n, p) and np > 5, nq > 5 then X ~ N(np, npq).
The normal approximation to the Poisson distribution Example 7.20
If X- Po(A) and A> 15 then X- N(A, A). In a certain cross country running competition the times that each of the 136 runners took to
" A Poisson approximation to the binomial distribution complete the course were recorded to the nearest minute. The winner completed the course in
If X- B(n, p) and n is large (n >50) and pis small (p < 0.1) then X- Po(np). 23 minutes and the final runner came in with a time of 78 minutes. The full results are
summarised in the table below.
Continuity corrections
These must be used when using a continuous distribution (e.g. normal) as an Recorded time 20-29 30-39 40-49 50-59 60-69 70-79
approximation to a discrete distribution (e.g. binomial, Poisson) Frequency 7 21 42 37 20 9
(a) Use linear interpolation to estimate the median time. (d) Assuming Q 1 = 40.9 and Q 3 = 58.1 50%
fl = ~ (40.9 + 58.1)
The upper and lower quartiles of the time taken are 58.1 and 40.9 respectively. =49.5
25% 25%
r draw a box and whisker plot for the results from this competition. You If X is the recorded time, in minutes
(b) 0 n grap h pape ' . 1 1 diagram
should mark the end points, tbe median and the quart!1es c ear y on your ·
X- N(49.5, o- 2 ) and P(X < 58.1) = 0.75 X: 40.9 49.5 58.1
(c) Comment on the skewness. Z: 0 z
Assume that the time taken by the runners to complete the course follows a normal The central 50% of tbe distribution is enclosed between Q1 and Q3 , so the z value for Q 3
corresponds to an upper tail probability of 0.25, i.e. a lower tail probability of 0.75.
distribution with the values for the quartiles as given above.
58.1-49.5 8.6
(d) Calculate the mean of this normal distribution. . . . .. P(Z < z) = 0.75 where z
(L) a a
(e) Calculate the standard deviation of this normal dtstnbutlOn. <!>(z) = 0.75
z = <1>- 1(0. 75)
= 0.674
Solution,7·~·:2_:o_ _ _ _ _ _ _ _ _ _--:-:--::---.;:-;---::-;~---:::-,;-~-~;s'
~
Recorded time
<29.5 <39.5 <49.5 <59.5 <69.5 <79.5 ~=0.674
(J
(i) Standardising, you need to find z such that
P(Z < z) = 0.01
i.e. Φ(z) = 0.01
so Φ(-z) = 0.99
-z = Φ⁻¹(0.99)
= 2.326
z = -2.326
(125 - μ)/1.95 ≤ -2.326
125 - μ ≤ -2.326 × 1.95
μ ≥ 125 + 2.326 × 1.95
μ ≥ 129.53...
The smallest mean weight is 129.5 g (1 d.p.).
(ii) Y is the weight, in grams, of coffee in a bag from machine B.
Y ~ N(128.5, σ²) and P(Y < 125) = 0.01
Standardising:
P(Z < z) = 0.01 where z = (125 - 128.5)/σ = -3.5/σ
From part (i) z = -2.326
so -2.326 = -3.5/σ
σ = 3.5/2.326
= 1.504...
The standard deviation is 1.5 g (1 d.p.).

Example 7.22
It is estimated that, on average, one match in five in the Football League is drawn, and that one match in two is a home win.
(a) Twelve matches are selected at random. Calculate the probability that the number of drawn matches is
(i) exactly three,
(ii) at least four.
(b) Ninety matches are selected at random. Use a suitable approximation to calculate the probability that between 13 and 20 (inclusive) of the matches are drawn.
(c) Twenty matches are selected at random. The random variables D and H are the numbers of drawn matches and home wins, respectively, in these matches. State, with a reason, which of D and H can be better approximated by a normal variable. (C)

Solution 7.22
(a) X is the number of drawn matches in 12.
X ~ B(12, 0.2) since P(draw) = 1/5 = 0.2
(i) P(X = 3) = 12C3 (0.8)^9 (0.2)^3
= 0.24 (2 s.f.)
(ii) P(X ≥ 4) = 1 - P(X ≤ 3)
= 1 - ((0.8)^12 + 12(0.8)^11 (0.2) + 12C2 (0.8)^10 (0.2)^2 + 12C3 (0.8)^9 (0.2)^3)
= 1 - 0.794...
= 0.21 (2 s.f.)
(b) X is the number of drawn matches in 90.
Then X ~ B(n, p) with n = 90, p = 0.2, q = 0.8
Now np = 90 × 0.2 = 18, nq = 90 × 0.8 = 72
Since np > 5, nq > 5, use a normal approximation
X ~ N(np, npq) with np = 18, and npq = 90 × 0.2 × 0.8 = 14.4,
so X ~ N(18, 14.4).
P(13 ≤ X ≤ 20) → P(12.5 < X < 20.5) (continuity correction)
= P((12.5 - 18)/√14.4 < Z < (20.5 - 18)/√14.4)
= P(-1.449 < Z < 0.659)
= Φ(1.449) + Φ(0.659) - 1
= 0.9264 + 0.7451 - 1
= 0.67 (2 s.f.)
(c) D is the number of drawn matches.
D ~ B(20, 0.2), np = 20 × 0.2 = 4, so np < 5.
H is the number of home wins.
H ~ B(20, 0.5), np = 20 × 0.5 = 10 > 5, nq = 10 > 5.
For H, np > 5 and nq > 5, so H can be better approximated by a normal variable.
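The chapter's rules of thumb for approximating a binomial distribution (normal when np > 5 and nq > 5; Poisson when n > 50 and p < 0.1) can be expressed as a small check. This is an illustrative sketch only; the function name and its return format are assumptions, not part of the text.

```python
def suitable_approximations(n: int, p: float) -> list:
    """List the approximations to B(n, p) allowed by the chapter's rules of thumb."""
    q = 1 - p
    allowed = []
    if n * p > 5 and n * q > 5:
        allowed.append("normal")    # X ~ N(np, npq) approximately
    if n > 50 and p < 0.1:
        allowed.append("Poisson")   # X ~ Po(np) approximately
    return allowed


# The variables D and H from the question on drawn matches and home wins
print(suitable_approximations(20, 0.2))   # D: np = 4, so neither rule is met
print(suitable_approximations(20, 0.5))   # H: np = nq = 10
print(suitable_approximations(90, 0.2))   # 90 matches: np = 18, nq = 72
```

Returning a list leaves room for cases where both rules are satisfied at once, in which case either approximation may be used.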
8. A machine is used to fill tubes, of nominal 10. In 1994 an insurance company received claims
Miscellaneous exercise 7h content 100 ml, with toothpaste. The amount of from 20% of the motorists it had insured.
t~ot~paste delivered by the machine is normally
1. Squash balls, dropped onto a concrete floor from 5. Alan is a member of an athletics club. In long dtstnbuted and may be set to any required mean (a) For a random sample of 14 motorists
a given point, rebound to heights which can be jump competitions, his jumps are normally value. Immediately after the machine has been insured with the company in 1994, find the
·modelled by a normal distribution with mean distributed with a mean of 7.6 m and a standard overhauled, the standard deviation of the probability that
0.8 m and standard deviation 0.2 m. The balls deviation of 0.16 m. amount delivered is 2 ml. As time passes this (i) exactly three claimed on their
are classified by height of rebound, in order of (a) Calculate the probability of him jumping standard deviation increases until the mdchine is insurance,
decreasing height, into these categories: Fast, (i) more than 8.0 m, again overhauled. The following three conditions (ii) between two and five inclusive claimed
Medium, Slow, Super-Slow and Rejected. (ii) between 7.50 m and 7.75 m. are necessar~ for a batch of tubes of toothpaste on their insurance
(b) Determine the distance exceeded by 75% of to comply with current legislation: (iii) a majority claimed on their insurance.
(a) Balls which rebound to heights between (b) For a random sample of 90 motorists
0.65 m and 0.9 mare classified as Slow. his jumps. I the average content of the tubes must be at insured ~ith the company, use an
Calculate the percentage of balls classified as Brian also belongs to the athletics club. In long least 100 ml,
approp_nate approximating distribution to
Slow. jump competitions, his jumps are normally II not more than 2.5% of the tubes may
det.ermme the probability that at least 25
(b) Given that 9% of balls are classified as distributed with a mean of 7.4 5 m and 9 5.2% of contain less than 95.5 ml,
claimed on their insurance in 1994. (NEAB)
Rejected, calculate the maximum height of them exceed 7.0 m. III not more than 0.1% of the tubes may
rebound of these balls. (c) Calculate, correct to two decimal places, the contain less than 91 ml. 11. A horticulturalist knows from experience that
(c) The percentage of balls classified as Fast and standard deviation of Brian's jumps. (<!>-! (0.999) ~ 3.09)
when taking cuttings from bay trees only 15 in
as Medium are equal. Calculate the (a) For a batch of tubes with mean content every 100 successfully take root.
The athletics club has to select either Alan or
minimum height of rebound of a ball 98.8 ml and standard deviation 2 ml find
Brian to be its long jump competitor at a major (a) In a batch of ten randomly selected cuttings
classified as Fast, giving your answer correct
athletics meeting. In order to qualify for the final the proportion of tubes which contain find the probability that
to two decimal places. (C) (i) less than 95.5 ml,
rounds of jumps at the meeting, it is necessary to (i) none of the cuttings take root,
achieve a jump of at least 8.0 m in the (ii) less than 91 ml.
2. The mass of grapes sold per day in a (ii) fewer than three of the cuttings take
supermarket can be modelled by a normal preliminary rounds. Hence state which, if any, of the three root.
distribution. It is found that, over a long period, (d) State, with justification, which of the two conditions above are not satisfied. (b) Let n be the smallest number of cuttings
the mean mass sold per day is 35.0 kg, and that, athletes should be selected. (NEAB) (b) If the standard deviation is 5 ml find the which need to be examined before there is at
on average, less than 15.0 kg are sold on one day mean in each of the following cases: least a 95% chance that one or more of
in twenty. 6. The time required to complete a certain car (i) exactly 2.5% of tubes contain less than them will have taken root.
journey has been found from experience to have 95.5 ml, (i) Show that n satisfies (0.85)^n ≤ 0.05.
(a) Show that the standard deviation of the mean 2 hours 20 minutes and standard deviation (ii) Given that (0.85)^17 = 0.0631, find the
(ii) exactly 0.1% of tubes contain less than
mass of grapes sold per day is 12.2 kg, 15 minutes. value of n.
91 ml.
correct to three significant figures. (c) Using a suitable approximation estimate the
(b) Calculate the probability that, on a day (a) Use a normal model to calculate the Hence state the smallest value of the mean
probability that, on one day chosen at probability that fewer than six in a batch of
chosen at random, more than 53.0 kg are which would enable all three conditions to
random, the journey requires between 50 cuttings take root. (L)
sold. be met when the standard deviation is 5 ml.
(c) Ten days are chosen at random. Assuming 1 hour 50 minutes and 2 hours 40 minutes.
(c) Currently exactly 0.1% of tubes contain less 12. A large bag of seeds contains three varieties in
independence, find the probability that less (b) It is known that delays occur rarely on this
than 91 ml and exactly 2.5% contain less the ratios 4 : 2 : 1 and their germination rates are
than 15.0 kg will be sold on exactly two of journey, but that when they do occur they than 95.5 ml. 50%, 60% and 80% respectively.
these days. (C) are lengthy. Give a reason why this (i) Find the current values of the mean and Show that the probability that a seed chosen at
information suggests that a normal the standard deviation. random from the bag will germinate is~.
3. (a) Give two reasons why the normal distribution might not be a good model. (C) (ii) State, giving a reason, whether you Find, to three decimal places, the probability that
distribution is important in statistics. would recommend that the machine is of four seeds chosen at random from the bag,
(b) An airline has a regular flight from one 7. A machine is producing a type of circular gasket.
The specifications for the use of these gaskets in overhauled immediately. (AEB) exactly two of them will germinate.
airport to another. The airline models the Given that 150 seeds are chosen at random from
duration of a flight as a normally distributed the manufacture of a certain make of engine are 9. A wholesaler buys cauliflowers from a farmer for
that the thickness should lie between 5.45 mm the bag, estimate, to three decimal places the
random variable with a mean of 246
and 5.55 mm, and the diameter should lie
distribution to retail greengrocers. probability that fewer than 90 of them will
minutes and a standard deviation of five The wholesaler classifies the lightest 15% of germinate. (L)
minutes. Use this model to calculate, to one between 8.45 mm and 8.54 mm. The machine is
cauliflowers as small, the heaviest 25% as large
decimal place, the percentage of these flights producing the gaskets so that their thicknesses and the rest as medium.
are N(5.5, 0.0004), that is, normally distributed 13. A building society announces its intention to
that are completed in less than four hours. (a) Given that the wholesaler makes a profit of convert to a bank. During the first day following
(NEAB) with mean 5.5 mm and variance 0.0004 mm²,
and their diameters are independently distributed 2 pence on each small cauliflower, 12 pence the announcement, the number of calls per
on each medium one and 27 pence on each minute answered by the society's hotline may be
4. The random variable X is normally distributed N(8.54, 0.0025).
Calculate, to one decimal place, the percentage large one, calculate the wholesaler's mean modelled satisfactorily by a Poisson distribution
with mean μ and variance σ². profit per cauliflower. with mean 12.
of gaskets produced which will not meet
Given that P(X > 58.37) = 0.02 The weights of the cauliflowers can be modelled (a) Calculate the probability that the hotline
(a) the specified thickness limits,
and P(X < 40.85) = 0.01 (b) the specified diameter limits, by a normal distribution with a mean of 628 g answers more than ten calls in a one-minute
and a standard deviation of 160 g. period.
find μ and σ. (L) (c) the specifications.
(b) Calculate the weight that a cauliflower must (b) Estimate the probability that the hotline
Find, to three decimal places, the probability answers fewer than 700 calls in one hour.
exceed to be classified as large.
that, if six gaskets made by the machine are (NEAB)
(c) Calculate the weight that a cauliflower must
chosen at random, exactly five of them will meet
fall below to be classified as small. (NEAB)
the specifications. (L)
14. (a) A trade union asked 300 of its members Assuming that the range of rounds is normally 19. An old car is never garaged at night. On the
distributed, find the mean and standard (b) A wholesaler buys 500 randomly chosen
whether they were full-time workers or morning following a wet night, the probability
deviation of the range. Flashpan batteries. Using a suitable
part-time workers, and the number of hours that the car does not start is 1,.
Estimate the number of rounds falling within approximation, find the probability that at
they worked in a particular week. The table On the morning following a dry night, this most three have lives each less than one
below shows an analysis of this survey. 5 m of the centre of the target. (C)
probability is TI· The starting performance of the year.
car each morning is independent of its (c) A retailer buys ten randomly chosen
Standard 17. A traffic survey is being undertaken on a main
Mean performance on previous mornings.
road to determine whether or not a pedestrian Flashpan batteries. Find the probability that
Number number of deviation of crossing should be installed. On five successive (a) There are six consecutive wet nights. at least four have lives each exceeding two
workers hours worked hours worked days, from Monday to Friday, the hour between Determine the probability that the car does years. (C)
8 a.m. and 9 a.m. was split up into 30-second not start on at least two of the six mornings.
Full-time 100 40 4.5 intervals, and the number of vehicles passing a (b) During a wet autumn there are 32 wet 21. Describe, briefly, the conditions under which the
20 6.9 certain point in each of these intervals was nights. Using a suitable approximation, binomial distribution Bin(n, p) may be
Part-time 200
recorded. determine the probability that the car does approximated by
The hours, both for the full-time workers The random variable X represents the number of not start on fewer than 16 of the 32 (a) a normal distribution
and for the part-time workers, are normally cars travelling from the town centre per mornings. (b) a Poisson distribution
30-second interval. For the 600 observations (c) During a long summer drought there are
distributed. giving the parameters of each of the approximate
(i) Calculate the total number of workers the mean and variance were 3.1 and 3.27 100 dry nights. Using a Poisson
approximation, determine the probability distributions.
who worked more than 32 hours. respectively.
that the car does not start on five or more of Among the blood cells of a certain animal
(ii) Given that only 6% of the full-time (a) Explain why X might be modelled by a species, the proportion of cells which are of type
workers worked for less than T₁ hours, the 100 mornings.
Poisson distribution. A is 0.~7 and the proportion of cells which are of
calculate T₁. (b) Using the sample mean as an estimate for (Give three decimal places in your answers.) (C) type B is 0.004. Find, to three decimal places, the
(iii) Given that only 3% of the part-time the Poisson parameter, calculate the probability that in a random sample of eight
workers worked for more than T₂ probability of recording exactly three 20. The life, in years, of a randomly chosen Flashpan blood cells at least two will be of type A.
hours, calculate T₂. vehicles travelling from the town centre in a car battery is normally distributed with mean 2 Find, to three decimal places, an approximate
(b) A set of numbers is normally distributed; 30-second interval. and standard deviation 0.4. value for the probability that
1.5% of the numbers exceed 1434 and (c) Calculate the probability of recording at Show that the probability that a randomly chosen
16.6% of the numbers exceed 1194. least six vehicles travelling from the town Flashpan battery has a life less than one year is (c) in a random sample of 200 blood cells the
Calculate the mean and the standard centre in a 60-second interval. 0.006 21, correct to five places of decimals. combined number of type A and type B cells
deviation of the distribution. (C) is 81 or more,
The mean number of vehicles per 30-second (a) A farmer buys two randomly chosen Flashpan (d) there will be four or more cells of type B in
interval passing the survey point travelling batteries. Find the probability that the a random sample of 300 blood cells. (L)
15. During an advertising campaign, the
manufacturer of Wolfitt (a dog food) claimed towards the town centre during the same survey batteries each have a life more than one year.
that 60% of dog owners preferred to buy period was 7.9.
Wolfitt. Assuming that the manufacturer's claim (d) Show that there is roughly a 12% chance
is correct for the population of dog owners, that the total number of vehicles passing per Mixed test 7A
calculate 30-second interval is ten.
(a) using the binomial distribution, and (e) Using a suitable approximation, estimate the 1. A smoker's blood nicotine level, measured in (b) Find the probability that in a random
(b) using a normal approximation to the probability of between 16 and 24 vehicles ng/ml, may be modelled by a normal random sample of 20 students fewer than 15 will be
(inclusive) passing the survey point in a variable with mean 310 and standard deviation
binomial; right-handed.
60-second interval. (MEI) 110.
the probability that at least six of a random (c) Determine, to two decimal places, an
sample of eight dog owners prefer to buy (a) What proportion of smokers have blood approximate value for the probability that
18. [In this question give three places of decimals in
Wolfitt. nicotine levels lower than 250? in a random sample of 200 students at most
each answer.]
Comment on the agreement, or disagreement, When a telephone call is made in the country of (b) What blood nicotine level is exceeded by 184 will be right-handed. (NEAB)
between your two values. Would the agreement Japonica, the probability of getting the intended 20% of smokers? (AEB)
be better or worse if the proportion had been 4. The random variable X represents the weight in
number is 0.95. grams, of chocolate chips in packets sold by a
80% instead of 60%? 2. The number of hours of sunshine at a resort has
Continuing to assume that the manufacturer's (a) Ten independent calls are made. Find the been recorded for each month for many years. supermarket. It is suggested that X can be
figure of 60% is correct, use the normal probability of getting eight or more of the One year is selected at random and H is the modelled by a normal distribution with
approximation to the binomial to estimate the intended numbers. Find also the conditional number of hours of sunshine in August of that X ~ N(100, 25).
probability that, of a random sample of 100 dog probability of getting all ten intended year. H can be modelled by a normal variable (a) Find P(X > 108).
numbers given that at least eight of the with mean 130.
owners, the number preferring Wolfitt is between (b) Show that P(|X − 100| < 6.8) = 0.8262.
60 and 70 inclusive. (MEI) intended numbers are obtained.
(b) Three hundred independent calls are made. (a) Given that P(H < 179) = 0.975, calculate the Three packets are selected at random from the
Find the probability of failing to get the standard deviation of H packets of chocolate chips on the supermarket
16. Six hundred rounds are fired from a gun at a (b) Calculate P(100 < H < 150).
horizontal target 50 m long which extends from intended number on at least ten but not more (C) shelf.
950 m to 1000 m in range from the gun. than twenty of the calls. 3. In a large university 90% of the students are (c) Find the probability that exactly two of
The trajectories of the rounds all lie in the (c) Four hundred independent calls are made.
right-handed. them will have weights in the range
For each call the probability of getting
vertical plane through the gun and the target. It
'number unobtainable' is 0.004. Find the (a) Show that the probability that in a random
|X − 100| < 6.8.
is found that 27 rounds fall short of the target (d) Comment on the suitability of the normal
probability of getting 'number unobtainable' sample of eight students exactly six will be
and 69 rounds fall beyond it. distribution as a model for X. (L)
fewer than three times. (C) right-handed is approximately 0.149.
Mixed test 7B
1. The area that can be painted using one litre of 4. Frugal Bakeries claim that packs of ten of their
Luxibrite paint is normally distributed with buns contain on average 75 raisins. A Poisson
mean 13.2 m² and standard deviation 0.197 m².
distribution is used to model the number of
The corresponding figures for one litre of Maxigloss raisins in a randomly selected bun.
paint are 13.4 m² and 0 34:' m². It is required to
(a) Specify the value of the parameter.
paint an area of 12.9 m². Find which paint gives (b) State any assumption required about the
the greater probability that one litre will be distribution of raisins in the production
sufficient, and obtain this probability. (C) process for this model to be valid.
(c) Show that the probability that a randomly
2. Soup is sold in tins which are filled by a machine. selected bun contains more than eight raisins
The actual weight of soup delivered to a tin by
the filling machine is always normally distributed
is 0.338.
(d) Find the probability that in a pack of ten
Linear combinations of normal variables
about the mean weight with a standard deviation buns at least two buns contain more than
of 8 g. The machine is set originally to deliver a eight raisins.
mean weight of 810 g. (e) Using a suitable approximation, find the
(a) Determine the probability that the weight of probability that in a pack of ten buns there In this chapter you will learn about the distributions for
soup in a tin, selected at random, is less than are more than 80 raisins. (L)
800 g. • the sum of independent normal variables
(b) Determine the probability that the weight of 5. An engineering firm sets an aptitude test when
soup in a tin, selected at random, is between applicants first apply for training. The times • the difference of independent normal variables
795 g and 820 g. taken to complete the test are normally
distributed with mean 40.5 minutes and standard • multiples of independent normal variables
Proposed legislation requires that not more than deviation 7.5 minutes. Applicants who complete
2.5% of tins may contain less than the nominal the test in less than 30 minutes are immediately
net weight of 800 g. accepted for training. Those who take between You will need the following results, first introduced on pages 256 and 257.
(c) Assuming that the value of the standard 30 and 36 minutes are required to take a further
deviation remains unchanged, determine the test. All other applicants are rejected. If X and Y are any two random variables, discrete or continuous, and a and b are any two
minimum mean weight that the machine (a) For a randomly chosen applicant calculate constants,
should be set to deliver in order to comply the probability of
with this requirement. (NEAB) (i) immediate acceptance for training, Sums Differences
(ii) requirement to take a further test.
3. Consultants employed by a large library reported (b) Given that a randomly chosen applicant was
E(X + Y) = E(X) + E(Y) ... (1)    E(X − Y) = E(X) − E(Y) ... (2)
that the time spent in the library by a user could not rejected after this first test, calculate, to E(aX + bY) = aE(X) + bE(Y) ... (3)    E(aX − bY) = aE(X) − bE(Y) ... (4)
be modelled by a normal distribution with mean three decimal places, the probability that the
65 minutes and standard deviation 20 minutes. applicant was immediately accepted for Also, if X and Y are independent, then
(a) Assuming that this model is adequate, what training.
(c) On a certain occasion there were 100
Var(X + Y) = Var(X) + Var(Y) ... (5)    Var(X − Y) = Var(X) + Var(Y) ... (6)
is the probability that a user spends
(i) less than 90 minutes in the library, applicants. Use a suitable distributional Var(aX + bY) = a² Var(X) + b² Var(Y) ... (7)    Var(aX − bY) = a² Var(X) + b² Var(Y) ... (8)
(ii) between 60 and 90 minutes in the approximation to calculate the probability
library? that more than 25 applicants were required
to take a further test. (NEAB)
The library closes at 9.00 p.m.
(b) Explain why the model above could not
apply to a user who entered the library at THE SUM OF INDEPENDENT NORMAL VARIABLES
8.00 p.m. .
(c) Estimate an approximate latest time of entry Consider this example which involves the sum of independent normal variables.
for which the model above could still be
plausible. (AEB)
Example 8.1
A coffee machine is installed in a students' common room. It dispenses white coffee by first
releasing a quantity of black coffee, normally distributed with mean 122.5 ml and standard
deviation 7.5 ml, and then adding a quantity of milk, normally distributed with mean 30 ml
and standard deviation 5 ml.
Each cup is marked to a level of 137.5 ml and if this level is not attained the customer receives
the drink free of charge.
What percentage of cups of white coffee will be given free of charge?
Solution 8.1
B is the amount, in millilitres, of black coffee, where B ~ N(122.5, 7.5²).
M is the amount, in millilitres, of milk, where M ~ N(30, 5²).
B and M are independent normal variables.
Consider W, the amount, in millilitres, of white coffee, made by combining the black coffee and milk, so W = B + M and
E(W) = E(B) + E(M) = 122.5 + 30 = 152.5 (using Result 1 above)
Var(W) = Var(B) + Var(M) = 7.5² + 5² = 81.25 (using Result 5 above)
So W = B + M has a mean of 152.5 and a variance of 81.25.
For independent normal variables, it is true that the sum of these variables is also normally distributed, so
B + M ~ N(152.5, 81.25)
i.e. W ~ N(152.5, 81.25)
The drink is free of charge if W < 137.5
P(W < 137.5) = P(Z < (137.5 − 152.5)/√81.25)
= P(Z < −1.664)
= 1 − Φ(1.664)
= 1 − 0.9519
= 0.0481
= 4.81%
(Sketch: W = 137.5, 152.5 correspond to Z = −1.664, 0.)
So approximately 5% of the cups of white coffee will be given free of charge.

In general
If X ~ N(μ₁, σ₁²) and Y ~ N(μ₂, σ₂²), where X and Y are independent,
then X + Y ~ N(μ₁ + μ₂, σ₁² + σ₂²)
This result can be extended to any set of independent normal variables X₁, X₂, ..., Xₙ.
So X₁ + X₂ + ... + Xₙ ~ N(μ₁ + μ₂ + ... + μₙ, σ₁² + σ₂² + ... + σₙ²)

Solution 8.2
Let T be the total time, in seconds, for the relay race.
Then T = A + B + C + D
E(T) = E(A) + E(B) + E(C) + E(D)
= 10.8 + 23.7 + 62.8 + 121.2
= 218.5
Var(T) = Var(A) + Var(B) + Var(C) + Var(D)
= 0.2² + 0.3² + 0.9² + 2.1²
= 5.35
∴ T ~ N(218.5, 5.35)
To find the probability that the total time is less than 3 minutes 35 seconds, i.e. 215 seconds,
find P(T < 215) = P(Z < (215 − 218.5)/√5.35)
= P(Z < −1.513)
= 1 − Φ(1.513)
= 1 − 0.9349
= 0.0651
(Sketch: T = 215, 218.5 correspond to Z = −1.513, 0.)
The probability that the runners take less than 3 minutes 35 seconds is 0.065 (2 s.f.).

Consider now the special case when X₁, X₂, ..., Xₙ are n independent observations from the same normal distribution,
so X₁ ~ N(μ, σ²), X₂ ~ N(μ, σ²), ..., Xₙ ~ N(μ, σ²)
then E(X₁ + X₂ + ... + Xₙ) = E(X₁) + E(X₂) + ... + E(Xₙ)
= μ + μ + ... + μ
= nμ
Var(X₁ + X₂ + ... + Xₙ) = Var(X₁) + Var(X₂) + ... + Var(Xₙ)
= σ² + σ² + ... + σ²
= nσ²
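The coffee-machine and relay-race calculations above can be checked numerically. The following Python sketch is an illustration only and is not part of the original text; it assumes Python 3.8 or later (for `statistics.NormalDist`), and the helper name `sum_of_independent_normals` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def sum_of_independent_normals(means, sds):
    """Distribution of a sum of independent normal variables.

    The means add and the variances (not the standard deviations) add.
    """
    total_mean = sum(means)
    total_var = sum(sd ** 2 for sd in sds)
    return NormalDist(total_mean, sqrt(total_var))


# Example 8.1: W = B + M, black coffee N(122.5, 7.5^2) plus milk N(30, 5^2).
W = sum_of_independent_normals([122.5, 30], [7.5, 5])
p_free = W.cdf(137.5)  # P(W < 137.5)

# Solution 8.2: T = A + B + C + D, the four legs of the relay race.
T = sum_of_independent_normals([10.8, 23.7, 62.8, 121.2], [0.2, 0.3, 0.9, 2.1])
p_fast = T.cdf(215)  # P(T < 215)

print(f"W ~ N({W.mean:.1f}, {W.variance:.2f}), P(W < 137.5) = {p_free:.4f}")
print(f"T ~ N({T.mean:.1f}, {T.variance:.2f}), P(T < 215) = {p_fast:.4f}")
```

The printed probabilities should agree with the table-based answers 0.0481 and 0.0651 to about three decimal places; any small difference comes from rounding z before reading the tables.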
Example 8.3
So T ~ N(410, 550).
The probability that the clearance is between 0.05 cm and 0.25 cm is 0.49 (2 s.f.).
Solution 8.6
X is the volume, in millilitres, of liquid and X ~ N(20.42, 0.429²).
Y is the capacity, in millilitres, of a bottle and Y ~ N(21.77, 0.210²).
The bottle will overflow if the quantity of liquid is greater than the capacity of the bottle,
i.e. if X > Y so X − Y > 0
Let D = X − Y
E(D) = E(X) − E(Y) = 20.42 − 21.77 = −1.35
Var(D) = Var(X) + Var(Y) = 0.429² + 0.210² = 0.2281
D ~ N(−1.35, 0.2281)
P(D > 0) = P(Z > (0 − (−1.35))/√0.2281)
= P(Z > 2.827)
= 1 − Φ(2.827)
= 1 − 0.9976
= 0.0024
(Sketch: D = −1.35, 0 correspond to Z = 0, 2.827.)
∴ 0.24% of bottles will overflow during filling.

Example 8.7
In a cafeteria, baked beans are served either in ordinary portions or in children's portions. The quantity given for an ordinary portion is a normal variable with mean 90 g and standard deviation 3 g and the quantity given for a child's portion is a normal variable with mean 43 g and standard deviation 2 g.
What is the probability that Tom, who has two children's portions, is given more than his father, who has an ordinary portion?

Solution 8.7
C is the quantity, in grams, in a child's portion. Then C ~ N(43, 4).
A is the quantity, in grams, in an ordinary portion. Then A ~ N(90, 9).
You need to find P(C₁ + C₂ > A), i.e. P(C₁ + C₂ − A > 0)
Let W = C₁ + C₂ − A
E(W) = E(C₁) + E(C₂) − E(A)
= 2E(C) − E(A)
= 86 − 90
= −4
Var(W) = Var(C₁) + Var(C₂) + Var(A)
= 2 Var(C) + Var(A)
= 8 + 9
= 17
So W ~ N(−4, 17)
P(W > 0) = P(Z > (0 − (−4))/√17)
= P(Z > 0.970)
= 1 − Φ(0.970)
= 0.166
(Sketch: W = −4, 0 correspond to Z = 0, 0.970.)
The probability that Tom is given more than his father is 0.17 (2 s.f.).

1. X and Y are independent normal variables with X ~ N(100, 49) and Y ~ N(110, 576).
(a) Find the mean and the standard deviation of the distribution X + Y.
(b) Describe the distribution of X + Y.
(c) Find P(X + Y > 200).
(d) Find P(180 < X + Y < 240).

2. Each weekday Mr Harper goes to the local library to read the newspapers. The time he spends travelling is a normal variable with mean 1.5 minutes and standard deviation 2 minutes. The time he spends in the library is normally distributed with mean 25 minutes and standard deviation 4 minutes.
Find the probability that, on a particular day, Mr Harper
(a) is away from the house for more than 45 minutes,
(b) spends more time travelling than in the library.

3. Bolts are manufactured which are to fit in holes in steel plates.
The diameter of the bolts is normally distributed with mean 2.60 cm and standard deviation 0.03 cm. The diameter of the holes is normally distributed with mean of 2.71 cm and standard deviation of 0.04 cm.
(a) Verify that, if a bolt and a hole are selected at random, the probability that the bolt is too large to enter the hole is 0.0139.
(b) The random selection of a bolt and hole is carried out five times. Find the probability that in every case the bolt will be able to enter the hole. (C)

4. The mass of a particular article follows a normal distribution with mean 20 g and variance 4 g². A random sample of 12 items is tested. Find the probability that the total mass is less than 230 g.

5. Fiona, Carly, Jenny and Vicky swim in the 4 × 100 m freestyle relay team, with each one swimming 100 m. The times in seconds taken by each of the girls to swim 100 m are independent normal variables, distributed as follows:
F ~ N(52.5, 0.3²), C ~ N(52.0, 0.6²), J ~ N(53.5, 1.2²), V ~ N(51.5, 0.6²).
Calculate the probability that in a particular race,
(a) Fiona will swim her leg in less than 52.5 seconds,
(b) the relay team will take longer than 3 minutes 31.3 seconds to swim the race,
(c) Carly will swim her leg faster than Vicky.

6. The mass, in grams, of a Chocolate Delight cake is normally distributed with mean 20 g and standard deviation 2 g. The cakes are sold in packets of six and the mass of the packing material is normally distributed with a mean of 30 g and a standard deviation of 4 g.
(a) Find the probability that the mass of six cakes is less than 110 g.
(b) Find the probability that the total mass of a packet containing six cakes is
(i) more than 162 g,
(ii) less than 137 g,
(iii) between 140 g and 153 g.

7. In a certain village, the heights of the women are normally distributed with a mean of 164 cm and a standard deviation of 5 cm. The heights of the men are normally distributed with a mean of 173 cm and a standard deviation of 6 cm.
A man and a woman are picked at random from the people in the village.
Find the probability that
(a) the woman is taller than the man,
(b) the man is more than 5 cm taller than the woman.

8. The mass of a certain grade of apple is normally distributed with mean mass 120 g and standard deviation 10 g.
(a) An apple of this grade is selected at random. Find the probability that its mass lies between 100.5 g and 124 g.
(b) Four apples of this grade are selected at random. Find the probability that their total mass exceeds 505 g.

9. Rods are produced in two lengths, called 'short' and 'long'.
S is the length, in centimetres, of a short rod, where S ~ N(5, 0.25).
L is the length, in centimetres, of a long rod, where L ~ N(10, 1).
Rods are joined to give longer lengths. Find the probability that a length consisting of
(a) two short rods and four long rods is longer than 52 cm,
(b) three short rods and two long rods is between 33 cm and 36 cm long,
(c) six short rods is longer than a length consisting of three long rods.
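The arithmetic of Solution 8.7 can be reproduced in a few lines. This is a minimal sketch for checking purposes, not part of the original text; it assumes Python 3.8 or later for `statistics.NormalDist`, and all variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Child's portion C ~ N(43, 4), ordinary portion A ~ N(90, 9).
# The second parameter in the book's notation is a variance.
mean_c, var_c = 43, 4
mean_a, var_a = 90, 9

# W = C1 + C2 - A: the means combine with their signs,
# but the variances of independent variables always add.
mean_w = 2 * mean_c - mean_a  # 86 - 90 = -4
var_w = 2 * var_c + var_a     # 8 + 9 = 17

W = NormalDist(mean_w, sqrt(var_w))
p_tom_more = 1 - W.cdf(0)     # P(W > 0)
print(round(p_tom_more, 3))
```

Note that the two children's portions are separate observations C₁ and C₂, so their combined variance is 2 Var(C) = 8. Treating them as the multiple 2C would give Var(2C) = 4 Var(C) = 16, which is a different model.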
                        Mean    Standard deviation
The coat of paint A     3.7     0.42
Each coat of paint B    1.3     0.15
Each coat of paint C    1.0     0.12

Station on another train whose time of arrival is normally distributed about the scheduled time of 08:20 with standard deviation of 1 minute. It takes me three minutes to change platforms.
If I miss the train from Temple Meads, I am late for work.
(a) Find the probability that I am late for work.
(b) Find the probability that I miss the train from Temple Meads Station every day from Monday to Friday in a given week.

P(D > 0) = P(Z > (0 − (−10))/√48)
= P(Z > 1.443)
= 1 − Φ(1.443)
= 1 − 0.9255
= 0.0745
(Sketch: D = −10, 0 correspond to Z = 0, 1.443.)
;:~~::r~:~:::f::: ;~;:~=~~: ~o;i:~~C;I{'~a:~~). of X is more than twice the value of
MULTIPLES OF INDEPENDENT NORMAL VARIABLES
Remember that, for any constant a,
E(aX) = aE(X) (page 246) and Var(aX) = a² Var(X) (page 250)
If X is a normal variable such that X ~ N(μ, σ²)
then E(aX) = aE(X) = aμ
Var(aX) = a² Var(X) = a²σ²
It can be shown that aX is also normally distributed,
so aX ~ N(aμ, a²σ²)

Care must be taken in distinguishing between a sum of random variables and a multiple.
For example, if X is the weight of a small loaf, then the sum X₁ + X₂ + X₃ is the total weight of three small loaves.
If X ~ N(μ, σ²) then X₁ + X₂ + X₃ ~ N(3μ, 3σ²).
But if there is a large economy-size loaf which is three times the weight of a small loaf, then the weight of an economy loaf is 3X (a multiple)
and 3X ~ N(3μ, 9σ²).
Notice that the means are the same but the variances are not.
The distribution for the multiple is more spread out.
Look carefully at the following example.

Example 8.9
A soft drinks manufacturer sells bottles of drinks in two sizes. The amount in each bottle, in millilitres, is normally distributed as shown in the table:

         Mean (ml)    Variance (ml²)
Small    252          4
Large    1012         25

(a) A bottle of each size is selected at random. Find the probability that the large bottle contains less than four times the amount in the small bottle.
(b) One large and four small bottles are selected at random. Find the probability that the amount in the large bottle is less than the total amount in the four small bottles.

Solution 8.9
Let S be the amount, in millilitres, in a small bottle. Then S ~ N(252, 4).
Let L be the amount, in millilitres, in a large bottle. Then L ~ N(1012, 25).
(a) To find the probability that the large bottle contains less than four times the amount in a small bottle, you need P(L < 4S)
i.e. P(L − 4S < 0).
Now E(L − 4S) = E(L) − E(4S)
= E(L) − 4E(S)
= 1012 − 1008
= 4
Var(L − 4S) = Var(L) + Var(4S)
= Var(L) + 16 Var(S)
= 25 + 64
= 89
So L − 4S ~ N(4, 89)
P(L − 4S < 0) = P(Z < (0 − 4)/√89)
(b) To find the probability that the amount in a large bottle is less than the total amount in four small bottles you need P(L < S₁ + S₂ + S₃ + S₄) = P(L − (S₁ + S₂ + S₃ + S₄) < 0)
E(L − (S₁ + ... + S₄)) = E(L) − E(S₁ + ... + S₄)
= E(L) − 4E(S)
= 1012 − 1008
= 4
Var(L − (S₁ + ... + S₄)) = Var(L) + Var(S₁ + ... + S₄) (remember the + sign)
= Var(L) + 4 Var(S)
= 25 + 16
= 41
Therefore L − (S₁ + ... + S₄) ~ N(4, 41)
P(L − (S₁ + ... + S₄) < 0) = P(Z < (0 − 4)/√41)
= P(Z < −0.625)
= 1 − Φ(0.625)
= 0.266
(Sketch: L − (S₁ + ... + S₄) = 0, 4 correspond to Z = −0.625, 0.)
The probability that a large bottle contains less than four small bottles is 0.27 (2 s.f.).
It is very important to distinguish between
the multiple of S in part (a) and
the sum of S₁, S₂, S₃, S₄ in part (b).
Note that
E(L − 4S) = 4 and E(L − (S₁ + S₂ + S₃ + S₄)) = 4. The means are the same.
Var(L − 4S) = 89 and Var(L − (S₁ + S₂ + S₃ + S₄)) = 41. The variances are different.

Exercise 8b Multiples of normal variables
1. X and Y are independent normal variables such that X ~ N(40, 12) and Y ~ N(60, 15). Find
(a) P(2X + Y > 130)
(b) P(3X − 2Y < 20)

2. The time taken by Simon to do his Mathematics homework can be modelled by a normal distribution with mean 50 minutes and standard deviation 10 minutes. The time taken by Belinda is N(30, 25).
(a) Find the probability that, for a particular homework, Simon takes more than twice as long as Belinda.

3. The thickness, P cm, of a randomly chosen paperback book may be regarded as an observation from a normal distribution with mean 2.0 and variance 0.730.
The thickness, H cm, of a randomly chosen hardback book may be regarded as an observation from a normal distribution with mean 4.9 and variance 1.920.
(a) Determine the probability that the combined thickness of four randomly chosen paperbacks is greater than the combined thickness of two randomly chosen hardbacks.
The probability that an adult husky dog has a mass greater than 30 kg is 0.919 (3 d.p.).

$$\dots = \Phi(1.960) + \Phi(1.470) - 1 = 0.9750 + 0.9292 - 1 = 0.9042$$
[Diagram: T = 198, 222, 240 corresponds to Z = -1.96, 0, 1.47]
The probability that six huskies have a total mass lying between 198 kg and 240 kg is 0.904 (3 d.p.).

• For n independent normal variables such that $X_i \sim N(\mu_i, \sigma_i^2)$,
$$X_1 + X_2 + \dots + X_n \sim N(\mu_1 + \mu_2 + \dots + \mu_n,\ \sigma_1^2 + \sigma_2^2 + \dots + \sigma_n^2)$$
• For n independent observations of the random variable X where $X \sim N(\mu, \sigma^2)$,
$$X_1 + X_2 + \dots + X_n \sim N(n\mu, n\sigma^2)$$
• For the normal variable such that $X \sim N(\mu, \sigma^2)$, and for any constant a,
$$aX \sim N(a\mu, a^2\sigma^2)$$
• For two independent normal variables such that $X \sim N(\mu_1, \sigma_1^2)$ and $Y \sim N(\mu_2, \sigma_2^2)$, and for any constants a and b,
$$aX + bY \sim N(a\mu_1 + b\mu_2,\ a^2\sigma_1^2 + b^2\sigma_2^2)$$
$$aX - bY \sim N(a\mu_1 - b\mu_2,\ a^2\sigma_1^2 + b^2\sigma_2^2)$$
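The rule for aX + bY can be checked numerically. The following is a minimal sketch, not part of the original text: it uses Python's standard-library `statistics.NormalDist` in place of normal tables, and the helper name `linear_combination` is made up for illustration. The figures are those of the L - 4S working earlier in this section (E(L) = 1012, Var(L) = 25, 4E(S) = 1008, 16Var(S) = 64).

```python
from math import sqrt
from statistics import NormalDist


def linear_combination(a, x, b, y):
    """Distribution of aX + bY for independent normal X and Y.

    The mean is a*mu_x + b*mu_y and the variance is
    a^2*var_x + b^2*var_y (variances add even when b is negative).
    """
    mean = a * x.mean + b * y.mean
    variance = a**2 * x.variance + b**2 * y.variance
    return NormalDist(mean, sqrt(variance))


# E(L) = 1012 and Var(L) = 25; 4E(S) = 1008 and 16Var(S) = 64,
# so E(S) = 252 and Var(S) = 4.
L = NormalDist(1012, sqrt(25))
S = NormalDist(252, sqrt(4))

D = linear_combination(1, L, -4, S)  # D = L - 4S
p = D.cdf(0)                         # P(L - 4S < 0)
print(D.mean, round(D.variance, 6), round(p, 3))
```

The helper takes the coefficients with their signs, so a difference such as L - 4S is passed as a = 1, b = -4; the variance line squares b, which is why the two variances are added rather than subtracted.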
Example 8.11
The lifetimes of Econ light bulbs are normally distributed with mean 1000 h and standard deviation 25 h.
(a) Find, to three decimal places, the probability that an Econ light bulb will have a lifetime between 975 h and 1020 h.
(b) Calculate, to three decimal places, the probability that the sum of the lifetimes of eight Econ light bulbs will exceed 7930 h. Indicate clearly the stage in your calculation when an assumption concerning independence is essential.
The lifetimes of Enersaver light bulbs are normally distributed with mean 7900 h and standard deviation 50 h.
(c) Calculate, to three decimal places, the probability that an Enersaver light bulb will last at least eight times as long as an Econ light bulb. (NEAB)

Solution 8.11
X is the lifetime, in hours, of an Econ light bulb. Then $X \sim N(1000, 25^2)$.

(c) Y is the lifetime of an Enersaver light bulb and $Y \sim N(7900, 50^2)$.
$P(Y > 8X)$ is needed, i.e. $P(Y - 8X > 0)$.
$$E(Y - 8X) = E(Y) - 8E(X) = 7900 - 8000 = -100$$
$$\mathrm{Var}(Y - 8X) = \mathrm{Var}(Y) + 8^2\,\mathrm{Var}(X) = 50^2 + 64 \times 25^2 = 42\,500 \quad \text{(assuming independence)}$$
$$Y - 8X \sim N(-100, 42\,500)$$
$$P(Y - 8X > 0) = P\left(Z > \frac{0 - (-100)}{\sqrt{42\,500}}\right) = P(Z > 0.485) = 1 - \Phi(0.485) = 1 - 0.6862 = 0.3138$$
[Diagram: Y - 8X = -100, 0 corresponds to Z = 0, 0.485]
The probability that an Enersaver light bulb lasts at least eight times as long as an Econ light bulb is 0.314 (3 d.p.).
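All three parts of Example 8.11 can be checked numerically. This is a sketch only: it uses Python's standard-library `statistics.NormalDist` rather than the tables the book relies on, so the final decimal place may differ slightly from a table-based answer. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

econ = NormalDist(1000, 25)  # X, lifetime of one Econ bulb (hours)

# (a) P(975 < X < 1020)
p_a = econ.cdf(1020) - econ.cdf(975)

# (b) Sum of eight Econ lifetimes ~ N(8 * 1000, 8 * 25^2).
#     Adding the variances is the step that needs independence.
total8 = NormalDist(8 * 1000, sqrt(8 * 25**2))
p_b = 1 - total8.cdf(7930)

# (c) Y - 8X ~ N(7900 - 8000, 50^2 + 64 * 25^2), assuming independence.
diff = NormalDist(7900 - 8 * 1000, sqrt(50**2 + 8**2 * 25**2))
p_c = 1 - diff.cdf(0)

print(round(p_a, 3), round(p_b, 3), round(p_c, 3))
```

Parts (b) and (c) differ in how the 8 enters the variance: the sum of eight separate bulbs has variance 8 × 25², whereas eight times a single bulb has variance 8² × 25².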
5. The tensile strengths, measured in newtons (N), of a large number of ropes of equal length are independently and normally distributed such that 5% are under 706 N and 5% over 1294 N.
Four such ropes are randomly selected and joined end-to-end to form a single rope; the strength of the combined rope is equal to the strength of the weakest of the four selected ropes. Derive the probabilities that this combined rope will not break under tensions of 1000 N and 900 N, respectively.
A further 4 ropes are randomly selected and attached between two rings, the strength of the arrangement being the sum of the strengths of the 4 separate ropes. Derive the probabilities that this arrangement will break under tensions of 4000 N and 4200 N, respectively. (NEAB)

6. X and Y are independent normally distributed random variables such that X has mean 32 and variance 25, and Y has mean 43 and variance 96. Find
(a) $P(X > 43)$,
(b) $P(X - Y > 0)$,
(c) $P(2X - Y > 0)$. (NEAB)

7. The times taken by two runners A and B to run 400 m races are independent and normally distributed with means 45.0 s and 45.2 s, and standard deviations 0.5 s and 0.8 s respectively. The two runners are to compete in a 400 m race for which there is a track record of 44.5 s.
(a) Calculate, to three decimal places, the probability of runner A breaking the track record.
(b) Show that the probability of runner B breaking the track record is greater than that of runner A.
(c) Calculate, to three decimal places, the probability of runner A beating runner B. (NEAB)

8. In a packaging factory, the empty containers for a certain product have a mean weight of 400 g with a standard deviation of 10 g. The mean weight of the contents of a full container is 800 g with a standard deviation of 15 g. Find the expected total weight of 10 full containers and the standard deviation of this weight, assuming that the weights of containers and contents are independent.
Assuming further that these weights are normally distributed random variables, find the proportion of batches of 10 full containers which weigh more than 12.1 kg. (O&C)

9. Jam is packed into tins of advertised weight 1 kg. The weight of a randomly selected tin of jam is normally distributed about a target weight with a standard deviation of 12 g.
(a) If the target weight is 1 kg, find the probability that a randomly chosen tin weighs
(i) less than 985 g,
(ii) between 970 g and 1015 g.
(b) If not more than one tin in 100 is to weigh less than the advertised weight, find the minimum target weight required to meet this condition.
(c) The target weight is fixed at 1 kg. The resulting tins are packed in boxes of six and the weight of the box is normally distributed with mean weight 250 g and standard deviation 10 g. Find the probability that a randomly chosen box of 6 tins will weigh less than 6.2 kg. (L)

10. (a) The lifetime in hours of an electrical component has a normal distribution with mean 150 hours and standard deviation 8 hours.
Find the probability that
(i) a new component lasts at least 160 hours,
(ii) a component which has already operated for 145 hours will last at least another 15 hours.
(b) The weight of these components is normally distributed with mean 250 g and standard deviation 10 g. Each component is in its own box, the weight of which is also normally distributed with mean 50 g and standard deviation 5 g. There are 10 boxed components to a carton and the weight of the carton is normally distributed with mean 75 g and standard deviation 7 g.
Find the probability that a carton of 10 boxed components weighs less than 3 kg. (L)

11. Jim Longlegs is an athlete whose specialist event is the triple jump. This is made up of a hop, a step and a jump. Over a season the lengths of the hop, step and jump sections, denoted by H, S and J respectively, are measured, from which the following models are proposed:
$H \sim N(5.5, 0.5^2)$, $S \sim N(5.1, 0.6^2)$, $J \sim N(6.2, 0.8^2)$
where all distances are in metres. Assume that H, S and J are independent.
(a) In what proportion of his triple jumps will Jim's total distance exceed 18 m?
(b) In 6 successive independent attempts, what is the probability that at least one total distance will exceed 18 m?
(c) What total distance will Jim exceed 95% of the time?
(d) Find the probability that, in Jim's next triple jump, his step will be greater than his hop. (MEI)

12. [In this question give three places of decimals in each answer.]
The mass of tea in 'Supacuppa' tea bags has a normal distribution with mean 4.1 g and standard deviation 0.12 g. The mass of tea in 'Bumpacuppa' tea bags has a normal distribution with mean 5.2 g and standard deviation 0.15 g.
(a) Find the probability that a randomly chosen Supacuppa tea bag contains more than 4.0 g of tea.
(b) Find the probability that, of 2 randomly chosen Supacuppa tea bags, one contains more than 4.0 g of tea and one contains less than 4.0 g of tea.
(c) Find the probability that 5 randomly chosen Supacuppa tea bags contain a total of more than 20.8 g of tea.
(d) Find the probability that the total mass of tea in 5 randomly chosen Supacuppa tea bags is more than the total mass of tea in 4 randomly chosen Bumpacuppa tea bags. (C)

13. A small bank has two cashiers dealing with customers wanting to withdraw or deposit cash. For each cashier, the time taken to deal with a customer is a random variable having a normal distribution with mean 150 s and standard deviation 45 s.
(a) Find the probability that the time taken for a randomly chosen customer to be dealt with by a cashier is more than 180 s.
(b) One of the cashiers deals with two customers, one straight after the other. Assuming that the times for the customers are independent of each other, find the probability that the total time taken by the cashier is less than 200 s.
(c) At a certain time, one cashier has a queue of 4 customers and the other cashier has a queue of 3 customers, and the cashiers begin to deal with the customers at the front of their queues. Assuming that the cashiers work independently, find the probability that the 4 customers in the first queue will all be dealt with before the 3 customers in the second queue are all dealt with. (C)

Mixed test 8A

1. A country baker makes biscuits whose masses are normally distributed with mean 30 g and standard deviation 2.3 g. She packs them by hand into either a small carton (containing 20 biscuits) or a large carton (containing 30 biscuits).
(a) State the distribution of the total mass, S, of biscuits in a small carton and find the probability that S is greater than 615 g.
(b) Six small and four large cartons are placed in a box. Find the probability that the total mass of biscuits in the 10 cartons lies between 7150 g and 7250 g.
(c) Find the probability that 3 small cartons contain at least 25 g more than 2 large ones.
The label on a large carton of biscuits reads 'Net mass 900 g'. A trading standards officer insists that 90% of such cartons should contain biscuits with a total mass of at least 900 g.
(d) Assuming the standard deviation remains unchanged, find the least value of the mean mass of a biscuit consistent with this requirement. (MEI)

2. Foster's Fancy Cakes are sold in packets of six. The mass of each cake is a normally distributed random variable having mean 25 g and standard deviation 0.4 g. The mass of the packaging is a normally distributed random variable having mean 20 g and standard deviation 1 g. Find, to three decimal places, the probabilities that
(a) the mass of a randomly chosen cake is between 24.7 g and 25.7 g,
(b) the total mass of a randomly chosen packet is less than 173 g.
State one assumption that you have made in answering (b). (NEAB)

3. Manto sherry is sold in bottles of two sizes: standard and large. For each size, the content, in litres, of a randomly chosen bottle is normally distributed with mean and standard deviation as given in the table.

                   Mean    Standard deviation
Standard bottle    0.760   0.008
Large bottle       1.010   0.009

(a) Show that the probability that a randomly chosen standard bottle contains less than 0.750 litres is 0.1056, correct to four places of decimals.
(b) Find the probability that a box of 10 randomly chosen standard bottles contains at least 3 bottles whose contents are each less than 0.750 litres. Give three significant figures in your answer.
(c) Find the probability that there is more sherry in 4 randomly chosen standard bottles than in 3 randomly chosen large bottles. (C)
Mixed test 8B

1. The continuous random variables X and Y represent the masses of male and female students who attend my local College.
Both X and Y are normally distributed such that $X \sim N(75, 6^2)$ and $Y \sim N(65, 5^2)$, where all masses are given in kilograms.
(a) Find the probability that, if a male student and a female student are chosen at random, they both have a mass exceeding 70 kg.
(b) State carefully the distribution of the combined mass of a random sample of m male and f female students.
A lift in the college has a notice

MAXIMUM 8 PEOPLE or 650 kg

Find the probability that the combined mass of a random sample of 8 students will exceed the mass restriction if it consists of
(i) 8 males,
(ii) 5 males and 3 females.
(c) What is the probability that a randomly selected female student has a greater mass than a randomly selected male student? (MEI)

2. The mass of a cheese biscuit has a normal distribution with mean 6 g and standard deviation 0.2 g. Determine the probability that
(a) a collection of twenty-five cheese biscuits has a mass of more than 149 g,
(b) a collection of 30 cheese biscuits has a mass of less than 180 g,
(c) twenty-five times the mass of a cheese biscuit is less than 149 g.
The mass of a ginger biscuit has a normal distribution with mean 10 g and standard deviation 0.3 g. Determine the probability that a collection of 7 cheese biscuits has a mass greater than a collection of 4 ginger biscuits.
(It may be assumed that all the biscuits were sampled at random from their respective populations.) (C)

3. Certain components for a revolutionary new sewing machine are assembled by inserting a part of one type (sprotsil) into a part of another type (weavil). Sprotsils have external dimensions which are normally distributed with mean 2.50 cm and standard deviation 0.018 cm. Weavils have internal dimensions which are normally distributed with mean 2.54 cm and standard deviation 0.024 cm. Under suitable pressure, the two types fit together satisfactorily if the dimensions differ by not more than ±0.035 cm. Show that, if pairs of parts are chosen at random, the difference
D = internal dimension of a weavil - external dimension of a sprotsil
is distributed with mean 0.04 cm and standard deviation 0.030 cm. Hence show that approximately 42.8% of randomly selected pairs will fit together satisfactorily. Now, if it is known that the internal dimension of a given weavil is 2.517 cm, what is the probability that a randomly chosen sprotsil will fit this weavil satisfactorily? (AEB)

Sampling and estimation

In this chapter you will learn about
• sampling methods including random and non-random sampling
• how to simulate a random sample from a given distribution
• the expectation and variance of the sample mean
• the distribution of the sample mean
• the use of the central limit theorem
• the distribution of the sample proportion
• estimates of population parameters:
  - mean
  - variance
  - proportion
• confidence intervals for:
  - a population mean, involving the z-distribution
  - a population mean, involving the t-distribution
  - a population proportion
SAMPLING
Population
In a statistical enquiry you often need information about a particular group. This group is
known as the population or the target population, and it could be small, large or even infinite.
Note that the word 'population' does not necessarily mean 'people'.
Here are some examples of populations:
pupils in a class,
people in England in full time employment,
hospitals in Wales,
cans of soft drink produced in a factory,
ferns in a wood,
rational numbers between 0 and 10.
SURVEYS

Information is collected by means of a survey. There are two types:
(a) a census,
(b) a sample survey.

(a) Census

In a census every member of the population is surveyed.
When the population is small, this could be a straightforward exercise. For example, it would be easy to find out how each pupil in a class travelled to school on a particular morning.
When populations are large, taking a census can be very time consuming and difficult to do with accuracy. Each year the government carries out a census in schools on the third Thursday in January. This requests the number of boys and girls in each age group on the roll of every school in the country. Its accuracy, though, relates only to that day. Even more difficult to carry out accurately is the population census taken every ten years. This attempts to provide details of different age groups for every area in Britain. When populations are very large, or infinite, it is not possible to survey every member.
On occasions it would not be sensible to survey every member. For example, if you performed a census to establish the length of life of a particular brand of light bulb, you would test each bulb until it failed and so you would destroy the population!

(b) Sample survey

When a survey covers less than 100% of the population, it is known as a sample survey. In many circumstances, taking a sample is preferable to carrying out a census. Sample data can be obtained relatively cheaply and quickly and, if the sample is representative of the population, a sample survey can give an accurate indication of the population characteristic being studied.
The size of the sample does not depend on the size of the population. It often depends on the time and money available to collect information. Note that large samples are likely to give more reliable information than small ones. The next time that you read the results of a public opinion poll in the newspaper, look at the size of the sample - it is usually over 1000.

Sample design

Once the purpose of a survey has been stated precisely, the target population must be defined, for example
all the primary schools in England,
all the oak trees in Hampshire,
all the people admitted to the General Hospital in January suffering from a heart attack.

The sampling units must be defined clearly. These are the people or items to be sampled, for example
the primary school,
the oak tree,
the person suffering from a heart attack.

Once the sampling units within a population are individually named or numbered to form a list, then this list of sampling units is called a sampling frame. It could take various forms (e.g. a list, a map, a set of maps), and should be as accurate as possible.
Ideally the sampling frame should be the same as the target population. For example, if the target population is all the first year students in a college, then the sampling frame and the target population should be the same, provided that the register is up-to-date and accurate. A sampling frame for people in Britain eligible to vote, however, is more difficult to form. The electoral register attempts to list all those who are eligible to vote throughout all the areas in the country, but it is never completely accurate, since many changes occur during the time that the information is being processed. Some people do not return the forms, people move in and out of the area, people die etc.
In some instances it is not possible to enumerate all the population, for example, the fish in a lake.

Example 9.1
(a) Explain briefly what you understand by
(i) a population,
(ii) a sampling frame.
(b) A market research organisation wants to take a sample of
(i) owners of diesel motor cars in the UK,
(ii) persons living in Oxford who suffered from injuries to the back during July 1996.
Suggest a suitable sampling frame in each case. (L)

Solution 9.1
(a) (i) A population is a particular group of individuals or items.
(ii) Once the individual members of a population have been numbered to form a list, this list is called a sampling frame.
(b) (i) The list of registered owners as kept by DVLA in Swansea.
(ii) A list made from information supplied by Health Clinics in Oxford during July 1996.

Bias

The purpose of sampling is to gain information about the whole population by selecting a sample from that population. You want the sample to be representative of the population so you must give every member of the population an equal chance of being included in the sample. This should eliminate any bias in the selection of the sample.
Suppose a population consists of N sampling units and you require a sample of n of these units.
A sample of size n is called a simple random sample if all possible samples of size n are equally likely to be selected. Some form of random process must be used to make the selection.
If the unit selected at each draw is replaced into the population before the next draw, then it can appear more than once in the sample. This is known as sampling with replacement.
If the unit selected at each draw is not replaced into the population before the next draw, this is known as sampling without replacement.
The second method, sampling without replacement, is known as simple random sampling.
Two methods of selection are
• drawing lots,
• random number sampling.
For each, make a list of all N members of the population and give each member a different number.

(a) To select a group of eight people from a target population of 100 people, allocate a two-digit number to each person, for example allocate 01 to the first on the list, 02 to the second, ... up to 98, 99, 00, calling the hundredth person 00 for convenience.
Using the list, starting at the beginning of the first row and reading along the rows, you would select people corresponding to the following numbers:
68 72 53 81 59 25 34 70
Alternatively, you could decide to read the digits backwards, from bottom right, in which case your sample would consist of people corresponding to the numbers …

(b) To select a group of eight from a target population of 60 people, allocate each person a number from 01 to 60.
Using the tables, disregard any two-digit number outside the range.
Starting at the beginning of the first row and grouping in pairs gives
68 72 53 81 59 25 34 70 54 … 32 … … 47 05
So you would choose people corresponding to the numbers
53, 59, 25, 34, 54, 32, 47, 05.

Example 9.3
Use the following extract from random number tables to select a random sample of 12 numbers, each to two decimal places, from the continuous range $0 \le x < 10$.

54 80 68 72 51 96 08 00
… … … … 60 43 57 … 13 …

Solution 9.3
Since the sample values are required to two decimal place accuracy, consider groups of three digits, inserting the decimal point between the first and second digit.

Solution 9.4
Consider groups of four digits, inserting the decimal point between the first and second digit.
Disregard any values that are out of range. This gives
…, …, …, 1.538, 4.224, 2.330, …, 0.747
So the numbers chosen are 1.538, 4.224, 2.330, 0.747.

Calculator random number generator

You probably have a random number generator key on your calculator, which produces a number, for example 0.398, every time you press it. The numbers generated are in fact obtained using a mathematical formula and are really pseudo random numbers, but they suit the purpose very well indeed.
Suppose you want to use your calculator to select a random sample of six numbers between 1 and 49 for your entry in the National Lottery.
To do this, you probably need to press [Shift] then [Ran#].
Suppose the numbers you get are
0.730, 0.798, 0.369, 0.499, 0.491, 0.310, 0.135, 0.112, 0.593, 0.652, 0.015, 0.346
You can interpret them in various ways, for example:
• If you decide to use the first two digits to the right of the decimal point each time, you would obtain the numbers 73, 79, 36, 49, 49, 31, 13, 11, 59, 65, 01, 34.
Ignoring repeats and numbers bigger than 49, the six numbers would be
36, 49, 31, 13, 11, 1.
• Suppose instead you decide to choose the second and third digits to the right of the decimal point and ignore repeats and numbers bigger than 49. In this case your numbers would be
30, 10, 35, 12, 15, 46.
• If you decide to use all the digits after the decimal point, you would be choosing from the digits 730798369499491310135112593652015346. Grouping these as two-digit numbers gives 73, 07, 98, 36, 94, 99, 49, 13, 10, 13, 51, 12, 59, 36, 52, 01, 53, 46.
Ignoring repeats and numbers bigger than 49 gives the six numbers as
7, 36, 49, 13, 10, 12.
The lists are endless!

Example 9.5
Describe how to choose a systematic sample of eight members from a list of 300.

Solution 9.5
Since you are going to choose every kth member, you need to find a suitable value for k. To do this, choose a convenient value close to $\frac{N}{n}$.
In this case, $\frac{N}{n} = \frac{300}{8} = 37.5$, so $k = 40$ will do.
Now choose a random starting point, for example if [Ran#] on your calculator gives 0.870, take the first member of the sample as 87 and then add 40 each time. The other members are 127, 167, 207, 247, 287, 27 and 67. Note that when you reach the end of the list, go back to the beginning.
So the sample consists of 27, 67, 87, 127, 167, 207, 247, 287.

The advantages of systematic sampling are that it is quick to carry out and it is easy to check for errors. For large scale sampling, systematic selection is usually used in preference to taking simple random samples.
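The systematic selection in Solution 9.5 can be sketched in a few lines of Python. The function name and its parameters are illustrative only; members of the list are assumed to be numbered 1 to N, with the count wrapping back to the start of the list as the solution describes.

```python
def systematic_sample(population_size, sample_size, step, start):
    """Pick every `step`-th member from a list numbered 1..population_size,
    beginning at `start` and wrapping to the beginning of the list when
    the end is reached."""
    members = [
        (start - 1 + i * step) % population_size + 1
        for i in range(sample_size)
    ]
    return sorted(members)


# Figures from Solution 9.5: N = 300, n = 8, k = 40, starting point 87.
sample = systematic_sample(300, 8, 40, 87)
print(sample)  # [27, 67, 87, 127, 167, 207, 247, 287]
```

Only the starting point is random here; once it is fixed, every other member of the sample is determined.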
The disadvantage of this system is that there may be a periodic cycle within the frame itself. For example, a machine may operate in such a manner that … Systematic sampling of every fifth item, starting at 5, would … sample with no faulty items.

Non-random sampling

(a) Cluster sampling
P(X = x)   0.1   0.2   0.4   0.3

Solution 9.8
Form the cumulative distribution function F(x) and then allocate random numbers in a convenient way.

x    P(X = x)   F(x)   Corresponding random numbers
0    0.1        0.1    1
1    0.2        0.3    2, 3
2    0.4        0.7    4, 5, 6, 7
3    0.3        1      8, 9, 0

Take the 10 random numbers given and convert them to sample values:

Random number   3  7  4  7  6  5  3  3  9  0
Sample values   1  2  2  2  2  2  1  1  3  3

So the sample values are 1, 1, 1, 2, 2, 2, 2, 2, 3, 3.

Example 9.9
Generate a random sample of size 4 from the binomial distribution $X \sim B(4, 0.2)$, using the random numbers 2811 5747 6157 8988.

So the random sample of four observations from the binomial distribution consists of the values 0, 1, 1, 2.

Example 9.10
Using the random number 8135 take a single random observation from a Poisson distribution with parameter 3.

Solution 9.10
$X \sim \mathrm{Po}(3)$.
Using cumulative Poisson probability tables (see page 648) and arranging the results in a table together with a convenient corresponding random number allocation gives:

x           F(x)     Corresponding random numbers
0           0.0498   0001 to 0498
1           0.1991   0499 to 1991
2           0.4232   1992 to 4232
3           0.6472   4233 to 6472
4           0.8153   6473 to 8153
5           0.9161   8154 to 9161
6           0.9665   9162 to 9665
7           0.9881   9666 to 9881
8 or over   1        9882 to 9999 and 0000

The given random number 8135 is in the range 6473 to 8153, so the random observation corresponds to x = 4.
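The allocation of random numbers to values used in Examples 9.8 to 9.10 can be sketched as a single lookup. This is an illustrative sketch, not the book's procedure verbatim: the function name is made up, the probabilities are computed rather than read from tables, and a block of zeros is treated as the top of the range, as in the tables above.

```python
from math import comb, exp, factorial


def observation_from_digits(digits, pmf):
    """Convert a block of random digits into one observation.

    `pmf` lists P(X = 0), P(X = 1), ...  The digits are read as a number
    between 1 and 10**k (a block of k zeros counts as 10**k), and the
    observation is the first x whose cumulative probability, rounded to
    k decimal places, reaches that number.
    """
    k = len(digits)
    scale = 10**k
    value = int(digits) or scale
    cumulative = 0.0
    for x, p in enumerate(pmf):
        cumulative += p
        if value <= round(cumulative * scale):
            return x
    return len(pmf) - 1  # guards against rounding in the final total


# Example 9.8: single digits with P(X = x) = 0.1, 0.2, 0.4, 0.3.
table_sample = [observation_from_digits(d, [0.1, 0.2, 0.4, 0.3])
                for d in "3747653390"]

# Example 9.9: X ~ B(4, 0.2) with random numbers 2811 5747 6157 8988.
binomial = [comb(4, x) * 0.2**x * 0.8**(4 - x) for x in range(5)]
binomial_sample = [observation_from_digits(d, binomial)
                   for d in ["2811", "5747", "6157", "8988"]]

# Example 9.10: X ~ Po(3); values 0..7, then "8 or over" as one class.
poisson = [exp(-3) * 3**x / factorial(x) for x in range(8)]
poisson.append(1 - sum(poisson))
poisson_observation = observation_from_digits("8135", poisson)

print(table_sample, binomial_sample, poisson_observation)
```

Working with whole numbers (the digits against the rounded cumulative probability times 10^k) avoids floating-point comparisons at the boundaries, such as 0498 against F(0) = 0.0498.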
Example 9.11
Using the random digits 723 850 take a random sample of size 2 from the continuous distribution with probability density function
$$f(x) = \frac{3}{8}x^2 \quad \text{for } 0 < x < 2$$

Solution 9.11
The cumulative distribution function is given by
$$F(x) = \int_0^x \frac{3}{8}x^2\,dx = \frac{x^3}{8}$$
Taking the first three random numbers:
if $F(x) = 0.723$, then …

Example 9.12
Use the random numbers 382 824 to take a random sample of 2 from the normal distribution $N(30, 4)$.

Solution 9.12
$X \sim N(30, 4)$.
Cumulative probabilities $\Phi(z)$ are given in the standard normal tables (see page 649).
Taking the first three digits of the random number list
$$\Phi(z) = 0.382$$
$$z = \Phi^{-1}(0.382) = -0.3$$
$$\frac{x - 30}{2} = -0.3$$
$$x = 30 - 0.6 = 29.4$$
[Diagram: Z = -0.3, 0 corresponds to X = 29.4, 30]
Now take the second three digits
$$\Phi(z) = 0.824$$
$$z = \Phi^{-1}(0.824) = 0.931$$
$$\frac{x - 30}{2} = 0.931$$
$$x = 30 + 1.862 = 31.9 \text{ (1 d.p.)}$$
[Diagram: Z = 0, 0.931 corresponds to X = 30, 31.862]
So the two random observations are 29.4 and 31.9.

Exercise 9b Simulating random samples from given distributions

In the following, use the random number tables on page 653 if random numbers have not been given.

4. The discrete random variable X has distribution function
$$F(x) = \tfrac{1}{4}(x - 2), \quad x = 3, 4, 5, 6$$
Using random number tables, generate 10 observations of X, showing your working clearly.

Describe how you would select a random sample of 30 pupils from a school containing 850 pupils.

5. You wish to select a person at random from a group of 58 people. The following procedure is suggested:
Allocate the numbers 1 to 58 to the people. Choose a line in a table of random numbers and take the first two digits x and y. Let $z = 10x + y$. If $1 \le z \le 58$ then the person who was allocated the number z is selected. Otherwise, the person allocated the number $z - 58$ is selected.
Comment on this method of selection.

6. Take a random sample of size 6 from the distribution:

x          …
P(X = x)   0.11   0.2   0.45   0.24

…
(b) $P(X = x) = kx$, $x = 0, 1, 2, 3$.

9. Take a random sample of size 5 from the distribution of X where $F(x) = \tfrac{1}{5}x$, $x = 2, 3, 4, 5$.

10. (a) The discrete random variable X is such that $X \sim B(3, 0.4)$. Take a random sample of size 5 from this distribution, using the random numbers
407 315 401 203 972
(b) Using the random number 6143 take a single random observation from the Poisson distribution with parameter 4.

11. Using the random numbers 267 394 018 take a random sample of size 3 from the normal distribution with mean 35 and variance 9.
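The method of Examples 9.11 and 9.12 (read the random digits as a value of the cumulative distribution function, then solve for x) can be sketched in Python. This is a sketch under stated assumptions: `statistics.NormalDist.inv_cdf` stands in for reading the normal tables backwards, and the cube-root line uses F(x) = x³/8 from Solution 9.11.

```python
from statistics import NormalDist

# Random digits are read as a three-decimal-place value of F(x),
# e.g. 723 -> 0.723.

# Example 9.11: F(x) = x^3 / 8 on 0 < x < 2, so x = (8 F)^(1/3).
cubic_sample = [(8 * u) ** (1 / 3) for u in (0.723, 0.850)]

# Example 9.12: X ~ N(30, 4), i.e. standard deviation 2.
normal = NormalDist(30, 2)
normal_sample = [normal.inv_cdf(u) for u in (0.382, 0.824)]

print([round(x, 2) for x in cubic_sample])
print([round(x, 1) for x in normal_sample])
```

The same idea works for any continuous distribution whose cumulative distribution function can be inverted, either algebraically (as for the cubic) or numerically (as for the normal).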
12. Using the random numbers 2654 9342, make two random observations from each of the following distributions:
(a) The number of seeds that germinate in a group of 5 selected at random, given that 75% are expected to germinate.
(b) The number of goals in a football match, where the number of goals follows a Poisson distribution with variance 2.4.
(c) The mass of a bag of sugar, where the mass is normally distributed with mean 1010 g and standard deviation 4.5 g.

13. Using the random number 256 construct a random observation of the continuous random variable X where
$$F(x) = \tfrac{1}{9}x^2, \quad 0 < x < 3.$$

14. Take 20 samples, each of size 2, from the following distribution:

x   1    2    3    4
f   10   15   25   35

Calculate the mean of each sample and find the mean and variance of the sample means. Find the mean and variance of the original distribution. Comment.

15. You are given the random number 431. Use this number to obtain a sample observation from
(a) a binomial distribution with n = 12 and p = 0.4,
(b) a normal distribution with mean 6.2 and standard deviation 0.1.
You are expected to explain clearly how you obtain the sample observations. (O)

16. The digits 8453276 are obtained from a table of random digits. Use them to obtain a random observation from each of the following distributions:
(a) the number of the winning ticket in a lottery in which there are 500 ticket numbers from 1 to 500 and every ticket has the same chance of being selected,
(b) the number of babies born in a cottage hospital in a week, assuming that on average one baby is born every three days and that births are independent (and ignoring the possibility of multiple births). (O)

SAMPLE STATISTICS

When you are trying to find out information about a population it seems sensible to take random samples and then consider the values obtained from them. It is therefore useful to know how these sample values are distributed.
• Take a random sample of n independent observations from a population. Note that from a finite population, sampling should be with replacement to ensure that the observations are independent.
• Calculate the mean of these n sample values. This is known as the sample mean.
• Now repeat the procedure until you have taken all possible samples of size n, calculating the sample mean of each one.
• Form a distribution of all the sample means.
The distribution that would be formed is called the sampling distribution of means.

The mean and variance of the sampling distribution of means

It is possible to work out the mean and variance of this sampling distribution using expectation algebra.
Consider a population X in which $E(X) = \mu$ and $\mathrm{Var}(X) = \sigma^2$.
Take n independent observations $X_1, X_2, \dots, X_n$ from X.
Since $E(X) = \mu$,
$$E(X_1) = \mu,\ E(X_2) = \mu,\ \dots,\ E(X_n) = \mu$$
Since $\mathrm{Var}(X) = \sigma^2$,
$$\mathrm{Var}(X_1) = \sigma^2,\ \mathrm{Var}(X_2) = \sigma^2,\ \dots,\ \mathrm{Var}(X_n) = \sigma^2$$
The sample mean is
$$\bar{X} = \frac{1}{n}(X_1 + X_2 + \dots + X_n)$$
so
$$E(\bar{X}) = \frac{1}{n}\bigl(E(X_1) + E(X_2) + \dots + E(X_n)\bigr) = \frac{1}{n}(n\mu) = \mu$$
$$\mathrm{Var}(\bar{X}) = \frac{1}{n^2}\bigl(\mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \dots + \mathrm{Var}(X_n)\bigr) = \frac{1}{n^2}(n\sigma^2) = \frac{\sigma^2}{n}$$
i.e. $E(\bar{X}) = \mu$ and $\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n}$.

The standard deviation of the sampling distribution is $\sqrt{\frac{\sigma^2}{n}}$, usually written $\frac{\sigma}{\sqrt{n}}$. This is known as the standard error of the mean.
The mean of the sampling distribution is the same as the mean of the population.
The variance of the sampling distribution is smaller than the variance of the population since $\sigma^2$ has been divided by n. This implies that the sample means are much more clustered around $\mu$ than the population values are. In fact, the larger the sample size, the more clustered they are.
The following diagrams help to illustrate the shape of the sampling distribution of means resulting from different sized samples from given populations.

[Figure: Distribution of $\bar{X}$ when n = 2, 5 and 25. Panels: means of samples of size 2, means of samples of size 5, means of samples of size 25]

From the diagrams, you can see that if samples are taken from a normal population, the sampling distribution of means is normal for any sample size.

Solution 9.13
X is the mass, in kilograms, of a male student at the college, and $X \sim N(\mu, \sigma^2)$, with $\mu = 70$ and $\sigma = 5$.
Since the distribution of X is normal, the distribution of $\bar{X}$ is also normal and $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$,
i.e. $\bar{X} \sim N\left(70, \frac{25}{4}\right)$
so $\bar{X} \sim N(70, 6.25)$
$$P(\bar{X} < 65) = P\left(Z < \frac{65 - 70}{\sqrt{6.25}}\right) = P(Z < -2) = 1 - 0.9772 = 0.0228$$
[Diagram: $\bar{X}$ = 65, 70 corresponds to Z = -2, 0]
The probability that the mean mass is less than 65 kg is 0.023 (2 s.f.).
The diagram below shows the distributions of X and $\bar{X}$ drawn to scale.

Example 9.14
The distribution of the random variable X is $N(25, 340)$. The mean of a random sample of size n drawn from this distribution is $\bar{X}$. Find the value of n, correct to two significant figures, given that $P(\bar{X} > 28)$ is approximately 0.005. (C)

$$P(\bar{X} > 28) = P\left(Z > \frac{28 - 25}{\sqrt{340/n}}\right) = P\left(Z > \frac{3\sqrt{n}}{\sqrt{340}}\right)$$
You are given that $P(\bar{X} > 28) = 0.005$,
$$\therefore\ P\left(Z < \frac{3\sqrt{n}}{\sqrt{340}}\right) = 1 - 0.005 = 0.995$$
$$\frac{3\sqrt{n}}{\sqrt{340}} = \Phi^{-1}(0.995) = 2.576$$
$$3\sqrt{n} = 2.576 \times \sqrt{340}$$
$$n = \frac{2.576^2 \times 340}{9} = 250 \text{ (2 s.f.)}$$
[Diagram: $\bar{X}$ = 25, 28 corresponds to Z = 0, 2.576, with upper tail area 0.005]

[Figure: (ii) Distribution of $\bar{X}$ when $X \sim \mathrm{Po}(4)$]
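The calculations in Solution 9.13 and Example 9.14 both rest on the sample mean of n observations from N(μ, σ²) being N(μ, σ²/n). The sketch below checks both; it is illustrative only, uses Python's standard-library `statistics.NormalDist` instead of tables, and the helper name is made up. The sample size 4 for Solution 9.13 is read off the variance 25/4 in the working.

```python
from math import sqrt
from statistics import NormalDist


def sample_mean_distribution(mu, variance, n):
    """Sampling distribution of the mean of n independent observations
    from N(mu, variance): normal, mean mu, variance variance / n."""
    return NormalDist(mu, sqrt(variance / n))


# Solution 9.13: population N(70, 5^2), sample of size 4.
p_below_65 = sample_mean_distribution(70, 25, 4).cdf(65)

# Example 9.14: X ~ N(25, 340) and P(sample mean > 28) = 0.005.
# Standardising gives 3 * sqrt(n) / sqrt(340) = z, so n = (z / 3)^2 * 340.
z = NormalDist().inv_cdf(0.995)
n = (z / 3) ** 2 * 340

print(round(p_below_65, 4), round(z, 3), round(n, 1))
```

The value of n obtained this way is not a whole number, which is why the question asks for it correct to two significant figures.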
0.1
2 3 4
I
5
"
6 7 8 9 10 X
0.25
0
4 4
... mean calculated. Find, in each case, the probability that the sample mean exceeds 5.
(a) X is the number of telephone calls made in an evening to a counselling service, where $X \sim \text{Po}(4.5)$.
(b) X is the number of heads obtained when an unbiased coin is tossed nine times.
(c) X is distributed uniformly throughout the range $2 \le x \le 7$.

Solution 9.15

(a) $X \sim \text{Po}(4.5)$
$\mu = \lambda = 4.5$, $\sigma^2 = \lambda = 4.5$
By the central limit theorem, since n is large, $\bar{X}$ is approximately normal,
i.e. $\bar{X} \sim N\left(4.5, \frac{4.5}{30}\right)$

...

(c) By the central limit theorem, since n is large, $\bar{X}$ is approximately normal and
$\bar{X} \sim N\left(4.5, \frac{25/12}{30}\right)$
i.e. $\bar{X} \sim N(4.5, 0.0694\ldots)$

$P(\bar{X} > 5) = P\left(Z > \frac{5 - 4.5}{\sqrt{0.0694\ldots}}\right)$
$= P(Z > 1.897)$
$= 1 - 0.9711$
$= 0.029$ (2 s.f.)
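The central limit theorem calculation in Solution 9.15 follows the same pattern for all three parent distributions: find the mean and variance of X, then use $\bar{X} \sim N(\mu, \sigma^2/n)$. The sketch below illustrates this with Python's standard library; the function name is hypothetical, and n = 30 is an assumption read from the variance 4.5/30 in part (a).

```python
from math import sqrt
from statistics import NormalDist

def prob_sample_mean_exceeds(threshold, mean, variance, n):
    """Approximate P(Xbar > threshold) using Xbar ~ N(mean, variance / n)."""
    return 1 - NormalDist(mean, sqrt(variance / n)).cdf(threshold)

n = 30  # assumed sample size, read from the variance 4.5/30 in part (a)

# (mean, variance) of the parent distribution in each part
cases = {
    "(a) Po(4.5)": (4.5, 4.5),                   # mean = variance = lambda
    "(b) B(9, 0.5)": (9 * 0.5, 9 * 0.5 * 0.5),   # np, npq
    "(c) uniform on [2, 7]": ((2 + 7) / 2, (7 - 2) ** 2 / 12),
}

results = {name: prob_sample_mean_exceeds(5, m, v, n)
           for name, (m, v) in cases.items()}
for name, p in results.items():
    print(name, round(p, 3))
```

Part (c) reproduces the value 0.029 found above; the small difference in the later decimal places comes from using unrounded z-values rather than tables.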
3. In an examination taken by a large number of students the mean mark was 64.5 and the variance was 64. The mean mark in a random sample of 100 scripts is denoted by $\bar{X}$. Find
(a) $P(\bar{X} > 65.5)$
(b) $P(63.8 < \bar{X} < 64.5)$

4. The mean of 50 observations of X, where $X \sim B(12, 0.4)$, is $\bar{X}$.
(a) State the approximate distribution of $\bar{X}$.
(b) Hence find $P(\bar{X} < 5)$

6. Independent observations are taken from a normal distribution with mean 30 and variance 5.
(a) Find the probability that the average of 10 observations exceeds 30.5.
(b) Find the probability that the average of 40 observations exceeds 30.5.
(c) Find the probability that the average of 100 observations exceeds 30.5.
(d) Find the least value of n such that the probability that the average of n observations exceeds 30.5 is less than 1%.

7. The standard deviation of the masses of articles in a large population is 4.55 kg. Random samples of size 100 are drawn from the population. Find the probability that a sample mean will differ from the population mean by less than 0.8 kg.

8. The variable X is such that $X \sim N(\mu, 4)$. A random sample of size n is taken from the population. Find the least n such that $P(|\bar{X} - \mu| < 0.5) > 0.95$.

9. (a) A large number of random samples of size n are taken from B(20, 0.2). Approximately 90% of the sample means are less than 4.354. Estimate n.
(b) A large number of random samples of size n are taken from Po(2.9). Approximately 1% of the sample means are greater than 3.41. Estimate n.

10. The random variable X has standard deviation $\sigma$. The mean of 40 observations of X is $\bar{X}$. Given that $\text{Var}(\bar{X}) = 0.625$, find the value of $\sigma$.

11. The mean of a sample of 100 observations of the random variable X is denoted by $\bar{X}$. The mean of $\bar{X}$ is 20 and the standard deviation of $\bar{X}$ is 0.3. Find the mean and the standard deviation of X.

12. A sample of n independent observations is taken from a normal population with mean 74 and standard deviation 6. The sample mean is denoted by $\bar{X}$.
(a) Find n if $P(\bar{X} > 75) = 0.282$.
(b) Find n if $P(\bar{X} < 70.4) = 0.0037$.

13. ... limit theorem has played in the calculations.

14. The diameters, x, of 110 steel rods were measured in centimetres and the results were summarised as follows:
$\sum x = 36.5$, ...
Find the mean and standard deviation of these measurements.
Assuming these measurements are a sample from a normal distribution with this mean and this variance, find the probability that the mean diameter of a sample of size 110 is greater than 0.345 cm. (O & C)

15. In a certain nation, men have heights distributed normally with mean 1.70 m and standard deviation 10 cm. Find the probability that a man chosen randomly has height not less than 1.83 m.
What is the probability that the average height of 3 men chosen randomly is greater than 1.78 m and the probability that all three will have heights greater than 1.83 m? (MEI)

16. Two red balls and 2 white balls are placed in a bag. Balls are drawn one by one, at random and without replacement. The random variable X is the number of white balls drawn before the first red ball is drawn.
(a) Show that $P(X = 1) = \frac{1}{3}$, and find the rest of the probability distribution of X.
(b) Find E(X) and show that $\text{Var}(X) = \frac{5}{9}$.
(c) The sample mean for 80 independent observations of X is denoted by $\bar{X}$. Using a suitable approximation, find $P(\bar{X} > 0.75)$. (C)

THE DISTRIBUTION OF THE SAMPLE PROPORTION, $P_s$

Suppose a random sample of n observations is taken from a population in which the proportion of successes is p and the proportion of failures is $q = 1 - p$.

If X is the number of successes in the sample, then X follows a binomial distribution, i.e. $X \sim B(n, p)$ and $E(X) = np$, $\text{Var}(X) = npq$ (see page 286).

The random variable for the proportion of successes in the sample is $P_s$, where $P_s = \frac{X}{n}$.

$E(P_s) = E\left(\frac{X}{n}\right) = \frac{1}{n}E(X) = \frac{1}{n} \times np = p$

$\text{Var}(P_s) = \text{Var}\left(\frac{X}{n}\right) = \frac{1}{n^2}\text{Var}(X) = \frac{1}{n^2} \times npq = \frac{pq}{n}$

The distribution of $P_s$ has mean p and variance $\frac{pq}{n}$.

[Figure: distribution of $P_s$, centred on p, with the standard deviation marked]

The distribution of $P_s$ is known as the sampling distribution of proportions. The standard deviation of this distribution is $\sqrt{\frac{pq}{n}}$ and it is known as the standard error of proportion.

NOTE: When considering the normal approximation to the binomial distribution, a continuity correction of $\pm\frac{1}{2}$ is needed (see page 383).

Since $P_s = \frac{1}{n} \times X$, use a continuity correction of $\frac{1}{n} \times \left(\pm\frac{1}{2}\right)$, i.e. $\pm\frac{1}{2n}$.

Example 9.16

It is known that 3% of frozen pies delivered to a canteen are broken. What is the probability that, on a morning when 500 pies are delivered, 5% or more are broken?
Solution 9.16

$P_s \sim N\left(p, \frac{pq}{n}\right)$ with $p = 0.03$, $q = 0.97$ and $n = 500$, so $\frac{pq}{n} = \frac{0.03 \times 0.97}{500} = 0.000\,058\,2$.

You need to find $P(P_s \ge 0.05)$, which becomes $P\left(P_s > 0.05 - \frac{1}{2 \times 500}\right)$ (continuity correction)
$= P(P_s > 0.049)$
$= P\left(Z > \frac{0.049 - 0.03}{\sqrt{0.000\,058\,2}}\right)$
$= P(Z > 2.491)$
$= 1 - \Phi(2.491)$
$= 1 - 0.9936$
$= 0.0064$

Alternative method for Solution 9.16

Instead of considering $P_s$, the proportion of broken pies, you could consider X, the number of broken pies in the sample.

In this case, $X \sim B(n, p)$ with $n = 500$, $p = 0.03$, $q = 1 - p = 0.97$.

Now $np = 500 \times 0.03 = 15$ and $nq = 500 \times 0.97 = 485$.

Since n is large such that $np > 5$ and $nq > 5$, use the normal approximation for the binomial distribution (see page 382),
where $X \sim N(np, npq)$ with $np = 15$ and $npq = 500 \times 0.03 \times 0.97 = 14.55$,
i.e. $X \sim N(15, 14.55)$.

You want the probability that 5% or more are broken.
5% of 500 = 25, so find the probability that 25 or more are broken.

$P(X \ge 25)$ becomes $P(X > 24.5)$ (continuity correction)
$= P\left(Z > \frac{24.5 - 15}{\sqrt{14.55}}\right)$
$= P(Z > 2.491)$
$= 0.0064$ (as above)

NOTE: Since the same underlying theory has been used, probabilities of this type can be found either by considering $P_s$, the distribution of sample proportions, or by considering X, the distribution of the number of successes, and applying the normal approximation to the binomial distribution. In either case, the sample size, n, must be large.

Note that if the continuity correction is used in both cases, or omitted in both cases, the standardised z values will agree exactly.

2. ... probability that
(a) fewer than 40% of the tosses will result in heads,
(b) between 40% and 50% (inclusive) of the tosses will result in heads,
(c) at least 55% of the tosses will result in heads.

3. A fair coin is tossed 300 times. Work through part (c) as in question 2. Explain why your answer is different from that obtained in question 2.

4. Mr Hand gained 48% of the votes in the District Council elections. ...

... area are connected to the internet. Find the probability that at least 73 of a random sample of 100 households are connected to the internet.

6. A die is biased so that 1 in 5 throws results in a six. Find the probability that, when the die is thrown 300 times, the number that result in a six
(a) is more than 70,
(b) is at least 70,
(c) is less than 57.

7. 70% of the strawberry plants of a particular variety produce more than ten strawberries per plant. Find the probability that a random sample of 50 plants of this variety consists of more than 37 plants which produce more than ten strawberries per plant.

UNBIASED ESTIMATES OF POPULATION PARAMETERS

In order to define a binomial distribution you need to know n and p; to define a Poisson distribution you need to know $\lambda$; and to define a normal distribution you need to know $\mu$ and $\sigma^2$. These are known as the population parameters of the distributions.

Suppose that you do not know the value of a particular parameter of a distribution, for example the mean or the variance or the proportion of successes. It seems sensible that you would take a random sample from the distribution and use it in some way to make an estimate of the value of your unknown parameter.

This estimate is unbiased if the average (or expectation) of a large number of values taken in the same way is the true value of the parameter. There may be several ways of obtaining an unbiased estimate but the best (most efficient) estimate is the one with the smallest variance.

POINT ESTIMATES

If the random sample taken is of size n,
• the best unbiased estimate of p, the proportion of successes in the population, is $\hat{p}$ where $\hat{p} = p_s$ ($p_s$ is the proportion of successes in the sample),
• the best unbiased estimate of $\mu$, the population mean, is $\hat{\mu}$ where $\hat{\mu} = \bar{x}$ ($\bar{x}$ is the mean of the sample).
Try this using the raw data and your calculator in SD mode (see page 40). Input the data as follows:

[Calculator key sequences for Casio 570W/85W/85WA: set SD mode, clear memories, input data]

There are alternative formats for $\hat{\sigma}^2$:

$\hat{\sigma}^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$ or $\hat{\sigma}^2 = \frac{1}{n - 1}\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)$

NOTE: If you are using your calculator in SD mode, it is possible to find the value of $\hat{\sigma}$ directly. Look for a key marked $x\sigma_{n-1}$. On some models this is obtained using the SHIFT key. Find the key on your model.
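As a cross-check on the alternative formats for $\hat{\sigma}^2$, the sketch below evaluates the unbiased estimate in three ways and compares them with `statistics.variance`, which also uses the divisor n − 1. The data list is purely illustrative raw data chosen for this example, and the variable names are not from the text.

```python
from statistics import mean, pvariance, variance

sample = [46, 48, 51, 50, 45, 53, 50, 48]  # illustrative raw data
n = len(sample)
x_bar = mean(sample)

# Format 1: sum of squared deviations divided by (n - 1)
format_1 = sum((x - x_bar) ** 2 for x in sample) / (n - 1)

# Format 2: uses only the totals sum(x) and sum(x^2)
sum_x = sum(sample)
sum_x2 = sum(x * x for x in sample)
format_2 = (sum_x2 - sum_x ** 2 / n) / (n - 1)

# Format 3: n / (n - 1) times the sample variance s^2 (divisor n)
format_3 = n / (n - 1) * pvariance(sample)

# The library function with divisor (n - 1), for comparison
library_value = variance(sample)

print(format_1, format_2, format_3, library_value)
```

All four values agree up to floating-point rounding, which is why whichever format suits the available information (raw data or totals) can be used.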
Consider first how to calculate the end-values of the most commonly used interval, the 95% confidence interval. The method can then be adapted for other levels of confidence.

Note that it is useful to be able to follow the theory for the derivation of the end-points, but in practice you will probably only need to be able to apply the formula.

As you saw on page 438, for random samples of size n, $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$.

Standardising, $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$, where $Z \sim N(0, 1)$.

Consider the distribution of Z.
For a 95% confidence interval you need to find the values of z between which the central 95% of the distribution lies. This means that the upper tail probability is 0.025 and the lower tail probability is 0.975.

$P(Z < z) = 0.975$
$z = \Phi^{-1}(0.975) = 1.96$
The values of z are ±1.96.

So $P(-1.96 < Z < 1.96) = 0.95$
i.e. $P\left(-1.96 < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 1.96\right) = 0.95$

Now consider the inequality in two parts:

$-1.96 < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ gives $-1.96\frac{\sigma}{\sqrt{n}} < \bar{X} - \mu$, so $\mu < \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}$

$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 1.96$ gives $\bar{X} - \mu < 1.96\frac{\sigma}{\sqrt{n}}$, so $\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} < \mu$

If $\bar{x}$ is the mean of a random sample of any size n taken from a normal population with known variance $\sigma^2$, then a 95% confidence interval for $\mu$ is

$\left(\bar{x} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}\right)$

Example 9.19

The mass of vitamin E in a capsule manufactured by a certain drug company is normally distributed with standard deviation 0.042 mg. A random sample of five capsules was analysed and the mean mass of vitamin E was found to be 5.12 mg. Calculate a symmetric 95% confidence interval for the population mean mass of vitamin E per capsule. Give the values of the end-points of the interval correct to three significant figures. (C)

Solution 9.19

X is the mass, in milligrams, of vitamin E in a capsule.
$X \sim N(\mu, \sigma^2)$ with $\sigma = 0.042$.
$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$ with $n = 5$.

The 95% confidence interval for $\mu$ is $\left(\bar{x} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}\right)$

$\bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}} = 5.12 \pm 1.96 \times \frac{0.042}{\sqrt{5}} = 5.12 \pm 0.0368\ldots$

Lower confidence limit = 5.12 − 0.0368... = 5.08 (3 s.f.)
Upper confidence limit = 5.12 + 0.0368... = 5.16 (3 s.f.)

So the 95% confidence interval for $\mu$, based on the sample mean, is (5.08 mg, 5.16 mg).
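The formula $\bar{x} \pm z\,\sigma/\sqrt{n}$ used in Solution 9.19 is easy to wrap in a small function. This is a minimal sketch under the same assumptions as the text (normal population, known $\sigma$); the function name `z_interval` is hypothetical, and the critical value comes from `statistics.NormalDist` rather than from tables.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(x_bar, sigma, n, confidence=0.95):
    """Symmetric confidence interval for mu when sigma is known."""
    z = NormalDist().inv_cdf((1 + confidence) / 2)  # about 1.96 for 95%
    half_width = z * sigma / sqrt(n)
    return x_bar - half_width, x_bar + half_width

# Vitamin E capsules: x_bar = 5.12 mg, sigma = 0.042 mg, n = 5
lower, upper = z_interval(x_bar=5.12, sigma=0.042, n=5)
print(f"{lower:.3g} mg to {upper:.3g} mg")  # 5.08 mg to 5.16 mg
```

Changing the `confidence` argument adapts the same calculation to other levels, for example 0.99 for a 99% interval.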
NOTE: The probability that the interval (5.08 mg, 5.16 mg) includes, or has trapped, $\mu$ is 0.95, i.e. 95%. If you took another random sample of the same size, you would probably get a different interval. If you took lots of samples in a similar way then, on average, 95% of these intervals would include the true population mean $\mu$.

The following computer simulation illustrates the intervals obtained when 100 confidence intervals are constructed, each with 95% confidence. On average, 5% do not include $\mu$.

In practice, you would only construct one interval. Remember that there is a 5% chance that your interval does not include $\mu$.

[Figure: computer simulation of 100 confidence intervals, each with 95% confidence]

The intervals shown in bold are the ones which do not include $\mu$. You will see that in this case just six of the 100 do not include $\mu$. On average 95% of intervals constructed in this way will include the true population mean.

In a 95% confidence interval,
the upper tail probability is 0.025
so the lower tail probability is 0.975.
$P(Z < z) = 0.975$
i.e. $\Phi(z) = 0.975$
$z = \Phi^{-1}(0.975) = 1.96$

In a 99% confidence interval,
the upper tail probability is 0.005
so the lower tail probability is 0.995.
$P(Z < z) = 0.995$
i.e. $\Phi(z) = 0.995$
$z = \Phi^{-1}(0.995) = 2.576$

In a 90% confidence interval,
the upper tail probability is 0.05
so the lower tail probability is 0.95.
$P(Z < z) = 0.95$
i.e. $\Phi(z) = 0.95$
$z = \Phi^{-1}(0.95) = 1.645$

If you want a 98% confidence interval, this implies an upper tail probability of 0.01.
$P(Z < z) = 0.99$ gives $z = 2.326$.

Summary

p    0.75    0.90    0.95    0.975   0.99    0.995   0.9975  0.999   0.9995
z    0.674   1.282   1.645   1.960   2.326   2.576   2.807   3.090   3.291

- for a 90% confidence interval, $P(Z < z) = 0.95$; p = 0.95 gives z = 1.645,
- for a 95% confidence interval, $P(Z < z) = 0.975$; p = 0.975 gives z = 1.96,
- for a 99% confidence interval, $P(Z < z) = 0.995$; p = 0.995 gives z = 2.576.

If the p-value for the confidence interval that you require is not in this summary table, you will need to work from the main body of the normal distribution table.
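The simulation idea and the critical values in the summary can both be reproduced with a few lines of code. The sketch below is illustrative only: the population values (mean 50, standard deviation 4), the sample size and the seed are arbitrary assumptions, and the function name `coverage` is made up for this example.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

def coverage(mu, sigma, n, trials, confidence=0.95, seed=1):
    """Fraction of simulated intervals x_bar +/- z*sigma/sqrt(n) that contain mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    half_width = z * sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        x_bar = mean(rng.gauss(mu, sigma) for _ in range(n))
        if x_bar - half_width < mu < x_bar + half_width:
            hits += 1
    return hits / trials

# Critical values for the confidence levels discussed in the text
critical = {level: round(NormalDist().inv_cdf((1 + level) / 2), 3)
            for level in (0.90, 0.95, 0.98, 0.99)}
print(critical)

# Proportion of 2000 simulated 95% intervals that trap the true mean
print(coverage(mu=50, sigma=4, n=10, trials=2000))
```

With many trials the proportion of intervals containing $\mu$ settles close to 0.95, which is the long-run meaning of "95% confidence".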
In this case, since the sample size is large, the central limit theorem can be used.

$\bar{X}$ is approximately normal and $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$ (see page 442).

If $\bar{x}$ is the mean of a random sample of size n, where n is large ($n \ge 30$, say), taken from a non-normal population with known variance $\sigma^2$, then a 95% confidence interval for $\mu$ is

$\left(\bar{x} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}\right)$

...

$\frac{5.096}{\sqrt{n}} < 2$
$\sqrt{n} > \frac{5.096}{2}$
$\sqrt{n} > 2.548$
$n > 2.548^2$
i.e. $n > 6.49\ldots$

The least number of tests that should be carried out is 7.

...

The 90% confidence interval is $\left(\bar{x} - 1.645\frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.645\frac{\sigma}{\sqrt{n}}\right)$

$\bar{x} \pm 1.645\frac{\sigma}{\sqrt{n}} = 178.2 \pm 1.645 \times \frac{5}{10} = 178.2 \pm 0.8225$

So the 90% confidence interval = (178.2 − 0.8225, 178.2 + 0.8225)
= (177.38 cm, 179.02 cm) (2 d.p.)

(c) Confidence interval for $\mu$, the population mean
• of a normal or non-normal population,
• with unknown variance $\sigma^2$,
• using a large sample, n

When calculating confidence intervals it is often the case that the population variance, $\sigma^2$, is not known. Provided that the sample size, n, is large ($n \ge 30$, say) it is permissible to use $\hat{\sigma}^2$, the best unbiased estimate for $\sigma^2$ (see page 447).

Ideally the distribution of X should be normal, but an approximate confidence interval can also be given when the distribution of X is not normal. Remember that in both cases, n must be large.
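The sample-size inequality worked through earlier in this section, $5.096/\sqrt{n} < 2$, is an instance of requiring the interval width $2z\sigma/\sqrt{n}$ to be below a target. The sketch below solves that inequality for the least whole number n. The function name is hypothetical, and $\sigma = 1.3$ is an assumption: it is not stated in the surviving text but reproduces the constant $2 \times 1.96 \times 1.3 = 5.096$.

```python
from math import ceil, sqrt
from statistics import NormalDist

def least_sample_size(sigma, max_width, confidence=0.95):
    """Smallest n for which the interval width 2*z*sigma/sqrt(n) is below max_width."""
    # Round to the table value, e.g. 1.96 for 95%
    z = round(NormalDist().inv_cdf((1 + confidence) / 2), 3)
    root_n = 2 * z * sigma / max_width
    n = ceil(root_n ** 2)
    if 2 * z * sigma / sqrt(n) >= max_width:  # guard against an exact boundary
        n += 1
    return n

# sigma = 1.3 is assumed; it gives 2 * 1.96 * 1.3 = 5.096 as in the working above
print(least_sample_size(sigma=1.3, max_width=2))  # 7
```

Because n must be a whole number, the bound n > 6.49... is rounded up, never down.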
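If $\bar{x}$ is the mean of a large random sample of size n ($n \ge 30$) taken from a population with unknown variance, a 95% confidence interval for $\mu$ is

$\left(\bar{x} - 1.96\frac{\hat{\sigma}}{\sqrt{n}},\ \bar{x} + 1.96\frac{\hat{\sigma}}{\sqrt{n}}\right)$

where $\hat{\sigma}^2 = \frac{n}{n - 1}s^2$ or $\hat{\sigma}^2 = \frac{1}{n - 1}\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)$

Example 9.23

The fuel consumption of a new model of car is being tested. In one trial, 50 cars chosen at random were driven under identical conditions and the distances, x km, covered on 1 litre of petrol were recorded. The results gave the following totals:
$\sum x = 525$, $\sum x^2 = 5625$
Calculate a 95% confidence interval for the mean petrol consumption, in kilometres per litre, of cars of this type.

$\bar{x} = \frac{525}{50} = 10.5$ and $\hat{\sigma} = 1.515$

$\bar{x} \pm 1.96\frac{\hat{\sigma}}{\sqrt{n}} = 10.5 \pm 1.96 \times \frac{1.515}{\sqrt{50}} = 10.5 \pm 0.42$

95% confidence interval for $\mu$ = (10.08 km/litre, 10.92 km/litre)

Example 9.24

The height, x cm, of each man in a random sample of 200 men living in the UK was measured. The following results were obtained:
$\sum x = 35\,050$, $\sum x^2 = 6\,163\,109$
(a) Calculate unbiased estimates of the mean and variance of the heights of men living in the UK.
(b) Determine an approximate 90% confidence interval for the mean height of men living in the UK. Name the theorem that you have assumed. (NEAB)

Solution 9.24

(a) $\hat{\mu} = \bar{x} = \frac{\sum x}{n} = \frac{35\,050}{200} = 175.25$

$\hat{\sigma}^2 = \frac{n}{n - 1}s^2 = \frac{n}{n - 1}\left(\frac{\sum x^2}{n} - \bar{x}^2\right) = \frac{200}{199}\left(\frac{6\,163\,109}{200} - 175.25^2\right) = 103.5$

Alternatively

$\hat{\sigma}^2 = \frac{1}{n - 1}\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)$ (see page 448)
$= \frac{1}{199}\left(6\,163\,109 - \frac{35\,050^2}{200}\right)$
$= 103.5$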
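Both of these examples start from the totals n, $\sum x$ and $\sum x^2$, so the arithmetic can be checked with one small helper. The following is a sketch only; the function names are invented for this illustration, and the z-value is passed in explicitly as the table value used in the text.

```python
from math import sqrt

def estimates_from_totals(n, sum_x, sum_x2):
    """Unbiased estimates of the mean and variance from n, sum(x) and sum(x^2)."""
    mean_hat = sum_x / n
    var_hat = (sum_x2 - sum_x ** 2 / n) / (n - 1)
    return mean_hat, var_hat

def large_sample_interval(n, sum_x, sum_x2, z):
    """Approximate interval x_bar +/- z * sigma_hat / sqrt(n) for large n."""
    mean_hat, var_hat = estimates_from_totals(n, sum_x, sum_x2)
    half_width = z * sqrt(var_hat / n)
    return mean_hat - half_width, mean_hat + half_width

# Example 9.23: 95% interval, z = 1.96
ci_923 = large_sample_interval(50, 525, 5625, z=1.96)

# Example 9.24(a): unbiased estimates of the mean and variance
mean_924, var_924 = estimates_from_totals(200, 35050, 6163109)

print([round(v, 2) for v in ci_923])  # [10.08, 10.92]
print(mean_924, round(var_924, 1))    # 175.25 103.5
```

The same `large_sample_interval` call with z = 1.645 would give a 90% interval, as required in part (b) of Example 9.24.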
1. The concentrations, in milligrams per litre, of a trace element in 7 randomly chosen samples of water from a spring were
240.8, 237.3, 236.7, 236.6, 234.2, 233.9, 232.5.
Determine the unbiased estimates of the mean and the variance of the concentration of the trace element per litre of water from the spring. (L)

2. Find the best unbiased estimates of the mean $\mu$ and variance $\sigma^2$ of the population from which each of the following samples is drawn. It is a good idea to do parts (a) to (c) both with and without a calculator.
(a) 46, 48, 51, 50, 45, 53, 50, 48
(b) 1.684, 1.691, 1.687, 1.688, 1.689, 1.688, 1.690, 1.693, 1.685
(c)
x    20   21   22   23   24   25
f    4    14   17   26   20   9
(d) $\sum x = 120$, $\sum x^2 = 2102$, n = 8
(e) $\sum x = 100$, $\sum x^2 = 1028$, n = 10
(f) n = 34, $\sum x = 330$, $\sum x^2 = 23\,700$

3. A measuring rule was used to measure the length of a rod of stated length 1 m. On 8 successive occasions the following results, in millimetres, were obtained.
1000, 999, 999, 1002, 1001, 1000, 1002, 1001.
Calculate unbiased estimates of the mean and, to two significant figures, the variance of the errors occurring when the rule is used for measuring a 1 m length. (L)

4. Cartons of orange are filled by a machine. A sample of 10 cartons selected at random from the production contained the following quantities (in millilitres)
201.2  205.0  209.1  202.3  204.6
206.4  210.1  201.9  203.7  207.3
Calculate unbiased estimates of the mean and variance of the population from which the sample was taken. (L)

5. A certain type of tennis ball is known to have a height of bounce which is normally distributed with standard deviation 2 cm. A sample of 60 tennis balls is tested and the mean height of bounce of the sample is 140 cm.
(a) Find a 95% confidence interval for the mean height of bounce of this type of tennis ball.
(b) State any assumptions made in calculating your interval.

6. A random sample of 6 items taken from a normal population with mean $\mu$ and variance 4.5 cm² gave the following data:
Sample values: 12.9 cm, 13.2 cm, 14.6 cm, 12.6 cm, 11.3 cm, 10.1 cm.
(a) Find the 95% confidence interval for $\mu$.
(b) What is the width of this confidence interval?

7. A factory produces cans of meat whose masses are normally distributed with standard deviation 18 g. A random sample of 25 cans is found to have a mean mass of 458 g.
(a) Obtain the 99% confidence interval for the population mean mass of a can of meat produced at the factory.
(b) Explain what the interval means.
(c) Would the interval be wider if a 90% confidence interval was calculated? Explain your reasoning.

8. A random sample of 100 observations from a normal population with mean $\mu$ gave the following data: $\sum x = 8200$, $\sum x^2 = 686\,800$.
(a) Find a 98% confidence interval for $\mu$.
(b) Find a 99% confidence interval for $\mu$.
(c) Would your answers have been different if the population was not normal? Explain your answer.

9. Eighty employees at an insurance company were asked to measure their pulse rates when they woke up in the morning. The researcher then calculated the mean and the standard deviation of the sample and found these to be 69 beats and 4 beats respectively. Calculate a 97% confidence interval for the mean pulse rate of all the employees at the company, stating any assumptions that you have made.

10. One hundred and fifty bags of flour are taken from a production line and found to have a mean mass of 748 g and standard deviation of 3.6 g.
(a) Calculate an unbiased estimate of the standard deviation of a bag of flour produced on this production line.
(b) Calculate a 98% confidence interval for the mean mass of a bag of flour produced on this production line.
(c) State any assumptions you have made.

11. (a) A 95% confidence interval for the mean length of life of a particular brand of light bulb was calculated and the confidence limits were 1023.3 hours and 11... hours. The interval was based on the results of a random sample of 36 light bulbs. Find a 99% confidence interval for $\mu$, the mean length of life of this brand of light bulb.
(b) Forty random samples of 36 light bulbs are taken and a 90% confidence interval for $\mu$ is calculated for each sample. Find the expected number of intervals that contain $\mu$.

12. An efficiency expert wishes to determine the mean time taken to drill a number of holes in a metal sheet. Determine how large a random sample is needed so that the expert can be 95% certain that the sample mean will differ from the true mean time by less than 15 seconds. Assume that it is known from previous studies that the population standard deviation is 40 seconds. (L)

13. A random sample of 60 loaves is taken from a population whose masses are normally distributed with mean $\mu$ and standard deviation 10 g.
(a) Calculate the width of a symmetric 95% confidence interval for $\mu$ based on this sample.
(b) Find the confidence level of a symmetric confidence interval having the same width as before but based on a random sample of 40 loaves.

14. The distribution of measurements of thicknesses of a random sample of yarns produced in a textile mill is shown in the following table.

Yarn thickness in microns (mid-interval values)    Frequency
72.5     6
77.5     18
82.5     32
87.5     57
92.5     102
97.5     51
102.5    25
107.5    9

Illustrate these data on a histogram.
Estimate, to two decimal places, the mean and standard deviation of yarn thickness. Hence estimate the standard error of the mean to two decimal places, and use it to determine approximate symmetric 95% confidence limits, giving your answer to one decimal place. (Min)

15. The age, X, in years at last birthday, of 250 mothers when their first child was born is given in the following table:

X      No. of mothers
18-    14
20-    36
22-    42
24-    57
26-    48
28-    26
30-    17
32-    7
34-    2
36-    0
38-    1

(The notation implies that, for example in row 1, there are 14 mothers for whom the continuous variable X satisfies $18 \le X < 20$.)
Calculate, to the nearest 0.1 of a year, estimates of the mean and the standard deviation of X.
If the 250 mothers are a random sample from a large population of mothers, find 95% confidence limits for the mean age, $\mu$, of the total population. (C)

16. The lifetimes of 200 electrical components were recorded to the nearest hour and classified in the frequency tabulation.

Lifetime   Frequency     Lifetime   Frequency
0-         80            600-       4
100-       48            700-       3
200-       30            800-       2
300-       18            900-       0
400-       10            1000-      0
500-       5

Draw a histogram of the data and estimate the mean and standard deviation of the distribution. Calculate a symmetric 90% confidence interval for the population mean, using a suitable normal approximation for the distribution of the sample mean. (MEI)
(d) Confidence interval for $\mu$ when
• the population is normal,
• $\sigma^2$ is unknown,
• sample size n is small

When calculating confidence intervals, you have already encountered the situation when large samples ($n \ge 30$) are taken from a normal population with unknown variance $\sigma^2$.

For large samples, $\frac{\bar{X} - \mu}{\hat{\sigma}/\sqrt{n}}$ is approximately N(0, 1).

Before looking at confidence intervals for $\mu$ when the sample size is small, consider further the t-distribution.

THE t-DISTRIBUTION

The distribution of T is a member of a family of t-distributions. All t-distributions are symmetric about zero and have a single parameter $\nu$ (pronounced "new") which is a positive integer.

$\nu$ is known as the number of degrees of freedom of the distribution and if, for example, T has a t-distribution with five degrees of freedom, you would write $T \sim t(5)$.

The diagram below shows two curves, t(2) and t(5).

[Figure: t-distribution curves compared with the standardised normal curve N(0, 1)]

Note that as $\nu$ increases, the corresponding $t(\nu)$ curve resembles the standardised normal distribution N(0, 1). In fact when $\nu \ge 30$, the difference between the $t(\nu)$ distribution and the normal distribution is negligible.

For samples of size n, it can be shown that

$T = \frac{\bar{X} - \mu}{\hat{\sigma}/\sqrt{n}}$ follows a t-distribution with (n − 1) degrees of freedom.

For example, for a sample of size 8, T follows a t-distribution with 7 degrees of freedom. You would write $T \sim t(7)$.

The 95% confidence interval for $\mu$ is obtained as follows:

If $\bar{x}$ and $s^2$ are the mean and variance of a small sample from a normal population with unknown mean $\mu$ and unknown variance $\sigma^2$, then a 95% confidence interval for $\mu$ is

$\left(\bar{x} - t\frac{\hat{\sigma}}{\sqrt{n}},\ \bar{x} + t\frac{\hat{\sigma}}{\sqrt{n}}\right)$

where $\hat{\sigma}^2$ is the unbiased estimate of $\sigma^2$ and (−t, t) encloses 95% of the t(n − 1) distribution.

Critical values of t: $P(T < t) = p$

p        0.75    0.90    0.95    0.975   0.99    0.995   0.9975  0.999   0.9995
ν = 1    1.000   3.078   6.314   12.71   31.82   63.66   127.3   318.3   636.6
2        0.816   1.886   2.920   4.303   6.965   9.925   14.09   22.33   31.60
3        0.765   1.638   2.353   3.182   4.541   5.841   7.453   10.21   12.92
4        0.741   1.533   2.132   2.776   3.747   4.604   5.598   7.173   8.610
5        0.727   1.476   2.015   2.571   3.365   4.032   4.773   5.893   6.869
6        0.718   1.440   1.943   2.447   3.143   3.707   4.317   5.208   5.959
7        0.711   1.415   1.895   2.365   2.998   3.499   4.029   4.785   5.408
8        0.706   1.397   1.860   2.306   2.896   3.355   3.833   4.501   5.041
9        0.703   1.383   1.833   2.262   2.821   3.250   3.690   4.297   4.781
10       0.700   1.372   1.812   2.228   2.764   3.169   3.581   4.144   4.587
11       0.697   1.363   1.796   2.201   2.718   3.106   3.497   4.025   4.437
...
30       0.683   1.310   1.697   2.042   2.457   2.750   3.030   3.385   3.646
40       0.681   1.303   1.684   2.021   2.423   2.704   2.971   3.307   3.551
60       0.679   1.296   1.671   2.000   2.390   2.660   2.915   3.232   3.460
120      0.677   1.289   1.658   1.980   2.358   2.617   2.860   3.160   3.373
∞        0.674   1.282   1.645   1.960   2.326   2.576   2.807   3.090   3.291

You will see that as $\nu$ increases, the corresponding t-distribution becomes more and more like the normal distribution. Compare the last row, $\nu = \infty$, with the critical values for the normal distribution, printed on page 649.
You will find that you use the t-distribution tables in a slightly different way from the normal tables, so you need to ensure that you can use them correctly.

In this extract, the highlighted values are referred to in the text.

[Extract from the t-distribution tables]

Example 9.25(a)

Consider T following a t-distribution with 11 degrees of freedom, i.e. $\nu = 11$ and $T \sim t(11)$.
Find (i) $P(T < 1.796)$
(ii) $P(T > 3.106)$
(iii) $P(|T| < 2.201)$

Solution 9.25(a)

(i) $\nu = 11$, so find row 11 and go across to 1.796, then up to the top of the column. This gives 0.95.
$P(T < 1.796) = 0.95 = 95\%$

...

(iii) You need $P(|T| < 2.201)$, i.e. $P(-2.201 < T < 2.201)$.
Find row 11, go across to 2.201 which is in column 0.975.
So $P(T < 2.201) = 0.975$
and $P(T > 2.201) = 1 - 0.975 = 0.025$
By symmetry, $P(|T| < 2.201) = 1 - 2 \times 0.025 = 0.95$

Example 9.25(b)

The random variable T has a t-distribution with 14 degrees of freedom, i.e. $T \sim t(14)$.
Find the value of t for which
(i) $P(T < t) = 0.90$
(ii) $P(|T| < t) = 0.98$

Solution 9.25(b)

(i) $\nu = 14$, row 14, column 0.90 gives t = 1.345.
(ii) The required value for t corresponds to an upper tail probability of 0.01, so the p-value must be 0.99.
Row 14, column 0.99 gives t = 2.624.
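The table look-ups in Examples 9.25(a) and 9.25(b) go in two directions: from a t-value to its column heading p, and from p to the critical value t. The sketch below mimics this with a small dictionary holding only the row 11 and row 14 values quoted in the text; the names `T_TABLE`, `t_from_p`, `p_from_t` and `two_sided_t` are hypothetical helpers for illustration, not a statistical library.

```python
# Rows copied from the t-distribution values quoted in the text: {nu: {p: t}}
T_TABLE = {
    11: {0.75: 0.697, 0.90: 1.363, 0.95: 1.796,
         0.975: 2.201, 0.99: 2.718, 0.995: 3.106},
    14: {0.75: 0.692, 0.90: 1.345, 0.95: 1.761, 0.99: 2.624},
}

def t_from_p(nu, p):
    """Critical value t with P(T < t) = p for T ~ t(nu)."""
    return T_TABLE[nu][p]

def p_from_t(nu, t):
    """Column heading p such that P(T < t) = p, for a tabulated t."""
    for p, value in T_TABLE[nu].items():
        if value == t:
            return p
    raise KeyError(f"{t} is not tabulated for nu = {nu}")

def two_sided_t(nu, central):
    """t such that P(|T| < t) = central: look up p = (1 + central) / 2."""
    return t_from_p(nu, round((1 + central) / 2, 4))

print(p_from_t(11, 1.796))    # 0.95
print(two_sided_t(14, 0.98))  # 2.624
```

The helper `two_sided_t` encodes the step used in part (ii) of Example 9.25(b): a central probability of 0.98 leaves 0.01 in each tail, so the column p = 0.99 is required.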
So if $T \sim t(9)$ for example,
$P(T < t) = 0.975$ gives t = 2.262
and the critical values of t for the 95% confidence interval are ±2.262.

p        0.75    0.90    0.95    0.975
ν = 1    1.000   3.078   6.314   12.71
2        0.816   1.886   2.920   4.303
3        0.765   1.638   2.353   3.182
4        0.741   1.533   2.132   2.776
5        0.727   1.476   2.015   2.571
6        0.718   1.440   1.943   2.447
7        0.711   1.415   1.895   2.365
8        0.706   1.397   1.860   2.306
9        0.703   1.383   1.833   [2.262]

NOTE: This value will be used in a later example.

...

Since $\sigma^2$ is unknown, find $\hat{\sigma}^2$ (see page 447)

$\hat{\sigma}^2 = \frac{1}{n - 1}\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)$ with n = 10
$= \frac{1}{9}\left(1\,583\,098.3 - \frac{3978.7^2}{10}\right)$
$= 10.325\ldots$
$\hat{\sigma} = 3.213\ldots$

The sample mean, $\bar{x} = \frac{\sum x}{n} = \frac{3978.7}{10} = 397.87$.

Since n is small, a t(n − 1) distribution is required.
n = 10, so use a t(9) distribution.

The 95% confidence limits for $\mu$ are
$\bar{x} \pm t\frac{\hat{\sigma}}{\sqrt{n}}$ where (−t, t) enclose 95% of the t(9) distribution.

From tables, as illustrated on page 466, the critical value t is 2.262

∴ confidence limits are $397.87 \pm 2.262 \times \frac{3.213\ldots}{\sqrt{10}} = 397.87 \pm 2.298\ldots$

95% confidence interval for $\mu$ = (397.87 − 2.298..., 397.87 + 2.298...)
= (395.6 g, 400.2 g) (1 d.p.)

...

Since n = 15, the t(14) distribution is considered.

p        0.75    0.90    0.95
ν = 1    1.000   3.078   6.314
2        0.816   1.886   2.920
...
12       0.695   1.356   1.782
13       0.694   1.350   1.771
14       0.692   1.345   [1.761]

(Extract from tables on page 650)

For a symmetric 90% confidence interval, column p = 0.95 is required in order to find the critical value of t.
When $\nu = 14$, t = 1.761,
so (−1.761, 1.761) encloses the central 90% of the t(14) distribution.
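The 95% interval for the sample of size 10 with $\sum x = 3978.7$ and $\sum x^2 = 1\,583\,098.3$ can be reproduced from the totals alone. The sketch below assumes, as the text does, a normal population with unknown variance; the function name is hypothetical, and the critical value 2.262 is taken from the t(9) row of the tables rather than computed.

```python
from math import sqrt

def small_sample_interval(n, sum_x, sum_x2, t_crit):
    """Interval x_bar +/- t * sigma_hat / sqrt(n), with t read from t(n - 1) tables."""
    x_bar = sum_x / n
    var_hat = (sum_x2 - sum_x ** 2 / n) / (n - 1)  # unbiased estimate of sigma^2
    half_width = t_crit * sqrt(var_hat / n)
    return x_bar - half_width, x_bar + half_width

# n = 10, so t(9) is used; 2.262 is the tabulated value for p = 0.975
lower, upper = small_sample_interval(10, 3978.7, 1583098.3, t_crit=2.262)
print(round(lower, 1), round(upper, 1))  # 395.6 400.2
```

Only the critical value changes between the z-based and t-based intervals; the rest of the calculation is identical.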
The 90% confidence limits for $\mu$ are

$\bar{x} \pm t\frac{\hat{\sigma}}{\sqrt{n}} = 12.2 \pm 1.761 \times \frac{1.22\ldots}{\sqrt{15}} = 12.2 \pm 0.556\ldots$

90% confidence interval = (12.2 − 0.556..., 12.2 + 0.556...)
= (11.64 cm, 12.76 cm) (2 d.p.)

Width of interval = 2 × 0.556...
= 1.11 cm (2 d.p.)

1. The heights, in metres, of a random sample of 6 policemen from a particular station were as follows:
1.80, 1.76, 1.79, 1.81, 1.83, 1.79.
Assuming that the heights of policemen from that station are normally distributed with mean $\mu$,
(a) calculate a 95% confidence interval for $\mu$,
(b) state the width of this interval.

2. A sample of 8 independent observations of a normally distributed variable gave the following values:
3.6, 3.9, 4.5, 3.8, 4.4, 4.9, 4.2, 3.8.
(a) Determine a 99% confidence interval for the population mean $\mu$.
(b) Find the difference between the widths of a 90% confidence interval for $\mu$ and a 95% confidence interval for $\mu$.

3. Twenty measurements of x, the life, in hours, of a particular make of candle gave the following data:
$\sum x = 172$, $\sum x^2 = 1495.5$.
Assuming that the length of life is modelled by a normal distribution with mean $\mu$, find a 98% confidence interval for $\mu$.

4. A random sample of 8 independent observations of a normal variable gave
$\sum x = 261.2$, $\sum (x - \bar{x})^2 = 3.22$.
Calculate a 95% confidence interval for the population mean.
If 400 such samples were taken, how many of these would you expect not to include the population mean?

5. A random sample of 7 independent observations of a normal variable gave
$\sum x = 35.9$, $\sum x^2 = 186.19$.
Calculate
(a) an unbiased estimate of the population mean,
(b) an unbiased estimate of the population standard deviation,
(c) a 90% confidence interval for the population mean.

6. The masses, in grams, of 13 washers selected from a production line at random are:
15.4, 15.2, 14.6, 16.1, 14.8,
15.3, 15.9, 16.0, 15.4, 14.6,
15.0, 15.5, 16.1.
Calculate 98% confidence limits for the mean mass of the washers on this particular production line, assuming that the mass can be modelled by a normal distribution.

7. Fifteen pupils performed experiments to find the value of g, the acceleration due to gravity. The results were as follows:
9.806, 9.807, 9.810, 9.802, 9.805,
9.806, 9.804, 9.811, 9.801, 9.804,
9.805, 9.808, 9.803, 9.809, 9.807.
Assuming that these are taken from a normal population, calculate 95% confidence limits for the value of g based on these results.

CONFIDENCE INTERVALS FOR THE POPULATION PROPORTION, p

Imagine that you want to find p, the proportion of successes in a particular population. To obtain an idea of its value, you could take a random sample of size n and calculate $p_s$, the proportion of successes in your sample. This would give the best unbiased estimate $\hat{p}$, where $\hat{p} = p_s$ (see page 447). You could also use this value of $p_s$ to obtain an interval estimate of p, known as a confidence interval for p.

The theory needed to derive the confidence interval for p is based on the sampling distribution of proportions, $P_s$, described on page 445.

This states that, provided the sample size n is large ($n \ge 30$), $P_s$ is approximately normal with $P_s \sim N\left(p, \frac{pq}{n}\right)$.

The standard deviation of the sampling distribution of proportions, $\sqrt{\frac{pq}{n}}$, is needed in the calculation of the limits for the confidence interval. The difficulty is, however, that its value isn't known, since p isn't known!

To overcome this, use $\hat{p} = p_s$. Writing $1 - p_s$ as $q_s$, the standard deviation of the sampling distribution is approximately $\sqrt{\frac{p_s q_s}{n}}$.

You are then able to find approximate confidence intervals for p as follows:

Level   Confidence limits                       Confidence interval                                                                        Width
90%     $p_s \pm 1.645\sqrt{p_s q_s/n}$         $\left(p_s - 1.645\sqrt{p_s q_s/n},\ p_s + 1.645\sqrt{p_s q_s/n}\right)$         $2 \times 1.645\sqrt{p_s q_s/n}$
95%     $p_s \pm 1.96\sqrt{p_s q_s/n}$          $\left(p_s - 1.96\sqrt{p_s q_s/n},\ p_s + 1.96\sqrt{p_s q_s/n}\right)$           $2 \times 1.96\sqrt{p_s q_s/n}$
99%     $p_s \pm 2.576\sqrt{p_s q_s/n}$         $\left(p_s - 2.576\sqrt{p_s q_s/n},\ p_s + 2.576\sqrt{p_s q_s/n}\right)$         $2 \times 2.576\sqrt{p_s q_s/n}$

Remember that the sample size, n, should be large ($n \ge 30$, say), since the normal approximation to the binomial distribution is used in obtaining the distribution of sample proportions. Also, since a continuous distribution has been used as an approximation for a discrete distribution, continuity corrections should be used. These are usually omitted, however, when calculating confidence intervals.

Example 9.28

A manufacturer wants to assess the proportion of defective items in a large batch produced by a particular machine. He tests a random sample of 300 items and finds that 45 items are defective.
(a) Calculate an approximate 95% confidence interval for the proportion of defective items in the batch.
(b) If 200 such tests are performed and a 95% confidence interval calculated for each, how many would you expect to include the proportion of defective items in the batch?
Solution 9.28

(a) $p_s = \frac{45}{300} = 0.15$, $q_s = 1 - p_s = 0.85$, $n = 300$.
The 95% confidence limits for p are
$p_s \pm 1.96\sqrt{\frac{p_s q_s}{n}} = 0.15 \pm 1.96\sqrt{\frac{0.15 \times 0.85}{300}} = 0.15 \pm 0.0404$
95% confidence interval $= (0.15 - 0.0404,\ 0.15 + 0.0404) = (0.1096,\ 0.1904)$.
(b) The expected number of tests that include the proportion of defective items in the batch $= 200 \times 0.95 = 190$.

Example 9.29

In a random sample of 400 carpet shops, it was discovered that 136 of them sold carpets at below the list prices recommended by the manufacturer.
(a) Estimate the percentage of all carpet shops selling below list price.
(b) Calculate an approximate 90% confidence interval for the proportion of shops that sell below list price and explain briefly what this means.
(c) What size sample would have to be taken in order to estimate the percentage to within ±2%, with 90% confidence?

Solution 9.29

(a) $p_s = \frac{136}{400} = 0.34$
An estimate of p, the proportion of all carpet shops selling below list price, is $\hat{p}$ where $\hat{p} = p_s = 0.34$. So an estimate of the percentage of shops is 34%.
(b) An approximate 90% confidence interval for p is given by
$p_s \pm 1.645\sqrt{\frac{p_s q_s}{n}} = 0.34 \pm 1.645 \times \sqrt{\frac{0.34 \times 0.66}{400}} = 0.34 \pm 0.039$
$= (0.301,\ 0.379) = (30.1\%,\ 37.9\%)$
The probability that the interval (30.1%, 37.9%) includes the true population percentage is 0.90. If a large number of intervals are calculated in the same way, 90% of them would include the true percentage.
(c) In part (b) the percentage of shops was estimated to within ±3.9%. You now require n such that the percentage is to within ±2%, so that
$p_s \pm 1.645\sqrt{\frac{p_s q_s}{n}} = p_s \pm 0.02$
Taking the + sign on both sides,
$p_s + 1.645\sqrt{\frac{p_s q_s}{n}} = p_s + 0.02$
$\therefore\ 1.645\sqrt{\frac{p_s q_s}{n}} = 0.02$
i.e. $1.645 \times \sqrt{\frac{0.34 \times 0.66}{n}} = 0.02$
$\frac{1.645 \times \sqrt{0.34 \times 0.66}}{0.02} = \sqrt{n}$
$\sqrt{n} = 38.96\ldots$
$n = 1520$ (3 s.f.)
1520 shops would have to be sampled.

Exercise 9g Confidence intervals for p

1. In a survey of a random sample of 250 households in a large city, 170 households owned at least one pet.
(a) Find an approximate 95% confidence interval for the proportion of households in the city that own at least one pet.
(b) Explain why the interval is approximate.

2. In order to assess the probability of a successful outcome, an experiment was performed 200 times. The number of successful outcomes was 72.
(a) Find a 95% confidence interval for p, the probability of a successful outcome.
(b) Find a 99% confidence interval for p.

3. A survey was undertaken of the use of the internet by residents in a large city and it was discovered that in a random sample of 150 residents, 45 logged on to the internet at least once a day.
(a) Calculate an approximate 90% confidence interval for p, the proportion of residents in the city that log on to the internet at least once a day.
(b) One hundred similar surveys are carried out and the 90% confidence interval calculated for each survey. State the expected number of intervals that include p.

4. Recruits are issued with boots when they join the army. The last 50 pairs of boots issued were the following sizes:
8   9   8  10  11   8   7  12  12   9
9   8  11   8   9   7  11  12  11  10
9  10  10  10   8   8   7  12   9   9
10  13   7   8   9   9  10  10   8  12
9   9  10  10  11  12   9   9  10   9
(a) Find the proportion in the sample requiring size 9.
(b) Assuming that the last 50 recruits can be regarded as a random sample of all recruits, calculate an approximate 90% confidence interval for the proportion, p, of all recruits requiring size 9 boots.
(c) Explain why the interval is approximate.

5. In a market research survey, 25 people out of a random sample of 100 in a certain town said that they regularly used a particular brand of soap. Find approximate 97% confidence limits for the proportion of people in the town who regularly use this brand of soap.

6. A college principal decides to consult the students about a proposed change in the times of lectures. She finds that, out of a random sample of 80 students, 57 are in favour of the change.
(a) Find an approximate 90% confidence interval for the proportion of students who are not in favour of the change.
(b) State the effect on the width of such a confidence interval when the confidence level is increased.

7. In an opinion poll, 2000 people were interviewed and 527 said that they preferred white chocolate to milk chocolate.
(a) Calculate an approximate 95% confidence interval for the proportion of the population who prefer white chocolate. State any assumptions you have made.
(b) The a% confidence limits for the proportion preferring white chocolate, based on a sample of size 500, are 0.2278 and 0.2922. Calculate
(i) the proportion of people in the sample of 500 who preferred white chocolate,
(ii) the value of a.

8. The results of a survey showed that 3600 out of 10 000 families regularly purchased a specific weekly magazine.
(a) Find approximate 95% confidence limits for the proportion of families buying the magazine.
(b) Estimate the additional number of families to be contacted if the probability that the estimated proportion is in error by more than 0.01 is to be at most 1%. (AEB)

9. The probability of success in each of a long series of n independent trials is constant and equal to p. Explain how an approximate 95% confidence interval for p may be obtained.
In an opinion poll carried out before a local election, 501 people out of a random sample of 925 voters declare that they will vote for a particular one of the two candidates contesting the election. Find approximate 95% confidence limits for the proportion of all voters in favour of this candidate. (AEB)

Summary

Point estimates: unbiased estimates for
population mean $\mu$: $\hat{\mu} = \bar{x}$, the sample mean
population proportion p: $\hat{p} = p_s$, the sample proportion
population variance $\sigma^2$: $\hat{\sigma}^2 = \frac{n}{n-1}s^2$ ($s^2$ is the sample variance)

For large values of n, the distribution of $\bar{X}$ is approximately normal such that $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$.

Distribution of the sample proportion:
If a number of random samples, each of the same size n, is taken from a parent population and the proportion of successes, $P_s$, calculated for each sample, then these proportions form a distribution called the sampling distribution of proportions.
Provided that n is large, the sampling distribution of proportions is approximately normal such that $P_s \sim N\left(p, \frac{pq}{n}\right)$ where $q = 1 - p$.
$\sqrt{\frac{pq}{n}}$ is called the standard error of proportion.

Interval estimates:

Confidence interval for the population mean $\mu$

Conditions: Normal population
- with known variance $\sigma^2$
- sample size n large or small
- sample mean $\bar{x}$
95% confidence interval for $\mu$: $\left(\bar{x} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}\right)$

Conditions: Non-normal population
- with known variance $\sigma^2$
- sample size n large ($n \geq 30$)
- sample mean $\bar{x}$
95% confidence interval for $\mu$: $\left(\bar{x} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}\right)$

Confidence interval for the population proportion p

Conditions:
- sample size n large
95% confidence interval for p: $\left(p_s - 1.96\sqrt{\frac{p_s q_s}{n}},\ p_s + 1.96\sqrt{\frac{p_s q_s}{n}}\right)$
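
The interval for a proportion, $p_s \pm z\sqrt{p_s q_s/n}$, and the sample-size calculation used in Example 9.29(c) can be checked numerically. The following is a minimal Python sketch, not part of the original text: the function names `proportion_interval` and `sample_size_for_half_width` are hypothetical, and the critical value z is assumed to be read from tables (1.645, 1.96 or 2.576) rather than computed.

```python
import math


def proportion_interval(successes, n, z):
    """Approximate confidence interval for a population proportion.

    z is the critical z-value taken from tables, for example
    1.645 (90%), 1.96 (95%) or 2.576 (99%).
    """
    p_s = successes / n                          # sample proportion
    q_s = 1 - p_s
    half_width = z * math.sqrt(p_s * q_s / n)    # z times the estimated standard error
    return p_s - half_width, p_s + half_width


def sample_size_for_half_width(p_s, z, half_width):
    """Value of n that solves z * sqrt(p_s * q_s / n) = half_width."""
    return z ** 2 * p_s * (1 - p_s) / half_width ** 2


if __name__ == "__main__":
    # Data of Example 9.28: 45 defective items out of 300, 95% level.
    low, high = proportion_interval(45, 300, 1.96)
    print(round(low, 4), round(high, 4))

    # Data of Example 9.29(c): p_s = 0.34, 90% level, within 0.02.
    print(sample_size_for_half_width(0.34, 1.645, 0.02))
```

The second call returns a value of about 1518, which the worked solution quotes as 1520 to three significant figures. Because $p_s$ is itself only an estimate of p, any such sample size is a guide rather than an exact requirement.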
Example 9.30

Each of a random sample of 50 one-pound coins was weighed and their masses, x grams, are summarised by
$\sum x = 474.51$, $\sum x^2 = 4503.8276$.
(a) Use an unbiased estimate of variance to calculate an approximate 90% confidence interval for the mean mass (in grams) of all one-pound coins, giving the end-values of the interval to two decimal places.
(b) Estimate the size of a random sample of one-pound coins that would be required to give a 95% confidence interval whose width is half that of the interval calculated in (a).
(c) It was found later that the scales were consistently underweighing by 0.05 grams. State which of the results of (a) and (b) should be amended and which should not. Give the amended values. (C)

Solution 9.30

X is the mass, in grams, of a one-pound coin.
(a) $\hat{\sigma}^2 = \frac{1}{n-1}\left(\sum x^2 - \frac{(\sum x)^2}{n}\right) = \frac{1}{49}\left(4503.8276 - \frac{474.51^2}{50}\right) = 0.01291\ldots$
$\hat{\sigma} = \sqrt{0.01291\ldots} = 0.1136\ldots$
$\bar{x} = \frac{\sum x}{n} = \frac{474.51}{50} = 9.4902$
By the central limit theorem, and using $\hat{\sigma}$ since $\sigma$ is unknown, 90% confidence limits for $\mu$ are
$\bar{x} \pm 1.645\frac{\hat{\sigma}}{\sqrt{n}} = 9.4902 \pm 1.645 \times \frac{0.1136\ldots}{\sqrt{50}} = 9.4902 \pm 0.0264\ldots$
90% confidence interval $= (9.46\ \text{g},\ 9.52\ \text{g})$ (2 d.p.).
(b) Width of 90% confidence interval $= 2 \times 1.645\frac{\hat{\sigma}}{\sqrt{n}} = 2 \times 0.0264\ldots = 0.05287\ldots$
Width of required interval $= \frac{1}{2} \times 0.05287\ldots = 0.0264\ldots$
For a 95% confidence interval based on a sample of size n, the width is $2 \times 1.96\frac{\hat{\sigma}}{\sqrt{n}}$, so
$\frac{2 \times 1.96 \times 0.1136\ldots}{\sqrt{n}} = 0.0264\ldots$
$\sqrt{n} = \frac{0.445\ldots}{0.0264\ldots} = 16.85\ldots$
$n = 283.9\ldots$
The sample size required is 280 (2 s.f.).
(c) When the scales are underweighing by 0.05 g,
- the confidence interval in part (a) would be amended. It would be shifted 0.05 units to the right. The new confidence interval would be (9.51, 9.57).
- the result in part (b) would remain the same, since this uses the estimate of the variance, which would not be altered if all the readings were increased by 0.05 g.

Example 9.31

Out of a random sample of 1000 French people interviewed during Autumn 1996, 410 supported a single European Currency.
(a) Calculate an approximate 99% confidence interval for the population proportion, p, of French people who supported a single European Currency.
(b) Estimate the size of a sample that would have provided a 99% confidence interval of width 0.04 for p.
(c) Give one reason (other than rounding) why your answer to (b) is only an estimate. (C)

Solution 9.31

(a) $p_s = \frac{410}{1000} = 0.41$ and $q_s = 1 - p_s = 0.59$.
In a sample of size 1000, the 99% confidence limits for p are
$p_s \pm 2.576\sqrt{\frac{p_s q_s}{n}} = 0.41 \pm 2.576 \times \sqrt{\frac{0.41 \times 0.59}{1000}} = 0.41 \pm 0.04006\ldots$
99% confidence interval $= (0.37,\ 0.45)$ (2 s.f.).
(b) For a width of 0.04, the confidence interval would be $0.41 \pm 0.02$,
i.e. $2.576\sqrt{\frac{0.41 \times 0.59}{n}} = 0.02$
$0.02 = \frac{1.267\ldots}{\sqrt{n}}$
$\sqrt{n} = 63.3\ldots$
$n \approx 4013$
[Figure: number line showing 0.41 - 0.02, 0.41 and 0.41 + 0.02, with width = 0.04]
(c) The answer is only an estimate because the estimate for p, $\hat{p} = p_s$, was used to obtain an approximate value for the standard deviation of the sampling distribution, $\sqrt{\frac{pq}{n}}$.
Also, in the sampling distribution of proportions (from which the confidence interval is obtained) a normal approximation is used for a binomial distribution.

Example 9.32

It may be assumed that the breaking strength of paving slabs laid in public areas is normally distributed with mean 50 units and standard deviation 8 units. Random samples of n paving slabs are taken. The mean breaking strength for a sample is denoted by $\bar{X}$.
(a) State the distribution of $\bar{X}$, giving its mean and variance.
(b) Find the probability that $\bar{X}$ exceeds 54 units in the case $n = 25$.
(c) Find the smallest possible sample size if the probability that $\bar{X}$ exceeds 54 units is less than 0.01.
Suppose that the breaking strength of paving slabs laid in public areas has mean 50 units and standard deviation 8 units, but that the form of the distribution of breaking strengths is not known. Random samples of n paving slabs are taken. What can be said about the form of the distribution of the mean breaking strength of these samples in the case when n is large, and also in the case when n is small? (C)

Solution 9.32

X is the breaking strength, $X \sim N(50, 8^2)$.
(a) $\bar{X} \sim N\left(50, \frac{64}{n}\right)$,
so $\bar{X}$ follows a normal distribution with mean 50 and variance $\frac{64}{n}$.
(b) When $n = 25$, $\bar{X} \sim N\left(50, \frac{64}{25}\right)$.
Standard deviation of $\bar{X}$ is $\frac{8}{5}$.
$P(\bar{X} > 54) = P\left(Z > \frac{54 - 50}{8/5}\right) = P(Z > 2.5) = 0.0062$
(c) $P(\bar{X} > 54) < 0.01$
So $P\left(Z < \frac{54 - 50}{8/\sqrt{n}}\right) > 0.99$
[Figure: standard normal curve with area 0.99 to the left of z = 2.326 and area 0.01 in the upper tail]
Now $\Phi^{-1}(0.99) = 2.326$, so a z-value of 2.326 gives an upper tail probability of 0.01.
$\frac{54 - 50}{8/\sqrt{n}}$ must lie to the right of 2.326.
So $\frac{54 - 50}{8/\sqrt{n}} > 2.326$
$4 > 2.326 \times \frac{8}{\sqrt{n}}$
$\sqrt{n} > 4.652\ldots$
$n > 21.64\ldots$
Smallest sample size is 22.

When n is large, by the central limit theorem $\bar{X}$ is approximately normal and $\bar{X} \sim N\left(50, \frac{8^2}{n}\right)$.
When n is small, you cannot say what the distribution of $\bar{X}$ is. You only know that its mean is 50 and its variance is $\frac{8^2}{n}$.

Example 9.33

The 'reading age' of children about to start secondary school is a measure of how good they are at reading and understanding printed text. A child's reading age, measured in years, is denoted by the random variable X. The distribution of X is assumed to be $N(\mu, \sigma^2)$. The reading ages of a random sample of 20 children were measured and the data obtained is summarised by $\sum x = 232.6$, $\sum x^2 = 2756.22$.
(a) Calculate unbiased estimates of $\mu$ and $\sigma^2$, giving your answers correct to two decimal places.
(b) Calculate a symmetric 95% confidence interval for $\mu$. (C)

Solution 9.33

(a) $\hat{\mu} = \bar{x} = \frac{\sum x}{n} = \frac{232.6}{20} = 11.63$
$\hat{\sigma}^2 = \frac{1}{n-1}\left(\sum x^2 - \frac{(\sum x)^2}{n}\right) = \frac{1}{19}\left(2756.22 - \frac{232.6^2}{20}\right) = 2.688\ldots = 2.69$ (2 d.p.).
(b) Since the population is normal, with variance unknown, and the sample size is small, use the t-distribution.
The 95% confidence limits for $\mu$ are $\bar{x} \pm t\frac{\hat{\sigma}}{\sqrt{n}}$
where $(-t, t)$ encloses the central 95% of the $t(n - 1)$ distribution.
Since $n = 20$, consider $t(19)$.
From tables, the critical t-value is found from column 0.975, $\nu = 19$, and it is $t = 2.093$.

$\nu$     0.75     0.90     0.95     0.975
17      0.689    1.333    1.740    2.110
18      0.688    1.330    1.734    2.101
19      0.688    1.328    1.729    2.093

95% confidence limits are $11.63 \pm 2.093 \times \frac{\sqrt{2.688\ldots}}{\sqrt{20}} = 11.63 \pm 0.767\ldots$
95% confidence interval for $\mu$ $= (11.63 - 0.767\ldots,\ 11.63 + 0.767\ldots) = (10.9\ \text{years},\ 12.4\ \text{years})$ (1 d.p.).

Miscellaneous exercise 9h

1. The mass of a certain brand of chocolate bar has a normal distribution with mean $\mu$ grams and standard deviation 0.85 grams. The masses, in grams, of 5 randomly chosen bars are
124.31, 125.14, 124.23, 125.41, 125.76.
Calculate a symmetric 90% confidence interval for $\mu$, giving the end-points correct to two decimal places.
Forty random samples of 5 bars are taken, and a 90% confidence interval for $\mu$ is calculated for each sample. Find the expected number of intervals that do not contain $\mu$. (C)

2. A telephone company selected a random sample of size 150 from those customers who had not paid their bills one month after they had been sent out. The mean amount owed by the customers in the sample was £97.50 and the standard deviation was £29.00.
Calculate a 90% confidence interval for the mean amount owed by all customers who had not paid their bills one month after they had been sent out. (AEB)

3. A catering company asked 50 randomly selected college students to state the amount of money, $x, which they spent daily on lunch, and the results were summarised by $\sum x = 56.50$ and $\sum x^2 = 66.80$. Calculate unbiased estimates of the mean and the variance of the amount spent daily on lunch by students at the college, giving your answers correct to three significant figures. Hence find a symmetric 90% confidence interval for the mean amount spent daily on lunch, giving the end-points correct to the nearest $0.01.
Justify the use of the normal distribution in constructing the confidence interval. (C)

4. A random sample of 250 adult men undergoing a routine medical inspection had their heights (x cm) measured to the nearest centimetre, and the following data were obtained:
$\sum x = 43\,205$, $\sum x^2 = 7\,469\,107$.
Calculate an unbiased estimate of the population variance. Calculate also a symmetric 99% confidence interval for the population mean. (C)

5. A random sample of 600 was chosen from the adults living in a town in order to investigate the number x of days of work lost through illness. Before taking the sample it was decided that certain categories of people would be excluded from the analysis of the number of working days lost although they would not be excluded from the sample. In the sample 180 were found to be from these categories. For the remaining 420 members of the sample:
$\sum x = 1260$, $\sum x^2 = 46\,000$.
(a) Estimate the mean number of days lost through illness, for the restricted population, and give a 95% confidence interval for the mean.
(b) Estimate the percentage of people in the town who fall into the excluded categories, and give a 99% confidence interval for this percentage.
(c) Give two examples, with reasons, of people who might fall into the excluded categories. (O)

6. The proportion of bruised apricots in a large consignment is denoted by p. A sample of 100 apricots is examined and 11 apricots are found to be bruised.
(a) Give an assumption under which it would be valid to calculate an approximate confidence interval for p.
(b) Given that the assumption in part (a) is justified, calculate an approximate 90% confidence interval for p. Give the end-points correct to two decimal places. (C)

7. The lifetimes of light bulbs of a certain type have standard deviation 25.3 hours. Each bulb in a randomly chosen box of 12 was tested to failure and the mean lifetime was found to be 1785.7 hours.
(a) State two assumptions which are required so that a symmetric 90% confidence interval for the population mean lifetime of the bulbs can be calculated.
(b) Calculate a symmetric 90% confidence interval, given the validity of the assumptions. The values of the end-points should be given to the nearest integer. (C)

8. A consumer group wishes to estimate the proportion, p, of packages of sausages whose fat content is greater than that stated on the label. A random sample of 40 packages was tested and nine packages were found to contain more fat than stated on the label.
(a) Estimate the number of packages that would have to be tested in order that a 95% confidence interval for p should have a width of 0.1.
(b) State, giving a reason, whether the number of packages to be tested would be larger or smaller than the answer in (a) if the confidence level were changed to 90%. (C)

9. In June 1996, 150 randomly chosen people aged sixteen or more were asked whether they smoked cigarettes and 34 said that they did. Assuming that the responses were truthful, calculate an approximate 99% confidence interval for the population proportion of people aged sixteen or more who smoked cigarettes.
Give a reason why this interval might not contain the true population proportion. (C)

10. A certain type of yarn is known to have a breaking strength with a mean of 25 newtons. In an attempt to increase its breaking strength the yarn is treated with a chemical. Each piece of yarn in a random sample of 80 treated pieces has its breaking strength, x newtons, measured, producing the following summarised data:
$\sum x = 2122$, $\sum x^2 = 56\,384$
(a) Obtain unbiased estimates of the mean, $\mu$, and variance, $\sigma^2$, of the breaking strengths of pieces of yarn treated with the chemical.
(b) Construct a symmetric 99% confidence interval for $\mu$.
(c) Hence state, with a reason, whether or not the manufacturer of the yarn is justified in claiming that the treatment increases the mean breaking strength of this type of yarn.
(d) Explain why you were able to construct your confidence interval without knowing the form of the distribution of the breaking strength of a piece of yarn. (NEAB)

11. Shoe shop staff routinely measure the length of their customers' feet. Measurements of the length of one foot (without shoes) from each of 180 adult male customers yielded a mean length of 29.2 cm and a standard deviation of 1.47 cm.
(a) Calculate a 95% confidence interval for the mean length of male feet.
(b) Why was it not necessary to assume that the lengths of feet are normally distributed in order to calculate the confidence interval in part (a)?
(c) What assumption was it necessary to make in order to calculate the confidence interval in part (a)?
(d) Given that the lengths of male feet may be modelled by a normal distribution, and making any other necessary assumptions, calculate an interval within which 90% of the lengths of male feet will lie.
(e) In the light of your calculations in parts (a) and (d), discuss, briefly, the question 'is a foot a foot long?' (One foot is 30.5 cm.) (AEB)
12. Before its annual overhaul, the mean operating time of an automatic machine was 103 seconds. After the annual overhaul, the following random sample of operating times (in seconds) was obtained:
90, 97, 101, 92, 101, 95, 95, 98, 96, 95.
Assuming that the time taken by the machine to perform the operation is a normally distributed random variable with a known standard deviation of 5 seconds, find 98% confidence limits for the mean operating time after the overhaul.
Comment on the magnitude of these limits relative to the mean operating time before the overhaul. (AEB)

13. Packets of baking powder have a nominal weight of 200 g. The distribution of weights is normal and the standard deviation is 7 g. Average quantity system legislation states that, if the nominal weight is 200 g,
(i) the average weight must be at least 200 g,
(ii) not more than 2.5% of packages may weigh less than 191 g,
(iii) not more than 1 in 1000 packages may weigh less than 182 g.
A random sample of 30 packages had the following weights:
218, 207, 214, 189, 211, 206, 203, 217,
183, 186, 219, 213, 207, 214, 203, 204,
195, 197, 213, 212, 188, 221, 217, 184,
186, 216, 198, 211, 216, 200.
(a) Calculate a 95% confidence interval for the mean weight.
(b) Find the proportion of packets in the sample weighing less than 191 g and use your result to calculate an approximate 95% confidence interval for the proportion of all packets weighing less than 191 g. (AEB)

14. A company manufactures bars of soap. In a random sample of 70 bars, 18 were found to be mis-shaped. Calculate an approximate 99% confidence interval for the proportion of mis-shaped bars of soap.
Explain what you understand by a 99% confidence interval by considering
(a) intervals in general based on the above method,
(b) the interval you have calculated.
The bars of soap are either pink or white in colour and differently shaped according to colour. The masses of both types of soap are known to be normally distributed, the mean mass of the white bars being 176.2 g. The standard deviation for both bars is 6.46 g. A sample of 12 of the pink bars of soap had masses, measured to the nearest gram, as follows:
174, 164, 182, 169, 171, 187,
176, 177, 168, 171, 180, 175.
Find a 95% confidence interval for the mean mass of pink bars of soap.
Calculate also an interval within which approximately 90% of the masses of the white bars of soap will lie. (AEB)

15. An experimental physicist needs to estimate the true viscosity, $\mu$ Pascal seconds (Pa s), of a light machine oil. Using the same apparatus he takes 12 independent measurements, x Pa s, of the viscosity of the oil, obtaining the values below:
25.8, 25.2, 24.7, 25.5, 25.3, 25.4,
25.2, 25.3, 25.8, 25.9, 25.2, 24.9.
($\sum x = 304.2$, $\sum x^2 = 7712.9$)
When using this apparatus, measurements of the oil's viscosity are distributed with mean $\mu$ and variance $\sigma^2$.
Obtain unbiased estimates of $\mu$ and $\sigma^2$. Hence obtain a symmetric 95% confidence interval for $\mu$. State any distributional assumptions you have made in obtaining your confidence interval.
The physicist explained the meaning of his confidence interval by saying there was a probability of 0.95 that $\mu$ lay between the limits of the interval. Explain why this interpretation is wrong and provide a correct explanation of 95% confidence as used in this context.
The manufacturer of the oil quotes a viscosity of 25.5 Pa s for the oil. With reference to your confidence interval, state any conclusion you can come to regarding the validity of this figure. (NEAB)

16. Three weeks before an election in a certain constituency an opinion poll was conducted using a random sample of 800 voters selected from the electoral roll. The numbers of persons who said they would vote for parties A, B, C are recorded below; the remainder were categorised as 'Don't know'.

Party A    Party B    Party C    Don't know
264        256        144        136

(a) Calculate an approximate 90% symmetric confidence interval for the proportion of the total electorate in the constituency that will vote for party A in the election.
(b) Give a very brief description of how the sample might have been selected, to ensure that it was random.
(c) In the actual election, 41% of the total electorate voted for party A. Give two possible explanations for the fact that this value is not contained within the confidence interval calculated in (a). (NEAB)

17. In an investigation to assess the difference in use between a credit card and a store card a random sample of 20 people, each using both cards, was selected. They supplied information from which, in 1994, the difference between each person's mean monthly spending on the credit and store cards, £d, was calculated. The following summary data were then calculated.
$\sum d = 1664$ and $\sum d^2 = 426\,445$.
Stating all necessary distributional assumptions, calculate a symmetric 90% confidence interval for the mean difference between the mean monthly spending for all users of the two cards. (NEAB)

18. The mass, x milligrams, of each of 10 randomly selected units of a new cancer drug was measured and the following results obtained:
35.9, 35.2, 35.0, 34.9, 35.4,
34.8, 35.0, 35.1, 35.3, 35.1.
Assuming that the masses are normally distributed with mean $\mu$, calculate an 80% confidence interval for $\mu$.

19. Ten random samples of nylon fibre were tested for the amount of stretching under tension. Each fibre had the same length and diameter and was stretched by applying a standard load. The increases in length, in millimetres, were as follows:
13.52, 14.06, 13.19, 14.77, 12.80,
12.06, 15.12, 14.39, 15.81, 13.38.
Calculate a 95% confidence interval for the mean increase in length of the population of fibres, assuming that the increase in length can be modelled by a normal distribution.

20. During a particular evening, 10 babies were born on a particular maternity ward in a large hospital. The lengths, in centimetres, of the babies were noted:
50, 51, 45, 47, 49, 48, 54, 53, 45, 50.
Assuming that the sample came from an underlying normal population, calculate a 95% confidence interval for the mean of the population.

21. The external diameters (measured in units of 0.01 mm above a nominal value) of a sample of piston rings produced on the same machine were:
11, 9, 32, 18, 29, 1, 21, 19, 6.
Assuming a normal distribution, calculate a 95% confidence interval for the population mean. (AEB)

Mixed test 9A

1. A random sample of 40 nails is drawn from a population whose lengths are normally distributed with mean $\mu$ mm and standard deviation 0.48 mm.
(a) Calculate the width of a symmetric 99% confidence interval for $\mu$ based on this sample.
(b) Find the confidence level of a symmetric confidence interval having the same width as before, but based on a random sample of 20 nails. (C)

2. From time to time a firm manufacturing pre-packed furniture needs to check the mean distance between pairs of holes drilled by machine in pieces of chipboard to ensure that no change has occurred. It is known from experience that the standard deviation of the distance is 0.43 mm. The firm intends to take a random sample of size n, and to calculate a 99% confidence interval for the mean of the population. The width of this interval must be no more than 0.60 mm. Calculate the minimum value of n. (L)

3. Out of 248 cars parked in a car park, 72 were fitted with an anti-theft device on the steering wheel. Assuming that the cars form a random sample of parked cars, calculate an approximate 95% confidence interval for the population proportion of parked cars fitted with an anti-theft device on the steering wheel.
Give a reason, other than rounding in the calculations, why the interval is approximate.
Give a reason why the assumption of randomness might not be valid. (C)

4. The fat content of a well-known brand of beefburger was investigated by measuring the percentage of fat, X, in each of 12 randomly selected beefburgers. The results were summarised as follows:
$\sum x = 228$, $\sum x^2 = 4448$.
Assuming the percentage fat content to be normally distributed, find a 90% confidence interval for the population mean $\mu$.
Mixed test 9B

1. A group of 65 students is asked to guess the length of a particular object and their answers are recorded as x cm, with the following results:
$\sum x = 6019.0$ and $\sum x^2 = 557\,733.8$.
(a) Show that the estimated standard error of the sample mean is 0.3 cm.
(b) Determine an approximate symmetric 95% confidence interval for the mean of the population of all such guesses, giving your limits correct to two decimal places.
(c) State one assumption which you have made in your calculations. (NEAB)

2. A survey was carried out by a County Meals Service in order to gauge the response to a new 'healthy eating' menu. A random sample of 200 schoolchildren was selected from schools using the menu and it was found that 84 children approved of it. Calculate an approximate 95% confidence interval for the population proportion, p, who approve of the new menu.
It is given that p = 0.38. Use a suitable approximation to calculate the probability that, in a random sample of 200 children, the proportion who approve of the new menu will be at least 0.42. (C)

3. A researcher is designing a study to standardise a new intelligence test. It is known that scores on this type of test are normally distributed with a standard deviation of 15.0.
(a) Write down, in terms of $\bar{x}$, the sample mean, and n, the sample size, an expression for a 99% symmetric confidence interval for the mean test score.
(b) Calculate, to the nearest 100, the value of n such that the width of this confidence interval will be less than 1.0. (NEAB)

4. In Tesbury's supermarket, economy packs of butter are marked 250 g. An inspector takes a random sample of 12 packs and weighs them. Correct to the nearest 0.1 g, the weights, in grams, were:
246.5, 240.9, 245.3, 250.5, 248.7, 249.1,
251.0, 249.8, 249.8, 247.6, 246.2, 241.4.
(a) Making any necessary assumptions, which should be stated, calculate a 99% confidence interval for the mean weight of the packs of butter.
(b) Calculate the width of the 99% confidence interval.
(c) How is the width affected when calculating a 90% confidence interval?

Hypothesis tests: discrete distributions

In this chapter you will learn about
• the language of hypothesis testing
• how to perform a test
  - for the parameter p of a binomial distribution (small sample)
  - for the mean $\lambda$ of a Poisson distribution
• Type I and Type II errors associated with hypothesis tests

Background knowledge
You will need to be able to
• recognise the conditions needed for a situation to be modelled by a binomial distribution or a Poisson distribution.
• find related probabilities by direct evaluation or by using cumulative probability tables.
Sid says that he has psychic powers and can read people's thoughts. To test this claim, a
volunteer from the audience sits on the stage while Sid sits in a separate room off stage. The
volunteer chooses a card from a well-shuffled pack and concentrates on the card for five
seconds. At the same time, Sid writes down the suit of the card, either hearts, diamonds,
spades or clubs. The card is replaced in the pack, the pack is shuffled and another card drawn.
The procedure is repeated until 20 cards have been drawn.
There are four suits, so Sid has a one in four chance of writing down the correct suit if he
guesses the answer. If he isn't guessing, you would expect him to get more than one in four
correct. So if he gets five (or fewer) correct answers out of the 20, you would definitely say
that he is just guessing but if he gets as many as 19 or 20 correct you would have no
hesitation in saying that he could read people's thoughts.
But what about other values? If he gets 12 correct answers, would this be very unusual? What
would you say if he got 10 correct? What about 8 correct?
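
One way to judge how unusual 8, 10 or 12 correct answers would be is to compute the chance of doing at least that well by guessing alone. The following is a minimal Python sketch under stated assumptions: each of the 20 guesses is taken to be independent with a one in four chance of being correct, so the number of correct answers is treated as binomial, and the function name `binomial_tail` is hypothetical.

```python
from math import comb


def binomial_tail(k, n=20, p=0.25):
    """P(X >= k) where X counts successes in n independent trials,
    each with success probability p (assumed binomial model)."""
    return sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(k, n + 1))


if __name__ == "__main__":
    # Chance of at least k correct suits out of 20 if Sid is only guessing.
    for k in (8, 10, 12):
        print(k, round(binomial_tail(k), 4))
```

The smaller the tail probability, the harder it is to explain the result as guesswork. Deciding how small is small enough is a separate choice that has to be made before looking at the result.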
Somehow you have to decide on a cut-off point, c. This would be the least value you could find such that the probability of getting c or more correct answers would be very small. It would be considered a rare event to get c or more correct answers.

[Diagram: number line from 0 to 20 with the cut-off point c marked]

To decide on the value of c, you could choose a number that seemed reasonable. If, however, you perform a hypothesis (or significance) test you will be able to back up your argument and conclusion with statistical theory.

Suppose that X is the number of correct answers that Sid writes down for the suits of the 20 cards. If you assume that Sid is just guessing, the probability that he writes down the correct suit is 0.25. The experiment is performed 20 times, so there are 20 independent trials, each with a probability of 0.25 of success. This suggests a binomial situation (see page 279). In fact, on the assumption that Sid is guessing, X can be modelled by a binomial distribution with n = 20 and p = 0.25, i.e. X ~ B(20, 0.25).

You now need to look for a value, c, in this distribution such that P(X ≥ c) is very small.

Binomial probabilities can be calculated directly (see page 279) or found from cumulative probability tables which give P(X ≤ r) for various values of n and p, where X ~ B(n, p). The extract here relates to X ~ B(20, 0.25) and has been reproduced from page 646.

P(X ≤ r) for X ~ B(20, 0.25)
n = 20, p = 0.25
r     P(X ≤ r)
0     0.0032
1     0.0243
2     0.0913
3     0.2252
4     0.4148
5     0.6172
6     0.7858
7     0.8982
8     0.9591
9     0.9861
10    0.9961

The tables give probabilities to four decimal places, indicating that, to four decimal places, P(X ≤ 13) = 1.0000. This implies that P(X ≥ 14) = 1 − P(X ≤ 13) tends to 0. If he is just guessing, it would be almost impossible for Sid to give 14, 15, 16, 17, 18, 19 or 20 correct answers. So if he gives, for example, 14 correct answers you would certainly have to conclude that he is able to read people's thoughts in some way!

Similarly P(X ≥ 13) = 1 − P(X ≤ 12) = 1 − 0.9998 = 0.0002.
Getting 13 or more correct answers would be a very rare event.

P(X ≥ 12) = 1 − P(X ≤ 11) = 1 − 0.9991 = 0.0009.
Getting 12 or more correct answers is still a very rare event.

P(X ≥ 11) = 1 − P(X ≤ 10) = 1 − 0.9961 = 0.0039.
On about four occasions in every thousand Sid might give 11 or more correct answers. This is still a rare event.

You have to make a decision about the value of the probability that is considered to imply an unlikely or rare event. This probability is called the significance level of the test. As a guide, events that have a probability of 5% or less are regarded as unlikely and events having a probability of 1% or less are regarded as very unlikely. Often a significance test at the 5% level is carried out.

The cut-off point c is known as the critical value and the group of observations that are considered to be unusual or unlikely (rare) events is called the critical (or rejection) region. The critical value and critical region depend on the significance level chosen.

Suppose you choose a significance level of 5% to test Sid's claim. From the working above, P(X ≥ 8) ≈ 10%. This is greater than 5%, so x ≥ 8 is not the critical region; getting eight correct answers would not be considered an unlikely or rare event.

But P(X ≥ 9) ≈ 4%, which is less than 5%, so getting nine correct answers would be considered an unlikely or rare event. Therefore the critical value for a 5% level of significance is 9 and the critical region is x ≥ 9, i.e. 9, 10, 11, 12, ..., 19 or 20 correct answers.

[Diagram: number line from 0 to 20 with the critical region x ≥ 9 marked]

The language used in hypothesis testing

The assumption that Sid is guessing is called the null hypothesis and it is written H₀. The null hypothesis is very important as it provides the model for the calculations. You would write

H₀: p = 0.25

If Sid has psychic powers, then he should get more than one in four correct and the probability that he gives the correct suit will be more than 0.25. This is called the alternative hypothesis and is denoted by H₁. You would write

H₁: p > 0.25
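The search for the critical value in Sid's test can also be checked numerically. The following is a minimal sketch in Python, not part of the original text; the function names `binom_pmf`, `upper_tail` and `upper_critical_value` are illustrative choices. It evaluates the binomial probabilities directly rather than reading them from tables.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p), evaluated directly."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def upper_tail(c, n, p):
    """P(X >= c) for X ~ B(n, p)."""
    return sum(binom_pmf(k, n, p) for k in range(c, n + 1))

def upper_critical_value(n, p, alpha):
    """Least c such that P(X >= c) < alpha (None if there is no such c)."""
    for c in range(n + 1):
        # The upper-tail probability falls as c grows, so the first
        # value that passes the check is the least one.
        if upper_tail(c, n, p) < alpha:
            return c
    return None

# Sid's test: X ~ B(20, 0.25) at the 5% level (upper tail)
c = upper_critical_value(20, 0.25, 0.05)
print("critical value:", c)
print("P(X >= 8) =", round(upper_tail(8, 20, 0.25), 4))
print("P(X >= 9) =", round(upper_tail(9, 20, 0.25), 4))
```

Under these assumptions the search stops at c = 9, because P(X ≥ 8) is about 10% while P(X ≥ 9) is about 4%, in agreement with the table extract.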
PROCEDURE FOR CARRYING OUT A HYPOTHESIS TEST

To find whether the test value is in the critical region you can work out the critical region as described above. This is a useful method as it gives a lot of information, but its disadvantage is that it can be rather time-consuming.

In this example, it may be quicker instead to calculate the probability that X is greater than or equal to the test value. If this probability is less than 5%, this means that the test value is in the upper tail 5% of the distribution, i.e. it is in the critical region.

This method is illustrated in the working below which tests the sample value x = 7 and assumes that you have not found the critical region first. Note that the stages of the test are shown in the margin and additional commentary is given in italics.

1. Define the variable.
Let X be the number of correctly identified suits out of the 20 trials. Assuming that the pack is well shuffled between each trial and the trials are independent, X can be modelled by a binomial distribution, where X ~ B(20, p).

2. State H₀ and H₁.
H₀: p = 0.25 (Sid is guessing)
H₁: p > 0.25 (Sid has psychic powers)

3. State the distribution if H₀ is true.
If H₀ is true, then X ~ B(20, 0.25).

4. State level and type of test.
Use a one-tailed (upper tail) test, at the 5% level.

5. State the rejection criterion.
The test value, x, will lie in the critical region (the upper tail 5% of the distribution) if P(X ≥ x) < 5%.
Reject H₀ if P(X ≥ x) < 5%, where x is the test value.

6. Calculate the required probability.
Sid gives 7 correct answers, so test x = 7 and find P(X ≥ 7).
From cumulative binomial tables
P(X ≥ 7) = 1 − P(X ≤ 6)
= 1 − 0.7858
= 0.2142
≈ 21%

P(X ≤ r) for X ~ B(20, 0.25)
n = 20, p = 0.25
r     P(X ≤ r)
0     0.0032
1     0.0243
2     0.0913
3     0.2252
4     0.4148
5     0.6172
6     0.7858   ← P(X ≤ 6)
7     0.8982
8     0.9591

7. Make your conclusion.
Since P(X ≥ 7) > 5%, the test value x = 7 is not in the critical region. There is not enough evidence to reject H₀.
You would conclude that Sid is just guessing; he does not have psychic powers.

NOTE: when you are testing the value x = 7, it may seem strange that you have to work out P(X ≥ 7) rather than just P(X = 7). Remember that this is necessary as you are essentially looking for the critical region to see whether the test value lies in this region or not.

The probabilities and critical region can be illustrated diagrammatically. Below is the probability distribution for X ~ B(20, 0.25). Note that it is positively skewed and the probabilities for 12 to 20 correct answers are so small that they cannot be shown on the diagram. The test value has been circled.

[Diagram: probability distribution for X ~ B(20, 0.25), x = 0 to 20, with the test value 7 circled and the boundary for the critical region (5%) marked between 8 and 9]

Since P(X ≥ 8) ≈ 10% and P(X ≥ 9) ≈ 4%, the 5% boundary comes between 8 and 9. Note that with discrete distributions you will probably not get a perfect 5% in your calculations.

Example 10.1

A drugs company produced a new pain-relieving drug for migraine sufferers and its advertisements stated that the drug had a 90% success rate. A doctor doubted whether the drug would be as successful as the company claimed. She prescribed the drug for 15 of her patients. After six months, 11 of these patients said that their migraine symptoms had been relieved by the drug.

(a) Test the drug company's claim, at the 5% level of significance.
(b) Should the doctor continue to prescribe the drug?
... the company claims.)

If H₀ is true, then X ~ B(15, 0.9).

Since the alternative hypothesis is p < 0.9, the critical region is in the lower tail of the distribution, so use a one-tailed (lower tail) test, at the 5% level.

The test value, x, will lie in the critical region (the lower tail 5% of the distribution) if P(X ≤ x) < 5%.
Reject H₀ if P(X ≤ x) < 5%, where x is the test value.

The test value is x = 11, so find P(X ≤ 11).
Using cumulative binomial tables, if the tables give only values of p up to 0.5, you need to use symmetry properties as illustrated on page 284.

P(X ≤ 11 | p = 0.9) = P(X ≥ 4 | p = 0.1)
= 1 − P(X ≤ 3)
= 1 − 0.9444
= 0.0556
≈ 5.6%

P(X ≤ r) for X ~ B(15, 0.1)
n = 15, p = 0.1
r     P(X ≤ r)
0     0.2059
1     0.5490
2     0.8159
3     0.9444   ← P(X ≤ 3)
4     0.9873
5     0.9978

If you are calculating the probabilities directly:

P(X ≤ 11) = 1 − P(X ≥ 12)
= 1 − (¹⁵C₁₂ × 0.1³ × 0.9¹² + ¹⁵C₁₃ × 0.1² × 0.9¹³ + 15 × 0.1 × 0.9¹⁴ + 0.9¹⁵)
= 1 − 0.944...
= 0.0556 (4 d.p.)
≈ 5.6%

[Diagram: number line from 0 to 15 with the test value 11 circled]

P(X ≤ 11) is greater than 5%. This means that the boundary for the critical region (the lower tail 5% of the distribution) will be slightly to the left of x = 11. So x = 11 is not in the critical region.
H₀ is not rejected and the drugs company's claim of a 90% success rate is upheld.

(b) P(X ≤ 11) is only just greater than 5%. With safety in mind, it would be wise to suggest that the doctor errs on the side of caution and carries out further tests before accepting that the success rate is 90%.

ONE-TAILED AND TWO-TAILED TESTS

One-tailed test

In the examples so far, one-tailed tests have been considered with either the upper or lower tail being used for the critical region, depending on the alternative hypothesis.

In general, for a significance level of α%, null hypothesis H₀: p = p₀ and a test value x,
- if H₁ involves a > sign, indicating that you are looking for an increase in p, use the upper tail to find whether P(X ≥ x) < α%,
- if H₁ involves a < sign, indicating that you are looking for a decrease in p, use the lower tail to find whether P(X ≤ x) < α%.

Remember that in both cases you use P(X ...) < α%.

Two-tailed test

A two-tailed test is carried out when the alternative hypothesis looks for a change in p, not specifically an increase or a decrease. If the significance level is α%, then the critical region is in two parts, half in the lower tail and half in the upper tail,
- if H₁ involves a ≠ sign, indicating that you are looking for a change in p,
  in the lower tail, the critical region consists of values less than or equal to c₁ such that P(X ≤ c₁) < ½α%,
  in the upper tail, the critical region consists of values greater than or equal to c₂ such that P(X ≥ c₂) < ½α%.

For example, for a 5% significance level (two-tailed test) the probability distribution and critical region might look like this:

[Diagram: probability distribution with a critical region in each tail]
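The one-tailed rules can be sketched as a short Python function. This is an illustrative sketch rather than anything from the original text: the names `binom_cdf` and `one_tailed_test` and the `tail` argument are assumptions made for the example. It is applied to the two tests worked through earlier, Sid's 7 correct answers out of 20 and the 11 successes out of 15 in Example 10.1.

```python
from math import comb

def binom_cdf(x, n, p):
    """P(X <= x) for X ~ B(n, p); returns 0 when x < 0."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(0, x + 1))

def one_tailed_test(x, n, p0, tail, alpha=0.05):
    """Return (tail probability, reject H0?) for a one-tailed binomial test.

    tail="upper" corresponds to H1: p > p0 and uses P(X >= x);
    tail="lower" corresponds to H1: p < p0 and uses P(X <= x).
    """
    if tail == "upper":
        prob = 1 - binom_cdf(x - 1, n, p0)
    elif tail == "lower":
        prob = binom_cdf(x, n, p0)
    else:
        raise ValueError("tail must be 'upper' or 'lower'")
    return prob, prob < alpha

# Sid: H0: p = 0.25, H1: p > 0.25, test value x = 7 out of 20
sid_prob, sid_reject = one_tailed_test(7, 20, 0.25, "upper")

# Example 10.1: H0: p = 0.9, H1: p < 0.9, test value x = 11 out of 15
drug_prob, drug_reject = one_tailed_test(11, 15, 0.9, "lower")

print(round(sid_prob, 4), sid_reject)
print(round(drug_prob, 4), drug_reject)
```

In both cases the tail probability is greater than 5% (about 21% and about 5.6% respectively), so H₀ is not rejected, matching the conclusions reached from the tables.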
H₀: p = 0.35 (The support has not changed)
H₁: p ≠ 0.35 (The support has changed)

In this case the 10% for the significance level is distributed evenly between the upper and lower tails, with 5% at each tail.

5. State the rejection criterion.
The test value, x, will lie in the critical region (the lower tail 5% or the upper tail 5%) if P(X ≤ x) < 5% or P(X ≥ x) < 5%.
Reject H₀ if P(X ≤ x) < 5% or P(X ≥ x) < 5%.

6. Calculate the required probability.
The test value is x = 1, so you need to look at the lower tail part of the critical region and find P(X ≤ 1).
P(X ≤ 1) = P(X = 0) + P(X = 1)
= 0.04244...
≈ 4.2%
(You can use cumulative binomial tables if they are available.)

7. Make your conclusion.
Since P(X ≤ 1) < 5%, the sample value x = 1 lies in the critical region, so H₀ is rejected in favour of H₁. At the 10% significance level, there is evidence that support for the Purple party has changed.

(b) To find the critical region, consider separately the upper and lower tails of the distribution.

But is 8 the smallest value in the critical region? To check this try c = 7:
P(X ≥ 7) = ¹²C₇ × 0.65⁵ × 0.35⁷ + 0.0255...

[Diagram: probability distribution for x = 0 to 12, with 5% marked in each tail]
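The two parts of a two-tailed critical region can be found by accumulating probabilities inwards from each end of the distribution. The sketch below is a hypothetical illustration in Python (the function name `two_tailed_critical_region` is not from the original text). It is applied to X ~ B(12, 0.35) at the 10% level, which appears to be the model used in the example above.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def two_tailed_critical_region(n, p0, alpha):
    """Return (c1, c2): the critical region is x <= c1 or x >= c2.

    c1 is the largest value with P(X <= c1) < alpha/2 and
    c2 is the smallest value with P(X >= c2) < alpha/2.
    Either may be None if that tail contains no such value.
    """
    half = alpha / 2

    # Lower tail: accumulate P(X = 0), P(X = 1), ... until alpha/2 is reached
    c1, cum = None, 0.0
    for k in range(n + 1):
        cum += binom_pmf(k, n, p0)
        if cum < half:
            c1 = k
        else:
            break

    # Upper tail: accumulate P(X = n), P(X = n - 1), ... in the same way
    c2, cum = None, 0.0
    for k in range(n, -1, -1):
        cum += binom_pmf(k, n, p0)
        if cum < half:
            c2 = k
        else:
            break

    return c1, c2

c1, c2 = two_tailed_critical_region(12, 0.35, 0.10)
print("critical region: x <=", c1, "or x >=", c2)
```

With these values the lower part of the region is x ≤ 1 and the upper part is x ≥ 8. Trying c = 7 in the upper tail gives a probability above 5%, which is why 8 is the smallest value there.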
• State the null hypothesis H₀ and the alternative hypothesis H₁ concerning λ, for example
H₀: λ = λ₀
H₁: λ > λ₀
• State the distribution of X assuming that the null hypothesis is true, i.e. assuming that X does follow a Poisson distribution with the value of λ specified in H₀, for example
X ~ Po(λ₀)
• State the type of test (one-tailed or two-tailed), for example,
H₀: λ = λ₀     H₀: λ = λ₀     H₀: λ = λ₀

Example 10.4

The number of misprints in the classified advertisements pages of the Daily Informer is found to have a Poisson distribution with average 6.5 misprints per page. A new proof reader is employed and the number of misprints on a page was found to be 12. The editor said that the average number of misprints had increased. Test this claim at the 5% level.

Since the alternative hypothesis is λ > 6.5, use a one-tailed test at the 5% level and consider the upper tail of the distribution for the critical region.

At the 5% level, the sample value x will lie in the critical region if P(X ≥ x) < 5%.
Reject H₀ if P(X ≥ x) < 5%, where x is the test value.

The test value is x = 12, so find P(X ≥ 12).
Using cumulative Poisson tables (page 648)

P(X ≤ r) where X ~ Po(6.5)
λ = 6.5
r     P(X ≤ r)
0     0.0015
1     0.0113

[Diagram: number line from 0 to 17 with the test value 12 circled]
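The required probability in Example 10.4 can be evaluated directly from the Poisson formula instead of from tables. The following Python sketch is an illustration only; `poisson_cdf` is a hypothetical helper name, and the decision rule is the one stated above, reject H₀ if P(X ≥ x) < 5%.

```python
from math import exp, factorial

def poisson_cdf(x, lam):
    """P(X <= x) for X ~ Po(lam), summing the probabilities directly."""
    return sum(exp(-lam) * lam**k / factorial(k) for k in range(0, x + 1))

lam0 = 6.5   # mean number of misprints per page under H0
x = 12       # observed number of misprints (test value)

# Upper tail: P(X >= 12) = 1 - P(X <= 11)
p_upper = 1 - poisson_cdf(x - 1, lam0)
reject_h0 = p_upper < 0.05

print("P(X >= 12) =", round(p_upper, 4))
print("reject H0 at the 5% level:", reject_h0)
```

Evaluated this way, P(X ≥ 12) comes out at a little over 3%, which is less than 5%, so on this calculation the test value lies in the critical region.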
4. State level and type of test.
Since the alternative hypothesis is λ < 4.5, use a one-tailed test and consider the lower tail of the distribution for the critical region.

5. State the rejection criterion.
At the 10% level, the sample value x will lie in the critical region if P(X ≤ x) < 10%.
Reject H₀ if P(X ≤ x) < 10%, where x is the test value.

6. Calculate the required probability.
The test value is 2, so find P(X ≤ 2).
Using cumulative Poisson tables (page 647)
P(X ≤ 2) = 0.1736 = 17.36%

P(X ≤ r) where X ~ Po(4.5)
λ = 4.5
r     P(X ≤ r)
0     0.0111
1     0.0611
2     0.1736
3     0.3423
4     0.5321
5     0.7029
6     0.8311

NOTE: if you do not have access to tables, then calculate P(X ≤ 2) as follows (see page 292):
P(X ≤ 2) = P(X = 0) + P(X = 1) + P(X = 2)
= e^(−4.5) (1 + 4.5 + 4.5²/2!)
= 0.17357...
= 17.36%

7. Make your conclusion.
Since P(X ≤ 2) > 10%, the test value of two breakdowns does not lie in the critical region, so H₀ is not rejected.
There is no evidence, at the 10% level, of an improvement in the reliability of the photocopier.

Example 10.6

The number of flaws per metre of fabric follows a Poisson distribution with mean 2. With the aim of reducing the number of flaws, the fabric is subjected to a different treatment process. After this treatment a significance test is devised to gauge whether it has been successful. The test states that the number of flaws has decreased if a randomly selected 4-metre length of cloth contains fewer than five flaws.

(a) State the null and alternative hypotheses for this significance test.
(b) Find the probability of making a Type I error when the test is carried out.

You are told that values of x < 5 are in the critical region. So a test value of x < 5 would lead to the null hypothesis being rejected.

P(Type I error) = P(reject H₀ when H₀ is true)
= P(X < 5 when X ~ Po(8))
From tables
P(X < 5) = P(X ≤ 4)
= 0.0996
≈ 10%
P(Type I error) = 0.10 (2 d.p.)

P(X ≤ r) where X ~ Po(8)
λ = 8.0
r     P(X ≤ r)
0     0.0003
1     0.0030
2     0.0138
3     0.0424
4     0.0996
5     0.1912

(c) The significance level of the test is the same as the probability of making a Type I error, so the significance level is 10%.

(d) If, in fact, the number of flaws per metre has been reduced to 1, then the expected number of flaws in a 4-metre length is four. The hypotheses become
H₀: λ = 8
H₁: λ = 4

P(Type II error) = P(accept H₀ when H₁ is true)
You reject H₀ when x < 5, so accept H₀ when x ≥ 5.
P(Type II error) = P(X ≥ 5 when λ = 4)
= 1 − P(X ≤ 4 when X ~ Po(4))
= 1 − 0.6288
= 0.3712
≈ 37%

P(X ≤ r) where X ~ Po(4)
λ = 4.0
r     P(X ≤ r)
0     0.0183
1     0.0916
2     0.2381
3     0.4335
4     0.6288
5     0.7851
6     0.8893

The probability of making a Type II error is approximately 37%.
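The Type I and Type II error probabilities in Example 10.6 follow from the same critical region, x < 5, evaluated under two different values of λ. A minimal Python sketch, assuming direct evaluation of the Poisson probabilities (the helper `poisson_cdf` and the variable names are illustrative, not from the original text):

```python
from math import exp, factorial

def poisson_cdf(x, lam):
    """P(X <= x) for X ~ Po(lam)."""
    return sum(exp(-lam) * lam**k / factorial(k) for k in range(0, x + 1))

critical_max = 4   # H0 is rejected when x < 5, i.e. when x <= 4
lam_h0 = 8         # mean flaws in 4 metres under H0 (2 per metre)
lam_h1 = 4         # mean flaws in 4 metres under H1 (1 per metre)

# Type I error: reject H0 although H0 is true
p_type_1 = poisson_cdf(critical_max, lam_h0)

# Type II error: accept H0 (x >= 5) although H1 is true
p_type_2 = 1 - poisson_cdf(critical_max, lam_h1)

print("P(Type I error)  =", round(p_type_1, 4))
print("P(Type II error) =", round(p_type_2, 4))
```

The two results agree with the table values used in the working: roughly 0.10 for the Type I error and roughly 0.37 for the Type II error.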
A die is suspected of bias towards showing more sixes than would be expected of an ordinary die. In order to test this, it is decided to throw the die 12 times. The null hypothesis p = 1/6, where p is the probability of the die showing a six, will be rejected in favour of the alternative hypothesis p > 1/6 if the number of sixes obtained is 4 or more. Calculate, to three decimal places, the probability of making

H₁: λ < 4 (the number of flaws has decreased)
If H₀ is true, then X ~ Po(4).
Use a one-tailed test and consider the lower tail for the critical region.
At the 10% level the value x = 1 will be in the critical region if P(X ≤ 1) < 10%, therefore reject H₀ if P(X ≤ 1) < 10%.
Miscellaneous exercise 10c - Binomial and Poisson tests

1. Before I sat an examination, my teacher told me that I had a 60% chance of obtaining a grade A, but I thought I had a better chance than that. In preparation for the examination, we did seven tests each of the same standard as the examination. Assuming my teacher is right, find the probability that I would get a grade A on
(a) all 7 tests,
(b) exactly 6 tests out of 7,
(c) exactly 5 tests out of 7.
In fact I got a grade A on 6 tests out of 7. State suitable null and alternative hypotheses and carry out a statistical test to determine whether or not there is evidence that my teacher is underestimating my chances of a grade A. (MEI)

2. Harry Hotspur is a footballer who likes to take penalty kicks. On past performance he reckons that on average he scores 7 times out of 10. Assume that Harry is correct, and consider the next 8 penalty kicks he takes.
(a) Find the probability that he will score at least 6 times.
(b) Find the modal score and state its probability.
(c) What further assumption have you made in calculating the probabilities in (a) and (b)?
After a period of intense practice, Harry reckons that he has improved his penalty taking.
(d) Write down suitable null and alternative hypotheses for testing the value of p, the probability that Harry scores from a penalty kick.
He takes 15 penalty kicks and scores from 13 of them.
(e) Carry out the hypothesis test, at the 10% level of significance, stating your conclusion clearly.
(f) Harry takes a further set of 15 penalty kicks. Out of the total of 30 kicks he scores from 26. Without further calculation explain carefully whether this additional information strengthens Harry's case or not. (MEI)

3. The manufacturers of a certain type of microwave oven claim that at least 95% of their ovens will not fail during the first two years of use. In order to test this claim, a Consumer Agency purchased a random sample of 15 ovens and ran them under similar conditions over a two-year period. It was found that 12 ovens had not failed during that period.
Test the manufacturer's claim using an exact binomial distribution. The significance level should be as close as possible to 5%.
Explain why an exact 5% significance level is not possible. (C)

4. The ABC School of Motoring claim that at least 80% of their pupils pass the driving test first time. The XYZ School of Motoring suspect that more than 20% of ABC's pupils fail first time. They test this suspicion by checking the results of a random sample of 25 former ABC pupils, finding out how many failed first time.
(a) State suitable null and alternative hypotheses to be used in the test.
(b) Identify the model that should be used for the distribution of the number of failures.
(c) Find the smallest number of failures which would allow ABC's claim to be rejected at the 5% level of significance. (NEAB)

5. For most small birds, the ratio of males to females may be expected to be about 1 : 1. In one ornithological study birds are trapped by setting fine-mesh nets. The trapped birds are counted and then released. The catch may be regarded as a random sample of the birds in the area.
The ornithologists want to test whether the sex ratio of blackbirds is, in fact, 1 : 1.
(a) Assuming that the sex ratio of blackbirds is 1 : 1, find the probability that a random sample of 16 blackbirds contains
(i) 12 males (ii) at least 12 males (iii) at least 12 of the same sex.
(b) State the null and alternative hypotheses the ornithologists should use, clearly indicating why the alternative hypothesis takes the form it does.
In one sample of 16 blackbirds there are 12 males and 4 females.
(c) Carry out a suitable test using these data at the 5% significance level, stating your conclusion clearly. Find the critical region for the test.
(d) Another ornithologist points out that, because female birds spend much time sitting on the nest, females are less likely to be caught than males. Explain how this would affect your conclusions. (MEI)

6. Over many years it has been found that at a particular station 20% of trains arrive late. A consumer group wishes to test whether the percentage of trains arriving late has increased recently. It decides to observe 20 trains. If more than four of the trains arrive late it will claim that the percentage of trains arriving late has increased.
(a) In the case where the percentage of trains arriving late has remained at 20%, find the probability that the consumer group makes a Type I error.
(b) In the case where the percentage of trains arriving late has increased to 25%, find the probability that the consumer group makes a Type II error.
(c) (i) Comment on your answer to part (a).
(ii) Suggest an improvement to the procedure used by the consumer group. (NEAB)

7. A firm producing mugs has a quality control scheme in which a random sample of 10 mugs from each batch is inspected. For 50 such samples, the numbers of defective mugs are as follows:

Number of defective mugs    0    1    2    3    4    5    6+
Number of samples           5   13   15   12    4    1    0

(a) Find the mean and standard deviation of the number of defective mugs per sample.
(b) Show that a reasonable estimate for p, the probability that a mug is defective, is 0.2. Use this figure to calculate the probability that a randomly chosen sample will contain exactly 2 defective mugs. Comment on the agreement between this value and the observed data.
The management is not satisfied with 20% of mugs being defective and introduces a new process to reduce the proportion of defective mugs.
(c) A random sample of 20 mugs, produced by the new process, contains just one which is defective. Test, at the 5% level, whether it is reasonable to suppose that the proportion of defective mugs has been reduced, stating your null and alternative hypotheses clearly.
(d) What would the conclusion have been if the management had chosen to conduct the test at the 10% level? (MEI)

8. In a certain country, 90% of letters are delivered the day after posting.
A resident posts 8 letters on a certain day. Find the probability that:
(a) all 8 letters are delivered the next day,
(b) at least 6 letters are delivered the next day,
(c) exactly half the letters are delivered the next day.
It is later suspected that the service has deteriorated as a result of mechanisation. To test this, 17 letters are posted and it is found that only 13 of them arrive the next day. Let p denote the probability, after mechanisation, that a letter is delivered the next day.
(d) Write down suitable null and alternative hypotheses for the value of p. Explain why the alternative hypothesis takes the form it does.
(e) Carry out the hypothesis test, at the 5% level of significance, stating your results clearly.
(f) Write down the critical region for the test, giving a reason for your choice. (MEI)

9. It is known that the number of defects in a one-metre length of steel pipe has mean 2.4. It has been suggested that a Poisson distribution would be a reasonable model for the number of defects in a randomly chosen one-metre length of this steel pipe.
(a) State two assumptions that would need to be made for a Poisson distribution to be an appropriate model in this case.
(b) Using this Poisson model, calculate the probability that in a randomly chosen one-metre length of steel pipe there are:
(i) exactly 3 defects,
(ii) more than 3 defects.
(c) Determine the probability that there are exactly 6 defects in a randomly chosen two-metre length of the same type of steel pipe.
(d) It is believed that the manufacturing process may now be producing more defects than before. In a quality control experiment a one-metre length of the steel pipe is chosen and is found to have 7 defects. Test, at the 5% level of significance, the hypothesis that the number of defects in this type of steel pipe has increased. State your hypotheses clearly. (O)

10. (a) The number, X, of breakdowns per day of the lifts in a large block of flats has a Poisson distribution with mean 0.2. Find, to three decimal places, the probability that on a particular day
(i) there will be at least one breakdown,
(ii) there will be at most two breakdowns.
(b) Find, to three decimal places, the probability that, during a 20-day period, there will be no lift breakdowns.
(c) The maintenance contract for the lifts is given to a new company. With this company it is found that there are two breakdowns over a period of 30 days. Perform a significance test at the 5% level to decide whether or not the number of breakdowns has decreased. (L)

11. The number, X, of emergency telephone calls to a gas board office in t minutes at weekends is known to follow a Poisson distribution with mean [...]t. Given that the telephone in that office is unmanned for 10 minutes, calculate, to two significant figures, the probability that there will be at least 2 emergency telephone calls to the office during that time.
Find, to the nearest minute, the length of time that the telephone can be left unmanned for there to be a probability of 0.9 that no emergency telephone call is made to the office during the period the telephone is unmanned.
During a week of very cold weather it was found that there had been 10 emergency telephone calls to the office in the first 12 hours of the weekend. Using tables, or otherwise, determine whether the increase in the average number of emergency telephone calls to that office is significant at the 5% level. (L)
Mixed test 10A (Binomial)

1. The random variable, R, can be modelled by a binomial distribution with parameters n = 10 and p, whose value is unknown.
Find the critical region for the test of
H₀: p = 0.5 against H₁: p ≠ 0.5
at the 10% level of significance. (NEAB)

2. A large college introduced a new procedure to try to ensure that staff arrived on time for the start of lectures. A recent survey by the students had suggested that in 15% of cases the staff arrived late for the start of a lecture. In the first week following the introduction of this new procedure a random sample of 35 lectures was taken and in only one case did the member of staff arrive late.
(a) Stating your hypotheses clearly test, at the 5% level of significance, whether or not there is evidence that the new procedure has been successful.
A student complained that this sample did not give a true picture of the effectiveness of the new procedure.
(b) Explain briefly why the student's claim might be justified and suggest how a more effective check on the new procedure could be made. (L)

3. An enthusiastic gardener claimed that she could never work in the garden at the weekend because 'It always rains on Saturday and Sunday when I'm at home and it's always fine on weekdays when I'm not!' She noted the weather for the next month and recorded that, out of 10 wet days, five were either a Saturday or a Sunday.
The gardener's claim may be modelled by regarding her observation as a single sample from a B(10, p) distribution. Given that one would expect 2 out of every 7 wet days to be either a Saturday or a Sunday, the null hypothesis, p = 2/7, may be tested against the alternative hypothesis, p > 2/7. Carry out a hypothesis test to test her claim at the 10% significance level. (C)

Mixed test 10B (Poisson)

1. The mean number of serious accidents at a motorway interchange is 2.1 per week.
(a) State the probability distribution which may reasonably be used to model the weekly number of serious accidents at this motorway interchange, and give its parameter.
(b) Use an appropriate distribution to determine the probability that the number of serious accidents is:
(i) two or fewer in a randomly selected week;
(ii) exactly one on a randomly selected day.
(c) Given that there were 6 serious accidents during one wet winter week, test, at the 5% level of significance, the hypothesis that the accident rate is higher in wet weather. (O)

2. (a) The number of bacterial colonies that develop in dishes of nutrient exposed to an infected environment has a Poisson distribution with mean 7.5.
(i) Calculate the probability that, in one such dish, the number of bacterial colonies that develop will be greater than 10.
(ii) Calculate the probability that, in two such dishes, the total number of bacterial colonies that develop will be between 10 and 20 inclusive.
(b) Experiments were conducted to determine the effectiveness of an antibiotic spray in reducing the number of bacterial colonies that develop.
In one experiment in which one dish was sprayed, the number of bacterial colonies that developed was 3. Stating suitable null and alternative hypotheses, determine whether or not this result provides significant evidence at the 5% level that the spray is effective. (NEAB)

3. A single observation is taken from a Poisson distribution with mean μ and used to test the hypothesis μ = 6 against the alternative hypothesis μ > 6.
The critical region is chosen to be x ≥ 11.
(a) At what significance level is the test carried out?
(b) Find the probability of making a Type II error if, in fact, μ = 8.5.

Hypothesis testing (z-tests and t-tests)

In this chapter you will
• be reminded about the language of hypothesis (significance) testing introduced in Chapter 10
• be reminded about Type I and Type II errors
• learn how to perform the following hypothesis tests:
Test 1: Testing μ, the mean
  1a: of a normal distribution with known variance, any size sample (z-test)
  1b: of a distribution with known variance, large sample (z-test)
  1c: of a distribution with unknown variance, large sample (z-test)
  1d: of a normal distribution with unknown variance, small sample (t-test)
Test 2: Testing p, the proportion of a binomial distribution, n large (z-test)
Test 3: Testing μ₁ − μ₂, the difference between means of two normal distributions
  3a: when population variances are known (z-test)
  3b: when there is a known common population variance (z-test)
  3c: when the common population variance is unknown,
      - large samples (z-test)
      - small samples (t-test)

Background knowledge
For the z-tests you will need to be familiar with
• the normal distribution and the use of the standard normal tables (see page 362)
• the distribution of the sample mean (see page 436)
• the unbiased estimate for the population variance (see page 447)
• the normal approximation to the binomial distribution (see page 382)
For the t-tests you will need to be familiar with
• the use of the t-distribution tables (see page 463)

HYPOTHESIS TESTING
If you have worked through Chapter 10 you will be familiar with the terminology and
methods used to carry out hypothesis tests relating to discrete distributions. For those new to
the topic, these are described again in the following text, but this time in relation to
continuous distributions. The following example illustrates the hypothesis test for the mean of
a normal distribution.
In the production of ice packs for use in cool boxes, a machine fills packs with liquid and the packs are then frozen. Since space is needed in the packs for the liquid to expand, it is important that they are not over-filled. The volume of liquid in the packs follows a normal distribution with mean 524 ml and standard deviation 3 ml.

The machine breaks down and is repaired. In the next batch of production, there is a suspicion that the mean volume of liquid dispensed by the machine into the packs has increased and is greater than 524 ml. In order to investigate this, the supervisor takes a random sample of 50 packs and finds that the mean volume of liquid in these is 524.9 ml. Does this provide evidence that the machine is over-dispensing?

The mean volume of the sample, 524.9 ml, is higher than the established mean of 524 ml. But is it high enough to say that the mean volume of all the packs filled by the machine has increased? Perhaps the mean is still 524 ml and this higher value has occurred just because of sampling variation. A hypothesis (or significance) test will enable a decision to be made that is backed by statistical theory, not just based on a suspicion.

Let $X$ be the volume, in millilitres, of liquid dispensed into a pack after the machine has been repaired and let the mean of $X$ be $\mu$, where $\mu$ is unknown. Assuming that the standard deviation remains unchanged, $X \sim N(\mu, \sigma^2)$ with $\sigma = 3$.

The hypothesis is made that $\mu$ is 524 ml, i.e. the mean has remained the same as it was prior to the repair. This is known as the null hypothesis, $H_0$, and is written
$H_0: \mu = 524$

Since it is suspected that the mean volume has increased, the alternative hypothesis, $H_1$, is that the mean is greater than 524 ml. This is written
$H_1: \mu > 524$

To carry out the test, the focus moves from $X$, the volume of liquid in a pack, to the distribution of $\bar{X}$, the mean volume of a sample of 50 packs. In this test, $\bar{X}$ is known as the test statistic and its distribution is needed. The distribution of $\bar{X}$ is known as the sampling distribution of means.

In Chapter 9 you saw that if $X \sim N(\mu, \sigma^2)$, then, for samples of size $n$, $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$.

If the null hypothesis is true, then $\mu = 524$ and, with $\sigma = 3$ and $n = 50$, $\bar{X} \sim N\left(524, \frac{9}{50}\right)$.

The sampling distribution of means, therefore, follows a normal distribution with mean 524 ml and variance 0.18 ml$^2$. The standard deviation is $\sqrt{0.18}$ ml.

NOTE: This is sometimes left in the form $\frac{\sigma}{\sqrt{n}} = \frac{3}{\sqrt{50}}$.

The result of the test depends on the whereabouts in the sampling distribution of the test value of 524.9 ml, the mean volume of the sample of 50 packs taken by the supervisor. She would need to find out whether 524.9 is close to 524 or far away from 524.

If it is close to 524 then it is likely to have come from a distribution with mean 524 ml and there would not be enough evidence to say that the mean volume has increased.

If it is far away from 524, i.e. in the right-hand (upper) tail of the distribution, then it is unlikely to have come from a distribution with a mean of 524 ml. The mean is likely to be higher than 524 ml.

Note that the upper tail is being used because the supervisor suspects that there is an increase in $\mu$. This type of test is called a one-tailed (upper tail) test.

A decision needs to be taken about the cut-off point, $c$, known as the critical value, which indicates the boundary of the region where values of $\bar{X}$ would be considered to be too far away from 524 ml and therefore would be unlikely to occur. The region is known as the critical region or rejection region.

The critical value and region are fixed using probabilities linked to the significance level of the test. In general, for an upper tail test at the $\alpha\%$ level, the critical value $c$ is fixed so that $P(\bar{X} > c) = \alpha\%$ and the critical (rejection) region is $\bar{x} > c$.

[Figure: sampling distribution of $\bar{X}$ with the critical (rejection) region $\bar{x} > c$ in the upper tail]

The hypothesis test involves finding whether or not the sample value $\bar{x}$ lies in the critical region of the sampling distribution of means, $\bar{X}$.

For a significance level of $\alpha\%$, if the sample mean lies in the critical (or rejection) region then the result is said to be significant at the $\alpha\%$ level.

Note that if a result is significant at, say, the 1% level, then it is automatically significant at any level greater than 1%, for example 5% or 10%.

Say that the supervisor chooses a significance level of 5%. She will then reject $H_0$ if the test value (the mean volume of the sample of 50 packs) lies in the upper tail 5% of the distribution of sample means.
Since this distribution is normal, instead of finding $c$, the critical $\bar{x}$ value, it is possible to work in standardised values and find the z-value that gives 5% in the upper tail. Using standard normal tables (page 649),
if $P(Z > z) = 0.05$
then $P(Z < z) = 1 - 0.05 = 0.95$
i.e. $\Phi(z) = 0.95$
$z = \Phi^{-1}(0.95) = 1.645$

[Figure: $Z \sim N(0, 1)$ with the boundary for the upper 5% at $z = 1.645$]

So z-values that are greater than 1.645 lie in the upper tail 5% of the distribution.

This enables a statement to be made, known as the rejection criterion, which tells you when to reject the null hypothesis:
Reject $H_0$ if $z > 1.645$, where $z$ is the standardised value of the mean of the sample of 50 packs, i.e. $z = \frac{\bar{x} - 524}{3/\sqrt{50}}$.

Note that to avoid being influenced by sample readings, it is important that the rejection criterion is decided upon before any sample values are taken.

When the sample was taken, it was found that $\bar{x} = 524.9$,
so $z = \frac{524.9 - 524}{3/\sqrt{50}} = 2.12$ (2 d.p.)

[Figure: test value $z = 2.12$ lying in the critical region beyond $z = 1.645$]

The result of the test is now stated in statistical terms and then related to the context of the test, as follows:
Since $z > 1.645$, $H_0$ is rejected in favour of $H_1$. The supervisor would conclude that the mean volume of liquid being dispensed by the machine is not 524 ml, but has increased. She would be wise therefore to stop production so that the setting on the machine could be adjusted.

Note that the critical $\bar{x}$-value, $c$, can be found by de-standardising the critical z-value of 1.645, where
$\frac{c - 524}{3/\sqrt{50}} = 1.645$
$c = 524 + 1.645 \times \frac{3}{\sqrt{50}} = 524.7$

Since the test value of 524.9 is greater than 524.7, it lies in the critical region, confirming the result obtained above.

If you want even more information, you can find out exactly where the sample mean lies in the distribution of $\bar{X}$. Note that this is the method used in Chapter 10 for discrete variables.

$\bar{X} \sim N\left(524, \frac{9}{50}\right)$
so $P(\bar{X} > 524.9) = P\left(Z > \frac{524.9 - 524}{3/\sqrt{50}}\right)$
$= P(Z > 2.1213\ldots)$
$= 1 - 0.9831$
$= 0.0169$
$\approx 1.7\%$

This probability is less than 5%, implying that the boundary for the critical region must lie to the left of the sample value of 524.9 and confirming that 524.9 lies in the critical region. This method also tells you that the test value of 524.9 will lie in the critical region for any level of significance above 1.7%.

This probability method can be used, if preferred, in the hypothesis test to find whether the sample value lies in the rejection region. In this example, for a 5% level of significance, the rejection criterion would be to reject $H_0$ if $P(\bar{X} > \bar{x}) < 0.05$, where $\bar{x}$ is the sample mean.

ONE-TAILED AND TWO-TAILED TESTS

Say that the null hypothesis is $\mu = \mu_0$.

In a one-tailed test, the alternative hypothesis $H_1$ looks for an increase or a decrease in $\mu$:
- for an increase, $H_1$ is $\mu > \mu_0$ and the critical region is in the upper tail,
- for a decrease, $H_1$ is $\mu < \mu_0$ and the critical region is in the lower tail.

In a two-tailed test, the alternative hypothesis $H_1$ looks for a change in $\mu$ without specifying whether it is an increase or a decrease and $H_1$ is $\mu \ne \mu_0$. The critical region is in two parts, one in each tail.
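Returning to the ice-pack example, the three equivalent checks described in this section (standardised test value, critical value $c$, and tail probability) can be reproduced numerically. The following is only a sketch: the function name `upper_tail_z_test` and its dictionary keys are made up for illustration, and it uses `statistics.NormalDist` from the Python standard library in place of the printed tables, so the last decimal place may differ slightly from table values.

```python
from math import sqrt
from statistics import NormalDist


def upper_tail_z_test(sample_mean, mu0, sigma, n, level):
    """One-tailed (upper tail) z-test for a mean when sigma is known.

    Hypothetical helper for illustration; 'level' is a proportion, e.g. 0.05.
    """
    std_error = sigma / sqrt(n)                    # sd of the sampling distribution
    z = (sample_mean - mu0) / std_error            # standardised test value
    z_crit = NormalDist().inv_cdf(1 - level)       # boundary of the upper tail
    c = mu0 + z_crit * std_error                   # de-standardised critical value
    p_value = 1 - NormalDist().cdf(z)              # P(sample mean > observed) under H0
    return {"z": z, "z_crit": z_crit, "c": c, "p_value": p_value,
            "reject": z > z_crit}


# Ice packs: H0: mu = 524, H1: mu > 524, sigma = 3, n = 50, 5% level
result = upper_tail_z_test(524.9, 524, 3, 50, 0.05)
for name, value in result.items():
    print(name, value)
```

All three criteria give the same decision here, because $z > z_{crit}$, $\bar{x} > c$ and $p < 0.05$ are equivalent statements for an upper tail test.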
[Figure: one-tailed tests at the 1% level. Upper tail: critical value $z = 2.326$. Lower tail: critical value $z = -2.326$.]

For a two-tailed test, at the 1% level: the 1% in the tails is split evenly between the upper and lower tails with 0.5% in each. There are two critical values.
To find the upper tail value, you need to find $z$ such that $\Phi(z) = 0.995$, so look up $p = 0.995$.
From the table $z = 2.576$.
So the upper tail critical value is 2.576 and the lower tail value (by symmetry) is $-2.576$.

Critical values: $\pm 2.576$, with 99% of the distribution between them and 0.5% in each tail.

STAGES IN THE HYPOTHESIS TEST

When carrying out a hypothesis test, it is useful to work through the following stages. This is the same procedure as in the tests for parameters of discrete distributions described in Chapter 10.

1. State the variable being considered.
2. State the null hypothesis $H_0$ and the alternative hypothesis $H_1$.
   Remember that if you are looking for
   - a decrease, then $H_1: \ldots < \ldots$ (one-tailed test, lower tail)
   - an increase, then $H_1: \ldots > \ldots$ (one-tailed test, upper tail)
   - a change, then $H_1: \ldots \ne \ldots$ (two-tailed test, upper and lower tails)
3. Consider the distribution of the test statistic, assuming that the null hypothesis is true. If you are testing a sample mean, then the test statistic is $\bar{X}$, and the sampling distribution of means is considered.
4. State the type of test (i.e. whether it is one-tailed or two-tailed) and decide the significance level.
5. Decide on the rejection criterion.
6. Perform any calculations necessary to find out whether the test value is in the critical region.
7. Make your conclusion in statistical terms:
   - If the test value is in the critical region, reject $H_0$ in favour of $H_1$.
   - If the test value is not in the critical region, do not reject $H_0$.
   Then relate your conclusion to the situation being tested.

There are several hypothesis tests involving continuous distributions and some of these are illustrated in the following text.

HYPOTHESIS TEST 1: TESTING $\mu$ (THE MEAN OF A POPULATION)

Consider a population $X$ with unknown mean $\mu$ and variance $\sigma^2$.
A value for $\mu$, call it $\mu_0$, is specified in the hypotheses, for example
$H_0: \mu = \mu_0$
$H_1: \mu < \mu_0$ (or $\mu > \mu_0$ or $\mu \ne \mu_0$)

To test the hypotheses, take a sample of size $n$ from the population and calculate the sample mean, $\bar{x}$. The test statistic is $\bar{X}$, and the sampling distribution of means is considered.
There are now several cases that may occur, depending on whether the population is normal or not, whether the sample size is large or small and whether the population variance is known or not.

Example 11.1
Does this provide evidence, at the 5% significance level, that trainees from this county did not perform as well as expected?

Solution 11.1
The stages of the hypothesis test are shown in the margin and additional comments are given in italics.

Let $X$ be the mark of a trainee from the particular county and let the population mean mark be $\mu$.
Assuming that the standard deviation has not changed, then $X \sim N(\mu, \sigma^2)$ with $\sigma = 6$.

$H_0: \mu = 70$ (The trainees have performed as expected)
$H_1: \mu < 70$ (The trainees have not performed as well as expected)

The test is carried out based on the value of the sample mean, $\bar{x}$. The test statistic is $\bar{X}$ and you need to consider the sampling distribution of means.
For samples of size $n$, $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$ with $\sigma = 6$ and $n = 25$.
You now use the value of $\mu$ given by the null hypothesis.
If $H_0$ is true, then $\mu = 70$, so $\bar{X} \sim N\left(70, \frac{36}{25}\right)$, i.e. $\bar{X} \sim N(70, 1.44)$.
The critical value, $c$, for the sample mean is found by de-standardising the critical z-value of $-1.645$:
$\frac{c - 70}{6/\sqrt{25}} = -1.645$
$c = 70 - 1.645 \times \frac{6}{\sqrt{25}}$
$c = 68.026$

[Figure: sampling distribution of $\bar{X}$ with the 5% lower tail below $c = 68.026$; the test value 67.3 lies in this tail]

So the critical region is $\bar{x} < 68.026$. This means that any test value less than 68.026 would result in the null hypothesis being rejected.

NOTE 2:
If you prefer to use the probability method to decide whether $\bar{x}$ lies in the critical region, then, since the significance level is 5%, the rejection criterion would be to reject $H_0$ if $P(\bar{X} < 67.3) < 0.05$.
Now
$P(\bar{X} < 67.3) = P\left(Z < \frac{67.3 - 70}{6/\sqrt{25}}\right)$
$= P(Z < -2.25)$
$= 0.0122$
$\approx 1.2\%$
Since $P(\bar{X} < 67.3) < 0.05$, reject $H_0$ (as before).
This method also tells you that $H_0$ would be rejected at any significance level above 1.2%.

Example 11.2
A sample of size 16 is taken from the distribution of $X \sim N(\mu, 3^2)$ and a hypothesis test is carried out at the 1% level of significance. On the basis of the value of the sample mean $\bar{x}$, the null hypothesis $\mu = 100$ is rejected in favour of the alternative hypothesis $\mu > 100$.
What can be said about the value of $\bar{x}$?

Solution 11.2
In this question you are being asked to find the critical region in terms of $\bar{x}$.
Since the null hypothesis is rejected, the sample mean, $\bar{x}$, must lie in the critical region, i.e. $\bar{x}$ must be greater than the critical value, $c$.
Working first in standardised values, the critical z-value that gives 1% in the upper tail is $z = 2.326$ (see page 649).
De-standardising to give $c$:
$\frac{c - 100}{3/\sqrt{16}} = 2.326$
$c = 100 + 2.326 \times \frac{3}{\sqrt{16}}$
$c = 101.74$ (2 d.p.)
So the critical region is $\bar{x} > 101.74$.
Since the null hypothesis is rejected, the sample mean, $\bar{x}$, is greater than 101.74.

Test 1b: Testing $\mu$ when the population $X$ is not normal, the variance $\sigma^2$ is known and the sample size $n$ is large

Since the population is not normal, you cannot say that the distribution of $\bar{X}$ is normal for all sample sizes. If the sample size $n$ is large, however, you can apply the central limit theorem (see page 442). This states that for large samples taken from a non-normal population, the sampling distribution of means $\bar{X}$ is approximately normal, whatever the distribution of the parent population.

When testing the mean of a non-normal population $X$ with known variance $\sigma^2$, using a large sample of size $n$,
the test statistic is $\bar{X}$, where $\bar{X}$ is approximately $N\left(\mu, \frac{\sigma^2}{n}\right)$.
In standardised form, the test statistic is $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ where $Z \sim N(0, 1)$.
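The critical values and the tail probability in Solutions 11.1 and 11.2 can be checked with a few lines of code. This is a sketch only: the helper name `critical_mean` is hypothetical, and `statistics.NormalDist` stands in for the printed normal tables. Note that for a one-tailed test at the 1% level the inverse normal gives a critical z-value of about 2.326; the value 2.576 belongs to a two-tailed test at the 1% level.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()  # Z ~ N(0, 1)


def critical_mean(mu0, sigma, n, level, tail):
    """Critical value c for the sample mean in a one-tailed z-test.

    Hypothetical helper; tail is "upper" or "lower", level is a proportion.
    """
    z_crit = std_normal.inv_cdf(1 - level)         # positive boundary value
    sign = 1 if tail == "upper" else -1
    return mu0 + sign * z_crit * sigma / sqrt(n)


# Solution 11.1: H0: mu = 70, H1: mu < 70, sigma = 6, n = 25, 5% level
c_lower = critical_mean(70, 6, 25, 0.05, "lower")
z_11_1 = (67.3 - 70) / (6 / sqrt(25))
p_lower = std_normal.cdf(z_11_1)                   # P(sample mean < 67.3) under H0

# Solution 11.2: H0: mu = 100, H1: mu > 100, sigma = 3, n = 16, 1% level
c_upper = critical_mean(100, 3, 16, 0.01, "upper")

print(round(c_lower, 3), round(z_11_1, 2), round(p_lower, 4), round(c_upper, 2))
```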
(c) Perform the hypothesis test as follows:

1. Define the variable.
Let $X$ be the lifetime, in hours, of a light bulb.
Let the population mean be $\mu$ and the population standard deviation be $\sigma$.

2. State $H_0$ and $H_1$.
$H_0: \mu = 1000$ (The statement on the packaging is correct)
$H_1: \mu < 1000$ (The statement on the packaging is overestimating the length of life)

3. State the distribution of the test statistic according to $H_0$.
For samples of size $n$, where $n$ is large, by the central limit theorem and using $\hat{\sigma}$ for $\sigma$,
$\bar{X} \sim N\left(\mu, \frac{\hat{\sigma}^2}{n}\right)$ with $n = 64$ and $\hat{\sigma} = 7.055$.
If $H_0$ is true, $\mu = 1000$, so $\bar{X} \sim N\left(1000, \frac{7.055^2}{64}\right)$.

6. Perform the required calculation.
From the sample, $\bar{x} = 998.6$.
$z = \frac{998.6 - 1000}{7.055/\sqrt{64}} = -1.587\ldots$

[Figure: standard normal curve with the 10% lower tail critical region below $z = -1.282$; test value $= -1.587$]

7. Make your conclusion.
Since $z < -1.282$, reject $H_0$ (the mean is 1000 hours) in favour of $H_1$ (the mean is less than 1000 hours).
There is evidence, at the 10% level, that the statement on the packaging overestimates the length of life of this type of bulb.

TYPE I AND TYPE II ERRORS

When you make your decision about whether or not to reject $H_0$ there are two types of error that could be made. These were described in Chapter 10 (page 493) and are called Type I and Type II errors:
- a Type I error is made when you wrongly reject a true hypothesis,
- a Type II error is made when you wrongly accept a false hypothesis.

These can be summarised in a table:

                      Test decision
                      Accept H0          Reject H0
H0 is true            correct decision   Type I error
H0 is false           Type II error      correct decision

A Type I error is made if $H_0$ is rejected when $H_0$ is true.
This is written $P(\text{Type I error}) = P(H_0 \text{ is rejected} \mid H_0 \text{ is true})$, using the conditional probability notation introduced in Chapter 3.
If the significance level of the test is $\alpha\%$, then the probability of making a Type I error is also $\alpha\%$: the significance level of the test and the probability of making a Type I error are both the same.

A Type II error is made if $H_0$ is accepted when $H_0$ is false.
This is written $P(\text{Type II error}) = P(H_0 \text{ is accepted} \mid H_1 \text{ is true})$.
To calculate the probability of a Type II error, a particular value must be specified in the alternative hypothesis $H_1$.

Example 11.5
A random variable has a normal distribution with mean $\mu$ and standard deviation 3.
The null hypothesis $\mu = 20$ is to be tested against the alternative hypothesis $\mu > 20$ using a random sample of size 25. It is decided that the null hypothesis will be rejected if the sample mean is greater than 21.4.
(a) Calculate the probability of making a Type I error.
(b) Calculate the probability of making a Type II error, when in fact $\mu = 21$.

Solution 11.5
(a) You are given that $X \sim N(\mu, 3^2)$
and $H_0: \mu = 20$
$H_1: \mu > 20$
For samples of size 25, $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$ with $\sigma = 3$ and $n = 25$.
If the null hypothesis is true, $\mu = 20$ and $\bar{X} \sim N\left(20, \frac{9}{25}\right)$.

$P(\text{Type I error}) = P(H_0 \text{ is rejected when } H_0 \text{ is true})$
$= P(\bar{X} > 21.4 \text{ when } \mu = 20)$
$= P\left(Z > \frac{21.4 - 20}{3/\sqrt{25}}\right)$
$= P(Z > 2.333)$
$= 1 - \Phi(2.333)$
$= 1 - 0.9902$
$= 0.0098 \approx 1\%$

[Figure: distribution of $\bar{X}$ when $\mu = 20$, with 1% in the upper tail beyond 21.4]
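Both error probabilities asked for in Example 11.5 come from the same calculation, the probability that the sample mean exceeds the cut-off 21.4, evaluated under two different values of $\mu$. The sketch below illustrates this; the function name `prob_reject` is an assumption for illustration, and the Type II figure it produces is the sketch's own computation for $\mu = 21$, not a value quoted from the text.

```python
from math import sqrt
from statistics import NormalDist

SIGMA, N, CUTOFF = 3, 25, 21.4   # values given in Example 11.5


def prob_reject(mu):
    """P(sample mean > CUTOFF) when the population mean is mu."""
    sampling_dist = NormalDist(mu, SIGMA / sqrt(N))   # distribution of the sample mean
    return 1 - sampling_dist.cdf(CUTOFF)


# Type I error: H0 rejected although mu really is 20
type_1 = prob_reject(20)

# Type II error: H0 accepted (sample mean <= 21.4) although mu is really 21
type_2 = 1 - prob_reject(21)

print(f"P(Type I error)  = {type_1:.4f}")
print(f"P(Type II error) = {type_2:.4f}")
```

Moving the cut-off to the right would reduce the Type I probability but increase the Type II probability, which is why both are reported together.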
mean $\mu$ g and standard deviation 15 g. A test of the null hypothesis $\mu = 375$ against the alternative hypothesis $\mu > 375$ is carried out at the 2½% significance level using a random sample of 16 boxes.
(a) Show that the alternative hypothesis is accepted when $\bar{X} > 382.35$, where $\bar{X}$ g is the sample mean mass.
(c) Find the range of values of $\mu$ for which the probability of making a Type II error is less than 0.025.
(d) The test is carried out, independently, on two different occasions. Find the probability that at least one Type I error is made. (C)

Test 1d: Testing the mean $\mu$ when the population $X$ is normal but the variance $\sigma^2$ is unknown and the sample size $n$ is small

In this case, the population is normal, so $X \sim N(\mu, \sigma^2)$. Since $\sigma^2$ is unknown, $\hat{\sigma}^2$ is used instead (as in Test 1c on page 519).

Consider the distribution of the sample mean $\bar{X}$. When the sample size is small and $\hat{\sigma}$ is used in place of $\sigma$, the standardised sample mean does not follow a normal distribution. As you saw in Chapter 9 (page 462), the standardised statistic is called $T$ and it follows a t-distribution with $(n - 1)$ degrees of freedom.

When testing the mean of a normal population with unknown variance, using a small sample of size $n$,
the test statistic is $T$ where $T = \frac{\bar{X} - \mu}{\hat{\sigma}/\sqrt{n}}$ and $T \sim t(n - 1)$.

When finding the critical t-values, t-distribution tables are needed and these are printed on page 650. You may need to remind yourself how to use them by reading again the notes on page 464.

$\hat{\sigma}^2 = \frac{n}{n-1}\left(\frac{\sum x^2}{n} - \bar{x}^2\right) = \frac{5}{4}\left(\frac{11.5538}{5} - 1.52^2\right) = 0.000\,45$
so $\hat{\sigma} = 0.0212$ (3 s.f.)

(c) $H_0: \mu = 1.50$ (the wire is pure silver)
$H_1: \mu > 1.50$ (the wire is impure)

If $H_0$ is true, $\mu = 1.50$. Since $n$ is small and $\sigma^2$ is unknown, the test statistic is $T$
where $T = \frac{\bar{X} - 1.50}{\hat{\sigma}/\sqrt{n}}$ and $T \sim t(n - 1)$,
i.e. $T = \frac{\bar{X} - 1.50}{0.0212/\sqrt{5}}$ and $T \sim t(4)$.

Use a one-tailed test (upper tail) at the 5% level.
From the tables on page 650 the critical value for $t$ is found from row $\nu = 4$, $p = 0.95$ giving 2.132.

p        0.75    0.90    0.95    0.975
ν = 1    1.000   3.078   6.314   12.71
    2    0.816   1.886   2.920   4.303
    3    0.765   1.638   2.353   3.182
    4    0.741   1.533   2.132   2.776

Reject $H_0$ if the test value of $t$ is greater than 2.132.

From the sample, $\bar{x} = 1.52$.
$t = \frac{1.52 - 1.50}{0.0212/\sqrt{5}}$
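The variance estimate and the t test value for the silver wire sample can be evaluated as follows. This is a minimal sketch: the function names are hypothetical, $\sum x = 7.6$ is an assumption inferred from $\bar{x} = 1.52$ with $n = 5$, and the critical value 2.132 is simply the table entry quoted above (the Python standard library has no t-distribution, so it is not recomputed).

```python
from math import sqrt


def unbiased_variance(n, sum_x, sum_x2):
    """Unbiased estimate of the population variance from summary totals."""
    mean = sum_x / n
    return n / (n - 1) * (sum_x2 / n - mean ** 2)


def t_statistic(sample_mean, mu0, sigma_hat, n):
    """Test value t = (sample mean - mu0) / (sigma_hat / sqrt(n))."""
    return (sample_mean - mu0) / (sigma_hat / sqrt(n))


n = 5
sum_x = 7.6            # assumption: n times the stated sample mean 1.52
sum_x2 = 11.5538       # as quoted in the variance calculation

var_hat = unbiased_variance(n, sum_x, sum_x2)
sigma_hat = sqrt(var_hat)
t = t_statistic(sum_x / n, 1.50, sigma_hat, n)

T_CRIT = 2.132         # table value for nu = 4, p = 0.95
print(f"variance estimate = {var_hat:.5f}, sigma_hat = {sigma_hat:.4f}")
print(f"t = {t:.3f}, in critical region: {t > T_CRIT}")
```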
$T = \frac{\bar{X} - 0.05}{\hat{\sigma}/\sqrt{n}}$ and $T \sim t(n - 1)$,
i.e. $T = \frac{\bar{X} - 0.05}{0.00214/\sqrt{8}}$ and $T \sim t(7)$.

4. State the level of the test.
Use a two-tailed test at the 1% level.

5. Decide on the rejection criterion.
The critical value for $t$ is found from row $\nu = 7$, $p = 0.995$ (because you want 0.5% in each tail) giving $\pm 3.499$.

p        0.75    0.90    0.95    0.975   0.99    0.995
ν = 1    1.000   3.078   6.314   12.71   31.82   63.66
    2    0.816   1.886   2.920   4.303   6.965   9.925
    ...
    7    0.711   1.415   1.895   2.365   2.998   3.499

Reject $H_0$ if $t < -3.499$ or $t > 3.499$, i.e. if $|t| > 3.499$.

6. Perform the required calculation.
From the sample, $\bar{x} = 0.047$.
$t = \frac{0.047 - 0.05}{0.00214/\sqrt{8}} = -3.96\ldots$

[Figure: $T \sim t(7)$ with 0.5% in each tail beyond $\pm 3.499$]

7. Make your conclusion.
Since $|t| > 3.499$, $H_0$ is rejected.
There is evidence, at the 1% level, that the output from the machine is different from that expected.

(b) 17   605.2    23016.92   $H_0: \mu = 40$, $H_1: \mu \ne 40$            5%
(c) 6    9034.8   50.8       $H_0: \mu = 1503$, $H_1: \mu \ne 1503$        1.0%
(d) 10   1298     97.6       $H_0: \mu = 133.0$, $H_1: \mu < 133.0$        1%

2. An athlete finds that her times for running a race are normally distributed with mean 10.6 seconds. She trains intensively for a week and then records her time in the next 6 races. Her times, in seconds, are
10.70, 10.65, 10.75, 10.80, 10.60
Is there evidence, at the 5% level, that training intensively has improved her times?

3. Family packs of bacon slices are sold in 1.5 kg packs. A sample of 12 packs was selected at random and their masses, measured in kilograms, noted. The following results were obtained:
$\sum x = 17.81$, $\sum x^2 = 26.4357$
Assuming that the masses of packs follow a normal distribution with variance $\sigma^2$, test at the 1% level whether the packs are underweight,
(a) if $\sigma^2$ is unknown, (b) if $\sigma^2 = 0.0003$.

4. It is thought that a normal population has mean 1.6. A random sample of 10 observations gives a mean of 1.49 and standard deviation of 0.3. Does this provide evidence, at the 5% level, that the population mean is less than 1.6?

7. A marmalade manufacturer produces thousands of jars of marmalade each week. The mass of marmalade in a jar is an observation from a normal distribution having mean of 455 g and standard deviation 0.8 g.
Following a slight adjustment to the filling machine, a random sample of 10 jars is found to contain the following masses, in grams, of marmalade:
454.8, 453.8, 455.0, 454.4, 455.4,
454.4, 454.4, 455.0, 455.0, 453.6
(a) Assuming that the variance of the distribution is unaltered by the adjustment, test at the 5% significance level the hypothesis that there has been no change in the mean of the distribution.
(b) Assuming that the variance of the distribution may have been altered, obtain an unbiased estimate of the new variance and, using this estimate, test at the 5% level of significance the hypothesis that there has been no change in the mean of this distribution. (C)

8. Six observations of a continuous random variable $X$ gave the following values:
120.3, 122.4, 119.8, 121.0, 122.5, 119.6
State any conditions that are necessary for the valid use of a t-test to test a hypothesis about the mean of $X$.
Assuming that the use of the t-test is valid, test the null hypothesis that the mean of $X$ is 120 against the alternative hypothesis that the mean is not 120, using a 5% significance level.
such that $np > 5$ and $nq > 5$, $X$ is approximately normal and $X \sim N(np, npq)$.
In standardised form, the test statistic is $Z = \frac{X - np}{\sqrt{npq}}$ where $Z \sim N(0, 1)$.

Example 11.8
Caroline was asked to test whether a coin is biased in favour of heads, using a 5% level of significance. She tossed the coin 100 times and obtained 57 heads. What should she have concluded?

The value standardised is 56.5 (57 heads with the continuity correction applied).
Since $z < 1.645$, the sample value of 57 heads is not in the critical region and $H_0$ is not rejected.
On the statistical evidence, Caroline should have concluded that the coin is not biased in favour of heads.

It is interesting to work out how many heads would need to be obtained to conclude that the coin is biased in favour of heads. This can be done as follows:
The standardised test value lies in the critical region if $z > 1.645$.
If the number of heads is $x$, then, applying the continuity correction, you need to consider $x - 0.5$ when standardising the test value.

Reject $H_0$ if the standardised value of 124.5 is less than $-2.326$.
$z = \frac{124.5 - np}{\sqrt{npq}} = \frac{124.5 - 135}{\sqrt{13.5}} = -2.857\ldots$

[Figure: 1% lower tail critical region below $z = -2.326$; test value 124.5 standardises to $-2.857$]

Since $z < -2.326$, the sample value is in the critical region and so $H_0$ (the germination rate is 90%) is rejected in favour of $H_1$ (the germination rate is less than 90%).
There is evidence that the manufacturer is overstating the germination rate.

Applying the continuity correction, use $x + 0.5$ as the test value:
$\frac{x + 0.5 - 80}{\sqrt{48}} < -1.645$
$x < 80 - 0.5 - 1.645 \times \sqrt{48}$
$x < 68.10\ldots$
Since $x$ is an integer, the critical region is $x \le 68$.

Check:
When $x = 68$, $z = \frac{68.5 - 80}{\sqrt{48}} = -1.659\ldots < -1.645$, so 68 is in the critical region.
When $x = 69$, $z = \frac{69.5 - 80}{\sqrt{48}} = -1.515\ldots > -1.645$, so 69 is not in the critical region.
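The continuity-corrected calculations above follow one pattern, which can be sketched in code. The helper name `z_corrected` is hypothetical, and the values $n = 100$, $p = 0.5$ for the coin and $n = 150$, $p = 0.9$ for the seeds are assumptions read off from the worked figures ($np = 135$, $npq = 13.5$); $n = 200$, $p = 0.4$ corresponds to $np = 80$, $npq = 48$.

```python
from math import sqrt
from statistics import NormalDist


def z_corrected(x, n, p, tail):
    """Standardise a binomial count using the normal approximation.

    Hypothetical helper. For an upper tail test the continuity correction
    uses x - 0.5; for a lower tail test it uses x + 0.5.
    """
    shift = -0.5 if tail == "upper" else 0.5
    return (x + shift - n * p) / sqrt(n * p * (1 - p))


z_crit_5 = NormalDist().inv_cdf(0.95)          # about 1.645

# Coin: 57 heads in 100 tosses, H1: p > 0.5, 5% level
z_coin = z_corrected(57, 100, 0.5, "upper")

# Seeds: test value 124.5 with np = 135, npq = 13.5, lower tail
z_seeds = z_corrected(124, 150, 0.9, "lower")

# Largest count in the 5% lower tail critical region when np = 80, npq = 48
largest = max(x for x in range(201)
              if z_corrected(x, 200, 0.4, "lower") < -z_crit_5)

print(round(z_coin, 3), round(z_seeds, 3), largest)
```

Searching over integer counts, as in the last step, avoids rounding the bound 68.10 by hand and confirms that 68 is the largest value in the critical region.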
(b) If $p = 0.3$, the hypotheses become
$H_0: p = 0.4$
$H_1: p = 0.3$

From part (a), the critical region is $X \le 68$, so $H_0$ is accepted when $X > 68$.

$P(\text{Type II error}) = P(H_0 \text{ is accepted when } H_1 \text{ is true})$
$= P(X > 68 \text{ when } p = 0.3)$

When $p = 0.3$, $np = 200 \times 0.3 = 60$ and $npq = 200 \times 0.3 \times 0.7 = 42$
Therefore $X \sim N(60, 42)$.
Note that the conditions $np > 5$, $nq > 5$ are satisfied so the normal approximation can be applied.

Now $P(X > 68) \rightarrow P(X > 68.5)$ (continuity correction),
$P(\text{Type II error}) = P(X > 68.5 \text{ when } X \sim N(60, 42))$
$= P\left(Z > \frac{68.5 - 60}{\sqrt{42}}\right)$
$= P(Z > 1.312)$
$= 1 - 0.9052$
$= 0.0948 \approx 9.5\%$

[Figure: distribution given by $H_1$, $X \sim N(60, 42)$, with the region $X > 68.5$ ($z > 1.312$) shaded]

Exercise 11c Testing a binomial proportion (large n)

1. In the following, $X \sim B(n, p)$ with $n$ as shown. $p$ is unknown and $x$ is the number of successes in the sample.
Test the hypotheses stated at the level of significance indicated.

      n     x     Hypotheses                               Level
(a)   50    45    $H_0: p = 0.8$, $H_1: p > 0.8$           5%
(b)   60    42    $H_0: p = 0.55$, $H_1: p \ne 0.55$       2%
(c)   120   21    $H_0: p = 0.25$, $H_1: p \ne 0.25$       5%
(d)   300   213   $H_0: p = 0.65$, $H_1: p \ne 0.65$       1%
(e)   90    56    $H_0: p = 0.76$, $H_1: p < 0.76$         1%

2. A manufacturer claims that 8 out of 10 dogs prefer its brand of dog food to any other. In a random sample of 120 dogs, it was found that 85 appeared to prefer that brand. Test, at the 5% level, whether you would accept the manufacturer's claim.

3. In a survey it was found that 3 out of every 10 people supported a particular political party. A month later a party representative claimed that the popularity of the party had increased. Would you accept that the number who supported the party was still 3 out of 10 if a further survey revealed that in a random sample of 100 people, 38 supported the party? Test at the 3% level.

4. A large college claims that it admits equal numbers of men and women. In a random sample of 500 students at the college there were 267 males. Is this evidence, at the 5% level, that the college population is not evenly divided between males and females?

5. A theory predicts that the probability of an event is 0.4. The theory is tested experimentally and in 400 independent trials, the event occurred 140 times. Is the number of occurrences significantly less than that predicted by the theory? Test at the 1% level.

6. It is thought that the proportion of defective items produced by a particular machine is 0.1. A random sample of 100 items is inspected and found to contain 15 defective items. Does this provide evidence, at the 5% level, that the machine is producing more defective items than expected?

7. In an investigation into the ownership of mobile phones amongst school children, 200 randomly chosen school children were interviewed and 142 owned a mobile phone. Test, at the 5% level of significance, the hypothesis that 65% of school children own a mobile phone against the alternative hypothesis that more than 65% own a mobile phone.

8. (a) A gardener sows 150 Special cabbage seeds and knows that the germination rate is 75%. By using a suitable approximation find the probability that:
(i) more than 122 seeds germinate
(ii) fewer than 106 seeds germinate
(b) The gardener also sows 120 Everyday cabbage seeds and finds that 81 germinate. Test whether the Everyday seeds have a germination rate less than 75%. Perform a significance test at the 4% level.

9. A government report states that a third of teenagers in Great Britain belong to a youth organisation. A survey, conducted among a random sample of 1000 teenagers from a certain city, revealed that 370 belonged to a youth organisation. Does this provide evidence, at the 2% level, that the proportion of teenagers in this city who belong to a youth organisation is greater than the national average?

10. A questionnaire was sent to a large number of people, asking for their opinions about a proposal to alter an examination syllabus. Of the 180 replies received, 134 were in favour of the proposal. Stating any necessary assumption,
(a) test, at the 5% level, the hypothesis that the population proportion in favour of the proposal is 0.7 against the alternative that it is more than 0.7,
(b) find a symmetric 95% confidence interval for the population proportion in favour of the proposal. (C)

11. Over a long period of time it has been found that in Enrico's restaurant the ratio of non-vegetarian to vegetarian meals ordered is 3 to 1.
During one particular day at Enrico's restaurant, a random sample of 20 people contained two who ordered a vegetarian meal.
(a) Carry out a significance test to determine whether or not the proportion of vegetarian meals ordered that day is lower than usual. State clearly your hypotheses and use a 10% significance level. Use an exact binomial test.
In Manuel's restaurant, of a random sample of 100 people ordering meals, 31 ordered vegetarian meals.
(b) Set up null and alternative hypotheses and, using a suitable approximation, test whether or not the proportion of people eating vegetarian meals at Manuel's restaurant is different from that at Enrico's restaurant. Use a 5% level of significance. (L)

12. When a drawing pin is dropped on to the floor, the probability that it lands point up is $p$.
(a) A teacher drops a drawing pin 900 times and observes that it lands point up 315 times. Test, at the 1% level, the hypothesis that $p = 0.4$ against the alternative hypothesis $p < 0.4$.
(b) A student drops a drawing pin 600 times and observes that it lands point up 251 times. Using the student's results, find a symmetric 95% confidence interval for $p$.
As part of a statistics investigation, 1500 students carry out similar experiments and they each calculate (correctly) their own symmetric 95% confidence interval for $p$. Find the expected number of these intervals that do not contain the true value of $p$. (C)

13. After carrying out a survey, a market research company asserted that 75% of TV viewers watched a certain programme. Another company interviewed 75 viewers and found that 51 had watched the programme and 24 had not. Does this provide evidence, at the 5% significance level, that the first company's figure of 75% was incorrect?

14. The Paper Engineering Company has traditionally supplied 85% of the retail outlets for origami products. With the onset of increased competition they feared that this proportion might have fallen. They examined a random sample of 500 retail outlets and found that 405 of them sold Paper Engineering Company products. Use a normal approximation to the binomial distribution to carry out a hypothesis test at the 1% significance level to test whether or not their proportion of the retail outlets has fallen. Give suitable null and alternative hypotheses and state your conclusion clearly. (C)
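HYPOTHESIS TEST 3: TESTING $\mu_1 - \mu_2$, THE DIFFERENCE BETWEEN MEANS OF TWO NORMAL POPULATIONS

This test is used when you have two normal populations $X_1$ and $X_2$ with unknown means, $\mu_1$ and $\mu_2$, and you want to test the difference between the means of these populations. Consider
$X_1 \sim N(\mu_1, \sigma_1^2)$ and $X_2 \sim N(\mu_2, \sigma_2^2)$.
The hypotheses might be:
$H_0: \mu_1 - \mu_2 = \ldots$
$H_1: \mu_1 - \mu_2 > \ldots$ (or $\mu_1 - \mu_2 < \ldots$ or $\mu_1 - \mu_2 \ne \ldots$)
Often the test involves the null hypothesis that the means are the same, i.e. $\mu_1 = \mu_2$ or $\mu_1 - \mu_2 = 0$, so the null hypothesis would be $H_0: \mu_1 - \mu_2 = 0$.

To test the difference between the means, take a random sample of size $n_1$ from $X_1$ and work out its sample mean, $\bar{x}_1$. Also take a random sample of size $n_2$ from $X_2$ and work out its sample mean, $\bar{x}_2$.
The test statistic is $\bar{X}_1 - \bar{X}_2$, and you need to consider the sampling distribution of the difference between means. The mean and variance of this distribution can be found as follows:
$E(\bar{X}_1 - \bar{X}_2) = \mu_1 - \mu_2$
$\text{Var}(\bar{X}_1 - \bar{X}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$ (for independent samples)
Since the populations are normal, the distribution of $\bar{X}_1 - \bar{X}_2$ is also normal.

Test 3a: The population variances $\sigma_1^2$ and $\sigma_2^2$ are known

If the variances $\sigma_1^2$ and $\sigma_2^2$ are known,
the test statistic is $\bar{X}_1 - \bar{X}_2$ where $\bar{X}_1 - \bar{X}_2 \sim N\left(\mu_1 - \mu_2, \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)$.

Test 3b: The populations have a common variance, $\sigma^2$, which is known

If there is a common population variance $\sigma^2$,
the test statistic is $\bar{X}_1 - \bar{X}_2$ where $\bar{X}_1 - \bar{X}_2 \sim N\left(\mu_1 - \mu_2, \sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\right)$.

Note that the 95% confidence limits for $\mu_1 - \mu_2$ are $(\bar{x}_1 - \bar{x}_2) \pm 1.96\,\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$.

Test 3c: The populations have a common population variance $\sigma^2$, which is unknown

If the common population variance $\sigma^2$ is also unknown, then an unbiased estimate, $\hat{\sigma}^2$, is used instead. This is sometimes known as the pooled two-sample estimate where
$\hat{\sigma}^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}$ ($s_1^2$ and $s_2^2$ are the sample variances)

The test statistic is $\bar{X}_1 - \bar{X}_2$, with $\hat{\sigma}^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)$ used for the variance of its sampling distribution.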
Let $X_1$ be the mass, in kilograms, of an animal in Region A and let the population mean be $\mu_1$. Then $X_1 \sim N(\mu_1, 0.04^2)$.
Let $X_2$ be the mass, in kilograms, of an animal in Region B and let the population mean be $\mu_2$. Then $X_2 \sim N(\mu_2, 0.09^2)$.

$H_0: \mu_1 - \mu_2 = 0$ (there is no difference in the masses between the regions)
$H_1: \mu_1 - \mu_2 > 0$ (the animals in Region A have greater mass)

Consider the distribution of the difference between the means, $\bar{X}_1 - \bar{X}_2$.

$\bar{X}_1 - \bar{X}_2 \sim N\left(\mu_1 - \mu_2,\ \dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}\right)$ with $n_1 = 60$, $n_2 = 50$

If $H_0$ is true then $\mu_1 - \mu_2 = 0$, so

$\bar{X}_1 - \bar{X}_2 \sim N\left(0,\ \dfrac{0.04^2}{60} + \dfrac{0.09^2}{50}\right)$

Use a one-tailed (upper tail) test at the 1% level and reject $H_0$ if $z > 2.326$.

$z = \dfrac{3.03 - 3.00}{0.0137\ldots} = 2.184\ldots$

Since $z < 2.326$, do not reject $H_0$.
There is no evidence, at the 1% level, that the animals in Region A have a greater mass than those in Region B.

Solution 11.11

Let $X_2$ be a scout's score and let the population mean be $\mu_2$.
Then $X_2 \sim N(\mu_2, \sigma^2)$ with $\sigma = 3.48$.

$H_0: \mu_1 - \mu_2 = 0$ (there is no difference in the performance)
$H_1: \mu_1 - \mu_2 < 0$ (the guides did not perform as well as the scouts)

Consider the distribution of the difference between the means, $\bar{X}_1 - \bar{X}_2$.

$\bar{X}_1 - \bar{X}_2 \sim N\left(\mu_1 - \mu_2,\ \sigma^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)\right)$ with $n_1 = 144$, $n_2 = 100$

If $H_0$ is true then $\mu_1 - \mu_2 = 0$, so

$\bar{X}_1 - \bar{X}_2 \sim N\left(0,\ 3.48^2\left(\dfrac{1}{144} + \dfrac{1}{100}\right)\right)$

Use a one-tailed (lower tail) test at the 5% level and reject $H_0$ if $z < -1.645$.

$z = \dfrac{26.81 - 27.53}{0.452\ldots} = -1.589$

Since $z > -1.645$, do not reject $H_0$.
There is no evidence, at the 5% level, that the guides did not perform as well as the scouts in the fitness tests.
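The two calculations above can be checked with a short Python sketch. The helper name `z_known_variances` is illustrative only. For the animal masses, 0.04 and 0.09 are treated as standard deviations, an assumption made here because it is the reading that reproduces the quoted standard error of 0.0137.

```python
from math import sqrt

def z_known_variances(mean1, mean2, var1, n1, var2, n2, diff0=0.0):
    """z value for H0: mu1 - mu2 = diff0 when both population variances are known."""
    standard_error = sqrt(var1 / n1 + var2 / n2)
    return (mean1 - mean2 - diff0) / standard_error

# Animal masses: Region A (n = 60) against Region B (n = 50).
z_animals = z_known_variances(3.03, 3.00, 0.04**2, 60, 0.09**2, 50)

# Fitness scores: guides (n = 144) against scouts (n = 100), common sigma = 3.48.
z_guides = z_known_variances(26.81, 27.53, 3.48**2, 144, 3.48**2, 100)

print(f"animals: z = {z_animals:.3f}, reject H0 at 1% (upper tail): {z_animals > 2.326}")
print(f"guides:  z = {z_guides:.3f}, reject H0 at 5% (lower tail): {z_guides < -1.645}")
```

Passing the same variance twice covers the common-variance case, since $\sigma^2/n_1 + \sigma^2/n_2 = \sigma^2(1/n_1 + 1/n_2)$.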
Example 11.13

An investigation was carried out to assess the effects of adding certain vitamins to the diet. A group of two-week old rats was given a vitamin supplement in their diet for a period of one month, after which time their masses were noted. A control group of rats of the same age was fed on an ordinary diet and their masses were also noted after one month.
The results are summarised in the table:

                              Number in sample   Mean     Standard deviation
With vitamin supplement       64                 89.6 g   12.96 g
Without vitamin supplement    36                 83.5 g   11.41 g

Treating the samples as large samples from normal distributions with the same variance, $\sigma^2$, test at the 5% level whether the results provide evidence that rats given the vitamin supplement have a greater mass, at age six weeks, than those not given the vitamin supplement.

Solution 11.13

Let $X_1$ be the mass of a rat given a vitamin supplement and let the population mean be $\mu_1$. Then $X_1 \sim N(\mu_1, \sigma^2)$ with $\sigma$ unknown.
Let $X_2$ be the mass of a rat in the control group and let the population mean be $\mu_2$. Then $X_2 \sim N(\mu_2, \sigma^2)$ with $\sigma$ unknown.

$H_0: \mu_1 - \mu_2 = 0$
$H_1: \mu_1 - \mu_2 > 0$

Since the common population variance $\sigma^2$ is unknown, use $\hat{\sigma}^2$ where

$\hat{\sigma}^2 = \dfrac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2} = \dfrac{64 \times 12.96^2 + 36 \times 11.41^2}{64 + 36 - 2} = 157.5\ldots$, so $\hat{\sigma} = 12.55\ldots$

$z = \dfrac{89.6 - 83.5}{\hat{\sigma}\sqrt{\dfrac{1}{64} + \dfrac{1}{36}}} = 2.333\ldots$

Since $z > 1.645$, reject $H_0$.
There is evidence, at the 5% level, that the rats given the vitamin supplement have a greater mass than the rats not given the supplement.

Example 11.14

Two statistics teachers, Mr Chalk and Mr Talk, argue about their abilities at golf. Mr Chalk claims that with a number 7 iron he can hit the ball, on average, at least 10 m further than Mr Talk. They conducted an experiment, measuring the distances for several shots.
Denoting the distance Mr Chalk hits the ball by $x$ metres, the following results were obtained:
$n_1 = 40$, $\Sigma x = 4080$, $\Sigma(x - \bar{x})^2 = 1132$.
Denoting the distance Mr Talk hits the ball by $y$ metres, the following results were obtained:
$n_2 = 35$, $\Sigma y = 3325$, $\Sigma(y - \bar{y})^2 = 1197$.
Assuming that the populations have a common variance, test whether there is evidence, at the 1% level, to support Mr Chalk's claim.

Solution 11.14

Let $X$ be the distance, in metres, for Mr Chalk and let the population mean be $\mu_1$. Then $X \sim N(\mu_1, \sigma^2)$ with $\sigma$ unknown.
Let $Y$ be the distance, in metres, for Mr Talk and let the population mean be $\mu_2$. Then $Y \sim N(\mu_2, \sigma^2)$ with $\sigma$ unknown.

$H_0: \mu_1 - \mu_2 = 10$
$H_1: \mu_1 - \mu_2 < 10$

$\bar{x} = \dfrac{\Sigma x}{n_1} = \dfrac{4080}{40} = 102$, $\bar{y} = \dfrac{\Sigma y}{n_2} = \dfrac{3325}{35} = 95$

$\hat{\sigma}^2 = \dfrac{\Sigma(x - \bar{x})^2 + \Sigma(y - \bar{y})^2}{n_1 + n_2 - 2} = \dfrac{1132 + 1197}{40 + 35 - 2}$, so $\hat{\sigma} = 5.648\ldots$

Use a one-tailed (lower tail) test at the 1% level and reject $H_0$ if $z < -2.326$.

$z = \dfrac{102 - 95 - 10}{5.648\ldots \times \sqrt{\dfrac{1}{40} + \dfrac{1}{35}}} = -2.295$

Since $z > -2.326$, do not reject $H_0$.
There is not enough evidence, at the 1% level, to reject Mr Chalk's claim.

                 Mean     Variance
Type A boiler    63.83    104.32
Type B boiler    52.89    72.07

(b) (i) Let $X_1$ be the mass of dust deposited in a type A boiler and let the population mean be $\mu_1$ and population variance be $\sigma^2$.
Then $X_1 \sim N(\mu_1, \sigma^2)$ with $\sigma$ unknown.
Let $X_2$ be the mass of dust deposited in a type B boiler and let the population mean be $\mu_2$ and population variance be $\sigma^2$.
The hypotheses are as before, but the test statistic, the difference between the means, is distributed

$\bar{X}_1 - \bar{X}_2 \sim N\left(0,\ \sigma^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)\right)$ with $\sigma = 14$.

Carry out a two-tailed test at the 5% level (2.5% in each tail) and reject $H_0$ if $|z| > 1.96$.

$z = 1.802\ldots$

Since $|z| < 1.96$, $H_0$ is not rejected.
There is no significant difference between the samples with regard to the mean dust deposit.

(c) Considering the variances of the samples, it would seem that the common population variance of 196.0 given in part (b) is suspect. The value of just over 100 given by the unbiased estimate appears more reasonable and so result (a) is likely to be more accurate.

(iii) 50  1545  6.5  50  1480  7.1    $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 > \mu_2$    1%
(iv) 18  470  300  27  663  0.86    $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \neq \mu_2$    10%
(v) 40  2128  810  50  2580  772    $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 > \mu_2$    5%
(vi) 80  6824  2508  100  8740  3969    $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \neq \mu_2$    2%
(vii) 65  5369  8886  80  4672  5026    $H_0: \mu_1 - \mu_2 = 20$, $H_1: \mu_1 - \mu_2 > 20$    1%

2. A large group of sunflowers is growing in the shady side of a garden. A random sample of 36 of these sunflowers is measured. The sample mean height is found to be 2.86 m, and the sample standard deviation is found to be 0.60 m.
A second group of sunflowers is growing in the sunny side of the garden. A random sample of 26 of these sunflowers is measured. The sample mean height is found to be 3.29 m and the sample standard deviation is found to be 0.9 m.
Treating the samples as large samples from normal distributions having the same variance but possibly different means, obtain a pooled estimate of the variance and test whether the results provide significant evidence (at the 5% level) that the sunny-side sunflowers grow taller, on average, than the shady-side sunflowers. (C)

3. The lengths, in millimetres, of 9 screws selected at random from a large consignment are found to be:
8.00, 8.02, 8.03, 7.99, 8.00, 8.01, 8.01, 7.99, 8.01.
From a second large consignment, 16 screws are selected at random and their mean length is found to be 7.992 mm.
Assuming that both samples are from normal populations with variance 0.0001, test, at the 5% significance level, the hypothesis that the second population has the same mean as the first population, against the alternative hypothesis that the second population has a smaller mean than the first population. (C)
4. Hischi and Taschi are two makes of video tapes. They are both advertised as having a recording time of 3 hours. A sample of 49 Hischi tapes was tested and denoting the actual recording time by $h$ minutes, the following results were obtained:
$\Sigma h = 8673$, $\Sigma(h - \bar{h})^2 = 12\,720$
A sample of 81 Taschi tapes was also tested. Denoting the actual recording time by $t$ minutes, the results obtained were:
$\Sigma t = 14\,904$, $\Sigma(t - \bar{t})^2 = 33\,488$
If the recording times for the two makes are normally distributed and have a common variance, show that the unbiased estimate of this common variance is 361. Test whether there is significant evidence, at the 5% level, of a difference in the mean recording times. Is the difference significant at the 4% level?

5. A large number of tomato plants are grown under controlled conditions. Half of the plants, chosen at random, are treated with a new fertiliser, and the other half of the plants are treated with a standard fertiliser. Random samples of 100 plants are selected from each half, and records are kept of the total crop mass of each plant. For those treated with the new fertiliser, the crop masses (in suitable units) are summarised by the figures
$\Sigma x = 1030.0$, $\Sigma x^2 = 11\,045.59$.
The corresponding figures for those plants treated with the standard fertiliser are
$\Sigma y = 990.0$, $\Sigma y^2 = 10\,079.19$.
Treating the sample as a large sample from a normal distribution, and assuming that the population variances of both distributions are equal, obtain a two-sample pooled estimate of the common population variance.
Assuming that it is impossible for the new fertiliser to be less efficacious than the old fertiliser and assuming that both distributions are normal, test whether the results provide significant evidence (at the 3% level) that the new fertiliser is associated with a greater mean crop mass, stating clearly your null and alternative hypotheses. (C)

6. Mr Brown and Mr Green work at the same office and live next door to each other. Each day they leave for work together but travel by different routes. Mr Brown maintains that his route is quicker, on average, by at least four minutes. Both men time their journeys in minutes over a period of ten weeks. The results obtained were:
Mr Brown: $n_1 = 50$, $\bar{x}_1 = 21$, $s_1^2 = 10.24$
Mr Green: $n_2 = 50$, $\bar{x}_2 = 24$, $s_2^2 = 7.84$
Assuming that the times are normally distributed and that they have a common population variance, test at the 5% level whether Mr Brown's claim can be accepted.

7. A random sample of size 100 is taken from a normal population with variance $\sigma_1^2 = 40$. The sample mean $\bar{x}_1$ is 38.3. Another random sample, of size 80, is taken from a normal population with variance $\sigma_2^2 = 30$. The sample mean $\bar{x}_2$ is 40.1. Test, at the 5% level, whether there is a significant difference in the population means $\mu_1$ and $\mu_2$.

8. A certain political group maintains that girls reach a higher standard in single-sex classes than in mixed classes. To test this hypothesis 140 girls of similar ability are split into two groups, with 68 attending classes containing only girls and 72 attending classes with boys. All the classes follow the same syllabus and after a specified time the girls are given a test. The test results are summarised thus:
Girls in the mixed classes: $\Sigma x = 7920$, $\Sigma x^2 = 879\,912$
Girls in single-sex classes: $\Sigma y = 7820$, $\Sigma y^2 = 904\,808$
Treating both samples as large samples from normal distributions having the same variance, obtain a two-sample pooled estimate of the common population variance. Test whether the results provide significant evidence, at the 1% level, that girls reach a higher standard in single-sex classes.

9. The mean height of 50 male students of a college who took an active part in athletic activities was 178 cm with a standard deviation of 5 cm while 50 male students who showed no interest in such activities had a mean height of 176 cm with a standard deviation of 7 cm. Test the hypothesis that male students who take an active part in athletic activities have the same mean height as the other male students.
If both samples had been of size $n$, instead of 50, find the least value of $n$ which would ensure that the observed difference of 2 cm in the mean height would be significant at the 1% level. (Assume that the samples continue to have the same means and standard deviations.) (C)

10. A random sample of 27 individuals from the population of young men aged 18 and of high intelligence have foot lengths (in centimetres, to the nearest centimetre) as summarised below.
Foot length (in cm): 24 25 26 27 28 29 30
Number with this foot length: 2 3 9 6 5 1
Obtain the sample mean and show that the unbiased estimate of the population variance, based on this sample, is 2.00. Obtain a 96% confidence interval for the mean foot length of this type of person.
A random sample of 48 individuals from the population of young men aged 18 and of moderate intelligence have foot lengths summarised by $\bar{x} = 26.6$, $\Sigma(x - \bar{x})^2 = 123.20$. A complex genetic theory suggests that persons of high intelligence have a greater foot length than do those of moderate intelligence. The two samples described above may be assumed to have been drawn at random from independent normal distributions having a common variance. Obtain an unbiased two-sample estimate of this common variance. Treating the samples as large samples, test this genetic theory, using a significance test at the 1% significance level and stating clearly the hypotheses under comparison. (C)

1. A random sample of size $n_1$ is taken from population $X \sim N(\mu_1, \sigma^2)$ and a random sample of size $n_2$ is taken from population $Y \sim N(\mu_2, \sigma^2)$.
(a) Obtain an unbiased estimate of $\sigma^2$ by pooling the results from the two samples.
(b) Test the hypotheses stated at the level of significance indicated.

        $n_1$   $\Sigma x$   $\Sigma(x - \bar{x})^2$   $n_2$   $\Sigma y$   $\Sigma(y - \bar{y})^2$   Hypotheses                                              Level
(i)     6       171          83                        7       164.5        112                       $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 > \mu_2$              5%
(ii)    5       678.5        562.3                     7       971.6        308.6                     $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \neq \mu_2$           5%
(iii)   8       238.4        296                       10      206          145                       $H_0: \mu_1 - \mu_2 = 4$, $H_1: \mu_1 - \mu_2 > 4$      1%
(iv)    12      116.16       45.1                      18      156.96       72                        $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \neq \mu_2$           10%

2. The heights (measured to the nearest centimetre) of a random sample of six policemen from a certain force in Wales were found to be:
176, 180, 179, 181, 183, 179.
The heights (measured to the nearest centimetre) of a random sample of 11 policemen from a certain force in Scotland gave the following data:
$\Sigma y = 1991$, $\Sigma(y - \bar{y})^2 = 54$.
Test, at the 5% level, the hypothesis that Welsh policemen are shorter than Scottish policemen. Assume that the heights of policemen in both forces are normally distributed and have a common population variance.

3. An expert golfer wishes to discover whether the average distances travelled by two different brands of golf ball differ significantly. He tests each ball by hitting it with his driver and measuring the distance $X$ (in metres) that it travels. The distribution of $X$ may be assumed to be normal.
His results for a random sample of 9 'Farfly' golf balls were $\bar{x} = 214$ and $\Sigma(x - \bar{x})^2 = 2048$.
His results for a random sample of 16 'Gofar' golf balls were $\bar{x} = 224$ and $\Sigma(x - \bar{x})^2 = 2460$.
Assuming that the variance of $X$ is the same for both types of golf ball, obtain a pooled (two-sample) estimate of this variance and test at the 5% level whether his results for 'Gofar' golf balls differ significantly from those for 'Farfly' golf balls. (C)

4. Mr Mean notes the time, in minutes, that it takes him to drive to work in the mornings. The results are:
$n_1 = 8$, $\Sigma x_1 = 120$, $\Sigma x_1^2 = 1827$.
For his return journey in the rush hour, Mr Mean notes that:
$n_2 = 10$, $\Sigma x_2 = 230$, $\Sigma x_2^2 = 5436$.
He maintains that, on average, it takes him at least ten minutes longer to drive home.
(a) Using the results from the two samples, find an unbiased estimate of the common population variance.
(b) Assuming that the times of all journeys are normally distributed, use the two-sample t-test at the 5% level to test Mr Mean's claim.

5. Random samples of Year 10 pupils at two schools are given the same mathematics test. The results are summarised thus:
School A: $n_1 = 20$, $\bar{x} = 43$, $\Sigma(x - \bar{x})^2 = 1296$
School B: $n_2 = 17$, $\bar{y} = 36$, $\Sigma(y - \bar{y})^2 = 1388$
Assuming that the distributions of marks are normal with a common population variance, test at the 2% level whether there is a significant difference in the mathematical ability of the Year 10 pupils at the two schools.

6. A random sample of size $n_1$ is taken from a population $P_1$ whose mean is $\mu_1$ and variance $\sigma_1^2$ and a random sample of size $n_2$ is taken from population $P_2$ with mean $\mu_2$ and variance $\sigma_2^2$. Under what circumstances is it valid to test the hypothesis $\mu_1 - \mu_2 = 0$ using a two-sample t-test?
A machine fills bags of sugar and a random sample of 20 bags selected from a week's production yielded a mean weight of 499.8 g with standard deviation 0.63 g. A week later a sample of 25 bags yielded a mean weight of 500.2 g with standard deviation 0.48 g.
Assuming that your stated conditions are satisfied, perform a test to determine whether the mean has increased significantly during the second week.
Test whether the mean during the second week could be 500 g. (Use a 5% significance level for both tests.)

7. A liquid product is sold in containers. The containers are filled by a machine. The volumes of liquid (in millilitres) in a random sample of 6 containers were found to be:
497.8, 501.4, 500.2, 500.8, 498.3, 500.0.
After overhaul of the machine, the volumes (in millilitres) in a random sample of 11 containers were found to be
501.1, 499.6, 500.3, 500.9, 498.7, 502.1, 500.4, 499.7, 501.0, 500.1, 499.3.
It is desired to examine whether the average volume of liquid delivered to a container by the machine is the same after overhaul as it was before.
(a) State the assumptions that are necessary for the use of the customary t-test.
(b) State formally the null and alternative hypotheses that are to be tested.
(c) Carry out the t-test, using a 5% level of significance.
(d) Discuss briefly which of the assumptions in (a) is least likely to be valid in practice and why. (MEI)

8. The performances of trainee actors who have passed through a drama school are rated by a panel of experienced actors who assign an overall mark for each trainee. The drama school has recently introduced a new training method which, it is claimed, will lead to better performances.
The marks for a random sample of 6 trainees using the old training method were
243, 228, 220, 206, 230, 198
and the marks for a random sample of 8 using the new method were
235, 259, 227, 242, 238, 253, 221, 217.
Use an appropriate t-test to examine, at the 5% level of significance, whether there is evidence that the new method has led, on average, to higher scores. State carefully the assumptions on which this procedure is based.
Provide a two-sided 95% confidence interval for the true difference in mean scores between the old and new methods. State carefully the interpretation of this interval. (MEI)

Summary

• For stages in a hypothesis test, see page 513
• For critical values and rejection criteria for a z-test, see page 513
• Standardised test statistics:

Test 1: Testing an unknown population mean $\mu$. $H_0: \mu = \mu_0$.

When $\sigma^2$ is known:

1a $X$ is normally distributed, $X \sim N(\mu, \sigma^2)$
For samples of size $n$ (any size), $\bar{X} \sim N\left(\mu_0, \dfrac{\sigma^2}{n}\right)$
Test statistic $Z = \dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$ where $Z \sim N(0, 1)$.

1b $X$ is not normally distributed
For large samples of size $n$, by the central limit theorem, $\bar{X} \sim N\left(\mu_0, \dfrac{\sigma^2}{n}\right)$ approximately
Test statistic $Z = \dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$ where $Z \sim N(0, 1)$.

When $\sigma^2$ is unknown:

1c For large $n$,
Test statistic $Z = \dfrac{\bar{X} - \mu_0}{\hat{\sigma}/\sqrt{n}}$ where $Z \sim N(0, 1)$.
If $X$ is normally distributed,
Test statistic $T = \dfrac{\bar{X} - \mu_0}{\hat{\sigma}/\sqrt{n}}$ where $T \sim t(n - 1)$.

Test 2: Testing a binomial proportion $p$, where $X \sim B(n, p)$.
$X$ is the number of successes in $n$ trials.
If $n$ is large such that $np > 5$ and $nq > 5$, then $X \sim N(np, npq)$.
Test statistic $Z = \dfrac{X - np}{\sqrt{npq}}$ where $Z \sim N(0, 1)$.
Remember to use a continuity correction ($\pm 0.5$).
Test 3: Testing $\mu_1 - \mu_2$, the difference between means of two normal distributions

3a $\sigma_1^2$, $\sigma_2^2$ known
$\bar{X}_1 - \bar{X}_2 \sim N\left(\mu_1 - \mu_2,\ \dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}\right)$
Test statistic $Z = \dfrac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$ where $Z \sim N(0, 1)$.

3b Common population variance $\sigma^2$ known
Test statistic $Z = \dfrac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$ where $Z \sim N(0, 1)$.

Miscellaneous worked examples

Example 11.16

An inspector of items from a production line takes, on average, 21.75 seconds to check each item. After the installation of a new lighting system the times, $t$ seconds, to check each of 50 randomly chosen items from the production line are summarised by $\Sigma t = 1107$, $\Sigma t^2 = 24\,592.35$.
(a) Calculate an unbiased estimate of the population variance of the time taken to check an item under the new lighting system.
(b) Test at the 5% significance level whether there is evidence that the population mean time has changed from 21.75 seconds.
A technician who carried out the above test concluded with the following incorrect statement. Give a corrected version.
'It is not necessary for the population to be normal since the sample size is large and the central limit theorem states that any sufficiently large sample is normal.' (C)

Solution 11.16

Let $T$ be the time, in seconds, to check an item.

(a) $\hat{\sigma}^2 = \dfrac{1}{n - 1}\left(\Sigma t^2 - \dfrac{(\Sigma t)^2}{n}\right) = \dfrac{1}{49}\left(24\,592.35 - \dfrac{1107^2}{50}\right) = 1.7014\ldots$

(b) $H_0: \mu = 21.75$, $H_1: \mu \neq 21.75$
Carry out a two-tailed test at the 5% level and reject $H_0$ if $|z| > 1.96$ where $z = \dfrac{\bar{t} - \mu}{\hat{\sigma}/\sqrt{n}}$.
From the sample, $\bar{t} = \dfrac{\Sigma t}{n} = \dfrac{1107}{50} = 22.14$

$z = \dfrac{22.14 - 21.75}{\sqrt{1.70\ldots}/\sqrt{50}} = 2.114\ldots$

Since $|z| > 1.96$, reject $H_0$.
There is evidence, at the 5% level, that the population mean time has changed from 21.75 seconds.

The central limit theorem states that the distribution of means is approximately normal for large sample size $n$.

NOTE: the variable in Example 11.16 was given as $T$. Do not confuse with the standardised statistic in the t-distribution.
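The arithmetic in Solution 11.16 can be reproduced with a few lines of Python. This is a sketch only: the variable names are chosen here for illustration, and the critical value is obtained from the standard library's `statistics.NormalDist` rather than from tables.

```python
from math import sqrt
from statistics import NormalDist

# Summary data from Example 11.16.
n = 50
sum_t = 1107
sum_t_sq = 24592.35
mu_0 = 21.75          # mean checking time claimed under H0

mean_t = sum_t / n
# Unbiased estimate of the population variance (divisor n - 1).
var_hat = (sum_t_sq - sum_t**2 / n) / (n - 1)

# Large sample, variance estimated: use the z statistic.
z = (mean_t - mu_0) / sqrt(var_hat / n)

# Two-tailed test at the 5% level: 2.5% in each tail.
critical = NormalDist().inv_cdf(0.975)
reject_h0 = abs(z) > critical

print(f"sample mean = {mean_t:.2f}")
print(f"variance estimate = {var_hat:.4f}")
print(f"z = {z:.3f}, critical value = {critical:.3f}, reject H0: {reject_h0}")
```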
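Solution 11.17

P(Type I error) = P($H_0$ is rejected when $H_0$ is true) = 0.05.
So the significance level of the test is 5%.

$X \sim N(\mu, 3.5^2)$
$H_0: \mu = 15$
$H_1: \mu > 15$

According to $H_0$, $\bar{X} \sim N\left(15, \dfrac{3.5^2}{30}\right)$

(a) Using a one-tailed (upper tail) test at the 5% level, reject $H_0$ if $z > 1.645$ where $z = \dfrac{\bar{x} - 15}{3.5/\sqrt{30}}$

So the critical (rejection) region for $\bar{x}$ is given by $\bar{x} > 15 + 1.645 \times \dfrac{3.5}{\sqrt{30}}$, i.e. $\bar{x} > 16.05$.

If $n$ is increased, but P(Type I error) = 0.05, the critical value for $\bar{X}$ will decrease and P(Type II error) will also decrease.
This is illustrated in the following diagrams:

When $n = 30$:
[Diagram: the distribution under $H_0$, $\bar{X} \sim N(15, 3.5^2/30)$, and the distribution under $H_1$, $\bar{X} \sim N(17, 3.5^2/30)$, with the critical value 16.05 marked between the means 15 and 17; $H_0$ is accepted to the left of 16.05.]

When $n > 30$, the curves are more squashed:
[Diagram: the same two distributions with smaller spread.]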
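The effect of the sample size on the critical value and on P(Type II error) can be explored numerically. The sketch below takes $\mu = 17$ as the true mean under $H_1$, as in the diagram; the function names and the second sample size, $n = 60$, are chosen here purely for illustration.

```python
from math import sqrt
from statistics import NormalDist

MU_0 = 15      # mean under H0
MU_1 = 17      # true mean assumed under H1 (as in the diagram)
SIGMA = 3.5    # known population standard deviation
ALPHA = 0.05   # P(Type I error), upper-tail test

def critical_value(n):
    """Smallest sample mean that leads to rejection of H0."""
    z_crit = NormalDist().inv_cdf(1 - ALPHA)   # about 1.645
    return MU_0 + z_crit * SIGMA / sqrt(n)

def type_ii_error(n):
    """P(sample mean falls below the critical value when mu = MU_1)."""
    sampling_dist = NormalDist(MU_1, SIGMA / sqrt(n))
    return sampling_dist.cdf(critical_value(n))

for n in (30, 60):
    print(f"n = {n}: reject H0 if sample mean > {critical_value(n):.2f}, "
          f"P(Type II error) = {type_ii_error(n):.4f}")
```

For $n = 30$ the critical value agrees with the 16.05 found above, and both quantities fall when $n$ is increased.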
$H_0$ is accepted if $x < 27$, i.e. if $x < 26.5$ (continuity correction)

so P(Type II error) = $P(X < 26.5$ when $X \sim N(32, 6.4))$
$= P\left(Z < \dfrac{26.5 - 32}{\sqrt{6.4}}\right)$
$= P(Z < -2.174)$
$= 1 - 0.9852$
$= 0.0148$
$\approx 1.5\%$

There is not enough evidence to say that the mean volume is not two litres.

Let $W$ be the volume, in litres, dispensed by the new machine.
Assume that $W$ has a normal distribution and the two samples have a common population variance $\sigma^2$. This time the two samples are considered to give a two-sample estimate of $\sigma^2$:

$\hat{\sigma}^2 = \dfrac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}$
where $s_1^2 = 0.002\,976$ (from first part of question)

and $s_2^2 = \dfrac{\Sigma(w - \bar{w})^2}{n_2}$, so $n_2 s_2^2 = \Sigma(w - \bar{w})^2 = 0.060\,40$

so $\hat{\sigma}^2 = \dfrac{10 \times 0.002\,976 + 0.060\,40}{10 + 20 - 2} = 0.003\,22$

$\hat{\sigma} = 0.0567$ (3 s.f.)

$H_0: \mu_1 - \mu_2 = 0$ (the machines dispense the same amount)
$H_1: \mu_1 - \mu_2 > 0$ (the new machine is dispensing less than the old)

The test statistic $T = \dfrac{\bar{V} - \bar{W} - 0}{\hat{\sigma}\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$ where $T \sim t(n_1 + n_2 - 2)$, so $T \sim t(28)$.

Carry out a one-tailed test, at the 10% level. Reject $H_0$ if $t > 1.313$, where $t = \dfrac{\bar{v} - \bar{w}}{\hat{\sigma}\sqrt{\dfrac{1}{10} + \dfrac{1}{20}}}$

Since $t > 1.313$, reject $H_0$.
There is evidence at the 10% level that the new machine is dispensing less paint than the old machine.
The condition required: The two populations must be normal with common variance.

Miscellaneous exercise 11e

1. The amount of nicotine, in milligrams, in a cigarette of a certain brand is normally distributed with mean $\mu$ and standard deviation 2.5. A random sample of 10 cigarettes yielded a mean nicotine value of 18.4. Obtain a symmetric 90% confidence interval for $\mu$, giving values to three significant figures.
Give a reason why the value of $\mu$ might not be inside this interval.
Test the null hypothesis $\mu = 17.8$ against the alternative hypothesis $\mu \neq 17.8$ at the 10% significance level. (C)

2. A study is made of the numbers of boys and girls in families. A random sample of families is chosen. The total number of children is 500, of whom 261 are girls. It is desired to test the null hypothesis that boys and girls are equally likely in the population against the alternative hypothesis that they are not equally likely.
(a) State an assumption necessary for these 500 children to be considered as a random sample of the population of all children.
(b) Test at the 10% significance level whether the data indicate that boys and girls are not equally likely in the population. (C)

3. A resident of an urban road claims that the average speed of vehicles using the road is greater than the 30 m.p.h. speed limit. To investigate this claim the police time a randomly selected sample of 25 vehicles over a measured mile on the road. It is assumed that the speeds calculated from their observations come from a normal distribution with mean $\mu$ m.p.h. and standard deviation 12 m.p.h.
(a) State appropriate null and alternative hypotheses for a significance test.
(b) Find a critical region for a 5% significance test in the form, sample mean $\bar{x} > k$, where the value of the constant $k$ is given correct to two decimal places.
(c) State, with a reason, your conclusion for the test when the mean speed calculated from the sample was 35 m.p.h.
(d) Calculate the power of the test when, in fact, $\mu = 40$. (NEAB)

4. A supermarket's statistician reports that, over the past three months, the mean amount spent per customer has been £43 with a standard deviation of £20.
The supermarket carries out a promotion for one week by offering 'buy two ... get one free' on a range of products which it sells. The management hopes that this will increase the mean amount spent per customer; you may assume that the standard deviation remains unchanged.
A random sample of 50 customers visiting the supermarket that week spent a total of £2400.
(a) Write down suitable null and alternative hypotheses in order to test whether or not the promotion has increased the average level of spending per customer.
(b) Explain carefully the use of the Central limit theorem in carrying out this hypothesis test.
(c) Carry out the hypothesis test at the 5% significance level, clearly stating your conclusion.
(d) Find a 90% confidence interval for the mean amount spent by customers during the period of the promotion. State, giving a reason, whether this is consistent with your conclusion in (c). (MEI)

5. The process of manufacturing a certain kind of dinner plate results in a proportion 0.13 of faulty plates. An alteration is made to the process which is intended to reduce the proportion of faulty plates. State suitable null and alternative hypotheses for a statistical test of the effectiveness of the alteration.
In order to carry out the test, the quality control department count the number of faulty plates in a random sample of 2500. If 290 or fewer faulty plates are found then it will be accepted that the alteration does result in a reduction in the proportion of faulty plates. Calculate the significance level of this test, using a suitable normal approximation.
Calculate the probability of making a Type II error in the above test, given that the alteration results in a decrease in the proportion of faulty plates to 0.11. (C)

6. Water from a cooling tower at a power station is discharged into a river. In order to test whether the mean temperature of discharged water is greater than the permitted maximum of 65 °C the temperature ($x$ °C) of 40 randomly selected samples of water will be taken and the sample mean $\bar{x}$ used to test the null hypothesis $\mu = 65$ against the alternative hypothesis $\mu > 65$, where $\mu$ °C is the population mean temperature of discharged water. It may be assumed that the population standard deviation of $x$ is 5.0.
(a) State, in the context of the question, what you understand by
(i) a Type I error,
(ii) a Type II error.
(b) The probability of a Type I error is fixed at 0.1. Show that the range of values of $\bar{x}$ for which the null hypothesis is rejected is given by $\bar{x} > 66.01$, correct to two decimal places.
(c) State the conclusion of the test when $\bar{x} = 65.7$, and the type of error that might be made in this case.
(d) Calculate the probability of making a Type II error when, in fact, $\mu = 68$.
What can be deduced about the probability of making a Type II error when, in fact, $\mu > 68$? (C)

7. The manager of a large supermarket wishes to judge the effect of a new layout on the customers. On the day that the layout was introduced the first 200 customers in the store were asked whether or not they approved of the new layout.
Comment on the manner in which the sample was chosen, and suggest a way of obtaining a more suitable sample.
Out of a suitably chosen sample of 200 customers, 148 approved of the new layout. Calculate an approximate 95% confidence interval for the population percentage of customers who approve of the new layout.
The supermarket manager claims that 80% of customers approve of the new layout. Show that the data provide evidence at the 2½% significance level that the population percentage is less than 80%. (C)

8. The random variable $X$ has a normal distribution with mean $\mu$ (unknown) and variance $\sigma^2$ (known). To test the null hypothesis $H_0: \mu = \mu_0$ a random sample of $n$ observations of $X$ is taken and the sample mean is $\bar{x}$. Find, in terms of $\mu_0$, $\sigma$ and $n$, the set of values of $\bar{x}$ which will result in each of the following:
(a) $H_0$ being rejected in favour of $H_1: \mu \neq \mu_0$ at the 5% level of significance
(b) $H_0$ not being rejected in favour of $H_1: \mu < \mu_0$ at the 1% level of significance.

9. The masses of components used in making a model car are being checked. Each of a random sample of 200 components is weighed and the masses, $x$ g, are summarised by
$n = 200$, $\Sigma x = 1484.2$, $\Sigma x^2 = 11\,098.19$.
(a) Calculate an unbiased estimate of the population variance.
(b) State what you understand by 'unbiased estimate'.
The components are produced in large batches. It is desired that the mean mass of components in a batch should be at least 7.40 g. In order to decide whether to accept or reject a batch each of a random sample of 50 components from the batch is weighed. The sample data is used to perform a test of the null hypothesis $\mu = 7.40$ against the alternative hypothesis $\mu < 7.40$, where $\mu$ g is the mean mass of components in the batch. For the test the population variance is taken to have the value found in part (a). The batch is rejected if the null hypothesis is rejected using a 2½% significance level. Show that the batch will be rejected if the sample mean mass is less than 7.22 g.
For one such batch the sample data is summarised by $n = 50$, $\Sigma x = 366.0$.
Determine whether or not this batch is rejected.
Calculate the probability of making a Type II error in carrying out the above test for a batch whose mean mass is actually 7.10 g. (C)

10. A box of dice contains some which are unbiased and some which are biased in such a way that the probability of throwing a six with one of these dice is t. One die is selected at random from the box and, in order to decide whether it is biased, it is thrown 240 times and the number of sixes, $N$, is counted. The probability of throwing a six with this die is denoted by $p$. The null hypothesis $p = \frac{1}{6}$ is tested against an appropriate alternative hypothesis at the 5% significance level.
(a) State an appropriate alternative hypothesis.
(b) Find the set of values of $N$ for which it is accepted that the die is biased.
(c) Find the probability of making a Type II error in the test. [$\Phi(3.355) = 0.9996$] (C)

measured, the results being summarised by:
$\Sigma x = 2092.0$ and $\Sigma x^2 = 24\,994.5$.
(a) Calculate, to four significant figures, unbiased estimates of
(i) the population mean distance, $\mu$ miles, of the houses from the station,
(ii) the population variance of the distances of the houses from the station.
State what you understand by the term 'unbiased estimate'.
(b) Using the sample data, a significance test of the null hypothesis $\mu = 10$ against the alternative hypothesis $\mu > 10$ is carried out at the $\alpha$% significance level. In the test, the sample mean is compared with the critical value of 10.65; as the sample mean is less than 10.65 the null hypothesis is not rejected. Calculate the value of $\alpha$.
(c) Give a reason why it is not necessary for the distances to be normally distributed for the test to be valid. (C)

13. A particular investigation concentrated on people recently re-employed following a first period of unemployment. Each of a random sample of 50 such persons was asked the duration, in months, of this period of unemployment. A summary of the results is as follows.
mean = 16.7 months, variance = 193.21 month²
Investigate at the 5% level of significance the claim that, for people re-employed after a first period of unemployment, the mean duration of unemployment is more than 12 months.
Indicate why, in carrying out your test, no assumption regarding the distribution of the duration of the first period of unemployment is necessary. (NEAB)

14. The error in the readings made on a measuring

15. A study of the annual rainfall, $x$ cm to the nearest centimetre, over the last 20 years for a small town gave the following results:
$\Sigma x = 1325$, $\Sigma x^2 = 90\,316$.
(a) Find unbiased estimates of the mean and the variance of the annual rainfall for this town.
Archive records show that the annual rainfall for this town, prior to this period, had a mean value of 62.50 cm and a standard deviation of 11.45 cm.
(b) Assuming that the standard deviation remains unchanged at 11.45 cm, test at the 5% level of significance whether or not there is evidence of an increase in mean annual rainfall over the last 20 years. State your hypotheses clearly. (L)

16. In 1978 the Borsetshire County Council tree officer did a survey of a random sample of 64 separate areas, each 1 km square, and found an average of 19.5 diseased trees per square. The following year, to test whether the disease had spread, she took a new random sample of 36 separate areas, also each 1 km square, and found an average of 21.7 diseased trees per square.
(a) Assume that, in both years, the number of diseased trees per 1 km square had a normal distribution with population variance 18 2.
Test, at the 1% significance level, the hypothesis that the mean number of diseased trees per 1 km square in 1979 was the same as in 1978, against the hypothesis that the mean number had increased.
(b) Further evidence suggests that the number of diseased trees is not normally distributed. Say what changes you might have to make, if any, to the test you have carried out,

17. I know that the heights, in metres, of men in general have the distribution $N(1.73, 0.08^2)$. I make the assumption that the heights, $X$ metres, of male basketball players are also normally distributed, with the same variance as the heights of men in general, but possibly with a larger mean.
(a) Write down the null and alternative hypotheses under test.
I propose to base my test on the heights of eight male basketball players who recently appeared for our local team, and I shall use a 5% level of significance.
(b) Write down the distribution of the sample mean, $\bar{X}$, for samples of size 8 drawn from the distribution of $X$ assuming that the null hypothesis is true.
(c) Determine the critical region for my test, illustrating your answer with a sketch.
(d) Carry out the test, given that the mean height of the eight players is 1.765 m. You should present your conclusions carefully, stating any additional assumption you need to make.
In fact, the distribution of $X$ is $N(1.80, 0.06^2)$.
(e) Find the probability that a test based on a random sample of size 8 and using the critical region in part (c) will lead to the conclusion that male basketball players are not taller than men in general. (MEI)

18. The ingredients for concrete are mixed together to obtain a mean breaking strength of 2000 newtons. If the mean breaking strength drops below 1800 newtons then the composition must be changed. The distribution of the breaking strength is normal with standard deviation
instrument can be modelled by the continuous explammg the reasons for your answer. Do 200 newtons.
11. A manufacturer makes two grades of squash not carry out any further tests. (C)
ball: 'slow' and 'fast'. Slow balls have a 'bounce' random variable X which has mean ft and Samples are taken in order to investigate the
(measured under standard conditions) which is standard deviation a. If the instrument is 17. :hen watching games of men's basketball I hypotheses:
known to be a normal variable with mean 10 em correctly calibrated then ft = 0.
In order to check the calibration of the instrument, . ave noticed that the players are often tall' 1 H 0 : p. = 2000 newtons
and standard deviation 2 em. The 'bounce' of
the errors in a random sample of 40 readings were mterested to find out whether or not m:n ~h~m HI :ft = 1800 newtons
fast balls is a normal variable with mean J 5 em play basketball really are taller than men in
and standard deviation 2 em. A box of balls is determined. These data are summarised by: How many samples must be tested so that
general.
unlabelled so that it is not known whether they l:x ~ 120, :Ex 2 ~ 3285. P(Type I cnor)) ~ 0.05 and
P(Type II error) ~ 0.1?
are all slow or all fast. {a) Estimate a 1 .
Devise a test, based on an observation of the (b) Carry out a hypothesis test, at the 5% level of
mean bounce of a sample of four balls from the significance, to test whether the machine is, or
box such that the Type I error is 0.05 and state is not, correctly calibrated. You should state
the magnitude of the Type II error for this test. your hypotheses and conclusions carefully.
(C) (c) Obtain a symmetric 95% confidence
interval for 11> explaining why it is only
12. An ambulance station serves an area which approximate.
includes more than 10 000 houses. It has been (d) Suppose the data from the 40 readings had
decided that if the mean distance of the houses been such that the estimate of a 2 as found in
from the ambulance station is greater than ten part (a) was larger, but without changing
miles then a new ambulance station will be the sample mean. State the effect this would
necessary. The distance, x miles, from the station have on the value of the test statistic in part
of each of a random sample of 200 houses was (b). Explain why this might affect the
conclusion to part {b). (MEI)
Test 11A (z-tests)

1. Cans of lemonade are filled by a machine which is set to dispense a mean amount of 330 ml into each can. The manufacturer suspects that the machine is tending to over-dispense and, in order to test the suspicion, measures the contents, $x$ ml, of a random sample of 30 cans. The results are summarised by:
$\Sigma x = 9925$, $\Sigma x^2 = 3\,284\,137$.
(a) Calculate an unbiased estimate of the population variance of the amount dispensed into each can. Give four significant figures in your answer.
(b) Test the manufacturer's suspicion at the 10% significance level.
(c) Indicate where the central limit theorem is used in the test, and state why the use of the central limit theorem is necessary. (C)

2. The proportion of patients who suffer an allergic reaction to a certain drug used to treat a particular medical condition is assumed to be 0.045.
When 400 patients were treated, 25 suffered an allergic reaction. Using a normal approximation, test at the 5% significance level whether the quoted figure of 0.045 is an underestimate. (C)

3. (a) A null hypothesis $H_0$ is to be tested against an alternative hypothesis $H_1$. Explain what is meant by:
(i) a Type I error,
(ii) a Type II error.
(b) The tar yields in cigarettes of a particular brand are distributed normally with mean $\mu$ mg and standard deviation 0.8 mg. In order to test $H_0$: $\mu = 17.5$ against $H_1$: $\mu > 17.5$ at the 1% level of significance, a random sample of 10 cigarettes of this brand is to be obtained and the sample mean $\bar{X}$ calculated.
(i) In the case when the yields were recorded, in milligrams, as:
17.1, 18.3, 18.9, 17.8, 16.9, 19.2, 17.8, 18.3, 18.5, 18.2
carry out the required significance test.
(ii) Determine a critical region for the test in the form $\bar{X} > c$, where $c$ is a constant whose value is to be determined.
(iii) Calculate the size of the Type II error for this test when, in fact, $\mu = 18.0$. (NEAB)

4. The random variable $X$ is distributed as $N(\mu, 16)$. A random sample of size 25 is available. The null hypothesis $\mu = 0$ is to be tested against the alternative hypothesis $\mu \neq 0$. The null hypothesis will be accepted if $-1.5 < \bar{X} < 1.5$, where $\bar{X}$ is the value of the sample mean, otherwise it will be rejected.
Calculate the probability of a Type I error.
Calculate the probability of a Type II error if in fact $\mu = 0.5$; comment on the value of this probability. (MEI)

Test 11B (z-tests)

1. A certain brand of mineral water comes in bottles. The amount of water in a bottle, in millilitres, follows a normal distribution of mean $\mu$ and standard deviation 2. The manufacturer claims that $\mu$ is 125. In order to maintain standards the manufacturer takes a sample of 15 bottles and calculates the mean amount of water per bottle to be 124.2 millilitres. Test, at the 5% level, whether or not there is evidence that the value of $\mu$ is lower than the manufacturer's claim. State your hypotheses clearly. (L)

2. A newspaper headline stated 'Majority would vote for Prime Minister'. The article explained that in a survey of 70 randomly selected people, 38 had said that they would vote for the Prime Minister. A spokesman for the opposition party said that such evidence was inconclusive, and, according to standard statistical techniques, the result was consistent with only 40% of the whole population voting for the Prime Minister.
A spokesman for the government stated that the results showed that 40% was too low.
Stating the null and alternative hypotheses, test at the 5% level which of the spokesmen was justified in his assertion. (L)

3. The playing times of a particular brand of audio tape are normally distributed with mean $\mu$ minutes and standard deviation 0.24 minutes. The manufacturer states that $\mu = 60$. A large batch of these tapes is delivered to a store and, in order to check the manufacturer's statement, the playing times of a random sample of ten tapes were measured. The null hypothesis $\mu = 60$ is tested against the alternative hypothesis $\mu < 60$ at the 1% significance level.
(a) Find the range of values of the sample mean $\bar{X}$ for which the null hypothesis is rejected, giving two decimal places in your answer.
(b) State what 'a Type II error has occurred' means in the context of the playing times of tapes.
(c) Calculate the probability of making a Type II error when, in fact, $\mu = 59.7$. (C)

4. The top 40 chart of the Recorded Music Association has been compiled every week for some years, and the standard deviation of the number of weeks which a record spends at Number One in the chart has been found to be 0.87 weeks. The number of weeks which the last ten Number One records featuring female singers spent in the Number One position are:
3, 1, 1, 2, 1, 2, 3, 2, 1, 1.
For the last 15 Number One records featuring male singers, the data are:
1, 1, 2, 2, 2, 3, 4, 2, 1, 2, 3, 5, 1, 2, 3.
A music industry producer wished to test whether there was any difference in the time spent at Number One between female and male singers. She assumes that both the distributions from which the two samples are drawn are normal with standard deviation 0.87 weeks.
(a) State the null and alternative hypotheses she must use.
(b) Carry out the test at the 5% level of significance.
(c) Give a reason why her assumption of normality may be invalid. (L)

Test 11C (t-tests)

1. Six cleaning firms were selected at random and asked about their hourly rates of pay, $x, with the following results:
7.00, 6.80, 6.62, 6.94, 7.48, 7.04
[$\Sigma x = 41.88$, $\Sigma x^2 = 292.74$.]
Carry out a t-test at the 1% significance level to establish whether the mean hourly rate of pay, paid by cleaning firms, falls below a proposed minimum of $7.40.
State clearly an assumption made in applying the t-test in this context. (C)

2. On a certain day in July the maximum temperature, $m$ °C, was recorded at 11 points chosen at random on the island of San Marco. On the same day the maximum temperature, $p$ °C, was recorded at 20 points chosen at random on the island of San Polo. The results are summarised by:
$\bar{m} = 25.30$, $\bar{p} = 26.45$,
$\Sigma(m - \bar{m})^2 = 16.74$, $\Sigma(p - \bar{p})^2 = 15.29$.
Test, at the 2.5% significance level, the claim that San Marco was cooler than San Polo on that day, giving your null and alternative hypotheses and stating any assumptions necessary for your test to be valid. (C)

3. The contents of a packet of crisps are marked as 30 g. The manufacturer believes that one of their machines is faulty and is issuing too many crisps per packet. A sample of 10 packets is selected at random from this machine and the masses of their contents were:
31.5, 28.9, 30.5, 32.2, 35.5, 34.2, 31.8, 32.8, 29.1, 32.1.
(a) Calculate an unbiased estimate of the population variance.
(b) Is there evidence at the 10% level that the machine is issuing too many crisps per packet? State any distributional assumptions made.
(c) How would the test procedure in part (b) have differed if the population variance was known? (AQA)

4. The customers of a local branch of a bank are invited to comment on various aspects of the service. Their comments are translated into an overall 'satisfaction score'. This score can be taken as normally distributed over the whole population of customers.
A staff training programme has recently been completed. A random sample of scores before the programme was as follows:
126, 93, 114, 107, 98, 112.
A separate random sample of scores after the programme was as follows:
124, 107, 117, 136, 120, 122.
Test at the 5% level of significance the null hypothesis that the mean score is the same after the training programme as it was before against the alternative that the new mean score is higher, stating your assumption concerning the underlying variances. Provide a two-sided 99% confidence interval for the true difference in mean scores. (MEI)
THE $\chi^2$ SIGNIFICANCE TEST

There are two main situations when a $\chi^2$ significance test is used:

1. A $\chi^2$ goodness-of-fit test
This is used when you have some practical data and you want to know how well a particular statistical distribution, such as a binomial or a normal, models that data. The null hypothesis $H_0$ is that the particular distribution does provide a model for the data; the alternative hypothesis $H_1$ is that it does not.

2. A $\chi^2$ test for independence (or for association)
This is used when you have some practical data concerning two variables and you want to know whether they are independent or whether there is an association between them. The null hypothesis $H_0$ is that the factors are independent; the alternative hypothesis $H_1$ is that they are not.

Following the pattern for hypothesis tests established in earlier chapters, you assume that the null hypothesis is true and then calculate the frequencies that you would expect to occur based on this assumption. These are denoted by $E$ or $f_e$. These expected frequencies are then compared with the actual (or observed) frequencies, denoted by $O$ or $f_o$.

A test statistic involving $O$ and $E$ is calculated. This is often written $X^2$ and, subject to certain conditions, it can be approximated by a $\chi^2$ distribution. Before looking in detail at how the test statistic $X^2$ is calculated and how to perform the test, consider some of the features of the $\chi^2$ distribution.

The parameter $\nu$ is known as the number of degrees of freedom and it is the number of independent variables used in calculating the test statistic. Details of how to find $\nu$ are given in the following text and in the summary table on pages 579 and 590.

Critical values and levels of significance

The $\chi^2$ test is conducted as a one-tailed (upper tail) test. When carrying out the test, you will want to know whether the calculated value of the test statistic lies in the main bulk of the $\chi^2$ distribution or whether it is in the upper tail critical (or rejection) region. The boundary of the critical region is called the critical value.

The critical value depends on the level of significance of the test. Often a 5% or a 1% level of significance is used and the critical values can be found from $\chi^2$ tables. For example, for a 5% level of significance, the critical value is such that 5% of the area is in the upper tail and the critical value is written $\chi^2_{5\%}(\nu)$, for a particular value of $\nu$.
If the test value lies in the critical region, then the null hypothesis $H_0$ is rejected in favour of the alternative hypothesis $H_1$.

The tables give the upper tail probability. Extract from the tables of $\chi^2(\nu)$:

                                  10% level   5% level              1% level
$\nu$   0.990   0.975   0.950     0.100       0.050      0.025     0.010     0.005
1       0.000   0.001   0.004     2.705       3.841      5.024     6.635     7.879
2       0.020   0.051   0.103     [4.605]     5.991      7.378     9.210     10.597
3       0.115   0.216   0.352     6.251       [7.815]    9.348     11.345    12.838
4       0.297   0.484   0.711     7.779       9.488      11.143    13.277    14.860

Examples (highlighted in the extract)
(c) For a significance level of 5% and 3 degrees of freedom, $\chi^2_{5\%}(3) = 7.815$.
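The link between a tabulated critical value and its upper tail probability can be checked numerically. The following is a minimal sketch, not the method used to build the book's tables: it assumes the standard series for the regularised lower incomplete gamma function, and the function name `chi2_upper_tail` is a hypothetical helper introduced only for illustration.

```python
import math


def chi2_upper_tail(x, nu):
    """Approximate P(chi-squared with nu degrees of freedom > x).

    Uses the series gamma(a, t) = e^(-t) t^a * sum_n t^n / (a (a+1) ... (a+n))
    with a = nu / 2 and t = x / 2 (an assumed, standard identity).
    """
    if x <= 0:
        return 1.0
    a, t = nu / 2.0, x / 2.0
    term = 1.0 / a          # n = 0 term of the series
    total = term
    for n in range(1, 1000):
        term *= t / (a + n)  # next term: multiply by t / (a + n)
        total += term
        if term < 1e-15 * total:
            break
    lower = total * math.exp(-t + a * math.log(t) - math.lgamma(a))
    return 1.0 - lower


# Critical values taken from the extract above.
print(chi2_upper_tail(7.815, 3))   # close to 0.05 (5% level, 3 degrees of freedom)
print(chi2_upper_tail(4.605, 2))   # close to 0.10 (10% level, 2 degrees of freedom)
print(chi2_upper_tail(0.711, 4))   # close to 0.95 (lower-tail entry for 4 degrees of freedom)
```

Small differences from exactly 0.05, 0.10 and 0.95 are expected because the tabulated critical values are rounded to three decimal places.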
In this case the random digits are 5, 9, 3, 1, 9, 4, 1, 0, 6

Here are 100 digits generated on a calculator.

4 9 8 3 3 3 7 1 3 9
9 9 1 8 6 1 1 6 3 6
0 7 7 3 3 3 7 3 5 4
3 8 1 4 2 8 8 6 1 9
4 5 3 4 9 4 3 8 5 5
8 6 6 7 5 9 2 6 3 3
3 8 2 4 8 4 1 9 8 4
1 4 2 2 1 7 0 8 2 5
7 5 8 0 4 7 6 9 1 2
9 7 7 5 3 7 4 0 6 6

A $\chi^2$ goodness-of-fit test is used to test whether the numbers generated on the calculator are random enough. To make it easier to analyse the data, arrange the digits in a frequency table:

Digit       0   1   2   3   4   5   6   7   8   9
Frequency   4  10   7  16  12   8  10  11  12  10    Total 100

Make null and alternative hypotheses as follows:
$H_0$: the digits are random
$H_1$: the digits are not random.

Then calculate the frequencies that you would expect if the digits are random.
Expected frequency for each digit is $100 \times 0.1 = 10$.

Add another row to the table so that the observed frequencies ($O$) and the expected frequencies ($E$) can be compared.

Digit                      0   1   2   3   4   5   6   7   8   9
Observed frequency (O)     4  10   7  16  12   8  10  11  12  10    Total 100
Expected frequency (E)    10  10  10  10  10  10  10  10  10  10    Total 100

The $\chi^2$ test compares each observed frequency with the corresponding expected frequency.
For each pair, calculate $\frac{(O - E)^2}{E}$, then calculate the sum to give the test statistic

$$X^2 = \sum \frac{(O - E)^2}{E}$$

If $X^2 = 0$, then there is exact agreement between the observed and expected frequencies.
If $X^2 > 0$, then $O$ and $E$ do not agree exactly; the larger the value of $X^2$ the greater the discrepancy. A low value of $X^2$ implies a good fit, whereas a high value of $X^2$ implies a poor fit.

For the above data,

$$X^2 = \frac{(4 - 10)^2}{10} + \frac{(10 - 10)^2}{10} + \frac{(7 - 10)^2}{10} + \dots + \frac{(10 - 10)^2}{10} = 9.4$$

The calculations are usually summarised in a table:

O            E            (O - E)^2 / E
4            10           3.6
10           10           0
7            10           0.9
16           10           3.6
12           10           0.4
8            10           0.4
10           10           0
11           10           0.1
12           10           0.4
10           10           0
ΣO = 100     ΣE = 100     X^2 = 9.4
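The tabulation and the test statistic for the 100 calculator digits can be reproduced with a short script. This is a sketch under stated assumptions: the digit grid is transcribed from the text, and the second and third rows are a reconstruction chosen to agree with the printed frequency table (4, 10, 7, 16, 12, 8, 10, 11, 12, 10); the variable names are illustrative only.

```python
from collections import Counter

# The 100 digits, one string per row of the grid.
# Rows two and three are an assumed reconstruction consistent with the
# frequency table in the text.
rows = [
    "4983337139",
    "9918611636",
    "0773337354",
    "3814288619",
    "4534943855",
    "8667592633",
    "3824841984",
    "1422170825",
    "7580476912",
    "9775374066",
]

digits = [int(ch) for row in rows for ch in row]
counts = Counter(digits)

# Observed frequency of each digit 0..9, and the expected frequency under H0
# (each digit equally likely, so 100 x 0.1 = 10).
observed = [counts[d] for d in range(10)]
expected = [len(digits) * 0.1] * 10

# Test statistic: sum of (O - E)^2 / E over the ten classes.
x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print(observed)
print(round(x2, 1))
```

Running it gives the frequency table used above and a test statistic of 9.4, matching the hand calculation.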
The frequencies can be illustrated by a vertical line graph:

[Figure: Distribution of 100 random digits generated on a calculator. Vertical line graph of observed and expected frequencies; vertical axis: frequency (0 to 16), horizontal axis: digit (0 to 9).]

To decide whether the data give a good fit, you need to know whether 9.4 lies in the main body of the distribution or whether it is in the critical (rejection) region in the upper tail. If it lies in the critical region, reject $H_0$.

The boundary of the critical region is found from the appropriate $\chi^2$ distribution, which depends on the number of degrees of freedom, $\nu$, the number of independent variables used in calculating the test statistic. It is found as follows:

The number of classes is 10 and there is one restriction (that the total of the expected frequencies is 100), so $\nu = 10 - 1 = 9$. Consider the $\chi^2(9)$ distribution.

Say that the test is to be carried out at the 5% significance level. The critical value, $\chi^2_{5\%}(9)$, is found from tables (see page 651).
So $H_0$ will be rejected if $X^2 > 16.919$.

Since the calculated value of the test statistic, 9.4, is less than 16.919, it does not lie in the critical region and $H_0$ is not rejected.

On the evidence obtained, you would accept that the digits are true random digits.

Summary of the procedure for performing a $\chi^2$ goodness-of-fit test

For a set of data with observed frequencies $O$:

1. Make the null hypothesis $H_0$ that the data are distributed in a particular way and the alternative hypothesis $H_1$ that they are not.

2. Calculate $E$, the frequencies expected if the distribution follows the one given in $H_0$. Note that when calculating $X^2$, small values of $E$ tend to give a large value of $X^2$, so it is advisable to adopt the rule that expected frequencies below 5 should not be used.
If $E < 5$ for any class, combine adjacent classes to form a class that is sufficiently large. Combine corresponding frequencies in the observed data also and make a revised table.

3. Work out the number of degrees of freedom, $\nu$, where
$\nu$ = number of classes - number of restrictions.

4. Decide on the level of the test and look up the appropriate critical value in tables. For example, for a 5% significance level, look up $\chi^2_{5\%}(\nu)$.
Use it to state the rejection criteria:
If $X^2 > \chi^2_{5\%}(\nu)$ then the test value lies in the critical (rejection) region; the differences between the observed and expected values are considered to be too great and $H_0$ is rejected.
If $X^2 < \chi^2_{5\%}(\nu)$ the test value does not lie in the critical region and $H_0$ is not rejected.

5. Compare the calculated value of $X^2$ with the critical value. Make your conclusion ($H_0$ is rejected or $H_0$ is not rejected) and relate it to the context of the situation being investigated.

Note that when the value of $X^2$ is very small, it is wise to query the reliability of the observed data. This is where the lower tail (left-hand) probabilities might be useful.
For example, suppose that the test involves a $\chi^2(4)$ distribution and that the calculated value of the test statistic is $X^2 = 0.7$.
You can see from the tables on page 651 that $\chi^2_{95\%}(4) = 0.711$, which means that if the null hypothesis is true you would expect a value less than 0.711 from at most 5% of samples, so this would be quite rare. You might wonder whether the observed data have been fiddled.

TEST 1: GOODNESS-OF-FIT TEST FOR A UNIFORM DISTRIBUTION

Example 12.1

The table shows the number of employees absent for just one day during a particular period of time.

Day of the week        Mon   Tues   Wed   Thurs   Fri
Number of absentees    121   87     87    91      114    Total 500

(a) Find the frequencies expected according to the hypothesis that the number of absentees is independent of the day of the week.
(b) Test at the 5% level whether the differences in the observed and expected data are significant.

Solution 12.1

$H_0$: The number of absentees is independent of the day of the week.
$H_1$: The number of absentees is not independent of the day of the week.

If the number of absentees is independent of the day of the week then you would expect the total of 500 to be spread uniformly throughout the week.
Expected number of absentees for any day is 100.
Perform the test at the 5% level.
From tables $\chi^2_{5\%}(4) = 9.488$, so reject $H_0$ if $X^2 > 9.488$.

O       E       (O - E)^2 / E
121     100     4.41
87      100     1.69
87      100     1.69
91      100     0.81
114     100     1.96
                10.56

$$X^2 = \sum \frac{(O - E)^2}{E} = 10.56$$

Perform the test at the 1% level.
From tables $\chi^2_{1\%}(2) = 9.210$, so reject $H_0$ if $X^2 > 9.210$.

O            E            (O - E)^2 / E
24           30           1.2
14           20           1.8
62           50           2.88
ΣO = 100     ΣE = 100     5.88

$$X^2 = \sum \frac{(O - E)^2}{E} = 5.88$$
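The two calculation tables above follow the same pattern: share the total out according to a hypothesised ratio, sum the contributions, and compare with a tabulated critical value. A minimal sketch follows. The helper names are hypothetical, and the ratio 3 : 2 : 5 for the second table is an assumption inferred from its expected frequencies 30, 20, 50; the critical values 9.488 and 9.210 are the tabulated ones quoted in the text.

```python
def expected_from_ratio(ratio, total):
    """Share `total` between the classes in the given ratio."""
    parts = sum(ratio)
    return [total * r / parts for r in ratio]


def chi_squared_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def decide(x2, critical_value):
    """Upper-tail decision rule: reject H0 only if X^2 exceeds the critical value."""
    return "reject H0" if x2 > critical_value else "do not reject H0"


# Example 12.1: uniform model for absentees over five days, 5% level, nu = 5 - 1 = 4.
absentees = [121, 87, 87, 91, 114]
uniform = expected_from_ratio([1, 1, 1, 1, 1], sum(absentees))
x2_uniform = chi_squared_statistic(absentees, uniform)
print(round(x2_uniform, 2), decide(x2_uniform, 9.488))

# Second table: three classes, assumed ratio 3 : 2 : 5, 1% level, nu = 3 - 1 = 2.
observed = [24, 14, 62]
in_ratio = expected_from_ratio([3, 2, 5], sum(observed))
x2_ratio = chi_squared_statistic(observed, in_ratio)
print(round(x2_ratio, 2), decide(x2_ratio, 9.210))
```

With these figures the first statistic, 10.56, exceeds 9.488, so $H_0$ is rejected at the 5% level, while the second, 5.88, is below 9.210, so $H_0$ is not rejected at the 1% level.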
9. During the course of one year a tutor marked 111 assignments. The grades he awarded and the comparable national proportions are given in the table:

Grade                  A     B     C     D
Number he awarded      86    18    6     1
National proportion    71%   16%   7%    6%

Calculate the expected numbers (to one decimal place) based on the national proportions.
The $\chi^2$ goodness-of-fit test requires the summation of terms of the form
$$\frac{(O - E)^2}{E}$$
where $O$ and $E$ are observed and expected frequencies. Suggest reasons why
(a) the difference between $O$ and $E$ is used,
(b) this difference is squared, and
(c) the squared difference is divided by $E$.
Test, at the 5% level, whether there is any difference between the tutor's and the national awarding of grades. State your conclusions clearly. (O)

10. A calibrated instrument is used over a wide range of values. To assess the operator's ability to read the instrument accurately, the final digit in each of 700 readings was noted. The results are tabulated below.

Final digit   Frequency
0             75
1             63
2             50
3             58
4             73
5             95
6             96
7             63
8             46
9             81

Use an approximate $\chi^2$ statistic to test whether there is any evidence of bias in the operator's reading of the instrument. Use a 5% significance level and state your null and alternative hypotheses. (L)

11. The grades in a statistics examination for a group of students were as follows.

Grade                 A    B    C    D    E
Number of students    14   18   32   20   16

Test the hypothesis that the distribution of grades is uniform. Use a 5% level of significance. (L)

12. An ordinary die is thrown 120 times and each time the number on the uppermost face is noted. The results are as follows:

Number on die   1    2    3    4    5    6
Frequency       14   16   24   22   24   20

Is the die fair? Test at the 10% level.

13. In a certain town an investigation was carried out into accidents in the home to children under 12 years of age. The numbers of reported accidents and the ages of the children concerned are summarised in the table.

Group   Age of child (yrs)   No. of accidents
A       0 to <2              42
B       2 to <4              52
C       4 to <6              28
D       6 to <8              20
E       8 to <10             18
F       10 to <12            16

(a) State the modal class.
(b) Calculate, to the nearest month, the mean age and the standard deviation of the distribution of ages.
(c) Draw a cumulative frequency curve, and from it estimate, to the nearest month, the median and the interquartile range for the ages of all children under 12 years of age concerned in reported accidents in the home.
State, giving a reason, whether you consider the mean, the mode or the median best represents the average age for accidents in the home to children under 12 years of age.
(d) An investigator believes that children in the groups A, B, C, D, E, F are likely to have accidents in the home in the ratios 2 : 2 : 1 : 1 : 1 : 1 respectively. Use a $\chi^2$ test at a 5% significance level to decide whether or not this belief is justified. (L)

TEST 3: GOODNESS-OF-FIT TEST FOR A BINOMIAL DISTRIBUTION

Example 12.3

A farmer kept a record of the number of heifer calves born to each cow during the first five years of breeding of the cow. The results are summarised in the table:

Number of heifers   0   1    2    3    4    5
Number of cows      4   19   41   52   26   8

(a) Test, at the 5% level of significance, whether or not the binomial distribution with parameters $n = 5$, $p = 0.5$, is an adequate model for these data.
(b) Explain briefly what changes you would make in your analysis if you were testing whether or not the binomial distribution with $n = 5$ and unspecified $p$ fitted the data. (AEB)

Solution 12.3

(a) Let $X$ be the number of heifer calves born to a cow in the first five years of breeding.

$H_0$: $X \sim B(5, 0.5)$
$H_1$: $X$ is not distributed in this way.

To calculate the binomial probabilities, use cumulative probability tables which give $P(X \le x)$.
Alternatively, calculate the probabilities using
$$P(X = x) = {}^5C_x (0.5)^{5-x} (0.5)^x = {}^5C_x (0.5)^5$$

The total number of cows is 150, so the expected frequencies are found by multiplying $P(X = x)$ by 150.

Note on accuracy: When calculating it is often necessary to approximate, say to the nearest integer or to one decimal place. If you have memory facilities on your calculator for retaining several numbers then you may prefer to do so.

Using tables (extract from page 645), $X \sim B(5, 0.5)$, $n = 5$:

r     P(X <= r)
0     0.0313
1     0.1875
2     0.5000
3     0.8125
4     0.9688
5     1.0000

Probability                                   E = 150 x P(X = x)
P(X = 0) = 0.0313                             4.7
P(X = 1) = 0.1875 - 0.0313 = 0.1562           23.4
P(X = 2) = 0.5 - 0.1875 = 0.3125              46.9
P(X = 3) = 0.8125 - 0.5 = 0.3125              46.9
P(X = 4) = 0.9688 - 0.8125 = 0.1563           23.4
P(X = 5) = 1 - 0.9688 = 0.0312                4.7
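The expected frequencies for the $B(5, 0.5)$ model can also be obtained directly from the formula $P(X = x) = {}^5C_x (0.5)^5$ rather than from cumulative tables. The sketch below uses only the standard library; the function name `binomial_pmf` is an illustrative helper, not notation from the text.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


n_trials, p, total_cows = 5, 0.5, 150

probabilities = [binomial_pmf(x, n_trials, p) for x in range(n_trials + 1)]
expected = [total_cows * q for q in probabilities]

for x, (q, e) in enumerate(zip(probabilities, expected)):
    # Probability to 4 d.p. and expected frequency to 1 d.p.
    print(x, round(q, 4), round(e, 1))
```

The unrounded probabilities are 0.03125, 0.15625 and 0.3125, so values read from four-figure tables can differ from these in the last decimal place.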
Check on size of expected frequencies:
Since the expected frequencies for the first and last classes are less than 5, combine them with the next classes.
Do a revised table for the expected frequencies and also show the corresponding observed frequencies:

Number of heifers x   0 or 1   2      3      4 or 5
O                     23       41     52     34       ΣO = 150
E                     28.1     46.9   46.9   28.1     ΣE = 150
(O - E)^2 / E         0.925... 0.742... 0.554... 1.238...   3.461...

There are four classes and one restriction, so $\nu = 4 - 1 = 3$. From tables $\chi^2_{5\%}(3) = 7.815$.

Since $X^2 < 7.815$, do not reject $H_0$.
The binomial distribution with $n = 5$ and $p = 0.5$ is an adequate model for the data.

(b) If you want to test whether the distribution $B(5, p)$ provides an adequate model, with $p$ unspecified, you would need to estimate $p$ from the data using the fact that the mean of a binomial distribution is $np$.

From the data
$$\bar{x} = \frac{\Sigma f x}{\Sigma f} = \frac{401}{150} = 2.673\ldots$$
Since
$\bar{x} = np$
$2.673 = 5p$
$p = 0.535$ (3 d.p.)

So the null hypothesis would be $H_0$: $X \sim B(5, 0.535)$ and the expected frequencies would be calculated using $p = 0.535$.
When working out $\nu$, the number of degrees of freedom, you would take into account that there are now two restrictions, one is that $\Sigma E = 150$ (as before) and the other is that $p$ is estimated from the sample.
Try working through this test. You should find that $\nu = 5 - 2 = 3$, $X^2 = 0.85$ (depending on degree of approximation in your calculations) and $H_0$ not rejected.

TEST 4: GOODNESS OF FIT FOR A POISSON DISTRIBUTION

Example 12.4

Solution 12.4

The probabilities can be calculated using
$$P(X = x) = e^{-2}\,\frac{2^x}{x!}$$

The total number of matches is 100, so the expected frequencies are found by multiplying $P(X = x)$ by 100.

Using tables (extract from page 647), $X \sim Po(2)$, $\lambda = 2.0$:

r     P(X <= r)
0     0.1353
1     0.4060
2     0.6767
3     0.8571
4     0.9473
5     0.9834
6     0.9955
7     0.9989
8     0.9998
9     1.0000

Probability                                   E = 100 P(X = x)
P(X = 0) = 0.1353                             13.53
P(X = 1) = 0.4060 - 0.1353 = 0.2707           27.07
P(X = 2) = 0.6767 - 0.4060 = 0.2707           27.07
P(X = 3) = 0.8571 - 0.6767 = 0.1804           18.04
P(X = 4) = 0.9473 - 0.8571 = 0.0902           9.02
P(X = 5) = 0.9834 - 0.9473 = 0.0361           3.61
P(X = 6) = 0.9955 - 0.9834 = 0.0121           1.21
P(X = 7 or more) = 1 - 0.9955 = 0.0045        0.45

Check on size of expected frequencies:
The $\chi^2$ test is not valid for expected frequencies less than 5, so combine the last three classes to give 5 or more goals.

Revised table:

X    0       1       2       3       4      5 or more
O    14      18      29      18      10     11
E    13.53   27.07   27.07   18.04   9.02   5.27
O            E            (O - E)^2 / E
14           13.53        0.016...
18           27.07        3.038...
29           27.07        0.137...
18           18.04        0.00008...
10           9.02         0.106...
11           5.27         6.230...
ΣO = 100     ΣE = 100     9.529...

There are six classes and one restriction, so $\nu = 6 - 1 = 5$. From tables $\chi^2_{10\%}(5) = 9.236$.

Since $X^2 > 9.236$, reject $H_0$.
The number of goals per match cannot be modelled by a Poisson distribution with parameter 2.

Example 12.5

Can the data of Example 12.4 be modelled by a Poisson distribution having the same mean as the observed data? Test at the 10% level.

Solution 12.5

For the observed data, $\bar{x} = \frac{\Sigma f x}{\Sigma f} = \frac{230}{100} = 2.3$.

The null hypothesis is that the distribution is Poisson, with parameter 2.3,
i.e. $H_0$: $X \sim Po(2.3)$
$H_1$: $X$ is not distributed in this way.

The probabilities are found from cumulative tables or by calculating using
$$P(X = x) = e^{-2.3}\,\frac{(2.3)^x}{x!}, \quad x = 0, 1, 2, \ldots$$
The expected frequencies are given by $100 \times P(X = x)$.

Revised table:

X    0       1       2       3       4       5 or more
O    14      18      29      18      10      11          ΣO = 100
E    10.03   23.06   26.52   20.33   11.69   8.43        ΣE = 100

Degrees of freedom $\nu$:
There are six classes.
There are two restrictions:
$\Sigma E = 100$
the mean of the Poisson distribution has to be estimated from the data.
Therefore $\nu = 6 - 2 = 4$, so consider the $\chi^2(4)$ distribution.

Perform the test at the 10% level.
From tables $\chi^2_{10\%}(4) = 7.779$, so reject $H_0$ if $X^2 > 7.779$.

Calculating $X^2$:

O            E            (O - E)^2 / E
14           10.03        1.571...
18           23.06        1.110...
29           26.52        0.231...
18           20.33        0.267...
10           11.69        0.244...
11           8.43         0.783...
ΣO = 100     ΣE = 100     4.208...

$$X^2 = \sum \frac{(O - E)^2}{E} = 4.208 \text{ (3 d.p.)}$$

Since $X^2 < 7.779$, do not reject $H_0$.
The number of goals per match can be modelled by a Poisson distribution with the same mean as the observed data.
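The two Poisson tests on the goals data differ only in the parameter used and in the number of restrictions. The sketch below runs both, assuming the revised table with an open-ended top class "5 or more". The helper names are hypothetical, and the critical values 9.236 and 7.779 are the tabulated ones quoted in the text. Because it uses unrounded probabilities, the statistics come out slightly different from the table-based values 9.529 and 4.208.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """P(X = x) for X ~ Po(lam)."""
    return exp(-lam) * lam ** x / factorial(x)


def expected_with_open_top_class(lam, total, last_class):
    """Expected frequencies for 0, 1, ..., last_class - 1 and 'last_class or more'."""
    probs = [poisson_pmf(x, lam) for x in range(last_class)]
    probs.append(1 - sum(probs))      # the combined upper class
    return [total * q for q in probs]


def chi_squared_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [14, 18, 29, 18, 10, 11]   # goals: 0, 1, 2, 3, 4, 5 or more

results = {}
# (parameter, number of restrictions, tabulated 10% critical value)
for lam, restrictions, critical in [(2.0, 1, 9.236), (2.3, 2, 7.779)]:
    expected = expected_with_open_top_class(lam, sum(observed), 5)
    x2 = chi_squared_statistic(observed, expected)
    nu = len(observed) - restrictions
    results[lam] = (nu, x2, x2 > critical)
    print(lam, nu, round(x2, 2), "reject H0" if x2 > critical else "do not reject H0")
```

Estimating the mean from the data costs one extra degree of freedom, which is why the second test uses $\chi^2(4)$ rather than $\chi^2(5)$.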
TEST 5: GOODNESS-OF-FIT TEST FOR A NORMAL DISTRIBUTION

Example 12.6

The height, in centimetres, gained by a conifer in its first year of planting is denoted by the random variable $X$. The value of $X$ is measured for a random sample of 86 conifers and the results obtained are summarised in the table:

X                     <35   35-45   45-55   55-65   >65
Observed frequency    10    18      28      18      12

(a) Assuming that $X$ is modelled by a $N(50, 15^2)$ distribution, calculate the expected frequencies for each of the five classes.
(b) Carry out a $\chi^2$ goodness-of-fit analysis to test, at the 5% level, the hypothesis that $X$ can be modelled in this way.

Solution 12.6

(a) $X \sim N(50, 15^2)$
Standardise each $X$ value, e.g. when $x = 35$,
$$z = \frac{x - \mu}{\sigma} = \frac{35 - 50}{15} = -1$$

x    35    45       55      65
z    -1    -0.333   0.333   1

Notice that there is symmetry in the diagram.

Probabilities                                                        E = probability x 86
P(X < 35) = P(Z < -1) = 1 - 0.8413 = 0.1587                          13.7
P(35 < X < 45) = P(-1 < Z < -0.333) = 0.8413 - 0.6304 = 0.2109       18.1
P(45 < X < 55) = P(-0.333 < Z < 0.333) = 2 x 0.6304 - 1 = 0.2608     22.4
P(55 < X < 65) = 0.2109 (by symmetry)                                18.1
P(X > 65) = 0.1587 (by symmetry)                                     13.7
                                                                     ΣE = 86

Note that the expected frequencies have been given to 1 d.p.

(b) $\chi^2$ goodness-of-fit test:
$H_0$: $X \sim N(50, 15^2)$
$H_1$: $X$ is not distributed in this way.

X    <35    35-45   45-55   55-65   >65
O    10     18      28      18      12      ΣO = 86
E    13.7   18.1    22.4    18.1    13.7    ΣE = 86

(Note that all expected frequencies are greater than 5 so there is no need to combine classes.)

Degrees of freedom $\nu$:
There are five classes and one restriction ($\Sigma E = 86$).
Therefore $\nu = 5 - 1 = 4$, so consider the $\chi^2(4)$ distribution.

Perform the test at the 5% level.
From tables $\chi^2_{5\%}(4) = 9.488$, so reject $H_0$ if $X^2 > 9.488$.

O           E       (O - E)^2 / E
10          13.7    0.999...
18          18.1    0.0005...
28          22.4    1.4
18          18.1    0.0005...
12          13.7    0.210...
ΣO = 86             2.611...

$$X^2 = \sum \frac{(O - E)^2}{E} = 2.61 \text{ (2 d.p.)}$$

Since $X^2 < 9.488$, do not reject $H_0$.

Example 12.7

A weaving mill sells lengths of cloth with a nominal length of 70 m. A customer measured 100 lengths and obtained the following frequency distribution:

Length (m)   61-67   67-69   69-71   71-73   73-75   75-81
Frequency    1       16      26      19      20      18

Use a $\chi^2$ test at the 5% level of significance to show that the normal distribution is not an adequate model for the data. (AEB)

Solution 12.7

The null hypothesis is that the distribution is normal, but since neither the mean nor the variance is given, they have to be estimated from the data.

Mid-interval value x   64   68   70   72   74   78
Frequency              1    16   26   19   20   18

From the calculator
$\hat{\mu} = \bar{x} = 72.24$ (see page 32)
$\hat{\sigma}^2 = 11.578$ (3 d.p.) (see page 449)

$\chi^2$ goodness-of-fit test
$H_0$: $X \sim N(72.24, 11.578)$
$H_1$: $X$ is not distributed in this way.

Standardise the boundary values of the intervals (to 3 d.p.) using
$$z = \frac{x - \mu}{\sigma} = \frac{x - 72.24}{\sqrt{11.578}}$$
when $x = 61$, $z = \frac{61 - 72.24}{\sqrt{11.578}} = -3.303$.

X: 61, 67, 69, 71, 73, 75, 81

NOTE: $P(X < 61) = P(Z < -3.303) \approx 0$, so take the first class as $X < 67$.
$E = \text{prob} \times 100$

Probabilities                                                              $E$
$P(X < 67) = P(Z < -1.540) = 1 - 0.9382 = 0.0618$                          6.18
$P(67 < X < 69) = P(-1.540 < Z < -0.952) = 0.9382 - 0.8294 = 0.1088$       10.88
$P(69 < X < 71) = P(-0.952 < Z < -0.364) = 0.8294 - 0.6421 = 0.1873$       18.73
$P(71 < X < 73) = P(-0.364 < Z < 0.223) = 0.6421 + 0.5883 - 1 = 0.2304$    23.04
$P(73 < X < 75) = P(0.223 < Z < 0.811) = 0.7913 - 0.5883 = 0.2030$         20.30
$P(75 < X < 81) = P(0.811 < Z < 2.574) = 0.995 - 0.7913 = 0.2037$          20.37
$P(X > 81) = P(Z > 2.574) = 1 - 0.995 = 0.005$                             0.5

Summary of the number of degrees of freedom for goodness-of-fit tests

Distribution    $\nu$
Uniform
Given ratio
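The normal goodness-of-fit procedure of Example 12.6 can be sketched in a few lines. This is an illustrative sketch, not the book's working: the helper names are hypothetical, and the normal probabilities are computed with `math.erf` rather than read from four-figure tables. Because the expected frequencies are not rounded to 1 d.p., the statistic comes out slightly below the 2.61 obtained above, but the conclusion at the 5% level is the same.

```python
import math

def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma^2), via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def expected_frequencies(boundaries, mu, sigma, total):
    """Expected class frequencies; the first and last classes are open-ended."""
    cdf = [0.0] + [normal_cdf(b, mu, sigma) for b in boundaries] + [1.0]
    return [total * (hi - lo) for lo, hi in zip(cdf, cdf[1:])]

def chi_squared(observed, expected):
    """Test statistic X^2 = sum of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Data from Example 12.6: classes <35, 35-45, 45-55, 55-65, >65.
observed = [10, 18, 28, 18, 12]
expected = expected_frequencies([35, 45, 55, 65], mu=50, sigma=15, total=86)
x2 = chi_squared(observed, expected)

critical = 9.488  # chi-squared 5% point with 4 degrees of freedom, from tables
print([round(e, 1) for e in expected])
print(round(x2, 2), "reject H0" if x2 > critical else "do not reject H0")
```

When the mean and variance are estimated from the data, as in Example 12.7, the same functions apply with the estimated values, but the number of degrees of freedom must be reduced to allow for the extra restrictions.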
5. Smallwoods Ltd. run a weekly football pools competition. One part of this involves a fixed-odds contest where the entrant has to forecast correctly the result of each of five given matches. In the event of a fully correct forecast the entrant is paid out at odds of 100 to 1. During the last two years Miss Fortune has entered this fixed-odds contest 80 times. The table below summarises her results.

Number of matches correctly      Number of entries with
forecast per entry ($x$)         $x$ correct forecasts ($f$)
0                                8
1                                19
2                                25
3                                22
4                                5
5                                1

(a) Find the frequencies of the number of matches correctly forecast per entry given by a binomial distribution having the same mean and total as the observed distribution.
(b) Use the $\chi^2$ distribution and a 10% level of significance to test the adequacy of the binomial distribution as a model for these data.
(c) On the evidence before you, and assuming that the point of entering is to win money, would you advise Miss Fortune to continue with this competition and why? (AEB)

6. Samples of size 5 are selected regularly from a production line and tested. During one week 500 samples are taken and the number of defective items in each sample is recorded.

Number of defectives, $x$    0     1     2     3    4   5
Frequency, $f$               170   180   120   20   8   2

(a) It is suggested that a binomial model, with mean the same as the observed data, can be used. Find the frequencies expected by this model.
(b) Test whether this binomial model is a good one. Use a 5% level of significance.

7. A group of students are performing an experiment where 20 drawing pins are dropped randomly on to the floor and the number landing point down is counted. The procedure is then repeated several times. Describe the assumptions you would need to make in order to be satisfied with modelling this situation by a binomial distribution. The experiment was carried out until the students had 50 observations; their results are given in the table:

Number landing point down    Frequency
3                            2
4                            2
5                            5
6                            7
7                            17
8                            8
9                            6
10                           1
11                           2

(a) Calculate the mean number landing point down. Hence show that an estimate for the probability of a drawing pin landing point down is 0.35.
(b) What are the parameters of the appropriate binomial distribution for these data? Calculate the probability of exactly eight landing point down, and hence write down, accurate to one decimal place, its expected frequency.
(c) Using appropriate tables, find, making your method clear, the expected number of times five or fewer pins would land point down.
(d) The chi-squared goodness-of-fit test can be used to judge how well data follow a distribution. Group the above data in the following manner and evaluate the missing expected or observed frequencies:

Number of pins: $\le 5$, 6, 7, 8
Expected: 8.6, 11.8
Observed: 9, 7, 17

Calculate the value of the chi-squared statistic for this data.
(e) How many degrees of freedom does your test have? By referring to your tables carry out the test and make your findings clear. (O)

8. A local council has records of the number of children and the number of households in its area. It is therefore known that the average number of children per household is 1.40. It is suggested that the number of children per household can be modelled by the Poisson distribution with parameter 1.40. In order to test this, a random sample of 1000 households is taken, giving the following data.

Number of children      0     1     2     3    4    5+
Number of households    273   361   263   78   21   4

(a) Find the corresponding expected frequencies obtained from the Poisson distribution with parameter 1.40.
(b) Carry out a $\chi^2$ test, at the 5% level of significance, to determine whether or not the proposed model should be accepted. State clearly the null and alternative hypotheses being tested and the conclusion which is reached. (MEI)

9. The numbers of cars passing a check-point during 100 intervals, each of time 5 minutes, were noted:

Number of cars    Frequency
0                 5
1                 23
2                 23
3                 25
4                 14
5                 10
6 or more         0

Fit a Poisson distribution to these data and test the goodness of fit.

10. During the weaving of cloth the thread sometimes breaks. 147 lengths of thread of equal length were observed during weaving and the table records the number of these threads for which the indicated number of breaks occurred.

Number of breaks per thread    0    1    2    3    4   5
Number of threads              48   46   30   12   9   2

Fit a Poisson distribution to the data and examine whether the deviation between theory and experiment is significant. (MEI)

11. A shop that repairs television sets keeps a record of the number of sets brought in for repair each day. The numbers brought in during a random sample of 40 days were as follows.

4 0 0 0 2 1 1 0 0 0    0 1 1 0 3 0 0 0 1 0
4 0 0 0 0 0 2 0 1 0    0 0 0 1 1 1 0 2 0 0

Test, at the 5% significance level, the hypothesis that these numbers are observations from a Poisson distribution. (C)

12. The table gives the distribution for the number of heavy rainstorms reported by 330 weather stations in the United States of America over a one-year period.

Number of rainstorms ($x$)    Number of stations ($f$) reporting $x$ rainstorms
0                             102
1                             114
2                             74
3                             28
4                             10
5                             2
more than 5                   0

(a) Find the expected frequencies of rainstorms given by the Poisson distribution having the same mean and total as the observed distribution.
(b) Use the $\chi^2$ distribution to test the adequacy of the Poisson distribution as a model for these data. (AEB)

13. Over a period of 50 weeks the numbers of road accidents reported to a police station are shown in the table below.

No. of accidents    0    1    2    3
No. of weeks        23   13   10   4

Find the mean number of accidents per week. Use this mean, a 5% level of significance and your table of $\chi^2$ to test the hypothesis that these data are a random sample from a population with a Poisson distribution. (O&C)

14. (a) The data in the following table are the result of counting radioactive events in five-second intervals:

Number of events          0   1    2    $\ge 3$
Number of observations    5   14   13   8

Show that the mean number of events in a five-second interval is 1.7 (taking the group with frequency 8 to have a mean of 3.5).
(b) Write down the probability of 0, 1, 2, $\ge 3$ events for a Poisson distribution with mean 1.7. Hence obtain to one decimal place the expected frequencies.
(c) Use the chi-squared goodness of fit test to assess whether it is reasonable to claim that the data come from a Poisson distribution. Make your method clear and conduct your test at the 10% level.
(d) A student conducting a similar experiment found the chi-squared statistic for his results was 0.015. What conclusions do you draw from this value? (O)
15. For a period of six months 100 similar hamsters were given a new type of feedstuff. The gains in mass are recorded in the table below:

Gain in mass (g) $x$       Observed frequency $f$
$-\infty < x \le -10$      3
$-10 < x \le -5$           6
$-5 < x \le 0$             9
$0 < x \le 5$              15
$5 < x \le 10$             24
$10 < x \le 15$            16
$15 < x \le 20$            14
$20 < x \le 25$            8
$25 < x \le 30$            3
$30 < x < \infty$          2

It is thought that these data follow a normal distribution, with mean 10 and variance 100. Use the $\chi^2$ distribution at the 5% level of significance to test this hypothesis.
Describe briefly how you would modify this test if the mean and variance were unknown. (AEB)

16. The following data give the heights in centimetres of 100 male students.

Height (cm)    Frequency
155-160        5
161-166        17
167-172        38
173-178        25
179-184        9
185-190        6

(a) Test, at the 5% level, whether the data follow a normal distribution with mean 173.5 cm and standard deviation 7 cm.
(b) Find the expected frequencies for a normal distribution having the same mean and variance as the data given, and test the goodness of fit, using a 5% level of significance.

17. In a European country registration for military service is compulsory for all eighteen-year-old males. All males must report to a barracks where, after an inspection, some people, including all those less than 1.6 m tall, are excused service. The heights of a sample of 125 eighteen-year-olds measured at the barracks were as follows:

Height, m    1.2-   1.4-   1.6-   1.8-   2.0-2.2
Frequency    6      34     31     42     12

(a) Use a $\chi^2$ test and a 5% significance level to confirm that the normal distribution is not an adequate model for this data.
(b) Show that, if the second and third classes (1.4- and 1.6-) are combined, the normal distribution does appear to fit the data. Comment on this apparent contradiction in the light of the information at the beginning of the question. (AEB)

THE $\chi^2$ SIGNIFICANCE TEST FOR INDEPENDENCE

Sometimes situations arise when data are classified according to two different factors or attributes. These are often displayed in a table, known as a contingency table, for example

(a) examination grades for Mathematics in three further education colleges

         College
         Bradley   Cooper   Dunstan
...
E        16        17       12
N        5         12       8

(b) age of voter and voting preference

                Candidate
Age of voter    A      B
18-25           373    62
25-40           484    187
40-60           167    563
Over 60         100    492

This is a 4 by 2 contingency table (4 rows and 2 columns).

You can use a $\chi^2$ test to investigate whether the two factors are independent or whether there is an association between them. The test follows a similar pattern to the goodness of fit test, but this time the null hypothesis $H_0$ is that the two factors are independent and the alternative hypothesis $H_1$ is that there is an association between them.

The following example explains how to calculate the expected frequencies for data given in a contingency table.

Example 12.8

The members of a sports team are interested in whether the weather has an effect on their results. They play 50 matches with the following results:

           Weather
Result     Good   Bad   Total
Win        12     4     16
Draw       5      8     13
Lose       7      14    21
Total      24     26    50

Formulate suitable null and alternative hypotheses, and use a $\chi^2$ test to test the claim, at the 1% significance level, that the weather has no effect on the team's results. State your conclusion clearly. (C)

Solution 12.8

Note that the factors are the result of the match and the type of weather, and they have been linked in a 3 by 2 contingency table.

When calculating the expected frequencies, the row and column totals must remain the same.
Consider the cell linking a win with good weather:

           Weather
Result     Good   Bad   Total
Win        ?            16
Draw
Lose
Total      24           50

Total number of wins = 16, therefore $P(\text{result is a win}) = \dfrac{16}{50}$.

Total number of matches in good weather = 24, therefore $P(\text{good weather}) = \dfrac{24}{50}$.

According to the null hypothesis, the events 'the result is a win' and 'the weather is good' are independent, so, using the multiplication rule for independent events (see page 198),

$P(\text{win and good weather}) = P(\text{win}) \times P(\text{good weather}) = \dfrac{16}{50} \times \dfrac{24}{50}$

Expected number of wins in good weather $= \dfrac{16}{50} \times \dfrac{24}{50} \times 50 = \dfrac{16 \times 24}{50} = 7.68$

Note that the calculation $\dfrac{16 \times 24}{50}$ gives a clue to the quick way of working out the expected frequency:

$\text{expected frequency} = \dfrac{\text{row total} \times \text{column total}}{\text{grand total}}$

So, for example, the expected number of draws in bad weather is calculated as follows:

$\text{expected frequency} = \dfrac{13 \times 26}{50} = 6.76$

The complete table of expected frequencies is

           Weather
Result     Good    Bad     Total
Win        7.68    8.32    16
Draw       6.24    6.76    13
Lose       10.08   10.92   21
Total      24      26      50

Note that all the expected frequencies are greater than five, so cells do not need to be combined.

Degrees of freedom, $\nu$
Notice that in this table, once two of the expected frequencies in different rows have been calculated, the others are known automatically. This is because the row and column totals must agree with those in the observed data. For example,
if expected number of wins in good weather = 7.68,
then expected number of wins in bad weather = 16 - 7.68 = 8.32.

Number of degrees of freedom, $\nu = 2$ and the $\chi^2(2)$ distribution is considered.

Test at the 1% level.
From tables, $\chi^2_{1\%}(2) = 9.21$, so reject $H_0$ if $X^2 > 9.21$.

$O$             $E$             $(O - E)^2 / E$
12              7.68            2.43
5               6.24            0.246...
7               10.08           0.941...
4               8.32            2.243...
8               6.76            0.227...
14              10.92           0.868...
$\sum O = 50$   $\sum E = 50$   6.956...

$X^2 = \sum \dfrac{(O - E)^2}{E} = 6.96$ (2 d.p.)

Since $X^2 < 9.21$, do not reject $H_0$, and conclude that the team's results are independent of the weather.
At the 1% level, conclude that the weather has no effect on the team's results.

4 by 3 table: $\nu = (4 - 1) \times (3 - 1) = 3 \times 2 = 6$
2 by 4 table: $\nu = (2 - 1) \times (4 - 1) = 1 \times 3 = 3$
2 by 2 table: $\nu = (2 - 1) \times (2 - 1) = 1 \times 1 = 1$

In general, if there are $h$ rows, then once $(h - 1)$ expected frequencies in a column have been calculated, the last value in the column is known because the column total must agree.
Similarly, if there are $k$ columns, once $(k - 1)$ expected frequencies in a row have been calculated, the last value in the row is known because the row total must agree.

For an $h$ by $k$ contingency table,
number of degrees of freedom $= (h - 1) \times (k - 1)$.

Yates' correction for a 2 by 2 contingency table

In particular, for a 2 by 2 contingency table, $\nu = 1$ and the $\chi^2(1)$ distribution is considered. In this case, Yates' correction should be applied when calculating $X^2$, where

$X^2 = \sum \dfrac{(|O - E| - 0.5)^2}{E}$

Example 12.9

A driving school examined the results of 100 candidates who took their test for the first time. It was found that out of the 40 men, 28 passed and out of the 60 women, 34 passed. Do these results indicate, at the 5% significance level, a relationship between the sex of a candidate and the ability to pass at the first attempt?

Solution 12.9

$H_0$: There is no relationship between the sex of a candidate and the ability to pass at the first attempt.
$H_1$: There is a relationship.

To calculate expected frequencies, use

$\text{expected frequency} = \dfrac{\text{row total} \times \text{column total}}{\text{grand total}}$

So expected number of males who pass $= \dfrac{40 \times 62}{100} = 24.8$

Use the fact that row and column totals agree with the observed data to work out the remaining frequencies:

           Result of driving test
           Pass    Fail    Total
Male       24.8    15.2    40
Female     37.2    22.8    60
Totals     62      38      100

Note that there are no expected frequencies that are less than 5.

Degrees of freedom, $\nu$
$\nu = (2 - 1)(2 - 1) = 1$, so use the $\chi^2(1)$ distribution.

Test at the 5% level.
From tables, $\chi^2_{5\%}(1) = 3.841$, so reject $H_0$ if $X^2 > 3.841$.
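The expected-frequency rule, the degrees-of-freedom formula and Yates' correction can be combined in one short routine. The sketch below is an illustration under stated assumptions, not the book's own procedure: the function names are hypothetical, the expected frequencies are left unrounded, and the observed 2 by 2 table for Example 12.9 is built from the figures in the question (28 of 40 men and 34 of 60 women passed). For the weather data of Example 12.8 it reproduces the expected frequency 7.68 and the statistic 6.96.

```python
def expected_table(observed):
    """Expected frequencies: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]

def chi_squared_independence(observed, yates=False):
    """Return (X^2, degrees of freedom, expected table) for an h by k table.

    With yates=True each |O - E| is reduced by 0.5 before squaring,
    which is intended for 2 by 2 tables only.
    """
    expected = expected_table(observed)
    shift = 0.5 if yates else 0.0
    x2 = sum((abs(o - e) - shift) ** 2 / e
             for o_row, e_row in zip(observed, expected)
             for o, e in zip(o_row, e_row))
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return x2, dof, expected

# Example 12.8: rows Win, Draw, Lose; columns Good, Bad weather.
weather = [[12, 4], [5, 8], [7, 14]]
x2_weather, dof_weather, e_weather = chi_squared_independence(weather)

# Example 12.9: rows Male, Female; columns Pass, Fail.
driving = [[28, 12], [34, 26]]
x2_driving, dof_driving, e_driving = chi_squared_independence(driving, yates=True)

print(round(x2_weather, 2), dof_weather)
print(round(x2_driving, 2), dof_driving)
```

Comparing `x2_driving` with the critical value 3.841 then completes the test for the driving data in the same way as the weather example.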
14. The personnel manager of a large firm is investigating whether there is any association between the length of service of the employees and the type of training they receive from the firm. A random sample of 200 employee records is taken from the last few years and is classified according to these criteria. Length of service is classified as short (meaning less than 1 year), medium (1-3 years) and long (more than 3 years). Type of training is classified as being merely an initial 'induction course', proper initial on-the-job training but little if any more, and regular and continuous training. The data are as follows:

                        Length of service
Type of training        Short   Medium   Long
Induction course        14      23       13
Initial on-the-job      12      7        13
Continuous              28      32       58

Examine at the 5% level of significance whether these data provide evidence of association between length of service and type of training, stating clearly your null and alternative hypotheses.
Discuss your conclusions. (MEI)

15. A market research organisation interviewed a random sample of 120 users of launderettes in London and found that 37 preferred brand X washing powder, 66 preferred brand Y and the remainder preferred brand Z. A similar survey was carried out in Birmingham. In this survey, of 80 people interviewed, 19 preferred brand X, 40 preferred brand Y and the remainder preferred brand Z. Test whether these results provide significant evidence, at the 5% level of significance, of different preferences in the two cities. (C)

16. The results obtained by 200 students in chemistry and biology are shown in the table. Test, at the 5% level, whether the performances in both subjects are related.

             Chemistry
Biology      Pass   Fail
Pass         102    45
Fail         21     32

Miscellaneous worked examples

Example 12.10

In experiments in pea breeding Gregor Mendel obtained the following data relating to 556 peas.

Round and Yellow    Wrinkled and Yellow    Round and Green    Wrinkled and Green
315                 101                    108                32

According to Mendel's theoretical results, the expected figures are in the ratios 9 : 3 : 3 : 1.
Calculate the value of $\chi^2$ for these data on the assumption that the theory is correct.
Test at the 10% significance level whether the theory is contradicted.
It has been suggested that Mendel's results are suspect in that they are unlikely to have been obtained from random observations. Comment on this suggestion in relation to the value of $\chi^2$ calculated. (C)

Solution 12.10

$H_0$: The different types of peas occur in the ratio 9 : 3 : 3 : 1.
$H_1$: The different types of peas do not occur in this ratio.
Expected frequencies, according to $H_0$:

$O$              $E$              $(O - E)^2 / E$
315              312.75           0.016
101              104.25           0.101
108              104.25           0.134
32               34.75            0.217
$\sum O = 556$   $\sum E = 556$   0.470

There are four classes and one restriction ($\sum E = 556$), so $\nu = 4 - 1 = 3$.
Test at the 10% level.
From tables, $\chi^2_{10\%}(3) = 6.251$, so reject $H_0$ if $X^2 > 6.251$.

Since $X^2 < 6.251$, accept $H_0$ and conclude that the types are in the ratio 9 : 3 : 3 : 1.

The calculated value of $X^2$ is very small indeed, suggesting very little discrepancy between the observed and expected frequencies.
From $\chi^2$ tables, $P(X^2 < 0.352) = 5\%$, so on only just over 5% of occasions would you expect to have a test value this low. This could suggest that the data are not random observations.

Example 12.11

Mr and Mrs Smith live in a small town with two primary schools A and B. They are trying to decide which school would provide the better learning environment for their children. They have available the results of recent national tests in mathematics, English and science. Each child in the final year took three tests, one in each subject, and they either passed or failed each test. These results are summarised in the table below.

            3 passes   1 or 2 passes   No passes
School A    15         6               5
School B    10         14              13

(a) Stating your hypotheses clearly, test, at the 5% level of significance, whether or not there is evidence of an association between school and test results.

Mr and Mrs Smith also have available the results of a questionnaire about the annual family income $x$, in thousands of pounds, of the families of the children taking these tests. The results are summarised in the table below.

            $x > 30$   $20 < x \le 30$   $15 < x \le 20$   $x \le 15$
School A    7          5                 9                 5
School B    6          13                8                 10

A $\chi^2$ test for association between school and family income using this information gave a test statistic of 3.545. There was no pooling of classes.

(b) Using a 5% level of significance, interpret this statistic, stating the critical value used.
(c) In the light of parts (a) and (b) state, giving reasons, which of the two schools Mr and Mrs Smith might choose for their children. (L)

Solution 12.11

(a) $H_0$: The two factors 'school' and 'results' are independent.
$H_1$: The factors are not independent and there is an association between school and results.

Observed data:

            3 passes   1 or 2 passes   No passes   Totals
School A    15         6               5           26
School B    10         14              13          37
Totals      25         20              18          63

Expected data:
For school A and three passes,

$\text{expected frequency} = \dfrac{\text{row total} \times \text{column total}}{\text{grand total}} = \dfrac{26 \times 25}{63} = 10.32$ (2 d.p.)

The complete table is as follows:

            3 passes   1 or 2 passes   No passes   Totals
School A    10.32      8.25            7.43        26
School B    14.68      11.75           10.57       37
Totals      25         20              18          63

The table has 2 rows and 3 columns, so $\nu = (2 - 1)(3 - 1) = 1 \times 2 = 2$ and the $\chi^2(2)$ distribution is considered.

Test at the 5% level.
From tables, $\chi^2_{5\%}(2) = 5.991$, so reject $H_0$ if $X^2 > 5.991$.

$O$             $E$             $(O - E)^2 / E$
15              10.32           2.122...
6               8.25            0.613...
5               7.43            0.794...
10              14.68           1.491...
14              11.75           0.430...
13              10.57           0.558...
$\sum O = 63$   $\sum E = 63$   6.012...

$X^2 = \sum \dfrac{(O - E)^2}{E} = 6.01$ (2 d.p.)

Since $X^2 > 5.991$, reject $H_0$ and conclude that there is evidence of an association between the school and the test results.

(b) The table has 2 rows and 4 columns, so $\nu = (2 - 1)(4 - 1) = 1 \times 3 = 3$ and the $\chi^2(3)$ distribution is considered.

$H_0$: The two factors 'school' and 'family income' are independent.
$H_1$: There is an association between school and family income.

From tables, $\chi^2_{5\%}(3) = 7.815$, so reject $H_0$ if $X^2 > 7.815$.
It is given that $X^2 = 3.545$.
Since $X^2 < 7.815$, do not reject $H_0$.
There is no association between school and family income.

(c) As there is no association between school and family income, Mr and Mrs Smith are likely to base their choice on the results of the national tests. Since 58% of the pupils in school A obtained three passes, as compared with only 27% with three passes in school B, Mr and Mrs Smith might conclude that school A provides the better learning environment.
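The given-ratio test of Example 12.10 is short enough to verify directly. The following is a minimal sketch with hypothetical helper names; exact fractions are used so that the expected frequencies 312.75, 104.25, 104.25 and 34.75 are free of rounding error. The two comparison values, 6.251 and 0.352, are the table values quoted in the solution.

```python
from fractions import Fraction

def expected_from_ratio(ratio, total):
    """Split `total` in the given ratio, keeping exact fractions."""
    parts = sum(ratio)
    return [total * Fraction(r, parts) for r in ratio]

# Mendel's data: Round/Yellow, Wrinkled/Yellow, Round/Green, Wrinkled/Green.
observed = [315, 101, 108, 32]
expected = expected_from_ratio([9, 3, 3, 1], sum(observed))

# Test statistic X^2 = sum of (O - E)^2 / E.
x2 = float(sum((o - e) ** 2 / e for o, e in zip(observed, expected)))

upper_10_percent = 6.251  # reject H0 above this value (3 degrees of freedom)
lower_5_percent = 0.352   # only 5% of values fall below this
print([float(e) for e in expected], round(x2, 3))
print("reject H0" if x2 > upper_10_percent else "do not reject H0")
print("unusually good fit" if x2 < lower_5_percent else "not in the lowest 5%")
```

Checking the lower tail as well as the upper tail mirrors the comment in the solution: a statistic of about 0.47 lies only a little above the lower 5% point, so the fit is close to being suspiciously good.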
Miscellaneous exercise 12d (a) Stating your hypotheses clearly, test at the
5% level of significance whether or not The null hypothesis is that a person's opinion
there is any evidence of an association about the amount of sport shown on TV is
1. It is suggested that preferencbs for three Eye colour between brand of fertiliser and yield. independent of the person's sex.
f a town y~pass are
proposed routes or I r e Each person in Grey or (a) Construct a table showing the expected
1
associated with ;~£r;goe;~oepl~~ ~hosen fr~m the
Fertilisers A and B are produced by Quickgrow
Blue green Brown whereas Cis produced by Bumpercrops. The frequencies, assuming that the null
a random samph d surrounding villages,
inhabitants of.t e town
sked whtch route e or
hn she preferred. The ~
0 - 7 8 18 farmers wanted to decide from which company
to purchase fertiliser and combined the figures
hypothesis is true.
(b) Use a x test to test this null hypothesis,
2
was~~
resu s are given in the following table.
·p
u + 29 10 16 for A and B to give a 3 x 2 table. The statistic using a 5% significance level. Show full
•~
0:: ++ 21 9 2 L (0- E)
2 details of your method and state your
Town Surrounding villages conclusion dearly. (C)
E
50 25 the5%
Perform an appropria~e test at null and 8. The Director of Studies at a College of Further
Route 1 for this new table was calculated and gave the
tm our
value 7.622. Education believed that there was a connection
Route 2 28 22
between candidates' grades in mathematics and
Route 3 16 9 (b) By carrying out a suitable test at the 5% physics at A-level. For a set of candidates who
level of significance, advise the farmers had taken both examinations, she recorded the
. 11 and alternative whether or not there is any evidence of an number of candidates in each of four categories,
State appropnate nu 2 test to test at the 5%
as shown in the table.
hypotheses, and use a X tion th~t there is an association between the choice of company
significance level, the surges d route and where and yield.
association between pre erre (C) (c) Giving your reason, advise the farmers Mathematics Mathematics
people live. which company they should use. (L) grades A-C grades D-U
le of supermarkets was sent Physics grades A-C
2. (a) A rand?m sa~p 6. A statistician, who is suspected to be suffering 22 9
hich they were asked to
from asthma, is asked to record his peak flow Physics grades D-U
a questwnnatre on~ f shoplifting 8 15
report the number o cases o h f h Lower value of Number of measurement four times each day for a period of
they had dealt with in each mon~ o t th grouping interval Observations four weeks. (a) Test the Director's belief at the 2.5% level
previous year. The totals for eac mon He groups by value the 112 recorded of significance, stating your null and
were as follows. -oo 0 measurements into seven classes giving observed alternative hypotheses.
-2.0 1 frequencies, o;, i = 1, 2, ... , 7. He then calculates Her colleague said that she was losing accuracy
J F MAM] J A s 0 N D correctly corresponding expected frequencies, ei, by combining the grades A to C in one group,
-1.5 0 using a normal distribution having mean and
6 18 16 17 10 22 14 16 and grades D to U in another. He suggested that
16 12 10 17 -1.0 6 variance estimated from the original she should create a 7 x 7 table showing all
-0.5 10 measurements. possible combinations of grades.
Carry out a chi-squar.ed .t~st at an
±
. 1 1of stgmftcance to 0.0 12 The value of the test statistic (b) State why his suggestion might lead to a
approp:tate h~~er or not shoplifting is more 0.5 15 (o 1 - e1)
2 problem in performing the test. (L)
determme w e. hs than others.
likely to occur 111 some ilio~~ be of the same 1.0 23 i~t e; 9. During a working day a machine requires
(You may takke alll monur ~ull and alternative 1.5 16
is then calculated correctly by the statistician as occasional adjustments which appear to be
1 th)Ma ecearyo
h;:otheses, the level of ~ignificance you are 2.0 13 5.624. randomly disrr·ibuted throughout the day. A
3 factory foreman records the number of
using, and your concluswn. 2.5 (a) Using a 1% level of significance and stating adjustments made to the machine each day for a
. · h se the fact that, 3.0 1 the null hypothesis, complete the test.
~h~:aif;~~~~~u;:sof {.are equal, the us~al 3.5 0 (b) Give the usual requirement made on each of
period of 200 working days, obtaining the data
displayed in the table.
chi-squared test statistic may be wntten a the values e 1 prior to calculating the test
(C) statistic, and indicate how a failure to meet Number of adjustments 0 1 2 3 4 5
.!_L"-Lfo . d .d d to test three new the requirement may be overcome. (NEAB)
f lo 5. A farmers' co?.rerattveB :~deC allocating them Number of <:fays 34 78 61 20 5 2
brands of fertthser, A, h ield of the crop was 7. A random sample of 100 people was asked for
(b) Provee the result glVen
.
a t the end of part (a).(MEI) at random to 75 plots .. T e y l The results are their opinions about the amount of sport shown Previous experience has suggested that the daily
classified as high, medmmb ol r ow.
· din the table e ow. on TV. Each person had to say whether there number of adjustments to this machine follows a
summanse
. iation between was too much sport shown, about the right Poisson distribution with mean 1.5.
3. It is thought that ther~ ts a~s a:~~cthe reaction of amount, or not enough. The numbers of men 2
the colour of a persol s ey. let light In order to Fertiliser (a) Perform a x goodness of fit test to decide
and women making each response are shown in
the person's s.kin th u ya:~:dom sa~ple of 120 c Total whether the data in the table can reasonably
investigate thts ~ac dotoaa standard dose of
people was subjecte d of their reaction was High
A
12
B
15 3
-
30
the table.
be considered as conforming to a Poisson
distribution with mean 1.5.
Men Women (b) Outline, without detailed calculation, the
~
ultraviolet light. '!'he egree . '+'indicating 8 8 8 24
d ,_, indicatmg no reactton, Medium necessary modifications to your test if the
21 Too much sport 13 26
note , . d , ++, indicating strong Low 5 7 9 Poisson mean is not assumed to be 1.5.
slight reactwn an l hown in the table About right 22
reaction. The resu ts are s 75 22 (c) The distribution B(5, 0.3) is a very good fit
Total 25 30 20 Not enough sport 12 to the data in the table. Without further
below. 5
calculation, explain why, despite this good
fit, the binomial model is not appropriate.
(NEAB)
12. A factory operates four production lines. (c) Given that a total of 30 students
10. A department store has five doorways, each for
entrance and exit. It is claimed that the
Maintenance records show that the daily number
of stoppages due to mechanical failure were as
~~:~e~e:~·eatnum
theb5% significanc:~!:~t
er of student .
State theI condition
th · which sometimes n ecessttates
e ~rna garnatwn of rows or columns in
.
proportion of shoppers entering or leaving the shown in the table below (it is possible for a a particular question is ass . s adtte~ptmg c~ntmgency tables. Explain why amalgamation
store is the same for each of the five doorways. production line to break down more than once type of question. octate With the mrght not ~e appropriate for this table.
The number of customers entering or leaving the The followmg table summarises the data relatin
on the same day). You may assume that (d) Compa•e the diff I d
store is counted at each doorway for three different types of Iqc~ ty,.an . pohpularity of the to the day of the week on which the accident g
Lf~ 1400, Lfx ~ 1036. es 1011 m t e light of
randomly selected days with the following results. occurred.
your answers to (a) and (c)
3 4 5 6 or · (AEB)
Number of customers Number of 0 1 2
Doorway Number of
more 14. (a) rbhe numd her of books borrowed from a
stoppages, x
601 Ml raryd uring a ceramt · weelc were 518 on Day accidents
A
673 Number of 728 447 138 48 26 13 0 d
W 0 ay, 431 on Tuesday 485 on
B
~d nesday, 443 on Thur;day and 523 on Monday 60
c 626 day',f Fn ay. Tuesday 54
D 618 (a) Use a x2 distribution and a 1% significance ~s th{r~ any evidence that the number of Wednesday 48
702 oo s arrowed varies between the five
E level to determine whether the Poisson Thursday 53
distribution is an adequate model for the · Tof the week>· Use a 1"'
days I I f
toeveo
Test whether or not these data support the claim. stgm tc~nce. Interpret fully your Friday 53
data. conclusiOns. Saturday
The same store also records the daily number of (b) The maintenance engineer claims that 75
(b) Analysis of the rate of turnover of Sunday
sales charged to stolen credit cards. The results breakdowns occur at random and that the 77
employees. by a personnel manager produced
for the first four months of 1990 are as follows. mean rate has remained constant
the followmg table showing the length of
throughout the period. State, giving a
sttahy of 2001 people who left the company for
!~~~~~aste th~ h[pothesis .that these data are a
Number of sales Number of days reason, whether your answer to (a) is amp e rom a umform. distribution.
o er emp oyrnent.
consistent with this claim. (AEB)
0 31 (c) Of the 1036 breakdowns which occurred
39 230 were on production line A, 303 on B, Length of employment (years) 16. (a) The number of accidents per day on
1 stretch of motorway was recorded fora 100
19 270 on C and 233 on D. Test at the 5% 0-2 2-5 >5 d
2 ays and the following results obtained.
significance level whether these data are
3 11 consistent with breakdowns occurring at an ~ Managerial 4 11 6
:>4 0 equal rate on each production line. (AEB) ""(3" Skilled 32 28 21
Number of accidents Frequency
Unskilled 25 23 50
Explain why a Poisson distribution may be 0 44
13. A group of students studying A-level statistics
appropriate as a model for the daily number of was set a paper, to be attempted under 1 32
sales charged to stolen credit cards. Test the examination conditions, containing four Yfmg a ~% level of significance, analyse th.
2
hypothesis that the daily number of sales does questions requiring the use of the x distribution.
2
~~~:~::tn· and state fully the conclusions from 3
9
follow a Poisson distribution. (NEAB) The following table shows the type of question ysts. (AEB) 10
and the number of students who obtained good 4 5
15. Over a long penod of ttrne, a research team
11. In the mathematics department of a college, (14 or more out of 20) and bad (fewer than 14 5 or more 0
candidates in an examination are graded A, B, C, momtored the number of car accrdents whl h
occurred m a particular county. Each acctd~
out of 20) marks.
D or E. Records from previous years show that Examme whethe r or not a p otsson
·
was classtfted as bemg tnvial (mmor dam nt . model is
examiners have awarded a grade A to 15% of
no/ersonal InJUries), senous (damage to ~~~I~}~~
Type of question SUitable to represent the number of 'd
candidates, B to 20%, C to 35%, D to 25% and er d h' acc1 ents
Contingency Binomial Normal Poisson P ay ~n t. :s ·stretch of road. Use a 1 %
E to 5%. A new syllabus is examined by a new an passengers, but no deaths) or fat 1 (d Ievel of. stgmftcanc:e
table fit fit fit tohvehicles I a famage
and loss of life) . Thecoouro
board of examiners who award the grades to 200 h thee (b) Th ~ results of a survey .
to establish th
candidates as follows: 12 11 w 1C l~ the opmwn of the research team
J ar attt~de of individuals to a particular e
Good 25 12
A, 33; B, 37; C, 81; D, 36; E, 13 mark
~l~~eth! da ayc~frdtehnt waskalso recorded, tog~ther pohtlcal proposal showed that
e wee on which th 'd ~hree-quarters of those interviewed were
(a) Stating clearly your hypotheses and using a occurred The f0 ll d e acc1 ent
11 3 12 · owmg ata were collected. ouse owners. Of the 44 interviewed
5% level of significance investigate whether Bad 4
or not the new board of examiners awards mark Colour only 6 ofhthe 35 in favour of the pro~osal
Trivial Serious Fatal were not ouse owners.
grades in the same proportions as the
(a) Test at the 5% significance level whether the White 50 D~e~ the survey indicate that a person's
prevwus one. 25 16
mark obtained (by the students who . Black ~pmwn on the proposal is independent of
In addition to being classified by examination 35 39 18 .ou~~ ownership? Use a 1% level of
attempted the question) is associated w1th Green
grade, these 200 students are classified as male 28 23 13 Sigmftcance. (AEB)
the type of question.
or female and the results summarised in a (b) Under some circumstances it is necessary to Red 25 17
contingency table. Assuming all expected values
11
combine classes in order to carry out a test. Yellow 17 20
are 5 or more, the statistic 16
If it had been necessary to combine the . Blue
binomial fit question with another quesuoJ~,
24 33 10
2::ro (0.-E.)' was 14.27.
1 1
which question would you have combined Jt
~~~yse t~ese diata for evidence of association
i~J Ei with and why?
(b) Stating your hypotheses and using a 1% accid~:~. t e co our of the car and the type of
significance level, investigate whether or not
sex and grad~ are associated. (L)
degree of correlation. You might say, for example, that there is strong positive correlation between the variables or that there is weak negative correlation between the variables. There is a significance test that allows you to decide whether there is a correlation between the variables, backed by statistical theory rather than just a suspicion.

TEST FOR THE PRODUCT-MOMENT CORRELATION COEFFICIENT, r

In Chapter 2 (page 139) you learnt how to calculate r, the product-moment correlation coefficient between two sets of data X and Y.

Using small s format:

$$r = \frac{s_{xy}}{s_x s_y}$$

where

$$s_{xy} = \frac{1}{n}\sum xy - \bar{x}\bar{y} = \frac{\sum xy}{n} - \bar{x}\bar{y}$$

$$s_x = \sqrt{\frac{1}{n}\sum x^2 - \bar{x}^2} = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2}$$

$$s_y = \sqrt{\frac{1}{n}\sum y^2 - \bar{y}^2} = \sqrt{\frac{\sum y^2}{n} - \bar{y}^2}$$

The alternative hypothesis, H1

The alternative hypothesis depends on whether the test is one-tailed or two-tailed.

One-tailed tests
If you think there is a positive correlation between the variables X and Y, the alternative hypothesis is H1: ρ > 0 (there is a positive correlation between the variables).
If you think there is a negative correlation between the variables X and Y, the alternative hypothesis is H1: ρ < 0 (there is a negative correlation between the variables).

Two-tailed tests
If you are looking for a correlation but not specifying whether it is positive or negative, then the alternative hypothesis is H1: ρ ≠ 0 (there is some correlation between the variables).

The calculated value of r, the product-moment correlation coefficient, is compared with the critical value which is found from tables. An extract is given below and the tables are printed on page 652.
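The small s formulas translate directly into a few lines of code. The following is a minimal sketch, not part of the original text: the function name `pmcc` is an illustrative choice, and the data are the ten pairs of values used in Example 13.1.

```python
from math import sqrt

def pmcc(xs, ys):
    """Product-moment correlation coefficient r = s_xy / (s_x * s_y)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Small s format: each quantity is a mean of products minus a product of means.
    s_xy = sum(x * y for x, y in zip(xs, ys)) / n - x_bar * y_bar
    s_x = sqrt(sum(x * x for x in xs) / n - x_bar ** 2)
    s_y = sqrt(sum(y * y for y in ys) / n - y_bar ** 2)
    return s_xy / (s_x * s_y)

# The ten pairs of values from Example 13.1.
x = [5, 8, 12, 15, 15, 17, 20, 21, 25, 27]
y = [3, 11, 9, 6, 15, 13, 25, 15, 13, 20]
r = pmcc(x, y)
print(round(r, 4))  # 0.6954, agreeing with the calculator value quoted in the example
```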
Critical values for the product-moment correlation coefficient (extract)

Sample                   Level
size      0.10     0.05     0.025    0.01     0.005
  6       0.6084   0.7293   0.8114   0.8822   0.9172
  7       0.5509   0.6694   0.7545   0.8329   0.8745
  8       0.5067   0.6215   0.7067   0.7887   0.8343
 10       0.4428   0.5494   0.6319   0.7155   0.7646

The tables are easy to use. The values 0.7293 (sample size 6), 0.7887 (sample size 8) and 0.6319 (sample size 10) are referred to in the following illustrations:

(i) Consider hypotheses
H0: ρ = 0 (there is no correlation between the variables)
H1: ρ > 0 (there is a positive correlation between the variables).

This is a one-tailed (upper tail) test. At the 5% level, the critical value is found under column 0.05. If r has been calculated from, say, six pairs of data, i.e. sample size 6, the critical value is 0.7293.
This means that in random samples from a distribution in which ρ = 0, only 5% of these samples will give a value of r greater than 0.7293. So, at the 5% level of significance, you would reject H0 (that there is no correlation) in favour of H1 (that there is positive correlation) if r > 0.7293.

(ii) The same tables are used when testing for a negative correlation. Consider hypotheses
H0: ρ = 0 (there is no correlation between the variables)
H1: ρ < 0 (there is a negative correlation between the variables).

This test is one-tailed (lower tail). At the 1% level, look up the value in the column headed 0.01. For a sample size of eight pairs of data, the value given in the table is 0.7887, indicating that the critical value is -0.7887. At the 1% level, you would reject H0 if r < -0.7887.

(iii) Now consider hypotheses
H0: ρ = 0 (there is no correlation between the variables)
H1: ρ ≠ 0 (there is some correlation between the variables).

This test is two-tailed. At the 5% level of significance, you want critical values that give 2.5% in each tail, so look under the column headed 0.025. For a sample size of 10, the critical value given in the table is 0.6319. This means that you would reject H0 in favour of H1 if r > 0.6319 or r < -0.6319, i.e. if |r| > 0.6319.

Example 13.1

The scatter diagram illustrating ten pairs of values (x, y) is shown below.

[Scatter diagram: ten points, with the x-axis and y-axis each marked from 0 to 30]

(a) Comment on the diagram.
(b) Calculate the value of r, the product-moment correlation coefficient for the pairs of data shown in the diagram.
(c) Assuming that X and Y are jointly normally distributed with correlation coefficient ρ, and the data constitutes a random sample, test, at the 5% level, whether there is a positive correlation between X and Y.
(d) Would your conclusion be the same at the 1% level?

Solution 13.1

(a) From the scatter diagram there appears to be some positive linear correlation but it does not appear to be very strong.

(b) In the diagram, the data points are

x    5    8   12   15   15   17   20   21   25   27
y    3   11    9    6   15   13   25   15   13   20

Using the calculator in LR mode, it can be shown that r = 0.6954 (4 d.p.).
(See page 140 if you need to review how to calculate r with or without a calculator.)

(c) The significance test is carried out as follows:
H0: ρ = 0 (there is no correlation between X and Y)
H1: ρ > 0 (there is positive correlation between X and Y)
Perform a one-tailed (upper tail) test at the 5% level.
The sample size is 10.
From tables, the critical value is 0.5494, so reject H0 if r > 0.5494.
From the calculations in (b), r = 0.6954.
Since r > 0.5494, H0 is rejected in favour of H1.
There is evidence of positive correlation between X and Y.
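The three rejection rules in illustrations (i) to (iii) can be collected into one small function. This is an illustrative sketch only: the name `reject_h0` and the labels "greater", "less" and "two-sided" are assumptions made here, and the critical value must still be read from the tables (for a two-tailed test at the 5% level, from the column headed 0.025).

```python
def reject_h0(r, critical_value, alternative):
    """Return True if H0: rho = 0 is rejected.

    critical_value is the positive value read from the tables.
    alternative is "greater" (H1: rho > 0), "less" (H1: rho < 0)
    or "two-sided" (H1: rho != 0).
    """
    if alternative == "greater":
        return r > critical_value
    if alternative == "less":
        return r < -critical_value
    if alternative == "two-sided":
        return abs(r) > critical_value
    raise ValueError("unknown alternative: " + alternative)

# Part (c) of Example 13.1: n = 10, one-tailed (upper tail) test at the 5% level.
decision_c = reject_h0(0.6954, 0.5494, "greater")
print(decision_c)  # True: H0 is rejected in favour of H1
```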
(d) For a test at the 1% level, the critical value is 0.7155, so H0 is rejected if r > 0.7155.
Since r = 0.6954 < 0.7155, do not reject H0.
At the 1% level, there is not enough evidence to say that there is positive correlation between X and Y.

Exercise

1. In each of the following significance tests for the product-moment correlation coefficient the calculated value of r is as shown. Use tables of critical values to decide whether H0 is rejected or not.

       n     r        Hypotheses               Level of significance
(a)    7     0.893    H0: ρ = 0, H1: ρ ≠ 0     2%
(b)   14     0.499    H0: ρ = 0, H1: ρ > 0     1%
(c)   28     0.324    H0: ρ = 0, H1: ρ ≠ 0     10%
(d)   28     0.324    H0: ρ = 0, H1: ρ > 0     1%
(e)   16    -0.419    H0: ρ = 0, H1: ρ < 0     5%
(f)   12    -0.689    H0: ρ = 0, H1: ρ ≠ 0     10%
(g)   12     0.689    H0: ρ = 0, H1: ρ > 0     1%
(h)   10     0.733    H0: ρ = 0, H1: ρ > 0     1%

2. A small bus company provides a service for a small town and some neighbouring villages. In a study of their service a random sample of 20 journeys was taken and the distances x, in kilometres, and journey times t, in minutes, were recorded. The average distance was 4.535 km and the average journey time was 15.15 minutes.
(a) Using Σx² = 493.77, Σt² = 4897, Σxt = 1433.8, calculate the product-moment correlation coefficient for these data.
(b) Stating your hypotheses clearly, test, at the 5% level, whether or not there is evidence of a positive correlation between journey time and distance.
(c) State any assumptions that have to be made to justify the test in (b). (L)

3. In order to investigate the strength of the correlation between the value of a house and the value of the householder's car, a random sample of householders was questioned. The resulting data are shown in the table, the units being thousands of pounds.

Value of house, X    110   106    51    94    66    26    72    51    53   133
Value of car, Y       12   9.5   2.4   4.2   4.1   0.3   3.2   6.0   7.8    15

ΣX = 762, ΣX² = 68 088, ΣY = 64.5, ΣY² = 606.63, ΣXY = 6067.4

(a) Represent the data graphically.
(b) Calculate the product-moment correlation coefficient.
(c) Carry out a hypothesis test, at a suitable level of significance, to determine whether or not it is reasonable to suppose that the value of a house is positively correlated with the value of the householder's car.
(d) A student argues that when two variables are correlated one must be the cause of the other. Briefly discuss this view with regard to the data in this question. (L)

4. For the sets of data given, test the hypotheses indicated. Then draw a scatter diagram and comment on whether this reinforces your conclusion.
[ρ is the population product-moment correlation coefficient.]

(a)
x     7   12   13   17   23   25   30   20
y    23   22   18   15    7   13    8   27

H0: ρ = 0, H1: ρ < 0; 5% significance level

(b)
x       y
5.1     5.3
5.4    10.2
5.5    15.7
10      5
10.2   10.9
10.4   15.1
15      5.3
15.4   10.9
15.6   15.3
30     25.1
20.2   20

H0: ρ = 0, H1: ρ > 0; 1% significance level

SPEARMAN'S COEFFICIENT OF RANK CORRELATION, rs

Spearman's coefficient of rank correlation is calculated using the ranks of the data. As you saw on page 146, for n data points, if d is the difference between the ranks for a data point, then

$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$

Remember that -1 ≤ rs ≤ 1, with rs = 1 indicating perfect agreement between the rankings, rs = -1 indicating that the rankings are in exact reverse order (complete disagreement) and rs = 0 indicating no correlation between the rankings.

Writing ρs for the population rank correlation coefficient, the null hypothesis is always

H0: ρs = 0 (there is no correlation between the rankings)

The alternative hypothesis is either

H1: ρs > 0 and there is positive correlation (agreement) between the rankings (one-tailed (upper tail) test)

or H1: ρs < 0 and there is negative correlation (disagreement) between the rankings (one-tailed (lower tail) test)

or H1: ρs ≠ 0 and there is correlation between the rankings (two-tailed test).

Note that the test for Spearman's coefficient of rank correlation does not make any assumptions about the population parameters. It is known as a non-parametric test.

The critical values for Spearman's rank correlation coefficient are found from tables which are very similar in format to those for the product-moment correlation coefficient. An extract is shown below and the tables are printed on page 652.

Critical values for Spearman's rank correlation coefficient

Sample            Level
size      0.05     0.025    0.01
  4       1.0000
  5       0.9000   1.0000   1.0000
  6       0.8286   0.8857   0.9429
  7       0.7143   0.7857   0.8929
  8       0.6429   0.7381   0.8333
  9       0.6000   0.7000   0.7833
 10       0.5636   0.6485   0.7455
 11       0.5364   0.6182   0.7091

For a one-tailed test at the 5% level, sample size 7, look under column 0.05. This gives the value 0.7143 and means that

• for H1: ρs > 0, H0 is rejected if rs > 0.7143
• for H1: ρs < 0, H0 is rejected if rs < -0.7143.
For a two-tailed test at the 5% level, sample size 9, look under column 0.025. This gives the value 0.7000 and means that

• for H1: ρs ≠ 0, H0 is rejected if rs > 0.7000 or rs < -0.7000, i.e. if |rs| > 0.7000.

d²    1   0   0   1   4   4   4   16   0   0   0

Σd² = 30 and n = 11, so

$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 30}{11 \times 120} = 0.8636\ldots$$

(b) The significance test is carried out as follows:
Perform a one-tailed (upper tail) test at the 1% level.
The sample size is 11.
From tables (page 652), the critical value is 0.7091, so reject H0 if rs > 0.7091.
From part (a), rs = 0.8636...
Since rs > 0.7091, H0 is rejected.

2. An expert on porcelain is asked to place 7 china bowls in date order of manufacture, assigning the rank 1 to the oldest bowl. The actual dates of manufacture and the order given by the expert are shown.

Bowl    Date of manufacture    Order given by expert
A       1920                   7
B       1857                   3
C       1710                   4
D       1896                   6
E       1810                   2
F       1690                   1
G       1780                   5

(a) Calculate, to two decimal places, the Spearman rank correlation coefficient between these two sets of marks.
(b) Stating your hypotheses and using a 5% level of significance, interpret your result. (L)
4. Ten architects each produced a design for a new building and two judges, A and B, independently awarded marks, x and y respectively, to the 10 designs, as given in the table below.

Design    Judge A (x)    Judge B (y)
 1        50             46
 2        35             26
 3        55             48
 4        60             44
 5        85             62
 6        25             28
 7        65             30
 8        90             60
 9        45             34
10        40             42

Calculate Spearman's rank correlation coefficient for the data and test, at the 5% level, the hypothesis that there is no correlation between the marks awarded by the two judges. (C)

5. In a ski-jumping contest each competitor made two jumps. The orders of merit for the 10 competitors who completed both jumps are shown in the table.

Ski jumper    First jump    Second jump
A              2             4
B              9            10
C              7             5
D              4             1
E             10             8
F              8             9
G              6             2
H              5             7
I              1             3
J              3             6

(a) Calculate, to two decimal places, a rank correlation coefficient for the performances of the ski-jumpers in the two jumps.
(b) Using a 5% level of significance and quoting from tables of critical values, interpret your result. State clearly your null and alternative hypotheses. (L)

6. The positions in a league of 8 hockey clubs at the end of a season are shown in the table. Shown also are the average attendances (in hundreds) at home matches during that season.
Calculate a coefficient of rank correlation between position in the league and average home attendance.

Club    Position    Average attendance
A       1           30
B       2           32
C       3           12
D       4           19
E       5           27
F       6           18
G       7           15
H       8           25

Refer to the appropriate table of critical values to comment on the significance of your result, stating clearly the null hypothesis being tested. (L)

Summary

Significance test for the product-moment correlation coefficient, r
The assumptions are that X and Y are jointly normally distributed and the sample must constitute a random sample from the whole populations of X and Y.
1. State H0: ρ = 0 (there is no correlation between X and Y).
   State H1 as follows:
   H1: ρ > 0 (there is positive correlation between X and Y)
   or H1: ρ < 0 (there is negative correlation between X and Y)
   or H1: ρ ≠ 0 (there is correlation between X and Y).
2. State the level and type of test, e.g. one-tailed test at the 5% level.
3. State the rejection criterion, obtaining the critical value from tables.
   For H1: ρ > 0, reject H0 if r > critical value.
   For H1: ρ < 0, reject H0 if r < -critical value.
   For H1: ρ ≠ 0, reject H0 if |r| > critical value.
4. Calculate r and compare with the critical value.
5. Make your conclusion.

Significance test for Spearman's rank correlation coefficient, rs
Note that this is a non-parametric test.
1. State H0: ρs = 0 (there is no correlation between the ranks of X and Y).
   State H1 as follows:
   H1: ρs > 0 (there is agreement between the ranks of X and Y)
   or H1: ρs < 0 (there is disagreement between the ranks of X and Y)
   or H1: ρs ≠ 0 (there is correlation between the ranks of X and Y).
2. State the level and type of test, e.g. one-tailed test at the 5% level.
3. State the rejection criterion, obtaining the critical value from tables.
   For H1: ρs > 0, reject H0 if rs > critical value.
   For H1: ρs < 0, reject H0 if rs < -critical value.
   For H1: ρs ≠ 0, reject H0 if |rs| > critical value.
4. Calculate rs and compare with the critical value.
5. Make your conclusion.

Miscellaneous worked example

Example 13.3

During the lambing season 8 ewes and the lambs they bore were weighed at the time of birth with the following results:

Ewe                      A     B     C     D     E     F     G     H
Weight of ewe, x kg      44    41    43    40    41    37    38    35
Weight of lamb, y kg     3.5   2.8   3.2   2.7   2.9   2.5   2.8   2.6

You may assume Σx = 319, Σy = 23.0, Σx² = 12 785, Σy² = 66.88, Σxy = 923.2.

Calculate the product-moment correlation coefficient between X and Y.
Making any necessary assumptions, test whether the data could have come from a population with correlation coefficient ρ = 0. Use a 5% level of significance. (AEB)
Assume that X and Y are jointly normally distributed with product-moment correlation coefficient ρ and the data form a random sample from the populations of X and Y. The significance test is carried out as follows:

1. State H0 and H1.
   H0: ρ = 0 (there is no correlation between the weight of a ewe and its lamb)
   H1: ρ ≠ 0 (there is correlation between the weight of a ewe and its lamb)
2. State level and type of test.
   Perform a two-tailed test at the 5% level.
3. State the rejection criterion.
   The sample size is 8. From tables, the critical value for a two-tailed test at the 5% level is 0.7067 (page 652, row n = 8, column 0.025).
   H0 is rejected if |r| > 0.7067.
4. Calculate r.
   For the data, r = 0.868.
5. Make conclusion.
   Since |r| > 0.7067, H0 is rejected in favour of H1. There is evidence of correlation between the weight of a ewe and its lamb.
   It is unlikely that the data came from a population with correlation coefficient ρ = 0.

Note that the conclusion would have been the same if you had chosen to carry out a one-tailed test. In this case H1 is ρ > 0, the critical value of r is 0.6215 and H0 is rejected since r > 0.6215.

Coursework                A     C     D     B     F     C     G     E
Examination mark          92    75    63    54    48    45    34    18
Coursework rank           1     3.5   5     2     7     3.5   8     6
Examination mark rank     1     2     3     4     5     6     7     8
|d|                       0     1.5   2     2     2     2.5   1     2
d²                        0     2.25  4     4     4     6.25  1     4

Σd² = 25.5, n = 8, therefore

$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 25.5}{8 \times 63} = 0.696 \text{ (3 s.f.)}$$

(b) H0: ρs = 0 (there is no correlation)
H1: ρs ≠ 0 (there is evidence of correlation)
Perform a two-tailed test at 5% level.
From tables (page 651), critical value is 0.7381 (n = 8, column 0.025).
Reject H0 if |rs| > 0.7381.
Since rs = 0.696 < 0.7381, do not reject H0. There is no evidence of correlation.
(c) Performance in the examination does not reflect on performance in coursework.
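The ranking with tied grades and the value of rs in this example can be checked with a short program. This is a sketch under stated assumptions: the helper names `average_ranks` and `spearman` are illustrative and not from the text, tied values share the mean of the ranks they occupy (as in the table, where the two C grades each receive 3.5), and the critical value 0.7381 is the one quoted from the tables.

```python
def average_ranks(values, reverse=False):
    """Rank values from 1; tied values share the mean of the ranks they occupy."""
    order = sorted(values, reverse=reverse)
    ranks = {}
    for v in set(values):
        first = order.index(v) + 1          # first rank occupied by this value
        count = order.count(v)              # number of items sharing the value
        ranks[v] = first + (count - 1) / 2  # mean of the shared ranks
    return [ranks[v] for v in values]

def spearman(rank_x, rank_y):
    """Spearman's coefficient r_s = 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

coursework = ["A", "C", "D", "B", "F", "C", "G", "E"]
exam = [92, 75, 63, 54, 48, 45, 34, 18]

cw_rank = average_ranks(coursework)            # grade A receives rank 1
exam_rank = average_ranks(exam, reverse=True)  # highest mark receives rank 1
r_s = spearman(cw_rank, exam_rank)
reject = abs(r_s) > 0.7381                     # two-tailed test at the 5% level, n = 8

print(cw_rank)        # [1.0, 3.5, 5.0, 2.0, 7.0, 3.5, 8.0, 6.0]
print(round(r_s, 3))  # 0.696
print(reject)         # False: do not reject H0
```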
Furthermore students can now present their findings electronically, and teachers can store,
share and continually refine their lesson plans. And if you add to all this the obvious benefit of
the computer's ability to carry out calculations without effort, ICT methods are almost
guaranteed to enhance the enjoyment of those teaching and studying this subject.
USING A SPREADSHEET
A spreadsheet, such as Excel, has enormous power; it can effortlessly analyse huge data sets and
conduct simple simulations. Getting familiar with all this can come only with practice, and
there are plenty of spreadsheet tutorials around, both in print and on the net. For starters,
here is a summary of some of the more important features that relate to statistics, and in each
of the following pages, features that relate to specific topics are listed.
Array
This word is used for a continuous selection of cells in a spreadsheet. Arrays are referred to by the top-left and bottom-right coordinates, e.g. A2:B16

The edit menu (see Chapter 1)
To insert a data set from some other source it is sometimes useful to try
Edit ⇒ Paste Special ⇒ select "text"
To generate a series, e.g. 1, 2, 3, ... 1000: First enter the start value in a cell and highlight it.
Edit ⇒ Fill ⇒ Series

Tools ⇒ Solver: this finds the value of a cell that makes the target cell a max, a min or a value, e.g. to solve x^3 - 2 = 0, set A1 = 1 and B1 = A1^3 - 2. Select B1 then Tools ⇒ Solver.
B1 is the 'target cell', "Equal to Value of" (0), by changing cell A1. Solve!

To show formulae instead of results:
Tools ⇒ Options ⇒ View ⇒ Window options
TICK 'show formulas' (or use Ctrl-`)

To hide the grid lines:
Tools ⇒ Options ⇒ View ⇒ Window options
UNTICK 'show gridlines'

To ensure auto-recalculation on pressing F9:
Tools ⇒ Options ⇒ Calculations ⇒ Automatic

For iterative solving, e.g. x = g(x):
Tools ⇒ Options ⇒ Iteration
⇒ Max iterations (100) ⇒ Max change 0.001
e.g. Enter A1 = 1
Select A1
Enter = COS(A1)
Press F9

For a cell formula referring to itself without iteration:
Tools ⇒ Options ⇒ Iteration
⇒ Max iterations (1) (leave Max change)
(e.g. D1 = D1 + 1 to increment by 1)

To draw a histogram (applicable only to equal classes):
Tools ⇒ Data Analysis ⇒ Histogram

For random number generation (see Chapter 9):
Tools ⇒ Data Analysis ⇒ Random Number generation
Continuous Uniform (a, b), normal (m, s), Bernoulli (p), binomial (n, p), Poisson (m),
Discrete, in 2 columns: x, P(X = x)
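The iterative solving of x = g(x) described above can also be sketched outside a spreadsheet. The following is an illustration only, not a description of how Excel works internally: the function name `iterate` and its parameter names are assumptions, chosen to mirror the 'Max iterations' and 'Max change' settings.

```python
from math import cos

def iterate(g, x0, max_iterations=100, max_change=0.001):
    """Repeat x -> g(x) until successive values differ by less than max_change."""
    x = x0
    for _ in range(max_iterations):
        x_new = g(x)
        if abs(x_new - x) < max_change:
            return x_new
        x = x_new
    return x  # the iteration limit was reached without meeting max_change

# The x = cos(x) example, starting from 1 as in the spreadsheet instructions.
root = iterate(cos, 1.0)
print(round(root, 2))  # 0.74
```

Reducing `max_change` gives more decimal places at the cost of more iterations.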
USING INTERNET RESOURCES AND WORD

There is an ever-growing amount of useful information on the net for the study of statistics and probability. Mixed in with all that is the less useful, and the task of sifting out quality resources is getting harder. Listed here are some authenticated sites, but the pace of change being what it is there is no guarantee that they will still be there and still be useful by the time this is read!

http://www.stats.gla.ac.uk:80/cti/
CTI Statistics (changing to LTSN-CMSOR)

http://www.kuleuven.ac.be/ucs/java/index.htm
JAVA Statistics: some fascinating 'applets' from Belgium

http://surfstat.newcastle.edu.au/surfstat/main/surfstat.html
SurfStat Australia online text. An introductory course by Annette Dobson et al., Newcastle University

http://cast.massey.ac.nz
CAST: Computer Assisted Statistics Teaching [registration required], a complete course, by Doug Stirling, Massey University, Palmerston North, New Zealand

http://193.61.107.61/volume
DISCUSS statistics teaching resources
An important contribution to the understanding of probability and statistics from the team at Coventry University.

http://www.mis.coventry.ac.uk/~styrrell/resource.htm
Personal selection of statistical web resources from Sidney Tyrrell, Coventry University, co-author of DISCUSS

Data sets

http://lib.stat.cmu.edu/DASL/
DASL: The Data and Story Library (USA), categorised by topic.

http://www.maths.uq.edu.au/~gks/data
OzDASL (University of Queensland, Australia). Australian version of the above.

http://forum.swarthmore.edu/workshops/sum96/data.collections/datalibrary/
The Data Library (from the Math Forum, USA)

http://www.ni.com.au/mercury/mathguys/mercindx.htm
Chance and Data (from Tasmania, Australia)
Getting all this into Word

Any text or graphics can be copied straight into Word.

1. Mark any text you want, or hover over any graphic you want.
2. Right-click 'Copy' (or Ctrl-C)
3. Click where you want to insert it in Word
4. Right-click 'Paste' (or Ctrl-V)

It is often better to paste into a text box, so you have more control over the layout and positioning. Note: any internet links (underlined in blue) will be copied too.
If data is presented on the web page in columns, it should copy and paste in TAB-separated form.

USING AUTOGRAPH

Autograph is a dynamic graphing package that operates in both bivariate and single variable modes. In the bivariate mode, as well as a full range of equations and coordinate geometry operations, data sets can be represented as scatter diagrams. In the single-variable mode, data can be displayed in all the usual diagrams, and probability distributions can be drawn. A variety of on-screen calculations are available.
Many of these operations can also be created very effectively on a spreadsheet, and throughout this supplement both approaches are explored.

Bivariate data

In Autograph, the word 'cursor' is used to describe a coordinate point that is added by the user, either by 'point and click' or entering coordinates directly.
Most operations are available on the button bar, or through the right-click menu. This is dependent on the selection of objects that has been made, and standard rules for object selection are used.

A bivariate data set can be created in various ways:

(a) By adding 'cursors', perhaps in a pattern that will help make a particular teaching point (e.g. mostly in a well-correlated line, but with one outlier). Cursors can of course be moved around at will subsequently.
Use the right-click options: 'Select all cursors' and 'Convert to data set'.
This will change the cursors into a single data object, though you can still move an individual cursor around if you hold down Ctrl.
You can double-click on any one cursor in the data set to open the 'Edit data set' dialogue box.

(b) By using the Edit Data Set dialogue box. Here data can be entered directly in pairs, imported by loading a CSV file (comma-separated), or pasted in from two columns in a spreadsheet.

Data can then be sorted (by x or by y), scaled by any formula, or swapped over. Tick 'Show Statistics' to create a dynamically linked set of results, which change if any points are dragged around (while holding down Ctrl).
Single variable statistics

[Screenshot: Autograph dialogue box for entering a data set, with buttons including Import CSV, Export CSV, Sort by x and Clear Data]

CHAPTER 1 REPRESENTATION AND SUMMARY OF DATA

Data sets come in all shapes and sizes these days. Computers can make light work of presenting data in a digestible form, but users need to take care to use the right tool for the job.

Using a spreadsheet

The following functions are relevant:

Σx                                  SUM(array)
Σfx                                 SUMPRODUCT(array, array)
Σfx²                                SUMPRODUCT(array, array, array)
m = (1/n) Σx                        AVERAGE(array)
√((1/n) Σ(x²) - m²)                 STDEVP(array)
                                    (STDEV(array) uses the divisor n - 1 instead of n)
n                                   COUNT(array)
Count of entries meeting a test     COUNTIF(array, test)
kth smallest                        SMALL(array, k)
kth largest                         LARGE(array, k)
Minimum                             MIN(array)
Maximum                             MAX(array)
Mode                                MODE(array)
Median                              MEDIAN(array)
Quartiles                           QUARTILE(array, q)
                                    q = 0: Min, q = 1: LQ, q = 2: Median,
                                    q = 3: UQ, q = 4: Max
Frequencies                         {= FREQUENCY(array1, array2)}

To get FREQUENCY to work you need:
(a) array1 with all the data in a single column
(b) array2 called the 'bin' array, listing the right hand ends of the classes, e.g. 20, 40, 60, 80
(c) array3 (empty) marked where you want the frequencies to go.

This operation then returns an array of frequencies for ≤ 20, ≤ 40, ≤ 60, ≤ 80 and also > 80, but you need to have marked an array first ready to receive this information. (Note, it is one more cell than the 'bin' array, array2. This last cell is optional.)

NOTE: this formula is generating an array. Excel requires that you press SHIFT-CTRL-ENTER when you have finished editing the formula: this puts curly brackets round the formula.

TICK 'Chart Output' to draw the histogram.
Double-click on the shaded histogram section for 'Format Data Series' ⇒ Options ⇒ Set Gap width to zero.
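The counting rule behind the 'bin' array can be sketched in a few lines. This is an illustration of the rule as described above (each bin value is the right hand end of a class, with one extra count for values above the last bin), not Excel's own code; the function name `frequency` and the sample data are hypothetical.

```python
from bisect import bisect_left

def frequency(data, bins):
    """Count values <= each bin value, plus a final count above the last bin.

    bins must be in increasing order; the result has one more entry than bins.
    """
    counts = [0] * (len(bins) + 1)
    for value in data:
        # bisect_left finds the first bin value that is >= value,
        # so a value equal to a bin value is counted in that bin.
        counts[bisect_left(bins, value)] += 1
    return counts

# Hypothetical data, chosen to include values on the class boundaries.
sample = [5, 20, 21, 40, 59, 60, 61, 80, 95]
counts = frequency(sample, [20, 40, 60, 80])
print(counts)  # [2, 2, 2, 2, 1]
```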
[Figure: histogram of frequency against class, with classes marked from 10 to 80 on the horizontal axis]

Using Autograph

With a grouped data set entered (with or without underlying raw data) you can draw:

• Histogram (see next page)
• Cumulative frequency diagram (frequency or percentile scale)

[Figure: cumulative frequency diagram]

Cumulative frequency, frequency polygon and box and whisker diagrams are drawn similarly.

The dot plot is useful for showing where the raw data points actually are, especially when drawn at the same time as another diagram, e.g. a box and whisker or a histogram.

• Numerical statistics can be generated as text:
(a) summary statistics (mean, mode, quartiles, SD, range, etc, for raw data and for grouped)
(b) tabulated results, including mid-interval value
(c) stem and leaf diagram (really only works for discrete integer data)

An example of a stem and leaf diagram, generated as text in the 'Results Box'.

 0:
10: 1 3 5 5 6
20: 2 2 4 4 4 5 5 5 5 6 6 9 9 9 9
30: 1 1 1 1 1 2 4 4 4 4 4 4 4 4 4 4 4 4 5 5 6
40: 0 0 1 2 4 5 5 6 6 8 8 8 9 9 9 9 9 9 8 8
50: 0 0 1 2 3 5
60:
70: 1 2 5 6 6
80:
90: 2

Example: Consider a grouped data set entered into Autograph defined by these variable-width class intervals:

0, 20, 30, 40, 45, 50, 100

and the following associated frequencies:

0, 10, 130, 90, 55, 75

If you select to draw a histogram from this data, the dialogue box asks you to choose 'frequency' or 'frequency density'.

Frequency:

[Figure: histogram plotted using frequency]

Here, a data set is clearly being mis-represented: the mode is wrong and there is undue weight to the final class.

Frequency Density:

[Figure: histogram plotted using frequency density]

When selecting frequency density, you need also to specify the 'per' unit. The default value = 1 so that the area under the histogram is a direct measurement of the frequency.

Tabulated results

Continuous:

Class      Mid-interval value    Class width    Frequency    Cumulative frequency
0-20       10.0                  20             0            0
20-30      25.0                  10             10           10
30-40      35.0                  10             130          140
40-45      42.5                  5              90           230
45-50      47.5                  5              55           285
50-100     75.0                  50             75           360

Discrete:

Class      Mid-interval value    Class width    Frequency    Cumulative frequency
0-19       9.5                   20             0            0
20-29      24.5                  10             10           10
30-39      34.5                  10             130          140
40-44      42.0                  5              90           230
45-49      47.0                  5              55           285
50-99      74.5                  50             75           360

Σf = 360, Σfx = 16 683, Σfx² = 857 259
Mean = 46.34 (= Continuous mean - 0.5), SD = 15.29 (unchanged)
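The frequency densities and the summary statistics for this grouped data set can be reproduced with a short calculation. This is a sketch only; the variable names are illustrative, and the mid-interval values follow the discrete classes 0-19, 20-29, and so on, each 0.5 below the corresponding continuous value.

```python
from math import sqrt

edges = [0, 20, 30, 40, 45, 50, 100]   # class boundaries from the example
freqs = [0, 10, 130, 90, 55, 75]       # associated frequencies

# Frequency density = frequency / class width ('per' unit of 1).
widths = [b - a for a, b in zip(edges, edges[1:])]
density = [f / w for f, w in zip(freqs, widths)]

# Mid-interval values for the discrete classes.
mids = [(a + b) / 2 - 0.5 for a, b in zip(edges, edges[1:])]

n = sum(freqs)
mean = sum(f * x for f, x in zip(freqs, mids)) / n
sd = sqrt(sum(f * x * x for f, x in zip(freqs, mids)) / n - mean ** 2)

print(density)         # [0.0, 1.0, 13.0, 18.0, 11.0, 1.5]
print(round(mean, 2))  # 46.34
print(round(sd, 2))    # 15.29
```

The largest frequency (130) belongs to the class 30-40, but the largest frequency density (18) belongs to 40-45, which is why plotting frequency alone gives the wrong mode for unequal classes.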
Example: Enter the following discrete raw data:

1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 23, 24, 24, 25, 25, 27, 31

and draw a histogram using the unequal classes 0, 10, 40 (i.e. 0-9 and 10-39). There are nine items in each class, so plotting 'frequency' gives two equal frequencies. Here is the same data plotted as a frequency density.

[Figure: histogram of the data plotted as frequency density, with the raw data points marked below the axis]

CHAPTER 2 CORRELATION AND REGRESSION

Choose Type: linear, and
Options: Forecast forward/back
display equation, display R²

[Figure: scatter diagram titled 'BRAIN SIZE - IQ', with IQ on the horizontal axis (100 to 200) and brain size on the vertical axis (600000 to 900000), showing the fitted line y = 1244.3x + 770610 with R² = 0.1496]
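The equation and R² value that the chart displays come from a least squares line, which can be sketched as follows. This is an illustration under stated assumptions: the function name `least_squares_line` is hypothetical, and because the brain size data are not listed here, the ten pairs of values from Example 13.1 of Chapter 13 are used instead.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept, r_squared) for the least squares line of y on x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar      # the line passes through the centroid
    r_squared = s_xy ** 2 / (s_xx * s_yy)  # square of the correlation coefficient
    return slope, intercept, r_squared

# Substitute data: the ten pairs of values from Example 13.1.
x = [5, 8, 12, 15, 15, 17, 20, 21, 25, 27]
y = [3, 11, 9, 6, 15, 13, 25, 15, 13, 20]
slope, intercept, r_squared = least_squares_line(x, y)
print(round(slope, 4), round(intercept, 3), round(r_squared, 4))
```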
I
- ~ - --1
I
L
[Figure: scatter diagram showing the effect of a rogue data point]

Here a set of data has a rogue point (which can be moved around by pressing Ctrl and dragging with the mouse), and the centroid and line of best fit are also shown.

      A         B       C
 1    DICE 1            = INT(6 * RAND()) + 1
 2    DICE 2            = INT(6 * RAND()) + 1
 3    SUM               = C1 + C2
 4              'x'     'f'
 5    Score     2       = C5 + (B5 = $C$3)
 6    Score     3
 7    Score     ...
15    Score     12

Fill down, mark B4:C15 and plot.

The logical statement in cell C5, '(B5 = $C$3)', equals 1 if TRUE, otherwise equals 0. This is a simple way to add 1 to a total if a condition is met.

However, a cell formula referring to itself is called a 'circular reference' and you need to set the following to make it work in this instance:

Tools - Options - Calculation tab - Tick Iteration
Max iterations = 1 (leave Max change)

Then hold down F9 to run the simulation.
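For comparison, the same two-dice simulation can be sketched outside the spreadsheet. The Python below is an illustrative analogue rather than the book's method: each pass of the loop plays the role of one press of F9, and the function name `simulate_two_dice` and the `seed` parameter are assumptions made for the example.

```python
import random


def simulate_two_dice(throws, seed=None):
    """Throw two dice `throws` times and tally the sum, as in cells C5:C15."""
    rng = random.Random(seed)
    tally = {score: 0 for score in range(2, 13)}  # 'x' column: 2 to 12
    for _ in range(throws):
        dice1 = int(6 * rng.random()) + 1  # same idea as INT(6 * RAND()) + 1
        dice2 = int(6 * rng.random()) + 1
        total = dice1 + dice2
        for score in tally:
            # The comparison is True (1) or False (0), mirroring (B5 = $C$3).
            tally[score] += (score == total)
    return tally


if __name__ == "__main__":
    for score, freq in simulate_two_dice(360, seed=1).items():
        print(score, freq)
```

Adding the result of a comparison to a running total is the same trick as the logical statement in cell C5.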
[Figure: SUM of 2-DICE column chart, frequency against score 2 to 12]

Here the principle of least squares regression is being illustrated, relating to a variable straight line through the centroid.
CHAPTER 3 PROBABILITY
Using a spreadsheet for probability

There are various ways to create random numbers in Excel:

RAND()                Random number 0 ≤ x < 1
INT(6 * RAND()) + 1   Random integer 1 ≤ x ≤ 6
RANDBETWEEN(a, b)     Random integer a ≤ x ≤ b

NOTE: If RANDBETWEEN does not work, go to Tools ⇒ Add-Ins and tick 'Analysis ToolPak'.

To make the x-axis work properly in the SUM of 2-DICE chart, proceed as follows:

Choose the 'Column' chart type and observe that it is plotting 'x' and 'f' against the row number.
Click 'next' ⇒ 'Chart Source Data' ⇒ 'Series'.
'f' is OK.
Select 'x' (which is plotting on the wrong axis). Copy and paste its array into the Category X axis labels slot, then click 'Remove' to take it off the y-axis list.
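The three Excel random number formulae have close counterparts in Python's standard `random` module. The sketch below is offered only as a comparison; the wrapper names `rand`, `die` and `randbetween` are hypothetical and simply mirror the spreadsheet functions.

```python
import random


def rand():
    """Like RAND(): a random number x with 0 <= x < 1."""
    return random.random()


def die():
    """Like INT(6 * RAND()) + 1: a random integer from 1 to 6."""
    return int(6 * random.random()) + 1


def randbetween(a, b):
    """Like RANDBETWEEN(a, b): a random integer x with a <= x <= b."""
    return random.randint(a, b)


if __name__ == "__main__":
    print(rand(), die(), randbetween(10, 20))
```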
Internet resources for probability

1. The DISCUSS site from Coventry University
This is a growing set of teaching resources covering many aspects of school level probability and statistics. Simulations available include one on Buffon's needle.

2. The Chance and Data site from Tasmania
This has an excellent probability section which links probability theory to stories in the newspapers.

[Screenshot: Chance and Data in the News, Main Index]

From the Autograph 'extras'

2. Dice throwing
This is similar to the spreadsheet example above, but more automatic to use. Options include
(a) the sum of 2 dice
(b) the difference between 2 dice
(c) throwing one die
(d) throwing n dice

The following formulae are available for discrete random variables in Excel:

BINOMDIST(r, n, p, T)
e.g. T = 0: X ~ B(10, 0.5)     BINOMDIST(2, 10, 0.5, 0) = P(X = 2)
e.g. T = 1: (cumulative)       BINOMDIST(2, 10, 0.5, 1) = P(X ≤ 2)

POISSON(x, m, T)
e.g. T = 0: X ~ Po(4)          POISSON(2, 4, 0) = P(X = 2)
e.g. T = 1: (cumulative)       POISSON(2, 4, 1) = P(X ≤ 2)

Example: To produce the distribution and the cumulative distribution for X ~ Bin(10, p)

Name A2 'n'
Name B2 'p'
Formula in D2 = binomdist(x, n, p, 0)
or for cumulative: = binomdist(x, n, p, 1)
Note 'x' is the column heading C1 and this can be used in the formula.
Enter C2 = 0
To create 0-10 in C2-C12, use Edit - Fill - Series
Fill down D2-D12 (double-click on the D2 cell dot)
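The values that BINOMDIST and POISSON return in the examples above can be checked from first principles. This is a minimal Python sketch; the functions `binomdist` and `poisson` are illustrative re-implementations with the same argument order as the Excel formulae, not calls to Excel itself.

```python
from math import comb, exp, factorial


def binomdist(r, n, p, cumulative):
    """P(X = r), or P(X <= r) if cumulative is truthy, for X ~ B(n, p)."""
    def pmf(k):
        return comb(n, k) * p ** k * (1 - p) ** (n - k)
    return sum(pmf(k) for k in range(r + 1)) if cumulative else pmf(r)


def poisson(x, m, cumulative):
    """P(X = x), or P(X <= x) if cumulative is truthy, for X ~ Po(m)."""
    def pmf(k):
        return exp(-m) * m ** k / factorial(k)
    return sum(pmf(k) for k in range(x + 1)) if cumulative else pmf(x)


if __name__ == "__main__":
    print(round(binomdist(2, 10, 0.5, 0), 4))  # P(X = 2)
    print(round(binomdist(2, 10, 0.5, 1), 4))  # P(X <= 2)
    print(round(poisson(2, 4, 0), 4))          # P(X = 2)
    print(round(poisson(2, 4, 1), 4))          # P(X <= 2)
    # Distribution table for X ~ Bin(10, 0.5), like columns C and D
    for x in range(11):
        print(x, round(binomdist(x, 10, 0.5, 0), 4))
```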
[Figures: column charts of the distribution and the cumulative distribution of X ~ Bin(10, p)]

CHAPTERS 6, 7 AND 8 CONTINUOUS DISTRIBUTIONS AND THE NORMAL DISTRIBUTION
The important principle to appreciate is that the total area must equal 1. Therefore the function to be plotted must be f(x) = kx², where k = 1/∫x² dx over the range. Autograph automatically converts any f(x) entry to k.f(x), and areas can be measured on-screen by entering limits. By dragging the limits around, it can be seen that the total area = 1, and so areas represent probability.

[Figure: graph of f(x) = kx²]

[Screenshot: spreadsheet with cells b2, b3, b4 and b6 named "m", "s", "x" and "y", used to calculate z = (x - m)/s and z = (y - m)/s]
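The scaling of f(x) = kx² so that the total area is 1 can be illustrated with a short calculation. The Python sketch below assumes the range 0 ≤ x ≤ 3, which is an arbitrary choice for illustration and is not taken from the text; the function names are hypothetical.

```python
def normalising_constant(a, b):
    """Return k such that k * x^2 has total area 1 over a <= x <= b."""
    integral = (b ** 3 - a ** 3) / 3  # exact integral of x^2 from a to b
    return 1 / integral


def area(k, lower, upper, steps=10_000):
    """Approximate the area under k * x^2 between two limits (midpoint rule)."""
    width = (upper - lower) / steps
    return sum(k * (lower + (i + 0.5) * width) ** 2 for i in range(steps)) * width


if __name__ == "__main__":
    k = normalising_constant(0, 3)       # assumed range 0 to 3, so k = 1/9
    print(k)
    print(round(area(k, 0, 3), 6))       # total area
    print(round(area(k, 1, 2), 4))       # P(1 < X < 2)
```

With the limits set to the whole range the area is 1, and any narrower pair of limits gives a probability.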
Example: X ~ N(500, 100²): here areas between limits and inverse calculations are possible. The parameters μ and σ can also be varied dynamically.

[Figure: N(500, 100²)]
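The two kinds of calculation mentioned for X ~ N(500, 100²), an area between limits and an inverse calculation, can be sketched with Python's standard `statistics.NormalDist`. The limits 450 and 550 and the probability 0.9 are assumptions chosen for illustration.

```python
from statistics import NormalDist

# X ~ N(500, 100^2): NormalDist takes the standard deviation, not the variance.
X = NormalDist(mu=500, sigma=100)

# Area between limits: P(450 < X < 550), limits chosen for illustration.
area_between = X.cdf(550) - X.cdf(450)

# Inverse calculation: the value x such that P(X < x) = 0.9.
x_90 = X.inv_cdf(0.9)

print(round(area_between, 4))
print(round(x_90, 1))
```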
Use New Statistics Page ⇒ Add Grouped Data ⇒ Use Raw Data ⇒ Edit Raw Data ⇒ Select Distribution. There is the option to create a set of random data from the following probability distributions:

Enter the sample size N, and press 'Create Sample'. Click 'OK', then 'Suggest Intervals' (amend if necessary), then click 'Continuous' or 'Discrete' as appropriate, then 'OK'.

With a data set in place, select 'histogram' then 'autoscale'. Then choose 'Sample Means'.

In the 'Edit Sample Means' dialogue box, enter the sample size (e.g. n = 5). You can then
(a) take samples one at a time, in which case the actual samples are indicated on the diagram together with their mean.
(b) take many samples (e.g. 100), in which case a dot plot is created.

The Central Limit Theorem is very effectively demonstrated with almost any parent population.

Example: CONFIDENCE(a, s, n)
This returns the confidence interval for a sample of size n from a population with SD = s, at a significance level = a. Unfortunately, a is a measure of the probability outside the interval. So for 95% confidence, a = 1 - 95/100.
e.g. CONFIDENCE(1 - 95/100, 2.5, 50) = 0.69
i.e. sample mean ± 0.69 at 95%

Example: CHITEST(array-1, array-2)
array-1 = actual frequencies
array-2 = expected frequencies
returns the result of the χ² calculation for the two arrays, expressed as a probability.
This enables a set of actual data (frequencies) to be tested against frequencies calculated from an underlying probability distribution.
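The CONFIDENCE example can be verified by hand: the returned value is the z-value for the chosen confidence level multiplied by s/√n. The Python sketch below is an illustrative re-implementation, with `confidence` as a hypothetical name using the same argument order as the Excel formula.

```python
from math import sqrt
from statistics import NormalDist


def confidence(a, s, n):
    """Half-width of the confidence interval for the mean.

    a is the probability outside the interval (e.g. 0.05 for 95%),
    s is the population standard deviation and n is the sample size.
    """
    z = NormalDist().inv_cdf(1 - a / 2)  # two-tailed critical z-value
    return z * s / sqrt(n)


print(round(confidence(1 - 95 / 100, 2.5, 50), 2))  # 0.69
```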
T = 1: POISSON(2, 4, 1) = P(X ≤ 2)

NORMDIST(x, m, s, T), e.g. for X ~ N(m, s²)
T = 0 ⇒ value of the pdf (for plotting the curve)
T = 1 ⇒ P(X < x)

NORMINV(p, m, s), e.g. for X ~ N(m, s²)
This returns the value x such that P(X < x) = p

NORMSDIST(z)
returns probability P(Z < z)

NORMSINV(p)
returns z such that P(Z < z) = p

STANDARDIZE(x, m, s) ⇒ z = (x - m)/s

Example: A tough driving examiner claims to pass only 20% of his candidates. After 25 tests, what is the smallest number of passes required to refute this claim at 5%?
Answer: critbinom(25, 0.2, 0.05) = 2
This means P(X ≤ 2) > 5%
whereas P(X ≤ 1) < 5%

[Figure: B(25, 0.2)]
Here H0 is p = 0.2 under B(25, 0.2) and H1 is p > 0.2. If x ≥ 9, H0 is rejected. The boundary limits can be dragged up and down the x-axis.

Example: Hypothesis testing on continuous probability distributions

[Figure: two normal curves with boundary at 18.82]
H0: N(23.68, 8.73)     P(X ≤ 18.82) = 0.05 = 5%        TYPE 1 error
H1: N(18.17, 4.16)     P(X ≥ 18.82) = 0.375 = 37.5%    TYPE 2 error
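The critbinom answer in the driving examiner example can be confirmed by accumulating binomial probabilities until the total first reaches 5%. This Python sketch is an illustrative re-implementation; `critbinom` and `binom_cdf` are hypothetical names, with `critbinom` following the Excel argument order.

```python
from math import comb


def binom_cdf(x, n, p):
    """P(X <= x) for X ~ B(n, p)."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(x + 1))


def critbinom(n, p, alpha):
    """Smallest x for which P(X <= x) is at least alpha, for X ~ B(n, p)."""
    for x in range(n + 1):
        if binom_cdf(x, n, p) >= alpha:
            return x
    return n


print(critbinom(25, 0.2, 0.05))            # 2
print(round(binom_cdf(1, 25, 0.2), 4))     # P(X <= 1), below 5%
print(round(binom_cdf(2, 25, 0.2), 4))     # P(X <= 2), above 5%
```

The same `binom_cdf` function shows why x ≥ 9 is the rejection region for H1: p > 0.2, since 1 - P(X ≤ 8) is the first upper tail below 5%.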
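The Type 1 and Type 2 error probabilities quoted for the continuous hypothesis test can be reproduced as follows. This Python sketch assumes that the second parameter in N(23.68, 8.73) and N(18.17, 4.16) is the variance, an interpretation that matches the quoted 5% and 37.5%.

```python
from math import sqrt
from statistics import NormalDist

# Assumption: the second parameter of each distribution is the variance.
h0 = NormalDist(mu=23.68, sigma=sqrt(8.73))
h1 = NormalDist(mu=18.17, sigma=sqrt(4.16))
boundary = 18.82

# Type 1 error: rejecting H0 (X <= boundary) when H0 is true.
type_1 = h0.cdf(boundary)

# Type 2 error: accepting H0 (X >= boundary) when H1 is true.
type_2 = 1 - h1.cdf(boundary)

print(round(type_1, 3), round(type_2, 3))
```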
CRITICAL VALUES FOR THE t-DISTRIBUTION

  p    0.75   0.90   0.95   0.975  0.99   0.995  0.9975 0.999  0.9995
v = 1  1.000  3.078  6.314  12.71  31.82  63.66  127.3  318.3  636.6
    2  0.816  1.886  2.920  4.303  6.965  9.925  14.09  22.33  31.60
    3  0.765  1.638  2.353  3.182  4.541  5.841  7.453  10.21  12.92
    4  0.741  1.533  2.132  2.776  3.747  4.604  5.598  7.173  8.610
    5  0.727  1.476  2.015  2.571  3.365  4.032  4.773  5.893  6.869
    6  0.718  1.440  1.943  2.447  3.143  3.707  4.317  5.208  5.959
    7  0.711  1.415  1.895  2.365  2.998  3.499  4.029  4.785  5.408
    8  0.706  1.397  1.860  2.306  2.896  3.355  3.833  4.501  5.041
    9  0.703  1.383  1.833  2.262  2.821  3.250  3.690  4.297  4.781
   10  0.700  1.372  1.812  2.228  2.764  3.169  3.581  4.144  4.587
   11  0.697  1.363  1.796  2.201  2.718  3.106  3.497  4.025  4.437
   12  0.695  1.356  1.782  2.179  2.681  3.055  3.428  3.930  4.318
   13  0.694  1.350  1.771  2.160  2.650  3.012  3.372  3.852  4.221
   14  0.692  1.345  1.761  2.145  2.624  2.977  3.326  3.787  4.140
   15  0.691  1.341  1.753  2.131  2.602  2.947  3.286  3.733  4.073
   16  0.690  1.337  1.746  2.120  2.583  2.921  3.252  3.686  4.015
   17  0.689  1.333  1.740  2.110  2.567  2.898  3.222  3.646  3.965
   18  0.688  1.330  1.734  2.101  2.552  2.878  3.197  3.610  3.922
   19  0.688  1.328  1.729  2.093  2.539  2.861  3.174  3.579  3.883
   20  0.687  1.325  1.725  2.086  2.528  2.845  3.153  3.552  3.850
   21  0.686  1.323  1.721  2.080  2.518  2.831  3.135  3.527  3.819
   22  0.686  1.321  1.717  2.074  2.508  2.819  3.119  3.505  3.792
   23  0.685  1.319  1.714  2.069  2.500  2.807  3.104  3.485  3.768
   24  0.685  1.318  1.711  2.064  2.492  2.797  3.091  3.467  3.745
   25  0.684  1.316  1.708  2.060  2.485  2.787  3.078  3.450  3.725
   26  0.684  1.315  1.706  2.056  2.479  2.779  3.067  3.435  3.707
   27  0.684  1.314  1.703  2.052  2.473  2.771  3.057  3.421  3.690
   28  0.683  1.313  1.701  2.048  2.467  2.763  3.047  3.408  3.674
   29  0.683  1.311  1.699  2.045  2.462  2.756  3.038  3.396  3.659
   30  0.683  1.310  1.697  2.042  2.457  2.750  3.030  3.385  3.646
   40  0.681  1.303  1.684  2.021  2.423  2.704  2.971  3.307  3.551
   60  0.679  1.296  1.671  2.000  2.390  2.660  2.915  3.232  3.460
  120  0.677  1.289  1.658  1.980  2.358  2.617  2.860  3.160  3.373
    ∞  0.674  1.282  1.645  1.960  2.326  2.576  2.807  3.090  3.291

CRITICAL VALUES FOR THE χ² DISTRIBUTION

  p    0.990   0.975   0.950   0.100   0.050   0.025   0.010   0.005
v = 1  0.000   0.001   0.004   2.705   3.841   5.024   6.635   7.879
    2  0.020   0.051   0.103   4.605   5.991   7.378   9.210  10.597
    3  0.115   0.216   0.352   6.251   7.815   9.348  11.345  12.838
    4  0.297   0.484   0.711   7.779   9.488  11.143  13.277  14.860
    5  0.554   0.831   1.145   9.236  11.070  12.832  15.086  16.750
    6  0.872   1.237   1.635  10.645  12.592  14.449  16.812  18.548
    7  1.239   1.690   2.167  12.017  14.067  16.013  18.475  20.278
    8  1.646   2.180   2.733  13.362  15.507  17.535  20.090  21.955
    9  2.088   2.700   3.325  14.684  16.919  19.023  21.666  23.589
   10  2.558   3.247   3.940  15.987  18.307  20.483  23.209  25.188
   11  3.053   3.816   4.575  17.275  19.675  21.920  24.725  26.757
   12  3.571   4.404   5.226  18.549  21.026  23.337  26.217  28.300
   13  4.107   5.009   5.892  19.812  22.362  24.736  27.688  29.819
   14  4.660   5.629   6.571  21.064  23.685  26.119  29.141  31.319
   15  5.229   6.262   7.261  22.307  24.996  27.488  30.578  32.801
   16  5.812   6.908   7.962  23.542  26.296  28.845  32.000  34.267
   17  6.408   7.564   8.672  24.769  27.587  30.191  33.409  35.718
   18  7.015   8.231   9.390  25.989  28.869  31.526  34.805  37.156
   19  7.633   8.907  10.117  27.204  30.144  32.852  36.191  38.582
   20  8.260   9.591  10.851  28.412  31.410  34.170  37.566  39.997
   21  8.897  10.283  11.591  29.615  32.671  35.479  38.932  41.401
   22  9.542  10.982  12.338  30.813  33.924  36.781  40.289  42.796
   23 10.196  11.689  13.091  32.007  35.172  38.076  41.638  44.181
   24 10.856  12.401  13.848  33.196  36.415  39.364  42.980  45.558
   25 11.524  13.120  14.611  34.382  37.652  40.646  44.314  46.928
   26 12.198  13.844  15.379  35.563  38.885  41.923  45.642  48.290
   27 12.879  14.573  16.151  36.741  40.113  43.194  46.963  49.645
   28 13.565  15.308  16.928  37.916  41.337  44.461  48.278  50.993
   29 14.256  16.047  17.708  39.088  42.557  45.722  49.588  52.336
   30 14.953  16.791  18.493  40.256  43.773  46.979  50.892  53.672
CRITICAL VALUES FOR CORRELATION COEFFICIENTS

These tables concern tests of the hypothesis that a population correlation coefficient ρ is 0. The values in the tables are the minimum values which need to be reached by a sample correlation coefficient in order to be significant at the level shown, on a one-tailed test.

        Product-moment Coefficient            Sample    Spearman's Coefficient
                  Level                        size             Level
0.10    0.05    0.025   0.01    0.005                   0.05    0.025   0.01
0.8000  0.9000  0.9500  0.9800  0.9900           4      1.0000  -       -
0.6870  0.8054  0.8783  0.9343  0.9587           5      0.9000  1.0000  1.0000
0.6084  0.7293  0.8114  0.8822  0.9172           6      0.8286  0.8857  0.9429
0.5509  0.6694  0.7545  0.8329  0.8745           7      0.7143  0.7857  0.8929
0.5067  0.6215  0.7067  0.7887  0.8343           8      0.6429  0.7381  0.8333
0.4716  0.5822  0.6664  0.7498  0.7977           9      0.6000  0.7000  0.7833
0.4428  0.5494  0.6319  0.7155  0.7646          10      0.5636  0.6485  0.7455
0.4187  0.5214  0.6021  0.6851  0.7348          11      0.5364  0.6182  0.7091
0.3981  0.4973  0.5760  0.6581  0.7079          12      0.5035  0.5874  0.6783
0.3802  0.4762  0.5529  0.6339  0.6835          13      0.4835  0.5604  0.6484
0.3646  0.4575  0.5324  0.6120  0.6614          14      0.4637  0.5385  0.6264
0.3507  0.4409  0.5140  0.5923  0.6411          15      0.4464  0.5214  0.6036
0.3383  0.4259  0.4973  0.5742  0.6226          16      0.4294  0.5029  0.5824
0.3271  0.4124  0.4821  0.5577  0.6055          17      0.4142  0.4877  0.5662
0.3170  0.4000  0.4683  0.5425  0.5897          18      0.4014  0.4716  0.5501
0.3077  0.3887  0.4555  0.5285  0.5751          19      0.3912  0.4596  0.5351
0.2992  0.3783  0.4438  0.5155  0.5614          20      0.3805  0.4466  0.5218
0.2914  0.3687  0.4329  0.5034  0.5487          21      0.3701  0.4364  0.5091
0.2841  0.3598  0.4227  0.4921  0.5368          22      0.3608  0.4252  0.4975
0.2774  0.3515  0.4133  0.4815  0.5256          23      0.3528  0.4160  0.4862
0.2711  0.3438  0.4044  0.4716  0.5151          24      0.3443  0.4070  0.4757
0.2653  0.3365  0.3961  0.4622  0.5052          25      0.3369  0.3977  0.4662
0.2598  0.3297  0.3882  0.4534  0.4958          26      0.3306  0.3901  0.4571
0.2546  0.3233  0.3809  0.4451  0.4869          27      0.3242  0.3828  0.4487
0.2497  0.3172  0.3739  0.4372  0.4785          28      0.3180  0.3755  0.4401
0.2451  0.3115  0.3673  0.4297  0.4705          29      0.3118  0.3685  0.4325
0.2407  0.3061  0.3610  0.4226  0.4629          30      0.3063  0.3624  0.4251
0.2070  0.2638  0.3120  0.3665  0.4026          40      0.2640  0.3128  0.3681
0.1843  0.2353  0.2787  0.3281  0.3610          50      0.2353  0.2791  0.3293
0.1678  0.2144  0.2542  0.2997  0.3301          60      0.2144  0.2545  0.3005
0.1550  0.1982  0.2352  0.2776  0.3060          70      0.1982  0.2354  0.2782
0.1448  0.1852  0.2199  0.2597  0.2864          80      0.1852  0.2201  0.2602
0.1364  0.1745  0.2072  0.2449  0.2702          90      0.1745  0.2074  0.2453
0.1292  0.1654  0.1966  0.2324  0.2565         100      0.1654  0.1967  0.2327

RANDOM NUMBERS

65 23 68 00 77 82 58 14 10 85 11 85 57 11 73 74 45 25 50 46
09 56 76 51 04 73 94 30 16 74 69 59 04 38 83 98 30 20 87 85
55 99 98 60 01 33 06 93 85 13 23 17 25 51 92 04 52 31 38 70
72 82 45 44 09 53 04 83 03 83 98 41 67 41 01 38 66 83 11 99
04 21 28 72 73 25 02 74 35 81 78 49 52 67 61 40 60 50 47 50
87 01 80 59 89 36 41 59 60 27 64 89 47 45 18 21 69 84 76 06
31 62 46 53 84 40 56 31 74 76 52 23 72 95 96 06 56 83 85 22
29 81 57 94 35 91 90 70 94 24 19 35 50 22 23 72 87 34 83 15
39 98 74 22 77 19 12 81 29 42 04 50 62 34 36 81 43 07 97 92
56 14 80 10 76 52 38 54 84 13 99 90 22 55 41 04 72 37 89 33
29 56 62 74 12 67 09 35 89 33 04 28 44 75 01 57 87 45 52 21
93 32 57 38 39 36 87 42 72 55 73 97 98 36 57 41 76 09 11 68
95 69 51 54 43 19 20 49 57 25 90 55 26 20 70 98 43 73 56 45
65 71 32 43 64 67 22 55 65 65 48 86 10 88 20 12 40 18 49 25
90 27 33 43 97 84 20 57 49 91 41 20 17 64 29 60 66 87 55 97
90 29 42 45 61 34 30 13 30 39 21 52 59 28 64 98 08 76 09 27
99 74 06 29 20 55 72 70 11 43 95 82 75 37 90 24 77 43 63 21
87 87 66 91 16 97 51 50 61 36 96 47 76 68 49 11 50 56 51 06
46 24 17 74 97 37 39 03 54 83 34 00 74 61 77 51 43 63 15 67
66 79 81 43 40 92 84 72 88 32 83 24 67 01 41 34 70 19 26 93
36 42 94 58 83 30 92 39 18 40 03 00 12 90 32 37 91 65 48 15
07 66 25 08 99 27 69 48 85 32 16 46 19 31 85 02 86 36 22 96
93 10 05 72 18 26 36 67 68 48 31 69 68 58 93 49 45 86 99 29
49 50 63 99 26 71 47 94 32 71 72 91 34 18 74 06 32 14 40 80
20 75 58 89 39 04 42 73 37 93 11 07 28 77 91 36 60 47 82 62
02 40 62 09 00 71 09 37 80 44 50 37 32 70 20 38 71 86 75 34
59 87 21 38 29 78 72 67 42 83 65 21 54 79 66 42 47 86 31 15
48 08 99 66 43 38 28 13 50 25 47 93 11 15 07 84 28 30 19 07
54 26 86 75 44 15 20 39 20 03 58 54 80 29 62 53 06 97 71 51
35 35 58 45 23 58 63 66 09 62 80 92 14 55 81 41 21 48 87 34
73 84 90 49 01 21 90 29 57 06 68 73 51 10 51 95 63 08 57 99
34 64 78 00 92 59 67 74 58 48 92 09 42 20 40 37 63 80 58 93
68 56 87 47 63 06 24 71 41 98 79 06 07 18 58 29 16 49 67 37
72 47 05 42 88 07 27 55 58 74 82 08 42 28 26 48 25 32 00 31
44 44 96 75 89 57 12 60 42 38 77 36 45 69 21 68 32 70 04 96
28 11 57 47 61 57 89 88 62 18 93 67 57 32 96 72 21 17 13 54
87 22 38 88 91 99 16 08 17 76 52 14 27 47 98 86 35 68 23 85
44 93 14 59 67 40 24 10 11 63 40 47 07 56 14 22 62 74 93 39
81 84 37 25 90 43 56 62 94 58 49 03 84 22 57 22 47 98 86 37
09 75 35 21 04 47 54 08 98 44 08 16 44 86 69 71 20 52 64 94
77 65 05 04 22 18 20 10 81 87 05 69 43 70 96 76 42 05 21 10
19 06 51 61 34 03 61 55 98 58 83 50 01 48 99 85 08 67 15 91
19 62 32 28 04 91 52 91 87 07 42 48 65 24 86 09 87 68 55 51
52 47 25 14 93 91 75 51 49 26 49 41 20 83 30 30 43 22 69 08
52 67 87 40 63 41 91 86 10 47 80 70 56 87 25 86 89 94 21 42
66 25 71 73 78 60 50 62 91 04 95 97 64 16 71 31 32 80 19 61
29 97 56 42 56 90 16 75 74 95 99 26 01 63 25 16 54 18 54 46
15 25 03 68 92 45 53 00 06 29 46 43 46 66 27 12 85 05 22 44
82 08 65 67 64 13 51 14 38 28 24 30 39 62 20 35 23 90 57 36
81 35 03 25 87 24 83 59 04 67 51 52 26 21 69 75 87 28 61 50

Each digit in this table is an independent sample from a population where each of the digits 0 to 9 has a probability of occurrence of 0.1. It should be noted that these digits have been computer generated, and are therefore 'pseudo' random numbers.
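A table of 'pseudo' random digits of the kind described above can be produced with any seeded generator. The Python sketch below is an illustration only and does not reproduce the printed table; the function name `random_digit_table`, the layout of 20 pairs per row and the seed value are assumptions.

```python
import random


def random_digit_table(rows, pairs_per_row=20, seed=None):
    """Return rows of two-digit groups, each digit 0-9 equally likely."""
    rng = random.Random(seed)
    table = []
    for _ in range(rows):
        digits = [rng.randrange(10) for _ in range(2 * pairs_per_row)]
        pairs = [f"{digits[i]}{digits[i + 1]}" for i in range(0, len(digits), 2)]
        table.append(" ".join(pairs))
    return table


if __name__ == "__main__":
    # The same seed always gives the same table, which is what makes
    # the numbers 'pseudo' random rather than truly random.
    for line in random_digit_table(5, seed=2001):
        print(line)
```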
ANSWERS
0-~~
6. (a) 7.4 hours, 0.5 hrs
(b) 0.074 g, 0.005 g
0 5 10 15 20 25 30 35 40 45
time (s)
. (a) 5.
Mass (g) Frequency f. d. f.d.
Speed 0 20- 24- 30- 32- 38- 48- 60- 0.3 J 6
ii! iii'! . !)
85-89 4 0.8 tti
90-94 6 1.2 frequency 20 24 24 16 12 10 6 0 i~. f. d.
!~L
95-99 7 1.4
0.2
p,
100-104 13 2.6 6. Boundary points 176.5, 186.5, 191.5, 196.5, 201.5, 206.5, 4
105-109 10 2 216.5
110-114 5 1 f.d. 1.2, 1.6, 1.6, 1.8, 1.4, 0.6
0.1
.i!
115-119 5 1
2
iLl ;I' .illi'
(b)
I!' d 1. f.d.
0 jill!_
2
w H 50 100 150 200 250 300
score
0
·r
111111111
height (em)
0 30
186.5 196.5 206.5 216.5 f. d. (c) 12.9 m (3 d.)
176.5
height (em) 8. (a) Boundary points 9.5, 19.5, 24.5, 29.5, 30.5, 34.5,
H~ !4. 20 39.5, 59.5
1:8111
! 7. Plot polygon at (0.75, 2), (2.25, 41), (4.5, 7!), (9, 31), f.d. 2, 4, 3, 14, 4, 2, 0.5
(13.5,2), (18, 1). (b) 28 seconds.
84.5 94.5 104.5 114.5 8.
modal class is 100-104 mass (g)
Exercise 1e Weighted means (page 36)
Number of occurrences of c Frequency Width f.d.
(c) 8 6 6 7 8 1. 10.4
1
9 222233 0-2 1 3 2. Class teacher, 1.65%
9 5 6 6 7 8 9 9 3-5 5 3 1i' 45 50 55 60 65
3.
4.
40.6
4
10 0 0 0 1 1 1 1 1 2 2 3 3 4 height (em)
6-8 6 3 2 5. 5, 65.8
10 5 5 5 6 6 7 7 8 8 9
9-11 3 3 1 The maize seedlings showed a tendency to grow taller with
11 013 3 4 ]Key:10I3means103] the stronger solution.
1166788 . 12-14 5 3 1' Exercise lf Mean and standard deviation
15. (a) a=20,b=26,c=12
mode= 101 g
15-17 4 3 1'
' (b) 88
(page 44)
3. Boundary points 0, 25, 60, 80, 150, 300
f.d. 2.48, 2, 4.4, 4, 0.2
' 1. (a) 5, 2 (b) 8.5, 1.80 (c) 18.8, 6.46 (d) 10l, 4.10
Plot boundaries at -0.5, 2.5, 5.5, 8.5, 11. 5 , 14 ·S, 175 Exercise lc Pie charts (page 26) (e) 3.42, 1.91 (f) 205,3.16
or at 0, 3, 6, 9, 12, 15, 18 2. (a) f.d. 0.2, 0.32, 0.62S, 1.04, 0.08
1. (a) 1540,26°,64°, 116°
9. Plot polygon at (18, 17.5), (22.5, 94), (27.5, 107), (c) 5.51 em h ' ~.
(32.5, 56), (40, 11.8). . !I' ' '
2. 208°,460,38°,36°, 32°; 5.25 em
Modal class 25-30, skewed with a tail to the nght.
(Other answers possible) S
3. 66°, 156°,24°,42°, 72°; 5.5 em, 6 em; 50° f.d. w u II'! :;
10. Boundary points -0.5, 9.5, 14.5, 19.5, 29.5, 39.5, 9 ·5
(say)
or o, 10, 15, 20, 30, 40, 60 (say)
4. (a) £120 000 (b) 68 000 (c) 90", 2JO, 9", 30"; 7.5 em
S. (a) 42 (b) 40o (c) 91; 420,30.0 em
6. (a) 860, 38", 32", 20", 168", 16" (b) 5.5 em
0
200 300 400
i' Iii
500
7. (a) £2000,£8000 (b) £400 (c) 2JO (d) 80"
2 f.d. 0.5, 1.6, 6.4, 4.1, 1.6, (0.1). 69 5 wage(£)
8. 28.8", 72", 115.2", 144"; 180
11. Boundary points 9.5, 29.5, 39.5, 49.5, 59.5, 64.5, . , (b) £338.25,£59.60
9. (a) £4500 (b) 1550, 1650 (c) 132", 24"; 8 em
1 84.5 85 3. 69.3, 1.7
or 10, 30, 40, 50, 60, 65, 70, 4. 11S.8, 7.58
or 9,29,39,49,59,64,69,84
Exercise 1d The mean (page 34)
5. 16.6 seconds, 2.63 seconds
f. d. 1.1, 1.8, 2.2, 2.4, 2.8, 2.4, 1.6 1. (a) 9.7 (b) 154.8 (c) 51.375 (d) 1775j 6. 6.8, 1.11
50 100 150 200 250 300 (e) 0.908 (3d.) (f) 4 (g) 29.54 (h) 122.82 7. (a) 2 min 38 sec, 1 min 54 sec
12. 6, 8, 8, 6, 4, 10 50 300
time (mins) 13 Take boundary points 50, 100, 1SO, 200,2 •' 2. 49.3
{b) Histogram f. d. 6, 10, 15, 2.5, 0.8
. Lucy' Plot (75, 0.12), (125, 0.28), (175, 0.2), (225, 0.12), 3. 45 (2d.)
4. Boundary points 40.5, 50.5, 55.5, 60.5, 70.5, 75.5 Frequency polygon: plot (0.5, 6), (1.5, 10), (2.5, 15),
(275, 0.08) 25 0 32) 4. {a) Boundary points 0, 5, 10, 15, 20,40 (4, 2.5), (7.5, 0.8)
f.d. 2.1, 12.4, 11, 5, 2.4 Jack: Plot (75, 0.04), (125, 0.12), (175, 0.2), (2 ' . ' f.d. 2.4, 7.6, 8.4, 4, 0.4 8. 29, 5.9
(275, 0.12) (b) £11.92 9. 5.10
5. Boundary points 0, 15, 30, SO, 70, 100 10. 5
f.d. 3.6, 5.2, 6, 4.4, 2 11. (a) 10 (b) 11.7
10 43.35 years
12. (a) 121,6.19 (b) 14, 1703.8 (c) 1716,3.59
6. 21.4 em (d) 1026,58 770
f.d.
7. (a} There should not be gaps between the bars. Heights 13. {a) Frequency=5+18+22+28+22+18+5=118
5 should be adjusted so that area= frequency (Area= f. d. x width)
(b) Boundary points 4.S, 9.S, 12.5, 15.S, 18.5, 28.5 (b) Symmetric. Midpoints of intervals have been taken to
f.d. 2.8, 6, 5, 1j-, 0.8 represent the interval.
(c) 3.5 mm (2 s.f.)
0 14. 28.15, 3.84
40.5 50.5 60.5 70.5 15. 5.3
mass (kg) 16. 30.0 mph, 5.85 mph
(b) 82%
xercise lg Mean and standard deviation 6. (a) numberofgoals 0 ~1 "2 .;;:3 .;;;4 (.5 .;;:6 (c) 6.5, median
8. r;::.::::-;:(~---;-,...--c----.
trme mlns) cumulative frequency
lage 50) (d) Histogram to show pH value
cumulative 0 1 4 6 11 19 25 <39.5 0
l. 19
L 8
frequency .•
.£ 4
u
<44.5
<49.5
8
30
3. 7 (b) <54.5 64
4. 3.74 25 II'
';,'! hH'u i>
~ 3 <59.5 94
5. a=6,b=4
6. 15.6, 7.66
~
g 20
~
,, l <64.5 120
r distance (km)
(a_)~9:n:Iin~s~(=b~)~A~p~p~ro:x:·~1~1°:Yo~;~56~m~io~"~--
~
0.
1.
2.
3.
2.3, 1.41
11.7%, 2.2%
(a) 4.6, 2 (b) 4.56, 2.04
25 1 2 4 4
B 5
0
0 2
:iii.
3 4
iii
5 6
li 9.
0
cumulative frequency
0
30 0 1 1 2 2 2 3 3 3 4 number of goals <4 1
35 0 1 2 3 3 3 3 4 <10 3
(c) 5 <20
40 0 2 2 4 9
(d) 2
45 0 4 3. <35 28
7. (a) 2 (b) 3 (c) 2.47 (d) 1.94 mass (g) cumulative frequency
so 2 8. (a) 2,3 (b) 2
<60 40
55
60 1
I
Key: 45! 4 means 49[ (c) It only considers the middle 50% and does not take 3
5
< 100 50
account of large families.
Features: modal class 30-34, skewed to the right, 61
extreme value (outlier), 36.87, 35.59
l4. £195.45,£14.12
Exercise 1k Cumulative frequency, median and
10
22
Cumulative frequency curve to show distances travelled
50
lfi'" I''"'
,,
quartiles- grouped data (page 81) 32
l5. 11.87, 0.80 38 !!
Some answers are approximate and depend on the curve drawn 40
l6. 4.44 40
Exercise lh Scaling sets of data (page 55) 1· {a) r-m--as-s~(k~g~)-----c-um--u~la~ti-vo-f~r,-q-u-,n-c-y' Plot (50, 3), (54, 5), (58, 10), (62, 22) (66 32) (70 38)
(74, 40) ' ' ' , ,
1. (a) 6, 2.14 (b) 516,2.14 (c) 78,27.8 .;;: 39.5 0 Median mass= 61.3 g
2. 50, 12 ..:;;44.5 3 4 · (a} time (minutes)
3. (a) p + k, a (b) Pf.l, pa; 3,u + 5, 3o 5 cumulative frequency
<49.5
4. (a) 2 (b) 200 (c) 2.02 (d) -4, -1, 2, 5, 8, 11,14 <54.5 12 <S 2
5. (a) a~~. b ~ 22 (b) 70 (c) 76 <59.6 30 < 10 4 10
6. (a) 38, 8.99 (b) 34, 77 <64.5 48 <15
7. a=l.6,b=10 <69.5 51 <20
7
13
!n
8.
9.
a= 0.8, b = -5; 6.25
(b) Take mark intervals 0 < mark < 10, 10 < mark < 20,
<74.5 52 <25 25 0 "' !m W 1
.;;;30 41 0 20 40 60 80
Plot !00
etc. 05 47 o, M
f.d. 0.1, 0.8, 1.9, 2.8, 2.5, 1.7, 0.7, 0.3, 0.1.
(39.5, 0), (44.5, 3), (49.5, 5), (54.5, 12), (59.5, 30),
<40 so "" distance (km)
r
(a"-)c-;;32;;:=:km~·'i(~b)C..::A:"p'::pr~o::'x:_.3:'0~k"'".'n~lcC)I_A'.'pPip!"r".'ox";. 54%
(64.5, 48), {69.5, 51), (74.5, 52). Join with a smooth
boundaries 0, 10, 20, 30, 40, 50, 60, 70, 80
curve. (b) 24 (c) 26 (d) 23 {c) 25 mrns (f) 4.5 mins 10.
(c) midpoints 5, 15, 25, etc, 40.4, 15.4; price (£x} cumulative frequency
(b) 21 (c) 14 (d) 62 kg (o) 58.4 kg (f) 7.2 kg 5. (a) 687.5 hours (b) 13.2 hours
(d) a~ 24, b ~ 0.65 (2 d.)
10. (a) 12.5 (b) 20; 80, 5. 2. (a) I ii(j !!''(\ 6. (a) 80.75 g (b) 215 0
50 7. (a) Cumulative frequency cwve to show maximum temperatures 6
Exercise 1i Coding (page 58) H'ti 16
28
1. (a) 313.76,5.19 (b) 431,132 (c) 0.0171,0.00818 40 "110 41
2. 51.235,0.927 < 120 48
~~~~~1
3. 89.3275 53
4. 31.7mins.
5. 71.2, 3.82 Plot (75, 0), (95, 6), (100, 16), (105 28) (110 41)
(120, 48), {135, 53) ' ' ' '
6. 46~ sees.
(a) £104 (b) £13 (c) 47
11. x=25,y=17
Exercise lj Cumulative frequency, median and Hili 12. Plot (405, 0), (415, 4), (425, 7) (435 13) (445 23)
quartiles - ungrouped data (page 73) (455,28), (465,30}. ' , ' ' '
10 437, 412.5, 453.
1.
2.
(a)
4
9 (b) 207 (c) 1896 (d) 0.55
:::: 13. Plot (80, 0), (85, 6), (90, 18), (95, 40) (100 71) (105 86)
3. (a) 61 (b) 52 (c) 73 (d) 21 5 0 5 10 15 20 25 30 (!1 O, 93),(115, 97), (120, 99), (125, lOO) ' ' ' '
!
4. (a) 46,35 (b) 1.8, 1.2 (c) 20.5, 11.5 0 temperature "C (a) 97 mms (b) 10 mins (c) 62
5. (a) 7, 2 (b) 14, 3 4.4 4.8 5.2 5.6 6.0 6.4 6.8 7.2 7.6 8.0 8.4 14. Plot (165, 0), (170, 18), (175, 55), (180 115) (185 180)
pH value (b) 12"'C (c) 80 (d) Approx. 10% (190,228), (195,250) ' ' ' ,
(a) 180.5 em (b) 175.5 em (c) 187 em (d) 189.5 em
15 · (a) 5? mins (b) 71.5 mins (c) 32%.
(c) u.c.b. 0, 5, 10, 15, 20, 25, 35; c.f. 0, 1, 6, 9, 11, 12, 12. (a) Stem Leaf
16. Plot (69.5, 0), (74.5, 8), (79.5, 28), (84.5, 53), (89.5, 84),
13; Q, ~ 7.25, Q, ~ 10.8, Q, ~ 16.875; 6.075, 3.55; 4 1234467788 (d) a= 0.85, b = 1 (dependent on values in (c))
(94.5, 94), (99.5, 100). (c) yes
positively skewed 5 0222346788
9.3 sees, 22,75.5 sees. (d) u.c.b. 0, 5, 10, 15, 20, 25, 30; c.f. 0, 5, 20, 45, 90, 6. (a) (i) 49.66 (ii) 433.97 (iii) 20.83
6 023366778
17. SOp, £4.96, £5.96. Large amounts affect the mean but not (b) c.f. 3, 9, 18, 28, 40, 58, 72, 83, 88
140, 160; Q, ~ 14, Q, ~ 18.9, Q, ~ 23; 4.1, 4.9; 7 00224467888
the median (c) Plot (0, 0), (10, 3), (20, 9), (30, 18), (40, 28), (50, 40)
negatively skewed 8 01255667 r~~------~
18. Histogram: frequency densities 0.2, 0.5, 0.9, 0.8, 0.1
5. X= 63.9, s = 29.5, outliers would be less than 4.9 mins, 9 3 3.4 I
Key: 412 means 42/ (60, 58), (70, 72), (80, 83), (90, 88)
(d) (i) 52 (ii) 32
,
thickness (mm) 0 <20 <30 <40 <50 <60 greater than 122.8 mins, outliers are 133, 144. (b) Q2 = 66 mtles, Q 1 =52 miles, Q 3 = 78 miles
(c) (t) 11
6. Compare median, quartiles, range, skewness
cumulative number 0 2 7 16 24 25 7. c.£. 11, 39, 77, 111, 138, 150
7. December: Q 1 = 0.3, Q 2 = 1.8, Q 3 = 2.7;
of strata July; Q, ~ 4.1, Q, ~ 6.5, Q, ~ 9.8 Take as boundaries 0.90, 1.15, 1.30 etc. or 0.91, 1.16,
40 50 60 70 80 90 1.31, etc. or 0.905, 1.155, etc. Median ""£1 30
Plot (0, 0), (20, 2), (30, 7), (40, 16), (50, 24), (60, 25) 8. (a) Stem(£) Leaf (p) · ·
36 mm, 15 mm, 0.24. Distance (miles)
(d) (i) Gives a visual impression of the data whilst 3 40 60 75 95
Exercise 11 Skewness (page 90) keeping the details. 4 20 50 75
July (ii) Gives an immediate impression of an 5 @ 75
1. (a) 0.535 (b) -0.674 a~proximately symmetrical distribution with the
0 5 10 6 45 60
2. -2.4 Hours of sunshine mtddle 50% lying between 52 and 78 miles. 7 25
3. 2 8 75
4. {a) Frequency densities: 0.8, 3, 5, 1.8, 1.2, 0.47, 0.2 8. (a) 0 1 2 2 5 9 Miscellaneous exercise ln (page 110)
1 0 0 2 3 5 7 9 9 9 60
(b) Positively skewed 10
2 25999 1. (o) X= 5.42, s = 0.33; range= 1.79, g = 5.46 ,
5. Vertical line graph, 2, 3, 3.53, 1.985, 0.801, 0.771 2
Ql = 5.295, Q 3 = 5.615, outlier= 4.07 11
6. -0.482 3 0 1
7. (a) B (b) A (c) C 4 5 7 8 IKey: 2!5 means 9.25 a.m. \ (b) (i) 5.465
(ii) 5.47 outlier
12 25
Q 2 = £5.20, range= £8.85
8. (a) (i) 0.75 (ii) 0.28 5 3
(iii) 0.22
*---~ (b) x~£6,,~£2.47
(b) Frequency densities: 0.2, 1, 1.2, 1.8, 2.8, 0.6, 1.2, 1, (b) 9.19 a.m.
(c) A: X= £6.30, s = £2.47
0.4, 0.2 (c) 9.10 a.m., 29~ minutes past 9.
4.00 5.00 6.00 B: X= £6.30, s = £2.59
9. (a) 9.6 ruins, 1 min (b) 0.33 (d)
specific gravity (d) mean.remains the same; lower paid workers do not
(c) (4.65 mins, 14.61 ruins) benefit under scheme B.
2. (a} Boundary points for histogram
(d) (4.3 ruins, 15.27 mins) 9. (a} 8, 6, 4t, 3
9.00 9.10 9.20 9.30 9.40 9.50 689.5, 709.5, 719.5, 729.5, 739.5, 744.5, 749.5
10. (a) 0.143 (Q, ~ 17, Q, ~ 26, Q, ~ 38)
(b) 0.0668(Q, ~ 11.9, Q, ~ 16.1, Q, ~20.9) Time of delivery 754.5, 759.5, 769.5, 789.5 , 10
(c) 0.333 (Q, ~ 9, Q, ~ 11, Q, d5). First interval l.c.b. 689.5, u.c.b. 709.5 f.d.
f.d. 0.15, 0.7, 1.5, 3.8, 8.2, 7, 4.2, 3.2, 1.4, 0.5
(b) Plot (689.5, 0), (709.5, 3), (719.5, 10), (729.5, 25), 8
Exercise 1m Box plots (page 99)
0 2 4 X (739.5, 63), (744.5, 104), (749.5, 139), (754.5, 160),
1. (a) Plot (0, 0), (1, 8), (2, 19), (3, 36), (5, 44), (10, 50) (759.5, 176), (769.5, 190), (789.5, 200). 6 f---
(b) 2.35 mins, 1.4 mins, 3.4 mins (c) 744.24, 14.86
(c) Positively skewed (d) 744.01, 736.08, 752.12
(t) 0.046 4
LJ_j-----------;10 (f) 0.011
2345678x
Length of call (mins) (g) In b?x plot, draw whiskers from 689.5 to 789.5, with 2
10. (a) 6, 5
medtan and quartiles as in (d).
2. Group 1: Q 1 = 0.17; Q 2 = 0.21, Q 3 = 0.23; times from (b) More than 3 standard deviations from the mean 3. 16, 6 (e) 5.86 (b) 15, 7
0.14 to 0.26 (c) (i) older brother or sister also attended 4. 35 yrs 1 month, 11 yrs 3 months. 0~~----~r--r--~
Group 2: Q 1 = 0.16, Q 2 = 0.19, Q 3 = 0.22; times from {ii) a mistake had been made 0 10 20
(a) median= 33 yrs 9 months, IQR = 17 yrs 11 months 40 60 80 100
(d) 5.5, 5 (b) 61.8% length (mm)
0.09 to 0.25
(t) decrease 5, (e) 44.5
Group 1 (c) Approx. 2.5 mm (modal class is 0.;;; x < 5)
(f) positive, less (b) 51.75 (d) (i) 39.9 mm
Group 2
11. (a) height gain (grams)
36 0 9 9
(c) >.
~ 20 0 i! !,, "i (ii) 35 nun
10. (a) 275
37 6
l" :Iii! i (c) Comparative bar chart
11. 57, (a) it becomes 39
38 !;c;
0.08 0.26 39 1 7 7 9 ~ 150 (b) x = 3x- 141 does not have an integer solution.
Reaction time (sees) 40 2 3 7 3 !j 12. 10~.7.mm, .0.4 mm; machine B nearer 100 on average, less
3. Q 1 = 22, Q 1 = 35 Q 3 =51; whiskers from 16 to 97. 41 0 0 E vanatwn Wtth machine A.
Boundary for outliers 94.5; outlier 97 42 0 5 7 G 100
43 0 4 H~
44 5
45 !
Key: 39\7 means 397] 50 w::
10 50 90 97 46 2
Length of line (mml (b) Draw plots - New corn: whiskers from 360 to 462,
Q 1 = 397, Q 2 = 450, Q 3 = 426; Standard corn:
4. (a) u.c.b. 0, 20, 30, 40, 50; c.f. 0, 20, 40, 65, 69; 0
whiskers from 321 to 423; Q 1 = 353, Q 2 = 368.5,
Q 1 = 17.5, Q 2 = 27.5, Q 3 = 35; 7.5, 10; negatively 69.5 89.5
Q, ~ 383 mark
skewed
{b) u.c.b. 0, 20, 40, 80, 100; c.f. 0, 4, 10, 34, 44; Ql =40, Q.1 =64
Q 1 = 41. 7, Q 2 = 60, Q 3 = 78.3; 18.3, 18.3; negatively
skewed, zero quartile skewness
(a) Histogram to show times to complete half-marathon Mixed test 18 (page 116) Data set 2
13. (a) 1, could be 1 or 2
(b) Positive skew, possible outlier 3 1. (a) 1.15 (b) 1 (c) 1.09 (a) y=90.31-1.78x
{c) 2, 1.7; more than 3 standard deviations from the 2. {a) (i) Easier to see the spread (b) X~ 37.80- 0.39y
mean (ii) 1 1 2 2 3 4 4 4
(d) (A) a mistake. 2 1 56779
(B) could be correct. 2 1 1 1 2
2 5 5 7
,(,,~)~1~·:88~,;1~.4;8~~~~~--~~~~--~~ 3 1 2 2
14. 'Cost (£1000) ~50 ~ 60 00 000 050
3 5 5 9
c.f. 540 1690 3010 3870 4320 4 1 3
Plot (20 000, 0), (50 000, 540), (60 000, 1690),
4 4 5 /Key: 1!5 is 15cm I
(b) 24.6 em
(70 000, 3010), (100 000, 3870), (150 000, 4320). 75 85 95 105 115 125 135 145 155
time (mins) (c) 21cm
Q "' £63 000, IQR: a value between £18 000 and £23 000
2 (d) Median better; distribution not symmetrical
is acceptable (b) 96.15 mins 3. (a) «% .
15. f. d. 0.93, 2.4, 1.4, 1.6, 0.9 2. {a) 7, 6, 4, 8 (b) 33' 0 -f'llJ.L..'.CfLi'.li!.J.+.ill.IJ.illjJJ-'+.
Histogram to show age distribution (b) 6.55, 5.7, 8.1 4. (a) Median same for both. 0 5 10
(c) ~has 3 outliers; ignoring these, B's average waiting Good negative correlation
trme would be lower. 2. (a)
2 B's times are less variable than A's.
5.0 6.0 7.0 8.0 9.0 10.0
.,; A's times are positively skewed, B's are negatively
Blood glucose level (mmoi/C) 6
1 skewed.
(d) Positive skew. (b) {i) If outl!ers are not the Post Office's fault, choose B
3. (a) 4.5 (b) 1.5 for qmcker service,
0 (c) No change to mean, standard deviation is increased. (ii) If outliers are the Post Office's fault then the
20 30 40 50 60
age (years) 4. (a) Pie chart, bar chart situation could happen again and there could be a
(b) Children in school, sample not representative. long wait. A avoids long waits.
(b) 35! yrs. 5. (a) 006788 2
(a) 40.15 (c) f.d. 3.6, 6.4, 4.4, 1.4, 0.7
16. 10 0 0 1 2 2 3 3 3 4 4
Time(mins) Frequency Frequency density
10 5 6 6 6 6 7 7 7 8 8 9 9
0<:;;x<1 20 20 20 0 2 3 o+V~~~~lli4~~
1<x.;;;2 47 47 20 7 0 110 120 130 140 150 160 170
2<x<l.S 51 102 (b) Q 2 = 15.5 mins, Q 1 = 12 mins, Q 3 = 18 mins. temperature
2.5<x.;;;3 59 118 (c) f--{IJ----1 (b) y~ 0.614 + 0.0207x
3<x<5 138 69 3. y = -2.59 + 0.65x; 36.5
S<x.:;;lO 85 17 4. F = -6.33 + 0.90!, F = 20.8
5 10 15 20 25 30
5. )' = 3.8 + 1.6x, x = -2.06 + 0.59y
Times (mins)
6. (a) Y ~ 15.83 + 0.72x (b) 66 (c) 59
6. (a) (i) 3 hrs 3 mins 7. (a) y
(ii) Q1 = 2 hrs 42 mins, Q3 = 3 hrs 42 mins.
100
(b) 40, (200), 200, 60
(c) (i) 3 hrs 20 mins (ii) 54 mins
80
words per sentence
NB: boundary points could be 0.5, 5.5, 10.5, 15.5, 25.5, Chapter 2 0
45.5
0
0 60
+i
(d) 13.8, 10.2 Exercise 2a Equations of least squares :;;
(e) 9.11 regression lines (page 136) §
5. (a) Histogram 40
1. Dataset!
3} mins, divides area in half. (b) Individual values are not known and mid points have
been taken as representatives of the intervals. (a) y = 4.50 + 0.64x
17. (a) 8, 9.5 mins
(b) Boundaries 0, 5, 10, 15, 20, 25, 30; (c) 69.5, 7.6
(b) x=4.42+0.75y
20
i!
f.d. 8, 11.2, 5.6, 4, 2.4, 0.8 (d) Median- no effect, IQR- no effect,
(c) 10 mean- increased.
(d) A False, B True, C False, D True 7, 15, 35, 20, 13, 10
9, 5.43, 14.5 0
20 0 20 40 60 80
Mixed test 1A (page 114) Male employees
units of output (1000's)
1. t               f    f.d.
   65 < t ≤ 85     25   1.25
   85 < t ≤ 95     28   2.8
   95 < t ≤ 105    20   2
   105 < t ≤ 115   17   1.7
   115 < t ≤ 155   10   0.25
[Box plots for male and female employees; horizontal axis: time (years), 10 to 30]
(b) 20.7 + 0.96x
(c) 31 000-33 000 units (d) Break-even point.
8. y = 1.8 + 1.3x
9. y = -8 + 1.2x
10. c = 15, d = -5
Good positive correlation
8. -0.036, no agreement. y
11. (a) y, 3. (a) -0.558 · h 9. 0.84, strong positive correlation between number of years
20000
(b) Low unemployment appears to be linked to high wage
smoking and extent of lung damage. 12.5 '''iii; iii j
4. 0.79
inflation, so suggestion justified.
12.0
L\'' i!J
5. 0.73, y = -25.4 + 0.53x, x = 94.4 + 1.01y w ,•• 11.5 H:i
15000 6, 0.60, w ~ -76 + 0.89 h
§ 7. 0.77
11.0
"
ro 8. -0.415 10.5
ir
"" 10000
'L
9. (a) 0.954 (b) 2, 3 10.0
ro
0 10, (a) y X X
Iii
c
c liiliilr (c) -0,92 (d) -0.9 9.5
ro L~,&
"
5
u
y~;:
5000 170 9.0
~ 160
!! j TiD !1'
0
E
-;;; 150
11. (a)
...•. .... . .. . 8.5
0
iii: i'i i;
0 20 40 60 80 •
~
• •••• : • • • • 0
0 40 45 50 55 '
140 (b) y~23.0-0.267x
(b) y d710 + 192x
(c) Appears reasonably satisfactory apart from Band~ 130 (c) 7500, There isn't a wide degree of scatter, so estimate
X X
who have earned substantially more than the equation could be reliable, but in general it is unwise to
120 0,60, 0.60
suggests. extrapolate outside the range of data.
12. (a) 0.7, good agreement between judges.
(d) (i) y ~ 4210 + 192x 110 No. The points do not lie in a line.
(b)
p
(ii) y=4010+207x ~ X
0
(iii) y = 4160 + 100x
(c) It would contain a term for employees who work
away from home e.g. y = a + bx + c, where c ≈ £3000
for employees who work away from home and zero
25 30 35 40 45
body mass (g) 'IL_
.: ·:_: YL·
..
.
(b) y ~ 48.35 + 2.75x
otherwise. (c) 0.787
X X
12. 0.3, 0.6
13. (a) Exercise 2c Spearman's coefficient of rank 13. (a) (i) -0.976 (ii) -0.292 (or 0.292)
(b) The transport manager's order is more profitable for
correlation (page 151) the seller, saleswoman is unlikely to try to dissuade.
0 10
(b) 0.935
1. 0.26 (c) (i) No, maximum value is 1
(c) (b) indicates strong positive linear correlation and
130 2. (a) 0.43 (ii) Yes, higher performing cars generally do less
diagram confirms this is appropriate.
(b) Some agreement between average attendance ranking mileage to the gallon.
(d) p = 2.58 + 0.88T; 15
and position in league, high position in league (iii) No, the higher the engine capacity, the dearer the
6. (a) Em\ page 121 diagram 3
correlating with high attendance. car.
125 (b) y = 7.77- 0.005x
4. 0.033, little or no correlation. (d) When only rankings are known; when relationship is
5. -0.62, some agreement between the scores. (c) 5.77; treat with caution as outside range of data.
0 non-linear.
4.0 6.0 X {d) The lower the percentage moisture content, the greater
0 2.0 14. 0.84; very good agreement between the rankings indicating
"'o additive G. (a) .fi"H+'i i+CTrc:1:criTl'jlf)ij\c:f I1WI
8;P;4+ strong positive correlation between the marks in English
the heat output.
7. (a) -0.901, strong negative correlation, the greater the
(b) y= 127 + 1.17x and the marks in History; E.
number of items finished, the lower the mean quality
(c) y score.
:ill Miscellaneous exercise 2d (page 160)
135 y
1. (a) y= 3.07 + 1.17x 8.0
(b) When the y variable is the controlled or independent
variable.
2. (a) t on w is required; t = 18.8 - 0.853w 7.0
(b) (i) -13.6°F (ii) -28.1°F
125 (c) -0.946, points lie close to the regression line.
(d) Good estimate for w = 38, since strong correlation. 6.0
0
Estimate for w = 55 needs to be treated with care since
70 80 90
temperature extrapolation (outside range of data) is unreliable.
5.0
3. (a) Strong negative correlation
(d) Argument invalid since relationship between yield and (b) (2.275, 38.375) (b) y = 6.85 - 0.0072x
additive is not linear; yield declines above 4.5% (c) Ranking both p and d from lowest to highest gives (c) pH = 6.85 at t = 0°C; for an increase of 10°C, 4.0
additive; suggest additive 4.5%, temperature 90°. pH drops by 0.07
~~9 . .~
(d) In general the population density is greater nearer the (d) 6.71, reliable; 6.17, unreliable, outside range of data
Exercise 2b Product-moment correlation centre of the town and less on the outskirts of the (e) 48.6°C
3.0
coefficient (page 145) town.
(e) H, low population density and distance from centre
of 0
:r
1. (a) 0.930, strong positive correlation 10 20 30 y
(b) -0.828, strong negative correlation town.
(b) Amend; possibly negative trend but not strong
(c) 0.867, strong positive correlation 0.3, 0.5, 0.7 d
Mrs Brown and John; 1) Headrests 2) Heated rear correlation, (32, 3.7) is an outlier
(d) 0.742, positive correlation. (c) Ignore outlier; weak negative correlation between
window 3) Anti-rust treatment.
2. 0.82 number of items and quality score.
approx. £1 per item 0 10 13.
20. (a) y 0.~15; 8.8 hours; regression line gives average value 15. At least one tail is obtained; both coins show tails
u ~Jifi4t~llw~~ul,~rrwj~~~J
20 15 points not that close to line as r = 0.51
minimise Σm_i², where m_i is vertical distance from point to line.
3. (a)
16. (a)
             Fruit tree   Other tree   Total
Birds nest   2            4            6
No nest      5            9            14
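The least squares idea used throughout Chapter 2 (choose the line that minimises the sum of squared vertical distances from the points) can be sketched as follows. The data are hypothetical and the function names are illustrative; the formulas are the standard ones, b = Sxy / Sxx and a = ȳ - b x̄, for the regression line of y on x.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the regression line y = a + bx of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def sum_squared_vertical_distances(xs, ys, a, b):
    """Sum of m_i squared, where m_i is the vertical distance to y = a + bx."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, not taken from any exercise in the book.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = least_squares_line(xs, ys)
best = sum_squared_vertical_distances(xs, ys, a, b)
# Any other line should give a sum of squares at least as large.
other = sum_squared_vertical_distances(xs, ys, a + 0.5, b - 0.1)
print(a, b, best, other)
```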
Exercise 3d Tree diagrams (page 200) 3. (a) 4! 9! (b)* Mixed test 3A (page 231) 2. , - - - y - - ; : - - - - - -
4. ir 1. (a) fs (b) -b x 12 13 14.
Section A 5. ,16 2. (a) P(X~x) 12k 13k 14k
k = 1/39
6. (a) 8! (b) -1:8
1. 1,1 0.0025 (b) 0.095 12! b I 3 0.1
2. (a) f.r (b)~ 7. (a) (2!)4 ( ) 66
lo !
4 (a) (b) {c) 0 (d)~
3. (a) 0.24 (b) 0.42
s. N3
IP(:~x) I ;
5
4. (a) (i) 2~ (ii) ~ (iii)
(b) (i) fr
(ii) ¥s (iii) *
:jy
9. t.f:i
10. f.&
(a)
1
2
1
6. (a)
7. t6
*
5. 0.00599,0.987
(b) ij 11. (a) 210
12. 12
(b) fs (c) -}o (b)
X 0 1
'
2
•
3
s. H 13. (a) 'i~ (b) ~ (c) -io P(X~x)
' l l t
9. ft
10. 0.35
14. (a) 65 268 (b) 4263
15. 510 6. "
X 0 1 2 3
11. 0.825 16. H
12. (a) 0.5 (b) 0.5 (c) 0.375 17. 4608 P(X~x)
' ~ ,• -c- /?'
First Second
"'
'
Third
13. 0.788 18. (a) 1260 (b) 2520 draw draw
19. (a) 420 (b) B 252, G 462 (c) 120 (d) ,\\
draw 7. (a) ' ' (b) PIR ~,)
14. (a) 0.02 (b) 0.64 (b) Is
(c) ~ {d) ±
15. (a) fr (b) H (c) ;(4 20. (,) 5040 (b) 1680 (c) 672
3. (a) 0.4 (b) 0.2 (c) :M I
16.
17.
(a) 0.34 (b) 0.063 (c) 0.19 (d) 0.97; 3 wh;te
0.624
21.
22.
5005,720,72
5040 (a) 144 (b) 1440 4. {aJ M tbl m tcJ 135~ tdJ 3o 2
(e) The probability that a female employee is weekly
18. (a) l (b) ! (c) -&, {d) ! (e) i 23. (a) 2.5 x 10-' (b) 3 193 344
paid. (f) 0.5
24. (a) ! (b) ! 5 · (a) ft {b) ft {c) /:i {d) ii
Section B 25. 130
26. (a) 360 (b) 6 (c) 12 (d) 1170
1. (a) -& (b) t 27. (a) 64 (b) 18 (c) H Mixed test 38 (page 232) r
2. {a) ~ (b) ji (c) ~ 28. (a) 9! (b) f. (c) 1260 (d) l 1. (a) 0.64 (b) 0.75
(c) f
3. (a) P(A occurs, given that B occurs)
(i) mutually exclusive (ii) independent
29. (a) 75 (c) m
(d) (i) 6! (ii) 72
2. (a) q-0.25 2
(c) 11
(b)---
~· f,\x ~_x_:l_~_:o.:.:.1::_,l-x_~.::'o':C:1::_,2:C'.::'":.:·•:-_9_ _ _ _ _ __
1
30. 70, (a) 55 3(4q-1) " X 0 1 2 3
(b) 0.88, 0.05 3. (a) 0.857 (b) 0.135 (c) 0.13917
(b) 30 (d) 0.973
4. {a) 0.33 (b) fr 4. (a) 0.1792 (b) 0.1686 (c) 0.203 P(X- x) 0.216 0.432 0.288 0.064
(c) 65
5. (a) ~ (b) i {c) ~ 5, (a)
(d) j (b) 0.648
6. tbl H (e) ~ ~R 10
i
(ii) -l5
R~R~S
7. (a) fg (b) (i) X -5 5 15
(f) ~
8. (a) fs (b) t·~ (c) ~ P(X~x) ;
M -k·
< ~.·.·.'s~.··.·:
9. (a) 0.096 (b) 0.156; f, Tii
10. (a) 0.7, 0.68 (b) 0.28 (c) 0.65625
Miscellaneous exercise 3g (page 228) 11
X 1 2 3 4 5 6
11. (a) 3 ~ 3 (b) H~ (c) !
{d) ! 1. (a) 0.36 (b) 0.48 (c) 0.01024 (d) 0.98976
(i) Yes, no (ii) No, yes
12. (a) 0.000877 (b) 0.421 (c) 0.65 (d) 0.642
2. (a) C, C' (b) C, D (c) C, E
~R
P(X-x) fz ii n
8
n' ~ u
n
3. (a) 0.0902 (b) unsatisfactory test
13. (a) 0.042875 (b) 0.142 (c) 0.1215
(d) 0.189 (c) 0.334125; 0.642
4. 0.32, 0.467
5. (a) 0.325 (b) i~10 (c) -&
s~R~s X 7 8 9 10 11 12
14. (a) (i) :fi (ii) l~f (iii) ft (iv) ~
~s~R ' '7~
P(X-x) .£ u
6. (a) 0.28 (b) (i) 0.157 (;;) 0,363 (iii) 0.163 II '71. n
Ti n
ibl u1 o.o3o3 u;l oA5o liiil o.o348 (c) 0.0728 (d) 0.404 u
(c) (i) 0.36 (ii) 0.848 7. 0.166, 0.580 Equally likely outcomes
12. For x = 8, draw vertical line to 0.2; for x = 9, draw vertical
1s. H.~ 8. 5040 (a) 720 (b) 1440 1997 1998 1999 line to 0.3; symmetrical distribution.
16. (bl i2 (cl Ns tdl # 9. (a) !
3 3 (b) ~ (c) ~.g (d) 3 ~ 3 (e) ~ (f) 6 (b) 0.372
17. (a) 0.36 (b) 0.6875 10. (a) H (b) M (d) ~(c) No (c) j!J4 Exercise 4b Expectation (page 244)
11. (a) (;) 0.005 (i;) 0.0955 (b) 0.999 (c) 0.136 (d) 8
Exercise 3e Useful methods (page 206) 12. (a) (i) j (ii) ~ (iii) i (b) fs (c) fo 1. 2.25
2. (,) 5 (b) 6
14. (a) 792 (b) 210 (c) i'f, (d) 120 (e) 0.1 (f) 0.1 Chapter 4 3.
4.
(a) 0.3
1
(b) 2.9
15. (a) 40 320 (b) (i) 1440 (ii) 5760
3. 0.5, 6
(c) (i) ~ (ii) ~ (d) 576 (e) is 5. H
4. 0.999 Exercise 4a Probability distributions 6. 0.75p
5. 22 16. (a) ~ (b) -fi
(c) independent (page 236) 7' ,--x--,~1~0--2-o-
1
6. fr (d) j, P(AICI • P(A) (c) {,
1 (a) 0,1 P(X = x)
7. 1; 8
P(X ~ x) 0.4 0.6
8. 0.5 (a) ~ (b) N6 (c) -#Nr;; fr o.4--/it'ii•i·~i++il.f.tci·>ilLI+i
9. (a) (i) l (ii) lz (iii) j (b) -b 8, (a) 0.3 (b) 0.2
9. (a) 0.2 (b) 2.08
10. 2.75
11.
x         4     6     8     9     11    14
P(X = x)  0.16  0.32  0.16  0.16  0.16  0.04
Exercise 3f Arrangements, permutations, combinations (page 219)
1. 9!, --h
2. (a) 6! (b) t
(b) (i) 0.85 (ii) 0.55 (iii) 0.5 (iv) 3 Loss of £1.20.
2x-1 12.
12. £~(7+x) {a) 5 (b) Loss of £3.75 5. {a) {b) l {c) P(X=x)=- -,x=l, 2 , 3 {d)!!• X 2 3 4 5 6 7 8 9 10 11 12 Exercise 5b The binomial distribution
9
13. {a) 24
is (page 285)
{c)
6. {a) (b) ~ {c) P(X = x) = \, x =1, 2, 3 {d) 0.816 P{X=x) 2~ -b '
IT 21- z1- tJ IT' fi is E
l
X 0 2 3 4 7. {b) 1. {a) 0.267 {b) 0.850
1 2 3 4
P{X-x) i ' t 0 ,,' X
l 13.
7.2,£75
{a) -:k (b) :& (c) U; --b,, 7
2. {a) 0.234 (b) 0.000107 (0.0001 from tables}
{d) 1
' '' " 14. (a) i,
-f4 {b) 2.78 {c) 0.260
3. (a) 0.279 (b) 0.983 (c) 0.594
4. (a) 0.00549 (b) 0.157 (c) 0.503
(c) 2 1_l4 , 0.547 (d) ! 15. {a) 0.8 {b) -0.24p {c) 3.34p'
14. ¥ 8 . {a) 0.9900 {b) 0.1746 {c) 0.5886 16. {a) 1.7, 1.18 {b) 4.76
5. 0.00200
15. 2 6. {a) 0.318 {b) 0.671 {c) 0.647 {d) 0.0324
{d) 0.5565 {c) 0.9785 17. (a) 1, ~ (b) ¥, f~ {c) 11.2, 7.28 7. 0.344
Exercise 4c Expectation and variance t 0 1 2 3 8. 0.5
Exercise 4e Combinations of random variables 4
9. 0.3456
(page 251) (page 261) ,,
P{T=t) ls is l
"" is 10. {a) {i) 0.0424 (ii) 0.623 {b) 12
1. (a) 2.3 (b) 5.9 (c) 0.61 11. 0.0963, improve with practice
1. (a) 26 (b) 15 (c) 17 (d) 59 (e) 59
2. (a) 0.35 (b) 4.2 12. (a) 0.0105 (b) 0.988 (c) 0.358
18. P(X = x) = 1/6, x = 1, 2, 3, 4, 5; P(X = 6) = 0; P(X = x) = 1/36, x = 7, 8, ..., 12; 4A-, f,
2 (a) 0 or 12 or -12 (b) 294
3. (a) 1.45 (b) 2.45 (c) 12.15
3. (a) 1 (b) -1 (c) 34 (d) 14 (e) 14 (f) 30 13. (a) 0.329 (b) 0.461
4. (a) 3.5 (b) 15) (c) 14.5 (d) 2n 14.
4. {a) 1.3, 1 1 01 0 8 X P{X=x)
5. {a) 2.56 Mixed test 4A (page 269)
6. {a) 3.5 {b) 14 {c) 5.5 {d) 84 {c) 1.75 {b) 0 1 2
x+y 1. {a) 0.2 {b) 8 0 0.0156
{c) 11.6
7. {a) 2 {b) 3 m -3 0.32 2. {b) 1 0.0938
P(X+ Y=x+y) 0.12 0.14
8. (a) 312 , 1, 1~~ X 0 1 2 3 4 2 0.2344
9. {b) 6 8 10 12 5 . 3 0.3125
l
''
4 4 P{X=x) 1~
l
X x+y 3
' '
N 4 0.2343
P{X=x)
"' P(X+Y=x+y) 0.2 0.18 0.04 (c)
3. {a)
1t (e) 0, ~ 5
6
0.0938
0.0156
{c) 4 {c) 0
X 0 1 2 3 4 5
x-y 2 1
10. {a) 4.2 {b) 7\ {c) 3.67 P(X = x)
P{X=x) g fs ~ ! I is
11. {a) TO (b) 3i (c) 15fo (d) 2~
' (e) 47H- P{X- Y=x-y) 0.12 0.14 0.32
]li!~l :.
12. {a) 1' {b) 3j (c) ~ {b) 117 (c) ~
13.
'
1 2 x-y 1 2 3 '" 0.3
X 0
0.04
Mixed test 48 (page 269)
P{X- Y=x-y) 0.2 0.18
j
P{X=x) j '' 5. {a) 2.6, 0.24 {b) 5.2, 0.48 {c) 7.8, 0.72
1. (a) 0.4 (b) 0.8 (c) 2.6 (d) 1.44 (e) 15.6 0.2
(a) ~ (b) ¥- (c) ~ (d) ~H;l 2. {a)
X 4 5 6 8 9 12
14. (a) 5 (b) 2.5 (c) 10 (d) 10 6. 29! f 3
7. (a) 0.1 (b) 3 (c) 1 (d) 0.2 (e) 12
I
'' '
15. 144 P{X=x) ;I -k
16. {a) j {b) 0.639
Miscellaneous exercise 4f (page 266) (b) 6i, 43fi, 3-H- (c) Loss of £1 (d) J\
10-x .312211· 3. {a)
17. P(X=x)=~,x=1, 2 , ... , 9, 3' · ' • 1 2 3 4
1. 0.1825,£1.75 6 12
P(X = x) = (~y-1(!), x = 1, 2, .. . 2. (a) -b (b) 2, H X
P{S=s) t l l
'
18. {a) t',
{b) 0 {c) 6 {d) 2.45 3.
4.
{a) 0.01 {b) 3.54, 0.4684
v9o> 2.57
{c) 14.7, 11.71
{b) 4.5 {c) 11
' ' ' '' 15.
symmetrical
9
19. (a) 0.04 (b) 5 (c) 4 (d) 7 (e) 16
20. (a) Loss £3 (b) (i) p = 0.12, q = 0.08 (ii) 645, 8 5. -fs, 3 5 1 25 12, 20 16. 68i not strictly binomial as p is not constant, but model
21. (a) £2 (b) (i) 4 (ii) 17 (iii) 1 6. (a) 5 can be used if there are a large number of bulbs in the box.
X 1 2 4
Chapter 5 17. (a) 0.000416 (tables give 0.0004) (b) 0.0197
Exercise 4d Cumulative distribution function P{X=x) A i2 j ! 18. 5
Exercise 5a The uniform and geometric 19 · ' x_ _ _P:c{c:Xc-=-x-c)'
(page 255) {b I 2 3 4 5 6 7 8 9 10 distributions (page 276)
1.
y         0.1   0.2   0.3   0.4   0.5
P(Y ≤ y)  0.05  0.3   0.6   0.75  1
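As an illustration of how a cumulative distribution relates to the individual probabilities, the sketch below takes the cumulative values listed for question 1 (y = 0.1, 0.2, 0.3, 0.4, 0.5 with P(Y ≤ y) = 0.05, 0.3, 0.6, 0.75, 1) and recovers P(Y = y) by differencing. Whether the original question asks for this step is not known; the function name is illustrative.

```python
ys = [0.1, 0.2, 0.3, 0.4, 0.5]
cumulative = [0.05, 0.3, 0.6, 0.75, 1.0]


def pmf_from_cdf(cumulative):
    """P(Y = y_k) = F(y_k) - F(y_(k-1)), taking F as 0 before the first value."""
    probs = []
    previous = 0.0
    for f in cumulative:
        probs.append(round(f - previous, 10))  # rounding hides float noise
        previous = f
    return probs


probs = pmf_from_cdf(cumulative)
for y, p in zip(ys, probs):
    print(y, p)
```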
P{Y=y) 1~4 n
;
.
,'' Tii' "" ~
'' '' "'
1. (a) 0.2 (b) 8 (c) 0.4
2. (a) 0.096 (b) 0.179 (c) 0.725 (d) 2.86
0 0.0000
1 0.0001
2 0.0011
3. (a) 0.9744 (b) 0.01024 (c) 1 (d) 1j (e) 2.5
4. (a) 1 (b) 0.7599 3 0.0109
2. (a) 0.41 (b) 0.87 (c) 0.46 (d) 0.13 (e) 2.58 4 0.0617
(c) 1.25 5. (a) 0.0226 (b) 0.00374
3. (a) 0 1 2 5 0.2096
X 6. (a) (i) 0.6 (ii) 0.3 (iii) 4.5 (iv) 2.87
(b) (i) 0.0531 (ii) 1 (iii) 10 6 0.3960
F{x) ~ "'" 1
9. (a) 0.1248 (b) 2.8352, 236
7. (a) 1 (b) 2 (c) 1.41
8. (a) 0.128 (b) X ~ Geo(0.2) (c) 0.512 20. 4
7 0.3206
b) X 1 2 3 4 5 6
9. (a) P(X = 4) = 0.7³ × 0.3 = 0.1029
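The geometric calculation in 9(a), three failures followed by a success, can be checked with a short sketch. The function names are illustrative; the values p = 0.3 and p = 0.2 come from the answers to questions 9 and 8, and reading 8(a) and 8(c) as P(X = 3) and P(X > 3) is an assumption that happens to reproduce 0.128 and 0.512.

```python
def geometric_pmf(x, p):
    """P(X = x): the first success occurs on trial x (x = 1, 2, 3, ...)."""
    if x < 1:
        raise ValueError("x must be a positive integer")
    return (1 - p) ** (x - 1) * p


def geometric_tail(x, p):
    """P(X > x): the first x trials are all failures."""
    return (1 - p) ** x


print(round(geometric_pmf(4, 0.3), 4))   # question 9(a)
print(round(geometric_pmf(3, 0.2), 4))   # assumed reading of 8(a)
print(round(geometric_tail(3, 0.2), 4))  # assumed reading of 8(c)
```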
{c)
F{x)
*
~ '
t ~
""
1
10. {b)
P{B=b)
b 0
''
1
l
2
~
3
~
"
.
4
"
(b)
(c)
The first success is at the nth attempt.
There are at least n attempts before the first success is
obtained.
21. Experiment 1 - no, 3 outcomes; Experiment 2 - yes,
constant probability of obtaining black (or white),
independent trials; Experiment 3 - no, trials not
independent.
X 0 1 2 3
z (c) 1t~ (c) M 10. 0.7225
F{x) ! ' 1 11. 0.00026
11.
y         0     1     2     3
P(Y = y)  0.3   0.34  0.2   0.16
(a) 1.22 (b) 1.0916 (c) 0.36
12. 2
13. (a) 0.0864 (b) 2.5 (c) 1.94 (d) 1 (e) 0.028
14. £1.75
15. 0.0047, December 22nd
4.
x         3     4     5     6     7
P(X = x)  0.01  0.22  0.41  0.22  0.14
; 0.9724
Exercise 5c Expectation, variance and mode of the binomial distribution (page 290)
1. 2.5, 1.5
2. (a) 1.38 (b) 4
16. (a) i (b) N6 (c) m
(d) 1 (e) 6 (f) 11 3. 8, 1.30
4. (a) 0.2 (b) 0.00551
8. (a) 0.223 (b) 0.116 (c) 9.28, 2.86 (d) 18.9 3. (a) 0.25
5. (a) 0.25 (b) 2.5 (c) 0.282 4
(e) Part (c) gives 223, part (d) gives 227, increase 11. (b) 2, 4
6. 0.1, 0.23 9. {a) Large number of balls (b) 0.799
(b) f s
f(x)L l(x):~ 1 - 0.25x ln 3
7. (a) 10 (b) 0.000390 12. a=2,k=0.75;
10. 0.790, calls occur randomly f(x)
8. (a) 3 (b) 3 (c) 0.633
11. (a) 0.104 (b) 0.283 (c) 0.00113 (d) 9 0.5 ""
9 (a) 0.994 (b) 2 0.75 ~=0.75x(2-xl
12. 0.632, 0.069, 0.154
10. 0, 0, 3, 13, 30, 36,18 13. (a) (i) X- B (28, 0.004) (ii) 0.00545 (b) 0.785 0 3 X
11. 2500 (c) 0.66
(c) independence 0 1 2 X
12. 0.06; 293, 94, 12, 1, 0, 0
14. (a) 0.311 (b) 0.959; 3.6, 1.2 4. (a) 5k (b) ~~, ~ ; 0.2
13. (a) 0.68 (b) 8, 1.6 5. c = 1, k = 4
15. (a) 0.253 (b) 3.6, 1.59 13. 0.6, 0.2
14. (a) 0.25 (b) 1.5 6. (a) 0.125 (b)
16. (a) (i) 0.201 (ii) 0.00637 (b) 2 (c) 5, 2 (d) 14
15. 1, 0.894 (a) 5 (b) 0.2 l(x)j £fix)~ 0.125x
17. (a) 0.203 (b) (ii) 0.136 (c) 0.316
(d) Assume p constant; very unlikely in First World War 0.5~
r::~~i~~~~ Cumulative distribution function
Exercise 5d The Poisson distribution l" F(x)
1
(page 297)
1. (a) 0.180 (b) 0.0527 (c) 0.195 (d) 0.670
18. P(X = x) = e^(-λ) λ^x / x!; λ, λ
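The Poisson probability formula in question 18, together with the fact that the mean and variance both equal λ, can be checked numerically. This is a minimal sketch; λ = 2.5 is borrowed from the Po(2.5) that appears in question 21 of this exercise, and truncating the sums at 60 terms is an assumption that is harmless for a mean this small.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """P(X = x) = e^(-lam) * lam^x / x! for x = 0, 1, 2, ..."""
    return exp(-lam) * lam ** x / factorial(x)


lam = 2.5
support = range(60)  # truncated support; the remaining tail is negligible here
mean = sum(x * poisson_pmf(x, lam) for x in support)
variance = sum(x * x * poisson_pmf(x, lam) for x in support) - mean ** 2

print(round(poisson_pmf(0, lam), 4))  # P(X = 0)
print(round(mean, 6), round(variance, 6))
```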
(a) 0.082 (b) 0.242; 6.15 (c) 0.328
0 4 X
1. (a) F(x)~~~' 0<x<2 ,J r--
2.
3.
(a) 0.983 (b) 0.184 (c) 0.199
(a) 0.0821 (b) 0.560 (c) 0.0631
19. (a) 0.908 (b) 9
20. (a) 3, 7 (b) 20, 20
7. (a) 0.25
(b)
1 x>2 JL._
f{x) (b) 1.59 0 2 X
4. (a) 0.603 (b) 0.616 (c) 0.00246 Reason for (a) E(Y - X) ≠ Var(Y - X)
F(x)=~~(Sx-x2-7)
0.75
5. (a) 0.0821 (b) 0.242 (c) 0.759 Reason for (b) 2 Y + 10 could not take values less than 10. 2 {a) 1 <x.;;; 3
(d) 0.0486 (e) 0.125 21. 600 m, Po(2.5), 0.0821, 0.109, 0.779, 0.207 (b) _:1
0.5
6. (a) 0.191 (b) 0.0498 (c) 2.45 22. (a) (ii) 1.5 (b) 0.577 (c) 0.0249 1 X# 3
(b)F(x)~~~(x-1)
7. 0.371 23. 0.407, 0.366, 0.165, 0.0629, 0.816, 0.0518 0.25-P---J
8. (a) 0.0382 (b) 0.122 24. (a) 22 (b) 19; 39 3.(a) 1<x<6 (c) 2 (d) 2.5
9. 0.677 25. (a) 0.135 (b) 0.323; 0.81
10. (a) 3 (b) 0.145 26. (a) 0.387 (b) 0.929 (c) 0.893 0 3 X 1 x>6
11. (a) 90, 72, 29, 8, 1, 0 (b) 44, 44, 22, 8, 2 (d) 0.205 (e) 0.816; 0.0290 (c) 0.25 (d) 0.3125 (e) 0.3475 X O.;;;x<l
~
12. Random events; 0.5, 0.481; 31, 16, 4, 1, 0 27. (a) 0.0902 (b) 0.0613; 4
13. (a) 0.261 (b) 6 Exercise 6b Expectation E(X) (page 323) 4. (a) F(x) ; 2
(b) 2
:(x -3x+4) 2 <x< 3
Exercise 5e The Poisson approximation to the
binomial (page 300)
Mixed test 5A (page 312)
1. (a) 0.159 (b) 0.766;
Query independence: friends may have joint engagements.
1. (a)
2. (a)
{6 (b) 1 (c) 2 (d) 1.6 (e) 2l4
fix)
5. (o) 0.1215
I (b) 0.841
x>3
(c) O.SSO
~ ~->
1
1. (a) (i) 0.0476 (ii) 0.0498 (b) (i) 0.225 (ii) 0.224 2. (a) 0.152 (b) 0.567 (c) 0.285
0
<x 0
kl/
.""
(c) (i) 0.171 (ii) 0.168 3. (a) X ~ B(150, 1/80), λ = 1.875, p < 0.1, n > 50 6. (a) 1.5 (b) 0.75 (c) F(x)
2. (a) (i) 0.184 (ii) 0.0190 (b) 0.271 (c) 0.0498 (b) 0.559 (c) 369
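The Poisson approximation to the binomial used in Mixed test 5A, question 3 (n > 50, p < 0.1, λ = np) can be compared with the exact binomial value. In the sketch below p = 1/80 is inferred from n = 150 and λ = 1.875, and treating part (b) as P(X ≥ 2) is an assumption; it is the reading under which the Poisson value rounds to the printed 0.559.

```python
from math import comb, exp, factorial

n, p = 150, 1 / 80   # p inferred from n = 150 and lambda = 1.875
lam = n * p          # Poisson parameter, lambda = np


def binomial_pmf(k, n, p):
    """Exact binomial probability P(X = k)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """Poisson probability P(X = k)."""
    return exp(-lam) * lam ** k / factorial(k)


# P(X >= 2) = 1 - P(X = 0) - P(X = 1) under each model.
exact = 1 - sum(binomial_pmf(k, n, p) for k in range(2))
approx = 1 - sum(poisson_pmf(k, lam) for k in range(2))
print(lam, round(exact, 4), round(approx, 4))
```

The two values agree to about two decimal places, which is what the conditions n > 50 and p < 0.1 are intended to ensure.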
x>3
3. (a) 0.287 (b) 0.191 4. (a) X ~ Po(0.6), X is number of boxes in a square km.
0 1 4 (d) 0.4 (e) 0.2
4. (a) -k (b) 0.713 (b) 0.549 (c) 0.0231 2 3 X
5. (a) 0.647 (b) 0.185 (d) Probably not suitable; different scatter of telephone (b) j (c) 2 F{x)
3. 3
1~
6. 0.109, 185 boxes in the city.
7. (a) 0.677 (b) 0.017; 1498 5. (a) 4.8, 0.98 (c) 0.737 (d) 0.388 4. 6m
8. (a) 0.468 (b) 0.703 5. (a) fs (b) -w (c) 0.48, money bond
9. 0.0150 Mixed test 5B (page 313) 6. 2, 0.124
10. (a) 0.47 (b) 0.041 7. 2.5, 0.803, 0.456
Poisson applies since p < 0.1 and n =50. Events may not 1. J (a) 5(1- p)p' (b) 10(1- f>)'p' 8. (a) 2.875 kg (b) £4.75, ?
3 X
6
be independent. After mis-dialling, you are likely to be 2. (a) Y - Geoi!l (b) 30 (c) 0.233 9. (a) 0.4 (b) 2.6 (c) 1.5
'*!L_
1
l'
more careful. 3. (a) Binomial (b) Poisson (c) e^(-λ)
11. Random sample, 0.305 6 Exercise 6c Standard deviation and variance
Exercise 5f Sums of Poisson variables
(d) 1 - e^(-λ)(1 + λ + λ²/2); 0.013, 0.014, 0.182 (page 333)
1 (a) 1.5 (b) 2.4 (c) 0.15 (d) 0.387
(page 303) 4. (a) 0.221 (b) 0.987 2· (a) 0.5 (b) 2i (c) 2-fl (d) 1 44 0 1 2 X
5. (a) 0.249 (b) 0.929 (c) 0.508; 0.542 (a) 1t (b) 3~ (c) j--1;- (d) 0 )53
F(x)=~~(x'-1)
1. 0.121 3.
2.
3.
(a) 0.189
(a) 0.323
(b) 0.308
(b) 0.119
(c) 0.184 4· (a) ;-134 (b) 1jf (c) (d) 0.545 m (b) 0.272 (c) 1 <x<2 (d) 1.65
5 (a) s (b) ~ {c) -f5 {d) 0.163
4. (a) 0.301 (b) 0.080 (c) 0.251 Chapter 6 6· (a) ~~ (b) 4-b (c) *~~ (d) 0.912
x>2
Miscellaneous exercise 5g (page 307) Exercise 6a Calculating probabilities 7· (a) 18 {b) ~64 (c) ?;;~10 (d) 0.672 3 19 ~~x-__!__x3 0 <x< 2
(page 319) 8 (a) ~ (b) 3, ~ {c) z_ 8. 4' 80' F(x) ~ 41 16 , 0.007
1. 0.752, 0.537 9. (a) 1 (b) 1 (c) l '(d) JJ ( ) 1 x>2
2. (a) 3 (b) 0.223 (c) 0.988 1. (a) i (b) ~ (c) H 10
.
I I
a f{x)
6 .n e
9. (a) j,!
3. (o) 0.733 (b) 0.0703 2. (a) f(x)
4. (o) (i) 0.434 (ii) 0.378 (iii) 0.148 (;v) 0.0401
(b) (i) 45 (ii) 111;N>20 k t {flxl~x 1
x' 2x 2
---+-
6 3 3
2 <x< 3
5. 0.507
6. (a) (i) 0.130 (ii) 0.271 (iii) 0.276; 65,0.0159
3./ (b) F(x) =
X
3
5
6
3 <x< 5
(b) 90,3 -2 0 3 X 0 5 X
x'
7. (a) 0.270 (b) 0.350 (c) 0.182 (b) 3fi {c) 12~-~ (d) 1.008
2x---5 5 < x<6
(b) 0.2 (c) 0.74 6
(d) 0.124 (e) £45
1 x>6
(c) j (d) -14
(b) 2\ (c) (d) 2.16 ~(x-1) 1 <x< 3
- 12x 3
1
l
a -1 <x<O
16~ 1
3
7. (b) 2,2 (c) 1.71 (d) 0.264 (e) 0.3645
8. (b) 0.5 (d) 0.36
18. (a) j (b) f(x) = 2a 0 .;;;x < 1
\
~x 0 <x< 1 0 1 3 7 X
11. (a) 0 (b) 0.15625 (c) (i) symmetry \ii) 0.05
7. ::: xJ 0 (x.;;; 3
2 Income(£)
X
1 X ): 3 (b) ft (c) 120
8 9 (d) From original data, 106 have income in this range. In
(c) f(x}=~x 1 ,0.;;;x.;;;3
0.0125x 2 0.;;; x.;;; 8 8. (a) 2 (b) {(x)~2,0<x<0.5 (c) 0.25 (d) 0.144 the model, f(x) = 3k, 0.;;; x.;;; 4 gives too high an esti-
(c) F(x) = o.2x- 0.8 8 .;;;x.;;; 9 mate; perhaps f(x) = 2.5k, 0.;;; x < 4 would be better,
~x-~x 1 -2_
4. 0.4 {b) 0.650 (c) 0.794 (d) 3.75 (f) Negatively skewed
16. (a) 0.455, 3 (b) 3.64, 4.95 5. 0.577 (c) F(x)= 1 .;;;x.;;; 4
1
-lnx
(c) F(x) ~ 1n 9
- 1.;;;x.;;;9 6.
7.
{a) 4.5 (b) 2-b
(a) ad, b ~ 11 (b) 0.125 l1
3 12 3
X ): 4
Mixed test 68 (page :359)
1, (a)
1 l
\
~(x-3)
'[L
1 x>9 3 <x< 11 (d) £283.33 (c)
(c) F(x) ~ 8 14. 8, ~, 39litres
F(x)~g-0.0 1 (x- 1 0)' O.;;;x<10
:
1 X): 11 15. (a) 2.93 (b)
8. (a) f(x) ~ 0.2, -2 <; x <3 (b) 1.44 (c) 2 ..1 (d) -1 X) 10
F(xl
0 1 9 X
Miscellaneous exercise 6g (page 355) 0 1 2 X
1. (a) 1_!_ (b) 6i (b) 1.6 (c) 0.327 (d) F(m) ~ 0.5, Fit<!< 0,5
Exercise 6e Obtaining f(x) from F(x) '
2. (a) 2.4 (b) 20_,,
.
0.178 ' 2. (b) fs (c) 0.577
m >p
f(x)~~~x1
(page 343)
1. (a} ((x)=!,2.;;;x.;;;6
0 .;;;x < 1
3. {b) 1.25(1-~)=0.5,m=11
0 10 X
3. (a) '3 (b)
(b)t~ 1 .;;;x.;;; 3 1.25
+L
1--x (c) f(x)=!-s'ox,O.;;;x.;;;10 (c) 0.495 (d) f(x) ~-, , 1 < x <5
3 X
0 otherwise f(x)
1L ~~
0 2 4 6 X (c) 1.27; 0.875
(c) 4 (d) 2
4. 0.8, 0.16,£8
0.25 0 10 X
2. (a) 0.794 (b) 0.75
12. (a) Weevils are randomly scattered in the grain, the grain 5. (a) 0.5 (b) 0.8849 (c) 0.2779
9. 39.5, 5.32
is selected at random.
Chapter 7 10. 53.87, 16.48
(b) (i) 0.950 (ii) 0.105 (c) 0.158
6.
7.
(a)
(a)
0.0207 (b) (i) 0.0289 (ii) 0.0200 (iii) 0.6252
0.1247 (b) 0.6957
11. 0.203 13. (a) 0.953 (b) 0.745 (c) 0.19
Exercise 7a Finding probabilities, where 12. 92.7%, 1.32, 1.7% 8. (a)0.6298 (b) 0.1056
14. (b) 0.133 (c) 11 (d) 0.7119
Z ~ N(0, 1) (page 367) 13. 4.299 g 9.
10.
(a)0.1728 (b) 0.6127 (c) 0.5
0.2575
14. 4.46
1. (a) 0.8089 (b) 0.8089 (c) 0.1911 (d) 0.1911
15. 2080, 236
Miscellaneous exercise 7h (page 398) 11. 0.1103,0.753
2. (a) 0.0359 (b) 0.2578 (c) 0.9931 (d) 0.9131 16. (a) 0.4875 (b) 281, 5.00 12. 9.6, 0.522; (a) 1.8% (b) 22.2%
1. (a) 46.5% (b) 0.532 m (c) 1.00 M
(e) 0.0049 (f) 0.9911 (g) 0.9686 (h) 0.2343 13. (a) (94.4, 105.6) (b) 92.55% (c) 22.14%
17. 5.2007, 0.00346; 0.0269 2. (b) 0.0693 (c) 0.0746
(i) 0.0312 (j) 0.9484 (k) 0.9803 (l) 0.0021 18. (a) 0.1587 (b) 128.4 (c) 1.31 3. (b) 11.5% 14. (a) 0.0787 (b) 3.02 x 10-'
3. (a) 0.05 (b) 0.05 (c) 0.0999 (d) 0.025 (e) 0.005
19. 0.0401 (a) 0.459 (b) 0.003 4. 50.154,4
(f) 0.01 (g) 0.0025 (h) 0.075 Exercise 8b Multiples of normal variables
20. 490 g, 12.2 g 5. (a) (i) 0.0062 (ii) 0.5598 (b) 7.49 m (c) 0.27
4 (a) 0.044 (b) 0.8185 (c) 0.1336 (d) 0.3023 21. (a) 19.50 (b) not symmetrical (c) 32 (page 413)
(d) Brian, since P(X ≥ 8) = 0.0207 whereas for Alan
5. (a) 0.1703 (b) 0.5481 (c) 0.3639 (d) 0.4582
P(X > 8) = 0.0062. 1. (a) 0.8962 (b) 0.9386
(e) 0.4798 (f) 0.9624 (g) 0.0337 (h) 0.9082
Exercise 7e Continuity corrections (page 386) 6. (a) 0.886 2. (a) 0.2398 (b) 0.2523
(i) 0.2729 (j) 0.030 (k) 0.925 (l) 0.4508 (b) Data not symmetric but showing a positive skew.
(m) 0.9 (n) 0.02 1. P(2.5 <X< 9.5) 3. (a) 0.244 (b) 0.659 (c) 0.409
7. (a) 1.2 (b) 53.6 (c) 54.2; 0.066
4. (a) 6, -fi (b) 0.2074 (c) 0.7601 (d) 0.5143
6. 50% 2. P(3.5 <X< 8.5)
8. (a) (i) 4.95% (ii) 0, 1, II 5. 0.2762
7. (a) 0.9 (b) 0.7 3. P(10.5 <X< 24.5)
(b) (i) 105.3 (ii) 106.45; 106.45 6. (a) 0.3446 (b) 0.6915; 0.0033,0.304
8. (a) 0.55 (b) 0.15 4. P(1.5 <X< 7.5)
(c) (i) 103.3, 3.98
9. (a) 0.9 (b) 0.1 5. P(X > 54.5)
(ii) needs overhaul, standard deviation too high.
6. P(X> 75.5)
9. (a) 14.25 p (b) 736 g (c) 462 g
Miscellaneous exercise 8c (page 417)
Exercise 7b Finding probabilities using 7. P(45.5 <X <66.5)
10. (a) (i) 0.250 (ii) 0.758 (iii) 0.00240 (b) 0.0433 1. (a) 0.60 (b) 0.20 (c) 0.95 (d) 0.5
X ~ N(μ, σ²) (page 370) 8. P(X < 108.5)
11. (a) (i) 0.197 (ii) 0.820 (b) (ii) 19 (c) 0.2142 2. (a) 0.051 (b) 0.00155 (c) 0.9782
9. P(X < 45.5) 3. 1000,172,3000,298,0.16,0.02
1. (a) 0.0668 (b) 0.4013 (c) 0.1747 12. 0.360, 0. 734
10. P(55.5 <X <56.5) 4. (a) 0.0888 (b) 0.6611
2. (a) 0.7054 (b) 0.0618 (c) 0.4621 (d) 0.00456 13. (a) 0.653 (b) 0.2224
11. P(400.5 <X< 560.5) 5. 0.0625, 0.2574, 0.5, 0.7123
3. (a) 0.0548 (b) 0.1448 (c) 0.9544 14. (a) (i) 104 (ii) 33 (iii) 33 (b) 1000,200
12. P(66.5<X<67.5) 6. (a) 0.0139 (b) 0.1587 (c) 0.9332
4. (a) 0.0106 (b) 0.9857 15. (a) 0.3154 (b) 0.3068; worse, 0.5245
13. P(X > 59.5) 16. 979.27, 17.27, 133 7. (a) 0.159 (c) 0.584
5. (a) 0.3015 (b) 0.5231 (c) 0.3792 14. P(99.5 < X < 100.5)
17. (a) random events, mean ≈ variance 8. 12 kg, 57.0 g, 3.97%, 765 g
6. 740 15. P(33.5 <X< 42.5)
(b) 0.224 (c) 0.586 (e) 0.6201 9. (a) (i) 0.1056 (ii) 0.8882 (b) 1028 g (c) 0.0537
7. 0.00003844 16. P(6.5 <X< 7.5) 18. (a) 0.988 (b) 0.855 10. (a) (i) 0.1056 (ii) 0.144 (b) 0.0188
8. (a) 0.6554 (b) 8 17. P(X > 508.5)
9. (a) 0.0478 (b) 0.000817 (c) 0.783 (Poisson), 0.784 (binomial) 11. (a) 0.1416 (b) 0.5999 (c) 14.96 m (d) 0.3043
18. P(X < 6.5) 12. (a) 0.798 (b) 0.323 (c) 0.132 (d) 0.228
10. (a) 0.9544 (b) 0.5784 (c) 0.0435 19. (a) 0.649 (b) 0.965 (c) 0.371
19. P(26.5 <X <28.5) 13. (a) 0.252 (b) 0.0581 (c) 0.104
11. (a) 0.1056 (b) 0 7734 (c) 0.6678 20. (a) 0.988 (b) 0.624 (c) 0.828
20. P(52.5 <X <53.5)
21. (a) np > 5, nq > 5, X ~ N(np, npq)
13. 0.785, 0.397 (b) p < 0.1, n > 50, X ~ Po(np); 0.859
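The conditions in question 21(a) (np > 5 and nq > 5, then X ~ N(np, npq)) go together with a continuity correction: P(X ≤ k) for the discrete variable is approximated by P(Y < k + 0.5) for the normal variable. The sketch below compares the approximation with the exact binomial value; n = 50, p = 0.4 and k = 22 are hypothetical values chosen only to satisfy the conditions, and the function names are illustrative.

```python
from math import comb, erf, sqrt


def normal_cdf(x, mu, sigma):
    """P(Y <= x) for Y ~ N(mu, sigma^2), using the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))


def binomial_cdf(k, n, p):
    """Exact P(X <= k) for X ~ B(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


def normal_approx_binomial_cdf(k, n, p):
    """P(X <= k) approximated by N(np, npq) with a continuity correction."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    return normal_cdf(k + 0.5, mu, sigma)


# Hypothetical values: np = 20 > 5 and nq = 30 > 5.
n, p, k = 50, 0.4, 22
exact = binomial_cdf(k, n, p)
approx = normal_approx_binomial_cdf(k, n, p)
print(round(exact, 4), round(approx, 4))
```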
Exercise 7f The normal approximation to the (c) 0.204 (d) 0.034
14. 0.957 1. (a) S ~ N(600, 105.8), 0.0724 (b) 0.8392
binomial (page 389)
(c) 0.1606 (d) 30.54 g
Exercise 7c Using the standard normal tables 1. 0.1958 Mixed test 7A (page 401) 2. (a) 0.733 (b) 0.984
in reverse (page 376) 2. (a) np > 5, nq > 5 (b) 0.0197 (c) 0.0968 3. (b) 0.0802 (c) 0.6729
1. (a) 29% (b) 402.62 ng/ml
3. (a) 0.0154 (b) 0.8145 (c) 0.02
2. (a) 25 (b) 0.673
1. (a) 0.015 (b) 0.796 (c) -1.887 (d) -0.454 4. (a) 0.657 (b) 0.2142
(e) -0.562 (f) 1.019 (g) 0.842 3. (b) 0.0113 (c) 0.86 Mixed test 8B (page 420)
5. (a) 0.0318 (b) 0.8345 4. (a) 0.0548 (c) 0.356
2. (a) 1.94 (b) -0.695 (c) -0.915 (d) 0.722 6. (a) 0.9474 (b) 0.6325 (c) 0.5914 (d) 0.0111 1. (a) 0.127 (b) (i) 0.0016 (ii) 0 (c) 0.1003
3. (a) 0.91 (b) 1.66 (c) 0.674 (d) 2.05 7. (a) 0.4502 (b) 0.0996 (c) 0.484 2. (a) 0.8413 (b) 0.5 (c) 0.4207; 0.9938
4. 0.674, -0.674; 0.524 Mixed test 78 (page 402) 3. 0.84
8. 20, 16,0.00436
5. (a) 70 (b) 4.65 (c) 190.742 (d) 1.468 9. P(R=r)="C,(1-p)"-'p',np,np{1-p) 1. Luxibrite, 0.936
6. (458.92, 546.52) (a) 0.2304 (b) 0.9222; 0.8531 2. (a) 0.1056 (b) 0.8641 (c) 815.68
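The binomial formula quoted in question 9 can be evaluated directly. In the sketch below, n = 5 and p = 0.4 are assumptions, not taken from the question: they are chosen because they reproduce the printed values 0.2304 (as P(R = 3)) and 0.9222 (as P(R ≥ 1)).

```python
from math import comb


def binomial_pmf(r, n, p):
    """P(R = r) = nCr * (1 - p)^(n - r) * p^r."""
    return comb(n, r) * (1 - p) ** (n - r) * p ** r


n, p = 5, 0.4  # assumed parameters, see the note above

prob_three = binomial_pmf(3, n, p)
prob_at_least_one = 1 - binomial_pmf(0, n, p)
mean = n * p                 # np
variance = n * p * (1 - p)   # np(1 - p)

print(round(prob_three, 4), round(prob_at_least_one, 4), mean, variance)
```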
7. (a) 0.6247 (b) 629.52 g (c) 3 10. 0.1432 3. (a) (i) 0.8944 (ii) 0.4931 Chapter 9
8 8 1.158, (6.10, 9.90) (b) only able to stay for a maximum of 60 minutes
9. (a) (384.32, 415.68) (b) (394.608, 405.392) 11. 0.6886
12. np > 5, nq > 5 (a) 0.1853 (b) 0.1838 (c)
0 81"'10
(c) mean + 3σ gives 6.55 pm Exercise 9a Sampling methods (page 430)
10. (a) 0.9332 (b) 0.383; 106.6,137 4. (a) 7.5 (b) randomly scattered (d) 0.901 (e) 0.2627 2. {a) 6, 6, 6, 6, 6, 5, 5
11. (a) 0.0548 (b) 26 (c) 67.4 (d) 2183 Exercise 7g The normal approximation to the 5. (a) (i) 0.0808 (ii) 0.1935 4. (b) large: medium: small = 15:25:20
12. (a) 37.8% (b) (125.5, 194.5) (c) 0.405 (b) 0.295 (c) 0.0598
Poisson distribution (page 390)
Exercise 7d Finding μ or σ or both, where 1. (a) 0.6201 (b) 0.39 (c) 0.5406 Exercise 9b Simulating random samples from
2. (a) 0.3998 (b) 0.2004 (c) 0.3661 (d) 0.0637 Chapter 8 given distributions (page 435)
X ~ N(μ, σ²) (page 381)
3. (a) 0.313 (b) 0.5078 (c) 0.8335 (d) 0.1101 Some answers depend on the random numbers used and on the
1. 30 4. (a) 0.2614 (b) 0.2343 (c) 0.0558 Exercise 8a Sums and differences of normal method of allocation. These are possible answers.
2. 10.7 5. 0.8901
3. 8.31, 35.9%
variables (page 409) 10. (a) 1, 1, 1, 0, 3 (b) 4
6. 0.6887,4 11. 33.134, 33.193,28.712
4. 35.5 7. (a) 0.4574 (b) 0.173 (c) 0.8312 1. (a) 210, 625 (b) X ~ N(210, 625)
12. (a) 3, 5 (b) 1, 5 (c) 1007.2, 1016.8
5. 1.75 8. (a) 0.4594 (b) 0.5363 ... 4 (c) 0.6554 (d) 0.7698
09 13. 1.52
6. 52.73, 11.96 9. (a) (i) 0.9815 (ii) 0.3486 (m) 0.9244 (b) 0 ·0 2. (a) 0.1319 (b) 0.0127
14. means of sample means ≈ distribution mean; variance of
7. 2.74, 2.78 3. (b) 0.9324
10. (a) 0.199 (b) 0.185; 0.870
4. 0.0745 sample means ≈ f variance of distribution
8. (a) 6.99, 0.324 (b) 0.0105 J 1. (a) 0.927 (b) 0.0102; 0.297 15. (a) 4 (b) 6.1826
3. (8.07, 9.13) 2. (0.35, 0.49), 0.14 7. (a) 2, 1.18 (b) 0.302
Exercise 9c The distribution of the sample mean, X̄ (page 443)
4. (32.08, 33.22), 380
5. (a) 5.13 (b) 0.588 (c) (4.70, 5.56)
3. (a) (x̄ - 38.64/√n, x̄ + 38.64/√n) (c) H0: p = 0.2, H1: p < 0.2, not reduced
(b) 6000
(d) reduced
1. 0.0176 6. (14.98 g, 15.78 g) 4. (a) (244.2 g, 250.2 g) (b) 6.0 g (c) ,malh 8. (a) 0.430 (b) 0.962 (c) 0.00459
2. (a) 0.6234 (b) Approx. 4 7. (9.804, 9.808) (d) H0: p = 0.9, H1: p < 0.9, looking for a decrease
3. (a) 0.1056 (b) 0.3092 (e) No evidence that service has deteriorated.
Exercise 9g Confidence intervals for p Chapter 10 (f) x ≤ 12; P(X ≤ 12) < 0.05, whereas P(X ≤ 13) > 0.05
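The critical region in part (f) follows the usual rule for a one-tailed binomial test at the 5% level: take the largest c for which P(X ≤ c) is still below 0.05 when H0 is true. The sketch below implements that rule. The sample size n = 17 is an assumption, since the question itself is not reproduced here; it is chosen because, with p = 0.9, it gives the printed region x ≤ 12.

```python
from math import comb


def binomial_cdf(k, n, p):
    """Exact P(X <= k) for X ~ B(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


def lower_critical_value(n, p0, alpha):
    """Largest c with P(X <= c | p = p0) < alpha, or None if there is none."""
    c = None
    for k in range(n + 1):
        if binomial_cdf(k, n, p0) < alpha:
            c = k
        else:
            break  # the cdf only increases, so no larger k can qualify
    return c


n, p0, alpha = 17, 0.9, 0.05  # n = 17 is an assumed sample size
c = lower_critical_value(n, p0, alpha)
print(c, round(binomial_cdf(c, n, p0), 4), round(binomial_cdf(c + 1, n, p0), 4))
```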
4. (a) X̄ ~ N(4.8, 2.88/50) (b) 0.7975
(page 471) 9. (a) Defects occur randomly and independently, with no
5. 1'1 8 (b) no
Exercise 10a Testing p in a binomial two defects at the same spot.
l. (a) (0.622,0.738)
6. (a) 0.2399 (b) 0.0787 (c) 0.0127 (d) n ~ 109 (b) The normal approximation to the binomial has been
distribution (small samples) (page 494) (b) (i) 0.209 (ii) 0.221
(c) 0.140
7. 0.9212 used in the underlying distribution. 1. H0: p = 0.7, H1: p > 0.7; no evidence
8. 62 2. (a) (0.293, 0.427) (b) (0.273, 0.447) 2. (a) H0: p = 1/6, H1: p > 1/6
(d) H0: λ = 2.4, H1: λ > 2.4, evidence that number of
defects has increased.
9. (a) 42 (b) 60 3. (a) (0.238, 0.362) (b) 90 (b) There is no evidence that die is biased in favour of 4.
10. (a) (i) 0.181 (ii) 0.999 (b) O.QJ8
10. 5 4. (a) 0.28 (b) (0.176, 0.384) 3. (a) Do not reject H0 (b) Reject H0
11. 20,3 (c) No evidence of decrease.
5. (0.156, 0.344) 4. (a) Evidence to suggest decrease.
12. (a) 12 (b) 20 11. 0.0057, 9 rnins, not significant
6. (a) (0.223, 0.352) (b) wide; (b) No evidence to suggest decrease.
13. 205od, 1768; no 7. (a) Random sample (0.244, 0.283} (ii) 90 approximately 5, (a) x;;>S
14. 0.332, 0.0587, 0.009 (b) (i) 0.26 (b) The probability that H 0 is rejected when it is in fact
Mixed test 10A (Binomial) (page 506)
15. 0.4948, 0.4944,0.1211 8, (a) (0.351, 0.369) (b) 5277 true. (c) 0.1 1.0,1,9,10
16 (a) P(X=0)=~,P(X=1)=j,P(X=2)=i 9. (0.509, 0.547) 6. (a) Accept H0 (b) Reject H0 (c) Reject H0 2. (a) H0: p = 0.15, H1: p < 0.15, evidence that new
(b) l (c) 0.159 (d) Accept H 0 (e) Accept H 0 (f) Reject H 0 procedure has been successful.
Miscellaneous exercise 9h (page 478) (g) Reject H 0 (h) Accept H 0 (b) Staff making an effort during the first week, take
Exercise 9d Distribution of sample proportions 7. (a) Driving instructor is over-estimating pass rate. sample over a longer period of time.
1. (124.34, 125.60), 4 x> 3
(large samples) (page 447) 2. (£93.59, £101.48)
(b) 3. No evidence to support gardener's claim.
8, She could have been guessing.
1. (a) 0.0745 (b) 0.0037 3. 1.13, 0.0603, (£1.07, £1.19) 9. (a) x < 2 (b) 0.803
2. (a) 0.0057 (b) 0.527 (c) 0.1265 4. 9.71, (172.3, 173.3)
Mixed test 10B (Poisson) (page 506)
10. (a) 15% (b) 0.15(09) (c) 28% (2 d.)
3. 0.0471 5. (a) 3, (2.04, 3.96} (b) 30%, (25.2%, 34.8%) 1. (a) Poisson, 2.1
11. (a) 7.5% (2 d.)
4. (a) 0.0648 (b) 0.0970 6. 0.059, 0.61 (b) (i) 0.650 (ii) 0.222
(b) same as significance level
5. 0.7181 7. (a) Lifetime of bulb follows a normal distribution; the (c) 66% (2 d.)
(c) Evidence suggests higher rate.
6. (a) 0.0648 (b) 0.0851 (c) 0.3068 items in the box constitute a random sample. 2. (e) (i) 0.138 (ii) 0.847
7. (a) 0.22 (b) (1774 hours, 1798 hours) (b) H 0: λ = 7.5, H 1: λ < 7.5, does not provide significant
8. (a) 268 (b) smaller, critical z value less
Exercise 10b Testing λ in a Poisson evidence.
Exercise 9e Point estimates and confidence 9. (0.139, 0.315); there is a 1% chance that the interval has distribution (page 500) 3. (a) Nominally 5% (between 4.26% and 8.39%)
1. Increased (b) 76% (2 d.).
intervals for μ (page 460) not trapped μ.
10. (a) 26.525,1.24 (b) (26.20, 26.85) (c) justified 2. (a) Not increased (b) Decreased
1. 236, 7 ..18 (d) n large, use Central Limit theorem 3. H 0: λ = 9, H 1: λ > 9, not increased
2. (a) 48.875, 6.98 (b) 1.69, 8 x 10-'(1 d.) 11. (a) (28.98 cm, 29.42 cm) (b) Large sample 4. (a) 0.0424 (b) 0.849 Chapter 11
(c) 22.79, 1.81 (c) X normally distributed, random sample 5. (a) Accept H 0 (b) Accept H 0 (c) Accept H 0
(d) 15, 43.14 (e) 10, 3.11 (f) 9.71, 621..12 Exercise 11a z-tests for a normal population or
(d) (26.78 cm, 31.62 cm) (d) Reject H 0 (e) Accept H 0 (f) Reject H 0
3. 0.5, 1.428 (e) no; 30.5 out of range of 95% confidence interval for μ 6. H 0: λ = 3.5, H 1: λ > 3.5, not increased large sample size (page 522)
4 . 205.16, 9.223 12. (92.32, 99.68)
5. (a) (139.16, 140.5) (b) random sample 1. (a) z = -1.095, accept H 0 (b) z = 1.845, reject H 0
6. (a) (10.7.1, 14.1.1) (b) 3.4
13. (a) (202.4, 207.4) (b) 0.2, (0.057, 0.343) Miscellaneous exercise 10c Binomial and (c) z = 2.5, reject H 0 (d) z = -2.778, reject H 0
14. (0.123,0.392), (170.84, 178.16), (165.57, 186.83) Poisson tests (page 504) 2. z = -0.943, no
7. (a) (448.7, 467.3) 15. 25.35, 0.13, (25.15, 25.6), valid
(b) The probability that this interval includes μ is 0.99. 1. (a) 0.028 (b) 0.131 3. It could be 103.5
16. (a) (0.303, 0.357) 4. z = 2.487, yes
(c) No, z value less. (c) 10% probability that interval did not trap p; people (c) 0.261; H 0: p = 0.6, H 1: p > 0.6; teacher is not
8. (a) (79.19, 84.81) (b) (78.89, 85.11) underestimating 5. z = 1.909, distribution of the sample mean is approximately normal.
changed their minds at the last minute
(c) No, the central limit theorem can be used, since n is 2. (a) 0.552 (b) 6, 0.296
17. (£35.60,£130.80) 6. z= 1.987, no evidence
large. 18. (35.03 mg, 35.31 mg) (c) The probability he scores a penalty kick remains
9. (68.0, 70.0), random sample, central limit theorem can be constant at 0.7. 7. (a) X < 91.5065 minutes (b) 0.0093 (c) 0.3286
19. (13.10 mm, 14.72 mm) 8. z = 0.983, accept mean is zero
applied. 20. (47.02 cm, 51.38 cm) (d) H 0: p = 0.7, H 1: p > 0.7
10. (a) 3.612 (b) (747.3 g, 748.7 g) (e) No evidence of improvement (f) strengthened 9. 5.778 < X < 6.222
21. (0.0825 mm, 0.242 mm) 10. (a) z = 1.778, accept H 0 (b) z = 1.778, reject H 0
(c) random sample, central limit theorem can be applied. 3. Manufacturer's claim is not accepted; discrete distribution,
11. (a) (1011, 1114) (b) 36 P(X < 12) ~ 3.6%, P(X < 13) ~ 17.1%. (c) z = -1.428, reject H 0 (d) z = -2.487, accept H 0
Mixed test 9A (page 481) 11. (a) Reject H 0 and conclude mean is not 52. (b) 0.04
12. 28 4. (a) H 0 : p = 0.2, H 1 : p > 0.2
13. (a) 5.06 g (b) 89% 1. (a) 0.391 (b) 93% (b) X ~ B(25, 0.2) (c) 9 12. (a) 0.0817 (b) 0.665
14. Histogram: frequency densities 1.2, 3.6, 6.4, 11.4, 20.4, 2. 14 5. (a) (i) 0.0278 (ii) 0.0384 (iii) 0.0768 13. z = 2.946, yes
10.2, 5, 1.8; 91.32, 7.42, 0.43, (90.5, 92.2) 3. (0.23, 0.35); the normal approximation to the binomial (b) H 0: p = 0.5, H 1: p ≠ 0.5, no indication of whether 14. (b) 0.24 (c) p>389.7 (d) 0.0494
15. 25.3, 3.6, (24.9, 25.8) has been used in the underlying theory; only cars in the car looking for evidence of more males or more females.
16. Histogram: frequency densities 0.8, 0.48, 0.3, 0.18, 0.1, park were sampled which may not constitute a random (c) Evidence of more males than females, x ≥ 13 Exercise 11b t-tests for a normal population,
0.05, 0.04, 0.03, 0.02; 194, 176, (173.5, 214.5) sample. 6. (a) 37% (b) 42% small sample size (page 527)
4. (18.51, 19.49) (c) (i) The consumer group has used a high value for the 1. (a) t = 0.909, accept H 0 (b) t = -1.89, accept H 0
Exercise 9f Confidence intervals - small significance.
(c) t = 2.15, reject H 0 (d) t = -3.07, accept H 0
samples (t-distribution) (page 468) Mixed test 9B (page 482) (ii) Choose 5% or 10% significance level to maintain 2. t = 2.828, evidence of improved times
1. (b) (92.01, 93.19) credibility.
1. (a) (177.21 cm, 182.12 cm) (b) 4.91 cm 3. (a) t = -3.54, underweight (b) z = -3.2, underweight
2. (a) (3.59,4.68) (b) 0.146 (c) Central Limit theorem can be applied.
2. t = -1.13, no evidence that Welsh policemen are shorter
4. t = -1.1, no 17. (a) H 0: μ = 1.73, H 1: μ > 1.73 (b) X ~ N(1.73, 0.0008)
5. t = 2.284, mean greater than 4.3 than Scottish policemen. (c) X > 1.777 13. (a) modal class 2 to <4
6. (a) t = -3.23, no (b) (1.69, 2.88) 3. 196, z = -1.714, do not differ significantly (b) 4 years 8 months, 3 years 2 months
(d) men who play basketball are not taller (e) 0.14
7. (a) z = -1.66, no change in mean 4. (a) 10.8125 (b) t = -1.282, accept claim 18. 9 (c) For cumulative frequency curve plot (0, 0), (2, 42),
(b) 0.324, t = -2.33, change in mean 5. t = 2.423, not significant difference (4, 94), (6, 122), (8, 142), (10, 160) (1l 1'76) 3 ,
8. X is normally distributed, t = 1.80, accept null hypothesis 6. Normal populations with common variance, t = 2.36, 9 months, 4 years 9 months ' ' ; years
Test 11A (z-tests) (page 558)
9. H 0: μ = 27, H 1: μ ≠ 27, t = 2.9, mean is 27 evidence that mean has increased; t = 2.041, the mean
1. (a) 21.25
(d) χ² = 5.73, ν = 5, justified
10. H 0: μ = 50, H 1: μ < 50, t = -0.435, not overstating could be 500 g.
7. (a) Normal populations with common variance (b) z = 0.99, no evidence to support manufacturer's Exercise 12b Goodness of fit tests - binomial,
(b) H 0: μ1 = μ2, H 1: μ1 ≠ μ2 (c) t = -0.942, same suspicion
Exercise 11c Testing a binomial proportion Poisson and normal distributions (page 579)
large n (page 532) 8. t = 1.868, evidence that new method has led to higher (c) obtaining distribution of X, distribution of X not
scores; (-2.60, 33.9) known 1. Combine last three classes, χ² = 4.09, ν = 3, accept
X ~ B(5, 0.3)
1. (a) z = 1.59, accept H 0 (b) z = 2.206, accept H 0 2. z = 1.567, not sufficient evidence to say that the quoted
(c) z = -1.79, accept H 0 (d) z = 2.118, accept H 0 Miscellaneous exercise 11e (page 554) figure is an underestimate 2. X ~ B(5, 1/6), E = 80.5, 80.5, 32, 7 (last 3 classes combined),
(e) z = -2.937, reject H 0 3. (a) (i) P(reject H 0 when H 0 is true) χ² = 8.21, ν = 3, biased; x̄ = 1, p = 0.2, X ~ B(5, 0.2),
2. z = -2.40, do not accept claim as there is evidence that 1. (17.1, 19.7), there is a 10% chance that it hasn't trapped (ii) P(accept H 0 when H 1 is true) E = 66, 82, 41, 11 (last three classes combined) χ² is very
proportion is less. μ; z = 0.759, μ could be 17.8 (b) (i) z = 2.372, mean is greater than 17.5 small, ν = 2, too good a fit, query data.
3. z = 1.637, yes 2. (a) Children within families selected are representative of (ii) X> 18.09 3. np, 1.6, 0.32, E· 7.3'171 ''
16.1, 7.5, 1.8, o•21COmine b.
4. z = 1.476, no all children. (iii) 0.639 last 3 classes) χ² = 1.79, ν = 2, good fit
5. z = -1.990, no (b) z = 0.939, data do not indicate that boys and girls are 4. 0.0606 (≈6%), 0.1118 4. X ~ B(2, 1/6), E = 150, 60, 6, χ² = 9.6, ν = 2, reject; use
6. z = 1.5, no not equally likely x̄ = 0.444, p = 0.222, find E, ν = 1
7. z = 1.705, evidence that more than 65% own a mobile 3. (a) H 0: μ = 30, H 1: μ > 30 (b) X > 33.95 Test 11B (z-tests) (page 558) 5. (a) x̄ = 2, p = 0.4, E = 6, 21, 28, 18, 7 (combine last
phone. (c) Evidence that mean speed is greater than 39 mph (X is 2 classes)
1. H 0: μ = 125, H 1: μ < 125, z = -1.549, no evidence that μ is
8. (a) (i) 0.0297 (ii) 0.0934 in critical region). (b) χ² = 2.21, ν = 3 (c) yes, binomial adequate
(d) 0.9941 lower
(b) z = -1.792, germination rate less than 75% (only just 6. (a) X ~ B(5, 0.2088), E = 155, 205, 108, 28, 4, 0
- do further tests) 4. (a) H 0: μ = 43, H 1: μ > 43 2. z = 2.318, government spokesman (combine last 3 classes)
(b) Since n is large, the distribution of the sample means is 3. (a) X < 59.82
9. z = 2.43, yes (b) χ² = 5.959, ν = 2, binomial (but only just)
10. Replies were representative of the population. approximately normal (b) It is accepted that the mean is 60 when in fact it is an 7. (a) 7
(a) z = 1.220, no evidence to suggest proportion in favour (c) z = 1.768, mean amount has increased alternative value (less than 60). (b) n = 20, p = 0.35; 0.16135, 8.1
(d) (43.35, 52.65), consistent, 43 out of range of (c) 0.057 (c) 12.3
is more than 0.7.
4. (a) H 0: μ1 - μ2 = 0, H 1: μ1 - μ2 ≠ 0
(b) (0.681, 0.808) confidence interval (d) E = 12.3, 8.6, 9.2, 8.1, 11.8; O = 9, 7, 17, 8, 9
11. (a) Evidence that proportion is lower 5. H 0: p = 0.13, H 1: p < 0.13, 2%, 0.161 (b) z = 1.6, no difference χ² = 8.46 (e) 3, not good fit at 5%
level
(b) No different 6. (a) (i) P(X > critical value | μ = 65) (c) Distribution likely to be skewed rather than symmetric 8. (a) E = 246.6, 345.2, 241.7, 112.8, 39.5, 14.2
12. (a) z = -3.03, evidence that p < 0.4 (ii) P(X < critical value | μ is value specified by the (b) χ² = 32.2, ν = 5, not accepted
(b) (0.379, 0.458); 75 alternative hypothesis) Test 11C (t-tests) (page 559) 9. x̄ = 2.5, E = 8, 21, 26, 21, 13, 11 (combine end classes),
13. z = -1.267, no (c) Accept H 0, Type II χ² = 2.59, ν = 4, good
1. t ~ -.3.5~0, evidence that mean falls below $7.40; normal
14. z = -2.44, evidence that proportion has fallen (d) 0.0059, Type II error would be less and tends to zero distribution 10. x̄ = 1.28, E = 41, 52, 3~, ~~· 6 (combine end classes),
as μ increases 2. t = -2.915, San Marco cooler χ² = 6.81, ν = 3, not significant
Exercise 11d Testing the difference between 7. Not representative as it excludes people at work, school, 11. x̄ = 0.65, E = 20.88, 13.57, 5.55 (combine end classes),
3. (a) 4.238 (b) Normal distribution, t = 2.857, yes
etc; better to take random samples at random times during χ² = 1.85, ν = 1, accept
means of two normal populations (c) Perform z-test not t-test
the day for a spread of days, (68%, 80%). z = -2.03, data 12. (a) x̄ = 1.2, E = 99, 119, 72, 29, 9, 2 (combine end classes)
4. t = -2.046, new score higher; (-6.948, 32.282) or
Section A: z-tests (page 543) provide significant evidence (-32.282, 6.948) (b) χ² = 0.48, ν = 3, very good fit
1. (a) (i) z = -2.096, reject H 0 (ii) z = -1.402, accept H 0 8. (a) X̄ > μ0 + 1.96σ/√n or X̄ < μ0 - 1.96σ/√n
13. x̄ = 0.9, E = 21, 18, 11 (combine last 3 classes), χ² = 1.80,
(iii) z = 2.493, reject H 0 ν = 1, yes, consistent
(b) (i) z = 1.99, accept H 0 (ii) z = 2.076, reject H 0 Chapter 12 14. (b) E = 7.3, 12.4, 10.6, 9.7
(iii) z = -2.036, accept H 0 (iv) z = 1.783, reject H 0
(b) X̄ > μ0 - 2.326σ/√n
(c) χ² = 1.77, ν = 2, reasonable
There will be variation in answers, depending on the degree of (d) very low, suspicious
(v) z = 1.779, reject H 0 (vi) z = -2.321, accept H 0 9. (a) 0.422 accuracy used in various stages of the working.
15. E = 6.68, 9.19, 14.98, 19.15, 19.15, 14.98, 9.19, 6.68,
(only just) (vii) z = 2.55, reject H 0 (b) E(unbiased estimate) = true value; batch not rejected,
χ² = 3.197, ν = 7, accept.
2. 0.567, z = -2.219, flowers on sunny side grow taller 9.6% Exercise 12a Goodness of fit test - uniform If μ, σ² unknown, ν = 5
3. z = 3.52, second population has smaller mean than first 10. (a) H 1:p>-!- (b) N > 50 (c) 0.059 ≈ 6% and given ratio (page 569)
4. z = 2.036, significant at 5% level, not significant at 4% 11. Accept as slow if mean bounce < 11.645, 0.0004
16. (a) E = 3, 13, 28, 32, 18, 6 (combine first 2 classes),
1. χ² = 1.93, ν = 3, die is fair χ² = 11.9, ν = 4, reject
level 12. (a) (i) 10.46 (ii) 15.64, E(unbiased estimate) = true value
2. χ² = 18.16, ν = 9, uniform distribution
2 (b) x~171.54,,~7.11 ' E~6 ,18 , . ,32, ,28co 13 m 3( m b'e
5. 4.41, (9.87, 10.73), 3.61, z = 1.49, not significant evidence (b) 1
6. z = -1.646, reject Mr Brown's claim (only just) (c) Central Limit theorem holds when n is large 3. χ² = 6.19, ν = 2, yes last 2 classes), χ² = 1.73, ν = 2, accept normal
z = 3.367, accept claim that mean duration is more than 4. χ² = 4.95, ν = 3, no; χ² = 9.90, ν = 3, yes 17. (a) x̄ = 1.732, σ̂ = 0.216 (3 d.p.), E = 7.78, 26.05, 44.12,
7. z = -2.04, evidence of difference 13.
33.64, 13.41, χ² = 8.96, ν = 2
8. 1.15, z = -2.913, significant evidence 12 months; n large, use Central Limit theorem 5. χ² = 8.24, ν = 7, accept theory
(b) χ² = 2.42, ν = 1
9. z = 1.627, accept; 124
10. 27.33, (26.77, 27.89), 2.4, z = 1.97, those of higher
14. (a) 75
(b) z = 2.19, machine is not correctly calibrated ~: ~~: ~o:~s~;=\~~~
intelligence do not have greater foot length. (c) Unbiased estimate of standard deviation used, 8. 15.5 Exercise 12c Contingency tables (page 588)
distribution of sample mean approximately normal; 9, 7~.81, 17.8, 7.8, 6-? χ² = 5.92, ν = 3, no difference 1. E = 48, 10.67, 21.33, 42, 9.33, 18.67, χ² = 1.037, ν = 2,
(0.316, 5.684) 10. χ² = 38.2, ν = 9, evidence of bias no difference
Section B: t-tests (page 545) 11. χ² = 10, ν = 4, not uniform
(d) Smaller, might lead to result that machine is correctly 2. E = 25.5, 25.5, 60.5, 60.5, 26.5, 26.5, 7.5, 7.5, χ² = 2.03,
1. (i) (a) 17.73 (b) t = 2.135, reject H 0 calibrated. 12. χ² = 4.4, ν = 5, die is fair
ν = 3, independent
(ii) (a) 87.09 (b) t = -0.567, accept H 0 15. (a) 66.25, 133.40 (b) H 0: μ = 62.5, H 1: μ > 62.5, 3. E = 50.1, 29.5, 23.4, 22.9, 13.5, 10.6, χ² = 4.00, ν = 2, yes
(iii) (a) 27.5625 (b) t = 2.088, accept H 0 z = 1.465, no evidence of increase 4. E = 65.1, 28.9, 58.9, 26.1, χ² = 7.43, ν = 1, yes
(iv) (a) 4.182 (b) t = 1.260, accept H 0 16. (a) z = 2.475, mean has increased 5. E = 27.5, 972.5, 27.5, 972.5, χ² = 4.79, ν = 1, yes
INDEX
mid-interval value
modal class 30 multiples of
mode, raw data 12 sum of 246,259
continuous random variable 2 range 256
multiple of random variables 329 interpercentile 37
normal variables 246,250 interquartile 68
multiplication law (probability) 409 rank correlation 68
mutually exclusive events 186 rectangular (uniform) distribution, continuous 146
179 mean and variance 345
negative correlation
negative skew 119 discrete 347
non-parametric test 84,95 regression, coefficients of 240
605 function 124, 142
normal approximation to binomial 119
to Poisson 382 least squares lines
distribution 390 calculator 119
89, 360 rejection criteria (rules) 133
goodness of fit test (χ²) 513
tables (standard normal) 576 rejection region
use of 649
sample mean
null hypothesis (H0 ) 362-377 proportion 436
485,511,566,583,601 445
one-tailed tests sampling distribution of means
or rule, probability 489,511 proportions 436
outlier 183 sampling methods 445
98 cluster 424
Pearson's coefficient of skewness design 429
percentile 85 frame 422
permutations 68 quota 429
pie diagrams 214 stratified 423
point estimates 24
Poisson, approximation to binomial 447 units 427
cumulative probability tables 299 scaling sets of data 423
use of 647 scatter diagram 51
diagrammatic representation 294 significance level 118
distribution 295 tests 485,509
expectation and variance 292 simulating random samples 483,507,560,600
fitting a theoretical distribution 293 skewness 431
goodness of fit test (χ²) 296 quartile coefficient of 84
hypothesis test for mean 573 Pearson's coefficient of 88
mode 496 Spearman's rank correlation coefficient 85
normal approximation to 296 significance of 146
sum of two variables 390 table of critical values 605
unit interval 301 standard deviation, discrete random variable 652
pooled two-sample estimate (variance) 293 calculator 249
population 535 frequency distribution 40
positive correlation 421 raw data 41
positive skew 119 standard error of mean 37
possibility space 84,95 of proportion 438
power of a test 172 standard normal variable 445
probability 521 cumulative tables 361
addition law (or rule) 168 use of 649
arrangements, permutations and combinations 183 stratified sampling 362
Bayes' theorem 206 stem and leaf diagrams (stemplot) 428
complementary event 197 back to back stemplot 4
conditional events 172 step diagrams 7
density function (p.d.f.), continuous 182 sum of random variables 59
from cumulative distribution 314 normal 256
discrete 341 Poisson 403
distribution 234 survey 301
exhaustive events 233 systematic sampling 422
experimental 180 427
independent events 169 t-distribution
multiplication law (and rule) 185 test statistic 462
mutually exclusive events 186 tied ranks 485,547
subjective 179 tree diagrams 150
trees 171 t-tables 193
193 use of 650
product-moment correlation coefficient 464
significance of 139 t-tests
table of critical values 600 two-tailed tests 524
652 type I and type II errors 489, 511
proportion, confidence interval
distribution of sample 469 493,520
unbiased estimate
unbiased estimate 445 447
significance test, n large 447 uniform distribution (rectangular), continuous
discrete 345
n small 528 270
483 goodness of fit test
quartile coefficient of skewness unit interval (Poisson distribution) 567
quartiles, ungrouped data 88 upper quartile 293
continuous random variable 69,71 continuous random variable 69, 71, 75
grouped data 336 336
75 variance, from data
quota sampling 38
423 random variables, continuous
random number table discrete 327
use of 653 unbiased estimate 248
random sampling 425 pooled from two sample 447
from frequency distribution 424 Venn diagram 535
from probability distribution 431 172, 175
432 weighted mean
random variables, continuous 36
difference between 314 width, confidence interval
interval 457
discrete 257 3
233 Yates' continuity correction
586